Big Data Processing (E)
Level
Master's course
Learning outcomes of the courses/module
The following skills are developed in the course:
- Students are familiar with the particular challenges involved in storing and processing large quantities of data (the four Vs: Volume, Variety, Velocity, Veracity).
- Students know the options for meeting these challenges (exemplary systems for each of the four Vs are discussed).
- Students can develop and apply appropriate solutions to a specific problem.
Prerequisites for the course
3rd semester: No prerequisites
Course content
Students are introduced to the basic features of Big Data. Special attention is paid to handling such data, and the knowledge acquired is consolidated with examples. Suitable frameworks for solving Big Data problems are presented and worked with in interactive workshops based on case studies. Examples include:
- Apache Hadoop
- Apache Spark
- Apache Flink
- Apache Storm
- Apache Samza
- Apache Kafka
These frameworks are explained and applied in case studies. The centrally provided Data Labs are available for this purpose.
Recommended specialist literature
PRIMARY LITERATURE:
- Jain, V. K. (2017): Big Data and Hadoop (Ed. 1), Khanna Book Publishing, New Delhi (ISBN: 978-9382609131)
- Karau, H.; Warren, R. (2017): High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark (Ed. 1), O'Reilly Media, Farnham (ISBN: 978-1491943205)
SECONDARY LITERATURE:
- O'Neil, C.; Schutt, R. (2013): Doing Data Science. Straight Talk from the Frontline (Ed. 1), O'Reilly Media, Sebastopol (ISBN: 978-1449358655)
- Narkhede, N.; Shapira, G.; Palino, T. (2017): Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale (Ed. 1), O'Reilly Media, Farnham (ISBN: 978-1491936160)
Assessment methods and criteria
Written exam
Language
English
Number of ECTS credits awarded
4
Share of e-learning in %
25
Semester hours per week
2.0
Planned teaching and learning method
The following methods are used:
- Lecture with discussion
- Group work
- Interactive workshop
Semester/trimester in which the course/module is offered
1
Name of lecturer
Tuta Mario
Academic year
Key figure of the course/module
DPR.1
Type of course/module
integrated lecture
Type of course
Compulsory
Internship(s)
none