Automatic Performance Tuning Approaches for Distributed Data Stream Processing Systems

21 March 2023

Zoom

18:00

On Tuesday, March 21, at 6pm, our MSc programme will have the pleasure and honor of hosting online Dr. Herodotos Herodotou, Assistant Professor in the Department of Electrical Engineering and Computer Engineering and Informatics at the Cyprus University of Technology, who will deliver a lecture on advanced data engineering techniques.

https://authgr.zoom.us/j/94212001683?pwd=KzFTTWVnV25tQWd1SFFJMU8waS90QT09

Abstract:

Distributed data stream processing systems (DSPSs) such as Storm, Flink, and Spark Streaming are now routinely used to process continuous data streams in (near) real-time. However, achieving the low latency and high throughput demanded by today’s streaming applications can be a daunting task, especially since the performance of DSPSs depends heavily on a large number of system parameters that control load balancing, degree of parallelism, buffer sizes, and various other aspects of system execution. This talk offers a comprehensive review of the state-of-the-art automatic performance tuning approaches that have been proposed in recent years. The approaches are organized into five main categories based on their methodologies and features: cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. The categories will be analyzed in depth and compared with each other, exposing their respective strengths and weaknesses. Finally, we will identify several open research problems and challenges related to automatic performance tuning for DSPSs.

Bio:

Dr. Herodotos Herodotou is an Assistant Professor in the Department of Electrical Engineering and Computer Engineering and Informatics at the Cyprus University of Technology. He received his Ph.D. in Computer Science from Duke University in May 2012. His Ph.D. dissertation work received the ACM SIGMOD Jim Gray Doctoral Dissertation Award Honorable Mention as well as the Outstanding Ph.D. Dissertation Award in Computer Science at Duke. Before joining CUT, he held research positions at Microsoft Research, Yahoo! Labs, and Aster Data as well as software engineering positions at Microsoft and RWD Technologies. His research interests include large-scale Data Processing Systems and Database Systems. In particular, his work focuses on ease-of-use, manageability, and automated tuning of both centralized and distributed data-intensive computing systems. In addition, he is interested in applying database techniques in other areas like maritime informatics, scientific computing, smart tourism, and social computing. His research work to date has been published in several top scientific conferences and journals (e.g., PVLDB, SIGMOD, SoCC, CIDR), three books, and two book chapters.