Tamás is a staff engineer in the Data Engineering Team at Prezi. He leads a team of magicians who provide a platform to fulfill all of the analytical needs of Prezi at scale. Tamás is also a committer in the Apache Gobblin project.
All Sessions by Tamás Németh
NDR, July 28 2020
How We Redesigned Our Data Ingestion Pipeline at Prezi
09:45 (EEST) - 10:25
One of the most tedious, soul-destroying tasks in Data Science and Analytics is cleaning data.
It was no different for us at Prezi, but at some point we decided enough was enough and set out to minimize the time we spent on it.
We went through several iterations of getting the right data and learned that success ultimately depends on how you capture that data in the first place and how quickly you give feedback to teams. In this talk, I will cover how our need to better understand how users use our product led us to design an event-processing system that delivers those insights. Even though this does not sound hard, we got burnt more than once and redesigned our data ingestion pipeline several times before reaching its current state. We will start by covering how the pipeline evolved from semi-structured event data copied to S3 with a bash script to Avro events, registered in the Confluent Schema Registry, ingested from Apache Kafka to S3 with Apache Gobblin. Even though Apache Kafka helps a lot with scaling, Kafka alone is not a silver bullet: we had to introduce additional components, such as the Avro format and the schema registry, to fill in the missing pieces. We will also cover how we built around this ecosystem to make sure engineers can’t break our system, to make it less painful to instrument the right events, and how instrumentation works across the various platforms we support (Mac, Windows, iOS, Android, Web).
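The abstract mentions pairing Avro with the Confluent Schema Registry. The sketch below shows how a consumer, such as an ingestion job reading from Kafka, can tell which registered schema a message was written with. It assumes the standard Confluent wire format: a zero magic byte, then a 4-byte big-endian schema ID, then the Avro binary payload. The function names `frame_message` and `parse_message` are hypothetical illustrations. They are not Prezi's code and not part of the Gobblin or Confluent APIs.

```python
import struct
from typing import Tuple

# Assumed Confluent wire format: 1 magic byte (always 0) + 4-byte big-endian schema ID.
MAGIC_BYTE = 0
HEADER = struct.Struct(">bI")  # ">": big-endian, no padding; 'b' = magic, 'I' = schema ID


def frame_message(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro-encoded payload with the wire-format header (hypothetical helper)."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_payload


def parse_message(message: bytes) -> Tuple[int, bytes]:
    """Split a Kafka message value into (schema_id, avro_payload) (hypothetical helper)."""
    if len(message) < HEADER.size:
        raise ValueError("message too short for the wire-format header")
    magic, schema_id = HEADER.unpack(message[:HEADER.size])
    if magic != MAGIC_BYTE:
        # Reject events that were not produced through the schema registry serializer.
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER.size:]


if __name__ == "__main__":
    # b"\x04hi" is the Avro binary encoding of the string "hi" (zigzag length 2 -> 0x04).
    framed = frame_message(42, b"\x04hi")
    print(parse_message(framed))
```

Because only the schema ID travels with each event, the consumer can look up the full writer schema in the registry and decode the payload. It can also reject messages whose header is missing or malformed instead of writing unparseable data to S3.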