DevConf.US 2021 is the 5th annual, free, Red Hat-sponsored technology conference for community project and professional contributors to Free and Open Source technologies, coming to Boston!
Relational databases are wonderful tools, and they are more than capable of handling many workloads. But one dark day the data stopped flowing. As our customer base grew, so did the data volume, and we couldn’t keep up. As we watched our queue grow, data processing put an undue burden on our database, preventing the API from serving requests in a timely manner. Enter Trino to save the day. This talk will discuss how and why our database became a bottleneck and how we used object storage (AWS S3), Hive, and Trino to get our data processing pipeline back on track and make it scalable for future growth. We’ll look at the properties of our data and how its particular characteristics strained our system. Next, we’ll see how object storage and the Parquet data format enable low-cost, long-term data storage and efficient data access for analytics workloads. Finally, we’ll explore how Trino with Hive enables fast, scalable data analysis using the SQL you already know. And rest assured, there’s still plenty of work left for a relational database. Leave with an overview of a big data toolchain and how one team is making use of it for big and not-so-big datasets, and learn when it may be time to switch data processing architectures to prevent show-stopping bottlenecks.