Data-warehouse company Snowflake made headlines in 2020 when it reached an extraordinary market value of $70.4 billion after issuing its first stock to the public on the New York Stock Exchange. Snowflake offers the first cloud-based data warehousing service that is truly designed for the cloud. Its notable features include ‘elastically’ growing and shrinking a system based on how busy it is, decoupling computation from storage, and automating many administration and configuration tasks.
Less well known is that Snowflake’s data storage and query engine rely on two technologies: vectorized query execution and lightweight compression methods in its columnar data storage. Both techniques were pioneered in CWI’s Database Architectures group and are now popular in analytical database systems. One of the group’s PhD graduates who developed those technologies was Marcin Żukowski, who went on to co-found Snowflake in 2012.
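For readers unfamiliar with the two techniques, the toy sketch below illustrates the general ideas. It is not code from Snowflake, X100 or MonetDB: the function names, the tiny vector size and the sample column are all hypothetical, and run-length encoding is used here only as one well-known example of a lightweight compression scheme. Real engines apply the same ideas with much larger vectors and tightly optimized loops.

```python
from itertools import groupby

VECTOR_SIZE = 4  # hypothetical batch size, chosen small for illustration


def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


def vectors(column, size=VECTOR_SIZE):
    """Yield the column in fixed-size slices (the last one may be shorter)."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def sum_where_greater(prices, threshold):
    """Filter and sum a column one vector at a time, not one row at a time."""
    total = 0
    for vector in vectors(prices):
        # Each step processes a whole vector of values per call.
        selected = [p for p in vector if p > threshold]
        total += sum(selected)
    return total


# A single column of values, stored contiguously as in a column store.
prices = [5, 5, 5, 12, 12, 7, 30, 30, 30, 30]
runs = rle_encode(prices)
```

Here `rle_encode(prices)` stores the ten values as four (value, run length) pairs, and `sum_where_greater(rle_decode(runs), 10)` evaluates the filter and the sum over batches of values rather than individual rows.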
How do you look back on your time at CWI?
“For decades now, CWI’s data research group has been considered one of the most interesting groups doing research in the database space, in particular around the performance aspects. As such, being there was great fun, as there were always many interesting, often controversial, and sometimes a bit crazy ideas flying around. With MonetDB, the group also had a history of building sizable software projects. This environment helped us build our new system, called X100. I would probably never have come up with the idea that became the basis for vectorized execution if I hadn’t worked on MonetDB.”
When you were at CWI, did you ever realize that the technology you were working on would grow to this level?
“It was clear to me - and everyone else in this field at some point - that column storage would become the storage format for analytical databases. MonetDB for sure was one of the first ‘real’ systems doing it. Column storage has also evolved a lot and takes different forms, so it’s hard to clearly define what a ‘column store’ exactly is.”
“As for vectorized execution, I knew it had amazing potential. So, it was not surprising when we started hearing that multiple systems from various major corporations had started incorporating these ideas into their products. It’s amazing to see that even now, after 15 years, new systems are being built citing X100 as their main inspiration. The recent Databricks announcement on their Delta Engine is an example of this.”
“So yes, for sure, column stores and vectorized execution are two of the fundamental techniques behind Snowflake's success. The main engine is superfast in large part thanks to these two techniques.”
What made you decide to do your PhD research at CWI?
“First of all, there was so much luck involved here. I decided to come to Amsterdam for the final year of my studies. I wanted to do an MSc thesis in databases, but more on the theory side. My professors at the Vrije Universiteit (prof. Henri Bal in particular) suggested I should talk to CWI’s Database Architectures group, as they are the go-to database folks in Amsterdam. They quickly convinced me that more practical research was a better idea. All these little things pointed me to getting into this field.”
Did your PhD research at CWI meet your expectations?
“CWI’s Database Architectures group has a strong focus on performance, and they also had MonetDB, a great system for testing new ideas. That let me learn a tremendous amount during my MSc stay, and some of the very core ideas for what later became X100/Vectorwise were formed then.”
“During my PhD, it was great to be surrounded by some students with a similar focus, but also some working in adjacent fields. For example, my best friend from my PhD, Roberto Cornacchia, worked on a system built on top of MonetDB or X100, and he challenged us, as a user, to make our system more functional and better.”
“The one thing that made this all come together was my relationship with Peter Boncz, first as a teacher and mentor, then as a lifelong friend. We ‘clicked’, and I think together we were able to tackle really hard problems quite well. I was so, so lucky to find him.”
Do you have any advice for other researchers, or students who want to follow in your footsteps?
“My best advice is to go on internships. Too few students in Europe do that. Another piece of advice would be to work with others. It’s very rare that a single idea is enough to make a big change. You need many ideas to build something meaningful, so you need many people. Finally, find real applications for your technology, and search for problems real users have that researchers rarely tackle.”