"Embedded analytics provide big savings because you don't have to drag around as much data and it's easy to build into a larger data pipeline," says Hannes Mühleisen, senior researcher at the Database Architectures group. In 2019 Mühleisen launched the open-source database system DuckDB together with his colleague Mark Raasveldt. DuckDB is small, agile and efficient. It requires ten to a hundred times less hardware capacity than its competitor Spark. Unlike Pandas, another popular data science tool, it can handle data that is larger than memory, and it can benefit from parallel processing on the multiple cores found in practically every modern computer. DuckDB rapidly became a huge success, with more than two million downloads per month at the beginning of 2023.
"The development of DuckDB was made possible by the great freedom I had at CWI to invent something myself," Mühleisen says. "I had the conviction that for most data problems you don’t need a scale-out of the data to multiple computers. I believed that you can do much more on one computer than most people thought. In the coming years I would like to expand that vision, on the one hand, to significantly reduce the carbon footprint of IT systems and, on the other hand, to give users more control over their own data, thus limiting the power of cloud companies."
Spin-offs
What the Database Architectures group does is very difficult to realize at a university, because the projects significantly exceed the scope of a PhD track, and equally difficult in companies, where the focus is on relatively short-term results. Boncz: "For a database system, you have to work on it with at least five people for at least ten years. You can't have fifty people do it in a year. It is CWI’s commitment to invest in long-term software development that led our group to produce MonetDB, VectorWise, and now DuckDB."
In 2021 Mühleisen and Raasveldt founded the spin-off company DuckDB Labs, which provides services and development for DuckDB. In the fall of 2022 DuckDB Labs helped to create the startup company MotherDuck, which connects DuckDB to the cloud. MotherDuck managed to raise 47.5 million dollars in funding.
Data systems ecosystem
Scientific breakthroughs that inspire new businesses fit into Boncz's long-term vision for the Netherlands: to create a data systems ecosystem of research, education and business. Gradually he is seeing the first results of that vision. For example, CWI has been instrumental in the establishment of the R&D center of the American company Databricks in Amsterdam, in which Databricks has invested a hundred million euros over the past four years. "You could say that a hundred million euros has been pumped into the Dutch economy thanks to our work," says Boncz.
Boncz and Mühleisen are proud that CWI's long-term software development, which is part of its mission, is having such an impact on database applications used worldwide. Boncz: "If you look at the evolutionary lineage of all database systems, you can say that of the analytical systems 85% have a strong CWI signature." Other such systems prominently include Snowflake, which in 2020 had the largest stock market debut ever for a software company and which was co-founded by Marcin Zukowski, a former PhD student from CWI’s Database Architectures group. Zukowski had previously created the VectorWise system.