CWI researchers Hannes Mühleisen and Mark Raasveldt have founded the spin-off company DuckDB Labs, which provides services and development for DuckDB. DuckDB is an open-source database management system aimed at efficient data analytics. It is easy to install, very fast, and runs inside other running processes. DuckDB is currently downloaded about 100,000 times per week.
Companies, governments and academic research groups collect ever more data. These data are stored in large databases and one of the main challenges is to extract new insights from the data as fast as possible. That’s one of the jobs of a database management system.
In 2019 Hannes Mühleisen and Mark Raasveldt, researchers in the Database Architectures research group at Centrum Wiskunde & Informatica (CWI), released the first open-source version of their database management system DuckDB. DuckDB is the first purpose-built in-process Online Analytical Processing (OLAP) database management system.
“DuckDB got its name because I used to have a pet duck”, Mühleisen laughs. “Ducks are amazing animals. They can fly, walk and swim, and they are quite resilient to environmental challenges. So, they are the perfect mascot for a versatile and resilient data management system.”
Now, two years later, DuckDB has become a huge success: it is downloaded about 100,000 times per week, mainly by data scientists and corporate users. “In a world where most successful software has been developed in the corporate sector in the USA, it is remarkable that software coming out of the publicly funded research institute CWI is gaining such traction”, says Mühleisen.
With the aim of creating an even better system, Mühleisen and Raasveldt have just founded DuckDB Labs B.V. as a spin-off company of CWI. The company will be the home for innovative projects around DuckDB, driving further development of the system and offering a platform for support services. Mühleisen emphasizes that DuckDB will remain an open-source project under its current permissive MIT licence.
What distinguishes DuckDB from existing database management systems?
Raasveldt: “First of all, DuckDB aims at analytical use cases where large amounts of data have to be examined at once. Think of cases where millions of rows have to be aggregated, or where giant tables need to be combined. There are many such use cases in business reporting and statistical analysis.”
Second, DuckDB runs inside other processes that are already running on the computer. Raasveldt: “If you do data analysis in Python, DuckDB will run inside Python. The advantage is that data transfer is very fast. In fact, DuckDB is the first in-process OLAP database system that manages large amounts of data. We call ourselves the ‘SQLite for analytics’. SQLite is the world’s most popular database management system, but it doesn’t do analytics.”
From a practical point of view, DuckDB is ‘lean and mean’. It is a small software package that anyone can easily install, and it needs no separate server. Finally, DuckDB is fast, because it builds on state-of-the-art database research originating from the CWI Database Architectures group. For example, it uses a query processing technique called vectorized execution, which was developed at CWI in 2005.
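To give a rough idea of what vectorized execution means, the sketch below contrasts it with classic tuple-at-a-time processing for a query like "sum the price of all rows whose quantity exceeds a threshold". In the vectorized version, operators pass batches (vectors) of column values to each other instead of single rows, so per-call overhead is paid once per batch rather than once per row. This is a simplified illustration of the general idea only, not DuckDB's implementation: the function names and the batch size `VECTOR_SIZE` are hypothetical, and in pure Python no speedup should be expected. The benefit in a real engine comes from tight loops over arrays that compilers and CPUs can optimize.

```python
from typing import Iterator, List, Sequence

VECTOR_SIZE = 1024  # hypothetical batch size, chosen only for illustration


def tuple_at_a_time_sum(prices: Sequence[int], quantities: Sequence[int], min_qty: int) -> int:
    # Classic row-by-row processing: every row passes through filter and aggregate individually.
    total = 0
    for price, qty in zip(prices, quantities):
        if qty > min_qty:
            total += price
    return total


def scan(column: Sequence[int], size: int = VECTOR_SIZE) -> Iterator[Sequence[int]]:
    # Scan operator: yields consecutive slices (vectors) of a column instead of single values.
    for start in range(0, len(column), size):
        yield column[start:start + size]


def filter_vector(qty_vec: Sequence[int], min_qty: int) -> List[int]:
    # Filter operator: returns a "selection vector" of positions within the batch that qualify.
    return [i for i, q in enumerate(qty_vec) if q > min_qty]


def sum_selected(price_vec: Sequence[int], selection: List[int]) -> int:
    # Aggregate operator: consumes a whole batch in one loop, using the selection vector.
    return sum(price_vec[i] for i in selection)


def vectorized_sum(prices: Sequence[int], quantities: Sequence[int], min_qty: int) -> int:
    # Operators exchange whole vectors, so per-call overhead is amortized over each batch.
    total = 0
    for price_vec, qty_vec in zip(scan(prices), scan(quantities)):
        selection = filter_vector(qty_vec, min_qty)
        total += sum_selected(price_vec, selection)
    return total
```

Both functions compute the same answer; only the way data flows between the steps differs. For example, `vectorized_sum([5, 7, 9], [1, 20, 30], 10)` and `tuple_at_a_time_sum([5, 7, 9], [1, 20, 30], 10)` both sum the prices of the last two rows.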
DuckDB Labs is the newest addition to the list of 28 start-up companies originating at Centrum Wiskunde & Informatica in Amsterdam, a research institute for mathematics and computer science. This tradition of research-based spin-offs is in line with the CWI agenda of turning fundamental research into projects that benefit society.