CWI & Databricks: Big Data in Amsterdam

Amsterdam wants to play a leading international role in the development of data science research. In Big Data Amsterdam, Het Financieele Dagblad journalist Job Woudt interviews Amsterdam Data Science researchers about how the ecosystem functions in which companies and knowledge institutions in Amsterdam collaborate in the area of Big Data.

Publication date
17 Mar 2017

Peter Boncz, Senior Researcher in Database Architectures at Centrum Wiskunde & Informatica (CWI) and Professor in Large-Scale Analytical Data Management at VU University, is working with the US company Databricks, which has a new location in Amsterdam. Databricks develops Spark, open-source software that enables companies and organizations to analyze large amounts of data. Its clients include big names like Cisco, Samsung, Viacom and NBC Universal.

Reynold Xin is co-founder and Chief Architect for Spark at Databricks (set up by staff from the University of California, Berkeley). Xin heads the new Amsterdam branch of Databricks, where some former students of Boncz now also work.

The interview in brief

Xin: “There is a lot of expertise and talent here in the field of high-performance databases. By having a physical presence in Amsterdam, we hope to make our analyses faster …”

Boncz: “I was already working on the important techniques … Databricks wanted to add to Spark.”

Xin: “… We started talking about collaboration in October, and by January we had seven engineers working in Amsterdam.”

Boncz: “This is nice for Amsterdam Data Science. Spark thus plays a role in our ecosystem. Data science is increasingly important for companies. Google was first with MapReduce, which made it easy to program for large data clusters. That was ten years ago. Then came Yahoo with Hadoop … Spark goes beyond this. You will see that companies will use Spark.”

Xin: “It is a database service that we offer. Companies can subscribe to it, and we host it in the cloud.”

Boncz: “Everyone has their own interpretation of Big Data. First there is volume (large amounts), then variety (multiple forms): not just tables, it can also be text, sound, or image data. The analysis of that data is becoming more sophisticated, so we are better able to make decisions. Finally, the data is not static, but a data stream …”

Xin: “We offer a platform. Customers can then build their solutions.”

Boncz: “Google has a very closed platform … not open source like Spark …”

The collaboration with CWI

Boncz: “Databricks finances research staff and we help the team with the architecture of the data solutions … leading to a faster processing system … It is very interesting for us … It gives us insight into customer questions, and that leads to research …”

Xin: “It is a common model in the US to work with academia.”

Boncz: “Of the six founders, two are professors, there is a strong academic mindset… Our collaboration with Spark offers a broader approach and perspective.”

Data Science is incorporated everywhere

Boncz: “You see that all the other sciences need data science … It is part of every profession.”

Xin: “At Databricks the motto is: let the data decide.”

Boncz: “The human challenge is the greatest. There are insufficient data scientists…”

Xin: “Our work is fully integrated with the activities in San Francisco … Amsterdam is also a good business location. It is an international city … It’s our first expansion …”

Read more in Het Financieele Dagblad (behind paywall)

Homepage of Peter Boncz
