Speeding Up DNA Analysis With String Algorithms

Addressing some of the biggest challenges in medical science relies on processing unimaginably huge amounts of data. Theoretical computer scientists like Hilde Verbeek, a second-year PhD student at Centrum Wiskunde & Informatica (CWI) in the Netherlands, are creating algorithms that can sift through this data more efficiently, meaning critical analysis can be performed faster than ever.

One of the most important applications for this theoretical research - which is carried out on paper rather than in a computer program - is in DNA analysis. “A single human genome consists of around 3 billion base pairs, and in practice, the amount of data that’s worked with is even larger than that,” says Hilde. “So we work on algorithms and data structures that allow this analysis to be done faster and that use less space.” This is pivotal in situations like the Covid pandemic, where rapidly tracking the spread of new variants around the world was key to tackling the virus.

Hilde’s work currently involves developing algorithms to identify what’s known as the shortest unique substring. “Given some sequence like DNA or text, we’re looking for a certain part of this sequence that occurs just once,” Hilde explains. In DNA, finding the shortest part of the sequence that meets this criterion enables certain genes to be more easily identified.
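To make the problem concrete, here is a minimal brute-force sketch in Python. It is an illustration of the definition only, not the method Hilde's group uses: it tries every length from shortest to longest and returns the first substring that occurs exactly once. The function name `shortest_unique_substring` is our own choice for this example.

```python
from collections import Counter
from typing import Optional


def shortest_unique_substring(text: str) -> Optional[str]:
    """Return the leftmost shortest substring that occurs exactly once.

    Brute force: for each length, count every window of that length
    (overlapping occurrences included), then look for a count of one.
    Far too slow for a genome, but it states the problem precisely.
    """
    n = len(text)
    for length in range(1, n + 1):
        windows = [text[i:i + length] for i in range(n - length + 1)]
        counts = Counter(windows)
        for window in windows:
            if counts[window] == 1:
                return window
    return None  # only reached for an empty input


print(shortest_unique_substring("ACGTACGA"))  # T
print(shortest_unique_substring("ACAC"))      # CA
```

In "ACAC" every single letter appears twice, so the answer has length two: "CA" occurs once, while "AC" occurs twice.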

“There is a very simple algorithm that can perform this task in time proportional to the length of the sequence and uses fundamental techniques commonly taught in universities. But we’ve found a way to do this faster than would be intuitively possible - which is very interesting because it's a lot more complex, and it uses a lot of different techniques. We do this by basically taking advantage of the fact that in many applications, such as DNA analysis, we are working with alphabets that are very small - DNA has just four different characters.” In practice, that means these abstract algorithms can shorten the time it takes to find genetic disorders and abnormalities. 
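One common way a small alphabet can be exploited, shown here purely as an assumption for illustration and not as a description of Hilde's algorithm, is to pack each DNA letter into two bits so that many characters fit in a single machine word and can be handled in one operation. The names `CODE`, `pack` and `kmer_codes` below are hypothetical helpers invented for this sketch.

```python
from typing import List

# Four letters need only two bits each (assumed encoding for this sketch).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(dna: str) -> int:
    """Pack a DNA string into a single integer, two bits per base."""
    word = 0
    for base in dna:
        word = (word << 2) | CODE[base]
    return word


def kmer_codes(dna: str, k: int) -> List[int]:
    """Return the packed code of every length-k window of dna.

    Each step shifts in one new base and masks off the oldest one,
    so a window is updated with a constant number of operations.
    """
    mask = (1 << (2 * k)) - 1  # keeps only the lowest 2*k bits
    codes = []
    word = 0
    for i, base in enumerate(dna):
        word = ((word << 2) | CODE[base]) & mask
        if i >= k - 1:
            codes.append(word)
    return codes


print(pack("ACGT"))           # 27
print(kmer_codes("ACGT", 2))  # [1, 6, 11]
```

With this encoding, 32 bases fit in one 64-bit word, which gives a rough sense of why a four-letter alphabet leaves room for speed-ups that a larger alphabet would not.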

Constance van Eeden Fellowship

CWI is a research institute for maths and science rather than a university, so there’s no teaching for PhD students. “For a PhD student, this means that a lot more focus can be given to research, and I think it's created a very nice atmosphere here,” says Hilde. She was the first recipient of CWI’s Constance van Eeden fellowship, which offers a female student a PhD position and is named after one of the first women to receive a PhD in statistics in the Netherlands. “It’s given me a lot of freedom to choose what I want to do within my PhD. I got to choose which research group I wanted to join and also choose my supervisor, which is how I ended up in this string algorithms group. But it also means I can work on projects that are outside of this research group, if I want to.”

Part of Hilde’s fellowship also includes mentorship from a CWI academic outside of her area of study. “She guides me through things that are not directly related to research, but are important to know when you're pursuing an academic career. I can go to her with any questions or problems I have,” she says. The Diversity, Equity and Inclusion team Hilde is a part of is pushing for this type of mentorship to be standard for all PhD students. “We are trying to guide the policy-making within the institute to create a more diverse, inclusive, and equitable academic world, and we believe this would really help students who are in some way disadvantaged.”

Hilde Verbeek in front of CWI office

Diversity at CWI

“Diversity has been a very big focus at CWI over the past few years, which is very good to see. I think it's important not only because it's fairer to people, but by allowing for different perspectives, it will also accelerate research.”

This kind of collaboration is found throughout the institute even outside the labs. “There’s a community atmosphere,” says Hilde. “There are a lot of organised group activities, which help us connect with each other. There's very much an effort made to allow people to let themselves be distracted, to socialise, to take the stress off, and to meet others. And I'm very happy that's done here.” 

Text by Academic Positions