Date: 14 December 2021
Speaker: Haodi Zhong (King's College London)
Title: Clustering demographics and sequences of diagnosis codes
Abstract: A Relational-Sequential dataset (or RS-dataset for short) contains records comprised of a patients values in demographic attributes and their sequence of diagnosis codes. The task of clustering an RS-dataset is helpful for analyses ranging from pattern mining to classification. However, existing methods are not appropriate to perform this task. Thus, we initiate a study of how an RS-dataset can be clustered effectively and efficiently. We formalize the task of clustering an RS-dataset as an optimization problem. At the heart of the problem is a distance measure we design to quantify the pairwise similarity between records of an RS-dataset. Our measure uses a tree structure that encodes hierarchical relationships between records, based on their demographics, as well as an edit-distance-like measure that captures both the sequentiality and the semantic similarity of diagnosis codes. We also develop an algorithm which first identifies k representative records (centers), for a given k, and then constructs clusters, each containing one center and the records that are closer to the center compared to other centers. Experiments using two Electronic Health Record datasets demonstrate that our algorithm constructs compact and well-separated clusters, which preserve meaningful relationships between demographics and sequences of diagnosis codes, while being efficient and scalable.
This is a joint work with Grigorios Loukides and Solon P. Pissis; the paper appeared in IEEE Journal of Biomedical and Health Informatics [https://doi.org/10.1109/JBHI.2021.3129461].
Date: 30 November 2021
Arthur Guijt
Abstract: Linkage Learning is at the core of most of GOMEA's scalability improvements, and is a core component of the GOMEA family of approaches. Yet if Linkage Learning fails to work, performance is reduced drastically. In this presentation I'll briefly discuss the current approach, the implicit assumptions that are made, and the failure modes that may occur in practice. Furthermore, I will talk about using neighborhoods - more specifically Local Neighborhoods, how learning linkage locally resolves some of these common failure modes, and how (much) the definition of locality matters.
Anton Bouter
Title: Optimal Mixing Evolutionary Algorithms for Large-Scale Real-Valued Optimization
Abstract: It is known that to achieve efficient scalability of an Evolutionary Algorithm (EA), dependencies (also known as linkage) must be properly taken into account during variation. In a Gray-Box Optimization (GBO) setting, exploiting prior knowledge regarding these dependencies can greatly benefit optimization. We specifically consider the setting where partial evaluations are possible, meaning that the partial modification of a solution can be efficiently evaluated. Such problems are potentially very difficult, e.g., non-separable, multi-modal, and multi-objective. The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) can effectively exploit partial evaluations, leading to a substantial improvement in performance and scalability. Such partial evaluations have been applied to various real-world problems, including the optimization of treatment plans for prostate cancer, and deformable image registration.
Date: 16 November 2021
Joe Harrison
Title: How GOMEA can be applied to Cartesian Genetic Programming
Abstract: Genetic Programming (GP) has been a popular technique for symbolic regression. In GP expression trees are evolved by means of subtree crossover and point mutation to fit a function. In Cartesian Genetic Programming (CGP) a feed-forward acyclic graph is used rather than a tree and typically the only variation operators are point mutations. An advantage of using CGP is that nodes in the graph can be reused and more complex configurations such as graphs with skip-layers or multiple outputs are possible. The Gene-Optimal Mixing Evolutionary Algorithm (GOMEA) aims to find linkage amongst components (e.g. nodes in a graph). GOMEA was successfully applied to GP. It groups tree nodes with linkage into subsets of nodes and exchanges these subsets as means of variation. In this presentation I will discuss how GOMEA can be applied to Cartesian Genetic Programming, the pros and cons of this application and how this affects node reuse.
Speaker: Leen Stougie
Date: 2 November 2021
Aleksandr Chebykin
Title: Efficient Neural Architecture Search for ensemble learning
Abstract: Neural Architecture Search is the problem of learning a task-specific neural network structure from data. During the search, many candidate architectures need to be considered, but it is computationally infeasible to independently train them all. Weight sharing makes the search more efficient but also reduces the space of possible architectures. The tradeoffs of differently restricted search spaces are not obvious. Our idea is to traverse several spaces simultaneously in order to efficiently find an ensemble that is both small and well-performing.
Solon Pissis
Title: Bidirectional String Anchors: A New String Sampling Mechanism
Abstract: In a previous LSH seminar, we presented bidirectional string anchors, a new mechanism for sampling strings, which has small density and is computable in linear time. Furthermore, we showed that by using this sampling, we can efficiently construct a small sketch that answers pattern matching queries in near-optimal time. In this seminar, we will present an experimental evaluation, which highlights the applicability of our sketch.
Date: 19 October 2021
Riccardo Guidotti
Title: Explaining Explanation Methods
Abstract: The most effective Artificial Intelligence (AI) systems exploit complex machine learning models to fulfil their tasks due to their high performance. Unfortunately, the most effective machine learning models use for their decision processes a logic not understandable from humans that makes them real black-box models. The lack of transparency on how AI systems make decisions is a clear limitation in their adoption in safety-critical and socially sensitive contexts. Consequently, since the applications in which AI are employed are various, research in eXplainable AI (XAI) has recently caught much attention, with specific distinct requirements for different types of explanations for different users. In this webinar, we briefly present the existing explanation problems, the main strategies adopted to solve them, and the most common types of explanations are illustrated with references to state-of-the-art explanation methods able to retrieve them.
Short bio: Riccardo Guidotti is Assistant Professor at University of Pisa. Riccardo Guidotti was born in 1988 in Pitigliano (GR) Italy. In 2013 and 2010 he graduated cum laude in Computer Science (MS and BS) at University of Pisa. He received the PhD in Computer Science with a thesis on Personal Data Analytics in the same institution. He is currently an Assistant Professor at the Department of Computer Science University of Pisa, Italy and a member of the Knowledge Discovery and Data Mining Laboratory (KDDLab), a joint research group with the Information Science and Technology Institute of the National Research Council in Pisa. He won the IBM fellowship program and has been an intern in IBM Research Dublin, Ireland in 2015. He also won the DSAA New Generation Data Scientist Award 2018. His research interests are in explainable artificial intelligence, interpretable machine learning, quantum computing, fairness and bias detection, data generation and causal models, personal data mining, clustering, analysis of transactional data.
Date: 5 October 2021
Sanne Abeln
Title: Bioinformatics: from amyloid formation to oncology
Abstract: In our research group at the Computer Science Department of the VU we look at topics ranging from molecular dynamics simulations to large scale omics analysis with machine learning, most of it applied to the health domains of neurodegenerative disease and oncology. In this talk I will give an overview of current results on multi-task learning for protein structural features, simulations that show the role of hydrophobicity in amyloid formation, a machine learning based breakpoint analysis for colorectal cancer and a simple model for prediction of proteins in extracellular vesicles.
Leah Dickhoff
Title: Adaptive optimization for bi-objective treatment planning in cervical cancer brachytherapy
Abstract: The previously developed bi-objective optimization method using the Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (RV-GOMEA) for prostate high-dose-rate brachytherapy (BT) has been extended to cervical cancer BT. This model directly optimizes on dose volume indices (DVIs), which describe specific maximum or minimum dose to subvolume relationships for each of the targets and organs at risk. Discussions with medical specialists have revealed that optimizing solely on the DVIs from the ESTRO-recommended protocol EMBRACE II does not suffice to obtain clinically acceptable treatment plans. Therefore, additional (potentially hospital-specific) DVIs need to be added to the objective functions. Because of interpatient variations in organ, target and applicator geometry, patient-specific aspiration values are required for these added DVIs. This entails the need for an adaptive optimization method, possibly incorporating different priorities in the objectives.
Date: 21 September 2021
Giulia Bernardini
Title: Incomplete Directed Perfect Phylogeny in Linear Time
Abstract: Reconstructing the evolutionary history of a set of species is a central task in computational biology. In real data, it is often the case that some information is missing: the Incomplete Directed Perfect Phylogeny problem asks, given a collection of species described by a set of binary characters with some unknown states, to complete the missing states in such a way that the result can be explained with a perfect directed phylogeny. Pe'er et al. proposed a solution that takes quasilinear time. Their algorithm relies on pre-existing dynamic connectivity data structures: a computational study recently conducted by Fernàndez-Baca and Liu showed that, in this context, complex data structures perform worse than simpler ones with worse asymptotic bounds. This gives us the motivation to look into the particular properties of the dynamic connectivity problem in this setting, so as to avoid the use of sophisticated data structures as a blackbox. Not only are we successful in doing so, and give a much simpler quasilinear-time algorithm for the Incomplete Directed Perfect Phylogeny problem; our insights into the specific structure of the problem lead to an asymptotically faster algorithm, that runs in optimal linear time.
Cedric Rodriguez
Title: Bio-mechanical modeling of large deformations of the uterus and bladder between External Beam Radiotherapy and Brachytherapy for cervical cancer
Abstract: The validation of Deformable Image Registration (DIR) algorithms within the radiotherapy field is challenging due to the missing ground truth in medical imaging. Therefore, we are trying to create a virtual phantom of the abdominal region to perform controlled bio-mechanical deformations. Our goal is to simulate the expected tissue deformation of the uterus, cervix, vagina and bladder due to the applicator insertion between External Beam Radiotherapy (EBRT) and Brachytherapy (BT) for cervical cancer. Our approach consists of the generation of volumetric models of the organ at risk (OARs) using the initial diagnostic MR scans. Thereafter, a Finite Element Method will simulate the interaction between the applicator and the OARs to predict the final organ state before BT. This approach will result in ground truth deformation vector fields that could be used to create augmented medical imaging for the validation of state-of-the-art DIR algorithms for radiotherapy.
Date: 7 September 2021
Prof. Marcel van Herk, Chair in Radiotherapy Physics, The University of Manchester
Title: On software development for medical applications
Date: 27 July 2021
Arthur Guijt
Title: Improving GOMEA for Neural Architecture Search by ‘Kernelization’
Abstract: Currently GOMEA for discrete spaces performs its search in a mostly global fashion, learning a single linkage model per population, and visiting each solution exactly once a generation to perform GOM. Within the context of Neural Architecture Search these traits have some notable downsides. The aim is to do get rid of this generational lockstep by ‘kernelization’, letting each solution be its own ‘individual’ that uses similarity & locality to improve itself, for example by learning its own linkage model. I display some initial results obtained by running GOMEA in this fashion, showing that a performance increase can be obtained in certain settings.
Date: 13 July 2021
Damy Ha
Title: Hybridizing UHV-GOMEA with UHV-ADAM for real-valued multi objective optimization
Abstract: In real-valued single objective optimization, algorithms that make use of gradient information have shown to very efficient in finding the optimal solution if the problem has few local optima. Despite evolutionary algorithms being known for their ability to avoid converging to local optima, problems that have many local optima are generally still solved faster by so called hybrid evolutionary algorithms (which employ gradient information) compared to pure evolutionary algorithms. This faster convergence of hybrid evolutionary algorithms has not been shown for multi objective problems yet. Recently a new technique has been created that redefines multi objective problems to single objective problems. This brings back the question if using this new technique allows a hybrid evolutionary algorithm to converge faster to the optimal solution than a pure evolutionary algorithm. This work hybridizes the evolutionary algorithm UHV-GOMEA with the gradient technique UHV-ADAM and investigates its performance.
Date: 29 June 2021
Martijn Bosma
Title: The Effect of Performance Estimation on Evolutionary Neural Architecture Search for Medical Image Segmentation
Abstract: In order to automate MRI/CT scan interpretation, accurate organ segmentation ML is needed. As organs have different characteristics (volume, density, clear boundaries), using Neural Architecture Search to create neural networks tailored to a specific task, can help in obtaining accurate segmentations. In this research, a novel search space is designed and evaluated. The performance of the obtained networks is compared to other SOTA neural networks. Next to these results, an analysis is done on the effect of performance estimation on evolutionary search algorithms in terms of accuracy and speed-up. This analysis demonstrates that performance estimation at different Spearman correlation levels can deteriorate the accuracy of a NAS algorithm, for similar run times.
Date: 15 June 2021
Speaker: Joost Commandeur
Title: Improving the homogeneity (reducing hot spots) of treatment plans for high dose rate brachytherapy as generated by BRIGHT (MO-RV-GOMEA)
Abstract: BRIGHT (an implementation of Multi Objective Real Valued Gene-Pool Optimal Mixing) has been in use to create brachytherapy treatment plans for patients with prostate cancer since the spring of 2020. A treatment plan consist of so-called 'dwell times', which is the duration that a radioactive source stays in a single 'dwell position' within a catheter (hollow needle) that has been placed inside the body. The goal of such a treatment plan is to deliver the prescribed radioactive dose to the target organs (i.a. prostate), whilst sparing surrounding organs and tissue (i.a. bladder). This multi-objective goal is formulated in a clinical protocol which states the prescribed upper and lower bounds for the different organs involved. In practice it has been noticed that even though the plans as generated by BRIGHT adhere to the clinical protocol, the clinical experts still adjust the resulting plans. In our research we try to tackle one of the aspects that requires adjustment, which is the homogeneity of the delivered radioactive dose by adjusting BRIGHT.
Date: 1 June 2021
Evi Sijben
Title: Diverse high quality solutions using multi-tree multi-objective gene pool optimal mixing evolutionary algorithms
Abstract: Being able to find diverse high quality solutions in machine learning can have many purposes. One such example is to compute uncertainty estimates. Here we assume that the more aligned the different models are the more certain we are about the validity of the output. We search for multiple models, using a multi-tree representation. These multi-tree models are evaluated on multiple objectives, namely diversity between trees and quality of all trees. By doing so, we integrate into the search process a desirable characteristic of human explanatory cognition; exploring alternative hypotheses to explain observations.
Arkadiy Dushatskiy
Title: A Novel Surrogate-assisted Evolutionary Algorithm Applied to Partition-based Ensemble Learning
Abstract: We propose a novel surrogate-assisted Evolutionary Algorithm for solving expensive combinatorial optimization problems. We integrate a surrogate model, which is used for fitness value estimation, into a state-of-the-art P3-like variant of the Gene-Pool Optimal Mixing Evolutionary Algorithm (GOMEA) and adapt the resulting algorithm for solving non-binary combinatorial problems. We test the proposed algorithm on an ensemble learning problem. In our experiments we use five classification datasets from the OpenML-CC18 benchmark and Support-vector Machines as learners in an ensemble. The proposed algorithm demonstrates better performance than alternative approaches, including Bayesian optimization algorithms. It manages to find better solutions using just several thousand fitness function evaluations for an ensemble learning problem with up to 500 variables.
Date: 18 May 2021
Leen Stougie
Title: An Approximation Algorithm for a Phylogenetic Network Problem
Abstract: Often different parts of the genome give rise to different phylogenetic relationships in the form of trees. This is usually not a matter of inexact data, but a true representation of the phenomenon that there may have occurred hybridization between subspecies. This should rather be displayed by a network than by a tree. Therefore rather recently various models for displaying such hyberdization events have been proposed. One of the oldest ones is the Maximum Agreement Forest problem. This is an NP-hard problem and over the past two decades ever improving approximation algorithms have been presented. In this lecture I will explain you the problem and give its raison d'être. Then I will present a high brow view of a 2-approximate algorithm for the problem, which is currently the best in terms of worst-case performance analysis. This is joint work with Neil Olver (London School of Econ.), Frans Schalekamp (Cornell), Anke van Zuylen (Cornell)
Georgios Andreadis
Title: Multi-Objective Deformable Image Registration for 3D Images Using the Gene-pool Optimal Mixing Evolutionary Algorithm
Abstract: Finding the most likely deformation of one image to another image is a problem with important applications in medical imaging, where large deformations and structural changes are common. Approaching this problem of Deformable Image Registration (DIR) from a multi-objective perspective has proven successful at producing a set of realistic registrations for 2D images for the user to choose from. This multi-objective optimization process is driven by the Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (RV-GOMEA). Recent parallelization efforts with this algorithm on the Graphics Processing Unit (GPU) have delivered large speed-ups for 2D images by exploiting the conditional independence of local regions. This work builds on these advances and adapts the approach to support 3D images, introducing a new spatial model and a refined deformation energy model while still supporting annotated guidance information and multi-resolution schemes. We find that our proof-of-concept prototype can successfully tackle synthetic registration problems and are currently conducting experiments on patient scans to validate our approach on real-world problems.
Date: 4 May 2021
Alexandr Chebykin
Title: Improving Neural Architecture Search by encouraging diversity
Abstract: Neural Architecture Search (NAS) is concerned with learning structure of a neural network from data. During the search, many candidate architectures need to be considered, but computational constraints make it impossible to independently train all candidates until convergence (or even evaluate them all). Various techniques make the search more efficient, while at the same time reducing the space of architectures that can be found. In this talk, I will describe our approach to improving upon a SOTA algorithm by optimizing an additional objective function that measures novelty of newly discovered solutions. Limitations of the approach, as well as of the current NAS paradigm in general, will also be discussed.
Joe Harrison
Title: Differentiable Cartesian genetic programming for small interpretable symbolic expressions
Abstract: Interpretability is a desirable quality when it comes to machine learning. In genetic programming typically (binary) trees are used to evolve symbolic expressions that are human-readable. The leaf nodes of the tree can either be input data or ephemeral random constants (ERC) and the other nodes are operators (x, +, -, %) that take the output of its child nodes as input. In contrast, Cartesian genetic programming (CGP) uses a directed acyclic graph as a genotype. This allows for skip connections and reuse of salient patterns within the graph and possibly leads to more expressive formulas compared to a tree-based GP algorithm. Another issue with standard GP is that the ERC leaf nodes are instantiated once and do not change over time. By making the expression differentiable the 'constant' leaf nodes can be updated using backpropagation. This possibly takes away part of the burden of the GP algorithm having to evolve specific values using operator nodes. GP trees often suffer from bloat, a significant increase in solution size with insignificant improvement in terms of performance, which makes resulting expressions long and unreadable. CGP naturally uses a constrained template which prevents bloat. This template makes CGP an ideal candidate for Gene Optimal Mixing (GOM), a method where subsets of nodes are exchanged among solutions. GOM ensures improvement and offers the possibility of exploiting linkage information. With these adjustments, more expressive and shorter formulas are possible.
Date: 20 April 2021
Leah Dickhoff
Title: Cervical cancer case modelling for bi-objective treatment planning in brachytherapy
Abstract: The previously developed bi-objective optimization method using the Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (RV-GOMEA) for prostate high-dose-rate brachytherapy (BT) is extended to cervical cancer patients. The clinical objectives in terms of dose and dose-point indices, describing minimum doses to targets and maximum doses to organs at risk, are calculated directly from the EMBRACE-II protocol. They are then converted into a bi-objective formulation in which the trade-off is characterized by the Least Coverage Index versus Least Sparing Index. In order to generate clinically acceptable treatment plans, additional constraints on dwell time modulation and needle dose contribution are deemed necessary.
Timo Deist
Title: Multi-objective learning for asymmetric Pareto fronts
Abstract: We will revisit the multi-objective (MO) learning problem to predict Pareto fronts presented in December. This time, I will highlight the issue of asymmetric Pareto fronts (asymmetry along the line L_1(x)=L_2(x)) that can occur when the competing loss functions are different in scale and curvature. Our proposed MO approach uses HV maximization and does not require specifying trade-offs before training. It is expected to outperform existing Multi-Task Learning (MTL) methods that require trade-offs to be known a priori which is a difficult requirement when Pareto front characteristics are unknown. Results from three experiments will be used to illustrate differences between our HV maximization-based approach and existing MTL algorithms.
Date: 6 April 2021
Solon Pissis
Title: Bidirectional string anchors: A new mechanism for string sampling
Abstract: We will present a new mechanism for sampling strings, which has small density and is computable in linear time. Furthermore, we will show that by using this sampling, we can efficiently construct a small sketch that answers pattern matching queries in near-optimal time. This is a joint work (in progress) with Grigorios Loukides from King's College London.
Thomas Uriot
Title: Interpretable dimensionality reduction using multi-objective Genetic Programming
Abstract: In this talk, we will discuss some of the ways to perform interpretable dimensionality reduction using Genetic Programming. These include using a teacher model, such as a neural-based autoencoder, a fully GP-based autoencoder and a manifold learning approach using GP as the function mapping from high to low dimensions. We also introduce variants of GP representation (vanilla, multi-tree, shared multi-tree). We compare all these approaches on several datasets by looking at their predictive power for downstream classification tasks.
Date: 23 March 2021
Speaker: Giulia Bernardini
Title: Constructing Strings Avoiding Forbidden Substrings
Abstract: We consider the problem of constructing strings over an alphabet Σ that start with a given prefix u, end with a given suffix v, and avoid occurrences of a given set of forbidden substrings. In the decision version of the problem, given a set S of forbidden substrings, each of length k, over Σ, we are asked to decide whether there exists a string x over Σ such that u is a prefix of x, v is a suffix of x, and no element of S occurs in x. Our first result is an O(|u|+|v|+k|S|)-time algorithm to decide this problem. In the more general optimization version of the problem, given a set S of forbidden arbitrary-length substrings over Σ, we are asked to construct a shortest string x over Σ such that u is a prefix of x, v is a suffix of x, and no element of S occurs in x. Our second result is an O(|u|+|v|+||S||*|Σ|)-time algorithm to solve this problem, where ||S|| denotes the total length of the elements of S.
Interestingly, our results can be directly applied to solve the reachability and shortest path problems in complete de Bruijn graphs in the presence of forbidden edges or of forbidden paths. Our algorithms are motivated by data privacy, and in particular, by the data sanitization process. In the context of strings, sanitization consists in hiding forbidden substrings from a given string by introducing the least amount of spurious information. We consider the following problem. Given a string w of length n over Σ, an integer k, and a set S of forbidden substrings, each of length k, over Σ, construct a shortest string y over Σ such that no element of S occurs in y and the sequence of all other length-k fragments occurring in w is a subsequence of the sequence of the length-k fragments occurring in y. Our third result is an O(nk|S|*|Σ|)-time algorithm to solve this problem.
Monika Grewal
Title: Multi-Objective learning for Deformable Image Registration
Abstract: Deformable Image Registration (DIR) is an optimization task, wherein a Deformation Vector Field (DVF) is optimized to deform the source image such that it aligns with the target image. DIR is inherently multi-objective, requiring maximization or minimization of multiple conflicting objectives e.g., maximize image similarity and minimize deformation magnitude. Further, the computationally expensive nature of the DVF optimization gives a strong motivation to develop learning-based algorithms. In this talk, I will discuss a potential method for multi-objective learning of the DIR task.
Date: 9 March 2021
Anton Bouter
Title: GPU-Accelerated Parallel Gene-pool Optimal Mixing applied to Multi-Objective Deformable Image Registration
Abstract: The Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (RV-GOMEA) has previously been successfully used to achieve highly scalable optimization of various real-world problems in a gray-box optimization setting. Deformable Image Registration (DIR) is a multi-objective problem, aimed at finding the most likely non-rigid deformation of a given source image so that it matches a given target image. We specifically consider the case where the deformation model allows for finite-element-type modeling of tissue properties. This optimization problem is non-smooth, necessitating techniques like EAs to get good results. Though the objectives of DIR are non-separable, non-neighboring regions of the deformation grid are conditionally independent. We show that GOMEA allows to exploit such knowledge through the large-scale parallel application of variation steps, where each is only accepted when leading to an improvement, on a Graphics Processing Unit (GPU). On various 2-dimensional DIR problems, we find that this way, similar results can be achieved as when sequential processing is performed, while allowing for substantial speed-ups (up to a factor of 111) for the highest-dimensional problems (i.e., the highest deformation-grid resolution). This work opens the door to the extension of this type of DIR to larger (3-dimensional) deformation grids, and its application to other real-world problems.
Date: 23 February 2021
Michelle Sweering
Date: 9 February 2021
next set of 8 presentations
Date: 26 January 2021
set of 8 presentations