Promotors:
- Prof. Dr. S. Abeln
- Prof. Dr. J. Heringa
- Prof. Dr. C. Teunissen
Proteomics, the large-scale study of proteins in tissues, cells, or fluids, is crucial for understanding biological processes in health and disease. Protein biomarkers, in particular, provide valuable insights into disease status and progression, making them vital tools in healthcare. Fluid biomarkers, such as those found in blood, hold great promise due to their ease of sampling and measurement. In the case of dementia, these biomarkers enable early, pre-symptomatic diagnosis. However, developing reliable biomarkers is a lengthy and challenging process, highlighting the need for a deeper understanding of the properties of suitable protein candidates. Since biomarkers often have to be established in antibody-based immunoassays for clinical use, exploring how assay antibodies interact with protein biomarkers is equally important. Bioinformatics and machine learning-based prediction of relevant biological features has the potential to advance biomarker development. This thesis reviews current open-access bioinformatics tools and data resources and proposes a workflow to integrate them into fluid biomarker assay development. It also applies available bioinformatics resources to characterize three established protein biomarkers of Alzheimer’s disease. Two novel machine learning models were created to predict and analyze protein properties critical to assay development, specifically protein secretion from the brain into the cerebrospinal fluid and protein association with extracellular vesicles. Additionally, an in-depth analysis of antibody-binding regions on protein antigens improves our understanding of antibody function. A newly developed R package for immunogen visualization makes these insights readily accessible to assay developers, aiding in the selection of appropriate antibodies. Finally, a comparison of available antibody clustering tools highlights the interplay between antibody sequence, structure, and function.
In summary, this thesis presents multiple strategies to enhance biomarker discovery and antibody selection through the application of interpretable machine learning models, detailed analyses of protein-antibody interactions, and a novel tool for improving immunogen selection. It also discusses challenges with machine learning models, including data bias and interpretability, as well as the need for data sharing and for the ongoing development and evaluation of bioinformatics tools.