Although we sometimes seem overwhelmed by data, the amount available in many studies is limited. Take the spread of a virus: where did it first appear? ''We often only know that a few individuals were sick, and details such as the exact moment of infection are missing. How can we then find out how the virus spread, and which factors played a role in this? That is not directly visible from the data; it is in fact invisible'', says Frank van der Meulen. According to the researcher, the answer lies in a combination of mathematical models, theory from mathematical statistics and computational methods. Together, these make the invisible visible, as it were. But how does this work?
Pillars
The professor, who was appointed at the end of 2022, focuses his research on developing statistical methods for stochastic processes, with an emphasis on uncertainty quantification and indirect observation schemes. Stochastic processes are phenomena that vary over time and that involve uncertainty. This can be the course of a disease, the spread of a virus, the price of a share or the degree of pollution in a river.
Indirect observation schemes
''It is not always possible to measure such phenomena continuously. For example, when a virus spreads, we may know that certain people were infected, but usually not exactly when each infection took place'', explains Van der Meulen. If pollution is measured in a river, sensors may give an indication at high frequency, while in-situ measurements often take place at a much lower frequency. Measurement errors often have to be taken into account as well. Researchers therefore obtain only indirect information about the process they are studying.
Uncertainty quantification
Then, returning to an earlier question, how can we find out how the virus spread and which factors played a role in this, if there is little data available? Researchers often extract information from data by specifying a statistical model. Such a model contains unknowns, which we try to find out from the data. These estimates help to imbue the data with meaning. ''But how accurate is such an estimate? To that end, I am interested in specifying the uncertainty'', says Van der Meulen. ''The underlying idea is simple: if I want to know whether a coin is fair, and I have the choice of tossing it 10 times or 1000 times, I will choose the latter option. But if I see heads 5 or 500 times, I will report that the chance of heads is 50% in both cases. In the case of 1000 tosses, however, the margin of uncertainty is much smaller.''
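The coin example can be made concrete with a small calculation. The sketch below is an illustration only, not a method taken from Van der Meulen's research: it assumes the standard normal-approximation (Wald) interval for a proportion, and the function name `wald_interval` and the 95% level (z = 1.96) are choices made here for the example. The approximation is crude for as few as 10 tosses, but it shows the main point: the estimate is the same in both cases, while the margin of uncertainty shrinks as the number of tosses grows.

```python
import math


def wald_interval(heads, tosses, z=1.96):
    """Estimate the chance of heads with an approximate 95% interval.

    Uses the normal approximation (an assumption of this sketch),
    which is rough for small numbers of tosses.
    """
    p_hat = heads / tosses                       # estimated chance of heads
    se = math.sqrt(p_hat * (1 - p_hat) / tosses)  # standard error of the estimate
    return p_hat, p_hat - z * se, p_hat + z * se


for heads, tosses in [(5, 10), (500, 1000)]:
    p_hat, low, high = wald_interval(heads, tosses)
    print(f"{tosses} tosses: estimate {p_hat:.2f}, interval ({low:.2f}, {high:.2f})")
```

With 10 tosses the interval runs from roughly 0.19 to 0.81, whereas with 1000 tosses it runs from roughly 0.47 to 0.53. A hundred times more data makes the interval about ten times narrower, because the standard error decreases with the square root of the number of tosses.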
Use of mathematical statistics
By using mathematical statistics, the study of statistical methods from a mathematical perspective, targeted information can be extracted from data, often in order to make better decisions under uncertainty. Many methods are initially developed by users. ''However, it is not always clear when such methods work, and what do we actually mean by “work”? Can we prove that such methods do what they are designed to do? This is exactly what mathematical statistics is concerned with: developing and studying statistical methods. This also includes computational methods: methods aimed at carrying out the calculations in statistical procedures efficiently.''
Combining information
By then combining the two forms of information, the assumed model and the data, concrete answers can be found. For many problems that had no computational methods twenty to thirty years ago, good software is now available. In the coming years, Frank van der Meulen hopes to contribute to such computational methods for statistical procedures aimed specifically at stochastic processes.
Frank van der Meulen's inaugural lecture will take place on January 24, 15:45 - 17:15.