PhD defence M. Laurer 2 October 2024 09:45 - 11:15

Language Models as Measurement Tools

Computational social scientist Moritz Laurer demonstrates how instruction-based language models can overcome limitations of older machine learning techniques for text classification. He shows how algorithms can learn to categorize texts with less training data, more accurately across multiple languages, and in a less biased manner, and how instruction-based language models can increase validity, robustness and data efficiency.

Summary of findings
Moritz Laurer shows how this type of model can reduce the required training data by a factor of ten compared to previous algorithms while achieving the same level of performance across eight tasks. He demonstrates that these models require fewer than 2,000 examples in two languages to create valid measurements across eight other languages and ten other countries. He also shows that these models are more robust against group-specific biases: their average test-set performance decreases only marginally when they are trained on biased data, in experiments across nine groups from four datasets. Finally, he explains how these models can serve as universal classifiers that learn any number of classification tasks simultaneously, as tested across 33 datasets with 389 classes.
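This page does not describe how a single model can learn many classification tasks at once. As a rough illustration only, one common way to set up instruction-based classification is to pair each text with a plain-language description of a candidate class and ask whether the description applies, so that every task shares the same true/false format. The function name, class descriptions and example texts below are hypothetical and are not taken from the thesis.

```python
def to_instruction_pairs(text, true_label, class_descriptions):
    """Turn one labelled text into (text, instruction, applies) triples,
    one triple per candidate class of the task."""
    return [
        (text, description, label == true_label)
        for label, description in class_descriptions.items()
    ]


# Two hypothetical tasks with different label sets (invented for illustration).
sentiment_task = {
    "positive": "The text expresses a positive opinion.",
    "negative": "The text expresses a negative opinion.",
}
topic_task = {
    "economy": "The text is about the economy.",
    "health": "The text is about health care.",
}

# After conversion, both tasks share one format, so a single model
# could in principle be trained on the combined list.
pairs = (
    to_instruction_pairs("I love this policy", "positive", sentiment_task)
    + to_instruction_pairs("The budget deficit grew", "economy", topic_task)
)

for text, instruction, applies in pairs:
    print(f"{text!r} | {instruction!r} | {applies}")
```

In this sketch a new task only needs a new set of class descriptions rather than a new model output layer, which is one way to read the "universal classifier" idea in the summary above.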

Relevance and supervised machine learning
From millions of social media posts to decades of legal text, more and more relevant information is hidden in digital text corpora that are too large for manual analysis. The key promise of machine learning ("Artificial Intelligence") is to automate parts of the manual analysis process.

One popular method is supervised machine learning for text classification, where a model is trained on examples of manually categorized texts and learns to identify these categories in new texts. Computational social scientists have used this method to create measurements of concepts such as emotions, topics or stances at scale. While measurement with supervised machine learning is established in the social science literature, there are important limitations that reduce the usefulness of established methods for many practical applications.
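The train-then-predict workflow described above can be sketched with a small naive Bayes classifier. This is a minimal illustration of an older style of supervised text classification, not one of the models from the thesis; the training texts and labels are invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def train(examples):
    """Fit a multinomial naive Bayes model on (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return label_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-probability for a new text."""
    label_counts, word_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_examples)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            # Laplace smoothing keeps unseen words from zeroing the score.
            smoothed = word_counts[label][token] + 1
            score += math.log(smoothed / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labelled training data (invented for illustration).
labelled_examples = [
    ("the new tax law raises rates", "economy"),
    ("unemployment and inflation are rising", "economy"),
    ("the hospital needs more nurses", "health"),
    ("vaccines reduce the spread of disease", "health"),
]

model = train(labelled_examples)
prediction = classify(model, "inflation and tax rates")
print(prediction)
```

A model like this only knows the words it saw during training, which hints at why such methods need large amounts of labelled data for each new research question.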

Limitations of established methods
First, these methods require large amounts of balanced training data to work well. Researchers, however, often have only limited resources for creating training data and need to tailor new data to each new research question. Second, older algorithms struggle with multilingual data. Researchers, however, need measurements that are equally valid for different cultures and languages. Third, these methods are susceptible to learning shortcuts and biased patterns from their training data, reducing the validity of measurements across social groups. Fourth, they can be difficult to use, making them accessible only to specialised researchers.

Moritz Laurer’s research shows how instruction-based language models can help overcome these limitations.

The models he developed during his PhD research have been downloaded more than 65 million times and are freely available at: https://huggingface.co/MoritzLaurer.

More information on the thesis.

Programme

PhD defence by M. Laurer

PhD Faculty of Social Sciences

Supervisors:

  • prof.dr. W.H. van Atteveldt
  • dr. K. Welbers
  • dr. A. Casas Salleras

The PhD defence can also be followed online.

About PhD defence M. Laurer

Location

  • Auditorium, Main building
  • (1st floor)

Address

  • De Boelelaan 1105
  • 1081 HV Amsterdam


Copyright © 2025 - Vrije Universiteit Amsterdam