Research Incubator: Automated Text Analysis

Which digitization topics are on the political and social agenda? Results of an automated text analysis
The Research Incubator Team

To find clues to potentially important topics in digitization research, CAIS conducted an automated content analysis of landmark text documents. An insight into the process and the findings.

Figure 1. Automated text analysis is one building block in the topic-finding process

Anticipating social developments and reacting to them accordingly are key tasks for political and social actors. But which digitization issues are being addressed politically at all? And which of them are more short- or medium-term? How do funding bodies (e.g., the German Research Foundation or the Federal Ministry of Education and Research) act in this field? Do they set different priorities? And which topics with a focus on digital transformation are being addressed by other institutions?

To answer these questions, an automated text analysis of such landmark documents was conducted in fall 2020. Combined with the findings of the online real-time Delphi study of fall 2019 and the expert discussions with digitization researchers conducted one year later, this automated text analysis is a further building block in identifying future-oriented research topics at CAIS.

A visualization of the whole topic-finding process is available in this video.

Automated text analysis of 471 documents

The automated text analysis covered 471 documents, including important texts such as the digitization strategies of federal states, calls for research projects by the Federal Ministry of Education and Research, and self-descriptions of existing research contexts with a digital focus. The data were collected from August 20 to September 7, 2020. Based on the bag-of-words approach, the analysis provided an exploratory insight into the rough structures and contents of the texts.

First finding:

Strong expression of research- and business-related vocabulary in all texts.


Second finding:

Results congruent with those of other topic-finding methods.


Third finding:

Valuable entry points and development potentials.

Bag-of-words approach

The bag-of-words approach means that texts are broken down into components of a fixed length for analysis. In the process, the context in which words, groups of words, or sentences occur is dissolved. Figuratively speaking, a bag contains all the words of the original texts with no fixed order or relationship to one another.
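
The idea can be illustrated with a minimal Python sketch. It is not the pipeline used in the CAIS analysis. The function name bag_of_words, the tokenization rule, and the two example sentences are hypothetical, and "components of a fixed length" is read here as groups of n consecutive words, which is an assumption.

```python
import re
from collections import Counter


def bag_of_words(text, n=1):
    """Count the word groups of length n in a text (hypothetical helper).

    With n=1 the components are single words. The position of each
    component in the original text is discarded; only counts remain.
    """
    # Assumed tokenization: lowercase runs of letters, including German umlauts.
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    components = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(components)


# Two invented sentences with the same words in a different order.
doc_a = "digital research funds digital education"
doc_b = "education funds digital research digital"

bag_a = bag_of_words(doc_a)
bag_b = bag_of_words(doc_b)
bigrams_a = bag_of_words(doc_a, n=2)

# Both bags are identical because word order is not represented.
print(bag_a == bag_b)
```

Because only counts are kept, the two example sentences become indistinguishable. This is the loss of context described above, and it is the price paid for a simple representation that is easy to compare across many documents.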

Topic modeling approach

Topic modeling refers to a procedure in the automated processing of texts. A topic model can be understood as a statistical model for discovering topics or semantic structures that occur in a collection of documents. A common method of this kind is LDA, short for Latent Dirichlet Allocation.
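
To make the idea more concrete, the following sketch fits an LDA model to a toy corpus using collapsed Gibbs sampling, one common inference scheme for LDA. It is an illustration under stated assumptions, not the procedure or software used in the CAIS analysis: the function lda_gibbs, the hyperparameters alpha and beta, the number of topics, and the four example documents are all hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (hypothetical minimal sketch).

    docs is a list of token lists. Returns the per-document topic counts
    and the three most frequent words assigned to each topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]              # tokens per doc and topic
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per topic and word
    topic_total = [0] * num_topics                            # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Weight of each topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    top_words = []
    for k in range(num_topics):
        ranked = sorted(range(vocab_size), key=lambda v: topic_word[k][v], reverse=True)
        top_words.append([vocab[v] for v in ranked[:3]])
    return doc_topic, top_words


# Invented toy corpus, already tokenized as bags of words.
docs = [
    "funding research university project funding".split(),
    "research project university funding".split(),
    "industry startup market business industry".split(),
    "business market startup industry".split(),
]

doc_topic, top_words = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

The sampler repeatedly reassigns each word to a topic in proportion to how often that topic already occurs in the same document and how often the word already occurs in that topic. The resulting topics are only lists of co-occurring words and still have to be interpreted by the researcher. For a corpus of realistic size, an established library implementation would normally be preferable to a hand-written sampler.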