Caio Mello M.A.

School of Advanced Study, University of London

Interpreting sentiment analysis outputs to categorise emotions in news articles

Sentiment analysis (SA) is one of the techniques commonly used in Natural Language Processing. These algorithms classify words or texts on a sentiment scale that usually ranges from negative to positive. Such techniques have mostly been applied to Twitter data or customer feedback to track the reception of products. However, researchers in the humanities have also been using these algorithms to explore other kinds of text, such as news articles, to understand the tone in which certain media events are narrated. This use of SA presents several challenges: questions arise as to how reliable SA scores are and how they can be used for interpretation (qualitative analysis) and discourse analysis. This project aims to investigate how these techniques have been applied to the study of media events and to identify their limitations and potential, making use of explainable AI and qualitative analysis of SA outputs in a multilingual text corpus.

Research Results

Preliminary results of this research project include:

  • Sentiment of objective statements is harder to classify than that of subjective ones: When three different sentiment classifiers were used and the agreement or disagreement between them was checked as a way of verifying accuracy, objective statements tended to cause more confusion for the algorithms.
  • Combining multiple classifiers potentially helps to filter out challenging objective statements: When the final sentiment score obtained from combining the three classifiers was compared with gold labels produced by a (human) domain expert, no difference between objective and subjective statements was found.
  • Automated sentiment classification did not differ considerably from human expert gold labels for news headlines mentioning ‘money’: When the positive, neutral and negative labels generated by the sentiment classifiers were compared with a list produced by a domain expert, most of the results were similar.
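The comparison described in the results above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: it assumes the three classifiers' labels are combined by simple majority vote (falling back to "neutral" when all three disagree), treats unanimity as "agreement", and uses a handful of invented, hypothetical statements in place of the real corpus. The function names `combine`, `agreement_rate` and `accuracy` are illustrative only.

```python
from collections import Counter

# Hypothetical toy data (not from the project): each statement has a type,
# the labels assigned by three unnamed classifiers, and an expert gold label.
rows = [
    {"type": "objective", "votes": ["neutral", "negative", "neutral"], "gold": "neutral"},
    {"type": "objective", "votes": ["positive", "neutral", "negative"], "gold": "neutral"},
    {"type": "subjective", "votes": ["negative", "negative", "negative"], "gold": "negative"},
    {"type": "subjective", "votes": ["positive", "positive", "neutral"], "gold": "positive"},
]


def combine(votes):
    """Combine classifier labels by majority vote.

    Returns (label, unanimous). If no label wins more than one vote,
    fall back to 'neutral' (an assumption made for this sketch).
    """
    label, count = Counter(votes).most_common(1)[0]
    if count == 1:
        label = "neutral"
    return label, count == len(votes)


def agreement_rate(rows, kind):
    """Share of statements of the given type on which all classifiers agree."""
    subset = [r for r in rows if r["type"] == kind]
    return sum(combine(r["votes"])[1] for r in subset) / len(subset)


def accuracy(rows, kind):
    """Share of statements of the given type whose combined label matches the gold label."""
    subset = [r for r in rows if r["type"] == kind]
    return sum(combine(r["votes"])[0] == r["gold"] for r in subset) / len(subset)


for kind in ("objective", "subjective"):
    print(kind, "agreement:", agreement_rate(rows, kind), "accuracy:", accuracy(rows, kind))
```

On these toy rows the script prints an agreement of 0.0 for objective and 0.5 for subjective statements, with an accuracy of 1.0 for both: more disagreement on objective statements, but no difference once the labels are combined. Real figures would depend entirely on the classifiers, the combination rule and the annotated data.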

The technical limitations of algorithms trained for sentiment classification, and the question of what sentiment actually means, were discussed at a workshop that I, Caio Mello, organised in collaboration with Dr. Johannes Breuer, head of the department of Research Data & Methods at CAIS, and Dr. Gaurish Thakkar, research fellow at the Faculty of Humanities and Social Sciences, University of Zagreb. The material produced for this workshop is available (open access) on the CAIS GitHub page (https://github.com/CAIS-Research/Introduction-to-SA-Training-CAIS).

This research has benefited from the CAIS funding for visiting fellows, through which Dr. Gaurish Thakkar, a specialist in Sentiment Analysis, visited the Center in Bochum for two weeks to work with me on this project.

Part of the research undertaken in this project can already be read in the book Multilingual Digital Humanities (2023). In chapter 10, entitled Data Scarcity and Methodological Limitations in Multilingual Analysis of News Articles Published in Brazil (https://www.taylorfrancis.com/chapters/edit/10.4324/9781003393696-14/data-scarcity-methodological-limitations-multilingual-analysis-news-articles-published-brazil-caio-mello), I discuss some preliminary results of this project.

Main Research Topics

  • Digital methods for the Humanities;
  • Natural Language Processing (NLP);
  • Data visualisation;
  • Media Studies;
  • Digital activism.

Curriculum Vitae

  • 2019-2023 Early-stage researcher in Digital Humanities at the School of Advanced Study, University of London.
  • 2021-2022 Programming Historian collaborator (Translator & Reviewer)
  • 2021 Visiting researcher at VICO Research & Consulting (Secondment)
  • 2019 Visiting researcher at the British Library (Secondment)
  • 2018 Research fellow at the Center for Advanced Internet Studies

Publications and Presentations

Mello, C., Cheema, G. S., & Thakkar, G. (2022). Combining sentiment analysis classifiers to explore multilingual news articles covering London 2012 and Rio 2016 Olympics. International Journal of Digital Humanities, 1-27.

Mello, C. (2023, forthcoming). Data scarcity and methodological limitations in multilingual analysis of news articles published in Brazil. In Multilingual Digital Humanities: Routledge series Digital Research in the Arts and Humanities.

Sittar, A., Major, D., Mello, C., Mladenić, D., & Grobelnik, M. (2022). Political and Economic Patterns in COVID-19 News: From Lockdown to Vaccination. IEEE Access, 10, 40036-40050.

Fellow at CAIS from April to September 2023