Publication by CAIS Fellow Sarah Ciston

Generating the Language of AI Harms: Mapping Guardrails Using Critical Code Studies

Dr. Sarah Ciston, a former CAIS Fellow, has published an article in the journal “AI & Society”. In it, she analyzes the safety and filtering mechanisms (“guardrails”) of large language models as mechanisms that are at once technical and ideological.

April 22, 2026

Prof. Sarah Ciston was a Fellow at CAIS from April to September 2025. In March 2026, her publication “Generating the Language of AI Harms: Mapping Guardrails Using Critical Code Studies” appeared in the journal AI & Society. In it, she shows that guardrails do not merely determine, at a technical level, what an artificial intelligence (AI) is allowed to say; they also influence which topics are permitted or restricted in conversations.

Abstract of the publication

Phrases like “This prompt violates our content policy” or “As an AI model, I cannot…” indicate the invisible edges of generative AI systems. These edges are both frustrating and tempting. The concern persists that the massive scale and statistical structures of large models leave them impenetrable and incomprehensible. This paper examines large language model guardrails as a case study for applying critical code studies methodologies to large-scale AI. The subfield of AI alignment provides a rich access point for analyzing how foundation models are moderated and refined. This study explores four popular companies’ guardrails – Anthropic, DeepSeek, Meta, and OpenAI – including both how guardrails are applied in general-purpose models and also how they are built into moderation API tools on public offer. The analysis looks at endpoint documentation, code examples, technical reports, model architectures, training dataset content, and methodology research to create a critical picture of guardrails as part of conversational interfaces for sociotechnical control. It maps how their technical construction also co-constructs ideology through language, both computational and linguistic. Through guardrails, certain conversations are defined and limited by their filters, while other conversations are promoted. Thus, code decodes, encodes, regulates, and is itself conversation about what can be discussed.

Ciston, S. Generating the language of AI harms: mapping guardrails using critical code studies. AI & Soc (2026).
https://doi.org/10.1007/s00146-026-02922-0