Publication by CAIS Fellow Sarah Ciston

Generating the Language of AI Harms: Mapping Guardrails Using Critical Code Studies

Dr. Sarah Ciston, former CAIS Fellow, has published an article in the journal AI & Society. In it, she analyzes the safety and filtering mechanisms (“guardrails”) of large language models as both technical and ideological constructs.

22 April 2026

Ciston was a Fellow at CAIS from April to September 2025. Her article “Generating the Language of AI Harms: Mapping Guardrails Using Critical Code Studies” appeared in the journal AI & Society in March 2026. In it, she shows that guardrails not only determine, at a technical level, what artificial intelligence (AI) is allowed to say, but also influence which topics are permitted or restricted in conversations.

Abstract of the Publication

Phrases like “This prompt violates our content policy” or “As an AI model, I cannot…” indicate the invisible edges of generative AI systems. These edges are both frustrating and tempting. The concern persists that the massive scale and statistical structures of large models leave them impenetrable and incomprehensible. This paper examines large language model guardrails as a case study for applying critical code studies methodologies to large-scale AI. The subfield of AI alignment provides a rich access point for analyzing how foundation models are moderated and refined. This study explores four popular companies’ guardrails – Anthropic, DeepSeek, Meta, and OpenAI – including both how guardrails are applied in general-purpose models and also how they are built into moderation API tools on public offer. The analysis looks at endpoint documentation, code examples, technical reports, model architectures, training dataset content, and methodology research to create a critical picture of guardrails as part of conversational interfaces for sociotechnical control. It maps how their technical construction also co-constructs ideology through language, both computational and linguistic. Through guardrails, certain conversations are defined and limited by their filters, while other conversations are promoted. Thus, code decodes, encodes, regulates, and is itself conversation about what can be discussed.

Ciston, S. Generating the language of AI harms: mapping guardrails using critical code studies. AI & Soc (2026).
https://doi.org/10.1007/s00146-026-02922-0