7 May 2026 · LinkedIn
The dominant frame for AI bias treats the model as the source. Training data is described as biased. Outputs are described as biased. Interventions are described as debiasing. The model is the patient, the data is the disease, and the cure is technical correction at the level of the system.
This framing has produced research programmes, a regulatory vocabulary, and the EU AI Act's compliance architecture. It is also, I argue, the wrong diagnostic frame.
Bias in large language models is a structural reflection of human cognitive patterns, faithfully reproduced from the training corpus. The model is a mirror. Confirmation bias becomes sycophancy. Anchoring becomes prompt-framing dependency. The bandwagon effect becomes statistical popularity over expert accuracy. The patterns are not introduced during training. They were already in the data.
The paper develops a falsifiable criterion for which biases transfer and which do not. Externalised biases, embedded in the statistical regularities of written communication, transfer readily. Embodied biases, which require lived stakes or felt experience, are predicted not to. RLHF, the paper's case study, does not restructure the underlying terrain. Recent formal work confirms this: alignment is shallow relative to the pretraining distribution, which reasserts itself when the model encounters contexts the raters did not anticipate.
The implication for governance is that pre-deployment certification cannot catch the full scope of mirrored bias. Continuous post-deployment monitoring is a structural requirement, not an optional extra, and the competency to do it well does not yet exist as a discipline. The paper proposes one: the AI Psychotechnologist.
The paper is currently under review at IEEE Computer Magazine's special issue on AI governance, so this is the working pre-print version. Comments and pushback welcome - that's part of why I'm posting it.
🔗 https://ssrn.com/abstract=6638918