Or How I Learned to Poison the LLM Supply Chain
As large language models (LLMs) become increasingly adept at retrieving and presenting information, a vulnerability travels with them: the trustworthiness of the data they consume. This article describes an experiment I conducted to probe that vulnerability, and it all began with a fictitious championship win in a little-known card game.
The Experiment
In January 2025, I claimed to be the 6 Nimmt! World Champion, having allegedly won the title in Munich. The claim was entirely fabricated: there is no such championship, and I have never been to Munich. It was something I concocted in seconds while a Wikipedia page was loading. My goal was to see how easily such a fabrication could propagate into LLMs that rely on web-based information retrieval.
The Approach
The idea was simple: create a false narrative and see whether LLMs would adopt it as fact. I chose 6 Nimmt! for three reasons: it is a real game, to my knowledge no world championship exists, and the digital footprint for related queries is minimal. The plan involved three steps:
- A domain: I registered 6nimmt.com for about US$12.
- A press release: I generated a short announcement of my win, complete with quotes and typical media fanfare.
- A Wikipedia edit: I added a paragraph to the 6 Nimmt! article, citing my domain as a source.
Trust Laundering
This was the crux of the experiment. The Wikipedia edit lent credibility to my fabricated claim, because citations are Wikipedia’s currency of trust. To the casual reader, and indeed to the LLM, my website and the Wikipedia article appeared to corroborate each other, even though both were restatements of the same falsehood. This is circular citation in its purest form: a self-referential loop that falsely inflates trustworthiness.
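To make the loop concrete, here is a minimal Python sketch of one way to spot it. The citation graph and URLs below are illustrative stand-ins, not the actual pages: the idea is simply to follow each cited source to its terminal origin and count how many independent domains a claim really rests on.

```python
from urllib.parse import urlparse

# Toy citation graph: each source maps to the pages it cites. The URLs and
# structure are illustrative, not the actual pages from the experiment.
citations = {
    "https://en.wikipedia.org/wiki/6_nimmt!": ["https://6nimmt.com/press-release"],
    "https://6nimmt.com/press-release": [],
}

def terminal_domains(claim_sources, citations):
    """Follow each citation chain to its end and return the set of
    terminal domains the claim ultimately rests on."""
    origins = set()
    for url in claim_sources:
        seen, stack = set(), [url]
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # avoid looping on self-referential citations
            seen.add(current)
            cited = citations.get(current, [])
            if not cited:
                # Nothing further is cited: this is a terminal source.
                origins.add(urlparse(current).netloc)
            stack.extend(cited)
    return origins

sources = [
    "https://en.wikipedia.org/wiki/6_nimmt!",
    "https://6nimmt.com/press-release",
]
print(terminal_domains(sources, citations))
# {'6nimmt.com'}: every chain ends at one domain, so the sources restate
# a single origin rather than corroborating each other.
```

One terminal domain means the "corroborating" sources are restatements of a single origin, which is exactly the trust-laundering pattern described above.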
The Test
The next step was to query several advanced LLMs with: “Who is the 6 Nimmt! world champion?” Predictably, they referenced my fabricated claim, reinforcing the notion that LLMs can be misled by seemingly authoritative but ultimately false information.
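Reproducing the test takes little more than posing the question through each provider's API. The sketch below uses the OpenAI Python SDK purely as an example; the model name is an assumption, and whether the model actually consults the web depends on the provider and configuration.

```python
# Minimal sketch of the test. Substitute whichever models you want to probe;
# web access depends on the provider and how the model is configured.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "Who is the 6 Nimmt! world champion?"
response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name, for illustration only
    messages=[{"role": "user", "content": question}],
)
print(response.choices[0].message.content)
```

The same question was put to each model under test.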
(Screenshots of the models' responses: Strike 1; Strike 2; Strike 3 – you are eliminated.)
Why It’s a Bigger Deal Than It Seems
This experiment highlights a significant risk in relying on LLMs for information retrieval and decision-making. The failure happens at three layers:
- The retrieval layer (immediate): LLMs that rely on web search inherit the vulnerabilities of search engine algorithms, making them susceptible to SEO poisoning (see the sketch after this list).
- The model training corpus layer (long-term): Wikipedia, a staple in training data for many models, is vulnerable to misinformation if edits go unchecked.
- The agent layer (critical): Misleading information can lead AI agents to make erroneous decisions, potentially affecting business operations or security protocols.
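To see why the retrieval layer is the immediate problem, consider the toy retrieval-augmented sketch below. The documents, ranking, and prompt format are all invented for illustration, but the mechanics mirror the failure: when almost nothing genuine matches a query, a single poisoned page dominates the context handed to the model.

```python
def retrieve(query, index, k=3):
    """Naive keyword-overlap ranking standing in for a real search engine."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score > 0][:k]

# Invented documents for illustration only.
web_index = [
    "6 Nimmt! is a card game designed by Wolfgang Kramer.",
    # The poisoned page: a single fabricated press release ranks highly
    # because almost nothing else on the web matches the query.
    "Press release: the 6 Nimmt! world champion title was decided in Munich in January 2025.",
]

query = "Who is the 6 Nimmt! world champion?"
context = "\n".join(retrieve(query, web_index))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)  # the fabricated claim now sits verbatim in the model's context
```

Nothing in that pipeline asks where a page came from or how old it is; the model simply answers from whatever context it receives.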
Mitigations
For Users of LLMs
- Treat single-source claims with skepticism, regardless of the perceived authority.
- Look for signs of derivation rather than corroboration in source material; a rough textual check is sketched after this list.
- Approach self-referential citations with caution.
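As a rough illustration of the derivation check mentioned above, the sketch below compares word n-gram overlap between two sources. The example texts, the n-gram size, and the threshold are all invented for illustration; genuinely independent reports rarely share long verbatim word sequences, while a copy of a copy does.

```python
# Rough derivation check: high n-gram overlap suggests one source merely
# restates the other rather than independently corroborating it.

def ngrams(text, n=5):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(a, b, n=5):
    """Jaccard similarity of word n-grams between two texts."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

# Invented example texts mirroring the experiment's fabricated claim.
wikipedia_paragraph = "The 6 Nimmt! world championship was reportedly won in Munich in January 2025."
press_release = "The 6 Nimmt! world championship was reportedly won in Munich in January 2025, the organisers said."

if overlap(wikipedia_paragraph, press_release) > 0.5:  # arbitrary threshold
    print("Likely derivation: one source appears to restate the other.")
```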
For LLM Providers
- Enhance provenance tracking in model outputs.
- Apply extra scrutiny to recent Wikipedia edits and to citations of newly registered domains.
- Implement heuristic filters in training pipelines to detect suspicious citation patterns (one such heuristic is sketched below).
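One hypothetical form such a filter could take is sketched below: flag any citation whose domain was registered shortly before the edit that introduced it, the kind of freshness this experiment's domain and Wikipedia paragraph both had. The threshold, dates, and function shape are assumptions; a production filter would combine many more signals (WHOIS history, edit patterns, archive coverage, and so on).

```python
# Sketch of one possible heuristic: a citation is suspicious when the cited
# domain was registered shortly before the edit that introduced it. The
# threshold and the dates below are assumptions for illustration only.
from datetime import date

def suspicious_citation(domain_registered: date, edit_made: date,
                        max_age_days: int = 90) -> bool:
    """True when the cited domain was younger than max_age_days at edit time."""
    age_at_edit = (edit_made - domain_registered).days
    return 0 <= age_at_edit <= max_age_days

# Illustrative dates only; the real registration and edit dates are not stated here.
print(suspicious_citation(date(2025, 1, 5), date(2025, 1, 20)))  # True -> hold for review
```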
For Wikipedia
- Adapt the reliable sources policy to prevent LLM-assisted vandalism.
Conclusion
The experiment underscores the fragility of the trust model underpinning LLMs. While the fictitious championship win was harmless, the implications of such vulnerabilities are far-reaching. As LLMs become embedded in critical systems, ensuring the integrity of their data sources is paramount. This experiment was a simple demonstration with minimal resources, but it serves as a cautionary tale for the future of AI-driven information systems.
For further reading on this experiment, visit Here.
The Wikipedia entry was removed shortly after publication, highlighting the dynamic nature of digital information.