AI Hallucinations Create “Slopsquatting” Supply Chain Threat

Summary:
Large Language Models have become integral to modern software development, accelerating the creation of web applications, automation scripts, and more. While these AI tools offer significant productivity gains, they also introduce new security risks, including an emerging threat known as slopsquatting. The risk arises when LLMs hallucinate and suggest non-existent package names that appear legitimate. Developers relying on AI-generated code may unknowingly install these fake packages if attackers register them in advance.

Slopsquatting, a term coined by Seth Larson and popularized by Andrew Nesbitt, builds on the concept of typosquatting. Rather than relying on human error, however, slopsquatting exploits AI-generated mistakes.
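Because the attack depends on a suggested name not (yet) existing on the registry, one simple defensive habit is to verify every AI-suggested dependency before installing it. The sketch below is a minimal illustration of that check, assuming the requests library and PyPI's public JSON metadata endpoint; the script and helper names are invented for the example.

import sys
import requests

PYPI_JSON_API = "https://pypi.org/pypi/{name}/json"  # public PyPI metadata endpoint

def package_exists_on_pypi(name: str) -> bool:
    """Return True if a package with this exact name is published on PyPI."""
    resp = requests.get(PYPI_JSON_API.format(name=name), timeout=10)
    return resp.status_code == 200

if __name__ == "__main__":
    # Example: vet every package name an LLM suggested before running pip install.
    for name in sys.argv[1:]:
        if package_exists_on_pypi(name):
            print(f"{name}: found on PyPI")
        else:
            print(f"{name}: NOT FOUND - possible hallucination")

Note that existence alone proves little: an attacker who has already registered a hallucinated name would pass this check, which is why the publication-age and behavior checks discussed later still matter.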

A recent academic study titled "We Have a Package for You!" offers a detailed analysis of this phenomenon. Conducted by researchers from the University of Texas at San Antonio, Virginia Tech, and the University of Oklahoma, the study evaluated 16 leading LLMs, both commercial and open-source, by generating 576,000 code samples in Python and JavaScript. Key findings reveal that 19.7 percent of recommended packages across all models did not exist. Open-source models were especially prone to hallucinations, with an average rate of 21.7 percent, compared to just 5.2 percent among commercial models. CodeLlama 7B and 34B had the highest rates, exceeding 33 percent, while GPT-4 Turbo performed best with a hallucination rate of 3.59 percent. In total, researchers identified over 205,000 unique hallucinated package names, establishing that these are not random anomalies but systemic issues.

Follow-up experiments showed that 58 percent of hallucinated packages were repeated more than once across ten identical prompt executions. Nearly half of these were reproduced every time, indicating that many hallucinations are not transient errors but stable behaviors that attackers can easily exploit. Observing a small number of LLM outputs is often sufficient to identify potential slopsquatting targets. The risk is further influenced by model settings and behavior. Higher temperature values, which increase the randomness of AI responses, correlated with more hallucinations. Models that suggested a larger number of unique packages also showed higher hallucination rates, while conservative models that reused well-known libraries were more accurate and produced higher-quality code.
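The repeated-prompt measurement is straightforward to approximate. The sketch below is a rough illustration of the idea rather than the researchers' actual harness: generate_code is a placeholder for whatever LLM client is in use, and package names are pulled from the output with a deliberately naive regex.

import re
from collections import Counter

def generate_code(prompt: str, temperature: float) -> str:
    """Placeholder for an LLM call (e.g. via a vendor SDK); returns generated code as text."""
    raise NotImplementedError("wire this up to your LLM client")

# Very rough heuristic: pull package names from 'pip install ...' lines in the output.
PIP_INSTALL_RE = re.compile(r"pip install\s+([A-Za-z0-9_.\-]+)")

def tally_suggested_packages(prompt: str, runs: int = 10, temperature: float = 0.7) -> Counter:
    """Run the same prompt several times and count how often each package name appears."""
    counts = Counter()
    for _ in range(runs):
        code = generate_code(prompt, temperature)
        counts.update(PIP_INSTALL_RE.findall(code))
    return counts

Names that appear in most runs but are absent from PyPI are exactly the stable hallucinations described above, and therefore the names an attacker would be most tempted to register.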


Security Officer Comments:
A significant concern is the plausibility of hallucinated names. Many are not simple typos but semantically believable suggestions: only 13 percent were a single character away from a legitimate name, while nearly 50 percent were fully fabricated yet convincingly structured. This makes hallucinated packages difficult for developers to detect by visual inspection, especially when they are embedded in otherwise valid code.

The research also uncovered cross-language confusion, with 8.7 percent of hallucinated Python packages corresponding to valid npm (JavaScript) packages. Additionally, an examination of deleted packages found that just 0.17 percent of hallucinated names matched anything removed from repositories like PyPI between 2020 and 2022, reinforcing the finding that most hallucinations are entirely fictional. Promisingly, some models, such as GPT-4 Turbo and DeepSeek, demonstrated the ability to recognize their own hallucinated outputs, achieving over 75 percent accuracy in self-detection tests. This opens the door to mitigation strategies such as self-refinement, in which the model audits its own suggestions before presenting them to the user.
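A self-refinement pass can be sketched as a second round trip to the same model before its suggestions reach the user. The outline below is a hypothetical illustration under that assumption: ask_model is a placeholder for a real chat-completion call, and the yes/no prompt wording is invented for the example rather than taken from the study.

def ask_model(question: str) -> str:
    """Placeholder for a chat-completion call to the same model that generated the code."""
    raise NotImplementedError("wire this up to your LLM client")

def self_check_packages(package_names: list[str]) -> dict[str, bool]:
    """Ask the model to audit its own suggestions and flag names it believes are not real."""
    verdicts = {}
    for name in package_names:
        answer = ask_model(
            f"Is '{name}' a real, published Python package on PyPI? Answer only YES or NO."
        )
        verdicts[name] = answer.strip().upper().startswith("YES")
    return verdicts

Suggestions the model itself flags as non-existent can then be dropped, or double-checked against the registry, before they are ever shown to the developer.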

To evaluate hallucination tendencies in realistic scenarios, the researchers designed prompts based on Stack Overflow queries and common package usage descriptions. These included typical developer tasks such as automating web interactions with Selenium or implementing Flask rate limiting. The prompts consistently triggered hallucinated packages, allowing researchers to analyze behavior across all models without registering the packages on public repositories.


Suggested Corrections:
Mitigating this threat requires tools capable of identifying not only known risks but also suspicious or newly published packages. Platforms like Socket are addressing this challenge by scanning entire dependency trees for risky behaviors such as install scripts, obfuscated code, or hidden payloads. Developers can also use browser extensions and GitHub integrations to detect malicious packages in real time, adding an essential layer of protection.
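Alongside dedicated scanners, a lightweight signal for "suspicious or newly published" is simply how recently a package appeared. The sketch below, assuming the requests library and PyPI's public JSON endpoint, reports the age of a package's most recent release file; the helper name is hypothetical, and the threshold for "too new" is a policy decision left to each team.

from datetime import datetime, timezone
import requests

def latest_upload_age_days(name: str) -> float | None:
    """Return the age in days of the most recent release file on PyPI, or None if not found."""
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    if resp.status_code != 200:
        return None  # package does not exist on PyPI at all
    files = resp.json().get("urls", [])
    if not files:
        return None  # package exists but has no uploaded files
    newest = max(
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for f in files
    )
    return (datetime.now(timezone.utc) - newest).total_seconds() / 86400

A package that did not exist last month but suddenly matches an AI-suggested name is a classic slopsquatting signal and deserves manual review before it enters a lockfile.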


Link(s):
https://www.infosecurity-magazine.com/news/ai-hallucinations-slopsquatting/