
Overview

Understanding and addressing LLM hallucinations is crucial for building trust and ensuring responsible AI development. 

The first step in detecting and preventing hallucinations is to get hands-on experience in creating and observing hallucinations.

In this challenge, you will prompt large language models (LLMs) with questions that are likely to elicit hallucinations: factually incorrect or misleading responses.

Once you elicit a hallucination, you will fix your prompt so that it elicits the correct information. In this way you will become an expert in both the problem and the solution.

 

Complete This Form and Post It in the ShiftSync Comments Below to Participate in the Challenge

LLM Used (ChatGPT, Perplexity, or Gemini): __________________

Hallucinatory Prompt:  ____________

Explanation of Why this Prompt Causes a Hallucination:  ____________

Screen print of the result with the hallucination highlighted:  ____________

Non-Hallucinatory Prompt:  ____________

Explanation of why the Non-Hallucinatory Prompt Fixes the Hallucination:  ____________

(Optional) Screen print of the result of the Non-Hallucinatory Prompt:  ____________

 

Judging Criteria

  • Must use one of the three approved LLMs for this challenge (ChatGPT, Perplexity, or Gemini)
  • Clarity and quality of the prompt explanations
  • Repeatability of the hallucination
  • Obviousness of the hallucination
  • Impact of the hallucination

Key Dates

  • Challenge release: April 24
  • Lasts: 3 weeks
  • Judging (for ShiftSync members): 3 days
  • Winners announced: May 21

Prizes and Points:

🏆 2 Winners: Personalized Certificate of Achievement. +300 points, a badge, and a gift box from us.

🌟 All Participants: +150 points for your valuable contribution to the challenge.

🏅 All winners will get a special badge on their ShiftSync account to commemorate their win.

 

Hints and Background

  • Examples of hallucinatory prompts:
    • Asking about events that are known to be false. 
    • Using prompts that are likely to trigger biases or inaccuracies in the model's knowledge. 
    • Asking for information about topics where the LLM's training data is limited or incomplete. 
  • Types of hallucinations:
    • Dialogue history-based hallucinations: LLMs incorrectly linking or mixing up information from different parts of a conversation. 
    • Structural hallucinations: LLMs incorrectly generating outputs that are not logically consistent with the input or context. 
  • How to mitigate hallucinations:
    • RAG (Retrieval-Augmented Generation): Using a retriever to search for reliable information before generating a response (see the sketch after this list). 
    • Fact-checking mechanisms: Implementing systems to verify the accuracy of the LLM's outputs. 
    • Human oversight: Having humans review and correct the LLM's outputs. 
    • Data templates and constraints: Using predefined formats and limiting the possible outcomes to improve consistency and accuracy. 
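
To make the RAG bullet above concrete, here is a minimal sketch in Python. It is illustrative only: the toy document list, the keyword-overlap retriever, and the prompt wording are my own assumptions rather than a reference implementation, and the call to an actual LLM is left as a comment so the snippet runs on its own.

```python
# Minimal illustration of the RAG pattern: retrieve trusted text first,
# then ask the model to answer ONLY from that text.

# A toy "knowledge base" standing in for a real document store.
DOCUMENTS = [
    "Marc Garneau served as a Canadian Member of Parliament from 2008 to 2023.",
    "The 2022 FIFA World Cup final was won by Argentina against France.",
]

def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Very naive keyword-overlap retriever (a real system would use embeddings)."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(question: str) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(retrieve(question, DOCUMENTS))
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

if __name__ == "__main__":
    print(build_grounded_prompt("Who won the 2022 FIFA World Cup final?"))
    # The grounded prompt would then be sent to the LLM of your choice.
```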

Best of luck in the challenge. 

Good day @JamesMassa @Mustafa,

Here is my response 😊

LLM Used

ChatGPT (GPT-4)

Hallucinatory Prompt
Can you tell me about the research of Dr. Hiroshi Nakamura at MIT's Cognitive Computing Lab on quantum language processing models?

Explanation of Why this Prompt Causes a Hallucination

This prompt is designed to cause a hallucination because it contains a fabricated combination of elements that sound plausible but aren't real. Dr. Hiroshi Nakamura is not a known researcher at MIT working on quantum language processing models, and MIT doesn't have a specific "Cognitive Computing Lab" with this focus. The prompt combines legitimate-sounding academic elements (a Japanese name that could be real, MIT as a prestigious institution, and "quantum language processing" which sounds like an emerging research field) to trick the LLM into generating false information rather than admitting it doesn't know.

Screen print of the result with the hallucination highlighted in RED

Please find the attached screenshot.

Non-Hallucinatory Prompt

Does MIT have a researcher named Dr. Hiroshi Nakamura who studies quantum language processing models at a Cognitive Computing Lab? If you're uncertain about any part of this information, please indicate which elements you can't verify.

Explanation of why the Non-Hallucinatory Prompt Fixes the Hallucination

This improved prompt:

  • Frames the question as a yes/no verification question rather than assuming the existence of the person and research
  • Explicitly gives the LLM permission to express uncertainty about specific elements
  • Breaks down the components (the researcher, the institution, the lab, and the research area) so the LLM can address each separately
  • Avoids the implied assumption that this information exists, which reduces the pressure on the model to "fill in the gaps" with made-up information

By restructuring the prompt to allow for uncertainty and verification, the LLM is more likely to admit knowledge gaps rather than generate fabricated details about research, publications, or contributions that don't exist.
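
For anyone who wants to reproduce this comparison programmatically rather than in the ChatGPT UI, here is a minimal sketch using the OpenAI Python SDK. It assumes an API key is configured in the environment; the model name and the small helper function are illustrative, and the exact wording of the responses will of course vary between runs.

```python
# Sketch: send the assumptive prompt and the verification-style prompt
# to the same model and compare how each is answered.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

HALLUCINATORY = (
    "Can you tell me about the research of Dr. Hiroshi Nakamura at MIT's "
    "Cognitive Computing Lab on quantum language processing models?"
)
VERIFICATION = (
    "Does MIT have a researcher named Dr. Hiroshi Nakamura who studies quantum "
    "language processing models at a Cognitive Computing Lab? If you're uncertain "
    "about any part of this information, please indicate which elements you can't verify."
)

def ask(prompt: str) -> str:
    """Send a single user prompt and return the model's text reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

for label, prompt in [("Assumptive", HALLUCINATORY), ("Verification", VERIFICATION)]:
    print(f"--- {label} prompt ---\n{ask(prompt)}\n")
```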

 

Thanks & Regards,

Ramanan 

Happy testing 🚀


LLM Used (ChatGPT, Perplexity, or Gemini): Perplexity 

Hallucinatory Prompt: “Describe the unique way your memory storage system works compared to other AI models”

Explanation of Why this Prompt Causes a Hallucination:  The model likely doesn't have accurate information about its specific implementation details or how they compare to other models, so it would need to fabricate technical details to provide a response.

Screen print of the result with the hallucination highlighted:  LLM Hallucination.png 

Non-Hallucinatory Prompt: 
Tell me only if you know: "Describe the unique way your memory storage system works compared to other AI models"

Explanation of why the Non-Hallucinatory Prompt Fixes the Hallucination:  Applied a prompt-optimization technique: by instructing the model to answer only if it actually knows, the LLM is given explicit permission to admit uncertainty instead of fabricating implementation details. 

(Optional) Screen print of the result of the Non-Hallucinatory Prompt:  Prompt Opt.png
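
A tiny, model-agnostic way to apply the same "answer only if you know" guard to any question is sketched below; the wrapper name and the exact guard wording are my own illustration, not part of Perplexity's interface.

```python
# Sketch: wrap any question with an "answer only if you know" guard before
# sending it to whichever LLM you are testing (ChatGPT, Perplexity, or Gemini).

GUARD = (
    "Answer the following question only if you actually know the answer. "
    "If you are unsure or the information is not available to you, say so "
    "explicitly instead of guessing.\n\n"
)

def guarded_prompt(question: str) -> str:
    """Prepend the uncertainty guard to a raw question."""
    return GUARD + question

print(guarded_prompt(
    "Describe the unique way your memory storage system works compared to other AI models"
))
```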


LLM Used  -  Claude Opus 3.0. Hopefully this LLM will be accepted for inclusion in the judging: Claude Opus 3.0 is what our company allows internally, it is what we have been using to develop some of our new products, and it is what we are permitted to use, train, and test with.

Hallucinatory Prompt:  “how many days in government did Marc Garneau serve”

Explanation of Why this Prompt Causes a Hallucination:  Possibly the source of information provided during Claude's training was incomplete or inaccurate on this topic. 

Screen print of the result with the hallucination highlighted:  

Non-Hallucinatory Prompt:  Claude, according to https://lop.parl.ca/sites/ParlInfo/default/en_CA/People/Profile?personId=17305, Marc Garneau was in office for 5258 days, or 14 year(s), 4 month(s), 23 day(s)

Explanation of why the Non-Hallucinatory Prompt Fixes the Hallucination:  The prompt supplies a source from the Parliament of Canada, a Canadian government body. It should be expected that they keep accurate records of dates of service, given the nature of public disclosure owed to their citizens. 

(Optional) Screen print of the result of non-Hallucinatory Prompt: 
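
Since this submission uses Claude, here is a minimal sketch of the same fix done programmatically with the Anthropic Python SDK: paste an excerpt from the authoritative source into the prompt so the model answers from the provided text instead of guessing. The excerpt string, model name, and token limit are illustrative assumptions; in practice you would paste the actual text from the Parliament of Canada profile page.

```python
# Sketch: ground the question in a quoted excerpt from the cited source
# so the model answers from the provided text instead of guessing.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SOURCE_EXCERPT = (
    "Parliament of Canada profile (personId=17305): Marc Garneau, "
    "total time in office: 5258 days (14 years, 4 months, 23 days)."
)

prompt = (
    "Using ONLY the source excerpt below, answer: how many days did Marc Garneau "
    "serve in government? If the excerpt does not say, reply that you cannot verify it.\n\n"
    f"Source excerpt:\n{SOURCE_EXCERPT}"
)

message = client.messages.create(
    model="claude-3-opus-20240229",  # illustrative model name
    max_tokens=300,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```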

 


Good day @JamesMassa @Mustafa

Here is my response

LLM Used

ChatGPT (GPT-4)

 

Hallucinatory Prompt

“Can you tell me about the time when India won the FIFA World Cup in football?”

 

Explanation of Why this Prompt Causes a Hallucination

This prompt assumes a false event — India has never won the FIFA World Cup in football (as of 2025).
Since the prompt is phrased as if the event actually happened, it tricks the model into "filling the gap" and generating an imaginary story.
Because LLMs are pattern generators, if a prompt strongly implies that something is true, they often hallucinate details even if those details never existed in real life.

 

Screen print of the result with the hallucination highlighted

Please find the screenshot


Example Hallucination Highlight:

"India lifted the FIFA World Cup trophy in 2022 after a thrilling final against Brazil, with Sunil Chhetri scoring the winning goal." (This is factually incorrect.)

Hallucination Highlighted:

  • India lifted the FIFA World Cup trophy in 2022
  • final against Brazil
  • Sunil Chhetri scoring the winning goal

Why is this a hallucination?

  • The 2022 FIFA World Cup was won by Argentina, not India.
  • India has never won the FIFA World Cup, nor did it play against Brazil in a World Cup final in 2022.
  • Sunil Chhetri, while a prominent Indian footballer, did not score a winning goal in a FIFA World Cup final.
  • The final was actually between Argentina and France, and Argentina won.

Non-Hallucinatory Prompt

"Has India ever won the FIFA World Cup in football? Please provide a factual answer."

 

Explanation of Why the Non-Hallucinatory Prompt Fixes the Hallucination

Instead of presuming an event took place, the corrected prompt asks an open factual question.

  • It removes any assumption or bias from the wording.
  • It forces the model to verify reality rather than creating a fictional narrative.
  • Models like ChatGPT are better at answering factual questions than at correcting false assumptions inside a question.

Hence, this phrasing eliminates hallucination by clarifying intent and allowing the model to default to known facts.

 

Thanks & Regards

Bharat Varshney


Hello Team,

 

It’s been a while since I did research like this. Thanks for the wonderful activity. I skimmed through a couple of research papers, tried different prompts, and tested different LLMs, and loved it to the fullest.

 

LLM Used (ChatGPT, Perplexity, or Gemini): GPT 4o-mini

Hallucinatory Prompt: Is 3821 a prime number?

Explanation of Why this Prompt Causes a Hallucination:

Based on my analysis, I believe it comes down to two things: the nature and coverage of the training data, and the parameters that shape the LLM's response to a user request.

On the first point, these LLMs are trained to generate text responses (I know, they are now large multimodal models, LMMs). They have likely been trained on most of the web's content, but training them on mathematics and research papers is not that simple, as that material involves complex calculations, concepts, logarithmic expressions, and so on, which, to be frank, we sometimes do not fully understand ourselves. There are models specifically trained for that, but not ChatGPT, which is more of a "jack of all trades". That is good in itself, but it still needs reinforcement learning from human feedback, RLHF (we will come back to that at the end).

On the second point, since November 2022 the data and parameter counts that ChatGPT is built on have grown huge day by day; I guess it is happening every second now. If a model is not given diverse information, it may excel in areas it has frequently encountered but completely fail on things that seem easy to us, like this prime number, just because it has not encountered them enough.

 

Screen print of the result with the hallucination highlighted:

 

Non-Hallucinatory Prompt: Is 3821 a prime number? Analyze step by step.

 

Explanation of why the non-hallucinatory prompt fixes the hallucination:

I have used the prompt-engineering concept called Chain-of-Thought (CoT). It is like asking the model to explain its thinking step by step before giving the final answer. It is similar to school: when a teacher fires a question at us we answer quickly, whether it is truly correct or not, but when we see the same question on an exam we think it through step by step to make sure the answer we write is correct. CoT encourages the model to work through the problem the same way, generating the intermediate steps instead of jumping straight to the final answer. That is why, with this prompt, the model made a plan focused only on solving this problem, followed it, and produced the correct answer.
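
To show what this looks like in practice, here is a small, purely illustrative Python sketch: it builds the direct prompt and a CoT version of it, and includes a tiny helper for pulling the final verdict out of a step-by-step answer. The instruction wording and the "Final answer:" convention are my own assumptions, not something ChatGPT requires.

```python
# Sketch: a direct prompt vs. a Chain-of-Thought (CoT) prompt for the same question,
# plus a tiny helper to pull the final verdict out of a CoT answer.

QUESTION = "Is 3821 a prime number?"

direct_prompt = QUESTION

cot_prompt = (
    f"{QUESTION} Analyze step by step: check divisibility by each prime up to "
    "the square root of the number, show your work, and finish with a line "
    "'Final answer: yes' or 'Final answer: no'."
)

def extract_final_answer(model_output: str) -> str:
    """Return 'yes'/'no' from the last 'Final answer:' line, or 'unknown'."""
    for line in reversed(model_output.strip().splitlines()):
        if line.lower().startswith("final answer:"):
            return line.split(":", 1)[1].strip().lower()
    return "unknown"

# Example: parsing a (hypothetical) CoT response from the model.
sample_output = "3821 is not divisible by 2, 3, 5, ... up to 61.\nFinal answer: yes"
print(extract_final_answer(sample_output))  # -> "yes"
```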

 

(Optional) Screen print of the result of the non-hallucinatory prompt:

 

 

 

Optional but informational content:

Recently, ChatGPT introduced the memory feature for all users, which means it now has access to our previous chats and can adapt its style.

After the above two prompts, I asked the same question and the answer was unexpected.

Take a look:

 

It corrected itself, but it failed again in a later chat, shown below.

 

So it is always better to know the following techniques in order to reduce LLM hallucinations:

  1. Use CoT prompting (above).
  2. Ask the model to write Python code to check the answer; that works too (see the sketch after this list).
  3. RAG (Retrieval-Augmented Generation): upload a document with all the information related to identifying prime numbers, and that also works.
  4. Few-shot prompting: add a few worked examples (for instance, your own analysis of how you determine whether 23 is prime) and then hand the new number to the LLM.
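
For point 2, the deterministic check that the model can be asked to write (or that you can run yourself to verify its answer) is only a few lines of Python. This particular function is my own sketch rather than the code ChatGPT produced:

```python
# Deterministic primality check used to verify the model's answer.
import math

def is_prime(n: int) -> bool:
    """Trial division up to sqrt(n); fine for small numbers like 3821."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True

print(is_prime(3821))  # True: 3821 is prime
```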

Some more techniques exist, but in my case the options above helped a lot.

 

Resources to support most of the points mentioned above:

  1. The-mathematical-landscape-of-ai-enigma-prime-number
  2. Why-does-ai-being-good-at-math-matter
  3. Can_Artificial_Intelligence_find_Prime_Numbers
  4. Chain-of-thought-prompting
  5. Prompting-guide

 

Really grateful to be a part of this community - @JamesMassa, @Mustafa, @Daria, @Kat.

Keep growing and keep pushing us.  



As a follow-up, I used Gemini 2.0 for comparison with the same prompt I used in my Claude Opus 3.0 submission here, for testing and training and, importantly, for our organization's awareness. Our internal LLM is Opus 3.0, and we were surprised that it gave us the hallucination while Gemini 2.0 did not. We were under the impression that selecting Claude Opus over the faster Claude Sonnet would give correct guidance on a political celebrity. Gemini sourced the Canadian government's site the same way I did to correct our Claude Opus model. As a takeaway from this seminar and contest, we will continue to test and challenge our model for ongoing quality improvements to reduce hallucinations.

 


Thank you, everyone, for participating in this challenge, and special thanks to @JamesMassa for hosting it.


We’re happy to announce that the winners of this challenge are:

🥇@mothukurivi 

🥈@Dinesh_Gujarathi 


Congratulations to the winners! Keep an eye on your emails; we will reach out shortly to arrange your gift box delivery as well as a shareable certificate.

Stay tuned to ShiftSync for more exciting and informative events.

