
Week 2 Exercise - Head-to-Head: Evaluating AI Models



43 replies

parwalrahul
  • Author
  • Chief Specialist
  • 91 replies
  • March 21, 2025

@VimalPatel nice observations, especially the conclusion.

Both models have their own strengths depending on the situation they are used in. I think much of it comes down to the data each one was trained on.


parwalrahul
  • Author
  • Chief Specialist
  • 91 replies
  • March 21, 2025

@Jeethu I echo this observation: Gemini 1.5 Pro offers more relevant, practical testing examples.

The quality of the examples is really good with Gemini.

Sometimes I use it to generate examples and scenarios to understand concepts better.


parwalrahul
  • Author
  • Chief Specialist
  • 91 replies
  • March 21, 2025

@Saravanan s fair point: Gemini’s multimodal design and integration with Google services offer a more versatile and comprehensive user experience.

I think this is a big advantage that the Google ecosystem brings. Gemini might be a clear winner for things like YouTube summaries or learning from Google Drive data.


  • Ensign
  • 14 replies
  • March 26, 2025

Hello @parwalrahul,

My prompt was: How to integrate automation workflows within the GitLab CI/CD pipeline

ChatGPT: It gave detailed steps, such as:

1. Create or Modify the .gitlab-ci.yml File

2. Define Jobs

3. Use Environment Variables

4. Add Triggers and Conditions

5. Integrate with External Services

6. Using Docker

7. Set Up Runners

8. Monitor and Debug

 

Gemini 1.5 Flash: It gave a breakdown of how to do it, covering various aspects and levels of complexity:

1. Basic Automation within GitLab CI/CD

2. Advanced Automation with External Tools

3. Example: Automated Deployment to Kubernetes

 

Response Quality: ChatGPT provided a more detailed, step-by-step approach, which is useful for someone who has never integrated a workflow into a GitLab CI/CD pipeline. Gemini 1.5 Flash provided only general steps, which are not much help to someone new to the topic.

Accuracy: From my perspective, it depends on the user. Someone who already knows or has worked with CI/CD pipelines may find Gemini 1.5 Flash useful up to a point, while ChatGPT is the better fit for beginners.
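Steps 1 to 4 of ChatGPT's outline (a .gitlab-ci.yml file, jobs, variables, and conditions) can be sketched as plain data. This is only an illustration in Python: the job names, image, and commands are made up, and `undeclared_stages` is a hypothetical helper, not part of GitLab. The keys `stages`, `variables`, `stage`, `image`, `script`, and `rules` are real .gitlab-ci.yml keywords; the actual file would be written in YAML.

```python
# Hypothetical pipeline: job names, image, and commands are placeholders.
pipeline = {
    "stages": ["test", "deploy"],
    "variables": {"APP_ENV": "staging"},
    "run-tests": {
        "stage": "test",
        "image": "python:3.12",
        "script": ["pip install -r requirements.txt", "pytest"],
    },
    "deploy-staging": {
        "stage": "deploy",
        "script": ["./deploy.sh $APP_ENV"],
        # Condition: only run this job for commits on the main branch.
        "rules": [{"if": '$CI_COMMIT_BRANCH == "main"'}],
    },
}

# Top-level keys that configure the pipeline rather than define a job.
NON_JOB_KEYS = {"stages", "variables", "default", "include", "workflow"}


def undeclared_stages(config: dict) -> dict:
    """Map job name -> stage for jobs whose stage is not listed in `stages`.

    Assumes the config declares `stages` explicitly. Jobs without a
    `stage` key are treated as belonging to GitLab's default `test` stage.
    """
    declared = set(config.get("stages", []))
    return {
        name: job.get("stage", "test")
        for name, job in config.items()
        if name not in NON_JOB_KEYS
        and isinstance(job, dict)
        and job.get("stage", "test") not in declared
    }


print(undeclared_stages(pipeline))
```

A check like this catches a common mistake (a job pointing at a stage that was never declared) before the file is pushed; GitLab's own CI lint remains the authoritative validator.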


parwalrahul
  • Author
  • Chief Specialist
  • 91 replies
  • March 26, 2025

@Charmi07 nice. Thanks for sharing your experiment results.

See you in tomorrow’s session :)


  • Ensign
  • 5 replies
  • March 27, 2025

Hello Rahul @parwalrahul, first of all, thanks for giving us this quiz. I'm very grateful to be a part of these interesting webinars and quizzes, as I rarely come across such engaging opportunities.

I used both the ChatGPT 4.0 model and Gemini on the same image, and I found Gemini's result very interesting and more descriptive.

Gemini, being designed with multimodal capabilities, can handle both text and images. It can generate descriptions or convert text found in images into more readable formats.

ChatGPT

The image is a colorful illustration representing various iconic landmarks of India. It features a blend of architectural styles, signifying India's rich cultural heritage.

  • Elements in the Image:
    • The Taj Mahal, a symbol of love and one of the Seven Wonders of the World.
    • The Gateway of India, a historical monument located in Mumbai.
    • Traditional temples, likely representing the diversity of religious architecture in India.
    • Mountains in the background, possibly indicating the Himalayan range.

The top of the image prominently displays the text "INDIA" and "WELCOME TO INDIA," suggesting it is designed to promote tourism or celebrate the country's heritage. The vibrant colors and stylized graphics create a welcoming and festive atmosphere.

Gemini

The image is a stylized panoramic illustration showcasing iconic landmarks of India.  It's designed to be visually appealing and evocative of the country's diverse architecture and scenery.

From left to right, we see (although identification is not perfectly precise due to the artistic style):

  • A flock of birds: Suggesting freedom and the vastness of India.
  • Mountains: Representing the Himalayan range, a significant geographical feature.
  • India Gate (possibly): A triumphal arch in New Delhi.
  • Lotus Temple (possibly): A Baha'i House of Worship in New Delhi, known for its unique flower-like design.
  • Taj Mahal: The famous mausoleum in Agra, a symbol of India's rich history and Mughal architecture.
  • Qutub Minar (possibly): A towering minaret in Delhi, showcasing Indo-Islamic architecture.
  • A South Indian Temple (possibly): Representing the diverse temple architecture of South India. The specific temple is difficult to identify definitively.
  • Charminar (possibly): A monument in Hyderabad, Telangana.
  • Gateway of India (possibly): A monument in Mumbai.
  • Another arch (possibly): Possibly another historical arch, though the specific one is hard to determine.
  • Trees and greenery: Adding to the scenic beauty and suggesting the natural landscape.

The text "INDIA" and "WELCOME TO INDIA" is prominently displayed, making the purpose of the image clear:  tourism promotion or a general representation of India's cultural heritage.  The overall style is a blend of realism and illustration, creating a visually compelling image suitable for various promotional materials.
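One rough way to quantify the difference between the two descriptions above is to compare which named landmarks each one mentions. The sets below are transcribed by hand from the two responses (Gemini's include the ones it marked "possibly"; generic items such as "temples" and "mountains" are left out). Nothing here is computed from the image itself.

```python
# Landmarks named in each response, copied by hand from the post above.
chatgpt = {"Taj Mahal", "Gateway of India"}
gemini = {
    "India Gate", "Lotus Temple", "Taj Mahal",
    "Qutub Minar", "Charminar", "Gateway of India",
}

shared = sorted(chatgpt & gemini)        # named by both models
only_gemini = sorted(gemini - chatgpt)   # named by Gemini alone
only_chatgpt = sorted(chatgpt - gemini)  # named by ChatGPT alone

print("both:", shared)
print("Gemini only:", only_gemini)
print("ChatGPT only:", only_chatgpt)
```

Note that this counts mentions, not correctness: several of Gemini's identifications were hedged, so they would still need checking against the image.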


parwalrahul
  • Author
  • Chief Specialist
  • 91 replies
  • March 27, 2025

@sarika77 thanks for your participation and the detailed response to this experiment. Cheers!


  1. I tested ChatGPT 4.0 using a work-related prompt and found that the response was clear, detailed, and well-structured. It provided relevant examples and covered all aspects of the query accurately.

  2. After integrating the Gemini model, I ran the same prompt. The response from Gemini was clear but more concise. It provided direct answers but lacked some of the depth and extra details that ChatGPT included.

  3. Response quality: ChatGPT 4.0 gave a more detailed and structured response, while Gemini was more to the point and summarized information quickly.

  4. Accuracy: Both models provided correct answers, but ChatGPT’s response was more thorough, making it better suited for complex queries.

  5. Final reflection: ChatGPT 4.0 is better for detailed explanations, while Gemini is good for quick and concise answers. The choice depends on whether more depth or a faster response is needed.


parwalrahul
  • Author
  • Chief Specialist
  • 91 replies
  • March 31, 2025

@japankansara nice one. I would also like to share this LLM leaderboard, which ranks different LLMs on different kinds of tasks. Maybe check it out too:

Chatbot Arena (formerly LMSYS): Free AI Chat to Compare & Test Best AI Chatbots


Comparison:

Step 1 – Testing with ChatGPT 4.0
I asked ChatGPT 4.0 to generate an automated test script in Python for logging into an e-commerce website.

ChatGPT 4.0 Response:
The response was detailed and well-structured. It provided a complete Selenium script with steps to open the browser, navigate to the login page, enter credentials, click login, and verify the result. It also included error handling, wait commands, and proper explanations. The script was easy to understand and ready to use.

Step 2 – Testing with Gemini
After integrating Gemini, I ran the same prompt.

Gemini Response:
Gemini provided a working Selenium script, but it was shorter and missed some important details. It did not include proper error handling or wait commands, making it less reliable for real-world testing. The explanation was brief, and while the script worked, it needed improvements.

Step 3 – Comparison Summary

Response Quality:
ChatGPT gave a more detailed and structured response with clear explanations. Gemini’s response was functional but lacked depth.

Accuracy:
Both models generated correct scripts, but ChatGPT’s version followed best practices like error handling and wait conditions, making it more practical.

Reasoning:
ChatGPT 4.0 is better because it provides more detailed responses, explains concepts well, and includes best practices in its solutions. This makes it useful for both beginners and experienced testers. Gemini, on the other hand, gives shorter responses that may need additional improvements before being used in real testing scenarios.

Step 4 – Final Reflection
ChatGPT 4.0 performed better in this case because it gave a complete, well-explained, and reliable script. Gemini was useful but needed more refinement. For QA automation tasks, ChatGPT 4.0 is a better choice as it saves time and provides high-quality responses.
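For readers who have not seen what "wait commands" and "error handling" look like in a Selenium login script, here is a minimal sketch. It is not the script either model produced. The URL and element IDs in `LoginPage` are placeholders that must be replaced with the real site's values; the Selenium calls themselves (`WebDriverWait`, `expected_conditions`, `By`, `TimeoutException`) are the library's standard explicit-wait API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginPage:
    """Hypothetical page details; replace with the real site's values."""
    url: str = "https://shop.example.com/login"  # placeholder URL
    username_id: str = "username"                # assumed element ids
    password_id: str = "password"
    submit_id: str = "login-button"
    success_id: str = "account-menu"             # shown only after login


def login(driver, page: LoginPage, username: str, password: str,
          timeout: float = 10.0) -> bool:
    """Return True if the post-login element appears within `timeout` seconds."""
    # Imported here so this module loads even where Selenium is not installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    try:
        driver.get(page.url)
        # Explicit waits: block until each element is usable instead of sleeping.
        user_field = wait.until(
            EC.visibility_of_element_located((By.ID, page.username_id)))
        user_field.send_keys(username)
        driver.find_element(By.ID, page.password_id).send_keys(password)
        wait.until(EC.element_to_be_clickable((By.ID, page.submit_id))).click()
        # Verify the result by waiting for an element that only logged-in users see.
        wait.until(EC.visibility_of_element_located((By.ID, page.success_id)))
        return True
    except TimeoutException:
        # An expected element never appeared: report a failed login.
        return False
```

A caller would create the browser with `webdriver.Chrome()`, pass it to `login(...)`, and call `driver.quit()` in a `finally` block so the browser is released even when the test fails.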


parwalrahul
  • Author
  • Chief Specialist
  • 91 replies
  • April 1, 2025

@Yashvi_Mehta nice experimentation and reflections.

GPT is leading the market, and I am sure it will continue to do so unless Gemini makes some quantum leap.


Mustafa
  • Technical Community Manager
  • 70 replies
  • April 3, 2025

  

Important Note:


Hi, Everyone.

To anyone who attended the course, you have 15 days to submit your answers to all 4 of @parwalrahul's exercises before we close submissions on Friday the 18th of April. Certificates will start to be rolled out today, so keep an eye on your emails.

Thank you.


  • Specialist
  • 6 replies
  • April 3, 2025

Here is the structured approach to completing the task:

Step 1 – Test using ChatGPT 4.0 (Default Model)

Access ChatGPT

  • Log into AICamp where ChatGPT 4.0 is available by default.

Run a Prompt

  • Prompt Example: "Explain the impact of AI on healthcare in the next 10 years."

Step 2 – Load the Gemini Model

Add Gemini

  • Navigate to the model integration section on AICamp.
  • Add the Gemini (Google) model using the provided API key.

Verify Integration

  • Confirm that Gemini has been successfully loaded and is available on the dashboard.

Step 3 – Test Gemini

Run the Same Prompt

  • Use the identical prompt: "Explain the impact of AI on healthcare in the next 10 years."

Step 4 – Compare and Analyze

Response Quality

  • ChatGPT 4.0: Provided a detailed and structured response with specific examples.
  • Gemini: Offered a concise and clear response, similar in structure but slightly less detailed.

Accuracy

  • Both models provided accurate and relevant information.
  • ChatGPT 4.0 included more specific examples and details, which might be more useful for in-depth understanding.

Step 5 – Final Reflection

Key Takeaways

  • ChatGPT 4.0: Tends to provide more detailed and example-rich responses, which can be beneficial for comprehensive understanding.
  • Gemini: Offers concise and clear responses, which are useful for quick insights and overviews.
  • Both models are accurate and effective, but the choice between them may depend on the need for detail versus brevity.
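The "same prompt, two models" procedure above can also be scripted so the comparison is less subjective. The sketch below is an assumption-heavy illustration: AICamp's API is not described in this thread, so each model is represented by a plain Python function that takes a prompt and returns text, and `fake_model_a` / `fake_model_b` are stand-ins with canned replies. The metrics (word count, number of list items) only capture length and structure, not accuracy.

```python
import re
from typing import Callable, Dict

# Matches lines that start like a list item: "1. ", "- ", "* " or "• ".
LIST_ITEM = re.compile(r"^\s*(?:\d+\.|[-*•])\s+")


def summarize(text: str) -> Dict[str, int]:
    """Crude size and structure metrics for one response."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return {
        "words": len(text.split()),
        "list_items": sum(bool(LIST_ITEM.match(ln)) for ln in lines),
    }


def compare(prompt: str,
            models: Dict[str, Callable[[str], str]]) -> Dict[str, Dict[str, int]]:
    """Send the identical prompt to every model and summarize each reply."""
    return {name: summarize(ask(prompt)) for name, ask in models.items()}


# Stand-in models with canned replies; swap in real API calls here.
def fake_model_a(prompt: str) -> str:
    return "1. First point\n2. Second point\nA closing sentence."


def fake_model_b(prompt: str) -> str:
    return "One short answer."


if __name__ == "__main__":
    prompt = "Explain the impact of AI on healthcare in the next 10 years."
    for name, stats in compare(prompt, {"model_a": fake_model_a,
                                        "model_b": fake_model_b}).items():
        print(name, stats)
```

Numbers like these back up statements such as "more detailed" or "more concise", but judging accuracy still requires reading the responses.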

Hi Rahul

I started by running some typical prompts I use in our day-to-day work, like generating test cases or explaining technical processes. Both ChatGPT and Gemini were tasked with responding to the same set of prompts to ensure fairness in comparison.

ChatGPT, as usual, was quick to respond. It gave clear answers but sometimes skipped over deeper technical details that could be crucial for more complex testing scenarios. This makes it great for when you need quick answers that are easy to understand, especially if you're in a rush or the task isn’t overly complicated.

On the other hand, Gemini really shined with its detailed responses. It seemed to understand the technical depth of the prompts better, providing answers that were not only thorough but also closely aligned with best practices in software development. For instance, if you’re dealing with a detailed user story that needs to adhere to specific testing frameworks or methodologies, Gemini’s responses were spot on.

What I found interesting was how each model maintained its style across different tests. Gemini was consistently more detailed and precise, ideal for tasks that require high accuracy and detailed documentation. Meanwhile, ChatGPT’s straightforwardness was really useful for general queries and quicker tasks where complexity and depth were less critical.

So, in summary: if your work demands high technical accuracy and detailed documentation, Gemini is likely the better choice. For more general tasks, or when you need a quick turnaround without getting bogged down by details, ChatGPT could be more beneficial.


parwalrahul
  • Author
  • Chief Specialist
  • 91 replies
  • April 4, 2025

@ghanesh @Nikhilkulkarni cheers on completing this exercise. Great job!


  • Ensign
  • 6 replies
  • April 4, 2025

My findings and review.

Quality and accuracy
ChatGPT provided detailed and informative responses, often including examples and best practices. Gemini demonstrated strong contextual awareness, especially in automation-related discussions.

Regarding adaptability and reasoning, 
ChatGPT adapted well to variations in prompt wording and provided logical reasoning for test case creation and bug analysis. 
Gemini was effective in breaking down complex requirements, though its explanations were sometimes less detailed.

For QA and Automation Tasks
ChatGPT created clearer, more structured test cases and test data, making it better suited to QA tasks.
In contrast, Gemini performed more like a developer, using strategies typical of non-QA roles. This makes Gemini more useful for QA team members who have a development background.

For integration and usability
ChatGPT is better suited for structured QA tasks such as test planning, requirement analysis, and detailed documentation. 
Gemini, on the other hand, is stronger in AI-assisted debugging and provides automation insights that feel more tailored for developers rather than QA specialists.

In terms of completeness and clarity, 
ChatGPT’s responses were more comprehensive and well-organized, making them easier to use for test documentation. 
Gemini occasionally provided concise answers, which were useful for quick insights, but these sometimes lacked depth.

Overall Review:
ChatGPT is more popular and detail-oriented, making it highly effective for QA professionals. 
Gemini is slightly less popular in comparison but offers more technical insights, which may appeal to users with a development focus.


parwalrahul
  • Author
  • Chief Specialist
  • 91 replies
  • April 5, 2025

nice ​@NitinMore 


  • Ensign
  • 5 replies
  • April 10, 2025

@parwalrahul  week 2 exercise

Both ChatGPT 4.0 and Gemini are effective and powerful in their own ways. However, I found that Gemini provides more precise and detailed information when it comes to technical details on applications, tools, and languages.

