AI QA Test Engineering: Testing AI Applications with the Power of AI
This event is free for community members. To participate or receive the recording, first register in the community (see instructions below), then click Attend to reserve your spot.
What you will learn when you register for this webinar:
1. Testing AI Applications with DeepEval
DeepEval is an open-source evaluation framework designed for structured and automated testing of AI outputs. It allows you to define custom metrics, set expectations, and benchmark responses from LLMs. In this session, we'll explore how QA engineers and developers can use DeepEval to test the quality, accuracy, and reliability of AI-generated responses across different use cases like chatbots, summarization, and code generation.
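As a rough illustration of the approach (not necessarily the exact examples from the session), a DeepEval test can be written like an ordinary pytest function. The question, captured answer, and threshold below are placeholder assumptions, and the metric expects the `deepeval` package to be installed with an LLM API key configured for its judge model:

```python
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_chatbot_reply():
    # Illustrative input and captured model output (placeholders, not real data).
    test_case = LLMTestCase(
        input="What does the return policy cover?",
        actual_output="Items can be returned within 30 days for a full refund.",
    )
    # Fail the test if the answer's relevancy score falls below the threshold.
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
```

A failing score surfaces as a normal assertion failure, so the same test can run in an existing CI pipeline alongside conventional checks.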
2. Testing AI Applications with LLM as Judge
LLM-as-a-Judge is a powerful technique where an AI model evaluates the outputs of another model. Instead of relying solely on manual review or static metrics, we'll learn how to use trusted LLMs (like GPT-4) to provide qualitative assessments, grading correctness, coherence, tone, or factuality. This method enables scalable and human-like evaluation in real-time AI testing pipelines.
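At its simplest, the judge can be a single chat-completion call that applies a grading rubric to another model's answer. The rubric prompt, model name, and sample question in this sketch are assumptions for illustration only:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = (
    "You are a strict QA reviewer. Grade the ANSWER to the QUESTION on a 1-5 scale "
    "for correctness and coherence, then give a one-sentence justification.\n"
    "QUESTION: {question}\nANSWER: {answer}"
)

def judge(question: str, answer: str) -> str:
    # Ask a trusted model (e.g. GPT-4) to act as the judge of another model's output.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": RUBRIC.format(question=question, answer=answer)}],
        temperature=0,  # keep the grading as deterministic as possible
    )
    return response.choices[0].message.content

print(judge("What is the capital of France?", "The capital of France is Paris."))
```

In a real pipeline the rubric would ask for structured output (for example a JSON score) so results can be parsed and asserted automatically.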
3. Evaluating LLMs with Hugging Face Evaluate
Hugging Face's evaluate library offers a robust suite of prebuilt metrics and tools to measure the performance of LLMs and NLP models. This topic will cover how to integrate and use evaluate in your testing workflows to assess text generation, classification, translation, and more, using standardized metrics like BLEU, ROUGE, and accuracy, alongside custom metrics for GenAI applications.
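As a quick taste, most metrics are loaded by name through `evaluate.load`. The predictions and references below are placeholder strings, and the ROUGE metric additionally requires the `rouge_score` package to be installed:

```python
import evaluate

# Load a standard text-generation/summarization metric by name.
rouge = evaluate.load("rouge")

predictions = ["The cat sat on the mat."]
references = ["A cat was sitting on the mat."]

# Returns a dict of ROUGE scores that a test can assert against.
results = rouge.compute(predictions=predictions, references=references)
print(results)
```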
About Karthik:
Karthik K.K. is a consultant, blogger, and tech enthusiast with over 20 years of experience in Software Automation Testing. He is passionate about learning and experimenting with cutting-edge tools that take automation to the next level.
Lately, his focus has been on integrating AI with software testing — exploring how Generative AI, large language models (LLMs), and intelligent tooling can enhance QA workflows. He has been actively building and researching with tools such as LangChain, DeepEval, Playwright MCP Server, RAGAs, and even fine-tuning LLMs.
Karthik works hands-on with modern frameworks like Playwright, Cypress, Selenium, and Appium, and codes in JavaScript, C#, and Python — choosing the right language for the task at hand.
He is also deeply involved in cloud-based testing, Docker, and the development of event-driven, scalable test architectures.
Whether it's crafting smarter test strategies or diving into AI-powered automation, Karthik remains curious, innovative, and always ready to explore what's next.
How to register for an event if you are not a member:

- Create an account.
Password advice: use a password generator, or avoid dictionary words (even when using special characters).
- Check your email inbox for a message from ShiftSync. Click the button in the email to activate your account.
- Go back to this page and click Attend.
- Now you are registered! ✨