Below is a recap of our webinar with @dave Colwell, the Artificial Intelligence & Machine Learning VP at Tricentis. You can view the original in the link below, just be sure that you’re logged in 🙂

Is Generative AI taking over testing? David Colwell webinar recap

The 3 main questions

Why do we need AI?
How can we use AI?
How do we test AI?

1. Why do we need AI?

Before we begin, we need to address the idea of the human attention span. Take a moment to consider these facts about attention across various demographics:

U-YGZHwVNnWlt2IbeFd7PAipNPM958ZRDXY9uEhG_ydr1-vXky5CWeVa1H9y2-gM_HWr6p92O6jaORZKDbc2Vz8ch0Shs2Y7AwF2gvaxoF1SfV2X6-MyRQ18hUa1RbQgEvBy2Gqxqpl-YpQAK2KS43g

A human’s attention span, put simply, is not good. It gets even worse when you start putting multiple screens in front of someone, a common scenario for many of us today. We’ve found that the average person can hold roughly 7 pieces of information in their working memory at a time, give or take. With apps like TikTok and other distraction-heavy apps on the rise, this stat doesn’t seem like it’s going to improve any time soon.

So how does this relate to AI?

Consider what the modern business looks like. We might like to think of ourselves as organized or tightly knit:

kpW4F_F8lwWK-K-j2tSOFIklPLIhzXXMpQ2OY_--BEraba3xthhtfHYZu4HwKt_8G3hF5pXKyd16edOPrRlmYqtgTM1pPgmKa6fT35YA-hXkDPCjNockMJjNI8bSFtou0_xfXkx-6LPMY0lVwzVUf04

But in reality, our systems are often vastly more complex:

DZXucX7Rg_0-hYsSG-3Y-bIjVya9rpdjlpDE9GTcqj-WPwalXg8QP2Bm7bAHvDr8QkR3b6J-UjqjEXI8J78CKhIp1ZqqOL0zrkJOrUALSkg5aWqD2tADCLXCHzlKvLrpyY4icpg5y0JKM7TQ4CB-i6Q

With the amount of information we process on a daily basis, added to the complexity of our systems, there are hundreds of millions of potential ways for customers to interact with data. These conditions create a perfect environment for defects finding their way into your finished product. Compare this amount of information to the 7 statistics on human attention span listed at the beginning, and it’s obvious that we are clearly well outside the scope of what people can retain in their own heads.

To make things worse, bugs generally don’t make themselves readily apparent. Bugs are often hiding in the intersections and logic gates of how our data interacts with itself. The amount of information that needs to be kept track of in order to account for not only finding, but solving these bugs, is simply overwhelming. On top of that, the real world comes with real relationships- Business partners, developers, and project managers. Are we ready to go live? Is the product finished? How long until launch? What’s the status?

This is where AI thrives. Not in the decision-making process, but rather in acting as a filter. AI is trained on massive data sets to find patterns. By training over and over again on huge amounts of information, it’s able to eventually draw conclusions on what certain patterns or data sets mean. With enough time, and enough iterations (repetitions), AI is able to begin funneling information from the enormous data sets that companies process down into something that can be managed better at a human level.

With AI now taking the role of filtering information down to a manageable amount, humans can utilize one of their best traits: discernment. Once humans are given categorical or distinguished options to choose from, they can use their own judgment as to what the best choice might be, an area in which AI struggles.

It’s important to be aware that AI roughly falls into one of two major categories- Narrow Networks and Generative Transformers. As the name suggests, Narrow Networks provide precise answers to a narrow set of questions, while Generative Transformers (the category Generative AI falls under) give general answers to natural language questions. ChatGPT, which saw a dramatic rise in popularity and awareness in 2023, is perhaps the most well known, and is a form of Generative AI known as a Large Language Model (LLM). ChatGPT specifically was trained using the Internet, which consists of hundreds of billions of different parameters. This has allowed LLMs to become very good at answering questions in a human-like manner, as it has taken billions upon billions of sentences, scenarios, books, forums, blogs, etc. and analyzed their patterns to such an extent as to be able to construct new information very similar to the way a real person does. Generative AI isn’t “thinking” in the same sense that humans do, but the behavior and language it uses can often create an impression as such.

Narrow AI, on the other hand, is trained differently. Narrow AI instead looks at sets of data labeled by humans to help establish correlation and/or meaning. In fact, this is precisely what’s happening when you fill out a CAPTCHA before entering a website:

kjhaLX5dXm4wtr2Jx6uJnx9ZPOg0PmWGp4AEFnMH3vBlh9_yuahP_Z7yYIJq01y1WR9VJwlUa0TNILoCuWZxwTyoTiVlQaFnqt2D41pDum1Rpd5Yi_PXDGO-Vw8Xg7BM7Fi7nAZUwuhVnQeNXQuukdw

Your CAPTCHA labeling is used to help train Narrow AI. Tricentis uses similar data sets for things like test cases, reasons for failure, and visual applications. Tricentis Tosca’s Vision AI is a type of Narrow AI that can study the behavior of a user (a cursor on a screen) and begin labeling actions such as clicking a button, typing into an input field, or interacting with a table. It learns how to “see”, so to speak, a specific screen or set of screens in order to better create test cases or perform designated behavior in a narrow environment. This is not the sort of AI that would understand if you asked it a human-like question, but in an environment where it’s been trained, it can perform far beyond the manual execution of a person. Narrow AI can’t do much outside the environment in which it’s been trained, but where it has been, it excels.

2. How can we use AI?

Within Tricentis, Narrow AI sort of acts as the sensors that drive our applications. Tools like Tosca, qTest, and Testim are the foundation for many of our products. These tools effectively allow our application to see and do more, from Vision AI to Impact Analysis. These sensors are responsible for processing those huge amounts of data and producing the simplified questions for humans to then be able to process. You can see three layers below combining the use of these tools for Tricentis’ purposes. Prior to Generative AI, there were only two layers to this process (our products + Narrow AI), but you can see that Generative AI has been added on top as the initial step:

fyjo4c1WWhqv_6l343mYXu7rqIwdh_z-4g0lNPEw1ms0aLUs7G-U-CFYWdiST9gQDLmAXrdJTCDx3spJ0inD5GjXoHmtjMZFf87v5yVN-cb8P9cqzfk0-3bj44DdXArXnDNoe7BqR8KdQaJfyZCAdc4

We call these tools Smart Assistants in tandem with one another, though we also jokingly refer to them as a (slightly arrogant) intern. They’re very good at accepting a task, completing it quietly on their own, and then coming back with it finished, without actually explaining how it was done. Sometimes this works out to a superb solution, but not always. Solutions can be perplexing or left with no explanation as to how it was constructed. Regardless, the advantage of doing things this way is that we can feed natural language to the program rather than code or another specific method.

So let’s take a look at what we use Narrow AI for. We mentioned earlier how Narrow AI takes labeled sets of data and draws patterns and correlations from them. You can see what this actually looks like pictured below:

s3Jc9OmdjRsce9xV5VSBT6Oj3Bo3tpFBdTJ4FpLcSVhj15aoDCfHK9nD-_OQ9HwJKZ8I5IRLgHbzJO-UCHSXbQa6xDahACXaD-d51idyqIpMtdt_gByM4s1mt-cXPFoXWVSHtHcofV6Oi5mrhhE3Urw

You might be able to start piecing together how something like this might be valuable to testers. Narrow AI knows how to recognize buttons, inputs, and other specific elements on screen. Instead of making testers redundant, combining Generative and Narrow AI allows testers to test both earlier and more overall.

Another form of Narrow AI is AI-powered impact analysis, which can act as a signal processor. For example, when a change is made to your application, impact analysis can process all of the information about the change, and not only narrow down that information, but specifically point out what had the highest likelihood of being impacted by that change:

YDvgNU8pXoeTHRQ7au2g8gIQz15U3qCZcd5LQ1h6fCRuragckb_HJA3OQRdjdFBdYwCYiIX6NDI7bTMWWYvnXAbG4uwn9_6m46PR9-dM3n1W1iC17FhtnWaH8631ARK4GAYTi8zTY2FGTkQLPj-J0Io

This is a form of signal processing, which would take an enormous amount of resources to process manually. With large volumes of changes happening on a regular basis to an application, hundreds of executable files are being affected, making it incredibly difficult to sort out the most impacted components within a system. To reiterate- this is not AI replacing testers, this is AI allowing testers to do their jobs better.

This is also the case for Tricentis Test Automation (TTA) for Salesforce, in which AI is not used to replace testers, but to enable them to do their jobs faster:

FjP2E6K5bVwMgcKsuHK792c08p3XTCcXxaIxSZ-C1UUH3ODuZSZXsbdB0Ah-gpx_HsVUxu8yr4IZGVs-PsZ2wf1AdVnY85c3sywS1OKj1Jdx54CTmem6crwl0UPKUh4votQIYRI6WHHEt9r7JJWy6Zc

TTA looks at your Salesforce configuration and automatically constructs test cases for you by looking at your historical usage and test cases along with your data profiles. This allows you to both focus and write better test cases, as well as write more of them. The main takeaway here is that accomplishing more in the world of testing has never been a threat to testing jobs. Rather than replacing the role of the person at a company, AI allows them to accomplish things better and faster.

So what about Generative AI’s role specifically? This is generally the source of people’s fears of replacement due to its ability to mimic human speech and behavior (i.e. it appears to respond with thought). Is this a threat to us, or can we use it in some way? This example highlights how AI can be incredibly exciting while also emphasizing how and where it needs to be supervised. You can command Generative AI to look at a requirement and then write a set of manual test cases based off of that requirement. The advantages of a capability like this are obvious. However, what happens if you blindly trust an AI’s suggested test cases without review? As you can probably guess, you’ll be left with test cases that are irrelevant and repetitive. Why does this happen? Generative AI doesn’t have the ability to think about the larger context of your organization. It hasn’t been trained on your company or your project, and will thus make many of these basic mistakes.

Does this mean AI’s capabilities aren’t useful in scenarios like this? Of course not. It still acts as a great way to receive feedback or review for requirements. Many test cases may not need adjustment or modification, but the overall process of using AI in this manner allows us to more quickly identify and correct edge cases and defects.

Finally, Generative AI acts as an excellent knowledge assistant. On a day-to-day basis, people are required to process and take in all kinds of broad information, from Jira tickets, to Slack conversation, to meetings, etc. Using the exact same capability of analyzing large sets of data, AI can scan documentation, resources, or other large text bodies for information, which can be requested and given on command, cutting out significant portions of tedious work in finding information manually.

lgRZwYCu4_1Qw9t3mRpNvefqFS1ZIqiH6pt2AdF1ohyvEaukaxG_kIDpS6qdk-8IxuLc4qHZ0Ox7HWGVihX8D9svZ0eVI76MTb31H6H53PAE33ubby5dKYNY1PMVqIgIsSuDNCO3TrNu_sgg_HUq_7w

Parsing through information like this is already one of the primary ways people use ChatGPT, and is now being rapidly adopted and utilized by companies across the globe for internal information. Again, this use of AI is incredibly useful as an assistant or proof reader, but isn’t quite capable of being left completely alone without a human there for correction.

3. How do we test AI?

Testing AI can be a daunting task, because, having been trained on the data of the Internet, the underlying technology doesn’t conform to a traditional set of if-then statements. While many of the testing methods have undergone change, the types and techniques of testing are still familiar, along with a large portion of similar terminology. There are certainly new techniques and methods to follow, but the approach is not entirely foreign.

You can see examples below of terminology used in an AI context that equates to terms we’re already familiar with:

Hk56X0QId5UOv3O-YhmKdRY3_aXSXLt3OpZMPaji93Ejo-qzZKa64cVQ_wp3TStyzLZ1VeOeE-CRLIocA2FzYQgYBixwDfANTm4aQsF36xae6nDgqtXFDNWOReBvLe4fTj21Vg5hnzpbzRo10cUvmMg

With these in mind, we can start using various testing techniques:

X32fO1xZq__DBM3TGXADnRVlv6JnC67TaSBf1-KFh2zx3kt3CcG8390F52FM_V3nZKRK1ai8SV4dN8kbfIisXBHS9utNKw-KVIA4B0tJcca3aozZcKq66PeWYjguwT4x5HBdoHbYNiPZRG9oC-ZZoKU

Instead of testing strictly with logic statements or mathematical equations, we can start targeting specific behaviors in the AI to find whether it stays within our desired confines. Can the AI be prompted in such a way that it crosses ethical boundaries (Forbidden topics)? Can it be overloaded with information so that it bypasses the safeguards we’ve given it (Context window overflowing)? Tests need to be administered in such a way as to account for these types of AI manipulation. Searching for holes in AI like this is a sort of security security testing, or could even be considered ethical hacking.

One of the big reasons this is important is because changes in AI demand it:

RqNgQ8JOwcYNmfHrRom5ulvA7MikiNO9A384g5-EdXIf9Qd7yDwZPliIKgEx_gUwD1_ICGSjgbnzwyEwGGIqe1EKLperqZxMePu7XS_8ciTQScohjEU_sItDSU8wmQppqKkNU_csiSM9pQUzgQxIM_w

During the update from GPT-3.5 to GPT-4, certain areas of performance improved, while other specific categories declined. These results emphasize the need for continual testing in a world where change is effectively constant.

What this means is we’re now living in a world where AI can recognize and filter change to a program or a system, but still needs human validation for full functionality. The more information AI has to process, the more human validation will be needed to check and balance that processing. Testing and validating the results of AI remains a job for humans. We can summarize the ideas discussed here by essentially acknowledging that AI does some of the work we used to do as humans, but it certainly doesn’t do everything, and for the work it does solve, it creates more work that people will ultimately be responsible for.

We’re still in the early stages of what’s possible

With everything we’ve discussed in mind, we of course have to keep in consideration that we’re living through the birth of an AI revolution. New forms of artificial intelligence are being released on a weekly basis. Deployment methods and testing methods are constantly evolving. Even for professionals in the field, the speed at which new technology is being put out into the world is a struggle to keep up with. The best thing you can do as a tester today is to stay educated on what’s out there. Become familiar with terminology, study prompts, prompt hacking, and see if you can find a way to break AI apps as they enter the market. Not only will you be providing an invaluable service to your own organizations, but you’ll possibly even encounter issues that others are unaware of.

Can you remember all the facts about attention span from the beginning of this article?

Most can’t. And that’s why we need AI.

Webinar recap 🎥 - Is Generative AI taking over Testing?

Is Generative AI taking over testing? David Colwell webinar recap

1 reply

Is Generative AI taking over testing? David Colwell webinar recap

Sign up

Login to the community

Scanning file for viruses.

This file cannot be downloaded