
Challenge: Have you ever caused a bug in a live application? And on a scale of 1-10, how much chaos did it cause?

  • March 24, 2026
  • 13 replies
  • 300 views

lewisp707

... Because I have, a couple of times. The most chaotic one was while working on a national health application in the UK, where a refactor I made to the API to improve testability meant that patients nationwide couldn't access information about their prescriptions for a couple of hours during the day. Panic set in when I realised my changes had gone through to production too soon... Revert, revert!!!

I would rate that a 7 out of 10... Do you agree? Anyone got a more severe bug than that?


Share your failure story in the comments, and I will pick three winners who will get the book “Contract Testing in Action: With Pact, PactFlow, and GitHub Actions” from ShiftSync.

The challenge closes in 10 days.


13 replies

PolinaKr
  • Community Manager
  • March 24, 2026

I once managed to “break” our entire HubSpot communication flow.

We were preparing a campaign and I wanted to clean things up a bit: remove duplicates, fix some workflows, make the segmentation more precise (it always starts with good intentions).
At some point, I updated one of the lifecycle stage rules and tweaked a workflow filter that didn’t look right to me. 
What I didn’t realize was that this workflow was powering multiple automations (email sends, lead assignments, you name it). It turned out I had effectively blocked contacts from entering several key workflows. At first, when nothing new was coming into the flow, I thought: “Quiet day, nice.”
But then other departments started panicking, and I realized what had happened.

Fixing it took way longer than causing it!! 😂


I set a device under test (a battery) on fire by not properly connecting the wires 🔥 Everyone was safe, because it was in a lab with proper safety measures. But I was super scared 😅


IOan
  • March 25, 2026

It was a while ago, when I first read about load testing. Before Black Friday, I ran a mini load test on a local vendor's website with no intention of breaking it. It turned out even my mini load test was enough to crash the site's search. They recovered in time for Black Friday. Who knows... maybe my mini test helped :)
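For anyone curious what such a “mini” load test might look like, here is a minimal sketch using only the Python standard library. The endpoint and numbers are hypothetical, and as this story shows, you should only point it at systems you own or have permission to test:

```python
# A minimal "mini load test" sketch: fire N concurrent GET requests
# and tally the HTTP status codes that come back. Stdlib only.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def hit(url: str, timeout: float = 5.0) -> int:
    """Send one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status

def mini_load_test(url: str, requests: int = 50, workers: int = 10) -> Counter:
    """Issue `requests` GETs using `workers` concurrent threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(lambda _: hit(url), range(requests)))

# Example (hypothetical local endpoint):
# print(mini_load_test("http://localhost:8000/search?q=deals", requests=20))
```

Even ten concurrent threads can overwhelm an unoptimized search endpoint, which is exactly the kind of surprise described above.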


dharmendratak

A change that looked “simple” almost turned into a production nightmare.

Earlier, all images across our app and web were exposed via S3 URLs.
Everyone knew it. Even the client.

Then came a request:
“Let’s secure the S3 URLs.”

Sounds straightforward, right?

 

The change was implemented.
But there was a catch — those URLs were deeply embedded across almost every screen and flow in the app. Around the same time, I had to step away for a few days due to health issues. Before leaving, I ran one round of testing and caught a few issues related to the new URL logic.

When I came back, production was scheduled for the next day. On the surface, things looked stable, and I had already tested earlier. But something didn’t feel right. Call it the QA instinct.

 

I said:
“If nothing has changed, we should be good…
but I’d still like to run one more cycle.”

That one decision changed everything.

 

Just before production, multiple issues surfaced:

  • Event creation flows breaking
  • Switching business within the app causing inconsistencies
  • iPhone app crashing on event listing
  • And then… AI-integrated features started failing

 

We had to stop the release. Then postpone again the next day. For a moment, it felt like everything was connected… and everything was breaking.

 

But here’s the part that mattered:

The client didn’t panic.
They said — “This is how we learn. Let’s fix it properly.”

 

Lesson?

Sometimes the biggest risk is not the change itself…
It’s underestimating how deeply that change is connected to everything else.

And sometimes, the most important QA decision is simply this: “Let me test it one more time.”

 

Chaos level?

A solid 8/10
just one push away from becoming a production incident.


PolinaKr
  • Community Manager
  • March 25, 2026

A change that looked “simple” almost turned into a production nightmare. […] Chaos level? A solid 8/10, just one push away from becoming a production incident.

I feel you! I also messed up URLs once. A little differently, but still!!


PolinaKr
  • Community Manager
  • March 25, 2026

It was a while ago, when I first read about load testing. Before Black Friday, I ran a mini load test on a local vendor's website with no intention of breaking it. It turned out even my mini load test was enough to crash the site's search. They recovered in time for Black Friday. Who knows... maybe my mini test helped :)

For sure it did help! 


ersourabhskjain

... Because I have, a couple of times. The most chaotic one was while working on a national health application in the UK […] Anyone got a more severe bug than that?

@PolinaKr @lewisp707 Once, I missed checking the higher-level authorisation while testing, and passed the story using only the lower-level credentials. It was in the testing environment. 😁

Only later did I realise what a mistake I had made.

I went back to the story requirements and retested everything against them.

From that day, I have always checked the authorized users.


PolinaKr
  • Community Manager
  • March 25, 2026

@ersourabhskjain That’s why mistakes are useful, right? We learn so much from them 


  • Apprentice
  • March 25, 2026

I once sent out a marketing email where I didn't test all the links before sending - the filler 'xxx' was still present when the email was sent to 800k senior citizens - linking them directly to porn.


ersourabhskjain

@ersourabhskjain That’s why mistakes are useful, right? We learn so much from them 

Yes @PolinaKr, from then onwards I always make sure to verify authorisation, and I made a note of it for quality.


parwalrahul
  • Navigator
  • March 28, 2026

I once caused a real, physical IoT device to crash and get bricked through my software testing.

 

I test IoT software that gets loaded onto smart industrial IoT devices. I once tested a use case where, while a firmware upload to a device was in progress, you could still access that device from another instance of the web browser application (BUG 1).

The worst side effect was that you could even trigger a parallel firmware upload to the device from this new application instance (BUG 2).

RESULT: Device got bricked. (BUG 3)
 

After Bug 3: No LEDs. No response to reboot. Nothing…  This was a 100/10 issue for us.

 

It was a big issue for us: our devices were already on the market, so we had to fix it and roll out an updated software application that restricted all of this.
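The missing guard here is essentially per-device mutual exclusion. A hypothetical sketch (not the actual product code) of how a second session's upload attempt can be rejected instead of bricking the device:

```python
# Toy model of a device that allows only one firmware upload at a time.
# A non-blocking lock acquire rejects a second, parallel upload attempt.
import threading

class Device:
    def __init__(self, device_id: str):
        self.device_id = device_id
        self._upload_lock = threading.Lock()

    def start_firmware_upload(self) -> bool:
        """Try to begin an upload. Returns False (rather than allowing a
        parallel upload) if another session's upload is in progress."""
        return self._upload_lock.acquire(blocking=False)

    def finish_firmware_upload(self) -> None:
        """Release the device once the upload completes or fails."""
        self._upload_lock.release()
```

In a real multi-instance web application the lock would live server-side (or on the device itself), but the principle is the same: the second session must be told "busy", not handed the flash interface.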

This one bug taught me a lot about the IIoT industry and why it is regulated to some degree.

Testing uncovers interesting lessons. Thanks for this challenge, Lewis.

 

 


Scale of DESTRUCTION: 11!

I used to work on control systems for dynamometer test rigs. We asked the customer for a list of control commands and added them to the drop-down control in the order they gave them to us. STOP being the first one on the list made sense as a default setting. And since the users were expert users, we didn't question it any further.

Roll on a few months after release and we got a report that one of the test cells had exploded, causing about half a million in damages! 

It turns out that the worst thing you can tell a powerful dynamometer to do while the engine is at 9k RPM is to STOP. It applied an equal and opposite force and tried to stop the drive shaft instantly. Anyone with even a rudimentary grasp of physics knows that this is bad! The engine exploded, and pistons embedded themselves into the test equipment and destroyed everything.

The cell operators fell back off their chairs at the explosion, though thankfully those were the only injuries they sustained.

Someone had dropped a command step in and forgotten to change it from the default setting. The COAST command was what they wanted: a nice, controlled coast-down in power.

That day I learned that you should always question assumptions.


PolinaKr
  • Community Manager
  • April 7, 2026

Thank you everyone for participating in this challenge! And a special thank you to @lewisp707 for hosting it! 🎉

Our winners are:
@RichHintonTests
@jayhco
@maryia.tuleika

I will reach out to you shortly to organize the book deliveries! 💌