How to deliver quality at scale: what I've learned about software testing from 7 years in Big Tech

  • June 19, 2026
avishwan

About Arun Vishwanathan:
I work as a Senior Software Development Engineer in Test at Apple, Inc., in the Machine Learning organization. I create testing automation tools and frameworks for use within the team and across other teams, with a focus on efficiency and improving productivity. These tools help qualify the product stack as a whole and enrich the customer experience. As part of my role, I also present product health statistics to senior company executives.

You will learn:

  • How Big Tech ships products fast without breaking quality
  • How to assess and prioritize risk when product cycles move faster than test coverage can keep up
  • Why testing ML systems requires a different mindset than traditional software testing
  • How to make the impact of software testers visible
  • How developers and testers can split ownership to ship faster

Apple ships software to over two billion active devices. Every iOS release, every macOS update, every new Siri feature lands simultaneously on hundreds of millions of screens across countries. The quality expectations are extraordinary, and yet if you peek behind the curtain at how testing works there, you might be surprised.

There are no two-week Scrum sprints, no sprawling QA ceremony calendars, and no test-driven development doctrine handed down from on high. Instead, there is something rarer and harder to teach: radical ownership, relentless speed, and the quiet expectation that you, the test engineer, will figure it out.
After seven years as a software test engineer at Apple, working primarily on infrastructure tooling and partly in the machine learning space, I want to share what I have learned: what makes testing in this kind of environment different, what approaches actually work, and how the discipline itself is changing as AI accelerates everything further. These are lessons I gathered from direct experience, and most of them, I believe, apply well beyond any single company.

How engineering culture shapes testing

Apple does not run engineering the way most companies do. From day one, the expectation is that you are competent, need little hand-holding, and will take ownership of your area and drive it forward. Process exists where it earns its place. Rather than multi-week regression cycles or approval chains layered on by default, the emphasis is on traceable ownership and visibility, so that engineers know who is accountable for what and risk is surfaced early.
You are trusted to know what good looks like and to get after it. For engineers who thrive on that kind of ownership it is energizing. For those who need a lot of structure to feel comfortable, it can be a difficult adjustment. I have seen both flavors in my career.

What this autonomy comes with is shared accountability, supported by visibility and traceability across the team. If a regression ships, the expectation is to explain what happened. Not just "there was a bug," but what coverage gap existed, what edge case was not considered, and how the framework should be updated so it does not happen again. That loop, ownership plus shared accountability, is really the foundation of how quality works at scale.

This model is worth examining for any enterprise testing team. When quality problems arise, the easy answer is to layer on more sign-offs, more review gates, and more documentation. A more durable answer is to make process intentional. If a sign-off, meeting, or document helps people understand risk and make a better decision, it is genuinely useful. If it only exists so that everyone can say the box was checked, it probably is not doing much for quality. The goal worth aiming for is smarter, risk-based, and where possible automated governance, paired with engineers who exercise real judgment. Compliance and outcomes are not the same thing, and confusing them is one of the more common failure modes in enterprise QA.

Risk over ritual: why one testing method rarely fits every team

One of the first things you notice coming from a more traditional background is that no single testing methodology is treated as universal. Test-driven development is genuinely useful for certain types of software, particularly well-scoped, stable API surfaces. But for some fast-moving consumer and ML features, the product cycle moves too quickly for any one ritual to fit every team. Features get designed, built, and iterated on quickly. Waiting for a formal test strategy document to be approved before testing begins may mean you are already behind. 

Testing happens alongside development, not after it. You are expected to engage early, ask questions, and start thinking about coverage even before the feature is fully spec'd out. Your value lies in assessing risk quickly, identifying what matters most, and getting coverage in place before the next wave of changes arrives.

And the changes do arrive constantly. Your testing focus might shift mid-week. A feature you were heads-down on Monday might be deprioritized by Thursday while something else lands and needs immediate attention. The faster you can triage, re-orient, and still deliver meaningful coverage, the more valuable you become. Process should serve risk reduction and speed. In high-velocity environments, the most effective test engineers are the ones who can assess risk on the fly, prioritize coverage where it matters most, and lean on shared platforms and reusable frameworks to scale that judgment across the team rather than re-deriving it on every project.

Testing machine learning systems: a different kind of problem

Working in machine learning testing adds a layer of complexity that deserves its own discussion, because the standard testing playbook simply does not transfer cleanly.

ML features do not behave like traditional software. There is no deterministic input-output mapping you can write a clean assertion against. You are evaluating things like response quality, behavioral consistency, relevance, and tone, and a lot of that requires genuine judgment rather than just tooling. You have to develop a feel for what "good enough" looks like and what counts as a regression versus natural model variation.

On the automation side, your test infrastructure needs to keep pace with model updates that can change behavior overnight. Building test frameworks here means building for adaptability, not just correctness. Too rigid and it will break constantly. Too loose and you will not catch the things that matter.

A few principles I have found useful for ML testing specifically:

  • Separate behavioral testing from performance testing. Behavioral tests ask whether the model is doing the right thing qualitatively. Performance tests ask whether it is doing it within acceptable latency and resource bounds. Mixing the two tends to produce noise that is hard to interpret.
  • Define regression in terms of population, not individual outputs. Because ML outputs are non-deterministic, asserting on a single output is rarely meaningful. What you can assert on is the distribution of outputs across a representative population of inputs. A regression is when that distribution shifts in a direction that matters to users.
  • Build evaluation sets that reflect real-world edge cases, not just clean examples. The failures that matter most in production tend to come from the edges: unusual phrasing, unexpected contexts, inputs the model has not seen before. Your evaluation set should deliberately over-represent these.
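
The population-level idea above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any internal tooling: the slice names, the 0.7 quality threshold, the 0.05 allowed drop, and the function names are all assumptions made for the example. Each model output is assumed to have already been scored between 0 and 1 by some grader, and each input is tagged with the evaluation slice it belongs to, so that edge-case regressions are not averaged away by clean examples.

```python
from collections import defaultdict


def pass_rate_by_slice(results, threshold=0.7):
    """Group (slice_name, score) pairs by slice and return {slice: pass rate}.

    A score "passes" when it is at or above the threshold (hypothetical value).
    """
    buckets = defaultdict(list)
    for slice_name, score in results:
        buckets[slice_name].append(score)
    return {
        name: sum(s >= threshold for s in scores) / len(scores)
        for name, scores in buckets.items()
    }


def find_regressions(baseline, candidate, threshold=0.7, max_drop=0.05):
    """Return the slices whose pass rate dropped by more than max_drop.

    The comparison is on the distribution of outcomes per slice,
    never on a single output.
    """
    base = pass_rate_by_slice(baseline, threshold)
    cand = pass_rate_by_slice(candidate, threshold)
    return sorted(
        name for name in base
        if name in cand and base[name] - cand[name] > max_drop
    )


# Illustrative scores only; the edge slice is deliberately over-represented.
baseline = [("clean", 0.9), ("clean", 0.8),
            ("edge", 0.75), ("edge", 0.8), ("edge", 0.6), ("edge", 0.9)]
candidate = [("clean", 0.95), ("clean", 0.85),
             ("edge", 0.5), ("edge", 0.8), ("edge", 0.6), ("edge", 0.65)]

print(find_regressions(baseline, candidate))
```

In this toy data the clean slice is unchanged while the edge slice falls from a 0.75 to a 0.25 pass rate, so only the edge slice is reported. A real evaluation set would need far more samples per slice before a shift of this kind could be separated from natural model variation.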

The problems are hard, and there is real opportunity to build frameworks and approaches that do not exist yet. That is part of what makes this space genuinely interesting to work in right now.

The visibility problem: why testing work disappears

One thing I took a while to accept: testing work is largely invisible, and making peace with that matters more than fighting it.

You might spend time putting together a document or a scope summary to align on what is and isn't covered. You send it around, maybe there is a meeting, people nod at the time. Then the meeting ends, everyone goes back to their features, and within a week the document is rarely referenced again. Developers assume everything is tested. Program managers assume everything is tested. And you know very well that this may not be the case, because comprehensively testing a fast-moving ML system is more challenging than it seems. 

Don't get me wrong, the documentation is valuable and necessary. But disconnected documents don't move decisions. What does move decisions is connected quality intelligence: coverage data, traceable risk signals, and shared dashboards that live where the team already works.
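
As a rough illustration of the difference between a scope document and coverage data, the sketch below keeps scope as a small structured record instead of prose. Everything in it is hypothetical: the `CoverageRecord` fields, the feature names, and the owners are invented for the example, and a real team would feed this from its own test infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageRecord:
    """What is and is not covered for one feature, with a named owner."""
    feature: str
    owner: str
    covered: list = field(default_factory=list)
    gaps: list = field(default_factory=list)


def release_summary(records):
    """Return one line per feature that still has known gaps."""
    lines = []
    for r in records:
        if r.gaps:
            lines.append(
                f"{r.feature} (owner: {r.owner}): "
                f"{len(r.gaps)} known gap(s): {', '.join(r.gaps)}"
            )
    return lines


# Hypothetical features and owners, for illustration only.
records = [
    CoverageRecord("voice search", "alice",
                   covered=["basic queries"],
                   gaps=["accented speech", "background noise"]),
    CoverageRecord("autocomplete", "bob", covered=["latin scripts"]),
]

summary = release_summary(records)
for line in summary:
    print(line)
```

Because the gaps are data rather than sentences in a document, the same records can drive a dashboard or a release checklist, which makes it harder for anyone to assume that everything is tested.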

This dynamic is not unique to Apple. It is a structural feature of how fast-moving product organizations work, and it has a few practical implications:

  • Your primary output is release confidence: risk reduction, regression prevention, and protection of the business processes that depend on the software. The infrastructure you build, the regressions you catch, and the frameworks you put in place that prevent entire classes of issues from reaching users are the real deliverables.
  • When something slips through, respond clearly and constructively. Not defensively, and not by pointing to documents that were written and ignored. The right response is always: here is what happened, here is why it was not caught, and here is what we are changing so it does not happen again.
  • Do not rely on other people's awareness of your testing scope. The engineers who thrive here are the ones who internalize the scope completely and do not need external validation to know whether they have done a good job.

The developer-tester dynamic

Developers are excellent at building and iterating on models and features, and they are focused on making things work. But there is an important difference between building something that works and building something that will not break in unexpected ways. Developers naturally tend to test the happy path, the scenario where everything behaves as intended. Finding weird edge cases, unexpected input combinations, and subtle behavioral regressions after a model update requires thinking about failure rather than success, approaching a system as an adversary rather than its creator.

That adversarial mindset is genuinely difficult to maintain when you are also the person who built the thing. This is why, even in organizations that have moved toward developer-owned quality, dedicated test engineering tends to persist for complex systems. Not as a gate or a checkpoint but as a complementary perspective. The best outcomes I have seen come from developers owning unit-level correctness and test engineers owning system-level behavior, integration risk, and regression coverage.

Test engineering in this context is frequently the last real line of defense. When something subtle slips through development and code review, it is usually the test engineer who either catches it or does not. Taking that responsibility seriously rather than treating it as a shared burden is what makes the difference.

AI is raising the bar, not lowering it

A lot of people assume AI will eventually make testing easier or reduce the demand for skilled test engineers. In my experience it is going in the opposite direction.

AI tools genuinely help with generating test cases faster, identifying coverage gaps more systematically, and building automation that adapts more gracefully to behavioral changes. Used well, this is less about simply speeding up test authoring and more about AI-powered quality operations: governed, observable, and integrated into how releases get decided. But the pace of what needs to be tested is accelerating fast. New AI features, model updates, and generative capabilities are shipping on tight cycles, and the testing surface is growing faster than any team can keep up with through manual effort alone.

What has changed practically is the expectation around ramp-up time. It used to be somewhat acceptable to take time getting familiar with a new system before building out test infrastructure for it. That grace period has largely disappeared. The expectation now is that you arrive fluent with AI-assisted tooling, move quickly to establish coverage for new features, and give concrete ETAs for when infrastructure will be in place. "Still getting up to speed" is not a satisfying answer when AI tooling is available to accelerate exactly that process.

Principles that transfer

These are not abstract ideals. They came directly from navigating the environment I have described above, and I have found them hold up consistently whether I am working on infrastructure tooling, ML systems, or anything in between. If there is one thing seven years at Apple has reinforced, it is that the fundamentals of good testing do not change much even when the technology underneath changes constantly.

  • Own your domain, and make that ownership visible. The engineers who do the best work treat their coverage area as genuinely theirs and know it deeply enough that they do not need a checklist to know what matters. They also make their coverage, gaps, and risk decisions traceable to the rest of the team. Ownership scales when it is shared and visible, not when it is siloed.
  • Assess risk rather than chase completeness. Complete test coverage of any complex system is not achievable. The job is to test the right things at the right depth. Developing a reliable intuition for where the real risk sits and what edge cases are most likely to reach users is one of the most valuable skills a test engineer can build.
  • Build platforms and reusable frameworks, not one-off scripts. The highest-leverage work is not writing individual test cases. It is building the platforms, analytics, and reusable frameworks that operationalize testing judgment across teams, so that good decisions made once keep paying off. Every framework investment compounds.
  • Communicate outcomes, not activity. Nobody has time to read a detailed account of what you tested. What people need to know is whether it is ready to ship, what the known risks are, and what you would need to see before you would be comfortable. Delivering that clearly and concisely is a skill that earns trust faster than almost anything else.
  • Treat regressions as system feedback. When something slips through, ask what the failure reveals. What assumption was wrong? What class of change is not currently covered? What would a framework look like that catches this automatically in the future? Regressions, handled well, are the best input you will ever get for improving your infrastructure.
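
To make "assess risk rather than chase completeness" concrete, here is a minimal sketch of risk-based planning. The scoring model (likelihood of change multiplied by user impact, each on a 1 to 5 scale), the area names, and the effort figures are assumptions chosen for illustration; real intuition about risk is richer than two integers. What the sketch shows is the shape of the decision: spend a fixed budget on the highest-risk areas first, and keep the deferred areas explicit so they can be reported as known risk.

```python
from dataclasses import dataclass


@dataclass
class CoverageArea:
    name: str
    change_likelihood: int  # 1 (stable) to 5 (churning); hypothetical scale
    user_impact: int        # 1 (cosmetic) to 5 (blocks core flows)
    effort_days: float

    @property
    def risk(self) -> int:
        return self.change_likelihood * self.user_impact


def plan_coverage(areas, budget_days):
    """Greedy plan: highest risk first, deferring areas that do not fit."""
    planned, deferred = [], []
    remaining = budget_days
    for area in sorted(areas, key=lambda a: a.risk, reverse=True):
        if area.effort_days <= remaining:
            planned.append(area.name)
            remaining -= area.effort_days
        else:
            deferred.append(area.name)
    return planned, deferred


# Hypothetical areas for one release cycle.
areas = [
    CoverageArea("model update path", 5, 5, 3.0),
    CoverageArea("settings UI", 2, 2, 1.0),
    CoverageArea("sync service", 3, 5, 4.0),
    CoverageArea("onboarding copy", 1, 1, 0.5),
]

planned, deferred = plan_coverage(areas, budget_days=7.0)
print("planned:", planned)
print("deferred:", deferred)
```

With a seven-day budget, the two highest-risk areas consume all the available time and the two low-risk areas are deferred. The deferred list is the useful by-product: it is exactly the set of known risks to communicate when someone asks whether the release is ready to ship.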

1 reply

  • Space Cadet
  • June 24, 2026

Great post, and well written. I have a few questions for you:

1. Delivery Mechanism

You mention there are no two-week Scrum sprints, extensive QA ceremonies, or rigid development processes. I’m curious what the actual delivery model looks like.

How are changes planned and delivered in practice? Are test engineers embedded within product teams delivering features, or do they operate more as specialists responsible for tooling, frameworks, and quality enablement across multiple teams?

If you’re not following a traditional sprint model, what does the cadence of delivery look like?

 

2. Ownership

You mention “traceable ownership” as a core principle.

Could you give a concrete example of what ownership means in your environment?

Is ownership aligned to:

  • Features?
  • Products?
  • Customer journeys?
  • Releases?
  • Test infrastructure/tooling?

I’m trying to understand what the unit of ownership actually is and how clearly that ownership is defined across the organisation.

 

3. Accountability

You describe ownership being coupled with shared accountability.

Where does an individual’s accountability begin and end?

For example, if a feature ships with a production issue:

  • Does the feature owner investigate and drive resolution?
  • Does that responsibility extend into operational support?
  • Do engineers own the entire lifecycle from design through production support?

I’d be interested in understanding how far that accountability extends in practice.

 

4. Ownership + Shared Accountability Loop

You describe ownership plus shared accountability as the foundation for quality at scale.

Could you walk through what that loop actually looks like?

What mechanisms create accountability?

What are the feedback loops that reinforce ownership and quality over time?

 

5. Coverage

You mention getting coverage in place before the next wave of changes arrives.

How are you approaching coverage at scale?

Are teams delivering very small incremental changes, or larger feature drops?

Also, how do you make coverage visible?

I’d love to understand how you define and measure “sufficient coverage.”

 

6. Shared Platforms and Frameworks

You mention relying on shared platforms and reusable frameworks rather than re-deriving quality practices on every project.

Could you share some examples of those platforms?

And how do they help scale engineering judgement rather than simply standardising process?

 

7. Quality Intelligence

You mention “connected quality intelligence” being more valuable than simply counting bugs.

What does that look like in practice?

Specifically:

  • What coverage metrics do you track?
  • What dashboards do teams use regularly?
  • What do you mean by “traceable risk signals”?

I’d be interested in understanding which signals actually influence engineering decisions and release confidence.

 

8. Testing Scope Visibility (The One I’d Most Want Answered)

This is one of the few things I slightly disagreed with in the article.

You wrote:

“Do not rely on other people’s awareness of your testing scope.”

I understand the point about developing deep ownership and not seeking validation.

However, I’ve often found value in making testing scope highly visible—not because others know more than the tester, but because visibility allows assumptions, risks, and blind spots to be challenged.

How do you balance:

  • Personal ownership of testing scope
  • Independent challenge and feedback
  • Collective understanding of risk

What mechanisms exist to prevent someone becoming overconfident in their own understanding of the scope?