How to Stay in Control of AI Coding Agents: A Practical Guide for Testers Using Claude Code and GitHub Copilot

April 22, 2026

Overview: AI coding agents like Claude Code and GitHub Copilot are powerful — but "yes, allow all" without reading the plan is how you lose hours. This article covers five practical habits that keep you in control: providing clear context upfront, knowing enough code to question what the agent produces, building instruction files from your own debugging patterns, setting guardrails before the session starts, and catching wrong directions early instead of undoing them later. Written by someone with hands-on experience using both tools in real testing workflows.

You will learn: How to write prompts that reduce agent guessing · Why one programming language changes how you use these tools · How to build CLAUDE.md and copilot-instructions.md from your own experience · How to use hooks and pre-hooks to protect sensitive files · Why git diff is your most underused safety net


You're using Claude Code or GitHub Copilot. It asks permission: view this file, edit this, allow all edits. You click yes without reading what it's planning to do, because you want to see the end result rather than how it's being created. I've done this many times, and honestly, most of the time it works out fine. But the times it doesn't - those are the ones that cost you hours. This article is not about slowing you down. It's about making you better at getting what you actually want from these coding agents, whether you're a tester writing automation scripts, a lead setting up CI/CD, or someone exploring AI-assisted testing for the first time.

"There's a difference between a human in the loop and a human in control."

Let's talk about what "in control" actually looks like.

I'm using Claude Code and GitHub Copilot as examples throughout this article because that's where my hands-on experience is. But if you're using Cursor, Codex, Windsurf, or any other AI coding agent, the same ideas apply. The features might have different names - what Claude Code calls hooks, another tool might call guardrails or rules. What matters is the thinking behind how you use them, not which tool you're in.

Give the Agent Half the Answer Before It Starts

Before you open Claude Code or fire up Copilot Chat, ask yourself one question: what does the output look like? You don't need the full picture. Even half is enough. If you have working code, share it. If you have a rough structure in your head, describe it. If you tried something and it partially worked, give that as the starting point. That "at least half" is gold for any coding agent because it gives the agent something real to work with instead of guessing.

Here's what happens when you skip this. The agent fills in the blanks with assumptions. You spend time correcting those assumptions instead of building on what you already know. When I gave Claude Code clear context - the files, the expected behavior, the constraints - it delivered. When I was vague, it went in circles. The same applies to GitHub Copilot. If you open Copilot Chat and type "write me a test," you get a generic test. If you provide the page structure, the selectors you've already verified, and the expected behavior, you get something you can actually use.

I'll share a real example. One evening, a script I wrote started failing. It had worked fine in the morning. I let Claude Code debug and fix it, and it tried everything for two hours - different approaches, different fixes, no luck. I then spent a few minutes checking manually and found that the password had expired. The UI was working fine, but the backend credentials were out of sync with the system ones. Not an AI problem, but AI didn't catch it. A human did. Your experience, your context, your "I know at least this much" - that's what turns an average output into a useful one.

Why Testers Still Need to Read Code in the AI Era

I keep hearing this: "In the Claude Code era, do you still need to learn coding?" My answer is yes, and it's needed now more than ever. Not to write everything from scratch, but to read, understand, question, and know what's happening when things go wrong. And this doesn't mean you need to become a full-time developer. If you're a manual tester or a test lead, even a basic working knowledge of one language changes how you interact with these tools. You go from "I hope this is right" to "I can see what this is doing."

Here's a real scenario. GitHub Copilot suggests an inline completion for a Playwright test. You hit Tab, it auto-completes a selector using page.locator('.btn-primary') - a CSS class selector. Looks fine, test passes. But next sprint, the dev team updates the button's styling. The class name changes. Your test breaks, and you have no idea why because you never questioned the selector choice. If you knew enough to read that line and think "wait, a class selector is fragile here, I should use get_by_role('button', name='Submit') instead," you'd have caught it before it became a problem. That's the difference knowing one language makes - not writing code from scratch, but reading it well enough to question what the agent gave you.

Claude Code generates entire files and plan mode lays out the full approach. But if you can't read that plan and spot where it might hallucinate or make a wrong assumption, you're trusting blindly. The models and tools are incredibly capable, but there are catches. Sometimes a fix that should take 2 minutes takes an hour because the agent keeps trying different things while the real issue is something only your experience and domain knowledge can spot.

When I moved to Playwright, nothing was a cakewalk at first. But learning Python, debugging, and putting real hands on the code let me think beyond what I thought was possible. The more code I write from scratch instead of copy-pasting, the sharper my thinking becomes, and the same will be true for you. I'd even suggest that beginners disable IntelliSense for a while: when everything auto-populates, the tool ends up doing the thinking for you.

"Code is cheap. Show me the thinking that sharpens the questioning."

How to Build an Instructions File From Your Own Experience

This is the practical heart of it. Most people ask their coding agent to generate an instructions file. The agent creates something generic that looks decent, but it doesn't know your workflow, your edge cases, or your debugging patterns. Here's what works better.

Take one piece of your code, one script, one workflow. Go through it yourself - not with the agent, not yet. Just you and the code. Ask yourself: how do I debug this? What do I check first when this breaks? How do I handle this loop? What about this condition? Then, instruction by instruction, tell the agent. Things like "when a test fails on a timeout, check the selector first, not the wait time" or "if a login step fails, verify the credentials haven't expired before changing the code" or "don't modify helpers.py without asking me." Simple instructions, one after another. It takes time, and that time is the difference you're making. You're not outsourcing your thinking, you're encoding it.

Once you've done this practical work - walked through your own workflow, understood your own patterns - THEN ask the agent to formalize it into an instructions file. In Claude Code, this becomes CLAUDE.md, the context file that guides every session. In GitHub Copilot, it's .github/copilot-instructions.md, the workspace-level instructions that shape how Copilot behaves in your project. Both serve the same purpose: they give the agent your playbook so it doesn't have to guess every time. This approach is 10X better than asking an agent to generate instructions from nothing, because the instructions carry your experience, not the agent's guesses.
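As an illustration, an instructions file built this way might start like the excerpt below. Every rule here is a hypothetical example of the kind of instruction you'd extract from your own debugging patterns, not a template to copy:

```markdown
# Project instructions (excerpt - examples only)

## Debugging
- When a test fails on a timeout, check the selector first, not the wait time.
- If a login step fails, verify the credentials haven't expired before changing code.

## Boundaries
- Do not modify helpers.py without asking me first.
- Never write to .env or anything under secrets/.

## Style
- Prefer get_by_role locators over CSS class selectors in Playwright tests.
```

Notice that each line encodes a decision you've already made once the hard way. That's what a generated-from-nothing file can never contain.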

One more thing worth mentioning. Claude Code has a concept called about-me.md. It's a file where you describe who you are, your role, your expectations, and how you prefer things done. Think of it as giving the agent your perspective upfront. When it knows you're a tester who cares about locator stability over speed, or a lead who needs CI/CD compatibility, it drafts output from your angle and can even ask you more relevant questions. It takes 10 minutes to write and saves hours of misaligned output. Good practice to have this.

How to Set Guardrails Before and During an AI Coding Session

There are two parts to this, and both matter.

Before it happens

Both Claude Code and GitHub Copilot allow you to set guardrails before the agent even starts working. In Claude Code, you have hooks - think of them as automated rules (small scripts or shell commands) that run before or after the agent takes an action. For example, you can set a pre-hook that blocks the agent from writing to your .env file (the file where sensitive data like passwords and API keys live - something you never want an agent to accidentally overwrite). Simple, but it prevents a real problem before it ever starts. You can also flag edits outside a specific folder or require explicit approval before anything gets deleted. In GitHub Copilot, you control this through workspace trust settings and content exclusion rules that limit what the agent can see and suggest on. The mechanism is different but the idea is the same: set boundaries before the work begins. Think of it the same way you think about your test pipeline - you don't wait for a test to fail in production to add a check. You add the check in the pipeline. Same logic applies here.
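To make the pre-hook idea concrete, here is a minimal sketch of such a hook script in Python. The exact hook configuration and payload shape vary by Claude Code version, so treat the stdin format and the exit-code convention as assumptions to verify against the current docs; the protected file list is illustrative.

```python
# Sketch: a pre-hook that blocks agent writes to sensitive files.
# Assumed interface (verify against current Claude Code docs): the hook
# receives the tool call as JSON on stdin, and a non-zero exit blocks it.
import json
import sys

PROTECTED = (".env", "secrets.yaml")  # illustrative list of protected files

def is_blocked(file_path: str) -> bool:
    """True if the agent is trying to touch a protected file."""
    return file_path.endswith(PROTECTED)

def main() -> int:
    """Read the tool call from stdin and block writes to protected paths."""
    payload = json.load(sys.stdin)
    path = payload.get("tool_input", {}).get("file_path", "")
    if is_blocked(path):
        print(f"Blocked: {path} is protected", file=sys.stderr)
        return 2  # assumed convention: non-zero exit stops the action
    return 0

# As a standalone hook script, you would end with: sys.exit(main())
```

You'd register a script like this as a pre-action hook on write and edit operations, so it runs before the agent ever touches a file.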

When you spot it

This is the real-time part, and it matters just as much. Don't minimize the process window and don't scroll past the agent's reasoning. Read it. Check what files it's accessing and what assumptions it's making. In Claude Code, if it's going in the wrong direction, press ESC, stop it, and tell it "that's not what I want." Redirect it with better context. In GitHub Copilot, reject the suggestion, open Copilot Chat, and clarify what you actually need. Don't accept and fix later - fix the direction now.

One more safety net

Even after all of this, before you commit anything the agent has written, run a quick git diff. It shows you exactly what changed, line by line. This is your final checkpoint - the moment where you see everything the agent touched across all files in one view. I've caught things here that I missed during the session: an extra import that wasn't needed, a comment that was removed, a config value that got changed silently. It takes 30 seconds and it's the easiest habit to build.
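The whole checkpoint fits in a couple of commands. The snippet below sets up a throwaway repo so it runs anywhere; in practice you'd run the diff commands inside your own project, and the file name is illustrative.

```shell
# Demo: the final git diff checkpoint before committing agent changes.
# A throwaway repo stands in for your project; config.py is illustrative.
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email demo@example.com && git config user.name demo
printf 'timeout = 30\n' > config.py
git add config.py && git commit -qm "baseline"

# Simulate an agent silently changing a config value...
printf 'timeout = 5\n' > config.py

# ...and the review pass that catches it before commit:
git diff --stat         # quick overview: which files changed, by how much
git diff                # line-by-line view of every change across all files
git restore config.py   # discard the unwanted change to that one file
```

`git diff --stat` first, full `git diff` second: the overview tells you whether the agent touched files it shouldn't have, and the full diff tells you what it did inside the ones it should.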

I've saved hours of rework by doing this one thing: stopping Claude Code when I spotted it hallucinating early. Instead of letting it go down the wrong path for 20 minutes and then undoing everything, I stopped it at minute 2 and said "that's not it, here's what I mean." Two minutes of attention saved twenty minutes of cleanup. Set the rules before the game starts, and when something still slips through, catch it early.

Habits That Actually Work: Plan Mode, git diff, and One Agent Auditing Another

Here are a few things that have genuinely helped me and are worth your time.

  • Spend time on the initial prompt. The better you plan and the more context you provide upfront, the better the outcome. This is not a shortcut you can skip. A good first prompt with the right files, the right constraints, and a clear expected output saves more time than any prompting hack or template you'll find online.
  • Use plan mode. Claude Code has plan mode and it's too good to skip. Before it writes a single line, it shows you the entire plan - what files it will create, what approach it will take, what assumptions it's making. Review it. Not just skim, review it. Clear any ambiguities and if you think it might go wrong somewhere, say so before execution. Correct the plan, not the output. In GitHub Copilot, you get a similar experience through Copilot Chat: ask it to outline the approach using @workspace before you let it generate code.
  • Check capabilities regularly. These tools evolve fast. Claude Code now has skills (reusable prompt templates that load on demand), hooks (automated guardrails), plan mode, and background agents that work in parallel. GitHub Copilot has agent mode (where it can run terminal commands and iterate on code), MCP support (a protocol that lets the agent connect to external tools like browsers or databases), and workspace context that understands your full project. Know what's available because you might be doing something manually that the tool already handles better.
  • One agent auditing another. I've seen people use one agent to review another agent's work, and that's a valid approach. But the best quality check is still whether the output satisfies you and your specific use case. It might be the most well-structured code in the world, but if it doesn't meet your need, it's not worth it. You define the quality bar, not the agent.

From "Allow All" to Actually Being in Control

None of this is about rejecting AI or being suspicious of it. These tools are genuinely powerful, and I've been impressed with what Claude Code and GitHub Copilot can produce. But the output is only as good as the human guiding it.

  • Know what you want.
  • Learn enough to question what you get.
  • Build instructions from your own experience, not from a generated template.
  • Set guardrails before you start.
  • Pay attention while it's running. And when something feels off, stop and redirect.

"The real skill isn't prompting better. It's knowing when to pause, question, and take responsibility for what gets built."

If you want to start somewhere, try this: pick one tool you already use, write five instructions or better prompts from your own workflow - how you debug, what you check first, what files should not be touched - and use those instructions for a week. See the difference it makes. That's your first step from "yes, allow all" to actually being in control.

👤 By Dinesh Gujarathi

And may the quality be with you