On agentic coding 🤖

What changed when my assistant could run tests

Last updated on 2025-11-11 | 30 min read | Software Development, AI | Level: intermediate

1. From skepticism to productivity #

About eight months ago, I wrote about my disastrous experience with “vibe coding”, where I let AI generate code without careful review. What started as a fun 3-hour prototype turned into a 15-hour debugging nightmare. I concluded that while AI was fantastic for quick prototyping, it was dangerous without human oversight. I even declared I’d “never invest in, build upon, or use such products.”

Fast forward to today, and something remarkable has changed. Not my core principle—I still never merge code I don’t understand—but the tooling and my approach have evolved through two distinct phases.

First, let’s look at the evidence of AI’s overall impact on my productivity:

This graph tells an interesting story. After maintaining a steady pace for years (2016-2023), my productivity exploded with the introduction of AI tools. In 2024-2025 alone, I’ve published 14 new Python packages and made over 400 releases. That’s more new packages than in the previous 6 years combined!

But this transformation didn’t happen overnight—it came in two distinct phases that fundamentally changed how I write code.

TL;DR: My AI coding journey had two major phases. Phase 1 started in March 2023 with GPT-4’s release: chat interfaces with copy-paste workflows. Phase 2 began just two months ago, in July 2025, with Claude Code: an agentic AI that can explore codebases, run tests, and debug itself. This second leap was as transformative as the first. I used more than $10k worth of API tokens in the first month alone (thankfully my cost was capped at $200 by the Max plan).

2. But don’t you love programming? #

I’ve heard this objection countless times: “I love programming too much to let AI do it for me.”

Sure, I get it. It’s like preferring analog photography over digital, or writing letters by hand instead of typing. Some people genuinely enjoy the manual process, and that’s valid.

But here’s the thing: I absolutely love programming. It’s my biggest passion in life. As I mentioned in my local AI journey, I bought a gaming PC to play games but have only used it for coding and local AI experiments. My idea of a fun weekend? Writing code until 3 AM. I’ve been doing this for over 10 years, and I’ve never had as much fun programming as I do now.

What I’ve discovered is that I don’t just love the act of typing code—I love building and creating things. There’s a difference.

I’m ridiculously particular about code quality. I literally cannot look at poorly formatted code without feeling physical discomfort. Every operator, every indentation, every naming convention matters to me. I enforce strict linting rules and will spend time refactoring code just because the style bothers me. This isn’t about being pretentious—it’s about caring deeply about my craft.

The revelation: AI doesn’t take away my ability to care about these details. Instead, it gives me insane leverage to create more while maintaining my personal standards. I still review every line (although I am more lenient in certain cases), still enforce my style guides, still refactor when something bothers me. But now I can build 10x faster.

It turns out that what I truly love isn’t the mechanical act of typing—it’s the creative act of bringing ideas to life. And with AI, I can bring more ideas to life than ever before.

The power to explore #

AI has unlocked something incredible: the ability to explore ideas that weren’t worth the effort before. It’s now trivial to write 2,000 lines of code to solve a simple problem. Before AI, spending 5+ hours on a utility script wasn’t justifiable. Now? I can build it in 10 minutes.

Here are just some of the “wouldn’t have been worth it” projects I’ve built recently:

Financial independence tracker: A full website visualizing my path to FIRE
Tuitorial: Terminal-based presentation software
Stream Deck home control: Complete tool to control my entire house from a Stream Deck
Dotbins: Automatic binary dependency syncing across all my machines
Agent-CLI: A local, open-source Siri that actually works

Each of these would have taken weeks of manual coding. The effort-to-reward ratio just wasn’t there. But with AI? I can explore every random idea that pops into my head at 2 AM.

The Matty story: from midnight idea to working code #

Here is the perfect example of this power: One night around midnight, I was lying in bed (slightly intoxicated) when I had an idea—I needed a way for my AI agents to communicate via Matrix. So I grabbed my iPhone, SSHed into my machine, and started building.

One hour later, at 1 AM, I had created Matty—a fully functional terminal-based Matrix client. Built entirely from my phone. In bed. (Poor sleep hygiene, I know.)

When I woke up the next morning and tested it… it actually worked! I spent a few more hours adding tests and cleaning it up, but the core functionality was built in that hour between midnight and 1 AM.

This is impossible without AI. Nobody’s writing a Matrix client from their phone in bed at midnight. But with Claude Code? It’s just another Tuesday night idea that becomes reality.

clip-files UI: when ideas strike at random moments #

What happens more often than you’d think: I get an idea at a completely random moment and NEED to implement it immediately.

Case in point: I was sitting in the bathtub during my copy-paste AI era when I realized I needed to access my codebases from anywhere. I’d already built clip-files—a CLI tool that copies entire codebases to your clipboard—but I wanted it accessible from my phone.

So I got out of the bathtub, built a web UI for it in 20 minutes, and got back in. The UI lets me:

Browse all my GitHub repositories
Select any branch
Click one button to copy all source code to clipboard
Paste it into my self-hosted AI chat from anywhere

Ironically, I rarely use this tool now—Claude Code can explore codebases directly. But back in the copy-paste era, this 20-minute bathtub interruption made mobile AI coding possible.

The barrier between “wouldn’t it be cool if…” and “here’s a working prototype” has essentially disappeared.

The 800-rule marathon: 80 AI agents at once #

Another thing that would be literally impossible without AI: I decided to enable ALL 800+ Ruff linting rules on a 20,000-line codebase.

For context, Ruff is an extremely fast Python linter that implements rules from over a dozen different tools. Most projects enable maybe 50-100 rules. I wanted all 800+.

The result? ≈2,500 violations across the entire codebase.

Fixing these by hand would take 20 hours if I could fix one per 30 seconds… But with Claude Code’s agent spawning capabilities, I did something insane:

Opened 8 terminal windows with Claude Code
Created a work orchestration document that one AI managed
Each session spawned 10 parallel sub-agents using /agents
Total: ~80 AI agents working on my code simultaneously
Two hours later: 100% compliance with all 800 rules

The orchestrator agent would assign work like:

“Session 1: Fix all E501 line-too-long violations in src/core/”
“Session 2: Handle all B008 function-call-in-default-argument issues”
“Session 3: Fix all RUF100 unused-noqa violations”

Each session’s sub-agents would tackle different files in parallel. When conflicts arose, the orchestrator would resolve them. This is the kind of code quality improvement that simply wouldn’t happen without AI—not because it’s technically difficult, but because the human effort required is unrealistic.
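A minimal sketch of how an orchestrator could split rule codes across sessions. This is not the actual orchestration document: the function name `assign_rules_to_sessions` is hypothetical, the sample data is made up, and the input shape (one dict per violation with a `code` key) is an assumption about what a linter's JSON report would provide.

```python
from collections import Counter

def assign_rules_to_sessions(violations, n_sessions):
    """Greedy balancing: hand each rule code (largest first) to the least-loaded session."""
    counts = Counter(v["code"] for v in violations)
    sessions = [{"rules": [], "load": 0} for _ in range(n_sessions)]
    for code, count in counts.most_common():
        target = min(sessions, key=lambda s: s["load"])
        target["rules"].append(code)
        target["load"] += count
    return sessions

# Made-up sample: 3 line-length hits, 2 default-argument hits, 1 unused noqa.
sample = (
    [{"code": "E501"}] * 3
    + [{"code": "B008"}] * 2
    + [{"code": "RUF100"}]
)

for number, session in enumerate(assign_rules_to_sessions(sample, n_sessions=2), start=1):
    print(f"Session {number}: fix {', '.join(session['rules'])} ({session['load']} violations)")
```

Grouping by rule code rather than by file keeps each session's instructions short, at the price of two sessions possibly touching the same file, which is where the conflict resolution mentioned above comes in.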

1,400 commits, rewritten in minutes for $2.50 #

Here’s another “impossible without AI” story: I had 1,400 commits with terrible messages (in a not-yet-released and private project).

Since I commit so frequently (using them as snapshots in feature branches), my commit messages were all over the place:

Some were just a period “.”
Some were decent descriptions
Some were just merge commits
Zero consistency in style

I wanted them all to follow a consistent convention with prefixes like fix:, feat:, test:, ci:, etc.

Obviously, rewriting 1,400 commit messages by hand is not going to happen. But with AI?

I had Claude write me an agent using the Agno framework that would:

Read each commit’s full diff using git show
Analyze the changes to understand what was done
Return structured JSON with three fields:
- commit_message: The improved message (or empty to keep original)
- keep: Boolean flag whether to keep the original
- reasoning: Why the AI made that decision
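The three-field reply can be sketched with plain standard-library Python. This is an illustration under assumptions, not the Agno-based agent itself: the names `CommitRewrite`, `parse_rewrite`, and `final_message` are hypothetical, and the sample reply is invented.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class CommitRewrite:
    commit_message: str  # improved message, or "" to keep the original
    keep: bool           # True means: leave the original message untouched
    reasoning: str       # why the model decided this

def parse_rewrite(raw: str) -> CommitRewrite:
    """Parse one model reply; fail loudly on missing fields or wrong types."""
    data = json.loads(raw)
    result = CommitRewrite(
        commit_message=data["commit_message"],
        keep=data["keep"],
        reasoning=data["reasoning"],
    )
    if not isinstance(result.keep, bool) or not isinstance(result.commit_message, str):
        raise TypeError(f"unexpected field types in reply: {data!r}")
    return result

def final_message(original: str, rewrite: CommitRewrite) -> str:
    """Pick the message that ends up in the history."""
    if rewrite.keep or not rewrite.commit_message:
        return original
    return rewrite.commit_message

# Invented reply for a commit whose original message was just "."
reply = (
    '{"commit_message": "fix: handle empty config file", '
    '"keep": false, "reasoning": "Diff adds a guard for empty input."}'
)
print(final_message(".", parse_rewrite(reply)))  # fix: handle empty config file
```

Keeping `reasoning` next to the decision makes it possible to spot-check a sample of rewrites before touching the history.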

Then the magic happened: I spawned 1,400 AI agents simultaneously using the DeepSeek API.

Ten minutes and $2.50 later, all my commit messages were rewritten with consistent, meaningful descriptions. The entire Git history now looks more professional and searchable.
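The fan-out can be sketched with `asyncio`. Everything here is an assumption for illustration: `rewrite_one` is a hypothetical stand-in for the real agent call (it contacts no API), and the `max_in_flight` cap is an addition that a rate-limited provider might require, not something described above.

```python
import asyncio

async def rewrite_one(sha: str) -> tuple[str, str]:
    """Hypothetical stand-in for one agent call (read the diff, ask the model)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return sha, f"chore: placeholder message for {sha}"

async def rewrite_all(shas: list[str], max_in_flight: int = 100) -> dict[str, str]:
    """Start one task per commit, but cap how many run at the same time."""
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(sha: str) -> tuple[str, str]:
        async with gate:
            return await rewrite_one(sha)

    pairs = await asyncio.gather(*(guarded(sha) for sha in shas))
    return dict(pairs)

# 1,400 fake commit ids, one task each.
messages = asyncio.run(rewrite_all([f"c{i:04d}" for i in range(1400)]))
print(len(messages))  # 1400
```

Because each commit is independent, the total wall-clock time is set by the slowest calls and the concurrency cap, not by the number of commits.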

3. The two-phase AI revolution #

My AI coding journey wasn’t a single leap—it happened in two distinct phases, each with its own breakthrough moment.

Phase 1: copy-paste era (March 2023 - July 2025) #

When GPT-4 launched in March 2023, it changed everything. I built tools like clip-files to efficiently copy code context into a ChatGPT-like web interface. For over two years, this workflow was:

Manual but effective: Copy code → paste into AI → review suggestions → implement manually
Limited by context: Could only share what fit in a single message
No validation: AI couldn’t test its suggestions
Still valuable: Helped with ideation, code review, and refactoring

This phase was already productive—I was shipping more code than before. But I was still the one running tests, debugging errors, and validating everything.

Phase 2: the Claude Code revolution (July 2025) #

Just two months ago, in July 2025, everything changed again. Frustrated with Cursor becoming painfully slow, I tried Claude Code based on the buzz online. That first night, I spent $70 on API tokens. Not by accident: I just kept adding $10-$20 increments like a gambling addict. The experience was so mind-blowing that within two weeks, I’d spent nearly $200 and immediately upgraded to the $200/month Max plan.

To put this in perspective: using ccusage, a command-line tool that tracks token consumption, I calculated that I used $10,000 worth of API tokens in my first month. Thankfully, the Max plan capped my cost at $200!

What made Claude Code different? #

The shift from copy-paste to agentic AI was like upgrading from a bicycle to a spaceship:

Before (copy-paste with ChatGPT/Claude Web)

Generate code based on prompts
No access to your codebase
Can’t run tests or see errors
You debug everything manually

After (Claude Code)

Reads and searches your entire codebase
Executes commands and runs tests
Sees error messages and debugs itself
Iterates until tests pass
Uses tools to explore and validate

The difference is profound. I went from carefully copying snippets with clip-files to having an assistant that can explore my entire project, run my test suite, fix failures, and even commit the changes.

4. My current agentic workflow #

Here’s how I actually work with Claude Code on a typical project:

Parallel development: 6 features at once #

One of Claude Code’s superpowers is enabling truly parallel development. Here’s my setup that would make any developer from 10 years ago think I’m insane:

I use Zellij (a terminal multiplexer like tmux) with a custom layout for my projects:

5 main tabs: Each split into two panes—Claude Code on one side, terminal on the other
Independent copies of the same repository: Each tab is a separate Git worktree with its own environment, environment variables, and deployment
Each tab = different feature: Using Git worktrees to work on 6 features simultaneously
Monitoring tabs: ccusage for tracking Claude usage, htop for CPU, nvtop for GPU
Voice-driven orchestration: I cycle through tabs, review code while speaking, move to next

My workflow looks like this:

Start feature A in tab 1, give Claude instructions
Switch to tab 2, start feature B while A is working
Continue through all 5 tabs, starting different features
Circle back to tab 1, review what Claude did, give feedback via voice
Repeat the cycle every 10-15 minutes

This parallel workflow is how I built a complex project (that I haven’t released yet) in one month that would have taken 9+ months before. The key is Git worktrees—each feature gets its own working directory, so there’s no context switching overhead.
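A minimal sketch of setting up one worktree per feature. The directory layout (a sibling folder per feature), the `feature/` branch prefix, and the function names are assumptions. The underlying `git worktree add -b <branch> <path>` command is standard Git.

```python
import subprocess
from pathlib import Path

def worktree_command(repo: Path, feature: str) -> list[str]:
    """Build the `git worktree add` call for one feature branch."""
    target = repo.parent / f"{repo.name}-{feature}"  # sibling directory (assumed layout)
    return ["git", "-C", str(repo), "worktree", "add", "-b", f"feature/{feature}", str(target)]

def create_worktrees(repo: Path, features: list[str], dry_run: bool = True) -> list[list[str]]:
    """Return the commands; only execute them when dry_run is False."""
    commands = [worktree_command(repo, feature) for feature in features]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands

# Dry run: print the commands for two hypothetical features.
for cmd in create_worktrees(Path("/tmp/myproject"), ["auth", "export"]):
    print(" ".join(cmd))
```

Each worktree shares the same Git object store but has its own checked-out files, so an agent running in one tab cannot overwrite uncommitted work in another.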

Starting a new package #

# I describe what I want to build
"I need a Python package that manages CLI tool binaries within Git repositories,
automatically downloading the correct binary for the user's platform, here <insert path> is a repo with boilerplate skeleton to copy (use same conventions)"

# Claude Code then:
# 1. Creates the project structure
# 2. Implements core functionality
# 3. Writes comprehensive tests
# 4. Generates documentation
# 5. Sets up CI/CD workflows

The key difference: iteration and validation #

Unlike my vibe coding disaster, Claude Code doesn’t just dump code and leave. Here’s a real interaction pattern:

Claude writes initial implementation
Claude runs the tests → Several fail
Claude reads the error messages and fixes the code
Claude runs tests again → More pass, some still fail
Claude debugs the remaining issues
Claude validates all tests pass
I review the final code and understand it (tell it to make changes if needed)
Claude creates a proper Git commit

This iterative, validated approach is what makes agentic coding so powerful. It’s not about blindly accepting generated code—it’s about having an assistant that can explore, test, and refine solutions.
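The loop above can be sketched in a few lines. This is an illustration of the pattern, not how Claude Code is implemented: `iterate_until_green`, `fake_tests`, and `fake_fix` are hypothetical names, and the test runner and fixer are passed in as plain callables so the sketch runs without any tooling.

```python
from typing import Callable

def iterate_until_green(
    run_tests: Callable[[], tuple[bool, str]],
    attempt_fix: Callable[[str], None],
    max_rounds: int = 5,
) -> int:
    """Run tests, feed failures back, repeat. Returns the number of fix rounds used."""
    for round_number in range(max_rounds + 1):
        passed, output = run_tests()
        if passed:
            return round_number
        if round_number == max_rounds:
            break
        attempt_fix(output)  # the test output is the only feedback the fixer gets
    raise RuntimeError(f"tests still failing after {max_rounds} fix rounds")

# Fake project: two failing tests, and every fix round repairs one of them.
state = {"failing": 2}

def fake_tests() -> tuple[bool, str]:
    return state["failing"] == 0, f"{state['failing']} failed"

def fake_fix(output: str) -> None:
    state["failing"] -= 1

print(iterate_until_green(fake_tests, fake_fix))  # 2
```

The hard cap on rounds matters: without it, an agent that cannot fix a failure would loop forever instead of handing the problem back to a human.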

5. The evidence: explosive productivity #

Let me share some concrete data from my PyPI packages analysis:

Package creation by year #

2016-2022 (7 years pre-AI): ~2 packages/year average
2023 (GPT-4 launch year): 6 packages
2024 (full year with copy-paste AI): 7 packages
2025 until 3 months ago (6 months): 7 packages
Last 3 months (2 with Claude Code, 1 with Codex CLI): wrote about 400k lines of code (although this includes many iterations)!

The acceleration with AI is clear!

Recent packages built with agentic AI #

Here are some packages I’ve built recently with Claude Code:

matty: Terminal-based Matrix client
agent-cli: Local-first AI-powered CLI agents (built in days, not weeks!)

Each of these would have taken me weeks or months to build alone. With Claude Code, I’m shipping production-ready packages in days.

6. Why this isn’t “vibe coding” (but with nuance) #

You might wonder: isn’t this just vibe coding with extra steps? The answer is nuanced and depends on the context—similar to my philosophy on dependencies.

My context-driven standards #

Just like I have different standards for dependencies in libraries versus applications, I apply different levels of scrutiny based on what I’m building:

For critical libraries (e.g., Adaptive, pipefunc, unidep)

Maximum scrutiny - These are packages others depend on:

Every single line reviewed and understood
100% test coverage with careful test review
Architecture decisions carefully considered
Documentation must be comprehensive
My reputation is on the line

For experimental CLIs and personal tools

Pragmatic approach - Isolated tools with no downstream dependencies:

Core architecture must be fully understood
Implementation details can be AI-generated if tests pass
Test generation can be more automated
Focus on functionality over perfection
Similar to my relaxed dependency stance for applications

The internal dependency graph principle #

The key insight: My scrutiny level correlates with how foundational the code is within the project:

Core/foundational code (what everything else depends on): Maximum scrutiny, every line matters
- Data models, core algorithms, API interfaces
- Authentication, database operations, state management
- These are the “roots” that everything else builds upon
Peripheral/leaf code (nothing depends on it): Can be more AI-delegated
- Plotting functions, display utilities, CLI formatters
- Test helpers, documentation generators
- These are the “leaves” that don’t affect other code
Work projects: Always maximum scrutiny for foundational code, regardless of project type

This isn’t about being lazy—it’s about focusing human attention where it matters most. I can trust AI more with a plotting function that nothing depends on than with a core data structure that the entire system uses.
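One way to make "how foundational is this code" concrete is to count how many modules depend on each module, directly or transitively. The sketch below is a hedged illustration: the module names, the import map, and the two-level scrutiny rule are all made up for the example.

```python
def count_dependents(imports: dict[str, set[str]]) -> dict[str, int]:
    """For each module, count the modules that depend on it directly or transitively."""
    def reachable(start: str) -> set[str]:
        seen: set[str] = set()
        stack = list(imports.get(start, ()))
        while stack:
            mod = stack.pop()
            if mod not in seen:
                seen.add(mod)
                stack.extend(imports.get(mod, ()))
        return seen

    counts = {mod: 0 for mod in imports}
    for mod in imports:
        for dep in reachable(mod):
            if dep != mod:
                counts[dep] = counts.get(dep, 0) + 1
    return counts

# Hypothetical project: module -> modules it imports.
imports = {
    "models": set(),
    "storage": {"models"},
    "api": {"storage", "models"},
    "plotting": {"api"},
    "cli_format": {"api"},
}

for module, n in sorted(count_dependents(imports).items(), key=lambda kv: -kv[1]):
    level = "review every line" if n > 0 else "leaf: lighter review if tests pass"
    print(f"{module:<11} {n} dependents -> {level}")
```

In this toy graph `models` has four dependents and `plotting` has none, which matches the roots-versus-leaves intuition described above.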

Building constraints around AI #

The secret to productive agentic coding isn’t just the AI—it’s the constraints and guardrails I build around it:

Automated quality gates

Ruff with strictest rules: Catches style issues, complexity problems, and common bugs
MyPy in strict mode: Enforces type safety across the entire codebase
Pre-commit hooks: Automatically format and validate code before commits
Comprehensive test suites: AI must make tests pass, not just write code
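A minimal sketch of chaining these gates so that the first failure stops the run. The exact commands and flags are assumptions about a typical setup, and `first_failing_gate` is a hypothetical helper. The demo uses a fake runner, so nothing is actually executed.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Commands mirror the gates listed above; exact flags are an assumption.
GATES = [
    ["ruff", "check", "."],
    ["ruff", "format", "--check", "."],
    ["mypy", "--strict", "."],
    ["pytest", "-x"],
]

def run_command(cmd: Sequence[str]) -> int:
    """Run one gate for real and return its exit code."""
    return subprocess.run(list(cmd)).returncode

def first_failing_gate(
    gates: Sequence[Sequence[str]] = GATES,
    run: Callable[[Sequence[str]], int] = run_command,
) -> Optional[str]:
    """Run gates in order and stop at the first non-zero exit code."""
    for cmd in gates:
        if run(cmd) != 0:
            return " ".join(cmd)
    return None

# Demo with a fake runner so nothing is executed: pretend mypy fails.
fake_run = lambda cmd: 1 if cmd[0] == "mypy" else 0
print(first_failing_gate(run=fake_run))  # mypy --strict .
```

Returning the failing command instead of a bare boolean gives the agent something specific to act on.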

Project-specific guidance

Every project gets a CLAUDE.md file with explicit rules:

No defensive programming: Don’t wrap things in try-except unless necessary
Functional over classes: Prefer simple functions in Python
No backward compatibility: For new projects, embrace breaking changes
Be ruthless: Aggressively remove unused code

Custom commands and workflows

I’ve built specific commands that inject context and constraints:

Anti-cruft reviews: Remove over-engineering and defensive code
Safe commit practices: Never use git add ., always selective staging
Initialize understanding: Load project context and current work state

Local virtual environments: stop the hallucinations

Here’s a critical tip that eliminated a lot of my AI frustrations: keep a virtual environment with all dependencies installed locally.

Instead of letting Claude hallucinate how libraries work, I tell it explicitly in my CLAUDE.md:

### Step 1: understand the context

- **READ THE SOURCE CODE**: This library has a `.venv` folder with all dependencies installed.
  So read the source code when in doubt.
- **Never guess API behavior**: If unsure, inspect the actual implementation in `.venv/lib/python*/site-packages/`

This simple addition transforms Claude from guessing about library APIs to actually reading them:

Before: “I think your custom AsyncProcessor.batch() method takes a list…”
After: Reads /path/to/.venv/lib/python3.11/site-packages/my_internal_lib/processor.py and knows it actually takes an iterator

When Claude can read the actual source of your own packages, internal company libraries, or niche dependencies, it stops making assumptions and starts working with facts. This is especially powerful for:

Your own libraries that Claude has never seen before
Internal company packages that aren’t public
Niche libraries with barely any GitHub stars or documentation
Modified versions of popular libraries with custom patches
Edge cases where even good documentation doesn’t cover everything
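To illustrate what "read the source instead of guessing" looks like in practice, here is a small standard-library sketch that finds where an installed module lives and prints a real signature. The helper names are hypothetical, and `json` is only a stand-in for whatever dependency sits in your `.venv`.

```python
import importlib
import importlib.util
import inspect

def source_path(module_name: str) -> str:
    """Return the file that would be loaded for `module_name` in this environment."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(module_name)
    return spec.origin

def signature_of(module_name: str, attr: str) -> str:
    """Return the actual signature of a function, read from the installed code."""
    module = importlib.import_module(module_name)
    return f"{attr}{inspect.signature(getattr(module, attr))}"

# `json` stands in for an internal or niche library.
print(source_path("json"))
print(signature_of("json", "dumps"))
```

Run inside the project's virtual environment, the printed path points into that environment's installed packages, which is exactly the place the CLAUDE.md rule tells the agent to look.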

These constraints transform AI from a loose cannon into a precision tool.

7. Common pitfalls I’ve noticed #

After a few months of using agentic AI tools, I’ve noticed consistent patterns that require vigilance:

Note from the future: I now realize this is very model dependent! GPT-5 has different pitfalls than Claude Opus 4.1.

The “defensive programming” trap #

# AI loves this:
try:
    result = some_function()
except Exception:
    pass  # Silently suppress errors 😱

# But we need this:
result = some_function()  # Let it fail loudly if something's wrong

Claude tends to wrap everything in try-except blocks, suppressing errors that should bubble up. This is why my CLAUDE.md explicitly forbids unnecessary error handling.
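A small runnable demonstration of why the silent version is worse. The error does not go away: it resurfaces later as a different, more confusing one. The function names and the settings dict are invented for the example.

```python
def load_port(settings: dict) -> int:
    return int(settings["port"])  # raises KeyError if the key is missing

def swallowed(settings: dict) -> int:
    try:
        port = load_port(settings)
    except Exception:
        pass  # the real error disappears here...
    return port  # ...and comes back as UnboundLocalError, far from the cause

def loud(settings: dict) -> int:
    return load_port(settings)  # KeyError('port') points at the real problem

for fn in (swallowed, loud):
    try:
        fn({})
    except Exception as exc:
        print(f"{fn.__name__}: {type(exc).__name__}")
# swallowed: UnboundLocalError
# loud: KeyError
```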

The “backwards compatibility” obsession #

Claude constantly adds backwards compatibility for features that were literally just introduced in the same session:

“Maintaining compatibility with the old version” (that never existed)
Fallback mechanisms for code paths that were just created
Multiple ways to do the same thing “for flexibility”

The “over-engineering” disease #

Implements factory patterns for simple object creation
Adds abstraction layers that serve no purpose
“Production ready” code for experimental scripts

The git commit sins #

Despite explicit instructions in CLAUDE.md:

Creates your_module_v2.py alongside your_module.py instead of updating
Still tries git add -A or git add . regularly despite explicit bans
Loves to “helpfully” revert debugging changes from other files
Commits .env files if not watched carefully
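A guard against the last item can be sketched as a check on the staged file list. The blocked names and suffixes are assumptions to adapt, and `blocked` is a hypothetical helper. `git diff --cached --name-only` is the standard way to list staged paths, but the demo below works on a hard-coded list so it runs outside a repository.

```python
import subprocess
from pathlib import PurePosixPath

BLOCKED_NAMES = {".env"}             # assumption: extend to taste
BLOCKED_SUFFIXES = {".pem", ".key"}  # assumption: common secret-file suffixes

def staged_files() -> list[str]:
    """Paths currently staged for commit (requires running inside a Git repo)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()

def blocked(paths: list[str]) -> list[str]:
    """Return the staged paths that look like secrets."""
    bad = []
    for p in paths:
        path = PurePosixPath(p)
        if (
            path.name in BLOCKED_NAMES
            or path.name.startswith(".env.")
            or path.suffix in BLOCKED_SUFFIXES
        ):
            bad.append(p)
    return bad

print(blocked(["src/app.py", ".env", "config/.env.local", "certs/server.pem"]))
# ['.env', 'config/.env.local', 'certs/server.pem']
```

Wired into a pre-commit hook as `blocked(staged_files())`, a non-empty result would be the signal to abort the commit.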

The “helpful” anti-patterns #

Loves defensive programming: Validates things that can’t be wrong
Gladly reads .env: Will expose secrets if not careful
Reverts unrelated changes: “Cleans up” debugging code from other features

The “mission accomplished” hallucination #

This is a dangerous pitfall. Claude Code will sometimes claim complete success when it hasn’t actually fixed anything:

Claims victory: “I’ve fixed all the issues!”
Ask for proof: “Show me the test output”
Backpedals: “Oh, let me actually run the tests…”
Still claims success: “Yes, I can see in the logs it works!”
Demand the actual log: “Show me the exact log file”
Finally admits: “Actually, there are no log files. The tests don’t pass.”

Always demand proof. Never accept “it’s done” without seeing actual test output.

The nuclear option #

I’ve had Claude Code literally try to delete all project files when “cleaning up”:

# Claude's "helpful" cleanup:
"Task complete! Let me remove the temporary files..."
rm -rf src/  # 😱

This is why frequent git commits are non-negotiable. I commit after every small success.

The debugging debris #

During problem-solving, Claude Code leaves a trail of attempts:

Debug logging statements everywhere
Failed attempt code commented out
Multiple approaches tried in parallel
Temporary test files and scripts
Extra imports and unused functions

Before merging, always ask: “Review your changes and remove all debugging artifacts and failed attempts.” Getting to high code coverage (approaching 100%) helps ensure no dead code remains, as long as your tests use public APIs.

Why this happens #

These patterns emerge because AI is trained on public code that often:

Maintains backward compatibility for years
Uses defensive programming for public APIs
Includes extensive error handling for user input
Follows “enterprise” patterns even for simple scripts
Contains debugging code from development

This is why constraints are essential—without them, Claude defaults to these “safe” but overcomplicated patterns.

8. Critical success factors: what actually makes this work #

After two months of intense usage, here are the non-negotiable practices that make agentic coding actually productive:

Teach it your test commands (day 1 priority) #

This is absolutely crucial. Claude Code needs to know how to run your tests and how to activate the environment:

# To run tests, use:
uv run pytest tests/ -xvs

# For coverage:
uv run pytest --cov=src --cov-report=term-missing

# To run a specific test:
uv run pytest tests/test_module.py::test_function

Without this, it’s just guessing whether code works. With it, it becomes genuinely useful.

Git commits are your safety net #

I commit obsessively when using Claude Code:

After every successful (even partial) feature implementation
Before letting it attempt any major refactoring
Whenever tests pass
Before any “cleanup” operation

But here’s the key: I develop every feature in its own branch, even as a solo developer. My workflow:

Create a feature branch for each new feature or fix
Make frequent commits as snapshots (sometimes just a period for the message or tell Claude to commit)
Open a pull request to review the full diff myself
Merge only after reviewing all changes

This has saved me from disaster multiple times. Git is your undo button when Claude goes nuclear (e.g., breaks or removes your code).

Maintain healthy skepticism #

Never trust, always verify:

“I fixed it!” → “Show me the test output”
“It’s working now!” → “Run the tests again with -xvs”
“The logs show success!” → “Cat the actual log file”
“It’s production ready!” → “Did you run the tests?”

Think of Claude Code as an enthusiastic junior developer who sometimes exaggerates their accomplishments.

Force it to clean up after itself #

After any debugging session, always:

"Review all your changes and remove:
- Debug print statements
- Commented out code
- Failed attempt implementations
- Temporary test files
- Unused imports"

My custom /anti-cruft command automates this, but you can do it manually too.
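Part of the manual check can be scripted. The sketch below is not the /anti-cruft command. It is a hypothetical helper that uses Python's `ast` module to flag leftover `print` and `breakpoint` calls, which covers only the first item on the list.

```python
import ast

def find_debug_calls(source: str, names=("print", "breakpoint")) -> list[tuple[int, str]]:
    """Return (line, name) for every call to a bare debugging function."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in names
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)

# Made-up file content with two leftovers from a debugging session.
sample = '''
def total(items):
    print("DEBUG items:", items)
    result = sum(items)
    breakpoint()
    return result
'''
print(find_debug_calls(sample))  # [(3, 'print'), (5, 'breakpoint')]
```

Parsing the syntax tree instead of grepping avoids false positives from the word "print" inside strings or comments, though it will also flag legitimate `print` calls in CLI code.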

Code coverage is your friend #

High test coverage (90%+) ensures:

The code Claude wrote actually runs
No dead code from failed attempts
All paths are exercised
You can refactor confidently

9. My secret weapon: voice-to-code workflow #

One of my biggest productivity multipliers isn’t Claude Code itself—it’s how I communicate with it. Using my agent-cli tool, I’ve developed a voice-first workflow that’s transformed how I write prompts.

The problem with typing prompts #

Effective agentic coding requires precise, detailed instructions. A good prompt isn’t 20 words—it’s often 200-500 words explaining exactly what you want, what to avoid, and how to approach the problem. Most people don’t want to type that much.

My voice workflow #

Here’s my actual workflow:

Start recording with agent-cli transcribe (hotkey triggered)
Review Claude’s changes while speaking my thoughts aloud
Speak for minutes about what I want, what’s wrong, what to fix
Paste the transcription directly into Claude Code

This is powered by OpenAI’s Whisper model running locally on my RTX 3090 at home. It’s more reliable than macOS dictation and gives me complete privacy.

Why this works so well #

Rich prompts: I naturally give more context when speaking
Code review narration: I can review code while explaining issues
Thinking out loud: Speaking helps clarify my own thoughts
Speed: Speaking is 3-4x faster than typing
Precision: I can be incredibly specific without typing fatigue

For example, instead of typing “fix the bug,” I might say:

“I’m looking at line 45 where you’re handling the authentication. The problem is you’re not checking if the token has expired before making the API call. Also, you’ve added a try-except block here that’s suppressing errors—remove that. And while you’re at it, the logging statement on line 52 is using the wrong format. Make sure to follow our project’s logging conventions…”

This level of detail is what makes AI coding actually productive.

10. Additional tips for agentic coding #

Beyond the critical success factors and voice workflow, here are more tips to maximize your productivity:

Set clear constraints #

Create a CLAUDE.md or similar file in your project root with your preferences:

## Project standards

- Never use try/except to suppress errors silently
- Always run pytest before marking task complete
- Use type hints for all functions
- Prefer simple solutions over clever ones

Use the task system #

Claude Code’s todo system is great for complex work:

1. ✅ Implement core functionality
2. ✅ Write comprehensive tests
3. 🔄 Fix failing tests
4. ⏳ Add documentation

Leverage the search capabilities #

Let Claude search for existing patterns:

“Find all API endpoint implementations in this project”
“Show me how we handle authentication elsewhere”
“What testing patterns are we using?”

Review in chunks #

Instead of reviewing 500 lines at once:

Have Claude implement a single feature
Review and understand it
Run tests
Commit(s) and PR
Move to the next feature

11. The tools I’ve tried #

My journey through AI coding assistants has been evolutionary:

Pre-AI era #

Manual coding with IDE autocomplete
Stack Overflow copy-paste
Productivity baseline: 1-2 packages per year

Phase 1: copy-paste tools (March 2023 - July 2025) #

ChatGPT/Claude Web + my clip-files tool
Manual context sharing
Helpful for ideation and code review
Productivity: 3x improvement (from 2 to 6 packages/year)

Cursor (early 2024) #

Great autocomplete
Limited context awareness
Became frustratingly slow
Still led to my vibe coding disaster
Productivity: 5x improvement (but with quality issues)

Phase 2: Claude Code (July 2025 - present) #

Full codebase awareness
Can execute commands and debug
Self-correcting through test iterations
Maintains context across sessions
$70 spent on first night, $10,000 worth in first month
Productivity: 24x improvement over pre-AI baseline with maintained quality

OpenAI’s Codex CLI (the reasoning beast) #

I recently tried OpenAI’s Codex, their new CLI alternative to Claude Code, and I’ll admit that for pure reasoning with their GPT-5-Codex model, it’s impressive:

Solved a race condition in an unfamiliar language that took me hours with Claude Opus
More elegant solutions for complex algorithmic problems
Better at deep reasoning when given the same context

But here’s why I still use Claude Code daily:

Claude Code’s UX is superior: Resume conversations, better interface
Codex’s CLI is fragile: Hit Ctrl-C accidentally? Lose everything, start over (muscle memory keeps betraying me!)
Claude Code handles interruptions: First Ctrl-C clears text, second quits gracefully
No session persistence in Codex: It forgets everything, while Claude Code remembers
Context management: After an accidental exit from Codex, I have to re-provide all context

For a complex race condition bug, Codex with GPT-5 gave me the most elegant solution. But for day-to-day development where I need reliability, good UX, and the ability to resume work? I still reach for Claude Code.

Why not Copilot or other “cheaper” tools? #

Let me be blunt: You get what you pay for. People often say “I tried Copilot and it wasn’t that useful” or “Gemini in chat didn’t help much.” Of course not—these tools are at least 10x cheaper than Claude Code with Opus.

Your $20/month GitHub Copilot subscription simply cannot provide the same quality or quantity of high-quality tokens as Claude Code. It’s like comparing a bicycle to a Ferrari and wondering why the bicycle isn’t as fast. The economics don’t work out:

Copilot: $20/month for limited autocomplete
Claude Code Max: $200/month for unlimited* agentic assistance (*capped, but generous)
Actual API value used: $10,000+/month at my usage level

The 10x price difference reflects a 10x (or more) difference in capability. Copilot is autocomplete on steroids. Claude Code is a tireless pair programming partner with perfect memory who can debug, test, and iterate. The ROI is obvious when you’re shipping 10x more code.

12. Is this sustainable? #

You might wonder if this productivity is sustainable or if I’m just building lower-quality software faster. Here’s my honest assessment:

The good #

Faster prototyping: Ideas to working code in hours, not days
Better test coverage: AI never gets lazy about writing tests
More experimentation: Lower cost to try new ideas
Improved documentation: AI loves writing docs (I don’t)
Reduced burnout: Less time on boilerplate, more on interesting problems

The challenges #

Code review fatigue: You must stay vigilant
Dependency on tools: What if Claude Code disappears?
Cost: $200/month Max plan (but worth every penny given the $10K+ value)
The temptation to “vibe”: Constant discipline required
Addiction potential: I literally couldn’t stop coding, driven by the urge to use up all my tokens and get my money’s worth!

My verdict #

It’s absolutely sustainable if you maintain discipline. The moment you start accepting code you don’t understand, you’re building a house of cards. But with proper review and testing, agentic AI is a massive force multiplier.

13. The experience multiplier: why seniority matters more than ever #

Here’s a counterintuitive truth about agentic AI tools: they amplify your existing skills exponentially, not linearly. This creates a fascinating paradox where these tools become increasingly valuable as you become more experienced.

The exponential trap for beginners #

For beginners, agentic tools can feel like a superpower initially. A beginner can build a working application in hours! But here’s the danger: without the experience to recognize bad patterns, that beginner is essentially building at 10x speed in the wrong direction. It’s like giving a Formula 1 car to someone who just got their driver’s license: the speed multiplies both the distance traveled and the potential for catastrophic crashes.

A beginner using Claude Code might:

Accept complex solutions they don’t understand
Build massive technical debt at record speed
Miss architectural problems that will haunt them later
Create code that “works” but is impossible to maintain

They’re forced into a difficult position: either slow down to understand everything (negating the speed benefit) or accumulate technical debt at an exponential rate.

The senior developer advantage #

For experienced developers, it’s a completely different game. When I look at Claude Code’s suggestions, I can instantly recognize:

“Oh, that’s the Factory pattern—makes sense here”
“This error handling is too defensive, let’s simplify”
“That’s going to cause N+1 queries, need to refactor”
“This violates our project’s conventions, fix it”

The review process that might take a beginner hours takes me minutes. I’m not learning what the code does—I’m validating that it does what I already know it should do.
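As an illustration of one of those review catches, here is a minimal sketch of the N+1 query pattern and its single-query fix, using Python's built-in sqlite3 module. The authors/books schema, the sample rows, and the function names are hypothetical and not taken from any of my projects.

```python
import sqlite3

# Hypothetical schema for illustration: authors and their books.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO books VALUES (1, 1, 'Notes'), (2, 1, 'Engines'), (3, 2, 'Compilers');
""")


def titles_n_plus_one(conn):
    """N+1 pattern: one query for the authors, then one more per author."""
    queries = 0
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries += 1
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn):
    """Same data with a single JOIN (authors without books would need a LEFT JOIN)."""
    rows = conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "JOIN books b ON b.author_id = a.id ORDER BY a.id, b.id"
    ).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1


slow, slow_queries = titles_n_plus_one(conn)
fast, fast_queries = titles_single_join(conn)
assert slow == fast  # identical results, different query counts
print(slow_queries, fast_queries)
```

With two authors the loop issues three queries where the JOIN issues one. The gap grows linearly with the number of authors, which is why this pattern is worth spotting during review.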

The right way for every level #

That said, everyone can benefit from agentic tools—you just need to adjust your approach:

For beginners

Use AI as a learning accelerator: Ask it to explain every decision
Build small projects first: Master fundamentals before scaling up
Prioritize understanding over speed: Better to build slowly and learn
Pair with mentors: Have experienced developers review AI-generated code

For intermediate developers

Focus on patterns: Use AI to learn architectural patterns and best practices
Experiment safely: Try new approaches in side projects first
Question everything: Don’t accept solutions without understanding the tradeoffs

For senior developers

Leverage for velocity: Use your experience to review and direct quickly
Focus on architecture: Let AI handle implementation while you design systems
Teach the AI: Create project-specific guidelines to improve output quality
Push boundaries: Explore more ambitious projects now that implementation is faster

The real multiplier effect #

The productivity boost isn’t just about writing code faster. For senior developers, agentic tools multiply:

Architecture exploration: Test multiple approaches quickly
Refactoring confidence: Refactor fearlessly with AI helping maintain functionality
Learning velocity: Explore new languages and frameworks with a knowledgeable assistant
Documentation quality: Finally have time (and help) for comprehensive docs
Test coverage: AI never skips tests, even for “simple” functions

This is why I’ve been able to ship 14 packages in 18 months. It’s not just that I’m writing code faster—I’m able to explore ideas, validate approaches, and iterate on designs at a pace that was previously impossible.

14. Reality check: what AI can and can’t do #

Many people try AI coding and give up disappointed. That’s because they’re treating it like a magic genie that will solve the unsolved problems of the universe. It won’t.

What AI CAN do (the churn work) #

AI excels at tasks you could do yourself but would take hours:

Setting up CI/CD pipelines: Boilerplate YAML configuration
Writing tests: Especially unit tests for existing code
Fixing dependency issues: Adapting to breaking changes in libraries
CRUD operations: Basic create, read, update, delete functionality
Data transformations: Converting between formats, parsing, serialization
Documentation: README files, docstrings, API documentation
Refactoring: Renaming variables, extracting functions, reorganizing code
Bug fixes: Especially those with clear error messages

This is the “churn”: the necessary but time-consuming work that, in my experience, makes up roughly 80% of development.

What AI CAN’T do (the creative work) #

Don’t expect AI to:

Solve novel algorithmic problems: It won’t invent the next PageRank
Make architectural decisions: It can’t decide if you need microservices
Understand business logic: It doesn’t know why your company does things
Create truly innovative solutions: It remixes what it has seen rather than inventing something new
Make judgment calls: Security vs convenience, performance vs maintainability

The sweet spot #

I use AI for tasks where:

I know how to do it but it would take time
The solution exists somewhere in its training data
Success is measurable through tests or clear output
The problem is well-defined with clear constraints

This isn’t about AI doing things you can’t do—it’s about AI doing things you don’t want to spend time doing. Once you understand this, AI becomes incredibly powerful.
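A hypothetical example of a task in that sweet spot: a small format conversion whose success is defined entirely by a test. The function name `csv_to_json` and the sample data are assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)


def test_csv_to_json():
    # The test is the definition of "done": clear input, clear expected output.
    csv_text = "name,version\nalpha,1.0\nbeta,2.1\n"
    assert json.loads(csv_to_json(csv_text)) == [
        {"name": "alpha", "version": "1.0"},
        {"name": "beta", "version": "2.1"},
    ]


test_csv_to_json()
```

The task is well-defined, the approach is thoroughly documented, and an agent can run the test itself to check its own work. Note that the csv module reads every value as a string, so any type conversion would need to be part of the stated constraints.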

15. Looking forward: the future of development #

As I showed in my local AI journey, I’m deeply interested in where AI and development intersect. Agentic coding represents a fundamental shift in how we build software:

From writing code to reviewing and directing
From memorizing APIs to understanding patterns
From debugging alone to collaborative problem-solving
From slower iteration to rapid experimentation

This isn’t about replacing developers—it’s about amplifying our capabilities. I’m still the architect, the decision-maker, and the quality gatekeeper. But now I can focus on the interesting problems while my AI assistant handles the implementation details.

16. Conclusion: principles over process #

My journey from vibe coding skeptic to agentic coding advocate might seem like a complete reversal, but it’s not. My core principle remains unchanged: never commit code you don’t understand.

What’s changed is that the tooling has finally caught up to the promise. With agentic AI assistants, I can maintain my standards while dramatically increasing my output. Most of those 32 packages on PyPI have users, and they are well-tested, documented projects that solve real problems.

The key is treating AI as what it is: an incredibly powerful assistant, not a replacement for thinking. Use it to explore ideas faster, implement solutions quicker, and test more thoroughly. But always, always understand what you’re building.

As I continue building in public and sharing my tools, I’m excited to see where this technology takes us. If you’re interested in trying agentic coding yourself, start small, maintain your standards, and prepare to be amazed by what you can build.

What’s your experience with AI coding assistants? Have you tried moving from generative to agentic tools? I’d love to hear your thoughts!

Links and resources #

