Coding from my phone 📱

WireGuard, Mosh, Zellij, and an AI that actually helps

I am dictating this post at an altitude of ten kilometers on a flight home from Mexico, Blink Shell open on my iPhone, while an agentic assistant on my NixOS machine writes an initial draft. Admittedly, I'm a little addicted to agentic coding: when an idea pops up, I want to jump on it right away, even if that means doing it from a cramped seat in the sky.

This is the workflow I've arrived at after trying many alternatives. It extends my agentic coding write-up and the self-hosted AI obsession in my local AI journey. It's my backup plan for those moments when a computer simply isn't nearby; the rest of the time I'm still at a keyboard like any other developer. This is in no way a replacement for a proper computer or laptop workflow, but it gets the job done when needed.

It's a personal, mostly open-source stack built around agent-cli, my local-first voice layer. Here's how it works: I speak into my phone, the audio goes to FasterWhisper for transcription, Ollama cleans up the text, and it lands on my clipboard ready to paste. My iOS Shortcut handles this through a small HTTP interface that agent-cli exposes. The coding agent itself still uses the best proprietary model I can access.

Meta: I’m writing about agentic coding on mobile from my actual phone. It’s not the most efficient way to write, but it gets the job done — during a 6.5‑hour flight I dictated and iterated this entire post. Each iteration was a commit: dictate changes, review the diff, refine, commit, repeat. See all 50+ commits in PR #40 for the full iteration trail.
TL;DR: iPhone → WireGuard → Blink+Mosh → Zellij. I dictate via a Shortcut → FasterWhisper (transcribe) → Ollama (polish) → clipboard, then paste into Codex CLI using OpenAI’s gpt-5-codex-high. Everything is local except the model.

1. Why Phone Coding Works Now #

For years I used iSH (a full Alpine Linux emulator) with SSH to hop into servers from my phone, but dropped connections were a constant frustration (Mosh fixes this, more on that later). Because coding on a phone keyboard is terrible, I kept it to tiny configuration tweaks or one-off fixes of a few characters. Agentic tools changed that: with a CLI coding agent, I don’t need to type the code—I describe the change, review the patch, and run it. That made meaningful work on the phone possible for the first time, for those moments when a computer isn’t around.

I tried VS Code in the browser, bounced between iSH and Terminus for SSH, and even lived inside a handful of in-browser terminal clients. I also spent time with mobile companions like Happy and Omnara, both designed to mirror Claude Code sessions on the phone, but they still felt like extra steps between me and my actual terminal. That friction pushed me toward a phone‑ready, self‑hosted workflow that still gives me raw SSH access to my own machine when the computer is out of reach.

This post is based on the way I develop software today. Your mileage may vary, but if you also care about privacy, open tooling, and reproducible environments, I think there are useful pieces here.

2. Constraints and Trade-offs #

This setup grew organically over time. What emerged are less strict requirements and more happy accidents that turned into core benefits:

  • Single trust boundary: Only the model provider (OpenAI) sees code context; audio and automation stay local, and I avoid additional third-party relays.
  • Resilient sessions: Connections survive sleep and spotty networks.
  • Voice-friendly: Dictation is accurate enough that I can trust it.
  • Reproducible config: The entire stack lives in my dotfiles and NixOS configuration.

To give you a sense of what I tried, here is the short comparison that convinced me to roll my own:

| Approach | Pros | Why I moved on |
| --- | --- | --- |
| VS Code in the browser | Familiar editor UI | Needs a steady connection and still lives outside my dotfiles comfort zone |
| iSH / Terminus SSH | Works without extra infrastructure | Laggy, no Mosh, and awkward keybindings |
| In-browser terminals | Instant access from anywhere | Poor copy/paste ergonomics and flaky mobile keyboards |
| Happy (Claude Code companion) | Push notifications, encrypted mobile UI for Claude Code | Requires wrapping every session with a separate CLI, still abstracts away my shell, and was quite buggy for me |
| Omnara (agent command center) | Centralizes Claude Code/Codex sessions with terminal replay | Proxies via their servers; I already trust OpenAI for the model and don't want another third party handling my code |

The stack below gives me the resilience of Mosh, the ergonomics of Zellij, and full control over the AI layer.

3. Layer 1: WireGuard From the Router #

WireGuard is a modern VPN that lets me securely access my home network from anywhere. I terminate WireGuard on my router so every device in the house (and on the road) can dial home with the same config. (If you want something easier to set up without opening ports, Tailscale is a more user-friendly option.)

  • Server: ASUS XT8 router with WireGuard enabled via the router UI.
  • Client: The WireGuard iOS app with On-Demand rules so the tunnel flips on whenever I am off trusted Wi-Fi.
  • DNS: All mobile sessions use my home DNS, so git.nijho.lt and internal services resolve instantly.

This gives Blink a direct LAN address for my desktop machine, nixos.
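For illustration, a minimal iOS-side WireGuard config might look like the sketch below. All addresses, keys, and the endpoint hostname are placeholders, not my actual setup, and the On-Demand rules themselves are configured in the WireGuard app's UI rather than in this file:

```ini
[Interface]
PrivateKey = <phone-private-key>
Address = 10.6.0.2/32
# Use the home router as DNS so internal names (e.g. git.nijho.lt) resolve.
DNS = 192.168.50.1

[Peer]
PublicKey = <router-public-key>
Endpoint = home.example.com:51820
# Route the home LAN and the WireGuard subnet through the tunnel.
AllowedIPs = 192.168.50.0/24, 10.6.0.0/24
# Keepalives help the tunnel survive NAT timeouts on mobile networks.
PersistentKeepalive = 25
```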

4. Layer 2: Mosh + Blink Shell #

What is Mosh? It's like SSH, but it stays connected when the network changes or the phone sleeps, and it feels faster on bad connections.

Blink Shell is my daily driver on iOS because it pairs beautifully with Mosh and has great keyboard ergonomics.

  • I launch sessions using mosh bas@nixos -- zellij attach --create phone (Zellij is my terminal multiplexer—more on that in the next section).
  • Mosh smooths over spotty LTE and keeps my session alive when the phone sleeps.

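If you want to adapt the launch command to your own hosts, a tiny helper that assembles it per user, host, and session can live in your shell config or a Blink snippet. The helper itself is illustrative; only the `bas@nixos` and `phone` names come from my setup:

```shell
# Build the "dial home" command for a given user, host, and Zellij session.
# Printing the command (instead of running it) makes the helper easy to test
# and to copy into Blink's snippets.
dial_home_cmd() {
  local user="$1" host="$2" session="$3"
  printf 'mosh %s@%s -- zellij attach --create %s\n' "$user" "$host" "$session"
}

dial_home_cmd bas nixos phone
# → mosh bas@nixos -- zellij attach --create phone
```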

5. Layer 3: Zellij Layouts #

I use Zellij as my terminal multiplexer for mobile work.

  • I stick to the defaults and use a couple of predefined layouts (e.g., a “phone” layout with editor + shell panes).
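As a sketch of what such a layout file could look like (my actual layouts live in my dotfiles; the editor command and pane names below are placeholders), a hypothetical ~/.config/zellij/layouts/phone.kdl might be:

```kdl
// Started with: zellij --layout phone
// Panes stack top-to-bottom by default, which suits a narrow phone screen.
layout {
    pane size=1 borderless=true {
        plugin location="zellij:tab-bar"
    }
    pane name="editor" command="hx"  // swap "hx" for your editor of choice
    pane name="shell"                // plain shell pane below the editor
    pane size=2 borderless=true {
        plugin location="zellij:status-bar"
    }
}
```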

Ergonomics matter even more on a glass keyboard, so I lean on a few shell helpers.

I break these tricks down in more depth in Terminal Ninja, explain how I sync them with my Dotfiles, and even package binaries like zoxide with Dotbins.

6. Layer 4: agent-cli Server #

agent-cli runs as a small server on my NixOS machine. In this workflow I only use its transcribe feature from iOS Shortcuts: the phone records audio, the server transcribes and cleans it up, and I paste the resulting text. (agent-cli has other features like autocorrect and chat, but I don't use them here.) A long-lived systemd user service keeps the server running so Shortcuts can POST audio to it and get cleaned text back, ready to paste into Codex CLI. The service is defined in my dotfiles; see configs/nixos/modules/user.nix (agent-cli service).
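To give a rough idea of the shape of such a unit (the real one is generated by NixOS from the module above; the binary path, subcommand, port, and flags here are all assumptions, not agent-cli's actual interface), a hand-written equivalent might look like:

```ini
# Hypothetical ~/.config/systemd/user/agent-cli.service
[Unit]
Description=agent-cli HTTP server (transcription + cleanup)
After=network-online.target

[Service]
# Placeholder path and flags; check agent-cli's docs for the real invocation.
ExecStart=%h/.local/bin/agent-cli server --host 0.0.0.0 --port 8765
Restart=on-failure

[Install]
WantedBy=default.target
```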

For more background on why I built it and how I use it in practice, see Local AI Journey (§3) and Agentic Coding.

The models and services run on the same box:

  • FasterWhisper via faster-whisper-server for high‑accuracy streaming transcription.
  • Ollama for on‑device rewrite/cleanup before sending prompts to the coding agent.

My iOS Shortcut dictation workflow is slower than Apple’s built-in dictation, but the accuracy is much better—and that matters more.

For the actual coding agent, I use a fork of Codex CLI with OpenAI’s gpt-5-codex-high model (here no open-source solution matches the frontier).

7. Layer 5: The iOS Shortcut Pipeline #

My iPhone’s Action Button runs a Shortcut that records audio and sends my voice to agent-cli on my NixOS machine for transcription and cleanup.

For the full recipe, see the iOS Shortcut Guide: agent-cli/iOS_Shortcut_Guide.md.

In short: I press the Action Button, the Shortcut records a snippet, sends it to agent-cli, and copies the cleaned text to my clipboard so I can paste it into Codex CLI.

The whole loop finishes fast enough that I can capture intent by voice and paste it into Codex CLI without typing.
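Conceptually, the Shortcut's network step is just an HTTP POST of an audio file, something you could approximate with curl. The endpoint path, port, and form-field name below are assumptions for illustration, not agent-cli's documented API:

```shell
# Assemble the curl command the Shortcut approximates: POST a recorded
# audio file to the agent-cli server and get cleaned text back.
transcribe_request() {
  local host="$1" port="$2" audio="$3"
  printf 'curl -s -X POST -F audio=@%s http://%s:%s/transcribe\n' \
    "$audio" "$host" "$port"
}

transcribe_request nixos 8765 note.m4a
# → curl -s -X POST -F audio=@note.m4a http://nixos:8765/transcribe
```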

Dictation quality: In my experience, the built‑in iOS dictation is absolute garbage compared to Whisper/FasterWhisper. Accuracy and punctuation are much better with the FasterWhisper server running at home; it’s a bit slower, but that trade‑off is worth it for clean, usable text.

8. Workflow in Practice #

Here’s the typical flow:

  • WireGuard auto-connects when I’m off home WiFi
  • Open Blink—my Mosh session is still alive from last time, Zellij panes intact
  • Dictate the change, paste into Codex CLI, review the agent’s response
  • Check diffs in an adjacent Zellij pane or in a separate Blink tab with its own Mosh session
  • The agent commits and pushes to a branch, then opens a PR via gh for me to review and merge

On the phone, I aim for the smallest practical edits and initial implementations. Even on personal repos, I still open a PR—sometimes prompting alone gets me to a merge‑ready result. For open‑source with real users, I finish on the computer with a careful review and any final edits.

9. Conclusion and Further Reading #

This phone setup extends my Agentic Coding workflow to mobile. I connect to the same Zellij session on nixos, so when an idea hits, I pick up exactly where I left off: same environment, same context, no setup. Everything runs on my hardware (voice processing and automation) except the coding model itself. It's the most effective mobile workflow I've had so far. It's mostly open-source not out of dogma, but because those tools are the best options for my needs. The one exception is the coding model: there's no true open equivalent right now, and it makes no sense to buy 20× H100s just to self-host a frontier model, even if one of that quality were available open-source.

If you’re curious to dive deeper, here are a few related posts:

I’d love to hear what self-hosted tricks you’re using for mobile development—reach out if you adapt this stack or build something wild on top of it.


Bas Nijholt
Staff Engineer