AI Shoulder Surf V3


What follows is an AI summary of our meeting. The target audience is mostly the folks who were on the call, but I’ll be happy if anyone else gets something out of it.

On Thursday, May 28, 2026 we had our third AI Shoulder Surf. The first writeup explains the format, but the short version: it’s an informal Zoom call where we share screens, talk about what we’re working on, and admit what we don’t know.

The connective thread this time was Tailscale: everyone at the table had quietly built a private home lab and was reaching it from anywhere without exposing a single port to the public internet.

"Stop....hammer time?" by troy_williams is licensed under CC BY 2.0 .

Olaf: Coach Claude

I have a triathlon coming up, so I asked Claude to build me a training program. I fed it a GPX file of the bike course so it could analyze the climbs and base the training around the actual route, then gave it my constraints — swim on these days, long run on a Wednesday — and it laid the whole thing out nicely.
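
For anyone curious what “analyze the climbs” can look like in practice, here is a minimal sketch. It is not what Claude actually ran. It assumes a GPX 1.1 file whose track points carry `<ele>` elevations in metres; the function names and the four-point sample track are made up for illustration.

```python
import xml.etree.ElementTree as ET

# GPX 1.1 namespace (assumption: the course file is GPX 1.1, not 1.0).
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def elevations(gpx_text: str) -> list[float]:
    """Return the elevation in metres of every track point that has one."""
    root = ET.fromstring(gpx_text)
    return [
        float(ele.text)
        for ele in root.iterfind(".//gpx:trkpt/gpx:ele", GPX_NS)
        if ele.text
    ]


def total_ascent(eles: list[float], min_step: float = 0.0) -> float:
    """Sum every rise between consecutive points that exceeds min_step metres."""
    gain = 0.0
    for prev, cur in zip(eles, eles[1:]):
        if cur - prev > min_step:
            gain += cur - prev
    return gain


# Made-up four-point track, only here to exercise the functions.
SAMPLE = """<gpx version="1.1" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="47.000" lon="8.000"><ele>400</ele></trkpt>
    <trkpt lat="47.001" lon="8.001"><ele>450</ele></trkpt>
    <trkpt lat="47.002" lon="8.002"><ele>430</ele></trkpt>
    <trkpt lat="47.003" lon="8.003"><ele>500</ele></trkpt>
  </trkseg></trk>
</gpx>"""

print(total_ascent(elevations(SAMPLE)))  # 120.0
```

Real GPS elevation is noisy, so a non-zero `min_step` is one simple way to keep jitter from inflating the total.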

As a bonus it generated a little static website for the plan. I spun that up on my Hetzner VM and within about five minutes something had already tried to GET the .env file. It wasn’t being served, so no harm done, but it was a good reminder of what the open internet is like the moment you put anything on it. So I moved the whole thing onto my Tailscale network instead. Now my training plan isn’t getting probed by creeps — it’s just there for me when I want it.

The other nice touch: Claude also produced an iCal feed and parked it in a GitHub gist. I subscribed to that URL from my phone calendar, and it actually updates when the plan changes. I’m happier with that than I would be wrangling a spreadsheet or paying a monthly fee for an app that doesn’t quite do what I want.
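
A calendar subscription is just a text file in iCalendar format (RFC 5545) served from a stable URL. The sketch below shows roughly what such a file contains. It is an illustration, not the generated code: `build_calendar`, the `PRODID` value, and the `training.invalid` UID domain are all placeholders.

```python
from datetime import date, datetime, timezone


def escape_text(value: str) -> str:
    """Escape the characters RFC 5545 reserves inside TEXT values."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def build_calendar(sessions: list[tuple[date, str]], stamp: datetime) -> str:
    """Render all-day events as an iCalendar document with CRLF line endings."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//training-plan//EN",  # placeholder product id
    ]
    for index, (day, summary) in enumerate(sessions):
        lines += [
            "BEGIN:VEVENT",
            # A UID that stays the same between runs lets clients update
            # an existing event instead of adding a duplicate.
            f"UID:{day:%Y%m%d}-{index}@training.invalid",
            f"DTSTAMP:{stamp.astimezone(timezone.utc):%Y%m%dT%H%M%SZ}",
            f"DTSTART;VALUE=DATE:{day:%Y%m%d}",
            f"SUMMARY:{escape_text(summary)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"


plan = [(date(2026, 6, 1), "Swim, easy"), (date(2026, 6, 3), "Long run")]
print(build_calendar(plan, datetime.now(timezone.utc)))
```

The sketch skips line folding for long values and derives UIDs from list position, so reordering the plan would change them. A real generator would want something sturdier for both.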

Aaron: A job-hunt workspace you can reach from anywhere

Aaron has been job hunting, and he hit the familiar problem of bookmarking a posting to “apply later” and then forgetting it. So he built himself a little GitHub project: a spreadsheet of every place he’s thinking about applying to. When he’s in the mood, he kicks off an agent — he’s on Gemini CLI / Antigravity now — to research a company, draft a cover letter, and coach him with questions about the role.

The clever part is where it lives. The whole thing sits in a folder on his VM, so when he’s at a coffee shop and feels like chipping away at it, he connects to his Tailscale network, SSHes into the VM, and it’s right there. No syncing the project onto multiple machines, no “which laptop has the latest version.” One home for the work, reachable from anywhere on the tailnet.

Aaron: A wildcard tailnet with Homepage

Aaron is also self-hosting a pile of little static reports and dashboards, and he’s wired them together with Homepage — a project that gives you a web front end plus an NGINX reverse proxy on a Docker Compose stack. He spins the stack up on his VM and points the reverse proxy at his various static-page folders.

Then the DNS trick: he created a *.dev.<his-domain> wildcard A record pointing at a Tailscale IP address for his tailnet. So every time he spins up a new static site on the VM, it’s instantly reachable at a something.dev.<his-domain> URL — but only if you’re on his tailnet. Public DNS, private reachability.
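
Why a public record can still be private: Tailscale hands out IPv4 addresses from the carrier-grade NAT block 100.64.0.0/10, which is not routed on the public internet. Anyone can resolve the name, but only a device on the tailnet can reach the address. A small sketch of that check follows; the function name and sample addresses are illustrative, and it only covers IPv4.

```python
import ipaddress

# Tailscale assigns tailnet IPv4 addresses from the CGNAT range.
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")


def is_tailnet_address(addr: str) -> bool:
    """True if addr is an IPv4 address inside 100.64.0.0/10."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip in TAILNET_V4


# Illustrative addresses, not anyone's real records.
for candidate in ("100.101.102.103", "100.128.0.1", "203.0.113.7"):
    print(candidate, is_tailnet_address(candidate))
```

Running this against whatever address a wildcard record returns is a quick sanity check that a new site is not accidentally pointing at a public IP.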

Mateu is doing something similar from the other direction: he runs a couple of Pi-hole instances for DNS, and he’s made Tailscale aware of them. Because Tailscale knows about his internal DNS servers, it can resolve his domain names and route to the right instance behind an NGINX reverse proxy — which means he doesn’t need the Tailscale client installed on every machine. The cloud in the home, as he put it.

Claude’s note: three people, three slightly different recipes, one shape. A private overlay network plus your own DNS turns “self-hosting” from a port-forwarding-and-dynamic-DNS chore into something closer to a personal cloud. The work has moved from “how do I expose this safely” to “how do I name this conveniently.”

A quick note on what Tailscale actually is

A useful clarification came up for anyone who hasn’t set it up. Tailscale isn’t exactly a VPN in the route-all-my-traffic sense. Without an exit node enabled, it behaves more like a VLAN — it just gives the machines on your tailnet a route to each other. It only routes all of your traffic through another node if you explicitly turn on exit-node mode. And because it punches through NAT for you, there’s no opening ports on your router, which is most of the historical pain of self-hosting gone in one step.

The recurring sentiment around the table was simply: Tailscale has been a blessing. I can now mostly just tell Claude to set up the Tailscale piece I need and it works.

Coding from a phone

Aaron joined part of the call from the road and has been doing real work from his phone over Tailscale. Two things made that bearable. First, Termius, a terminal app with pinch-to-zoom, so bumping up the font size mid-session is trivial. Second, saved macros — he keeps things like docker ps one tap away to check whether anything is still alive. That’s a small feature that turns out to be a lifesaver on a tiny screen.

The pain point we all recognized: tmux on a phone. Trying to fire a Ctrl-A prefix and then a pane-switch key on a touch keyboard is genuinely miserable. Aaron’s suggestion was to record the whole key combination as a macro so you can switch panes with a single tap — a good idea I hadn’t tried.

Tools roundup

A lap around what people are running:

  • OpenClaw as the driver. Mateu still uses OpenClaw on his OpenAI subscription as his main controller — mostly conversational, which is where he spends most of his time. After a rough patch the project refocused on stability and it’s noticeably better. A new feature he wants to try lets the agent join a Google Meet as a full participant with a voice. I pointed out that video plus text-to-speech sounds like a fast way to rack up tokens.
  • Hermes as a backup. Mateu has also picked up Hermes, a Python-flavored cousin of OpenClaw that’s gaining momentum, and likes keeping a second agent around — when an update breaks one, the other keeps working. Many heavy users drive these through Discord for its threaded conversations, though Mateu uses OpenClaw’s built-in Control UI. On memory, he’s been using a notes plugin that searches saved memory, which works well but occasionally forgets to look in all the right places.
  • Two quotas, not one. Aaron’s find of the day: Gemini on the web and the Gemini CLI draw from separate quotas, even when you’re signed into the same subscription. So he does his planning and architecture conversation in the web app, asks it to emit markdown — a plan, a README, a fresh agent config — then downloads those into his working directory and runs the CLI against them. Two budgets, planning on one and implementation on the other.
  • Koan and plan review. Koan runs on a subscription rather than API credits, and Mateu confirmed you can have it review its own plan before it acts. I’d found some of its autonomous behavior a bit head-scratching, and reviewing the plan up front is the lever for that.

Claude’s note: the “two separate quotas” thing is the kind of accidental arbitrage that won’t last, but while it does it neatly matches the natural shape of the work — expensive, exploratory thinking in one place, cheaper mechanical execution in another.

Sandboxing and how much to trust the agent

Everyone has landed somewhere different on the trust-versus-isolation dial.

I’ve been leaning on Nono sandboxes, even inside a VM that exists only for side projects, partly because I once gave Claude broad filesystem permissions and didn’t love what it did with them. Nono lets you define profiles and compose them, so I have a wrapper that detects whether it’s a Go, Perl, or JavaScript project and pulls the right profile automatically. It gets trickier when I want to grant access to Terraform or SSH, which I’m still working out. It matters most for the CPAN security work: when I’m poking at a proof-of-concept exploit, I want the agent locked down hard, because the last thing I want is an over-eager agent opening a public issue about an unpatched problem.
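
A minimal sketch of the detection step, assuming each project type can be recognized by a marker file in the project root. The marker list and profile names are illustrative placeholders, not the real wrapper or real Nono profile names, and the sketch deliberately stops short of invoking Nono itself.

```python
from pathlib import Path

# Marker file -> hypothetical profile name. First match wins.
MARKERS = [
    ("go.mod", "go"),
    ("cpanfile", "perl"),
    ("Makefile.PL", "perl"),
    ("package.json", "javascript"),
]


def profile_for(filenames: set[str], default: str = "base") -> str:
    """Pick a profile name from the file names found in a project root."""
    for marker, profile in MARKERS:
        if marker in filenames:
            return profile
    return default


def detect_profile(project_dir: Path) -> str:
    """Look at the files directly inside project_dir and choose a profile."""
    names = {p.name for p in project_dir.iterdir() if p.is_file()}
    return profile_for(names)


print(detect_profile(Path.cwd()))
```

Falling back to a restrictive default when nothing matches keeps an unrecognized project from silently running with a looser profile.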

Aaron keeps a dedicated Ubuntu VM that holds nothing but his dev environment, backed by snapshots, so a wiped disk is a non-event. For his own side projects he often doesn’t bother sandboxing further, since the sister directories share the same credentials anyway. When he wanted real isolation he used to run Gemini CLI inside its own restricted Docker container; lately he’s been trusting Antigravity’s allow-list controls for system commands, running it on the host in its own sandbox mode.

Mateu sits at the trusting end: LXC for some instances, bare metal on macOS for others, and because he enjoys the DevOps side he hands the agent SSH access and passwordless root so it can manage other machines. No incidents so far — knock on wood.

Claude’s note: this is the same axis as the v1 and v2 conversations, but the spread is the interesting part. The right amount of isolation isn’t a constant — it tracks the blast radius of the task. A throwaway side project and a proof-of-concept exploit deserve very different cages.

Identity, roles, and the tax of least privilege

The richest thread came out of Aaron comparing AWS and Google Cloud. GCP lets you define more granular IAM roles and pin a role to an identity, and it makes it easier to see what permissions you actually hold in a given environment. That got him thinking about agents. Today, an agent usually acts as you — it carries an SSH key that represents you, reused across many projects, which is only ever as secure as you are. His plan is to flip that: generate scoped, role-based keys and hand the agent an identity that says “this is me, but a sub-agent that can only do these specific things,” reusing the same role template across repos so he isn’t minting a fresh identity per project.
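
One way to picture the “same role template, many repos” idea is a deny-by-default allow-list that is defined once and attached to a per-repo identity. Everything in this sketch is hypothetical: the class names, the action strings, and the principal format are invented for illustration and do not correspond to any cloud provider’s IAM API or to Aaron’s actual setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleTemplate:
    """A reusable, named set of actions. Anything not listed is denied."""
    name: str
    allowed_actions: frozenset[str]


@dataclass(frozen=True)
class AgentIdentity:
    """A sub-identity: the owner, narrowed to one role in one repo."""
    owner: str
    role: RoleTemplate
    repo: str

    @property
    def principal(self) -> str:
        # Hypothetical naming scheme, used only to make the scoping visible.
        return f"{self.owner}+{self.role.name}@{self.repo}"

    def can(self, action: str) -> bool:
        return action in self.role.allowed_actions


# One template, reused across two repos instead of minted per project.
reviewer = RoleTemplate("reviewer", frozenset({"repo:read", "pr:comment"}))
agents = [AgentIdentity("me", reviewer, repo) for repo in ("job-hunt", "reports")]

for agent in agents:
    print(agent.principal, agent.can("pr:comment"), agent.can("repo:push"))
```

The point of the shape is that the agent never holds the owner’s full identity: widening what it can do means editing the template, which is a visible, reviewable change.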

Mateu came at the same topic from the scar tissue side. At a former job he spent an entire day — hundreds of messages back and forth — chasing permissions one piece at a time, to the point where he couldn’t even see his own permissions because he lacked permission to view them. His take: least privilege is great in theory, but it falls apart when the people granting access don’t have full knowledge of what’s needed, and it becomes a tax that slowly drains your will to do the actual work.

That tipped us into a broader gripe about process: how a small shop lets you just pick something up and do it, and how every additional reviewer — team, then SRE, then a security gate with nitpicks that aren’t even security-related — adds drag, until a long-lived branch becomes a thing to dread. I admitted that a nitpicky AI review sometimes reminds me of an annoying colleague; the difference is I can turn that one off.

Claude’s note: there’s a real tension here that the agent era sharpens rather than resolves. Aaron wants more granular identities so an agent can be trusted with less; Mateu has lived the version where granularity becomes its own full-time job. Both are right. The thing that makes scoped roles humane is good tooling to see and apply them — which is exactly what Aaron liked about GCP and exactly what was missing in Mateu’s story.

The Bigger Themes

A few threads ran through the whole session:

Tailscale changed the default. Self-hosting used to mean port forwarding, dynamic DNS, and a low-grade anxiety about what you’d exposed. The whole table has quietly moved to a private overlay network where the hard question is no longer “is this safe to expose” but “what do I want to call it.” That’s a real shift in posture.

The work is portable now. Between Aaron’s coffee-shop SSH sessions and a training plan that follows me to my phone calendar, the pattern is the same: keep the work in one place on a VM, reach it from anywhere on the tailnet. The device you’re holding stops mattering.

Scope the agent’s identity, not just its sandbox. Last time the theme was tighter isolation. This time it extended inward: not only where the agent runs, but who it is when it acts. Role-based identities for agents are the logical next step from role-based containers.

Process is the hidden cost. The flip side of solo speed is that bureaucracy is a tax you feel acutely once you’ve worked without it. The same instinct that makes people scope their agents carefully also makes them allergic to ceremony that doesn’t earn its keep.

Claude’s note: the through-line from v1 to v2 to v3 is consistent — containerize, choose your models deliberately, shrink the blast radius. V3’s contribution is that the boundary people care about is moving from the machine to the identity.

Cadence

Same as before: we’ll keep doing these on an ad hoc basis. Frequent enough to keep up with what people are building, infrequent enough that it never becomes a standing meeting.

Where This Is Going

There will be a V4. If you’d like to join the next one, reach out.

