Programming

谢谢你，中国！ (Thank you, China!)

July 1, 2026 llms programming

First things first: Happy Canada Day, everyone! Bonne fête du Canada, tout le monde !

Yes, I know. Starting a post that celebrates Canada Day with a giant title thanking China might feel a bit ironic, if not entirely contradictory. But stick with me for a minute. Because if you are a Canadian working in tech right now, navigating a world where our longest-standing friendship and alliance has ruptured into open hostility from the US government, you should be thanking them too.

For decades, the Canadian tech ecosystem operated under an unspoken assumption: we were fundamentally part of the same digital backyard as Silicon Valley. We relied on the same infrastructure, trusted the same partners, and assumed that a shared border meant shared access to the future of technology.

The rise of large language model regulation in the US has shattered that illusion. As LLMs became the foundational infrastructure for the next generation of software, the narrative shifted toward a suffocating, centralized reality. Suddenly, it looked like access to “intelligence” would be controlled by a couple of trillion-dollar American tech giants, operating under the tight regulatory grip and arbitrary “national security mandates” of an increasingly authoritarian and hostile US government. That’s… not great.

The message to us and to the rest of the world was clear: if you want to build next-gen software, you have to rent it from them and you have to behave or they’ll turn off the tap.

When the US government forced Anthropic to turn Fable and Mythos off a couple of weeks ago, it hit me how suddenly things could turn. And that’s when I started thinking about how different things could have been had China not chosen the path of open source.

The Open-Source Weapon

Had Chinese AI labs followed the American playbook, the trap would have snapped shut.

When the US president put measures in place to prevent chips from being sold to China, they inadvertently scored one of the biggest own goals ever. Faced with severe hardware restrictions and intense domestic competition, Chinese companies and research labs like Alibaba, Z.AI and Moonshot took a radically different path. While American companies felt safe in closing their gates, shifting from open research to aggressive, closed-source monopolies, the Chinese chose to weaponize open weights.

They realized the fastest way to bypass the American monopoly was to simply give the models away.

And they didn’t just dump mid-tier toys. Their models are improving at a much higher pace compared to the American ones and have pretty much caught up in practical terms. They’ve consistently commoditized the technology that the US would love to leverage against us.

Autonomy through Proliferation

There is a profound irony here. China has long been known for its heavy domestic internet censorship and state oversight. Yet, they’ve inadvertently become the world’s greatest guarantor of decentralized, open-source AI infra.

By flooding the ecosystem with high-quality open-source models, they’ve made it functionally impossible for an authoritarian US government to lock down the future of software development.

Once a model like GLM 5.2 or Kimi K2.7 is on Hugging Face, downloaded, and running locally on hardware right here in Canada, the gates cannot be closed. It belongs to the global developer community. Better yet, it can be fine-tuned, quantized, and run entirely offline.

This isn’t about blind defence of the Chinese government or anything. It’s a pragmatic look at power dynamics: hegemony is the absolute enemy of innovation, and proliferation is the ultimate antidote to bullying.

When global superpowers compete by releasing open infrastructure, developers win. In fact, the aggressive open-sourcing of Chinese frontier models may threaten the duopoly of Anthropic and OpenAI. OpenAI has already postponed their IPO, as companies start having trouble reconciling the costs of inference charged by these labs with whatever gains in productivity they’re seeing from them.

Because Chinese labs chose to open up their models, the dream of an API-locked AI dystopia where Canada and other “middle powers” can be frozen out at the border is dead. We have choices, which means we have autonomy.

So, to the engineers and teams pushing the boundaries of open weights on the other side of the world, from a developer celebrating a complicated Canada Day: 谢谢你，中国。 You kept the future of AI open when our neighbours tried to lock it away.

Updated LLM Coding Workflow

May 20, 2026 llms programming

Back in January I posted about how I view LLMs, which included my workflow of doing LLM-assisted coding. To summarize, my workflow was:

Reverse rubber-ducking
Planning and writing a spec file
Implement each phase of the plan, one by one
Validation and commit

I can say a lot has changed since January. For one, models are significantly better and more reliable. As well, I feel like I’ve got better at steering them. If I look at the above list by itself, without details, it doesn’t feel like things changed so much, but look closer and it’s a whole new world. My current workflow is an evolution of the above.

Before continuing, let me make something clear: this is my professional workflow. It’s what I use to write production code on cloud services.

Phase 0: Reverse Rubber-ducking

I still have this, but I no longer use a chat interface. I start directly with an agent (almost always OpenCode) and I now first get the agent to engage with the code before anything else. Let’s say I need to make a change to the flux capacitors, so I go and tell the agent what I think happens:

This is cloud-service-foo and it handles requests to create farbelizer connectors. I believe it then nimbolizes the farbelizers before sending them to cloud-service-bar that processes them through the flux capacitors. Check what the actual flow is and summarize it for me.

Most of the time – though not always – I know exactly how the flow works, but I do this to sort of prime the LLM for discussing what I want to change. I just found that it tends to work well for me; better than just telling it directly what I want to change.

An indirect effect of doing this is that sometimes it will tell me something that doesn’t match my understanding, so I ask for details to figure out if it’s really something I missed or just something the LLM got wrong.

When I know the LLM has the context, I will say something like

I have an issue where if the farbelizer connector starts with “foo”, then the flux capacitors should suppress the harmonic back-feeding before it reaches the primary gimbal housing.

I often add a little more about what the actual goal is, but it’s something like this. This usually causes the LLM to tell me what it thinks should be done. It will sometimes ask a question or two, but eventually it will give me a solution. Many times the solution is one I know I don’t want due to some constraint and I will tell it. Sometimes I steer it a bit more and tell it what I think we should do.

And then when I’m happy, I’ll say:

Plan a series of independent PRs to implement this. List which ones can be done in parallel vs linearly

That’s it. That’s the entire plan phase now. I no longer need a spec file.

Phase 1: Implementation

Given the list of PRs, I will ask it to implement them either a few in parallel or, if there’s a dependency, one by one. I also now let the LLM agent commit its changes. (If they were in parallel, I also let it push the changes.)

Again, that’s it?

Phase 2: Validation

I then test the work locally; I still insist on doing that because I don’t want to cause a SEV. I’ll then push the branches and review the diffs myself in GitHub before asking others to review: I want to avoid wasting people’s time.

You still have to watch them

I had an interesting interaction with GPT 5.5 a few weeks ago, where it wrote code that was akin to this:

attempt := 0

for {
	attempt++
	if attempt > maxAttempts {
		return errTooManyAttemps
	}
	err := fetchData(ctx)
	if err != nil {
		switch {
		case errors.Is(err, errTimeout):
			fmt.Println("Warning: Timeout occurred. Retrying...")

		case errors.Is(err, context.Canceled):
			fmt.Println(" -> Context was canceled. Exiting...")

		default:
			return err
		}
		
		time.Sleep(100 * time.Millisecond) 
	}
}

When I saw that, I immediately knew it didn’t look right, so I asked the agent about the case with the context.Canceled and it happily explained to me that it would log the error and then return with default:. I said, no, it won’t, that’s not how Go switches work. And it insisted! “I understand your confusion, but because there is no break statement, the code will simply fall through the next case.”

No, it forking won’t! So I told it to prove it by writing a test that returned a context canceled. It did, caught the infinite loop and conceded.

My point? They can still make mistakes. I have to check them.

Conclusion

That said, I will concede that the LLMs are so much better now and that these errors are getting more and more rare. My flow is much quicker than before. I still review code like a caveman, I still make sure the LLM gets what I want it to do. But I basically killed the entire “plan” step. It’s just not needed. And I almost never write code by hand.

LLMs Are Tools, Not Replacements

January 12, 2026 ai programming

I’ve been meaning to write this post for a bit, but never found the right time. I guess this is it. Until sometime last year, I was more or less an AI-skeptic. I say more or less because I was always very interested in the technology. I built my own LLM to learn about it and I thought then, as I do now, that the technology is incredible.

And yet, I had tried using LLMs to help with coding and my experiences were not great. I used LLMs to write one-off scripts for me, and they were very good at that. But whenever I tried to use them to help me write “production code”, they would hallucinate or get stuck in a “bug loop”. I felt like I was spending more time dealing with the aftermath than I would have spent writing it all by hand. I even disabled Copilot autocomplete because I felt like it was distracting.

Fast forward to today and most of my code is written by LLMs. How this change happened is a combination of how much the tooling improved but also the recognition that I was holding it wrong.

Now, don’t get me wrong. This post is not meant to convince anyone of anything. I’m not selling anything here. This post is for engineers who are curious about how others work with LLMs and trying to find their own workflow. I’ll show you exactly how I work now and how it works for me.

The bug that changed my mind

As mentioned, I was a bit of a skeptic. I knew LLMs were good at writing one-off scripts and I was using them a lot for that, but not more than that. Then one day someone asked for help with a bug.

We had this multicell architecture and we had a proxy/multiplexer that would decide where any given request should be routed to. Once that decision was made, the request would be proxied to an ALB using a custom transport. The ALB had resource mappings to know where inside a given cell things were hosted, so the custom transport requested a URL from the ALB, and the ALB responded with a redirect to the actual destination inside the cell it belonged to. The custom transport would then reissue the request to the correct destination.

The bug: seemingly at random, some requests would succeed and some would not and no one could figure out why. So I started looking and quickly found that it wasn’t random at all: requests with bodies would fail. When I saw that, I immediately thought it was the custom transport eating the body, except I remembered writing that transport and found it hard to believe the issue was there. And upon looking at the code, it seemed fine. I added logging and went about trying to reproduce the issue. The code seemed correct, but the issue was still there.

After a while, I decided to try Claude Code. I launched it on the repo and explained the problem. I’ll admit I did not have high expectations, but hoped that maybe it could give me some insight that would help. To my surprise, in about 40s it came back saying it had found the issue: the transport was eating the request bodies. My first reaction was being frustrated because I knew I had already looked at it and the issue was not there. I thought Claude was being dumb. Except I noticed it was showing code that didn’t look like what I was looking at. Long story short: at some point, someone had copied and pasted some code and added a second custom transport somewhere where it shouldn’t, and that transport had a bug.

I didn’t fully convert then, but I started paying more attention. I began using LLMs for debugging and code reviews, things where being wrong was mostly harmless and I could verify the output easily. Over time, that expanded. Now we’re here.

The mistake I made early on

When I first tried AI coding tools, I treated them like code generators. Describe what you want, get code back, paste it in, repeat. This was the intuitive way to use them, and it’s wrong as far as I am concerned.

For those one-off scripts I mentioned before, I recognize now that I was “vibe coding” them. But that was fine because they were only going to be used by me. But I don’t let LLMs write unsupervised code that I need to ship for others. So the problem is that generated code requires review. Review requires understanding. If you didn’t think through the implementation yourself, you’re now reading code you don’t fully understand, looking for bugs you can’t anticipate, in an approach you didn’t choose. You’re doing more cognitive work than if you’d just written it yourself, and the code is probably worse.

The mental shift that made everything click for me was that LLMs are tools, just like LSPs were tools, and pre-LLM autocomplete was a tool. They’re not a replacement, but a complement. A junior engineer who has read everything but never built anything. Lots of talent but absolutely not trusted unsupervised.

My workflow

This is how I work with LLMs. I found that this works very well for me. I am aware that it is a much more involved workflow than a lot of people’s.

Phase 0: Reverse Rubber-ducking

I don’t start in an agent. I start in Claude, just chatting.

Before I write any code, I want to understand the domain. If I’m implementing auto-updates for a macOS app, I am asking Claude about how Sparkle works. Not “implement auto-updates for me”, but “how does Sparkle choose when to prompt the user?” or whatever. I want to know the concepts, gotchas, tradeoffs, etc. I often talk about some other app and ask “how does X do this?”

This is basically rubber-ducking in reverse. I’m building my own mental model through conversation. By the time I’m ready to touch the code, I actually understand what I’m about to do. This matters because it means I now can review what the LLM produces. I develop an intuition for what to expect, which in turn lets me quickly spot when something is wrong.

This phase gives me confidence, and that matters. And of course, this is mostly for areas I am not already familiar with. But even when I am familiar, I find that these conversations give me insights or tell me what I need to ask when doing the plan.

Phase 1: Plan

Now I move to an agent. Lately I’ve been using Amp, but the specific tool matters less than the process. This could be Claude Code, Codex CLI, etc. My process is tool-agnostic.

I don’t say “build me X.” Instead, I start another conversation, mostly a Q&A. “How would you approach this?”, “What are the steps?”, etc. I challenge it when something sounds off. I often ask the LLM to push back on my ideas if it thinks they’re not good. I may still insist but it’s good to have some pushback here and there. We go back and forth until I’m satisfied with the approach.

Then I ask it to split the plan into the smallest self-contained, testable phases. This is critical. I want each phase to be something I can review, run, and validate before moving on. Those codebase-wide big changes are where things go off the rails.

Finally, I have it write everything to a spec.md file. This serves two purposes: (1) it’s a reference I can point the LLM to if context gets lost, and (2) it’s documentation of what we decided and why. For longer projects, this is how I resume after a break. I also make manual adjustments to this plan when needed, though this is getting more and more rare.

Phase 2: Implement each phase of the plan, one by one

Now the agent starts writing code, one phase at a time.

I watch the diffs as they flow in and because I was part of the planning and did my homework in Phase 0, I know what to expect. A quick glance usually is enough to tell me if it’s writing what we discussed or going off-script. That’s why the prep work matters: review is fast when you understand what you’re looking at.

I also give it context to save time. The agents nowadays are very smart and can find their way, but I can shortcut that by giving it hints: “in internal/foo/foo.go there’s a function called DoFoo() and it does this and that and I want it to do that other thing before that” or whatever. Fewer tokens, faster iteration. This is probably astrology for nerds, pure superstition at this point, but I still do it. (Hi, it’s me, from the future: maybe it’s not astrology?)

Here’s a little trick I’ve started using: cross-agent reviews. Once Amp finishes a phase, I’ll ask Claude Code or Codex to review the diff. Different models and harnesses catch different things. It’s not foolproof, but it’s cheap and occasionally catches something I missed.

Phase 3: Validation, commit, and handoff

Once a phase looks good, I test it. I run and do what I can to validate it. I’ve mostly reviewed the code both by myself and using an LLM.

If something is wrong, I iterate with the agent. I point out the problem and let it fix it. This usually works and only very occasionally I have to take over and fix it myself.

When I’m happy, I commit. This is an easy rollback point if something goes wrong afterwards. At this point I use Amp’s /handoff command to start a fresh context for the next phase. This is a forced boundary: the agent will start clean (though it can reference the previous phase in Amp), it will re-read the spec and we continue. This helps prevent context rot, which is where long sessions start to drift.

Trust Boundaries

I rely on LLMs heavily but I don’t trust them.

These are the lines I don’t let them cross:

Nothing ships without my review. I read every line before it goes in. I am too anxious to ship something I don’t understand. That prep work from Phase 0 is not just about understanding, but about making review fast enough that this is sustainable.
Don’t let the LLM write tests unsupervised. I learned this one the hard way. When a test fails, LLMs often “fix” the test to make it pass. I’ve heard this is less likely nowadays but I’ve been burned and trust isn’t easily restored. So there. Now I’m extremely careful about letting them modify test code. The only thing I do like to use LLMs for in testing is asking them “do the tests cover the case where this, this, and this happen?” It helps find holes in the coverage.
Debugging is still mostly me. This is ironic, given that debugging a bug was my entry point into using LLMs more and more, but I’ve found that for my day-to-day debugging, I’m usually faster on my own. I reach for an LLM if I’m stuck, not as a first resort. Maybe this is muscle memory or maybe the tooling is weaker here. Either way, I don’t force it.

What still doesn’t work well

I want to be honest about the limitations, because the hype around these tools is exhausting.

I don’t think they’re good at complex refactoring across many files. The agent loses the thread. It will make changes that are locally correct but globally inconsistent. For big refactors, I still do a lot of manual work. I feel like the quality of the code after an LLM-assisted refactor is not great.

Also, anything requiring deep context about the codebase’s history. Why is this weird workaround here? What’s the implicit contract this function has with its callers? The agent doesn’t know, heck most people don’t either, but whereas a human might be reluctant, LLMs will happily remove that code that seemed inconsequential but that now breaks some contract with a client.

And the final one can be controversial, but I think they’re bad at novel architecture decisions. Don’t get me wrong, ask an LLM to design something and it will, but then you ask it “oh but what if…” and it will immediately “yes good point” and redesign it all. It just goes along with whatever you last said. It doesn’t know how to make decisions. It shouldn’t be surprising given how LLMs work, but our brains tend to anthropomorphize everything and then these things become counterintuitive. So I still have to think about architecture myself.

The Real Lesson

These tools have changed a lot — GPT 5.2 and Opus 4.5 are watershed moments IMO — but not as much as my own approach did. I stopped trying to skip the thinking part and started using LLMs to enhance it. The agent participates in discovery, planning, obviously implementation, and also reviews, but I am still driving.

If you bounced off these tools, it might be worth trying again with a different approach. That’s all I’m saying.

I’ve found that my workflow is more work upfront, but dramatically less work overall. More importantly, it lets me focus on the interesting parts and helps me with the drudgery.

Trying out Codex CLI

August 25, 2025 ai programming

A while ago I was a little skeptical of AI-assisted coding. Mostly because my experience had been with Copilot autocomplete and it was really not good. I still avoid AI autocomplete to this day, even though I can see it got better, because I still find it distracting and often still not great.

That said, Claude Code shook my world view and I’ve been daily driving it ever since. I need to write a post about how I use this agent, but tl;dr I use it for the boring parts of coding and to help me read and review code (especially my own) instead of using it to write feature code.

I have been happy with Claude Code, but I also heard very good things about the new GPT-5 model for coding and wanted to check it out. Enter the Codex CLI. It’s OpenAI’s answer to Claude Code.

I am approaching this with a very open mind. I completely understand that it is early times in Codex CLI land and thus I did not expect it to have feature parity with Claude. I’m ok with that, just to get that out of the way.

The onboarding was rough

My first experience with it was that it wouldn’t install due to an issue in the post-install of a dependency (ripgrep, which, I must say, I already had installed). I went to file a ticket and saw that someone else had already done so.

No matter! I thought. I figured out how to get around it and then decided to try it. I opened a local repo and typed /init.

Codex decided it wanted to run tests to check the status of the repo. Fair enough, go ahead. It then failed to compile my Go code, claiming the Go toolchain wasn’t available. I was confused by that, so I closed Codex CLI and ran go version, all good. I ran my tests, all passed. Wut?

I tried again and this time I told it that I checked and I had the toolchain installed. It tried again, no dice. It kept trying until eventually I stopped and did some digging. That’s when I learned that Codex CLI runs inside a sandbox and doesn’t share my shell’s environment. Ok, that was a little upsetting. So I asked Codex CLI how we could provision the sandbox with Go. It proceeded to look for Go 1.13, which was released almost six years ago. It asked me to download the tarball and leave it in a certain directory and it would take it from there.

Ok, time for some more digging. It’s at this point that I must point out that the Codex CLI documentation is basically non-existent, and being a relative newcomer, there are not a lot of resources out there. Again, I get it, let’s just get through these initial steps.

I kept at it until I figured out the issue: though my shell’s PATH included Go 1.25, the sandbox’s did not. I couldn’t quite figure out why but I did manage to get it working by telling GPT where to find the Go binaries.

Once it got working

Now, once I got it working, things went a lot smoother. I quickly got used to the differences from Claude Code (and they are many) and got somewhat comfortable with it. I got GPT to analyze my code and look for bugs and it found a minor one that had escaped Claude for a long time. That was cool.

I found that it tends to be a little noisier than Claude Code, because CC tends to hide some things behind its quirky verbs (“lampooning…”, etc.). This isn’t necessarily a negative, just different and something to get used to.

I miss the TODO lists that Claude Code creates and follows. Again, not a huge deal. The part that it needs to improve is tool calling. More than once I saw it calling some Go tool with bad parameters. And also, it doesn’t seem to quite grasp the error messages.

Case in point, I asked it to run a linter, so it started running golangci-lint, but it ran it at the root of the repo, where there are no Go files, and without parameters, which resulted in an error “No Go files”. It didn’t seem to understand this error and concluded golangci-lint wasn’t installed.

It then entered a loop trying the same command over and over again until I interrupted it and told it to pass ./... to include subdirectories. It then tried again with the parameters, but bizarrely this time it decided that golangci-lint would be in ./bin, which is not true at all. So I had to tell it where to find it. And then it worked fine.

Conclusion

It’s early days and it’s clear there’s some ground to make up if they want to catch up, but I also remember the early days of Claude Code. The CC team iterated quickly and we got to where we are today, and I’m hoping the Codex team will do the same. They seem very active in answering questions on X, so I have hope.

I’m hopeful and interested. I’ll keep an eye on it.

Try was worth trying

July 17, 2019 go programming error-handling

The Go proposal committee has declined the try proposal.

I am disappointed. I also think this is open source working exactly as it should.

The proposal was not dropped because one person with commit access disliked it. A concrete design was published, tools were written to try it on existing code, hundreds of examples and objections were discussed, and the people behind it kept answering questions long after I would have quietly disconnected my router and moved to the woods.

Most importantly, the Proposal Review Committee made its decision in response to that discussion. The closing note mentions problems the original design had missed, including debugging prints and code coverage. It also admits something more fundamental: many Go programmers do not agree that the verbosity of error forwarding is a problem worth changing the language to solve.

That is a good reason to stop. A language belongs to the people who have to read it, and a change to control flow needs more than a clever specification. The community worked through the tradeoffs together and the committee made the call. I may disagree with the result, but I trust the process far more after seeing it work.

Still, I think this will eventually look like a missed opportunity.

The argument against try was strongest when it was used inside nested expressions:

return write(encode(try(read(name))))

I do not want to debug that either. But bad nesting was not the only possible future. The ordinary form was much less alarming:

b := try(ioutil.ReadFile(name))
try(json.Unmarshal(b, &cfg))

One operation per line, errors still in the function signature, no exceptions, and an explicit if whenever local handling mattered. I think that style would have become normal very quickly. It looks unusual today mostly because it is new.

There is a difficult bias in evaluating syntax: familiar boilerplate does not feel like syntax anymore. We see if err != nil and read “return the error.” We see try and inspect every unusual corner because it is new. That scrutiny is necessary before changing a language, but it is not a neutral comparison.

Rust went through a remarkably similar argument around try! and later ?. People worried about hidden control flow and two ways to write the same thing. Once programmers had lived with it, many came to consider it one of the language’s better features. Go is not Rust and should not become Rust, but people do learn new control-flow notation. We already did it with defer and go.

There is another group absent from a discussion among current Go programmers: people who did not choose Go because they found its error handling tedious. It is almost impossible to measure them from inside the community. The people most comfortable with the status quo are, unsurprisingly, the people who stayed.

None of that means the committee should have ignored the reaction and forced try through. That would have damaged the language and the community more than a little error boilerplate ever could. Declining it was the right decision now, given the lack of agreement about both the problem and the solution.

But I suspect we solved the social problem by leaving the technical one untouched. Years from now we may add something remarkably similar under another name, after enough other languages have made the idea feel ordinary. Or we may simply keep writing the same four lines and insist they are valuable because we have become very fast at not seeing them.

Either way, try was worth trying. The proposal improved our understanding of the problem, and its rejection proved that the Go community can say no even after a great deal of work has been invested.

That is not failure. I just hope it is not the last word on improving error handling.