Article Writing
Let's be clear: I do use AI to help write these articles. The ideas and concepts are mine; the majority of the text is AI-generated. It's how I'm demonstrating, and learning, how multi-agent architectures work in practice.
Each article starts with a brief I write: a specific argument, a target audience, a tone. One agent — the orchestrator — then runs the whole show. It reads the brief, sequences every other agent in the correct order, handles loops and failures, and writes the final output files. It doesn't write any content itself; it conducts. I run it on Claude Opus, because sequencing a multi-agent pipeline with error handling and retry logic requires strong reasoning. Every other agent is more focused, and runs on a cheaper model matched to the complexity of its task — Sonnet for anything requiring synthesis or extended writing, Haiku for narrower, shorter-output tasks.
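The model tiering above can be sketched as a simple lookup. This is illustrative only: the agent names and model identifiers are my shorthand, not the pipeline's real configuration keys.

```python
# Hypothetical agent-to-model tiers; names are illustrative shorthand.
MODEL_FOR_AGENT = {
    "orchestrator": "opus",        # heavy reasoning: sequencing, retries, state
    "research": "sonnet",          # synthesis and extended output
    "research_challenger": "sonnet",
    "drafter": "sonnet",
    "article_challenger": "sonnet",
    "redrafter": "sonnet",
    "cartoon_ideas": "sonnet",
    "html_assembler": "sonnet",
    "final_reviewer": "sonnet",
    "cartoon_reviewer": "haiku",   # narrow task, short output
    "linkedin_post": "haiku",      # narrow task, short output
}

def model_for(agent: str) -> str:
    """Return the model tier for an agent, defaulting to the cheapest."""
    return MODEL_FOR_AGENT.get(agent, "haiku")
```

The point of the mapping is that cost follows task complexity: only the conductor pays the Opus premium.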
The key design decision is that agents don't share context. The research challenger gets the research pack but not the brief. The article challenger gets the draft but not the research. I designed it this way deliberately: if an agent can see the intent behind the work, it grades on intent rather than execution, and the challenge becomes worthless. Isolation is what makes the challenge real.
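That isolation can be enforced mechanically with an allow-list per agent. A minimal sketch, assuming the pipeline's context is a flat dict (field and agent names here are hypothetical):

```python
# Allow-list of context fields per agent; anything not listed is withheld.
CONTEXT_FIELDS = {
    "research_challenger": {"research_pack"},            # no brief
    "article_challenger": {"draft"},                     # no brief, no research
    "redrafter": {"draft", "challenge_feedback", "brief"},
}

def context_for(agent: str, full_context: dict) -> dict:
    """Pass each agent only the fields it is allowed to see."""
    allowed = CONTEXT_FIELDS.get(agent, set())
    return {k: v for k, v in full_context.items() if k in allowed}
```

Filtering at the call site, rather than trusting each agent's prompt to ignore extra material, is what makes the blindness reliable.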
The cartoon is the one exception. The image is generated via a local MCP server that calls the OpenAI API — specifically gpt-image-1 — because that's currently where the best editorial cartoon output comes from. The orchestrator builds the image prompt and returns it to the main session, because the MCP image tool is only available at the top level, not inside subagents.
I review everything before it goes live. The pipeline produces; I decide.
Agents & Models
| Agent | Model | Role |
|---|---|---|
| Me | — | Write the brief. Review and publish the output. |
| Orchestrator | Opus | Sequences all agents, manages loops and failures, tracks state. Opus for the reasoning demands of orchestration. |
| Research | Sonnet | Searches the web, fetches and verifies sources, builds the structured evidence pack. |
| Research Challenger | Sonnet | Independent review of the research pack. Fetches URLs to verify they exist and contain the claimed evidence. |
| Drafter | Sonnet | Writes the first article from validated research. Never invents facts. Applies concrete tone techniques. |
| Article Challenger | Sonnet | Reads the draft cold — no brief, no research. Judges on article merit alone, as a hostile reader would. Issues rated P1/P2/P3. |
| Redrafter | Sonnet | Makes targeted fixes based on challenge feedback. Preserves what works, does not rewrite wholesale. |
| Cartoon Ideas | Sonnet | Generates five editorial cartoon concepts. Each must show the article's argument visually, not just its setting. |
| HTML Assembler | Sonnet | Builds the final HTML article with inline links, references, and LinkedIn hook. |
| Final Reviewer | Sonnet | Pre-publication factual check. Verifies currency, spot-checks claims, sweeps all reference URLs. |
| Cartoon Reviewer | Haiku | Scores the five cartoon concepts on visual hook, humour, and interest. Does not read the article. |
| LinkedIn Post | Haiku | Generates three hook variants to drive clicks. Short output, light task. |
| Image Generator | gpt-image-1 | Produces the editorial cartoon from a prompt built by the orchestrator. Called via local MCP server. |
What I've Learned
1. The challengers are the most important agents
The pipeline only works because it argues with itself. Without the research challenger and the article challenger, outputs drifted toward plausible-sounding but unsupported claims. Every major quality improvement I've made came from making the challengers stricter, not from improving the writers.
2. Loops need exit conditions, not just single passes
A single challenge-and-fix cycle isn't enough. The article challenge and redraft now loop up to three times. If the article can't pass after three attempts, I abandon it rather than let it through with known P1 issues. The same applies to research: a BLOCK sends the research agent back, up to two times. I learned that loops without hard limits just become infinite retries on fundamentally broken briefs.
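The challenge-and-redraft loop can be sketched as follows. The `challenge` and `redraft` callables stand in for the real agent invocations, and the issue format (a `severity` field rated P1/P2/P3) is my assumption about the shape of the feedback:

```python
def challenge_loop(draft, challenge, redraft, max_rounds=3):
    """Run challenge-and-redraft with a hard exit condition.

    `challenge(draft)` returns a list of issues; `redraft(draft, issues)`
    returns a revised draft. Both are stand-ins for real agent calls.
    """
    for attempt in range(1, max_rounds + 1):
        issues = challenge(draft)
        if not any(i["severity"] == "P1" for i in issues):
            return draft          # passed: no blocking issues remain
        if attempt == max_rounds:
            return None           # abandon rather than ship known P1 issues
        draft = redraft(draft, issues)
    return None
```

The hard limit is the whole point: without `max_rounds`, a fundamentally broken brief turns this into an infinite retry loop.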
3. Fetch every URL — no exceptions
Language models can construct URLs that look exactly right but don't exist. I enforce a simple rule: fetch it before you cite it. Any URL that returns a 404 or doesn't contain the claimed evidence is blocked. This single rule eliminated the most embarrassing failure mode I encountered early on.
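The existence half of that rule is easy to enforce in code. A minimal sketch using only the standard library; the real pipeline also checks that the fetched page contains the claimed evidence, which this does not attempt:

```python
import urllib.request

def url_resolves(url: str, timeout: float = 10.0) -> bool:
    """Fetch a URL and report whether it returns a 2xx response.

    Anything else (404, timeout, malformed URL) counts as a failure,
    so the citation is blocked unless the fetch positively succeeds.
    """
    try:
        req = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "link-checker"}
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False
```

Defaulting to "blocked" on any error is deliberate: a citation should have to prove it exists, not merely fail to be disproven.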
4. Named specifics as a quality test
"Some companies have paused AI deployments" is worthless. "Airbus, ASML, and Mistral wrote an open letter calling for a two-year pause" is evidence. I apply this test to every factual claim in both the research and the draft. It removes a surprising amount of content that reads as credible but isn't.
5. I give each agent only what it needs to see
The article challenger doesn't get the research pack. The cartoon reviewer doesn't read the article. The isolation is deliberate: an agent that can see the intent behind the work grades on intent rather than execution. Narrow context produces better judgement.
6. My brief quality determines output quality
"EU AI regulation" produces generic content. "EU explainability requirements will stall Gen AI deployment in financial services" gives the pipeline a specific position to research, argue, and challenge. The pipeline can only be as sharp as the brief I give it at the start.
7. Tone needs a technique and a blocking gate
I learned early that telling an agent to write in an "entertaining" tone produces competent, neutral prose — because "entertaining" is not a technique. I now specify deadpan observation, understatement, unexpected juxtaposition. I also made tone a P1 blocking issue, not advisory: if I leave it as P2, the pipeline will produce a correctly structured but tonally wrong article every single time.
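The gate itself is trivial once tone is classified correctly. A sketch, again assuming my hypothetical issue format with a `severity` field:

```python
def publication_gate(issues):
    """Pass only when no P1 issues remain; P2/P3 are advisory.

    Because tone problems are filed as P1, a tonally wrong draft
    blocks publication exactly like a factual error would.
    """
    blocking = [i for i in issues if i["severity"] == "P1"]
    return (len(blocking) == 0, blocking)
```

The lesson is in the classification, not the code: whatever you rate as advisory, the pipeline will happily ship.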
8. Pass the same governing context through every loop
Early versions passed the tone preference to the drafter but dropped it when invoking the redrafter. The redrafter could see the challenger had flagged a tone problem but had no idea what tone was requested. I now make sure every agent in a loop receives the same governing context — argument, audience, tone — not just the feedback from the previous step.
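In code, the fix is to build every loop payload from the brief's governing fields plus the feedback, rather than from the feedback alone. Field names are illustrative:

```python
GOVERNING = {"argument", "audience", "tone"}

def loop_payload(brief: dict, feedback: dict) -> dict:
    """Build a redraft payload that always carries the governing context.

    Early versions passed only the feedback, silently dropping the
    requested tone; merging the brief's governing fields into every
    loop step prevents that.
    """
    payload = {k: brief[k] for k in GOVERNING if k in brief}
    payload["feedback"] = feedback
    return payload
```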
9. A final review gate is non-negotiable
Early versions assembled the HTML and stopped. I added the final-reviewer stage after discovering that claims can become outdated between research and publication, and that broken URLs occasionally survived all previous checks. I don't consider an article finished until it passes a fresh factual sweep against today's date.
10. Image generation stays in the main session
The MCP tool that calls the OpenAI image API is only available to the top-level Claude Code session, not to subagents. So the orchestrator builds the image prompt and hands it back to me; I generate the cartoon from the main session. It's a constraint of the architecture — but it means I see the prompt before the image is made and can adjust it if needed.
11. Preserve versions — never overwrite
I keep all intermediate files versioned (research-v1, research-v2, draft-v1, and so on). When the pipeline loops — because research was blocked or a redraft was triggered — the previous versions remain. This lets me audit what changed and why, and recover from a bad loop without losing good prior work.
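Never-overwrite is easy to enforce by always writing to the next free version number. A sketch, assuming the `stem-vN.md` naming shown above:

```python
from pathlib import Path

def next_version(directory: Path, stem: str, ext: str = ".md") -> Path:
    """Return the next free versioned path, e.g. research-v3.md.

    Existing versions are never overwritten, so every loop leaves
    an auditable trail of what changed and why.
    """
    n = 1
    while (directory / f"{stem}-v{n}{ext}").exists():
        n += 1
    return directory / f"{stem}-v{n}{ext}"
```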
12. The worklist is the single source of truth
I have the pipeline trust the worklist marker, not the presence of files on disk. If an article is marked as not started, the pipeline clears all existing intermediate files and starts fresh. This prevents a half-finished run from being picked up as if it were complete — something I got caught by more than once.
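The reset behaviour can be sketched in a few lines. The marker value and the versioned-file glob are assumptions based on the naming used above:

```python
from pathlib import Path

NOT_STARTED = "not-started"  # hypothetical worklist marker value

def prepare_run(workdir: Path, worklist_status: str) -> None:
    """Trust the worklist marker, not the presence of files on disk.

    If the article is marked not started, clear any intermediate
    versioned files so a half-finished run is never mistaken for
    completed work.
    """
    if worklist_status == NOT_STARTED:
        for f in workdir.glob("*-v*.md"):
            f.unlink()
```

Note the asymmetry with the versioning rule: versions are sacred within a run, but a run that never officially started owns nothing worth keeping.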