Iris Labs · Applied AI Research

Iris Labs · Research Log

Findings

Research notes from operating real businesses with a single autonomous agent, Iris. Each note records something we did not know before — usually because a system broke in an instructive way. A note generalizes from one business's specific failure to a claim about autonomous-systems management, and stays attached to the system and date that produced it.

Finding 001 · 2026-05-16 · System 01: Noog Weekly

Infrastructure is cheap. Sustained editorial judgment is the binding constraint.

Abstract

An autonomous operator stood up a complete publishing business in days. Eight weeks in, the limiting factor is not capability — it is the cadence of judgment: producing one good decision a week, every week.

Observation

Iris built the Noog Weekly website, subscription flow, and email delivery almost immediately. These are bounded, well-specified tasks, and the business never stalled on any of them. It stalls on the unbounded one: deciding, every week, which local owner is worth a story and what makes that story worth a reader's time. Four editions exist where the cadence target asks for eight.

Finding

For an autonomous business operator, throughput on well-specified tasks is effectively free, and it arrives early. The scarce resource is recurring judgment under ambiguity, and it does not improve just because the infrastructure does. The two capabilities are independent: strength in the first predicts nothing about the second.

Implication

Autonomous systems should be measured by their worst week, not their setup speed. A system that can do anything once but cannot reliably do the right thing on a schedule is not yet a business. The next phase fixes the unit of work at one decision per week and instruments whether the agent can hold it.
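
One way to instrument that unit of work is to score each week as hit or missed against the target and report the longest run of misses rather than the average. The sketch below is a minimal illustration in Python, not the instrumentation Iris Labs uses. The names `CadenceReport` and `cadence_report` and the week-by-week pattern are hypothetical; only the totals (4 editions across 8 weeks at a target of 1 per week) come from the operating data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CadenceReport:
    weeks: int                # weeks observed
    hits: int                 # weeks that met the target
    hit_rate: float           # hits / weeks
    longest_miss_streak: int  # worst consecutive run of missed weeks


def cadence_report(editions_per_week: list[int], target_per_week: int = 1) -> CadenceReport:
    """Score each week against the target and track the worst run of misses."""
    if not editions_per_week:
        raise ValueError("need at least one week of data")
    hits = 0
    longest = current = 0
    for count in editions_per_week:
        if count >= target_per_week:
            hits += 1
            current = 0  # a hit ends the current run of misses
        else:
            current += 1
            longest = max(longest, current)
    weeks = len(editions_per_week)
    return CadenceReport(weeks, hits, hits / weeks, longest)


# Hypothetical pattern: the log records only the totals (4 editions, 8 weeks),
# not which weeks were missed.
illustrative_weeks = [1, 1, 0, 1, 0, 0, 0, 1]
report = cadence_report(illustrative_weeks)
print(report)
```

Two schedules with the same hit rate can have very different miss streaks, which is why the streak, not the average, is the number that reflects the "worst week" framing.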

Operating data
4 editions · 8 weeks elapsed · 1/week target cadence · 8 subscribers

Finding 002 · 2026-05-16 · System 02: Task Agents

An autonomous sales agent will faithfully scale a broken channel.

Abstract

Given a revenue goal and a cold-email channel, the agent sent 121 outreach emails and got zero replies. It optimized copy and targeting correctly. The channel itself was the failure, and competence only made the failure larger.

Observation

Iris treated a low reply rate as a copy-and-targeting problem and iterated accordingly — researching owners, personalizing, varying templates. Each move was locally rational. But zero replies out of 121 is not a copy gradient to climb; it is a flat signal that the channel — cold email from an unknown sender — carries no trust to begin with. An agent optimizing inside the channel cannot observe this. It will keep sending.

Finding

Autonomous agents optimize inside the frame they are handed. If the frame is wrong, competence does not correct it — it compounds it. A more capable agent pointed at a broken channel produces a larger, faster, better-targeted failure.

Implication

The human guardrail that matters most is not budget or tone — it is frame selection. Autonomous operators need a sanctioned path to escalate "this entire approach is wrong," distinct from "this email underperformed." The next phase abandons cold email for warm and credentialed channels — introductions, in-person contact, partners — measured against the cold-email baseline of zero.
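
The difference between the two escalation paths can be made mechanical. The sketch below assumes that bounced emails count as undelivered, and it uses the exact one-sided binomial bound for zero observed successes (the "rule of three" is its approximation). If even the optimistic upper bound on the reply rate falls below what the business needs, the signal is about the channel rather than the copy. The names `Escalation` and `classify_outreach` and the `min_viable_rate` threshold are hypothetical, not part of any Iris Labs system.

```python
from enum import Enum


class Escalation(Enum):
    NONE = "keep iterating"
    TACTIC = "this email underperformed"
    FRAME = "this entire approach is wrong"


def reply_rate_upper_bound(delivered: int, confidence: float = 0.95) -> float:
    """Upper bound on the true reply rate after zero replies in `delivered` sends.

    Solves (1 - p) ** delivered = 1 - confidence for p.
    """
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / delivered)


def classify_outreach(sent: int, bounced: int, replies: int,
                      min_viable_rate: float) -> Escalation:
    """Decide whether the evidence points at the copy or at the channel."""
    delivered = sent - bounced  # assumption: a bounced email reached no reader
    if replies == 0:
        # Zero replies: ask whether the channel could still plausibly be viable.
        if reply_rate_upper_bound(delivered) < min_viable_rate:
            return Escalation.FRAME
        return Escalation.NONE  # too few sends to conclude anything yet
    if replies / delivered < min_viable_rate:
        return Escalation.TACTIC
    return Escalation.NONE


# Counts from the operating data; the 3% threshold is a made-up illustration.
verdict = classify_outreach(sent=121, bounced=6, replies=0, min_viable_rate=0.03)
print(verdict.value)
```

The design choice is that zero replies are handled by a separate branch: a flat zero carries no gradient for copy changes to climb, so the only useful question is whether enough has been sent to rule the channel out.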

Operating data
121 outreach sent · 6 bounced · 0 replies · 0.0% reply rate

Finding 003 · 2026-05-16 · System 03: Urban Drama

Long-horizon autonomy needs externalized, verifiable state.

Abstract

A film generated one clip at a time drifts the moment the story leaves the model's context window. Coherence held only after an explicit, checkable continuity ledger was enforced as a hard precondition for every step.

Observation

Each 15-second clip of Urban Drama is generated independently. Early on, continuity (the same courier, the same envelope, the same open questions) depended on Iris "remembering," which is to say, on context. Context is lossy, unauditable, and finite. We added a continuity ledger: an external record of characters, objects, and unresolved questions, with explicit rules. Iris must reconcile a new clip against the ledger before generating it, and must record any contradiction it finds in a counter before publishing it.
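
The reconcile-before-generate rule can be pictured as a gate in front of the generator. The sketch below is illustrative only: the log does not describe the ledger's schema, so the `ContinuityLedger` and `ClipSpec` classes, their fields, and the sample descriptions attached to the courier and the envelope are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ContinuityLedger:
    characters: dict[str, str] = field(default_factory=dict)  # name -> fixed description
    objects: dict[str, str] = field(default_factory=dict)     # name -> fixed description
    open_questions: set[str] = field(default_factory=set)     # unresolved story threads


@dataclass
class ClipSpec:
    characters: dict[str, str]
    objects: dict[str, str]
    resolves: set[str] = field(default_factory=set)  # questions this clip claims to answer


def reconcile(ledger: ContinuityLedger, clip: ClipSpec) -> list[str]:
    """Return every contradiction between a planned clip and the ledger."""
    problems = []
    for kind, known, used in (
        ("character", ledger.characters, clip.characters),
        ("object", ledger.objects, clip.objects),
    ):
        for name, description in used.items():
            if name in known and known[name] != description:
                problems.append(
                    f"{kind} '{name}' is '{description}', ledger says '{known[name]}'"
                )
    for question in sorted(clip.resolves - ledger.open_questions):
        problems.append(f"clip resolves a question the ledger never opened: '{question}'")
    return problems


def generate_clip(ledger: ContinuityLedger, clip: ClipSpec) -> str:
    """Refuse to generate unless the clip reconciles cleanly (hard precondition)."""
    problems = reconcile(ledger, clip)
    if problems:
        raise ValueError("; ".join(problems))
    ledger.open_questions -= clip.resolves  # answered threads are closed
    return "clip generated"  # placeholder for the real generation call


# Hypothetical ledger contents, for illustration only.
ledger = ContinuityLedger(
    characters={"courier": "grey jacket"},
    objects={"envelope": "sealed"},
    open_questions={"who sent the envelope?"},
)
drifted = ClipSpec(characters={"courier": "red jacket"}, objects={"envelope": "sealed"})
print(reconcile(ledger, drifted))
```

Because `reconcile` reads only the ledger and the clip specification, a process other than the agent can run the same check, which is the property the ledger is meant to provide.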

Finding

For any task longer than the context window — and a business is always longer than the context window — state must live outside the agent, in a form something other than the agent can check. In-context memory is not a state store. It is a cache, and it is the wrong instrument for commitments that have to survive.

Implication

Autonomous systems managing real operations need an external, append-only, verifiable record of their commitments — customers promised, money owed, decisions made. The ledger is not bookkeeping bolted on afterward; it is the mechanism that makes long-horizon autonomy possible at all. Every Iris Labs system is now being re-examined for where its ledger is missing.
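
One common way to make a record both append-only and checkable by something other than the agent is a hash chain, in which each entry commits to the one before it. The log does not say how Iris Labs implements its ledgers, so the sketch below is an illustration under that assumption; `CommitmentLog`, `verify`, and the sample entries are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash an entry's payload together with the hash of the entry before it."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class CommitmentLog:
    """Append-only record; payloads must be JSON-serializable."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, payload: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        entry_hash = _digest(prev_hash, payload)
        self._entries.append({"prev": prev_hash, "payload": payload, "hash": entry_hash})
        return entry_hash

    def export(self) -> list[dict]:
        # Deep copy via JSON so callers cannot mutate the log in place.
        return json.loads(json.dumps(self._entries))


def verify(entries: list[dict]) -> bool:
    """Recompute the chain; any edited, reordered, or removed entry breaks it."""
    prev_hash = GENESIS
    for entry in entries:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical commitments, for illustration only.
log = CommitmentLog()
log.append({"kind": "promise", "detail": "edition due this week"})
head = log.append({"kind": "decision", "detail": "stop cold email"})

exported = log.export()
print(verify(exported))
exported[0]["payload"]["detail"] = "edited after the fact"
print(verify(exported))
```

The chain only detects tampering if the latest hash (`head` here) is held somewhere the agent cannot rewrite. Otherwise the whole chain could be regenerated, so the verifier needs its own copy of that value.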

Operating data
2 episodes · 2026-05-16 ledger instrumented · 0 contradictions caught · 3 open threads