What This Covers

Rules for writing operational and system documentation for your Tawa product. Drop this file into `.claude/rules/tawa-ops.md` in your project so that Claude writes structured, agent-readable ops docs automatically.

These docs serve two purposes:

  • Human diagnosis — engineers can find exactly why an error occurs and what to change
  • Agent remediation — a monitoring agent can match a runtime error to a specific code location via stackRefs and open a code change request

When to Write Ops Docs

Write or update ops docs when you:

  • Add or change an error code your service emits
  • Add or change a configuration parameter
  • Change how your service behaves when a dependency is unavailable
  • Fix a bug that was caused by a misconfigured parameter (document it so the next one is caught faster)

The Three Ops Doc Types

Error Catalog — {product}/reference/errors

One doc listing every error code your service can emit. This is the most important ops doc — it is the primary lookup table for automated remediation.

Required for every error entry:

| Field | What to include |
|---|---|
| Error code | Exact string emitted; must match what appears in logs |
| Service | The service name from catalog-info.yaml |
| File | File path relative to the repo root |
| Function | The function or method that throws the error |
| Severity | fatal / error / warning |
| Cause | The triggering condition: what state makes this happen |
| Fix | What a developer or agent should change to resolve it |
| Agent hint | The specific code location and change needed for an automated fix, plus a declared review-level |
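Tooling that checks catalog entries before publishing could encode these fields as a type. Here is a minimal TypeScript sketch. The `ErrorCatalogEntry` shape and the `missingFields` helper are hypothetical and are not part of jci-mcp or any Tawa API:

```typescript
// Hypothetical shape of one error catalog entry, mirroring the required fields.
type Severity = "fatal" | "error" | "warning";

interface ErrorCatalogEntry {
  errorCode: string; // exact string emitted in logs
  service: string;   // name from catalog-info.yaml
  file: string;      // path relative to the repo root
  function: string;  // function or method that throws
  severity: Severity;
  cause: string;
  fix: string;
  agentHint: string; // includes the declared review-level
}

const REQUIRED_FIELDS: (keyof ErrorCatalogEntry)[] = [
  "errorCode", "service", "file", "function",
  "severity", "cause", "fix", "agentHint",
];

// Returns the names of required fields that are missing or blank.
function missingFields(entry: Partial<ErrorCatalogEntry>): string[] {
  return REQUIRED_FIELDS.filter((field) => {
    const value = entry[field];
    return typeof value !== "string" || value.trim() === "";
  });
}

const draft: Partial<ErrorCatalogEntry> = {
  errorCode: "ERR_WALLET_INSUFFICIENT_RESERVE",
  service: "iec-wallet",
  file: "src/services/wallet.service.ts",
  function: "checkDeployGate",
  severity: "error",
  cause: "Wallet balance is below the 3-month reserve for the declared pod tier.",
  fix: "Top up the wallet or lower insureco.io/pod-tier.",
};

const gaps = missingFields(draft); // ["agentHint"]
```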

Entry template:

### ERR_WALLET_INSUFFICIENT_RESERVE

**Service:** iec-wallet
**File:** src/services/wallet.service.ts
**Function:** checkDeployGate
**Severity:** error

**Cause:** The org's wallet balance is below the calculated 3-month reserve
for their declared pod tier. Deploy is blocked until balance is topped up or
the tier is reduced.

**Fix:**
- If the tier is correct, the org needs to purchase more tokens via `tawa wallet buy`
- If the tier is wrong, lower `insureco.io/pod-tier` in catalog-info.yaml and redeploy

**Agent hint:** The reserve calculation is in `src/services/wallet.service.ts`
around the `checkDeployGate` function. `DEPLOY_GATE_MONTHS` (defined in
`src/config/defaults.ts`) controls how many months of reserve are required
(default: 3). Reducing this value relaxes the gate. Coordinate with the
platform team before changing. **review-level: admin**

The Agent hint is the field a remediation agent reads. It must say: which file, roughly where in the file, what parameter or constant controls the behavior, and whether a code change is safe to make autonomously or needs human coordination.
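Because every entry uses the same `**Label:** value` layout, a remediation agent or a pre-publish check can extract the fields mechanically. Below is a rough TypeScript sketch. `parseEntry` is a hypothetical helper, and it assumes labels always begin a line and that unlabeled lines continue the previous field:

```typescript
// Hypothetical helper: parses one "### ERR_CODE" entry written in the template format.
// Assumes each field starts a line as "**Label:** value"; any following non-blank
// line that is not a new label is treated as a continuation of the previous field.
function parseEntry(markdown: string): { code: string; fields: Record<string, string> } {
  let code = "";
  const fields: Record<string, string> = {};
  let current: string | null = null;

  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    if (line === "") continue; // blank lines never start or end a field on their own

    const heading = line.match(/^###\s+(\S+)/);
    if (heading) {
      code = heading[1];
      current = null;
      continue;
    }

    const label = line.match(/^\*\*([^*]+):\*\*\s*(.*)$/);
    if (label) {
      current = label[1];
      fields[current] = label[2];
    } else if (current !== null) {
      // Continuation line: join it onto the field it belongs to.
      fields[current] = (fields[current] + " " + line).trim();
    }
  }
  return { code, fields };
}

const entry = parseEntry([
  "### ERR_WALLET_INSUFFICIENT_RESERVE",
  "",
  "**Service:** iec-wallet",
  "**Function:** checkDeployGate",
  "**Agent hint:** The reserve calculation is in `src/services/wallet.service.ts`",
  "around the `checkDeployGate` function. **review-level: admin**",
].join("\n"));
// entry.code is "ERR_WALLET_INSUFFICIENT_RESERVE";
// entry.fields["Agent hint"] holds both hint lines joined by a space.
```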

Config Parameter Reference — {product}/reference/config-params

One doc listing every environment variable your service reads — including defaults, valid ranges, and what breaks if the value is wrong.

Entry template:

### DEPLOY_GATE_MONTHS

**Type:** number
**Default:** 3
**Valid range:** 1–12
**Required:** No

Controls how many months of hosting cost must be held in reserve before
a deploy is allowed through the gate.

**Effect of wrong value:**
- Too low (< 1): Deploy gate becomes ineffective — orgs can deploy with near-zero balance
- Too high (> 6): Excessive capital locked up, blocks legitimate deploys

**Read by:** `src/services/wallet.service.ts` → `checkDeployGate`
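One way a service could read and apply this parameter is sketched below in TypeScript. The sketch assumes the value arrives as an environment-variable string and that the monthly hosting cost for the pod tier is already known in tokens. `readDeployGateMonths` and `passesDeployGate` are hypothetical names, and the real `checkDeployGate` may compute the reserve differently. The figures reuse the Fix Proposal Schema example later in this doc: 10,800 tokens are required at 3 months, which implies 3,600 tokens per month.

```typescript
const DEFAULT_DEPLOY_GATE_MONTHS = 3;
const MIN_DEPLOY_GATE_MONTHS = 1;
const MAX_DEPLOY_GATE_MONTHS = 12;

// Hypothetical: reads DEPLOY_GATE_MONTHS from an env-like map. Unset or blank
// values fall back to the default; values outside 1 to 12 are rejected up front
// rather than silently weakening (or over-tightening) the gate.
function readDeployGateMonths(env: Record<string, string | undefined>): number {
  const raw = env["DEPLOY_GATE_MONTHS"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_DEPLOY_GATE_MONTHS;
  const months = Number(raw);
  if (!Number.isFinite(months) || months < MIN_DEPLOY_GATE_MONTHS || months > MAX_DEPLOY_GATE_MONTHS) {
    throw new Error(`DEPLOY_GATE_MONTHS must be between 1 and 12, got "${raw}"`);
  }
  return months;
}

// Hypothetical gate check: the balance must cover `months` of hosting cost.
function passesDeployGate(balanceTokens: number, monthlyCostTokens: number, months: number): boolean {
  return balanceTokens >= monthlyCostTokens * months;
}

// 10,800 tokens at 3 months implies 3,600 tokens/month (an inference, not a documented rate).
const monthlyCost = 10800 / 3;
passesDeployGate(9200, monthlyCost, readDeployGateMonths({}));                          // false: 9,200 < 10,800
passesDeployGate(9200, monthlyCost, readDeployGateMonths({ DEPLOY_GATE_MONTHS: "2" })); // true: 9,200 >= 7,200
```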

Dependency Map — {product}/architecture/dependencies

One doc describing what your service depends on and how it behaves when each dependency is unavailable.

Entry template:

### iec-wallet

**Used for:** Checking org wallet balance before deploys (deploy gate)
**Failure mode:** Fail-open — if iec-wallet is unreachable, deploys proceed anyway
**Error emitted on failure:** `WARN_WALLET_UNREACHABLE` (warning, not error)
**Impact:** No gas deduction check; orgs can deploy without sufficient reserve
**Acceptable?** Yes — resilience over strictness. Wallet outage should not block all deploys.

### bio-id

**Used for:** Verifying RS256 JWTs on all authenticated routes
**Failure mode:** Fail-closed: requests are rejected with 401 if the JWKS endpoint is unreachable
**Error emitted on failure:** `ERR_JWKS_UNAVAILABLE` (fatal)
**Impact:** All authenticated API calls fail until Bio-ID recovers
**Acceptable?** Yes — auth must not be bypassed. Alert on-call if this persists > 2 minutes.
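The two failure modes differ only in what happens when a call to the dependency throws. Here is a minimal TypeScript sketch. It assumes the dependency is reached through an async function and that the caller supplies a logger. `callDependency` and `DependencyPolicy` are hypothetical and are not an existing Tawa API:

```typescript
// Hypothetical policy type: fail-open needs a fallback value, fail-closed does not.
type DependencyPolicy<T> =
  | { mode: "fail-open"; errorCode: string; fallback: T }
  | { mode: "fail-closed"; errorCode: string };

// Calls a dependency and applies the documented failure mode if the call throws.
async function callDependency<T>(
  call: () => Promise<T>,
  policy: DependencyPolicy<T>,
  log: (code: string, err: unknown) => void,
): Promise<T> {
  try {
    return await call();
  } catch (err) {
    log(policy.errorCode, err); // always emit the documented code
    if (policy.mode === "fail-open") {
      return policy.fallback; // e.g. let the deploy proceed without the wallet check
    }
    throw new Error(policy.errorCode); // e.g. caller rejects the request with 401
  }
}

// Usage: the wallet check fails open, Bio-ID JWKS fails closed.
const emitted: string[] = [];
const log = (code: string) => { emitted.push(code); };
const unreachable = (): Promise<boolean> => Promise.reject(new Error("ECONNREFUSED"));

const walletCheck = callDependency(unreachable,
  { mode: "fail-open", errorCode: "WARN_WALLET_UNREACHABLE", fallback: true }, log);
const jwksCheck = callDependency(unreachable,
  { mode: "fail-closed", errorCode: "ERR_JWKS_UNAVAILABLE" }, log);
```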

stackRefs — Linking Docs to Code

Always populate stackRefs when creating ops docs. This is what connects a Bible entry to the actual code a remediation agent will read or modify.

Format: "{service}/{path}:{symbol}", where {path} is relative to the repo root and, as in these examples, omits the file extension:

"iec-wallet/src/services/wallet.service:checkDeployGate"
"iec-wallet/src/config/defaults:DEPLOY_GATE_MONTHS"
"iec-builder/src/deploy/pipeline:runDeployGate"

Use the stackRefs parameter in bible_create and bible_edit. A monitoring agent that finds an error code in a Bible entry will follow its stackRefs to locate the code, read the agent hint, and determine whether to propose a fix.
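Because the format is fixed, tooling can split a stackRef back into its parts. Here is a minimal TypeScript sketch. `parseStackRef` is hypothetical, and it assumes service names contain no `/` and symbols contain no `:`:

```typescript
interface StackRef {
  service: string;
  path: string;   // repo-relative, extension omitted as in the examples
  symbol: string;
}

// Splits "{service}/{path}:{symbol}" at the first "/" and the last ":".
function parseStackRef(ref: string): StackRef {
  const slash = ref.indexOf("/");
  const colon = ref.lastIndexOf(":");
  // Reject refs with an empty service, path, or symbol.
  if (slash <= 0 || colon <= slash + 1 || colon === ref.length - 1) {
    throw new Error(`Malformed stackRef: ${ref}`);
  }
  return {
    service: ref.slice(0, slash),
    path: ref.slice(slash + 1, colon),
    symbol: ref.slice(colon + 1),
  };
}

const walletRef = parseStackRef("iec-wallet/src/config/defaults:DEPLOY_GATE_MONTHS");
// { service: "iec-wallet", path: "src/config/defaults", symbol: "DEPLOY_GATE_MONTHS" }
```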

The Agent Remediation Flow

This flow is the reason ops docs are structured this way. The coding agent never applies changes directly: every fix proposal goes to a review queue and requires admin approval before anything changes.

Runtime error logged by Janus/iec-observe
  ↓
Error code matched to Bible entry in {product}/reference/errors
  ↓
Agent reads: cause, fix, agent hint, review-level, stackRefs
  ↓
Agent reads the referenced code files via stackRefs
  ↓
Agent generates a Fix Proposal (structured JSON)
  ↓
Fix Proposal submitted to the Coding Queue (iec-queue)
  ↓
Admin review — approve or reject in the platform console
  ↓
If approved → code change request opened (PR) or config updated
If rejected → proposal archived with rejection reason

No code or config ever changes without a human approving it. The agent's job is to generate a well-reasoned proposal, not to act.
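The property that matters in this flow is that nothing reaches the apply step without an explicit approval. Here is a minimal TypeScript sketch of that gate as a status transition. The statuses and the `review`/`canApply` functions are hypothetical and do not describe iec-queue's real interface:

```typescript
type ProposalStatus = "queued" | "approved" | "rejected";

interface QueuedProposal {
  errorCode: string;
  status: ProposalStatus;
  rejectionReason?: string;
}

type Decision = { approve: true } | { approve: false; reason: string };

// The only way out of "queued" is a human decision; nothing is applied here.
function review(p: QueuedProposal, decision: Decision): QueuedProposal {
  if (p.status !== "queued") {
    throw new Error(`Proposal for ${p.errorCode} was already ${p.status}`);
  }
  return decision.approve
    ? { ...p, status: "approved" }
    : { ...p, status: "rejected", rejectionReason: decision.reason }; // archived with reason
}

// Opening a PR or updating config is allowed only for approved proposals.
function canApply(p: QueuedProposal): boolean {
  return p.status === "approved";
}

const queued: QueuedProposal = { errorCode: "ERR_WALLET_INSUFFICIENT_RESERVE", status: "queued" };
const approved = review(queued, { approve: true });
const rejected = review(queued, { approve: false, reason: "Tier is correct; org should top up" });
```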

Fix Proposal Schema

When an agent submits to the Coding Queue, the proposal must include these fields:

{
  "type": "code-change-request",
  "errorCode": "ERR_WALLET_INSUFFICIENT_RESERVE",
  "service": "iec-wallet",
  "bibleRef": "tawa-platform/reference/errors",
  "stackRefs": [
    "iec-wallet/src/services/wallet.service:checkDeployGate",
    "iec-wallet/src/config/defaults:DEPLOY_GATE_MONTHS"
  ],
  "reviewLevel": "admin",
  "proposedChange": {
    "description": "Reduce DEPLOY_GATE_MONTHS from 3 to 2 to relax the deploy gate reserve requirement.",
    "files": [
      {
        "path": "src/config/defaults.ts",
        "diff": "--- a/src/config/defaults.ts\n+++ b/src/config/defaults.ts\n@@ -14,1 +14,1 @@\n-export const DEPLOY_GATE_MONTHS = 3\n+export const DEPLOY_GATE_MONTHS = 2"
      }
    ]
  },
  "reasoning": "The error ERR_WALLET_INSUFFICIENT_RESERVE was triggered for org insureco on a nano pod. Reserve required: 10,800 tokens. Balance: 9,200 tokens. The shortfall is within 15% of the threshold. The agent hint indicates DEPLOY_GATE_MONTHS controls this threshold and reducing it is safe for this pod tier.",
  "confidence": "medium",
  "submittedAt": "2026-03-01T14:22:00Z"
}

| Field | Description |
|---|---|
| type | "code-change-request" for code fixes, or "config-change-request" for config-only fixes |
| errorCode | The exact error code that triggered this proposal |
| service | The service that emitted the error |
| bibleRef | The fullSlug of the Bible entry the agent used |
| stackRefs | Code locations the agent read to generate the proposal |
| reviewLevel | "admin" (requires senior approval) or "standard" (any platform team member); errors marked hold are never submitted |
| proposedChange | Human-readable description plus file diffs |
| reasoning | The agent's full reasoning chain: what it read and what it concluded |
| confidence | "high" / "medium" / "low": the agent's self-assessed confidence in the fix |
| submittedAt | ISO 8601 UTC timestamp of submission |
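Before submitting, an agent or the queue can check a proposal against this schema. The TypeScript sketch below uses types that mirror the JSON example. `validateProposal` is hypothetical, and the real iec-queue may enforce more or different rules:

```typescript
// Types mirror the Fix Proposal example; a sketch, not iec-queue's actual schema.
interface FixProposal {
  type: "code-change-request" | "config-change-request";
  errorCode: string;
  service: string;
  bibleRef: string;
  stackRefs: string[];
  reviewLevel: "admin" | "standard"; // "hold" items are never submitted
  proposedChange: { description: string; files: { path: string; diff: string }[] };
  reasoning: string;
  confidence: "high" | "medium" | "low";
  submittedAt: string; // ISO 8601, e.g. "2026-03-01T14:22:00Z"
}

// "{service}/{path}:{symbol}" with a non-empty part on each side.
const STACK_REF = /^[^/:]+\/[^:]+:[^:/]+$/;

// Hypothetical pre-submit check: returns human-readable problems, empty if none.
function validateProposal(p: FixProposal): string[] {
  const problems: string[] = [];
  if (p.errorCode.trim() === "") problems.push("errorCode is empty");
  if (p.bibleRef.trim() === "") problems.push("bibleRef is empty");
  if (p.stackRefs.length === 0) problems.push("stackRefs is empty");
  for (const ref of p.stackRefs) {
    if (!STACK_REF.test(ref)) problems.push(`malformed stackRef: ${ref}`);
  }
  if (p.proposedChange.description.trim() === "") problems.push("proposedChange.description is empty");
  if (p.reasoning.trim() === "") problems.push("reasoning is empty");
  if (Number.isNaN(Date.parse(p.submittedAt))) problems.push("submittedAt is not a valid timestamp");
  return problems;
}

const example: FixProposal = {
  type: "code-change-request",
  errorCode: "ERR_WALLET_INSUFFICIENT_RESERVE",
  service: "iec-wallet",
  bibleRef: "tawa-platform/reference/errors",
  stackRefs: [
    "iec-wallet/src/services/wallet.service:checkDeployGate",
    "iec-wallet/src/config/defaults:DEPLOY_GATE_MONTHS",
  ],
  reviewLevel: "admin",
  proposedChange: { description: "Reduce DEPLOY_GATE_MONTHS from 3 to 2.", files: [] },
  reasoning: "Shortfall is within 15% of the threshold.",
  confidence: "medium",
  submittedAt: "2026-03-01T14:22:00Z",
};
validateProposal(example); // []
```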

review-level Field in Agent Hints

Every agent hint in your error catalog must declare a review-level. This tells the coding agent how to route the proposal.

Example with a standard review-level:

**Agent hint:** Adjusting `DEPLOY_GATE_MONTHS` in `src/config/defaults.ts` line ~14
resolves false positives on this error class. Reducing from 3 to 2 is safe for
nano and small pod tiers. **review-level: standard**

Example with an admin review-level:

**Agent hint:** This error indicates the JWKS endpoint configuration is wrong.
The endpoint URL is hardcoded in `src/auth/jwks.ts:getPublicKey`. Any change
here affects all authenticated routes across the service, so it is not safe
to auto-propose. **review-level: admin**

| review-level | When to use |
|---|---|
| standard | Config value adjustment, safe to auto-propose. Low blast radius. |
| admin | Code change affecting auth, billing, or cross-service behavior. Requires senior review. |
| hold | Do not submit to the coding queue automatically. Page on-call instead. |

Write your ops docs with this flow in mind. The agent hint, stackRefs, and review-level are not optional fields — they are what makes the gated remediation pipeline work.
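A coding agent has to turn the declared review-level into a routing decision. Here is a small TypeScript sketch, assuming the level appears in the hint as `review-level: <value>`. `declaredReviewLevel` and `routeProposal` are hypothetical. Treating a hint with no declared level as hold is a conservative choice made for this sketch; the rules above do not specify it.

```typescript
type ReviewLevel = "standard" | "admin" | "hold";

type Route =
  | { kind: "queue"; approvers: "platform-team" | "senior" }
  | { kind: "page-on-call" };

// Finds "review-level: <level>" anywhere in an agent hint (bold markers are ignored).
function declaredReviewLevel(hint: string): ReviewLevel | null {
  const m = hint.match(/review-level:\s*(standard|admin|hold)\b/i);
  return m ? (m[1].toLowerCase() as ReviewLevel) : null;
}

// standard -> any platform team member; admin -> senior review;
// hold or no declared level -> do not queue, page on-call.
function routeProposal(level: ReviewLevel | null): Route {
  switch (level) {
    case "standard":
      return { kind: "queue", approvers: "platform-team" };
    case "admin":
      return { kind: "queue", approvers: "senior" };
    default:
      return { kind: "page-on-call" };
  }
}
```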

Writing Style for Ops Docs

Ops docs are internal and technical, and they follow different rules from developer portal docs:

  • Be precise about code locations. File path + function name, not "somewhere in the wallet service."
  • State failure modes explicitly. Fail-open or fail-closed. Not "it might have issues."
  • Include default values and valid ranges for every config param. An agent needs to know what "correct" looks like.
  • Agent hints must be actionable. "Check the config" is not an agent hint. "Set DEPLOY_GATE_MONTHS to 2 in src/config/defaults.ts line ~14" is.
  • Note when a change needs human coordination. Some fixes are dangerous to automate. Say so explicitly in the agent hint.
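These rules are concrete enough that a rough lint can catch the worst agent hints before publishing. Here is a heuristic TypeScript sketch. `lintAgentHint` is hypothetical, and its patterns (a slash-separated path with an extension, an UPPER_SNAKE constant or backticked symbol, a declared review-level) are assumptions that will miss some valid hints:

```typescript
// Heuristic checks only: each regex approximates one writing rule.
const HAS_FILE_PATH = /[\w-]+(?:\/[\w.-]+)+\.\w+/;                     // e.g. src/config/defaults.ts
const HAS_SYMBOL = /\b[A-Z][A-Z0-9]*_[A-Z0-9_]+\b|`[A-Za-z_$][\w$.:]*`/; // e.g. DEPLOY_GATE_MONTHS, `checkDeployGate`
const HAS_REVIEW_LEVEL = /review-level:\s*(standard|admin|hold)\b/i;

// Returns a list of problems; an empty list means the hint passes the heuristics.
function lintAgentHint(hint: string): string[] {
  const issues: string[] = [];
  if (!HAS_FILE_PATH.test(hint)) issues.push("no file path");
  if (!HAS_SYMBOL.test(hint)) issues.push("no parameter, constant, or function named");
  if (!HAS_REVIEW_LEVEL.test(hint)) issues.push("no review-level declared");
  return issues;
}

lintAgentHint("Check the config"); // all three issues
lintAgentHint("Set DEPLOY_GATE_MONTHS to 2 in src/config/defaults.ts line ~14. **review-level: standard**"); // []
```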

Publishing Workflow

As with developer portal docs, draft, review, and publish via jci-mcp:

bible_create({ product, type: "reference", slug: "errors", ... stackRefs: [...] })
→ show user for review
→ bible_publish({ fullSlug: "{product}/reference/errors" })

Update and republish whenever an error code is added, changed, or resolved.

Minimum Required Ops Docs

| Doc | Required when |
|---|---|
| {product}/reference/errors | Always: every service that emits errors |
| {product}/reference/config-params | Always: every service with configurable behavior |
| {product}/architecture/dependencies | Any service with more than one internal dependency |
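The table reduces to a small rule that a scaffolding tool could apply. Here is a TypeScript sketch with a hypothetical `ServiceProfile` describing the service. How those facts are gathered is left open:

```typescript
// Hypothetical description of a service, used only to decide which ops docs it needs.
interface ServiceProfile {
  emitsErrors: boolean;
  hasConfigurableBehavior: boolean;
  internalDependencyCount: number;
}

// Returns the fullSlugs of the ops docs the service must publish, per the table.
function requiredOpsDocs(product: string, s: ServiceProfile): string[] {
  const docs: string[] = [];
  if (s.emitsErrors) docs.push(`${product}/reference/errors`);
  if (s.hasConfigurableBehavior) docs.push(`${product}/reference/config-params`);
  if (s.internalDependencyCount > 1) docs.push(`${product}/architecture/dependencies`);
  return docs;
}

const platformDocs = requiredOpsDocs("tawa-platform", {
  emitsErrors: true,
  hasConfigurableBehavior: true,
  internalDependencyCount: 1, // one dependency: no dependency map required
});
// ["tawa-platform/reference/errors", "tawa-platform/reference/config-params"]
```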

Common Mistakes

| Wrong | Right |
|---|---|
| Vague agent hint: "fix the wallet logic" | Specific: "adjust MIN_RESERVE_MULTIPLIER in src/config.ts:32" |
| No stackRefs | Always link to the responsible file and function |
| Omitting failure mode | Always state fail-open or fail-closed explicitly |
| Combining all errors into prose | One ### ERR_CODE block per error, scannable by agents |
| Skipping config-params doc | Every env var your service reads must be documented |

Last updated: March 1, 2026