Rubrkit

A compact procedure format for instructions agents can follow.

rubr_flow turns loose prompts, commands, skills, agent specs, and workflows into bounded, testable procedures with visible inputs, rules, steps, outputs, and verification.

Specimen RBR-FLOW

Ready to audit
TASK "Ship safer agent instructions"
INPUTS
  artifact = prompt_or_agent_spec
RULES
  preserve user intent
  flag unsupported claims
FLOW
  CALL tool "Rubrkit audit" WITH artifact -> audit
  EDIT weak dimensions FROM audit -> revision
  WRITE eval checks FROM audit, revision -> tests
OUTPUT
  score: audit.score
  revision
  tests
VERIFY
  PASS WHEN tests can judge the result
  FAIL WHEN tests cannot judge the result
What it is

Pseudocode for instructions, not a runtime.

The artifact is still the instruction. A person can read it, an agent can follow it, and Rubrkit can grade whether the work is bounded enough to reuse.

Readable by default

Uses boring keywords, indentation, labels, and named outputs instead of clever syntax.

Built for review

Separates facts, constraints, actions, outputs, and verification so weaknesses are easy to mark.

Pasteable into agents

No compiler or special runtime is required. The procedure is meant to be followed directly.

Mechanism

Every block has one job.

rubr_flow works because it makes the hidden control surface visible: what the agent knows, what it may do, how it moves, and how success is checked.

TASK

Name the objective in one sentence.

CONTEXT

Separate durable facts from the procedure.

INPUTS

Declare files, data, URLs, or assumptions the agent needs.

RULES

Make boundaries and preservation requirements visible.

FLOW

List the ordered work with labels, branches, and handoffs.

OUTPUT

Define the final artifact shape before the work starts.

VERIFY

Give the agent a pass/fail finish line.
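
Because every block opens with an unindented keyword and its body is indented, the layout can be inspected mechanically. The following is a minimal Python sketch, not Rubrkit's grader: `split_blocks` and `missing_blocks` are hypothetical helpers, and treating the seven blocks listed above as the core set is an assumption made for illustration.

```python
# Hypothetical sketch: split a rubr_flow procedure into its blocks.
# Assumption: a block starts at an unindented line whose first word is the
# keyword; the indented lines below it belong to that block.

CORE_BLOCKS = ["TASK", "CONTEXT", "INPUTS", "RULES", "FLOW", "OUTPUT", "VERIFY"]


def split_blocks(text: str) -> dict[str, list[str]]:
    """Return {keyword: [body lines]} in the order the blocks appear."""
    blocks: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if not raw[0].isspace():
            parts = raw.split(maxsplit=1)
            current = parts[0]
            blocks[current] = []
            # Keep an inline argument, such as the TASK sentence, as body text.
            if len(parts) > 1:
                blocks[current].append(parts[1])
        elif current is not None:
            blocks[current].append(raw.strip())
    return blocks


def missing_blocks(text: str) -> list[str]:
    """List the core blocks a procedure does not declare."""
    found = split_blocks(text)
    return [name for name in CORE_BLOCKS if name not in found]


# Shortened copy of the specimen at the top of this page.
SPECIMEN = '''TASK "Ship safer agent instructions"
INPUTS
  artifact = prompt_or_agent_spec
RULES
  preserve user intent
  flag unsupported claims
FLOW
  CALL tool "Rubrkit audit" WITH artifact -> audit
OUTPUT
  score: audit.score
VERIFY
  PASS WHEN tests can judge the result
'''

print(missing_blocks(SPECIMEN))  # prints ['CONTEXT']
```

The specimen declares no CONTEXT block, so the sketch reports it. Whether a missing block matters is still a judgment for the reviewer.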

Examples

Loose request in. Bounded procedure out.

The point is not to make instructions longer. The point is to expose the decisions an agent otherwise has to guess.

Onboarding improvement
Before

Review our onboarding flow and fix anything confusing.


Inputs named
Edits bounded
Pass/fail verification
After

rubr_flow procedure

TASK "Improve onboarding completion"
CONTEXT
  user is new to [PRODUCT]
  primary action is [TARGET ACTION]
INPUTS
  current_flow = app screens
  analytics_notes = drop-off data
  support_themes = user confusion reports
RULES
  change only copy and step order
  preserve required legal text
TOOLS
  READ analytics_notes
  READ support_themes
STATE
  friction_notes = []
FLOW
  REVIEW each screen -> friction_notes
  RANK issues by user impact -> ranked_issues
  EDIT the highest-impact issue -> changed_copy
ON missing_context
  ASK user "Which onboarding detail is missing?" -> missing_detail
  FAIL WHEN missing_detail is unavailable
OUTPUT
  changed_copy: final text
  rationale: why this improves completion
  risk_notes: constraints preserved
VERIFY
  PASS WHEN user can identify the next action in one pass
  STOP WHEN verification passes
Coding-agent repair loop
Before

Fix the failing tests and clean up anything related.


Scope controlled
Root cause required
Retry limit explicit
After

rubr_flow procedure

TASK "Repair failing checkout tests"
INPUTS
  failing_command = "npm test -- checkout"
  changed_files = git diff
RULES
  edit only checkout code and focused tests
  preserve public API names
FLOW
  RUN "npm test -- checkout" -> test_result
  DECIDE failure_notes FROM test_result, changed_files
  DECIDE root_cause FROM failure_notes, changed_files
  EDIT minimal patch -> patch
  RUN "npm test -- checkout" -> verification
OUTPUT
  root_cause
  changed_files
  verification
VERIFY
  PASS WHEN verification.status == "passed"
  FAIL WHEN same failure repeats 3 times
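
The FAIL WHEN rule caps retries, although the FLOW above shows a single pass. One way a harness might enforce that cap is sketched below in Python. `repair_loop`, `run_tests`, `apply_patch`, and `max_attempts` are hypothetical names, and comparing failure messages by equality is an assumption about what "same failure" means.

```python
from typing import Callable, Optional


def repair_loop(
    run_tests: Callable[[], Optional[str]],
    apply_patch: Callable[[str], None],
    max_repeats: int = 3,
    max_attempts: int = 10,
) -> str:
    """Return "passed", or "failed" once one failure repeats max_repeats times.

    run_tests returns None on success or a failure message otherwise.
    max_attempts is an extra guard (an assumption, not in the procedure) so
    that a failure which keeps changing cannot loop forever.
    """
    last_failure = None
    repeats = 0
    for _ in range(max_attempts):
        failure = run_tests()
        if failure is None:
            return "passed"
        # Count consecutive occurrences of the same failure.
        if failure == last_failure:
            repeats += 1
        else:
            last_failure, repeats = failure, 1
        if repeats >= max_repeats:
            return "failed"
        apply_patch(failure)
    return "failed"


# Usage: the first run fails, the patch lands, the second run passes.
results = iter(["TypeError in cart total", None])
print(repair_loop(lambda: next(results), lambda failure: None))  # prints passed
```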
Research workflow
Before

Research the market and write a useful summary.


Evidence trail
Audience named
Confidence visible
After

rubr_flow procedure

TASK "Produce a sourced market brief"
INPUTS
  topic = [MARKET]
  audience = [DECISION MAKER]
  approved_sources = [SOURCE LIST]
RULES
  cite every factual claim
  separate evidence from recommendation
FLOW
  SEARCH topic IN approved_sources -> sources
  FAIL WHEN sources.empty
  READ url sources -> notes
  DECIDE confidence_level FROM source_quality, recency
  WRITE brief -> draft
OUTPUT
  summary
  evidence_table
  open_questions
  recommendation
VERIFY
  PASS WHEN every recommendation points to evidence
  STOP WHEN confidence_level is stated
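
The PASS condition above can be judged mechanically if the brief keeps its evidence in a table. A minimal Python sketch follows. The `Recommendation` shape, its `evidence_ids` field, and the `unsupported` helper are illustrative assumptions, not part of rubr_flow, and the sample rows are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    text: str
    evidence_ids: list[str] = field(default_factory=list)


def unsupported(recommendations, evidence_table):
    """Return recommendations that cite no evidence or cite unknown rows."""
    return [
        rec for rec in recommendations
        if not rec.evidence_ids
        or any(eid not in evidence_table for eid in rec.evidence_ids)
    ]


# Placeholder data, for illustration only.
evidence_table = {"E1": "source note one", "E2": "source note two"}
recs = [
    Recommendation("first recommendation", ["E1"]),
    Recommendation("second recommendation"),         # cites nothing
    Recommendation("third recommendation", ["E9"]),  # cites a missing row
]

failing = unsupported(recs, evidence_table)
print("PASS" if not failing else "FAIL", [rec.text for rec in failing])
# prints FAIL ['second recommendation', 'third recommendation']
```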
Success statistics

Measure whether the procedure can hold.

These are sample rubric statistics from the examples on this page, not aggregate customer performance claims.

64 -> 100

Sample rubric score

The loose onboarding request scores 64. The rubr_flow version scores 100 once inputs, tools, failure handling, output, and VERIFY are explicit.

10/10

Rubric dimensions covered

The stronger sample covers task, context, inputs, rules, state, flow, tools, output, verification, and failure handling.

0

Open-ended finish lines

A usable procedure ends with PASS, STOP, or FAIL conditions instead of asking the agent to decide when it is good enough.
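
Checking for a finish line is simple enough to automate. The sketch below is a hypothetical Python check, not Rubrkit's scoring logic. It assumes VERIFY sits on an unindented line and that its conditions are indented lines starting with PASS, STOP, or FAIL.

```python
FINISH_KEYWORDS = ("PASS", "STOP", "FAIL")


def finish_lines(procedure: str) -> list[str]:
    """Return the PASS/STOP/FAIL conditions found inside the VERIFY block."""
    conditions = []
    in_verify = False
    for raw in procedure.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace():
            # An unindented line opens a new block.
            in_verify = raw.split()[0] == "VERIFY"
        elif in_verify and raw.split()[0] in FINISH_KEYWORDS:
            conditions.append(raw.strip())
    return conditions


def is_open_ended(procedure: str) -> bool:
    """True when the text gives the agent no pass/fail finish line."""
    return not finish_lines(procedure)


# Shortened copy of the repair procedure, next to its loose "Before" request.
REPAIR = '''TASK "Repair failing checkout tests"
FLOW
  RUN "npm test -- checkout" -> test_result
VERIFY
  PASS WHEN verification.status == "passed"
  FAIL WHEN same failure repeats 3 times
'''
LOOSE = "Fix the failing tests and clean up anything related."

print(is_open_ended(REPAIR))  # prints False
print(is_open_ended(LOOSE))   # prints True
```

Conditions outside VERIFY, such as a FAIL WHEN inside an ON block, are deliberately not counted here.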

3

Drift controls added

Bounded edits, named inputs, and pass/fail verification make the agent less likely to invent the next step.

Turn the next loose instruction into rubr_flow.
