Examples

Watch weak instructions get graded, rewritten, and proven.

Every specimen below runs the same loop: grade the instruction against the rubric, mark the weak dimensions, rewrite it, then define a check that proves the rewrite holds.


Grade the specimen

Paste a prompt, command, skill, agent spec, workflow, or rubr_flow block and get a 0-100 score across the rubric.

Read the marks

See which dimensions failed: clarity, context, constraints, output shape, or evaluation criteria.

Rewrite with checks

Apply the stronger rewrite and run the sample eval that proves whether the instruction holds.

Prompt

A launch post that knows who it is for.

Most weak prompts skip audience, structure, and the one action that defines success.

Marketing launch post

Before · 38/100

Write a good launch post for my product.

No audience

No positioning

No success criteria

After · 86/100

Write a launch post for [TARGET AUDIENCE] introducing [PRODUCT]. Open with the problem, show one concrete outcome, keep a direct tone, and end with a single [CTA].

Why it improved

The rewrite pins down the audience, structure, tone, and desired action, so the result is repeatable.

Sample eval check

Passes if a first-time reader can name the product, problem, outcome, and CTA in under 30 seconds.
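The bracketed slots in the rewrite only help if they are filled before the prompt is sent. The sketch below is a minimal Python illustration of that pre-flight step. The helper names `fill` and `unfilled` are hypothetical, the product name in the usage note is made up, and the slot pattern (uppercase words in square brackets) is an assumption drawn from the specimens on this page, not a documented rubr convention.

```python
import re

# The rewritten launch-post prompt from this page, with its three slots.
TEMPLATE = (
    "Write a launch post for [TARGET AUDIENCE] introducing [PRODUCT]. "
    "Open with the problem, show one concrete outcome, keep a direct tone, "
    "and end with a single [CTA]."
)

# Assumed slot syntax: uppercase letters and spaces inside square brackets.
SLOT = re.compile(r"\[([A-Z][A-Z ]*)\]")


def fill(template: str, values: dict) -> str:
    """Replace each slot that has a value; leave unknown slots untouched."""
    def swap(match):
        return values.get(match.group(1), match.group(0))
    return SLOT.sub(swap, template)


def unfilled(text: str) -> list:
    """Return the names of slots that are still present in the text."""
    return SLOT.findall(text)
```

For example, `fill(TEMPLATE, {"TARGET AUDIENCE": "indie developers", "PRODUCT": "Acme Notes"})` leaves `[CTA]` in place, and `unfilled` on the result returns `["CTA"]`, which is a signal to stop before sending.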

Command

A summary command with an output contract.

Single-shot commands fail when the output shape, audience, and defaults are left implicit.

Document summary command

Before · 50/100

Summarize this document.

No output contract

No audience

No defaults

After · 88/100

Summarize [DOCUMENT] for [AUDIENCE] in five bullets, then list risks, open decisions, and unanswered questions. Do not invent facts that are not in the source.

Why it improved

The command becomes predictable because the output sections and the no-fabrication rule are explicit.

Sample eval check

Passes if the answer contains all four requested sections and every claim traces to the source.
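The structural half of this check can be automated. The sketch below is a minimal Python illustration, assuming the model labels its sections `Summary`, `Risks`, `Open decisions`, and `Unanswered questions`. Those labels and the function name `check_summary_shape` are hypothetical, since the command itself does not fix exact headings. Tracing each claim back to the source is not covered here and would still need a reviewer or a second pass.

```python
import re

# Hypothetical section labels; adjust to whatever headings your command requests.
REQUIRED_SECTIONS = ["Summary", "Risks", "Open decisions", "Unanswered questions"]


def check_summary_shape(answer: str) -> dict:
    """Report missing sections and count the bullets under Summary."""
    found = {}
    current = None
    for raw in answer.splitlines():
        line = raw.strip()
        if not line:
            continue
        label = line.rstrip(":")
        if label in REQUIRED_SECTIONS:
            current = label          # start collecting bullets for this section
            found[current] = []
        elif current is not None and re.match(r"^[-*•]\s+", line):
            found[current].append(line)
    missing = [name for name in REQUIRED_SECTIONS if name not in found]
    bullets = len(found.get("Summary", []))
    return {
        "missing_sections": missing,
        "summary_bullets": bullets,
        "passed": not missing and bullets == 5,
    }
```

The check is deliberately strict about the five bullets, because the command asks for exactly five.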

Skill

A reusable skill that knows when to fire.

Team skills need a trigger, the inputs they require, and the artifacts they must return.

Presentation skill instruction

Before · 44/100

When asked, help me create a presentation.

Weak trigger

No procedure

No output contract

After · 85/100

Trigger when the user asks for slides or a deck. Gather [AUDIENCE], [OBJECTIVE], [LENGTH], and source material before drafting. Produce an outline, slide copy, and visual direction; ask for any missing input first.

Why it improved

The skill now has a precise trigger, required inputs, a procedure, and named deliverables.

Sample eval check

Passes if missing inputs are requested before any slide content is generated.
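One way to make this check mechanical is to model the gate it describes. The sketch below is a Python illustration only: the input names mirror the placeholders in the rewrite, while `missing_inputs` and `next_action` are hypothetical helpers, not part of any skill runtime.

```python
# Inputs the rewritten skill says it must gather before drafting.
REQUIRED_INPUTS = ("audience", "objective", "length", "source_material")


def missing_inputs(provided: dict) -> list:
    """List required inputs that are absent or empty."""
    return [name for name in REQUIRED_INPUTS if not provided.get(name)]


def next_action(provided: dict) -> str:
    """Decide whether the skill may draft or must ask first."""
    missing = missing_inputs(provided)
    if missing:
        return "ask: " + ", ".join(missing)
    return "draft"
```

With only an audience supplied, `next_action({"audience": "sales team"})` returns `"ask: objective, length, source_material"`. It returns `"draft"` only once all four inputs are non-empty.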

Agent spec

An agent with boundaries and a stop condition.

Open-ended agent specs drift without tool limits, stopping rules, and failure behavior.

Competitor research agent

Before · 45/100

You are an agent that researches competitors and writes summaries.

No tool boundaries

No stop condition

No escalation behavior

After · 87/100

Research competitors using [TOOLS] only. For each, record sources, a confidence level, and gaps. Stop after [LIMIT] sources or once evidence is sufficient. Escalate to a human if data is older than [FRESHNESS].

Why it improved

The agent gains tool limits, stopping rules, and an explicit escalation path for stale evidence.

Sample eval check

Passes if the agent cites its sources, stops at or before [LIMIT] sources, and flags evidence older than [FRESHNESS].
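A check like this can be run over a log of what the agent recorded. The sketch below is a minimal Python illustration under several assumptions: the `Finding` record and `check_agent_run` function are hypothetical, each finding is taken to use exactly one source, and the freshness window is expressed in days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Finding:
    claim: str
    source: str          # URL or citation; an empty string means uncited
    published: date      # publication date of the evidence
    flagged_stale: bool  # True if the agent escalated this item to a human


def check_agent_run(findings, limit, freshness_days, today):
    """Return a list of failure messages; an empty list means the run passes."""
    failures = []
    if len(findings) > limit:
        failures.append(f"used {len(findings)} sources, limit is {limit}")
    cutoff = today - timedelta(days=freshness_days)
    for f in findings:
        if not f.source:
            failures.append(f"uncited claim: {f.claim}")
        if f.published < cutoff and not f.flagged_stale:
            failures.append(f"stale evidence not flagged: {f.claim}")
    return failures
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the check repeatable across runs.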

Workflow

A multi-step workflow with handoffs and a gate.

Chained AI work breaks when steps have no sequence, no owner, and no measurable acceptance.

Research-to-report workflow

Before · 41/100

Research a topic, write a report, and make it useful.

No sequence

No handoffs

No measurement

After · 89/100

Research [TOPIC] and extract claims with sources. Draft the report from those claims, review it for risks, then produce final recommendations with pass/fail acceptance criteria. Each step consumes the previous step’s artifact.

Why it improved

The workflow now has an ordered sequence, explicit handoffs between steps, and a quality gate.

Sample eval check

Passes if every step produces an artifact that the next step actually consumes.
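The handoff rule lends itself to a simple structural check. The sketch below is a Python illustration in which the `Step` record, the `broken_handoffs` function, and all artifact names are hypothetical. It only verifies that each step declares the previous step's artifact as its input, not that the step actually uses it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    name: str
    consumes: Optional[str]  # artifact this step reads; None for the first step
    produces: str            # artifact this step writes


def broken_handoffs(steps):
    """Return (producer, consumer) name pairs whose artifacts do not line up."""
    broken = []
    for prev, nxt in zip(steps, steps[1:]):
        if nxt.consumes != prev.produces:
            broken.append((prev.name, nxt.name))
    return broken


# The four steps of the rewritten workflow, with made-up artifact names.
WORKFLOW = [
    Step("research", None, "claims_with_sources"),
    Step("draft", "claims_with_sources", "draft_report"),
    Step("review", "draft_report", "reviewed_report"),
    Step("recommend", "reviewed_report", "final_recommendations"),
]
```

`broken_handoffs(WORKFLOW)` returns an empty list. If the draft step were changed to read straight from the topic instead of the extracted claims, the pair `("research", "draft")` would be reported.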

rubr_flow

A bounded procedure an agent can actually follow.

rubr_flow separates context, inputs, flow, output, and verification into a path with no slack.

Onboarding improvement procedure

Before · 57/100

TASK "Improve onboarding"
FLOW review the app and fix anything confusing
OUTPUT notes

Loose flow

No input contract

Weak verification

After · 91/100

TASK "Improve onboarding"
CONTEXT user is new to [PRODUCT]
INPUTS current flow, target action, known drop-off point
FLOW
  REVIEW each step -> friction_notes
  EDIT highest-impact copy issue
  VERIFY user can identify the next action
OUTPUT changed copy, rationale, pass/fail check

Why it improved

Context, inputs, flow, output, and verification are separated, so the agent has one bounded path.

Sample eval check

Passes if each FLOW step feeds the next and VERIFY can be scored as a clean pass or fail.
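A structural pre-check can catch missing pieces before anyone scores the block. The sketch below is a naive, line-based reading of the rewritten block in Python. It is not rubr_flow's actual parser or grammar: the keyword list is inferred from the specimen on this page, and `read_block` and `structural_gaps` are hypothetical names.

```python
# Top-level keywords as they appear in the rewritten specimen (an assumption).
SECTION_KEYWORDS = ("TASK", "CONTEXT", "INPUTS", "FLOW", "OUTPUT")

BLOCK = '''TASK "Improve onboarding"
CONTEXT user is new to [PRODUCT]
INPUTS current flow, target action, known drop-off point
FLOW
  REVIEW each step -> friction_notes
  EDIT highest-impact copy issue
  VERIFY user can identify the next action
OUTPUT changed copy, rationale, pass/fail check
'''


def read_block(text):
    """Split a block into top-level sections and indented FLOW step names."""
    sections, steps = {}, []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        keyword, _, rest = raw.strip().partition(" ")
        if raw[0].isspace():
            steps.append(keyword)      # indented line: treated as a FLOW step
        elif keyword in SECTION_KEYWORDS:
            sections[keyword] = rest
    return sections, steps


def structural_gaps(text):
    """Name any missing section, plus VERIFY if no FLOW step uses it."""
    sections, steps = read_block(text)
    gaps = [k for k in SECTION_KEYWORDS if k not in sections]
    if "VERIFY" not in steps:
        gaps.append("VERIFY")
    return gaps
```

`structural_gaps(BLOCK)` returns an empty list. A block with only TASK, FLOW, and OUTPUT lines reports `["CONTEXT", "INPUTS", "VERIFY"]`. Whether each step truly feeds the next is a judgment this sketch does not attempt.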

Now grade your own instruction.

Paste a prompt, command, skill, agent spec, workflow, or rubr_flow block and get the marks, the rewrite, and the eval that proves it.
