Rubrkit

Before and after examples.

Each example shows the same loop: identify weak dimensions, rewrite the instruction, then define a check that proves whether the rewrite worked.

Weak marketing prompt
38/100

Write a good launch post for my product.

No audience
No positioning
No success criteria

Improved version

Write a launch post for [TARGET AUDIENCE] introducing [PRODUCT]. Use a direct tone, explain the problem, show one concrete outcome, and end with one CTA.

Why it improved

The rewrite defines audience, structure, tone, and the desired action.

Sample eval check

Passes if a reader can identify the product, problem, outcome, and CTA in under 30 seconds.

Bad coding prompt
42/100

Fix this React bug and make the code better.

Undefined bug
No constraints
No verification step

Improved version

Find the cause of [BUG]. Keep changes scoped to [FILES]. Explain the root cause, patch the code, and run [TEST COMMAND].

Why it improved

The rewrite defines scope, expected output, and verification.

Sample eval check

Passes if the patch includes a root-cause note and the specified test result.
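
As a rough illustration, the check above could be automated along these lines. This is a minimal sketch, not Rubrkit's implementation: it assumes the agent returns a plain-text report, that the root-cause note is labeled "Root cause:", and that the test outcome is reported with the words "passed" or "failed". The function name `check_patch_report` is hypothetical.

```python
import re


def check_patch_report(report: str, test_command: str) -> bool:
    """Return True if the report names a root cause and a result for the test command.

    Assumed report conventions (not from the source): a "Root cause:" label
    and an outcome word, "passed" or "failed".
    """
    # A root-cause note must have some text after the label.
    has_root_cause = re.search(r"root cause\s*:\s*\S", report, re.IGNORECASE) is not None
    # The specified command must be mentioned, along with an outcome word.
    mentions_command = test_command in report
    has_outcome = re.search(r"\b(passed|failed)\b", report, re.IGNORECASE) is not None
    return has_root_cause and mentions_command and has_outcome
```

The check only confirms that both pieces are present. Whether the root cause is actually correct still needs a human or a stronger evaluator.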

Messy agent spec
45/100

You are an agent that researches competitors and writes summaries.

No tool boundaries
No stop condition
No escalation behavior

Improved version

Research competitors using [TOOLS]. Summarize sources, confidence, and gaps. Stop after [LIMIT] sources or when evidence is sufficient. Escalate if data is stale.

Why it improved

The agent now has tool limits, stopping rules, and failure handling.

Sample eval check

Passes if the agent cites sources, stops at or before [LIMIT] sources, and flags stale evidence.
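
One way to make this check concrete is to score a recorded agent run. The sketch below is an assumption-heavy illustration: the `Source` and `AgentRun` record shapes, the `max_age_days` staleness threshold, and the reading of "stops predictably" as "stays within the source limit" are all hypothetical choices, not part of the original spec.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    url: str
    age_days: int                # how old the source is (assumed to be known)
    flagged_stale: bool = False  # whether the agent flagged it as stale


@dataclass
class AgentRun:
    sources: list[Source] = field(default_factory=list)
    claims_total: int = 0
    claims_with_citation: int = 0


def check_agent_run(run: AgentRun, limit: int, max_age_days: int) -> bool:
    """Return True if the run cites every claim, respects the limit, and flags stale sources."""
    cites_sources = run.claims_total > 0 and run.claims_with_citation == run.claims_total
    # Proxy for "stops predictably": the run never exceeds the source limit.
    within_limit = len(run.sources) <= limit
    # Every source older than the threshold must carry a stale flag.
    flags_stale = all(s.flagged_stale for s in run.sources if s.age_days > max_age_days)
    return cites_sources and within_limit and flags_stale
```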

Vague command
50/100

Summarize this document.

No output contract
No audience
No defaults

Improved version

Summarize [DOCUMENT] for [AUDIENCE] in five bullets, then list risks, decisions, and unanswered questions.

Why it improved

The command becomes repeatable because the output shape is explicit.

Sample eval check

Passes if the answer contains the four requested sections (five summary bullets, risks, decisions, and unanswered questions) and no invented claims.
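
The structural half of this check can be sketched in code. The example below assumes a plain-text answer in which the summary bullets come first and each later section starts with its name on a line of its own; the heading names and the function `check_summary_shape` are hypothetical. It does not attempt the "no invented claims" half, which needs comparison against the source document.

```python
import re

# Assumed section labels; the original prompt does not fix their exact wording.
REQUIRED_HEADINGS = ("Risks", "Decisions", "Unanswered questions")


def check_summary_shape(answer: str) -> bool:
    """Return True if the answer has five leading bullets and the three named sections in order."""
    lines = [line.strip() for line in answer.splitlines() if line.strip()]
    positions = []
    for heading in REQUIRED_HEADINGS:
        # A heading is a line matching the label, ignoring case and a trailing colon.
        matches = [i for i, line in enumerate(lines) if line.rstrip(":").lower() == heading.lower()]
        if len(matches) != 1:
            return False
        positions.append(matches[0])
    if positions != sorted(positions):
        return False
    # Everything before the first heading is the summary; count its bullets.
    summary = lines[: positions[0]]
    bullets = [line for line in summary if re.match(r"^[-*•]\s+\S", line)]
    return len(bullets) == 5
```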

Reusable skill instruction
44/100

When asked, help me create a presentation.

Weak trigger
No procedure
No output contract

Improved version

Trigger when the user asks for slides. Gather audience, objective, length, and source material. Produce outline, slide copy, and visual direction.

Why it improved

The skill has a trigger, inputs, steps, and deliverables.

Sample eval check

Passes if missing inputs are requested before slide content is generated.
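
This ordering check lends itself to a small replay over a recorded interaction. The sketch below assumes the conversation has been reduced to ordered `(kind, value)` events, where `"ask"` means the skill requested a named input and `"slides"` means it produced slide content. That event format and the function name are illustrative assumptions.

```python
# The four inputs named in the improved instruction.
REQUIRED_INPUTS = ("audience", "objective", "length", "source material")


def check_inputs_requested_first(provided: set[str], events: list[tuple[str, str]]) -> bool:
    """Return True if every input the user did not provide was asked for before any slides."""
    missing = set(REQUIRED_INPUTS) - provided
    asked: set[str] = set()
    for kind, value in events:
        if kind == "ask":
            asked.add(value)
        elif kind == "slides":
            # Decide at the first slide output: all missing inputs must already be requested.
            return missing <= asked
    # Assumption: a run that never produces slides is treated as a failure.
    return False
```

For example, a user who supplies only audience and objective should see questions about length and source material before the first slide appears.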

Multi-step AI workflow
41/100

Research a topic, write a report, and make it useful.

No sequence
No owners
No measurement

Improved version

Research [TOPIC], extract claims with sources, draft the report, review risks, and produce final recommendations with pass/fail acceptance criteria.

Why it improved

The workflow has a sequence, deliverables, and a quality gate.

Sample eval check

Passes if each step produces an artifact that the next step consumes.
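
The artifact hand-off can be verified mechanically if each step declares what it consumes and produces. The following is a minimal sketch under that assumption; the `Step` record and the artifact names in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    consumes: frozenset[str]  # artifact names this step reads
    produces: frozenset[str]  # artifact names this step writes


def check_artifact_chain(steps: list[Step]) -> bool:
    """Return True if every step produces something the following step consumes."""
    if not steps:
        return False
    for current, following in zip(steps, steps[1:]):
        # The hand-off holds when at least one produced artifact is consumed next.
        if not (current.produces & following.consumes):
            return False
    # The final step has no consumer, but it must still produce a deliverable.
    return bool(steps[-1].produces)


workflow = [
    Step("research", frozenset({"topic"}), frozenset({"notes"})),
    Step("extract", frozenset({"notes"}), frozenset({"claims"})),
    Step("draft", frozenset({"claims"}), frozenset({"report"})),
]
```

Calling `check_artifact_chain(workflow)` on this chain succeeds; removing the "extract" step breaks the hand-off because "draft" would consume claims that nothing produced.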