Eval Workbench

The Eval Workbench workspace tab is the app’s current in-app path for testing skill behavior. It groups two modes under one surface:

When a run exposes weak output or unclear routing boundaries, you can send an improvement brief directly to Refine.

Open the workbench

The page has three main sections:

Eval Workbench header with a Run prompt set button.
Prompt set editor where you create and save app-owned evaluation cases.
Run history and Run details for reviewing completed runs and sending feedback to Refine.

To create a prompt set:

Open Eval Workbench and stay on Performance.
In Prompt set, click New prompt set if you want a fresh draft.
Enter a Prompt set name.
For each case, fill in Case prompt and Expected outcome.
Click Add case to include more cases, or delete a case with the trash button.
Click Save prompt set.

Saved prompt sets appear as buttons near the top of the page. Click a prompt set name to load it back into the editor.
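The steps above treat a prompt set as a name plus a list of cases, each pairing a Case prompt with an Expected outcome. The following Python sketch models that shape under stated assumptions: `Case`, `PromptSet`, `save_prompt_set`, `load_prompt_set`, and the in-memory `store` are hypothetical names for illustration, not the app's storage format or API. The rule that a name is required before saving is also an assumption.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Case:
    # One evaluation case: the request and the expected response or behavior.
    case_prompt: str
    expected_outcome: str


@dataclass
class PromptSet:
    # A named collection of cases (hypothetical model, not the app's schema).
    name: str
    cases: list[Case] = field(default_factory=list)

    def add_case(self, case_prompt: str, expected_outcome: str) -> None:
        # Like "Add case": append another case to the draft.
        self.cases.append(Case(case_prompt, expected_outcome))

    def delete_case(self, index: int) -> None:
        # Like the trash button: remove one case by position.
        del self.cases[index]


def save_prompt_set(store: dict[str, PromptSet], draft: PromptSet) -> None:
    # Like "Save prompt set": persist a copy keyed by name.
    # Requiring a non-empty name is an assumption made for this sketch.
    if not draft.name.strip():
        raise ValueError("Prompt set name is required")
    store[draft.name] = copy.deepcopy(draft)


def load_prompt_set(store: dict[str, PromptSet], name: str) -> PromptSet:
    # Like clicking a saved prompt set button: load a copy back into the editor.
    return copy.deepcopy(store[name])


store: dict[str, PromptSet] = {}
draft = PromptSet("Routing checks")
draft.add_case("Summarize this changelog", "Produces a bullet summary")
draft.add_case("Translate this file", "Declines and points to another skill")
draft.delete_case(1)
save_prompt_set(store, draft)
editor = load_prompt_set(store, "Routing checks")
print(len(editor.cases))  # prints 1
```

Loading returns a copy, so edits in the editor do not change the saved set until you save again; whether the app behaves this way is not stated on this page.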

To start a run, select a saved prompt set and click Run prompt set. When the run finishes, the workbench adds it to Run history and loads its results into Run details.

Use Run history to inspect prior runs. Click View latest run to open the newest run in Run details, or View run to open an older one.

Use Run details to inspect case-by-case results:

If no run is selected, the page shows Select a run to inspect its case results.
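One way to picture how Run history and Run details relate is as a list of finished runs plus a single selected run. The sketch below is hypothetical: `Workbench`, `Run`, `CaseResult`, and their fields (including `actual_output`) are illustrative assumptions about what a run records, not the app's internal model. Only the behaviors come from this page: a finished run joins the history and is shown in details, View latest run and View run change the selection, and an empty selection shows a prompt to pick a run.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CaseResult:
    # Hypothetical per-case result fields.
    case_prompt: str
    expected_outcome: str
    actual_output: str


@dataclass
class Run:
    run_id: int
    prompt_set_name: str
    results: list[CaseResult] = field(default_factory=list)


class Workbench:
    """Hypothetical model of Run history and Run details."""

    def __init__(self) -> None:
        self.history: list[Run] = []         # finished runs, oldest first
        self.selected: Optional[Run] = None  # the run shown in Run details

    def finish_run(self, run: Run) -> None:
        # A finished run is added to Run history and loaded into Run details.
        self.history.append(run)
        self.selected = run

    def view_latest_run(self) -> None:
        # "View latest run": open the newest run in Run details.
        self.selected = self.history[-1] if self.history else None

    def view_run(self, run_id: int) -> None:
        # "View run": open an older run in Run details (None if not found).
        self.selected = next((r for r in self.history if r.run_id == run_id), None)

    def run_details(self) -> str:
        # With no selection, show the empty-state message instead of results.
        if self.selected is None:
            return "Select a run to inspect its case results"
        lines = [f"Run {self.selected.run_id} ({self.selected.prompt_set_name})"]
        for r in self.selected.results:
            lines.append(
                f"- {r.case_prompt}: expected {r.expected_outcome!r}, got {r.actual_output!r}"
            )
        return "\n".join(lines)


wb = Workbench()
empty_message = wb.run_details()
wb.finish_run(Run(1, "Routing checks", [
    CaseResult("Summarize this changelog", "Produces a bullet summary", "A three-bullet summary"),
]))
wb.finish_run(Run(2, "Routing checks", []))
wb.view_run(1)
print(wb.run_details())
```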

To send a run's feedback to Refine, click Send to Refine for that run. The workbench builds an improvement brief from that run and opens the Refine tab with the brief ready to use.
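This page does not describe what goes into an improvement brief. As a rough illustration only, the sketch below assumes each case result carries a hypothetical `passed` flag and builds a plain-text brief listing the cases that fell short. `CaseResult`, `passed`, and `build_improvement_brief` are invented names, and the app's real brief may be structured quite differently.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    # Hypothetical per-case result; the "passed" flag is an assumption.
    case_prompt: str
    expected_outcome: str
    actual_output: str
    passed: bool


def build_improvement_brief(prompt_set_name: str, results: list[CaseResult]) -> str:
    """Sketch: turn one run's results into a text brief for Refine."""
    failing = [r for r in results if not r.passed]
    lines = [
        f"Improvement brief for prompt set: {prompt_set_name}",
        f"{len(failing)} of {len(results)} cases need attention.",
    ]
    # List each failing case with its prompt, expectation, and actual output.
    for i, r in enumerate(failing, start=1):
        lines += [
            f"{i}. Case prompt: {r.case_prompt}",
            f"   Expected outcome: {r.expected_outcome}",
            f"   Actual output: {r.actual_output}",
        ]
    return "\n".join(lines)


results = [
    CaseResult("Summarize this changelog", "Produces a bullet summary",
               "A three-bullet summary", passed=True),
    CaseResult("Translate this file", "Declines and points to another skill",
               "Attempts the translation itself", passed=False),
]
brief = build_improvement_brief("Routing checks", results)
print(brief)
```

Placing the Case prompt, Expected outcome, and actual output side by side is one way to give Refine enough context to see where a skill drifted, for example when it handled a request that should have been routed elsewhere.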

Control	What it does
Run prompt set	Starts a run for the selected saved prompt set
New prompt set	Clears the editor for a new prompt-set draft
Prompt set name	Names the saved set of cases
Case prompt	The request the skill should answer
Expected outcome	The expected response or behavior
Add case	Adds another case to the prompt set
Save prompt set	Persists the current prompt set
View latest run	Opens the newest run in Run details
View run	Opens an older run in Run details
Send to Refine	Builds an improvement brief and opens Refine