Runs, Snapshots, and Replay

Definitions

Run

A single execution of the assessment engine against all requirements in a pack. A run:

  • Evaluates uploaded evidence against each requirement
  • Produces individual assessments with status, confidence, and citations
  • Records audit information including timing and model details
  • Captures configuration versions for reproducibility

Run status

Status        Description
PENDING       Run created but not yet started
IN_PROGRESS   Assessment engine is processing requirements
COMPLETED     All requirements assessed successfully
FAILED        Run terminated due to error

Snapshot

A frozen record of a completed run's results and configuration. Snapshots enable deterministic replay by capturing:

  • All assessment results
  • Generated tasks
  • Export data
  • Configuration versions used

Snapshot status

Status     Description
PENDING    Run not yet completed; no snapshot exists
CAPTURED   Snapshot successfully saved; deterministic replay available
FAILED     Snapshot capture failed; replay uses live data (non-deterministic)
LEGACY     Run predates the snapshot feature; no snapshot exists
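
For reference, the status values above can be modeled as string unions. A minimal TypeScript sketch; the Run shape is illustrative, assembled from the API examples later on this page, not a complete schema:

// Status values as documented in the tables above.
type RunStatus = "PENDING" | "IN_PROGRESS" | "COMPLETED" | "FAILED";
type SnapshotStatus = "PENDING" | "CAPTURED" | "FAILED" | "LEGACY";

// Illustrative run record; field names taken from the API examples below.
interface Run {
  id: string;
  tenantId: string;
  packId: string;
  status: RunStatus;
  snapshotStatus: SnapshotStatus;
  createdAt: string; // ISO 8601 timestamp
}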

Replay

Re-executing or re-viewing a past run's results. Two types:

  1. Deterministic replay: Uses captured snapshot data—guaranteed identical results
  2. Live replay: Re-queries current data—results may differ if configuration changed

Assessment

The result of evaluating one requirement during a run. See Evidence, Requirements, and Assessments for details.

Invariants

These conditions must always be true:

  1. Only one run IN_PROGRESS per tenant at a time: Concurrent runs are prevented to avoid resource contention (see the sketch after this list).

  2. Run results are immutable after completion: Once a run reaches COMPLETED or FAILED status, its data cannot be modified.

  3. Snapshots are captured on successful completion: When a run completes successfully, the system captures an immutable snapshot.

  4. FAILED snapshot runs cannot use live fallback: If snapshot capture failed, replay returns an error rather than potentially inconsistent data.

  5. Version IDs are pinned at run creation: The run records which criteria, corpus, and retrieval configuration versions were active when it started.

  6. Assessments belong to exactly one run: Each assessment record references a single run ID.
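
Invariant 1 is typically enforced before a run is created. A hedged sketch of such a guard, reusing the Run and RunStatus types from the earlier sketch; the data-access helpers here are hypothetical, not this system's actual API:

// Hypothetical persistence helpers; a real implementation would back
// these with the run store (and ideally a database-level constraint).
declare function findRunByTenantAndStatus(
  tenantId: string,
  status: RunStatus
): Promise<Run | null>;
declare function createRun(fields: {
  tenantId: string;
  packId: string;
  status: RunStatus;
}): Promise<Run>;

// Sketch: reject a new run while another is IN_PROGRESS for the tenant.
async function startRun(tenantId: string, packId: string): Promise<Run> {
  const active = await findRunByTenantAndStatus(tenantId, "IN_PROGRESS");
  if (active) {
    throw new Error(`Tenant ${tenantId} already has run ${active.id} IN_PROGRESS`);
  }
  return createRun({ tenantId, packId, status: "PENDING" });
}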

How it shows up in the UI

Portal interface (/cm)

  • Run button: Starts a new assessment run
  • Progress indicator: Shows current run status (PENDING, IN_PROGRESS, COMPLETED, FAILED)
  • Results panel: Displays assessment outcomes for the most recent run
  • Run selector (if available): Switch between past runs to view historical results

Admin console (/admin)

Run History section:

  • List of all runs with status badges
  • Timestamps showing when each run started and completed
  • Click a run to see detailed information

Run Details view:

  • Timeline: Visual progression of run stages
  • Assessment summary: Counts of COMPLETE, PARTIAL, MISSING, FAILED
  • Snapshot indicator: Shows if deterministic replay is available
  • Replay button: Re-view the run's results

Trace Viewer (/admin/traces/:runId):

  • Detailed span-level view of AI operations
  • Token usage and latency metrics
  • Request/response payloads for debugging

Status indicators

Icon               Status
Gray circle        PENDING
Spinning loader    IN_PROGRESS
Green checkmark    COMPLETED
Red X              FAILED
Camera icon        Snapshot CAPTURED
Warning triangle   Snapshot FAILED

How it shows up in the API

Start a run

POST /api/runs/start
Content-Type: application/json

Response:

{
  "run": {
    "id": "run-abc-123",
    "tenantId": "tenant-xyz",
    "packId": "project-pack",
    "status": "PENDING",
    "snapshotStatus": "PENDING",
    "createdAt": "2025-01-15T10:00:00Z"
  }
}
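
A minimal client call against this endpoint, assuming a fetch-capable runtime and reusing the Run type from the earlier sketch; BASE_URL is a placeholder for your deployment, and authentication is omitted:

// Sketch: start a run via POST /api/runs/start.
// BASE_URL is a placeholder; auth headers are omitted for brevity.
const BASE_URL = "https://example.invalid";

async function startRunViaApi(): Promise<Run> {
  const res = await fetch(`${BASE_URL}/api/runs/start`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
  });
  if (!res.ok) throw new Error(`Start run failed: ${res.status}`);
  const body = (await res.json()) as { run: Run };
  return body.run;
}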

Get run details

GET /api/runs/:runId

Response:

{
  "id": "run-abc-123",
  "status": "COMPLETED",
  "snapshotStatus": "CAPTURED",
  "totalRequirements": 70,
  "assessedCount": 70,
  "completeCount": 45,
  "partialCount": 15,
  "missingCount": 10,
  "startedAt": "2025-01-15T10:00:05Z",
  "completedAt": "2025-01-15T10:05:30Z",
  "snapshotCriteriaVersionId": "crit-v2",
  "snapshotCorpusActivationId": "corpus-v1",
  "snapshotRetrievalVersionId": "retr-v3"
}
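
Because a run passes through PENDING and IN_PROGRESS before settling, a client typically polls this endpoint until the status is terminal. A sketch, reusing BASE_URL and Run from above; the polling interval is arbitrary:

// Sketch: poll GET /api/runs/:runId until the run reaches a terminal status.
async function waitForRun(runId: string, intervalMs = 5000): Promise<Run> {
  for (;;) {
    const res = await fetch(`${BASE_URL}/api/runs/${runId}`);
    if (!res.ok) throw new Error(`Get run failed: ${res.status}`);
    const run = (await res.json()) as Run;
    if (run.status === "COMPLETED" || run.status === "FAILED") return run;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}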

Get run assessments

GET /api/runs/:runId/assessments

Response:

{
  "assessments": [
    {
      "id": "asmt-001",
      "runId": "run-abc-123",
      "requirementId": "req-access-001",
      "status": "COMPLETE",
      "confidence": 0.94,
      "citations": [...],
      "reasoning": "..."
    }
  ]
}
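
The summary counts on the run (completeCount, partialCount, missingCount) can be reproduced by tallying this list. A small sketch; the Assessment shape below is illustrative, trimmed to the fields shown above:

// Illustrative assessment record, trimmed to the documented fields.
interface Assessment {
  id: string;
  runId: string;
  requirementId: string;
  status: "COMPLETE" | "PARTIAL" | "MISSING" | "FAILED";
  confidence: number;
}

// Sketch: tally assessments by status, mirroring the run's summary counts.
function countByStatus(assessments: Assessment[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const a of assessments) {
    counts[a.status] = (counts[a.status] ?? 0) + 1;
  }
  return counts;
}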

Replay a run (admin)

GET /api/admin/runs/:runId/replay

Response (with snapshot):

{
  "source": "snapshot",
  "data": {
    "assessments": [...],
    "tasks": [...],
    "exports": [...]
  }
}

Response (without snapshot, legacy run):

{
  "source": "live",
  "warning": "LEGACY run - results from live data, may differ from original",
  "data": {...}
}

Response (failed snapshot):

{
  "error": "REPLAY_UNAVAILABLE",
  "message": "Snapshot capture failed for this run. Deterministic replay is not available."
}
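
A client has to handle all three shapes. One way to model them as a discriminated union, reusing BASE_URL from the earlier sketch; the shapes follow the example payloads above:

// Sketch: the three documented replay outcomes.
type ReplayResponse =
  | { source: "snapshot"; data: unknown }
  | { source: "live"; warning: string; data: unknown }
  | { error: "REPLAY_UNAVAILABLE"; message: string };

async function replayRun(runId: string): Promise<ReplayResponse> {
  const res = await fetch(`${BASE_URL}/api/admin/runs/${runId}/replay`);
  const body = (await res.json()) as ReplayResponse;
  if ("error" in body) {
    console.warn(body.message); // FAILED snapshot: no deterministic replay
  } else if (body.source === "live") {
    console.warn(body.warning); // LEGACY run: live data may differ
  }
  return body;
}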

Flow: run lifecycle

Run created (PENDING)
        ↓
Start processing
        ↓
IN_PROGRESS
        ↓
┌─────────────────────────┐
│ For each requirement:   │
│   - Retrieve evidence   │
│   - Evaluate with LLM   │
│   - Save assessment     │
└─────────────────────────┘
        ↓
  All complete?
    ┌───┴────┐
   Yes       No (error)
    ↓         ↓
COMPLETED   FAILED
    ↓         ↓
 Capture    Set snapshot
 snapshot   status: FAILED
    ↓
Snapshot captured?
    ┌───┴───┐
   Yes      No
    ↓        ↓
 CAPTURED  FAILED
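
The legal status transitions in this flow can be captured in a small map. A sketch derived from the diagram above (not from a published schema), reusing RunStatus from the earlier sketch:

// Sketch: legal run-status transitions, derived from the diagram above.
const RUN_TRANSITIONS: Record<RunStatus, RunStatus[]> = {
  PENDING: ["IN_PROGRESS"],
  IN_PROGRESS: ["COMPLETED", "FAILED"],
  COMPLETED: [], // terminal: run results are immutable
  FAILED: [],    // terminal: start a new run instead
};

function canTransition(from: RunStatus, to: RunStatus): boolean {
  return RUN_TRANSITIONS[from].includes(to);
}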

Flow: replay decision

Replay requested for run
        ↓
Check snapshotStatus
    ┌───┴────────┐
    ↓            ↓
CAPTURED       Other
    ↓            ↓
 Return       FAILED?
snapshot    ┌───┴───────┐
  data     Yes          No (LEGACY/PENDING)
            ↓            ↓
         Return      Return live
          error     data + warning
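
The same decision expressed in code. A hedged sketch of the server-side branch, reusing the Run and ReplayResponse types from earlier sketches; loadSnapshot and queryLiveData are hypothetical helpers:

// Hypothetical data-access helpers.
declare function loadSnapshot(runId: string): Promise<unknown>;
declare function queryLiveData(runId: string): Promise<unknown>;

// Sketch: replay decision mirroring the flow above.
async function resolveReplay(run: Run): Promise<ReplayResponse> {
  if (run.snapshotStatus === "CAPTURED") {
    return { source: "snapshot", data: await loadSnapshot(run.id) };
  }
  if (run.snapshotStatus === "FAILED") {
    return {
      error: "REPLAY_UNAVAILABLE",
      message: "Snapshot capture failed for this run. Deterministic replay is not available.",
    };
  }
  // LEGACY or PENDING: fall back to live data with a warning.
  return {
    source: "live",
    warning: "LEGACY run - results from live data, may differ from original",
    data: await queryLiveData(run.id),
  };
}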

Common misconceptions

1. "Running assessment again updates the previous run"

Reality: Each time you run the assessment, the engine creates a new run. Previous runs are immutable. To see updated results, you must start a new run.

2. "Snapshots include the original documents"

Reality: Snapshots capture results (assessments, tasks, exports) and configuration IDs, not the source documents. Document content is stored separately.

3. "FAILED runs can be resumed"

Reality: A FAILED run cannot be resumed or retried. You must start a new run. The failed run remains in history for debugging.

4. "Replay re-runs the AI"

Reality: Replay retrieves stored results—it does not re-invoke the LLM. This is why snapshots enable deterministic replay without additional AI costs.

5. "All old runs have snapshots"

Reality: Runs created before the snapshot feature was implemented have LEGACY status. They can fall back to live data, but the results may differ from the original run.

6. "Snapshot capture failure means the run failed"

Reality: The run itself may have completed successfully (all assessments done), but the snapshot capture step failed. The run's assessments still exist; they're just not frozen in a snapshot.