PLAN: P14c Mastra ScoreEvent Receipt Import (2026 Q2)
- Date: 2026-04-28
- Owner: Evidence / Product
- Status: Implemented importer slice
- Scope: Turn one bounded Mastra `ScoreEvent`/`ExportedScore`-derived artifact into one portable Assay evidence receipt bundle. This is an Assay-side compiler path, not a Mastra integration, partnership, exporter, or observability sink.
1. Why this exists
P14b recut the Mastra lane away from scorer definitions and experiment wrappers toward the thinner exporter seam that Mastra maintainers pointed at.
That recut now has two useful anchors:
- an Assay-side sample around `ScoreEvent`/`ExportedScore`
- Mastra observability docs now publicly expose `ObservabilityExporter` event callbacks, including `onScoreEvent(event: ScoreEvent)`
P14c is the next narrow step. It should make the score-event lane a real Assay compiler path without claiming Mastra runtime truth, scoring truth, trace truth, or dashboard truth.
2. Positioning rule
Use this sentence when explaining the lane:
Mastra can surface score events through its observability exporter path. Assay can reduce selected score-event outcomes into portable evidence receipts.
Do not say:
- Assay integrates with Mastra
- Mastra supports Assay
- Assay verifies Mastra scores
- Assay imports Mastra observability
- Assay understands Mastra traces, scorers, or dashboards
This is one bounded evidence compiler lane over an existing upstream seam.
3. v1 input unit
P14c v1 should import one JSONL row per bounded score artifact derived from a Mastra ScoreEvent.
The input row is not a raw exporter callback payload. It is a reviewer-safe reduction over `ScoreEvent.score` / `ExportedScore`, serialized as JSONL.
Recommended input surface:
```json
{
  "schema": "mastra.score-event.export.v1",
  "framework": "mastra",
  "surface": "observability.score_event",
  "timestamp": "2026-04-28T12:00:00Z",
  "score_id_ref": "score_01h...",
  "scorer_id": "toxicity-check",
  "score": 0.98,
  "target_ref": "span_01h...",
  "score_source": "live",
  "trace_id_ref": "trace_01h...",
  "span_id_ref": "span_01h..."
}
```
The importer should accept JSONL rather than a single JSON document so the lane can later stream multiple bounded score artifacts without changing the command shape.
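Strict JSONL ingestion can be sketched as a streaming reader that yields one object per line and fails closed on anything else. The sketch makes several assumptions. First, "strict" is read as meaning that every line holds exactly one JSON object. Second, blank lines and non-finite numbers such as `NaN` are rejected. The names `iter_score_rows` and `MalformedRow` are hypothetical and not part of Assay.

```python
import io
import json
from typing import Iterator, NoReturn, TextIO


class MalformedRow(ValueError):
    """A JSONL line the importer refuses; the whole import fails closed."""


def _reject_non_finite(name: str) -> NoReturn:
    # json.loads accepts NaN/Infinity by default; this strict sketch does not.
    raise MalformedRow(f"non-finite number {name} is not allowed")


def iter_score_rows(stream: TextIO) -> Iterator[dict]:
    """Stream one parsed JSON object per line of `stream`.

    Hypothetical helper. Assumption: every line must hold exactly one JSON
    object; blank lines and non-object values (arrays, numbers) are rejected.
    """
    for lineno, line in enumerate(stream, start=1):
        text = line.rstrip("\r\n")
        if not text.strip():
            raise MalformedRow(f"line {lineno}: blank line")
        try:
            row = json.loads(text, parse_constant=_reject_non_finite)
        except ValueError as exc:  # covers JSONDecodeError and MalformedRow
            raise MalformedRow(f"line {lineno}: {exc}") from exc
        if not isinstance(row, dict):
            raise MalformedRow(f"line {lineno}: expected a JSON object")
        yield row


# Example: rows stream through one at a time; a trailing newline is fine.
sample = io.StringIO('{"score": 0.98}\n{"score": 0.5}\n')
print([row["score"] for row in iter_score_rows(sample)])  # [0.98, 0.5]
```

Because the reader is a generator, a later multi-row lane can consume rows without loading the whole file. That matches the reason for choosing JSONL over a single JSON document.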
4. Required fields
P14c should require:
- `schema = "mastra.score-event.export.v1"`
- `framework = "mastra"`
- `surface = "observability.score_event"`
- `timestamp` as RFC3339 with UTC offset
- numeric `score`
- bounded `target_ref`
The input `surface` value intentionally matches the receipt `source_surface` value to avoid contract drift between the reduced artifact and the Assay receipt.
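The required-field list above can be sketched as a validator that returns every violation rather than stopping at the first one. The sketch rests on these assumptions:

- "RFC3339 with UTC offset" is read as requiring an explicit offset (`Z` or `±HH:MM`).
- "Bounded" is modeled as a non-empty string of at most a hypothetical `MAX_REF_LEN` characters.
- `required_field_errors` is an illustrative name, not an Assay API.

```python
import math
import re
from datetime import datetime

# Constants copied from the v1 required-field list.
REQUIRED_CONSTANTS = {
    "schema": "mastra.score-event.export.v1",
    "framework": "mastra",
    "surface": "observability.score_event",
}
# Hypothetical bound: the plan does not fix a maximum reference length.
MAX_REF_LEN = 256
# Assumption: an explicit offset (Z or +HH:MM / -HH:MM) is mandatory.
_RFC3339 = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})")


def is_bounded_ref(value: object) -> bool:
    return isinstance(value, str) and 0 < len(value) <= MAX_REF_LEN


def is_rfc3339_with_offset(value: object) -> bool:
    if not isinstance(value, str) or not _RFC3339.fullmatch(value):
        return False
    try:
        # Check calendar and clock ranges on the YYYY-MM-DDTHH:MM:SS prefix.
        datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True


def required_field_errors(row: dict) -> list[str]:
    errors = []
    for key, expected in REQUIRED_CONSTANTS.items():
        if row.get(key) != expected:
            errors.append(f"{key} must equal {expected!r}")
    if not is_rfc3339_with_offset(row.get("timestamp")):
        errors.append("timestamp must be RFC3339 with an explicit offset")
    score = row.get("score")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not math.isfinite(score):
        errors.append("score must be a finite number")
    if not is_bounded_ref(row.get("target_ref")):
        errors.append("target_ref must be a bounded, non-empty string")
    return errors
```

Collecting all errors keeps a fail-closed rejection reviewable: one message can name every problem in the row.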
Preferred live-backed identity fields:
- bounded `score_id_ref`
- bounded `scorer_id`
These are the preferred canonical bounded identity fields for v1. Mastra's public docs confirm the `onScoreEvent` exporter seam, but the stronger `scoreId` / `scorerId` expectations are backed by live capture rather than by the docs. A fresh 2026-04-30 capture on `@mastra/core` 1.29.1 / `@mastra/observability` 1.10.2 shows that the currently supported `onScoreEvent` path carries both fields.
They remain optional in the released v1 reduced artifact for compatibility with older captures and fixtures. Future schema versions may make this stricter once the team is ready to intentionally break older reduced artifacts.
If score_id_ref is absent in v1, the reduced artifact still needs enough bounded context to remain reviewable: target_ref, timestamp, numeric score, and at least one scorer identity. scorer_id is strongly preferred when naturally present; scorer_name remains a compatibility fallback rather than the preferred compiler identity.
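The v1 identity rule can be sketched as follows: `scorer_id` wins over `scorer_name`, and a scorer identity becomes mandatory only when `score_id_ref` is missing. This sketch assumes that `target_ref`, `timestamp`, and the numeric `score` have already been checked by required-field validation. The helper names are hypothetical.

```python
from typing import Optional, Tuple


def _present(row: dict, key: str) -> bool:
    value = row.get(key)
    return isinstance(value, str) and value != ""


def scorer_identity(row: dict) -> Optional[Tuple[str, str]]:
    """Return (field, value) for the scorer identity the compiler should use.

    scorer_id wins whenever present; scorer_name is only a compatibility fallback.
    """
    for key in ("scorer_id", "scorer_name"):
        if _present(row, key):
            return key, row[key]
    return None


def has_reviewable_identity(row: dict) -> bool:
    """v1 rule: score_id_ref is optional, but without it a scorer identity is needed.

    Assumes target_ref, timestamp, and numeric score were already validated.
    """
    return _present(row, "score_id_ref") or scorer_identity(row) is not None
```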
Why this is stricter than the older P14b sample:
- current type/live discovery exposes `scoreId` and `scorerId`
- a real receipt importer should prefer stable bounded identity over display labels
- `scorer_name` is useful for review, but should not be the primary identity for a compiler path once `scorer_id` is available
The importer should preserve these fields when present and tests should prove they round-trip into receipts. Any later move from preferred to required should be a deliberate schema-versioned tightening, not an accidental v1 drift.
5. Optional fields
P14c may preserve these bounded fields when naturally present:
- `scorer_name`
- `scorer_version`
- `score_source`
- `reason`
- `trace_id_ref`
- `span_id_ref`
- `score_trace_id_ref`
- `target_entity_type`
- `metadata_ref`
`metadata_ref` MUST be a bounded, reviewer-safe string reference only. It is non-resolving by default: v1 defines no inline object, body expansion, URL requirement, or dereference semantics. Inline raw `metadata` or `correlationContext` objects are malformed in v1.
`trace_id_ref`, `span_id_ref`, and `score_trace_id_ref` are anchors only; they must not turn this into a trace import lane. They are optional reviewer aids and must not affect receipt validity or downstream claim semantics in v1. `score_source`, `trace_id_ref`, `span_id_ref`, and `score_trace_id_ref` are never part of the receipt's canonical identity in v1.
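The exclusion of anchor fields from receipt identity can be made concrete with a sketch that hashes a canonical JSON form of an allowlisted subset of fields. The exact identity field set and the hashing scheme are assumptions for illustration; the plan fixes only which fields must stay out. `receipt_identity` is a hypothetical name.

```python
import hashlib
import json

# Fields this plan says never contribute to receipt identity in v1.
NON_IDENTITY_FIELDS = frozenset({"score_source", "trace_id_ref", "span_id_ref", "score_trace_id_ref"})
# Assumption for this sketch: identity is drawn from these bounded fields only.
IDENTITY_FIELDS = ("score_id_ref", "scorer_id", "scorer_name", "target_ref", "timestamp", "score")

# Guard against accidentally listing an anchor field as identity.
assert NON_IDENTITY_FIELDS.isdisjoint(IDENTITY_FIELDS)


def receipt_identity(score_event: dict) -> str:
    """Hash a canonical JSON form of the identity subset (hypothetical scheme)."""
    subset = {k: score_event[k] for k in IDENTITY_FIELDS if k in score_event}
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Using an allowlist for identity, rather than "everything except the anchors", means that a future optional field cannot silently change receipt identity.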
6. v1 receipt payload
The Assay receipt should use one event per imported score artifact.
Payload schema:
```json
{
  "schema": "assay.receipt.mastra.score_event.v1",
  "source_system": "mastra",
  "source_surface": "observability.score_event",
  "source_artifact_ref": "mastra-score-events.jsonl",
  "source_artifact_digest": "sha256:...",
  "reducer_version": "assay-mastra-score-event@0.1.0",
  "imported_at": "2026-04-28T12:00:00Z",
  "score_event": {
    "score_id_ref": "score_01h...",
    "scorer_id": "toxicity-check",
    "score": 0.98,
    "target_ref": "span_01h...",
    "timestamp": "2026-04-28T12:00:00Z",
    "score_source": "live",
    "trace_id_ref": "trace_01h...",
    "span_id_ref": "span_01h..."
  }
}
```
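Reducing one already-validated input row into this payload shape could look like the sketch below. It has three assumptions. First, absent optional fields are omitted rather than written as `null`. Second, `reducer_version` is the example string from this plan, not a confirmed release. Third, `build_receipt_payload` is a hypothetical name.

```python
# Values copied from the payload example; REDUCER_VERSION is illustrative.
RECEIPT_SCHEMA = "assay.receipt.mastra.score_event.v1"
REDUCER_VERSION = "assay-mastra-score-event@0.1.0"
# Input fields carried into score_event when present (required, preferred, optional).
CARRIED_FIELDS = (
    "score_id_ref", "scorer_id", "scorer_name", "scorer_version",
    "score", "target_ref", "timestamp", "score_source", "reason",
    "trace_id_ref", "span_id_ref", "score_trace_id_ref",
    "target_entity_type", "metadata_ref",
)


def build_receipt_payload(
    row: dict,
    *,
    source_artifact_ref: str,
    source_artifact_digest: str,
    imported_at: str,
) -> dict:
    """Reduce one already-validated input row into the v1 receipt payload."""
    return {
        "schema": RECEIPT_SCHEMA,
        "source_system": row["framework"],  # "mastra"
        "source_surface": row["surface"],   # identical to the input surface by design
        "source_artifact_ref": source_artifact_ref,
        "source_artifact_digest": source_artifact_digest,
        "reducer_version": REDUCER_VERSION,
        "imported_at": imported_at,
        # Assumption: only fields actually present are carried; no nulls are written.
        "score_event": {k: row[k] for k in CARRIED_FIELDS if k in row},
    }
```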
The importer should compute source_artifact_digest over the full input JSONL file before reducing rows, following the Promptfoo/OpenFeature/CycloneDX receipt lanes.
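Computing the digest over the full file can be sketched as chunked SHA-256 hashing of the raw bytes, done before any row is parsed. The helper name is hypothetical. The `sha256:` prefix follows the payload example above.

```python
import hashlib
import io
from typing import BinaryIO


def source_artifact_digest(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Hash the exact input bytes in fixed-size chunks, before any parsing."""
    digest = hashlib.sha256()
    # iter(callable, sentinel) keeps reading until read() returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return "sha256:" + digest.hexdigest()


# Example: the chunk size does not change the result.
data = b'{"score": 0.98}\n{"score": 0.5}\n'
print(source_artifact_digest(io.BytesIO(data), chunk_size=4) == "sha256:" + hashlib.sha256(data).hexdigest())  # True
```

Hashing bytes rather than parsed rows means the digest pins the artifact exactly as supplied. For example, the same rows saved with CRLF line endings produce a different digest, which is the intended provenance behaviour.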
7. Exclusions
P14c v1 must not import:
- raw `metadata` bodies
- raw `correlationContext` bodies
- inline replacements for `metadata_ref`
- trace trees
- span payloads
- logs, metrics, or feedback events
- `addScoreToTrace(...)` legacy payloads as the primary seam
- scorer definitions
- scorer pipeline config
- prompts
- model outputs
- request or response bodies
- dashboard URLs
- experiment summaries
- score histograms or aggregate rollups
The lane is ScoreEvent-first. It is not observability-first.
Only bounded reference fields are allowed for metadata or correlation context continuity. No raw body, object expansion, or callback-envelope import is part of v1.
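One way to fail closed on the exclusions above is an allowlist that rejects any key not named in the required, preferred, or optional field lists, and also rejects any nested object or array value. The plan does not say whether the importer uses an allowlist or an explicit denylist; the allowlist here is an assumed design choice. `forbidden_field_errors` is a hypothetical name.

```python
# Every key the plan names as required, preferred, or optional (sections 4-5).
ALLOWED_KEYS = frozenset({
    # required
    "schema", "framework", "surface", "timestamp", "score", "target_ref",
    # preferred identity
    "score_id_ref", "scorer_id",
    # optional
    "scorer_name", "scorer_version", "score_source", "reason", "trace_id_ref",
    "span_id_ref", "score_trace_id_ref", "target_entity_type", "metadata_ref",
})


def forbidden_field_errors(row: dict) -> list[str]:
    """Fail closed: unknown keys and any nested object/array value are errors."""
    errors = []
    for key, value in row.items():
        if key not in ALLOWED_KEYS:
            # Catches metadata, correlationContext, prompts, spans, traces, etc.
            errors.append(f"unexpected field {key!r}")
        elif isinstance(value, (dict, list)):
            # A bounded reference must stay a scalar; no inline body expansion.
            errors.append(f"field {key!r} must be a scalar, not an inline body")
    return errors
```

An allowlist also rejects body-bearing keys that the exclusion list does not name, at the cost of requiring a schema version bump to admit any new field.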
8. What the receipt does not claim
The receipt does not mean:
- the score is correct
- the scorer is reliable
- the model output was good or bad
- the Mastra runtime behaved correctly
- the trace/span anchor is complete
- the dashboard state is true
- the score should pass or fail a gate
The receipt means only:
a selected Mastra score-event outcome was reduced into a bounded, provenance-bearing evidence receipt
9. CLI shape
Recommended command:
```shell
assay evidence import mastra-score-event \
  --input mastra-score-events.jsonl \
  --bundle-out mastra-score-receipts.tar.gz \
  --source-artifact-ref mastra-score-events.jsonl \
  --run-id mastra_score_event_import \
  --import-time 2026-04-28T12:00:00Z
```
Implementation should mirror the existing external receipt importers:
- strict streaming JSONL ingestion
- reduced score-event artifact input, not raw callback input
- full-file SHA-256 source digest
- one Assay `EvidenceEvent` receipt per score artifact
- direct `BundleWriter` output
- deterministic `--import-time` for fixtures
- fail closed on forbidden body fields
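Two of the checklist items interact. First, `--import-time` pins `imported_at` so that fixtures are reproducible. Second, failing closed is read in this sketch as all-or-nothing: every row is validated and reduced before anything would be handed to the bundle writer. The all-or-nothing reading and all names here (`resolve_import_time`, `reduce_all`, `ImportRejected`) are assumptions. The `BundleWriter` step itself is not shown.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional


class ImportRejected(Exception):
    """Raised when any row fails validation; in this sketch no bundle is written."""


def resolve_import_time(flag_value: Optional[str]) -> str:
    """Return imported_at: the --import-time value when given, else UTC now."""
    if flag_value is not None:
        return flag_value
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def reduce_all(
    rows: Iterable[dict],
    validate: Callable[[dict], list[str]],
    reducer: Callable[[dict], dict],
) -> list[dict]:
    """Validate and reduce every row before anything reaches the writer."""
    receipts: list[dict] = []
    for index, row in enumerate(rows, start=1):
        errors = validate(row)
        if errors:
            raise ImportRejected(f"row {index}: " + "; ".join(errors))
        receipts.append(reducer(row))
    return receipts
```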
10. Evidence Contract posture
The implementation PR should add an experimental registry row for:
Stable promotion requires the same governance bar as other event types:
- concrete payload section
- conformance tests
- type-specific payload invariant beyond envelope/hash validity
P14c should not add a Trust Basis claim. First prove:
- import works
- bundle verifies
- Trust Basis can read the bundle
- existing eval/decision/inventory claims remain unaffected
A later slice should decide whether score receipts remain importer-only or need a separate claim such as external_score_receipt_boundary_visible. That decision is now captured by P14d Mastra Score Receipt Trust Basis Readiness Freeze.
11. Tests
Minimum test set:
- valid score event JSONL imports into a verifiable bundle
- multiple rows produce multiple receipt events
- missing `target_ref` or `timestamp` fails closed
- present `score_id_ref` and `scorer_id` round-trip into receipts; missing `score_id_ref` remains accepted in v1 when the row still carries the minimum bounded review surface (`target_ref`, `timestamp`, numeric `score`, and at least one scorer identity)
- `score_source`, `trace_id_ref`, `span_id_ref`, and `score_trace_id_ref` do not participate in receipt identity or Trust Basis claim semantics
- non-numeric `score` fails closed
- raw `metadata` object fails closed
- raw `correlationContext` object fails closed
- `addScoreToTrace`-shaped row fails closed unless first reduced to the v1 input shape
- Trust Basis generation succeeds and keeps existing external eval, decision, and inventory receipt claims absent
12. Outward posture
Do not open a new Mastra issue for P14c.
After P14c is on main, a low-pressure heads-up may be reasonable only if there is natural context:
Small downstream follow-up for context: I added an Assay-side receipt-import plan around bounded Mastra `ScoreEvent`/`ExportedScore` artifacts. It stays outside this repo and is framed as an external evidence-consumer path over the documented exporter callback surface, not as an integration or partnership claim.
No ask. No tag. No "support" language.
13. Non-goals
P14c does not:
- implement a Mastra exporter
- run Mastra
- parse full Mastra traces
- import logs, metrics, or feedback
- define score correctness
- define scorer reliability
- add Harness recipe support
- add SARIF/JUnit projection
- add Trust Basis score-claim semantics
- make any upstream contribution to Mastra