TODO — Next Upstream Interop Lanes (2026 Q2)

Date: 2026-04-08
Owner: Evidence / Product
Status: Active queue note. The P11 commerce / trust-proof family has formally started with PLAN — P11A Visa TAP Intent Verification Evidence Interop. The adjacent Browser Use lane is live. The PLAN — P13 Langfuse Experiment Result Evidence Interop lane is sample-backed on main. P11D APS stays watchlist-only until its promote criteria are met.
Scope (this document now): Record the ranked post-Agno queue for the next upstream interop lanes, the reasons behind that ordering, and the execution rules learned from the current wave.

2026-05-03 status note: After the v3.6.0-v3.9.1 evidence-receipt line and the P14d Mastra semantic freeze, the next selected candidate is Pydantic, but only through P9c Pydantic Reduced Case-Result Receipt Readiness Freeze. P9b recut the sample around one reduced case-result artifact derived from EvaluationReport.cases[]; P9c freezes the readiness boundary before any importer-only P9d work. ReportCase remains discovery input, not the contract unit.
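As an illustration of what "reduced case-result artifact" can mean in practice, the following is a minimal sketch, not the Assay importer and not the frozen P9c contract. It assumes a report has already been serialized to a plain dict with a `cases` list. Every key name used here (`name`, `scores`, `assertions`, `case_name`, and the dropped `inputs` / `task_duration`) is a hypothetical placeholder, not a statement about the Pydantic Evals schema.

```python
from typing import Any


def reduce_case(case: dict[str, Any]) -> dict[str, Any]:
    """Project one discovery-shaped case onto a small, fixed set of keys.

    All key names are hypothetical; anything not listed here is dropped,
    so the richer case object never becomes the contract unit.
    """
    return {
        "case_name": str(case.get("name", "")),
        "scores": {k: float(v) for k, v in sorted(case.get("scores", {}).items())},
        "assertions": {k: bool(v) for k, v in sorted(case.get("assertions", {}).items())},
    }


def reduce_report(report: dict[str, Any]) -> list[dict[str, Any]]:
    """Emit one reduced artifact per entry in report["cases"]."""
    return [reduce_case(case) for case in report.get("cases", [])]


# Invented sample input, for illustration only.
report = {
    "name": "demo",
    "cases": [
        {
            "name": "case-1",
            "inputs": {"q": "2+2"},      # dropped by the reduction
            "scores": {"accuracy": 1},
            "assertions": {"is_int": True},
            "task_duration": 0.12,       # dropped by the reduction
        }
    ],
}
reduced = reduce_report(report)
```

The point of the projection is that the output shape is decided by the reducer, not by whatever the upstream case object happens to carry.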

1. Why this queue exists

After the current framework, protocol, runtime-accounting, and eval-report wave, Assay now has enough signal to rank the next outreach candidates more strictly.

The working filter on 2026-04-08 was:

current repo momentum
one official seam already documented upstream
a natural maintainer channel
a high chance that sample-first outreach will read as a technical boundary question, not as promotion

This note is the queue record for that ranking and the resulting execution order.

It is not an implementation PR.

It is not an outward post.

It is not a commitment to run every candidate below immediately.

2. What changed after the current wave

The current wave produced a few useful operating rules:

GitHub-native outreach works better than forum-first outreach. The LangGraph moderation hold is the clearest cautionary example of the forum-first route.
The strongest lanes are the ones where upstream already documents one small official seam.
Trace-first is no longer the default choice just because a repo exposes observability.
Platform-adjacent tools require a different posture from framework repos. Import/export slices are usually safer than leading with a generic observability pitch.

Those lessons now drive the queue below.

3. Overall priority list

Tier 0 — first finish what is already live

Rank	Lane	Status	Why it stays first
0	Current open wave	Active	Let the merged samples, live threads, and held-back UCP lane settle before forcing new surface area

Tier 0 means:

no outward UCP post
no extra pushes on cold threads
keep current live lanes breathing unless an upstream maintainer responds

Tier 1 — selected active candidate

Rank	Repo / lane	Status	Primary channel	First seam	Why it ranks here
1	`pydantic/pydantic-ai` / Pydantic Evals	Active readiness freeze	Existing GitHub issue	reduced case-result artifact derived from `EvaluationReport.cases[]`	Best current fit after the v3.6.0-v3.9.1 receipt line and P14d: visible name, code-first eval surface, existing Assay sample, and a small enough reduced artifact to keep `ReportCase` out of the contract boundary

Tier 2 — parked platform-adjacent candidate

Rank	Repo / lane	Status	Primary channel	First seam	Why it ranks here
2	`langfuse/langfuse`	Parked after sample-backed planning	GitHub Discussion (`Support`)	bounded experiment item result / evaluation export	Still useful, but lower immediate fit than Pydantic because it pulls harder toward platform/evaluation-session framing

Tier 2b — completed importer-only score lane, claim semantics frozen

Rank	Repo / lane	Status	Primary channel	First seam	Why it ranks here
2b	`mastra-ai/mastra`	Importer-only lane implemented; Trust Basis semantics frozen	GitHub issue	`ScoreEvent` / `ExportedScore`	Maintainer-guided seam with shipped `scoreId`; no further upstream action needed unless Mastra changes the ScoreEvent/ExportedScore contract

Tier 3 — special-case OTel-native candidate

Rank	Repo / lane	Status	Primary channel	First seam	Why it ranks here
3	`openlit/openlit`	Watchlist	GitHub Discussion	eval/export or bounded score record export	Worth keeping as the main OTel-native special case, but still not the best general next lane

Tier 4 — later frontier and heavier infra lanes

Rank	Repo / lane	Status	Primary channel	First seam	Why it ranks here
4	`P11B` — x402	Queued	publish / integrate first	payment lifecycle evidence	Technically interesting, but the repo currently has no Issues or Discussions and the semantics are much riskier than TAP
5	`P11C` — Identus	Watchlist	GitHub Discussion	credential / delegation proof	Interesting, but heavier and more infrastructural than the next eval/export lane
6	`P11D` — APS	Watchlist (promote-only)	GitHub issue in `aeoess/agent-passport-system` only	signed authorization artifact / receipt at most	Serious adjacent protocol work, but still too semantically heavy to open as an active lane before a clearly smaller external-facing artifact is confirmed

`P11D` promote rule

Treat APS as a roadmap watchlist candidate, not an active lane, unless all of the following are true:

A minimal external-facing artifact is confirmed publicly in APS materials or in the APS repo itself, for example a signed authorization artifact / receipt that clearly sits below the rest of the protocol stack.
The seam can be modeled without pulling in passports, delegation lattice, reputation, or governance primitives.
The sample can stay strictly at permit, deny, and malformed.
The conversation happens in the APS repo itself, not by continuing the A2A thread or another third-party ecosystem thread.
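Because the promote rule is an all-of condition, it can be written down as a single predicate. The sketch below is only an illustration of that logic: the class name, field names, and the example values are hypothetical and do not describe the actual state of the APS lane.

```python
from dataclasses import dataclass

# The only sample outcomes the rule allows.
ALLOWED_OUTCOMES = frozenset({"permit", "deny", "malformed"})


@dataclass(frozen=True)
class ApsPromoteCheck:
    """One hypothetical field per promote criterion."""

    minimal_artifact_confirmed_publicly: bool
    seam_excludes_heavy_primitives: bool  # no passports, delegation lattice, reputation, governance
    sample_outcomes: frozenset[str]
    conversation_in_aps_repo: bool


def may_promote(check: ApsPromoteCheck) -> bool:
    """Return True only if every criterion holds."""
    return (
        check.minimal_artifact_confirmed_publicly
        and check.seam_excludes_heavy_primitives
        and bool(check.sample_outcomes)
        and check.sample_outcomes <= ALLOWED_OUTCOMES
        and check.conversation_in_aps_repo
    )


# Invented values, for illustration only.
example = ApsPromoteCheck(
    minimal_artifact_confirmed_publicly=False,
    seam_excludes_heavy_primitives=True,
    sample_outcomes=frozenset({"permit", "deny", "malformed"}),
    conversation_in_aps_repo=True,
)
print(may_promote(example))  # False: one unmet criterion keeps the lane on the watchlist
```

Modeling the sample outcomes as a set makes the "strictly at permit, deny, and malformed" criterion a subset check, so any extra outcome fails the rule.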

Tier 5 — still lower fit

Rank	Repo / lane	Status	Primary channel	First seam	Why it ranks here
7	`livekit/agents`	Watchlist	issue or discussion only if a small hook surface becomes clear	metrics / event hook evidence at most	Lower fit because the public seam is much more deployment and observability heavy than artifact-first
8	`microsoft/autogen`	Deprioritized	GitHub Discussion	n/a	Keep low because the repo is explicitly in maintenance mode

4. Historical note on Agno

At the time of discovery, agno-agi/agno ranked first among the same-space framework candidates because:

Discussions were enabled
Evals and Tracing were clearly separated in the docs
AccuracyEval was a cleaner first seam than another trace-export sample

That choice has now already been executed in the current wave. The formal plan is PLAN — P10 Agno Accuracy Eval Evidence Interop.

That lane is already in motion, so the queue no longer starts with Agno, even though Agno ranked as the strongest general-purpose choice at discovery time.

2026-05-09 addendum: agno-agi/agno PR #7782 also opened a separate ToolAuditHook JSONL fixture seam. Keep that as a post-merge watchlist / probe-only candidate under the P10 plan. It must not become a new public receipt family, an Agno-specific Harness branch, or a broader tracing / AgentOS lane by default.

5. Historical note on Browser Use

browser-use/browser-use was not the highest strategic priority overall, but it was the right adjacent lane to finish before opening the next platform lane because:

the planning slice was already in progress
the seam is clean and materially different from the current wave
it can be finished without opening the heavier P11A commerce branch yet
it preserves one-lane-at-a-time discipline better than pivoting mid-plan

The critical Browser Use lesson is that the best seam is not observability.

The docs expose:

AgentHistoryList
action_history()
final_result()
errors()
structured_output

At the same time, Browser Use also documents Laminar, OpenLIT, and telemetry. That broader observability layer is exactly what Assay should avoid as the first wedge.
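A history/output-first wedge can be sketched as a small reduction over the documented history surface. The example below is a hedged illustration, not the P12 sample: `HistoryLike` and `FakeHistory` are hypothetical stand-ins that only borrow the method names listed above, and the return shapes (a list of actions, a list of per-step errors with `None` for clean steps) are assumptions rather than confirmed Browser Use behavior.

```python
from typing import Any, Protocol


class HistoryLike(Protocol):
    """Stand-in for the documented history surface; return types are assumed."""

    def action_history(self) -> list[Any]: ...
    def final_result(self) -> Any: ...
    def errors(self) -> list[Any]: ...


def reduce_history(history: HistoryLike) -> dict[str, Any]:
    """Keep only a bounded result record; no traces, spans, or telemetry."""
    real_errors = [e for e in history.errors() if e is not None]
    return {
        "final_result": history.final_result(),
        "action_count": len(history.action_history()),
        "error_count": len(real_errors),
    }


class FakeHistory:
    """Test double so the sketch runs without Browser Use installed."""

    def action_history(self) -> list[Any]:
        return [{"click": 1}, {"done": True}]

    def final_result(self) -> Any:
        return "ok"

    def errors(self) -> list[Any]:
        return [None, None]


record = reduce_history(FakeHistory())
```

The record deliberately carries counts and the final result only, which keeps the first sample on the output/history side of the boundary.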

That lane is now live. The formal Browser Use plan lives in PLAN — P12 Browser Use History / Output Evidence Interop.

6. Historical note on `P11A`

The P11A Visa TAP lane was ranked above Browser Use in the broader frontier ordering because it had stronger protocol value:

verification-first rather than platform-first
cryptographic and protocol-adjacent enough to fit Assay's trust-compiler direction closely
strategically different from another framework or eval lane

The formal frontier plan now lives in PLAN — P11A Visa TAP Intent Verification Evidence Interop.

That lane is now live too, so the queue no longer needs to choose between Browser Use and P11A as the next move.

7. Why Langfuse is now the next planned lane

langfuse/langfuse is now the next best planned lane because the two lanes that previously sat ahead of it in execution order are already in motion.

Why it now moves up:

strong repo momentum
Discussions enabled with an answerable Support category
official eval docs around datasets, experiments, and scores
API-first and export-friendly positioning
a seam that is different from Browser Use history/output and TAP verification

Why it is still socially harder than the earlier framework lanes:

Langfuse already positions itself as a broad LLM engineering platform with observability, datasets, scores, and experiments.
That makes the seam real, but it also makes the outreach socially riskier, because Assay can be read as another platform talking to a platform.

The right posture there is:

export/import sample first
bounded experiment-result seam first
Support Discussion only after the sample lands
no trace-first framing

The formal Langfuse plan now lives in PLAN — P13 Langfuse Experiment Result Evidence Interop.

8. Why Mastra stays below the top four

mastra-ai/mastra is slightly lower in strategic weight than Langfuse, but in some respects cleaner to approach because it has less platform-on-platform friction.

The main reason it stays below the top four in this queue is channel shape:

no Discussions
outward route is issue-first

That means the lane may eventually be easier socially, but it is less natural as the next GitHub-native sample-backed question.

9. Sequencing rules

The queue should be executed under the same discipline as the current wave:

one repo at a time
sample first
one small outward move only after the sample lands on main
no second seam in the first sample
no observability-first pitch when a smaller result artifact exists

Additional queue rules:

reserve P11 for the commerce / trust-proof family
Browser Use should stay output/history-first, not Laminar/OpenLIT-first
P11A should stay verification-first, not payment-truth-first
P11D should remain watchlist-only unless a signed authorization artifact can be kept below passports, delegation, reputation, and governance
Langfuse should stay experiment-result-first, not trace-first
Mastra should stay eval-result-first, not tracing-first
OpenLIT should remain a special-case OTel-native candidate, not the default next lane

10. Next actions

P13 is now plan-backed and sample-backed on main; treat its outward move as the next deliberate platform-adjacent decision, not an automatic rush.
Let the fresh Browser Use and Visa TAP outward threads breathe unless an upstream maintainer responds.
Keep P11D APS as watchlist-only unless its promote criteria are met in aeoess/agent-passport-system.
Keep Mastra as the main fallback if the Langfuse positioning risk still looks too high when the decision to open the outward Langfuse thread comes up.