June 12, 2026•2 min read•from Machine Learning

Scholialang: an open, vendor-neutral protocol for structured AI agent reasoning traces [R]

Our take

Doug Fir Labs has introduced Scholialang, an open, vendor-neutral protocol that structures AI agent reasoning traces instead of leaving them in unstructured chat logs. Agents record their reasoning with a typed vocabulary and stable IDs, so evidence and conclusions can be retrieved reliably regardless of which model produced them. Small early pilots report cross-model replay agreement in 135/135 cases and Session-5 input-token reductions of roughly 30–41%, and a 4-arm quality evaluation found that structured framing restored answer quality to baseline parity after context tooling alone had lowered it. The authors describe all of these results as preliminary. The specification and code are available at scholialang.org, where contributions are welcome.

AI agents are moving quickly beyond simple chat interfaces, and they need ways to manage the complex reasoning behind their actions. Doug Fir Labs' open-sourcing of Scholialang addresses a real pain point in this shift: the lack of structured, inspectable records of agent reasoning. As agents take on multi-step tasks (reading files, running tools, making decisions), their internal logic often gets lost in freeform chat transcripts, making it hard to understand *why* a particular decision was reached or to reuse that reasoning reliably later. The challenge is amplified by the proliferation of different language models and agent integrations, such as Pinecone bringing AI agents directly to enterprise data through its Microsoft OneLake integration and Angular's official Agent Skills helping AI coding tools write modern Angular, all of which rely on increasingly sophisticated agent capabilities. Scholialang's approach of a typed vocabulary, stable IDs, and explicit references offers a promising path towards greater transparency and reusability.

Scholialang's design is particularly compelling because it prioritizes interoperability. By defining a standardized protocol, it aims to free reasoning records from any single model, such as Claude or Codex, so they can be shared and reused across platforms. Early results, while preliminary, are encouraging. Fresh sessions from three model families (Opus, Fable, GPT) re-derived the original decision from traces with the final decision stripped, which points to "cross-model replay" as a step towards more robust and portable agent systems, although the authors caution that cold-start baselines were already high on two of the three models. The reported token savings, roughly 30–41% fewer Session-5 input tokens, are a practical benefit beyond reasoning traceability. In the quality safety evaluation, adding context tooling alone lowered answer quality relative to a bare baseline, and adding structured framing on top restored it to baseline parity; the sample was small and the result (p≈0.07) is suggestive rather than significant. The authors explicitly do not claim that structure makes models smarter. Slack's recent modernization of its data platform, which eliminated SSH in EMR pipelines, provides a parallel example of how refactoring infrastructure around standardized protocols can yield significant performance and maintainability gains.

The open-source nature of Scholialang, coupled with its focus on practicality (MIT/Apache license, PyPI packages, MCP/LSP servers), suggests a deliberate effort to foster community adoption and contribution. The call for critique, particularly concerning the vocabulary, canonical ID semantics, and potential interoperability with standards like OpenTelemetry, is a smart move, demonstrating a commitment to iterative improvement. The project’s creators are wisely avoiding hyperbolic claims of “revolutionary” breakthroughs, instead focusing on a pragmatic approach to addressing a genuine challenge. This measured tone is refreshing in a field often prone to overblown pronouncements. The content-addressed DAG registry and “lazy preludes” further enhance its usability, enabling efficient retrieval of prior reasoning without the need to replay entire transcripts – a crucial optimization for long-running agent workflows.
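To make the registry idea concrete, here is a minimal sketch of a content-addressed store in which a later session pulls only the reasoning a given conclusion depends on. It is an illustration under stated assumptions, not Scholialang's actual API: the `Registry` class, its `put` and `prelude` methods, and the record fields are all hypothetical names.

```python
import hashlib
import json


class Registry:
    """Hypothetical in-memory content-addressed store of reasoning atoms."""

    def __init__(self):
        self._atoms = {}

    def put(self, kind, text, refs=()):
        record = {"kind": kind, "text": text, "refs": list(refs)}
        # Refs must already be stored. Because a key is the hash of the
        # record (refs included), an atom can never reference itself or a
        # later atom, so the stored graph is always acyclic.
        for ref in record["refs"]:
            if ref not in self._atoms:
                raise KeyError(f"unknown ref {ref}")
        blob = json.dumps(record, sort_keys=True, separators=(",", ":"))
        key = hashlib.sha256(blob.encode("utf-8")).hexdigest()
        self._atoms[key] = record
        return key

    def prelude(self, root):
        """Return root plus everything it transitively references, dependencies first."""
        ordered, visited = [], set()

        def visit(key):
            if key in visited:
                return
            visited.add(key)
            for ref in self._atoms[key]["refs"]:
                visit(ref)
            ordered.append((key, self._atoms[key]))

        visit(root)
        return ordered
```

Calling `prelude` on a decision's hash returns only that decision's ancestors, so unrelated atoms from the same session are left out. That is the intuition behind fetching prior reasoning by hash rather than replaying a whole transcript.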

Looking ahead, the success of Scholialang will hinge on its ability to gain traction within the broader agent ecosystem. Will it become *the* standard for structured reasoning traces, or will competing protocols emerge? The interoperability considerations are paramount; seamless integration with existing tracing formats like OpenTelemetry will be vital for widespread adoption. The ongoing evolution of language models themselves – with increasingly sophisticated reasoning capabilities – will also shape the future of Scholialang. As agents become more complex, the need for robust, standardized methods of understanding and managing their internal logic will only grow more acute. The question now is whether Scholialang can successfully establish itself as the foundational layer upon which the next generation of AI agents is built.

Our new startup (Doug Fir Labs) just open-sourced Scholialang, a protocol for turning an agent's reasoning into structured, inspectable, reusable records instead of leaving it buried in a chat transcript.

The problem: when an agent does multi-step work — reads files, runs tools, makes decisions — the actual reasoning ends up as freeform prose in a log. A later session (or a different model) can't reliably pull "the evidence that supported decision X" back out without re-parsing English, and there's no stable way to reference a prior conclusion.

Scholialang gives agents a small typed vocabulary — Goal, Observation, Evidence, Finding, Deciding, Action, Contradiction, Retract, Concluding, etc. — with stable content-hash IDs, explicit references between atoms, and validator rules. v0.6 adds a content-addressed DAG registry and "lazy preludes" so a later session can pull prior reasoning by hash instead of replaying the whole transcript. Same atom format whether it's emitted by Claude, Codex, or a local model.
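As a rough illustration of the three ingredients described above (typed atoms, content-hash IDs, and a validator rule for references), here is a minimal Python sketch. It does not use the Scholialang packages: the `Atom` class, the `atom_id` property, the 16-character truncated SHA-256, and the single validator rule are assumptions made for the example, and the real canonical_id semantics may differ.

```python
import hashlib
import json
from dataclasses import dataclass

# Vocabulary names taken from the post; the post's list ends with "etc.",
# so this set is not the complete official vocabulary.
KINDS = {"Goal", "Observation", "Evidence", "Finding", "Deciding",
         "Action", "Contradiction", "Retract", "Concluding"}


@dataclass(frozen=True)
class Atom:
    kind: str
    text: str
    refs: tuple = ()  # IDs of earlier atoms this one depends on

    @property
    def atom_id(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so equal content
        # always hashes to the same ID, whichever model emitted it.
        payload = json.dumps(
            {"kind": self.kind, "text": self.text, "refs": list(self.refs)},
            sort_keys=True, separators=(",", ":"),
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def validate(trace):
    """Return a list of problems: unknown kinds or refs to atoms not yet seen."""
    problems, seen = [], set()
    for atom in trace:
        if atom.kind not in KINDS:
            problems.append(f"unknown kind: {atom.kind}")
        for ref in atom.refs:
            if ref not in seen:
                problems.append(f"{atom.kind} references unknown atom {ref}")
        seen.add(atom.atom_id)
    return problems


# Usage: a three-atom trace in which each atom cites the previous one.
obs = Atom("Observation", "tests fail on Python 3.12")
ev = Atom("Evidence", "traceback points at a removed import", refs=(obs.atom_id,))
dec = Atom("Deciding", "replace the removed import", refs=(ev.atom_id,))
assert validate([obs, ev, dec]) == []
```

Because the ID is derived from the content, two sessions that emit the same atom get the same ID without coordinating, which is what lets a later session refer to "decision X" without re-parsing prose.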

Early results — all small pilots, not final benchmarks, pushback welcome:

- Cross-model replay: gave fresh sessions from three model families (Opus 4.8, Fable 5, GPT-5.5/Codex) a trace with the final decision stripped; they re-derived the original decision in 135/135 cases. Caveat: convergent task set and cold-start baselines were already high on two of three models, so read it as a portability signal, not "beats transcripts."
- Token cost: carrying a compact reasoning prelude instead of full history cut Session-5 input tokens ~30–41% with quality flat in the gated arms (a max-compression mode reaches ~50% but trades a little quality).
- Quality safety: in a 4-arm eval, adding context tooling alone actually lowered answer quality vs a bare baseline; adding the structured framing on top repaired it back to baseline parity. Small n, p≈0.07 — suggestive, not significant. We're explicitly not claiming structure makes models smarter.
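For readers who want to try something like the cross-model replay check on their own traces, the procedure can be sketched roughly as below. This is a guess at the shape of the experiment, not the authors' harness: traces are plain dicts, `rederive` is a hypothetical stand-in for a call to a fresh model session, treating `Deciding` and `Concluding` atoms as "the decision" is an assumption, and exact string matching is a deliberately crude scoring rule (the post does not say how agreement was judged).

```python
def strip_decision(trace, decision_kinds=("Deciding", "Concluding")):
    """Split a trace into (context, held_out) at the last decision atom.

    Everything from the final decision onward is withheld from the context.
    """
    for i in range(len(trace) - 1, -1, -1):
        if trace[i]["kind"] in decision_kinds:
            return trace[:i], trace[i]
    raise ValueError("trace has no decision atom")


def replay_agreement(traces, rederive):
    """Fraction of traces where rederive(context) matches the held-out decision."""
    hits = 0
    for trace in traces:
        context, held_out = strip_decision(trace)
        answer = rederive(context)
        # Assumption: case-insensitive exact match counts as agreement.
        if answer.strip().lower() == held_out["text"].strip().lower():
            hits += 1
    return hits / len(traces)


# Usage with a stub in place of a real model call.
trace = [
    {"kind": "Observation", "text": "cache hit rate dropped after deploy"},
    {"kind": "Evidence", "text": "new key format bypasses the cache"},
    {"kind": "Deciding", "text": "revert the key format"},
]
score = replay_agreement([trace], lambda context: "Revert the key format")
```

As the caveat in the first bullet suggests, a high score from this kind of check is only informative when compared against a cold-start baseline in which the model sees the task without the trace.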

Code is MIT/Apache, spec is CC-BY, packages are on PyPI, and there are MCP + LSP servers with host recipes for Claude Code / Codex / Ollama.

Would genuinely value critique from people building agent systems or local tooling — especially on the vocabulary, the canonical_id semantics, and whether this should interoperate with OpenTelemetry / existing trace formats instead of being its own thing.

Spec + code: https://scholialang.org · https://github.com/dougfirlabs

submitted by /u/dawebr