Scientific software is changing shape.
For a long time, the model of scientific computing was straightforward. We had databases for retrieval, notebooks for analysis, papers for publication, and documents for coordination. Even as machine learning entered the workflow, most tools still treated intelligence as something that happened at the edge: a better search bar, a smarter summarizer, a more fluent assistant.
That is no longer the world we are entering.
Today, models are beginning to participate in scientific work more directly. They read across literatures, extract structured findings, compare conflicting results, propose hypotheses, draft experimental directions, and coordinate with other models to explore a problem from multiple angles. The most interesting systems are no longer just answering questions. They are joining workflows.[1] In synthesis planning, materials characterization, drug discovery, and closed-loop laboratory systems, models are beginning to participate not just in reading the literature but in designing and evaluating the experiments themselves.[3]
This changes the design problem.
A document is a remarkable artifact. It is durable, legible, portable, and deeply human. But it is not a particularly good unit of coordination for a world in which humans and machines are continuously reading, revising, evaluating, and acting on knowledge together. That is not because documents are obsolete. It is because documents are too coarse.
A paragraph can contain a guess, a formal claim, a provisional interpretation, a methodological caveat, and an actionable recommendation all at once. To a human reader, those differences are often implicit. To a machine collaborator, they need to be made visible. And once multiple agents are involved, the need becomes sharper: not just what was said, but how formed it is, how grounded it is, and whether it is ready to guide action. That is the kind of state ordinary documents tend to collapse. A reasonable inference from current multi-agent research systems is that the coordination problem is no longer just retrieval. It is representation, handoff, and control.[2]
Consider a synthesis lab notebook entry. In a single paragraph, a chemist may record a reaction pathway, note the yield, speculate about why a different catalyst might perform better, flag an assay artifact, and recommend a follow-up run. That entry is a document. But it contains at least five different epistemic objects — a structured result, an observation, a hypothesis, a caveat, and a proposed action — all collapsed into prose. A collaborator reading it, human or machine, has to reconstruct those distinctions from context. Most of the time, nobody does.
At Fylo, we have been exploring a simple framework for this problem.
Not a grand theory of everything. Not an ontology for its own sake. A working geometry for scientific thought.
The idea is simple: much of knowledge work can be understood across three dimensions.
The first is claim.
Some statements begin as fragments. A note in the margin. A possible relation. A pattern someone thinks might be there. Other statements are more formed: typed, structured, formal enough to be compared, queried, or tested. Claim, in this sense, runs from the ambiguous to the structured. A reaction mechanism scribbled on a whiteboard during a group meeting and a validated synthesis protocol filed in a LIMS are both claims, but they occupy opposite ends of this axis.
The second is belief.
Some things are merely possible. Others are weakly supported. Others are well grounded in evidence, replication, mechanism, or convergence from multiple sources. Belief runs from the tentative to the grounded. A single promising yield from one run under one set of conditions sits near the tentative end. The same result, replicated across labs, catalysts, and temperatures, sits near the grounded end.
The third is action.
Some knowledge is something we inspect. Some knowledge is something we use to intervene: to trigger a workflow, prioritize an experiment, recommend a protocol, allocate attention, or hand work from one system to another. Action runs from observation to intervention. Reviewing a pathway in a planning meeting is observation. Dispatching that pathway to an automated synthesis platform for execution is intervention. The distance between them is the action axis.
Taken together, these three axes define a space. A point in that space is not just a piece of content. It is a posture.
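One way to make the idea of a posture concrete is to treat it as coordinates. Below is a minimal sketch in Python, assuming (purely for illustration) that each axis is normalized to a value in [0, 1]; `Posture` and its field names are hypothetical, not Fylo's actual data model:

```python
from dataclasses import dataclass

# Hypothetical sketch: each axis as a value in [0.0, 1.0].
# 0.0 = ambiguous / tentative / observational;
# 1.0 = structured / grounded / interventional.
@dataclass(frozen=True)
class Posture:
    claim: float   # how formed the statement is
    belief: float  # how grounded it is
    action: float  # how close it is to intervention

whiteboard_hunch = Posture(claim=0.2, belief=0.1, action=0.0)
validated_protocol = Posture(claim=0.9, belief=0.8, action=0.7)

def distance(a: Posture, b: Posture) -> float:
    """Euclidean distance in the claim/belief/action space."""
    return ((a.claim - b.claim) ** 2
            + (a.belief - b.belief) ** 2
            + (a.action - b.action) ** 2) ** 0.5
```

The payoff of even this toy encoding is that "a hunch and a protocol are not neighbors" stops being a metaphor: the two example points above are far apart in the space, and software can act on that distance.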
A hunch in a lab meeting and a validated protocol may both live in the same research environment, but they do not occupy the same position in this geometry. One is ambiguous, tentative, and observational. The other is structured, grounded, and action-bearing. Most software treats them as neighbors because they are both text. We think that is a mistake.
What matters is not just storing knowledge. It is locating it.
Once this space becomes visible, three important surfaces appear.
The first is the Meaning Plane: where claim and belief meet.
This is the surface where language becomes interpretation. What exactly is being said? How crisp is the claim? What evidence supports it? How much ambiguity remains? This is where extraction, synthesis, contradiction, and explanation live. A system working on this plane is not yet deciding what to do. It is trying to make meaning legible.
The second is the Protocol Plane: where claim and action meet.
This is the surface where representation becomes executable. Is a statement expressed clearly enough to guide a workflow? Can it be translated into a query, a filtering rule, an experiment template, a protocol suggestion, or a machine-readable instruction? This is where scientific language begins to cross the boundary into operations.
This plane is where the automation question lives. A growing number of laboratories are moving toward closed-loop experimental systems — platforms where a model proposes a synthesis route, a robotic system executes it, and the outcome feeds back into the next round of planning.[4] For those systems to work, chemistry cannot remain in natural language. Reaction pathways, building-block specifications, catalyst conditions, and expected yields need to be expressed in a form that is both human-auditable and machine-executable. The Protocol Plane is the surface where that translation happens — or fails to.
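To make the translation concrete, here is a minimal sketch of what a machine-auditable route might look like. The field names and the executability check are assumptions for illustration, not any lab platform's real schema:

```python
from dataclasses import dataclass, field

# Hypothetical schema: none of these names come from a real Fylo
# or laboratory-automation API.
@dataclass
class ReactionStep:
    reagents: list[str]
    catalyst: str
    temperature_c: float
    expected_yield: float  # fraction in [0, 1]

@dataclass
class SynthesisRoute:
    target: str
    steps: list[ReactionStep] = field(default_factory=list)

    def is_executable(self) -> bool:
        """A route crosses the Protocol Plane only if every step is
        fully specified: no missing reagents, no missing conditions."""
        return bool(self.steps) and all(
            step.reagents and step.catalyst for step in self.steps
        )
```

The point of the check is the failure case: a route described only in prose, or with a step left vague, never reaches the robotic system, and the gap is visible rather than silent.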
The third is the Control Plane: where belief and action meet.
This is the surface where confidence becomes permission. Given what we know, who should act? Should the system only surface a possibility? Should it recommend a next step? Should it automate something narrow? Should it wait for review? This is where trust, oversight, and delegation become design problems.
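The questions above can be read as a policy: grounding gates permission. A sketch, with thresholds and tier names that are illustrative assumptions rather than a prescription:

```python
# Hypothetical control-plane policy: belief (how grounded a claim is,
# normalized to [0, 1]) determines the strongest action the system may
# take without human review. Thresholds are illustrative only.
def permitted_action(belief: float) -> str:
    if belief < 0.3:
        return "surface"    # show the possibility, nothing more
    if belief < 0.6:
        return "recommend"  # propose a next step for human review
    if belief < 0.9:
        return "queue"      # stage a narrow automation, await sign-off
    return "automate"       # act within a pre-authorized scope
```

What matters is not the particular cutoffs but that the mapping is explicit, inspectable, and adjustable, rather than buried in a prompt.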
In practice, most experimental science does not live in any of these planes yet. It lives in fragments. Lab notebooks. Spreadsheets. Local files. Slide decks passed between group meetings. Each chemist, each biologist, each materials scientist maintains their data differently. There is rarely a single source of truth for what has been tried, what worked, what failed, and why. The geometry we are describing is not an overlay on a well-organized landscape. It is infrastructure for a landscape that does not yet have coordinates.
These planes matter because modern scientific workflows are no longer linear.
A literature-review system may decompose a question into multiple sub-questions. One model may extract claims from papers. Another may compare them against prior work. Another may score uncertainty or identify contradictions. Another may suggest a next experiment. A human researcher may then redirect the search, reject a proposed synthesis, or promote a tentative result into a concrete plan.
Across this chain, the failure mode is not always intelligence in the narrow sense. Often, it is illegibility.
One agent returns a paragraph when what is needed is a structured claim. Another returns a confidence estimate without showing what grounded it. Another converts an interesting result into a recommendation before anyone has defined the threshold for intervention. The workflow does not break because nobody can generate text. It breaks because the handoffs are semantically blurry.
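A blurry handoff can be made sharp with a small record type that carries the claim, its grounding, and its action status together. A sketch, with hypothetical field names:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical handoff record: instead of passing a paragraph between
# agents, pass the claim with its grounding and status attached.
@dataclass
class Handoff:
    statement: str               # the claim itself
    sources: list                # what grounded it (DOIs, run IDs, ...)
    confidence: float            # how grounded, in [0, 1]
    proposed_action: Optional[str] = None  # proposed, not authorized
    authorized: bool = False

def ready_for_intervention(h: Handoff) -> bool:
    """A downstream agent may act only on explicit authorization,
    never on confidence alone."""
    return h.proposed_action is not None and h.authorized
```

Each of the failure modes above corresponds to a missing field: a paragraph without `statement` structure, a `confidence` without `sources`, a `proposed_action` treated as if `authorized` were true.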
This is why the framework matters.
It gives humans and machines a shared way to express where a piece of knowledge stands.
A claim can become more structured without pretending it is already certain. A belief can become more grounded without being silently converted into action. An action can be proposed without implying it is authorized. In other words, the geometry separates three things that are too often conflated: what something says, why we should take it seriously, and what anyone is allowed to do with it.
For science, that separation is not cosmetic. It is foundational.
Scientific work is full of partials. Partial evidence. Partial agreement. Partial models. Partial protocols. Partial replication. A synthesis run that produced an unexpected byproduct. A catalyst that performed well in one solvent system but not another. A yield variation no one can yet explain. A pathway that three labs tried and one abandoned. The point is not to eliminate this unfinished state. The point is to represent it clearly enough that humans and machines can work inside it together.
That changes what software can do.
A graph is no longer just a visualization of papers. It becomes a medium for expressing movement between ambiguity and structure, between speculation and grounding, between review and intervention.
A note is no longer just a note. It can become a claim candidate, linked to evidence, attached to provenance, revised by a model, challenged by a collaborator, and promoted into a workflow only when the right controls are met.
A multi-agent system is no longer just a bundle of prompts. It becomes a set of specialized processes operating across a shared epistemic environment: one agent parsing, another comparing, another ranking, another proposing, while the human retains the ability to inspect, redirect, and authorize.
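The shape of such a system can be sketched as specialized functions over a shared store of claim records. Everything here is illustrative: the agent names, the record fields, and the scoring heuristic are assumptions, not a real pipeline:

```python
# Hypothetical sketch: agents as pure functions over a shared list of
# claim records. The length-based score is a placeholder heuristic.
def parse(texts):
    # turn raw text into minimal claim records
    return [{"claim": t, "score": None, "proposed": False} for t in texts]

def rank(store):
    # score each claim (placeholder: longer claims score higher)
    for rec in store:
        rec["score"] = min(1.0, len(rec["claim"]) / 100)
    return store

def propose(store):
    # mark high-scoring claims as proposed, never as authorized
    for rec in store:
        rec["proposed"] = rec["score"] is not None and rec["score"] > 0.5
    return store

def run_pipeline(texts):
    store = parse(texts)
    for agent in (rank, propose):
        store = agent(store)
    return store  # a human inspects, redirects, or authorizes from here
```

The structural point is the return value: the pipeline ends at a proposal, and authorization remains outside it, with the human.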
This is also why the framework matters for open science.
Open science is not just about access to outputs. It is about legibility of process. Where did a claim come from? What evidence supports it? What transformed it? Who revised it? Why did it become important? Why did it become actionable? Fylo’s public direction already emphasizes graph-native scientific discourse and provenance-first collaboration. This framework extends that logic from data lineage into epistemic lineage: not only where information came from, but how it changed status as it moved through a human–machine workflow.
There is a tendency, when talking about AI in science, to jump immediately to autonomy. Can the model generate the hypothesis? Can the system write the review? Can the agent plan the experiment?
Those are real questions. But they are downstream questions.
The more basic question is whether we can build environments in which humans and machines can reason together without collapsing ambiguity into certainty, certainty into authority, or authority into automation.
That is what this geometry is trying to answer.
Not with a dashboard. Not with a single score. Not with a universal truth meter.
With a space.
A space in which scientific work can be located, moved, revised, and made legible. A space in which claims can take form, beliefs can gather support, and actions can remain accountable. A space in which machine collaborators do not merely generate more text, but participate in a process that is inspectable, steerable, and shared.
The future of scientific software is not a smarter document.
It is an environment where claims can be structured, beliefs can change shape, and action can stay under meaningful control.
That is the kind of system we are building at Fylo Labs.
Footnotes
1. Google Research, “Accelerating scientific breakthroughs with an AI co-scientist,” 2025. research.google
2. Anthropic, “How we built our multi-agent research system,” 2025. anthropic.com
3. Boiko et al., “Autonomous chemical research with large language models,” Nature, 2023. doi:10.1038/s41586-023-06792-0
4. Wigh et al., “mCLM: Modular Chemical Language Model,” 2025. arXiv:2505.12565. Demonstrates how modular building-block representations enable automated synthesis planning with robotic platforms.