Scientific software is changing shape.
For a long time, the model of scientific computing was straightforward. We had databases for retrieval, notebooks for analysis, papers for publication, and documents for coordination. Even as machine learning entered the workflow, most tools still treated intelligence as something that happened at the edge: a better search bar, a smarter summarizer, a more fluent assistant.
That is no longer the world we are entering.
Today, models are beginning to participate in scientific work more directly. They read across literatures, extract structured findings, compare conflicting results, propose hypotheses, draft experimental directions, and coordinate with other models to explore a problem from multiple angles. The most interesting systems are no longer just answering questions. They are joining workflows.[1] In synthesis planning, materials characterization, drug discovery, and closed-loop laboratory systems, models are beginning to participate not just in reading the literature but in designing and evaluating the experiments themselves.[3]
This changes the design problem.
A document is a remarkable artifact. It is durable, legible, portable, and deeply human. But it is not a particularly good unit of coordination for a world in which humans and machines are continuously reading, revising, evaluating, and acting on knowledge together. That is not because documents are obsolete. It is because documents are too coarse.
A paragraph can contain a guess, a formal claim, a provisional interpretation, a methodological caveat, and an actionable recommendation all at once. To a human reader, those differences are often implicit. To a machine collaborator, they need to be made visible. And once multiple agents are involved, the need becomes sharper: not just what was said, but how formed it is, how grounded it is, and whether it is ready to guide action. That is the kind of state ordinary documents tend to collapse. A reasonable inference from current multi-agent research systems is that the coordination problem is no longer just retrieval. It is representation, handoff, and control.[2]
Consider a synthesis lab notebook entry. In a single paragraph, a chemist may record a reaction pathway, note the yield, speculate about why a different catalyst might perform better, flag an assay artifact, and recommend a follow-up run. That entry is a document. But it contains at least five different epistemic objects — a structured result, an observation, a hypothesis, a caveat, and a proposed action — all collapsed into prose. A collaborator reading it, human or machine, has to reconstruct those distinctions from context. Most of the time, nobody does.
At Fylo, we have been exploring a simple framework for this problem.
Not a grand theory of everything. Not an ontology for its own sake. A working geometry for scientific thought.
The idea is simple: much of knowledge work can be understood across three dimensions.
The first is claim.
Some statements begin as fragments. A note in the margin. A possible relation. A pattern someone thinks might be there. Other statements are more formed: typed, structured, formal enough to be compared, queried, or tested. Claim, in this sense, runs from the ambiguous to the structured. A reaction mechanism scribbled on a whiteboard during a group meeting and a validated synthesis protocol filed in a LIMS are both claims, but they occupy opposite ends of this axis.
The second is belief.
Some things are merely possible. Others are weakly supported. Others are well grounded in evidence, replication, mechanism, or convergence from multiple sources. Belief runs from the tentative to the grounded. A single promising yield from one run under one set of conditions sits near the tentative end. The same result, replicated across labs, catalysts, and temperatures, sits near the grounded end.
The third is action.
Some knowledge is something we inspect. Some knowledge is something we use to intervene: to trigger a workflow, prioritize an experiment, recommend a protocol, allocate attention, or hand work from one system to another. Action runs from observation to intervention. Reviewing a pathway in a planning meeting is observation. Dispatching that pathway to an automated synthesis platform for execution is intervention. The distance between them is the action axis.
Taken together, these three axes define a space. A point in that space is not just a piece of content. It is a posture.
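One way to make the idea of a posture concrete is to treat it as coordinates. Below is a minimal sketch in Python, assuming (purely for illustration) that each axis is normalized to a value in [0, 1]; `Posture` and its field names are hypothetical, not Fylo's actual data model:

```python
from dataclasses import dataclass

# Hypothetical sketch: each axis as a value in [0.0, 1.0].
# 0.0 = ambiguous / tentative / observational;
# 1.0 = structured / grounded / interventional.
@dataclass(frozen=True)
class Posture:
    claim: float   # how formed the statement is
    belief: float  # how grounded it is
    action: float  # how close it is to intervention

whiteboard_hunch = Posture(claim=0.2, belief=0.1, action=0.0)
validated_protocol = Posture(claim=0.9, belief=0.8, action=0.7)

def distance(a: Posture, b: Posture) -> float:
    """Euclidean distance in the claim/belief/action space."""
    return ((a.claim - b.claim) ** 2
            + (a.belief - b.belief) ** 2
            + (a.action - b.action) ** 2) ** 0.5
```

The payoff of even this toy encoding is that "a hunch and a protocol are not neighbors" stops being a metaphor: the two example points above are far apart in the space, and software can act on that distance.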
A hunch in a lab meeting and a validated protocol may both live in the same research environment, but they do not occupy the same position in this geometry. One is ambiguous, tentative, and observational. The other is structured, grounded, and action-bearing. Most software treats them as neighbors because they are both text. We think that is a mistake.
What matters is not just storing knowledge. It is locating it.
Once this space becomes visible, three important surfaces appear.
The first is the Meaning Plane: where claim and belief meet.
This is the surface where language becomes interpretation. What exactly is being said? How crisp is the claim? What evidence supports it? How much ambiguity remains? This is where extraction, synthesis, contradiction, and explanation live. A system working on this plane is not yet deciding what to do. It is trying to make meaning legible.
The second is the Protocol Plane: where claim and action meet.
This is the surface where representation becomes executable. Is a statement expressed clearly enough to guide a workflow? Can it be translated into a query, a filtering rule, an experiment template, a protocol suggestion, or a machine-readable instruction? This is where scientific language begins to cross the boundary into operations.
This plane is where the automation question lives. A growing number of laboratories are moving toward closed-loop experimental systems — platforms where a model proposes a synthesis route, a robotic system executes it, and the outcome feeds back into the next round of planning.[4] For those systems to work, chemistry cannot remain in natural language. Reaction pathways, building-block specifications, catalyst conditions, and expected yields need to be expressed in a form that is both human-auditable and machine-executable. The Protocol Plane is the surface where that translation happens — or fails to.
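To make the translation concrete, here is a minimal sketch of what a machine-auditable route might look like. The field names and the executability check are assumptions for illustration, not any lab platform's real schema:

```python
from dataclasses import dataclass, field

# Hypothetical schema: none of these names come from a real Fylo
# or laboratory-automation API.
@dataclass
class ReactionStep:
    reagents: list[str]
    catalyst: str
    temperature_c: float
    expected_yield: float  # fraction in [0, 1]

@dataclass
class SynthesisRoute:
    target: str
    steps: list[ReactionStep] = field(default_factory=list)

    def is_executable(self) -> bool:
        """A route crosses the Protocol Plane only if every step is
        fully specified: no missing reagents, no missing conditions."""
        return bool(self.steps) and all(
            step.reagents and step.catalyst for step in self.steps
        )
```

The point of the check is the failure case: a route described only in prose, or with a step left vague, never reaches the robotic system, and the gap is visible rather than silent.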
The third is the Control Plane: where belief and action meet.
This is the surface where confidence becomes permission. Given what we know, who should act? Should the system only surface a possibility? Should it recommend a next step? Should it automate something narrow? Should it wait for review? This is where trust, oversight, and delegation become design problems.
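The questions above can be read as a policy: grounding gates permission. A sketch, with thresholds and tier names that are illustrative assumptions rather than a prescription:

```python
# Hypothetical control-plane policy: belief (how grounded a claim is,
# normalized to [0, 1]) determines the strongest action the system may
# take without human review. Thresholds are illustrative only.
def permitted_action(belief: float) -> str:
    if belief < 0.3:
        return "surface"    # show the possibility, nothing more
    if belief < 0.6:
        return "recommend"  # propose a next step for human review
    if belief < 0.9:
        return "queue"      # stage a narrow automation, await sign-off
    return "automate"       # act within a pre-authorized scope
```

What matters is not the particular cutoffs but that the mapping is explicit, inspectable, and adjustable, rather than buried in a prompt.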
In practice, most experimental science does not live in any of these planes yet. It lives in fragments. Lab notebooks. Spreadsheets. Local files. Slide decks passed between group meetings. Each chemist, each biologist, each materials scientist maintains their data differently. There is rarely a single source of truth for what has been tried, what worked, what failed, and why. The geometry we are describing is not an overlay on a well-organized landscape. It is infrastructure for a landscape that does not yet have coordinates.
These planes matter because modern scientific workflows are no longer linear.
A literature-review system may decompose a question into multiple sub-questions. One model may extract claims from papers. Another may compare them against prior work. Another may score uncertainty or identify contradictions. Another may suggest a next experiment. A human researcher may then redirect the search, reject a proposed synthesis, or promote a tentative result into a concrete plan.
Across this chain, the failure mode is not always intelligence in the narrow sense. Often, it is illegibility.
One agent returns a paragraph when what is needed is a structured claim. Another returns a confidence estimate without showing what grounded it. Another converts an interesting result into a recommendation before anyone has defined the threshold for intervention. The workflow does not break because nobody can generate text. It breaks because the handoffs are semantically blurry.
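A blurry handoff can be made sharp with a small record type that carries the claim, its grounding, and its action status together. A sketch, with hypothetical field names:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical handoff record: instead of passing a paragraph between
# agents, pass the claim with its grounding and status attached.
@dataclass
class Handoff:
    statement: str               # the claim itself
    sources: list                # what grounded it (DOIs, run IDs, ...)
    confidence: float            # how grounded, in [0, 1]
    proposed_action: Optional[str] = None  # proposed, not authorized
    authorized: bool = False

def ready_for_intervention(h: Handoff) -> bool:
    """A downstream agent may act only on explicit authorization,
    never on confidence alone."""
    return h.proposed_action is not None and h.authorized
```

Each of the failure modes above corresponds to a missing field: a paragraph without `statement` structure, a `confidence` without `sources`, a `proposed_action` treated as if `authorized` were true.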
This is why the framework matters.
It gives humans and machines a shared way to express where a piece of knowledge stands.
A claim can become more structured without pretending it is already certain. A belief can become more grounded without being silently converted into action. An action can be proposed without implying it is authorized. In other words, the geometry separates three things that are too often conflated: what something says, why we should take it seriously, and what anyone is allowed to do with it.
For science, that separation is not cosmetic. It is foundational.
Scientific work is full of partials. Partial evidence. Partial agreement. Partial models. Partial protocols. Partial replication. A synthesis run that produced an unexpected byproduct. A catalyst that performed well in one solvent system but not another. A yield variation no one can yet explain. A pathway that three labs tried and one abandoned. The point is not to eliminate this unfinished state. The point is to represent it clearly enough that humans and machines can work inside it together.
That changes what software can do.
A graph is no longer just a visualization of papers. It becomes a medium for expressing movement between ambiguity and structure, between speculation and grounding, between review and intervention.
A note is no longer just a note. It can become a claim candidate, linked to evidence, attached to provenance, revised by a model, challenged by a collaborator, and promoted into a workflow only when the right controls are met.
A multi-agent system is no longer just a bundle of prompts. It becomes a set of specialized processes operating across a shared epistemic environment: one agent parsing, another comparing, another ranking, another proposing, while the human retains the ability to inspect, redirect, and authorize.
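The shape of such a system can be sketched as specialized functions over a shared store of claim records. Everything here is illustrative: the agent names, the record fields, and the scoring heuristic are assumptions, not a real pipeline:

```python
# Hypothetical sketch: agents as pure functions over a shared list of
# claim records. The length-based score is a placeholder heuristic.
def parse(texts):
    # turn raw text into minimal claim records
    return [{"claim": t, "score": None, "proposed": False} for t in texts]

def rank(store):
    # score each claim (placeholder: longer claims score higher)
    for rec in store:
        rec["score"] = min(1.0, len(rec["claim"]) / 100)
    return store

def propose(store):
    # mark high-scoring claims as proposed, never as authorized
    for rec in store:
        rec["proposed"] = rec["score"] is not None and rec["score"] > 0.5
    return store

def run_pipeline(texts):
    store = parse(texts)
    for agent in (rank, propose):
        store = agent(store)
    return store  # a human inspects, redirects, or authorizes from here
```

The structural point is the return value: the pipeline ends at a proposal, and authorization remains outside it, with the human.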
This is also why the framework matters for open science.
Open science is not just about access to outputs. It is about legibility of process. Where did a claim come from? What evidence supports it? What transformed it? Who revised it? Why did it become important? Why did it become actionable? Fylo’s public direction already emphasizes graph-native scientific discourse and provenance-first collaboration. This framework extends that logic from data lineage into epistemic lineage: not only where information came from, but how it changed status as it moved through a human–machine workflow.
There is a tendency, when talking about AI in science, to jump immediately to autonomy. Can the model generate the hypothesis? Can the system write the review? Can the agent plan the experiment?
Those are real questions. But they are downstream questions.
The more basic question is whether we can build environments in which humans and machines can reason together without collapsing ambiguity into certainty, certainty into authority, or authority into automation.
That is what this geometry is trying to answer.
Not with a dashboard. Not with a single score. Not with a universal truth meter.
With a space.
A space in which scientific work can be located, moved, revised, and made legible. A space in which claims can take form, beliefs can gather support, and actions can remain accountable. A space in which machine collaborators do not merely generate more text, but participate in a process that is inspectable, steerable, and shared.
The future of scientific software is not a smarter document.
It is an environment where claims can be structured, beliefs can change shape, and action can stay under meaningful control.
That is the kind of system we are building at Fylo Labs.
Footnotes
1. Google Research, “Accelerating scientific breakthroughs with an AI co-scientist,” 2025. research.google
2. Anthropic, “How we built our multi-agent research system,” 2025. anthropic.com
3. Boiko et al., “Autonomous chemical research with large language models,” Nature, 2023. doi:10.1038/s41586-023-06792-0
4. Wigh et al., “mCLM: Modular Chemical Language Model,” 2025. arXiv:2505.12565. Demonstrates how modular building-block representations enable automated synthesis planning with robotic platforms.