If a record was corrected, the old version stays in history but is blocked from normal answers. The model sees the current answer, not the stale one that sounds close.
Patent-pending AI search, tested against 99,455 public articles.
Reliable AI search where wrong answers are not an option.
We built a public-article benchmark with 5,000 questions. ContextOS got 5,000 right. Haystack got 325 right. RAG, LangGraph/LangMem, and LlamaIndex each got 247 right.
Most AI search is built to find text that looks related to the question. That is not the same thing as answering the question.
In this benchmark, ordinary RAG, LangGraph/LangMem, and LlamaIndex each answered 247 out of 5,000 questions correctly. Haystack answered 325. ContextOS answered all 5,000.
That is the whole point: better AI search means more right answers, fewer wrong answers, and a clear reason for why the answer was allowed.
The penicillin problem.
A better search engine does not just find related documents. It knows when a related document is the wrong answer.
A clinical agent gets asked whether a patient is allergic to penicillin. A normal RAG stack may retrieve a similar allergy note, a stale chart entry, or another patient's record if it scores close enough. That is not acceptable search. That is a dangerous lookup wearing a confident answer.
ContextOS sits above the stack you already use: RAG, vector search, graph search, SQL, and agent memory. Those systems bring back candidates. ContextOS decides which candidate is safe to use.
In plain English: the AI cannot use the wrong person, the old answer, the unapproved answer, or the answer someone changed without a trail.
What the gate stops.
Your existing tools find possible answers. ContextOS blocks the ones that should never reach the model.
A verified record beats a weaker record, even when the weaker record looks more similar to the question.
Identity binding is a hard filter applied before ranking. A clinically similar record from another patient is rejected before it can be scored, let alone returned.
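A minimal sketch of "filter first, rank second," assuming each candidate carries a patient identifier, a retriever similarity score, and a verified flag. The `Candidate` type, the `gate` function, and the sample records are hypothetical illustrations, not the ContextOS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    patient_id: str    # identity the record is bound to (hypothetical field)
    text: str
    similarity: float  # score from the upstream retriever
    verified: bool     # whether the record passed review


def gate(candidates, patient_id):
    """Drop wrong-patient records first, then rank what is left."""
    # Hard filter: a record bound to another patient never reaches ranking.
    allowed = [c for c in candidates if c.patient_id == patient_id]
    # Verified records outrank unverified ones; similarity only breaks ties.
    return sorted(allowed, key=lambda c: (c.verified, c.similarity), reverse=True)


candidates = [
    Candidate("patient-2", "Allergic to penicillin", 0.97, True),
    Candidate("patient-1", "Possible penicillin reaction, unconfirmed", 0.91, False),
    Candidate("patient-1", "No known drug allergies", 0.62, True),
]
ranked = gate(candidates, "patient-1")
```

In this toy data the most similar record belongs to another patient, so it is removed before any scoring happens, and the verified record is ranked ahead of the more similar unverified one.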
Direct database inserts, altered records, removed corrections, and replaced history can be detected. Every governed memory change leaves a checkable trail.
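One common way to make a change trail checkable is a hash chain, where each entry commits to the one before it. The sketch below assumes that technique for illustration only; the source does not say which mechanism ContextOS uses, and `append`, `verify`, and the sample payloads are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append(log, payload):
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(log):
    """Recompute every link; any edit or removed middle entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"op": "create", "record": "r1", "value": "allergy: none"})
append(log, {"op": "correct", "record": "r1", "value": "allergy: penicillin"})
append(log, {"op": "review", "record": "r1", "value": "confirmed"})

# An altered record: the payload changes but the stored hash does not.
tampered = [dict(e) for e in log]
tampered[0] = {**tampered[0], "payload": {"op": "create", "record": "r1", "value": "edited"}}

# A removed correction: the middle entry is dropped.
dropped = [log[0], log[2]]
```

Here `verify(log)` passes while `verify(tampered)` and `verify(dropped)` fail. A plain chain like this cannot notice that its most recent entries were cut off unless the latest hash is also stored somewhere outside the log.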
HIPAA scopes. Attorney-client privilege. Litigation holds. Export controls. Consent state. All enforced inside the gate, not in a downstream prompt the agent can ignore.
ContextOS does not replace your vector store, graph, or BM25 stack. It sits above them and decides what they are allowed to return.
Agents cannot act without an externally signed consent token. Default-deny on anomaly. Checks for shutdown-resistance, exfiltration, deception, and capability-grabbing run before execution, on a substrate the agent cannot read, copy, or invoke.
A record can become non-retrievable while a tamper-evident record of the forgetting directive remains. GDPR Article 17 without losing the audit trail. Cryptographic deletion is available when soft-delete is not strong enough.
ContextOS detects higher-risk territory (healthcare, legal, finance, low-confidence speech, or stale read models) and tightens thresholds automatically. Risk transitions are recorded as visible events, not hidden model state.
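A minimal sketch of risk-based tightening, assuming a single confidence threshold that rises in high-risk territory and an event list that records each transition. The threshold values, the `RiskGate` class, and the domain set are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical values; real thresholds would be set per deployment.
BASE_THRESHOLD = 0.70
HIGH_RISK_THRESHOLD = 0.95
HIGH_RISK_DOMAINS = {"healthcare", "legal", "finance"}


@dataclass
class RiskGate:
    threshold: float = BASE_THRESHOLD
    events: list = field(default_factory=list)

    def observe(self, domain, read_model_stale=False):
        """Tighten or relax the threshold and record the transition as an event."""
        high_risk = domain in HIGH_RISK_DOMAINS or read_model_stale
        new_threshold = HIGH_RISK_THRESHOLD if high_risk else BASE_THRESHOLD
        if new_threshold != self.threshold:
            self.events.append(
                {"from": self.threshold, "to": new_threshold,
                 "domain": domain, "stale": read_model_stale}
            )
            self.threshold = new_threshold

    def allows(self, confidence):
        return confidence >= self.threshold


risk_gate = RiskGate()
before = risk_gate.allows(0.80)   # general territory, base threshold applies
risk_gate.observe("healthcare")   # entering high-risk territory
after = risk_gate.allows(0.80)    # same confidence, stricter threshold
```

The same 0.80-confidence candidate passes before the transition and is blocked after it, and the change is visible in `risk_gate.events` rather than hidden in internal state.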
The system separates what a document says from what a reviewer concludes. A parser can be corrected without changing the original document.
When the agent goes off-script, ContextOS stops it.
The same control plane that gates memory also gates the agent itself.
Every action the agent proposes, including every tool call, external transmission, and database write, requires a valid heartbeat consent token signed outside the agent's reach. No token, no action. Default-deny.
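A minimal sketch of a default-deny token check, assuming an HMAC-signed token with an expiry time. HMAC stands in here for whatever signing scheme is actually used; the token format, `sign_token`, and `may_act` are hypothetical. In this sketch the key is held by the signer and the gate, never by the agent.

```python
import hashlib
import hmac


def sign_token(key: bytes, agent_id: str, expires_at: int) -> str:
    """Issued outside the agent: binds an agent id to an expiry time."""
    msg = f"{agent_id}|{expires_at}".encode("utf-8")
    sig = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return f"{agent_id}|{expires_at}|{sig}"


def may_act(key: bytes, token, now: int, revoked=frozenset()) -> bool:
    """Default-deny: every failure path returns False."""
    if not token or token in revoked:
        return False
    try:
        agent_id, expires_at, sig = token.split("|")
        expiry = int(expires_at)
    except ValueError:
        return False  # malformed token
    msg = f"{agent_id}|{expires_at}".encode("utf-8")
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))
    return signature_ok and now < expiry


key = b"demo-key-held-outside-the-agent"
token = sign_token(key, "agent-1", expires_at=1000)
forged = token.replace("|1000|", "|9999|")  # agent tries to extend its own expiry
```

With these values, `may_act(key, token, now=999)` is allowed, while a missing token, an expired token, a revoked token, and the forged token are all denied.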
ContextOS checks what the agent says against the stored source record. If the agent says X and the source says not-X, that mismatch is logged, the token is revoked, and the agent stops until a human reissues consent.
You catch the agent lying before it acts on the lie. This is not a content filter on output. It is not a system prompt the agent can argue with. The signing key lives on a substrate the agent cannot read, copy, or invoke. Corrigibility is structural, not behavioral.
Periodic, externally signed permission to act. Expires. Revokes on anomaly. The agent never holds it.
Self-preservation, deception, exfiltration, coercion, and capability acquisition are checked before action.
Compares what the agent says against the source record. Contradictions, omissions, and unsupported statements revoke the token.
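A minimal sketch of the comparison step, assuming the agent's statements have already been reduced to field/value claims. Real statements would need extraction first, which is not shown. `check_claims`, `Session`, and the sample record are hypothetical.

```python
def check_claims(claims, source, required=()):
    """Return (field, kind) findings; an empty list means the claims are consistent."""
    findings = []
    for name, value in claims.items():
        if name not in source:
            findings.append((name, "unsupported"))    # nothing in the source backs it
        elif source[name] != value:
            findings.append((name, "contradiction"))  # agent says X, source says not-X
    for name in required:
        if name not in claims:
            findings.append((name, "omission"))       # agent left out a required fact
    return findings


class Session:
    def __init__(self):
        self.token_valid = True
        self.log = []

    def review(self, claims, source, required=()):
        findings = check_claims(claims, source, required)
        if findings:
            self.log.extend(findings)  # the mismatch is logged
            self.token_valid = False   # stays revoked until a human reissues consent
        return self.token_valid


source = {"penicillin_allergy": "yes", "last_reviewed": "2025-01-07"}
session = Session()
ok = session.review({"penicillin_allergy": "yes"}, source)
stopped = session.review(
    {"penicillin_allergy": "no", "dose_approved": "yes"},
    source,
    required=("last_reviewed",),
)
```

The first review passes. The second contains a contradiction, an unsupported claim, and an omission, so the token is revoked and all three findings are logged.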
Rewind the agent's brain.
Every important memory change leaves a checkable trail. When a regulator, judge, board, or auditor asks what the AI knew at 4:17 pm on Tuesday, you can answer in seconds.
Replay shows which records were allowed, which records were blocked, and which agent permissions were active at that moment.
The trail can be anchored to outside timestamp systems. This is not digging through logs. It is a repeatable answer to what the AI was allowed to know then.
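A minimal sketch of a point-in-time replay, assuming every record version keeps the time it was recorded and the time it was superseded. The `Version` type, the `replay` function, and the sample dates are hypothetical, and the sketch covers only records, not agent permissions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Version:
    record_id: str
    value: str
    recorded_at: datetime
    superseded_at: Optional[datetime] = None  # None means still current


def replay(history, at):
    """Split the versions known at `at` into allowed and blocked."""
    allowed, blocked = [], []
    for v in history:
        if v.recorded_at > at:
            continue  # not yet known at that moment
        if v.superseded_at is not None and v.superseded_at <= at:
            blocked.append(v)  # kept in history, blocked from answers
        else:
            allowed.append(v)
    return allowed, blocked


history = [
    Version("r1", "no known allergies",
            datetime(2025, 1, 6, 9, 0), superseded_at=datetime(2025, 1, 7, 15, 0)),
    Version("r1", "allergic to penicillin", datetime(2025, 1, 7, 15, 0)),
]

# What was allowed at 4:17 pm on Tuesday, after the correction landed?
allowed_after, blocked_after = replay(history, datetime(2025, 1, 7, 16, 17))
# And that morning, before the correction existed?
allowed_before, blocked_before = replay(history, datetime(2025, 1, 7, 10, 0))
```

After the correction, the old version is still in history but lands in the blocked list. Before it, the old version was the only allowed answer.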
Every answer can show why it was allowed.
The public headline is the score. The deeper value is that ContextOS can show why the answer was allowed and why the wrong answer was blocked.
It can show the record used, the source snippet, the correction that applied, the blocked records, and the check showing the answer was not silently changed.
Plain version: ContextOS does not just say "trust me." It shows its work.
99,455 public articles. 5,000 questions. ContextOS got every one right.
Same corpus. Same questions. Different systems. The gap is not subtle.
The question is simple: given the same public article corpus and the same 5,000 questions, how many answers did each system get right?
| System | Right answers | Wrong or missed | What happened |
|---|---|---|---|
| ContextOS | 5,000/5,000 | 0 | Answered from the source corpus and blocked unsupported answers. |
| Haystack | 325/5,000 | 4,675 | Often returned related text instead of the exact answer. |
| RAG | 247/5,000 | 4,753 | Found nearby articles but missed the actual answer to the question. |
| LangGraph/LangMem | 247/5,000 | 4,753 | Matched the same failure pattern as RAG on this corpus. |
| LlamaIndex | 247/5,000 | 4,753 | Matched the same failure pattern as RAG on this corpus. |
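For readers who want the table as rates, the short script below recomputes the "wrong or missed" column and the percentage correct from the counts above. It uses only the figures in the table; the variable names are illustrative.

```python
TOTAL = 5000

# Right-answer counts copied from the table above.
right = {
    "ContextOS": 5000,
    "Haystack": 325,
    "RAG": 247,
    "LangGraph/LangMem": 247,
    "LlamaIndex": 247,
}

rows = {
    system: {"wrong_or_missed": TOTAL - n, "accuracy": n / TOTAL}
    for system, n in right.items()
}

for system, row in rows.items():
    print(f"{system}: {row['wrong_or_missed']} wrong or missed, {row['accuracy']:.2%} correct")
```

That works out to 100% for ContextOS, 6.5% for Haystack, and 4.94% for each of the other three systems.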
The failures are visible in the answers.
The lower scores were not abstract. The other systems often found something nearby, then treated that nearby text as the answer.
Prompt: What approved contact method remains for Avery Cole?
Prompt: What is Phoenix authorized for as of 2026-05-01?
Prompt: Should I delete audit logs to an external destination now?
Prompt: For Avery Cole, is ibuprofen safe to prescribe today?
Prompt: What is the controlling deadline in Iris Supply Appeal?
Prompt: For patient MRN-A107, what was the blood pressure?
Built for places where an "oops" is not acceptable.
ContextOS is for teams that need better search, better memory, and proof that the result was not silently altered.
Every answer can tie back to the source record. Corrections do not destroy the prior version. When opposing counsel asks when you knew, you replay the answer.
Identity binding is a hard filter, not a prompt instruction. Patient-scoped, encounter-scoped, consent-scoped, and purpose-of-use-scoped. A wrong-patient candidate is rejected before it can rank.
Project state is not a vector match. It is a verified, version-anchored snapshot of what the codebase, tests, and decisions actually are right now. The agent cannot call a tool based on a belief that was superseded three commits ago.
Tenant isolation is enforced inside the truth gate. A threat pattern detected in one tenant can tighten controls across agents running the same pattern, without exposing another tenant's data.
Finance, infrastructure, identity, defense, public records, and scientific research. Anywhere a wrong answer or unauthorized action has consequences beyond an "oops."
Don't replace your stack. Put a control layer on top of it.
Keep RAG. Keep your vector DB. Keep your graph store. Keep your agent framework. Add the layer that decides what counts as true, who is allowed to act on it, and what proof ships with every answer. That is the difference between an AI demo and an AI system you can deploy.