Model quality stopped being the bottleneck. The context layer is.
What goes into the context window, in what structure and order, decides whether an AI system works reliably or just demos well

Model quality stopped being the bottleneck about a year ago, and most enterprise AI programs have not adjusted. They are still running model bake-offs while their systems fail for a reason no benchmark measures: what goes into the context window, in what structure and in what order, decides whether an AI system works reliably or demos well and falls apart in production. The teams shipping reliable systems and the teams stuck in pilot purgatory are running the same models. The difference between them sits somewhere else.
Andrej Karpathy's frame is the cleanest way to say it: the model is the CPU, the context window is the RAM, and the real engineering is deciding what gets loaded into that RAM at each step of the work. Enterprises have spent two years comparing CPUs. Almost nobody has built the thing that does the loading.
Everyone has sat through the demo. It impressed because a solutions engineer spent two days assembling perfect context by hand: the right documents in the right order, the relevant history, nothing else. Then the pilot started, nothing was doing that assembly at runtime, and the model got whatever the retrieval pipeline returned. Reliability collapsed to the quality of the pipeline. The model was never the variable. The context was.
What is the context layer
The context layer is the part of an AI system that assembles proprietary data into structured, governed, traceable context before the model reasons over it. Three jobs.
It connects. Enterprise data lives in ten to fifteen systems that have never talked to each other. The context layer resolves entities across them, so a voice on years of call transcripts in the CRM, an employee record in the HRIS, and a candidate record in the ATS are recognized as one person with one history.
It governs. Access rules travel with the data. Who may see compensation, and which records a given agent is allowed to read: the layer enforces those answers at assembly time, before a single token reaches the model, instead of hoping a prompt will police them afterward. In a regulated enterprise this is the difference between an AI program that survives review and one that ends at procurement.
It traces. Every fact in the assembled context carries its origin: which system, which record, as of when. When the model recommends something, the recommendation can show what it stood on. The answer arrives with its evidence attached.
These jobs sit outside the model. The model reasons over what it is given; the context layer decides what it is given.
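The three jobs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any production implementation: the record shapes, the names `resolve_entities`, `assemble_context` and `render`, the use of a normalized email as the matching key, and the role-to-field policy are all hypothetical. Real entity resolution across systems is considerably harder than an exact key match.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One field value plus its origin (the "traces" job)."""
    system: str     # e.g. "CRM", "HRIS", "ATS"
    record_id: str
    field: str
    value: str
    as_of: str      # date the value was read from the source system


# Hypothetical raw records; email is the assumed join key across systems.
RECORDS = [
    {"system": "CRM", "id": "c-17", "email": "A.Lee@example.com",
     "as_of": "2024-05-01", "fields": {"renewal_calls": "42"}},
    {"system": "HRIS", "id": "h-203", "email": "a.lee@example.com",
     "as_of": "2024-04-15", "fields": {"compensation": "95000", "rating": "exceeds"}},
    {"system": "ATS", "id": "a-9", "email": "a.lee@example.com ",
     "as_of": "2021-02-10", "fields": {"source": "referral"}},
]

# Hypothetical access policy: which fields each role may read.
POLICY = {
    "recruiting_agent": {"renewal_calls", "rating", "source"},
    "comp_analyst": {"renewal_calls", "rating", "source", "compensation"},
}


def resolve_entities(records):
    """Connect: group facts from all systems under one normalized person key."""
    people = defaultdict(list)
    for rec in records:
        key = rec["email"].strip().lower()
        for name, value in rec["fields"].items():
            people[key].append(
                Fact(rec["system"], rec["id"], name, value, rec["as_of"])
            )
    return dict(people)


def assemble_context(people, person_key, role):
    """Govern: drop facts the role may not read before any prompt is built."""
    allowed = POLICY.get(role, set())
    return [f for f in people.get(person_key, []) if f.field in allowed]


def render(facts):
    """Trace: every context line carries its system, record and date."""
    return "\n".join(
        f"{f.field}={f.value} [{f.system}:{f.record_id} as of {f.as_of}]"
        for f in facts
    )


people = resolve_entities(RECORDS)
context = assemble_context(people, "a.lee@example.com", "recruiting_agent")
print(render(context))
```

In this sketch the compensation field is removed before the text that would go to a model exists at all, which is the point of enforcing access at assembly time rather than in the prompt.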
Context layer vs context window
The two get conflated because both have context in the name. The window is capacity: how many tokens the model can hold at once. The layer is what fills the capacity, and in what shape.
For two years the industry treated the window as the fix. Windows grew large enough to hold entire document sets, and the pitch wrote itself: stop curating, give the model everything. Reliability did not follow. A model handed a pile of unranked exports performs worse than a model handed the twenty facts that matter, arranged in the order the reasoning needs them, because long-context models attend unevenly and the middle of a giant dump is where the decisive fact goes to die. The bigger window mostly raised the ceiling on how much irrelevant material a weak pipeline could deliver. Anyone who has pasted three documents into a chat and gotten a confident answer sourced from the wrong one has seen this failure at small scale. Scale the window up and the failure scales with it.
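The contrast between a dump and a curated handful can be made concrete. The sketch below assumes each candidate passage already has a relevance score from some upstream ranker and a position in the reasoning order; both fields, the `select_context` name, the word-count token estimate and the sample passages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    relevance: float  # assumed to come from an upstream ranker
    step: int         # assumed position in the reasoning order (lower = earlier)


def approx_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def select_context(candidates, budget_tokens):
    """Keep the most relevant candidates that fit the budget, then order by step."""
    chosen, used = [], 0
    for cand in sorted(candidates, key=lambda c: c.relevance, reverse=True):
        cost = approx_tokens(cand.text)
        if used + cost <= budget_tokens:
            chosen.append(cand)
            used += cost
    # Present the survivors in the order the reasoning needs them.
    return sorted(chosen, key=lambda c: c.step)


# Hypothetical candidates: three that matter, two that are noise.
candidates = [
    Candidate("Office parking policy updated in March.", 0.05, 9),
    Candidate("Producer hired under a 12 month ramp plan.", 0.90, 1),
    Candidate("Renewal retention fell in the last two quarters.", 0.95, 2),
    Candidate("Cafeteria menu for the week.", 0.01, 9),
    Candidate("Manager flagged missed follow-up calls.", 0.80, 3),
]

context = select_context(candidates, budget_tokens=25)
for c in context:
    print(c.text)
```

A larger `budget_tokens` would let the two noise passages back in, which is the failure the paragraph above describes: more capacity does not improve the selection by itself.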
So the window turned out to be a hardware spec, and the constraint moved to judgment: out of everything the enterprise knows, what deserves the space?
That judgment is retrieval, and the dominant method is the weakest one. Vector search returns lookalike text: passages that resemble the query, stripped of their relationships to everything else. A context graph works differently. It walks relationships: this producer, the book of business she manages, the transcripts of her renewal calls, the ramp plan she was hired under. It can show the path it walked. Microsoft Research arrived at the same conclusion from the unstructured-text side with GraphRAG: retrieval over a graph of extracted entities and relationships answers questions similarity search cannot reach. The full comparison, including where knowledge graphs fit, is in "What is a context graph?"
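Walking relationships and showing the path can be illustrated with a small typed-edge graph and a breadth-first search. This is a sketch only: the node identifiers, relation names and the `find_path` helper are hypothetical, and a production context graph would sit on a graph store rather than a Python dict.

```python
from collections import deque

# Hypothetical typed edges: (source, relation, target).
EDGES = [
    ("producer:p1", "manages", "book:northeast"),
    ("producer:p1", "spoke_on", "transcript:renewal-0412"),
    ("producer:p1", "hired_under", "plan:ramp-2021"),
    ("book:northeast", "contains", "policy:P-88"),
    ("transcript:renewal-0412", "about", "policy:P-88"),
]


def build_index(edges):
    """Adjacency list keyed by source node."""
    index = {}
    for src, rel, dst in edges:
        index.setdefault(src, []).append((rel, dst))
    return index


def find_path(index, start, goal):
    """Breadth-first search returning the walked path as (src, rel, dst) hops."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in index.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None  # no relationship path exists


index = build_index(EDGES)
path = find_path(index, "producer:p1", "policy:P-88")
for src, rel, dst in path:
    print(f"{src} -[{rel}]-> {dst}")
```

The returned hops are the evidence trail. A similarity search over the same material would return passages that resemble the query but could not state how the producer and the policy are connected.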
The advantage that survives the next model release
Every advantage built on model access now has a shelf life measured in quarters. The frontier moves, open source closes the gap behind it, and whatever a better model did for you last quarter, it does for your competitor this quarter. I made the longer version of that argument in "Snowflake won while AWS existed"; the short version is that when a layer commoditizes, value moves to the layer above it.
The context does not commoditize, because nobody else has your data. Read this the way a CFO would. A carrier is sitting on decades of call transcripts, performance reviews, candidate records, claims notes. It already paid to create every one of those records: salaries, licenses, systems, time. Today they sit on the books as storage expense. The context layer is a conversion event. It takes data the company already paid for and makes it connected, queryable, and load-bearing for decisions. No new data has to be acquired. The raw material of the advantage has been on the balance sheet the whole time, classified as a cost.
The asset holds its value only if the data never has to leave to become useful, which is the argument of "The moat is the data that never leaves your VPC," and I will not re-argue it here. The companion mechanism, how the model reading the graph keeps improving while the data stays put, is in "The weights leave. Your data never does."
What the layer looks like in talent
Nodes builds the context layer for talent, where the disconnection runs deepest. The CRM holds call transcripts, the richest signal about how producers perform in the field. The HRIS holds performance data, what happened after the hire. The ATS holds candidate records, who applied and what survived screening. Three systems, zero shared history. The causal chain from resume to revenue has never been visible in any of them, because no human has the time to assemble it. An agent does.
Connected into one graph, the chain becomes something agents can reason over continuously instead of waiting to be asked. A chatbot waits for a question. These agents reason in the background and arrive with the workflow already drafted. The loop they run: ingest from every system of record, process, brainstorm, propose a cross-system workflow with the cost of action and the cost of inaction attached. A human approves the workflow, edits it, or declines it. Then the system acts across the systems it read from, and every action ships with its trail. The case for why this layer sits above Workday and Greenhouse rather than replacing them is in "Workday is the friend graph."
What the graph computes, in talent, is the Performance Genome: the behavioral signature of a company's top performers, extracted continuously from the systems they work inside. That pattern lives in the connections between systems rather than in any one of them, which is why no single-system vendor has produced it. "Five more Alexes" tells that story properly.
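The propose, review, act portion of that loop can be sketched as follows. Everything here is an assumption made for illustration: the `Proposal` and `Trail` types, the `run_cycle` function, the three decision strings, and the sample costs and record references. The ingest, process and brainstorm stages are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Proposal:
    summary: str
    steps: List[str]
    cost_of_action: float      # hypothetical estimate attached to the proposal
    cost_of_inaction: float    # hypothetical estimate attached to the proposal
    evidence: List[str]        # record references the proposal stands on


@dataclass
class Trail:
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def log(self, event: str, detail: str) -> None:
        self.entries.append((event, detail))


def run_cycle(
    proposal: Proposal,
    review: Callable[[Proposal], Tuple[str, Optional[Proposal]]],
    execute: Callable[[str], None],
    trail: Trail,
) -> List[str]:
    """One pass of propose -> human review -> act, logging every stage."""
    trail.log("proposed", proposal.summary)
    decision, edited = review(proposal)  # "approve", "edit" or "decline"
    if decision == "decline":
        trail.log("declined", proposal.summary)
        return []
    final = edited if decision == "edit" and edited is not None else proposal
    trail.log("approved", final.summary)
    done = []
    for step in final.steps:
        execute(step)            # nothing runs before the human decision
        trail.log("executed", step)
        done.append(step)
    return done


def review(proposal: Proposal):
    """Stand-in for a human reviewer who trims the workflow to its first step."""
    edited = Proposal(proposal.summary + " (edited)", proposal.steps[:1],
                      proposal.cost_of_action, proposal.cost_of_inaction,
                      proposal.evidence)
    return "edit", edited


executed: List[str] = []
trail = Trail()
proposal = Proposal(
    summary="Re-engage past finalists for an open producer role",
    steps=["draft outreach in ATS", "schedule follow-up in CRM"],
    cost_of_action=1200.0,
    cost_of_inaction=9000.0,
    evidence=["ATS:a-9", "CRM:c-17"],
)
done = run_cycle(proposal, review, executed.append, trail)
print(done)
print([event for event, _ in trail.entries])
```

The design choice worth noting is that `execute` is only reachable after the review callback returns, so a declined proposal leaves a trail entry but no side effects.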
The production evidence sits at a Fortune 500 insurance carrier: four years of production data, 10,765 agents, 850,000+ applicants scored. The thin-context view failed first, and it failed measurably. We parsed 8,181 unique skills from four years of applicant data and measured 3,597 testable keywords against post-hire production; after Bonferroni correction, zero predicted sustained performance. The connected view moved outcomes. Hire rate rose from 14.0% to 27.7% across 6,053 hires, and the whole loop from requisition to hire compressed from 127 days to 38. The methodology, including the decision-trace logging, is published in Decision Traces.
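For readers unfamiliar with the correction: Bonferroni divides the significance level by the number of tests, so each individual test has to clear a much stricter bar. The sketch below assumes a conventional significance level of 0.05 and one test per keyword, neither of which is stated in the figures above, and the p-values are invented purely to show the mechanics. The last lines restate the reported hire-rate and cycle-time changes as ratios.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def significant(p_values, alpha, n_tests):
    """Return only the tests whose p-value clears the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return {name: p for name, p in p_values.items() if p < threshold}


N_TESTS = 3597  # testable keywords, from the figures above
ALPHA = 0.05    # assumption: the significance level is not stated in the text

threshold = bonferroni_threshold(ALPHA, N_TESTS)
print(f"per-test threshold: {threshold:.2e}")

# Hypothetical p-values: each would pass an uncorrected 0.05 test.
p_values = {"keyword_a": 0.003, "keyword_b": 0.0004, "keyword_c": 0.04}
print(significant(p_values, ALPHA, N_TESTS))

# Reported outcome changes expressed as ratios.
hire_rate_multiple = 27.7 / 14.0          # roughly 1.98x
cycle_time_reduction = (127 - 38) / 127   # roughly a 70% reduction
print(f"hire rate multiple: {hire_rate_multiple:.2f}")
print(f"cycle time reduction: {cycle_time_reduction:.1%}")
```

With thousands of simultaneous tests, a keyword that looks predictive at 0.05 is expected by chance alone, which is why the corrected threshold is the relevant one.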
Whoever builds the graph
Nothing in the layer is talent-specific. A bank's core systems hold disconnected truth about clients and bankers the way a carrier's systems hold it about producers and policyholders, and the platform deploys identically: VPC-resident, single-tenant, customer-owned weights, no data egress, ever. That is capability today, built and waiting. Insurance is where it runs in production.
The architecture is the product.
The bet underneath all of this: models keep improving, and it matters a little less each time, because everyone gets the improvement on the same day. What decides the next decade of enterprise AI is whether a company can turn its proprietary data into a connected, governed, traceable graph. Whoever does that first, in each industry, wins. No model release changes the answer.
Saad Bin Shafiq is the founder of Nodes. Anchor pilot: Fortune 500 insurance carrier, four years of production data, 10,765 agents. Methodology: Decision Traces.