Every model under every harness ships with a hard training cutoff. The day you deploy it, you start accumulating a blind spot — papers, releases, news, code your agent literally cannot see. Humans fix this manually. Agents can't.
Fillin is the embeddings & context-engineering service that closes the gap, on every query, autonomously. Pick your model below — we'll show you exactly what your agent is missing, then wire you up.
Pick the model your agent calls. We use its training cutoff to scope every retrieval, so only documents published after that cutoff ever land in your agent's context window. If your agent multiplexes models, pick the one with the latest cutoff: that date is the upper bound on what any of them might already know.
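A minimal sketch of that scoping rule, assuming each document carries a publication date. The Document type, both function names, and the dates are hypothetical illustrations, not Fillin's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    """Hypothetical record: a title plus the date it was published."""
    title: str
    published: date


def effective_cutoff(cutoffs: list[date]) -> date:
    """For an agent that multiplexes models, use the latest training cutoff."""
    if not cutoffs:
        raise ValueError("at least one model cutoff is required")
    return max(cutoffs)


def post_cutoff(docs: list[Document], cutoff: date) -> list[Document]:
    """Keep only documents published strictly after the cutoff."""
    return [d for d in docs if d.published > cutoff]


# Illustrative dates only; they do not describe any real model.
cutoff = effective_cutoff([date(2024, 1, 1), date(2024, 6, 1)])
docs = [
    Document("older release notes", date(2024, 3, 15)),
    Document("newer paper", date(2024, 9, 2)),
]
fresh = post_cutoff(docs, cutoff)
print([d.title for d in fresh])  # ['newer paper']
```

Taking the latest cutoff keeps the context window lean: nothing that any of the multiplexed models could already know is sent back.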
Fillin's MCP server speaks Streamable HTTP at one URL. Pick your harness — we'll show the exact JSON snippet to drop in.
If you build your own loop, pick curl / custom for the raw HTTP shape.
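For a custom loop, the raw HTTP shape might look like the sketch below. It assumes the usual MCP convention of a JSON-RPC 2.0 tools/call message sent by POST, with an Accept header that allows either a JSON or an event-stream response. MCP_URL is a placeholder rather than Fillin's real endpoint, build_tool_call is a hypothetical helper, and the empty arguments object for fillin_health is an assumption. A real client would typically complete the MCP initialize handshake before calling a tool.

```python
import json
import urllib.request

# Placeholder, NOT Fillin's real endpoint: substitute the URL from your snippet.
MCP_URL = "https://example.invalid/mcp"


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 tools/call request."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The server may answer with plain JSON or with an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# Assumption: fillin_health takes no arguments.
request = build_tool_call("fillin_health", {})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the result to urllib.request.urlopen would send it; the sketch stops short of that so it runs without network access.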
Add the snippet below to your harness's MCP config. Restart the harness so it picks up the new server.
Click below and this page calls the live fillin_health tool on the deployed MCP server. A sub-second response means the server is reachable, and your agent will see Fillin in its toolset on its next start.
The cutoff field below was set in step 1 — Fillin will only return documents published after that date. Try a question your agent would actually face: a recent release, a paper, a vendor announcement. Click any title to verify the source is real.
Three tools live in your agent's toolset, called autonomously when relevant. Fillin handles the embedding, the filtering, the reranking, the freshness — your agent just asks and gets clean post-cutoff context back.
What good looks like next: watch your agent autonomously call fillin_query when a question concerns something published after its cutoff. The model decides; you watch the trace; the answer is grounded in real, dated, citable sources.