That's — days of arXiv, HN, releases, and vendor announcements your model can't see. Fillin is the embeddings layer that gives your agent every post-cutoff document — per query, autonomously, citable.
Live demo below — the query already ran against the production endpoint when this page loaded. No install needed to see it work.
fillin_query via streamable HTTP
Pick the model your agent calls. We use its training cutoff to scope every retrieval, so only delta data ever lands in your agent's context window. If your agent multiplexes models, pick the one with the latest cutoff: that cutoff is the upper bound of what the agent might already know.
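To make the cutoff scoping concrete, here is a minimal sketch of the idea, not Fillin's actual implementation. The model names, cutoff dates, and document records are all made up for illustration; the only logic taken from the text above is "use the latest cutoff, keep only what was published after it."

```python
from datetime import date

# Hypothetical cutoff table; real cutoffs come from each model's vendor.
MODEL_CUTOFFS = {
    "model-a": date(2024, 6, 1),
    "model-b": date(2024, 10, 1),
}

def scoping_cutoff(models):
    """Return the latest training cutoff among the models an agent may call."""
    return max(MODEL_CUTOFFS[m] for m in models)

def delta_only(documents, cutoff):
    """Keep only documents published strictly after the cutoff."""
    return [d for d in documents if d["published"] > cutoff]

# Illustrative documents, not real retrieval results.
docs = [
    {"title": "older paper", "published": date(2024, 8, 15)},
    {"title": "newer release", "published": date(2024, 11, 2)},
]

cutoff = scoping_cutoff(["model-a", "model-b"])
fresh = delta_only(docs, cutoff)
print(cutoff.isoformat(), [d["title"] for d in fresh])
```

Scoping to model-a's earlier cutoff would also return "older paper", which model-b may already know. Picking the latest cutoff avoids spending context on that overlap.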
Fillin's MCP server speaks Streamable HTTP at one URL. Pick your harness — we'll show the exact JSON snippet to drop in.
If you build your own loop, pick curl / custom for the raw HTTP shape.
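If you are writing your own loop, the raw shape is a JSON-RPC 2.0 `tools/call` request, which is how MCP clients invoke tools. The sketch below only builds the request and does not send it. The endpoint URL is a placeholder and the `query` argument name is an assumption, since neither is given here; check the tool's published schema before relying on either.

```python
import json

# Placeholder endpoint: the real Fillin MCP URL is not given in this section.
MCP_URL = "https://example.invalid/mcp"

def build_tool_call(tool_name, arguments, request_id=1):
    """Build the headers and JSON body for an MCP tools/call request."""
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP clients accept either a JSON reply or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return headers, json.dumps(body)

# "query" is a hypothetical argument name used for illustration only.
headers, payload = build_tool_call("fillin_query", {"query": "latest MCP spec changes"})
print(payload)
```

A spec-compliant client normally sends an `initialize` request before any tool call, so treat this as the shape of one message rather than a complete session.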
Add the snippet below to your harness's MCP config. Restart the harness so it picks up the new server.
Click below and this page calls the live fillin_health tool on the deployed MCP server. A sub-second response means your agent will see Fillin in its toolset on next start.
Three tools live in your agent's toolset, called autonomously when relevant. Fillin handles the embedding, the filtering, the reranking, the freshness — your agent just asks and gets clean post-cutoff context back.
What good looks like next: watch your agent autonomously call fillin_query when a question is post-cutoff. The model decides; you watch the trace; the answer is grounded in real, dated, citable sources.