# SwiftContext: the zero-LLM replacement for FastContext
SwiftContext does everything FastContext used to do, plus five things it never could, for $0 in LLM tokens.
When Microsoft's FastContext (FastContext-1.0-4B-SFT) vanished from the Hub in June 2026, coding agents lost their dedicated repository-exploration subagent. SwiftContext rebuilds that capability from scratch: no 4B model, no GPU, and not a single LLM token spent per query.
A 66M-parameter router decides how to search. A deterministic, AST-powered engine finds, ranks, traces, and explains your code in milliseconds. No hallucinated line numbers. No API bill.
## Why this exists
Running a 4B LLM just to answer "where is login() defined?" is like hiring a research assistant to look up a word in a dictionary. FastContext proved that dedicated exploration subagents help coding agents (+5.5% SWE-bench resolution, -60% tokens), but it required GPU inference on every single query.
SwiftContext keeps the win and drops the cost. A tiny DistilBERT router (~5ms on CPU) classifies query intent; then a deterministic engine (BM25, AST symbol tables, call graphs, and sentence-transformer embeddings) does the actual finding. Zero LLM calls in the hot path.
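The lexical ranking half of that engine is Okapi BM25. The block below is a minimal, textbook sketch over pre-tokenized documents, assuming the common defaults `k1=1.5` and `b=0.75`; `bm25_scores` is a hypothetical helper, not the code in `inference.py`, and tokenization plus SwiftContext's other scoring signals are left out.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (the "1 +" inside the log, as used by Lucene).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "class bm25index okapi bm25 ranking".split(),
    "def explore query citations".split(),
    "def trace callers callees".split(),
]
scores = bm25_scores(["bm25index", "ranking"], docs)
print(scores)
```

Only the first document contains the query terms, so it is the only one with a non-zero score.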
## SwiftContext vs. FastContext
| Capability | FastContext (4B LLM) | SwiftContext |
|---|---|---|
| Search ranking | LLM confidence (opaque) | Okapi BM25 + 4-signal scoring |
| Semantic / fuzzy search | Via the LLM | Yes: MiniLM-L6-v2 embeddings |
| Persistent index | No: rebuilt every run | Yes: `.swiftcontext/` + MD5 incremental |
| Symbol table (kind, sig, docstring) | No | Yes: 21 languages (AST for Python, regex for the rest) |
| Call graph | No | Yes: full graph + O(k) reverse lookup |
| `trace()`: who calls / is called by X | Not supported | Yes |
| `explain()`: docs, signature, deps | Not supported | Yes |
| `summarize()`: what does this code do? | Not supported | Yes: pure AST, no LLM |
| `context()`: multi-file LLM-ready context window | No (LLM re-explores each turn) | Yes: `to_llm_context()` |
| GPU required for queries | Yes: 4B model | No: CPU is enough |
| LLM tokens per query | ~2,000 | 0 |
| Line-number accuracy | ~70% (LLM hallucination) | 100% (reads the actual file) |
| Output format | Plain `file.py:L45-L67` | Structured JSON: relevance, reason, deps, snippet |
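The symbol-table row can be illustrated for Python with the standard `ast` module. A minimal sketch, assuming Python 3.9+ (for `ast.unparse`); the `Symbol` record and `extract_symbols` helper are hypothetical names, not SwiftContext's API, and the other languages (regex-based in SwiftContext) are not covered.

```python
import ast
from dataclasses import dataclass
from typing import Optional

@dataclass
class Symbol:
    name: str
    kind: str                 # "function", "async function", or "class"
    signature: str
    docstring: Optional[str]
    start_line: int
    end_line: int

def extract_symbols(source: str) -> list[Symbol]:
    """Record every function and class definition in a Python module."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "async function" if isinstance(node, ast.AsyncFunctionDef) else "function"
            signature = f"{node.name}({ast.unparse(node.args)})"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
            bases = ", ".join(ast.unparse(base) for base in node.bases)
            signature = f"class {node.name}({bases})" if bases else f"class {node.name}"
        else:
            continue
        # Line numbers come straight from the parser, so they match the file exactly.
        symbols.append(Symbol(node.name, kind, signature, ast.get_docstring(node),
                              node.lineno, node.end_lineno))
    return symbols

SAMPLE = (
    "class BM25Index:\n"
    '    """Okapi BM25 index."""\n'
    "    def search(self, query, k=10):\n"
    "        return []\n"
)
for sym in extract_symbols(SAMPLE):
    print(sym.kind, sym.signature, f"L{sym.start_line}-{sym.end_line}", sym.docstring)
```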
## Architecture
```text
                User Query
                     │
                     ▼
     ┌───────────────────────────────┐
     │  SwiftContext Router (66M)    │  ← DistilBERT, ~5ms, CPU
     │  + heuristic fast-path layer  │
     └───────────────┬───────────────┘
                     │  strategy = broad_scan / targeted_search / pinpoint_cite
                     ▼
  ┌─────────────────────────────────────────────┐
  │  RepoIndex (cached)                         │
  │  BM25 · Symbol Table · Call Graph           │
  │  Import Resolver · Semantic (MiniLM) Index  │
  └──────────────────────┬──────────────────────┘
                         │
     ┌──────────┬────────┼──────────┬────────────┐
     ▼          ▼        ▼          ▼            ▼
 explore()   trace() explain() summarize()   context()   (all 0 LLM tokens)
```
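The hand-off in the diagram amounts to a label-to-function dispatch: the router emits one of three strategy labels and the engine runs the matching search. This sketch assumes a `classify` callable standing in for the DistilBERT router; the three search functions are hypothetical placeholders that return strings, and falling back to `broad_scan` on an unknown label is an assumption, not documented behavior.

```python
from typing import Callable

# Hypothetical stand-ins for the deterministic engine's three search modes.
def broad_scan(query: str) -> str:
    return f"scan many files for: {query}"

def targeted_search(query: str) -> str:
    return f"look up named symbol for: {query}"

def pinpoint_cite(query: str) -> str:
    return f"cite exact lines for: {query}"

STRATEGIES: dict[str, Callable[[str], str]] = {
    "broad_scan": broad_scan,
    "targeted_search": targeted_search,
    "pinpoint_cite": pinpoint_cite,
}

def route_and_search(query: str, classify: Callable[[str], str]) -> str:
    """Ask the router for a strategy label, then run the matching search (no LLM call)."""
    label = classify(query)
    # Assumed fallback: an unrecognized label gets the widest search.
    return STRATEGIES.get(label, broad_scan)(query)

print(route_and_search("Find the BM25Index class", lambda q: "pinpoint_cite"))
```

Keeping the strategies in a plain dictionary means adding a fourth label is a one-line change and never touches the router.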
## Five APIs, one pipeline
```python
from inference import SwiftContextPipeline

sc = SwiftContextPipeline(router_path="./model/final", repo_path=".")

# 1. explore(): ranked code citations (BM25 + semantic + symbol match)
result = sc.explore("Find the BM25Index class")

# 2. trace(): call chain (who calls this, what it calls)
chain = sc.trace("explore")

# 3. explain(): signature, docstring, location, deps
doc = sc.explain("BM25Index")

# 4. summarize(): natural-language "what does this do?" via pure AST analysis
summary = sc.summarize("search")

# 5. context(): full multi-file LLM-ready context window
ctx = sc.context("How does BM25 ranking work end to end?")
print(ctx.to_llm_context())  # ready to paste into any LLM prompt
```
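Behind `trace()` sits a call graph. A minimal sketch for a single Python file, assuming purely name-based resolution (no import following or type inference); `build_call_graph` is a hypothetical helper, and nested functions are attributed to their enclosing function as well. Keeping a reverse map next to the forward one is what turns "who calls X?" into one dictionary lookup plus O(k) work over the k callers, instead of a scan of the whole graph.

```python
import ast
from collections import defaultdict

def build_call_graph(source: str):
    """Return (calls, called_by): forward and reverse call-graph adjacency maps."""
    calls = defaultdict(set)      # caller -> names it calls
    called_by = defaultdict(set)  # callee -> names that call it
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for inner in ast.walk(node):
            if isinstance(inner, ast.Call):
                func = inner.func
                # Reduce `foo()` and `obj.foo()` to the bare name "foo".
                name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
                if name:
                    calls[node.name].add(name)
                    called_by[name].add(node.name)
    return calls, called_by

SRC = (
    "def tokenize(text):\n"
    "    return text.lower().split()\n"
    "\n"
    "def search(query):\n"
    "    return rank(tokenize(query))\n"
    "\n"
    "def explore(query):\n"
    "    return search(query)\n"
)
calls, called_by = build_call_graph(SRC)
print("search calls:", sorted(calls["search"]))
print("search is called by:", sorted(called_by["search"]))
```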
Real output from the demo (self-hosted: SwiftContext explores its own code):

```text
query    : 'Find the BM25Index class'
strategy : pinpoint_cite conf=0.85 latency=8.7 ms tokens=0 (FC avg ~2000) saved=40.0%
  [1.00] inference.py:L672-761 Direct definition of `BM25Index` - exact AST symbol match
         doc: Okapi BM25 - industry-standard IR ranking.
[Context] 2 primary, 1 caller, 3 callee, ~604 tokens
(FastContext built equivalent context in 2-3 LLM turns, ~6,000 tokens)
```
~90% fewer tokens than FastContext's multi-turn LLM browsing, for an equivalent context window.
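The ~90% figure follows directly from the two token counts in the demo output; a quick check using those values:

```python
swift_tokens = 604          # "[Context] ... ~604 tokens" from the demo output
fastcontext_tokens = 6_000  # FastContext's 2-3 LLM turns for an equivalent context
reduction = 1 - swift_tokens / fastcontext_tokens
print(f"{reduction:.1%}")   # 89.9%
```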
## The router: the part that ships as a model
The DistilBERT classifier included in this repo (model/final/) is the strategic core: it decides which search strategy the deterministic engine should run, in ~5ms on CPU.
| Label | Meaning | Example |
|---|---|---|
| `broad_scan` | Wide exploration; file/module unknown | "How does the whole pipeline indexing work?" |
| `targeted_search` | Specific named symbol to locate | "Where is the SwiftContextRouter predict method?" |
| `pinpoint_cite` | Exact line-level citation of scoped code | "Find the BM25Index class" |
```python
from transformers import pipeline

router = pipeline("text-classification", model="tripathyShaswata/SwiftContext")
router("Find the BM25Index class")
# [{'label': 'pinpoint_cite', 'score': 0.85}]
```
Test F1: 100% across all 3 classes on the template-generated test split, backed by a heuristic pre-classification layer for common patterns (verb-first commands, exact identifiers) that fires before model inference even runs.
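A heuristic fast path of this kind can be as small as a few string and regex checks that return a label or defer to the model. The rules below are hypothetical (a short verb list and an "identifier-looking token" regex); the actual patterns in `inference.py` are not documented here, so treat this as an illustration of the fast-path-then-model ordering only.

```python
import re
from typing import Callable, Optional

# Hypothetical rules, not SwiftContext's actual pattern list.
PINPOINT_VERBS = ("find", "show", "cite", "locate")
# A token with an uppercase letter, digit, or underscore after its first character,
# e.g. BM25Index or SwiftContextRouter, is treated as an exact identifier.
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*[A-Z0-9_]\w*\b")

def heuristic_label(query: str) -> Optional[str]:
    """Return a strategy label for common patterns, or None to defer to the model."""
    words = query.strip().split()
    if not words:
        return None
    first = words[0].lower()
    has_identifier = bool(IDENTIFIER.search(query))
    if first in PINPOINT_VERBS and has_identifier:
        return "pinpoint_cite"      # verb-first command naming an exact symbol
    if first == "where" and has_identifier:
        return "targeted_search"    # "Where is X ...?"
    return None

def classify(query: str, model_predict: Callable[[str], str]) -> str:
    # Rules run first; the model is only invoked when no rule fires.
    return heuristic_label(query) or model_predict(query)

print(classify("Find the BM25Index class", lambda q: "broad_scan"))
```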
## What's in this repo
| File | Purpose |
|---|---|
| `inference.py` | Full production pipeline: BM25, symbol table, call graph, semantic index, all 5 APIs, and a self-hosted demo |
| `model/final/` | Trained DistilBERT router weights + tokenizer |
| `generate_dataset.py` | Generates the 900-example stratified router training set |
| `train.py` | Training script (5 epochs, fp16, 2e-5 LR) |
| `push_to_hub.py` | Upload script |
| `requirements.txt` | Dependencies (`sentence-transformers` is optional; SwiftContext degrades gracefully if it is absent) |
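One way to implement that optional dependency is to probe for the module without importing it. A minimal sketch using only the standard library, assuming the check is a module lookup for `sentence_transformers`; `pick_signals` and the signal names are hypothetical, chosen to mirror the BM25 + symbol-matching fallback described under Limitations.

```python
import importlib.util

def semantic_backend_available() -> bool:
    """True if the optional sentence-transformers package is installed (not imported)."""
    return importlib.util.find_spec("sentence_transformers") is not None

def pick_signals(has_semantic: bool) -> list[str]:
    # Hypothetical signal names: without embeddings, only lexical and symbol signals remain.
    signals = ["bm25", "symbol_match"]
    if has_semantic:
        signals.append("semantic")
    return signals

print(pick_signals(semantic_backend_available()))
```

Using `find_spec` avoids paying the import cost of the embedding stack just to learn whether it exists.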
## Quick start
```bash
pip install -r requirements.txt

# Run the full demo: all 5 APIs, zero GPU required
python inference.py
```
```python
from inference import SwiftContextPipeline

sc = SwiftContextPipeline("./model/final", repo_path="/path/to/any/repo")
result = sc.explore("How is authentication implemented?")
for c in result.citations:
    print(f"[{c.relevance:.2f}] {c.file}:L{c.start_line}-{c.end_line}  {c.reason}")
```
Works out of the box on 21 languages (Python gets full AST extraction; JS/TS/Java/Go/Rust/C#/etc. get high-fidelity regex extraction).
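For the non-Python languages, regex extraction means scanning source text for declaration patterns. A minimal sketch for JavaScript/TypeScript, assuming three hypothetical patterns (classes, function declarations, and arrow functions bound with `const`/`let`); SwiftContext's actual patterns are not documented here. Unlike an AST, these regexes see neither nesting nor end lines.

```python
import re

# Hypothetical declaration patterns; a real extractor needs many more cases.
JS_PATTERNS = {
    "class": re.compile(r"^\s*(?:export\s+)?(?:default\s+)?class\s+([A-Za-z_$][\w$]*)", re.M),
    "function": re.compile(r"^\s*(?:export\s+)?(?:async\s+)?function\s*\*?\s*([A-Za-z_$][\w$]*)", re.M),
    "arrow": re.compile(r"^\s*(?:export\s+)?(?:const|let)\s+([A-Za-z_$][\w$]*)\s*=\s*(?:async\s*)?\([^)]*\)\s*=>", re.M),
}

def extract_js_symbols(source: str) -> list[tuple[str, str, int]]:
    """Return (name, kind, 1-based line) for each declaration the patterns recognize."""
    found = []
    for kind, pattern in JS_PATTERNS.items():
        for match in pattern.finditer(source):
            # Count newlines before the captured name to get its line number.
            line = source.count("\n", 0, match.start(1)) + 1
            found.append((match.group(1), kind, line))
    return sorted(found, key=lambda item: item[2])

JS = (
    "export class AuthService {\n"
    "  login(user) { return check(user); }\n"
    "}\n"
    "export async function verifyToken(token) {\n"
    "  return true;\n"
    "}\n"
    "const hashPassword = (pw) => pw;\n"
)
print(extract_js_symbols(JS))
```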
## Performance
- Router inference: ~5ms on CPU, no GPU needed
- First index build: a few seconds per 1,000 files (then cached)
- Cached query latency: 0.4ms (`trace`/`explain`/`summarize`) to ~10ms (`explore`/`context`)
- Index persistence: `.swiftcontext/index.json`, MD5-gated, so only changed files are re-indexed
- Tokens spent per query: 0 (vs. FastContext's ~2,000)
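MD5 gating means each file's content hash is stored alongside the index and compared on the next run. A minimal sketch with hypothetical names (`files_to_reindex` and a separate `hashes.json` manifest); the real `.swiftcontext/index.json` layout is not documented here. MD5 is used only for change detection, hence `usedforsecurity=False` (Python 3.9+).

```python
import hashlib
import json
import tempfile
from pathlib import Path

def file_md5(path: Path) -> str:
    return hashlib.md5(path.read_bytes(), usedforsecurity=False).hexdigest()

def files_to_reindex(repo: Path, manifest_path: Path, pattern: str = "*.py") -> list[Path]:
    """Compare current MD5s with the stored manifest; return only new or changed files."""
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new, changed = {}, []
    for path in sorted(repo.rglob(pattern)):
        key = str(path.relative_to(repo))
        new[key] = file_md5(path)
        if old.get(key) != new[key]:
            changed.append(path)
    # Rewrite the manifest so deleted files drop out and changed hashes are updated.
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps(new, indent=2))
    return changed

with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    manifest = repo / ".swiftcontext" / "hashes.json"  # hypothetical file name
    (repo / "a.py").write_text("x = 1\n")
    (repo / "b.py").write_text("y = 2\n")
    first = files_to_reindex(repo, manifest)    # both files are new
    (repo / "b.py").write_text("y = 3\n")
    second = files_to_reindex(repo, manifest)   # only b.py changed
    print([p.name for p in first], [p.name for p in second])
```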
## Limitations
- The router is trained on English, template-generated query patterns. Very unusual phrasing will miss the heuristic fast path and rely on the model's prediction and confidence alone.
- `summarize()` behavior descriptions are AST-derived (reads/writes/calls/raises/returns), not a full natural-language paraphrase. They describe what code does, not why it exists, so they won't replace an LLM for deep semantic explanation.
- Semantic search requires `sentence-transformers`; without it, SwiftContext gracefully falls back to BM25 + symbol matching only.
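The reads/writes/calls/raises/returns facts in the second limitation can be gathered with a single walk over a function's syntax tree. A minimal sketch, assuming Python source and a hypothetical `describe_function` helper; it shows the kind of facts involved, not SwiftContext's actual summarizer, and the step that renders them into sentences is omitted.

```python
import ast

def describe_function(source: str, name: str) -> dict[str, list[str]]:
    """List what a function reads, writes, calls, raises, and returns, using only its AST."""
    tree = ast.parse(source)
    func = next(node for node in ast.walk(tree)
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name)
    facts = {"reads": set(), "writes": set(), "calls": set(), "raises": set(), "returns": set()}
    for node in ast.walk(func):
        if isinstance(node, ast.Name):
            # Store-context names are writes; all others (including called names) are reads.
            facts["writes" if isinstance(node.ctx, ast.Store) else "reads"].add(node.id)
        elif isinstance(node, ast.Call):
            facts["calls"].add(ast.unparse(node.func))
        elif isinstance(node, ast.Raise) and node.exc is not None:
            # Keep just the exception type for `raise ValueError("...")`.
            exc = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
            facts["raises"].add(ast.unparse(exc))
        elif isinstance(node, ast.Return) and node.value is not None:
            facts["returns"].add(ast.unparse(node.value))
    return {key: sorted(values) for key, values in facts.items()}

SRC = (
    "def search(index, query):\n"
    "    if not query:\n"
    "        raise ValueError('empty query')\n"
    "    hits = index.lookup(query)\n"
    "    return sorted(hits)\n"
)
facts = describe_function(SRC, "search")
print(facts)
```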
## License
MIT: use it however you want, including commercially.
## Acknowledgment
Built in response to the removal of Microsoft's FastContext (arXiv:2606.14066). Not affiliated with Microsoft: this is an independent, fully open-source reimplementation of the idea, redesigned around zero-LLM determinism.
## Evaluation results
- Test F1 (weighted): 1.000 (self-reported)