🪄 SwiftContext — the zero-LLM replacement for FastContext

SwiftContext does everything FastContext used to do — and five things it never could — for $0 in LLM tokens.

When Microsoft's FastContext (FastContext-1.0-4B-SFT) vanished from the Hub in June 2026, coding agents lost their dedicated repository-exploration subagent. SwiftContext rebuilds that capability from scratch — without a 4B model, without a GPU, and without spending a single token per query.

A 66M-parameter router decides how to search. A deterministic, AST-powered engine finds, ranks, traces, and explains your code in milliseconds. No hallucinated line numbers. No API bill.

⚡ Why this exists

Running a 4B LLM just to answer "where is login() defined?" is like hiring a research assistant to look up a word in a dictionary. FastContext proved dedicated exploration subagents help coding agents (+5.5% SWE-bench resolution, -60% tokens) — but it required GPU inference on every single query.

SwiftContext keeps the win, drops the cost. A tiny DistilBERT router (~5ms, CPU) classifies query intent, then a deterministic engine — BM25, AST symbol tables, call graphs, and sentence-transformer embeddings — does the actual finding. Zero LLM calls in the hot path.
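Here is a minimal sketch of Okapi BM25 scoring in plain Python, to make the "deterministic engine" concrete. Several choices are illustrative assumptions: the `BM25` class, the simple tokenizer, the k1 = 1.5 / b = 0.75 defaults, and the Lucene-style IDF. The sketch does not reproduce SwiftContext's own `BM25Index` or its 4-signal scoring.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; real code search would also split camelCase/snake_case.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Minimal Okapi BM25 over a list of documents (e.g. file or chunk texts)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tf = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / max(len(self.docs), 1)
        n = len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))
        # Lucene-style IDF: stays positive even for terms found in most documents.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        dl = len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query: str) -> list[tuple[int, float]]:
        # (doc index, score) pairs, best match first.
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = [
    "class BM25Index: Okapi BM25 ranking",
    "def login(user): check password",
    "def search(query): rank results with bm25",
]
index = BM25(docs)
print(index.rank("bm25 ranking")[0][0])  # 0
```

The ranking is fully deterministic. The same query over the same files always produces the same order, and no model call is involved.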

🥊 SwiftContext vs. FastContext

| Capability | FastContext (4B LLM) | SwiftContext |
|---|---|---|
| Search ranking | LLM confidence (opaque) | Okapi BM25 + 4-signal scoring |
| Semantic / fuzzy search | ✅ (via LLM) | ✅ MiniLM-L6-v2 embeddings |
| Persistent index | ❌ rebuilt every run | ✅ `.swiftcontext/` + MD5-gated incremental updates |
| Symbol table (kind, signature, docstring) | ❌ | ✅ 21 languages (Python via AST, others via regex) |
| Call graph | ❌ | ✅ full graph + O(k) reverse lookup |
| `trace()`: who calls / is called by X | ❌ not supported | ✅ |
| `explain()`: docs, signature, deps | ❌ not supported | ✅ |
| `summarize()`: what does this code do? | ❌ not supported | ✅ pure AST, no LLM |
| `context()`: multi-file, LLM-ready context window | ❌ (LLM re-explores each turn) | ✅ `to_llm_context()` |
| GPU required for queries | ✅ 4B model | ❌ CPU is enough |
| LLM tokens per query | ~2,000 | 0 |
| Line-number accuracy | ~70% (LLM hallucination) | 100% (reads the actual file) |
| Output format | Plain `file.py:L45-L67` | Structured JSON: relevance, reason, deps, snippet |
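The table's "Symbol table (kind, signature, docstring)" row can be approximated for Python with nothing but the standard `ast` module. The sketch below is minimal. The `Symbol` dataclass and `extract_symbols` are hypothetical names, not the `inference.py` API. The `class Name` signature format is a simplifying assumption.

```python
import ast
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symbol:
    name: str
    kind: str                 # "class", "function", or "method"
    signature: str
    start_line: int           # 1-based, inclusive
    end_line: int             # 1-based, inclusive
    docstring: Optional[str]


def extract_symbols(source: str) -> list[Symbol]:
    """Collect classes, functions, and methods from Python source using the stdlib ast module."""
    symbols: list[Symbol] = []

    def visit(node: ast.AST, in_class: bool) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                symbols.append(Symbol(child.name, "class", f"class {child.name}",
                                      child.lineno, child.end_lineno, ast.get_docstring(child)))
                visit(child, in_class=True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "method" if in_class else "function"
                signature = f"def {child.name}({ast.unparse(child.args)})"
                symbols.append(Symbol(child.name, kind, signature,
                                      child.lineno, child.end_lineno, ast.get_docstring(child)))
                visit(child, in_class=False)  # nested defs are plain functions
            else:
                visit(child, in_class)  # descend into if/try/with blocks

    visit(ast.parse(source), in_class=False)
    return symbols
```

Line spans come straight from the parser (`lineno` / `end_lineno`), not from a model. This is how a citation's line range can point exactly at the code.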

🧠 Architecture

```text
                    User Query
                        │
                        ▼
        ┌────────────────────────────────┐
        │   SwiftContext Router (66M)    │   ← DistilBERT, ~5ms, CPU
        │   + heuristic fast-path layer  │
        └───────────────┬────────────────┘
                        │  strategy = broad_scan / targeted_search / pinpoint_cite
                        ▼
        ┌─────────────────────────────────────────────┐
        │            RepoIndex (cached)               │
        │  BM25 · Symbol Table · Call Graph           │
        │  Import Resolver · Semantic (MiniLM) Index  │
        └───────────────┬─────────────────────────────┘
                        │
    ┌─────────┬─────────┼───────────┬───────────┐
    ▼         ▼         ▼           ▼           ▼
explore()  trace()  explain()  summarize()  context()   (all 0 LLM tokens)
```
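The Call Graph box in the architecture diagram is what makes `trace()` cheap. If every call edge is stored in both directions when the index is built, "who calls X" becomes a dictionary lookup instead of a repository scan. Below is a minimal single-file sketch using Python's `ast`. `build_call_graph` and `trace` are hypothetical helpers, not the signatures in `inference.py`. The name-only call resolution is a simplifying assumption; the real engine also has an import resolver.

```python
import ast
from collections import defaultdict


def build_call_graph(source: str) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Return (callees, callers): forward and reverse call edges keyed by bare function name."""
    callees: dict[str, set[str]] = defaultdict(set)
    callers: dict[str, set[str]] = defaultdict(set)
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(fn):
            if not isinstance(node, ast.Call):
                continue
            # Name-only resolution: foo() -> "foo", obj.foo() -> "foo". No import or type
            # resolution; calls inside nested functions are also credited to the outer one.
            if isinstance(node.func, ast.Name):
                callee = node.func.id
            elif isinstance(node.func, ast.Attribute):
                callee = node.func.attr
            else:
                continue
            callees[fn.name].add(callee)
            callers[callee].add(fn.name)  # reverse edge, stored up front
    return dict(callees), dict(callers)


def trace(name: str, callees: dict[str, set[str]],
          callers: dict[str, set[str]]) -> dict[str, list[str]]:
    # Reverse lookup is one dict access plus O(k) work to list the k callers.
    return {"called_by": sorted(callers.get(name, ())),
            "calls": sorted(callees.get(name, ()))}
```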

🚀 Five APIs, one pipeline

```python
from inference import SwiftContextPipeline

sc = SwiftContextPipeline(router_path="./model/final", repo_path=".")

# 1. explore(): ranked code citations (BM25 + semantic + symbol match)
result = sc.explore("Find the BM25Index class")

# 2. trace(): call chain (who calls this, what it calls)
chain = sc.trace("explore")

# 3. explain(): signature, docstring, location, deps
doc = sc.explain("BM25Index")

# 4. summarize(): "what does this do?" via pure AST analysis
summary = sc.summarize("search")

# 5. context(): full multi-file, LLM-ready context window
ctx = sc.context("How does BM25 ranking work end to end?")
print(ctx.to_llm_context())   # ready to paste into any LLM prompt
```
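`summarize()` works without a model because an AST already records what a function reads, writes, calls, raises and returns. The minimal sketch below produces that kind of description for a Python function. The `summarize` helper shown here is hypothetical. It only tracks `self.` attributes, and its output wording is an assumption rather than SwiftContext's actual format.

```python
import ast


def summarize(source: str, name: str) -> str:
    """Describe what function `name` does from its AST alone: reads, writes, calls, raises, returns."""
    fn = next(  # raises StopIteration if no such function exists
        n for n in ast.walk(ast.parse(source))
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) and n.name == name
    )
    calls, raises, reads, writes = set(), set(), set(), set()
    returns_value = False
    for node in ast.walk(fn):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                calls.add(node.func.id)
            elif isinstance(node.func, ast.Attribute):
                calls.add(node.func.attr)
        elif isinstance(node, ast.Raise) and node.exc is not None:
            exc = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
            raises.add(ast.unparse(exc))
        elif isinstance(node, ast.Return) and node.value is not None:
            returns_value = True
        elif (isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name)
              and node.value.id == "self"):
            # self.x as an assignment target is a write; anywhere else it is a read.
            (writes if isinstance(node.ctx, ast.Store) else reads).add(f"self.{node.attr}")
    calls -= raises  # exception constructors are reported under "raises", not "calls"
    parts = [f"{verb} {', '.join(sorted(items))}"
             for verb, items in (("reads", reads), ("writes", writes),
                                 ("calls", calls), ("raises", raises)) if items]
    parts.append("returns a value" if returns_value else "returns None")
    return f"{name}(): " + "; ".join(parts)
```

The result is a factual inventory, not a paraphrase. It can say that a method raises `ValueError` and updates `self.size`, but it cannot say why.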

Real output from the demo (self-hosted: SwiftContext explores its own code):

```text
query    : 'Find the BM25Index class'
strategy : pinpoint_cite  conf=0.85  latency=8.7 ms  tokens=0 (FC avg ~2000)  saved=40.0%
  [1.00] inference.py:L672-761  Direct definition of `BM25Index` — exact AST symbol match
         doc: Okapi BM25 — industry-standard IR ranking.

[Context] 2 primary, 1 caller, 3 callee, ~604 tokens
  (FastContext built equivalent context in 2-3 LLM turns ≈ 6,000 tokens)
```

~90% fewer tokens than FastContext's multi-turn LLM browsing, for an equivalent context window.
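The ~90% figure follows from the two token counts in the demo output above (both counts are approximate):

```latex
\text{savings} = 1 - \frac{604}{6000} \approx 1 - 0.101 = 0.899 \approx 90\%
```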

🎯 The router: the part that ships as a model

The DistilBERT classifier included in this repo (`model/final/`) is the strategic core: it decides which search strategy the deterministic engine should run, in ~5ms on CPU.

| Label | Meaning | Example |
|---|---|---|
| `broad_scan` | Wide exploration: file/module unknown | "How does the whole pipeline indexing work?" |
| `targeted_search` | Specific named symbol to locate | "Where is the SwiftContextRouter predict method?" |
| `pinpoint_cite` | Exact line-level citation of scoped code | "Find the BM25Index class" |

```python
from transformers import pipeline

router = pipeline("text-classification", model="tripathyShaswata/SwiftContext")
router("Find the BM25Index class")
# [{'label': 'pinpoint_cite', 'score': 0.85}]
```

Test F1 is 100% across all 3 classes (self-reported, measured on template-generated queries; see Limitations). In front of the model sits a heuristic pre-classification layer for common patterns (verb-first commands, exact identifiers) that fires before model inference runs.
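A heuristic fast-path of this kind can be sketched as a few cheap string rules that either return a label or defer to the classifier. The rules, the verb list, and the helper names (`heuristic_route`, `route`, `has_code_identifier`) are hypothetical illustrations. The README does not document the actual patterns used in `inference.py`.

```python
import re
from typing import Callable, Optional, Tuple

COMMAND_VERBS = {"find", "show", "open", "cite"}  # hypothetical verb list


def has_code_identifier(query: str) -> bool:
    # CamelCase (BM25Index), snake_case (to_llm_context), or call syntax (login()).
    return any(
        re.search(r"[a-z0-9][A-Z]", tok) is not None or "_" in tok or tok.endswith("()")
        for tok in query.split()
    )


def heuristic_route(query: str) -> Optional[str]:
    """Return a strategy label when a cheap rule fires, or None to defer to the model."""
    q = query.strip()
    lowered = q.lower()
    words = lowered.split()
    if words and words[0] in COMMAND_VERBS and has_code_identifier(q):
        return "pinpoint_cite"
    if lowered.startswith(("where is", "where's")) and has_code_identifier(q):
        return "targeted_search"
    if lowered.startswith("how does") and not has_code_identifier(q):
        return "broad_scan"
    return None


def route(query: str, model_predict: Callable[[str], str]) -> Tuple[str, str]:
    """Try the rules first; only call the classifier when no rule matches."""
    label = heuristic_route(query)
    if label is not None:
        return label, "heuristic"
    return model_predict(query), "model"
```

In practice, `model_predict` could wrap the `transformers` pipeline shown above, for example `lambda q: router(q)[0]["label"]`. With that setup, only queries that slip past the rules pay for a forward pass.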

📦 What's in this repo

| File | Purpose |
|---|---|
| `inference.py` | Full production pipeline: BM25, symbol table, call graph, semantic index, all 5 APIs, and a self-hosted demo |
| `model/final/` | Trained DistilBERT router weights + tokenizer |
| `generate_dataset.py` | Generates the 900-example stratified router training set |
| `train.py` | Training script (5 epochs, fp16, learning rate 2e-5) |
| `push_to_hub.py` | Uploads the model to the Hugging Face Hub |
| `requirements.txt` | Dependencies (`sentence-transformers` is optional; SwiftContext degrades gracefully without it) |

🏁 Quick start

```bash
pip install -r requirements.txt

# Run the full demo: all 5 APIs, zero GPU required
python inference.py
```

```python
from inference import SwiftContextPipeline

sc = SwiftContextPipeline("./model/final", repo_path="/path/to/any/repo")
result = sc.explore("How is authentication implemented?")
for c in result.citations:
    print(f"[{c.relevance:.2f}] {c.file}:L{c.start_line}-{c.end_line}  {c.reason}")
```

Works out of the box on 21 languages (Python gets full AST extraction; JS/TS/Java/Go/Rust/C#/etc. get high-fidelity regex extraction).
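For languages without a Python-style `ast` module, regex extraction scans line by line for declaration keywords. The sketch below shows the general shape for four languages. The `PATTERNS` table and `extract_regex_symbols` are hypothetical, and the patterns are deliberately small. They are not SwiftContext's actual rules.

```python
import os
import re

# Hypothetical, deliberately small patterns; production extractors need many more cases
# (generics, decorators, multi-line signatures, code-like text inside comments or strings).
JS_PATTERNS = [
    ("function", r"^\s*(?:export\s+)?(?:default\s+)?(?:async\s+)?function\*?\s+(\w+)"),
    ("class", r"^\s*(?:export\s+)?(?:default\s+)?class\s+(\w+)"),
]
PATTERNS = {
    ".js": JS_PATTERNS,
    ".ts": JS_PATTERNS + [("interface", r"^\s*(?:export\s+)?interface\s+(\w+)")],
    ".go": [
        ("function", r"^func\s+(?:\([^)]*\)\s*)?(\w+)"),
        ("type", r"^type\s+(\w+)\s+(?:struct|interface)\b"),
    ],
    ".rs": [
        ("function", r"^\s*(?:pub(?:\([^)]*\))?\s+)?(?:async\s+)?fn\s+(\w+)"),
        ("struct", r"^\s*(?:pub(?:\([^)]*\))?\s+)?struct\s+(\w+)"),
    ],
}


def extract_regex_symbols(filename: str, source: str) -> list[tuple[str, str, int]]:
    """Return (kind, name, 1-based line) per match; unsupported extensions yield []."""
    ext = os.path.splitext(filename)[1].lower()
    compiled = [(kind, re.compile(p)) for kind, p in PATTERNS.get(ext, [])]
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for kind, rx in compiled:
            m = rx.match(line)
            if m:
                found.append((kind, m.group(1), lineno))
                break  # first matching pattern wins for a line
    return found
```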

📊 Performance

- Router inference: ~5ms on CPU, no GPU needed
- First index build: a few seconds per 1,000 files (then cached)
- Cached query latency: 0.4ms (trace/explain/summarize) to ~10ms (explore/context)
- Index persistence: `.swiftcontext/index.json`, MD5-gated, so only changed files are re-indexed
- Tokens spent per query: 0 (vs. FastContext's ~2,000)
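MD5 gating can be sketched as a manifest of per-file hashes that is compared on every run, so only new or modified files are re-indexed. This is a minimal sketch under two assumptions. The `hashes.json` manifest name and the `.py`-only filter are illustrative; the README only states that the index lives in `.swiftcontext/index.json`. `files_to_reindex` is a hypothetical helper.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path: Path) -> str:
    # MD5 is fine for change detection (not for security).
    return hashlib.md5(path.read_bytes()).hexdigest()


def files_to_reindex(repo: Path, manifest: Path, suffixes: tuple[str, ...] = (".py",)) -> list[Path]:
    """Return new or modified source files and rewrite the hash manifest for the next run."""
    old = json.loads(manifest.read_text()) if manifest.exists() else {}
    current, changed = {}, []
    for path in sorted(repo.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if manifest.parent in path.parents:
            continue  # never index the cache directory itself
        rel = path.relative_to(repo).as_posix()
        current[rel] = file_md5(path)
        if old.get(rel) != current[rel]:
            changed.append(path)
    # Files present in `old` but missing from `current` were deleted; a real index would purge them.
    manifest.parent.mkdir(parents=True, exist_ok=True)
    manifest.write_text(json.dumps(current, indent=2))
    return changed
```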

🧩 Limitations

- The router is trained on English, template-generated query patterns. Unusual phrasing may miss the heuristic fast-path and rely solely on the model's prediction, which may be less reliable outside the training distribution.
- `summarize()` behavior descriptions are AST-derived (reads/writes/calls/raises/returns), not a full natural-language paraphrase. It won't replace an LLM for a deep semantic explanation of why code exists; it describes only what the code does.
- Semantic search requires `sentence-transformers`; without it, SwiftContext gracefully falls back to BM25 + symbol matching only.
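The graceful fallback can be sketched as an optional import that returns either an embedder or `None`, with the ranker treating `None` as "lexical signals only". Several details are assumptions: the `sentence-transformers/all-MiniLM-L6-v2` checkpoint, the max-normalization, and the 50/50 blend. The README does not say how its 4 scoring signals are combined. `load_default_embedder` and `hybrid_rank` are hypothetical helpers.

```python
import math
from typing import Callable, List, Optional, Sequence

Embedder = Callable[[Sequence[str]], List[List[float]]]


def load_default_embedder() -> Optional[Embedder]:
    """Return a MiniLM embedder if sentence-transformers is installed, else None."""
    try:
        from sentence_transformers import SentenceTransformer
    except ImportError:
        return None  # graceful degradation: lexical signals only
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    return lambda texts: model.encode(list(texts)).tolist()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(query: str, docs: Sequence[str], lexical: Sequence[float],
                embed: Optional[Embedder] = None, weight: float = 0.5) -> List[int]:
    """Order doc indices by lexical score, blended with embedding similarity when available."""
    top = max(lexical, default=0.0) or 1.0
    scores = [s / top for s in lexical]  # scale lexical scores into [0, 1]
    if embed is not None:
        vectors = embed([query, *docs])
        q = vectors[0]
        scores = [(1 - weight) * s + weight * cosine(q, v) for s, v in zip(scores, vectors[1:])]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Callers pass `embed=load_default_embedder()` and get hybrid ranking when the package is installed and plain lexical ranking when it is not. No other code path changes.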

📜 License

MIT — use it however you want, commercial included.

🙏 Acknowledgment

Built in response to the removal of Microsoft's FastContext (arXiv:2606.14066). Not affiliated with Microsoft — an independent, fully open-source reimplementation of the idea, redesigned around zero-LLM determinism.


Paper: FastContext: Training Efficient Repository Explorer for Coding Agents (arXiv:2606.14066)

Evaluation results: Test F1 (weighted), self-reported: 1.000