🪄 SwiftContext: the zero-LLM replacement for FastContext

SwiftContext does everything FastContext used to do, plus five things it never could, for $0 in LLM tokens.

When Microsoft's FastContext (FastContext-1.0-4B-SFT) vanished from the Hub in June 2026, coding agents lost their dedicated repository-exploration subagent. SwiftContext rebuilds that capability from scratch: no 4B model, no GPU, and not a single token spent per query.

A 66M-parameter router decides how to search. A deterministic, AST-powered engine finds, ranks, traces, and explains your code in milliseconds. No hallucinated line numbers. No API bill.


⚡ Why this exists

Running a 4B LLM just to answer "where is login() defined?" is like hiring a research assistant to look up a word in a dictionary. FastContext proved that dedicated exploration subagents help coding agents (+5.5% SWE-bench resolution, -60% tokens), but it required GPU inference on every single query.

SwiftContext keeps the win and drops the cost. A tiny DistilBERT router (~5ms, CPU) classifies query intent, then a deterministic engine (BM25, AST symbol tables, call graphs, and sentence-transformer embeddings) does the actual finding. Zero LLM calls in the hot path.
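
To make the ranking step concrete, here is a minimal sketch of Okapi BM25 scoring over pre-tokenized documents. It is not taken from inference.py: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    k1 and b are common textbook defaults, not SwiftContext's actual settings.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF (the +1 inside the log keeps it non-negative).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "def login user password check".split(),
    "class bm25index okapi ranking".split(),
    "def logout user session".split(),
]
scores = bm25_scores(["bm25index"], docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Because every quantity is a count over the indexed text, the same query over the same index always produces the same ranking.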


🥊 SwiftContext vs. FastContext

| Capability | FastContext (4B LLM) | SwiftContext |
| --- | --- | --- |
| Search ranking | LLM confidence (opaque) | Okapi BM25 + 4-signal scoring |
| Semantic / fuzzy search | ✅ (via LLM) | ✅ MiniLM-L6-v2 embeddings |
| Persistent index | ❌ rebuilt every run | ✅ .swiftcontext/ + MD5 incremental |
| Symbol table (kind, sig, docstring) | ❌ | ✅ 21-language extraction (AST for Python, regex elsewhere) |
| Call graph | ❌ | ✅ full graph + O(k) reverse lookup |
| trace(): who calls / is called by X | ❌ not supported | ✅ |
| explain(): docs, signature, deps | ❌ not supported | ✅ |
| summarize(): what does this code do? | ❌ not supported | ✅ pure AST, no LLM |
| context(): multi-file LLM-ready context window | ❌ (LLM re-explores each turn) | ✅ to_llm_context() |
| GPU required for queries | ✅ 4B model | ❌ CPU is enough |
| LLM tokens per query | ~2,000 | 0 |
| Line-number accuracy | ~70% (LLM hallucination) | 100% (reads the actual file) |
| Output format | Plain file.py:L45-L67 | Structured JSON: relevance, reason, deps, snippet |
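
The call-graph row can be illustrated with a small sketch that builds both a forward map and a reverse map in one pass, so that "who calls X" becomes a single dictionary lookup returning the k callers. This assumes that is what "O(k) reverse lookup" refers to; `build_call_graph` is a hypothetical name, and the sketch only handles direct calls to bare names inside top-level Python functions.

```python
import ast
from collections import defaultdict

def build_call_graph(source):
    """Return (callees, callers) maps for top-level functions in `source`."""
    callees = defaultdict(set)   # function -> names it calls
    callers = defaultdict(set)   # name -> functions that call it (reverse index)
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for sub in ast.walk(node):
            # Only direct calls such as rank(x); method calls like x.split()
            # are ignored in this simplified version.
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                callees[node.name].add(sub.func.id)
                callers[sub.func.id].add(node.name)
    return callees, callers

SOURCE = '''
def tokenize(text):
    return text.split()

def search(query):
    return rank(tokenize(query))

def rank(tokens):
    return sorted(tokens)
'''
callees, callers = build_call_graph(SOURCE)
```

Maintaining the reverse map at index time is the design choice that avoids scanning every function body when answering a "who calls this" query.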

🧠 Architecture

                    User Query
                        │
                        ▼
        ┌────────────────────────────────┐
        │   SwiftContext Router (66M)    │   ← DistilBERT, ~5ms, CPU
        │   + heuristic fast-path layer  │
        └───────────────┬────────────────┘
                        │  strategy = broad_scan / targeted_search / pinpoint_cite
                        ▼
        ┌─────────────────────────────────────────────┐
        │             RepoIndex (cached)              │
        │  BM25 · Symbol Table · Call Graph           │
        │  Import Resolver · Semantic (MiniLM) Index  │
        └───────────────┬─────────────────────────────┘
                        │
      ┌──────────┬──────┼─────────┬───────────┐
      ▼          ▼      ▼         ▼           ▼
  explore()  trace()  explain() summarize()  context()   (all 0 LLM tokens)

🚀 Five APIs, one pipeline

from inference import SwiftContextPipeline

sc = SwiftContextPipeline(router_path="./model/final", repo_path=".")

# 1. explore(): ranked code citations (BM25 + semantic + symbol match)
result = sc.explore("Find the BM25Index class")

# 2. trace(): call chain (who calls this, what does it call)
chain = sc.trace("explore")

# 3. explain(): signature, docstring, location, deps
doc = sc.explain("BM25Index")

# 4. summarize(): "what does this do?" description via pure AST analysis
summary = sc.summarize("search")

# 5. context(): full multi-file LLM-ready context window
ctx = sc.context("How does BM25 ranking work end to end?")
print(ctx.to_llm_context())   # ready to paste into any LLM prompt

Real output from the demo (self-hosted: SwiftContext explores its own code)

query    : 'Find the BM25Index class'
strategy : pinpoint_cite  conf=0.85  latency=8.7 ms  tokens=0 (FC avg ~2000)  saved=40.0%
  [1.00] inference.py:L672-761  Direct definition of `BM25Index` - exact AST symbol match
         doc: Okapi BM25 - industry-standard IR ranking.

[Context] 2 primary, 1 caller, 3 callee, ~604 tokens
  (FastContext built equivalent context in 2-3 LLM turns ≈ 6,000 tokens)

~90% fewer tokens than FastContext's multi-turn LLM browsing, for an equivalent context window.


🎯 The router: the part that ships as a model

The DistilBERT classifier included in this repo (model/final/) is the strategic core: it decides which search strategy the deterministic engine should run, in ~5ms on CPU.

| Label | Meaning | Example |
| --- | --- | --- |
| broad_scan | Wide exploration; file/module unknown | "How does the whole pipeline indexing work?" |
| targeted_search | Specific named symbol to locate | "Where is the SwiftContextRouter predict method?" |
| pinpoint_cite | Exact line-level citation of scoped code | "Find the BM25Index class" |

from transformers import pipeline

router = pipeline("text-classification", model="tripathyShaswata/SwiftContext")
router("Find the BM25Index class")
# [{'label': 'pinpoint_cite', 'score': 0.85}]

Test F1: 100% across all 3 classes, backed by a heuristic pre-classification layer for common patterns (verb-first commands, exact identifiers) that fires before model inference even runs.
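
A heuristic pre-classification layer of this kind could look roughly like the sketch below. The regular expression, the verb list, the opener phrases, and the mapping from pattern to label are all hypothetical; the actual rules in inference.py may differ. Returning `None` means no rule fired and the query should go to the DistilBERT classifier.

```python
import re

# Hypothetical identifier pattern: camelCase/PascalCase, snake_case, or name().
IDENTIFIER = re.compile(r"\b\w*[a-z0-9][A-Z]\w*\b|\b\w+_\w+\b|\b\w+\(\)")
# Hypothetical verb-first commands and broad-question openers.
PINPOINT_VERBS = {"find", "show", "locate", "open"}
BROAD_OPENERS = ("how does", "how is", "explain how")

def heuristic_route(query):
    """Return a strategy label, or None to defer to the classifier."""
    text = query.strip()
    lowered = text.lower()
    first_word = lowered.split()[0] if lowered else ""
    has_identifier = bool(IDENTIFIER.search(text))
    # Verb-first command naming an exact identifier.
    if first_word in PINPOINT_VERBS and has_identifier:
        return "pinpoint_cite"
    # Open-ended question with no concrete symbol in it.
    if lowered.startswith(BROAD_OPENERS) and not has_identifier:
        return "broad_scan"
    return None

label = heuristic_route("Find the BM25Index class")
```

Queries that match no rule, such as "Where is the SwiftContextRouter predict method?", fall through to the model in this sketch.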


📦 What's in this repo

| File | Purpose |
| --- | --- |
| inference.py | Full production pipeline: BM25, symbol table, call graph, semantic index, all 5 APIs, and a self-hosted demo |
| model/final/ | Trained DistilBERT router weights + tokenizer |
| generate_dataset.py | Generates the 900-example stratified router training set |
| train.py | Training script (5 epochs, fp16, 2e-5 LR) |
| push_to_hub.py | Upload script |
| requirements.txt | Dependencies (sentence-transformers is optional; graceful degradation if absent) |

🏁 Quick start

pip install -r requirements.txt

# Run the full demo: all 5 APIs, zero GPU required
python inference.py

from inference import SwiftContextPipeline

sc = SwiftContextPipeline("./model/final", repo_path="/path/to/any/repo")
result = sc.explore("How is authentication implemented?")
for c in result.citations:
    print(f"[{c.relevance:.2f}] {c.file}:L{c.start_line}-{c.end_line}  {c.reason}")

Works out of the box on 21 languages (Python gets full AST extraction; JS/TS/Java/Go/Rust/C#/etc. get high-fidelity regex extraction).
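
For the Python path, symbol extraction with the standard `ast` module might look like the following sketch. The function name `extract_symbols` and the dictionary keys are illustrative assumptions, not SwiftContext's actual schema. Line numbers come straight from the parser, which is why they cannot be hallucinated.

```python
import ast

def extract_symbols(source):
    """Collect kind, name, line span, signature, and docstring per definition."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            kind, signature = "class", f"class {node.name}"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
            # ast.unparse (Python 3.9+) renders the argument list as source text.
            signature = f"def {node.name}({ast.unparse(node.args)})"
        else:
            continue
        symbols.append({
            "kind": kind,
            "name": node.name,
            "start_line": node.lineno,
            "end_line": node.end_lineno,
            "signature": signature,
            "doc": ast.get_docstring(node),
        })
    return symbols

SOURCE = '''class BM25Index:
    """Okapi BM25 ranking."""

    def search(self, query, k=5):
        return []
'''
symbols = extract_symbols(SOURCE)
```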


📊 Performance

  • Router inference: ~5ms CPU, no GPU needed
  • First index build: a few seconds per 1,000 files (then cached)
  • Cached query latency: 0.4ms (trace/explain/summarize) to ~10ms (explore/context)
  • Index persistence: .swiftcontext/index.json, MD5-gated, so only changed files are re-indexed
  • Tokens spent per query: 0 (vs. FastContext's ~2,000)

🧩 Limitations

  • The router is trained on English, template-generated query patterns β€” very unusual phrasing may fall back to the base model's confidence rather than a heuristic hit.
  • summarize() behavior descriptions are AST-derived (reads/writes/calls/raises/returns), not a full natural-language paraphrase β€” it won't replace an LLM for deep semantic explanation of why code exists, only what it does.
  • Semantic search requires sentence-transformers; without it, SwiftContext gracefully falls back to BM25 + symbol matching only.

📜 License

MIT: use it however you want, commercial use included.

🙏 Acknowledgment

Built in response to the removal of Microsoft's FastContext (arXiv:2606.14066). Not affiliated with Microsoft; this is an independent, fully open-source reimplementation of the idea, redesigned around zero-LLM determinism.
