metadata
title: Code Search API
colorFrom: yellow
colorTo: red
sdk: docker
pinned: false
license: mit
app_port: 7860
Code Search API
A FastAPI REST API for semantic code search powered by jinaai/jina-embeddings-v2-base-code and FAISS approximate nearest-neighbour search.
What's new (v2)
| Area | Before | After |
|---|---|---|
| Model | pplx-embed-v1-0.6B × 2 | jina-embeddings-v2-base-code × 1 |
| Embedding speed | ~2 s / batch | ~500 ms / batch |
| Search (100 K chunks) | ~2 000 ms | ~5 ms |
| Chunking | Sentence windows | AST (Python) / regex (other langs) |
| Persistence | Lost on restart | Saved to /data volume |
| Batch indexing | ❌ | ✅ /index/batch |
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | / | Health check |
| GET | /health | Model status |
| POST | /index | Upload & index a single source file |
| POST | /index/batch | Index an entire codebase in one call |
| POST | /search | Search an indexed document / codebase |
| POST | /embed | Embed arbitrary texts (raw vectors) |
| GET | /documents | List indexed doc IDs |
| DELETE | /documents/{doc_id} | Remove a document |
Interactive docs available at /docs (Swagger UI).
Quick start
Index a single file
curl -X POST https://YOUR-SPACE.hf.space/index \
-F "file=@src/utils.py" \
-F "doc_id=utils"
Index a whole project (IDE integration)
import os

import requests


def index_project(base_url: str, project_path: str, doc_id: str):
    # File extensions to send; everything else is skipped.
    SUPPORTED = {".py", ".js", ".ts", ".tsx", ".go", ".rs", ".java", ".md"}
    files = []
    for root, _, filenames in os.walk(project_path):
        for fname in filenames:
            if os.path.splitext(fname)[1] in SUPPORTED:
                full_path = os.path.join(root, fname)
                rel_path = os.path.relpath(full_path, project_path)
                # Replace undecodable bytes rather than aborting the walk.
                with open(full_path, "r", encoding="utf-8", errors="replace") as f:
                    files.append({"filename": rel_path, "content": f.read()})

    resp = requests.post(f"{base_url}/index/batch", json={
        "doc_id": doc_id,
        "files": files,
        "replace": True,
    }, timeout=300)
    resp.raise_for_status()  # fail loudly on HTTP errors
    return resp.json()


result = index_project("https://YOUR-SPACE.hf.space", "./my_project", "my_project")
print(result)
# {"doc_id": "my_project", "files_indexed": 42, "chunks_indexed": 318}
Search
curl -X POST https://YOUR-SPACE.hf.space/search \
-H "Content-Type: application/json" \
-d '{"doc_id": "my_project", "query": "fetch user from database", "top_k": 5}'
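The same search call can be made from Python. The sketch below uses only the standard library and mirrors the curl request above (same path, header, and JSON fields). The helper names `build_search_request` and `search` are hypothetical, and because the response schema is not documented here, the decoded JSON is returned untouched.

```python
import json
import urllib.request

BASE_URL = "https://YOUR-SPACE.hf.space"  # placeholder, as in the curl example


def build_search_request(base_url, doc_id, query, top_k=5):
    """Build the POST /search request without sending it (hypothetical helper)."""
    payload = {"doc_id": doc_id, "query": query, "top_k": top_k}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url, doc_id, query, top_k=5, timeout=30.0):
    """Send the request and return the decoded JSON body as-is."""
    req = build_search_request(base_url, doc_id, query, top_k)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `search(BASE_URL, "my_project", "fetch user from database")` sends the same body as the curl command. Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a running Space.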
Supported languages
Python (AST chunking); JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby, PHP (regex chunking); Markdown and plain text (sentence chunking).
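To illustrate what AST chunking means for Python files, here is a minimal sketch that emits one chunk per top-level function or class using the standard `ast` module. It is not the Space's actual chunker: the function name `chunk_python_source` and the chunk fields are assumptions made for this example, and a real implementation may also split large classes or keep module-level code.

```python
import ast


def chunk_python_source(source: str) -> list:
    """Split Python source into one chunk per top-level function or class.

    Illustrative only; the field names below are assumptions.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno / end_lineno are 1-based and inclusive (Python 3.8+).
            # Start at the first decorator, if any, so it stays with its definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                "start_line": start,
                "end_line": end,
                "text": "\n".join(lines[start - 1:end]),
            })
    return chunks
```

Chunking on syntax boundaries keeps each function or class intact, so a search hit maps back to a complete definition and a line range rather than an arbitrary window of text.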
Persistence
Indexes are saved to the /data persistent volume after every /index or /index/batch call and are restored automatically when the Space restarts, so no re-indexing is needed.