---
title: Code Search API
colorFrom: yellow
colorTo: red
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# Code Search API

A FastAPI REST API for semantic code search powered by [`jinaai/jina-embeddings-v2-base-code`](https://huggingface.co/jinaai/jina-embeddings-v2-base-code) and FAISS approximate nearest-neighbour search.

## What's new (v2)

| Area | Before | After |
|---|---|---|
| Model | pplx-embed-v1-0.6B × 2 | jina-embeddings-v2-base-code × 1 |
| Embedding speed | ~2 s / batch | ~500 ms / batch |
| Search (100 K chunks) | ~2 000 ms | ~5 ms |
| Chunking | Sentence windows | AST (Python) / regex (other langs) |
| Persistence | Lost on restart | Saved to `/data` volume |
| Batch indexing | ❌ | ✅ `/index/batch` |

## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/` | Health check |
| `GET` | `/health` | Model status |
| `POST` | `/index` | Upload & index a single source file |
| `POST` | `/index/batch` | Index an entire codebase in one call |
| `POST` | `/search` | Search an indexed document / codebase |
| `POST` | `/embed` | Embed arbitrary texts (raw vectors) |
| `GET` | `/documents` | List indexed doc IDs |
| `DELETE` | `/documents/{doc_id}` | Remove a document |

Interactive docs available at `/docs` (Swagger UI).
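
The document-management endpoints in the table (`GET /documents` and `DELETE /documents/{doc_id}`) can also be called from Python. The following is a minimal sketch using only the standard library. The helper names (`documents_request`, `delete_request`, `send`) are hypothetical, and the assumption that responses are JSON is not confirmed by this README; check `/docs` for the actual schemas.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://YOUR-SPACE.hf.space"  # placeholder, replace with your Space URL


def documents_request(base_url: str) -> urllib.request.Request:
    """Build a GET request for /documents (lists indexed doc IDs)."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/documents", method="GET")


def delete_request(base_url: str, doc_id: str) -> urllib.request.Request:
    """Build a DELETE request for /documents/{doc_id}."""
    # Percent-encode the ID so spaces and other reserved characters
    # do not change the structure of the URL.
    safe_id = urllib.parse.quote(doc_id, safe="")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/documents/{safe_id}", method="DELETE"
    )


def send(req: urllib.request.Request, timeout: float = 30.0):
    """Send a request and parse the body as JSON (assumed response format)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(documents_request(BASE_URL))` would list the indexed IDs, and `send(delete_request(BASE_URL, "utils"))` would remove the document indexed under `utils`.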

## Quick start

### Index a single file

```bash
curl -X POST https://YOUR-SPACE.hf.space/index \
  -F "file=@src/utils.py" \
  -F "doc_id=utils"
```

### Index a whole project (IDE integration)

```python
import os

import requests


def index_project(base_url: str, project_path: str, doc_id: str):
    # File extensions to send to the API; adjust to match your project.
    SUPPORTED = {".py", ".js", ".ts", ".tsx", ".go", ".rs", ".java", ".md"}
    files = []
    for root, _, filenames in os.walk(project_path):
        for fname in filenames:
            if os.path.splitext(fname)[1] in SUPPORTED:
                full_path = os.path.join(root, fname)
                # Send paths relative to the project root as filenames.
                rel_path = os.path.relpath(full_path, project_path)
                with open(full_path, "r", encoding="utf-8", errors="replace") as f:
                    files.append({"filename": rel_path, "content": f.read()})

    # One request for the whole project; "replace" is set to True
    # so that any existing index under this doc_id is replaced.
    resp = requests.post(f"{base_url}/index/batch", json={
        "doc_id": doc_id,
        "files": files,
        "replace": True,
    }, timeout=300)
    resp.raise_for_status()  # fail loudly on HTTP errors
    return resp.json()


result = index_project("https://YOUR-SPACE.hf.space", "./my_project", "my_project")
print(result)
# {"doc_id": "my_project", "files_indexed": 42, "chunks_indexed": 318}
```

### Search

```bash
curl -X POST https://YOUR-SPACE.hf.space/search \
  -H "Content-Type: application/json" \
  -d '{"doc_id": "my_project", "query": "fetch user from database", "top_k": 5}'
```

## Supported languages

- Python: AST chunking
- JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby, PHP: regex chunking
- Markdown and plain text: sentence chunking

## Persistence

Indexes are saved to the `/data` persistent volume after every `/index` or `/index/batch` call and are automatically restored when the Space restarts, so no re-indexing is needed.
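
Because indexes are restored on restart, a client can query an existing `doc_id` directly without indexing it again. The sketch below mirrors the `curl` search call in Python using only the standard library. The function names `search_request` and `search` are hypothetical, and the response is returned as parsed JSON without assuming any particular fields, since this README does not document the response schema.

```python
import json
import urllib.request


def search_request(base_url: str, doc_id: str, query: str, top_k: int = 5) -> urllib.request.Request:
    """Build the POST /search request with the same JSON body as the curl example."""
    payload = {"doc_id": doc_id, "query": query, "top_k": top_k}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url: str, doc_id: str, query: str, top_k: int = 5, timeout: float = 30.0):
    """Send the search request and return the decoded JSON response as-is."""
    req = search_request(base_url, doc_id, query, top_k)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call such as `search("https://YOUR-SPACE.hf.space", "my_project", "fetch user from database")` sends the same request as the `curl` command in the Search section.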