---
title: Code Search API
colorFrom: yellow
colorTo: red
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# Code Search API

A FastAPI REST API for semantic code search, powered by the
[`jinaai/jina-embeddings-v2-base-code`](https://huggingface.co/jinaai/jina-embeddings-v2-base-code)
embedding model and FAISS approximate nearest-neighbour (ANN) search.

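Conceptually, a search embeds the query with the same model and returns the indexed chunks whose vectors lie closest to it. The exact (brute-force) version of that ranking is sketched below in pure Python using cosine similarity. Hand-made 3-dimensional vectors stand in for real model embeddings, and the chunk IDs are hypothetical. FAISS answers the same top-k question approximately but far faster on large indexes; which similarity metric this Space configures is an assumption here.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Exact nearest-neighbour search: score every chunk, return the k best (id, score) pairs."""
    q = normalize(query)
    scored = [
        (chunk_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for chunk_id, vec in chunks.items()
    ]
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-d vectors standing in for real embeddings (hypothetical values).
chunks = {
    "db.py::get_user": [0.9, 0.1, 0.0],
    "http.py::fetch_url": [0.2, 0.9, 0.1],
    "math.py::mean": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.2, 0.0], chunks, k=2))
```

With this query, `db.py::get_user` ranks first because its vector points in nearly the same direction as the query vector.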
## What's new (v2)

| Area | Before | After |
|---|---|---|
| Model | pplx-embed-v1-0.6B × 2 | jina-embeddings-v2-base-code × 1 |
| Embedding speed | ~2 s / batch | ~500 ms / batch |
| Search latency (100K chunks) | ~2 000 ms | ~5 ms |
| Chunking | Sentence windows | AST (Python) / regex (other languages) |
| Persistence | Lost on restart | Saved to `/data` volume |
| Batch indexing | ❌ | ✅ `/index/batch` |

## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/` | Health check |
| `GET` | `/health` | Model status |
| `POST` | `/index` | Upload & index a single source file |
| `POST` | `/index/batch` | Index an entire codebase in one call |
| `POST` | `/search` | Search an indexed document / codebase |
| `POST` | `/embed` | Embed arbitrary texts (raw vectors) |
| `GET` | `/documents` | List indexed document IDs |
| `DELETE` | `/documents/{doc_id}` | Remove a document |

Interactive docs are available at `/docs` (Swagger UI).

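For scripting the management endpoints from Python, a thin wrapper keeps the base URL and timeouts in one place. The sketch below uses only the standard library. `CodeSearchClient` and its `transport` hook (which lets you swap in a fake for testing) are hypothetical; the `{"texts": [...]}` body for `/embed` and the shape of every response are assumptions, so check `/docs` for the real schemas.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Optional

# A transport sends (method, url, json_body) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[dict]], Any]


def urllib_transport(method: str, url: str, body: Optional[dict] = None) -> Any:
    """Default transport: send a JSON request with urllib and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=60) as resp:
        raw = resp.read()
        return json.loads(raw.decode("utf-8")) if raw else None  # tolerate empty bodies


class CodeSearchClient:
    """Hypothetical wrapper around several endpoints from the Endpoints table."""

    def __init__(self, base_url: str, transport: Transport = urllib_transport):
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def _call(self, method: str, path: str, body: Optional[dict] = None) -> Any:
        return self.transport(method, self.base_url + path, body)

    def health(self) -> Any:
        return self._call("GET", "/health")

    def list_documents(self) -> Any:
        return self._call("GET", "/documents")

    def delete_document(self, doc_id: str) -> Any:
        # Percent-encode the ID (including "/") so it stays a single path segment.
        return self._call("DELETE", "/documents/" + urllib.parse.quote(doc_id, safe=""))

    def embed(self, texts: list[str]) -> Any:
        # ASSUMPTION: the request field is named "texts"; confirm it in /docs.
        return self._call("POST", "/embed", {"texts": texts})


# Usage (needs a running Space):
# client = CodeSearchClient("https://YOUR-SPACE.hf.space")
# print(client.list_documents())
```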
## Quick start

### Index a single file

```bash
curl -X POST https://YOUR-SPACE.hf.space/index \
  -F "file=@src/utils.py" \
  -F "doc_id=utils"
```

### Index a whole project (IDE integration)

```python
import os
import requests

def index_project(base_url: str, project_path: str, doc_id: str) -> dict:
    # File extensions to collect (a subset of the supported languages).
    SUPPORTED = {".py", ".js", ".ts", ".tsx", ".go", ".rs", ".java", ".md"}
    files = []
    for root, _, filenames in os.walk(project_path):
        for fname in filenames:
            if os.path.splitext(fname)[1] in SUPPORTED:
                full_path = os.path.join(root, fname)
                # Send paths relative to the project root as filenames.
                rel_path = os.path.relpath(full_path, project_path)
                # Decode as UTF-8, replacing undecodable bytes instead of crashing.
                with open(full_path, "r", encoding="utf-8", errors="replace") as f:
                    files.append({"filename": rel_path, "content": f.read()})

    resp = requests.post(f"{base_url}/index/batch", json={
        "doc_id": doc_id,
        "files": files,
        "replace": True,
    }, timeout=300)
    resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    return resp.json()

result = index_project("https://YOUR-SPACE.hf.space", "./my_project", "my_project")
print(result)
# {"doc_id": "my_project", "files_indexed": 42, "chunks_indexed": 318}
```

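On a real checkout, `os.walk` also descends into version-control, dependency, and build directories, so the batch request can fill up with vendored or generated code. A minimal sketch of pruning those directories during the walk follows. `collect_files` and the `SKIP_DIRS` names are illustrative choices, not something the API requires; the returned list has the same `filename`/`content` shape that the example above sends to `/index/batch`.

```python
import os

# Illustrative directory names to skip; extend for your own tooling.
SKIP_DIRS = {".git", ".hg", "node_modules", ".venv", "venv", "__pycache__", "dist", "build"}
SUPPORTED = {".py", ".js", ".ts", ".tsx", ".go", ".rs", ".java", ".md"}


def collect_files(project_path: str) -> list[dict]:
    """Return [{"filename": ..., "content": ...}] for supported files, skipping SKIP_DIRS."""
    files = []
    for root, dirnames, filenames in os.walk(project_path):
        # Editing dirnames in place stops os.walk from descending into those directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for fname in filenames:
            if os.path.splitext(fname)[1] in SUPPORTED:
                full_path = os.path.join(root, fname)
                rel_path = os.path.relpath(full_path, project_path)
                with open(full_path, "r", encoding="utf-8", errors="replace") as f:
                    files.append({"filename": rel_path, "content": f.read()})
    return files
```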
### Search

```bash
curl -X POST https://YOUR-SPACE.hf.space/search \
  -H "Content-Type: application/json" \
  -d '{"doc_id": "my_project", "query": "fetch user from database", "top_k": 5}'
```

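The same search request can be sent from Python with only the standard library. Building the `Request` separately from sending it makes the payload easy to inspect. The body mirrors the curl example above; `build_search_request` and `search` are hypothetical helper names, the `top_k >= 1` check is a client-side assumption, and the response is returned as parsed JSON without assuming its fields (see `/docs` for the response schema).

```python
import json
import urllib.request


def build_search_request(base_url: str, doc_id: str, query: str, top_k: int = 5) -> urllib.request.Request:
    """Build the POST /search request shown in the curl example."""
    if top_k < 1:
        # Client-side sanity check; server-side limits are not documented here.
        raise ValueError("top_k must be at least 1")
    payload = {"doc_id": doc_id, "query": query, "top_k": top_k}
    return urllib.request.Request(
        base_url.rstrip("/") + "/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url: str, doc_id: str, query: str, top_k: int = 5):
    """Send the search request and return the decoded JSON response."""
    req = build_search_request(base_url, doc_id, query, top_k)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (needs a running Space):
# results = search("https://YOUR-SPACE.hf.space", "my_project", "fetch user from database")
```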
## Supported languages

Python (AST chunking); JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby, and PHP (regex chunking); Markdown and plain text (sentence chunking).

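The two code-chunking strategies can be sketched as follows. `chunk_python` uses the standard `ast` module to cut Python source at top-level `def`/`class` nodes (module-level statements outside them are dropped), and `chunk_regex` cuts other source text at lines that look like a declaration. Both functions and every pattern in `DECL_START` are assumptions for illustration: the patterns cover only JS/TS functions, Go `func`s, Rust `fn`s, and class declarations, and the Space's actual chunker may split at different boundaries or enforce size limits.

```python
import ast
import re


def chunk_python(source: str) -> list[str]:
    """Split Python source into one chunk per top-level function or class (AST-based)."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Include decorators, which sit above node.lineno.
            start = min([d.lineno for d in node.decorator_list] + [node.lineno])
            # lineno/end_lineno are 1-based and inclusive (end_lineno needs Python 3.8+).
            chunks.append("\n".join(lines[start - 1 : node.end_lineno]))
    return chunks


# Illustrative "start of a declaration" patterns (hypothetical, far from exhaustive).
DECL_START = re.compile(
    r"^[ \t]*(?:export[ \t]+)?(?:async[ \t]+)?function[ \t]+\w+"  # JS / TS functions
    r"|^func[ \t]"  # Go functions and methods
    r"|^[ \t]*(?:pub[ \t]+)?fn[ \t]+\w+"  # Rust functions
    r"|^[ \t]*(?:(?:public|abstract|final|export)[ \t]+)*class[ \t]+\w+",  # classes
    re.MULTILINE,
)


def chunk_regex(source: str) -> list[str]:
    """Split text at each line matching DECL_START; any preamble becomes its own chunk."""
    starts = [m.start() for m in DECL_START.finditer(source)]
    bounds = [0] + starts + [len(source)]
    chunks = [source[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c]  # drop empty pieces (e.g. a match at offset 0)
```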
## Persistence

Indexes are saved to the `/data` persistent volume after every `/index` or `/index/batch` call and restored automatically when the Space restarts, so no re-indexing is needed.
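The save-after-write, load-on-startup pattern described above can be sketched as follows. This is a hypothetical illustration rather than the Space's code: it persists only chunk metadata as JSON (the FAISS index itself would typically be saved alongside it, for example with `faiss.write_index` and `faiss.read_index`). It writes to a temporary file first and swaps it in with `os.replace`, so on POSIX filesystems a failed or interrupted write leaves the previous file intact. The `/data` path comes from this README; the `indexes.json` filename is an assumption.

```python
import json
import os
import tempfile

DATA_DIR = "/data"  # persistent volume mentioned in this README
STATE_FILE = "indexes.json"  # hypothetical filename


def save_state(state: dict, data_dir: str = DATA_DIR) -> None:
    """Atomically write index metadata: temp file in the same dir, then os.replace."""
    os.makedirs(data_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=data_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp_path, os.path.join(data_dir, STATE_FILE))
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(data_dir: str = DATA_DIR) -> dict:
    """Restore metadata at startup; an empty dict means nothing was saved yet."""
    path = os.path.join(data_dir, STATE_FILE)
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```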