---
title: Code Search API
colorFrom: yellow
colorTo: red
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# Code Search API
A FastAPI REST API for semantic code search powered by
[`jinaai/jina-embeddings-v2-base-code`](https://huggingface.co/jinaai/jina-embeddings-v2-base-code)
and FAISS approximate nearest-neighbour search.
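To make the idea concrete, here is a minimal sketch of what nearest-neighbour search over embeddings computes. It is an exact, brute-force version in pure Python, not the FAISS index this API uses. It also assumes cosine similarity as the metric, which this README does not state. The function name `cosine_top_k` is hypothetical.

```python
import math


def cosine_top_k(query, vectors, k=5):
    """Return (index, score) pairs for the k vectors most similar to the query.

    Brute-force illustration only: an ANN library such as FAISS approximates
    this ranking without comparing the query against every stored vector.
    """
    def normalise(v):
        # Guard against zero vectors so the division below never fails.
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    q = normalise(query)
    scored = []
    for i, v in enumerate(vectors):
        u = normalise(v)
        # Dot product of unit vectors equals cosine similarity.
        scored.append((i, sum(a * b for a, b in zip(q, u))))

    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In this sketch each stored vector would be the embedding of one code chunk, and the query vector the embedding of the search text.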
## What's new (v2)
| Area | Before | After |
|---|---|---|
| Model | pplx-embed-v1-0.6B × 2 | jina-embeddings-v2-base-code × 1 |
| Embedding speed | ~2 s / batch | ~500 ms / batch |
| Search (100 K chunks) | ~2 000 ms | ~5 ms |
| Chunking | Sentence windows | AST (Python) / regex (other langs) |
| Persistence | Lost on restart | Saved to `/data` volume |
| Batch indexing | ❌ | ✅ `/index/batch` |
## Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/` | Health check |
| `GET` | `/health` | Model status |
| `POST` | `/index` | Upload & index a single source file |
| `POST` | `/index/batch` | Index an entire codebase in one call |
| `POST` | `/search` | Search an indexed document / codebase |
| `POST` | `/embed` | Embed arbitrary texts (raw vectors) |
| `GET` | `/documents` | List indexed doc IDs |
| `DELETE` | `/documents/{doc_id}` | Remove a document |
Interactive docs are available at `/docs` (Swagger UI).
## Quick start
### Index a single file
```bash
curl -X POST https://YOUR-SPACE.hf.space/index \
  -F "file=@src/utils.py" \
  -F "doc_id=utils"
```
### Index a whole project (IDE integration)
```python
import os

import requests

# File extensions to send for indexing; extend this set as needed.
SUPPORTED = {".py", ".js", ".ts", ".tsx", ".go", ".rs", ".java", ".md"}


def index_project(base_url: str, project_path: str, doc_id: str) -> dict:
    """Walk project_path and index every supported file in one /index/batch call."""
    files = []
    for root, _, filenames in os.walk(project_path):
        for fname in filenames:
            if os.path.splitext(fname)[1] in SUPPORTED:
                full_path = os.path.join(root, fname)
                rel_path = os.path.relpath(full_path, project_path)
                # Replace undecodable bytes rather than failing on odd encodings.
                with open(full_path, "r", encoding="utf-8", errors="replace") as f:
                    files.append({"filename": rel_path, "content": f.read()})

    resp = requests.post(
        f"{base_url}/index/batch",
        json={"doc_id": doc_id, "files": files, "replace": True},
        timeout=300,
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json()


result = index_project("https://YOUR-SPACE.hf.space", "./my_project", "my_project")
print(result)
# Example response:
# {"doc_id": "my_project", "files_indexed": 42, "chunks_indexed": 318}
```
### Search
```bash
curl -X POST https://YOUR-SPACE.hf.space/search \
  -H "Content-Type: application/json" \
  -d '{"doc_id": "my_project", "query": "fetch user from database", "top_k": 5}'
```
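The same request can be sent from Python. The sketch below uses only the standard library and mirrors the JSON body of the `curl` call above. The helper names `build_search_request` and `search` are hypothetical. The response schema is not documented in this README, so the parsed JSON is returned unchanged.

```python
import json
import urllib.request


def build_search_request(base_url, doc_id, query, top_k=5):
    """Build the POST /search request without sending it."""
    payload = {"doc_id": doc_id, "query": query, "top_k": top_k}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url, doc_id, query, top_k=5, timeout=30.0):
    """Send the search request and return the decoded JSON response."""
    req = build_search_request(base_url, doc_id, query, top_k)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `search("https://YOUR-SPACE.hf.space", "my_project", "fetch user from database")` would issue the same request as the `curl` example.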
## Supported languages
| Chunking | Languages |
|---|---|
| AST | Python |
| Regex | JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby, PHP |
| Sentence | Markdown, plain text |
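As an illustration of AST chunking, the sketch below splits Python source into one chunk per top-level function or class using the standard `ast` module. This is an assumption about the general approach, not the chunker this API actually ships. The function name `chunk_python_source` and the chunk fields are hypothetical.

```python
import ast


def chunk_python_source(source: str) -> list:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # node.lineno points at the def/class line, so widen the span
            # to include any decorators above it.
            start = node.lineno
            if node.decorator_list:
                start = min(d.lineno for d in node.decorator_list)
            # Line numbers are 1-based and end_lineno is inclusive (Python 3.8+).
            text = "\n".join(lines[start - 1:node.end_lineno])
            chunks.append({
                "name": node.name,
                "start_line": start,
                "end_line": node.end_lineno,
                "text": text,
            })
    return chunks
```

Cutting on syntactic boundaries keeps each function or class intact, which sentence windows cannot guarantee. Module-level statements such as imports are skipped in this sketch.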
## Persistence
Indexes are saved to the `/data` persistent volume after every `/index` or `/index/batch` call and are restored automatically when the Space restarts, so no re-indexing is needed.
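The save-then-restore cycle can be sketched as follows. This is a simplified illustration, not the API's actual storage code. The function names `save_index` and `load_all_indexes` and the one-JSON-file-per-document layout are assumptions, and a real deployment would also persist the FAISS index itself.

```python
import json
import os
import tempfile


def save_index(data_dir, doc_id, chunks):
    """Write one document's chunks to data_dir, replacing any previous copy."""
    os.makedirs(data_dir, exist_ok=True)
    # Assumes doc_id contains no path separators.
    final_path = os.path.join(data_dir, f"{doc_id}.json")
    # Write to a temp file first so a crash never leaves a half-written index.
    fd, tmp_path = tempfile.mkstemp(dir=data_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(chunks, f)
    os.replace(tmp_path, final_path)  # atomic rename on the same filesystem
    return final_path


def load_all_indexes(data_dir):
    """Restore every saved document; returns {} if nothing was saved yet."""
    indexes = {}
    if not os.path.isdir(data_dir):
        return indexes
    for name in sorted(os.listdir(data_dir)):
        if name.endswith(".json"):
            with open(os.path.join(data_dir, name), encoding="utf-8") as f:
                indexes[name[:-len(".json")]] = json.load(f)
    return indexes
```

In this sketch, `save_index` would run at the end of each indexing request and `load_all_indexes("/data")` once at startup.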