metadata
title: Code Search API
colorFrom: yellow
colorTo: red
sdk: docker
pinned: false
license: mit
app_port: 7860

Code Search API

A FastAPI REST API for semantic code search powered by jinaai/jina-embeddings-v2-base-code and FAISS approximate nearest-neighbour search.

What's new (v2)

| Area | Before | After |
| --- | --- | --- |
| Model | pplx-embed-v1-0.6B × 2 | jina-embeddings-v2-base-code × 1 |
| Embedding speed | ~2 s / batch | ~500 ms / batch |
| Search (100 K chunks) | ~2 000 ms | ~5 ms |
| Chunking | Sentence windows | AST (Python) / regex (other langs) |
| Persistence | Lost on restart | Saved to /data volume |
| Batch indexing | Not available | /index/batch |

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | / | Health check |
| GET | /health | Model status |
| POST | /index | Upload & index a single source file |
| POST | /index/batch | Index an entire codebase in one call |
| POST | /search | Search an indexed document / codebase |
| POST | /embed | Embed arbitrary texts (raw vectors) |
| GET | /documents | List indexed doc IDs |
| DELETE | /documents/{doc_id} | Remove a document |

Interactive docs are available at /docs (Swagger UI).
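The read-only and delete endpoints can also be called from Python using only the standard library. The sketch below builds requests for GET /health, GET /documents, and DELETE /documents/{doc_id}. The helper names (`build_request`, `delete_document_request`, `call`) are hypothetical, and the assumption that every endpoint answers with JSON is not confirmed by this README; check /docs for the actual response shapes.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://YOUR-SPACE.hf.space"  # placeholder, as in the other examples


def build_request(method: str, path: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a request for one of the endpoints listed above."""
    return urllib.request.Request(base_url.rstrip("/") + path, method=method)


def delete_document_request(doc_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # Percent-encode the ID so characters such as "/" or spaces stay in one path segment.
    encoded = urllib.parse.quote(doc_id, safe="")
    return build_request("DELETE", "/documents/" + encoded, base_url)


def call(req: urllib.request.Request, timeout: float = 30.0):
    """Send a request and decode the body, assuming the API responds with JSON."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `call(build_request("GET", "/documents"))` would fetch the indexed doc IDs, and `call(delete_document_request("utils"))` would remove the document indexed as `utils`.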

Quick start

Index a single file

curl -X POST https://YOUR-SPACE.hf.space/index \
  -F "file=@src/utils.py" \
  -F "doc_id=utils"

Index a whole project (IDE integration)

import os

import requests

def index_project(base_url: str, project_path: str, doc_id: str) -> dict:
    # Extensions to upload; extend this set to cover other supported languages.
    SUPPORTED = {".py", ".js", ".ts", ".tsx", ".go", ".rs", ".java", ".md"}
    files = []
    for root, _, filenames in os.walk(project_path):
        for fname in filenames:
            if os.path.splitext(fname)[1] in SUPPORTED:
                full_path = os.path.join(root, fname)
                rel_path  = os.path.relpath(full_path, project_path)
                # Read as UTF-8 and replace undecodable bytes instead of failing.
                with open(full_path, "r", encoding="utf-8", errors="replace") as f:
                    files.append({"filename": rel_path, "content": f.read()})

    resp = requests.post(f"{base_url}/index/batch", json={
        "doc_id":  doc_id,
        "files":   files,
        "replace": True,
    }, timeout=300)
    resp.raise_for_status()  # raise on HTTP errors rather than parsing an error body
    return resp.json()

result = index_project("https://YOUR-SPACE.hf.space", "./my_project", "my_project")
print(result)
# {"doc_id": "my_project", "files_indexed": 42, "chunks_indexed": 318}

Search

curl -X POST https://YOUR-SPACE.hf.space/search \
  -H "Content-Type: application/json" \
  -d '{"doc_id": "my_project", "query": "fetch user from database", "top_k": 5}'
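The same search can be issued from Python. This is a minimal sketch using only the standard library; the function names are hypothetical, the JSON body mirrors the curl example above, and the response shape is not documented in this README, so the decoded JSON is returned unchanged.

```python
import json
import urllib.request


def build_search_request(base_url: str, doc_id: str, query: str, top_k: int = 5) -> urllib.request.Request:
    """Build the POST /search request with the same JSON body as the curl example."""
    body = json.dumps({"doc_id": doc_id, "query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url: str, doc_id: str, query: str, top_k: int = 5, timeout: float = 30.0):
    """Send the request and return the decoded JSON response as-is."""
    req = build_search_request(base_url, doc_id, query, top_k)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.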

Supported languages

Python uses AST chunking. JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby, and PHP use regex chunking. Markdown and plain text use sentence chunking.
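To illustrate what AST chunking means for Python files, here is a minimal sketch built on the standard `ast` module. It is not the Space's actual chunker: the function name, the choice of one chunk per top-level function or class, and the returned dictionary keys are all assumptions made for illustration.

```python
import ast


def chunk_python_source(source: str) -> list:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno is 1-based and end_lineno is inclusive (Python 3.8+).
            start = node.lineno
            # Decorators sit above the def/class line, so start the chunk there.
            if node.decorator_list:
                start = min(d.lineno for d in node.decorator_list)
            chunks.append({
                "name": node.name,
                "start_line": start,
                "end_line": node.end_lineno,
                "text": "\n".join(lines[start - 1:node.end_lineno]),
            })
    return chunks
```

Splitting on syntactic boundaries keeps each function or class intact inside a single chunk, whereas a fixed-size sentence window can cut a definition in half. Module-level statements such as imports are skipped in this sketch.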

Persistence

Indexes are saved to the /data persistent volume after every /index or /index/batch call and are restored automatically when the Space restarts, so no re-indexing is needed.