---
title: Code Search API
colorFrom: yellow
colorTo: red
sdk: docker
pinned: false
license: mit
app_port: 7860
---

Code Search API

A FastAPI REST API for semantic code search powered by jinaai/jina-embeddings-v2-base-code and FAISS approximate nearest-neighbour search.
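The ranking step behind semantic search can be illustrated without FAISS or the embedding model. The sketch below assumes chunk and query embeddings are L2-normalised and ranked by inner product (that is, cosine similarity); this README does not say which FAISS index type or metric the service actually uses. It performs exact brute-force search. FAISS's `IndexFlatIP` computes the same exact ranking efficiently, and approximate indexes trade a little recall for much lower latency.

```python
import heapq
import math
from typing import Sequence


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else [0.0] * len(vec)


def top_k_cosine(query: Sequence[float],
                 vectors: Sequence[Sequence[float]],
                 k: int) -> list[tuple[int, float]]:
    """Exact top-k search: return (chunk_index, score) pairs, best match first."""
    q = l2_normalize(query)
    scored = (
        (i, sum(a * b for a, b in zip(q, l2_normalize(v))))
        for i, v in enumerate(vectors)
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 2-D "embeddings"; real jina-embeddings-v2 vectors have far more dimensions.
chunk_vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
print(top_k_cosine([1.0, 0.1], chunk_vectors, k=2))  # chunk 0 ranks first, then chunk 1
```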

What's new (v2)

| Area | Before | After |
|---|---|---|
| Model | pplx-embed-v1-0.6B × 2 | jina-embeddings-v2-base-code × 1 |
| Embedding speed | ~2 s / batch | ~500 ms / batch |
| Search (100K chunks) | ~2,000 ms | ~5 ms |
| Chunking | Sentence windows | AST (Python) / regex (other languages) |
| Persistence | Lost on restart | Saved to `/data` volume |
| Batch indexing | ❌ | ✅ `/index/batch` |
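AST chunking for Python can be sketched with the standard `ast` module. This is an assumption about the general approach, not the service's code: it emits one chunk per top-level function or class (decorators included) and ignores module-level statements, whereas the real chunker may also handle nested definitions or size limits.

```python
import ast


def ast_chunks(source: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above the `def`/`class` line, so start at the earliest one.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno  # requires Python 3.8+
            chunks.append({
                "name": node.name,
                "start_line": start,
                "end_line": end,
                "text": "\n".join(lines[start - 1:end]),
            })
    return chunks


sample = '''import os

@cache
def load(path):
    return open(path).read()

class Repo:
    def get(self, key):
        return key
'''

for chunk in ast_chunks(sample):
    print(chunk["name"], chunk["start_line"], chunk["end_line"])
# load 3 5
# Repo 7 9
```

Chunking on syntax boundaries keeps each embedded unit a complete definition, which tends to match code queries better than fixed-size sentence windows.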

Endpoints

| Method | Path | Description |
|---|---|---|
| `GET` | `/` | Health check |
| `GET` | `/health` | Model status |
| `POST` | `/index` | Upload & index a single source file |
| `POST` | `/index/batch` | Index an entire codebase in one call |
| `POST` | `/search` | Search an indexed document / codebase |
| `POST` | `/embed` | Embed arbitrary texts (raw vectors) |
| `GET` | `/documents` | List indexed doc IDs |
| `DELETE` | `/documents/{doc_id}` | Remove a document |
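The management endpoints can be called from any HTTP client. A dependency-free sketch using Python's standard `urllib`: `build_request` and `call` are hypothetical helper names, and the `{"texts": [...]}` body for `/embed` is an assumption, since this README does not document request or response schemas (the Swagger UI at `/docs` does).

```python
import json
import urllib.request
from typing import Any, Optional
from urllib.parse import quote


def build_request(base_url: str, method: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request, JSON-encoding the payload if given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(base_url.rstrip("/") + path, data=data,
                                  headers=headers, method=method)


def call(req: urllib.request.Request, timeout: float = 60) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


BASE = "https://YOUR-SPACE.hf.space"
list_docs = build_request(BASE, "GET", "/documents")
# Percent-encode the doc_id so it is safe as a single path segment.
delete_doc = build_request(BASE, "DELETE", "/documents/" + quote("my_project", safe=""))
# Assumed request body for /embed; check /docs for the real schema.
embed = build_request(BASE, "POST", "/embed", {"texts": ["def add(a, b): return a + b"]})
# call(list_docs)  # uncomment to query a running Space
```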

Interactive API docs are available at `/docs` (Swagger UI).

Quick start

Index a single file

```bash
curl -X POST https://YOUR-SPACE.hf.space/index \
  -F "file=@src/utils.py" \
  -F "doc_id=utils"
```

Index a whole project (IDE integration)

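```python
import os

import requests

# Extensions sent to the server; extend this to match "Supported languages" below.
SUPPORTED = {".py", ".js", ".ts", ".tsx", ".go", ".rs", ".java", ".md"}


def index_project(base_url: str, project_path: str, doc_id: str) -> dict:
    """Read every supported file under project_path and index them in one call."""
    files = []
    for root, _, filenames in os.walk(project_path):
        for fname in filenames:
            # Compare extensions case-insensitively (e.g. "Main.JAVA").
            if os.path.splitext(fname)[1].lower() in SUPPORTED:
                full_path = os.path.join(root, fname)
                # Forward slashes keep filenames identical across operating systems.
                rel_path = os.path.relpath(full_path, project_path).replace(os.sep, "/")
                with open(full_path, "r", encoding="utf-8", errors="replace") as f:
                    files.append({"filename": rel_path, "content": f.read()})

    resp = requests.post(f"{base_url}/index/batch", json={
        "doc_id": doc_id,
        "files": files,
        "replace": True,
    }, timeout=300)
    resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    return resp.json()


result = index_project("https://YOUR-SPACE.hf.space", "./my_project", "my_project")
print(result)
# e.g. {'doc_id': 'my_project', 'files_indexed': 42, 'chunks_indexed': 318}
```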
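For real projects it can help to skip dependency and build directories before anything is uploaded. The sketch below is an optional client-side refinement, not part of the API: `iter_source_files`, the `EXCLUDED_DIRS` names, and the `max_bytes` cap are illustrative assumptions. Its output can stand in for the `os.walk` loop in `index_project`.

```python
import os

# Illustrative exclusions; adjust for your project layout.
EXCLUDED_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv", "dist", "build"}


def iter_source_files(project_path: str, extensions: set, max_bytes: int = 1_000_000):
    """Yield (relative_path, full_path) for files worth indexing.

    Skips excluded directories entirely and files larger than max_bytes,
    which are often generated or minified.
    """
    for root, dirs, filenames in os.walk(project_path):
        # With top-down os.walk, editing `dirs` in place stops descent into those subtrees.
        dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
        for fname in filenames:
            if os.path.splitext(fname)[1].lower() not in extensions:
                continue
            full_path = os.path.join(root, fname)
            if os.path.getsize(full_path) > max_bytes:
                continue
            rel_path = os.path.relpath(full_path, project_path).replace(os.sep, "/")
            yield rel_path, full_path
```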

Search

```bash
curl -X POST https://YOUR-SPACE.hf.space/search \
  -H "Content-Type: application/json" \
  -d '{"doc_id": "my_project", "query": "fetch user from database", "top_k": 5}'
```
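The same search can be issued from Python, for example from an editor plugin. A minimal standard-library sketch: `build_search_request` and `search` are hypothetical helper names, and the result is returned as raw decoded JSON because this README does not document the response schema.

```python
import json
import urllib.request


def build_search_request(base_url: str, doc_id: str, query: str,
                         top_k: int = 5) -> urllib.request.Request:
    """Build the same POST /search request as the curl example."""
    body = json.dumps({"doc_id": doc_id, "query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url: str, doc_id: str, query: str, top_k: int = 5,
           timeout: float = 60):
    """Send the search and return the decoded JSON response (schema: see /docs)."""
    req = build_search_request(base_url, doc_id, query, top_k)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# hits = search("https://YOUR-SPACE.hf.space", "my_project", "fetch user from database")
```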

Supported languages

Python uses AST chunking; JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby, and PHP use regex chunking; Markdown and plain text use sentence chunking.
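Regex chunking can be sketched as "start a new chunk at every line that looks like a top-level definition". The patterns below are hypothetical and cover only Go, Rust, and JavaScript for illustration; the service's actual regexes and language coverage are not published in this README.

```python
import re

# Hypothetical "definition starts here" patterns, keyed by file extension.
DEF_PATTERNS = {
    ".go": re.compile(r"^(func|type)\s"),
    ".rs": re.compile(r"^(pub\s+)?(fn|struct|enum|trait|impl)\b"),
    ".js": re.compile(r"^(export\s+)?(async\s+)?(function|class)\b"),
}


def regex_chunks(source: str, ext: str) -> list[str]:
    """Split source into chunks, opening a new chunk at each matching definition line."""
    pattern = DEF_PATTERNS.get(ext)
    if pattern is None:
        return [source]  # unknown language: keep the whole file as one chunk
    chunks, current = [], []
    for line in source.splitlines():
        if pattern.match(line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


go_src = """package store

func Get(id int) User {
    return users[id]
}

func Put(u User) {
    users[u.ID] = u
}"""

for chunk in regex_chunks(go_src, ".go"):
    print(chunk.splitlines()[0])
# package store
# func Get(id int) User {
# func Put(u User) {
```

Anchoring patterns at the start of the line (`^`) means indented, nested definitions stay inside their parent chunk.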

Persistence

Indexes are saved to the `/data` persistent volume after every `/index` or `/index/batch` call and restored automatically when the Space restarts, so nothing needs to be re-indexed.
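A save-and-restore cycle like this can be sketched with a write-to-temp-then-rename pattern, so a crash mid-save never leaves a corrupt index on the volume. This is an illustrative assumption, not the service's storage format: it stores vectors and chunk metadata as one JSON file per `doc_id`, whereas a FAISS-backed service would more likely persist vectors with `faiss.write_index` / `faiss.read_index`. `save_index` and `load_all` are hypothetical names.

```python
import json
import os
import tempfile


def save_index(data_dir: str, doc_id: str, vectors: list, chunks: list) -> str:
    """Atomically write one document's vectors and chunk metadata to data_dir.

    Assumes doc_id is a safe filename (no path separators).
    """
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, f"{doc_id}.json")
    # Temp file in the same directory so os.replace is a same-filesystem rename.
    fd, tmp_path = tempfile.mkstemp(dir=data_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"doc_id": doc_id, "vectors": vectors, "chunks": chunks}, f)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # never leave a half-written temp file behind
        raise
    return path


def load_all(data_dir: str) -> dict:
    """Restore every saved document on startup, keyed by doc_id."""
    indexes = {}
    if not os.path.isdir(data_dir):
        return indexes
    for name in os.listdir(data_dir):
        if name.endswith(".json"):
            with open(os.path.join(data_dir, name), encoding="utf-8") as f:
                record = json.load(f)
            indexes[record["doc_id"]] = record
    return indexes
```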