kamp0010 committed on
Commit 2af430f · verified · 1 Parent(s): a4ad4e0

Update README.md

Files changed (1)
  1. README.md +71 -17
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: pplx-embed Semantic Search API
  colorFrom: yellow
  colorTo: red
  sdk: docker
@@ -8,34 +8,88 @@ license: mit
  app_port: 7860
  ---

- # pplx-embed Semantic Search API

- A FastAPI REST API for semantic search powered by Perplexity's contextual embedding models.

  ## Endpoints

  | Method | Path | Description |
  |--------|------|-------------|
- | `GET` | `/` | Health check |
- | `GET` | `/health` | Model status |
- | `POST` | `/index` | Upload & index a `.txt` / `.md` file |
- | `POST` | `/search` | Search an indexed document |
- | `POST` | `/embed` | Embed arbitrary texts |
- | `GET` | `/documents` | List indexed documents |
- | `DELETE` | `/documents/{doc_id}` | Remove a document |

  Interactive docs available at `/docs` (Swagger UI).

- ## Quick Example

  ```bash
- # 1. Index a document
  curl -X POST https://YOUR-SPACE.hf.space/index \
- -F "file=@my_doc.txt" \
- -F "doc_id=my_doc"

- # 2. Search it
  curl -X POST https://YOUR-SPACE.hf.space/search \
  -H "Content-Type: application/json" \
- -d '{"doc_id": "my_doc", "query": "What is the main conclusion?", "top_k": 5}'
- ```
  ---
+ title: Code Search API
  colorFrom: yellow
  colorTo: red
  sdk: docker

  app_port: 7860
  ---

+ # Code Search API

+ A FastAPI REST API for semantic code search powered by
+ [`jinaai/jina-embeddings-v2-base-code`](https://huggingface.co/jinaai/jina-embeddings-v2-base-code)
+ and FAISS approximate nearest-neighbour search.
+
+ ## What's new (v2)
+
+ | Area | Before | After |
+ |---|---|---|
+ | Model | pplx-embed-v1-0.6B × 2 | jina-embeddings-v2-base-code × 1 |
+ | Embedding speed | ~2 s / batch | ~500 ms / batch |
+ | Search (100 K chunks) | ~2 000 ms | ~5 ms |
+ | Chunking | Sentence windows | AST (Python) / regex (other langs) |
+ | Persistence | Lost on restart | Saved to `/data` volume |
+ | Batch indexing | ❌ | ✅ `/index/batch` |

  ## Endpoints

  | Method | Path | Description |
  |--------|------|-------------|
+ | `GET` | `/` | Health check |
+ | `GET` | `/health` | Model status |
+ | `POST` | `/index` | Upload & index a single source file |
+ | `POST` | `/index/batch` | Index an entire codebase in one call |
+ | `POST` | `/search` | Search an indexed document / codebase |
+ | `POST` | `/embed` | Embed arbitrary texts (raw vectors) |
+ | `GET` | `/documents` | List indexed doc IDs |
+ | `DELETE` | `/documents/{doc_id}` | Remove a document |

  Interactive docs available at `/docs` (Swagger UI).
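
The two document-management routes in the table can be exercised from Python as well. The following is a minimal sketch using only the standard library; the helper names `list_documents_request` and `delete_document_request` are hypothetical, and it assumes the server accepts a percent-encoded `doc_id` in the path. The response bodies are not described in the README, so the sketch only builds the requests.

```python
from urllib.parse import quote
from urllib.request import Request


def list_documents_request(base_url: str) -> Request:
    """Build a GET /documents request (hypothetical helper)."""
    return Request(f"{base_url.rstrip('/')}/documents", method="GET")


def delete_document_request(base_url: str, doc_id: str) -> Request:
    """Build a DELETE /documents/{doc_id} request (hypothetical helper)."""
    # safe="" also percent-encodes "/", so the id stays a single path segment.
    # Whether the server decodes it back is an assumption of this sketch.
    encoded_id = quote(doc_id, safe="")
    return Request(f"{base_url.rstrip('/')}/documents/{encoded_id}", method="DELETE")
```

Either request can then be sent with `urllib.request.urlopen(req, timeout=...)`.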

+ ## Quick start
+
+ ### Index a single file

  ```bash
  curl -X POST https://YOUR-SPACE.hf.space/index \
+ -F "file=@src/utils.py" \
+ -F "doc_id=utils"
+ ```
+
+ ### Index a whole project (IDE integration)
+
+ ```python
+ import os, requests
+
+ def index_project(base_url: str, project_path: str, doc_id: str):
+     SUPPORTED = {".py", ".js", ".ts", ".tsx", ".go", ".rs", ".java", ".md"}
+     files = []
+     for root, _, filenames in os.walk(project_path):
+         for fname in filenames:
+             if os.path.splitext(fname)[1] in SUPPORTED:
+                 full_path = os.path.join(root, fname)
+                 rel_path = os.path.relpath(full_path, project_path)
+                 with open(full_path, "r", errors="replace") as f:
+                     files.append({"filename": rel_path, "content": f.read()})
+
+     resp = requests.post(f"{base_url}/index/batch", json={
+         "doc_id": doc_id,
+         "files": files,
+         "replace": True,
+     }, timeout=300)
+     return resp.json()

+ result = index_project("https://YOUR-SPACE.hf.space", "./my_project", "my_project")
+ print(result)
+ # {"doc_id": "my_project", "files_indexed": 42, "chunks_indexed": 318}
+ ```
+
+ ### Search
+
+ ```bash
  curl -X POST https://YOUR-SPACE.hf.space/search \
  -H "Content-Type: application/json" \
+ -d '{"doc_id": "my_project", "query": "fetch user from database", "top_k": 5}'
+ ```
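
For callers that already index from Python, the same search call can be sketched without curl. The JSON fields (`doc_id`, `query`, `top_k`) are taken from the curl example above; the function names `build_search_request` and `search` are hypothetical. The shape of the response is not documented in this README, so the sketch returns the parsed JSON unchanged.

```python
import json
from urllib.request import Request, urlopen


def build_search_request(base_url: str, doc_id: str, query: str, top_k: int = 5) -> Request:
    """Build the POST /search request with the JSON body used in the curl example."""
    payload = {"doc_id": doc_id, "query": query, "top_k": top_k}
    return Request(
        f"{base_url.rstrip('/')}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url: str, doc_id: str, query: str, top_k: int = 5, timeout: float = 30.0):
    """Send the request and return whatever JSON the server replies with."""
    req = build_search_request(base_url, doc_id, query, top_k)
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect or log before anything goes over the network.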
+
+ ## Supported languages
+
+ Python (AST chunking), JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby, PHP (regex chunking), Markdown & plain text (sentence chunking).
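
To illustrate what AST chunking means for Python files, here is a minimal sketch built on the standard-library `ast` module. It is not the Space's actual chunker, which this diff does not show; the function name `chunk_python_source` and the chunk dictionary layout are assumptions. It emits one chunk per top-level function or class and includes any decorator lines.

```python
import ast


def chunk_python_source(source: str, filename: str = "<unknown>") -> list:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source, filename=filename)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so the chunk is self-describing.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno  # available on Python 3.8+
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                "start_line": start,
                "end_line": end,
                "text": "\n".join(lines[start - 1:end]),
            })
    return chunks
```

Chunking on syntactic boundaries keeps each embedded unit a complete definition, which sentence windows cannot guarantee for code. Module-level statements outside functions and classes are skipped in this sketch.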
+
+ ## Persistence
+
+ Indexes are saved to the `/data` persistent volume after every `/index` or `/index/batch` call and automatically restored on Space restart, so no re-indexing is needed.
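
The save-then-restore cycle described above can be sketched as follows. This is an illustration only, not the Space's implementation: the helper names `save_index` and `load_all_indexes` are hypothetical, it stores only chunk metadata as JSON (a FAISS index would need its own serialization), and it assumes `doc_id` is safe to use as a file name. Writing to a temporary file and renaming it avoids leaving a half-written index behind if the process stops mid-save.

```python
import json
import os
import tempfile


def save_index(data_dir: str, doc_id: str, chunks: list) -> str:
    """Atomically write one document's chunks to <data_dir>/<doc_id>.json."""
    os.makedirs(data_dir, exist_ok=True)
    final_path = os.path.join(data_dir, f"{doc_id}.json")
    fd, tmp_path = tempfile.mkstemp(dir=data_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(chunks, f)
        # os.replace is atomic when source and target are on the same filesystem.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


def load_all_indexes(data_dir: str) -> dict:
    """Restore every saved document at startup; a missing directory yields {}."""
    indexes = {}
    if not os.path.isdir(data_dir):
        return indexes
    for name in sorted(os.listdir(data_dir)):
        if name.endswith(".json"):
            with open(os.path.join(data_dir, name), encoding="utf-8") as f:
                indexes[name[: -len(".json")]] = json.load(f)
    return indexes
```

In this sketch, `save_index("/data", doc_id, chunks)` would run at the end of each indexing call and `load_all_indexes("/data")` once at application startup.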