Bhaskar Ram committed
Commit a465955 · 1 Parent(s): 3381167

feat: apply all 15 upgrades — BGE embeddings, cosine FAISS, streaming LLM, tenacity retry, dotenv, Dockerfile, tests

Files changed (11):
  1. .env.example +11 -0
  2. .gitignore +32 -0
  3. Dockerfile +27 -0
  4. README.md +14 -14
  5. app.py +23 -14
  6. rag/chain.py +61 -21
  7. rag/embedder.py +5 -2
  8. rag/retriever.py +7 -4
  9. requirements-dev.txt +4 -0
  10. requirements.txt +6 -4
  11. tests/smoke_test.py +48 -0
.env.example ADDED
@@ -0,0 +1,11 @@
+ # Environment variable template — copy to .env and fill in your values
+
+ # Required: Your Hugging Face API token (get one at https://huggingface.co/settings/tokens)
+ HF_TOKEN=hf_...
+
+ # Optional: Override the default LLM model
+ # LLM_MODEL=meta-llama/Llama-3.1-8B-Instruct
+
+ # Optional: Gradio server settings
+ # GRADIO_SERVER_PORT=7860
+ # GRADIO_SERVER_NAME=0.0.0.0
.gitignore ADDED
@@ -0,0 +1,32 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *.pyo
+ .mypy_cache/
+ .ruff_cache/
+
+ # Environment
+ .env
+ *.env
+
+ # Virtual environments
+ .venv/
+ venv/
+ env/
+
+ # Gradio cache / uploads
+ gradio_cached_examples/
+ flagged/
+
+ # Test artefacts
+ .pytest_cache/
+ htmlcov/
+ .coverage
+
+ # Editors
+ .vscode/
+ .idea/
+
+ # OS
+ .DS_Store
+ Thumbs.db
Dockerfile ADDED
@@ -0,0 +1,27 @@
+ # Kerdos AI — Custom LLM Chat
+ # Slim single-stage Docker build for a lean production image
+
+ FROM python:3.11-slim AS base
+
+ # System dependencies for PyMuPDF and FAISS
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     build-essential \
+     libgomp1 \
+     && rm -rf /var/lib/apt/lists/*
+
+ WORKDIR /app
+
+ # Install Python dependencies first (layer-cached)
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy source
+ COPY . .
+
+ # Gradio listens on 7860 by default
+ EXPOSE 7860
+
+ ENV GRADIO_SERVER_NAME=0.0.0.0
+ ENV GRADIO_SERVER_PORT=7860
+
+ CMD ["python", "app.py"]
README.md CHANGED
@@ -2,7 +2,7 @@
  title: Kerdos AI — Custom LLM Chat (Demo)
  emoji: 🤖
  colorFrom: blue
- colorTo: cyan
+ colorTo: indigo
  sdk: gradio
  sdk_version: "6.6.0"
  app_file: app.py
@@ -62,15 +62,15 @@ We are actively **seeking investment and strategic partnerships** to build the *
 
  ## ✨ Features (Demo)
 
- | Feature | Details |
- | ----------------------------- | ---------------------------------------------------------- |
- | 📄 **Multi-format ingestion** | PDF, DOCX, TXT, MD, CSV |
- | 🧠 **Open-source LLM** | `meta-llama/Llama-3.1-8B-Instruct` via HF Inference API |
- | 🔒 **Strictly grounded** | Answers only from your uploaded documents |
- | 📦 **Multi-document** | Upload and query across multiple files simultaneously |
- | 💬 **Multi-turn chat** | Maintains conversation context across questions |
- | ⚡ **Fast** | CPU-friendly embeddings (`all-MiniLM-L6-v2` + FAISS) |
- | 🔑 **Secure** | Files processed in-session only — never stored permanently |
+ | Feature | Details |
+ | ----------------------------- | ----------------------------------------------------------------- |
+ | 📄 **Multi-format ingestion** | PDF, DOCX, TXT, MD, CSV |
+ | 🧠 **Open-source LLM** | `meta-llama/Llama-3.1-8B-Instruct` via HF Inference API |
+ | 🔒 **Strictly grounded** | Answers only from your uploaded documents |
+ | 📦 **Multi-document** | Upload and query across multiple files simultaneously |
+ | 💬 **Multi-turn chat** | Maintains conversation context across questions |
+ | ⚡ **Fast** | CPU-friendly embeddings (`BAAI/bge-small-en-v1.5` + FAISS cosine) |
+ | 🔑 **Secure** | Files processed in-session only — never stored permanently |
 
  ---
 
@@ -103,9 +103,9 @@ Document Parser (PDF / DOCX / TXT)
 
  Text Chunking (512 chars, 64 overlap)
 
- Embeddings (all-MiniLM-L6-v2)
+ Embeddings (BAAI/bge-small-en-v1.5)
 
- FAISS Vector Index (in-memory)
+ FAISS Vector Index (cosine similarity, in-memory)
 
  User Question → Similarity Search → Top-K Chunks
 
@@ -120,7 +120,7 @@ Response + Source Citations
 
  - **UI**: [Gradio](https://gradio.app)
  - **LLM**: `meta-llama/Llama-3.1-8B-Instruct`
- - **Embeddings**: `sentence-transformers/all-MiniLM-L6-v2`
+ - **Embeddings**: `BAAI/bge-small-en-v1.5` (cosine similarity via FAISS)
  - **Vector Store**: [FAISS](https://github.com/facebookresearch/faiss)
  - **Document Parsing**: PyMuPDF, python-docx
 
@@ -148,4 +148,4 @@ MIT License — free for commercial and private use.
 
  ---
 
- _© 2024–2025 Kerdos Infrasoft Private Limited | Bengaluru, Karnataka, India_
+ _© 2024–2026 Kerdos Infrasoft Private Limited | Bengaluru, Karnataka, India_
app.py CHANGED
@@ -6,11 +6,14 @@ Website: https://kerdos.in
  """
 
  import os
+ from dotenv import load_dotenv
  import gradio as gr
  from rag.document_loader import load_documents
  from rag.embedder import build_index, add_to_index
  from rag.retriever import retrieve
- from rag.chain import answer
+ from rag.chain import answer_stream
+
+ load_dotenv()  # Load HF_TOKEN etc. from .env when running locally
 
  # ─────────────────────────────────────────────
  # State helpers
@@ -55,9 +58,10 @@ def process_files(files, current_index, status_box):
 
 
  def chat(user_message, history, vector_index, hf_token_input, top_k):
-     """Main chat handler — retrieves context and calls the LLM."""
+     """Streaming chat handler — yields progressively-updated history for real-time response."""
      if not user_message.strip():
-         return history, ""
+         yield history, ""
+         return
 
      hf_token = get_hf_token(hf_token_input)
      if not hf_token:
@@ -65,26 +69,30 @@ def chat(user_message, history, vector_index, hf_token_input, top_k):
              {"role": "user", "content": user_message},
              {"role": "assistant", "content": "⚠️ Please provide a Hugging Face API token to use the chat."},
          ]
-         return history, ""
+         yield history, ""
+         return
 
      if vector_index is None:
          history = history + [
              {"role": "user", "content": user_message},
              {"role": "assistant", "content": "⚠️ Please upload at least one document first."},
          ]
-         return history, ""
+         yield history, ""
+         return
 
      try:
          chunks = retrieve(user_message, vector_index, top_k=int(top_k))
-         bot_reply = answer(user_message, chunks, hf_token, chat_history=history)
+         # Append placeholder so user sees their message immediately
+         history = history + [
+             {"role": "user", "content": user_message},
+             {"role": "assistant", "content": ""},
+         ]
+         for partial in answer_stream(user_message, chunks, hf_token, chat_history=history[:-2]):
+             history[-1]["content"] = partial
+             yield history, ""
      except Exception as e:
-         bot_reply = f"❌ Error: {e}"
-
-     history = history + [
-         {"role": "user", "content": user_message},
-         {"role": "assistant", "content": bot_reply},
-     ]
-     return history, ""
+         history[-1]["content"] = f"❌ Error: {e}"
+         yield history, ""
 
 
  def reset_all():
@@ -299,7 +307,7 @@ with gr.Blocks(title="Kerdos AI — Custom LLM Chat | Document Q&A Demo") as demo
      # ── Kerdos Footer ─────────────────────────
      gr.HTML("""
      <div id="kerdos-footer">
-         &copy; 2024–2025 <strong>Kerdos Infrasoft Private Limited</strong> &nbsp;|&nbsp;
+         &copy; 2024–2026 <strong>Kerdos Infrasoft Private Limited</strong> &nbsp;|&nbsp;
          CIN: U62099KA2023PTC182869 &nbsp;|&nbsp; Bengaluru, Karnataka, India<br/>
          🌐 <a href="https://kerdos.in" target="_blank" style="color:#0055FF;">kerdos.in</a>
          &nbsp;|&nbsp;
@@ -311,4 +319,5 @@ with gr.Blocks(title="Kerdos AI — Custom LLM Chat | Document Q&A Demo") as demo
      """)
 
  if __name__ == "__main__":
+     demo.queue()  # Required for streaming generators
      demo.launch(css=CSS, theme=gr.themes.Soft())
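The diff above turns `chat` into a generator: it appends the user turn plus an empty assistant placeholder, then overwrites that placeholder with each partial reply so the UI updates in place. A stdlib-only sketch of that pattern, where the hypothetical `fake_stream` stands in for `answer_stream` and `chat_sim` mirrors the handler's happy path:

```python
def fake_stream(tokens):
    """Stand-in for answer_stream: yields the growing reply string."""
    acc = ""
    for t in tokens:
        acc += t
        yield acc


def chat_sim(user_message, history):
    """Mirror of the streaming chat handler's happy path."""
    # Append the user turn plus an empty assistant placeholder immediately
    history = history + [
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": ""},
    ]
    for partial in fake_stream(["Hel", "lo", "!"]):
        history[-1]["content"] = partial  # overwrite placeholder in place
        yield history


final = None
for state in chat_sim("hi", []):
    final = state
print(final[-1]["content"])  # → Hello!
```

Because each `yield` hands Gradio the whole history list, the front end re-renders only the last message's text; this is also why `demo.queue()` is needed, since generator handlers are only supported through the queue.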
rag/chain.py CHANGED
@@ -2,10 +2,18 @@
  chain.py
  Calls the LLM via HF Inference API with a strict RAG prompt.
  Only answers from the retrieved context — never from general knowledge.
+
+ Upgrades vs original:
+ • answer_stream() — yields token-by-token for real-time Gradio streaming
+ • tenacity retry (3 attempts, exponential back-off) on transient API errors
+ • Hard input length guard (query ≤ 2000 chars, history capped at 6 messages)
  """
 
  from __future__ import annotations
+ from typing import Generator
+
  from huggingface_hub import InferenceClient
+ from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
 
  SYSTEM_PROMPT = """You are an enterprise document assistant. Your ONLY job is to answer questions using the provided document context below.
 
@@ -25,6 +33,8 @@ Context from uploaded documents:
  LLM_MODEL = "meta-llama/Llama-3.1-8B-Instruct"
  MAX_NEW_TOKENS = 1024
  TEMPERATURE = 0.1  # Low temperature for factual, grounded responses
+ MAX_QUERY_CHARS = 2000
+ MAX_HISTORY_TURNS = 6  # Keep last N messages (each turn = 1 user + 1 assistant)
 
 
  def build_context(chunks: list[dict]) -> str:
@@ -35,37 +45,67 @@ def build_context(chunks: list[dict]) -> str:
      return "\n\n".join(parts)
 
 
- def answer(
-     query: str,
-     context_chunks: list[dict],
-     hf_token: str,
-     chat_history: list[dict] | None = None,
- ) -> str:
-     """
-     Call Llama 3 via HF Inference API to answer the query
-     grounded strictly in context_chunks.
-     """
-     if not context_chunks:
-         return "I don't have that information in the uploaded documents."
-
+ def _build_messages(query: str, context_chunks: list[dict], chat_history: list[dict] | None) -> list[dict]:
+     """Assemble the full message list for the LLM call."""
      context = build_context(context_chunks)
      system_msg = SYSTEM_PROMPT.format(context=context)
 
-     # Build message history for multi-turn conversation
-     # chat_history is now a flat list of {"role": ..., "content": ...} dicts (Gradio 6.x)
-     messages = [{"role": "system", "content": system_msg}]
+     messages: list[dict] = [{"role": "system", "content": system_msg}]
      if chat_history:
-         # Keep last 8 messages (4 turns) for context
-         for msg in chat_history[-8:]:
+         # Cap history to avoid overflow
+         for msg in chat_history[-MAX_HISTORY_TURNS:]:
              if msg.get("role") in ("user", "assistant") and msg.get("content"):
                  messages.append({"role": msg["role"], "content": msg["content"]})
+
+     # Guard: truncate excessively long queries
+     query = query[:MAX_QUERY_CHARS]
      messages.append({"role": "user", "content": query})
+     return messages
 
-     client = InferenceClient(token=hf_token)
-     response = client.chat_completion(
+
+ @retry(
+     stop=stop_after_attempt(3),
+     wait=wait_exponential(multiplier=1, min=2, max=10),
+     retry=retry_if_exception_type(Exception),
+     reraise=True,
+ )
+ def _call_llm_stream(client: InferenceClient, messages: list[dict]):
+     """Streaming call to the LLM; decorated with retry logic."""
+     return client.chat_completion(
          model=LLM_MODEL,
          messages=messages,
          max_tokens=MAX_NEW_TOKENS,
          temperature=TEMPERATURE,
+         stream=True,
      )
-     return response.choices[0].message.content.strip()
+
+
+ def answer_stream(
+     query: str,
+     context_chunks: list[dict],
+     hf_token: str,
+     chat_history: list[dict] | None = None,
+ ) -> Generator[str, None, None]:
+     """
+     Stream the LLM answer token-by-token.
+     Yields the progressively-growing reply string so Gradio can update in real time.
+     """
+     if not context_chunks:
+         yield "I don't have that information in the uploaded documents."
+         return
+
+     messages = _build_messages(query, context_chunks, chat_history)
+     client = InferenceClient(token=hf_token)
+
+     try:
+         stream = _call_llm_stream(client, messages)
+     except Exception as e:
+         yield f"❌ LLM error after retries: {e}"
+         return
+
+     accumulated = ""
+     for chunk in stream:
+         delta = chunk.choices[0].delta.content
+         if delta:
+             accumulated += delta
+             yield accumulated
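The tenacity decorator above re-invokes `_call_llm_stream` on any exception, up to 3 attempts, waiting roughly 2 s then 4 s (capped at 10 s) between tries, and re-raises the last error. A dependency-free sketch of the same policy, with a hypothetical `flaky_call` that errors twice before succeeding (tiny waits so the demo runs fast; the commit uses seconds-scale ones):

```python
import time


def retry_with_backoff(fn, attempts=3, base=0.01, cap=0.05):
    """Plain-Python sketch of stop_after_attempt(3) + wait_exponential + reraise=True."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # reraise=True: surface the last error unchanged
            time.sleep(min(cap, base * 2 ** i))  # exponential back-off, capped


calls = {"n": 0}


def flaky_call():
    """Hypothetical transient failure: errors twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


print(retry_with_backoff(flaky_call))  # → ok
```

Note the retry wraps only the call that *opens* the stream; once iteration of the stream has begun, a mid-stream failure is not retried, which is the usual trade-off when combining retries with streaming responses.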
rag/embedder.py CHANGED
@@ -9,7 +9,7 @@ from dataclasses import dataclass, field
 
  CHUNK_SIZE = 512  # characters
  CHUNK_OVERLAP = 64  # characters
- EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
+ EMBEDDING_MODEL = "BAAI/bge-small-en-v1.5"  # Upgraded: state-of-the-art small retrieval model
 
 
  @dataclass
@@ -56,7 +56,9 @@ def build_index(docs: list[dict]) -> VectorIndex:
      embeddings = np.array(embeddings, dtype="float32")
 
      dim = embeddings.shape[1]
+     # Use Inner Product index (cosine similarity after L2 normalisation)
+     faiss.normalize_L2(embeddings)
-     index = faiss.IndexFlatL2(dim)
+     index = faiss.IndexFlatIP(dim)
      index.add(embeddings)
 
      print(f"[Embedder] Index built: {index.ntotal} vectors, dim={dim}")
@@ -75,6 +77,7 @@ def add_to_index(vector_index: VectorIndex, docs: list[dict]) -> VectorIndex:
      texts = [c["text"] for c in new_chunks]
      embeddings = vector_index.embedder.encode(texts, show_progress_bar=False, batch_size=32)
      embeddings = np.array(embeddings, dtype="float32")
+     faiss.normalize_L2(embeddings)  # Keep consistent with cosine index
 
      vector_index.index.add(embeddings)
      vector_index.chunks.extend(new_chunks)
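The switch from `IndexFlatL2` to `IndexFlatIP` relies on the identity that the inner product of two L2-normalised vectors equals their cosine similarity. A numpy-only check of that identity (no faiss required; the hypothetical `normalize_rows` mirrors what `faiss.normalize_L2` does in place):

```python
import numpy as np


def normalize_rows(x):
    """Row-wise L2 normalisation (faiss.normalize_L2 does this in place)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


rng = np.random.default_rng(0)
corpus = rng.standard_normal((4, 8)).astype("float32")  # stand-in embeddings
query = rng.standard_normal((1, 8)).astype("float32")

# Inner product of normalised vectors — what IndexFlatIP computes post-normalisation
ip = normalize_rows(corpus) @ normalize_rows(query).T

# Cosine similarity computed directly from the raw vectors
cos = (corpus @ query.T) / (
    np.linalg.norm(corpus, axis=1, keepdims=True) * np.linalg.norm(query)
)

assert np.allclose(ip, cos, atol=1e-6)
```

This is why both `build_index` and `add_to_index` must normalise before `index.add`, and why the retriever must normalise the query the same way: skipping any one of the three silently degrades ranking quality rather than raising an error.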
rag/retriever.py CHANGED
@@ -1,10 +1,11 @@
  """
  retriever.py
- Performs similarity search against the FAISS index.
+ Performs cosine-similarity search against the FAISS index.
  """
 
  from __future__ import annotations
  import numpy as np
+ import faiss
  from rag.embedder import VectorIndex
 
  DEFAULT_TOP_K = 5
@@ -14,24 +15,26 @@ def retrieve(query: str, vector_index: VectorIndex, top_k: int = DEFAULT_TOP_K)
      """
      Embed the query and return top_k most similar chunks.
      Each result: {"source": str, "text": str, "score": float}
+     Scores are cosine similarities (higher = more relevant).
      """
      if vector_index is None or vector_index.index is None:
          return []
 
      query_embedding = vector_index.embedder.encode([query], show_progress_bar=False)
      query_embedding = np.array(query_embedding, dtype="float32")
+     faiss.normalize_L2(query_embedding)  # Must match IndexFlatIP cosine index
 
      n_results = min(top_k, vector_index.index.ntotal)
-     distances, indices = vector_index.index.search(query_embedding, n_results)
+     scores, indices = vector_index.index.search(query_embedding, n_results)
 
      results = []
-     for dist, idx in zip(distances[0], indices[0]):
+     for score, idx in zip(scores[0], indices[0]):
          if idx == -1:
              continue
          chunk = vector_index.chunks[idx]
          results.append({
              "source": chunk["source"],
              "text": chunk["text"],
-             "score": float(dist),
+             "score": float(score),  # cosine similarity (0–1 range)
          })
      return results
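With the inner-product index, `search` returns scores in descending order and higher means more relevant (the reverse of L2 distances). A brute-force numpy sketch of what `IndexFlatIP.search` computes, under the assumption that index rows and query are L2-normalised; `top_k_cosine` is a hypothetical helper, not a FAISS API:

```python
import numpy as np


def top_k_cosine(query_vec, index_vecs, k):
    """Brute-force equivalent of IndexFlatIP.search over L2-normalised rows."""
    q = query_vec / np.linalg.norm(query_vec)
    m = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    scores = m @ q
    order = np.argsort(-scores)[:k]  # descending: higher score = more relevant
    return scores[order], order


vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], dtype="float32")
scores, idx = top_k_cosine(np.array([1.0, 0.2], dtype="float32"), vecs, k=2)
assert list(idx) == [0, 2]     # nearest chunks come first
assert scores[0] >= scores[1]  # scores arrive in descending order
```

Strictly, cosine similarity lies in [-1, 1], not (0, 1]; text-embedding similarities are usually positive in practice, which is why the smoke test below checks the (0, 1] range, but a downstream consumer should not assume non-negative scores.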
requirements-dev.txt ADDED
@@ -0,0 +1,4 @@
+ # Development dependencies (not needed in production)
+ pytest>=8.0.0
+ black>=24.0.0
+ ruff>=0.4.0
requirements.txt CHANGED
@@ -1,7 +1,9 @@
  gradio>=6.6.0
- sentence-transformers>=2.7.0
- faiss-cpu>=1.7.4
+ sentence-transformers>=5.0.0
+ faiss-cpu>=1.9.0
  PyMuPDF>=1.24.0
  python-docx>=1.1.0
- huggingface-hub>=0.23.0
- numpy>=1.24.0
+ huggingface-hub>=0.28.0
+ numpy>=1.26.0,<3
+ python-dotenv>=1.0.0
+ tenacity>=8.2.0
tests/smoke_test.py ADDED
@@ -0,0 +1,48 @@
+ """
+ tests/smoke_test.py
+ Quick sanity check — verifies imports and a basic FAISS index round-trip.
+ Run with: python -m pytest tests/smoke_test.py -v
+ """
+
+ import sys
+ import os
+
+ # Make sure the project root is on the path
+ sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
+
+
+ def test_imports():
+     """All RAG modules should import without error."""
+     from rag import document_loader, embedder, retriever, chain  # noqa: F401
+
+
+ def test_index_and_retrieve():
+     """Build a tiny FAISS index and assert we get a result back."""
+     from rag.embedder import build_index
+     from rag.retriever import retrieve
+
+     docs = [
+         {"source": "test.txt", "text": "The refund policy allows returns within 30 days of purchase."},
+         {"source": "test.txt", "text": "Contact support at support@example.com for assistance."},
+     ]
+
+     idx = build_index(docs)
+     assert idx.index.ntotal > 0, "Index should have at least one vector"
+
+     results = retrieve("What is the refund policy?", idx, top_k=2)
+     assert len(results) > 0, "Should return at least one result"
+
+     # Cosine similarity scores should be in (0, 1] range
+     for r in results:
+         assert 0.0 <= r["score"] <= 1.01, f"Unexpected score: {r['score']}"
+         assert "source" in r and "text" in r
+
+
+ def test_chunk_not_empty():
+     """Chunker should produce non-empty chunks."""
+     from rag.embedder import _chunk_text
+
+     chunks = _chunk_text("doc.txt", "Hello world. " * 100)
+     assert len(chunks) > 0
+     for c in chunks:
+         assert c["text"].strip()