3v324v23 committed on
Commit
7b9c753
·
1 Parent(s): a70fe97

Fix Ollama build error and use llama3 model

Browse files
Dockerfile CHANGED
@@ -1,7 +1,7 @@
1
  FROM python:3.11-slim
2
 
3
  # Install necessary tools
4
- RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
5
 
6
  # Install Ollama
7
  RUN curl -fsSL https://ollama.com/install.sh | sh
 
1
  FROM python:3.11-slim
2
 
3
  # Install necessary tools
4
+ RUN apt-get update && apt-get install -y curl zstd && rm -rf /var/lib/apt/lists/*
5
 
6
  # Install Ollama
7
  RUN curl -fsSL https://ollama.com/install.sh | sh
entrypoint.sh CHANGED
@@ -10,10 +10,9 @@ sleep 5
10
  echo "Pulling nomic-embed-text..."
11
  ollama pull nomic-embed-text
12
 
13
- # NOTE: The 20B model used for `gpt-oss:20b-cloud` may exceed Hugging Face Free Tier memory/storage limits.
14
- # Make sure your HF Space has the hardware to support this local LLM, or swap it for a smaller one like `llama3` or `mistral`.
15
- echo "Pulling gpt-oss:20b-cloud... (This might fail if it's a custom local model or exceeds HF limits)"
16
- # ollama pull gpt-oss:20b-cloud
17
 
18
  # Start the FastAPI server on port 7860 (default for HF Spaces)
19
  echo "Starting Application..."
 
10
  echo "Pulling nomic-embed-text..."
11
  ollama pull nomic-embed-text
12
 
13
+ # Pull the small LLM for generating responses (llama3.2:1b)
14
+ echo "Pulling llama3.2:1b... (Lightweight model for HF Free Tier)"
15
+ ollama pull llama3.2:1b
 
16
 
17
  # Start the FastAPI server on port 7860 (default for HF Spaces)
18
  echo "Starting Application..."
src/page_rag/llm_engine.py CHANGED
@@ -10,7 +10,7 @@ import ollama
10
  from .retriever import RetrievedPage, build_context
11
 
12
  # ─── Config ────────────────────────────────────────────────────────────────────
13
- LLM_MODEL = "gpt-oss:20b-cloud" # your local model name in Ollama
14
  # ───────────────────────────────────────────────────────────────────────────────
15
 
16
  SYSTEM_PROMPT = """You are a helpful document assistant.
 
10
  from .retriever import RetrievedPage, build_context
11
 
12
  # ─── Config ────────────────────────────────────────────────────────────────────
13
+ LLM_MODEL = "llama3.2:1b" # your local model name in Ollama
14
  # ───────────────────────────────────────────────────────────────────────────────
15
 
16
  SYSTEM_PROMPT = """You are a helpful document assistant.
src/vector_rag/llm_engine.py CHANGED
@@ -9,7 +9,7 @@ from typing import Generator
9
  import ollama
10
  from .retriever import RetrievedChunk, build_context
11
 
12
- LLM_MODEL = "gpt-oss:20b-cloud"
13
 
14
  SYSTEM_PROMPT = """You are a precise document assistant.
15
  Answer the user's question using ONLY the provided context chunks.
 
9
  import ollama
10
  from .retriever import RetrievedChunk, build_context
11
 
12
+ LLM_MODEL = "llama3.2:1b"
13
 
14
  SYSTEM_PROMPT = """You are a precise document assistant.
15
  Answer the user's question using ONLY the provided context chunks.