Upload folder using huggingface_hub

- Dockerfile +17 -0
- README.md +48 -11
- app.py +1 -1
- requirements.txt +7 -0
Dockerfile
ADDED
@@ -0,0 +1,17 @@
+FROM python:3.10
+
+WORKDIR /app
+
+RUN mkdir -p /tmp/huggingface && chmod -R 777 /tmp/huggingface
+
+ENV HF_HOME=/tmp/huggingface
+ENV TRANSFORMERS_CACHE=/tmp/huggingface
+ENV HUGGINGFACE_HUB_CACHE=/tmp/huggingface
+
+COPY . .
+
+RUN pip install --no-cache-dir -r requirements.txt
+
+EXPOSE 7860
+
+CMD ["sh", "-c", "uvicorn app:app --host 0.0.0.0 --port ${PORT:-7860}"]
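The `CMD` wraps uvicorn in `sh -c` solely so that `${PORT:-7860}` is expanded: the platform-injected `PORT` variable wins when present, and the build falls back to `7860` otherwise. A quick sketch of that parameter-expansion behavior:

```shell
# ${PORT:-7860} expands to $PORT when set and non-empty, else to 7860.
unset PORT
echo "port=${PORT:-7860}"   # falls back to the default

PORT=8080
echo "port=${PORT:-7860}"   # uses the injected value
```

Without the `sh -c` wrapper, the exec-form `CMD ["uvicorn", ...]` would pass the literal string `${PORT:-7860}` through unexpanded.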
README.md
CHANGED
@@ -1,11 +1,48 @@
----
-title: QA
-emoji:
-colorFrom:
-colorTo: purple
-sdk: docker
-
-
-
-
-
+---
+title: QA ChatBot
+emoji: 💬
+colorFrom: blue
+colorTo: purple
+sdk: docker
+sdk_version: 4.44.0
+app_file: app.py
+pinned: false
+license: mit
+---
+
+# 💬 QA ChatBot – Multi-Adapter TinyLlama
+
+This project is a Q&A chatbot powered by **three fine-tuned TinyLlama models** (IA³, LoRA, and QLoRA). It pairs parameter-efficient fine-tuning with a web interface built on FastAPI and Jinja2 templates.
+
+---
+
+## 🚀 Features
+
+- ⚡ **FastAPI Backend**: High-performance web server with automatic OpenAPI docs
+- 🧩 **Supports IA³, LoRA, and QLoRA Adapters**
+- 💬 **Interactive Chat UI**: Web frontend with Jinja2 templates (`index.html`)
+- 📦 **Predefined QA Memory**: Exact and fuzzy QA from `QA.json`
+- 🔐 **PEFT Integration**: Adapter merging for faster inference
+- 🧠 **Fallback to base model** if adapter not found
+- 🐳 **Docker Support**: Fully containerized deployment
+- ⚙️ **Multi-model Inference API**: Get responses from each model separately or all together
+
+---
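The "exact and fuzzy QA from `QA.json`" feature above can be approximated with the standard library alone. A minimal sketch — the `qa_memory` contents, the `lookup` helper, and the `0.6` cutoff are illustrative assumptions, not the project's actual code:

```python
import difflib

# Hypothetical QA memory mirroring a QA.json of {question: answer} pairs.
qa_memory = {
    "what is lora": "LoRA adds low-rank adapter matrices to frozen weights.",
    "what is qlora": "QLoRA combines 4-bit quantization with LoRA adapters.",
}

def lookup(question: str, cutoff: float = 0.6):
    """Exact match first, then fuzzy match against stored questions."""
    key = question.lower().strip("?! .")
    if key in qa_memory:
        return qa_memory[key]
    close = difflib.get_close_matches(key, qa_memory.keys(), n=1, cutoff=cutoff)
    # No close-enough match: return None so the caller falls through to the model.
    return qa_memory[close[0]] if close else None
```

Returning `None` on a miss is what lets the chatbot fall back to the TinyLlama models when the predefined memory has no answer.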
+
+## 📊 Model Overview
+| Model | Adapter Type | Quantization   | Purpose                       |
+| ----- | ------------ | -------------- | ----------------------------- |
+| IA³   | IA³          | Full Precision | Parameter-efficient inference |
+| LoRA  | LoRA         | 4-bit          | Memory-efficient and fast     |
+| QLoRA | QLoRA        | 4-bit          | Quantized + LoRA combo        |
+
+---
+
+## 📈 Performance
+| Metric               | Value                 |
+| -------------------- | --------------------- |
+| Base Model Size      | 1.1B parameters       |
+| Trainable Parameters | ~0.5% (via adapters)  |
+| Memory Usage         | ~2–3 GB (with QLoRA)  |
+| Inference Speed      | ~50–100 tokens/sec    |
+| Port                 | Default `7860`        |
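As a sanity check on the memory figure in the table, a back-of-the-envelope sketch — the overhead range here is a rough assumption (KV cache, activations, CUDA context), not a measured value:

```python
params = 1.1e9                        # TinyLlama base model parameter count
weight_gb = params * 0.5 / 1024**3    # 4-bit quantization ≈ 0.5 bytes per parameter

# Runtime overhead (KV cache, activations, CUDA context) assumed at 1.5–2.5 GB.
total_low = weight_gb + 1.5
total_high = weight_gb + 2.5
print(f"weights ≈ {weight_gb:.2f} GB, total ≈ {total_low:.1f}–{total_high:.1f} GB")
```

The 4-bit weights alone come to roughly half a gigabyte; the 2–3 GB in the table is dominated by runtime overhead rather than the quantized model itself.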
app.py
CHANGED
@@ -262,5 +262,5 @@ def predict_all_models(input_data: TinyLlamaInput):
 
 if __name__ == "__main__":
     import uvicorn
-    uvicorn.run("app:app", host="0.0.0.0", port=
+    uvicorn.run("app:app", host="0.0.0.0", port=7860, reload=True)
 
requirements.txt
ADDED
@@ -0,0 +1,7 @@
+fastapi
+uvicorn
+jinja2
+transformers
+torch
+peft
+python-multipart