Upload folder using huggingface_hub

- Dockerfile +17 -0
- README.md +48 -11
- app.py +1 -1
- requirements.txt +7 -0
Dockerfile
ADDED
@@ -0,0 +1,17 @@
+FROM python:3.10
+
+WORKDIR /app
+
+RUN mkdir -p /tmp/huggingface && chmod -R 777 /tmp/huggingface
+
+ENV HF_HOME=/tmp/huggingface
+ENV TRANSFORMERS_CACHE=/tmp/huggingface
+ENV HUGGINGFACE_HUB_CACHE=/tmp/huggingface
+
+COPY . .
+
+RUN pip install --no-cache-dir -r requirements.txt
+
+EXPOSE 7860
+
+CMD ["sh", "-c", "uvicorn app:app --host 0.0.0.0 --port ${PORT:-7860}"]
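The `CMD` wraps uvicorn in `sh -c` solely so that `${PORT:-7860}` is expanded: the platform-injected `PORT` variable wins when present, and the build falls back to `7860` otherwise. A quick sketch of that parameter-expansion behavior:

```shell
# ${PORT:-7860} expands to $PORT when set and non-empty, else to 7860.
unset PORT
echo "port=${PORT:-7860}"   # falls back to the default

PORT=8080
echo "port=${PORT:-7860}"   # uses the injected value
```

Without the `sh -c` wrapper, the exec-form `CMD ["uvicorn", ...]` would pass the literal string `${PORT:-7860}` through unexpanded.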
README.md
CHANGED
@@ -1,11 +1,48 @@
----
-title: QA
-emoji:
-colorFrom:
-colorTo: purple
-sdk: docker
-
-
-
-
-
+---
+title: QA ChatBot
+emoji: 💬
+colorFrom: blue
+colorTo: purple
+sdk: docker
+sdk_version: 4.44.0
+app_file: app.py
+pinned: false
+license: mit
+---
+
+# 💬 QA ChatBot – Multi-Adapter TinyLlama
+
+This project is a Q&A chatbot powered by **three fine-tuned TinyLlama models** (IA³, LoRA, and QLoRA). It pairs parameter-efficient fine-tuning with a web interface built on FastAPI and Jinja2 templates.
+
+---
+
+## 🚀 Features
+
+- ⚡ **FastAPI Backend**: High-performance web server with automatic OpenAPI docs
+- 🧩 **Supports IA³, LoRA, and QLoRA Adapters**
+- 💬 **Interactive Chat UI**: Web frontend with Jinja2 templates (`index.html`)
+- 📦 **Predefined QA Memory**: Exact and fuzzy QA from `QA.json`
+- 🔐 **PEFT Integration**: Adapter merging for faster inference
+- 🧠 **Fallback to base model** if adapter not found
+- 🐳 **Docker Support**: Fully containerized deployment
+- ⚙️ **Multi-model Inference API**: Get responses from each model separately or all together
+
+---
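The "exact and fuzzy QA from `QA.json`" feature above can be approximated with the standard library alone. A minimal sketch — the `qa_memory` contents, the `lookup` helper, and the `0.6` cutoff are illustrative assumptions, not the project's actual code:

```python
import difflib

# Hypothetical QA memory mirroring a QA.json of {question: answer} pairs.
qa_memory = {
    "what is lora": "LoRA adds low-rank adapter matrices to frozen weights.",
    "what is qlora": "QLoRA combines 4-bit quantization with LoRA adapters.",
}

def lookup(question: str, cutoff: float = 0.6):
    """Exact match first, then fuzzy match against stored questions."""
    key = question.lower().strip("?! .")
    if key in qa_memory:
        return qa_memory[key]
    close = difflib.get_close_matches(key, qa_memory.keys(), n=1, cutoff=cutoff)
    # No close-enough match: return None so the caller falls through to the model.
    return qa_memory[close[0]] if close else None
```

Returning `None` on a miss is what lets the chatbot fall back to the TinyLlama models when the predefined memory has no answer.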
+
+## 📊 Model Overview
+| Model | Adapter Type | Quantization   | Purpose                       |
+| ----- | ------------ | -------------- | ----------------------------- |
+| IA³   | IA³          | Full Precision | Parameter-efficient inference |
+| LoRA  | LoRA         | 4-bit          | Memory-efficient and fast     |
+| QLoRA | QLoRA        | 4-bit          | Quantized + LoRA combo        |
+
+---
+
+## 📈 Performance
+| Metric               | Value                 |
+| -------------------- | --------------------- |
+| Base Model Size      | 1.1B parameters       |
+| Trainable Parameters | ~0.5% (via adapters)  |
+| Memory Usage         | ~2–3 GB (with QLoRA)  |
+| Inference Speed      | ~50–100 tokens/sec    |
+| Port                 | Default `7860`        |
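As a sanity check on the memory figure in the table, a back-of-the-envelope sketch — the overhead range here is a rough assumption (KV cache, activations, CUDA context), not a measured value:

```python
params = 1.1e9                        # TinyLlama base model parameter count
weight_gb = params * 0.5 / 1024**3    # 4-bit quantization ≈ 0.5 bytes per parameter

# Runtime overhead (KV cache, activations, CUDA context) assumed at 1.5–2.5 GB.
total_low = weight_gb + 1.5
total_high = weight_gb + 2.5
print(f"weights ≈ {weight_gb:.2f} GB, total ≈ {total_low:.1f}–{total_high:.1f} GB")
```

The 4-bit weights alone come to roughly half a gigabyte; the 2–3 GB in the table is dominated by runtime overhead rather than the quantized model itself.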
app.py
CHANGED
@@ -262,5 +262,5 @@ def predict_all_models(input_data: TinyLlamaInput):
 
 if __name__ == "__main__":
     import uvicorn
-    uvicorn.run("app:app", host="0.0.0.0", port=
+    uvicorn.run("app:app", host="0.0.0.0", port=7860, reload=True)
 
requirements.txt
ADDED
@@ -0,0 +1,7 @@
+fastapi
+uvicorn
+jinja2
+transformers
+torch
+peft
+python-multipart