hardik-0212 committed
Commit 48cfc20 · verified · 1 Parent(s): 5501dad

Upload folder using huggingface_hub

Files changed (4)
  1. Dockerfile +17 -0
  2. README.md +48 -11
  3. app.py +1 -1
  4. requirements.txt +7 -0
Dockerfile ADDED
@@ -0,0 +1,17 @@
+ FROM python:3.10
+ 
+ WORKDIR /app
+ 
+ RUN mkdir -p /tmp/huggingface && chmod -R 777 /tmp/huggingface
+ 
+ ENV HF_HOME=/tmp/huggingface
+ ENV TRANSFORMERS_CACHE=/tmp/huggingface
+ ENV HUGGINGFACE_HUB_CACHE=/tmp/huggingface
+ 
+ COPY . .
+ 
+ RUN pip install --no-cache-dir -r requirements.txt
+ 
+ EXPOSE 7860
+ 
+ CMD ["sh", "-c", "uvicorn app:app --host 0.0.0.0 --port ${PORT:-7860}"]
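The `ENV` lines redirect every Hugging Face cache (`HF_HOME`, `TRANSFORMERS_CACHE`, `HUGGINGFACE_HUB_CACHE`) to the world-writable `/tmp/huggingface` created just above, since the default `~/.cache/huggingface` may not be writable inside a Space container, and the `CMD` uses sh's `${PORT:-7860}` expansion so the platform can override the port. A minimal Python sketch of how a process inside the container sees these defaults (illustrative only, not part of the app):

```python
import os

# Mirror the Dockerfile's ENV: all HF caches point at a writable /tmp path.
os.environ["HF_HOME"] = "/tmp/huggingface"

# huggingface_hub falls back to ~/.cache/huggingface when HF_HOME is unset.
cache_dir = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))

# Same default logic as the CMD's ${PORT:-7860} shell expansion.
os.environ.pop("PORT", None)            # pretend the platform set no PORT
port = int(os.environ.get("PORT", "7860"))

print(cache_dir, port)
```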
README.md CHANGED
@@ -1,11 +1,48 @@
- ---
- title: QA Chatbot New
- emoji: 🌖
- colorFrom: purple
- colorTo: purple
- sdk: docker
- pinned: false
- license: mit
- ---
- 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: QA ChatBot
+ emoji: 💬
+ colorFrom: blue
+ colorTo: purple
+ sdk: docker
+ sdk_version: 4.44.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+ 
+ # 💬 QA ChatBot – Multi-Adapter TinyLlama
+ 
+ This project is a Q&A chatbot powered by **three fine-tuned TinyLlama models** (IA³, LoRA, and QLoRA). It combines parameter-efficient fine-tuning with a web interface built on FastAPI and Jinja2 templates.
+ 
+ ---
+ 
+ ## 🚀 Features
+ 
+ - ⚡ **FastAPI Backend**: High-performance web server with automatic OpenAPI docs
+ - 🧩 **IA³, LoRA, and QLoRA Adapters**: Three adapter styles over one base model
+ - 💬 **Interactive Chat UI**: Web frontend with Jinja2 templates (`index.html`)
+ - 📦 **Predefined QA Memory**: Exact and fuzzy lookup from `QA.json`
+ - 🔐 **PEFT Integration**: Adapter merging for faster inference
+ - 🧠 **Base-Model Fallback**: Serves the base model if an adapter is not found
+ - 🐳 **Docker Support**: Fully containerized deployment
+ - ⚙️ **Multi-model Inference API**: Responses from each model separately or all together
+ 
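The predefined QA memory above (exact and fuzzy lookup over `QA.json`) can be sketched with the standard-library `difflib`; the entries, cutoff, and `lookup` helper below are illustrative assumptions, not the app's actual code:

```python
import difflib

# Illustrative stand-in for the contents of QA.json (the real file ships with the app).
qa_memory = {
    "what is lora?": "LoRA adds low-rank adapter matrices to frozen base weights.",
    "what is qlora?": "QLoRA combines a 4-bit quantized base model with LoRA adapters.",
}

def lookup(question: str, cutoff: float = 0.8):
    """Exact match first, then fuzzy match via difflib; None means 'ask the model'."""
    key = question.strip().lower()
    if key in qa_memory:                                   # exact hit
        return qa_memory[key]
    close = difflib.get_close_matches(key, qa_memory, n=1, cutoff=cutoff)
    return qa_memory[close[0]] if close else None          # fuzzy hit or miss

print(lookup("wat is lora?"))   # typo still fuzzy-matches the "what is lora?" entry
```

A query with no close entry (e.g. `lookup("tell me a joke")`) returns `None`, which is where the app would hand the question to the models instead.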
+ ---
+ 
+ ## 📊 Model Overview
+ 
+ | Model | Adapter Type | Quantization | Purpose |
+ | ----- | ------------ | -------------- | ----------------------------- |
+ | IA³ | IA³ | Full Precision | Parameter-efficient inference |
+ | LoRA | LoRA | 4-bit | Memory-efficient and fast |
+ | QLoRA | QLoRA | 4-bit | Quantized + LoRA combo |
+ 
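The base-model fallback listed in the features can be sketched with injected loader callables standing in for `peft.PeftModel.from_pretrained` and the transformers base-model loader, which are too heavy to run here; all names below are illustrative, not the app's actual code:

```python
def load_with_fallback(load_adapter, load_base):
    """Try to attach a PEFT adapter; fall back to the plain base model on failure.

    load_adapter / load_base are injected callables so the logic can be shown
    (and tested) without downloading any weights.
    """
    try:
        return load_adapter(), "adapter"
    except (FileNotFoundError, OSError):       # adapter weights missing
        return load_base(), "base"

# Usage with stubs: the adapter loader "fails", so the base model is served.
def missing_adapter():
    raise FileNotFoundError("adapter weights not found")

model, source = load_with_fallback(missing_adapter, lambda: "tinyllama-base")
print(source)  # base
```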
+ ---
+ 
+ ## 📈 Performance
+ 
+ | Metric | Value |
+ | -------------------- | -------------------- |
+ | Base Model Size | 1.1B parameters |
+ | Trainable Parameters | ~0.5% (via adapters) |
+ | Memory Usage | ~2-3 GB (with QLoRA) |
+ | Inference Speed | ~50–100 tokens/sec |
+ | Port | Default `7860` |
app.py CHANGED
@@ -262,5 +262,5 @@ def predict_all_models(input_data: TinyLlamaInput):
  
  if __name__ == "__main__":
      import uvicorn
-     uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)
+     uvicorn.run("app:app", host="0.0.0.0", port=7860, reload=True)
  
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ fastapi
+ uvicorn
+ jinja2
+ transformers
+ torch
+ peft
+ python-multipart