sujal7102003 committed on
Commit a286f95 · verified · 1 Parent(s): f9dc39d

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +54 -0
README.md CHANGED
@@ -0,0 +1,54 @@
+ ---
+ title: TinyLlama Chatbot API
+ emoji: 🦙
+ colorFrom: indigo
+ colorTo: pink
+ sdk: docker
+ sdk_version: "1.0"
+ app_file: main.py
+ pinned: false
+ ---
+
+ # 🚀 FastAPI QLoRA Chatbot
+
+ This project provides a **FastAPI backend** for serving predictions using the **TinyLlama** model, fine-tuned with QLoRA for instruction-based question answering.
+
+ It also includes a clean, responsive **Jinja2-based frontend** for querying the model interactively.
+
+ ---
+
+ ## 🔧 Features
+
+ - ✅ QLoRA-finetuned inference endpoint
+ - ✅ HTML frontend built with Jinja2
+ - ✅ FastAPI + Uvicorn backend
+ - ✅ Docker-ready for Hugging Face Spaces
+ - ✅ Hugging Face cache and model offloading for low-RAM environments
+
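Since the Dockerfile itself is not part of this diff, the sketch below shows what a minimal Spaces-compatible one might look like. Everything here (base image, file layout, the `main:app` entry point inferred from `app_file: main.py`) is an assumption, not the repo's actual configuration; Spaces does route traffic to port 7860 by default.

```dockerfile
# Hypothetical Dockerfile sketch; the real one is not shown in this diff.
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Hugging Face Spaces expects the app to listen on port 7860
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]
```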
+ ---
+
+ ## 📦 Tech Stack
+
+ - FastAPI + Uvicorn
+ - Hugging Face Transformers + PEFT (QLoRA)
+ - PyTorch (FP16)
+ - Jinja2 Templates + HTML + JS (Vanilla)
+
+ ---
+
+ ## 📌 API Endpoints
+
+ | Method | Endpoint         | Description                   |
+ |--------|------------------|-------------------------------|
+ | `GET`  | `/`              | Serves Jinja2 frontend (UI)   |
+ | `POST` | `/predict/qlora` | Runs QLoRA inference on input |
+
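As a usage illustration, the `/predict/qlora` endpoint could be called from a plain-stdlib Python client like the sketch below. The JSON field name `prompt` and the local host/port are assumptions; the actual request schema is defined in `main.py`.

```python
import json
import urllib.request

# Hypothetical request body: the field name "prompt" is an assumption,
# not taken from this repo; check main.py for the actual schema.
payload = json.dumps({"prompt": "What is QLoRA?"}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8000/predict/qlora",  # assumed host/port for a local Uvicorn run
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```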
+ ---
+
+ ## 🚀 How to Run
+
+ ### Locally
+
+ 1. **Install Python dependencies**:
+    ```bash
+    pip install -r requirements.txt
+    ```