---
title: TinyLlama Chatbot API
emoji: 🦙
colorFrom: indigo
colorTo: pink
sdk: docker
sdk_version: '1.0'
app_file: main.py
pinned: false
---
FastAPI QLoRA Chatbot
This project provides a FastAPI backend that serves predictions from a TinyLlama model fine-tuned with QLoRA for instruction-based question answering.
It also includes a responsive Jinja2-based frontend for querying the model interactively.
Features
- QLoRA-finetuned inference endpoint
- HTML frontend built using Jinja2
- FastAPI + Uvicorn backend
- Docker-ready for Hugging Face Spaces
- Hugging Face cache and model offloading for low-RAM environments
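The cache and offloading feature can be illustrated with a minimal sketch. It is not taken from this project's `main.py`; the helper names `configure_hf_cache` and `low_ram_load_kwargs` and the directory paths are hypothetical. `HF_HOME` is the environment variable the Hugging Face libraries read for their cache location, and `device_map`, `offload_folder`, and `low_cpu_mem_usage` are keyword arguments accepted by `from_pretrained` in Transformers.

```python
import os
from pathlib import Path


def configure_hf_cache(cache_dir: str) -> Path:
    """Point the Hugging Face cache at a writable directory.

    Hypothetical helper: call it before importing transformers so the
    setting is picked up when the library initializes.
    """
    path = Path(cache_dir)
    path.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(path)
    return path


def low_ram_load_kwargs(offload_dir: str) -> dict:
    """Build from_pretrained keyword arguments that favor low peak RAM.

    Hypothetical helper: the values are illustrative, not this project's
    actual settings.
    """
    Path(offload_dir).mkdir(parents=True, exist_ok=True)
    return {
        "device_map": "auto",           # requires the accelerate package
        "offload_folder": offload_dir,  # weights that do not fit in memory go here
        "low_cpu_mem_usage": True,      # avoid materializing a second full copy
    }
```

Under these assumptions, the returned dictionary would be passed as `AutoModelForCausalLM.from_pretrained(base_model_id, **kwargs)`, and the QLoRA adapter attached afterwards with `PeftModel.from_pretrained(model, adapter_path)`, where `base_model_id` and `adapter_path` are placeholders.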
Tech Stack
- FastAPI + Uvicorn
- Hugging Face Transformers + PEFT (QLoRA)
- PyTorch (FP16)
- Jinja2 Templates + HTML + JS (Vanilla)
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Serves Jinja2 frontend (UI) |
| POST | `/predict/qlora` | Runs QLoRA inference on input |
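As a rough illustration of calling the `POST /predict/qlora` endpoint from a script, the sketch below builds the request with the standard library only. The request schema is not documented here, so the JSON field name `question` is an assumption, as are the helper names `build_predict_request` and `ask` and the base URL in the usage note.

```python
import json
import urllib.request


def build_predict_request(base_url: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /predict/qlora.

    The "question" field is a hypothetical payload key; adjust it to match
    the schema the backend actually expects.
    """
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict/qlora",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, question: str, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_predict_request(base_url, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, a call might look like `ask("http://localhost:7860", "What is QLoRA?")`; the port is an assumption based on the usual default for Hugging Face Spaces.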
How to Run
Locally
- Install Python dependencies:
pip install -r requirements.txt