metadata
title: TinyLlama Chatbot API
emoji: 🦙
colorFrom: indigo
colorTo: pink
sdk: docker
sdk_version: '1.0'
app_file: main.py
pinned: false

🚀 FastAPI QLoRA Chatbot

This project provides a FastAPI backend for serving predictions using the TinyLlama model, fine-tuned with QLoRA for instruction-based question answering.

It also includes a clean, responsive Jinja2-based frontend for querying the model interactively.


🔧 Features

  • ✅ QLoRA-finetuned inference endpoint
  • ✅ HTML frontend built using Jinja2
  • ✅ FastAPI + Uvicorn backend
  • ✅ Docker-ready for Hugging Face Spaces
  • ✅ Hugging Face cache and model offloading for low-RAM environments
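
The cache and offloading feature can be pictured roughly as follows. This is only a sketch under stated assumptions, not the project's actual loading code: the base checkpoint ID, the adapter path `./qlora_adapter`, the `/tmp` directories, and the helper names `low_ram_kwargs` and `load_model` are all hypothetical. It also assumes `accelerate` is installed, which `device_map="auto"` requires.

```python
import os

# Assumption: /tmp is writable in the container. HF_HOME must be set
# before transformers is imported for the cache location to take effect.
os.environ.setdefault("HF_HOME", "/tmp/hf_cache")

BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed base checkpoint
ADAPTER_DIR = "./qlora_adapter"  # hypothetical path to the QLoRA adapter


def low_ram_kwargs(offload_dir="/tmp/offload"):
    """Keyword arguments for from_pretrained that reduce peak RAM usage."""
    return {
        "device_map": "auto",            # let accelerate place layers
        "offload_folder": offload_dir,   # spill weights that do not fit to disk
        "low_cpu_mem_usage": True,       # avoid a full extra copy in RAM
    }


def load_model():
    """Load the FP16 base model and attach the adapter (heavy; imports lazily)."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.float16, **low_ram_kwargs()
    )
    model = PeftModel.from_pretrained(base, ADAPTER_DIR)
    model.eval()
    return tokenizer, model
```

Keeping the heavy imports inside `load_model` means the module can be imported, and the cache directory configured, before any model weights are touched.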

📦 Tech Stack

  • FastAPI + Uvicorn
  • Hugging Face Transformers + PEFT (QLoRA)
  • PyTorch (FP16)
  • Jinja2 Templates + HTML + JS (Vanilla)

📌 API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | / | Serves the Jinja2 frontend (UI) |
| POST | /predict/qlora | Runs QLoRA inference on the input |
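
A client call to the inference endpoint might look like the sketch below, using only the Python standard library. The request schema is not documented here, so the JSON field name `prompt`, the base URL `http://localhost:7860`, and the helper names `build_predict_request` and `ask` are assumptions for illustration; check them against `main.py` before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumed host and port


def build_predict_request(prompt, base_url=BASE_URL):
    """Build a POST request for /predict/qlora without sending it."""
    # Assumption: the endpoint accepts a JSON body with a "prompt" field.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict/qlora",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url=BASE_URL, timeout=120):
    """Send the request and return the decoded JSON response."""
    req = build_predict_request(prompt, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `ask("What is QLoRA?")` would return whatever JSON the endpoint produces; the generous timeout allows for slow CPU inference.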

🚀 How to Run

Locally

  1. Install Python dependencies:
    pip install -r requirements.txt