---
title: Streamlit Chatbot
emoji: 🗨️
colorFrom: indigo
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---

# Streamlit Chatbot ✨

A lightweight chatbot built with Streamlit and the open-source `microsoft/DialoGPT-small` language model from Hugging Face. The repository is set up to deploy automatically to Hugging Face Spaces through GitHub Actions.

## Features

- 📜 **Open-source LLM** – uses a small conversational model that runs comfortably on the free CPU (or GPU) hardware offered by Spaces.
- 💬 **Chat interface** – powered by the `st.chat_*` components of Streamlit 1.30+.
- 🔄 **Persistent history** – `st.session_state` keeps the conversation context for the duration of the browser session.
- 🚀 **One-click deploy** – push to the `main` branch and GitHub Actions mirrors the repository to your Space.

## Quick start (local)

```bash
# 1. Create a virtual environment and install dependencies
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Launch the app
streamlit run app.py
```

The app opens in your browser at `http://localhost:8501`.


## Quick start (Docker)

If you prefer to run the chatbot in a container instead of a local virtual environment, use the provided Dockerfile.

```bash
# 1. Build the image (tagged "streamlit-chatbot")
docker build -t streamlit-chatbot .

# 2. Run the container and expose the app on http://localhost:8501
docker run --rm -it -e PORT=8501 -p 8501:8501 streamlit-chatbot
```

The container entrypoint launches Streamlit on the port given by the `PORT` environment variable (the same variable Hugging Face Spaces sets). Passing `-e PORT=8501` and mapping `-p 8501:8501` makes the interface reachable in your browser at `http://localhost:8501`.
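A Dockerfile matching that description might look like the following sketch (the repository's actual Dockerfile may differ, e.g. in base image, system packages, or user setup):

```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Hugging Face Spaces injects PORT; default to 8501 for local runs.
ENV PORT=8501

# Shell form so ${PORT} is expanded at container start.
CMD streamlit run app.py --server.port=${PORT} --server.address=0.0.0.0
```

Binding to `0.0.0.0` matters in containers: Streamlit's default `localhost` bind would not be reachable through the published port.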


## Manual deploy to Hugging Face Spaces (CLI)

If you'd rather push the repository yourself (skipping GitHub Actions):

```bash
# 1. Authenticate once (stores your token locally)
huggingface-cli login   # paste your HF_TOKEN when prompted

# 2. (First time only) create the Space as a Docker Space
huggingface-cli repo create afscomercial/streamlit-chatbot \
  --repo-type space --space-sdk docker -y  # change the name accordingly

# 3. Add the new remote and push
cd path/to/streamlit_chatbot

git lfs install                 # enables Git LFS, just in case large files appear
git remote add hf \
  https://huggingface.co/spaces/afscomercial/streamlit-chatbot

git push hf main --force        # overwrite the contents of the Space
```

After the push, the Space rebuilds the Docker image and redeploys automatically.


## Repository layout

```text
.
├── app.py                      # Streamlit application – chat UI
├── fine_tune.py                # Script to fine-tune the base LLM on JSONL data
├── requirements.txt            # Python dependencies
├── data/                       # Example datasets (small, can live in git)
│   └── aviation_conversations.jsonl
├── research/                   # Jupyter notebooks / ad-hoc DS experiments (untracked by CI)
├── .streamlit/
│   └── config.toml             # UI & server settings
├── .github/
│   └── workflows/
│       ├── deploy-to-spaces.yml  # CI/CD – auto-deploy the app
│       └── train-model.yml       # CI/CD – fine-tune and push the model
├── Dockerfile                  # Container definition for Docker/HF Spaces
└── README.md
```
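The `.streamlit/config.toml` in the layout above typically carries server and theme settings along these lines (a sketch of common options, not the repository's exact file):

```toml
# Hypothetical .streamlit/config.toml – values are illustrative.

[server]
headless = true            # don't try to open a browser inside the container
enableCORS = false

[theme]
primaryColor = "#6366f1"   # indigo, matching the Space's colorFrom
```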

## Research folder

The `research/` directory is reserved for exploratory notebooks, data-science experiments, and scratch work that shouldn't affect the production application. Feel free to place notebooks, CSVs, or prototype scripts here. Anything computationally heavy or containing large files should not be committed; the folder is in `.gitignore` by default.

## Fine-tuning the model (aviation example)

This repo ships with a tiny JSON Lines dataset in `data/` that contains sample Q&A about aviation. A GitHub Action (`train-model.yml`) fine-tunes `microsoft/DialoGPT-small` on that data and pushes the checkpoint to the Hub as `afscomercial/streamlit-chatbot-aviation` (or the repo name you set in the `MODEL_REPO` secret).

You can also run it locally:

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

export HF_TOKEN=YOUR_WRITE_TOKEN
export MODEL_REPO="afscomercial/streamlit-chatbot-aviation"
python fine_tune.py
```

The script trains for one epoch (change `EPOCHS` if you wish) and pushes the new weights to the model repo.
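The exact JSONL schema is not shown here, but DialoGPT-style fine-tuning typically flattens each conversation record into a single training string with turns separated by the model's end-of-text token. A hedged sketch of that preprocessing step (the `prompt`/`response` field names and the helper itself are assumptions, not the script's real API):

```python
import json

# End-of-text token for DialoGPT / GPT-2 tokenizers.
EOS = "<|endoftext|>"

def record_to_training_text(line: str) -> str:
    """Turn one JSONL record such as {"prompt": ..., "response": ...}
    into the flat string DialoGPT-style fine-tuning expects.
    Field names are assumed; check data/aviation_conversations.jsonl."""
    rec = json.loads(line)
    return rec["prompt"] + EOS + rec["response"] + EOS

sample = '{"prompt": "What is VFR?", "response": "Visual Flight Rules."}'
print(record_to_training_text(sample))
```

The EOS separators let the model learn where one speaker's turn ends and the reply begins, which is how DialoGPT was pre-trained.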

## Where the model is used

`app.py` reads the environment variable `MODEL_REPO` (defaulting to `afscomercial/streamlit-chatbot-aviation`). At startup, the Streamlit app downloads the fine-tuned checkpoint instead of the vanilla DialoGPT model.
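That environment-variable lookup amounts to a one-liner; a sketch of the behavior (the function name is illustrative, not `app.py`'s actual code):

```python
import os

# Default from this README; app.py may name it differently.
DEFAULT_REPO = "afscomercial/streamlit-chatbot-aviation"

def resolve_model_repo(env=None) -> str:
    """Return the Hub repo the app should load: MODEL_REPO if set,
    otherwise the fine-tuned default."""
    env = os.environ if env is None else env
    return env.get("MODEL_REPO", DEFAULT_REPO)

print(resolve_model_repo({}))                           # the default
print(resolve_model_repo({"MODEL_REPO": "me/custom"}))  # the override
```

The resolved name would then be passed to something like `AutoModelForCausalLM.from_pretrained(...)` at startup.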

## Pushing the fine-tuned model to the Hub

There are two convenient ways to upload your checkpoint to the Hub once the training run is finished.

### Option 1: let the training script push automatically

`fine_tune.py` ends with `trainer.push_to_hub()`, so you only need to:

```bash
# 1. Authenticate once (stores your token locally)
huggingface-cli login                     # paste your HF access token

# 2. (First time only) create the model repo on the Hub
huggingface-cli repo create <USER>/<MODEL_REPO> -y
#    e.g.  huggingface-cli repo create your-username/streamlit-chatbot-aviation -y

# 3. Point the run to that repo (default shown below)
export MODEL_REPO="your-username/streamlit-chatbot-aviation"

# 4. Launch training – the script commits and pushes automatically
python fine_tune.py
```

### Option 2: push an existing folder manually

If you already have the fine-tuned files on disk (e.g. in `finetuned-aviation/`):

```bash
# 1. Create the repo once
huggingface-cli repo create your-username/streamlit-chatbot-aviation -y

# 2. Clone the empty repo & copy your files into it
git clone https://huggingface.co/your-username/streamlit-chatbot-aviation
cd streamlit-chatbot-aviation
cp -r /path/to/finetuned-aviation/* .

# 3. Commit and push
git add .
git commit -m "First fine-tuned checkpoint"
git push
```

After the checkpoint is online, point the Streamlit app at it (locally or on Spaces) with:

```bash
export MODEL_REPO="your-username/streamlit-chatbot-aviation"
streamlit run app.py
```

## Architecture

```mermaid
graph TD
  subgraph "Frontend"
    U["User<br/>Browser"] -->|"HTTP 8501"| A["Streamlit<br/>Chatbot (app.py)"]
  end

  subgraph "Backend"
    A -->|"Load fine-tuned weights<br/>+ tokenizer"| M["LLM<br/>DialoGPT-fine-tuned"]
    A -->|"Generate reply"| M
    M -->|"Response"| A
  end

  subgraph "Model Hub"
    MH["Hugging Face<br/>Model Repo"]
  end
  MH --> M

  subgraph "Training"
    DS["Dataset<br/>aviation_conversations.jsonl"]
    FT["fine_tune.py<br/>(HF Trainer)"]
    DS --> FT
    FT -->|"Push to Hub"| MH
  end

  CI["GitHub Actions<br/>train-model.yml"] --> FT
  CI2["GitHub Actions<br/>deploy-to-spaces.yml"] -->|"Docker Image"| HFSpace["HF Space<br/>Docker Runtime"]
  HFSpace --> A
```