---
title: Streamlit Chatbot
emoji: "πŸ—¨οΈ"
colorFrom: indigo
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---

# Streamlit Chatbot ✨

A lightweight chatbot built with [Streamlit](https://streamlit.io/) and the open-source `microsoft/DialoGPT-small` language model from [Hugging Face](https://huggingface.co/). This repository is ready to be deployed to [Hugging Face Spaces](https://huggingface.co/spaces) automatically through GitHub Actions.

## Features

* πŸ“œ **Open-source LLM** – Uses a small conversational model that runs comfortably on the free CPU hardware offered by Spaces.
* πŸ’¬ **Chat interface** – Powered by Streamlit 1.30+ `st.chat_*` components.
* πŸ”„ **Persistent history** – `st.session_state` keeps the discussion context alive across script reruns for the duration of the browser session.
* πŸš€ **1-click deploy** – Push to the `main` branch and GitHub Actions mirrors the repository to your Space.
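The session-state history pattern above can be sketched without Streamlit itself. In the real `app.py` the `history` list would live in `st.session_state` and be re-rendered with the `st.chat_*` components; `fake_generate` below is a hypothetical stand-in for the DialoGPT call:

```python
def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for the model.generate() call in app.py.
    return f"Echo: {prompt}"

def chat_turn(history: list, user_text: str) -> list:
    """Record one exchange, mirroring how app.py would keep context
    in st.session_state and replay it with st.chat_message()."""
    history.append({"role": "user", "content": user_text})
    reply = fake_generate(user_text)
    history.append({"role": "assistant", "content": reply})
    return history

history = []  # in Streamlit: st.session_state.setdefault("history", [])
chat_turn(history, "Hello!")
print(history[-1]["content"])  # Echo: Hello!
```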

---

## Quick start (local)

```bash
# 1. Install dependencies
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Launch the app
streamlit run app.py
```

The app will open in your browser at `http://localhost:8501`.

---

## Quick start (Docker)

If you prefer to run the chatbot in a container instead of a local virtual environment, use the provided `Dockerfile`.

```bash
# 1. Build the image (tagged "streamlit-chatbot")
docker build -t streamlit-chatbot .

# 2. Run the container and expose the app on http://localhost:8501
docker run --rm -it -e PORT=8501 -p 8501:8501 streamlit-chatbot
```

The container entrypoint launches Streamlit on the port given by the `PORT` environment variable (the same variable Hugging Face uses). By passing `-e PORT=8501` and mapping `-p 8501:8501`, you can access the interface in your browser at `http://localhost:8501`.
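The actual `Dockerfile` is not reproduced here, but the pattern the paragraph describes (launching Streamlit on whatever `PORT` provides, as Spaces expects) could look like this minimal sketch; treat every line as an assumption about the real file:

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Install dependencies first so Docker can cache this layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Spaces injects PORT at runtime; 8501 is only a local default.
ENV PORT=8501
# Shell form so ${PORT} is expanded at container start.
CMD streamlit run app.py --server.port ${PORT} --server.address 0.0.0.0
```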

---

## Manual deploy to Hugging Face Spaces (CLI)

If you'd rather push the repository yourself (skipping GitHub Actions):

```bash
# 1. Authenticate once (stores your token locally)
huggingface-cli login   # paste your HF_TOKEN when prompted

# 2. (First time only) create the Space as a Docker Space
huggingface-cli repo create afscomercial/streamlit-chatbot \
  --type space --space_sdk docker -y  # change the name accordingly

# 3. Add the new remote and push
cd path/to/streamlit_chatbot

git lfs install                 # enables Large-File Storage just in case
git remote add hf \
  https://huggingface.co/spaces/afscomercial/streamlit-chatbot

git push hf main --force        # overwrite contents of the Space
```

After the push the Space will rebuild the Docker image and redeploy automatically.
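For reference, the mirroring that `deploy-to-spaces.yml` performs typically boils down to a force-push of the full history to the Space's git remote. The workflow below is a hedged sketch of that pattern, not the repo's actual file; the job name, secret name, and Space URL are assumptions:

```yaml
name: Deploy to Spaces
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history so the push is not rejected
      - name: Push to Hugging Face Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git push --force \
            https://user:$HF_TOKEN@huggingface.co/spaces/afscomercial/streamlit-chatbot \
            main
```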

---

## Repository layout

```
.
β”œβ”€β”€ app.py                      # Streamlit application – chat UI
β”œβ”€β”€ fine_tune.py                # Script to fine-tune the base LLM on JSONL data
β”œβ”€β”€ requirements.txt            # Python dependencies
β”œβ”€β”€ data/                       # Example datasets (small, can live in git)
β”‚   └── aviation_conversations.jsonl
β”œβ”€β”€ research/                   # Jupyter notebooks / ad-hoc DS experiments (git-ignored)
β”œβ”€β”€ .streamlit/
β”‚   └── config.toml             # UI & server settings
β”œβ”€β”€ .github/
β”‚   └── workflows/
β”‚       β”œβ”€β”€ deploy-to-spaces.yml  # CI/CD – auto-deploy app
β”‚       └── train-model.yml       # CI/CD – fine-tune the model
β”œβ”€β”€ Dockerfile                  # Container definition for Docker/HF Spaces
└── README.md
```

## Research folder
The `research/` directory is reserved for exploratory notebooks, data-science experiments, and scratch work that shouldn't affect the production application. Feel free to place notebooks, CSVs, or prototype scripts here. Anything computationally heavy or containing large files should **not** be committed; the folder is in the `.gitignore` by default.

## Fine-tuning the model (aviation example)

This repo ships with a tiny JSON-Lines dataset in `data/` that contains sample Q&A about aviation. A GitHub Action (`train-model.yml`) fine-tunes `microsoft/DialoGPT-small` on that data and pushes the checkpoint to the Hub as `afscomercial/streamlit-chatbot-aviation` (or the repo name you set in the `MODEL_REPO` secret).
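JSON Lines means one JSON object per line. The exact field names in `data/aviation_conversations.jsonl` are not shown here, so the `"prompt"`/`"response"` keys below are an assumption; the parsing pattern itself is standard:

```python
import json

# Hypothetical rows mirroring data/aviation_conversations.jsonl
# (the real field names may differ).
sample = "\n".join([
    json.dumps({"prompt": "What does ATC stand for?",
                "response": "Air Traffic Control."}),
    json.dumps({"prompt": "What is Vr?",
                "response": "Rotation speed during takeoff."}),
])

def load_jsonl(text: str) -> list:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

records = load_jsonl(sample)
print(len(records))          # 2
print(records[0]["prompt"])  # What does ATC stand for?
```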

You can also run it locally:

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

export HF_TOKEN=YOUR_WRITE_TOKEN
export MODEL_REPO="afscomercial/streamlit-chatbot-aviation"
python fine_tune.py
```

The script will train for one epoch (change `EPOCHS` if you wish) and push the new weights to the model repo.

### Where the model is used
`app.py` now reads the environment variable `MODEL_REPO` (defaulting to `afscomercial/streamlit-chatbot-aviation`). At startup, the Streamlit app downloads the fine-tuned checkpoint instead of the vanilla DialoGPT model.

### Pushing the fine-tuned model to the Hub

There are two convenient ways to upload your checkpoint to the Hub once the training run is finished.

#### Option 1 β€” let the training script push automatically
`fine_tune.py` ends with `trainer.push_to_hub()`, so you only need to:

```bash
# 1 Β· Authenticate once (stores your token locally)
huggingface-cli login                     # paste your HF access token

# 2 Β· (First time only) create the model repo on the Hub
huggingface-cli repo create <USER>/<MODEL_REPO> -y
#     e.g.  huggingface-cli repo create your-username/streamlit-chatbot-aviation -y

# 3 Β· Point the run to that repo (default shown below)
export MODEL_REPO="your-username/streamlit-chatbot-aviation"

# 4 Β· Launch training β€” the script will commit + push automatically
python fine_tune.py
```

#### Option 2 β€” push an existing folder manually
If you already have the fine-tuned files on disk (e.g. in `finetuned-aviation/`):

```bash
# 1 Β· Create the repo once
huggingface-cli repo create your-username/streamlit-chatbot-aviation -y

# 2 Β· Clone the empty repo & copy your files into it
git clone https://huggingface.co/your-username/streamlit-chatbot-aviation
cd streamlit-chatbot-aviation
cp -r /path/to/finetuned-aviation/* .

# 3 Β· Commit and push
git add .
git commit -m "First fine-tuned checkpoint"
git push
```

After the checkpoint is online, simply point the Streamlit app to it (locally or on Spaces) with:

```bash
export MODEL_REPO="your-username/streamlit-chatbot-aviation"
streamlit run app.py
```

## Architecture

```mermaid
graph TD
  subgraph "Frontend"
    U["User<br/>Browser"] -->|"HTTP 8501"| A["Streamlit<br/>Chatbot (app.py)"]
  end

  subgraph "Backend"
    A -->|"Load fine-tuned weights<br/>+ tokenizer"| M["LLM<br/>DialoGPT-fine-tuned"]
    A -->|"Generate reply"| M
    M -->|"Response"| A
  end

  subgraph "Model Hub"
    MH["Hugging Face<br/>Model Repo"]
  end
  MH --> M

  subgraph "Training"
    DS["Dataset<br/>aviation_conversations.jsonl"]
    FT["fine_tune.py<br/>(HF Trainer)"]
    DS --> FT
    FT -->|"Push to Hub"| MH
  end

  CI["GitHub Actions<br/>train-model.yml"] --> FT
  CI2["GitHub Actions<br/>deploy-to-spaces.yml"] -->|"Docker Image"| HFSpace["HF Space<br/>Docker Runtime"]
  HFSpace --> A
```

---