---
pipeline_tag: text-generation
---
# Multilingual Document Assistant

Agent-style model for explaining documents, answering questions, and responding conversationally in:

- **Spanish**
- **Chinese**
- **Vietnamese**
- **Portuguese**

Base model: [bigscience/bloom-560m](https://huggingface.co/bigscience/bloom-560m) on Hugging Face.

---

## Run on Hugging Face

To run this as a **Hugging Face Space** (browser chat UI):

1. **Create a Space** at [huggingface.co/new-space](https://huggingface.co/new-space):
   - Choose **Gradio**.
   - Clone or upload this repo (at least `app.py` and `requirements.txt`).

2. **Use your fine-tuned model** (after training and pushing):
   - Train: `python train.py`
   - Push to Hub: `export HF_REPO_ID=your-username/multilingual-doc-assistant` then `python push_to_hub.py`
   - In the Space, go to **Settings → Variables** and add:
     - `HF_MODEL_ID` = `your-username/multilingual-doc-assistant`
   - The app will load your model from the Hub. Without this, it uses the base BLOOM model.

3. The Space runs `app.py` and serves the Gradio chat interface.
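The model-selection logic described in step 2 can be sketched as follows. This is a minimal illustration, not the actual contents of `app.py`; the `BASE_MODEL` constant and helper name are assumptions:

```python
import os

# Assumption: app.py falls back to the base BLOOM model when no
# fine-tuned model is configured (per the note in step 2 above).
BASE_MODEL = "bigscience/bloom-560m"

def resolve_model_id() -> str:
    """Return the model to load: HF_MODEL_ID if set, else the base model."""
    return os.environ.get("HF_MODEL_ID", "").strip() or BASE_MODEL
```

The resolved ID can then be passed straight to `AutoModelForCausalLM.from_pretrained(...)`, which accepts both Hub repo IDs and local paths.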

---

## Setup (local)

```bash
cd multilingual-doc-assistant
pip install -r requirements.txt
```

## Train

```bash
python train.py
```

Saves the fine-tuned model and tokenizer to `./multilingual-doc-model`. You can run from any directory; paths are relative to the script.

## Test / Chat

After training:

```bash
python test_model.py
```

Uses a Spanish prompt by default. You can edit the `prompt` in `test_model.py` to try other languages or questions.
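A prompt for this model presumably wraps the question in the same `User:` / `Assistant:` format used for training. A hypothetical helper (the function name and example question are illustrative, not `test_model.py`'s actual API):

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the User:/Assistant: chat format the model was
    fine-tuned on, leaving the Assistant turn open for generation."""
    return f"User: {question}\nAssistant:"

# Example in the default language (Spanish):
prompt = build_prompt("¿De qué trata este documento?")
```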

## Training data

Add more examples in `train.jsonl` (one JSON object per line with a `"text"` key). Use the same `User:` / `Assistant:` format so the model learns the conversational style.
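A new line can be appended to `train.jsonl` like this (the example text is illustrative; only the `"text"` key and the `User:` / `Assistant:` format come from the description above):

```python
import json

# One training example: a single JSON object with a "text" key,
# using the same User:/Assistant: conversational format.
example = {
    "text": "User: ¿Qué idiomas soporta el asistente?\n"
            "Assistant: Soporta español, chino, vietnamita y portugués."
}

# Append the example as one line of train.jsonl.
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```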

## Run the Space UI locally

```bash
pip install -r requirements.txt
python app.py
```

Then open the URL Gradio prints (e.g. http://127.0.0.1:7860). To use your trained model locally, set `HF_MODEL_ID` to a Hub repo ID or a local path such as `./multilingual-doc-model` (transformers accepts local directories as well as Hub IDs).