---
pipeline_tag: text-generation
---
# Multilingual Document Assistant
Agent-style model for explaining documents, answering questions, and responding conversationally in:
- **Spanish**
- **Chinese**
- **Vietnamese**
- **Portuguese**
Base model: [bigscience/bloom-560m](https://huggingface.co/bigscience/bloom-560m) on Hugging Face.
---
## Run on Hugging Face
To run this as a **Hugging Face Space** (browser chat UI):
1. **Create a Space** at [huggingface.co/new-space](https://huggingface.co/new-space):
   - Choose **Gradio**.
   - Clone or upload this repo (at least `app.py` and `requirements.txt`).
2. **Use your fine-tuned model** (after training and pushing):
   - Train: `python train.py`
   - Push to Hub: `export HF_REPO_ID=your-username/multilingual-doc-assistant` then `python push_to_hub.py`
   - In the Space, go to **Settings → Variables** and add:
     - `HF_MODEL_ID` = `your-username/multilingual-doc-assistant`
   - The app will load your model from the Hub. Without this, it uses the base BLOOM model.
3. The Space runs `app.py` and serves the Gradio chat interface.
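
The fallback described in step 2 could be sketched as follows. This is a minimal illustration, not the actual contents of `app.py`: the helper name `resolve_model_id` is hypothetical, and the sketch assumes the app reads `HF_MODEL_ID` from the environment and otherwise uses `bigscience/bloom-560m`.

```python
import os

BASE_MODEL_ID = "bigscience/bloom-560m"  # base model named in this README


def resolve_model_id(env=None):
    """Return HF_MODEL_ID if it is set and non-empty, else the base model.

    Hypothetical helper; the real app.py may be structured differently.
    """
    env = os.environ if env is None else env
    model_id = env.get("HF_MODEL_ID", "").strip()
    return model_id or BASE_MODEL_ID


if __name__ == "__main__":
    print(f"Loading model: {resolve_model_id()}")
```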
---
## Setup (local)
```bash
cd multilingual-doc-assistant
pip install -r requirements.txt
```
## Train
```bash
python train.py
```
Saves the fine-tuned model and tokenizer to `./multilingual-doc-model`. You can run the script from any directory; paths are resolved relative to the script's location, not the current working directory.
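
One common way to make paths independent of the working directory is to anchor them to the script's own location. A minimal sketch, assuming `train.py` does something along these lines; the helper name `model_output_dir` is hypothetical and not taken from the repo:

```python
from pathlib import Path


def model_output_dir(script_file):
    """Return the output folder that sits next to the given script.

    The result does not depend on the current working directory.
    Hypothetical helper, shown for illustration only.
    """
    return Path(script_file).resolve().parent / "multilingual-doc-model"


# Inside train.py this would typically be called as:
#     output_dir = model_output_dir(__file__)
```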
## Test / Chat
After training:
```bash
python test_model.py
```
Uses a Spanish prompt by default. You can edit the `prompt` in `test_model.py` to try other languages or questions.
## Training data
Add more examples in `train.jsonl` (one JSON object per line with a `"text"` key). Use the same `User:` / `Assistant:` format so the model learns the conversational style.
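
A small helper for appending examples in this format might look like the sketch below. The function names are hypothetical, and the newline between the `User:` and `Assistant:` turns is an assumption; match whatever separator the existing lines in `train.jsonl` use.

```python
import json


def format_example(user_text, assistant_text):
    """Build one training record with a single "text" key.

    The newline between turns is an assumption, not confirmed by the repo.
    """
    return {"text": f"User: {user_text}\nAssistant: {assistant_text}"}


def append_examples(path, pairs):
    """Append (user, assistant) pairs to a JSONL file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for user_text, assistant_text in pairs:
            record = format_example(user_text, assistant_text)
            # ensure_ascii=False keeps accented and CJK characters readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

For example, `append_examples("train.jsonl", [("¿Qué es este documento?", "Es un contrato de alquiler.")])` would add one line to the file; the question and answer here are illustrative placeholders only.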
## Run the Space UI locally
```bash
pip install -r requirements.txt
python app.py
```
Then open the URL Gradio prints (e.g. http://127.0.0.1:7860). To use your trained model locally, set `HF_MODEL_ID` to either a Hub repo ID or a local path, such as the path to the `multilingual-doc-model` folder (Transformers loads models from local directories as well as from the Hub).
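
To check which of the two a given `HF_MODEL_ID` value will be treated as, a quick sanity check along these lines can help. It is only a sketch: `describe_model_source` is a hypothetical helper, and it mirrors the general rule that an existing directory is read from disk while anything else is looked up on the Hub.

```python
import os


def describe_model_source(model_id):
    """Classify a model identifier as a local folder or a Hub repo ID.

    Hypothetical helper for illustration; it does not load any model.
    """
    expanded = os.path.abspath(os.path.expanduser(model_id))
    if os.path.isdir(expanded):
        return ("local", expanded)
    return ("hub", model_id)


if __name__ == "__main__":
    value = os.environ.get("HF_MODEL_ID", "bigscience/bloom-560m")
    print(describe_model_source(value))
```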