| --- |
| pipeline_tag: text-generation |
| --- |
| # Multilingual Document Assistant |
|
|
| Agent-style model for explaining documents, answering questions, and responding conversationally in: |
|
|
| - **Spanish** |
| - **Chinese** |
| - **Vietnamese** |
| - **Portuguese** |
|
|
| Base model: [bigscience/bloom-560m](https://huggingface.co/bigscience/bloom-560m) on Hugging Face. |
|
|
| --- |
|
|
| ## Run on Hugging Face |
|
|
| To run this as a **Hugging Face Space** (browser chat UI): |
|
|
| 1. **Create a Space** at [huggingface.co/new-space](https://huggingface.co/new-space): |
| - Choose **Gradio**. |
| - Clone or upload this repo (at least `app.py` and `requirements.txt`). |
|
|
| 2. **Use your fine-tuned model** (after training and pushing): |
| - Train: `python train.py` |
| - Push to Hub: `export HF_REPO_ID=your-username/multilingual-doc-assistant`, then `python push_to_hub.py` |
| - In the Space, go to **Settings → Variables** and add `HF_MODEL_ID` = `your-username/multilingual-doc-assistant`. |
| - With this variable set, the app loads your model from the Hub. Without it, the app falls back to the base BLOOM model. |
|
|
| 3. The Space runs `app.py` and serves the Gradio chat interface. |
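
The fallback described in step 2 could be implemented along these lines. This is a minimal sketch, not the actual contents of `app.py`: the helper `resolve_model_id` and the constant `BASE_MODEL_ID` are hypothetical names, and treating an empty value as unset is an assumption. Only the `HF_MODEL_ID` variable and the base model name come from this README.

```python
import os

# Base model named in this README; used when HF_MODEL_ID is not set.
BASE_MODEL_ID = "bigscience/bloom-560m"


def resolve_model_id(env=None):
    """Return HF_MODEL_ID if it is set and non-empty, else the base model.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    # Assumption: a blank value is treated the same as a missing one.
    model_id = env.get("HF_MODEL_ID", "").strip()
    return model_id or BASE_MODEL_ID
```

The returned string can be a Hub repo ID or a local folder path, so the same lookup would serve both the Space and a local run.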
|
|
| --- |
|
|
| ## Setup (local) |
|
|
| ```bash |
| cd multilingual-doc-assistant |
| pip install -r requirements.txt |
| ``` |
|
|
| ## Train |
|
|
| ```bash |
| python train.py |
| ``` |
|
|
| Saves the fine-tuned model and tokenizer to `./multilingual-doc-model`. You can run the script from any directory, because paths are resolved relative to the script's location. |
|
|
| ## Test / Chat |
|
|
| After training: |
|
|
| ```bash |
| python test_model.py |
| ``` |
|
|
| Uses a Spanish prompt by default. You can edit the `prompt` in `test_model.py` to try other languages or questions. |
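
For orientation, a test script of this kind might look roughly like the sketch below. It is not the actual `test_model.py`: the function names `build_prompt` and `generate_reply`, the newline between turns, and the `max_new_tokens` default are assumptions. The model path and the `User:` / `Assistant:` labels come from this README.

```python
def build_prompt(question):
    """Wrap a question in the User:/Assistant: format used for training.

    Assumption: turns are separated by a single newline.
    """
    return f"User: {question}\nAssistant:"


def generate_reply(question, model_path="./multilingual-doc-model", max_new_tokens=100):
    """Load the fine-tuned model and generate a continuation of the prompt."""
    # Imported here so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The decoded text includes the prompt followed by the model's answer.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_reply("¿Qué es un contrato?")` after training would exercise the Spanish path. Swapping in a Chinese, Vietnamese, or Portuguese question tests the other languages.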
|
|
| ## Training data |
|
|
| Add more examples in `train.jsonl` (one JSON object per line with a `"text"` key). Use the same `User:` / `Assistant:` format so the model learns the conversational style. |
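
As an illustration of that format, the sketch below builds one record and checks that a line is usable before it is appended to `train.jsonl`. The helpers `make_record` and `validate_line` are hypothetical, the sample sentence is invented for the example, and the single newline between the two turns is an assumption. Match whatever separator the existing lines in `train.jsonl` use.

```python
import json


def make_record(user_text, assistant_text):
    """Build one training record in the User:/Assistant: format."""
    # Assumption: the two turns are separated by a single newline.
    return {"text": f"User: {user_text}\nAssistant: {assistant_text}"}


def validate_line(line):
    """Return True if the line is a JSON object whose "text" has both turns."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    text = record.get("text")
    return isinstance(text, str) and "User:" in text and "Assistant:" in text


# ensure_ascii=False keeps accented and non-Latin characters readable in the file.
line = json.dumps(
    make_record("¿Qué es una factura?", "Una factura es un documento que detalla una venta."),
    ensure_ascii=False,
)
```

Because `json.dumps` escapes the newline inside the string, each record stays on a single physical line, which is what the JSONL format requires.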
|
|
| ## Run the Space UI locally |
|
|
| ```bash |
| pip install -r requirements.txt |
| python app.py |
| ``` |
|
|
| Then open the URL that Gradio prints (e.g. http://127.0.0.1:7860). To use your trained model locally, set `HF_MODEL_ID` to either a Hub repo ID or a local path. For a local folder, use the path to `multilingual-doc-model`; transformers accepts local paths as well as Hub IDs. |