---
pipeline_tag: text-generation
---

# Multilingual Document Assistant

Agent-style model for explaining documents, answering questions, and responding conversationally in:

- **Spanish**
- **Chinese**
- **Vietnamese**
- **Portuguese**

Base model: [bigscience/bloom-560m](https://huggingface.co/bigscience/bloom-560m) on Hugging Face.

---

## Run on Hugging Face

To run this as a **Hugging Face Space** (browser chat UI):

1. **Create a Space** at [huggingface.co/new-space](https://huggingface.co/new-space):
   - Choose **Gradio**.
   - Clone or upload this repo (at least `app.py` and `requirements.txt`).
2. **Use your fine-tuned model** (after training and pushing):
   - Train: `python train.py`
   - Push to the Hub: `export HF_REPO_ID=your-username/multilingual-doc-assistant`, then `python push_to_hub.py`
   - In the Space, go to **Settings → Variables** and add:
     - `HF_MODEL_ID` = `your-username/multilingual-doc-assistant`
   - The app will load your model from the Hub. Without this, it uses the base BLOOM model.
3. The Space runs `app.py` and serves the Gradio chat interface.

---

## Setup (local)

```bash
cd multilingual-doc-assistant
pip install -r requirements.txt
```

## Train

```bash
python train.py
```

Saves the fine-tuned model and tokenizer to `./multilingual-doc-model`. You can run this from any directory; paths are resolved relative to the script.

## Test / Chat

After training:

```bash
python test_model.py
```

Uses a Spanish prompt by default. Edit the `prompt` in `test_model.py` to try other languages or questions.

## Training data

Add more examples to `train.jsonl` (one JSON object per line with a `"text"` key). Use the same `User:` / `Assistant:` format so the model learns the conversational style.

## Run the Space UI locally

```bash
pip install -r requirements.txt
python app.py
```

Then open the URL Gradio prints (e.g. http://127.0.0.1:7860).
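As a sketch of the `train.jsonl` format described under **Training data** (the Spanish prompt here is illustrative, not taken from the actual dataset):

```python
import json

# One training record per line, each a JSON object with a "text" key.
# The conversational turns use the User: / Assistant: format the
# model is fine-tuned on; this example content is hypothetical.
example = {
    "text": (
        "User: ¿De qué trata este documento?\n"
        "Assistant: Este documento explica cómo configurar el proyecto."
    )
}

line = json.dumps(example, ensure_ascii=False)
print(line)
```

Appending lines like this to `train.jsonl` before running `python train.py` adds them to the fine-tuning set; `ensure_ascii=False` keeps accented characters readable in the file.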
To use your trained model locally, set `HF_MODEL_ID` to a Hub repo ID or a local path; for a local folder, use the path to `multilingual-doc-model` (transformers accepts local paths wherever it accepts Hub IDs).
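A minimal sketch of that model selection, assuming `app.py` reads `HF_MODEL_ID` roughly like this (`resolve_model_id` is a hypothetical helper name, not part of the repo):

```python
import os

def resolve_model_id(default: str = "bigscience/bloom-560m") -> str:
    # HF_MODEL_ID may be a Hub repo id (e.g. "user/multilingual-doc-assistant")
    # or a local directory (e.g. "./multilingual-doc-model"); transformers'
    # from_pretrained() accepts either form unchanged.
    return os.environ.get("HF_MODEL_ID", default)

# Unset: fall back to the base BLOOM model.
os.environ.pop("HF_MODEL_ID", None)
print(resolve_model_id())  # → bigscience/bloom-560m

# Set to a local folder: the fine-tuned model is loaded from disk.
os.environ["HF_MODEL_ID"] = "./multilingual-doc-model"
print(resolve_model_id())  # → ./multilingual-doc-model
```

The resolved value can then be passed directly to `AutoTokenizer.from_pretrained` and `AutoModelForCausalLM.from_pretrained`.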