Multilingual Document Assistant
Agent-style model for explaining documents, answering questions, and responding conversationally in:
- Spanish
- Chinese
- Vietnamese
- Portuguese
Base model: bigscience/bloom-560m on Hugging Face.
Run on Hugging Face
To run this as a Hugging Face Space (browser chat UI):
Create a Space at huggingface.co/new-space:
- Choose Gradio as the SDK.
- Clone or upload this repo (at least app.py and requirements.txt).
Use your fine-tuned model (after training and pushing):
- Train:
  python train.py
- Push to Hub:
  export HF_REPO_ID=your-username/multilingual-doc-assistant
  python push_to_hub.py
- In the Space, go to Settings → Variables and add:
  HF_MODEL_ID=your-username/multilingual-doc-assistant
- The app will load your model from the Hub. Without this, it uses the base BLOOM model.
The Space runs app.py and serves the Gradio chat interface.
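The push step can be sketched as below, assuming push_to_hub.py reads HF_REPO_ID from the environment and uploads the folder written by train.py. The helper names get_repo_id and push_model are hypothetical and the real script may differ. push_to_hub is the standard upload method on transformers models and tokenizers; it needs Hub credentials (for example from huggingface-cli login).

```python
import os


def get_repo_id() -> str:
    """Read the target Hub repo (e.g. your-username/multilingual-doc-assistant) from HF_REPO_ID."""
    repo_id = os.environ.get("HF_REPO_ID", "").strip()
    if not repo_id:
        raise SystemExit("Set HF_REPO_ID, e.g. export HF_REPO_ID=your-username/multilingual-doc-assistant")
    # Hub repo IDs have the form "owner/name".
    if repo_id.count("/") != 1:
        raise SystemExit(f"HF_REPO_ID should look like 'owner/name', got {repo_id!r}")
    return repo_id


def push_model(model_dir: str, repo_id: str) -> None:
    """Hypothetical helper: upload a saved model and tokenizer folder to the Hub."""
    # Imported lazily so the env-var check above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(model_dir)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    # Upload weights/config, then tokenizer files, to the same repo.
    model.push_to_hub(repo_id)
    tokenizer.push_to_hub(repo_id)
```

With HF_REPO_ID exported, calling push_model("multilingual-doc-model", get_repo_id()) from the project folder would upload both the model and the tokenizer, which is what the Space then loads through HF_MODEL_ID.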
Setup (local)
cd multilingual-doc-assistant
pip install -r requirements.txt
Train
python train.py
Saves the fine-tuned model and tokenizer to ./multilingual-doc-model. Paths are resolved relative to the script's location, so you can run it from any directory.
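Running train.py from any directory works when its paths are built from the script's own location instead of the current working directory. A minimal sketch of that pattern, using a hypothetical helper script_relative; the actual script may simply inline Path(__file__).resolve().parent.

```python
from pathlib import Path


def script_relative(name: str, script_file: str) -> Path:
    """Resolve `name` against the directory containing `script_file`, not the CWD."""
    return Path(script_file).resolve().parent / name


# Inside train.py this would be used as (hypothetical):
#   OUTPUT_DIR = script_relative("multilingual-doc-model", __file__)
#   DATA_FILE = script_relative("train.jsonl", __file__)
```

Because __file__ is fixed for a given script, the output folder lands next to train.py whether you start it from the repo root or elsewhere.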
Test / Chat
After training:
python test_model.py
Uses a Spanish prompt by default. You can edit the prompt in test_model.py to try other languages or questions.
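When changing the prompt, it helps to wrap it in the same User: / Assistant: format used for training and to trim the model's output afterwards. A minimal sketch, assuming a "User: ...\nAssistant:" template; build_prompt and extract_reply are hypothetical helpers, and the exact spacing should match whatever train.jsonl uses.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a question in the User:/Assistant: format assumed for train.jsonl."""
    return f"User: {user_message}\nAssistant:"


def extract_reply(generated: str, prompt: str) -> str:
    """Strip the echoed prompt from a causal-LM output and cut at the next 'User:' turn."""
    reply = generated[len(prompt):] if generated.startswith(prompt) else generated
    # A base-style model may keep generating further turns; keep only the first reply.
    return reply.split("\nUser:", 1)[0].strip()


# Spanish: "Can you summarize this document in two sentences?"
prompt = build_prompt("¿Puedes resumir este documento en dos frases?")
```

In test_model.py the prompt would then be tokenized, passed to model.generate, decoded with skip_special_tokens=True, and cleaned with extract_reply.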
Training data
Add more examples to train.jsonl (one JSON object per line, with a "text" key). Use the same User: / Assistant: format so the model learns the conversational style.
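Adding examples by hand is error-prone, so a small script can write and check them. A minimal sketch, assuming each record stores one "User: ...\nAssistant: ..." exchange under "text"; make_example, append_examples, and validate_jsonl are hypothetical names, not part of this repo.

```python
import json
from pathlib import Path
from typing import Iterable


def make_example(user: str, assistant: str) -> dict:
    """One training record in the User:/Assistant: style, stored under the "text" key."""
    return {"text": f"User: {user}\nAssistant: {assistant}"}


def append_examples(path: Path, examples: Iterable[dict]) -> None:
    """Append records to a JSONL file, one JSON object per line."""
    with path.open("a", encoding="utf-8") as f:
        for ex in examples:
            # ensure_ascii=False keeps Chinese and Vietnamese text readable in the file.
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")


def validate_jsonl(path: Path) -> int:
    """Check every non-blank line is an object with a string "text" key; return the count."""
    count = 0
    with path.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            if not isinstance(record, dict) or not isinstance(record.get("text"), str):
                raise ValueError(f"line {lineno}: expected an object with a string 'text' key")
            count += 1
    return count
```

Running validate_jsonl before python train.py catches malformed lines early instead of partway through training.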
Run the Space UI locally
pip install -r requirements.txt
python app.py
Then open the URL that Gradio prints (e.g. http://127.0.0.1:7860). To use your trained model locally, set HF_MODEL_ID to either a Hub repo ID or a local path. For the model produced by train.py, use the path to the multilingual-doc-model folder; transformers' from_pretrained accepts local directories as well as Hub repo IDs.
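One way app.py might choose its model, as a minimal sketch: read HF_MODEL_ID, fall back to bigscience/bloom-560m when it is unset, and report whether the value names a local folder or a Hub repo. resolve_model_id and model_source are hypothetical helpers, not the repo's actual code; the config.json check relies on save_pretrained writing that file into the model folder.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

BASE_MODEL_ID = "bigscience/bloom-560m"  # the base model named in this README


def resolve_model_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Return HF_MODEL_ID if set and non-blank, otherwise the base BLOOM model."""
    env = os.environ if env is None else env
    return env.get("HF_MODEL_ID", "").strip() or BASE_MODEL_ID


def model_source(model_id: str) -> str:
    """Report whether from_pretrained would see a local folder ("local") or a Hub repo ("hub")."""
    path = Path(model_id).expanduser()
    if path.is_dir():
        # save_pretrained() writes config.json; its absence usually means the wrong folder.
        if not (path / "config.json").is_file():
            raise FileNotFoundError(f"{path} has no config.json; is it the trained model folder?")
        return "local"
    return "hub"
```

The resolved ID would then be passed to AutoModelForCausalLM.from_pretrained and AutoTokenizer.from_pretrained, which accept either form.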