Multilingual Document Assistant

Agent-style model for explaining documents, answering questions, and responding conversationally in:

Spanish
Chinese
Vietnamese
Portuguese

Base model: bigscience/bloom-560m on Hugging Face.

Run on Hugging Face

To run this as a Hugging Face Space (browser chat UI):

1. Create a Space at huggingface.co/new-space:
   - Choose Gradio.
   - Clone or upload this repo (at least app.py and requirements.txt).
2. Use your fine-tuned model (after training and pushing):
   - Train: python train.py
   - Push to Hub: export HF_REPO_ID=your-username/multilingual-doc-assistant, then run python push_to_hub.py
   - In the Space, go to Settings → Variables and add:
     - HF_MODEL_ID = your-username/multilingual-doc-assistant
   - The app will then load your model from the Hub. If HF_MODEL_ID is not set, it uses the base BLOOM model.
3. The Space runs app.py and serves the Gradio chat interface.
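
For orientation, the push step could look roughly like the sketch below. It is not the contents of push_to_hub.py, only an illustration of the idea: read HF_REPO_ID from the environment, then upload the saved tokenizer and model with the push_to_hub method from transformers. The helper names get_repo_id and push are hypothetical, and the upload assumes you are already authenticated with the Hub.

```python
import os


def get_repo_id(env=None):
    """Read the target Hub repo from HF_REPO_ID (hypothetical helper)."""
    env = os.environ if env is None else env
    repo_id = env.get("HF_REPO_ID", "").strip()
    # A Hub repo ID has the form "user-or-org/name".
    if "/" not in repo_id:
        raise ValueError("Set HF_REPO_ID to something like your-username/multilingual-doc-assistant")
    return repo_id


def push(model_dir, repo_id):
    """Upload the tokenizer and model saved in model_dir to the Hub."""
    # Imported lazily so get_repo_id works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    AutoTokenizer.from_pretrained(model_dir).push_to_hub(repo_id)
    AutoModelForCausalLM.from_pretrained(model_dir).push_to_hub(repo_id)


if __name__ == "__main__":
    # Assumes training saved the model to ./multilingual-doc-model.
    push("./multilingual-doc-model", get_repo_id())
```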

Setup (local)

cd multilingual-doc-assistant
pip install -r requirements.txt

Train

python train.py

Saves the fine-tuned model and tokenizer to ./multilingual-doc-model. You can run the script from any directory; paths are resolved relative to the script's location, not the current working directory.
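
One common way to get script-relative paths is to build them from the script's own location. The sketch below shows that pattern; it is an assumption about how train.py might do it, and model_output_dir is a hypothetical helper name.

```python
from pathlib import Path


def model_output_dir(script_path):
    """Return the model folder that sits next to the given script."""
    # resolve() makes the path absolute, so the result does not depend
    # on the directory the script was launched from.
    return Path(script_path).resolve().parent / "multilingual-doc-model"
```

Inside a script, the call would be model_output_dir(__file__).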

Test / Chat

After training:

python test_model.py

Uses a Spanish prompt by default. You can edit the prompt in test_model.py to try other languages or questions.
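
As a rough illustration of trying other languages, the sketch below builds a prompt in the User: / Assistant: style and generates a reply with transformers. It is not the contents of test_model.py: the sample questions, the helper names build_prompt and generate, the newline between the two turns, and the max_new_tokens value are all assumptions made for this example.

```python
# Illustrative questions, one per supported language (not taken from the repo).
PROMPTS = {
    "es": "¿Puedes explicar este documento?",
    "zh": "你能解释这份文件吗？",
    "vi": "Bạn có thể giải thích tài liệu này không?",
    "pt": "Você pode explicar este documento?",
}


def build_prompt(question):
    """Wrap a question in the conversational format, leaving the reply open."""
    return f"User: {question}\nAssistant:"


def generate(prompt, model_dir="./multilingual-doc-model", max_new_tokens=80):
    """Load the fine-tuned model from model_dir and complete the prompt."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate(build_prompt(PROMPTS["pt"])))
```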

Training data

Add more examples to train.jsonl, one JSON object per line with a "text" key. Use the same User: / Assistant: format as the existing examples so the model learns the conversational style.
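
A small helper can keep new examples consistent. The sketch below formats one exchange as a JSONL line and checks that a line has the expected shape. The function names are hypothetical, and the exact separator between the User: and Assistant: turns (a single newline here) is an assumption; match whatever the existing lines in train.jsonl use.

```python
import json


def to_jsonl_line(user, assistant):
    """Format one exchange as a single JSON line with a "text" key."""
    text = f"User: {user}\nAssistant: {assistant}"
    # ensure_ascii=False keeps Spanish, Chinese, Vietnamese, and
    # Portuguese characters readable in the file.
    return json.dumps({"text": text}, ensure_ascii=False)


def is_valid_line(line):
    """Check that a line is a JSON object whose "text" has both turns."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    text = record.get("text")
    return isinstance(text, str) and text.startswith("User: ") and "\nAssistant: " in text


def append_example(path, user, assistant):
    """Append one example to the training file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(to_jsonl_line(user, assistant) + "\n")
```

For example, append_example("train.jsonl", "¿Qué dice este documento?", "Es un contrato de alquiler.") adds one Spanish example.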

Run the Space UI locally

pip install -r requirements.txt
python app.py

Then open the URL Gradio prints (e.g. http://127.0.0.1:7860). To use your trained model locally, set HF_MODEL_ID to either a Hub repo ID or a local path. For a local folder, use the path to multilingual-doc-model; transformers accepts local paths wherever it accepts a Hub ID.
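
The model selection described above can be sketched as follows. This is an assumption about how app.py might resolve the setting, not its actual code: resolve_model_id is a hypothetical helper that returns the base model when HF_MODEL_ID is unset, an absolute path when the value is an existing local folder, and the value unchanged otherwise (treated as a Hub repo ID).

```python
import os

BASE_MODEL_ID = "bigscience/bloom-560m"  # base model named in this README


def resolve_model_id(env=None):
    """Pick the model to load from HF_MODEL_ID, with a base-model fallback."""
    env = os.environ if env is None else env
    value = env.get("HF_MODEL_ID", "").strip()
    if not value:
        return BASE_MODEL_ID
    # If the value points at an existing folder, use its absolute path;
    # otherwise pass it through as a Hub repo ID.
    local = os.path.abspath(os.path.expanduser(value))
    return local if os.path.isdir(local) else value
```

The result can be passed straight to AutoTokenizer.from_pretrained and AutoModelForCausalLM.from_pretrained.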
