---
pipeline_tag: text-generation
---
# Multilingual Document Assistant

An agent-style model for explaining documents, answering questions, and responding conversationally in:

- Spanish
- Chinese
- Vietnamese
- Portuguese

Base model: [bigscience/bloom-560m](https://huggingface.co/bigscience/bloom-560m).

---

## Run on Hugging Face

To run this as a Hugging Face Space (browser chat UI):

1. Create a Space at [huggingface.co/new-space](https://huggingface.co/new-space):
   - Choose Gradio as the SDK.
   - Clone or upload this repo (at least `app.py` and `requirements.txt`).

2. Use your fine-tuned model (after training and pushing):
   - Train: `python train.py`
   - Push to the Hub: `export HF_REPO_ID=your-username/multilingual-doc-assistant`, then `python push_to_hub.py`
   - In the Space, go to Settings → Variables and add:
     - `HF_MODEL_ID` = `your-username/multilingual-doc-assistant`
   - The app then loads your model from the Hub. Without this variable, it falls back to the base BLOOM model.

3. The Space runs `app.py` and serves the Gradio chat interface.
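
`push_to_hub.py` itself isn't shown here. A minimal sketch of what such a script might do, assuming it reads `HF_REPO_ID` from the environment and uploads the folder written by `train.py`, is below; `push_trained_model` is a hypothetical name, not the script's actual API. Uploading requires being authenticated with the Hub first (for example via `huggingface-cli login`).

```python
import os


def push_trained_model(local_dir: str = "multilingual-doc-model") -> str:
    """Hypothetical sketch: upload a locally saved model and tokenizer to the Hub.

    Assumes HF_REPO_ID is set (e.g. your-username/multilingual-doc-assistant)
    and that you are already logged in to the Hugging Face Hub.
    """
    # Read the target repo first so a missing variable fails fast,
    # before the slow transformers import and model load.
    repo_id = os.environ.get("HF_REPO_ID")
    if not repo_id:
        raise RuntimeError("Set HF_REPO_ID, e.g. your-username/multilingual-doc-assistant")

    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(local_dir)
    tokenizer = AutoTokenizer.from_pretrained(local_dir)

    # Both the model and the tokenizer expose push_to_hub(repo_id).
    model.push_to_hub(repo_id)
    tokenizer.push_to_hub(repo_id)
    return repo_id
```

Pushing the tokenizer alongside the model matters: `from_pretrained(HF_MODEL_ID)` in the Space needs both to be present in the same repo.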

---

## Setup (local)

```bash
cd multilingual-doc-assistant
pip install -r requirements.txt
```

## Train

```bash
python train.py
```

Saves the fine-tuned model and tokenizer to `multilingual-doc-model/` in the same directory as `train.py`. Paths are resolved relative to the script, so you can run it from any directory.
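
One common way to make paths script-relative is to resolve them against `__file__` rather than the current working directory. This is an assumption about how `train.py` does it, not taken from its source; `default_output_dir` is a hypothetical helper.

```python
from pathlib import Path


def default_output_dir(script_path: str) -> Path:
    """Hypothetical helper: <directory containing the script>/multilingual-doc-model.

    Resolving against the script (not os.getcwd()) is what makes the output
    location independent of where the command is launched from.
    """
    return Path(script_path).resolve().parent / "multilingual-doc-model"
```

Inside a script you would call it as `default_output_dir(__file__)`.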

## Test / Chat

After training:

```bash
python test_model.py
```

Uses a Spanish prompt by default. You can edit the `prompt` in `test_model.py` to try other languages or questions.
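
The source of `test_model.py` isn't reproduced here, so the following is only a sketch of a comparable test, assuming prompts take the form `User: <question>` followed by a newline and `Assistant:`. `build_prompt` and `generate_reply` are hypothetical names. The `transformers` calls (`AutoTokenizer`, `AutoModelForCausalLM`, `generate`) are standard, and PyTorch is assumed to be installed.

```python
def build_prompt(question: str) -> str:
    # Hypothetical: mirrors the User:/Assistant: training format; the model
    # is expected to continue the text after "Assistant:".
    return f"User: {question}\nAssistant:"


def generate_reply(question: str, model_path: str = "multilingual-doc-model",
                   max_new_tokens: int = 128) -> str:
    """Hypothetical sketch: load the fine-tuned model and answer one question."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    reply = tokenizer.decode(new_tokens, skip_special_tokens=True)

    # A small causal LM may keep going and invent the next "User:" turn; cut it off.
    return reply.split("\nUser:")[0].strip()
```

For example, `generate_reply("¿Qué es un contrato?")` asks in Spanish, and swapping the question for Chinese, Vietnamese, or Portuguese text exercises the other languages.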

## Training data

Add more examples in `train.jsonl` (one JSON object per line with a `"text"` key). Use the same `User:` / `Assistant:` format so the model learns the conversational style.
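
To add examples without hand-editing JSON, a small helper can write and check lines. This is a sketch that assumes each `"text"` value is a single `User:` turn, a newline, then an `Assistant:` turn; the function names are hypothetical and not part of this repo.

```python
import json
from pathlib import Path


def format_example(user: str, assistant: str) -> dict:
    # Hypothetical layout: one "User:" turn, newline, one "Assistant:" turn.
    return {"text": f"User: {user}\nAssistant: {assistant}"}


def append_examples(path: Path, pairs: list[tuple[str, str]]) -> None:
    """Hypothetical helper: append one JSON object per line (JSONL)."""
    # ensure_ascii=False keeps Chinese, Vietnamese, and Spanish text readable in the file.
    with path.open("a", encoding="utf-8") as f:
        for user, assistant in pairs:
            f.write(json.dumps(format_example(user, assistant), ensure_ascii=False) + "\n")


def validate(path: Path) -> int:
    """Check each non-empty line is an object with a string "text" key; return the count."""
    count = 0
    with path.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            if not isinstance(record, dict) or not isinstance(record.get("text"), str):
                raise ValueError(f"line {lineno}: expected an object with a string 'text' key")
            if "User:" not in record["text"] or "Assistant:" not in record["text"]:
                raise ValueError(f"line {lineno}: missing User:/Assistant: markers")
            count += 1
    return count
```

Running `validate` before `python train.py` catches malformed lines early instead of partway through training.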

## Run the Space UI locally

```bash
pip install -r requirements.txt
python app.py
```

Then open the URL Gradio prints (e.g. http://127.0.0.1:7860). To use your trained model locally, set `HF_MODEL_ID` to either a Hub repo ID or a local path. For the folder produced by `train.py`, point it at `multilingual-doc-model` (`from_pretrained` in transformers accepts local directories as well as Hub IDs).
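
The model-selection rule described above (use `HF_MODEL_ID` if set, otherwise the base BLOOM model) can be sketched as follows. This is a hypothetical reconstruction of the documented behavior, not the actual code in `app.py`; `resolve_model_source` and `BASE_MODEL_ID` are illustrative names.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple

# Hypothetical constant: the documented fallback when HF_MODEL_ID is unset.
BASE_MODEL_ID = "bigscience/bloom-560m"


def resolve_model_source(env: Optional[Mapping[str, str]] = None) -> Tuple[str, bool]:
    """Hypothetical sketch: return (model_id_or_path, is_local_dir).

    Uses HF_MODEL_ID when it is set and non-empty, otherwise the base model.
    The returned string can be passed straight to from_pretrained, which
    accepts either a Hub repo ID or a path to a saved folder.
    """
    env = os.environ if env is None else env
    source = env.get("HF_MODEL_ID", "").strip() or BASE_MODEL_ID
    return source, Path(source).is_dir()
```

Locally, something like `HF_MODEL_ID=./multilingual-doc-model python app.py` would then pick up the trained folder, while leaving the variable unset keeps the base model.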