Juru: Legal Brazilian Large Language Model from Reputable Sources
Paper • 2403.18140 • Published
How to use roseval/Juru-7B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="roseval/Juru-7B")
# Load model directly
# AutoModelForCausalLM keeps the language-modeling head needed for text generation
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("roseval/Juru-7B", dtype="auto")
How to use roseval/Juru-7B with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "roseval/Juru-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "roseval/Juru-7B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use roseval/Juru-7B with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "roseval/Juru-7B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "roseval/Juru-7B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "roseval/Juru-7B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "roseval/Juru-7B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use roseval/Juru-7B with Docker Model Runner:
docker model run hf.co/roseval/Juru-7B
This repository hosts the public checkpoints for Juru-7B, a Mistral-7B model specialised in the Brazilian legal domain. The model underwent continued pretraining on 1.9 billion unique tokens from reputable academic and legal sources in Portuguese. For full details on data curation, training, and evaluation, see our paper: https://arxiv.org/abs/2403.18140.
Note: The model has not been instruction finetuned. For best results, use few-shot inference or perform additional finetuning on your specific task.
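Since the checkpoint is a base model, a few-shot prompt is one way to steer it toward a task. The sketch below is illustrative only: the `Pergunta:`/`Resposta:` layout, the helper names `build_few_shot_prompt` and `generate_answer`, and the example questions are assumptions made for this example, not a format prescribed by the paper.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Format solved (question, answer) pairs followed by the open question."""
    blocks = [f"Pergunta: {q}\nResposta: {a}" for q, a in examples]
    # The final block leaves the answer empty so the model completes it.
    blocks.append(f"Pergunta: {question}\nResposta:")
    return "\n\n".join(blocks)


def generate_answer(prompt: str, max_new_tokens: int = 64) -> str:
    """Complete the prompt with Juru-7B (downloads the checkpoint on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    pipe = pipeline("text-generation", model="roseval/Juru-7B")
    out = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=False,          # greedy decoding for reproducible output
        return_full_text=False,   # return only the continuation, not the prompt
    )
    # Keep the first line only, so a follow-on "Pergunta:" is discarded.
    return out[0]["generated_text"].strip().split("\n")[0]


# Hypothetical demonstrations; replace them with examples from your own task.
examples = [
    ("Em que ano foi promulgada a Constituição Federal vigente?", "1988."),
    ("Qual tribunal é o guardião da Constituição Federal?", "O Supremo Tribunal Federal."),
]
prompt = build_few_shot_prompt(examples, "Qual é a sigla do Superior Tribunal de Justiça?")
```

Calling `generate_answer(prompt)` then returns the model's completion for the last question. Building the prompt is kept separate from generation so the same prompt can also be sent to a vLLM or SGLang server.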
@misc{junior2024jurulegalbrazilianlarge,
title={Juru: Legal Brazilian Large Language Model from Reputable Sources},
author={Roseval Malaquias Junior and Ramon Pires and Roseli Romero and Rodrigo Nogueira},
year={2024},
eprint={2403.18140},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2403.18140},
}
Base model
mistralai/Mistral-7B-v0.3