|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- legal |
|
|
- immigration |
|
|
- assistant |
|
|
- qwen2 |
|
|
- fine-tuned |
|
|
base_model: Qwen/Qwen2-7B-Instruct |
|
|
model_type: qwen2 |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
# DoloresAI - Immigration Law Assistant |
|
|
|
|
|
DoloresAI is a specialized legal assistant fine-tuned on immigration law, designed to provide accurate and helpful information about U.S. immigration processes, visa types, and legal procedures. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Base Model**: Qwen/Qwen2-7B-Instruct |
|
|
- **Model Type**: Qwen2ForCausalLM |
|
|
- **Parameters**: 7B |
|
|
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation) |
|
|
- **Vocabulary Size**: 151,665 tokens |
|
|
- **Precision**: FP16 |
|
|
- **Context Length**: 32,768 tokens |
|
|
- **Fixed on**: 2026-01-11 |
|
|
|
|
|
## Changes in This Version |
|
|
|
|
|
This is a fixed version of the DoloresAI merged model with the vocabulary-size mismatch resolved: |
|
|
- Fixed vocabulary size mismatch between model (151,936) and tokenizer (151,665) |
|
|
- Model embeddings properly resized to match tokenizer: 151,665 tokens |
|
|
- Ready for deployment on Hugging Face Inference Endpoints without the device-side CUDA errors that out-of-range token IDs previously triggered |
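
The fix itself is conceptually simple: the merged model's embedding matrix had 151,936 rows while the tokenizer defines only 151,665 tokens, so the extra rows are dropped. This is roughly what `transformers`' `resize_token_embeddings` does internally. A minimal NumPy sketch of the idea, using a toy hidden size for illustration:

```python
import numpy as np

MODEL_VOCAB = 151_936      # rows in the merged model's embedding matrix
TOKENIZER_VOCAB = 151_665  # tokens the tokenizer actually defines
HIDDEN = 8                 # toy hidden size, not the model's real width

def resize_embeddings(weights: np.ndarray, new_size: int) -> np.ndarray:
    """Truncate (or zero-pad) an embedding matrix to new_size rows."""
    old_size, dim = weights.shape
    if new_size <= old_size:
        return weights[:new_size].copy()   # shrinking: drop trailing rows
    padded = np.zeros((new_size, dim), dtype=weights.dtype)
    padded[:old_size] = weights            # growing: keep old rows, zero-init the rest
    return padded

emb = np.random.rand(MODEL_VOCAB, HIDDEN).astype(np.float32)
fixed = resize_embeddings(emb, TOKENIZER_VOCAB)
print(fixed.shape)  # (151665, 8)
```

Without this resize, any token ID at or above the tokenizer's range indexes past the embedding table on GPU, which surfaces as a device-side assertion rather than a readable Python error.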
|
|
|
|
|
## Training |
|
|
|
|
|
This model was fine-tuned using LoRA adapters on immigration law data and then merged with the base model. The embeddings have been properly resized to match the tokenizer vocabulary size. |
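
The merge step folds the low-rank update back into the base weights: with LoRA, each adapted weight matrix becomes W' = W + (α/r)·B·A, where A and B are the trained low-rank factors. A toy NumPy sketch of this arithmetic (the dimensions are illustrative, not the model's real shapes):

```python
import numpy as np

d_out, d_in = 6, 4   # toy layer shape; the real model's layers are far larger
r, alpha = 2, 4      # LoRA rank and scaling factor (illustrative values)

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # trained down-projection
B = rng.standard_normal((d_out, r))      # trained up-projection

# Merging replaces W with W + (alpha / r) * B @ A, so inference
# no longer needs the adapter weights at all.
W_merged = W + (alpha / r) * (B @ A)

# The merged layer computes the same output as base + adapter:
x = rng.standard_normal(d_in)
assert np.allclose(W_merged @ x, W @ x + (alpha / r) * (B @ (A @ x)))
```

Merging trades the adapter's small on-disk footprint for zero inference overhead, which is why this repository ships the full merged weights.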
|
|
|
|
|
## Intended Use |
|
|
|
|
|
DoloresAI is designed to assist with: |
|
|
- Immigration process information |
|
|
- Visa type explanations |
|
|
- Legal procedure guidance |
|
|
- Document requirements |
|
|
- Timeline estimates |
|
|
- Form instructions |
|
|
|
|
|
**Important**: This model provides information only and should not be considered legal advice. Always consult with a licensed immigration attorney for specific legal matters. |
|
|
|
|
|
## Usage |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
import torch |
|
|
|
|
|
model_name = "JustiGuide/DoloresAI-Merged" |
|
|
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
model_name, |
|
|
torch_dtype=torch.float16, |
|
|
device_map="auto" |
|
|
) |
|
|
|
|
|
prompt = "What are the requirements for an H-1B visa?" |
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
|
|
|
outputs = model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=512, |
|
|
temperature=0.7, |
|
|
top_p=0.9, |
|
|
do_sample=True |
|
|
) |
|
|
|
|
|
response = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
|
print(response) |
|
|
``` |
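
Since the base model is Qwen2-Instruct, responses are usually better when the prompt follows its ChatML chat template; in practice `tokenizer.apply_chat_template` handles this for you. A hand-rolled sketch of the format for illustration (the system message text here is just an example, not the model's trained system prompt):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style prompt as used by Qwen2-Instruct models."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"   # generation continues from here
    )

prompt = build_chatml_prompt(
    "You are DoloresAI, an assistant for U.S. immigration questions.",
    "What are the requirements for an H-1B visa?",
)
print(prompt)
```

The resulting string can be passed to `tokenizer(...)` in place of the raw prompt in the usage example above.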
|
|
|
|
|
## Deployment |
|
|
|
|
|
### Hugging Face Inference Endpoints |
|
|
|
|
|
For production deployment, use these environment variables to avoid CUDA errors: |
|
|
|
|
|
```bash |
|
|
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True |
|
|
CUDA_LAUNCH_BLOCKING=1 |
|
|
TORCH_USE_CUDA_DSA=1 |
|
|
TRANSFORMERS_OFFLINE=0 |
|
|
HF_HUB_ENABLE_HF_TRANSFER=1 |
|
|
MODEL_LOAD_TIMEOUT=600 |
|
|
``` |
|
|
|
|
|
Note that `CUDA_LAUNCH_BLOCKING=1` and `TORCH_USE_CUDA_DSA=1` are debugging aids: they make CUDA errors easier to localize but add runtime overhead, so consider removing them once the endpoint is stable. |

|

Recommended hardware: NVIDIA A10G or better |
|
|
|
|
|
## Verification |
|
|
|
|
|
The vocabulary sizes have been verified to match: |
|
|
- Model vocab size: 151,665 ✅ |
|
|
- Tokenizer vocab size: 151,665 ✅ |
|
|
- Match: ✅ |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Trained primarily on U.S. immigration law |
|
|
- Knowledge cutoff based on training data |
|
|
- Not a replacement for legal counsel |
|
|
- May require additional context for complex cases |
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0 |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{doloresai2025, |
|
|
title={DoloresAI: Immigration Law Assistant}, |
|
|
author={JustiGuide}, |
|
|
year={2025}, |
|
|
publisher={HuggingFace}, |
|
|
howpublished={\url{https://huggingface.co/JustiGuide/DoloresAI-Merged}} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Model Card Authors |
|
|
|
|
|
JustiGuide Team |
|
|
|
|
|
## Model Card Contact |
|
|
|
|
|
For questions or issues, please open an issue on the model repository. |
|
|
|