---
language:
- en
license: apache-2.0
tags:
- legal
- immigration
- assistant
- qwen2
- fine-tuned
base_model: Qwen/Qwen2-7B-Instruct
model_type: qwen2
pipeline_tag: text-generation
---
# DoloresAI - Immigration Law Assistant
DoloresAI is a specialized legal assistant fine-tuned on immigration law, designed to provide accurate and helpful information about U.S. immigration processes, visa types, and legal procedures.
## Model Details
- **Base Model**: Qwen/Qwen2-7B-Instruct
- **Model Type**: Qwen2ForCausalLM
- **Parameters**: 7B
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Vocabulary Size**: 151,665 tokens
- **Precision**: FP16
- **Context Length**: 32,768 tokens
- **Fixed on**: 2026-01-11
## Changes in This Version
This is a fixed version of the DoloresAI merged model with the vocabulary-size mismatch resolved:
- Fixed vocabulary size mismatch between model (151,936) and tokenizer (151,665)
- Model embeddings properly resized to match tokenizer: 151,665 tokens
- Ready for deployment on HuggingFace Inference Endpoints without CUDA errors
## Training
This model was fine-tuned using LoRA adapters on immigration law data and then merged with the base model. The embeddings have been properly resized to match the tokenizer vocabulary size.
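Merging folds the low-rank LoRA update into the base weights, so the deployed model needs no adapter modules at inference time. A toy sketch of the arithmetic only (illustrative shapes, not the actual training code):

```python
import torch

# LoRA merge: W' = W + (alpha / r) * B @ A, folded into the frozen weight.
d, r, alpha = 8, 2, 16
W = torch.randn(d, d)   # frozen base weight
A = torch.randn(r, d)   # LoRA down-projection
B = torch.zeros(d, r)   # LoRA up-projection (conventionally zero-initialized)

W_merged = W + (alpha / r) * (B @ A)

assert W_merged.shape == W.shape
# With B still at its zero init, the merge is a no-op on the base weight.
assert torch.allclose(W_merged, W)
```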
## Intended Use
DoloresAI is designed to assist with:
- Immigration process information
- Visa type explanations
- Legal procedure guidance
- Document requirements
- Timeline estimates
- Form instructions
**Important**: This model provides information only and should not be considered legal advice. Always consult with a licensed immigration attorney for specific legal matters.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "JustiGuide/DoloresAI-Merged"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # matches the FP16 weights in this repo
    device_map="auto",
)

prompt = "What are the requirements for an H-1B visa?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
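Since the base model is instruct-tuned, responses are usually better when the prompt follows the model's chat format. A minimal sketch, assuming the merged model keeps Qwen2's ChatML-style template; in practice, prefer `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which uses the template shipped with the tokenizer:

```python
# Hedged sketch: Qwen2-Instruct models use a ChatML-style prompt format.
# The manual version below is only illustrative.
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style prompt (assumed Qwen2 format)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are an immigration law assistant.",
    "What are the requirements for an H-1B visa?",
)
print(prompt)
```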
## Deployment
### HuggingFace Inference Endpoints
For production deployment, set these environment variables to avoid CUDA errors. Note that `CUDA_LAUNCH_BLOCKING=1` and `TORCH_USE_CUDA_DSA=1` are debugging flags that make CUDA errors easier to diagnose but may reduce throughput; keep them only while troubleshooting:
```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
CUDA_LAUNCH_BLOCKING=1
TORCH_USE_CUDA_DSA=1
TRANSFORMERS_OFFLINE=0
HF_HUB_ENABLE_HF_TRANSFER=1
MODEL_LOAD_TIMEOUT=600
```
Recommended hardware: NVIDIA A10G or better
## Verification
The vocabulary sizes have been verified to match:
- Model vocab size: 151,665 ✅
- Tokenizer vocab size: 151,665 ✅
- Match: ✅
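The check itself is a single comparison; with `transformers` it can be done cheaply by loading only the config and tokenizer and comparing `AutoConfig.from_pretrained(repo).vocab_size` against `len(AutoTokenizer.from_pretrained(repo))`. Reduced to its essence, with the numbers reported above:

```python
def vocab_match(model_vocab: int, tokenizer_vocab: int) -> bool:
    """Return True when the model's embedding rows match the tokenizer."""
    return model_vocab == tokenizer_vocab

# Values reported for this repo:
assert vocab_match(151_665, 151_665)       # after the fix
assert not vocab_match(151_936, 151_665)   # the pre-fix mismatch
```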
## Limitations
- Trained primarily on U.S. immigration law
- Knowledge cutoff based on training data
- Not a replacement for legal counsel
- May require additional context for complex cases
## License
Apache 2.0
## Citation
```bibtex
@misc{doloresai2025,
  title        = {DoloresAI: Immigration Law Assistant},
  author       = {JustiGuide},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/JustiGuide/DoloresAI-Merged}}
}
```
## Model Card Authors
JustiGuide Team
## Model Card Contact
For questions or issues, please open an issue on the model repository.