Mistral-Merged (Compressed QLoRA Model)
Overview
This model is a compressed and fine-tuned version of:
mistralai/Voxtral-Mini-4B-Realtime-2602
The objective of this project is to reduce inference cost, memory usage, and energy consumption while maintaining acceptable output quality.
The model is optimized for:
- Efficient inference
- Low GPU memory usage
- vLLM deployment
- Energy-aware benchmarking
Compression Techniques Used
The following compression and optimization techniques were applied:
1. QLoRA (Quantized Low-Rank Adaptation)
- Parameter-efficient fine-tuning
- Only a small set of adapter parameters is trained; the quantized base weights stay frozen
- Reduces training memory, since gradients and optimizer state are kept only for the adapters
2. 8-bit Quantization
Implemented using:
BitsAndBytesConfig(load_in_8bit=True)
Benefits:
- Lower VRAM usage
- Faster loading
- Reduced energy consumption
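To make the VRAM benefit concrete, the following is a minimal sketch, not the author's actual script. It estimates memory for the weights alone at a given precision and shows one way to pass the `BitsAndBytesConfig(load_in_8bit=True)` setting to Transformers. The parameter count used in the demo is a hypothetical round number, and `load_in_8bit_mode` is an illustrative helper name. Loading this way assumes a CUDA GPU with the `bitsandbytes` and `accelerate` packages installed.

```python
def estimate_weight_memory_gib(n_params: int, bits_per_param: int) -> float:
    """Approximate memory for the weights only (no activations, KV cache, or overhead)."""
    return n_params * bits_per_param / 8 / 2**30


def load_in_8bit_mode(model_id: str):
    """Hypothetical helper: load a causal LM with 8-bit weights via bitsandbytes."""
    # Imported lazily so the estimate above works without these packages installed.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(load_in_8bit=True)
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # assumption: let accelerate place layers on available devices
    )


if __name__ == "__main__":
    n_params = 4_000_000_000  # hypothetical parameter count, for illustration only
    for bits in (16, 8):
        gib = estimate_weight_memory_gib(n_params, bits)
        print(f"{bits}-bit weights: ~{gib:.2f} GiB")
```

Halving the bits per weight halves the weight footprint; actual VRAM use is higher because of activations, the KV cache, and runtime overhead.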
3. LoRA Adapters
LoRA adapters were trained and merged into the base model.
Configuration:
- Rank (r): 32
- Alpha: 32
- Dropout: 0.05
Target modules:
- q_proj
- k_proj
- v_proj
- o_proj
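The configuration above could be expressed with the PEFT library roughly as follows. This is a sketch under assumptions: the `task_type` and `bias` values are not stated in this card, and the layer dimensions in the demo are hypothetical. The two small helpers show the standard LoRA scaling factor (alpha / r) and how many parameters one adapted linear layer adds.

```python
LORA_HPARAMS = {
    "r": 32,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}


def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / r


def lora_params_per_linear(d_in: int, d_out: int, r: int) -> int:
    """Parameters added to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def build_lora_config():
    """Build a PEFT LoraConfig from the values listed in this card."""
    from peft import LoraConfig  # lazy import: only needed when actually training

    return LoraConfig(
        **LORA_HPARAMS,
        bias="none",            # assumption: not stated in the card
        task_type="CAUSAL_LM",  # assumption: not stated in the card
    )


if __name__ == "__main__":
    print("scaling:", lora_scaling(LORA_HPARAMS["r"], LORA_HPARAMS["lora_alpha"]))
    # Hypothetical 4096 x 4096 projection, for illustration only.
    print("extra params:", lora_params_per_linear(4096, 4096, LORA_HPARAMS["r"]))
```

With rank and alpha both set to 32, the scaling factor is 1, so the low-rank update is added to the base weights without rescaling.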
Training Details
Dataset
Training dataset:
golden_set_global
Task:
- Exact text copying / continuation
- Multilingual sequence reproduction
Epochs
- 5 epochs
Optimizer
- AdamW
Learning rate:
- 2e-4
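As a rough illustration, the hyperparameters above might map onto Hugging Face `TrainingArguments` as sketched below. The output directory, batch size, and dataset size are hypothetical placeholders; none of them is given in this card, and the author's actual training script may differ.

```python
import math

TRAIN_HPARAMS = {
    "num_train_epochs": 5,
    "learning_rate": 2e-4,
    "optim": "adamw_torch",  # AdamW, as listed above
}


def approx_total_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Approximate optimizer steps without gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs


def build_training_args(output_dir: str = "./qlora-out"):
    """Build TrainingArguments; output_dir is a hypothetical path."""
    from transformers import TrainingArguments  # lazy import

    return TrainingArguments(output_dir=output_dir, **TRAIN_HPARAMS)


if __name__ == "__main__":
    # Hypothetical dataset of 1000 examples and batch size 8, for illustration only.
    print(approx_total_steps(1000, 8, TRAIN_HPARAMS["num_train_epochs"]))
```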
Inference Configuration
The model is intended to be served with vLLM:
vllm serve --config vllm_config.yaml
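Once the server is up, it can be queried through vLLM's OpenAI-compatible completions endpoint. The client below is a minimal sketch using only the standard library. It assumes the server listens on vLLM's default port 8000 and serves the model under its repository name; `vllm_config.yaml` may override either, and the sampling values are illustrative defaults.

```python
import json
import urllib.request

MODEL_NAME = "madhurithika22/mistral-compressed"  # assumption: served under the repo name


def build_completion_request(prompt: str, max_tokens: int = 512,
                             temperature: float = 0.5) -> dict:
    """Build the JSON body for POST /v1/completions."""
    return {
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send the prompt to a running vLLM server and return the generated text."""
    payload = json.dumps(build_completion_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    print(complete("Once upon a time,"))
```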
Evaluation
Evaluation metrics used:
- Semantic Similarity Accuracy
- Word Error Rate (WER)
- Energy Consumption (CodeCarbon)
The model was benchmarked on multilingual text reproduction tasks.
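For reference, Word Error Rate is the word-level edit distance between the reference and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition, not the evaluation code used for this card. It splits on whitespace, which is an assumption that may not suit every language in a multilingual set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the dog sat"))
```

A WER of 0 means the output reproduces the reference exactly, which matches the exact-copy task described above; values can exceed 1 when the output contains many extra words.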
Model tree for madhurithika22/mistral-compressed
Base model
mistralai/Ministral-3-3B-Base-2512