
Mistral-Merged (Compressed QLoRA Model)

Overview

This model is a compressed and fine-tuned version of:

  • mistralai/Voxtral-Mini-4B-Realtime-2602

The objective of this project is to reduce inference cost, memory usage, and energy consumption while maintaining acceptable output quality.

The model is optimized for:

  • Efficient inference
  • Low GPU memory usage
  • vLLM deployment
  • Energy-aware benchmarking

Compression Techniques Used

The following compression and optimization techniques were applied:

1. QLoRA (Quantized Low-Rank Adaptation)

  • Parameter-efficient fine-tuning on top of a quantized base model
  • Only a small set of low-rank adapter parameters is updated
  • Significantly reduces training-time memory

2. 8-bit Quantization

Implemented using:

  • BitsAndBytesConfig(load_in_8bit=True)

Benefits:

  • Lower VRAM usage
  • Faster loading
  • Reduced energy consumption
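As a sketch, the 8-bit loading path described above would look like the following with the Hugging Face `transformers` API (the repo id comes from this card; `device_map="auto"` is an assumption, and actually loading requires a CUDA GPU with `bitsandbytes` installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit quantization: weights are stored as int8, roughly halving the
# VRAM footprint relative to fp16 loading.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)


def load_model(model_id: str = "madhurithika22/mistral-compressed"):
    """Load the compressed model with 8-bit weights.

    Requires a CUDA GPU and the `bitsandbytes` package.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",  # assumption: shard across available GPUs
    )
    return model, tokenizer
```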

3. LoRA Adapters

LoRA adapters were trained and merged into the base model.

Configuration:

  • Rank (r): 32
  • Alpha: 32
  • Dropout: 0.05

Target modules:

  • q_proj
  • k_proj
  • v_proj
  • o_proj

Training Details

Dataset

Training dataset:

  • golden_set_global

Task:

  • Exact text copying / continuation
  • Multilingual sequence reproduction

Epochs

  • 5 epochs

Optimizer

  • AdamW
  • Learning rate: 2e-4
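Schematically, the fine-tuning loop uses AdamW at a learning rate of 2e-4 for 5 epochs; the sketch below uses a toy module and placeholder loss as stand-ins for the QLoRA-wrapped model and the golden_set_global batches:

```python
import torch

# Toy stand-in for the PEFT-wrapped language model.
model = torch.nn.Linear(8, 8)

# Optimizer setup per the card: AdamW, lr 2e-4.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

for epoch in range(5):  # 5 epochs, per the card
    # In the real run this iterates over golden_set_global batches.
    x = torch.randn(4, 8)
    loss = model(x).pow(2).mean()  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```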

Inference Configuration

The model is intended to be served with vLLM:

vllm serve --config vllm_config.yaml
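The shipped vllm_config.yaml is not reproduced in this card; a minimal illustrative config might look like the following (every value below is an assumption, using vLLM's convention of YAML keys mirroring CLI flags):

```yaml
# vllm_config.yaml -- illustrative values only, not the actual shipped config
model: madhurithika22/mistral-compressed
dtype: float16
max-model-len: 4096        # assumed context length
gpu-memory-utilization: 0.9
```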

Evaluation

Evaluation metrics used:

  • Semantic Similarity Accuracy
  • Word Error Rate (WER)
  • Energy Consumption (CodeCarbon)

The model was benchmarked on multilingual text reproduction tasks.
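For reference, Word Error Rate can be computed with a word-level Levenshtein distance; a minimal self-contained sketch (not the benchmark harness actually used):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,      # deletion
                dp[i][j - 1] + 1,      # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[-1][-1] / max(len(ref), 1)
```

For example, `word_error_rate("the cat sat", "the cat sat")` is 0.0, and one substituted word out of three gives 1/3.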


Model Format

  • Format: Safetensors
  • Model size: 7B params
  • Tensor type: F16
Repository: madhurithika22/mistral-compressed