Llama-3.1-8B FP8 QAT Fine-tuned with Unsloth

This model was fine-tuned using FP8-INT4 Quantization-Aware Training (QAT) with Unsloth.

Model Details

  • Base Model: meta-llama/Llama-3.1-8B-Instruct
  • Fine-tuning Method: LoRA + FP8-INT4 QAT
  • QAT Scheme: fp8-int4
  • LoRA Rank: 32
  • Training Dataset: mlabonne/FineTome-100k
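The fine-tuning method above combines a low-rank LoRA update with QAT. As a minimal illustrative sketch of the LoRA idea only (not Unsloth's internals), the frozen base weight W is augmented with a low-rank product B·A scaled by alpha/r, so the merged weight is W + (alpha/r)·(B·A):

```python
# Illustrative sketch of the LoRA update (not Unsloth's implementation).
# The frozen weight W gets a trainable low-rank correction B @ A,
# so the effective weight is W + (alpha / r) * (B @ A).

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(W, A, B, alpha, r):
    """Return the merged weight W + (alpha / r) * (B @ A)."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy example with a 2x2 weight and rank-1 adapters
# (this model uses rank 32 on the real weight matrices).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]         # shape r x d_in  (1 x 2)
B = [[0.5], [0.25]]      # shape d_out x r (2 x 1)
merged = lora_effective_weight(W, A, B, alpha=1.0, r=1)
```

Only A and B are trained, which is why LoRA fine-tuning touches a small fraction of the 8B parameters.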

What is FP8-INT4 QAT?

FP8-INT4 Quantization-Aware Training trains the model to be robust to FP8-INT4 precision loss by simulating quantization during training. This results in:

  • Minimal accuracy degradation when deployed in FP8-INT4
  • Faster inference with FP8-INT4 hardware support
  • Reduced memory footprint
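The mechanism behind QAT can be sketched in a few lines. This is a toy "fake quantization" round trip for the INT4 side only, with hypothetical function names, not Unsloth's actual implementation: values are snapped to a 16-level grid during the forward pass so training sees (and adapts to) the rounding error.

```python
# Illustrative "fake quantization" round trip for INT4 (assumption:
# symmetric per-tensor scaling; not Unsloth's actual QAT code).

def fake_quant_int4(values):
    """Simulate INT4 quantize -> dequantize for a list of floats."""
    qmax = 7  # signed INT4 spans [-8, 7]; use 7 for a symmetric scale
    scale = max(abs(v) for v in values) / qmax
    quantized = [max(-8, min(7, round(v / scale))) for v in values]
    return [q * scale for q in quantized]  # back to float, with rounding error

weights = [0.91, -0.42, 0.07, -0.88]
approx = fake_quant_int4(weights)
# During QAT the loss is computed on the quantized-then-dequantized values,
# so gradients push the weights toward values that survive quantization.
```

At deployment time the model is actually stored and executed in the low-precision format, and because training already absorbed the rounding error, accuracy loss is small.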

Usage

import torch
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="tokenlabsdotrun/Llama-3.1-8B-Unsloth-FP8_INT4-QAT",
    dtype=torch.bfloat16,
    max_seq_length=2048,
)

Training Configuration

  • Learning rate: 2e-5
  • Batch size: 1 (with gradient accumulation of 4)
  • Optimizer: AdamW 8-bit
  • Training steps: 30
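The settings above can be read together: with a per-device batch of 1 and gradient accumulation of 4, each optimizer update sees an effective batch of 4 examples. A small sketch (the key names below mirror common Transformers/TRL argument names but are shown here only as an assumed configuration, not the exact training script):

```python
# Assumed configuration mirroring the listed hyperparameters
# (key names follow common Transformers/TRL conventions; illustrative only).
config = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_8bit",
    "max_steps": 30,
}

# Gradient accumulation means the optimizer updates once per
# batch_size * accumulation_steps examples.
effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"])

# Total examples processed over the whole run.
examples_seen = effective_batch * config["max_steps"]
```

Note that 30 steps at an effective batch of 4 covers only ~120 examples of FineTome-100k, so this is a short demonstration run rather than a full epoch.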

License

This model is released under the Llama 3.1 Community License.
