Llama-3.1-8B FP8 QAT Fine-tuned with Unsloth

This model was fine-tuned using FP8-INT4 Quantization-Aware Training (QAT) with Unsloth.

Model Details

  • Base Model: meta-llama/Llama-3.1-8B-Instruct
  • Fine-tuning Method: LoRA + FP8-INT4 QAT
  • QAT Scheme: fp8-int4
  • LoRA Rank: 32
  • Training Dataset: mlabonne/FineTome-100k
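The fine-tuning method above combines a low-rank LoRA update with QAT. As a minimal illustrative sketch of the LoRA idea only (not Unsloth's internals), the frozen base weight W is augmented with a low-rank product B·A scaled by alpha/r, so the merged weight is W + (alpha/r)·(B·A):

```python
# Illustrative sketch of the LoRA update (not Unsloth's implementation).
# The frozen weight W gets a trainable low-rank correction B @ A,
# so the effective weight is W + (alpha / r) * (B @ A).

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(W, A, B, alpha, r):
    """Return the merged weight W + (alpha / r) * (B @ A)."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy example with a 2x2 weight and rank-1 adapters
# (this model uses rank 32 on the real weight matrices).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]         # shape r x d_in  (1 x 2)
B = [[0.5], [0.25]]      # shape d_out x r (2 x 1)
merged = lora_effective_weight(W, A, B, alpha=1.0, r=1)
```

Only A and B are trained, which is why LoRA fine-tuning touches a small fraction of the 8B parameters.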

What is FP8-INT4 QAT?

FP8-INT4 Quantization-Aware Training trains the model to be robust to FP8-INT4 precision loss by simulating quantization during training. This results in:

  • Minimal accuracy degradation when deployed in FP8-INT4
  • Faster inference with FP8-INT4 hardware support
  • Reduced memory footprint
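The mechanism behind QAT can be sketched in a few lines. This is a toy "fake quantization" round trip for the INT4 side only, with hypothetical function names, not Unsloth's actual implementation: values are snapped to a 16-level grid during the forward pass so training sees (and adapts to) the rounding error.

```python
# Illustrative "fake quantization" round trip for INT4 (assumption:
# symmetric per-tensor scaling; not Unsloth's actual QAT code).

def fake_quant_int4(values):
    """Simulate INT4 quantize -> dequantize for a list of floats."""
    qmax = 7  # signed INT4 spans [-8, 7]; use 7 for a symmetric scale
    scale = max(abs(v) for v in values) / qmax
    quantized = [max(-8, min(7, round(v / scale))) for v in values]
    return [q * scale for q in quantized]  # back to float, with rounding error

weights = [0.91, -0.42, 0.07, -0.88]
approx = fake_quant_int4(weights)
# During QAT the loss is computed on the quantized-then-dequantized values,
# so gradients push the weights toward values that survive quantization.
```

At deployment time the model is actually stored and executed in the low-precision format, and because training already absorbed the rounding error, accuracy loss is small.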

Usage

import torch
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="tokenlabsdotrun/Llama-3.1-8B-Unsloth-FP8_INT4-QAT",
    dtype=torch.bfloat16,
    max_seq_length=2048,
)

Training Configuration

  • Learning rate: 2e-5
  • Batch size: 1 (with gradient accumulation of 4)
  • Optimizer: AdamW 8-bit
  • Training steps: 30
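The settings above can be read together: with a per-device batch of 1 and gradient accumulation of 4, each optimizer update sees an effective batch of 4 examples. A small sketch (the key names below mirror common Transformers/TRL argument names but are shown here only as an assumed configuration, not the exact training script):

```python
# Assumed configuration mirroring the listed hyperparameters
# (key names follow common Transformers/TRL conventions; illustrative only).
config = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_8bit",
    "max_steps": 30,
}

# Gradient accumulation means the optimizer updates once per
# batch_size * accumulation_steps examples.
effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"])

# Total examples processed over the whole run.
examples_seen = effective_batch * config["max_steps"]
```

Note that 30 steps at an effective batch of 4 covers only ~120 examples of FineTome-100k, so this is a short demonstration run rather than a full epoch.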

License

This model is released under the Llama 3.1 Community License.
