Madlab Synthetic Data Generator

🧠 Overview

The Madlab SDG 1.2B is part of the MadlabOSS Synthetic Data Generator family, a suite of small, efficient synthetic data generators designed for rule-consistent, semantically coherent variation.
This model was trained on a closed-source dataset created through a multi-stage synthetic data generation process using a modified Madlab training pipeline. It is the first model in the family built on the LFM2.5-instruct foundation.


πŸš€ Intended Use

This model is optimized for:

  • Madlab synthetic data generation

It is not intended as a general-purpose chatbot.
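Although it is not a chatbot, the model can be called like any other Transformers causal language model. The snippet below is a minimal usage sketch for MadlabOSS/LFM2.5-1.2B-Instruct-SDG; the prompt wording is an illustrative assumption, not a documented Madlab SDG prompt format, and it assumes the tokenizer ships a chat template.

```python
# Minimal usage sketch. The prompt wording below is illustrative only and is
# not the documented Madlab SDG prompt format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MadlabOSS/LFM2.5-1.2B-Instruct-SDG"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

messages = [{
    "role": "user",
    "content": "Generate 5 variations of this pair:\n"
               "input: <seed input>\ntarget: <seed target>",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```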


🧩 Model Details

Base Model: LFM2.5-1.2B-instruct
Parameter Count: 1.2 Billion
Training Type: Supervised fine-tuning
Sequence Length: 1024
Precision: FP16
Framework: PyTorch / Transformers


πŸ“¦ Training Data

The model was trained on 1,444 compressed and encoded dataset pairs, all generated with Madlab. The data was constructed to provide:

  • High variation in outputs
  • Preservation of semantic meaning

πŸ‹οΈ Training Procedure

Hyperparameters

  • Epochs: 6
  • Batch size: 48
  • Learning rate: cosine schedule, peak ~4e-5
  • Optimizer: AdamW
  • Gradient clipping: 1.0
  • Gradient accumulation: 1
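The modified Madlab training pipeline itself is not published. As a rough illustration only, the hyperparameters above map onto Hugging Face TrainingArguments as sketched below; the output directory is a placeholder and warmup settings are not specified in this card.

```python
# Hypothetical mapping of the reported hyperparameters onto Hugging Face
# TrainingArguments; the actual (modified Madlab) pipeline is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="lfm2.5-1.2b-sdg",      # placeholder, not from the card
    num_train_epochs=6,
    per_device_train_batch_size=48,
    gradient_accumulation_steps=1,
    learning_rate=4e-5,                 # peak learning rate
    lr_scheduler_type="cosine",
    optim="adamw_torch",                # AdamW
    max_grad_norm=1.0,                  # gradient clipping
    fp16=True,                          # FP16 precision
)
```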

Hardware

Training was performed on:

  • RTX 6000 Blackwell (96GB)

πŸ“Š Evaluation

(Figure: multi_model_dashboard)

Synthetic Data Expansion Benchmark

A curated set of 30 input/target pairs was programmatically expanded using a Python script; the task is to generate 5 variations of each incoming pair.
Metrics include the number of seed pairs covered, total variation count, and semantic quality.
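The benchmark script is not included here; the sketch below shows the general shape of the expansion loop under stated assumptions: each of the 30 seed pairs is sent to the generator with a request for 5 variations, and outputs are tallied into the coverage and variation metrics. The generate_variations helper is hypothetical, and semantic quality is scored separately.

```python
# Hypothetical sketch of the expansion benchmark; generate_variations is a
# placeholder for the actual model call, and the real script is not published.
def run_benchmark(seed_pairs, generate_variations, per_seed=5):
    seeds_covered = 0
    total_variations = 0
    for pair in seed_pairs:                              # 30 curated input/target pairs
        outputs = generate_variations(pair, n=per_seed)  # ask the SDG model for 5 variations
        variations = [v for v in outputs if v]           # keep non-empty results
        if variations:
            seeds_covered += 1
        total_variations += len(variations)
    return {"seeds_covered": seeds_covered, "variations": total_variations}
```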

Note: run numbers in the table below are not aligned with the multi_model_dashboard figure.

| Run | Model | Semantic Quality | Variations | Seeds Covered | Efficiency (Variations / B Params) | Dataset |
|-----|-------|------------------|------------|---------------|------------------------------------|---------|
| 1 | LFM2-350M-16 | 6.5 | 94 | 23 | 268.57 | Madlab sdg small |
| 2 | LFM2-350M-16 | 3.5 | 46 | 11 | 131.43 | base model |
| 3 | LFM2-350M-f16 | 6.5 | 97 | 22 | 277.14 | Madlab sdg small |
| 4 | Qwen3-coder-30B-instruct-q8 | 8.2 | 149 | 26 | 4.97 | base model |
| 5 | LFM2-350M-f16 | 7.5 | 136 | 21 | 388.57 | Madlab sdg medium |
| 6 | LFM2-2.6B-f16 | 9.0 | 137 | 25 | 52.69 | Madlab sdg medium |
| 7 | LFM2-2.6B-f16 | 9.9 | 180 | 25 | 69.23 | Madlab sdg large |
| 8 | LFM2-2.6B-f16 | 6.2 | 157 | 20 | 60.38 | Madlab sdg test |
| 9 | LFM2-2.6B-f16 | 10.0 | 248 | 27 | 95.38 | Madlab sdg large |
| 10 | Qwen3-235B-q3-k_m | 9.5 | 150 | 27 | 0.64 | base model |
| 11 | LFM2.5-1.2B-instruct-f16 | 9.1 | 244 | 30 | 203.33 | Madlab sdg large |
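
The efficiency column appears to be total variations divided by the model's parameter count in billions; for run 11, 244 variations / 1.2B parameters ≈ 203.33.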

Qualitative Behavior

  • Produces an unusually high variation count for its parameter size
  • Maintains strict semantic correctness

πŸ”’ Safety

This model is a synthetic data generator. It is not designed for conversational use and should only be used to generate synthetic datasets.

In particular, it is not intended for:

  • Political advice
  • Medical advice
  • Legal advice
  • General-purpose conversation

⚠️ Limitations

  • Not a general assistant
  • Not trained for coding, math, or open-domain reasoning
  • May refuse tasks outside the Madlab SDG scope
