SmolLM2-360M - Odds Ratio Only (Ablation Study)

Note: This is an ablation study experiment. The model weights were not saved for this experiment. This repository documents the experiment configuration and results for research transparency.

Experiment Description

This run tests the Odds Ratio component in isolation, with all other components disabled.

This experiment is part of a component ablation study for the Enhanced KTO training method, testing how different Prospect Theory components affect model training.
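For readers unfamiliar with the term, the snippet below is a minimal sketch of a log odds ratio between a preferred and a dispreferred response, using the generic definition odds(p) = p / (1 - p). How Enhanced KTO actually incorporates this term is not documented in this card, so the function names and the use of per-sequence log-probabilities as inputs are assumptions made for illustration only.

```python
import math

def log_odds(logp: float) -> float:
    """Return log(p / (1 - p)) given log p. Requires logp < 0 (so p < 1)."""
    return logp - math.log1p(-math.exp(logp))

def log_odds_ratio(logp_chosen: float, logp_rejected: float) -> float:
    """Log odds ratio of a chosen response against a rejected one (illustrative)."""
    return log_odds(logp_chosen) - log_odds(logp_rejected)

# Toy probabilities, not values from this experiment.
example = log_odds_ratio(math.log(0.8), math.log(0.2))
print(round(example, 4))
```

A positive value means the chosen response has higher odds than the rejected one under the model.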

Component Configuration

Enabled Components

  • Odds Ratio

Disabled Components

  • Prospect Theory Value Function
  • Probability Weighting
  • BCO Shift
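One way to picture the configuration above is as a set of on/off switches. The sketch below is hypothetical: the class name `AblationComponents` and its field names are invented for illustration and do not come from the experiment's code.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class AblationComponents:
    """Hypothetical switches for the four components listed above."""
    odds_ratio: bool = False
    value_function: bool = False         # Prospect Theory Value Function
    probability_weighting: bool = False  # Probability Weighting
    bco_shift: bool = False              # BCO Shift

    def enabled(self) -> list[str]:
        """Names of the components that are switched on."""
        return [name for name, on in asdict(self).items() if on]

# This experiment: Odds Ratio on, everything else off.
OR_ONLY = AblationComponents(odds_ratio=True)
print(OR_ONLY.enabled())
```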

Training Summary

| Metric | Value |
|---|---|
| Status | Completed |
| Training Time | 150.2 minutes |
| Start Time | 2025-12-24T18:28:47.060090 |
| End Time | 2025-12-24T20:58:58.216235 |
| Model Weights Saved | No |
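The reported training time can be checked against the two timestamps. This is a small verification sketch using only the values in the summary above.

```python
from datetime import datetime

# Timestamps copied from the training summary.
start = datetime.fromisoformat("2025-12-24T18:28:47.060090")
end = datetime.fromisoformat("2025-12-24T20:58:58.216235")

minutes = (end - start).total_seconds() / 60
print(f"{minutes:.1f} minutes")
```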

Training Configuration

| Parameter | Value |
|---|---|
| Base Model | HuggingFaceTB/SmolLM2-360M |
| Epochs | 1 |
| Batch Size | 4 |
| Gradient Accumulation | 8 |
| Learning Rate | 0.0002 |
| Max Length | 1024 |
| Beta | 0.1 |
| Dataset | kto_combined |
| Train Samples | 32090 |
| Test Samples | 1295 |
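The batch size and gradient accumulation values together determine the effective batch size per optimizer step. The sketch below works through that arithmetic. It assumes a single device and that a final partial batch still triggers an optimizer step; neither is stated in this card, and the helper names are illustrative.

```python
import math

def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices

def optimizer_steps(num_samples: int, eff_batch: int, epochs: int = 1) -> int:
    """Approximate optimizer steps, counting a final partial batch as one step."""
    return math.ceil(num_samples / eff_batch) * epochs

eff = effective_batch_size(per_device=4, grad_accum=8)
print(eff)                          # effective batch size
print(optimizer_steps(32090, eff))  # steps if all 32090 listed train samples are used
```

The actual step count depends on how many samples the trainer really consumed and on how the training framework handles the last incomplete accumulation window.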

Combined Preference Dataset (kto_combined)

Training uses a Combined Preference Dataset built via Round-Robin Sampling from three sources:

| Source | Total Samples | Interactions |
|---|---|---|
| Anthropic HH-RLHF | 321,600 | 61,568 |
| Stanford Human Preferences (SHP) | 697,436 | 38,984 |
| OpenAssistant Conversations v1 | 16,810 | 8,904 |
| Total | 1,035,846 | 109,456 |
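Round-robin sampling can be sketched as taking one item from each source in turn until a per-source cap is reached or a source runs out. The code below is a generic illustration on toy data, not the pipeline used to build kto_combined; the function name, the source keys, and the cap parameter are assumptions.

```python
def round_robin(sources: dict[str, list], per_source: int) -> list[tuple[str, object]]:
    """Interleave items across sources, taking at most per_source from each."""
    iterators = {name: iter(items) for name, items in sources.items()}
    taken = {name: 0 for name in sources}
    active = list(sources)
    merged = []
    while active:
        for name in list(active):
            if taken[name] >= per_source:
                active.remove(name)  # cap reached for this source
                continue
            try:
                item = next(iterators[name])
            except StopIteration:
                active.remove(name)  # source exhausted before the cap
                continue
            merged.append((name, item))
            taken[name] += 1
    return merged

# Toy stand-ins for the three sources.
toy = {
    "hh_rlhf": ["a1", "a2", "a3"],
    "shp": ["b1", "b2", "b3"],
    "oasst1": ["c1"],
}
mixed = round_robin(toy, per_source=2)
print(mixed)
```

Interleaving this way keeps a small source such as OpenAssistant from being swamped by the much larger ones.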

Actual Training Statistics (subset split train_prefs[:32090]):

  • Training samples: 13,300 (paired examples)
  • Validation samples: 700 (5%)
  • Round-Robin distribution: 1,130 interactions per source
  • Seed: 42 (for reproducibility)
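The training and validation counts above are consistent with a 95/5 split of 14,000 examples. The sketch below reproduces that arithmetic with a seeded shuffle. It assumes the 5% is taken from the combined 14,000 examples and that the split is a simple shuffle-and-slice; the function name is illustrative and the real split procedure is not documented here.

```python
import random

def split_train_val(items, val_fraction: float = 0.05, seed: int = 42):
    """Shuffle with a fixed seed, then hold out val_fraction for validation."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# 13,300 + 700 = 14,000 examples (integer stand-ins for real samples).
train, val = split_train_val(range(14_000))
print(len(train), len(val))
```

Fixing the seed makes the split repeatable: calling the function again with the same inputs yields the same partition.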

Why No Model Weights?

This ablation experiment was designed to test training dynamics and loss behavior with different component configurations. The primary goal was to observe:

  1. Training Stability: How component combinations affect gradient flow
  2. Loss Curves: Convergence patterns with different components
  3. Efficiency: Training time per component configuration

Model weights were not saved to conserve storage, as the main value from this experiment is the configuration documentation and training logs.

Findings

This experiment contributed to understanding which component combinations work well together in the Enhanced KTO framework. Results informed the final Enhanced KTO configuration used in the main training runs.

Related Experiments

Citation

@misc{smollm2_ablation_or_only_2025,
  title = {SmolLM2-360M Odds Ratio Only Ablation Study},
  author = {Thesis Research},
  year = {2025},
  note = {Component ablation experiment - no model weights saved},
  publisher = {HuggingFace}
}

License

Apache 2.0


This experiment was conducted as part of thesis research on LLM alignment using preference optimization methods.
