SmolLM2-360M - Odds Ratio Only (Ablation Study)
Note: This is an ablation study experiment. The model weights were not saved for this experiment. This repository documents the experiment configuration and results for research transparency.
Experiment Description
This experiment tests the Odds Ratio component in isolation, with all other Enhanced KTO components disabled.
This experiment is part of a component ablation study for the Enhanced KTO training method, testing how different Prospect Theory components affect model training.
Component Configuration
Enabled Components
- Odds Ratio
Disabled Components
- Prospect Theory Value Function
- Probability Weighting
- BCO Shift
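The component switches for this run can be pictured as a small set of boolean flags. The following is only an illustrative sketch: the class name `AblationComponents` and its field names are hypothetical and are not taken from the actual Enhanced KTO training code.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AblationComponents:
    """Hypothetical on/off switches for the four components named in this card."""
    odds_ratio: bool = False
    prospect_value_function: bool = False
    probability_weighting: bool = False
    bco_shift: bool = False

    def enabled(self):
        # Return the names of the components that are switched on.
        return [name for name, on in asdict(self).items() if on]


# Configuration matching this experiment: only the Odds Ratio term is active.
OR_ONLY = AblationComponents(odds_ratio=True)
print(OR_ONLY.enabled())
```

Keeping every flag off by default means each ablation run only has to name the components it turns on.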
Training Summary
| Metric | Value |
|---|---|
| Status | Completed |
| Training Time | 150.2 minutes |
| Start Time | 2025-12-24T18:28:47.060090 |
| End Time | 2025-12-24T20:58:58.216235 |
| Model Weights Saved | No |
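As a quick consistency check, the reported training time can be recomputed from the two timestamps in the table. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Training Summary table.
start = datetime.fromisoformat("2025-12-24T18:28:47.060090")
end = datetime.fromisoformat("2025-12-24T20:58:58.216235")

# Elapsed wall-clock time in minutes.
minutes = (end - start).total_seconds() / 60
print(f"{minutes:.1f} minutes")  # prints "150.2 minutes"
```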
Training Configuration
| Parameter | Value |
|---|---|
| Base Model | HuggingFaceTB/SmolLM2-360M |
| Epochs | 1 |
| Batch Size | 4 |
| Gradient Accumulation | 8 |
| Learning Rate | 0.0002 |
| Max Length | 1024 |
| Beta | 0.1 |
| Dataset | kto_combined |
| Train Samples | 32090 |
| Test Samples | 1295 |
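The batch size and gradient accumulation values together determine the effective batch size per optimizer update. The sketch below works through that arithmetic, assuming a single training device (the card does not state the device count) and assuming the last partial batch is kept; the exact step count in a real trainer may differ slightly.

```python
import math

# Values from the Training Configuration table.
batch_size = 4
gradient_accumulation = 8
train_samples = 32090

# Assumption: one device. The card does not state the hardware setup.
num_devices = 1

# Samples consumed per optimizer update.
effective_batch = batch_size * gradient_accumulation * num_devices

# Approximate optimizer updates in the single epoch, keeping the last partial batch.
steps_per_epoch = math.ceil(train_samples / effective_batch)

print(effective_batch)   # 32
print(steps_per_epoch)   # 1003
```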
Combined Preference Dataset (kto_combined)
Training uses a Combined Preference Dataset built via Round-Robin Sampling from three sources:
| Source | Total Samples | Interactions |
|---|---|---|
| Anthropic HH-RLHF | 321,600 | 61,568 |
| Stanford Human Preferences (SHP) | 697,436 | 38,984 |
| OpenAssistant Conversations v1 | 16,810 | 8,904 |
| Total | 1,035,846 | 109,456 |
Actual Training Statistics (subset split train_prefs[:32090]):
- Training samples: 13,300 (paired examples)
- Validation samples: 700 (5%)
- Round-Robin distribution: 1,130 interactions per source
- Seed: 42 (for reproducibility)
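Round-robin sampling can be sketched as follows: draw the same number of items from each source, then interleave them one source at a time. This is an illustrative sketch under stated assumptions, not the script used to build kto_combined; the function name `round_robin_sample`, the source keys, and the toy data are all hypothetical.

```python
import random


def round_robin_sample(sources, per_source, seed=42):
    """Draw up to `per_source` items from each source, then interleave them."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    picked = {
        name: rng.sample(items, min(per_source, len(items)))
        for name, items in sources.items()
    }
    combined = []
    for i in range(per_source):
        # Visit the sources in a fixed order, taking one item from each per round.
        for name, items in picked.items():
            if i < len(items):
                combined.append((name, items[i]))
    return combined


# Toy stand-ins for the three sources (hypothetical keys and data).
toy_sources = {
    "hh_rlhf": list(range(10)),
    "shp": list(range(100, 110)),
    "oasst1": list(range(200, 210)),
}
combined = round_robin_sample(toy_sources, per_source=3)
print([name for name, _ in combined[:3]])  # ['hh_rlhf', 'shp', 'oasst1']
```

Taking an equal share from each source keeps the much larger SHP source from dominating the combined set.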
Why No Model Weights?
This ablation experiment was designed to test training dynamics and loss behavior with different component configurations. The primary goal was to observe:
- Training Stability: How component combinations affect gradient flow
- Loss Curves: Convergence patterns with different components
- Efficiency: Training time per component configuration
Model weights were not saved to conserve storage, as the main value from this experiment is the configuration documentation and training logs.
Findings
This experiment contributed to understanding which component combinations work well together in the Enhanced KTO framework. Results informed the final Enhanced KTO configuration used in the main training runs.
Related Experiments
- SmolLM2-360M-Full_ENHANCED_KTO - Full Enhanced KTO with all components
Citation
@misc{smollm2_ablation_or_only_2025,
title = {SmolLM2-360M Odds Ratio Only Ablation Study},
author = {Thesis Research},
year = {2025},
note = {Component ablation experiment - no model weights saved},
publisher = {HuggingFace}
}
License
Apache 2.0
This experiment was conducted as part of thesis research on LLM alignment using preference optimization methods.