SmolLM2-360M - Odds Ratio Only (Ablation Study)

Note: This is an ablation experiment, and its model weights were not saved. This repository documents the experiment configuration and results for research transparency.

Experiment Description

This run tests the Odds Ratio component in isolation, with all other Enhanced KTO components disabled.

This experiment is part of a component ablation study for the Enhanced KTO training method, testing how different Prospect Theory components affect model training.

Component Configuration

Enabled Components

Odds Ratio

Disabled Components

~~Prospect Theory Value Function~~
~~Probability Weighting~~
~~BCO Shift~~
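
This card does not define how the Odds Ratio component is computed. As a point of reference only, the sketch below assumes an ORPO-style definition, odds(p) = p / (1 - p), applied to the probabilities of a chosen and a rejected response. The function names `log_odds` and `log_odds_ratio` are hypothetical and are not taken from the Enhanced KTO code.

```python
import math


def log_odds(logp):
    """Return log(p / (1 - p)) given log p, for 0 < p < 1."""
    # log1p(-exp(logp)) computes log(1 - p) without forming 1 - p directly.
    return logp - math.log1p(-math.exp(logp))


def log_odds_ratio(logp_chosen, logp_rejected):
    """Log odds ratio of a chosen response against a rejected one.

    Assumption: both inputs are (length-normalized) log-probabilities.
    """
    return log_odds(logp_chosen) - log_odds(logp_rejected)


# Toy values: the chosen response is more likely than the rejected one,
# so the log odds ratio is positive.
ratio = log_odds_ratio(math.log(0.8), math.log(0.2))
print(ratio > 0)
```

A positive value means the policy assigns higher odds to the chosen response. How this quantity enters the loss in this experiment is not specified here.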

Training Summary

Metric	Value
Status	Completed
Training Time	150.2 minutes
Start Time	2025-12-24T18:28:47.060090
End Time	2025-12-24T20:58:58.216235
Model Weights Saved	No
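
The reported training time can be checked against the two timestamps. This is a minimal sketch using only the values in the table above; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Training Summary table.
start = datetime.fromisoformat("2025-12-24T18:28:47.060090")
end = datetime.fromisoformat("2025-12-24T20:58:58.216235")

# Wall-clock duration of the run, in minutes.
elapsed_minutes = (end - start).total_seconds() / 60
print(f"{elapsed_minutes:.1f} minutes")  # prints "150.2 minutes"
```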

Training Configuration

Parameter	Value
Base Model	HuggingFaceTB/SmolLM2-360M
Epochs	1
Batch Size	4
Gradient Accumulation	8
Learning Rate	0.0002
Max Length	1024
Beta	0.1
Dataset	kto_combined
Train Samples	32,090
Test Samples	1,295
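
The batch size and gradient accumulation settings together determine the effective batch size and the number of optimizer steps in one epoch. The sketch below uses the values from the table above and assumes a single device and that the final partial batch is kept. Neither assumption is stated in this card, so treat the step count as an estimate.

```python
import math

# Values from the Training Configuration table.
batch_size = 4
gradient_accumulation = 8
train_samples = 32090
epochs = 1

# Assumption: single-device training (device count is not stated in the card).
num_devices = 1

# Samples consumed per optimizer update.
effective_batch_size = batch_size * gradient_accumulation * num_devices

# Assumption: the last partial batch is not dropped.
optimizer_steps = math.ceil(train_samples / effective_batch_size) * epochs

print(effective_batch_size, optimizer_steps)
```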

Combined Preference Dataset (kto_combined)

Training uses a Combined Preference Dataset built via Round-Robin Sampling from three sources:

Source	Total Samples	Interactions
Anthropic HH-RLHF	321,600	61,568
Stanford Human Preferences (SHP)	697,436	38,984
OpenAssistant Conversations v1	16,810	8,904
Total	1,035,846	109,456

Actual Training Statistics (subset split train_prefs[:32090]):

Training samples: 13,300 (paired examples)
Validation samples: 700 (5%)
Round-Robin distribution: 1,130 interactions per source
Seed: 42 (for reproducibility)
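
Round-robin sampling, as described above, can be sketched as follows. This is an illustration under stated assumptions, not the script used to build kto_combined: the function `round_robin_sample`, the source keys, and the toy data are hypothetical, and the real pipeline may shuffle or select interactions differently.

```python
import random
from itertools import zip_longest


def round_robin_sample(sources, per_source, seed=42):
    """Interleave up to `per_source` shuffled items from each source."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    picked = []
    for name, items in sources.items():
        pool = list(items)
        rng.shuffle(pool)
        picked.append([(name, item) for item in pool[:per_source]])

    # Take one item from each source in turn; skip sources that run out.
    mixed = []
    for group in zip_longest(*picked):
        mixed.extend(entry for entry in group if entry is not None)
    return mixed


# Toy stand-ins for the three preference sources.
toy_sources = {
    "hh_rlhf": range(10),
    "shp": range(100, 110),
    "oasst1": range(200, 205),
}
mixed = round_robin_sample(toy_sources, per_source=3)
print([name for name, _ in mixed])
```

Capping every source at the same count keeps the much larger SHP and HH-RLHF sets from dominating the smaller OpenAssistant set in the mix.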

Why No Model Weights?

This ablation experiment was designed to test training dynamics and loss behavior with different component configurations. The primary goal was to observe:

Training Stability: How component combinations affect gradient flow
Loss Curves: Convergence patterns with different components
Efficiency: Training time per component configuration

Model weights were not saved in order to conserve storage, because the main value of this experiment lies in its configuration documentation and training logs.

Findings

This experiment contributed to understanding which component combinations work well together in the Enhanced KTO framework. Results informed the final Enhanced KTO configuration used in the main training runs.

Related Experiments

SmolLM2-360M-Full_ENHANCED_KTO - Full Enhanced KTO with all components

Citation

@misc{smollm2_ablation_or_only_2025,
  title = {SmolLM2-360M Odds Ratio Only Ablation Study},
  author = {Thesis Research},
  year = {2025},
  note = {Component ablation experiment - no model weights saved},
  publisher = {HuggingFace}
}

License

Apache 2.0

This experiment was conducted as part of thesis research on LLM alignment using preference optimization methods.
