iko-2 (355M)

iko-2, the second model in the iko series, is a GPT-2 Medium (355M parameters) language model that combines:

  1. iko-1 knowledge (GPT-2 124M fine-tuned on 700K FineWeb documents) via distillation
  2. Reddit conversational style from the Dolma v1.6 Reddit corpus

Training Details

Architecture

  • Base model: GPT-2 Medium (355M parameters)
  • Training method: 4-bit QLoRA with gradient checkpointing
  • LoRA config: r=32, alpha=64, targets: ['c_attn', 'c_proj', 'c_fc']
  • Merge strategy: TIES (TrIm, Elect Sign, and Merge) with 80% density
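The TIES merge named above can be sketched in a few lines of NumPy. This is a minimal illustration of the three steps (trim each task vector to its top-density weights by magnitude, elect a per-parameter sign, then average the agreeing values), not the exact merging code used for iko-2:

```python
import numpy as np

def ties_merge(task_vectors, density=0.8):
    """Sketch of TIES: TRIM each task vector to its top-`density`
    fraction of entries by magnitude, ELECT a sign per parameter
    from the trimmed sum, then MERGE by averaging agreeing values."""
    trimmed = []
    for tv in task_vectors:
        tv = np.asarray(tv, dtype=float)
        k = int(round(density * tv.size))
        keep = np.zeros_like(tv)
        if k > 0:
            idx = np.argsort(np.abs(tv))[-k:]  # largest-magnitude entries
            keep[idx] = tv[idx]
        trimmed.append(keep)
    trimmed = np.stack(trimmed)
    elected = np.sign(trimmed.sum(axis=0))          # elected sign per parameter
    agree = (np.sign(trimmed) == elected) & (trimmed != 0)
    counts = np.maximum(agree.sum(axis=0), 1)       # avoid division by zero
    return (trimmed * agree).sum(axis=0) / counts

# Toy example: two 3-parameter task vectors merged at 80% density.
delta = ties_merge([[0.5, -0.2, 0.1], [0.4, 0.3, -0.05]], density=0.8)
```

With 80% density on a 3-element vector, each task vector keeps its two largest-magnitude entries; only values matching the elected sign contribute to the average.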

Training Data

  • Reddit Dolma v1.6 (~10,000 examples, 85% of the training mix)
  • iko-1 distillation corpus (~1,800 synthetic examples, 15% replay)
  • SuRe (Synthetic Replay) to prevent catastrophic forgetting
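The 85/15 mix above can be assembled with a simple sampler. The helper below is hypothetical (not the actual iko-2 data pipeline) and just shows how the replay fraction follows from the Reddit corpus size:

```python
import random

def build_training_mix(reddit_examples, replay_examples,
                       reddit_frac=0.85, seed=0):
    """Hypothetical helper: combine the Reddit corpus with the iko-1
    synthetic replay set so replay makes up roughly 15% of the mix,
    then shuffle deterministically."""
    total = int(len(reddit_examples) / reddit_frac)
    n_replay = min(total - len(reddit_examples), len(replay_examples))
    mix = list(reddit_examples) + list(replay_examples)[:n_replay]
    random.Random(seed).shuffle(mix)
    return mix

# ~10,000 Reddit examples plus ~1,800 synthetic replay examples.
mix = build_training_mix([f"r{i}" for i in range(10_000)],
                         [f"s{i}" for i in range(1_800)])
```

With 10,000 Reddit examples at 85%, the sampler draws about 1,764 replay examples, close to the ~1,800 available.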

Hyperparameters

  • Learning rate: 4e-5 with cosine schedule
  • Layer-wise LR: embeddings 0.1×, bottom 0.3×, middle 1.0×, top 0.8×
  • Warmup: 80 steps
  • Effective batch size: 16
  • Sequence length: 512
  • Optimizer: 8-bit AdamW
  • Training time: 15 minutes on a T4 GPU
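The layer-wise LR multipliers can be mapped onto GPT-2 Medium's 24 transformer blocks as follows. The parameter names follow Hugging Face's GPT-2 naming, and the exact split points between bottom/middle/top are assumptions; the card only gives the multipliers:

```python
def layerwise_lr(param_name, n_layers=24, base_lr=4e-5):
    """Sketch of the layer-wise LR scheme: embeddings 0.1x, bottom
    third of blocks 0.3x, middle third 1.0x, top third 0.8x.
    (Split points are assumptions; multipliers are from the card.)"""
    if param_name.startswith(("transformer.wte", "transformer.wpe")):
        return base_lr * 0.1
    if param_name.startswith("transformer.h."):
        layer = int(param_name.split(".")[2])
        if layer < n_layers // 3:        # blocks 0-7: bottom
            return base_lr * 0.3
        if layer < 2 * n_layers // 3:    # blocks 8-15: middle
            return base_lr * 1.0
        return base_lr * 0.8             # blocks 16-23: top
    return base_lr                       # lm_head, layer norms, etc.
```

In practice this would be used to build per-parameter-group learning rates for the 8-bit AdamW optimizer.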

Knowledge Transfer Pipeline

GPT-2 (124M) → [FineWeb fine-tune] → iko-1
                                         ↓ distillation
GPT-2 Medium (355M) → [QLoRA + Reddit + Replay] → [TIES merge] → iko-2

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("iko-01/iko-002")
tokenizer = AutoTokenizer.from_pretrained("iko-01/iko-002")

input_text = "The best thing about learning is"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Model Series

Model   Parameters   Training Data                 Method
iko-1   124M         FineWeb (700K docs)           QLoRA on GPT-2
iko-2   355M         Reddit + iko-1 distillation   QLoRA + TIES merge on GPT-2 Medium

Limitations

  • This model inherits biases present in Reddit data and GPT-2's pretraining corpus
  • Not suitable for production use without additional safety fine-tuning
  • Generated text may contain informal language reflecting Reddit's conversational style

License

Apache 2.0
