---
license: apache-2.0
language:
- en
tags:
- iko
- gpt2-medium
- conversational
- reddit
- qlora
- ties-merge
pipeline_tag: text-generation
base_model: gpt2-medium
datasets:
- dolma
- fineweb
---

# iko-2 (355M)

**iko-2** is the second model in the iko series — a GPT-2 Medium (355M-parameter) language model that combines:

1. **iko-1 knowledge** (GPT-2 124M fine-tuned on 700K FineWeb documents) via distillation
2. **Reddit conversational style** from the Dolma v1.6 Reddit corpus

## Training Details

### Architecture

- **Base model:** GPT-2 Medium (355M parameters)
- **Training method:** 4-bit QLoRA with gradient checkpointing
- **LoRA config:** r=32, alpha=64, targets: `['c_attn', 'c_proj', 'c_fc']`
- **Merge strategy:** TIES (TrIm, Elect Sign, and merge) with 80% density

### Training Data

- **Reddit Dolma v1.6** (~10,000 examples, 85% of the training mix)
- **iko-1 distillation corpus** (~1,800 synthetic examples, 15% replay)
- **SuRe (Synthetic Replay)** to prevent catastrophic forgetting

### Hyperparameters

- Learning rate: 4e-5 with cosine schedule
- Layer-wise LR: embeddings 0.1×, bottom 0.3×, middle 1.0×, top 0.8×
- Warmup: 80 steps
- Effective batch size: 16
- Sequence length: 512
- Optimizer: 8-bit AdamW
- Training time: 15 minutes on a T4 GPU

### Knowledge Transfer Pipeline

```
GPT-2 (124M) → [FineWeb fine-tune] → iko-1
                                        ↓ distillation
GPT-2 Medium (355M) → [QLoRA + Reddit + Replay] → [TIES merge] → iko-2
```

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("iko-01/iko-002")
tokenizer = AutoTokenizer.from_pretrained("iko-01/iko-002")

input_text = "The best thing about learning is"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Series

| Model | Parameters | Training Data | Method |
|-------|-----------|---------------|--------|
| iko-1 | 124M | FineWeb (700K docs) | QLoRA on GPT-2 |
| **iko-2** | **355M** | **Reddit + iko-1 distillation** | **QLoRA + TIES merge on GPT-2 Medium** |

## Limitations

- This model inherits biases present in Reddit data and GPT-2's pretraining corpus
- Not suitable for production use without additional safety fine-tuning
- Generated text may contain informal language reflecting Reddit's conversational style

## License

Apache 2.0
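## Appendix: Illustrative Sketches

The layer-wise learning-rate scheme in the Hyperparameters section (embeddings 0.1×, bottom 0.3×, middle 1.0×, top 0.8×) can be sketched as a mapping from GPT-2 parameter names to scaled learning rates. This is a minimal illustration, not the actual training code: the card only states the four multipliers, so the exact layer boundaries (equal thirds of GPT-2 Medium's 24 blocks) and the `lr_for` helper are assumptions.

```python
# Hypothetical sketch of the layer-wise LR multipliers described above.
# The split into equal thirds is an assumption; the card does not specify
# where "bottom", "middle", and "top" begin and end.

BASE_LR = 4e-5  # base learning rate from the card

def lr_for(param_name, n_layers=24):
    """Map a GPT-2-style parameter name to its scaled learning rate."""
    if param_name.startswith(("transformer.wte", "transformer.wpe")):
        return BASE_LR * 0.1          # embeddings: 0.1x
    if param_name.startswith("transformer.h."):
        layer = int(param_name.split(".")[2])
        if layer < n_layers // 3:
            return BASE_LR * 0.3      # bottom third: 0.3x
        if layer < 2 * n_layers // 3:
            return BASE_LR * 1.0      # middle third: 1.0x
        return BASE_LR * 0.8          # top third: 0.8x
    return BASE_LR                    # everything else at the base rate
```

In practice this kind of mapping would be used to build per-parameter optimizer groups (e.g. for 8-bit AdamW) before training.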
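The TIES merge named in the Architecture section works in three steps on "task vectors" (parameter deltas from a shared base model): trim each vector to its largest-magnitude entries (here, 80% density), elect a per-parameter sign by total magnitude, then average only the values agreeing with that sign. The toy function below is a minimal sketch of those three steps on plain Python lists — not the merge code used for iko-2, which would operate on full model checkpoints.

```python
# Hypothetical, minimal sketch of the three TIES steps on toy task vectors.
# Values and the `ties_merge` name are illustrative only.

def ties_merge(task_vectors, density=0.8):
    """Merge task vectors via Trim, Elect Sign, and disjoint Merge."""
    n = len(task_vectors[0])
    k = max(1, int(round(density * n)))
    # 1) Trim: keep only the top `density` fraction of each vector by magnitude.
    trimmed = []
    for tv in task_vectors:
        cutoff = sorted((abs(x) for x in tv), reverse=True)[k - 1]
        trimmed.append([x if abs(x) >= cutoff else 0.0 for x in tv])
    merged = []
    for i in range(n):
        vals = [tv[i] for tv in trimmed]
        # 2) Elect sign: pick the sign with the larger total magnitude.
        pos = sum(v for v in vals if v > 0)
        neg = -sum(v for v in vals if v < 0)
        sign = 1.0 if pos >= neg else -1.0
        # 3) Disjoint merge: average only the values agreeing with the elected sign.
        agreeing = [v for v in vals if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged
```

Keeping 80% density means roughly a fifth of each task vector's smallest-magnitude deltas are zeroed before sign election, which reduces interference between the Reddit adapter and the iko-1 knowledge being merged.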