---
license: apache-2.0
base_model: gpt2
tags:
- trl
- sft
- lora
- nebulos
- skull
- multilingual
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceFW/finewiki
- HuggingFaceFW/fineweb-2
language:
- en
- es
- de
- fr
- pt
metrics:
- accuracy
---

# 💀 SkullLLM-125M

**SkullLLM-125M** is a lightweight, experimental multilingual language model fine-tuned from GPT-2. This project, part of the **SkullLLM** series, demonstrates that fine-tuning a language model is possible on highly constrained consumer hardware (3 GB of VRAM) using aggressive memory optimizations (4-bit quantization and LoRA).

### 🚀 Model Details

- **Developed by:** Erik22TY
- **Model Name:** Nebulos (SkullLLM-125M)
- **Base Model:** GPT-2 (125M parameters)
- **Training OS:** Linux Mint
- **Training Hardware:** HP Pavilion Gaming Desktop 690-00xx
- **GPU:** NVIDIA GeForce GTX 1050 (3 GB VRAM, Pascal architecture)
- **Training Type:** LoRA (Low-Rank Adaptation)
- **Format:** ChatML (`<|im_start|>user`, `<|im_start|>assistant`)

### 🖥️ Hardware Requirements

This model is optimized for low-end hardware.

- **VRAM for Inference:** ~1.5 GB (4-bit) / ~2.2 GB (FP16); a 4-bit loading sketch appears after the usage example below.
- **VRAM for Training:** 2.8 GB+ (tested on a GTX 1050 with 3 GB).
- **System RAM:** 4 GB minimum for inference; 12 GB recommended for training.
- **Storage:** ~150 MB for the adapter files.

### 🧠 Knowledge & Dataset

Nebulos was trained on a high-quality multilingual stream:

- **English (FineWeb-Edu):** Knowledge cutoff March 2024.
- **Multilingual (FineWeb-2):** Spanish, German, French, and Portuguese web data.
- **General (FineWiki):** Wikipedia-based knowledge updated through August 2025.

### 🧪 Training Configuration

- **Steps:** 500
- **Batch Size:** 1 (gradient accumulation: 16)
- **Optimization:** 4-bit quantization (NF4)
- **Compute Dtype:** forced FP16 (to support the Pascal architecture)
- **Learning Rate:** 2e-4
- **Final Loss:** 4.0898

A hedged code sketch of this configuration appears at the end of this card.

### ⚠️ Limitations & Behavior

As a 125M-parameter model trained for 500 steps, SkullLLM-125M is a **proof of concept**.

- **Repetitions:** May occasionally loop phrases (e.g., "metic"). Use `repetition_penalty=1.5`, as shown in the generation example below.
- **Language Blending:** Due to its size, it may mix Romance languages (Spanish/French/Portuguese) in complex responses.
- **Coherence:** Best used for short-form explanations or creative experiments.

### 💬 Usage (Python)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

model_id = "gpt2"
adapter_id = "Erik22TY/SkullLLM-125M"

# The adapter repo's tokenizer carries the ChatML special tokens.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# Grow the base model's embedding table to the extended vocabulary
# before attaching the LoRA adapter.
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, adapter_id)
```
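To chat with the model, format the prompt in ChatML. Here is a minimal generation sketch: the `<|im_end|>` delimiter is the conventional ChatML turn closer and is assumed here (the card lists only the `<|im_start|>` tokens), and the sampling settings other than the recommended `repetition_penalty=1.5` are illustrative.

```python
# Build a ChatML prompt. <|im_end|> is the conventional ChatML turn delimiter
# (an assumption -- the card lists only the <|im_start|> tokens).
prompt = (
    "<|im_start|>user\n"
    "Explain what a neural network is.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,          # illustrative value
    do_sample=True,
    temperature=0.7,             # illustrative value
    repetition_penalty=1.5,      # mitigates the looping noted under Limitations
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```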
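The ~1.5 GB inference figure under Hardware Requirements assumes 4-bit weights. Below is a hedged sketch of that loading path using `BitsAndBytesConfig` with NF4 (the same quantization format used during training); it requires the `bitsandbytes` package and a CUDA GPU, and exact memory use will vary by library version.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# NF4 mirrors the training-time quantization; FP16 compute matches the
# Pascal-friendly dtype noted under Training Configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("Erik22TY/SkullLLM-125M")
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    quantization_config=bnb_config,
    device_map="auto",
)
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, "Erik22TY/SkullLLM-125M")
```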
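Finally, here is a hedged sketch of how the Training Configuration above might map onto `trl`'s `SFTTrainer` with QLoRA. The LoRA rank/alpha, target modules, sequence handling, and dataset mixing are **not** stated in the card and are illustrative assumptions, and `trl`'s API varies across versions.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset
import torch

# 4-bit NF4 base weights with FP16 compute, per Training Configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", quantization_config=bnb_config, device_map="auto"
)

# Rank, alpha, and target modules are assumptions -- not stated in the card.
peft_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["c_attn"], task_type="CAUSAL_LM"
)

# One of the listed sources, streamed; the card does not document mixing ratios.
dataset = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)

args = SFTConfig(
    output_dir="skullllm-125m",
    max_steps=500,                   # per the card
    per_device_train_batch_size=1,   # per the card
    gradient_accumulation_steps=16,  # per the card
    learning_rate=2e-4,              # per the card
    fp16=True,                       # forced FP16 for Pascal
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```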