---
license: apache-2.0
base_model: gpt2
tags:
- trl
- sft
- lora
- nebulos
- skull
- multilingual
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceFW/finewiki
- HuggingFaceFW/fineweb-2
language:
- en
- es
- de
- fr
- pt
metrics:
- accuracy
---
# 💀 SkullLLM-125M
**SkullLLM-125M** is a lightweight, experimental multilingual language model fine-tuned from GPT-2. This project, part of the **SkullLLM** series, demonstrates that language-model fine-tuning is possible on highly constrained consumer hardware (3 GB of VRAM) using LoRA and 4-bit quantization.
### 🚀 Model Details
- **Developed by:** Erik22TY
- **Model Name:** Nebulos (SkullLLM-125M)
- **Base Model:** GPT-2 (125M parameters)
- **Training OS:** Linux Mint
- **Training Hardware:** HP Pavilion Gaming Desktop 690-00xx
- **GPU:** NVIDIA GeForce GTX 1050 (3GB VRAM - Pascal Architecture)
- **Training Type:** LoRA (Low-Rank Adaptation)
- **Format:** ChatML (`<|im_start|>user`, `<|im_start|>assistant`)
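For illustration, a single exchange in this format looks roughly as follows (the `<|im_end|>` terminator is assumed from the standard ChatML template; only the start tokens are listed above):

```
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
Paris is the capital of France.<|im_end|>
```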
### ๐Ÿ–ฅ๏ธ Hardware Requirements
This model is optimized for low-end hardware.
- **VRAM for Inference:** ~1.5 GB (4-bit) / ~2.2 GB (FP16); see the 4-bit loading sketch after this list.
- **VRAM for Training:** 2.8 GB+ (Tested on GTX 1050 3GB).
- **System RAM:** 4 GB minimum for inference; 12 GB recommended for training.
- **Storage:** ~150 MB for the adapter files.
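A minimal loading sketch for the ~1.5 GB 4-bit figure above, assuming `bitsandbytes` is installed; the NF4/FP16 settings mirror the training configuration described further below:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# NF4 4-bit weights with FP16 compute (Pascal GPUs such as the GTX 1050 lack BF16)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("Erik22TY/SkullLLM-125M")
model = AutoModelForCausalLM.from_pretrained("gpt2", quantization_config=bnb_config, device_map="auto")
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, "Erik22TY/SkullLLM-125M")
```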
### 🧠 Knowledge & Dataset
Nebulos was trained on a high-quality multilingual stream (see the data-loading sketch after this list):
- **English (FineWeb-Edu):** Knowledge cutoff March 2024.
- **Multilingual (FineWeb-2):** Spanish, German, French, and Portuguese web data.
- **General (FineWiki):** Wikipedia-based knowledge updated through August 2025.
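A rough sketch of how such a stream can be assembled with the `datasets` library. The splits, config names (e.g. `spa_Latn` for FineWeb-2), and mixing probabilities are assumptions for illustration, not the exact recipe used:

```python
from datasets import load_dataset, interleave_datasets

# Stream each source so nothing has to be fully downloaded; names are illustrative
english = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)
spanish = load_dataset("HuggingFaceFW/fineweb-2", "spa_Latn", split="train", streaming=True)

# Interleave with fixed sampling probabilities into a single multilingual stream
mixed = interleave_datasets([english, spanish], probabilities=[0.7, 0.3], seed=42)
for example in mixed.take(2):
    print(example["text"][:200])
```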
### 🧪 Training Configuration
- **Steps:** 500
- **Batch Size:** 1 (Gradient Accumulation: 16)
- **Optimization:** 4-bit Quantization (NF4)
- **Compute Dtype:** Forced FP16 (to support Pascal architecture)
- **Learning Rate:** 2e-4
- **Final Loss:** 4.0898
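A heavily hedged sketch of a matching setup with `peft` and `trl` (per the tags above). The LoRA rank/alpha, target modules, and the toy dataset are assumptions, and exact `trl` argument names can vary between versions; only the quantization, batch/accumulation, step count, learning rate, and FP16 compute are taken from the list above:

```python
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, prepare_model_for_kbit_training
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 base weights with FP16 compute, as in the list above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained("gpt2", quantization_config=bnb_config, device_map="auto")
model = prepare_model_for_kbit_training(model)

# LoRA adapter; rank, alpha, and target modules are illustrative, not the published recipe
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["c_attn"], task_type="CAUSAL_LM")

# Toy in-memory dataset standing in for the streamed FineWeb mixture
train_dataset = Dataset.from_dict({"text": ["FineWeb-style training text goes here."] * 32})

args = SFTConfig(
    output_dir="skullllm-125m-lora",
    max_steps=500,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    fp16=True,
    dataset_text_field="text",
    logging_steps=25,
)
trainer = SFTTrainer(model=model, args=args, train_dataset=train_dataset, peft_config=lora_config)
trainer.train()
```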
### โš ๏ธ Limitations & Behavior
As a 125M parameter model trained for 500 steps, SkullLLM-125M is a **Proof of Concept**.
- **Repetitions:** May occasionally loop phrases or sub-word fragments (e.g., "metic"). Use `repetition_penalty=1.5`, as shown in the usage example below.
- **Language Blending:** Due to its size, it may mix Romance languages (Spanish/French/Portuguese) in complex responses.
- **Coherence:** Best used for short-form explanations or creative experiments.
### 💬 Usage (Python)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

model_id = "gpt2"
adapter_id = "Erik22TY/SkullLLM-125M"

# Tokenizer comes from the adapter repo, which includes the added ChatML tokens
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Load the GPT-2 base in FP16, grow the embedding matrix to match the extended
# tokenizer, then attach the LoRA adapter
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, adapter_id)
```
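A minimal generation sketch built on the snippet above. The `<|im_end|>` terminator is assumed from the standard ChatML template (only the start tokens are documented above), and the sampling settings are illustrative apart from the `repetition_penalty=1.5` recommended in the limitations section:

```python
# Hypothetical ChatML prompt; <|im_end|> is assumed from the standard template
prompt = (
    "<|im_start|>user\n"
    "Explain what a neural network is in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.5,  # recommended above to dampen phrase looping
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```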