---
license: apache-2.0
base_model: gpt2
tags:
- trl
- sft
- lora
- nebulos
- skull
- multilingual
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceFW/finewiki
- HuggingFaceFW/fineweb-2
language:
- en
- es
- de
- fr
- pt
metrics:
- accuracy
---

# 💀 SkullLLM-125M

**SkullLLM-125M** is a lightweight, experimental multilingual language model fine-tuned from GPT-2. This project, part of the **SkullLLM** series, demonstrates that fine-tuning a language model is feasible on highly constrained consumer hardware (3 GB of VRAM) using memory optimizations such as 4-bit quantization and LoRA.

### 📋 Model Details

- **Developed by:** Erik22TY
- **Model Name:** Nebulos (SkullLLM-125M)
- **Base Model:** GPT-2 (125M parameters)
- **Training OS:** Linux Mint
- **Training Hardware:** HP Pavilion Gaming Desktop 690-00xx
- **GPU:** NVIDIA GeForce GTX 1050 (3 GB VRAM, Pascal architecture)
- **Training Type:** LoRA (Low-Rank Adaptation)
- **Format:** ChatML (`<|im_start|>user`, `<|im_start|>assistant`); see the prompt sketch below
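
For reference, a ChatML-style prompt can be assembled as below. This is a hedged sketch: the `<|im_end|>` delimiter is assumed from the standard ChatML format, since the card itself only lists the `<|im_start|>` tags.

```python
# Hypothetical ChatML prompt for SkullLLM-125M. <|im_end|> is assumed from
# the standard ChatML spec and may differ from the exact training template.
prompt = (
    "<|im_start|>user\n"
    "Explain what a neural network is.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```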

### 🖥️ Hardware Requirements

This model is optimized for low-end hardware; a 4-bit loading sketch follows the list.

- **VRAM for Inference:** ~1.5 GB (4-bit) / ~2.2 GB (FP16).
- **VRAM for Training:** 2.8 GB+ (tested on a GTX 1050 3 GB).
- **System RAM:** 4 GB minimum for inference; 12 GB recommended for training.
- **Storage:** ~150 MB for the adapter files.
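
A minimal sketch of loading the adapter on top of a 4-bit NF4 base model with `bitsandbytes`, which is how the ~1.5 GB figure above would be reached. The quantization settings mirror the training configuration; actual VRAM use will vary by setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# NF4 4-bit weights with FP16 compute, matching the card's training setup
# (Pascal GPUs such as the GTX 1050 lack BF16 support).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("Erik22TY/SkullLLM-125M")
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", quantization_config=bnb_config, device_map="auto"
)
model.resize_token_embeddings(len(tokenizer))  # account for added ChatML tokens
model = PeftModel.from_pretrained(model, "Erik22TY/SkullLLM-125M")
```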

### 🧠 Knowledge & Dataset

Nebulos was trained on a high-quality multilingual stream (a sketch of such a pipeline follows the list):

- **English (FineWeb-Edu):** Knowledge cutoff March 2024.
- **Multilingual (FineWeb-2):** Spanish, German, French, and Portuguese web data.
- **General (FineWiki):** Wikipedia-based knowledge updated through August 2025.
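
A hedged sketch of how such a stream can be assembled with the `datasets` library. The subset names (`sample-10BT`, `spa_Latn`) and the mixing probabilities are illustrative assumptions, not values from the card; check each dataset card for the exact configs.

```python
from datasets import load_dataset, interleave_datasets

# Stream each corpus instead of downloading it; subset names are assumptions.
en = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                  split="train", streaming=True).select_columns(["text"])
es = load_dataset("HuggingFaceFW/fineweb-2", name="spa_Latn",
                  split="train", streaming=True).select_columns(["text"])
wiki = load_dataset("HuggingFaceFW/finewiki",
                    split="train", streaming=True).select_columns(["text"])

# Interleave into one multilingual stream; the card does not document the
# true sampling ratios, so these probabilities are made up for illustration.
stream = interleave_datasets([en, es, wiki],
                             probabilities=[0.5, 0.3, 0.2], seed=42)
```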

### 🧪 Training Configuration

- **Steps:** 500
- **Batch Size:** 1 (gradient accumulation: 16, for an effective batch of 16)
- **Optimization:** 4-bit quantization (NF4); see the training sketch below
- **Compute Dtype:** Forced FP16 (Pascal GPUs lack BF16 support)
- **Learning Rate:** 2e-4
- **Final Loss:** 4.0898
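
The training script itself is not published; the following is a hedged sketch of how a run with these numbers might be configured using recent versions of TRL's `SFTTrainer` with PEFT. The LoRA hyperparameters and the stand-in dataset are assumptions, since the card does not state them.

```python
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Base model in 4-bit NF4 with FP16 compute, as in the card (Pascal-safe).
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    ),
)

# Stand-in dataset; the real run streamed FineWeb-Edu/FineWeb-2/FineWiki.
dataset = Dataset.from_dict({"text": [
    "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\nHello!<|im_end|>",
]})

# LoRA settings here are illustrative; the card does not document r/alpha.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         task_type="CAUSAL_LM")

# These numbers mirror the card: 500 steps, batch 1 x 16 accumulation, 2e-4.
args = SFTConfig(
    output_dir="skullllm-125m",
    max_steps=500,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    fp16=True,
)

trainer = SFTTrainer(model=model, args=args, train_dataset=dataset,
                     peft_config=peft_config)
trainer.train()
```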

### ⚠️ Limitations & Behavior

As a 125M-parameter model trained for only 500 steps, SkullLLM-125M is a **proof of concept**.

- **Repetitions:** It may occasionally loop on a phrase or token fragment (e.g., "metic"). Set `repetition_penalty=1.5` to mitigate this.
- **Language Blending:** Due to its size, it may mix Romance languages (Spanish/French/Portuguese) within complex responses.
- **Coherence:** Best used for short-form explanations or creative experiments.

### 💬 Usage (Python)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

model_id = "gpt2"
adapter_id = "Erik22TY/SkullLLM-125M"

# The adapter repo's tokenizer carries the extra ChatML special tokens.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
model.resize_token_embeddings(len(tokenizer))  # make room for the added ChatML tokens
model = PeftModel.from_pretrained(model, adapter_id)

# Build a ChatML prompt and generate; repetition_penalty=1.5 is recommended
# in the Limitations section above. Other settings here are illustrative.
prompt = "<|im_start|>user\nWhat is photosynthesis?<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, repetition_penalty=1.5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```