---
license: apache-2.0
base_model: gpt2
tags:
- trl
- sft
- lora
- nebulos
- skull
- multilingual
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceFW/finewiki
- HuggingFaceFW/fineweb-2
language:
- en
- es
- de
- fr
- pt
metrics:
- accuracy
---

# 💀 SkullLLM-125M

**SkullLLM-125M** is a lightweight, experimental multilingual language model fine-tuned from GPT-2. This project, part of the **SkullLLM** series, demonstrates that fine-tuning a language model is possible on highly constrained consumer hardware (3 GB of VRAM) using aggressive memory optimizations (4-bit quantization and LoRA).

### 🚀 Model Details

- **Developed by:** Erik22TY
- **Model Name:** Nebulos (SkullLLM-125M)
- **Base Model:** GPT-2 (125M parameters)
- **Training OS:** Linux Mint
- **Training Hardware:** HP Pavilion Gaming Desktop 690-00xx
- **GPU:** NVIDIA GeForce GTX 1050 (3 GB VRAM, Pascal architecture)
- **Training Type:** LoRA (Low-Rank Adaptation)
- **Format:** ChatML (`<|im_start|>user`, `<|im_start|>assistant`)

### 🖥️ Hardware Requirements

This model is optimized for low-end hardware.

- **VRAM for Inference:** ~1.5 GB (4-bit) / ~2.2 GB (FP16); a 4-bit loading sketch appears after the usage example below.
- **VRAM for Training:** 2.8 GB+ (tested on a GTX 1050 with 3 GB).
- **System RAM:** 4 GB minimum for inference; 12 GB recommended for training.
- **Storage:** ~150 MB for the adapter files.

### 🧠 Knowledge & Dataset

Nebulos was trained on a high-quality multilingual stream:

- **English (FineWeb-Edu):** Knowledge cutoff March 2024.
- **Multilingual (FineWeb-2):** Spanish, German, French, and Portuguese web data.
- **General (FineWiki):** Wikipedia-based knowledge updated through August 2025.

### 🧪 Training Configuration

- **Steps:** 500
- **Batch Size:** 1 (gradient accumulation: 16)
- **Optimization:** 4-bit quantization (NF4)
- **Compute Dtype:** forced FP16 (to support the Pascal architecture)
- **Learning Rate:** 2e-4
- **Final Loss:** 4.0898

A hedged code sketch of this configuration appears at the end of this card.

### ⚠️ Limitations & Behavior

As a 125M-parameter model trained for 500 steps, SkullLLM-125M is a **proof of concept**.

- **Repetitions:** May occasionally loop phrases (e.g., "metic"). Use `repetition_penalty=1.5`, as shown in the generation example below.
- **Language Blending:** Due to its size, it may mix Romance languages (Spanish/French/Portuguese) in complex responses.
- **Coherence:** Best used for short-form explanations or creative experiments.

### 💬 Usage (Python)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

model_id = "gpt2"
adapter_id = "Erik22TY/SkullLLM-125M"

# The adapter repo's tokenizer carries the ChatML special tokens.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# Grow the base model's embedding table to the extended vocabulary
# before attaching the LoRA adapter.
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, adapter_id)
```
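To chat with the model, format the prompt in ChatML. Here is a minimal generation sketch: the `<|im_end|>` delimiter is the conventional ChatML turn closer and is assumed here (the card lists only the `<|im_start|>` tokens), and the sampling settings other than the recommended `repetition_penalty=1.5` are illustrative.

```python
# Build a ChatML prompt. <|im_end|> is the conventional ChatML turn delimiter
# (an assumption -- the card lists only the <|im_start|> tokens).
prompt = (
    "<|im_start|>user\n"
    "Explain what a neural network is.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,          # illustrative value
    do_sample=True,
    temperature=0.7,             # illustrative value
    repetition_penalty=1.5,      # mitigates the looping noted under Limitations
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```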
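The ~1.5 GB inference figure under Hardware Requirements assumes 4-bit weights. Below is a hedged sketch of that loading path using `BitsAndBytesConfig` with NF4 (the same quantization format used during training); it requires the `bitsandbytes` package and a CUDA GPU, and exact memory use will vary by library version.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# NF4 mirrors the training-time quantization; FP16 compute matches the
# Pascal-friendly dtype noted under Training Configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("Erik22TY/SkullLLM-125M")
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    quantization_config=bnb_config,
    device_map="auto",
)
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, "Erik22TY/SkullLLM-125M")
```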
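Finally, here is a hedged sketch of how the Training Configuration above might map onto `trl`'s `SFTTrainer` with QLoRA. The LoRA rank/alpha, target modules, sequence handling, and dataset mixing are **not** stated in the card and are illustrative assumptions, and `trl`'s API varies across versions.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset
import torch

# 4-bit NF4 base weights with FP16 compute, per Training Configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", quantization_config=bnb_config, device_map="auto"
)

# Rank, alpha, and target modules are assumptions -- not stated in the card.
peft_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["c_attn"], task_type="CAUSAL_LM"
)

# One of the listed sources, streamed; the card does not document mixing ratios.
dataset = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)

args = SFTConfig(
    output_dir="skullllm-125m",
    max_steps=500,                   # per the card
    per_device_train_batch_size=1,   # per the card
    gradient_accumulation_steps=16,  # per the card
    learning_rate=2e-4,              # per the card
    fp16=True,                       # forced FP16 for Pascal
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```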