---
license: apache-2.0
base_model: gpt2
tags:
- trl
- sft
- lora
- nebulos
- skull
- multilingual
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceFW/finewiki
- HuggingFaceFW/fineweb-2
language:
- en
- es
- de
- fr
- pt
metrics:
- accuracy
---

# 💀 SkullLLM-125M

**SkullLLM-125M** is a lightweight, experimental multilingual language model fine-tuned from GPT-2. This project, part of the **SkullLLM** series, demonstrates that fine-tuning a language model is feasible on highly constrained consumer hardware (3 GB of VRAM) using memory optimizations such as 4-bit quantization and LoRA.

### 📋 Model Details

- **Developed by:** Erik22TY
- **Model Name:** Nebulos (SkullLLM-125M)
- **Base Model:** GPT-2 (125M parameters)
- **Training OS:** Linux Mint
- **Training Hardware:** HP Pavilion Gaming Desktop 690-00xx
- **GPU:** NVIDIA GeForce GTX 1050 (3 GB VRAM, Pascal architecture)
- **Training Type:** LoRA (Low-Rank Adaptation)
- **Format:** ChatML (`<|im_start|>user`, `<|im_start|>assistant`); see the prompt sketch below
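
For reference, a ChatML-style prompt can be assembled as below. This is a hedged sketch: the `<|im_end|>` delimiter is assumed from the standard ChatML format, since the card itself only lists the `<|im_start|>` tags.

```python
# Hypothetical ChatML prompt for SkullLLM-125M. <|im_end|> is assumed from
# the standard ChatML spec and may differ from the exact training template.
prompt = (
    "<|im_start|>user\n"
    "Explain what a neural network is.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```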

### 🖥️ Hardware Requirements

This model is optimized for low-end hardware; a 4-bit loading sketch follows the list.

- **VRAM for Inference:** ~1.5 GB (4-bit) / ~2.2 GB (FP16).
- **VRAM for Training:** 2.8 GB+ (tested on a GTX 1050 3 GB).
- **System RAM:** 4 GB minimum for inference; 12 GB recommended for training.
- **Storage:** ~150 MB for the adapter files.
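
A minimal sketch of loading the adapter on top of a 4-bit NF4 base model with `bitsandbytes`, which is how the ~1.5 GB figure above would be reached. The quantization settings mirror the training configuration; actual VRAM use will vary by setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# NF4 4-bit weights with FP16 compute, matching the card's training setup
# (Pascal GPUs such as the GTX 1050 lack BF16 support).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("Erik22TY/SkullLLM-125M")
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", quantization_config=bnb_config, device_map="auto"
)
model.resize_token_embeddings(len(tokenizer))  # account for added ChatML tokens
model = PeftModel.from_pretrained(model, "Erik22TY/SkullLLM-125M")
```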

### 🧠 Knowledge & Dataset

Nebulos was trained on a high-quality multilingual stream (a sketch of such a pipeline follows the list):

- **English (FineWeb-Edu):** Knowledge cutoff March 2024.
- **Multilingual (FineWeb-2):** Spanish, German, French, and Portuguese web data.
- **General (FineWiki):** Wikipedia-based knowledge updated through August 2025.
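
A hedged sketch of how such a stream can be assembled with the `datasets` library. The subset names (`sample-10BT`, `spa_Latn`) and the mixing probabilities are illustrative assumptions, not values from the card; check each dataset card for the exact configs.

```python
from datasets import load_dataset, interleave_datasets

# Stream each corpus instead of downloading it; subset names are assumptions.
en = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                  split="train", streaming=True).select_columns(["text"])
es = load_dataset("HuggingFaceFW/fineweb-2", name="spa_Latn",
                  split="train", streaming=True).select_columns(["text"])
wiki = load_dataset("HuggingFaceFW/finewiki",
                    split="train", streaming=True).select_columns(["text"])

# Interleave into one multilingual stream; the card does not document the
# true sampling ratios, so these probabilities are made up for illustration.
stream = interleave_datasets([en, es, wiki],
                             probabilities=[0.5, 0.3, 0.2], seed=42)
```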

### 🧪 Training Configuration

- **Steps:** 500
- **Batch Size:** 1 (gradient accumulation: 16, for an effective batch of 16)
- **Optimization:** 4-bit quantization (NF4); see the training sketch below
- **Compute Dtype:** Forced FP16 (Pascal GPUs lack BF16 support)
- **Learning Rate:** 2e-4
- **Final Loss:** 4.0898
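
The training script itself is not published; the following is a hedged sketch of how a run with these numbers might be configured using recent versions of TRL's `SFTTrainer` with PEFT. The LoRA hyperparameters and the stand-in dataset are assumptions, since the card does not state them.

```python
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Base model in 4-bit NF4 with FP16 compute, as in the card (Pascal-safe).
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    ),
)

# Stand-in dataset; the real run streamed FineWeb-Edu/FineWeb-2/FineWiki.
dataset = Dataset.from_dict({"text": [
    "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\nHello!<|im_end|>",
]})

# LoRA settings here are illustrative; the card does not document r/alpha.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         task_type="CAUSAL_LM")

# These numbers mirror the card: 500 steps, batch 1 x 16 accumulation, 2e-4.
args = SFTConfig(
    output_dir="skullllm-125m",
    max_steps=500,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    fp16=True,
)

trainer = SFTTrainer(model=model, args=args, train_dataset=dataset,
                     peft_config=peft_config)
trainer.train()
```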

### ⚠️ Limitations & Behavior

As a 125M-parameter model trained for only 500 steps, SkullLLM-125M is a **proof of concept**.

- **Repetitions:** It may occasionally loop on a phrase or token fragment (e.g., "metic"). Set `repetition_penalty=1.5` to mitigate this.
- **Language Blending:** Due to its size, it may mix Romance languages (Spanish/French/Portuguese) within complex responses.
- **Coherence:** Best used for short-form explanations or creative experiments.

### 💬 Usage (Python)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

model_id = "gpt2"
adapter_id = "Erik22TY/SkullLLM-125M"

# The adapter repo's tokenizer carries the extra ChatML special tokens.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
model.resize_token_embeddings(len(tokenizer))  # make room for the added ChatML tokens
model = PeftModel.from_pretrained(model, adapter_id)

# Build a ChatML prompt and generate; repetition_penalty=1.5 is recommended
# in the Limitations section above. Other settings here are illustrative.
prompt = "<|im_start|>user\nWhat is photosynthesis?<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, repetition_penalty=1.5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```