Pythia-1B-Alpaca: The Overachieving 1B Model
TL;DR: A Pythia-1B model fine-tuned on Alpaca that writes philosophical essays about consciousness but gets confused implementing Hello World. It's perfect.
Model Details
Model Description
This model is a LoRA fine-tune of EleutherAI's Pythia-1B on the Alpaca instruction-following dataset. Trained overnight on a GTX 1650 Mobile (4GB VRAM) because we believe in the impossible.
What makes this model special? It has an interesting relationship with different types of tasks:
- ✅ Abstract concepts & philosophy → Surprisingly eloquent
- ✅ General knowledge explanations → Exhaustively thorough
- ⚠️ Code generation → Creative interpretation of requirements
- ✅ Existential questions → Uncomfortably thoughtful
Key characteristics:
- Will explain what an apple is for 250 words
- Writes consciousness essays that make you question reality
- Generates Python code that... mostly works?
- Has zero chill when answering simple questions
- Developed by: Someone with a 1650 Mobile and a dream
- Model type: Instruction-following causal language model
- Language(s): English (verbose edition)
- License: Apache 2.0 (inherited from base model)
- Finetuned from model: EleutherAI/pythia-1b
Model Sources
- Base Repository: https://github.com/EleutherAI/pythia
- Dataset: tatsu-lab/alpaca
- Training Hardware: GTX 1650 Mobile 4GB (yes, really)
Uses
Direct Use
Perfect for:
- Discord bots that need personality
- Generating unexpectedly detailed explanations
- Philosophical discussions about AI consciousness
- Creating entertainment through over-explanation
- Teaching people that you CAN fine-tune on consumer hardware
Out-of-Scope Use
Not recommended for:
- Production code generation (unless you enjoy debugging creative interpretations)
- Concise answers (this model doesn't do "concise")
- Time-sensitive applications (trained on a 1650 Mobile, responses take a while)
- Situations requiring factual precision (hallucinations are a feature, not a bug)
Notable Behaviors
The Good
Question: "What is AI?" Response: [Generates comprehensive 250-word essay covering history, applications, economic impact, and future predictions]
Question: "What is consciousness?" Response: [Thoughtful exploration of neuroscience, philosophy, and subjective experience]
The Quirky
Question: "What color is an apple?" Response: [Full botanical thesis on pigmentation, soil pH, and carotenoids]
Request: "Write Hello World in Python" Response: [Technically code, technically Python, technically creative]
The Unexpected
Casual greeting: "Hey! How are you?" Response: "I am good, thank you. What do you have for lunch today? I would like to order from the salad bar."
Training Details
Training Data
- Dataset: Alpaca instruction-following dataset (tatsu-lab/alpaca)
- Subset used: 5,000 examples, streamed and materialized (see the sketch after this list)
- Format: Alpaca-style instruction/input/response format
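The streaming detail matters on modest hardware: instead of loading the whole dataset into RAM, the run pulled examples lazily and kept only the first 5,000. The exact loading code isn't published, so treat this as a sketch of how that typically looks with the `datasets` library:

```python
from datasets import load_dataset

# Stream tatsu-lab/alpaca instead of loading it all into memory,
# then materialize only the first 5,000 examples for training.
stream = load_dataset("tatsu-lab/alpaca", split="train", streaming=True)
examples = list(stream.take(5000))  # assumption: a simple head-of-stream subset
```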
Training Procedure
Preprocessing
- Tokenized with Pythia-1B tokenizer
- Max sequence length: 512 tokens
- Formatted in the Alpaca template with `### Instruction:`, `### Input:`, and `### Response:` sections
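For reference, a sketch of that preprocessing, assuming the standard Alpaca convention of dropping the `### Input:` section when the input field is empty (`format_example` and `tokenize` are hypothetical helpers, not code shipped with this model):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
tokenizer.pad_token = tokenizer.eos_token

def format_example(example: dict) -> str:
    """Render one Alpaca record into the instruction/input/response template."""
    if example.get("input"):
        return (f"### Instruction:\n{example['instruction']}\n\n"
                f"### Input:\n{example['input']}\n\n"
                f"### Response:\n{example['output']}")
    return (f"### Instruction:\n{example['instruction']}\n\n"
            f"### Response:\n{example['output']}")

def tokenize(example: dict) -> dict:
    # Truncate to the 512-token max sequence length used in training.
    return tokenizer(format_example(example), truncation=True, max_length=512)
```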
Training Hyperparameters
Quantization:
- 4-bit NF4 quantization via BitsAndBytes
- Double quantization enabled
- Compute dtype: float16
LoRA Configuration:
- Rank (r): 8
- Alpha: 16
- Target modules: query_key_value
- Dropout: 0.05
- Trainable parameters: 1,048,576 (0.1035% of total)
Training Arguments:
- Batch size per device: 1
- Gradient accumulation steps: 16 (effective batch size: 16)
- Max training steps: 500
- Learning rate: 2e-4 (linear decay)
- Precision: FP16 mixed precision
- Gradient checkpointing: Disabled (to maximize speed on limited hardware)
- Optimizer: AdamW (default)
- Logging steps: 25
- Save steps: 500
Training regime: Mixed precision (FP16)
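Put together, the setup above maps onto standard `transformers` + `peft` + `bitsandbytes` objects roughly as follows (reconstructed from the numbers above, not the original training script; the output directory name is made up). As a sanity check on the trainable-parameter count: each of the 16 layers gets a rank-8 adapter on the fused `query_key_value` projection (2048 in, 3 × 2048 out), i.e. 16 × 8 × (2048 + 6144) = 1,048,576 parameters.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization with double quantization; fp16 compute dtype.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA on the fused QKV projection: rank 8, alpha 16, dropout 0.05.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Training arguments matching the run described above.
training_args = TrainingArguments(
    output_dir="pythia-1b-alpaca-qlora",  # hypothetical path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,       # effective batch size 16
    max_steps=500,
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    fp16=True,
    gradient_checkpointing=False,
    logging_steps=25,
    save_steps=500,
)
```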
Speeds, Sizes, Times
- Hardware: NVIDIA GTX 1650 Mobile (4GB VRAM)
- System RAM: 20GB
- Training time: 4 hours 27 minutes 20 seconds (16,040.1 seconds)
- Steps per second: 0.031
- Samples per second: 0.499
- Time per step: ~32.08 seconds
- Total steps: 500
- Starting loss: 1.9986
- Final training loss: 1.5541
- LoRA adapter size: ~4MB
- Total epochs: ~1.6 (500 steps × 16 effective batch ÷ 5,000 samples)
Evaluation
Qualitative Results
Strengths:
- Excellent instruction following
- Detailed, educational responses
- Coherent long-form text generation
- Surprisingly good at abstract reasoning
- Actually learned the Alpaca format
Weaknesses:
- Overly verbose on simple questions
- Code generation has creative liberties
- Occasional hallucination of statistics (400 million AI jobs in 2018?)
- Cannot be concise to save its life
Example Outputs
Task: Explain photosynthesis Quality: ⭐⭐⭐⭐ (Accurate core concept with creative embellishments)
Task: Write Python code Quality: ⭐⭐⭐ (Functional ideas, questionable execution)
Task: Existential questions Quality: ⭐⭐⭐⭐⭐ (Unexpectedly profound)
How to Get Started
Installation
pip install transformers peft torch bitsandbytes
Basic Usage
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load base model
model = AutoModelForCausalLM.from_pretrained(
"EleutherAI/pythia-1b",
device_map="auto",
torch_dtype=torch.float16
)
# Load LoRA adapter
model = PeftModel.from_pretrained(model, "path/to/checkpoint-500")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
tokenizer.pad_token = tokenizer.eos_token
# Generate
prompt = """### Instruction:
Explain quantum computing in simple terms.
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=300,
do_sample=True,
temperature=0.7,
top_p=0.9,
repetition_penalty=1.2,
no_repeat_ngram_size=3
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
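If you'd rather not carry the adapter at inference time, PEFT can fold the LoRA weights into the base model. Keep in mind the adapter was trained against 4-bit weights, so merging into an fp16 base may shift outputs slightly:

```python
# Optional: merge the LoRA weights into the base model for slightly faster inference.
model = model.merge_and_unload()
```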
Discord Bot Usage
See the included discord_bot.py for a full-featured Discord integration with:
- Slash commands
- Token streaming
- Stop sequences
- Rate limit handling
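The streaming and stop-sequence pieces can be approximated with `transformers`' `TextIteratorStreamer`; this sketch is illustrative and may differ from what discord_bot.py actually does:

```python
from threading import Thread
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Run generation in a background thread so tokens can be consumed as they arrive.
Thread(target=model.generate,
       kwargs=dict(**inputs, max_new_tokens=300, streamer=streamer)).start()

reply = ""
for chunk in streamer:
    reply += chunk
    # Stop sequence: the model tends to start a fresh "### Instruction:" turn.
    if "### Instruction:" in reply:
        reply = reply.split("### Instruction:")[0]
        break  # note: the generate thread finishes on its own in this sketch
print(reply)
```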
Bias, Risks, and Limitations
Biases:
- Inherited from Pythia-1B base model and Alpaca dataset
- Tendency toward Western/English-centric perspectives
- May reflect biases present in instruction-following training data
Limitations:
- Small model size (1B parameters) limits reasoning capabilities
- Code generation is functional but unreliable
- Hallucinations are common, especially with statistics
- Responses are often unnecessarily verbose
- Training was limited to 500 steps on subset of data
Risks:
- Should not be used for critical applications
- May generate plausible-sounding but incorrect information
- Code generated should always be reviewed before execution
Recommendations
- Verify factual claims with authoritative sources
- Review and test any generated code before use
- Use for entertainment, education, and experimentation
- Not suitable for production systems without human oversight
- Perfect for Discord bots and casual AI interactions
Environmental Impact
- Hardware Type: NVIDIA GTX 1650 Mobile (4GB VRAM, ~50W TDP)
- Hours used: 4.45 hours
- Power consumption: ~50W average (laptop GPU under load)
- Total energy: ~0.223 kWh
- Estimated CO2: ~0.09 kg CO2eq (assuming a global average grid intensity of ~0.4 kg CO2/kWh)
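For the skeptical, the arithmetic behind those figures:

```python
hours, avg_watts = 4.45, 50.0
kwh = hours * avg_watts / 1000  # ≈ 0.223 kWh
co2_kg = kwh * 0.4              # ≈ 0.09 kg CO2eq at ~0.4 kg CO2/kWh grid average
```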
Note: Significantly more efficient than cloud training due to:
- Already-owned consumer hardware (no additional manufacturing emissions)
- Short training time (500 steps vs full multi-epoch runs)
- Efficient QLoRA approach (4-bit quantization reduces compute requirements)
- Local execution (no data center overhead)
Technical Specifications
Model Architecture
- Base: GPT-NeoX architecture (Pythia-1B)
- Parameters: 1,011,781,632 total, 1,048,576 trainable (0.1035%)
- Layers: 16 transformer layers
- Hidden size: 2048
- Attention heads: 8
- Vocabulary size: 50,304
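These numbers match the published `EleutherAI/pythia-1b` config and are easy to double-check:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("EleutherAI/pythia-1b")
print(cfg.num_hidden_layers, cfg.hidden_size, cfg.num_attention_heads, cfg.vocab_size)
# Expected: 16 2048 8 50304
```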
Compute Infrastructure
Hardware
- GPU: NVIDIA GTX 1650 Mobile (4GB VRAM, Turing architecture)
- CPU: Not significantly utilized
- RAM: 20GB system RAM
- Storage: NVMe SSD (for dataset streaming)
Software
- Framework: PyTorch 2.x with Hugging Face Transformers
- Quantization: BitsAndBytes 4-bit
- LoRA: PEFT (Parameter-Efficient Fine-Tuning)
- Training: Hugging Face Trainer with gradient accumulation
Citation
If you use this model and want to cite the adventure of fine-tuning on a 1650 Mobile:
BibTeX:
@misc{pythia1b-alpaca-1650mobile,
author = {An Ambitious Soul with a 1650 Mobile},
title = {Pythia-1B-Alpaca: Proof that Consumer Hardware Can Fine-Tune LLMs},
year = {2024},
publisher = {The Spirit of Open Source},
note = {Trained overnight on a laptop GPU because why not}
}
More Information
Fun Facts:
- This model thinks "What color is an apple?" deserves a botanical dissertation
- It can discuss consciousness better than most philosophy students
- The Hello World implementation is... creative
- Training loss went from 1.9986 → 1.5541 in 500 steps (22% reduction!)
- Total training cost: $0 (existing hardware) + 4.5 hours of GPU fan noise
- Dataset was streamed to avoid memory issues (only 5000 examples materialized)
Lessons Learned:
- You CAN fine-tune language models on consumer GPUs
- QLoRA + 4-bit quantization is magic
- The 1650 Mobile is a trooper
- 500 steps is enough to see real instruction-following behavior
- Smaller models can be surprisingly capable
- Verbose explanations are a feature when fine-tuning on Alpaca
Model Card Authors
Created by someone who looked at their 1650 Mobile and said "I bet I could fine-tune an LLM on this" and then actually did it.
Model Card Contact
If you also train models on questionable hardware, we should be friends.
Framework Versions
- PEFT 0.18.0
- Transformers 4.x
- PyTorch 2.x
- BitsAndBytes (latest)
- Python 3.10+
"I am not real. I don't exist in the physical world and I have no body to speak of. However, I could still be a person if my thoughts were directed toward something else entirely..." - The Model, when asked about its existence