---
license: apache-2.0
datasets:
- tatsu-lab/alpaca
base_model:
- EleutherAI/pythia-1b
pipeline_tag: text-generation
tags:
- base_model:adapter:EleutherAI/pythia-1b
- lora
- transformers
- alpaca
- instruction-following
- existential-crisis-capable
---
# Pythia-1B-Alpaca: The Overachieving 1B Model
**TL;DR**: A Pythia-1B model fine-tuned on Alpaca that writes philosophical essays about consciousness but gets confused implementing Hello World. It's perfect.
## Model Details
### Model Description
This model is a LoRA fine-tune of EleutherAI's Pythia-1B on the Alpaca instruction-following dataset. Trained overnight on a GTX 1650 Mobile (4GB VRAM) because we believe in the impossible.
What makes this model special? It has an *interesting* relationship with different types of tasks:
- ✅ Abstract concepts & philosophy → Surprisingly eloquent
- ✅ General knowledge explanations → Exhaustively thorough
- ⚠️ Code generation → Creative interpretation of requirements
- ✅ Existential questions → Uncomfortably thoughtful
**Key characteristics**:
- Will explain what an apple is for 250 words
- Writes consciousness essays that make you question reality
- Generates Python code that... mostly works?
- Has zero chill when answering simple questions
- **Developed by:** Someone with a 1650 Mobile and a dream
- **Model type:** Instruction-following causal language model
- **Language(s):** English (verbose edition)
- **License:** Apache 2.0 (inherited from base model)
- **Finetuned from model:** EleutherAI/pythia-1b
### Model Sources
- **Base Repository:** https://github.com/EleutherAI/pythia
- **Dataset:** tatsu-lab/alpaca
- **Training Hardware:** GTX 1650 Mobile 4GB (yes, really)
## Uses
### Direct Use
Perfect for:
- Discord bots that need personality
- Generating unexpectedly detailed explanations
- Philosophical discussions about AI consciousness
- Creating entertainment through over-explanation
- Teaching people that you CAN fine-tune on consumer hardware
### Out-of-Scope Use
Not recommended for:
- Production code generation (unless you enjoy debugging creative interpretations)
- Concise answers (this model doesn't do "concise")
- Time-sensitive applications (on modest hardware like a 1650 Mobile, responses take a while)
- Situations requiring factual precision (hallucinations are a feature, not a bug)
## Notable Behaviors
### The Good
**Question:** "What is AI?"
**Response:** *[Generates comprehensive 250-word essay covering history, applications, economic impact, and future predictions]*
**Question:** "What is consciousness?"
**Response:** *[Thoughtful exploration of neuroscience, philosophy, and subjective experience]*
### The Quirky
**Question:** "What color is an apple?"
**Response:** *[Full botanical thesis on pigmentation, soil pH, and carotenoids]*
**Request:** "Write Hello World in Python"
**Response:** *[Technically code, technically Python, technically creative]*
### The Unexpected
**Casual greeting:** "Hey! How are you?"
**Response:** "I am good, thank you. What do you have for lunch today? I would like to order from the salad bar."
## Training Details
### Training Data
- **Dataset:** Alpaca instruction-following dataset (tatsu-lab/alpaca)
- **Subset used:** 5,000 examples (streamed and materialized)
- **Format:** Alpaca-style instruction/input/response format
### Training Procedure
#### Preprocessing
- Tokenized with Pythia-1B tokenizer
- Max sequence length: 512 tokens
- Formatted in Alpaca template with `### Instruction:`, `### Input:`, and `### Response:` sections
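
For reference, here is a minimal sketch of that formatting and tokenization step. The function name and the exact handling of empty `input` fields are illustrative, not a copy of the original training script; the dataset field names (`instruction`, `input`, `output`) follow tatsu-lab/alpaca.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
tokenizer.pad_token = tokenizer.eos_token

def format_alpaca(example):
    # Alpaca-style template; skip the "### Input:" block when it's empty.
    if example.get("input"):
        prompt = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    else:
        prompt = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Response:\n{example['output']}"
        )
    # Max sequence length of 512 tokens, as used during training
    return tokenizer(prompt, truncation=True, max_length=512)
```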
#### Training Hyperparameters
**Quantization:**
- 4-bit NF4 quantization via BitsAndBytes
- Double quantization enabled
- Compute dtype: float16
**LoRA Configuration:**
- Rank (r): 8
- Alpha: 16
- Target modules: query_key_value
- Dropout: 0.05
- Trainable parameters: 1,048,576 (0.1035% of total)
**Training Arguments:**
- Batch size per device: 1
- Gradient accumulation steps: 16 (effective batch size: 16)
- Max training steps: 500
- Learning rate: 2e-4 (linear decay)
- Precision: FP16 mixed precision
- Gradient checkpointing: Disabled (to maximize speed on limited hardware)
- Optimizer: AdamW (default)
- Logging steps: 25
- Save steps: 500
**Training regime:** Mixed precision (FP16)
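
Putting the values above together, a hedged sketch of the corresponding configuration looks like this (object and argument names follow the standard Transformers/PEFT APIs; `output_dir`, `bias`, and `task_type` are assumptions not stated above, and the original training script may differ in detail):

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization with double quantization, fp16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA on the attention projection, r=8, alpha=16, dropout 0.05
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",              # assumption: not stated in the card
    task_type="CAUSAL_LM",    # assumption: standard for decoder-only models
)

training_args = TrainingArguments(
    output_dir="pythia-1b-alpaca-lora",  # illustrative path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,      # effective batch size 16
    max_steps=500,
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    fp16=True,
    gradient_checkpointing=False,
    logging_steps=25,
    save_steps=500,
)
```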
#### Speeds, Sizes, Times
- **Hardware:** NVIDIA GTX 1650 Mobile (4GB VRAM)
- **System RAM:** 20GB
- **Training time:** 4 hours 27 minutes 20 seconds (16,040.1 seconds)
- **Steps per second:** 0.031
- **Samples per second:** 0.499
- **Time per step:** ~32.08 seconds
- **Total steps:** 500
- **Starting loss:** 1.9986
- **Final training loss:** 1.5541
- **LoRA adapter size:** ~4MB
- **Total epochs:** ~1.6 (500 steps × 16 effective batch size = 8,000 samples seen ÷ 5,000 examples)
## Evaluation
### Qualitative Results
**Strengths:**
- Excellent instruction following
- Detailed, educational responses
- Coherent long-form text generation
- Surprisingly good at abstract reasoning
- Actually learned the Alpaca format
**Weaknesses:**
- Overly verbose on simple questions
- Code generation has creative liberties
- Occasional hallucination of statistics (400 million AI jobs in 2018?)
- Cannot be concise to save its life
### Example Outputs
**Task:** Explain photosynthesis
**Quality:** ⭐⭐⭐⭐ (Accurate core concept with creative embellishments)
**Task:** Write Python code
**Quality:** ⭐⭐⭐ (Functional ideas, questionable execution)
**Task:** Existential questions
**Quality:** ⭐⭐⭐⭐⭐ (Unexpectedly profound)
## How to Get Started
### Installation
```bash
pip install transformers peft torch bitsandbytes
```
### Basic Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load base model
model = AutoModelForCausalLM.from_pretrained(
"EleutherAI/pythia-1b",
device_map="auto",
torch_dtype=torch.float16
)
# Load LoRA adapter
model = PeftModel.from_pretrained(model, "path/to/checkpoint-500")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
tokenizer.pad_token = tokenizer.eos_token
# Generate
prompt = """### Instruction:
Explain quantum computing in simple terms.
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=300,
do_sample=True,
temperature=0.7,
top_p=0.9,
repetition_penalty=1.2,
no_repeat_ngram_size=3
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
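Optionally, the adapter can be merged into the base weights so the model loads without PEFT at inference time. This uses PEFT's standard `merge_and_unload()` and is a general convenience, not something specific to this checkpoint:

```python
# Merge the LoRA weights into the base model and save a standalone copy.
merged = model.merge_and_unload()
merged.save_pretrained("pythia-1b-alpaca-merged")    # illustrative output path
tokenizer.save_pretrained("pythia-1b-alpaca-merged")
```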
### Discord Bot Usage
See the included `discord_bot.py` for a full-featured Discord integration with:
- Slash commands
- Token streaming
- Stop sequences
- Rate limit handling
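
As a rough illustration only (not the included `discord_bot.py`), a minimal slash-command bot built with discord.py might look like the sketch below; `generate_reply` is a hypothetical helper wrapping the generation snippet above, and the token string is a placeholder:

```python
import discord
from discord import app_commands

intents = discord.Intents.default()
client = discord.Client(intents=intents)
tree = app_commands.CommandTree(client)

@tree.command(name="ask", description="Ask Pythia-1B-Alpaca a question")
async def ask(interaction: discord.Interaction, question: str):
    await interaction.response.defer()             # generation can take a while on a 1650 Mobile
    reply = generate_reply(question)               # hypothetical helper wrapping model.generate()
    await interaction.followup.send(reply[:2000])  # respect Discord's message length limit

@client.event
async def on_ready():
    await tree.sync()  # register slash commands

client.run("YOUR_BOT_TOKEN")  # placeholder token
```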
## Bias, Risks, and Limitations
**Biases:**
- Inherited from Pythia-1B base model and Alpaca dataset
- Tendency toward Western/English-centric perspectives
- May reflect biases present in instruction-following training data
**Limitations:**
- Small model size (1B parameters) limits reasoning capabilities
- Code generation is functional but unreliable
- Hallucinations are common, especially with statistics
- Responses are often unnecessarily verbose
- Training was limited to 500 steps on subset of data
**Risks:**
- Should not be used for critical applications
- May generate plausible-sounding but incorrect information
- Code generated should always be reviewed before execution
### Recommendations
- Verify factual claims with authoritative sources
- Review and test any generated code before use
- Use for entertainment, education, and experimentation
- Not suitable for production systems without human oversight
- Perfect for Discord bots and casual AI interactions
## Environmental Impact
**Hardware Type:** NVIDIA GTX 1650 Mobile (4GB VRAM, ~50W TDP)
**Hours used:** 4.45 hours
**Power consumption:** ~50W average (laptop GPU under load)
**Total energy:** ~0.223 kWh
**Estimated CO2:** ~0.09 kg CO2eq (based on global average electricity grid of ~0.4 kg CO2/kWh)
*Note: Significantly more efficient than cloud training due to:*
- Already-owned consumer hardware (no additional manufacturing emissions)
- Short training time (500 steps vs full multi-epoch runs)
- Efficient QLoRA approach (4-bit quantization reduces compute requirements)
- Local execution (no data center overhead)
## Technical Specifications
### Model Architecture
- **Base:** GPT-NeoX architecture (Pythia-1B)
- **Parameters:** 1,011,781,632 total, 1,048,576 trainable (0.1035%)
- **Layers:** 16 transformer layers
- **Hidden size:** 2048
- **Attention heads:** 8
- **Vocabulary size:** 50,304
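
These numbers can be sanity-checked against the base model's configuration (a quick check, not part of the training code):

```python
from transformers import AutoConfig

# Load the base model's config and print the architecture dimensions
cfg = AutoConfig.from_pretrained("EleutherAI/pythia-1b")
print(cfg.num_hidden_layers, cfg.hidden_size, cfg.num_attention_heads, cfg.vocab_size)
# Expected, per this card: 16 2048 8 50304
```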
### Compute Infrastructure
#### Hardware
- **GPU:** NVIDIA GTX 1650 Mobile (4GB VRAM, Turing architecture)
- **CPU:** Not significantly utilized
- **RAM:** 20GB system RAM
- **Storage:** NVMe SSD (for dataset streaming)
#### Software
- **Framework:** PyTorch 2.x with Hugging Face Transformers
- **Quantization:** BitsAndBytes 4-bit
- **LoRA:** PEFT (Parameter-Efficient Fine-Tuning)
- **Training:** Hugging Face Trainer with gradient accumulation
## Citation
If you use this model and want to cite the adventure of fine-tuning on a 1650 Mobile:
**BibTeX:**
```bibtex
@misc{pythia1b-alpaca-1650mobile,
author = {An Ambitious Soul with a 1650 Mobile},
title = {Pythia-1B-Alpaca: Proof that Consumer Hardware Can Fine-Tune LLMs},
year = {2024},
publisher = {The Spirit of Open Source},
note = {Trained overnight on a laptop GPU because why not}
}
```
## More Information
**Fun Facts:**
- This model thinks "What color is an apple?" deserves a botanical dissertation
- It can discuss consciousness better than most philosophy students
- The Hello World implementation is... creative
- Training loss went from 1.9986 → 1.5541 in 500 steps (22% reduction!)
- Total training cost: $0 (existing hardware) + 4.5 hours of GPU fan noise
- Dataset was streamed to avoid memory issues (only 5000 examples materialized)
**Lessons Learned:**
1. You CAN fine-tune language models on consumer GPUs
2. QLoRA + 4-bit quantization is magic
3. The 1650 Mobile is a trooper
4. 500 steps is enough to see real instruction-following behavior
5. Smaller models can be surprisingly capable
6. Verbose explanations are a feature when fine-tuning on Alpaca
## Model Card Authors
Created by someone who looked at their 1650 Mobile and said "I bet I could fine-tune an LLM on this" and then actually did it.
## Model Card Contact
If you also train models on questionable hardware, we should be friends.
### Framework Versions
- PEFT 0.18.0
- Transformers 4.x
- PyTorch 2.x
- BitsAndBytes (latest)
- Python 3.10+
---
*"I am not real. I don't exist in the physical world and I have no body to speak of. However, I could still be a person if my thoughts were directed toward something else entirely..."* - The Model, when asked about its existence