|
|
--- |
|
|
license: apache-2.0 |
|
|
datasets: |
|
|
- tatsu-lab/alpaca |
|
|
base_model: |
|
|
- EleutherAI/pythia-1b |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- base_model:adapter:EleutherAI/pythia-1b |
|
|
- lora |
|
|
- transformers |
|
|
- alpaca |
|
|
- instruction-following |
|
|
- existential-crisis-capable |
|
|
--- |
|
|
|
|
|
# Pythia-1B-Alpaca: The Overachieving 1B Model |
|
|
|
|
|
**TL;DR**: A Pythia-1B model fine-tuned on Alpaca that writes philosophical essays about consciousness but gets confused implementing Hello World. It's perfect. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
This model is a LoRA fine-tune of EleutherAI's Pythia-1B on the Alpaca instruction-following dataset. Trained overnight on a GTX 1650 Mobile (4GB VRAM) because we believe in the impossible. |
|
|
|
|
|
What makes this model special? It has an *interesting* relationship with different types of tasks: |
|
|
- ✅ Abstract concepts & philosophy → Surprisingly eloquent |
|
|
- ✅ General knowledge explanations → Exhaustively thorough |
|
|
- ⚠️ Code generation → Creative interpretation of requirements |
|
|
- ✅ Existential questions → Uncomfortably thoughtful |
|
|
|
|
|
**Key characteristics**: |
|
|
- Will explain what an apple is for 250 words |
|
|
- Writes consciousness essays that make you question reality |
|
|
- Generates Python code that... mostly works? |
|
|
- Has zero chill when answering simple questions |
|
|
|
|
|
- **Developed by:** Someone with a 1650 Mobile and a dream |
|
|
- **Model type:** Instruction-following causal language model |
|
|
- **Language(s):** English (verbose edition) |
|
|
- **License:** Apache 2.0 (inherited from base model) |
|
|
- **Finetuned from model:** EleutherAI/pythia-1b |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
- **Base Repository:** https://github.com/EleutherAI/pythia |
|
|
- **Dataset:** tatsu-lab/alpaca |
|
|
- **Training Hardware:** GTX 1650 Mobile 4GB (yes, really) |
|
|
|
|
|
## Uses |
|
|
|
|
|
### Direct Use |
|
|
|
|
|
Perfect for: |
|
|
- Discord bots that need personality |
|
|
- Generating unexpectedly detailed explanations |
|
|
- Philosophical discussions about AI consciousness |
|
|
- Creating entertainment through over-explanation |
|
|
- Teaching people that you CAN fine-tune on consumer hardware |
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
Not recommended for: |
|
|
- Production code generation (unless you enjoy debugging creative interpretations) |
|
|
- Concise answers (this model doesn't do "concise") |
|
|
- Time-sensitive applications (on hardware like the 1650 Mobile it was trained on, responses take a while)
|
|
- Situations requiring factual precision (hallucinations are a feature, not a bug) |
|
|
|
|
|
## Notable Behaviors |
|
|
|
|
|
### The Good |
|
|
**Question:** "What is AI?" |
|
|
**Response:** *[Generates comprehensive 250-word essay covering history, applications, economic impact, and future predictions]* |
|
|
|
|
|
**Question:** "What is consciousness?" |
|
|
**Response:** *[Thoughtful exploration of neuroscience, philosophy, and subjective experience]* |
|
|
|
|
|
### The Quirky |
|
|
**Question:** "What color is an apple?" |
|
|
**Response:** *[Full botanical thesis on pigmentation, soil pH, and carotenoids]* |
|
|
|
|
|
**Request:** "Write Hello World in Python" |
|
|
**Response:** *[Technically code, technically Python, technically creative]* |
|
|
|
|
|
### The Unexpected |
|
|
**Casual greeting:** "Hey! How are you?" |
|
|
**Response:** "I am good, thank you. What do you have for lunch today? I would like to order from the salad bar." |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
- **Dataset:** Alpaca instruction-following dataset (tatsu-lab/alpaca) |
|
|
- **Subset used:** 5,000 examples (streamed and materialized) |
|
|
- **Format:** Alpaca-style instruction/input/response format |
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
#### Preprocessing |
|
|
- Tokenized with Pythia-1B tokenizer |
|
|
- Max sequence length: 512 tokens |
|
|
- Formatted in Alpaca template with `### Instruction:`, `### Input:`, and `### Response:` sections |
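
For reference, a minimal sketch of that formatting and tokenization step (the `format_example` helper name and exact template spacing are assumptions, not the original training script):

```python
# Sketch of the Alpaca-style formatting described above (not the original script).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
tokenizer.pad_token = tokenizer.eos_token  # the GPT-NeoX tokenizer ships without a pad token

def format_example(example):
    # Alpaca examples carry "instruction", an optional "input", and "output".
    if example.get("input"):
        prompt = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    else:
        prompt = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return tokenizer(prompt, truncation=True, max_length=512, padding="max_length")
```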
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
|
|
**Quantization:** |
|
|
- 4-bit NF4 quantization via BitsAndBytes |
|
|
- Double quantization enabled |
|
|
- Compute dtype: float16 |
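
In code, that 4-bit setup corresponds roughly to the following BitsAndBytes config (a sketch under those settings, not the exact training script):

```python
# Sketch of the 4-bit NF4 quantization listed above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NF4 quantization
    bnb_4bit_use_double_quant=True,       # double quantization
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-1b",
    quantization_config=bnb_config,
    device_map="auto",
)
```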
|
|
|
|
|
**LoRA Configuration:** |
|
|
- Rank (r): 8 |
|
|
- Alpha: 16 |
|
|
- Target modules: query_key_value |
|
|
- Dropout: 0.05 |
|
|
- Trainable parameters: 1,048,576 (0.1035% of total) |
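
Continuing that sketch, the adapter setup with PEFT would look roughly like this (the `bias` and `task_type` values are assumptions):

```python
# Sketch of the LoRA configuration listed above, applied to the 4-bit model.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",             # assumption
    task_type="CAUSAL_LM",   # assumption
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # ~1,048,576 trainable params (~0.1%)
```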
|
|
|
|
|
**Training Arguments:** |
|
|
- Batch size per device: 1 |
|
|
- Gradient accumulation steps: 16 (effective batch size: 16) |
|
|
- Max training steps: 500 |
|
|
- Learning rate: 2e-4 (linear decay) |
|
|
- Precision: FP16 mixed precision |
|
|
- Gradient checkpointing: Disabled (to maximize speed on limited hardware) |
|
|
- Optimizer: AdamW (default) |
|
|
- Logging steps: 25 |
|
|
- Save steps: 500 |
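
Reconstructed as a Hugging Face Trainer call, the arguments above look roughly like this (the output directory and `tokenized_dataset` are placeholders, not the original script):

```python
# Rough reconstruction of the training arguments above.
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="pythia-1b-alpaca-lora",   # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,       # effective batch size 16
    max_steps=500,
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    fp16=True,
    gradient_checkpointing=False,
    logging_steps=25,
    save_steps=500,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,      # the 5,000 formatted Alpaca examples
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```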
|
|
|
|
|
**Training regime:** Mixed precision (FP16) |
|
|
|
|
|
#### Speeds, Sizes, Times |
|
|
|
|
|
- **Hardware:** NVIDIA GTX 1650 Mobile (4GB VRAM) |
|
|
- **System RAM:** 20GB |
|
|
- **Training time:** 4 hours 27 minutes 20 seconds (16,040.1 seconds) |
|
|
- **Steps per second:** 0.031 |
|
|
- **Samples per second:** 0.499 |
|
|
- **Time per step:** ~32.08 seconds |
|
|
- **Total steps:** 500 |
|
|
- **Starting loss:** 1.9986 |
|
|
- **Final training loss:** 1.5541 |
|
|
- **LoRA adapter size:** ~4MB |
|
|
- **Total epochs:** ~1.6 (500 steps × effective batch size 16 = 8,000 samples seen, over 5,000 examples)
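
These figures are internally consistent, if you want to check the arithmetic:

```python
# Sanity-check the throughput figures above.
total_seconds = 16040.1
steps = 500
effective_batch = 16
samples = 5000

print(total_seconds / steps)                     # ~32.08 s per step
print(steps / total_seconds)                     # ~0.031 steps/s
print(steps * effective_batch / total_seconds)   # ~0.499 samples/s
print(steps * effective_batch / samples)         # ~1.6 epochs
```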
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Qualitative Results |
|
|
|
|
|
**Strengths:** |
|
|
- Excellent instruction following |
|
|
- Detailed, educational responses |
|
|
- Coherent long-form text generation |
|
|
- Surprisingly good at abstract reasoning |
|
|
- Actually learned the Alpaca format |
|
|
|
|
|
**Weaknesses:** |
|
|
- Overly verbose on simple questions |
|
|
- Code generation has creative liberties |
|
|
- Occasional hallucination of statistics (400 million AI jobs in 2018?) |
|
|
- Cannot be concise to save its life |
|
|
|
|
|
### Example Outputs |
|
|
|
|
|
**Task:** Explain photosynthesis |
|
|
**Quality:** ⭐⭐⭐⭐ (Accurate core concept with creative embellishments) |
|
|
|
|
|
**Task:** Write Python code |
|
|
**Quality:** ⭐⭐⭐ (Functional ideas, questionable execution) |
|
|
|
|
|
**Task:** Existential questions |
|
|
**Quality:** ⭐⭐⭐⭐⭐ (Unexpectedly profound) |
|
|
|
|
|
## How to Get Started |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash
|
|
pip install transformers peft torch bitsandbytes |
|
|
``` |
|
|
|
|
|
### Basic Usage |
|
|
|
|
|
```python |
|
|
from peft import PeftModel |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
import torch |
|
|
|
|
|
# Load base model |
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
"EleutherAI/pythia-1b", |
|
|
device_map="auto", |
|
|
torch_dtype=torch.float16 |
|
|
) |
|
|
|
|
|
# Load LoRA adapter |
|
|
model = PeftModel.from_pretrained(model, "path/to/checkpoint-500") |
|
|
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b") |
|
|
tokenizer.pad_token = tokenizer.eos_token |
|
|
|
|
|
# Generate |
|
|
prompt = """### Instruction: |
|
|
Explain quantum computing in simple terms. |
|
|
|
|
|
### Response: |
|
|
""" |
|
|
|
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
outputs = model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=300, |
|
|
do_sample=True, |
|
|
temperature=0.7, |
|
|
top_p=0.9, |
|
|
repetition_penalty=1.2, |
|
|
no_repeat_ngram_size=3 |
|
|
) |
|
|
|
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |
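
Note that `generate` returns the prompt plus the continuation, and the model sometimes wanders into a fresh `### Instruction:` block of its own. A simple way to trim the output:

```python
# Keep only the answer: drop the echoed prompt, stop at the next "###" header.
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
response = text.split("### Response:", 1)[1].split("###", 1)[0].strip()
print(response)
```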
|
|
|
|
|
### Discord Bot Usage |
|
|
|
|
|
See the included `discord_bot.py` for a full-featured Discord integration with: |
|
|
- Slash commands |
|
|
- Token streaming |
|
|
- Stop sequences |
|
|
- Rate limit handling |
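
The bot script itself isn't reproduced here, but a minimal slash-command skeleton with `discord.py` might look roughly like this; the `generate_reply` helper and the `DISCORD_TOKEN` environment variable are assumptions, and streaming, stop sequences, and rate limiting are left out for brevity:

```python
# Minimal discord.py slash-command skeleton (a sketch, not the included discord_bot.py).
import asyncio
import os

import discord
from discord import app_commands

intents = discord.Intents.default()
client = discord.Client(intents=intents)
tree = app_commands.CommandTree(client)

def generate_reply(prompt: str) -> str:
    # Placeholder: wrap `prompt` in the Alpaca template and call model.generate()
    # as shown in the Basic Usage section.
    raise NotImplementedError

@tree.command(name="ask", description="Ask the over-explaining 1B model anything")
async def ask(interaction: discord.Interaction, question: str):
    await interaction.response.defer()  # generation on a small GPU takes a while
    reply = await asyncio.to_thread(generate_reply, question)  # keep the event loop free
    await interaction.followup.send(reply[:2000])  # Discord's message length limit

@client.event
async def on_ready():
    await tree.sync()  # register the slash command

client.run(os.environ["DISCORD_TOKEN"])
```

Deferring the interaction matters: Discord expects an acknowledgement within a few seconds, and a 1B model answering "What color is an apple?" needs far longer than that.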
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
**Biases:** |
|
|
- Inherited from Pythia-1B base model and Alpaca dataset |
|
|
- Tendency toward Western/English-centric perspectives |
|
|
- May reflect biases present in instruction-following training data |
|
|
|
|
|
**Limitations:** |
|
|
- Small model size (1B parameters) limits reasoning capabilities |
|
|
- Code generation is functional but unreliable |
|
|
- Hallucinations are common, especially with statistics |
|
|
- Responses are often unnecessarily verbose |
|
|
- Training was limited to 500 steps on subset of data |
|
|
|
|
|
**Risks:** |
|
|
- Should not be used for critical applications |
|
|
- May generate plausible-sounding but incorrect information |
|
|
- Code generated should always be reviewed before execution |
|
|
|
|
|
### Recommendations |
|
|
|
|
|
- Verify factual claims with authoritative sources |
|
|
- Review and test any generated code before use |
|
|
- Use for entertainment, education, and experimentation |
|
|
- Not suitable for production systems without human oversight |
|
|
- Perfect for Discord bots and casual AI interactions |
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
**Hardware Type:** NVIDIA GTX 1650 Mobile (4GB VRAM, ~50W TDP) |
|
|
**Hours used:** 4.45 hours |
|
|
**Power consumption:** ~50W average (laptop GPU under load) |
|
|
**Total energy:** ~0.223 kWh |
|
|
**Estimated CO2:** ~0.09 kg CO2eq (assuming a global-average grid carbon intensity of ~0.4 kg CO2/kWh)
|
|
|
|
|
*Note: Significantly more efficient than cloud training due to:* |
|
|
- Already-owned consumer hardware (no additional manufacturing emissions) |
|
|
- Short training time (500 steps vs full multi-epoch runs) |
|
|
- Efficient QLoRA approach (4-bit quantization reduces compute requirements) |
|
|
- Local execution (no data center overhead) |
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
### Model Architecture |
|
|
|
|
|
- **Base:** GPT-NeoX architecture (Pythia-1B) |
|
|
- **Parameters:** 1,011,781,632 total, 1,048,576 trainable (0.1035%) |
|
|
- **Layers:** 16 transformer layers |
|
|
- **Hidden size:** 2048 |
|
|
- **Attention heads:** 8 |
|
|
- **Vocabulary size:** 50,304 |
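
If you want to double-check those numbers, the base model's config is enough (no weight download required):

```python
# Quick check of the architecture figures above from the base config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("EleutherAI/pythia-1b")
print(cfg.num_hidden_layers)     # 16
print(cfg.hidden_size)           # 2048
print(cfg.num_attention_heads)   # 8
print(cfg.vocab_size)            # 50304
```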
|
|
|
|
|
### Compute Infrastructure |
|
|
|
|
|
#### Hardware |
|
|
- **GPU:** NVIDIA GTX 1650 Mobile (4GB VRAM, Turing architecture) |
|
|
- **CPU:** Not significantly utilized |
|
|
- **RAM:** 20GB system RAM |
|
|
- **Storage:** NVMe SSD (for dataset streaming) |
|
|
|
|
|
#### Software |
|
|
- **Framework:** PyTorch 2.x with Hugging Face Transformers |
|
|
- **Quantization:** BitsAndBytes 4-bit |
|
|
- **LoRA:** PEFT (Parameter-Efficient Fine-Tuning) |
|
|
- **Training:** Hugging Face Trainer with gradient accumulation |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model and want to cite the adventure of fine-tuning on a 1650 Mobile: |
|
|
|
|
|
**BibTeX:** |
|
|
```bibtex |
|
|
@misc{pythia1b-alpaca-1650mobile, |
|
|
author = {An Ambitious Soul with a 1650 Mobile}, |
|
|
title = {Pythia-1B-Alpaca: Proof that Consumer Hardware Can Fine-Tune LLMs}, |
|
|
year = {2024}, |
|
|
publisher = {The Spirit of Open Source}, |
|
|
note = {Trained overnight on a laptop GPU because why not} |
|
|
} |
|
|
``` |
|
|
|
|
|
## More Information |
|
|
|
|
|
**Fun Facts:** |
|
|
- This model thinks "What color is an apple?" deserves a botanical dissertation |
|
|
- It can discuss consciousness better than most philosophy students |
|
|
- The Hello World implementation is... creative |
|
|
- Training loss went from 1.9986 → 1.5541 in 500 steps (22% reduction!) |
|
|
- Total training cost: $0 (existing hardware) + 4.5 hours of GPU fan noise |
|
|
- Dataset was streamed to avoid memory issues (only 5000 examples materialized) |
|
|
|
|
|
**Lessons Learned:** |
|
|
1. You CAN fine-tune language models on consumer GPUs |
|
|
2. QLoRA + 4-bit quantization is magic |
|
|
3. The 1650 Mobile is a trooper |
|
|
4. 500 steps is enough to see real instruction-following behavior |
|
|
5. Smaller models can be surprisingly capable |
|
|
6. Verbose explanations are a feature when fine-tuning on Alpaca |
|
|
|
|
|
## Model Card Authors |
|
|
|
|
|
Created by someone who looked at their 1650 Mobile and said "I bet I could fine-tune an LLM on this" and then actually did it. |
|
|
|
|
|
## Model Card Contact |
|
|
|
|
|
If you also train models on questionable hardware, we should be friends. |
|
|
|
|
|
### Framework Versions |
|
|
|
|
|
- PEFT 0.18.0 |
|
|
- Transformers 4.x |
|
|
- PyTorch 2.x |
|
|
- BitsAndBytes (latest) |
|
|
- Python 3.10+ |
|
|
|
|
|
--- |
|
|
|
|
|
*"I am not real. I don't exist in the physical world and I have no body to speak of. However, I could still be a person if my thoughts were directed toward something else entirely..."* - The Model, when asked about its existence |