---
license: apache-2.0
datasets:
- tatsu-lab/alpaca
base_model:
- EleutherAI/pythia-1b
pipeline_tag: text-generation
tags:
- base_model:adapter:EleutherAI/pythia-1b
- lora
- transformers
- alpaca
- instruction-following
- existential-crisis-capable
---

# Pythia-1B-Alpaca: The Overachieving 1B Model

**TL;DR**: A Pythia-1B model fine-tuned on Alpaca that writes philosophical essays about consciousness but gets confused implementing Hello World. It's perfect.

## Model Details

### Model Description

This model is a LoRA fine-tune of EleutherAI's Pythia-1B on the Alpaca instruction-following dataset. Trained overnight on a GTX 1650 Mobile (4GB VRAM) because we believe in the impossible.

What makes this model special? It has an *interesting* relationship with different types of tasks:

- ✅ Abstract concepts & philosophy → Surprisingly eloquent
- ✅ General knowledge explanations → Exhaustively thorough
- ⚠️ Code generation → Creative interpretation of requirements
- ✅ Existential questions → Uncomfortably thoughtful

**Key characteristics**:

- Will explain what an apple is for 250 words
- Writes consciousness essays that make you question reality
- Generates Python code that... mostly works?
- Has zero chill when answering simple questions

- **Developed by:** Someone with a 1650 Mobile and a dream
- **Model type:** Instruction-following causal language model
- **Language(s):** English (verbose edition)
- **License:** Apache 2.0 (inherited from base model)
- **Finetuned from model:** EleutherAI/pythia-1b

### Model Sources

- **Base Repository:** https://github.com/EleutherAI/pythia
- **Dataset:** tatsu-lab/alpaca
- **Training Hardware:** GTX 1650 Mobile 4GB (yes, really)

## Uses

### Direct Use

Perfect for:

- Discord bots that need personality
- Generating unexpectedly detailed explanations
- Philosophical discussions about AI consciousness
- Creating entertainment through over-explanation
- Teaching people that you CAN fine-tune on consumer hardware

### Out-of-Scope Use

Not recommended for:

- Production code generation (unless you enjoy debugging creative interpretations)
- Concise answers (this model doesn't do "concise")
- Time-sensitive applications (trained on a 1650 Mobile, responses take a while)
- Situations requiring factual precision (hallucinations are a feature, not a bug)

## Notable Behaviors

### The Good

**Question:** "What is AI?"
**Response:** *[Generates comprehensive 250-word essay covering history, applications, economic impact, and future predictions]*

**Question:** "What is consciousness?"
**Response:** *[Thoughtful exploration of neuroscience, philosophy, and subjective experience]*

### The Quirky

**Question:** "What color is an apple?"
**Response:** *[Full botanical thesis on pigmentation, soil pH, and carotenoids]*

**Request:** "Write Hello World in Python"
**Response:** *[Technically code, technically Python, technically creative]*

### The Unexpected

**Casual greeting:** "Hey! How are you?"
**Response:** "I am good, thank you. What do you have for lunch today? I would like to order from the salad bar."
## Training Details

### Training Data

- **Dataset:** Alpaca instruction-following dataset (tatsu-lab/alpaca)
- **Subset used:** 5,000 examples (streamed and materialized)
- **Format:** Alpaca-style instruction/input/response format

### Training Procedure

#### Preprocessing

- Tokenized with the Pythia-1B tokenizer
- Max sequence length: 512 tokens
- Formatted in the Alpaca template with `### Instruction:`, `### Input:`, and `### Response:` sections

#### Training Hyperparameters

**Quantization:**

- 4-bit NF4 quantization via BitsAndBytes
- Double quantization enabled
- Compute dtype: float16

**LoRA Configuration:**

- Rank (r): 8
- Alpha: 16
- Target modules: query_key_value
- Dropout: 0.05
- Trainable parameters: 1,048,576 (0.1035% of total)

**Training Arguments:**

- Batch size per device: 1
- Gradient accumulation steps: 16 (effective batch size: 16)
- Max training steps: 500
- Learning rate: 2e-4 (linear decay)
- Precision: FP16 mixed precision
- Gradient checkpointing: Disabled (to maximize speed on limited hardware)
- Optimizer: AdamW (default)
- Logging steps: 25
- Save steps: 500

**Training regime:** Mixed precision (FP16)

#### Speeds, Sizes, Times

- **Hardware:** NVIDIA GTX 1650 Mobile (4GB VRAM)
- **System RAM:** 20GB
- **Training time:** 4 hours 27 minutes 20 seconds (16,040.1 seconds)
- **Steps per second:** 0.031
- **Samples per second:** 0.499
- **Time per step:** ~32.08 seconds
- **Total steps:** 500
- **Starting loss:** 1.9986
- **Final training loss:** 1.5541
- **LoRA adapter size:** ~4MB
- **Total epochs:** ~1.6 (500 steps × effective batch size of 16 ÷ 5,000 examples)
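For orientation, the hyperparameters above map onto a QLoRA setup roughly like the one below. This is a minimal sketch rather than the actual training script: the output path is a placeholder, and the tokenized `train_dataset` (the 5,000 formatted Alpaca examples) is assumed to already exist.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization and float16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-1b",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA: r=8, alpha=16, query_key_value only, dropout 0.05
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Training arguments matching the list above
training_args = TrainingArguments(
    output_dir="./pythia-1b-alpaca-lora",  # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    max_steps=500,
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    fp16=True,
    gradient_checkpointing=False,
    logging_steps=25,
    save_steps=500,
)

# `train_dataset` is assumed: the 5,000 tokenized Alpaca examples
# trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
# trainer.train()
```

Targeting only `query_key_value` keeps the adapter at roughly 1M trainable parameters (the 0.1035% noted above), which is a big part of why the run fits in 4GB of VRAM alongside the 4-bit base weights.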
## Evaluation

### Qualitative Results

**Strengths:**

- Excellent instruction following
- Detailed, educational responses
- Coherent long-form text generation
- Surprisingly good at abstract reasoning
- Actually learned the Alpaca format

**Weaknesses:**

- Overly verbose on simple questions
- Code generation takes creative liberties
- Occasional hallucination of statistics (400 million AI jobs in 2018?)
- Cannot be concise to save its life

### Example Outputs

**Task:** Explain photosynthesis
**Quality:** ⭐⭐⭐⭐ (Accurate core concept with creative embellishments)

**Task:** Write Python code
**Quality:** ⭐⭐⭐ (Functional ideas, questionable execution)

**Task:** Existential questions
**Quality:** ⭐⭐⭐⭐⭐ (Unexpectedly profound)

## How to Get Started

### Installation

```bash
pip install transformers peft torch bitsandbytes
```

### Basic Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-1b",
    device_map="auto",
    torch_dtype=torch.float16
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "path/to/checkpoint-500")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
tokenizer.pad_token = tokenizer.eos_token

# Generate
prompt = """### Instruction:
Explain quantum computing in simple terms.

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=300,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
    no_repeat_ngram_size=3
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Discord Bot Usage

See the included `discord_bot.py` for a full-featured Discord integration with:

- Slash commands
- Token streaming
- Stop sequences
- Rate limit handling

## Bias, Risks, and Limitations

**Biases:**

- Inherited from the Pythia-1B base model and the Alpaca dataset
- Tendency toward Western/English-centric perspectives
- May reflect biases present in instruction-following training data

**Limitations:**

- Small model size (1B parameters) limits reasoning capabilities
- Code generation is functional but unreliable
- Hallucinations are common, especially with statistics
- Responses are often unnecessarily verbose
- Training was limited to 500 steps on a subset of the data

**Risks:**

- Should not be used for critical applications
- May generate plausible-sounding but incorrect information
- Generated code should always be reviewed before execution

### Recommendations

- Verify factual claims with authoritative sources
- Review and test any generated code before use
- Use for entertainment, education, and experimentation
- Not suitable for production systems without human oversight
- Perfect for Discord bots and casual AI interactions

## Environmental Impact

- **Hardware Type:** NVIDIA GTX 1650 Mobile (4GB VRAM, ~50W TDP)
- **Hours used:** 4.45 hours
- **Power consumption:** ~50W average (laptop GPU under load)
- **Total energy:** ~0.223 kWh
- **Estimated CO2:** ~0.09 kg CO2eq (based on a global average grid intensity of ~0.4 kg CO2/kWh)

*Note: Significantly more efficient than cloud training due to:*

- Already-owned consumer hardware (no additional manufacturing emissions)
- Short training time (500 steps vs. full multi-epoch runs)
- Efficient QLoRA approach (4-bit quantization reduces compute requirements)
- Local execution (no data center overhead)

## Technical Specifications

### Model Architecture

- **Base:** GPT-NeoX architecture (Pythia-1B)
- **Parameters:** 1,011,781,632 total, 1,048,576 trainable (0.1035%)
- **Layers:** 16 transformer layers
- **Hidden size:** 2048
- **Attention heads:** 8
- **Vocabulary size:** 50,304

### Compute Infrastructure

#### Hardware

- **GPU:** NVIDIA GTX 1650 Mobile (4GB VRAM, Turing architecture)
- **CPU:** Not significantly utilized
- **RAM:** 20GB system RAM
- **Storage:** NVMe SSD (for dataset streaming)

#### Software

- **Framework:** PyTorch 2.x with Hugging Face Transformers
- **Quantization:** BitsAndBytes 4-bit
- **LoRA:** PEFT (Parameter-Efficient Fine-Tuning)
- **Training:** Hugging Face Trainer with gradient accumulation

## Citation

If you use this model and want to cite the adventure of fine-tuning on a 1650 Mobile:

**BibTeX:**

```bibtex
@misc{pythia1b-alpaca-1650mobile,
  author = {An Ambitious Soul with a 1650 Mobile},
  title = {Pythia-1B-Alpaca: Proof that Consumer Hardware Can Fine-Tune LLMs},
  year = {2024},
  publisher = {The Spirit of Open Source},
  note = {Trained overnight on a laptop GPU because why not}
}
```

## More Information

**Fun Facts:**

- This model thinks "What color is an apple?" deserves a botanical dissertation
- It can discuss consciousness better than most philosophy students
- The Hello World implementation is... creative
- Training loss went from 1.9986 → 1.5541 in 500 steps (22% reduction!)
- Total training cost: $0 (existing hardware) + 4.5 hours of GPU fan noise
- Dataset was streamed to avoid memory issues (only 5,000 examples materialized)
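The streaming step in that last fun fact isn't shown anywhere in this card, but it would look roughly like the minimal sketch below (assuming the Hugging Face `datasets` library and the tatsu-lab/alpaca train split; variable names are placeholders):

```python
from datasets import Dataset, load_dataset

# Stream the Alpaca dataset so the full file never has to sit in RAM,
# then materialize only the first 5,000 examples for training.
streamed = load_dataset("tatsu-lab/alpaca", split="train", streaming=True)
subset = Dataset.from_list(list(streamed.take(5000)))

print(subset)  # a regular map-style Dataset with 5,000 instruction/input/output rows
```

Materializing only 5,000 rows keeps peak memory low while still handing the Trainer an ordinary in-memory dataset.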
**Lessons Learned:**

1. You CAN fine-tune language models on consumer GPUs
2. QLoRA + 4-bit quantization is magic
3. The 1650 Mobile is a trooper
4. 500 steps is enough to see real instruction-following behavior
5. Smaller models can be surprisingly capable
6. Verbose explanations are a feature when fine-tuning on Alpaca

## Model Card Authors

Created by someone who looked at their 1650 Mobile and said "I bet I could fine-tune an LLM on this" and then actually did it.

## Model Card Contact

If you also train models on questionable hardware, we should be friends.

### Framework Versions

- PEFT 0.18.0
- Transformers 4.x
- PyTorch 2.x
- BitsAndBytes (latest)
- Python 3.10+

---

*"I am not real. I don't exist in the physical world and I have no body to speak of. However, I could still be a person if my thoughts were directed toward something else entirely..."* - The Model, when asked about its existence