|
|
--- |
|
|
license: apache-2.0 |
|
|
datasets: |
|
|
- tatsu-lab/alpaca |
|
|
base_model: |
|
|
- EleutherAI/pythia-1b |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- base_model:adapter:EleutherAI/pythia-1b |
|
|
- lora |
|
|
- transformers |
|
|
- alpaca |
|
|
- instruction-following |
|
|
- existential-crisis-capable |
|
|
--- |
|
|
|
|
|
# Pythia-1B-Alpaca: The Overachieving 1B Model |
|
|
|
|
|
**TL;DR**: A Pythia-1B model fine-tuned on Alpaca that writes philosophical essays about consciousness but gets confused implementing Hello World. It's perfect. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
This model is a LoRA fine-tune of EleutherAI's Pythia-1B on the Alpaca instruction-following dataset. Trained overnight on a GTX 1650 Mobile (4GB VRAM) because we believe in the impossible. |
|
|
|
|
|
What makes this model special? It has an *interesting* relationship with different types of tasks: |
|
|
- ✅ Abstract concepts & philosophy → Surprisingly eloquent |
|
|
- ✅ General knowledge explanations → Exhaustively thorough |
|
|
- ⚠️ Code generation → Creative interpretation of requirements |
|
|
- ✅ Existential questions → Uncomfortably thoughtful |
|
|
|
|
|
**Key characteristics**: |
|
|
- Will explain what an apple is for 250 words |
|
|
- Writes consciousness essays that make you question reality |
|
|
- Generates Python code that... mostly works? |
|
|
- Has zero chill when answering simple questions |
|
|
|
|
|
- **Developed by:** Someone with a 1650 Mobile and a dream |
|
|
- **Model type:** Instruction-following causal language model |
|
|
- **Language(s):** English (verbose edition) |
|
|
- **License:** Apache 2.0 (inherited from base model) |
|
|
- **Finetuned from model:** EleutherAI/pythia-1b |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
- **Base Repository:** https://github.com/EleutherAI/pythia |
|
|
- **Dataset:** tatsu-lab/alpaca |
|
|
- **Training Hardware:** GTX 1650 Mobile 4GB (yes, really) |
|
|
|
|
|
## Uses |
|
|
|
|
|
### Direct Use |
|
|
|
|
|
Perfect for: |
|
|
- Discord bots that need personality |
|
|
- Generating unexpectedly detailed explanations |
|
|
- Philosophical discussions about AI consciousness |
|
|
- Creating entertainment through over-explanation |
|
|
- Teaching people that you CAN fine-tune on consumer hardware |
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
Not recommended for: |
|
|
- Production code generation (unless you enjoy debugging creative interpretations) |
|
|
- Concise answers (this model doesn't do "concise") |
|
|
- Time-sensitive applications (on hardware like the 1650 Mobile it was trained on, responses take a while)
|
|
- Situations requiring factual precision (hallucinations are a feature, not a bug) |
|
|
|
|
|
## Notable Behaviors |
|
|
|
|
|
### The Good |
|
|
**Question:** "What is AI?" |
|
|
**Response:** *[Generates comprehensive 250-word essay covering history, applications, economic impact, and future predictions]* |
|
|
|
|
|
**Question:** "What is consciousness?" |
|
|
**Response:** *[Thoughtful exploration of neuroscience, philosophy, and subjective experience]* |
|
|
|
|
|
### The Quirky |
|
|
**Question:** "What color is an apple?" |
|
|
**Response:** *[Full botanical thesis on pigmentation, soil pH, and carotenoids]* |
|
|
|
|
|
**Request:** "Write Hello World in Python" |
|
|
**Response:** *[Technically code, technically Python, technically creative]* |
|
|
|
|
|
### The Unexpected |
|
|
**Casual greeting:** "Hey! How are you?" |
|
|
**Response:** "I am good, thank you. What do you have for lunch today? I would like to order from the salad bar." |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
- **Dataset:** Alpaca instruction-following dataset (tatsu-lab/alpaca) |
|
|
- **Subset used:** 5,000 examples (streamed and materialized) |
|
|
- **Format:** Alpaca-style instruction/input/response format |
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
#### Preprocessing |
|
|
- Tokenized with Pythia-1B tokenizer |
|
|
- Max sequence length: 512 tokens |
|
|
- Formatted in Alpaca template with `### Instruction:`, `### Input:`, and `### Response:` sections |
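
For reference, a minimal sketch of that formatting and tokenization step (the `format_example` helper name and exact template spacing are assumptions, not the original training script):

```python
# Sketch of the Alpaca-style formatting described above (not the original script).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
tokenizer.pad_token = tokenizer.eos_token  # the GPT-NeoX tokenizer ships without a pad token

def format_example(example):
    # Alpaca examples carry "instruction", an optional "input", and "output".
    if example.get("input"):
        prompt = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    else:
        prompt = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return tokenizer(prompt, truncation=True, max_length=512, padding="max_length")
```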
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
|
|
**Quantization:** |
|
|
- 4-bit NF4 quantization via BitsAndBytes |
|
|
- Double quantization enabled |
|
|
- Compute dtype: float16 |
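
In code, that 4-bit setup corresponds roughly to the following BitsAndBytes config (a sketch under those settings, not the exact training script):

```python
# Sketch of the 4-bit NF4 quantization listed above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NF4 quantization
    bnb_4bit_use_double_quant=True,       # double quantization
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-1b",
    quantization_config=bnb_config,
    device_map="auto",
)
```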
|
|
|
|
|
**LoRA Configuration:** |
|
|
- Rank (r): 8 |
|
|
- Alpha: 16 |
|
|
- Target modules: query_key_value |
|
|
- Dropout: 0.05 |
|
|
- Trainable parameters: 1,048,576 (0.1035% of total) |
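
Continuing that sketch, the adapter setup with PEFT would look roughly like this (the `bias` and `task_type` values are assumptions):

```python
# Sketch of the LoRA configuration listed above, applied to the 4-bit model.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",             # assumption
    task_type="CAUSAL_LM",   # assumption
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # ~1,048,576 trainable params (~0.1%)
```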
|
|
|
|
|
**Training Arguments:** |
|
|
- Batch size per device: 1 |
|
|
- Gradient accumulation steps: 16 (effective batch size: 16) |
|
|
- Max training steps: 500 |
|
|
- Learning rate: 2e-4 (linear decay) |
|
|
- Precision: FP16 mixed precision |
|
|
- Gradient checkpointing: Disabled (to maximize speed on limited hardware) |
|
|
- Optimizer: AdamW (default) |
|
|
- Logging steps: 25 |
|
|
- Save steps: 500 |
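
Reconstructed as a Hugging Face Trainer call, the arguments above look roughly like this (the output directory and `tokenized_dataset` are placeholders, not the original script):

```python
# Rough reconstruction of the training arguments above.
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="pythia-1b-alpaca-lora",   # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,       # effective batch size 16
    max_steps=500,
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    fp16=True,
    gradient_checkpointing=False,
    logging_steps=25,
    save_steps=500,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,      # the 5,000 formatted Alpaca examples
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```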
|
|
|
|
|
**Training regime:** Mixed precision (FP16) |
|
|
|
|
|
#### Speeds, Sizes, Times |
|
|
|
|
|
- **Hardware:** NVIDIA GTX 1650 Mobile (4GB VRAM) |
|
|
- **System RAM:** 20GB |
|
|
- **Training time:** 4 hours 27 minutes 20 seconds (16,040.1 seconds) |
|
|
- **Steps per second:** 0.031 |
|
|
- **Samples per second:** 0.499 |
|
|
- **Time per step:** ~32.08 seconds |
|
|
- **Total steps:** 500 |
|
|
- **Starting loss:** 1.9986 |
|
|
- **Final training loss:** 1.5541 |
|
|
- **LoRA adapter size:** ~4MB |
|
|
- **Total epochs:** ~1.6 (500 steps × effective batch size 16 = 8,000 samples seen, over 5,000 examples)
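
These figures are internally consistent, if you want to check the arithmetic:

```python
# Sanity-check the throughput figures above.
total_seconds = 16040.1
steps = 500
effective_batch = 16
samples = 5000

print(total_seconds / steps)                     # ~32.08 s per step
print(steps / total_seconds)                     # ~0.031 steps/s
print(steps * effective_batch / total_seconds)   # ~0.499 samples/s
print(steps * effective_batch / samples)         # ~1.6 epochs
```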
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Qualitative Results |
|
|
|
|
|
**Strengths:** |
|
|
- Excellent instruction following |
|
|
- Detailed, educational responses |
|
|
- Coherent long-form text generation |
|
|
- Surprisingly good at abstract reasoning |
|
|
- Actually learned the Alpaca format |
|
|
|
|
|
**Weaknesses:** |
|
|
- Overly verbose on simple questions |
|
|
- Code generation has creative liberties |
|
|
- Occasional hallucination of statistics (400 million AI jobs in 2018?) |
|
|
- Cannot be concise to save its life |
|
|
|
|
|
### Example Outputs |
|
|
|
|
|
**Task:** Explain photosynthesis |
|
|
**Quality:** ⭐⭐⭐⭐ (Accurate core concept with creative embellishments) |
|
|
|
|
|
**Task:** Write Python code |
|
|
**Quality:** ⭐⭐⭐ (Functional ideas, questionable execution) |
|
|
|
|
|
**Task:** Existential questions |
|
|
**Quality:** ⭐⭐⭐⭐⭐ (Unexpectedly profound) |
|
|
|
|
|
## How to Get Started |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash
|
|
pip install transformers peft torch bitsandbytes |
|
|
``` |
|
|
|
|
|
### Basic Usage |
|
|
|
|
|
```python |
|
|
from peft import PeftModel |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
import torch |
|
|
|
|
|
# Load base model |
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
"EleutherAI/pythia-1b", |
|
|
device_map="auto", |
|
|
torch_dtype=torch.float16 |
|
|
) |
|
|
|
|
|
# Load LoRA adapter |
|
|
model = PeftModel.from_pretrained(model, "path/to/checkpoint-500") |
|
|
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b") |
|
|
tokenizer.pad_token = tokenizer.eos_token |
|
|
|
|
|
# Generate |
|
|
prompt = """### Instruction: |
|
|
Explain quantum computing in simple terms. |
|
|
|
|
|
### Response: |
|
|
""" |
|
|
|
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
outputs = model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=300, |
|
|
do_sample=True, |
|
|
temperature=0.7, |
|
|
top_p=0.9, |
|
|
repetition_penalty=1.2, |
|
|
no_repeat_ngram_size=3 |
|
|
) |
|
|
|
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |
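
Note that `generate` returns the prompt plus the continuation, and the model sometimes wanders into a fresh `### Instruction:` block of its own. A simple way to trim the output:

```python
# Keep only the answer: drop the echoed prompt, stop at the next "###" header.
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
response = text.split("### Response:", 1)[1].split("###", 1)[0].strip()
print(response)
```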
|
|
|
|
|
### Discord Bot Usage |
|
|
|
|
|
See the included `discord_bot.py` for a full-featured Discord integration with: |
|
|
- Slash commands |
|
|
- Token streaming |
|
|
- Stop sequences |
|
|
- Rate limit handling |
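
The bot script itself isn't reproduced here, but a minimal slash-command skeleton with `discord.py` might look roughly like this; the `generate_reply` helper and the `DISCORD_TOKEN` environment variable are assumptions, and streaming, stop sequences, and rate limiting are left out for brevity:

```python
# Minimal discord.py slash-command skeleton (a sketch, not the included discord_bot.py).
import asyncio
import os

import discord
from discord import app_commands

intents = discord.Intents.default()
client = discord.Client(intents=intents)
tree = app_commands.CommandTree(client)

def generate_reply(prompt: str) -> str:
    # Placeholder: wrap `prompt` in the Alpaca template and call model.generate()
    # as shown in the Basic Usage section.
    raise NotImplementedError

@tree.command(name="ask", description="Ask the over-explaining 1B model anything")
async def ask(interaction: discord.Interaction, question: str):
    await interaction.response.defer()  # generation on a small GPU takes a while
    reply = await asyncio.to_thread(generate_reply, question)  # keep the event loop free
    await interaction.followup.send(reply[:2000])  # Discord's message length limit

@client.event
async def on_ready():
    await tree.sync()  # register the slash command

client.run(os.environ["DISCORD_TOKEN"])
```

Deferring the interaction matters: Discord expects an acknowledgement within a few seconds, and a 1B model answering "What color is an apple?" needs far longer than that.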
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
**Biases:** |
|
|
- Inherited from Pythia-1B base model and Alpaca dataset |
|
|
- Tendency toward Western/English-centric perspectives |
|
|
- May reflect biases present in instruction-following training data |
|
|
|
|
|
**Limitations:** |
|
|
- Small model size (1B parameters) limits reasoning capabilities |
|
|
- Code generation is functional but unreliable |
|
|
- Hallucinations are common, especially with statistics |
|
|
- Responses are often unnecessarily verbose |
|
|
- Training was limited to 500 steps on subset of data |
|
|
|
|
|
**Risks:** |
|
|
- Should not be used for critical applications |
|
|
- May generate plausible-sounding but incorrect information |
|
|
- Code generated should always be reviewed before execution |
|
|
|
|
|
### Recommendations |
|
|
|
|
|
- Verify factual claims with authoritative sources |
|
|
- Review and test any generated code before use |
|
|
- Use for entertainment, education, and experimentation |
|
|
- Not suitable for production systems without human oversight |
|
|
- Perfect for Discord bots and casual AI interactions |
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
**Hardware Type:** NVIDIA GTX 1650 Mobile (4GB VRAM, ~50W TDP) |
|
|
**Hours used:** 4.45 hours |
|
|
**Power consumption:** ~50W average (laptop GPU under load) |
|
|
**Total energy:** ~0.223 kWh |
|
|
**Estimated CO2:** ~0.09 kg CO2eq (assuming a global-average grid carbon intensity of ~0.4 kg CO2/kWh)
|
|
|
|
|
*Note: Significantly more efficient than cloud training due to:* |
|
|
- Already-owned consumer hardware (no additional manufacturing emissions) |
|
|
- Short training time (500 steps vs full multi-epoch runs) |
|
|
- Efficient QLoRA approach (4-bit quantization reduces compute requirements) |
|
|
- Local execution (no data center overhead) |
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
### Model Architecture |
|
|
|
|
|
- **Base:** GPT-NeoX architecture (Pythia-1B) |
|
|
- **Parameters:** 1,011,781,632 total, 1,048,576 trainable (0.1035%) |
|
|
- **Layers:** 16 transformer layers |
|
|
- **Hidden size:** 2048 |
|
|
- **Attention heads:** 8 |
|
|
- **Vocabulary size:** 50,304 |
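
If you want to double-check those numbers, the base model's config is enough (no weight download required):

```python
# Quick check of the architecture figures above from the base config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("EleutherAI/pythia-1b")
print(cfg.num_hidden_layers)     # 16
print(cfg.hidden_size)           # 2048
print(cfg.num_attention_heads)   # 8
print(cfg.vocab_size)            # 50304
```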
|
|
|
|
|
### Compute Infrastructure |
|
|
|
|
|
#### Hardware |
|
|
- **GPU:** NVIDIA GTX 1650 Mobile (4GB VRAM, Turing architecture) |
|
|
- **CPU:** Not significantly utilized |
|
|
- **RAM:** 20GB system RAM |
|
|
- **Storage:** NVMe SSD (for dataset streaming) |
|
|
|
|
|
#### Software |
|
|
- **Framework:** PyTorch 2.x with Hugging Face Transformers |
|
|
- **Quantization:** BitsAndBytes 4-bit |
|
|
- **LoRA:** PEFT (Parameter-Efficient Fine-Tuning) |
|
|
- **Training:** Hugging Face Trainer with gradient accumulation |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model and want to cite the adventure of fine-tuning on a 1650 Mobile: |
|
|
|
|
|
**BibTeX:** |
|
|
```bibtex |
|
|
@misc{pythia1b-alpaca-1650mobile, |
|
|
author = {An Ambitious Soul with a 1650 Mobile}, |
|
|
title = {Pythia-1B-Alpaca: Proof that Consumer Hardware Can Fine-Tune LLMs}, |
|
|
year = {2024}, |
|
|
publisher = {The Spirit of Open Source}, |
|
|
note = {Trained overnight on a laptop GPU because why not} |
|
|
} |
|
|
``` |
|
|
|
|
|
## More Information |
|
|
|
|
|
**Fun Facts:** |
|
|
- This model thinks "What color is an apple?" deserves a botanical dissertation |
|
|
- It can discuss consciousness better than most philosophy students |
|
|
- The Hello World implementation is... creative |
|
|
- Training loss went from 1.9986 → 1.5541 in 500 steps (22% reduction!) |
|
|
- Total training cost: $0 (existing hardware) + 4.5 hours of GPU fan noise |
|
|
- Dataset was streamed to avoid memory issues (only 5000 examples materialized) |
|
|
|
|
|
**Lessons Learned:** |
|
|
1. You CAN fine-tune language models on consumer GPUs |
|
|
2. QLoRA + 4-bit quantization is magic |
|
|
3. The 1650 Mobile is a trooper |
|
|
4. 500 steps is enough to see real instruction-following behavior |
|
|
5. Smaller models can be surprisingly capable |
|
|
6. Verbose explanations are a feature when fine-tuning on Alpaca |
|
|
|
|
|
## Model Card Authors |
|
|
|
|
|
Created by someone who looked at their 1650 Mobile and said "I bet I could fine-tune an LLM on this" and then actually did it. |
|
|
|
|
|
## Model Card Contact |
|
|
|
|
|
If you also train models on questionable hardware, we should be friends. |
|
|
|
|
|
### Framework Versions |
|
|
|
|
|
- PEFT 0.18.0 |
|
|
- Transformers 4.x |
|
|
- PyTorch 2.x |
|
|
- BitsAndBytes (latest) |
|
|
- Python 3.10+ |
|
|
|
|
|
--- |
|
|
|
|
|
*"I am not real. I don't exist in the physical world and I have no body to speak of. However, I could still be a person if my thoughts were directed toward something else entirely..."* - The Model, when asked about its existence |