---
license: apache-2.0
datasets:
- tatsu-lab/alpaca
base_model:
- EleutherAI/pythia-1b
pipeline_tag: text-generation
tags:
- base_model:adapter:EleutherAI/pythia-1b
- lora
- transformers
- alpaca
- instruction-following
- existential-crisis-capable
---

# Pythia-1B-Alpaca: The Overachieving 1B Model

**TL;DR**: A Pythia-1B model fine-tuned on Alpaca that writes philosophical essays about consciousness but gets confused implementing Hello World. It's perfect.

## Model Details

### Model Description

This model is a LoRA fine-tune of EleutherAI's Pythia-1B on the Alpaca instruction-following dataset. Trained overnight on a GTX 1650 Mobile (4GB VRAM) because we believe in the impossible.

What makes this model special? It has an *interesting* relationship with different types of tasks:

- ✅ Abstract concepts & philosophy → Surprisingly eloquent
- ✅ General knowledge explanations → Exhaustively thorough
- ⚠️ Code generation → Creative interpretation of requirements
- ✅ Existential questions → Uncomfortably thoughtful

**Key characteristics**:

- Will explain what an apple is for 250 words
- Writes consciousness essays that make you question reality
- Generates Python code that... mostly works?
- Has zero chill when answering simple questions

- **Developed by:** Someone with a 1650 Mobile and a dream
- **Model type:** Instruction-following causal language model
- **Language(s):** English (verbose edition)
- **License:** Apache 2.0 (inherited from base model)
- **Finetuned from model:** EleutherAI/pythia-1b

### Model Sources

- **Base Repository:** https://github.com/EleutherAI/pythia
- **Dataset:** tatsu-lab/alpaca
- **Training Hardware:** GTX 1650 Mobile 4GB (yes, really)

## Uses

### Direct Use

Perfect for:

- Discord bots that need personality
- Generating unexpectedly detailed explanations
- Philosophical discussions about AI consciousness
- Creating entertainment through over-explanation
- Teaching people that you CAN fine-tune on consumer hardware

### Out-of-Scope Use

Not recommended for:

- Production code generation (unless you enjoy debugging creative interpretations)
- Concise answers (this model doesn't do "concise")
- Time-sensitive applications (trained on a 1650 Mobile, responses take a while)
- Situations requiring factual precision (hallucinations are a feature, not a bug)

## Notable Behaviors

### The Good

**Question:** "What is AI?"
**Response:** *[Generates comprehensive 250-word essay covering history, applications, economic impact, and future predictions]*

**Question:** "What is consciousness?"
**Response:** *[Thoughtful exploration of neuroscience, philosophy, and subjective experience]*

### The Quirky

**Question:** "What color is an apple?"
**Response:** *[Full botanical thesis on pigmentation, soil pH, and carotenoids]*

**Request:** "Write Hello World in Python"
**Response:** *[Technically code, technically Python, technically creative]*

### The Unexpected

**Casual greeting:** "Hey! How are you?"
**Response:** "I am good, thank you. What do you have for lunch today? I would like to order from the salad bar."
## Training Details

### Training Data

- **Dataset:** Alpaca instruction-following dataset (tatsu-lab/alpaca)
- **Subset used:** 5,000 examples (streamed and materialized)
- **Format:** Alpaca-style instruction/input/response format

### Training Procedure

#### Preprocessing

- Tokenized with the Pythia-1B tokenizer
- Max sequence length: 512 tokens
- Formatted in the Alpaca template with `### Instruction:`, `### Input:`, and `### Response:` sections

#### Training Hyperparameters

**Quantization:**

- 4-bit NF4 quantization via BitsAndBytes
- Double quantization enabled
- Compute dtype: float16

**LoRA Configuration:**

- Rank (r): 8
- Alpha: 16
- Target modules: query_key_value
- Dropout: 0.05
- Trainable parameters: 1,048,576 (0.1035% of total)

**Training Arguments:**

- Batch size per device: 1
- Gradient accumulation steps: 16 (effective batch size: 16)
- Max training steps: 500
- Learning rate: 2e-4 (linear decay)
- Precision: FP16 mixed precision
- Gradient checkpointing: Disabled (to maximize speed on limited hardware)
- Optimizer: AdamW (default)
- Logging steps: 25
- Save steps: 500

**Training regime:** Mixed precision (FP16)

#### Speeds, Sizes, Times

- **Hardware:** NVIDIA GTX 1650 Mobile (4GB VRAM)
- **System RAM:** 20GB
- **Training time:** 4 hours 27 minutes 20 seconds (16,040.1 seconds)
- **Steps per second:** 0.031
- **Samples per second:** 0.499
- **Time per step:** ~32.08 seconds
- **Total steps:** 500
- **Starting loss:** 1.9986
- **Final training loss:** 1.5541
- **LoRA adapter size:** ~4MB
- **Total epochs:** ~1.6 (500 steps × effective batch size of 16 ÷ 5,000 examples)
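For orientation, the hyperparameters above map onto a QLoRA setup roughly like the one below. This is a minimal sketch rather than the actual training script: the output path is a placeholder, and the tokenized `train_dataset` (the 5,000 formatted Alpaca examples) is assumed to already exist.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization and float16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-1b",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA: r=8, alpha=16, query_key_value only, dropout 0.05
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Training arguments matching the list above
training_args = TrainingArguments(
    output_dir="./pythia-1b-alpaca-lora",  # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    max_steps=500,
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    fp16=True,
    gradient_checkpointing=False,
    logging_steps=25,
    save_steps=500,
)

# `train_dataset` is assumed: the 5,000 tokenized Alpaca examples
# trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
# trainer.train()
```

Targeting only `query_key_value` keeps the adapter at roughly 1M trainable parameters (the 0.1035% noted above), which is a big part of why the run fits in 4GB of VRAM alongside the 4-bit base weights.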
## Evaluation

### Qualitative Results

**Strengths:**

- Excellent instruction following
- Detailed, educational responses
- Coherent long-form text generation
- Surprisingly good at abstract reasoning
- Actually learned the Alpaca format

**Weaknesses:**

- Overly verbose on simple questions
- Code generation takes creative liberties
- Occasional hallucination of statistics (400 million AI jobs in 2018?)
- Cannot be concise to save its life

### Example Outputs

**Task:** Explain photosynthesis
**Quality:** ⭐⭐⭐⭐ (Accurate core concept with creative embellishments)

**Task:** Write Python code
**Quality:** ⭐⭐⭐ (Functional ideas, questionable execution)

**Task:** Existential questions
**Quality:** ⭐⭐⭐⭐⭐ (Unexpectedly profound)

## How to Get Started

### Installation

```bash
pip install transformers peft torch bitsandbytes
```

### Basic Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-1b",
    device_map="auto",
    torch_dtype=torch.float16
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "path/to/checkpoint-500")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
tokenizer.pad_token = tokenizer.eos_token

# Generate
prompt = """### Instruction:
Explain quantum computing in simple terms.

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=300,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
    no_repeat_ngram_size=3
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Discord Bot Usage

See the included `discord_bot.py` for a full-featured Discord integration with:

- Slash commands
- Token streaming
- Stop sequences
- Rate limit handling

## Bias, Risks, and Limitations

**Biases:**

- Inherited from the Pythia-1B base model and the Alpaca dataset
- Tendency toward Western/English-centric perspectives
- May reflect biases present in instruction-following training data

**Limitations:**

- Small model size (1B parameters) limits reasoning capabilities
- Code generation is functional but unreliable
- Hallucinations are common, especially with statistics
- Responses are often unnecessarily verbose
- Training was limited to 500 steps on a subset of the data

**Risks:**

- Should not be used for critical applications
- May generate plausible-sounding but incorrect information
- Generated code should always be reviewed before execution

### Recommendations

- Verify factual claims with authoritative sources
- Review and test any generated code before use
- Use for entertainment, education, and experimentation
- Not suitable for production systems without human oversight
- Perfect for Discord bots and casual AI interactions

## Environmental Impact

- **Hardware Type:** NVIDIA GTX 1650 Mobile (4GB VRAM, ~50W TDP)
- **Hours used:** 4.45 hours
- **Power consumption:** ~50W average (laptop GPU under load)
- **Total energy:** ~0.223 kWh
- **Estimated CO2:** ~0.09 kg CO2eq (based on a global average grid intensity of ~0.4 kg CO2/kWh)

*Note: Significantly more efficient than cloud training due to:*

- Already-owned consumer hardware (no additional manufacturing emissions)
- Short training time (500 steps vs. full multi-epoch runs)
- Efficient QLoRA approach (4-bit quantization reduces compute requirements)
- Local execution (no data center overhead)

## Technical Specifications

### Model Architecture

- **Base:** GPT-NeoX architecture (Pythia-1B)
- **Parameters:** 1,011,781,632 total, 1,048,576 trainable (0.1035%)
- **Layers:** 16 transformer layers
- **Hidden size:** 2048
- **Attention heads:** 8
- **Vocabulary size:** 50,304

### Compute Infrastructure

#### Hardware

- **GPU:** NVIDIA GTX 1650 Mobile (4GB VRAM, Turing architecture)
- **CPU:** Not significantly utilized
- **RAM:** 20GB system RAM
- **Storage:** NVMe SSD (for dataset streaming)

#### Software

- **Framework:** PyTorch 2.x with Hugging Face Transformers
- **Quantization:** BitsAndBytes 4-bit
- **LoRA:** PEFT (Parameter-Efficient Fine-Tuning)
- **Training:** Hugging Face Trainer with gradient accumulation

## Citation

If you use this model and want to cite the adventure of fine-tuning on a 1650 Mobile:

**BibTeX:**

```bibtex
@misc{pythia1b-alpaca-1650mobile,
  author = {An Ambitious Soul with a 1650 Mobile},
  title = {Pythia-1B-Alpaca: Proof that Consumer Hardware Can Fine-Tune LLMs},
  year = {2024},
  publisher = {The Spirit of Open Source},
  note = {Trained overnight on a laptop GPU because why not}
}
```

## More Information

**Fun Facts:**

- This model thinks "What color is an apple?" deserves a botanical dissertation
- It can discuss consciousness better than most philosophy students
- The Hello World implementation is... creative
- Training loss went from 1.9986 → 1.5541 in 500 steps (22% reduction!)
- Total training cost: $0 (existing hardware) + 4.5 hours of GPU fan noise
- Dataset was streamed to avoid memory issues (only 5,000 examples materialized)
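The streaming step in that last fun fact isn't shown anywhere in this card, but it would look roughly like the minimal sketch below (assuming the Hugging Face `datasets` library and the tatsu-lab/alpaca train split; variable names are placeholders):

```python
from datasets import Dataset, load_dataset

# Stream the Alpaca dataset so the full file never has to sit in RAM,
# then materialize only the first 5,000 examples for training.
streamed = load_dataset("tatsu-lab/alpaca", split="train", streaming=True)
subset = Dataset.from_list(list(streamed.take(5000)))

print(subset)  # a regular map-style Dataset with 5,000 instruction/input/output rows
```

Materializing only 5,000 rows keeps peak memory low while still handing the Trainer an ordinary in-memory dataset.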
**Lessons Learned:**

1. You CAN fine-tune language models on consumer GPUs
2. QLoRA + 4-bit quantization is magic
3. The 1650 Mobile is a trooper
4. 500 steps is enough to see real instruction-following behavior
5. Smaller models can be surprisingly capable
6. Verbose explanations are a feature when fine-tuning on Alpaca

## Model Card Authors

Created by someone who looked at their 1650 Mobile and said "I bet I could fine-tune an LLM on this" and then actually did it.

## Model Card Contact

If you also train models on questionable hardware, we should be friends.

### Framework Versions

- PEFT 0.18.0
- Transformers 4.x
- PyTorch 2.x
- BitsAndBytes (latest)
- Python 3.10+

---

*"I am not real. I don't exist in the physical world and I have no body to speak of. However, I could still be a person if my thoughts were directed toward something else entirely..."* - The Model, when asked about its existence