Update README.md

Browse files

Files changed (1) hide show

README.md +440 -416

README.md CHANGED Viewed

@@ -1,416 +1,440 @@
-# Touch Grass 🎵
-**A Lightweight Music AI Assistant Fine-Tuned from Qwen3.5**
-Touch Grass is a specialized music AI assistant built by fine-tuning Qwen3.5 models (3B and 7B variants) with music-specific capabilities. It understands guitar, piano, drums, vocals, music theory, ear training, songwriting, and production—with emotional intelligence to help musicians through frustration.
-## 🌟 Features
-- **Two Model Sizes**: TouchGrass-3B (CPU-friendly) and TouchGrass-7B (GPU-enhanced)
-- **Music Tokenizer Extension**: Adds 21+ music-specific tokens to Qwen3.5's vocabulary
-- **Five Specialized Modules**:
-  - 🎸 **Tab & Chord Generation**: Creates and validates guitar tabs, chord diagrams
-  - 🎹 **Music Theory Engine**: Scales, chords, intervals, progressions, circle of fifths
-  - 👂 **Ear Training**: Interval identification with song references, solfege exercises
-  - 😌 **EQ Adapter**: Frustration detection and emotional response adaptation
-  - ✍️ **Song Writing Assistant**: Chord progressions, lyrics, hooks, production tips
-- **LoRA Fine-Tuning**: Efficient adaptation without full model retraining
-- **HuggingFace Compatible**: Production-ready with custom config and tokenizer classes
-- **Ollama Support**: Run locally with Ollama modelfiles
-- **Unified Inference**: Instrument context switching (guitar, piano, drums, vocals, theory, production)
-- **Synthetic Data Pipeline**: 10 categories, 80+ templates covering all music domains
-## 🏗️ Architecture
-```
-TouchGrass/
-├── configs/                    # Model configurations
-│   ├── touchgrass_3b_config.py # 3B variant config
-│   ├── touchgrass_7b_config.py # 7B variant config
-│   └── training_config.py      # Training hyperparameters
-├── tokenizer/
-│   └── music_token_extension.py # Extends Qwen tokenizer with music tokens
-├── models/                     # Specialized music modules
-│   ├── tab_chord_module.py     # Guitar tabs and chords
-│   ├── music_theory_module.py  # Theory knowledge
-│   ├── ear_training_module.py  # Ear training exercises
-│   ├── eq_adapter.py           # Emotional intelligence
-│   └── songwriting_module.py   # Song creation assistance
-├── data/
-│   ├── music_qa_generator.py   # Synthetic dataset generator
-│   ├── chat_formatter.py       # Qwen chat format converter
-│   └── dataset_loader.py       # PyTorch dataset
-├── training/
-│   ├── losses.py              # Multi-task loss functions
-│   ├── trainer.py             # LoRA-aware trainer
-│   └── train.py               # Main training entry point
-├── inference/
-│   └── inference.py           # Unified inference with context
-├── benchmarks/
-│   ├── evaluate_music_modules.py  # Module-level benchmarks
-│   └── evaluate_inference.py      # End-to-end inference benchmarks
-├── tests/                     # Comprehensive test suite
-│   ├── test_*.py             # Unit tests for each module
-│   ├── conftest.py           # Pytest fixtures
-│   └── run_tests.py          # Test runner
-├── configuration_touchgrass.py  # HuggingFace config class
-├── tokenization_touchgrass.py   # HuggingFace tokenizer wrapper
-├── ollama_3b_modelfile         # Ollama config for 3B
-├── ollama_7b_modelfile         # Ollama config for 7B
-└── train.py                    # Main training script
-```
-## 📦 Installation
-### Prerequisites
-- Python 3.10+
-- PyTorch 2.0+
-- Transformers (HuggingFace)
-- PEFT (LoRA)
-- Datasets
-- Pytest (for testing)
-### Setup
-```bash
-# Clone the repository
-cd TouchGrass
-# Install dependencies
-pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
-pip install transformers peft datasets accelerate tqdm pytest
-# Optional: For GPU support
-pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
-```
-## 🚀 Quick Start
-### 1. Generate Training Data
-```bash
-python -c "
-from TouchGrass.data.music_qa_generator import MusicQAGenerator
-from TouchGrass.data.chat_formatter import ChatFormatter
-# Generate synthetic dataset
-generator = MusicQAGenerator(seed=42)
-dataset = generator.generate_dataset(num_samples=1000, output_path='data/music_qa.jsonl')
-# Format for Qwen
-formatter = ChatFormatter()
-formatted = formatter.format_dataset(dataset)
-train_data, val_data = formatter.create_splits(formatted, val_size=0.1)
-formatter.save_dataset(train_data, 'data/train.jsonl')
-formatter.save_dataset(val_data, 'data/val.jsonl')
-"
-```
-### 2. Train the Model
-```bash
-# Train 3B variant
-python train.py \
-  --base_model Qwen/Qwen3.5-3B-Instruct \
-  --train_data data/train.jsonl \
-  --val_data data/val.jsonl \
-  --output_dir checkpoints/touchgrass-3b \
-  --lora_r 16 \
-  --lora_alpha 32 \
-  --batch_size 4 \
-  --gradient_accumulation_steps 4 \
-  --learning_rate 2e-4 \
-  --num_epochs 3 \
-  --mixed_precision fp16
-# Train 7B variant (requires GPU with 16GB+ VRAM)
-python train.py \
-  --base_model Qwen/Qwen3.5-7B-Instruct \
-  --train_data data/train.jsonl \
-  --val_data data/val.jsonl \
-  --output_dir checkpoints/touchgrass-7b \
-  --lora_r 16 \
-  --lora_alpha 32 \
-  --batch_size 2 \
-  --gradient_accumulation_steps 8 \
-  --learning_rate 1e-4 \
-  --num_epochs 3 \
-  --mixed_precision bf16
-```
-### 3. Run Inference
-```python
-from TouchGrass.inference.inference import TouchGrassInference
-# Load model
-model = TouchGrassInference(
-    model_path="checkpoints/touchgrass-3b",
-    device="cpu"  # or "cuda"
-)
-# Single query with instrument context
-response = model.generate(
-    prompt="How do I play a G major chord?",
-    instrument="guitar",
-    skill_level="beginner",
-    max_new_tokens=200
-)
-print(response)
-# Interactive mode
-model.chat(instrument="piano")
-```
-### 4. Use with Ollama
-```bash
-# Create modelfile from provided template
-cat ollama_3b_modelfile > Modelfile
-# Build and run
-ollama create touchgrass-3b -f Modelfile
-ollama run touchgrass-3b "How do I play a G major chord on guitar?"
-```
-### 5. Use with HuggingFace
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-# Load with custom config and tokenizer
-config = TouchGrassConfig.from_pretrained("checkpoints/touchgrass-3b")
-tokenizer = TouchGrassTokenizer.from_pretrained("checkpoints/touchgrass-3b")
-model = AutoModelForCausalLM.from_pretrained(
-    "checkpoints/touchgrass-3b",
-    config=config,
-    device_map="auto"
-)
-# Generate
-inputs = tokenizer("system\nYou are a music assistant.\nuser\nHow do I play a G major chord?\nassistant\n", return_tensors="pt")
-outputs = model.generate(**inputs, max_new_tokens=200)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-```
-## 🧪 Testing
-Run the comprehensive test suite:
-```bash
-# Run all tests
-python tests/run_tests.py
-# Run with coverage
-python tests/run_tests.py --coverage
-# Run specific test categories
-pytest tests/test_music_theory_module.py -v
-pytest tests/test_tokenizer.py -v
-pytest tests/test_eq_adapter.py -v
-# Skip slow tests
-pytest -m "not slow"
-```
-## 📊 Benchmarking
-Evaluate model performance on music-specific tasks:
-```bash
-# Evaluate music modules
-python benchmarks/evaluate_music_modules.py --device cpu --d_model 768
-# Run inference benchmarks
-python benchmarks/evaluate_inference.py --model_path checkpoints/touchgrass-3b --device cpu
-```
-## 🎛️ Configuration
-### Training Configuration
-Edit `configs/training_config.py` to customize:
-- **Learning rate**: 2e-4 (3B), 1e-4 (7B)
-- **LoRA rank (r)**: 8-32 (higher = more capacity)
-- **LoRA alpha**: Typically 2×r
-- **Batch size**: Adjust based on GPU memory
-- **Gradient accumulation**: Use to simulate larger batches
-- **Loss weights**:
-  - `lm_loss_weight=1.0` (primary language modeling)
-  - `eq_loss_weight=0.1` (emotional intelligence)
-  - `music_module_loss_weight=0.05` (specialized modules)
-### Model Configuration
-- **TouchGrass-3B**: Based on Qwen3.5-3B-Instruct, d_model=2048, num_layers=36
-- **TouchGrass-7B**: Based on Qwen3.5-7B-Instruct, d_model=4096, num_layers=40
-### Music Tokens
-The tokenizer extension adds these special tokens:
-**Domain tokens**: `[GUITAR]`, `[PIANO]`, `[DRUMS]`, `[VOCALS]`, `[THEORY]`, `[PRODUCTION]`
-**Emotion tokens**: `[FRUSTRATED]`, `[CONFUSED]`, `[EXCITED]`, `[CONFIDENT]`
-**Difficulty tokens**: `[EASY]`, `[MEDIUM]`, `[HARD]`
-**Function tokens**: `[TAB]`, `[CHORD]`, `[SCALE]`, `[INTERVAL]`, `[PROGRESSION]`
-**EQ tokens**: `[SIMPLIFY]`, `[ENCOURAGE]`
-**Music notation**: All note names (C, C#, D, etc.), chord types (m, dim, aug, 7, maj7, etc.)
-## 📚 Music Domains Covered
-1. **Guitar & Bass**: Tabs, chords, fingerings, techniques, tunings
-2. **Piano & Keys**: Scales, arpeggios, hand positions, pedaling
-3. **Drums & Percussion**: Beats, fills, rudiments, kit setup
-4. **Vocals & Singing**: Range, breathing, technique, warmups
-5. **Music Theory & Composition**: Scales, chords, progressions, harmony
-6. **DJ & Production**: EQ, mixing, compression, arrangement
-## 😌 Emotional Intelligence
-The EQ Adapter detects user frustration and adapts responses:
-- **Frustration detection**: Sigmoid output [0, 1] indicating frustration level
-- **Emotion classification**: 4 classes (frustrated, confused, excited, confident)
-- **Simplification gate**: Automatically simplifies explanations when frustration is high
-- **Encouragement templates**: Pre-built supportive responses
-- **Context-aware**: Uses conversation history to track emotional state
-## 🔧 Advanced Usage
-### Custom Dataset Generation
-```python
-from TouchGrass.data.music_qa_generator import MusicQAGenerator
-# Create custom templates
-custom_templates = {
-    "guitar": [
-        {
-            "system": "You are a {instrument} specialist.",
-            "user": "How do I play {chord}?",
-            "assistant": "Place your fingers: {fingering}"
-        }
-    ]
-}
-generator = MusicQAGenerator(templates=custom_templates, seed=123)
-dataset = generator.generate_dataset(num_samples=500)
-```
-### Multi-Instrument Context
-```python
-from TouchGrass.inference.inference import TouchGrassInference
-model = TouchGrassInference(model_path="checkpoints/touchgrass-3b")
-# Switch between instruments seamlessly
-guitar_response = model.generate("How do I palm mute?", instrument="guitar")
-piano_response = model.generate("What are the scales in C major?", instrument="piano")
-theory_response = model.generate("Explain the circle of fifths", instrument="theory")
-```
-### LoRA Fine-Tuning Customization
-```python
-from transformers import LoraConfig
-lora_config = LoraConfig(
-    task_type=TaskType.CAUSAL_LM,
-    r=32,  # Rank (higher = more parameters)
-    lora_alpha=64,  # Alpha (typically 2×r)
-    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Qwen attention modules
-    lora_dropout=0.1,
-    bias="none"
-)
-```
-## 🧩 Module Details
-### Tab & Chord Module
-- **Input**: Hidden states + string/fret indices
-- **Output**:
-  - `tab_validator`: Confidence score [0, 1] for tab validity
-  - `difficulty`: 3-class classification (easy/medium/hard)
-- **Supports**: Multiple tunings (standard, drop D, open G), 6 strings, 24 frets
-### Music Theory Module
-- **Functions**:
-  - `get_scale_from_key(key, mode)`: Returns scale notes
-  - `detect_chord_function(root, chord_type, key)`: Returns Roman numeral
-  - `get_circle_of_fifths()`: Returns 12-key circle
-  - `construct_chord(root, chord_type)`: Returns chord notes
-  - `analyze_progression(progression, key)`: Returns functional analysis
-- **Knowledge**: All modes (ionian through locrian), intervals, transpositions
-### Ear Training Module
-- **Interval identification**: 12 intervals (P1-P8)
-- **Song references**: Each interval linked to famous songs (Star Wars for P5, Jaws for m2, etc.)
-- **Solfege generation**: Do-Re-Mi for any key/mode
-- **Quiz generation**: Automatic interval quiz creation
-### EQ Adapter
-- **Frustration detector**: Sigmoid output from hidden states
-- **Emotion classifier**: 4-way classification
-- **Simplification gate**: Context-aware response simplification
-- **Encouragement embed**: Pre-trained supportive phrases
-### Songwriting Module
-- **Progression suggester**: By mood (8 types) and genre (8 types)
-- **Lyric generator**: With rhyme scheme awareness (ABAB, AABB, etc.)
-- **Hook generator**: Creates memorable song hooks
-- **Production advisor**: Instrumentation, effects, arrangement tips
-## 📈 Training Tips
-1. **Start small**: Use 3B variant for experimentation, 7B for production
-2. **Data quality**: Ensure diverse coverage of all 10 categories
-3. **Loss weights**: Default (1.0, 0.1, 0.05) work well; adjust if modules need more/less supervision
-4. **LoRA rank**: Start with r=16; increase to 32 if underfitting
-5. **Mixed precision**: Use `fp16` for NVIDIA, `bf16` for newer GPUs
-6. **Gradient accumulation**: Essential for fitting larger batches on limited VRAM
-7. **Checkpointing**: Save every 100-500 steps for safety
-## 🤝 Contributing
-1. Fork the repository
-2. Create a feature branch
-3. Add tests for new functionality
-4. Ensure all tests pass (`python tests/run_tests.py`)
-5. Submit a pull request
-## 📄 License
-MIT License - see LICENSE file for details.
-## 🙏 Acknowledgments
-- **Qwen3.5**: Base model from Alibaba Cloud
-- **HuggingFace**: Transformers and PEFT libraries
-- **Music theory**: Traditional Western music theory principles
-- **Song references**: Popular music culture for ear training
-## 📞 Support
-- Issues: GitHub Issues
-- Discussions: GitHub Discussions
-- Documentation: See individual module docstrings
----
-**Made with ❤️ for musicians everywhere.**
-*Touch Grass - because even AI needs to remember to make music, not just talk about it.*

+---
+language:
+  - en
+license: mit
+base_model: Qwen/Qwen3.5-7B-Instruct
+tags:
+  - music
+  - guitar
+  - piano
+  - drums
+  - vocals
+  - music-theory
+  - ear-training
+  - songwriting
+  - lora
+  - peft
+  - qwen
+  - eq-adapter
+  - matrix-corp
+pipeline_tag: text-generation
+library_name: transformers
+model_type: touchgrass
+---
+# Touch Grass 🎵
+**A Lightweight Music AI Assistant Fine-Tuned from Qwen3.5**
+Touch Grass is a specialized music AI assistant built by fine-tuning Qwen3.5 models (3B and 7B variants) with music-specific capabilities. It understands guitar, piano, drums, vocals, music theory, ear training, songwriting, and production—with emotional intelligence to help musicians through frustration.
+## 🌟 Features
+- **Two Model Sizes**: TouchGrass-3B (CPU-friendly) and TouchGrass-7B (GPU-enhanced)
+- **Music Tokenizer Extension**: Adds 21+ music-specific tokens to Qwen3.5's vocabulary
+- **Five Specialized Modules**:
+  - 🎸 **Tab & Chord Generation**: Creates and validates guitar tabs, chord diagrams
+  - 🎹 **Music Theory Engine**: Scales, chords, intervals, progressions, circle of fifths
+  - 👂 **Ear Training**: Interval identification with song references, solfege exercises
+  - 😌 **EQ Adapter**: Frustration detection and emotional response adaptation
+  - ✍️ **Song Writing Assistant**: Chord progressions, lyrics, hooks, production tips
+- **LoRA Fine-Tuning**: Efficient adaptation without full model retraining
+- **HuggingFace Compatible**: Production-ready with custom config and tokenizer classes
+- **Ollama Support**: Run locally with Ollama modelfiles
+- **Unified Inference**: Instrument context switching (guitar, piano, drums, vocals, theory, production)
+- **Synthetic Data Pipeline**: 10 categories, 80+ templates covering all music domains
+## 🏗️ Architecture
+```
+TouchGrass/
+├── configs/                    # Model configurations
+│   ├── touchgrass_3b_config.py # 3B variant config
+│   ├── touchgrass_7b_config.py # 7B variant config
+│   └── training_config.py      # Training hyperparameters
+├── tokenizer/
+│   └── music_token_extension.py # Extends Qwen tokenizer with music tokens
+├── models/                     # Specialized music modules
+│   ├── tab_chord_module.py     # Guitar tabs and chords
+│   ├── music_theory_module.py  # Theory knowledge
+│   ├── ear_training_module.py  # Ear training exercises
+│   ├── eq_adapter.py           # Emotional intelligence
+│   └── songwriting_module.py   # Song creation assistance
+├── data/
+│   ├── music_qa_generator.py   # Synthetic dataset generator
+│   ├── chat_formatter.py       # Qwen chat format converter
+│   └── dataset_loader.py       # PyTorch dataset
+├── training/
+│   ├── losses.py              # Multi-task loss functions
+│   ├── trainer.py             # LoRA-aware trainer
+│   └── train.py               # Main training entry point
+├── inference/
+│   └── inference.py           # Unified inference with context
+├── benchmarks/
+│   ├── evaluate_music_modules.py  # Module-level benchmarks
+│   └── evaluate_inference.py      # End-to-end inference benchmarks
+├── tests/                     # Comprehensive test suite
+│   ├── test_*.py             # Unit tests for each module
+│   ├── conftest.py           # Pytest fixtures
+│   └── run_tests.py          # Test runner
+├── configuration_touchgrass.py  # HuggingFace config class
+├── tokenization_touchgrass.py   # HuggingFace tokenizer wrapper
+├── ollama_3b_modelfile         # Ollama config for 3B
+├── ollama_7b_modelfile         # Ollama config for 7B
+└── train.py                    # Main training script
+```
+## 📦 Installation
+### Prerequisites
+- Python 3.10+
+- PyTorch 2.0+
+- Transformers (HuggingFace)
+- PEFT (LoRA)
+- Datasets
+- Pytest (for testing)
+### Setup
+```bash
+# Clone the repository
+cd TouchGrass
+# Install dependencies
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+pip install transformers peft datasets accelerate tqdm pytest
+# Optional: For GPU support
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+```
+## 🚀 Quick Start
+### 1. Generate Training Data
+```bash
+python -c "
+from TouchGrass.data.music_qa_generator import MusicQAGenerator
+from TouchGrass.data.chat_formatter import ChatFormatter
+# Generate synthetic dataset
+generator = MusicQAGenerator(seed=42)
+dataset = generator.generate_dataset(num_samples=1000, output_path='data/music_qa.jsonl')
+# Format for Qwen
+formatter = ChatFormatter()
+formatted = formatter.format_dataset(dataset)
+train_data, val_data = formatter.create_splits(formatted, val_size=0.1)
+formatter.save_dataset(train_data, 'data/train.jsonl')
+formatter.save_dataset(val_data, 'data/val.jsonl')
+"
+```
+### 2. Train the Model
+```bash
+# Train 3B variant
+python train.py \
+  --base_model Qwen/Qwen3.5-3B-Instruct \
+  --train_data data/train.jsonl \
+  --val_data data/val.jsonl \
+  --output_dir checkpoints/touchgrass-3b \
+  --lora_r 16 \
+  --lora_alpha 32 \
+  --batch_size 4 \
+  --gradient_accumulation_steps 4 \
+  --learning_rate 2e-4 \
+  --num_epochs 3 \
+  --mixed_precision fp16
+# Train 7B variant (requires GPU with 16GB+ VRAM)
+python train.py \
+  --base_model Qwen/Qwen3.5-7B-Instruct \
+  --train_data data/train.jsonl \
+  --val_data data/val.jsonl \
+  --output_dir checkpoints/touchgrass-7b \
+  --lora_r 16 \
+  --lora_alpha 32 \
+  --batch_size 2 \
+  --gradient_accumulation_steps 8 \
+  --learning_rate 1e-4 \
+  --num_epochs 3 \
+  --mixed_precision bf16
+```
+### 3. Run Inference
+```python
+from TouchGrass.inference.inference import TouchGrassInference
+# Load model
+model = TouchGrassInference(
+    model_path="checkpoints/touchgrass-3b",
+    device="cpu"  # or "cuda"
+)
+# Single query with instrument context
+response = model.generate(
+    prompt="How do I play a G major chord?",
+    instrument="guitar",
+    skill_level="beginner",
+    max_new_tokens=200
+)
+print(response)
+# Interactive mode
+model.chat(instrument="piano")
+```
+### 4. Use with Ollama
+```bash
+# Create modelfile from provided template
+cat ollama_3b_modelfile > Modelfile
+# Build and run
+ollama create touchgrass-3b -f Modelfile
+ollama run touchgrass-3b "How do I play a G major chord on guitar?"
+```
+### 5. Use with HuggingFace
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Load with custom config and tokenizer
+config = TouchGrassConfig.from_pretrained("checkpoints/touchgrass-3b")
+tokenizer = TouchGrassTokenizer.from_pretrained("checkpoints/touchgrass-3b")
+model = AutoModelForCausalLM.from_pretrained(
+    "checkpoints/touchgrass-3b",
+    config=config,
+    device_map="auto"
+)
+# Generate
+inputs = tokenizer("system\nYou are a music assistant.\nuser\nHow do I play a G major chord?\nassistant\n", return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=200)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+## 🧪 Testing
+Run the comprehensive test suite:
+```bash
+# Run all tests
+python tests/run_tests.py
+# Run with coverage
+python tests/run_tests.py --coverage
+# Run specific test categories
+pytest tests/test_music_theory_module.py -v
+pytest tests/test_tokenizer.py -v
+pytest tests/test_eq_adapter.py -v
+# Skip slow tests
+pytest -m "not slow"
+```
+## 📊 Benchmarking
+Evaluate model performance on music-specific tasks:
+```bash
+# Evaluate music modules
+python benchmarks/evaluate_music_modules.py --device cpu --d_model 768
+# Run inference benchmarks
+python benchmarks/evaluate_inference.py --model_path checkpoints/touchgrass-3b --device cpu
+```
+## 🎛️ Configuration
+### Training Configuration
+Edit `configs/training_config.py` to customize:
+- **Learning rate**: 2e-4 (3B), 1e-4 (7B)
+- **LoRA rank (r)**: 8-32 (higher = more capacity)
+- **LoRA alpha**: Typically 2×r
+- **Batch size**: Adjust based on GPU memory
+- **Gradient accumulation**: Use to simulate larger batches
+- **Loss weights**:
+  - `lm_loss_weight=1.0` (primary language modeling)
+  - `eq_loss_weight=0.1` (emotional intelligence)
+  - `music_module_loss_weight=0.05` (specialized modules)
+### Model Configuration
+- **TouchGrass-3B**: Based on Qwen3.5-3B-Instruct, d_model=2048, num_layers=36
+- **TouchGrass-7B**: Based on Qwen3.5-7B-Instruct, d_model=4096, num_layers=40
+### Music Tokens
+The tokenizer extension adds these special tokens:
+**Domain tokens**: `[GUITAR]`, `[PIANO]`, `[DRUMS]`, `[VOCALS]`, `[THEORY]`, `[PRODUCTION]`
+**Emotion tokens**: `[FRUSTRATED]`, `[CONFUSED]`, `[EXCITED]`, `[CONFIDENT]`
+**Difficulty tokens**: `[EASY]`, `[MEDIUM]`, `[HARD]`
+**Function tokens**: `[TAB]`, `[CHORD]`, `[SCALE]`, `[INTERVAL]`, `[PROGRESSION]`
+**EQ tokens**: `[SIMPLIFY]`, `[ENCOURAGE]`
+**Music notation**: All note names (C, C#, D, etc.), chord types (m, dim, aug, 7, maj7, etc.)
+## 📚 Music Domains Covered
+1. **Guitar & Bass**: Tabs, chords, fingerings, techniques, tunings
+2. **Piano & Keys**: Scales, arpeggios, hand positions, pedaling
+3. **Drums & Percussion**: Beats, fills, rudiments, kit setup
+4. **Vocals & Singing**: Range, breathing, technique, warmups
+5. **Music Theory & Composition**: Scales, chords, progressions, harmony
+6. **DJ & Production**: EQ, mixing, compression, arrangement
+## 😌 Emotional Intelligence
+The EQ Adapter detects user frustration and adapts responses:
+- **Frustration detection**: Sigmoid output [0, 1] indicating frustration level
+- **Emotion classification**: 4 classes (frustrated, confused, excited, confident)
+- **Simplification gate**: Automatically simplifies explanations when frustration is high
+- **Encouragement templates**: Pre-built supportive responses
+- **Context-aware**: Uses conversation history to track emotional state
+## 🔧 Advanced Usage
+### Custom Dataset Generation
+```python
+from TouchGrass.data.music_qa_generator import MusicQAGenerator
+# Create custom templates
+custom_templates = {
+    "guitar": [
+        {
+            "system": "You are a {instrument} specialist.",
+            "user": "How do I play {chord}?",
+            "assistant": "Place your fingers: {fingering}"
+        }
+    ]
+}
+generator = MusicQAGenerator(templates=custom_templates, seed=123)
+dataset = generator.generate_dataset(num_samples=500)
+```
+### Multi-Instrument Context
+```python
+from TouchGrass.inference.inference import TouchGrassInference
+model = TouchGrassInference(model_path="checkpoints/touchgrass-3b")
+# Switch between instruments seamlessly
+guitar_response = model.generate("How do I palm mute?", instrument="guitar")
+piano_response = model.generate("What are the scales in C major?", instrument="piano")
+theory_response = model.generate("Explain the circle of fifths", instrument="theory")
+```
+### LoRA Fine-Tuning Customization
+```python
+from transformers import LoraConfig
+lora_config = LoraConfig(
+    task_type=TaskType.CAUSAL_LM,
+    r=32,  # Rank (higher = more parameters)
+    lora_alpha=64,  # Alpha (typically 2×r)
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Qwen attention modules
+    lora_dropout=0.1,
+    bias="none"
+)
+```
+## 🧩 Module Details
+### Tab & Chord Module
+- **Input**: Hidden states + string/fret indices
+- **Output**:
+  - `tab_validator`: Confidence score [0, 1] for tab validity
+  - `difficulty`: 3-class classification (easy/medium/hard)
+- **Supports**: Multiple tunings (standard, drop D, open G), 6 strings, 24 frets
+### Music Theory Module
+- **Functions**:
+  - `get_scale_from_key(key, mode)`: Returns scale notes
+  - `detect_chord_function(root, chord_type, key)`: Returns Roman numeral
+  - `get_circle_of_fifths()`: Returns 12-key circle
+  - `construct_chord(root, chord_type)`: Returns chord notes
+  - `analyze_progression(progression, key)`: Returns functional analysis
+- **Knowledge**: All modes (ionian through locrian), intervals, transpositions
+### Ear Training Module
+- **Interval identification**: 12 intervals (P1-P8)
+- **Song references**: Each interval linked to famous songs (Star Wars for P5, Jaws for m2, etc.)
+- **Solfege generation**: Do-Re-Mi for any key/mode
+- **Quiz generation**: Automatic interval quiz creation
+### EQ Adapter
+- **Frustration detector**: Sigmoid output from hidden states
+- **Emotion classifier**: 4-way classification
+- **Simplification gate**: Context-aware response simplification
+- **Encouragement embed**: Pre-trained supportive phrases
+### Songwriting Module
+- **Progression suggester**: By mood (8 types) and genre (8 types)
+- **Lyric generator**: With rhyme scheme awareness (ABAB, AABB, etc.)
+- **Hook generator**: Creates memorable song hooks
+- **Production advisor**: Instrumentation, effects, arrangement tips
+## 📈 Training Tips
+1. **Start small**: Use 3B variant for experimentation, 7B for production
+2. **Data quality**: Ensure diverse coverage of all 10 categories
+3. **Loss weights**: Default (1.0, 0.1, 0.05) work well; adjust if modules need more/less supervision
+4. **LoRA rank**: Start with r=16; increase to 32 if underfitting
+5. **Mixed precision**: Use `fp16` for NVIDIA, `bf16` for newer GPUs
+6. **Gradient accumulation**: Essential for fitting larger batches on limited VRAM
+7. **Checkpointing**: Save every 100-500 steps for safety
+## 🤝 Contributing
+1. Fork the repository
+2. Create a feature branch
+3. Add tests for new functionality
+4. Ensure all tests pass (`python tests/run_tests.py`)
+5. Submit a pull request
+## 📄 License
+MIT License - see LICENSE file for details.
+## 🙏 Acknowledgments
+- **Qwen3.5**: Base model from Alibaba Cloud
+- **HuggingFace**: Transformers and PEFT libraries
+- **Music theory**: Traditional Western music theory principles
+- **Song references**: Popular music culture for ear training
+## 📞 Support
+- Issues: GitHub Issues
+- Discussions: GitHub Discussions
+- Documentation: See individual module docstrings
+---
+**Made with ❤️ for musicians everywhere.**
+*Touch Grass - because even AI needs to remember to make music, not just talk about it.*