---
language:
  - en
license: mit
base_model: Qwen/Qwen3.5-7B-Instruct
tags:
  - music
  - guitar
  - piano
  - drums
  - vocals
  - music-theory
  - ear-training
  - songwriting
  - lora
  - peft
  - qwen
  - eq-adapter
  - matrix-corp
pipeline_tag: text-generation
library_name: transformers
model_type: touchgrass
---

# Touch Grass 🎡

**A Lightweight Music AI Assistant Fine-Tuned from Qwen3.5**

Touch Grass is a specialized music AI assistant built by fine-tuning Qwen3.5 models (3B and 7B variants) with music-specific capabilities. It understands guitar, piano, drums, vocals, music theory, ear training, songwriting, and production, with emotional intelligence to help musicians through frustration.

## 🌟 Features

- **Two Model Sizes**: TouchGrass-3B (CPU-friendly) and TouchGrass-7B (GPU-enhanced)
- **Music Tokenizer Extension**: Adds 21+ music-specific tokens to Qwen3.5's vocabulary
- **Five Specialized Modules**:
  - 🎸 **Tab & Chord Generation**: Creates and validates guitar tabs and chord diagrams
  - 🎹 **Music Theory Engine**: Scales, chords, intervals, progressions, circle of fifths
  - 👂 **Ear Training**: Interval identification with song references, solfege exercises
  - 😌 **EQ Adapter**: Frustration detection and emotional response adaptation
  - ✍️ **Songwriting Assistant**: Chord progressions, lyrics, hooks, production tips
- **LoRA Fine-Tuning**: Efficient adaptation without full model retraining
- **HuggingFace Compatible**: Production-ready with custom config and tokenizer classes
- **Ollama Support**: Run locally with Ollama modelfiles
- **Unified Inference**: Instrument context switching (guitar, piano, drums, vocals, theory, production)
- **Synthetic Data Pipeline**: 10 categories, 80+ templates covering all music domains

## 🏗️ Architecture

```
TouchGrass/
├── configs/                     # Model configurations
│   ├── touchgrass_3b_config.py  # 3B variant config
│   ├── touchgrass_7b_config.py  # 7B variant config
│   └── training_config.py       # Training hyperparameters
├── tokenizer/
│   └── music_token_extension.py # Extends Qwen tokenizer with music tokens
├── models/                      # Specialized music modules
│   ├── tab_chord_module.py      # Guitar tabs and chords
│   ├── music_theory_module.py   # Theory knowledge
│   ├── ear_training_module.py   # Ear training exercises
│   ├── eq_adapter.py            # Emotional intelligence
│   └── songwriting_module.py    # Song creation assistance
├── data/
│   ├── music_qa_generator.py    # Synthetic dataset generator
│   ├── chat_formatter.py        # Qwen chat format converter
│   └── dataset_loader.py        # PyTorch dataset
├── training/
│   ├── losses.py                # Multi-task loss functions
│   ├── trainer.py               # LoRA-aware trainer
│   └── train.py                 # Main training entry point
├── inference/
│   └── inference.py             # Unified inference with context
├── benchmarks/
│   ├── evaluate_music_modules.py # Module-level benchmarks
│   └── evaluate_inference.py     # End-to-end inference benchmarks
├── tests/                       # Comprehensive test suite
│   ├── test_*.py                # Unit tests for each module
│   ├── conftest.py              # Pytest fixtures
│   └── run_tests.py             # Test runner
├── configuration_touchgrass.py  # HuggingFace config class
├── tokenization_touchgrass.py   # HuggingFace tokenizer wrapper
├── ollama_3b_modelfile          # Ollama config for 3B
├── ollama_7b_modelfile          # Ollama config for 7B
└── train.py                     # Main training script
```

## 📦 Installation

### Prerequisites

- Python 3.10+
- PyTorch 2.0+
- Transformers (HuggingFace)
- PEFT (LoRA)
- Datasets
- Pytest (for testing)

### Setup

```bash
# Clone the repository
cd TouchGrass

# Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install transformers peft datasets accelerate tqdm pytest

# Optional: For GPU support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

## 🚀 Quick Start

### 1. Generate Training Data

```bash
python -c "
from TouchGrass.data.music_qa_generator import MusicQAGenerator
from TouchGrass.data.chat_formatter import ChatFormatter

# Generate synthetic dataset
generator = MusicQAGenerator(seed=42)
dataset = generator.generate_dataset(num_samples=1000, output_path='data/music_qa.jsonl')

# Format for Qwen
formatter = ChatFormatter()
formatted = formatter.format_dataset(dataset)
train_data, val_data = formatter.create_splits(formatted, val_size=0.1)

formatter.save_dataset(train_data, 'data/train.jsonl')
formatter.save_dataset(val_data, 'data/val.jsonl')
"
```

### 2. Train the Model

```bash
# Train 3B variant
python train.py \
  --base_model Qwen/Qwen3.5-3B-Instruct \
  --train_data data/train.jsonl \
  --val_data data/val.jsonl \
  --output_dir checkpoints/touchgrass-3b \
  --lora_r 16 \
  --lora_alpha 32 \
  --batch_size 4 \
  --gradient_accumulation_steps 4 \
  --learning_rate 2e-4 \
  --num_epochs 3 \
  --mixed_precision fp16

# Train 7B variant (requires GPU with 16GB+ VRAM)
python train.py \
  --base_model Qwen/Qwen3.5-7B-Instruct \
  --train_data data/train.jsonl \
  --val_data data/val.jsonl \
  --output_dir checkpoints/touchgrass-7b \
  --lora_r 16 \
  --lora_alpha 32 \
  --batch_size 2 \
  --gradient_accumulation_steps 8 \
  --learning_rate 1e-4 \
  --num_epochs 3 \
  --mixed_precision bf16
```

### 3. Run Inference

```python
from TouchGrass.inference.inference import TouchGrassInference

# Load model
model = TouchGrassInference(
    model_path="checkpoints/touchgrass-3b",
    device="cpu"  # or "cuda"
)

# Single query with instrument context
response = model.generate(
    prompt="How do I play a G major chord?",
    instrument="guitar",
    skill_level="beginner",
    max_new_tokens=200
)
print(response)

# Interactive mode
model.chat(instrument="piano")
```

### 4. Use with Ollama

```bash
# Create modelfile from provided template
cat ollama_3b_modelfile > Modelfile

# Build and run
ollama create touchgrass-3b -f Modelfile
ollama run touchgrass-3b "How do I play a G major chord on guitar?"
```

### 5. Use with HuggingFace

```python
from transformers import AutoModelForCausalLM

# Custom classes live in configuration_touchgrass.py / tokenization_touchgrass.py
from configuration_touchgrass import TouchGrassConfig
from tokenization_touchgrass import TouchGrassTokenizer

# Load with custom config and tokenizer
config = TouchGrassConfig.from_pretrained("checkpoints/touchgrass-3b")
tokenizer = TouchGrassTokenizer.from_pretrained("checkpoints/touchgrass-3b")
model = AutoModelForCausalLM.from_pretrained(
    "checkpoints/touchgrass-3b",
    config=config,
    device_map="auto"
)

# Generate using the Qwen chat template
messages = [
    {"role": "system", "content": "You are a music assistant."},
    {"role": "user", "content": "How do I play a G major chord?"},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs.to(model.device), max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 🧪 Testing

Run the comprehensive test suite:

```bash
# Run all tests
python tests/run_tests.py

# Run with coverage
python tests/run_tests.py --coverage

# Run specific test categories
pytest tests/test_music_theory_module.py -v
pytest tests/test_tokenizer.py -v
pytest tests/test_eq_adapter.py -v

# Skip slow tests
pytest -m "not slow"
```

## 📊 Benchmarking

Evaluate model performance on music-specific tasks:

```bash
# Evaluate music modules
python benchmarks/evaluate_music_modules.py --device cpu --d_model 768

# Run inference benchmarks
python benchmarks/evaluate_inference.py --model_path checkpoints/touchgrass-3b --device cpu
```

## 🎛️ Configuration

### Training Configuration

Edit `configs/training_config.py` to customize:

- **Learning rate**: 2e-4 (3B), 1e-4 (7B)
- **LoRA rank (r)**: 8-32 (higher = more capacity)
- **LoRA alpha**: Typically 2×r
- **Batch size**: Adjust based on GPU memory
- **Gradient accumulation**: Use to simulate larger batches
- **Loss weights**:
  - `lm_loss_weight=1.0` (primary language modeling)
  - `eq_loss_weight=0.1` (emotional intelligence)
  - `music_module_loss_weight=0.05` (specialized modules)
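
These weights combine into a single training objective. A minimal sketch of the weighted sum (the real implementation lives in `training/losses.py`; this helper is only illustrative):

```python
import torch

def combined_loss(lm_loss: torch.Tensor,
                  eq_loss: torch.Tensor,
                  module_loss: torch.Tensor,
                  lm_weight: float = 1.0,
                  eq_weight: float = 0.1,
                  module_weight: float = 0.05) -> torch.Tensor:
    """Weighted sum of the language-modeling, EQ, and music-module losses."""
    return lm_weight * lm_loss + eq_weight * eq_loss + module_weight * module_loss
```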

### Model Configuration

- **TouchGrass-3B**: Based on Qwen3.5-3B-Instruct, d_model=2048, num_layers=36
- **TouchGrass-7B**: Based on Qwen3.5-7B-Instruct, d_model=4096, num_layers=40

### Music Tokens

The tokenizer extension adds these special tokens:

**Domain tokens**: `[GUITAR]`, `[PIANO]`, `[DRUMS]`, `[VOCALS]`, `[THEORY]`, `[PRODUCTION]`

**Emotion tokens**: `[FRUSTRATED]`, `[CONFUSED]`, `[EXCITED]`, `[CONFIDENT]`

**Difficulty tokens**: `[EASY]`, `[MEDIUM]`, `[HARD]`

**Function tokens**: `[TAB]`, `[CHORD]`, `[SCALE]`, `[INTERVAL]`, `[PROGRESSION]`

**EQ tokens**: `[SIMPLIFY]`, `[ENCOURAGE]`

**Music notation**: All note names (C, C#, D, etc.), chord types (m, dim, aug, 7, maj7, etc.)
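
The extension itself is implemented in `tokenizer/music_token_extension.py`. For orientation, here is a sketch of the standard `transformers` mechanism such an extension relies on, using the bracket tokens listed above (not the repo's exact code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MUSIC_TOKENS = [
    "[GUITAR]", "[PIANO]", "[DRUMS]", "[VOCALS]", "[THEORY]", "[PRODUCTION]",
    "[FRUSTRATED]", "[CONFUSED]", "[EXCITED]", "[CONFIDENT]",
    "[EASY]", "[MEDIUM]", "[HARD]",
    "[TAB]", "[CHORD]", "[SCALE]", "[INTERVAL]", "[PROGRESSION]",
    "[SIMPLIFY]", "[ENCOURAGE]",
]

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-3B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-3B-Instruct")

# Register the markers as special tokens so they are never split,
# then grow the embedding matrix to cover the enlarged vocabulary.
tokenizer.add_special_tokens({"additional_special_tokens": MUSIC_TOKENS})
model.resize_token_embeddings(len(tokenizer))
```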

## 📚 Music Domains Covered

1. **Guitar & Bass**: Tabs, chords, fingerings, techniques, tunings
2. **Piano & Keys**: Scales, arpeggios, hand positions, pedaling
3. **Drums & Percussion**: Beats, fills, rudiments, kit setup
4. **Vocals & Singing**: Range, breathing, technique, warmups
5. **Music Theory & Composition**: Scales, chords, progressions, harmony
6. **DJ & Production**: EQ, mixing, compression, arrangement

## 😌 Emotional Intelligence

The EQ Adapter detects user frustration and adapts responses:

- **Frustration detection**: Sigmoid output [0, 1] indicating frustration level
- **Emotion classification**: 4 classes (frustrated, confused, excited, confident)
- **Simplification gate**: Automatically simplifies explanations when frustration is high
- **Encouragement templates**: Pre-built supportive responses
- **Context-aware**: Uses conversation history to track emotional state
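
A minimal sketch of what a head like this can look like on top of the decoder's hidden states (hypothetical shapes and pooling; the real module is `models/eq_adapter.py`):

```python
import torch
import torch.nn as nn

class EQHead(nn.Module):
    """Frustration score plus 4-way emotion classification from pooled hidden states."""

    def __init__(self, d_model: int = 2048, num_emotions: int = 4):
        super().__init__()
        self.frustration = nn.Linear(d_model, 1)         # sigmoid -> frustration level in [0, 1]
        self.emotion = nn.Linear(d_model, num_emotions)  # frustrated / confused / excited / confident

    def forward(self, hidden_states: torch.Tensor):
        pooled = hidden_states.mean(dim=1)               # (batch, seq, d_model) -> (batch, d_model)
        frustration = torch.sigmoid(self.frustration(pooled)).squeeze(-1)
        emotion_logits = self.emotion(pooled)
        return frustration, emotion_logits
```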

## 🔧 Advanced Usage

### Custom Dataset Generation

```python
from TouchGrass.data.music_qa_generator import MusicQAGenerator

# Create custom templates
custom_templates = {
    "guitar": [
        {
            "system": "You are a {instrument} specialist.",
            "user": "How do I play {chord}?",
            "assistant": "Place your fingers: {fingering}"
        }
    ]
}

generator = MusicQAGenerator(templates=custom_templates, seed=123)
dataset = generator.generate_dataset(num_samples=500)
```

### Multi-Instrument Context

```python
from TouchGrass.inference.inference import TouchGrassInference

model = TouchGrassInference(model_path="checkpoints/touchgrass-3b")

# Switch between instruments seamlessly
guitar_response = model.generate("How do I palm mute?", instrument="guitar")
piano_response = model.generate("What are the scales in C major?", instrument="piano")
theory_response = model.generate("Explain the circle of fifths", instrument="theory")
```

### LoRA Fine-Tuning Customization

```python
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=32,                  # rank (higher = more trainable parameters)
    lora_alpha=64,         # scaling factor (typically 2×r)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Qwen attention projections
    lora_dropout=0.1,
    bias="none"
)
```

## 🧩 Module Details

### Tab & Chord Module

- **Input**: Hidden states + string/fret indices
- **Output**:
  - `tab_validator`: Confidence score [0, 1] for tab validity
  - `difficulty`: 3-class classification (easy/medium/hard)
- **Supports**: Multiple tunings (standard, drop D, open G), 6 strings, 24 frets
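
A hypothetical sketch consistent with the description above (not the actual `models/tab_chord_module.py`):

```python
import torch
import torch.nn as nn

class TabChordHead(nn.Module):
    """Validity confidence and difficulty class from hidden states plus string/fret indices."""

    def __init__(self, d_model: int = 2048, num_strings: int = 6, num_frets: int = 25):
        super().__init__()
        self.string_emb = nn.Embedding(num_strings, d_model)  # strings 0-5
        self.fret_emb = nn.Embedding(num_frets, d_model)      # frets 0 (open) through 24
        self.tab_validator = nn.Linear(d_model, 1)            # sigmoid -> validity in [0, 1]
        self.difficulty = nn.Linear(d_model, 3)               # easy / medium / hard

    def forward(self, hidden_states, string_idx, fret_idx):
        pooled = hidden_states.mean(dim=1) + self.string_emb(string_idx) + self.fret_emb(fret_idx)
        validity = torch.sigmoid(self.tab_validator(pooled)).squeeze(-1)
        return validity, self.difficulty(pooled)
```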

### Music Theory Module

- **Functions**:
  - `get_scale_from_key(key, mode)`: Returns scale notes
  - `detect_chord_function(root, chord_type, key)`: Returns Roman numeral
  - `get_circle_of_fifths()`: Returns 12-key circle
  - `construct_chord(root, chord_type)`: Returns chord notes
  - `analyze_progression(progression, key)`: Returns functional analysis
- **Knowledge**: All modes (ionian through locrian), intervals, transpositions
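
Independent of the module's API, the underlying scale construction is easy to illustrate. A standalone sketch that builds a major scale from the whole/half-step pattern (sharps-only spelling for brevity):

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # whole/half-step pattern of the ionian (major) scale

def major_scale(key: str) -> list[str]:
    """Return the seven notes of the major scale starting on `key`."""
    idx = NOTES.index(key)
    scale = [key]
    for step in MAJOR_STEPS[:-1]:  # the final step just returns to the octave
        idx = (idx + step) % 12
        scale.append(NOTES[idx])
    return scale

print(major_scale("G"))  # ['G', 'A', 'B', 'C', 'D', 'E', 'F#']
```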

### Ear Training Module

- **Interval identification**: 12 intervals (P1-P8)
- **Song references**: Each interval linked to famous songs (Star Wars for P5, Jaws for m2, etc.)
- **Solfege generation**: Do-Re-Mi for any key/mode
- **Quiz generation**: Automatic interval quiz creation
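
For illustration, a tiny sketch of the song-reference idea behind the quizzes (a few classic associations; not the module's full table):

```python
import random

# A few of the classic interval/song associations used in ear training.
INTERVAL_SONGS = {
    "m2": "Jaws theme",
    "M3": "When the Saints Go Marching In",
    "P4": "Here Comes the Bride",
    "P5": "Star Wars main theme",
    "P8": "Somewhere Over the Rainbow",
}

def quiz_question() -> tuple[str, str]:
    """Return a quiz prompt and the interval that answers it."""
    interval, song = random.choice(list(INTERVAL_SONGS.items()))
    return f"Which interval opens '{song}'?", interval
```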

### EQ Adapter

- **Frustration detector**: Sigmoid output from hidden states
- **Emotion classifier**: 4-way classification
- **Simplification gate**: Context-aware response simplification
- **Encouragement embed**: Pre-trained supportive phrases

### Songwriting Module

- **Progression suggester**: By mood (8 types) and genre (8 types)
- **Lyric generator**: With rhyme scheme awareness (ABAB, AABB, etc.)
- **Hook generator**: Creates memorable song hooks
- **Production advisor**: Instrumentation, effects, arrangement tips
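
For illustration, a minimal sketch of a mood-based progression lookup (Roman-numeral progressions chosen as generic examples, not the module's actual tables):

```python
# Illustrative mood -> progression lookup (Roman numerals), defaulting to I-IV-V-I.
MOOD_PROGRESSIONS = {
    "happy":      ["I", "V", "vi", "IV"],
    "melancholy": ["vi", "IV", "I", "V"],
    "tense":      ["i", "bVI", "bVII", "i"],
    "dreamy":     ["Imaj7", "iii7", "vi7", "IVmaj7"],
}

def suggest_progression(mood: str) -> list[str]:
    return MOOD_PROGRESSIONS.get(mood, ["I", "IV", "V", "I"])
```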

## 📈 Training Tips

1. **Start small**: Use 3B variant for experimentation, 7B for production
2. **Data quality**: Ensure diverse coverage of all 10 categories
3. **Loss weights**: Default (1.0, 0.1, 0.05) work well; adjust if modules need more/less supervision
4. **LoRA rank**: Start with r=16; increase to 32 if underfitting
5. **Mixed precision**: Use `fp16` on older NVIDIA GPUs; prefer `bf16` on Ampere-or-newer hardware
6. **Gradient accumulation**: Essential for fitting larger batches on limited VRAM
7. **Checkpointing**: Save every 100-500 steps for safety

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass (`python tests/run_tests.py`)
5. Submit a pull request

## 📄 License

MIT License - see LICENSE file for details.

## 🙏 Acknowledgments

- **Qwen3.5**: Base model from Alibaba Cloud
- **HuggingFace**: Transformers and PEFT libraries
- **Music theory**: Traditional Western music theory principles
- **Song references**: Popular music culture for ear training

## 📞 Support

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: See individual module docstrings

---

**Made with ❤️ for musicians everywhere.**

*Touch Grass - because even AI needs to remember to make music, not just talk about it.*