---
license: apache-2.0
tags:
- music
- text-generation
- instruction-tuning
- lora
- preview
- untrained
- qwen3.5
- touchgrass
datasets:
- synthetic
language:
- en
library_name: transformers
pipeline_tag: text-generation
---
# TouchGrass-7B 🎵
**Status: PREVIEW - UNTRAINED MODEL**
This is a **preview repository** for TouchGrass-7B, a music AI assistant intended to be fine-tuned from Qwen3.5-7B-Instruct. **This model has NOT been trained yet**: it contains randomly initialized LoRA adapters and is not ready for inference.
## ⚠️ Important Notice
- **Model is UNTRAINED**: The LoRA adapters are randomly initialized. Performance will be no better than the base Qwen3.5-7B-Instruct model.
- **For demonstration purposes only**: This repository contains the complete codebase and configuration for training the model.
- **Projected performance after training**: 96-97% accuracy on music-specific tasks (an estimate based on the architecture design and synthetic data pipeline, not a measured result).
## 🎯 Model Overview
TouchGrass is a specialized music AI assistant built by fine-tuning Qwen3.5 models with:
- **Music Tokenizer Extension**: 21+ music-specific tokens (guitar, piano, drums, vocals, theory, DJ, tablature, chords, etc.)
- **Five Specialized Modules**:
- 🎸 Tab & Chord Generation (guitar tabs, chord diagrams)
- 🎹 Music Theory Engine (scales, intervals, progressions)
- 👂 Ear Training (interval ID, solfege exercises)
- 😌 EQ Adapter (frustration detection, emotional adaptation)
- ✍️ Song Writing Assistant (progressions, lyrics, hooks)
- **LoRA Fine-Tuning**: Efficient parameter-efficient fine-tuning
- **Multi-Task Learning**: Weighted losses (LM: 1.0, EQ: 0.1, Music: 0.05)
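The weighted multi-task objective above can be sketched in a few lines. The weights come from this card; the function name and inputs are illustrative, not the repo's actual `training/losses.py` API:

```python
# Weighted multi-task loss as described above (LM: 1.0, EQ: 0.1, Music: 0.05).
# Hypothetical helper for illustration only.
LOSS_WEIGHTS = {"lm": 1.0, "eq": 0.1, "music": 0.05}

def combined_loss(task_losses: dict) -> float:
    """Return the weighted sum of per-task scalar losses."""
    return sum(LOSS_WEIGHTS[task] * value for task, value in task_losses.items())

# The LM loss dominates; the EQ and music heads contribute lightly.
total = combined_loss({"lm": 2.0, "eq": 1.0, "music": 1.0})
print(round(total, 2))  # 2.15
```

Down-weighting the auxiliary heads this way keeps the language-modeling objective in charge while the specialized heads nudge the shared representation.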
## 📊 Model Details
| Property | Value |
|----------|-------|
| Base Model | Qwen/Qwen3.5-7B-Instruct |
| Model Size | ~7.5B parameters (with LoRA) |
| Vocab Size | 32,000 (Qwen3.5 + music tokens) |
| Max Sequence Length | 4,096 tokens |
| LoRA Rank | 16 (configurable) |
| Training Data | Synthetic music QA (10 categories, 80+ templates) |
| Training Steps | 50,000 (planned) |
| Batch Size | 8-16 (depending on GPU) |
| Learning Rate | 2e-4 (with warmup) |
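As a rough illustration of what the rank-16 LoRA row in the table means, here is a minimal NumPy sketch of a low-rank weight update (the matrix dimensions and the `alpha` value are made up for the example; the repo's training code is the authoritative implementation):

```python
import numpy as np

# Minimal LoRA sketch at rank r=16: the frozen weight W is perturbed by a
# low-rank product B @ A scaled by alpha / r. Only A and B are trained.
d_out, d_in, r, alpha = 64, 64, 16, 32
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection (init 0)

W_adapted = W + (alpha / r) * (B @ A)       # effective weight after merging

# With B initialised to zero, the adapter starts as a no-op on the base model:
print(np.allclose(W_adapted, W))  # True
```

This is why LoRA is parameter-efficient: at rank 16, each adapted layer trains `r * (d_in + d_out)` values instead of `d_in * d_out`.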
## πŸ—οΈ Architecture
The model extends Qwen3.5 with:
1. **Custom tokenizer** with music domain tokens
2. **Five LoRA-adapted modules** inserted at transformer layers
3. **Multi-task heads** for music-specific predictions
4. **Emotional intelligence** via EQ adapter
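Step 1 above (the custom tokenizer) boils down to appending new special tokens after the base vocabulary, then resizing the model's embedding matrix to match. A dependency-free sketch follows; the token strings are examples, the real definitions live in `tokenizer/music_token_extension.py`, and with `transformers` the standard pattern is `tokenizer.add_special_tokens(...)` plus `model.resize_token_embeddings(len(tokenizer))`:

```python
# Illustrative vocabulary extension with music-domain tokens (examples only).
MUSIC_TOKENS = ["[GUITAR]", "[PIANO]", "[DRUMS]", "[VOCALS]",
                "[THEORY]", "[DJ]", "[TAB]", "[CHORD]", "[BEGINNER]"]

def extend_vocab(vocab: dict) -> dict:
    """Append new special tokens with ids after the existing vocabulary."""
    extended = dict(vocab)
    for tok in MUSIC_TOKENS:
        if tok not in extended:
            extended[tok] = len(extended)
    return extended

base = {"hello": 0, "world": 1}
vocab = extend_vocab(base)
print(vocab["[GUITAR]"])  # 2 (first id after the base vocabulary)
```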
## 🚀 Usage (After Training)
### HuggingFace Transformers
```python
from transformers import AutoModelForCausalLM
from TouchGrass.tokenization_touchgrass import TouchGrassTokenizer

# Load model and tokenizer (trust_remote_code is needed for the custom classes)
model = AutoModelForCausalLM.from_pretrained(
    "your-username/TouchGrass-7B", trust_remote_code=True
)
tokenizer = TouchGrassTokenizer.from_pretrained("your-username/TouchGrass-7B")

# Generate with instrument and skill-level context tokens
prompt = "[GUITAR][BEGINNER] How do I play an F major chord?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Ollama (After Training)
```bash
# Create Modelfile (provided in repository)
ollama create touchgrass-7b -f ollama_7b_modelfile
# Run inference
ollama run touchgrass-7b "How do I build a chord progression in C major?"
```
## πŸ“ Repository Structure
This repository contains all necessary files for training:
```
touchgrass-7b/
├── configuration_touchgrass.py   # HuggingFace config class
├── tokenization_touchgrass.py    # HuggingFace tokenizer wrapper
├── train.py                      # Main training script
├── configs/
│   ├── touchgrass_3b_config.py   # 3B config (for reference)
│   ├── touchgrass_7b_config.py   # Model architecture config
│   └── training_config.py        # Training hyperparameters
├── tokenizer/
│   └── music_token_extension.py  # Music token definitions
├── models/                       # Five specialized modules
│   ├── tab_chord_module.py
│   ├── music_theory_module.py
│   ├── ear_training_module.py
│   ├── eq_adapter.py
│   └── songwriting_module.py
├── data/                         # Data pipeline
│   ├── music_qa_generator.py
│   ├── chat_formatter.py
│   └── dataset_loader.py
├── training/
│   ├── losses.py
│   ├── trainer.py
│   └── train.py
├── inference/
│   └── inference.py
├── benchmarks/
│   ├── evaluate_music_modules.py
│   └── evaluate_inference.py
├── tests/                        # Comprehensive test suite
├── ollama_7b_modelfile           # Ollama configuration
├── README.md                     # Full documentation
└── PREVIEW_README.md             # This preview notice
```
## 🧪 Testing
Run the test suite:
```bash
cd touchgrass-7b
python -m pytest tests/ -v
```
## 📚 Documentation
See [README.md](README.md) for complete documentation including:
- Installation instructions
- Training guide
- Inference examples
- Module specifications
- Data generation details
- Troubleshooting
## βš™οΈ Training (When Resources Available)
1. **Generate synthetic data**:
```bash
python -c "from data.music_qa_generator import MusicQAGenerator; MusicQAGenerator().generate_dataset(num_samples=10000, output_path='data/music_qa.jsonl')"
```
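For a sense of what this step produces, here is a toy stand-in for the generator (two categories and templates instead of the repo's 10 categories and 80+ templates; all names here are illustrative, not `MusicQAGenerator`'s actual internals):

```python
import json
import random

# Toy stand-in for the synthetic QA generation step (illustrative only).
TEMPLATES = {
    "chords": "How do I play a {chord} chord on guitar?",
    "theory": "What notes are in the {key} major scale?",
}
FILLERS = {"chord": ["F major", "B minor"], "key": ["C", "G"]}

def generate_samples(n: int, seed: int = 0) -> list:
    """Sample n question records from the category templates."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        cat, tmpl = rng.choice(list(TEMPLATES.items()))
        field = "chord" if cat == "chords" else "key"
        out.append({"category": cat,
                    "question": tmpl.format(**{field: rng.choice(FILLERS[field])})})
    return out

# One JSON record per line, as in the JSONL output the training step expects.
for row in generate_samples(3):
    print(json.dumps(row))
```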
2. **Start training**:
```bash
python train.py --config configs/touchgrass_7b_config.py --data data/music_qa.jsonl --output_dir ./checkpoints
```
3. **Convert to HuggingFace format**:
```bash
python -c "from configuration_touchgrass import TouchGrassConfig; from tokenization_touchgrass import TouchGrassTokenizer; config = TouchGrassConfig.from_pretrained('./checkpoints'); tokenizer = TouchGrassTokenizer.from_pretrained('./checkpoints'); config.save_pretrained('./model'); tokenizer.save_pretrained('./model')"
```
4. **Push to HuggingFace**:
```bash
huggingface-cli login
huggingface-cli upload your-username/TouchGrass-7B ./model --repo-type model
```
## 🤝 Contributing
This is a preview. Contributions welcome for:
- Improving synthetic data quality
- Adding more music categories
- Optimizing training efficiency
- Extending to more instruments
## 📄 License
Apache 2.0
## πŸ™ Acknowledgments
- Built upon [Qwen3.5](https://huggingface.co/Qwen) by Alibaba Cloud
- Inspired by the need for accessible music education AI
- Special thanks to the open-source music technology community
---
**⚠️ REMINDER**: This is an UNTRAINED PREVIEW model. Do not use for production inference without completing the training process.