	---
	language:
	- en
	license: apache-2.0
	library_name: transformers
	pipeline_tag: text-generation
	base_model: Qwen/Qwen2.5-1.5B-Instruct
	tags:
	- gravityllm
	- spatial-audio
	- immersive-audio
	- spatial9
	- iamf
	- instruction-tuning
	- json
	- lora
	- qlora
	- peft
	- transformers
	widget:
	- text: |-
	    INPUT:
	    {
	      "target_format": "iamf",
	      "max_objects": 10,
	      "style": "club",
	      "section": "drop",
	      "global": {"bpm": 128, "energy": 0.92},
	      "stems": [
	        {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
	        {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
	      ],
	      "rules": [
	        {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
	        {"type": "mono_low_end", "hz_below": 120}
	      ]
	    }
	---

	![GravityLLM banner](assets/heads9.png)

	# GravityLLM

	GravityLLM is a compact instruction-tuned model for constraint-conditioned spatial scene generation.
	It turns music constraints + stem descriptors into strict Spatial9Scene JSON for immersive audio pipelines such as IAMF, binaural, and bed-plus-object rendering workflows.

	> Status
	> This repository is training-ready and Hub-ready: it includes code, schema, sample data, evaluation, and upload helpers.
	> It does not yet include fine-tuned weights. After training, upload the contents of your `outputs/...` folder as the actual model repo.

	Demo at [https://spatial9.ai/demo](https://spatial9.ai/demo)

	## What you will find in this repo

	- Instruction fine-tuning with prompt masking, so the loss is computed only on the target JSON and not on the instruction prefix.
	- LoRA and QLoRA training paths for efficient fine-tuning on small-to-medium GPUs.
	- Strict JSON Schema validation, so malformed scenes are rejected before they reach a renderer.
	- Built-in evaluation for parse rate, schema-valid rate, object-budget pass rate, and anchor-rule pass rate.
	- A Hugging Face upload helper built on `upload_folder`.
	- Ready-made sample data, a sample scene, and a recommended training config.
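
Prompt masking can be illustrated with a minimal sketch. It assumes the usual convention of setting prompt positions in the labels to `-100`, the default `ignore_index` of PyTorch's cross-entropy loss; the function name `build_masked_example` is hypothetical, and the actual logic in `train.py` may differ.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_masked_example(
    prompt_ids: List[int],
    completion_ids: List[int],
    max_length: int = 2048,
) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and completion tokens; mask the prompt in the labels."""
    input_ids = (prompt_ids + completion_ids)[:max_length]
    # Prompt positions get IGNORE_INDEX, so only completion tokens contribute to the loss.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + completion_ids)[:max_length]
    return input_ids, labels


# Toy token ids stand in for real tokenizer output.
input_ids, labels = build_masked_example([11, 12, 13], [21, 22])
print(input_ids)  # [11, 12, 13, 21, 22]
print(labels)     # [-100, -100, -100, 21, 22]
```

Truncating both lists to the same `max_length` keeps them aligned position by position.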

	## Model contract

	### Input
	A structured payload describing:

	- target format
	- object budget
	- style and section
	- per-stem descriptors
	- hard rules such as anchors, low-end centering, width targets, and masking constraints

	### Output
	A single valid JSON object matching `schemas/scene.schema.json`.

	### Example input
	```json
	{
	  "target_format": "iamf",
	  "max_objects": 10,
	  "style": "club",
	  "section": "drop",
	  "global": {"bpm": 128, "energy": 0.92},
	  "stems": [
	    {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
	    {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
	  ],
	  "rules": [
	    {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
	    {"type": "mono_low_end", "hz_below": 120}
	  ]
	}
	```
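
Before a payload like this reaches the model, a cheap sanity check can catch malformed inputs. The sketch below is illustrative only: the function `check_input_payload` is hypothetical, the list of required keys is inferred from the example, and the rule that `band_energy` fractions sum to 1 is an assumption based on the sample stems, not a documented requirement.

```python
import math
from typing import Any, Dict, List


def check_input_payload(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means nothing suspicious was found."""
    problems: List[str] = []
    # Assumption: these keys are required (inferred from the example input).
    for key in ("target_format", "max_objects", "stems", "rules"):
        if key not in payload:
            problems.append(f"missing key: {key}")
    seen_ids = set()
    for stem in payload.get("stems", []):
        stem_id = stem.get("id")
        if stem_id in seen_ids:
            problems.append(f"duplicate stem id: {stem_id}")
        seen_ids.add(stem_id)
        bands = stem.get("band_energy", {})
        # Assumption: low/mid/high energies are fractions that sum to 1.
        if not math.isclose(sum(bands.values()), 1.0, abs_tol=1e-6):
            problems.append(f"band_energy of {stem_id} does not sum to 1")
    return problems


payload = {
    "target_format": "iamf",
    "max_objects": 10,
    "stems": [
        {"id": "v1", "class": "lead_vocal", "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}},
        {"id": "k1", "class": "kick", "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}},
    ],
    "rules": [{"type": "mono_low_end", "hz_below": 120}],
}
print(check_input_payload(payload))
```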

	### Example output
	```json
	{
	  "version": "1.0",
	  "bed": {"layout": "iamf", "loudness_target_lufs": -14.0, "room_preset": "club_medium"},
	  "objects": [
	    {
	      "id": "v1",
	      "class": "lead_vocal",
	      "az_deg": 0,
	      "el_deg": 10,
	      "dist_m": 1.6,
	      "width": 0.15,
	      "gain_db": 0.0,
	      "reverb_send": 0.18,
	      "early_reflections": 0.22,
	      "motion": [
	        {"t": 0.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
	        {"t": 1.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6}
	      ]
	    }
	  ],
	  "constraints_applied": [
	    "anchor:lead_vocal@0/10/1.6",
	    "mono_low_end<120Hz"
	  ]
	}
	```

	## Repository layout

	```text
	GravityLLM-HuggingFace-Repo/
	├── README.md
	├── LICENSE
	├── Makefile
	├── pyproject.toml
	├── requirements.txt
	├── train.py
	├── infer.py
	├── evaluate.py
	├── upload_to_hub.py
	├── assets/
	│   └── gravityllm_banner.svg
	├── configs/
	│   └── recommended_train_args.json
	├── data/
	│   ├── train.jsonl
	│   └── valid.jsonl
	├── examples/
	│   ├── sample_input.json
	│   └── sample_output.json
	├── schemas/
	│   └── scene.schema.json
	├── scripts/
	│   ├── push_to_hub.sh
	│   └── train_qlora.sh
	└── tools/
	    ├── make_synthetic_dataset.py
	    └── validate_scene.py
	```

	## Quick start

	### 1) Install
	```bash
	python -m pip install -r requirements.txt
	```

	### 2) Train with QLoRA
	```bash
	bash scripts/train_qlora.sh
	```

	Or run directly:

	```bash
	python train.py \
	  --model Qwen/Qwen2.5-1.5B-Instruct \
	  --train_file data/train.jsonl \
	  --valid_file data/valid.jsonl \
	  --output_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
	  --max_length 2048 \
	  --num_train_epochs 3 \
	  --learning_rate 2e-4 \
	  --train_batch_size 1 \
	  --eval_batch_size 1 \
	  --gradient_accumulation_steps 16 \
	  --warmup_ratio 0.03 \
	  --save_steps 100 \
	  --eval_steps 100 \
	  --qlora \
	  --bf16
	```

	### 3) Generate a scene
	```bash
	python infer.py --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 --input_json examples/sample_input.json --validate --output_json outputs/sample_prediction.json
	```

	### 4) Evaluate
	```bash
	python evaluate.py --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 --data_file data/valid.jsonl --report_path reports/eval_report.json
	```
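
To make the reported metrics concrete, here is a rough sketch of how parse rate, object-budget pass rate, and anchor-rule pass rate could be computed. It is not the code in `evaluate.py`; the helper names are hypothetical. It assumes rates are taken over all samples (so an unparseable output fails every check) and that an anchor rule passes vacuously when no object of the anchored class is present. The schema-valid rate is left out because it needs a JSON Schema validator.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def parse_scene(text: str) -> Optional[Dict[str, Any]]:
    """Return the parsed scene, or None if the text is not a JSON object."""
    try:
        scene = json.loads(text)
    except json.JSONDecodeError:
        return None
    return scene if isinstance(scene, dict) else None


def budget_ok(scene: Dict[str, Any], max_objects: int) -> bool:
    return len(scene.get("objects", [])) <= max_objects


def _close(a: Any, b: float, tol: float = 1e-6) -> bool:
    return isinstance(a, (int, float)) and abs(a - b) <= tol


def anchors_ok(scene: Dict[str, Any], rules: List[Dict[str, Any]]) -> bool:
    """Every object of an anchored class must sit exactly at the anchor position."""
    for rule in rules:
        if rule.get("type") != "anchor":
            continue
        for obj in scene.get("objects", []):
            if obj.get("class") != rule["track_class"]:
                continue
            if not all(_close(obj.get(k), rule[k]) for k in ("az_deg", "el_deg", "dist_m")):
                return False
    return True


def evaluate(samples: List[Tuple[Dict[str, Any], str]]) -> Dict[str, float]:
    """samples holds (input payload, raw model output) pairs."""
    parsed = budget = anchor = 0
    for payload, raw in samples:
        scene = parse_scene(raw)
        if scene is None:
            continue
        parsed += 1
        budget += budget_ok(scene, payload["max_objects"])
        anchor += anchors_ok(scene, payload.get("rules", []))
    denom = max(len(samples), 1)
    return {
        "parse_rate": parsed / denom,
        "budget_pass_rate": budget / denom,
        "anchor_pass_rate": anchor / denom,
    }
```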

	### 5) Validate any output
	```bash
	python tools/validate_scene.py schemas/scene.schema.json outputs/sample_prediction.json
	```

	## Push to the Hugging Face Hub

	### From a trained output folder
	```bash
	python upload_to_hub.py --folder_path outputs/GravityLLM-Qwen2.5-1.5B-S9 --repo_id YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
	```

	### Or with the helper script
	```bash
	bash scripts/push_to_hub.sh outputs/GravityLLM-Qwen2.5-1.5B-S9 YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
	```

	## Dataset format

	Training files are JSONL: one JSON object per line, each with two string fields. A row is shown here across several lines for readability:

	```json
	{
	  "prompt": "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n{...}",
	  "completion": "{... valid Spatial9Scene JSON ...}"
	}
	```
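
A row in this format could be assembled roughly as follows. This is a sketch, not the logic of `tools/make_synthetic_dataset.py`: the helper `make_row` is hypothetical, and serializing the input payload compactly inside the prompt is an assumption.

```python
import json
from typing import Any, Dict

# Instruction prefix copied from the row format above.
PROMPT_PREFIX = (
    "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n"
)


def make_row(payload: Dict[str, Any], scene: Dict[str, Any]) -> str:
    """Serialize one training example as a single JSONL line."""
    prompt = PROMPT_PREFIX + json.dumps(payload, ensure_ascii=False)
    # The completion is itself a JSON string, stored as a field of the row.
    completion = json.dumps(scene, ensure_ascii=False)
    return json.dumps({"prompt": prompt, "completion": completion}, ensure_ascii=False)


row = make_row(
    {"target_format": "iamf", "max_objects": 10},
    {"version": "1.0", "objects": []},
)
print(row)
```

Because `json.dumps` escapes the newlines inside the prompt, each row stays on one physical line, which is what JSONL requires.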

	The provided sample dataset is intentionally small. Replace it with your real production examples as soon as possible.

	## Recommended data strategy

	For a strong first release:

	1. Collect a few hundred high-quality gold examples from expert-authored scenes.
	2. Keep the schema stable and quantized.
	3. Encode hard rules explicitly instead of relying on vague prose.
	4. Run evaluation after every fine-tune.
	5. Add a post-processor to enforce hard constraints if the runtime must be deterministic.
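
The post-processor from step 5 might look like the sketch below. The function `enforce_hard_constraints` is hypothetical, and two choices in it are assumptions: objects beyond the budget are dropped from the end of the list (treating list order as priority), and anchored objects have their motion keyframes pinned to the anchor position as well.

```python
import copy
from typing import Any, Dict

POSITION_KEYS = ("az_deg", "el_deg", "dist_m")


def enforce_hard_constraints(payload: Dict[str, Any], scene: Dict[str, Any]) -> Dict[str, Any]:
    """Return a corrected copy of the scene; the input scene is left untouched."""
    fixed = copy.deepcopy(scene)
    # Object budget: keep only the first max_objects entries (assumed priority order).
    fixed["objects"] = fixed.get("objects", [])[: payload["max_objects"]]
    # Index anchor rules by the track class they apply to.
    anchors = {
        rule["track_class"]: rule
        for rule in payload.get("rules", [])
        if rule.get("type") == "anchor"
    }
    for obj in fixed["objects"]:
        rule = anchors.get(obj.get("class"))
        if rule is None:
            continue
        # Snap the object and every motion keyframe to the anchor position.
        for target in [obj] + list(obj.get("motion", [])):
            for key in POSITION_KEYS:
                target[key] = rule[key]
    return fixed
```

Running this after generation makes the budget and anchor guarantees independent of what the model produced.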

	## Suggested training roadmap

	### v0
	- Small curated dataset
	- QLoRA adapter
	- Schema-valid JSON only
	- Anchor and budget constraints

	### v1
	- More genres and sections
	- Better masking and width rules
	- Object motion patterns
	- Automatic validation and repair loop
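
One possible shape for a validation and repair loop is sketched here. Nothing in it exists in the repository: `generate_with_repair`, the `generate` and `validate` callables, and the `PREVIOUS ERRORS` feedback format are all hypothetical placeholders for a model call and a schema validator.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def generate_with_repair(
    generate: Callable[[str], str],
    validate: Callable[[Dict[str, Any]], List[str]],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[Dict[str, Any]]:
    """Retry generation, feeding validation errors back, until a scene passes."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = generate(current_prompt)
        try:
            scene = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        else:
            if isinstance(scene, dict):
                errors = validate(scene)
            else:
                errors = ["top-level value is not a JSON object"]
            if not errors:
                return scene
        # Hypothetical feedback format: append the errors to the original prompt.
        current_prompt = prompt + "\n\nPREVIOUS ERRORS:\n" + "\n".join(errors)
    return None  # caller decides how to handle a scene that never validated
```

Returning `None` after the last attempt leaves the fallback policy, such as a rule-based default scene, to the caller.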

	### v2
	- Preference tuning on human A/B judgments
	- A dedicated reward signal for clarity, masking avoidance, and translation safety

	## Intended use

	GravityLLM is designed for:

	- music-tech pipelines
	- Spatial9 scene authoring
	- assisted immersive-audio layout generation
	- IAMF-ready authoring workflows
	- renderer-side JSON generation

	## Limitations

	- This repo does not include trained weights out of the box.
	- The model only knows what you teach it through your dataset.
	- Raw audio is not consumed directly here; the training pipeline expects structured stem features.
	- Production systems should still validate outputs and optionally apply a rule-based correction pass.

	## Safety and reliability

	- Always validate generated scenes against the JSON schema.
	- Keep low-end centering as a hard rule outside the model if that is non-negotiable.
	- Treat the model as a scene proposal engine, not an oracle.

	## License

	This repository is released under Apache-2.0.