# GravityLLM
GravityLLM is a compact instruction-tuned model for constraint-conditioned spatial scene generation.
It turns music constraints and per-stem descriptors into strict Spatial9Scene JSON for immersive-audio pipelines such as IAMF, binaural, and bed-plus-object rendering workflows.
**Status:** This repository is training-ready and Hub-ready: it includes code, schema, sample data, evaluation, and upload helpers, but does not yet include fine-tuned weights. After training, upload the contents of your `outputs/...` folder as the actual model repo.
Demo at https://spatial9.ai/demo
## What you will find in this repo
- Proper instruction fine-tuning with prompt masking, so the loss is applied to the target JSON instead of the instruction prefix.
- LoRA and QLoRA training paths for efficient fine-tuning on small-to-medium GPUs.
- Strict JSON Schema validation for production-safe outputs.
- Built-in evaluation for parse rate, schema-valid rate, object-budget pass rate, and anchor-rule pass rate.
- Clean Hugging Face upload helper built on `upload_folder`.
- Ready-made sample data, sample scene, and recommended training config.
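The prompt-masking point above can be sketched in a few lines: the labels mirror the input ids, but every prompt position is set to the ignore index so cross-entropy is computed only on the target JSON. A minimal sketch, assuming `prompt_ids` and `completion_ids` come from any tokenizer (`-100` is the conventional ignore index used by PyTorch cross-entropy and the HF Trainer):

```python
# Prompt masking sketch: loss applies only to completion (target JSON) tokens.
IGNORE_INDEX = -100

def build_example(prompt_ids, completion_ids, max_length=2048):
    input_ids = (prompt_ids + completion_ids)[:max_length]
    # Labels mirror input_ids, but every prompt position is masked out,
    # so the instruction prefix contributes nothing to the loss.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + completion_ids)[:max_length]
    return {"input_ids": input_ids, "labels": labels}

ex = build_example([1, 2, 3], [7, 8, 9])
# → labels are [-100, -100, -100, 7, 8, 9]
```

The actual collation in `train.py` may differ in details (padding, truncation side); this only illustrates the masking idea.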
## Model contract

### Input

A structured payload describing:
- target format
- object budget
- style and section
- per-stem descriptors
- hard rules such as anchors, low-end centering, width targets, and masking constraints
### Output

A single valid JSON object matching `schemas/scene.schema.json`.
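Before full schema validation, a cheap stdlib pre-check can reject outputs that are not a single JSON object with the expected top-level keys. A minimal sketch; the key names follow the example output below, and the authoritative contract remains `schemas/scene.schema.json`:

```python
import json

# Top-level keys taken from the example output; the schema is authoritative.
REQUIRED_TOP_LEVEL = {"version", "bed", "objects"}

def quick_contract_check(text: str) -> bool:
    """Cheap pre-check before full JSON Schema validation."""
    try:
        scene = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(scene, dict) and REQUIRED_TOP_LEVEL <= scene.keys()
```

This catches truncated or prose-wrapped generations early, so the heavier schema validator only runs on plausible candidates.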
### Example input

```json
{
  "target_format": "iamf",
  "max_objects": 10,
  "style": "club",
  "section": "drop",
  "global": {"bpm": 128, "energy": 0.92},
  "stems": [
    {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
    {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
  ],
  "rules": [
    {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
    {"type": "mono_low_end", "hz_below": 120}
  ]
}
```
### Example output

```json
{
  "version": "1.0",
  "bed": {"layout": "iamf", "loudness_target_lufs": -14.0, "room_preset": "club_medium"},
  "objects": [
    {
      "id": "v1",
      "class": "lead_vocal",
      "az_deg": 0,
      "el_deg": 10,
      "dist_m": 1.6,
      "width": 0.15,
      "gain_db": 0.0,
      "reverb_send": 0.18,
      "early_reflections": 0.22,
      "motion": [
        {"t": 0.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"t": 1.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6}
      ]
    }
  ],
  "constraints_applied": [
    "anchor:lead_vocal@0/10/1.6",
    "mono_low_end<120Hz"
  ]
}
```
## Repository layout

```text
GravityLLM-HuggingFace-Repo/
├── README.md
├── LICENSE
├── Makefile
├── pyproject.toml
├── requirements.txt
├── train.py
├── infer.py
├── evaluate.py
├── upload_to_hub.py
├── assets/
│   └── gravityllm_banner.svg
├── configs/
│   └── recommended_train_args.json
├── data/
│   ├── train.jsonl
│   └── valid.jsonl
├── examples/
│   ├── sample_input.json
│   └── sample_output.json
├── schemas/
│   └── scene.schema.json
├── scripts/
│   ├── push_to_hub.sh
│   └── train_qlora.sh
└── tools/
    ├── make_synthetic_dataset.py
    └── validate_scene.py
```
## Quick start

1) Install

```bash
python -m pip install -r requirements.txt
```

2) Train with QLoRA

```bash
bash scripts/train_qlora.sh
```

Or run directly:

```bash
python train.py \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --train_file data/train.jsonl \
  --valid_file data/valid.jsonl \
  --output_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --max_length 2048 \
  --num_train_epochs 3 \
  --learning_rate 2e-4 \
  --train_batch_size 1 \
  --eval_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --warmup_ratio 0.03 \
  --save_steps 100 \
  --eval_steps 100 \
  --qlora \
  --bf16
```

3) Generate a scene

```bash
python infer.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --input_json examples/sample_input.json \
  --validate \
  --output_json outputs/sample_prediction.json
```

4) Evaluate

```bash
python evaluate.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --data_file data/valid.jsonl \
  --report_path reports/eval_report.json
```

5) Validate any output

```bash
python tools/validate_scene.py schemas/scene.schema.json outputs/sample_prediction.json
```
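The evaluation metrics from earlier (object-budget pass rate, anchor-rule pass rate) boil down to simple structural checks on each predicted scene. A plain-Python sketch; function names here are illustrative, not the `evaluate.py` API:

```python
import json

def object_budget_ok(scene: dict, max_objects: int) -> bool:
    """Pass if the scene stays within the requested object budget."""
    return len(scene.get("objects", [])) <= max_objects

def anchor_rule_ok(scene: dict, rule: dict, tol_deg: float = 1.0) -> bool:
    """Pass if some object of the anchored class sits at the requested
    azimuth/elevation within a small angular tolerance."""
    for obj in scene.get("objects", []):
        if (obj.get("class") == rule["track_class"]
                and abs(obj["az_deg"] - rule["az_deg"]) <= tol_deg
                and abs(obj["el_deg"] - rule["el_deg"]) <= tol_deg):
            return True
    return False

scene = json.loads('{"objects": [{"class": "lead_vocal", "az_deg": 0, "el_deg": 10}]}')
anchor = {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10}
```

Aggregating these booleans over a validation file gives the per-metric pass rates reported by the evaluation step.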
## Push to the Hugging Face Hub

From a trained output folder:

```bash
python upload_to_hub.py \
  --folder_path outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --repo_id YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```

Or with the helper script:

```bash
bash scripts/push_to_hub.sh outputs/GravityLLM-Qwen2.5-1.5B-S9 YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```
## Dataset format

Training files are JSONL with two fields per row:

```json
{
  "prompt": "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n{...}",
  "completion": "{... valid Spatial9Scene JSON ...}"
}
```
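A row in this format can be assembled with the stdlib `json` module. A minimal sketch, using the prompt prefix shown above (the helper name is illustrative; `tools/make_synthetic_dataset.py` is the repo's actual generator):

```python
import json

def make_row(input_payload: dict, target_scene: dict) -> str:
    """Serialize one (prompt, completion) training pair as a JSONL line."""
    prompt = ("GravityLLM: Output ONLY valid JSON matching the "
              "Spatial9Scene schema.\n\nINPUT:\n"
              + json.dumps(input_payload, separators=(",", ":")))
    row = {"prompt": prompt, "completion": json.dumps(target_scene)}
    return json.dumps(row)  # one line per example, appended to train.jsonl

line = make_row({"max_objects": 10}, {"version": "1.0", "objects": []})
```

Keeping the completion as a compact, canonical JSON string means the model sees exactly the serialization you will parse at inference time.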
The provided sample dataset is intentionally small. Replace it with your real production examples as soon as possible.
## Recommended data strategy

For a strong first release:

- Collect a few hundred high-quality gold examples from expert-authored scenes.
- Keep the schema stable and keep continuous values quantized to fixed steps.
- Encode hard rules explicitly instead of relying on vague prose.
- Run evaluation after every fine-tune.
- Add a post-processor to enforce hard constraints if the runtime must be deterministic.
## Suggested training roadmap

### v0

- Small curated dataset
- QLoRA adapter
- Schema-valid JSON only
- Anchor and budget constraints

### v1

- More genres and sections
- Better masking and width rules
- Object motion patterns
- Automatic validation and repair loop

### v2

- Preference tuning on human A/B judgments
- A dedicated reward signal for clarity, masking avoidance, and translation safety
## Intended use
GravityLLM is designed for:
- music-tech pipelines
- Spatial9 scene authoring
- assisted immersive-audio layout generation
- IAMF-ready authoring workflows
- renderer-side JSON generation
## Limitations
- This repo does not include trained weights out of the box.
- The model only knows what you teach it through your dataset.
- Raw audio is not consumed directly here; the training pipeline expects structured stem features.
- Production systems should still validate outputs and optionally apply a rule-based correction pass.
## Safety and reliability
- Always validate generated scenes against the JSON schema.
- Keep low-end centering as a hard rule outside the model if that is non-negotiable.
- Treat the model as a scene proposal engine, not an oracle.
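Treating the model as a proposal engine suggests a generate → validate → retry loop at runtime. A sketch with a stubbed generator; `generate_scene` and `validate` are placeholders for your inference call and schema check, not APIs from this repo:

```python
import json

def generate_with_retries(generate_scene, validate, max_tries=3):
    """Call the model up to max_tries times; return the first valid scene."""
    last_error = None
    for _ in range(max_tries):
        text = generate_scene()
        try:
            scene = json.loads(text)
        except json.JSONDecodeError as err:
            last_error = err  # remember why parsing failed, then retry
            continue
        if validate(scene):
            return scene
    raise ValueError(f"no valid scene after {max_tries} tries: {last_error}")

# Stubbed usage: the first attempt is malformed, the second is valid.
attempts = iter(['not json', '{"version": "1.0", "objects": []}'])
scene = generate_with_retries(lambda: next(attempts), lambda s: "version" in s)
```

In production the `validate` hook would be full JSON Schema validation, optionally followed by the deterministic correction pass mentioned in the limitations.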
## License
This repository is released under Apache-2.0.
