# GravityLLM
GravityLLM is a compact instruction-tuned model for constraint-conditioned spatial scene generation.
It turns music constraints and per-stem descriptors into strict Spatial9Scene JSON for immersive-audio pipelines such as IAMF, binaural, and bed-plus-object rendering workflows.
**Status:** This repository is training-ready and Hub-ready: it includes code, schema, sample data, evaluation, and upload helpers, but does not yet include fine-tuned weights. After training, upload the contents of your `outputs/...` folder as the actual model repo.
Demo at https://spatial9.ai/demo
## What you will find in this repo
- Proper instruction fine-tuning with prompt masking, so the loss is applied to the target JSON instead of the instruction prefix.
- LoRA and QLoRA training paths for efficient fine-tuning on small-to-medium GPUs.
- Strict JSON Schema validation for production-safe outputs.
- Built-in evaluation for parse rate, schema-valid rate, object-budget pass rate, and anchor-rule pass rate.
- Clean Hugging Face upload helper built on `upload_folder`.
- Ready-made sample data, sample scene, and recommended training config.
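The prompt-masking point above can be sketched in a few lines: the labels mirror the input ids, but every prompt position is set to the ignore index so cross-entropy is computed only on the target JSON. A minimal sketch, assuming `prompt_ids` and `completion_ids` come from any tokenizer (`-100` is the conventional ignore index used by PyTorch cross-entropy and the HF Trainer):

```python
# Prompt masking sketch: loss applies only to completion (target JSON) tokens.
IGNORE_INDEX = -100

def build_example(prompt_ids, completion_ids, max_length=2048):
    input_ids = (prompt_ids + completion_ids)[:max_length]
    # Labels mirror input_ids, but every prompt position is masked out,
    # so the instruction prefix contributes nothing to the loss.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + completion_ids)[:max_length]
    return {"input_ids": input_ids, "labels": labels}

ex = build_example([1, 2, 3], [7, 8, 9])
# → labels are [-100, -100, -100, 7, 8, 9]
```

The actual collation in `train.py` may differ in details (padding, truncation side); this only illustrates the masking idea.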
## Model contract

### Input

A structured payload describing:
- target format
- object budget
- style and section
- per-stem descriptors
- hard rules such as anchors, low-end centering, width targets, and masking constraints
### Output

A single valid JSON object matching `schemas/scene.schema.json`.
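Before full schema validation, a cheap stdlib pre-check can reject outputs that are not a single JSON object with the expected top-level keys. A minimal sketch; the key names follow the example output below, and the authoritative contract remains `schemas/scene.schema.json`:

```python
import json

# Top-level keys taken from the example output; the schema is authoritative.
REQUIRED_TOP_LEVEL = {"version", "bed", "objects"}

def quick_contract_check(text: str) -> bool:
    """Cheap pre-check before full JSON Schema validation."""
    try:
        scene = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(scene, dict) and REQUIRED_TOP_LEVEL <= scene.keys()
```

This catches truncated or prose-wrapped generations early, so the heavier schema validator only runs on plausible candidates.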
### Example input

```json
{
  "target_format": "iamf",
  "max_objects": 10,
  "style": "club",
  "section": "drop",
  "global": {"bpm": 128, "energy": 0.92},
  "stems": [
    {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
    {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
  ],
  "rules": [
    {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
    {"type": "mono_low_end", "hz_below": 120}
  ]
}
```
### Example output

```json
{
  "version": "1.0",
  "bed": {"layout": "iamf", "loudness_target_lufs": -14.0, "room_preset": "club_medium"},
  "objects": [
    {
      "id": "v1",
      "class": "lead_vocal",
      "az_deg": 0,
      "el_deg": 10,
      "dist_m": 1.6,
      "width": 0.15,
      "gain_db": 0.0,
      "reverb_send": 0.18,
      "early_reflections": 0.22,
      "motion": [
        {"t": 0.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"t": 1.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6}
      ]
    }
  ],
  "constraints_applied": [
    "anchor:lead_vocal@0/10/1.6",
    "mono_low_end<120Hz"
  ]
}
```
## Repository layout

```text
GravityLLM-HuggingFace-Repo/
├── README.md
├── LICENSE
├── Makefile
├── pyproject.toml
├── requirements.txt
├── train.py
├── infer.py
├── evaluate.py
├── upload_to_hub.py
├── assets/
│   └── gravityllm_banner.svg
├── configs/
│   └── recommended_train_args.json
├── data/
│   ├── train.jsonl
│   └── valid.jsonl
├── examples/
│   ├── sample_input.json
│   └── sample_output.json
├── schemas/
│   └── scene.schema.json
├── scripts/
│   ├── push_to_hub.sh
│   └── train_qlora.sh
└── tools/
    ├── make_synthetic_dataset.py
    └── validate_scene.py
```
## Quick start

1) Install

```bash
python -m pip install -r requirements.txt
```

2) Train with QLoRA

```bash
bash scripts/train_qlora.sh
```

Or run directly:

```bash
python train.py \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --train_file data/train.jsonl \
  --valid_file data/valid.jsonl \
  --output_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --max_length 2048 \
  --num_train_epochs 3 \
  --learning_rate 2e-4 \
  --train_batch_size 1 \
  --eval_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --warmup_ratio 0.03 \
  --save_steps 100 \
  --eval_steps 100 \
  --qlora \
  --bf16
```

3) Generate a scene

```bash
python infer.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --input_json examples/sample_input.json \
  --validate \
  --output_json outputs/sample_prediction.json
```

4) Evaluate

```bash
python evaluate.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --data_file data/valid.jsonl \
  --report_path reports/eval_report.json
```

5) Validate any output

```bash
python tools/validate_scene.py schemas/scene.schema.json outputs/sample_prediction.json
```
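The evaluation metrics from earlier (object-budget pass rate, anchor-rule pass rate) boil down to simple structural checks on each predicted scene. A plain-Python sketch; function names here are illustrative, not the `evaluate.py` API:

```python
import json

def object_budget_ok(scene: dict, max_objects: int) -> bool:
    """Pass if the scene stays within the requested object budget."""
    return len(scene.get("objects", [])) <= max_objects

def anchor_rule_ok(scene: dict, rule: dict, tol_deg: float = 1.0) -> bool:
    """Pass if some object of the anchored class sits at the requested
    azimuth/elevation within a small angular tolerance."""
    for obj in scene.get("objects", []):
        if (obj.get("class") == rule["track_class"]
                and abs(obj["az_deg"] - rule["az_deg"]) <= tol_deg
                and abs(obj["el_deg"] - rule["el_deg"]) <= tol_deg):
            return True
    return False

scene = json.loads('{"objects": [{"class": "lead_vocal", "az_deg": 0, "el_deg": 10}]}')
anchor = {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10}
```

Aggregating these booleans over a validation file gives the per-metric pass rates reported by the evaluation step.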
## Push to the Hugging Face Hub

From a trained output folder:

```bash
python upload_to_hub.py \
  --folder_path outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --repo_id YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```

Or with the helper script:

```bash
bash scripts/push_to_hub.sh outputs/GravityLLM-Qwen2.5-1.5B-S9 YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```
## Dataset format

Training files are JSONL with two fields per row:

```json
{
  "prompt": "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n{...}",
  "completion": "{... valid Spatial9Scene JSON ...}"
}
```
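A row in this format can be assembled with the stdlib `json` module. A minimal sketch, using the prompt prefix shown above (the helper name is illustrative; `tools/make_synthetic_dataset.py` is the repo's actual generator):

```python
import json

def make_row(input_payload: dict, target_scene: dict) -> str:
    """Serialize one (prompt, completion) training pair as a JSONL line."""
    prompt = ("GravityLLM: Output ONLY valid JSON matching the "
              "Spatial9Scene schema.\n\nINPUT:\n"
              + json.dumps(input_payload, separators=(",", ":")))
    row = {"prompt": prompt, "completion": json.dumps(target_scene)}
    return json.dumps(row)  # one line per example, appended to train.jsonl

line = make_row({"max_objects": 10}, {"version": "1.0", "objects": []})
```

Keeping the completion as a compact, canonical JSON string means the model sees exactly the serialization you will parse at inference time.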
The provided sample dataset is intentionally small. Replace it with your real production examples as soon as possible.
## Recommended data strategy

For a strong first release:

- Collect a few hundred high-quality gold examples from expert-authored scenes.
- Keep the schema stable and keep continuous values quantized to fixed steps.
- Encode hard rules explicitly instead of relying on vague prose.
- Run evaluation after every fine-tune.
- Add a post-processor to enforce hard constraints if the runtime must be deterministic.
## Suggested training roadmap

### v0

- Small curated dataset
- QLoRA adapter
- Schema-valid JSON only
- Anchor and budget constraints

### v1

- More genres and sections
- Better masking and width rules
- Object motion patterns
- Automatic validation and repair loop

### v2

- Preference tuning on human A/B judgments
- A dedicated reward signal for clarity, masking avoidance, and translation safety
## Intended use
GravityLLM is designed for:
- music-tech pipelines
- Spatial9 scene authoring
- assisted immersive-audio layout generation
- IAMF-ready authoring workflows
- renderer-side JSON generation
## Limitations
- This repo does not include trained weights out of the box.
- The model only knows what you teach it through your dataset.
- Raw audio is not consumed directly here; the training pipeline expects structured stem features.
- Production systems should still validate outputs and optionally apply a rule-based correction pass.
## Safety and reliability
- Always validate generated scenes against the JSON schema.
- Keep low-end centering as a hard rule outside the model if that is non-negotiable.
- Treat the model as a scene proposal engine, not an oracle.
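Treating the model as a proposal engine suggests a generate → validate → retry loop at runtime. A sketch with a stubbed generator; `generate_scene` and `validate` are placeholders for your inference call and schema check, not APIs from this repo:

```python
import json

def generate_with_retries(generate_scene, validate, max_tries=3):
    """Call the model up to max_tries times; return the first valid scene."""
    last_error = None
    for _ in range(max_tries):
        text = generate_scene()
        try:
            scene = json.loads(text)
        except json.JSONDecodeError as err:
            last_error = err  # remember why parsing failed, then retry
            continue
        if validate(scene):
            return scene
    raise ValueError(f"no valid scene after {max_tries} tries: {last_error}")

# Stubbed usage: the first attempt is malformed, the second is valid.
attempts = iter(['not json', '{"version": "1.0", "objects": []}'])
scene = generate_with_retries(lambda: next(attempts), lambda s: "version" in s)
```

In production the `validate` hook would be full JSON Schema validation, optionally followed by the deterministic correction pass mentioned in the limitations.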
## License
This repository is released under Apache-2.0.
