
GravityLLM

GravityLLM is a compact instruction-tuned model for constraint-conditioned spatial scene generation.
It turns music constraints and per-stem descriptors into strict Spatial9Scene JSON for immersive-audio pipelines such as IAMF, binaural, and bed-plus-object rendering workflows.

Status: this repository is training-ready and Hub-ready. It includes code, schema, sample data, evaluation, and upload helpers.
It does not yet include fine-tuned weights. After training, upload the contents of your outputs/... folder as the actual model repo.

Demo at https://spatial9.ai/demo

What you will find in this repo

  • Proper instruction fine-tuning with prompt masking, so the loss is applied to the target JSON instead of the instruction prefix.
  • LoRA and QLoRA training paths for efficient fine-tuning on small-to-medium GPUs.
  • Strict JSON Schema validation for production-safe outputs.
  • Built-in evaluation for parse rate, schema-valid rate, object-budget pass rate, and anchor-rule pass rate.
  • Clean Hugging Face upload helper with upload_folder.
  • Ready-made sample data, sample scene, and recommended training config.
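
The prompt-masking point above can be sketched in a few lines: the cross-entropy loss is applied only to the completion (target JSON) tokens, never to the instruction prefix. Here `prompt_ids` and `completion_ids` stand in for real tokenizer output, and -100 is PyTorch's default ignore index; this is an assumption about how train.py builds its labels, not its actual code.

```python
# Sketch of prompt masking for instruction fine-tuning.
# -100 is the default ignore_index of torch.nn.CrossEntropyLoss, so any
# position labeled -100 contributes nothing to the loss.
IGNORE_INDEX = -100

def build_masked_labels(prompt_ids: list[int], completion_ids: list[int]):
    """Concatenate prompt and completion, masking the prompt span in the labels."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels
```

With this layout, gradient updates push the model toward emitting the target JSON while leaving the instruction prefix as pure conditioning context.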

Model contract

Input

A structured payload describing:

  • target format
  • object budget
  • style and section
  • per-stem descriptors
  • hard rules such as anchors, low-end centering, width targets, and masking constraints

Output

A single valid JSON object matching schemas/scene.schema.json.
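
As a quick sanity check before full schema validation, the contract can be probed with a hand-rolled structural check. This is a simplified stand-in, not the real schema logic (production paths should run tools/validate_scene.py against schemas/scene.schema.json); the key names come from the example output in this section.

```python
# Simplified structural check for a generated scene; a stand-in for full
# JSON Schema validation, using key names from the example output.
def quick_scene_check(scene: dict) -> list[str]:
    """Return a list of problems; an empty list means the quick check passed."""
    errors = []
    for key in ("version", "bed", "objects"):
        if key not in scene:
            errors.append(f"missing required key: {key}")
    for obj in scene.get("objects", []):
        # Azimuth is assumed to live in [-180, 180] degrees.
        if not -180 <= obj.get("az_deg", 0) <= 180:
            errors.append(f"object {obj.get('id')}: az_deg out of range")
    return errors
```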

Example input

{
  "target_format": "iamf",
  "max_objects": 10,
  "style": "club",
  "section": "drop",
  "global": {"bpm": 128, "energy": 0.92},
  "stems": [
    {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
    {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
  ],
  "rules": [
    {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
    {"type": "mono_low_end", "hz_below": 120}
  ]
}

Example output

{
  "version": "1.0",
  "bed": {"layout": "iamf", "loudness_target_lufs": -14.0, "room_preset": "club_medium"},
  "objects": [
    {
      "id": "v1",
      "class": "lead_vocal",
      "az_deg": 0,
      "el_deg": 10,
      "dist_m": 1.6,
      "width": 0.15,
      "gain_db": 0.0,
      "reverb_send": 0.18,
      "early_reflections": 0.22,
      "motion": [
        {"t": 0.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"t": 1.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6}
      ]
    }
  ],
  "constraints_applied": [
    "anchor:lead_vocal@0/10/1.6",
    "mono_low_end<120Hz"
  ]
}

Repository layout

GravityLLM-HuggingFace-Repo/
├── README.md
├── LICENSE
├── Makefile
├── pyproject.toml
├── requirements.txt
├── train.py
├── infer.py
├── evaluate.py
├── upload_to_hub.py
├── assets/
│   └── gravityllm_banner.svg
├── configs/
│   └── recommended_train_args.json
├── data/
│   ├── train.jsonl
│   └── valid.jsonl
├── examples/
│   ├── sample_input.json
│   └── sample_output.json
├── schemas/
│   └── scene.schema.json
├── scripts/
│   ├── push_to_hub.sh
│   └── train_qlora.sh
└── tools/
    ├── make_synthetic_dataset.py
    └── validate_scene.py

Quick start

1) Install

python -m pip install -r requirements.txt

2) Train with QLoRA

bash scripts/train_qlora.sh

Or run directly:

python train.py \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --train_file data/train.jsonl \
  --valid_file data/valid.jsonl \
  --output_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --max_length 2048 \
  --num_train_epochs 3 \
  --learning_rate 2e-4 \
  --train_batch_size 1 \
  --eval_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --warmup_ratio 0.03 \
  --save_steps 100 \
  --eval_steps 100 \
  --qlora --bf16

3) Generate a scene

python infer.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --input_json examples/sample_input.json \
  --validate \
  --output_json outputs/sample_prediction.json

4) Evaluate

python evaluate.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --data_file data/valid.jsonl \
  --report_path reports/eval_report.json
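
Two of the reported metrics, object-budget pass rate and anchor-rule pass rate, reduce to simple predicate checks over the input/output shapes shown earlier. The names below are illustrative, not evaluate.py's actual API:

```python
# Predicate checks for two evaluation metrics, following the example
# input/output field names from the model contract section.
def object_budget_pass(scene: dict, max_objects: int) -> bool:
    """The scene must not place more objects than the input budget allows."""
    return len(scene.get("objects", [])) <= max_objects

def anchor_rule_pass(scene: dict, rule: dict, tol_deg: float = 1.0) -> bool:
    """Some object of the anchored class must sit at the required position."""
    for obj in scene.get("objects", []):
        if obj.get("class") == rule["track_class"]:
            return (abs(obj["az_deg"] - rule["az_deg"]) <= tol_deg
                    and abs(obj["el_deg"] - rule["el_deg"]) <= tol_deg)
    return False
```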

5) Validate any output

python tools/validate_scene.py schemas/scene.schema.json outputs/sample_prediction.json

Push to the Hugging Face Hub

From a trained output folder

python upload_to_hub.py \
  --folder_path outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --repo_id YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9

Or with the helper script

bash scripts/push_to_hub.sh outputs/GravityLLM-Qwen2.5-1.5B-S9 YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9

Dataset format

Training files are JSONL with two fields per row:

{
  "prompt": "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n{...}",
  "completion": "{... valid Spatial9Scene JSON ...}"
}

The provided sample dataset is intentionally small. Replace it with your real production examples as soon as possible.
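
Producing one JSONL row in this format can be sketched as follows; the prompt template string is copied from the sample row above, while the function name is illustrative:

```python
# Build one training row: a prompt embedding the structured input payload,
# paired with the target Spatial9Scene JSON as the completion.
import json

def make_row(payload: dict, scene: dict) -> str:
    """Return one JSONL line with 'prompt' and 'completion' fields."""
    prompt = ("GravityLLM: Output ONLY valid JSON matching the Spatial9Scene "
              "schema.\n\nINPUT:\n" + json.dumps(payload))
    return json.dumps({"prompt": prompt, "completion": json.dumps(scene)})
```

Note that the completion is a JSON string containing JSON: the scene is serialized once so the model learns to emit it verbatim.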

Recommended data strategy

For a strong first release:

  1. Collect a few hundred high-quality gold examples from expert-authored scenes.
  2. Keep the schema stable, with numeric values quantized to a consistent grid.
  3. Encode hard rules explicitly instead of relying on vague prose.
  4. Run evaluation after every fine-tune.
  5. Add a post-processor to enforce hard constraints if the runtime must be deterministic.
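
The post-processor in step 5 can be sketched as a deterministic pass that enforces two of the hard rules from the model contract: the object budget and a positional anchor. Field and rule names follow the examples above; this is an illustrative sketch, not shipped repo code.

```python
# Deterministic correction pass: trim the scene to the object budget and
# snap any anchored-class object back to its required position.
def enforce_hard_rules(scene: dict, max_objects: int, anchor: dict) -> dict:
    # Enforce the object budget by keeping the first max_objects entries.
    scene["objects"] = scene.get("objects", [])[:max_objects]
    # Enforce the anchor rule on every object of the anchored class.
    for obj in scene["objects"]:
        if obj.get("class") == anchor["track_class"]:
            obj["az_deg"] = anchor["az_deg"]
            obj["el_deg"] = anchor["el_deg"]
            obj["dist_m"] = anchor["dist_m"]
    return scene
```

Running such a pass after generation makes the hard constraints unconditionally true regardless of what the model emits, which is what a deterministic runtime requires.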

Suggested training roadmap

v0

  • Small curated dataset
  • QLoRA adapter
  • Schema-valid JSON only
  • Anchor and budget constraints

v1

  • More genres and sections
  • Better masking and width rules
  • Object motion patterns
  • Automatic validation and repair loop

v2

  • Preference tuning on human A/B judgments
  • A dedicated reward signal for clarity, masking avoidance, and translation safety

Intended use

GravityLLM is designed for:

  • music-tech pipelines
  • Spatial9 scene authoring
  • assisted immersive-audio layout generation
  • IAMF-ready authoring workflows
  • renderer-side JSON generation

Limitations

  • This repo does not include trained weights out of the box.
  • The model only knows what you teach it through your dataset.
  • Raw audio is not consumed directly here; the training pipeline expects structured stem features.
  • Production systems should still validate outputs and optionally apply a rule-based correction pass.

Safety and reliability

  • Always validate generated scenes against the JSON schema.
  • Keep low-end centering as a hard rule outside the model if that is non-negotiable.
  • Treat the model as a scene proposal engine, not an oracle.

License

This repository is released under Apache-2.0.

Model tree for Spatial9/GravityLLM

Base model: Qwen/Qwen2.5-1.5B (this repository trains a LoRA adapter on top of it).