---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- gravityllm
- spatial-audio
- immersive-audio
- spatial9
- iamf
- instruction-tuning
- json
- lora
- qlora
- peft
- transformers
widget:
- text: |-
    INPUT:
    {
      "target_format": "iamf",
      "max_objects": 10,
      "style": "club",
      "section": "drop",
      "global": {"bpm": 128, "energy": 0.92},
      "stems": [
        {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
        {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
      ],
      "rules": [
        {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"type": "mono_low_end", "hz_below": 120}
      ]
    }
---

# GravityLLM

GravityLLM is a compact instruction-tuned model for **constraint-conditioned spatial scene generation**.
It turns **music constraints + stem descriptors** into strict **Spatial9Scene JSON** for immersive audio pipelines such as IAMF, binaural, and bed-plus-object rendering workflows.

> **Status**
> This repository is **training-ready and Hub-ready**.
> This includes code, schema, sample data, evaluation, and upload helpers.
> It does **not** include fine-tuned weights yet. After training, upload the contents of your `outputs/...` folder as the actual model repo.

Demo at **[https://spatial9.ai/demo](https://spatial9.ai/demo)**

## What you will find in this repo

- Instruction fine-tuning with **prompt masking**, so the loss is computed only on the target JSON tokens and not on the instruction prefix.
- **LoRA** and **QLoRA** training paths for efficient fine-tuning on small-to-medium GPUs.
- Strict **JSON Schema** validation for production-safe outputs.
- Built-in **evaluation** for parse rate, schema-valid rate, object-budget pass rate, and anchor-rule pass rate.
- A **Hugging Face upload** helper built on `upload_folder`.
- Ready-made **sample data**, a **sample scene**, and a **recommended training config**.
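
The prompt-masking idea from the first bullet can be sketched as follows. This is a minimal illustration, not the code in `train.py`: the helper name `build_masked_example` and the toy token ids are hypothetical. The one fixed convention it relies on is that label value `-100` is ignored by PyTorch's cross-entropy loss by default.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def build_masked_example(prompt_ids, completion_ids, max_length=2048):
    """Concatenate prompt and completion ids and mask the prompt in the labels.

    Hypothetical helper for illustration; train.py may do this differently.
    """
    input_ids = list(prompt_ids) + list(completion_ids)
    # Prompt positions get IGNORE_INDEX, so only completion tokens add to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    # Truncate both lists from the right so ids and labels stay aligned.
    input_ids = input_ids[:max_length]
    labels = labels[:max_length]
    return {
        "input_ids": input_ids,
        "labels": labels,
        "attention_mask": [1] * len(input_ids),
    }


# Toy ids stand in for real tokenizer output.
example = build_masked_example(prompt_ids=[11, 12, 13], completion_ids=[21, 22])
print(example["labels"])  # [-100, -100, -100, 21, 22]
```

With real data, `prompt_ids` and `completion_ids` would come from tokenizing the `prompt` and `completion` fields of each training row separately.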

## Model contract

### Input

A structured payload describing:

- target format
- object budget
- style and section
- per-stem descriptors
- hard rules such as anchors, low-end centering, width targets, and masking constraints

### Output

A single valid JSON object matching `schemas/scene.schema.json`.

### Example input

```json
{
  "target_format": "iamf",
  "max_objects": 10,
  "style": "club",
  "section": "drop",
  "global": {"bpm": 128, "energy": 0.92},
  "stems": [
    {"id": "v1", "class": "lead_vocal", "lufs": -16.8, "transient": 0.25, "band_energy": {"low": 0.1, "mid": 0.6, "high": 0.3}, "leadness": 0.95},
    {"id": "k1", "class": "kick", "lufs": -10.5, "transient": 0.95, "band_energy": {"low": 0.8, "mid": 0.15, "high": 0.05}, "leadness": 0.25}
  ],
  "rules": [
    {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
    {"type": "mono_low_end", "hz_below": 120}
  ]
}
```

### Example output

```json
{
  "version": "1.0",
  "bed": {"layout": "iamf", "loudness_target_lufs": -14.0, "room_preset": "club_medium"},
  "objects": [
    {
      "id": "v1",
      "class": "lead_vocal",
      "az_deg": 0,
      "el_deg": 10,
      "dist_m": 1.6,
      "width": 0.15,
      "gain_db": 0.0,
      "reverb_send": 0.18,
      "early_reflections": 0.22,
      "motion": [
        {"t": 0.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"t": 1.0, "az_deg": 0, "el_deg": 10, "dist_m": 1.6}
      ]
    }
  ],
  "constraints_applied": [
    "anchor:lead_vocal@0/10/1.6",
    "mono_low_end<120Hz"
  ]
}
```
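
To make the contract concrete, the object-budget and anchor-rule checks can be sketched against the two examples above. This is an illustrative sketch only: `check_scene` is a hypothetical helper, the field names are taken from the example payloads, and the actual logic in `evaluate.py` may differ. Treating an anchor rule with no matching object as a failure is an assumption.

```python
def check_scene(payload, scene, tol=1e-6):
    """Return pass/fail flags for the object budget and the anchor rules.

    Hypothetical helper; field names follow the example input and output.
    """
    objects = scene.get("objects", [])
    budget_ok = len(objects) <= payload["max_objects"]

    anchors_ok = True
    for rule in payload.get("rules", []):
        if rule.get("type") != "anchor":
            continue  # other rule types are not checked in this sketch
        targets = [o for o in objects if o.get("class") == rule["track_class"]]
        if not targets:
            # Assumption: an anchor with no matching object counts as a failure.
            anchors_ok = False
            continue
        for obj in targets:
            for key in ("az_deg", "el_deg", "dist_m"):
                value = obj.get(key)
                if value is None or abs(value - rule[key]) > tol:
                    anchors_ok = False
    return {"budget_ok": budget_ok, "anchors_ok": anchors_ok}


payload = {
    "max_objects": 10,
    "rules": [
        {"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
        {"type": "mono_low_end", "hz_below": 120},
    ],
}
scene = {
    "objects": [
        {"id": "v1", "class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6},
    ],
}
print(check_scene(payload, scene))  # {'budget_ok': True, 'anchors_ok': True}
```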

## Repository layout

```text
GravityLLM-HuggingFace-Repo/
├── README.md
├── LICENSE
├── Makefile
├── pyproject.toml
├── requirements.txt
├── train.py
├── infer.py
├── evaluate.py
├── upload_to_hub.py
├── assets/
│   └── gravityllm_banner.svg
├── configs/
│   └── recommended_train_args.json
├── data/
│   ├── train.jsonl
│   └── valid.jsonl
├── examples/
│   ├── sample_input.json
│   └── sample_output.json
├── schemas/
│   └── scene.schema.json
├── scripts/
│   ├── push_to_hub.sh
│   └── train_qlora.sh
└── tools/
    ├── make_synthetic_dataset.py
    └── validate_scene.py
```

## Quick start

### 1) Install

```bash
python -m pip install -r requirements.txt
```

### 2) Train with QLoRA

```bash
bash scripts/train_qlora.sh
```

Or run directly:

```bash
python train.py \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --train_file data/train.jsonl \
  --valid_file data/valid.jsonl \
  --output_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --max_length 2048 \
  --num_train_epochs 3 \
  --learning_rate 2e-4 \
  --train_batch_size 1 \
  --eval_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --warmup_ratio 0.03 \
  --save_steps 100 \
  --eval_steps 100 \
  --qlora \
  --bf16
```

### 3) Generate a scene

```bash
python infer.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --input_json examples/sample_input.json \
  --validate \
  --output_json outputs/sample_prediction.json
```

### 4) Evaluate

```bash
python evaluate.py \
  --model_dir outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --data_file data/valid.jsonl \
  --report_path reports/eval_report.json
```

### 5) Validate any output

```bash
python tools/validate_scene.py schemas/scene.schema.json outputs/sample_prediction.json
```

## Push to the Hugging Face Hub

### From a trained output folder

```bash
python upload_to_hub.py \
  --folder_path outputs/GravityLLM-Qwen2.5-1.5B-S9 \
  --repo_id YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```

### Or with the helper script

```bash
bash scripts/push_to_hub.sh outputs/GravityLLM-Qwen2.5-1.5B-S9 YOUR_NAMESPACE/GravityLLM-Qwen2.5-1.5B-S9
```

## Dataset format

Training files are JSONL with two fields per row:

```json
{
  "prompt": "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n{...}",
  "completion": "{... valid Spatial9Scene JSON ...}"
}
```

The provided sample dataset is intentionally small. Replace it with your real production examples as soon as possible.
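
One way to turn an input payload and an expert-authored scene into a row of this shape is sketched below. The prompt prefix is copied from the format above. The helper name `make_row` and the choice to pretty-print the payload while keeping the completion compact are assumptions for illustration, not necessarily what `tools/make_synthetic_dataset.py` does.

```python
import json

# Prefix copied from the dataset format shown above.
PROMPT_PREFIX = (
    "GravityLLM: Output ONLY valid JSON matching the Spatial9Scene schema.\n\nINPUT:\n"
)


def make_row(payload, scene):
    """Build one training row (hypothetical helper).

    Assumption: the payload is pretty-printed in the prompt and the
    completion is serialized compactly to save tokens.
    """
    return {
        "prompt": PROMPT_PREFIX + json.dumps(payload, indent=2),
        "completion": json.dumps(scene, separators=(",", ":")),
    }


payload = {"target_format": "iamf", "max_objects": 10, "stems": [], "rules": []}
scene = {"version": "1.0", "objects": [], "constraints_applied": []}

row = make_row(payload, scene)
# Each row becomes exactly one line of the JSONL file.
jsonl_line = json.dumps(row, ensure_ascii=False)
print(sorted(row))  # ['completion', 'prompt']
```

Because `json.dumps` escapes the newlines inside the prompt, each serialized row stays on a single line, which is what the JSONL format requires.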

## Recommended data strategy

For a strong first release:

1. Collect a few hundred high-quality gold examples from expert-authored scenes.
2. Keep the schema stable and its numeric fields quantized to a fixed step, so the model sees a small, consistent set of values.
3. Encode hard rules explicitly instead of relying on vague prose.
4. Run evaluation after every fine-tune.
5. Add a post-processor to enforce hard constraints if the runtime must be deterministic.
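
The post-processor from step 5 could look roughly like the sketch below. It is not part of the repository: `enforce_hard_rules` is a hypothetical helper, and two of its choices are assumptions, namely that anchored objects also have their motion keyframes pinned and that objects beyond the budget are dropped from the end of the list. The `pad` class in the usage data is likewise made up for the example.

```python
import copy

POSITION_KEYS = ("az_deg", "el_deg", "dist_m")


def enforce_hard_rules(payload, scene):
    """Return a copy of `scene` with anchors pinned and the object budget applied.

    Hypothetical post-processor; field names follow the example payloads.
    """
    fixed = copy.deepcopy(scene)  # never mutate the model's raw proposal
    objects = fixed.get("objects", [])

    for rule in payload.get("rules", []):
        if rule.get("type") != "anchor":
            continue
        for obj in objects:
            if obj.get("class") != rule["track_class"]:
                continue
            for key in POSITION_KEYS:
                obj[key] = rule[key]
            # Assumption: an anchored object must not move, so keyframes are pinned too.
            for frame in obj.get("motion", []):
                for key in POSITION_KEYS:
                    frame[key] = rule[key]

    # Assumption: objects past the budget are dropped from the end of the list.
    fixed["objects"] = objects[: payload["max_objects"]]
    return fixed


payload = {
    "max_objects": 2,
    "rules": [{"type": "anchor", "track_class": "lead_vocal", "az_deg": 0, "el_deg": 10, "dist_m": 1.6}],
}
scene = {
    "objects": [
        {"id": "v1", "class": "lead_vocal", "az_deg": 15, "el_deg": 0, "dist_m": 2.0,
         "motion": [{"t": 0.0, "az_deg": 15, "el_deg": 0, "dist_m": 2.0}]},
        {"id": "k1", "class": "kick", "az_deg": 0, "el_deg": 0, "dist_m": 1.0},
        {"id": "p1", "class": "pad", "az_deg": 90, "el_deg": 20, "dist_m": 3.0},
    ]
}
fixed = enforce_hard_rules(payload, scene)
print(fixed["objects"][0]["az_deg"], len(fixed["objects"]))  # 0 2
```

A production version would likely rank objects (for example by `leadness`) before trimming to the budget rather than relying on list order.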

## Suggested training roadmap

### v0

- Small curated dataset
- QLoRA adapter
- Schema-valid JSON only
- Anchor and budget constraints

### v1

- More genres and sections
- Better masking and width rules
- Object motion patterns
- Automatic validation and repair loop

### v2

- Preference tuning on human A/B judgments
- A dedicated reward signal for clarity, masking avoidance, and translation safety

## Intended use

GravityLLM is designed for:

- music-tech pipelines
- Spatial9 scene authoring
- assisted immersive-audio layout generation
- IAMF-ready authoring workflows
- renderer-side JSON generation

## Limitations

- This repo does not include trained weights out of the box.
- The model only knows what you teach it through your dataset.
- Raw audio is not consumed directly here; the training pipeline expects structured stem features.
- Production systems should still validate outputs and optionally apply a rule-based correction pass.

## Safety and reliability

- Always validate generated scenes against the JSON schema.
- Keep low-end centering as a hard rule outside the model if that is non-negotiable.
- Treat the model as a scene proposal engine, not an oracle.
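
Treating the model as a proposal engine means its raw text should pass a gate before anything downstream uses it. A minimal sketch of such a gate follows, using only the standard library. The helper names `extract_first_json_object` and `propose_scene` are hypothetical, and the check for an `objects` list is a stand-in assumption: full validation should still go through `schemas/scene.schema.json`.

```python
import json


def extract_first_json_object(text):
    """Return the first JSON object embedded in `text`, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None


def propose_scene(raw_text, fallback):
    """Accept the model's proposal only if it parses and looks like a scene."""
    scene = extract_first_json_object(raw_text)
    # Assumption: a usable scene has at least an "objects" list.
    if scene is None or not isinstance(scene.get("objects"), list):
        return fallback
    return scene


fallback = {"version": "1.0", "objects": []}
good = propose_scene('Here you go: {"version": "1.0", "objects": [{"id": "v1"}]} done', fallback)
bad = propose_scene('{"version": "1.0", "objects": [', fallback)
print(len(good["objects"]), bad is fallback)  # 1 True
```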

## License

This repository is released under Apache-2.0.