LatentScore Gemma3-270M v5 (LoRA adapter)

This repo contains the LoRA adapter for a Gemma 3 270M model fine‑tuned to map short "vibe" descriptions into structured music configuration JSON.

Base model

  • unsloth/gemma-3-270m-it

Training data

Derived from Common Pile v0.1 (public-domain + openly licensed text). The full dataset and processing artifacts are in the companion dataset repo: https://huggingface.co/datasets/guprab/latentscore-data

Methodology (summary)

  1. Vibe extraction from Common Pile text (scene/character vibes, tags).
  2. Config generation with Gemini 3 Flash, best‑of‑N candidates.
  3. Quality scoring using LAION‑CLAP to select the best config per vibe.
  4. SFT on Gemma 3 270M with LoRA.

GRPO is currently skipped due to compute limits.

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "unsloth/gemma-3-270m-it"
adapter = "guprab/latentscore-gemma3-270m-v5-lora"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
model = PeftModel.from_pretrained(model, adapter)

System prompt & input format

  • System prompt: The training prompt is config_v1 in common/prompt_registry.py (LatentScore repo). It includes strict JSON rules and the schema below.
  • User input: Vibes are wrapped as <vibe>...</vibe> and passed through the model’s chat template (Gemma uses <start_of_turn>... tokens).

End-to-end constrained inference (Pydantic + Outlines)

import outlines
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# If you cloned the LatentScore repo, import the Pydantic schema + prompt:
from common.music_schema import MusicConfigPromptPayload
from common.prompt_registry import render_config_prompt

# Training uses the full config_v1 prompt (schema included):
SYSTEM_PROMPT = render_config_prompt("config_v1")

def generate_config(vibe: str) -> str:
    base = "unsloth/gemma-3-270m-it"
    adapter = "guprab/latentscore-gemma3-270m-v5-lora"
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)
    model = PeftModel.from_pretrained(model, adapter)
    model.eval()

    outlines_model = outlines.from_transformers(model, tokenizer)

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"<vibe>{vibe}</vibe>"},
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    with torch.no_grad():
        result = outlines_model(
            prompt,
            output_type=MusicConfigPromptPayload,
            max_new_tokens=1024,
            temperature=0.0,
        )
    # Optional: strict validation
    MusicConfigPromptPayload.model_validate_json(result)
    return result

print(generate_config("a warm sunset over water"))

Full JSON schema (MusicConfigPromptPayload)

JSON schema
{
  "$defs": {
    "MusicConfigPrompt": {
      "additionalProperties": false,
      "description": "Prompt-only schema matching MusicConfig without defaults.",
      "properties": {
        "accent": {
          "description": "Sparse accent sound type.",
          "enum": [
            "none",
            "bells",
            "pluck",
            "chime",
            "bells_dense",
            "blip",
            "blip_random",
            "brass_hit",
            "wind",
            "arp_accent",
            "piano_note"
          ],
          "title": "Accent",
          "type": "string"
        },
        "attack": {
          "description": "Transient sharpness label.",
          "enum": [
            "soft",
            "medium",
            "sharp"
          ],
          "title": "Attack",
          "type": "string"
        },
        "bass": {
          "description": "Bass style or movement pattern.",
          "enum": [
            "drone",
            "sustained",
            "pulsing",
            "walking",
            "fifth_drone",
            "sub_pulse",
            "octave",
            "arp_bass"
          ],
          "title": "Bass",
          "type": "string"
        },
        "brightness": {
          "description": "Filter brightness / spectral tilt label.",
          "enum": [
            "very_dark",
            "dark",
            "medium",
            "bright",
            "very_bright"
          ],
          "title": "Brightness",
          "type": "string"
        },
        "cadence_strength": {
          "description": "Cadence emphasis label (weak, medium, strong).",
          "enum": [
            "weak",
            "medium",
            "strong"
          ],
          "title": "Cadence Strength",
          "type": "string"
        },
        "chord_change_bars": {
          "description": "Chord change rate label (very_slow, slow, medium, fast).",
          "enum": [
            "very_slow",
            "slow",
            "medium",
            "fast"
          ],
          "title": "Chord Change Bars",
          "type": "string"
        },
        "chord_extensions": {
          "description": "Chord color/extension level.",
          "enum": [
            "triads",
            "sevenths",
            "lush"
          ],
          "title": "Chord Extensions",
          "type": "string"
        },
        "chromatic_prob": {
          "description": "Chromaticism label (none, light, medium, heavy).",
          "enum": [
            "none",
            "light",
            "medium",
            "heavy"
          ],
          "title": "Chromatic Prob",
          "type": "string"
        },
        "density": {
          "description": "Layer count indicating overall thickness.",
          "enum": [
            2,
            3,
            4,
            5,
            6
          ],
          "ge": 2,
          "le": 6,
          "title": "Density",
          "type": "integer"
        },
        "depth": {
          "description": "Whether to add sub-bass depth.",
          "title": "Depth",
          "type": "boolean"
        },
        "echo": {
          "description": "Delay amount label.",
          "enum": [
            "none",
            "subtle",
            "medium",
            "heavy",
            "infinite"
          ],
          "title": "Echo",
          "type": "string"
        },
        "grain": {
          "description": "Oscillator character (clean/warm/gritty).",
          "enum": [
            "clean",
            "warm",
            "gritty"
          ],
          "title": "Grain",
          "type": "string"
        },
        "harmony_style": {
          "description": "Harmony progression style.",
          "enum": [
            "auto",
            "pop",
            "jazz",
            "cinematic",
            "ambient"
          ],
          "title": "Harmony Style",
          "type": "string"
        },
        "human": {
          "description": "Timing/pitch looseness label.",
          "enum": [
            "robotic",
            "tight",
            "natural",
            "loose",
            "drunk"
          ],
          "title": "Human",
          "type": "string"
        },
        "melody": {
          "description": "Melody style or contour.",
          "enum": [
            "procedural",
            "contemplative",
            "rising",
            "falling",
            "minimal",
            "ornamental",
            "arp_melody",
            "contemplative_minor",
            "call_response",
            "heroic"
          ],
          "title": "Melody",
          "type": "string"
        },
        "melody_density": {
          "description": "Melody note density label (very_sparse, sparse, medium, busy, very_busy).",
          "enum": [
            "very_sparse",
            "sparse",
            "medium",
            "busy",
            "very_busy"
          ],
          "title": "Melody Density",
          "type": "string"
        },
        "melody_engine": {
          "description": "Melody generation mode (procedural or pattern).",
          "enum": [
            "pattern",
            "procedural"
          ],
          "title": "Melody Engine",
          "type": "string"
        },
        "mode": {
          "description": "Scale mode that shapes the emotional color.",
          "enum": [
            "major",
            "minor",
            "dorian",
            "mixolydian"
          ],
          "title": "Mode",
          "type": "string"
        },
        "motif_repeat_prob": {
          "description": "Motif repetition label (rare, sometimes, often).",
          "enum": [
            "rare",
            "sometimes",
            "often"
          ],
          "title": "Motif Repeat Prob",
          "type": "string"
        },
        "motion": {
          "description": "Modulation/LFO rate label.",
          "enum": [
            "static",
            "slow",
            "medium",
            "fast",
            "chaotic"
          ],
          "title": "Motion",
          "type": "string"
        },
        "pad": {
          "description": "Pad texture and harmonic bed style.",
          "enum": [
            "warm_slow",
            "dark_sustained",
            "cinematic",
            "thin_high",
            "ambient_drift",
            "stacked_fifths",
            "bright_open"
          ],
          "title": "Pad",
          "type": "string"
        },
        "phrase_len_bars": {
          "description": "Phrase length in bars (2, 4, or 8).",
          "enum": [
            2,
            4,
            8
          ],
          "title": "Phrase Len Bars",
          "type": "integer"
        },
        "register_max_oct": {
          "description": "Highest melody octave (integer).",
          "maximum": 8,
          "minimum": 1,
          "title": "Register Max Oct",
          "type": "integer"
        },
        "register_min_oct": {
          "description": "Lowest melody octave (integer).",
          "maximum": 8,
          "minimum": 1,
          "title": "Register Min Oct",
          "type": "integer"
        },
        "rhythm": {
          "description": "Percussion pattern style (or none).",
          "enum": [
            "none",
            "minimal",
            "heartbeat",
            "soft_four",
            "hats_only",
            "electronic",
            "kit_light",
            "kit_medium",
            "military",
            "tabla_essence",
            "brush"
          ],
          "title": "Rhythm",
          "type": "string"
        },
        "root": {
          "description": "Root note of the scale.",
          "enum": [
            "c",
            "c#",
            "d",
            "d#",
            "e",
            "f",
            "f#",
            "g",
            "g#",
            "a",
            "a#",
            "b"
          ],
          "title": "Root",
          "type": "string"
        },
        "space": {
          "description": "Reverb/room size label.",
          "enum": [
            "dry",
            "small",
            "medium",
            "large",
            "vast"
          ],
          "title": "Space",
          "type": "string"
        },
        "step_bias": {
          "description": "Melodic motion label (step, balanced, leapy).",
          "enum": [
            "step",
            "balanced",
            "leapy"
          ],
          "title": "Step Bias",
          "type": "string"
        },
        "stereo": {
          "description": "Stereo width label.",
          "enum": [
            "mono",
            "narrow",
            "medium",
            "wide",
            "ultra_wide"
          ],
          "title": "Stereo",
          "type": "string"
        },
        "swing": {
          "description": "Swing amount label (none, light, medium, heavy).",
          "enum": [
            "none",
            "light",
            "medium",
            "heavy"
          ],
          "title": "Swing",
          "type": "string"
        },
        "syncopation": {
          "description": "Offbeat emphasis label (straight, light, medium, heavy).",
          "enum": [
            "straight",
            "light",
            "medium",
            "heavy"
          ],
          "title": "Syncopation",
          "type": "string"
        },
        "tempo": {
          "description": "Tempo label controlling overall speed and energy.",
          "enum": [
            "very_slow",
            "slow",
            "medium",
            "fast",
            "very_fast"
          ],
          "title": "Tempo",
          "type": "string"
        },
        "tension_curve": {
          "description": "Tension shape across the phrase.",
          "enum": [
            "arc",
            "ramp",
            "waves"
          ],
          "title": "Tension Curve",
          "type": "string"
        },
        "texture": {
          "description": "Background texture or noise layer.",
          "enum": [
            "none",
            "shimmer",
            "shimmer_slow",
            "vinyl_crackle",
            "breath",
            "stars",
            "glitch",
            "noise_wash",
            "crystal",
            "pad_whisper"
          ],
          "title": "Texture",
          "type": "string"
        }
      },
      "required": [
        "tempo",
        "root",
        "mode",
        "brightness",
        "space",
        "density",
        "bass",
        "pad",
        "melody",
        "rhythm",
        "texture",
        "accent",
        "motion",
        "attack",
        "stereo",
        "depth",
        "echo",
        "human",
        "grain",
        "melody_engine",
        "phrase_len_bars",
        "melody_density",
        "syncopation",
        "swing",
        "motif_repeat_prob",
        "step_bias",
        "chromatic_prob",
        "cadence_strength",
        "register_min_oct",
        "register_max_oct",
        "tension_curve",
        "harmony_style",
        "chord_change_bars",
        "chord_extensions"
      ],
      "title": "MusicConfigPrompt",
      "type": "object"
    },
    "Palette": {
      "additionalProperties": false,
      "description": "A ranked palette with exactly five colors, auto-sorted by weight.",
      "properties": {
        "colors": {
          "description": "Ranked palette with exactly five colors.",
          "items": {
            "$ref": "#/$defs/PaletteColor"
          },
          "maxItems": 5,
          "minItems": 5,
          "title": "Colors",
          "type": "array"
        }
      },
      "required": [
        "colors"
      ],
      "title": "Palette",
      "type": "object"
    },
    "PaletteColor": {
      "additionalProperties": false,
      "description": "A single color with weight in a palette.",
      "properties": {
        "hex": {
          "description": "Hex color string in #RRGGBB format.",
          "pattern": "^#[0-9A-Fa-f]{6}$",
          "title": "Hex",
          "type": "string"
        },
        "weight": {
          "description": "Relative weight label (xs, sm, md, lg, xl, xxl).",
          "enum": [
            "xs",
            "sm",
            "md",
            "lg",
            "xl",
            "xxl"
          ],
          "title": "Weight",
          "type": "string"
        }
      },
      "required": [
        "hex",
        "weight"
      ],
      "title": "PaletteColor",
      "type": "object"
    }
  },
  "additionalProperties": false,
  "description": "LLM payload that includes a thinking field, config, and palettes.",
  "properties": {
    "config": {
      "$ref": "#/$defs/MusicConfigPrompt",
      "description": "Music configuration that matches the requested vibe."
    },
    "palettes": {
      "description": "Three ranked palettes matching the vibe.",
      "items": {
        "$ref": "#/$defs/Palette"
      },
      "maxItems": 3,
      "minItems": 3,
      "title": "Palettes",
      "type": "array"
    },
    "thinking": {
      "description": "Explain the sonic reasoning for the choices. Mention vibe decomposition, sonic translation, coherence check, and which examples guided the selection.",
      "maxLength": 1000,
      "title": "Thinking",
      "type": "string"
    },
    "title": {
      "description": "Short, readable title summarizing the vibe (<=6 words).",
      "maxLength": 60,
      "minLength": 1,
      "title": "Title",
      "type": "string"
    }
  },
  "required": [
    "thinking",
    "title",
    "config",
    "palettes"
  ],
  "title": "MusicConfigPromptPayload",
  "type": "object"
}

Schema & prompts

The target output follows a Pydantic schema (MusicConfigPromptPayload), and inference uses Outlines JSON‑schema constrained decoding. The base system prompt and chat formatting live in the LatentScore repo; see the dataset README for a summary and the repo for full details: https://github.com/prabal-rje/latentscore

Full merged model

If you want the fully merged weights (base + LoRA), use:

https://huggingface.co/guprab/latentscore-gemma3-270m-v5-merged

License & attribution

This model is trained on mixed‑license data. Per‑document licenses are preserved in the dataset. Please respect those licenses and attribute Common Pile v0.1, Google (Gemini + Gemma), and LAION‑CLAP where appropriate.

Downloads last month
1
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for guprab/latentscore-gemma3-270m-v5-lora

Adapter
(25)
this model