LatentScore Gemma3-270M v5 (LoRA adapter)
This repo contains the LoRA adapter for a Gemma 3 270M model fine‑tuned to map short "vibe" descriptions into structured music configuration JSON.
Base model
unsloth/gemma-3-270m-it
Training data
Derived from Common Pile v0.1 (public-domain + openly licensed text). The full dataset and processing artifacts are in the companion dataset repo: https://huggingface.co/datasets/guprab/latentscore-data
Methodology (summary)
- Vibe extraction from Common Pile text (scene/character vibes, tags).
- Config generation with Gemini 3 Flash, best‑of‑N candidates.
- Quality scoring using LAION‑CLAP to select the best config per vibe.
- SFT on Gemma 3 270M with LoRA.
GRPO is currently skipped due to compute limits.
Usage
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = "unsloth/gemma-3-270m-it"
adapter = "guprab/latentscore-gemma3-270m-v5-lora"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
model = PeftModel.from_pretrained(model, adapter)
System prompt & input format
- System prompt: The training prompt is
config_v1incommon/prompt_registry.py(LatentScore repo). It includes strict JSON rules and the schema below. - User input: Vibes are wrapped as
<vibe>...</vibe>and passed through the model’s chat template (Gemma uses<start_of_turn>...tokens).
End-to-end constrained inference (Pydantic + Outlines)
import outlines
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
# If you cloned the LatentScore repo, import the Pydantic schema + prompt:
from common.music_schema import MusicConfigPromptPayload
from common.prompt_registry import render_config_prompt
# Training uses the full config_v1 prompt (schema included):
SYSTEM_PROMPT = render_config_prompt("config_v1")
def generate_config(vibe: str) -> str:
base = "unsloth/gemma-3-270m-it"
adapter = "guprab/latentscore-gemma3-270m-v5-lora"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
model = PeftModel.from_pretrained(model, adapter)
model.eval()
outlines_model = outlines.from_transformers(model, tokenizer)
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": f"<vibe>{vibe}</vibe>"},
]
prompt = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
)
with torch.no_grad():
result = outlines_model(
prompt,
output_type=MusicConfigPromptPayload,
max_new_tokens=1024,
temperature=0.0,
)
# Optional: strict validation
MusicConfigPromptPayload.model_validate_json(result)
return result
print(generate_config("a warm sunset over water"))
Full JSON schema (MusicConfigPromptPayload)
JSON schema
{
"$defs": {
"MusicConfigPrompt": {
"additionalProperties": false,
"description": "Prompt-only schema matching MusicConfig without defaults.",
"properties": {
"accent": {
"description": "Sparse accent sound type.",
"enum": [
"none",
"bells",
"pluck",
"chime",
"bells_dense",
"blip",
"blip_random",
"brass_hit",
"wind",
"arp_accent",
"piano_note"
],
"title": "Accent",
"type": "string"
},
"attack": {
"description": "Transient sharpness label.",
"enum": [
"soft",
"medium",
"sharp"
],
"title": "Attack",
"type": "string"
},
"bass": {
"description": "Bass style or movement pattern.",
"enum": [
"drone",
"sustained",
"pulsing",
"walking",
"fifth_drone",
"sub_pulse",
"octave",
"arp_bass"
],
"title": "Bass",
"type": "string"
},
"brightness": {
"description": "Filter brightness / spectral tilt label.",
"enum": [
"very_dark",
"dark",
"medium",
"bright",
"very_bright"
],
"title": "Brightness",
"type": "string"
},
"cadence_strength": {
"description": "Cadence emphasis label (weak, medium, strong).",
"enum": [
"weak",
"medium",
"strong"
],
"title": "Cadence Strength",
"type": "string"
},
"chord_change_bars": {
"description": "Chord change rate label (very_slow, slow, medium, fast).",
"enum": [
"very_slow",
"slow",
"medium",
"fast"
],
"title": "Chord Change Bars",
"type": "string"
},
"chord_extensions": {
"description": "Chord color/extension level.",
"enum": [
"triads",
"sevenths",
"lush"
],
"title": "Chord Extensions",
"type": "string"
},
"chromatic_prob": {
"description": "Chromaticism label (none, light, medium, heavy).",
"enum": [
"none",
"light",
"medium",
"heavy"
],
"title": "Chromatic Prob",
"type": "string"
},
"density": {
"description": "Layer count indicating overall thickness.",
"enum": [
2,
3,
4,
5,
6
],
"ge": 2,
"le": 6,
"title": "Density",
"type": "integer"
},
"depth": {
"description": "Whether to add sub-bass depth.",
"title": "Depth",
"type": "boolean"
},
"echo": {
"description": "Delay amount label.",
"enum": [
"none",
"subtle",
"medium",
"heavy",
"infinite"
],
"title": "Echo",
"type": "string"
},
"grain": {
"description": "Oscillator character (clean/warm/gritty).",
"enum": [
"clean",
"warm",
"gritty"
],
"title": "Grain",
"type": "string"
},
"harmony_style": {
"description": "Harmony progression style.",
"enum": [
"auto",
"pop",
"jazz",
"cinematic",
"ambient"
],
"title": "Harmony Style",
"type": "string"
},
"human": {
"description": "Timing/pitch looseness label.",
"enum": [
"robotic",
"tight",
"natural",
"loose",
"drunk"
],
"title": "Human",
"type": "string"
},
"melody": {
"description": "Melody style or contour.",
"enum": [
"procedural",
"contemplative",
"rising",
"falling",
"minimal",
"ornamental",
"arp_melody",
"contemplative_minor",
"call_response",
"heroic"
],
"title": "Melody",
"type": "string"
},
"melody_density": {
"description": "Melody note density label (very_sparse, sparse, medium, busy, very_busy).",
"enum": [
"very_sparse",
"sparse",
"medium",
"busy",
"very_busy"
],
"title": "Melody Density",
"type": "string"
},
"melody_engine": {
"description": "Melody generation mode (procedural or pattern).",
"enum": [
"pattern",
"procedural"
],
"title": "Melody Engine",
"type": "string"
},
"mode": {
"description": "Scale mode that shapes the emotional color.",
"enum": [
"major",
"minor",
"dorian",
"mixolydian"
],
"title": "Mode",
"type": "string"
},
"motif_repeat_prob": {
"description": "Motif repetition label (rare, sometimes, often).",
"enum": [
"rare",
"sometimes",
"often"
],
"title": "Motif Repeat Prob",
"type": "string"
},
"motion": {
"description": "Modulation/LFO rate label.",
"enum": [
"static",
"slow",
"medium",
"fast",
"chaotic"
],
"title": "Motion",
"type": "string"
},
"pad": {
"description": "Pad texture and harmonic bed style.",
"enum": [
"warm_slow",
"dark_sustained",
"cinematic",
"thin_high",
"ambient_drift",
"stacked_fifths",
"bright_open"
],
"title": "Pad",
"type": "string"
},
"phrase_len_bars": {
"description": "Phrase length in bars (2, 4, or 8).",
"enum": [
2,
4,
8
],
"title": "Phrase Len Bars",
"type": "integer"
},
"register_max_oct": {
"description": "Highest melody octave (integer).",
"maximum": 8,
"minimum": 1,
"title": "Register Max Oct",
"type": "integer"
},
"register_min_oct": {
"description": "Lowest melody octave (integer).",
"maximum": 8,
"minimum": 1,
"title": "Register Min Oct",
"type": "integer"
},
"rhythm": {
"description": "Percussion pattern style (or none).",
"enum": [
"none",
"minimal",
"heartbeat",
"soft_four",
"hats_only",
"electronic",
"kit_light",
"kit_medium",
"military",
"tabla_essence",
"brush"
],
"title": "Rhythm",
"type": "string"
},
"root": {
"description": "Root note of the scale.",
"enum": [
"c",
"c#",
"d",
"d#",
"e",
"f",
"f#",
"g",
"g#",
"a",
"a#",
"b"
],
"title": "Root",
"type": "string"
},
"space": {
"description": "Reverb/room size label.",
"enum": [
"dry",
"small",
"medium",
"large",
"vast"
],
"title": "Space",
"type": "string"
},
"step_bias": {
"description": "Melodic motion label (step, balanced, leapy).",
"enum": [
"step",
"balanced",
"leapy"
],
"title": "Step Bias",
"type": "string"
},
"stereo": {
"description": "Stereo width label.",
"enum": [
"mono",
"narrow",
"medium",
"wide",
"ultra_wide"
],
"title": "Stereo",
"type": "string"
},
"swing": {
"description": "Swing amount label (none, light, medium, heavy).",
"enum": [
"none",
"light",
"medium",
"heavy"
],
"title": "Swing",
"type": "string"
},
"syncopation": {
"description": "Offbeat emphasis label (straight, light, medium, heavy).",
"enum": [
"straight",
"light",
"medium",
"heavy"
],
"title": "Syncopation",
"type": "string"
},
"tempo": {
"description": "Tempo label controlling overall speed and energy.",
"enum": [
"very_slow",
"slow",
"medium",
"fast",
"very_fast"
],
"title": "Tempo",
"type": "string"
},
"tension_curve": {
"description": "Tension shape across the phrase.",
"enum": [
"arc",
"ramp",
"waves"
],
"title": "Tension Curve",
"type": "string"
},
"texture": {
"description": "Background texture or noise layer.",
"enum": [
"none",
"shimmer",
"shimmer_slow",
"vinyl_crackle",
"breath",
"stars",
"glitch",
"noise_wash",
"crystal",
"pad_whisper"
],
"title": "Texture",
"type": "string"
}
},
"required": [
"tempo",
"root",
"mode",
"brightness",
"space",
"density",
"bass",
"pad",
"melody",
"rhythm",
"texture",
"accent",
"motion",
"attack",
"stereo",
"depth",
"echo",
"human",
"grain",
"melody_engine",
"phrase_len_bars",
"melody_density",
"syncopation",
"swing",
"motif_repeat_prob",
"step_bias",
"chromatic_prob",
"cadence_strength",
"register_min_oct",
"register_max_oct",
"tension_curve",
"harmony_style",
"chord_change_bars",
"chord_extensions"
],
"title": "MusicConfigPrompt",
"type": "object"
},
"Palette": {
"additionalProperties": false,
"description": "A ranked palette with exactly five colors, auto-sorted by weight.",
"properties": {
"colors": {
"description": "Ranked palette with exactly five colors.",
"items": {
"$ref": "#/$defs/PaletteColor"
},
"maxItems": 5,
"minItems": 5,
"title": "Colors",
"type": "array"
}
},
"required": [
"colors"
],
"title": "Palette",
"type": "object"
},
"PaletteColor": {
"additionalProperties": false,
"description": "A single color with weight in a palette.",
"properties": {
"hex": {
"description": "Hex color string in #RRGGBB format.",
"pattern": "^#[0-9A-Fa-f]{6}$",
"title": "Hex",
"type": "string"
},
"weight": {
"description": "Relative weight label (xs, sm, md, lg, xl, xxl).",
"enum": [
"xs",
"sm",
"md",
"lg",
"xl",
"xxl"
],
"title": "Weight",
"type": "string"
}
},
"required": [
"hex",
"weight"
],
"title": "PaletteColor",
"type": "object"
}
},
"additionalProperties": false,
"description": "LLM payload that includes a thinking field, config, and palettes.",
"properties": {
"config": {
"$ref": "#/$defs/MusicConfigPrompt",
"description": "Music configuration that matches the requested vibe."
},
"palettes": {
"description": "Three ranked palettes matching the vibe.",
"items": {
"$ref": "#/$defs/Palette"
},
"maxItems": 3,
"minItems": 3,
"title": "Palettes",
"type": "array"
},
"thinking": {
"description": "Explain the sonic reasoning for the choices. Mention vibe decomposition, sonic translation, coherence check, and which examples guided the selection.",
"maxLength": 1000,
"title": "Thinking",
"type": "string"
},
"title": {
"description": "Short, readable title summarizing the vibe (<=6 words).",
"maxLength": 60,
"minLength": 1,
"title": "Title",
"type": "string"
}
},
"required": [
"thinking",
"title",
"config",
"palettes"
],
"title": "MusicConfigPromptPayload",
"type": "object"
}
Schema & prompts
The target output follows a Pydantic schema (MusicConfigPromptPayload), and
inference uses Outlines JSON‑schema constrained decoding. The base system prompt
and chat formatting live in the LatentScore repo; see the dataset README for a
summary and the repo for full details:
https://github.com/prabal-rje/latentscore
Full merged model
If you want the fully merged weights (base + LoRA), use:
https://huggingface.co/guprab/latentscore-gemma3-270m-v5-merged
License & attribution
This model is trained on mixed‑license data. Per‑document licenses are preserved in the dataset. Please respect those licenses and attribute Common Pile v0.1, Google (Gemini + Gemma), and LAION‑CLAP where appropriate.
- Downloads last month
- 1