Salary Normalizer

A fine-tuned Gemma 3 270M model that parses and standardizes free-form salary text into structured JSON. Given an arbitrary salary string and a country name, it extracts currency symbol, ISO code, numeric range, and pay cadence in a single inference pass.

Model Details

Property	Value
Base model	`google/gemma-3-270m`
Fine-tune type	Supervised (instruction-tuned)
License	Apache 2.0
Task	Structured information extraction

Intended Use

Extract and normalize salary mentions from job descriptions or candidate profiles.
Standardize heterogeneous salary formats (e.g., 12 LPA, $60k–$80k/yr, €45,000 p.a.) into a consistent schema for downstream analytics or storage.

Out-of-scope use: This model is not designed for general text generation or tasks unrelated to salary parsing.

Input Format

Prompts must follow this exact template:

<start_of_turn>user
summarize salary: <SALARY_TEXT>
country: <COUNTRY_NAME><end_of_turn>
<start_of_turn>model

Examples of valid salary text:

$60k - $80k per year
INR 12 LPA
€45,000 annually
12 to 12.5 US $ per hr

Country names must match one of the supported countries listed below.
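Because the template must be reproduced exactly, it can help to build prompts in one place. The following is a minimal sketch: `build_prompt` is a hypothetical helper name, and collapsing stray whitespace before formatting is an assumption of this sketch, not documented model behavior.

```python
def build_prompt(salary_text: str, country: str) -> str:
    """Format one request using the turn template shown above."""
    # Assumption: collapse runs of whitespace and newlines so each
    # value stays on its own single line of the template.
    salary_text = " ".join(salary_text.split())
    country = " ".join(country.split())
    if not salary_text or not country:
        raise ValueError("salary_text and country must be non-empty")
    return (
        "<start_of_turn>user\n"
        f"summarize salary: {salary_text}\n"
        f"country: {country}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


print(build_prompt("INR 12 LPA", "India"))
```

The returned string ends with the open model turn, so generation starts directly at the JSON answer.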

Output Schema

The model returns a JSON object with the following fields:

{
  "currency": "$",
  "iso_code": "USD",
  "min_amount": 60000,
  "max_amount": 80000,
  "pay_rate": "ANNUALLY"
}

Field	Type	Description
`currency`	`string`	Raw currency symbol or string as it appears in the input
`iso_code`	`string`	Standardized ISO 4217 currency code
`min_amount`	`int / float`	Lower bound of the salary range (annualized or as stated)
`max_amount`	`int / float`	Upper bound of the salary range (annualized or as stated)
`pay_rate`	`string`	One of: `HOURLY`, `DAILY`, `WEEKLY`, `BI-WEEKLY`, `MONTHLY`, `ANNUALLY`, `OTHERS`

Note: min_amount and max_amount are normalized numeric values, not raw token extractions; for example, $60k becomes 60000. For single-value salaries, both fields hold the same value.
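A small language model can occasionally emit malformed or incomplete JSON, so downstream code may want to check each response against the schema above. The sketch below is one way to do that with the standard library; `parse_salary_output` is a hypothetical helper, and rejecting records where `min_amount` exceeds `max_amount` is an assumption of this sketch rather than a documented guarantee.

```python
import json

# Allowed values taken from the pay_rate row of the field table.
PAY_RATES = {"HOURLY", "DAILY", "WEEKLY", "BI-WEEKLY", "MONTHLY", "ANNUALLY", "OTHERS"}
REQUIRED_FIELDS = ("currency", "iso_code", "min_amount", "max_amount", "pay_rate")


def parse_salary_output(text: str) -> dict:
    """Parse decoded model text and check it against the documented schema."""
    record = json.loads(text.strip())  # raises ValueError on invalid JSON
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name in ("currency", "iso_code", "pay_rate"):
        if not isinstance(record[name], str):
            raise ValueError(f"{name} must be a string")
    for name in ("min_amount", "max_amount"):
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number")
    if record["pay_rate"] not in PAY_RATES:
        raise ValueError(f"unknown pay_rate: {record['pay_rate']!r}")
    # Assumption of this sketch: a reversed range signals a bad extraction.
    if record["min_amount"] > record["max_amount"]:
        raise ValueError("min_amount is greater than max_amount")
    return record


example = (
    '{"currency": "$", "iso_code": "USD", "min_amount": 60000, '
    '"max_amount": 80000, "pay_rate": "ANNUALLY"}'
)
record = parse_salary_output(example)
print(record["iso_code"], record["min_amount"], record["max_amount"])
```

Raising `ValueError` for every failure keeps the calling code simple: a single `except ValueError` covers both invalid JSON and schema violations.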

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Draup/salary-normalizer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

salary_text = "12 to 12.5 US $ per hr"
country = "United States"

prompt = (
    f"<start_of_turn>user\n"
    f"summarize salary: {salary_text}\n"
    f"country: {country}<end_of_turn>\n"
    f"<start_of_turn>model\n"
)

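# Tokenize the prompt into PyTorch tensors.
inputs = tokenizer(prompt, return_tensors="pt", truncation=True)

# Decode greedily and stop as soon as the model closes its turn.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
    eos_token_id=tokenizer.convert_tokens_to_ids("<end_of_turn>"),
)

# Slice off the prompt tokens so only the generated JSON is decoded.
result = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(result)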
# {"currency": "US $", "iso_code": "USD", "min_amount": 12, "max_amount": 12.5, "pay_rate": "HOURLY"}
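In the example output above, the amounts stay in the stated hourly cadence. For analytics that compare salaries across cadences, one option is to annualize after inference. The sketch below is illustrative only: `annualize` is a hypothetical helper, and every conversion factor (2,080 working hours per year, 260 working days, and so on) is an assumption of this sketch, not a value used by the model.

```python
from typing import Optional, Tuple

# Assumed periods per year; adjust these to local working conventions.
PERIODS_PER_YEAR = {
    "HOURLY": 2080,    # assumes 40 hours/week * 52 weeks
    "DAILY": 260,      # assumes 5 working days/week * 52 weeks
    "WEEKLY": 52,
    "BI-WEEKLY": 26,
    "MONTHLY": 12,
    "ANNUALLY": 1,
}


def annualize(record: dict) -> Optional[Tuple[float, float]]:
    """Return (min, max) per year, or None when the cadence is not convertible."""
    factor = PERIODS_PER_YEAR.get(record["pay_rate"])
    if factor is None:  # e.g. "OTHERS"
        return None
    return record["min_amount"] * factor, record["max_amount"] * factor


record = {
    "currency": "US $",
    "iso_code": "USD",
    "min_amount": 12,
    "max_amount": 12.5,
    "pay_rate": "HOURLY",
}
print(annualize(record))  # (24960, 26000.0)
```

Returning `None` for `OTHERS` avoids guessing a multiplier for cadences the schema does not define.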

Supported Countries

The model supports salary parsing for the following 49 countries:


Argentina	Australia	Austria	Belgium
Brazil	Canada	Chile	China
Colombia	Czechia	Denmark	Egypt
Finland	France	Germany	Hong Kong
Hungary	India	Indonesia	Ireland
Israel	Italy	Japan	Malaysia
Mexico	Netherlands	New Zealand	Norway
Pakistan	Peru	Philippines	Poland
Portugal	Romania	Russia	Saudi Arabia
Singapore	South Africa	South Korea	Spain
Sweden	Switzerland	Taiwan	Thailand
Turkey	United Arab Emirates	United Kingdom	United States
Vietnam

Limitations

Country names must match one of the 49 supported countries listed above.
Country is used for currency disambiguation; incorrect input may lead to wrong currency/ISO codes.
The model was trained primarily on English text; performance may degrade on other languages.
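Since an unsupported or misspelled country can lead to a wrong currency or ISO code, callers may want to check the country before building a prompt. The following is a minimal sketch: `canonical_country` is a hypothetical helper, and matching case-insensitively before returning the listed spelling is an assumption of this sketch. It does not map aliases such as "USA".

```python
# The 49 supported countries, spelled as listed in this document.
SUPPORTED_COUNTRIES = (
    "Argentina", "Australia", "Austria", "Belgium",
    "Brazil", "Canada", "Chile", "China",
    "Colombia", "Czechia", "Denmark", "Egypt",
    "Finland", "France", "Germany", "Hong Kong",
    "Hungary", "India", "Indonesia", "Ireland",
    "Israel", "Italy", "Japan", "Malaysia",
    "Mexico", "Netherlands", "New Zealand", "Norway",
    "Pakistan", "Peru", "Philippines", "Poland",
    "Portugal", "Romania", "Russia", "Saudi Arabia",
    "Singapore", "South Africa", "South Korea", "Spain",
    "Sweden", "Switzerland", "Taiwan", "Thailand",
    "Turkey", "United Arab Emirates", "United Kingdom", "United States",
    "Vietnam",
)

# Lookup table from lowercase name to the listed spelling.
_BY_LOWER = {name.lower(): name for name in SUPPORTED_COUNTRIES}


def canonical_country(name: str) -> str:
    """Return the listed spelling of a supported country, or raise ValueError."""
    key = " ".join(name.split()).lower()
    try:
        return _BY_LOWER[key]
    except KeyError:
        raise ValueError(f"unsupported country: {name!r}") from None


print(canonical_country("united states"))
```

Failing early here is cheaper than running inference and discovering a mismatched currency later.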
