How to un-normalize unquantized predictions
Hello! Thanks for all your help so far:)
How do I un-normalize the unquantized predictions, and what function did you use to normalize them? I want to make sure I can properly compare the predictions to the true scores in a regression context.
Thanks!
Rahul
Hi Rahul,
You're welcome!
By normalize do you mean quantize? If you want the raw, un-quantized scores, just pass quantize=False to one of the Pipeline.run_on_* functions.
The quantization uses a sorted list of thresholds for each task, which are configurable at the Pipeline level; for the defaults see inference_thresholds in config.py. The quantization happens in the last function in model.py by calling torch.searchsorted to figure out where in the sorted list of thresholds a particular score lands.
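To make the bucketing concrete, here is a minimal sketch of how searching a sorted threshold list assigns a raw score to a bucket. The repo does this with `torch.searchsorted`; the stdlib `bisect.bisect_left` behaves the same way for a single score, and the threshold values below are invented for illustration (the real defaults live in `inference_thresholds` in config.py).

```python
import bisect

# Hypothetical thresholds, for illustration only; see inference_thresholds
# in config.py for the actual defaults.
PHQ9_THRESHOLDS = [-1.5, -0.5, 0.0]

def quantize(raw_score, thresholds):
    """Return the index of the bucket the raw score falls into,
    mirroring what torch.searchsorted does over a sorted threshold list."""
    return bisect.bisect_left(thresholds, raw_score)

print(quantize(-2.0, PHQ9_THRESHOLDS))  # below every threshold -> bucket 0
print(quantize(0.3, PHQ9_THRESHOLDS))   # above every threshold -> bucket 3
```

A score between two thresholds lands in the bucket between them, so n thresholds give n + 1 buckets.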
Noah
Yeah, fair question; to clarify, I don't quantize. The output I get is, as the model card says, "raw float values which correlate monotonically with PHQ-9 and GAD-7." The values I get out are in a range of roughly -2.5 to 0.5, so I wasn't sure how to convert the true scores (0-27) to this normalized range, or vice versa.
Oh, ok. How to best map raw scores to the most likely corresponding PHQ-9 sums depends on what metric you use for "best". The tuning folder has tools for this; see the "Tuning thresholds" section of the DAM model card, specifically the "Optimal Tuning for Multi-class Tasks" subsection, for some examples. Let me know if it's unclear how this is intended to answer your question.
Hi everyone. I would kindly like to ask about the model that predicts the emotions output. Are there any plans to release this model?
Thanks @NDStein ! Maybe I'm misunderstanding; from a regression standpoint (e.g. with quantize=False), I was expecting the model to output a continuous prediction between 0 and 27 for PHQ-9 and 0 and 21 for GAD-7. The output I get instead is continuous but between roughly -2.5 and 0.5. My interpretation was that these scores map to an actual PHQ-9 or GAD-7 sum score, but that some kind of conversion or normalization (e.g. z-scoring) is happening in the DAM model. If so, I'm curious what that is. I didn't think threshold tuning was what I wanted, since I'm not building any kind of classifier but treating this as a continuous regression problem.
@rfbrito Thanks for the question. I think the confusion is coming from how the model outputs were structured.
The model itself does not directly regress to the raw PHQ-9 (0–27) or GAD-7 (0–21) totals. Instead, the underlying model produces a continuous latent score that reflects the model’s estimate of symptom severity from vocal features. Those raw outputs can fall in ranges like the one you’re seeing (e.g., roughly −2.5 to 0.5 depending on the checkpoint and calibration).
In the production API (https://www.kintsugihealth.com/api/voice-api#predict-results), we apply an additional mapping layer on top of that latent score to produce the clinically interpretable outputs. Specifically:
The latent score is used internally as the model’s continuous signal.
We then apply calibration and threshold mapping to translate that signal into:
- binary screening outputs (e.g., depression present/absent)
- severity bands (e.g., no_to_mild, mild_to_moderate, moderate_to_severe)
The API therefore returns clinically meaningful categories, rather than the raw latent regression value.
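As a rough illustration of that mapping layer, the sketch below turns a latent score into the two kinds of API output described above. The cut points and the screening threshold are invented for illustration; the production API's actual calibration is not public.

```python
import bisect

# Hypothetical mapping layer. SCREEN_CUT and BAND_CUTS are invented values;
# the production API's real calibration and thresholds are not public.
SCREEN_CUT = -1.0
BAND_CUTS = [-1.5, -0.5]
BANDS = ["no_to_mild", "mild_to_moderate", "moderate_to_severe"]

def screen(latent_score):
    """Binary screening output (e.g. depression present/absent)."""
    return latent_score >= SCREEN_CUT

def severity_band(latent_score):
    """Map the latent score into one of the severity bands."""
    return BANDS[bisect.bisect_left(BAND_CUTS, latent_score)]
```

The point is just that both outputs are deterministic functions of the same latent signal, with thresholds chosen during calibration.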
So if you’re using the open model outputs directly, what you’re seeing is the pre-calibration latent score, not a normalized PHQ-9 or GAD-7 regression target.
If someone wanted to approximate PHQ-9 / GAD-7 totals from that signal, you would typically apply a calibration function (e.g., isotonic/logistic mapping) trained on labeled validation data rather than simple threshold tuning.
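For a concrete starting point, here is a minimal sketch of such a calibration, not the library's actual method: a monotone piecewise-linear map from latent scores to PHQ-9 totals, fitted on labeled validation pairs. The anchor pairs below are invented for illustration; in practice you would use something like scikit-learn's `IsotonicRegression`, which additionally enforces monotonicity on noisy targets.

```python
import bisect

def fit_calibration(latents, phq9_totals):
    """Return a function latent -> estimated PHQ-9 total via linear
    interpolation between sorted (latent, total) anchor points."""
    pairs = sorted(zip(latents, phq9_totals))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    def predict(x):
        # Clamp predictions outside the observed latent range.
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_left(xs, x)
        # Linear interpolation between anchors i-1 and i.
        t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    return predict

# Invented validation anchors: latent -2.5 ~ PHQ-9 0, latent 0.5 ~ PHQ-9 27.
calibrate = fit_calibration([-2.5, -1.0, 0.5], [0, 10, 27])
```

With real labeled data the anchors would come from your validation set, and the quality of the calibration depends entirely on how representative that set is.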