How to un-normalize unquantized predictions
Hello! Thanks for all your help so far:)
How do I un-normalize the unquantized predictions, and what function did you use to normalize them? I want to make sure I can properly compare the predictions to the true scores in a regression context.
Thanks!
Rahul
Hi Rahul,
You're welcome!
By normalize do you mean quantize? If you want the raw, un-quantized scores, just pass quantize=False to one of the Pipeline.run_on_* functions.
The quantization uses a sorted list of thresholds for each task, which are configurable at the Pipeline level; for the defaults see inference_thresholds in config.py. The quantization happens in the last function in model.py by calling torch.searchsorted to figure out where in the sorted list of thresholds a particular score lands.
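To make the bucketing concrete, here is a minimal sketch of how searching a sorted threshold list assigns a raw score to a bucket. The repo does this with `torch.searchsorted`; the stdlib `bisect.bisect_left` behaves the same way for a single score, and the threshold values below are invented for illustration (the real defaults live in `inference_thresholds` in config.py).

```python
import bisect

# Hypothetical thresholds, for illustration only; see inference_thresholds
# in config.py for the actual defaults.
PHQ9_THRESHOLDS = [-1.5, -0.5, 0.0]

def quantize(raw_score, thresholds):
    """Return the index of the bucket the raw score falls into,
    mirroring what torch.searchsorted does over a sorted threshold list."""
    return bisect.bisect_left(thresholds, raw_score)

print(quantize(-2.0, PHQ9_THRESHOLDS))  # below every threshold -> bucket 0
print(quantize(0.3, PHQ9_THRESHOLDS))   # above every threshold -> bucket 3
```

A score between two thresholds lands in the bucket between them, so n thresholds give n + 1 buckets.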
Noah
Yeah, fair question; to clarify, I don't quantize. The output I get is, as the model card says, "raw float values which correlate monotonically with PHQ-9 and GAD-7." The values I get out are in a range of roughly -2.5 to 0.5, so I wasn't sure how to convert the true scores (0-27) to this normalized range, or vice versa.
Oh, ok. How to best map raw scores to the most likely corresponding PHQ-9 sums depends on what metric you use for "best". The tuning folder has tools for this; see the "Tuning thresholds" section of the DAM model card, specifically the "Optimal Tuning for Multi-class Tasks" subsection, for some examples. Let me know if it's unclear how this is intended to answer your question.
Hi everyone. I would kindly like to ask about the model that predicts the emotions output. Are there any plans to release this model?
Thanks @NDStein ! Maybe I'm misunderstanding; from a regression standpoint (e.g. with quantize=False), I was expecting the model to output a continuous prediction between 0 and 27 for PHQ-9 and 0 and 21 for GAD-7. The output I get instead is continuous but between roughly -2.5 and 0.5. My interpretation was that these scores map to an actual PHQ-9 or GAD-7 sum score, but that some kind of conversion or normalization (e.g. z-scoring) is happening in the DAM model. If so, I'm curious what that is. I didn't think threshold tuning was what I wanted, since I'm not building any kind of classifier but treating this as a continuous regression problem.
@rfbrito Thanks for the question. I think the confusion is coming from how the model outputs were structured.
The model itself does not directly regress to the raw PHQ-9 (0–27) or GAD-7 (0–21) totals. Instead, the underlying model produces a continuous latent score that reflects the model’s estimate of symptom severity from vocal features. Those raw outputs can fall in ranges like the one you’re seeing (e.g., roughly −2.5 to 0.5 depending on the checkpoint and calibration).
In the production API (https://www.kintsugihealth.com/api/voice-api#predict-results), we apply an additional mapping layer on top of that latent score to produce the clinically interpretable outputs. Specifically:
The latent score is used internally as the model’s continuous signal.
We then apply calibration and threshold mapping to translate that signal into:
- binary screening outputs (e.g., depression present/absent)
- severity bands (e.g., no_to_mild, mild_to_moderate, moderate_to_severe)
The API therefore returns clinically meaningful categories, rather than the raw latent regression value.
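As a rough illustration of that mapping layer, the sketch below turns a latent score into the two kinds of API output described above. The cut points and the screening threshold are invented for illustration; the production API's actual calibration is not public.

```python
import bisect

# Hypothetical mapping layer. SCREEN_CUT and BAND_CUTS are invented values;
# the production API's real calibration and thresholds are not public.
SCREEN_CUT = -1.0
BAND_CUTS = [-1.5, -0.5]
BANDS = ["no_to_mild", "mild_to_moderate", "moderate_to_severe"]

def screen(latent_score):
    """Binary screening output (e.g. depression present/absent)."""
    return latent_score >= SCREEN_CUT

def severity_band(latent_score):
    """Map the latent score into one of the severity bands."""
    return BANDS[bisect.bisect_left(BAND_CUTS, latent_score)]
```

The point is just that both outputs are deterministic functions of the same latent signal, with thresholds chosen during calibration.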
So if you’re using the open model outputs directly, what you’re seeing is the pre-calibration latent score, not a normalized PHQ-9 or GAD-7 regression target.
If someone wanted to approximate PHQ-9 / GAD-7 totals from that signal, you would typically apply a calibration function (e.g., isotonic/logistic mapping) trained on labeled validation data rather than simple threshold tuning.
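For a concrete starting point, here is a minimal sketch of such a calibration, not the library's actual method: a monotone piecewise-linear map from latent scores to PHQ-9 totals, fitted on labeled validation pairs. The anchor pairs below are invented for illustration; in practice you would use something like scikit-learn's `IsotonicRegression`, which additionally enforces monotonicity on noisy targets.

```python
import bisect

def fit_calibration(latents, phq9_totals):
    """Return a function latent -> estimated PHQ-9 total via linear
    interpolation between sorted (latent, total) anchor points."""
    pairs = sorted(zip(latents, phq9_totals))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    def predict(x):
        # Clamp predictions outside the observed latent range.
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_left(xs, x)
        # Linear interpolation between anchors i-1 and i.
        t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    return predict

# Invented validation anchors: latent -2.5 ~ PHQ-9 0, latent 0.5 ~ PHQ-9 27.
calibrate = fit_calibration([-2.5, -1.0, 0.5], [0, 10, 27])
```

With real labeled data the anchors would come from your validation set, and the quality of the calibration depends entirely on how representative that set is.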