
DeBERTa-v3-large + Psycholinguistic Features (5-Fold Ensemble)

> Task: 8-class emotion classification on sentence-level mental health discourse
> Backbone: microsoft/deberta-v3-large + 64 engineered features
> Validation: 5-Fold Stratified CV Macro F1 = 0.6326 ± 0.0115
> Compute: Trained on Kaggle Tesla P100 (16GB) with FP16


📖 Model Summary

This model is a hybrid emotion classifier designed to detect granular emotional states in mental health discussions. It combines the contextual understanding of DeBERTa-v3-large with a dense vector of 64 engineered psycholinguistic features (sentiment, readability, keyword indicators, and linguistic style).

✅ Key Characteristics

  • Not a single model: This is a 5-Fold Ensemble
  • Why ensemble? Voting/averaging across folds reduces variance, improves stability, and generalizes better on unseen mental health text.
  • Two-input system: The model consumes (a) tokenized text and (b) a 64-dim feature vector.

🧾 What it predicts

The model outputs one of 8 emotion categories:

| ID | Emotion |
|----|---------|
| 0 | sadness |
| 1 | hopelessness |
| 2 | loneliness |
| 3 | anger |
| 4 | worthlessness |
| 5 | suicide intent |
| 6 | emptiness |
| 7 | brain dysfunction |

🧩 Architecture Overview

| Component | Specification |
|-----------|---------------|
| Transformer backbone | microsoft/deberta-v3-large |
| Max sequence length | 512 |
| Auxiliary features | 64 engineered features (see below) |
| Fusion method | Concatenate [CLS] embedding + projected feature vector |
| Head | MLP classifier → 8 logits |
| Training strategy | 5-Fold Stratified Cross Validation + fold ensembling |
| Domain | Reddit mental health discourse (2015–2023) |
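The actual CustomDebertaModel class ships in model_utils.py and is not reproduced on this card. As a rough sketch of the fusion step alone (layer sizes, dropout, and names here are illustrative assumptions, not the trained configuration):

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Sketch of the fusion idea: concatenate the [CLS] embedding with a
    projected 64-dim feature vector, then classify with a small MLP.
    All sizes below the 1024/64/8 from the card are assumptions."""
    def __init__(self, hidden_size=1024, n_features=64, feat_proj=128, n_classes=8):
        super().__init__()
        self.feature_proj = nn.Sequential(nn.Linear(n_features, feat_proj), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size + feat_proj, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, n_classes),
        )

    def forward(self, cls_embedding, feature_vector):
        # cls_embedding: [batch, 1024] from DeBERTa; feature_vector: [batch, 64]
        projected = self.feature_proj(feature_vector)
        fused = torch.cat([cls_embedding, projected], dim=-1)
        return self.classifier(fused)  # [batch, 8] logits

head = FusionHead()
logits = head(torch.randn(2, 1024), torch.randn(2, 64))
print(logits.shape)  # torch.Size([2, 8])
```

In the full model, the DeBERTa backbone supplies the [CLS] embedding that feeds this head.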

🧠 Engineered Features (64 total)

Extracted from full_text = title + " " + sentence:

  • Basic (16): char count, word count, sentence count, punctuation counts/ratios, uppercase ratio, titlecase count, readability indices.
  • Sentiment (10): VADER compound/pos/neg/neu, intensity, sentiment flags, sentiment variance, positive keyword counts.
  • Emotion keyword signals (26): per-class keyword count/present/ratio + total keyword count + diversity score.
  • Linguistic (12): pronoun counts, negation counts, stopword ratio, unique word ratio, repetition, avg sentence length, ellipsis, ALLCAPS words.
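The full 64-feature extractor lives in model_utils.py; as a hedged illustration of the "Basic" and "Linguistic" groups only, with assumed feature names and definitions:

```python
import string

def basic_features(text: str) -> dict:
    """Illustrative subset of the engineered features; the real extractor
    computes 64 of these (including VADER sentiment and readability)."""
    words = text.split()
    n_chars = len(text)
    n_words = len(words)
    return {
        "char_count": n_chars,
        "word_count": n_words,
        "punct_count": sum(c in string.punctuation for c in text),
        "uppercase_ratio": sum(c.isupper() for c in text) / max(n_chars, 1),
        "unique_word_ratio": len({w.lower() for w in words}) / max(n_words, 1),
        "first_person_count": sum(
            w.lower().strip(string.punctuation) in {"i", "me", "my", "myself"}
            for w in words
        ),
        "allcaps_words": sum(w.isupper() and len(w) > 1 for w in words),
        "ellipsis_count": text.count("..."),
    }

feats = basic_features("I feel SO alone... nobody gets me.")
print(feats["first_person_count"])  # counts "I" and "me"
```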

✅ Feature extraction results from the training run:

  • Train features: (22820, 64)
  • Competition validation features: (4611, 64)
  • Saved artifacts:
    • train_with_features.csv
    • val_with_features.csv
    • feature_columns.json

⚠️ IMPORTANT: The model expects feature vectors in the exact same column order as feature_columns.json.


🗂️ Dataset & Training Data

Dataset summary

  • Total: 32,347 sentences from 5,154 Reddit posts (2015–2023)
  • Train: 22,820 labeled sentences
  • Competition validation file: 4,611 rows (unlabeled for leaderboard; used for submission generation)
  • Average engagement: ~167 upvotes/post (dataset info)

Training label distribution (from run)

  • sadness (0): 6956 (30.5%)
  • hopelessness (1): 4571 (20.0%)
  • loneliness (2): 3668 (16.1%)
  • anger (3): 3025 (13.3%)
  • worthlessness (4): 1586 (7.0%)
  • suicide intent (5): 1297 (5.7%)
  • emptiness (6): 1164 (5.1%)
  • brain dysfunction (7): 553 (2.4%)

Preprocessing

  • Constructed full_text by concatenating title + sentence
  • Extracted 64 engineered features from full_text
  • NaNs handled in features
  • Tokenization with truncation to max length 512
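A minimal sketch of these steps (column names and the NaN fill value are assumptions; the shipped CSVs already contain the extracted features):

```python
import numpy as np
import pandas as pd

# Hypothetical rows; real data ships in train_with_features.csv / val_with_features.csv
df = pd.DataFrame({
    "title": ["Feeling lost", None],
    "sentence": ["I don't know who I am anymore.", "Everything is fine, I guess."],
})

# 1. Build full_text (missing titles treated as empty strings)
df["full_text"] = (df["title"].fillna("") + " " + df["sentence"]).str.strip()

# 2. Feature extraction would produce 64 columns here; toy stand-in with NaNs:
features = np.array([[np.nan, 3.0], [0.5, np.nan]], dtype=np.float32)

# 3. NaN handling so the model never sees missing values
features = np.nan_to_num(features, nan=0.0)
print(df["full_text"].tolist())
```

Tokenization then runs on full_text with truncation to 512 tokens, as in the inference snippets below.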

📊 Evaluation Results

Primary evaluation metric

  • Macro F1 (balanced across classes), computed via 5-Fold Stratified CV

Results (from training output)

  • Average Macro F1: 0.6326
  • Standard deviation: 0.0115
  • Fold scores: 0.6286, 0.6378, 0.6163, 0.6510, 0.6291

Notes on stability

Low std dev indicates the model is stable across folds and not overly dependent on a single split.


💻 Usage

Inputs & Outputs

Inputs

  • title: str (can be empty)
  • sentence: str

The model consumes:

  1. tokenized text tensors: input_ids, attention_mask
  2. numeric tensor: feature_vector of shape [batch, 64]

Output

  • logits shape [batch, 8]
  • you can apply softmax → probabilities
  • final label is argmax(probs)
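For example, with NumPy:

```python
import numpy as np

EMOTIONS = ["sadness", "hopelessness", "loneliness", "anger",
            "worthlessness", "suicide intent", "emptiness", "brain dysfunction"]

logits = np.array([[2.1, 0.3, -1.2, 0.0, 0.8, -0.5, 0.1, -2.0]])  # [batch=1, 8], toy values

# Numerically stable softmax over the class axis
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

pred_id = int(probs.argmax(axis=-1)[0])
print(EMOTIONS[pred_id])  # "sadness" (index 0 has the largest logit)
```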

🐍 Inference (Kaggle-friendly)

⚠️ Because this model takes two inputs, you cannot directly use AutoModelForSequenceClassification alone. You must use the custom model wrapper that fuses transformer embeddings with the 64-dim feature vector.

Minimal inference snippet (single fold)

```python
import os
import json
import numpy as np
import torch
from transformers import AutoTokenizer

# You must provide these from your package/notebook:
# - CustomDebertaModel (hybrid model)
# - extract_features(text) -> np.array shape (64,) or torch.Tensor shape (1,64)
from model_utils import CustomDebertaModel, extract_features

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
MODEL_DIR = "/kaggle/input/your-model-dataset/model_fold_0"  # fold folder
WEIGHTS_PATH = os.path.join(MODEL_DIR, "model.safetensors")
FEATURE_COLUMNS_PATH = "/kaggle/input/your-model-dataset/feature_columns.json"

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")

# Load feature column order (VERY IMPORTANT)
with open(FEATURE_COLUMNS_PATH, "r") as f:
    feature_cols = json.load(f)

model = CustomDebertaModel(n_features=64, n_classes=8)

# Weights are stored as safetensors, so load them with safetensors:
from safetensors.torch import load_file
state_dict = load_file(WEIGHTS_PATH)
model.load_state_dict(state_dict, strict=True)

model.to(DEVICE).eval()

text = "I feel like I'm drowning in my own thoughts."

# Tokenize
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=512,
    padding=False
)
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

# Extract features (must match training feature order)
feat = extract_features(text)  # may return a raw dict OR an array
# If feat is a dict, reorder it using feature_cols:
if isinstance(feat, dict):
    feat_vec = np.array([feat[c] for c in feature_cols], dtype=np.float32)[None, :]
else:
    feat_vec = np.array(feat, dtype=np.float32)[None, :]

feature_vector = torch.tensor(feat_vec).to(DEVICE)

with torch.no_grad():
    logits = model(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        feature_vector=feature_vector,
    )
    probs = torch.softmax(logits, dim=-1).cpu().numpy()[0]
    pred_id = int(probs.argmax())

print("probs:", probs)
print("pred_id:", pred_id)
```

🗳️ Ensemble Inference (Recommended)

For best performance, predictions from all 5 trained folds should be combined by averaging class probabilities. This reduces variance and improves robustness on unseen mental health text.

Each fold was trained independently using stratified splits. During inference, each fold produces a probability distribution over the 8 emotion classes, and the final prediction is computed as:

> Final Probability = Mean(Fold₁ … Fold₅ Probabilities)
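A small numeric illustration of this averaging (toy probabilities, not real model outputs):

```python
import numpy as np

# Toy per-fold probability rows (5 folds x 8 classes); each row sums to 1
rng = np.random.default_rng(0)
fold_probs = rng.dirichlet(np.ones(8), size=5)

final_probs = fold_probs.mean(axis=0)  # Mean(Fold1 ... Fold5)
pred = int(final_probs.argmax())
print(final_probs.sum())  # averaging distributions yields a distribution (~1.0)
```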


🐍 Ensemble Inference Code

```python
import os
import json
import numpy as np
import torch
from transformers import AutoTokenizer
from safetensors.torch import load_file

# Custom hybrid model + feature extractor
from model_utils import CustomDebertaModel, extract_features

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Paths (adjust to your Kaggle dataset structure)
MODEL_ROOT = "/kaggle/input/fragment-of-feeling-model"
FEATURE_COLUMNS_PATH = os.path.join(MODEL_ROOT, "feature_columns.json")

FOLD_DIRS = [
    "model_fold_0",
    "model_fold_1",
    "model_fold_2",
    "model_fold_3",
    "model_fold_4",
]

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")

# Load feature column order
with open(FEATURE_COLUMNS_PATH, "r") as f:
    feature_columns = json.load(f)

# Load all fold models
models = []
for fold in FOLD_DIRS:
    model = CustomDebertaModel(n_features=64, n_classes=8)
    weights = load_file(os.path.join(MODEL_ROOT, fold, "model.safetensors"))
    model.load_state_dict(weights, strict=True)
    model.to(DEVICE).eval()
    models.append(model)

def ensemble_predict(text: str):
    # Tokenize
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )
    inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

    # Feature extraction
    raw_features = extract_features(text)

    if isinstance(raw_features, dict):
        feature_vector = np.array(
            [raw_features[col] for col in feature_columns],
            dtype=np.float32
        )[None, :]
    else:
        feature_vector = np.array(raw_features, dtype=np.float32)[None, :]

    feature_vector = torch.tensor(feature_vector).to(DEVICE)

    # Average probabilities
    probs_sum = None
    with torch.no_grad():
        for model in models:
            logits = model(
                input_ids=inputs["input_ids"],
                attention_mask=inputs["attention_mask"],
                feature_vector=feature_vector
            )
            probs = torch.softmax(logits, dim=-1)
            probs_sum = probs if probs_sum is None else probs_sum + probs

    probs_avg = (probs_sum / len(models)).cpu().numpy()[0]
    pred_class = int(probs_avg.argmax())

    return pred_class, probs_avg

# Example
pred, probs = ensemble_predict("Nothing feels real anymore.")
print("Predicted class:", pred)
print("Probabilities:", probs)
```

πŸ“ Input & Output Specification

Output

| Name | Shape | Type | Description |
|------|-------|------|-------------|
| logits | [batch, 8] | float32 | Raw, unnormalized scores for each emotion class |
| probabilities | [batch, 8] | float32 | Softmax-normalized class probabilities |

The final predicted emotion is obtained via argmax(probabilities).


⚙️ System & Integration Details

Standalone or System Component

This model can operate as a standalone text classifier, but is designed to function as part of a hybrid NLP system when combined with its preprocessing pipeline (feature extraction + tokenization).

Upstream Requirements

Before inference, the following steps must be completed:

  1. Concatenate title and sentence into a single full_text field
  2. Extract the 64-dimensional psycholinguistic feature vector from full_text
  3. Ensure feature ordering exactly matches feature_columns.json
  4. Tokenize text using microsoft/deberta-v3-large tokenizer (max length = 512)

Failure to follow these steps will result in invalid predictions.

Downstream Dependencies

  • Model outputs are probabilistic signals, not deterministic labels
  • Recommended downstream handling:
    • Confidence thresholding
    • Abstain / fallback logic for low-confidence predictions
    • Human-in-the-loop review for sensitive cases
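A hedged sketch of such downstream handling (the 0.5 threshold is an arbitrary illustration, not a calibrated value):

```python
def route_prediction(probs, threshold=0.5):
    """Return ('accept', class_id) for confident predictions, else
    ('review', class_id) so a human can check low-confidence cases."""
    class_id = max(range(len(probs)), key=probs.__getitem__)
    if probs[class_id] >= threshold:
        return "accept", class_id
    return "review", class_id

print(route_prediction([0.05, 0.8, 0.05, 0.02, 0.03, 0.02, 0.02, 0.01]))  # ('accept', 1)
print(route_prediction([0.2, 0.18, 0.15, 0.12, 0.1, 0.1, 0.08, 0.07]))   # ('review', 0)
```

In practice, sensitive classes such as suicide intent could be routed to human review regardless of confidence.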

🛠️ Implementation Requirements

Training Hardware

| Component | Specification |
|-----------|---------------|
| GPU | NVIDIA Tesla P100 (16GB) |
| Environment | Kaggle Notebook |
| Precision | FP16 (Mixed Precision) |
| Training Strategy | 5-Fold Stratified Cross Validation |

Inference Hardware

| Component | Specification |
|-----------|---------------|
| GPU (Recommended) | NVIDIA T4 / P100 |
| CPU | Supported (higher latency) |

🧪 Software Stack

The following software versions were used during training and inference:

  • Python β‰₯ 3.10
  • torch β‰₯ 2.0.0
  • transformers β‰₯ 4.30.0
  • nltk (VADER sentiment analysis)
  • textstat (readability metrics)
  • safetensors (secure weight loading)

⚡ Model Characteristics

Initialization

  • Initialized from a pretrained language model
  • Backbone: microsoft/deberta-v3-large
  • Classification head randomly initialized and fine-tuned on task labels

Model Statistics

| Attribute | Value | Notes |
|-----------|-------|-------|
| Total Parameters | ~435M | Backbone + fusion head |
| Transformer Layers | 24 | DeBERTa-Large |
| Hidden Size | 1024 | [CLS] embedding |
| Precision | FP16 | Mixed precision training |
| Pruning | None | Full model retained |
| Quantization | None | No post-training compression |

📂 Data Overview

Training Data

| Attribute | Description |
|-----------|-------------|
| Source | Reddit mental health communities |
| Communities | r/depression, r/anxiety, r/suicidewatch |
| Total Posts | 5,154 |
| Total Sentences | 32,347 |
| Labeled Training Set | 22,820 sentences |
| Time Range | 2015–2023 |

Preprocessing Steps

  • Sentence-level segmentation
  • Removal of usernames, URLs, and identifiable metadata
  • Construction of full_text = title + sentence
  • Feature extraction + NaN handling

Demographic Information

  • No explicit demographic labels provided
  • User population is anonymous
  • Likely skew towards ages 18–34 based on Reddit usage patterns

📊 Evaluation Results

Evaluation Protocol

  • Metric: Macro F1 Score
  • Validation Strategy: 5-Fold Stratified Cross Validation
  • Objective: Balanced performance across all emotion classes

Results Summary

| Metric | Value |
|--------|-------|
| Macro F1 (mean) | 0.6326 |
| Standard Deviation | 0.0115 |
| Fold Scores | 0.6286, 0.6378, 0.6163, 0.6510, 0.6291 |

Low variance across folds indicates stable generalization.


🔎 Subgroup Analysis & Observed Biases

Identified Behaviors

  • Keyword Sensitivity:
    Explicit distress terms (e.g., "suicide", "kill myself") strongly influence predictions.

  • Length Bias:
    Best performance on inputs between 10–30 words.
    Higher variance observed for very short inputs (<5 words).

  • Domain Dependence:
    Model is optimized for Reddit-style, informal English text.

Mitigation Strategies

  • Ensemble averaging across folds
  • Probability-based confidence thresholds
  • Optional human review for high-risk outputs

⚠️ Usage Limitations

THIS MODEL IS NOT A MEDICAL DEVICE

  • Not clinically validated
  • Not suitable for autonomous mental health diagnosis
  • Not safe for automated suicide-risk intervention systems
  • Intended strictly for research, benchmarking, and educational use

🧭 Ethics & Responsible Use

  • All training data is anonymized
  • No personally identifiable information is stored or inferred
  • Outputs reflect language patterns, not mental states
  • High-risk applications must include human oversight

📦 Recommended Model Package Structure

```
fragment-of-feeling-model/
├── model_fold_0/
│   └── model.safetensors
├── model_fold_1/
│   └── model.safetensors
├── model_fold_2/
│   └── model.safetensors
├── model_fold_3/
│   └── model.safetensors
├── model_fold_4/
│   └── model.safetensors
├── feature_columns.json
├── model_utils.py
└── README.md
```

📣 Reported Training Output

  • Cross-validation completed successfully
  • Macro F1 (5-Fold CV): 0.6326
  • Standard deviation: 0.0115
  • Submission file generated without errors

✅ Intended Use Statement

This model is intended for research and educational purposes to study emotional signal detection in anonymized mental health text.
Any real-world deployment must include robust safeguards, calibration, and human-in-the-loop review.

Related code: https://www.kaggle.com/models/sanjidh090/fragment_final_model/code
