# DeBERTa-v3-large + Psycholinguistic Features (5-Fold Ensemble)
> Task: 8-class emotion classification on sentence-level mental health discourse
> Backbone: microsoft/deberta-v3-large + 64 engineered features
> Validation: 5-Fold Stratified CV Macro F1 = 0.6326 ± 0.0115
> Compute: Trained on Kaggle Tesla P100 (16GB) with FP16
## Model Summary
This model is a hybrid emotion classifier designed to detect granular emotional states in mental health discussions. It combines the contextual understanding of DeBERTa-v3-large with a dense vector of 64 engineered psycholinguistic features (sentiment, readability, keyword indicators, and linguistic style).
## Key Characteristics
- Not a single model: This is a 5-Fold Ensemble
- Why ensemble? Voting/averaging across folds reduces variance, improves stability, and generalizes better on unseen mental health text.
- Two-input system: The model consumes (a) tokenized text and (b) a 64-dim feature vector.
## What it predicts
The model outputs one of 8 emotion categories:
| ID | Emotion |
|---|---|
| 0 | sadness |
| 1 | hopelessness |
| 2 | loneliness |
| 3 | anger |
| 4 | worthlessness |
| 5 | suicide intent |
| 6 | emptiness |
| 7 | brain dysfunction |
## Architecture Overview
| Component | Specification |
|---|---|
| Transformer backbone | microsoft/deberta-v3-large |
| Max sequence length | 512 |
| Auxiliary features | 64 engineered features (see below) |
| Fusion method | Concatenate [CLS] embedding + projected feature vector |
| Head | MLP classifier → 8 logits |
| Training strategy | 5-Fold Stratified Cross Validation + fold ensembling |
| Domain | Reddit mental health discourse (2015–2023) |
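The fusion step described above can be sketched as follows. This is a minimal illustration, not the actual `CustomDebertaModel` shipped in `model_utils.py`: the class name `HybridFusionHead`, the feature projection width (128), and the MLP layout are assumptions, and the random `cls_embedding` stands in for the real DeBERTa `[CLS]` output.

```python
import torch
import torch.nn as nn

class HybridFusionHead(nn.Module):
    """Fuses a transformer [CLS] embedding with a 64-dim feature vector."""
    def __init__(self, hidden_size=1024, n_features=64, proj_dim=128, n_classes=8):
        super().__init__()
        # Project raw psycholinguistic features into a learned subspace
        self.feature_proj = nn.Sequential(
            nn.Linear(n_features, proj_dim),
            nn.ReLU(),
        )
        # MLP classifier over the concatenated representation
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size + proj_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, n_classes),
        )

    def forward(self, cls_embedding, feature_vector):
        fused = torch.cat([cls_embedding, self.feature_proj(feature_vector)], dim=-1)
        return self.classifier(fused)

# In the real model, cls_embedding would come from
# AutoModel.from_pretrained("microsoft/deberta-v3-large"); here it is random.
head = HybridFusionHead()
cls_embedding = torch.randn(2, 1024)   # [batch, hidden]
feature_vector = torch.randn(2, 64)    # [batch, 64]
logits = head(cls_embedding, feature_vector)
print(logits.shape)  # torch.Size([2, 8])
```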
## Engineered Features (64 total)
Extracted from `full_text = title + " " + sentence`:
- Basic (16): char count, word count, sentence count, punctuation counts/ratios, uppercase ratio, titlecase count, readability indices.
- Sentiment (10): VADER compound/pos/neg/neu, intensity, sentiment flags, sentiment variance, positive keyword counts.
- Emotion keyword signals (26): per-class keyword count/present/ratio + total keyword count + diversity score.
- Linguistic (12): pronoun counts, negation counts, stopword ratio, unique word ratio, repetition, avg sentence length, ellipsis, ALLCAPS words.
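A handful of the basic and linguistic features above can be sketched in plain Python. This is illustrative only: the real pipeline also uses VADER (nltk) and textstat, and the feature names below are not necessarily the ones in `feature_columns.json`.

```python
import string

def extract_basic_features(text: str) -> dict:
    """Illustrative subset of the 64 engineered features."""
    words = text.split()
    n_chars = len(text)
    n_words = len(words)
    return {
        "char_count": n_chars,
        "word_count": n_words,
        "punct_count": sum(c in string.punctuation for c in text),
        "uppercase_ratio": sum(c.isupper() for c in text) / max(n_chars, 1),
        "allcaps_words": sum(w.isupper() and len(w) > 1 for w in words),
        "unique_word_ratio": len({w.lower() for w in words}) / max(n_words, 1),
        "ellipsis_count": text.count("..."),
        "negation_count": sum(w.lower() in {"no", "not", "never", "nothing"} for w in words),
    }

feats = extract_basic_features("I can't do this anymore... NOTHING helps.")
print(feats["ellipsis_count"], feats["allcaps_words"])  # 1 1
```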
Feature extraction results from the training run:
- Train features: `(22820, 64)`
- Competition validation features: `(4611, 64)`
- Saved artifacts: `train_with_features.csv`, `val_with_features.csv`, `feature_columns.json`
**IMPORTANT:** The model expects feature vectors in the exact same column order as `feature_columns.json`.
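Because predictions silently degrade if the column order drifts, an explicit alignment step is worth having. A minimal sketch, with a hypothetical `align_features` helper and toy columns standing in for the 64 real ones:

```python
import numpy as np

def align_features(feat: dict, feature_cols: list) -> np.ndarray:
    """Reorder a feature dict into the exact training-time column order."""
    missing = [c for c in feature_cols if c not in feat]
    if missing:
        raise KeyError(f"Missing features: {missing}")
    return np.array([feat[c] for c in feature_cols], dtype=np.float32)

cols = ["char_count", "word_count"]  # toy stand-in for feature_columns.json
vec = align_features({"word_count": 7, "char_count": 42}, cols)
print(vec)  # [42.  7.]
```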
## Dataset & Training Data
Dataset summary
- Total: 32,347 sentences from 5,154 Reddit posts (2015–2023)
- Train: 22,820 labeled sentences
- Competition validation file: 4,611 rows (unlabeled for leaderboard; used for submission generation)
- Average engagement: ~167 upvotes/post (dataset info)
Training label distribution (from run)
- sadness (0): 6956 (30.5%)
- hopelessness (1): 4571 (20.0%)
- loneliness (2): 3668 (16.1%)
- anger (3): 3025 (13.3%)
- worthlessness (4): 1586 (7.0%)
- suicide intent (5): 1297 (5.7%)
- emptiness (6): 1164 (5.1%)
- brain dysfunction (7): 553 (2.4%)
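Given the skew above (sadness is roughly 12x more frequent than brain dysfunction), inverse-frequency class weighting is one common mitigation. Whether the training run actually used weighting is not stated, so treat this as a sketch only:

```python
import numpy as np

# Label counts from the training distribution above (classes 0-7)
counts = np.array([6956, 4571, 3668, 3025, 1586, 1297, 1164, 553])

# Inverse-frequency weights, normalized so they average to 1.0;
# rare classes (e.g. brain dysfunction) get proportionally larger weights.
weights = counts.sum() / (len(counts) * counts)
print(weights.round(2))
```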
Preprocessing
- Constructed `full_text` by concatenating `title` + `sentence`
- Extracted 64 engineered features from `full_text`
- NaNs handled in features
- Tokenization with truncation to max length 512
## Evaluation Results
Primary evaluation metric
- Macro F1 (balanced across classes), computed via 5-Fold Stratified CV
Results (from training output)
- Average Macro F1: 0.6326
- Standard deviation: 0.0115
- Fold scores: 0.6286, 0.6378, 0.6163, 0.6510, 0.6291
Notes on stability
Low std dev indicates the model is stable across folds and not overly dependent on a single split.
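The reported mean and standard deviation can be reproduced directly from the fold scores:

```python
import numpy as np

fold_scores = np.array([0.6286, 0.6378, 0.6163, 0.6510, 0.6291])
print(f"Mean Macro F1: {fold_scores.mean():.4f}")  # 0.6326
print(f"Std deviation: {fold_scores.std():.4f}")   # 0.0115
```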
## Usage
Inputs & Outputs
Inputs
- `title: str` (can be empty)
- `sentence: str`
The model consumes:
- tokenized text tensors: `input_ids`, `attention_mask`
- numeric tensor: `feature_vector` of shape `[batch, 64]`
Output
- `logits` of shape `[batch, 8]`
- apply softmax → probabilities
- final label is `argmax(probs)`
## Inference (Kaggle-friendly)
Because this model takes two inputs, you cannot directly use `AutoModelForSequenceClassification` alone. You must use the custom model wrapper that fuses transformer embeddings with the 64-dim feature vector.
Minimal inference snippet (single fold)
```python
import os
import json
import numpy as np
import torch
from transformers import AutoTokenizer

# You must provide these from your package/notebook:
# - CustomDebertaModel (hybrid model)
# - extract_features(text) -> np.array shape (64,) or torch.Tensor shape (1, 64)
from model_utils import CustomDebertaModel, extract_features

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
MODEL_DIR = "/kaggle/input/your-model-dataset/model_fold_0"  # fold folder
WEIGHTS_PATH = os.path.join(MODEL_DIR, "model.safetensors")
FEATURE_COLUMNS_PATH = "/kaggle/input/your-model-dataset/feature_columns.json"

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")

# Load feature column order (VERY IMPORTANT)
with open(FEATURE_COLUMNS_PATH, "r") as f:
    feature_cols = json.load(f)

model = CustomDebertaModel(n_features=64, n_classes=8)

# The weights are stored as safetensors, so load them with the safetensors API:
from safetensors.torch import load_file
state_dict = load_file(WEIGHTS_PATH)
model.load_state_dict(state_dict, strict=True)
model.to(DEVICE).eval()

text = "I feel like I'm drowning in my own thoughts."

# Tokenize
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=512,
    padding=False,
)
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

# Extract features (must match training feature order)
feat = extract_features(text)  # returns a raw dict OR an array

# If extract_features returns a dict, reorder it using feature_cols:
if isinstance(feat, dict):
    feat_vec = np.array([feat[c] for c in feature_cols], dtype=np.float32)[None, :]
else:
    feat_vec = np.array(feat, dtype=np.float32)[None, :]
feature_vector = torch.tensor(feat_vec).to(DEVICE)

with torch.no_grad():
    logits = model(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        feature_vector=feature_vector,
    )

probs = torch.softmax(logits, dim=-1).cpu().numpy()[0]
pred_id = int(probs.argmax())
print("probs:", probs)
print("pred_id:", pred_id)
```
## Ensemble Inference (Recommended)
For best performance, predictions from all 5 trained folds should be combined by averaging class probabilities. This reduces variance and improves robustness on unseen mental health text.
Each fold was trained independently using stratified splits. During inference, each fold produces a probability distribution over the 8 emotion classes, and the final prediction is computed as:
> Final Probability = Mean(Fold₁ … Fold₅ Probabilities)
## Ensemble Inference Code
```python
import os
import json
import numpy as np
import torch
from transformers import AutoTokenizer
from safetensors.torch import load_file

# Custom hybrid model + feature extractor
from model_utils import CustomDebertaModel, extract_features

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Paths (adjust to your Kaggle dataset structure)
MODEL_ROOT = "/kaggle/input/fragment-of-feeling-model"
FEATURE_COLUMNS_PATH = os.path.join(MODEL_ROOT, "feature_columns.json")
FOLD_DIRS = [
    "model_fold_0",
    "model_fold_1",
    "model_fold_2",
    "model_fold_3",
    "model_fold_4",
]

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")

# Load feature column order
with open(FEATURE_COLUMNS_PATH, "r") as f:
    feature_columns = json.load(f)

# Load all fold models
models = []
for fold in FOLD_DIRS:
    model = CustomDebertaModel(n_features=64, n_classes=8)
    weights = load_file(os.path.join(MODEL_ROOT, fold, "model.safetensors"))
    model.load_state_dict(weights, strict=True)
    model.to(DEVICE).eval()
    models.append(model)

def ensemble_predict(text: str):
    # Tokenize
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

    # Feature extraction (must match training feature order)
    raw_features = extract_features(text)
    if isinstance(raw_features, dict):
        feature_vector = np.array(
            [raw_features[col] for col in feature_columns],
            dtype=np.float32,
        )[None, :]
    else:
        feature_vector = np.array(raw_features, dtype=np.float32)[None, :]
    feature_vector = torch.tensor(feature_vector).to(DEVICE)

    # Average probabilities across folds
    probs_sum = None
    with torch.no_grad():
        for model in models:
            logits = model(
                input_ids=inputs["input_ids"],
                attention_mask=inputs["attention_mask"],
                feature_vector=feature_vector,
            )
            probs = torch.softmax(logits, dim=-1)
            probs_sum = probs if probs_sum is None else probs_sum + probs

    probs_avg = (probs_sum / len(models)).cpu().numpy()[0]
    pred_class = int(probs_avg.argmax())
    return pred_class, probs_avg

# Example
pred, probs = ensemble_predict("Nothing feels real anymore.")
print("Predicted class:", pred)
print("Probabilities:", probs)
```
## Input & Output Specification
Output
| Name | Shape | Type | Description |
|---|---|---|---|
| logits | [batch, 8] | float32 | Raw, unnormalized scores for each emotion class |
| probabilities | [batch, 8] | float32 | Softmax-normalized class probabilities |
The final predicted emotion is obtained via `argmax(probabilities)`.
## System & Integration Details
Standalone or System Component
This model can operate as a standalone text classifier, but is designed to function as part of a hybrid NLP system when combined with its preprocessing pipeline (feature extraction + tokenization).
Upstream Requirements
Before inference, the following steps must be completed:
- Concatenate `title` and `sentence` into a single `full_text` field
- Extract the 64-dimensional psycholinguistic feature vector from `full_text`
- Ensure feature ordering exactly matches `feature_columns.json`
- Tokenize text with the `microsoft/deberta-v3-large` tokenizer (max length = 512)
Failure to follow these steps will result in invalid predictions.
Downstream Dependencies
- Model outputs are probabilistic signals, not deterministic labels
- Recommended downstream handling:
- Confidence thresholding
- Abstain / fallback logic for low-confidence predictions
- Human-in-the-loop review for sensitive cases
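The confidence-thresholding and abstain logic above can be sketched as follows; the 0.5 threshold is an illustrative choice, not a calibrated value from the training run.

```python
import numpy as np

EMOTIONS = ["sadness", "hopelessness", "loneliness", "anger",
            "worthlessness", "suicide intent", "emptiness", "brain dysfunction"]

def decide(probs: np.ndarray, threshold: float = 0.5):
    """Return a label only when the top probability clears the threshold."""
    pred = int(probs.argmax())
    if probs[pred] < threshold:
        return "ABSTAIN"  # route to fallback logic / human review
    return EMOTIONS[pred]

print(decide(np.array([0.70, 0.05, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03])))  # sadness
print(decide(np.array([0.20, 0.15, 0.15, 0.10, 0.10, 0.10, 0.10, 0.10])))  # ABSTAIN
```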
## Implementation Requirements
Training Hardware
| Component | Specification |
|---|---|
| GPU | NVIDIA Tesla P100 (16GB) |
| Environment | Kaggle Notebook |
| Precision | FP16 (Mixed Precision) |
| Training Strategy | 5-Fold Stratified Cross Validation |
Inference Hardware
| Component | Specification |
|---|---|
| GPU (Recommended) | NVIDIA T4 / P100 |
| CPU | Supported (higher latency) |
## Software Stack
The following software versions were used during training and inference:
- Python β₯ 3.10
- torch β₯ 2.0.0
- transformers β₯ 4.30.0
- nltk (VADER sentiment analysis)
- textstat (readability metrics)
- safetensors (secure weight loading)
## Model Characteristics
Initialization
- Initialized from a pretrained language model
- Backbone: `microsoft/deberta-v3-large`
- Classification head randomly initialized and fine-tuned on task labels
Model Statistics
| Attribute | Value | Notes |
|---|---|---|
| Total Parameters | ~435M | Backbone + fusion head |
| Transformer Layers | 24 | DeBERTa-Large |
| Hidden Size | 1024 | CLS embedding |
| Precision | FP16 | Mixed precision training |
| Pruning | None | Full model retained |
| Quantization | None | No post-training compression |
## Data Overview
Training Data
| Attribute | Description |
|---|---|
| Source | Reddit mental health communities |
| Communities | r/depression, r/anxiety, r/suicidewatch |
| Total Posts | 5,154 |
| Total Sentences | 32,347 |
| Labeled Training Set | 22,820 sentences |
| Time Range | 2015–2023 |
Preprocessing Steps
- Sentence-level segmentation
- Removal of usernames, URLs, and identifiable metadata
- Construction of `full_text = title + " " + sentence`
- Feature extraction + NaN handling
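The username/URL removal step might look like the following; the exact patterns used during dataset construction are not published, so the `anonymize` helper below is an approximation.

```python
import re

def anonymize(text: str) -> str:
    """Strip Reddit usernames and URLs, approximating the stated preprocessing."""
    text = re.sub(r"https?://\S+", "", text)        # URLs
    text = re.sub(r"/?u/[A-Za-z0-9_-]+", "", text)  # u/username mentions
    return re.sub(r"\s+", " ", text).strip()        # collapse leftover whitespace

print(anonymize("thanks u/someone, see https://example.com for more"))
```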
Demographic Information
- No explicit demographic labels provided
- User population is anonymous
- Likely skew towards ages 18–34 based on Reddit usage patterns
## Evaluation Results
Evaluation Protocol
- Metric: Macro F1 Score
- Validation Strategy: 5-Fold Stratified Cross Validation
- Objective: Balanced performance across all emotion classes
Results Summary
| Metric | Value |
|---|---|
| Macro F1 (mean) | 0.6326 |
| Standard Deviation | 0.0115 |
| Fold Scores | 0.6286, 0.6378, 0.6163, 0.6510, 0.6291 |
Low variance across folds indicates stable generalization.
## Subgroup Analysis & Observed Biases
Identified Behaviors
- Keyword Sensitivity: explicit distress terms (e.g., "suicide", "kill myself") strongly influence predictions.
- Length Bias: best performance on inputs between 10–30 words; higher variance observed for very short inputs (<5 words).
- Domain Dependence: the model is optimized for Reddit-style, informal English text.
Mitigation Strategies
- Ensemble averaging across folds
- Probability-based confidence thresholds
- Optional human review for high-risk outputs
## Usage Limitations
THIS MODEL IS NOT A MEDICAL DEVICE
- Not clinically validated
- Not suitable for autonomous mental health diagnosis
- Not safe for automated suicide-risk intervention systems
- Intended strictly for research, benchmarking, and educational use
## Ethics & Responsible Use
- All training data is anonymized
- No personally identifiable information is stored or inferred
- Outputs reflect language patterns, not mental states
- High-risk applications must include human oversight
## Recommended Model Package Structure
```
fragment-of-feeling-model/
├── model_fold_0/
│   └── model.safetensors
├── model_fold_1/
│   └── model.safetensors
├── model_fold_2/
│   └── model.safetensors
├── model_fold_3/
│   └── model.safetensors
├── model_fold_4/
│   └── model.safetensors
├── feature_columns.json
├── model_utils.py
└── README.md
```
## Reported Training Output
- Cross-validation completed successfully
- Macro F1 (5-Fold CV): 0.6326
- Standard deviation: 0.0115
- Submission file generated without errors
## Intended Use Statement
This model is intended for research and educational purposes to study emotional signal detection in anonymized mental health text.
Any real-world deployment must include robust safeguards, calibration, and human-in-the-loop review.
Related code can be found at https://www.kaggle.com/models/sanjidh090/fragment_final_model/code