# granite-embedding-107m-multilingual-chat-difficulty
A fine-tuned model that estimates the difficulty of multilingual, multi-turn human–AI conversations based on reasoning complexity.
- Input: a condensed conversation in the format `<|user|>prompt<|assistant|>reply...`
- Output: a normalized difficulty score (lower scores indicate easier conversations)
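As a sketch of the input format, a small helper (hypothetical, not shipped with this model) could condense a list of `(role, text)` turns into the expected string:

```python
def condense(turns):
    """Condense alternating (role, text) turns into the model's input format.

    Illustrative helper: the <|user|> and <|assistant|> tags come from the
    format described above; the function itself is an assumption.
    """
    tags = {"user": "<|user|>", "assistant": "<|assistant|>"}
    return "".join(tags[role] + text for role, text in turns)

condense([("user", "What is 2+2?"), ("assistant", "4.")])
# → "<|user|>What is 2+2?<|assistant|>4."
```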
Based on ibm-granite/granite-embedding-107m-multilingual.
## Evaluation results
- Loss: 0.5663
- MSE: 0.5663
- Tokens processed: 51,173,120
## Model description
This model maps multi-turn chat logs to a continuous difficulty representation, enabling comparison across languages and reasoning styles.
Use cases include:
- Categorizing multilingual chat transcripts by reasoning depth.
- Supporting dataset curation or curriculum design.
- Serving as a difficulty scoring component in evaluation pipelines.
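A minimal scoring sketch, assuming the model loads as a single-output sequence-regression head via `transformers` (the repository id, helper names, and head shape below are assumptions, not confirmed by this card):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed repository id, by analogy with the "See also" entry below.
MODEL_ID = "agentlans/granite-embedding-107m-multilingual-chat-difficulty"

def build_input(turns):
    # Condense (role, text) turns into the <|user|>...<|assistant|>... format.
    return "".join(f"<|{role}|>{text}" for role, text in turns)

def score_conversation(turns, tokenizer, model):
    # Assumes a single-logit regression head; the raw logit is the score.
    inputs = tokenizer(build_input(turns), return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

# Usage (downloads the model weights):
# tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
# score = score_conversation(
#     [("user", "Prove that 17 is prime."), ("assistant", "Check divisors up to 4...")],
#     tokenizer, model)
```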
## Intended uses and limitations

### Use cases
- Estimating reasoning difficulty in multilingual conversations.
- Comparing dialogue complexity across datasets.
- Benchmarking conversational reasoning.
### Limitations
- Not suitable for assessing factual accuracy, coherence, or sentiment.
- May not generalize well to highly domain-specific data.
- Produces relative difficulty scores, not absolute intelligence measures.
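Because the scores are only meaningful relative to one another, one reasonable way to interpret them within a fixed corpus (an illustrative convention, not prescribed by this card) is to convert raw scores to percentile ranks:

```python
def percentile_ranks(scores):
    """Map raw difficulty scores to percentile ranks in [0, 100].

    Illustrative helper: assumes at least two scores and breaks ties by
    first occurrence in the sorted order.
    """
    order = sorted(scores)
    n = len(scores)
    return [100.0 * order.index(s) / (n - 1) for s in scores]

percentile_ranks([0.2, 1.5, 0.9])
# → [0.0, 100.0, 50.0]
```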
## Training procedure

### Hyperparameters
| Parameter | Value |
|---|---|
| learning_rate | 5e-5 |
| train_batch_size | 8 |
| eval_batch_size | 8 |
| seed | 42 |
| optimizer | AdamW (fused), betas=(0.9, 0.999), epsilon=1e-8 |
| lr_scheduler_type | linear |
| num_epochs | 5.0 |
### Results
| Metric | Value |
|---|---|
| Training loss | 0.5663 |
| MSE | 0.5663 |
| Tokens processed | 51,173,120 |
## Framework versions
- Transformers: 5.0.0.dev0
- PyTorch: 2.9.1+cu128
- Datasets: 4.4.1
- Tokenizers: 0.22.1
## See also

- agentlans/bge-small-en-v1.5-prompt-difficulty, for single-turn English prompts and conversations