inclusionAI/LLaDA2.0-mini-CAP


luguoshan committed on Dec 11, 2025

Commit 3a5e99b · 1 Parent(s): 8a40c50

update README.md

Files changed (1)

README.md +6 -4


@@ -36,13 +36,15 @@ _Evaluated on 12 diverse benchmarks covering knowledge, reasoning, coding, and m
 ### Technical Overview
 The training objective combines two complementary losses:
-$ \mathcal{L}(\theta) = \mathcal{L}_{\text{SFT}}(\theta) + \lambda \mathcal{L}_{\text{conf}}(\theta) $
 Where:
-+ $ \mathcal{L}_{\text{SFT}} $: Supervised fine-tuning loss ensuring prediction correctness
-+ $ \mathcal{L}_{\text{conf}} $: Confidence loss that minimizes entropy only for correctly predicted tokens
-+ $ \lambda $: Hyperparameter balancing the two objectives
 ### Why CAP Works
 1. **Sharpens Correct Predictions**: While standard training ensures correctness, it provides diminishing incentive to increase confidence on already-correct tokens. CAP explicitly optimizes for high-confidence predictions.

 ### Technical Overview
 The training objective combines two complementary losses:
+```math
+L(θ) = L_SFT(θ) + λL_conf(θ)
+```
 Where:
++ **L_SFT**: Supervised fine-tuning loss ensuring prediction correctness
++ **L_conf**: Confidence loss that minimizes entropy only for correctly predicted tokens
++ **λ**: Hyperparameter balancing the two objectives
 ### Why CAP Works
 1. **Sharpens Correct Predictions**: While standard training ensures correctness, it provides diminishing incentive to increase confidence on already-correct tokens. CAP explicitly optimizes for high-confidence predictions.
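
For readers of this diff, the combined objective described under "Technical Overview" can be illustrated with a minimal pure-Python sketch. It is not the repository's training code. The function names `softmax` and `cap_loss`, the per-token averaging, the use of argmax to decide whether a token is "correctly predicted", and the default `lam=1.0` are all assumptions made for illustration; the README only states that L_conf minimizes entropy on correctly predicted tokens and that λ balances the two terms.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cap_loss(logits_per_token, targets, lam=1.0):
    """Return (total, sft, conf) for one sequence.

    Hypothetical helper: averaging and the argmax correctness test
    are illustrative assumptions, not taken from the model card.
    """
    sft_terms = []
    conf_terms = []
    for logits, target in zip(logits_per_token, targets):
        probs = softmax(logits)
        # L_SFT: cross-entropy against the reference token.
        sft_terms.append(-math.log(probs[target]))
        # L_conf: entropy, counted only where the argmax is correct.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == target:
            entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
            conf_terms.append(entropy)
    sft = sum(sft_terms) / len(sft_terms)
    conf = sum(conf_terms) / len(conf_terms) if conf_terms else 0.0
    # L(theta) = L_SFT(theta) + lambda * L_conf(theta)
    return sft + lam * conf, sft, conf
```

In this sketch a wrongly predicted token contributes to the SFT term only, so the entropy penalty never pushes the model to become more confident in a mistake.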