This is an expanded version of the dleemiller/WordLlamaDetect model: two WordLlama-based LID models are stacked to enhance performance.
- Supported languages: 148
```
Training data (740k samples)
          │
          ▼
┌─────────────────────────────────────┐
│         Phase 1: Base Models        │
│                                     │
│  ┌──────────────┐  ┌──────────────┐ │
│  │ LID Model 01 │  │ LID Model 02 │ │
│  └───────┬──────┘  └───────┬──────┘ │
└──────────┼─────────────────┼────────┘
           │   train each    │
           │  independently  │
           ▼                 ▼
     lid_models[0]     lid_models[1]
           │                 │
           └────────┬────────┘
                    │
                    ▼
  collect_preds() → X: (N, 2*148) = (N, 296)

    model1 logits         model2 logits
      (N, 148)     cat      (N, 148)
          └──────────┬──────────┘
                     ▼
                 (N, 296)
                     │
          Linear(296 → 148)   ← 296*148 = 43,808 params trained
                     │
                     ▼
          (N, 148) → CrossEntropy(y)
```
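The Phase-2 meta-classifier is small enough to sketch in full. Below is a minimal PyTorch illustration of the pipeline above; the base-model call signature (each model mapping a batch of inputs to `(N, 148)` logits) and the optimizer choice are assumptions for illustration, not this repository's actual training code.

```python
# Minimal sketch of the stacking head diagrammed above (assumed PyTorch).
# `lid_models` stands in for the two frozen 148-way base classifiers;
# only the Linear(296 -> 148) meta-classifier is trained.
import torch
import torch.nn as nn

NUM_LANGS = 148

@torch.no_grad()
def collect_preds(lid_models, texts):
    """Concatenate the logits of both frozen base models: (N, 296)."""
    logits = [m(texts) for m in lid_models]  # two tensors of shape (N, 148)
    return torch.cat(logits, dim=-1)         # (N, 2 * 148) = (N, 296)

meta = nn.Linear(2 * NUM_LANGS, NUM_LANGS, bias=False)  # 296 * 148 = 43,808 weights
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(meta.parameters(), lr=1e-3)  # assumed optimizer/LR

def train_step(lid_models, texts, y):
    X = collect_preds(lid_models, texts)  # (N, 296), no grads flow to base models
    loss = criterion(meta(X), y)          # (N, 148) logits vs. gold language labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Only the 43,808 weights of the linear head are updated; both base models stay frozen, which is why `collect_preds` runs under `torch.no_grad()`.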
Evaluation results on FLORES+:
| Pair | Num. Languages | Accuracy | F1 (Macro) | Metrics per Base Model |
|---|---|---|---|---|
| gemma3_27b + gemma_300m | 148 | 0.9307 | 0.9303 | gemma3_27b: Acc 0.9147, F1 0.9149<br>gemma_300m: Acc 0.9087, F1 0.9078 |
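For reference, the Accuracy and F1 Macro columns are standard multi-class metrics. A minimal scikit-learn sketch, assuming `y_true`/`y_pred` are integer language labels predicted on the FLORES+ evaluation split (the card does not include the actual evaluation script):

```python
# Sketch of the metric computation over the 148 language classes,
# assuming integer label arrays from the FLORES+ evaluation split.
from sklearn.metrics import accuracy_score, f1_score

def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged F1."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }
```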