This is the expanded version of the dleemiller/WordLlamaDetect model, stacking two WordLlama-based models to enhance performance.

  • Supported languages: 148
Training data (740k samples)
        │
        ▼
┌───────────────────────────────────┐
│         Phase 1: Base Models      │
│                                   │
│  ┌─────────────┐  ┌─────────────┐ │
│  │LID Model 01 │  │ LID Model 02│ │
│  └──────┬──────┘  └──────┬──────┘ │
└─────────┼────────────────┼────────┘
          │  train each    │
          │  independently │
          ▼                ▼
     lid_models[0]    lid_models[1]
          │                │
          └───────┬────────┘
                  │
                  ▼
          collect_preds() → X: (N, 2*148) = (N, 296)
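
The `collect_preds()` step is not published with this card; the following is a minimal PyTorch sketch of it, assuming each frozen base model maps a batch of texts to logits of shape (N, 148):

```python
import torch

def collect_preds(lid_models, texts):
    """Concatenate the base models' logits into stacking features.

    Assumes each model in `lid_models` returns logits of shape (N, 148);
    the result X has shape (N, 296).
    """
    with torch.no_grad():  # base models stay frozen in the stacking phase
        logits = [model(texts) for model in lid_models]  # 2 x (N, 148)
    return torch.cat(logits, dim=1)  # (N, 296)
```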

model1 logits      model2 logits
  (N, 148)    cat    (N, 148)
      └────────┬──────────┘
               ▼
           (N, 296)
               │
       Linear(296 → 148)           ← 296*148 = 43,808 params trained
               │
               ▼
          (N, 148) → CrossEntropy(y)
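
The stacking head itself is tiny. A sketch under the assumption that it is a bias-free linear layer (which matches the 296*148 = 43,808 parameter count in the diagram) trained with cross-entropy; the learning rate and optimizer below are illustrative, not from the card:

```python
import torch
import torch.nn as nn

NUM_LANGS = 148

# Bias-free linear head: 296 * 148 = 43,808 trainable parameters.
head = nn.Linear(2 * NUM_LANGS, NUM_LANGS, bias=False)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)  # lr is an assumption

def train_step(X, y):
    """One optimization step. X: (N, 296) stacked logits; y: (N,) label ids."""
    optimizer.zero_grad()
    loss = criterion(head(X), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```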

Evaluation results on FLORES+

| Pair | Num Languages | Accuracy | F1 Macro | Metrics per Base Model |
|------|---------------|----------|----------|------------------------|
| gemma3_27b + gemma_300m | 148 | 0.9307 | 0.9303 | gemma3_27b: Acc 0.9147, F1 0.9149<br>gemma_300m: Acc 0.9087, F1 0.9078 |
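
The reported metrics are standard accuracy and macro-averaged F1. A sketch of how they can be computed with scikit-learn (the actual evaluation script is not part of this card):

```python
from sklearn.metrics import accuracy_score, f1_score

def evaluate(y_true, y_pred):
    """Accuracy and macro F1 over predicted language ids."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }
```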