license: gemma
datasets:
  - lianghsun/fineweb-edu-zhtw-magistral-annotations
language:
  - zh
metrics:
  - f1
base_model:
  - google/embeddinggemma-300m
pipeline_tag: text-classification
library_name: transformers
tags:
  - Taiwan
  - ROC
  - zhtw
  - edu
  - classifier
  - Twinkle.AI
model-index:
  - name: fineweb-edu-zhtw-classifier
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          type: lianghsun/fineweb-edu-zhtw-magistral-annotations
          name: fineweb-edu-zhtw-magistral-annotations
        metrics:
          - name: Loss
            type: loss
            value: 0.21275073289871216
          - name: Precision
            type: precision
            value: 0.7671874817634704
          - name: Recall
            type: recall
            value: 0.7840000000000001
          - name: F1 (Macro)
            type: f1-macro
            value: 0.7656082438372686
          - name: Accuracy
            type: accuracy
            value: 0.8093333333333333

Model Card for fineweb-edu-zhtw-classifier

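fineweb-edu-zhtw-classifier is a lightweight classifier for filtering Traditional Chinese web text by its degree of "educational value". It is built on google/embeddinggemma-300m and fine-tuned on fineweb-edu-zhtw-magistral-annotations. It outputs one of three educational-value labels (c0, c1, c2) and serves as the core model of the fineweb-edu-zhtw filtering pipeline.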

⚠️ Key specs: this is a 300M-parameter embedding + classification head model, not a generative model; its output is a three-class label together with a confidence score.
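
To make the "three-class label plus confidence" output concrete, here is a minimal sketch of how a classification head's raw logits could be turned into that pair. It assumes the confidence is the softmax probability of the predicted class and that the logit order is c0, c1, c2; neither detail is confirmed by this card, and `classify_from_logits` is a hypothetical helper name.

```python
import math

# Label names come from the card; their index order is an assumption.
LABELS = ("c0", "c1", "c2")


def classify_from_logits(logits):
    """Return (label, confidence) for one document's three raw logits.

    Confidence is taken to be the softmax probability of the top class,
    which is an assumption about how the model reports it.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")

    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Index of the most probable class (first one wins on ties).
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

Calling `classify_from_logits([1.0, 3.0, 0.5])` picks the second class because it has the largest logit; the returned confidence is that class's share of the softmax mass.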

Model Details

Model Description

Developed by: Liang Hsun Huang, Min YI Chen
Funded by: APMIC
Shared by: Twinkle AI
Model type: Embedding + classification head
Language(s) (NLP): Traditional Chinese & English
License: gemma
Finetuned from model: google/embeddinggemma-300m

Model Sources [optional]

Repository: lianghsun/fineweb-edu-zhtw-classifier
Paper: TBA

Uses

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
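
Until the authors publish an official snippet, the following is a minimal sketch of one way to try the model. It assumes the checkpoint loads through the standard `transformers` `text-classification` pipeline and that the pipeline returns the labels as `c0`, `c1`, `c2`; neither is confirmed by this card. The helper names, the choice of `c2` as the label to keep, and the `0.5` threshold are hypothetical and only illustrate a filtering step.

```python
def load_classifier(model_id="lianghsun/fineweb-edu-zhtw-classifier"):
    """Build a text-classification pipeline for the repository named in this card.

    Assumption: the checkpoint is compatible with the stock pipeline.
    transformers is imported lazily so keep_document() works without it.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def keep_document(prediction, keep_labels=("c2",), min_score=0.5):
    """Decide whether a document passes the filter.

    `prediction` is a dict with "label" and "score" keys, the shape the
    text-classification pipeline returns. Which labels count as educational
    enough, and the score threshold, are hypothetical defaults.
    """
    return prediction["label"] in keep_labels and prediction["score"] >= min_score


def demo():
    """Classify two sample texts and print the filter decision for each."""
    classifier = load_classifier()  # downloads the model on first use
    texts = [
        "光合作用是植物利用光能將二氧化碳和水轉化為葡萄糖的過程。",
        "限時特賣！全館商品五折起，點擊立即搶購。",
    ]
    # truncation=True clips inputs that exceed the model's maximum length.
    predictions = classifier(texts, truncation=True)
    for text, prediction in zip(texts, predictions):
        print(prediction["label"], round(prediction["score"], 3),
              keep_document(prediction), text[:20])
```

Run `demo()` in an environment with `transformers` installed and network access to the Hub. Adjust `keep_labels` and `min_score` once the meaning of each label is documented.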

Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: [More Information Needed]
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors

Liang Hsun Huang

Model Card Contact

Liang Hsun Huang