---
license: apache-2.0
datasets:
  - fumiyau/mathlib4-state-change
base_model:
  - FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
library_name: transformers
---

# RoBERTa for mathlib state difficulty judgement

A RoBERTa model that takes a Lean 4 proof state as input and predicts how many lines remain until the proof is complete, on a 10-level scale ("1" to "10 or more").

## Usage

### Using the pipeline API

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="fumiyau/mathlib4_state_diff", token=HF_TOKEN)
pipe("case h.e'_2.a x x₁ x₂ x₃ x' y y₁ y₂ y₃ y' : PGame ih : ∀ (a : Args), ArgsRel a (Args.P1 x y) → P124 a hx : x.Numeric hy : y.Numeric ih' : ∀ (a : Args), ArgsRel a (Args.P24 x₁ x₂ y) → P124 a hn : (x * y).Numeric h : ∀ (i : (x * y).LeftMoves), ⟦(x * y).moveLeft i⟧ < ⟦x * y⟧ ⊢ MulOptionsLTMul (-x) (-y) ↔ ∀ (i : (-x).LeftMoves) (j : (-y).LeftMoves), ⟦-x * -y⟧ > ⟦(-x).mulOption (-y) i j⟧")
# [{'label': '1', 'score': 0.23482494056224823}]
# The score is the softmax of the model's logits.
```
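By default the pipeline returns only the highest-scoring label. Passing `top_k=None` makes a `text-classification` pipeline return one score per class, which can then be post-processed. The sketch below operates on illustrative placeholder scores rather than real model output:

```python
# With the real pipeline this would be:
#   results = pipe(state, top_k=None)
# which returns one {'label', 'score'} dict per class, sorted by score.
# The scores below are illustrative placeholders, not real model output.
results = [
    {"label": "1", "score": 0.31},
    {"label": "2", "score": 0.22},
    {"label": "10 or more", "score": 0.09},
]

# Highest-scoring label, i.e. the model's difficulty estimate.
best = max(results, key=lambda r: r["score"])
print(best["label"])  # prints "1" for these placeholder scores
```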

### Loading the model and tokenizer explicitly

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

text = """x y : ℕ,
h₁ : Prime x,
h₂ : ¬Even x,
h₃ : y > x
⊢ y ≥ 4"""

tokenizer = AutoTokenizer.from_pretrained("fumiyau/mathlib4_state_diff")
model = AutoModelForSequenceClassification.from_pretrained("fumiyau/mathlib4_state_diff")

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
    # logits = tensor([[ 1.5015,  1.1391,  0.4959,  0.0449, -0.1418, -0.2950, -0.8557, -0.8465, -1.1877,  0.2097]])

predicted_class_id = logits.argmax().item()
model.config.id2label[predicted_class_id]
# '1'
```
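Since the labels form an ordinal scale of remaining proof lines, the softmax distribution over the logits can also be reduced to a single probability-weighted line estimate. A minimal sketch using the example logits shown above; mapping class indices 0–9 to line counts 1–10 is an assumption based on the label set:

```python
import torch

# Example logits from the snippet above (class indices 0..9 are assumed
# to correspond to the labels "1" .. "10 or more").
logits = torch.tensor([[1.5015, 1.1391, 0.4959, 0.0449, -0.1418,
                        -0.2950, -0.8557, -0.8465, -1.1877, 0.2097]])

probs = torch.softmax(logits, dim=-1)      # probability per class
predicted = probs.argmax().item()          # most likely class index

line_counts = torch.arange(1, 11, dtype=torch.float32)
expected_lines = (probs * line_counts).sum().item()  # probability-weighted estimate

print(predicted)  # 0 -> label '1'
print(round(expected_lines, 2))
```

The expected value smooths over adjacent classes, which can be more informative than the argmax alone when ranking states by difficulty.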

## Evaluation results

Measured after 1.5 epochs of training; to be updated as training progresses.

| Metric     | Value   |
|------------|---------|
| train_loss | 2.0292  |
| eval_loss  | 1.8142  |
| eval_acc   | 0.39312 |