| --- |
| license: apache-2.0 |
| datasets: |
| - fumiyau/mathlib4-state-change |
| base_model: |
| - FacebookAI/xlm-roberta-base |
| pipeline_tag: text-classification |
| library_name: transformers |
| --- |
# RoBERTa for Mathlib state difficulty judgment
A RoBERTa model that takes a Lean 4 proof state as input and predicts how many lines remain until the proof is complete, on a 10-level scale ("1" to "10 or more").
## Usage
Using the `pipeline` API:
| ```python |
| from transformers import pipeline |
| |
# HF_TOKEN: your Hugging Face access token (needed only if the model requires authentication)
pipe = pipeline("text-classification", model="fumiyau/mathlib4_state_diff", token=HF_TOKEN)
| pipe("case h.e'_2.a x x₁ x₂ x₃ x' y y₁ y₂ y₃ y' : PGame ih : ∀ (a : Args), ArgsRel a (Args.P1 x y) → P124 a hx : x.Numeric hy : y.Numeric ih' : ∀ (a : Args), ArgsRel a (Args.P24 x₁ x₂ y) → P124 a hn : (x * y).Numeric h : ∀ (i : (x * y).LeftMoves), ⟦(x * y).moveLeft i⟧ < ⟦x * y⟧ ⊢ MulOptionsLTMul (-x) (-y) ↔ ∀ (i : (-x).LeftMoves) (j : (-y).LeftMoves), ⟦-x * -y⟧ > ⟦(-x).mulOption (-y) i j⟧") |
| # [{'label': '1', 'score': 0.23482494056224823}] |
# the score is the softmax of the logits for the predicted label
| ``` |
Loading the tokenizer and model explicitly:
| ```python |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification |
| import torch |
| |
| text = """x y : ℕ, |
| h₁ : Prime x, |
| h₂ : ¬Even x, |
| h₃ : y > x |
| ⊢ y ≥ 4""" |
| |
| tokenizer = AutoTokenizer.from_pretrained("fumiyau/mathlib4_state_diff") |
| model = AutoModelForSequenceClassification.from_pretrained("fumiyau/mathlib4_state_diff") |
| |
| inputs = tokenizer(text, return_tensors="pt") |
| with torch.no_grad(): |
| logits = model(**inputs).logits |
| # logits = tensor([[ 1.5015, 1.1391, 0.4959, 0.0449, -0.1418, -0.2950, -0.8557, -0.8465, -1.1877, 0.2097]]) |
| |
| predicted_class_id = logits.argmax().item() |
print(model.config.id2label[predicted_class_id])
# '1'
| ``` |
## Evaluation results
Results as of 1.5 training epochs \
(to be updated as training progresses)
| |Metric|Value| |
| |----|----| |
| |train_loss|2.0292| |
| |eval_loss|1.8142| |
| |eval_acc|0.39312| |