b4c0n/KAi-Toxicity-Filter

@@ -2,7 +2,6 @@
 language:
 - ja
 license: apache-2.0
-base_model: cl-tohoku/bert-base-japanese-v3
 library_name: transformers
 pipeline_tag: text-classification
 tags:
@@ -54,7 +53,7 @@ Japanese toxicity detection model specialized for Japanese language
 ### モデル概要
-日本語テキストを有害/非有害に分類するモデルです。日本語特有の表現やニュアンスに最適化されています。
 ### 学習データ
@@ -68,27 +67,38 @@ Japanese toxicity detection model specialized for Japanese language
 ### モデル詳細
-- **ベースモデル**: `cl-tohoku/bert-base-japanese-v3`
 - **タスク**: 二値分類（有害/非有害）
-- **学習手法**: 連続値ラベル学習（0.0〜1.0）+ BCEWithLogitsLoss
-- **特徴**: 改善された学習手法による日本語表現の最適化
-### 使用例
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 import torch
-model_name = "b4c0n/KAi-toxicity-filter"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForSequenceClassification.from_pretrained(model_name)
 text = "終わってる暴言"
-inputs = tokenizer(text, return_tensors="pt")
 outputs = model(**inputs)
-toxic_logit = outputs.logits[0][1].item()
-toxic_prob = torch.sigmoid(torch.tensor(toxic_logit)).item()
 print(f"有害確率: {toxic_prob:.2%}")
 ```
@@ -104,9 +114,10 @@ KAi (かい鯖グループAI) における日本語テキストの有害コン
 ### 制限事項
-- 単文レベルの分類（文脈考慮なし）
-- 誤検出（偽陽性/偽陰性）の可能性
-- 文化的・地域的文脈により判定が変わる可能性
 - 人間のレビューなしの自動検閲には適していません
 ### 倫理的配慮
@@ -118,10 +129,6 @@ KAi (かい鯖グループAI) における日本語テキストの有害コン
 - 定期的な人間によるレビューを推奨します
 - 自動フィルタリング実装時は表現の自由を考慮してください
-### パフォーマンス
-日本語の有害表現検出タスクにおいて高いパフォーマンスを発揮します。
 ### ライセンス
 Apache 2.0
@@ -136,7 +143,7 @@ Apache 2.0
 ### Model Description
-This model classifies Japanese text as toxic or non-toxic, specifically optimized for Japanese language nuances and expressions.
 ### Training Data
@@ -150,27 +157,38 @@ This model was trained on:
 ### Model Details
-- **Base Model**: `cl-tohoku/bert-base-japanese-v3`
 - **Task**: Binary Text Classification (toxic/not-toxic)
-- **Training**: Continuous label learning (0.0-1.0) with BCEWithLogitsLoss
-- **Special Feature**: Optimized for Japanese language with improved training techniques
-### Usage
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 import torch
-model_name = "your-username/KAi-toxicity-filter"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForSequenceClassification.from_pretrained(model_name)
 text = "toxic expression"
-inputs = tokenizer(text, return_tensors="pt")
 outputs = model(**inputs)
-toxic_logit = outputs.logits[0][1].item()
-toxic_prob = torch.sigmoid(torch.tensor(toxic_logit)).item()
 print(f"Toxic probability: {toxic_prob:.2%}")
 ```
@@ -186,9 +204,10 @@ This model was developed for the KAi (KaisabaGroupAI) to detect and filter harmf
 ### Limitations
-- Single sentence classification (no context consideration)
 - May have false positives/negatives
 - Cultural and regional context may affect predictions
 - Not designed for automatic censorship without human review
 ### Ethical Considerations
@@ -200,23 +219,18 @@ This model was developed for the KAi (KaisabaGroupAI) to detect and filter harmf
 - Regular human review is recommended
 - Consider freedom of expression when implementing automated filtering
-### Performance
-The model shows strong performance on Japanese toxicity detection tasks.
 ### License
 Apache 2.0
 ### Citation
 ```bibtex
 @misc{kai-toxicity-filter,
-  author = {Your Name},
   title = {KAi Toxicity Filter: Japanese Toxicity Detection Model},
   year = {2025},
   publisher = {HuggingFace},
-  howpublished = {\url{https://huggingface.co/your-username/KAi-toxicity-filter}}
 }
 ```

 language:
 - ja
 license: apache-2.0
 library_name: transformers
 pipeline_tag: text-classification
 tags:
 ### モデル概要
+日本語テキストを有害/非有害に分類するモデルです。このモデルは**tohoku-nlp/bert-base-japanese-v3**をベースに、日本語の有害表現検出タスクでファインチューニングされています。
 ### 学習データ
 ### モデル詳細
+- **ベースモデル**: tohoku-nlp/bert-base-japanese-v3
 - **タスク**: 二値分類（有害/非有害）
+- **学習手法**: 連続値ラベル学習（0.0〜1.0）+ MSE Loss
+- **訓練データ**: 1,899サンプル（訓練: 1,614 / 検証: 285）
+- **エポック数**: 5
+- **学習率**: 2e-5（線形減衰）
+- **特徴**: ハードネガティブサンプリングによる日本語表現の最適化
+### 性能
+検証データセットでの評価結果:
+- **Accuracy**: 86.32%
+- **F1 Score**: 70.68%
+- **Precision**: 72.31%
+- **Recall**: 69.12%
+### 使用例
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 import torch
+model_name = "b4c0n/KAi-Toxicity-Filter"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForSequenceClassification.from_pretrained(model_name)
 text = "終わってる暴言"
+inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
 outputs = model(**inputs)
+probs = torch.softmax(outputs.logits, dim=1)
+toxic_prob = probs[0][1].item()
 print(f"有害確率: {toxic_prob:.2%}")
 ```
 ### 制限事項
+- 短い口語表現に特化しており、長文や文脈依存の有害性検出には限界があります
+- 誤検出（偽陽性/偽陰性）の可能性があります
+- 文化的・地域的文脈により判定が変わる可能性があります
+- 訓練データに含まれない新しいタイプの有害表現は検出できない場合があります
 - 人間のレビューなしの自動検閲には適していません
 ### 倫理的配慮
 - 定期的な人間によるレビューを推奨します
 - 自動フィルタリング実装時は表現の自由を考慮してください
 ### ライセンス
 Apache 2.0
 ### Model Description
+This model classifies Japanese text as toxic or non-toxic. It is fine-tuned from **tohoku-nlp/bert-base-japanese-v3** for Japanese toxicity detection tasks.
 ### Training Data
 ### Model Details
+- **Base Model**: tohoku-nlp/bert-base-japanese-v3
 - **Task**: Binary Text Classification (toxic/not-toxic)
+- **Training Data**: 1,899 samples (train: 1,614 / validation: 285)
+- **Epochs**: 5
+- **Learning Rate**: 2e-5 with linear decay
+- **Training**: Continuous label learning (0.0-1.0) with MSE Loss
+- **Special Feature**: Optimized for Japanese language with hard negative sampling
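
The learning-rate entry above can be illustrated with a minimal sketch of a linear decay schedule. It assumes no warmup and a decay to zero at the final step. The card does not state the batch size, so the batch size of 16 and the resulting `total_steps` are hypothetical values chosen only for illustration, not the author's settings.

```python
def linear_decay_lr(step: int, total_steps: int, base_lr: float = 2e-5) -> float:
    """Learning rate at `step`, decaying linearly from base_lr down to 0."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so that any step past the end of training stays at 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


# Hypothetical setup: 5 epochs over 1,614 training samples, assumed batch size 16.
steps_per_epoch = -(-1614 // 16)   # ceiling division
total_steps = 5 * steps_per_epoch

for step in (0, total_steps // 2, total_steps):
    print(step, linear_decay_lr(step, total_steps))
```

In practice a library scheduler would normally replace this helper; the function only makes the shape of the schedule explicit.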
+### Performance
+Evaluation results on validation dataset:
+- **Accuracy**: 86.32%
+- **F1 Score**: 70.68%
+- **Precision**: 72.31%
+- **Recall**: 69.12%
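
As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The sketch below uses only numbers stated in the card and also confirms that the train and validation counts add up to the stated total.

```python
# Reported validation metrics, written as fractions.
precision = 0.7231
recall = 0.6912

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 recomputed from precision and recall: {f1:.2%}")

# Reported dataset split.
train, validation = 1614, 285
total = train + validation
print(f"Total samples: {total}")
print(f"Validation share: {validation / total:.1%}")
```

The recomputed F1 rounds to the reported 70.68%, and the split sums to the reported 1,899 samples, with roughly 15% held out for validation.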
+### Usage
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 import torch
+model_name = "b4c0n/KAi-Toxicity-Filter"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForSequenceClassification.from_pretrained(model_name)
 text = "toxic expression"
+inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
 outputs = model(**inputs)
+probs = torch.softmax(outputs.logits, dim=1)
+toxic_prob = probs[0][1].item()
 print(f"Toxic probability: {toxic_prob:.2%}")
 ```
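
This change replaces a sigmoid over the single toxic logit with a softmax over both logits. The pure-Python sketch below (no `torch` needed) shows that the two generally give different probabilities, and that a two-class softmax equals a sigmoid of the difference between the logits. The logit values are made up for illustration and are not outputs of this model.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits in the order [non-toxic, toxic].
logits = [0.4, 1.2]

old_style = sigmoid(logits[1])                   # sigmoid of the toxic logit only
new_style = softmax(logits)[1]                   # softmax over both logits
difference_form = sigmoid(logits[1] - logits[0])  # equivalent to new_style

print(f"sigmoid(toxic logit):      {old_style:.4f}")
print(f"softmax(...)[toxic]:       {new_style:.4f}")
print(f"sigmoid(logit difference): {difference_form:.4f}")
```

The two approaches agree only when the non-toxic logit is zero, so any score threshold tuned against the old snippet would need to be re-checked against the new one.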
 ### Limitations
+- Optimized for short colloquial expressions; limited for long texts or context-dependent toxicity
 - May have false positives/negatives
 - Cultural and regional context may affect predictions
+- Cannot detect new types of toxic expressions not present in training data
 - Not designed for automatic censorship without human review
 ### Ethical Considerations
 - Regular human review is recommended
 - Consider freedom of expression when implementing automated filtering
 ### License
 Apache 2.0
 ### Citation
 ```bibtex
 @misc{kai-toxicity-filter,
+  author = {b4c0n},
   title = {KAi Toxicity Filter: Japanese Toxicity Detection Model},
   year = {2025},
   publisher = {HuggingFace},
+  howpublished = {\url{https://huggingface.co/b4c0n/KAi-Toxicity-Filter}}
 }
 ```