+# KAi Toxicity Filter
+日本語の有害表現検出に特化したモデル
+A toxicity detection model specialized for Japanese text
+---
+## 日本語版
+### モデル概要
+日本語テキストを有害/非有害に分類するモデルです。日本語特有の表現やニュアンスに最適化されています。
+### 学習データ
+以下のデータで学習されています：
+- **inspection-ai/japanese-toxic-dataset** (Apache 2.0)
+  - 出典: https://github.com/inspection-ai/japanese-toxic-dataset
+- **KAi専用カスタムデータセット**
+- **自動生成されたハードネガティブサンプル**
+- **自動生成された有害表現バリエーション**（バランス調整用）
+### モデル詳細
+- **ベースモデル**: `cl-tohoku/bert-base-japanese-v3`
+- **タスク**: 二値分類（有害/非有害）
+- **学習手法**: 連続値ラベル学習（0.0〜1.0）+ BCEWithLogitsLoss
+- **特徴**: 改善された学習手法による日本語表現の最適化
+### 使用例
+```python
+# 注: ベースモデルのトークナイザーには fugashi と unidic-lite が必要になる場合があります
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+model_name = "b4c0n/KAi-toxicity-filter"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+text = "死ね"
+inputs = tokenizer(text, return_tensors="pt", truncation=True)
+with torch.no_grad():  # 推論時は勾配計算が不要
+    outputs = model(**inputs)
+
+# インデックス 1 が「有害」のロジット。BCEWithLogitsLoss で学習されているためシグモイドを適用
+toxic_prob = torch.sigmoid(outputs.logits[0, 1]).item()
+print(f"有害確率: {toxic_prob:.2%}")
+```
+### 使用目的
+KAi サービスにおける日本語テキストの有害コンテンツ検出・フィルタリングのために開発されました。
+**主な用途:**
+- ユーザー生成コンテンツのモデレーション
+- 対話型AIの安全性フィルタリング
+- 日本語ソーシャルメディアコンテンツの有害性検出
+### 制限事項
+- 単文レベルの分類（文脈考慮なし）
+- 誤検出（偽陽性/偽陰性）の可能性
+- 文化的・地域的文脈により判定が変わる可能性
+- 人間のレビューなしの自動検閲には適していません
+### 倫理的配慮
+⚠️ このモデルは有害コンテンツデータで学習されています。責任を持って使用してください。
+- 正当な表現を誤検出する可能性があります
+- コンテンツ削除の唯一の判断基準として使用すべきではありません
+- 定期的な人間によるレビューを推奨します
+- 自動フィルタリング実装時は表現の自由を考慮してください
+### パフォーマンス
+作者によれば、日本語の有害表現検出タスクにおいて高いパフォーマンスを発揮します。本モデルカードには定量的な評価指標（精度や F1 など）は記載されていません。
+### ライセンス
+Apache 2.0
+### 謝辞
+このモデルは [inspection-ai/japanese-toxic-dataset](https://github.com/inspection-ai/japanese-toxic-dataset) (Apache 2.0 License) のデータを使用しています。
+---
+## English
+### Model Description
+This model classifies Japanese text as toxic or non-toxic and is optimized for Japanese-specific expressions and nuances.
+### Training Data
+This model was trained on:
+- **inspection-ai/japanese-toxic-dataset** (Apache 2.0)
+  - Source: https://github.com/inspection-ai/japanese-toxic-dataset
+- **Custom dataset** created specifically for KAi
+- **Automatically generated hard negative samples**
+- **Automatically generated toxic variations** for class balancing
+### Model Details
+- **Base Model**: `cl-tohoku/bert-base-japanese-v3`
+- **Task**: Binary text classification (toxic/non-toxic)
+- **Training**: Continuous (soft) label learning with targets in the range 0.0 to 1.0, using `BCEWithLogitsLoss`
+- **Special Feature**: Optimized for Japanese language with improved training techniques
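To make the training objective above concrete, here is a minimal plain-Python sketch of what a binary cross-entropy loss on raw logits computes when the target is a continuous value rather than a hard 0 or 1. It is not the author's training code: the function names, the example logit, and the 0.7 soft label are illustrative assumptions, and the formula is the standard numerically stable form of the loss.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy on a raw logit with a soft target in [0, 1].

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must lie in [0, 1]")
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# Hypothetical example: the same logit scored against a hard and a soft label.
logit = 2.0
loss_hard = bce_with_logits(logit, 1.0)  # label says "definitely toxic"
loss_soft = bce_with_logits(logit, 0.7)  # assumed soft label, e.g. partial agreement

print(f"hard label loss: {loss_hard:.4f}")
print(f"soft label loss: {loss_soft:.4f}")
```

With a soft target, the loss is minimized when the predicted probability equals the target itself, so the model is pushed toward graded scores instead of saturating at 0 or 1.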
+### Usage
+```python
+# Note: the base model's tokenizer may also require the fugashi and unidic-lite packages.
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+model_name = "b4c0n/KAi-toxicity-filter"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+text = "死ね"
+inputs = tokenizer(text, return_tensors="pt", truncation=True)
+with torch.no_grad():  # gradients are not needed for inference
+    outputs = model(**inputs)
+
+# Index 1 is the "toxic" logit; the model was trained with BCEWithLogitsLoss, so apply a sigmoid
+toxic_prob = torch.sigmoid(outputs.logits[0, 1]).item()
+print(f"Toxic probability: {toxic_prob:.2%}")
+```
+### Intended Use
+This model was developed for the KAi service to detect and filter harmful content in Japanese text.
+**Primary Use Cases:**
+- Content moderation for user-generated text
+- Safety filtering in conversational AI
+- Toxicity detection in Japanese social media content
+### Limitations
+- Classifies single sentences in isolation; surrounding context is not considered
+- May produce false positives and false negatives
+- Cultural and regional context may affect predictions
+- Not designed for automatic censorship without human review
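Because the model scores single sentences, longer inputs need to be split before scoring. The sketch below shows one possible way to do that, assuming a simple punctuation-based splitter and a caller-supplied `score_fn` that maps a sentence to a toxic probability (for example, a wrapper around the usage code in this card). `split_sentences`, `score_document`, and `score_fn` are hypothetical names, and taking the maximum sentence score is only one possible aggregation choice.

```python
import re
from typing import Callable, List, Tuple

# Runs of non-terminator characters, followed by any closing punctuation.
_SENTENCE = re.compile(r"[^。！？!?\n]+[。！？!?]*")


def split_sentences(text: str) -> List[str]:
    """Split text on Japanese/ASCII sentence terminators and newlines."""
    parts = (m.strip() for m in _SENTENCE.findall(text))
    return [p for p in parts if p]


def score_document(
    text: str, score_fn: Callable[[str], float]
) -> Tuple[float, List[Tuple[str, float]]]:
    """Score each sentence and return (highest score, per-sentence scores)."""
    scored = [(s, score_fn(s)) for s in split_sentences(text)]
    worst = max((p for _, p in scored), default=0.0)
    return worst, scored


if __name__ == "__main__":
    # Stand-in scorer for demonstration only; replace with real model inference.
    def fake_score(sentence: str) -> float:
        return 0.9 if "死ね" in sentence else 0.1

    worst, scored = score_document("今日は晴れ。死ね！元気？", fake_score)
    for sentence, prob in scored:
        print(f"{prob:.2f}  {sentence}")
    print(f"document score: {worst:.2f}")
```

Keeping the per-sentence scores alongside the aggregate makes it easier for a human reviewer to see which sentence triggered a flag.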
+### Ethical Considerations
+⚠️ This model was trained on toxic content data. Please use responsibly.
+- The model may produce false positives affecting legitimate speech
+- Should not be used as the sole decision-maker for content removal
+- Regular human review is recommended
+- Consider freedom of expression when implementing automated filtering
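One way to act on the recommendations above is to route scores into tiers instead of deleting content automatically. The following is a hedged sketch only: the `Action` names, the `Policy` class, and the threshold values 0.5 and 0.9 are illustrative assumptions, not settings published for this model, and real thresholds should be calibrated on your own data.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    REVIEW = "queue_for_review"      # visible, but a human should look at it
    HOLD = "hold_pending_review"     # hidden until a human decides; never auto-deleted


@dataclass(frozen=True)
class Policy:
    # Hypothetical thresholds; tune them against labeled data before use.
    review_at: float = 0.5
    hold_at: float = 0.9

    def __post_init__(self) -> None:
        if not 0.0 <= self.review_at <= self.hold_at <= 1.0:
            raise ValueError("expected 0 <= review_at <= hold_at <= 1")


def route(toxic_prob: float, policy: Policy = Policy()) -> Action:
    """Map a toxic probability to a moderation action that keeps humans in the loop."""
    if not 0.0 <= toxic_prob <= 1.0:
        raise ValueError("toxic_prob must lie in [0, 1]")
    if toxic_prob >= policy.hold_at:
        return Action.HOLD
    if toxic_prob >= policy.review_at:
        return Action.REVIEW
    return Action.ALLOW


if __name__ == "__main__":
    for p in (0.10, 0.60, 0.95):
        print(f"{p:.2f} -> {route(p).value}")
```

No tier removes content on the model's score alone, which matches the guidance that the model should not be the sole decision-maker.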
+### Performance
+The author reports strong performance on Japanese toxicity detection tasks; no quantitative metrics (such as accuracy or F1) are given in this card.
+### License
+Apache 2.0
+### Citation
+```bibtex
+@misc{kai-toxicity-filter,
+  author = {Your Name},
+  title = {KAi Toxicity Filter: Japanese Toxicity Detection Model},
+  year = {2025},
+  publisher = {HuggingFace},
+  howpublished = {\url{https://huggingface.co/b4c0n/KAi-toxicity-filter}}
+}
+```
+### Acknowledgments
+This model uses data from [inspection-ai/japanese-toxic-dataset](https://github.com/inspection-ai/japanese-toxic-dataset) (Apache 2.0 License).