Mustafa Öztürk committed on
Commit
857d4f5
·
1 Parent(s): 2eae299

Deploy Sentinel API to HF Space

.dockerignore ADDED
@@ -0,0 +1,13 @@
+ __pycache__/
+ *.py[cod]
+ *.pyo
+ *.pyd
+ .pytest_cache/
+ .mypy_cache/
+ .ruff_cache/
+ .venv/
+ venv/
+ .git/
+ .gitignore
+ .env
+ *.log
.gitignore ADDED
@@ -0,0 +1,52 @@
+ # Python bytecode/cache
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # Virtual environments
+ venv/
+ .venv/
+ env/
+ ENV/
+
+ # Environment variables
+ .env
+ .env.*
+ !.env.example
+
+ # Build/distribution artifacts
+ build/
+ dist/
+ *.egg-info/
+ .eggs/
+
+ # Test/cache artifacts
+ .pytest_cache/
+ .mypy_cache/
+ .ruff_cache/
+ .coverage
+ coverage.xml
+ htmlcov/
+
+ # Jupyter
+ .ipynb_checkpoints/
+
+ # IDE/editor
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+
+ # OS files
+ .DS_Store
+ Thumbs.db
+
+ # Logs/runtime files
+ *.log
+ *.pid
+
+ # Local model caches / weights
+ models_cache/
+
+ # Streamlit local state
+ .streamlit/secrets.toml
Dockerfile ADDED
@@ -0,0 +1,25 @@
+ FROM python:3.10-slim
+
+ ENV PYTHONDONTWRITEBYTECODE=1 \
+     PYTHONUNBUFFERED=1 \
+     PIP_NO_CACHE_DIR=1
+
+ WORKDIR /app
+
+ # Build tools are needed for some Python packages during install.
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     build-essential \
+     && rm -rf /var/lib/apt/lists/*
+
+ COPY requirements.txt .
+
+ # Hugging Face Spaces free tier is CPU-based; install CPU Torch explicitly.
+ RUN pip install --upgrade pip && \
+     pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu && \
+     pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,257 @@
  ---
- title: Sentinel Api
- emoji: 👁
- colorFrom: blue
- colorTo: pink
- sdk: docker
- pinned: false
- ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Hybrid Content Moderation System
+
+ A layered moderation system that analyzes Turkish and English user content with low latency and automatically classifies harmful content, combining a blacklist with AI models.
+
+ ## 1. Project Goal
+ This project combines rule-based filters and machine learning models in a single hybrid pipeline to produce decisions that are both fast and highly accurate for real-time content moderation.
+
+ ## 2. Technologies Used
+ - `Python 3.x`
+ - `FastAPI` (API service layer)
+ - `Uvicorn` (ASGI server)
+ - `Streamlit` (moderation dashboard)
+ - `PyTorch` (inference backbone)
+ - `Transformers` (BERTurk tokenizer/model loading)
+ - `Detoxify` (toxicity, insult, threat, identity attack scores)
+ - `Supabase` (live blacklist database)
+ - `Pandas` / `Openpyxl` (batch data analysis)
+ - `Scikit-learn` / `Matplotlib` (evaluation and reporting)
+
+ ## 3. Project Directory Layout
+ ```text
+ moderasyon/
+ ├─ main.py
+ ├─ app.py
+ ├─ utils.py
+ ├─ performance_test.py
+ ├─ stress_test.py
+ ├─ vram_check.py
+ ├─ requirements.txt
+ ├─ models_cache/
+ │  ├─ bertturk-offensive-42k/
+ │  └─ bertturk-hate-speech/   # removed from the active pipeline in the v4 report
+ └─ app/
+    ├─ api/
+    │  └─ endpoints.py
+    ├─ core/
+    │  └─ config.py
+    ├─ db/
+    │  └─ supabase_client.py
+    ├─ ml/
+    │  └─ model_loader.py
+    ├─ services/
+    │  ├─ cache_manager.py
+    │  └─ moderation_service.py
+    └─ utils/
+       └─ text_utils.py
+ ```
+
+ ## 4. How the System Works
+ The system first cleans the incoming text, then runs the spam/gibberish and blacklist checks, and only when necessary runs the ML models, producing the result in a single decision engine.
+
+ ### 4.1 Overall Flow
+ ```mermaid
+ flowchart TD
+     A[POST /analyze] --> B[clean_text_nfkc]
+     B --> C{is_spam?}
+     C -- Yes --> D[SPAM/GIBBERISH - Early Exit]
+     C -- No --> E[Supabase Cache Check]
+     E --> F{Language}
+     F -- TR --> G[BERTurk Offensive + Detoxify Multilingual]
+     F -- EN --> H[Gibberish Detector + Detoxify Original]
+     G --> I[calculate_verdict]
+     H --> I
+     I --> J[decision + risk + action + latency_ms]
+ ```
+
+ ### 4.2 TR Pipeline
+ - The text is normalized (`NFKC`, obfuscation cleanup, leet conversion).
+ - The fast rules in `is_spam()` run; on a positive hit the models are never called.
+ - The Supabase blacklist is scanned from the in-RAM cache.
+ - `off_score` is computed with `BERTurk Offensive 42K`.
+ - Toxicity/insult/threat/identity attack scores come from `Detoxify Multilingual`.
+ - `calculate_verdict()` produces the final decision.
+
+ ### 4.3 EN Pipeline
+ - The text is normalized.
+ - The `is_spam()` check runs.
+ - The Supabase EN cache is scanned.
+ - The gibberish detector triggers an early exit when `noise > 0.98`.
+ - `Detoxify Original` produces scores for six labels.
+ - `calculate_verdict()` performs the final classification.
+
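The early-exit flow above can be sketched in a few lines. This is an illustrative sketch only: the function names follow the README, but the bodies here are stand-in toy rules, not the project's real implementations.

```python
import unicodedata

def clean_text_nfkc(text: str) -> str:
    # Toy normalization: NFKC + lowercase + drop dots used as obfuscation.
    text = unicodedata.normalize("NFKC", text).lower()
    return text.replace(".", "")

def is_spam(text: str) -> bool:
    # Toy rule: too short, or a single character repeated throughout.
    return len(text) < 2 or len(set(text)) == 1

def run_moderation(text: str, lang: str) -> dict:
    cleaned = clean_text_nfkc(text)
    if is_spam(cleaned):
        # Early exit: the expensive model calls never run.
        return {"decision": "SPAM/GIBBERISH", "early_exit": True}
    # ...blacklist cache lookup and model scoring would run here...
    return {"decision": "PENDING_MODELS", "early_exit": False}
```

The key design point is that the cheap checks run first, so obvious spam never touches the GPU.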
84
+
85
+ ### 5.1 Kullanılan Veri Setleri
86
+ - `Toygar/turkish-offensive-language-detection` (aktif TR modeli)
87
+ - Toplam: `53,005`
88
+ - Train: `42,398`, Validation: `1,756`, Test: `8,851`
89
+ - Etiketler: `0=temiz`, `1=offensive`
90
+ - `fawern/turkish-hate-speech` (referans)
91
+ - v4 notu: ayrı hate modeli aktif akıştan çıkarılmıştır.
92
+
93
+ ### 5.2 Fine-Tuning Özeti (BERTurk Offensive 42K)
94
+ - `num_train_epochs=3`
95
+ - `batch_size=16`
96
+ - `learning_rate=2e-5`
97
+ - `weight_decay=0.01`
98
+ - `fp16=True`
99
+ - `load_best_model_at_end=True`
100
+
101
+ Eğitim gözlemi (özet): model ilk epoch'tan sonra hızla iyileşir, sonraki epoch'larda doğruluk artışı sınırlı kalsa da en iyi checkpoint otomatik seçilerek stabil performans korunur.
102
+
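For reference, the settings above collected as a plain dict, plus the implied step count per epoch; in the actual training script these values would be passed to `transformers.TrainingArguments`, and the derived step count here is a back-of-the-envelope figure, not from the report.

```python
import math

# Fine-tuning settings from the summary above.
training_args = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "fp16": True,
    "load_best_model_at_end": True,  # picks the best checkpoint automatically
}

# With 42,398 training rows and batch size 16:
steps_per_epoch = math.ceil(42_398 / training_args["per_device_train_batch_size"])
print(steps_per_epoch)  # 2650
```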
+ ### 5.3 Why a Hybrid Architecture?
+ - Blacklist only: can miss semantic aggression.
+ - Model only: can add needless latency for obfuscation and overt profanity.
+ - Hybrid approach: combines early exits with a semantic model to balance speed and accuracy.
+
+ ## 6. Success Metrics (Report v4)
+
+ ### 6.1 Model Quality
+ - TR Accuracy: `92%`
+ - TR Macro F1: `92%`
+ - Offensive-class F1: `92%`
+
+ ### 6.2 Performance
+ - Single-request latency:
+   - TR: `~90ms - 240ms`
+   - EN: `~54ms - 111ms`
+ - Target: `<300ms` (met)
+
+ ### 6.3 Stress Test
+ - `50` requests, `5` concurrent users
+ - Average latency: `319.69ms`
+ - Throughput: `15.01 req/sec`
+ - GPU: `RTX 3050 Ti Laptop GPU`
+ - VRAM: `687.36MB allocated / 750MB reserved`
+
+ ### 6.4 Real-Data Test (500 Tweets)
+ - Total time: `83 seconds` (`~166ms/row`)
+ - Distribution (system labels kept as emitted):
+   - `TEMİZ` (clean): `216` (`43.2%`)
+   - `KÜFÜR/PROFANITY`: `169` (`33.8%`)
+   - `SALDIRGAN/TOXIC`: `87` (`17.4%`)
+   - `İNCELEME GEREKLİ` (needs review): `27` (`5.4%`)
+   - `SPAM/GİBBERİSH`: `1` (`0.2%`)
+
+ ## 7. API Endpoints
+ - `POST /analyze`
+   - Input: `{"text": "...", "platform_dil": "tr|en"}`
+   - Output: `decision`, `risk_level`, `details`, `latency_ms`
+ - `GET /refresh-cache`
+   - Reloads the Supabase blacklist into RAM without stopping the system.
+ - `GET /vram-status`
+   - Returns GPU memory usage.
+
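A minimal client-side helper for the `POST /analyze` body, following the input schema above; the validation rule here is an assumption (the API itself may accept other values for `platform_dil`).

```python
import json

def build_analyze_request(text: str, lang: str = "tr") -> dict:
    # Request body for POST /analyze, per the README's input schema.
    if lang not in ("tr", "en"):
        raise ValueError("platform_dil must be 'tr' or 'en'")
    return {"text": text, "platform_dil": lang}

# Serialize without escaping Turkish characters.
payload = json.dumps(build_analyze_request("örnek metin"), ensure_ascii=False)
```

The same request can be sent with `curl` as shown in section 11.1.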
+ ## 8. Key Functions
+
+ ### 8.1 What does `clean_text_nfkc()` do?
+ File: `app/utils/text_utils.py`
+ - Standardizes the characters in a message.
+ - Makes obfuscated profanity visible again.
+ - Example: `m.a.l` -> `mal`, `ger1zekalı` -> `gerizekalı`.
+
+ In short: it makes user text "readable and comparable for the machine."
+
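A hedged sketch of this kind of normalization; the real function lives in `app/utils/text_utils.py`, and the leet map and separator handling below are guesses that happen to reproduce the README's two examples.

```python
import re
import unicodedata

# Assumed leet-speak map; the real table may differ.
LEET = str.maketrans({"1": "i", "3": "e", "0": "o", "4": "a", "5": "s"})

def clean_text_nfkc(text: str) -> str:
    # 1) Unicode NFKC normalization + lowercasing standardizes characters.
    text = unicodedata.normalize("NFKC", text).lower()
    # 2) Drop separator characters hidden inside words: m.a.l -> mal
    text = re.sub(r"(?<=\w)[.\-_*](?=\w)", "", text)
    # 3) Map leet digits back to letters: ger1zekalı -> gerizekalı
    return text.translate(LEET)

print(clean_text_nfkc("m.a.l"))       # mal
print(clean_text_nfkc("ger1zekalı"))  # gerizekalı
```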
+ ### 8.2 What does `is_spam()` do?
+ File: `app/utils/text_utils.py`
+ - Flags text that is very short, meaningless, repetitive, or matches ad patterns.
+ - If the text is spam, the expensive model call is skipped.
+
+ In short: it both speeds up the system and reduces unnecessary GPU usage.
+
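The kinds of fast rules described above could look like this; the thresholds and the ad pattern are illustrative assumptions, not the rules in `app/utils/text_utils.py`.

```python
import re

def is_spam(text: str) -> bool:
    stripped = text.strip()
    if len(stripped) < 3:                      # too short to moderate
        return True
    if re.search(r"(.)\1{5,}", stripped):      # one character repeated 6+ times
        return True
    if re.search(r"https?://\S+", stripped) and "takip" in stripped.lower():
        return True                            # crude ad pattern: link + "follow me" bait
    return False
```

Because these checks are pure string operations, they cost microseconds, while a model call costs tens of milliseconds.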
+ ### 8.3 What does `load_blacklist_to_ram()` do?
+ File: `app/services/cache_manager.py`
+ - Fetches the `blacklist` table from Supabase in pages.
+ - Loads TR and EN words into RAM as separate dictionaries.
+ - Updated live via a `/refresh-cache` call.
+
+ In short: it lets the blacklist be used very quickly without re-reading it from the database on every request.
+
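A sketch of the paged-loading idea with the page fetcher injected, so the Supabase client (range queries in the real code) can be swapped for anything; the column names `lang` and `word` are assumptions about the table schema.

```python
def load_blacklist_to_ram(fetch_page, page_size=1000):
    """Build an in-RAM cache from a paged data source.

    fetch_page(lo, hi) must return the rows in the inclusive range [lo, hi],
    mirroring PostgREST-style range pagination.
    """
    cache = {"tr": set(), "en": set()}
    offset = 0
    while True:
        rows = fetch_page(offset, offset + page_size - 1)
        for row in rows:
            cache.setdefault(row["lang"], set()).add(row["word"])
        if len(rows) < page_size:   # short page => last page
            break
        offset += page_size
    return cache
```

Set membership tests against this cache are O(1), which is what makes the per-request blacklist scan cheap.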
+ ### 8.4 What does `run_moderation()` do?
+ File: `app/services/moderation_service.py`
+ - The main function that runs all moderation steps in order.
+ - Orchestrates the cleaning -> spam -> blacklist -> model -> decision flow.
+ - Ultimately produces all of the decision fields the API returns.
+
+ In short: this function acts as the brain of the system.
+
+ ### 8.5 What does `calculate_verdict()` do?
+ File: `app/services/moderation_service.py`
+ - Merges blacklist hits and model scores into a single decision.
+ - Determines the risk level (`CRITICAL`, `MEDIUM`, `LOW`, `NONE`).
+ - Triggers the corresponding action (`CENSOR`, `MONITOR`, `ALLOW`).
+
+ In short: it turns model scores into a moderation decision that humans can understand.
+
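A toy version of this merge step; the real thresholds and the full risk ladder live in `moderation_service.py`, so the cutoffs below are placeholders.

```python
def calculate_verdict(blacklist_hits: list, scores: dict) -> dict:
    # A blacklist hit is treated as decisive, regardless of model scores.
    if blacklist_hits:
        return {"risk": "CRITICAL", "action": "CENSOR"}
    # Otherwise, the highest model score drives the decision.
    top = max(scores.values(), default=0.0)
    if top > 0.7:
        return {"risk": "MEDIUM", "action": "CENSOR"}
    if top > 0.4:
        return {"risk": "LOW", "action": "MONITOR"}
    return {"risk": "NONE", "action": "ALLOW"}
```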
+ ### 8.6 What does the `/analyze` endpoint do?
+ File: `app/api/endpoints.py`
+ - The main API gateway called by external systems.
+ - Receives the text and analyzes it with `run_moderation()`.
+ - Returns the decision plus latency information as JSON.
+
+ In short: it is the bridge between the platform and the moderation engine.
+
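A framework-free sketch of what the handler does; in the real code this is a FastAPI route in `app/api/endpoints.py` that calls the project's `run_moderation()`, and the stand-in result below is not the real output.

```python
import time

def analyze_handler(body: dict) -> dict:
    start = time.perf_counter()
    text = body["text"]
    lang = body.get("platform_dil", "tr")
    # run_moderation(text, lang) would be called here; use a stand-in result.
    result = {
        "decision": "TEMIZ",
        "risk_level": "NONE",
        "details": {"lang": lang, "length": len(text)},
    }
    # Latency is measured around the whole analysis and reported to the caller.
    result["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
    return result
```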
+ ## 9. Installation and Running
+
+ ### 9.1 Requirements
+ - Python 3.10+
+ - CUDA-capable GPU (recommended, not required)
+ - Supabase project credentials (`SUPABASE_URL`, `SUPABASE_KEY`)
+
+ ### 9.2 Installation
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 9.3 Starting the API
+ ```bash
+ uvicorn main:app --reload
+ ```
+
+ ### 9.4 Starting the Streamlit Dashboard
+ ```bash
+ streamlit run app.py
+ ```
+
+ ## 10. Notes and Improvement Ideas
+ - Known edge case: under combinations of heavy repetition plus profanity, some texts can fall into spam.
+ - Suggestion 1: finalize repeated-letter collapsing before the spam check.
+ - Suggestion 2: recalibrate the `İNCELEME GEREKLİ` threshold against the data distribution.
+ - Suggestion 3: periodically retrain on labeled real-world data coming from the platform.
+
  ---
+ This README merges the contents of the `Teknik Araştırma & Geliştirme Raporu v4` with the project's code structure into a single document.
+
+ ## 11. Docker and Hugging Face Spaces Deployment
+
+ This project starts the FastAPI service via the `app` object in `main.py`; that is why the correct startup command inside the Docker image is `uvicorn main:app ...`.
+
+ ### 11.1 Testing locally with Docker
+ ```bash
+ docker build -t sentinel-api .
+ docker run --rm -p 7860:7860 \
+   -e SUPABASE_URL="https://YOUR_PROJECT.supabase.co" \
+   -e SUPABASE_KEY="YOUR_SUPABASE_KEY" \
+   sentinel-api
+ ```
+
+ Test request:
+ ```bash
+ curl -X POST "http://127.0.0.1:7860/analyze" \
+   -H "Content-Type: application/json" \
+   -d '{"text":"örnek metin","platform_dil":"tr"}'
+ ```
+
+ ### 11.2 Hugging Face Spaces steps
+ 1. Create a `New Space` in your Hugging Face account.
+ 2. Select `Docker` as the `SDK`.
+ 3. Upload the files in this repo to the Space (`Dockerfile`, `requirements.txt`, `app/`, `main.py`, `models_cache/`, etc.).
+ 4. In the Space settings, add the following secrets under `Settings -> Variables and secrets`:
+    - `SUPABASE_URL`
+    - `SUPABASE_KEY`
+ 5. Once the build completes, the service comes up on port `7860`.

+ Notes:
+ - In this Docker setup, `torch` is installed as the CPU wheel (suitable for the HF Spaces free tier).
+ - The `.dockerignore` file keeps unnecessary local files out of the image, reducing build time and image size.
app.py ADDED
@@ -0,0 +1,1017 @@
+ import io
+ import subprocess
+ import time
+ from datetime import datetime
+
+ import pandas as pd
+ import requests
+ import streamlit as st
+
+ try:
+     import psutil
+ except ImportError:
+     psutil = None
+
+ st.set_page_config(
+     page_title="Sentinel — İçerik Moderasyon",
+     layout="wide",
+     initial_sidebar_state="expanded",
+ )
+
+ st.markdown(
+     """
+     <style>
+     @import url('https://fonts.googleapis.com/css2?family=IBM+Plex+Mono:wght@400;500;600&family=IBM+Plex+Sans:wght@300;400;500;600&display=swap');
+
+     html, body, [class*="css"] {
+         font-family: 'IBM Plex Sans', sans-serif;
+         background-color: #0a0e17;
+         color: #c9d1e0;
+     }
+     [data-testid="stSidebar"] {
+         background: #0d1220;
+         border-right: 1px solid #1e2d45;
+         min-width: 300px !important;
+         max-width: 300px !important;
+         width: 300px !important;
+         margin-left: 0 !important;
+         transform: translateX(0) !important;
+         flex-shrink: 0 !important;
+     }
+     [data-testid="stSidebar"][aria-expanded="false"] {
+         min-width: 300px !important;
+         max-width: 300px !important;
+         width: 300px !important;
+         margin-left: 0 !important;
+         transform: translateX(0) !important;
+     }
+     [data-testid="stSidebar"][aria-expanded="true"] {
+         min-width: 300px !important;
+         max-width: 300px !important;
+         width: 300px !important;
+     }
+     [data-testid="stSidebarContent"] {
+         display: block !important;
+         visibility: visible !important;
+         opacity: 1 !important;
+     }
+     [data-testid="stSidebar"] * { color: #8a9bc0 !important; }
+     [data-testid="stSidebar"] .stRadio label { color: #c9d1e0 !important; }
+     [data-testid="collapsedControl"],
+     [data-testid="stSidebarCollapseButton"],
+     button[title="Close sidebar"],
+     button[title="Open sidebar"] { display: none !important; }
+     #MainMenu, footer, header { visibility: hidden; }
+     .block-container { padding-top: 1.5rem; padding-bottom: 2rem; }
+
+     .sentinel-header {
+         display: flex; align-items: center; gap: 16px;
+         padding: 20px 0 28px 0;
+         border-bottom: 1px solid #1e2d45;
+         margin-bottom: 28px;
+     }
+     .sentinel-logo {
+         width: 44px; height: 44px;
+         background: linear-gradient(135deg, #1a6cf7, #0d3d8e);
+         border-radius: 10px;
+         display: flex; align-items: center; justify-content: center;
+         font-size: 22px;
+     }
+     .sentinel-title { font-family:'IBM Plex Mono',monospace; font-size:22px; font-weight:600; color:#e8eef8; }
+     .sentinel-sub { font-size:12px; color:#6f86ab; font-family:'IBM Plex Mono',monospace; letter-spacing:1px; text-transform:uppercase; }
+     .status-pill {
+         margin-left:auto; background:#0a1f0e; border:1px solid #1a5c28;
+         color:#3ddc5f; font-family:'IBM Plex Mono',monospace;
+         font-size:11px; padding:4px 12px; border-radius:20px;
+     }
+     .status-dot { display:inline-block; width:7px; height:7px; background:#3ddc5f; border-radius:50%; margin-right:6px; animation:pulse 2s infinite; }
+     @keyframes pulse { 0%,100%{opacity:1} 50%{opacity:0.3} }
+
+     .verdict-card { border-radius:12px; padding:24px 28px; margin-bottom:20px; border:1px solid; position:relative; overflow:hidden; }
+     .verdict-card::before { content:''; position:absolute; top:0; left:0; width:4px; height:100%; }
+     .verdict-TEMIZ    { background:#050f07; border-color:#1a4d25; } .verdict-TEMIZ::before    { background:#2ea84a; }
+     .verdict-KUFUR    { background:#0f0c02; border-color:#4d3d08; } .verdict-KUFUR::before    { background:#d4a017; }
+     .verdict-SALDIRGAN{ background:#0f0c02; border-color:#4d3d08; } .verdict-SALDIRGAN::before{ background:#d4a017; }
+     .verdict-TOXIC    { background:#0f0c02; border-color:#4d3d08; } .verdict-TOXIC::before    { background:#d4a017; }
+     .verdict-NEFRET   { background:#120a02; border-color:#5c2e0a; } .verdict-NEFRET::before   { background:#e07020; }
+     .verdict-INCELEME { background:#060a13; border-color:#1a2d5c; } .verdict-INCELEME::before { background:#3a7bd4; }
+     .verdict-SPAM     { background:#080810; border-color:#2a1a4d; } .verdict-SPAM::before     { background:#8030d4; }
+
+     .verdict-label { font-family:'IBM Plex Mono',monospace; font-size:26px; font-weight:600; margin-bottom:6px; }
+     .verdict-reason { font-size:14px; color:#6a7f9a; font-family:'IBM Plex Mono',monospace; }
+
+     .metric-row { display:flex; gap:12px; margin-bottom:20px; }
+     .metric-card { flex:1; background:#0d1220; border:1px solid #1e2d45; border-radius:10px; padding:16px 20px; }
+     .metric-label { font-family:'IBM Plex Mono',monospace; font-size:11px; color:#7690b8; text-transform:uppercase; letter-spacing:1px; margin-bottom:8px; }
+     .metric-value { font-family:'IBM Plex Mono',monospace; font-size:24px; font-weight:600; color:#e8eef8; }
+     .metric-value.low{color:#2ea84a} .metric-value.med{color:#d4a017} .metric-value.high{color:#e03030}
+
+     .score-row { margin-bottom:14px; }
+     .score-label { display:flex; justify-content:space-between; font-family:'IBM Plex Mono',monospace; font-size:12px; color:#8ea7cb; margin-bottom:5px; }
+     .score-track { height:5px; background:#1a2535; border-radius:3px; overflow:hidden; }
+     .score-fill { height:100%; border-radius:3px; }
+
+     .stTextArea textarea { background:#0d1220 !important; border:1px solid #1e2d45 !important; border-radius:10px !important; color:#c9d1e0 !important; font-family:'IBM Plex Sans',sans-serif !important; font-size:15px !important; padding:14px !important; }
+     .stTextArea textarea:focus { border-color:#1a6cf7 !important; }
+     .stButton button { background:#1a6cf7 !important; color:white !important; border:none !important; border-radius:8px !important; font-family:'IBM Plex Sans',sans-serif !important; font-weight:500 !important; font-size:14px !important; padding:10px 24px !important; }
+     .stButton button:hover { background:#1557cc !important; }
+     .stTabs [data-baseweb="tab-list"] { background:transparent !important; border-bottom:1px solid #1e2d45 !important; }
+     .stTabs [data-baseweb="tab"] { background:transparent !important; color:#4a6080 !important; font-family:'IBM Plex Mono',monospace !important; font-size:13px !important; padding:10px 20px !important; border-bottom:2px solid transparent !important; }
+     .stTabs [aria-selected="true"] { color:#1a6cf7 !important; border-bottom-color:#1a6cf7 !important; background:transparent !important; }
+     [data-testid="stFileUploader"] { background:#0d1220 !important; border:1px dashed #1e2d45 !important; border-radius:10px !important; }
+     .stRadio label { background:#111827 !important; border:1px solid #1e2d45 !important; border-radius:8px !important; padding:10px 14px !important; }
+     .stRadio label:has(input:checked) { border-color:#1a6cf7 !important; background:#0d1a33 !important; }
+     hr { border-color:#1e2d45 !important; }
+     .stTextInput input { background:#0d1220 !important; border:1px solid #1e2d45 !important; color:#c9d1e0 !important; border-radius:8px !important; font-family:'IBM Plex Mono',monospace !important; font-size:12px !important; }
+     [data-testid="stDataFrame"] { border:1px solid #1e2d45 !important; border-radius:10px !important; overflow:hidden !important; }
+     .stProgress > div > div { background:#1a6cf7 !important; }
+
+     .report-table { width:100%; border-collapse:collapse; font-family:'IBM Plex Mono',monospace; font-size:12px; }
+     .report-table th {
+         text-align:left; padding:10px 14px;
+         color:#4a6080; font-weight:600; font-size:10px;
+         letter-spacing:1.2px; text-transform:uppercase;
+         background:#0d1220; border-bottom:1px solid #1e2d45;
+         position:sticky; top:0; z-index:10;
+     }
+     .report-table td { padding:10px 14px; border-bottom:1px solid #0f1826; vertical-align:middle; }
+     .report-table tr:hover td { background:#0d1525; }
+
+     .risk-badge {
+         display:inline-block; padding:2px 10px; border-radius:12px;
+         font-size:10px; font-weight:600; letter-spacing:0.8px;
+         font-family:'IBM Plex Mono',monospace;
+     }
+     .badge-CRITICAL { background:#1f0c0c; color:#e03030; border:1px solid #5c1a1a; }
+     .badge-HIGH { background:#1a0e03; color:#e07020; border:1px solid #5c2e0a; }
+     .badge-MEDIUM { background:#141002; color:#d4a017; border:1px solid #4d3d08; }
+     .badge-LOW { background:#07091a; color:#3a7bd4; border:1px solid #1a2d5c; }
+     .badge-NONE { background:#050f07; color:#2ea84a; border:1px solid #1a4d25; }
+
+     .inline-bar {
+         display:inline-block; height:4px; border-radius:2px;
+         vertical-align:middle; margin-right:4px;
+     }
+     .hits-tag {
+         display:inline-block; background:#1f0e0e; border:1px solid #5c1a1a;
+         color:#e05050; font-size:10px; padding:1px 6px; border-radius:4px; margin:1px;
+     }
+     .karar-cell { font-weight:600; font-size:11px; }
+     .metin-cell { color:#8a9bc0; max-width:280px; overflow:hidden; text-overflow:ellipsis; white-space:nowrap; }
+     .skor-cell { color:#6a8cb0; font-size:11px; }
+
+     .summary-grid { display:grid; grid-template-columns:repeat(auto-fit, minmax(140px, 1fr)); gap:12px; margin-bottom:24px; }
+     .summary-card { background:#0d1220; border:1px solid #1e2d45; border-radius:10px; padding:16px; text-align:center; }
+     .summary-count { font-family:'IBM Plex Mono',monospace; font-size:36px; font-weight:700; margin-bottom:4px; }
+     .summary-label { font-family:'IBM Plex Mono',monospace; font-size:10px; color:#4a6080; text-transform:uppercase; letter-spacing:1px; }
+
+     .queue-card {
+         background:#060a13; border:1px solid #1a2d5c; border-radius:10px;
+         padding:16px; margin-bottom:10px;
+         display:flex; gap:16px; align-items:flex-start;
+     }
+     .queue-index { font-family:'IBM Plex Mono',monospace; font-size:11px; color:#2a3d55; min-width:28px; }
+     .queue-text { color:#c9d1e0; font-size:13px; line-height:1.5; flex:1; }
+     .queue-meta { font-family:'IBM Plex Mono',monospace; font-size:10px; color:#4a6080; margin-top:4px; }
+     </style>
+     """,
+     unsafe_allow_html=True,
+ )
+
+ API_URL = "http://127.0.0.1:8000/analyze"
+
+ VERDICT_COLORS = {
+     "High": "#e03030",
+     "Medium": "#d4a017",
+     "Low": "#8030d4",
+     "None": "#2ea84a",
+     "CRITICAL": "#e03030",
+ }
+ VERDICT_ICONS = {"High": "●", "Medium": "◆", "Low": "▲", "None": "✓", "CRITICAL": "🚨"}
+
+ if "last_latency_ms" not in st.session_state:
+     st.session_state["last_latency_ms"] = None
+ if "last_metrics" not in st.session_state:
+     st.session_state["last_metrics"] = None
+
+
+ def get_gpu_info():
+     try:
+         result = subprocess.check_output(
+             [
+                 "nvidia-smi",
+                 "--query-gpu=name,utilization.gpu,temperature.gpu,memory.used,memory.total",
+                 "--format=csv,noheader,nounits",
+             ],
+             encoding="utf-8",
+             stderr=subprocess.STDOUT,
+         )
+         line = result.strip().splitlines()[0]
+         name, util, temp, mem_used, mem_total = [p.strip() for p in line.split(",", maxsplit=4)]
+         return {
+             "name": name,
+             "load": int(float(util)),
+             "temp": int(float(temp)),
+             "vram_used": int(float(mem_used)),
+             "vram_total": int(float(mem_total)),
+         }
+     except Exception:
+         return None
+
+
+ def capture_process_metrics():
+     gpu_data = get_gpu_info()
+
+     cpu_val = 0.0
+     ram_pct = 0.0
+     if psutil is not None:
+         cpu_val = psutil.cpu_percent(interval=0.1)
+         ram_pct = psutil.virtual_memory().percent
+
+     return {
+         "cpu": round(cpu_val, 1),
+         "ram_pct": round(ram_pct, 1),
+         "vram_used": str(gpu_data["vram_used"]) if gpu_data else "0",
+         "gpu_load": str(gpu_data["load"]) if gpu_data else "0",
+         "timestamp": time.strftime("%H:%M:%S"),
+     }
+
+
+ def verdict_css_class(decision):
+     d = decision.upper()
+     if "TEMIZ" in d or "CLEAR" in d:
+         return "TEMIZ"
+     if "NEFRET" in d or "IDENTITY" in d:
+         return "NEFRET"
+     if "KÜFÜR" in d or "KUFUR" in d or "PROFANITY" in d:
+         return "KUFUR"
+     if "SALDIRGAN" in d or "TOXIC" in d:
+         return "SALDIRGAN"
+     if "İNCELEME" in d or "INCELEME" in d or "REVIEW" in d:
+         return "INCELEME"
+     if "SPAM" in d or "GİBBERİSH" in d:
+         return "SPAM"
+     return "TEMIZ"
+
+
+ def risk_color(val):
+     if val > 0.7:
+         return "#e03030"
+     if val > 0.4:
+         return "#d4a017"
+     if val > 0.15:
+         return "#f0a020"
+     return "#2ea84a"
+
+
+ def score_bar(label, value, color="#1a6cf7"):
+     pct = min(max(value * 100, 0), 100)
+     return f"""<div class=\"score-row\">
+     <div class=\"score-label\"><span>{label}</span><span style=\"color:{color};font-weight:600\">%{pct:.1f}</span></div>
+     <div class=\"score-track\"><div class=\"score-fill\" style=\"width:{pct}%;background:{color}\"></div></div>
+     </div>"""
+
+
+ def badge_html(risk):
+     cls = {
+         "CRITICAL": "badge-CRITICAL",
+         "HIGH": "badge-HIGH",
+         "MEDIUM": "badge-MEDIUM",
+         "LOW": "badge-LOW",
+         "NONE": "badge-NONE",
+     }.get(risk.upper(), "badge-NONE")
+     return f'<span class="risk-badge {cls}">{risk}</span>'
+
+
+ def inline_bar_html(value, color):
+     w = min(max(value * 60, 0), 60)
+     return f'<span class="inline-bar" style="width:{w}px;background:{color}"></span><span style="color:{color};font-size:11px">%{value * 100:.0f}</span>'
+
+
+ def generate_docx_report(res_df, total_time, platform_dil):
+     try:
+         from docx import Document
+         from docx.enum.text import WD_ALIGN_PARAGRAPH
+         from docx.oxml import OxmlElement
+         from docx.oxml.ns import qn
+         from docx.shared import Cm, Pt, RGBColor
+     except ImportError:
+         return None
+
+     doc = Document()
+
+     for section in doc.sections:
+         section.top_margin = Cm(1.8)
+         section.bottom_margin = Cm(1.8)
+         section.left_margin = Cm(2.0)
+         section.right_margin = Cm(2.0)
+
+     def set_cell_bg(cell, hex_color):
+         tc = cell._tc
+         tc_pr = tc.get_or_add_tcPr()
+         shd = OxmlElement("w:shd")
+         shd.set(qn("w:val"), "clear")
+         shd.set(qn("w:color"), "auto")
+         shd.set(qn("w:fill"), hex_color)
+         tc_pr.append(shd)
+
+     def add_run(para, text, bold=False, size=10, color="000000", italic=False):
+         run = para.add_run(text)
+         run.bold = bold
+         run.italic = italic
+         run.font.size = Pt(size)
+         run.font.color.rgb = RGBColor(int(color[0:2], 16), int(color[2:4], 16), int(color[4:6], 16))
+         return run
+
+     title_para = doc.add_paragraph()
+     title_para.alignment = WD_ALIGN_PARAGRAPH.CENTER
+     add_run(title_para, "SENTINEL AI - Moderasyon Analiz Raporu", bold=True, size=18, color="1F4E79")
+
+     sub_para = doc.add_paragraph()
+     sub_para.alignment = WD_ALIGN_PARAGRAPH.CENTER
+     ts = datetime.now().strftime("%d.%m.%Y %H:%M")
+     add_run(
+         sub_para,
+         f"Platform: {platform_dil.upper()} | Olusturulma: {ts} | {len(res_df)} kayit | {total_time:.1f}s",
+         size=9,
+         color="888888",
+     )
+
+     doc.add_paragraph()
+
+     counts = res_df["Karar"].value_counts()
+     sum_para = doc.add_paragraph()
+     add_run(sum_para, "OZET", bold=True, size=11, color="1F4E79")
+
+     sum_tbl = doc.add_table(rows=1, cols=len(counts) + 1)
+     sum_tbl.style = "Table Grid"
+     hdr = sum_tbl.rows[0].cells
+     set_cell_bg(hdr[0], "1F4E79")
+     p = hdr[0].paragraphs[0]
+     p.alignment = WD_ALIGN_PARAGRAPH.CENTER
+     add_run(p, "Metrik", bold=True, size=9, color="FFFFFF")
+
+     karar_colors = {
+         "TEMIZ": "2EA84A",
+         "KÜFÜR": "D4A017",
+         "KUFUR": "D4A017",
+         "PROFANITY": "D4A017",
+         "SALDIRGAN": "D4A017",
+         "TOXIC": "D4A017",
+         "NEFRET": "E07020",
+         "INCELEME": "3A7BD4",
+         "SPAM": "8030D4",
+         "GIBBERISH": "8030D4",
+     }
+     for i, (karar, cnt) in enumerate(counts.items()):
+         cell = hdr[i + 1]
+         set_cell_bg(cell, "0D1220")
+         p2 = cell.paragraphs[0]
+         p2.alignment = WD_ALIGN_PARAGRAPH.CENTER
+         c = next((v for k, v in karar_colors.items() if k in karar.upper()), "888888")
+         add_run(p2, f"{cnt}", bold=True, size=14, color=c)
+         p3 = cell.add_paragraph()
+         p3.alignment = WD_ALIGN_PARAGRAPH.CENTER
+         add_run(p3, karar[:16], size=7, color="888888")
+
+     doc.add_paragraph()
+
+     detail_para = doc.add_paragraph()
+     add_run(detail_para, "DETAYLI ANALIZ SONUCLARI", bold=True, size=11, color="1F4E79")
+
+     cols = ["#", "Metin", "Normalize", "Karar", "Risk", "Saldirganlik", "Nefret", "Tehdit", "Hits"]
+     tbl = doc.add_table(rows=1, cols=len(cols))
+     tbl.style = "Table Grid"
+
+     for i, col_name in enumerate(cols):
+         cell = tbl.rows[0].cells[i]
+         set_cell_bg(cell, "1F4E79")
+         p = cell.paragraphs[0]
+         p.alignment = WD_ALIGN_PARAGRAPH.CENTER
+         add_run(p, col_name, bold=True, size=8, color="FFFFFF")
+
+     for idx, row in res_df.iterrows():
+         tr = tbl.add_row()
+         cells = tr.cells
+
+         risk_str = str(row.get("Risk", "")).upper()
+         row_colors = {
+             "CRITICAL": "1F0C0C",
+             "HIGH": "1A0E03",
+             "MEDIUM": "141002",
+             "LOW": "07091A",
+             "NONE": "050F07",
+         }
+         row_fill = row_colors.get(risk_str, "0D1220")
+
+         set_cell_bg(cells[0], row_fill)
+         p = cells[0].paragraphs[0]
+         p.alignment = WD_ALIGN_PARAGRAPH.CENTER
+         add_run(p, str(idx + 1), size=8, color="4A6080")
+
+         set_cell_bg(cells[1], row_fill)
+         p = cells[1].paragraphs[0]
+         add_run(p, str(row.get("Metin", ""))[:120], size=8, color="C9D1E0")
+
+         set_cell_bg(cells[2], row_fill)
+         p = cells[2].paragraphs[0]
+         add_run(p, str(row.get("Normalize", ""))[:60], size=7, color="6A8CB0", italic=True)
+
+         set_cell_bg(cells[3], row_fill)
+         p = cells[3].paragraphs[0]
+         p.alignment = WD_ALIGN_PARAGRAPH.CENTER
+         karar = str(row.get("Karar", ""))
+         c = next((v for k, v in karar_colors.items() if k in karar.upper()), "888888")
+         add_run(p, karar[:20], bold=True, size=8, color=c)
+
+         set_cell_bg(cells[4], row_fill)
+         p = cells[4].paragraphs[0]
+         p.alignment = WD_ALIGN_PARAGRAPH.CENTER
+         risk_colors = {
+             "CRITICAL": "E03030",
+             "HIGH": "E07020",
+             "MEDIUM": "D4A017",
+             "LOW": "3A7BD4",
+             "NONE": "2EA84A",
+         }
+         rc = risk_colors.get(risk_str, "888888")
+         add_run(p, risk_str, bold=True, size=8, color=rc)
+
+         for col_i, field in [(5, "Saldırganlık"), (6, "Nefret"), (7, "Tehdit")]:
+             set_cell_bg(cells[col_i], row_fill)
+             p = cells[col_i].paragraphs[0]
+             p.alignment = WD_ALIGN_PARAGRAPH.CENTER
+             score = float(row.get(field, 0.0))
+             add_run(p, f"%{score * 100:.1f}", size=8, color=risk_color(score).replace("#", ""))
+
+         set_cell_bg(cells[8], row_fill)
+         p = cells[8].paragraphs[0]
449
+ hits = str(row.get("Hits", "")).strip("[]'\"")
450
+ add_run(p, hits if hits else "-", size=7, color="E05050" if hits else "2A3D55")
451
+
452
+ widths_cm = [0.7, 4.5, 3.0, 2.8, 1.5, 1.4, 1.4, 1.4, 2.0]
453
+ for i, w in enumerate(widths_cm):
454
+ for row in tbl.rows:
455
+ row.cells[i].width = Cm(w)
456
+
457
+ doc.add_paragraph()
458
+
459
+ inceleme = res_df[res_df["Karar"].str.contains("İNCELEME|INCELEME|REVIEW", na=False)]
460
+ if len(inceleme):
461
+ q_para = doc.add_paragraph()
462
+ add_run(q_para, f"INCELEME KUYRUGU - {len(inceleme)} Icerik", bold=True, size=11, color="3A7BD4")
463
+
464
+ for _, row in inceleme.iterrows():
465
+ q_tbl = doc.add_table(rows=1, cols=1)
466
+ q_tbl.style = "Table Grid"
467
+ cell = q_tbl.rows[0].cells[0]
468
+ set_cell_bg(cell, "060A13")
469
+ p = cell.paragraphs[0]
470
+ add_run(p, str(row.get("Metin", ""))[:200], size=9, color="C9D1E0")
471
+ p2 = cell.add_paragraph()
472
+ add_run(
473
+ p2,
474
+ f"Risk: {row.get('Risk', '')} | Saldirganlik: %{float(row.get('Saldırganlık', 0)) * 100:.0f} | {row.get('Gerekçe', '')}",
475
+ size=8,
476
+ color="4A6080",
477
+ italic=True,
478
+ )
479
+
480
+ doc.add_paragraph()
481
+ footer_p = doc.add_paragraph()
482
+ footer_p.alignment = WD_ALIGN_PARAGRAPH.CENTER
483
+ add_run(footer_p, "Sentinel AI - Dahili Kullanim - " + datetime.now().strftime("%Y"), size=8, color="2A3D55")
484
+
485
+ buf = io.BytesIO()
486
+ doc.save(buf)
487
+ buf.seek(0)
488
+ return buf
489
+
490
+
491
+st.markdown(
+    """
+    <div class="sentinel-header">
+        <div class="sentinel-logo">⬡</div>
+        <div>
+            <div class="sentinel-title">Sentinel</div>
+            <div class="sentinel-sub">İçerik Moderasyon Sistemi</div>
+        </div>
+        <div class="status-pill"><span class="status-dot"></span>ONLINE</div>
+    </div>
+    """,
+    unsafe_allow_html=True,
+)
+
+with st.sidebar:
+    st.markdown(
+        """<div style="padding:8px 0 20px 0; border-bottom:1px solid #1e2d45; margin-bottom:20px;">
+        <div style="font-family:'IBM Plex Mono',monospace; font-size:11px; color:#4a6080; letter-spacing:1.5px; text-transform:uppercase; margin-bottom:16px;">Sistem Konfigürasyonu</div>
+        </div>""",
+        unsafe_allow_html=True,
+    )
+
+    st.markdown(
+        """<div style="font-family:'IBM Plex Mono',monospace; font-size:11px; color:#4a6080; text-transform:uppercase; letter-spacing:1px; margin-bottom:10px;">Platform Dili</div>""",
+        unsafe_allow_html=True,
+    )
+    platform_dil = st.radio(
+        "Platform dili",
+        ["tr", "en"],
+        format_func=lambda x: "Türkçe · TR Pipeline" if x == "tr" else "English · EN Pipeline",
+        label_visibility="collapsed",
+    )
+
+    st.markdown("<br>", unsafe_allow_html=True)
+    st.markdown(
+        """<div style="font-family:'IBM Plex Mono',monospace; font-size:11px; color:#4a6080; text-transform:uppercase; letter-spacing:1px; margin-bottom:10px;">API Endpoint</div>""",
+        unsafe_allow_html=True,
+    )
+    api_url = st.text_input("API", value=API_URL, label_visibility="collapsed")
+
+    st.markdown("<br><br>", unsafe_allow_html=True)
+    st.markdown(
+        """<div style="font-family:'IBM Plex Mono',monospace; font-size:11px; color:#2a3d55; line-height:1.8;">
+        TR PIPELINE<br><span style="color:#4a6289">──────────────</span><br>
+        <span style="color:#6f8fbf">▸</span> is_spam() evrensel filtre<br>
+        <span style="color:#6f8fbf">▸</span> Küfür listesi lookup<br>
+        <span style="color:#6f8fbf">▸</span> BERTurk offensive 42K<br>
+        <span style="color:#6f8fbf">▸</span> Detoxify multilingual<br><br>
+        EN PIPELINE<br><span style="color:#4a6289">──────────────</span><br>
+        <span style="color:#6f8fbf">▸</span> is_spam() evrensel filtre<br>
+        <span style="color:#6f8fbf">▸</span> Gibberish Detector<br>
+        <span style="color:#6f8fbf">▸</span> Detoxify original 6-label
+        </div>""",
+        unsafe_allow_html=True,
+    )
+
+    st.markdown("---")
+    st.markdown("### 🖥️ Sistem Monitörü")
+
+    if psutil is None:
+        st.warning("psutil yüklü değil. Kurulum: pip install psutil")
+    else:
+        cpu_load = psutil.cpu_percent(interval=0.2)
+        ram = psutil.virtual_memory()
+        ram_used_gb = ram.used / (1024**3)
+
+        col1, col2 = st.columns(2)
+        col1.metric("CPU Yükü", f"%{cpu_load:.0f}")
+        col2.metric("RAM", f"{ram_used_gb:.1f} GB", f"%{ram.percent:.0f}", delta_color="inverse")
+
+    gpu = get_gpu_info()
+    if gpu:
+        st.markdown(f"**GPU:** {gpu['name']}")
+        col3, col4 = st.columns(2)
+        col3.metric("GPU Yükü", f"%{gpu['load']}")
+        col4.metric("GPU Isı", f"{gpu['temp']}°C")
+
+        vram_pct = 0.0
+        if gpu["vram_total"] > 0:
+            vram_pct = min(max(gpu["vram_used"] / gpu["vram_total"], 0.0), 1.0)
+        st.write(f"VRAM: {gpu['vram_used']}MB / {gpu['vram_total']}MB")
+        st.progress(vram_pct)
+    else:
+        st.warning("GPU bilgisi alınamadı (nvidia-smi erişimi yok).")
+
+    st.markdown("---")
+    live_latency = st.session_state.get("last_latency_ms")
+    if live_latency is None:
+        st.info("🚀 **Model Latency:** N/A\n\n🛡️ **Sentinel v2.9 Active**")
+    else:
+        st.info(f"🚀 **Model Latency:** ~{live_latency:.0f}ms/req\n\n🛡️ **Sentinel v2.9 Active**")
+
+    st.markdown("---")
+    if st.session_state.get("last_metrics"):
+        m = st.session_state["last_metrics"]
+        st.markdown("### ⚡ Son İşlem Performansı")
+        st.caption(f"Saat: {m['timestamp']} (İstek anındaki veriler)")
+
+        col5, col6 = st.columns(2)
+        col5.metric("İşlem CPU", f"%{m['cpu']}")
+        col6.metric("İşlem RAM", f"%{m['ram_pct']}")
+
+        col7, col8 = st.columns(2)
+        col7.metric("GPU Yükü", f"%{m['gpu_load']}")
+        col8.metric("VRAM", f"{m['vram_used']} MB")
+
+        st.success("Analiz işlemi için performans verisi kaydedildi.")
+    else:
+        st.info("Performans verisi için analiz başlatın.")
+
+
+tab1, tab2 = st.tabs([" Tek Metin Analizi ", " Toplu Analiz "])
+
+with tab1:
+    st.markdown("<br>", unsafe_allow_html=True)
+    user_input = st.text_area(
+        "Analiz metni",
+        height=120,
+        placeholder="Analiz edilecek metni buraya yazın...",
+        label_visibility="collapsed",
+    )
+    col_btn, col_info = st.columns([2, 5])
+    with col_btn:
+        analyze_btn = st.button("Analiz Et", use_container_width=True)
+    with col_info:
+        st.markdown(
+            """<div style="padding:10px 0; font-family:'IBM Plex Mono',monospace; font-size:11px; color:#8ea7cb; line-height:1.8;">Spam → Dil → Küfür → Model → Karar</div>""",
+            unsafe_allow_html=True,
+        )
+
+    if analyze_btn:
+        if not user_input.strip():
+            st.warning("Analiz için metin gerekli.")
+        else:
+            with st.spinner(""):
+                try:
+                    t0 = time.time()
+                    resp = requests.post(api_url, json={"text": user_input, "platform_dil": platform_dil}, timeout=30)
+                    st.session_state["last_metrics"] = capture_process_metrics()
+                    elapsed = (time.time() - t0) * 1000
+                except requests.RequestException as e:
+                    st.error(f"API bağlantı hatası: {e}")
+                    st.stop()
+
+            if resp.status_code != 200:
+                st.error(f"API {resp.status_code} döndü.")
+                st.stop()
+
+            r = resp.json()
+            decision = r.get("decision", "—")
+            reason = r.get("reason", "—")
+            risk = r.get("risk_level", "None")
+            lang = r.get("language", platform_dil).upper()
+            cleaned = r.get("cleaned_text", "")
+            details = r.get("details", {})
+            latency = r.get("latency_ms", round(elapsed, 1))
+            st.session_state["last_latency_ms"] = float(latency)
+            backend_perf = r.get("performance")
+            if isinstance(backend_perf, dict):
+                st.session_state["last_metrics"] = {
+                    "cpu": backend_perf.get("cpu", 0),
+                    "ram_pct": backend_perf.get("ram_pct", 0),
+                    "vram_used": str(backend_perf.get("vram_used", 0)),
+                    "gpu_load": str(backend_perf.get("gpu_load", 0)),
+                    "timestamp": backend_perf.get("timestamp", time.strftime("%H:%M:%S")),
+                }
+            vcls = verdict_css_class(decision)
+            vcolor = VERDICT_COLORS.get(risk, "#2ea84a")
+            vicon = VERDICT_ICONS.get(risk, "✓")
+
+            st.markdown(
+                f"""<div class="verdict-card verdict-{vcls}">
+                <div class="verdict-label" style="color:{vcolor}">{vicon}&nbsp; {decision}
+                <span style="font-size:14px;color:#2a3d55;margin-left:12px;">[{lang}]</span>
+                </div>
+                <div class="verdict-reason">{reason}</div>
+                </div>""",
+                unsafe_allow_html=True,
+            )
+
+            lat_class = "low" if latency < 200 else ("med" if latency < 500 else "high")
+            risk_class = {"High": "high", "Medium": "med", "Low": "med", "None": "low", "CRITICAL": "high"}.get(risk, "low")
+            st.markdown(
+                f"""<div class="metric-row">
+                <div class="metric-card"><div class="metric-label">Risk Seviyesi</div><div class="metric-value {risk_class}">{risk}</div></div>
+                <div class="metric-card"><div class="metric-label">Gecikme</div><div class="metric-value {lat_class}">{latency:.0f} ms</div></div>
+                <div class="metric-card"><div class="metric-label">Pipeline</div><div class="metric-value" style="font-size:18px;">{lang}</div></div>
+                <div class="metric-card" style="flex:2"><div class="metric-label">Normalize Edilen Metin</div>
+                <div style="font-family:'IBM Plex Mono',monospace;font-size:13px;color:#6a8cb0;margin-top:6px;word-break:break-all;">{cleaned}</div>
+                </div>
+                </div>""",
+                unsafe_allow_html=True,
+            )
+
+            hits = details.get("hits", []) or []
+            insult_hits = details.get("insult_hits", []) or []
+            if hits or insult_hits:
+                tags = "".join(f'<span class="hits-tag">⚡ {h}</span>' for h in hits)
+                tags += "".join(
+                    f'<span class="hits-tag" style="color:#d4a017;border-color:#5c3d08;background:#1a1002">⚠ {h}</span>'
+                    for h in insult_hits
+                )
+                st.markdown(
+                    f"""<div style="margin-bottom:16px;">
+                    <div style="font-family:'IBM Plex Mono',monospace;font-size:11px;color:#4a6080;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px;">Kara Liste Eşleşmeleri</div>
+                    {tags}
+                    </div>""",
+                    unsafe_allow_html=True,
+                )
+
+            col_scores, col_models = st.columns([1, 1.2])
+            with col_scores:
+                st.markdown(
+                    """<div style="font-family:'IBM Plex Mono',monospace;font-size:11px;color:#4a6080;text-transform:uppercase;letter-spacing:1px;margin-bottom:14px;">Sinyal Analizi</div>""",
+                    unsafe_allow_html=True,
+                )
+                bars = ""
+                if lang == "TR":
+                    off = details.get("off_score", 0.0)
+                    ia = details.get("detox", {}).get("identity_attack", 0.0)
+                    thr = details.get("threat", 0.0)
+                    bars += score_bar("Saldırganlık", off, risk_color(off))
+                    bars += score_bar("Nefret (identity_attack)", ia, risk_color(ia))
+                    bars += score_bar("Tehdit", thr, risk_color(thr))
+                else:
+                    dtx = details.get("detox", {})
+                    for key, lbl in [
+                        ("toxicity", "Toxicity"),
+                        ("threat", "Threat"),
+                        ("insult", "Insult"),
+                        ("identity_attack", "Identity Attack"),
+                        ("severe_toxicity", "Severe Toxicity"),
+                        ("obscene", "Obscene"),
+                    ]:
+                        v = dtx.get(key, 0.0)
+                        bars += score_bar(lbl, v, risk_color(v))
+                st.markdown(bars, unsafe_allow_html=True)
+
+            with col_models:
+                st.markdown(
+                    """<div style="font-family:'IBM Plex Mono',monospace;font-size:11px;color:#4a6080;text-transform:uppercase;letter-spacing:1px;margin-bottom:14px;">Model Kaynak Analizi (Source)</div>""",
+                    unsafe_allow_html=True,
+                )
+                rows_html = ""
+                if lang == "TR":
+                    m_list = [
+                        ("BERTurk Offensive", "N/A", details.get("off_score", 0.0)),
+                        ("Detoxify (TR)", "Analyzed", details.get("detox", {}).get("toxicity", 0.0)),
+                    ]
+                else:
+                    m_list = [
+                        ("Detoxify (Original)", "Analyzed", details.get("detox", {}).get("toxicity", 0.0)),
+                        (
+                            "Gibberish Detector",
+                            details.get("gibberish_label", "N/A"),
+                            details.get("gibberish_score", 0.0) or 0.0,
+                        ),
+                    ]
+                for m_name, m_dec, m_score in m_list:
+                    try:
+                        m_score = float(m_score)
+                    except (TypeError, ValueError):
+                        m_score = 0.0
+                    c = risk_color(m_score)
+                    rows_html += f"""<div style="background:#0d1220;border:1px solid #1e2d45;border-radius:8px;padding:10px;margin-bottom:8px;">
+                    <div style="display:flex;justify-content:space-between;align-items:center;gap:10px;">
+                    <span style="font-size:12px;font-weight:600;color:#e8eef8;">{m_name}</span>
+                    <span style="font-size:10px;color:{c};background:{c}22;padding:2px 8px;border-radius:4px;border:1px solid {c}44;white-space:nowrap;">
+                    {m_dec} (%{m_score * 100:.1f})
+                    </span>
+                    </div>
+                    </div>"""
+                st.markdown(rows_html, unsafe_allow_html=True)
+
+with tab2:
+    st.markdown("<br>", unsafe_allow_html=True)
+    st.markdown(
+        """<div style="font-family:'IBM Plex Mono',monospace;font-size:11px;color:#4a6080;text-transform:uppercase;letter-spacing:1px;margin-bottom:16px;">Veri Seti Yükle</div>""",
+        unsafe_allow_html=True,
+    )
+
+    uploaded = st.file_uploader("Dosya", type=["csv", "xlsx"], label_visibility="collapsed")
+
+    if uploaded:
+        df = pd.read_csv(uploaded) if uploaded.name.endswith(".csv") else pd.read_excel(uploaded)
+        if len(df) == 0:
+            st.warning("Dosya boş.")
+            st.stop()
+
+        st.markdown(
+            f"""<div style="font-family:'IBM Plex Mono',monospace;font-size:12px;color:#4a6080;margin-bottom:16px;">{len(df)} satır yüklendi</div>""",
+            unsafe_allow_html=True,
+        )
+        col_name = st.selectbox("Analiz sütunu:", df.columns)
+
+        if st.button("Toplu Analizi Başlat", use_container_width=False):
+            progress = st.progress(0)
+            status_text = st.empty()
+            results = []
+            t0 = time.time()
+
+            for i, text in enumerate(df[col_name]):
+                try:
+                    resp = requests.post(api_url, json={"text": str(text), "platform_dil": platform_dil}, timeout=30)
+                    r = resp.json() if resp.status_code == 200 else {}
+                except Exception:
+                    r = {}
+
+                details = r.get("details", {})
+                hits_all = list(details.get("hits", []) or []) + list(details.get("insult_hits", []) or [])
+                results.append(
+                    {
+                        "Metin": str(text),
+                        "Normalize": r.get("cleaned_text", ""),
+                        "Dil": r.get("language", "—").upper(),
+                        "Karar": r.get("decision", "—"),
+                        "Risk": r.get("risk_level", "—"),
+                        "Gerekçe": r.get("reason", "—"),
+                        "Saldırganlık": round(float(details.get("off_score", 0.0)), 4),
+                        "Nefret": round(float(details.get("detox", {}).get("identity_attack", 0.0)), 4),
+                        "Tehdit": round(float(details.get("threat", details.get("detox", {}).get("threat", 0.0))), 4),
+                        "Hits": ", ".join(hits_all) if hits_all else "",
+                    }
+                )
+                progress.progress((i + 1) / len(df))
+                status_text.markdown(
+                    f"""<span style="font-family:'IBM Plex Mono',monospace;font-size:12px;color:#4a6080;">{i + 1} / {len(df)} işlendi</span>""",
+                    unsafe_allow_html=True,
+                )
+
+            elapsed = time.time() - t0
+            res_df = pd.DataFrame(results)
+            if len(df) > 0:
+                st.session_state["last_latency_ms"] = (elapsed * 1000.0) / len(df)
+            status_text.empty()
+            progress.empty()
+
+            st.markdown(
+                f"""<div style="font-family:'IBM Plex Mono',monospace;font-size:12px;color:#2ea84a;margin:12px 0;">
+                {len(df)} satır {elapsed:.1f}s içinde analiz edildi</div>""",
+                unsafe_allow_html=True,
+            )
+
+            counts = res_df["Karar"].value_counts()
+            karar_colors_ui = {
+                "TEMIZ": "#2ea84a",
+                "CLEAR": "#2ea84a",
+                "KÜFÜR": "#d4a017",
+                "KUFUR": "#d4a017",
+                "PROFANITY": "#d4a017",
+                "SALDIRGAN": "#d4a017",
+                "TOXIC": "#d4a017",
+                "NEFRET": "#e07020",
+                "IDENTITY": "#e07020",
+                "İNCELEME": "#3a7bd4",
+                "INCELEME": "#3a7bd4",
+                "REVIEW": "#3a7bd4",
+                "SPAM": "#8030d4",
+                "GİBBERİSH": "#8030d4",
+            }
+            cols_summary = st.columns(min(len(counts), 6))
+            for i, (karar, cnt) in enumerate(counts.items()):
+                if i < 6:
+                    vc = next((v for k, v in karar_colors_ui.items() if k in karar.upper()), "#888888")
+                    with cols_summary[i]:
+                        st.markdown(
+                            f"""<div class="metric-card" style="text-align:center;">
+                            <div class="summary-count" style="color:{vc}">{cnt}</div>
+                            <div class="summary-label">{karar[:18]}</div>
+                            </div>""",
+                            unsafe_allow_html=True,
+                        )
+
+            st.markdown("<br>", unsafe_allow_html=True)
+
+            st.markdown(
+                """<div style="font-family:'IBM Plex Mono',monospace;font-size:11px;color:#4a6080;text-transform:uppercase;letter-spacing:1px;margin-bottom:12px;">Detaylı Analiz Tablosu</div>""",
+                unsafe_allow_html=True,
+            )
+
+            table_rows = ""
+            for idx, row in res_df.iterrows():
+                risk_str = str(row.get("Risk", "")).upper()
+                row_bg = {
+                    "CRITICAL": "#1f0c0c",
+                    "HIGH": "#1a0e03",
+                    "MEDIUM": "#141002",
+                    "LOW": "#07091a",
+                    "NONE": "#050f07",
+                }.get(risk_str, "#0d1220")
+
+                karar_str = str(row.get("Karar", ""))
+                kc = next((v for k, v in karar_colors_ui.items() if k in karar_str.upper()), "#888888")
+
+                sal = float(row.get("Saldırganlık", 0.0))
+                nef = float(row.get("Nefret", 0.0))
+                thr = float(row.get("Tehdit", 0.0))
+
+                hits_str = str(row.get("Hits", "")).strip()
+                hits_html = ""
+                if hits_str:
+                    for h in hits_str.split(","):
+                        h = h.strip()
+                        if h:
+                            hits_html += f'<span class="hits-tag">{h}</span>'
+                else:
+                    hits_html = '<span style="color:#2a3d55;font-size:10px;">—</span>'
+
+                metin_full = str(row.get("Metin", ""))
+                metin_short = metin_full[:60] + "..." if len(metin_full) > 60 else metin_full
+                normalize = str(row.get("Normalize", ""))[:50]
+
+                # Not: f-string içindeki ifadeler tek tırnak kullanıyor;
+                # Python 3.12 öncesinde üçlü çift tırnaklı f-string içinde
+                # çift tırnak SyntaxError verir.
+                table_rows += f"""
+                <tr style="background:{row_bg}">
+                    <td style="color:#2a3d55;text-align:center;font-size:11px;">{idx + 1}</td>
+                    <td class="metin-cell" title="{metin_full}">{metin_short}</td>
+                    <td style="color:#4a6080;font-size:10px;font-style:italic;">{normalize}</td>
+                    <td class="karar-cell" style="color:{kc}">{karar_str[:22]}</td>
+                    <td>{badge_html(risk_str)}</td>
+                    <td class="skor-cell">{inline_bar_html(sal, risk_color(sal))}</td>
+                    <td class="skor-cell">{inline_bar_html(nef, risk_color(nef))}</td>
+                    <td class="skor-cell">{inline_bar_html(thr, risk_color(thr))}</td>
+                    <td>{hits_html}</td>
+                    <td style="color:#4a6080;font-size:10px;max-width:180px;">{str(row.get('Gerekçe', ''))[:60]}</td>
+                </tr>"""
+
+            st.markdown(
+                f"""
+                <div style="overflow-x:auto;overflow-y:auto;max-height:520px;border:1px solid #1e2d45;border-radius:10px;">
+                <table class="report-table">
+                <thead>
+                    <tr>
+                        <th>#</th><th>Metin</th><th>Normalize</th><th>Karar</th>
+                        <th>Risk</th><th>Saldırganlık</th><th>Nefret</th><th>Tehdit</th>
+                        <th>Hits</th><th>Gerekçe</th>
+                    </tr>
+                </thead>
+                <tbody>{table_rows}</tbody>
+                </table>
+                </div>""",
+                unsafe_allow_html=True,
+            )
+
+            st.markdown("<br>", unsafe_allow_html=True)
+
+            col_chart, col_stats = st.columns([1, 1])
+            with col_chart:
+                st.markdown(
+                    """<div style="font-family:'IBM Plex Mono',monospace;font-size:11px;color:#4a6080;text-transform:uppercase;letter-spacing:1px;margin-bottom:10px;">Dağılım</div>""",
+                    unsafe_allow_html=True,
+                )
+                st.bar_chart(counts)
+
+            with col_stats:
+                st.markdown(
+                    """<div style="font-family:'IBM Plex Mono',monospace;font-size:11px;color:#4a6080;text-transform:uppercase;letter-spacing:1px;margin-bottom:10px;">İstatistikler</div>""",
+                    unsafe_allow_html=True,
+                )
+                total = len(res_df)
+                zararli = total - len(res_df[res_df["Karar"].str.contains("TEMİZ|CLEAR", na=False)])
+                st.markdown(
+                    f"""
+                    <div style="font-family:'IBM Plex Mono',monospace;font-size:13px;line-height:2.2;color:#8a9bc0;">
+                    <span style="color:#4a6080">Toplam kayıt </span> {total}<br>
+                    <span style="color:#4a6080">Zararlı içerik</span> <span style="color:#e03030">{zararli}</span> (%{zararli / total * 100:.1f})<br>
+                    <span style="color:#4a6080">Ortalama süre </span> {elapsed / total * 1000:.0f}ms / satır<br>
+                    <span style="color:#4a6080">Hits bulundu </span> {len(res_df[res_df['Hits'].str.len() > 0])} kayıt<br>
+                    <span style="color:#4a6080">İnceleme kuyruğu</span> {len(res_df[res_df['Karar'].str.contains('İNCELEME|INCELEME', na=False)])} içerik
+                    </div>""",
+                    unsafe_allow_html=True,
+                )
+
+            st.markdown("<br>", unsafe_allow_html=True)
+
+            inceleme = res_df[res_df["Karar"].str.contains("İNCELEME|INCELEME|REVIEW", na=False)]
+            if len(inceleme):
+                st.markdown(
+                    f"""<div style="font-family:'IBM Plex Mono',monospace;font-size:11px;color:#3a7bd4;text-transform:uppercase;letter-spacing:1px;margin-bottom:12px;">İnceleme Kuyruğu — {len(inceleme)} İçerik</div>""",
+                    unsafe_allow_html=True,
+                )
+                for i, (_, row) in enumerate(inceleme.iterrows()):
+                    sal = float(row.get("Saldırganlık", 0.0))
+                    st.markdown(
+                        f"""<div class="queue-card">
+                        <div class="queue-index">{i + 1:02d}</div>
+                        <div>
+                            <div class="queue-text">{str(row.get('Metin', ''))}</div>
+                            <div class="queue-meta">
+                                Risk: {row.get('Risk', '')} &nbsp;|&nbsp;
+                                Saldırganlık: %{sal * 100:.0f} &nbsp;|&nbsp;
+                                {row.get('Gerekçe', '')}
+                            </div>
+                        </div>
+                        </div>""",
+                        unsafe_allow_html=True,
+                    )
+
+            st.markdown("<br>", unsafe_allow_html=True)
+
+            st.markdown(
+                """<div style="font-family:'IBM Plex Mono',monospace;font-size:11px;color:#4a6080;text-transform:uppercase;letter-spacing:1px;margin-bottom:12px;">Raporu İndir</div>""",
+                unsafe_allow_html=True,
+            )
+            col_dl1, col_dl2, _ = st.columns([1, 1, 4])
+
+            with col_dl1:
+                csv_bytes = res_df.to_csv(index=False).encode("utf-8")
+                st.download_button(
+                    "⬇ CSV",
+                    data=csv_bytes,
+                    file_name=f"sentinel_raporu_{datetime.now().strftime('%Y%m%d_%H%M')}.csv",
+                    mime="text/csv",
+                    use_container_width=True,
+                )
+
+            with col_dl2:
+                docx_buf = generate_docx_report(res_df, elapsed, platform_dil)
+                if docx_buf:
+                    st.download_button(
+                        "⬇ DOCX",
+                        data=docx_buf,
+                        file_name=f"sentinel_raporu_{datetime.now().strftime('%Y%m%d_%H%M')}.docx",
+                        mime="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
+                        use_container_width=True,
+                    )
+                else:
+                    st.warning("python-docx yüklü değil: pip install python-docx")
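The UI file above calls a `risk_color()` helper that is defined earlier in the file, outside this hunk. For readers following along, a plausible minimal version mapping a 0–1 score onto the hex palette used throughout this file might look like this (the thresholds below are illustrative assumptions, not the app's actual cut-offs):

```python
def risk_color(score: float) -> str:
    """Map a 0-1 risk score to a hex color (illustrative thresholds)."""
    if score >= 0.85:
        return "#e03030"  # critical -> red
    if score >= 0.60:
        return "#e07020"  # high -> orange
    if score >= 0.35:
        return "#d4a017"  # medium -> amber
    if score >= 0.15:
        return "#3a7bd4"  # low -> blue
    return "#2ea84a"      # none -> green
```

The DOCX report strips the leading `#` with `.replace("#", "")` because python-docx expects bare hex, which is why a `#`-prefixed return value works for both the HTML and DOCX paths.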
app/__init__.py ADDED
File without changes
app/api/__init__.py ADDED
File without changes
app/api/endpoints.py ADDED
@@ -0,0 +1,131 @@
+import subprocess
+import time
+from typing import Optional
+
+import torch
+from fastapi import APIRouter, HTTPException
+from pydantic import BaseModel
+
+try:
+    import psutil
+except ImportError:
+    psutil = None
+
+from app.services.cache_manager import get_cache_counts, load_blacklist_to_ram
+from app.services.moderation_service import run_moderation
+
+router = APIRouter()
+
+
+def get_gpu_info():
+    try:
+        raw = subprocess.check_output(
+            [
+                "nvidia-smi",
+                "--query-gpu=utilization.gpu,memory.used,memory.total",
+                "--format=csv,noheader,nounits",
+            ],
+            encoding="utf-8",
+            stderr=subprocess.STDOUT,
+        )
+        util, mem_used, mem_total = [p.strip() for p in raw.strip().splitlines()[0].split(",", maxsplit=2)]
+        return {
+            "load": int(float(util)),
+            "vram_used": int(float(mem_used)),
+            "vram_total": int(float(mem_total)),
+        }
+    except Exception:
+        if not torch.cuda.is_available():
+            return None
+        allocated = torch.cuda.memory_allocated(0) / (1024**2)
+        total = torch.cuda.get_device_properties(0).total_memory / (1024**2)
+        return {
+            "load": None,
+            "vram_used": int(round(allocated)),
+            "vram_total": int(round(total)),
+        }
+
+
+def capture_process_metrics():
+    cpu_load = None
+    ram_pct = None
+    if psutil is not None:
+        cpu_load = round(psutil.cpu_percent(interval=0.05), 1)
+        ram_pct = round(psutil.virtual_memory().percent, 1)
+
+    gpu = get_gpu_info()
+    return {
+        "cpu": cpu_load,
+        "ram_pct": ram_pct,
+        "gpu_load": gpu["load"] if gpu else None,
+        "vram_used": gpu["vram_used"] if gpu else 0,
+        "vram_total": gpu["vram_total"] if gpu else 0,
+        "timestamp": time.strftime("%H:%M:%S"),
+    }
+
+
+class ModerationInput(BaseModel):
+    text: str
+    platform_dil: Optional[str] = "tr"
+
+
+@router.get("/vram-status")
+def get_vram_status():
+    if not torch.cuda.is_available():
+        return {
+            "cuda_available": False,
+            "message": "CUDA aktif değil, GPU belleği ölçülemedi.",
+        }
+
+    allocated = torch.cuda.memory_allocated(0) / (1024**2)
+    reserved = torch.cuda.memory_reserved(0) / (1024**2)
+    total = torch.cuda.get_device_properties(0).total_memory / (1024**2)
+
+    return {
+        "cuda_available": True,
+        "gpu_name": torch.cuda.get_device_name(0),
+        "allocated_mb": round(allocated, 2),
+        "reserved_mb": round(reserved, 2),
+        "total_mb": round(total, 2),
+        "free_estimate_mb": round(total - reserved, 2),
+    }
+
+
+@router.get("/refresh-cache")
+def refresh_cache():
+    load_blacklist_to_ram()
+    tr_count, en_count = get_cache_counts()
+    return {
+        "status": "success",
+        "message": "Kara liste güncellendi.",
+        "tr_count": tr_count,
+        "en_count": en_count,
+    }
+
+
+@router.post("/analyze")
+async def analyze(input_data: ModerationInput):
+    if not input_data.text or not input_data.text.strip():
+        raise HTTPException(status_code=400, detail="text alanı boş olamaz")
+
+    start_time = time.time()
+    decision, reason, risk, lang, cleaned, details = run_moderation(
+        input_data.text,
+        input_data.platform_dil or "tr",
+    )
+
+    latency_ms = round((time.time() - start_time) * 1000, 2)
+    performance = capture_process_metrics()
+    performance["latency_ms"] = latency_ms
+
+    return {
+        "text": input_data.text,
+        "cleaned_text": cleaned,
+        "decision": decision,
+        "reason": reason,
+        "risk_level": risk,
+        "language": lang,
+        "details": details,
+        "latency_ms": latency_ms,
+        "performance": performance,
+    }
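The CSV parsing inside `get_gpu_info()` above can be checked in isolation. The sketch below applies the same split-and-strip logic to a canned `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` line instead of shelling out (the sample values are made up):

```python
def parse_gpu_csv(raw: str) -> dict:
    """Parse one 'utilization, mem_used, mem_total' CSV line from nvidia-smi."""
    # Same logic as get_gpu_info(): first line only, split into three fields.
    util, mem_used, mem_total = [
        p.strip() for p in raw.strip().splitlines()[0].split(",", maxsplit=2)
    ]
    return {
        "load": int(float(util)),
        "vram_used": int(float(mem_used)),
        "vram_total": int(float(mem_total)),
    }

sample = "37, 2048, 16384\n"  # utilization %, MiB used, MiB total (fabricated)
print(parse_gpu_csv(sample))  # → {'load': 37, 'vram_used': 2048, 'vram_total': 16384}
```

`maxsplit=2` guarantees exactly three fields even if a later field ever contained a comma, and `int(float(...))` tolerates nvidia-smi emitting values like `37.0`.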
app/core/__init__.py ADDED
File without changes
app/core/config.py ADDED
@@ -0,0 +1,14 @@
+import os
+
+from dotenv import load_dotenv
+
+load_dotenv()
+
+APP_TITLE = "🛡️ Sentinel AI Moderasyon API"
+APP_DESCRIPTION = "Supabase tabanlı, yüksek performanslı moderasyon motoru."
+APP_VERSION = "2.5.0"
+
+SUPABASE_URL = os.getenv("SUPABASE_URL", "")
+SUPABASE_KEY = os.getenv("SUPABASE_KEY", "")
+
+TR_HATE_MODEL_PATH = "./models_cache/bertturk-hate-speech"
+TR_OFF_MODEL_PATH = "./models_cache/bertturk-offensive-42k"
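`app/core/config.py` reads its secrets from a local `.env` via python-dotenv, and the `.gitignore` above deliberately whitelists `.env.example` (`!.env.example`). A matching example file, with obvious placeholder values rather than real credentials, could look like:

```
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-service-or-anon-key
```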
app/db/__init__.py ADDED
File without changes
app/db/supabase_client.py ADDED
@@ -0,0 +1,24 @@
+from supabase import create_client
+
+from app.core.config import SUPABASE_KEY, SUPABASE_URL
+
+_supabase = None
+
+
+def get_supabase_client():
+    global _supabase
+
+    if _supabase is not None:
+        return _supabase
+
+    if not SUPABASE_URL or not SUPABASE_KEY:
+        print("⚠️ Supabase bilgileri .env içinde bulunamadı!")
+        return None
+
+    try:
+        _supabase = create_client(SUPABASE_URL, SUPABASE_KEY)
+    except Exception as exc:
+        print(f"⚠️ Supabase client oluşturulamadı: {exc}")
+        _supabase = None
+
+    return _supabase
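`get_supabase_client()` above is a lazy, memoized singleton: the first successful `create_client` call is cached in a module-level global and every later call returns the same instance, so the connection cost is paid once per process. The same pattern in miniature, with a stand-in factory instead of the real Supabase call:

```python
_client = None

def expensive_factory() -> dict:
    # Stand-in for supabase.create_client(...); pretend this is costly.
    return {"connected": True}

def get_client() -> dict:
    """Create the client on first use, then return the cached instance."""
    global _client
    if _client is None:
        _client = expensive_factory()
    return _client

a = get_client()
b = get_client()
print(a is b)  # → True: both names point at the same cached object
```

One consequence worth noting: because a failed `create_client` resets `_client` to `None`, the real function retries on the next call rather than caching the failure.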
app/ml/__init__.py ADDED
File without changes
app/ml/model_loader.py ADDED
@@ -0,0 +1,55 @@
+import torch
+from detoxify import Detoxify
+from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
+
+from app.core.config import TR_OFF_MODEL_PATH
+
+_STATE = {
+    "T_O": None,
+    "M_O": None,
+    "GB_PIPE": None,
+    "D_EN": None,
+    "D_MULTI": None,
+    "TORCH_DEVICE": None,
+}
+
+
+def load_system():
+    if _STATE["T_O"] is not None:
+        return _STATE
+
+    device_id = 0 if torch.cuda.is_available() else -1
+    torch_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+    tokenizer_o = AutoTokenizer.from_pretrained(TR_OFF_MODEL_PATH)
+    model_o = AutoModelForSequenceClassification.from_pretrained(TR_OFF_MODEL_PATH).to(torch_device)
+    model_o.eval()
+
+    try:
+        gibberish = pipeline(
+            "text-classification",
+            model="madhurjindal/autonlp-Gibberish-Detector-492513457",
+            device=device_id,
+        )
+    except Exception:
+        gibberish = None
+
+    detox_en = Detoxify("original")
+    detox_multi = Detoxify("multilingual")
+
+    _STATE.update(
+        {
+            "T_O": tokenizer_o,
+            "M_O": model_o,
+            "GB_PIPE": gibberish,
+            "D_EN": detox_en,
+            "D_MULTI": detox_multi,
+            "TORCH_DEVICE": torch_device,
+        }
+    )
+
+    return _STATE
+
+
+def get_model_state():
+    return load_system()
app/services/__init__.py ADDED
File without changes
app/services/cache_manager.py ADDED
@@ -0,0 +1,73 @@
+from app.db.supabase_client import get_supabase_client
+
+CACHE_KUFUR_DICT_TR = {}
+CACHE_KUFUR_DICT_EN = {}
+
+
+def load_blacklist_to_ram():
+    """Paginated loader that works around the Supabase per-request row limit (1000 rows)."""
+    global CACHE_KUFUR_DICT_TR, CACHE_KUFUR_DICT_EN
+
+    temp_tr = {}
+    temp_en = {}
+
+    supabase = get_supabase_client()
+    if supabase is None:
+        print("⚠️ Supabase bağlantısı yok!")
+        return
+
+    try:
+        print("🌐 Supabase'den tüm liste çekiliyor...")
+        all_rows = []
+        start = 0
+        page_size = 1000
+
+        while True:
+            response = (
+                supabase.table("blacklist")
+                .select("word, language, category")
+                .range(start, start + page_size - 1)
+                .execute()
+            )
+
+            data = response.data or []
+            all_rows.extend(data)
+
+            if len(data) < page_size:
+                break
+            start += page_size
+
+        print(f"📊 Toplam çekilen satır: {len(all_rows)}")
+
+        found_langs = set()
+
+        for row in all_rows:
+            lang_raw = str(row.get("language", "")).lower().strip()
+            word = str(row.get("word", "")).lower().strip()
+            cat = str(row.get("category", "insult")).lower().strip() or "insult"
+
+            if not word:
+                continue
+
+            found_langs.add(lang_raw)
+
+            if lang_raw == "tr":
+                temp_tr[word] = cat
+            elif lang_raw == "en":
+                temp_en[word] = cat
+
+        CACHE_KUFUR_DICT_TR = temp_tr
+        CACHE_KUFUR_DICT_EN = temp_en
+
+        print(f"🔍 Veritabanındaki diller: {found_langs}")
+        print(f"✅ RAM Hazır: {len(temp_tr)} TR, {len(temp_en)} EN kelime.")
+    except Exception as exc:
+        print(f"❌ Cache Hatası: {exc}")
+
+
+def get_blacklist_for_language(language: str):
+    return CACHE_KUFUR_DICT_TR if language == "tr" else CACHE_KUFUR_DICT_EN
+
+
+def get_cache_counts():
+    return len(CACHE_KUFUR_DICT_TR), len(CACHE_KUFUR_DICT_EN)
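The loader pages through the `blacklist` table because one request returns at most `page_size` rows, and it stops on the first short page. The same loop run against an in-memory table (`fetch_range` is a stand-in for the `supabase.table(...).select(...).range(start, end).execute()` call; note `.range()` bounds are inclusive, hence `start + page_size - 1`):

```python
# 2500 fake rows, so the loop needs three pages: 1000 + 1000 + 500.
rows = [{"word": f"w{i}", "language": "tr"} for i in range(2500)]
page_size = 1000


def fetch_range(start, end):
    # Stand-in for the Supabase call; .range() bounds are inclusive.
    return rows[start:end + 1]


all_rows = []
start = 0
while True:
    page = fetch_range(start, start + page_size - 1)
    all_rows.extend(page)
    if len(page) < page_size:   # a short page means no more data
        break
    start += page_size

print(len(all_rows))  # → 2500
```

If the table size is an exact multiple of `page_size`, the loop makes one extra request that returns an empty page, which also satisfies the stop condition.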
app/services/moderation_service.py ADDED
@@ -0,0 +1,165 @@
+import re
+
+import torch
+
+from app.ml.model_loader import get_model_state
+from app.services.cache_manager import get_blacklist_for_language, get_cache_counts, load_blacklist_to_ram
+from app.utils.text_utils import clean_text_nfkc, is_spam
+
+# Risk level → moderation action, shared by both language branches.
+_ACTION_MAP = {
+    "CRITICAL": "CENSOR",
+    "HIGH": "WARN",
+    "MEDIUM": "MONITOR",
+    "LOW": "MONITOR",
+    "NONE": "ALLOW",
+}
+
+
+def _ensure_runtime_ready():
+    state = get_model_state()
+    tr_count, en_count = get_cache_counts()
+    if tr_count == 0 and en_count == 0:
+        load_blacklist_to_ram()
+    return state
+
+
+def calculate_verdict(hits, ai_scores):
+    if hits:
+        return {
+            "decision": "🚨 KÜFÜR / PROFANITY",
+            "risk_level": "CRITICAL",
+            "reason": f"Sözlük eşleşmesi: {', '.join(hits)}",
+        }
+
+    off_score = ai_scores.get("off_score", 0.0)
+    detox_score = ai_scores.get("detox_toxicity", 0.0)
+    max_score = max(off_score, detox_score)
+
+    if max_score > 0.80:
+        return {
+            "decision": "🟡 SALDIRGAN / TOXIC",
+            "risk_level": "MEDIUM",
+            "reason": "Yapay zeka yüksek derecede saldırganlık/hakaret algıladı.",
+        }
+
+    if 0.55 < max_score <= 0.80:
+        return {
+            "decision": "🔵 İNCELEME GEREKLİ",
+            "risk_level": "LOW",
+            "reason": "Gri alan tespiti; manuel moderatör onayı önerilir.",
+        }
+
+    return {
+        "decision": "✅ TEMİZ",
+        "risk_level": "NONE",
+        "reason": "İçerik güvenli.",
+    }
+
+
+def run_moderation(text: str, platform_dil: str = "tr"):
+    state = _ensure_runtime_ready()
+
+    temiz = clean_text_nfkc(text)
+    dil = "en" if platform_dil == "en" else "tr"
+    pure_text = re.sub(r"[^a-zA-ZçğıöşüÇĞİÖŞÜ0-9\s]", "", temiz).lower()
+    words_in_pure_text = set(pure_text.split())
+
+    if is_spam(temiz, dil):
+        return (
+            "🗑️ SPAM/GİBBERİSH",
+            "Anlamsız veya tekrarlı içerik.",
+            "LOW",
+            dil,
+            temiz,
+            {"action": "MONITOR", "detox": {}},
+        )
+
+    active_cache = get_blacklist_for_language(dil)
+    detected_profanity = []
+    detected_insult = []
+
+    for bad_word, category in active_cache.items():
+        is_hit = bad_word in words_in_pure_text or (len(bad_word) > 3 and bad_word in pure_text)
+        if is_hit:
+            if category == "profanity":
+                detected_profanity.append(bad_word)
+            else:
+                detected_insult.append(bad_word)
+
+    profanity_hits = sorted(set(detected_profanity))
+    insult_hits = sorted(set(detected_insult))
+
+    if dil == "en":
+        gb = None
+        if state["GB_PIPE"] is not None:
+            gb_raw = state["GB_PIPE"](temiz)[0]
+            gb = {
+                "label": str(gb_raw.get("label", "")),
+                "score": float(gb_raw.get("score", 0.0)),
+            }
+            if gb.get("label", "").lower() == "noise" and gb.get("score", 0.0) > 0.98:
+                return (
+                    "🗑️ GIBBERISH/SPAM",
+                    "Metin anlamsız karakter dizileri içeriyor.",
+                    "LOW",
+                    "en",
+                    temiz,
+                    {"gibberish": gb, "action": "MONITOR", "detox": {}},
+                )
+
+        raw_res = state["D_EN"].predict(temiz)
+        res = {k: float(v) for k, v in raw_res.items()}
+        tox_score = float(res.get("toxicity", 0.0))
+        ins_score = float(res.get("insult", 0.0))
+        identity_attack = float(res.get("identity_attack", 0.0))
+        detail = {
+            "detox": res,
+            "insult": ins_score,
+            "toxicity": tox_score,
+            "identity_attack": identity_attack,
+            "hits": profanity_hits,
+            "insult_hits": insult_hits,
+            "gibberish": gb,
+        }
+        verdict = calculate_verdict(
+            profanity_hits,
+            {
+                "off_score": 0.0,
+                "detox_toxicity": tox_score,
+            },
+        )
+        detail["action"] = _ACTION_MAP.get(verdict["risk_level"], "MONITOR")
+        return verdict["decision"], verdict["reason"], verdict["risk_level"], dil, temiz, detail
+
+    in_o = state["T_O"](temiz, return_tensors="pt", truncation=True, padding=True, max_length=128)
+    in_o = {k: v.to(state["TORCH_DEVICE"]) for k, v in in_o.items()}
+    with torch.no_grad():
+        out_o = state["M_O"](**in_o)
+    p_o = torch.softmax(out_o.logits, dim=1)[0]
+    off_score = float(p_o[1].item()) if p_o.numel() > 1 else float(p_o.max().item())
+
+    raw_threat_res = state["D_MULTI"].predict(temiz)
+    threat_res = {k: float(v) for k, v in raw_threat_res.items()}
+    threat = float(threat_res.get("threat", 0.0))
+    tox_score = float(threat_res.get("toxicity", 0.0))
+    ins_score = float(threat_res.get("insult", 0.0))
+
+    detail = {
+        "off_score": off_score,
+        "toxicity": tox_score,
+        "insult": ins_score,
+        "threat": threat,
+        "detox": threat_res,
+        "hits": profanity_hits,
+        "insult_hits": insult_hits,
+    }
+    verdict = calculate_verdict(
+        profanity_hits,
+        {
+            "off_score": off_score,
+            "detox_toxicity": tox_score,
+        },
+    )
+    detail["action"] = _ACTION_MAP.get(verdict["risk_level"], "MONITOR")
+    return verdict["decision"], verdict["reason"], verdict["risk_level"], dil, temiz, detail
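`calculate_verdict()` applies a dictionary-first rule, then two score bands on `max(off_score, detox_toxicity)`. A self-contained restatement of just the banding logic, runnable without any models (the function name and inputs here are illustrative):

```python
def risk_level(hits, off_score, detox_toxicity):
    if hits:
        return "CRITICAL"                  # any dictionary hit wins outright
    m = max(off_score, detox_toxicity)
    if m > 0.80:
        return "MEDIUM"                    # high-confidence toxicity
    if 0.55 < m <= 0.80:
        return "LOW"                       # grey zone: manual review suggested
    return "NONE"


print(risk_level(["badword"], 0.1, 0.1))   # → CRITICAL
print(risk_level([], 0.9, 0.2))            # → MEDIUM
print(risk_level([], 0.6, 0.3))            # → LOW
print(risk_level([], 0.2, 0.1))            # → NONE
```

Note the ordering: a blacklist hit short-circuits before any model score is consulted, which is what makes the dictionary path an early exit in `run_moderation()`.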
app/utils/__init__.py ADDED
File without changes
app/utils/text_utils.py ADDED
@@ -0,0 +1,65 @@
+import re
+import unicodedata
+
+
+def clean_text_nfkc(text: str) -> str:
+    text = unicodedata.normalize('NFKC', str(text))
+    text = text.replace('İ', 'i').replace('I', 'ı').lower()
+    text = re.sub(r'(?<=[a-zğüşıöç0-9])[\.\-_\*]+(?=[a-zğüşıöç0-9])', '', text)
+    leet_map = {'0': 'o', '1': 'i', '3': 'e', '4': 'a', '5': 's', '7': 't', '8': 'b'}
+    for key, value in leet_map.items():
+        text = text.replace(key, value)
+    text = re.sub(r'(.)\1+', r'\1', text)
+    return " ".join(text.split())
+
+
+def check_blacklist(text: str, blacklist_set: set) -> bool:
+    return bool(set(text.split()) & blacklist_set)
+
+
+def is_spam(temiz: str, dil: str = "tr") -> bool:
+    sadece_harf = re.sub(r'[^a-zğüşıöç]', '', temiz)
+    n = len(sadece_harf)
+
+    if n < 2:
+        return True
+
+    sesli = set('aeıioöuüeiou')
+    sesli_oran = sum(1 for c in sadece_harf if c in sesli) / max(n, 1)
+    if 5 < n < 100 and sesli_oran < 0.15:
+        return True
+
+    if dil == "tr":
+        tr_olmayan = set('wqx')
+        tr_olmayan_oran = sum(1 for c in sadece_harf if c in tr_olmayan) / max(n, 1)
+        if tr_olmayan_oran > 0.2:
+            return True
+
+    unique_chars = len(set(sadece_harf))
+    if 10 < n < 50:
+        if unique_chars / n < 0.25:
+            return True
+    elif n >= 50:
+        if unique_chars < 8:
+            return True
+
+    if re.search(r'(.)\1{6,}', temiz):
+        return True
+
+    n_temiz = len(temiz)
+    for blok in range(3, min(10, n_temiz // 2 + 1)):
+        pattern = temiz[:blok]
+        tekrar = temiz.count(pattern)
+        if tekrar >= 4 and tekrar * blok >= n_temiz * 0.7:
+            return True
+
+    spam_patterns = [
+        r'http[s]?://', r'www\.', r'\.com', r'\.net', r'\.org',
+        r'click\s*here', r'buy\s*cheap', r'free\s*follow',
+        r'tıkla.*kazan', r'ücretsiz.*takipçi', r'satın\s*al',
+        r'indirim.*%', r'subscribe.*channel',
+    ]
+    for pattern in spam_patterns:
+        if re.search(pattern, temiz, re.IGNORECASE):
+            return True
+    return False
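`clean_text_nfkc()` normalizes Unicode, folds the Turkish dotted/dotless I, strips in-word separators, undoes common leetspeak, and collapses repeated characters. The function verbatim, applied to an obfuscated input:

```python
import re
import unicodedata


# clean_text_nfkc from app/utils/text_utils.py, unchanged.
def clean_text_nfkc(text: str) -> str:
    text = unicodedata.normalize('NFKC', str(text))
    text = text.replace('İ', 'i').replace('I', 'ı').lower()
    text = re.sub(r'(?<=[a-zğüşıöç0-9])[\.\-_\*]+(?=[a-zğüşıöç0-9])', '', text)
    leet_map = {'0': 'o', '1': 'i', '3': 'e', '4': 'a', '5': 's', '7': 't', '8': 'b'}
    for key, value in leet_map.items():
        text = text.replace(key, value)
    text = re.sub(r'(.)\1+', r'\1', text)
    return " ".join(text.split())


# Separator stripping, leet folding, and repeat collapsing combine:
print(clean_text_nfkc("H3.l-l0ooo"))  # → helo
```

Note that the repeat-collapse step also shortens legitimate double letters (`"hello"` becomes `"helo"`), so blacklist entries have to be matched in the same normalized form.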
main.py ADDED
@@ -0,0 +1,12 @@
+from fastapi import FastAPI
+
+from app.api.endpoints import router
+from app.core.config import APP_DESCRIPTION, APP_TITLE, APP_VERSION
+
+app = FastAPI(
+    title=APP_TITLE,
+    description=APP_DESCRIPTION,
+    version=APP_VERSION,
+)
+
+app.include_router(router)
performance_test.py ADDED
@@ -0,0 +1,58 @@
+import time
+
+import requests
+
+API_URL = "http://127.0.0.1:8000/analyze"
+
+heavy_text_tr = """
+Merhaba, şu an geliştirmekte olduğumuz Sentinel moderasyon sisteminin performans limitlerini test etmek amacıyla bu uzun paragrafı oluşturuyorum.
+Yapay zeka modellerinin, özellikle BERTurk ve Detoxify gibi derin öğrenme mimarilerinin, metin uzunluğu arttıkça işlem süresini nasıl değiştirdiğini gözlemlemek bizim için kritik.
+Bu metin, herhangi bir küfür veya spam emaresi taşımadığı için sistemin tüm ön filtrelerinden geçerek doğrudan doğal dil işleme katmanına ulaşacaktır.
+Burada tokenization süreci, modelin çıkarım (inference) hızı ve donanım kaynaklarının kullanımı gibi metrikleri saniyeler bazında değil, milisaniyeler bazında ölçerek sistemin gerçek zamanlı
+isteklere ne kadar hazırlıklı olduğunu kanıtlamış olacağız. Umarım sonuçlar, sistemin ölçeklenebilirliği hakkında bize net bir veri sağlar.
+"""
+
+heavy_text_en = """
+Hello, this is a long paragraph designed to test the performance limits of the Sentinel moderation system in English.
+We are specifically looking at how the Detoxify original model handles longer contexts and multiple toxicity labels simultaneously.
+By sending this comprehensive text, we ensure that the system bypasses simple keyword filters and triggers the full deep learning pipeline.
+This will give us a clear baseline for latency in a global production environment.
+"""
+
+test_scenarios = [
+    ("TR - Kısa Temiz", "Merhaba, bugün hava çok güzel.", "tr"),
+    ("TR - Early Exit (Küfür)", "Lan naber o.ç.", "tr"),
+    ("TR - Ağır AI Yükü", heavy_text_tr, "tr"),
+    ("EN - Kısa Temiz", "Hello, I hope you are having a wonderful day.", "en"),
+    ("EN - Early Exit (Profanity)", "Shut the fuck up you bastard!", "en"),
+    ("EN - Ağır AI Yükü", heavy_text_en, "en"),
+]
+
+
+def run_performance_suite() -> None:
+    print(f"{'Senaryo Adı':<30} | {'Dil':<4} | {'API Latency':<12} | {'Toplam Süre':<12}")
+    print("-" * 75)
+
+    for label, text, lang in test_scenarios:
+        start_time = time.time()
+        try:
+            payload = {"text": text, "platform_dil": lang}
+            response = requests.post(API_URL, json=payload, timeout=60)
+            total_time = (time.time() - start_time) * 1000
+        except requests.RequestException as exc:
+            print(f"{label:<30} | {lang:<4} | BAĞLANTI HATASI: {exc}")
+            continue
+
+        if response.status_code == 200:
+            res_json = response.json()
+            api_latency = float(res_json.get("latency_ms", 0))
+            status_symbol = "⚡" if api_latency < 50 else "🧠"
+            print(f"{label:<30} | {lang.upper():<4} | {api_latency:>8.2f} ms | {total_time:>8.2f} ms {status_symbol}")
+        else:
+            print(f"{label:<30} | {lang:<4} | HATA: {response.status_code}")
+
+
+if __name__ == "__main__":
+    requests.post(API_URL, json={"text": "warmup", "platform_dil": "tr"}, timeout=60)
+    requests.post(API_URL, json={"text": "warmup", "platform_dil": "en"}, timeout=60)
+    run_performance_suite()
requirements.txt ADDED
@@ -0,0 +1,16 @@
+transformers
+torch
+detoxify
+streamlit
+pandas
+requests
+openpyxl
+sentencepiece
+matplotlib
+scikit-learn
+fastapi
+uvicorn
+supabase
+python-dotenv
+psutil
+python-docx
stress_test.py ADDED
@@ -0,0 +1,69 @@
+import time
+from concurrent.futures import ThreadPoolExecutor
+
+import requests
+import torch
+
+API_URL = "http://127.0.0.1:8000/analyze"
+VRAM_URL = "http://127.0.0.1:8000/vram-status"
+TOTAL_REQUESTS = 50  # total number of requests
+CONCURRENT_USERS = 5  # number of simultaneous users
+
+payload = {
+    "text": "Bu bir performans testidir. Sistemimiz hem Türkçe hem İngilizce içerikleri başarıyla analiz edebiliyor.",
+    "platform_dil": "tr",
+}
+
+
+def send_request(_):
+    start = time.time()
+    response = requests.post(API_URL, json=payload, timeout=60)
+    response.raise_for_status()
+    return (time.time() - start) * 1000
+
+
+def run_stress_test() -> None:
+    print(f"🔥 Stress Test Başlatılıyor: {TOTAL_REQUESTS} istek, {CONCURRENT_USERS} eşzamanlı kanal...")
+
+    with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as executor:
+        # Send one warm-up request first and keep it out of the measurements.
+        requests.post(API_URL, json=payload, timeout=60)
+
+        start_time = time.time()
+        latencies = list(executor.map(send_request, range(TOTAL_REQUESTS)))
+        total_duration = time.time() - start_time
+
+    avg_latency = sum(latencies) / len(latencies)
+    rps = TOTAL_REQUESTS / total_duration
+
+    if torch.cuda.is_available():
+        runtime_label = f"GPU - {torch.cuda.get_device_name(0)} Üzerinde"
+    else:
+        runtime_label = "CPU Üzerinde"
+
+    print("\n" + "=" * 40)
+    print(f"📊 SONUÇLAR ({runtime_label})")
+    print("-" * 40)
+    print(f"⏱️ Ortalama Gecikme: {avg_latency:.2f} ms")
+    print(f"🚀 Saniyedeki İstek (RPS): {rps:.2f} req/sec")
+    print(f"⌛ Toplam Süre: {total_duration:.2f} saniye")
+    print("=" * 40)
+
+    print("\n🔎 VRAM Snapshot (/vram-status)")
+    try:
+        vram_resp = requests.get(VRAM_URL, timeout=10)
+        vram_resp.raise_for_status()
+        vram = vram_resp.json()
+        if vram.get("cuda_available"):
+            print(f"📟 GPU: {vram.get('gpu_name', 'Bilinmiyor')}")
+            print(f"🔥 Allocated: {vram.get('allocated_mb', 0)} MB")
+            print(f"🛡️ Reserved: {vram.get('reserved_mb', 0)} MB")
+            print(f"🆓 Free (Tahmini): {vram.get('free_estimate_mb', 0)} MB")
+        else:
+            print(f"ℹ️ {vram.get('message', 'CUDA aktif değil.')}")
+    except requests.RequestException as exc:
+        print(f"⚠️ VRAM endpoint erişilemedi: {exc}")
+
+
+if __name__ == "__main__":
+    run_stress_test()
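The summary above reports only the mean, but under concurrency it is usually tail latency (p95/p99) that degrades first. A stdlib-only nearest-rank percentile sketch that could be applied to the `latencies` list (a hypothetical extension, not part of the script):

```python
def percentile(samples, pct):
    # Nearest-rank method: the smallest value with at least pct% of
    # the samples at or below it.
    ordered = sorted(samples)
    rank = max(1, int(round(pct / 100 * len(ordered))))
    return ordered[rank - 1]


# One outlier (90 ms) barely moves the median but dominates p95.
latencies = [12.0, 15.0, 14.0, 90.0, 13.0, 16.0, 18.0, 17.0, 11.0, 19.0]
print(percentile(latencies, 50))  # → 15.0
print(percentile(latencies, 95))  # → 90.0
```

With 50 requests, p95 is effectively the third-slowest response, which is a far stricter SLO signal than the average.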
utils.py ADDED
@@ -0,0 +1,51 @@
+import unicodedata
+import re
+
+def clean_text_nfkc(text: str) -> str:
+    text = unicodedata.normalize('NFKC', str(text))
+    text = text.replace('İ', 'i').replace('I', 'ı').lower()
+    text = re.sub(r'(?<=[a-zğüşıöç0-9])[\.\-_\*]+(?=[a-zğüşıöç0-9])', '', text)
+    leet_map = {'0': 'o', '1': 'i', '3': 'e', '4': 'a', '5': 's', '7': 't', '8': 'b'}
+    for key, value in leet_map.items():
+        text = text.replace(key, value)
+    text = re.sub(r'(.)\1+', r'\1', text)
+    return " ".join(text.split())
+
+
+def check_blacklist(text: str, blacklist_set: set) -> bool:
+    return bool(set(text.split()) & blacklist_set)
+
+
+def is_spam(temiz: str, dil: str = "tr") -> bool:
+    sadece_harf = re.sub(r'[^a-zğüşıöç]', '', temiz)
+    if len(sadece_harf) < 2:
+        return True
+    sesli = set('aeıioöuüeiou')
+    sesli_oran = sum(1 for c in sadece_harf if c in sesli) / max(len(sadece_harf), 1)
+    if len(sadece_harf) > 5 and sesli_oran < 0.15:
+        return True
+    if dil == "tr":
+        tr_olmayan = set('wqx')
+        tr_olmayan_oran = sum(1 for c in sadece_harf if c in tr_olmayan) / max(len(sadece_harf), 1)
+        if tr_olmayan_oran > 0.2:
+            return True
+    if re.search(r'(.)\1{4,}', temiz):
+        return True
+    n = len(temiz)
+    for blok in range(2, n // 2 + 1):
+        pattern = temiz[:blok]
+        tekrar = len(re.findall(re.escape(pattern), temiz))
+        if tekrar >= 3 and tekrar * blok >= n * 0.6:
+            return True
+    if len(sadece_harf) > 10 and len(set(sadece_harf)) / len(sadece_harf) < 0.25:
+        return True
+    spam_patterns = [
+        r'http[s]?://', r'www\.', r'\.com', r'\.net', r'\.org',
+        r'click\s*here', r'buy\s*cheap', r'free\s*follow',
+        r'tıkla.*kazan', r'ücretsiz.*takipçi', r'satın\s*al',
+        r'indirim.*%', r'subscribe.*channel',
+    ]
+    for pattern in spam_patterns:
+        if re.search(pattern, temiz, re.IGNORECASE):
+            return True
+    return False
vram_check.py ADDED
@@ -0,0 +1,32 @@
+import torch
+
+CRITICAL_RESERVED_MB = 3500
+
+
+def check_vram_usage() -> None:
+    if not torch.cuda.is_available():
+        print("❌ CUDA aktif değil, VRAM ölçülemez.")
+        return
+
+    allocated = torch.cuda.memory_allocated(0) / (1024 ** 2)
+    reserved = torch.cuda.memory_reserved(0) / (1024 ** 2)
+    total_capacity = torch.cuda.get_device_properties(0).total_memory / (1024 ** 2)
+    free_estimate = total_capacity - reserved
+
+    print("=" * 40)
+    print(f"📟 GPU: {torch.cuda.get_device_name(0)}")
+    print(f"📊 Toplam VRAM: {total_capacity:.2f} MB")
+    print(f"🔥 Şu An Ayrılan (Allocated): {allocated:.2f} MB")
+    print(f"🛡️ Rezerve Edilen (Reserved): {reserved:.2f} MB")
+    print(f"🆓 Boş Alan (Tahmini): {free_estimate:.2f} MB")
+
+    if reserved >= CRITICAL_RESERVED_MB:
+        print(f"⚠️ Kritik Eşik Aşıldı: Reserved >= {CRITICAL_RESERVED_MB} MB")
+    else:
+        print(f"✅ Güvenli Bölge: Reserved < {CRITICAL_RESERVED_MB} MB")
+
+    print("=" * 40)
+
+
+if __name__ == "__main__":
+    check_vram_usage()
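`check_vram_usage()` divides torch's raw byte counters by 1024² to report MiB and then compares the reserved figure against `CRITICAL_RESERVED_MB`. The conversion and threshold check in isolation, with no GPU required (the helper names are illustrative):

```python
CRITICAL_RESERVED_MB = 3500


def to_mib(n_bytes: int) -> float:
    # torch.cuda.memory_reserved() returns bytes; the script reports MiB.
    return n_bytes / (1024 ** 2)


def is_critical(reserved_bytes: int) -> bool:
    return to_mib(reserved_bytes) >= CRITICAL_RESERVED_MB


print(to_mib(3670016))            # → 3.5
print(is_critical(4 * 1024 ** 3))  # 4 GiB reserved → True
print(is_critical(2 * 1024 ** 3))  # 2 GiB reserved → False
```

Reserved memory (the CUDA caching allocator's pool) is the right figure to threshold here, since it bounds what other processes on the card can still obtain.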