Spaces: Davidbio/fakenewsdetection


Davidbio committed on Dec 17, 2025

Commit 68e495a (verified) · 1 Parent(s): cf88e8b

Upload Manual ML Indobert

Files changed (6)

.dockerignore +91 -0
.gitignore +61 -0
Dockerfile +33 -0
README.md +90 -11
app.py +218 -0
requirements.txt +17 -0

.dockerignore ADDED

	@@ -0,0 +1,91 @@

+# .dockerignore - Exclude files that the Docker build does not need
+# This ensures that only files actually in use end up in the Docker image
+# Unused model checkpoints and versions
+models/indobert_versions/
+models/indobert/checkpoint-*/
+**/checkpoint-*/
+**/checkpoints/
+**/*.ckpt
+# Training artifacts that are not needed
+*.pth.tar
+*.pt
+trainer_state.json
+training_args.bin
+# Data and datasets (not needed in the Space)
+data/
+datasets/
+*.csv
+*.json
+*.jsonl
+!config.json
+!tokenizer.json
+!tokenizer_config.json
+!special_tokens_map.json
+# Python cache
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+*.egg-info/
+.pytest_cache/
+.mypy_cache/
+# Virtual environments
+venv/
+env/
+ENV/
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+.DS_Store
+Thumbs.db
+# Git
+.git/
+.gitignore
+.gitattributes
+# Documentation that is not needed at runtime
+DEPLOYMENT.md
+QUICKSTART.md
+*.md
+!README.md
+# Test files
+test_*.py
+tests/
+*_test.py
+# Scripts that are not needed at runtime
+upload_model_to_hub.py
+quick_deploy.sh
+quick_deploy.ps1
+# Logs
+*.log
+logs/
+# Temporary files
+*.tmp
+tmp/
+temp/
+.cache/
+# Large files that are not needed
+*.zip
+*.tar
+*.gz
+*.rar
+# Notebook checkpoints
+.ipynb_checkpoints/
+*.ipynb
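
Note on the `.dockerignore` above: the `*.json` rule followed by `!config.json`, `!tokenizer.json` and the other exceptions depends on rules being evaluated in file order, with the last matching rule deciding. The following is a minimal Python sketch of that ordering logic only. It assumes top-level file names and uses `fnmatch` as a stand-in for Docker's own pattern matcher; the `is_ignored` helper and the `RULES` subset are hypothetical and not part of Docker or this repository.

```python
from fnmatch import fnmatch

# A subset of the rules from the .dockerignore above, in file order.
RULES = ["*.json", "*.jsonl", "!config.json", "!tokenizer.json", "*.md", "!README.md"]


def is_ignored(filename: str, rules: list[str] = RULES) -> bool:
    """Return True if the last rule matching `filename` is an exclusion."""
    ignored = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if fnmatch(filename, pattern):
            # A later match overrides any earlier decision.
            ignored = not negated
    return ignored


for name in ["dataset.json", "config.json", "DEPLOYMENT.md", "README.md", "app.py"]:
    print(f"{name}: {'ignored' if is_ignored(name) else 'kept'}")
```

Docker's documentation describes patterns as relative to the root of the build context, so a rule such as `*.json` or `*.md` applies to the top level, and nested paths need a `**/` prefix (as the `**/checkpoint-*/` rules in this file already use). The sketch does not model that part.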

.gitignore ADDED

	@@ -0,0 +1,61 @@

+# .gitignore for the Hugging Face Space
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+# Virtual Environment
+venv/
+env/
+ENV/
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+# OS
+.DS_Store
+Thumbs.db
+# Model files (if already uploaded to the HF Hub)
+models/indobert/*.bin
+models/indobert/*.safetensors
+models/indobert/pytorch_model.bin
+# Logs
+*.log
+logs/
+# Environment variables
+.env
+.env.local
+# Test files
+test_*.py
+tests/
+# Temporary files
+*.tmp
+tmp/
+temp/

Dockerfile ADDED

	@@ -0,0 +1,33 @@

+# Hugging Face Space - IndoBERT Fake News Detection
+# Base image with Python 3.10
+FROM python:3.10-slim
+# Set working directory
+WORKDIR /app
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    git \
+    build-essential \
+    && rm -rf /var/lib/apt/lists/*
+# Copy requirements first so the dependency layer can be cached
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy application code (only the files that are needed)
+COPY app.py .
+# The model is loaded from the Hugging Face Hub, so no local copy is needed
+# HF_MODEL_REPO defaults to the value set below and can be overridden in the Space settings
+# Expose port 7860 (default for HF Spaces)
+EXPOSE 7860
+# Set environment variables
+ENV GRADIO_SERVER_NAME="0.0.0.0"
+ENV GRADIO_SERVER_PORT=7860
+ENV HF_MODEL_REPO="Davidbio/fakenewsdetection"
+# Run the application
+CMD ["python", "app.py"]
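
Note on the `ENV` lines above: they give the container default values that the application can read at startup, and `app.py` in this commit reads `HF_MODEL_REPO` this way. The following is a minimal Python sketch of that pattern, assuming the three variable names and defaults from the Dockerfile. The `SpaceConfig` dataclass and the `load_config` helper are hypothetical names used for illustration, and `some-user/other-model` is a made-up repository ID.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SpaceConfig:
    model_repo: str
    server_name: str
    server_port: int


def load_config(environ=None) -> SpaceConfig:
    """Read settings from the environment, using the Dockerfile values as defaults."""
    env = os.environ if environ is None else environ
    return SpaceConfig(
        model_repo=env.get("HF_MODEL_REPO", "Davidbio/fakenewsdetection"),
        server_name=env.get("GRADIO_SERVER_NAME", "0.0.0.0"),
        # Environment values are strings, so the port is converted explicitly.
        server_port=int(env.get("GRADIO_SERVER_PORT", "7860")),
    )


# With no overrides, the defaults apply.
print(load_config({}))
# A variable set in the Space settings would take precedence over the default.
print(load_config({"HF_MODEL_REPO": "some-user/other-model"}))
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise without touching the real process environment.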

README.md CHANGED

@@ -1,11 +1,90 @@
----
-title: Fakenewsdetection
-emoji: 🏢
-colorFrom: gray
-colorTo: green
-sdk: docker
-pinned: false
-license: mit
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: IndoBERT Fake News Detection
+emoji: 🔍
+colorFrom: red
+colorTo: yellow
+sdk: docker
+pinned: false
+license: mit
+app_port: 7860
+tags:
+  - indonesian
+  - fake-news
+  - bert
+  - classification
+  - nlp
+  - text-classification
+models:
+  - Davidbio/fakenewsdetection
+---
+# 🔍 IndoBERT Fake News Detection
+An application for detecting Indonesian-language hoax news with the IndoBERT model.
+## 📖 Description
+This model uses **IndoBERT** (Indonesian BERT), fine-tuned on an Indonesian news dataset, to classify news as **Real** or **Hoax (Fake News)**.
+### ✨ Features
+- 🤖 Automatic hoax detection using deep learning
+- 📊 Displays the confidence score and detailed probabilities
+- 🇮🇩 Optimized for Indonesian-language text
+- ⚡ Easy-to-use interface built with Gradio
+## 🚀 Usage
+1. Enter the news text in the input box
+2. Click the "🔍 Deteksi Berita" button
+3. Review the analysis results:
+   - Predicted label (Real/Hoax)
+   - Confidence score
+   - Probability distribution
+## 🎯 Model Information
+- **Base Model:** indobenchmark/indobert-base-p1
+- **Task:** Binary Text Classification
+- **Classes:**
+  - 0: Real News
+  - 1: Fake News (Hoax)
+- **Max Sequence Length:** 256 tokens
+- **Framework:** PyTorch + Transformers
+## ⚠️ Disclaimer
+This model is an **assistive tool** and does not guarantee 100% accuracy. Always verify information against trusted sources before concluding that a news item is a hoax.
+## 📚 Dataset
+The model was trained on an Indonesian news dataset whose items are labeled as real or hoax.
+## 🛠️ Technology Stack
+- **Framework:** Gradio
+- **Model:** IndoBERT (Transformers)
+- **Backend:** PyTorch
+- **Deployment:** Hugging Face Spaces (Docker)
+## 📝 Citation
+If you use this model, please cite:
+```bibtex
+@misc{indobert-fakenews,
+  title={IndoBERT Fake News Detection},
+  author={Your Name},
+  year={2025},
+  publisher={Hugging Face},
+  howpublished={\url{https://huggingface.co/spaces/your-username/indobert-fakenews}}
+}
+```
+## 📄 License
+MIT License
+---
+**Developed with ❤️ for Indonesian NLP Community**
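
Note on the README's "Model Information" section: it maps class index 0 to Real News and index 1 to Fake News (Hoax), and the app reports a confidence score plus a probability distribution. The following is a minimal pure-Python sketch of how two raw scores (logits) become those three outputs via softmax. The logit values are made up for illustration, and `softmax` and `classify` are hypothetical helpers; `app.py` itself uses `torch.softmax` and `np.argmax`.

```python
import math

# Class indices as listed in the README.
ID2LABEL = {0: "Real News", 1: "Fake News (Hoax)"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, confidence, distribution) for a pair of logits."""
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)  # argmax
    distribution = {ID2LABEL[i]: p for i, p in enumerate(probs)}
    return ID2LABEL[index], probs[index], distribution


# Made-up logits in which the second class scores higher.
label, confidence, distribution = classify([0.3, 2.1])
print(label, f"{confidence:.2%}")
print(distribution)
```

The confidence is simply the probability of the predicted class, so with two classes it can never fall below 50%.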

app.py ADDED

	@@ -0,0 +1,218 @@

+"""
+Hugging Face Space Application - IndoBERT Fake News Detection
+Uses Gradio for a user-friendly interface
+"""
+import gradio as gr
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import numpy as np
+from pathlib import Path
+import os
+import logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+# Configuration - model from the Hugging Face Hub
+MODEL_REPO = "Davidbio/fakenewsdetection"  # Model already uploaded to the Hub
+MODEL_DIR = Path(__file__).parent / "models" / "indobert"  # Local fallback
+class FakeNewsDetector:
+    def __init__(self):
+        self.tokenizer = None
+        self.model = None
+        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+        self.load_model()
+    def load_model(self):
+        """Load the model from the Hugging Face Hub, falling back to a local copy"""
+        try:
+            # Load from the Hugging Face Hub (first priority)
+            hf_repo = os.environ.get("HF_MODEL_REPO", MODEL_REPO)
+            logger.info(f"Loading model dari HuggingFace Hub: {hf_repo}")
+            try:
+                self.tokenizer = AutoTokenizer.from_pretrained(hf_repo)
+                self.model = AutoModelForSequenceClassification.from_pretrained(hf_repo)
+                logger.info("✅ Model loaded dari HuggingFace Hub")
+            except Exception as hub_error:
+                # Fall back to the local copy if the Hub load fails
+                if MODEL_DIR.exists() and any(MODEL_DIR.iterdir()):
+                    logger.warning(f"HF Hub failed, loading from local: {hub_error}")
+                    self.tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
+                    self.model = AutoModelForSequenceClassification.from_pretrained(
+                        MODEL_DIR
+                    )
+                    logger.info("✅ Model loaded dari lokal")
+                else:
+                    raise hub_error
+            self.model.eval()
+            self.model.to(self.device)
+            logger.info(f"Model berhasil dimuat di {self.device}")
+        except Exception as e:
+            logger.error(f"Error loading model: {e}")
+            raise
+    def predict(self, text: str):
+        """
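+        Predict whether a news text is a hoax or not.
+        Returns: (result_text, confidence, probabilities, warning)
+        """
+        if not text or len(text.strip()) < 10:
+            # predict_news forwards only result_text and probabilities, so the message goes first
+            return (
+                "⚠️ Teks terlalu pendek. Minimal 10 karakter.",
+                0.0,
+                {"Real News": 0.0, "Fake News (Hoax)": 0.0},
+                "⚠️ Teks terlalu pendek. Minimal 10 karakter.",
+            )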
+        try:
+            # Tokenize (inputs longer than 256 tokens are truncated)
+            encoded = self.tokenizer(
+                text, truncation=True, max_length=256, return_tensors="pt"
+            ).to(self.device)
+            # Predict
+            with torch.no_grad():
+                logits = self.model(**encoded).logits
+                probs = torch.softmax(logits, dim=-1).cpu().numpy()[0]
+            # Extract results (index 0 = real, index 1 = hoax)
+            prob_real = float(probs[0])
+            prob_hoax = float(probs[1])
+            predicted_label = int(np.argmax(probs))
+            confidence = float(probs[predicted_label])
+            # Display text for the predicted label
+            label_text = "🚨 HOAX" if predicted_label == 1 else "✅ REAL"
+            # Confidence level
+            if confidence >= 0.9:
+                confidence_level = "Sangat Tinggi"
+            elif confidence >= 0.75:
+                confidence_level = "Tinggi"
+            elif confidence >= 0.6:
+                confidence_level = "Sedang"
+            else:
+                confidence_level = "Rendah"
+            # Warning message
+            warning = ""
+            if confidence < 0.6:
+                warning = "⚠️ Confidence rendah. Hasil mungkin tidak akurat. Silakan verifikasi secara manual."
+            # Format the result as Markdown
+            result_text = f"""
+### Hasil Deteksi: {label_text}
+**Confidence:** {confidence:.2%} ({confidence_level})
+**Probabilitas Detail:**
+- Real News: {prob_real:.2%}
+- Fake News (Hoax): {prob_hoax:.2%}
+{warning}
+            """
+            return (
+                result_text,
+                confidence,
+                {"Real News": prob_real, "Fake News (Hoax)": prob_hoax},
+                warning,
+            )
+        except Exception as e:
+            logger.error(f"Error during prediction: {e}")
+            return f"❌ Error: {str(e)}", 0.0, {"Real News": 0.0, "Fake News (Hoax)": 0.0}, ""
+# Initialize the detector (loads the model once at import time)
+detector = FakeNewsDetector()
+def predict_news(text: str):
+    """Wrapper function for Gradio"""
+    result_text, confidence, probs, warning = detector.predict(text)
+    return result_text, probs
+# Example texts for the demo
+examples = [
+    [
+        "Pemerintah mengumumkan kebijakan baru untuk meningkatkan ekonomi rakyat dengan subsidi langsung kepada UMKM."
+    ],
+    ["BREAKING: Alien mendarat di Jakarta dan bertemu dengan presiden!"],
+    [
+        "Menteri Kesehatan mengimbau masyarakat untuk tetap menjaga protokol kesehatan di tengah musim hujan."
+    ],
+]
+# Build the Gradio interface
+with gr.Blocks(title="IndoBERT Fake News Detection", theme=gr.themes.Soft()) as demo:
+    gr.Markdown("""
+    # 🔍 IndoBERT Fake News Detection
+    Deteksi berita hoax menggunakan model IndoBERT yang telah dilatih pada dataset berita Indonesia.
+    **Cara Penggunaan:**
+    1. Masukkan teks berita pada kotak di bawah
+    2. Klik tombol "🔍 Deteksi Berita"
+    3. Lihat hasil analisis dan tingkat confidence
+    ⚠️ **Catatan:** Model ini adalah alat bantu dan tidak 100% akurat. Selalu verifikasi dari sumber terpercaya.
+    """)
+    with gr.Row():
+        with gr.Column(scale=2):
+            input_text = gr.Textbox(
+                label="📝 Masukkan Teks Berita",
+                placeholder="Ketik atau paste teks berita di sini...",
+                lines=8,
+                max_lines=15,
+            )
+            with gr.Row():
+                clear_btn = gr.Button("🗑️ Clear", variant="secondary")
+                submit_btn = gr.Button("🔍 Deteksi Berita", variant="primary")
+        with gr.Column(scale=1):
+            output_text = gr.Markdown(label="Hasil Deteksi")
+            output_plot = gr.Label(label="Distribusi Probabilitas", num_top_classes=2)
+    # Examples
+    gr.Markdown("### 📋 Contoh Teks")
+    gr.Examples(examples=examples, inputs=input_text, label="Klik untuk mencoba contoh")
+    # Event handlers
+    submit_btn.click(
+        fn=predict_news, inputs=input_text, outputs=[output_text, output_plot]
+    )
+    clear_btn.click(
+        fn=lambda: ("", "", {}),
+        inputs=None,
+        outputs=[input_text, output_text, output_plot],
+    )
+    gr.Markdown("""
+    ---
+    ### ℹ️ Tentang Model
+    Model ini menggunakan **IndoBERT** (Indonesian BERT) yang telah di-fine-tune pada dataset berita Indonesia
+    untuk klasifikasi berita real vs hoax.
+    - **Base Model:** indobenchmark/indobert-base-p1
+    - **Task:** Binary Classification (Real/Hoax)
+    - **Max Length:** 256 tokens
+    ### 🤝 Kontribusi & Feedback
+    Jika menemukan hasil yang kurang akurat, silakan laporkan untuk membantu meningkatkan model ini.
+    """)
+# Launch app
+if __name__ == "__main__":
+    demo.launch(server_name="0.0.0.0", server_port=7860, share=False)
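
Note on the confidence handling in `predict` above: it maps the probability of the predicted class to one of four levels and attaches a warning below 0.6. The following is a minimal table-driven Python sketch of that same mapping, using the thresholds and level names from `app.py`. The `confidence_level` and `needs_manual_check` helpers are hypothetical names and do not appear in the commit.

```python
# (lower bound, level name) pairs taken from app.py, highest threshold first.
THRESHOLDS = [
    (0.9, "Sangat Tinggi"),  # very high
    (0.75, "Tinggi"),        # high
    (0.6, "Sedang"),         # medium
]
LOW_CONFIDENCE_CUTOFF = 0.6


def confidence_level(confidence: float) -> str:
    """Return the name of the first level whose lower bound is met."""
    for lower_bound, name in THRESHOLDS:
        if confidence >= lower_bound:
            return name
    return "Rendah"  # low


def needs_manual_check(confidence: float) -> bool:
    """Mirror the app's rule of warning when confidence is below 0.6."""
    return confidence < LOW_CONFIDENCE_CUTOFF


for value in (0.95, 0.80, 0.60, 0.55):
    print(value, confidence_level(value), needs_manual_check(value))
```

Because the classifier has two classes, the winning probability is always at least 0.5, so the "Rendah" level in practice covers the range from 0.5 up to 0.6.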

requirements.txt ADDED

	@@ -0,0 +1,17 @@

+# Hugging Face Space Requirements
+# Python 3.10+
+# Web Framework - Gradio for HF Spaces
+gradio>=4.0.0
+# Model & ML Dependencies
+torch>=2.0.0
+transformers>=4.30.0
+huggingface-hub>=0.18.0
+# Data Processing
+numpy>=1.24.0
+pandas>=2.0.0
+# Utilities
+python-dotenv>=1.0.0
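
Note on `requirements.txt` above: every entry uses a `>=` specifier, which sets a minimum version and leaves the upper bound open. The following is a minimal Python sketch of what such a lower bound means, assuming plain dotted numeric versions only. The `parse_minimums` and `satisfies` helpers are hypothetical, and the version strings in the usage lines are made up; real tooling such as pip handles far more version syntax than this.

```python
REQUIREMENTS = """\
# Web Framework
gradio>=4.0.0
# Model & ML Dependencies
torch>=2.0.0
transformers>=4.30.0
huggingface-hub>=0.18.0
# Data Processing
numpy>=1.24.0
pandas>=2.0.0
# Utilities
python-dotenv>=1.0.0
"""


def parse_minimums(text: str) -> dict:
    """Map each package name to its minimum version as a tuple of integers."""
    minimums = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ">=" not in line:
            continue
        name, version = line.split(">=", 1)
        minimums[name.strip()] = tuple(int(part) for part in version.strip().split("."))
    return minimums


def satisfies(installed: str, minimum: tuple) -> bool:
    """Compare a dotted numeric version string against a minimum tuple."""
    return tuple(int(part) for part in installed.split(".")) >= minimum


minimums = parse_minimums(REQUIREMENTS)
print(satisfies("4.44.1", minimums["gradio"]))  # made-up installed version
print(satisfies("3.50.2", minimums["gradio"]))  # made-up installed version
```

Since no upper bounds are given, a rebuild of the Space may pick up newer major releases than the ones the app was written against.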