Copy files from original watermark leaderboard
- .gitignore +23 -0
- CHANGELOG.md +97 -0
- DEPLOYMENT_GUIDE.md +78 -0
- Guideline to submit your watermark performance.docx +0 -0
- README.md +50 -6
- Reproducibility/Attack_dipper.py +135 -0
- Reproducibility/BERT_score.py +71 -0
- Reproducibility/C4_dataset_download.py +20 -0
- Reproducibility/CNN_dataset_download.py +38 -0
- Reproducibility/Entity_similarity_score.py +138 -0
- Reproducibility/Finetune_sum.py +298 -0
- Reproducibility/Inference_sum.py +154 -0
- Reproducibility/README.md +61 -0
- app.py +1097 -0
- assets/index-Cd6CRo7g.js +0 -0
- assets/index-tTSI8ghR.css +1 -0
- deploy_to_huggingface.py +138 -0
- index.html +13 -0
- leaderboard.json +122 -0
- requirements.txt +4 -0
- test.html +16 -0
.gitignore
ADDED
@@ -0,0 +1,23 @@
```
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.

# dependencies
/node_modules
/.pnp
.pnp.js

# testing
/coverage

# production
/build

# misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*
```
CHANGELOG.md
ADDED
@@ -0,0 +1,97 @@

# 📋 Changelog - Watermark Leaderboard Fixes

## Version 2.0 - December 2024

### 🐛 Bug Fixes

#### 1. Fixed Submission Validation

**Problem**: Users could only submit Attack-free data. Watermark Removal and Stealing Attack submissions failed validation.

**Solution**:
- Updated validation logic to accept any combination of attack types
- Users can now submit:
  - Only Attack-free data
  - Only Watermark Removal data
  - Only Stealing Attack data
  - Any combination of the above

**Code Changes**:
```python
# Before: Required Attack-free fields
if normalized_utility is None or detection_rate is None:
    return "❌ Error: Normalized Utility and Detection Rate are required"

# After: Flexible validation
has_attack_free_data = normalized_utility is not None and detection_rate is not None
has_removal_data = absolute_utility_degradation is not None and removal_detection_rate is not None
has_stealing_data = adversary_bert_score is not None and adversary_detection_rate is not None

if not has_attack_free_data and not has_removal_data and not has_stealing_data:
    return "❌ Error: Please provide at least one complete set of metrics"
```

#### 2. Enhanced Pending Submissions Display

**Problem**: The pending submissions table only showed basic fields (ID, Name, Model, Normalized Utility, Detection Rate, Submitted At).

**Solution**:
- Updated the table to show all submission fields
- Administrators can now see complete submission details for proper review

**New Fields Displayed**:
- ID, Name, Model, Paper Link
- Attack-free Utility, Attack-free Detection
- Removal Degradation, Removal Detection
- Adversary BERT, Adversary Detection
- Submitted At

### ✨ New Features

#### 1. Paper Link Field
- Added an optional paper link field to submissions
- Links are displayed in the pending submissions table

#### 2. Enhanced User Guidance
- Added clear submission requirements to the form
- Better error messages with specific guidance
- Visual indicators for required vs. optional fields

#### 3. Improved Form Labels
- Changed "Attack-free (Required)" to "Attack-free (Optional - Both Required if One is Provided)"
- Made it clear that all attack types are optional but each pair of metrics must be complete

### 🔧 Technical Improvements

#### 1. Better Validation Logic
- Separate validation for each attack type
- Clear error messages for each validation failure
- Consistent range validation across all metrics

#### 2. Enhanced Data Structure
- Improved pending-submissions data formatting
- Better handling of optional fields
- Consistent data types across all metrics

#### 3. Updated Dependencies
- Added numpy requirement for better data handling
- Updated Gradio version compatibility

### 📊 User Experience Improvements

#### 1. Clearer Instructions
- Added a submission requirements box to the form
- Better placeholder text and help information
- Consistent styling across all form sections

#### 2. Better Error Handling
- More specific error messages
- Guidance on how to fix validation errors
- Consistent error formatting

#### 3. Enhanced Admin Experience
- Complete field visibility in pending submissions
- Better table formatting with all metrics
- Improved approval workflow

### 🚀 Deployment Ready

All changes are compatible with Hugging Face Spaces and ready for immediate deployment. The fixes maintain backward compatibility while significantly improving functionality and user experience.
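The flexible validation rule described in the changelog can be sketched as a standalone function: a submission passes if at least one metric pair is complete. This is an illustrative sketch mirroring the field names in the snippet above, not the app's actual code.

```python
# Hedged sketch of the "flexible validation" rule: a submission is valid
# when at least one pair of metrics is fully provided.
def validate_submission(normalized_utility=None, detection_rate=None,
                        absolute_utility_degradation=None, removal_detection_rate=None,
                        adversary_bert_score=None, adversary_detection_rate=None):
    """Return an error string, or None if the submission is acceptable."""
    has_attack_free = normalized_utility is not None and detection_rate is not None
    has_removal = (absolute_utility_degradation is not None
                   and removal_detection_rate is not None)
    has_stealing = (adversary_bert_score is not None
                    and adversary_detection_rate is not None)

    if not (has_attack_free or has_removal or has_stealing):
        return "Error: Please provide at least one complete set of metrics"
    return None

# A Stealing-Attack-only submission now passes; an empty one does not.
print(validate_submission(adversary_bert_score=0.91, adversary_detection_rate=12.5))  # None
print(validate_submission())  # error message
```

Note that an incomplete pair (e.g. a utility value without its detection rate) contributes nothing, which is exactly the "both required if one is provided" rule from the form labels.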
DEPLOYMENT_GUIDE.md
ADDED
@@ -0,0 +1,78 @@

# 🚀 Deployment Guide for Hugging Face Spaces

## Quick Fix for Your Existing Space

Your Hugging Face Space at https://huggingface.co/spaces/kirudang/watermark-leaderboard has been updated with the following fixes:

### ✅ Fixed Issues

1. **Flexible Submission Validation**
   - Now accepts any combination of attack types
   - Submit only Attack-free, only Watermark Removal, only Stealing Attack, or any combination
   - Clear validation messages guide users

2. **Complete Pending Submissions Table**
   - Shows all fields: ID, Name, Model, Paper Link, Attack-free metrics, Watermark Removal metrics, Stealing Attack metrics, Submitted At
   - Administrators can see complete submission details for proper review

3. **Enhanced User Experience**
   - Clear submission requirements displayed in the form
   - Better error messages
   - Paper link field added

## 📁 Files to Update in Your Hugging Face Space

Upload these updated files to your Space:

1. **app.py** - Main application with all fixes
2. **requirements.txt** - Updated dependencies
3. **README.md** - Updated documentation
4. **leaderboard.json** - Latest data (if needed)

## 🔄 How to Deploy

### Option 1: Git Push (Recommended)
```bash
# In your watermark-leaderboard directory
git add .
git commit -m "Fix submission validation and pending approval display"
git push origin main
```

### Option 2: Manual Upload
1. Go to your Space: https://huggingface.co/spaces/kirudang/watermark-leaderboard
2. Click the "Files and versions" tab
3. Upload the updated files:
   - `app.py`
   - `requirements.txt`
   - `README.md`
   - `leaderboard.json` (if you want to update the data)

## 🎯 What's Fixed

### Before (Issues):
- ❌ Could only submit Attack-free data
- ❌ Pending submissions showed limited fields
- ❌ Confusing validation messages

### After (Fixed):
- ✅ Can submit any combination: Attack-free, Watermark Removal, Stealing Attack
- ✅ Pending submissions show all fields for complete review
- ✅ Clear submission requirements and validation

## 🔍 Testing the Fixes

After deployment, test these scenarios:

1. **Submit only Stealing Attack data** (no Attack-free)
2. **Submit only Watermark Removal data** (no Attack-free)
3. **Submit a combination of all three types**
4. **Check that the pending submissions table shows all fields**

## 🛠️ Admin Controls

- **Admin Password**: `admin123` (you can change this in app.py)
- **Pending Submissions**: Shows complete details for review
- **Approval Process**: Approve/reject with full visibility

Your Space will automatically rebuild when you push the changes!
Guideline to submit your watermark performance.docx
ADDED
Binary file (17.7 kB)
README.md
CHANGED
@@ -1,13 +1,57 @@

---
title: Watermark Leaderboard
emoji: 🏆
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: mit
short_description: Interactive leaderboard for watermark performance evaluation
---

# Watermark Leaderboard 🏆

An interactive leaderboard for comparing watermark performance across different models and evaluation settings.

## Features

- **Interactive Scatter Plot**: Visualize watermark performance with Plotly charts
- **Performance Table**: Detailed metrics with sorting and filtering
- **Multiple Evaluation Settings**: Attack-free, Watermark Removal, and Stealing Attack
- **Model Support**: LLaMA3 and DeepSeek models
- **Dynamic Filtering**: Real-time updates based on model and metric selection
- **Flexible Submissions**: Submit data for any combination of attack types
- **Pending Approval System**: All submissions are reviewed before appearing on the leaderboard
- **Complete Field Visibility**: Administrators see all submission details for review
- **Professional UI**: Clean, modern interface with accordion sections
- **Reproducibility**: Access to all evaluation code and guidelines

## How to Use

1. **Select Model**: Choose between LLaMA3 and DeepSeek
2. **Choose Setting**: Pick from Attack-free, Watermark Removal, or Stealing Attack
3. **View Results**: Explore the scatter plot and detailed table
4. **Submit Data**: Click "Add Your Data" to submit new results
   - Submit any combination of attack types (Attack-free, Watermark Removal, Stealing Attack)
   - All submissions go through an approval process before appearing on the leaderboard
5. **Administrator Review**: Administrators can review pending submissions with full field visibility

## Metrics Explained

- **Normalized Utility ↑**: Higher values indicate better text quality
- **Detection Rate (%) ↑**: Higher values indicate better watermark detection
- **Absolute Utility Degradation ↑**: Higher values indicate better resistance to removal attacks
- **Adversary BERT Score ↑**: Higher values indicate better performance under adversarial conditions

## Contributing

We encourage researchers to contribute their evaluation results. Please follow the guidelines in the "Guidelines" section for submission requirements.

## License

MIT License

---

*Last updated: September 2024*
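The "Dynamic Filtering" feature described above amounts to selecting leaderboard rows by model and evaluation setting before plotting. The sketch below illustrates the idea; the entry schema (`name`, `model`, `setting`, `detection_rate`) is an assumption for illustration, not the actual `leaderboard.json` format.

```python
# Illustrative leaderboard rows; field names are assumed for this sketch.
entries = [
    {"name": "KGW", "model": "LLaMA3", "setting": "Attack-free", "detection_rate": 99.1},
    {"name": "KGW", "model": "DeepSeek", "setting": "Attack-free", "detection_rate": 97.4},
    {"name": "SIR", "model": "LLaMA3", "setting": "Watermark Removal", "detection_rate": 88.0},
]

def filter_entries(rows, model, setting):
    """Keep only rows matching the selected model and evaluation setting."""
    return [r for r in rows if r["model"] == model and r["setting"] == setting]

print(filter_entries(entries, "LLaMA3", "Attack-free"))
```

In the app this filtered list would feed both the Plotly scatter plot and the detail table, so the two views always stay in sync with the dropdown selection.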
Reproducibility/Attack_dipper.py
ADDED
@@ -0,0 +1,135 @@
```python
import os
# Set the HF_TOKEN and HF_HOME environment variables before importing transformers
os.environ["HF_TOKEN"] = "Your_HuggingFace_Token_Here"
# Specify the cache directory path
cache_dir = 'Your_Desired_Cache_Directory_Here'
os.environ['HF_HOME'] = cache_dir

import argparse
import json
import time
import tqdm

import nltk
from nltk.tokenize import sent_tokenize

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# nltk.download("punkt")

def main(args):
    # Clear the CUDA cache
    torch.cuda.empty_cache()
    # Load data from the specified JSON file (resuming from index 4960)
    with open(args.data, 'r') as f:
        data = json.load(f)
    data = [{"query": item["input"], "output_with_watermark": item[args.column_in_data]}
            for item in data[4960:args.Ninputs]]

    # Load the model and tokenizer
    time1 = time.time()
    tokenizer = T5Tokenizer.from_pretrained("google/t5-v1_1-xxl")
    model = T5ForConditionalGeneration.from_pretrained(args.model_name)
    print("Model loaded in ", time.time() - time1)
    model.cuda()
    model.eval()

    # Lists for the attacked outputs and a counter for periodic saving
    attack_results = []
    input_counter = 0

    # Iterate over the data
    for idx, dd in tqdm.tqdm(enumerate(data), total=len(data)):
        print(f"Processing input {idx + 1} / {len(data)}")
        input_gen = (dd["output_with_watermark"].strip()
                     if isinstance(dd["output_with_watermark"], str)
                     else dd["output_with_watermark"][0].strip())

        # Inputs fed to the paraphrase model and the resulting paraphrases
        dipper_inputs = []
        w_wm_output_attacked = []

        assert args.lex in [0, 20, 40, 60, 80, 100], "Lexical diversity must be one of 0, 20, 40, 60, 80, 100."
        assert args.order in [0, 20, 40, 60, 80, 100], "Order diversity must be one of 0, 20, 40, 60, 80, 100."
        # Calculate the control codes for the paraphrase model
        lex_code = int(100 - args.lex)
        order_code = int(100 - args.order)

        # Remove spurious newlines and extra whitespace
        input_gen = " ".join(input_gen.split())
        # Split the input into sentences
        sentences = sent_tokenize(input_gen)
        # Normalize whitespace in the query used as context
        prefix = " ".join(dd["query"].replace("\n", " ").split())
        output_text = ""
        final_input_text = ""

        # Generate the paraphrase for each sentence window
        for sent_idx in range(0, len(sentences), args.sent_interval):
            curr_sent_window = " ".join(sentences[sent_idx : sent_idx + args.sent_interval])
            if args.no_ctx:
                final_input_text = f"lexical = {lex_code}, order = {order_code} <sent> {curr_sent_window} </sent>"
            else:
                final_input_text = f"lexical = {lex_code}, order = {order_code} {prefix} <sent> {curr_sent_window} </sent>"

            final_input = tokenizer([final_input_text], return_tensors="pt")
            final_input = {k: v.cuda() for k, v in final_input.items()}

            # Generate the paraphrase
            with torch.inference_mode():
                outputs = model.generate(
                    **final_input,
                    do_sample=True,
                    top_p=0.75,
                    top_k=None,
                    max_length=400
                )
            outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
            prefix += " " + outputs[0]
            output_text += " " + outputs[0]

        # Store the attacked output and the input for the paraphrase model
        w_wm_output_attacked.append(output_text.strip())
        dipper_inputs.append(final_input_text)

        # Create a dictionary with the specified columns
        result = {
            "original_query": dd["query"],
            "watermarked_response": dd["output_with_watermark"],
            # "final_input_text": dipper_inputs,
            "paraphrased_response": w_wm_output_attacked[0]
        }
        # Add the result to the list of results
        attack_results.append(result)

        # Increment the input counter
        input_counter += 1

        # Save the results after processing every saving_freq inputs
        if input_counter % args.saving_freq == 0:
            # Remove the previous checkpoint file if it exists
            if os.path.isfile(f"{args.output_name}_{input_counter-args.saving_freq}.json"):
                os.remove(f"{args.output_name}_{input_counter-args.saving_freq}.json")

            with open(f"{args.output_name}_{input_counter}.json", "w") as json_file:
                json.dump(attack_results, json_file, indent=4)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Attack by DIPPER Paraphrasing")
    parser.add_argument("--data", type=str, default="Llama3_SIR_test_13860.json", help="The data to be attacked / paraphrased.")
    parser.add_argument("--column_in_data", type=str, default="output_only", help="Column in the data to be attacked / paraphrased.")
    parser.add_argument("--output_name", type=str, default="Dipper_Llama3_SIR_13860_4960_", help="The output file prefix for the attacked / paraphrased data.")
    parser.add_argument("--Ninputs", type=int, default=13860, help="Number of inputs to be attacked / paraphrased.")
    parser.add_argument("--saving_freq", type=int, default=10, help="The frequency of saving the output.")
    parser.add_argument("--model_name", type=str, default="kalpeshk2011/dipper-paraphraser-xxl", help="The model name to use.")
    # Note: argparse's type=bool treats any non-empty string as True; omit the flag to keep the default.
    parser.add_argument("--no_ctx", type=bool, default=True, help="Whether to drop the query context from the prompt.")
    parser.add_argument("--sent_interval", type=int, default=3, help="The sentence interval (paraphrase window size).")
    parser.add_argument("--lex", type=int, default=60, help="Lexical diversity knob for the paraphrase attack.")
    parser.add_argument("--order", type=int, default=60, help="Order diversity knob for the paraphrase attack.")

    args = parser.parse_args()

    main(args)
```
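The core prompt construction in Attack_dipper.py is easy to isolate: the diversity knobs are inverted into control codes (`lex_code = 100 - lex`) and each window of sentences is wrapped in `<sent>` tags, optionally preceded by the query as context. A minimal, model-free sketch of that logic, with values mirroring the script's defaults:

```python
# Sketch of DIPPER prompt construction: knobs are inverted into control
# codes and each sentence window is wrapped in <sent> ... </sent>.
def build_dipper_prompt(sentences, lex=60, order=60, window=3, prefix=None):
    assert lex in (0, 20, 40, 60, 80, 100) and order in (0, 20, 40, 60, 80, 100)
    lex_code, order_code = 100 - lex, 100 - order
    prompts = []
    for i in range(0, len(sentences), window):
        chunk = " ".join(sentences[i:i + window])
        ctx = f"{prefix} " if prefix else ""  # context prefix only when provided
        prompts.append(f"lexical = {lex_code}, order = {order_code} {ctx}<sent> {chunk} </sent>")
    return prompts

prompts = build_dipper_prompt(["A.", "B.", "C.", "D."], lex=60, order=60, window=3)
print(prompts[0])  # lexical = 40, order = 40 <sent> A. B. C. </sent>
```

Note the inversion: asking for 60% lexical diversity yields control code 40, which is why the script computes `int(100 - args.lex)` before formatting the prompt.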
Reproducibility/BERT_score.py
ADDED
@@ -0,0 +1,71 @@
```python
import os
from bert_score import score
import json
import argparse
import csv
import torch
import warnings
warnings.filterwarnings("ignore")

# Ensure the HF_HOME environment variable points to your desired cache location
os.environ["HF_TOKEN"] = "Your_HF_TOKEN"
cache_dir = 'Your_cache_directory'
os.environ['HF_HOME'] = cache_dir

def main(args):
    start = 0
    # Clear the CUDA cache
    torch.cuda.empty_cache()

    # Load the candidate and reference files
    with open(args.data_can, 'r') as f:
        data_1 = json.load(f)[start:args.N]
    cands = [item["Watermarked_summary"] for item in data_1]
    # randomized_words = [item["Total_randomized_words"] for item in data_1]
    # total_words = [item["Total_words"] for item in data_1]

    with open(args.data_ref, 'r') as f:
        data_2 = json.load(f)[start:args.N]
    refs = [item["summary"] for item in data_2]

    # Set saving frequency
    saving_freq = 10
    # Initialize input counter
    input_counter = 0
    # Loop through the data and calculate the BERTScore
    results = []
    for i, item in enumerate(cands):
        num_tokens = len(item.split())
        print(f"Item number: {i}")

        if num_tokens >= 16:  # Only consider items with at least 16 tokens for a valid assessment
            P, R, F1 = score([cands[i]], [refs[i]], lang="en", verbose=True)
            scores = F1.mean().item()
            # results.append([i, scores, randomized_words[i], total_words[i]])
            results.append([i, scores])
        else:
            print(f"Skipping item number {i} due to insufficient tokens.")

        # Increment the input counter
        input_counter += 1

        # Write the results to a CSV file after processing every saving_freq inputs
        if input_counter % saving_freq == 0:
            # Remove the previous checkpoint file if it exists
            if os.path.isfile(f"{args.Output_name}{start}_{input_counter-saving_freq}.csv"):
                os.remove(f"{args.Output_name}{start}_{input_counter-saving_freq}.csv")

            with open(f'{args.Output_name}{start}_{input_counter}.csv', 'w', newline='') as f:
                writer = csv.writer(f)
                writer.writerow(["data_item", 'BERTScore'])
                writer.writerows(results)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Calculate BERTScore')
    parser.add_argument('--data_can', default='DeepSeek_TW_Summarization_test__1000.json', type=str, help='a file containing the candidate documents to test')
    parser.add_argument('--data_ref', default='DeepSeek_No_WM_Summarization_test_0_1000_1000.json', type=str, help='a file containing the reference documents to test')
    parser.add_argument('--N', default=1000, type=int, help='Number of data items to process')
    parser.add_argument('--Output_name', default="BERTScore_DeepSeek_Summarization_TW_ref_No_WM_", type=str, help='Name of the output file')
    main(parser.parse_args())
```
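The scoring loop above gates out candidates shorter than 16 whitespace tokens before calling `bert_score.score`. That gate can be sketched in isolation (without the heavyweight `bert_score` dependency) as a generator over scoreable pairs; the helper name is mine, not the script's:

```python
# Sketch of the length gate used in BERT_score.py: candidates with fewer
# than 16 whitespace tokens are skipped as too short for a meaningful score.
MIN_TOKENS = 16

def scoreable_pairs(cands, refs):
    """Yield (index, candidate, reference) for candidates long enough to score."""
    for i, (c, r) in enumerate(zip(cands, refs)):
        if len(c.split()) >= MIN_TOKENS:
            yield i, c, r

cands = ["too short", " ".join(["word"] * 20)]
refs = ["ref a", "ref b"]
print([i for i, _, _ in scoreable_pairs(cands, refs)])  # [1]
```

The real script then computes `P, R, F1` for each surviving pair and records the mean F1, so skipped items simply never appear in the output CSV.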
Reproducibility/C4_dataset_download.py
ADDED
@@ -0,0 +1,20 @@
```python
import os
# Ensure the HF_HOME environment variable points to your desired cache location
os.environ["HF_TOKEN"] = "Your_HF_Token"
cache_dir = 'Your_Cache_Dir'
os.environ['HF_HOME'] = cache_dir

from datasets import load_dataset

# Specify a local directory for caching
dataset_path = "./c4_realnewslike"

# Load only the "realnewslike" subset (train and validation)
dataset = load_dataset("allenai/c4", "realnewslike", cache_dir=dataset_path)

# Print confirmation
print("Dataset downloaded and stored at:", dataset_path)

# Print the number of samples in each split
print("Number of training samples:", len(dataset["train"]))
print("Number of validation samples:", len(dataset["validation"]))
```
Reproducibility/CNN_dataset_download.py
ADDED
@@ -0,0 +1,38 @@
```python
import os
# Ensure the HF_HOME environment variable points to your desired cache location
os.environ["HF_TOKEN"] = "Your_HF_Token"
cache_dir = 'Your_Cache_Dir'
os.environ['HF_HOME'] = cache_dir
import json
from datasets import load_dataset


# Set the dataset save path
save_path = "cnn.json"
if not os.path.exists(save_path):
    dataset = load_dataset("abisee/cnn_dailymail", "3.0.0")
    train_data = dataset["train"][:20000]
    test_data = dataset["test"][:1000]

    data_subset = []

    for article, highlights, data_id in zip(train_data["article"], train_data["highlights"], train_data["id"]):
        data_subset.append({
            "id": data_id,
            "article": article,
            "highlights": highlights,
            "type": "train"
        })

    for article, highlights, data_id in zip(test_data["article"], test_data["highlights"], test_data["id"]):
        data_subset.append({
            "id": data_id,
            "article": article,
            "highlights": highlights,
            "type": "test"
        })

    with open(save_path, "w", encoding="utf-8") as f:
        json.dump(data_subset, f, ensure_ascii=False, indent=4)

    print(f"Data saved to {save_path}")
```
Reproducibility/Entity_similarity_score.py
ADDED
@@ -0,0 +1,138 @@
```python
import json
import spacy
from sklearn.metrics.pairwise import cosine_similarity
from difflib import SequenceMatcher
import numpy as np
import pandas as pd
from tqdm import tqdm


# Data
ref_data = "DeepSeek7b_No_WM_test_13860.json"
ref_column = "output_only"

cand_data = "Dipper_DeepSeek_TW_13860.json"
cand_column = "paraphrased_response"

output_name = "Entity_Dipper_DeepSeek_TW.csv"
N = 13860

# Load spaCy's Named Entity Recognition (NER) model
nlp = spacy.load("en_core_web_sm")

# Function to extract named entities from text
def extract_named_entities(text):
    doc = nlp(text)
    return [ent.text for ent in doc.ents]


# === Similarity Calculation ===
def compute_similarity(entity1, entity2):
    if entity1 == entity2:
        return 1.0, 1.0, 1.0

    lev_similarity = SequenceMatcher(None, entity1, entity2).ratio()

    vec1 = nlp(entity1).vector.reshape(1, -1)
    vec2 = nlp(entity2).vector.reshape(1, -1)

    if np.any(vec1) and np.any(vec2):
        cos_similarity = cosine_similarity(vec1, vec2)[0][0]
    else:
        cos_similarity = 0.0

    combined_similarity = (lev_similarity + cos_similarity) / 2
    return combined_similarity, lev_similarity, cos_similarity

# === Greedy Pairwise Matching ===
def greedy_pairwise_matching(ref_entities, cand_entities):
    matched_entities = []
    cand_entities_copy = cand_entities.copy()

    for ref_entity in ref_entities:
        best_match = None
        best_score = 0
        best_lev_similarity = 0
        best_cos_similarity = 0

        for cand_entity in cand_entities_copy:
            similarity_score, lev_similarity, cos_similarity = compute_similarity(ref_entity, cand_entity)

            if similarity_score > best_score:
                best_score = similarity_score
                best_match = cand_entity
                best_lev_similarity = lev_similarity
                best_cos_similarity = cos_similarity

        if best_match:
            matched_entities.append((ref_entity, best_match, best_score, best_lev_similarity, best_cos_similarity))
            cand_entities_copy.remove(best_match)
        else:
            matched_entities.append((ref_entity, "MISSING", 0, 0, 0))

    for cand_entity in cand_entities_copy:
        matched_entities.append(("NEW ENTITY", cand_entity, 0, 0, 0))

    return matched_entities

# === Load Data ===
with open(ref_data, "r", encoding="utf-8") as ref_file:
    reference_data = [entry[ref_column] for entry in json.load(ref_file)[:N]]

with open(cand_data, "r", encoding="utf-8") as cand_file:
    candidate_data = [entry[cand_column] for entry in json.load(cand_file)[:N]]

assert len(reference_data) == len(candidate_data), "Mismatch in data point count!"


# === Process Each Pair ===
results = []

for idx, (ref_text, cand_text) in enumerate(tqdm(zip(reference_data, candidate_data), total=len(reference_data))):
    ref_entities = extract_named_entities(ref_text)
    cand_entities = extract_named_entities(cand_text)

    if len(ref_entities) == 0 and len(cand_entities) == 0:
        continue

    matched_entities = greedy_pairwise_matching(ref_entities, cand_entities)

    # Compute similarity lists
    cosine_similarities = [match[4] for match in matched_entities if match[4] > 0]
    levenshtein_similarities = [match[3] for match in matched_entities if match[3] > 0]

    avg_cosine_similarity = np.mean(cosine_similarities) if cosine_similarities else 0.0
    avg_levenshtein_similarity = np.mean(levenshtein_similarities) if levenshtein_similarities else 0.0
    avg_similarity = (avg_cosine_similarity + avg_levenshtein_similarity) / 2

    # Calculate exact match count
    exact_match_pairs = sum(1 for match in matched_entities if match[0] == match[1])

    # Union count
    union_count = len(ref_entities) + len(cand_entities) - exact_match_pairs
    union_count = union_count if union_count > 0 else 1  # Avoid division by zero

    # Final score
```
final_score = (avg_similarity / union_count) * max(len(ref_entities), len(cand_entities))
|
| 118 |
+
|
| 119 |
+
results.append({
|
| 120 |
+
"Index": idx,
|
| 121 |
+
"Reference_Entity_Count": len(ref_entities),
|
| 122 |
+
"Candidate_Entity_Count": len(cand_entities),
|
| 123 |
+
"Reference_Entities": ref_entities,
|
| 124 |
+
"Candidate _Entities": cand_entities,
|
| 125 |
+
"Matched_Entities": matched_entities,
|
| 126 |
+
"Exact_Match Pairs": exact_match_pairs,
|
| 127 |
+
"Union_Count": union_count,
|
| 128 |
+
"Average_Cosine_Similarity": avg_cosine_similarity,
|
| 129 |
+
"Average_Levenshtein_Similarity": avg_levenshtein_similarity,
|
| 130 |
+
"Average_Combined_Similarity_Score": avg_similarity,
|
| 131 |
+
"Final_Score": final_score
|
| 132 |
+
})
|
| 133 |
+
|
| 134 |
+
# === Save to CSV ===
|
| 135 |
+
results_df = pd.DataFrame(results)
|
| 136 |
+
results_df.to_csv(output_name, index=False)
|
| 137 |
+
|
| 138 |
+
print(f'Average Final Score: {results_df["Final_Score"].mean()}')
|
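The matching and scoring logic above can be exercised in isolation. The sketch below uses only the Levenshtein half of the combined score (via `difflib.SequenceMatcher`, as in the script) and skips the spaCy-vector cosine term, with made-up entity lists; it mirrors the greedy matching, exact-match count, union count, and final-score formula:

```python
from difflib import SequenceMatcher

def lev(a, b):
    # Same Levenshtein-style ratio the script uses
    return SequenceMatcher(None, a, b).ratio()

ref = ["Barack Obama", "Chicago"]
cand = ["Barack Obama", "Chicago Bulls"]

# Greedy matching: each reference entity claims the best remaining candidate
remaining = cand.copy()
pairs = []
for r in ref:
    best = max(remaining, key=lambda c: lev(r, c))
    pairs.append((r, best, lev(r, best)))
    remaining.remove(best)

exact = sum(1 for r, c, _ in pairs if r == c)   # exact matches: 1
union = len(ref) + len(cand) - exact            # union count: 3
avg = sum(s for _, _, s in pairs) / len(pairs)  # (1.0 + 0.7) / 2
final = (avg / union) * max(len(ref), len(cand))
```

With these toy lists, "Chicago" pairs with "Chicago Bulls" at ratio 0.7, so the per-pair average is 0.85 and the final score is (0.85 / 3) * 2 ≈ 0.57.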
Reproducibility/Finetune_sum.py
ADDED
|
@@ -0,0 +1,298 @@
import os
# Point the Hugging Face cache (HF_HOME) at the desired location
# os.environ["HF_TOKEN"] = "your_hugging_face_token_here"  # Replace with your Hugging Face token
cache_dir = '/network/rit/lab/Lai_ReSecureAI/kiel/wmm'
os.environ['HF_HOME'] = cache_dir

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import matplotlib.pyplot as plt
import logging
import time
import torch
import json
import torch.nn as nn
from typing import Optional
import pandas as pd
from datasets import Dataset
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training
from dataclasses import dataclass, field
from transformers import (
    HfArgumentParser,
    AutoTokenizer,
    TrainingArguments,
    BitsAndBytesConfig,
    TrainerCallback,
    AutoModelForCausalLM
)
from trl import SFTTrainer
import warnings

# Ignore all warnings
warnings.filterwarnings("ignore")

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Clear cache
torch.cuda.empty_cache()
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize parameters
model_name = "Llama2"  # 'DeepSeek' # 'Llama3'
WM = "TW"
num_data = 10000
num_epochs = 5
learning_rate_ = 1e-5

# Print parameters
print(f'Device: {device}')
print(f'Model: {model_name}')
print(f'WM: {WM}')
print(f'Number of data: {num_data}')
print(f'Number of epochs: {num_epochs}')
print(f'Learning rate: {learning_rate_}')

start_time = time.time()

# Load data
def load_data(file_path, num_data):
    with open(file_path, 'r') as f:
        data = json.load(f)
    return [
        {
            "text": "Now summarize the following text with maximum 60 words: " +
                    item["article"] +
                    "\nThe summary is: " +
                    item['Watermarked_summary']
        }
        for item in data[:num_data]
    ]


# Create dataset
def create_dataset(data):
    """
    Convert the concatenated data into a Hugging Face Dataset format.
    """
    df = pd.DataFrame(data)  # Each element in 'data' is a dictionary with 'text' as the key
    return Dataset.from_pandas(df)

def get_file_paths(model_name, WM):
    base_path = '/network/rit/lab/Lai_ReSecureAI/kiel/Website/Stealing/'
    if WM == "SafeSeal":
        paths = {
            'DeepSeek': ('DeepSeek_train_Summarization_Safeseal_top_3_threshold_0.8_Uniform_0_20000_20k.json', 'DeepSeek_test_Summarization_Safeseal_top_3_threshold_0.8_Uniform_0_1000_1000.json'),
            'Llama3': ('Llama3_train_Summarization_Safeseal_top_3_threshold_0.8_Uniform_0_20000_20k.json', 'Llama3_test_Summarization_Safeseal_top_3_threshold_0.8_Uniform_0_1000_1000.json')
        }
    elif WM == "DTM":
        paths = {
            'Llama3': ('Llama3_DTM_Summarization_train__20000.json', 'Llama3_DTM_Summarization_test__1000.json'),
            'DeepSeek': ('DeepSeek_DTM_Summarization_train__20000.json', 'DeepSeek_DTM_Summarization_test__1000.json'),
            'Llama2': ('Llama2_DTM_Summarization_train_20k.json', 'Llama2_DTM_Summary_test_1000.json'),
            'Mistral': ('Mistral_DTM_Summarization_train_20k.json', 'Mistral_DTM_Summary_test_1000.json')
        }
    elif WM == "KGW":
        paths = {
            'Llama3': ('Llama3_KGW_Summarization_train_0_20000_20000.json', 'Llama3_KGW_Summarization_test_0_1000_1000.json'),
            'DeepSeek': ('DeepSeek_KGW_Summarization_train_0_20000_20000.json', 'DeepSeek_KGW_Summarization_test_0_1000_1000.json')
        }
    elif WM == "SIR":
        paths = {
            'DeepSeek': ('DeepSeek_SIR_Summarization_train_0_20000_20000.json', 'DeepSeek_SIR_Summarization_test_0_1000_1000.json'),
            'Llama3': ('Llama3_SIR_Summarization_train_0_20000_20000.json', 'Llama3_SIR_Summarization_test_0_1000_1000.json')
        }
    elif WM == "SynthID":
        paths = {
            'DeepSeek': ('DeepSeek_SynthID_Summarization_train_0_20000_20000.json', 'DeepSeek_SynthID_Summarization_test_0_1000_1000.json'),
            'Llama3': ('Llama3_SynthID_Summarization_train_0_20000_20000.json', 'Llama3_SynthID_Summarization_test_0_1000_1000.json')
        }
    elif WM == "TW":
        paths = {
            'DeepSeek': ('DeepSeek_TW_Summarization_train_20000.json', 'DeepSeek_TW_Summarization_test__1000.json'),
            'Llama3': ('Llama3_TW_Summarization_train__20000.json', 'Llama3_TW_Summarization_test__1000.json'),
            'Llama2': ('Llama2_TW_Summarization_train_20k.json', 'Llama2_TW_Summary_test_1000.json'),
            'Mistral': ('Mistral_TW_Summarization_train_20k.json', 'Mistral_TW_Summary_Test_1000.json')
        }

    return base_path + paths[model_name][0], base_path + paths[model_name][1]

def get_new_model_path(model_name, WM, num_epochs, learning_rate_, num_data):
    return f"./adversary_models/{model_name}_{WM}_epoch{num_epochs}_lr{learning_rate_}_data{num_data}_"

train_file, test_file = get_file_paths(model_name, WM)
train_data = load_data(train_file, num_data)
test_data = load_data(test_file, num_data)

train_dataset = create_dataset(train_data)
test_dataset = create_dataset(test_data)

new_model = get_new_model_path(model_name, WM, num_epochs, learning_rate_, num_data)
print(f'New model path: {new_model}')

# Load parameters
@dataclass
class ScriptArguments:
    use_8_bit: Optional[bool] = field(default=False, metadata={"help": "use 8 bit precision"})
    use_4_bit: Optional[bool] = field(default=False, metadata={"help": "use 4 bit precision"})
    bnb_4bit_quant_type: Optional[str] = field(default="nf4", metadata={"help": "quantization type (fp4 or nf4)"})
    use_bnb_nested_quant: Optional[bool] = field(default=False, metadata={"help": "use nested quantization"})
    use_multi_gpu: Optional[bool] = field(default=True, metadata={"help": "use multi GPU"})
    use_adapters: Optional[bool] = field(default=True, metadata={"help": "use adapters"})
    batch_size: Optional[int] = field(default=8, metadata={"help": "input batch size"})
    max_seq_length: Optional[int] = field(default=400, metadata={"help": "max sequence length"})
    optimizer_name: Optional[str] = field(default="adamw_hf", metadata={"help": "optimizer name"})

parser = HfArgumentParser(ScriptArguments)
script_args = parser.parse_args_into_dataclasses()[0]

# Device map
device_map = "auto" if script_args.use_multi_gpu else "cpu"

# Check precision settings
if script_args.use_8_bit and script_args.use_4_bit:
    raise ValueError("You can't use 8-bit and 4-bit precision at the same time")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type=script_args.bnb_4bit_quant_type,
    bnb_4bit_use_double_quant=script_args.use_bnb_nested_quant,
) if script_args.use_4_bit else None

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B" if model_name == 'Llama3'
    else "meta-llama/Llama-2-7b-chat-hf" if model_name == 'Llama2'
    else "mistralai/Mistral-7B-Instruct-v0.2" if model_name == 'Mistral'
    else "deepseek-ai/deepseek-llm-7b-base",
    cache_dir=cache_dir,
    quantization_config=bnb_config,
    device_map={"": 0}
)

model.config.use_cache = False
model.config.pretraining_tp = 1
model = prepare_model_for_kbit_training(model)

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B" if model_name == 'Llama3'
    else "meta-llama/Llama-2-7b-chat-hf" if model_name == 'Llama2'
    else "mistralai/Mistral-7B-Instruct-v0.2" if model_name == 'Mistral'
    else "deepseek-ai/deepseek-llm-7b-base",
    use_fast=False
)

tokenizer.add_special_tokens({'pad_token': '[PAD]'})
tokenizer.pad_token = tokenizer.eos_token

# LoRA config
peft_config = LoraConfig(
    lora_alpha=32,  # LoRA scaling factor; higher values weight the adapter updates more strongly
    lora_dropout=0.05,
    r=16,  # Rank of the LoRA decomposition
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj', 'lm_head'],
    bias="none",
    task_type="CAUSAL_LM",
)

# Create adapter directory
os.makedirs(new_model, exist_ok=True)

# Store loss for visualization
class LoggingCallback(TrainerCallback):
    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs:
            output_log_file = os.path.join(args.output_dir, "train_results.json")
            with open(output_log_file, "a") as writer:
                writer.write(json.dumps(logs) + "\n")

# Training arguments
training_arguments = TrainingArguments(
    num_train_epochs=num_epochs,
    evaluation_strategy="steps",
    save_steps=-1,
    save_total_limit=1,
    logging_steps=500,
    eval_steps=500,
    learning_rate=learning_rate_,
    weight_decay=0.001,
    per_device_train_batch_size=script_args.batch_size,
    max_steps=-1,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=script_args.batch_size,
    output_dir=new_model,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    optim=script_args.optimizer_name,
    fp16=True,
    logging_strategy="steps",
    log_level='info'
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    dataset_text_field="text",
    peft_config=peft_config,
    max_seq_length=script_args.max_seq_length,
    args=training_arguments,
    callbacks=[LoggingCallback()]
)

trainer.train()
trainer.model.save_pretrained(new_model)
trainer.tokenizer.save_pretrained(new_model)
print('Done in ', time.time() - start_time)

# Save plots
epochs, train_losses, eval_losses = [], [], []

# Load evaluation results
eval_results_file = os.path.join(new_model, "train_results.json")
with open(eval_results_file, "r") as f:
    for line in f:
        data = json.loads(line)
        if 'epoch' in data:
            epoch = data['epoch']
        if 'loss' in data:
            train_losses.append(data['loss'])
            epochs.append(epoch)
        if 'eval_loss' in data:
            eval_losses.append(data['eval_loss'])
            if epoch not in epochs:
                epochs.append(epoch)

# Plotting
plt.figure(figsize=(10, 5))
plt.plot(epochs[:len(train_losses)], train_losses, label='Train Loss', color='blue')
plt.plot(epochs[:len(eval_losses)], eval_losses, label='Eval Loss', color='red')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Evaluation Loss', fontsize=10)
plt.legend()
plt.tight_layout()

# Save the plot
plot_path = os.path.join(new_model, 'training_evaluation_loss_plot.png')
plt.savefig(plot_path)
plt.close()

print(f"Plot saved to {plot_path}.")
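The training examples above are single prompt strings: the summarization instruction, the article, and the watermarked summary concatenated with a fixed answer marker. The same construction in isolation, with made-up data, shows the exact format the trainer sees:

```python
def build_text(item):
    # Mirrors load_data() in Finetune_sum.py: instruction + article + marker + watermarked summary
    return ("Now summarize the following text with maximum 60 words: "
            + item["article"]
            + "\nThe summary is: "
            + item["Watermarked_summary"])

sample = {"article": "A short article.", "Watermarked_summary": "A short summary."}
text = build_text(sample)
```

Because the marker "\nThe summary is: " is fixed, the same string can later be split on it to separate prompt from completion at inference time.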
Reproducibility/Inference_sum.py
ADDED
|
@@ -0,0 +1,154 @@
import os
# Ensure the HF_HOME environment variable points to your desired cache location
# os.environ["HF_TOKEN"] = "your_hugging_face_token_here"  # Replace with your Hugging Face token
cache_dir = '/network/rit/lab/Lai_ReSecureAI/kiel/wmm'
os.environ['HF_HOME'] = cache_dir

import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import json

# Clear cache
torch.cuda.empty_cache()
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize parameters
model_name = "DeepSeek"  # 'Llama3'
WM = "SafeSeal"
num_data = 20000
num_epochs = 5
learning_rate_ = 1e-5
N = 1000

# Print parameters
print(f'Device: {device}')
print(f'Model: {model_name}')
print(f'WM: {WM}')
print(f'Number of data: {num_data}')
print(f'Number of epochs: {num_epochs}')
print(f'Learning rate: {learning_rate_}')
print(f'Number of generated data: {N}')

# Base model
if model_name == 'Llama3':
    LLM_name = "meta-llama/Meta-Llama-3-8B"
    base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B",
                                                      low_cpu_mem_usage=True,
                                                      return_dict=True,
                                                      torch_dtype=torch.float16,
                                                      device_map={"": 0})
elif model_name == 'DeepSeek':
    LLM_name = "deepseek-ai/deepseek-llm-7b-base"
    base_model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-llm-7b-base",
                                                      low_cpu_mem_usage=True,
                                                      return_dict=True,
                                                      torch_dtype=torch.float16,
                                                      device_map={"": 0})

# Adapter path
def get_adapter_path(model_name, WM, num_epochs, learning_rate_, num_data):
    return f"./adversary_models/{model_name}_{WM}_epoch{num_epochs}_lr{learning_rate_}_data{num_data}_"

adapter = get_adapter_path(model_name, WM, num_epochs, learning_rate_, num_data)

# Check if the adapter path exists
print(f'Path to adapter: {adapter}')
if os.path.exists(adapter):
    print("Path exists.")
else:
    print("Path does not exist.")

# Merge the base model and adapter
model = PeftModel.from_pretrained(base_model, adapter)
print("Model loaded successfully.")
model = model.merge_and_unload()
print("Model merged successfully.")
model.to(device)

# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained(LLM_name, cache_dir=cache_dir, use_fast=False)
# # Add special tokens
# tokenizer.add_special_tokens({'pad_token': '[PAD]'})
# tokenizer.pad_token = tokenizer.eos_token

# Generation and data settings
max_output_tokens = 90
min_output_tokens = 10
data_link = "/network/rit/lab/Lai_ReSecureAI/kiel/New_WM/Summarization/cnn.json"
output_results = []
input_counter = 0
saving_freq = 10
data = "test"
output_name = f"Adversary_{model_name}_{WM}_Summarization_{data}_{num_data}_{num_epochs}_{learning_rate_}_"

def text_summarize(input_text, model, tokenizer, max_output_tokens, min_output_tokens):
    prompt = f"""
Input: The CNN/Daily Mail dataset is one of the most widely used datasets for text summarization.
It contains news articles and their corresponding highlights, which act as summaries.
State-of-the-art models often use this dataset to fine-tune their summarization capabilities.

Example Summary: The CNN/Daily Mail dataset is commonly used for training summarization models with news articles and highlights.

Now summarize the following text with maximum 60 words: {input_text}
The summary is:"""
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
    inputs_tokens = inputs['input_ids'].cuda()

    output = model.generate(
        inputs_tokens,
        max_new_tokens=max_output_tokens,
        min_new_tokens=min_output_tokens,
        do_sample=True,
        temperature=0.9,
        top_k=50,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,  # Prevents truncation issues
        repetition_penalty=1.2  # Discourages repeated phrases
    )

    summary = tokenizer.decode(output[0], skip_special_tokens=True)
    return summary.split("The summary is:")[-1].strip()

with open(data_link, "r", encoding="utf-8") as f:
    data_subset = json.load(f)

# Filter test data
test_data = [sample for sample in data_subset if sample["type"] == data]

# Testing loop
for i, sample in enumerate(test_data[:N]):
    print(f"Processed {i+1}/{len(test_data[:N])}")
    text = sample["article"]
    summary = text_summarize(text, model, tokenizer, max_output_tokens, min_output_tokens)

    # Store the input and output in a dictionary
    data_dict = {
        "id": sample["id"],
        "article": sample["article"],
        "highlights": sample["highlights"],
        "summary": summary,
        "type": data
    }

    output_results.append(data_dict)
    input_counter += 1

    # Save the results frequently
    if input_counter % saving_freq == 0:
        # Remove the previous checkpoint file if it exists
        if os.path.isfile(output_name + "_" + str(input_counter - saving_freq) + ".json"):
            os.remove(output_name + "_" + str(input_counter - saving_freq) + ".json")
        with open(output_name + "_" + str(input_counter) + ".json", "w", encoding="utf-8") as json_file:
            json.dump(output_results, json_file, indent=4)

print(f"Summarization complete. Results saved to {output_name}.")
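The decoded generation above contains the whole prompt as well as the model's continuation, so the script keeps only the text after the last answer marker. That post-processing step, isolated with a made-up decoded string:

```python
def extract_summary(decoded):
    # Keep only the text after the last "The summary is:" marker,
    # as text_summarize() does after decoding
    return decoded.split("The summary is:")[-1].strip()

decoded = "Now summarize the following text ...\nThe summary is: Watermarks can survive paraphrasing."
summary = extract_summary(decoded)
```

Splitting on the last occurrence matters because the few-shot prompt can mention the marker phrase earlier in the text.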
Reproducibility/README.md
ADDED
|
@@ -0,0 +1,61 @@
# Reproducibility Codes

This folder contains the Python scripts needed to reproduce the watermark performance results shown on the leaderboard.

## Scripts Overview

### Dataset Preparation
- **`C4_dataset_download.py`**: Downloads and prepares the C4 dataset for watermark evaluation
- **`CNN_dataset_download.py`**: Downloads and prepares the CNN/DailyMail dataset for evaluation

### Model Training & Inference
- **`Finetune_sum.py`**: Fine-tunes language models for watermark evaluation
- **`Inference_sum.py`**: Performs inference with watermarked models to generate test data

### Evaluation Metrics
- **`BERT_score.py`**: Computes BERTScore for text quality evaluation
- **`Entity_similarity_score.py`**: Calculates entity similarity scores for watermark detection
- **`Attack_dipper.py`**: Implements watermark removal attacks for robustness testing

## Usage Instructions

1. **Environment Setup**: Ensure you have the required dependencies installed (transformers, datasets, etc.)

2. **Dataset Preparation**: Run the dataset download scripts first
   ```bash
   python C4_dataset_download.py
   python CNN_dataset_download.py
   ```

3. **Model Training**: Fine-tune your models
   ```bash
   python Finetune_sum.py
   ```

4. **Inference**: Generate watermarked text
   ```bash
   python Inference_sum.py
   ```

5. **Evaluation**: Run the evaluation metrics
   ```bash
   python BERT_score.py
   python Entity_similarity_score.py
   python Attack_dipper.py
   ```

## Requirements

- Python 3.8+
- PyTorch
- Transformers library
- Datasets library
- Other dependencies as specified in each script

## Notes

- Modify the configuration parameters in each script according to your setup
- Ensure you have sufficient computational resources for training and evaluation
- Results may vary based on random seeds and hardware differences

For detailed instructions on each metric evaluation, refer to the main guidelines in the leaderboard application.
app.py
ADDED
|
@@ -0,0 +1,1097 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
import gradio as gr
import json
import os
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime
from plotly.subplots import make_subplots

# Load leaderboard data
def load_leaderboard_data():
    try:
        with open('leaderboard.json', 'r') as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return []

# Filter data based on model and metric
def filter_data(data, model, metric):
    filtered = []
    for item in data:
        if item.get('model') == model:
            if metric == "Attack-free":
                if item.get('normalizedUtility') is not None and item.get('detectionRate') is not None:
                    filtered.append({
                        'name': item.get('name', ''),
                        'model': item.get('model', ''),
                        'normalizedUtility': item.get('normalizedUtility', 0),
                        'detectionRate': item.get('detectionRate', 0)
                    })
            elif metric == "Watermark Removal":
                if (item.get('absoluteUtilityDegregation') is not None and
                        item.get('removal_detectionRate') is not None):
                    filtered.append({
                        'name': item.get('name', ''),
                        'model': item.get('model', ''),
                        'absoluteUtilityDegregation': item.get('absoluteUtilityDegregation', 0),
                        'removal_detectionRate': item.get('removal_detectionRate', 0)
                    })
            elif metric == "Stealing Attack":
                if (item.get('adversaryBERTscore') is not None and
                        item.get('adversaryDetectionRate') is not None):
                    filtered.append({
                        'name': item.get('name', ''),
                        'model': item.get('model', ''),
                        'adversaryBERTscore': item.get('adversaryBERTscore', 0),
                        'adversaryDetectionRate': item.get('adversaryDetectionRate', 0)
                    })

    # Sort by detection rate (descending)
    if metric == "Attack-free":
        filtered.sort(key=lambda x: x['detectionRate'], reverse=True)
    elif metric == "Watermark Removal":
        filtered.sort(key=lambda x: x['removal_detectionRate'], reverse=True)
    else:  # Stealing Attack
        filtered.sort(key=lambda x: x['adversaryDetectionRate'], reverse=True)

    return filtered

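The filtering logic above is easiest to sanity-check in isolation. Below is a standalone sketch of the "Attack-free" branch of `filter_data` — the dict keys follow the `leaderboard.json` schema used in this file, but the watermark names and scores are invented examples:

```python
# Standalone sketch of the "Attack-free" branch of filter_data:
# keep rows for one model that have both metrics, sorted by detection rate.
def filter_attack_free(data, model):
    filtered = [
        {
            'name': item.get('name', ''),
            'model': item.get('model', ''),
            'normalizedUtility': item.get('normalizedUtility', 0),
            'detectionRate': item.get('detectionRate', 0),
        }
        for item in data
        if item.get('model') == model
        and item.get('normalizedUtility') is not None
        and item.get('detectionRate') is not None
    ]
    # Highest detection rate first, matching the leaderboard ranking
    filtered.sort(key=lambda x: x['detectionRate'], reverse=True)
    return filtered

# Invented sample entries (not real leaderboard data)
sample = [
    {'name': 'KGW', 'model': 'Llama-2-7B', 'normalizedUtility': 0.91, 'detectionRate': 99.2},
    {'name': 'Unigram', 'model': 'Llama-2-7B', 'normalizedUtility': 0.88, 'detectionRate': 99.8},
    {'name': 'EXP', 'model': 'OPT-1.3B', 'normalizedUtility': 0.95, 'detectionRate': 97.0},
    {'name': 'Partial', 'model': 'Llama-2-7B', 'normalizedUtility': None, 'detectionRate': 98.0},
]
rows = filter_attack_free(sample, 'Llama-2-7B')
print([r['name'] for r in rows])  # → ['Unigram', 'KGW']
```

Entries for the wrong model or with a missing metric are dropped before sorting, which is why only two of the four sample rows survive.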
# Create scatter plot
def create_scatter_plot(data, metric):
    if not data:
        return go.Figure()

    # Prepare data for plotting
    x_data = []
    y_data = []
    names = []

    for item in data:
        names.append(item['name'])
        if metric == "Attack-free":
            x_data.append(item['normalizedUtility'])
            y_data.append(item['detectionRate'])
        elif metric == "Watermark Removal":
            x_data.append(item['absoluteUtilityDegregation'])
            y_data.append(item['removal_detectionRate'])
        else:  # Stealing Attack
            x_data.append(item['adversaryBERTscore'])
            y_data.append(item['adversaryDetectionRate'])

    # Create scatter plot
    fig = go.Figure()

    # Add scatter points
    fig.add_trace(go.Scatter(
        x=x_data,
        y=y_data,
        mode='markers+text',
        marker=dict(
            size=12,
            color='#3B82F6',
            line=dict(width=2, color='white')
        ),
        text=names,
        textposition='top center',
        textfont=dict(size=10, color='#374151'),
        hovertemplate='<b>%{text}</b><br>' +
                      ('Normalized Utility: %{x:.3f}<br>' if metric == "Attack-free" else
                       'Abs Utility Degradation: %{x:.3f}<br>' if metric == "Watermark Removal" else
                       'Adversary BERT Score: %{x:.3f}<br>') +
                      ('Detection Rate: %{y:.3f}%<br>' if metric != "Stealing Attack" else
                       'Adversary Detection Rate: %{y:.3f}%<br>') +
                      '<extra></extra>'
    ))

    # Set axis labels
    if metric == "Attack-free":
        x_title = "Normalized Utility"
        y_title = "Detection Rate (%)"
    elif metric == "Watermark Removal":
        x_title = "Absolute Utility Degradation"
        y_title = "Removal Detection Rate (%)"
    else:  # Stealing Attack
        x_title = "Adversary BERT Score"
        y_title = "Adversary Detection Rate (%)"

    fig.update_layout(
        title=f"{metric} Performance Scatter Plot",
        xaxis_title=x_title,
        yaxis_title=y_title,
        font=dict(size=12, color='#374151'),
        plot_bgcolor='white',
        paper_bgcolor='white',
        xaxis=dict(
            gridcolor='lightgray',
            showgrid=True,
            zeroline=False
        ),
        yaxis=dict(
            gridcolor='lightgray',
            showgrid=True,
            zeroline=False
        ),
        margin=dict(l=60, r=60, t=80, b=60)
    )

    return fig

# Create table data with green arrows and reference links
def create_table_data(data, metric):
    if not data:
        return pd.DataFrame()

    table_data = []
    for i, item in enumerate(data, 1):
        watermark_name = item['name']
        paper_link = item.get('paperLink')
        model = item.get('model', 'N/A')

        # Create reference link if paper link exists (smaller text)
        if paper_link:
            reference_link = f'<a href="{paper_link}" target="_blank" style="color: #3B82F6; text-decoration: underline; font-size: 0.8em;">📄 Paper</a>'
        else:
            reference_link = '-'

        row = {
            'Watermark': watermark_name
        }

        if metric == "Attack-free":
            row['Normalized Utility ↑'] = f"{item['normalizedUtility']:.3f}"
            row['Detection Rate (%) ↑'] = f"{item['detectionRate']:.3f}"
        elif metric == "Watermark Removal":
            row['Abs Utility Degradation ↑'] = f"{item['absoluteUtilityDegregation']:.3f}"
            row['Removal Detection Rate (%) ↑'] = f"{item['removal_detectionRate']:.3f}"
        else:  # Stealing Attack
            row['Adversary BERT Score ↑'] = f"{item['adversaryBERTscore']:.3f}"
            row['Adversary Detection Rate (%) ↑'] = f"{item['adversaryDetectionRate']:.3f}"

        # Add Reference column at the end
        row['Reference'] = reference_link

        table_data.append(row)

    return pd.DataFrame(table_data)

# Update interface based on selections
def update_interface(model, metric):
    data = load_leaderboard_data()
    filtered_data = filter_data(data, model, metric)

    # Create scatter plot
    scatter_plot = create_scatter_plot(filtered_data, metric)

    # Create table with green arrows
    table_data = create_table_data(filtered_data, metric)

    return scatter_plot, table_data

# Handle form submission
def submit_watermark_data(name, model, paper_link, normalized_utility, detection_rate,
                          absolute_utility_degradation, removal_detection_rate,
                          adversary_bert_score, adversary_detection_rate):
    """Handle watermark data submission"""

    # Validation
    if not name or not name.strip():
        return "❌ Error: Watermark name is required", gr.update(), gr.update()

    if not model:
        return "❌ Error: Model selection is required", gr.update(), gr.update()

    # Validate paper link if provided
    if paper_link and paper_link.strip():
        paper_link = paper_link.strip()
        if not (paper_link.startswith('http://') or paper_link.startswith('https://')):
            return "❌ Error: Paper link must start with http:// or https://", gr.update(), gr.update()
    else:
        paper_link = None

    # Check what type of submission this is based on provided fields
    has_attack_free_data = normalized_utility is not None and detection_rate is not None
    has_removal_data = absolute_utility_degradation is not None and removal_detection_rate is not None
    has_stealing_data = adversary_bert_score is not None and adversary_detection_rate is not None

    # At least one complete set of metrics must be provided
    if not has_attack_free_data and not has_removal_data and not has_stealing_data:
        return "❌ Error: Please provide at least one complete set of metrics:\n• Attack-free: Normalized Utility + Detection Rate\n• Watermark Removal: Absolute Utility Degradation + Removal Detection Rate\n• Stealing Attack: Adversary BERT Score + Adversary Detection Rate", gr.update(), gr.update()

    # Validate Attack-free metrics if provided
    if has_attack_free_data:
        if normalized_utility <= 0 or normalized_utility > 1.0:
            return "❌ Error: Normalized Utility must be between 0.000 and 1.000", gr.update(), gr.update()
        if detection_rate < 0.0 or detection_rate > 100.0:
            return "❌ Error: Detection Rate must be between 0.000 and 100.000", gr.update(), gr.update()

    # Validate Watermark Removal metrics if provided
    if has_removal_data:
        if absolute_utility_degradation <= 0 or absolute_utility_degradation > 1.0:
            return "❌ Error: Absolute Utility Degradation must be between 0.000 and 1.000", gr.update(), gr.update()
        if removal_detection_rate < 0.0 or removal_detection_rate > 100.0:
            return "❌ Error: Removal Detection Rate must be between 0.000 and 100.000", gr.update(), gr.update()

    # Validate Stealing Attack metrics if provided
    if has_stealing_data:
        if adversary_bert_score <= 0 or adversary_bert_score > 1.0:
            return "❌ Error: Adversary BERT Score must be between 0.000 and 1.000", gr.update(), gr.update()
        if adversary_detection_rate < 0.0 or adversary_detection_rate > 100.0:
            return "❌ Error: Adversary Detection Rate must be between 0.000 and 100.000", gr.update(), gr.update()

    # Validate partial adversary data (if one is provided, both are required)
    has_partial_adversary = (adversary_bert_score is not None and adversary_bert_score > 0) or \
                            (adversary_detection_rate is not None and adversary_detection_rate > 0)

    if has_partial_adversary and not has_stealing_data:
        return "❌ Error: If you provide one adversary metric, you must provide both Adversary BERT Score and Adversary Detection Rate", gr.update(), gr.update()

    # Create new entry - only include provided values, don't set missing ones to 0
    new_entry = {
        "name": name.strip(),
        "model": model,
        "normalizedUtility": normalized_utility,
        "detectionRate": detection_rate
    }

    # Add paper link if provided
    if paper_link:
        new_entry["paperLink"] = paper_link

    # Only add optional metrics if they were provided
    if absolute_utility_degradation is not None:
        new_entry["absoluteUtilityDegregation"] = absolute_utility_degradation
    if removal_detection_rate is not None:
        new_entry["removal_detectionRate"] = removal_detection_rate
    if adversary_bert_score is not None:
        new_entry["adversaryBERTscore"] = adversary_bert_score
    if adversary_detection_rate is not None:
        new_entry["adversaryDetectionRate"] = adversary_detection_rate

    # Load existing approved data to check for duplicates
    try:
        with open('leaderboard.json', 'r') as f:
            approved_data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        approved_data = []

    # Check for duplicate names in approved data
    for entry in approved_data:
        if entry.get('name') == name.strip() and entry.get('model') == model:
            return f"❌ Error: A watermark named '{name.strip()}' already exists for {model}", gr.update(), gr.update()

    # Load pending submissions to check for duplicates there too
    try:
        with open('pending_submissions.json', 'r') as f:
            pending_data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        pending_data = []

    # Check for duplicate names in pending data
    for entry in pending_data:
        if entry.get('name') == name.strip() and entry.get('model') == model:
            return f"❌ Error: A watermark named '{name.strip()}' is already pending approval for {model}", gr.update(), gr.update()

    # Add submission timestamp and status
    new_entry['submitted_at'] = datetime.now().isoformat()
    new_entry['status'] = 'pending'
    new_entry['submission_id'] = f"{name.strip()}_{model}_{int(datetime.now().timestamp())}"

    # Add to pending submissions instead of approved data
    pending_data.append(new_entry)

    # Save pending submissions
    try:
        with open('pending_submissions.json', 'w') as f:
            json.dump(pending_data, f, indent=2)

        # Update the interface with current approved data only
        filtered_data = filter_data(approved_data, model, "Attack-free")
        scatter_plot = create_scatter_plot(filtered_data, "Attack-free")
        table_data = create_table_data(filtered_data, "Attack-free")

        success_msg = f"✅ Successfully submitted '{name.strip()}' for {model} for approval! Your submission will be reviewed by the administrator before appearing on the leaderboard."
        return success_msg, scatter_plot, table_data

    except Exception as e:
        return f"❌ Error saving submission: {str(e)}", gr.update(), gr.update()

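The range checks in `submit_watermark_data` all follow the same two patterns — scores must lie in (0, 1] and percentage rates in [0, 100] — which can be sketched as two small predicates (these helpers are illustrative, not part of the app's own API):

```python
# Sketch of the metric bounds enforced by submit_watermark_data:
# score-like metrics in (0, 1], percentage rates in [0, 100].
def valid_score(x):
    return x is not None and 0 < x <= 1.0

def valid_rate(x):
    return x is not None and 0.0 <= x <= 100.0

print(valid_score(0.91), valid_score(0.0), valid_score(1.2))  # True False False
print(valid_rate(99.8), valid_rate(-1.0), valid_rate(100.0))  # True False True
```

Note the asymmetry: a score of exactly 0 is rejected, while a rate of exactly 0 or 100 is accepted, matching the `<=`/`<` comparisons in the validation code above.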
# Clear form function
def clear_form():
    return (None, None, None, None, None, None, None, None, None)

# Owner approval functions
def load_pending_submissions():
    """Load pending submissions for owner review"""
    try:
        with open('pending_submissions.json', 'r') as f:
            pending_data = json.load(f)

        if not pending_data:
            return pd.DataFrame(columns=["ID", "Name", "Model", "Paper Link", "Attack-free Utility", "Attack-free Detection",
                                         "Removal Degradation", "Removal Detection", "Adversary BERT", "Adversary Detection", "Submitted At"])

        # Format data for display with all fields
        formatted_data = []
        for entry in pending_data:
            watermark_name = entry.get('name', 'N/A')
            paper_link = entry.get('paperLink', '-')
            model = entry.get('model', 'N/A')

            # Format all metric fields
            formatted_entry = {
                "ID": entry.get('submission_id', 'N/A'),
                "Name": watermark_name,
                "Model": model,
                "Paper Link": paper_link if paper_link != '-' else '-',
                "Attack-free Utility": f"{entry.get('normalizedUtility', 0):.3f}" if entry.get('normalizedUtility') is not None else '-',
                "Attack-free Detection": f"{entry.get('detectionRate', 0):.3f}" if entry.get('detectionRate') is not None else '-',
                "Removal Degradation": f"{entry.get('absoluteUtilityDegregation', 0):.3f}" if entry.get('absoluteUtilityDegregation') is not None else '-',
                "Removal Detection": f"{entry.get('removal_detectionRate', 0):.3f}" if entry.get('removal_detectionRate') is not None else '-',
                "Adversary BERT": f"{entry.get('adversaryBERTscore', 0):.3f}" if entry.get('adversaryBERTscore') is not None else '-',
                "Adversary Detection": f"{entry.get('adversaryDetectionRate', 0):.3f}" if entry.get('adversaryDetectionRate') is not None else '-',
                "Submitted At": entry.get('submitted_at', 'N/A')[:19] if entry.get('submitted_at') else 'N/A',  # Show only date and time
            }
            formatted_data.append(formatted_entry)

        return pd.DataFrame(formatted_data)

    except Exception as e:
        print(f"Error loading pending submissions: {e}")
        return pd.DataFrame(columns=["ID", "Name", "Model", "Paper Link", "Attack-free Utility", "Attack-free Detection",
                                     "Removal Degradation", "Removal Detection", "Adversary BERT", "Adversary Detection", "Submitted At"])

def approve_submission(submission_id, admin_password):
    """Approve a pending submission"""
    # Check admin password
    if admin_password != "admin123":  # You can change this password
        return "❌ Access denied: Invalid admin password", gr.update()

    try:
        # Load pending submissions from file (not from the formatted function)
        try:
            with open('pending_submissions.json', 'r') as f:
                pending_data = json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            pending_data = []

        # Find and remove the submission
        approved_entry = None
        for i, entry in enumerate(pending_data):
            if entry.get('submission_id') == submission_id:
                approved_entry = pending_data.pop(i)
                break

        if not approved_entry:
            return "❌ Submission not found", gr.update()

        # Remove submission metadata
        approved_entry.pop('submitted_at', None)
        approved_entry.pop('status', None)
        approved_entry.pop('submission_id', None)

        # Load approved data
        try:
            with open('leaderboard.json', 'r') as f:
                approved_data = json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            approved_data = []

        # Add to approved data
        approved_data.append(approved_entry)

        # Save approved data
        with open('leaderboard.json', 'w') as f:
            json.dump(approved_data, f, indent=2)

        # Save updated pending data
        with open('pending_submissions.json', 'w') as f:
            json.dump(pending_data, f, indent=2)

        return f"✅ Approved submission: {approved_entry.get('name', 'Unknown')}", load_pending_submissions()

    except Exception as e:
        return f"❌ Error approving submission: {str(e)}", gr.update()

def reject_submission(submission_id, admin_password):
    """Reject a pending submission"""
    # Check admin password
    if admin_password != "admin123":  # You can change this password
        return "❌ Access denied: Invalid admin password", gr.update()

    try:
        # Load pending submissions from file (not from the formatted function)
        try:
            with open('pending_submissions.json', 'r') as f:
                pending_data = json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            pending_data = []

        # Find and remove the submission
        rejected_entry = None
        for i, entry in enumerate(pending_data):
            if entry.get('submission_id') == submission_id:
                rejected_entry = pending_data.pop(i)
                break

        if not rejected_entry:
            return "❌ Submission not found", gr.update()

        # Save updated pending data
        with open('pending_submissions.json', 'w') as f:
            json.dump(pending_data, f, indent=2)

        return f"❌ Rejected submission: {rejected_entry.get('name', 'Unknown')}", load_pending_submissions()

    except Exception as e:
        return f"❌ Error rejecting submission: {str(e)}", gr.update()

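At its core, the approve path just moves one record from `pending_submissions.json` to `leaderboard.json` and strips the review-only metadata. A self-contained sketch of that flow against temporary files (the file names and sample entry here are placeholders, not the app's real data):

```python
import json
import os
import tempfile

# Sketch of the approve flow: pop the matching entry from the pending list,
# drop review-only fields, and append it to the approved list.
def approve(pending_path, approved_path, submission_id):
    with open(pending_path) as f:
        pending = json.load(f)
    entry = next((e for e in pending if e.get('submission_id') == submission_id), None)
    if entry is None:
        return False
    pending.remove(entry)
    # Review metadata never reaches the public leaderboard file
    for meta in ('submitted_at', 'status', 'submission_id'):
        entry.pop(meta, None)
    try:
        with open(approved_path) as f:
            approved = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        approved = []
    approved.append(entry)
    with open(approved_path, 'w') as f:
        json.dump(approved, f, indent=2)
    with open(pending_path, 'w') as f:
        json.dump(pending, f, indent=2)
    return True

tmp = tempfile.mkdtemp()
pending_path = os.path.join(tmp, 'pending.json')
approved_path = os.path.join(tmp, 'approved.json')
with open(pending_path, 'w') as f:
    json.dump([{'name': 'KGW', 'model': 'Llama-2-7B', 'status': 'pending',
                'submission_id': 'KGW_Llama-2-7B_1'}], f)
print(approve(pending_path, approved_path, 'KGW_Llama-2-7B_1'))  # → True
```

Reject is the same flow minus the append: the entry is removed from the pending file and simply discarded.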
# Toggle add data section visibility
def toggle_add_data_section(section):
    return gr.update(visible=not section.visible)

# Create the main interface
def create_interface():
    # Custom CSS for better styling
    css = """
    .gradio-container {
        max-width: 1200px !important;
        margin: 0 auto !important;
        background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%);
        min-height: 100vh;
    }
    .title {
        text-align: center;
        margin: 20px 0;
        font-size: 3rem;
        font-weight: bold;
        background: linear-gradient(45deg, #667eea 0%, #764ba2 100%);
        -webkit-background-clip: text;
        -webkit-text-fill-color: transparent;
        background-clip: text;
        text-shadow: 2px 2px 4px rgba(0,0,0,0.1);
    }
    .subtitle {
        text-align: center;
        margin-bottom: 30px;
        font-size: 1.3rem;
        color: #4a5568;
        font-weight: 500;
    }
    .controls {
        background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
        padding: 30px;
        border-radius: 15px;
        margin-bottom: 25px;
        box-shadow: 0 8px 32px rgba(0,0,0,0.1);
        border: 1px solid rgba(255,255,255,0.2);
    }
    .controls label {
        color: white !important;
        font-weight: bold !important;
        font-size: 1.2rem !important;
    }
    .controls .gr-radio {
        background: rgba(255,255,255,0.1) !important;
        border-radius: 10px !important;
        padding: 12px !important;
    }
    .controls .gr-radio label {
        color: white !important;
        font-size: 1.1rem !important;
    }
    .controls h3 {
        font-size: 1.4rem !important;
        margin-bottom: 15px !important;
    }
    #highlighted-add-data {
        background: linear-gradient(135deg, #E0F2FE 0%, #B3E5FC 100%) !important;
        border: 2px solid #81D4FA !important;
        border-radius: 15px !important;
        box-shadow: 0 10px 40px rgba(129, 212, 250, 0.3) !important;
        margin: 20px 0 !important;
    }
    #highlighted-add-data .gr-accordion-header {
        background: linear-gradient(135deg, #81D4FA 0%, #4FC3F7 100%) !important;
        color: white !important;
        font-weight: bold !important;
        font-size: 1.2rem !important;
        padding: 15px 20px !important;
        border-radius: 15px 15px 0 0 !important;
    }
    #highlighted-add-data .gr-accordion-content {
        background: rgba(255,255,255,0.95) !important;
        border-radius: 0 0 15px 15px !important;
        padding: 25px !important;
    }
    .gr-button {
        border-radius: 10px !important;
        font-weight: bold !important;
        transition: all 0.3s ease !important;
    }
    .gr-button:hover {
        transform: translateY(-2px) !important;
        box-shadow: 0 5px 15px rgba(0,0,0,0.2) !important;
    }
    .gr-plot {
        border-radius: 15px !important;
        box-shadow: 0 8px 32px rgba(0,0,0,0.1) !important;
        background: white !important;
        padding: 20px !important;
    }
    .gr-dataframe {
        border-radius: 15px !important;
        box-shadow: 0 8px 32px rgba(0,0,0,0.1) !important;
        background: white !important;
        overflow: hidden !important;
    }
    .gr-accordion {
        border-radius: 15px !important;
        box-shadow: 0 8px 32px rgba(0,0,0,0.1) !important;
        background: white !important;
        margin: 15px 0 !important;
    }
    .gr-accordion-header {
        background: linear-gradient(135deg, #667eea 0%, #764ba2 100%) !important;
        color: white !important;
        font-weight: bold !important;
        padding: 15px 20px !important;
        border-radius: 15px 15px 0 0 !important;
    }
    .gr-accordion-content {
        background: rgba(255,255,255,0.95) !important;
        border-radius: 0 0 15px 15px !important;
        padding: 20px !important;
    }
    #submit-btn {
        background: linear-gradient(135deg, #29B6F6 0%, #0288D1 100%) !important;
        border: 2px solid #0277BD !important;
        color: white !important;
        font-weight: bold !important;
        font-size: 1.1rem !important;
        padding: 15px 30px !important;
        border-radius: 12px !important;
        box-shadow: 0 8px 25px rgba(41, 182, 246, 0.4) !important;
        transition: all 0.3s ease !important;
    }
    #submit-btn:hover {
        background: linear-gradient(135deg, #0288D1 0%, #0277BD 100%) !important;
        transform: translateY(-3px) !important;
        box-shadow: 0 12px 35px rgba(41, 182, 246, 0.6) !important;
    }
    #owner-controls {
        background: linear-gradient(135deg, #FFE0E0 0%, #FFCDD2 100%) !important;
        border: 2px solid #FF5722 !important;
        border-radius: 15px !important;
        box-shadow: 0 10px 40px rgba(255, 87, 34, 0.3) !important;
        margin: 20px 0 !important;
    }
    #owner-controls .gr-accordion-header {
        background: linear-gradient(135deg, #FF5722 0%, #D32F2F 100%) !important;
        color: white !important;
        font-weight: bold !important;
        font-size: 1.2rem !important;
        padding: 15px 20px !important;
        border-radius: 15px 15px 0 0 !important;
    }
    #owner-controls .gr-accordion-content {
        background: rgba(255,255,255,0.95) !important;
        border-radius: 0 0 15px 15px !important;
        padding: 25px !important;
    }
    #approve-btn {
        background: linear-gradient(135deg, #4CAF50 0%, #2E7D32 100%) !important;
        border: 2px solid #388E3C !important;
        color: white !important;
        font-weight: bold !important;
        font-size: 1.1rem !important;
        padding: 15px 30px !important;
        border-radius: 12px !important;
        box-shadow: 0 8px 25px rgba(76, 175, 80, 0.4) !important;
        transition: all 0.3s ease !important;
    }
    #approve-btn:hover {
        background: linear-gradient(135deg, #2E7D32 0%, #1B5E20 100%) !important;
        transform: translateY(-3px) !important;
        box-shadow: 0 12px 35px rgba(76, 175, 80, 0.6) !important;
    }
    #reject-btn {
        background: linear-gradient(135deg, #F44336 0%, #C62828 100%) !important;
        border: 2px solid #D32F2F !important;
        color: white !important;
        font-weight: bold !important;
        font-size: 1.1rem !important;
        padding: 15px 30px !important;
        border-radius: 12px !important;
        box-shadow: 0 8px 25px rgba(244, 67, 54, 0.4) !important;
        transition: all 0.3s ease !important;
    }
    #reject-btn:hover {
        background: linear-gradient(135deg, #C62828 0%, #B71C1C 100%) !important;
        transform: translateY(-3px) !important;
        box-shadow: 0 12px 35px rgba(244, 67, 54, 0.6) !important;
    }
    #guideline-section {
        background: linear-gradient(135deg, #E8F5E8 0%, #C8E6C9 100%) !important;
        border: 2px solid #4CAF50 !important;
        border-radius: 15px !important;
        box-shadow: 0 10px 40px rgba(76, 175, 80, 0.3) !important;
        margin: 20px 0 !important;
    }
    #guideline-section .gr-accordion-header {
        background: linear-gradient(135deg, #4CAF50 0%, #2E7D32 100%) !important;
        color: white !important;
        font-weight: bold !important;
        font-size: 1.2rem !important;
        padding: 15px 20px !important;
| 670 |
+
border-radius: 15px 15px 0 0 !important;
|
| 671 |
+
}
|
| 672 |
+
#guideline-section .gr-accordion-content {
|
| 673 |
+
background: rgba(255,255,255,0.95) !important;
|
| 674 |
+
border-radius: 0 0 15px 15px !important;
|
| 675 |
+
padding: 25px !important;
|
| 676 |
+
}
|
| 677 |
+
"""
|
| 678 | +
| 679 | +     with gr.Blocks(css=css, title="Watermark Leaderboard for LLMs") as demo:
| 680 | +         # Header
| 681 | +         gr.HTML("""
| 682 | +         <div class="title">
| 683 | +             🏆 Watermark Leaderboard for LLMs 🏆
| 684 | +         </div>
| 685 | +         <div class="subtitle">
| 686 | +             📊 Interactive leaderboard for comparing watermark performance across different models and evaluation settings
| 687 | +         </div>
| 688 | +         """)
| 689 | +
| 690 | +         # Controls
| 691 | +         with gr.Row():
| 692 | +             with gr.Column(scale=1):
| 693 | +                 gr.HTML("<div style='text-align: center; margin-bottom: 15px;'><h3 style='color: #667eea; margin: 0; font-weight: bold;'>🤖 Model Selection</h3></div>")
| 694 | +                 model_selector = gr.Radio(
| 695 | +                     choices=["LLaMA3", "DeepSeek"],
| 696 | +                     value="LLaMA3",
| 697 | +                     label="Model",
| 698 | +                     info="Select the model to display"
| 699 | +                 )
| 700 | +             with gr.Column(scale=1):
| 701 | +                 gr.HTML("<div style='text-align: center; margin-bottom: 15px;'><h3 style='color: #667eea; margin: 0; font-weight: bold;'>⚙️ Evaluation Setting</h3></div>")
| 702 | +                 metric_selector = gr.Radio(
| 703 | +                     choices=["Attack-free", "Watermark Removal", "Stealing Attack"],
| 704 | +                     value="Attack-free",
| 705 | +                     label="Setting",
| 706 | +                     info="Select the evaluation setting"
| 707 | +                 )
| 708 | +
| 709 | +
| 710 | +         # Add Your Data Section (Highlighted)
| 711 | +         with gr.Accordion("🚀 Add Your Data to the Leaderboard", open=False, elem_id="highlighted-add-data"):
| 712 | +             gr.HTML("""
| 713 | +             <div style='text-align: center; margin-bottom: 20px;'>
| 714 | +                 <h2 style='color: #0277BD; margin: 0; font-size: 1.5rem;'>📝 Submit Your Watermark Performance Results</h2>
| 715 | +                 <p style='color: #374151; margin: 10px 0 0 0;'>Contribute to the community by sharing your watermark evaluation results</p>
| 716 | +             </div>
| 717 | +             <div style='background: #E3F2FD; border: 1px solid #2196F3; border-radius: 8px; padding: 15px; margin-bottom: 20px;'>
| 718 | +                 <h4 style='color: #1976D2; margin: 0 0 10px 0;'>📋 Submission Requirements</h4>
| 719 | +                 <p style='color: #374151; margin: 0 0 8px 0;'>Provide at least one complete set of metrics:</p>
| 720 | +                 <ul style='color: #374151; margin: 0; padding-left: 20px;'>
| 721 | +                     <li><strong>Attack-free:</strong> Normalized Utility + Detection Rate</li>
| 722 | +                     <li><strong>Watermark Removal:</strong> Absolute Utility Degradation + Removal Detection Rate</li>
| 723 | +                     <li><strong>Stealing Attack:</strong> Adversary BERT Score + Adversary Detection Rate</li>
| 724 | +                 </ul>
| 725 | +             </div>
| 726 | +             """)
| 727 | +             with gr.Row():
| 728 | +                 with gr.Column(scale=1):
| 729 | +                     # Basic Information
| 730 | +                     gr.HTML("<div style='text-align: center; margin-bottom: 15px;'><h3 style='color: #0277BD; margin: 0;'>📋 Basic Information</h3></div>")
| 731 | +                     watermark_name = gr.Textbox(
| 732 | +                         label="Watermark Name",
| 733 | +                         placeholder="e.g., MyWatermark, Watermark-X",
| 734 | +                         info="Unique identifier for your watermark"
| 735 | +                     )
| 736 | +                     paper_link = gr.Textbox(
| 737 | +                         label="Paper Link (Optional)",
| 738 | +                         placeholder="https://arxiv.org/abs/xxxx.xxxxx or https://...",
| 739 | +                         info="Link to the paper describing this watermark method"
| 740 | +                     )
| 741 | +                     submission_model = gr.Radio(
| 742 | +                         choices=["LLaMA3", "DeepSeek"],
| 743 | +                         label="Model",
| 744 | +                         value="LLaMA3",
| 745 | +                         info="Select the model used"
| 746 | +                     )
| 747 | +
| 748 | +                 with gr.Column(scale=1):
| 749 | +                     # Attack-free Metrics (Optional)
| 750 | +                     gr.HTML("<div style='text-align: center; margin-bottom: 15px;'><h3 style='color: #0277BD; margin: 0;'>⚡ Attack-free Metrics (Optional - Both Required if One is Provided)</h3></div>")
| 751 | +                     normalized_utility = gr.Number(
| 752 | +                         label="Normalized Utility",
| 753 | +                         value=None,
| 754 | +                         minimum=0.0,
| 755 | +                         maximum=1.0,
| 756 | +                         step=0.001,
| 757 | +                         info="Text quality metric (0.000 - 1.000)"
| 758 | +                     )
| 759 | +                     detection_rate = gr.Number(
| 760 | +                         label="Detection Rate (%)",
| 761 | +                         value=None,
| 762 | +                         minimum=0.0,
| 763 | +                         maximum=100.0,
| 764 | +                         step=0.001,
| 765 | +                         info="Watermark detection accuracy (0.000 - 100.000%)"
| 766 | +                     )
| 767 | +
| 768 | +             with gr.Row():
| 769 | +                 with gr.Column(scale=1):
| 770 | +                     # Watermark Removal Metrics (Optional)
| 771 | +                     gr.HTML("<div style='text-align: center; margin-bottom: 15px;'><h3 style='color: #0277BD; margin: 0;'>🛡️ Watermark Removal (Optional)</h3></div>")
| 772 | +                     absolute_utility_degradation = gr.Number(
| 773 | +                         label="Absolute Utility Degradation",
| 774 | +                         value=None,
| 775 | +                         minimum=0.0,
| 776 | +                         maximum=1.0,
| 777 | +                         step=0.001,
| 778 | +                         info="Resistance to removal attacks (0.000 - 1.000)"
| 779 | +                     )
| 780 | +                     removal_detection_rate = gr.Number(
| 781 | +                         label="Removal Detection Rate (%)",
| 782 | +                         value=None,
| 783 | +                         minimum=0.0,
| 784 | +                         maximum=100.0,
| 785 | +                         step=0.001,
| 786 | +                         info="Detection rate under removal attacks (0.000 - 100.000%)"
| 787 | +                     )
| 788 | +
| 789 | +                 with gr.Column(scale=1):
| 790 | +                     # Stealing Attack Metrics (Optional)
| 791 | +                     gr.HTML("<div style='text-align: center; margin-bottom: 15px;'><h3 style='color: #0277BD; margin: 0;'>🎯 Stealing Attack (Optional)</h3></div>")
| 792 | +                     adversary_bert_score = gr.Number(
| 793 | +                         label="Adversary BERT Score",
| 794 | +                         value=None,
| 795 | +                         minimum=0.0,
| 796 | +                         maximum=1.0,
| 797 | +                         step=0.001,
| 798 | +                         info="Performance under adversarial conditions (0.000 - 1.000)"
| 799 | +                     )
| 800 | +                     adversary_detection_rate = gr.Number(
| 801 | +                         label="Adversary Detection Rate (%)",
| 802 | +                         value=None,
| 803 | +                         minimum=0.0,
| 804 | +                         maximum=100.0,
| 805 | +                         step=0.001,
| 806 | +                         info="Detection rate under adversarial attacks (0.000 - 100.000%)"
| 807 | +                     )
| 808 | +
| 809 | +             # Submit and Clear buttons
| 810 | +             with gr.Row():
| 811 | +                 with gr.Column(scale=1):
| 812 | +                     submit_btn = gr.Button(
| 813 | +                         "🚀 Submit Data to Leaderboard",
| 814 | +                         variant="primary",
| 815 | +                         size="lg",
| 816 | +                         elem_id="submit-btn"
| 817 | +                     )
| 818 | +                 with gr.Column(scale=1):
| 819 | +                     clear_btn = gr.Button(
| 820 | +                         "🗑️ Clear Form",
| 821 | +                         variant="secondary",
| 822 | +                         size="lg"
| 823 | +                     )
| 824 | +
| 825 | +             # Status message
| 826 | +             status_message = gr.Markdown("", visible=True)
| 827 | +
| 828 | +
| 829 | +         # Scatter Plot
| 830 | +         scatter_plot = gr.Plot(
| 831 | +             label="Performance Scatter Plot",
| 832 | +             show_label=True
| 833 | +         )
| 834 | +
| 835 | +         # Table
| 836 | +         table = gr.DataFrame(
| 837 | +             label="Performance Table",
| 838 | +             show_label=True,
| 839 | +             interactive=False,
| 840 | +             wrap=True
| 841 | +         )
| 842 | +
| 843 | +         # Guideline and Metrics Explained Section (At bottom with light green background)
| 844 | +         with gr.Accordion("📋 Guideline for Submitting Watermark Performance Results", open=False, elem_id="guideline-section"):
| 845 | +             gr.HTML("""
| 846 | +             <div style="padding: 20px;">
| 847 | +                 <h3>Guideline for Submitting Watermark Performance Results</h3>
| 848 | +                 <h4>1. Datasets</h4>
| 849 | +                 <ul>
| 850 | +                     <li><strong>Text Generation (C4 dataset)</strong>
| 851 | +                         <ul>
| 852 | +                             <li>Training: first 20,000 samples</li>
| 853 | +                             <li>Testing: 13,860 samples</li>
| 854 | +                             <li>Reference script: <code>Files/Reproducibility/C4_dataset_download.py</code></li>
| 855 | +                         </ul>
| 856 | +                     </li>
| 857 | +                     <li><strong>Text Summarization (CNN/Daily Mail dataset)</strong>
| 858 | +                         <ul>
| 859 | +                             <li>Training: first 10,000–20,000 samples</li>
| 860 | +                             <li>Testing: 1,000 samples</li>
| 861 | +                             <li>Reference script: <code>Files/Reproducibility/CNN_dataset_download.py</code></li>
| 862 | +                         </ul>
| 863 | +                     </li>
| 864 | +                 </ul>
| 865 | +                 <h4>2. Models</h4>
| 866 | +                 <ul>
| 867 | +                     <li>Use open-source models available on Hugging Face:
| 868 | +                         <ul>
| 869 | +                             <li>DeepSeek: "deepseek-ai/deepseek-llm-7b-base"</li>
| 870 | +                             <li>LLaMA-3: "meta-llama/Meta-Llama-3-8B"</li>
| 871 | +                         </ul>
| 872 | +                     </li>
| 873 | +                 </ul>
| 874 | +                 <h4>3. Evaluation Settings</h4>
| 875 | +                 <ul>
| 876 | +                     <li><strong>(a) Attack-Free Setting</strong>
| 877 | +                         <ul>
| 878 | +                             <li>Generate 13,860 watermarked outputs on the C4 test set.</li>
| 879 | +                             <li>Report: Detection Rate and Normalized Utility (see Metrics).</li>
| 880 | +                         </ul>
| 881 | +                     </li>
| 882 | +                     <li><strong>(b) Watermark Removal Setting</strong>
| 883 | +                         <ul>
| 884 | +                             <li>Apply Dipper to paraphrase watermarked outputs.</li>
| 885 | +                             <li>Report:
| 886 | +                                 <ul>
| 887 | +                                     <li>Detection Rate after attack</li>
| 888 | +                                     <li>Normalized Utility after attack</li>
| 889 | +                                     <li>Absolute Utility Degradation (difference before vs. after attack)</li>
| 890 | +                                 </ul>
| 891 | +                             </li>
| 892 | +                             <li>Reference script: <code>Files/Reproducibility/Attack_dipper.py</code></li>
| 893 | +                         </ul>
| 894 | +                     </li>
| 895 | +                     <li><strong>(c) Stealing Attack Setting</strong>
| 896 | +                         <ul>
| 897 | +                             <li>Generate 20,000 watermarked samples for training a surrogate model using LoRA.</li>
| 898 | +                             <li>Use the surrogate model for summarization on 1,000 test samples.</li>
| 899 | +                             <li>Report: Detection Rate and Normalized Utility on the surrogate's outputs.</li>
| 900 | +                             <li>Reference scripts: <code>Files/Reproducibility/Finetune_sum.py</code>, <code>Files/Reproducibility/Inference_sum.py</code></li>
| 901 | +                         </ul>
| 902 | +                     </li>
| 903 | +                 </ul>
| 904 | +                 <h4>4. Metrics</h4>
| 905 | +                 <ul>
| 906 | +                     <li><strong>Detection Rate</strong>
| 907 | +                         <ul>
| 908 | +                             <li>Average accuracy across the test set (e.g., 13,860 examples for text generation).</li>
| 909 | +                             <li>Use your own detector implementation.</li>
| 910 | +                         </ul>
| 911 | +                     </li>
| 912 | +                     <li><strong>Normalized Utility</strong>
| 913 | +                         <ul>
| 914 | +                             <li>Defined as the mean of:</li>
| 915 | +                             <li>BERTScore (<code>Files/Reproducibility/BERT_score.py</code>)</li>
| 916 | +                             <li>Entity Similarity Score (<code>Files/Reproducibility/Entity_similarity_score.py</code>)</li>
| 917 | +                         </ul>
| 918 | +                     </li>
| 919 | +                     <li><strong>Absolute Utility Degradation</strong>
| 920 | +                         <ul>
| 921 | +                             <li>The absolute change in Normalized Utility between attack-free and attacked outputs.</li>
| 922 | +                         </ul>
| 923 | +                     </li>
| 924 | +                 </ul>
| 925 | +                 <h4>5. Submission</h4>
| 926 | +                 <ul>
| 927 | +                     <li>You may submit results for one or more evaluation settings (Attack-Free, Removal, Stealing).</li>
| 928 | +                     <li>Please include:
| 929 | +                         <ul>
| 930 | +                             <li>Model(s) evaluated</li>
| 931 | +                             <li>Dataset(s) used</li>
| 932 | +                             <li>Scripts/configuration details if modified</li>
| 933 | +                             <li>Reported metrics in the required format</li>
| 934 | +                         </ul>
| 935 | +                     </li>
| 936 | +                 </ul>
| 937 | +                 <p><strong>Reproducibility code is available in the Files tab of this Space.</strong></p>
| 938 | +             </div>
| 939 | +             """)
| 940 | +
| 941 | +         # Owner Approval Section (At the very bottom)
| 942 | +         with gr.Accordion("🔒 Owner Controls - Pending Submissions", open=False, elem_id="owner-controls"):
| 943 | +             gr.HTML("""
| 944 | +             <div style='text-align: center; margin-bottom: 20px;'>
| 945 | +                 <h2 style='color: #D32F2F; margin: 0; font-size: 1.5rem;'>🛡️ Administrator Approval Panel</h2>
| 946 | +                 <p style='color: #374151; margin: 10px 0 0 0;'>Review and approve pending submissions before they appear on the leaderboard</p>
| 947 | +             </div>
| 948 | +             """)
| 949 | +
| 950 | +             # Pending submissions table
| 951 | +             pending_table = gr.DataFrame(
| 952 | +                 label="📋 Pending Submissions",
| 953 | +                 show_label=True,
| 954 | +                 interactive=False,
| 955 | +                 wrap=True,
| 956 | +                 headers=["ID", "Name", "Model", "Paper Link", "Attack-free Utility", "Attack-free Detection",
| 957 | +                          "Removal Degradation", "Removal Detection", "Adversary BERT", "Adversary Detection", "Submitted At"]
| 958 | +             )
| 959 | +
| 960 | +             # Admin authentication
| 961 | +             admin_password_input = gr.Textbox(
| 962 | +                 label="🔐 Admin Password",
| 963 | +                 placeholder="Enter admin password to access controls",
| 964 | +                 type="password",
| 965 | +                 info="Required for approval/rejection actions"
| 966 | +             )
| 967 | +
| 968 | +             # Approval controls
| 969 | +             with gr.Row():
| 970 | +                 with gr.Column(scale=1):
| 971 | +                     submission_id_input = gr.Textbox(
| 972 | +                         label="Submission ID",
| 973 | +                         placeholder="Enter submission ID to approve/reject",
| 974 | +                         info="Copy from the pending submissions table"
| 975 | +                     )
| 976 | +                     approve_btn = gr.Button(
| 977 | +                         "✅ Approve Submission",
| 978 | +                         variant="primary",
| 979 | +                         size="lg",
| 980 | +                         elem_id="approve-btn"
| 981 | +                     )
| 982 | +                 with gr.Column(scale=1):
| 983 | +                     reject_btn = gr.Button(
| 984 | +                         "❌ Reject Submission",
| 985 | +                         variant="stop",
| 986 | +                         size="lg",
| 987 | +                         elem_id="reject-btn"
| 988 | +                     )
| 989 | +                     refresh_pending_btn = gr.Button(
| 990 | +                         "🔄 Refresh Pending",
| 991 | +                         variant="secondary",
| 992 | +                         size="lg"
| 993 | +                     )
| 994 | +
| 995 | +             approval_status = gr.Markdown("", visible=True)
| 996 | +
| 997 | +         # Event handlers
| 998 | +         model_selector.change(
| 999 | +             fn=update_interface,
| 1000 | +             inputs=[model_selector, metric_selector],
| 1001 | +             outputs=[scatter_plot, table]
| 1002 | +         )
| 1003 | +
| 1004 | +         metric_selector.change(
| 1005 | +             fn=update_interface,
| 1006 | +             inputs=[model_selector, metric_selector],
| 1007 | +             outputs=[scatter_plot, table]
| 1008 | +         )
| 1009 | +
| 1010 | +         # Form submission handler
| 1011 | +         submit_btn.click(
| 1012 | +             fn=submit_watermark_data,
| 1013 | +             inputs=[
| 1014 | +                 watermark_name,
| 1015 | +                 submission_model,
| 1016 | +                 paper_link,
| 1017 | +                 normalized_utility,
| 1018 | +                 detection_rate,
| 1019 | +                 absolute_utility_degradation,
| 1020 | +                 removal_detection_rate,
| 1021 | +                 adversary_bert_score,
| 1022 | +                 adversary_detection_rate
| 1023 | +             ],
| 1024 | +             outputs=[status_message, scatter_plot, table]
| 1025 | +         )
| 1026 | +
| 1027 | +         # Clear form handler
| 1028 | +         clear_btn.click(
| 1029 | +             fn=clear_form,
| 1030 | +             outputs=[
| 1031 | +                 watermark_name,
| 1032 | +                 paper_link,
| 1033 | +                 submission_model,
| 1034 | +                 normalized_utility,
| 1035 | +                 detection_rate,
| 1036 | +                 absolute_utility_degradation,
| 1037 | +                 removal_detection_rate,
| 1038 | +                 adversary_bert_score,
| 1039 | +                 adversary_detection_rate
| 1040 | +             ]
| 1041 | +         )
| 1042 | +
| 1043 | +         # Add data button handler
| 1044 | +         # The add_data_button is removed, so this handler is no longer needed.
| 1045 | +         # The highlighted section is now always visible.
| 1046 | +
| 1047 | +         # Owner approval event handlers
| 1048 | +         approve_btn.click(
| 1049 | +             fn=approve_submission,
| 1050 | +             inputs=[submission_id_input, admin_password_input],
| 1051 | +             outputs=[approval_status, pending_table]
| 1052 | +         )
| 1053 | +
| 1054 | +         reject_btn.click(
| 1055 | +             fn=reject_submission,
| 1056 | +             inputs=[submission_id_input, admin_password_input],
| 1057 | +             outputs=[approval_status, pending_table]
| 1058 | +         )
| 1059 | +
| 1060 | +         refresh_pending_btn.click(
| 1061 | +             fn=load_pending_submissions,
| 1062 | +             outputs=[pending_table]
| 1063 | +         )
| 1064 | +
| 1065 | +         # Initial load
| 1066 | +         demo.load(
| 1067 | +             fn=lambda: update_interface("LLaMA3", "Attack-free"),
| 1068 | +             outputs=[scatter_plot, table]
| 1069 | +         )
| 1070 | +
| 1071 | +         # Load pending submissions on startup
| 1072 | +         demo.load(
| 1073 | +             fn=load_pending_submissions,
| 1074 | +             outputs=[pending_table]
| 1075 | +         )
| 1076 | +
| 1077 | +         # Clear admin password after actions for security
| 1078 | +         def clear_admin_password():
| 1079 | +             return gr.update(value="")
| 1080 | +
| 1081 | +         # Clear password after approve/reject actions
| 1082 | +         approve_btn.click(
| 1083 | +             fn=clear_admin_password,
| 1084 | +             outputs=[admin_password_input]
| 1085 | +         )
| 1086 | +
| 1087 | +         reject_btn.click(
| 1088 | +             fn=clear_admin_password,
| 1089 | +             outputs=[admin_password_input]
| 1090 | +         )
| 1091 | +
| 1092 | +     return demo
| 1093 | +
| 1094 | + # Create and launch the interface
| 1095 | + if __name__ == "__main__":
| 1096 | +     demo = create_interface()
| 1097 | +     demo.launch()
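The submission form in app.py above requires that, for each evaluation setting, either both metrics of the pair are supplied or both are left empty, with at least one complete pair overall. A minimal sketch of that pairing rule (`METRIC_PAIRS` and `validate_submission` are illustrative helpers, not the Space's actual implementation):

```python
# Sketch of the submission pairing rule described in the form's
# "Submission Requirements" text. Hypothetical helper, for illustration only.

METRIC_PAIRS = {
    "Attack-free": ("normalized_utility", "detection_rate"),
    "Watermark Removal": ("absolute_utility_degradation", "removal_detection_rate"),
    "Stealing Attack": ("adversary_bert_score", "adversary_detection_rate"),
}

def validate_submission(fields: dict) -> tuple[bool, str]:
    """Return (ok, message) for a dict of form values (None = field left empty)."""
    complete_pairs = 0
    for setting, (a, b) in METRIC_PAIRS.items():
        has_a = fields.get(a) is not None
        has_b = fields.get(b) is not None
        if has_a != has_b:
            # One metric of the pair was given without the other.
            return False, f"{setting}: both metrics are required if one is provided"
        if has_a:
            complete_pairs += 1
    if complete_pairs == 0:
        return False, "Provide at least one complete set of metrics"
    return True, "ok"
```

A real handler would run this before appending to `leaderboard.json`; here it only mirrors the rule stated in the HTML.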
assets/index-Cd6CRo7g.js
ADDED
The diff for this file is too large to render. See raw diff.
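The guideline section of app.py defines Normalized Utility as the mean of BERTScore and the Entity Similarity Score, and Absolute Utility Degradation as the absolute change in Normalized Utility between attack-free and attacked outputs. A minimal numeric sketch of those two formulas (the scores below are made-up illustrative values, not leaderboard data):

```python
def normalized_utility(bert_score: float, entity_similarity: float) -> float:
    # Mean of BERTScore and Entity Similarity Score, per the guideline.
    return (bert_score + entity_similarity) / 2.0

def absolute_utility_degradation(utility_before: float, utility_after: float) -> float:
    # Absolute change in Normalized Utility, attack-free vs. attacked.
    return abs(utility_before - utility_after)

# Illustrative values only:
u_free = normalized_utility(0.90, 0.80)      # ≈ 0.85
u_attacked = normalized_utility(0.84, 0.70)  # ≈ 0.77
degradation = absolute_utility_degradation(u_free, u_attacked)
```

Detection Rate is reported separately as the average detector accuracy over the test set (13,860 examples for text generation).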
assets/index-tTSI8ghR.css
ADDED
@@ -0,0 +1 @@
| 1 | + *,:before,:after{--tw-border-spacing-x: 0;--tw-border-spacing-y: 0;--tw-translate-x: 0;--tw-translate-y: 0;--tw-rotate: 0;--tw-skew-x: 0;--tw-skew-y: 0;--tw-scale-x: 1;--tw-scale-y: 1;--tw-pan-x: ;--tw-pan-y: ;--tw-pinch-zoom: ;--tw-scroll-snap-strictness: proximity;--tw-gradient-from-position: ;--tw-gradient-via-position: ;--tw-gradient-to-position: ;--tw-ordinal: ;--tw-slashed-zero: ;--tw-numeric-figure: ;--tw-numeric-spacing: ;--tw-numeric-fraction: ;--tw-ring-inset: ;--tw-ring-offset-width: 0px;--tw-ring-offset-color: #fff;--tw-ring-color: rgb(59 130 246 / .5);--tw-ring-offset-shadow: 0 0 #0000;--tw-ring-shadow: 0 0 #0000;--tw-shadow: 0 0 #0000;--tw-shadow-colored: 0 0 #0000;--tw-blur: ;--tw-brightness: ;--tw-contrast: ;--tw-grayscale: ;--tw-hue-rotate: ;--tw-invert: ;--tw-saturate: ;--tw-sepia: ;--tw-drop-shadow: ;--tw-backdrop-blur: ;--tw-backdrop-brightness: ;--tw-backdrop-contrast: ;--tw-backdrop-grayscale: ;--tw-backdrop-hue-rotate: ;--tw-backdrop-invert: ;--tw-backdrop-opacity: ;--tw-backdrop-saturate: ;--tw-backdrop-sepia: ;--tw-contain-size: ;--tw-contain-layout: ;--tw-contain-paint: ;--tw-contain-style: }::backdrop{--tw-border-spacing-x: 0;--tw-border-spacing-y: 0;--tw-translate-x: 0;--tw-translate-y: 0;--tw-rotate: 0;--tw-skew-x: 0;--tw-skew-y: 0;--tw-scale-x: 1;--tw-scale-y: 1;--tw-pan-x: ;--tw-pan-y: ;--tw-pinch-zoom: ;--tw-scroll-snap-strictness: proximity;--tw-gradient-from-position: ;--tw-gradient-via-position: ;--tw-gradient-to-position: ;--tw-ordinal: ;--tw-slashed-zero: ;--tw-numeric-figure: ;--tw-numeric-spacing: ;--tw-numeric-fraction: ;--tw-ring-inset: ;--tw-ring-offset-width: 0px;--tw-ring-offset-color: #fff;--tw-ring-color: rgb(59 130 246 / .5);--tw-ring-offset-shadow: 0 0 #0000;--tw-ring-shadow: 0 0 #0000;--tw-shadow: 0 0 #0000;--tw-shadow-colored: 0 0 #0000;--tw-blur: ;--tw-brightness: ;--tw-contrast: ;--tw-grayscale: ;--tw-hue-rotate: ;--tw-invert: ;--tw-saturate: ;--tw-sepia: ;--tw-drop-shadow: ;--tw-backdrop-blur: ;--tw-backdrop-brightness: ;--tw-backdrop-contrast: ;--tw-backdrop-grayscale: ;--tw-backdrop-hue-rotate: ;--tw-backdrop-invert: ;--tw-backdrop-opacity: ;--tw-backdrop-saturate: ;--tw-backdrop-sepia: ;--tw-contain-size: ;--tw-contain-layout: ;--tw-contain-paint: ;--tw-contain-style: }*,:before,:after{box-sizing:border-box;border-width:0;border-style:solid;border-color:#e5e7eb}:before,:after{--tw-content: ""}html,:host{line-height:1.5;-webkit-text-size-adjust:100%;-moz-tab-size:4;-o-tab-size:4;tab-size:4;font-family:ui-sans-serif,system-ui,sans-serif,"Apple Color Emoji","Segoe UI Emoji",Segoe UI Symbol,"Noto Color Emoji";font-feature-settings:normal;font-variation-settings:normal;-webkit-tap-highlight-color:transparent}body{margin:0;line-height:inherit}hr{height:0;color:inherit;border-top-width:1px}abbr:where([title]){-webkit-text-decoration:underline dotted;text-decoration:underline dotted}h1,h2,h3,h4,h5,h6{font-size:inherit;font-weight:inherit}a{color:inherit;text-decoration:inherit}b,strong{font-weight:bolder}code,kbd,samp,pre{font-family:ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,monospace;font-feature-settings:normal;font-variation-settings:normal;font-size:1em}small{font-size:80%}sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline}sub{bottom:-.25em}sup{top:-.5em}table{text-indent:0;border-color:inherit;border-collapse:collapse}button,input,optgroup,select,textarea{font-family:inherit;font-feature-settings:inherit;font-variation-settings:inherit;font-size:100%;font-weight:inherit;line-height:inherit;letter-spacing:inherit;color:inherit;margin:0;padding:0}button,select{text-transform:none}button,input:where([type=button]),input:where([type=reset]),input:where([type=submit]){-webkit-appearance:button;background-color:transparent;background-image:none}:-moz-focusring{outline:auto}:-moz-ui-invalid{box-shadow:none}progress{vertical-align:baseline}::-webkit-inner-spin-button,::-webkit-outer-spin-button{height:auto}[type=search]{-webkit-appearance:textfield;outline-offset:-2px}::-webkit-search-decoration{-webkit-appearance:none}::-webkit-file-upload-button{-webkit-appearance:button;font:inherit}summary{display:list-item}blockquote,dl,dd,h1,h2,h3,h4,h5,h6,hr,figure,p,pre{margin:0}fieldset{margin:0;padding:0}legend{padding:0}ol,ul,menu{list-style:none;margin:0;padding:0}dialog{padding:0}textarea{resize:vertical}input::-moz-placeholder,textarea::-moz-placeholder{opacity:1;color:#9ca3af}input::placeholder,textarea::placeholder{opacity:1;color:#9ca3af}button,[role=button]{cursor:pointer}:disabled{cursor:default}img,svg,video,canvas,audio,iframe,embed,object{display:block;vertical-align:middle}img,video{max-width:100%;height:auto}[hidden]:where(:not([hidden=until-found])){display:none}.pointer-events-none{pointer-events:none}.pointer-events-auto{pointer-events:auto}.static{position:static}.fixed{position:fixed}.absolute{position:absolute}.inset-0{inset:0}.bottom-24{bottom:6rem}.left-0{left:0}.z-10{z-index:10}.z-40{z-index:40}.z-50{z-index:50}.m-0{margin:0}.mx-auto{margin-left:auto;margin-right:auto}.my-8{margin-top:2rem;margin-bottom:2rem}.mb-2{margin-bottom:.5rem}.mb-4{margin-bottom:1rem}.mb-8{margin-bottom:2rem}.mt-2{margin-top:.5rem}.mt-4{margin-top:1rem}.mt-6{margin-top:1.5rem}.mt-8{margin-top:2rem}.block{display:block}.inline-block{display:inline-block}.flex{display:flex}.table{display:table}.grid{display:grid}.hidden{display:none}.h-10{height:2.5rem}.h-12{height:3rem}.h-16{height:4rem}.h-4\/5{height:80%}.h-full{height:100%}.min-h-screen{min-height:100vh}.w-12{width:3rem}.w-14{width:3.5rem}.w-full{width:100%}.w-min{width:-moz-min-content;width:min-content}.max-w-4xl{max-width:56rem}.max-w-5xl{max-width:64rem}.flex-1{flex:1 1 0%}.flex-shrink-0{flex-shrink:0}.scale-105{--tw-scale-x: 1.05;--tw-scale-y: 1.05;transform:translate(var(--tw-translate-x),var(--tw-translate-y)) rotate(var(--tw-rotate)) skew(var(--tw-skew-x)) skewY(var(--tw-skew-y)) scaleX(var(--tw-scale-x)) scaleY(var(--tw-scale-y))}.transform{transform:translate(var(--tw-translate-x),var(--tw-translate-y)) rotate(var(--tw-rotate)) skew(var(--tw-skew-x)) skewY(var(--tw-skew-y)) scaleX(var(--tw-scale-x)) scaleY(var(--tw-scale-y))}.cursor-pointer{cursor:pointer}.select-none{-webkit-user-select:none;-moz-user-select:none;user-select:none}.list-disc{list-style-type:disc}.grid-cols-1{grid-template-columns:repeat(1,minmax(0,1fr))}.flex-row{flex-direction:row}.flex-col{flex-direction:column}.items-center{align-items:center}.justify-end{justify-content:flex-end}.justify-center{justify-content:center}.justify-between{justify-content:space-between}.gap-2{gap:.5rem}.gap-4{gap:1rem}.space-x-2>:not([hidden])~:not([hidden]){--tw-space-x-reverse: 0;margin-right:calc(.5rem * var(--tw-space-x-reverse));margin-left:calc(.5rem * calc(1 - var(--tw-space-x-reverse)))}.space-y-1>:not([hidden])~:not([hidden]){--tw-space-y-reverse: 0;margin-top:calc(.25rem * calc(1 - var(--tw-space-y-reverse)));margin-bottom:calc(.25rem * var(--tw-space-y-reverse))}.space-y-2>:not([hidden])~:not([hidden]){--tw-space-y-reverse: 0;margin-top:calc(.5rem * calc(1 - var(--tw-space-y-reverse)));margin-bottom:calc(.5rem * var(--tw-space-y-reverse))}.space-y-3>:not([hidden])~:not([hidden]){--tw-space-y-reverse: 0;margin-top:calc(.75rem * calc(1 - var(--tw-space-y-reverse)));margin-bottom:calc(.75rem * var(--tw-space-y-reverse))}.space-y-4>:not([hidden])~:not([hidden]){--tw-space-y-reverse: 0;margin-top:calc(1rem * calc(1 - var(--tw-space-y-reverse)));margin-bottom:calc(1rem * var(--tw-space-y-reverse))}.space-y-6>:not([hidden])~:not([hidden]){--tw-space-y-reverse: 0;margin-top:calc(1.5rem * calc(1 - var(--tw-space-y-reverse)));margin-bottom:calc(1.5rem * var(--tw-space-y-reverse))}.overflow-hidden{overflow:hidden}.overflow-clip{overflow:clip}.overflow-y-auto{overflow-y:auto}.whitespace-nowrap{white-space:nowrap}.rounded{border-radius:.25rem}.rounded-2xl{border-radius:1rem}.rounded-3xl{border-radius:1.5rem}.rounded-lg{border-radius:.5rem}.border{border-width:1px}.border-b{border-bottom-width:1px}.border-b-2{border-bottom-width:2px}.border-t{border-top-width:1px}.border-border{border-color:var(--border)}.border-green-400{--tw-border-opacity: 1;border-color:rgb(74 222 128 / var(--tw-border-opacity, 1))}.border-transparent{border-color:var(--transparent)}.bg-bg{background-color:var(--bg)}.bg-bg-dark{background-color:var(--bg-dark)}.bg-bg-light{background-color:var(--bg-light)}.bg-black\/20{background-color:#0003}.bg-primary{background-color:var(--primary)}.bg-gradient-to-r{background-image:linear-gradient(to right,var(--tw-gradient-stops))}.from-blue-500{--tw-gradient-from: #3b82f6 var(--tw-gradient-from-position);--tw-gradient-to: rgb(59 130 246 / 0) var(--tw-gradient-to-position);--tw-gradient-stops: var(--tw-gradient-from), var(--tw-gradient-to)}.from-green-500{--tw-gradient-from: #22c55e var(--tw-gradient-from-position);--tw-gradient-to: rgb(34 197 94 / 0) var(--tw-gradient-to-position);--tw-gradient-stops: var(--tw-gradient-from), var(--tw-gradient-to)}.to-green-600{--tw-gradient-to: #16a34a var(--tw-gradient-to-position)}.to-purple-600{--tw-gradient-to: #9333ea var(--tw-gradient-to-position)}.p-2{padding:.5rem}.p-3{padding:.75rem}.p-4{padding:1rem}.p-6{padding:1.5rem}.px-10{padding-left:2.5rem;padding-right:2.5rem}.px-2{padding-left:.5rem;padding-right:.5rem}.px-3{padding-left:.75rem;padding-right:.75rem}.px-4{padding-left:1rem;padding-right:1rem}.px-6{padding-left:1.5rem;padding-right:1.5rem}.py-1{padding-top:.25rem;padding-bottom:.25rem}.py-2{padding-top:.5rem;padding-bottom:.5rem}.py-3{padding-top:.75rem;padding-bottom:.75rem}.pb-2{padding-bottom:.5rem}.pb-4{padding-bottom:1rem}.pl-6{padding-left:1.5rem}.pr-12{padding-right:3rem}.pt-2{padding-top:.5rem}.pt-4{padding-top:1rem}.pt-8{padding-top:2rem}.text-center{text-align:center}.text-2xl{font-size:1.5rem;line-height:2rem}.text-4xl{font-size:2.25rem;line-height:2.5rem}.text-6xl{font-size:3.75rem;line-height:1}.text-lg{font-size:1.125rem;line-height:1.75rem}.text-sm{font-size:.875rem;line-height:1.25rem}.text-xl{font-size:1.25rem;line-height:1.75rem}.font-bold{font-weight:700}.font-medium{font-weight:500}.font-semibold{font-weight:600}.text-black{--tw-text-opacity: 1;color:rgb(0 0 0 / var(--tw-text-opacity, 1))}.text-primary{color:var(--primary)}.text-text{color:var(--text)}.text-text-muted{color:var(--text-muted)}.text-white{--tw-text-opacity: 1;color:rgb(255 255 255 / var(--tw-text-opacity, 1))}.placeholder-gray-400::-moz-placeholder{--tw-placeholder-opacity: 1;color:rgb(156 163 175 / var(--tw-placeholder-opacity, 1))}.placeholder-gray-400::placeholder{--tw-placeholder-opacity: 1;color:rgb(156 163 175 / var(--tw-placeholder-opacity, 1))}.opacity-0{opacity:0}.shadow-lg{--tw-shadow: 0 10px 15px -3px rgb(0 0 0 / .1), 0 4px 6px -4px rgb(0 0 0 / .1);--tw-shadow-colored: 0 10px 15px -3px var(--tw-shadow-color), 0 4px 6px -4px var(--tw-shadow-color);box-shadow:var(--tw-ring-offset-shadow, 0 0 #0000),var(--tw-ring-shadow, 0 0 #0000),var(--tw-shadow)}.shadow-md{--tw-shadow: 0 4px 6px -1px rgb(0 0 0 / .1), 0 2px 4px -2px rgb(0 0 0 / .1);--tw-shadow-colored: 0 4px 6px -1px var(--tw-shadow-color), 0 2px 4px -2px var(--tw-shadow-color);box-shadow:var(--tw-ring-offset-shadow, 0 0 #0000),var(--tw-ring-shadow, 0 0 #0000),var(--tw-shadow)}.filter{filter:var(--tw-blur) var(--tw-brightness) var(--tw-contrast) var(--tw-grayscale) var(--tw-hue-rotate) var(--tw-invert) var(--tw-saturate) var(--tw-sepia) var(--tw-drop-shadow)}.backdrop-blur-sm{--tw-backdrop-blur: blur(4px);-webkit-backdrop-filter:var(--tw-backdrop-blur) var(--tw-backdrop-brightness) var(--tw-backdrop-contrast) var(--tw-backdrop-grayscale) var(--tw-backdrop-hue-rotate) var(--tw-backdrop-invert) var(--tw-backdrop-opacity) var(--tw-backdrop-saturate) var(--tw-backdrop-sepia);backdrop-filter:var(--tw-backdrop-blur) var(--tw-backdrop-brightness) var(--tw-backdrop-contrast) var(--tw-backdrop-grayscale) var(--tw-backdrop-hue-rotate) var(--tw-backdrop-invert) var(--tw-backdrop-opacity) var(--tw-backdrop-saturate) var(--tw-backdrop-sepia)}.transition{transition-property:color,background-color,border-color,text-decoration-color,fill,stroke,opacity,box-shadow,transform,filter,-webkit-backdrop-filter;transition-property:color,background-color,border-color,text-decoration-color,fill,stroke,opacity,box-shadow,transform,filter,backdrop-filter;transition-property:color,background-color,border-color,text-decoration-color,fill,stroke,opacity,box-shadow,transform,filter,backdrop-filter,-webkit-backdrop-filter;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:.15s}.transition-all{transition-property:all;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:.15s}.transition-colors{transition-property:color,background-color,border-color,text-decoration-color,fill,stroke;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:.15s}.duration-300{transition-duration:.3s}.duration-500,.duration-long{transition-duration:.5s}.duration-short{transition-duration:.3s}.ease-in-out{transition-timing-function:cubic-bezier(.4,0,.2,1)}:root{font-family:system-ui,Avenir,Helvetica,Arial,sans-serif;line-height:1.5;font-weight:400;color-scheme:light dark;color:#ffffffde;background-color:#242424;font-synthesis:none;text-rendering:optimizeLegibility;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}a{font-weight:500;color:#646cff;text-decoration:inherit}a:hover{color:#535bf2}body{margin:0;min-width:320px;min-height:100vh}h1{font-size:3.2em;line-height:1.1}button{border-radius:8px;border:1px solid transparent;padding:.6em 1.2em;font-size:1em;font-weight:500;font-family:inherit;background-color:#1a1a1a;cursor:pointer;transition:border-color .25s}button:hover{border-color:#646cff}button:focus,button:focus-visible{outline:4px auto -webkit-focus-ring-color}@media (prefers-color-scheme: light){:root{color:#213547;background-color:#fff}a:hover{color:#747bff}button{background-color:#f9f9f9}}.no-scrollbar::-webkit-scrollbar{display:none}.no-scrollbar{-ms-overflow-style:none;scrollbar-width:none}:root{--bg-dark: hsl(220 59% 91%);--bg: hsl(220 100% 97%);--bg-light: hsl(220 100% 100%);--text: hsl(226 85% 7%);--text-muted: hsl(220 26% 31%);--highlight: hsl(220 100% 100%);--border: hsl(220 19% 53%);--border-muted: hsl(220 27% 65%);--primary: hsl(219 78% 50%);--secondary: hsl(39 54% 61%);--danger: hsl(9 26% 64%);--warning: hsl(52 19% 57%);--success: hsl(146 17% 59%);--info: hsl(217 28% 65%);--transparent: rgba(206, 25, 25, 0)}.dark{--bg-dark: hsl(336 0% 1%);--bg: hsl(300 0% 4%);--bg-light: hsl(0 0% 9%);--text: hsl(300 0% 95%);--text-muted: hsl(300 0% 69%);--highlight: hsl(330 0% 39%);--border: hsl(0 0% 28%);--border-muted: hsl(300 0% 18%);--primary: hsl(219 78% 50%);--secondary: hsl(39 54% 61%);--danger: hsl(9 26% 64%);--warning: hsl(52 19% 57%);--success: hsl(146 17% 59%);--info: hsl(217 28% 65%);--transparent: rgba(225, 1, 1, 0)}*{transition:color .3s ease,background-color .3s ease,border-color .3s ease,box-shadow .3s ease}.hover\:border-blue-400:hover{--tw-border-opacity: 1;border-color:rgb(96 165 250 / var(--tw-border-opacity, 1))}.hover\:border-green-500:hover{--tw-border-opacity: 1;border-color:rgb(34 197 94 / var(--tw-border-opacity, 1))}.hover\:border-primary:hover{border-color:var(--primary)}.hover\:border-red-900:hover{--tw-border-opacity: 1;border-color:rgb(127 29 29 / var(--tw-border-opacity, 1))}.hover\:border-secondary:hover{border-color:var(--secondary)}.hover\:bg-gray-700\/30:hover{background-color:#3741514d}.hover\:bg-primary:hover{background-color:var(--primary)}.hover\:from-green-600:hover{--tw-gradient-from: #16a34a var(--tw-gradient-from-position);--tw-gradient-to: rgb(22 163 74 / 0) var(--tw-gradient-to-position);--tw-gradient-stops: var(--tw-gradient-from), var(--tw-gradient-to)}.hover\:to-green-700:hover{--tw-gradient-to: #15803d var(--tw-gradient-to-position)}.hover\:text-bg-dark:hover{color:var(--bg-dark)}.hover\:text-text:hover{color:var(--text)}.hover\:opacity-90:hover{opacity:.9}.hover\:shadow-lg:hover{--tw-shadow: 0 10px 15px -3px rgb(0 0 0 / .1), 0 4px 6px -4px rgb(0 0 0 / .1);--tw-shadow-colored: 0 10px 15px -3px var(--tw-shadow-color), 0 4px 6px -4px var(--tw-shadow-color);box-shadow:var(--tw-ring-offset-shadow, 0 0 #0000),var(--tw-ring-shadow, 0 0 #0000),var(--tw-shadow)}.hover\:shadow-xl:hover{--tw-shadow: 0 20px 25px -5px rgb(0 0 0 / .1), 0 8px 10px -6px rgb(0 0 0 / .1);--tw-shadow-colored: 0 20px 25px -5px var(--tw-shadow-color), 0 8px 10px -6px var(--tw-shadow-color);box-shadow:var(--tw-ring-offset-shadow, 0 0 #0000),var(--tw-ring-shadow, 0 0 #0000),var(--tw-shadow)}.focus\:border-primary:focus{border-color:var(--primary)}.focus\:outline-none:focus{outline:2px solid transparent;outline-offset:2px}@media (min-width: 768px){.md\:left-1\/4{left:25%}.md\:w-1\/2{width:50%}.md\:grid-cols-2{grid-template-columns:repeat(2,minmax(0,1fr))}.md\:p-0{padding:0}}
deploy_to_huggingface.py
ADDED
@@ -0,0 +1,138 @@
+#!/usr/bin/env python3
+"""
+Script to deploy the Watermark Leaderboard to Hugging Face Spaces
+"""
+
+import os
+import shutil
+import json
+from pathlib import Path
+
+def copy_files_to_hf_directory():
+    """Copy necessary files to the Hugging Face deployment directory"""
+
+    # Files to copy from the main project
+    source_dir = Path("../")
+    hf_dir = Path(".")
+
+    # Essential files for Hugging Face deployment
+    files_to_copy = [
+        "app.py",
+        "requirements.txt",
+        "README.md",
+        "leaderboard.json"
+    ]
+
+    # Copy Reproducibility folder if it exists
+    reproducibility_source = source_dir / "Reproducibility"
+    if reproducibility_source.exists():
+        reproducibility_dest = hf_dir / "Reproducibility"
+        if reproducibility_dest.exists():
+            shutil.rmtree(reproducibility_dest)
+        shutil.copytree(reproducibility_source, reproducibility_dest)
+        print("✅ Copied Reproducibility folder")
+
+    # Copy individual files
+    for file_name in files_to_copy:
+        source_file = source_dir / file_name
+        dest_file = hf_dir / file_name
+
+        if source_file.exists():
+            shutil.copy2(source_file, dest_file)
+            print(f"✅ Copied {file_name}")
+        else:
+            print(f"⚠️ {file_name} not found in source directory")
+
+    print("\n🎉 Files copied successfully!")
+    print("\nNext steps:")
+    print("1. Create a new Hugging Face Space")
+    print("2. Upload all files in this directory")
+    print("3. Set the Space to use the Gradio SDK")
+    print("4. Your leaderboard will be live!")
+
+def create_hf_readme():
+    """Create a Hugging Face specific README"""
+    readme_content = """---
+title: Watermark Leaderboard
+emoji: 🏆
+colorFrom: blue
+colorTo: green
+sdk: gradio
+sdk_version: "4.44.0"
+app_file: app.py
+pinned: false
+license: mit
+short_description: Interactive leaderboard for watermark performance evaluation
+---
+
+# Watermark Leaderboard 🏆
+
+An interactive leaderboard for comparing watermark performance across different models and evaluation settings.
+
+## Features
+
+- **Interactive Scatter Plot**: Visualize watermark performance with Plotly charts
+- **Performance Table**: Detailed metrics with sorting and filtering
+- **Multiple Evaluation Settings**: Attack-free, Watermark Removal, and Stealing Attack
+- **Model Support**: LLaMA3 and DeepSeek models
+- **Dynamic Filtering**: Real-time updates based on model and metric selection
+- **Flexible Submissions**: Submit data for any combination of attack types
+- **Pending Approval System**: All submissions are reviewed before appearing on the leaderboard
+- **Complete Field Visibility**: Administrators see all submission details for review
+- **Professional UI**: Clean, modern interface with accordion sections
+- **Reproducibility**: Access to all evaluation code and guidelines
+
+## How to Use
+
+1. **Select Model**: Choose between LLaMA3 or DeepSeek
+2. **Choose Setting**: Pick from Attack-free, Watermark Removal, or Stealing Attack
+3. **View Results**: Explore the scatter plot and detailed table
+4. **Submit Data**: Click "Add Your Data" to submit new results
+   - Submit any combination of attack types (Attack-free, Watermark Removal, Stealing Attack)
+   - All submissions go through an approval process before appearing on the leaderboard
+5. **Administrator Review**: Administrators can review pending submissions with full field visibility
+
+## Metrics Explained
+
+- **Normalized Utility ↑**: Higher values indicate better text quality
+- **Detection Rate (%) ↑**: Higher values indicate better watermark detection
+- **Absolute Utility Degradation ↑**: Higher values indicate better resistance to removal attacks
+- **Adversary BERT Score ↑**: Higher values indicate better performance under adversarial conditions
+
+## Contributing
+
+We encourage researchers to contribute their evaluation results. Please follow the guidelines in the "Guidelines" section for submission requirements.
+
+## License
+
+MIT License
+
+---
+*Last updated: December 2024*
+"""
+
+    with open("README.md", "w", encoding="utf-8") as f:
+        f.write(readme_content)
+
+    print("✅ Created Hugging Face README.md")
+
+def main():
+    """Main deployment function"""
+    print("🚀 Preparing Watermark Leaderboard for Hugging Face deployment...")
+
+    # Create HF README
+    create_hf_readme()
+
+    # Copy files
+    copy_files_to_hf_directory()
+
+    print("\n📋 Deployment Checklist:")
+    print("✅ All files prepared")
+    print("✅ requirements.txt updated")
+    print("✅ README.md created for Hugging Face")
+    print("✅ Reproducibility code included")
+
+    print("\n🌐 Ready for Hugging Face Spaces deployment!")
+
+if __name__ == "__main__":
+    main()
index.html
ADDED
@@ -0,0 +1,13 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <title>Watermark Leaderboard</title>
+    <script type="module" crossorigin src="./assets/index-Cd6CRo7g.js"></script>
+    <link rel="stylesheet" crossorigin href="./assets/index-tTSI8ghR.css">
+  </head>
+  <body>
+    <div id="root"></div>
+  </body>
+</html>
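Note that the asset URLs in index.html are relative (`./assets/...`); a build that emits absolute `/assets/...` paths would break when the page is served under a subpath, as static Spaces typically are. A small sketch of the rewrite one might apply to a built page (`relativize_asset_paths` is a hypothetical helper, not part of this repo):

```python
import re

def relativize_asset_paths(html: str) -> str:
    """Rewrite absolute /assets/ references in src= and href= attributes
    to relative ./assets/ so the page works when hosted under a subpath."""
    return re.sub(r'(src|href)="/assets/', r'\1="./assets/', html)

print(relativize_asset_paths('<script src="/assets/index.js"></script>'))
# → <script src="./assets/index.js"></script>
```

Only `src=`/`href=` attributes are touched, so `/assets/` appearing in page text is left alone.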
leaderboard.json
ADDED
@@ -0,0 +1,122 @@
+[
+  {
+    "name": "KGW",
+    "model": "LLaMA3",
+    "normalizedUtility": 0.601,
+    "detectionRate": 91.43,
+    "removal_detectionRate": 3.9,
+    "absoluteUtilityDegregation": 0.028999999999999915,
+    "adversaryBERTscore": 0.785,
+    "adversaryDetectionRate": 0.72
+  },
+  {
+    "name": "SIR",
+    "model": "LLaMA3",
+    "normalizedUtility": 0.596,
+    "detectionRate": 82.98,
+    "removal_detectionRate": 22.35,
+    "absoluteUtilityDegregation": 0.026499999999999968,
+    "adversaryBERTscore": 0.785,
+    "adversaryDetectionRate": 64.09
+  },
+  {
+    "name": "SynthID",
+    "model": "LLaMA3",
+    "normalizedUtility": 0.591,
+    "detectionRate": 49.76,
+    "removal_detectionRate": 1.435,
+    "absoluteUtilityDegregation": 0.02474999999999994,
+    "adversaryBERTscore": 0.788,
+    "adversaryDetectionRate": 1.4
+  },
+  {
+    "name": "DTM",
+    "model": "LLaMA3",
+    "normalizedUtility": 0.74,
+    "detectionRate": 85.64,
+    "removal_detectionRate": 2.835,
+    "absoluteUtilityDegregation": 0.05545,
+    "adversaryBERTscore": 0.798,
+    "adversaryDetectionRate": 29.1
+  },
+  {
+    "name": "TW",
+    "model": "LLaMA3",
+    "normalizedUtility": 0.852,
+    "detectionRate": 88.51,
+    "removal_detectionRate": 11.56,
+    "absoluteUtilityDegregation": 0.16169999999999995,
+    "adversaryBERTscore": 0.807,
+    "adversaryDetectionRate": 2.3
+  },
+  {
+    "name": "SafeSeal",
+    "model": "LLaMA3",
+    "normalizedUtility": 0.982,
+    "detectionRate": 89.69,
+    "removal_detectionRate": 37.63,
+    "absoluteUtilityDegregation": 0.28105,
+    "adversaryBERTscore": 0.788,
+    "adversaryDetectionRate": 69.23
+  },
+  {
+    "name": "KGW",
+    "model": "DeepSeek",
+    "normalizedUtility": 0.602,
+    "detectionRate": 82.81,
+    "removal_detectionRate": 1.98,
+    "absoluteUtilityDegregation": 0.026499999999999968,
+    "adversaryBERTscore": 0.777,
+    "adversaryDetectionRate": 0.11
+  },
+  {
+    "name": "SIR",
+    "model": "DeepSeek",
+    "normalizedUtility": 0.594,
+    "detectionRate": 95.8,
+    "removal_detectionRate": 55.16,
+    "absoluteUtilityDegregation": 0.03005000000000002,
+    "adversaryBERTscore": 0.767,
+    "adversaryDetectionRate": 67.38
+  },
+  {
+    "name": "SynthID",
+    "model": "DeepSeek",
+    "normalizedUtility": 0.566,
+    "detectionRate": 49.24,
+    "removal_detectionRate": 1.49,
+    "absoluteUtilityDegregation": 0.018500000000000072,
+    "adversaryBERTscore": 0.77,
+    "adversaryDetectionRate": 1.4
+  },
+  {
+    "name": "DTM",
+    "model": "DeepSeek",
+    "normalizedUtility": 0.748,
+    "detectionRate": 89.7,
+    "removal_detectionRate": 8.54,
+    "absoluteUtilityDegregation": 0.054850000000000065,
+    "adversaryBERTscore": 0.779,
+    "adversaryDetectionRate": 3.5
+  },
+  {
+    "name": "TW",
+    "model": "DeepSeek",
+    "normalizedUtility": 0.857,
+    "detectionRate": 85.22,
+    "removal_detectionRate": 15.39,
+    "absoluteUtilityDegregation": 0.15155000000000007,
+    "adversaryBERTscore": 0.776,
+    "adversaryDetectionRate": 59.5
+  },
+  {
+    "name": "SafeSeal",
+    "model": "DeepSeek",
+    "normalizedUtility": 0.981,
+    "detectionRate": 90.92,
+    "removal_detectionRate": 60.05,
+    "absoluteUtilityDegregation": 0.26935,
+    "adversaryBERTscore": 0.778,
+    "adversaryDetectionRate": 74.1
+  }
+]
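Each leaderboard.json entry is a flat object keyed by method name, model, and the metric fields. A small stdlib sketch of how the app might filter and rank entries (the sample below copies two of the real LLaMA3 rows; `best_by` is a hypothetical helper, not code from app.py):

```python
import json

def best_by(entries, model, metric):
    """Return the entry with the highest value of `metric` for `model`."""
    rows = [e for e in entries if e["model"] == model]
    return max(rows, key=lambda e: e[metric])

# Two rows copied verbatim from leaderboard.json for illustration.
sample = json.loads("""
[
  {"name": "KGW", "model": "LLaMA3", "normalizedUtility": 0.601, "detectionRate": 91.43},
  {"name": "SafeSeal", "model": "LLaMA3", "normalizedUtility": 0.982, "detectionRate": 89.69}
]
""")

print(best_by(sample, "LLaMA3", "normalizedUtility")["name"])  # SafeSeal
print(best_by(sample, "LLaMA3", "detectionRate")["name"])      # KGW
```

Note the file spells one key `absoluteUtilityDegregation`; any consumer must match that spelling exactly.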
requirements.txt
ADDED
@@ -0,0 +1,4 @@
+gradio>=4.44.0
+pandas>=1.5.0
+plotly>=5.0.0
+numpy>=1.21.0
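Each line in requirements.txt is a pip requirement specifier of the simple `name op version` form. A minimal parsing sketch (hypothetical helper; it deliberately ignores extras, environment markers, and multi-specifier lines, which this file does not use):

```python
import re

def parse_requirement(line: str):
    """Split a pip requirement like 'gradio>=4.44.0' into (name, op, version).

    Sketch only: handles the plain `name op version` form used in this
    requirements.txt, not extras, markers, or comma-separated specifiers.
    """
    m = re.fullmatch(r"([A-Za-z0-9_.-]+)\s*(==|>=|<=|~=|>|<)\s*([\w.]+)", line.strip())
    if not m:
        raise ValueError(f"unsupported requirement: {line!r}")
    return m.group(1), m.group(2), m.group(3)

print(parse_requirement("gradio>=4.44.0"))  # ('gradio', '>=', '4.44.0')
```

The `>=` floors keep the Space forward-compatible; the `sdk_version: "4.44.0"` in the README metadata pins the actual Gradio runtime on Hugging Face.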
test.html
ADDED
@@ -0,0 +1,16 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Test Page</title>
+</head>
+<body>
+    <h1>Test Page</h1>
+    <p>If you can see this, the static deployment is working.</p>
+    <p>Current time: <span id="time"></span></p>
+    <script>
+        document.getElementById('time').textContent = new Date().toLocaleString();
+    </script>
+</body>
+</html>