metadata
title: BD Framework
emoji: 🔥
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 6.1.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Benchmark-Denoising (BD) framework

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Dataset Denoising Framework Demo System

LLM-based Intelligent Dataset Quality Enhancement Framework - Graduate Thesis Research Showcase

Deploy to Hugging Face Spaces

Step 1: Create Space

  1. Visit https://huggingface.co/spaces
  2. Click "Create new Space"
  3. Select Gradio SDK (or Docker)
  4. Space name: dataset-cleaning-demo

Step 2: Upload Files

Upload the following files to the Space:

  • app.py - Main application
  • requirements.txt - Python dependencies
  • README.md - This file

Step 3: Configure Environment Variables

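Add the following in the Space settings, under "Variables and secrets" (store the key as a secret, not a public variable):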

  • DEEPSEEK_API_KEY: Your DeepSeek API key
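
As a rough illustration of how the application might pick up this setting, the sketch below reads `DEEPSEEK_API_KEY` from the environment and fails early with a readable message when it is missing. The helper name `get_deepseek_api_key` is hypothetical and is not taken from `app.py`.

```python
import os


def get_deepseek_api_key(env=None):
    """Return the DeepSeek API key, or raise a clear error if it is missing.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    by a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("DEEPSEEK_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPSEEK_API_KEY is not set; add it in the Space settings "
            "or export it in your local shell."
        )
    return key
```

Calling a helper like this once at startup surfaces a missing key immediately rather than on the first denoising request.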

Step 4: Wait for Build

HF Spaces will automatically build and deploy your application.

Local Development

# Install dependencies
pip install -r requirements.txt

# Set environment variable
export DEEPSEEK_API_KEY="your-api-key"

# Run application
python app.py

Then visit http://localhost:7860 (Gradio's default port).

Features

✅ Dataset upload (JSON/JSONL format)
✅ Intelligent denoising via DeepSeek API
✅ Showcase denoising effects on 19 mainstream benchmarks
✅ Interactive Leaderboard
✅ Download denoised results
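
The upload feature accepts both JSON and JSONL. A minimal sketch of how the two formats could be normalized into one list of samples is shown below; the function name `load_records` and the choice to dispatch on the file extension are assumptions, not the behavior of `app.py`.

```python
import json


def load_records(text, filename):
    """Parse uploaded text into a list of sample objects.

    Hypothetical helper: files ending in .jsonl are read as one JSON
    value per non-empty line; anything else is parsed as a single
    JSON document.
    """
    if filename.lower().endswith(".jsonl"):
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    data = json.loads(text)
    # A .json file may hold a list of samples or one sample object.
    return data if isinstance(data, list) else [data]
```

For example, `load_records('{"q": 1}\n{"q": 2}\n', "data.jsonl")` yields two dictionaries, one per line.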

Tech Stack

  • Frontend: React + Tailwind CSS
  • Backend: FastAPI
  • LLM: DeepSeek API
  • Deployment: Hugging Face Spaces

Denoising Workflow

  1. Error Detection: Identify data quality issues
  2. Quality Assessment: Score samples
  3. Intelligent Correction: LLM generates high-quality versions
  4. Consistency Validation: Ensure logical consistency
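
The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the field names `question` and `answer`, the toy scoring rule, and every function name are hypothetical, and the LLM call is passed in as a plain callable (`correct`) so the sketch runs without network access. The framework's real detection and validation logic is not described in this README.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Result:
    original: dict
    issues: list = field(default_factory=list)
    score: float = 1.0
    corrected: Optional[dict] = None
    accepted: bool = False


def detect_errors(sample):
    """Step 1 (toy rule): flag empty or missing fields."""
    return [
        f"empty field: {key}"
        for key in ("question", "answer")  # assumed field names
        if not str(sample.get(key, "")).strip()
    ]


def assess_quality(issues):
    """Step 2 (toy rule): subtract 0.5 per issue, floored at 0."""
    return max(0.0, 1.0 - 0.5 * len(issues))


def denoise(sample, correct, threshold=1.0):
    """Run all four steps on one sample.

    `correct(sample, issues)` stands in for the LLM call and must
    return a corrected copy of the sample.
    """
    issues = detect_errors(sample)
    result = Result(original=sample, issues=issues, score=assess_quality(issues))
    if result.score < threshold:
        candidate = correct(sample, issues)  # Step 3: LLM correction
        # Step 4 (stand-in check): same keys as the input, and no issues left.
        if set(candidate) == set(sample) and not detect_errors(candidate):
            result.corrected = candidate
            result.accepted = True
    return result
```

Samples that score at or above the threshold skip the correction step entirely, which keeps API usage proportional to the number of noisy samples.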

Notes

  • The demo version processes at most 10 samples per batch
  • A valid DeepSeek API key is required
  • Leaderboard data consists of pre-configured results
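
One simple way to enforce the 10-sample limit is to truncate the uploaded list and report how many samples were left out. The constant and function names below are hypothetical; whether the demo truncates or rejects oversized uploads is not stated in this README.

```python
MAX_SAMPLES_PER_BATCH = 10  # limit stated in the notes above


def limit_batch(records, limit=MAX_SAMPLES_PER_BATCH):
    """Return the first `limit` records and the number that were skipped."""
    kept = list(records[:limit])
    skipped = max(0, len(records) - limit)
    return kept, skipped
```

Returning the skipped count lets the interface tell the user that part of the upload was not processed.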

Future Enhancements

  • Connect to a LLaMA3 model hosted on the university server
  • Support large-scale dataset processing
  • Add more evaluation metrics
  • Real-time processing progress feedback