---
title: BD Framework
emoji: 🔥
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 6.1.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Benchmark-Denoising (BD) framework
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Dataset Denoising Framework Demo System

An LLM-based framework for improving dataset quality, presented as a showcase of graduate thesis research.

## Deploy to Hugging Face Spaces

### Step 1: Create Space

1. Visit https://huggingface.co/spaces
2. Click "Create new Space"
3. Select **Gradio** SDK (or Docker)
4. Space name: `dataset-cleaning-demo`

### Step 2: Upload Files

Upload the following files to the Space:
- `app.py` - Main application
- `requirements.txt` - Python dependencies
- `README.md` - This file

### Step 3: Configure Environment Variables

Add the following as a secret under **Variables and secrets** in the Space settings:
- `DEEPSEEK_API_KEY`: Your DeepSeek API key
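
The application can read this value from the environment at startup. The snippet below is a minimal sketch of such a check; `get_api_key` is a hypothetical helper name and is not necessarily what `app.py` uses.

```python
import os


def get_api_key(env=os.environ):
    """Return the DeepSeek API key, or raise a clear error if it is missing.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to test. (Hypothetical helper.)
    """
    key = env.get("DEEPSEEK_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPSEEK_API_KEY is not set; add it in the Space settings "
            "or export it in your shell."
        )
    return key
```

Failing early with an explicit message tends to be easier to debug than a rejected API call later in the request.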

### Step 4: Wait for Build

HF Spaces will automatically build and deploy your application.

## Local Development

```bash
# Install dependencies
pip install -r requirements.txt

# Set environment variable
export DEEPSEEK_API_KEY="your-api-key"

# Run application
python app.py
```

Then open http://localhost:7860 in a browser.

## Features

- ✅ Dataset upload (JSON/JSONL format)
- ✅ Intelligent denoising via the DeepSeek API
- ✅ Showcase of denoising effects on 19 mainstream benchmarks
- ✅ Interactive leaderboard
- ✅ Download of denoised results
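
As an illustration of the upload formats, the sketch below parses either file type into a list of samples. It assumes that a `.json` file holds a top-level list of objects (or a single object) and that a `.jsonl` file holds one object per line; `load_samples` is a hypothetical helper, not a function confirmed to exist in `app.py`.

```python
import json


def load_samples(text, filename):
    """Parse uploaded file contents into a list of samples."""
    if filename.lower().endswith(".jsonl"):
        # JSONL: one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    data = json.loads(text)
    # Assumption: a .json upload is a list of objects; wrap a lone object.
    return data if isinstance(data, list) else [data]
```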

## Tech Stack

- **Frontend**: React + Tailwind CSS
- **Backend**: FastAPI
- **LLM**: DeepSeek API
- **Deployment**: Hugging Face Spaces

## Denoising Workflow

1. **Error Detection**: Identify data quality issues
2. **Quality Assessment**: Score samples
3. **Intelligent Correction**: LLM generates high-quality versions
4. **Consistency Validation**: Ensure logical consistency
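
The four steps can be sketched as a small pipeline. Everything here is an illustrative assumption: the `question`/`answer` field names, the placeholder heuristics, the scoring rule, and the `correct` callback that stands in for the LLM call. It shows the control flow only, not the framework's actual detection or correction logic.

```python
from typing import Callable


def detect_errors(sample: dict) -> list[str]:
    """Step 1: flag simple quality issues (placeholder heuristics)."""
    issues = []
    if not str(sample.get("question", "")).strip():
        issues.append("empty question")
    if not str(sample.get("answer", "")).strip():
        issues.append("empty answer")
    return issues


def assess_quality(issues: list[str]) -> float:
    """Step 2: map the number of issues to a score in [0, 1]."""
    return max(0.0, 1.0 - 0.5 * len(issues))


def denoise(sample: dict, correct: Callable[[dict, list[str]], dict]) -> dict:
    """Run all four steps on one sample; `correct` stands in for the LLM."""
    issues = detect_errors(sample)
    score = assess_quality(issues)
    if not issues:
        return {"sample": sample, "score": score, "changed": False}
    candidate = correct(sample, issues)  # Step 3: proposed correction
    # Step 4: keep the correction only if it no longer triggers any issue.
    if detect_errors(candidate):
        return {"sample": sample, "score": score, "changed": False}
    return {"sample": candidate, "score": assess_quality([]), "changed": True}
```

Passing the corrector in as a callable keeps the pipeline testable without network access; in the demo that callable would wrap a DeepSeek API request.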

## Notes

- The demo version limits processing to 10 samples per batch
- A valid DeepSeek API key is required
- The leaderboard shows pre-configured results
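
One way to respect the batch limit is to split an upload into chunks of at most 10 samples. This is a sketch under the assumption that several batches may be submitted in turn; the demo may instead simply reject or truncate larger uploads. `MAX_BATCH` and `batches` are hypothetical names.

```python
MAX_BATCH = 10  # demo limit stated in the notes above


def batches(samples, size=MAX_BATCH):
    """Yield consecutive slices of `samples`, each with at most `size` items."""
    for start in range(0, len(samples), size):
        yield samples[start:start + size]
```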

## Future Enhancements

- [ ] Connect to the LLaMA3 model hosted on the university server
- [ ] Support large-scale dataset processing
- [ ] Add more evaluation metrics
- [ ] Real-time processing progress feedback