---
title: BD Framework
emoji: 🔥
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 6.1.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Benchmark-Denoising (BD) framework
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Dataset Denoising Framework Demo System
An LLM-based framework for intelligent dataset quality enhancement, presented as a graduate thesis research showcase.
## Deploy to Hugging Face Spaces
### Step 1: Create Space
1. Visit https://huggingface.co/spaces
2. Click "Create new Space"
3. Select **Gradio** SDK (or Docker)
4. Space name: `dataset-cleaning-demo`
### Step 2: Upload Files
Upload the following files to the Space:
- `app.py` - Main application
- `requirements.txt` - Python dependencies
- `README.md` - This file
### Step 3: Configure Environment Variables
Add the following variable in the Space settings:
- `DEEPSEEK_API_KEY`: Your DeepSeek API key
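
As a minimal sketch of how the application might read this variable at startup, the helper below looks up `DEEPSEEK_API_KEY` and fails early with a readable message when it is missing. The function name `get_api_key` is hypothetical and is not taken from `app.py`.

```python
import os


def get_api_key(env=os.environ):
    """Return the DeepSeek API key from the environment (hypothetical helper).

    Raises RuntimeError with a readable message if the key is missing or blank.
    """
    key = env.get("DEEPSEEK_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPSEEK_API_KEY is not set; add it in the Space settings "
            "or export it in your shell."
        )
    return key
```

Failing at startup rather than on the first request makes a missing key visible in the Space build logs instead of surfacing as a runtime error in the UI.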
### Step 4: Wait for Build
HF Spaces will automatically build and deploy your application.
## Local Development
```bash
# Install dependencies
pip install -r requirements.txt
# Set environment variable
export DEEPSEEK_API_KEY="your-api-key"
# Run application
python app.py
```
Then open http://localhost:7860 in a browser.
## Features
✅ Dataset upload (JSON/JSONL format)
✅ Intelligent denoising via DeepSeek API
✅ Showcase denoising effects on 19 mainstream benchmarks
✅ Interactive Leaderboard
✅ Download denoised results
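
To illustrate the upload feature, here is one possible way to parse both accepted formats into a list of samples. This is a sketch that assumes the format is chosen by file extension; the function name `parse_dataset` is hypothetical, and the actual logic in `app.py` may differ.

```python
import json


def parse_dataset(text, filename):
    """Parse uploaded text into a list of samples (hypothetical helper).

    .jsonl files hold one JSON object per line; anything else is treated
    as a single JSON document containing a list or one object.
    """
    if filename.lower().endswith(".jsonl"):
        # Skip blank lines so a trailing newline does not raise an error.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    data = json.loads(text)
    return data if isinstance(data, list) else [data]
```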
## Tech Stack
- **Frontend**: React + Tailwind CSS
- **Backend**: FastAPI
- **LLM**: DeepSeek API
- **Deployment**: Hugging Face Spaces
## Denoising Workflow
1. **Error Detection**: Identify data quality issues
2. **Quality Assessment**: Score samples
3. **Intelligent Correction**: LLM generates high-quality versions
4. **Consistency Validation**: Ensure logical consistency
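
The four steps above can be sketched as a small pipeline. Everything here is an illustrative assumption rather than the framework's actual code: the `question` and `answer` field names, the empty-field heuristic, the scoring rule, and the `correct` callback, which stands in for the LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Result:
    original: dict
    issues: list = field(default_factory=list)
    score: float = 1.0
    corrected: Optional[dict] = None
    consistent: bool = True


def detect_issues(sample):
    """Step 1: flag empty required fields (placeholder heuristic)."""
    issues = []
    for key in ("question", "answer"):  # hypothetical field names
        if not str(sample.get(key, "")).strip():
            issues.append(f"empty {key}")
    return issues


def assess_quality(issues):
    """Step 2: subtract 0.5 per detected issue, floored at 0.0 (placeholder rule)."""
    return max(0.0, 1.0 - 0.5 * len(issues))


def denoise(sample, correct: Callable[[dict, list], dict]):
    """Run all four steps; `correct` stands in for the LLM correction call."""
    issues = detect_issues(sample)
    score = assess_quality(issues)
    # Step 3: only ask for a correction when something was flagged.
    corrected = correct(sample, issues) if issues else dict(sample)
    # Step 4: the corrected sample should pass the same checks.
    consistent = not detect_issues(corrected)
    return Result(sample, issues, score, corrected, consistent)
```

Passing the corrector as a callback keeps the pipeline testable without network access; a real deployment would supply a function that calls the LLM.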
## Notes
- The demo version limits processing to 10 samples per batch
- A valid DeepSeek API key is required
- Leaderboard data consists of pre-configured results
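
The 10-sample limit could be enforced along the following lines. This sketch assumes the demo truncates an oversized upload rather than splitting it into several batches; the names `MAX_SAMPLES_PER_BATCH` and `take_batch` are hypothetical.

```python
MAX_SAMPLES_PER_BATCH = 10  # demo limit stated in the notes above


def take_batch(samples, limit=MAX_SAMPLES_PER_BATCH):
    """Return the first `limit` samples and the number that were left out."""
    batch = list(samples[:limit])
    skipped = max(0, len(samples) - limit)
    return batch, skipped
```

Returning the skipped count lets the UI tell the user how many samples were not processed.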
## Future Enhancements
- [ ] Connect to the LLaMA3 model hosted on the university server
- [ ] Support large-scale dataset processing
- [ ] Add more evaluation metrics
- [ ] Real-time processing progress feedback