---
title: BD Framework
emoji: 🔥
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 6.1.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Benchmark-Denoising (BD) framework
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Dataset Denoising Framework Demo System
LLM-based Intelligent Dataset Quality Enhancement Framework - Graduate Thesis Research Showcase
## Deploy to Hugging Face Spaces
### Step 1: Create Space
1. Visit https://huggingface.co/spaces
2. Click "Create new Space"
3. Select the **Gradio** SDK (or Docker)
4. Set the Space name to `dataset-cleaning-demo`
### Step 2: Upload Files
Upload the following files to the Space:
- `app.py` - Main application
- `requirements.txt` - Python dependencies
- `README.md` - This file
### Step 3: Configure Environment Variables
Add the following secret in the Space settings:
- `DEEPSEEK_API_KEY`: Your DeepSeek API key
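As a minimal sketch of how the application might read this variable at startup (the helper name `get_api_key` is hypothetical and is not taken from `app.py`), failing early with a clear message tends to be easier to debug than a rejected API call later on:

```python
import os


def get_api_key(env=os.environ):
    """Return the DeepSeek API key, or raise a clear error if it is missing.

    `env` defaults to the process environment; passing a dict makes the
    function easy to test. This helper is illustrative, not part of app.py.
    """
    key = env.get("DEEPSEEK_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPSEEK_API_KEY is not set; add it in the Space settings "
            "or export it in your shell before running app.py."
        )
    return key
```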
### Step 4: Wait for Build
HF Spaces will automatically build and deploy your application.
## Local Development
```bash
# Install dependencies
pip install -r requirements.txt
# Set environment variable
export DEEPSEEK_API_KEY="your-api-key"
# Run application
python app.py
```
Then open http://localhost:7860 in your browser.
## Features
- ✅ Dataset upload (JSON/JSONL format)
- ✅ Intelligent denoising via DeepSeek API
- ✅ Showcase denoising effects on 19 mainstream benchmarks
- ✅ Interactive Leaderboard
- ✅ Download denoised results
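To illustrate the two upload formats, here is a rough sketch of a parser that accepts either JSON or JSONL text. The function name `parse_dataset` and the choice to dispatch on the file extension are assumptions made for this example; the actual upload handling in `app.py` may differ.

```python
import json


def parse_dataset(text, filename):
    """Parse uploaded text into a list of samples.

    Assumptions (not confirmed by app.py):
    - `.jsonl` files hold one JSON object per non-empty line;
    - any other file holds a JSON list of samples or a single sample object.
    """
    if filename.lower().endswith(".jsonl"):
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    data = json.loads(text)
    return data if isinstance(data, list) else [data]
```

For example, `parse_dataset('{"q": "1+1?"}\n{"q": "2+2?"}\n', "data.jsonl")` yields a list of two dictionaries.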
## Tech Stack
- **Frontend**: React + Tailwind CSS
- **Backend**: FastAPI
- **LLM**: DeepSeek API
- **Deployment**: Hugging Face Spaces
## Denoising Workflow
1. **Error Detection**: Identify data quality issues
2. **Quality Assessment**: Score samples
3. **Intelligent Correction**: LLM generates high-quality versions
4. **Consistency Validation**: Ensure logical consistency
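The four stages can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions: every function name is hypothetical, the detection and validation rules are toy placeholders, and the `rewrite` callable stands in for the LLM call so that the example runs without an API key. It is not the framework's actual implementation.

```python
from typing import Callable, List


def detect_errors(text: str) -> List[str]:
    """Stage 1 (placeholder heuristic): list quality issues found in a sample."""
    issues = []
    if not text.strip():
        issues.append("empty")
    if "  " in text:
        issues.append("repeated_whitespace")
    return issues


def assess_quality(issues: List[str]) -> float:
    """Stage 2 (placeholder scoring): 1.0 is clean, each issue costs 0.5."""
    return max(0.0, 1.0 - 0.5 * len(issues))


def is_consistent(original: str, corrected: str) -> bool:
    """Stage 4 (placeholder check): output is non-empty and keeps the same words."""
    return bool(corrected.strip()) and set(original.split()) == set(corrected.split())


def denoise(text: str, rewrite: Callable[[str, List[str]], str], threshold: float = 1.0) -> str:
    """Run all four stages; `rewrite` stands in for the LLM correction (stage 3)."""
    issues = detect_errors(text)
    if assess_quality(issues) >= threshold:
        return text  # already good enough, skip the LLM call
    corrected = rewrite(text, issues)
    # Fall back to the original sample if the correction fails validation.
    return corrected if is_consistent(text, corrected) else text
```

Passing the rewriter in as a parameter keeps the pipeline testable offline; for instance, `denoise("a  b", lambda t, issues: " ".join(t.split()))` collapses the doubled space without calling any external service.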
## Notes
- The demo version processes at most 10 samples per batch
- A valid DeepSeek API key is required
- The leaderboard shows pre-configured results
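One way the 10-sample limit could be enforced is sketched below. Truncating the batch (rather than rejecting an oversized upload) is an assumption, and the names `MAX_SAMPLES_PER_BATCH` and `limit_batch` are hypothetical.

```python
MAX_SAMPLES_PER_BATCH = 10  # demo limit stated in the notes above


def limit_batch(samples, limit=MAX_SAMPLES_PER_BATCH):
    """Return the samples to process and the number that were skipped."""
    kept = samples[:limit]
    return kept, len(samples) - len(kept)
```

Returning the skipped count lets the interface tell the user how many samples were left unprocessed.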
## Future Enhancements
- [ ] Connect to the LLaMA3 model hosted on the university server
- [ ] Support large-scale dataset processing
- [ ] Add more evaluation metrics
- [ ] Real-time processing progress feedback