Spaces:
Sleeping
Sleeping
| title: BD Framework | |
| emoji: π₯ | |
| colorFrom: blue | |
| colorTo: gray | |
| sdk: gradio | |
| sdk_version: 6.1.0 | |
| app_file: app.py | |
| pinned: false | |
| license: apache-2.0 | |
| short_description: Benchmark-Denoising (BD) framework | |
| Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference | |
| # Dataset Denoising Framework Demo System | |
| LLM-based Intelligent Dataset Quality Enhancement Framework - Graduate Thesis Research Showcase | |
| ## Deploy to Hugging Face Spaces | |
| ### Step 1: Create Space | |
| 1. Visit https://huggingface.co/spaces | |
| 2. Click "Create new Space" | |
| 3. Select **Gradio** SDK (or Docker) | |
| 4. Space name: `dataset-cleaning-demo` | |
| ### Step 2: Upload Files | |
| Upload the following files to the Space: | |
| - `app.py` - Main application | |
| - `requirements.txt` - Python dependencies | |
| - `README.md` - This file | |
| ### Step 3: Configure Environment Variables | |
| Add in Space settings: | |
| - `DEEPSEEK_API_KEY`: Your DeepSeek API key | |
| ### Step 4: Wait for Build | |
| HF Spaces will automatically build and deploy your application. | |
| ## Local Development | |
| ```bash | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Set environment variable | |
| export DEEPSEEK_API_KEY="your-api-key" | |
| # Run application | |
| python app.py | |
| ``` | |
| Visit http://localhost:7860 | |
| ## Features | |
| β Dataset upload (JSON/JSONL format) | |
| β Intelligent denoising via DeepSeek API | |
| β Showcase denoising effects on 19 mainstream benchmarks | |
| β Interactive Leaderboard | |
| β Download denoised results | |
| ## Tech Stack | |
| - **Frontend**: React + Tailwind CSS | |
| - **Backend**: FastAPI | |
| - **LLM**: DeepSeek API | |
| - **Deployment**: Hugging Face Spaces | |
| ## Denoising Workflow | |
| 1. **Error Detection**: Identify data quality issues | |
| 2. **Quality Assessment**: Score samples | |
| 3. **Intelligent Correction**: LLM generates high-quality versions | |
| 4. **Consistency Validation**: Ensure logical consistency | |
| ## Notes | |
| - Demo version limits processing to 10 samples per batch | |
| - Requires valid DeepSeek API key | |
| - Leaderboard data is pre-configured results | |
| ## Future Enhancements | |
| - [ ] Connect to university server LLaMA3 model | |
| - [ ] Support large-scale dataset processing | |
| - [ ] Add more evaluation metrics | |
| - [ ] Real-time processing progress feedback |