metadata
title: BD Framework
emoji: 🔥
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 6.1.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Benchmark-Denoising (BD) framework
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Dataset Denoising Framework Demo System
LLM-based Intelligent Dataset Quality Enhancement Framework - Graduate Thesis Research Showcase
Deploy to Hugging Face Spaces
Step 1: Create Space
- Visit https://huggingface.co/spaces
- Click "Create new Space"
- Select Gradio SDK (or Docker)
- Space name: dataset-cleaning-demo
Step 2: Upload Files
Upload the following files to the Space:
- app.py - Main application
- requirements.txt - Python dependencies
- README.md - This file
Step 3: Configure Environment Variables
Add in Space settings:
DEEPSEEK_API_KEY: Your DeepSeek API key
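As a minimal sketch of how the application might pick up this variable at startup (the helper name `get_deepseek_api_key` is hypothetical and not taken from app.py), the key can be read from the environment and checked before any API call is made:

```python
import os


def get_deepseek_api_key(env=os.environ):
    """Return the DeepSeek API key from the environment, or raise a clear error.

    Hypothetical helper: the real app.py may read the variable differently.
    """
    key = env.get("DEEPSEEK_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPSEEK_API_KEY is not set; add it in the Space settings "
            "or export it in your shell."
        )
    return key
```

Failing early with an explicit message tends to be easier to debug in the Space build logs than an authentication error raised later by the API.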
Step 4: Wait for Build
HF Spaces will automatically build and deploy your application.
Local Development
# Install dependencies
pip install -r requirements.txt
# Set environment variable
export DEEPSEEK_API_KEY="your-api-key"
# Run application
python app.py
Visit http://localhost:7860
Features
✅ Dataset upload (JSON/JSONL format)
✅ Intelligent denoising via DeepSeek API
✅ Showcase denoising effects on 19 mainstream benchmarks
✅ Interactive Leaderboard
✅ Download denoised results
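To illustrate the JSON/JSONL upload feature, here is one possible way to accept both formats with a single parser. The function name `load_samples` is an assumption for illustration; the actual loader in app.py may differ.

```python
import json


def load_samples(text):
    """Parse uploaded text as a JSON document or as JSONL (one record per line).

    Hypothetical helper; always returns a list of records.
    """
    stripped = text.strip()
    if not stripped:
        return []
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        # Not a single JSON document: treat each non-empty line as one record.
        return [json.loads(line) for line in stripped.splitlines() if line.strip()]
    # A single JSON object is wrapped so callers always get a list.
    return data if isinstance(data, list) else [data]
```

Trying the whole text as JSON first means a pretty-printed JSON array spanning many lines is not mistaken for JSONL.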
Tech Stack
- Frontend: React + Tailwind CSS
- Backend: FastAPI
- LLM: DeepSeek API
- Deployment: Hugging Face Spaces
Denoising Workflow
- Error Detection: Identify data quality issues
- Quality Assessment: Score samples
- Intelligent Correction: LLM generates high-quality versions
- Consistency Validation: Ensure logical consistency
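The four stages above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the field names `question` and `answer`, the scoring rule, and the `correct_fn` callable (standing in for the DeepSeek API call) are all hypothetical and not taken from the framework itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Result:
    original: dict
    issues: list = field(default_factory=list)
    score: float = 1.0
    corrected: Optional[dict] = None
    valid: bool = True


def detect_errors(sample):
    # Stage 1 (placeholder rule): flag assumed fields that are missing or empty.
    return [k for k in ("question", "answer") if not str(sample.get(k, "")).strip()]


def assess_quality(issues):
    # Stage 2 (placeholder score): each issue costs half a point, floored at 0.
    return max(0.0, 1.0 - 0.5 * len(issues))


def denoise(sample, correct_fn, threshold=1.0):
    result = Result(original=sample)
    result.issues = detect_errors(sample)
    result.score = assess_quality(result.issues)
    if result.score < threshold:
        # Stage 3: correct_fn stands in for the LLM call.
        result.corrected = correct_fn(sample, result.issues)
        # Stage 4: the corrected sample must pass detection again.
        result.valid = not detect_errors(result.corrected)
    else:
        result.corrected = sample
    return result


def fake_llm(sample, issues):
    # Offline stand-in for the LLM: fills flagged fields with a marker.
    return {**sample, **{k: "TODO" for k in issues}}


result = denoise({"question": "2+2?", "answer": ""}, fake_llm)
```

Passing the correction step in as a callable keeps the sketch runnable without network access and makes it straightforward to swap in a different model later.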
Notes
- Demo version limits processing to 10 samples per batch
- Requires a valid DeepSeek API key
- Leaderboard data consists of pre-configured results
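The notes do not say whether the demo truncates an upload to 10 samples or splits it into several batches. Assuming the latter, a batch limit could be enforced roughly like this (the names `MAX_BATCH_SIZE` and `batches` are hypothetical):

```python
MAX_BATCH_SIZE = 10  # limit stated in the notes; the constant name is an assumption


def batches(samples, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of at most `size` samples."""
    for start in range(0, len(samples), size):
        yield samples[start:start + size]
```

If the demo instead processes only the first 10 samples, taking the first item from `batches(samples)` gives that behaviour.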
Future Enhancements
- Connect to the LLaMA3 model hosted on the university server
- Support large-scale dataset processing
- Add more evaluation metrics
- Real-time processing progress feedback