Spaces:

lllouo
/

BD_framework_test

Sleeping

App Files Files Community

BD_framework_test / README.md

lllouo

English Version

28e23fd about 2 months ago

preview code

raw

history blame contribute delete

2.18 kB

A newer version of the Gradio SDK is available: 6.9.0

Upgrade

metadata

title: BD Framework
emoji: 🔥
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 6.1.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Benchmark-Denoising (BD) framework

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Dataset Denoising Framework Demo System

LLM-based Intelligent Dataset Quality Enhancement Framework - Graduate Thesis Research Showcase

Deploy to Hugging Face Spaces

Step 1: Create Space

Visit https://huggingface.co/spaces
Click "Create new Space"
Select Gradio SDK (or Docker)
Space name: dataset-cleaning-demo

Step 2: Upload Files

Upload the following files to the Space:

app.py - Main application
requirements.txt - Python dependencies
README.md - This file

Step 3: Configure Environment Variables

Add in Space settings:

DEEPSEEK_API_KEY: Your DeepSeek API key

Step 4: Wait for Build

HF Spaces will automatically build and deploy your application.

Local Development

# Install dependencies
pip install -r requirements.txt

# Set environment variable
export DEEPSEEK_API_KEY="your-api-key"

# Run application
python app.py

Visit http://localhost:7860

Features

✅ Dataset upload (JSON/JSONL format) ✅ Intelligent denoising via DeepSeek API ✅ Showcase denoising effects on 19 mainstream benchmarks ✅ Interactive Leaderboard ✅ Download denoised results

Tech Stack

Frontend: React + Tailwind CSS
Backend: FastAPI
LLM: DeepSeek API
Deployment: Hugging Face Spaces

Denoising Workflow

Error Detection: Identify data quality issues
Quality Assessment: Score samples
Intelligent Correction: LLM generates high-quality versions
Consistency Validation: Ensure logical consistency

Notes

Demo version limits processing to 10 samples per batch
Requires valid DeepSeek API key
Leaderboard data is pre-configured results

Future Enhancements

Connect to university server LLaMA3 model
Support large-scale dataset processing
Add more evaluation metrics
Real-time processing progress feedback