	---
	title: BD Framework
	emoji: 🔥
	colorFrom: blue
	colorTo: gray
	sdk: gradio
	sdk_version: 6.1.0
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: Benchmark-Denoising (BD) framework
	---

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

	# Dataset Denoising Framework Demo System

	An LLM-based framework for improving dataset quality, built as a showcase for graduate thesis research.

	## Deploy to Hugging Face Spaces

	### Step 1: Create Space

	1. Visit https://huggingface.co/spaces
	2. Click "Create new Space"
	3. Select the Gradio SDK (or Docker)
	4. Enter a Space name, for example `dataset-cleaning-demo`

	### Step 2: Upload Files

	Upload the following files to the Space:
	- `app.py` - Main application
	- `requirements.txt` - Python dependencies
	- `README.md` - This file

	### Step 3: Configure Environment Variables

	In the Space settings, add the following as a secret so the key is not exposed publicly:
	- `DEEPSEEK_API_KEY`: Your DeepSeek API key
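Inside the application, the key configured above would typically be read from the environment at startup. The following is a minimal sketch, assuming `app.py` reads the variable through `os.environ`; the helper name `get_deepseek_api_key` is hypothetical and not taken from the repository.

```python
import os


def get_deepseek_api_key(env=None):
    """Return the DeepSeek API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    This helper is illustrative only and is not part of the original app.
    """
    env = os.environ if env is None else env
    key = env.get("DEEPSEEK_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPSEEK_API_KEY is not set. Add it in the Space settings "
            "or export it in your shell before starting the app."
        )
    return key
```

Failing early with an explicit message tends to be easier to debug than an authentication error from the first API request.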

	### Step 4: Wait for Build

	HF Spaces will automatically build and deploy your application.

	## Local Development
	```bash
	# Install dependencies
	pip install -r requirements.txt

	# Set environment variable
	export DEEPSEEK_API_KEY="your-api-key"

	# Run application
	python app.py
	```

	Once the app starts, open http://localhost:7860 in a browser.

	## Features

	- ✅ Dataset upload (JSON/JSONL format)
	- ✅ Intelligent denoising via DeepSeek API
	- ✅ Showcase denoising effects on 19 mainstream benchmarks
	- ✅ Interactive Leaderboard
	- ✅ Download denoised results
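To illustrate the two upload formats listed above, here is a small sketch of how an uploaded file might be parsed into a list of samples. The function name `load_samples` and the choice to dispatch on the file extension are assumptions for illustration; the actual parsing in `app.py` may differ.

```python
import json


def load_samples(text, filename):
    """Parse uploaded text as JSONL (one object per line) or as a JSON document."""
    if filename.lower().endswith(".jsonl"):
        # JSONL: every non-blank line is an independent JSON value.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    data = json.loads(text)
    # A JSON upload may be a single object or a list of objects.
    return data if isinstance(data, list) else [data]
```

For example, `load_samples('{"a": 1}\n{"a": 2}\n', "data.jsonl")` returns a list of two dictionaries.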

	## Tech Stack

	- Frontend: React + Tailwind CSS
	- Backend: FastAPI
	- LLM: DeepSeek API
	- Deployment: Hugging Face Spaces

	## Denoising Workflow

	1. Error Detection: Identify data quality issues
	2. Quality Assessment: Score samples
	3. Intelligent Correction: LLM generates high-quality versions
	4. Consistency Validation: Ensure logical consistency
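The four steps above can be pictured as a small pipeline. The sketch below is illustrative only: the checks and the scoring rule are placeholders, the LLM call is passed in as a plain function (`correct_fn`), and none of the names come from the repository.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    sample: dict
    issues: list = field(default_factory=list)
    score: float = 1.0
    corrected: bool = False


def detect_errors(sample):
    """Step 1: flag empty or missing field values (placeholder check)."""
    return [f"empty field: {k}" for k, v in sample.items() if v in ("", None)]


def assess_quality(issues):
    """Step 2: toy score that drops by 0.5 per detected issue, floored at 0."""
    return max(0.0, 1.0 - 0.5 * len(issues))


def is_consistent(original, candidate):
    """Step 4: accept a correction only if it keeps the same keys and has no empty fields."""
    return set(candidate) == set(original) and not detect_errors(candidate)


def denoise(sample, correct_fn, threshold=1.0):
    """Run the four steps; correct_fn stands in for the LLM call (step 3)."""
    issues = detect_errors(sample)
    score = assess_quality(issues)
    result = Result(sample=sample, issues=issues, score=score)
    if score < threshold:
        candidate = correct_fn(sample, issues)
        if is_consistent(sample, candidate):
            result.sample, result.corrected = candidate, True
    return result


def fill_blanks(sample, issues):
    """Hypothetical stand-in for the LLM: replaces empty values with 'N/A'."""
    return {k: (v if v not in ("", None) else "N/A") for k, v in sample.items()}
```

Calling `denoise({"question": "2+2?", "answer": ""}, fill_blanks)` flags the empty `answer` field, scores the sample below the threshold, and accepts the stand-in correction because it passes the consistency check. Keeping the correction step as an injected function means the same pipeline could be pointed at a different model later.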

	## Notes

	- The demo version limits processing to 10 samples per batch
	- A valid DeepSeek API key is required
	- The leaderboard shows pre-configured results
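One way to respect the 10-sample limit mentioned above is to split an upload into fixed-size batches. The notes do not say whether the demo splits larger uploads or simply truncates them, so the sketch below assumes splitting; `MAX_BATCH_SIZE` and `batches` are hypothetical names.

```python
MAX_BATCH_SIZE = 10  # demo limit stated in the notes above


def batches(samples, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of at most `size` samples."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(samples), size):
        yield samples[start:start + size]
```

With 25 samples this yields three batches of 10, 10, and 5 items.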

	## Future Enhancements

	- [ ] Connect to the LLaMA3 model hosted on the university server
	- [ ] Support large-scale dataset processing
	- [ ] Add more evaluation metrics
	- [ ] Real-time processing progress feedback