MENG21

Update short description in README.md to better reflect the focus on synthetic data generation and ML model training.

073874d 10 months ago

metadata

title: Synthetic Data Generation and ML Model Training
emoji: 📈
colorFrom: pink
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: App.py
pinned: false
license: apache-2.0
short_description: Synthetic Data Generation and ML Model Training

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Synthetic Data Generation and ML Model Training

A Streamlit application for generating synthetic data, training machine learning models, and visualizing algorithm performance for educational purposes.

Live Demo

Try the application online!

Overview

This application provides an end-to-end platform for:

Generating customizable synthetic datasets
Training and evaluating multiple machine learning classifiers
Visualizing model performance and data characteristics
Learning about different ML algorithms through interactive demonstrations
Implementing and testing trained models

Features

Main App (`App.py`)

Synthetic data generation with customizable feature distributions
Support for multiple classifier algorithms with automatic preprocessing
Real-time visualization of model performance metrics
Model comparison and selection
Dataset exploration and visualization tools
Model saving and exporting functionality

Algorithm Education (`pages/02_Algorithm_Education.py`)

Detailed explanations of various ML classification algorithms
Interactive demonstrations with customizable parameters
Mathematical foundations and implementation details
Algorithm strengths, limitations, and use cases
Performance visualization across different data distributions

Model Implementation (`pages/03_Model_Implementation.py`)

Upload and use previously trained models
Real-time prediction with custom input values
Model and scaler integration

Installation

# Clone the repository
git clone https://github.com/yourusername/synthetic_data_generation.git
cd synthetic_data_generation

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run App.py

Requirements

Python 3.8+ (required by the pandas, NumPy, and scikit-learn versions listed below)
streamlit>=1.28.0
numpy>=1.24.0
pandas>=2.0.0
scikit-learn>=1.2.0
plotly>=5.13.0
seaborn>=0.12.0
matplotlib>=3.7.0
joblib>=1.2.0

Usage

Generating Synthetic Data

Define features and their distributions
Configure class characteristics
Set sample size and other generation parameters
Generate and explore your synthetic dataset
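
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `App.py`: it assumes every feature is normally distributed within each class, and the names `generate_synthetic_data`, `class_specs`, and the example features are hypothetical.

```python
import numpy as np
import pandas as pd


def generate_synthetic_data(class_specs, n_per_class, seed=0):
    """Draw n_per_class rows per class from per-feature normal distributions.

    class_specs maps a class label to {feature name: (mean, std)}.
    This layout is an assumption made for the sketch.
    """
    rng = np.random.default_rng(seed)
    frames = []
    for label, features in class_specs.items():
        data = {
            name: rng.normal(loc=mean, scale=std, size=n_per_class)
            for name, (mean, std) in features.items()
        }
        frame = pd.DataFrame(data)
        frame["target"] = label
        frames.append(frame)
    combined = pd.concat(frames, ignore_index=True)
    # Shuffle the rows so the classes are interleaved.
    return combined.sample(frac=1.0, random_state=seed).reset_index(drop=True)


# Hypothetical class characteristics: two classes, two features each.
class_specs = {
    "class_a": {"length": (5.0, 1.0), "width": (2.0, 0.5)},
    "class_b": {"length": (8.0, 1.5), "width": (3.5, 0.7)},
}
df = generate_synthetic_data(class_specs, n_per_class=200)

# Quick exploration: per-class feature means.
print(df.groupby("target").mean())
```

Moving the class means closer together or raising the standard deviations makes the classes overlap more, which is a simple way to produce a harder classification problem.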

Training Models

Select classifier algorithms to evaluate
Configure training parameters (test split, etc.)
Train models and view performance metrics
Compare model results through interactive visualizations
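
A training and comparison loop along these lines might look like the sketch below. The three classifiers, the 20% test split, and the use of `make_classification` as a stand-in dataset are assumptions chosen for illustration; the app may offer a different set of algorithms and options.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset; in the app this would be the generated synthetic data.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical selection of classifiers to evaluate.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, clf in candidates.items():
    # Scaling lives inside the pipeline, so it is fitted on training data only.
    model = make_pipeline(StandardScaler(), clf)
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

# Report the models from best to worst test accuracy.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```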

Educational Resources

Navigate to the Algorithm Education page
Select an algorithm to learn about
Interact with the demo to see how parameters affect performance
Examine mathematical foundations and implementation details
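
As an illustration of how a single parameter can affect performance, the sketch below varies `n_neighbors` for a k-nearest-neighbors classifier on a noisy two-moons dataset. The choice of algorithm, dataset, and values of `k` is an assumption for this example and is not taken from the Algorithm Education page itself.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Noisy, non-linearly separable toy data.
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)

scores = {}
for k in (1, 5, 15, 51):
    clf = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds.
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"n_neighbors={k:>2}: mean CV accuracy = {scores[k]:.3f}")
```

Small values of `k` follow the training data closely and tend to overfit noise, while large values smooth the decision boundary and can underfit.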

Model Implementation

Upload previously saved model and scaler files
Input feature values or generate random test values
Make predictions and view results
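
The save, upload, and predict round trip can be sketched with `joblib`, which is listed in the requirements. The file names `model.joblib` and `scaler.joblib`, the logistic regression model, and the stand-in dataset are assumptions for illustration; the files the app actually writes may be named and structured differently.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Train a small model and its scaler (stand-in for a model trained in the app).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
scaler = StandardScaler().fit(X)
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X), y)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names for the saved artifacts.
    model_path = Path(tmp) / "model.joblib"
    scaler_path = Path(tmp) / "scaler.joblib"
    joblib.dump(model, model_path)
    joblib.dump(scaler, scaler_path)

    # Later, for example after an upload, load both artifacts back.
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

# Generate one random test row within the observed range of each feature.
rng = np.random.default_rng(0)
sample = rng.uniform(X.min(axis=0), X.max(axis=0)).reshape(1, -1)

# Apply the same scaling used at training time before predicting.
prediction = loaded_model.predict(loaded_scaler.transform(sample))
print("Predicted class:", prediction[0])
```

Files saved with `joblib` are pickle-based, and loading one can execute arbitrary code, so only load model and scaler files from sources you trust.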

Project Structure

synthetic_data_generation/
├── App.py                  # Main application
├── models/                 # Directory for saved models
├── pages/                  # Additional application pages
│   ├── 02_Algorithm_Education.py    # Educational content about ML algorithms
│   └── 03_Model_Implementation.py   # Model deployment and usage interface
├── temp_uploads/           # Temporary directory for file uploads
└── requirements.txt        # Project dependencies
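
Git does not track empty directories, so `models/` and `temp_uploads/` may be missing after a fresh clone. A small helper like the one below could create them at startup. The function name `ensure_runtime_dirs` is hypothetical, and whether `App.py` already does something equivalent is not stated here.

```python
import tempfile
from pathlib import Path

# Runtime directories named in the project structure above.
RUNTIME_DIRS = ("models", "temp_uploads")


def ensure_runtime_dirs(project_root):
    """Create the runtime directories under project_root if they are missing."""
    paths = []
    for name in RUNTIME_DIRS:
        path = Path(project_root) / name
        # exist_ok=True makes repeated calls safe.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths


# Demonstration in a throwaway directory rather than the real project root.
with tempfile.TemporaryDirectory() as root:
    for path in ensure_runtime_dirs(root):
        print(path.name, path.is_dir())
```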