metadata
title: Synthetic Data Generation and ML Model Training
emoji: 📈
colorFrom: pink
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: App.py
pinned: false
license: apache-2.0
short_description: Synthetic Data Generation and ML Model Training

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Synthetic Data Generation and ML Model Training

A Streamlit application for generating synthetic data, training machine learning models, and visualizing algorithm performance for educational purposes.

Live Demo

Try the application online!

Overview

This application provides an end-to-end platform for:

  1. Generating customizable synthetic datasets
  2. Training and evaluating multiple machine learning classifiers
  3. Visualizing model performance and data characteristics
  4. Learning about different ML algorithms through interactive education
  5. Implementing and testing trained models

Features

Main App (App.py)

  • Synthetic data generation with customizable feature distributions
  • Support for multiple classifier algorithms with automatic preprocessing
  • Real-time visualization of model performance metrics
  • Model comparison and selection
  • Dataset exploration and visualization tools
  • Model saving and exporting functionality

Algorithm Education (pages/02_Algorithm_Education.py)

  • Detailed explanations of various ML classification algorithms
  • Interactive demonstrations with customizable parameters
  • Mathematical foundations and implementation details
  • Algorithm strengths, limitations, and use cases
  • Performance visualization across different data distributions

Model Implementation (pages/03_Model_Implementation.py)

  • Upload and use previously trained models
  • Real-time prediction with custom input values
  • Model and scaler integration

Installation

# Clone the repository
git clone https://github.com/yourusername/synthetic_data_generation.git
cd synthetic_data_generation

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run App.py

Requirements

  • Python 3.8+ (the minimum numpy, pandas, and scikit-learn versions listed below do not support Python 3.7)
  • streamlit>=1.28.0
  • numpy>=1.24.0
  • pandas>=2.0.0
  • scikit-learn>=1.2.0
  • plotly>=5.13.0
  • seaborn>=0.12.0
  • matplotlib>=3.7.0
  • joblib>=1.2.0

Usage

Generating Synthetic Data

  1. Define features and their distributions
  2. Configure class characteristics
  3. Set sample size and other generation parameters
  4. Generate and explore your synthetic dataset
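The steps above can be sketched in a few lines of NumPy and pandas. This is a minimal illustration, not the code in App.py: the function name `generate_synthetic_data`, the `class_specs` layout, and the choice of normal distributions are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def generate_synthetic_data(class_specs, n_per_class, seed=0):
    """Draw n_per_class rows per class from per-feature normal distributions.

    class_specs maps a class label to {feature name: (mean, std)}.
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    frames = []
    for label, features in class_specs.items():
        # One column per feature, sampled independently for this class
        data = {
            name: rng.normal(loc=mean, scale=std, size=n_per_class)
            for name, (mean, std) in features.items()
        }
        frame = pd.DataFrame(data)
        frame["target"] = label
        frames.append(frame)
    combined = pd.concat(frames, ignore_index=True)
    # Shuffle rows so the classes are interleaved
    return combined.sample(frac=1.0, random_state=seed).reset_index(drop=True)


# Example class definitions (made-up feature names and parameters)
specs = {
    "class_a": {"length": (5.0, 1.0), "width": (2.0, 0.5)},
    "class_b": {"length": (8.0, 1.5), "width": (3.5, 0.5)},
}
df = generate_synthetic_data(specs, n_per_class=200)
print(df.groupby("target").mean())
```

Fixing the seed keeps the generated dataset reproducible between runs, which makes model comparisons easier to interpret.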

Training Models

  1. Select classifier algorithms to evaluate
  2. Configure training parameters (test split, etc.)
  3. Train models and view performance metrics
  4. Compare model results through interactive visualizations
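As a rough sketch of this workflow, the snippet below trains several scikit-learn classifiers with the same preprocessing and compares their test accuracy. The classifier selection, the 80/20 split, and the use of `make_classification` as stand-in data are assumptions for the example; the application may use different algorithms and settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset; in practice this would be the generated synthetic data
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Candidate classifiers (an illustrative selection)
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, clf in candidates.items():
    # Scale features inside the pipeline so the scaler is fit on training data only
    model = make_pipeline(StandardScaler(), clf)
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

best_name = max(results, key=results.get)
for name, acc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: {acc:.3f}")
print("best:", best_name)
```

Wrapping the scaler and classifier in one pipeline avoids leaking test-set statistics into preprocessing.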

Educational Resources

  1. Navigate to the Algorithm Education page
  2. Select an algorithm to learn about
  3. Interact with the demo to see how parameters affect performance
  4. Examine mathematical foundations and implementation details
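To give a feel for what such a demo shows, here is a small sketch that varies one hyperparameter and records training and test accuracy. The choice of k-nearest neighbors, the `make_moons` dataset, and the values of `k` are assumptions for illustration and are not taken from the Algorithm Education page.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Noisy two-class dataset with a nonlinear boundary
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for k in (1, 5, 15, 45):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # Store (train accuracy, test accuracy) for each neighborhood size
    scores[k] = (clf.score(X_train, y_train), clf.score(X_test, y_test))

for k, (train_acc, test_acc) in scores.items():
    print(f"k={k:>2}  train={train_acc:.3f}  test={test_acc:.3f}")
```

With `k=1` the model memorizes the training set, so the gap between its training and test accuracy is a simple way to see overfitting.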

Model Implementation

  1. Upload previously saved model and scaler files
  2. Input feature values or generate random test values
  3. Make predictions and view results
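The round trip behind these steps might look like the following sketch, which saves a model and its scaler with joblib, reloads both, and predicts on a new input. The file names `model.joblib` and `scaler.joblib`, the toy data, and the logistic regression model are assumptions for the example; the application's own file formats and naming may differ.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy training data: two well-separated clusters in three dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 3)), rng.normal(3.0, 1.0, (100, 3))])
y = np.array([0] * 100 + [1] * 100)

scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.joblib"    # hypothetical file name
    scaler_path = Path(tmp) / "scaler.joblib"  # hypothetical file name
    joblib.dump(model, model_path)
    joblib.dump(scaler, scaler_path)

    # Later, e.g. after uploading the two files
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

# New inputs must pass through the same scaler used during training
sample = np.array([[3.1, 2.9, 3.3]])
prediction = loaded_model.predict(loaded_scaler.transform(sample))
print(prediction)
```

Only load joblib or pickle files from sources you trust, since unpickling can execute arbitrary code.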

Project Structure

synthetic_data_generation/
├── App.py                  # Main application
├── models/                 # Directory for saved models
├── pages/                  # Additional application pages
│   ├── 02_Algorithm_Education.py    # Educational content about ML algorithms
│   └── 03_Model_implementation.py   # Model deployment and usage interface
├── temp_uploads/           # Temporary directory for file uploads
└── requirements.txt        # Project dependencies