---
title: Synthetic Data Generation and ML Model Training
emoji: π
colorFrom: pink
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: App.py
pinned: false
license: apache-2.0
short_description: Synthetic Data Generation and ML Model Training
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Synthetic Data Generation and ML Model Training
A Streamlit application for generating synthetic data, training machine learning classifiers, and visualizing algorithm performance for educational purposes.
Live Demo
Overview
This application provides an end-to-end platform for:
- Generating customizable synthetic datasets
- Training and evaluating multiple machine learning classifiers
- Visualizing model performance and data characteristics
- Learning about different ML algorithms through interactive education
- Deploying trained models and testing them on new inputs
Features
Main App (App.py)
- Synthetic data generation with customizable feature distributions
- Support for multiple classifier algorithms with automatic preprocessing
- Real-time visualization of model performance metrics
- Model comparison and selection
- Dataset exploration and visualization tools
- Saving and exporting trained models
Algorithm Education (pages/02_Algorithm_Education.py)
- Detailed explanations of various ML classification algorithms
- Interactive demonstrations with customizable parameters
- Mathematical foundations and implementation details
- Algorithm strengths, limitations, and use cases
- Performance visualization across different data distributions
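As a rough illustration of the kind of comparison this page visualizes, the sketch below scores a few scikit-learn classifiers on three toy 2-D distributions (linearly separable, moons, circles) using 5-fold cross-validation. The datasets, classifier set, and parameters are assumptions chosen for illustration, not the page's actual implementation.

```python
import pandas as pd
from sklearn.datasets import make_circles, make_classification, make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Illustrative distributions (assumption: not necessarily the ones the page uses).
datasets = {
    "linear": make_classification(
        n_samples=300, n_features=2, n_informative=2, n_redundant=0,
        n_clusters_per_class=1, random_state=1,
    ),
    "moons": make_moons(n_samples=300, noise=0.2, random_state=1),
    "circles": make_circles(n_samples=300, noise=0.1, factor=0.5, random_state=1),
}

classifiers = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=1),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy for every (dataset, classifier) pair.
# cross_val_score clones the pipeline, so reusing estimator objects is safe.
rows = {}
for data_name, (X, y) in datasets.items():
    rows[data_name] = {
        clf_name: cross_val_score(make_pipeline(StandardScaler(), clf), X, y, cv=5).mean()
        for clf_name, clf in classifiers.items()
    }

scores = pd.DataFrame.from_dict(rows, orient="index")  # rows: datasets, columns: classifiers
print(scores.round(3))
```

A linear model typically does well on the linearly separable data but near chance on the concentric circles, which is the kind of contrast such a visualization is meant to surface.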
Model Implementation (pages/03_Model_Implementation.py)
- Upload and use previously trained models
- Real-time prediction with custom input values
- Loading a saved scaler alongside the model so that inputs are preprocessed before prediction
Installation
```bash
# Clone the repository
git clone https://github.com/yourusername/synthetic_data_generation.git
cd synthetic_data_generation

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run App.py
```
Requirements
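- Python 3.8+ (the pinned versions of numpy, pandas, scikit-learn, and Streamlit below do not support Python 3.7)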
- streamlit>=1.28.0
- numpy>=1.24.0
- pandas>=2.0.0
- scikit-learn>=1.2.0
- plotly>=5.13.0
- seaborn>=0.12.0
- matplotlib>=3.7.0
- joblib>=1.2.0
Usage
Generating Synthetic Data
- Define features and their distributions
- Configure class characteristics
- Set sample size and other generation parameters
- Generate and explore your synthetic dataset
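A minimal sketch of class-conditional generation, assuming each feature is normally distributed with a per-class mean and standard deviation. `CLASS_CONFIG` and `generate_synthetic_data` are hypothetical names; App.py may support other distributions and use a different configuration format.

```python
import numpy as np
import pandas as pd

# Hypothetical configuration: for each class, a (mean, std) pair per feature.
CLASS_CONFIG = {
    "class_a": {"length": (5.0, 0.5), "weight": (2.0, 0.3)},
    "class_b": {"length": (7.0, 0.8), "weight": (3.5, 0.4)},
}


def generate_synthetic_data(class_config, n_samples_per_class, seed=42):
    """Draw normally distributed features for each class and stack them."""
    rng = np.random.default_rng(seed)
    frames = []
    for label, features in class_config.items():
        data = {
            name: rng.normal(loc=mean, scale=std, size=n_samples_per_class)
            for name, (mean, std) in features.items()
        }
        frame = pd.DataFrame(data)
        frame["target"] = label
        frames.append(frame)
    # Concatenate, shuffle so classes are interleaved, then reset the index.
    df = pd.concat(frames, ignore_index=True)
    return df.sample(frac=1.0, random_state=seed).reset_index(drop=True)


df = generate_synthetic_data(CLASS_CONFIG, n_samples_per_class=500)
print(df.shape)  # (1000, 3): two features plus the target column
print(df["target"].value_counts())
```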
Training Models
- Select classifier algorithms to evaluate
- Configure training parameters (test split, etc.)
- Train models and view performance metrics
- Compare model results through interactive visualizations
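The training step can be sketched with scikit-learn as below, assuming a pipeline that standardizes features before each classifier. The stand-in dataset, the classifier list, and `test_size=0.2` are illustrative assumptions rather than the app's actual defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for the generated dataset (assumption: the app would use its own data here).
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4, random_state=0)

# The "test split" setting; 0.2 is an assumed value.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "SVM (RBF)": SVC(kernel="rbf"),
}

results = {}
for name, clf in candidates.items():
    # Scaling inside the pipeline means the scaler is fit on training data only.
    model = make_pipeline(StandardScaler(), clf)
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

# Rank models by held-out accuracy.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} accuracy = {acc:.3f}")

best_name = max(results, key=results.get)
print("Best model:", best_name)
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into preprocessing; accuracy is used here only for brevity, and other metrics can be swapped in the same loop.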
Educational Resources
- Navigate to the Algorithm Education page
- Select an algorithm to learn about
- Interact with the demo to see how parameters affect performance
- Examine mathematical foundations and implementation details
Model Implementation
- Upload previously saved model and scaler files
- Input feature values or generate random test values
- Make predictions and view results
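Assuming the model and scaler are saved as separate joblib files (joblib is listed in the requirements; the file names and the random-forest model here are hypothetical), the save, load, and predict round trip might look like this:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Train a small model so the example is self-contained.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=0, random_state=0
)
scaler = StandardScaler().fit(X)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(scaler.transform(X), y)

# Save model and scaler as separate files (hypothetical file names).
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "model.joblib")
scaler_path = os.path.join(out_dir, "scaler.joblib")
joblib.dump(model, model_path)
joblib.dump(scaler, scaler_path)

# Later, e.g. after uploading on the Model Implementation page: reload both.
loaded_model = joblib.load(model_path)
loaded_scaler = joblib.load(scaler_path)

# One custom input row, with features in the same order as during training.
custom_input = np.array([[0.5, -1.2, 0.3, 2.0]])
scaled_input = loaded_scaler.transform(custom_input)
prediction = loaded_model.predict(scaled_input)
probabilities = loaded_model.predict_proba(scaled_input)
print("Predicted class:", prediction[0])
print("Class probabilities:", probabilities[0].round(3))
```

The scaler must be the one fitted at training time; applying a freshly fitted scaler to new inputs would shift the feature values the model sees.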
Project Structure
```
synthetic_data_generation/
├── App.py                            # Main application
├── models/                           # Directory for saved models
├── pages/                            # Additional application pages
│   ├── 02_Algorithm_Education.py     # Educational content about ML algorithms
│   └── 03_Model_implementation.py    # Model deployment and usage interface
├── temp_uploads/                     # Temporary directory for file uploads
└── requirements.txt                  # Project dependencies
```