---
title: Synthetic Data Generation and ML Model Training
emoji: π
colorFrom: pink
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: App.py
pinned: false
license: apache-2.0
short_description: Synthetic Data Generation and ML Model Training
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Synthetic Data Generation and ML Model Training
A Streamlit application for generating synthetic data, training machine learning models, and visualizing algorithm performance for educational purposes.
## Live Demo
**[Try the application online!](https://projectsyntheticdatageneration.streamlit.app/)**
## Overview
This application provides an end-to-end platform for:
1. Generating customizable synthetic datasets
2. Training and evaluating multiple machine learning classifiers
3. Visualizing model performance and data characteristics
4. Learning about different ML algorithms through interactive demonstrations
5. Implementing and testing trained models
## Features
### Main App (`App.py`)
- Synthetic data generation with customizable feature distributions
- Support for multiple classifier algorithms with automatic preprocessing
- Real-time visualization of model performance metrics
- Model comparison and selection
- Dataset exploration and visualization tools
- Model saving and exporting functionality
### Algorithm Education (`pages/02_Algorithm_Education.py`)
- Detailed explanations of various ML classification algorithms
- Interactive demonstrations with customizable parameters
- Mathematical foundations and implementation details
- Algorithm strengths, limitations, and use cases
- Performance visualization across different data distributions
### Model Implementation (`pages/03_Model_Implementation.py`)
- Upload and use previously trained models
- Real-time prediction with custom input values
- Model and scaler integration
## Installation
```bash
# Clone the repository
git clone https://github.com/yourusername/synthetic_data_generation.git
cd synthetic_data_generation
# Install dependencies
pip install -r requirements.txt
# Run the application
streamlit run App.py
```
## Requirements
- Python 3.8+ (numpy 1.24 and pandas 2.0 do not support Python 3.7)
- streamlit>=1.28.0
- numpy>=1.24.0
- pandas>=2.0.0
- scikit-learn>=1.2.0
- plotly>=5.13.0
- seaborn>=0.12.0
- matplotlib>=3.7.0
- joblib>=1.2.0
## Usage
### Generating Synthetic Data
1. Define features and their distributions
2. Configure class characteristics
3. Set sample size and other generation parameters
4. Generate and explore your synthetic dataset
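The steps above can be sketched in a few lines of NumPy and pandas. This is a minimal illustration, not the code used in `App.py`: the function name `generate_synthetic_data`, the `class_specs` layout, and the feature and class names are all hypothetical, and every feature is assumed to be an independent Gaussian.

```python
import numpy as np
import pandas as pd


def generate_synthetic_data(class_specs, n_per_class, seed=0):
    """Draw n_per_class rows per class from independent Gaussians.

    class_specs maps a class label to {feature_name: (mean, std)}.
    This helper is hypothetical and only illustrates the workflow.
    """
    rng = np.random.default_rng(seed)
    frames = []
    for label, features in class_specs.items():
        columns = {
            name: rng.normal(loc=mean, scale=std, size=n_per_class)
            for name, (mean, std) in features.items()
        }
        frame = pd.DataFrame(columns)
        frame["target"] = label
        frames.append(frame)
    data = pd.concat(frames, ignore_index=True)
    # Shuffle so the classes are interleaved rather than stacked in blocks.
    return data.sample(frac=1.0, random_state=seed).reset_index(drop=True)


# Assumed settings: two features, two classes, 200 samples per class.
specs = {
    "class_a": {"length": (5.0, 1.0), "width": (2.0, 0.5)},
    "class_b": {"length": (8.0, 1.5), "width": (4.0, 0.5)},
}
df = generate_synthetic_data(specs, n_per_class=200)
print(df.groupby("target").mean().round(2))
```

The per-class means printed at the end give a quick check that each class was drawn from the distribution it was configured with.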
### Training Models
1. Select classifier algorithms to evaluate
2. Configure training parameters (test split, etc.)
3. Train models and view performance metrics
4. Compare model results through interactive visualizations
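As a rough sketch of this workflow in scikit-learn, the example below trains three classifiers on a held-out split and ranks them by test accuracy. The choice of classifiers, the 80/20 split, and the `make_classification` stand-in dataset are assumptions made for illustration; the app may use different algorithms, metrics, and preprocessing.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset; in practice this would be the generated synthetic data.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Assumed candidate set; any scikit-learn classifier can be added here.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, clf in candidates.items():
    # Scaling lives inside the pipeline so it is fit on training data only.
    model = make_pipeline(StandardScaler(), clf)
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from best to worst test accuracy.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Wrapping the scaler and classifier in one pipeline keeps test data out of the scaling statistics, which makes the comparison between models fairer.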
### Educational Resources
1. Navigate to the Algorithm Education page
2. Select an algorithm to learn about
3. Interact with the demo to see how parameters affect performance
4. Examine mathematical foundations and implementation details
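To give a feel for what such a demo shows, here is a small sketch that varies one parameter and reports its effect on accuracy. The decision tree, the `max_depth` values, and the `make_moons` dataset are illustrative assumptions and are not taken from the Algorithm Education page.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy two-class toy dataset, chosen only for illustration.
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for depth in (1, 2, 4, 8, None):  # None lets the tree grow without a depth limit
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for each depth.
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))

for depth, (train_acc, test_acc) in scores.items():
    print(f"max_depth={depth}: train={train_acc:.3f} test={test_acc:.3f}")
```

Comparing the two columns shows the usual trade-off: training accuracy keeps rising as the tree gets deeper, while test accuracy typically levels off or drops once the tree starts fitting noise.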
### Model Implementation
1. Upload previously saved model and scaler files
2. Input feature values or generate random test values
3. Make predictions and view results
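The round trip behind these steps can be sketched with `joblib`, which is listed in the requirements. The file names `model.joblib` and `scaler.joblib`, the logistic regression model, and the toy data are assumptions for illustration; the files exported by the app may be named and structured differently.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy training data: two well-separated Gaussian clusters (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 2)),
    rng.normal(4.0, 1.0, size=(100, 2)),
])
y = np.array([0] * 100 + [1] * 100)

scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names; save the model and its scaler side by side.
    model_path = Path(tmp) / "model.joblib"
    scaler_path = Path(tmp) / "scaler.joblib"
    joblib.dump(model, model_path)
    joblib.dump(scaler, scaler_path)

    # Later (for example after an upload), load both artifacts back.
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

# Custom input must pass through the same scaler used during training.
sample = np.array([[4.2, 3.9]])
scaled_sample = loaded_scaler.transform(sample)
prediction = loaded_model.predict(scaled_sample)[0]
probabilities = loaded_model.predict_proba(scaled_sample)[0]
print(prediction, probabilities.round(3))
```

Because `joblib` files are pickle-based, loading one can execute arbitrary code, so only upload model and scaler files from a source you trust.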
## Project Structure
```
synthetic_data_generation/
├── App.py                          # Main application
├── models/                         # Directory for saved models
├── pages/                          # Additional application pages
│   ├── 02_Algorithm_Education.py   # Educational content about ML algorithms
│   └── 03_Model_implementation.py  # Model deployment and usage interface
├── temp_uploads/                   # Temporary directory for file uploads
└── requirements.txt                # Project dependencies
```