TrainAI / README.md

szili2011

Update README.md

b1c6c9b verified 7 months ago

	---
	title: TrainAI
	emoji: 👁
	colorFrom: pink
	colorTo: yellow
	sdk: gradio
	sdk_version: 5.31.0
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: You can train any simple model
	---

	# 🧠 Universal CPU AI Trainer ⚙️

	Welcome to the Universal CPU AI Trainer! This Hugging Face Space allows you to:

	* Define AI Tasks: Choose from Tabular Classification, Tabular Regression, or basic Image Classification.
	* Select Model Families: Experiment with classical Scikit-learn models or simpler PyTorch Neural Networks (MLPs, basic CNNs).
	* Configure Datasets:
	    * Generate synthetic datasets with configurable rows, features, and characteristics.
	    * Let the "AI Assistant" (heuristic rules) suggest dataset parameters.
	    * Upload your own datasets (CSV, JSON, Parquet).
	* Design Neural Networks: For PyTorch MLPs, specify hidden layers and get suggestions for target parameter counts (10k - 1M).
	* Train Models on CPU: All training happens on the free CPU tier. Be patient with larger models or datasets!
	* Evaluate & Download: Get basic evaluation metrics and download your trained models (PKL, ONNX for Scikit-learn; PT for PyTorch).
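
The synthetic-dataset option can be pictured with a minimal sketch like the one below. It assumes Scikit-learn's `make_classification` and pandas; the function name `generate_tabular_dataset`, the `feature_i` column names, and the `target` column are illustrative choices, not necessarily what `app.py` does.

```python
import pandas as pd
from sklearn.datasets import make_classification


def generate_tabular_dataset(n_rows=500, n_features=10, n_classes=2, seed=0):
    """Build a synthetic classification table with a 'target' column."""
    # Assumption: roughly half of the features carry signal, the rest are noise.
    n_informative = max(2, n_features // 2)
    X, y = make_classification(
        n_samples=n_rows,
        n_features=n_features,
        n_informative=n_informative,
        n_redundant=0,
        n_classes=n_classes,
        random_state=seed,
    )
    columns = [f"feature_{i}" for i in range(n_features)]
    df = pd.DataFrame(X, columns=columns)
    df["target"] = y  # hypothetical target column name
    return df


df = generate_tabular_dataset()
print(df.shape)
print(df["target"].value_counts())
```

A table built this way has one row per sample and one extra column for the label, which is the same shape an uploaded CSV with a target column would have.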

	⚠️ Important Considerations for CPU Training:

	* Performance: This Space runs on a free CPU tier. Training complex models (especially Neural Networks with >100k parameters) or training on large datasets will be SLOW. An epoch can take minutes to hours.
	* Memory Limits: The free tier has limited RAM (~15 GB). Very large datasets or models might cause the Space to crash.
	* Toy Examples: The "Basic Image Classification" task uses randomly generated pixel data, not real images. It's for demonstrating the CNN pipeline structure on CPU.
	* Experimental: This is a tool for learning and experimentation, not for production-grade model training.
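
To judge in advance whether a network crosses the >100k-parameter mark or a dataset approaches the RAM limit, a rough back-of-the-envelope calculation helps. The sketch below is plain Python; the helper names and the example layer sizes are hypothetical, and the memory figure assumes a dense table of 8-byte (float64) values with no other overhead.

```python
def count_mlp_params(n_inputs, hidden_layers, n_outputs):
    """Count weights and biases of a fully connected network."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    # Each dense layer has (fan_in * fan_out) weights plus fan_out biases.
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


def dense_array_megabytes(n_rows, n_features, bytes_per_value=8):
    """Approximate in-memory size of a dense numeric table (float64 by default)."""
    return n_rows * n_features * bytes_per_value / 1e6


small = count_mlp_params(20, [128, 64], 2)
large = count_mlp_params(20, [512, 256], 2)
print(small)  # 11074
print(large)  # 142594
print(dense_array_megabytes(1_000_000, 100))  # 800.0
```

With 20 input features, hidden layers of 128 and 64 units stay near the 10k end of the range, while 512 and 256 units already exceed 100k parameters.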

	## How to Use

	1. Tab 1: Define Task & Model
	    * Select your desired Task Type (e.g., Tabular Classification).
	    * Choose a Model Family (Scikit-learn or PyTorch).
	    * Select the Specific Model.
	    * If using PyTorch NNs:
	        * Select a Target Parameter Range (e.g., "Small (10k-50k)").
	        * For MLPs, configure Hidden Layers or use the "Suggest MLP Layers" button. Define a dataset in Tab 2 first so the input and output dimensions can be estimated more accurately.

	2. Tab 2: Configure Dataset
	    * Choose to Generate a new dataset or Upload your own.
	        * Generation: Specify rows, features, etc., or use the "AI suggest" checkbox.
	        * Upload: Provide your CSV, JSON, or Parquet file.
	    * Enter the Target Column Name from your dataset.
	    * Click "Generate & Preview Dataset" or let the upload complete.

	3. Tab 3: Train Model & Get Results
	    * Adjust Training Hyperparameters (Epochs, Batch Size, Learning Rate). These apply primarily to NNs.
	    * Select the desired Model Output Format.
	    * Click "🚀 Train Model".
	    * Monitor the Training Log.
	    * View Evaluation Metrics, Model Parameters, and (for PyTorch) a Loss Curve.
	    * Download your trained model using the Download Trained Model button.
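
For readers curious how the Tab 3 hyperparameters interact, here is a minimal sketch of a CPU training loop for a small PyTorch MLP. It is not the code in `app.py`: the stand-in data, the layer sizes, and the variable names (`loss_history`, `accuracy`) are assumptions made for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in data: 256 rows, 10 features, binary labels.
X = torch.randn(256, 10)
y = (X[:, 0] + X[:, 1] > 0).long()

# Hyperparameters named after the Tab 3 controls.
epochs, batch_size, learning_rate = 5, 32, 1e-2

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
loss_fn = nn.CrossEntropyLoss()

loss_history = []  # one mean loss per epoch, i.e. the points of a loss curve
for epoch in range(epochs):
    running = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        running += loss.item() * xb.size(0)  # weight by batch size
    loss_history.append(running / len(X))
    print(f"epoch {epoch + 1}: loss={loss_history[-1]:.4f}")

# Accuracy on the training data, computed without tracking gradients.
with torch.no_grad():
    accuracy = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"train accuracy: {accuracy:.3f}")
```

More epochs or a smaller batch size mean more optimizer steps, which is why training time on CPU grows quickly with either setting.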

	## Model Output Formats

	* Scikit-learn:
	    * `.pkl`: Python pickle file containing the Scikit-learn pipeline (preprocessor + model).
	    * `.onnx`: Open Neural Network Exchange format. The exported ONNX model includes the preprocessing steps and expects raw input matching the original training data structure.
	* PyTorch:
	    * `.pt`: PyTorch file. For MLPs trained on tabular data, this bundles the model's `state_dict` and the Scikit-learn `preprocessor` used. For CNNs, the file typically contains only the `state_dict`.
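
As an illustration of the `.pkl` format, the sketch below saves and reloads a Scikit-learn pipeline with the standard `pickle` module. The step names, the choice of `StandardScaler` and `LogisticRegression`, and the file name `model.pkl` are assumptions; the pipeline exported by this Space may be structured differently.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)

# Preprocessor and model live in one object, so raw inputs can be passed at predict time.
pipeline = Pipeline([
    ("preprocessor", StandardScaler()),
    ("model", LogisticRegression()),
]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(pipeline, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored pipeline should reproduce the original predictions exactly.
same_predictions = bool((restored.predict(X) == pipeline.predict(X)).all())
print(same_predictions)
```

Because unpickling can execute code, only load `.pkl` files that come from a source you trust, and use a Scikit-learn version compatible with the one that created the file.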

	## Want More Power? Clone & Upgrade!

	If training is too slow or you hit resource limits:

	1. Go to this Space's main page.
	2. Click the three dots (⋮) menu and select "Duplicate this Space."
	3. On the creation page, choose upgraded Space Hardware (e.g., better CPU or a GPU - these are paid options).
	4. Create your new, more powerful Space! (You'll likely need to re-upload/re-generate data).

	## Development & Contributions

	This Space is built with Python, Gradio, Scikit-learn, and PyTorch.
	* Main Application Logic: `app.py`
	* Dependencies: `requirements.txt`

	Feel free to explore the code, suggest improvements, or report issues!

	## License

	This project is licensed under the Apache License 2.0. See the `LICENSE` file for details.