Commit b1c6c9b (verified) by szili2011 · 1 Parent(s): a3a39d5

Update README.md

Files changed (1)
  1. README.md +73 -1
@@ -11,4 +11,76 @@ license: apache-2.0
  short_description: You can train any simple models
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🧠 Universal CPU AI Trainer ⚙️
+
+ Welcome to the Universal CPU AI Trainer! This Hugging Face Space allows you to:
+
+ * **Define AI Tasks:** Choose from Tabular Classification, Tabular Regression, or basic Image Classification.
+ * **Select Model Families:** Experiment with classical Scikit-learn models or simple PyTorch Neural Networks (MLPs, basic CNNs).
+ * **Configure Datasets:**
+     * Generate synthetic datasets with configurable rows, features, and characteristics.
+     * Let the "AI Assistant" (heuristic rules) suggest dataset parameters.
+     * Upload your own datasets (CSV, JSON, Parquet).
+ * **Design Neural Networks:** For PyTorch MLPs, specify hidden layers and get layer suggestions that fit a target parameter range (10k to 1M).
+ * **Train Models on CPU:** All training happens on the free CPU tier. Be patient with larger models or datasets!
+ * **Evaluate & Download:** Get basic evaluation metrics and download your trained models (PKL, ONNX for Scikit-learn; PT for PyTorch).
+
+ **⚠️ Important Considerations for CPU Training:**
+
+ * **Performance:** This Space runs on a **free CPU tier**. Training complex models (especially Neural Networks with >100k parameters) or training on large datasets will be **SLOW**. An epoch can take minutes to hours.
+ * **Memory Limits:** The free tier has limited RAM (~15 GB). Very large datasets or models might cause the Space to crash.
+ * **Toy Examples:** The "Basic Image Classification" task uses randomly generated pixel data, not real images. It exists to demonstrate the CNN pipeline structure on CPU.
+ * **Experimental:** This is a tool for learning and experimentation, not for production-grade model training.
+
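The memory limit above can be turned into a rough pre-check before generating or uploading a dataset. The sketch below is not part of the Space's code: `estimate_dense_gb`, the 8-bytes-per-value figure (float64), and the one-third headroom rule are all assumptions chosen for illustration, and real usage depends on dtypes, preprocessing copies, and the model itself.

```python
def estimate_dense_gb(rows: int, features: int, bytes_per_value: int = 8) -> float:
    """Approximate size in GB (10**9 bytes) of a dense numeric table.

    Assumes every cell is stored as one fixed-width number (float64 by default).
    """
    return rows * features * bytes_per_value / 1e9


RAM_BUDGET_GB = 15   # approximate free-tier figure quoted in the README
HEADROOM = 1 / 3     # assumed safety factor: leave room for copies made during training

for rows, features in [(100_000, 50), (5_000_000, 200)]:
    size_gb = estimate_dense_gb(rows, features)
    verdict = "likely fine" if size_gb <= RAM_BUDGET_GB * HEADROOM else "risky"
    print(f"{rows:>9,} rows x {features:>3} features: {size_gb:.2f} GB ({verdict})")
```

A table that already takes several gigabytes on its own leaves little room for the intermediate arrays that preprocessing and training create, which is why the sketch flags it well before the raw size reaches the RAM limit.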
35
+ ## How to Use
36
+
37
+ 1. **Tab 1: Define Task & Model**
38
+ * Select your desired **Task Type** (e.g., Tabular Classification).
39
+ * Choose a **Model Family** (Scikit-learn or PyTorch).
40
+ * Select the **Specific Model**.
41
+ * If using PyTorch NNs:
42
+ * Select a **Target Parameter Range** (e.g., "Small (10k-50k)").
43
+ * For MLPs, configure **Hidden Layers** or use the "Suggest MLP Layers" button (after defining a dataset in Tab 2 for better dimension estimates).
44
+
45
+ 2. **Tab 2: Configure Dataset**
46
+ * Choose to **Generate** a new dataset or **Upload** your own.
47
+ * **Generation:** Specify rows, features, etc., or use the "AI suggest" checkbox.
48
+ * **Upload:** Provide your CSV, JSON, or Parquet file.
49
+ * Enter the **Target Column Name** from your dataset.
50
+ * Click "Generate & Preview Dataset" or let the upload complete.
51
+
52
+ 3. **Tab 3: Train Model & Get Results**
53
+ * Adjust **Training Hyperparameters** (Epochs, Batch Size, Learning Rate - primarily for NNs).
54
+ * Select the desired **Model Output Format**.
55
+ * Click "🚀 Train Model".
56
+ * Monitor the **Training Log**.
57
+ * View **Evaluation Metrics**, **Model Parameters**, and (for PyTorch) a **Loss Curve**.
58
+ * Download your trained model using the **Download Trained Model** button.
59
+
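To judge whether a set of hidden layers lands in a chosen Target Parameter Range, the parameter count of a plain fully connected MLP can be worked out by hand. The sketch below assumes each layer is a dense layer with a bias term and nothing else (no batch norm or embeddings); `mlp_param_count` is a hypothetical helper for illustration, not a function from `app.py`, and the Space's own suggestion logic may differ.

```python
def mlp_param_count(input_dim: int, hidden_layers: list[int], output_dim: int) -> int:
    """Count weights and biases of a fully connected MLP.

    Each dense layer from n_in to n_out units has n_in * n_out weights
    plus n_out biases.
    """
    sizes = [input_dim, *hidden_layers, output_dim]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Example: 20 input features, hidden layers of 128 and 64 units, 2 classes.
total = mlp_param_count(20, [128, 64], 2)
in_small_range = 10_000 <= total <= 50_000  # the "Small (10k-50k)" range
print(total, in_small_range)
```

For this example the three layers contribute 2,688, 8,256, and 130 parameters, so the total of 11,074 falls inside the "Small (10k-50k)" range.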
+ ## Model Output Formats
+
+ * **Scikit-learn:**
+     * `.pkl`: Python pickle file containing the Scikit-learn pipeline (preprocessor + model).
+     * `.onnx`: Open Neural Network Exchange format. The exported ONNX model includes the preprocessing steps and expects raw input matching the original training data structure.
+ * **PyTorch:**
+     * `.pt`: PyTorch file. For MLPs trained on tabular data, this bundles the model's `state_dict` and the Scikit-learn `preprocessor` used. For CNNs, it's typically the `state_dict`.
+
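Because the `.pkl` file holds the whole pipeline, the restored object should accept raw feature rows and apply preprocessing itself. The following is a minimal sketch of that round trip, assuming scikit-learn is installed. The pipeline here is a stand-in built on the spot: the step names `preprocessor` and `model`, the `StandardScaler`, and the `LogisticRegression` are illustrative assumptions, not the Space's actual pipeline.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for a pipeline trained in the Space (preprocessor + model).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
pipeline = Pipeline([
    ("preprocessor", StandardScaler()),
    ("model", LogisticRegression(max_iter=200)),
])
pipeline.fit(X, y)

# Serialize and restore in memory. With a downloaded file you would
# open it in binary mode and call pickle.load on the file object instead.
blob = pickle.dumps(pipeline)
restored = pickle.loads(blob)

# The restored pipeline takes raw rows; scaling happens inside it.
original_predictions = pipeline.predict(X[:5])
restored_predictions = restored.predict(X[:5])
print(restored_predictions)
```

Pickle files execute code when loaded, so only unpickle files you produced yourself or otherwise trust, and load them with a scikit-learn version compatible with the one that created them.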
+ ## Want More Power? Clone & Upgrade!
+
+ If training is too slow or you hit resource limits:
+
+ 1. Go to this Space's main page.
+ 2. Click the three dots (⋮) menu and select "Duplicate this Space."
+ 3. On the creation page, choose upgraded **Space Hardware** (e.g., a better CPU or a GPU). These are paid options.
+ 4. Create your new, more powerful Space! (You'll likely need to re-upload or regenerate your data.)
+
+ ## Development & Contributions
+
+ This Space is built with Python, Gradio, Scikit-learn, and PyTorch.
+ * **Main Application Logic:** `app.py`
+ * **Dependencies:** `requirements.txt`
+
+ Feel free to explore the code, suggest improvements, or report issues!
+
+ ## License
+ This project is licensed under the **Apache License 2.0**. See the `LICENSE` file for details.