FutureMa/MatroidNN

FutureMa committed on Feb 27, 2025

Commit d12bc5f (verified) · 1 parent: 8bafb01

Update README.md


README.md +131 -3


@@ -1,3 +1,131 @@
----
-license: apache-2.0
----

+# Model Card for MatroidNN
+## Model Details
+### Model Description
+- **Model type:** Neural Network with Matroid-based Feature Selection (MatroidNN)
+- **Version:** 1.0
+- **Framework:** PyTorch
+- **Last updated:** February 27, 2025
+### Overview
+MatroidNN is a neural network architecture that uses matroid theory for feature selection. To reduce feature redundancy, it selects a maximal independent set of features before the network is trained, with independence judged by correlation-based dependency analysis.
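As a rough illustration of the selection idea, the sketch below greedily builds a set of features in which no pair has an absolute Pearson correlation at or above a threshold. It is an assumption about how a correlation-based independence test could work, not the actual `MatroidFeatureSelector` implementation. The names `pearson` and `greedy_independent_features` are hypothetical, and a pairwise-correlation test is a heuristic that does not necessarily satisfy the matroid exchange axiom, so the result is maximal rather than guaranteed maximum.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def greedy_independent_features(columns, threshold=0.7):
    """Return indices of a maximal set of columns whose pairwise |r| stays below threshold."""
    selected = []
    for j, col in enumerate(columns):
        # Keep column j only if it is weakly correlated with everything kept so far.
        if all(abs(pearson(col, columns[k])) < threshold for k in selected):
            selected.append(j)
    return selected

# Toy data: column 1 is an exact multiple of column 0; column 2 is only weakly related.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [2.0, 1.0, 4.0, 3.0, 2.0],
]
selected = greedy_independent_features(columns, threshold=0.7)
print(selected)  # [0, 2]: the redundant column 1 is dropped
```

Because the scan is order-dependent, a different column ordering can keep a different (equally valid) subset.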
+### Model Architecture
+- **Feature Selection Component**: MatroidFeatureSelector using correlation-based dependency analysis
+- **Neural Network**: 3-layer feedforward network with batch normalization and dropout
+- **Input**: Varies based on the number of features selected by the matroid selector
+- **Hidden Layers**: Configurable hidden layer sizes (default 64 → 32)
+- **Output**: Multi-class classification (configurable number of classes)
+- **Parameters**: ~5K-10K parameters (varies based on input/output dimensions)
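To make the dependence of the parameter count on input width concrete, here is a small counting sketch. It assumes a layout of Linear followed by BatchNorm for each hidden layer (64, then 32) and a final Linear output layer; that layout and the helper name `count_parameters` are assumptions for illustration, and the real figure depends on the actual layer arrangement.

```python
def count_parameters(input_size, hidden_sizes=(64, 32), output_size=3, batch_norm=True):
    """Count trainable parameters of a fully connected classifier.

    Assumes every hidden layer is a Linear layer optionally followed by batch
    normalization (one scale and one shift per unit). Dropout adds no parameters.
    """
    total = 0
    prev = input_size
    for width in hidden_sizes:
        total += prev * width + width  # Linear weights + biases
        if batch_norm:
            total += 2 * width         # batch-norm scale and shift
        prev = width
    total += prev * output_size + output_size  # output Linear layer
    return total

# Compare two input widths: all 20 default features kept vs. 10 kept after selection.
params_20 = count_parameters(input_size=20)
params_10 = count_parameters(input_size=10)
print(params_20 - params_10)  # difference attributable to 10 extra input features
```

Under this layout each additional input feature adds 64 parameters (one weight per first-layer unit), which is why the total depends on how many features the selector keeps.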
+## Uses
+### Direct Use
+MatroidNN is designed for classification tasks where feature redundancy is a potential issue. It's particularly useful for:
+- High-dimensional datasets with correlated features
+- Feature selection in biological/medical data
+- Financial prediction with multicollinear variables
+- Any classification task where feature independence is desired
+### Out-of-Scope Use
+This model is not intended for:
+- Regression tasks (without modification)
+- Time series prediction (without temporal adaptations)
+- Raw image or text classification (without appropriate feature extraction)
+## Training Data
+The model was developed and tested using synthetic data with deliberate feature dependencies. For real-world applications, the model should be retrained on domain-specific data.
+### Training Dataset
+- **Type**: Synthetic data with controlled dependencies
+- **Size**: 1000 samples (default), configurable
+- **Features**: 20 initial features (default), configurable
+- **Classes**: 3 classes (default), configurable
+- **Distribution**: Equal class distribution in the synthetic data
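One way to produce data of this shape is sketched below using only the standard library. The generator is hypothetical: `make_synthetic`, `n_informative`, and `noise` are illustrative names, and building redundant features as noisy copies of informative ones is an assumption, not a description of how the original dataset was created.

```python
import random

def make_synthetic(n_samples=1000, n_features=20, n_informative=5,
                   n_classes=3, noise=0.1, seed=0):
    """Generate rows with a few informative features plus noisy copies of them."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % n_classes  # round-robin labels give near-equal class counts
        # Informative features: Gaussians whose mean shifts with the class label.
        base = [rng.gauss(label * (1.0 if j % 2 == 0 else -1.0), 1.0)
                for j in range(n_informative)]
        # Dependent features: noisy copies of informative ones (deliberate redundancy).
        dependent = [base[j % n_informative] + rng.gauss(0.0, noise)
                     for j in range(n_features - n_informative)]
        X.append(base + dependent)
        y.append(label)
    return X, y

X, y = make_synthetic()
class_counts = [y.count(c) for c in range(3)]
print(len(X), len(X[0]), class_counts)
```

With 1000 samples and 3 classes the split cannot be exactly equal, so round-robin labelling leaves one class with a single extra sample.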
+## Performance
+### Metrics
+On synthetic test data with 3 classes:
+- **Accuracy**: 94.0%
+- **Macro-average F1-score**: 0.93
+- **Per-class metrics**:
+  - Class 0: Precision 0.96, Recall 1.00, F1 0.98
+  - Class 1: Precision 0.86, Recall 0.86, F1 0.86
+  - Class 2: Precision 0.97, Recall 0.93, F1 0.95
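The reported figures can be cross-checked with a few lines of arithmetic: each F1 score is the harmonic mean of that class's precision and recall, and the macro average is the unweighted mean of the per-class F1 scores. The sketch below recomputes them from the rounded values listed above, so agreement is only expected to two decimal places.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) per class, copied from the metrics listed above.
per_class = {0: (0.96, 1.00), 1: (0.86, 0.86), 2: (0.97, 0.93)}

f1_by_class = {c: f1_score(p, r) for c, (p, r) in per_class.items()}
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)

for c, f1 in f1_by_class.items():
    print(f"class {c}: F1 = {f1:.2f}")
print(f"macro F1 = {macro_f1:.2f}")  # 0.93
```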
+### Factors
+Performance may vary based on:
+- Feature correlation structure in the dataset
+- Number of initial features and their information content
+- Class distribution balance
+- Rank threshold parameter in the MatroidFeatureSelector
+## Limitations
+- The matroid-based feature selection uses correlation as a proxy for independence, which may not capture all forms of dependency
+- The current implementation assumes numerical features and may require adaptation for categorical features
+- Feature selection is performed once before training and does not adapt during training
+- The rank threshold parameter requires careful tuning based on the dataset
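The first limitation can be seen in a tiny example: a feature that is fully determined by another can still have zero Pearson correlation with it when the relationship is nonlinear. The helper `pearson` below is a plain-Python stand-in written for this illustration, not part of the MatroidNN code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]  # y is a deterministic function of x

r = pearson(x, y)
print(r)  # 0.0: a correlation-based test would treat x and y as independent
```

A correlation-based selector would keep both columns here even though the second carries no information beyond the first.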
+## Ethical Considerations
+- Feature selection might unintentionally exclude features that are important for fairness considerations
+- The model inherits any biases present in the training data
+- In high-stakes applications, results should be interpreted with caution and reviewed by a human
+## Technical Specifications
+### Hardware Requirements
+- Training: CUDA-capable GPU recommended for larger datasets
+- Inference: CPU sufficient for most applications
+### Software Requirements
+- Python 3.8+
+- PyTorch 1.8+
+- NumPy 1.20+
+- scikit-learn 0.24+
+### Training Hyperparameters
+- **Batch size**: 32 (default)
+- **Learning rate**: 0.001 (default)
+- **Optimizer**: Adam
+- **Loss function**: Cross-Entropy Loss
+- **Epochs**: Early stopping based on validation loss (patience=10)
+- **Feature selection rank threshold**: 0.7 (default, configurable)
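The early-stopping rule can be sketched independently of any training loop. The function below is a hypothetical helper (`early_stopping_epoch` is not part of the MatroidNN package) that assumes "improvement" means a strictly lower validation loss than the best seen so far, and reports the epoch at which training would halt.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs. If that never happens, the index
    of the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1

# Loss improves for five epochs (indices 0-4), then plateaus slightly higher.
losses = [1.0, 0.8, 0.6, 0.5, 0.45] + [0.46] * 20
stop = early_stopping_epoch(losses, patience=10)
print(stop)  # 14: ten epochs after the best loss at epoch 4
```

In practice the weights from the best epoch (epoch 4 in this example) would typically be restored after stopping.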
+## How to Use
+```python
+from matroid_nn import MatroidFeatureSelector, MatroidNN
+# Initialize feature selector (X_train, X_test, and num_classes are assumed to be defined beforehand)
+selector = MatroidFeatureSelector(rank_threshold=0.7)
+# Apply feature selection
+X_train_selected = selector.fit_transform(X_train)
+X_test_selected = selector.transform(X_test)
+# Create and train model
+model = MatroidNN(
+    input_size=X_train_selected.shape[1],
+    hidden_size=64,
+    output_size=num_classes
+)