---
license: apache-2.0
---
# Model Card for MatroidNN
## Model Details
### Model Description
- **Model type:** Neural network with matroid-based feature selection (MatroidNN)
- **Version:** 1.0
- **Framework:** PyTorch
- **Last updated:** February 27, 2025
### Overview
MatroidNN is a neural network architecture that uses matroid theory for feature selection. It addresses feature redundancy by selecting a maximal independent set of features (in matroid terms, a set to which no further feature can be added without introducing a dependency) before the neural network is trained.
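To make "maximal independent set" concrete, here is a minimal sketch using the linear (column) matroid, where a set of features counts as independent if its columns are linearly independent, together with the classic matroid greedy algorithm. This is an assumed stand-in for illustration only: the card describes `MatroidFeatureSelector` as correlation-based, and `greedy_basis` is a hypothetical helper, not part of the `matroid_nn` package.

```python
import numpy as np

def greedy_basis(X):
    """Return column indices of a maximal independent set (a basis) of X's columns.

    Hypothetical illustration using the linear (column) matroid: a set of
    features is independent when its columns are linearly independent.
    """
    selected = []
    rank = 0
    for j in range(X.shape[1]):
        candidate = selected + [j]
        new_rank = np.linalg.matrix_rank(X[:, candidate])
        # Greedy matroid step: keep feature j only if the enlarged set is
        # still independent, i.e. adding it increases the rank.
        if new_rank > rank:
            selected, rank = candidate, new_rank
    return selected

# Toy data: column 2 is the exact sum of columns 0 and 1, so it is dependent.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
X = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + A[:, 1], A[:, 2]])
print(greedy_basis(X))  # [0, 1, 3]
```

Because every basis of a matroid has the same size, the number of selected columns always equals `np.linalg.matrix_rank(X)`, whatever order the features are scanned in; which particular features are kept does depend on that order.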
### Model Architecture
- **Feature Selection Component**: MatroidFeatureSelector using correlation-based dependency analysis
- **Neural Network**: 3-layer feedforward network with batch normalization and dropout
- **Input**: Input dimension equals the number of features selected by the matroid selector
- **Hidden Layers**: Configurable hidden layer sizes (default 64 → 32)
- **Output**: Multi-class classification (configurable number of classes)
- **Parameters**: ~5K-10K parameters (varies based on input/output dimensions)
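Since the parameter count depends on the input and output sizes, a small helper can compute it for a given configuration. This sketch assumes each hidden layer is a fully connected layer followed by batch normalization with learnable scale and shift (dropout adds no parameters); `count_parameters` is a hypothetical name, and the exact layer layout of `MatroidNN` is not confirmed by the card.

```python
def count_parameters(input_size, output_size, hidden_sizes=(64, 32), batch_norm=True):
    """Count trainable parameters of an MLP: [Linear -> BatchNorm -> Dropout] x N -> Linear.

    Assumes batch norm has an affine scale (gamma) and shift (beta) per unit;
    running mean/variance are buffers, not trainable parameters.
    """
    total = 0
    prev = input_size
    for h in hidden_sizes:
        total += prev * h + h      # fully connected weights + biases
        if batch_norm:
            total += 2 * h         # gamma and beta for each hidden unit
        prev = h
    total += prev * output_size + output_size  # output layer weights + biases
    return total

print(count_parameters(input_size=20, output_size=3))  # 3715
```

Under these assumptions, each additional selected input feature adds 64 parameters (one weight per unit in the first hidden layer), so the total scales linearly with the number of features the selector keeps.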
## Uses
### Direct Use
MatroidNN is designed for classification tasks where feature redundancy is a potential issue. It's particularly useful for:
- High-dimensional datasets with correlated features
- Feature selection in biological/medical data
- Financial prediction with multicollinear variables
- Any classification task where feature independence is desired
### Out-of-Scope Use
This model is not intended for:
- Regression tasks (without modification)
- Time series prediction (without temporal adaptations)
- Raw image or text classification (without appropriate feature extraction)
## Training Data
The model was developed and tested using synthetic data with deliberate feature dependencies. For real-world applications, the model should be retrained on domain-specific data.
### Training Dataset
- **Type**: Synthetic data with controlled dependencies
- **Size**: 1000 samples (default), configurable
- **Features**: 20 initial features (default), configurable
- **Classes**: 3 classes (default), configurable
- **Distribution**: Equal class distribution in the synthetic data
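The card does not describe how the synthetic data was generated. The following is a hypothetical generator with the same default shape (1000 samples, 20 features, 3 balanced classes), in which half of the features are noisy linear combinations of the other half. `make_dependent_data` and all of its parameters are assumptions made for illustration.

```python
import numpy as np

def make_dependent_data(n_samples=1000, n_base=10, n_dependent=10, n_classes=3,
                        noise=0.05, seed=0):
    """Hypothetical synthetic dataset with deliberate linear feature dependencies."""
    rng = np.random.default_rng(seed)
    # Balanced labels: cycle through the classes, then shuffle.
    y = rng.permutation(np.arange(n_samples) % n_classes)
    # Class-specific means give the base features predictive signal.
    class_means = rng.normal(scale=2.0, size=(n_classes, n_base))
    X_base = rng.normal(size=(n_samples, n_base)) + class_means[y]
    # Dependent features: random linear combinations of the base features plus small noise.
    mixing = rng.normal(size=(n_base, n_dependent))
    X_dep = X_base @ mixing + noise * rng.normal(size=(n_samples, n_dependent))
    return np.hstack([X_base, X_dep]), y

X, y = make_dependent_data()
print(X.shape, np.bincount(y).tolist())  # (1000, 20) [334, 333, 333]
```

Because 1000 is not divisible by 3, "equal" class sizes here are as close as possible rather than exactly identical.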
## Performance
### Metrics
On synthetic test data with 3 classes:
- **Accuracy**: 94.0%
- **Macro-average F1-score**: 0.93
- **Per-class metrics**:
  - Class 0: Precision 0.96, Recall 1.00, F1 0.98
  - Class 1: Precision 0.86, Recall 0.86, F1 0.86
  - Class 2: Precision 0.97, Recall 0.93, F1 0.95
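The reported F1 scores can be checked against the per-class precision and recall: each class's F1 is the harmonic mean of its precision and recall, and the macro-average is the unweighted mean of the per-class F1 scores. The sketch below recomputes them from the rounded values listed above.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Per-class (precision, recall) pairs as reported in this card.
per_class = {0: (0.96, 1.00), 1: (0.86, 0.86), 2: (0.97, 0.93)}
f1_scores = {c: f1(p, r) for c, (p, r) in per_class.items()}
macro_f1 = sum(f1_scores.values()) / len(f1_scores)  # unweighted mean over classes
print({c: round(s, 2) for c, s in f1_scores.items()}, round(macro_f1, 2))
# prints {0: 0.98, 1: 0.86, 2: 0.95} 0.93
```

Since the inputs are already rounded to two decimals, this only confirms the reported figures are mutually consistent to that precision.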
### Factors
Performance may vary based on:
- Feature correlation structure in the dataset
- Number of initial features and their information content
- Class distribution balance
- The `rank_threshold` parameter of `MatroidFeatureSelector`
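The card does not specify how the rank threshold is applied. As a hypothetical illustration of why the threshold matters, this sketch assumes it acts as an upper bound on the absolute pairwise Pearson correlation between kept features and shows how the selected set shrinks as the threshold drops. `correlation_select` and its `threshold` argument are assumptions, not the `matroid_nn` API.

```python
import numpy as np

def correlation_select(X, threshold=0.7):
    """Greedily keep each feature whose |Pearson correlation| with every
    already-kept feature is below `threshold` (hypothetical rule)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in selected):
            selected.append(j)
    return selected

rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
X = np.column_stack([
    a,
    a + 0.5 * rng.normal(size=n),  # correlation with a is about 0.89
    a + 1.5 * rng.normal(size=n),  # correlation with a is about 0.55
    rng.normal(size=n),            # independent of a
])
for t in (0.5, 0.7, 0.95):
    print(t, correlation_select(X, threshold=t))
```

A pairwise-threshold rule like this is downward-closed, but in general it does not satisfy the matroid exchange property, so the number of kept features can depend on the order in which features are scanned.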
## Limitations
- The matroid-based feature selection uses pairwise correlation as a proxy for independence, which may miss nonlinear or higher-order dependencies among features
- The current implementation assumes numerical features and may require adaptation for categorical features
- Feature selection is performed once before training and does not adapt during training
- The rank threshold parameter requires careful tuning based on the dataset
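The first limitation can be seen with a standard counterexample: a feature that is a deterministic but nonlinear function of another can have near-zero correlation with it, so a correlation-based rule would treat the pair as independent.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=10_000)
y = x ** 2  # y is completely determined by x
r = np.corrcoef(x, y)[0, 1]
# For x symmetric about 0, the population correlation of x and x**2 is exactly 0,
# so the sample value is close to 0 despite the perfect dependency.
print(round(r, 3))
```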
## Ethical Considerations
- Feature selection might unintentionally exclude features that are important for fairness considerations
- The model inherits any biases present in the training data
- Results should be interpreted with caution in high-stakes applications, with human oversight
## Technical Specifications
### Hardware Requirements
- Training: CUDA-capable GPU recommended for larger datasets
- Inference: CPU sufficient for most applications
### Software Requirements
- Python 3.8+
- PyTorch 1.8+
- NumPy 1.20+
- scikit-learn 0.24+
### Training Hyperparameters
- **Batch size**: 32 (default)
- **Learning rate**: 0.001 (default)
- **Optimizer**: Adam
- **Loss function**: Cross-Entropy Loss
- **Epochs**: Determined by early stopping on validation loss (patience = 10)
- **Feature selection rank threshold**: 0.7 (default, configurable)
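Early stopping with a patience of 10 means training halts once the validation loss has failed to improve for 10 consecutive epochs. A minimal framework-agnostic sketch follows; `EarlyStopping` and its `min_delta` argument are hypothetical, and the card does not say whether the best weights are restored afterwards.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as an improvement
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience

# Simulated validation losses: improvement for 3 epochs, then a plateau.
stopper = EarlyStopping(patience=10)
val_losses = [1.0, 0.8, 0.7] + [0.75] * 12
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        break
print(epoch, stopper.best_loss)  # 12 0.7
```

In a PyTorch loop, `step` would be called after each validation pass; a common companion pattern is to save the model's `state_dict()` whenever `best_loss` improves.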
## How to Use
```python
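from matroid_nn import MatroidFeatureSelector, MatroidNN

# X_train, X_test and num_classes are assumed to be defined already

# Initialize the feature selector
selector = MatroidFeatureSelector(rank_threshold=0.7)

# Fit the selector on the training data only, then apply the same selection to the test data
X_train_selected = selector.fit_transform(X_train)
X_test_selected = selector.transform(X_test)

# Create and train the model; input_size must match the number of selected features
model = MatroidNN(
    input_size=X_train_selected.shape[1],
    hidden_size=64,
    output_size=num_classes
)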