VilaVision/mutualfundclassification · Text Classification

alokpandey committed on Nov 5, 2024 · commit 630a153 (verified) · parent aad91f9 · Update README.md

---
license: mit
language:
- en
pipeline_tag: text-classification
tags:
- finance
---
# Model Card: Fund Predictor Pipeline Model

## Model Overview

This is a scikit-learn pipeline that predicts mutual fund performance as a class label from one numerical feature (AUM) and five categorical features. It chains preprocessing (scaling, one-hot encoding, and feature selection) with a Random Forest classifier and is intended for analysis of tabular fund data.

## Model Architecture

The model uses a two-branch preprocessing pipeline followed by a Random Forest classifier:
### Preprocessing Pipeline

1. **Numerical Features Branch**
   - Features: ['AUM']
   - Transformation: StandardScaler
2. **Categorical Features Branch**
   - Features: ['AMC', 'Fund Category', 'Sub-Scheme', 'Investment Type', 'Growth Option']
   - Transformations, applied in order:
     - OneHotEncoder (dense output; unknown categories are ignored)
     - Feature selection (SelectKBest with mutual_info_classif, k=30)
### Classifier

- **Model**: RandomForestClassifier
- **Key Parameters**:
  - n_estimators: 30
  - max_depth: 20
  - min_samples_split: 10
  - min_samples_leaf: 5
  - n_jobs: -1 (trees are built in parallel on all available CPU cores)
  - random_state: 42
## Use Cases

- Mutual fund performance prediction
- Investment strategy optimization
- Portfolio management
- Risk assessment
## Model Parameters

### Preprocessing Configuration

- **Numerical Features**:
  - StandardScaler with default parameters
  - Centers each feature to zero mean and scales it to unit variance
- **Categorical Features**:
  - OneHotEncoder:
    - handle_unknown: 'ignore'
    - sparse_output: False
    - dtype: numpy.float64
  - Feature Selection:
    - Method: SelectKBest with mutual_info_classif
    - Number of features kept: 30 (k=30)
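
With default parameters, StandardScaler computes `z = (x - mean) / std` for each column, using the training mean and the population standard deviation (ddof=0). The check below uses made-up AUM values, not data from this model, to show that the scaler and the formula agree.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical AUM values (arbitrary units) as a single column.
aum = np.array([[120.0], [450.0], [80.0], [1500.0], [300.0]])

scaler = StandardScaler()  # defaults: with_mean=True, with_std=True
scaled = scaler.fit_transform(aum)

# Same result by hand: subtract the column mean, divide by the population
# standard deviation (numpy's default ddof=0).
manual = (aum - aum.mean(axis=0)) / aum.std(axis=0)

print(np.allclose(scaled, manual))  # True
print(scaler.mean_)  # [490.]
```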
### Random Forest Configuration

- **Tree Structure**:
  - Maximum depth: 20
  - Minimum samples to split a node: 10
  - Minimum samples per leaf: 5
- **Ensemble Settings**:
  - Number of trees: 30
  - Features considered per split (max_features): 'sqrt' (the default; equivalent to the older 'auto' setting)
  - Bootstrap: True
  - Criterion: gini
## Technical Details

### File Information

- **Model Path**: C:\Users\alokp\models\fund_predictor_model_20241103_230654.joblib
- **Model Type**: scikit-learn Pipeline
- **Last Updated**: November 3, 2024
### Input Features

1. **Numerical Features**:
   - AUM (Assets Under Management)
2. **Categorical Features**:
   - AMC (Asset Management Company)
   - Fund Category
   - Sub-Scheme
   - Investment Type
   - Growth Option
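
Because the preprocessing selects input columns by name, prediction input needs exactly these column names. The helper `prepare_fund_input` below is hypothetical, as are the sample values and the extra `Scheme Name` column; it is a sketch of one way to validate a pandas DataFrame before passing it to the pipeline, not part of the published model.

```python
import pandas as pd

NUMERICAL_FEATURES = ["AUM"]
CATEGORICAL_FEATURES = ["AMC", "Fund Category", "Sub-Scheme",
                        "Investment Type", "Growth Option"]
REQUIRED_COLUMNS = NUMERICAL_FEATURES + CATEGORICAL_FEATURES


def prepare_fund_input(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: check required columns and return them in a fixed order."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing input columns: {missing}")
    prepared = df[REQUIRED_COLUMNS].copy()  # extra columns are dropped
    # AUM feeds StandardScaler, so it must be numeric; bad values raise here.
    prepared["AUM"] = pd.to_numeric(prepared["AUM"], errors="raise")
    return prepared


# One hypothetical fund with illustrative values only.
sample = pd.DataFrame([{
    "Scheme Name": "Example Fund",
    "AUM": "1250.5",
    "AMC": "Example AMC",
    "Fund Category": "Equity",
    "Sub-Scheme": "Large Cap",
    "Investment Type": "Direct",
    "Growth Option": "Growth",
}])
prepared = prepare_fund_input(sample)
print(list(prepared.columns))
print(prepared["AUM"].dtype)  # float64
```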
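## Limitations and Considerations

- Feature selection scores each one-hot column individually with mutual_info_classif, so columns that are informative only in combination with others may be discarded.
- Only the 30 highest-scoring one-hot columns are kept; all other encoded categories are dropped before the classifier sees them. AUM is not subject to this selection.
- Categories not seen during training are encoded as all zeros (handle_unknown='ignore'), so the model gets no information from them and predictions for such funds may be less reliable.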
## Usage Notes

- Trees are fitted and evaluated in parallel (n_jobs=-1)
- Categories unseen during training are handled without errors (encoded as all zeros)
- AUM is standardized before it reaches the classifier
- The fitted pipeline is serialized with joblib so it can be reloaded for production use
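
A joblib round trip can be sketched as follows. To stay self-contained it uses a small stand-in pipeline on synthetic data and a temporary file name that is hypothetical, not the published model file. Note that `joblib.load` relies on pickle, so only load files from trusted sources, ideally with the same scikit-learn version used for training.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small stand-in pipeline on synthetic data (not the published model).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)
pipe = Pipeline([("scale", StandardScaler()),
                 ("rf", RandomForestClassifier(n_estimators=30, n_jobs=-1,
                                               random_state=42))])
pipe.fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fund_predictor_model.joblib")  # hypothetical name
    joblib.dump(pipe, path)      # serialize the fitted pipeline
    restored = joblib.load(path)  # reload it as a ready-to-use estimator

# The reloaded pipeline reproduces the original predictions.
print(np.array_equal(pipe.predict(X), restored.predict(X)))  # True
```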
## Model Location

```
C:\Users\alokp\models\fund_predictor_model_20241103_230654.joblib
```