alokpandey committed on
Commit
630a153
·
verified ·
1 Parent(s): aad91f9

Update README.md

  1. README.md +101 -0
README.md CHANGED
@@ -1,3 +1,104 @@
1
  ---
2
  license: mit
3
  ---
1
  ---
2
  license: mit
3
+ language:
4
+ - en
5
+ pipeline_tag: tabular-classification
6
+ tags:
7
+ - finance
8
  ---
9
+ # Model Card: Fund Predictor Pipeline Model
10
+
11
+ ## Model Overview
12
+ This is a scikit-learn pipeline that predicts mutual fund performance from one numerical feature (AUM) and five categorical features. It combines two preprocessing branches with a Random Forest classifier.
13
+
14
+ ## Model Architecture
15
+ The model uses a two-branch preprocessing pipeline followed by a Random Forest classifier:
16
+
17
+ ### Preprocessing Pipeline
18
+ 1. **Numerical Features Branch**
19
+ - Features: ['AUM']
20
+ - Transformation: StandardScaler
21
+
22
+ 2. **Categorical Features Branch**
23
+ - Features: ['AMC', 'Fund Category', 'Sub-Sheme', 'Investment Type', 'Growth Option']
24
+ - Transformations:
25
+ - OneHotEncoder (non-sparse output, handles unknown categories)
26
+ - Feature Selection (SelectKBest with mutual_info_classif, k=30)
27
+
28
+ ### Classifier
29
+ - **Model**: RandomForestClassifier
30
+ - **Key Parameters**:
31
+ - n_estimators: 30
32
+ - max_depth: 20
33
+ - min_samples_split: 10
34
+ - min_samples_leaf: 5
35
+ - n_jobs: -1 (parallel processing)
36
+ - random_state: 42
37
+
38
+ ## Use Cases
39
+ - Mutual fund performance prediction
40
+ - Investment strategy optimization
41
+ - Portfolio management
42
+ - Risk assessment
43
+
44
+ ## Model Parameters
45
+
46
+ ### Preprocessing Configuration
47
+ - **Numerical Features**:
48
+ - StandardScaler with default parameters
49
+ - Centers each feature to zero mean and scales it to unit variance
50
+
51
+ - **Categorical Features**:
52
+ - OneHotEncoder:
53
+ - handle_unknown: 'ignore'
54
+ - sparse_output: False
55
+ - dtype: numpy.float64
56
+ - Feature Selection:
57
+ - Method: SelectKBest with mutual_info_classif
58
+ - Number of features: 30
59
+
60
+ ### Random Forest Configuration
61
+ - **Tree Structure**:
62
+ - Maximum depth: 20
63
+ - Minimum samples for split: 10
64
+ - Minimum samples per leaf: 5
65
+
66
+ - **Ensemble Settings**:
67
+ - Number of trees: 30
68
+ - Features considered per split (max_features): sqrt
69
+ - Bootstrap: True
70
+ - Criterion: gini
71
+
72
+ ## Technical Details
73
+
74
+ ### File Information
75
+ - **Model Path**: C:\Users\alokp\models\fund_predictor_model_20241103_230654.joblib
76
+ - **Model Type**: Scikit-learn Pipeline
77
+ - **Last Updated**: November 3, 2024
78
+
79
+ ### Input Features
80
+ 1. **Numerical Features**:
81
+ - AUM (Assets Under Management)
82
+
83
+ 2. **Categorical Features**:
84
+ - AMC
85
+ - Fund Category
86
+ - Sub-Scheme
87
+ - Investment Type
88
+ - Growth Option
89
+
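Because the pipeline selects columns by name, the input must be a DataFrame whose column names match the training data exactly. The sketch below shows one way to check this before calling `predict`. The `missing_columns` helper and the sample row are hypothetical. The spelling `'Sub-Sheme'` follows the feature list in the Model Architecture section, while this section writes "Sub-Scheme"; use whichever the model was trained with.

```python
import pandas as pd

# Expected input columns, taken from the feature lists in this card.
EXPECTED_COLUMNS = [
    "AUM",
    "AMC",
    "Fund Category",
    "Sub-Sheme",  # spelling as in the Model Architecture list; verify before use
    "Investment Type",
    "Growth Option",
]


def missing_columns(frame):
    """Return the expected columns that are absent from the frame, in order."""
    return [name for name in EXPECTED_COLUMNS if name not in frame.columns]


# A single made-up fund record; none of these values come from the real dataset.
sample = pd.DataFrame([{
    "AUM": 1250.0,
    "AMC": "Example AMC",
    "Fund Category": "Equity",
    "Sub-Sheme": "Large Cap",
    "Investment Type": "Open Ended",
    "Growth Option": "Growth",
}])

print(missing_columns(sample))
print(missing_columns(sample.drop(columns=["AUM"])))
```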
90
+ ## Limitations and Considerations
91
+ - The model uses mutual_info_classif for feature selection, which may not capture all relevant relationships
92
+ - Feature selection keeps only the top 30 one-hot encoded categorical columns; all others are discarded
93
+ - Categories not seen during training are encoded as all zeros by OneHotEncoder (handle_unknown='ignore'), so predictions for such inputs may be less reliable
94
+
95
+ ## Usage Notes
96
+ - The Random Forest uses all available CPU cores (n_jobs=-1)
97
+ - Unknown categories in categorical features do not raise an error; they are encoded as all zeros
98
+ - Uses standard scaling for numerical features
99
+ - Serialized with joblib; load it with joblib.load in an environment with a compatible scikit-learn version
100
+
101
+ ## Model Location
102
+ ```
103
+ C:\Users\alokp\models\fund_predictor_model_20241103_230654.joblib
104
+ ```
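The path above is local to the author's machine, so the following is only a sketch of the joblib round trip, using a small stand-in pipeline trained on synthetic data. The stand-in model, its step names, and the temporary file name are assumptions; to use the real model, point `joblib.load` at wherever the `.joblib` file has been placed.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in model on synthetic data; it is NOT the fund predictor itself.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 1))
y = (X[:, 0] > 0).astype(int)
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=30, random_state=42)),
]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stand_in_model.joblib"
    joblib.dump(model, path)      # serialize the whole fitted pipeline
    restored = joblib.load(path)  # deserialize it again

# The restored pipeline should reproduce the original predictions.
same_predictions = bool(np.array_equal(model.predict(X), restored.predict(X)))
print(same_predictions)
```

joblib files are pickle-based, so they should only be loaded from trusted sources and with a scikit-learn version compatible with the one used for training.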