mjpsm committed on
Commit 30a95e7 · verified · 1 Parent(s): 756f09d

Create README.md

Files changed (1)
  1. README.md +115 -0
README.md ADDED
@@ -0,0 +1,115 @@
+ ---
+ license: mit
+ library_name: xgboost
+ pipeline_tag: tabular-classification
+ tags:
+ - entrepreneurial-readiness
+ - tabular
+ - xgboost
+ - idea-difficulty
+ model-index:
+ - name: Idea Difficulty Classifier (XGBoost)
+   results:
+   - task:
+       type: tabular-classification
+       name: Idea Difficulty (Low/Medium/High)
+     dataset:
+       name: idea_difficulty_dataset_2000 (synthetic, balanced)
+       type: tabular
+     metrics:
+     - type: accuracy
+       value: 0.9733
+     - type: macro_f1
+       value: 0.9733
+     - type: log_loss
+       value: 0.0584
+ ---
+
+ # mjpsm/Idea-Difficulty-XGB
+
+ ## 🧾 Overview
+ This model predicts the **difficulty of a business idea** as `Low`, `Medium`, or `High`.
+ It is part of the Entrepreneurial Readiness series of tabular classifiers (alongside Skill Level, Risk Tolerance, and Confidence).
+
+ The model was trained with **XGBoost** on a 2,000-row synthetic dataset of structured features that capture common difficulty drivers.
+
+ ---
+
+ ## 📥 Input Features
+
+ | Feature | Type | Range | Definition |
+ |---------|------|-------|------------|
+ | `capital_required` | int | 1–10 | How much upfront capital is needed (1 = minimal, 10 = very high) |
+ | `technical_complexity` | int | 1–10 | How technically difficult the product/service is to build or maintain |
+ | `market_competition` | int | 1–10 | How crowded the target market is with competitors |
+ | `customer_acquisition_difficulty` | int | 1–10 | How difficult it is to acquire and retain customers |
+ | `regulatory_hurdles` | int | 1–10 | The degree of legal/regulatory challenges |
+ | `time_to_mvp_months` | int | 1–60 | Estimated time to Minimum Viable Product launch (in months) |
+ | `team_expertise_required` | int | 1–10 | Level of specialized expertise/team members required |
+ | `scalability_requirement` | int | 1–10 | Degree to which scaling is required for success |
+
+ **Target label:**
+ - `Low` = Idea is relatively easy to execute
+ - `Medium` = Moderately challenging
+ - `High` = Difficult, requiring significant resources and expertise
+
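The ranges in the Input Features table can be checked before a row is passed to the model. The following is a minimal sketch, not part of the published repository: `FEATURE_RANGES` and `validate_features` are hypothetical names introduced here, and the bounds are copied from the table on the assumption that they are inclusive.

```python
# Hypothetical helper (not shipped with the model): validates one input row
# against the ranges listed in the Input Features table (assumed inclusive).
FEATURE_RANGES = {
    "capital_required": (1, 10),
    "technical_complexity": (1, 10),
    "market_competition": (1, 10),
    "customer_acquisition_difficulty": (1, 10),
    "regulatory_hurdles": (1, 10),
    "time_to_mvp_months": (1, 60),
    "team_expertise_required": (1, 10),
    "scalability_requirement": (1, 10),
}


def validate_features(row):
    """Return a list of problems found in `row`; an empty list means it is valid."""
    problems = []
    for name, (low, high) in FEATURE_RANGES.items():
        if name not in row:
            problems.append(f"missing feature: {name}")
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name} must be an int, got {type(value).__name__}")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    # Flag columns the model was not trained on.
    for name in sorted(set(row) - set(FEATURE_RANGES)):
        problems.append(f"unexpected feature: {name}")
    return problems
```

Calling `validate_features(row_dict)` before building the DataFrame surfaces out-of-range or missing values early. Whether the model degrades on out-of-range inputs is not stated in the source, so treat this as a defensive check only.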
+ ---
+
+ ## 📊 Performance
+
+ - **Accuracy:** 0.9733
+ - **Macro F1:** 0.9733
+ - **Log Loss:** 0.0584
+
+ Confusion Matrix (rows = true, cols = predicted):
+
+ | | High | Low | Medium |
+ |-------|------|-----|--------|
+ | High | 100 | 0 | 0 |
+ | Low | 0 | 96 | 4 |
+ | Medium| 2 | 2 | 96 |
+
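As a consistency check, the reported accuracy and macro F1 can be recomputed from the confusion matrix alone. The sketch below uses plain Python and copies the counts from the table above. The helper names are illustrative and not part of the repository. Log loss cannot be recovered this way, because it depends on the predicted probabilities rather than on the counts.

```python
# Rows are true classes, columns are predicted classes (order: High, Low, Medium).
LABELS = ["High", "Low", "Medium"]
CONFUSION = [
    [100, 0, 0],
    [0, 96, 4],
    [2, 2, 96],
]


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_f1(cm):
    """F1 per class, computed as 2*TP / (2*TP + FP + FN)."""
    scores = []
    for i in range(len(cm)):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                 # true class i, predicted as something else
        fp = sum(row[i] for row in cm) - tp  # predicted class i, actually something else
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


f1_scores = per_class_f1(CONFUSION)
macro_f1 = sum(f1_scores) / len(f1_scores)  # unweighted mean over classes

print(f"accuracy = {accuracy(CONFUSION):.4f}")  # accuracy = 0.9733
print(f"macro F1 = {macro_f1:.4f}")             # macro F1 = 0.9733
```

Both values agree with the reported metrics: 292 of 300 samples lie on the diagonal, and the per-class F1 scores (about 0.990 for High, 0.970 for Low, 0.960 for Medium) average to 0.9733.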
+ ---
+
+ ## 🚀 Quickstart (load from the Hub)
+ ```python
+ # Load directly from: mjpsm/Idea-Difficulty-XGB
+ from huggingface_hub import hf_hub_download
+ from xgboost import XGBClassifier
+ import json
+ import pandas as pd
+
+ REPO_ID = "mjpsm/Idea-Difficulty-XGB"
+ model_path = hf_hub_download(REPO_ID, "xgb_model.json")
+
+ clf = XGBClassifier()
+ clf.load_model(model_path)
+
+ # IMPORTANT: Use the same feature names/order as training
+ FEATURES = [
+     "capital_required", "technical_complexity", "market_competition",
+     "customer_acquisition_difficulty", "regulatory_hurdles",
+     "time_to_mvp_months", "team_expertise_required", "scalability_requirement",
+ ]
+
+ # One example idea; columns=FEATURES enforces the training column order
+ row = pd.DataFrame([{
+     "capital_required": 7,
+     "technical_complexity": 9,
+     "market_competition": 6,
+     "customer_acquisition_difficulty": 8,
+     "regulatory_hurdles": 7,
+     "time_to_mvp_months": 18,
+     "team_expertise_required": 5,
+     "scalability_requirement": 9,
+ }], columns=FEATURES)
+
+ # predict() returns the encoded class index
+ pred_id = int(clf.predict(row)[0])
+
+ # If label_map.json is NOT uploaded, default to alphabetical LabelEncoder order:
+ CLASSES = ["High", "Low", "Medium"]  # update if you publish label_map.json
+ print("Predicted Idea Difficulty:", CLASSES[pred_id])
+
+ # OPTIONAL: If you later upload 'label_map.json', prefer this:
+ # lm_path = hf_hub_download(REPO_ID, "label_map.json")
+ # with open(lm_path) as f:
+ #     label_map = json.load(f)
+ # inv_map = {v: k for k, v in label_map.items()}
+ # print("Predicted Idea Difficulty:", inv_map[pred_id])
+ ```