JohanBeytell committed on
Commit 14c67f0 · verified · 1 Parent(s): 71aa836

Update README.md

  1. README.md +124 -3
README.md CHANGED
@@ -1,3 +1,124 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ metrics:
+ - mae
+ - r_squared
+ pipeline_tag: tabular-regression
+ tags:
+ - regression
+ - price-prediction
+ ---
+
+ # Model Card for Infinitode/IHPPM-OPEN-ARC
+
+ Repository: https://github.com/Infinitode/OPEN-ARC/
+
+ ## Model Description
+
+ OPEN-ARC-IHPP is a CatBoostRegressor model developed as part of Infinitode's OPEN-ARC initiative. It predicts rent prices for houses and properties in India from factors such as house type, locality, city, area, furnishing, and room counts.
+
+ **Architecture**:
+
+ - **CatBoostRegressor**: `iterations=2500`, `depth=10`, `learning_rate=0.045`, `loss_function="MAE"`, `eval_metric="MAE"`, `random_seed=42`, `verbose=200`.
+ - **Framework**: CatBoost
+ - **Training Setup**: Trained for 2500 iterations on the training portion of the dataset split.
+
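The hyperparameters listed above could be collected into a configuration like the one below. This is a minimal sketch, not the repository's training script: the names `CATBOOST_PARAMS` and `build_model` are hypothetical, and the CatBoost import is deferred so the parameter dictionary can be inspected without the library installed.

```python
# Hyperparameters copied from the Architecture list in the model card.
CATBOOST_PARAMS = {
    "iterations": 2500,
    "depth": 10,
    "learning_rate": 0.045,
    "loss_function": "MAE",
    "eval_metric": "MAE",
    "random_seed": 42,
    "verbose": 200,  # log progress every 200 iterations
}


def build_model(params=CATBOOST_PARAMS):
    """Create an untrained CatBoostRegressor (hypothetical helper).

    Requires the `catboost` package; it is imported lazily so that the
    parameter dictionary above can be used without it.
    """
    from catboost import CatBoostRegressor

    return CatBoostRegressor(**params)
```

Calling `build_model()` returns an unfitted regressor; fitting it would still require the preprocessed training data described in the card.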
28
+ ## Uses
29
+
30
+ - Predicting accurate price points for properties in India.
31
+ - Validating or measuring existing price points for properties.
32
+ - Researching property value and factors that influence price.
33
+
34
+ ## Limitations
35
+
36
+ - May generate implausible or inappropriate results when influenced by extreme outlier values.
37
+ - Could provide inaccurate prices; caution is advised when relying on these outputs.
38
+
+ ## Training Data
+
+ - Dataset: India House Rent Prediction dataset from Kaggle.
+ - Source URL: https://www.kaggle.com/datasets/pranavshinde36/india-house-rent-prediction
+ - Content: House type, locality, city, area, furnishing, and room details (beds, bathrooms, balconies), along with the target rent value.
+ - Size: 7691 entries of properties in India.
+ - Preprocessing: Removed properties with very small areas, extreme rent outliers, and the `area_rate` column. Also added "area bucket" features to improve performance.
+
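The "area bucket" step can be illustrated with a small standard-library sketch. The bin edges are the ones that appear in the card's usage example; whether training used exactly the same edges is an assumption, and the names `AREA_BINS` and `area_bucket` are hypothetical.

```python
from bisect import bisect_right

# Bin edges taken from the card's usage example (assumed to match training).
AREA_BINS = [0, 300, 600, 900, 1200, 2000, 5000, 100000]


def area_bucket(area):
    """Return the 0-based index of the bin that contains `area`.

    For increasing bin edges this gives the same result as
    np.digitize([area], AREA_BINS)[0] - 1.
    """
    return bisect_right(AREA_BINS, area) - 1
```

With these edges, an area of 250 falls in bucket 0, an area of exactly 300 falls in bucket 1, and an area of 1000 falls in bucket 3.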
+ ## Training Procedure
+
+ - Metrics: MAE, R-squared
+ - Train/Test Split: 85% train, 15% test.
+
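For readers unfamiliar with the two metrics and the split, the following dependency-free sketch shows how they can be computed. It is an illustration only: the helper names are hypothetical, and the shuffle seed of 42 is an assumption (the card states a `random_seed` for the model, not for the split).

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.15, seed=42):
    """Shuffle row indices and return (train_indices, test_indices)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

An R-squared of 1.0 means the predictions match the targets exactly, while 0.0 means they do no better than always predicting the mean target.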
+ ## Evaluation Results
+
+ | Metric | Value |
+ | ------ | ----- |
+ | Testing MAE | 3.86k (about ₹3,860) |
+ | Testing R-squared | 0.9351 |
+
+ ## How to Use
+
+ ```python
+ import numpy as np
+ import pandas as pd
+
+
+ def predict_user_rent(model, raw_df):
+     """Interactively collect property features and print an estimated rent.
+
+     `model` is the trained CatBoostRegressor; `raw_df` is the DataFrame used
+     for training (it supplies the category values and column dtypes).
+     """
+     print("\n\n========== RENT PREDICTION ASSISTANT ==========\n")
+     print("Choose values for each feature below. For categorical vars, pick a number.\n")
+
+     sample = {}
+
+     # Show a numbered menu of the values seen in raw_df and return the chosen one
+     def choose_cat(col_name):
+         unique_vals = sorted(raw_df[col_name].unique())
+         print(f"\n--- {col_name} ---")
+         for idx, val in enumerate(unique_vals):
+             print(f"{idx + 1}. {val}")
+         sel = int(input("Enter your choice number: ")) - 1
+         # Reject out-of-range choices instead of silently wrapping around
+         if not 0 <= sel < len(unique_vals):
+             raise ValueError(f"Choice must be between 1 and {len(unique_vals)}")
+         return unique_vals[sel]
+
+     # Categorical features
+     sample["house_type"] = choose_cat("house_type")
+     sample["locality"] = choose_cat("locality")
+     sample["city"] = choose_cat("city")
+     sample["furnishing"] = choose_cat("furnishing")
+
+     # Numeric features
+     def choose_num(col_name):
+         return float(input(f"\nEnter value for {col_name}: "))
+
+     sample["area"] = choose_num("area")
+     sample["beds"] = choose_num("beds")
+     sample["bathrooms"] = choose_num("bathrooms")
+     sample["balconies"] = choose_num("balconies")
+
+     # Area bucket: 0-based index of the bin that contains the area
+     area_val = sample["area"]
+     area_bins = [0, 300, 600, 900, 1200, 2000, 5000, 100000]
+     area_bucket = int(np.digitize([area_val], area_bins)[0] - 1)
+     sample["area_bucket"] = area_bucket
+
+     # Placeholder for the rent_psf bucket: the rent is not known yet,
+     # so the area bucket is used as a proxy for typical price density
+     sample["rent_psf_bucket"] = min(area_bucket, 19)
+
+     df_input = pd.DataFrame([sample])
+
+     # Categorical columns must match the dtypes used in training
+     for col in ["house_type", "locality", "city", "furnishing"]:
+         df_input[col] = df_input[col].astype(raw_df[col].dtype)
+
+     # The prediction is treated as log1p(rent), so invert it with expm1
+     pred_log = model.predict(df_input)[0]
+     pred_rent = np.expm1(pred_log)
+
+     print("\n===================================")
+     print(f"Estimated Rent: ₹ {pred_rent:,.2f}")
+     print("===================================\n")
+
+     return pred_rent
+
+
+ # Uncomment to use interactively (assumes `model` and `df` are already loaded):
+ # predict_user_rent(model, df)
+ ```
+
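The usage function applies `expm1` to the model output, which suggests the model was trained on `log1p(rent)` rather than on raw rent. That is an inference from the snippet, not something the card states. Under that assumption, the round trip between rent and the log-scale target looks like this (the helper names are hypothetical):

```python
import math


def to_log_target(rent):
    """Map a rent value to the log scale: log(1 + rent)."""
    return math.log1p(rent)


def from_log_prediction(pred_log):
    """Invert the log-scale transform: exp(pred_log) - 1."""
    return math.expm1(pred_log)
```

Using `log1p` and `expm1` as a pair keeps a rent of zero well defined and compresses the long right tail that rent values typically have.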
+ ## Contact
+
+ For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.