---
license: mit
language:
- en
metrics:
- mae
- r_squared
pipeline_tag: tabular-regression
tags:
- regression
- price-prediction
---

# Model Card for Infinitode/IHPPM-OPEN-ARC

Repository: https://github.com/Infinitode/OPEN-ARC/

## Model Description

OPEN-ARC-IHPP is a CatBoostRegressor model developed as part of Infinitode's OPEN-ARC initiative. It predicts rent for houses and other rental properties in India from features such as house type, locality, city, area, furnishing, and room counts.

**Architecture**:

- **CatBoostRegressor**: `iterations=2500`, `depth=10`, `learning_rate=0.045`, `loss_function="MAE"`, `eval_metric="MAE"`, `random_seed=42`, `verbose=200`.
- **Framework**: CatBoost
- **Training Setup**: Trained for 2500 iterations on the training split.
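
As a minimal sketch, the hyperparameters listed above could be passed to CatBoost as shown below. The `CATBOOST_PARAMS` name and the `build_model` helper are illustrative and do not come from the repository. The import is deferred so that the dictionary can be inspected without CatBoost installed.

```python
# Hyperparameters copied from the Architecture list above.
CATBOOST_PARAMS = {
    "iterations": 2500,
    "depth": 10,
    "learning_rate": 0.045,
    "loss_function": "MAE",
    "eval_metric": "MAE",
    "random_seed": 42,
    "verbose": 200,  # log progress every 200 iterations
}


def build_model():
    """Hypothetical helper: create an untrained regressor with these settings."""
    # Deferred import; requires `pip install catboost`.
    from catboost import CatBoostRegressor

    return CatBoostRegressor(**CATBOOST_PARAMS)
```

Calling `build_model()` returns an unfitted model; training data and categorical feature names still have to be supplied to `fit`.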

## Uses

- Estimating rental prices for properties in India.
- Checking existing rental prices against the model's estimate.
- Researching property value and the factors that influence price.

## Limitations

- May produce implausible estimates for inputs with extreme outlier values, such as very small areas, because such rows were removed during preprocessing.
- Predictions can be inaccurate; treat them as estimates and verify them before relying on them.

## Training Data

- Dataset: India House Rent Prediction dataset from Kaggle.
- Source URL: https://www.kaggle.com/datasets/pranavshinde36/india-house-rent-prediction
- Content: House type, locality, city, area, furnishing, and room details, along with the target rent value.
- Size: 7691 entries of properties in India.
- Preprocessing: Removed properties with very small areas, extreme rent outliers, and `area_rate`. Also added an "area bucket" feature that bins area into ranges for better performance.
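
The area bucketing can be sketched in plain Python. The bin edges below are taken from the usage code in "How to Use"; whether training used exactly the same edges is an assumption. The `area_bucket` helper is a hypothetical name, and it uses the standard library's `bisect_right` in place of `np.digitize`, which gives the same bucket index for increasing bins.

```python
from bisect import bisect_right

# Bin edges as used in the "How to Use" snippet (assumed to match training).
AREA_BINS = [0, 300, 600, 900, 1200, 2000, 5000, 100000]


def area_bucket(area):
    """Return the zero-based index of the bin that contains `area`.

    A value equal to an edge falls into the bin that starts at that edge,
    mirroring `np.digitize(area, AREA_BINS) - 1`.
    """
    return bisect_right(AREA_BINS, area) - 1


# Example: an area of 750 lies in the 600 to 900 bin.
print(area_bucket(750))  # 2
```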

## Training Procedure

- Metrics: MAE (mean absolute error) and R-squared.
- Train/test split: 85% training, 15% testing.
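
For reference, the two metrics can be computed as in the sketch below. It is written in plain Python for illustration, with hypothetical function names and made-up rent values; it is not the evaluation code from the repository.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up rents, for illustration only.
actual = [10000, 20000, 30000]
predicted = [12000, 19000, 33000]

print(mean_absolute_error(actual, predicted))  # 2000.0
print(r_squared(actual, predicted))
```

An R-squared close to 1 means the predictions explain most of the variance in the actual values, while MAE reports the typical error in the same units as the target.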

## Evaluation Results

| Metric | Value |
| ------ | ----- |
| Testing MAE | 3.86k |
| Testing R-squared | 0.9351 |

## How to Use

```python
import numpy as np
import pandas as pd


def predict_user_rent(model, raw_df):
    """Interactively collect property features and print the estimated rent.

    `model` is the trained regressor; `raw_df` is the dataset DataFrame, used
    to list the valid category values and to match column dtypes.
    """
    print("\n\n========== RENT PREDICTION ASSISTANT ==========\n")
    print("Choose values for each feature below. For categorical vars, pick a number.\n")

    sample = {}

    # Menu: list the known values of a categorical column and read a 1-based choice
    def choose_cat(col_name):
        unique_vals = sorted(raw_df[col_name].unique())
        print(f"\n--- {col_name} ---")
        for idx, val in enumerate(unique_vals):
            print(f"{idx + 1}. {val}")
        sel = int(input("Enter your choice number: ")) - 1
        # Reject out-of-range choices instead of silently wrapping around
        if not 0 <= sel < len(unique_vals):
            raise ValueError(f"Choice must be between 1 and {len(unique_vals)}")
        return unique_vals[sel]

    # Categorical
    sample["house_type"] = choose_cat("house_type")
    sample["locality"] = choose_cat("locality")
    sample["city"] = choose_cat("city")
    sample["furnishing"] = choose_cat("furnishing")

    # Numeric values
    def choose_num(col_name):
        return float(input(f"\nEnter value for {col_name}: "))

    sample["area"] = choose_num("area")
    sample["beds"] = choose_num("beds")
    sample["bathrooms"] = choose_num("bathrooms")
    sample["balconies"] = choose_num("balconies")

    # Area bucket: zero-based index of the bin that contains the area
    area_val = sample["area"]
    area_bins = [0, 300, 600, 900, 1200, 2000, 5000, 100000]
    area_bucket = np.digitize([area_val], area_bins)[0] - 1
    sample["area_bucket"] = area_bucket

    # Placeholder for rent_psf bucket (we don't know rent yet),
    # so we use area only as a proxy for typical price density
    sample["rent_psf_bucket"] = min(int(area_bucket), 19)

    df_input = pd.DataFrame([sample])

    # Must match training encodings
    for col in ["house_type", "locality", "city", "furnishing"]:
        df_input[col] = df_input[col].astype(raw_df[col].dtype)

    # Prediction: the model output is on a log scale, so invert it with expm1
    pred_log = model.predict(df_input)[0]
    pred_rent = np.expm1(pred_log)

    print("\n===================================")
    print(f"Estimated Rent: ₹ {pred_rent:,.2f}")
    print("===================================\n")

    return pred_rent


# Uncomment to use interactively:
# predict_user_rent(model, df)
```
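
The `pred_log` variable and the `np.expm1` call in the snippet above suggest that the model was trained on `log1p(rent)` rather than on raw rent; this is an inference from the usage code, not something the card states. Under that assumption, the round trip between the two scales can be sketched with the standard library as follows. Both helper names are hypothetical.

```python
import math


def to_log_target(rent):
    """Forward transform assumed for training: log(1 + rent)."""
    return math.log1p(rent)


def from_log_prediction(pred_log):
    """Inverse transform applied to a model output: exp(pred_log) - 1."""
    return math.expm1(pred_log)


# A rent of 25,000 survives the round trip up to floating-point error.
log_value = to_log_target(25000.0)
recovered = from_log_prediction(log_value)
print(log_value, recovered)
```

Skipping the inverse transform would report the log-scale value as the rent, so any code that calls `model.predict` directly needs the same `expm1` step.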

## Contact

For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.