	---
	license: mit
	language:
	- en
	metrics:
	- mae
	- r_squared
	pipeline_tag: tabular-regression
	tags:
	- regression
	- price-prediction
	---

	# Model Card for Infinitode/IHPPM-OPEN-ARC

	Repository: https://github.com/Infinitode/OPEN-ARC/

	## Model Description

	OPEN-ARC-IHPP is a CatBoostRegressor model developed as part of Infinitode's OPEN-ARC initiative. It predicts rent for houses and other properties in India from features such as house type, locality, city, area, furnishing, and room counts.

	Architecture:

	- CatBoostRegressor: `iterations=2500`, `depth=10`, `learning_rate=0.045`, `loss_function="MAE"`, `eval_metric="MAE"`, `random_seed=42`, `verbose=200`.
	- Framework: CatBoost
	- Training Setup: Trained for 2500 boosting iterations on the training portion of the dataset split.
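
As a minimal sketch, the hyperparameters listed above can be collected in a dictionary and passed to `CatBoostRegressor`. The names `CATBOOST_PARAMS` and `build_model` are illustrative and not taken from the OPEN-ARC repository. The `catboost` import is deferred so the dictionary can be inspected without the library installed.

```python
# Hyperparameters as listed in this model card.
CATBOOST_PARAMS = {
    "iterations": 2500,
    "depth": 10,
    "learning_rate": 0.045,
    "loss_function": "MAE",
    "eval_metric": "MAE",
    "random_seed": 42,
    "verbose": 200,  # log progress every 200 iterations
}


def build_model(params=None):
    """Return an untrained CatBoostRegressor (hypothetical helper)."""
    from catboost import CatBoostRegressor  # requires `pip install catboost`

    return CatBoostRegressor(**(params or CATBOOST_PARAMS))
```

Calling `build_model().fit(X_train, y_train, cat_features=[...])` would then train the model, with the categorical column names supplied by the caller.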

	## Uses

	- Estimating rent for properties in India.
	- Checking existing rent listings against the model's estimate.
	- Researching property value and the factors that influence rent.

	## Limitations

	- May produce implausible estimates when inputs contain extreme outlier values.
	- Predictions can be inaccurate; treat them as estimates and use caution when relying on them.

	## Training Data

	- Dataset: India House Rent Prediction dataset from Kaggle.
	- Source URL: https://www.kaggle.com/datasets/pranavshinde36/india-house-rent-prediction
	- Content: House type, locality, city, area, furnishing and room specifics along with the target rent value.
	- Size: 7691 entries of properties in India.
	- Preprocessing: Removed properties with very small areas and extreme rent outliers, dropped the `area_rate` column, and added "area bucket" features to improve performance.
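
The preprocessing steps above could look roughly like the following sketch. The thresholds `MIN_AREA` and `MAX_RENT`, the `rent` column name, and the helper names are placeholders, since this card does not state the cut-offs that were used. The bin edges are the ones from the usage example in this card. Rows are plain dictionaries so the sketch runs without pandas.

```python
from bisect import bisect_right

# Placeholder thresholds: the actual cut-offs are not given in the model card.
MIN_AREA = 100       # assumed minimum area
MAX_RENT = 500_000   # assumed maximum rent

# Bin edges taken from the usage example in this card.
AREA_BINS = [0, 300, 600, 900, 1200, 2000, 5000, 100000]


def area_bucket(area):
    """Index of the bin containing `area` (same result as np.digitize(...) - 1)."""
    return bisect_right(AREA_BINS, area) - 1


def preprocess(rows):
    """Drop tiny/outlier rows, remove `area_rate`, and add `area_bucket`."""
    cleaned = []
    for row in rows:
        if row["area"] < MIN_AREA or row["rent"] > MAX_RENT:
            continue
        new_row = {k: v for k, v in row.items() if k != "area_rate"}
        new_row["area_bucket"] = area_bucket(row["area"])
        cleaned.append(new_row)
    return cleaned


rows = [
    {"area": 750, "rent": 18000, "area_rate": 24.0},
    {"area": 40, "rent": 9000, "area_rate": 225.0},          # too small, removed
    {"area": 1500, "rent": 9_000_000, "area_rate": 6000.0},  # rent outlier, removed
]
print(preprocess(rows))
```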

	## Training Procedure

	- Metrics: MAE (mean absolute error), R-squared
	- Train/Test Split: 85% training, 15% testing.
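
An 85/15 split can be sketched in plain Python as below. The function name `train_test_split_85_15` and the use of seed 42 for shuffling are assumptions for illustration; this card does not say how the split was drawn.

```python
import random


def train_test_split_85_15(rows, seed=42):
    """Shuffle a copy of `rows` and return (train, test) with about 15% held out."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * 0.15)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split_85_15(range(1000))
print(len(train), len(test))  # 850 150
```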

	## Evaluation Results

	| Metric | Value |
	| ------ | ----- |
	| Testing MAE | 3.86k |
	| Testing R-squared | 0.9351 |
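
For reference, the two reported metrics can be computed as in this sketch, which uses the standard definitions of MAE and R-squared. The sample values are made up and unrelated to the results in the table.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up rents for illustration only.
actual = [10000, 20000, 30000]
predicted = [11000, 19000, 33000]
print(mae(actual, predicted))
print(r_squared(actual, predicted))
```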

	## How to Use

	```python
	import numpy as np
	import pandas as pd


	def predict_user_rent(model, raw_df):
	    """Interactively collect feature values and print the estimated rent.

	    `model` is the trained CatBoostRegressor. `raw_df` is a DataFrame with the
	    feature columns; it supplies the valid categories and their dtypes.
	    """
	    print("\n\n========== RENT PREDICTION ASSISTANT ==========\n")
	    print("Choose values for each feature below. For categorical vars, pick a number.\n")

	    sample = {}

	    # Menu: list the categories present in raw_df and let the user pick one
	    def choose_cat(col_name):
	        unique_vals = sorted(raw_df[col_name].unique())
	        print(f"\n--- {col_name} ---")
	        for idx, val in enumerate(unique_vals):
	            print(f"{idx + 1}. {val}")
	        sel = int(input("Enter your choice number: ")) - 1
	        return unique_vals[sel]

	    # Categorical
	    sample["house_type"] = choose_cat("house_type")
	    sample["locality"] = choose_cat("locality")
	    sample["city"] = choose_cat("city")
	    sample["furnishing"] = choose_cat("furnishing")

	    # Numeric values
	    def choose_num(col_name):
	        return float(input(f"\nEnter value for {col_name}: "))

	    sample["area"] = choose_num("area")
	    sample["beds"] = choose_num("beds")
	    sample["bathrooms"] = choose_num("bathrooms")
	    sample["balconies"] = choose_num("balconies")

	    # Area bucket: zero-based index of the bin that contains the area
	    area_val = sample["area"]
	    area_bins = [0, 300, 600, 900, 1200, 2000, 5000, 100000]
	    area_bucket = np.digitize([area_val], area_bins)[0] - 1
	    sample["area_bucket"] = area_bucket

	    # Placeholder for rent_psf bucket (we don't know rent yet),
	    # so we use area only as a proxy for typical price density
	    sample["rent_psf_bucket"] = min(int(area_bucket), 19)

	    df_input = pd.DataFrame([sample])

	    # Must match training encodings
	    for col in ["house_type", "locality", "city", "furnishing"]:
	        df_input[col] = df_input[col].astype(raw_df[col].dtype)

	    # Prediction: the output is treated as log1p(rent), so invert it with expm1
	    pred_log = model.predict(df_input)[0]
	    pred_rent = np.expm1(pred_log)

	    print("\n===================================")
	    print(f"Estimated Rent: ₹ {pred_rent:,.2f}")
	    print("===================================\n")

	    return pred_rent


	# Uncomment to use interactively:
	# predict_user_rent(model, df)
	```
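
The final `np.expm1` call in the function above suggests the model was trained on `log1p(rent)` rather than on raw rent. That is an inference from the code, not something this card states. Under that assumption, the round trip between the two scales looks like this (helper names are illustrative):

```python
import math


def to_log_target(rent):
    """Transform a raw rent value to the assumed training scale."""
    return math.log1p(rent)


def from_log_prediction(pred_log):
    """Invert the transform to get back a rent value."""
    return math.expm1(pred_log)


pred_log = to_log_target(15000.0)
print(round(from_log_prediction(pred_log), 2))  # 15000.0
```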

	## Contact

	For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.