NeshVerse / Sales_Prediction_U


---
tags:
- regression
- sales-prediction
library_name: sklearn
---

## Summary

### Data Analysis Key Findings

* The dataset contains 39,435 entries with two columns: 'input' (a sales data description) and 'output' (the total sales value).
* The 'output' column was converted from object (string) to numeric (float) type, and rows whose values could not be parsed were removed.
* A 'product_name' feature was extracted from the 'input' column.
* The 38,652 rows remaining after cleaning were split into training (80%) and testing (20%) sets: 30,921 training samples and 7,731 testing samples.
* A `RandomForestRegressor` was trained inside a scikit-learn `Pipeline` that vectorizes the 'product_name' feature with `TfidfVectorizer`.
* For fast lookups, a separate DataFrame, `df_query`, was created with 'product_name' as the index and a numeric 'sales_value' column.
* A `query_sales_data` function filters the sales DataFrame using a query string evaluated with `DataFrame.eval()`.
* On the test set, the model achieved a Mean Absolute Error (MAE) of 2224.90, a Root Mean Squared Error (RMSE) of 31031.07, and an R-squared (R²) score of 0.05.
* A `predict_and_query` function combines the prediction model with the query functionality: given a product name, it returns the predicted sales and, if the product exists in the data, its actual sales.
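The processing and training steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the author's code: the ten toy rows, the ready-made 'product_name' column (the card does not document how names were extracted from 'input', so that step is skipped), the 'sales_value' name for the parsed target, and the hyperparameters `n_estimators=50` and `random_state=42` are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy data: 'output' arrives as strings, one of which cannot be parsed.
df = pd.DataFrame({
    "product_name": [
        "APPLE IPHONE 16 128GB", "APPLE IPHONE 16 256GB", "SAMSUNG GALAXY S24",
        "SAMSUNG GALAXY A15", "XIAOMI REDMI NOTE 13", "XIAOMI REDMI 13C",
        "APPLE IPAD AIR", "SAMSUNG TAB S9", "XIAOMI PAD 6", "APPLE AIRPODS PRO",
    ],
    "output": ["1200.5", "980", "n/a", "450", "300.25", "150", "700", "650", "400", "250"],
})

# Parse 'output' as float; unparseable values become NaN and those rows are dropped.
df["sales_value"] = pd.to_numeric(df["output"], errors="coerce")
df = df.dropna(subset=["sales_value"])

# 80/20 split on the cleaned rows. scikit-learn rounds the test share up, so the
# card's 38,652 cleaned rows give ceil(0.2 * 38652) = 7,731 test and 30,921 training rows.
X_train, X_test, y_train, y_test = train_test_split(
    df["product_name"], df["sales_value"], test_size=0.2, random_state=42
)

# TF-IDF turns each product name into a sparse term-weight vector; the forest accepts sparse input.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=42)),
])
model.fit(X_train, y_train)

# Evaluate with the same three metrics the card reports.
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # portable across scikit-learn versions
r2 = r2_score(y_test, y_pred)
print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.2f}")
```

Because RMSE squares each error before averaging, it is always at least as large as MAE; a gap as wide as the one reported above (31031.07 vs. 2224.90) suggests that a few very large errors dominate the RMSE.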

### Insights or Next Steps

* The low R² score (0.05) indicates that the current model, which uses only TF-IDF features derived from product names, has limited predictive power. Adding features such as time-based information or product categories could improve performance.
* The `query_sales_data` function provides only basic querying. For larger datasets or more complex queries, a more robust storage and querying solution, such as a database, would be more suitable.
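A minimal sketch of both approaches follows, assuming a small hypothetical `df_query` and using Python's built-in `sqlite3` as an example database (the card names no specific database). The `query_sales_data(df, query)` signature is also an assumption, since the card does not show it: the first filter mirrors the `DataFrame.eval()` approach, and the second expresses the same filter as a parameterized SQL query.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for df_query: product_name as the index, numeric sales_value.
df_query = pd.DataFrame(
    {"sales_value": [1200.5, 980.0, 450.0, 150.0]},
    index=pd.Index(
        ["APPLE IPHONE 16 128GB", "APPLE IPHONE 16 256GB",
         "SAMSUNG GALAXY A15", "XIAOMI REDMI 13C"],
        name="product_name",
    ),
)

def query_sales_data(df: pd.DataFrame, query: str) -> pd.DataFrame:
    """Hypothetical sketch: keep rows where the boolean expression is True."""
    mask = df.eval(query)  # e.g. "sales_value > 500" evaluates to a boolean Series
    return df[mask]

high = query_sales_data(df_query, "sales_value > 500")

# Database-backed alternative: load the same data into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
df_query.to_sql("sales", conn)  # the named index is written as a 'product_name' column
rows = conn.execute(
    "SELECT product_name, sales_value FROM sales "
    "WHERE sales_value > ? ORDER BY sales_value DESC",
    (500,),  # threshold bound as a parameter, not spliced into the SQL text
).fetchall()
conn.close()

print(high)
print(rows)
```

Binding the threshold through the `?` placeholder keeps user input out of the SQL text, whereas `DataFrame.eval()` evaluates whatever expression string it receives.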



### Example Usage

```python
# Example of how to use the predict_and_query function.
# Requires predict_and_query and the trained model to be defined in the current session.

# Replace this with any product name you want a prediction for.
# If the product name exists in the original data, its actual sales are also retrieved.
sample_product_name_for_prediction = "APPLE IPHONE 16 128GB SS \"MP01213171\""

result = predict_and_query(sample_product_name_for_prediction)

print(f"Prediction and Query Result for '{sample_product_name_for_prediction}':")
print(f"Predicted Sales: {result['predicted_sales']:.2f}")
print(f"Actual Sales: {result['actual_sales']}")

# To see a prediction without actual sales, try a product name that is not in the dataset:
# sample_product_name_new = "A completely new product not in the dataset"
# result_new = predict_and_query(sample_product_name_new)
# print(f"\nPrediction and Query Result for '{sample_product_name_new}':")
# print(f"Predicted Sales: {result_new['predicted_sales']:.2f}")
# print(f"Actual Sales: {result_new['actual_sales']}")
```
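The card does not include the source of `predict_and_query`. The following is a hypothetical sketch of how such a function might combine the TF-IDF/random-forest pipeline with a `df_query` lookup. The toy data, the hyperparameters, and the choice to return a list of recorded values (so repeated product names are all reported) or `None` when the name is absent are assumptions, not the author's implementation.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the real training set and df_query.
names = ["APPLE IPHONE 16 128GB", "APPLE IPHONE 16 256GB",
         "SAMSUNG GALAXY A15", "XIAOMI REDMI 13C"]
sales = [1200.5, 980.0, 450.0, 150.0]
df_query = pd.DataFrame({"sales_value": sales},
                        index=pd.Index(names, name="product_name"))

# Same pipeline shape as described in the card; hyperparameters are assumptions.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rf", RandomForestRegressor(n_estimators=20, random_state=0)),
])
model.fit(names, sales)

def predict_and_query(product_name: str) -> dict:
    """Hypothetical sketch: predict sales for a name and look up recorded sales."""
    # The pipeline expects an iterable of documents, so wrap the single name in a list.
    predicted = float(model.predict([product_name])[0])
    # Boolean index match returns every row for the name (handles duplicate names).
    matches = df_query.loc[df_query.index == product_name, "sales_value"].tolist()
    return {"predicted_sales": predicted,
            "actual_sales": matches if matches else None}

result = predict_and_query("APPLE IPHONE 16 128GB")
new_result = predict_and_query("A completely new product not in the dataset")
print(f"Predicted Sales: {result['predicted_sales']:.2f}")
print(f"Actual Sales: {result['actual_sales']}")
print(f"Actual Sales (new product): {new_result['actual_sales']}")
```

For a name whose words never appeared in training, TF-IDF produces an all-zero vector, so the forest still returns a prediction, but it carries no product-specific information.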