Update README.md
README.md
CHANGED
@@ -1,75 +1,10 @@
-
-
-
-
-
-
-
-.
-
-
-├── src/
-│   ├── preprocessing.py # Data loading, cleaning, merging
-│   └── eda.py # EDA and visualization (plots saved to /report/images)
-├── notebooks/
-│   └── Practical.ipynb # Step-by-step notebook for exploration and prototyping
-├── report/
-│   └── images/ # Output directory for all generated plots and images
-├── data/
-│   ├── raw/ # Raw input data (CSV files)
-│   ├── interim/ # Cleaned/intermediate CSVs
-│   └── processed/ # (Optional) Final processed data
-├── requirements.txt # Python dependencies
-└── README.md # This file
-```
-
-## How to Run
-
-1. **Install dependencies**
-Make sure you have Python 3.8+ and run:
-```
-pip install -r requirements.txt
-```
-
-2. **Prepare data**
-Place the raw MovieLens CSV files in `data/raw/` as:
-- `movies_metadata.csv`
-- `credits.csv`
-- `keywords.csv`
-- `links.csv`
-- `ratings.csv`
-
-3. **Run the pipeline**
-```
-python app/Practical.py
-```
-This will:
-- Clean and merge the data
-- Save interim cleaned CSVs to `data/interim/`
-- Generate all EDA plots and wordclouds, saving them to `report/images/`
-- Save interactive Plotly plots as PNG (requires [kaleido](https://github.com/plotly/Kaleido)) or HTML fallback
-
-## Features
-
-- **Modular Preprocessing**: All data cleaning, merging, and type handling in `src/preprocessing.py`
-- **Automated EDA**: All plots and wordclouds generated and saved by `src/eda.py`
-- **Reproducibility**: One-command run for the entire workflow
-- **Notebook**: `notebooks/Practical.ipynb` for step-by-step exploration
-
-## Requirements
-
-- pandas
-- numpy
-- matplotlib
-- seaborn
-- missingno
-- wordcloud
-- plotly
-- pycountry
-- kaleido (for static plotly image export)
-
-## Notes
-
-- If static Plotly image export fails, HTML versions of the plots are saved as a fallback.
-- All output images are saved in `report/images/`.
-- Adjust paths in `src/eda.py` and `src/preprocessing.py` if your
+---
+title: Movie Recommender System
+emoji: 🎬
+colorFrom: blue
+colorTo: pink
+sdk: gradio
+sdk_version: "4.41.0" # you can also leave this out and HF picks latest
+app_file: app.py
+pinned: false
+---