Bardi-ya committed on
Commit
fd382fe
·
verified ·
1 Parent(s): c296592

Update README.md

Files changed (1)
  1. README.md +10 -75
README.md CHANGED
@@ -1,75 +1,10 @@
- # MovieLens Movie Data Analysis
-
- This project provides a reproducible pipeline for preprocessing and exploratory data analysis (EDA) on the MovieLens movie dataset.
-
- ## Project Structure
-
- ```
- .
- ├── app/
- │   └── Practical.py  # Main entry point for running the pipeline
- ├── src/
- │   ├── preprocessing.py  # Data loading, cleaning, merging
- │   └── eda.py  # EDA and visualization (plots saved to /report/images)
- ├── notebooks/
- │   └── Practical.ipynb  # Step-by-step notebook for exploration and prototyping
- ├── report/
- │   └── images/  # Output directory for all generated plots and images
- ├── data/
- │   ├── raw/  # Raw input data (CSV files)
- │   ├── interim/  # Cleaned/intermediate CSVs
- │   └── processed/  # (Optional) Final processed data
- ├── requirements.txt  # Python dependencies
- └── README.md  # This file
- ```
-
- ## How to Run
-
- 1. **Install dependencies**
- Make sure you have Python 3.8+ and run:
- ```
- pip install -r requirements.txt
- ```
-
- 2. **Prepare data**
- Place the raw MovieLens CSV files in `data/raw/` as:
- - `movies_metadata.csv`
- - `credits.csv`
- - `keywords.csv`
- - `links.csv`
- - `ratings.csv`
-
- 3. **Run the pipeline**
- ```
- python app/Practical.py
- ```
- This will:
- - Clean and merge the data
- - Save interim cleaned CSVs to `data/interim/`
- - Generate all EDA plots and wordclouds, saving them to `report/images/`
- - Save interactive Plotly plots as PNG (requires [kaleido](https://github.com/plotly/Kaleido)) or HTML fallback
-
- ## Features
-
- - **Modular Preprocessing**: All data cleaning, merging, and type handling in `src/preprocessing.py`
- - **Automated EDA**: All plots and wordclouds generated and saved by `src/eda.py`
- - **Reproducibility**: One-command run for the entire workflow
- - **Notebook**: `notebooks/Practical.ipynb` for step-by-step exploration
-
- ## Requirements
-
- - pandas
- - numpy
- - matplotlib
- - seaborn
- - missingno
- - wordcloud
- - plotly
- - pycountry
- - kaleido (for static plotly image export)
-
- ## Notes
-
- - If static Plotly image export fails, HTML versions of the plots are saved as a fallback.
- - All output images are saved in `report/images/`.
- - Adjust paths in `src/eda.py` and `src/preprocessing.py` if your
+ ---
+ title: Movie Recommender System
+ emoji: 🎬
+ colorFrom: blue
+ colorTo: pink
+ sdk: gradio
+ sdk_version: "4.41.0" # you can also leave this out and HF picks latest
+ app_file: app.py
+ pinned: false
+ ---
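
The added lines form a front-matter block of flat `key: value` pairs between `---` delimiters. To sanity-check such a block locally, the following is a minimal sketch in Python using only the standard library. The helper name `read_front_matter` is hypothetical, and the parser handles only flat pairs with optional quotes and trailing `#` comments. It is not how Hugging Face reads the file, and a real YAML parser would be more robust.

```python
def read_front_matter(text):
    """Parse a leading '---' block of flat key: value pairs (not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # file does not start with a front-matter block
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        if ":" not in line:
            continue  # skip lines that are not key: value pairs
        key, _, value = line.partition(":")
        # Drop a trailing " # comment", then surrounding whitespace and quotes.
        value = value.split(" #", 1)[0].strip().strip("\"'")
        meta[key.strip()] = value
    return {}  # no closing delimiter found


# The front matter added by this commit, copied from the diff above.
README = """---
title: Movie Recommender System
emoji: 🎬
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: "4.41.0" # you can also leave this out and HF picks latest
app_file: app.py
pinned: false
---
"""

meta = read_front_matter(README)
print(meta["sdk"], meta["sdk_version"], meta["app_file"])
```

All values come back as strings (for example `pinned` is `"false"`), which is enough for checking that `sdk` and `app_file` are present before pushing.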