# Project Definition - Quickstart
This quickstart explains how to set up the environment, reproduce the core experiments, and run inference with this repository.
1) Create a Python environment and install dependencies:
```
python -m venv .venv
# Windows (Git Bash) path shown; on Linux/macOS use: source .venv/bin/activate
source .venv/Scripts/activate
pip install -r requirements.txt
```
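To confirm that the environment from step 1 is active and populated, a small check like the following may help. It is a minimal sketch, not part of the repository: the helper names `in_virtualenv` and `missing_requirements` are hypothetical, and the requirement parsing is deliberately simple (it reads only the leading package name on each line and skips pip options such as `-r`).

```python
import re
import sys
from importlib import metadata
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the venv while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def missing_requirements(requirements_file: str) -> list[str]:
    """List package names from the file that are not installed."""
    missing = []
    for line in Path(requirements_file).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling `in_virtualenv()` and `missing_requirements("requirements.txt")` from the repository root after step 1 should ideally give `True` and an empty list.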
2) Inspect the processed data and vectorizers (already available in the repo):
- `data/processed/` contains prepared train/valid/test CSVs and labels.
- `data/vectorizers/` contains the fitted TF-IDF vectorizer and sparse matrices.
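One way to inspect the two directories above without assuming any particular file names is to list their contents. This is a minimal sketch; the helper name `list_artifacts` is hypothetical, and it expects to be run from the repository root.

```python
from pathlib import Path

# Directories named in this quickstart, relative to the repository root.
ARTIFACT_DIRS = ("data/processed", "data/vectorizers")


def list_artifacts(repo_root: str = ".") -> dict[str, list[str]]:
    """Map each artifact directory to the sorted names of the files in it."""
    root = Path(repo_root)
    found = {}
    for rel in ARTIFACT_DIRS:
        folder = root / rel
        if folder.is_dir():
            found[rel] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            found[rel] = []  # directory missing: report it as empty
    return found


if __name__ == "__main__":
    for folder, names in list_artifacts().items():
        print(f"{folder}: {len(names)} file(s)")
        for name in names:
            print(f"  {name}")
```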
3) Run notebooks (recommended order):
- `notebooks/01_data_acquisition.ipynb`
- `notebooks/02_eda.ipynb`
- `notebooks/03_data_preprocessing.ipynb`
- `notebooks/04_feature_engineering.ipynb`
- modeling notebooks, `05_*.ipynb` through `14_comparsion.ipynb`
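The recommended order follows the numeric prefix of each notebook, so it can be derived from the file names. The sketch below assumes that every notebook in `notebooks/` starts with a number and an underscore, and that Jupyter with `nbconvert` is installed (this quickstart does not state that). The function names `ordered_notebooks` and `run_all` are hypothetical.

```python
import re
import subprocess
from pathlib import Path


def ordered_notebooks(notebook_dir: str = "notebooks") -> list[Path]:
    """Return notebooks sorted by their leading numeric prefix."""
    numbered = []
    for path in Path(notebook_dir).glob("*.ipynb"):
        match = re.match(r"(\d+)_", path.name)
        if match:  # ignore notebooks without a numeric prefix
            numbered.append((int(match.group(1)), path.name, path))
    return [path for _, _, path in sorted(numbered)]


def run_all(notebook_dir: str = "notebooks") -> None:
    """Execute each notebook in order, stopping at the first failure."""
    for path in ordered_notebooks(notebook_dir):
        # Assumes the `jupyter nbconvert` CLI is available on PATH.
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook",
             "--execute", "--inplace", str(path)],
            check=True,
        )
```

Running the notebooks interactively remains the simpler route; batch execution like this is mainly useful for checking that the whole pipeline still runs end to end.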
4) Run the demo/app:
```
streamlit run app.py
```
Notes:
- Preprocessed artifacts and trained model joblib files are stored under `data/processed`, `data/vectorizers`, and `data/models` to speed up reproduction.
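The stored joblib files can be loaded directly instead of being refit. The following is a minimal sketch: the helper name `load_artifact` and the file names in the comments are hypothetical, and the commented usage assumes the vectorizer and model follow the scikit-learn `transform`/`predict` interface, which this quickstart does not confirm.

```python
from pathlib import Path

import joblib


def load_artifact(path: str):
    """Load a joblib-serialized object, failing clearly if the file is absent."""
    artifact = Path(path)
    if not artifact.is_file():
        raise FileNotFoundError(f"Artifact not found: {artifact}")
    # joblib.load unpickles the file, so only load artifacts you trust.
    return joblib.load(artifact)


# Hypothetical file names; substitute the actual files found in the repo.
# vectorizer = load_artifact("data/vectorizers/tfidf_vectorizer.joblib")
# model = load_artifact("data/models/model.joblib")
# prediction = model.predict(vectorizer.transform(["some input text"]))
```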