# Dependencies
## System
- Hadoop/HDFS
- Spark
- Airflow
- Python 3.8+
## Python
Install with pip:
```bash
pip install pandas numpy scikit-learn tensorflow
```
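
To confirm the install worked, a short check along these lines may help. It is only a sketch: the helper names `missing_modules` and `python_version_ok` are hypothetical and not part of this project, and the list of import names is an assumption derived from the pip command above (note that `scikit-learn` is imported as `sklearn`).

```python
import importlib.util
import sys

# Import names assumed from the pip command above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "sklearn", "tensorflow"]


def missing_modules(modules):
    """Return the top-level module names that cannot be located."""
    # find_spec only locates a top-level module; it does not import it.
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_version_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_version_ok():
        print("Python 3.8+ is required; found", sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages (import names):", ", ".join(missing))
    else:
        print("All listed Python packages were found.")
```

The check only looks up whether each package can be located; it does not verify versions or that the packages load without errors.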
## Notes
- Ensure Java is installed; Hadoop and Spark both run on the JVM.
- Configure and start Airflow and Hadoop before triggering the DAG.
- If you are using a dev container, the dependencies may already be installed.
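
The notes above can be partly checked with a small preflight script. The sketch below assumes the standard executable names (`java`, `hadoop`, `spark-submit`, `airflow`) are on `PATH`, which may not match every installation, and `find_commands` is a hypothetical helper name. It only checks that the executables can be found, not that the Airflow or Hadoop services are actually running.

```python
import shutil

# Executable names are assumptions; adjust them for your installation.
REQUIRED_COMMANDS = {
    "Java": "java",
    "Hadoop/HDFS": "hadoop",
    "Spark": "spark-submit",
    "Airflow": "airflow",
}


def find_commands(commands):
    """Map each label to the resolved executable path, or None if not found."""
    return {label: shutil.which(cmd) for label, cmd in commands.items()}


if __name__ == "__main__":
    for label, path in find_commands(REQUIRED_COMMANDS).items():
        status = path if path else "NOT FOUND on PATH"
        print(f"{label}: {status}")
```

Inside a dev container, running this first is a quick way to see which tools are already present before installing anything.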