# Dependencies

## System

- Hadoop/HDFS
- Spark
- Airflow
- Python 3.8+

## Python

Install with pip:

```bash
pip install pandas numpy scikit-learn tensorflow
```

## Notes

- Ensure Java is installed for Hadoop/Spark.
- Airflow and Hadoop should be configured and running before triggering the DAG.
- If using a dev container, dependencies may already be installed.
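
To confirm that the requirements listed above are in place before triggering the DAG, a small preflight script can help. The following is a minimal sketch, not part of this project: the function names are hypothetical, and the executable names (`java`, `hdfs`, `spark-submit`, `airflow`) are assumptions about how the tools are exposed on `PATH`, so adjust them to your installation. It only checks that things are installed, not that Airflow or Hadoop are actually running.

```python
import importlib.util
import shutil
import sys

# Import names for the pip packages listed above.
# Note: the scikit-learn package is imported as "sklearn".
PYTHON_MODULES = ["pandas", "numpy", "sklearn", "tensorflow"]

# Assumed executable names; these may differ on your system.
SYSTEM_COMMANDS = ["java", "hdfs", "spark-submit", "airflow"]


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def missing_modules(modules):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def missing_commands(commands):
    """Return the executables that are not found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    problems = []
    if not python_ok():
        problems.append("Python 3.8+ is required")
    problems += [f"missing Python module: {m}" for m in missing_modules(PYTHON_MODULES)]
    problems += [f"command not on PATH: {c}" for c in missing_commands(SYSTEM_COMMANDS)]

    for line in problems:
        print(line)
    # Exit non-zero so the check can gate a CI step or a setup script.
    sys.exit(1 if problems else 0)
```

Inside a dev container this script should report nothing if the dependencies are preinstalled; on a fresh machine it lists what still needs to be set up.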