Dependencies
System
- Hadoop/HDFS
- Spark
- Airflow
- Python 3.8+
Python
Install with pip:
pip install pandas numpy scikit-learn tensorflow
Notes
- Hadoop and Spark run on the JVM, so make sure Java is installed.
- Airflow and Hadoop should be configured and running before triggering the DAG.
- If you are using a dev container, these dependencies may already be installed.
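To confirm the dependencies above are present before triggering the DAG, a small preflight script along these lines may help. This is a minimal sketch and not part of the project: the function names are hypothetical, and it assumes the `java`, `hadoop`, `spark-submit`, and `airflow` executables are on your PATH, which may not match your installation. It only checks that things are installed, not that Airflow or Hadoop are running.

```python
import importlib.util
import shutil
import sys

# pip package name -> import name (scikit-learn is imported as "sklearn").
PYTHON_PACKAGES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "tensorflow": "tensorflow",
}

# Assumption: these executables are on PATH; adjust for your setup.
SYSTEM_COMMANDS = ["java", "hadoop", "spark-submit", "airflow"]


def python_version_ok(minimum=(3, 8), version=None):
    """Return True if the interpreter version is at least `minimum`."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


def missing_python_packages(packages=None):
    """Return pip names of packages whose import name cannot be found."""
    packages = PYTHON_PACKAGES if packages is None else packages
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if importlib.util.find_spec(module_name) is None
    ]


def missing_commands(commands=None):
    """Return the commands that cannot be located on PATH."""
    commands = SYSTEM_COMMANDS if commands is None else commands
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    if not python_version_ok():
        print("Python 3.8+ is required")
    for name in missing_python_packages():
        print(f"Missing Python package: {name}")
    for cmd in missing_commands():
        print(f"Command not found on PATH: {cmd}")
```

Inside a dev container the script will typically print nothing, which suggests the dependencies are already installed.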