Dependencies

System

  • Hadoop/HDFS
  • Spark
  • Airflow
  • Python 3.8+

Python

Install with pip:

pip install pandas numpy scikit-learn tensorflow
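After installing, it can help to confirm that the packages are visible to the interpreter that will run the pipeline. The following is a minimal sketch, not part of the project itself: the helper name `missing_packages` and the `PACKAGES` mapping are hypothetical, and the only non-obvious detail is that `scikit-learn` is imported as `sklearn`.

```python
import importlib.util

# Map pip package names (from the install command above) to import names.
# scikit-learn is imported as "sklearn"; the others use the same name.
PACKAGES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "tensorflow": "tensorflow",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All Python dependencies are importable.")
```

The check only locates each module without importing it, so it stays fast even for heavy packages such as tensorflow.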

Notes

  • Hadoop and Spark both run on the JVM, so ensure Java is installed before starting either.
  • Airflow and Hadoop should be configured and running before triggering the DAG.
  • If using a dev container, dependencies may already be installed.
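A quick way to see which system dependencies are present is to look for their command-line tools on `PATH`. This is a rough sketch under assumptions: the executable names `java`, `hdfs`, `spark-submit`, and `airflow` are the usual ones for these tools but may differ in a given installation, and the helper names `check_tools` and `python_ok` are hypothetical. Finding an executable does not show that a service is configured or running.

```python
import shutil
import sys

# Assumed executable names; adjust to match the local installation.
TOOLS = ["java", "hdfs", "spark-submit", "airflow"]


def check_tools(tools=TOOLS):
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.8+:", "ok" if python_ok() else "too old")
    for tool, path in check_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

In a dev container this should report a path for every tool if the dependencies are preinstalled.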