# Dependencies
## System
- Hadoop/HDFS
- Spark
- Airflow
- Python 3.8+
## Python
Install with pip:
```bash
pip install pandas numpy scikit-learn tensorflow
```
## Notes
- Hadoop and Spark both run on the JVM, so make sure Java is installed before setting them up.
- Airflow and Hadoop should be configured and running before triggering the DAG.
- If you are using a dev container, these dependencies may already be installed.
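
To confirm the requirements above before triggering the DAG, a small preflight check can help. The following is a minimal sketch, not part of the pipeline itself: the function name `check_dependencies` is hypothetical, and the executable names (`java`, `hdfs`, `spark-submit`, `airflow`) are assumptions about how these tools are exposed on your `PATH`. It only checks that tools are installed, not that the Airflow or Hadoop services are actually running.

```python
import importlib.util
import shutil
import sys

# Import names for the pip packages listed above.
# Note: scikit-learn is imported as "sklearn".
PYTHON_MODULES = ["pandas", "numpy", "sklearn", "tensorflow"]

# Assumed executable names; adjust them to match your installation.
SYSTEM_COMMANDS = ["java", "hdfs", "spark-submit", "airflow"]


def check_dependencies(modules=PYTHON_MODULES,
                       commands=SYSTEM_COMMANDS,
                       min_python=(3, 8)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # Python version check (the docs require Python 3.8+).
    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python {required}+ is required")

    # find_spec locates a top-level module without importing it.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python module not importable: {name}")

    # shutil.which returns None when the command is not on PATH.
    for command in commands:
        if shutil.which(command) is None:
            problems.append(f"Command not found on PATH: {command}")

    return problems


# Report anything that is missing in the current environment.
for issue in check_dependencies():
    print("MISSING:", issue)
```

If the script prints nothing, the listed packages and commands were found. Anything it reports can be installed with the steps above, or ignored if your setup exposes the tool under a different name.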