File Upload Guide: Where Each File Goes
This guide shows exactly which files are uploaded to which location (Hugging Face or GitHub) and when.
Overview: Four Upload Destinations
- Hugging Face Dataset Repo (`ananttripathiak/engine-maintenance-dataset`)
- Hugging Face Model Repo (`ananttripathiak/engine-maintenance-model`)
- Hugging Face Space (`ananttripathiak/engine-maintenance-space`)
- GitHub Repository (`ananttripathi/engine-predictive-maintenance`)
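Since the Space section below mentions a `src/config.py` configuration module, the repo IDs above could plausibly live there. This is a hypothetical sketch; the constant names are assumptions, not the project's actual code:

```python
# Hypothetical src/config.py: one place for every upload destination.
DATASET_REPO_ID = "ananttripathiak/engine-maintenance-dataset"
MODEL_REPO_ID = "ananttripathiak/engine-maintenance-model"
SPACE_REPO_ID = "ananttripathiak/engine-maintenance-space"
GITHUB_REPO_URL = "https://github.com/ananttripathi/engine-predictive-maintenance"
```

Centralizing the IDs means the data, training, and deploy scripts cannot drift out of sync on repo names.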
1. Hugging Face Dataset Repo
Repo ID: ananttripathiak/engine-maintenance-dataset
Created by: src/data_register.py and src/data_prep.py
Files Uploaded:
A. Raw Data (via src/data_register.py)
- File: `data/engine_data.csv`
- Uploaded to: `data/engine_data.csv` in the dataset repo
- When: Run `python src/data_register.py`
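The upload that `src/data_register.py` performs could be sketched with `huggingface_hub`'s `HfApi.upload_file`. The function name and lazy import below are my assumptions; only the source/destination paths and repo ID come from this guide:

```python
def register_raw_data(local_path="data/engine_data.csv",
                      repo_id="ananttripathiak/engine-maintenance-dataset"):
    """Push the raw CSV to the HF dataset repo (network call; needs an HF token)."""
    # Imported lazily so the sketch loads even without huggingface_hub installed.
    from huggingface_hub import HfApi
    api = HfApi()
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo="data/engine_data.csv",  # destination path inside the dataset repo
        repo_id=repo_id,
        repo_type="dataset",
    )
```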
B. Processed Data (via src/data_prep.py)
- File: `data/processed/train.csv`
- Uploaded to: `data/train.csv` in the dataset repo
- When: Run `python src/data_prep.py`
- File: `data/processed/test.csv`
- Uploaded to: `data/test.csv` in the dataset repo
- When: Run `python src/data_prep.py`
Scripts that upload here:
- `src/data_register.py` → uploads raw data
- `src/data_prep.py` → uploads train/test splits
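How `src/data_prep.py` produces `train.csv` and `test.csv` is not shown in this guide; a minimal stdlib-only sketch of a shuffled split (function name, 80/20 ratio, and seed are all assumptions) might look like:

```python
import csv
import random

def split_csv(src="data/engine_data.csv",
              train_out="data/processed/train.csv",
              test_out="data/processed/test.csv",
              test_fraction=0.2, seed=42):
    """Shuffle the rows of `src` and write train/test CSVs with the same header."""
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    n_test = int(len(rows) * test_fraction)
    for path, subset in ((test_out, rows[:n_test]), (train_out, rows[n_test:])):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(subset)
```

The actual script may well use pandas/scikit-learn instead; the point is only that the split happens locally before the processed files are pushed to the dataset repo.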
2. Hugging Face Model Repo
Repo ID: ananttripathiak/engine-maintenance-model
Created by: src/train.py
Files Uploaded:
- File: `models/best_model.joblib`
- Uploaded to: `model.joblib` in the model repo
- When: Run `python src/train.py` (after training completes)
Scripts that upload here:
- `src/train.py` → uploads the trained model
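The model push that `src/train.py` makes after training can be sketched the same way (function name and lazy import are assumptions). Note the rename: the local `models/best_model.joblib` lands in the repo as `model.joblib`:

```python
def upload_model(local_path="models/best_model.joblib",
                 repo_id="ananttripathiak/engine-maintenance-model"):
    """Push the trained model to the HF model repo (network call; needs an HF token)."""
    # Lazy import so the sketch loads without huggingface_hub installed.
    from huggingface_hub import HfApi
    api = HfApi()
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo="model.joblib",  # renamed on upload, per the guide above
        repo_id=repo_id,
        repo_type="model",
    )
```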
3. Hugging Face Space (Streamlit App)
Repo ID: ananttripathiak/engine-maintenance-space
Created by: src/deploy_to_hf.py
Files Uploaded:
The `src/deploy_to_hf.py` script uploads the entire project folder except:
- `data/` (ignored: too large)
- `mlruns/` (ignored: MLflow tracking data)
- `models/` (ignored: model is in the model repo)
- `.github/` (ignored: GitHub-specific)
Files that ARE uploaded to the Space:
- `src/app.py` → main Streamlit app
- `src/inference.py` → inference utilities
- `src/config.py` → configuration
- `Dockerfile` → container definition
- `requirements.txt` → Python dependencies
- `README.md` → documentation
- Other `src/*.py` files (if needed by the app)
Scripts that upload here:
- `src/deploy_to_hf.py` → uploads deployment files
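A whole-folder upload with exclusions maps naturally onto `huggingface_hub`'s `HfApi.upload_folder` and its `ignore_patterns` parameter. This is a sketch of what `src/deploy_to_hf.py` might do, not its actual contents; the function name and exact patterns are assumptions based on the ignore list above:

```python
def deploy_space(project_dir=".",
                 repo_id="ananttripathiak/engine-maintenance-space"):
    """Upload the project folder to the HF Space, skipping the ignored dirs."""
    # Lazy import so the sketch loads without huggingface_hub installed.
    from huggingface_hub import HfApi
    api = HfApi()
    api.upload_folder(
        folder_path=project_dir,
        repo_id=repo_id,
        repo_type="space",
        # Mirrors the ignore list documented above.
        ignore_patterns=["data/*", "mlruns/*", "models/*", ".github/*"],
    )
```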
4. GitHub Repository
Repo URL: https://github.com/ananttripathi/engine-predictive-maintenance
Created by: You (manually via git push)
Files Uploaded:
Everything in the mlops/ folder, including:
- ✅ `data/` (including `engine_data.csv`, `processed/train.csv`, `processed/test.csv`)
- ✅ `src/` (all Python scripts)
- ✅ `notebooks/` (EDA notebooks, etc.)
- ✅ `.github/workflows/pipeline.yml` → GitHub Actions workflow
- ✅ `requirements.txt`
- ✅ `Dockerfile`
- ✅ `README.md`
- ✅ `models/` (if you want to track model versions in git)
- ✅ `mlruns/` (MLflow tracking data; optional)
- ✅ All other project files
How to upload:

```shell
cd /Users/ananttripathi/Desktop/mlops
git init
git add .
git commit -m "Initial commit: Predictive maintenance MLOps pipeline"
git remote add origin https://github.com/ananttripathi/engine-predictive-maintenance.git
git push -u origin main
```
Upload Workflow Summary
Step-by-Step Upload Process:
1. Data Registration → Hugging Face Dataset Repo
   - Run: `python src/data_register.py`
   - Uploads: `data/engine_data.csv` → HF Dataset Repo
2. Data Preparation → Hugging Face Dataset Repo
   - Run: `python src/data_prep.py`
   - Uploads: `data/processed/train.csv` and `test.csv` → HF Dataset Repo
3. Model Training → Hugging Face Model Repo
   - Run: `python src/train.py`
   - Uploads: `models/best_model.joblib` → HF Model Repo
4. Deploy App → Hugging Face Space
   - Run: `python src/deploy_to_hf.py`
   - Uploads: `src/app.py`, `Dockerfile`, `requirements.txt`, etc. → HF Space
5. Push to GitHub → GitHub Repository
   - Run: `git add . && git commit -m "Complete MLOps pipeline" && git push origin main`
   - Uploads: everything → GitHub Repo
What Gets Uploaded Automatically vs Manually
Automatic (via Scripts):
- ✅ Hugging Face Dataset Repo: `src/data_register.py` and `src/data_prep.py`
- ✅ Hugging Face Model Repo: `src/train.py`
- ✅ Hugging Face Space: `src/deploy_to_hf.py`
- ✅ GitHub Actions: runs automatically when you push to GitHub
Manual:
- ⚠️ GitHub Repository: you need to run `git push` yourself
File Size Considerations
Large Files (may be ignored):
- `data/engine_data.csv` → uploaded to the HF Dataset repo, but you might want to add it to `.gitignore` for GitHub
- `mlruns/` → MLflow tracking data (can be large); ignored by the HF Space deploy
- `models/best_model.joblib` → uploaded to the HF Model repo, but you might want to add it to `.gitignore` for GitHub
Recommended .gitignore:
```
# Large data files
data/*.csv
data/processed/*.csv

# MLflow tracking
mlruns/

# Model files (already in HF Model Repo)
models/*.joblib

# Python cache
__pycache__/
*.pyc
.venv/
```
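You can sanity-check which files such patterns would exclude with the stdlib `fnmatch` module. This is only an approximation of git's rules (notably, git's `*` never crosses a `/`, while `fnmatch`'s does), but it is close enough for the simple patterns above:

```python
import fnmatch

# Approximation of the recommended .gitignore above.
IGNORE_PATTERNS = [
    "data/*.csv",
    "data/processed/*.csv",
    "mlruns/*",
    "models/*.joblib",
    "*.pyc",
]

def is_ignored(path):
    """True if `path` matches any ignore pattern (fnmatch semantics, not git's)."""
    return any(fnmatch.fnmatch(path, pat) for pat in IGNORE_PATTERNS)
```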
Quick Reference Table
| File/Folder | HF Dataset | HF Model | HF Space | GitHub |
|---|---|---|---|---|
| `data/engine_data.csv` | ✅ | ❌ | ❌ | ⚠️ Optional |
| `data/processed/train.csv` | ✅ | ❌ | ❌ | ⚠️ Optional |
| `data/processed/test.csv` | ✅ | ❌ | ❌ | ⚠️ Optional |
| `models/best_model.joblib` | ❌ | ✅ | ❌ | ⚠️ Optional |
| `src/app.py` | ❌ | ❌ | ✅ | ✅ |
| `src/train.py` | ❌ | ❌ | ✅ | ✅ |
| `src/data_prep.py` | ❌ | ❌ | ✅ | ✅ |
| `Dockerfile` | ❌ | ❌ | ✅ | ✅ |
| `requirements.txt` | ❌ | ❌ | ✅ | ✅ |
| `.github/workflows/pipeline.yml` | ❌ | ❌ | ❌ | ✅ |
| `README.md` | ❌ | ❌ | ✅ | ✅ |
Legend:
- ✅ = uploaded automatically or should be uploaded
- ❌ = not uploaded to this location
- ⚠️ Optional = can be uploaded, but you might want to exclude it from GitHub due to size
Need Help?
- Hugging Face Dataset: check `src/hf_data_utils.py`
- Hugging Face Model: check `src/hf_model_utils.py`
- Hugging Face Space: check `src/deploy_to_hf.py`
- GitHub: standard git commands