averye-duke committed on
Commit
fd1ef54
·
1 Parent(s): 11b1420

Add app.py and setup files for Hugging Face Space

Browse files
Files changed (4)
  1. README.md +194 -8
  2. app.py +251 -0
  3. config.yaml +73 -0
  4. requirements.txt +14 -0
README.md CHANGED
@@ -1,12 +1,198 @@
  ---
- title: Module3
- emoji: 📊
- colorFrom: purple
- colorTo: green
- sdk: gradio
- sdk_version: 6.0.0
- app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: "Coffee Cup Points Estimator"
+ emoji: "☕️"
+ colorFrom: "brown"
+ colorTo: "green"
+ sdk: "gradio"
+ sdk_version: "5.49.1"
+ app_file: "app.py"
  pinned: false
  ---

+ # Module3Project
+
+ # Overview
+ Predict coffee quality scores from sensory attributes using a RandomForest model and an MLOps pipeline.
+ This project demonstrates an end-to-end MLOps workflow: data ingestion, preprocessing, model training, containerization, cloud deployment, and front-end integration.
+
+ # Data
+ This project uses the coffee quality dataset found here:
+ https://www.kaggle.com/datasets/volpatto/coffee-quality-database-from-cqi
+
+ The cleaned dataset is publicly hosted on Google Cloud Storage for reproducibility, and the preprocessing pipeline downloads it automatically via the data.url field in config.yaml:
+ https://storage.googleapis.com/coffee-quality-data/preprocessed_data.csv
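As an aside, the url-or-local fallback described above can be sketched in a few lines of Python (the function name and the fallback default are hypothetical illustrations, not the project's actual code):

```python
def resolve_data_source(cfg: dict) -> str:
    """Prefer data.url from config.yaml; otherwise fall back to data.local_path."""
    data = (cfg or {}).get("data", {})
    # an empty or missing url means "use the local copy"
    return data.get("url") or data.get("local_path", "data/raw/raw_data.csv")

# in the real pipeline this dict would come from config.yaml (e.g. via yaml.safe_load)
cfg = {"data": {"url": "https://storage.googleapis.com/coffee-quality-data/preprocessed_data.csv",
                "local_path": "data/raw/raw_data.csv"}}
print(resolve_data_source(cfg))
```

With the url present the remote CSV wins; deleting the url key makes the same call return the local path.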
+
+ # Architecture
+ Data → Cloud (GCS) → Preprocess (ColumnTransformer) → Train (RandomForest) → FastAPI → Gradio frontend
+
+ ## Frontend Architecture
+ ```
+ ┌───────────────┐    ┌─────────────┐    ┌───────────────┐    ┌──────────────┐
+ │  Kaggle Data  │ →  │ GCS Bucket  │ →  │ FastAPI (API) │ →  │  Gradio UI   │
+ └───────────────┘    └─────────────┘    └───────────────┘    └──────────────┘
+ ```
+
+ # Frontend
+ The Gradio-based frontend is deployed at:
+
+ # Cloud Deployment:
+ The FastAPI container is deployed on Google Cloud Run.
+ Base URL:
+ https://coffee-api-354131048216.us-central1.run.app
+
+ Endpoints:
+
+ - /health – Health check
+ - /predict_named – POST endpoint for predictions
+ - /docs – API documentation (Swagger)
+
+ Example cURL:
+ ```
+ curl -X POST "https://coffee-api-354131048216.us-central1.run.app/predict_named" \
+   -H "Content-Type: application/json" \
+   -d '{"rows":[{"Aroma":7.5,"Flavor":6.0,"Body":5.5,"Acidity":8.0,"Sweetness":9.0,"Balance":7.0,"Aftertaste":6.5,"Clean.Cup":9.0}]}'
+ ```
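The same request can be issued from Python. The sketch below (helper names are ours, not part of the repo) builds the `{"rows": [...]}` body the endpoint expects and posts it with requests:

```python
import json

API_URL = "https://coffee-api-354131048216.us-central1.run.app/predict_named"

def build_payload(row: dict) -> dict:
    """Wrap one named-feature row in the {"rows": [...]} shape /predict_named expects."""
    return {"rows": [row]}

def predict(row: dict, timeout: float = 10.0):
    """POST a single row to the deployed API; requires network access."""
    import requests  # pinned in requirements.txt
    resp = requests.post(API_URL, json=build_payload(row), timeout=timeout)
    resp.raise_for_status()
    return resp.json().get("predictions", [])

example_row = {"Aroma": 7.5, "Flavor": 6.0, "Body": 5.5, "Acidity": 8.0,
               "Sweetness": 9.0, "Balance": 7.0, "Aftertaste": 6.5, "Clean.Cup": 9.0}
print(json.dumps(build_payload(example_row)))
```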
+
+ # Setup:
+ ```
+ python -m venv venv
+ source venv/bin/activate   # Windows: venv\Scripts\activate
+ pip install --upgrade pip
+ pip install -r requirements.txt
+ ```
+ # Testing/running scripts
+ To test preprocess.py:
+ ```
+ python scripts/preprocess.py
+ ```
+ Confirm all output files exist by running:
+ ```
+ ls -l data/cleaned/X_train.csv data/cleaned/X_test.csv data/cleaned/y_train.csv data/cleaned/y_test.csv artifacts/preprocessor.joblib
+ ```
+ We wrote a unit test script, tests/test_preprocessor.py; to run it:
+ ```
+ pip install pytest
+ pytest -q
+ ```
+
+ To run the server, do a health check, and send a sample predict payload:
+ ```
+ uvicorn app.server:app --reload --port 8000
+ curl http://127.0.0.1:8000/health
+ curl -X POST "http://127.0.0.1:8000/predict_named" \
+   -H "Content-Type: application/json" \
+   -d '{"rows":[ {"Aroma":7.5,"Flavor":6.0,"Number.of.Bags":1,"Category.One.Defects":0} ] }'
+ ```
+
+ To train the model:
+ ```
+ python scripts/train.py
+ ```
+ Ensure artifacts/model.joblib was built.
+
+ To run the UI app, start the server and type in the CLI:
+ ```
+ python app/frontend.py
+ ```
+ Enter 3 when prompted:
+ ```
+ wandb: (1) Create a W&B account
+ wandb: (2) Use an existing W&B account
+ wandb: (3) Don't visualize my results
+ ```
+ A personal W&B login is only needed to sync results to the wandb website. Then open the printed link in a browser.
+
+ # Model
+ We used a RandomForestRegressor for the model, with the test set holding 20% of the dataset. The model reaches an R² of 94.2% with 100 estimators.
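For intuition, an 80/20 split with a fixed seed can be sketched in plain Python (a simplified stand-in; scripts/train.py presumably uses scikit-learn's train_test_split with test_size=0.2, random_state=42):

```python
import random

def split_indices(n: int, test_size: float = 0.2, random_state: int = 42):
    """Deterministically shuffle 0..n-1 and carve off the first test_size fraction as the test set."""
    idx = list(range(n))
    random.Random(random_state).shuffle(idx)
    n_test = int(round(n * test_size))
    return sorted(idx[n_test:]), sorted(idx[:n_test])  # (train, test)

train_idx, test_idx = split_indices(100)
print(len(train_idx), len(test_idx))  # 80 20
```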
+
+ Weights & Biases (W&B) tracks model performance. The data can be found in wandb/run.../files/wandb-summary.json and is presented like this:
+ ```
+ {
+   "_timestamp": 1.763876781125257e+09,
+   "_wandb": {"runtime": 2},
+   "_runtime": 2,
+   "_step": 0,
+   "R2": 0.9424069488737763,
+   "RMSE": 0.5528660703704987,
+   "MAE": 0.31615526315789416,
+   "MAPE": 0.39006294567905464
+ }
+ ```
+ These performance metrics are also stored in artifacts/metrics.json like this:
+ ```
+ {
+   "R2": 0.9424069488737761,
+   "RMSE": 0.5528660703704994,
+   "MAE": 0.31615526315789455,
+   "MAPE": 0.39006294567905514
+ }
+ ```
+ The R² of 0.942 shows a very good fit: cup score correlates strongly with the other columns. The RMSE of 0.55 shows a small prediction error and reinforces the model's high performance, and the MAE of 0.32 likewise indicates predictions close to the actual cup points. The reported MAPE of 0.39 is best read as an average percentage error of about 0.4% (a 39% error would be inconsistent with an MAE of 0.32 on cup scores near 80). The small size of the training dataset remains the main caveat for these estimates.
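For reference, the four reported metrics can be recomputed from raw predictions with plain Python (a minimal sketch; the training script presumably relies on scikit-learn's implementations, whose MAPE is returned as a fraction rather than a percent):

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute R2, RMSE, MAE and MAPE (in percent) for paired prediction lists."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "MAPE": 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / n,
    }

# toy cup scores: predictions within half a point of the truth
print(regression_metrics([80.0, 82.0, 84.0], [80.5, 81.5, 84.5]))
```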
+
+ # 🐳 Docker and Testing
+ ## Build and run the image
+ ```
+ # from the project root
+ docker build -t coffee-api:dev .
+ docker run --rm -e WANDB_MODE=offline -p 8000:8000 coffee-api:dev
+ ```
+ Note: Use WANDB_MODE=offline (as shown above) when running inside Docker or CI to prevent login prompts from Weights & Biases. If you have a W&B API key, set it via WANDB_API_KEY=your_key to enable cloud logging.
+
+ ## Run the container with host mounts
+ ```
+ docker run --rm -p 8000:8000 \
+   -v "$(pwd)/artifacts":/app/artifacts \
+   -v "$(pwd)/config.yaml":/app/config.yaml \
+   -v "$(pwd)/data":/app/data \
+   coffee-api:dev
+ ```
+ Then open:
+ - Health check: http://127.0.0.1:8000/health
+ - Interactive docs: http://127.0.0.1:8000/docs
+
+ If artifacts are missing, the container automatically runs scripts/preprocess.py to generate them.
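That fallback amounts to a check like the following sketch (a hypothetical helper mirroring the described behavior; the actual logic lives in the container's startup code):

```python
import subprocess
from pathlib import Path

REQUIRED = ["artifacts/preprocessor.joblib", "artifacts/model.joblib"]

def missing_artifacts(root: str = ".") -> list:
    """List the required artifact paths that do not exist under root."""
    return [p for p in REQUIRED if not (Path(root) / p).exists()]

def ensure_artifacts(root: str = ".") -> None:
    # regenerate the artifacts by running the preprocessing script when any are absent
    if missing_artifacts(root):
        subprocess.run(["python", "scripts/preprocess.py"], check=True)
```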
+
+ ## Run tests inside the container
+
+ To verify reproducibility of the preprocessing and data pipeline:
+ ```
+ docker run --rm -v "$(pwd)":/app -w /app coffee-api:dev python -m pytest -q
+ ```
+ Expected output:
+ ```
+ ...
+ 3 passed in ~0.9s
+ ```
+
+ ## Docker-related notes:
+ - Ports: the container exposes 8000 (mapped to host port 8000)
+ - Artifacts (preprocessor.joblib, model.joblib) are mounted from the host for faster iteration
+
+ # Limitations & Ethics
+ Predictions depend on sensory ratings, which are subjective.
+ The dataset may contain sampling bias by country or producer, so the model is not suitable for real-world or commercial grading of coffee quality without calibration against expert cuppers.
+
+ # Notes / Gotchas
+ - config.yaml may include data.input_columns: if present, the server will expect those columns and reindex incoming payloads automatically.
+ - The server will try to load artifacts/preprocessor.joblib and artifacts/model.joblib. If those are missing, the server returns deterministic dummy predictions (development mode).
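The reindexing gotcha can be illustrated with a short sketch (our own helper for illustration; the server's actual implementation may differ):

```python
from typing import Any, Dict, List

def reindex_rows(rows: List[Dict[str, Any]], input_columns: List[str]) -> List[Dict[str, Any]]:
    """Force each payload row to exactly the configured columns:
    unknown keys are dropped and missing keys become None (JSON null)."""
    return [{col: row.get(col) for col in input_columns} for row in rows]

cols = ["Aroma", "Flavor", "Body"]
print(reindex_rows([{"Aroma": 7.5, "Sweetness": 9.0}], cols))
# [{'Aroma': 7.5, 'Flavor': None, 'Body': None}]
```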
+
+ # ☁️ Cloud Services Used
+ - **Google Cloud Storage (GCS):** Stores the cleaned dataset (`preprocessed_data.csv`) publicly.
+ - **Google Cloud Run:** Hosts and serves the FastAPI model API container.
+ - **Weights & Biases (W&B):** Tracks model training metrics and performance.
+
+ # Hugging Face
+
+ # 🧠 Authors
+ - Eugenia Tate
+ - Avery Estopinal
+
+ # References:
+ - OpenAI. (2025). ChatGPT (Version 5.1) [Large language model]. https://chat.openai.com. Portions of the preprocessing, frontend, and training code, and most of the server code, were assisted by ChatGPT (OpenAI GPT-5.1). The authors verified and adapted the generated code and fully understand what it does and how to apply the knowledge in the future.
+ - Kaggle Coffee Quality Data (Volpatto, 2020). https://www.kaggle.com/datasets/volpatto/coffee-quality-database-from-cqi
app.py ADDED
@@ -0,0 +1,251 @@
+ # app/frontend.py
+ # Author: Eugenia Tate
+ # Date: 11/23/2025
+
+ # CITATION:
+ # ChatGPT was used to prototype robust JSON-sanitization and input-coercion logic when encountering serialization errors
+ # and mixed user inputs (strings, noisy numeric text, pandas / numpy scalars). A recursive approach and conversion patterns
+ # were suggested; we reviewed and thoroughly tested the code locally. See coerce_and_clamp_dict() and make_json_safe() below.
+
+ # import necessary helpers
+ import os
+ import yaml
+ import json
+ import math
+ import pandas as pd, numpy as np  # table handling
+ import gradio as gr               # UI
+ import requests                   # to call the API server
+ from typing import Dict, Any, List
+
+ # point to the config.yaml file to retrieve the API URL
+ CONFIG_PATH = os.path.join(os.getcwd(), "config.yaml")
+ # The above line was modified by ChatGPT 5.1 at 10:41a on 11/24/25 to work with Hugging Face
+ # if the config exists, load it
+ if os.path.exists(CONFIG_PATH):
+     with open(CONFIG_PATH, "r") as f:
+         cfg = yaml.safe_load(f) or {}  # guard against an empty YAML file
+ # if the config does not exist, fall back to an empty dict
+ else:
+     cfg = {}
+
+ # server endpoint the UI will use for POST; if the config is missing, fall back to the local predict_named URL
+ API_URL = cfg.get("api_url", {}).get("FastAPI", "http://127.0.0.1:8000/predict_named")
+
+ # reduced set of sensible columns exposed in the UI to the end user
+ INPUT_COLS = [
+     "Aroma", "Flavor", "Aftertaste", "Acidity",
+     "Body", "Balance", "Sweetness", "Clean.Cup"
+ ]
+ # help text for the end user explaining the Clean.Cup feature
+ CLEAN_CUP_HELP = (
+     "Clean.Cup indicates the absence of off-flavors or defects (higher is better). "
+     "Typically scored on the same sensory scale as other cup attributes."
+ )
+ # enforce the 0 to 10 range of possible input values
+ RANGES = {c: (0.0, 10.0) for c in INPUT_COLS}
+
+ # ------------------------------------ CITED BLOCK --------------------------------------------------------------------
+ # implemented using ChatGPT (conversation 2025-11-23) to help normalize free-form user input into numeric values within range
+ # converts user values to the allowed 0-10 range to avoid errors/crashes: handles blanks, strings, and noisy input by
+ # stripping characters, and sets None for missing / invalid entries (JSON's null)
+ def coerce_and_clamp_dict(row: Dict[str, Any]) -> Dict[str, Any]:
+     out: Dict[str, Any] = {}
+     # iterate over the 8 input columns
+     for k in INPUT_COLS:
+         v = row.get(k, "")
+         # a blank or missing value becomes None
+         if v is None or (isinstance(v, str) and v.strip() == ""):
+             out[k] = None
+             continue
+         # try to convert to float
+         fv = None
+         try:
+             fv = float(v)
+         except Exception:
+             # try to strip out non-digit characters (e.g. "7.5pts" -> "7.5")
+             try:
+                 cleaned = "".join(ch for ch in str(v) if (ch.isdigit() or ch in ".-"))
+                 fv = float(cleaned) if cleaned not in ("", ".", "-") else None
+             except Exception:
+                 fv = None
+         # if conversion failed -> None
+         if fv is None or (isinstance(fv, float) and (math.isnan(fv) or math.isinf(fv))):
+             out[k] = None
+             continue
+         # once we have a clean numeric, clamp it into the valid [0, 10] input range:
+         # if the user typed 13 it is clamped to 10; -2 becomes 0
+         lo, hi = RANGES.get(k, (None, None))
+         if lo is not None and hi is not None:
+             fv = max(lo, min(hi, fv))
+         out[k] = float(fv)
+     # return a clean dict to be sent to the server
+     return out
+
+ # ChatGPT 5.1 was used to prototype this recursive JSON-sanitizer.
+ # The function recursively walks nested containers (dicts, lists, tuples) and ensures any nested
+ # structure (e.g. {"payload": [{"Aroma": np.nan}]}) becomes JSON-safe everywhere, not just at the top level.
+ def make_json_safe(obj):
+     # dict
+     if isinstance(obj, dict):
+         return {k: make_json_safe(v) for k, v in obj.items()}
+     # list/tuple
+     if isinstance(obj, (list, tuple)):
+         return [make_json_safe(v) for v in obj]
+     # numpy scalar -> python scalar
+     try:
+         import numpy as _np
+         if isinstance(obj, _np.generic):
+             return make_json_safe(obj.item())
+     except Exception:
+         pass
+     # floats: map NaN/Inf -> None
+     if isinstance(obj, float):
+         if math.isnan(obj) or math.isinf(obj):
+             return None
+         return float(obj)
+     # ints, bools, strs, None: ok
+     if isinstance(obj, (int, bool, str)) or obj is None:
+         return obj
+     # fallback
+     try:
+         return str(obj)
+     except Exception:
+         return None
+
+ # ------------------------------------------ END CITED BLOCK ------------------------------------------------
+
+ # helper that returns True if every value in a row is null or numeric 0, otherwise False
+ def _row_is_all_null_or_zero(row: Dict[str, Any]) -> bool:
+     for v in row.values():
+         # missing/null -> keep scanning (counts as "no numeric input")
+         if v is None:
+             continue
+         # numeric non-zero -> row is VALID
+         if isinstance(v, (int, float)) and v != 0:
+             return False
+         # anything else (string, etc.) is considered missing/invalid; continue,
+         # but coerce_and_clamp_dict should already have converted those to None or a numeric
+     return True
+
+ # sends JSON to the server endpoint, returns a tuple (predictions list, raw response/error)
+ def call_api_named(payload_rows: List[Dict[str, Any]]):
+     # sanitize the payload so it's JSON-serializable and uses `null` for missing values
+     safe_body = {"rows": make_json_safe(payload_rows)}
+     try:
+         payload_str = json.dumps(safe_body)
+     except Exception as e:
+         return None, f"Serialization error: {e}"
+     # try POSTing to get predictions using the requests lib
+     headers = {"Content-Type": "application/json"}
+     try:
+         response = requests.post(API_URL, data=payload_str, headers=headers, timeout=10)  # 10 s timeout to avoid hanging
+         response.raise_for_status()
+         # on SUCCESS (200 OK) return the prediction list and the full raw text response for the debug box
+         return response.json().get("predictions", []), response.text
+     except Exception as e:
+         return None, f"API error: {e}"  # on error return None
+
+ # prettifies the prediction and debug JSON
+ def predict_from_rows_of_dicts(rows_of_dicts: List[Dict[str, Any]]):
+     payload_rows = [coerce_and_clamp_dict(row) for row in rows_of_dicts]
+     # decide whether submission is allowed:
+     # - if every submitted row is all-null-or-zero, refuse
+     all_rows_invalid = all(_row_is_all_null_or_zero(r) for r in payload_rows)
+     if all_rows_invalid:
+         debug = {"payload": payload_rows, "response_raw": "skipped - all values missing or zero"}
+         return "Please enter at least one numeric attribute (non-zero) before submitting.", json.dumps(debug, indent=2)
+     # otherwise proceed and call the API (allowed if at least one row has a non-zero numeric)
+     preds, raw = call_api_named(payload_rows)
+     # build a debug dictionary containing both the payload and the raw server response
+     debug = {"payload": payload_rows, "response_raw": raw}
+     # if the API fails, return an empty prediction plus the debug JSON for debugging
+     if preds is None:
+         return "", json.dumps(debug, indent=2)
+     # prettify predictions upon a successful call, rounding to 1 decimal place (user friendly)
+     prettified_pred = [f"Predicted Coffee Quality Points = {round(float(p), 1)}" for p in preds]
+     # return the prettified prediction and the debug JSON for the debug box
+     return "\n".join(prettified_pred), json.dumps(debug, indent=2)
+
+ def predict_from_table(table):
+     rows_of_dicts = table_to_list_of_dicts(table)
+     return predict_from_rows_of_dicts(rows_of_dicts)
+
+ # ------------------------------------ CITED BLOCK -------------------------------------
+ # ChatGPT was used on 11/23/2025 to fix this function after encountering errors, to deal
+ # with the 2 possible incoming formats: DataFrame and list of lists.
+
+ # helper that puts input into the list-of-dicts format the server expects, keyed by INPUT_COLS:
+ # [{"Aroma": 7.5, "Flavor": 8.0, ...}];
+ # fills missing columns with empty strings so coerce_and_clamp_dict() can convert them to None
+ def table_to_list_of_dicts(table):
+     # if the table passed in is a DataFrame instance, turn it into dicts
+     if isinstance(table, pd.DataFrame):
+         df = table
+         return [df.iloc[i].to_dict() for i in range(len(df))]
+     # else assume the table is a list of lists and manually pair each element with its column
+     rows = []
+     for row in table:
+         # ensure the row has the right length
+         vals = list(row) + [""] * max(0, len(INPUT_COLS) - len(row))
+         rows.append({col: vals[i] for i, col in enumerate(INPUT_COLS)})
+     return rows
+ # ------------------------------- END CITED BLOCK -------------------------------------
+
+ # -------------------------------- Gradio UI ------------------------------------------------------
+ with gr.Blocks(title="Coffee Quality Points Estimator") as demo:
+     # inline HTML/CSS to style user instructions
+     gr.Markdown("<h1 style='text-align:center;color:#08306B'>Coffee Quality Points Estimator</h1>")
+     gr.Markdown(
+         "<div style='font-size:17px;font-weight:700;color:#2b6cb0'>"
+         "Instructions: Fill the known sensory attributes (0–10). Leave unknowns blank and the model will "
+         "attempt to infer missing values. Then click <b style='color:#ff6600'>Submit</b> to estimate the "
+         "<b>Coffee Quality Points</b> (Total.Cup.Points). Higher scores mean better coffee quality.</div>"
+     )
+
+     with gr.Row():
+         # presents 1 row by default with INPUT_COLS
+         df_input = gr.Dataframe(
+             headers=INPUT_COLS,
+             value=[["" for _ in INPUT_COLS]],  # list of lists to avoid validation errors encountered in testing
+             # ------------------------- ChatGPT 5.1 was used to fix the issues on 11/23/2025 ---------------------
+             row_count=1,
+             col_count=len(INPUT_COLS),
+             interactive=True,
+             label="Enter Known Columns (0–10 range; numeric values preferred)"
+         )
+
+     with gr.Row():
+         submit_btn = gr.Button("Submit", variant="primary")
+
+     with gr.Row():
+         # short prediction for the user
+         pred_out = gr.Textbox(label="Predicted Coffee Quality Points", lines=1, interactive=False)
+
+     with gr.Row():
+         # full debug info for the developer
+         debug_out = gr.Textbox(label="Debug (payload + raw response)", lines=10, interactive=False)
+
+     with gr.Row():
+         gr.Markdown(f"<b>Note:</b> <i>{CLEAN_CUP_HELP}</i>")
+
+     # When the user clicks Submit, Gradio sends the contents of the table to predict_from_table(),
+     # whose helper table_to_list_of_dicts() accepts either a DataFrame or a list of lists,
+     # making the format consistent with FastAPI expectations. This fires the actual prediction.
+     submit_btn.click(predict_from_table, inputs=[df_input], outputs=[pred_out, debug_out])
+
+ if __name__ == "__main__":
+     # auto opens the demo in a browser
+     demo.launch()
config.yaml ADDED
@@ -0,0 +1,73 @@
+ data:
+   # public GCS URL for the cleaned dataset; if removed, scripts fall back to the local file
+   url: "https://storage.googleapis.com/coffee-quality-data/preprocessed_data.csv"
+   local_path: "data/raw/raw_data.csv"
+   preprocessed_path: "data/preprocessed/preprocessed_data.csv"
+   target: "Total.Cup.Points"
+   input_columns:
+     - Number.of.Bags
+     - Category.One.Defects
+     - Category.Two.Defects
+     - Aroma
+     - Flavor
+     - Aftertaste
+     - Acidity
+     - Body
+     - Balance
+     - Uniformity
+     - Clean.Cup
+     - Sweetness
+     - Cupper.Points
+     - Moisture
+     - Quakers
+     - altitude_low_meters
+     - altitude_high_meters
+     - altitude_mean_meters
+     - Species
+     - Owner
+     - Country.of.Origin
+     - Mill
+     - ICO.Number
+     - Company
+     - Altitude
+     - Region
+     - Producer
+     - Bag.Weight
+     - In.Country.Partner
+     - Harvest.Year
+     - Grading.Date
+     - Owner.1
+     - Variety
+     - Processing.Method
+     - Color
+     - Expiration
+     - Certification.Body
+     - Certification.Address
+     - Certification.Contact
+     - unit_of_measurement
+
+ # model training settings used by train.py
+ train:
+   test_size: 0.2
+   random_state: 42
+   model_params:
+     n_estimators: 100
+     random_state: 42
+     n_jobs: -1
+
+ paths:
+   X_train: "data/cleaned/X_train.csv"
+   X_test: "data/cleaned/X_test.csv"
+   y_train: "data/cleaned/y_train.csv"
+   y_test: "data/cleaned/y_test.csv"
+
+ artifacts:
+   model: "artifacts/model.joblib"
+   preprocessor: "artifacts/preprocessor.joblib"
+   metrics: "artifacts/metrics.json"
+ # The above snippet was generated by ChatGPT 5.1 at 10:20p on 11/20/25.
+
+ api_url:
+   # FastAPI: "http://127.0.0.1:8000/predict_named"   # local development
+   FastAPI: "https://coffee-api-354131048216.us-central1.run.app/predict_named"
requirements.txt ADDED
@@ -0,0 +1,14 @@
+ fastapi>=0.95
+ uvicorn[standard]>=0.22.0
+ pydantic>=1.10
+ PyYAML==6.0
+ joblib==1.3.2
+ scikit-learn==1.7.2
+ numpy==1.26.4
+ pandas==2.2.2
+ gradio==3.41.0
+ requests==2.31.0
+ wandb==0.23.0
+
+ # test / dev tools
+ pytest>=7.4