girish00 committed on
Commit
ec50606
·
verified ·
1 Parent(s): 36ff300

add structured endpoint handler

Files changed (1)
  1. IMPLEMENTATION.md +17 -1
IMPLEMENTATION.md CHANGED
@@ -27,6 +27,9 @@ Build and run a local fine-tuning pipeline for a coding assistant model with:
  - Runs inference through the Hugging Face API using an HF token.
  - Reuses the local structured-output parser and repair checks so API output matches the local JSON contract.
  - Falls back to the local `model/` folder when Hugging Face does not serve the custom repo through an inference provider.
  - `evaluate_model.py`
  - Runs a multi-prompt evaluation and reports pass rate (accuracy) for schema + quality checks.
  - `upload_to_hf.py`
@@ -105,7 +108,7 @@ To update an already published Hugging Face model with current project behavior:
  Optional safer rollout:
  - Upload to a revision branch first and test before merging to main.
 
- ## Current Output Contract
 
  `infer_local.py` returns JSON with:
  - `code`
@@ -118,3 +121,16 @@ Optional safer rollout:
  - `latency_ms`
 
  `infer_cloud.py` returns the same JSON keys through the Hugging Face API, or through local fallback if HF cannot serve the custom repo. Cloud responses may not include token-level probabilities, so `important_tokens` can be empty and `confidence` can be `0.0` unless the serving endpoint exposes token details.
  - Runs inference through the Hugging Face API using an HF token.
  - Reuses the local structured-output parser and repair checks so API output matches the local JSON contract.
  - Falls back to the local `model/` folder when Hugging Face does not serve the custom repo through an inference provider.
+ - `handler.py`
+ - Custom Hugging Face Dedicated Inference Endpoint handler.
+ - Loads the LoRA adapter/full model and returns the same structured JSON contract directly from the hosted endpoint.
  - `evaluate_model.py`
  - Runs a multi-prompt evaluation and reports pass rate (accuracy) for schema + quality checks.
  - `upload_to_hf.py`
 
  Optional safer rollout:
  - Upload to a revision branch first and test before merging to main.
 
+ ## Current Output Contract
 
  `infer_local.py` returns JSON with:
  - `code`
 
  - `latency_ms`
 
  `infer_cloud.py` returns the same JSON keys through the Hugging Face API, or through local fallback if HF cannot serve the custom repo. Cloud responses may not include token-level probabilities, so `important_tokens` can be empty and `confidence` can be `0.0` unless the serving endpoint exposes token details.
+
+ For users calling the hosted model with their own token/API key, deploy the repository as a Hugging Face Dedicated Inference Endpoint. The included `handler.py` makes endpoint responses use the same JSON pattern:
+
+ - `code`
+ - `explanation`
+ - `confidence`
+ - `important_tokens`
+ - `relevancy_score`
+ - `hallucination`
+ - `hallucination_check_reason`
+ - `latency_ms`
+
+ Direct Hugging Face serverless calls to the model repo are not guaranteed to run custom LoRA repos. Dedicated endpoints or a cloud VM are required for true cloud execution.
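
The commit does not show the contents of `handler.py`, so the following is only a minimal sketch of how a custom endpoint handler could enforce the output contract described above. It follows the Hugging Face convention of an `EndpointHandler` class with `__init__(path)` and `__call__(data)`, where the prompt arrives under the `inputs` key. Everything else is an assumption made for illustration: the `generate` hook, the placeholder generator that stands in for real LoRA/model loading, and the default value types (only the `0.0` confidence and empty `important_tokens` defaults come from the description of `infer_cloud.py`).

```python
import copy
import time
from typing import Any, Callable, Dict, Optional

# Keys of the output contract, as listed in IMPLEMENTATION.md.
CONTRACT_KEYS = (
    "code", "explanation", "confidence", "important_tokens",
    "relevancy_score", "hallucination", "hallucination_check_reason",
    "latency_ms",
)

# Assumed fallback values for keys the model output does not provide.
# latency_ms is always measured by the handler, so it has no default.
DEFAULTS: Dict[str, Any] = {
    "code": "",
    "explanation": "",
    "confidence": 0.0,
    "important_tokens": [],
    "relevancy_score": 0.0,
    "hallucination": False,
    "hallucination_check_reason": "",
}


class EndpointHandler:
    """Sketch of a custom handler that always returns the full contract."""

    def __init__(self, path: str = "",
                 generate: Optional[Callable[[str], Dict[str, Any]]] = None):
        # `path` is the model directory passed in by the endpoint runtime.
        # `generate` is a hypothetical hook; a real handler would load the
        # adapter or full model from `path` here instead.
        self.path = path
        self._generate = generate or self._placeholder_generate

    @staticmethod
    def _placeholder_generate(prompt: str) -> Dict[str, Any]:
        # Stand-in for real inference so the sketch runs without a model.
        return {"code": "", "explanation": f"no model loaded for {prompt!r}"}

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        prompt = str(data.get("inputs", ""))
        start = time.perf_counter()
        raw = self._generate(prompt)
        latency_ms = (time.perf_counter() - start) * 1000.0

        # Keep only contract keys, filling gaps with copies of the defaults.
        result = {key: raw.get(key, copy.deepcopy(default))
                  for key, default in DEFAULTS.items()}
        result["latency_ms"] = round(latency_ms, 3)
        return result
```

Calling `EndpointHandler()({"inputs": "write a sort function"})` returns a dictionary with exactly the eight contract keys, which is the property a client needs regardless of whether the response came from `infer_local.py`, `infer_cloud.py`, or the hosted endpoint.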