# Implementation Guide

## Goal

Build and run a local fine-tuning pipeline for a coding assistant model with:
- dataset generation
- LoRA fine-tuning
- local inference
- optional Hugging Face upload

## Project Modules

- `generate_dataset.py`
  - Generates training samples into JSON.
  - Supports dataset sizes from 5,000 to 10,000 samples.
- `finetune_coding_llm_colab.py`
  - Main training module (local usage).
  - Supports dataset generation, training, and optional HF upload.
- `run_pipeline.py`
  - Orchestrates generate -> train -> upload in one command.
  - Reads defaults from `training_config.json`.
- `infer_local.py`
  - Runs inference from local trained output.
  - Handles both LoRA adapter output and full model output.
  - Returns structured JSON fields including code, explanation, confidence, relevancy, hallucination check, and latency.
- `infer_cloud.py`
  - Runs inference through the Hugging Face API using an HF token.
  - Reuses the local structured-output parser and repair checks so API output matches the local JSON contract.
  - Falls back to the local `model/` folder when Hugging Face does not serve the custom repo through an inference provider.
- `handler.py`
  - Custom handler for Hugging Face Dedicated Inference Endpoints (a sketch of the handler contract follows this list).
  - Loads the LoRA adapter/full model and returns the same structured JSON contract directly from the hosted endpoint.
- `evaluate_model.py`
  - Runs a multi-prompt evaluation and reports pass rate (accuracy) for schema + quality checks.
- `upload_to_hf.py`
  - Uploads local model folder to Hugging Face model repo.
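
For reference, a Dedicated Inference Endpoint discovers a custom handler through an `EndpointHandler` class with an `__init__(path)` / `__call__(data)` shape. The sketch below illustrates that contract only; the model loading and generation details in the repo's `handler.py` will differ, and the returned values here are placeholders.

```python
# Illustrative sketch of the EndpointHandler contract, not the repo's handler.py.
import time
from typing import Any, Dict


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points at the model repository checked out by the endpoint.
        # The real handler loads the LoRA adapter or full model from here.
        self.path = path

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        prompt = data.get("inputs", "")
        start = time.perf_counter()
        # ... generate a completion for `prompt` with the loaded model ...
        latency_ms = (time.perf_counter() - start) * 1000
        # Return the same structured contract that infer_local.py produces.
        return {
            "code": "",
            "explanation": "",
            "confidence": 0.0,
            "important_tokens": [],
            "relevancy_score": 0.0,
            "hallucination": False,
            "hallucination_check_reason": "",
            "latency_ms": latency_ms,
        }
```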

## Environment Setup

1. Use Python 3.10+ (recommended).
2. Install dependencies:
   - `pip install -r requirements.txt`
3. (Optional) Login to Hugging Face before upload:
   - `huggingface-cli login`
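
As a quick post-install sanity check, confirm the core libraries import. This assumes the stack includes `torch`, `transformers`, and `peft` (implied by LoRA fine-tuning; `requirements.txt` is the authoritative list):

```python
# Optional sanity check, not part of the repo.
import torch
import transformers
import peft

print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"transformers {transformers.__version__}, peft {peft.__version__}")
```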

## Standard Execution Flow

1. Generate dataset:
   - `python generate_dataset.py --size 8000 --out train.json`
2. Train model:
   - `python finetune_coding_llm_colab.py --dataset-size 8000 --train-file train.json --output-dir model --skip-dataset-gen`
3. Test inference:
   - `python infer_local.py --model-path model --prompt "Fix this code: def add(a,b) return a+b"`
   - Add `--allow-downloads` on a fresh machine if the base model is not cached locally.
4. Evaluate quality:
   - `python evaluate_model.py --model-path model`
5. Upload (optional):
   - `python upload_to_hf.py --model-dir model --repo-id your-username/your-model-name`
6. Test cloud inference (optional):
   - PowerShell: `$env:HF_TOKEN="your_huggingface_token"`
   - `python infer_cloud.py --repo-id your-username/your-model-name --prompt "Fix this code: def add(a,b) return a+b"`
   - If you already logged in with `huggingface-cli login` (or the newer `hf auth login`), the saved token can be used without setting `HF_TOKEN`.
   - Add `--no-local-fallback` if you want the command to fail when HF cloud serving is unavailable.
   - Add `--allow-downloads` if local fallback needs to download missing base-model files.
   - For true cloud execution, deploy a Hugging Face Dedicated Inference Endpoint and call:
     - `python infer_cloud.py --endpoint-url "https://your-endpoint-url.endpoints.huggingface.cloud" --prompt "Fix this code: def add(a,b) return a+b" --no-local-fallback`
   - End users calling the endpoint should set their own token first (PowerShell: `$env:HF_TOKEN="their_huggingface_token"`); a minimal direct-call sketch follows this list.
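
For end users who only have the endpoint URL and a token, a direct HTTP call works as well. A minimal sketch using the standard endpoint request format (the URL is a placeholder and the token is read from `HF_TOKEN`):

```python
# Minimal direct call to a Dedicated Inference Endpoint (sketch).
import json
import os

import requests

ENDPOINT_URL = "https://your-endpoint-url.endpoints.huggingface.cloud"  # placeholder

response = requests.post(
    ENDPOINT_URL,
    headers={
        "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
        "Content-Type": "application/json",
    },
    json={"inputs": "Fix this code: def add(a,b) return a+b"},
    timeout=120,
)
response.raise_for_status()
print(json.dumps(response.json(), indent=2))
```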

## One-Command Execution

- Run full pipeline without upload:
  - `python run_pipeline.py --dataset-size 8000 --skip-upload`

- Run with upload:
  - `python run_pipeline.py --dataset-size 8000 --hf-repo your-username/your-model-name`

## Performance Recommendations

- CPU quick validation:
  - `python run_pipeline.py --dataset-size 5000 --max-train-samples 20 --epochs 0.1 --skip-upload`
- Full quality run:
  - `python run_pipeline.py --dataset-size 8000 --epochs 3 --batch-size 2 --learning-rate 1e-4 --max-length 512 --use-4bit --skip-upload`

## Error Handling Rules

- If the dataset file is missing, run `generate_dataset.py`.
- If the model folder is missing, run training first.
- If HF upload fails, verify:
  - `huggingface-cli whoami`
  - repo permission and repo id format (`username/repo`)
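
These rules can be checked before a run. A hypothetical preflight helper (not part of the repo) that mirrors them, using the file and folder names from the commands above:

```python
# preflight.py - hypothetical helper mirroring the error-handling rules.
import sys
from pathlib import Path


def preflight(train_file: str = "train.json", model_dir: str = "model") -> None:
    if not Path(train_file).is_file():
        sys.exit(f"{train_file} missing: run generate_dataset.py first")
    if not Path(model_dir).is_dir():
        sys.exit(f"{model_dir}/ missing: run training before inference or upload")


if __name__ == "__main__":
    preflight()
```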

## Integration Notes

- `run_pipeline.py` is the recommended entrypoint for regular usage.
- `training_config.json` provides default values that CLI flags can override (a plausible shape is sketched after this list).
- Inference works with LoRA adapters and full models automatically.
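
A plausible shape for `training_config.json`, using the CLI flag names shown above as keys; the actual keys and defaults in the repo may differ:

```json
{
  "dataset_size": 8000,
  "epochs": 3,
  "batch_size": 2,
  "learning_rate": 1e-4,
  "max_length": 512,
  "use_4bit": true,
  "output_dir": "model"
}
```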

## Hugging Face Existing Model Update

To update an already published Hugging Face model with current project behavior:

1. Retrain with latest code:
   - `python run_pipeline.py --dataset-size 8000 --skip-upload`
2. Validate local inference + evaluation:
   - `python infer_local.py --model-path model --prompt "Fix this code: def add(a,b) return a+b"`
   - `python evaluate_model.py --model-path model`
3. Upload to same repo id:
   - `python upload_to_hf.py --model-dir model --repo-id your-username/your-existing-model-name`

Optional safer rollout:
- Upload to a revision branch first and test before merging to main.
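
One way to do the branch-first rollout with `huggingface_hub` directly; the branch name `v2-candidate` is a placeholder, and whether `upload_to_hf.py` exposes an equivalent flag is not assumed:

```python
# Sketch: push the model folder to a test branch before touching main.
from huggingface_hub import HfApi

api = HfApi()  # uses the token saved by `huggingface-cli login`
repo_id = "your-username/your-existing-model-name"

api.create_branch(repo_id=repo_id, branch="v2-candidate", exist_ok=True)
api.upload_folder(
    folder_path="model",
    repo_id=repo_id,
    revision="v2-candidate",  # upload to the branch, not main
)
```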

## Current Output Contract

`infer_local.py` returns JSON with the following keys (an illustrative response follows the list):
- `code`
- `explanation`
- `confidence`
- `important_tokens`
- `relevancy_score`
- `hallucination`
- `hallucination_check_reason`
- `latency_ms`
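
An illustrative response (all values are placeholders, not real model output):

```json
{
  "code": "def add(a, b):\n    return a + b",
  "explanation": "Added the missing colon in the function definition.",
  "confidence": 0.91,
  "important_tokens": ["def", "return"],
  "relevancy_score": 0.95,
  "hallucination": false,
  "hallucination_check_reason": "Output stays within the prompt's scope.",
  "latency_ms": 842
}
```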

`infer_cloud.py` returns the same JSON keys through the Hugging Face API, or through local fallback if HF cannot serve the custom repo. Cloud responses may not include token-level probabilities, so `important_tokens` can be empty and `confidence` can be `0.0` unless the serving endpoint exposes token details.

For users calling the hosted model with their own token/API key, deploy the repository as a Hugging Face Dedicated Inference Endpoint. The included `handler.py` makes endpoint responses follow the same JSON contract listed above.

Direct serverless Inference API calls to the model repo are not guaranteed to run custom LoRA repos; a Dedicated Inference Endpoint or a cloud VM is required for true cloud execution.