# Implementation Guide
## Goal
Build and run a local fine-tuning pipeline for a coding assistant model with:
- dataset generation
- LoRA fine-tuning
- local inference
- optional Hugging Face upload
## Project Modules
- `generate_dataset.py`
  - Generates training samples into a JSON file.
  - Supports dataset sizes from 5000 to 10000.
- `finetune_coding_llm_colab.py`
  - Main training module (local usage).
  - Supports dataset generation, training, and optional HF upload.
- `run_pipeline.py`
  - Orchestrates generate -> train -> upload in one command.
  - Reads defaults from `training_config.json`.
- `infer_local.py`
  - Runs inference from the locally trained output.
  - Handles both LoRA adapter output and full model output.
  - Returns structured JSON fields including code, explanation, confidence, relevancy, hallucination check, and latency.
- `infer_cloud.py`
  - Runs inference through the Hugging Face API using an HF token.
  - Reuses the local structured-output parser and repair checks so API output matches the local JSON contract.
  - Falls back to the local `model/` folder when Hugging Face does not serve the custom repo through an inference provider.
- `handler.py`
  - Custom Hugging Face Dedicated Inference Endpoint handler.
  - Loads the LoRA adapter/full model and returns the same structured JSON contract directly from the hosted endpoint.
- `evaluate_model.py`
  - Runs a multi-prompt evaluation and reports the pass rate (accuracy) for schema and quality checks.
- `upload_to_hf.py`
  - Uploads the local model folder to a Hugging Face model repo.
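As a rough sketch of the generate -> train handoff, one training sample might serialize as below. The field names (`prompt`, `completion`) are assumptions for illustration; check `generate_dataset.py` for the real schema.

```python
import json

# Hypothetical shape of one training sample -- field names are assumptions,
# not confirmed by the project code.
sample = {
    "prompt": "Fix this code: def add(a,b) return a+b",
    "completion": "def add(a, b):\n    return a + b",
}

# generate_dataset.py writes a list of such samples to a JSON file
# (e.g. train.json), which the training step then consumes.
serialized = json.dumps([sample], indent=2)
loaded = json.loads(serialized)
```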
## Environment Setup
1. Use Python 3.10+ (recommended).
2. Install dependencies:
   - `pip install -r requirements.txt`
3. (Optional) Log in to Hugging Face before uploading:
   - `huggingface-cli login`
## Standard Execution Flow
1. Generate the dataset:
   - `python generate_dataset.py --size 8000 --out train.json`
2. Train the model:
   - `python finetune_coding_llm_colab.py --dataset-size 8000 --train-file train.json --output-dir model --skip-dataset-gen`
3. Test inference:
   - `python infer_local.py --model-path model --prompt "Fix this code: def add(a,b) return a+b"`
   - Add `--allow-downloads` on a fresh machine if the base model is not cached locally.
4. Evaluate quality:
   - `python evaluate_model.py --model-path model`
5. Upload (optional):
   - `python upload_to_hf.py --model-dir model --repo-id your-username/your-model-name`
6. Test cloud inference (optional):
   - PowerShell: `$env:HF_TOKEN="your_huggingface_token"`
   - `python infer_cloud.py --repo-id your-username/your-model-name --prompt "Fix this code: def add(a,b) return a+b"`
   - If you already logged in with `hf auth login`, the saved token is used without setting `HF_TOKEN`.
   - Add `--no-local-fallback` if the command should fail when HF cloud serving is unavailable.
   - Add `--allow-downloads` if the local fallback needs to download missing base-model files.
   - For true cloud execution, deploy a Hugging Face Dedicated Inference Endpoint and call:
     - `python infer_cloud.py --endpoint-url "https://your-endpoint-url.endpoints.huggingface.cloud" --prompt "Fix this code: def add(a,b) return a+b" --no-local-fallback`
   - Users should set their own token with `$env:HF_TOKEN="their_huggingface_token"` before calling the endpoint.
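Under the hood, the endpoint call is a plain authenticated HTTP POST. The sketch below shows roughly what `infer_cloud.py` would send, assuming a simple `{"inputs": ...}` payload; the exact payload shape depends on `handler.py` and is an assumption here.

```python
import json
import os
import urllib.request

def build_endpoint_request(endpoint_url, prompt, token):
    """Build the HTTP request for a Dedicated Inference Endpoint.
    The {"inputs": prompt} payload shape is an assumption for this sketch."""
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",   # token from $env:HF_TOKEN
            "Content-Type": "application/json",
        },
    )

req = build_endpoint_request(
    "https://your-endpoint-url.endpoints.huggingface.cloud",
    "Fix this code: def add(a,b) return a+b",
    os.environ.get("HF_TOKEN", "dummy-token"),
)
```

Sending the request with `urllib.request.urlopen(req)` would return the structured JSON described in the output contract below.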
## One-Command Execution
- Run the full pipeline without upload:
  - `python run_pipeline.py --dataset-size 8000 --skip-upload`
- Run with upload:
  - `python run_pipeline.py --dataset-size 8000 --hf-repo your-username/your-model-name`
## Performance Recommendations
- Quick CPU validation:
  - `python run_pipeline.py --dataset-size 5000 --max-train-samples 20 --epochs 0.1 --skip-upload`
- Full quality run:
  - `python run_pipeline.py --dataset-size 8000 --epochs 3 --batch-size 2 --learning-rate 1e-4 --max-length 512 --use-4bit --skip-upload`
## Error Handling Rules
- If the dataset file is missing, run `generate_dataset.py` first.
- If the model folder is missing, run training first.
- If the HF upload fails, verify:
  - `huggingface-cli whoami`
  - repo permissions and the repo id format (`username/repo`)
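The first two rules can be sketched as a small pre-flight check. The helper below is illustrative, not part of the project; the default file names mirror the commands in this guide.

```python
from pathlib import Path

def preflight(train_file="train.json", model_dir="model"):
    """Map the error-handling rules above to actionable messages."""
    problems = []
    if not Path(train_file).exists():
        problems.append(f"{train_file} missing: run generate_dataset.py first")
    if not Path(model_dir).is_dir():
        problems.append(f"{model_dir}/ missing: run training first")
    return problems
```

Running `preflight()` before inference or upload surfaces both problems at once instead of failing on the first missing file.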
## Integration Notes
- `run_pipeline.py` is the recommended entrypoint for regular usage.
- `training_config.json` provides default values and can be overridden by CLI flags.
- Inference works with LoRA adapters and full models automatically.
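The defaults-plus-overrides behavior can be sketched as below, assuming CLI flags win over file defaults and unset flags (`None`) fall back to the file. The key names are illustrative, not the actual contents of `training_config.json`.

```python
import json
import os
import tempfile

def merge_config(config_path, cli_overrides):
    """Merge file defaults with CLI flags; explicitly set flags win."""
    with open(config_path, encoding="utf-8") as f:
        file_defaults = json.load(f)
    merged = dict(file_defaults)
    merged.update({k: v for k, v in cli_overrides.items() if v is not None})
    return merged

# Demonstrate the precedence with a temporary config file of assumed keys.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False
) as f:
    json.dump({"dataset_size": 8000, "epochs": 3, "batch_size": 2}, f)
    cfg_path = f.name

merged = merge_config(cfg_path, {"epochs": 1, "batch_size": None})
os.unlink(cfg_path)
```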
## Hugging Face Existing Model Update
To update an already published Hugging Face model with current project behavior:
1. Retrain with latest code:
- `python run_pipeline.py --dataset-size 8000 --skip-upload`
2. Validate local inference + evaluation:
- `python infer_local.py --model-path model --prompt "Fix this code: def add(a,b) return a+b"`
- `python evaluate_model.py --model-path model`
3. Upload to same repo id:
- `python upload_to_hf.py --model-dir model --repo-id your-username/your-existing-model-name`
Optional safer rollout:
- Upload to a revision branch first and test before merging to main.
## Current Output Contract
`infer_local.py` returns JSON with:
- `code`
- `explanation`
- `confidence`
- `important_tokens`
- `relevancy_score`
- `hallucination`
- `hallucination_check_reason`
- `latency_ms`
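A downstream consumer could check a response against this contract with a small validator. The key set below copies the list above; the helper itself is illustrative and not shipped with the project.

```python
# Contract keys, copied from the list above.
REQUIRED_KEYS = {
    "code", "explanation", "confidence", "important_tokens",
    "relevancy_score", "hallucination", "hallucination_check_reason",
    "latency_ms",
}

def missing_contract_keys(result):
    """Return the contract keys absent from an inference result dict."""
    return sorted(REQUIRED_KEYS - result.keys())
```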
`infer_cloud.py` returns the same JSON keys through the Hugging Face API, or through local fallback if HF cannot serve the custom repo. Cloud responses may not include token-level probabilities, so `important_tokens` can be empty and `confidence` can be `0.0` unless the serving endpoint exposes token details.
For users calling the hosted model with their own token/API key, deploy the repository as a Hugging Face Dedicated Inference Endpoint. The included `handler.py` ensures endpoint responses follow the same JSON contract listed above.
Direct Hugging Face serverless inference calls are not guaranteed to serve custom LoRA repos. A Dedicated Inference Endpoint or a cloud VM is required for true cloud execution.
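For orientation, a custom handler on a Dedicated Inference Endpoint implements the `EndpointHandler` interface sketched below. This is a stub, not the project's actual `handler.py`: model loading and generation are elided, and only the response shape matches the contract above.

```python
import time

class EndpointHandler:
    """Skeleton of the interface Hugging Face expects from a custom
    handler.py: __init__ receives the repo's local path, __call__ the
    request payload. The real handler loads the LoRA adapter/full model."""

    def __init__(self, path=""):
        self.model_path = path  # where the endpoint unpacked the repo

    def __call__(self, data):
        prompt = data.get("inputs", "")
        start = time.perf_counter()
        # ... run generation against the loaded model here (elided) ...
        return {
            "code": "",
            "explanation": f"stub response for: {prompt}",
            "confidence": 0.0,
            "important_tokens": [],
            "relevancy_score": 0.0,
            "hallucination": False,
            "hallucination_check_reason": "",
            "latency_ms": round((time.perf_counter() - start) * 1000, 2),
        }
```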