# Implementation Guide

## Goal

Build and run a local fine-tuning pipeline for a coding assistant model with:

- dataset generation
- LoRA fine-tuning
- local inference
- optional Hugging Face upload

## Project Modules

- `generate_dataset.py`
  - Generates training samples into JSON.
  - Supports dataset sizes from 5000 to 10000.
- `finetune_coding_llm_colab.py`
  - Main training module (local usage).
  - Supports dataset generation, training, and optional HF upload.
- `run_pipeline.py`
  - Orchestrates generate -> train -> upload in one command.
  - Reads defaults from `training_config.json`.
- `infer_local.py`
  - Runs inference from the local trained output.
  - Handles both LoRA adapter output and full model output.
  - Returns structured JSON fields including code, explanation, confidence, relevancy, hallucination check, and latency.
- `infer_cloud.py`
  - Runs inference through the Hugging Face API using an HF token.
  - Reuses the local structured-output parser and repair checks so API output matches the local JSON contract.
  - Falls back to the local `model/` folder when Hugging Face does not serve the custom repo through an inference provider.
- `handler.py`
  - Custom Hugging Face Dedicated Inference Endpoint handler.
  - Loads the LoRA adapter/full model and returns the same structured JSON contract directly from the hosted endpoint.
- `evaluate_model.py`
  - Runs a multi-prompt evaluation and reports the pass rate (accuracy) of schema and quality checks.
- `upload_to_hf.py`
  - Uploads the local model folder to a Hugging Face model repo.
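
As a hedged illustration of the dataset format, `generate_dataset.py` could emit instruction/response records like the following; the field names here are assumptions, not the project's actual schema:

```python
import json

# Hypothetical sample shape for a coding-assistant dataset.
# The real schema is whatever generate_dataset.py writes; these
# field names ("instruction", "response") are assumptions.
samples = [
    {
        "instruction": "Fix this code: def add(a,b) return a+b",
        "response": "def add(a, b):\n    return a + b",
    }
]

# Write the samples as JSON, mirroring `--out train.json`.
with open("train.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, indent=2)
```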

## Environment Setup

1. Use Python 3.10+ (recommended).
2. Install dependencies:
   - `pip install -r requirements.txt`
3. (Optional) Log in to Hugging Face before upload:
   - `huggingface-cli login`

## Standard Execution Flow

1. Generate dataset:
   - `python generate_dataset.py --size 8000 --out train.json`
2. Train model:
   - `python finetune_coding_llm_colab.py --dataset-size 8000 --train-file train.json --output-dir model --skip-dataset-gen`
3. Test inference:
   - `python infer_local.py --model-path model --prompt "Fix this code: def add(a,b) return a+b"`
   - Add `--allow-downloads` on a fresh machine if the base model is not cached locally.
4. Evaluate quality:
   - `python evaluate_model.py --model-path model`
5. Upload (optional):
   - `python upload_to_hf.py --model-dir model --repo-id your-username/your-model-name`
6. Test cloud inference (optional):
   - PowerShell: `$env:HF_TOKEN="your_huggingface_token"`
   - `python infer_cloud.py --repo-id your-username/your-model-name --prompt "Fix this code: def add(a,b) return a+b"`
   - If you already logged in with `hf auth login`, the saved token can be used without setting `HF_TOKEN`.
   - Add `--no-local-fallback` if you want the command to fail when HF cloud serving is unavailable.
   - Add `--allow-downloads` if the local fallback needs to download missing base-model files.
   - For true cloud execution, deploy a Hugging Face Dedicated Inference Endpoint and call:
     - `python infer_cloud.py --endpoint-url "https://your-endpoint-url.endpoints.huggingface.cloud" --prompt "Fix this code: def add(a,b) return a+b" --no-local-fallback`
   - Users should set their own token with `$env:HF_TOKEN="their_huggingface_token"` before calling the endpoint.
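
Step 4 reports a pass rate over schema and quality checks; a minimal sketch of what such a schema check could look like (hypothetical, not `evaluate_model.py`'s actual logic):

```python
# Hedged sketch: validate an inference result against the expected JSON
# contract keys and compute a pass rate over several outputs.
EXPECTED_KEYS = {
    "code", "explanation", "confidence", "important_tokens",
    "relevancy_score", "hallucination", "hallucination_check_reason",
    "latency_ms",
}

def passes_schema(result: dict) -> bool:
    """True when every contract key is present and `code` is non-empty."""
    return EXPECTED_KEYS <= result.keys() and bool(result.get("code"))

def pass_rate(results: list[dict]) -> float:
    """Fraction of outputs that pass the schema check."""
    return sum(passes_schema(r) for r in results) / max(len(results), 1)
```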

## One-Command Execution

- Run full pipeline without upload:
  - `python run_pipeline.py --dataset-size 8000 --skip-upload`
- Run with upload:
  - `python run_pipeline.py --dataset-size 8000 --hf-repo your-username/your-model-name`

## Performance Recommendations

- CPU quick validation:
  - `python run_pipeline.py --dataset-size 5000 --max-train-samples 20 --epochs 0.1 --skip-upload`
- Full quality run:
  - `python run_pipeline.py --dataset-size 8000 --epochs 3 --batch-size 2 --learning-rate 1e-4 --max-length 512 --use-4bit --skip-upload`

## Error Handling Rules

- If the dataset file is missing, run `generate_dataset.py`.
- If the model folder is missing, run training first.
- If the HF upload fails, verify:
  - `huggingface-cli whoami`
  - repo permission and repo id format (`username/repo`)

## Integration Notes

- `run_pipeline.py` is the recommended entrypoint for regular usage.
- `training_config.json` provides default values and can be overridden by CLI flags.
- Inference works with LoRA adapters and full models automatically.

## Hugging Face Existing Model Update

To update an already published Hugging Face model with the current project behavior:

1. Retrain with the latest code:
   - `python run_pipeline.py --dataset-size 8000 --skip-upload`
2. Validate local inference and evaluation:
   - `python infer_local.py --model-path model --prompt "Fix this code: def add(a,b) return a+b"`
   - `python evaluate_model.py --model-path model`
3. Upload to the same repo id:
   - `python upload_to_hf.py --model-dir model --repo-id your-username/your-existing-model-name`

Optional safer rollout:

- Upload to a revision branch first and test before merging to main.

## Current Output Contract

`infer_local.py` returns JSON with:

- `code`
- `explanation`
- `confidence`
- `important_tokens`
- `relevancy_score`
- `hallucination`
- `hallucination_check_reason`
- `latency_ms`

`infer_cloud.py` returns the same JSON keys through the Hugging Face API, or through the local fallback if HF cannot serve the custom repo. Cloud responses may not include token-level probabilities, so `important_tokens` can be empty and `confidence` can be `0.0` unless the serving endpoint exposes token details.
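
Because of those possibly missing token-level fields, a caller may want to normalize cloud responses to the full contract. A hedged sketch (the helper and its defaults are illustrative, not part of `infer_cloud.py`):

```python
# Hypothetical helper: ensure a cloud response carries every contract key,
# defaulting the token-level fields an API may omit.
CONTRACT_DEFAULTS = {
    "code": "",
    "explanation": "",
    "confidence": 0.0,       # 0.0 when no token probabilities are exposed
    "important_tokens": [],  # empty when token details are unavailable
    "relevancy_score": 0.0,
    "hallucination": False,
    "hallucination_check_reason": "",
    "latency_ms": 0,
}

def normalize_response(raw: dict) -> dict:
    """Overlay the raw response onto the defaults so every key exists."""
    known = {k: v for k, v in raw.items() if k in CONTRACT_DEFAULTS}
    return {**CONTRACT_DEFAULTS, **known}
```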

To let users call the hosted model with their own token/API key, deploy the repository as a Hugging Face Dedicated Inference Endpoint. The included `handler.py` makes endpoint responses follow the same JSON pattern:

- `code`
- `explanation`
- `confidence`
- `important_tokens`
- `relevancy_score`
- `hallucination`
- `hallucination_check_reason`
- `latency_ms`

Direct Hugging Face serverless calls to the model repo are not guaranteed to run custom LoRA repos. Dedicated endpoints or a cloud VM are required for true cloud execution.
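
Under the hood, calling a Dedicated Inference Endpoint is an authenticated HTTP POST. A minimal stdlib sketch of assembling that request (the `{"inputs": ...}` payload shape follows the common Hugging Face convention and is an assumption here):

```python
import json
import os
import urllib.request

def build_request(endpoint_url: str, prompt: str, token: str) -> urllib.request.Request:
    """Assemble the authenticated POST for a Dedicated Inference Endpoint.
    The {"inputs": ...} payload shape is assumed, not confirmed by handler.py."""
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_request(
        "https://your-endpoint-url.endpoints.huggingface.cloud",
        "Fix this code: def add(a,b) return a+b",
        os.environ.get("HF_TOKEN", ""),
    )
    # urllib.request.urlopen(req) would send the request; omitted here.
```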
|