Implementation Guide
Goal
Build and run a local fine-tuning pipeline for a coding assistant model with:
- dataset generation
- LoRA fine-tuning
- local inference
- optional Hugging Face upload
Project Modules
generate_dataset.py
- Generates training samples into JSON.
- Supports dataset sizes from 5000 to 10000.
finetune_coding_llm_colab.py
- Main training module (local usage).
- Supports dataset generation, training, and optional HF upload.
run_pipeline.py
- Orchestrates generate -> train -> upload in one command.
- Reads defaults from training_config.json.
infer_local.py
- Runs inference from the local trained output.
- Handles both LoRA adapter output and full model output.
- Returns structured JSON fields including code, explanation, confidence, relevancy, hallucination check, and latency.
infer_cloud.py
- Runs inference through the Hugging Face API using an HF token.
- Reuses the local structured-output parser and repair checks so API output matches the local JSON contract.
- Falls back to the local model/ folder when Hugging Face does not serve the custom repo through an inference provider.
handler.py
- Custom Hugging Face Dedicated Inference Endpoint handler.
- Loads the LoRA adapter/full model and returns the same structured JSON contract directly from the hosted endpoint.
evaluate_model.py
- Runs a multi-prompt evaluation and reports the pass rate (accuracy) for schema and quality checks.
upload_to_hf.py
- Uploads the local model folder to a Hugging Face model repo.
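As a quick sanity check after dataset generation, the output file can be loaded and counted. This sketch assumes the generated file is a single JSON array of sample objects; if generate_dataset.py actually emits another layout (e.g. JSON Lines), adapt accordingly.

```python
import json


def count_samples(path: str = "train.json") -> int:
    """Return the number of records in a generated dataset file.

    Assumes the file is one JSON array of sample objects.
    """
    with open(path, "r", encoding="utf-8") as f:
        return len(json.load(f))
```

A count outside the supported 5000 to 10000 range is a hint that generation was interrupted or misconfigured.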
Environment Setup
- Use Python 3.10+ (recommended).
- Install dependencies:
pip install -r requirements.txt
- (Optional) Log in to Hugging Face before uploading:
huggingface-cli login
Standard Execution Flow
- Generate dataset:
python generate_dataset.py --size 8000 --out train.json
- Train model:
python finetune_coding_llm_colab.py --dataset-size 8000 --train-file train.json --output-dir model --skip-dataset-gen
- Test inference:
python infer_local.py --model-path model --prompt "Fix this code: def add(a,b) return a+b"
- Add --allow-downloads on a fresh machine if the base model is not cached locally.
- Evaluate quality:
python evaluate_model.py --model-path model
- Upload (optional):
python upload_to_hf.py --model-dir model --repo-id your-username/your-model-name
- Test cloud inference (optional):
- PowerShell:
$env:HF_TOKEN="your_huggingface_token"
python infer_cloud.py --repo-id your-username/your-model-name --prompt "Fix this code: def add(a,b) return a+b"
- If you already logged in with hf auth login, the saved token is used without setting HF_TOKEN.
- Add --no-local-fallback if you want the command to fail when HF cloud serving is unavailable.
- Add --allow-downloads if local fallback needs to download missing base-model files.
- For true cloud execution, deploy a Hugging Face Dedicated Inference Endpoint and call:
python infer_cloud.py --endpoint-url "https://your-endpoint-url.endpoints.huggingface.cloud" --prompt "Fix this code: def add(a,b) return a+b" --no-local-fallback
- Users should set their own token with $env:HF_TOKEN="their_huggingface_token" before calling the endpoint.
One-Command Execution
Run full pipeline without upload:
python run_pipeline.py --dataset-size 8000 --skip-upload
Run with upload:
python run_pipeline.py --dataset-size 8000 --hf-repo your-username/your-model-name
Performance Recommendations
- CPU quick validation:
python run_pipeline.py --dataset-size 5000 --max-train-samples 20 --epochs 0.1 --skip-upload
- Full quality run:
python run_pipeline.py --dataset-size 8000 --epochs 3 --batch-size 2 --learning-rate 1e-4 --max-length 512 --use-4bit --skip-upload
Error Handling Rules
- If the dataset file is missing, run generate_dataset.py.
- If the model folder is missing, run training first.
- If HF upload fails, verify your login:
huggingface-cli whoami
- Also verify repo permission and repo id format (username/repo).
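The first two rules above amount to a pre-flight check that can run before the pipeline starts. A minimal sketch (`preflight` is a hypothetical helper, not part of the project's modules):

```python
from pathlib import Path


def preflight(train_file: str = "train.json", model_dir: str = "model") -> list[str]:
    """Return human-readable problems matching the error-handling rules."""
    problems = []
    if not Path(train_file).exists():
        problems.append(f"{train_file} is missing: run generate_dataset.py first")
    if not Path(model_dir).is_dir():
        problems.append(f"{model_dir}/ is missing: run training first")
    return problems
```

An empty list means both prerequisites are in place; otherwise each entry names the fix.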
Integration Notes
- run_pipeline.py is the recommended entry point for regular usage.
- training_config.json provides default values that can be overridden by CLI flags.
- Inference works with LoRA adapters and full models automatically.
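The "config file defaults, CLI overrides" precedence can be sketched with argparse. The key names (dataset_size, epochs) and built-in fallbacks are assumptions for illustration; the real training_config.json schema may differ.

```python
import argparse
import json
from pathlib import Path


def load_defaults(path: str = "training_config.json") -> dict:
    """Read pipeline defaults; a missing file simply means no overrides."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}


def parse_args(argv=None) -> argparse.Namespace:
    """CLI flags win over config-file values, which win over built-ins."""
    defaults = load_defaults()
    parser = argparse.ArgumentParser(description="pipeline options (sketch)")
    parser.add_argument("--dataset-size", type=int,
                        default=defaults.get("dataset_size", 8000))
    parser.add_argument("--epochs", type=float,
                        default=defaults.get("epochs", 3.0))
    return parser.parse_args(argv)
```

Because config values are injected as argparse defaults, any flag passed on the command line transparently takes precedence.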
Hugging Face Existing Model Update
To update an already published Hugging Face model with current project behavior:
- Retrain with latest code:
python run_pipeline.py --dataset-size 8000 --skip-upload
- Validate local inference + evaluation:
python infer_local.py --model-path model --prompt "Fix this code: def add(a,b) return a+b"
python evaluate_model.py --model-path model
- Upload to same repo id:
python upload_to_hf.py --model-dir model --repo-id your-username/your-existing-model-name
Optional safer rollout:
- Upload to a revision branch first and test before merging to main.
Current Output Contract
infer_local.py returns JSON with:
code, explanation, confidence, important_tokens, relevancy_score, hallucination, hallucination_check_reason, latency_ms
infer_cloud.py returns the same JSON keys through the Hugging Face API, or through local fallback if HF cannot serve the custom repo. Cloud responses may not include token-level probabilities, so important_tokens can be empty and confidence can be 0.0 unless the serving endpoint exposes token details.
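A caller can verify that a response honors the contract before trusting it. This sketch checks only key presence, not value types; `missing_contract_keys` is a hypothetical helper name.

```python
# The eight keys of the structured-output contract listed above.
EXPECTED_KEYS = {
    "code", "explanation", "confidence", "important_tokens",
    "relevancy_score", "hallucination", "hallucination_check_reason",
    "latency_ms",
}


def missing_contract_keys(response: dict) -> list[str]:
    """Return any contract keys absent from an inference response."""
    return sorted(EXPECTED_KEYS - response.keys())
```

An empty result means the response is structurally complete, even when cloud serving leaves important_tokens empty and confidence at 0.0.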
For users calling the hosted model with their own token/API key, deploy the repository as a Hugging Face Dedicated Inference Endpoint. The included handler.py makes endpoint responses use the same JSON pattern:
code, explanation, confidence, important_tokens, relevancy_score, hallucination, hallucination_check_reason, latency_ms
Direct Hugging Face serverless calls to the model repo are not guaranteed to run custom LoRA repos. Dedicated endpoints or a cloud VM are required for true cloud execution.
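For orientation, Hugging Face Dedicated Inference Endpoints expect a handler.py exposing an EndpointHandler class that is constructed with the path to the repo contents and called once per request. The skeleton below shows that shape and the contract fields it must return; generation is stubbed out, whereas the project's real handler.py loads the tokenizer, base model, and LoRA adapter in __init__.

```python
import time


class EndpointHandler:
    """Skeleton of a custom Inference Endpoint handler (generation stubbed)."""

    def __init__(self, path: str = ""):
        self.path = path  # repo files are mounted at this path on the endpoint
        # real handler.py: load tokenizer + base model + LoRA adapter here

    def _generate(self, prompt: str) -> str:
        # stub; the real handler runs model.generate() on the prompt
        return ""

    def __call__(self, data: dict) -> dict:
        start = time.perf_counter()
        prompt = data.get("inputs", "")
        return {
            "code": self._generate(prompt),
            "explanation": "",
            "confidence": 0.0,
            "important_tokens": [],
            "relevancy_score": 0.0,
            "hallucination": False,
            "hallucination_check_reason": "not evaluated in this stub",
            "latency_ms": round((time.perf_counter() - start) * 1000, 3),
        }
```

Because the endpoint runtime instantiates the class itself, keeping the constructor signature to a single path argument is what makes the handler loadable.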