# API & CLI Wiring - Complete Verification

All optimizations are now fully wired through the API and CLI.
## ✅ Complete Parameter List

### Phase 4 Optimizations

**BF16 Support**
- API: `use_bf16: bool`
- CLI: `--use-bf16`
- Service: ✅ Integrated

**Gradient Clipping**
- API: `gradient_clip_norm: Optional[float]`
- CLI: `--gradient-clip-norm`
- Service: ✅ Integrated

**Learning Rate Finder**
- API: `find_lr: bool`
- CLI: `--find-lr`
- Service: ✅ Integrated

**Batch Size Finder**
- API: `find_batch_size: bool`
- CLI: `--find-batch-size`
- Service: ✅ Integrated

### FSDP Options

**FSDP**
- API: `use_fsdp: bool`
- CLI: `--use-fsdp`
- Service: ✅ Integrated

**FSDP Sharding Strategy**
- API: `fsdp_sharding_strategy: str`
- CLI: `--fsdp-sharding-strategy`
- Service: ✅ Integrated

**FSDP Mixed Precision**
- API: `fsdp_mixed_precision: Optional[str]`
- CLI: `--fsdp-mixed-precision`
- Service: ✅ Integrated

### Advanced Optimizations

**QAT**
- API: `use_qat: bool`
- CLI: `--use-qat`
- Service: ✅ Integrated

**QAT Backend**
- API: `qat_backend: str`
- CLI: `--qat-backend`
- Service: ✅ Integrated

**Sequence Parallelism**
- API: `use_sequence_parallel: bool`
- CLI: `--use-sequence-parallel`
- Service: ✅ Integrated

**Sequence Parallel GPUs**
- API: `sequence_parallel_gpus: int`
- CLI: `--sequence-parallel-gpus`
- Service: ✅ Integrated

**Activation Recomputation**
- API: `activation_recompute_strategy: Optional[str]`
- CLI: `--activation-recompute-strategy`
- Service: ✅ Integrated

### Checkpoint Options

**Async Checkpoint**
- API: `async_checkpoint: bool`
- CLI: `--async-checkpoint`
- Service: ✅ Integrated

**Compress Checkpoint**
- API: `compress_checkpoint: bool`
- CLI: `--compress-checkpoint`
- Service: ✅ Integrated
## Data Flow Verification

### API Request Flow

```
POST /api/v1/train/start
  ↓
TrainRequest (Pydantic validation)
  ↓
Router: /train/start endpoint
  ↓
fine_tune_da3() service function
  ↓
All optimizations applied
```

### CLI Command Flow

```
ylff train start ...
  ↓
CLI function parameters
  ↓
fine_tune_da3() service function
  ↓
All optimizations applied
```
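The two flows above converge on a single service function, so each optimization flag is implemented exactly once. A minimal sketch of that pattern, with hypothetical stand-in bodies and a reduced parameter set (only `fine_tune_da3` and the field names come from this document; the wrapper functions are illustrative, not the actual ylff code):

```python
from typing import Optional


def fine_tune_da3(training_data_dir: str, *, use_bf16: bool = False,
                  gradient_clip_norm: Optional[float] = None) -> dict:
    """Single service entry point; applies every optimization it is passed.
    (Stub: returns the resolved config instead of training.)"""
    return {
        "training_data_dir": training_data_dir,
        "use_bf16": use_bf16,
        "gradient_clip_norm": gradient_clip_norm,
    }


def api_train_start(payload: dict) -> dict:
    """Router handler: validated request fields forwarded verbatim."""
    return fine_tune_da3(
        payload["training_data_dir"],
        use_bf16=payload.get("use_bf16", False),
        gradient_clip_norm=payload.get("gradient_clip_norm"),
    )


def cli_train_start(training_data_dir: str, use_bf16: bool,
                    gradient_clip_norm: Optional[float]) -> dict:
    """CLI command: parsed flags forwarded verbatim."""
    return fine_tune_da3(training_data_dir, use_bf16=use_bf16,
                         gradient_clip_norm=gradient_clip_norm)


# Both entry points produce the identical service call:
via_api = api_train_start({"training_data_dir": "data/training", "use_bf16": True})
via_cli = cli_train_start("data/training", True, None)
assert via_api == via_cli
```

Keeping the router and CLI as thin forwarding layers is what makes the parameter mapping below verifiable: a flag is "wired" if and only if it appears in the shared service signature.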
## ✅ Verification Checklist

### API Models (`ylff/models/api_models.py`)

- [x] `TrainRequest` has all Phase 4 parameters
- [x] `TrainRequest` has all FSDP parameters
- [x] `TrainRequest` has all advanced optimization parameters
- [x] `TrainRequest` has checkpoint optimization parameters
- [x] `PretrainRequest` has all Phase 4 parameters
- [x] `PretrainRequest` has all FSDP parameters
- [x] `PretrainRequest` has all advanced optimization parameters
- [x] `PretrainRequest` has checkpoint optimization parameters
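As a sketch of what a `TrainRequest` model covering the fields above might look like: the field names and types come from the parameter list in this document, but the defaults are illustrative assumptions, not the actual `ylff/models/api_models.py` definitions.

```python
from typing import Optional

from pydantic import BaseModel


class TrainRequest(BaseModel):
    training_data_dir: str
    epochs: int = 10
    lr: float = 1e-5
    batch_size: int = 1
    # Phase 4 optimizations
    use_bf16: bool = False
    gradient_clip_norm: Optional[float] = None
    find_lr: bool = False
    find_batch_size: bool = False
    # FSDP options
    use_fsdp: bool = False
    fsdp_sharding_strategy: str = "FULL_SHARD"
    fsdp_mixed_precision: Optional[str] = None
    # Advanced optimizations
    use_qat: bool = False
    qat_backend: str = "fbgemm"
    use_sequence_parallel: bool = False
    sequence_parallel_gpus: int = 1
    activation_recompute_strategy: Optional[str] = None
    # Checkpoint options
    async_checkpoint: bool = False
    compress_checkpoint: bool = False


# Pydantic validates types and fills defaults for omitted fields:
req = TrainRequest(training_data_dir="data/training", use_bf16=True)
```

Because every optimization is a model field, an incoming request with a wrong type (e.g. `"use_bf16": "yes please"`) is rejected at the Pydantic layer before it ever reaches the service function.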
### Router (`ylff/routers/training.py`)

- [x] `/train/start` passes all parameters to `fine_tune_da3()`
- [x] `/train/pretrain` passes all parameters to `pretrain_da3_on_arkit()`

### CLI (`ylff/cli.py`)

- [x] `train start` command accepts all parameters
- [x] `train start` passes all parameters to `fine_tune_da3()`
- [x] `train pretrain` command accepts all parameters
- [x] `train pretrain` passes all parameters to `pretrain_da3_on_arkit()`

### Service Functions

- [x] `fine_tune_da3()` accepts all parameters
- [x] `fine_tune_da3()` implements all optimizations
- [x] `pretrain_da3_on_arkit()` accepts all parameters
- [x] `pretrain_da3_on_arkit()` implements all optimizations
## Complete Parameter Mapping

| Parameter | API Model | Router | CLI | Service |
|---|---|---|---|---|
| `use_bf16` | ✅ | ✅ | ✅ | ✅ |
| `gradient_clip_norm` | ✅ | ✅ | ✅ | ✅ |
| `find_lr` | ✅ | ✅ | ✅ | ✅ |
| `find_batch_size` | ✅ | ✅ | ✅ | ✅ |
| `use_fsdp` | ✅ | ✅ | ✅ | ✅ |
| `fsdp_sharding_strategy` | ✅ | ✅ | ✅ | ✅ |
| `fsdp_mixed_precision` | ✅ | ✅ | ✅ | ✅ |
| `use_qat` | ✅ | ✅ | ✅ | ✅ |
| `qat_backend` | ✅ | ✅ | ✅ | ✅ |
| `use_sequence_parallel` | ✅ | ✅ | ✅ | ✅ |
| `sequence_parallel_gpus` | ✅ | ✅ | ✅ | ✅ |
| `activation_recompute_strategy` | ✅ | ✅ | ✅ | ✅ |
| `async_checkpoint` | ✅ | ✅ | ✅ | ✅ |
| `compress_checkpoint` | ✅ | ✅ | ✅ | ✅ |

**Status: 100% Complete ✅**
## Usage Examples

### Complete API Request

```json
{
  "training_data_dir": "data/training",
  "epochs": 10,
  "lr": 1e-5,
  "batch_size": 1,
  "use_bf16": true,
  "gradient_clip_norm": 1.0,
  "find_lr": true,
  "find_batch_size": true,
  "use_fsdp": true,
  "fsdp_sharding_strategy": "FULL_SHARD",
  "fsdp_mixed_precision": "bf16",
  "use_qat": false,
  "qat_backend": "fbgemm",
  "use_sequence_parallel": false,
  "sequence_parallel_gpus": 1,
  "activation_recompute_strategy": "checkpoint",
  "async_checkpoint": true,
  "compress_checkpoint": true
}
```
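For completeness, a sketch of sending that request from Python. The payload mirrors the JSON example above verbatim; the base URL and port are assumptions, so the network call itself is left commented out.

```python
import json

payload = {
    "training_data_dir": "data/training",
    "epochs": 10,
    "lr": 1e-5,
    "batch_size": 1,
    "use_bf16": True,
    "gradient_clip_norm": 1.0,
    "find_lr": True,
    "find_batch_size": True,
    "use_fsdp": True,
    "fsdp_sharding_strategy": "FULL_SHARD",
    "fsdp_mixed_precision": "bf16",
    "use_qat": False,
    "qat_backend": "fbgemm",
    "use_sequence_parallel": False,
    "sequence_parallel_gpus": 1,
    "activation_recompute_strategy": "checkpoint",
    "async_checkpoint": True,
    "compress_checkpoint": True,
}
body = json.dumps(payload)

# Against a running server (assumed host/port), the request would be posted as:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/api/v1/train/start",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
#     method="POST",
# )
# urllib.request.urlopen(req)
```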
Complete CLI Command
ylff train start data/training \
--epochs 10 \
--lr 1e-5 \
--batch-size 1 \
--use-bf16 \
--gradient-clip-norm 1.0 \
--find-lr \
--find-batch-size \
--use-fsdp \
--fsdp-sharding-strategy FULL_SHARD \
--fsdp-mixed-precision bf16 \
--use-qat \
--qat-backend fbgemm \
--use-sequence-parallel \
--sequence-parallel-gpus 4 \
--activation-recompute-strategy hybrid \
--async-checkpoint \
--compress-checkpoint
## ✅ Final Status

All optimizations are fully wired through:

- ✅ API request models
- ✅ Router endpoints
- ✅ CLI commands
- ✅ Service functions

**Everything is connected end-to-end!**