# API & CLI Wiring - Complete Verification

All 14 optimization parameters listed below are wired through the API models, the router, the CLI, and the service functions.
## Complete Parameter List

### Phase 4 Optimizations

1. **BF16 Support**
   - API: `use_bf16: bool`
   - CLI: `--use-bf16`
   - Service: ✅ Integrated
2. **Gradient Clipping**
   - API: `gradient_clip_norm: Optional[float]`
   - CLI: `--gradient-clip-norm`
   - Service: ✅ Integrated
3. **Learning Rate Finder**
   - API: `find_lr: bool`
   - CLI: `--find-lr`
   - Service: ✅ Integrated
4. **Batch Size Finder**
   - API: `find_batch_size: bool`
   - CLI: `--find-batch-size`
   - Service: ✅ Integrated
### FSDP Options

5. **FSDP**
   - API: `use_fsdp: bool`
   - CLI: `--use-fsdp`
   - Service: ✅ Integrated
6. **FSDP Sharding Strategy**
   - API: `fsdp_sharding_strategy: str`
   - CLI: `--fsdp-sharding-strategy`
   - Service: ✅ Integrated
7. **FSDP Mixed Precision**
   - API: `fsdp_mixed_precision: Optional[str]`
   - CLI: `--fsdp-mixed-precision`
   - Service: ✅ Integrated
### Advanced Optimizations

8. **QAT**
   - API: `use_qat: bool`
   - CLI: `--use-qat`
   - Service: ✅ Integrated
9. **QAT Backend**
   - API: `qat_backend: str`
   - CLI: `--qat-backend`
   - Service: ✅ Integrated
10. **Sequence Parallelism**
    - API: `use_sequence_parallel: bool`
    - CLI: `--use-sequence-parallel`
    - Service: ✅ Integrated
11. **Sequence Parallel GPUs**
    - API: `sequence_parallel_gpus: int`
    - CLI: `--sequence-parallel-gpus`
    - Service: ✅ Integrated
12. **Activation Recomputation**
    - API: `activation_recompute_strategy: Optional[str]`
    - CLI: `--activation-recompute-strategy`
    - Service: ✅ Integrated
### Checkpoint Options

13. **Async Checkpoint**
    - API: `async_checkpoint: bool`
    - CLI: `--async-checkpoint`
    - Service: ✅ Integrated
14. **Compress Checkpoint**
    - API: `compress_checkpoint: bool`
    - CLI: `--compress-checkpoint`
    - Service: ✅ Integrated
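
The 14 fields above can be collected into a single request object. The sketch below uses a standard-library dataclass as a stand-in for the Pydantic `TrainRequest`. The field names and types come from the list; the class name `TrainRequestSketch` and every default value are assumptions made for illustration and may differ from the real model.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TrainRequestSketch:
    """Stand-in for the Pydantic TrainRequest; all defaults are assumptions."""

    # Phase 4 optimizations
    use_bf16: bool = False
    gradient_clip_norm: Optional[float] = None
    find_lr: bool = False
    find_batch_size: bool = False
    # FSDP options
    use_fsdp: bool = False
    fsdp_sharding_strategy: str = "FULL_SHARD"
    fsdp_mixed_precision: Optional[str] = None
    # Advanced optimizations
    use_qat: bool = False
    qat_backend: str = "fbgemm"
    use_sequence_parallel: bool = False
    sequence_parallel_gpus: int = 1
    activation_recompute_strategy: Optional[str] = None
    # Checkpoint options
    async_checkpoint: bool = False
    compress_checkpoint: bool = False


# Override two fields and leave the rest at their (assumed) defaults.
request = TrainRequestSketch(use_bf16=True, gradient_clip_norm=1.0)
print(asdict(request))
```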
---

## Data Flow Verification

### API Request Flow

```
POST /api/v1/train/start
↓
TrainRequest (Pydantic validation)
↓
Router: /train/start endpoint
↓
fine_tune_da3() service function
↓
All optimizations applied
```
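
One way the router step could hand validated fields to the service is to unpack the request into keyword arguments. This is a minimal sketch rather than the project's router code: `TrainRequestSketch`, `fine_tune_da3_stub`, and `start_training` are hypothetical names, the dataclass stands in for the Pydantic model, and only a few of the 14 fields are shown.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class TrainRequestSketch:
    # A subset of the request fields, to keep the sketch short.
    training_data_dir: str
    use_bf16: bool = False
    gradient_clip_norm: Optional[float] = None
    use_fsdp: bool = False


def fine_tune_da3_stub(training_data_dir: str, **optimizations: Any) -> Dict[str, Any]:
    """Stub standing in for the real service function; it echoes what it received."""
    return {"training_data_dir": training_data_dir, **optimizations}


def start_training(request: TrainRequestSketch) -> Dict[str, Any]:
    """Hypothetical handler body: forward every request field to the service."""
    return fine_tune_da3_stub(**asdict(request))


received = start_training(
    TrainRequestSketch("data/training", use_bf16=True, gradient_clip_norm=1.0)
)
print(received)
```

Forwarding the whole request in one call means a field added to the model reaches the service without a second edit in the handler.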
### CLI Command Flow

```
ylff train start ...
↓
CLI function parameters
↓
fine_tune_da3() service function
↓
All optimizations applied
```
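
The CLI side of the flow can be illustrated with `argparse`, which turns a hyphenated flag such as `--use-bf16` into the underscore name `use_bf16` that a service function would accept as a keyword argument. This document does not say which CLI framework `ylff/cli.py` uses, so `argparse`, the program name, and the default values here are assumptions; only the flag names are taken from the parameter list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a subset of the documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(prog="ylff-train-start-sketch")
    parser.add_argument("training_data_dir")
    parser.add_argument("--use-bf16", action="store_true")
    parser.add_argument("--gradient-clip-norm", type=float, default=None)
    parser.add_argument("--use-fsdp", action="store_true")
    parser.add_argument("--fsdp-sharding-strategy", default="FULL_SHARD")
    parser.add_argument("--sequence-parallel-gpus", type=int, default=1)
    return parser


args = build_parser().parse_args(
    ["data/training", "--use-bf16", "--gradient-clip-norm", "1.0", "--sequence-parallel-gpus", "4"]
)

# vars() yields a dict keyed by the underscore names, ready to pass on as kwargs.
kwargs = vars(args)
print(kwargs)
```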
---

## Verification Checklist

### API Models (`ylff/models/api_models.py`)

- [x] `TrainRequest` has all Phase 4 parameters
- [x] `TrainRequest` has all FSDP parameters
- [x] `TrainRequest` has all advanced optimization parameters
- [x] `TrainRequest` has checkpoint optimization parameters
- [x] `PretrainRequest` has all Phase 4 parameters
- [x] `PretrainRequest` has all FSDP parameters
- [x] `PretrainRequest` has all advanced optimization parameters
- [x] `PretrainRequest` has checkpoint optimization parameters

### Router (`ylff/routers/training.py`)

- [x] `/train/start` passes all parameters to `fine_tune_da3()`
- [x] `/train/pretrain` passes all parameters to `pretrain_da3_on_arkit()`

### CLI (`ylff/cli.py`)

- [x] `train start` command accepts all parameters
- [x] `train start` passes all parameters to `fine_tune_da3()`
- [x] `train pretrain` command accepts all parameters
- [x] `train pretrain` passes all parameters to `pretrain_da3_on_arkit()`

### Service Functions

- [x] `fine_tune_da3()` accepts all parameters
- [x] `fine_tune_da3()` implements all optimizations
- [x] `pretrain_da3_on_arkit()` accepts all parameters
- [x] `pretrain_da3_on_arkit()` implements all optimizations
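
The "accepts all parameters" items in this checklist could be checked mechanically by comparing a function's signature with the expected parameter names. The sketch below is one possible approach using `inspect.signature`. The helper `missing_parameters` and the deliberately incomplete `fine_tune_da3_stub` are hypothetical; in the project, the real service functions would be passed in instead. A signature check only shows that a parameter is accepted, not that the optimization behind it is implemented.

```python
import inspect
from typing import Callable, List, Optional

EXPECTED_PARAMETERS = [
    "use_bf16",
    "gradient_clip_norm",
    "find_lr",
    "find_batch_size",
    "use_fsdp",
    "fsdp_sharding_strategy",
    "fsdp_mixed_precision",
    "use_qat",
    "qat_backend",
    "use_sequence_parallel",
    "sequence_parallel_gpus",
    "activation_recompute_strategy",
    "async_checkpoint",
    "compress_checkpoint",
]


def missing_parameters(func: Callable[..., object], expected: List[str]) -> List[str]:
    """Return the expected names that func does not declare explicitly."""
    declared = inspect.signature(func).parameters
    return [name for name in expected if name not in declared]


def fine_tune_da3_stub(
    use_bf16: bool = False,
    gradient_clip_norm: Optional[float] = None,
    find_lr: bool = False,
) -> None:
    """Deliberately incomplete stub so the check has something to report."""


missing = missing_parameters(fine_tune_da3_stub, EXPECTED_PARAMETERS)
print(missing)
```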
---

## Complete Parameter Mapping

| Parameter                       | API Model | Router | CLI | Service |
| ------------------------------- | --------- | ------ | --- | ------- |
| `use_bf16`                      | ✅        | ✅     | ✅  | ✅      |
| `gradient_clip_norm`            | ✅        | ✅     | ✅  | ✅      |
| `find_lr`                       | ✅        | ✅     | ✅  | ✅      |
| `find_batch_size`               | ✅        | ✅     | ✅  | ✅      |
| `use_fsdp`                      | ✅        | ✅     | ✅  | ✅      |
| `fsdp_sharding_strategy`        | ✅        | ✅     | ✅  | ✅      |
| `fsdp_mixed_precision`          | ✅        | ✅     | ✅  | ✅      |
| `use_qat`                       | ✅        | ✅     | ✅  | ✅      |
| `qat_backend`                   | ✅        | ✅     | ✅  | ✅      |
| `use_sequence_parallel`         | ✅        | ✅     | ✅  | ✅      |
| `sequence_parallel_gpus`        | ✅        | ✅     | ✅  | ✅      |
| `activation_recompute_strategy` | ✅        | ✅     | ✅  | ✅      |
| `async_checkpoint`              | ✅        | ✅     | ✅  | ✅      |
| `compress_checkpoint`           | ✅        | ✅     | ✅  | ✅      |

**Status: 100% Complete** ✅

---
## Usage Examples

### Complete API Request

```json
{
  "training_data_dir": "data/training",
  "epochs": 10,
  "lr": 1e-5,
  "batch_size": 1,
  "use_bf16": true,
  "gradient_clip_norm": 1.0,
  "find_lr": true,
  "find_batch_size": true,
  "use_fsdp": true,
  "fsdp_sharding_strategy": "FULL_SHARD",
  "fsdp_mixed_precision": "bf16",
  "use_qat": false,
  "qat_backend": "fbgemm",
  "use_sequence_parallel": false,
  "sequence_parallel_gpus": 1,
  "activation_recompute_strategy": "checkpoint",
  "async_checkpoint": true,
  "compress_checkpoint": true
}
```
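
A request body like the one above could be posted from Python with the standard library alone. In this sketch the endpoint path comes from the API request flow, while `BASE_URL` is an assumption, since this document does not state the server's host or port. The payload is a shortened version of the full request, and the actual send is left commented out so the snippet runs without a server.

```python
import json
import urllib.request

# Assumption: host and port are not given in this document.
BASE_URL = "http://localhost:8000"

# A shortened payload; the remaining fields from the full request would be added the same way.
payload = {
    "training_data_dir": "data/training",
    "epochs": 10,
    "use_bf16": True,
    "gradient_clip_norm": 1.0,
    "use_fsdp": True,
    "fsdp_sharding_strategy": "FULL_SHARD",
}

request = urllib.request.Request(
    url=f"{BASE_URL}/api/v1/train/start",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send the request to a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
print(request.get_method(), request.full_url)
```

`json.dumps` converts Python's `True` to the JSON literal `true`, so the encoded body matches the style of the JSON example.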
### Complete CLI Command

This command sets every flag. Unlike the API request above, it also enables QAT and sequence parallelism across 4 GPUs, and it uses the `hybrid` recomputation strategy.

```bash
ylff train start data/training \
  --epochs 10 \
  --lr 1e-5 \
  --batch-size 1 \
  --use-bf16 \
  --gradient-clip-norm 1.0 \
  --find-lr \
  --find-batch-size \
  --use-fsdp \
  --fsdp-sharding-strategy FULL_SHARD \
  --fsdp-mixed-precision bf16 \
  --use-qat \
  --qat-backend fbgemm \
  --use-sequence-parallel \
  --sequence-parallel-gpus 4 \
  --activation-recompute-strategy hybrid \
  --async-checkpoint \
  --compress-checkpoint
```
---

## Final Status

**All optimizations are fully wired through:**

- ✅ API request models
- ✅ Router endpoints
- ✅ CLI commands
- ✅ Service functions

**Everything is connected end-to-end!**