LIMITATIONS = """
### Key Assumptions:
- Standard transformer architecture with homogeneous layers
- Adam optimizer with mixed precision training (master weights copy)
- Tensor parallelism includes sequence parallelism
- Pipeline parallelism maintains consistent activation memory

### Not Currently Supported:
- Non-standard architectures (alternating dense/sparse layers, custom attention)
- Multi-modal models with vision layers
- Mixed dtype training (e.g., MXFP4)
- Kernel/framework overhead and intermediate memory
For advanced configurations, validate results against actual memory profiling.
""" |