# llm_memory_visualizer/limitations.py
LIMITATIONS = """
### Key Assumptions:
- Standard transformer architecture with homogeneous layers
- Adam optimizer with mixed precision training (master weights copy)
- Tensor parallelism includes sequence parallelism
- Pipeline parallelism maintains consistent activation memory

### Not Currently Supported:
- Non-standard architectures (alternating dense/sparse layers, custom attention)
- Multi-modal models with vision layers
- Mixed dtype training (e.g., MXFP4)
- Kernel/framework overhead and intermediate memory

For advanced configurations, validate these estimates against actual memory profiling.
"""