rasAli02 committed on
Commit 1508d64 · 1 parent: 48d2a70

docs: detail ROCm eager execution optimizations in README

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -50,6 +50,7 @@ Building ForgeSight was a journey through the cutting edge of AMD hardware and a
 To make the agents responsive, we deployed the model using **vLLM** on the **ROCm 6.2** stack.
 * We utilized **PagedAttention** to handle the high VRAM requirements of the model.
 * The massive 192GB VRAM of the MI300X allowed us to serve the full model without sharding, maximizing throughput for our concurrent agent calls.
+* **ROCm Tuning**: To keep multimodal inference stable and avoid known `HSA_STATUS_ERROR_INVALID_PACKET_FORMAT` bugs with complex attention kernels on the MI300X, we configured the engine to enforce eager execution and disabled chunked prefill, which stabilized the pipeline.
 
 ### 2. Designing the Multi-Agent Pipeline
 We implemented a 4-stage sequential pipeline in Python to ensure industrial-grade auditability:
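The ROCm Tuning bullet corresponds to two vLLM engine arguments. The sketch below illustrates them and is not the ForgeSight code. It assumes vLLM's Python `LLM` entry point with the `enforce_eager`, `enable_chunked_prefill`, and `tensor_parallel_size` engine arguments. The model ID is a hypothetical placeholder because the README does not name the model here, and the helpers `build_engine_kwargs` and `run_smoke_test` are invented for this example.

```python
from typing import Any, Dict

# Hypothetical placeholder: the README does not name the model in this hunk.
MODEL_ID = "your-org/your-multimodal-model"


def build_engine_kwargs(model_id: str = MODEL_ID) -> Dict[str, Any]:
    """Return keyword arguments for vllm.LLM reflecting the stability settings (hypothetical helper)."""
    return {
        "model": model_id,
        # Run the model eagerly instead of capturing HIP/CUDA graphs.
        "enforce_eager": True,
        # Run each prompt's prefill in a single pass rather than in chunks.
        "enable_chunked_prefill": False,
        # One MI300X (192GB) holds the full model, so no tensor-parallel sharding.
        "tensor_parallel_size": 1,
    }


def run_smoke_test(prompt: str) -> str:
    """Build the engine and generate one completion (hypothetical helper; needs vLLM on a ROCm GPU)."""
    # Imported here so build_engine_kwargs can be inspected without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs())
    outputs = llm.generate([prompt], SamplingParams(max_tokens=64))
    return outputs[0].outputs[0].text
```

On a ROCm host with vLLM installed, calling `run_smoke_test("Summarize the inspection report.")` exercises these settings end to end. Eager mode skips graph capture, so it typically trades some decode speed for predictability.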