Space: harmdevries/transformer_inference

harmdevries committed on Nov 3, 2022

Commit 6c2da96 (1 parent: c88286f)

Update app.py

Files changed (1): app.py

@@ -185,10 +185,10 @@ st.markdown("where BW_math is the number of floating point operations per second
 st.markdown("If we assume we can *perfectly* overlap memory access with math operations, then the estimated execution time for the operation is:")
 st.latex("max(T_{math}, T_{mem})")
-st.markdown("We also a minimum time for executing the operation due to [kernel launch overhead](https://forums.developer.nvidia.com/t/any-way-to-measure-the-latency-of-a-kernel-launch/221413/2)")
+st.markdown("Note that there is a minimum time to execute the operation due to [kernel launch overhead](https://forums.developer.nvidia.com/t/any-way-to-measure-the-latency-of-a-kernel-launch/221413/2)")
 st.subheader("Inference time for Transformer operations")
-st.text("We can now estimate the execution for each of the operations in the transformer model. I suggest you inspect the code for details on the calculations. ")
+st.markdown("We can now estimate the execution for each of the operations in the transformer model. I suggest you inspect the code for details on the calculations. ")
 st.subheader('Attention layer')
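
For readers of this hunk, the estimate described in the changed text can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the function name `estimate_time`, its parameters, and the idea of applying the kernel launch overhead as a simple lower bound are all assumptions made here. It takes T_math as FLOPs divided by math throughput and T_mem as bytes moved divided by memory bandwidth.

```python
def estimate_time(flops, bytes_moved, bw_math, bw_mem, launch_overhead=0.0):
    """Hypothetical helper: estimated execution time of one operation, in seconds.

    Assumes memory access overlaps perfectly with math, so the slower of the
    two dominates, and treats kernel launch overhead as a floor on the result.
    """
    t_math = flops / bw_math        # time spent on floating point operations
    t_mem = bytes_moved / bw_mem    # time spent moving data to and from memory
    return max(t_math, t_mem, launch_overhead)


# Math-bound case: 2e12 FLOPs at 1e12 FLOP/s outweighs 1e9 bytes at 1e9 B/s.
math_bound = estimate_time(2e12, 1e9, bw_math=1e12, bw_mem=1e9)

# Tiny operation: both terms fall below an assumed 5 microsecond launch overhead.
tiny_op = estimate_time(1e3, 1e3, bw_math=1e12, bw_mem=1e9, launch_overhead=5e-6)
```

With these made-up numbers the first call is limited by math and the second by the launch overhead floor, which is the situation the reworded note in this commit points at.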