Commit 6c2da96
1 Parent(s): c88286f
Update app.py
app.py CHANGED
@@ -185,10 +185,10 @@ st.markdown("where BW_math is the number of floating point operations per second
 st.markdown("If we assume we can *perfectly* overlap memory access with math operations, then the estimated execution time for the operation is:")
 st.latex("max(T_{math}, T_{mem})")
 
-st.markdown("
+st.markdown("Note that there is a minimum time to execute the operation due to [kernel launch overhead](https://forums.developer.nvidia.com/t/any-way-to-measure-the-latency-of-a-kernel-launch/221413/2)")
 
 st.subheader("Inference time for Transformer operations")
-st.
+st.markdown("We can now estimate the execution for each of the operations in the transformer model. I suggest you inspect the code for details on the calculations. ")
 
 st.subheader('Attention layer')
 
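For reference, the estimate described in the added text can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the function name `estimate_op_time`, its parameters, and the hardware numbers are all hypothetical. It assumes `T_math = flops / BW_math` and `T_mem = bytes_moved / BW_mem`, and it models the kernel launch overhead as a simple lower bound on the result, which is one plausible reading of "minimum time".

```python
def estimate_op_time(flops, bytes_moved, bw_math, bw_mem, launch_overhead=0.0):
    """Estimate the execution time of one operation, in seconds.

    Assumes memory access overlaps perfectly with math, so the slower of
    the two dominates. launch_overhead is a hypothetical floor standing in
    for kernel launch overhead; its value must be measured per device.
    """
    t_math = flops / bw_math        # time spent on floating point work
    t_mem = bytes_moved / bw_mem    # time spent moving data to/from memory
    return max(t_math, t_mem, launch_overhead)


# Hypothetical hardware numbers, chosen only for illustration.
BW_MATH = 100e12   # floating point operations per second
BW_MEM = 1e12      # bytes per second
OVERHEAD = 5e-6    # seconds, assumed kernel launch floor

# Math-bound: a large matmul-like operation.
math_bound = estimate_op_time(2e12, 1e9, BW_MATH, BW_MEM, OVERHEAD)

# Memory-bound: few FLOPs per byte moved.
mem_bound = estimate_op_time(1e9, 4e9, BW_MATH, BW_MEM, OVERHEAD)

# Tiny operation: both terms fall below the launch overhead floor.
tiny = estimate_op_time(1e3, 1e3, BW_MATH, BW_MEM, OVERHEAD)
```

With these assumed numbers, the first call is limited by math throughput, the second by memory bandwidth, and the third by the launch overhead floor.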