
Daksh0505/Seq2Seq-LSTM-MultiHeadAttention-Translation



Daksh0505 committed on Oct 22

Commit 1bb02a6 (verified) · 1 Parent(s): bc44d3c

Update app.py


Files changed (1)

app.py +65 -2

app.py CHANGED

@@ -221,7 +221,7 @@ if st.session_state.translation:
 # ------------------------------------------------
 # Learning Header
 # ------------------------------------------------
-st.subheader("Leaning how it works")
+st.subheader(" Understanding the Model")
 # ------------------------------------------------
 # Self Attention Section
@@ -312,7 +312,70 @@ with st.expander("🔹 Fixed-Length vs Variable-Length Tasks"):
     - Example: Machine translation, summarization, speech recognition.
     - Seq2Seq models are designed to handle this flexibility.
     """)
+# ------------------------------------------------
+# Mathematics Expanders (Advanced / Optional)
+# ------------------------------------------------
+st.subheader("🧮 Mathematics Behind the Model")
+with st.expander("🔹 Self-Attention Equations", expanded=False):
+    st.markdown(r"""
+    The attention function is computed as:
+    $$
+    \text{Attention}(Q,K,V) = \text{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right) V
+    $$
+    Where:
+    - $Q$ = Query matrix
+    - $K$ = Key matrix
+    - $V$ = Value matrix
+    - $d_k$ = Dimension of key vectors
+
+    Each output row is a weighted sum of the rows of $V$; the weights are the softmax-normalized, $\sqrt{d_k}$-scaled similarities between that row's query and every key.
+    """)
+with st.expander("🔹 Multi-Head Attention Equations", expanded=False):
+    st.markdown(r"""
+    Multi-Head Attention runs $h$ attention heads in parallel, each on its own learned projection of $Q$, $K$ and $V$, and concatenates their outputs:
+    $$
+    \text{MultiHead}(Q,K,V) = \text{Concat}(\text{head}_1, \dots, \text{head}_h) W^O
+    $$
+    Each head:
+    $$
+    \text{head}_i = \text{Attention}(Q W_i^Q, K W_i^K, V W_i^V)
+    $$
+    Where $W_i^Q, W_i^K, W_i^V, W^O$ are learnable projection matrices.
+    """)
+with st.expander("🔹 Cross-Attention / Encoder-Decoder Attention", expanded=False):
+    st.markdown(r"""
+    Cross-Attention lets each decoder step attend to the source sentence: queries come from the decoder, keys and values from the encoder outputs:
+    $$
+    \text{Context}_t = \text{Attention}(Q_t, K_{enc}, V_{enc})
+    $$
+    - $Q_t$ = decoder hidden state at timestep $t$
+    - $K_{enc}, V_{enc}$ = encoder outputs
+    """)
+with st.expander("🔹 Seq2Seq Decoder Step", expanded=False):
+    st.markdown(r"""
+    At each decoder timestep:
+    $$
+    s_t, c_t = \text{LSTM}(y_{t-1}, s_{t-1}, c_{t-1})
+    $$
+    $$
+    \text{Output}_t = \text{Dense}(\text{Concat}(s_t, \text{Context}_t))
+    $$
+    """)
 # ------------------------------------------------
 # Show model architecture
 # ------------------------------------------------
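The Self-Attention expander gives the scaled dot-product formula softmax(Q K^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of that formula, not the app's own implementation. For brevity it passes the same toy matrix `X` in as Q, K and V (i.e. it assumes identity projections); a trained layer would first multiply `X` by learned W^Q, W^K, W^V. The function names are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v).
    Returns the (n_q, d_v) output and the (n_q, n_k) attention weights.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of every query with every key
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))  # 4 tokens, 8 features (toy values)
# Self-attention: queries, keys and values all come from the same sequence.
out, w = scaled_dot_product_attention(X, X, X)
print(out.shape, w.shape)                # (4, 8) (4, 4)
print(np.allclose(w.sum(axis=-1), 1.0))  # True
```

When every key is identical, the weights are uniform, so each output row is simply the mean of the rows of V.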
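For the Multi-Head Attention equations, the following NumPy sketch computes each head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V) and applies W^O to the concatenation. The sizes (`d_model = 16`, `h = 4`) and the random weights standing in for learned ones are hypothetical. Keeping the per-head matrices in Python lists is a readability choice; library layers typically fuse them into single tensors.

```python
import numpy as np


def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V


def multi_head_attention(Q, K, V, W_q, W_k, W_v, W_o):
    """MultiHead(Q,K,V) = Concat(head_1, ..., head_h) W^O,
    with head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V).

    W_q, W_k, W_v: lists of h per-head projection matrices.
    W_o: output projection of shape (h * d_v, d_model).
    """
    heads = [attention(Q @ wq, K @ wk, V @ wv)
             for wq, wk, wv in zip(W_q, W_k, W_v)]
    return np.concatenate(heads, axis=-1) @ W_o


# Hypothetical sizes; random weights stand in for learned parameters.
rng = np.random.default_rng(0)
d_model, h = 16, 4
d_k = d_v = d_model // h
W_q = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_k = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_v = [rng.normal(size=(d_model, d_v)) for _ in range(h)]
W_o = rng.normal(size=(h * d_v, d_model))

X = rng.normal(size=(5, d_model))  # 5 tokens
Y = multi_head_attention(X, X, X, W_q, W_k, W_v, W_o)
print(Y.shape)  # (5, 16)
```

Setting h = 1 with identity projections reduces this to plain scaled dot-product attention, which makes a quick sanity check.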
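The cross-attention equation Context_t = Attention(Q_t, K_enc, V_enc) can be sketched the same way. Here the decoder hidden state at timestep t is the query, and the encoder outputs serve directly as both keys and values (an assumption; some models project them first). The sqrt(d) scaling mirrors the self-attention formula; classic Luong-style dot-product attention omits it. The function name `cross_attention` is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(q_t, enc_outputs):
    """Context_t = Attention(Q_t, K_enc, V_enc).

    q_t: (d,) decoder hidden state at timestep t (the query).
    enc_outputs: (T_src, d) encoder outputs, used here as both keys and values.
    Returns the (d,) context vector and the (T_src,) attention weights.
    """
    scores = enc_outputs @ q_t / np.sqrt(q_t.shape[-1])  # one score per source position
    weights = softmax(scores)
    context = weights @ enc_outputs                      # weighted sum of encoder outputs
    return context, weights


rng = np.random.default_rng(0)
enc_outputs = rng.normal(size=(6, 32))  # 6 source tokens, hidden size 32 (toy values)
s_t = rng.normal(size=32)               # current decoder hidden state
context_t, weights = cross_attention(s_t, enc_outputs)
print(context_t.shape, weights.shape)   # (32,) (6,)
```

The returned `weights` are what an attention heatmap would plot: one row per decoder step, one column per source token.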
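Putting the Seq2Seq decoder-step equations together, here is a hypothetical single step in NumPy. A hand-written LSTM cell produces s_t and c_t, s_t queries the encoder outputs to form Context_t, and a dense layer over Concat(s_t, Context_t) yields a distribution over a toy 50-word vocabulary. Several details are illustrative assumptions rather than taken from app.py: the gate stacking order [input, forget, candidate, output], the final softmax on the dense output, the toy sizes, and the random weights standing in for trained ones. y_{t-1} is taken to be the embedding of the previous target token.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def lstm_step(x, s_prev, c_prev, W, U, b):
    """One LSTM step: s_t, c_t = LSTM(y_{t-1}, s_{t-1}, c_{t-1}).

    W: (4*units, input_dim), U: (4*units, units), b: (4*units,),
    stacked in gate order [input, forget, candidate, output] (an assumption).
    """
    z = W @ x + U @ s_prev + b
    i, f, g, o = np.split(z, 4)
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # new cell state
    s_t = sigmoid(o) * np.tanh(c_t)                      # new hidden state
    return s_t, c_t


def decoder_step(y_prev, s_prev, c_prev, enc_outputs, params):
    """s_t, c_t = LSTM(...); Output_t = Dense(Concat(s_t, Context_t)), then softmax."""
    s_t, c_t = lstm_step(y_prev, s_prev, c_prev, *params["lstm"])
    # Cross-attention with s_t as the query over the encoder outputs.
    weights = softmax(enc_outputs @ s_t / np.sqrt(s_t.size))
    context_t = weights @ enc_outputs
    W_out, b_out = params["dense"]
    logits = W_out @ np.concatenate([s_t, context_t]) + b_out
    return softmax(logits), s_t, c_t


# Hypothetical toy sizes; random weights stand in for trained parameters.
rng = np.random.default_rng(0)
emb_dim, units, vocab, T_src = 8, 16, 50, 5
params = {
    "lstm": (rng.normal(size=(4 * units, emb_dim)),
             rng.normal(size=(4 * units, units)),
             np.zeros(4 * units)),
    "dense": (rng.normal(size=(vocab, 2 * units)), np.zeros(vocab)),
}
enc_outputs = rng.normal(size=(T_src, units))
y_prev = rng.normal(size=emb_dim)  # embedding of the previous target token
s0 = c0 = np.zeros(units)
probs, s1, c1 = decoder_step(y_prev, s0, c0, enc_outputs, params)
print(probs.shape, round(float(probs.sum()), 6))  # (50,) 1.0
```

At inference time this step would be repeated, feeding the embedding of the chosen token back in as the next y_{t-1} along with the updated s_t and c_t.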