Update README.md
README.md CHANGED
@@ -17,6 +17,8 @@ model_type: qed

# QED-75M

@@ -206,17 +208,6 @@ Typical usage via Transformers:
- `loss`: scalar when `labels` are provided
- `past_key_values`: cached KV tensors when `use_cache=True`

## KV Cache and Generation Semantics

The model uses a **legacy tuple KV cache** format (not the newer `DynamicCache` object). The integration explicitly disables default dynamic cache support (`_supports_default_dynamic_cache()` returns `False`).
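As a sketch of what that opt-out looks like (the class here is illustrative, not the model's actual source; only the `_supports_default_dynamic_cache` hook name comes from the integration described above):

```python
class QEDModelSketch:
    # Stand-in for the real PreTrainedModel subclass; only the hook
    # below mirrors the integration described above.

    def _supports_default_dynamic_cache(self):
        # Returning False tells the generation machinery not to wrap
        # past_key_values in a DynamicCache, so legacy (key, value)
        # tuples flow through unchanged.
        return False


model = QEDModelSketch()
print(model._supports_default_dynamic_cache())  # False
```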

In `prepare_inputs_for_generation(...)`:

- If `past_key_values` is provided, generation continues by feeding only the **last token** (`input_ids[:, -1:]`).
- The attention layer concatenates past and current KV along the sequence dimension.

Expected KV shapes (conceptually):

- For each layer, `(key, value)` have shape `[batch_size, n_heads, kv_len, head_dim]`.
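These mechanics can be sketched with plain Python lists standing in for tensors (all names here are illustrative, not the model's API); the point is that each decode step feeds a single token, and its KV slice is concatenated onto the cache so `kv_len` grows by one:

```python
# Nested lists stand in for tensors of shape
# [batch_size, n_heads, kv_len, head_dim].

def fake_kv(batch, heads, seq, dim):
    # Build a zero-filled stand-in "tensor" of the given shape.
    return [[[[0.0] * dim for _ in range(seq)]
             for _ in range(heads)] for _ in range(batch)]

def kv_shape(t):
    # Recover [batch, heads, kv_len, head_dim] from the nested lists.
    return [len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])]

def concat_seq(past, new):
    # Concatenate along the sequence dimension (axis 2), as the
    # attention layer does for past and current keys/values.
    return [[ph + nh for ph, nh in zip(pb, nb)]
            for pb, nb in zip(past, new)]

# Prefill: a 5-token prompt leaves kv_len == 5 in every layer.
past = tuple((fake_kv(1, 4, 5, 8), fake_kv(1, 4, 5, 8)) for _ in range(2))

# Decode step: only the last token is fed, producing kv_len == 1 slices.
step = tuple((fake_kv(1, 4, 1, 8), fake_kv(1, 4, 1, 8)) for _ in range(2))
past = tuple((concat_seq(pk, nk), concat_seq(pv, nv))
             for (pk, pv), (nk, nv) in zip(past, step))

for key, value in past:
    print(kv_shape(key))  # [1, 4, 6, 8] after one decode step
```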
## Attention Masking
When `attention_mask` is provided, the model converts it to a key-padding boolean mask:
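A minimal sketch of such a conversion, assuming the common convention that `True` marks key positions to ignore (the function name and exact semantics here are illustrative, not the model's code):

```python
def to_key_padding_mask(attention_mask):
    # Tokenizer convention: 1 = real token, 0 = padding.
    # Key-padding convention (sketch): True = mask this key position out.
    return [[token == 0 for token in row] for row in attention_mask]


attention_mask = [[1, 1, 1, 0, 0],
                  [1, 1, 1, 1, 1]]
mask = to_key_padding_mask(attention_mask)
print(mask[0])  # [False, False, False, True, True]
```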