levossadtchi committed
Commit a80988e · verified · 1 parent: ec77546

Update README.md

Files changed (1):
  1. README.md (+2 −11)

README.md CHANGED
```diff
@@ -17,6 +17,8 @@ model_type: qed
 
 ![Frame 33](https://cdn-uploads.huggingface.co/production/uploads/695b8d7a2114f706bdcee465/Wu3QCW8XNwUXrYaANG7Ss.png)
 
+![compute_vs_score_scatter](https://cdn-uploads.huggingface.co/production/uploads/695b8d7a2114f706bdcee465/wgr_RTC2YhZ2cESPcdR5Y.png)
+
 
 # QED-75M
 
@@ -206,17 +208,6 @@ Typical usage via Transformers:
 - `loss`: scalar when `labels` are provided
 - `past_key_values`: cached KV tensors when `use_cache=True`
 
-## KV Cache and Generation Semantics
-
-- The model uses a **legacy tuple KV cache** format (not the newer `DynamicCache` object). The integration explicitly disables default dynamic cache support (`_supports_default_dynamic_cache()` returns `False`).
-- In `prepare_inputs_for_generation(...)`:
-  - If `past_key_values` is provided, generation continues by feeding only the **last token** (`input_ids[:, -1:]`).
-  - The attention layer concatenates past and current KV along the sequence dimension.
-
-Expected KV shapes (conceptually):
-
-- For each layer, `(key, value)` have shape `[batch_size, n_heads, kv_len, head_dim]`.
-
 ## Attention Masking
 
 When `attention_mask` is provided, the model converts it to a key-padding boolean mask:
```
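For reference, the KV-cache semantics described in the removed section can be sketched as follows. This is a minimal illustration with toy dimensions, not the model's actual code: the tuple layout, the last-token slicing, and the sequence-dimension concatenation follow the removed text, while `kv_len`, `n_layers`, and the zero tensors are stand-ins.

```python
import torch

batch_size, n_heads, head_dim, n_layers, kv_len = 2, 4, 8, 3, 5

# Legacy tuple KV cache: one (key, value) pair per layer,
# each shaped [batch_size, n_heads, kv_len, head_dim].
past_key_values = tuple(
    (torch.zeros(batch_size, n_heads, kv_len, head_dim),
     torch.zeros(batch_size, n_heads, kv_len, head_dim))
    for _ in range(n_layers)
)

# When a cache is present, generation feeds only the last token.
input_ids = torch.randint(0, 1000, (batch_size, 7))
next_input = input_ids[:, -1:]  # shape [batch_size, 1]

# Inside attention, new K/V are concatenated with the cache
# along the sequence dimension (dim=2), so kv_len grows by 1.
new_key = torch.zeros(batch_size, n_heads, 1, head_dim)
key = torch.cat([past_key_values[0][0], new_key], dim=2)
```

Here `key` ends up with shape `[2, 4, 6, 8]`, matching the conceptual `[batch_size, n_heads, kv_len, head_dim]` layout with `kv_len` incremented by one.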
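The key-padding conversion mentioned in the final context lines can be illustrated as below. This is a sketch under the standard Transformers convention (`1` = real token, `0` = padding); the exact mask polarity in the model code is an assumption.

```python
import torch

# Standard Transformers attention_mask: 1 = real token, 0 = padding.
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [1, 1, 0, 0]])

# Boolean key-padding mask: True where a key position should be ignored.
key_padding_mask = attention_mask == 0
```

This boolean form is what modules like `torch.nn.MultiheadAttention` accept as `key_padding_mask`, with `True` marking positions excluded from attention.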