Update README.md
README.md CHANGED
@@ -17,6 +17,8 @@ model_type: qed

# QED-75M

@@ -206,17 +208,6 @@ Typical usage via Transformers:
- `loss`: scalar when `labels` are provided
- `past_key_values`: cached KV tensors when `use_cache=True`

## KV Cache and Generation Semantics

The model uses a **legacy tuple KV cache** format (not the newer `DynamicCache` object). The integration explicitly disables default dynamic cache support (`_supports_default_dynamic_cache()` returns `False`).
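As a sketch of what that opt-out looks like (the class here is illustrative, not the model's actual source; only the `_supports_default_dynamic_cache` hook name comes from the integration described above):

```python
class QEDModelSketch:
    # Stand-in for the real PreTrainedModel subclass; only the hook
    # below mirrors the integration described above.

    def _supports_default_dynamic_cache(self):
        # Returning False tells the generation machinery not to wrap
        # past_key_values in a DynamicCache, so legacy (key, value)
        # tuples flow through unchanged.
        return False


model = QEDModelSketch()
print(model._supports_default_dynamic_cache())  # False
```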

In `prepare_inputs_for_generation(...)`:

- If `past_key_values` is provided, generation continues by feeding only the **last token** (`input_ids[:, -1:]`).
- The attention layer concatenates past and current KV along the sequence dimension.

Expected KV shapes (conceptually):

- For each layer, `(key, value)` have shape `[batch_size, n_heads, kv_len, head_dim]`.
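These mechanics can be sketched with plain Python lists standing in for tensors (all names here are illustrative, not the model's API); the point is that each decode step feeds a single token, and its KV slice is concatenated onto the cache so `kv_len` grows by one:

```python
# Nested lists stand in for tensors of shape
# [batch_size, n_heads, kv_len, head_dim].

def fake_kv(batch, heads, seq, dim):
    # Build a zero-filled stand-in "tensor" of the given shape.
    return [[[[0.0] * dim for _ in range(seq)]
             for _ in range(heads)] for _ in range(batch)]

def kv_shape(t):
    # Recover [batch, heads, kv_len, head_dim] from the nested lists.
    return [len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])]

def concat_seq(past, new):
    # Concatenate along the sequence dimension (axis 2), as the
    # attention layer does for past and current keys/values.
    return [[ph + nh for ph, nh in zip(pb, nb)]
            for pb, nb in zip(past, new)]

# Prefill: a 5-token prompt leaves kv_len == 5 in every layer.
past = tuple((fake_kv(1, 4, 5, 8), fake_kv(1, 4, 5, 8)) for _ in range(2))

# Decode step: only the last token is fed, producing kv_len == 1 slices.
step = tuple((fake_kv(1, 4, 1, 8), fake_kv(1, 4, 1, 8)) for _ in range(2))
past = tuple((concat_seq(pk, nk), concat_seq(pv, nv))
             for (pk, pv), (nk, nv) in zip(past, step))

for key, value in past:
    print(kv_shape(key))  # [1, 4, 6, 8] after one decode step
```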
## Attention Masking
When `attention_mask` is provided, the model converts it to a key-padding boolean mask:
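A minimal sketch of such a conversion, assuming the common convention that `True` marks key positions to ignore (the function name and exact semantics here are illustrative, not the model's code):

```python
def to_key_padding_mask(attention_mask):
    # Tokenizer convention: 1 = real token, 0 = padding.
    # Key-padding convention (sketch): True = mask this key position out.
    return [[token == 0 for token in row] for row in attention_mask]


attention_mask = [[1, 1, 1, 0, 0],
                  [1, 1, 1, 1, 1]]
mask = to_key_padding_mask(attention_mask)
print(mask[0])  # [False, False, False, True, True]
```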