dleemiller/SwipeALot-base

@@ -21,6 +21,7 @@ metrics:
 > [!IMPORTANT]
 > This model is currently in beta status and is subject to change.
 Multimodal, multi-objective transformer for swipe keyboard prediction.
 Trained on the [futo-org/swipe.futo.org](https://huggingface.co/datasets/futo-org/swipe.futo.org) dataset.
@@ -36,47 +37,47 @@ This model is trained with the following objectives:
 </p>
-> [!NOTE]
-> This model should be further fine-tuned for a specific task, if not using the embedding mode.
-> For example, length prediction can be significantly improved in a single task setting.
-## Quick Start
-```python
-from datasets import load_dataset
-from transformers import AutoModel, AutoProcessor
-import torch
-# Load model
-model = AutoModel.from_pretrained("dleemiller/SwipeALot-base", trust_remote_code=True)
-processor = AutoProcessor.from_pretrained("dleemiller/SwipeALot-base", trust_remote_code=True)
-model.eval()
-# Load sample
-dataset = load_dataset("futo-org/swipe.futo.org", split="test[:10]")
-item = dataset[4]
-# Preprocess swipe path using processor methods
-# 1. Normalize timestamps (x,y already normalized in futo dataset)
-normalized = processor.normalize_coordinates(item["data"], item["canvas_width"], item["canvas_height"])
-# 2. Resample to fixed length (max_path_len=128)
-#    - Pads with zeros if path < 128 points
-#    - Interpolates if path > 128 points
-path_coords, _ = processor.sample_path_points(normalized, processor.max_path_len)
-path = torch.tensor([path_coords], dtype=torch.float32)
-# Get predictions
-inputs = processor(path_coords=path, text=None, return_tensors="pt")
-with torch.no_grad():
-    outputs = model(**inputs)
-# Length prediction
-predicted_length = outputs.length_logits.argmax(dim=-1).item()
-print(f"Predicted word length: {predicted_length}")
 ```
 ## Model Details
 - **Architecture**: Transformer encoder (768-dim, 12 layers, 12 heads)
@@ -135,51 +136,9 @@ Trained via contrastive learning where the SEP token produces fixed-size embeddi
 - **Inverted mode (80%)**: Pulls embeddings of heavily-masked and lightly-masked versions of the same input close together, teaching invariance to noise and occlusion
 - **Modality mode (20%)**: Pulls embeddings of path-only and text-only views of the same word close together, teaching cross-modal alignment between gesture geometry and semantic meaning
-The contrastive loss (15% weight, temperature 0.07) pulls matching pairs together in embedding space while pushing non-matches apart. Uses Matryoshka embeddings to create nested representations at multiple dimensions (64, 128, 384, 768), with stronger weight on lower-dimensional representations (2.0×, 1.5×, 1.0×, 1.0×) to ensure the first 64 dimensions are highly informative on their own.
-## Usage Examples
-### Length Prediction
-This
-```python
-from datasets import load_dataset
-from transformers import AutoModel, AutoProcessor
-model = AutoModel.from_pretrained("dleemiller/SwipeALot-base", trust_remote_code=True)
-model.eval()
-model.requires_grad_(False)
-processor = AutoProcessor.from_pretrained("dleemiller/SwipeALot-base", trust_remote_code=True)
-# Load a sample row from the dataset.
-ds = load_dataset("futo-org/swipe.futo.org", split="test[:50]")
-row = ds[0]  # "Brahmas"
-# Length-only inference:
-# `encode_path(...)` preprocesses the swipe path to fixed-length motion features and sets text attention to 0.
-inputs = processor.encode_path(row["data"], return_tensors="pt")
-outputs = model(**inputs, return_dict=True)
-# Length prediction is a regression scalar (float); round it for an integer length.
-pred_len = float(outputs.length_logits.item())
-pred_len_rounded = max(0, int(round(pred_len)))
-true_len = sum(1 for c in row["word"].lower() if c.isalpha() or c.isdigit())
-print(f'Word:                 "{row["word"]}"')
-print(f"Length (true):        {true_len}")
-print(f"Length (pred):        {pred_len:.3f}")
-print(f"Length (pred rounded):{pred_len_rounded}")
-```
-```text
-Word:                 "Brahmas"
-Length (true):        7
-Length (pred):        7.483
-Length (pred rounded):7
-```
 ### Embedding Similarity

 > [!IMPORTANT]
 > This model is currently in beta status and is subject to change.
+> Last updated 2025-12-19
 Multimodal, multi-objective transformer for swipe keyboard prediction.
 Trained on the [futo-org/swipe.futo.org](https://huggingface.co/datasets/futo-org/swipe.futo.org) dataset.
 </p>
+## Quick Start (Length Prediction)
+```python
+from datasets import load_dataset
+from transformers import AutoModel, AutoProcessor
+model = AutoModel.from_pretrained("dleemiller/SwipeALot-base", trust_remote_code=True)
+model.eval()
+model.requires_grad_(False)
+processor = AutoProcessor.from_pretrained("dleemiller/SwipeALot-base", trust_remote_code=True)
+# Load a sample row from the dataset.
+ds = load_dataset("futo-org/swipe.futo.org", split="test[:50]")
+row = ds[0]  # "Brahmas"
+# Length-only inference:
+# `encode_path(...)` preprocesses the swipe path to fixed-length motion features and sets text attention to 0.
+inputs = processor.encode_path(row["data"], return_tensors="pt")
+outputs = model(**inputs, return_dict=True)
+# Length prediction is a regression scalar (a float, despite the `length_logits` name); round it to get an integer length.
+pred_len = float(outputs.length_logits.item())
+pred_len_rounded = max(0, int(round(pred_len)))
+true_len = sum(1 for c in row["word"].lower() if c.isalpha() or c.isdigit())
+print(f'Word:                 "{row["word"]}"')
+print(f"Length (true):        {true_len}")
+print(f"Length (pred):        {pred_len:.3f}")
+print(f"Length (pred rounded):{pred_len_rounded}")
+```
+```text
+Word:                 "Brahmas"
+Length (true):        7
+Length (pred):        7.483
+Length (pred rounded):7
 ```
 ## Model Details
 - **Architecture**: Transformer encoder (768-dim, 12 layers, 12 heads)
 - **Inverted mode (80%)**: Pulls embeddings of heavily-masked and lightly-masked versions of the same input close together, teaching invariance to noise and occlusion
 - **Modality mode (20%)**: Pulls embeddings of path-only and text-only views of the same word close together, teaching cross-modal alignment between gesture geometry and semantic meaning
+The contrastive loss (10-20% weight, temperature 0.07) pulls matching pairs together in embedding space while pushing non-matches apart. Training also uses Matryoshka embeddings, which are nested representations at multiple dimensions (64, 128, 384, 768). Lower-dimensional prefixes get stronger loss weights (2.0×, 1.5×, 1.0×, 1.0×, respectively) so that the first 64 dimensions are highly informative on their own.
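The Matryoshka-weighted contrastive objective described above can be sketched as follows. This is a minimal, dependency-free illustration that *assumes* a standard in-batch InfoNCE loss is applied to each leading-dimension prefix and the per-prefix losses are combined as a weighted sum. Only the dimensions (64, 128, 384, 768), the weights (2.0, 1.5, 1.0, 1.0), and the temperature (0.07) come from the model card. The function names (`info_nce`, `matryoshka_info_nce`) and the in-batch-negatives setup are hypothetical and are not the repository's training code.

```python
import math
import random

# Dims, weights, and temperature are taken from the model card text;
# everything else in this sketch is an illustrative assumption.
MATRYOSHKA_DIMS = (64, 128, 384, 768)
MATRYOSHKA_WEIGHTS = (2.0, 1.5, 1.0, 1.0)
TEMPERATURE = 0.07


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(anchors, positives, temperature=TEMPERATURE):
    """Hypothetical in-batch InfoNCE: anchors[i] should match positives[i];
    every other positive in the batch acts as a negative. Returns the mean loss."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [sum(x * y for x, y in zip(anchor, pos)) / temperature for pos in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(a)


def matryoshka_info_nce(anchors, positives, dims=MATRYOSHKA_DIMS,
                        weights=MATRYOSHKA_WEIGHTS, temperature=TEMPERATURE):
    """Apply InfoNCE to each leading-dimension prefix and return the weighted sum.
    Assumption: the real training code may normalize or combine terms differently."""
    return sum(
        w * info_nce([v[:d] for v in anchors], [v[:d] for v in positives], temperature)
        for d, w in zip(dims, weights)
    )


# Usage: two "views" of 4 random 768-dim embeddings (stand-ins for, e.g.,
# lightly vs. heavily masked inputs), once aligned and once misaligned.
random.seed(0)
batch = [[random.gauss(0.0, 1.0) for _ in range(768)] for _ in range(4)]
noisy = [[x + 0.01 * random.gauss(0.0, 1.0) for x in v] for v in batch]
shuffled = noisy[1:] + noisy[:1]  # pair each anchor with the wrong view
print(f"aligned views:    {matryoshka_info_nce(batch, noisy):.4f}")
print(f"misaligned views: {matryoshka_info_nce(batch, shuffled):.4f}")
```

Each prefix gets its own loss term, and the 64-dim term is weighted most heavily. Under these assumptions, truncating an embedding to its first 64 values should still yield usable similarities, which is the property the card describes.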
+## More Usage Examples
 ### Embedding Similarity