Upload folder using huggingface_hub
README.md
CHANGED
|
@@ -98,28 +98,41 @@ print(f"Predicted word length: {predicted_length}")
|
|
| 98 |
### 1. Character Prediction
|
| 99 |
Predict characters from swipe paths with partial text context.
|
| 100 |
|
| 101 |
-
|
| 102 |
|
| 103 |
### 2. Length Prediction
|
| 104 |
Predict word length from swipe path alone.
|
| 105 |
|
| 106 |
-
|
| 107 |
|
| 108 |
-
|
| 109 |
|
| 110 |
### 3. Path Reconstruction
|
| 111 |
Reconstruct missing path coordinates.
|
| 112 |
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
**Use Case**: Noise reduction, gesture smoothing
|
| 116 |
|
| 117 |
### 4. Embedding Extraction
|
| 118 |
Extract fixed-size embeddings for similarity search.
|
| 119 |
|
| 120 |
**Dimension**: 768
|
| 121 |
|
| 122 |
-
|
| 123 |
|
| 124 |
## Usage Examples
|
| 125 |
|
|
@@ -158,19 +171,19 @@ predicted_length = outputs.length_logits.argmax(dim=-1).item()
|
|
| 158 |
|
| 159 |
## Performance Metrics
|
| 160 |
|
| 161 |
-
Evaluated on
|
| 162 |
|
| 163 |
| Task | Metric | Score |
|
| 164 |
|------|--------|-------|
|
| 165 |
-
| Masked Prediction (30%) | Character Accuracy |
|
| 166 |
-
| | Top-3 Accuracy |
|
| 167 |
-
| | Word Accuracy |
|
| 168 |
-
| Full Reconstruction (100%) | Character Accuracy |
|
| 169 |
-
| | Word Accuracy |
|
| 170 |
-
| Length Prediction | Exact Accuracy |
|
| 171 |
-
| | Within ±1 |
|
| 172 |
-
| | Within ±2 | 99% |
|
| 173 |
-
| Path Reconstruction | MSE (masked) | 0.
|
| 174 |
|
| 175 |
## Model Outputs
|
| 176 |
|
|
|
|
| 98 |
### 1. Character Prediction
|
| 99 |
Predict characters from swipe paths with partial text context.
|
| 100 |
|
| 101 |
+
Trained via masked language modeling with a pairwise masking strategy that creates two augmented views of each input for contrastive learning. Training uses focal loss to emphasize hard-to-predict characters and frequency-based weighting to handle character imbalance (rare letters such as 'z' versus common letters such as 'e').
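As a rough illustration of how focal loss and frequency weighting interact, here is a minimal sketch. The focusing parameter `gamma=2.0`, the inverse-frequency weighting scheme, and both function names are assumptions made for this example; the README does not state the values or formula actually used.

```python
import math


def inverse_frequency_weights(counts):
    """Weights proportional to 1/count, rescaled so the mean weight is 1.

    Assumed scheme: the README only says weighting is frequency-based.
    """
    inverse = {char: 1.0 / n for char, n in counts.items()}
    mean = sum(inverse.values()) / len(inverse)
    return {char: value / mean for char, value in inverse.items()}


def focal_loss(probs, target, class_weights, gamma=2.0):
    """Focal loss for one prediction: -w_t * (1 - p_t)**gamma * log(p_t).

    `probs` maps each character to its predicted probability.
    gamma=2.0 is an assumed default, not a value from the README.
    """
    p_t = max(probs[target], 1e-12)  # guard against log(0)
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Rare characters receive larger weights than common ones.
weights = inverse_frequency_weights({"e": 1000, "z": 10})
# Confident correct predictions contribute far less than uncertain ones.
easy = focal_loss({"e": 0.95, "z": 0.05}, "e", weights)
hard = focal_loss({"e": 0.40, "z": 0.60}, "e", weights)
```

With `gamma=0` and unit weights the expression reduces to ordinary cross-entropy, which is a convenient sanity check.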
|
| 102 |
+
|
| 103 |
+
**Pairwise Masking Strategy:**
|
| 104 |
+
- **Inverted Mode (80%)**: Asymmetric augmentation pairs
|
| 105 |
+
  - Query view: heavy masking (50-70% of path points and characters randomly masked); gradients flow through this view
|
| 106 |
+
  - Key view: light masking (10-20% of path points and characters randomly masked); gradients are stopped for this view
|
| 107 |
+
  - Teaches robust representations invariant to noise and occlusion
|
| 108 |
+
|
| 109 |
+
- **Modality Mode (20%)**: Cross-modal alignment pairs
|
| 110 |
+
  - Query view: text fully masked, path visible (teaches path → semantic representation); gradients flow through this view
|
| 111 |
+
  - Key view: path fully masked, text visible (provides the alignment target); gradients are stopped for this view
|
| 112 |
+
  - Teaches correspondence between path geometry and text meaning
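The two modes above can be sketched as a small sampler. This is only an illustration under stated assumptions: the function names, the uniform draw of the masking ratio within each range, and the dictionary layout of a view are hypothetical, and the stop-gradient on the key view is a training-time detail not modeled here.

```python
import random


def sample_mask(n, low, high, rng):
    """Return n booleans (True = masked), masking a ratio drawn from [low, high]."""
    ratio = rng.uniform(low, high)  # assumed: ratio is uniform within the range
    masked = set(rng.sample(range(n), round(ratio * n)))
    return [i in masked for i in range(n)]


def make_pair(n_path, n_chars, rng):
    """Build (mode, query_view, key_view); each view holds a path mask and a char mask."""
    if rng.random() < 0.8:
        # Inverted mode: heavy masking for the query, light masking for the key.
        query = {"path": sample_mask(n_path, 0.5, 0.7, rng),
                 "chars": sample_mask(n_chars, 0.5, 0.7, rng)}
        key = {"path": sample_mask(n_path, 0.1, 0.2, rng),
               "chars": sample_mask(n_chars, 0.1, 0.2, rng)}
        return "inverted", query, key
    # Modality mode: the query sees only the path, the key sees only the text.
    query = {"path": [False] * n_path, "chars": [True] * n_chars}
    key = {"path": [True] * n_path, "chars": [False] * n_chars}
    return "modality", query, key


mode, query_view, key_view = make_pair(n_path=100, n_chars=20, rng=random.Random(0))
```

In training, the key view would serve as a fixed target (no gradient), while the query view is the one being optimized.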
|
| 113 |
|
| 114 |
### 2. Length Prediction
|
| 115 |
Predict word length from swipe path alone.
|
| 116 |
|
| 117 |
+
Trained as an auxiliary task where the CLS token aggregates path information to predict word length (0-48 characters). This helps the model learn geometric properties of swipe gestures that correlate with word length, such as path extent and complexity.
|
| 118 |
|
| 119 |
+
Length supervision occurs only during modality mode when text attention is fully zeroed (10% of training batches: 20% modality mode × 50% zero-attention probability). This trains the model to predict length from path geometry alone, without any text-length cues. The length loss receives 10% of the total loss weight, enough to encourage learning without dominating the primary objectives.
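The 10% figure follows from multiplying the two probabilities. The short simulation below checks that arithmetic; the constant and function names are hypothetical, and it assumes the two random decisions are drawn independently per batch.

```python
import random

MODALITY_PROB = 0.2             # share of batches in modality mode (from the text)
ZERO_TEXT_ATTENTION_PROB = 0.5  # chance text attention is fully zeroed (from the text)


def supervises_length(rng):
    """True when a batch is in modality mode and its text attention is zeroed."""
    in_modality_mode = rng.random() < MODALITY_PROB
    return in_modality_mode and rng.random() < ZERO_TEXT_ATTENTION_PROB


# Analytical fraction of batches that carry length supervision.
expected = MODALITY_PROB * ZERO_TEXT_ATTENTION_PROB

# Empirical check by simulation (assumes independent draws).
rng = random.Random(0)
trials = 100_000
observed = sum(supervises_length(rng) for _ in range(trials)) / trials
```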
|
| 120 |
|
| 121 |
### 3. Path Reconstruction
|
| 122 |
Reconstruct missing path coordinates.
|
| 123 |
|
| 124 |
+
Trained via masked path prediction as part of the pairwise masking strategy. During inverted mode (80% of batches), 50-70% of path points are randomly masked in the heavily augmented view and 10-20% in the lightly augmented view. During modality mode (20% of batches), either all path points are masked (key view) or none are (query view). The model learns to reconstruct spatiotemporal structure from partial path information and text context, which teaches it the geometric and temporal patterns of swipe gestures. The path loss is weighted at 50% of the character prediction loss, making it a significant secondary objective.
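A masked reconstruction objective of this kind is typically scored only on the hidden points. The sketch below shows one plausible form, a mean squared error restricted to masked positions; the function name and the per-point feature layout (here two values per point) are assumptions, since the README does not specify the exact loss.

```python
def masked_path_mse(predicted, target, mask):
    """Mean squared error over the coordinates of masked points only.

    `predicted` and `target` are sequences of equal-length coordinate tuples;
    `mask[i]` is True when point i was hidden from the model.
    """
    total, count = 0.0, 0
    for pred_point, true_point, is_masked in zip(predicted, target, mask):
        if is_masked:
            total += sum((p - t) ** 2 for p, t in zip(pred_point, true_point))
            count += len(true_point)
    # With nothing masked there is nothing to reconstruct.
    return total / count if count else 0.0


predicted = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
target = [(0.0, 0.0), (0.0, 0.0), (0.5, 0.0)]
mask = [False, True, True]
loss = masked_path_mse(predicted, target, mask)
```

Unmasked points are excluded so the model is not rewarded for copying inputs it could already see.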
|
| 125 |
|
| 126 |
### 4. Embedding Extraction
|
| 127 |
Extract fixed-size embeddings for similarity search.
|
| 128 |
|
| 129 |
**Dimension**: 768
|
| 130 |
|
| 131 |
+
Trained via contrastive learning where the SEP token produces fixed-size embeddings for path-text pairs. The pairwise masking strategy is central to embedding training:
|
| 132 |
+
- **Inverted mode (80%)**: Pulls embeddings of heavily masked and lightly masked versions of the same input close together, teaching invariance to noise and occlusion
|
| 133 |
+
- **Modality mode (20%)**: Pulls embeddings of path-only and text-only views of the same word close together, teaching cross-modal alignment between gesture geometry and semantic meaning
|
| 134 |
+
|
| 135 |
+
The contrastive loss (15% weight, temperature 0.07) pulls matching pairs together in embedding space while pushing non-matches apart. Training uses Matryoshka embeddings to create nested representations at multiple dimensions (64, 128, 384, 768), with larger loss weights on the lower-dimensional representations (2.0×, 1.5×, 1.0×, 1.0×, respectively) so that the first 64 dimensions are highly informative on their own.
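To make the nesting concrete, here is a minimal pure-Python sketch of a Matryoshka-style contrastive loss using the dimensions, weights, and temperature quoted above. The InfoNCE form, cosine similarity, one-directional query-to-key matching, and normalization by the sum of weights are assumptions for illustration; the actual training code may differ.

```python
import math

DIMS = (64, 128, 384, 768)          # nested embedding sizes (from the text)
DIM_WEIGHTS = (2.0, 1.5, 1.0, 1.0)  # per-size loss weights (from the text)
TEMPERATURE = 0.07                  # contrastive temperature (from the text)


def _normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def info_nce(queries, keys, temperature=TEMPERATURE):
    """Mean cross-entropy where queries[i] should match keys[i] among all keys."""
    q = [_normalize(v) for v in queries]
    k = [_normalize(v) for v in keys]
    total = 0.0
    for i, q_i in enumerate(q):
        logits = [sum(a * b for a, b in zip(q_i, k_j)) / temperature for k_j in k]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_z - logits[i]
    return total / len(q)


def matryoshka_loss(queries, keys):
    """Weighted average of InfoNCE losses over nested prefixes of each embedding."""
    weighted = 0.0
    for dim, weight in zip(DIMS, DIM_WEIGHTS):
        weighted += weight * info_nce([v[:dim] for v in queries],
                                      [v[:dim] for v in keys])
    return weighted / sum(DIM_WEIGHTS)  # assumed normalization
```

Because every prefix is scored separately, truncating an embedding to its first 64 values at inference time should still give a usable, if coarser, representation.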
|
| 136 |
|
| 137 |
## Usage Examples
|
| 138 |
|
|
|
|
| 171 |
|
| 172 |
## Performance Metrics
|
| 173 |
|
| 174 |
+
Evaluated on 49,970 test samples:
|
| 175 |
|
| 176 |
| Task | Metric | Score |
|
| 177 |
|------|--------|-------|
|
| 178 |
+
| Masked Prediction (30%) | Character Accuracy | 96.1% |
|
| 179 |
+
| | Top-3 Accuracy | 97.6% |
|
| 180 |
+
| | Word Accuracy | 94.3% |
|
| 181 |
+
| Full Reconstruction (100%) | Character Accuracy | 93.1% |
|
| 182 |
+
| | Word Accuracy | 76.7% |
|
| 183 |
+
| Length Prediction | Exact Accuracy | 93.2% |
|
| 184 |
+
| | Within ±1 | 98.9% |
|
| 185 |
+
| | Within ±2 | 99.8% |
|
| 186 |
+
| Path Reconstruction | MSE (masked) | 0.000697 |
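For reference, tolerance-based metrics such as "Within ±1" can be computed as below. This is a generic sketch with a hypothetical function name and made-up sample values; it is not the evaluation script behind the table.

```python
def length_accuracy(predicted, actual, tolerance=0):
    """Fraction of predictions within `tolerance` characters of the true length."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(predicted)


# Illustrative values only, not taken from the test set.
predicted_lengths = [5, 4, 7, 3]
true_lengths = [5, 5, 5, 3]
exact = length_accuracy(predicted_lengths, true_lengths)
within_1 = length_accuracy(predicted_lengths, true_lengths, tolerance=1)
within_2 = length_accuracy(predicted_lengths, true_lengths, tolerance=2)
```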
|
| 187 |
|
| 188 |
## Model Outputs
|
| 189 |
|