dleemiller committed on
Commit 24bee9e · verified · 1 Parent(s): d22b384

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +30 -17
README.md CHANGED
@@ -98,28 +98,41 @@ print(f"Predicted word length: {predicted_length}")
98
  ### 1. Character Prediction
99
  Predict characters from swipe paths with partial text context.
100
 
101
- **Use Case**: Autocorrection, suggestion ranking
102
 
103
  ### 2. Length Prediction
104
  Predict word length from swipe path alone.
105
 
106
- **Accuracy**: 89% exact, 96% within ±1
107
 
108
- **Use Case**: Pre-filtering candidate words
109
 
110
  ### 3. Path Reconstruction
111
  Reconstruct missing path coordinates.
112
 
113
- **MSE**: 0.005 on masked points
114
-
115
- **Use Case**: Noise reduction, gesture smoothing
116
 
117
  ### 4. Embedding Extraction
118
  Extract fixed-size embeddings for similarity search.
119
 
120
  **Dimension**: 768
121
 
122
- **Use Case**: Similar gesture search, deduplication
123
 
124
  ## Usage Examples
125
 
@@ -158,19 +171,19 @@ predicted_length = outputs.length_logits.argmax(dim=-1).item()
158
 
159
  ## Performance Metrics
160
 
161
- Evaluated on 200 test samples:
162
 
163
  | Task | Metric | Score |
164
  |------|--------|-------|
165
- | Masked Prediction (30%) | Character Accuracy | 98.7% |
166
- | | Top-3 Accuracy | 100% |
167
- | | Word Accuracy | 97.3% |
168
- | Full Reconstruction (100%) | Character Accuracy | 94% |
169
- | | Word Accuracy | 83.7% |
170
- | Length Prediction | Exact Accuracy | 89% |
171
- | | Within ±1 | 96% |
172
- | | Within ±2 | 99% |
173
- | Path Reconstruction | MSE (masked) | 0.005 |
174
 
175
  ## Model Outputs
176
 
 
98
  ### 1. Character Prediction
99
  Predict characters from swipe paths with partial text context.
100
 
101
+ Trained via masked language modeling with a pairwise masking strategy that creates two augmented views of each input for contrastive learning. Training uses focal loss to emphasize hard-to-predict characters and frequency-based weighting to handle character imbalance (rare letters like 'z' vs. common letters like 'e').
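
To make the combination of focal loss and frequency-based weighting concrete, here is a minimal sketch in plain Python. The README does not state the focusing parameter, the weighting formula, or any smoothing, so `gamma=2.0`, the inverse-frequency rule, and the helper names `inverse_frequency_weights` and `focal_loss` are all assumptions made for illustration, not the model's training code.

```python
import math
from collections import Counter


def inverse_frequency_weights(corpus_chars, smoothing=1.0):
    """Hypothetical weighting: rarer characters get larger weights.

    Weights are proportional to 1 / (count + smoothing) and rescaled so
    that their mean is 1.0. The exact rule is an assumption.
    """
    counts = Counter(corpus_chars)
    raw = {c: 1.0 / (n + smoothing) for c, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}


def focal_loss(prob_true, weight=1.0, gamma=2.0):
    """Focal loss for a single prediction: -w * (1 - p)^gamma * log(p).

    prob_true is the probability the model assigned to the correct
    character. gamma=2.0 is an assumed value; gamma=0 gives plain
    weighted cross-entropy.
    """
    return -weight * (1.0 - prob_true) ** gamma * math.log(prob_true)


# Usage: a rare 'z' predicted with low confidence dominates the loss,
# while a confident prediction of a common 'e' contributes very little.
weights = inverse_frequency_weights("eeeeeeeetttaaz")
hard_rare = focal_loss(0.2, weight=weights["z"])
easy_common = focal_loss(0.95, weight=weights["e"])
```

The `(1 - p)^gamma` factor is what shrinks the contribution of characters the model already predicts confidently.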
102
+
103
+ **Pairwise Masking Strategy:**
104
+ - **Inverted Mode (80%)**: Asymmetric augmentation pairs
105
+   - Query view: Heavy masking (50-70% of path points and characters randomly masked) with gradients
106
+   - Key view: Light masking (10-20% of path points and characters randomly masked) with stop gradient
107
+   - Teaches robust representations invariant to noise and occlusion
108
+
109
+ - **Modality Mode (20%)**: Cross-modal alignment pairs
110
+   - Query view: Text fully masked, path visible (teaches path → semantic representation) with gradients
111
+   - Key view: Path fully masked, text visible (provides alignment target) with stop gradient
112
+   - Teaches correspondence between path geometry and text meaning
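
The two masking modes described above can be sketched as a small sampler. This is an illustration under stated assumptions, not the training code: the function name `sample_pair_masks`, the choice to draw path and character mask rates independently, and the uniform sampling of the rate within each range are all hypothetical. Only the 80/20 mode split and the 50-70% / 10-20% ranges come from the README.

```python
import random


def sample_pair_masks(n_path, n_chars, rng):
    """Return (mode, query_masks, key_masks) for one training example.

    Each *_masks value is a (path_mask, char_mask) pair of boolean
    lists, where True means "this position is masked".
    """

    def random_mask(n, lo, hi):
        # Assumed: mask rate drawn uniformly from [lo, hi].
        rate = rng.uniform(lo, hi)
        masked = set(rng.sample(range(n), round(rate * n)))
        return [i in masked for i in range(n)]

    if rng.random() < 0.8:
        # Inverted mode: heavy masking for the query, light for the key.
        query = (random_mask(n_path, 0.5, 0.7), random_mask(n_chars, 0.5, 0.7))
        key = (random_mask(n_path, 0.1, 0.2), random_mask(n_chars, 0.1, 0.2))
        return "inverted", query, key

    # Modality mode: query sees only the path, key sees only the text.
    query = ([False] * n_path, [True] * n_chars)
    key = ([True] * n_path, [False] * n_chars)
    return "modality", query, key


# Usage: draw masks for a 100-point path and a 10-character word.
mode, (q_path, q_chars), (k_path, k_chars) = sample_pair_masks(
    100, 10, random.Random(0)
)
```

In a real training loop the key view would additionally be computed without gradient tracking, which this plain-Python sketch does not model.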
113
 
114
  ### 2. Length Prediction
115
  Predict word length from swipe path alone.
116
 
117
+ Trained as an auxiliary task where the CLS token aggregates path information to predict word length (0-48 characters). This helps the model learn geometric properties of swipe gestures that correlate with word length, such as path extent and complexity.
118
 
119
+ Length supervision occurs only during modality mode when text attention is fully zeroed (10% of training batches: 20% modality mode × 50% zero-attention probability). This trains the model to predict length from path geometry alone, without any text length cues. The length loss uses 10% of the total loss weight, which encourages learning without dominating the primary objectives.
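
The 10% figure follows from multiplying the two probabilities. The sketch below works through that arithmetic and checks it with a quick simulation; it assumes the two random choices are made independently per batch, which the README implies but does not state, and the names `length_loss_active` and `simulate_fraction` are hypothetical.

```python
import random

MODALITY_PROB = 0.20             # share of batches in modality mode
ZERO_TEXT_ATTENTION_PROB = 0.50  # share of those with text attention zeroed

# Expected share of batches that receive length supervision: 0.20 * 0.50.
expected_fraction = MODALITY_PROB * ZERO_TEXT_ATTENTION_PROB


def length_loss_active(rng):
    """Decide whether the length loss applies to one batch (assumed gating)."""
    in_modality_mode = rng.random() < MODALITY_PROB
    text_attention_zeroed = rng.random() < ZERO_TEXT_ATTENTION_PROB
    return in_modality_mode and text_attention_zeroed


def simulate_fraction(n_batches, seed=0):
    """Empirical share of batches with length supervision."""
    rng = random.Random(seed)
    return sum(length_loss_active(rng) for _ in range(n_batches)) / n_batches
```

Calling `simulate_fraction(20000)` should land close to `expected_fraction`, with small sampling noise.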
120
 
121
  ### 3. Path Reconstruction
122
  Reconstruct missing path coordinates.
123
 
124
+ Trained via masked path prediction as part of the pairwise masking strategy. During inverted mode (80% of batches), path points are randomly masked at 50-70% for heavy augmentation and 10-20% for light augmentation. During modality mode (20% of batches), either all path points are masked (key view) or none are masked (query view). The model learns to reconstruct spatiotemporal structure from partial path information and text context, which teaches it the geometric and temporal patterns of swipe gestures. The path reconstruction loss uses 50% of the character prediction loss weight, making it a significant secondary objective.
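
As a rough illustration of the "MSE (masked)" objective, the sketch below averages squared coordinate errors over masked path points only. The point layout (tuples such as `(x, y)` or `(x, y, t)`) and the function name `masked_mse` are assumptions; the README does not specify the coordinate format or normalization.

```python
def masked_mse(predicted, target, mask):
    """Mean squared error over the coordinates of masked points only.

    predicted, target: equal-length lists of coordinate tuples.
    mask: list of booleans, True where the point was masked during
    training and therefore has to be reconstructed.
    Returns 0.0 when no point is masked.
    """
    squared_errors = [
        (p - t) ** 2
        for pred_point, target_point, is_masked in zip(predicted, target, mask)
        if is_masked
        for p, t in zip(pred_point, target_point)
    ]
    if not squared_errors:
        return 0.0
    return sum(squared_errors) / len(squared_errors)


# Usage: only the second and third points contribute to the error.
loss = masked_mse(
    predicted=[(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)],
    target=[(0.0, 0.0), (0.0, 0.0), (0.5, 0.0)],
    mask=[False, True, True],
)
```

Restricting the average to masked points keeps visible points, which the model can simply copy, from diluting the error.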
125
 
126
  ### 4. Embedding Extraction
127
  Extract fixed-size embeddings for similarity search.
128
 
129
  **Dimension**: 768
130
 
131
+ Trained via contrastive learning where the SEP token produces fixed-size embeddings for path-text pairs. The pairwise masking strategy is central to embedding training:
132
+ - **Inverted mode (80%)**: Pulls embeddings of heavily-masked and lightly-masked versions of the same input close together, teaching invariance to noise and occlusion
133
+ - **Modality mode (20%)**: Pulls embeddings of path-only and text-only views of the same word close together, teaching cross-modal alignment between gesture geometry and semantic meaning
134
+
135
+ The contrastive loss (15% weight, temperature 0.07) pulls matching pairs together in embedding space while pushing non-matches apart. The model uses Matryoshka embeddings to create nested representations at multiple dimensions (64, 128, 384, 768), with stronger weight on lower-dimensional representations (2.0×, 1.5×, 1.0×, 1.0×) to ensure the first 64 dimensions are highly informative on their own.
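
One way to read the Matryoshka description is as a contrastive loss evaluated on nested prefixes of each embedding and combined with the stated weights. The sketch below assumes an InfoNCE-style loss with in-batch negatives and cosine similarity, and normalizes by the sum of the weights; none of those details are confirmed by the README. Only the temperature, the dimensions, and the per-dimension weights are taken from it, and the function names are hypothetical.

```python
import math

DIMS = (64, 128, 384, 768)
DIM_WEIGHTS = (2.0, 1.5, 1.0, 1.0)
TEMPERATURE = 0.07


def _normalize(vector):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def info_nce(queries, keys, temperature=TEMPERATURE):
    """Mean cross-entropy where queries[i] should match keys[i].

    All other keys in the batch act as negatives (assumed setup).
    """
    q = [_normalize(v) for v in queries]
    k = [_normalize(v) for v in keys]
    total = 0.0
    for i, q_i in enumerate(q):
        logits = [sum(a * b for a, b in zip(q_i, k_j)) / temperature for k_j in k]
        top = max(logits)  # subtract the max for numerical stability
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / len(q)


def matryoshka_contrastive_loss(queries, keys, dims=DIMS, weights=DIM_WEIGHTS):
    """Weighted average of InfoNCE over nested embedding prefixes."""
    weighted = sum(
        w * info_nce([v[:d] for v in queries], [v[:d] for v in keys])
        for d, w in zip(dims, weights)
    )
    return weighted / sum(weights)
```

Because the 64-dimensional prefix carries the largest weight, errors that only show up in the truncated embedding are penalized most, which matches the stated goal of making the first 64 dimensions informative on their own.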
136
 
137
  ## Usage Examples
138
 
 
171
 
172
  ## Performance Metrics
173
 
174
+ Evaluated on 49,970 test samples:
175
 
176
  | Task | Metric | Score |
177
  |------|--------|-------|
178
+ | Masked Prediction (30%) | Character Accuracy | 96.1% |
179
+ | | Top-3 Accuracy | 97.6% |
180
+ | | Word Accuracy | 94.3% |
181
+ | Full Reconstruction (100%) | Character Accuracy | 93.1% |
182
+ | | Word Accuracy | 76.7% |
183
+ | Length Prediction | Exact Accuracy | 93.2% |
184
+ | | Within ±1 | 98.9% |
185
+ | | Within ±2 | 99.8% |
186
+ | Path Reconstruction | MSE (masked) | 0.000697 |
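
For readers unfamiliar with the length metrics in the table, "Exact Accuracy" and "Within ±k" can be computed as below. This is a generic sketch with made-up example values, not the evaluation script used for the reported numbers, and `within_k_accuracy` is a hypothetical helper name.

```python
def within_k_accuracy(predicted, actual, k=0):
    """Share of predictions whose absolute error is at most k.

    k=0 gives exact accuracy; k=1 and k=2 give the "Within ±1" and
    "Within ±2" style metrics.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equal length")
    hits = sum(abs(p - a) <= k for p, a in zip(predicted, actual))
    return hits / len(predicted)


# Usage with illustrative word lengths (not model outputs).
predicted_lengths = [3, 5, 7, 4]
actual_lengths = [3, 6, 9, 4]
exact = within_k_accuracy(predicted_lengths, actual_lengths, k=0)
within_one = within_k_accuracy(predicted_lengths, actual_lengths, k=1)
within_two = within_k_accuracy(predicted_lengths, actual_lengths, k=2)
```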
187
 
188
  ## Model Outputs
189