@@ -3,7 +3,6 @@ license: other
 license_name: futo-model-weights-license-1.0
 license_link: LICENSE.md
 library_name: executorch
-pipeline_tag: token-classification
 tags:
   - swipe-typing
   - gesture-typing
@@ -23,13 +22,13 @@ Mobile-oriented models for decoding swipe gestures into text.
   <img src="https://huggingface.co/futo-org/futo-swipe/resolve/main/animations/swipe_demo_computer.gif" width="640" alt="Swipe decode of the word 'computer'">
 </p>
-See the paper, [*coming soon*](https://huggingface.co/futo-org/futo-swipe),
-for details.
 ## Models
-This repository contains 3 models that compose together.
-Only the encoder is required; the decoder and language model are
 additional refinements, leveraging specific layout and language information.
 The encoder can decode for **any** keyboard layout,
 while the decoder is English/QWERTY-only and the language model is English-only.
@@ -161,8 +160,8 @@ greedy decode: computer
 | **input** | `layout_keys` | `[1, 64, 2]` | Per-key `(x, y)` centers, padded to 64 keys |
 | **input** | `layout_mask` | `[1, 64]` | Boolean mask of valid keys |
 | **output** | `log_emissions` | `[1, 32, 65]` | Log-probabilities over 64 keys + blank |
-| **output** | `coefficients` | `[1, 32, 64]` | Spectral coefficients |
-| **output** | `lambda` | `[1, 32, 1]` | *Intention* gate |
 The output time dimension is 32, half the 64 input points. The encoder
 applies a 2× temporal downsample (a stride-2 adapter) inside the network, so

 license_name: futo-model-weights-license-1.0
 license_link: LICENSE.md
 library_name: executorch
 tags:
   - swipe-typing
   - gesture-typing
   <img src="https://huggingface.co/futo-org/futo-swipe/resolve/main/animations/swipe_demo_computer.gif" width="640" alt="Swipe decode of the word 'computer'">
 </p>
+See the paper [(*coming soon*)](https://huggingface.co/futo-org/futo-swipe)
+for more details.
 ## Models
+This repository contains 3 CNN models that compose together.
+Only the encoder is required. The decoder and language model are
 additional refinements, leveraging specific layout and language information.
 The encoder can decode for **any** keyboard layout,
 while the decoder is English/QWERTY-only and the language model is English-only.
 | **input** | `layout_keys` | `[1, 64, 2]` | Per-key `(x, y)` centers, padded to 64 keys |
 | **input** | `layout_mask` | `[1, 64]` | Boolean mask of valid keys |
 | **output** | `log_emissions` | `[1, 32, 65]` | Log-probabilities over 64 keys + blank |
+| **output** | `coefficients` | `[1, 32, 64]` | Spectral coefficients (decoder features) |
+| **output** | `lambda` | `[1, 32, 1]` | *Intention* gate (decoder features) |
 The output time dimension is 32, half the 64 input points. The encoder
 applies a 2× temporal downsample (a stride-2 adapter) inside the network, so