Update README.md
README.md
CHANGED
@@ -1,7 +1,8 @@
----
-language:
-- en
-
+---
+language:
+- en
+license: mit
+---
 # char128-shift Tokenizer
 
 A fixed-size Hugging Face–compatible **character tokenizer** with a dedicated **SHIFT** token (`↨`) to represent uppercase letters. Instead of assigning separate tokens to uppercase `A–Z`, each uppercase is encoded as `↨` + lowercase (e.g., `H` → `↨h`).
@@ -135,7 +136,7 @@ Your model’s `vocab_size` must match (128).
 
 ## License
 
-MIT
+MIT
 
 ---
 
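The SHIFT encoding described in the README text above (`H` → `↨h`) can be illustrated with a minimal sketch. This is not the tokenizer's actual implementation: the function names `encode_shift` and `decode_shift` are hypothetical, and the sketch only covers the string-level transformation for ASCII `A–Z`, not the mapping to the 128 token IDs.

```python
SHIFT = "↨"  # SHIFT marker character named in the README


def encode_shift(text: str) -> str:
    """Rewrite each ASCII uppercase letter as SHIFT + its lowercase form."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(SHIFT + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def decode_shift(text: str) -> str:
    """Inverse of encode_shift: SHIFT uppercases the next ASCII letter."""
    out = []
    upper_next = False
    for ch in text:
        if ch == SHIFT:
            upper_next = True
            continue
        if upper_next and "a" <= ch <= "z":
            out.append(ch.upper())
        else:
            out.append(ch)
        upper_next = False
    return "".join(out)
```

Under these assumptions, `encode_shift("Hello")` yields `↨hello`, and decoding an encoded string returns the original text as long as the input itself contains no `↨` character.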