Update README.md

README.md (changed)
````diff
@@ -58,22 +58,22 @@ Import the tokenizer and model:
 
 ```python
 tokenizer = tokun.huggingface.ByteTokenizer()
-model = hh.from_pretrained_keras('tokun/variants/
+model = hh.from_pretrained_keras('tokun/variants/16x4/')
 ```
 
 ### With Base Tensorflow / Keras
 
 You can directly load the weights [from the repository](../models/).
 
-For the most performant variant of the model, `
+For the most performant variant of the model, `16x4`:
 
 ```python
 import tensorflow as tf
 import tokun.model
 import urllib.request
 
-urllib.request.urlretrieve('https://github.com/apehex/tokun/raw/main/models/
-model = tf.keras.models.load_model('model.keras')
+urllib.request.urlretrieve('https://github.com/apehex/tokun/raw/main/models/16x4/1/7.7.keras', 'model.keras')
+model = tf.keras.models.load_model('model.keras', compile=False)
 ```
 
 ## Usage
@@ -121,7 +121,7 @@ print(__p.shape) # back to x shape
 ### With Base Tensorflow / Keras
 
 ```python
-__x = tokun.pipeline.preprocess(text=__s, groups=[
+__x = tokun.pipeline.preprocess(text=__s, groups=[16, 4], expand=[1], flatten=True)
 __e = model._encoder(__x) # final embedding = input for another model
 # these embeddings would be the input of a LLM
 __o = llm(__e) # replace with your LLM
@@ -178,10 +178,6 @@ Notes on each iteration:
 - `tokun-4`: [Github][article-file-tokun-4]
 - `tokun-16`: [Github][article-file-tokun-16]
 
-## TODO
-
-See [TODO](TODO.md).
-
 ## Credits
 
 This project was inspired by a video from Andrej Karpathy, ["Let's build the GPT tokenizer"][youtube-karpathy-tokenizer].
````
The updated sections read:

Import the tokenizer and model:

```python
tokenizer = tokun.huggingface.ByteTokenizer()
model = hh.from_pretrained_keras('tokun/variants/16x4/')
```
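tokun tokenizes raw bytes rather than entries from a learned subword vocabulary. As a rough illustration of the idea only (a minimal sketch assuming a fixed-width UTF-32-BE byte encoding; `byte_tokenize` and `byte_detokenize` are hypothetical helpers, not the `ByteTokenizer` API):

```python
def byte_tokenize(text: str) -> list[int]:
    # hypothetical stand-in: map each character to its 4 UTF-32-BE bytes,
    # so every codepoint becomes a fixed-size run of byte "tokens"
    return list(text.encode('utf-32-be'))

def byte_detokenize(ids: list[int]) -> str:
    # inverse: reassemble the bytes and decode them back to text
    return bytes(ids).decode('utf-32-be')

ids = byte_tokenize('A✓')
# 'A' (U+0041) -> [0, 0, 0, 65]; '✓' (U+2713) -> [0, 0, 39, 19]
```

Because every codepoint maps to the same number of bytes, there is no out-of-vocabulary case and the mapping is exactly invertible.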
### With Base Tensorflow / Keras

You can directly load the weights [from the repository](../models/).

For the most performant variant of the model, `16x4`:

```python
import tensorflow as tf
import tokun.model
import urllib.request

urllib.request.urlretrieve('https://github.com/apehex/tokun/raw/main/models/16x4/1/7.7.keras', 'model.keras')
model = tf.keras.models.load_model('model.keras', compile=False)
```
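The snippet above re-downloads the weights on every run; a small standard-library guard can cache them locally instead (`fetch_once` is an illustrative helper, not part of tokun):

```python
import os
import urllib.request

def fetch_once(url: str, path: str) -> str:
    # download the file only when it is not already on disk,
    # so repeated runs skip the network round trip
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path

# model = tf.keras.models.load_model(
#     fetch_once('https://github.com/apehex/tokun/raw/main/models/16x4/1/7.7.keras', 'model.keras'),
#     compile=False)
```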
## Usage

### With Base Tensorflow / Keras

```python
__x = tokun.pipeline.preprocess(text=__s, groups=[16, 4], expand=[1], flatten=True)
__e = model._encoder(__x) # final embedding = input for another model
# these embeddings would be the input of a LLM
__o = llm(__e) # replace with your LLM
```
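The `groups=[16, 4]` argument chunks the byte sequence hierarchically, which requires the sequence length to divide the product of the group sizes. A minimal sketch of the padding this implies (`pad_to_unit` is an illustration assumption, not `tokun.pipeline`'s actual code):

```python
import math

def pad_to_unit(ids: list[int], groups: list[int]) -> list[int]:
    # pad with zero bytes so the length is a multiple of the product of the
    # group sizes (16 * 4 = 64 for the `16x4` variant), so the flat sequence
    # can be reshaped into nested groups
    unit = math.prod(groups)
    return ids + [0] * ((-len(ids)) % unit)

padded = pad_to_unit(list(range(70)), [16, 4])
# 70 bytes round up to 128, i.e. two full 64-byte units
```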
Notes on each iteration:

- `tokun-4`: [Github][article-file-tokun-4]
- `tokun-16`: [Github][article-file-tokun-16]

## Credits

This project was inspired by a video from Andrej Karpathy, ["Let's build the GPT tokenizer"][youtube-karpathy-tokenizer].