bhavnicksm committed on
Commit a0e52a3 · verified · 1 Parent(s): 4636299

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +61 -0
README.md ADDED
@@ -0,0 +1,61 @@
+ ---
+ tags:
+ - tokie
+ - model2vec
+ library_name: tokie
+ ---
+
+ <p align="center">
+ <img src="tokie-banner.png" alt="tokie" width="600">
+ </p>
+
+ # potion-base-4M
+
+ Pre-built [tokie](https://github.com/chonkie-inc/tokie) tokenizer for [potion-base-4M](https://huggingface.co/minishlab/potion-base-4M).
+
+ ## Quick Start (Python)
+
+ ```bash
+ pip install tokie
+ ```
+
+ ```python
+ import tokie
+
+ tokenizer = tokie.Tokenizer.from_pretrained("tokiers/potion-base-4M")
+ encoding = tokenizer.encode("Hello, world!")
+ print(encoding.ids)
+ print(encoding.attention_mask)
+ ```
+
+ ## Quick Start (Rust)
+
+ ```toml
+ [dependencies]
+ tokie = { version = "0.0.7", features = ["hf"] }
+ ```
+
+ ```rust
+ use tokie::Tokenizer;
+
+ let tokenizer = Tokenizer::from_pretrained("tokiers/potion-base-4M").unwrap();
+ let encoding = tokenizer.encode("Hello, world!", true);
+ println!("{:?}", encoding.ids);
+ ```
+
+ ## Files
+
+ - `tokenizer.tkz` — tokie binary format (~10x smaller, loads in ~5ms)
+ - `tokenizer.json` — original HuggingFace tokenizer
+ - `model.safetensors` — original model weights
+ - All other files from [potion-base-4M](https://huggingface.co/minishlab/potion-base-4M)
+
+ ## About tokie
+
+ **50x faster tokenization, 10x smaller model files, 100% accurate.**
+
+ tokie is a drop-in replacement for HuggingFace tokenizers, built in Rust. See [GitHub](https://github.com/chonkie-inc/tokie) for benchmarks and documentation.
+
+ ## License
+
+ MIT OR Apache-2.0 (tokie library). Original model files retain their original license from [potion-base-4M](https://huggingface.co/minishlab/potion-base-4M).
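
The `~5ms` load time quoted under Files can be checked locally. The following is a minimal Python sketch and is not part of the committed README. `time_call` and `measure_tokie_load` are hypothetical helper names, and the only tokie call used is the `Tokenizer.from_pretrained` call shown in the Quick Start. It assumes, without confirmation from the source, that the first call may include a download, so that call is made once and not timed. Whether later calls read from a local cache is also an assumption, so treat the printed figure as indicative only.

```python
import statistics
import time
from typing import Callable, List


def time_call(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call fn() `repeats` times and return each wall-clock duration in milliseconds."""
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def measure_tokie_load() -> None:
    """Print the median time of repeated tokenizer loads (hypothetical helper)."""
    try:
        import tokie  # installed with `pip install tokie`, as in the README
    except ImportError:
        print("tokie is not installed; skipping the measurement")
        return

    def load() -> object:
        # Same call as the README's Quick Start (Python).
        return tokie.Tokenizer.from_pretrained("tokiers/potion-base-4M")

    # Assumption: the first call may download files, so run it once untimed.
    load()
    timings = time_call(load)
    print(f"median load time: {statistics.median(timings):.2f} ms")


if __name__ == "__main__":
    measure_tokie_load()
```

Running the file prints a median over five loads, or a skip message if tokie is not installed.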