Commit 98f968d by bhavnicksm (verified)
1 parent: 17a8108

Add model card README

Files changed (1)
  1. README.md +58 -0
README.md ADDED
@@ -0,0 +1,58 @@
+ ---
+ tags:
+ - tokie
+ library_name: tokie
+ ---
+
+ <p align="center">
+ <img src="tokie-banner.png" alt="tokie" width="600">
+ </p>
+
+ # phi-2
+
+ Pre-built [tokie](https://github.com/chonkie-inc/tokie) tokenizer for [microsoft/phi-2](https://huggingface.co/microsoft/phi-2).
+
+ ## Quick Start (Python)
+
+ ```bash
+ pip install tokie
+ ```
+
+ ```python
+ import tokie
+
+ tokenizer = tokie.Tokenizer.from_pretrained("tokiers/phi-2")
+ encoding = tokenizer.encode("Hello, world!")
+ print(encoding.ids)
+ print(encoding.attention_mask)
+ ```
+
+ ## Quick Start (Rust)
+
+ ```toml
+ [dependencies]
+ tokie = { version = "0.0.4", features = ["hf"] }
+ ```
+
+ ```rust
+ use tokie::Tokenizer;
+
+ let tokenizer = Tokenizer::from_pretrained("tokiers/phi-2").unwrap();
+ let encoding = tokenizer.encode("Hello, world!", true);
+ println!("{:?}", encoding.ids);
+ ```
+
+ ## Files
+
+ - `tokenizer.tkz`: tokie binary format (~10x smaller, loads in ~5ms)
+ - `tokenizer.json`: original HuggingFace tokenizer
+
+ ## About tokie
+
+ **50x faster tokenization, 10x smaller model files, 100% accurate.**
+
+ tokie is a drop-in replacement for HuggingFace tokenizers, built in Rust. See [GitHub](https://github.com/chonkie-inc/tokie) for benchmarks and documentation.
+
+ ## License
+
+ MIT OR Apache-2.0 (tokie library). Original model files retain their original license from [microsoft/phi-2](https://huggingface.co/microsoft/phi-2).
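
The "~10x smaller" figure quoted for `tokenizer.tkz` in the Files section can be checked locally once both files are downloaded. The following is a minimal sketch using only the Python standard library. It does not use tokie itself, the helper name `size_ratio` is hypothetical, and the local file paths are assumptions.

```python
from pathlib import Path


def size_ratio(original: Path, compact: Path) -> float:
    """Return how many times larger `original` is than `compact` on disk."""
    compact_size = compact.stat().st_size
    if compact_size == 0:
        raise ValueError(f"{compact} is empty")
    return original.stat().st_size / compact_size


if __name__ == "__main__":
    # Assumed local paths: adjust to wherever the two files were downloaded.
    json_path = Path("tokenizer.json")
    tkz_path = Path("tokenizer.tkz")
    if json_path.exists() and tkz_path.exists():
        ratio = size_ratio(json_path, tkz_path)
        print(f"tokenizer.json is {ratio:.1f}x the size of tokenizer.tkz")
    else:
        print("Download tokenizer.json and tokenizer.tkz first")
```

A ratio near 10 would be consistent with the README's claim. The exact value depends on the files actually published in the repository.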