Update README.md
README.md CHANGED
@@ -38,7 +38,7 @@ This tokenizer is a byte-level BPE tokenizer (GPT-2 style) retrained on the Code

 ## Tokenizer details / configuration
 - **Tokenizer type:** Byte-level BPE (GPT-2–style / `tokenizers` fast API).
-- **Vocabulary size:** 50,257 (GPT-2 default)
+- **Vocabulary size:** 50,257 (GPT-2 default)
 - **Special tokens:** standard GPT-2 tokens (e.g., `<|endoftext|>`) or custom tokens if you added any. Ensure `tokenizer_config.json` in the repo lists them.
 - **Normalization:** Byte-level normalization (works with arbitrary byte sequences / UTF-8).
 - **Files included:** `tokenizer.json` (preferred `tokenizers` fast format) or `vocab.json` + `merges.txt` (legacy), and `tokenizer_config.json`.
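
The vocabulary-size and normalization bullets above can be sanity-checked with a few lines of standard-library Python. This is a minimal sketch, not code from the repository: it assumes the usual GPT-2 breakdown of 256 byte tokens, 50,000 learned merges, and one special token (`<|endoftext|>`), and the helper names `expected_vocab_size` and `to_byte_symbols` are hypothetical. It does not download or inspect the actual tokenizer files.

```python
# Sketch only: illustrates the arithmetic behind the 50,257 figure and why
# byte-level BPE never needs an "unknown" token. The constants below are
# the GPT-2 defaults, assumed (not verified) to apply to this repository.

BYTE_ALPHABET = 256      # one base token per possible byte value
GPT2_MERGES = 50_000     # learned BPE merge rules in the GPT-2 default setup
SPECIAL_TOKENS = 1       # <|endoftext|>


def expected_vocab_size() -> int:
    """Return the vocabulary size implied by the GPT-2 default breakdown."""
    return BYTE_ALPHABET + GPT2_MERGES + SPECIAL_TOKENS


def to_byte_symbols(text: str) -> list:
    """Map text to the UTF-8 byte values a byte-level tokenizer starts from.

    Every value is in range(256), so any input string is representable
    using only the base alphabet, before any merges are applied.
    """
    return list(text.encode("utf-8"))


if __name__ == "__main__":
    print(expected_vocab_size())  # 50257

    sample = "def añadir(x): return x + 1"
    symbols = to_byte_symbols(sample)
    # Non-ASCII characters expand to several bytes, but none fall outside 0..255.
    print(len(sample), len(symbols), max(symbols) < 256)
```

To work with the published tokenizer itself, `AutoTokenizer.from_pretrained("J-Raposo/code-search-net-tokenizer")` from `transformers` is the usual entry point; comparing `len(tokenizer)` against the figure above would be one way to confirm the documented vocabulary size.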