Commit 1447346 (verified) by DarwinAnim8or · 1 parent: 2c7047e

update url

Files changed (1): README.md (+1 -1)
@@ -47,7 +47,7 @@ This table tracks the performance of our model on various tasks over time. The m
 
 # Tokenizer
 Our tokenizer was trained from scratch on 500,000 samples from the Openwebtext dataset.
-For variation, we also included 500,000 samples from our [GitHub-CC0](KoalaAI/GitHub-CC0) dataset, in the hopes that code would be tokenized properly despite our small vocab_size.
+For variation, we also included 500,000 samples from our [GitHub-CC0](https://huggingface.co/KoalaAI/GitHub-CC0) dataset, in the hopes that code would be tokenized properly despite our small vocab_size.
 Like Mistral, we use the LlamaTokenizerFast as our tokenizer class; in legacy mode.
 
 ## Tokenization Analysis