# Tokenizer

Our tokenizer was trained from scratch on 500,000 samples from the Openwebtext dataset.
For variation, we also included 500,000 samples from our [GitHub-CC0](https://huggingface.co/KoalaAI/GitHub-CC0) dataset, in the hope that code would be tokenized reasonably well despite our small vocab_size.
Like Mistral, we use LlamaTokenizerFast as our tokenizer class, in legacy mode.
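The from-scratch training described above can be sketched with the Hugging Face `tokenizers` library. This is a minimal illustration only: the corpus, vocabulary size, and special tokens below are stand-ins, not the actual Openwebtext/GitHub-CC0 setup or hyperparameters.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Illustrative in-memory corpus mixing prose and code, standing in for
# the real Openwebtext + GitHub-CC0 samples.
corpus = [
    "Our tokenizer was trained from scratch on text and code samples.",
    "def add(a, b):\n    return a + b",
] * 50

# Train a small BPE tokenizer from scratch; vocab_size here is tiny and
# purely illustrative.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=500, special_tokens=["<unk>", "<s>", "</s>"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Check that code-like input splits into reasonable subword tokens.
encoded = tokenizer.encode("def add(a, b): return a + b")
print(encoded.tokens)
```

A tokenizer trained this way can then be wrapped in `LlamaTokenizerFast` for use with `transformers`, as the section above notes.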

## Tokenization Analysis