pika
You are looking at pika 2, which incorporates the following changes:
- Usage of an English dataset
- Smaller size (8K vocabulary)
- No unknown token
pika is a simple and public domain-like tokenizer.
Special Tokens
- End-of-Sequence token: [EOS]
- Padding token: [PAD]
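As a rough illustration of how these two tokens are typically used when batching, here is a minimal sketch in plain Python. The IDs `EOS_ID` and `PAD_ID` and the helper `finalize_batch` are hypothetical and not part of pika; look up the real IDs in the tokenizer's vocabulary before using anything like this.

```python
from typing import List, Tuple

# Hypothetical IDs for illustration only; the real values come from pika's vocabulary.
EOS_ID = 0
PAD_ID = 1


def finalize_batch(
    batch: List[List[int]], eos_id: int = EOS_ID, pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Append EOS to every sequence, then right-pad to the longest length."""
    if not batch:
        return [], []
    # Mark the end of each sequence with the EOS ID.
    with_eos = [seq + [eos_id] for seq in batch]
    max_len = max(len(seq) for seq in with_eos)
    # Pad on the right so all sequences share one length.
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in with_eos]
    # 1 marks real tokens (including EOS), 0 marks padding.
    attention_mask = [
        [1] * len(seq) + [0] * (max_len - len(seq)) for seq in with_eos
    ]
    return input_ids, attention_mask
```

The attention mask lets downstream code ignore padded positions. Right-padding is an assumption here; some setups pad on the left instead.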
Training
pika was trained on the first 6K rows of a Cosmopedia sample.
Limitations
Due to its small training corpus, pika may split words into smaller pieces. Also, some uncommon special tokens aren't present; you'll have to add them manually if needed.
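Adding a missing special token manually amounts to giving it a fresh ID that no existing token uses. The sketch below shows that idea on a plain dictionary; `add_special_tokens` here is a hypothetical helper, not pika's or any library's API, and the toy vocabulary is made up for the example.

```python
from typing import Dict, Iterable, List


def add_special_tokens(vocab: Dict[str, int], tokens: Iterable[str]) -> List[str]:
    """Assign the next free IDs to tokens missing from vocab (mutates vocab)."""
    # Start right after the highest ID currently in use.
    next_id = max(vocab.values(), default=-1) + 1
    added = []
    for token in tokens:
        if token not in vocab:  # leave existing tokens untouched
            vocab[token] = next_id
            next_id += 1
            added.append(token)
    return added


# Toy vocabulary, assumed for illustration only.
vocab = {"[EOS]": 0, "[PAD]": 1, "a": 2}
added = add_special_tokens(vocab, ["[PAD]", "[SEP]", "[CLS]"])
```

If the tokenizer is paired with a model, the model's embedding table usually has to grow to cover the new IDs as well.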