---
library_name: transformers
datasets:
- HuggingFaceFW/finepdfs
- HuggingFaceFW/fineweb-edu
- gair-prox/FineWeb-pro
license: mit
---
# MultivexAI/Plyx-15M

**MultivexAI/Plyx-15M** is a 15-million-parameter, 8-layer language model trained from scratch using the Llama architecture.

We built this model to be a small, useful foundation. It's a good starting point for quick experiments, research projects, or fine-tuning on specialized tasks where a small model footprint is important.
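
For a quick smoke test, the checkpoint should load with the standard transformers auto classes. Here is a minimal sketch; the prompt and sampling settings are illustrative, not tuned recommendations:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the tokenizer and weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained("MultivexAI/Plyx-15M")
model = AutoModelForCausalLM.from_pretrained("MultivexAI/Plyx-15M")

# Generate a short continuation of an arbitrary prompt.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,  # illustrative sampling settings, not tuned values
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```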

**Model Series Note:** This is the first model in our Plyx series. We're continuing this work and plan to release future models in various sizes. We'll be adding some initial performance benchmarks here soon.

## Pre-training Data

The model was trained on a carefully curated mix of approximately 600M tokens, drawn from three sources (a loading sketch follows the list):

1.  **`fineweb-pro`**: A heavily filtered and refined version of the FineWeb dataset. This provides a strong base in general-purpose language by removing significant noise and low-quality content.
2.  **`fineweb-edu`**: A subset of FineWeb containing educational and instructional content, used to ground the model in well-structured, factual information.
3.  **`finepdfs`**: A large collection of documents from PDFs, including professional reports and technical papers. This component introduces the model to more formal language, complex sentence structures, and data-rich formats.
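
All three corpora are public on the Hugging Face Hub. The sketch below streams a comparable mix with the `datasets` library; the `finepdfs` config name, the `text` column, and the interleaving probabilities are assumptions for illustration, not the actual training recipe:

```python
from datasets import interleave_datasets, load_dataset

# Stream each corpus instead of downloading it in full.
fineweb_pro = load_dataset("gair-prox/FineWeb-pro", split="train", streaming=True)
fineweb_edu = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)
# The config name is an assumption; check the dataset card for available subsets.
finepdfs = load_dataset("HuggingFaceFW/finepdfs", "eng_Latn", split="train", streaming=True)

# Interleave into a single stream. These probabilities are placeholders;
# the actual Plyx-15M mixing ratios are not documented here.
mix = interleave_datasets(
    [fineweb_pro, fineweb_edu, finepdfs],
    probabilities=[0.5, 0.3, 0.2],
    seed=42,
)

for example in mix.take(3):
    print(example["text"][:200])  # assumes a "text" column, as in FineWeb
```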

### A Note on Size and Performance

To set the right expectations: **Plyx-15M is a 15-million-parameter model, which is quite small.** Its performance won't be comparable to models with billions of parameters. It's best used for research, highly specific tasks, or as a base for fine-tuning - not as a drop-in replacement for a large, general-purpose model.
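
Since fine-tuning is a primary intended use, here is a minimal `Trainer`-based sketch. The training file, sequence length, and hyperparameters are placeholders to adapt to your task:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "MultivexAI/Plyx-15M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Llama-style tokenizers often ship without a pad token; reuse EOS if so.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder corpus: one document per line in a local text file.
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    # 512 is an illustrative sequence length, not a model requirement.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Causal-LM objective: the collator copies inputs to labels (mlm=False).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="plyx-15m-finetuned",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=5e-5,  # illustrative starting point
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```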

## Limitations

Like any language model trained on web-derived data, Plyx-15M can reflect biases present in its training corpora and may produce incorrect, repetitive, or incoherent output. At 15 million parameters, these failure modes are more pronounced than in larger models, so outputs should be reviewed before use.

## License

The data used for pre-training (`fineweb-pro`, `fineweb-edu`, and `finepdfs`) is derived from sources made available under the **ODC-By 1.0 license**. Users must also abide by the [CommonCrawl Terms of Use](https://commoncrawl.org/terms-of-use/). We do not alter the license of any of the underlying data.