Improve model card: Add `library_name`, paper, and code links, and enhance formatting
#1, opened by nielsr (HF Staff)

README.md CHANGED
````diff
@@ -1,5 +1,7 @@
 ---
 license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
 tags:
 - transformer
 - causal-lm
@@ -7,11 +9,12 @@ tags:
 - constructive-learning
 - frozen-embeddings
 - bvv
-pipeline_tag: text-generation
 ---
 
 # Model Card for abs-bvv-6
 
+[[Paper](https://huggingface.co/papers/2507.07129)] [[Code](https://github.com/Bochkov/bvv241)]
+
 ## Model Description
 
 `abs-bvv-6` is a 2.3 billion parameter decoder-only Transformer model. It is the sixth and final model in the **Progressive Growth Transformers (PGT)** series, designed to explore how linguistic and reasoning capabilities emerge as a function of model depth.
@@ -31,17 +34,14 @@ This model is primarily an artifact for research into emergent capabilities, con
 
 ## Performance
 The model was evaluated on several standard benchmarks. Scores reflect performance on held-out test sets.
-Benchmark Score (%) σ (%)
-
-MMLU 21.63% 0.22%
-
-ARC-e 23.42% 1.28%
-
-ARC-c 25.62% 1.92%
 
-
-
-
+| Benchmark | Score (%) | σ (%) |
+|---|---|---|
+| MMLU | 21.63% | 0.22% |
+| ARC-e | 23.42% | 1.28% |
+| ARC-c | 25.62% | 1.92% |
+| C-SENSE | 19.51% | 0.90% |
+| SQuAD | 5.55% | 1.05% |
 
 A key finding from the PGT series is the emergence of extractive QA capabilities (SQuAD) only in deeper models.
 
@@ -60,19 +60,16 @@ Data: A ~9B token mix of Wikipedia and SFT datasets (10%).
 
 This model is a research prototype and has several limitations:
 
-Not Instruction-Tuned
-
-
-
-Data Bias: Trained primarily on Wikipedia, it will reflect the biases present in that corpus.
-
-Limited Scope: The model was trained on a relatively small dataset (9B tokens) compared to state-of-the-art models. Its performance is intended to be evaluated relative to its own baseline (trainable embeddings) and shallower versions, not against giant commercial models.
+* **Not Instruction-Tuned:** It is a base model and will not follow instructions or engage in dialogue reliably.
+* **Potential for Hallucinations:** Like all LLMs, it can generate factually incorrect or nonsensical text.
+* **Data Bias:** Trained primarily on Wikipedia, it will reflect the biases present in that corpus.
+* **Limited Scope:** The model was trained on a relatively small dataset (9B tokens) compared to state-of-the-art models. Its performance is intended to be evaluated relative to its own baseline (trainable embeddings) and shallower versions, not against giant commercial models.
 
 ## 🧑‍🔬 Citation & Concept
 
 If you use this model or the underlying concepts in your research, please cite our work:
 
-```
+```bibtex
 @misc{bochkov2025emergentsemanticstokenembeddings,
 title={Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations},
 author={A. Bochkov},
@@ -119,4 +116,5 @@ outputs = model.generate(
     do_sample=True
 )
 
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
````