Improve model card: Add `library_name`, paper, and code links, and enhance formatting

#1 opened by nielsr (HF Staff)

Files changed (1):
  1. README.md (+18, -20)

README.md CHANGED
@@ -1,5 +1,7 @@
---
license: apache-2.0
+ pipeline_tag: text-generation
+ library_name: transformers
tags:
- transformer
- causal-lm
@@ -7,11 +9,12 @@ tags:
- constructive-learning
- frozen-embeddings
- bvv
- pipeline_tag: text-generation
---

# Model Card for abs-bvv-6

+ [[Paper](https://huggingface.co/papers/2507.07129)] [[Code](https://github.com/Bochkov/bvv241)]
+
## Model Description

`abs-bvv-6` is a 2.3 billion parameter decoder-only Transformer model. It is the sixth and final model in the **Progressive Growth Transformers (PGT)** series, designed to explore how linguistic and reasoning capabilities emerge as a function of model depth.
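Note on the front-matter change above: `library_name: transformers` tells the Hub which library's loading snippet to surface, and `pipeline_tag: text-generation` lets the inference widget and `pipeline()` resolve the task automatically. A minimal sketch of what this enables, assuming the checkpoint lives at `Bochkov/abs-bvv-6` (the repo id is not shown in this diff) and that the custom architecture needs `trust_remote_code=True`:

```python
from transformers import pipeline

# Hypothetical repo id, inferred from the model name; adjust to the real Hub path.
repo_id = "Bochkov/abs-bvv-6"

# With pipeline_tag set in the card metadata, the task argument could even be
# omitted; it is spelled out here for clarity.
generator = pipeline("text-generation", model=repo_id, trust_remote_code=True)
print(generator("The PGT series explores", max_new_tokens=30)[0]["generated_text"])
```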
@@ -31,17 +34,14 @@ This model is primarily an artifact for research into emergent capabilities, con

## Performance
The model was evaluated on several standard benchmarks. Scores reflect performance on held-out test sets.
- Benchmark Score (%) σ (%)
-
- MMLU 21.63% 0.22%
-
- ARC-e 23.42% 1.28%
-
- ARC-c 25.62% 1.92%

- C-SENSE 19.51% 0.90%
-
- SQuAD 5.55% 1.05%
+ | Benchmark | Score (%) | σ (%) |
+ |---|---|---|
+ | MMLU | 21.63% | 0.22% |
+ | ARC-e | 23.42% | 1.28% |
+ | ARC-c | 25.62% | 1.92% |
+ | C-SENSE | 19.51% | 0.90% |
+ | SQuAD | 5.55% | 1.05% |

A key finding from the PGT series is the emergence of extractive QA capabilities (SQuAD) only in deeper models.

@@ -60,19 +60,16 @@ Data: A ~9B token mix of Wikipedia and SFT datasets (10%).

This model is a research prototype and has several limitations:

- Not Instruction-Tuned: It is a base model and will not follow instructions or engage in dialogue reliably.
-
- Potential for Hallucinations: Like all LLMs, it can generate factually incorrect or nonsensical text.
-
- Data Bias: Trained primarily on Wikipedia, it will reflect the biases present in that corpus.
-
- Limited Scope: The model was trained on a relatively small dataset (9B tokens) compared to state-of-the-art models. Its performance is intended to be evaluated relative to its own baseline (trainable embeddings) and shallower versions, not against giant commercial models.
+ * **Not Instruction-Tuned:** It is a base model and will not follow instructions or engage in dialogue reliably.
+ * **Potential for Hallucinations:** Like all LLMs, it can generate factually incorrect or nonsensical text.
+ * **Data Bias:** Trained primarily on Wikipedia, it will reflect the biases present in that corpus.
+ * **Limited Scope:** The model was trained on a relatively small dataset (9B tokens) compared to state-of-the-art models. Its performance is intended to be evaluated relative to its own baseline (trainable embeddings) and shallower versions, not against giant commercial models.

## 🧑‍🔬 Citation & Concept

If you use this model or the underlying concepts in your research, please cite our work:

- ```
+ ```bibtex
@misc{bochkov2025emergentsemanticstokenembeddings,
title={Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations},
author={A. Bochkov},
@@ -119,4 +116,5 @@ outputs = model.generate(
    do_sample=True
)

- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
 
 
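For context, the last hunk shows only the tail of the README's usage snippet (which this PR closes with a proper fence). A self-contained sketch of the full flow it implies: only `do_sample=True`, the closing `)`, and the `print` line are visible in the diff, so the load step, prompt, and generation parameters below are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Bochkov/abs-bvv-6"  # hypothetical repo id, not confirmed by the diff

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumed; pick a dtype that fits your hardware
    trust_remote_code=True,
)

inputs = tokenizer("The theory of relativity states that", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,  # assumed budget
    do_sample=True,     # the one generation parameter visible in the diff
)

# Final line of the README's snippet, as re-added by this PR.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```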