Improve model card: Add `library_name`, paper, and code links, and enhance formatting
#1, opened by nielsr (HF Staff)

README.md CHANGED
````diff
@@ -1,5 +1,7 @@
 ---
 license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
 tags:
 - transformer
 - causal-lm
@@ -7,11 +9,12 @@ tags:
 - constructive-learning
 - frozen-embeddings
 - bvv
-pipeline_tag: text-generation
 ---
 
 # Model Card for abs-bvv-6
 
+[[Paper](https://huggingface.co/papers/2507.07129)] [[Code](https://github.com/Bochkov/bvv241)]
+
 ## Model Description
 
 `abs-bvv-6` is a 2.3 billion parameter decoder-only Transformer model. It is the sixth and final model in the **Progressive Growth Transformers (PGT)** series, designed to explore how linguistic and reasoning capabilities emerge as a function of model depth.
@@ -31,17 +34,14 @@ This model is primarily an artifact for research into emergent capabilities, con
 
 ## Performance
 The model was evaluated on several standard benchmarks. Scores reflect performance on held-out test sets.
-Benchmark Score (%) σ (%)
-
-MMLU 21.63% 0.22%
-
-ARC-e 23.42% 1.28%
-
-ARC-c 25.62% 1.92%
 
-
-
-
+| Benchmark | Score (%) | σ (%) |
+|---|---|---|
+| MMLU | 21.63% | 0.22% |
+| ARC-e | 23.42% | 1.28% |
+| ARC-c | 25.62% | 1.92% |
+| C-SENSE | 19.51% | 0.90% |
+| SQuAD | 5.55% | 1.05% |
 
 A key finding from the PGT series is the emergence of extractive QA capabilities (SQuAD) only in deeper models.
 
@@ -60,19 +60,16 @@ Data: A ~9B token mix of Wikipedia and SFT datasets (10%).
 
 This model is a research prototype and has several limitations:
 
-Not Instruction-Tuned
-
-
-
-Data Bias: Trained primarily on Wikipedia, it will reflect the biases present in that corpus.
-
-Limited Scope: The model was trained on a relatively small dataset (9B tokens) compared to state-of-the-art models. Its performance is intended to be evaluated relative to its own baseline (trainable embeddings) and shallower versions, not against giant commercial models.
+* **Not Instruction-Tuned:** It is a base model and will not follow instructions or engage in dialogue reliably.
+* **Potential for Hallucinations:** Like all LLMs, it can generate factually incorrect or nonsensical text.
+* **Data Bias:** Trained primarily on Wikipedia, it will reflect the biases present in that corpus.
+* **Limited Scope:** The model was trained on a relatively small dataset (9B tokens) compared to state-of-the-art models. Its performance is intended to be evaluated relative to its own baseline (trainable embeddings) and shallower versions, not against giant commercial models.
 
 ## 🧑‍🔬 Citation & Concept
 
 If you use this model or the underlying concepts in your research, please cite our work:
 
-```
+```bibtex
 @misc{bochkov2025emergentsemanticstokenembeddings,
 title={Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations},
 author={A. Bochkov},
@@ -119,4 +116,5 @@ outputs = model.generate(
     do_sample=True
 )
 
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
````