Bochkov committed
Commit bf1ff32 · verified · 1 Parent(s): 075e8b7

Update README.md

Files changed (1)
README.md: +2 -2
README.md CHANGED
@@ -23,14 +23,14 @@ The core idea is to demonstrate an alternative, more modular and resource-effici
  2. Complex reasoning abilities are a direct result of compositional depth.
  3. Models can be built incrementally, much like a living organism grows, rather than being forged all at once.
 
- `abs-bvv-3` represents the state of the model after 3 layers of progressive training. It has 3Transformer blocks, a hidden dimension of 4096, and uses the `bvv241` tokenizer family.
+ `abs-bvv-3` represents the state of the model after 3 layers of progressive training. It has 3 Transformer blocks, a hidden dimension of 4096, and uses the `bvv241` tokenizer family.
 
  ## Intended Use
 
  This model is primarily an artifact for research into emergent capabilities, constructive learning, and the role of embeddings in LLMs. It can be used for text generation, but it is not fine-tuned for specific downstream tasks and may produce unpredictable outputs. It is suitable for exploring the raw capabilities of a model trained under this novel paradigm.
 
  ## Training Details
- Architecture: 4-layer Decoder-Only Transformer (n_layer=3, d_model=4096, n_head=32).
+ Architecture: 3-layer Decoder-Only Transformer (n_layer=3, d_model=4096, n_head=32).
 
  Embeddings: The token embedding layer is frozen and derived from visual representations of Unicode glyphs. It is never updated during training.
 
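
The corrected lines pin down the architecture (n_layer=3, d_model=4096, n_head=32) and a frozen token embedding table. Below is a minimal PyTorch sketch of that configuration; the vocabulary size, class name, and use of stock `nn.TransformerEncoder` blocks are illustrative assumptions, not the actual abs-bvv-3 implementation.

```python
import torch
import torch.nn as nn

# Sketch of the configuration described in the diff above:
# 3 decoder blocks, d_model=4096, n_head=32, frozen token embeddings.
# VOCAB_SIZE is a placeholder; the real bvv241 vocabulary size may differ.
VOCAB_SIZE = 4096
D_MODEL, N_HEAD, N_LAYER = 4096, 32, 3

class DecoderOnlySketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Frozen token embeddings: per the README, abs-bvv-3 derives these from
        # visual renderings of Unicode glyphs and never updates them in training.
        self.tok_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.tok_emb.weight.requires_grad = False

        # Decoder-only (GPT-style) stack, approximated here with standard
        # encoder layers plus a causal attention mask.
        block = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=N_HEAD, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(block, num_layers=N_LAYER)
        self.lm_head = nn.Linear(D_MODEL, VOCAB_SIZE, bias=False)

    def forward(self, idx):
        x = self.tok_emb(idx)                                   # (batch, seq, d_model)
        mask = nn.Transformer.generate_square_subsequent_mask(idx.size(1))
        x = self.blocks(x, mask=mask)                           # causal self-attention
        return self.lm_head(x)                                  # (batch, seq, vocab)

model = DecoderOnlySketch()
logits = model(torch.randint(0, VOCAB_SIZE, (1, 8)))            # (1, 8, VOCAB_SIZE)
```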