nielsr HF Staff committed on
Commit
ef5a0e4
·
verified ·
1 Parent(s): c7abffa

Add library_name and update model card

This PR adds `library_name: transformers` to the YAML metadata to enable the library's automated code snippets on the hub. It also includes a link to the paper [FlashNorm: Fast Normalization for Transformers](https://huggingface.co/papers/2407.09577) in the model description.

Files changed (1)
  1. README.md +11 -8
README.md CHANGED
@@ -1,17 +1,20 @@
1
  ---
2
- license: llama3.2
3
  base_model: meta-llama/Llama-3.2-1B
4
- tags:
5
- - flashnorm
6
- - transformer-tricks
7
- - efficient-inference
8
- - weightless-rmsnorm
9
  pipeline_tag: text-generation
 
 
 
 
 
10
  ---
11
 
12
  # Llama-3.2-1B-FlashNorm
13
 
14
- FlashNorm-prepared checkpoint of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B). Mathematically equivalent to the source model. The per-channel RMSNorm weight tensors (`input_layernorm.weight`, `post_attention_layernorm.weight`, `model.norm.weight`) are folded into the following linear layers and then removed from the state dict entirely.
 
 
15
 
16
  > **Framework support note.** Stock vLLM currently does not load this checkpoint because the norm weight tensors are absent. The upstream patch to accept missing tensors is tracked at: **TBD (vLLM issue link)**. Until the patch lands, use HuggingFace Transformers; it loads this with a warning that norm weights were not initialized and defaults them to ones, which is the correct behavior for FlashNorm.
17
 
@@ -55,4 +58,4 @@ Not yet supported. See the tracking issue linked above.
55
 
56
  ## License
57
 
58
- Inherited from the source model.
 
1
  ---
 
2
  base_model: meta-llama/Llama-3.2-1B
3
+ library_name: transformers
4
+ license: llama3.2
 
 
 
5
  pipeline_tag: text-generation
6
+ tags:
7
+ - flashnorm
8
+ - transformer-tricks
9
+ - efficient-inference
10
+ - weightless-rmsnorm
11
  ---
12
 
13
  # Llama-3.2-1B-FlashNorm
14
 
15
+ FlashNorm-prepared checkpoint of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B). This model was presented in the paper [FlashNorm: Fast Normalization for Transformers](https://huggingface.co/papers/2407.09577).
16
+
17
+ Mathematically equivalent to the source model. The per-channel RMSNorm weight tensors (`input_layernorm.weight`, `post_attention_layernorm.weight`, `model.norm.weight`) are folded into the following linear layers and then removed from the state dict entirely.
18
 
19
  > **Framework support note.** Stock vLLM currently does not load this checkpoint because the norm weight tensors are absent. The upstream patch to accept missing tensors is tracked at: **TBD (vLLM issue link)**. Until the patch lands, use HuggingFace Transformers; it loads this with a warning that norm weights were not initialized and defaults them to ones, which is the correct behavior for FlashNorm.
20
 
 
58
 
59
  ## License
60
 
61
+ Inherited from the source model.
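
The model card text in this diff says the per-channel RMSNorm weights are folded into the following linear layers while the model stays mathematically equivalent. The identity behind that claim can be checked with a minimal sketch in plain Python. This is not the conversion script used for the checkpoint: the helper names (`rmsnorm`, `linear`, `fold_norm_weight`), the `eps` value, and the toy shapes are all assumptions chosen for illustration.

```python
import math
import random

def rmsnorm(x, weight=None, eps=1e-6):
    """RMSNorm of one vector; weight=None means no per-channel scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    normed = [v / rms for v in x]
    if weight is None:
        return normed
    return [v * w for v, w in zip(normed, weight)]

def linear(W, x):
    """y = W x for W stored as [out_features][in_features]."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def fold_norm_weight(W, g):
    """Scale input column i of W by g[i], absorbing the norm weight into W."""
    return [[w * gi for w, gi in zip(row, g)] for row in W]

# Toy sizes and random values (assumptions, not the real model dimensions).
random.seed(0)
hidden, out_features = 8, 4
x = [random.gauss(0, 1) for _ in range(hidden)]
g = [random.gauss(0, 1) for _ in range(hidden)]  # stand-in norm weight
W = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(out_features)]

# Original path: weighted RMSNorm followed by the linear layer.
reference = linear(W, rmsnorm(x, g))
# Folded path: weightless RMSNorm followed by the pre-scaled linear layer.
folded = linear(fold_norm_weight(W, g), rmsnorm(x))

max_abs_diff = max(abs(a - b) for a, b in zip(reference, folded))
print(max_abs_diff)  # expected to be at floating-point rounding level
```

Because the norm weight multiplies each input channel before the matrix product, scaling the matching input column of the weight matrix gives the same output. This is also why a loader that defaults the missing norm weights to ones, as the framework support note describes for Transformers, reproduces the folded model's behavior.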