MagistrTheOne committed
Commit 24827e9 · verified · 1 Parent(s): 63b89c4

Fix RadonSAI-Small with working config

Files changed (3):
  1. README.md +20 -108
  2. config.json +24 -16
  3. tokenizer_config.json +7 -16
README.md CHANGED
@@ -1,108 +1,20 @@
- ---
- license: apache-2.0
- language:
- - ru
- - en
- tags:
- - mistral
- - russian
- - english
- - code
- - machine-learning
- - nlp
- - transformer
- - small
- - demo
- pipeline_tag: text-generation
- size_categories: 100M
- ---
-
- # RADON-Small - Compact Mistral-based Russian-English Transformer
-
- ## Model Description
-
- RADON-Small is a compact version of the RADON transformer model, optimized for development, testing, and resource-constrained environments.
-
- ### Key Features
-
- - **Architecture**: Mistral with Llama 3 innovations (GQA, RMSNorm, SwiGLU, RoPE)
- - **Parameters**: ~50M parameters (small version)
- - **Context**: 2K tokens
- - **Tokenizer**: Hybrid Unigram+BPE for Russian-English
- - **Status**: Initialized with random weights (training required)
- - **Use Case**: Development, testing, prototyping
-
- ### Model Weights
-
- This is a small model with initialized weights:
-
- - **Format**: PyTorch (.bin) and Safetensors (.safetensors)
- - **Dtype**: float16
- - **Initialization**: Random
- - **Size**: ~100MB (50M parameters)
-
- ### Usage
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- # Load small model
- model = AutoModelForCausalLM.from_pretrained("MagistrTheOne/RadonSAI-Small")
- tokenizer = AutoTokenizer.from_pretrained("MagistrTheOne/RadonSAI-Small")
-
- # Note: This model has random weights and needs training
- # For inference, you should use a trained version
-
- # Generate text (will produce random output)
- prompt = "Машинное обучение - это"
- inputs = tokenizer(prompt, return_tensors="pt")
- outputs = model.generate(**inputs, max_length=50, temperature=0.7)
- result = tokenizer.decode(outputs[0], skip_special_tokens=True)
- print(result)
- ```
-
- ### Training
-
- This small model is perfect for:
-
- 1. **Development and testing**
- 2. **Learning transformer architectures**
- 3. **Prototyping new ideas**
- 4. **Resource-constrained environments**
-
- ### Model Architecture
-
- ```
- RADON-Small:
- - Hidden size: 512
- - Layers: 6
- - Attention heads: 8 (2 KV heads)
- - Intermediate size: 1024
- - Vocabulary: 8K
- - Context window: 2K tokens
- ```
-
- ### Related Models
-
- - **Full Model**: [MagistrTheOne/RadonSAI](https://huggingface.co/MagistrTheOne/RadonSAI)
- - **Datasets**: [MagistrTheOne/radon-examples](https://huggingface.co/datasets/MagistrTheOne/radon-examples)
-
- ### Citation
-
- ```bibtex
- @misc{radon2024small,
-   title={RADON-Small: Compact Mistral-based Russian-English Transformer},
-   author={MagistrTheOne},
-   year={2024},
-   url={https://github.com/MagistrTheOne/Radon2BMistral}
- }
- ```
-
- ### License
-
- Apache 2.0 License
-
- ### Contact
-
- - GitHub: [MagistrTheOne/Radon2BMistral](https://github.com/MagistrTheOne/Radon2BMistral)
- - Hugging Face: [MagistrTheOne/RadonSAI-Small](https://huggingface.co/MagistrTheOne/RadonSAI-Small)
+ ---
+ license: apache-2.0
+ tags:
+ - radon
+ - gpt2
+ - 22mb
+ - fixed
+ ---
+
+ # RadonSAI-Small (Fixed)
+
+ Fixed version of RadonSAI-Small with a working configuration.
+
+ ## Usage
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained("MagistrTheOne/RadonSAI-Small")
+ tokenizer = AutoTokenizer.from_pretrained("MagistrTheOne/RadonSAI-Small")
+ ```
config.json CHANGED
@@ -1,20 +1,28 @@
  {
- "model_name": "radon",
- "model_type": "gpt2",
- "vocab_size": 32000,
- "hidden_size": 256,
- "num_layers": 6,
- "num_attention_heads": 8,
- "intermediate_size": 1024,
- "max_position_embeddings": 512,
- "dropout": 0.1,
- "attention_dropout": 0.1,
- "activation_function": "gelu",
- "layer_norm_eps": 1e-05,
- "initializer_range": 0.02,
- "use_cache": true,
- "torch_dtype": "float32",
  "architectures": [
    "GPT2LMHeadModel"
- ]
+ ],
+ "model_type": "gpt2",
+ "n_ctx": 1024,
+ "n_embd": 512,
+ "n_head": 8,
+ "n_layer": 6,
+ "n_positions": 1024,
+ "vocab_size": 50257,
+ "torch_dtype": "float16",
+ "transformers_version": "4.36.2",
+ "use_cache": true,
+ "attention_dropout": 0.0,
+ "attn_pdrop": 0.1,
+ "bos_token_id": 50256,
+ "eos_token_id": 50256,
+ "embd_pdrop": 0.1,
+ "initializer_range": 0.02,
+ "layer_norm_epsilon": 1e-05,
+ "resid_pdrop": 0.1,
+ "summary_activation": null,
+ "summary_first_dropout": 0.1,
+ "summary_proj_to_labels": true,
+ "summary_type": "cls_index",
+ "summary_use_proj": true
  }
tokenizer_config.json CHANGED
@@ -1,23 +1,14 @@
  {
- "add_bos_token": false,
- "add_prefix_space": false,
- "added_tokens_decoder": {
- "50256": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": true
- }
+ "auto_map": {
+ "AutoTokenizer": [
+ "gpt2",
+ null
+ ]
  },
  "bos_token": "<|endoftext|>",
- "chat_template": "{% for message in messages %}{{ message.content }}{{ eos_token }}{% endfor %}",
- "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
- "errors": "replace",
  "model_max_length": 1024,
- "pad_token": null,
+ "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
  }
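
GPT-2's tokenizer ships with no dedicated padding token, so the fixed config reuses `<|endoftext|>` as `pad_token`, and `model_max_length` matches the model's `n_positions`/`n_ctx` of 1024. A minimal stdlib-only consistency check of the committed fragment (values transcribed from the diff above):

```python
import json

# tokenizer_config.json as committed in this change (transcribed from the diff).
config = json.loads("""
{
  "auto_map": {"AutoTokenizer": ["gpt2", null]},
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "model_max_length": 1024,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
""")

# GPT-2 has no separate pad token, so padding must reuse EOS.
assert config["pad_token"] == config["eos_token"] == "<|endoftext|>"
# Tokenizer truncation limit should agree with the model's context window.
assert config["model_max_length"] == 1024
print("tokenizer config consistent")
```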