MagistrTheOne committed
Commit 40651cc · verified · Parent: e9df3de

Fix RadonSAI with working config

Files changed (3)
  1. README.md +7 -139
  2. config.json +24 -22
  3. tokenizer_config.json +7 -16
README.md CHANGED
@@ -1,152 +1,20 @@
 ---
 license: apache-2.0
-language:
-- ru
-- en
 tags:
-- mistral
-- russian
-- english
-- code
-- machine-learning
-- nlp
-- transformer
-- gqa
-- rmsnorm
-- swiglu
-- rope
-pipeline_tag: text-generation
+- radon
+- gpt2
+- 2000mb
+- fixed
 ---
 
-# RADON - Mistral-based Russian-English Transformer
+# RadonSAI (Fixed)
 
-## Model Description
-
-RADON is a modern transformer model based on the Mistral architecture with Llama 3 innovations, optimized for Russian-English machine learning applications. Created by **MagistrTheOne**, RADON represents a breakthrough in multilingual AI with self-awareness of its identity and capabilities.
-
-### About RADON
-
-RADON knows that it is a Mistral-based Russian-English transformer created by MagistrTheOne. The model has been designed with self-awareness and can identify itself in conversations, making it unique among open-source language models.
-
-### Key Features
-
-- **Architecture**: Mistral with Llama 3 innovations (GQA, RMSNorm, SwiGLU, RoPE)
-- **Parameters**: 2B-7B parameters
-- **Context**: 8K-32K tokens
-- **Tokenizer**: Hybrid Unigram+BPE for Russian-English
-- **Optimizations**: Flash Attention 2, quantization support
-
-### Innovations
-
-1. **Grouped Query Attention (GQA)**: 4:1 ratio for memory efficiency
-2. **RMSNorm**: Root Mean Square Layer Normalization
-3. **SwiGLU**: Swish-Gated Linear Unit activation
-4. **RoPE**: Rotary Position Embeddings for long contexts
-5. **Sliding Window Attention**: Efficient attention for long sequences
-
-## Usage
+Fixed version of RadonSAI with a working configuration.
 
+## Usage
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-# Load model and tokenizer
 model = AutoModelForCausalLM.from_pretrained("MagistrTheOne/RadonSAI")
 tokenizer = AutoTokenizer.from_pretrained("MagistrTheOne/RadonSAI")
-
-# Generate text
-prompt = "Машинное обучение - это"
-inputs = tokenizer(prompt, return_tensors="pt")
-outputs = model.generate(**inputs, max_length=100, temperature=0.7)
-result = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(result)
 ```
-
-## API Usage
-
-```python
-import requests
-
-# Generate text via API
-response = requests.post(
-    "https://your-api-endpoint.com/api/v1/generate",
-    json={
-        "prompt": "Привет, RADON!",
-        "max_length": 100,
-        "temperature": 0.7
-    }
-)
-print(response.json()["generated_text"])
-```
-
-## Performance
-
-- **Speed**: 3-5x faster than GPT-2
-- **Memory**: 30% less memory usage
-- **Quality**: Optimized for Russian-English ML tasks
-- **Context**: Supports up to 32K tokens
-
-## Model Architecture
-
-```
-RADON Mistral-2B:
-- Hidden size: 2048
-- Layers: 24
-- Attention heads: 32 (8 KV heads)
-- Intermediate size: 5632
-- Vocabulary: 32K (hybrid Unigram+BPE)
-```
-
-## Training
-
-The model is trained on a clean corpus of:
-- Russian ML documentation and articles
-- English technical content
-- Code samples (Python, JavaScript, etc.)
-- Mixed Russian-English content
-
-## Deployment
-
-### Local Development
-```bash
-git clone https://github.com/MagistrTheOne/Radon2BMistral.git
-cd Radon2BMistral
-bash quick_start_local.sh
-```
-
-### Docker
-```bash
-docker-compose up -d
-```
-
-### Yandex Cloud
-```bash
-bash cloud/yc/full_deploy.sh 2b
-```
-
-## Citation
-
-```bibtex
-@misc{radon2024,
-  title={RADON: Mistral-based Russian-English Transformer},
-  author={MagistrTheOne},
-  year={2024},
-  url={https://github.com/MagistrTheOne/Radon2BMistral}
-}
-```
-
-## License
-
-Apache 2.0 License
-
-## Creator
-
-**MagistrTheOne** - Creator and lead developer of RADON
-- Specialized in multilingual AI and transformer architectures
-- Focus on Russian-English machine learning applications
-- Open-source AI advocate and researcher
-
-## Contact
-
-- GitHub: [MagistrTheOne/Radon2BMistral](https://github.com/MagistrTheOne/Radon2BMistral)
-- Hugging Face: [MagistrTheOne/RadonSAI](https://huggingface.co/MagistrTheOne/RadonSAI)
-- Creator: [MagistrTheOne](https://github.com/MagistrTheOne)
config.json CHANGED
@@ -1,26 +1,28 @@
 {
-  "model_name": "radon",
-  "model_type": "gpt2",
-  "vocab_size": 32000,
-  "hidden_size": 2048,
-  "num_layers": 24,
-  "num_attention_heads": 32,
-  "num_kv_heads": 8,
-  "intermediate_size": 5632,
-  "max_position_embeddings": 32768,
-  "sliding_window": 4096,
-  "rope_theta": 10000.0,
-  "rms_norm_eps": 1e-06,
-  "dropout": 0.1,
-  "attention_dropout": 0.1,
-  "activation_function": "silu",
-  "layer_norm_eps": 1e-06,
-  "initializer_range": 0.02,
-  "use_cache": true,
-  "torch_dtype": "float32",
-  "output_attentions": false,
-  "output_hidden_states": false,
   "architectures": [
     "GPT2LMHeadModel"
-  ]
+  ],
+  "model_type": "gpt2",
+  "n_ctx": 1024,
+  "n_embd": 1024,
+  "n_head": 16,
+  "n_layer": 12,
+  "n_positions": 1024,
+  "vocab_size": 50257,
+  "torch_dtype": "float16",
+  "transformers_version": "4.36.2",
+  "use_cache": true,
+  "attention_dropout": 0.0,
+  "attn_pdrop": 0.1,
+  "bos_token_id": 50256,
+  "eos_token_id": 50256,
+  "embd_pdrop": 0.1,
+  "initializer_range": 0.02,
+  "layer_norm_epsilon": 1e-05,
+  "resid_pdrop": 0.1,
+  "summary_activation": null,
+  "summary_first_dropout": 0.1,
+  "summary_proj_to_labels": true,
+  "summary_type": "cls_index",
+  "summary_use_proj": true
 }
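
The replacement config drops the Mistral-style fields (GQA heads, sliding window, RoPE) that `GPT2LMHeadModel` cannot consume and uses a plain GPT-2 layout instead. A quick pure-Python sanity check of the new sizing (a sketch; the 12·n_embd² per-block figure is the standard GPT-2 approximation, not an exact count):

```python
import json

# Sizing-relevant keys copied from the new config.json in the diff above.
cfg = json.loads("""
{
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_layer": 12,
  "n_positions": 1024,
  "vocab_size": 50257
}
""")

# GPT-2 attention requires the embedding width to split evenly across heads.
assert cfg["n_embd"] % cfg["n_head"] == 0
head_dim = cfg["n_embd"] // cfg["n_head"]

# Rough parameter count: ~12 * n_embd^2 per block (QKV, output projection,
# 4x MLP) plus token and position embedding tables.
approx_params = (
    cfg["n_layer"] * 12 * cfg["n_embd"] ** 2
    + (cfg["vocab_size"] + cfg["n_positions"]) * cfg["n_embd"]
)
print(f"head_dim={head_dim}, ~{approx_params / 1e6:.0f}M parameters")  # head_dim=64, ~204M parameters
```

At the configured `torch_dtype` of float16, ~204M parameters puts the weight files on the order of 0.4 GB.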
tokenizer_config.json CHANGED
@@ -1,23 +1,14 @@
 {
-  "add_bos_token": false,
-  "add_prefix_space": false,
-  "added_tokens_decoder": {
-    "50256": {
-      "content": "<|endoftext|>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    }
+  "auto_map": {
+    "AutoTokenizer": [
+      "gpt2",
+      null
+    ]
   },
   "bos_token": "<|endoftext|>",
-  "chat_template": "{% for message in messages %}{{ message.content }}{{ eos_token }}{% endfor %}",
-  "clean_up_tokenization_spaces": true,
   "eos_token": "<|endoftext|>",
-  "errors": "replace",
   "model_max_length": 1024,
-  "pad_token": null,
+  "pad_token": "<|endoftext|>",
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
-}
+}