Fix RadonSAI with working config

Files changed (3)

README.md +7 -139
config.json +24 -22
tokenizer_config.json +7 -16

README.md CHANGED

@@ -1,152 +1,20 @@
 ---
 license: apache-2.0
-language:
-- ru
-- en
 tags:
-- mistral
-- russian
-- english
-- code
-- machine-learning
-- nlp
-- transformer
-- gqa
-- rmsnorm
-- swiglu
-- rope
-pipeline_tag: text-generation
 ---
-# RADON - Mistral-based Russian-English Transformer
-## Model Description
-RADON is a modern transformer model based on Mistral architecture with Llama 3 innovations, optimized for Russian-English machine learning applications. Created by **MagistrTheOne**, RADON represents a breakthrough in multilingual AI with self-awareness of its identity and capabilities.
-### About RADON
-RADON knows that it is a Mistral-based Russian-English transformer created by MagistrTheOne. The model has been designed with self-awareness and can identify itself in conversations, making it unique among open-source language models.
-### Key Features
-- **Architecture**: Mistral with Llama 3 innovations (GQA, RMSNorm, SwiGLU, RoPE)
-- **Parameters**: 2B-7B parameters
-- **Context**: 8K-32K tokens
-- **Tokenizer**: Hybrid Unigram+BPE for Russian-English
-- **Optimizations**: Flash Attention 2, Quantization support
-### Innovations
-1. **Grouped Query Attention (GQA)**: 4:1 ratio for memory efficiency
-2. **RMSNorm**: Root Mean Square Layer Normalization
-3. **SwiGLU**: Swish-Gated Linear Unit activation
-4. **RoPE**: Rotary Position Embeddings for long contexts
-5. **Sliding Window Attention**: Efficient attention for long sequences
-## Usage
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
-# Load model and tokenizer
 model = AutoModelForCausalLM.from_pretrained("MagistrTheOne/RadonSAI")
 tokenizer = AutoTokenizer.from_pretrained("MagistrTheOne/RadonSAI")
-# Generate text
-prompt = "Машинное обучение - это"
-inputs = tokenizer(prompt, return_tensors="pt")
-outputs = model.generate(**inputs, max_length=100, temperature=0.7)
-result = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(result)
-```
-## API Usage
-```python
-import requests
-# Generate text via API
-response = requests.post(
-    "https://your-api-endpoint.com/api/v1/generate",
-    json={
-        "prompt": "Привет, RADON!",
-        "max_length": 100,
-        "temperature": 0.7
-    }
-)
-print(response.json()["generated_text"])
-```
-## Performance
-- **Speed**: 3-5x faster than GPT-2
-- **Memory**: 30% less memory usage
-- **Quality**: Optimized for Russian-English ML tasks
-- **Context**: Supports up to 32K tokens
-## Model Architecture
 ```
-RADON Mistral-2B:
-- Hidden size: 2048
-- Layers: 24
-- Attention heads: 32 (8 KV heads)
-- Intermediate size: 5632
-- Vocabulary: 32K (hybrid Unigram+BPE)
-```
-## Training
-The model is trained on a clean corpus of:
-- Russian ML documentation and articles
-- English technical content
-- Code samples (Python, JavaScript, etc.)
-- Mixed Russian-English content
-## Deployment
-### Local Development
-```bash
-git clone https://github.com/MagistrTheOne/Radon2BMistral.git
-cd Radon2BMistral
-bash quick_start_local.sh
-```
-### Docker
-```bash
-docker-compose up -d
-```
-### Yandex Cloud
-```bash
-bash cloud/yc/full_deploy.sh 2b
-```
-## Citation
-```bibtex
-@misc{radon2024,
-  title={RADON: Mistral-based Russian-English Transformer},
-  author={MagistrTheOne},
-  year={2024},
-  url={https://github.com/MagistrTheOne/Radon2BMistral}
-}
-```
-## License
-Apache 2.0 License
-## Creator
-**MagistrTheOne** - Creator and lead developer of RADON
-- Specialized in multilingual AI and transformer architectures
-- Focus on Russian-English machine learning applications
-- Open-source AI advocate and researcher
-## Contact
-- GitHub: [MagistrTheOne/Radon2BMistral](https://github.com/MagistrTheOne/Radon2BMistral)
-- Hugging Face: [MagistrTheOne/RadonSAI](https://huggingface.co/MagistrTheOne/RadonSAI)
-- Creator: [MagistrTheOne](https://github.com/MagistrTheOne)

 ---
 license: apache-2.0
 tags:
+- radon
+- gpt2
+- 2000mb
+- fixed
 ---
+# RadonSAI (Fixed)
+Fixed version of RadonSAI with a working configuration.
+## Usage
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model = AutoModelForCausalLM.from_pretrained("MagistrTheOne/RadonSAI")
 tokenizer = AutoTokenizer.from_pretrained("MagistrTheOne/RadonSAI")
 ```

config.json CHANGED

@@ -1,26 +1,28 @@
 {
-  "model_name": "radon",
-  "model_type": "gpt2",
-  "vocab_size": 32000,
-  "hidden_size": 2048,
-  "num_layers": 24,
-  "num_attention_heads": 32,
-  "num_kv_heads": 8,
-  "intermediate_size": 5632,
-  "max_position_embeddings": 32768,
-  "sliding_window": 4096,
-  "rope_theta": 10000.0,
-  "rms_norm_eps": 1e-06,
-  "dropout": 0.1,
-  "attention_dropout": 0.1,
-  "activation_function": "silu",
-  "layer_norm_eps": 1e-06,
-  "initializer_range": 0.02,
-  "use_cache": true,
-  "torch_dtype": "float32",
-  "output_attentions": false,
-  "output_hidden_states": false,
   "architectures": [
     "GPT2LMHeadModel"
-  ]
 }

 {
   "architectures": [
     "GPT2LMHeadModel"
+  ],
+  "model_type": "gpt2",
+  "n_ctx": 1024,
+  "n_embd": 1024,
+  "n_head": 16,
+  "n_layer": 12,
+  "n_positions": 1024,
+  "vocab_size": 50257,
+  "torch_dtype": "float16",
+  "transformers_version": "4.36.2",
+  "use_cache": true,
+  "attention_dropout": 0.0,
+  "attn_pdrop": 0.1,
+  "bos_token_id": 50256,
+  "eos_token_id": 50256,
+  "embd_pdrop": 0.1,
+  "initializer_range": 0.02,
+  "layer_norm_epsilon": 1e-05,
+  "resid_pdrop": 0.1,
+  "summary_activation": null,
+  "summary_first_dropout": 0.1,
+  "summary_proj_to_labels": true,
+  "summary_type": "cls_index",
+  "summary_use_proj": true
 }
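
The new config.json swaps the Mistral-style keys for standard GPT-2 keys, so the model size it implies can be estimated from the values alone. The following is a rough sketch, not the author's tooling: `estimate_gpt2_params` is a hypothetical helper, and it assumes the MLP inner width is 4 * `n_embd` (the GPT-2 default when `n_inner` is absent, as it is here) and that the output head shares weights with the token embedding.

```python
# Values copied from the "+" side of the config.json diff above.
NEW_CONFIG = {
    "n_embd": 1024,
    "n_head": 16,
    "n_layer": 12,
    "n_positions": 1024,
    "vocab_size": 50257,
}


def estimate_gpt2_params(cfg):
    """Estimate trainable parameters of a GPT-2 style model (hypothetical helper)."""
    d = cfg["n_embd"]
    inner = 4 * d  # assumption: n_inner is unset, so the MLP width is 4 * n_embd

    # Token and position embeddings; the LM head is assumed tied to the token embedding.
    embeddings = cfg["vocab_size"] * d + cfg["n_positions"] * d

    # Attention: fused QKV projection plus output projection, each with a bias.
    attention = (d * 3 * d + 3 * d) + (d * d + d)
    # MLP: up projection and down projection, each with a bias.
    mlp = (d * inner + inner) + (inner * d + d)
    # Two LayerNorms per block, each with a weight and a bias vector.
    norms = 2 * (2 * d)

    final_norm = 2 * d
    return embeddings + cfg["n_layer"] * (attention + mlp + norms) + final_norm


if __name__ == "__main__":
    total = estimate_gpt2_params(NEW_CONFIG)
    # float16 stores 2 bytes per parameter, matching "torch_dtype": "float16".
    print(f"{total:,} parameters, about {total * 2 / 1e6:.0f} MB at float16")
```

The estimate covers only the values visible in this diff; the actual checkpoint size depends on the weight files, which this commit does not touch.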

tokenizer_config.json CHANGED

@@ -1,23 +1,14 @@
 {
-  "add_bos_token": false,
-  "add_prefix_space": false,
-  "added_tokens_decoder": {
-    "50256": {
-      "content": "<|endoftext|>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    }
   },
   "bos_token": "<|endoftext|>",
-  "chat_template": "{% for message in messages %}{{ message.content }}{{ eos_token }}{% endfor %}",
-  "clean_up_tokenization_spaces": true,
   "eos_token": "<|endoftext|>",
-  "errors": "replace",
   "model_max_length": 1024,
-  "pad_token": null,
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
-}

 {
+  "auto_map": {
+    "AutoTokenizer": [
+      "gpt2",
+      null
+    ]
   },
   "bos_token": "<|endoftext|>",
   "eos_token": "<|endoftext|>",
   "model_max_length": 1024,
+  "pad_token": "<|endoftext|>",
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
+}
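
Since this commit exists to make the two JSON files agree with each other, a small consistency check can be sketched as follows. This is an illustrative assumption, not part of the commit: `check_consistency` and the specific rules it applies are hypothetical, and the dictionaries only repeat the values shown on the "+" side of the diffs above.

```python
# Subset of the new config.json shown in the diff above.
NEW_CONFIG = {
    "n_embd": 1024,
    "n_head": 16,
    "n_positions": 1024,
    "vocab_size": 50257,
    "bos_token_id": 50256,
    "eos_token_id": 50256,
}

# Subset of the new tokenizer_config.json shown in the diff above.
NEW_TOKENIZER_CONFIG = {
    "bos_token": "<|endoftext|>",
    "eos_token": "<|endoftext|>",
    "pad_token": "<|endoftext|>",
    "unk_token": "<|endoftext|>",
    "model_max_length": 1024,
    "tokenizer_class": "GPT2Tokenizer",
}


def check_consistency(config, tok_config):
    """Return a list of human-readable problems (hypothetical helper)."""
    problems = []

    # Each attention head needs an equal slice of the hidden dimension.
    if config["n_embd"] % config["n_head"] != 0:
        problems.append("n_embd is not divisible by n_head")

    # The tokenizer should not produce sequences longer than the position table.
    if tok_config["model_max_length"] > config["n_positions"]:
        problems.append("model_max_length exceeds n_positions")

    # Special token ids must index into the vocabulary.
    for name in ("bos_token_id", "eos_token_id"):
        if not 0 <= config[name] < config["vocab_size"]:
            problems.append(f"{name} is outside the vocabulary")

    # Batched padding needs a pad token; the old file set it to null.
    for name in ("bos_token", "eos_token", "pad_token"):
        if tok_config.get(name) is None:
            problems.append(f"{name} is not set")

    return problems


if __name__ == "__main__":
    for problem in check_consistency(NEW_CONFIG, NEW_TOKENIZER_CONFIG):
        print("problem:", problem)
```

Run against the previous tokenizer_config.json, where `pad_token` was `null`, the same check would report the missing pad token.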