# Nyx: Core-Outline Transformer Model

Nyx is a transformer-based language model designed for efficient text generation and understanding. It is part of the Core-Outline project and targets high-quality text generation over financial, SaaS, social media, customer, and customer feedback analytics data.

## Model Architecture

Nyx is built on a transformer decoder-only architecture with the following key components:

- **Rotary Position Embeddings (RoPE)**: for better handling of sequence positions
- **Multi-head Self-Attention**: with grouped-query attention for efficient inference
- **SwiGLU Activation**: for the feed-forward networks
- **RMSNorm**: for layer normalization
- **Sliding Window Attention**: for handling longer sequences efficiently

### Model Specifications

| Parameter | Value |
|-----------|-------|
| Hidden Size | 1024 |
| Number of Layers | 24 |
| Number of Attention Heads | 16 |
| Number of Key-Value Heads | 16 |
| Intermediate Size | 2816 |
| Max Sequence Length | 32,768 tokens |
| Vocabulary Size | 151,936 |
| Activation | SwiGLU (SiLU) |

## Usage

### Prerequisites

- Python 3.11+
- PyTorch 2.0+
- Transformers library
- FastAPI (for the API server; a minimal serving sketch appears at the end of this README)

### Loading the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "core-outline/nyx"
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("core-outline/nyx")  # uses the Qwen tokenizer
```

### Text Generation

```python
def generate_text(prompt, max_length=100, temperature=0.7):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,  # pass the mask explicitly to avoid generation warnings
        max_length=max_length,
        temperature=temperature,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

## Model Configuration

The model uses the following key configuration parameters (from `config.json`):

```json
{
  "hidden_size": 1024,
  "intermediate_size": 2816,
  "num_hidden_layers": 24,
  "num_attention_heads": 16,
  "num_key_value_heads": 16,
  "max_position_embeddings": 32768,
  "rms_norm_eps": 1e-6,
  "rope_theta": 1000000.0
}
```

## Tokenizer

The model uses the Qwen tokenizer, a BPE-based tokenizer with a vocabulary size of 151,936 tokens.

## Training Data

The model has been trained on a diverse dataset including:

- Financial analytics
- SaaS metrics
- Social media data
- Customer data
- Customer feedback

## License

[Specify your license here]

## Acknowledgements

- The model architecture is based on the Qwen/Llama architecture
- Uses Rotary Position Embeddings (RoPE) for position encoding
- Implements grouped-query attention for efficient inference
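
## Appendix: Minimal API Server Sketch

The prerequisites list FastAPI for an API server, but this README does not include server code. The sketch below shows one way the text-generation logic above could be exposed over HTTP; the endpoint path, request schema, and `uvicorn` invocation are illustrative assumptions, not part of the Core-Outline codebase.

```python
# Illustrative sketch only: the /generate endpoint, request schema, and run
# command are assumptions, not the project's actual API.
# Run with: uvicorn server:app --port 8000
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

app = FastAPI(title="Nyx text generation (example)")

model_path = "core-outline/nyx"  # model identifier from the usage section above
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

class GenerateRequest(BaseModel):
    prompt: str
    max_length: int = 100
    temperature: float = 0.7

@app.post("/generate")
def generate(req: GenerateRequest):
    # Same generation settings as the generate_text() example in the Usage section.
    inputs = tokenizer(req.prompt, return_tensors="pt")
    outputs = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_length=req.max_length,
        temperature=req.temperature,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return {"text": tokenizer.decode(outputs[0], skip_special_tokens=True)}
```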