Austin207 committed (verified) · Commit 445fd2d · 1 parent: a683148

Update README.md

Files changed (1): README.md (+275, −3)
---
language:
- en
license: mit
library_name: transformers
tags:
- text-generation
- pytorch
- custom-architecture
- rope
- rmsnorm
- swiglu
- flash-attention
- 16k-context
pipeline_tag: text-generation
widget:
- text: "The future of artificial intelligence is"
  example_title: "AI Future"
- text: "Write a short story about"
  example_title: "Story Generation"
- text: "Explain quantum computing in simple terms:"
  example_title: "Technical Explanation"
datasets:
- tiiuae/falcon-refinedweb
metrics:
- perplexity
model-index:
- name: MAP-NEO Mini
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: RefinedWeb (100K subset)
      type: tiiuae/falcon-refinedweb
    metrics:
    - type: loss
      value: 3.9
      name: Final Training Loss
---

# MAP-NEO Mini

## Model Description

**MAP-NEO Mini** is a 253M-parameter autoregressive language model built from scratch with modern architectural improvements. It demonstrates that a capable small language model can be trained efficiently on modest hardware through careful data curation and architectural choices.

- **Developed by**: Antony Austin (Austin207)
- **Model type**: Autoregressive Language Model
- **Language(s)**: English
- **License**: MIT
- **Architecture**: Custom transformer with RoPE, RMSNorm, SwiGLU, and Flash Attention

## Key Features

- 🚀 **Efficient Training**: Trained on an RTX 5070 (8GB VRAM) in ~4 hours
- 📏 **Extended Context**: 16,384-token context window (16x the typical small-model window)
- 🧠 **Memory Efficient**: Only 1.3GB VRAM for inference over 1,800 tokens
- ⚡ **Fast Inference**: ~10 tokens/second on a consumer GPU
- 🎯 **High-Quality Data**: Trained on a curated RefinedWeb subset

## Architecture Details

### Model Architecture
- **Parameters**: 253,085,696 (253M)
- **Layers**: 16 transformer blocks
- **Hidden Size**: 1,024
- **Attention Heads**: 16
- **Head Dimension**: 64
- **FFN Hidden Size**: 2,736 (~2.67x hidden size)
- **Vocabulary Size**: 50,257 (GPT-2 tokenizer)
- **Max Sequence Length**: 16,384 tokens

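For reference, these numbers map onto a configuration object roughly as sketched below. The field names are illustrative assumptions and may differ from the actual `NeoMiniConfig` defined in `model_neo.py`.

```python
from dataclasses import dataclass

@dataclass
class NeoMiniConfigSketch:
    """Illustrative config mirroring the table above (field names are assumptions)."""
    vocab_size: int = 50257      # GPT-2 tokenizer vocabulary
    n_layers: int = 16           # transformer blocks
    hidden_size: int = 1024      # model width
    n_heads: int = 16            # attention heads (head_dim = 1024 / 16 = 64)
    ffn_hidden_size: int = 2736  # ~2.67x hidden size, used by the SwiGLU FFN
    max_seq_len: int = 16384     # extended context window
    rope_theta: float = 10000.0  # RoPE base frequency (assumed default)
    norm_eps: float = 1e-6       # RMSNorm epsilon (assumed default)
    tie_embeddings: bool = True  # share input/output embedding weights
```

Assuming bias-free linear layers and tied embeddings, the count works out to roughly 51.5M embedding parameters (50,257 × 1,024) plus about 12.6M per block across 16 blocks, which lines up with the reported 253M total.
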
### Architectural Innovations
- **RMSNorm**: Root Mean Square Layer Normalization for training stability
- **RoPE**: Rotary Positional Embeddings for better positional understanding
- **SwiGLU**: Swish-Gated Linear Units for improved FFN performance
- **Flash Attention**: Memory-efficient attention computation
- **Weight Tying**: Input/output embeddings shared for parameter efficiency

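A minimal PyTorch sketch of the normalization and FFN components, for readers unfamiliar with them; this is not the exact code in `model_neo.py`, and the dimensions are taken from the table above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root Mean Square LayerNorm: rescales by the RMS of activations, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * inv_rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: a SiLU-gated projection in place of a plain MLP."""
    def __init__(self, dim: int = 1024, hidden: int = 2736):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```

Flash Attention is typically reached through `torch.nn.functional.scaled_dot_product_attention` or the `flash-attn` package; which path `model_neo.py` takes is not specified here.
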
## Training Data

### Dataset
- **Source**: `tiiuae/falcon-refinedweb` (curated subset)
- **Size**: 100,000 high-quality web documents
- **Tokens**: ~41 million tokens
- **Sequence Length**: 1,024 tokens per sequence
- **Sequences**: 40,965 packed sequences

### Data Quality
- Length filtering: 200-10,000 characters
- Language detection: English only
- Quality scoring: heuristic filters to retain high-quality web content
- Deduplication: Exact and near-duplicate removal

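A simplified sketch of this kind of filtering pipeline; the actual curation scripts are not part of this repository, so the heuristics below are illustrative only.

```python
import hashlib

def keep_document(text: str) -> bool:
    """Illustrative filter: length bounds plus a crude stand-in for English detection."""
    if not (200 <= len(text) <= 10_000):        # length filtering: 200-10,000 characters
        return False
    ascii_ratio = sum(c.isascii() for c in text) / len(text)
    return ascii_ratio > 0.9                    # a real pipeline would use a language classifier

def dedup_key(text: str) -> str:
    """Exact-duplicate key; near-duplicate removal would use shingling/MinHash instead."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def filter_corpus(docs):
    """Yield documents that pass the quality filter and have not been seen before."""
    seen = set()
    for doc in docs:
        if not keep_document(doc):
            continue
        key = dedup_key(doc)
        if key not in seen:
            seen.add(key)
            yield doc
```
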
## Training Procedure

### Training Configuration
- **Hardware**: NVIDIA RTX 5070 (8GB VRAM)
- **Precision**: bfloat16 mixed precision
- **Batch Size**: 1 per device
- **Gradient Accumulation**: 32 steps
- **Effective Batch Size**: 32
- **Learning Rate**: 3e-4
- **Scheduler**: Cosine with linear warmup
- **Warmup Steps**: 3,750
- **Total Steps**: 150,000
- **Training Time**: ~4 hours

### Optimization Details
- **Optimizer**: AdamW (β₁=0.9, β₂=0.95, weight_decay=0.01)
- **Gradient Clipping**: 1.0
- **Gradient Checkpointing**: Enabled for memory efficiency
- **Loss Function**: Cross-entropy loss

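Taken together, this is the standard AdamW + cosine-warmup + gradient-accumulation recipe. The sketch below is a generic reconstruction under those settings, not the repository's actual training script; it assumes the model maps `(B, T)` token ids to `(B, T, vocab)` logits.

```python
import math
import torch
import torch.nn.functional as F

def build_optimizer_and_scheduler(model, total_steps=150_000, warmup_steps=3_750):
    """AdamW with the listed betas/weight decay, cosine decay after linear warmup."""
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4,
                            betas=(0.9, 0.95), weight_decay=0.01)

    def lr_lambda(step):
        if step < warmup_steps:                                   # linear warmup
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))         # cosine decay toward 0

    return opt, torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)

def train_step(model, micro_batches, opt, sched, accum=32):
    """One optimizer step from 32 micro-batches of batch size 1 (effective batch size 32)."""
    opt.zero_grad(set_to_none=True)
    for input_ids in micro_batches:                               # each: (1, 1024) token ids
        with torch.autocast("cuda", dtype=torch.bfloat16):        # bfloat16 mixed precision
            logits = model(input_ids)                             # assumed (B, T, vocab) output
            loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                                   input_ids[:, 1:].reshape(-1))  # next-token cross-entropy
        (loss / accum).backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)       # gradient clipping at 1.0
    opt.step()
    sched.step()
```
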
### Context Extension
- **Base Context**: 2,048 tokens
- **Extended Context**: 16,384 tokens
- **Method**: Linear interpolation of positional embeddings
- **Validation**: Successfully tested up to 3,600 tokens

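For RoPE models, linear interpolation usually means compressing positions by the extension factor (here 16,384 / 2,048 = 8) before computing the rotary angles. The exact scaling used in this repository is not shown, so the sketch below is an assumption of that standard approach.

```python
import torch

def rope_angles(positions: torch.Tensor, head_dim: int = 64,
                base: float = 10000.0, scale: float = 16384 / 2048):
    """RoPE angles with linear position interpolation: positions are divided by `scale`
    so 16,384-token sequences reuse the rotation range seen during 2,048-token training."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    scaled_pos = positions.float() / scale            # the interpolation step
    angles = torch.outer(scaled_pos, inv_freq)        # (seq_len, head_dim // 2)
    return angles.cos(), angles.sin()                 # applied to queries/keys per head
```
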
## Performance

### Training Metrics
- **Final Loss**: 3.907
- **Training Speed**: ~10 iterations/second
- **Peak Memory**: ~8GB VRAM
- **Convergence**: Smooth loss curve, no overfitting

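For context, a final cross-entropy loss of 3.907 corresponds to a perplexity of exp(3.907) ≈ 50 on the training distribution.
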
### Inference Performance
- **Speed**: ~10 tokens/second (RTX 5070)
- **Memory Usage**: 1.3GB for 1,800-token context
- **Context Limit**: ~3,600 tokens practical limit
- **Temperature**: Recommended 0.7-0.9 for creative tasks

## Usage

### Quick Start
```python
import torch
from transformers import AutoTokenizer
from model_neo import NeoMini, NeoMiniConfig

# Load model
config = NeoMiniConfig()
model = NeoMini(config)
checkpoint = torch.load("extended_context_model.pt", map_location="cpu")
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Generate text
prompt = "The future of AI is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(input_ids, max_length=100, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))  # first (and only) sequence in the batch
```

### Interactive Chat
```bash
python interactive_chat.py
```

### Generation Parameters
- **Temperature**: 0.7-0.9 for creative tasks, 0.3-0.5 for factual
- **Top-k**: 40-50
- **Top-p**: 0.8-0.9
- **Repetition Penalty**: 1.1-1.3

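If `NeoMini.generate` does not expose all of these knobs, the same strategy can be applied to raw logits. The step below is an illustrative, generic implementation of temperature, top-k, top-p, and repetition-penalty sampling, not the model's actual API.

```python
import torch

def sample_next_token(logits: torch.Tensor, generated_ids: list[int],
                      temperature=0.8, top_k=40, top_p=0.9, repetition_penalty=1.2) -> int:
    """One sampling step over a (vocab,) logits vector; generated_ids are prior token ids."""
    logits = logits.clone()
    # Repetition penalty: dampen logits of tokens already generated.
    for tok in set(generated_ids):
        logits[tok] = logits[tok] / repetition_penalty if logits[tok] > 0 else logits[tok] * repetition_penalty
    logits = logits / temperature
    # Top-k: keep only the k most likely tokens.
    topk_vals, topk_idx = torch.topk(logits, top_k)
    probs = torch.softmax(topk_vals, dim=-1)
    # Top-p (nucleus): keep the smallest set of tokens whose cumulative mass stays within top_p.
    sorted_probs, order = torch.sort(probs, descending=True)
    keep = torch.cumsum(sorted_probs, dim=-1) <= top_p
    keep[0] = True                                    # always keep the single best token
    kept = order[keep]
    probs = torch.zeros_like(probs).scatter_(0, kept, probs[kept])
    probs = probs / probs.sum()
    choice = torch.multinomial(probs, 1)
    return topk_idx[choice].item()
```
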
## Limitations

### Current Limitations
- **Base Model Only**: Not instruction-tuned (requires fine-tuning for chat)
- **Context Window**: Practical limit of ~3,600 tokens despite the 16K architecture
- **Hardware Requirements**: A CUDA-capable GPU is needed for reasonable speed
- **Knowledge**: No fixed knowledge cutoff; the model only reflects patterns in its web training data

### Known Issues
- Occasionally generates repetitive patterns (mitigated by fine-tuning or a repetition penalty)
- May not follow instructions well (expected base-model behavior)
- Sometimes produces formatting artifacts inherited from web data

## Ethical Considerations

### Bias and Fairness
- Trained on web data, which may contain societal biases
- No explicit bias mitigation applied during training
- Users should be aware of potentially biased outputs

### Use Cases
**Intended Uses:**
- Research and experimentation
- Text generation and completion
- Creative writing assistance
- Educational purposes

**Out-of-Scope Uses:**
- Medical or legal advice
- High-stakes decision making
- Content that could cause harm

## Environmental Impact

### Carbon Footprint
- **Training Hardware**: Single RTX 5070 (200W)
- **Training Time**: 4 hours
- **Estimated CO₂**: ~0.3 kg CO₂ equivalent
- **Efficiency**: 253M parameters per 0.3 kg CO₂

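This estimate is consistent with roughly 200 W × 4 h = 0.8 kWh of GPU energy at a grid intensity of about 0.35-0.4 kg CO₂/kWh (the intensity figure is an assumption, not a measurement).
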
## Model Card Authors

Antony Austin - model development and training
30/08/2025 - model card creation

## Citation

```bibtex
@misc{mapneo_mini_2025,
  title={MAP-NEO Mini: An Efficient 253M Parameter Language Model},
  author={Antony Austin},
  year={2025},
  howpublished={\url{https://huggingface.co/Austin207/map-neo-mini}},
  note={Trained on NVIDIA RTX 5070 with RefinedWeb data}
}
```

## Technical Details

### Files Structure
```
map-neo-mini/
├── config.json              # Model configuration
├── pytorch_model.bin        # Model weights
├── tokenizer.json           # Tokenizer configuration
├── tokenizer_config.json    # Tokenizer metadata
├── special_tokens_map.json  # Special tokens
├── vocab.json               # Vocabulary
├── merges.txt               # BPE merges
└── model_neo.py             # Model architecture code
```

### Hardware Requirements
- **Minimum**: 4GB VRAM for inference
- **Recommended**: 8GB VRAM for extended context
- **Training**: 8GB+ VRAM with mixed precision
- **CPU**: Any modern CPU (inference possible but slow)

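These figures line up with the model's size: 253M parameters in bfloat16 occupy roughly 253M × 2 bytes ≈ 0.5 GB for weights alone, with the remainder of the reported 1.3 GB going to the KV cache and activations (a rough back-of-the-envelope split, not a measurement).
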
## Future Work

### Planned Improvements
- [ ] Conversational fine-tuning with UltraChat dataset
- [ ] Instruction following capabilities
- [ ] Multi-language support
- [ ] Quantized versions (4-bit, 8-bit)
- [ ] ONNX export for edge deployment

### Research Directions
- Context window optimization beyond 16K
- More efficient attention mechanisms
- Improved training data curation
- Specialized domain fine-tuning

## Acknowledgments

- **Falcon RefinedWeb**: High-quality training data
- **Hugging Face**: Transformers library and infrastructure
- **Community**: Open-source ML community for architectural insights

---

**Last Updated**: August 30, 2025
**Model Version**: 1.0.0
**Status**: Base model (pre-conversational fine-tuning)