---
language: es
license: apache-2.0
tags:
- text-generation
- transformer
- pytorch
---
# MTP Mini - Language Model

A transformer language model trained with the following characteristics:
## Architecture

- **Parameters**: ~35.6M
- **Vocabulary**: 4,000 tokens
- **Layers**: 8
- **Hidden dimension**: 512
- **Attention heads**: 8
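The ~35.6M figure is consistent with a standard decoder stack under a few assumptions not stated in this card (tied input/output embeddings, SwiGLU hidden size of 4×d_model, no biases; RMSNorm weights, which add only a few thousand parameters, are ignored). A back-of-the-envelope check:

```python
# Rough parameter count for the architecture listed above.
# Assumptions (not confirmed by the model card): tied embeddings,
# SwiGLU hidden size = 4 * d_model, biasless linear layers, norm weights ignored.
vocab_size = 4000
d_model = 512
n_layers = 8

embeddings = vocab_size * d_model          # shared with the output head
attention = 4 * d_model * d_model          # Q, K, V and output projections
swiglu_ffn = 3 * d_model * (4 * d_model)   # gate, up and down projections
per_layer = attention + swiglu_ffn

total = embeddings + n_layers * per_layer
print(f"~{total / 1e6:.1f}M parameters")   # ~35.6M
```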
## Implemented improvements

- ✅ RoPE (Rotary Position Embedding)
- ✅ RMSNorm
- ✅ SwiGLU activation
- ✅ Label smoothing
- ✅ Repetition penalty
- ✅ Early stopping
- ✅ Length control
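Two of the listed techniques are simple enough to sketch framework-free. These are illustrative toy versions operating on plain Python lists, not the model's actual modules:

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale by the root-mean-square, with no mean subtraction (toy version)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def repetition_penalty(logits, generated_ids, penalty=1.2):
    """Discourage already-generated tokens by shrinking their logits (toy version)."""
    out = list(logits)
    for tid in set(generated_ids):
        # Divide positive logits and multiply negative ones,
        # so both move toward lower probability.
        out[tid] = out[tid] / penalty if out[tid] > 0 else out[tid] * penalty
    return out
```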
## Usage

```python
import torch
import pickle

# Load the checkpoint (pickle can execute arbitrary code; only load trusted files)
with open('mtp_mini.pkl', 'rb') as f:
    model_data = pickle.load(f)

# Load the tokenizer
from tokenizer import MTPTokenizer
tokenizer = MTPTokenizer('mtp_tokenizer.model')

# Build the model and restore its weights
from model import MTPMiniModel
model = MTPMiniModel(**model_data['config']['model'])
model.load_state_dict(model_data['model_state_dict'])
model.eval()

# Generate text (the model is trained on Spanish, so prompts should be in Spanish)
prompt = "¿Qué es la inteligencia artificial?"
input_ids = torch.tensor([tokenizer.encode(prompt)])
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0].tolist()))
```
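The `generate` call above presumably wraps a token-by-token loop. A framework-free sketch of a single decoding step with temperature sampling (the function name and defaults are illustrative, not the model's actual API):

```python
import math
import random

def sample_next_token(logits, temperature=0.8, rng=None):
    """Pick the next token id from raw logits via temperature-scaled softmax sampling."""
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    r = rng.random() * sum(exps)              # inverse-CDF sampling over unnormalized weights
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e
        if r <= acc:
            return i
    return len(exps) - 1
```

Lower temperatures sharpen the distribution toward the argmax; higher ones flatten it toward uniform sampling.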
## Training

- Dataset: custom Spanish corpus
- Epochs: 0
- Best validation loss: 3.7816

Trained on Google Colab.