---
language: es
license: apache-2.0
tags:
- text-generation
- transformer
- pytorch
---

# MTP Mini - 20x Improved Model

Transformer text-generation model with an updated architecture (RoPE, RMSNorm, SwiGLU), trained on a T4 GPU.

## Architecture
- **Parameters**: ~310.7M (310,708,225)
- **Vocabulary**: 8,000 tokens
- **Layers**: 24
- **Hidden dimension**: 1024
- **Context length**: 2,048 tokens

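
The card does not break the parameter total down. As a sanity check, the sketch below shows one decomposition that reproduces 310,708,225 exactly. Everything beyond the listed vocabulary size, layer count, and dimension is an assumption not confirmed by the source: tied input/output embeddings, bias-free attention and SwiGLU projections, a SwiGLU hidden size of `int(8 * 1024 / 3) = 2730`, and a small confidence head (1024 → 512 → 1, with biases). The names `ffn_hidden` and `conf_hidden` are hypothetical.

```python
def count_params(vocab=8000, d_model=1024, n_layers=24,
                 ffn_hidden=2730, conf_hidden=512):
    """Estimate the parameter count under the assumptions stated above."""
    # Token embedding, assumed to be shared with the output projection.
    embedding = vocab * d_model
    # Q, K, V and output projections, assumed bias-free. RoPE adds no weights.
    attention = 4 * d_model * d_model
    # SwiGLU feed-forward: gate, up and down projections, assumed bias-free.
    swiglu = 3 * d_model * ffn_hidden
    # Two RMSNorm gain vectors per layer (pre-attention and pre-FFN).
    norms = 2 * d_model
    per_layer = attention + swiglu + norms
    # Final RMSNorm before the output projection.
    final_norm = d_model
    # Assumed confidence head: Linear(d_model, conf_hidden) + Linear(conf_hidden, 1).
    confidence_head = (d_model * conf_hidden + conf_hidden) + (conf_hidden + 1)
    return embedding + n_layers * per_layer + final_norm + confidence_head


total = count_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Matching the total does not prove the layout; other combinations of sizes could in principle give the same sum.
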
## Improvements
- ✅ RoPE, RMSNorm, SwiGLU
- ✅ Flash Attention
- ✅ Gradient checkpointing
- ✅ FP16 mixed precision
- ✅ Anti-hallucination
- ✅ Confidence scoring

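
For readers unfamiliar with the first three items, here is a minimal dependency-free sketch of what RMSNorm, the SwiGLU gate, and RoPE compute on a single vector. It is illustrative only and is not the code in `model.py`. The function names are hypothetical, and the interleaved-pair RoPE layout and `base=10000.0` are assumptions (some implementations split the vector into halves instead).

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then apply a learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_gate(gate, up):
    """SwiGLU gating: SiLU(gate) * up, element-wise.

    `gate` and `up` stand for two separate linear projections of the same input.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


def rope(x, position, base=10000.0):
    """RoPE: rotate each pair (x[i], x[i+1]) by position * base**(-i/d), i even."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
# Attention scores depend only on the relative offset between positions.
print(dot(rope(q, 3), rope(k, 1)), dot(rope(q, 5), rope(k, 3)))
```

The two printed scores agree up to floating-point error, which is the property that lets RoPE encode relative position without learned position embeddings.
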
## Usage
```python
import pickle

import torch

from model import MTPMiniModel
from tokenizer import MTPTokenizer

# Load the checkpoint (config + weights). Only unpickle files you trust.
with open('mtp_mini.pkl', 'rb') as f:
    data = pickle.load(f)

tokenizer = MTPTokenizer('mtp_tokenizer.model')
model = MTPMiniModel(**data['config']['model'])
model.load_state_dict(data['model_state_dict'])
model.eval()  # disable dropout for inference

prompt = "¿Qué es la IA?"
# torch.tensor([...]) already adds the batch dimension: shape (1, seq_len)
ids = torch.tensor([tokenizer.encode(prompt)])
with torch.no_grad():  # no gradients needed for generation
    output = model.generate(ids, max_new_tokens=150)
print(tokenizer.decode(output[0].tolist()))
```

Trained on Google Colab with a T4 GPU.

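
To put the T4 (16 GB of memory) in context, the arithmetic below estimates how much memory the stated parameter count needs. The 16 bytes per parameter for training is a common rule of thumb for Adam-style mixed-precision training (weights, gradients, and two optimizer moments); it is an assumption, since the card does not name the optimizer, and it excludes activations.

```python
N_PARAMS = 310_708_225  # parameter count stated in the Architecture section


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


fp16_weights = gib(N_PARAMS * 2)   # 2 bytes per FP16 weight
fp32_weights = gib(N_PARAMS * 4)   # 4 bytes per FP32 weight
train_state = gib(N_PARAMS * 16)   # assumed: ~16 bytes/param, activations excluded

print(f"FP16 weights:   {fp16_weights:.2f} GiB")
print(f"FP32 weights:   {fp32_weights:.2f} GiB")
print(f"Training state: {train_state:.2f} GiB")
```

Under these assumptions the weights and optimizer state fit well within 16 GB, leaving the rest for activations, which gradient checkpointing helps reduce.
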