# Custom LLM Model

This is a small, custom-built character-level language model trained on a handful of example sentences about AI and machine learning.

## Model Description

- **Model Type**: Transformer-based language model
- **Vocabulary Size**: 40 characters
- **Hidden Size**: 256
- **Number of Layers**: 4
- **Number of Attention Heads**: 8
- **Feedforward Size**: 1024
- **Max Sequence Length**: 64
- **Training Epochs**: 10
- **Parameters**: ~3.2M

## Training Data

The model was trained on a small custom dataset of 10 example sentences covering:

- Greetings and small talk
- Weather descriptions
- Machine learning concepts
- Deep learning and transformers
- Natural language processing
- Model publishing and sharing

## Usage

```python
import torch

# CharacterTokenizer must be importable so the pickled tokenizer can be restored.
from train_model import TransformerLM, CharacterTokenizer

# Load the saved checkpoint. It stores a pickled tokenizer object, so
# weights_only=False is required on PyTorch 2.6+ (where the default is True).
# Only load checkpoints you trust this way.
checkpoint = torch.load('custom_llm_model.pth', map_location='cpu', weights_only=False)
model_config = checkpoint['model_config']
tokenizer = checkpoint['tokenizer']

# Initialize the model and restore its weights
model = TransformerLM(**model_config)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Max sequence length from the model description above
MAX_SEQ_LEN = 64


def generate_text(prompt, max_length=50, temperature=0.8):
    """Sample up to max_length characters after the prompt, stopping at a period."""
    # Tokenize the prompt
    input_ids = tokenizer.encode(prompt, max_length=32, padding=False, return_tensors='pt')
    generated = input_ids.clone()

    # Token id that ends generation (falls back to the unknown token if '.' is missing)
    stop_id = tokenizer.char_to_idx.get('.', tokenizer.unk_token_id)

    with torch.no_grad():
        for _ in range(max_length):
            # Feed only the most recent tokens so the input never exceeds
            # the model's maximum sequence length
            logits = model(generated[:, -MAX_SEQ_LEN:])
            next_token_logits = logits[0, -1, :] / temperature
            probs = torch.softmax(next_token_logits, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)
            generated = torch.cat([generated, next_token.unsqueeze(0)], dim=1)

            # Stop on a period; otherwise continue until max_length
            if next_token.item() == stop_id:
                break

    return tokenizer.decode(generated[0])


# Example usage
print(generate_text("Hello"))
print(generate_text("The weather"))
print(generate_text("Deep learning"))
```

## Limitations

This is a small demonstration model trained on very limited data.
For serious applications, consider:

- Using larger datasets
- Training for more epochs
- Using larger model architectures
- Implementing proper tokenization (BPE, WordPiece, etc.)

## License

This model is released under the MIT License.
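
## Parameter Count Check

The ~3.2M parameter figure in the model description can be roughly reproduced from the listed hyperparameters. The sketch below is not taken from `train_model.py`; it assumes a conventional Transformer layout (biased Q/K/V/output projections, a two-layer feedforward block, two LayerNorms per layer, learned token and position embeddings, and an untied output head). `estimate_params` is a hypothetical helper, and the real model may differ slightly.

```python
def estimate_params(vocab_size=40, hidden=256, layers=4, ff=1024, max_len=64):
    """Rough parameter count for a standard Transformer stack (assumed layout)."""
    # Self-attention: Q, K, V and output projections, each hidden x hidden plus bias.
    attention = 4 * (hidden * hidden + hidden)
    # Feedforward: hidden -> ff -> hidden, with biases.
    feedforward = (hidden * ff + ff) + (ff * hidden + hidden)
    # Two LayerNorms per layer, each with a scale and a shift vector.
    layer_norms = 2 * (2 * hidden)
    per_layer = attention + feedforward + layer_norms

    # Assumed: learned token and position embeddings, untied output head with bias.
    token_embedding = vocab_size * hidden
    position_embedding = max_len * hidden
    output_head = hidden * vocab_size + vocab_size

    return layers * per_layer + token_embedding + position_embedding + output_head


total = estimate_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the four Transformer layers account for almost all of the parameters; the 40-character vocabulary contributes very little. The number of attention heads does not change the count, because the heads split the same projection matrices.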