Krishnakanth1993 committed
Commit 0ede4e9
1 Parent(s): 71a75cd

Initial commit

Files changed (5)
  1. .gitignore +64 -0
  2. README.md +239 -0
  3. app.py +163 -0
  4. inference.py +101 -0
  5. model.py +216 -0
.gitignore ADDED
@@ -0,0 +1,64 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ venv/
+ env/
+ ENV/
+ .venv
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # Model checkpoints
+ *.pth
+ *.pt
+ *.ckpt
+ *.bin
+ *.safetensors
+
+ # Data files
+ input.txt
+ *.txt
+ *.csv
+ *.json
+
+ # Jupyter Notebook
+ .ipynb_checkpoints/
+ *.ipynb_checkpoints/
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Logs
+ *.log
+ logs/
+
+ # Hugging Face cache
+ .cache/
+ hf_cache/
+
README.md CHANGED
@@ -11,3 +11,242 @@ license: mit
  ---
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Sentence Completion with GPT
+
+ A Gradio web application for sentence completion using a custom GPT model architecture. This app can use either a trained model checkpoint or pretrained GPT-2 weights.
+
+ ## Features
+
+ - **Sentence Completion**: Generate text completions for any given prompt
+ - **Customizable Generation**: Control generation parameters (temperature, top-k, max tokens)
+ - **Model Flexibility**: Supports both saved trained models and pretrained GPT-2
+ - **Easy Deployment**: Ready for deployment on Hugging Face Spaces
+
+ ## Model Architecture
+
+ This app uses a custom GPT implementation based on the GPT-2 architecture:
+ - **Parameters**: ~124M (for the gpt2 base model)
+ - **Vocab Size**: 50,257 tokens
+ - **Block Size**: 1024 tokens (max sequence length)
+ - **Architecture**: 12 layers, 12 attention heads, 768 embedding dimension
+
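The ~124M figure can be sanity-checked directly from these hyperparameters. A quick back-of-the-envelope count, assuming the token-embedding and output-head weights are tied (as they are in `model.py`):

```python
# GPT-2 base hyperparameters (matching GPTConfig in model.py)
vocab_size, block_size, n_layer, n_embd = 50257, 1024, 12, 768

wte = vocab_size * n_embd  # token embedding table (tied with lm_head, so counted once)
wpe = block_size * n_embd  # position embedding table
# per transformer block: qkv projection, attention output projection,
# two MLP linears (4x expansion), and two LayerNorms (weight + bias each)
attn = n_embd * 3 * n_embd + 3 * n_embd + n_embd * n_embd + n_embd
mlp = n_embd * 4 * n_embd + 4 * n_embd + 4 * n_embd * n_embd + n_embd
ln = 2 * 2 * n_embd
blocks = n_layer * (attn + mlp + ln)
ln_f = 2 * n_embd  # final LayerNorm

total = wte + wpe + blocks + ln_f
print(f"{total:,}")  # 124,439,808
```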
+ ## Environment Setup
+
+ ### Prerequisites
+
+ - Python 3.8 or higher
+ - pip (Python package manager)
+ - (Optional) CUDA-enabled GPU for faster inference
+
+ ### Step 1: Clone or Download the Repository
+
+ ```bash
+ git clone <repository-url>
+ cd first_llm_124
+ ```
+
+ Or download and extract the project files to a directory.
+
+ ### Step 2: Create a Virtual Environment (Recommended)
+
+ Using a virtual environment helps avoid conflicts with other projects:
+
+ **On Windows:**
+ ```bash
+ python -m venv venv
+ venv\Scripts\activate
+ ```
+
+ **On macOS/Linux:**
+ ```bash
+ python3 -m venv venv
+ source venv/bin/activate
+ ```
+
+ ### Step 3: Install Dependencies
+
+ Install all required packages from the requirements file:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ Or install packages individually (quote the specifiers so the shell does not treat `>` as a redirect):
+ ```bash
+ pip install "gradio>=4.0.0"
+ pip install "torch>=2.0.0"
+ pip install "transformers>=4.30.0"
+ pip install "tiktoken>=0.5.0"
+ pip install "huggingface_hub>=0.34.0"
+ ```
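For reference, a `requirements.txt` matching the pins above would look like this (the file itself is referenced by the app but is not part of this commit):

```text
gradio>=4.0.0
torch>=2.0.0
transformers>=4.30.0
tiktoken>=0.5.0
huggingface_hub>=0.34.0
```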
+
+ ### Step 4: Verify Installation
+
+ Verify that all packages are installed correctly:
+
+ ```bash
+ python -c "import torch; import gradio; import transformers; import tiktoken; print('All packages installed successfully!')"
+ ```
+
+ ### Step 5: Prepare Model Directory (Optional)
+
+ If you have a trained model, create a `model` directory and place your checkpoint there:
+
+ ```bash
+ mkdir model
+ # Place your model.pth file in the model/ directory
+ ```
+
+ ## Installation
+
+ 1. Follow the [Environment Setup](#environment-setup) steps above
+ 2. Ensure all dependencies are installed
+ 3. (Optional) Place your trained model checkpoint in the `model/` directory
+
+ ## Usage
+
+ ### Running Locally
+
+ ```bash
+ python app.py
+ ```
+
+ The app will start a local server. Open the provided URL in your browser.
+
+ ### Model Loading
+
+ The app automatically tries to load models in this order:
+ 1. A saved checkpoint file (it checks `./model/model.pth`, `model.pt`, `checkpoint.pth`, `checkpoint.pt`, and `gpt_model.pth`)
+ 2. Pretrained GPT-2 from Hugging Face (fallback)
+
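The lookup is a simple first-match scan over those candidate paths; a self-contained sketch mirroring the logic in `app.py`:

```python
import os

# Candidate checkpoint locations, checked in order (the same list app.py uses)
CHECKPOINT_PATHS = [
    './model/model.pth',
    'model.pt',
    'checkpoint.pth',
    'checkpoint.pt',
    'gpt_model.pth',
]


def find_checkpoint(paths=CHECKPOINT_PATHS):
    """Return the first existing checkpoint path, or None to fall back to pretrained GPT-2."""
    for path in paths:
        if os.path.exists(path):
            return path
    return None
```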
+ ### Saving a Trained Model
+
+ If you have a trained model, you can save it using:
+
+ ```python
+ import torch
+ import os
+
+ # Create the model directory if it doesn't exist
+ os.makedirs('model', exist_ok=True)
+
+ # After training your model, save the checkpoint
+ checkpoint = {
+     'model_state_dict': model.state_dict(),
+     'config': {
+         'block_size': model.config.block_size,
+         'vocab_size': model.config.vocab_size,
+         'n_layer': model.config.n_layer,
+         'n_head': model.config.n_head,
+         'n_embd': model.config.n_embd,
+     }
+ }
+ torch.save(checkpoint, './model/model.pth')
+ print("Model saved successfully to ./model/model.pth!")
+ ```
+
+ ### Loading a Saved Model
+
+ Place your saved model checkpoint (`.pth` or `.pt` file) in the `model/` directory. The app will automatically detect and load it from `./model/model.pth`.
+
+ ## Parameters
+
+ - **Max Tokens**: Maximum number of tokens to generate (10-200)
+ - **Top-K**: Sample from the top K most likely tokens (1-100). Lower values make the output more focused.
+ - **Temperature**: Controls the randomness of the output (0.1-2.0). Lower values make the output more deterministic, higher values more creative.
+
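The two sampling knobs interact in a single generation step: temperature rescales the logits, then top-k discards everything outside the k most likely tokens before sampling. A dependency-free sketch of that step (the real implementation in `inference.py` does the same thing with `torch.topk` and `torch.multinomial`):

```python
import math
import random


def sample_next_token(logits, top_k=50, temperature=1.0):
    """Pick one token index from raw logits using temperature + top-k sampling."""
    scaled = [l / temperature for l in logits]
    # keep only the top_k highest logits; everything else gets probability 0
    cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
    masked = [l if l >= cutoff else float('-inf') for l in scaled]
    # softmax over the surviving logits (exp(-inf) == 0.0)
    m = max(masked)
    weights = [math.exp(l - m) for l in masked]
    return random.choices(range(len(weights)), weights=weights, k=1)[0]


# with top_k=1 the choice is deterministic: always the argmax
print(sample_next_token([1.0, 5.0, 2.0], top_k=1))  # 1
```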
+ ## Project Structure
+
+ ```
+ .
+ ├── app.py             # Gradio interface (main entry point)
+ ├── model.py           # GPT model architecture
+ ├── inference.py       # Model loading and text generation utilities
+ ├── requirements.txt   # Python dependencies
+ ├── README.md          # This file
+ ├── llm_trainer.ipynb  # Jupyter notebook for training
+ ├── input.txt          # Training data (optional)
+ ├── model/             # (Optional) Directory for saved model checkpoints
+ │   └── model.pth      # Saved model checkpoint
+ └── venv/              # Virtual environment (created during setup)
+ ```
+
+ ## Deployment to Hugging Face Spaces
+
+ 1. Create a new Space on [Hugging Face Spaces](https://huggingface.co/spaces)
+ 2. Upload all files from this project (except `venv/` and `__pycache__/`)
+ 3. Set the Space SDK to **Gradio**
+ 4. Add your model checkpoint file in the `model/` directory (if using a trained model)
+ 5. The Space will automatically install dependencies and launch the app
+
+ ### For Hugging Face Spaces
+
+ The app will automatically:
+ - Use the GPU if one is available, otherwise the CPU
+ - Load pretrained GPT-2 if no checkpoint is found
+ - Handle model loading errors gracefully
+
+ ## Model Training
+
+ To train your own model, use the `llm_trainer.ipynb` notebook. After training, save the model:
+
+ ```python
+ import torch
+ import os
+
+ # Create the model directory if it doesn't exist
+ os.makedirs('model', exist_ok=True)
+
+ # Save the model checkpoint
+ checkpoint = {
+     'model_state_dict': model.state_dict(),
+     'config': {
+         'block_size': model.config.block_size,
+         'vocab_size': model.config.vocab_size,
+         'n_layer': model.config.n_layer,
+         'n_head': model.config.n_head,
+         'n_embd': model.config.n_embd,
+     }
+ }
+ torch.save(checkpoint, './model/model.pth')
+ print("Model saved successfully!")
+ ```
+
+ Then place `model.pth` in the `model/` directory for automatic loading.
+
+ ## Troubleshooting
+
+ ### Common Issues
+
+ 1. **Import Errors**:
+    - Ensure all dependencies are installed: `pip install -r requirements.txt`
+    - Make sure your virtual environment is activated
+
+ 2. **Model Not Found**:
+    - Check that the model checkpoint is in the correct directory: `./model/model.pth`
+    - Verify the file exists: `ls model/model.pth` (Linux/macOS) or `dir model\model.pth` (Windows)
+
+ 3. **CUDA Out of Memory**:
+    - The app will automatically fall back to CPU if GPU memory is insufficient
+    - Reduce the Max Tokens parameter in the interface
+
+ 4. **Module Not Found**:
+    - Reinstall dependencies: `pip install -r requirements.txt --upgrade`
+    - Check the Python version: `python --version` (should be 3.8+)
+
+ 5. **Port Already in Use**:
+    - Change the port in `app.py`: `demo.launch(server_port=7861)`
+    - Or stop the process using the port
+
+ ## License
+
+ This project uses the GPT-2 architecture and can load pretrained GPT-2 weights from Hugging Face, which are subject to OpenAI's GPT-2 license.
+
+ ## Notes
+
+ - The model uses tiktoken's 'gpt2' encoding
+ - Generation uses top-k sampling with temperature control
+ - Maximum sequence length is 1024 tokens
+
app.py ADDED
@@ -0,0 +1,163 @@
+ """
+ Gradio App for Sentence Completion
+ Main entry point for Hugging Face Spaces
+ """
+
+ import os
+
+ import gradio as gr
+ import torch
+ from inference import load_model, generate_text, get_device
+
+
+ # Global model and device, initialized at startup
+ model = None
+ device = None
+
+
+ def initialize_model(model_path=None, pretrained_model='gpt2'):
+     """Initialize the model on startup"""
+     global model, device
+     try:
+         model, device = load_model(model_path=model_path, pretrained_model=pretrained_model)
+         return f"Model loaded successfully on device: {device}"
+     except Exception as e:
+         return f"Error loading model: {str(e)}"
+
+
+ def complete_sentence(prompt, max_tokens, top_k, temperature):
+     """Generate sentence completion based on prompt"""
+     global model, device
+
+     if model is None:
+         return "Error: Model not loaded. Please restart the app."
+
+     if not prompt.strip():
+         return "Please enter a prompt to complete."
+
+     try:
+         # Ensure device is current
+         if device != get_device():
+             device = get_device()
+             model = model.to(device)
+
+         # Generate completion
+         generated_text = generate_text(
+             prompt=prompt,
+             model=model,
+             max_tokens=max_tokens,
+             top_k=top_k,
+             temperature=temperature,
+             device=device
+         )
+
+         return generated_text
+     except Exception as e:
+         return f"Error generating text: {str(e)}"
+
+
+ def create_interface():
+     """Create and return the Gradio interface"""
+
+     # Initialize the model on startup:
+     # try to load from common checkpoint paths, fall back to pretrained GPT-2
+     checkpoint_paths = [
+         './model/model.pth',
+         'model.pt',
+         'checkpoint.pth',
+         'checkpoint.pt',
+         'gpt_model.pth',
+     ]
+
+     model_path = None
+     for path in checkpoint_paths:
+         if os.path.exists(path):
+             model_path = path
+             break
+
+     status = initialize_model(model_path=model_path, pretrained_model='gpt2')
+     print(status)
+
+     # Create the Gradio interface
+     with gr.Blocks(title="Sentence Completion with GPT") as demo:
+         gr.Markdown(
+             """
+             # Sentence Completion with GPT
+
+             Enter a prompt and the model will complete the sentence for you.
+             Adjust the parameters to control the generation behavior.
+             """
+         )
+
+         with gr.Row():
+             with gr.Column(scale=2):
+                 prompt_input = gr.Textbox(
+                     label="Prompt",
+                     placeholder="Enter your prompt here...",
+                     lines=3,
+                     value="The future of artificial intelligence is"
+                 )
+
+                 with gr.Row():
+                     max_tokens_slider = gr.Slider(
+                         minimum=10,
+                         maximum=200,
+                         value=50,
+                         step=10,
+                         label="Max Tokens"
+                     )
+
+                     top_k_slider = gr.Slider(
+                         minimum=1,
+                         maximum=100,
+                         value=50,
+                         step=1,
+                         label="Top-K"
+                     )
+
+                     temperature_slider = gr.Slider(
+                         minimum=0.1,
+                         maximum=2.0,
+                         value=1.0,
+                         step=0.1,
+                         label="Temperature"
+                     )
+
+                 generate_btn = gr.Button("Generate", variant="primary")
+
+             with gr.Column(scale=2):
+                 output_text = gr.Textbox(
+                     label="Generated Text",
+                     lines=10,
+                     interactive=False
+                 )
+
+         gr.Markdown(
+             """
+             ### Parameters:
+             - **Max Tokens**: Maximum number of tokens to generate
+             - **Top-K**: Sample from top K most likely tokens (lower = more focused)
+             - **Temperature**: Controls randomness (lower = more deterministic, higher = more creative)
+             """
+         )
+
+         # Generate on button click
+         generate_btn.click(
+             fn=complete_sentence,
+             inputs=[prompt_input, max_tokens_slider, top_k_slider, temperature_slider],
+             outputs=output_text
+         )
+
+         # Also generate on Enter key press
+         prompt_input.submit(
+             fn=complete_sentence,
+             inputs=[prompt_input, max_tokens_slider, top_k_slider, temperature_slider],
+             outputs=output_text
+         )
+
+     return demo
+
+
+ if __name__ == "__main__":
+     demo = create_interface()
+     demo.launch(share=False)
+
inference.py ADDED
@@ -0,0 +1,101 @@
+ """
+ Inference and Model Loading Utilities
+ """
+
+ import os
+ import torch
+ from torch.nn import functional as F
+ import tiktoken
+ from model import GPT, GPTConfig
+
+
+ def get_device():
+     """Auto-detect and return the best available device"""
+     if torch.cuda.is_available():
+         return 'cuda'
+     elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
+         return 'mps'
+     else:
+         return 'cpu'
+
+
+ def load_model(model_path=None, pretrained_model='gpt2', device=None):
+     """
+     Load a model with priority: saved checkpoint > pretrained model.
+
+     Args:
+         model_path: Path to a saved model checkpoint (.pth or .pt file)
+         pretrained_model: Hugging Face model name to fall back to ('gpt2', 'gpt2-medium', etc.)
+         device: Device to load the model on (auto-detected if None)
+
+     Returns:
+         Tuple of (loaded model, device)
+     """
+     if device is None:
+         device = get_device()
+
+     # Try to load a saved checkpoint first
+     if model_path and os.path.exists(model_path):
+         try:
+             print(f"Loading saved model from {model_path}...")
+             model = GPT.load_checkpoint(model_path, device=device)
+             return model, device
+         except Exception as e:
+             print(f"Failed to load saved model: {e}")
+             print(f"Falling back to pretrained model: {pretrained_model}")
+
+     # Fall back to the pretrained model
+     print(f"Loading pretrained model: {pretrained_model}...")
+     try:
+         model = GPT.from_pretrained(pretrained_model)
+         model.to(device)
+         return model, device
+     except Exception as e:
+         print(f"Failed to load pretrained model: {e}")
+         # Last resort: create an untrained model with the default config
+         print("Creating model with default config...")
+         config = GPTConfig()
+         model = GPT(config)
+         model.to(device)
+         return model, device
+
+
+ def generate_text(prompt, model, max_tokens=50, top_k=50, temperature=1.0, device='cpu'):
+     """
+     Generate a text completion for a given prompt using the GPT model.
+
+     Args:
+         prompt: Input text prompt
+         model: GPT model instance
+         max_tokens: Maximum number of tokens to generate
+         top_k: Top-k sampling parameter (None for no top-k filtering)
+         temperature: Temperature for sampling (higher = more random)
+         device: Device to run inference on
+
+     Returns:
+         Generated text string (including the original prompt)
+     """
+     enc = tiktoken.get_encoding("gpt2")
+     model.eval()
+
+     with torch.no_grad():
+         # Tokenize the prompt
+         input_ids = enc.encode(prompt)
+         x = torch.tensor(input_ids, dtype=torch.long, device=device).unsqueeze(0)
+
+         for _ in range(max_tokens):
+             # Crop the context to the model's block size if needed
+             x_cond = x if x.size(1) <= model.config.block_size else x[:, -model.config.block_size:]
+             logits, _ = model(x_cond)
+             logits = logits[:, -1, :] / temperature
+
+             # Keep only the top-k logits; mask the rest to -inf before softmax
+             if top_k is not None:
+                 topk = torch.topk(logits, top_k, dim=-1)
+                 mask = logits < topk.values[:, -1].unsqueeze(-1)
+                 logits = logits.masked_fill(mask, -float("inf"))
+
+             probs = F.softmax(logits, dim=-1)
+             next_token = torch.multinomial(probs, num_samples=1)
+             x = torch.cat((x, next_token), dim=1)
+
+     generated_ids = x[0].tolist()
+     return enc.decode(generated_ids)
+
model.py ADDED
@@ -0,0 +1,216 @@
+ """
+ GPT Model Architecture
+ Extracted from llm_trainer.ipynb
+ """
+
+ import math
+ from dataclasses import dataclass
+ import torch
+ import torch.nn as nn
+ from torch.nn import functional as F
+
+
+ class CausalSelfAttention(nn.Module):
+
+     def __init__(self, config):
+         super().__init__()
+         assert config.n_embd % config.n_head == 0
+         # key, query, value projections for all heads, but in a batch
+         self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd)
+         # output projection
+         self.c_proj = nn.Linear(config.n_embd, config.n_embd)
+         self.c_proj.NANOGPT_SCALE_INIT = 1
+         # regularization
+         self.n_head = config.n_head
+         self.n_embd = config.n_embd
+         self.register_buffer("bias", torch.tril(torch.ones(config.block_size, config.block_size)).view(1, 1, config.block_size, config.block_size))
+
+     def forward(self, x):
+         B, T, C = x.size()  # batch size, sequence length, embedding dimensionality (n_embd)
+         # calculate query, key, values for all heads in batch and move head forward to be the batch dim
+         # nh is "number of heads", hs is "head size", and C (number of channels) = nh * hs
+         # e.g. in GPT-2 (124M), n_head=12, hs=64, so nh*hs=C=768 channels in the Transformer
+         qkv = self.c_attn(x)
+         q, k, v = qkv.split(self.n_embd, dim=2)
+         k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
+         q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
+         v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
+
+         att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
+         att = att.masked_fill(self.bias[:, :, :T, :T] == 0, float('-inf'))
+         att = F.softmax(att, dim=-1)
+         y = att @ v  # (B, nh, T, T) x (B, nh, T, hs) -> (B, nh, T, hs)
+
+         y = y.transpose(1, 2).contiguous().view(B, T, C)  # re-assemble all head outputs side by side
+         # output projection
+         y = self.c_proj(y)
+         return y
+
+
+ class MLP(nn.Module):
+
+     def __init__(self, config):
+         super().__init__()
+         self.c_fc = nn.Linear(config.n_embd, 4 * config.n_embd)
+         self.gelu = nn.GELU(approximate='tanh')
+         self.c_proj = nn.Linear(4 * config.n_embd, config.n_embd)
+         self.c_proj.NANOGPT_SCALE_INIT = 1
+
+     def forward(self, x):
+         x = self.c_fc(x)
+         x = self.gelu(x)
+         x = self.c_proj(x)
+         return x
+
+
+ class Block(nn.Module):
+
+     def __init__(self, config):
+         super().__init__()
+         self.ln_1 = nn.LayerNorm(config.n_embd)
+         self.attn = CausalSelfAttention(config)
+         self.ln_2 = nn.LayerNorm(config.n_embd)
+         self.mlp = MLP(config)
+
+     def forward(self, x):
+         x = x + self.attn(self.ln_1(x))
+         x = x + self.mlp(self.ln_2(x))
+         return x
+
+
+ @dataclass
+ class GPTConfig:
+     block_size: int = 1024  # max sequence length
+     vocab_size: int = 50257  # number of tokens: 50,000 BPE merges + 256 byte tokens + 1 <|endoftext|> token
+     n_layer: int = 12  # number of layers
+     n_head: int = 12  # number of heads
+     n_embd: int = 768  # embedding dimension
+
+
+ class GPT(nn.Module):
+
+     def __init__(self, config):
+         super().__init__()
+         self.config = config
+
+         self.transformer = nn.ModuleDict(dict(
+             wte = nn.Embedding(config.vocab_size, config.n_embd),
+             wpe = nn.Embedding(config.block_size, config.n_embd),
+             h = nn.ModuleList([Block(config) for _ in range(config.n_layer)]),
+             ln_f = nn.LayerNorm(config.n_embd),
+         ))
+         self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
+
+         # weight sharing between token embedding and output head
+         self.transformer.wte.weight = self.lm_head.weight
+
+         # weight initialization
+         self.apply(self._init_weights)
+
+     def _init_weights(self, module):
+         if isinstance(module, nn.Linear):
+             std = 0.02
+             if hasattr(module, 'NANOGPT_SCALE_INIT'):
+                 std *= (2 * self.config.n_layer) ** -0.5
+             torch.nn.init.normal_(module.weight, mean=0.0, std=std)
+             if module.bias is not None:
+                 torch.nn.init.zeros_(module.bias)
+         elif isinstance(module, nn.Embedding):
+             torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
+
+     def forward(self, idx, targets=None):
+         # idx is of shape (B, T)
+         B, T = idx.size()
+         assert T <= self.config.block_size, f"Cannot forward sequence of length {T}, block size is only {self.config.block_size}"
+         # forward the token and position embeddings
+         pos = torch.arange(0, T, dtype=torch.long, device=idx.device)  # shape (T)
+         pos_emb = self.transformer.wpe(pos)  # position embeddings of shape (T, n_embd)
+         tok_emb = self.transformer.wte(idx)  # token embeddings of shape (B, T, n_embd)
+         x = tok_emb + pos_emb
+         # forward the blocks of the transformer
+         for block in self.transformer.h:
+             x = block(x)
+         # forward the final layernorm and the classifier
+         x = self.transformer.ln_f(x)
+         logits = self.lm_head(x)  # (B, T, vocab_size)
+         loss = None
+         if targets is not None:
+             loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
+         return logits, loss
+
+     @classmethod
+     def from_pretrained(cls, model_type):
+         """Loads pretrained GPT-2 model weights from Hugging Face"""
+         assert model_type in {'gpt2', 'gpt2-medium', 'gpt2-large', 'gpt2-xl'}
+         from transformers import GPT2LMHeadModel
+         print("loading weights from pretrained gpt: %s" % model_type)
+
+         # n_layer, n_head and n_embd are determined from model_type
+         config_args = {
+             'gpt2': dict(n_layer=12, n_head=12, n_embd=768),  # 124M params
+             'gpt2-medium': dict(n_layer=24, n_head=16, n_embd=1024),  # 350M params
+             'gpt2-large': dict(n_layer=36, n_head=20, n_embd=1280),  # 774M params
+             'gpt2-xl': dict(n_layer=48, n_head=25, n_embd=1600),  # 1558M params
+         }[model_type]
+         config_args['vocab_size'] = 50257  # always 50257 for GPT model checkpoints
+         config_args['block_size'] = 1024  # always 1024 for GPT model checkpoints
+         # create a from-scratch initialized minGPT model
+         config = GPTConfig(**config_args)
+         model = GPT(config)
+         sd = model.state_dict()
+         sd_keys = sd.keys()
+         sd_keys = [k for k in sd_keys if not k.endswith('.attn.bias')]  # discard this mask / buffer, not a param
+
+         # init a huggingface/transformers model
+         model_hf = GPT2LMHeadModel.from_pretrained(model_type)
+         sd_hf = model_hf.state_dict()
+
+         # copy while ensuring all of the parameters are aligned and match in names and shapes
+         sd_keys_hf = sd_hf.keys()
+         sd_keys_hf = [k for k in sd_keys_hf if not k.endswith('.attn.masked_bias')]  # ignore these, just a buffer
+         sd_keys_hf = [k for k in sd_keys_hf if not k.endswith('.attn.bias')]  # same, just the mask (buffer)
+         transposed = ['attn.c_attn.weight', 'attn.c_proj.weight', 'mlp.c_fc.weight', 'mlp.c_proj.weight']
+         # basically the openai checkpoints use a "Conv1D" module, but we only want to use a vanilla Linear
+         # this means that we have to transpose these weights when we import them
+         assert len(sd_keys_hf) == len(sd_keys), f"mismatched keys: {len(sd_keys_hf)} != {len(sd_keys)}"
+         for k in sd_keys_hf:
+             if any(k.endswith(w) for w in transposed):
+                 # special treatment for the Conv1D weights we need to transpose
+                 assert sd_hf[k].shape[::-1] == sd[k].shape
+                 with torch.no_grad():
+                     sd[k].copy_(sd_hf[k].t())
+             else:
+                 # vanilla copy over the other parameters
+                 assert sd_hf[k].shape == sd[k].shape
+                 with torch.no_grad():
+                     sd[k].copy_(sd_hf[k])
+
+         return model
+
+     def save_checkpoint(self, filepath):
+         """Save model checkpoint with config"""
+         checkpoint = {
+             'model_state_dict': self.state_dict(),
+             'config': {
+                 'block_size': self.config.block_size,
+                 'vocab_size': self.config.vocab_size,
+                 'n_layer': self.config.n_layer,
+                 'n_head': self.config.n_head,
+                 'n_embd': self.config.n_embd,
+             }
+         }
+         torch.save(checkpoint, filepath)
+         print(f"Model saved to {filepath}")
+
+     @classmethod
+     def load_checkpoint(cls, filepath, device='cpu'):
+         """Load model from checkpoint file"""
+         checkpoint = torch.load(filepath, map_location=device)
+         config_dict = checkpoint['config']
+         config = GPTConfig(**config_dict)
+         model = cls(config)
+         model.load_state_dict(checkpoint['model_state_dict'])
+         model.to(device)
+         print(f"Model loaded from {filepath}")
+         return model
+