Upload model

Files changed (6)

README.md +199 -0
config.json +17 -0
configuration_gpt.py +25 -0
generation_config.json +4 -0
modeling_gpt.py +289 -0
pytorch_model.bin +3 -0

README.md ADDED

	@@ -0,0 +1,199 @@

+---
+library_name: transformers
+tags: []
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

config.json ADDED

	@@ -0,0 +1,17 @@

+{
+  "architectures": [
+    "GPTModelForTextGeneration"
+  ],
+  "auto_map": {
+    "AutoConfig": "configuration_gpt.GPTConfig",
+    "AutoModelForCausalLM": "modeling_gpt.GPTModelForTextGeneration"
+  },
+  "block_size": 1024,
+  "model_type": "custom_gpt",
+  "n_embd": 768,
+  "n_head": 12,
+  "n_layer": 12,
+  "torch_dtype": "float32",
+  "transformers_version": "4.44.1",
+  "vocab_size": 50304
+}
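
As a quick sanity check on the hyperparameters in `config.json`, the following minimal sketch parses the same values with the standard library and derives the per-head dimension that the attention code depends on (`n_embd` must be divisible by `n_head`). The names `CONFIG_TEXT` and `head_dim` are illustrative assumptions and are not part of the uploaded files.

```python
import json

# Values copied from config.json, inlined so the sketch is self-contained.
CONFIG_TEXT = """
{
  "block_size": 1024,
  "model_type": "custom_gpt",
  "n_embd": 768,
  "n_head": 12,
  "n_layer": 12,
  "vocab_size": 50304
}
"""


def head_dim(config: dict) -> int:
    """Return the per-head dimension, or raise if n_embd is not divisible by n_head."""
    n_embd, n_head = config["n_embd"], config["n_head"]
    if n_embd % n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    return n_embd // n_head


config = json.loads(CONFIG_TEXT)
print(head_dim(config))  # 64
```

With the values in this commit each of the 12 heads works on a 64-dimensional slice of the 768-dimensional embedding.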

configuration_gpt.py ADDED

	@@ -0,0 +1,25 @@

+# Importing libraries
+from transformers import PretrainedConfig
+class GPTConfig(PretrainedConfig):
+    model_type = "custom_gpt"
+    def __init__(
+        self,
+        block_size: int = 1024,
+        vocab_size: int = 50304,
+        n_layer: int = 12,
+        n_head: int = 12,
+        n_embd: int = 768,
+        **kwargs,
+    ):
+        """
+        GPT configuration class storing model hyperparameters.
+        """
+        super().__init__(**kwargs)
+        self.block_size = block_size
+        self.vocab_size = vocab_size
+        self.n_layer = n_layer
+        self.n_head = n_head
+        self.n_embd = n_embd

generation_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "_from_model_config": true,
+  "transformers_version": "4.44.1"
+}

modeling_gpt.py ADDED

	@@ -0,0 +1,289 @@

+# Importing libraries
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from transformers import PreTrainedModel
+from .configuration_gpt import GPTConfig
+class GPT(nn.Module):
+    """
+    The GPT language model:
+    - Embeddings (token + positional)
+    - Stack of Transformer blocks
+    - Final LayerNorm + Linear head for output logits
+    """
+    def __init__(
+        self,
+        block_size: int = 1024,
+        vocab_size: int = 50304,
+        n_layer: int = 12,
+        n_head: int = 12,
+        n_embd: int = 768,
+    ):
+        super().__init__()
+        # Store model hyperparameters
+        self.block_size = block_size
+        self.vocab_size = vocab_size
+        self.n_layer = n_layer
+        self.n_head = n_head
+        self.n_embd = n_embd
+        # Transformer components stored in a module dictionary
+        self.transformer = nn.ModuleDict(
+            dict(
+                wte=nn.Embedding(self.vocab_size, self.n_embd),  # Token embedding
+                wpe=nn.Embedding(self.block_size, self.n_embd),  # Positional embedding
+                h=nn.ModuleList(
+                    [self.Block(self.n_embd, self.n_head) for _ in range(self.n_layer)]
+                ),  # Transformer blocks
+                ln_f=nn.LayerNorm(self.n_embd),  # Final layer normalization
+            )
+        )
+        # Linear head for output logits
+        self.lm_head = nn.Linear(self.n_embd, self.vocab_size, bias=False)
+        # Tie weights between token embedding and output projection
+        self.transformer.wte.weight = self.lm_head.weight
+    def forward(self, x):
+        B, T = x.shape  # Batch size and sequence length
+        assert T <= self.block_size, "Cannot forward sequence longer than block size"
+        # Token and positional embeddings
+        tok_emb = self.transformer.wte(x)
+        pos_emb = self.transformer.wpe(torch.arange(T, device=x.device))
+        x = tok_emb + pos_emb.unsqueeze(0)
+        # Forward pass through transformer blocks
+        for block in self.transformer.h:
+            x = block(x)
+        x = self.transformer.ln_f(x)  # Final layer norm
+        logits = self.lm_head(x)  # Compute logits
+        return logits
+    class CausalSelfAttention(nn.Module):
+        """
+        Multi-head self-attention with causal masking.
+        """
+        def __init__(self, n_embd, n_head):
+            super().__init__()
+            assert (
+                n_embd % n_head == 0
+            ), "Embedding dimension must be divisible by number of heads"
+            self.n_head = n_head
+            self.n_embd = n_embd
+            # Linear layers for query, key, and value
+            self.c_attn = nn.Linear(n_embd, 3 * n_embd)
+            self.c_proj = nn.Linear(n_embd, n_embd)
+        def forward(self, x):
+            B, T, C = x.size()
+            qkv = self.c_attn(x)
+            q, k, v = qkv.split(self.n_embd, dim=2)
+            # Reshape and transpose for multi-head attention
+            k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
+            q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
+            v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
+            # Apply scaled dot-product attention with causal masking
+            y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
+            # Reshape and apply output projection
+            y = y.transpose(1, 2).contiguous().view(B, T, C)
+            y = self.c_proj(y)
+            return y
+    class MLP(nn.Module):
+        """
+        Feed-forward network block used in Transformer architectures.
+        """
+        def __init__(self, n_embd):
+            super().__init__()
+            self.c_fc = nn.Linear(n_embd, 4 * n_embd)
+            self.gelu = nn.GELU(approximate="tanh")
+            self.c_proj = nn.Linear(4 * n_embd, n_embd)
+        def forward(self, x):
+            return self.c_proj(self.gelu(self.c_fc(x)))
+    class Block(nn.Module):
+        """
+        A single Transformer block.
+        """
+        def __init__(self, n_embd, n_head):
+            super().__init__()
+            self.ln_1 = nn.LayerNorm(n_embd)
+            self.attn = GPT.CausalSelfAttention(n_embd, n_head)
+            self.ln_2 = nn.LayerNorm(n_embd)
+            self.mlp = GPT.MLP(n_embd)
+        def forward(self, x):
+            x = x + self.attn(self.ln_1(x))
+            x = x + self.mlp(self.ln_2(x))
+            return x
+class GPTModelForTextGeneration(PreTrainedModel):
+    """
+    A wrapper class for GPT-based text generation.
+    This integrates a Transformer model within the Hugging Face `PreTrainedModel` framework.
+    """
+    config_class = GPTConfig
+    def __init__(self, config):
+        super().__init__(config)
+        # Instantiate the GPT model with the provided configuration
+        self.model = GPT(
+            block_size=config.block_size,
+            vocab_size=config.vocab_size,
+            n_layer=config.n_layer,
+            n_head=config.n_head,
+            n_embd=config.n_embd,
+        )
+    def forward(self, input_ids: torch.Tensor):
+        # Check input_ids type and shape
+        assert isinstance(input_ids, torch.Tensor), "input_ids must be a PyTorch tensor"
+        tokens = input_ids.clone()  # Avoid modifying input_ids directly
+        tokens = tokens.unsqueeze(0) if tokens.dim() == 1 else tokens
+        assert (
+            tokens.ndim == 2 and tokens.shape[0] == 1
+        ), "input_ids must have 2 dimensions: (1, sequence_length)"
+        # Check token values
+        assert torch.all(
+            (tokens >= 0) & (tokens < self.model.vocab_size)
+        ), "input_ids contain invalid token values"
+        # Forward pass through the model
+        logits = self.model(tokens)  # Call the module so registered hooks run
+        return {"logits": logits}
+    @torch.no_grad()
+    def generate(
+        self,
+        input_ids: torch.Tensor,
+        max_length: int = 50,
+        do_sample: bool = True,
+        top_k: int = 50,
+        top_p: float = 0.95,
+        temperature: float = 0.9,
+        device: str = "cpu",
+    ):
+        """
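+        Generates token ids autoregressively: top-k / top-p sampling with temperature
+        when `do_sample` is True, greedy argmax otherwise. `max_length` is the total
+        sequence length, prompt included. Returns a 1-D tensor of token ids.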
+        """
+        # Validate device type
+        if device.startswith("cuda"):
+            assert torch.cuda.is_available(), "CUDA is not available, please use 'cpu'"
+            if device != "cuda":  # Check for specific CUDA device (cuda:n)
+                try:
+                    device_index = int(device.split(":")[1])  # Extract device number
+                    assert (
+                        0 <= device_index < torch.cuda.device_count()
+                    ), f"Invalid CUDA device index: {device_index}"
+                except (IndexError, ValueError):
+                    raise ValueError(
+                        "Invalid device format. Use 'cpu', 'cuda', or 'cuda:N' where N is an integer."
+                    )
+        elif device != "cpu":
+            raise ValueError("Invalid device. Use 'cpu', 'cuda', or 'cuda:N'.")
+        # Check input_ids type before calling tensor methods on it
+        assert isinstance(input_ids, torch.Tensor), "input_ids must be a PyTorch tensor"
+        # Move input tensor and model to the specified device
+        input_ids = input_ids.to(device)
+        self.model.to(device)
+        # Check input_ids shape
+        tokens = input_ids.clone()  # Avoid modifying input_ids directly
+        tokens = tokens.unsqueeze(0) if tokens.dim() == 1 else tokens
+        assert (
+            tokens.ndim == 2 and tokens.shape[0] == 1
+        ), "input_ids must have 2 dimensions: (1, sequence_length)"
+        # Check token values
+        assert torch.all(
+            (tokens >= 0) & (tokens < self.model.vocab_size)
+        ), "input_ids contain invalid token values"
+        # Check max_length
+        assert (
+            isinstance(max_length, int) and max_length >= 1
+        ), "max_length must be a positive integer"
+        assert (
+            max_length <= self.model.block_size
+        ), f"max_length must be in range [1, {self.model.block_size}]"
+        # Check top_k
+        assert isinstance(top_k, int) and top_k >= 1, "top_k must be a positive integer"
+        # Check top_p
+        assert (
+            isinstance(top_p, (int, float)) and 0.0 <= top_p <= 1.0
+        ), "top_p must be in range [0, 1]"
+        # Check temperature
+        assert (
+            isinstance(temperature, (int, float)) and 0.0 <= temperature <= 1.0
+        ), "temperature must be in range [0, 1]"
+        # Move tokens to the correct device
+        tokens = tokens.to(device)
+        # Autoregressive token generation loop
+        while tokens.size(1) < max_length:
+            logits = self(tokens)["logits"][:, -1, :]
+            logits = logits / max(0.01, temperature)
+            if do_sample:
+                top_k = min(top_k, logits.size(-1))  # Safety check
+                # Remove all tokens with a probability less than the last token of the top-k
+                indices_to_remove = (
+                    logits < torch.topk(logits, top_k, dim=1)[0][..., -1, None]
+                )
+                logits[indices_to_remove] = float("-inf")
+                sorted_logits, sorted_indices = torch.sort(logits, descending=True)
+                cumulative_probs = torch.cumsum(
+                    F.softmax(sorted_logits, dim=-1), dim=-1
+                )
+                # Remove tokens with cumulative probability above the threshold
+                sorted_indices_to_remove = cumulative_probs > top_p
+                # Shift the indices to the right to keep also the first token above the threshold
+                sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[
+                    ..., :-1
+                ].clone()
+                sorted_indices_to_remove[..., 0] = False
+                # Replace logits to be removed with -inf in the sorted_logits
+                sorted_logits[sorted_indices_to_remove] = float("-inf")
+                # Then reverse the sorting process by mapping back sorted_logits to their original position
+                logits = torch.gather(sorted_logits, 1, sorted_indices.argsort(-1))
+                # Sample the next token from the filtered distribution
+                next_tokens = torch.multinomial(F.softmax(logits, -1), 1)
+            else:
+                next_tokens = torch.argmax(logits, dim=-1, keepdim=True)
+            tokens = torch.cat((tokens, next_tokens), dim=1)
+        return tokens.flatten()
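
The filtering step inside the `do_sample` branch of `generate` can be easier to follow on a plain list of logits. The sketch below is a standard-library re-implementation of that logic for a single row, not the code shipped in `modeling_gpt.py`; the helper names `softmax` and `top_k_top_p_filter` are assumptions made for illustration. It keeps the `top_k` largest logits, then drops every token for which the probability mass ranked ahead of it already exceeds `top_p`, which matches the "shift right by one" trick in the tensor version.

```python
import math

NEG_INF = float("-inf")


def softmax(logits):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(logits, top_k, top_p):
    """Return a copy of `logits` with filtered-out entries set to -inf."""
    top_k = min(top_k, len(logits))
    # Top-k: anything strictly below the k-th largest logit is removed.
    kth_value = sorted(logits, reverse=True)[top_k - 1]
    kept = [x if x >= kth_value else NEG_INF for x in logits]

    # Top-p: walk tokens from most to least likely.
    order = sorted(range(len(kept)), key=lambda i: kept[i], reverse=True)
    probs = softmax([kept[i] for i in order])
    out = list(kept)
    mass_before = 0.0  # cumulative probability of tokens ranked ahead
    for rank, idx in enumerate(order):
        # The top-ranked token is always kept; later tokens are dropped once
        # the mass ahead of them exceeds top_p.
        if rank > 0 and mass_before > top_p:
            out[idx] = NEG_INF
        mass_before += probs[rank]
    return out


print(top_k_top_p_filter([2.0, 1.0, 0.0, -1.0], top_k=3, top_p=0.7))
# [2.0, 1.0, -inf, -inf]
```

In this example top-k removes the last logit, and top-p then removes the third one because the first two tokens already carry about 0.91 of the probability mass.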

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:30ceeaf70221fc1e9e030423fda7b2aad4f93a850d4e05167f569ef23b47dda0
+size 497951283
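
The checkpoint size can be cross-checked against the architecture. The sketch below counts parameters from the layer shapes in `modeling_gpt.py` and the values in `config.json`, assuming the tied `wte` / `lm_head` matrix is stored once and every parameter takes 4 bytes (`float32`). The function name `count_parameters` is an illustrative assumption.

```python
def count_parameters(block_size: int, vocab_size: int, n_layer: int, n_embd: int) -> int:
    """Count trainable parameters, counting the tied embedding/lm_head matrix once."""
    wte = vocab_size * n_embd   # token embedding (shared with lm_head)
    wpe = block_size * n_embd   # positional embedding
    layer_norm = 2 * n_embd     # weight + bias
    # Attention: fused QKV projection plus output projection, both with bias.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4 * n_embd, then project back, both with bias.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    block = 2 * layer_norm + attn + mlp
    return wte + wpe + n_layer * block + layer_norm  # trailing term is ln_f


n_params = count_parameters(block_size=1024, vocab_size=50304, n_layer=12, n_embd=768)
print(n_params)      # 124475904
print(n_params * 4)  # 497903616 bytes in float32
```

Under these assumptions the weights account for 497,903,616 of the 497,951,283 bytes reported by the LFS pointer; the remaining 47,667 bytes would be consistent with serialization overhead.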