Upload model

Files changed:
- README.md +201 -0
- config.json +20 -0
- configuration_vitmix.py +30 -0
- model.safetensors +3 -0
- modeling_vitmix.py +196 -0
README.md
ADDED
@@ -0,0 +1,201 @@
---
library_name: transformers
tags: []
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
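
In the meantime, a minimal loading sketch (the repo id below is a placeholder, and `trust_remote_code=True` is required because the architecture is defined by the `configuration_vitmix.py`/`modeling_vitmix.py` files in this repo):

```python
import torch
from transformers import AutoModelForImageClassification

# "user/vitmix" is a placeholder; substitute this repo's actual id
model = AutoModelForImageClassification.from_pretrained(
    "user/vitmix", trust_remote_code=True
)
model.eval()

# per config.json, the model expects 3-channel 28x28 inputs
pixel_values = torch.randn(1, 3, 28, 28)
with torch.no_grad():
    out = model(pixel_values)
print(out["logits"].shape)  # torch.Size([1, 10])
```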

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
config.json
ADDED
@@ -0,0 +1,20 @@
{
  "architectures": [
    "ViTMixModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_vitmix.ViTMixConfig",
    "AutoModelForImageClassification": "modeling_vitmix.ViTMixModel"
  },
  "depth": 6,
  "dim": 1024,
  "heads": 16,
  "image_size": 28,
  "mlp_dim": 2048,
  "model_type": "VitMix",
  "num_classes": 10,
  "num_experts": 12,
  "patch_size": 14,
  "torch_dtype": "float32",
  "transformers_version": "4.38.0.dev0"
}
configuration_vitmix.py
ADDED
@@ -0,0 +1,30 @@
from transformers import PretrainedConfig


class ViTMixConfig(PretrainedConfig):  # Note: you cannot change the expert layers for now...
    model_type = "VitMix"

    def __init__(
        self,
        image_size = 28,
        patch_size = 14,
        num_classes = 10,
        dim = 1024,
        depth = 6,
        heads = 16,
        mlp_dim = 2048,
        num_experts = 12,
        **kwargs,  # pass extra config.json keys (architectures, torch_dtype, ...) through to PretrainedConfig
    ):
        if image_size % patch_size != 0:
            raise ValueError(f"image_size must be divisible by patch_size! image_size: {image_size} | patch_size: {patch_size}")

        self.image_size = image_size
        self.patch_size = patch_size
        self.num_classes = num_classes
        self.dim = dim
        self.depth = depth
        self.heads = heads
        self.mlp_dim = mlp_dim
        self.num_experts = num_experts
        super().__init__(**kwargs)
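As a quick sanity check, a minimal usage sketch of the config class above (values mirror config.json):

```python
from configuration_vitmix import ViTMixConfig

config = ViTMixConfig()  # defaults: 28x28 images, 14x14 patches
print(config.image_size // config.patch_size)  # 2 -> a 2x2 grid of 4 patch tokens

# patch_size must divide image_size evenly; this would raise a ValueError:
# ViTMixConfig(image_size=28, patch_size=9)
```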
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:22244f61f64607469cd39413009ddd06c2ddb395b1dd898bb4719a940ac24fc1
size 1564290824
modeling_vitmix.py
ADDED
@@ -0,0 +1,196 @@
from transformers import PreTrainedModel
from .configuration_vitmix import ViTMixConfig


# Model architecture gracefully stolen from lucidrains https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/simple_vit.py
import torch
from torch import nn

from einops import rearrange
from einops.layers.torch import Rearrange

from st_moe_pytorch import SparseMoEBlock, MoE
# I think this is 'including a copy of this notice'... tell me if I'm wrong
####################################
# MIT License

# Copyright (c) 2020 Phil Wang

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
####################################
# helpers

def pair(t):
    return t if isinstance(t, tuple) else (t, t)

def posemb_sincos_2d(h, w, dim, temperature: int = 10000, dtype = torch.float32):
    y, x = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    assert (dim % 4) == 0, "feature dimension must be multiple of 4 for sincos emb"
    omega = torch.arange(dim // 4) / (dim // 4 - 1)
    omega = 1.0 / (temperature ** omega)

    y = y.flatten()[:, None] * omega[None, :]
    x = x.flatten()[:, None] * omega[None, :]
    pe = torch.cat((x.sin(), x.cos(), y.sin(), y.cos()), dim=1)
    return pe.type(dtype)

# classes

class FeedForward(nn.Module):
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)

class Attention(nn.Module):
    def __init__(self, dim, heads = 8, dim_head = 64):
        super().__init__()
        inner_dim = dim_head * heads
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.norm = nn.LayerNorm(dim)

        self.attend = nn.Softmax(dim = -1)

        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias = False)
        self.to_out = nn.Linear(inner_dim, dim, bias = False)

    def forward(self, x):
        x = self.norm(x)

        qkv = self.to_qkv(x).chunk(3, dim = -1)
        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h = self.heads), qkv)

        dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale

        attn = self.attend(dots)

        out = torch.matmul(attn, v)
        out = rearrange(out, 'b h n d -> b n (h d)')
        return self.to_out(out)

class Transformer(nn.Module):
    ### Here is my MoE modification
    def __init__(self, dim, depth, heads, dim_head, mlp_dim, num_experts):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.layers = nn.ModuleList([])
        for layer_idx in range(depth):
            if layer_idx % 2 == 0: # even layers keep a dense FeedForward; make every other FFN an expert layer
                self.layers.append(nn.ModuleList([
                    Attention(dim, heads = heads, dim_head = dim_head),
                    FeedForward(dim, mlp_dim)
                ]))
            else:
                self.layers.append(nn.ModuleList([
                    Attention(dim, heads = heads, dim_head = dim_head),
                    SparseMoEBlock(
                        MoE(dim = dim,
                            num_experts = num_experts,
                            gating_top_n = 2,
                            threshold_train = 0.2,
                            threshold_eval = 0.2,
                            capacity_factor_train = 1.25,
                            capacity_factor_eval = 2.,
                            balance_loss_coef = 1e-2,
                            router_z_loss_coef = 1e-3,
                        ),
                        add_ff_before = True,
                        add_ff_after = True
                    )
                ]))

    def forward(self, x):
        for attn, ff in self.layers:
            x = attn(x) + x
            out = ff(x)
            if isinstance(out, tuple): # SparseMoEBlock also returns aux losses
                out = out[0] # I won't bother returning aux_loss... probably a bad idea
            x = out + x
        return self.norm(x)

class SimpleViTMIX(nn.Module):
    def __init__(self, *, image_size, patch_size, num_classes, dim, depth, heads, mlp_dim, channels = 3, dim_head = 64, num_experts = 12):
        super().__init__()
        image_height, image_width = pair(image_size)
        patch_height, patch_width = pair(patch_size)

        assert image_height % patch_height == 0 and image_width % patch_width == 0, 'Image dimensions must be divisible by the patch size.'

        patch_dim = channels * patch_height * patch_width

        self.to_patch_embedding = nn.Sequential(
            Rearrange("b c (h p1) (w p2) -> b (h w) (p1 p2 c)", p1 = patch_height, p2 = patch_width),
            nn.LayerNorm(patch_dim),
            nn.Linear(patch_dim, dim),
            nn.LayerNorm(dim),
        )

        self.pos_embedding = posemb_sincos_2d(
            h = image_height // patch_height,
            w = image_width // patch_width,
            dim = dim,
        )

        self.transformer = Transformer(dim, depth, heads, dim_head, mlp_dim, num_experts)

        self.pool = "mean"
        self.to_latent = nn.Identity()

        self.linear_head = nn.Linear(dim, num_classes)

    def forward(self, img):
        device = img.device

        x = self.to_patch_embedding(img)
        x += self.pos_embedding.to(device, dtype=x.dtype)

        x = self.transformer(x)
        x = x.mean(dim = 1)

        x = self.to_latent(x)
        return self.linear_head(x)


class ViTMixModel(PreTrainedModel):
    config_class = ViTMixConfig

    def __init__(self, config):
        super().__init__(config)
        self.model = SimpleViTMIX(
            image_size = config.image_size,
            patch_size = config.patch_size,
            num_classes = config.num_classes,
            dim = config.dim,
            depth = config.depth,
            heads = config.heads,
            mlp_dim = config.mlp_dim,
            num_experts = config.num_experts
        )

    def forward(self, tensor, labels=None):
        logits = self.model(tensor)
        if labels is not None:
            loss = torch.nn.functional.cross_entropy(logits, labels)
            return {"loss": loss, "logits": logits}
        return {"logits": logits}
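For reference, a minimal end-to-end sketch of the forward/loss path with random data (assumes the two module files are importable as a package, here hypothetically named `vitmix`, so the relative import in modeling_vitmix.py resolves; no pretrained weights are loaded):

```python
import torch
from vitmix.configuration_vitmix import ViTMixConfig  # hypothetical package layout
from vitmix.modeling_vitmix import ViTMixModel

config = ViTMixConfig()      # 28x28 images, 14x14 patches -> 4 tokens per image
model = ViTMixModel(config)  # randomly initialized

images = torch.randn(8, 3, 28, 28)
labels = torch.randint(0, config.num_classes, (8,))

out = model(images, labels=labels)  # {"loss": ..., "logits": ...}
print(out["loss"].item(), out["logits"].shape)  # scalar loss, torch.Size([8, 10])
```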