Phase-Technologies committed on
Commit cf0b8ab · verified · 1 parent: e69f504

Upload folder using huggingface_hub

Files changed (5):
  1. README.md +202 -133
  2. adapter_config.json +5 -5
  3. adapter_model.safetensors +1 -1
  4. inference.py +12 -6
  5. vocab.json +0 -0
README.md CHANGED
@@ -1,134 +1,203 @@
 
 
 
 
 
- # Contrastive Zero-Shot Shakespeare Classifier
-
- This project implements a lightweight Contrastive Transformer model for zero-shot text classification, specifically designed to operate efficiently within memory-constrained environments using LoRA (Low-Rank Adaptation) for fine-tuning.
-
- ## Motivation and Memory Optimization
-
- Initially, training larger transformer models led to `OutOfMemoryError` on the available GPU. To address this, a two-pronged approach was taken:
- 1. **Base Model Reduction**: The core `ContrastiveTransformer` architecture was significantly scaled down to `dim=64`, `depth=2`, and `heads=2`. This drastically reduced the base model's memory footprint, allowing it to be loaded onto the GPU.
- 2. **LoRA (Low-Rank Adaptation)**: To enable efficient fine-tuning without requiring extensive memory for gradients and optimizer states, LoRA was applied. This technique adds small, trainable low-rank matrices alongside the existing linear layers, so only a small percentage of the model's parameters needs to be trained. In this implementation, only **1.7430%** of the total parameters were trainable, making the training process highly memory-efficient.
-
- ## Model Architecture
-
- The model is a custom `ContrastiveTransformer` built from scratch, composed of:
- * **Token and Positional Embeddings**: Map input tokens and their positions to dense vector representations.
- * **Transformer Blocks**: Multiple layers, each containing a Multi-Head Attention mechanism and a SwiGLU-activated Feed-Forward Network.
- * **SwiGLU Activation**: A modern activation function for the Feed-Forward Network, providing improved performance.
- * **Projection Layer**: Maps the final pooled embeddings to the output dimension.
-
- LoRA layers were injected into the following modules to allow for efficient adaptation:
- * `MultiheadAttention` linear layers (query, key, value, output projections)
- * Feed-Forward Network's linear layers (`ff.0`, `ff.3`)
- * Final projection layer (`proj`)
-
- ## Training
-
- The model was trained for contrastive zero-shot classification using several datasets:
- * `Xerv-AI/Conversational-2K-SimpleEnglish`
- * `Xerv-AI/Savage-Responses-2K`
- * `Xerv-AI/GRAD`
- * `tiny_shakespeare` (a text dataset extracted from a raw text file)
-
- These datasets were combined and tokenized using a custom vocabulary. The model was trained for `10` epochs with an AdamW optimizer and a Cosine Annealing learning rate scheduler.
-
- ## Inference Usage
-
- To use the trained model for zero-shot classification, follow these steps:
-
- 1. **Download the model artifacts**: The model weights and configuration are available on Hugging Face:
-    `https://huggingface.co/Phase-Technologies/contrastive-zeroshot-shakespeare`
-
- 2. **Load the model and vocabulary**: You can use the `inference.py` script provided in the repository.
-
- ```python
- import torch, json, re, torch.nn as nn, torch.nn.functional as F
- from peft import PeftModel, LoraConfig, get_peft_model, TaskType
- from pathlib import Path
-
- # Define the custom model architecture (same as in training)
- class SwiGLU(nn.Module):
-     def forward(self, x):
-         x, gate = x.chunk(2, dim=-1)
-         return F.silu(gate) * x
-
- class TransformerBlock(nn.Module):
-     def __init__(self, dim, heads=2, dropout=0.1):
-         super().__init__()
-         self.norm1 = nn.LayerNorm(dim)
-         self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
-         self.norm2 = nn.LayerNorm(dim)
-         self.ff = nn.Sequential(
-             nn.Linear(dim, dim * 4 * 2),
-             SwiGLU(),
-             nn.Linear(dim * 4, dim)
-         )
-
-     def forward(self, x, mask=None):
-         attn_out, _ = self.attn(x, x, x, key_padding_mask=mask)
-         x = self.norm1(x + attn_out)
-         ff_out = self.ff(x)
-         return self.norm2(x + ff_out)
-
- class ContrastiveTransformer(nn.Module):
-     def __init__(self, vocab, dim=64, depth=2, heads=2, max_seq=256):
-         super().__init__()
-         self.token_emb = nn.Embedding(len(vocab), dim)
-         self.pos_emb = nn.Embedding(max_seq, dim)
-         self.blocks = nn.ModuleList([TransformerBlock(dim, heads) for _ in range(depth)])
-         self.ln_f = nn.LayerNorm(dim)
-         self.proj = nn.Linear(dim, dim, bias=False)
-
-     def forward(self, input_ids, attention_mask=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None):
-         if inputs_embeds is not None:
-             x = inputs_embeds
-         else:
-             x = self.token_emb(input_ids)
-
-         t = x.shape[1]
-         pos_emb = self.pos_emb(torch.arange(t, device=x.device))
-         x = x + pos_emb
-         mask = (input_ids == 0)
-         for block in self.blocks:
-             x = block(x, mask)
-         x = self.ln_f(x)
-         pooled = x.mean(dim=1)
-         return self.proj(pooled)
-
- def load_model(vocab_path, ckpt_path):
-     vocab = json.load(open(vocab_path))
-     config = json.load(open(Path(ckpt_path).parent / "config.json"))
-     base_model = ContrastiveTransformer(vocab, dim=config["dim"], depth=config["depth"], heads=config["heads"])
-     model = PeftModel.from_pretrained(base_model, Path(ckpt_path).parent)
-     model.eval()
-     return model, vocab
-
- def predict(text, candidate_labels, model, vocab):
-     device = next(model.parameters()).device
-     def enc(t):
-         tokens = [vocab.get(w, vocab.get("<UNK>", 1)) for w in re.findall(r"\w+", t.lower())]
-         if not tokens:
-             tokens = [vocab.get("<UNK>", 1)]
-         input_ids = torch.tensor([tokens[:256] + [0]*(256-len(tokens[:256]))], device=device)
-         with torch.no_grad():
-             return F.normalize(model.forward(input_ids=input_ids), dim=-1)
-     text_emb = enc(text)
-     label_embs = torch.cat([enc(lab) for lab in candidate_labels])
-     sims = F.cosine_similarity(text_emb, label_embs)
-     best = sims.argmax().item()
-     return candidate_labels[best], float(sims[best])
-
- # Example Usage:
- # Assuming model_dir is the directory where you saved the model (e.g., '/content/contrastive-zeroshot-shakespeare')
- model_dir = Path("/content/contrastive-zeroshot-shakespeare")  # Or download and specify your path
- model, vocab = load_model(model_dir / "vocab.json", model_dir / "adapter_model.safetensors")
-
- test_text = "to be or not to be that is the question"
- candidate_labels = ["shakespeare play", "math proof", "cooking recipe", "gaming victory", "climate news"]
- pred, conf = predict(test_text, candidate_labels, model, vocab)
- print(f"Test prediction: {pred} ({conf:.3f})")
- ```
-
- ### Hugging Face Repository
- The model and related files are available on Hugging Face:
- [Phase-Technologies/contrastive-zeroshot-shakespeare](https://huggingface.co/Phase-Technologies/contrastive-zeroshot-shakespeare)
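The decision rule inside the removed README's `predict()` — embed the text and each candidate label, then pick the label with the highest cosine similarity — can be sketched standalone without the model. The vectors below are toy values standing in for the transformer's pooled embeddings; only the selection logic matches the code above.

```python
import math

# Toy sketch of the zero-shot decision rule from predict():
# cosine similarity between a text embedding and each label embedding,
# returning the best-scoring label. Embedding values here are illustrative.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def zero_shot_pick(text_emb, label_embs, labels):
    sims = [cosine(text_emb, e) for e in label_embs]
    best = max(range(len(sims)), key=sims.__getitem__)
    return labels[best], sims[best]

label, score = zero_shot_pick(
    [1.0, 0.0],                      # toy text embedding
    [[0.9, 0.1], [0.0, 1.0]],        # toy label embeddings
    ["shakespeare play", "cooking recipe"],
)
print(label)  # the label whose embedding points the same way as the text
```

In the real model the embeddings are already L2-normalized by `F.normalize`, so the cosine reduces to a dot product; the toy version normalizes explicitly.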
+ ---
+ library_name: peft
+ tags:
+ - lora
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.18.0
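The old README's claim that only 1.7430% of parameters were trainable can be sanity-checked with back-of-the-envelope LoRA accounting over the adapted modules (q/k/v/out projections, `ff.0`, `ff.3`, `proj`). The LoRA rank below is an assumed value — the diff does not state it — and the exact fraction also depends on the vocabulary size, so this sketch only computes the adapter parameter count, not the quoted percentage.

```python
# Rough LoRA parameter accounting for the dim=64, depth=2 model described
# in the old README. The rank r=8 is an ASSUMPTION for illustration.

def lora_extra_params(d_in, d_out, r):
    # LoRA adds two low-rank matrices next to a frozen Linear(d_in, d_out):
    # A with shape (r, d_in) and B with shape (d_out, r).
    return r * (d_in + d_out)

dim, depth, rank = 64, 2, 8  # rank is assumed, not taken from the diff

per_block = (
    3 * lora_extra_params(dim, dim, rank)        # q_proj, k_proj, v_proj
    + lora_extra_params(dim, dim, rank)          # out_proj
    + lora_extra_params(dim, dim * 4 * 2, rank)  # ff.0 (SwiGLU doubles the width)
    + lora_extra_params(dim * 4, dim, rank)      # ff.3
)
trainable = depth * per_block + lora_extra_params(dim, dim, rank)  # + final proj
print(trainable)  # adapter parameters under these assumptions
```

Dividing this count by the base model's total parameters (dominated by the token embedding, hence by vocabulary size) would recover a trainable fraction in the low single-digit percent range, consistent with the README's figure.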
adapter_config.json CHANGED
@@ -29,13 +29,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "proj",
-    "q_proj",
+    "v_proj",
     "ff.0",
+    "q_proj",
     "ff.3",
-    "v_proj",
-    "out_proj",
-    "k_proj"
+    "proj",
+    "k_proj",
+    "out_proj"
   ],
   "target_parameters": null,
   "task_type": "FEATURE_EXTRACTION",
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e6340f16eb6af5984692d1ccdb5b664963ba5e2c9c1b4edc3c9a2fa3b8878f75
3
  size 71184
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6766d4647220efc90394eb44ee30e6c07eaa8b15cedc835586753a1ac589a775
3
  size 71184
inference.py CHANGED
@@ -1,6 +1,8 @@
 
 import torch, json, re, torch.nn as nn, torch.nn.functional as F
 from peft import PeftModel, LoraConfig, get_peft_model, TaskType
+from huggingface_hub import hf_hub_download  # Import for downloading files from Hugging Face Hub
+from pathlib import Path
 
 class SwiGLU(nn.Module):
     def forward(self, x):
@@ -55,14 +57,18 @@ class ContrastiveTransformer(nn.Module):
         # The training code normalises in the training loop.
         return self.proj(pooled)
 
-def load_model(vocab_path, ckpt_path):
+def load_model(repo_id):
+    # Download vocab.json and config.json from the Hugging Face Hub
+    vocab_path = hf_hub_download(repo_id=repo_id, filename="vocab.json")
+    config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
+
     vocab = json.load(open(vocab_path))
-    # Ensure model parameters match the saved config
-    config = json.load(open(Path(ckpt_path).parent / "config.json"))
-    base_model = ContrastiveTransformer(vocab, dim=config["dim"], depth=config["depth"], heads=config["heads"])
+    config = json.load(open(config_path))
+
+    base_model = ContrastiveTransformer(vocab, dim=config["dim"], depth=config["depth"], heads=config["heads"])  # Pass vocab object instead of its length
 
-    # Load the PEFT model from the directory
-    model = PeftModel.from_pretrained(base_model, Path(ckpt_path).parent)
+    # Load the PEFT model directly from the Hugging Face Hub
+    model = PeftModel.from_pretrained(base_model, repo_id)  # Pass repo_id directly
     model.eval()
     return model, vocab
 
vocab.json CHANGED
The diff for this file is too large to render. See raw diff
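The `vocab.json` change pairs with the encoding step in `predict()`: lowercase `\w+` tokenization, an `<UNK>` fallback (id 1), truncation to 256 tokens, and zero-padding. That logic can be exercised standalone; the small vocabulary below is a toy stand-in for the real `vocab.json`, whose diff is too large to show here.

```python
import re

# Standalone sketch of the fixed-length encoding used by enc() inside
# predict(): word tokenization, <UNK> fallback, truncate to max_seq,
# pad with id 0. The vocab dict below is illustrative, not the real one.

MAX_SEQ = 256

def encode(text, vocab, max_seq=MAX_SEQ):
    tokens = [vocab.get(w, vocab.get("<UNK>", 1)) for w in re.findall(r"\w+", text.lower())]
    if not tokens:
        tokens = [vocab.get("<UNK>", 1)]  # empty input still yields one token
    tokens = tokens[:max_seq]
    return tokens + [0] * (max_seq - len(tokens))

vocab = {"<PAD>": 0, "<UNK>": 1, "to": 2, "be": 3, "or": 4, "not": 5}
ids = encode("To be, or not to be!", vocab)
print(ids[:8])  # → [2, 3, 4, 5, 2, 3, 0, 0]
```

Note that id 0 doubles as the padding token and the attention mask in the model (`mask = (input_ids == 0)`), which is why the vocabulary reserves it for `<PAD>`.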