Update README.md
README.md (CHANGED)
This is the structure of the BLIPNet model. You can load the model with it, or extend it into a bigger model for your task.
```python
import torch
import torch.nn as nn
from transformers import BlipForConditionalGeneration

# MODEL_NAME is not defined in the original snippet; this is a placeholder for
# the BLIP checkpoint the model was built from.
MODEL_NAME = "Salesforce/blip-image-captioning-base"

class BLIPNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Generation model
        self.model = BlipForConditionalGeneration.from_pretrained(MODEL_NAME, cache_dir="model")
        # Same as https://huggingface.co/uf-aice-lab/BLIP-Math
        self.ebd_dim = 443136  # flattened size of the vision encoder output

        # Classification model
        fc_dim = 64  # You can choose a higher number for better performance, for example, 1024.
        self.head = nn.Sequential(
            nn.Linear(self.ebd_dim, fc_dim),
            nn.ReLU(),
        )
        self.score = nn.Linear(fc_dim, 5)  # 5 classes

    def forward(self, pixel_values, input_ids):
        # Generative pass; input_ids double as labels for the language-modeling loss.
        outputs = self.model(input_ids=input_ids, pixel_values=pixel_values, labels=input_ids)
        # The classification head runs on embeddings from the generative model,
        # leveraging BLIP's image-text encoding capabilities.
        image_text_embeds = self.model.vision_model(pixel_values, return_dict=True).last_hidden_state
        image_text_embeds = self.head(image_text_embeds.view(-1, self.ebd_dim))
        logits = self.score(image_text_embeds)
        # generated-text outputs, classification logits
        return outputs, logits

model = BLIPNet()
# best_model_wts_path: path to the fine-tuned BLIPNet weights (defined elsewhere).
# strict=False tolerates keys that differ between the checkpoint and this wrapper.
model.load_state_dict(torch.load(best_model_wts_path), strict=False)
```
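A note on `ebd_dim`: 443,136 is consistent with flattening the output of a BLIP base vision encoder (ViT-B/16) at 384×384 resolution, which produces 577 tokens (24 × 24 patches plus the class token) of hidden size 768, and 577 × 768 = 443,136. If you swap in a different backbone or input resolution, recompute `ebd_dim` to match.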
You need to prepare the input sample in the same way as https://huggingface.co/uf-aice-lab/BLIP-Math. Then you get the generated text and the classification score at the same time, as in the sketch below.
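A minimal inference sketch, assuming the standard `BlipProcessor` preprocessing (the exact recipe is in the BLIP-Math card linked above); the image path and prompt text are placeholders:

```python
from PIL import Image
from transformers import BlipProcessor

processor = BlipProcessor.from_pretrained(MODEL_NAME)  # assumption: standard BLIP processor

image = Image.open("sample.png").convert("RGB")  # placeholder path
inputs = processor(images=image, text="placeholder prompt", return_tensors="pt")

model.eval()
with torch.no_grad():
    # One pass returns both the generative outputs and the classification logits.
    outputs, logits = model(pixel_values=inputs.pixel_values, input_ids=inputs.input_ids)
    # Generated text comes from the wrapped generation model.
    generated_ids = model.model.generate(pixel_values=inputs.pixel_values)

print(processor.batch_decode(generated_ids, skip_special_tokens=True))
print(torch.softmax(logits, dim=-1))  # class probabilities
```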