Upload model

Files changed (6)

README.md +199 -0
config.json +63 -0
configuration_dockgen.py +114 -0
generation_config.json +4 -0
model.safetensors +3 -0
modeling_dockgen.py +136 -0

README.md ADDED

	@@ -0,0 +1,199 @@

+---
+library_name: transformers
+tags: []
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

config.json ADDED

	@@ -0,0 +1,63 @@

+{
+  "architectures": [
+    "DockGenModel"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "auto_map": {
+    "AutoConfig": "configuration_dockgen.DockGenConfig",
+    "AutoModelForCausalLM": "modeling_dockgen.DockGenModel"
+  },
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 1024,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 40960,
+  "max_window_layers": 28,
+  "mm_token_id": 151655,
+  "model_type": "dockgen",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 8,
+  "prot_embedding_dim": 320,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 1000000,
+  "sliding_window": null,
+  "torch_dtype": "float32",
+  "transformers_version": "4.53.1",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}
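
The attention fields in this config.json are easy to misread, because `head_dim` is set independently of `hidden_size`. The sketch below is a plain-Python consistency check and not part of the commit; the helper name `attention_shapes` is hypothetical, and the values are copied from the file above.

```python
# Values copied from the config.json in this commit.
config = {
    "hidden_size": 1024,
    "head_dim": 128,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "num_hidden_layers": 28,
    "layer_types": ["full_attention"] * 28,
}


def attention_shapes(cfg: dict) -> dict:
    """Derive projection widths and the grouped-query ratio (hypothetical helper)."""
    if cfg["num_attention_heads"] % cfg["num_key_value_heads"] != 0:
        raise ValueError("query heads must be a multiple of key/value heads")
    if len(cfg["layer_types"]) != cfg["num_hidden_layers"]:
        raise ValueError("layer_types needs one entry per hidden layer")
    return {
        # Output width of q_proj: heads * head_dim, not hidden_size.
        "q_width": cfg["num_attention_heads"] * cfg["head_dim"],
        # Output width of k_proj and v_proj.
        "kv_width": cfg["num_key_value_heads"] * cfg["head_dim"],
        # Number of query heads that share one key/value head.
        "queries_per_kv_head": cfg["num_attention_heads"]
        // cfg["num_key_value_heads"],
    }


shapes = attention_shapes(config)
print(shapes)
```

With these values the query projection is 2048 wide while `hidden_size` is 1024, so code that assumes `head_dim == hidden_size // num_attention_heads` would compute 64 instead of the configured 128.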

configuration_dockgen.py ADDED

	@@ -0,0 +1,114 @@

+from typing import Any, Optional
+from transformers.models.qwen3 import Qwen3Config
+class DockGenConfig(Qwen3Config):
+    model_type = "dockgen"
+    keys_to_ignore_at_inference = ["past_key_values"]
+    # Default tensor parallel plan for base model `Qwen3`
+    base_model_tp_plan = {
+        "layers.*.self_attn.q_proj": "colwise",
+        "layers.*.self_attn.k_proj": "colwise",
+        "layers.*.self_attn.v_proj": "colwise",
+        "layers.*.self_attn.o_proj": "rowwise",
+        "layers.*.mlp.gate_proj": "colwise",
+        "layers.*.mlp.up_proj": "colwise",
+        "layers.*.mlp.down_proj": "rowwise",
+    }
+    base_model_pp_plan = {
+        "embed_tokens": (["input_ids"], ["inputs_embeds"]),
+        "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
+        "norm": (["hidden_states"], ["hidden_states"]),
+    }
+    def __init__(
+        self,
+        prot_embedding_dim: int = 1024,
+        mm_token_id: int = 151655,
+        vocab_size: int = 151936,
+        hidden_size: int = 4096,
+        intermediate_size: int = 22016,
+        num_hidden_layers: int = 32,
+        num_attention_heads: int = 32,
+        num_key_value_heads: int = 32,
+        head_dim: int = 128,
+        hidden_act: str = "silu",
+        max_position_embeddings: int = 32768,
+        initializer_range: float = 0.02,
+        rms_norm_eps: float = 1e-6,
+        use_cache: bool = True,
+        tie_word_embeddings: bool = True,
+        rope_theta: float = 10000.0,
+        rope_scaling: Optional[dict[str, Any]] = None,
+        attention_bias: bool = False,
+        use_sliding_window: bool = False,
+        sliding_window: int = 4096,
+        max_window_layers: int = 28,
+        layer_types: Optional[list[str]] = None,
+        attention_dropout: float = 0.0,
+        **kwargs: Any,
+    ):
+        self.prot_embedding_dim = prot_embedding_dim
+        self.mm_token_id = mm_token_id
+        super().__init__(
+            vocab_size=vocab_size,
+            hidden_size=hidden_size,
+            intermediate_size=intermediate_size,
+            num_hidden_layers=num_hidden_layers,
+            num_attention_heads=num_attention_heads,
+            num_key_value_heads=num_key_value_heads,
+            head_dim=head_dim,
+            hidden_act=hidden_act,
+            max_position_embeddings=max_position_embeddings,
+            initializer_range=initializer_range,
+            rms_norm_eps=rms_norm_eps,
+            use_cache=use_cache,
+            tie_word_embeddings=tie_word_embeddings,
+            rope_theta=rope_theta,
+            rope_scaling=rope_scaling,
+            attention_bias=attention_bias,
+            use_sliding_window=use_sliding_window,
+            sliding_window=sliding_window,
+            max_window_layers=max_window_layers,
+            layer_types=layer_types,
+            attention_dropout=attention_dropout,
+            **kwargs,
+        )
+    @classmethod
+    def from_qwen3_config(
+        cls,
+        qwen3_config: Qwen3Config,
+        prot_embedding_dim: int = 1024,
+        mm_token_id: int = 151655,
+        **kwargs: Any,
+    ) -> "DockGenConfig":
+        """Create a DockGenConfig from a Qwen3Config."""
+        return cls(
+            prot_embedding_dim=prot_embedding_dim,
+            mm_token_id=mm_token_id,
+            vocab_size=qwen3_config.vocab_size,
+            hidden_size=qwen3_config.hidden_size,
+            intermediate_size=qwen3_config.intermediate_size,
+            num_hidden_layers=qwen3_config.num_hidden_layers,
+            num_attention_heads=qwen3_config.num_attention_heads,
+            num_key_value_heads=qwen3_config.num_key_value_heads,
+            head_dim=qwen3_config.head_dim,
+            hidden_act=qwen3_config.hidden_act,
+            max_position_embeddings=qwen3_config.max_position_embeddings,
+            initializer_range=qwen3_config.initializer_range,
+            rms_norm_eps=qwen3_config.rms_norm_eps,
+            use_cache=qwen3_config.use_cache,
+            tie_word_embeddings=qwen3_config.tie_word_embeddings,
+            rope_theta=qwen3_config.rope_theta,
+            rope_scaling=qwen3_config.rope_scaling,
+            attention_bias=qwen3_config.attention_bias,
+            use_sliding_window=qwen3_config.use_sliding_window,
+            sliding_window=qwen3_config.sliding_window,
+            max_window_layers=qwen3_config.max_window_layers,
+            layer_types=qwen3_config.layer_types,
+            attention_dropout=qwen3_config.attention_dropout,
+            **kwargs,
+        )

generation_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "_from_model_config": true,
+  "transformers_version": "4.53.1"
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7051d7005cd5d718bf463396d07f0a71be4987d1ef31c70905280268dfd8de1e
+size 3007880032
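
As a plausibility check on the pointer's `size` field, the parameter count can be estimated from config.json. This is a rough sketch, not the author's tooling. It assumes the usual Qwen3 layer layout (bias-free q/k/v/o projections, a three-matrix gated MLP, two RMSNorms per layer, and per-head q/k norms of width `head_dim`) plus the `aligner` linear layer with bias. The function names are hypothetical.

```python
# Shape values copied from config.json in this commit.
HIDDEN = 1024
INTERMEDIATE = 3072
LAYERS = 28
HEADS = 16
KV_HEADS = 8
HEAD_DIM = 128
VOCAB = 151936
PROT_DIM = 320
BYTES_PER_PARAM = 4  # torch_dtype is float32
POINTER_SIZE = 3007880032  # size field of the LFS pointer above


def layer_params() -> int:
    """Parameters in one decoder layer under the assumed Qwen3 layout."""
    q = HIDDEN * HEADS * HEAD_DIM
    k = HIDDEN * KV_HEADS * HEAD_DIM
    v = k
    o = HEADS * HEAD_DIM * HIDDEN
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate_proj, up_proj, down_proj
    norms = 2 * HIDDEN + 2 * HEAD_DIM  # layer norms plus q_norm and k_norm
    return q + k + v + o + mlp + norms


def total_params(separate_lm_head: bool) -> int:
    """Total parameters, optionally counting lm_head as its own tensor."""
    embed = VOCAB * HIDDEN
    backbone = embed + LAYERS * layer_params() + HIDDEN  # + final norm
    aligner = PROT_DIM * HIDDEN + HIDDEN  # weight plus bias
    lm_head = VOCAB * HIDDEN if separate_lm_head else 0
    return backbone + aligner + lm_head


estimate = total_params(separate_lm_head=True) * BYTES_PER_PARAM
print(estimate, POINTER_SIZE - estimate)
```

Under these assumptions the estimate with a separately stored `lm_head` lands about 36 KB below the pointer size. That gap would be consistent with a safetensors header, but it does not prove how the file is laid out.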

modeling_dockgen.py ADDED

	@@ -0,0 +1,136 @@

+from typing import Any, Optional
+import torch
+from torch import nn
+from transformers.modeling_outputs import (
+    CausalLMOutputWithPast,
+)
+from transformers.models.qwen3.modeling_qwen3 import (
+    KwargsForCausalLM,
+    Qwen3ForCausalLM,
+    Qwen3Model,
+)
+from transformers.processing_utils import Unpack
+from .configuration_dockgen import DockGenConfig
+class DockGenModelBase(Qwen3Model):
+    config_class = DockGenConfig
+    def __init__(self, config: DockGenConfig) -> None:
+        super().__init__(config)
+    @classmethod
+    def from_language_model(cls, language_model: Qwen3Model) -> "DockGenModelBase":
+        """Create a DockGenModelBase from a Qwen3Model."""
+        base_model = language_model
+        dock_gen_config = DockGenConfig.from_qwen3_config(
+            language_model.config,
+        )
+        model = cls(dock_gen_config)
+        model.load_state_dict(base_model.state_dict(), strict=True)
+        return model
+class DockGenModel(Qwen3ForCausalLM):
+    config_class = DockGenConfig
+    _tied_weights_keys = ["lm_head.weight"]
+    _tp_plan = {"lm_head": "colwise_rep"}
+    _pp_plan = {"lm_head": (["hidden_states"], ["logits"])}
+    def __init__(self, config: DockGenConfig) -> None:
+        super(Qwen3ForCausalLM, self).__init__(config)
+        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
+        self.model = DockGenModelBase(config)
+        self.vocab_size = config.vocab_size
+        self.aligner = nn.Linear(
+            self.config.prot_embedding_dim, self.config.hidden_size, bias=True
+        )
+        self.post_init()
+    def get_multimodal_embeddings(
+        self, pixel_values: Optional[torch.Tensor]
+    ) -> Optional[torch.Tensor]:
+        if pixel_values is None:
+            return None
+        # Project precomputed embeddings (last dim `prot_embedding_dim`) to `hidden_size`.
+        embeddings = self.aligner(pixel_values)
+        return embeddings
+    def get_input_embed_embeddings(
+        self,
+        input_ids: torch.Tensor,
+        multimodal_embeddings: Optional[Any] = None,
+    ) -> torch.Tensor:
+        # Embed the text tokens, then overwrite the placeholder positions
+        # (`mm_token_id`) with the projected multimodal embeddings.
+        inputs_embeds = self.model.embed_tokens(input_ids)
+        if multimodal_embeddings is not None:
+            special_mm_mask = input_ids == self.config.mm_token_id
+            # One row of `multimodal_embeddings` is consumed per placeholder token.
+            assert special_mm_mask.sum() == multimodal_embeddings.shape[0], (
+                "The number of multimodal embeddings should match the number of "
+                "special multimodal tokens in the input_ids."
+            )
+            special_mm_mask = (
+                special_mm_mask.unsqueeze(-1)
+                .expand_as(inputs_embeds)
+                .to(inputs_embeds.device)
+            )
+            inputs_embeds = inputs_embeds.masked_scatter(
+                special_mm_mask, multimodal_embeddings.to(inputs_embeds.dtype)
+            )
+        return inputs_embeds
+    def forward(
+        self,
+        input_ids: Optional[torch.LongTensor] = None,
+        pixel_values: Optional[torch.FloatTensor] = None,
+        inputs_embeds: Optional[torch.FloatTensor] = None,
+        labels: Optional[torch.LongTensor] = None,
+        logits_to_keep: int = 0,
+        **kwargs: Unpack[KwargsForCausalLM],
+    ) -> CausalLMOutputWithPast:
+        if inputs_embeds is None:
+            multimodal_embeddings = self.get_multimodal_embeddings(pixel_values)
+            inputs_embeds = self.get_input_embed_embeddings(
+                input_ids=input_ids, multimodal_embeddings=multimodal_embeddings
+            )
+        return super().forward(
+            inputs_embeds=inputs_embeds,
+            labels=labels,
+            logits_to_keep=logits_to_keep,
+            **kwargs,
+        )
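+    @classmethod
+    def from_language_model(
+        cls,
+        language_model: Qwen3ForCausalLM,
+        prot_embedding_dim: int = 1024,
+        mm_token_id: int = 151655,
+    ) -> "DockGenModel":
+        """Create a DockGenModel from a Qwen3ForCausalLM model."""
+        base_model = DockGenModelBase.from_language_model(language_model.model)
+        dock_gen_config = DockGenConfig.from_qwen3_config(
+            language_model.config,
+            prot_embedding_dim=prot_embedding_dim,
+            mm_token_id=mm_token_id,
+        )
+        model = cls(dock_gen_config)
+        model.model = base_model
+        # Copy the output head too; `cls(...)` only gave it freshly initialized weights.
+        model.lm_head.load_state_dict(language_model.lm_head.state_dict())
+        # Re-tie lm_head to the swapped-in embeddings when the config asks for it.
+        model.tie_weights()
+        return model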
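
The placeholder substitution in `get_input_embed_embeddings` can be illustrated without torch. The sketch below is a plain-Python stand-in for the `masked_scatter` call, not the model's code: it uses lists instead of tensors and handles a single sequence. The function name `scatter_placeholders` is hypothetical, and `MM_TOKEN_ID` is taken from config.json.

```python
from typing import List

MM_TOKEN_ID = 151655  # mm_token_id from config.json

Vector = List[float]


def scatter_placeholders(
    input_ids: List[int],
    token_embeds: List[Vector],
    mm_embeds: List[Vector],
) -> List[Vector]:
    """Replace each placeholder row with the next multimodal row, in order."""
    n_placeholders = sum(1 for token in input_ids if token == MM_TOKEN_ID)
    if n_placeholders != len(mm_embeds):
        raise ValueError(
            "number of multimodal embeddings must match the number of "
            "placeholder tokens"
        )
    rows = iter(mm_embeds)
    # Placeholders are filled left to right, one multimodal row each.
    return [
        next(rows) if token == MM_TOKEN_ID else embed
        for token, embed in zip(input_ids, token_embeds)
    ]


ids = [1, MM_TOKEN_ID, MM_TOKEN_ID, 2]
text_rows = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.2, 0.2]]
mm_rows = [[9.0, 9.0], [8.0, 8.0]]
merged = scatter_placeholders(ids, text_rows, mm_rows)
print(merged)
```

The count check mirrors the assertion in the model code: each placeholder token consumes exactly one projected embedding row, so a mismatch means the prompt and the multimodal input are out of sync.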