Mishamq committed on
Commit a5649d8 · verified · 1 Parent(s): 310f831

Upload folder using huggingface_hub

README.md ADDED
---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- genomics
- dna
- mamba
- hybrid
- biology
---

# HybriDNA-3B

HybriDNA is a hybrid Mamba-Attention model for DNA sequence modeling. This is the 3B-parameter variant.

## Model Description

HybriDNA combines the efficiency of Mamba state space models with the expressiveness of attention mechanisms in a hybrid architecture. The model alternates between Mamba and attention layers to achieve both computational efficiency and strong sequence modeling capabilities.

### Architecture

- **Parameters**: ~3B
- **Hidden Size**: 4096
- **Layers**: 16 (hybrid Mamba + Attention)
- **Attention Heads**: 32
- **Key-Value Heads**: 8 (Grouped Query Attention)
- **Mamba Version**: Mamba-2
- **Vocabulary**: 12 tokens (A, C, G, T, N + special tokens)
- **Max Sequence Length**: 131,202 bp

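The exact layer layout can be derived from this repository's `config.json` (`attn_layer_period: 8`, `attn_layer_offset: 4`) using the same rule as the `layers_block_type` property in `configuration_hybridna.py`; a minimal sketch:

```python
# Reproduce the layer-type schedule used by HybriDNAConfig.layers_block_type:
# layer i is an attention layer when i % attn_layer_period == attn_layer_offset.
attn_layer_period = 8   # from config.json
attn_layer_offset = 4   # from config.json
num_hidden_layers = 16  # from config.json

layers = [
    "attention" if i % attn_layer_period == attn_layer_offset else "mamba"
    for i in range(num_hidden_layers)
]
print(layers)  # attention at layers 4 and 12; all other layers are Mamba-2
```

Consistent with this schedule, the safetensors weight index contains `self_attn.*` weights for layer 12 and `mamba.*` weights for the surrounding layers.
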
## Installation

```bash
pip install transformers torch mamba-ssm causal-conv1d flash-attn
```

## Usage

### Text Generation

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Mishamq/HybriDNA-3B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

prompt = "ACGTACGT"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

### Embeddings

```python
from transformers import AutoTokenizer, AutoModel
import torch

model_name = "Mishamq/HybriDNA-3B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

sequence = "ACGTACGTACGTACGT"
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state
```

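To get a single vector per sequence, the token-level `last_hidden_state` is commonly reduced by masked mean pooling (one common option, not prescribed by this repo); a minimal numpy sketch on dummy data, where only the shapes stand in for the model outputs:

```python
import numpy as np

# Dummy stand-ins for model outputs: batch of 2 sequences, 5 tokens, hidden size 4.
hidden = np.arange(2 * 5 * 4, dtype=np.float64).reshape(2, 5, 4)
attention_mask = np.array([[1, 1, 1, 0, 0],
                           [1, 1, 1, 1, 1]])

# Zero out padded positions, then average over the real tokens only.
mask = attention_mask[:, :, None]     # (batch, seq, 1)
summed = (hidden * mask).sum(axis=1)  # (batch, hidden)
counts = mask.sum(axis=1)             # (batch, 1)
pooled = summed / counts              # (batch, hidden)
print(pooled.shape)  # (2, 4)
```
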
## Model Variants

| Model | Parameters | Hidden Size | Layers |
|-------|------------|-------------|--------|
| [HybriDNA-300M](https://huggingface.co/Mishamq/HybriDNA-300M) | 300M | 1024 | 24 |
| [HybriDNA-3B](https://huggingface.co/Mishamq/HybriDNA-3B) | 3B | 4096 | 16 |
| [HybriDNA-7B](https://huggingface.co/Mishamq/HybriDNA-7B) | 7B | 4096 | 32 |

## Citation

If you use HybriDNA in your research, please cite:

```bibtex
@article{ma2025hybridna,
  title={HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model},
  author={Ma, Mingqian and Liu, Guoqing and Cao, Chuan and Deng, Pan and Dao, Tri and Gu, Albert and Jin, Peiran and Yang, Zhao and Xia, Yingce and Luo, Renqian and others},
  journal={arXiv preprint arXiv:2502.10807},
  year={2025}
}
```

## License

Apache 2.0
config.json ADDED
{
  "_name_or_path": "./",
  "architectures": [
    "HybriDNAForCausalLM"
  ],
  "attention_dropout": 0.0,
  "attn_layer_offset": 4,
  "attn_layer_period": 8,
  "auto_map": {
    "AutoConfig": "configuration_hybridna.HybriDNAConfig",
    "AutoModel": "modeling_hybridna.HybriDNAModel",
    "AutoModelForCausalLM": "modeling_hybridna.HybriDNAForCausalLM"
  },
  "bos_token_id": 2,
  "chunk_size": 256,
  "eos_token_id": 1,
  "expert_layer_offset": 7565761,
  "expert_layer_period": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "mamba_conv_bias": true,
  "mamba_d_conv": 4,
  "mamba_d_state": 64,
  "mamba_dt_rank": 64,
  "mamba_expand": 2,
  "mamba_proj_bias": false,
  "mamba_version": "mamba-2",
  "max_position_embeddings": 8194,
  "model_type": "hybridna",
  "n_groups": 8,
  "num_attention_heads": 32,
  "num_experts": 8,
  "num_experts_per_tok": 2,
  "num_hidden_layers": 16,
  "num_key_value_heads": 8,
  "num_logits_to_keep": 2,
  "output_router_logits": false,
  "pad_token_id": 4,
  "rms_norm_eps": 1e-06,
  "router_aux_loss_coef": 0.001,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "time_step_floor": 0.0001,
  "time_step_limit": [
    0.0,
    Infinity
  ],
  "time_step_max": 0.1,
  "time_step_min": 0.001,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.42.4",
  "use_cache": false,
  "use_mamba_kernels": true,
  "vocab_size": 12
}
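Note that `time_step_limit` above uses the bare token `Infinity`, which is not valid strict JSON but is accepted by lenient parsers such as Python's `json` module (its default `parse_constant` handling); a quick check:

```python
import json

# Python's json module accepts Infinity/-Infinity/NaN by default, so the
# shipped config.json loads without special handling.
snippet = '{"time_step_limit": [0.0, Infinity]}'
limits = json.loads(snippet)["time_step_limit"]
print(limits)  # [0.0, inf]
```
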
configuration_hybridna.py ADDED
import math

from transformers.configuration_utils import PretrainedConfig
from transformers.utils import logging


logger = logging.get_logger(__name__)


class HybriDNAConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`HybriDNA`] model. It is adapted from AI21's Jamba model.
    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.
    Args:
        vocab_size (`int`, *optional*, defaults to 65536):
            Vocabulary size of the HybriDNA model. Defines the number of different tokens that can be represented by the
            `inputs_ids` passed when calling [`HybriDNAModel`]
        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
            Whether the model's input and output word embeddings should be tied. Note that this is only relevant if the
            model has an output word embedding layer.
        hidden_size (`int`, *optional*, defaults to 4096):
            Dimension of the hidden representations.
        intermediate_size (`int`, *optional*, defaults to 14336):
            Dimension of the MLP representations.
        num_hidden_layers (`int`, *optional*, defaults to 32):
            Number of hidden layers in the Transformer encoder.
        num_attention_heads (`int`, *optional*, defaults to 32):
            Number of attention heads for each attention layer in the Transformer encoder.
        num_key_value_heads (`int`, *optional*, defaults to 8):
            This is the number of key_value heads that should be used to implement Grouped Query Attention. If
            `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA); if
            `num_key_value_heads=1`, the model will use Multi Query Attention (MQA); otherwise GQA is used. When
            converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
            by meanpooling all the original heads within that group. For more details check out [this
            paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to `8`.
        hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
            The non-linear activation function (function or string) in the decoder.
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        rms_norm_eps (`float`, *optional*, defaults to 1e-06):
            The epsilon used by the rms normalization layers.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return the last key/values attentions (not used by all models). Only
            relevant if `config.is_decoder=True`.
        num_logits_to_keep (`int` or `None`, *optional*, defaults to 1):
            Number of prompt logits to calculate during generation. If `None`, all logits will be calculated. If an
            integer value, only the last `num_logits_to_keep` logits will be calculated. Default is 1 because only the
            logits of the last prompt token are needed for generation. For long sequences, the logits for the entire
            sequence may use a lot of memory, so setting `num_logits_to_keep=1` will reduce the memory footprint
            significantly.
        pad_token_id (`int`, *optional*, defaults to 0):
            The id of the padding token.
        bos_token_id (`int`, *optional*, defaults to 1):
            The id of the "beginning-of-sequence" token.
        eos_token_id (`int`, *optional*, defaults to 2):
            The id of the "end-of-sequence" token.
        sliding_window (`int`, *optional*):
            Sliding window attention window size. If not specified, will default to `None`.
        max_position_embeddings (`int`, *optional*, defaults to 262144):
            This value doesn't have any real effect. The maximum sequence length that this model is intended to be
            used with. It can be used with longer sequences, but performance may degrade.
        attention_dropout (`float`, *optional*, defaults to 0.0):
            The dropout ratio for the attention probabilities.
        use_mamba_kernels (`bool`, *optional*, defaults to `True`):
            Flag indicating whether or not to use the fast mamba kernels. These are available only if `mamba-ssm` and
            `causal-conv1d` are installed, and the mamba modules are running on a CUDA device. Raises a ValueError if
            `True` and the kernels are not available.
        mamba_d_state (`int`, *optional*, defaults to 16):
            The dimension of the mamba state space latents.
        mamba_d_conv (`int`, *optional*, defaults to 4):
            The size of the mamba convolution kernel.
        mamba_expand (`int`, *optional*, defaults to 2):
            Expanding factor (relative to hidden_size) used to determine the mamba intermediate size.
        mamba_dt_rank (`Union[int,str]`, *optional*, defaults to `"auto"`):
            Rank of the mamba discretization projection matrix. `"auto"` means that it will default to
            `math.ceil(self.hidden_size / 16)`.
        mamba_conv_bias (`bool`, *optional*, defaults to `True`):
            Flag indicating whether or not to use bias in the convolution layer of the mamba mixer block.
        mamba_proj_bias (`bool`, *optional*, defaults to `False`):
            Flag indicating whether or not to use bias in the input and output projections (["in_proj", "out_proj"])
            of the mamba mixer block.
        head_dim (`int`, *optional*, defaults to 64):
            Dimension of each attention head.
        chunk_size (`int`, *optional*, defaults to 256):
            The size of each chunk for processing.
        n_groups (`int`, *optional*, defaults to 8):
            Number of groups for the evolution matrices of mamba 2.
        time_step_min (`float`, *optional*, defaults to 0.001):
            Minimum `time_step` used to bound `dt_proj.bias`.
        time_step_max (`float`, *optional*, defaults to 0.1):
            Maximum `time_step` used to bound `dt_proj.bias`.
        time_step_floor (`float`, *optional*, defaults to 0.0001):
            Minimum clamping value of the `dt_proj.bias` layer initialization.
        time_step_limit (`tuple`, *optional*, defaults to `(0.0, inf)`):
            Accepted range of time step values.
        attn_layer_period (`int`, *optional*, defaults to 8):
            Period of attention layers in the hybrid stack; see `layers_block_type`.
        attn_layer_offset (`int`, *optional*, defaults to 4):
            Offset within each period at which the attention layer is placed; see `layers_block_type`.
        output_router_logits (`bool`, *optional*, defaults to `False`):
            Whether to return the router logits from mixture-of-experts layers.
    """

    model_type = "hybridna"
    keys_to_ignore_at_inference = ["past_key_values"]

    def __init__(
        self,
        vocab_size=65536,
        tie_word_embeddings=False,
        hidden_size=4096,
        intermediate_size=14336,
        num_hidden_layers=32,
        num_attention_heads=32,
        num_key_value_heads=8,
        hidden_act="silu",
        initializer_range=0.02,
        rms_norm_eps=1e-6,
        use_cache=True,
        num_logits_to_keep=1,
        sliding_window=None,
        max_position_embeddings=262144,
        attention_dropout=0.0,
        use_mamba_kernels=True,
        mamba_d_state=16,
        mamba_d_conv=4,
        mamba_expand=2,
        mamba_dt_rank="auto",
        mamba_conv_bias=True,
        mamba_proj_bias=False,
        head_dim=64,
        chunk_size=256,
        n_groups=8,
        pad_token_id=0,
        bos_token_id=1,
        eos_token_id=2,
        time_step_min=0.001,
        time_step_max=0.1,
        time_step_floor=1e-4,
        time_step_limit=(0.0, float("inf")),
        attn_layer_period=8,
        attn_layer_offset=4,
        output_router_logits=False,
        **kwargs,
    ):
        self.output_router_logits = output_router_logits
        self.vocab_size = vocab_size
        self.tie_word_embeddings = tie_word_embeddings
        self.hidden_size = hidden_size
        self.intermediate_size = intermediate_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.sliding_window = sliding_window
        self.max_position_embeddings = max_position_embeddings
        self.attention_dropout = attention_dropout

        # for backward compatibility
        if num_key_value_heads is None:
            num_key_value_heads = num_attention_heads

        self.num_key_value_heads = num_key_value_heads
        self.hidden_act = hidden_act
        self.initializer_range = initializer_range
        self.rms_norm_eps = rms_norm_eps

        self.use_cache = use_cache
        self.num_logits_to_keep = num_logits_to_keep

        self.use_mamba_kernels = use_mamba_kernels
        self.mamba_d_state = mamba_d_state
        self.mamba_d_conv = mamba_d_conv
        self.mamba_expand = mamba_expand
        self.mamba_dt_rank = math.ceil(self.hidden_size / 16) if mamba_dt_rank == "auto" else mamba_dt_rank
        self.mamba_conv_bias = mamba_conv_bias
        self.mamba_proj_bias = mamba_proj_bias
        self.head_dim = head_dim
        self.chunk_size = chunk_size
        self.n_groups = n_groups
        self.time_step_limit = time_step_limit
        self.time_step_min = time_step_min
        self.time_step_max = time_step_max
        self.time_step_floor = time_step_floor
        # Explicit defaults so that `layers_block_type` works even when these
        # values are not supplied via config.json (the shipped config sets 8 and 4).
        self.attn_layer_period = attn_layer_period
        self.attn_layer_offset = attn_layer_offset

        super().__init__(
            pad_token_id=pad_token_id,
            bos_token_id=bos_token_id,
            eos_token_id=eos_token_id,
            tie_word_embeddings=tie_word_embeddings,
            output_router_logits=output_router_logits,
            **kwargs,
        )

    @property
    def layers_block_type(self):
        return [
            "attention" if i % self.attn_layer_period == self.attn_layer_offset else "mamba"
            for i in range(self.num_hidden_layers)
        ]
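The `"auto"` rule for `mamba_dt_rank` resolves to `math.ceil(hidden_size / 16)`; for this model's hidden size:

```python
import math

hidden_size = 4096  # from config.json
dt_rank = math.ceil(hidden_size / 16)  # what mamba_dt_rank="auto" would yield
print(dt_rank)  # 256
```

Note that the shipped `config.json` pins `"mamba_dt_rank": 64` explicitly, overriding the auto rule.
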
generation_config.json ADDED
{
  "_from_model_config": true,
  "bos_token_id": 2,
  "eos_token_id": 1,
  "pad_token_id": 4,
  "transformers_version": "4.42.4",
  "use_cache": false
}
hybridna_tokenizer.py ADDED
from transformers import PreTrainedTokenizer
from typing import List, Optional, Union, Dict, Tuple
import numpy as np
import json
import os


class HybriDNATokenizer(PreTrainedTokenizer):
    model_input_names = ["input_ids", "attention_mask"]

    def __init__(self,
                 model_max_length: int,
                 bos_token="[BOS]",
                 eos_token="[SEP]",
                 sep_token="[SEP]",
                 cls_token="[CLS]",
                 pad_token="[PAD]",
                 mask_token="[MASK]",
                 unk_token="[UNK]",
                 **kwargs):
        """Character-level DNA tokenizer for Hugging Face transformers.

        The vocabulary is fixed: the characters A, C, G, T, N (ids 7-11) plus
        the following special tokens:
            "[CLS]": 0
            "[SEP]": 1
            "[BOS]": 2
            "[MASK]": 3
            "[PAD]": 4
            "[RESERVED]": 5
            "[UNK]": 6
        Any character outside this set is mapped to [UNK] (id 6).

        Args:
            model_max_length (int): Model maximum sequence length.
        """
        self.characters = ('A', 'C', 'G', 'T', 'N')
        self.model_max_length = model_max_length

        self._vocab_str_to_int = {
            "[CLS]": 0,
            "[SEP]": 1,
            "[BOS]": 2,
            "[MASK]": 3,
            "[PAD]": 4,
            "[RESERVED]": 5,
            "[UNK]": 6,
            **{ch: i + 7 for i, ch in enumerate(self.characters)},
        }
        self._vocab_int_to_str = {v: k for k, v in self._vocab_str_to_int.items()}
        self._bos_id = self._vocab_str_to_int["[BOS]"]
        self._eos_id = self._vocab_str_to_int["[SEP]"]
        self._pad_id = self._vocab_str_to_int["[PAD]"]
        self._unk_id = self._vocab_str_to_int["[UNK]"]
        self._bos_np = np.array([self._bos_id], dtype=np.uint16)
        self._eos_np = np.array([self._eos_id], dtype=np.uint16)
        # Byte-level lookup table: the ASCII codes of A/C/G/T/N map to their
        # token ids, every other byte maps to [UNK].
        self._numpy_lookup = np.full(256, self._unk_id, dtype=np.uint16)
        for ch in self.characters:
            self._numpy_lookup[ord(ch)] = self._vocab_str_to_int[ch]
        add_prefix_space = kwargs.pop("add_prefix_space", False)
        padding_side = kwargs.pop("padding_side", "left")

        super().__init__(
            bos_token=bos_token,
            eos_token=eos_token,
            sep_token=sep_token,
            cls_token=cls_token,
            pad_token=pad_token,
            mask_token=mask_token,
            unk_token=unk_token,
            add_prefix_space=add_prefix_space,
            model_max_length=model_max_length,
            padding_side=padding_side,
            **kwargs,
        )

    @property
    def vocab_size(self) -> int:
        return len(self._vocab_str_to_int)

    def _tokenize(self, text: str) -> List[str]:
        return list(text)

    def _convert_token_to_id(self, token: str) -> int:
        return self._vocab_str_to_int.get(token, self._vocab_str_to_int["[UNK]"])

    def _convert_id_to_token(self, index: int) -> str:
        return self._vocab_int_to_str[index]

    def convert_tokens_to_string(self, tokens):
        return "".join(tokens)

    def get_special_tokens_mask(
        self,
        token_ids_0: List[int],
        token_ids_1: Optional[List[int]] = None,
        already_has_special_tokens: bool = False,
    ) -> List[int]:
        if already_has_special_tokens:
            return super().get_special_tokens_mask(
                token_ids_0=token_ids_0,
                token_ids_1=token_ids_1,
                already_has_special_tokens=True,
            )

        result = ([0] * len(token_ids_0)) + [1]
        if token_ids_1 is not None:
            result += ([0] * len(token_ids_1)) + [1]
        return result

    def build_inputs_with_special_tokens(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        bos = [self.bos_token_id]
        eos = [self.eos_token_id]
        result = bos + token_ids_0 + eos
        if token_ids_1 is not None:
            result += token_ids_1 + eos
        return result

    def create_attention_mask(self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) -> List[int]:
        """Creates an attention mask to differentiate between padding and non-padding tokens.

        Args:
            token_ids_0 (List[int]): List of token IDs for the first sequence.
            token_ids_1 (Optional[List[int]]): List of token IDs for the second sequence if available.

        Returns:
            List[int]: A list where 1 represents non-padding tokens and 0 represents padding tokens.
        """
        mask = [1] * len(token_ids_0)
        if token_ids_1 is not None:
            mask += [1] * len(token_ids_1)
        return mask

    def get_vocab(self) -> Dict[str, int]:
        return self._vocab_str_to_int

    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple:
        vocab_file = os.path.join(save_directory, (filename_prefix or '') + 'vocab.json')
        with open(vocab_file, 'w') as f:
            json.dump(self._vocab_str_to_int, f)
        return (vocab_file,)

    def __call__(
        self,
        text: Union[str, List[str]],
        *,
        padding: Union[bool, str] = True,
        truncation: bool = True,
        max_length: Optional[int] = None,
        add_special_tokens: bool = True,
        return_tensors: Optional[str] = None,
    ):
        # ---------- detect batch vs single ----------
        is_batch = not isinstance(text, str)
        seqs = text if is_batch else [text]  # always work on a list internally
        max_len = max_length or self.model_max_length

        # ---------- encode every sequence ----------
        batch_input_ids = []
        for seq in seqs:
            seq_bytes = np.frombuffer(seq.encode("ascii", "ignore"), dtype=np.uint8)
            ids = self._numpy_lookup[seq_bytes]
            if add_special_tokens:
                ids = np.concatenate((self._bos_np, ids, self._eos_np))
            if truncation and ids.size > max_len:
                ids = ids[:max_len]
            batch_input_ids.append(ids.astype(np.uint16, copy=False))

        # ---------- pad ----------
        if padding and batch_input_ids:
            if padding == "max_length":
                pad_len = max_len
            else:  # True or "longest": pad to the longest sequence in the batch
                pad_len = max(ids.size for ids in batch_input_ids)
            pad_len = min(pad_len, max_len)
            padded_ids = []
            for ids in batch_input_ids:
                if ids.size < pad_len:
                    ids = np.pad(ids, (0, pad_len - ids.size), constant_values=self._pad_id)
                elif ids.size > pad_len:
                    ids = ids[:pad_len]
                padded_ids.append(np.ascontiguousarray(ids, dtype=np.uint16))
            batch_input_ids = padded_ids

        # ---------- masks ----------
        batch_attention = [
            (ids != self._pad_id).astype(np.uint8, copy=False) for ids in batch_input_ids
        ]

        if return_tensors == "pt":
            # NOTE: return_tensors="pt" is handled here so the README examples
            # work. torch has no uint16 dtype, so ids are widened to int64, and a
            # batch dimension is kept even for a single sequence (HF convention).
            # This assumes uniform lengths, i.e. padding was not disabled.
            import torch
            input_ids = torch.from_numpy(np.asarray(batch_input_ids, dtype=np.int64))
            attention_mask = torch.from_numpy(np.asarray(batch_attention, dtype=np.int64))
            return {"input_ids": input_ids, "attention_mask": attention_mask}

        # ---------- collapse back if it was a single example ----------
        if not is_batch:
            batch_input_ids = batch_input_ids[0]
            batch_attention = batch_attention[0]

        return {
            "input_ids": batch_input_ids,
            "attention_mask": batch_attention,
        }
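The fixed vocabulary above is small enough to spell out; a standalone sketch of the mapping and of how a sequence is framed with `[BOS]`/`[SEP]` (ids taken from the tokenizer source; note they match `bos_token_id: 2` and `eos_token_id: 1` in generation_config.json):

```python
# Vocabulary as defined in HybriDNATokenizer: 7 special tokens + 5 bases = 12 ids.
vocab = {
    "[CLS]": 0, "[SEP]": 1, "[BOS]": 2, "[MASK]": 3,
    "[PAD]": 4, "[RESERVED]": 5, "[UNK]": 6,
    "A": 7, "C": 8, "G": 9, "T": 10, "N": 11,
}

def encode(seq: str) -> list:
    """Character-level encoding with [BOS] ... [SEP] framing; unknowns -> [UNK]."""
    body = [vocab.get(ch, vocab["[UNK]"]) for ch in seq]
    return [vocab["[BOS]"]] + body + [vocab["[SEP]"]]

print(encode("ACGT"))  # [2, 7, 8, 9, 10, 1]
```
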
model-00001-of-00002.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:6bc123358e87c99d4b88ea170e3eaf6d05145a3c1160497ddf57fb08c91f1c9e
size 4956986656
model-00002-of-00002.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:cf7cee3faa7bfab0d97a3498720e8185a3724f631401619a20ab1edb549b5b65
size 1281837808
model.safetensors.index.json ADDED
{
  "metadata": {
    "total_size": 6238801920
  },
  "weight_map": {
    "lm_head.weight": "model-00002-of-00002.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.final_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.0.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mamba.A_log": "model-00001-of-00002.safetensors",
    "model.layers.0.mamba.D": "model-00001-of-00002.safetensors",
    "model.layers.0.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
    "model.layers.0.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mamba.dt_bias": "model-00001-of-00002.safetensors",
    "model.layers.0.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mamba.norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.pre_ff_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mamba.A_log": "model-00001-of-00002.safetensors",
    "model.layers.1.mamba.D": "model-00001-of-00002.safetensors",
    "model.layers.1.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
    "model.layers.1.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mamba.dt_bias": "model-00001-of-00002.safetensors",
    "model.layers.1.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mamba.norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.pre_ff_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.mamba.A_log": "model-00001-of-00002.safetensors",
    "model.layers.10.mamba.D": "model-00001-of-00002.safetensors",
    "model.layers.10.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
    "model.layers.10.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.mamba.dt_bias": "model-00001-of-00002.safetensors",
    "model.layers.10.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.mamba.norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.pre_ff_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.mamba.A_log": "model-00001-of-00002.safetensors",
    "model.layers.11.mamba.D": "model-00001-of-00002.safetensors",
    "model.layers.11.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
    "model.layers.11.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.mamba.dt_bias": "model-00001-of-00002.safetensors",
    "model.layers.11.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.mamba.norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.pre_ff_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.feed_forward.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.12.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.12.pre_ff_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.feed_forward.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.13.feed_forward.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.13.feed_forward.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.13.mamba.A_log": "model-00002-of-00002.safetensors",
    "model.layers.13.mamba.D": "model-00002-of-00002.safetensors",
    "model.layers.13.mamba.conv1d.bias": "model-00002-of-00002.safetensors",
    "model.layers.13.mamba.conv1d.weight": "model-00002-of-00002.safetensors",
    "model.layers.13.mamba.dt_bias": "model-00002-of-00002.safetensors",
    "model.layers.13.mamba.in_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.13.mamba.norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.13.mamba.out_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.13.pre_ff_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.14.feed_forward.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.14.feed_forward.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.14.feed_forward.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.14.mamba.A_log": "model-00002-of-00002.safetensors",
    "model.layers.14.mamba.D": "model-00002-of-00002.safetensors",
    "model.layers.14.mamba.conv1d.bias": "model-00002-of-00002.safetensors",
    "model.layers.14.mamba.conv1d.weight": "model-00002-of-00002.safetensors",
    "model.layers.14.mamba.dt_bias": "model-00002-of-00002.safetensors",
    "model.layers.14.mamba.in_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.14.mamba.norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.14.mamba.out_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.14.pre_ff_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.15.feed_forward.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.15.feed_forward.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.15.feed_forward.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.15.mamba.A_log": "model-00002-of-00002.safetensors",
    "model.layers.15.mamba.D": "model-00002-of-00002.safetensors",
    "model.layers.15.mamba.conv1d.bias": "model-00002-of-00002.safetensors",
    "model.layers.15.mamba.conv1d.weight": "model-00002-of-00002.safetensors",
    "model.layers.15.mamba.dt_bias": "model-00002-of-00002.safetensors",
    "model.layers.15.mamba.in_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.15.mamba.norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.15.mamba.out_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.15.pre_ff_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.2.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.mamba.A_log": "model-00001-of-00002.safetensors",
    "model.layers.2.mamba.D": "model-00001-of-00002.safetensors",
    "model.layers.2.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
    "model.layers.2.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.mamba.dt_bias": "model-00001-of-00002.safetensors",
    "model.layers.2.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.mamba.norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
121
+ "model.layers.2.pre_ff_layernorm.weight": "model-00001-of-00002.safetensors",
122
+ "model.layers.3.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
123
+ "model.layers.3.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
124
+ "model.layers.3.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
125
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
126
+ "model.layers.3.mamba.A_log": "model-00001-of-00002.safetensors",
127
+ "model.layers.3.mamba.D": "model-00001-of-00002.safetensors",
128
+ "model.layers.3.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
129
+ "model.layers.3.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
130
+ "model.layers.3.mamba.dt_bias": "model-00001-of-00002.safetensors",
131
+ "model.layers.3.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
132
+ "model.layers.3.mamba.norm.weight": "model-00001-of-00002.safetensors",
133
+ "model.layers.3.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
134
+ "model.layers.3.pre_ff_layernorm.weight": "model-00001-of-00002.safetensors",
135
+ "model.layers.4.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
136
+ "model.layers.4.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
137
+ "model.layers.4.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
138
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
139
+ "model.layers.4.pre_ff_layernorm.weight": "model-00001-of-00002.safetensors",
140
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
141
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
142
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
143
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
144
+ "model.layers.5.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
145
+ "model.layers.5.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
146
+ "model.layers.5.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
147
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
148
+ "model.layers.5.mamba.A_log": "model-00001-of-00002.safetensors",
149
+ "model.layers.5.mamba.D": "model-00001-of-00002.safetensors",
150
+ "model.layers.5.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
151
+ "model.layers.5.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
152
+ "model.layers.5.mamba.dt_bias": "model-00001-of-00002.safetensors",
153
+ "model.layers.5.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
154
+ "model.layers.5.mamba.norm.weight": "model-00001-of-00002.safetensors",
155
+ "model.layers.5.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
156
+ "model.layers.5.pre_ff_layernorm.weight": "model-00001-of-00002.safetensors",
157
+ "model.layers.6.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
158
+ "model.layers.6.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
159
+ "model.layers.6.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
160
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
161
+ "model.layers.6.mamba.A_log": "model-00001-of-00002.safetensors",
162
+ "model.layers.6.mamba.D": "model-00001-of-00002.safetensors",
163
+ "model.layers.6.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
164
+ "model.layers.6.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
165
+ "model.layers.6.mamba.dt_bias": "model-00001-of-00002.safetensors",
166
+ "model.layers.6.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
167
+ "model.layers.6.mamba.norm.weight": "model-00001-of-00002.safetensors",
168
+ "model.layers.6.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
169
+ "model.layers.6.pre_ff_layernorm.weight": "model-00001-of-00002.safetensors",
170
+ "model.layers.7.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
171
+ "model.layers.7.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
172
+ "model.layers.7.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
173
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
174
+ "model.layers.7.mamba.A_log": "model-00001-of-00002.safetensors",
175
+ "model.layers.7.mamba.D": "model-00001-of-00002.safetensors",
176
+ "model.layers.7.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
177
+ "model.layers.7.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
178
+ "model.layers.7.mamba.dt_bias": "model-00001-of-00002.safetensors",
179
+ "model.layers.7.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
180
+ "model.layers.7.mamba.norm.weight": "model-00001-of-00002.safetensors",
181
+ "model.layers.7.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
182
+ "model.layers.7.pre_ff_layernorm.weight": "model-00001-of-00002.safetensors",
183
+ "model.layers.8.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
184
+ "model.layers.8.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
185
+ "model.layers.8.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
186
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
187
+ "model.layers.8.mamba.A_log": "model-00001-of-00002.safetensors",
188
+ "model.layers.8.mamba.D": "model-00001-of-00002.safetensors",
189
+ "model.layers.8.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
190
+ "model.layers.8.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
191
+ "model.layers.8.mamba.dt_bias": "model-00001-of-00002.safetensors",
192
+ "model.layers.8.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
193
+ "model.layers.8.mamba.norm.weight": "model-00001-of-00002.safetensors",
194
+ "model.layers.8.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
195
+ "model.layers.8.pre_ff_layernorm.weight": "model-00001-of-00002.safetensors",
196
+ "model.layers.9.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
197
+ "model.layers.9.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
198
+ "model.layers.9.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
199
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
200
+ "model.layers.9.mamba.A_log": "model-00001-of-00002.safetensors",
201
+ "model.layers.9.mamba.D": "model-00001-of-00002.safetensors",
202
+ "model.layers.9.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
203
+ "model.layers.9.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
204
+ "model.layers.9.mamba.dt_bias": "model-00001-of-00002.safetensors",
205
+ "model.layers.9.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
206
+ "model.layers.9.mamba.norm.weight": "model-00001-of-00002.safetensors",
207
+ "model.layers.9.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
208
+ "model.layers.9.pre_ff_layernorm.weight": "model-00001-of-00002.safetensors"
209
+ }
210
+ }
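The `weight_map` above pairs every tensor name with the shard file that stores it, and the parameter names also expose the hybrid layout: attention layers carry `self_attn.*` weights (layer 4 here), while Mamba layers carry `mamba.*` weights. A minimal sketch of reading such an index — the small dictionary below is a hypothetical excerpt standing in for the full map:

```python
# Hypothetical excerpt of the "weight_map" from the safetensors index;
# the real file maps every tensor name to its shard.
weight_map = {
    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.mamba.in_proj.weight": "model-00002-of-00002.safetensors",
}

def layer_types(weight_map):
    """Classify each layer as 'attention' or 'mamba' from its parameter names."""
    types = {}
    for name in weight_map:
        parts = name.split(".")
        if parts[:2] == ["model", "layers"]:
            idx = int(parts[2])
            if ".self_attn." in name:
                types[idx] = "attention"
            elif ".mamba." in name:
                types[idx] = "mamba"
    return types

def shard_contents(weight_map):
    """Group tensor names by the shard file that stores them."""
    shards = {}
    for name, shard in weight_map.items():
        shards.setdefault(shard, []).append(name)
    return shards
```

`layer_types` on the excerpt yields `{4: "attention", 5: "mamba", 13: "mamba"}`, matching the alternating Mamba/Attention pattern described in the model card.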
modeling_hybridna.py ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "[BOS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,70 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "[BOS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "4": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "6": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "auto_map": {
+ "AutoTokenizer": [
+ "hybridna_tokenizer.HybriDNATokenizer",
+ null
+ ]
+ },
+ "bos_token": "[BOS]",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "eos_token": "[SEP]",
+ "mask_token": "[MASK]",
+ "model_max_length": 131202,
+ "pad_token": "[PAD]",
+ "padding_side": "left",
+ "sep_token": "[SEP]",
+ "tokenizer_class": "HybriDNATokenizer",
+ "unk_token": "[UNK]"
+ }
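The config sets `"padding_side": "left"`, so batched prompts are padded at the front with `[PAD]` (id 4 in the vocab) up to the length of the longest sequence — the usual choice for decoder-style generation, since it keeps the real sequence ends aligned at the final positions. A minimal sketch of what that padding produces (the helper name is illustrative, not part of the tokenizer's API):

```python
PAD_ID = 4  # "[PAD]" in vocab.json

def pad_batch_left(sequences, pad_id=PAD_ID):
    """Left-pad a list of token-id lists to the length of the longest one."""
    width = max(len(seq) for seq in sequences)
    return [[pad_id] * (width - len(seq)) + seq for seq in sequences]
```

For example, `pad_batch_left([[7, 8], [7, 8, 9, 10]])` prepends two pad ids to the shorter sequence, yielding `[[4, 4, 7, 8], [7, 8, 9, 10]]`.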
vocab.json ADDED
@@ -0,0 +1 @@
+ {"[CLS]": 0, "[SEP]": 1, "[BOS]": 2, "[MASK]": 3, "[PAD]": 4, "[RESERVED]": 5, "[UNK]": 6, "A": 7, "C": 8, "G": 9, "T": 10, "N": 11}
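The vocabulary is character-level: each nucleotide (A, C, G, T, plus the ambiguity code N) maps to a single token id, with seven special tokens occupying ids 0–6. A hedged sketch of that encoding — whether the actual `HybriDNATokenizer` prepends `[BOS]` is an assumption here, so it is exposed as a flag:

```python
# Vocabulary copied from vocab.json above.
VOCAB = {"[CLS]": 0, "[SEP]": 1, "[BOS]": 2, "[MASK]": 3, "[PAD]": 4,
         "[RESERVED]": 5, "[UNK]": 6,
         "A": 7, "C": 8, "G": 9, "T": 10, "N": 11}
UNK_ID = VOCAB["[UNK]"]

def encode(seq, add_bos=True):
    """Character-level encoding: one token per base; unknown characters map to [UNK]."""
    ids = [VOCAB.get(base, UNK_ID) for base in seq.upper()]
    return ([VOCAB["[BOS]"]] if add_bos else []) + ids
```

With this mapping, `encode("ACGTN")` gives `[2, 7, 8, 9, 10, 11]`, and any character outside the DNA alphabet falls back to id 6 (`[UNK]`).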