Edwin Jose Palathinkal committed on
Commit
6ac124b
·
1 Parent(s): 72c9685

Add HuggingFace Transformers compatibility with AutoModel and Pipeline support


- Add modeling_namer.py with NamerModel (PreTrainedModel + GenerationMixin)
- Add NamerPipeline for easy inference: pipe.generate(42) -> 'forty two'
- Add config.json and generation_config.json for HF integration
- Add convert_checkpoint.py for converting old checkpoints
- Update README with HF usage examples
- Update namer/__init__.py to export new HF-compatible classes

.gitignore CHANGED
@@ -39,3 +39,4 @@ Thumbs.db
39
  # Project specific
40
  namer_model.pt
41
  .pip-tmp/
39
  # Project specific
40
  namer_model.pt
41
  .pip-tmp/
42
+ pip-tmp/
README.md CHANGED
@@ -33,63 +33,62 @@ Namer is a sequence-to-sequence transformer trained to read digits of a number a
33
 
34
  ## Usage
35
 
36
- ### Quick Start
37
 
38
  ```python
39
- import torch
40
- from namer import load_namer_model, predict_number_name
41
 
42
- # Load model
43
- device = "cuda" if torch.cuda.is_available() else "cpu"
44
- model = load_namer_model("namer_model.pt", device)
45
 
46
- # Convert number to name
47
- name = predict_number_name(model, 42)
48
- print(f"42 -> '{name}'") # Output: forty two
49
- ```
50
 
51
- ### Interactive Mode
52
 
53
- ```bash
54
- python -m namer infer
55
  ```
56
 
57
- Then enter numbers to convert interactively.
58
-
59
- ### API
60
 
61
  ```python
62
- from namer.inference import predict_number_name
63
 
64
- # Single prediction
65
- name = predict_number_name(model, 123456)
66
- # Returns: "one hundred twenty three thousand four hundred fifty six"
67
  ```
68
 
69
- ## Model Architecture
70
 
71
- - **Type**: Sequence-to-sequence transformer
72
- - **Input**: Digits of the integer (as token indices)
73
- - **Output**: English words representing the number
74
- - **Vocabulary**: English number words (zero-nineteen, twenty-ninety, hundred, thousand, million, billion, etc.)
75
-
76
- ## Files
77
 
78
- | File | Description |
79
- |------|-------------|
80
- | `namer_model.pt` | Trained model weights |
81
- | `namer/models.py` | Transformer architecture |
82
- | `namer/inference.py` | Prediction utilities |
83
- | `namer/utils.py` | Encoding/decoding utilities |
84
 
85
- ## Training
86
 
87
- To train from scratch:
88
 
89
  ```bash
90
- python -m namer train
91
  ```
92
93
  ## Installation
94
 
95
  Choose either repository — both have identical code:
@@ -113,6 +112,33 @@ pip install -e .
113
  pip install git+https://github.com/edwinhere/namer.git
114
  ```
115
 
116
  ## Citation
117
 
118
  If you use this model, please cite:
 
33
 
34
  ## Usage
35
 
36
+ ### 🚀 HuggingFace Transformers (Recommended)
37
+
38
+ Load and use the model with HuggingFace's `AutoModel` API:
39
 
40
  ```python
41
+ from transformers import AutoModel
42
+ from namer import NamerPipeline
43
 
44
+ # Load model from HuggingFace
45
+ model = AutoModel.from_pretrained(
46
+ "edwinhere/namer",
47
+ trust_remote_code=True
48
+ )
49
 
50
+ # Create pipeline
51
+ pipe = NamerPipeline(model)
52
 
53
+ # Generate number names
54
+ result = pipe.generate(42) # "forty two"
55
+ result = pipe.generate(1234567) # "one million two hundred thirty four thousand five hundred sixty seven"
56
 
57
+ # Or use callable interface (HF compatible)
58
+ result = pipe(42) # {"generated_text": "forty two"}
59
  ```
60
 
61
+ Alternatively, use the convenience function:
62
 
63
  ```python
64
+ from namer import load_namer_pipeline
65
 
66
+ pipe = load_namer_pipeline("edwinhere/namer")
67
+ print(pipe.generate(42)) # "forty two"
 
68
  ```
69
 
70
+ ### 🔄 Original API (Local)
71
 
72
+ ```python
73
+ import torch
74
+ from namer import load_namer_model, predict_number_name
75
 
76
+ # Load model
77
+ model = load_namer_model("namer_model.pt")
78
 
79
+ # Convert number to name
80
+ name = predict_number_name(model, 42)
81
+ print(f"42 -> '{name}'")
82
+ ```
83
 
84
+ ### 💻 Interactive Mode
85
 
86
  ```bash
87
+ python -m namer infer
88
  ```
89
 
90
+ Then enter numbers to convert interactively.
91
+
92
  ## Installation
93
 
94
  Choose either repository — both have identical code:
 
112
  pip install git+https://github.com/edwinhere/namer.git
113
  ```
114
 
115
+ ## Model Architecture
116
+
117
+ - **Type**: Sequence-to-sequence transformer
118
+ - **Input**: Digits of the integer (as token indices)
119
+ - **Output**: English words representing the number
120
+ - **Vocabulary**: English number words (zero-nineteen, twenty-ninety, hundred, thousand, million, billion, etc.)
121
+ - **Max Output Length**: 20 tokens
122
+
123
+ ## Files
124
+
125
+ | File | Description |
126
+ |------|-------------|
127
+ | `pytorch_model.bin` | HuggingFace model weights |
128
+ | `config.json` | Model configuration |
129
+ | `generation_config.json` | Generation parameters |
130
+ | `modeling_namer.py` | HF-compatible model implementation |
131
+ | `namer_model.pt` | Original PyTorch checkpoint |
132
+ | `namer/` | Source code package |
133
+
134
+ ## Training
135
+
136
+ To train from scratch:
137
+
138
+ ```bash
139
+ python -m namer train
140
+ ```
141
+
142
  ## Citation
143
 
144
  If you use this model, please cite:
config.json ADDED
@@ -0,0 +1,17 @@
1
+ {
2
+ "architectures": [
3
+ "NamerModel"
4
+ ],
5
+ "d_model": 128,
6
+ "dim_feedforward": 512,
7
+ "dropout": 0.0,
8
+ "dtype": "float32",
9
+ "eos_token_id": 40,
10
+ "max_output_len": 20,
11
+ "model_type": "custom",
12
+ "nhead": 4,
13
+ "num_encoder_layers": 4,
14
+ "pad_token_id": 10,
15
+ "transformers_version": "5.8.0",
16
+ "vocab_size": 41
17
+ }
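The `vocab_size` of 41 and `eos_token_id` of 40 in `config.json` follow directly from the output vocabulary defined in `modeling_namer.py`. A quick sketch of that arithmetic, mirroring the `id2word` mapping from the pipeline code:

```python
# Reconstruct the output vocabulary from modeling_namer.py's id2word mapping
# to show where config.json's vocab_size=41 and eos_token_id=40 come from.
units = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]   # ids 0-19
tens = ["twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]                       # ids 20-27
scales = ["hundred", "thousand", "million", "billion", "trillion",
          "quadrillion", "quintillion", "sextillion", "septillion",
          "octillion", "nonillion", "decillion"]                      # ids 28-39
vocab = units + tens + scales + ["<EOS>"]                             # id 40

print(len(vocab))             # 41
print(vocab.index("<EOS>"))   # 40
```

Note that the pad token (10) lives in the *input* digit vocabulary (digits 0-9 plus padding), which is why id 10 in the output vocabulary can still mean "ten".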
convert_checkpoint.py ADDED
@@ -0,0 +1,29 @@
1
+ """Convert old checkpoint format to HuggingFace format."""
2
+
3
+ import torch
4
+ from modeling_namer import NamerModel, NamerConfig
5
+
6
+ # Load old checkpoint
7
+ checkpoint = torch.load("namer_model.pt", map_location="cpu")
8
+
9
+ # Create config from checkpoint
10
+ config = NamerConfig(
11
+ vocab_size=checkpoint["vocab_size"],
12
+ max_output_len=checkpoint["max_output_len"],
13
+ d_model=checkpoint.get("d_model", 128),
14
+ nhead=4,
15
+ num_encoder_layers=4,
16
+ dim_feedforward=512,
17
+ dropout=0.0,
18
+ )
19
+
20
+ # Create new model
21
+ model = NamerModel(config)
22
+
23
+ # Load old weights into new model
24
+ model.load_state_dict(checkpoint["model_state_dict"], strict=False)
25
+
26
+ # Save in HF format
27
+ model.save_pretrained(".")
28
+ print("Model converted and saved to current directory")
29
+ print("Files saved: pytorch_model.bin, config.json")
generation_config.json ADDED
@@ -0,0 +1,8 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "eos_token_id": 40,
4
+ "output_attentions": false,
5
+ "output_hidden_states": false,
6
+ "pad_token_id": 10,
7
+ "transformers_version": "5.8.0"
8
+ }
modeling_namer.py ADDED
@@ -0,0 +1,342 @@
1
+ """HuggingFace compatible Namer model."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import math
6
+ from typing import Optional, Union
7
+
8
+ import torch
9
+ import torch.nn as nn
10
+ from transformers import PreTrainedModel, PretrainedConfig
11
+ from transformers.modeling_outputs import CausalLMOutputWithCrossAttentions
12
+ from transformers.generation import GenerationMixin
13
+
14
+
15
+ class NamerConfig(PretrainedConfig):
16
+ """Configuration class for NamerModel."""
17
+
18
+ model_type = "custom"
19
+
20
+ def __init__(
21
+ self,
22
+ vocab_size: int = 41,
23
+ max_output_len: int = 20,
24
+ d_model: int = 128,
25
+ nhead: int = 4,
26
+ num_encoder_layers: int = 4,
27
+ dim_feedforward: int = 512,
28
+ dropout: float = 0.1,
29
+ pad_token_id: int = 10,
30
+ eos_token_id: int = 40, # <EOS> token index
31
+ **kwargs,
32
+ ):
33
+ self.vocab_size = vocab_size
34
+ self.max_output_len = max_output_len
35
+ self.d_model = d_model
36
+ self.nhead = nhead
37
+ self.num_encoder_layers = num_encoder_layers
38
+ self.dim_feedforward = dim_feedforward
39
+ self.dropout = dropout
40
+
41
+ super().__init__(
42
+ pad_token_id=pad_token_id,
43
+ eos_token_id=eos_token_id,
44
+ **kwargs,
45
+ )
46
+
47
+
48
+ class PositionalEncoding(nn.Module):
49
+ """Sinusoidal positional encoding for transformer."""
50
+
51
+ def __init__(self, d_model: int, max_len: int = 5000) -> None:
52
+ super().__init__()
53
+
54
+ pe = torch.zeros(max_len, d_model)
55
+ position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
56
+ div_term = torch.exp(
57
+ torch.arange(0, d_model, 2).float()
58
+ * (-math.log(10000.0) / d_model)
59
+ )
60
+
61
+ pe[:, 0::2] = torch.sin(position * div_term)
62
+ pe[:, 1::2] = torch.cos(position * div_term)
63
+
64
+ self.register_buffer("pe", pe)
65
+
66
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
67
+ """Add positional encoding to input."""
68
+ return x + self.pe[: x.size(1)]
69
+
70
+
71
+ class NamerModel(PreTrainedModel, GenerationMixin):
72
+ """HuggingFace compatible Namer transformer model.
73
+
74
+ Converts integer digit sequences to English number names.
75
+ """
76
+
77
+ config_class = NamerConfig
78
+ base_model_prefix = "namer"
79
+
80
+ def __init__(self, config: NamerConfig):
81
+ super().__init__(config)
82
+
83
+ self.vocab_size = config.vocab_size
84
+ self.max_output_len = config.max_output_len
85
+ self.d_model = config.d_model
86
+
87
+ # Digit embedding (10 digits + 1 padding token = 11)
88
+ self.digit_embedding = nn.Embedding(11, config.d_model, padding_idx=config.pad_token_id)
89
+
90
+ # Positional encoding
91
+ self.pos_encoder = PositionalEncoding(config.d_model, max_len=100)
92
+
93
+ # Transformer encoder
94
+ encoder_layer = nn.TransformerEncoderLayer(
95
+ d_model=config.d_model,
96
+ nhead=config.nhead,
97
+ dim_feedforward=config.dim_feedforward,
98
+ dropout=config.dropout,
99
+ batch_first=True,
100
+ )
101
+ self.transformer_encoder = nn.TransformerEncoder(
102
+ encoder_layer, num_layers=config.num_encoder_layers
103
+ )
104
+
105
+ # Output projection
106
+ self.output_projection = nn.Linear(config.d_model, config.vocab_size)
107
+
108
+ # Learned queries for each output position
109
+ self.output_queries = nn.Parameter(torch.randn(config.max_output_len, config.d_model))
110
+
111
+ # Cross-attention from output positions to encoded input
112
+ self.cross_attention = nn.MultiheadAttention(
113
+ config.d_model, config.nhead, dropout=config.dropout, batch_first=True
114
+ )
115
+
116
+ # Final output layers
117
+ self.output_norm = nn.LayerNorm(config.d_model)
118
+
119
+ self.post_init()
120
+
121
+ def forward(
122
+ self,
123
+ input_ids: Optional[torch.Tensor] = None,
124
+ attention_mask: Optional[torch.Tensor] = None,
125
+ labels: Optional[torch.Tensor] = None,
126
+ **kwargs,
127
+ ) -> CausalLMOutputWithCrossAttentions:
128
+ """Forward pass for HF compatibility.
129
+
130
+ Args:
131
+ input_ids: (batch_size, seq_len) tensor of digit indices (0-9), padding=10
132
+ attention_mask: Optional mask for padding
133
+ labels: Optional target labels for training
134
+
135
+ Returns:
136
+ CausalLMOutputWithCrossAttentions with logits
137
+ """
138
+ if input_ids is None:
139
+ raise ValueError("input_ids must be provided")
140
+
141
+ batch_size, seq_len = input_ids.shape
142
+
143
+ # Handle padding: convert -1 padding to 10 (our padding index)
144
+ digits = input_ids.clone()
145
+ digits[digits == -1] = self.config.pad_token_id
146
+
147
+ # Create padding mask for transformer (True = padding)
148
+ if attention_mask is None:
149
+ src_key_padding_mask = digits == self.config.pad_token_id
150
+ else:
151
+ src_key_padding_mask = ~attention_mask.bool()
152
+
153
+ # Embed digits: (batch, seq_len, d_model)
154
+ embedded = self.digit_embedding(digits)
155
+
156
+ # Add positional encoding
157
+ embedded = self.pos_encoder(embedded)
158
+
159
+ # Transformer encoder: (batch, seq_len, d_model)
160
+ memory = self.transformer_encoder(
161
+ embedded, src_key_padding_mask=src_key_padding_mask
162
+ )
163
+
164
+ # Expand queries for batch: (batch, max_output_len, d_model)
165
+ queries = self.output_queries.unsqueeze(0).expand(batch_size, -1, -1)
166
+
167
+ # Cross-attention from queries to encoded input
168
+ attn_output, _ = self.cross_attention(
169
+ queries, memory, memory, key_padding_mask=src_key_padding_mask
170
+ )
171
+
172
+ # Normalize and project to vocab
173
+ output = self.output_norm(attn_output)
174
+ logits = self.output_projection(output)
175
+
176
+ loss = None
177
+ if labels is not None:
178
+ loss_fct = nn.CrossEntropyLoss(ignore_index=-100)
179
+ loss = loss_fct(logits.view(-1, self.vocab_size), labels.view(-1))
180
+
181
+ return CausalLMOutputWithCrossAttentions(
182
+ loss=loss,
183
+ logits=logits,
184
+ hidden_states=None,
185
+ attentions=None,
186
+ cross_attentions=None,
187
+ )
188
+
189
+ def prepare_inputs_for_generation(self, input_ids, **kwargs):
190
+ """Prepare inputs for text generation."""
191
+ return {"input_ids": input_ids}
192
+
193
+ def _reorder_cache(self, past_key_values, beam_idx):
194
+ """Reorder cache for beam search."""
195
+ return past_key_values
196
+
197
+
198
+ class NamerPipeline:
199
+ """Simple pipeline for Namer model inference.
200
+
201
+ Usage:
202
+ from transformers import AutoModel
203
+
204
+ # Load model
205
+ model = AutoModel.from_pretrained(
206
+ "edwinhere/namer",
207
+ trust_remote_code=True
208
+ )
209
+
210
+ # Create pipeline
211
+ pipe = NamerPipeline(model)
212
+
213
+ # Generate
214
+ result = pipe.generate(42) # "forty two"
215
+ result = pipe(42) # {"generated_text": "forty two"}
216
+ """
217
+
218
+ def __init__(self, model: NamerModel, tokenizer=None, device: Optional[str] = None):
219
+ if device is None:
220
+ device = "cuda" if torch.cuda.is_available() else "cpu"
221
+ self.model = model.to(device)
222
+ self.model.eval()
223
+ self.device = device
224
+ self.tokenizer = tokenizer # Placeholder if we add a tokenizer later
225
+
226
+ # Vocabulary mapping (index -> word)
227
+ # Must match utils.py vocabulary exactly
228
+ self.id2word = {
229
+ 0: "zero", 1: "one", 2: "two", 3: "three", 4: "four",
230
+ 5: "five", 6: "six", 7: "seven", 8: "eight", 9: "nine",
231
+ 10: "ten", 11: "eleven", 12: "twelve", 13: "thirteen", 14: "fourteen",
232
+ 15: "fifteen", 16: "sixteen", 17: "seventeen", 18: "eighteen", 19: "nineteen",
233
+ 20: "twenty", 21: "thirty", 22: "forty", 23: "fifty",
234
+ 24: "sixty", 25: "seventy", 26: "eighty", 27: "ninety",
235
+ 28: "hundred",
236
+ 29: "thousand", 30: "million", 31: "billion", 32: "trillion",
237
+ 33: "quadrillion", 34: "quintillion", 35: "sextillion",
238
+ 36: "septillion", 37: "octillion", 38: "nonillion", 39: "decillion",
239
+ 40: "<EOS>"
240
+ }
241
+
242
+ # Reverse mapping
243
+ self.word2id = {v: k for k, v in self.id2word.items()}
244
+
245
+ def _int_to_digits(self, n: int) -> list[int]:
246
+ """Convert integer to list of digit indices."""
247
+ if n == 0:
248
+ return [0]
249
+ digits = []
250
+ while n > 0:
251
+ digits.append(n % 10)
252
+ n //= 10
253
+ return digits[::-1] # Reverse to get most significant digit first
254
+
255
+ def _decode(self, token_ids: list[int]) -> str:
256
+ """Decode token IDs to text, stopping at first EOS."""
257
+ words = []
258
+ eos_idx = self.model.config.eos_token_id # Should be 40
259
+
260
+ for idx in token_ids:
261
+ if idx == eos_idx: # Stop at EOS
262
+ break
263
+ if idx in self.id2word:
264
+ word = self.id2word[idx]
265
+ if word != "<EOS>": # Skip EOS token itself
266
+ words.append(word)
267
+
268
+ return " ".join(words) if words else "zero"
269
+
270
+ def generate(self, text: Union[str, int], **kwargs) -> str:
271
+ """Generate English name for a number.
272
+
273
+ Args:
274
+ text: Integer or string representation of integer
275
+
276
+ Returns:
277
+ English name of the number
278
+ """
279
+ # Parse input
280
+ if isinstance(text, str):
281
+ n = int(text.strip())
282
+ else:
283
+ n = int(text)
284
+
285
+ # Convert to digits
286
+ digits = self._int_to_digits(n)
287
+
288
+ # Pad to max length (20)
289
+ while len(digits) < 20:
290
+ digits.append(10) # padding token
291
+
292
+ # Create tensor
293
+ input_ids = torch.tensor([digits], dtype=torch.long).to(self.device)
294
+
295
+ # Forward pass
296
+ with torch.no_grad():
297
+ outputs = self.model(input_ids)
298
+ logits = outputs.logits
299
+ predictions = logits.argmax(dim=-1)[0].cpu().tolist()
300
+
301
+ # Decode
302
+ return self._decode(predictions)
303
+
304
+ def __call__(self, text: Union[str, int], **kwargs) -> dict:
305
+ """Callable interface for pipeline.
306
+
307
+ Returns dict with 'generated_text' key for HF pipeline compatibility.
308
+ """
309
+ result = self.generate(text, **kwargs)
310
+ return {"generated_text": result}
311
+
312
+
313
+ def load_namer_pipeline(model_name_or_path: str = "edwinhere/namer", device: str = None, **kwargs):
314
+ """Load a Namer pipeline with model.
315
+
316
+ This is a convenience function that loads both the model and creates
317
+ a pipeline for easy inference.
318
+
319
+ Args:
320
+ model_name_or_path: HuggingFace model ID or local path
321
+ device: Device to run on ('cuda', 'cpu', or None for auto)
322
+ **kwargs: Additional args passed to from_pretrained
323
+
324
+ Returns:
325
+ NamerPipeline instance ready for inference
326
+
327
+ Example:
328
+ >>> pipe = load_namer_pipeline("edwinhere/namer")
329
+ >>> pipe.generate(42)
330
+ 'forty two'
331
+ >>> pipe(123)
332
+ {'generated_text': 'one hundred twenty three'}
333
+ """
334
+ from transformers import AutoModel
335
+
336
+ model = AutoModel.from_pretrained(
337
+ model_name_or_path,
338
+ trust_remote_code=True,
339
+ **kwargs
340
+ )
341
+
342
+ return NamerPipeline(model, device=device)
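`NamerPipeline` feeds the model fixed-length digit sequences: most-significant digit first, right-padded with token 10 to length 20. A self-contained sketch of that encoding, mirroring `_int_to_digits` and the padding loop above:

```python
# Input encoding used by NamerPipeline.generate: decompose the integer into
# decimal digits (most significant first), then pad with token 10 to length 20.
def int_to_digits(n: int) -> list[int]:
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % 10)
        n //= 10
    return digits[::-1]  # most significant digit first

def encode(n: int, pad_id: int = 10, length: int = 20) -> list[int]:
    digits = int_to_digits(n)
    return digits + [pad_id] * (length - len(digits))

print(encode(42)[:4])  # [4, 2, 10, 10]
```

The model then emits one logit vector per output position; `_decode` takes the argmax ids and stops at the first `<EOS>` (id 40).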
namer/__init__.py CHANGED
@@ -1,7 +1,8 @@
1
  """Namer - A PyTorch transformer model for converting numbers to English names."""
2
 
3
- __version__ = "0.2.0"
4
5
  from namer.models import NamerTransformer, load_namer_model
6
  from namer.inference import predict_number_name
7
  from namer.utils import (
@@ -15,7 +16,20 @@ from namer.utils import (
15
  read_double,
16
  )
17
 
18
  __all__ = [
 
19
  "NamerTransformer",
20
  "load_namer_model",
21
  "predict_number_name",
@@ -28,3 +42,12 @@ __all__ = [
28
  "read_triplet",
29
  "read_double",
30
  ]
1
  """Namer - A PyTorch transformer model for converting numbers to English names."""
2
 
3
+ __version__ = "0.3.0"
4
 
5
+ # Original API
6
  from namer.models import NamerTransformer, load_namer_model
7
  from namer.inference import predict_number_name
8
  from namer.utils import (
 
16
  read_double,
17
  )
18
 
19
+ # HuggingFace compatible API
20
+ try:
21
+ from .modeling_namer import (
22
+ NamerModel,
23
+ NamerConfig,
24
+ NamerPipeline,
25
+ load_namer_pipeline,
26
+ )
27
+ HF_AVAILABLE = True
28
+ except ImportError:
29
+ HF_AVAILABLE = False
30
+
31
  __all__ = [
32
+ # Original API
33
  "NamerTransformer",
34
  "load_namer_model",
35
  "predict_number_name",
 
42
  "read_triplet",
43
  "read_double",
44
  ]
45
+
46
+ if HF_AVAILABLE:
47
+ __all__.extend([
48
+ # HuggingFace API
49
+ "NamerModel",
50
+ "NamerConfig",
51
+ "NamerPipeline",
52
+ "load_namer_pipeline",
53
+ ])
namer/modeling_namer.py ADDED
@@ -0,0 +1,342 @@
1
+ """HuggingFace compatible Namer model."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import math
6
+ from typing import Optional, Union
7
+
8
+ import torch
9
+ import torch.nn as nn
10
+ from transformers import PreTrainedModel, PretrainedConfig
11
+ from transformers.modeling_outputs import CausalLMOutputWithCrossAttentions
12
+ from transformers.generation import GenerationMixin
13
+
14
+
15
+ class NamerConfig(PretrainedConfig):
16
+ """Configuration class for NamerModel."""
17
+
18
+ model_type = "custom"
19
+
20
+ def __init__(
21
+ self,
22
+ vocab_size: int = 41,
23
+ max_output_len: int = 20,
24
+ d_model: int = 128,
25
+ nhead: int = 4,
26
+ num_encoder_layers: int = 4,
27
+ dim_feedforward: int = 512,
28
+ dropout: float = 0.1,
29
+ pad_token_id: int = 10,
30
+ eos_token_id: int = 40, # <EOS> token index
31
+ **kwargs,
32
+ ):
33
+ self.vocab_size = vocab_size
34
+ self.max_output_len = max_output_len
35
+ self.d_model = d_model
36
+ self.nhead = nhead
37
+ self.num_encoder_layers = num_encoder_layers
38
+ self.dim_feedforward = dim_feedforward
39
+ self.dropout = dropout
40
+
41
+ super().__init__(
42
+ pad_token_id=pad_token_id,
43
+ eos_token_id=eos_token_id,
44
+ **kwargs,
45
+ )
46
+
47
+
48
+ class PositionalEncoding(nn.Module):
49
+ """Sinusoidal positional encoding for transformer."""
50
+
51
+ def __init__(self, d_model: int, max_len: int = 5000) -> None:
52
+ super().__init__()
53
+
54
+ pe = torch.zeros(max_len, d_model)
55
+ position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
56
+ div_term = torch.exp(
57
+ torch.arange(0, d_model, 2).float()
58
+ * (-math.log(10000.0) / d_model)
59
+ )
60
+
61
+ pe[:, 0::2] = torch.sin(position * div_term)
62
+ pe[:, 1::2] = torch.cos(position * div_term)
63
+
64
+ self.register_buffer("pe", pe)
65
+
66
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
67
+ """Add positional encoding to input."""
68
+ return x + self.pe[: x.size(1)]
69
+
70
+
71
+ class NamerModel(PreTrainedModel, GenerationMixin):
72
+ """HuggingFace compatible Namer transformer model.
73
+
74
+ Converts integer digit sequences to English number names.
75
+ """
76
+
77
+ config_class = NamerConfig
78
+ base_model_prefix = "namer"
79
+
80
+ def __init__(self, config: NamerConfig):
81
+ super().__init__(config)
82
+
83
+ self.vocab_size = config.vocab_size
84
+ self.max_output_len = config.max_output_len
85
+ self.d_model = config.d_model
86
+
87
+ # Digit embedding (10 digits + 1 padding token = 11)
88
+ self.digit_embedding = nn.Embedding(11, config.d_model, padding_idx=config.pad_token_id)
89
+
90
+ # Positional encoding
91
+ self.pos_encoder = PositionalEncoding(config.d_model, max_len=100)
92
+
93
+ # Transformer encoder
94
+ encoder_layer = nn.TransformerEncoderLayer(
95
+ d_model=config.d_model,
96
+ nhead=config.nhead,
97
+ dim_feedforward=config.dim_feedforward,
98
+ dropout=config.dropout,
99
+ batch_first=True,
100
+ )
101
+ self.transformer_encoder = nn.TransformerEncoder(
102
+ encoder_layer, num_layers=config.num_encoder_layers
103
+ )
104
+
105
+ # Output projection
106
+ self.output_projection = nn.Linear(config.d_model, config.vocab_size)
107
+
108
+ # Learned queries for each output position
109
+ self.output_queries = nn.Parameter(torch.randn(config.max_output_len, config.d_model))
110
+
111
+ # Cross-attention from output positions to encoded input
112
+ self.cross_attention = nn.MultiheadAttention(
113
+ config.d_model, config.nhead, dropout=config.dropout, batch_first=True
114
+ )
115
+
116
+ # Final output layers
117
+ self.output_norm = nn.LayerNorm(config.d_model)
118
+
119
+ self.post_init()
120
+
121
+ def forward(
122
+ self,
123
+ input_ids: Optional[torch.Tensor] = None,
124
+ attention_mask: Optional[torch.Tensor] = None,
125
+ labels: Optional[torch.Tensor] = None,
126
+ **kwargs,
127
+ ) -> CausalLMOutputWithCrossAttentions:
128
+ """Forward pass for HF compatibility.
129
+
130
+ Args:
131
+ input_ids: (batch_size, seq_len) tensor of digit indices (0-9), padding=10
132
+ attention_mask: Optional mask for padding
133
+ labels: Optional target labels for training
134
+
135
+ Returns:
136
+ CausalLMOutputWithCrossAttentions with logits
137
+ """
138
+ if input_ids is None:
139
+ raise ValueError("input_ids must be provided")
140
+
141
+ batch_size, seq_len = input_ids.shape
142
+
143
+ # Handle padding: convert -1 padding to 10 (our padding index)
144
+ digits = input_ids.clone()
145
+ digits[digits == -1] = self.config.pad_token_id
146
+
147
+ # Create padding mask for transformer (True = padding)
148
+ if attention_mask is None:
149
+ src_key_padding_mask = digits == self.config.pad_token_id
150
+ else:
151
+ src_key_padding_mask = ~attention_mask.bool()
152
+
153
+ # Embed digits: (batch, seq_len, d_model)
154
+ embedded = self.digit_embedding(digits)
155
+
156
+ # Add positional encoding
157
+ embedded = self.pos_encoder(embedded)
158
+
159
+ # Transformer encoder: (batch, seq_len, d_model)
160
+ memory = self.transformer_encoder(
161
+ embedded, src_key_padding_mask=src_key_padding_mask
162
+ )
163
+
164
+ # Expand queries for batch: (batch, max_output_len, d_model)
165
+ queries = self.output_queries.unsqueeze(0).expand(batch_size, -1, -1)
166
+
167
+ # Cross-attention from queries to encoded input
168
+ attn_output, _ = self.cross_attention(
169
+ queries, memory, memory, key_padding_mask=src_key_padding_mask
170
+ )
171
+
172
+ # Normalize and project to vocab
173
+ output = self.output_norm(attn_output)
174
+ logits = self.output_projection(output)
175
+
176
+ loss = None
177
+ if labels is not None:
178
+ loss_fct = nn.CrossEntropyLoss(ignore_index=-100)
179
+ loss = loss_fct(logits.view(-1, self.vocab_size), labels.view(-1))
180
+
181
+ return CausalLMOutputWithCrossAttentions(
182
+ loss=loss,
183
+ logits=logits,
184
+ hidden_states=None,
185
+ attentions=None,
186
+ cross_attentions=None,
187
+ )
188
+
189
+ def prepare_inputs_for_generation(self, input_ids, **kwargs):
190
+ """Prepare inputs for text generation."""
191
+ return {"input_ids": input_ids}
192
+
193
+ def _reorder_cache(self, past_key_values, beam_idx):
194
+ """Reorder cache for beam search."""
195
+ return past_key_values
196
+
197
+
198
+ class NamerPipeline:
199
+ """Simple pipeline for Namer model inference.
200
+
201
+ Usage:
202
+ from transformers import AutoModel
203
+
204
+ # Load model
205
+ model = AutoModel.from_pretrained(
206
+ "edwinhere/namer",
207
+ trust_remote_code=True
208
+ )
209
+
210
+ # Create pipeline
211
+ pipe = NamerPipeline(model)
212
+
213
+ # Generate
214
+ result = pipe.generate(42) # "forty two"
215
+ result = pipe(42) # {"generated_text": "forty two"}
216
+ """
217
+
218
+ def __init__(self, model: NamerModel, tokenizer=None, device: Optional[str] = None):
219
+ if device is None:
220
+ device = "cuda" if torch.cuda.is_available() else "cpu"
221
+ self.model = model.to(device)
222
+ self.model.eval()
223
+ self.device = device
224
+ self.tokenizer = tokenizer # Placeholder if we add a tokenizer later
225
+
226
+ # Vocabulary mapping (index -> word)
227
+ # Must match utils.py vocabulary exactly
228
+ self.id2word = {
229
+ 0: "zero", 1: "one", 2: "two", 3: "three", 4: "four",
230
+ 5: "five", 6: "six", 7: "seven", 8: "eight", 9: "nine",
231
+ 10: "ten", 11: "eleven", 12: "twelve", 13: "thirteen", 14: "fourteen",
232
+ 15: "fifteen", 16: "sixteen", 17: "seventeen", 18: "eighteen", 19: "nineteen",
233
+ 20: "twenty", 21: "thirty", 22: "forty", 23: "fifty",
234
+ 24: "sixty", 25: "seventy", 26: "eighty", 27: "ninety",
235
+ 28: "hundred",
236
+ 29: "thousand", 30: "million", 31: "billion", 32: "trillion",
237
+ 33: "quadrillion", 34: "quintillion", 35: "sextillion",
238
+ 36: "septillion", 37: "octillion", 38: "nonillion", 39: "decillion",
239
+ 40: "<EOS>"
240
+ }
241
+
242
+ # Reverse mapping
243
+ self.word2id = {v: k for k, v in self.id2word.items()}
244
+
245
+ def _int_to_digits(self, n: int) -> list[int]:
246
+ """Convert integer to list of digit indices."""
247
+ if n == 0:
248
+ return [0]
249
+ digits = []
250
+ while n > 0:
251
+ digits.append(n % 10)
252
+ n //= 10
253
+ return digits[::-1] # Reverse to get most significant digit first
254
+
255
+ def _decode(self, token_ids: list[int]) -> str:
256
+ """Decode token IDs to text, stopping at first EOS."""
257
+ words = []
258
+ eos_idx = self.model.config.eos_token_id # Should be 40
259
+
260
+ for idx in token_ids:
261
+ if idx == eos_idx: # Stop at EOS
262
+ break
263
+ if idx in self.id2word:
264
+ word = self.id2word[idx]
265
+ if word != "<EOS>": # Skip EOS token itself
266
+ words.append(word)
267
+
268
+ return " ".join(words) if words else "zero"
269
+
270
+ def generate(self, text: Union[str, int], **kwargs) -> str:
271
+ """Generate English name for a number.
272
+
273
+ Args:
274
+ text: Integer or string representation of integer
275
+
276
+ Returns:
277
+ English name of the number
278
+ """
279
+ # Parse input
280
+ if isinstance(text, str):
281
+ n = int(text.strip())
282
+ else:
283
+ n = int(text)
284
+
285
+ # Convert to digits
286
+ digits = self._int_to_digits(n)
287
+
288
+ # Pad to max length (20)
289
+ while len(digits) < 20:
290
+ digits.append(10) # padding token
291
+
292
+ # Create tensor
293
+ input_ids = torch.tensor([digits], dtype=torch.long).to(self.device)
294
+
295
+ # Forward pass
296
+ with torch.no_grad():
297
+ outputs = self.model(input_ids)
298
+ logits = outputs.logits
299
+ predictions = logits.argmax(dim=-1)[0].cpu().tolist()
300
+
301
+ # Decode
302
+ return self._decode(predictions)
303
+
304
+ def __call__(self, text: Union[str, int], **kwargs) -> dict:
305
+ """Callable interface for pipeline.
306
+
307
+ Returns dict with 'generated_text' key for HF pipeline compatibility.
308
+ """
309
+ result = self.generate(text, **kwargs)
310
+ return {"generated_text": result}
311
+
312
+
313
+ def load_namer_pipeline(model_name_or_path: str = "edwinhere/namer", device: str = None, **kwargs):
314
+ """Load a Namer pipeline with model.
315
+
316
+ This is a convenience function that loads both the model and creates
317
+ a pipeline for easy inference.
318
+
319
+ Args:
320
+ model_name_or_path: HuggingFace model ID or local path
321
+ device: Device to run on ('cuda', 'cpu', or None for auto)
322
+ **kwargs: Additional args passed to from_pretrained
323
+
324
+ Returns:
325
+ NamerPipeline instance ready for inference
326
+
327
+ Example:
328
+ >>> pipe = load_namer_pipeline("edwinhere/namer")
329
+ >>> pipe.generate(42)
330
+ 'forty two'
331
+ >>> pipe(123)
332
+ {'generated_text': 'one hundred twenty three'}
333
+ """
334
+ from transformers import AutoModel
335
+
336
+ model = AutoModel.from_pretrained(
337
+ model_name_or_path,
338
+ trust_remote_code=True,
339
+ **kwargs
340
+ )
341
+
342
+ return NamerPipeline(model, device=device)