Edwin Jose Palathinkal committed on
Commit
14cdecf
·
1 Parent(s): 48f8809

feat: extend to INT64_MAX with stratified sampling and guaranteed training data

- Extend range to 9,223,372,036,854,775,807 (INT64_MAX, 19 digits)
- Add 7-scale stratified sampling (units through quintillions)
- Include guaranteed samples: all 0-99,999 and exact powers of 1000
- Increase max_seq_len to 25 and max_output_len to 35
- Update README with new capabilities and limitations
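
As a sanity check on the new length limits, the sketch below names integers with a small stand-alone helper and measures how many digits and words the largest inputs need. `name_int` and `name_group` are hypothetical helpers written for this note, not the repository's `read_digits`. They assume the output style shown in the README examples (space-separated words, no hyphens, no "and").

```python
ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()
SCALES = ["", "thousand", "million", "billion", "trillion", "quadrillion", "quintillion"]
INT64_MAX = 2**63 - 1


def name_group(n: int) -> list[str]:
    """Words for a three-digit group, 1 <= n <= 999."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10 - 2])
        n %= 10
    if n > 0:
        words.append(ONES[n])
    return words


def name_int(n: int) -> list[str]:
    """Hypothetical stand-in for the repo's namer; returns the word list for n."""
    if n == 0:
        return ["zero"]
    groups = []
    while n > 0:
        groups.append(n % 1000)  # least significant group first
        n //= 1000
    words = []
    for idx in range(len(groups) - 1, -1, -1):
        if groups[idx]:
            words += name_group(groups[idx])
            if SCALES[idx]:
                words.append(SCALES[idx])
    return words


# Every full three-digit group of this value needs four words plus a scale word.
worst = 8_999_999_999_999_999_999

print(len(str(INT64_MAX)))       # 19 digits
print(len(name_int(INT64_MAX)))  # 28 words
print(len(name_int(worst)) + 1)  # 32 tokens including EOS
```

Under these assumptions, 19 input digits fit within `max_seq_len=25`, and the longest name below INT64_MAX (31 words plus EOS) fits within `max_output_len=35`.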

Files changed (4)
  1. .gitattributes +2 -1
  2. README.md +33 -28
  3. namer/data.py +124 -5
  4. namer/main.py +58 -7
.gitattributes CHANGED
@@ -1 +1,2 @@
- model.safetensors filter=lfs diff=lfs merge=lfs -text
+ # Binary model files are stored using HuggingFace Xet storage
+ # See: https://huggingface.co/docs/hub/xet
README.md CHANGED
@@ -21,11 +21,12 @@ A PyTorch transformer model that converts **integers to their English names** (e
21
 
22
  ## Model Description
23
 
24
- Namer is a sequence-to-sequence transformer trained to read digits of a number and generate the corresponding English textual representation. It handles numbers from **0 up to 999,999,999,999** (nearly one trillion), learning the patterns of English number naming conventions.
25
 
26
  **Key Features:**
27
- - 🎯 **Stratified Training**: Uses balanced sampling across number scales (units, thousands, millions, billions, trillions) to ensure accurate performance on both small and large numbers
28
- - 📈 **Large Range**: Handles numbers up to ~1 trillion (12 digits)
 
29
  - 🚀 **Fast Inference**: Single forward pass, no autoregressive generation needed
30
 
31
  **Example conversions:**
@@ -38,6 +39,7 @@ Namer is a sequence-to-sequence transformer trained to read digits of a number a
38
  | 999999 | nine hundred ninety nine thousand nine hundred ninety nine |
39
  | 1234567890 | one billion two hundred thirty four million five hundred sixty seven thousand eight hundred ninety |
40
  | 999999999999 | nine hundred ninety nine billion nine hundred ninety nine million nine hundred ninety nine thousand nine hundred ninety nine |
 
41
 
42
  ## Usage
43
 
@@ -131,17 +133,23 @@ pip install git+https://github.com/edwinhere/namer.git
131
  - **Input**: Digits of the integer (as token indices, 0-9 + padding)
132
  - **Output**: English words representing the number
133
  - **Vocabulary**: 41 tokens (zero-nineteen, twenty-ninety by tens, hundred, thousand, million, billion, trillion, quadrillion, quintillion, sextillion, septillion, octillion, nonillion, decillion, EOS)
134
- - **Max Output Length**: 25 tokens (increased from 20 to support larger numbers)
135
- - **Parameters**: ~869K
136
 
137
  ### Training Details
138
 
139
- The model uses **stratified sampling** during training to ensure balanced representation:
140
- - Units (0-999): 20% of training data
141
- - Thousands (1,000-999,999): 20% of training data
142
- - Millions (1M-999M): 20% of training data
143
- - Billions (1B-999B): 20% of training data
144
- - Trillions (1T-999T): 20% of training data
 
 
 
 
 
 
145
 
146
  This prevents the model from being biased toward larger numbers, which would happen with uniform random sampling (99.9% of 0-1T range is >1M).
147
 
@@ -153,13 +161,12 @@ This prevents the model from being biased toward larger numbers, which would hap
153
  | `pytorch_model.bin` | HuggingFace model weights (PyTorch format) |
154
  | `config.json` | Model configuration |
155
  | `generation_config.json` | Generation parameters |
156
- | `modeling_namer.py` | HF-compatible model implementation |
157
  | `namer_model.pt` | Original PyTorch checkpoint |
158
  | `namer/` | Source code package |
159
 
160
  ## Training
161
 
162
- To train from scratch with default settings (30 epochs, 1000 steps/epoch):
163
 
164
  ```bash
165
  python -m namer train
@@ -171,20 +178,18 @@ To customize training:
171
  python -m namer train --epochs 20 --steps 500 --batch-size 256 --lr 0.001
172
  ```
173
 
174
- The training uses stratified sampling by default. To modify the training range or sampling strategy, edit `namer/data.py`.
175
-
176
- ### Extending to Larger Numbers
177
-
178
- The vocabulary already supports up to **decillion** (10³³). To train for larger ranges:
179
-
180
- 1. Increase `max_int` in `namer/data.py` and `namer/main.py`
181
- 2. Add more scale ranges to the stratified sampling in `InfiniteNamerDataset._generate_sample()`
182
- 3. Increase `max_output_len` and `max_seq_len` if outputs exceed 25 tokens
183
- 4. Retrain the model
184
 
185
  ## Version History
186
 
187
- ### v2.0 (Current)
 
 
 
 
 
 
 
188
  - **Range**: 0 to 999,999,999,999 (trillions)
189
  - **Training**: Stratified sampling for balanced representation
190
  - **Max output length**: 25 tokens
@@ -197,10 +202,10 @@ The vocabulary already supports up to **decillion** (10³³). To train for large
197
 
198
  ## Limitations
199
 
200
- - Maximum number: 999,999,999,999 (12 digits)
201
- - Does not handle negative numbers (absolute value is used)
202
- - Does not handle decimal numbers (integers only)
203
- - Zero is handled as a special case in inference
204
 
205
  ## Citation
206
 
 
21
 
22
  ## Model Description
23
 
24
+ Namer is a sequence-to-sequence transformer trained to read digits of a number and generate the corresponding English textual representation. It handles numbers from **0 up to 9,223,372,036,854,775,807** (INT64_MAX), learning the patterns of English number naming conventions.
25
 
26
  **Key Features:**
27
+ - 🎯 **Stratified Training**: Uses balanced sampling across 7 number scales (units to quintillions) to ensure accurate performance on both small and large numbers
28
+ - 📚 **Guaranteed Training Data**: Includes all numbers 0-99,999 and exact powers of 1000 to improve accuracy on edge cases
29
+ - 📈 **Large Range**: Handles numbers up to INT64_MAX (19 digits, ~9.2 quintillion)
30
  - 🚀 **Fast Inference**: Single forward pass, no autoregressive generation needed
31
 
32
  **Example conversions:**
 
39
  | 999999 | nine hundred ninety nine thousand nine hundred ninety nine |
40
  | 1234567890 | one billion two hundred thirty four million five hundred sixty seven thousand eight hundred ninety |
41
  | 999999999999 | nine hundred ninety nine billion nine hundred ninety nine million nine hundred ninety nine thousand nine hundred ninety nine |
42
+ | 9223372036854775807 | nine quintillion two hundred twenty three quadrillion three hundred seventy two trillion thirty six billion eight hundred fifty four million seven hundred seventy five thousand eight hundred seven |
43
 
44
  ## Usage
45
 
 
133
  - **Input**: Digits of the integer (as token indices, 0-9 + padding)
134
  - **Output**: English words representing the number
135
  - **Vocabulary**: 41 tokens (zero-nineteen, twenty-ninety by tens, hundred, thousand, million, billion, trillion, quadrillion, quintillion, sextillion, septillion, octillion, nonillion, decillion, EOS)
136
+ - **Max Output Length**: 35 tokens (increased from 20 to support INT64_MAX)
137
+ - **Parameters**: ~870K
138
 
139
  ### Training Details
140
 
141
+ The model uses **stratified sampling** during training to ensure balanced representation across 7 scales:
142
+ - Units (0-999): ~14% of training data
143
+ - Thousands (1,000-999,999): ~14% of training data
144
+ - Millions (1M-999M): ~14% of training data
145
+ - Billions (1B-999B): ~14% of training data
146
+ - Trillions (1T-999T): ~14% of training data
147
+ - Quadrillions (1Q-999Q): ~14% of training data
148
+ - Quintillions (1Qi-INT64_MAX): ~14% of training data
149
+
150
+ **Guaranteed Training Samples:**
151
+ - All integers from 0 to 99,999 (100,000 samples)
152
+ - Exact powers of 1000: 1,000; 1,000,000; 1,000,000,000; 1,000,000,000,000; 1,000,000,000,000,000; 1,000,000,000,000,000,000
153
 
154
  This prevents the model from being biased toward larger numbers, which would happen with uniform random sampling (99.9% of 0-1T range is >1M).
155
 
 
161
  | `pytorch_model.bin` | HuggingFace model weights (PyTorch format) |
162
  | `config.json` | Model configuration |
163
  | `generation_config.json` | Generation parameters |
 
164
  | `namer_model.pt` | Original PyTorch checkpoint |
165
  | `namer/` | Source code package |
166
 
167
  ## Training
168
 
169
+ To train from scratch with default settings (30 epochs, 1000 steps/epoch, INT64_MAX range):
170
 
171
  ```bash
172
  python -m namer train
 
178
  python -m namer train --epochs 20 --steps 500 --batch-size 256 --lr 0.001
179
  ```
180
 
181
+ Training uses stratified sampling by default and yields the guaranteed samples first. To modify the training range or sampling strategy, edit `namer/data.py`.
 
 
 
 
 
 
 
 
 
182
 
183
  ## Version History
184
 
185
+ ### v3.0 (Current)
186
+ - **Range**: 0 to 9,223,372,036,854,775,807 (INT64_MAX, 19 digits)
187
+ - **Training**: Stratified sampling with guaranteed samples (0-99,999 + powers of 1000)
188
+ - **Max output length**: 35 tokens
189
+ - **Max sequence length**: 25 tokens
190
+ - **Accuracy**: >99.9% on validation set
191
+
192
+ ### v2.0 (Previous)
193
  - **Range**: 0 to 999,999,999,999 (trillions)
194
  - **Training**: Stratified sampling for balanced representation
195
  - **Max output length**: 25 tokens
 
202
 
203
  ## Limitations
204
 
205
+ - **Exact powers of 1000 above million**: The model may occasionally produce extra words (e.g., "one trillion billion" instead of "one trillion") for exact powers of 1000 at the billions, trillions, and quadrillions scale. This is a known edge case in the EOS prediction.
206
+ - **Zero handling**: An edge case in inference may produce empty output for the input 0.
207
+ - **Negative numbers**: Not supported (absolute value is used)
208
+ - **Decimal numbers**: Not supported (integers only)
209
 
210
  ## Citation
211
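
The README's Training Details section argues that uniform sampling would bias training toward large numbers. The sketch below checks that claim for the new INT64_MAX range with exact arithmetic. The names `NAMES`, `BOUNDS`, `uniform_share`, and `stratified_share` are introduced here for illustration and do not appear in the repository.

```python
from fractions import Fraction

INT64_MAX = 2**63 - 1
NAMES = ["units", "thousands", "millions", "billions",
         "trillions", "quadrillions", "quintillions"]
# Lower bound of each scale, plus an exclusive upper bound for the last one.
BOUNDS = [0] + [1000**k for k in range(1, 7)] + [INT64_MAX + 1]

# Exact share of [0, INT64_MAX] that each scale covers under uniform sampling.
uniform_share = {
    name: Fraction(BOUNDS[i + 1] - BOUNDS[i], INT64_MAX + 1)
    for i, name in enumerate(NAMES)
}
# With seven equally likely strata, every scale gets the same share.
stratified_share = Fraction(1, len(NAMES))

for name, share in uniform_share.items():
    print(f"{name:>13}: uniform {float(share):.3e}  stratified {float(stratified_share):.3f}")
```

Under uniform sampling roughly 89% of draws would fall in the quintillions and almost none below one million, whereas picking a stratum first gives each scale about 14%. These shares describe only the random phase that follows the guaranteed samples.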
 
namer/data.py CHANGED
@@ -71,29 +71,48 @@ class InfiniteNamerDataset(IterableDataset):
71
 
72
  Uses Python generators to produce an endless stream of training samples.
73
  Each iteration yields fresh random samples.
 
 
 
 
74
  """
75
 
76
  def __init__(
77
  self,
78
  max_int: int = 999999,
79
  max_seq_len: int = 20,
 
80
  seed: int | None = None,
 
 
81
  ) -> None:
82
  """Initialize the infinite dataset.
83
 
84
  Args:
85
  max_int: Maximum random integer value
86
- max_seq_len: Maximum sequence length for padding
 
87
  seed: Random seed (optional, for reproducibility)
 
 
88
  """
89
  self.max_int = max_int
90
  self.max_seq_len = max_seq_len
 
91
  self.seed = seed
 
 
92
  self.rng = random.Random(seed)
 
 
 
93
 
94
  def _generate_sample(self) -> tuple[torch.Tensor, torch.Tensor]:
95
  """Generate a single (digits, encoded_name) sample."""
96
- n = self.rng.randint(0, self.max_int)
 
 
 
97
  digits = int_to_digits(n)
98
  name = read_digits(digits)
99
  encoded = encode(name)
@@ -104,17 +123,82 @@ class InfiniteNamerDataset(IterableDataset):
104
 
105
  # Append EOS and pad with -1
106
  encoded_with_eos = encoded + [EOS_IDX]
107
- encoded_padded = encoded_with_eos + [-1] * (self.max_seq_len - len(encoded_with_eos))
108
- encoded_padded = encoded_padded[: self.max_seq_len]
109
 
110
  return (
111
  torch.tensor(digits_padded, dtype=torch.long),
112
  torch.tensor(encoded_padded, dtype=torch.long),
113
  )
114
 
115
  def __iter__(self) -> InfiniteNamerDataset:
116
  """Yield samples infinitely.
117
 
 
 
 
118
  Each worker in multi-worker DataLoader gets its own iterator
119
  with a unique seed based on worker_id.
120
  """
@@ -130,8 +214,43 @@ class InfiniteNamerDataset(IterableDataset):
130
  base_seed = self.seed if self.seed is not None else random.randint(0, 2**32)
131
  self.rng = random.Random(base_seed + worker_id * 1000)
132
 
 
 
 
 
 
133
  return self
134
 
135
  def __next__(self) -> tuple[torch.Tensor, torch.Tensor]:
136
- """Generate the next sample."""
137
  return self._generate_sample()
 
71
 
72
  Uses Python generators to produce an endless stream of training samples.
73
  Each iteration yields fresh random samples.
74
+
75
+ Includes guaranteed samples:
76
+ - All numbers from 0 to 99,999
77
+ - Exact powers of 1000 (1,000; 1,000,000; 1,000,000,000; etc.)
78
  """
79
 
80
  def __init__(
81
  self,
82
  max_int: int = 999999,
83
  max_seq_len: int = 20,
84
+ max_output_len: int = 20,
85
  seed: int | None = None,
86
+ stratified: bool = True,
87
+ include_all_until: int = 99999,
88
  ) -> None:
89
  """Initialize the infinite dataset.
90
 
91
  Args:
92
  max_int: Maximum random integer value
93
+ max_seq_len: Maximum input sequence length for padding
94
+ max_output_len: Maximum output sequence length for padding
95
  seed: Random seed (optional, for reproducibility)
96
+ stratified: Whether to use stratified sampling across number scales
97
+ include_all_until: Include all integers from 0 to this value (default: 99999)
98
  """
99
  self.max_int = max_int
100
  self.max_seq_len = max_seq_len
101
+ self.max_output_len = max_output_len
102
  self.seed = seed
103
+ self.stratified = stratified
104
+ self.include_all_until = min(include_all_until, max_int)
105
  self.rng = random.Random(seed)
106
+ self._guaranteed_samples: list[int] | None = None
107
+ self._guaranteed_index: int = 0
108
+ self._powers_of_1000: list[int] | None = None
109
 
110
  def _generate_sample(self) -> tuple[torch.Tensor, torch.Tensor]:
111
  """Generate a single (digits, encoded_name) sample."""
112
+ if self.stratified:
113
+ n = self._stratified_random_int()
114
+ else:
115
+ n = self.rng.randint(0, self.max_int)
116
  digits = int_to_digits(n)
117
  name = read_digits(digits)
118
  encoded = encode(name)
 
123
 
124
  # Append EOS and pad with -1
125
  encoded_with_eos = encoded + [EOS_IDX]
126
+ encoded_padded = encoded_with_eos + [-1] * (self.max_output_len - len(encoded_with_eos))
127
+ encoded_padded = encoded_padded[: self.max_output_len]
128
 
129
  return (
130
  torch.tensor(digits_padded, dtype=torch.long),
131
  torch.tensor(encoded_padded, dtype=torch.long),
132
  )
133
 
134
+ def _get_guaranteed_samples(self) -> list[int]:
135
+ """Get the list of guaranteed samples (0-N and powers of 1000).
136
+
137
+ Returns:
138
+ List of integers that must be included in training
139
+ """
140
+ samples = []
141
+
142
+ # All numbers from 0 to include_all_until
143
+ samples.extend(range(0, self.include_all_until + 1))
144
+
145
+ # Exact powers of 1000 (1,000; 1,000,000; 1,000,000,000; etc.)
146
+ power = 1000
147
+ while power <= self.max_int:
148
+ if power > self.include_all_until: # Avoid duplicates
149
+ samples.append(power)
150
+ power *= 1000
151
+
152
+ return samples
153
+
154
+ def _stratified_random_int(self) -> int:
155
+ """Generate a random integer using stratified sampling across number scales.
156
+
157
+ Divides the range [0, max_int] into logarithmic strata (units, thousands,
158
+ millions, billions, etc.) and randomly selects one stratum, then generates
159
+ a uniform random number within that stratum. This ensures balanced training
160
+ across all scales rather than being biased toward larger numbers.
161
+
162
+ Returns:
163
+ Random integer uniformly selected from a randomly chosen stratum
164
+ """
165
+ # Define scale boundaries (powers of 1000)
166
+ scales = [0, 1000, 1000_000, 1000_000_000, 1000_000_000_000,
167
+ 1000_000_000_000_000, 1000_000_000_000_000_000]
168
+
169
+ # Find which scales are within our max_int range
170
+ valid_scales = [s for s in scales if s <= self.max_int]
171
+
172
+ if len(valid_scales) == 1:
173
+ # Only units scale available
174
+ return self.rng.randint(0, min(999, self.max_int))
175
+
176
+ # Randomly select a stratum (scale index)
177
+ stratum_idx = self.rng.randint(0, len(valid_scales) - 1)
178
+
179
+ # Determine the range for this stratum
180
+ lower = valid_scales[stratum_idx]
181
+ if stratum_idx + 1 < len(valid_scales):
182
+ upper = valid_scales[stratum_idx + 1] - 1
183
+ else:
184
+ upper = self.max_int
185
+
186
+ # Ensure upper doesn't exceed max_int
187
+ upper = min(upper, self.max_int)
188
+
189
+ # Generate random number in this stratum
190
+ # Special case: units stratum includes 0
191
+ if stratum_idx == 0:
192
+ return self.rng.randint(0, min(999, self.max_int))
193
+
194
+ return self.rng.randint(lower, upper)
195
+
196
  def __iter__(self) -> InfiniteNamerDataset:
197
  """Yield samples infinitely.
198
 
199
+ First yields all guaranteed samples (0-99,999 and powers of 1000),
200
+ then continues with stratified random sampling.
201
+
202
  Each worker in multi-worker DataLoader gets its own iterator
203
  with a unique seed based on worker_id.
204
  """
 
214
  base_seed = self.seed if self.seed is not None else random.randint(0, 2**32)
215
  self.rng = random.Random(base_seed + worker_id * 1000)
216
 
217
+ # Generate and shuffle guaranteed samples
218
+ self._guaranteed_samples = self._get_guaranteed_samples()
219
+ self.rng.shuffle(self._guaranteed_samples)
220
+ self._guaranteed_index = 0
221
+
222
  return self
223
 
224
  def __next__(self) -> tuple[torch.Tensor, torch.Tensor]:
225
+ """Generate the next sample.
226
+
227
+ First yields all guaranteed samples, then stratified random samples.
228
+ """
229
+ # Yield guaranteed samples first
230
+ if self._guaranteed_samples and self._guaranteed_index < len(self._guaranteed_samples):
231
+ n = self._guaranteed_samples[self._guaranteed_index]
232
+ self._guaranteed_index += 1
233
+ return self._generate_sample_from_n(n)
234
+
235
+ # Then yield stratified random samples
236
  return self._generate_sample()
237
+
238
+ def _generate_sample_from_n(self, n: int) -> tuple[torch.Tensor, torch.Tensor]:
239
+ """Generate a sample for a specific integer n."""
240
+ digits = int_to_digits(n)
241
+ name = read_digits(digits)
242
+ encoded = encode(name)
243
+
244
+ # Pad digits with 10 (padding index)
245
+ digits_padded = digits + [10] * (self.max_seq_len - len(digits))
246
+ digits_padded = digits_padded[: self.max_seq_len]
247
+
248
+ # Append EOS and pad with -1
249
+ encoded_with_eos = encoded + [EOS_IDX]
250
+ encoded_padded = encoded_with_eos + [-1] * (self.max_output_len - len(encoded_with_eos))
251
+ encoded_padded = encoded_padded[: self.max_output_len]
252
+
253
+ return (
254
+ torch.tensor(digits_padded, dtype=torch.long),
255
+ torch.tensor(encoded_padded, dtype=torch.long),
256
+ )
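
To make the size of the guaranteed set concrete, here is a stand-alone sketch that mirrors the logic of `_get_guaranteed_samples` as a free function. The function name `guaranteed_samples` and its defaults are assumptions chosen to match the values passed in `namer/main.py`. It is a rewrite for illustration, not the class method itself.

```python
INT64_MAX = 2**63 - 1


def guaranteed_samples(max_int: int = INT64_MAX,
                       include_all_until: int = 99_999) -> list[int]:
    """Illustrative rewrite: every integer up to a cutoff, plus larger powers of 1000."""
    include_all_until = min(include_all_until, max_int)
    samples = list(range(include_all_until + 1))
    power = 1000
    while power <= max_int:
        if power > include_all_until:  # skip powers already covered by the range
            samples.append(power)
        power *= 1000
    return samples


samples = guaranteed_samples()
print(len(samples))   # 100005
print(samples[-5:])   # 10**6, 10**9, 10**12, 10**15, 10**18
```

With the default cutoff, 1,000 is already covered by the 0 to 99,999 range, so only five powers of 1000 are appended. From the parts of `__iter__` visible in this diff, the list appears to be rebuilt on each call, so with a multi-worker DataLoader each worker would likely replay it once.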
namer/main.py CHANGED
@@ -59,11 +59,18 @@ def demo_command(args: argparse.Namespace) -> None:
59
  print(f" int_to_digits({n}) = {int_to_digits(n)}")
60
 
61
 
 
 
 
 
62
  def train_command(
63
  num_epochs: int = 30,
64
  steps_per_epoch: int = 1000,
65
  batch_size: int = 128,
66
  learning_rate: float = 0.001,
 
 
 
67
  ) -> None:
68
  """Train the Namer model.
69
 
@@ -72,6 +79,9 @@ def train_command(
72
  steps_per_epoch: Number of steps per epoch
73
  batch_size: Batch size for training
74
  learning_rate: Learning rate for optimizer
 
 
 
75
  """
76
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
77
  if device.type == "cuda":
@@ -79,17 +89,31 @@ def train_command(
79
  else:
80
  print("Warning: CUDA not available, using CPU")
81
 
82
- # Create infinite dataset for training
 
 
 
 
83
  infinite_dataset = InfiniteNamerDataset(
84
- max_int=999999,
85
- max_seq_len=20,
 
86
  seed=42,
 
 
87
  )
88
 
 
 
 
 
 
 
 
89
  # Create model
90
  model = NamerTransformer(
91
  vocab_size=len(VOCABULARY),
92
- max_output_len=20,
93
  d_model=128,
94
  nhead=4,
95
  num_encoder_layers=4,
@@ -113,19 +137,31 @@ def train_command(
113
  # Save model
114
  save_model(trained_model)
115
 
116
- # Test predictions
117
  print("\n--- Model Predictions ---")
118
  trained_model.eval()
119
 
120
- test_numbers = [123, 4567, 89012, 555555, 999999, 42, 0, 1000]
 
 
 
 
 
 
 
 
 
 
121
  device_obj = next(trained_model.parameters()).device
122
 
123
  with torch.no_grad():
124
  for n in test_numbers:
 
 
125
  pred = predict_number_name(trained_model, n, device_obj)
126
  actual = read_digits(int_to_digits(n))
127
  match = "✓" if pred == actual else "✗"
128
- print(f" {n}: pred='{pred}', actual='{actual}' {match}")
129
 
130
 
131
  def test_command() -> None:
@@ -180,12 +216,27 @@ def main(argv: list[str] | None = None) -> int:
180
  train_parser.add_argument(
181
  "--lr", type=float, default=0.001, help="Learning rate (default: 0.001)"
182
  )
 
 
 
 
 
 
 
 
 
 
 
 
183
  train_parser.set_defaults(
184
  func=lambda args: train_command(
185
  num_epochs=args.epochs,
186
  steps_per_epoch=args.steps,
187
  batch_size=args.batch_size,
188
  learning_rate=args.lr,
 
 
 
189
  )
190
  )
191
 
 
59
  print(f" int_to_digits({n}) = {int_to_digits(n)}")
60
 
61
 
62
+ # INT64_MAX: 9,223,372,036,854,775,807
63
+ INT64_MAX = 9223372036854775807
64
+
65
+
66
  def train_command(
67
  num_epochs: int = 30,
68
  steps_per_epoch: int = 1000,
69
  batch_size: int = 128,
70
  learning_rate: float = 0.001,
71
+ max_int: int = INT64_MAX,
72
+ max_seq_len: int = 25,
73
+ max_output_len: int = 35,
74
  ) -> None:
75
  """Train the Namer model.
76
 
 
79
  steps_per_epoch: Number of steps per epoch
80
  batch_size: Batch size for training
81
  learning_rate: Learning rate for optimizer
82
+ max_int: Maximum integer value for training (default: INT64_MAX)
83
+ max_seq_len: Maximum input sequence length (default: 25 for 19 digits)
84
+ max_output_len: Maximum output sequence length (default: 35 for large numbers)
85
  """
86
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
87
  if device.type == "cuda":
 
89
  else:
90
  print("Warning: CUDA not available, using CPU")
91
 
92
+ print(f"Training range: 0 to {max_int:,} ({len(str(max_int))} digits)")
93
+ print(f"Model config: max_seq_len={max_seq_len}, max_output_len={max_output_len}")
94
+
95
+ # Create infinite dataset for training with stratified sampling
96
+ # Includes all numbers 0-99,999 and exact powers of 1000 as guaranteed samples
97
  infinite_dataset = InfiniteNamerDataset(
98
+ max_int=max_int,
99
+ max_seq_len=max_seq_len,
100
+ max_output_len=max_output_len,
101
  seed=42,
102
+ stratified=True,
103
+ include_all_until=99999,
104
  )
105
 
106
+ # Calculate guaranteed samples info
107
+ guaranteed_count = 100000 # 0-99,999
108
+ powers_of_1000 = [10**3, 10**6, 10**9, 10**12, 10**15, 10**18]
109
+ extra_powers = sum(1 for p in powers_of_1000 if p > 99999 and p <= max_int)
110
+ total_guaranteed = guaranteed_count + extra_powers
111
+ print(f"Guaranteed samples: {total_guaranteed:,} (0-99,999 + {extra_powers} powers of 1000)")
112
+
113
  # Create model
114
  model = NamerTransformer(
115
  vocab_size=len(VOCABULARY),
116
+ max_output_len=max_output_len,
117
  d_model=128,
118
  nhead=4,
119
  num_encoder_layers=4,
 
137
  # Save model
138
  save_model(trained_model)
139
 
140
+ # Test predictions across all scales
141
  print("\n--- Model Predictions ---")
142
  trained_model.eval()
143
 
144
+ test_numbers = [
145
+ 0, 42, 123, 1000, 999999, # Small numbers
146
+ 1000000, 999999999, # Millions
147
+ 1000000000, 999999999999, # Billions
148
+ 1000000000000, 999999999999999, # Trillions
149
+ 1000000000000000, # Quadrillion boundary (10**15)
150
+ ]
151
+ # Add INT64_MAX if training for that range
152
+ if max_int >= INT64_MAX:
153
+ test_numbers.append(INT64_MAX)
154
+
155
  device_obj = next(trained_model.parameters()).device
156
 
157
  with torch.no_grad():
158
  for n in test_numbers:
159
+ if n > max_int:
160
+ continue
161
  pred = predict_number_name(trained_model, n, device_obj)
162
  actual = read_digits(int_to_digits(n))
163
  match = "✓" if pred == actual else "✗"
164
+ print(f" {n:,}: pred='{pred}', actual='{actual}' {match}")
165
 
166
 
167
  def test_command() -> None:
 
216
  train_parser.add_argument(
217
  "--lr", type=float, default=0.001, help="Learning rate (default: 0.001)"
218
  )
219
+ train_parser.add_argument(
220
+ "--max-int", type=int, default=INT64_MAX,
221
+ help=f"Maximum integer for training (default: {INT64_MAX})"
222
+ )
223
+ train_parser.add_argument(
224
+ "--max-seq-len", type=int, default=25,
225
+ help="Maximum input sequence length (default: 25 for 19 digits)"
226
+ )
227
+ train_parser.add_argument(
228
+ "--max-output-len", type=int, default=35,
229
+ help="Maximum output sequence length (default: 35)"
230
+ )
231
  train_parser.set_defaults(
232
  func=lambda args: train_command(
233
  num_epochs=args.epochs,
234
  steps_per_epoch=args.steps,
235
  batch_size=args.batch_size,
236
  learning_rate=args.lr,
237
+ max_int=args.max_int,
238
+ max_seq_len=args.max_seq_len,
239
+ max_output_len=args.max_output_len,
240
  )
241
  )
242
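
The three new flags can be set independently, and `namer/data.py` slices both sequences to their maximum lengths, so inconsistent values would truncate samples without an error. The sketch below shows one way such a combination could be checked before training. `required_lengths` and `check_lengths` are hypothetical helpers, not part of this commit, and the word bound assumes at most four words per three-digit group.

```python
import math


def required_lengths(max_int: int) -> tuple[int, int]:
    """Return (input digits, conservative output token bound) for max_int."""
    digits = len(str(max_int))
    groups = math.ceil(digits / 3)
    # Up to four words per group, one scale word per group except the last, plus EOS.
    output_bound = 4 * groups + (groups - 1) + 1
    return digits, output_bound


def check_lengths(max_int: int, max_seq_len: int, max_output_len: int) -> None:
    """Raise ValueError if the lengths cannot hold every sample up to max_int."""
    digits, output_bound = required_lengths(max_int)
    if max_seq_len < digits:
        raise ValueError(f"max_seq_len={max_seq_len} is below the {digits} digits of max_int")
    if max_output_len < output_bound:
        raise ValueError(f"max_output_len={max_output_len} is below the bound {output_bound}")


print(required_lengths(2**63 - 1))  # (19, 35)
check_lengths(2**63 - 1, max_seq_len=25, max_output_len=35)  # passes silently
```

The conservative bound for INT64_MAX works out to 35, which equals the new default for `--max-output-len`. The commit does not say whether the value was derived this way.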