---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- text-generation
- name-generation
---
# Model Card for Infinitode/TWNGM-OPEN-ARC
Repository: https://github.com/Infinitode/OPEN-ARC/
## Model Description
TWNGM-OPEN-ARC is a simple recurrent neural network (RNN) language model developed as part of Infinitode’s OPEN-ARC initiative. It predicts the next token in a sequence using a lightweight architecture suited to small datasets.
**Architecture**:
- **Embedding**: `Embedding(vocab_size, 500)` maps tokens to 500-dimensional vectors.
- **Recurrent Layer**: `SimpleRNN(50)` processes sequences with 50 recurrent units.
- **Output Layer**: `Dense(vocab_size, activation='softmax')` produces a probability distribution over the vocabulary.
- **Framework**: TensorFlow 2.x / Keras
- **Training Setup**: Compiled with `loss='categorical_crossentropy'`, `optimizer='adam'`, and tracked `accuracy` metric.
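The layers above can be sketched in Keras as follows. This is a minimal sketch, not the project's actual training script; `vocab_size` and `max_seq_len` take the values reported under Training Procedure:

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size = 500   # SentencePiece vocab size reported under Training Procedure
max_seq_len = 11   # length of the longest tokenized name

model = tf.keras.Sequential([
    layers.Input(shape=(max_seq_len - 1,)),
    layers.Embedding(vocab_size, 500),               # 500-dimensional token embeddings
    layers.SimpleRNN(50),                            # 50 recurrent units
    layers.Dense(vocab_size, activation='softmax'),  # distribution over the vocabulary
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```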
## Uses
- Generating creative and fantasy-style weapon or item names.
- Supporting game design workflows and ideation.
- Creating randomized content for prototyping or mods.
## Limitations
- May produce implausible or inappropriate names if poorly seeded or prompted.
- Vocabulary is tailored to Terraria-style names; may not generalize to other genres.
- Does not enforce real-world constraints like lore consistency, cultural appropriateness, or game balance.
## Training Data
- Dataset: All Terraria Weapons DPS v1.449 dataset from Kaggle.
- Source URL: https://www.kaggle.com/datasets/acr1209/all-terraria-weapons-dps-v-1449
- Content: Weapon names and their damage-per-second values used as creative seed text.
- Size: 395 unique weapon names.
- Preprocessing: Names tokenized into subword units using SentencePiece BPE.
## Training Procedure
- Tokenizer: SentencePiece BPE trained on the weapon name corpus (vocab size: 500).
- Batch Size: 128
- Sequence Length: `max_seq_len = 11` (the length of the longest tokenized name)
- Optimizer: Adam
- Loss: categorical_crossentropy
- Metrics: accuracy
- Epochs: 500
- Train/Validation Split: 100% train, 0% validation.
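Putting the hyperparameters above together, the data preparation for next-token prediction might look like this. The token ids here are stand-ins; in the real pipeline they would come from `sp.encode_as_ids` over the weapon-name corpus (an assumption for illustration):

```python
import tensorflow as tf

vocab_size, max_seq_len = 500, 11

# Stand-in token id sequences (assumption: real ids come from the tokenizer)
tokenized_names = [[5, 17, 230, 3], [42, 9, 88]]

# Expand each name into growing prefixes: predict token i from the tokens before it
sequences = [ids[: i + 1] for ids in tokenized_names for i in range(1, len(ids))]
padded = tf.keras.preprocessing.sequence.pad_sequences(
    sequences, maxlen=max_seq_len, padding='pre')
X, y_ids = padded[:, :-1], padded[:, -1]
y = tf.keras.utils.to_categorical(y_ids, num_classes=vocab_size)

# model.fit(X, y, batch_size=128, epochs=500) then trains with no validation split
```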
## Evaluation Results
| Metric | Value |
| ------ | ----- |
| Train Accuracy | 78.6% |
| Validation Accuracy | n/a (no validation split) |
| Loss (final) | 0.44 |
## How to Use
The snippet below assumes the trained `model`, the SentencePiece processor `sp`, `vocab_size`, and `max_seq_len` are already loaded:

```python
import random

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences


def generate_random_name(min_length=3, max_length=10, temperature=1.0, seed_text=""):
    # Re-seed the RNG from system entropy
    random.seed()
    if seed_text:
        # Start from the provided seed text
        generated_name = seed_text
    else:
        # Otherwise start from a random token in the vocabulary
        random_index = random.randint(1, vocab_size - 1)
        generated_name = sp.id_to_piece(random_index)
    decoded_name = generated_name
    # Generate subsequent subword tokens
    for _ in range(max_length - 1):
        # Encode the text generated so far and pad it to the model's input length
        token_list = sp.encode_as_ids(generated_name)
        token_list = pad_sequences([token_list], maxlen=max_seq_len - 1, padding='pre')
        # Predict the next-token distribution
        predicted = model.predict(token_list, verbose=0)[0]
        # Apply temperature to vary the results (T < 1 sharpens, T > 1 flattens)
        predicted = np.log(predicted + 1e-8) / temperature
        predicted = np.exp(predicted) / np.sum(np.exp(predicted))
        # Sample the next token from the distribution
        next_index = int(np.random.choice(range(vocab_size), p=predicted))
        next_token = sp.id_to_piece(next_index)
        # Append the sampled token to the output
        generated_name += next_token
        # Decode the accumulated subword pieces into a readable string
        decoded_name = sp.decode_pieces(generated_name.split())
        # Stop on the end-of-sequence token (optional, based on your dataset),
        # or once max_length is reached
        if next_token == '</s>' or len(decoded_name) > max_length:
            break
    # Replace SentencePiece word-boundary markers with spaces
    decoded_name = decoded_name.replace("▁", " ")
    # Remove stop tokens from the output
    decoded_name = decoded_name.replace("</s>", "")
    # Drop the last (possibly cut-off) word
    generated_name = decoded_name.rsplit(' ', 1)[0]
    # Capitalize the first letter
    if generated_name:
        generated_name = generated_name[0].upper() + generated_name[1:]
    # If the remaining last word is shorter than min_length, drop it too
    parts = generated_name.split()
    if parts and len(parts[-1]) < min_length:
        generated_name = " ".join(parts[:-1])
    # Strip the output to ensure no extra whitespace
    return generated_name.strip()
```
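The temperature transform used inside the loop above can be isolated as a small NumPy sketch (toy distribution, not actual model output):

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Rescale a probability distribution; T < 1 sharpens it, T > 1 flattens it."""
    logits = np.log(probs + 1e-8) / temperature
    exp = np.exp(logits)
    return exp / exp.sum()

p = np.array([0.7, 0.2, 0.1])
sharp = apply_temperature(p, 0.5)  # the most likely token dominates even more
flat = apply_temperature(p, 2.0)   # the distribution moves toward uniform
```

Lower temperatures make generated names more repetitive but safer; higher temperatures produce more varied, occasionally nonsensical names.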
## Contact
For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.