---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- text-generation
- name-generation
library_name: keras
---
|
| 13 |
|
| 14 |
+
# Model Card for Infinitode/TWNGM-OPEN-ARC
|
| 15 |
+
|
| 16 |
+
Repository: https://github.com/Infinitode/OPEN-ARC/
|
| 17 |
+
|
| 18 |
+
## Model Description
|
| 19 |
+
|
| 20 |
+
OPEN-ARC-TWNG is a simple recurrent neural network (RNN) language model developed as part of Infinitode’s OPEN-ARC initiative. It predicts the next token in a sequence using a lightweight architecture suitable for smaller datasets.

**Architecture**:

- **Embedding**: `Embedding(vocab_size, 500)` maps tokens to 500-dimensional vectors.
- **Recurrent Layer**: `SimpleRNN(50)` processes sequences with 50 recurrent units.
- **Output Layer**: `Dense(vocab_size, activation='softmax')` produces a probability distribution over the vocabulary.
- **Framework**: TensorFlow 2.x / Keras
- **Training Setup**: Compiled with `loss='categorical_crossentropy'`, `optimizer='adam'`, and the `accuracy` metric.
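
Given these layer sizes (and assuming the BPE vocabulary size of 500 listed under Training Procedure), the model's parameter count can be estimated by hand. This is a rough sketch under those assumptions, not an official figure:

```python
# Rough parameter-count estimate for the architecture above.
# Assumption: vocab_size = 500 (the BPE vocab size listed under Training Procedure).
vocab_size = 500
embed_dim = 500   # Embedding(vocab_size, 500)
rnn_units = 50    # SimpleRNN(50)

embedding_params = vocab_size * embed_dim             # one 500-dim vector per token
rnn_params = rnn_units * (embed_dim + rnn_units + 1)  # input, recurrent, and bias weights
dense_params = (rnn_units + 1) * vocab_size           # output projection + bias

total_params = embedding_params + rnn_params + dense_params
print(total_params)  # 303050 under these assumptions
```

At roughly 0.3M parameters, almost all of the capacity sits in the embedding table rather than the recurrent layer.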

## Uses

- Generating creative and fantasy-style weapon or item names.
- Supporting game design workflows and ideation.
- Creating randomized content for prototyping or mods.

## Limitations

- May produce implausible or inappropriate names if poorly seeded or prompted.
- Vocabulary is tailored to Terraria-style names and may not generalize to other genres.
- Does not enforce real-world constraints such as lore consistency, cultural appropriateness, or game balance.

## Training Data

- Dataset: All Terraria Weapons DPS v1.449 dataset from Kaggle.
- Source URL: https://www.kaggle.com/datasets/acr1209/all-terraria-weapons-dps-v-1449
- Content: Weapon names and their damage-per-second values used as creative seed text.
- Size: 395 unique weapon names.
- Preprocessing: Names tokenized into subword units using SentencePiece BPE.

## Training Procedure

- Tokenizer: SentencePiece BPE trained on the weapon-name corpus (vocab size: 500).
- Batch Size: 128
- Sequence Length: `max_seq_len` (based on the longest tokenized name, `11`)
- Optimizer: Adam
- Loss: categorical_crossentropy
- Metrics: accuracy
- Epochs: 500
- Train/Validation Split: 100% train, 0% validation

## Evaluation Results

| Metric | Value |
| ------ | ----- |
| Train Accuracy | 78.6% |
| Validation Accuracy | not used |
| Loss (final) | 0.44 |

## How to Use

```python
import random

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Assumes `model` (the trained Keras model), `sp` (the SentencePiece
# processor), `vocab_size`, and `max_seq_len` are already loaded.

def generate_random_name(min_length=3, max_length=10, temperature=1.0, seed_text=""):
    # Use a random seed
    random.seed()

    if seed_text:
        # If seed text is provided, start from it
        generated_name = seed_text
    else:
        # Otherwise, randomly select a token from the vocab as the starting token
        random_index = random.randint(1, vocab_size - 1)
        random_token = sp.id_to_piece(random_index)
        generated_name = random_token

    # Generate subsequent subword tokens
    for _ in range(max_length - 1):
        # Encode the text generated so far
        token_list = sp.encode_as_ids(generated_name)
        token_list = pad_sequences([token_list], maxlen=max_seq_len - 1, padding='pre')

        # Run prediction
        predicted = model.predict(token_list, verbose=0)[0]

        # Apply temperature to the predictions to vary the results
        predicted = np.log(predicted + 1e-8) / temperature
        predicted = np.exp(predicted) / np.sum(np.exp(predicted))

        # Sample from the distribution
        next_index = int(np.random.choice(range(vocab_size), p=predicted))
        next_token = sp.id_to_piece(next_index)

        # Add the predicted token to the output
        generated_name += next_token

        # Decode the generated subword tokens into a string
        decoded_name = sp.decode_pieces(generated_name.split())

        # Stop if the end token is predicted (optional, based on your dataset),
        # or stop once max_length is reached
        if next_token == '</s>' or len(decoded_name) > max_length:
            break

    # Replace SentencePiece underscore markers with spaces
    decoded_name = decoded_name.replace("▁", " ")

    # Remove stop tokens from the output
    decoded_name = decoded_name.replace("</s>", "")

    # Drop the trailing fragment and capitalize the first letter
    generated_name = decoded_name.rsplit(' ', 1)[0]
    generated_name = generated_name[0].upper() + generated_name[1:]

    # Split the name and check the last part, making sure it is not cut off
    parts = generated_name.split()
    if parts and len(parts[-1]) < min_length:
        generated_name = " ".join(parts[:-1])

    # Strip the output to ensure no extra whitespace
    return generated_name.strip()
```
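
The temperature step in the function above can also be tried in isolation; this standalone sketch uses a made-up next-token distribution to show how temperature reshapes the sampling probabilities:

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Rescale a probability distribution: <1 sharpens it, >1 flattens it."""
    logits = np.log(probs + 1e-8) / temperature
    scaled = np.exp(logits)
    return scaled / scaled.sum()

probs = np.array([0.7, 0.2, 0.1])      # made-up next-token distribution
sharp = apply_temperature(probs, 0.5)  # low temperature: more deterministic
flat = apply_temperature(probs, 2.0)   # high temperature: more varied
# sharp[0] > probs[0] > flat[0], and each result still sums to 1.
```

Low temperatures make the generator favor common name fragments; values above 1.0 trade plausibility for novelty.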

## Contact

For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.