---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- text-generation
- name-generation
library_name: keras
---

# Model Card for Infinitode/TWNGM-OPEN-ARC

Repository: https://github.com/Infinitode/OPEN-ARC/

## Model Description

OPEN-ARC-TWNG is a simple recurrent neural network (RNN) language model developed as part of Infinitode's OPEN-ARC initiative. It predicts the next token in a sequence using a lightweight architecture suitable for smaller datasets.

**Architecture**:

- **Embedding**: `Embedding(vocab_size, 500)` maps tokens to 500-dimensional vectors.
- **Recurrent Layer**: `SimpleRNN(50)` processes sequences with 50 recurrent units.
- **Output Layer**: `Dense(vocab_size, activation='softmax')` produces a probability distribution over the vocabulary.
- **Framework**: TensorFlow 2.x / Keras
- **Training Setup**: Compiled with `loss='categorical_crossentropy'`, `optimizer='adam'`, and the `accuracy` metric.
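The layer stack above can be sketched in Keras as follows. This is an illustrative reconstruction, not the exact training script; the vocabulary size of 500 matches the tokenizer described under Training Procedure:

```python
import numpy as np
from tensorflow import keras

vocab_size = 500  # matches the SentencePiece vocab size used for this model

model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 500),               # token -> 500-dim vector
    keras.layers.SimpleRNN(50),                            # 50 recurrent units
    keras.layers.Dense(vocab_size, activation="softmax"),  # next-token distribution
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# A dummy batch shows the output shape: (batch, vocab_size)
probs = model.predict(np.zeros((1, 10), dtype="int32"), verbose=0)
```

Each prediction is a full probability distribution over the vocabulary, which is what the sampling loop in "How to Use" draws from.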

## Uses

- Generating creative and fantasy-style weapon or item names.
- Supporting game design workflows and ideation.
- Creating randomized content for prototyping or mods.

## Limitations

- May produce implausible or inappropriate names if poorly seeded or prompted.
- Vocabulary is tailored to Terraria-style names; may not generalize to other genres.
- Does not enforce real-world constraints like lore consistency, cultural appropriateness, or game balance.

## Training Data

- Dataset: All Terraria Weapons DPS v1.449 dataset from Kaggle.
- Source URL: https://www.kaggle.com/datasets/acr1209/all-terraria-weapons-dps-v-1449
- Content: Weapon names and their damage-per-second values, used as creative seed text.
- Size: 395 unique weapon names.
- Preprocessing: Names tokenized into subword units using SentencePiece BPE.

## Training Procedure

- Tokenizer: SentencePiece BPE trained on the weapon-name corpus (vocab size: 500).
- Batch Size: 128
- Sequence Length: `max_seq_len` (based on the longest tokenized name, `11`)
- Optimizer: Adam
- Loss: categorical_crossentropy
- Metrics: accuracy
- Epochs: 500
- Train/Validation Split: 100% train, 0% validation.

## Evaluation Results

| Metric | Value |
| ------ | ----- |
| Train Accuracy | 78.6% |
| Validation Accuracy | n/a (no validation split) |
| Loss (final) | 0.44 |

## How to Use

```python
import random

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Assumes the following are already loaded:
#   model       - the trained Keras model
#   sp          - the SentencePiece processor
#   vocab_size  - tokenizer vocabulary size (500)
#   max_seq_len - longest tokenized name (11)

def generate_random_name(min_length=3, max_length=10, temperature=1.0, seed_text=""):
    # Re-seed the RNG from the system entropy source
    random.seed()

    if seed_text:
        # If seed text is provided, start from it
        generated_name = seed_text
    else:
        # Otherwise randomly select a vocabulary token as the starting token
        random_index = random.randint(1, vocab_size - 1)
        generated_name = sp.id_to_piece(random_index)

    decoded_name = generated_name

    # Generate subsequent subword tokens
    for _ in range(max_length - 1):
        # Encode the text generated so far
        token_list = sp.encode_as_ids(generated_name)
        token_list = pad_sequences([token_list], maxlen=max_seq_len - 1, padding='pre')

        # Predict the next-token distribution
        predicted = model.predict(token_list, verbose=0)[0]

        # Apply temperature to the predictions for more varied results
        predicted = np.log(predicted + 1e-8) / temperature
        predicted = np.exp(predicted) / np.sum(np.exp(predicted))

        # Sample from the distribution
        next_index = int(np.random.choice(range(vocab_size), p=predicted))
        next_token = sp.id_to_piece(next_index)

        # Add the sampled token to the output
        generated_name += next_token

        # Decode the generated subword tokens into a string
        decoded_name = sp.decode_pieces(generated_name.split())

        # Stop if the end token is predicted, or if max_length is reached
        if next_token == '</s>' or len(decoded_name) > max_length:
            break

    # Replace SentencePiece word markers with spaces
    decoded_name = decoded_name.replace("▁", " ")

    # Remove stop tokens from the output
    decoded_name = decoded_name.replace("</s>", "")

    # Drop the trailing (possibly cut-off) fragment and capitalize the first letter
    generated_name = decoded_name.rsplit(' ', 1)[0]
    if generated_name:
        generated_name = generated_name[0].upper() + generated_name[1:]

    # Drop the last word if it is shorter than min_length (likely truncated)
    parts = generated_name.split()
    if parts and len(parts[-1]) < min_length:
        generated_name = " ".join(parts[:-1])

    # Strip the output to ensure no extra whitespace
    return generated_name.strip()
```
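The temperature step inside the loop can be illustrated in isolation. `apply_temperature` below is a hypothetical helper, not part of the model's code; it shows the same log/exp math on a toy distribution:

```python
import numpy as np

def apply_temperature(probs, temperature):
    # Log-scale, divide by temperature, then re-normalize
    # (the same math as in the generation loop)
    logits = np.log(probs + 1e-8) / temperature
    scaled = np.exp(logits)
    return scaled / scaled.sum()

probs = np.array([0.7, 0.2, 0.1])
low = apply_temperature(probs, 0.5)   # sharper: favors the top token more
high = apply_temperature(probs, 2.0)  # flatter: more varied sampling
```

Temperatures below 1.0 make names more predictable; values above 1.0 make them more surprising, at the cost of occasional gibberish.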

## Contact

For questions or issues, open a GitHub issue or reach out via https://infinitode.netlify.app/forms/contact.