---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- text-generation
- name-generation
---

# Model Card for Infinitode/TWNGM-OPEN-ARC

Repository: https://github.com/Infinitode/OPEN-ARC/

## Model Description

OPEN-ARC-TWNG is a simple recurrent neural network (RNN) language model developed as part of Infinitode's OPEN-ARC initiative. It predicts the next token in a sequence using a lightweight architecture suited to small datasets.

**Architecture**:

- **Embedding**: `Embedding(vocab_size, 500)` maps tokens to 500-dimensional vectors.
- **Recurrent Layer**: `SimpleRNN(50)` processes sequences with 50 recurrent units.
- **Output Layer**: `Dense(vocab_size, activation='softmax')` produces a probability distribution over the vocabulary.
- **Framework**: TensorFlow 2.x / Keras
- **Training Setup**: Compiled with `loss='categorical_crossentropy'`, `optimizer='adam'`, and the `accuracy` metric.

## Uses

- Generating creative and fantasy-style weapon or item names.
- Supporting game-design workflows and ideation.
- Creating randomized content for prototyping or mods.

## Limitations

- May produce implausible or inappropriate names if poorly seeded or prompted.
- Vocabulary is tailored to Terraria-style names and may not generalize to other genres.
- Does not enforce real-world constraints such as lore consistency, cultural appropriateness, or game balance.

## Training Data

- Dataset: All Terraria Weapons DPS v1.449 dataset from Kaggle.
- Source URL: https://www.kaggle.com/datasets/acr1209/all-terraria-weapons-dps-v-1449
- Content: Weapon names and their damage-per-second values, used as creative seed text.
- Size: 395 unique weapon names.
- Preprocessing: Names tokenized into subword units using SentencePiece BPE.

## Training Procedure

- Tokenizer: SentencePiece BPE trained on the weapon-name corpus (vocab size: 500).
- Batch Size: 128
- Sequence Length: `max_seq_len` (length of the longest tokenized name, `11`)
- Optimizer: Adam
- Loss: categorical_crossentropy
- Metrics: accuracy
- Epochs: 500
- Train/Validation Split: 100% train, 0% validation.

## Evaluation Results

| Metric | Value |
| ------ | ----- |
| Train Accuracy | 78.6% |
| Validation Accuracy | not used |
| Loss (final) | 0.44 |

## How to Use

The function below assumes a trained Keras `model`, a loaded SentencePiece processor `sp`, and the constants `vocab_size` and `max_seq_len` are already in scope.

```python
import random

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences


def generate_random_name(min_length=3, max_length=10, temperature=1.0, seed_text=""):
    # Re-seed the random module so repeated calls vary
    random.seed()

    if seed_text:
        # Start from the provided seed text
        generated_name = seed_text
    else:
        # Otherwise start from a random token in the vocabulary
        random_index = random.randint(1, vocab_size - 1)
        generated_name = sp.id_to_piece(random_index)

    # Generate subsequent subword tokens
    for _ in range(max_length - 1):
        # Encode the text generated so far and pad to the model's input length
        token_list = sp.encode_as_ids(generated_name)
        token_list = pad_sequences([token_list], maxlen=max_seq_len - 1, padding='pre')

        # Predict the next-token distribution
        predicted = model.predict(token_list, verbose=0)[0]

        # Apply temperature to the predictions to vary the results
        predicted = np.log(predicted + 1e-8) / temperature
        predicted = np.exp(predicted) / np.sum(np.exp(predicted))

        # Sample the next token from the distribution
        next_index = int(np.random.choice(range(vocab_size), p=predicted))
        next_token = sp.id_to_piece(next_index)

        # Append the predicted token to the output
        generated_name += next_token

        # Decode the generated subword tokens into a string
        decoded_name = sp.decode_pieces(generated_name.split())

        # Stop once the name reaches max_length (or on an end token,
        # if your tokenizer defines one)
        if len(decoded_name) > max_length:
            break

    # Replace SentencePiece word-boundary markers with spaces
    decoded_name = decoded_name.replace("▁", " ").strip()

    # Drop the (possibly cut-off) last word, then capitalize the first letter
    generated_name = decoded_name.rsplit(' ', 1)[0]
    generated_name = generated_name[0].upper() + generated_name[1:]

    # If the remaining last word is shorter than min_length, drop it as well
    parts = generated_name.split()
    if parts and len(parts[-1]) < min_length:
        generated_name = " ".join(parts[:-1])

    # Strip the output to ensure no extra whitespace
    return generated_name.strip()
```

## Contact

For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.
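The temperature step in the generation loop above rescales the model's predicted distribution before sampling. A minimal NumPy sketch of that rescaling in isolation (the helper name `apply_temperature` is illustrative, not part of the model's code):

```python
import numpy as np


def apply_temperature(probs, temperature):
    """Rescale a probability vector by temperature, as done in the loop above.

    temperature < 1 sharpens the distribution (more greedy picks);
    temperature > 1 flattens it (more varied picks).
    """
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-8) / temperature
    scaled = np.exp(logits - logits.max())  # subtract max for numerical stability
    return scaled / scaled.sum()


# Sharpening: the dominant token becomes even more likely
sharp = apply_temperature([0.1, 0.2, 0.7], temperature=0.5)
# Flattening: probability mass spreads toward the other tokens
flat = apply_temperature([0.1, 0.2, 0.7], temperature=2.0)
```

Sampling with `np.random.choice(vocab_size, p=...)` on the rescaled vector then behaves more greedily at low temperatures and more randomly at high ones.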