---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- text-generation
- name-generation
---
# Model Card for Infinitode/TWNGM-OPEN-ARC

**Repository:** https://github.com/Infinitode/OPEN-ARC/

## Model Description
OPEN-ARC-TWNG is a simple recurrent neural network (RNN) language model developed as part of Infinitode’s OPEN-ARC initiative. It predicts the next token in a sequence using a lightweight architecture suitable for smaller datasets.
**Architecture:**
- **Embedding:** `Embedding(vocab_size, 500)` maps tokens to 500-dimensional vectors.
- **Recurrent Layer:** `SimpleRNN(50)` processes sequences with 50 recurrent units.
- **Output Layer:** `Dense(vocab_size, activation='softmax')` produces a probability distribution over the vocabulary.
- **Framework:** TensorFlow 2.x / Keras
- **Training Setup:** Compiled with `loss='categorical_crossentropy'`, `optimizer='adam'`, and the `accuracy` metric.
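For a rough sense of scale, the layer sizes above imply the following parameter counts. This is a back-of-the-envelope sketch, assuming `vocab_size` equals the tokenizer's vocabulary size of 500 (stated under Training Procedure):

```python
# Approximate parameter counts for the architecture described above,
# assuming vocab_size = 500 (the tokenizer's vocabulary size).
vocab_size, embed_dim, rnn_units = 500, 500, 50

# Embedding: one embed_dim-dimensional vector per vocabulary entry.
embedding_params = vocab_size * embed_dim

# SimpleRNN: input-to-hidden, hidden-to-hidden, and bias weights.
rnn_params = embed_dim * rnn_units + rnn_units * rnn_units + rnn_units

# Dense softmax head: hidden-to-vocab weights plus bias.
dense_params = rnn_units * vocab_size + vocab_size

total = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total)
# → 250000 27550 25500 303050
```

At roughly 300k parameters, the model is small enough to train quickly on a 395-name corpus.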
## Uses
- Generating creative and fantasy-style weapon or item names.
- Supporting game design workflows and ideation.
- Creating randomized content for prototyping or mods.
## Limitations
- May produce implausible or inappropriate names if poorly seeded or prompted.
- Vocabulary is tailored to Terraria-style names; may not generalize to other genres.
- Does not enforce real-world constraints like lore consistency, cultural appropriateness, or game balance.
## Training Data
- **Dataset:** All Terraria Weapons DPS v1.449 dataset from Kaggle.
- **Source URL:** https://www.kaggle.com/datasets/acr1209/all-terraria-weapons-dps-v-1449
- **Content:** Weapon names and their damage-per-second values, with the names used as creative seed text.
- **Size:** 395 unique weapon names.
- **Preprocessing:** Names tokenized into subword units using SentencePiece BPE.
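For next-token prediction, each tokenized name is typically expanded into (prefix, next-token) training pairs: every prefix of the token sequence becomes an input, and the token that follows it becomes the target. A minimal sketch of that expansion, using hypothetical token IDs in place of a real SentencePiece encoding:

```python
def make_ngram_pairs(token_ids):
    """Expand one tokenized name into (prefix, next-token) training pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        pairs.append((token_ids[:i], token_ids[i]))
    return pairs

# Hypothetical token IDs for a name such as "Copper Shortsword"
example = [12, 47, 301, 9]
for prefix, target in make_ngram_pairs(example):
    print(prefix, "->", target)
# [12] -> 47
# [12, 47] -> 301
# [12, 47, 301] -> 9
```

A name of n tokens therefore contributes n - 1 training examples, which is how a 395-name dataset yields enough samples for batched training.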
## Training Procedure
- **Tokenizer:** SentencePiece BPE trained on the weapon name corpus (vocab size: 500).
- **Batch Size:** 128
- **Sequence Length:** `max_seq_len` (based on the longest tokenized name, 11)
- **Optimizer:** Adam
- **Loss:** `categorical_crossentropy`
- **Metrics:** accuracy
- **Epochs:** 500
- **Train/Validation Split:** 100% train, 0% validation.
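Because prefixes vary in length, each one is left-padded to a fixed length of `max_seq_len - 1` before training so batches have a uniform shape. A pure-Python sketch of that step (Keras' `pad_sequences` with `padding='pre'` does the same thing in bulk):

```python
def pad_pre(seq, maxlen, pad_id=0):
    """Left-pad (or left-truncate) a token ID sequence to exactly maxlen."""
    if len(seq) >= maxlen:
        return seq[-maxlen:]  # keep the most recent tokens
    return [pad_id] * (maxlen - len(seq)) + seq

max_seq_len = 11  # longest tokenized name in the corpus
print(pad_pre([12, 47, 301], max_seq_len - 1))
# → [0, 0, 0, 0, 0, 0, 0, 12, 47, 301]
```

Pre-padding keeps the most recent tokens adjacent to the RNN's final timestep, which is where the next-token prediction is read out.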
## Evaluation Results
| Metric | Value |
|---|---|
| Train Accuracy | 78.6% |
| Validation Accuracy | n/a (no validation split) |
| Loss (final) | 0.44 |
## How to Use
The snippet below assumes the trained `model`, the SentencePiece processor `sp`, `vocab_size`, and `max_seq_len` have already been loaded.

```python
import random

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_random_name(min_length=3, max_length=10, temperature=1.0, seed_text=""):
    # Re-seed the RNG from system entropy
    random.seed()
    if seed_text:
        # Start from the provided seed text
        generated_name = seed_text
    else:
        # Randomly select a token from the vocabulary as the starting token
        random_index = random.randint(1, vocab_size - 1)
        generated_name = sp.id_to_piece(random_index)

    # Generate subsequent subword tokens
    for _ in range(max_length - 1):
        # Encode the text generated so far and left-pad it to the model's input length
        token_list = sp.encode_as_ids(generated_name)
        token_list = pad_sequences([token_list], maxlen=max_seq_len - 1, padding='pre')

        # Predict the next-token distribution
        predicted = model.predict(token_list, verbose=0)[0]

        # Apply temperature to the predictions to vary the results
        predicted = np.log(predicted + 1e-8) / temperature
        predicted = np.exp(predicted) / np.sum(np.exp(predicted))

        # Sample the next token from the distribution
        next_index = int(np.random.choice(range(vocab_size), p=predicted))
        next_token = sp.id_to_piece(next_index)

        # Append the predicted token to the output
        generated_name += next_token

        # Decode the generated subword tokens into a string
        decoded_name = sp.decode_pieces(generated_name.split())

        # Stop if the end token is predicted (optional, based on your dataset),
        # or stop once max_length is reached
        if next_token == '</s>' or len(decoded_name) > max_length:
            break

    # Replace subword boundary markers with spaces
    decoded_name = decoded_name.replace("▁", " ")
    # Remove stop tokens from the output
    decoded_name = decoded_name.replace("</s>", "")
    # Drop any trailing partial word, then capitalize the first letter
    generated_name = decoded_name.rsplit(' ', 1)[0]
    if generated_name:
        generated_name = generated_name[0].upper() + generated_name[1:]
    # Check the last part of the name to make sure it is not cut off
    parts = generated_name.split()
    if parts and len(parts[-1]) < min_length:
        generated_name = " ".join(parts[:-1])
    # Strip the output to ensure no extra whitespace
    return generated_name.strip()
```

For example, `generate_random_name(temperature=0.8)` produces a somewhat more conservative name than the default temperature of 1.0; higher temperatures yield more varied (and less plausible) output.
## Contact
For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.