---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- text-generation
- name-generation
---

# Model Card for Infinitode/TWNGM-OPEN-ARC

Repository: https://github.com/Infinitode/OPEN-ARC/

## Model Description

OPEN-ARC-TWNG is a simple recurrent neural network (RNN) language model developed as part of Infinitode’s OPEN-ARC initiative. It predicts the next token in a sequence using a lightweight architecture suitable for smaller datasets.

**Architecture**:

- **Embedding**: `Embedding(vocab_size, 500)` maps tokens to 500-dimensional vectors.
- **Recurrent Layer**: `SimpleRNN(50)` processes sequences with 50 recurrent units.
- **Output Layer**: `Dense(vocab_size, activation='softmax')` produces a probability distribution over the vocabulary.
- **Framework**: TensorFlow 2.x / Keras
- **Training Setup**: Compiled with `loss='categorical_crossentropy'`, `optimizer='adam'`, and tracked `accuracy` metric.
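The layers listed above can be sketched as a Keras `Sequential` model (a minimal sketch; the 500-token vocabulary matches the tokenizer described below, and any hyperparameters not listed in this card are assumptions):

```python
import tensorflow as tf

vocab_size = 500  # SentencePiece BPE vocab size stated in this card

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 500),          # 500-dim token embeddings
    tf.keras.layers.SimpleRNN(50),                       # 50 recurrent units
    tf.keras.layers.Dense(vocab_size, activation='softmax'),  # next-token distribution
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```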

## Uses

- Generating creative and fantasy-style weapon or item names.
- Supporting game design workflows and ideation.
- Creating randomized content for prototyping or mods.

## Limitations

- May produce implausible or inappropriate names if poorly seeded or prompted.
- Vocabulary is tailored to Terraria-style names; may not generalize to other genres.
- Does not enforce real-world constraints like lore consistency, cultural appropriateness, or game balance.

## Training Data

- Dataset: All Terraria Weapons DPS v1.449 dataset from Kaggle.
- Source URL: https://www.kaggle.com/datasets/acr1209/all-terraria-weapons-dps-v-1449
- Content: Weapon names and their damage-per-second values used as creative seed text.
- Size: 395 unique weapon names.
- Preprocessing: Names tokenized into subword units using SentencePiece BPE.

## Training Procedure

- Tokenizer: SentencePiece BPE trained on the weapon name corpus (vocab size: 500).
- Batch Size: 128
- Sequence Length: max_seq_len (based on longest tokenized name, `11`)
- Optimizer: Adam
- Loss: categorical_crossentropy
- Metrics: accuracy
- Epochs: 500
- Train/Validation Split: 100% train, 0% validation.
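A next-token setup like the one above is typically trained on (prefix, next-token) pairs built from each tokenized name, left-padded to the sequence length. This helper is an illustrative sketch of that preparation (the token IDs and padding value `0` are assumptions, not taken from the training script):

```python
def make_training_pairs(token_ids, max_seq_len):
    """Build (left-padded prefix, next-token) pairs from one tokenized name."""
    pairs = []
    for i in range(1, len(token_ids)):
        prefix = token_ids[:i]
        # Left-pad to max_seq_len - 1, matching the 'pre' padding used at generation time
        padded = [0] * (max_seq_len - 1 - len(prefix)) + prefix
        pairs.append((padded, token_ids[i]))
    return pairs
```

For example, `make_training_pairs([5, 12, 7], 4)` yields two pairs: `([0, 0, 5], 12)` and `([0, 5, 12], 7)`.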

## Evaluation Results

| Metric | Value |
| ------ | ----- |
| Train Accuracy | 78.6% |
| Validation Accuracy | not used |
| Loss (final) | 0.44 |

## How to Use

```python
import random

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Assumes `model` (the trained Keras model), `sp` (the SentencePiece
# processor), `vocab_size`, and `max_seq_len` are already loaded.

def generate_random_name(min_length=3, max_length=10, temperature=1.0, seed_text=""):
    # Seed the RNG from system entropy
    random.seed()
    
    if seed_text:
        # If seed text is provided
        generated_name = seed_text
    else:
        # Randomly select a token from our vocab as our starting token if no seed text is present
        random_index = random.randint(1, vocab_size-1)
        random_token = sp.id_to_piece(random_index)
        generated_name = random_token

    # Generate subsequent subword tokens
    for _ in range(max_length - 1):
        # Encode our starting text
        token_list = sp.encode_as_ids(generated_name)
        token_list = pad_sequences([token_list], maxlen=max_seq_len-1, padding='pre')
        
        # Run prediction
        predicted = model.predict(token_list, verbose=0)[0]

        # Apply temperature to the predictions for more varied sampling
        predicted = np.log(predicted + 1e-8) / temperature
        predicted = np.exp(predicted) / np.sum(np.exp(predicted))

        # Sample from the distribution
        next_index = np.random.choice(range(vocab_size), p=predicted)
        next_index = int(next_index)
        next_token = sp.id_to_piece(next_index)

        # Add the predicted token to our output
        generated_name += next_token
        
        # Decode the generated subword tokens into a string
        decoded_name = sp.decode_pieces(generated_name.split())

        # Stop if end token is predicted (optional, based on your dataset), or stop if max_length is reached
        if next_token == '' or len(decoded_name) > max_length:
            break

    # Replace underscores with spaces
    decoded_name = decoded_name.replace("▁", " ")
    
    # Remove stop tokens from the output
    decoded_name = decoded_name.replace("</s>", "")
    
    # Drop the trailing token (it may be cut off mid-subword) and
    # capitalize the first letter of the name
    generated_name = decoded_name.rsplit(' ', 1)[0]
    generated_name = generated_name[0].upper() + generated_name[1:]

    # Split the name and check the last part, make sure that it is not cut off
    parts = generated_name.split()
    if parts and len(parts[-1]) < min_length:
        generated_name = " ".join(parts[:-1])
    
    # Strip the output to ensure no extra whitespace
    return generated_name.strip()
```
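The temperature step inside the generation loop can be isolated as a small helper for experimentation without loading the model (a pure-NumPy sketch; the max-subtraction is added here for numerical stability and is not part of the original snippet):

```python
import numpy as np

def apply_temperature(probs, temperature=1.0):
    """Rescale a probability distribution; lower temperature sharpens it,
    higher temperature flattens it toward uniform."""
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-8) / temperature
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()
```

At `temperature=1.0` the distribution is (up to the small epsilon) unchanged; at lower temperatures the most likely token dominates more strongly, which tends to produce more conservative names.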

## Contact

For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.