JohanBeytell committed db956c3 (verified, parent 98a0bf8): Update README.md

Files changed (1): README.md (+135 −3)
---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- text-generation
- name-generation
---

# Model Card for Infinitode/TWNGM-OPEN-ARC

Repository: https://github.com/Infinitode/OPEN-ARC/

## Model Description

OPEN-ARC-TWNG is a simple recurrent neural network (RNN) language model developed as part of Infinitode's OPEN-ARC initiative. It predicts the next token in a sequence using a lightweight architecture suitable for smaller datasets.

**Architecture**:

- **Embedding**: `Embedding(vocab_size, 500)` maps tokens to 500-dimensional vectors.
- **Recurrent Layer**: `SimpleRNN(50)` processes sequences with 50 recurrent units.
- **Output Layer**: `Dense(vocab_size, activation='softmax')` produces a probability distribution over the vocabulary.
- **Framework**: TensorFlow 2.x / Keras
- **Training Setup**: Compiled with `loss='categorical_crossentropy'`, `optimizer='adam'`, and the `accuracy` metric.

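The layers above can be sketched as a plain NumPy forward pass. This is a minimal illustration only: randomly initialised weights stand in for the trained parameters, and the token ids are arbitrary; the dimensions follow the card (vocab 500, embedding 500, 50 recurrent units).

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embed_dim, rnn_units = 500, 500, 50  # dimensions from the card

# Random weights standing in for the trained parameters
E = rng.normal(0, 0.1, (vocab_size, embed_dim))   # embedding matrix
Wx = rng.normal(0, 0.1, (embed_dim, rnn_units))   # input-to-hidden
Wh = rng.normal(0, 0.1, (rnn_units, rnn_units))   # hidden-to-hidden
b = np.zeros(rnn_units)
Wo = rng.normal(0, 0.1, (rnn_units, vocab_size))  # hidden-to-output
bo = np.zeros(vocab_size)

def next_token_probs(token_ids):
    # Embedding lookup, then the SimpleRNN recurrence:
    # h_t = tanh(x_t Wx + h_{t-1} Wh + b)
    h = np.zeros(rnn_units)
    for t in token_ids:
        h = np.tanh(E[t] @ Wx + h @ Wh + b)
    # Dense softmax head over the vocabulary
    logits = h @ Wo + bo
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

probs = next_token_probs([3, 17, 42])  # arbitrary example token ids
```

The softmax output is what the generation loop below samples from.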
## Uses

- Generating creative and fantasy-style weapon or item names.
- Supporting game-design workflows and ideation.
- Creating randomized content for prototyping or mods.

## Limitations

- May produce implausible or inappropriate names if poorly seeded or prompted.
- Vocabulary is tailored to Terraria-style names and may not generalize to other genres.
- Does not enforce real-world constraints such as lore consistency, cultural appropriateness, or game balance.

## Training Data

- Dataset: All Terraria Weapons DPS v1.449 dataset from Kaggle.
- Source URL: https://www.kaggle.com/datasets/acr1209/all-terraria-weapons-dps-v-1449
- Content: Weapon names and their damage-per-second values, used as creative seed text.
- Size: 395 unique weapon names.
- Preprocessing: Names tokenized into subword units using SentencePiece BPE.

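What BPE training does to the names can be illustrated with a simplified pure-Python sketch. This is not the actual SentencePiece implementation, and the example names are hypothetical stand-ins for the corpus; it only shows the core idea of repeatedly merging the most frequent adjacent symbol pair into a new subword.

```python
from collections import Counter

def most_frequent_pair(words):
    # Count adjacent symbol pairs across all tokenized words
    pairs = Counter()
    for toks in words:
        pairs.update(zip(toks, toks[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(words, pair):
    # Replace every adjacent occurrence of `pair` with one merged symbol
    merged = []
    for toks in words:
        out, i = [], 0
        while i < len(toks):
            if i + 1 < len(toks) and (toks[i], toks[i + 1]) == pair:
                out.append(toks[i] + toks[i + 1])
                i += 2
            else:
                out.append(toks[i])
                i += 1
        merged.append(out)
    return merged

# Start from characters, with SentencePiece-style "▁" word-boundary markers
names = ["Terra Blade", "Night's Edge", "Blade Staff"]  # hypothetical examples
words = [list("▁" + n.replace(" ", "▁")) for n in names]

# Each merge step fuses the current most frequent pair into a new subword
for _ in range(5):
    pair = most_frequent_pair(words)
    words = merge_pair(words, pair)
```

Repeated enough times, merges like these produce the 500-entry subword vocabulary used by the model.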
## Training Procedure

- Tokenizer: SentencePiece BPE trained on the weapon-name corpus (vocab size: 500).
- Batch Size: 128
- Sequence Length: `max_seq_len` (length of the longest tokenized name, `11`)
- Optimizer: Adam
- Loss: categorical_crossentropy
- Metrics: accuracy
- Epochs: 500
- Train/Validation Split: 100% train, 0% validation.

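The setup above implies next-token training examples built from every prefix of each tokenized name, pre-padded to `max_seq_len - 1`, with one-hot targets for `categorical_crossentropy`. A NumPy-only sketch of that construction (the token ids here are hypothetical, not real tokenizer output):

```python
import numpy as np

max_seq_len = 11   # longest tokenized name, per the card
vocab_size = 500

# Hypothetical token-id sequences for two tokenized names
encoded_names = [[12, 7, 301, 45], [88, 3, 120]]

X, y = [], []
for ids in encoded_names:
    # Every prefix of length >= 2 yields one example:
    # the last token is the target, the tokens before it are the input
    for i in range(2, len(ids) + 1):
        X.append(ids[:i - 1])
        y.append(ids[i - 1])

# Pre-pad inputs to max_seq_len - 1, matching pad_sequences(..., padding='pre')
X_pad = np.zeros((len(X), max_seq_len - 1), dtype=int)
for row, ids in zip(X_pad, X):
    row[-len(ids):] = ids

# One-hot targets for categorical_crossentropy
y_onehot = np.zeros((len(y), vocab_size))
y_onehot[np.arange(len(y)), y] = 1.0
```

`X_pad` and `y_onehot` are the shapes the compiled model expects for `model.fit`.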
## Evaluation Results

| Metric | Value |
| ------ | ----- |
| Train Accuracy | 78.6% |
| Validation Accuracy | not used |
| Loss (final) | 0.44 |

## How to Use
```python
import random

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Assumes `model` (the trained Keras model), `sp` (the SentencePiece
# processor), `vocab_size`, and `max_seq_len` are already loaded.

def generate_random_name(min_length=3, max_length=10, temperature=1.0, seed_text=""):
    # Use a random seed
    random.seed()

    if seed_text:
        # If seed text is provided, start from it
        generated_name = seed_text
    else:
        # Otherwise start from a randomly selected token in the vocabulary
        random_index = random.randint(1, vocab_size - 1)
        random_token = sp.id_to_piece(random_index)
        generated_name = random_token

    # Generate subsequent subword tokens
    for _ in range(max_length - 1):
        # Encode the text generated so far
        token_list = sp.encode_as_ids(generated_name)
        token_list = pad_sequences([token_list], maxlen=max_seq_len - 1, padding='pre')

        # Run prediction
        predicted = model.predict(token_list, verbose=0)[0]

        # Apply temperature to the predictions; this helps produce varied results
        predicted = np.log(predicted + 1e-8) / temperature
        predicted = np.exp(predicted) / np.sum(np.exp(predicted))

        # Sample the next token from the distribution
        next_index = int(np.random.choice(range(vocab_size), p=predicted))
        next_token = sp.id_to_piece(next_index)

        # Add the predicted token to the output
        generated_name += next_token

        # Decode the generated subword tokens into a string
        decoded_name = sp.decode_pieces(generated_name.split())

        # Stop if an end token is predicted (optional, based on your dataset),
        # or stop if max_length is reached
        if next_token == '' or len(decoded_name) > max_length:
            break

    # Replace SentencePiece word-boundary markers ("▁") with spaces
    decoded_name = decoded_name.replace("▁", " ")

    # Remove stop tokens from the output
    decoded_name = decoded_name.replace("</s>", "")

    # Capitalize the first letter of the name
    generated_name = decoded_name.rsplit(' ', 1)[0]
    generated_name = generated_name[0].upper() + generated_name[1:]

    # Split the name and drop the last part if it looks cut off
    parts = generated_name.split()
    if parts and len(parts[-1]) < min_length:
        generated_name = " ".join(parts[:-1])

    # Strip the output to ensure no extra whitespace
    return generated_name.strip()
```
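The temperature step in the snippet can be isolated to see its effect: values below 1.0 sharpen the distribution toward the most likely token, while values above 1.0 flatten it for more varied names. A small self-contained sketch:

```python
import numpy as np

def apply_temperature(probs, temperature):
    # Rescale log-probabilities, then renormalise with a softmax
    logits = np.log(probs + 1e-8) / temperature
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

p = np.array([0.7, 0.2, 0.1])
sharp = apply_temperature(p, 0.5)  # < 1.0: favours the top token more strongly
flat = apply_temperature(p, 2.0)   # > 1.0: spreads probability mass out
```

Sampling from `flat` yields more surprising names at the cost of more implausible ones; `sharp` does the opposite.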
## Contact

For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.