---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- text-generation
- name-generation
library_name: keras
---

# Model Card for Infinitode/TWNGM-OPEN-ARC

Repository: https://github.com/Infinitode/OPEN-ARC/

## Model Description

OPEN-ARC-TWNG is a simple recurrent neural network (RNN) language model developed as part of Infinitode's OPEN-ARC initiative. It predicts the next token in a sequence using a lightweight architecture suitable for smaller datasets.

**Architecture**:

- **Embedding**: `Embedding(vocab_size, 500)` maps tokens to 500-dimensional vectors.
- **Recurrent Layer**: `SimpleRNN(50)` processes sequences with 50 recurrent units.
- **Output Layer**: `Dense(vocab_size, activation='softmax')` produces a probability distribution over the vocabulary.
- **Framework**: TensorFlow 2.x / Keras
- **Training Setup**: Compiled with `loss='categorical_crossentropy'`, `optimizer='adam'`, and the `accuracy` metric.
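The layer stack above can be sketched in Keras as follows. This is an illustrative reconstruction, not the exact training script; the vocabulary size of 500 matches the tokenizer described under Training Procedure:

```python
import numpy as np
from tensorflow import keras

vocab_size = 500  # matches the SentencePiece vocab size used for this model

model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 500),               # token -> 500-dim vector
    keras.layers.SimpleRNN(50),                            # 50 recurrent units
    keras.layers.Dense(vocab_size, activation="softmax"),  # next-token distribution
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# A dummy batch shows the output shape: (batch, vocab_size)
probs = model.predict(np.zeros((1, 10), dtype="int32"), verbose=0)
```

Each prediction is a full probability distribution over the vocabulary, which is what the sampling loop in "How to Use" draws from.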

## Uses

- Generating creative and fantasy-style weapon or item names.
- Supporting game design workflows and ideation.
- Creating randomized content for prototyping or mods.

## Limitations

- May produce implausible or inappropriate names if poorly seeded or prompted.
- Vocabulary is tailored to Terraria-style names; may not generalize to other genres.
- Does not enforce real-world constraints like lore consistency, cultural appropriateness, or game balance.

## Training Data

- Dataset: All Terraria Weapons DPS v1.449 dataset from Kaggle.
- Source URL: https://www.kaggle.com/datasets/acr1209/all-terraria-weapons-dps-v-1449
- Content: Weapon names and their damage-per-second values, used as creative seed text.
- Size: 395 unique weapon names.
- Preprocessing: Names tokenized into subword units using SentencePiece BPE.

## Training Procedure

- Tokenizer: SentencePiece BPE trained on the weapon-name corpus (vocab size: 500).
- Batch Size: 128
- Sequence Length: `max_seq_len` (based on the longest tokenized name, `11`)
- Optimizer: Adam
- Loss: categorical_crossentropy
- Metrics: accuracy
- Epochs: 500
- Train/Validation Split: 100% train, 0% validation.

## Evaluation Results

| Metric | Value |
| ------ | ----- |
| Train Accuracy | 78.6% |
| Validation Accuracy | n/a (no validation split) |
| Loss (final) | 0.44 |

## How to Use

```python
import random

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Assumes the following are already loaded:
#   model       - the trained Keras model
#   sp          - the SentencePiece processor
#   vocab_size  - tokenizer vocabulary size (500)
#   max_seq_len - longest tokenized name (11)

def generate_random_name(min_length=3, max_length=10, temperature=1.0, seed_text=""):
    # Re-seed the RNG from the system entropy source
    random.seed()

    if seed_text:
        # If seed text is provided, start from it
        generated_name = seed_text
    else:
        # Otherwise randomly select a vocabulary token as the starting token
        random_index = random.randint(1, vocab_size - 1)
        generated_name = sp.id_to_piece(random_index)

    decoded_name = generated_name

    # Generate subsequent subword tokens
    for _ in range(max_length - 1):
        # Encode the text generated so far
        token_list = sp.encode_as_ids(generated_name)
        token_list = pad_sequences([token_list], maxlen=max_seq_len - 1, padding='pre')

        # Predict the next-token distribution
        predicted = model.predict(token_list, verbose=0)[0]

        # Apply temperature to the predictions for more varied results
        predicted = np.log(predicted + 1e-8) / temperature
        predicted = np.exp(predicted) / np.sum(np.exp(predicted))

        # Sample from the distribution
        next_index = int(np.random.choice(range(vocab_size), p=predicted))
        next_token = sp.id_to_piece(next_index)

        # Add the sampled token to the output
        generated_name += next_token

        # Decode the generated subword tokens into a string
        decoded_name = sp.decode_pieces(generated_name.split())

        # Stop if the end token is predicted, or if max_length is reached
        if next_token == '</s>' or len(decoded_name) > max_length:
            break

    # Replace SentencePiece word markers with spaces
    decoded_name = decoded_name.replace("▁", " ")

    # Remove stop tokens from the output
    decoded_name = decoded_name.replace("</s>", "")

    # Drop the trailing (possibly cut-off) fragment and capitalize the first letter
    generated_name = decoded_name.rsplit(' ', 1)[0]
    if generated_name:
        generated_name = generated_name[0].upper() + generated_name[1:]

    # Drop the last word if it is shorter than min_length (likely truncated)
    parts = generated_name.split()
    if parts and len(parts[-1]) < min_length:
        generated_name = " ".join(parts[:-1])

    # Strip the output to ensure no extra whitespace
    return generated_name.strip()
```
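The temperature step inside the loop can be illustrated in isolation. `apply_temperature` below is a hypothetical helper, not part of the model's code; it shows the same log/exp math on a toy distribution:

```python
import numpy as np

def apply_temperature(probs, temperature):
    # Log-scale, divide by temperature, then re-normalize
    # (the same math as in the generation loop)
    logits = np.log(probs + 1e-8) / temperature
    scaled = np.exp(logits)
    return scaled / scaled.sum()

probs = np.array([0.7, 0.2, 0.1])
low = apply_temperature(probs, 0.5)   # sharper: favors the top token more
high = apply_temperature(probs, 2.0)  # flatter: more varied sampling
```

Temperatures below 1.0 make names more predictable; values above 1.0 make them more surprising, at the cost of occasional gibberish.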

## Contact

For questions or issues, open a GitHub issue or reach out via https://infinitode.netlify.app/forms/contact.