---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---
# GPT-2 Hacker password generator
This model generates "hacker-style" passwords.

# Fine-tuning results

- Number of epochs: 5
- Number of steps: 3125
- Final loss: 0.5196
- Fine-tuning time: about 34:39 (mm:ss) on an Nvidia GeForce RTX 4060 8 GB laptop GPU
- Fine-tuned on 3674.21 examples of 128 tokens each

# Using the model
Use this code:

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

model_name = "Codefer/GPT2-Hacker-password-generator"

# Load the fine-tuned GPT-2 model and its tokenizer from the Hub
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Generate an answer for a given question
def generate_answer(question):
    # Format the question as the prompt the model was fine-tuned on
    prompt = f"Question: {question}\nAnswer:"

    # Encode the prompt into input token IDs
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # Set the model to evaluation mode
    model.eval()

    # Generate without gradient tracking (for efficiency)
    with torch.no_grad():
        output = model.generate(
            input_ids,                            # Input tokens
            max_length=50,                        # Maximum length of the generated sequence
            num_return_sequences=1,               # Return a single sequence
            no_repeat_ngram_size=2,               # Prevent repeated 2-grams
            do_sample=True,                       # Enable sampling (randomized generation)
            top_k=50,                             # Limit choices to the 50 most probable tokens
            top_p=0.95,                           # Nucleus (cumulative-probability) sampling
            temperature=2.0,                      # High temperature for more random output
            pad_token_id=tokenizer.eos_token_id,  # Use the EOS token for padding
        )

    # Decode the generated token IDs back to a string, stripping special tokens
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    # Keep only the part after "Answer:", which is the generated password
    answer = generated_text.split("Answer:")[-1].strip()

    return answer

# Example usage
question = "generate password."
print(generate_answer(question))  # Print the generated password
```
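The prompt and extraction convention used above can be checked in isolation. This is a minimal sketch with a made-up decoded string (no model download needed); the split mirrors the last step of `generate_answer`:

```python
# Hypothetical decoded output in the "Question: ...\nAnswer: ..." format
# the model is prompted with (the password shown here is made up).
generated_text = "Question: generate password.\nAnswer: 0Qk=4CdPQQv0>n1K"

# Same extraction step as in generate_answer: keep only the text after "Answer:"
answer = generated_text.split("Answer:")[-1].strip()
print(answer)  # -> 0Qk=4CdPQQv0>n1K
```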
# Example password generation with this model

### With a prompt like "Generate a hacker password.", the output will be something like this (5 examples):
- 0Qk=4CdPQQv0>n1K
- o4K*mQq9>Zu
- e5vx=KqE_j>kFj&*
- xD2PZ5@kz_hFq|W=
- h=rZ?^<Qp~7&z7XZ
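
Because generation is sampled, not every candidate will be equally usable. A hypothetical post-processing filter (the `is_strong` helper and its thresholds are illustrative, not part of the model) could keep only candidates with enough length and character variety:

```python
import string

def is_strong(password, min_length=12):
    """Hypothetical strength check: minimum length plus at least three character classes."""
    classes = [
        any(c.islower() for c in password),             # lowercase letters
        any(c.isupper() for c in password),             # uppercase letters
        any(c.isdigit() for c in password),             # digits
        any(c in string.punctuation for c in password)  # symbols
    ]
    return len(password) >= min_length and sum(classes) >= 3

# The five sample outputs listed above
samples = [
    "0Qk=4CdPQQv0>n1K",
    "o4K*mQq9>Zu",
    "e5vx=KqE_j>kFj&*",
    "xD2PZ5@kz_hFq|W=",
    "h=rZ?^<Qp~7&z7XZ",
]

# Keep only the candidates that pass the check ("o4K*mQq9>Zu" is only 11 characters)
strong = [p for p in samples if is_strong(p)]
print(strong)
```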