---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
base_model:
- openai-community/gpt2
library_name: transformers
datasets:
- CodeferSystem/GPT2-Hacker-password-generator-dataset
tags:
- cybersecurity
- passwords
---

# GPT-2 Hacker password generator

This model generates hacker-style passwords.

# Fine-tuning results

- Number of epochs: 5
- Number of steps: 3125
- Final loss: 0.519600
- Fine-tuning time: about 34:39 (min:sec) on an Nvidia GeForce RTX 4060 8 GB laptop GPU
- Fine-tuned on 20k examples of 128 tokens each

# Using the model

Use this code:

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

model_name = "CodeferSystem/GPT2-Hacker-password-generator"

# Load the fine-tuned GPT-2 model and the standard GPT-2 tokenizer from the Hub
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Generate an answer for a given question
def generate_answer(question):
    # Format the question into the prompt template used during fine-tuning
    prompt = f"Question: {question}\nAnswer:"

    # Encode the prompt into input token IDs suitable for the model
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # Set the model to evaluation mode
    model.eval()

    # Generate without tracking gradients (for efficiency)
    with torch.no_grad():
        output = model.generate(
            input_ids,                           # Input tokens
            max_length=50,                       # Maximum length of the generated text
            num_return_sequences=1,              # Return a single sequence
            no_repeat_ngram_size=2,              # Prevent repeating n-grams
            do_sample=True,                      # Enable sampling (randomized generation)
            top_k=50,                            # Sample only from the 50 most probable tokens
            top_p=0.95,                          # Nucleus sampling (cumulative probability cutoff)
            temperature=2.0,                     # High temperature for more random output
            pad_token_id=tokenizer.eos_token_id  # Use the EOS token for padding
        )

    # Decode the generated token IDs back to a string, stripping special tokens
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    # Keep only the part after "Answer:" to get the generated password
    answer = generated_text.split("Answer:")[-1].strip()
    return answer

# Example usage
question = "generate password."
print(generate_answer(question))  # Print the generated password
```

# Example password generation with this model

### If you write a prompt like "Generate a hacker password.", the password will look something like this (5 examples):

- 0Qk=4CdPQQv0>n1K
- o4K*mQq9>Zu
- e5vx=KqE_j>kFj&*
- xD2PZ5@kz_hFq|W=
- h=rZ?^
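As a rough quality check (an addition, not part of the original card), the strength of passwords like those above can be estimated from their length and the character classes they use. This is a minimal sketch using a common charset-size heuristic, not a rigorous entropy measurement:

```python
import math
import string

def estimate_entropy_bits(password: str) -> float:
    """Rough estimate: length * log2(charset size), where the charset size
    is the sum of the sizes of the character classes actually used."""
    charset = 0
    if any(c in string.ascii_lowercase for c in password):
        charset += 26
    if any(c in string.ascii_uppercase for c in password):
        charset += 26
    if any(c in string.digits for c in password):
        charset += 10
    if any(c in string.punctuation for c in password):
        charset += len(string.punctuation)  # 32 printable ASCII symbols
    return len(password) * math.log2(charset) if charset else 0.0

# The 16-character examples above mix all four classes (charset ~94),
# giving roughly 100+ bits by this heuristic
for pw in ["0Qk=4CdPQQv0>n1K", "o4K*mQq9>Zu", "e5vx=KqE_j>kFj&*"]:
    print(f"{pw}: ~{estimate_entropy_bits(pw):.0f} bits")
```

Note that this heuristic assumes uniformly random characters; a language model's outputs are not uniform, so the true entropy is somewhat lower.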
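For comparison (also an addition, not from the original card), passwords of a similar shape can be produced without any model using Python's standard `secrets` module. A minimal non-ML baseline sketch:

```python
import secrets
import string

# Character set resembling the example outputs: letters, digits, and symbols
CHARSET = string.ascii_letters + string.digits + string.punctuation

def random_password(length: int = 16) -> str:
    """Cryptographically secure random password of the given length."""
    return "".join(secrets.choice(CHARSET) for _ in range(length))

# 16-character password drawn uniformly from 94 printable ASCII characters
print(random_password())
```

Unlike the fine-tuned model, this baseline draws characters uniformly at random, which is the preferable choice when passwords are needed for real use rather than for demonstrating the model.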