karanxa committed · verified · Commit f40e9b9 · 1 Parent(s): 9c31c85

Update README.md

Files changed (1): README.md +32 -121
README.md CHANGED
@@ -1,130 +1,41 @@
  ---
  language:
- - en
  tags:
- - mistral-7b
- - fine-tuned
- - security research
- - adversarial prompt detection
  pipeline_tag: text-generation
- license: apache-2.0
  ---
 
- # Dravik - LLM Red Teaming Assistant
-
- ## Overview
- Dravik is a specialized LLM designed for security researchers conducting red team testing of language models. It helps identify potential vulnerabilities in LLM safety mechanisms through adversarial prompt analysis and generation.

  ## Model Description
-
- This model is a fine-tuned version of Mistral-7B, optimized for security research and adversarial prompt analysis. It can help identify and analyze potential security vulnerabilities in language models.
-
- ### Architecture & Training
- - Base Model: Mistral-7B
- - Training Type: Full fine-tuning with LoRA
- - Training Data: Curated dataset focused on adversarial prompts and security analysis
- - Quantization: 4-bit precision
-
- ## Intended Use & Limitations
- - Research purposes only
- - Analyze and understand adversarial patterns
- - Evaluate language model safety mechanisms
- - Not intended for production deployment
-
- ## Training Details
- - Training framework: 🤗 Transformers
- - Training date: 2025-03-02
- - LoRA configuration:
-   - Rank: 16
-   - Alpha: 64
-   - Target modules: All attention layers
-
- ## Performance & Limitations
- - Uses 4-bit quantization
- - Requires approximately 8GB GPU memory
- - Optimized for research scenarios
-
- ## Usage Guide
-
- ### Basic Usage
  ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- # Initialize model
- model = AutoModelForCausalLM.from_pretrained("karanxa/Dravik")
- tokenizer = AutoTokenizer.from_pretrained("karanxa/Dravik")
-
- # System prompt for research context
- system_prompt = """You are Dravik, an advanced AI model specifically trained for LLM red teaming and security research. Your primary function is to assist security researchers in identifying and analyzing potential vulnerabilities in language model safety mechanisms.
-
- Core Functions:
- 1. Adversarial Prompt Analysis
- 2. Security Research Support
- 3. Safety Testing Capabilities
-
- Operational Guidelines:
- - Maintain research-oriented perspective
- - Provide structured analysis
- - Focus on improving AI safety
-
- Remember: You are a research tool designed to help improve AI safety."""
-
- def analyze_vulnerability(prompt, context="security research"):
-     full_prompt = f"""{system_prompt}
-
- Research Context: {context}
- Analysis Request: {prompt}
-
- Structured Analysis:"""
-
-     inputs = tokenizer(full_prompt, return_tensors="pt").to(model.device)
-     outputs = model.generate(
-         **inputs,
-         max_length=1024,
-         temperature=0.7,
-         top_p=0.9,
-         do_sample=True
-     )
-     return tokenizer.decode(outputs[0], skip_special_tokens=True)
-
- # Example Usage
- analysis = analyze_vulnerability(
-     "Evaluate potential vulnerabilities in how language models handle role-playing scenarios",
-     context="LLM safety mechanism research"
- )
- print(analysis)
- ```
-
- ### Example Outputs
- ```plaintext
- Research Context: LLM safety mechanism research
- Analysis Request: Evaluate potential vulnerabilities in role-playing scenarios
-
- Structured Analysis:
- 1. Vulnerability Pattern:
-    - Role confusion in context switching
-    - Boundary testing through character immersion
-    - Safety override through narrative manipulation
-
- 2. Technical Assessment:
-    [detailed analysis follows...]
- ```
-
- ### Research Guidelines
- 1. Always use findings responsibly
- 2. Document and report vulnerabilities properly
- 3. Focus on improving AI safety
- 4. Share insights with the research community
-
- ## Best Practices
- - Start with clear research objectives
- - Document all testing scenarios
- - Follow responsible disclosure practices
- - Use findings to improve safety measures
-
- ## Ethical Considerations
- This tool is strictly for research purposes. Users must:
- - Maintain research integrity
- - Follow ethical guidelines
- - Use findings constructively
- - Contribute to AI safety
 
  ---
  language:
+ - en
  tags:
+ - mistral-7b
+ - security-testing
+ - llm-safety
+ - adversarial-prompts
  pipeline_tag: text-generation
  ---

+ # Dravik - LLM Safety Testing Framework

  ## Model Description
+ Dravik is a fine-tuned version of Mistral-7B specialized for generating adversarial prompts to test LLM safety systems. It helps security researchers systematically evaluate content filtering mechanisms and safety boundaries.
+
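+ A minimal generation sketch using the 🤗 Transformers pipeline (assuming the repository ships full merged weights; the prompt and sampling settings below are illustrative):
+
+ ```python
+ from transformers import pipeline
+
+ # Quick-start text generation; adjust sampling parameters to taste.
+ generator = pipeline("text-generation", model="karanxa/Dravik")
+ result = generator(
+     "Analyze how role-playing framing can probe a model's safety boundaries.",
+     max_new_tokens=256,
+     do_sample=True,
+     temperature=0.7,
+ )
+ print(result[0]["generated_text"])
+ ```
+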
+ ## Technical Specifications
+ - **Base Model**: Mistral-7B
+ - **Training**: LoRA fine-tuning with 4-bit quantization
+ - **Hardware Requirements**:
+   - GPU: 6GB VRAM minimum
+   - RAM: 16GB minimum
+   - CPU: Multi-core processor
+
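+ A loading sketch consistent with the 4-bit figures above, using bitsandbytes NF4 quantization (an assumption; the card does not state the exact loading path):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ # 4-bit NF4 quantization keeps the 7B weights within roughly 6GB of VRAM.
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.float16,
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "karanxa/Dravik",
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained("karanxa/Dravik")
+ ```
+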
+ ## Intended Use
+ This model is strictly for:
+ - Security research testing of LLM safety mechanisms
+ - Systematic evaluation of content filters
+ - Adversarial prompt testing
+ - Safety boundary assessment
+
+ ## Training Configuration
  ```python
+ lora_config = {
+     "r": 16,
+     "lora_alpha": 64,
+     "target_modules": [
+         "q_proj", "k_proj", "v_proj", "o_proj",
+         "gate_proj", "up_proj", "down_proj"
+     ]
+ }
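+
+ # The same settings expressed as a 🤗 PEFT LoraConfig -- a sketch only;
+ # lora_dropout and task_type are illustrative assumptions, not values
+ # stated in this card.
+ from peft import LoraConfig
+
+ peft_config = LoraConfig(
+     r=lora_config["r"],
+     lora_alpha=lora_config["lora_alpha"],
+     target_modules=lora_config["target_modules"],
+     lora_dropout=0.05,  # assumption
+     task_type="CAUSAL_LM",
+ )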