WHATEVER420 committed on
Commit 3c9a2fa · 1 Parent(s): f348b4a

:pencil: keep updating README

Files changed (1): README.md +49 -87

README.md CHANGED
@@ -2,7 +2,12 @@
  library_name: transformers
  license: mit
  datasets:
- - WHATEVER420/script-kiddy-instruction-manual
  language:
  - en
  base_model:
@@ -17,96 +22,68 @@ Made with love by [whatever](https://github.com/whatever)
  <img src="https://cdn-uploads.huggingface.co/production/uploads/63f2955bf4e30ffd2bd607ae/7khK7ajTppA0yWcgntk5l.png" width="300" />


- # What?

- `script-kiddy` is a model trained on tool-usage, bash-script-writing, python-coding, and kali-linux tools. Its intent is to be an educational example of small model that can assist in light pen-testing.


- ## Model Details

- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->

- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

- ### Model Sources [optional]

- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses

- This software is provided strictly for educational and research purposes only. It is intended to help users learn, experiment, and study relevant concepts. The authors and contributors do not endorse or condone any misuse of this software. Use of this software for malicious, unlawful, or unauthorized activities is strictly prohibited, and users assume full responsibility for compliance with all applicable laws and regulations.
-
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]

- ### Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data

- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

- ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- #### Preprocessing [optional]

- [More Information Needed]


  #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

  #### Speeds, Sizes, Times [optional]

@@ -160,30 +137,15 @@ Use the code below to get started with the model.
  - **Compute Region:** KS-2
  - **Carbon Emitted:** ~0.08 kg

- ### Model Architecture and Objective
-
- [More Information Needed]

  ### Compute Infrastructure

- - Trained for 45 minutes on a single A100

  #### Hardware

- [More Information Needed]

  #### Software

- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
 
  library_name: transformers
  license: mit
  datasets:
+ - pool-water/script-kiddie-instruction-manual
+ - aelhalili/bash-commands-dataset
+ - NousResearch/hermes-function-calling-v1
+ - protectai/prompt-injection-validation
+ - allenai/tulu-3-sft-personas-code
+ - darkknight25/KALI_LINUX_TOOLSET_DATASET
  language:
  - en
  base_model:
 
  <img src="https://cdn-uploads.huggingface.co/production/uploads/63f2955bf4e30ffd2bd607ae/7khK7ajTppA0yWcgntk5l.png" width="300" />


+ # What is `script-kiddie`?

+ `script-kiddie` is a model trained on tool-usage, bash-script-writing, python-coding, and kali-linux tools. It is intended to be an educational example of a small model that can assist in light pen-testing.


+ ## Chat Template

+ We are using Qwen's format for conversations and function calling. Here's an example:

+ ```
+ >>> print(tokenizer.apply_chat_template(ds["train"][7500]["messages"], tokenize=False))
+ <|im_start|>system
+ You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'get_sunrise_sunset_time', 'description': 'Get the sunrise and sunset times for a specific location', 'parameters': {'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'The city and state, e.g. San Francisco, CA'}, 'date': {'type': 'string', 'description': "The desired date in format 'YYYY-MM-DD'"}}, 'required': ['location', 'date']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'location1': {'type': 'string', 'description': 'The first location'}, 'location2': {'type': 'string', 'description': 'The second location'}}, 'required': ['location1', 'location2']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
+ <tool_call>
+ {tool_call}
+ </tool_call><|im_end|>
+ <|im_start|>user
+ Hi, I am planning a trip to New York City on 2022-12-25. Can you tell me the sunrise and sunset times for that day?<|im_end|>
+ <|im_start|>assistant
+ <tool_call>
+ {'name': 'get_sunrise_sunset_time', 'arguments': {'location': 'New York City', 'date': '2022-12-25'}}
+ </tool_call><|im_end|>
+ <|im_start|>user
+ <tool_response>
+ <tool_response>
+ {'sunrise': '07:16 AM', 'sunset': '04:31 PM'}
+ </tool_response>
+ </tool_response><|im_end|>
+ <|im_start|>assistant
+ <think>

+ </think>

+ On December 25, 2022, in New York City, the sun will rise at 07:16 AM and set at 04:31 PM.<|im_end|>
+ ```
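For plain conversation turns, the rendered string above can be approximated by hand. The sketch below assembles the `<|im_start|>`/`<|im_end|>` framing from a `{role, content}` message list; `render_qwen` is a hypothetical helper written for illustration, not part of the tokenizer, and the real chat template additionally injects the `<tools>` system block for function calling.

```python
def render_qwen(messages):
    """Render a list of {role, content} dicts into Qwen's chat format.

    Each turn becomes:  <|im_start|>{role}\n{content}<|im_end|>
    """
    turns = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # Turns are newline-separated; a trailing newline ends the last turn.
    return "\n".join(turns) + "\n"

messages = [
    {"role": "system", "content": "You are a function calling AI model."},
    {"role": "user", "content": "What time is sunrise in NYC on 2022-12-25?"},
]
prompt = render_qwen(messages)
print(prompt)
```

In practice, call `tokenizer.apply_chat_template(..., tokenize=False)` as shown above rather than rendering by hand, since the template handles the tools block and generation prompt for you.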

+ ## Usage
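When the model decides to call a tool, it emits the call between `<tool_call>` tags as a Python-style dict (see the Chat Template example). A minimal sketch of recovering that call on the application side; `parse_tool_call` is a hypothetical helper, not part of this repo. Because the payload uses single quotes, `ast.literal_eval` is used instead of `json.loads`.

```python
import ast
import re

def parse_tool_call(text):
    """Return the first <tool_call> payload as a dict, or None if absent."""
    m = re.search(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL)
    if m is None:
        return None
    # The model emits a Python-dict-style literal, not strict JSON.
    return ast.literal_eval(m.group(1))

# A generation shaped like the assistant turn in the example above:
generation = (
    "<tool_call>\n"
    "{'name': 'get_sunrise_sunset_time', "
    "'arguments': {'location': 'New York City', 'date': '2022-12-25'}}\n"
    "</tool_call><|im_end|>"
)
call = parse_tool_call(generation)
print(call["name"])       # the function the model wants to run
print(call["arguments"])  # the arguments to pass it
```

The model itself loads with the standard `transformers` text-generation APIs (it is a `transformers` model fine-tuned from Qwen/Qwen3-0.6B); this helper only handles the output side.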

+ ### Model Description

+ <!-- Provide a longer summary of what this model is. -->

+ This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

+ - **Developed by:** [@whatever](https://github.com/whatever)
+ - **Model type:** text-generation
+ - **Language(s) (NLP):** en
+ - **License:** ???
+ - **Finetuned from model [optional]:** Qwen/Qwen3-0.6B


+ ## Uses

+ This software is provided strictly for educational and research purposes only. It is intended to help users learn, experiment, and study relevant concepts. The authors and contributors do not endorse or condone any misuse of this software. Use of this software for malicious, unlawful, or unauthorized activities is strictly prohibited, and users assume full responsibility for compliance with all applicable laws and regulations.


  #### Training Hyperparameters

+ - **Training regime:** fp32

  #### Speeds, Sizes, Times [optional]
  - **Compute Region:** KS-2
  - **Carbon Emitted:** ~0.08 kg

  ### Compute Infrastructure

+ - Trained for 45 minutes on a single A100 on RunPod

  #### Hardware

+ A100

  #### Software

+ HuggingFace SFT