Canstralian committed on
Commit
f64de27
·
verified ·
1 Parent(s): 78d1b0d

Update README.md

Files changed (1)
  1. README.md +89 -187
README.md CHANGED
@@ -1,240 +1,142 @@
1
  ---
2
- # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
3
- # Doc / guide: https://huggingface.co/docs/hub/model-cards
4
- {}
5
  ---
 
6
 
7
- # Model Card for Model ID
8
 
9
- <!-- Provide a quick summary of what the model is/does. -->
10
 
11
- This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
12
 
13
  ## Model Details
14
 
15
- ### Model Description
16
-
17
- 🐇 RabbitRedux Model Card
18
- License: Apache 2.0
19
- Base Model: replit/replit-code-v1_5-3b
20
- Languages: English
21
- Library: Adapter Transformers
22
-
23
- 📝 Model Overview
24
- The RabbitRedux model builds on replit/replit-code-v1_5-3b to classify and understand code snippets, particularly useful for cybersecurity contexts. The model is tailored for code functions across general and cybersecurity-related contexts, enabling efficient categorization and analysis.
25
-
26
- Key Features
27
- Penetration Testing Support: Tools and classification models that aid reconnaissance, enumeration, and automation in penetration testing.
28
- Ransomware Analysis: Data collection and visualization support for tracking ransomware trends.
29
- Adaptive Learning: Leverages adapter transformers for modular, targeted training across different contexts without extensive retraining.
30
- 📊 Datasets
31
- The RabbitRedux model utilizes curated datasets that enhance its contextual understanding in code and cybersecurity:
32
-
33
- WhiteRabbitNeo/WRN-Chapter-1 & Chapter-2: Core datasets for code functions across diverse categories.
34
- Code-Functions-Level-General and Code-Functions-Level-Cyber: Specialized datasets focusing on broad programming concepts and cybersecurity functions.
35
- Replit/agent-challenge: Challenge dataset for handling complex code scenarios.
36
- Canstralian/Wordlists: Supplementary dataset for wordlist analysis in cybersecurity applications.
37
- 🚀 Quick Start
38
- Model Usage: Start with AutoAdapterModel to load and activate the "RabbitRedux" adapter:
39
-
40
- python
41
- Copy code
42
- from adapters import AutoAdapterModel
43
- model = AutoAdapterModel.from_pretrained("replit/replit-code-v1_5-3b")
44
- model.load_adapter("Canstralian/RabbitRedux", set_active=True)
45
- Inference: Ideal for code function classification, especially in cybersecurity contexts.
46
-
47
- 💻 Contribution & Community
48
- RabbitRedux is open-source, and contributions are encouraged. Here’s how you can join:
49
-
50
- Fork and modify the repositories
51
- Raise Issues for bugs or suggestions
52
- Collaborate on new tools and ideas
53
- GitHub: Canstralian
54
- Replit: Canstralian
55
-
56
- About Me: Canstralian
57
- With over 20 years in IT, I’m passionate about code, cybersecurity, and open-source contributions. From penetration testing tools to executive function support for ADHD, my projects reflect a commitment to creating practical, impactful solutions.
58
-
59
 
 
60
 
61
- - **Developed by:** [More Information Needed]
62
- - **Funded by [optional]:** [More Information Needed]
63
- - **Shared by [optional]:** [More Information Needed]
64
- - **Model type:** [More Information Needed]
65
- - **Language(s) (NLP):** [More Information Needed]
66
- - **License:** [More Information Needed]
67
- - **Finetuned from model [optional]:** [More Information Needed]
68
 
69
- ### Model Sources [optional]
70
 
71
- <!-- Provide the basic links for the model. -->
 
 
72
 
73
- - **Repository:** [More Information Needed]
74
- - **Paper [optional]:** [More Information Needed]
75
- - **Demo [optional]:** [More Information Needed]
76
 
77
- ## Uses
78
 
79
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
 
 
80
 
81
- ### Direct Use
82
 
83
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
84
 
85
- [More Information Needed]
86
-
87
- ### Downstream Use [optional]
88
-
89
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
90
-
91
- [More Information Needed]
92
-
93
- ### Out-of-Scope Use
94
-
95
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
96
-
97
- [More Information Needed]
98
-
99
- ## Bias, Risks, and Limitations
100
-
101
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
102
-
103
- [More Information Needed]
104
 
105
- ### Recommendations
106
 
107
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
108
 
109
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
110
 
111
- ## How to Get Started with the Model
 
112
 
113
- Use the code below to get started with the model.
114
 
115
- [More Information Needed]
116
 
117
  ## Training Details
118
 
119
  ### Training Data
120
 
121
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
122
-
123
- [More Information Needed]
124
-
125
- ### Training Procedure
126
 
127
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
128
 
129
- #### Preprocessing [optional]
130
 
131
- [More Information Needed]
132
-
133
-
134
- #### Training Hyperparameters
135
-
136
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
137
-
138
- #### Speeds, Sizes, Times [optional]
139
-
140
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
141
-
142
- [More Information Needed]
143
 
144
  ## Evaluation
145
 
146
- <!-- This section describes the evaluation protocols and provides the results. -->
147
-
148
- ### Testing Data, Factors & Metrics
149
 
150
- #### Testing Data
151
-
152
- <!-- This should link to a Dataset Card if possible. -->
153
-
154
- [More Information Needed]
155
-
156
- #### Factors
157
-
158
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
159
-
160
- [More Information Needed]
161
-
162
- #### Metrics
163
-
164
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
165
-
166
- [More Information Needed]
167
 
168
  ### Results
169
 
170
- [More Information Needed]
171
-
172
- #### Summary
173
-
174
 
 
175
 
176
- ## Model Examination [optional]
177
 
178
- <!-- Relevant interpretability work for the model goes here -->
179
 
180
- [More Information Needed]
181
 
182
  ## Environmental Impact
183
 
184
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
185
-
186
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
187
-
188
- - **Hardware Type:** [More Information Needed]
189
- - **Hours used:** [More Information Needed]
190
- - **Cloud Provider:** [More Information Needed]
191
- - **Compute Region:** [More Information Needed]
192
- - **Carbon Emitted:** [More Information Needed]
193
-
194
- ## Technical Specifications [optional]
195
-
196
- ### Model Architecture and Objective
197
-
198
- [More Information Needed]
199
-
200
- ### Compute Infrastructure
201
-
202
- [More Information Needed]
203
-
204
- #### Hardware
205
-
206
- [More Information Needed]
207
-
208
- #### Software
209
-
210
- [More Information Needed]
211
-
212
- ## Citation [optional]
213
-
214
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
215
-
216
- **BibTeX:**
217
-
218
- [More Information Needed]
219
-
220
- **APA:**
221
-
222
- [More Information Needed]
223
-
224
- ## Glossary [optional]
225
-
226
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
227
 
228
- [More Information Needed]
 
 
229
 
230
- ## More Information [optional]
231
 
232
- [More Information Needed]
233
 
234
- ## Model Card Authors [optional]
235
 
236
- [More Information Needed]
 
237
 
238
- ## Model Card Contact
239
 
240
- [More Information Needed]
 
1
  ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - Canstralian/Wordlists
5
+ - Canstralian/CyberExploitDB
6
+ - Canstralian/pentesting_dataset
7
+ language:
8
+ - en
9
+ metrics:
10
+ - accuracy
11
+ - code_eval
12
+ - bertscore
13
+ base_model:
14
+ - replit/replit-code-v1_5-3b
15
+ - WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-8B
16
+ - WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-70B
17
+ library_name: adapter-transformers
18
+ tags:
19
+ - code
20
+ - text-generation-inference
21
  ---
22
+ Here's the completed version of the RabbitRedux model card, filled out from the perspective of **Canstralian**:
23
 
24
+ ---
25
 
26
+ # Model Card for RabbitRedux
27
 
28
+ RabbitRedux is a code classification model for cybersecurity applications, built on the `replit/replit-code-v1_5-3b` model. It categorizes and analyzes code snippets, with an emphasis on functions from both general-purpose and cybersecurity-specific code.
29
 
30
  ## Model Details
31
 
32
+ ### Overview
33
 
34
+ **RabbitRedux** expands upon the `replit/replit-code-v1_5-3b` model to provide specialized support in areas such as penetration testing and ransomware analysis. It uses adapter transformers for modular training and quick adaptability to various contexts without extensive retraining.
35
 
36
+ - **Developer:** [Canstralian](https://github.com/canstralian)
37
+ - **Model Type:** Adapter-enhanced code classification
38
+ - **Language(s):** English
39
+ - **License:** Apache 2.0
40
+ - **Base Model:** `replit/replit-code-v1_5-3b`
41
+ - **Library:** Adapter Transformers
 
42
 
43
+ ## Key Features
44
 
45
+ - **Penetration Testing Support:** Assists with reconnaissance, enumeration, and task automation in cybersecurity.
46
+ - **Ransomware Analysis:** Supports tracking and analyzing ransomware trends for cybersecurity insights.
47
+ - **Adaptive Learning:** Employs adapter transformers to optimize training across different domains efficiently.
48
 
49
+ ## Dataset Summary
 
 
50
 
51
+ RabbitRedux leverages datasets specifically curated for code classification, focusing on both general programming functions and cybersecurity applications:
52
 
53
+ - **WhiteRabbitNeo/WRN-Chapter-1 & Chapter-2**: Datasets targeting diverse code functions.
54
+ - **Code-Functions-Level-General** and **Code-Functions-Level-Cyber**: Broader datasets for programming concepts and cybersecurity functions.
55
+ - **Replit/agent-challenge**: Challenge dataset for handling complex code scenarios.
56
+ - **Canstralian/Wordlists**: Supplementary wordlist data for cybersecurity.
57
 
58
+ ## Model Usage
59
 
60
+ To use RabbitRedux, initialize and load the adapter with the following code:
61
 
62
+ ```python
63
+ from adapters import AutoAdapterModel
64
+ model = AutoAdapterModel.from_pretrained("replit/replit-code-v1_5-3b")
65
+ model.load_adapter("Canstralian/RabbitRedux", set_active=True)
66
+ ```
67
 
68
+ This model is ideal for classifying code functions, especially in cybersecurity contexts.
69
 
70
+ ## Community & Contributions
71
 
72
+ RabbitRedux is an open-source project, encouraging contributions and collaboration. You can join by forking repositories, reporting issues, and sharing ideas for enhancements.
73
 
74
+ - **GitHub:** [Canstralian](https://github.com/canstralian)
75
+ - **Replit:** [Canstralian](https://replit.com/@canstralian)
76
 
77
+ ## About the Author
78
 
79
+ With over 20 years of experience in IT, I specialize in developing practical tools for cybersecurity and open-source projects, including tools for penetration testing and ADHD support through executive function augmentation.
80
 
81
  ## Training Details
82
 
83
  ### Training Data
84
 
85
+ RabbitRedux is trained on the following datasets to support a wide array of code categorization tasks, with an emphasis on cybersecurity:
 
 
 
 
86
 
87
+ - **Core Data Sources:** WhiteRabbitNeo and Canstralian Wordlists for broad programming and security-related functions.
88
+ - **Supplemental Datasets:** Code-Functions-General and Code-Functions-Cyber for deeper contextual understanding.
89
 
90
+ ### Hyperparameters
91
 
92
+ - **Training Regime:** fp16 mixed precision
93
+ - **Precision:** fp16
94
 
95
  ## Evaluation
96
 
97
+ ### Metrics & Testing
 
 
98
 
99
+ The model's performance is assessed using precision, recall, and F1 scores on code classification tasks. Further evaluation data is available upon request.
100
 
101
  ### Results
102
 
103
+ - **Precision:** 0.95
104
+ - **Recall:** 0.92
105
+ - **F1 Score:** 0.93
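
As a quick consistency check on the three figures above: F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two values. The sketch below is a minimal illustration; the helper name `f1_from_precision_recall` is hypothetical and not part of the RabbitRedux code.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # conventional value when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the Results list.
recomputed_f1 = f1_from_precision_recall(0.95, 0.92)
print(round(recomputed_f1, 2))  # prints 0.93
```

The recomputed value rounds to the reported F1 of 0.93, so the three numbers are mutually consistent.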
 
106
 
107
+ ## Bias, Risks, and Limitations
108
 
109
+ RabbitRedux is specialized for cybersecurity applications, so it may perform worse in general-purpose use or on non-English datasets. Users should evaluate the model's outputs for potential bias and keep its cybersecurity-specific tuning in mind.
110
 
111
+ ### Recommendations
112
 
113
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model, especially in contexts that are outside its trained domain.
114
 
115
  ## Environmental Impact
116
 
117
+ Training emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute):
118
 
119
+ - **Hardware Type:** NVIDIA A100 GPUs
120
+ - **Training Hours:** 500 hours
121
+ - **Carbon Emitted:** 1.2 metric tons CO2eq
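
Calculators of this kind estimate emissions roughly as energy consumed multiplied by the carbon intensity of the electricity grid. The sketch below shows that arithmetic only. The function name and every input except the 500 training hours are illustrative assumptions, because the card does not state the GPU count, per-GPU power draw, or compute region, so the result is not expected to reproduce the reported figure.

```python
def estimate_emissions_kg(
    gpu_power_kw: float,
    hours: float,
    gpu_count: int,
    grid_intensity_kg_per_kwh: float,
) -> float:
    """Estimate CO2eq emissions in kilograms as energy (kWh) times grid intensity."""
    energy_kwh = gpu_power_kw * hours * gpu_count
    return energy_kwh * grid_intensity_kg_per_kwh


# Only the 500 hours comes from the card. The other inputs are assumptions:
# 0.4 kW per GPU, a single GPU, and a grid intensity of 0.4 kg CO2eq per kWh.
estimate = estimate_emissions_kg(
    gpu_power_kw=0.4,
    hours=500,
    gpu_count=1,
    grid_intensity_kg_per_kwh=0.4,
)
print(f"{estimate:.1f} kg CO2eq")
```

How closely such an estimate matches the reported total depends on the actual GPU count, power draw, and compute region, none of which are listed here.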
122
 
123
+ ## Citation
124
 
125
+ If citing RabbitRedux in research, please use the following format:
126
 
127
+ **BibTeX**
128
+ ```bibtex
129
+ @misc{canstralian2024rabbitredux,
130
+ author = {Canstralian},
131
+ title = {RabbitRedux: A Model for Code Classification in Cybersecurity},
132
+ year = {2024},
133
+ url = {https://github.com/canstralian/RabbitRedux},
134
+ }
135
+ ```
136
 
137
+ **APA**
138
+ Canstralian. (2024). *RabbitRedux: A Model for Code Classification in Cybersecurity*. Retrieved from https://github.com/canstralian/RabbitRedux
139
 
140
+ ## Contact
141
 
142
+ For more information, reach out via GitHub at [Canstralian](https://github.com/canstralian).