cisco-ehsan committed on
Commit 7736105 · verified · 1 Parent(s): 7bd3350

Upload folder using huggingface_hub

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
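Exactly one `pooling_mode_*` flag is enabled in the config above. As a quick way to confirm that, here is a minimal sketch that parses the same JSON and lists the enabled modes. The helper name `active_pooling_modes` is hypothetical and not part of the sentence-transformers library, and the JSON is embedded as a string rather than read from disk so the snippet is self-contained.

```python
import json

# Mirrors the 1_Pooling/config.json added in this commit.
POOLING_CONFIG = """
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
"""


def active_pooling_modes(config_text):
    """Return the names of all pooling_mode_* flags set to true (hypothetical helper)."""
    config = json.loads(config_text)
    prefix = "pooling_mode_"
    return sorted(
        key[len(prefix):]
        for key, value in config.items()
        if key.startswith(prefix) and value is True
    )


modes = active_pooling_modes(POOLING_CONFIG)
print(modes)
```

With this config the only enabled mode is mean pooling over tokens, which matches the `Pooling` module shown in the README's architecture listing.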
README.md ADDED
@@ -0,0 +1,654 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - dense
7
+ - generated_from_trainer
8
+ - dataset_size:35705
9
+ - loss:MultipleNegativesRankingLoss
10
+ widget:
11
+ - source_sentence: What is the primary responsibility of the Information Security
12
+ Oversight Committee in an organization?
13
+ sentences:
14
+ - Least privilege
15
+ - By searching for repeating ciphertext sequences at fixed displacements.
16
+ - Ensuring and supporting information protection awareness and training programs
17
+ - source_sentence: Which of the following databases are required to be maintained
18
+ by any system participating in an IPSec VPN?
19
+ sentences:
20
+ - 'Gatekeeper bypass through code signing exploitation represents a sophisticated
21
+ attack vector targeting macOS''s application verification mechanism. Understanding
22
+ detection indicators requires examining both technical artifacts and behavioral
23
+ patterns associated with compromised digital signatures.\n\n**Primary Technical
24
+ Indicators:**\n\nCode signing certificate anomalies constitute the most direct
25
+ indicator. Legitimate applications possess valid, unexpired certificates from
26
+ trusted authorities like Apple or recognized developers. Suspicious indicators
27
+ include self-signed certificates, expired certificates, certificates issued by
28
+ unrecognized authorities, or certificates with unusual subject alternative names
29
+ (SANs). The `codesign` command reveals signature validity, while examining certificate
30
+ chains through Keychain Access exposes potential anomalies.\n\nBinary modification
31
+ signatures often manifest as \\\"unsigned\\\" status for previously signed applications.
32
+ Gatekeeper maintains a whitelist of notarized applications; unsigned binaries
33
+ attempting execution trigger alerts in system logs located at `/var/log/system.log`.
34
+ Additionally, applications with altered code signing identifiers (CSIDs) or modified
35
+ entitlements may indicate tampering.\n\n**Behavioral and System-Level Indicators:**\n\nProcess
36
+ execution from non-standard locations frequently accompanies successful bypasses.
37
+ Legitimate Gatekeeper-approved applications typically execute from `/Applications`
38
+ or user-specific application directories. Execution from temporary directories,
39
+ Downloads folders, or unusual paths warrants investigation.\n\nNetwork behavior
40
+ analysis reveals additional indicators. Compromised applications may exhibit unexpected
41
+ network connections, particularly to suspicious domains or IP addresses not associated
42
+ with the legitimate application''s functionality. DNS queries to newly registered
43
+ domains (NRDs) or domains with high entropy often indicate command-and-control
44
+ communications.\n\n**MITRE ATT&CK Framework Alignment:**\n\nThis technique aligns
45
+ with T1553.002 (Subvert Trust Controls: Code Signing). Adversaries exploit weaknesses
46
+ in code signing verification processes, potentially through stolen certificates,
47
+ certificate authority compromise, or exploitation of bypass mechanisms like manual
48
+ allowlisting.\n\n**Detection and Response Strategies:**\n\nImplement comprehensive
49
+ logging using the Unified Logging system with custom predicates monitoring `com.apple.securityd`
50
+ events. Deploy endpoint detection solutions capable of real-time code signing
51
+ validation and behavioral analysis. Regularly audit installed applications against
52
+ known-good baselines, focusing on unsigned or suspiciously signed executables.\n\nNIST
53
+ Cybersecurity Framework alignment emphasizes continuous monitoring (DE.CM) and
54
+ anomaly detection capabilities within the Detect function, ensuring organizations
55
+ maintain visibility into potential Gatekeeper bypass attempts through robust logging
56
+ and behavioral analysis mechanisms.'
57
+ - Security Policy Database (SPD) and Security Association Database (SAD)
58
+ - Virus
59
+ - source_sentence: How is a supply chain attack implemented through compromised software
60
+ development kits (SDKs) and their propagation to thousands of applications?
61
+ sentences:
62
+ - 'Detecting security label tampering through extended attributes (xattrs) requires
63
+ implementing comprehensive monitoring and validation mechanisms aligned with NIST
64
+ Cybersecurity Framework''s Detect function and MITRE ATT&CK''s Defense Evasion
65
+ tactics.\n\n**Xattr Monitoring Techniques:**\n\nImplement real-time file system
66
+ monitoring using tools like `auditd` or Windows Event Tracing to track xattr modifications.
67
+ Configure audit rules targeting specific security-critical files and directories,
68
+ focusing on operations like `SETXATTR`, `GETXATTR`, and `LISTXATTR`. This aligns
69
+ with NIST CSF DE.CM-1 (continuous monitoring) by establishing baseline behaviors
70
+ for legitimate xattr usage patterns.\n\n**Integrity Validation Methods:**\n\nDeploy
71
+ cryptographic hashing of security labels stored in xattrs, creating immutable
72
+ reference values. Implement periodic verification against these baselines using
73
+ SHA-256 or stronger algorithms. This corresponds to NIST CSF PR.DS-6 (integrity
74
+ checking mechanisms) and provides detection capabilities for unauthorized modifications.\n\n**Behavioral
75
+ Analysis:**\n\nEstablish user and process behavior profiling for xattr operations,
76
+ identifying anomalous patterns that deviate from established baselines. Monitor
77
+ for unusual privilege escalation attempts modifying security labels, particularly
78
+ focusing on MITRE ATT&CK technique T1562.008 (Impair Defenses: Disable or Modify
79
+ Tools) where adversaries manipulate security mechanisms.\n\n**System Integration:**\n\nLeverage
80
+ SELinux or AppArmor mandatory access controls to restrict unauthorized xattr modifications.
81
+ Implement centralized logging aggregation correlating xattr changes with process
82
+ execution and network activities, enabling correlation analysis for sophisticated
83
+ tampering attempts.\n\n**Detection Signatures:**\n\nDevelop custom detection rules
84
+ identifying suspicious xattr patterns, including rapid successive modifications,
85
+ bulk security label changes across multiple files, or modifications from unexpected
86
+ processes. Integrate these signatures into SIEM platforms for automated alerting
87
+ and incident response workflows.\n\nThis multi-layered approach provides comprehensive
88
+ coverage against sophisticated tampering attempts while maintaining operational
89
+ efficiency through targeted monitoring strategies.'
90
+ - Supply chain attacks occur when an attacker injects malicious code into trusted
91
+ components in the software supply chain, such as open source libraries or SDKs.
92
+ These components are then distributed to many developers and organizations worldwide.
93
+ Once they integrate these seemingly legitimate tools into their own products,
94
+ the malware is automatically embedded within them, propagating widely across various
95
+ applications and devices. Attackers can also compromise update servers that deliver
96
+ patches to millions of systems simultaneously. The Sunburst attack on SolarWinds
97
+ was one such supply chain attack where a malicious update was pushed through the
98
+ Orion update server. In this case, attackers used the compromised SDK from Pulse
99
+ Secure to propagate the malware. Because Pulse Secure is used by many organizations
100
+ for secure remote access solutions, their software development kit was distributed
101
+ as part of legitimate downloads. Attackers then inserted their own malicious code
102
+ into that SDK, which in turn infected all products built using it. This attack
103
+ caused massive damage and forced a significant number of companies to implement
104
+ new policies regarding software updates and vendor trustworthiness. The SolarWinds
105
+ supply chain attack also demonstrated the importance of monitoring for suspicious
106
+ network traffic patterns and adopting multi-factor authentication (MFA) to limit
107
+ access to sensitive systems. Attackers can easily bypass traditional security
108
+ measures if they manage to compromise a legitimate update server or SDK provider.
109
+ Thus, it is essential for companies to carefully vet their third-party software
110
+ providers and implement strict controls around the development lifecycle of critical
111
+ applications. Furthermore, adopting robust patch management strategies and keeping
112
+ an eye on suspicious activity in software repositories are important steps toward
113
+ reducing the risk of supply chain attacks. Finally, implementing continuous monitoring
114
+ tools like SIEM can provide early warning signs if any unexpected changes occur
115
+ within a software component or its corresponding repository. Overall, understanding
116
+ how supply chain attacks work is crucial for building more secure systems and
117
+ ensuring that organizations do not inadvertently become victims of such sophisticated
118
+ breaches in the future. Attackers can easily bypass traditional security measures
119
+ if they manage to compromise a legitimate update server or SDK provider. Thus,
120
+ it is essential for companies to carefully vet their third-party software providers
121
+ and implement strict controls around the development lifecycle of critical applications.
122
+ Furthermore, adopting robust patch management strategies and keeping an eye on
123
+ suspicious activity in software repositories are important steps toward reducing
124
+ the risk of supply chain attacks. Finally, implementing continuous monitoring
125
+ tools like SIEM can provide early warning signs if any unexpected changes occur
126
+ within a software component or its corresponding repository.
127
+ - 'An automated response system for isolating compromised systems requires a multi-layered
128
+ architecture integrating detection capabilities with orchestrated containment
129
+ actions, aligned with NIST Cybersecurity Framework''s Respond (RS) function and
130
+ MITRE ATT&CK defensive strategies.\n\n**Core Architecture Components:**\n\nThe
131
+ system should implement Security Orchestration, Automation, and Response (SOAR)
132
+ platforms integrated with Security Information and Event Management (SIEM) systems.
133
+ Central components include: detection engines processing indicators of compromise
134
+ (IoCs), automated decision matrices for risk assessment, and isolation mechanisms
135
+ that can quarantine affected assets without disrupting critical operations.\n\n**Detection
136
+ Integration:**\n\nLeverage MITRE ATT&CK techniques to establish comprehensive
137
+ monitoring across the attack lifecycle. Implement behavioral analytics detecting
138
+ tactics like Initial Access (T1566 Phishing), Execution (T1059 Command and Scripting
139
+ Interpreter), and Defense Evasion (T1027 Obfuscated Files). Deploy endpoint detection
140
+ and response (EDR) solutions monitoring process execution, network communications,
141
+ and file system modifications. Integrate threat intelligence feeds correlating
142
+ observed indicators with known exploitation campaigns.\n\n**Automated Response
143
+ Logic:**\n\nDesign tiered response capabilities based on confidence levels and
144
+ asset criticality. Implement network microsegmentation enabling granular isolation
145
+ through software-defined networking (SDN) controllers. Automated actions should
146
+ include: DNS sinkholing for malicious domains, firewall rule deployment blocking
147
+ suspicious traffic patterns, and network switch port isolation. Critical systems
148
+ require graceful degradation procedures maintaining business continuity.\n\n**Decision
149
+ Framework:**\n\nEstablish risk scoring algorithms incorporating asset value, threat
150
+ severity, and exploitation likelihood. Implement approval workflows for high-confidence
151
+ isolations while enabling rapid containment for confirmed compromises. Integration
152
+ with Configuration Management Databases (CMDB) ensures accurate asset inventory
153
+ and dependency mapping before executing isolation procedures.\n\n**Validation
154
+ and Recovery:**\n\nPost-isolation processes should include automated forensic
155
+ data collection, incident classification against MITRE ATT&CK framework, and coordinated
156
+ recovery procedures. Implement continuous monitoring ensuring isolation effectiveness
157
+ while maintaining operational readiness for subsequent threats.'
158
+ - source_sentence: What are the best practices for SOC teams to enhance their threat
159
+ hunting capabilities against ScreenConnect vulnerabilities?
160
+ sentences:
161
+ - 'The hiberfil.sys file represents a critical artifact in digital forensics for
162
+ establishing temporal context and system state at specific points in time. This
163
+ Windows hibernation file contains compressed memory contents when a system enters
164
+ power-saving mode, preserving volatile data including running processes, loaded
165
+ drivers, and network connections.\n\n**Timeline Establishment Through Metadata
166
+ Analysis**\n\nThe creation timestamp of hiberfil.sys provides definitive evidence
167
+ of the last hibernation event, establishing a concrete temporal anchor point.
168
+ This timestamp corresponds to the exact moment Windows initiated hibernation mode,
169
+ typically occurring during system shutdown or power management events. By analyzing
170
+ this metadata alongside related artifacts like registry entries (HKLM\\\\SYSTEM\\\\CurrentControlSet\\\\Control\\\\Power)
171
+ and Event Viewer logs (Event ID 4634 for logoff), investigators can reconstruct
172
+ precise chronological sequences.\n\n**MITRE ATT&CK Framework Integration**\n\nWithin
173
+ the MITRE ATT&CK framework, hiberfil.sys analysis supports detection of several
174
+ techniques including T1055 (Process Injection) and T1070.004 (File Deletion).
175
+ Adversaries may attempt to manipulate or delete this file to eliminate forensic
176
+ evidence, making its presence or absence significant for timeline reconstruction.\n\n**NIST
177
+ Cybersecurity Framework Alignment**\n\nFollowing NIST CSF guidelines, particularly
178
+ the Respond (RS) and Recover (RC) functions, hiberfil.sys analysis supports incident
179
+ response activities. The metadata provides crucial temporal context for understanding
180
+ attack progression and system compromise duration.\n\n**Technical Implementation
181
+ Considerations**\n\nForensic investigators should examine multiple timestamp artifacts:
182
+ creation time, last modification time, and access time. However, the creation
183
+ timestamp remains most reliable as it reflects the initial hibernation event rather
184
+ than subsequent file operations that might alter other timestamps through normal
185
+ system operations or potential tampering attempts.\n\nThe hiberfil.sys file size
186
+ typically correlates with installed RAM capacity, providing additional contextual
187
+ information about system configuration at the time of hibernation. This metadata
188
+ correlation strengthens timeline accuracy by validating system state consistency
189
+ across multiple forensic artifacts.'
190
+ - 'Organizations can effectively leverage log analysis and Endpoint Detection and
191
+ Response (EDR) systems to monitor Alternate Data Streams (ADS) through a multi-layered
192
+ approach that addresses both detection capabilities and operational visibility.\n\n**Log
193
+ Analysis Implementation:**\nWindows Event Logs provide foundational monitoring
194
+ through Security, System, and Application logs. Critical events include 4663 (object
195
+ access), 4688 (process creation), and 5140 (file share access). Organizations
196
+ should configure advanced audit policies for \\\"Audit File System\\\" and \\\"Audit
197
+ Handle Manipulation\\\" under Local Security Policy. Sysmon configuration becomes
198
+ essential, particularly Event ID 2 (CreateFile) and Event ID 3 (NetworkConnect),
199
+ as these capture detailed file system interactions that standard Windows logs
200
+ might miss.\n\n**EDR System Configuration:**\nModern EDR platforms like CrowdStrike,
201
+ SentinelOne, or Microsoft Defender for Endpoint offer native ADS detection capabilities.
202
+ These systems should be configured to monitor:\n- File creation/modification events
203
+ with stream enumeration\n- Process access to files with multiple data streams\n-
204
+ Registry modifications associated with ADS-enabled applications\n- Network communications
205
+ from processes accessing hidden streams\n\n**Critical Directory Monitoring:**\nSystem
206
+ directories requiring enhanced monitoring include %SystemRoot%, %ProgramFiles%,
207
+ and user profile directories. Implement baseline integrity monitoring using tools
208
+ like Microsoft''s Attack Surface Reduction (ASR) rules or custom PowerShell scripts
209
+ that enumerate ADS presence through Get-ItemProperty -Name \\\"*\\\" commands.\n\n**MITRE
210
+ ATT&CK Alignment:**\nThis approach addresses T1096 (NTFS File Attributes), T1547.001
211
+ (Registry Run Keys/Startup Folder), and T1564.002 (Impair Defenses: Disable or
212
+ Modify Tools). Detection rules should correlate ADS creation with suspicious process
213
+ ancestry, particularly PowerShell execution or living-off-the-land binaries.\n\n**Operational
214
+ Integration:**\nEstablish automated response workflows that quarantine systems
215
+ exhibiting ADS anomalies while preserving forensic evidence. Implement centralized
216
+ logging aggregation using SIEM platforms configured to detect patterns indicating
217
+ ADS abuse, such as rapid stream creation followed by executable access attempts.\n\nThis
218
+ comprehensive monitoring strategy ensures organizations maintain visibility into
219
+ ADS activities while minimizing false positives through contextual analysis and
220
+ behavioral correlation.'
221
+ - SOC teams can enhance their threat hunting capabilities against ScreenConnect
222
+ vulnerabilities by adopting a proactive and iterative approach to searching for
223
+ indicators of compromise (IoCs) and anomalous activities that may indicate exploitation.
224
+ Develop and regularly update threat hunting hypotheses based on the latest threat
225
+ intelligence, focusing on known TTPs associated with the exploitation of ScreenConnect
226
+ vulnerabilities. Utilize advanced analytics and machine learning tools to sift
227
+ through large volumes of data for patterns and anomalies that may signify malicious
228
+ activity. Leverage endpoint detection and response (EDR) tools to continuously
229
+ monitor endpoints for signs of exploitation, such as unusual PowerShell command
230
+ execution, modification of system files, or unexpected network connections. Conduct
231
+ regular vulnerability scans and penetration tests to identify and remediate potential
232
+ weaknesses in ScreenConnect and other critical systems before attackers can exploit
233
+ them. Foster collaboration and information sharing with other organizations and
234
+ cybersecurity communities to gain insights into emerging threats and effective
235
+ detection and response strategies. Invest in continuous training and development
236
+ for SOC team members to keep them abreast of the latest cybersecurity trends,
237
+ tools, and techniques. By implementing these best practices, SOC teams can significantly
238
+ improve their ability to detect and respond to threats targeting ScreenConnect
239
+ vulnerabilities, thereby enhancing the overall security posture of their organization.
240
+ - source_sentence: How would you use Amcache analysis to detect fileless malware that
241
+ drops temporary components for initial system compromise?
242
+ sentences:
243
+ - '# Automated Extraction of Empire Agent Configurations: Defensive Analysis\n\n##
244
+ NIST Cybersecurity Framework Context\n\nWithin the NIST CSF''s **Detect (DE)**
245
+ and **Respond (RS)** functions, organizations must implement capabilities to identify
246
+ and analyze malicious configurations. PowerShell Empire represents a sophisticated
247
+ post-exploitation framework mapped to MITRE ATT&CK techniques including T1059.001
248
+ (PowerShell) and T1027 (Obfuscated Files or Information).\n\n## Detection and
249
+ Analysis Methodology\n\n**Memory Forensics Approach:**\nDevelop automated tools
250
+ leveraging memory acquisition frameworks like Volatility or Rekall to identify
251
+ Empire''s in-memory artifacts. Focus on detecting:\n- PowerShell reflection objects
252
+ characteristic of Empire''s module loading\n- Base64-encoded configuration blobs
253
+ within process memory spaces\n- Registry keys containing encoded agent parameters
254
+ (typically HKLM\\\\SOFTWARE\\\\Classes\\\\ms-settings)\n\n**File System Analysis:**\nImplement
255
+ scanning mechanisms targeting:\n- Temporary directories where Empire extracts
256
+ configurations\n- PowerShell execution logs revealing obfuscated command patterns\n-
257
+ Event log analysis for suspicious PowerShell execution contexts\n\n## Technical
258
+ Implementation Framework\n\n**Automated Extraction Pipeline:**\n1. **Signature-Based
259
+ Detection:** Develop YARA rules identifying Empire''s distinctive code patterns
260
+ and configuration structures\n2. **Memory Parsing:** Implement plugins parsing
261
+ .NET objects and PowerShell runspaces\n3. **Decryption Routines:** Create automated
262
+ decoding mechanisms for Empire''s XOR-based configuration encryption\n4. **Artifact
263
+ Correlation:** Cross-reference multiple data sources to validate findings\n\n**MITRE
264
+ ATT&CK Mapping:**\n- T1083 (File and Directory Discovery)\n- T1057 (Process Discovery)\n-
265
+ T1005 (Data from Local System)\n\n## Defensive Considerations\n\nTools must incorporate
266
+ anti-evasion techniques, including detection of common obfuscation methods like
267
+ string concatenation and variable substitution. Integration with SIEM platforms
268
+ enables real-time alerting when Empire artifacts are discovered.\n\n**Validation
269
+ Framework:**\nImplement multi-layered validation ensuring extracted configurations
270
+ correspond to active threats rather than benign PowerShell activity. This includes
271
+ behavioral analysis correlating configuration parameters with observed network
272
+ communications and file system modifications'
273
+ - To capture and display network traffic
274
+ - 'Amcache analysis provides critical forensic artifacts for detecting fileless
275
+ malware employing temporary component deployment during initial system compromise,
276
+ aligning with MITRE ATT&CK techniques T1055 (Process Injection) and T1620 (Reflective
277
+ Code Loading).\n\n**Amcache Artifact Analysis Framework:**\n\nThe Amcache.hve
278
+ registry hive maintains comprehensive application execution metadata, including
279
+ file paths, hashes, and execution timestamps. For fileless malware detection,
280
+ focus on:\n\n1. **Temporary File Creation Patterns**: Analyze entries with suspicious
281
+ temporal clustering in the \\\"Programs\\\" key, particularly executables stored
282
+ in system directories (C:\\\\Windows\\\\Temp, C:\\\\Users\\\\[User]\\\\AppData\\\\Local\\\\Temp).
283
+ Legitimate applications typically exhibit predictable installation patterns, while
284
+ malicious components often manifest as isolated, recently-created executables.\n\n2.
285
+ **Hash-Based Indicators**: Cross-reference SHA-1 hashes against threat intelligence
286
+ feeds and known malware signatures. Fileless malware frequently employs legitimate
287
+ system binaries for process hollowing (T1055.012) or reflective DLL loading (T1620),
288
+ making hash analysis crucial for identifying repurposed executables.\n\n3. **Execution
289
+ Chain Analysis**: Examine parent-child relationships within Amcache entries to
290
+ identify anomalous process spawning patterns. Fileless malware often exhibits
291
+ unusual execution chains, particularly when temporary components spawn from unexpected
292
+ parent processes or system services.\n\n**NIST CSF Implementation Strategy:**\n\nUnder
293
+ the Detect (DE) function, specifically DE.AE-2 (Detected events are analyzed),
294
+ implement continuous Amcache monitoring through:\n\n- **Baseline Establishment**:
295
+ Create organizational baselines for normal temporary file creation patterns and
296
+ execution behaviors\n- **Anomaly Detection**: Deploy automated analysis tools
297
+ to identify deviations from established baselines\n- **Correlation Analysis**:
298
+ Integrate Amcache findings with network traffic analysis and endpoint detection
299
+ systems\n\n**Advanced Detection Methodologies:**\n\nUtilize PowerShell-based parsing
300
+ scripts or specialized forensic tools like KAPE to extract and analyze Amcache
301
+ artifacts. Focus on:\n\n- Unusual file extensions in temporary directories\n-
302
+ Executables created immediately before suspicious network activity\n- Components
303
+ with execution timestamps correlating with initial access events\n- Hash collisions
304
+ or similarities between temporary files and known malware families\n\nThis approach
305
+ enables proactive identification of fileless malware campaigns leveraging temporary
306
+ components for system compromise, supporting comprehensive threat hunting and
307
+ incident response activities within enterprise environments.'
308
+ pipeline_tag: sentence-similarity
309
+ library_name: sentence-transformers
310
+ ---
311
+
312
+ # SentenceTransformer
313
+
314
+ This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
315
+
316
+ ## Model Details
317
+
318
+ ### Model Description
319
+ - **Model Type:** Sentence Transformer
320
+ <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
321
+ - **Maximum Sequence Length:** 1024 tokens
322
+ - **Output Dimensionality:** 768 dimensions
323
+ - **Similarity Function:** Cosine Similarity
324
+ <!-- - **Training Dataset:** Unknown -->
325
+ <!-- - **Language:** Unknown -->
326
+ <!-- - **License:** Unknown -->
327
+
328
+ ### Model Sources
329
+
330
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
331
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
332
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
333
+
334
+ ### Full Model Architecture
335
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
342
+
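The architecture above feeds per-token vectors from the Transformer module into a Pooling module with `pooling_mode_mean_tokens` enabled. The following is a minimal pure-Python sketch of what mean pooling computes, assuming the usual convention that padding positions are excluded through an attention mask. The function name `mean_pool` and the toy 2-dimensional vectors are illustrative only; the real model produces 768-dimensional vectors and the library's own implementation operates on tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; padding (mask 0) is ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input so we never divide by zero.
    count = max(count, 1)
    return [total / count for total in summed]


# Three token vectors; the last one is padding and must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The sentence embedding is therefore the average of the real tokens only, which is why inputs of different lengths still map to vectors of the same size.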
343
+ ## Usage
344
+
345
+ ### Direct Usage (Sentence Transformers)
346
+
347
+ First install the Sentence Transformers library:
348
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
352
+
353
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ # ("sentence_transformers_model_id" is a placeholder; substitute this repository's model ID)
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+     'How would you use Amcache analysis to detect fileless malware that drops temporary components for initial system compromise?',
+     'Amcache analysis provides critical forensic artifacts for detecting fileless malware employing temporary component deployment during initial system compromise, aligning with MITRE ATT&CK techniques T1055 (Process Injection) and T1620 (Reflective Code Loading).\\n\\n**Amcache Artifact Analysis Framework:**\\n\\nThe Amcache.hve registry hive maintains comprehensive application execution metadata, including file paths, hashes, and execution timestamps. For fileless malware detection, focus on:\\n\\n1. **Temporary File Creation Patterns**: Analyze entries with suspicious temporal clustering in the \\\\\\"Programs\\\\\\" key, particularly executables stored in system directories (C:\\\\\\\\Windows\\\\\\\\Temp, C:\\\\\\\\Users\\\\\\\\[User]\\\\\\\\AppData\\\\\\\\Local\\\\\\\\Temp). Legitimate applications typically exhibit predictable installation patterns, while malicious components often manifest as isolated, recently-created executables.\\n\\n2. **Hash-Based Indicators**: Cross-reference SHA-1 hashes against threat intelligence feeds and known malware signatures. Fileless malware frequently employs legitimate system binaries for process hollowing (T1055.012) or reflective DLL loading (T1620), making hash analysis crucial for identifying repurposed executables.\\n\\n3. **Execution Chain Analysis**: Examine parent-child relationships within Amcache entries to identify anomalous process spawning patterns. Fileless malware often exhibits unusual execution chains, particularly when temporary components spawn from unexpected parent processes or system services.\\n\\n**NIST CSF Implementation Strategy:**\\n\\nUnder the Detect (DE) function, specifically DE.AE-2 (Detected events are analyzed), implement continuous Amcache monitoring through:\\n\\n- **Baseline Establishment**: Create organizational baselines for normal temporary file creation patterns and execution behaviors\\n- **Anomaly Detection**: Deploy automated analysis tools to identify deviations from established baselines\\n- **Correlation Analysis**: Integrate Amcache findings with network traffic analysis and endpoint detection systems\\n\\n**Advanced Detection Methodologies:**\\n\\nUtilize PowerShell-based parsing scripts or specialized forensic tools like KAPE to extract and analyze Amcache artifacts. Focus on:\\n\\n- Unusual file extensions in temporary directories\\n- Executables created immediately before suspicious network activity\\n- Components with execution timestamps correlating with initial access events\\n- Hash collisions or similarities between temporary files and known malware families\\n\\nThis approach enables proactive identification of fileless malware campaigns leveraging temporary components for system compromise, supporting comprehensive threat hunting and incident response activities within enterprise environments.',
+     'To capture and display network traffic',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[ 1.0000,  0.8653,  0.0078],
+ #         [ 0.8653,  1.0000, -0.0407],
+ #         [ 0.0078, -0.0407,  1.0000]])
+ ```
376
+
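The scores printed by `model.similarity` are cosine similarities (`similarity_fn_name` is `cosine` in `config_sentence_transformers.json`). As a rough illustration of what that matrix contains, here is a minimal pure-Python sketch. The three-dimensional `toy_embeddings` are made-up values for illustration, not outputs of this model, and the helper names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """Pairwise cosine similarities, analogous in shape to model.similarity(e, e)."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]

# Toy stand-ins for embeddings (assumption: real embeddings have 768 dimensions).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(x, 4) for x in row])
```

The diagonal is 1.0 because every vector is maximally similar to itself, and the matrix is symmetric, which matches the pattern in the tensor shown above.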
377
+ <!--
378
+ ### Direct Usage (Transformers)
379
+
380
+ <details><summary>Click to see the direct usage in Transformers</summary>
381
+
382
+ </details>
383
+ -->
384
+
385
+ <!--
386
+ ### Downstream Usage (Sentence Transformers)
387
+
388
+ You can finetune this model on your own dataset.
389
+
390
+ <details><summary>Click to expand</summary>
391
+
392
+ </details>
393
+ -->
394
+
395
+ <!--
396
+ ### Out-of-Scope Use
397
+
398
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
399
+ -->
400
+
401
+ <!--
402
+ ## Bias, Risks and Limitations
403
+
404
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
405
+ -->
406
+
407
+ <!--
408
+ ### Recommendations
409
+
410
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
411
+ -->
412
+
413
+ ## Training Details
414
+
415
+ ### Training Dataset
416
+
417
+ #### Unnamed Dataset
418
+
419
+ * Size: 35,705 training samples
420
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
421
+ * Approximate statistics based on the first 1000 samples:
422
+ | | sentence_0 | sentence_1 | label |
423
+ |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:--------------------------------------------------------------|
424
+ | type | string | string | float |
425
+ | details | <ul><li>min: 9 tokens</li><li>mean: 20.14 tokens</li><li>max: 103 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 293.14 tokens</li><li>max: 934 tokens</li></ul> | <ul><li>min: 1.0</li><li>mean: 1.0</li><li>max: 1.0</li></ul> |
426
+ * Samples:
427
+ | sentence_0 | sentence_1 | label |
428
+ |:---------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
429
+ | <code>Under what circumstances does attribution bias distort intrusion linking, and what countermeasures can mitigate it?</code> | <code>Attribution bias in intrusion linking occurs when analysts allow preconceived notions, organizational pressures, or cognitive shortcuts to influence their assessment of attack origins and relationships between incidents. This distortion manifests through several mechanisms: confirmation bias leads analysts to interpret ambiguous indicators as evidence supporting their initial hypotheses; availability heuristic causes over-reliance on memorable or recent attacks; and anchoring effect results in disproportionate weight given to early intelligence assessments.\n\nThe MITRE ATT&CK framework's Groups matrix exemplifies how attribution bias emerges when linking tactics, techniques, and procedures (TTPs) across campaigns. Analysts may incorrectly associate similar tools or methodologies without sufficient evidence of operational connection, particularly when adversaries employ publicly available malware or common techniques across multiple threat groups. This challenge intensifies during high...</code> | <code>1.0</code> |
430
+ | <code>How can you identify store buffer bypass speculation artifacts?</code> | <code>Store buffer bypass speculation artifacts represent sophisticated side-channel vulnerabilities that exploit microarchitectural features in modern processors. These vulnerabilities occur when speculative execution accesses memory locations before stores are committed, creating observable timing differences that leak sensitive information.\n\n**Detection Methodologies:**\n\nStatic analysis tools can identify potential store buffer bypass vulnerabilities by examining assembly code patterns where load instructions follow store operations without proper memory barriers. Tools like Intel's Pin or LLVM-based analyzers can detect missing lfence/stfence instructions that prevent speculative execution from bypassing stores.\n\nDynamic analysis approaches involve precise timing measurements using high-resolution performance counters. Researchers measure memory access latencies to identify anomalous patterns where load operations complete faster than expected, indicating potential speculation thro...</code> | <code>1.0</code> |
431
+ | <code>How would you develop methods for analyzing attacks targeting industrial control systems?</code> | <code>Developing comprehensive methods for analyzing attacks targeting Industrial Control Systems (ICS) requires a multi-layered approach integrating specialized frameworks and industry standards.\n\n**Framework Integration**: Begin with NIST's Cybersecurity Framework, particularly the Identify function, to catalog ICS assets, data flows, and criticality levels. The Detect function provides guidance on implementing continuous monitoring capabilities tailored for operational technology environments. MITRE ATT&CK for ICS offers crucial tactical intelligence, mapping adversary behaviors specific to industrial contexts through techniques like \\\"Inhibit Response Function\\\" (T0803) or \\\"Modify Parameter\\\" (T0832).\n\n**Technical Analysis Methodology**: Establish baseline behavioral profiles for normal ICS operations using network traffic analysis, protocol inspection, and system state monitoring. Deploy specialized tools capable of deep packet inspection for industrial protocols (Modbus, D...</code> | <code>1.0</code> |
432
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
433
+ ```json
434
+ {
435
+ "scale": 20.0,
436
+ "similarity_fct": "cos_sim"
437
+ }
438
+ ```
439
+
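To make the two loss parameters concrete, the following is a simplified pure-Python sketch of the idea behind `MultipleNegativesRankingLoss`: every other positive in the batch acts as a negative, cosine similarities are multiplied by `scale`, and a cross-entropy is taken with the matching pair as the target. It is an illustration under those assumptions, not the library's implementation; `mnr_loss` and the toy vectors are hypothetical.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and all other positives in the batch serve as in-batch negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy 2-D "embeddings" (made-up values, not model outputs).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
misaligned = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, misaligned)
```

With `scale` at 20.0, a well-aligned batch drives the loss close to zero while a misaligned one is penalized heavily. Because negatives come from the batch itself, a larger batch supplies more negatives per anchor.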
440
+ ### Training Hyperparameters
441
+ #### Non-Default Hyperparameters
442
+
443
+ - `eval_strategy`: steps
444
+ - `per_device_train_batch_size`: 32
445
+ - `per_device_eval_batch_size`: 32
446
+ - `num_train_epochs`: 20
447
+ - `multi_dataset_batch_sampler`: round_robin
448
+
449
+ #### All Hyperparameters
450
+ <details><summary>Click to expand</summary>
451
+
452
+ - `overwrite_output_dir`: False
453
+ - `do_predict`: False
454
+ - `eval_strategy`: steps
455
+ - `prediction_loss_only`: True
456
+ - `per_device_train_batch_size`: 32
457
+ - `per_device_eval_batch_size`: 32
458
+ - `per_gpu_train_batch_size`: None
459
+ - `per_gpu_eval_batch_size`: None
460
+ - `gradient_accumulation_steps`: 1
461
+ - `eval_accumulation_steps`: None
462
+ - `torch_empty_cache_steps`: None
463
+ - `learning_rate`: 5e-05
464
+ - `weight_decay`: 0.0
465
+ - `adam_beta1`: 0.9
466
+ - `adam_beta2`: 0.999
467
+ - `adam_epsilon`: 1e-08
468
+ - `max_grad_norm`: 1
469
+ - `num_train_epochs`: 20
470
+ - `max_steps`: -1
471
+ - `lr_scheduler_type`: linear
472
+ - `lr_scheduler_kwargs`: {}
473
+ - `warmup_ratio`: 0.0
474
+ - `warmup_steps`: 0
475
+ - `log_level`: passive
476
+ - `log_level_replica`: warning
477
+ - `log_on_each_node`: True
478
+ - `logging_nan_inf_filter`: True
479
+ - `save_safetensors`: True
480
+ - `save_on_each_node`: False
481
+ - `save_only_model`: False
482
+ - `restore_callback_states_from_checkpoint`: False
483
+ - `no_cuda`: False
484
+ - `use_cpu`: False
485
+ - `use_mps_device`: False
486
+ - `seed`: 42
487
+ - `data_seed`: None
488
+ - `jit_mode_eval`: False
489
+ - `use_ipex`: False
490
+ - `bf16`: False
491
+ - `fp16`: False
492
+ - `fp16_opt_level`: O1
493
+ - `half_precision_backend`: auto
494
+ - `bf16_full_eval`: False
495
+ - `fp16_full_eval`: False
496
+ - `tf32`: None
497
+ - `local_rank`: 0
498
+ - `ddp_backend`: None
499
+ - `tpu_num_cores`: None
500
+ - `tpu_metrics_debug`: False
501
+ - `debug`: []
502
+ - `dataloader_drop_last`: True
503
+ - `dataloader_num_workers`: 0
504
+ - `dataloader_prefetch_factor`: None
505
+ - `past_index`: -1
506
+ - `disable_tqdm`: False
507
+ - `remove_unused_columns`: True
508
+ - `label_names`: None
509
+ - `load_best_model_at_end`: False
510
+ - `ignore_data_skip`: False
511
+ - `fsdp`: []
512
+ - `fsdp_min_num_params`: 0
513
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
514
+ - `fsdp_transformer_layer_cls_to_wrap`: None
515
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
516
+ - `deepspeed`: None
517
+ - `label_smoothing_factor`: 0.0
518
+ - `optim`: adamw_torch
519
+ - `optim_args`: None
520
+ - `adafactor`: False
521
+ - `group_by_length`: False
522
+ - `length_column_name`: length
523
+ - `ddp_find_unused_parameters`: None
524
+ - `ddp_bucket_cap_mb`: None
525
+ - `ddp_broadcast_buffers`: False
526
+ - `dataloader_pin_memory`: True
527
+ - `dataloader_persistent_workers`: False
528
+ - `skip_memory_metrics`: True
529
+ - `use_legacy_prediction_loop`: False
530
+ - `push_to_hub`: False
531
+ - `resume_from_checkpoint`: None
532
+ - `hub_model_id`: None
533
+ - `hub_strategy`: every_save
534
+ - `hub_private_repo`: None
535
+ - `hub_always_push`: False
536
+ - `gradient_checkpointing`: False
537
+ - `gradient_checkpointing_kwargs`: None
538
+ - `include_inputs_for_metrics`: False
539
+ - `include_for_metrics`: []
540
+ - `eval_do_concat_batches`: True
541
+ - `fp16_backend`: auto
542
+ - `push_to_hub_model_id`: None
543
+ - `push_to_hub_organization`: None
544
+ - `mp_parameters`:
545
+ - `auto_find_batch_size`: False
546
+ - `full_determinism`: False
547
+ - `torchdynamo`: None
548
+ - `ray_scope`: last
549
+ - `ddp_timeout`: 1800
550
+ - `torch_compile`: False
551
+ - `torch_compile_backend`: None
552
+ - `torch_compile_mode`: None
553
+ - `include_tokens_per_second`: False
554
+ - `include_num_input_tokens_seen`: False
555
+ - `neftune_noise_alpha`: None
556
+ - `optim_target_modules`: None
557
+ - `batch_eval_metrics`: False
558
+ - `eval_on_start`: False
559
+ - `use_liger_kernel`: False
560
+ - `eval_use_gather_object`: False
561
+ - `average_tokens_across_devices`: False
562
+ - `prompts`: None
563
+ - `batch_sampler`: batch_sampler
564
+ - `multi_dataset_batch_sampler`: round_robin
565
+ - `router_mapping`: {}
566
+ - `learning_rate_mapping`: {}
567
+
568
+ </details>
569
+
570
+ ### Training Logs
571
+ | Epoch | Step | Training Loss |
572
+ |:-------:|:----:|:-------------:|
573
+ | 1.0 | 139 | - |
574
+ | 2.0 | 278 | - |
575
+ | 3.0 | 417 | - |
576
+ | 3.5971 | 500 | 1.1678 |
577
+ | 4.0 | 556 | - |
578
+ | 5.0 | 695 | - |
579
+ | 6.0 | 834 | - |
580
+ | 7.0 | 973 | - |
581
+ | 7.1942 | 1000 | 0.0258 |
582
+ | 8.0 | 1112 | - |
583
+ | 9.0 | 1251 | - |
584
+ | 10.0 | 1390 | - |
585
+ | 10.7914 | 1500 | 0.0037 |
586
+ | 11.0 | 1529 | - |
587
+ | 12.0 | 1668 | - |
588
+ | 13.0 | 1807 | - |
589
+ | 14.0 | 1946 | - |
590
+ | 14.3885 | 2000 | 0.0016 |
591
+ | 15.0 | 2085 | - |
592
+ | 16.0 | 2224 | - |
593
+ | 17.0 | 2363 | - |
594
+ | 17.9856 | 2500 | 0.0009 |
595
+ | 18.0 | 2502 | - |
596
+ | 19.0 | 2641 | - |
597
+ | 20.0 | 2780 | - |
598
+
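The step counts in the table can be cross-checked with a little arithmetic. The sketch below assumes an effective batch of 256 samples per step (for example 8 devices at `per_device_train_batch_size` 32). The device count is not stated in this card; it is an inference that happens to reproduce the 139 steps per epoch logged above.

```python
train_samples = 35_705
per_device_batch = 32
assumed_devices = 8  # assumption: not stated in the model card

effective_batch = per_device_batch * assumed_devices
# dataloader_drop_last is True, so the final partial batch is discarded.
steps_per_epoch = train_samples // effective_batch
total_steps = steps_per_epoch * 20  # num_train_epochs = 20

def epoch_at(step):
    """Fractional epoch reached after a given optimizer step."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch, total_steps)
print([epoch_at(s) for s in (500, 1000, 1500, 2000, 2500)])
```

Under that assumption the fractional epochs at the logged steps (500, 1000, and so on) line up with the values in the table.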
599
+
600
+ ### Framework Versions
601
+ - Python: 3.10.10
602
+ - Sentence Transformers: 5.0.0
603
+ - Transformers: 4.52.4
604
+ - PyTorch: 2.7.0+cu128
605
+ - Accelerate: 1.9.0
606
+ - Datasets: 3.6.0
607
+ - Tokenizers: 0.21.1
608
+
609
+ ## Citation
610
+
611
+ ### BibTeX
612
+
613
+ #### Sentence Transformers
614
+ ```bibtex
615
+ @inproceedings{reimers-2019-sentence-bert,
616
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
617
+ author = "Reimers, Nils and Gurevych, Iryna",
618
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
619
+ month = "11",
620
+ year = "2019",
621
+ publisher = "Association for Computational Linguistics",
622
+ url = "https://arxiv.org/abs/1908.10084",
623
+ }
624
+ ```
625
+
626
+ #### MultipleNegativesRankingLoss
627
+ ```bibtex
628
+ @misc{henderson2017efficient,
629
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
630
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
631
+ year={2017},
632
+ eprint={1705.00652},
633
+ archivePrefix={arXiv},
634
+ primaryClass={cs.CL}
635
+ }
636
+ ```
637
+
638
+ <!--
639
+ ## Glossary
640
+
641
+ *Clearly define terms in order to be accessible across audiences.*
642
+ -->
643
+
644
+ <!--
645
+ ## Model Card Authors
646
+
647
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
648
+ -->
649
+
650
+ <!--
651
+ ## Model Card Contact
652
+
653
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
654
+ -->
config.json ADDED
@@ -0,0 +1,55 @@
1
+ {
2
+ "architectures": [
3
+ "ModernBertModel"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 50281,
8
+ "classifier_activation": "gelu",
9
+ "classifier_bias": false,
10
+ "classifier_dropout": 0.0,
11
+ "classifier_pooling": "mean",
12
+ "cls_token_id": 50281,
13
+ "decoder_bias": true,
14
+ "deterministic_flash_attn": false,
15
+ "embedding_dropout": 0.0,
16
+ "eos_token_id": 50282,
17
+ "global_attn_every_n_layers": 3,
18
+ "global_rope_theta": 160000.0,
19
+ "gradient_checkpointing": false,
20
+ "hidden_activation": "gelu",
21
+ "hidden_size": 768,
22
+ "id2label": {
23
+ "0": "LABEL_0"
24
+ },
25
+ "initializer_cutoff_factor": 2.0,
26
+ "initializer_range": 0.02,
27
+ "intermediate_size": 1152,
28
+ "label2id": {
29
+ "LABEL_0": 0
30
+ },
31
+ "layer_norm_eps": 1e-05,
32
+ "local_attention": 128,
33
+ "local_rope_theta": 10000.0,
34
+ "max_position_embeddings": 8192,
35
+ "mlp_bias": false,
36
+ "mlp_dropout": 0.0,
37
+ "model_type": "modernbert",
38
+ "norm_bias": false,
39
+ "norm_eps": 1e-05,
40
+ "num_attention_heads": 12,
41
+ "num_hidden_layers": 22,
42
+ "pad_token_id": 50283,
43
+ "position_embedding_type": "absolute",
44
+ "repad_logits_with_grad": false,
45
+ "sentence_transformers": {
46
+ "activation_fn": "torch.nn.modules.activation.Sigmoid",
47
+ "version": "5.0.0"
48
+ },
49
+ "sep_token_id": 50282,
50
+ "sparse_pred_ignore_index": -100,
51
+ "sparse_prediction": false,
52
+ "torch_dtype": "float32",
53
+ "transformers_version": "4.52.4",
54
+ "vocab_size": 50368
55
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "model_type": "SentenceTransformer",
3
+ "__version__": {
4
+ "sentence_transformers": "5.0.0",
5
+ "transformers": "4.52.4",
6
+ "pytorch": "2.7.0+cu128"
7
+ },
8
+ "prompts": {
9
+ "query": "",
10
+ "document": ""
11
+ },
12
+ "default_prompt_name": null,
13
+ "similarity_fn_name": "cosine"
14
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ab97ffc3039e63cde330b9e83b6c359fbbd4cb3eb59c05d804ca29000486ef9e
3
+ size 596070136
modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ }
14
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 1024,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": true,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,952 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "|||IP_ADDRESS|||",
5
+ "lstrip": false,
6
+ "normalized": true,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": false
10
+ },
11
+ "1": {
12
+ "content": "<|padding|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "50254": {
20
+ "content": " ",
21
+ "lstrip": false,
22
+ "normalized": true,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": false
26
+ },
27
+ "50255": {
28
+ "content": " ",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": false
34
+ },
35
+ "50256": {
36
+ "content": " ",
37
+ "lstrip": false,
38
+ "normalized": true,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": false
42
+ },
43
+ "50257": {
44
+ "content": " ",
45
+ "lstrip": false,
46
+ "normalized": true,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": false
50
+ },
51
+ "50258": {
52
+ "content": " ",
53
+ "lstrip": false,
54
+ "normalized": true,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": false
58
+ },
59
+ "50259": {
60
+ "content": " ",
61
+ "lstrip": false,
62
+ "normalized": true,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": false
66
+ },
67
+ "50260": {
68
+ "content": " ",
69
+ "lstrip": false,
70
+ "normalized": true,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": false
74
+ },
75
+ "50261": {
76
+ "content": " ",
77
+ "lstrip": false,
78
+ "normalized": true,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": false
82
+ },
83
+ "50262": {
84
+ "content": " ",
85
+ "lstrip": false,
86
+ "normalized": true,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": false
90
+ },
91
+ "50263": {
92
+ "content": " ",
93
+ "lstrip": false,
94
+ "normalized": true,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": false
98
+ },
99
+ "50264": {
100
+ "content": " ",
101
+ "lstrip": false,
102
+ "normalized": true,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": false
106
+ },
107
+ "50265": {
108
+ "content": " ",
109
+ "lstrip": false,
110
+ "normalized": true,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": false
114
+ },
115
+ "50266": {
116
+ "content": " ",
117
+ "lstrip": false,
118
+ "normalized": true,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": false
122
+ },
123
+ "50267": {
124
+ "content": " ",
125
+ "lstrip": false,
126
+ "normalized": true,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": false
130
+ },
131
+ "50268": {
132
+ "content": " ",
133
+ "lstrip": false,
134
+ "normalized": true,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": false
138
+ },
139
+ "50269": {
140
+ "content": " ",
141
+ "lstrip": false,
142
+ "normalized": true,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": false
146
+ },
147
+ "50270": {
148
+ "content": " ",
149
+ "lstrip": false,
150
+ "normalized": true,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": false
154
+ },
155
+ "50271": {
156
+ "content": " ",
157
+ "lstrip": false,
158
+ "normalized": true,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": false
162
+ },
163
+ "50272": {
164
+ "content": " ",
165
+ "lstrip": false,
166
+ "normalized": true,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": false
170
+ },
171
+ "50273": {
172
+ "content": " ",
173
+ "lstrip": false,
174
+ "normalized": true,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": false
178
+ },
179
+ "50274": {
180
+ "content": " ",
181
+ "lstrip": false,
182
+ "normalized": true,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": false
186
+ },
187
+ "50275": {
188
+ "content": " ",
189
+ "lstrip": false,
190
+ "normalized": true,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": false
194
+ },
195
+ "50276": {
196
+ "content": " ",
197
+ "lstrip": false,
198
+ "normalized": true,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": false
202
+ },
203
+ "50277": {
204
+ "content": "|||EMAIL_ADDRESS|||",
205
+ "lstrip": false,
206
+ "normalized": true,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": false
210
+ },
211
+ "50278": {
212
+ "content": "|||PHONE_NUMBER|||",
213
+ "lstrip": false,
214
+ "normalized": true,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": false
218
+ },
219
+ "50279": {
220
+ "content": "<|endoftext|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "50280": {
228
+ "content": "[UNK]",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "50281": {
236
+ "content": "[CLS]",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "50282": {
244
+ "content": "[SEP]",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "50283": {
252
+ "content": "[PAD]",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "50284": {
260
+ "content": "[MASK]",
261
+ "lstrip": true,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "50285": {
268
+ "content": "[unused0]",
269
+ "lstrip": false,
270
+ "normalized": true,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": false
274
+ },
275
+ "50286": {
276
+ "content": "[unused1]",
277
+ "lstrip": false,
278
+ "normalized": true,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": false
282
+ },
283
+ "50287": {
284
+ "content": "[unused2]",
285
+ "lstrip": false,
286
+ "normalized": true,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": false
290
+ },
291
+ "50288": {
292
+ "content": "[unused3]",
293
+ "lstrip": false,
294
+ "normalized": true,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": false
298
+ },
299
+ "50289": {
300
+ "content": "[unused4]",
301
+ "lstrip": false,
302
+ "normalized": true,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": false
306
+ },
307
+ "50290": {
308
+ "content": "[unused5]",
309
+ "lstrip": false,
310
+ "normalized": true,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": false
314
+ },
315
+ "50291": {
316
+ "content": "[unused6]",
317
+ "lstrip": false,
318
+ "normalized": true,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": false
322
+ },
323
+ "50292": {
324
+ "content": "[unused7]",
325
+ "lstrip": false,
326
+ "normalized": true,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": false
330
+ },
331
+ "50293": {
332
+ "content": "[unused8]",
333
+ "lstrip": false,
334
+ "normalized": true,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": false
338
+ },
339
+ "50294": {
340
+ "content": "[unused9]",
341
+ "lstrip": false,
342
+ "normalized": true,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": false
346
+ },
347
+ "50295": {
348
+ "content": "[unused10]",
349
+ "lstrip": false,
350
+ "normalized": true,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": false
354
+ },
355
+ "50296": {
356
+ "content": "[unused11]",
357
+ "lstrip": false,
358
+ "normalized": true,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": false
362
+ },
363
+ "50297": {
364
+ "content": "[unused12]",
365
+ "lstrip": false,
366
+ "normalized": true,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": false
370
+ },
371
+ "50298": {
372
+ "content": "[unused13]",
373
+ "lstrip": false,
374
+ "normalized": true,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": false
378
+ },
379
+ "50299": {
380
+ "content": "[unused14]",
381
+ "lstrip": false,
382
+ "normalized": true,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": false
386
+ },
387
+ "50300": {
388
+ "content": "[unused15]",
389
+ "lstrip": false,
390
+ "normalized": true,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": false
394
+ },
395
+ "50301": {
396
+ "content": "[unused16]",
397
+ "lstrip": false,
398
+ "normalized": true,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": false
402
+ },
403
+ "50302": {
404
+ "content": "[unused17]",
405
+ "lstrip": false,
406
+ "normalized": true,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": false
410
+ },
411
+ "50303": {
412
+ "content": "[unused18]",
413
+ "lstrip": false,
414
+ "normalized": true,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": false
418
+ },
419
+ "50304": {
420
+ "content": "[unused19]",
421
+ "lstrip": false,
422
+ "normalized": true,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": false
426
+ },
427
+ "50305": {
428
+ "content": "[unused20]",
429
+ "lstrip": false,
430
+ "normalized": true,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": false
434
+ },
435
+ "50306": {
436
+ "content": "[unused21]",
437
+ "lstrip": false,
438
+ "normalized": true,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": false
442
+ },
443
+ "50307": {
444
+ "content": "[unused22]",
445
+ "lstrip": false,
446
+ "normalized": true,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": false
450
+ },
451
+ "50308": {
452
+ "content": "[unused23]",
453
+ "lstrip": false,
454
+ "normalized": true,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": false
458
+ },
459
+ "50309": {
460
+ "content": "[unused24]",
461
+ "lstrip": false,
462
+ "normalized": true,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": false
466
+ },
467
+ "50310": {
468
+ "content": "[unused25]",
469
+ "lstrip": false,
470
+ "normalized": true,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": false
474
+ },
475
+ "50311": {
476
+ "content": "[unused26]",
477
+ "lstrip": false,
478
+ "normalized": true,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": false
482
+ },
483
+ "50312": {
484
+ "content": "[unused27]",
485
+ "lstrip": false,
486
+ "normalized": true,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": false
490
+ },
491
+ "50313": {
492
+ "content": "[unused28]",
493
+ "lstrip": false,
494
+ "normalized": true,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": false
498
+ },
499
+ "50314": {
500
+ "content": "[unused29]",
501
+ "lstrip": false,
502
+ "normalized": true,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": false
506
+ },
507
+ "50315": {
508
+ "content": "[unused30]",
509
+ "lstrip": false,
510
+ "normalized": true,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": false
514
+ },
515
+ "50316": {
516
+ "content": "[unused31]",
517
+ "lstrip": false,
518
+ "normalized": true,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": false
522
+ },
523
+ "50317": {
524
+ "content": "[unused32]",
525
+ "lstrip": false,
526
+ "normalized": true,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": false
530
+ },
531
+ "50318": {
532
+ "content": "[unused33]",
533
+ "lstrip": false,
534
+ "normalized": true,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": false
538
+ },
539
+ "50319": {
540
+ "content": "[unused34]",
541
+ "lstrip": false,
542
+ "normalized": true,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": false
546
+ },
547
+ "50320": {
548
+ "content": "[unused35]",
549
+ "lstrip": false,
550
+ "normalized": true,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": false
554
+ },
555
+ "50321": {
556
+ "content": "[unused36]",
557
+ "lstrip": false,
558
+ "normalized": true,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": false
562
+ },
563
+ "50322": {
564
+ "content": "[unused37]",
565
+ "lstrip": false,
566
+ "normalized": true,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": false
570
+ },
571
+ "50323": {
572
+ "content": "[unused38]",
573
+ "lstrip": false,
574
+ "normalized": true,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": false
578
+ },
579
+ "50324": {
580
+ "content": "[unused39]",
581
+ "lstrip": false,
582
+ "normalized": true,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": false
586
+ },
587
+ "50325": {
588
+ "content": "[unused40]",
589
+ "lstrip": false,
590
+ "normalized": true,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": false
594
+ },
595
+ "50326": {
596
+ "content": "[unused41]",
597
+ "lstrip": false,
598
+ "normalized": true,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": false
602
+ },
603
+ "50327": {
604
+ "content": "[unused42]",
605
+ "lstrip": false,
606
+ "normalized": true,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": false
610
+ },
611
+ "50328": {
612
+ "content": "[unused43]",
613
+ "lstrip": false,
614
+ "normalized": true,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": false
618
+ },
619
+ "50329": {
620
+ "content": "[unused44]",
621
+ "lstrip": false,
622
+ "normalized": true,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": false
626
+ },
627
+ "50330": {
628
+ "content": "[unused45]",
629
+ "lstrip": false,
630
+ "normalized": true,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": false
634
+ },
635
+ "50331": {
636
+ "content": "[unused46]",
637
+ "lstrip": false,
638
+ "normalized": true,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": false
642
+ },
643
+ "50332": {
644
+ "content": "[unused47]",
645
+ "lstrip": false,
646
+ "normalized": true,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": false
650
+ },
651
+ "50333": {
652
+ "content": "[unused48]",
653
+ "lstrip": false,
654
+ "normalized": true,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": false
658
+ },
659
+ "50334": {
660
+ "content": "[unused49]",
661
+ "lstrip": false,
662
+ "normalized": true,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": false
666
+ },
667
+ "50335": {
668
+ "content": "[unused50]",
669
+ "lstrip": false,
670
+ "normalized": true,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": false
674
+ },
675
+ "50336": {
676
+ "content": "[unused51]",
677
+ "lstrip": false,
678
+ "normalized": true,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": false
682
+ },
683
+ "50337": {
684
+ "content": "[unused52]",
685
+ "lstrip": false,
686
+ "normalized": true,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": false
690
+ },
691
+ "50338": {
692
+ "content": "[unused53]",
693
+ "lstrip": false,
694
+ "normalized": true,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": false
698
+ },
699
+ "50339": {
700
+ "content": "[unused54]",
701
+ "lstrip": false,
702
+ "normalized": true,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": false
706
+ },
707
+ "50340": {
708
+ "content": "[unused55]",
709
+ "lstrip": false,
710
+ "normalized": true,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": false
714
+ },
715
+ "50341": {
716
+ "content": "[unused56]",
717
+ "lstrip": false,
718
+ "normalized": true,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": false
722
+ },
723
+ "50342": {
724
+ "content": "[unused57]",
725
+ "lstrip": false,
726
+ "normalized": true,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": false
730
+ },
731
+ "50343": {
732
+ "content": "[unused58]",
733
+ "lstrip": false,
734
+ "normalized": true,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": false
738
+ },
739
+ "50344": {
740
+ "content": "[unused59]",
741
+ "lstrip": false,
742
+ "normalized": true,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": false
746
+ },
747
+ "50345": {
748
+ "content": "[unused60]",
749
+ "lstrip": false,
750
+ "normalized": true,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": false
754
+ },
755
+ "50346": {
756
+ "content": "[unused61]",
757
+ "lstrip": false,
758
+ "normalized": true,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": false
762
+ },
763
+ "50347": {
764
+ "content": "[unused62]",
765
+ "lstrip": false,
766
+ "normalized": true,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": false
770
+ },
771
+ "50348": {
772
+ "content": "[unused63]",
773
+ "lstrip": false,
774
+ "normalized": true,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": false
778
+ },
779
+ "50349": {
780
+ "content": "[unused64]",
781
+ "lstrip": false,
782
+ "normalized": true,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": false
786
+ },
787
+ "50350": {
788
+ "content": "[unused65]",
789
+ "lstrip": false,
790
+ "normalized": true,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": false
794
+ },
795
+ "50351": {
796
+ "content": "[unused66]",
797
+ "lstrip": false,
798
+ "normalized": true,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": false
802
+ },
803
+ "50352": {
804
+ "content": "[unused67]",
805
+ "lstrip": false,
806
+ "normalized": true,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": false
810
+ },
811
+ "50353": {
812
+ "content": "[unused68]",
813
+ "lstrip": false,
814
+ "normalized": true,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": false
818
+ },
819
+ "50354": {
820
+ "content": "[unused69]",
821
+ "lstrip": false,
822
+ "normalized": true,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": false
826
+ },
827
+ "50355": {
828
+ "content": "[unused70]",
829
+ "lstrip": false,
830
+ "normalized": true,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": false
834
+ },
835
+ "50356": {
836
+ "content": "[unused71]",
837
+ "lstrip": false,
838
+ "normalized": true,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": false
842
+ },
843
+ "50357": {
844
+ "content": "[unused72]",
845
+ "lstrip": false,
846
+ "normalized": true,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": false
850
+ },
851
+ "50358": {
852
+ "content": "[unused73]",
853
+ "lstrip": false,
854
+ "normalized": true,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": false
858
+ },
859
+ "50359": {
860
+ "content": "[unused74]",
861
+ "lstrip": false,
862
+ "normalized": true,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": false
866
+ },
867
+ "50360": {
868
+ "content": "[unused75]",
869
+ "lstrip": false,
870
+ "normalized": true,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": false
874
+ },
875
+ "50361": {
876
+ "content": "[unused76]",
877
+ "lstrip": false,
878
+ "normalized": true,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": false
882
+ },
883
+ "50362": {
884
+ "content": "[unused77]",
885
+ "lstrip": false,
886
+ "normalized": true,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": false
890
+ },
891
+ "50363": {
892
+ "content": "[unused78]",
893
+ "lstrip": false,
894
+ "normalized": true,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": false
898
+ },
899
+ "50364": {
900
+ "content": "[unused79]",
901
+ "lstrip": false,
902
+ "normalized": true,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": false
906
+ },
907
+ "50365": {
908
+ "content": "[unused80]",
909
+ "lstrip": false,
910
+ "normalized": true,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": false
914
+ },
915
+ "50366": {
916
+ "content": "[unused81]",
917
+ "lstrip": false,
918
+ "normalized": true,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": false
922
+ },
923
+ "50367": {
924
+ "content": "[unused82]",
925
+ "lstrip": false,
926
+ "normalized": true,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": false
930
+ }
931
+ },
932
+ "clean_up_tokenization_spaces": true,
933
+ "cls_token": "[CLS]",
934
+ "extra_special_tokens": {},
935
+ "mask_token": "[MASK]",
936
+ "max_length": 1024,
937
+ "model_input_names": [
938
+ "input_ids",
939
+ "attention_mask"
940
+ ],
941
+ "model_max_length": 1024,
942
+ "pad_to_multiple_of": null,
943
+ "pad_token": "[PAD]",
944
+ "pad_token_type_id": 0,
945
+ "padding_side": "right",
946
+ "sep_token": "[SEP]",
947
+ "stride": 0,
948
+ "tokenizer_class": "PreTrainedTokenizer",
949
+ "truncation_side": "right",
950
+ "truncation_strategy": "longest_first",
951
+ "unk_token": "[UNK]"
952
+ }