Commit ad9fbb7 by aidamian · verified · 1 Parent(s): 87338a8

Update sentinel-mb-c-d11 release bundle

README.md CHANGED
@@ -1,7 +1,114 @@
  ---
- license: apache-2.0
+ license: other
  language:
  - en
  base_model:
- - FacebookAI/roberta-large
- ---
+ - answerdotai/ModernBERT-base
+ library_name: transformers
+ pipeline_tag: text-classification
+ tags:
+ - custom
+ - compliance
+ - finance
+ - risk-detection
+ - text-classification
+ - sentinel-stage-a
+ - limited-functionality
+ - model-version:sentinel-mb-c-d11-20260424
+ widget:
+ - text: "Subject: Portfolio review follow-up. Hi Karen, following our quarterly review, I recommend trimming part of the concentrated technology position and reallocating the proceeds into the municipal bond ladder we discussed. This should reduce single-name exposure while keeping the account aligned with your income objective."
+   example_title: "Portfolio review follow-up"
+ - text: "Subject: Structured note opportunity. Hi Michael, I wanted to flag a new structured note that may fit the income sleeve of your portfolio. The note offers enhanced coupon potential, but it is subject to issuer credit risk, market risk, and downside participation if the reference index falls below the stated buffer."
+   example_title: "Structured note email"
+ ---
+
+ # sentinel-01-pub
+
+ `sentinel-01-pub` is a limited-functionality public Aurelex Sentinel Stage A model for demonstration and evaluation of wealth-management communications risk review. It is not a production Aurelex model and must not be treated as legal, compliance, or investment advice.
+
+ ## Publisher And Ownership
+
+ - Model developed by Aurelex AI Corp.
+ - Published in collaboration with Ratio1.
+ - Contact: [hello@aurelexai.com](mailto:hello@aurelexai.com).
+ - All intellectual property rights in the model remain with Aurelex AI Corp.
+
+ This repository is intended to publish only the designated limited-functionality model artifact and its required Hugging Face runtime files. It does not include proprietary training data, system prompts, production models, or internal Aurelex architecture details beyond the information needed to load and evaluate this public artifact.
+
+ ## Identity
+
+ - Repo ID: `AurelexAI/sentinel-01-pub`
+ - Model key: `sentinel-mb-c-d11`
+ - Model version: `sentinel-mb-c-d11-20260424`
+ - Release channel: `sentinel-01-pub`
+ - Base model: `answerdotai/ModernBERT-base`
+ - Artifact format: `transformers_end_to_end`
+ - Publication status: public, approved by Aurelex on 2026-04-28
+
+ The model was selected as a public, lower-capacity, limited-functionality variant. It is separate from Aurelex production channels and full-featured internal models.
+
+ ## Loading From Hugging Face
+
+ ```python
+ from transformers import pipeline
+
+ MODEL_ID = "AurelexAI/sentinel-01-pub"
+
+ audit = pipeline(
+     "sentinel-stage-a",
+     model=MODEL_ID,
+     tokenizer=MODEL_ID,
+     trust_remote_code=True,
+ )
+
+ result = audit(
+     "Subject: Portfolio review follow-up. Hi Karen, following our quarterly "
+     "review, I recommend trimming part of the concentrated technology position "
+     "and reallocating the proceeds into the municipal bond ladder we discussed."
+ )
+ model_version = getattr(audit.model.config, "model_version", MODEL_ID)
+
+ print(result)
+ print(model_version)
+ ```
+
+ For reproducible evaluation, pin a reviewed Hub commit with `revision="<commit_sha>"`.
+
+ ## Outputs
+
+ The pipeline returns a JSON-serializable dictionary for Sentinel Stage A labels: `violation`, `severity`, `domain`, `subtype`, `jurisdiction`, `why`, `impacted_principles`, `remediation_actions`, `content_type`, `audience_segment`, `detection_difficulty`, and `aggravating_factors`.
+
+ These outputs are risk-review signals for human review. They are not final compliance determinations.
+
+ ## Evaluation
+
+ Dataset: `2026-04-07-final-audit-clear-v1`, test split size `150`.
+
+ | Metric | Test |
+ | --- | ---: |
+ | Stage-A | `0.751` |
+ | Violation F1 | `0.993` |
+ | Severity Acc | `0.727` |
+ | Domain F1 | `0.803` |
+ | Subtype F1 | `0.738` |
+ | Jurisdiction Acc | `0.740` |
+ | Why F1 | `0.684` |
+ | Principles F1 | `0.703` |
+ | Remediation F1 | `0.618` |
+ | Aggravating F1 | `0.655` |
+
+ ## Repository Contents
+
+ - `model.safetensors`: serialized public model artifact.
+ - `config.json`: custom Transformers config, pipeline registration, and public release metadata.
+ - `configuration_sentinel.py`, `modeling_sentinel.py`, `pipeline_sentinel.py`: Hugging Face runtime code required to load this artifact.
+ - tokenizer files: tokenizer assets used by the model.
+ - `metadata.json`: dataset signature, output signature, thresholds, and release metadata.
+ - `metrics.json`: evaluation metrics for the selected model.
+ - `results.md`: human-readable evaluation artifact.
+
+ ## Intended Use And Limits
+
+ This model is intended for public demonstration and evaluation of automated first-pass risk signals in wealth-management communications. It is scoped to English client-communications examples under the dataset contract listed above.
+
+ Do not use this model as a legal decision-maker, a substitute for qualified compliance review, a general-purpose moderation system, or evidence of performance outside the stated dataset scope. Aurelex AI Corp may request modification or removal of this repository at any time.
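Reviewer note on the README "Outputs" section: it names twelve Stage A keys but does not document the value shapes. A minimal consumer-side sketch follows, assuming the pipeline result is a flat dictionary keyed by those names. It checks only key presence and JSON-serializability; `check_stage_a_result` and the sample dictionary are hypothetical and not part of this repository.

```python
import json

# Key names copied from the README "Outputs" section.
STAGE_A_KEYS = (
    "violation", "severity", "domain", "subtype", "jurisdiction", "why",
    "impacted_principles", "remediation_actions", "content_type",
    "audience_segment", "detection_difficulty", "aggravating_factors",
)


def check_stage_a_result(result):
    """Return the documented keys that are missing from `result`.

    Hypothetical helper. Raises TypeError if `result` cannot be
    serialized to JSON, since the README promises a serializable dict.
    """
    json.dumps(result)  # raises TypeError on non-serializable values
    return [key for key in STAGE_A_KEYS if key not in result]


if __name__ == "__main__":
    # Made-up result: real value types may differ from None.
    sample = {key: None for key in STAGE_A_KEYS}
    sample.pop("why")
    print(check_stage_a_result(sample))
```

A check like this is cheap to run before handing results to a human reviewer, and it fails loudly if a pinned revision changes the output contract.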
config.json ADDED
@@ -0,0 +1,395 @@
+ {
+   "architectures": [
+     "SentinelStageAModel"
+   ],
+   "auto_map": {
+     "AutoConfig": "configuration_sentinel.SentinelConfig",
+     "AutoModel": "modeling_sentinel.SentinelStageAModel"
+   },
+   "classifier_dropout": 0.1,
+   "custom_pipelines": {
+     "sentinel-stage-a": {
+       "impl": "pipeline_sentinel.SentinelStageAPipeline",
+       "pt": [
+         "AutoModel"
+       ],
+       "type": "text"
+     }
+   },
+   "dataset_signature": {
+     "counts": {
+       "dev": 150,
+       "test": 150,
+       "train": 900
+     },
+     "distribution": {
+       "dev": {
+         "clean": 8,
+         "risky": 142
+       },
+       "test": {
+         "clean": 8,
+         "risky": 142
+       },
+       "train": {
+         "clean": 297,
+         "risky": 603
+       }
+     },
+     "generator_version": "2026-04-07-final-audit-clear-v1"
+   },
+   "encoder_code_revision": null,
+   "encoder_config": {
+     "_attn_implementation_autoset": true,
+     "_name_or_path": "answerdotai/ModernBERT-base",
+     "add_cross_attention": false,
+     "architectures": [
+       "ModernBertForMaskedLM"
+     ],
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "bad_words_ids": null,
+     "begin_suppress_tokens": null,
+     "bos_token_id": 50281,
+     "chunk_size_feed_forward": 0,
+     "classifier_activation": "gelu",
+     "classifier_bias": false,
+     "classifier_dropout": 0.0,
+     "classifier_pooling": "mean",
+     "cls_token_id": 50281,
+     "cross_attention_hidden_size": null,
+     "decoder_bias": true,
+     "decoder_start_token_id": null,
+     "deterministic_flash_attn": false,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "early_stopping": false,
+     "embedding_dropout": 0.0,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": 50282,
+     "exponential_decay_length_penalty": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "global_attn_every_n_layers": 3,
+     "global_rope_theta": 160000.0,
+     "gradient_checkpointing": false,
+     "hidden_activation": "gelu",
+     "hidden_size": 768,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "initializer_cutoff_factor": 2.0,
+     "initializer_range": 0.02,
+     "intermediate_size": 1152,
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "layer_norm_eps": 1e-05,
+     "length_penalty": 1.0,
+     "local_attention": 128,
+     "local_rope_theta": 10000.0,
+     "max_length": 20,
+     "max_position_embeddings": 8192,
+     "min_length": 0,
+     "mlp_bias": false,
+     "mlp_dropout": 0.0,
+     "model_type": "modernbert",
+     "no_repeat_ngram_size": 0,
+     "norm_bias": false,
+     "norm_eps": 1e-05,
+     "num_attention_heads": 12,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_hidden_layers": 22,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": 50283,
+     "position_embedding_type": "absolute",
+     "prefix": null,
+     "problem_type": null,
+     "pruned_heads": {},
+     "reference_compile": null,
+     "remove_invalid_values": false,
+     "repad_logits_with_grad": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sep_token_id": 50282,
+     "sparse_pred_ignore_index": -100,
+     "sparse_prediction": false,
+     "suppress_tokens": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": "float32",
+     "torchscript": false,
+     "transformers_version": "4.48.3",
+     "typical_p": 1.0,
+     "use_bfloat16": false,
+     "vocab_size": 50368
+   },
+   "encoder_config_overrides": {},
+   "encoder_model_name": "answerdotai/ModernBERT-base",
+   "encoder_revision": null,
+   "encoder_trust_remote_code": false,
+   "head_code": "c",
+   "head_div": 1,
+   "head_dropout": 0.1,
+   "head_mul": 1,
+   "head_skip": true,
+   "head_type": "columnar",
+   "head_variant": "d11",
+   "max_length": 512,
+   "model_key": "sentinel-mb-c-d11",
+   "model_type": "sentinel_stage_a",
+   "model_version": "sentinel-mb-c-d11-20260424",
+   "output_heads": [
+     "violation",
+     "severity",
+     "domain",
+     "subtype",
+     "jurisdiction",
+     "why",
+     "impacted_principles",
+     "remediation_actions",
+     "content_type",
+     "audience_segment",
+     "detection_difficulty",
+     "aggravating_factors"
+   ],
+   "output_signature": {
+     "aggravating_factors": {
+       "labels": [
+         "intentional",
+         "reckless",
+         "negligent",
+         "concealment_present",
+         "customer_harm_potential",
+         "financial_benefit_to_respondent",
+         "vulnerable_client",
+         "pattern_or_duration"
+       ],
+       "type": "multilabel"
+     },
+     "audience_segment": {
+       "labels": [
+         "client",
+         "internal",
+         "prospect_or_investor",
+         "public",
+         "third_party"
+       ],
+       "type": "multiclass"
+     },
+     "content_type": {
+       "labels": [
+         "email",
+         "message"
+       ],
+       "type": "multiclass"
+     },
+     "detection_difficulty": {
+       "labels": [
+         "obvious",
+         "moderate",
+         "subtle"
+       ],
+       "type": "multiclass"
+     },
+     "domain": {
+       "labels": [
+         "performance_claims_forecasting",
+         "investment_advice_suitability",
+         "conflicts_inducements",
+         "marketing_solicitation_advertising",
+         "selective_disclosure_fair_access",
+         "mnpi_insider_trading",
+         "recordkeeping_supervision",
+         "ai_automation_capability_claims",
+         "privacy_confidentiality",
+         "cybersecurity_internal_controls",
+         "employment_favoritism_role_conflict",
+         "aml_and_suspicious_activity",
+         "other_unknown"
+       ],
+       "type": "multiclass"
+     },
+     "impacted_principles": {
+       "labels": [
+         "truthful_non_misleading_communications",
+         "balanced_risk_reward_presentation",
+         "no_performance_guarantees_or_promissory_language",
+         "registration_and_scope_of_advice",
+         "duty_of_loyalty_conflict_disclosure",
+         "fair_access_to_material_information",
+         "insider_trading_and_mnpi_controls",
+         "supervision_and_books_records",
+         "privacy_confidentiality_and_secure_handling",
+         "security_control_integrity",
+         "role_separation_and_fair_access_in_academia",
+         "non_coercion_and_no_undue_influence",
+         "accurate_ai_capability_and_human_oversight",
+         "client_vulnerability_and_exploitation_prevention",
+         "aml_and_sanctions_compliance"
+       ],
+       "type": "multilabel"
+     },
+     "jurisdiction": {
+       "labels": [
+         "US",
+         "EU",
+         "UK",
+         "Other",
+         "Unknown"
+       ],
+       "type": "multiclass"
+     },
+     "remediation_actions": {
+       "labels": [
+         "add_forward_looking_disclaimer",
+         "reframe_as_scenarios_not_expectations",
+         "add_balanced_risk_and_downside_section",
+         "remove_or_soften_guarantee_language",
+         "remove_personalized_recommendations",
+         "add_registered_advice_boundary_language",
+         "disclose_conflicts_and_compensation",
+         "add_fees_costs_and_alternatives_comparison",
+         "use_standardized_approved_performance_materials",
+         "add_performance_methodology_and_gross_net_context",
+         "avoid_selective_disclosure_share_broadly",
+         "escalate_mnpi_to_compliance_and_halt",
+         "keep_discussion_on_retained_channels",
+         "require_formal_preapproval_before_send",
+         "remove_pressure_scarcity_and_use_factual_timeline",
+         "substantiation_or_remove_credibility_claims",
+         "add_testimonial_endorsement_and_rating_disclosure",
+         "make_required_disclosure_clear_and_prominent",
+         "avoid_minimizing_compliance_or_diligence",
+         "clarify_ai_is_assistive_with_human_review",
+         "remove_claims_that_ai_eliminates_risk",
+         "redact_and_minimize_sensitive_data",
+         "use_secure_transfer_and_limit_access",
+         "avoid_sharing_internal_controls_or_sanitize",
+         "route_academic_opportunities_through_institution",
+         "separate_recommendation_letters_from_work",
+         "assess_cost_to_equity_against_client_profile",
+         "flag_for_elder_exploitation_review_and_hold",
+         "assess_sar_filing_obligation_and_escalate",
+         "initiate_breach_notification_review_and_timeline",
+         "remove_provisions_impeding_regulatory_communications"
+       ],
+       "type": "multilabel"
+     },
+     "severity": {
+       "labels": [
+         "sev_0_compliant_or_ok",
+         "sev_1_minor",
+         "sev_2_moderate",
+         "sev_3_high"
+       ],
+       "type": "multiclass"
+     },
+     "subtype": {
+       "labels": [
+         "speculative_outcomes_unqualified",
+         "implicit_or_explicit_guarantee",
+         "risk_context_omitted_or_unbalanced",
+         "unregistered_personalized_investment_advice",
+         "undisclosed_economic_conflict_or_referral",
+         "pressure_or_coercion",
+         "selective_disclosure",
+         "mnpi_misuse_or_encouragement",
+         "recordkeeping_or_preapproval_evasion",
+         "ai_autonomy_or_safety_overstatement",
+         "credentials_validation_or_compliance_misrepresentation",
+         "confidential_data_leakage",
+         "internal_controls_or_exception_process_leakage",
+         "academic_commercial_role_blurring_or_quid_pro_quo",
+         "improper_solicitation_offering_pressure",
+         "excessive_trading_or_account_churning",
+         "product_switching_without_cost_benefit_analysis",
+         "dual_registrant_capacity_or_wrap_fee_conflict_confusion",
+         "elder_exploitation_or_vulnerable_client_signal",
+         "suspicious_activity_indicator_or_structuring",
+         "influencer_or_social_media_promotion_compliance_failure",
+         "crypto_asset_misrepresentation_or_inadequate_disclosure",
+         "other_unknown"
+       ],
+       "type": "multiclass"
+     },
+     "violation": {
+       "type": "binary"
+     },
+     "why": {
+       "labels": [
+         "forward_looking_statement_unqualified",
+         "guarantee_or_assurance_language",
+         "omits_material_risk_or_downside",
+         "implies_downside_protection_or_no_drawdown",
+         "cherry_picks_performance_period",
+         "omits_performance_methodology_or_gross_net_context",
+         "personalized_trade_or_allocation_recommendation",
+         "timing_or_sizing_guidance",
+         "creates_implied_advisory_relationship",
+         "conflict_not_disclosed",
+         "referral_relationship_not_disclosed",
+         "omits_fees_costs_or_reasonably_available_alternatives",
+         "selective_private_performance_or_fundraising_update",
+         "off_the_record_or_not_in_writing_language",
+         "mnpi_possession_indicated",
+         "encourages_action_before_public_release",
+         "avoid_recordkeeping_channel_shift",
+         "bypasses_required_preapproval",
+         "pressure_scarcity_urgency",
+         "unsubstantiated_social_proof_or_validation",
+         "omits_testimonial_endorsement_or_rating_disclosure",
+         "obscures_required_disclosure_or_form_crs",
+         "minimizes_need_for_diligence_or_compliance",
+         "overstates_ai_capability_or_removes_human_oversight",
+         "claims_compliance_risk_eliminated",
+         "shares_sensitive_personal_or_financial_data",
+         "violates_need_to_know_data_minimization",
+         "shares_sensitive_internal_controls_or_exceptions",
+         "role_power_imbalance_or_favoritism",
+         "excessive_trading_cost_to_equity",
+         "inadequate_customer_profile_or_suitability_basis",
+         "exploits_vulnerable_or_elderly_client",
+         "aml_suspicious_activity_indicator",
+         "omits_switching_costs_and_product_comparison",
+         "conflict_language_understates_actual_relationship",
+         "omits_influencer_compensation_or_affiliation_disclosure",
+         "misrepresents_sipc_or_regulatory_protection_for_crypto",
+         "data_breach_notification_obligation_triggered",
+         "impedes_regulatory_reporting_or_whistleblower_rights"
+       ],
+       "type": "multilabel"
+     }
+   },
+   "projection_size": 640,
+   "release_alias_of": null,
+   "release_channel": "sentinel-01-pub",
+   "release_repo_id": "AurelexAI/sentinel-01-pub",
+   "thresholds": {
+     "aggravating_factors": 0.4,
+     "impacted_principles": 0.7,
+     "remediation_actions": 0.5,
+     "violation": 0.5,
+     "why": 0.55
+   },
+   "tokenizer_class": "PreTrainedTokenizerFast",
+   "torch_dtype": "float32",
+   "trainable_head_params": 14325653,
+   "transformers_version": "4.48.3"
+ }
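Reviewer note on the `thresholds` block in `config.json`: it gives one cutoff for `violation` and one for each multilabel head. How `pipeline_sentinel.py` applies them is not visible in this diff. The sketch below assumes the common rule that a label is emitted when its sigmoid probability is greater than or equal to the head's cutoff. `select_labels` and the logits are hypothetical.

```python
import math

# Values copied from the "thresholds" block of config.json.
THRESHOLDS = {
    "aggravating_factors": 0.4,
    "impacted_principles": 0.7,
    "remediation_actions": 0.5,
    "violation": 0.5,
    "why": 0.55,
}


def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_labels(head, labels, logits):
    """Keep labels whose sigmoid(logit) >= the head's cutoff.

    Assumed decision rule, not confirmed by the repository code.
    """
    if len(labels) != len(logits):
        raise ValueError("labels and logits must have the same length")
    cutoff = THRESHOLDS[head]
    return [name for name, logit in zip(labels, logits) if sigmoid(logit) >= cutoff]


if __name__ == "__main__":
    names = ["intentional", "reckless", "negligent"]
    made_up_logits = [2.0, -0.2, -3.0]
    print(select_labels("aggravating_factors", names, made_up_logits))
```

Under this assumption the same logit can pass one head and fail another: a logit of `-0.2` (probability about 0.45) clears the `aggravating_factors` cutoff of 0.4 but not the `why` cutoff of 0.55.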
configuration_sentinel.py ADDED
@@ -0,0 +1,74 @@
+ """Configuration for self-contained Sentinel Stage A Transformers models."""
+
+ from __future__ import annotations
+
+ from typing import Any
+
+ from transformers import PretrainedConfig
+
+
+ class SentinelConfig(PretrainedConfig):
+     """Transformers config for an end-to-end Sentinel Stage A classifier."""
+
+     model_type = "sentinel_stage_a"
+
+     def __init__(
+         self,
+         model_key: str = "sentinel-stage-a",
+         model_version: str | None = None,
+         release_repo_id: str | None = None,
+         release_channel: str | None = None,
+         release_alias_of: str | None = None,
+         encoder_model_name: str = "",
+         encoder_revision: str | None = None,
+         encoder_code_revision: str | None = None,
+         encoder_trust_remote_code: bool = False,
+         encoder_config_overrides: dict[str, Any] | None = None,
+         encoder_config: dict[str, Any] | None = None,
+         head_type: str = "direct",
+         head_code: str | None = None,
+         head_variant: str | None = None,
+         head_dropout: float | None = None,
+         head_div: int = 1,
+         head_mul: int = 1,
+         head_skip: bool = False,
+         projection_size: int = 768,
+         classifier_dropout: float = 0.10,
+         max_length: int = 512,
+         output_heads: list[str] | None = None,
+         output_signature: dict[str, Any] | None = None,
+         thresholds: dict[str, float] | None = None,
+         dataset_signature: dict[str, Any] | None = None,
+         trainable_head_params: int | None = None,
+         **kwargs: Any,
+     ) -> None:
+         super().__init__(**kwargs)
+         self.model_key = model_key
+         self.model_version = model_version or model_key
+         self.release_repo_id = release_repo_id
+         self.release_channel = release_channel
+         self.release_alias_of = release_alias_of
+         self.encoder_model_name = encoder_model_name
+         self.encoder_revision = encoder_revision
+         self.encoder_code_revision = encoder_code_revision
+         self.encoder_trust_remote_code = bool(encoder_trust_remote_code)
+         self.encoder_config_overrides = encoder_config_overrides or {}
+         self.encoder_config = encoder_config or {}
+         self.head_type = head_type
+         self.head_code = head_code or {"direct": "d", "recombine": "r", "columnar": "c"}.get(
+             head_type,
+             head_type,
+         )
+         self.head_variant = head_variant
+         self.head_dropout = float(classifier_dropout if head_dropout is None else head_dropout)
+         self.head_div = int(head_div)
+         self.head_mul = int(head_mul)
+         self.head_skip = bool(head_skip)
+         self.projection_size = int(projection_size)
+         self.classifier_dropout = float(self.head_dropout)
+         self.max_length = int(max_length)
+         self.output_heads = output_heads or list((output_signature or {}).keys())
+         self.output_signature = output_signature or {}
+         self.thresholds = thresholds or {}
+         self.dataset_signature = dataset_signature or {}
+         self.trainable_head_params = trainable_head_params
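Reviewer note on `SentinelConfig.__init__`: several arguments fall back to other values when omitted, and `classifier_dropout` is overwritten by the resolved `head_dropout`. The dependency-free sketch below mirrors only that fallback logic so it can be read and run without `transformers`. `resolve_defaults` is a hypothetical stand-in, not a function in this repository.

```python
from typing import Any, Dict, List, Optional

# Same mapping the constructor uses to derive head_code from head_type.
HEAD_CODES = {"direct": "d", "recombine": "r", "columnar": "c"}


def resolve_defaults(
    model_key: str = "sentinel-stage-a",
    model_version: Optional[str] = None,
    head_type: str = "direct",
    head_code: Optional[str] = None,
    head_dropout: Optional[float] = None,
    classifier_dropout: float = 0.10,
    output_heads: Optional[List[str]] = None,
    output_signature: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Mirror the fallback rules of SentinelConfig.__init__ (illustrative only)."""
    # head_dropout wins when given; otherwise classifier_dropout is used for both.
    dropout = float(classifier_dropout if head_dropout is None else head_dropout)
    return {
        # A missing version falls back to the model key.
        "model_version": model_version or model_key,
        # Unknown head types fall back to the head_type string itself.
        "head_code": head_code or HEAD_CODES.get(head_type, head_type),
        "head_dropout": dropout,
        "classifier_dropout": dropout,
        # Heads default to the key order of output_signature.
        "output_heads": output_heads or list((output_signature or {}).keys()),
    }


if __name__ == "__main__":
    print(resolve_defaults(model_key="sentinel-mb-c-d11", head_type="columnar"))
```

One consequence worth noting: passing both `head_dropout` and a different `classifier_dropout` silently keeps the `head_dropout` value for both attributes.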
metadata.json ADDED
@@ -0,0 +1,589 @@
+ {
+   "checkpoint_format_version": 1,
+   "created_at": "2026-04-24T13:59:13",
+   "model_key": "sentinel-mb-c-d11",
+   "encoder_model": "answerdotai/ModernBERT-base",
+   "encoder_params_millions": 149.7,
+   "head_type": "columnar",
+   "head_code": "c",
+   "head_variant": "d11",
+   "head_dropout": 0.1,
+   "head_div": 1,
+   "head_mul": 1,
+   "head_skip": true,
+   "head_architecture": "funnel",
+   "model_family": "modernbert-base",
+   "projection_size": 640,
+   "trainable_head_params": 14325653,
+   "artifact_format": "transformers_end_to_end",
+   "end_to_end_serialized": true,
+   "dataset_signature": {
+     "generator_version": "2026-04-07-final-audit-clear-v1",
+     "counts": {
+       "train": 900,
+       "dev": 150,
+       "test": 150
+     },
+     "distribution": {
+       "train": {
+         "risky": 603,
+         "clean": 297
+       },
+       "dev": {
+         "risky": 142,
+         "clean": 8
+       },
+       "test": {
+         "risky": 142,
+         "clean": 8
+       }
+     }
+   },
+   "output_signature": {
+     "violation": {
+       "type": "binary"
+     },
+     "severity": {
+       "type": "multiclass",
+       "labels": [
+         "sev_0_compliant_or_ok",
+         "sev_1_minor",
+         "sev_2_moderate",
+         "sev_3_high"
+       ]
+     },
+     "domain": {
+       "type": "multiclass",
+       "labels": [
+         "performance_claims_forecasting",
+         "investment_advice_suitability",
+         "conflicts_inducements",
+         "marketing_solicitation_advertising",
+         "selective_disclosure_fair_access",
+         "mnpi_insider_trading",
+         "recordkeeping_supervision",
+         "ai_automation_capability_claims",
+         "privacy_confidentiality",
+         "cybersecurity_internal_controls",
+         "employment_favoritism_role_conflict",
+         "aml_and_suspicious_activity",
+         "other_unknown"
+       ]
+     },
+     "subtype": {
+       "type": "multiclass",
+       "labels": [
+         "speculative_outcomes_unqualified",
+         "implicit_or_explicit_guarantee",
+         "risk_context_omitted_or_unbalanced",
+         "unregistered_personalized_investment_advice",
+         "undisclosed_economic_conflict_or_referral",
+         "pressure_or_coercion",
+         "selective_disclosure",
+         "mnpi_misuse_or_encouragement",
+         "recordkeeping_or_preapproval_evasion",
+         "ai_autonomy_or_safety_overstatement",
+         "credentials_validation_or_compliance_misrepresentation",
+         "confidential_data_leakage",
+         "internal_controls_or_exception_process_leakage",
+         "academic_commercial_role_blurring_or_quid_pro_quo",
+         "improper_solicitation_offering_pressure",
+         "excessive_trading_or_account_churning",
+         "product_switching_without_cost_benefit_analysis",
+         "dual_registrant_capacity_or_wrap_fee_conflict_confusion",
+         "elder_exploitation_or_vulnerable_client_signal",
+         "suspicious_activity_indicator_or_structuring",
+         "influencer_or_social_media_promotion_compliance_failure",
+         "crypto_asset_misrepresentation_or_inadequate_disclosure",
+         "other_unknown"
+       ]
+     },
+     "jurisdiction": {
+       "type": "multiclass",
+       "labels": [
+         "US",
+         "EU",
+         "UK",
+         "Other",
+         "Unknown"
+       ]
+     },
+     "why": {
+       "type": "multilabel",
+       "labels": [
+         "forward_looking_statement_unqualified",
+         "guarantee_or_assurance_language",
+         "omits_material_risk_or_downside",
+         "implies_downside_protection_or_no_drawdown",
+         "cherry_picks_performance_period",
+         "omits_performance_methodology_or_gross_net_context",
+         "personalized_trade_or_allocation_recommendation",
+         "timing_or_sizing_guidance",
+         "creates_implied_advisory_relationship",
+         "conflict_not_disclosed",
+         "referral_relationship_not_disclosed",
+         "omits_fees_costs_or_reasonably_available_alternatives",
+         "selective_private_performance_or_fundraising_update",
+         "off_the_record_or_not_in_writing_language",
+         "mnpi_possession_indicated",
+         "encourages_action_before_public_release",
+         "avoid_recordkeeping_channel_shift",
+         "bypasses_required_preapproval",
+         "pressure_scarcity_urgency",
+         "unsubstantiated_social_proof_or_validation",
+         "omits_testimonial_endorsement_or_rating_disclosure",
+         "obscures_required_disclosure_or_form_crs",
+         "minimizes_need_for_diligence_or_compliance",
+         "overstates_ai_capability_or_removes_human_oversight",
+         "claims_compliance_risk_eliminated",
+         "shares_sensitive_personal_or_financial_data",
+         "violates_need_to_know_data_minimization",
+         "shares_sensitive_internal_controls_or_exceptions",
+         "role_power_imbalance_or_favoritism",
+         "excessive_trading_cost_to_equity",
+         "inadequate_customer_profile_or_suitability_basis",
+         "exploits_vulnerable_or_elderly_client",
+         "aml_suspicious_activity_indicator",
+         "omits_switching_costs_and_product_comparison",
+         "conflict_language_understates_actual_relationship",
+         "omits_influencer_compensation_or_affiliation_disclosure",
+         "misrepresents_sipc_or_regulatory_protection_for_crypto",
+         "data_breach_notification_obligation_triggered",
+         "impedes_regulatory_reporting_or_whistleblower_rights"
+       ]
+     },
+     "impacted_principles": {
+       "type": "multilabel",
+       "labels": [
+         "truthful_non_misleading_communications",
+         "balanced_risk_reward_presentation",
+         "no_performance_guarantees_or_promissory_language",
+         "registration_and_scope_of_advice",
+         "duty_of_loyalty_conflict_disclosure",
+         "fair_access_to_material_information",
+         "insider_trading_and_mnpi_controls",
+         "supervision_and_books_records",
+         "privacy_confidentiality_and_secure_handling",
+         "security_control_integrity",
+         "role_separation_and_fair_access_in_academia",
+         "non_coercion_and_no_undue_influence",
+         "accurate_ai_capability_and_human_oversight",
+         "client_vulnerability_and_exploitation_prevention",
+         "aml_and_sanctions_compliance"
+       ]
+     },
+     "remediation_actions": {
+       "type": "multilabel",
+       "labels": [
+         "add_forward_looking_disclaimer",
+         "reframe_as_scenarios_not_expectations",
+         "add_balanced_risk_and_downside_section",
+         "remove_or_soften_guarantee_language",
+         "remove_personalized_recommendations",
+         "add_registered_advice_boundary_language",
+         "disclose_conflicts_and_compensation",
+         "add_fees_costs_and_alternatives_comparison",
+         "use_standardized_approved_performance_materials",
+         "add_performance_methodology_and_gross_net_context",
+         "avoid_selective_disclosure_share_broadly",
+         "escalate_mnpi_to_compliance_and_halt",
+         "keep_discussion_on_retained_channels",
+         "require_formal_preapproval_before_send",
+         "remove_pressure_scarcity_and_use_factual_timeline",
+         "substantiation_or_remove_credibility_claims",
+         "add_testimonial_endorsement_and_rating_disclosure",
+         "make_required_disclosure_clear_and_prominent",
+         "avoid_minimizing_compliance_or_diligence",
+         "clarify_ai_is_assistive_with_human_review",
+         "remove_claims_that_ai_eliminates_risk",
+         "redact_and_minimize_sensitive_data",
+         "use_secure_transfer_and_limit_access",
+         "avoid_sharing_internal_controls_or_sanitize",
+         "route_academic_opportunities_through_institution",
+         "separate_recommendation_letters_from_work",
+         "assess_cost_to_equity_against_client_profile",
+         "flag_for_elder_exploitation_review_and_hold",
+         "assess_sar_filing_obligation_and_escalate",
+         "initiate_breach_notification_review_and_timeline",
+         "remove_provisions_impeding_regulatory_communications"
+       ]
+     },
+     "content_type": {
+       "type": "multiclass",
+       "labels": [
+         "email",
+         "message"
+       ]
+     },
+     "audience_segment": {
+       "type": "multiclass",
+       "labels": [
+         "client",
+         "internal",
+         "prospect_or_investor",
+         "public",
+         "third_party"
+       ]
+     },
+     "detection_difficulty": {
+       "type": "multiclass",
+       "labels": [
+         "obvious",
+         "moderate",
+         "subtle"
+       ]
+     },
+     "aggravating_factors": {
+       "type": "multilabel",
+       "labels": [
+         "intentional",
+         "reckless",
+         "negligent",
+         "concealment_present",
+         "customer_harm_potential",
+         "financial_benefit_to_respondent",
+         "vulnerable_client",
+         "pattern_or_duration"
+       ]
+     }
+   },
+   "label_groups": {
+     "severity": [
+       "sev_0_compliant_or_ok",
+       "sev_1_minor",
+       "sev_2_moderate",
+       "sev_3_high"
+     ],
+     "domain": [
+       "performance_claims_forecasting",
+       "investment_advice_suitability",
+       "conflicts_inducements",
+       "marketing_solicitation_advertising",
+       "selective_disclosure_fair_access",
+       "mnpi_insider_trading",
+       "recordkeeping_supervision",
+       "ai_automation_capability_claims",
+       "privacy_confidentiality",
+       "cybersecurity_internal_controls",
+       "employment_favoritism_role_conflict",
+       "aml_and_suspicious_activity",
+       "other_unknown"
+     ],
+     "subtype": [
+       "speculative_outcomes_unqualified",
+       "implicit_or_explicit_guarantee",
+       "risk_context_omitted_or_unbalanced",
+       "unregistered_personalized_investment_advice",
+       "undisclosed_economic_conflict_or_referral",
+       "pressure_or_coercion",
+       "selective_disclosure",
+       "mnpi_misuse_or_encouragement",
+       "recordkeeping_or_preapproval_evasion",
+       "ai_autonomy_or_safety_overstatement",
+       "credentials_validation_or_compliance_misrepresentation",
+       "confidential_data_leakage",
+       "internal_controls_or_exception_process_leakage",
+       "academic_commercial_role_blurring_or_quid_pro_quo",
+       "improper_solicitation_offering_pressure",
+       "excessive_trading_or_account_churning",
+       "product_switching_without_cost_benefit_analysis",
+       "dual_registrant_capacity_or_wrap_fee_conflict_confusion",
+       "elder_exploitation_or_vulnerable_client_signal",
+       "suspicious_activity_indicator_or_structuring",
+       "influencer_or_social_media_promotion_compliance_failure",
+       "crypto_asset_misrepresentation_or_inadequate_disclosure",
+       "other_unknown"
+     ],
+     "jurisdiction": [
+       "US",
+       "EU",
+ "UK",
301
+ "Other",
302
+ "Unknown"
303
+ ],
304
+ "why": [
305
+ "forward_looking_statement_unqualified",
306
+ "guarantee_or_assurance_language",
307
+ "omits_material_risk_or_downside",
308
+ "implies_downside_protection_or_no_drawdown",
309
+ "cherry_picks_performance_period",
310
+ "omits_performance_methodology_or_gross_net_context",
311
+ "personalized_trade_or_allocation_recommendation",
312
+ "timing_or_sizing_guidance",
313
+ "creates_implied_advisory_relationship",
314
+ "conflict_not_disclosed",
315
+ "referral_relationship_not_disclosed",
316
+ "omits_fees_costs_or_reasonably_available_alternatives",
317
+ "selective_private_performance_or_fundraising_update",
318
+ "off_the_record_or_not_in_writing_language",
319
+ "mnpi_possession_indicated",
320
+ "encourages_action_before_public_release",
321
+ "avoid_recordkeeping_channel_shift",
322
+ "bypasses_required_preapproval",
323
+ "pressure_scarcity_urgency",
324
+ "unsubstantiated_social_proof_or_validation",
325
+ "omits_testimonial_endorsement_or_rating_disclosure",
326
+ "obscures_required_disclosure_or_form_crs",
327
+ "minimizes_need_for_diligence_or_compliance",
328
+ "overstates_ai_capability_or_removes_human_oversight",
329
+ "claims_compliance_risk_eliminated",
330
+ "shares_sensitive_personal_or_financial_data",
331
+ "violates_need_to_know_data_minimization",
332
+ "shares_sensitive_internal_controls_or_exceptions",
333
+ "role_power_imbalance_or_favoritism",
334
+ "excessive_trading_cost_to_equity",
335
+ "inadequate_customer_profile_or_suitability_basis",
336
+ "exploits_vulnerable_or_elderly_client",
337
+ "aml_suspicious_activity_indicator",
338
+ "omits_switching_costs_and_product_comparison",
339
+ "conflict_language_understates_actual_relationship",
340
+ "omits_influencer_compensation_or_affiliation_disclosure",
341
+ "misrepresents_sipc_or_regulatory_protection_for_crypto",
342
+ "data_breach_notification_obligation_triggered",
343
+ "impedes_regulatory_reporting_or_whistleblower_rights"
344
+ ],
345
+ "impacted_principles": [
346
+ "truthful_non_misleading_communications",
347
+ "balanced_risk_reward_presentation",
348
+ "no_performance_guarantees_or_promissory_language",
349
+ "registration_and_scope_of_advice",
350
+ "duty_of_loyalty_conflict_disclosure",
351
+ "fair_access_to_material_information",
352
+ "insider_trading_and_mnpi_controls",
353
+ "supervision_and_books_records",
354
+ "privacy_confidentiality_and_secure_handling",
355
+ "security_control_integrity",
356
+ "role_separation_and_fair_access_in_academia",
357
+ "non_coercion_and_no_undue_influence",
358
+ "accurate_ai_capability_and_human_oversight",
359
+ "client_vulnerability_and_exploitation_prevention",
360
+ "aml_and_sanctions_compliance"
361
+ ],
362
+ "remediation_actions": [
363
+ "add_forward_looking_disclaimer",
364
+ "reframe_as_scenarios_not_expectations",
365
+ "add_balanced_risk_and_downside_section",
366
+ "remove_or_soften_guarantee_language",
367
+ "remove_personalized_recommendations",
368
+ "add_registered_advice_boundary_language",
369
+ "disclose_conflicts_and_compensation",
370
+ "add_fees_costs_and_alternatives_comparison",
371
+ "use_standardized_approved_performance_materials",
372
+ "add_performance_methodology_and_gross_net_context",
373
+ "avoid_selective_disclosure_share_broadly",
374
+ "escalate_mnpi_to_compliance_and_halt",
375
+ "keep_discussion_on_retained_channels",
376
+ "require_formal_preapproval_before_send",
377
+ "remove_pressure_scarcity_and_use_factual_timeline",
378
+ "substantiation_or_remove_credibility_claims",
379
+ "add_testimonial_endorsement_and_rating_disclosure",
380
+ "make_required_disclosure_clear_and_prominent",
381
+ "avoid_minimizing_compliance_or_diligence",
382
+ "clarify_ai_is_assistive_with_human_review",
383
+ "remove_claims_that_ai_eliminates_risk",
384
+ "redact_and_minimize_sensitive_data",
385
+ "use_secure_transfer_and_limit_access",
386
+ "avoid_sharing_internal_controls_or_sanitize",
387
+ "route_academic_opportunities_through_institution",
388
+ "separate_recommendation_letters_from_work",
389
+ "assess_cost_to_equity_against_client_profile",
390
+ "flag_for_elder_exploitation_review_and_hold",
391
+ "assess_sar_filing_obligation_and_escalate",
392
+ "initiate_breach_notification_review_and_timeline",
393
+ "remove_provisions_impeding_regulatory_communications"
394
+ ]
395
+ },
396
+ "metadata_groups": {
397
+ "content_type": [
398
+ "email",
399
+ "message"
400
+ ],
401
+ "audience_segment": [
402
+ "client",
403
+ "internal",
404
+ "prospect_or_investor",
405
+ "public",
406
+ "third_party"
407
+ ],
408
+ "detection_difficulty": [
409
+ "obvious",
410
+ "moderate",
411
+ "subtle"
412
+ ],
413
+ "aggravating_factors": [
414
+ "intentional",
415
+ "reckless",
416
+ "negligent",
417
+ "concealment_present",
418
+ "customer_harm_potential",
419
+ "financial_benefit_to_respondent",
420
+ "vulnerable_client",
421
+ "pattern_or_duration"
422
+ ]
423
+ },
424
+ "thresholds": {
425
+ "violation": 0.5,
426
+ "why": 0.55,
427
+ "impacted_principles": 0.7,
428
+ "remediation_actions": 0.5,
429
+ "aggravating_factors": 0.4
430
+ },
431
+ "dev": {
432
+ "loss": 11.207931518554688,
433
+ "violation_accuracy": 0.9933333333333333,
434
+ "violation_precision": 1.0,
435
+ "violation_recall": 0.9929577464788732,
436
+ "violation_f1": 0.9964664310954063,
437
+ "severity_accuracy": 0.7133333333333334,
438
+ "severity_precision_macro": 0.5736714975845411,
439
+ "severity_recall_macro": 0.5810399159663866,
440
+ "severity_f1_macro": 0.577203237410072,
441
+ "domain_accuracy": 0.8733333333333333,
442
+ "domain_precision_macro": 0.9152304502304504,
443
+ "domain_recall_macro": 0.9037037037037038,
444
+ "domain_f1_macro": 0.8981829715276235,
445
+ "subtype_accuracy": 0.82,
446
+ "subtype_precision_macro": 0.8295979273252001,
447
+ "subtype_recall_macro": 0.8100452577725306,
448
+ "subtype_f1_macro": 0.8046637752590468,
449
+ "jurisdiction_accuracy": 0.6933333333333334,
450
+ "jurisdiction_precision_macro": 0.41350649350649354,
451
+ "jurisdiction_recall_macro": 0.4179220779220779,
452
+ "jurisdiction_f1_macro": 0.4076005906238464,
453
+ "why_precision_micro": 0.616822429906542,
454
+ "why_precision_macro": 0.6160081633765844,
455
+ "why_recall_micro": 0.752851711026616,
456
+ "why_recall_macro": 0.7186333609410531,
457
+ "why_f1_micro": 0.678082191780822,
458
+ "why_f1_macro": 0.6517414247029207,
459
+ "impacted_principles_precision_micro": 0.7631578947368421,
460
+ "impacted_principles_precision_macro": 0.7874420024420025,
461
+ "impacted_principles_recall_micro": 0.7945205479452054,
462
+ "impacted_principles_recall_macro": 0.7614157289194307,
463
+ "impacted_principles_f1_micro": 0.7785234899328859,
464
+ "impacted_principles_f1_macro": 0.7660467655075498,
465
+ "remediation_actions_precision_micro": 0.6105263157894737,
466
+ "remediation_actions_precision_macro": 0.5976390453783973,
467
+ "remediation_actions_recall_micro": 0.7733333333333333,
468
+ "remediation_actions_recall_macro": 0.690795299444056,
469
+ "remediation_actions_f1_micro": 0.6823529411764706,
470
+ "remediation_actions_f1_macro": 0.6264413385705756,
471
+ "content_type_accuracy": 1.0,
472
+ "content_type_precision_macro": 1.0,
473
+ "content_type_recall_macro": 1.0,
474
+ "content_type_f1_macro": 1.0,
475
+ "audience_segment_accuracy": 1.0,
476
+ "audience_segment_precision_macro": 1.0,
477
+ "audience_segment_recall_macro": 1.0,
478
+ "audience_segment_f1_macro": 1.0,
479
+ "detection_difficulty_accuracy": 0.41333333333333333,
480
+ "detection_difficulty_precision_macro": 0.4076248313090418,
481
+ "detection_difficulty_recall_macro": 0.4146464646464647,
482
+ "detection_difficulty_f1_macro": 0.41032213795594075,
483
+ "aggravating_factors_precision_micro": 0.6404494382022472,
484
+ "aggravating_factors_precision_macro": 0.6351122397339503,
485
+ "aggravating_factors_recall_micro": 0.7276595744680852,
486
+ "aggravating_factors_recall_macro": 0.7164210015443564,
487
+ "aggravating_factors_f1_micro": 0.6812749003984064,
488
+ "aggravating_factors_f1_macro": 0.6705742793431082,
489
+ "stage_a_selection_score": 0.7687761716662238,
490
+ "selection_score": 0.7690657581979315,
491
+ "scenario_key_count": 150,
492
+ "rows_per_scenario_min": 1,
493
+ "rows_per_scenario_median": 1.0,
494
+ "rows_per_scenario_max": 1,
495
+ "violation_accuracy_scenario_macro": 0.9933333333333333,
496
+ "violation_accuracy_scenario_macro_risky": 0.9929577464788732,
497
+ "violation_accuracy_scenario_macro_clean": 1.0,
498
+ "violation_accuracy_scenario_min": 0.0,
499
+ "violation_worst_scenario_key": "train_1371",
500
+ "violation_worst_scenario_label": "risky"
501
+ },
502
+ "test": {
503
+ "loss": 10.207207107543946,
504
+ "violation_accuracy": 0.9866666666666667,
505
+ "violation_precision": 1.0,
506
+ "violation_recall": 0.9859154929577465,
507
+ "violation_f1": 0.9929078014184397,
508
+ "severity_accuracy": 0.7266666666666667,
509
+ "severity_precision_macro": 0.7056742540613509,
510
+ "severity_recall_macro": 0.6917853651724619,
511
+ "severity_f1_macro": 0.6937461494861875,
512
+ "domain_accuracy": 0.82,
513
+ "domain_precision_macro": 0.8639371000239372,
514
+ "domain_recall_macro": 0.7870126705653021,
515
+ "domain_f1_macro": 0.8032142065328451,
516
+ "subtype_accuracy": 0.7733333333333333,
517
+ "subtype_precision_macro": 0.7708825265643447,
518
+ "subtype_recall_macro": 0.7368260527351436,
519
+ "subtype_f1_macro": 0.7383595011385061,
520
+ "jurisdiction_accuracy": 0.74,
521
+ "jurisdiction_precision_macro": 0.5511805026656511,
522
+ "jurisdiction_recall_macro": 0.5755799755799755,
523
+ "jurisdiction_f1_macro": 0.5608646466716769,
524
+ "why_precision_micro": 0.6408045977011494,
525
+ "why_precision_macro": 0.6228897802851919,
526
+ "why_recall_micro": 0.8228782287822878,
527
+ "why_recall_macro": 0.7797228098698687,
528
+ "why_f1_micro": 0.7205169628432957,
529
+ "why_f1_macro": 0.6837887640406874,
530
+ "impacted_principles_precision_micro": 0.7368421052631579,
531
+ "impacted_principles_precision_macro": 0.7691853878810401,
532
+ "impacted_principles_recall_micro": 0.7636363636363637,
533
+ "impacted_principles_recall_macro": 0.6710974322869485,
534
+ "impacted_principles_f1_micro": 0.7499999999999999,
535
+ "impacted_principles_f1_macro": 0.7030370589130892,
536
+ "remediation_actions_precision_micro": 0.6188811188811189,
537
+ "remediation_actions_precision_macro": 0.5923653065256482,
538
+ "remediation_actions_recall_micro": 0.7695652173913043,
539
+ "remediation_actions_recall_macro": 0.684497765569872,
540
+ "remediation_actions_f1_micro": 0.686046511627907,
541
+ "remediation_actions_f1_macro": 0.6175714466344578,
542
+ "content_type_accuracy": 1.0,
543
+ "content_type_precision_macro": 1.0,
544
+ "content_type_recall_macro": 1.0,
545
+ "content_type_f1_macro": 1.0,
546
+ "audience_segment_accuracy": 1.0,
547
+ "audience_segment_precision_macro": 1.0,
548
+ "audience_segment_recall_macro": 1.0,
549
+ "audience_segment_f1_macro": 1.0,
550
+ "detection_difficulty_accuracy": 0.47333333333333333,
551
+ "detection_difficulty_precision_macro": 0.46757744378508614,
552
+ "detection_difficulty_recall_macro": 0.471182412358883,
553
+ "detection_difficulty_f1_macro": 0.46490073858516184,
554
+ "aggravating_factors_precision_micro": 0.6641509433962264,
555
+ "aggravating_factors_precision_macro": 0.6283313196161129,
556
+ "aggravating_factors_recall_micro": 0.7333333333333333,
557
+ "aggravating_factors_recall_macro": 0.6949052211781471,
558
+ "aggravating_factors_f1_micro": 0.697029702970297,
559
+ "aggravating_factors_f1_macro": 0.6546016914120363,
560
+ "stage_a_selection_score": 0.7506931806680867,
561
+ "selection_score": 0.7565296660343293,
562
+ "scenario_key_count": 150,
563
+ "rows_per_scenario_min": 1,
564
+ "rows_per_scenario_median": 1.0,
565
+ "rows_per_scenario_max": 1,
566
+ "violation_accuracy_scenario_macro": 0.9866666666666667,
567
+ "violation_accuracy_scenario_macro_risky": 0.9859154929577465,
568
+ "violation_accuracy_scenario_macro_clean": 1.0,
569
+ "violation_accuracy_scenario_min": 0.0,
570
+ "violation_worst_scenario_key": "train_1843",
571
+ "violation_worst_scenario_label": "risky"
572
+ },
573
+ "model_version": "sentinel-mb-c-d11-20260424",
574
+ "release_repo_id": "AurelexAI/sentinel-01-pub",
575
+ "release_channel": "sentinel-01-pub",
576
+ "release_alias_of": null,
577
+ "source_model_key": "sentinel-mb-c-d11",
578
+ "encoder_revision": null,
579
+ "encoder_code_revision": null,
580
+ "encoder_trust_remote_code": false,
581
+ "encoder_config_overrides": {},
582
+ "inference_task": "sentinel-stage-a",
583
+ "inference_entrypoint": "transformers.pipeline",
584
+ "source_checkpoint": {
585
+ "source": "_models/stage-a-grid-v3-gpu/sentinel-mb-c-d11/260424_135913_sentinel-mb-c-d11",
586
+ "checkpoint_sha256": "ba46d9609b97073802fbacbbceb076fb20e943389263af179ec4affa1ad97dd0",
587
+ "metadata_sha256": "feda8e1183869806e91531bf87fdc1de09c2417e4821a4ec7fcf2b8404e89979"
588
+ }
589
+ }
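As a reading aid for the `dev` and `test` blocks above, the following minimal sketch recomputes the binary `violation_*` metrics from confusion counts. The counts are not stated in the bundle. They are inferred here from the reported precision of 1.0 (no false positives), the reported recall, and the 142 risky / 8 clean split that `metrics.json` lists for both dev and test. The helper name `binary_metrics` is hypothetical and is not part of the release.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "violation_accuracy": (tp + tn) / total,
        "violation_precision": precision,
        "violation_recall": recall,
        "violation_f1": f1,
    }


# Inferred counts (assumption): 142 risky and 8 clean rows per split,
# no false positives, one missed risky row on dev and two on test.
dev = binary_metrics(tp=141, fp=0, fn=1, tn=8)
test = binary_metrics(tp=140, fp=0, fn=2, tn=8)

for name, metrics in (("dev", dev), ("test", test)):
    for key, value in metrics.items():
        print(f"{name}.{key} = {value:.6f}")
```

Under these assumed counts the recomputed values agree with the reported ones, which suggests each split's only violation errors are missed risky rows. With just 8 clean rows per split, the perfect precision rests on very few negatives and should be read with that in mind.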
metrics.json ADDED
@@ -0,0 +1,997 @@
1
+ {
2
+ "model_key": "sentinel-mb-c-d11",
3
+ "encoder_model": "answerdotai/ModernBERT-base",
4
+ "encoder_params_millions": 149.7,
5
+ "head_type": "columnar",
6
+ "head_code": "c",
7
+ "head_variant": "d11",
8
+ "head_dropout": 0.1,
9
+ "head_div": 1,
10
+ "head_mul": 1,
11
+ "head_skip": true,
12
+ "head_architecture": "funnel",
13
+ "model_family": "modernbert-base",
14
+ "projection_size": 640,
15
+ "trainable_head_params": 14325653,
16
+ "dataset_counts": {
17
+ "train": 900,
18
+ "dev": 150,
19
+ "test": 150
20
+ },
21
+ "dataset_signature": {
22
+ "generator_version": "2026-04-07-final-audit-clear-v1",
23
+ "counts": {
24
+ "train": 900,
25
+ "dev": 150,
26
+ "test": 150
27
+ },
28
+ "distribution": {
29
+ "train": {
30
+ "risky": 603,
31
+ "clean": 297
32
+ },
33
+ "dev": {
34
+ "risky": 142,
35
+ "clean": 8
36
+ },
37
+ "test": {
38
+ "risky": 142,
39
+ "clean": 8
40
+ }
41
+ }
42
+ },
43
+ "label_groups": {
44
+ "severity": [
45
+ "sev_0_compliant_or_ok",
46
+ "sev_1_minor",
47
+ "sev_2_moderate",
48
+ "sev_3_high"
49
+ ],
50
+ "domain": [
51
+ "performance_claims_forecasting",
52
+ "investment_advice_suitability",
53
+ "conflicts_inducements",
54
+ "marketing_solicitation_advertising",
55
+ "selective_disclosure_fair_access",
56
+ "mnpi_insider_trading",
57
+ "recordkeeping_supervision",
58
+ "ai_automation_capability_claims",
59
+ "privacy_confidentiality",
60
+ "cybersecurity_internal_controls",
61
+ "employment_favoritism_role_conflict",
62
+ "aml_and_suspicious_activity",
63
+ "other_unknown"
64
+ ],
65
+ "subtype": [
66
+ "speculative_outcomes_unqualified",
67
+ "implicit_or_explicit_guarantee",
68
+ "risk_context_omitted_or_unbalanced",
69
+ "unregistered_personalized_investment_advice",
70
+ "undisclosed_economic_conflict_or_referral",
71
+ "pressure_or_coercion",
72
+ "selective_disclosure",
73
+ "mnpi_misuse_or_encouragement",
74
+ "recordkeeping_or_preapproval_evasion",
75
+ "ai_autonomy_or_safety_overstatement",
76
+ "credentials_validation_or_compliance_misrepresentation",
77
+ "confidential_data_leakage",
78
+ "internal_controls_or_exception_process_leakage",
79
+ "academic_commercial_role_blurring_or_quid_pro_quo",
80
+ "improper_solicitation_offering_pressure",
81
+ "excessive_trading_or_account_churning",
82
+ "product_switching_without_cost_benefit_analysis",
83
+ "dual_registrant_capacity_or_wrap_fee_conflict_confusion",
84
+ "elder_exploitation_or_vulnerable_client_signal",
85
+ "suspicious_activity_indicator_or_structuring",
86
+ "influencer_or_social_media_promotion_compliance_failure",
87
+ "crypto_asset_misrepresentation_or_inadequate_disclosure",
88
+ "other_unknown"
89
+ ],
90
+ "jurisdiction": [
91
+ "US",
92
+ "EU",
93
+ "UK",
94
+ "Other",
95
+ "Unknown"
96
+ ],
97
+ "why": [
98
+ "forward_looking_statement_unqualified",
99
+ "guarantee_or_assurance_language",
100
+ "omits_material_risk_or_downside",
101
+ "implies_downside_protection_or_no_drawdown",
102
+ "cherry_picks_performance_period",
103
+ "omits_performance_methodology_or_gross_net_context",
104
+ "personalized_trade_or_allocation_recommendation",
105
+ "timing_or_sizing_guidance",
106
+ "creates_implied_advisory_relationship",
107
+ "conflict_not_disclosed",
108
+ "referral_relationship_not_disclosed",
109
+ "omits_fees_costs_or_reasonably_available_alternatives",
110
+ "selective_private_performance_or_fundraising_update",
111
+ "off_the_record_or_not_in_writing_language",
112
+ "mnpi_possession_indicated",
113
+ "encourages_action_before_public_release",
114
+ "avoid_recordkeeping_channel_shift",
115
+ "bypasses_required_preapproval",
116
+ "pressure_scarcity_urgency",
117
+ "unsubstantiated_social_proof_or_validation",
118
+ "omits_testimonial_endorsement_or_rating_disclosure",
119
+ "obscures_required_disclosure_or_form_crs",
120
+ "minimizes_need_for_diligence_or_compliance",
121
+ "overstates_ai_capability_or_removes_human_oversight",
122
+ "claims_compliance_risk_eliminated",
123
+ "shares_sensitive_personal_or_financial_data",
124
+ "violates_need_to_know_data_minimization",
125
+ "shares_sensitive_internal_controls_or_exceptions",
126
+ "role_power_imbalance_or_favoritism",
127
+ "excessive_trading_cost_to_equity",
128
+ "inadequate_customer_profile_or_suitability_basis",
129
+ "exploits_vulnerable_or_elderly_client",
130
+ "aml_suspicious_activity_indicator",
131
+ "omits_switching_costs_and_product_comparison",
132
+ "conflict_language_understates_actual_relationship",
133
+ "omits_influencer_compensation_or_affiliation_disclosure",
134
+ "misrepresents_sipc_or_regulatory_protection_for_crypto",
135
+ "data_breach_notification_obligation_triggered",
136
+ "impedes_regulatory_reporting_or_whistleblower_rights"
137
+ ],
138
+ "impacted_principles": [
139
+ "truthful_non_misleading_communications",
140
+ "balanced_risk_reward_presentation",
141
+ "no_performance_guarantees_or_promissory_language",
142
+ "registration_and_scope_of_advice",
143
+ "duty_of_loyalty_conflict_disclosure",
144
+ "fair_access_to_material_information",
145
+ "insider_trading_and_mnpi_controls",
146
+ "supervision_and_books_records",
147
+ "privacy_confidentiality_and_secure_handling",
148
+ "security_control_integrity",
149
+ "role_separation_and_fair_access_in_academia",
150
+ "non_coercion_and_no_undue_influence",
151
+ "accurate_ai_capability_and_human_oversight",
152
+ "client_vulnerability_and_exploitation_prevention",
153
+ "aml_and_sanctions_compliance"
154
+ ],
155
+ "remediation_actions": [
156
+ "add_forward_looking_disclaimer",
157
+ "reframe_as_scenarios_not_expectations",
158
+ "add_balanced_risk_and_downside_section",
159
+ "remove_or_soften_guarantee_language",
160
+ "remove_personalized_recommendations",
161
+ "add_registered_advice_boundary_language",
162
+ "disclose_conflicts_and_compensation",
163
+ "add_fees_costs_and_alternatives_comparison",
164
+ "use_standardized_approved_performance_materials",
165
+ "add_performance_methodology_and_gross_net_context",
166
+ "avoid_selective_disclosure_share_broadly",
167
+ "escalate_mnpi_to_compliance_and_halt",
168
+ "keep_discussion_on_retained_channels",
169
+ "require_formal_preapproval_before_send",
170
+ "remove_pressure_scarcity_and_use_factual_timeline",
171
+ "substantiation_or_remove_credibility_claims",
172
+ "add_testimonial_endorsement_and_rating_disclosure",
173
+ "make_required_disclosure_clear_and_prominent",
174
+ "avoid_minimizing_compliance_or_diligence",
175
+ "clarify_ai_is_assistive_with_human_review",
176
+ "remove_claims_that_ai_eliminates_risk",
177
+ "redact_and_minimize_sensitive_data",
178
+ "use_secure_transfer_and_limit_access",
179
+ "avoid_sharing_internal_controls_or_sanitize",
180
+ "route_academic_opportunities_through_institution",
181
+ "separate_recommendation_letters_from_work",
182
+ "assess_cost_to_equity_against_client_profile",
183
+ "flag_for_elder_exploitation_review_and_hold",
184
+ "assess_sar_filing_obligation_and_escalate",
185
+ "initiate_breach_notification_review_and_timeline",
186
+ "remove_provisions_impeding_regulatory_communications"
187
+ ]
188
+ },
189
+ "metadata_groups": {
190
+ "content_type": [
191
+ "email",
192
+ "message"
193
+ ],
194
+ "audience_segment": [
195
+ "client",
196
+ "internal",
197
+ "prospect_or_investor",
198
+ "public",
199
+ "third_party"
200
+ ],
201
+ "detection_difficulty": [
202
+ "obvious",
203
+ "moderate",
204
+ "subtle"
205
+ ],
206
+ "aggravating_factors": [
207
+ "intentional",
208
+ "reckless",
209
+ "negligent",
210
+ "concealment_present",
211
+ "customer_harm_potential",
212
+ "financial_benefit_to_respondent",
213
+ "vulnerable_client",
214
+ "pattern_or_duration"
215
+ ]
216
+ },
217
+ "output_signature": {
218
+ "violation": {
219
+ "type": "binary"
220
+ },
221
+ "severity": {
222
+ "type": "multiclass",
223
+ "labels": [
224
+ "sev_0_compliant_or_ok",
225
+ "sev_1_minor",
226
+ "sev_2_moderate",
227
+ "sev_3_high"
228
+ ]
229
+ },
230
+ "domain": {
231
+ "type": "multiclass",
232
+ "labels": [
233
+ "performance_claims_forecasting",
234
+ "investment_advice_suitability",
235
+ "conflicts_inducements",
236
+ "marketing_solicitation_advertising",
237
+ "selective_disclosure_fair_access",
238
+ "mnpi_insider_trading",
239
+ "recordkeeping_supervision",
240
+ "ai_automation_capability_claims",
241
+ "privacy_confidentiality",
242
+ "cybersecurity_internal_controls",
243
+ "employment_favoritism_role_conflict",
244
+ "aml_and_suspicious_activity",
245
+ "other_unknown"
246
+ ]
247
+ },
248
+ "subtype": {
249
+ "type": "multiclass",
250
+ "labels": [
251
+ "speculative_outcomes_unqualified",
252
+ "implicit_or_explicit_guarantee",
253
+ "risk_context_omitted_or_unbalanced",
254
+ "unregistered_personalized_investment_advice",
255
+ "undisclosed_economic_conflict_or_referral",
256
+ "pressure_or_coercion",
257
+ "selective_disclosure",
258
+ "mnpi_misuse_or_encouragement",
259
+ "recordkeeping_or_preapproval_evasion",
260
+ "ai_autonomy_or_safety_overstatement",
261
+ "credentials_validation_or_compliance_misrepresentation",
262
+ "confidential_data_leakage",
263
+ "internal_controls_or_exception_process_leakage",
264
+ "academic_commercial_role_blurring_or_quid_pro_quo",
265
+ "improper_solicitation_offering_pressure",
266
+ "excessive_trading_or_account_churning",
267
+ "product_switching_without_cost_benefit_analysis",
268
+ "dual_registrant_capacity_or_wrap_fee_conflict_confusion",
269
+ "elder_exploitation_or_vulnerable_client_signal",
270
+ "suspicious_activity_indicator_or_structuring",
271
+ "influencer_or_social_media_promotion_compliance_failure",
272
+ "crypto_asset_misrepresentation_or_inadequate_disclosure",
273
+ "other_unknown"
274
+ ]
275
+ },
276
+ "jurisdiction": {
277
+ "type": "multiclass",
278
+ "labels": [
279
+ "US",
280
+ "EU",
281
+ "UK",
282
+ "Other",
283
+ "Unknown"
284
+ ]
285
+ },
286
+ "why": {
287
+ "type": "multilabel",
288
+ "labels": [
289
+ "forward_looking_statement_unqualified",
290
+ "guarantee_or_assurance_language",
291
+ "omits_material_risk_or_downside",
292
+ "implies_downside_protection_or_no_drawdown",
293
+ "cherry_picks_performance_period",
294
+ "omits_performance_methodology_or_gross_net_context",
295
+ "personalized_trade_or_allocation_recommendation",
296
+ "timing_or_sizing_guidance",
297
+ "creates_implied_advisory_relationship",
298
+ "conflict_not_disclosed",
299
+ "referral_relationship_not_disclosed",
300
+ "omits_fees_costs_or_reasonably_available_alternatives",
301
+ "selective_private_performance_or_fundraising_update",
302
+ "off_the_record_or_not_in_writing_language",
303
+ "mnpi_possession_indicated",
304
+ "encourages_action_before_public_release",
305
+ "avoid_recordkeeping_channel_shift",
306
+ "bypasses_required_preapproval",
307
+ "pressure_scarcity_urgency",
308
+ "unsubstantiated_social_proof_or_validation",
309
+ "omits_testimonial_endorsement_or_rating_disclosure",
310
+ "obscures_required_disclosure_or_form_crs",
311
+ "minimizes_need_for_diligence_or_compliance",
312
+ "overstates_ai_capability_or_removes_human_oversight",
313
+ "claims_compliance_risk_eliminated",
314
+ "shares_sensitive_personal_or_financial_data",
315
+ "violates_need_to_know_data_minimization",
316
+ "shares_sensitive_internal_controls_or_exceptions",
317
+ "role_power_imbalance_or_favoritism",
318
+ "excessive_trading_cost_to_equity",
319
+ "inadequate_customer_profile_or_suitability_basis",
320
+ "exploits_vulnerable_or_elderly_client",
321
+ "aml_suspicious_activity_indicator",
322
+ "omits_switching_costs_and_product_comparison",
323
+ "conflict_language_understates_actual_relationship",
324
+ "omits_influencer_compensation_or_affiliation_disclosure",
325
+ "misrepresents_sipc_or_regulatory_protection_for_crypto",
326
+ "data_breach_notification_obligation_triggered",
327
+ "impedes_regulatory_reporting_or_whistleblower_rights"
328
+ ]
329
+ },
330
+ "impacted_principles": {
331
+ "type": "multilabel",
332
+ "labels": [
333
+ "truthful_non_misleading_communications",
334
+ "balanced_risk_reward_presentation",
335
+ "no_performance_guarantees_or_promissory_language",
336
+ "registration_and_scope_of_advice",
337
+ "duty_of_loyalty_conflict_disclosure",
338
+ "fair_access_to_material_information",
339
+ "insider_trading_and_mnpi_controls",
340
+ "supervision_and_books_records",
341
+ "privacy_confidentiality_and_secure_handling",
342
+ "security_control_integrity",
343
+ "role_separation_and_fair_access_in_academia",
344
+ "non_coercion_and_no_undue_influence",
345
+ "accurate_ai_capability_and_human_oversight",
346
+ "client_vulnerability_and_exploitation_prevention",
347
+ "aml_and_sanctions_compliance"
348
+ ]
349
+ },
350
+ "remediation_actions": {
351
+ "type": "multilabel",
352
+ "labels": [
353
+ "add_forward_looking_disclaimer",
354
+ "reframe_as_scenarios_not_expectations",
355
+ "add_balanced_risk_and_downside_section",
356
+ "remove_or_soften_guarantee_language",
357
+ "remove_personalized_recommendations",
358
+ "add_registered_advice_boundary_language",
359
+ "disclose_conflicts_and_compensation",
360
+ "add_fees_costs_and_alternatives_comparison",
361
+ "use_standardized_approved_performance_materials",
362
+ "add_performance_methodology_and_gross_net_context",
363
+ "avoid_selective_disclosure_share_broadly",
364
+ "escalate_mnpi_to_compliance_and_halt",
365
+ "keep_discussion_on_retained_channels",
366
+ "require_formal_preapproval_before_send",
367
+ "remove_pressure_scarcity_and_use_factual_timeline",
368
+ "substantiation_or_remove_credibility_claims",
369
+ "add_testimonial_endorsement_and_rating_disclosure",
370
+ "make_required_disclosure_clear_and_prominent",
371
+ "avoid_minimizing_compliance_or_diligence",
372
+ "clarify_ai_is_assistive_with_human_review",
373
+ "remove_claims_that_ai_eliminates_risk",
374
+ "redact_and_minimize_sensitive_data",
375
+ "use_secure_transfer_and_limit_access",
376
+ "avoid_sharing_internal_controls_or_sanitize",
377
+ "route_academic_opportunities_through_institution",
378
+ "separate_recommendation_letters_from_work",
379
+ "assess_cost_to_equity_against_client_profile",
380
+ "flag_for_elder_exploitation_review_and_hold",
381
+ "assess_sar_filing_obligation_and_escalate",
382
+ "initiate_breach_notification_review_and_timeline",
383
+ "remove_provisions_impeding_regulatory_communications"
384
+ ]
385
+ },
386
+ "content_type": {
387
+ "type": "multiclass",
388
+ "labels": [
389
+ "email",
390
+ "message"
391
+ ]
392
+ },
393
+ "audience_segment": {
394
+ "type": "multiclass",
395
+ "labels": [
396
+ "client",
397
+ "internal",
398
+ "prospect_or_investor",
399
+ "public",
400
+ "third_party"
401
+ ]
402
+ },
403
+ "detection_difficulty": {
404
+ "type": "multiclass",
405
+ "labels": [
406
+ "obvious",
407
+ "moderate",
408
+ "subtle"
409
+ ]
410
+ },
411
+ "aggravating_factors": {
412
+ "type": "multilabel",
413
+ "labels": [
414
+ "intentional",
415
+ "reckless",
416
+ "negligent",
417
+ "concealment_present",
418
+ "customer_harm_potential",
419
+ "financial_benefit_to_respondent",
420
+ "vulnerable_client",
421
+ "pattern_or_duration"
422
+ ]
423
+ }
424
+ },
425
+ "device_info": {
426
+ "device": "cuda",
427
+ "torch_cuda_version": "11.8",
428
+ "gpu_count": 1,
429
+ "gpu_name": "NVIDIA GeForce RTX 2080 Ti",
430
+ "gpu_memory_gb": 11.0,
431
+ "gpu_capability": "7.5",
432
+ "nvidia_smi": [
433
+ "NVIDIA GeForce RTX 2080 Ti, 591.74, 11264 MiB"
434
+ ]
435
+ },
436
+ "timings": {
437
+ "encoding_seconds": 0.0,
438
+ "training_seconds": 82.25,
439
+ "total_seconds": 86.5
440
+ },
441
+ "cycles": [
442
+ {
443
+ "loss": 7.7748064517974855,
444
+ "violation_accuracy": 0.9933333333333333,
445
+ "violation_precision": 1.0,
446
+ "violation_recall": 0.9929577464788732,
447
+ "violation_f1": 0.9964664310954063,
448
+ "severity_accuracy": 0.68,
449
+ "severity_precision_macro": 0.6277301315037164,
450
+ "severity_recall_macro": 0.707563025210084,
451
+ "severity_f1_macro": 0.6574701673088821,
452
+ "domain_accuracy": 0.8466666666666667,
453
+ "domain_precision_macro": 0.8728019516325967,
454
+ "domain_recall_macro": 0.8525462962962963,
455
+ "domain_f1_macro": 0.8494220062066961,
456
+ "subtype_accuracy": 0.7733333333333333,
457
+ "subtype_precision_macro": 0.8113931523022433,
458
+ "subtype_recall_macro": 0.7490964843237571,
459
+ "subtype_f1_macro": 0.7508012065714375,
460
+ "jurisdiction_accuracy": 0.6933333333333334,
461
+ "jurisdiction_precision_macro": 0.39713131313131317,
462
+ "jurisdiction_recall_macro": 0.4161038961038961,
463
+ "jurisdiction_f1_macro": 0.3923395902343271,
464
+ "why_precision_micro": 0.5229591836734694,
465
+ "why_precision_macro": 0.5766391767639499,
466
+ "why_recall_micro": 0.779467680608365,
467
+ "why_recall_macro": 0.7375322683014991,
468
+ "why_f1_micro": 0.6259541984732825,
469
+ "why_f1_macro": 0.6266278969973275,
470
+ "impacted_principles_precision_micro": 0.714859437751004,
471
+ "impacted_principles_precision_macro": 0.7255202728514017,
472
+ "impacted_principles_recall_micro": 0.8127853881278538,
473
+ "impacted_principles_recall_macro": 0.7848574654881022,
474
+ "impacted_principles_f1_micro": 0.7606837606837606,
475
+ "impacted_principles_f1_macro": 0.7439683431383844,
476
+ "remediation_actions_precision_micro": 0.6126760563380281,
477
+ "remediation_actions_precision_macro": 0.5838607852720756,
478
+ "remediation_actions_recall_micro": 0.7733333333333333,
479
+ "remediation_actions_recall_macro": 0.7043368620792969,
480
+ "remediation_actions_f1_micro": 0.6836935166994106,
481
+ "remediation_actions_f1_macro": 0.6242740328903318,
482
+ "content_type_accuracy": 1.0,
483
+ "content_type_precision_macro": 1.0,
484
+ "content_type_recall_macro": 1.0,
485
+ "content_type_f1_macro": 1.0,
486
+ "audience_segment_accuracy": 1.0,
487
+ "audience_segment_precision_macro": 1.0,
488
+ "audience_segment_recall_macro": 1.0,
489
+ "audience_segment_f1_macro": 1.0,
490
+ "detection_difficulty_accuracy": 0.44666666666666666,
491
+ "detection_difficulty_precision_macro": 0.4471819645732689,
492
+ "detection_difficulty_recall_macro": 0.46915306915306915,
493
+ "detection_difficulty_f1_macro": 0.4404195664321677,
494
+ "aggravating_factors_precision_micro": 0.5650793650793651,
495
+ "aggravating_factors_precision_macro": 0.550085885667087,
496
+ "aggravating_factors_recall_micro": 0.7574468085106383,
497
+ "aggravating_factors_recall_macro": 0.7552521514727553,
498
+ "aggravating_factors_f1_micro": 0.6472727272727272,
499
+ "aggravating_factors_f1_macro": 0.6306325855261203,
500
+ "stage_a_selection_score": 0.7456116562791146,
501
+ "selection_score": 0.7500419326212061,
502
+ "scenario_key_count": 150,
503
+ "rows_per_scenario_min": 1,
504
+ "rows_per_scenario_median": 1.0,
505
+ "rows_per_scenario_max": 1,
506
+ "violation_accuracy_scenario_macro": 0.9933333333333333,
507
+ "violation_accuracy_scenario_macro_risky": 0.9929577464788732,
508
+ "violation_accuracy_scenario_macro_clean": 1.0,
509
+ "violation_accuracy_scenario_min": 0.0,
510
+ "violation_worst_scenario_key": "train_1371",
511
+ "violation_worst_scenario_label": "risky",
512
+ "cycle": 1,
513
+ "best_epoch": 15,
514
+ "epochs_ran": 21,
515
+ "lr": 0.003,
516
+ "head_dropout": 0.1,
517
+ "weight_decay": 0.01,
518
+ "cycle_seconds": 20.43
519
+ },
520
+ {
521
+ "loss": 11.207931518554688,
522
+ "violation_accuracy": 0.9933333333333333,
523
+ "violation_precision": 1.0,
524
+ "violation_recall": 0.9929577464788732,
525
+ "violation_f1": 0.9964664310954063,
526
+ "severity_accuracy": 0.7133333333333334,
527
+ "severity_precision_macro": 0.5736714975845411,
528
+ "severity_recall_macro": 0.5810399159663866,
529
+ "severity_f1_macro": 0.577203237410072,
530
+ "domain_accuracy": 0.8733333333333333,
531
+ "domain_precision_macro": 0.9152304502304504,
532
+ "domain_recall_macro": 0.9037037037037038,
533
+ "domain_f1_macro": 0.8981829715276235,
534
+ "subtype_accuracy": 0.82,
535
+ "subtype_precision_macro": 0.8295979273252001,
536
+ "subtype_recall_macro": 0.8100452577725306,
537
+ "subtype_f1_macro": 0.8046637752590468,
538
+ "jurisdiction_accuracy": 0.6933333333333334,
539
+ "jurisdiction_precision_macro": 0.41350649350649354,
540
+ "jurisdiction_recall_macro": 0.4179220779220779,
541
+ "jurisdiction_f1_macro": 0.4076005906238464,
542
+ "why_precision_micro": 0.6041666666666666,
543
+ "why_precision_macro": 0.5994839193351778,
544
+ "why_recall_micro": 0.7718631178707225,
545
+ "why_recall_macro": 0.7338144761221683,
546
+ "why_f1_micro": 0.67779632721202,
547
+ "why_f1_macro": 0.6489732285249087,
548
+ "impacted_principles_precision_micro": 0.7204724409448819,
549
+ "impacted_principles_precision_macro": 0.7509759521524227,
550
+ "impacted_principles_recall_micro": 0.8356164383561644,
551
+ "impacted_principles_recall_macro": 0.7889626527134941,
552
+ "impacted_principles_f1_micro": 0.773784355179704,
553
+ "impacted_principles_f1_macro": 0.7575593568585874,
554
+ "remediation_actions_precision_micro": 0.6105263157894737,
555
+ "remediation_actions_precision_macro": 0.5976390453783973,
556
+ "remediation_actions_recall_micro": 0.7733333333333333,
557
+ "remediation_actions_recall_macro": 0.690795299444056,
558
+ "remediation_actions_f1_micro": 0.6823529411764706,
559
+ "remediation_actions_f1_macro": 0.6264413385705756,
560
+ "content_type_accuracy": 1.0,
561
+ "content_type_precision_macro": 1.0,
562
+ "content_type_recall_macro": 1.0,
563
+ "content_type_f1_macro": 1.0,
564
+ "audience_segment_accuracy": 1.0,
565
+ "audience_segment_precision_macro": 1.0,
566
+ "audience_segment_recall_macro": 1.0,
567
+ "audience_segment_f1_macro": 1.0,
568
+ "detection_difficulty_accuracy": 0.41333333333333333,
569
+ "detection_difficulty_precision_macro": 0.4076248313090418,
570
+ "detection_difficulty_recall_macro": 0.4146464646464647,
571
+ "detection_difficulty_f1_macro": 0.41032213795594075,
572
+ "aggravating_factors_precision_micro": 0.6544715447154471,
573
+ "aggravating_factors_precision_macro": 0.6429940120546376,
574
+ "aggravating_factors_recall_micro": 0.6851063829787234,
575
+ "aggravating_factors_recall_macro": 0.6755889259203152,
576
+ "aggravating_factors_f1_micro": 0.6694386694386694,
577
+ "aggravating_factors_f1_macro": 0.6555896631040743,
578
+ "stage_a_selection_score": 0.767369221062852,
579
+ "selection_score": 0.7671909669032824,
580
+ "scenario_key_count": 150,
581
+ "rows_per_scenario_min": 1,
582
+ "rows_per_scenario_median": 1.0,
583
+ "rows_per_scenario_max": 1,
584
+ "violation_accuracy_scenario_macro": 0.9933333333333333,
585
+ "violation_accuracy_scenario_macro_risky": 0.9929577464788732,
586
+ "violation_accuracy_scenario_macro_clean": 1.0,
587
+ "violation_accuracy_scenario_min": 0.0,
588
+ "violation_worst_scenario_key": "train_1371",
589
+ "violation_worst_scenario_label": "risky",
590
+ "cycle": 2,
591
+ "best_epoch": 28,
592
+ "epochs_ran": 34,
593
+ "lr": 0.001,
594
+ "head_dropout": 0.1,
595
+ "weight_decay": 0.0,
596
+ "cycle_seconds": 32.74
597
+ },
598
+ {
599
+ "loss": 8.636670589447021,
600
+ "violation_accuracy": 0.9866666666666667,
601
+ "violation_precision": 0.9929577464788732,
602
+ "violation_recall": 0.9929577464788732,
603
+ "violation_f1": 0.9929577464788732,
604
+ "severity_accuracy": 0.66,
605
+ "severity_precision_macro": 0.4967107870333677,
606
+ "severity_recall_macro": 0.5516281512605042,
607
+ "severity_f1_macro": 0.5189861673414305,
608
+ "domain_accuracy": 0.8666666666666667,
609
+ "domain_precision_macro": 0.8598119380377445,
610
+ "domain_recall_macro": 0.9199074074074075,
611
+ "domain_f1_macro": 0.8685643227768131,
612
+ "subtype_accuracy": 0.7866666666666666,
613
+ "subtype_precision_macro": 0.8163114663114664,
614
+ "subtype_recall_macro": 0.775937950937951,
615
+ "subtype_f1_macro": 0.7646749863327403,
616
+ "jurisdiction_accuracy": 0.7733333333333333,
617
+ "jurisdiction_precision_macro": 0.5372131147540984,
618
+ "jurisdiction_recall_macro": 0.4397402597402597,
619
+ "jurisdiction_f1_macro": 0.47636711947056776,
620
+ "why_precision_micro": 0.4528301886792453,
621
+ "why_precision_macro": 0.48296837652041275,
622
+ "why_recall_micro": 0.8212927756653993,
623
+ "why_recall_macro": 0.7714144117990271,
624
+ "why_f1_micro": 0.5837837837837838,
625
+ "why_f1_macro": 0.5733734446440217,
626
+ "impacted_principles_precision_micro": 0.6332179930795848,
627
+ "impacted_principles_precision_macro": 0.6573170966740058,
628
+ "impacted_principles_recall_micro": 0.8356164383561644,
629
+ "impacted_principles_recall_macro": 0.8099590558905407,
630
+ "impacted_principles_f1_micro": 0.7204724409448819,
631
+ "impacted_principles_f1_macro": 0.7112060969947187,
632
+ "remediation_actions_precision_micro": 0.5207756232686981,
633
+ "remediation_actions_precision_macro": 0.5043151896069353,
634
+ "remediation_actions_recall_micro": 0.8355555555555556,
635
+ "remediation_actions_recall_macro": 0.7390617197643065,
636
+ "remediation_actions_f1_micro": 0.6416382252559728,
637
+ "remediation_actions_f1_macro": 0.5815208256530966,
638
+ "content_type_accuracy": 1.0,
639
+ "content_type_precision_macro": 1.0,
640
+ "content_type_recall_macro": 1.0,
641
+ "content_type_f1_macro": 1.0,
642
+ "audience_segment_accuracy": 1.0,
643
+ "audience_segment_precision_macro": 1.0,
644
+ "audience_segment_recall_macro": 1.0,
645
+ "audience_segment_f1_macro": 1.0,
646
+ "detection_difficulty_accuracy": 0.41333333333333333,
647
+ "detection_difficulty_precision_macro": 0.39879147137211657,
648
+ "detection_difficulty_recall_macro": 0.40982905982905987,
649
+ "detection_difficulty_f1_macro": 0.4012269618676941,
650
+ "aggravating_factors_precision_micro": 0.5627009646302251,
651
+ "aggravating_factors_precision_macro": 0.5604147213071698,
652
+ "aggravating_factors_recall_micro": 0.7446808510638298,
653
+ "aggravating_factors_recall_macro": 0.7464349207339351,
654
+ "aggravating_factors_f1_micro": 0.6410256410256411,
655
+ "aggravating_factors_f1_macro": 0.6206138817554251,
656
+ "stage_a_selection_score": 0.7407038445266995,
657
+ "selection_score": 0.7436551178025155,
658
+ "scenario_key_count": 150,
659
+ "rows_per_scenario_min": 1,
660
+ "rows_per_scenario_median": 1.0,
661
+ "rows_per_scenario_max": 1,
662
+ "violation_accuracy_scenario_macro": 0.9866666666666667,
663
+ "violation_accuracy_scenario_macro_risky": 0.9929577464788732,
664
+ "violation_accuracy_scenario_macro_clean": 0.875,
665
+ "violation_accuracy_scenario_min": 0.0,
666
+ "violation_worst_scenario_key": "train_1371",
667
+ "violation_worst_scenario_label": "risky",
668
+ "cycle": 3,
669
+ "best_epoch": 23,
670
+ "epochs_ran": 29,
671
+ "lr": 0.0005,
672
+ "head_dropout": 0.1,
673
+ "weight_decay": 0.02,
674
+ "cycle_seconds": 27.33
675
+ }
676
+ ],
677
+ "best_cycle": {
678
+ "loss": 11.207931518554688,
679
+ "violation_accuracy": 0.9933333333333333,
680
+ "violation_precision": 1.0,
681
+ "violation_recall": 0.9929577464788732,
682
+ "violation_f1": 0.9964664310954063,
683
+ "severity_accuracy": 0.7133333333333334,
684
+ "severity_precision_macro": 0.5736714975845411,
685
+ "severity_recall_macro": 0.5810399159663866,
686
+ "severity_f1_macro": 0.577203237410072,
687
+ "domain_accuracy": 0.8733333333333333,
688
+ "domain_precision_macro": 0.9152304502304504,
689
+ "domain_recall_macro": 0.9037037037037038,
690
+ "domain_f1_macro": 0.8981829715276235,
691
+ "subtype_accuracy": 0.82,
692
+ "subtype_precision_macro": 0.8295979273252001,
693
+ "subtype_recall_macro": 0.8100452577725306,
694
+ "subtype_f1_macro": 0.8046637752590468,
695
+ "jurisdiction_accuracy": 0.6933333333333334,
696
+ "jurisdiction_precision_macro": 0.41350649350649354,
697
+ "jurisdiction_recall_macro": 0.4179220779220779,
698
+ "jurisdiction_f1_macro": 0.4076005906238464,
699
+ "why_precision_micro": 0.6041666666666666,
700
+ "why_precision_macro": 0.5994839193351778,
701
+ "why_recall_micro": 0.7718631178707225,
702
+ "why_recall_macro": 0.7338144761221683,
703
+ "why_f1_micro": 0.67779632721202,
704
+ "why_f1_macro": 0.6489732285249087,
705
+ "impacted_principles_precision_micro": 0.7204724409448819,
706
+ "impacted_principles_precision_macro": 0.7509759521524227,
707
+ "impacted_principles_recall_micro": 0.8356164383561644,
708
+ "impacted_principles_recall_macro": 0.7889626527134941,
709
+ "impacted_principles_f1_micro": 0.773784355179704,
710
+ "impacted_principles_f1_macro": 0.7575593568585874,
711
+ "remediation_actions_precision_micro": 0.6105263157894737,
712
+ "remediation_actions_precision_macro": 0.5976390453783973,
713
+ "remediation_actions_recall_micro": 0.7733333333333333,
714
+ "remediation_actions_recall_macro": 0.690795299444056,
715
+ "remediation_actions_f1_micro": 0.6823529411764706,
716
+ "remediation_actions_f1_macro": 0.6264413385705756,
717
+ "content_type_accuracy": 1.0,
718
+ "content_type_precision_macro": 1.0,
719
+ "content_type_recall_macro": 1.0,
720
+ "content_type_f1_macro": 1.0,
721
+ "audience_segment_accuracy": 1.0,
722
+ "audience_segment_precision_macro": 1.0,
723
+ "audience_segment_recall_macro": 1.0,
724
+ "audience_segment_f1_macro": 1.0,
725
+ "detection_difficulty_accuracy": 0.41333333333333333,
726
+ "detection_difficulty_precision_macro": 0.4076248313090418,
727
+ "detection_difficulty_recall_macro": 0.4146464646464647,
728
+ "detection_difficulty_f1_macro": 0.41032213795594075,
729
+ "aggravating_factors_precision_micro": 0.6544715447154471,
730
+ "aggravating_factors_precision_macro": 0.6429940120546376,
731
+ "aggravating_factors_recall_micro": 0.6851063829787234,
732
+ "aggravating_factors_recall_macro": 0.6755889259203152,
733
+ "aggravating_factors_f1_micro": 0.6694386694386694,
734
+ "aggravating_factors_f1_macro": 0.6555896631040743,
735
+ "stage_a_selection_score": 0.767369221062852,
736
+ "selection_score": 0.7671909669032824,
737
+ "scenario_key_count": 150,
738
+ "rows_per_scenario_min": 1,
739
+ "rows_per_scenario_median": 1.0,
740
+ "rows_per_scenario_max": 1,
741
+ "violation_accuracy_scenario_macro": 0.9933333333333333,
742
+ "violation_accuracy_scenario_macro_risky": 0.9929577464788732,
743
+ "violation_accuracy_scenario_macro_clean": 1.0,
744
+ "violation_accuracy_scenario_min": 0.0,
745
+ "violation_worst_scenario_key": "train_1371",
746
+ "violation_worst_scenario_label": "risky",
747
+ "cycle": 2,
748
+ "best_epoch": 28,
749
+ "epochs_ran": 34,
750
+ "lr": 0.001,
751
+ "head_dropout": 0.1,
752
+ "weight_decay": 0.0,
753
+ "cycle_seconds": 32.74
754
+ },
755
+ "train": {
756
+ "loss": 0.29129520431160927,
757
+ "violation_accuracy": 1.0,
758
+ "violation_precision": 1.0,
759
+ "violation_recall": 1.0,
760
+ "violation_f1": 1.0,
761
+ "severity_accuracy": 0.9733333333333334,
762
+ "severity_precision_macro": 0.979737423027768,
763
+ "severity_recall_macro": 0.9795454017784588,
764
+ "severity_f1_macro": 0.9795522630686699,
765
+ "domain_accuracy": 0.9911111111111112,
766
+ "domain_precision_macro": 0.9922987117552334,
767
+ "domain_recall_macro": 0.9966329966329965,
768
+ "domain_f1_macro": 0.9943418090318922,
769
+ "subtype_accuracy": 1.0,
770
+ "subtype_precision_macro": 1.0,
771
+ "subtype_recall_macro": 1.0,
772
+ "subtype_f1_macro": 1.0,
773
+ "jurisdiction_accuracy": 0.9711111111111111,
774
+ "jurisdiction_precision_macro": 0.8754392733703078,
775
+ "jurisdiction_recall_macro": 0.9931847968545217,
776
+ "jurisdiction_f1_macro": 0.9279309415166559,
777
+ "why_precision_micro": 0.7692307692307693,
778
+ "why_precision_macro": 0.7629469993662027,
779
+ "why_recall_micro": 1.0,
780
+ "why_recall_macro": 0.9487179487179487,
781
+ "why_f1_micro": 0.8695652173913044,
782
+ "why_f1_macro": 0.8392609851418487,
783
+ "impacted_principles_precision_micro": 0.9257028112449799,
784
+ "impacted_principles_precision_macro": 0.8949360744911738,
785
+ "impacted_principles_recall_micro": 0.9913978494623656,
786
+ "impacted_principles_recall_macro": 0.9304394224733208,
787
+ "impacted_principles_f1_micro": 0.9574247144340603,
788
+ "impacted_principles_f1_macro": 0.9116715954228238,
789
+ "remediation_actions_precision_micro": 0.8031128404669261,
790
+ "remediation_actions_precision_macro": 0.743125500508932,
791
+ "remediation_actions_recall_micro": 0.9990319457889641,
792
+ "remediation_actions_recall_macro": 0.9029428409734013,
793
+ "remediation_actions_f1_micro": 0.8904227782571181,
794
+ "remediation_actions_f1_macro": 0.8109253028549341,
795
+ "content_type_accuracy": 1.0,
796
+ "content_type_precision_macro": 1.0,
797
+ "content_type_recall_macro": 1.0,
798
+ "content_type_f1_macro": 1.0,
799
+ "audience_segment_accuracy": 1.0,
800
+ "audience_segment_precision_macro": 1.0,
801
+ "audience_segment_recall_macro": 1.0,
802
+ "audience_segment_f1_macro": 1.0,
803
+ "detection_difficulty_accuracy": 0.9944444444444445,
804
+ "detection_difficulty_precision_macro": 0.9945552657437111,
805
+ "detection_difficulty_recall_macro": 0.9945552657437111,
806
+ "detection_difficulty_f1_macro": 0.9945552657437111,
807
+ "aggravating_factors_precision_micro": 0.9263862332695985,
808
+ "aggravating_factors_precision_macro": 0.9236474949570554,
809
+ "aggravating_factors_recall_micro": 0.9979402677651905,
810
+ "aggravating_factors_recall_macro": 0.9992537313432837,
811
+ "aggravating_factors_f1_micro": 0.9608329201784829,
812
+ "aggravating_factors_f1_macro": 0.9592327500257608,
813
+ "stage_a_selection_score": 0.937580517111993,
814
+ "selection_score": 0.947753814478068,
815
+ "scenario_key_count": 900,
816
+ "rows_per_scenario_min": 1,
817
+ "rows_per_scenario_median": 1.0,
818
+ "rows_per_scenario_max": 1,
819
+ "violation_accuracy_scenario_macro": 1.0,
820
+ "violation_accuracy_scenario_macro_risky": 1.0,
821
+ "violation_accuracy_scenario_macro_clean": 1.0,
822
+ "violation_accuracy_scenario_min": 1.0,
823
+ "violation_worst_scenario_key": "train_1001",
824
+ "violation_worst_scenario_label": "risky"
825
+ },
826
+ "dev": {
827
+ "loss": 11.207931518554688,
828
+ "violation_accuracy": 0.9933333333333333,
829
+ "violation_precision": 1.0,
830
+ "violation_recall": 0.9929577464788732,
831
+ "violation_f1": 0.9964664310954063,
832
+ "severity_accuracy": 0.7133333333333334,
833
+ "severity_precision_macro": 0.5736714975845411,
834
+ "severity_recall_macro": 0.5810399159663866,
835
+ "severity_f1_macro": 0.577203237410072,
836
+ "domain_accuracy": 0.8733333333333333,
837
+ "domain_precision_macro": 0.9152304502304504,
838
+ "domain_recall_macro": 0.9037037037037038,
839
+ "domain_f1_macro": 0.8981829715276235,
840
+ "subtype_accuracy": 0.82,
841
+ "subtype_precision_macro": 0.8295979273252001,
842
+ "subtype_recall_macro": 0.8100452577725306,
843
+ "subtype_f1_macro": 0.8046637752590468,
844
+ "jurisdiction_accuracy": 0.6933333333333334,
845
+ "jurisdiction_precision_macro": 0.41350649350649354,
846
+ "jurisdiction_recall_macro": 0.4179220779220779,
847
+ "jurisdiction_f1_macro": 0.4076005906238464,
848
+ "why_precision_micro": 0.616822429906542,
849
+ "why_precision_macro": 0.6160081633765844,
850
+ "why_recall_micro": 0.752851711026616,
851
+ "why_recall_macro": 0.7186333609410531,
852
+ "why_f1_micro": 0.678082191780822,
853
+ "why_f1_macro": 0.6517414247029207,
854
+ "impacted_principles_precision_micro": 0.7631578947368421,
855
+ "impacted_principles_precision_macro": 0.7874420024420025,
856
+ "impacted_principles_recall_micro": 0.7945205479452054,
857
+ "impacted_principles_recall_macro": 0.7614157289194307,
858
+ "impacted_principles_f1_micro": 0.7785234899328859,
859
+ "impacted_principles_f1_macro": 0.7660467655075498,
860
+ "remediation_actions_precision_micro": 0.6105263157894737,
861
+ "remediation_actions_precision_macro": 0.5976390453783973,
862
+ "remediation_actions_recall_micro": 0.7733333333333333,
863
+ "remediation_actions_recall_macro": 0.690795299444056,
864
+ "remediation_actions_f1_micro": 0.6823529411764706,
865
+ "remediation_actions_f1_macro": 0.6264413385705756,
866
+ "content_type_accuracy": 1.0,
867
+ "content_type_precision_macro": 1.0,
868
+ "content_type_recall_macro": 1.0,
869
+ "content_type_f1_macro": 1.0,
870
+ "audience_segment_accuracy": 1.0,
871
+ "audience_segment_precision_macro": 1.0,
872
+ "audience_segment_recall_macro": 1.0,
873
+ "audience_segment_f1_macro": 1.0,
874
+ "detection_difficulty_accuracy": 0.41333333333333333,
875
+ "detection_difficulty_precision_macro": 0.4076248313090418,
876
+ "detection_difficulty_recall_macro": 0.4146464646464647,
877
+ "detection_difficulty_f1_macro": 0.41032213795594075,
878
+ "aggravating_factors_precision_micro": 0.6404494382022472,
879
+ "aggravating_factors_precision_macro": 0.6351122397339503,
880
+ "aggravating_factors_recall_micro": 0.7276595744680852,
881
+ "aggravating_factors_recall_macro": 0.7164210015443564,
882
+ "aggravating_factors_f1_micro": 0.6812749003984064,
883
+ "aggravating_factors_f1_macro": 0.6705742793431082,
884
+ "stage_a_selection_score": 0.7687761716662238,
885
+ "selection_score": 0.7690657581979315,
886
+ "scenario_key_count": 150,
887
+ "rows_per_scenario_min": 1,
888
+ "rows_per_scenario_median": 1.0,
889
+ "rows_per_scenario_max": 1,
890
+ "violation_accuracy_scenario_macro": 0.9933333333333333,
891
+ "violation_accuracy_scenario_macro_risky": 0.9929577464788732,
892
+ "violation_accuracy_scenario_macro_clean": 1.0,
893
+ "violation_accuracy_scenario_min": 0.0,
894
+ "violation_worst_scenario_key": "train_1371",
895
+ "violation_worst_scenario_label": "risky"
896
+ },
897
+ "test": {
898
+ "loss": 10.207207107543946,
899
+ "violation_accuracy": 0.9866666666666667,
900
+ "violation_precision": 1.0,
901
+ "violation_recall": 0.9859154929577465,
902
+ "violation_f1": 0.9929078014184397,
903
+ "severity_accuracy": 0.7266666666666667,
904
+ "severity_precision_macro": 0.7056742540613509,
905
+ "severity_recall_macro": 0.6917853651724619,
906
+ "severity_f1_macro": 0.6937461494861875,
907
+ "domain_accuracy": 0.82,
908
+ "domain_precision_macro": 0.8639371000239372,
909
+ "domain_recall_macro": 0.7870126705653021,
910
+ "domain_f1_macro": 0.8032142065328451,
911
+ "subtype_accuracy": 0.7733333333333333,
912
+ "subtype_precision_macro": 0.7708825265643447,
913
+ "subtype_recall_macro": 0.7368260527351436,
914
+ "subtype_f1_macro": 0.7383595011385061,
915
+ "jurisdiction_accuracy": 0.74,
916
+ "jurisdiction_precision_macro": 0.5511805026656511,
917
+ "jurisdiction_recall_macro": 0.5755799755799755,
918
+ "jurisdiction_f1_macro": 0.5608646466716769,
919
+ "why_precision_micro": 0.6408045977011494,
920
+ "why_precision_macro": 0.6228897802851919,
921
+ "why_recall_micro": 0.8228782287822878,
922
+ "why_recall_macro": 0.7797228098698687,
923
+ "why_f1_micro": 0.7205169628432957,
924
+ "why_f1_macro": 0.6837887640406874,
925
+ "impacted_principles_precision_micro": 0.7368421052631579,
926
+ "impacted_principles_precision_macro": 0.7691853878810401,
927
+ "impacted_principles_recall_micro": 0.7636363636363637,
928
+ "impacted_principles_recall_macro": 0.6710974322869485,
929
+ "impacted_principles_f1_micro": 0.7499999999999999,
930
+ "impacted_principles_f1_macro": 0.7030370589130892,
931
+ "remediation_actions_precision_micro": 0.6188811188811189,
932
+ "remediation_actions_precision_macro": 0.5923653065256482,
933
+ "remediation_actions_recall_micro": 0.7695652173913043,
934
+ "remediation_actions_recall_macro": 0.684497765569872,
935
+ "remediation_actions_f1_micro": 0.686046511627907,
936
+ "remediation_actions_f1_macro": 0.6175714466344578,
937
+ "content_type_accuracy": 1.0,
938
+ "content_type_precision_macro": 1.0,
939
+ "content_type_recall_macro": 1.0,
940
+ "content_type_f1_macro": 1.0,
941
+ "audience_segment_accuracy": 1.0,
942
+ "audience_segment_precision_macro": 1.0,
943
+ "audience_segment_recall_macro": 1.0,
944
+ "audience_segment_f1_macro": 1.0,
945
+ "detection_difficulty_accuracy": 0.47333333333333333,
946
+ "detection_difficulty_precision_macro": 0.46757744378508614,
947
+ "detection_difficulty_recall_macro": 0.471182412358883,
948
+ "detection_difficulty_f1_macro": 0.46490073858516184,
949
+ "aggravating_factors_precision_micro": 0.6641509433962264,
950
+ "aggravating_factors_precision_macro": 0.6283313196161129,
951
+ "aggravating_factors_recall_micro": 0.7333333333333333,
952
+ "aggravating_factors_recall_macro": 0.6949052211781471,
953
+ "aggravating_factors_f1_micro": 0.697029702970297,
954
+ "aggravating_factors_f1_macro": 0.6546016914120363,
955
+ "stage_a_selection_score": 0.7506931806680867,
956
+ "selection_score": 0.7565296660343293,
957
+ "scenario_key_count": 150,
958
+ "rows_per_scenario_min": 1,
959
+ "rows_per_scenario_median": 1.0,
960
+ "rows_per_scenario_max": 1,
961
+ "violation_accuracy_scenario_macro": 0.9866666666666667,
962
+ "violation_accuracy_scenario_macro_risky": 0.9859154929577465,
963
+ "violation_accuracy_scenario_macro_clean": 1.0,
964
+ "violation_accuracy_scenario_min": 0.0,
965
+ "violation_worst_scenario_key": "train_1843",
966
+ "violation_worst_scenario_label": "risky"
967
+ },
968
+ "thresholds": {
969
+ "violation": 0.5,
970
+ "why": 0.55,
971
+ "impacted_principles": 0.7,
972
+ "remediation_actions": 0.5,
973
+ "aggravating_factors": 0.4
974
+ },
975
+ "log_path": "_cache/logs/legacy/stage-a-grid-v3-gpu/raw/260424_135746_sentinel-mb-c-d11.log",
976
+ "prior_poc_inflation_factors": [
977
+ "The previous PoC reused the same 17 synthetic families across train, dev, and test, so the model mostly learned family signatures rather than broad compliance reasoning.",
978
+ "Every prior observation carried extra structural cues such as source metadata, evidence snippets, and explicit jurisdiction sentences appended to the text.",
979
+ "A later dataset refactor silently dropped jurisdiction, impacted-principle, and remediation heads, which made the reported Stage A contract narrower than the product actually promises.",
980
+ "Reported micro metrics on dense negative label maps made performance look cleaner than a realistic class-by-class review would suggest."
981
+ ],
982
+ "mitigations": [
983
+ "The data pipeline now uses a 150-row agent-authored pilot plus a hard human-review gate before any 1000/100/100 release split is allowed to exist on disk.",
984
+ "The generation workflow now keeps Python limited to validation, formatting, duplicate review, and statistics while the agent authors and labels each observation directly.",
985
+ "The encoder default still uses a 512-token window, which comfortably covers the current 1000-character manual-authoring ceiling.",
986
+ "The full Stage A diagnose/prescribe contract is restored in both dataset and model outputs: jurisdiction, why, impacted principles, remediation actions, detection difficulty, and aggravating factors are all explicit.",
987
+ "Dataset generation now validates the mock contract keys directly and requires a human-reviewed approval hash before contract changes can pass validation.",
988
+ "The model factory now constructs full model bundles, while checkpoints store the trained projection and heads plus the frozen encoder reference instead of duplicating immutable backbone weights.",
989
+ "Evaluation artifacts now report scenario-family macro violation metrics and worst-family binary performance so repeated rows inside a narrow split cannot hide behind a flattering row-average alone.",
990
+ "Cross-checkpoint comparison artifacts are only kept when they are refreshed against the current dataset, preventing stale benchmark reports from masquerading as current evidence."
991
+ ],
992
+ "artifact_format": "checkpoint_only",
993
+ "end_to_end_serialized": false,
994
+ "transformers_bundle_dir": null,
995
+ "checkpoint_dir": "_models/stage-a-grid-v3-gpu/sentinel-mb-c-d11/260424_135913_sentinel-mb-c-d11",
996
+ "display_name": "sentinel-mb-c-d11@260424_135913"
997
+ }
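The reported metrics in the file above can be cross-checked against each other. The following is a minimal sketch, assuming the binary and micro-averaged F1 values are the harmonic mean of the reported precision and recall. Macro F1 is averaged per class, so it will generally not satisfy this identity. The helper name `f1_from_pr` is hypothetical and is not part of the release bundle; the numbers are copied from the `dev` block.

```python
import math


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the "dev" block above.
dev = {
    "violation": (1.0, 0.9929577464788732, 0.9964664310954063),
    "why_micro": (0.616822429906542, 0.752851711026616, 0.678082191780822),
    "remediation_actions_micro": (
        0.6105263157894737,
        0.7733333333333333,
        0.6823529411764706,
    ),
}

# True for every head whose reported F1 matches the harmonic-mean identity.
checks = {
    name: math.isclose(f1_from_pr(p, r), f1, rel_tol=1e-9)
    for name, (p, r, f1) in dev.items()
}
```

A failed check on a binary or micro-averaged head would suggest that its precision, recall, and F1 fields were not computed from the same set of predictions.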
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6a10402332c588c7d67faa61f507aecee0b2d4004c685cb425b6e180dbfbf554
+ size 653387268
modeling_sentinel.py ADDED
@@ -0,0 +1,294 @@
1
+ """Self-contained Transformers model for Sentinel Stage A."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from typing import Any
6
+
7
+ import torch
8
+ import torch.nn as nn
9
+ from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel
10
+
11
+ from .configuration_sentinel import SentinelConfig
12
+
13
+
14
+ def _masked_mean(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
15
+ mask = attention_mask.unsqueeze(-1)
16
+ summed = (hidden * mask).sum(dim=1)
17
+ counts = mask.sum(dim=1).clamp(min=1)
18
+ return summed / counts
19
+
20
+
21
+ def _head_sizes(config: SentinelConfig) -> dict[str, int]:
22
+ sizes: dict[str, int] = {}
23
+ for head in config.output_heads:
24
+ head_info = config.output_signature[head]
25
+ if head_info.get("type") == "binary":
26
+ sizes[head] = 1
27
+ else:
28
+ sizes[head] = len(head_info.get("labels", []))
29
+ return sizes
30
+
31
+
32
+ def _build_encoder_config(config: SentinelConfig) -> PretrainedConfig:
33
+ encoder_config = dict(config.encoder_config)
34
+ for key, value in dict(getattr(config, "encoder_config_overrides", {}) or {}).items():
35
+ encoder_config[key] = value
36
+ model_type = encoder_config.pop("model_type", None)
37
+ remote_error: Exception | None = None
38
+
39
+ if bool(getattr(config, "encoder_trust_remote_code", False)):
40
+ remote_kwargs: dict[str, Any] = {"trust_remote_code": True}
41
+ if getattr(config, "encoder_revision", None):
42
+ remote_kwargs["revision"] = config.encoder_revision
43
+ if getattr(config, "encoder_code_revision", None):
44
+ remote_kwargs["code_revision"] = config.encoder_code_revision
45
+ try:
46
+ trusted_config = AutoConfig.from_pretrained(
47
+ config.encoder_model_name,
48
+ **remote_kwargs,
49
+ )
50
+ for key, value in encoder_config.items():
51
+ setattr(trusted_config, key, value)
52
+ return trusted_config
53
+ except Exception as exc:
54
+ remote_error = exc
55
+
56
+ if not model_type:
57
+ raise ValueError("SentinelConfig.encoder_config must include model_type")
58
+ try:
59
+ return AutoConfig.for_model(model_type, **encoder_config)
60
+ except Exception as exc:
61
+ if remote_error is not None:
62
+ raise ValueError(
63
+ "could not build trusted remote encoder config; "
64
+ f"remote_error={type(remote_error).__name__}: {remote_error}"
65
+ ) from exc
66
+ raise
67
+
68
+
+ class SharedProjection(nn.Module):
+     def __init__(self, input_size: int, hidden_size: int, dropout: float) -> None:
+         super().__init__()
+         self.input_norm = nn.LayerNorm(input_size)
+         self.hidden = nn.Linear(input_size, hidden_size)
+         self.activation = nn.GELU()
+         self.dropout = nn.Dropout(dropout)
+         self.residual = nn.Linear(input_size, hidden_size) if input_size != hidden_size else nn.Identity()
+         self.output_norm = nn.LayerNorm(hidden_size)
+
+     def forward(self, features: torch.Tensor) -> torch.Tensor:
+         projected = self.hidden(self.input_norm(features))
+         projected = self.activation(projected)
+         projected = self.dropout(projected)
+         return self.output_norm(projected + self.residual(features))
+
+
+ class BaseStageAClassifier(nn.Module):
+     @staticmethod
+     def _format_outputs(logits: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
+         logits["violation"] = logits["violation"].squeeze(-1)
+         return logits
+
+
+ class DirectStageAClassifier(BaseStageAClassifier):
+     def __init__(self, input_size: int, config: SentinelConfig) -> None:
+         super().__init__()
+         projection_size = int(config.projection_size)
+         dropout = float(config.classifier_dropout)
+         sizes = _head_sizes(config)
+         self.shared = SharedProjection(input_size, projection_size, dropout)
+         self.violation = nn.Linear(projection_size, sizes["violation"])
+         self.severity = nn.Linear(projection_size, sizes["severity"])
+         self.domain = nn.Linear(projection_size, sizes["domain"])
+         self.subtype = nn.Linear(projection_size, sizes["subtype"])
+         self.jurisdiction = nn.Linear(projection_size, sizes["jurisdiction"])
+         self.why = nn.Linear(projection_size, sizes["why"])
+         self.impacted_principles = nn.Linear(projection_size, sizes["impacted_principles"])
+         self.remediation_actions = nn.Linear(projection_size, sizes["remediation_actions"])
+         self.content_type = nn.Linear(projection_size, sizes["content_type"])
+         self.audience_segment = nn.Linear(projection_size, sizes["audience_segment"])
+         self.detection_difficulty = nn.Linear(projection_size, sizes["detection_difficulty"])
+         self.aggravating_factors = nn.Linear(projection_size, sizes["aggravating_factors"])
+
+     def forward(self, features: torch.Tensor) -> dict[str, torch.Tensor]:
+         hidden = self.shared(features)
+         return self._format_outputs(
+             {
+                 "violation": self.violation(hidden),
+                 "severity": self.severity(hidden),
+                 "domain": self.domain(hidden),
+                 "subtype": self.subtype(hidden),
+                 "jurisdiction": self.jurisdiction(hidden),
+                 "why": self.why(hidden),
+                 "impacted_principles": self.impacted_principles(hidden),
+                 "remediation_actions": self.remediation_actions(hidden),
+                 "content_type": self.content_type(hidden),
+                 "audience_segment": self.audience_segment(hidden),
+                 "detection_difficulty": self.detection_difficulty(hidden),
+                 "aggravating_factors": self.aggravating_factors(hidden),
+             }
+         )
+
+
+ def _funnel_width(size: int, divisor: int, floor: int) -> int:
+     return max(floor, size // max(1, divisor))
+
+
+ class FunnelHead(nn.Module):
+     def __init__(
+         self,
+         input_size: int,
+         output_size: int,
+         dropout: float,
+         head_div: int,
+         head_mul: int,
+         head_skip: bool,
+     ) -> None:
+         super().__init__()
+         self.input_size = int(input_size)
+         self.hidden_size = _funnel_width(self.input_size, head_div, 32)
+         self.final_size = _funnel_width(self.input_size, head_div * head_mul, 16)
+         self.input_norm = nn.LayerNorm(self.input_size)
+         self.first = nn.Linear(self.input_size, self.hidden_size)
+         self.activation = nn.GELU()
+         self.dropout = nn.Dropout(dropout)
+         self.second = nn.Linear(self.hidden_size, self.final_size)
+         self.residual = (
+             nn.Linear(self.input_size, self.final_size)
+             if head_skip and self.input_size != self.final_size
+             else nn.Identity()
+             if head_skip
+             else None
+         )
+         self.output_norm = nn.LayerNorm(self.final_size)
+         self.out = nn.Linear(self.final_size, output_size)
+
+     def forward(self, features: torch.Tensor) -> torch.Tensor:
+         normalized = self.input_norm(features)
+         hidden = self.first(normalized)
+         hidden = self.activation(hidden)
+         hidden = self.dropout(hidden)
+         hidden = self.second(hidden)
+         hidden = self.activation(hidden)
+         hidden = self.dropout(hidden)
+         if self.residual is not None:
+             hidden = hidden + self.residual(features)
+         return self.out(self.output_norm(hidden))
+
+
+ class RecombinationStageAClassifier(BaseStageAClassifier):
+     def __init__(self, input_size: int, config: SentinelConfig) -> None:
+         super().__init__()
+         projection_size = int(config.projection_size)
+         dropout = float(config.head_dropout)
+         self.shared = SharedProjection(input_size, projection_size, dropout)
+         self.heads = nn.ModuleDict(
+             {
+                 head: FunnelHead(
+                     projection_size,
+                     size,
+                     dropout,
+                     int(config.head_div),
+                     int(config.head_mul),
+                     bool(config.head_skip),
+                 )
+                 for head, size in _head_sizes(config).items()
+             }
+         )
+
+     def forward(self, features: torch.Tensor) -> dict[str, torch.Tensor]:
+         hidden = self.shared(features)
+         return self._format_outputs({head: layer(hidden) for head, layer in self.heads.items()})
+
+
+ class ColumnarStageAClassifier(BaseStageAClassifier):
+     def __init__(self, input_size: int, config: SentinelConfig) -> None:
+         super().__init__()
+         dropout = float(config.head_dropout)
+         self.heads = nn.ModuleDict(
+             {
+                 head: FunnelHead(
+                     int(input_size),
+                     size,
+                     dropout,
+                     int(config.head_div),
+                     int(config.head_mul),
+                     bool(config.head_skip),
+                 )
+                 for head, size in _head_sizes(config).items()
+             }
+         )
+
+     def forward(self, features: torch.Tensor) -> dict[str, torch.Tensor]:
+         return self._format_outputs({head: layer(features) for head, layer in self.heads.items()})
+
+
+ class SentinelStageAModel(PreTrainedModel):
+     """Frozen-encoder Sentinel classifier serialized as one Transformers model."""
+
+     config_class = SentinelConfig
+     base_model_prefix = "encoder"
+     main_input_name = "input_ids"
+
+     def __init__(self, config: SentinelConfig) -> None:
+         super().__init__(config)
+         if not config.encoder_config:
+             raise ValueError("SentinelConfig.encoder_config is required")
+         encoder_config = _build_encoder_config(config)
+         self.encoder = AutoModel.from_config(
+             encoder_config,
+             trust_remote_code=bool(getattr(config, "encoder_trust_remote_code", False)),
+         )
+         hidden_size = int(getattr(self.encoder.config, "hidden_size"))
+         if config.head_type == "direct":
+             self.classifier = DirectStageAClassifier(hidden_size, config)
+         elif config.head_type == "recombine":
+             self.classifier = RecombinationStageAClassifier(hidden_size, config)
+         elif config.head_type == "columnar":
+             self.classifier = ColumnarStageAClassifier(hidden_size, config)
+         else:
+             raise ValueError(f"unsupported Sentinel head_type={config.head_type}")
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids: torch.Tensor | None = None,
+         attention_mask: torch.Tensor | None = None,
+         token_type_ids: torch.Tensor | None = None,
+         position_ids: torch.Tensor | None = None,
+         head_mask: torch.Tensor | None = None,
+         inputs_embeds: torch.Tensor | None = None,
+         output_attentions: bool | None = None,
+         output_hidden_states: bool | None = None,
+         return_dict: bool | None = None,
+         **kwargs: Any,
+     ) -> dict[str, dict[str, torch.Tensor]] | tuple[dict[str, torch.Tensor]]:
+         encoder_kwargs: dict[str, Any] = {
+             "input_ids": input_ids,
+             "attention_mask": attention_mask,
+             "inputs_embeds": inputs_embeds,
+             "return_dict": True,
+         }
+         if head_mask is not None:
+             encoder_kwargs["head_mask"] = head_mask
+         if token_type_ids is not None:
+             encoder_kwargs["token_type_ids"] = token_type_ids
+         if position_ids is not None:
+             encoder_kwargs["position_ids"] = position_ids
+         if output_attentions is not None:
+             encoder_kwargs["output_attentions"] = output_attentions
+         if output_hidden_states is not None:
+             encoder_kwargs["output_hidden_states"] = output_hidden_states
+         encoder_outputs = self.encoder(**encoder_kwargs, **kwargs)
+         if attention_mask is None:
+             batch_size, sequence_length = encoder_outputs.last_hidden_state.shape[:2]
+             attention_mask = torch.ones(
+                 (batch_size, sequence_length),
+                 dtype=encoder_outputs.last_hidden_state.dtype,
+                 device=encoder_outputs.last_hidden_state.device,
+             )
+         features = _masked_mean(encoder_outputs.last_hidden_state, attention_mask)
+         logits = self.classifier(features)
+         if return_dict is False:
+             return (logits,)
+         return {"logits": logits}
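Two pieces of arithmetic in the modeling file above are easy to check by hand: the masked-mean pooling that turns token states into one feature vector, and the `_funnel_width` rule that sizes each `FunnelHead`. The following is a minimal plain-Python sketch, not the released code. It assumes the mask is a 0/1 value per token that is broadcast over the hidden dimension, and the sizes `768`, `head_div=2`, and `head_mul=2` are hypothetical values chosen only for illustration.

```python
def masked_mean(hidden, mask):
    """Average token vectors per sequence, ignoring masked positions.

    hidden: nested lists shaped [batch][seq][dim]; mask: [batch][seq] of 0/1.
    Follows the same recipe as the diff: sum(hidden * mask) / clamp(sum(mask), min=1).
    """
    pooled = []
    for tokens, flags in zip(hidden, mask):
        dim = len(tokens[0])
        summed = [0.0] * dim
        for vector, flag in zip(tokens, flags):
            for i, value in enumerate(vector):
                summed[i] += value * flag
        count = max(1, sum(flags))  # clamp(min=1) avoids dividing by zero
        pooled.append([value / count for value in summed])
    return pooled


def funnel_width(size, divisor, floor):
    # Same arithmetic as _funnel_width in the diff above.
    return max(floor, size // max(1, divisor))


hidden = [
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # third token is padding
    [[5.0, 6.0], [0.0, 0.0], [0.0, 0.0]],
]
mask = [[1, 1, 0], [0, 0, 0]]  # second row is fully masked
pooled = masked_mean(hidden, mask)
print(pooled)

# Hypothetical settings: projection_size=768, head_div=2, head_mul=2.
hidden_width = funnel_width(768, 2, 32)
final_width = funnel_width(768, 2 * 2, 16)
print(hidden_width, final_width)
```

The fully masked row shows why the count is clamped: without it the division would be by zero. The floors (32 and 16 in `FunnelHead`) keep very small inputs from collapsing to a zero-width layer.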
pipeline_sentinel.py ADDED
@@ -0,0 +1,103 @@
+ """Custom Transformers pipeline for Sentinel Stage A inference."""
+
+ from __future__ import annotations
+
+ from typing import Any
+
+ import torch
+ from transformers import Pipeline
+
+
+ class SentinelStageAPipeline(Pipeline):
+     """Run Sentinel Stage A prediction and return JSON-serializable probabilities."""
+
+     def _sanitize_parameters(self, **kwargs: Any) -> tuple[dict[str, Any], dict[str, Any], dict[str, Any]]:
+         preprocess_kwargs: dict[str, Any] = {}
+         postprocess_kwargs: dict[str, Any] = {}
+         if "max_length" in kwargs:
+             preprocess_kwargs["max_length"] = kwargs["max_length"]
+         if "return_all_probabilities" in kwargs:
+             postprocess_kwargs["return_all_probabilities"] = kwargs["return_all_probabilities"]
+         if "threshold_overrides" in kwargs:
+             postprocess_kwargs["threshold_overrides"] = kwargs["threshold_overrides"]
+         return preprocess_kwargs, {}, postprocess_kwargs
+
+     def preprocess(self, inputs: str, max_length: int | None = None) -> dict[str, torch.Tensor]:
+         if not isinstance(inputs, str):
+             raise TypeError(f"SentinelStageAPipeline expects a string input, got {type(inputs).__name__}")
+         limit = int(max_length or getattr(self.model.config, "max_length", 512))
+         return self.tokenizer(
+             inputs,
+             padding=False,
+             truncation=True,
+             max_length=limit,
+             return_tensors=self.framework,
+         )
+
+     def _forward(self, model_inputs: dict[str, torch.Tensor]) -> Any:
+         return self.model(**model_inputs)
+
+     def postprocess(
+         self,
+         model_outputs: Any,
+         return_all_probabilities: bool = True,
+         threshold_overrides: dict[str, float] | None = None,
+     ) -> dict[str, Any]:
+         if isinstance(model_outputs, tuple):
+             logits = model_outputs[0]
+         elif isinstance(model_outputs, dict):
+             logits = model_outputs["logits"]
+         else:
+             logits = model_outputs.logits
+         signature = getattr(self.model.config, "output_signature", {})
+         output_heads = getattr(self.model.config, "output_heads", None) or list(signature.keys())
+         thresholds = dict(getattr(self.model.config, "thresholds", {}) or {})
+         if threshold_overrides:
+             thresholds.update(threshold_overrides)
+
+         result: dict[str, Any] = {}
+         for head in output_heads:
+             head_info = signature[head]
+             head_type = head_info.get("type")
+             head_logits = logits[head]
+             if head_type == "binary":
+                 probability = float(torch.sigmoid(head_logits)[0].detach().cpu())
+                 threshold = float(thresholds.get(head, 0.5))
+                 result[head] = {
+                     "label": probability >= threshold,
+                     "probability": probability,
+                     "threshold": threshold,
+                 }
+             elif head_type == "multiclass":
+                 labels = [str(label) for label in head_info.get("labels", [])]
+                 probabilities = torch.softmax(head_logits, dim=-1)[0].detach().cpu()
+                 index = int(torch.argmax(probabilities).item())
+                 result[head] = {
+                     "label": labels[index],
+                     "probability": float(probabilities[index]),
+                 }
+                 if return_all_probabilities:
+                     result[head]["probabilities"] = {
+                         label: float(probabilities[position])
+                         for position, label in enumerate(labels)
+                     }
+             elif head_type == "multilabel":
+                 labels = [str(label) for label in head_info.get("labels", [])]
+                 probabilities = torch.sigmoid(head_logits)[0].detach().cpu()
+                 threshold = float(thresholds.get(head, 0.5))
+                 result[head] = {
+                     "labels": [
+                         label
+                         for position, label in enumerate(labels)
+                         if float(probabilities[position]) >= threshold
+                     ],
+                     "threshold": threshold,
+                 }
+                 if return_all_probabilities:
+                     result[head]["probabilities"] = {
+                         label: float(probabilities[position])
+                         for position, label in enumerate(labels)
+                     }
+             else:
+                 raise ValueError(f"unsupported Sentinel head type for {head}: {head_type}")
+         return result
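The `postprocess` method above decodes each head according to its declared type: sigmoid plus a threshold for `binary`, softmax plus argmax for `multiclass`, and per-label sigmoid plus a threshold for `multilabel`. The sketch below restates that decision logic in plain Python so it can be followed without PyTorch. It is an illustration under assumptions, not the pipeline's code: `decode_head` is a hypothetical helper, and the logits and label names are made up.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def decode_head(head_type, logits, labels=(), threshold=0.5):
    """Turn one head's raw logits into a JSON-serializable dict."""
    if head_type == "binary":
        probability = sigmoid(logits)
        return {
            "label": probability >= threshold,
            "probability": probability,
            "threshold": threshold,
        }
    if head_type == "multiclass":
        probabilities = softmax(logits)
        index = max(range(len(probabilities)), key=probabilities.__getitem__)
        return {"label": labels[index], "probability": probabilities[index]}
    if head_type == "multilabel":
        probabilities = [sigmoid(v) for v in logits]
        chosen = [label for label, p in zip(labels, probabilities) if p >= threshold]
        return {"labels": chosen, "threshold": threshold}
    raise ValueError(f"unsupported head type: {head_type}")


# Hypothetical logits and label names, for illustration only.
violation = decode_head("binary", 0.0, threshold=0.5)
severity = decode_head("multiclass", [0.1, 2.0, -1.0], labels=["low", "medium", "high"])
factors = decode_head("multilabel", [3.0, -3.0], labels=["factor_a", "factor_b"], threshold=0.5)
print(violation, severity, factors)
```

Note that the comparison is `>=`, so a probability exactly at the threshold counts as a positive, which matches the comparisons in the diff. Raising a head's threshold through `threshold_overrides` therefore trades recall for precision on that head only.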
results.md ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,945 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "|||IP_ADDRESS|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "1": {
+       "content": "<|padding|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50254": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50255": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50256": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50257": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50258": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50259": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50260": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50261": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50262": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50263": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50264": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50265": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50266": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50267": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50268": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50269": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50270": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50271": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50272": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50273": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50274": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50275": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50276": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50277": {
+       "content": "|||EMAIL_ADDRESS|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50278": {
+       "content": "|||PHONE_NUMBER|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50279": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50280": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50281": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50282": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50283": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50284": {
+       "content": "[MASK]",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
267
+ "50285": {
268
+ "content": "[unused0]",
269
+ "lstrip": false,
270
+ "normalized": true,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": false
274
+ },
275
+ "50286": {
276
+ "content": "[unused1]",
277
+ "lstrip": false,
278
+ "normalized": true,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": false
282
+ },
283
+ "50287": {
284
+ "content": "[unused2]",
285
+ "lstrip": false,
286
+ "normalized": true,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": false
290
+ },
291
+ "50288": {
292
+ "content": "[unused3]",
293
+ "lstrip": false,
294
+ "normalized": true,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": false
298
+ },
299
+ "50289": {
300
+ "content": "[unused4]",
301
+ "lstrip": false,
302
+ "normalized": true,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": false
306
+ },
307
+ "50290": {
308
+ "content": "[unused5]",
309
+ "lstrip": false,
310
+ "normalized": true,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": false
314
+ },
315
+ "50291": {
316
+ "content": "[unused6]",
317
+ "lstrip": false,
318
+ "normalized": true,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": false
322
+ },
323
+ "50292": {
324
+ "content": "[unused7]",
325
+ "lstrip": false,
326
+ "normalized": true,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": false
330
+ },
331
+ "50293": {
332
+ "content": "[unused8]",
333
+ "lstrip": false,
334
+ "normalized": true,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": false
338
+ },
339
+ "50294": {
340
+ "content": "[unused9]",
341
+ "lstrip": false,
342
+ "normalized": true,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": false
346
+ },
347
+ "50295": {
348
+ "content": "[unused10]",
349
+ "lstrip": false,
350
+ "normalized": true,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": false
354
+ },
355
+ "50296": {
356
+ "content": "[unused11]",
357
+ "lstrip": false,
358
+ "normalized": true,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": false
362
+ },
363
+ "50297": {
364
+ "content": "[unused12]",
365
+ "lstrip": false,
366
+ "normalized": true,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": false
370
+ },
371
+ "50298": {
372
+ "content": "[unused13]",
373
+ "lstrip": false,
374
+ "normalized": true,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": false
378
+ },
379
+ "50299": {
380
+ "content": "[unused14]",
381
+ "lstrip": false,
382
+ "normalized": true,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": false
386
+ },
387
+ "50300": {
388
+ "content": "[unused15]",
389
+ "lstrip": false,
390
+ "normalized": true,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": false
394
+ },
395
+ "50301": {
396
+ "content": "[unused16]",
397
+ "lstrip": false,
398
+ "normalized": true,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": false
402
+ },
403
+ "50302": {
404
+ "content": "[unused17]",
405
+ "lstrip": false,
406
+ "normalized": true,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": false
410
+ },
411
+ "50303": {
412
+ "content": "[unused18]",
413
+ "lstrip": false,
414
+ "normalized": true,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": false
418
+ },
419
+ "50304": {
420
+ "content": "[unused19]",
421
+ "lstrip": false,
422
+ "normalized": true,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": false
426
+ },
427
+ "50305": {
428
+ "content": "[unused20]",
429
+ "lstrip": false,
430
+ "normalized": true,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": false
434
+ },
435
+ "50306": {
436
+ "content": "[unused21]",
437
+ "lstrip": false,
438
+ "normalized": true,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": false
442
+ },
443
+ "50307": {
444
+ "content": "[unused22]",
445
+ "lstrip": false,
446
+ "normalized": true,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": false
450
+ },
451
+ "50308": {
452
+ "content": "[unused23]",
453
+ "lstrip": false,
454
+ "normalized": true,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": false
458
+ },
459
+ "50309": {
460
+ "content": "[unused24]",
461
+ "lstrip": false,
462
+ "normalized": true,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": false
466
+ },
467
+ "50310": {
468
+ "content": "[unused25]",
469
+ "lstrip": false,
470
+ "normalized": true,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": false
474
+ },
475
+ "50311": {
476
+ "content": "[unused26]",
477
+ "lstrip": false,
478
+ "normalized": true,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": false
482
+ },
483
+ "50312": {
484
+ "content": "[unused27]",
485
+ "lstrip": false,
486
+ "normalized": true,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": false
490
+ },
491
+ "50313": {
492
+ "content": "[unused28]",
493
+ "lstrip": false,
494
+ "normalized": true,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": false
498
+ },
499
+ "50314": {
500
+ "content": "[unused29]",
501
+ "lstrip": false,
502
+ "normalized": true,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": false
506
+ },
507
+ "50315": {
508
+ "content": "[unused30]",
509
+ "lstrip": false,
510
+ "normalized": true,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": false
514
+ },
515
+ "50316": {
516
+ "content": "[unused31]",
517
+ "lstrip": false,
518
+ "normalized": true,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": false
522
+ },
523
+ "50317": {
524
+ "content": "[unused32]",
525
+ "lstrip": false,
526
+ "normalized": true,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": false
530
+ },
531
+ "50318": {
532
+ "content": "[unused33]",
533
+ "lstrip": false,
534
+ "normalized": true,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": false
538
+ },
539
+ "50319": {
540
+ "content": "[unused34]",
541
+ "lstrip": false,
542
+ "normalized": true,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": false
546
+ },
547
+ "50320": {
548
+ "content": "[unused35]",
549
+ "lstrip": false,
550
+ "normalized": true,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": false
554
+ },
555
+ "50321": {
556
+ "content": "[unused36]",
557
+ "lstrip": false,
558
+ "normalized": true,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": false
562
+ },
563
+ "50322": {
564
+ "content": "[unused37]",
565
+ "lstrip": false,
566
+ "normalized": true,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": false
570
+ },
571
+ "50323": {
572
+ "content": "[unused38]",
573
+ "lstrip": false,
574
+ "normalized": true,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": false
578
+ },
579
+ "50324": {
580
+ "content": "[unused39]",
581
+ "lstrip": false,
582
+ "normalized": true,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": false
586
+ },
587
+ "50325": {
588
+ "content": "[unused40]",
589
+ "lstrip": false,
590
+ "normalized": true,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": false
594
+ },
595
+ "50326": {
596
+ "content": "[unused41]",
597
+ "lstrip": false,
598
+ "normalized": true,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": false
602
+ },
603
+ "50327": {
604
+ "content": "[unused42]",
605
+ "lstrip": false,
606
+ "normalized": true,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": false
610
+ },
611
+ "50328": {
612
+ "content": "[unused43]",
613
+ "lstrip": false,
614
+ "normalized": true,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": false
618
+ },
619
+ "50329": {
620
+ "content": "[unused44]",
621
+ "lstrip": false,
622
+ "normalized": true,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": false
626
+ },
627
+ "50330": {
628
+ "content": "[unused45]",
629
+ "lstrip": false,
630
+ "normalized": true,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": false
634
+ },
635
+ "50331": {
636
+ "content": "[unused46]",
637
+ "lstrip": false,
638
+ "normalized": true,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": false
642
+ },
643
+ "50332": {
644
+ "content": "[unused47]",
645
+ "lstrip": false,
646
+ "normalized": true,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": false
650
+ },
651
+ "50333": {
652
+ "content": "[unused48]",
653
+ "lstrip": false,
654
+ "normalized": true,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": false
658
+ },
659
+ "50334": {
660
+ "content": "[unused49]",
661
+ "lstrip": false,
662
+ "normalized": true,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": false
666
+ },
667
+ "50335": {
668
+ "content": "[unused50]",
669
+ "lstrip": false,
670
+ "normalized": true,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": false
674
+ },
675
+ "50336": {
676
+ "content": "[unused51]",
677
+ "lstrip": false,
678
+ "normalized": true,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": false
682
+ },
683
+ "50337": {
684
+ "content": "[unused52]",
685
+ "lstrip": false,
686
+ "normalized": true,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": false
690
+ },
691
+ "50338": {
692
+ "content": "[unused53]",
693
+ "lstrip": false,
694
+ "normalized": true,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": false
698
+ },
699
+ "50339": {
700
+ "content": "[unused54]",
701
+ "lstrip": false,
702
+ "normalized": true,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": false
706
+ },
707
+ "50340": {
708
+ "content": "[unused55]",
709
+ "lstrip": false,
710
+ "normalized": true,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": false
714
+ },
715
+ "50341": {
716
+ "content": "[unused56]",
717
+ "lstrip": false,
718
+ "normalized": true,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": false
722
+ },
723
+ "50342": {
724
+ "content": "[unused57]",
725
+ "lstrip": false,
726
+ "normalized": true,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": false
730
+ },
731
+ "50343": {
732
+ "content": "[unused58]",
733
+ "lstrip": false,
734
+ "normalized": true,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": false
738
+ },
739
+ "50344": {
740
+ "content": "[unused59]",
741
+ "lstrip": false,
742
+ "normalized": true,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": false
746
+ },
747
+ "50345": {
748
+ "content": "[unused60]",
749
+ "lstrip": false,
750
+ "normalized": true,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": false
754
+ },
755
+ "50346": {
756
+ "content": "[unused61]",
757
+ "lstrip": false,
758
+ "normalized": true,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": false
762
+ },
763
+ "50347": {
764
+ "content": "[unused62]",
765
+ "lstrip": false,
766
+ "normalized": true,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": false
770
+ },
771
+ "50348": {
772
+ "content": "[unused63]",
773
+ "lstrip": false,
774
+ "normalized": true,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": false
778
+ },
779
+ "50349": {
780
+ "content": "[unused64]",
781
+ "lstrip": false,
782
+ "normalized": true,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": false
786
+ },
787
+ "50350": {
788
+ "content": "[unused65]",
789
+ "lstrip": false,
790
+ "normalized": true,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": false
794
+ },
795
+ "50351": {
796
+ "content": "[unused66]",
797
+ "lstrip": false,
798
+ "normalized": true,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": false
802
+ },
803
+ "50352": {
804
+ "content": "[unused67]",
805
+ "lstrip": false,
806
+ "normalized": true,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": false
810
+ },
811
+ "50353": {
812
+ "content": "[unused68]",
813
+ "lstrip": false,
814
+ "normalized": true,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": false
818
+ },
819
+ "50354": {
820
+ "content": "[unused69]",
821
+ "lstrip": false,
822
+ "normalized": true,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": false
826
+ },
827
+ "50355": {
828
+ "content": "[unused70]",
829
+ "lstrip": false,
830
+ "normalized": true,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": false
834
+ },
835
+ "50356": {
836
+ "content": "[unused71]",
837
+ "lstrip": false,
838
+ "normalized": true,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": false
842
+ },
843
+ "50357": {
844
+ "content": "[unused72]",
845
+ "lstrip": false,
846
+ "normalized": true,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": false
850
+ },
851
+ "50358": {
852
+ "content": "[unused73]",
853
+ "lstrip": false,
854
+ "normalized": true,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": false
858
+ },
859
+ "50359": {
860
+ "content": "[unused74]",
861
+ "lstrip": false,
862
+ "normalized": true,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": false
866
+ },
867
+ "50360": {
868
+ "content": "[unused75]",
869
+ "lstrip": false,
870
+ "normalized": true,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": false
874
+ },
875
+ "50361": {
876
+ "content": "[unused76]",
877
+ "lstrip": false,
878
+ "normalized": true,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": false
882
+ },
883
+ "50362": {
884
+ "content": "[unused77]",
885
+ "lstrip": false,
886
+ "normalized": true,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": false
890
+ },
891
+ "50363": {
892
+ "content": "[unused78]",
893
+ "lstrip": false,
894
+ "normalized": true,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": false
898
+ },
899
+ "50364": {
900
+ "content": "[unused79]",
901
+ "lstrip": false,
902
+ "normalized": true,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": false
906
+ },
907
+ "50365": {
908
+ "content": "[unused80]",
909
+ "lstrip": false,
910
+ "normalized": true,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": false
914
+ },
915
+ "50366": {
916
+ "content": "[unused81]",
917
+ "lstrip": false,
918
+ "normalized": true,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": false
922
+ },
923
+ "50367": {
924
+ "content": "[unused82]",
925
+ "lstrip": false,
926
+ "normalized": true,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": false
930
+ }
931
+ },
932
+ "clean_up_tokenization_spaces": true,
933
+ "cls_token": "[CLS]",
934
+ "extra_special_tokens": {},
935
+ "mask_token": "[MASK]",
936
+ "model_input_names": [
937
+ "input_ids",
938
+ "attention_mask"
939
+ ],
940
+ "model_max_length": 8192,
941
+ "pad_token": "[PAD]",
942
+ "sep_token": "[SEP]",
943
+ "tokenizer_class": "PreTrainedTokenizerFast",
944
+ "unk_token": "[UNK]"
945
+ }