Commit ·
d5ffba1
0
Parent(s):
Release v2.1 · model + card
- .gitattributes +3 -0
- LICENSE +21 -0
- README.md +176 -0
- USAGE.txt +27 -0
- config.json +74 -0
- dtypes.json +92 -0
- model.safetensors +3 -0
- viterbi_calibration.json +14 -0
.gitattributes
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
LICENSE
ADDED
|
@@ -0,0 +1,21 @@
| 1 |
+
MIT License
|
| 2 |
+
|
| 3 |
+
Copyright (c) 2026 Digitflow
|
| 4 |
+
|
| 5 |
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
| 6 |
+
of this software and associated documentation files (the "Software"), to deal
|
| 7 |
+
in the Software without restriction, including without limitation the rights
|
| 8 |
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
| 9 |
+
copies of the Software, and to permit persons to whom the Software is
|
| 10 |
+
furnished to do so, subject to the following conditions:
|
| 11 |
+
|
| 12 |
+
The above copyright notice and this permission notice shall be included in all
|
| 13 |
+
copies or substantial portions of the Software.
|
| 14 |
+
|
| 15 |
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
| 16 |
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
| 17 |
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
| 18 |
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
| 19 |
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
| 20 |
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
| 21 |
+
SOFTWARE.
|
README.md
ADDED
|
@@ -0,0 +1,176 @@
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
language:
|
| 4 |
+
- de
|
| 5 |
+
base_model: openai/privacy-filter
|
| 6 |
+
pipeline_tag: token-classification
|
| 7 |
+
library_name: opf
|
| 8 |
+
tags:
|
| 9 |
+
- pii
|
| 10 |
+
- privacy
|
| 11 |
+
- ner
|
| 12 |
+
- token-classification
|
| 13 |
+
- german
|
| 14 |
+
- de
|
| 15 |
+
- privacy-filter
|
| 16 |
+
- opf
|
| 17 |
+
datasets:
|
| 18 |
+
- ai4privacy/open-pii-masking-500k-ai4privacy
|
| 19 |
+
metrics:
|
| 20 |
+
- f1
|
| 21 |
+
model-index:
|
| 22 |
+
- name: digitflow/privacy-filter-de-ft
|
| 23 |
+
results:
|
| 24 |
+
- task:
|
| 25 |
+
type: token-classification
|
| 26 |
+
name: PII detection (German)
|
| 27 |
+
dataset:
|
| 28 |
+
name: ai4privacy/open-pii-masking-500k-ai4privacy (de validation, n=1,000)
|
| 29 |
+
type: ai4privacy/open-pii-masking-500k-ai4privacy
|
| 30 |
+
split: validation
|
| 31 |
+
args:
|
| 32 |
+
language: de
|
| 33 |
+
metrics:
|
| 34 |
+
- type: f1
|
| 35 |
+
value: 0.8706
|
| 36 |
+
name: OPF-containment F1 (char-level, label-agnostic)
|
| 37 |
+
- type: f1
|
| 38 |
+
value: 0.8368
|
| 39 |
+
name: Char-coverage F1 (label-aware)
|
| 40 |
+
- type: f1
|
| 41 |
+
value: 0.6445
|
| 42 |
+
name: Strict span F1
|
| 43 |
+
---
|
| 44 |
+
|
| 45 |
+
# digitflow/privacy-filter-de-ft
|
| 46 |
+
|
| 47 |
+
A German-language fine-tune of [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter).
|
| 48 |
+
It exposes the same inference API and OPF label space as the base
|
| 49 |
+
model, so existing OPF call sites work without changes on German
|
| 50 |
+
input.
|
| 51 |
+
|
| 52 |
+
**Caveat.** This model is not a perfect redactor for German PII. No
|
| 53 |
+
warranty is provided and Digitflow accepts no legal responsibility
|
| 54 |
+
for decisions made on its output. Use at your own risk. For
|
| 55 |
+
non-German text, use [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter)
|
| 56 |
+
directly.
|
| 57 |
+
|
| 58 |
+
## Benchmark
|
| 59 |
+
|
| 60 |
+
Evaluated on the German subset (`language == 'de'`, n = 1,000) of the
|
| 61 |
+
[`ai4privacy/open-pii-masking-500k-ai4privacy`](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy)
|
| 62 |
+
validation split, scored with OPF-containment F1 (the char-level,
|
| 63 |
+
label-agnostic completeness metric from the OPF reference scoring
|
| 64 |
+
code). 95 % confidence intervals are estimated by 1,000-sample
|
| 65 |
+
bootstrap resampling with replacement, taking the 2.5th and 97.5th
|
| 66 |
+
percentiles of the resulting F1 distribution.
|
| 67 |
+
|
| 68 |
+
| Metric | `openai/privacy-filter` | `digitflow/privacy-filter-de-ft` | Δ |
|
| 69 |
+
|---|---:|---:|---:|
|
| 70 |
+
| **OPF-containment F1** | 0.8437 | **0.8706** | **+0.027** |
|
| 71 |
+
| Leak rate (1 − char recall, label-agnostic) | 23.05 % | **20.49 %** | **−2.56 pp** |
|
| 72 |
+
| Char-coverage F1, label-aware | 0.6791 | **0.8368** | **+0.158** |
|
| 73 |
+
| Strict span F1 | 0.4348 | **0.6445** | **+0.210** |
|
| 74 |
+
| Strict span precision | 0.5645 | **0.7518** | +0.187 |
|
| 75 |
+
| Strict span recall | 0.3536 | **0.5640** | +0.210 |
|
| 76 |
+
|
| 77 |
+
| Model | OPF-containment F1 | 95 % bootstrap CI |
|
| 78 |
+
|---|---:|---|
|
| 79 |
+
| `openai/privacy-filter` | 0.8437 | [0.8294, 0.8579] |
|
| 80 |
+
| `digitflow/privacy-filter-de-ft` | 0.8706 | [0.8585, 0.8812] |
|
| 81 |
+
|
| 82 |
+
The intervals do not overlap, so the +0.027 lift is larger than the
|
| 83 |
+
bootstrap resampling noise on this single 1,000-row validation slice.
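
The percentile bootstrap described in this section can be sketched as follows. This is a minimal illustration, not the scoring code used for this card: `micro_f1`, `bootstrap_ci`, and the per-document `(tp, fp, fn)` character counts are hypothetical names, and the sketch assumes the micro F1 is recomputed from pooled counts on every resample.

```python
import random


def micro_f1(counts):
    """Micro-averaged F1 from per-document (tp, fp, fn) counts."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(counts, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample documents with replacement."""
    rng = random.Random(seed)
    n = len(counts)
    scores = sorted(
        micro_f1([counts[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled scores.
    lo = scores[int((alpha / 2) * n_resamples)]
    hi = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi
```

Resampling whole documents rather than individual characters keeps the characters of one text together, which is the usual choice when errors within a document are correlated.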
|
| 84 |
+
|
| 85 |
+
## Examples
|
| 86 |
+
|
| 87 |
+
Output of `m.redact(text)`, formatted as `label:'redacted text'`.
|
| 88 |
+
`(none)` means the model returned no spans.
|
| 89 |
+
|
| 90 |
+
| Input | `openai/privacy-filter` | `digitflow/privacy-filter-de-ft` |
|
| 91 |
+
|---|---|---|
|
| 92 |
+
| Mein Name ist Jürgen Müller und ich wohne in Hamburg. | `(none)` | `private_person:'Jürgen Müller'`, `private_address:'Hamburg'` |
|
| 93 |
+
| Mein Passwort lautet SicherPasswort123! | `(none)` | `secret:'SicherPasswort123!'` |
|
| 94 |
+
| Senden Sie das Paket an Hauptstraße 25, 10115 Berlin. | `(none)` | `private_address:'Hauptstraße 25, 10115 Berlin'` |
|
| 95 |
+
| Hans-Jürgen Brömmelmeyer hat den Termin bestätigt. | `(none)` | `private_person:'Hans-Jürgen Brömmelmeyer'` |
|
| 96 |
+
| Server-Status: https://intern.firma.de/health. | `(none)` | `private_url:'https://intern.firma.de/health'` |
|
| 97 |
+
| Termin mit Mariella von Schönefeld-Brixius um 15:00. | `private_person:'Mariella von Schönefeld-Brixius'` | `private_person:'Mariella von Schönefeld-Brixius'`, `private_date:'15:00'` |
|
| 98 |
+
|
| 99 |
+
## How it was built
|
| 100 |
+
|
| 101 |
+
The fine-tune adapts the base model to German PII through slot-filled
|
| 102 |
+
augmentation of public German carriers.
|
| 103 |
+
|
| 104 |
+
The augmented data is supplemented by a hand-authored curriculum spanning
|
| 105 |
+
real-world text registers. Training ran on a single NVIDIA Jetson Orin.
|
| 106 |
+
|
| 107 |
+
The training set is screened against the evaluation slice for
|
| 108 |
+
contamination before training begins.
|
| 109 |
+
|
| 110 |
+
## How to use it
|
| 111 |
+
|
| 112 |
+
The OPF Python API is unchanged. Fetch the checkpoint with
|
| 113 |
+
`huggingface_hub.snapshot_download(...)` and pass the resulting local
|
| 114 |
+
path to `opf.OPF`.
|
| 115 |
+
|
| 116 |
+
```python
|
| 117 |
+
from huggingface_hub import snapshot_download
|
| 118 |
+
import opf
|
| 119 |
+
|
| 120 |
+
path = snapshot_download("digitflow/privacy-filter-de-ft")
|
| 121 |
+
|
| 122 |
+
m = opf.OPF(
|
| 123 |
+
model=path,
|
| 124 |
+
device="cuda",
|
| 125 |
+
output_mode="typed",
|
| 126 |
+
decode_mode="viterbi",
|
| 127 |
+
)
|
| 128 |
+
|
| 129 |
+
text = "Mein Name ist Jürgen Müller und ich wohne in Hamburg."
|
| 130 |
+
result = m.redact(text)
|
| 131 |
+
for span in result.detected_spans:
|
| 132 |
+
print(f"{span.label}: {text[span.start:span.end]!r}")
|
| 133 |
+
# private_person: 'Jürgen Müller'
|
| 134 |
+
# private_address: 'Hamburg'
|
| 135 |
+
```
|
| 136 |
+
|
| 137 |
+
`snapshot_download` caches the weights under `~/.cache/huggingface/`
|
| 138 |
+
so subsequent calls reuse the local copy instead of downloading again. The current `opf` release does not
|
| 139 |
+
resolve a Hub repo id directly; it expects a local checkpoint
|
| 140 |
+
directory.
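
If you need your own placeholder format, the detected spans can be applied to the input text by hand. The sketch below is an assumption-laden illustration: `mask_text` is a hypothetical helper, and `Span` is a stand-in that only mirrors the `label`, `start`, and `end` attributes used in the example above, not the actual OPF span class.

```python
from collections import namedtuple

# Stand-in for OPF span objects; only label/start/end are assumed.
Span = namedtuple("Span", "label start end")


def mask_text(text, spans):
    """Replace each detected span with an upper-cased [LABEL] placeholder."""
    out, cursor = [], 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            continue  # skip spans that overlap one already masked
        out.append(text[cursor:span.start])
        out.append(f"[{span.label.upper()}]")
        cursor = span.end
    out.append(text[cursor:])
    return "".join(out)


text = "Mein Name ist Jürgen Müller und ich wohne in Hamburg."
spans = [Span("private_person", 14, 27), Span("private_address", 45, 52)]
print(mask_text(text, spans))
```

With real model output, `spans` would be `result.detected_spans` from the call above.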
|
| 141 |
+
|
| 142 |
+
### Reproducing the benchmark
|
| 143 |
+
|
| 144 |
+
```python
|
| 145 |
+
from datasets import load_dataset
|
| 146 |
+
from huggingface_hub import snapshot_download
|
| 147 |
+
import opf
|
| 148 |
+
# ... plus shared.span_prf and metrics.char_coverage_prf from the
|
| 149 |
+
# openai/privacy-filter reference scoring code.
|
| 150 |
+
|
| 151 |
+
ds = load_dataset(
|
| 152 |
+
"ai4privacy/open-pii-masking-500k-ai4privacy",
|
| 153 |
+
split="validation",
|
| 154 |
+
)
|
| 155 |
+
de = ds.filter(lambda r: r["language"] == "de").select(range(1000))
|
| 156 |
+
|
| 157 |
+
ft_path = snapshot_download("digitflow/privacy-filter-de-ft")
|
| 158 |
+
m_base = opf.OPF(device="cuda", output_mode="typed", decode_mode="viterbi")
|
| 159 |
+
m_ft = opf.OPF(model=ft_path,
|
| 160 |
+
device="cuda", output_mode="typed", decode_mode="viterbi")
|
| 161 |
+
|
| 162 |
+
# Run m.redact() per row, collect predicted spans, score against gold
|
| 163 |
+
# with `char_coverage_prf(predictions, golds, label_aware=False)`.
|
| 164 |
+
# Report the __micro__.f1 as OPF-containment F1.
|
| 165 |
+
```
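
For readers without the reference scoring code at hand, the label-agnostic character-level scoring mentioned in the comments can be approximated as below. This is a simplified stand-in, not the OPF `char_coverage_prf` implementation: `covered_chars` and `char_prf` are hypothetical names, spans are assumed to be half-open `(start, end)` character offsets, and the reference metric may differ in details.

```python
def covered_chars(spans):
    """Set of character offsets covered by half-open (start, end) spans."""
    chars = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars


def char_prf(pred_docs, gold_docs):
    """Micro-averaged, label-agnostic character-level precision/recall/F1."""
    tp = fp = fn = 0
    for pred, gold in zip(pred_docs, gold_docs):
        p, g = covered_chars(pred), covered_chars(gold)
        tp += len(p & g)   # PII characters the model covered
        fp += len(p - g)   # non-PII characters the model flagged
        fn += len(g - p)   # PII characters the model missed (leaks)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "leak_rate": 1 - recall}
```

The `leak_rate` entry follows the definition used in the benchmark table (1 − char recall, label-agnostic).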
|
| 166 |
+
|
| 167 |
+
## License and citations
|
| 168 |
+
|
| 169 |
+
**License.** [MIT](./LICENSE).
|
| 170 |
+
|
| 171 |
+
[`ai4privacy/open-pii-masking-500k-ai4privacy`](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy)
|
| 172 |
+
was used as the source of training carriers (with augmentation) and
|
| 173 |
+
as the validation slice for the benchmark above.
|
| 174 |
+
|
| 175 |
+
[`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter)
|
| 176 |
+
is the base model (Apache 2.0).
|
USAGE.txt
ADDED
|
@@ -0,0 +1,27 @@
| 1 |
+
digitflow/privacy-filter-de-ft
|
| 2 |
+
A German fine-tune of openai/privacy-filter. Drop-in on German text.
|
| 3 |
+
|
| 4 |
+
Quick start (Python, opf):
|
| 5 |
+
from huggingface_hub import snapshot_download
|
| 6 |
+
import opf
|
| 7 |
+
|
| 8 |
+
path = snapshot_download("digitflow/privacy-filter-de-ft")
|
| 9 |
+
m = opf.OPF(
|
| 10 |
+
model=path,
|
| 11 |
+
device="cuda",
|
| 12 |
+
output_mode="typed",
|
| 13 |
+
decode_mode="viterbi",
|
| 14 |
+
)
|
| 15 |
+
result = m.redact("Mein Name ist Jürgen Müller und ich wohne in Hamburg.")
|
| 16 |
+
for span in result.detected_spans:
|
| 17 |
+
print(span.label, repr(span.text))
|
| 18 |
+
# private_person 'Jürgen Müller'
|
| 19 |
+
# private_address 'Hamburg'
|
| 20 |
+
|
| 21 |
+
Note: the current opf release loads from a local checkpoint directory,
|
| 22 |
+
not from a Hub repo id directly. Fetch with snapshot_download first.
|
| 23 |
+
The cache lives under ~/.cache/huggingface/ so subsequent runs reuse
|
| 24 |
+
the downloaded files.
|
| 25 |
+
|
| 26 |
+
See README.md for the full model card, benchmark numbers, examples,
|
| 27 |
+
and when not to use this model.
|
config.json
ADDED
|
@@ -0,0 +1,74 @@
| 1 |
+
{
|
| 2 |
+
"bidirectional_context": true,
|
| 3 |
+
"bidirectional_left_context": 128,
|
| 4 |
+
"bidirectional_right_context": 128,
|
| 5 |
+
"category_version": "v2",
|
| 6 |
+
"default_n_ctx": 128000,
|
| 7 |
+
"encoding": "o200k_base",
|
| 8 |
+
"experts_per_token": 4,
|
| 9 |
+
"head_dim": 64,
|
| 10 |
+
"hidden_size": 640,
|
| 11 |
+
"inference_contract_version": 1,
|
| 12 |
+
"initial_context_length": 4096,
|
| 13 |
+
"intermediate_size": 640,
|
| 14 |
+
"max_position_embeddings": 131072,
|
| 15 |
+
"model_type": "privacy_filter",
|
| 16 |
+
"ner_class_names": [
|
| 17 |
+
"O",
|
| 18 |
+
"B-account_number",
|
| 19 |
+
"I-account_number",
|
| 20 |
+
"E-account_number",
|
| 21 |
+
"S-account_number",
|
| 22 |
+
"B-private_address",
|
| 23 |
+
"I-private_address",
|
| 24 |
+
"E-private_address",
|
| 25 |
+
"S-private_address",
|
| 26 |
+
"B-private_date",
|
| 27 |
+
"I-private_date",
|
| 28 |
+
"E-private_date",
|
| 29 |
+
"S-private_date",
|
| 30 |
+
"B-private_email",
|
| 31 |
+
"I-private_email",
|
| 32 |
+
"E-private_email",
|
| 33 |
+
"S-private_email",
|
| 34 |
+
"B-private_person",
|
| 35 |
+
"I-private_person",
|
| 36 |
+
"E-private_person",
|
| 37 |
+
"S-private_person",
|
| 38 |
+
"B-private_phone",
|
| 39 |
+
"I-private_phone",
|
| 40 |
+
"E-private_phone",
|
| 41 |
+
"S-private_phone",
|
| 42 |
+
"B-private_url",
|
| 43 |
+
"I-private_url",
|
| 44 |
+
"E-private_url",
|
| 45 |
+
"S-private_url",
|
| 46 |
+
"B-secret",
|
| 47 |
+
"I-secret",
|
| 48 |
+
"E-secret",
|
| 49 |
+
"S-secret"
|
| 50 |
+
],
|
| 51 |
+
"num_attention_heads": 14,
|
| 52 |
+
"num_experts": 128,
|
| 53 |
+
"num_hidden_layers": 8,
|
| 54 |
+
"num_key_value_heads": 2,
|
| 55 |
+
"num_labels": 33,
|
| 56 |
+
"param_dtype": "bfloat16",
|
| 57 |
+
"rope_ntk_alpha": 1.0,
|
| 58 |
+
"rope_ntk_beta": 32.0,
|
| 59 |
+
"rope_scaling_factor": 32.0,
|
| 60 |
+
"rope_theta": 150000,
|
| 61 |
+
"sliding_window": 257,
|
| 62 |
+
"span_class_names": [
|
| 63 |
+
"O",
|
| 64 |
+
"account_number",
|
| 65 |
+
"private_address",
|
| 66 |
+
"private_date",
|
| 67 |
+
"private_email",
|
| 68 |
+
"private_person",
|
| 69 |
+
"private_phone",
|
| 70 |
+
"private_url",
|
| 71 |
+
"secret"
|
| 72 |
+
],
|
| 73 |
+
"vocab_size": 200064
|
| 74 |
+
}
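
The relationship between `span_class_names`, `ner_class_names`, and `num_labels` in this config can be checked with a short sketch. It assumes, based only on the lists above, that each non-background class expands to four tags in B, I, E, S order; `bioes_labels` is a hypothetical helper, not part of `opf`.

```python
SPAN_CLASSES = [
    "O", "account_number", "private_address", "private_date",
    "private_email", "private_person", "private_phone",
    "private_url", "secret",
]


def bioes_labels(span_classes):
    """Expand span classes into BIOES token labels, background first."""
    labels = ["O"]
    for name in span_classes:
        if name == "O":
            continue
        # Begin, Inside, End, Single: same order as ner_class_names.
        labels.extend(f"{prefix}-{name}" for prefix in "BIES")
    return labels


labels = bioes_labels(SPAN_CLASSES)
print(len(labels))  # 33, matching num_labels (1 + 4 * 8)
```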
|
dtypes.json
ADDED
|
@@ -0,0 +1,92 @@
| 1 |
+
{
|
| 2 |
+
"embedding.weight": "torch.bfloat16",
|
| 3 |
+
"block.0.attn.qkv.weight": "torch.bfloat16",
|
| 4 |
+
"block.0.attn.qkv.bias": "torch.bfloat16",
|
| 5 |
+
"block.0.attn.sinks": "torch.float32",
|
| 6 |
+
"block.0.attn.out.weight": "torch.bfloat16",
|
| 7 |
+
"block.0.attn.out.bias": "torch.bfloat16",
|
| 8 |
+
"block.0.mlp.gate.weight": "torch.bfloat16",
|
| 9 |
+
"block.0.mlp.gate.bias": "torch.bfloat16",
|
| 10 |
+
"block.0.mlp.swiglu.weight": "torch.bfloat16",
|
| 11 |
+
"block.0.mlp.swiglu.bias": "torch.bfloat16",
|
| 12 |
+
"block.0.mlp.out.weight": "torch.bfloat16",
|
| 13 |
+
"block.0.mlp.out.bias": "torch.bfloat16",
|
| 14 |
+
"block.1.attn.qkv.weight": "torch.bfloat16",
|
| 15 |
+
"block.1.attn.qkv.bias": "torch.bfloat16",
|
| 16 |
+
"block.1.attn.sinks": "torch.float32",
|
| 17 |
+
"block.1.attn.out.weight": "torch.bfloat16",
|
| 18 |
+
"block.1.attn.out.bias": "torch.bfloat16",
|
| 19 |
+
"block.1.mlp.gate.weight": "torch.bfloat16",
|
| 20 |
+
"block.1.mlp.gate.bias": "torch.bfloat16",
|
| 21 |
+
"block.1.mlp.swiglu.weight": "torch.bfloat16",
|
| 22 |
+
"block.1.mlp.swiglu.bias": "torch.bfloat16",
|
| 23 |
+
"block.1.mlp.out.weight": "torch.bfloat16",
|
| 24 |
+
"block.1.mlp.out.bias": "torch.bfloat16",
|
| 25 |
+
"block.2.attn.qkv.weight": "torch.bfloat16",
|
| 26 |
+
"block.2.attn.qkv.bias": "torch.bfloat16",
|
| 27 |
+
"block.2.attn.sinks": "torch.float32",
|
| 28 |
+
"block.2.attn.out.weight": "torch.bfloat16",
|
| 29 |
+
"block.2.attn.out.bias": "torch.bfloat16",
|
| 30 |
+
"block.2.mlp.gate.weight": "torch.bfloat16",
|
| 31 |
+
"block.2.mlp.gate.bias": "torch.bfloat16",
|
| 32 |
+
"block.2.mlp.swiglu.weight": "torch.bfloat16",
|
| 33 |
+
"block.2.mlp.swiglu.bias": "torch.bfloat16",
|
| 34 |
+
"block.2.mlp.out.weight": "torch.bfloat16",
|
| 35 |
+
"block.2.mlp.out.bias": "torch.bfloat16",
|
| 36 |
+
"block.3.attn.qkv.weight": "torch.bfloat16",
|
| 37 |
+
"block.3.attn.qkv.bias": "torch.bfloat16",
|
| 38 |
+
"block.3.attn.sinks": "torch.float32",
|
| 39 |
+
"block.3.attn.out.weight": "torch.bfloat16",
|
| 40 |
+
"block.3.attn.out.bias": "torch.bfloat16",
|
| 41 |
+
"block.3.mlp.gate.weight": "torch.bfloat16",
|
| 42 |
+
"block.3.mlp.gate.bias": "torch.bfloat16",
|
| 43 |
+
"block.3.mlp.swiglu.weight": "torch.bfloat16",
|
| 44 |
+
"block.3.mlp.swiglu.bias": "torch.bfloat16",
|
| 45 |
+
"block.3.mlp.out.weight": "torch.bfloat16",
|
| 46 |
+
"block.3.mlp.out.bias": "torch.bfloat16",
|
| 47 |
+
"block.4.attn.qkv.weight": "torch.bfloat16",
|
| 48 |
+
"block.4.attn.qkv.bias": "torch.bfloat16",
|
| 49 |
+
"block.4.attn.sinks": "torch.float32",
|
| 50 |
+
"block.4.attn.out.weight": "torch.bfloat16",
|
| 51 |
+
"block.4.attn.out.bias": "torch.bfloat16",
|
| 52 |
+
"block.4.mlp.gate.weight": "torch.bfloat16",
|
| 53 |
+
"block.4.mlp.gate.bias": "torch.bfloat16",
|
| 54 |
+
"block.4.mlp.swiglu.weight": "torch.bfloat16",
|
| 55 |
+
"block.4.mlp.swiglu.bias": "torch.bfloat16",
|
| 56 |
+
"block.4.mlp.out.weight": "torch.bfloat16",
|
| 57 |
+
"block.4.mlp.out.bias": "torch.bfloat16",
|
| 58 |
+
"block.5.attn.qkv.weight": "torch.bfloat16",
|
| 59 |
+
"block.5.attn.qkv.bias": "torch.bfloat16",
|
| 60 |
+
"block.5.attn.sinks": "torch.float32",
|
| 61 |
+
"block.5.attn.out.weight": "torch.bfloat16",
|
| 62 |
+
"block.5.attn.out.bias": "torch.bfloat16",
|
| 63 |
+
"block.5.mlp.gate.weight": "torch.bfloat16",
|
| 64 |
+
"block.5.mlp.gate.bias": "torch.bfloat16",
|
| 65 |
+
"block.5.mlp.swiglu.weight": "torch.bfloat16",
|
| 66 |
+
"block.5.mlp.swiglu.bias": "torch.bfloat16",
|
| 67 |
+
"block.5.mlp.out.weight": "torch.bfloat16",
|
| 68 |
+
"block.5.mlp.out.bias": "torch.bfloat16",
|
| 69 |
+
"block.6.attn.qkv.weight": "torch.bfloat16",
|
| 70 |
+
"block.6.attn.qkv.bias": "torch.bfloat16",
|
| 71 |
+
"block.6.attn.sinks": "torch.float32",
|
| 72 |
+
"block.6.attn.out.weight": "torch.bfloat16",
|
| 73 |
+
"block.6.attn.out.bias": "torch.bfloat16",
|
| 74 |
+
"block.6.mlp.gate.weight": "torch.bfloat16",
|
| 75 |
+
"block.6.mlp.gate.bias": "torch.bfloat16",
|
| 76 |
+
"block.6.mlp.swiglu.weight": "torch.bfloat16",
|
| 77 |
+
"block.6.mlp.swiglu.bias": "torch.bfloat16",
|
| 78 |
+
"block.6.mlp.out.weight": "torch.bfloat16",
|
| 79 |
+
"block.6.mlp.out.bias": "torch.bfloat16",
|
| 80 |
+
"block.7.attn.qkv.weight": "torch.bfloat16",
|
| 81 |
+
"block.7.attn.qkv.bias": "torch.bfloat16",
|
| 82 |
+
"block.7.attn.sinks": "torch.float32",
|
| 83 |
+
"block.7.attn.out.weight": "torch.bfloat16",
|
| 84 |
+
"block.7.attn.out.bias": "torch.bfloat16",
|
| 85 |
+
"block.7.mlp.gate.weight": "torch.bfloat16",
|
| 86 |
+
"block.7.mlp.gate.bias": "torch.bfloat16",
|
| 87 |
+
"block.7.mlp.swiglu.weight": "torch.bfloat16",
|
| 88 |
+
"block.7.mlp.swiglu.bias": "torch.bfloat16",
|
| 89 |
+
"block.7.mlp.out.weight": "torch.bfloat16",
|
| 90 |
+
"block.7.mlp.out.bias": "torch.bfloat16",
|
| 91 |
+
"unembedding.weight": "torch.bfloat16"
|
| 92 |
+
}
|
model.safetensors
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:6ef1dd109ba977e8baa611ae2f0fcb037e1782d342fe388318d9bde21882d14e
|
| 3 |
+
size 2798983976
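
The three lines above are a Git LFS pointer, not the weights themselves. A small sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, and it handles only the `version`, `oid`, and `size` keys shown here.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:6ef1dd109ba977e8baa611ae2f0fcb037e1782d342fe388318d9bde21882d14e
size 2798983976
"""

info = parse_lfs_pointer(POINTER)
print(f"{info['size'] / 2**30:.2f} GiB")  # 2.61 GiB
```

The digest can be compared against `hashlib.sha256` of the downloaded `model.safetensors` to confirm the file arrived intact.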
|
viterbi_calibration.json
ADDED
|
@@ -0,0 +1,14 @@
| 1 |
+
{
|
| 2 |
+
"operating_points": {
|
| 3 |
+
"default": {
|
| 4 |
+
"biases": {
|
| 5 |
+
"transition_bias_background_stay": 0.0,
|
| 6 |
+
"transition_bias_background_to_start": 0.0,
|
| 7 |
+
"transition_bias_end_to_background": 0.0,
|
| 8 |
+
"transition_bias_end_to_start": 0.0,
|
| 9 |
+
"transition_bias_inside_to_continue": 0.0,
|
| 10 |
+
"transition_bias_inside_to_end": 0.0
|
| 11 |
+
}
|
| 12 |
+
}
|
| 13 |
+
}
|
| 14 |
+
}
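
All six biases in the `default` operating point are 0.0, so under a BIOES-constrained Viterbi decoder they would leave the legal transitions unweighted. The sketch below illustrates how such biases could enter decoding. It is not the `opf` decoder: the mapping from bias names to transitions is an assumption inferred from the names alone, and `transition_score` and `viterbi` are hypothetical functions.

```python
import math

NEG_INF = -math.inf


def transition_score(prev, curr, biases):
    """Bias for a legal BIOES transition, -inf for an illegal one."""
    p_tag, _, p_cls = prev.partition("-")
    c_tag, _, c_cls = curr.partition("-")
    if p_tag == "O":
        if c_tag == "O":
            return biases["transition_bias_background_stay"]
        if c_tag in ("B", "S"):
            return biases["transition_bias_background_to_start"]
        return NEG_INF
    if p_tag in ("E", "S"):
        if c_tag == "O":
            return biases["transition_bias_end_to_background"]
        if c_tag in ("B", "S"):
            return biases["transition_bias_end_to_start"]
        return NEG_INF
    # prev is B-* or I-*: the span must continue within the same class.
    if p_cls != c_cls:
        return NEG_INF
    if c_tag == "I":
        return biases["transition_bias_inside_to_continue"]
    if c_tag == "E":
        return biases["transition_bias_inside_to_end"]
    return NEG_INF


def viterbi(emissions, labels, biases):
    """emissions: one {label: log-score} dict per token. Returns best path."""
    # A sequence may only start in O, B-* or S-*.
    score = {l: (emissions[0][l] if l[0] in "OBS" else NEG_INF) for l in labels}
    back = []
    for emit in emissions[1:]:
        new_score, pointers = {}, {}
        for curr in labels:
            best = max(labels,
                       key=lambda p: score[p] + transition_score(p, curr, biases))
            new_score[curr] = (score[best]
                               + transition_score(best, curr, biases)
                               + emit[curr])
            pointers[curr] = best
        score = new_score
        back.append(pointers)
    # A sequence may only end in O, E-* or S-*.
    last = max((l for l in labels if l[0] in "OES"), key=lambda l: score[l])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

Under this reading, a negative `transition_bias_inside_to_end` would make the decoder less willing to close multi-token spans, while the all-zero defaults leave the choice to the emission scores and the BIOES constraints alone.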
|