mjbommar committed on
Commit
31181d3
·
verified ·
1 Parent(s): a6eed73

Update README with improved formatting, YAML metadata, and verified statistics

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -1,3 +1,24 @@
+ ---
+ language:
+ - code
+ license: apache-2.0
+ tags:
+ - tokenizer
+ - binary-analysis
+ - binary-tokenization
+ - bpe
+ - byte-pair-encoding
+ - malware-analysis
+ - reverse-engineering
+ - security
+ - x86-64
+ - arm64
+ - elf
+ - pe
+ library_name: tokenizers
+ pipeline_tag: feature-extraction
+ ---
+
  # glaurung-binary-tokenizer-002
 
  A cross-platform BPE tokenizer for binary executables and machine code. Trained using advanced chunked training with deduplication on 23 GB of diverse binaries spanning Linux and Windows platforms.