assemsabry committed
Commit 2c8ad44 · verified · Parent(s): f95d2c9

Upload README.md with huggingface_hub

Files changed (1): README.md (+3, −11)
README.md CHANGED
```diff
@@ -1,11 +1,3 @@
----
-license: mit
-language:
-- en
-pipeline_tag: token-classification
-tags:
-- tokenizer
----
 # Traum Tokenizer
 
 Traum Tokenizer is a high-performance, specialized tokenizer designed for next-generation Large Language Models (LLMs) and specifically optimized for the Flash - SLM project. Developed after extensive research into existing tokenizers like GPT-2 and BERT, Traum Tokenizer addresses the critical need for a balanced approach between compression efficiency, training speed, and linguistic understanding.
@@ -43,7 +35,7 @@ Traum Tokenizer has been benchmarked against GPT-2 and LLaMA tokenizers across m
 
 The chart below visualizes the comparative efficiency of Traum Tokenizer across different test sets.
 
-![Tokenizer Comparison](./Traum_Chart.png)
+![Tokenizer Comparison](./traum_chart.png)
 
 ## Future Development
 
@@ -69,7 +61,7 @@ print(f"Decoded text: {tokenizer.decode(tokens)}")
 
 - `tokenizer.json`: Core BPE tokenizer configuration and vocabulary.
 - `tokenizer_config.json`: Metadata and configuration for the Transformers/Tokenizers library.
-- `Traum_Chart.png`: Benchmark visualization.
+- `traum_chart.png`: Benchmark visualization.
 - `README.md`: System documentation and benchmarks.
 
 ## Developer
@@ -77,4 +69,4 @@ print(f"Decoded text: {tokenizer.decode(tokens)}")
 **Assem Sabry** is an Egyptian AI Engineer & Researcher and the founder of Token AI (founded in 2025).
 
 - Website: https://assem.cloud/
-- LinkedIn: https://www.linkedin.com/in/assem7/
+- LinkedIn: https://www.linkedin.com/in/assem7/
```
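For context, the diff above describes `tokenizer.json` as a core BPE tokenizer configuration, and its hunk headers show an encode/decode round-trip (`tokenizer.decode(tokens)`). As a rough, stdlib-only illustration of the BPE idea itself — a toy sketch, not Traum's actual training code or vocabulary — the merge-learning and segmentation steps look like this:

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE merges from a list of words (toy sketch, not Traum's trainer)."""
    # Represent each word as a tuple of symbols, initially single characters.
    words = Counter(tuple(w) for w in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the winning pair fused into one symbol.
        merged = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] += freq
        words = merged
    return merges

def encode(word, merges):
    """Apply learned merges in training order to segment a word into subwords."""
    symbols = list(word)
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

merges = train_bpe(["lower", "lowest", "low", "low"], num_merges=3)
print(encode("lowest", merges))  # → ['lowe', 's', 't']
```

A production tokenizer such as this one would instead ship the learned vocabulary and merge table inside `tokenizer.json`, to be loaded by the Tokenizers library rather than retrained.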