atiwari751 committed
Commit ee77aba · 1 Parent(s): c6268df

updated README for spaces

Files changed (2)
  1. .gitignore +1 -2
  2. README.md +17 -1
.gitignore CHANGED
@@ -3,5 +3,4 @@ __pycache__
 test.csv
 GPT2_encoder.py
 Hindi_Regex.txt
-Hindi_no_Regex.txt
-text_file.txt
+Hindi_no_Regex.txt
README.md CHANGED
@@ -1,4 +1,20 @@
-# Hindi Tokenizer
+---
+title: Hindi BPE Tokenizer
+emoji: 🇮🇳
+colorFrom: orange
+colorTo: green
+sdk: streamlit
+sdk_version: 1.32.0
+app_file: app.py
+pinned: false
+---
+
+# Hindi BPE Tokenizer
+A Byte-Pair Encoding tokenizer for Hindi text, implemented using Streamlit.
+
+## Features
+- Tokenizes Hindi text using BPE algorithm
+- Visualizes the tokenization process
 
 ## Dataset
 
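
The new README says the app tokenizes Hindi text with BPE, but this commit does not show the tokenizer code. For readers of the diff, here is a minimal byte-level BPE sketch. It is not the repository's implementation. The function names (`train_bpe`, `merge`, `encode`, `decode`), the choice of raw UTF-8 bytes as the base vocabulary, and the absence of any regex pre-splitting are all assumptions made for illustration.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from the 256 byte values."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new token id, in learned order
    for step in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        if pairs[best] < 2:
            break  # nothing repeats, so further merges would not compress
        new_id = 256 + step
        ids = merge(ids, best, new_id)
        merges[best] = new_id
    return merges


def encode(text, merges):
    """Encode text by replaying the learned merges in training order."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand token ids back into bytes and decode them as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


sample = "नमस्ते नमस्ते दुनिया"
learned = train_bpe(sample, num_merges=10)
tokens = encode(sample, learned)
```

Each Devanagari character takes three bytes in UTF-8, so even a few merges shorten the sequence noticeably. A real tokenizer would typically train on a much larger corpus and may pre-split text with a regex before merging.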