Navya-Sree committed · Commit 39758c1 (verified) · 1 parent: c6412b7

Update README.md

Files changed (1)
  1. README.md +53 -0

README.md CHANGED
@@ -8,5 +8,58 @@ sdk_version: 5.35.0
  app_file: app.py
  pinned: false
  ---
+ ---
+ language:
+ - multilingual
+ - endangered-languages
+ tags:
+ - translation
+ - unesco
+ - m2m100
+ license: mit
+ datasets:
+ - UNESCO language vitality data
+ metrics:
+ - BLEU
+ - chrF++
+ ---
+
+ # UNESCO Language Translator 🌍
+
+ **A specialized translation model for languages that UNESCO classifies as endangered**, built on Meta's M2M100 and Hugging Face Transformers.
+
+ ## Key Features
+ - 🔍 **Endangered Language Focus**: 35+ languages listed as endangered by UNESCO
+ - ⚡️ **Context-Aware Translation**: Preserves cultural context
+ - 📊 **Language Vitality Tags**: Shows each language's UNESCO vitality level
+ - 🤝 **Community Feedback**: Crowdsourced quality improvement
+
+ ## Supported Languages
+ | Language | ISO Code | Vitality Level |
+ |----------|----------|----------------|
+ | Aymara | ay | Vulnerable |
+ | Cherokee | chr | Definitely Endangered |
+ | Quechua | qu | Vulnerable |
+ | ... | ... | ... |
+
+ [See full list](https://unesco.org/languages)
+
+ ## Usage
+ ```python
+ from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
+
+ model = M2M100ForConditionalGeneration.from_pretrained("unesco/translator")
+ tokenizer = M2M100Tokenizer.from_pretrained("unesco/translator")
+
+ def translate(text, source_lang, target_lang):
+     tokenizer.src_lang = source_lang  # M2M100 needs an explicit source code; "auto" is not a valid one
+     encoded = tokenizer(text, return_tensors="pt")
+     generated_tokens = model.generate(
+         **encoded,
+         forced_bos_token_id=tokenizer.get_lang_id(target_lang),  # start decoding with the target-language tag
+     )
+     return tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
  
+ translate("Traditional knowledge matters", "en", "qu")  # English -> Quechua
+ ```
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
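
The "Language Vitality Tags" feature and the Supported Languages table in the README suggest a lookup from ISO code to vitality level. The following is a minimal sketch of how such a lookup might work, not the project's actual implementation. The names `LanguageInfo`, `VITALITY`, and `vitality_tag` are hypothetical, as is the "vitality unknown" fallback, and the dictionary holds only the three rows that the table shows (the rest of the list is elided there).

```python
from typing import NamedTuple


class LanguageInfo(NamedTuple):
    """Hypothetical record for one row of the Supported Languages table."""
    name: str
    vitality: str


# Only the three rows visible in the README table; the remaining rows are elided there.
VITALITY = {
    "ay": LanguageInfo("Aymara", "Vulnerable"),
    "chr": LanguageInfo("Cherokee", "Definitely Endangered"),
    "qu": LanguageInfo("Quechua", "Vulnerable"),
}


def vitality_tag(iso_code: str) -> str:
    """Return a display tag for a language code, or a fallback if it is not in the table."""
    code = iso_code.strip().lower()
    info = VITALITY.get(code)
    if info is None:
        # Assumed fallback behaviour; the README does not say what happens for unlisted codes.
        return f"{code}: vitality unknown"
    return f"{info.name} ({code}): {info.vitality}"
```

A caller could attach the tag to a translation result, for example `print(vitality_tag("qu"))` next to the output of `translate(...)`.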