Akindelevictoria committed on
Commit 009c497 · verified · 1 Parent(s): f2b984f

Update README.md

Files changed (1)
  1. README.md +50 -3
README.md CHANGED
@@ -1,3 +1,50 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language: "yor"
+ license: "cc-by-4.0"
+ tags:
+ - nlp
+ - constituency-parsing
+ - yoruba
+ - transformer
+ ---
+
+ # Yoruba Constituency Parser (Fine-tuned T5 Model)
+
+ ## Overview
+ This repository hosts a **transformer-based constituency parser** fine-tuned on the manually annotated Yoruba Constituency Treebank (Version 1.0). The model generates **phrase-structure trees** for Yoruba sentences, supporting both linguistic research and NLP applications.
+
+ The parser is built on the **T5 architecture** and was fine-tuned to handle Yoruba syntactic constructions, including:
+
+ - Serial Verb Constructions (SVCs)
+ - Focus constructions
+ - Embedded complement clauses
+ - Relative clauses
+ - Clause chaining
+
+ This model is intended for **academic use, syntactic analysis, and computational research** in Yoruba language processing.
+
+ ## Model Files
+ | File | Description |
+ |------|-------------|
+ | `config.json` | Model architecture and configuration settings. |
+ | `pytorch_model.bin` or `model.safetensors` | Trained model weights. |
+ | `tokenizer.json` or `tokenizer.model` | Tokenizer vocabulary and rules used to split Yoruba sentences into tokens. |
+ | `tokenizer_config.json` | Tokenizer settings. |
+ | `special_tokens_map.json` | Maps special-token roles (e.g., padding, end of sequence) to their token strings. |
+
+ ## Usage
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ # Load the fine-tuned Yoruba parser (replace the placeholder with the actual repository ID)
+ model_name_or_path = "YOUR-HF-USERNAME/yoruba-constituency-parser"
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
+ model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
+
+ # Parse a sample Yoruba sentence
+ sentence = "Mo ra aso tuntun"
+ inputs = tokenizer(sentence, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=256)  # 256 is an illustrative limit; a short default can truncate trees
+ parsed_tree = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+ print(parsed_tree)
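
The usage snippet in this diff ends with a decoded string. The README does not say how the tree is serialized, so the sketch below assumes a Penn-Treebank-style bracketed format. The helper names `parse_bracketed` and `leaves`, the example string, and the labels in it (`S`, `NP`, `VP`, `PRON`, `V`, `N`, `ADJ`) are all hypothetical and may not match the treebank's actual label set. Under those assumptions, one way to turn the output into a nested structure and detect truncated (unbalanced) output might look like this:

```python
def parse_bracketed(text):
    """Parse an (assumed) bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1  # consume "("
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError("missing node label")
        label = tokens[pos]
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())  # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets (output may be truncated)")
        pos += 1  # consume ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the root constituent")
    return tree


def leaves(tree):
    """Collect the leaf words of a parsed tree, left to right."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Hypothetical output string; the real model output and labels may differ.
example = "(S (NP (PRON Mo)) (VP (V ra) (NP (N aso) (ADJ tuntun))))"
tree = parse_bracketed(example)
print(tree[0])
print(leaves(tree))
```

If the decoded string raises `ValueError` here, one thing to check is whether generation stopped before the tree was complete, in which case a larger generation length limit may help.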