---
language: "yor"
license: "cc-by-4.0"
tags:
- nlp
- constituency-parsing
- yoruba
- transformer
---

# Yoruba Constituency Parser (Fine-tuned T5 Model)

## Overview
This repository hosts a **transformer-based constituency parser** fine-tuned on the manually annotated Yoruba Constituency Treebank (Version 1.0). The model automatically generates **phrase-structure trees** for Yoruba sentences, supporting both linguistic research and NLP applications.

The parser is built on a **T5 architecture** and was fine-tuned to handle Yoruba syntax, including:

- Serial Verb Constructions (SVCs)
- Focus constructions
- Embedded complement clauses
- Relative clauses
- Clause chaining

This model is intended for **academic use, syntactic analysis, and computational research** in Yoruba language processing.

## Model Files
| File | Description |
|------|-------------|
| `config.json` | Model architecture and configuration settings. |
| `pytorch_model.bin` or `model.safetensors` | Trained model weights. |
| `tokenizer.json` or `tokenizer.model` | Tokenization rules for Yoruba sentences. |
| `tokenizer_config.json` | Tokenizer settings and special rules. |
| `special_tokens_map.json` | Maps special tokens (e.g., `<pad>`, `<eos>`). |
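
These files load automatically through `from_pretrained`, but if you want them on disk for offline use, a minimal sketch with `huggingface_hub` is shown below (the repository id is a placeholder, as in the Usage section):

```python
# Sketch: fetch the files listed above for offline use.
# Assumes the huggingface_hub package is installed; replace the repo id
# placeholder with the actual model repository.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="YOUR-HF-USERNAME/yoruba-constituency-parser")
print(local_dir)  # local path containing the config, weights, and tokenizer files
```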

## Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned Yoruba parser
model_name_or_path = "YOUR-HF-USERNAME/yoruba-constituency-parser"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)

# Parse a sample Yoruba sentence
sentence = "Mo ra aso tuntun"
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)  # allow room for the full bracketed tree
parsed_tree = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(parsed_tree)
```