# Yoruba Constituency Parser (Fine-tuned T5 Model)

## Overview
This repository hosts a transformer-based constituency parser fine-tuned on the manually annotated Yoruba Constituency Treebank (Version 1.0). The model is designed to automatically generate phrase-structure trees for Yoruba sentences, supporting both linguistic research and NLP applications.
The parser is built on a T5 architecture and was fine-tuned to understand Yoruba syntax, including:
- Serial Verb Constructions (SVCs)
- Focus constructions
- Embedded complement clauses
- Relative clauses
- Clause chaining
This model is intended for academic use, syntactic analysis, and computational research in Yoruba language processing.
## Model Files
| File | Description |
|---|---|
| `config.json` | Model architecture and configuration settings. |
| `pytorch_model.bin` or `model.safetensors` | Trained model weights. |
| `tokenizer.json` or `tokenizer.model` | Tokenization rules for Yoruba sentences. |
| `tokenizer_config.json` | Tokenizer settings and special rules. |
| `special_tokens_map.json` | Maps special tokens (e.g., `<pad>`, `<eos>`). |
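Before loading, it can be useful to verify that a locally downloaded snapshot actually contains the files listed above. This is an illustrative helper, not part of the repository; the `REQUIRED_FILES` groups simply mirror the table, with alternatives grouped because weights and tokenizer files ship in one of two formats:

```python
from pathlib import Path

# Each tuple lists acceptable variants for one required file (from the table above).
REQUIRED_FILES = [
    ("config.json",),
    ("pytorch_model.bin", "model.safetensors"),
    ("tokenizer.json", "tokenizer.model"),
    ("tokenizer_config.json",),
    ("special_tokens_map.json",),
]

def missing_files(model_dir):
    """Return the file groups with no matching file in model_dir."""
    model_dir = Path(model_dir)
    return [
        group
        for group in REQUIRED_FILES
        if not any((model_dir / name).is_file() for name in group)
    ]
```

If `missing_files(...)` returns a non-empty list, the snapshot is incomplete and should be re-downloaded.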
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned Yoruba parser
model_name_or_path = "YOUR-HF-USERNAME/yoruba-constituency-parser"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)

# Parse a sample Yoruba sentence
sentence = "Mo ra aso tuntun"
inputs = tokenizer(sentence, return_tensors="pt")
# Allow enough room for the full bracketed tree; the default generation
# length would truncate longer parses.
outputs = model.generate(**inputs, max_new_tokens=256)
parsed_tree = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(parsed_tree)
```
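Because a seq2seq model does not guarantee well-formed output, it is worth sanity-checking generated parses before downstream use. A minimal sketch, assuming the parser emits Penn-style bracketed trees (the exact bracket format and label set are assumptions; consult the treebank's annotation guidelines):

```python
def is_balanced(tree_str):
    """Check that every '(' in a bracketed parse string has a matching ')'."""
    depth = 0
    for ch in tree_str:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0
```

For example, `is_balanced("(S (NP Mo) (VP ra (NP aso tuntun)))")` returns `True`, while a truncated generation such as `"(S (NP Mo) (VP ra"` returns `False` and should be re-generated or discarded.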