Yoruba Constituency Parser (Fine-tuned T5 Model)

Overview

This repository hosts a transformer-based constituency parser fine-tuned on the manually annotated Yoruba Constituency Treebank (Version 1.0). The model is designed to automatically generate phrase-structure trees for Yoruba sentences, supporting both linguistic research and NLP applications.

The parser is built on the T5 architecture and was fine-tuned to handle key features of Yoruba syntax, including:

  • Serial Verb Constructions (SVCs)
  • Focus constructions
  • Embedded complement clauses
  • Relative clauses
  • Clause chaining

This model is intended for academic use, syntactic analysis, and computational research in Yoruba language processing.

Model Files

File                                     Description
config.json                              Model architecture and configuration settings.
pytorch_model.bin or model.safetensors   Trained model weights.
tokenizer.json or tokenizer.model        Tokenization rules for Yoruba sentences.
tokenizer_config.json                    Tokenizer settings and special rules.
special_tokens_map.json                  Maps special tokens (e.g., <pad>, <eos>).
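When working from a local copy of the repository, it can help to confirm that the files above are all present before loading the model. The helper below is a hypothetical sketch, not part of this repository; it assumes the standard Hugging Face file names, treating the weight and tokenizer files as either/or alternatives, as in the table.

```python
from pathlib import Path

# Files that must always be present.
REQUIRED = ["config.json", "tokenizer_config.json", "special_tokens_map.json"]

# Pairs where either file satisfies the requirement.
EITHER_OR = [
    ("pytorch_model.bin", "model.safetensors"),  # trained weights
    ("tokenizer.json", "tokenizer.model"),       # tokenizer rules
]

def missing_files(model_dir):
    """Return a list of human-readable gaps in a local model directory."""
    present = {p.name for p in Path(model_dir).iterdir()}
    gaps = [f for f in REQUIRED if f not in present]
    for a, b in EITHER_OR:
        if a not in present and b not in present:
            gaps.append(f"{a} or {b}")
    return gaps
```

An empty return value means the directory contains everything needed by `from_pretrained`.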

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned Yoruba parser
model_name_or_path = "YOUR-HF-USERNAME/yoruba-constituency-parser"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)

# Parse a sample Yoruba sentence ("Mo ra aso tuntun" ~ "I bought new clothes")
sentence = "Mo ra aso tuntun"
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)  # allow room for the full tree
parsed_tree = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(parsed_tree)
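For downstream analysis it is often useful to turn the decoded string into a nested data structure. The sketch below assumes the parser emits a bracketed phrase-structure string such as "(S (NP Mo) (VP (V ra) (NP aso tuntun)))"; the exact label set and output format are assumptions, not confirmed by this model card, so adapt the reader to the trees your copy of the model actually produces.

```python
def parse_brackets(s):
    """Convert a bracketed tree string into nested [label, child, ...] lists."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                # consume "("
            node = [tokens[pos]]    # constituent label, e.g. "NP"
            pos += 1
            while tokens[pos] != ")":
                node.append(read())
            pos += 1                # consume ")"
            return node
        word = tokens[pos]          # leaf: a surface word
        pos += 1
        return word

    return read()

tree = parse_brackets("(S (NP Mo) (VP (V ra) (NP aso tuntun)))")
print(tree)  # ['S', ['NP', 'Mo'], ['VP', ['V', 'ra'], ['NP', 'aso', 'tuntun']]]
```

Libraries such as NLTK (`nltk.Tree.fromstring`) offer a richer tree object if the output follows the standard Penn-style bracketing.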