---
language: dje
tags:
- fasttext
- word-embeddings
- zarma
- nlp
license: apache-2.0
datasets:
- 27Group/noisy_zarma
---

## Description

This repository contains a pre-trained FastText model for the Zarma language. The model generates static word embeddings for Zarma text; each vector captures semantic information learned from the contexts in which the word occurs and can be used as input features for various NLP tasks.

## Tasks

- **Word Embeddings**: Generate vector representations for Zarma words.
- **Part-of-Speech (POS) Tagging**: Provide features for POS tagging models.
- **Text Classification**: Use embeddings for sentiment analysis or topic classification.
- **Semantic Similarity**: Compute similarity between Zarma words or phrases.

## Usage Examples

### 1. Word Embeddings

Load the FastText model to get word embeddings for Zarma text.

```python
import fasttext

# Load the pre-trained binary model downloaded from this repository
model = fasttext.load_model('zarma_fasttext.bin')

# Look up the embedding for a single word and show its first five dimensions
word = "ay"
embedding = model.get_word_vector(word)
print(f"Embedding for '{word}': {embedding[:5]}...")
```
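
Beyond looking up a single vector, a quick way to sanity-check the embeddings is to list the nearest neighbours of a word. The sketch below wraps the `get_nearest_neighbors` method of the `fasttext` Python package in a small helper. `nearest_words` is a hypothetical name introduced here for illustration, and the model is assumed to be already loaded as in the snippet above.

```python
def nearest_words(model, word, k=5):
    """Return up to k nearest neighbours of `word` as (neighbour, score) pairs.

    `model` is assumed to be a loaded fasttext model, whose
    get_nearest_neighbors method returns (score, neighbour) tuples.
    """
    return [
        (neighbour, round(float(score), 4))
        for score, neighbour in model.get_nearest_neighbors(word, k=k)
    ]
```

With the model loaded, `nearest_words(model, "ay")` returns up to five neighbours of "ay" together with their similarity scores.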

### 2. Semantic Similarity

Compute the cosine similarity between the embeddings of two Zarma words.

```python
import fasttext
import numpy as np

model = fasttext.load_model('zarma_fasttext.bin')

word1 = "ay"
word2 = "ni"
vec1 = model.get_word_vector(word1)
vec2 = model.get_word_vector(word2)

# Cosine similarity; the small epsilon avoids division by zero for all-zero vectors
similarity = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2) + 1e-8)
print(f"Similarity between '{word1}' and '{word2}': {similarity:.4f}")
```
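
The Tasks section also mentions similarity between phrases. One simple and common approach, assumed here for illustration and not necessarily what the authors use, is to average the word vectors of each phrase and compare the averages with cosine similarity. The sketch below uses only the standard library and takes the lookup function as a parameter, so it works with `model.get_word_vector` or any stand-in. `phrase_vector` and `cosine` are hypothetical helper names, and whitespace tokenization is an assumption.

```python
import math


def phrase_vector(get_word_vector, phrase):
    """Average the word vectors of a whitespace-tokenized phrase."""
    vectors = [list(get_word_vector(w)) for w in phrase.split()]
    if not vectors:
        raise ValueError("phrase contains no words")
    # zip(*vectors) groups the values of each dimension across all words
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v, eps=1e-8):
    """Cosine similarity; eps guards against division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)
```

With a loaded model, `cosine(phrase_vector(model.get_word_vector, p1), phrase_vector(model.get_word_vector, p2))` gives a similarity score for two phrases `p1` and `p2`. The `fasttext` package also provides `model.get_sentence_vector(text)` as a built-in alternative to `phrase_vector`, and the same averaged vectors could serve as input features for a text classifier.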

## How to Use

1. Install FastText: **pip install fasttext**
2. Download **zarma_fasttext.bin** from this repository.
3. Use the code snippets above to integrate the model into your NLP pipeline.
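
The download step can also be done programmatically. This is a minimal sketch, assuming the `huggingface_hub` package is installed (`pip install huggingface_hub`), that the repository id is `27Group/zarma_fasttext` (taken from the citation URL in this README), and that `zarma_fasttext.bin` sits at the repository root. `load_zarma_model` is a hypothetical helper name.

```python
REPO_ID = "27Group/zarma_fasttext"     # assumed from the citation URL
MODEL_FILENAME = "zarma_fasttext.bin"  # file name given in this README


def load_zarma_model(repo_id=REPO_ID, filename=MODEL_FILENAME):
    """Download the model file (cached after the first call) and load it."""
    # Imported lazily so this module can be imported without the dependencies.
    import fasttext
    from huggingface_hub import hf_hub_download

    model_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return fasttext.load_model(model_path)
```

Calling `model = load_zarma_model()` should then yield a model object that can be used in the same way as in the usage examples.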

## How to Cite

If you use this model in your work, please cite:

```
@misc{zarma_fasttext,
  title = {Pre-trained FastText Embeddings for Zarma},
  author = {Mamadou K. Keita and Christopher Homan},
  year = {2025},
  howpublished = {\url{https://huggingface.co/27Group/zarma_fasttext}}
}
```