title: Tatar2Vec Explorer
emoji: ๐
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: true
app_file: app.py
Tatar2Vec Explorer
Overview
Tatar2Vec provides FastText and Word2Vec word embeddings for the Tatar language. On the project's composite evaluation score, the best model reaches 0.7019, compared with 0.2000 for Meta's cc.tt.300 vectors. This interactive demo lets you explore word similarity, word analogies, and out-of-vocabulary handling with these models.
Features
Semantic Search
- Word Similarity: Find semantically similar words
- Vector Operations: Perform complex word analogies
- Interactive Visualizations: Explore results with beautiful charts and word clouds
Advanced Analytics
- Model Comparison: Compare FastText vs Word2Vec performance
- OOV Handling: Test out-of-vocabulary word capabilities
- Performance Metrics: Detailed model evaluation scores
Model Variants
- Best FastText: ft_dim100_win5_min5_ngram3-6_sg.epoch1 (Composite: 0.7019)
- Alternative FastText: ft_dim100_win5_min5_ngram3-6_sg.epoch3
- Best Word2Vec: w2v_dim200_win5_min5_sg.epoch4
- Compact Word2Vec: w2v_dim100_win5_min5_sg
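The model names appear to encode their training hyperparameters. The sketch below parses a name into a dictionary, assuming that `dim`, `win`, `min`, `ngram`, and `sg` stand for vector size, context window, minimum word count, character n-gram range, and skip-gram respectively. That reading is an assumption, not something the project states, and `parse_model_name` is a hypothetical helper; the dictionary keys simply mirror gensim's parameter names.

```python
import re


def parse_model_name(name):
    """Split a Tatar2Vec model name into a dict of assumed hyperparameters."""
    # Everything after ".epoch" is treated as the checkpoint epoch (may be absent).
    base, _, epoch = name.partition(".epoch")
    params = {
        "family": "fasttext" if base.startswith("ft_") else "word2vec",
        "sg": 1 if base.endswith("_sg") else 0,
        "epoch": int(epoch) if epoch else None,
    }
    # Single-integer fields: token in the name -> gensim-style key (assumed mapping).
    for token, key in (("dim", "vector_size"), ("win", "window"), ("min", "min_count")):
        match = re.search(rf"_{token}(\d+)", base)
        if match:
            params[key] = int(match.group(1))
    # FastText-only field: character n-gram range, e.g. "ngram3-6".
    ngram = re.search(r"_ngram(\d+)-(\d+)", base)
    if ngram:
        params["min_n"] = int(ngram.group(1))
        params["max_n"] = int(ngram.group(2))
    return params


best_fasttext = parse_model_name("ft_dim100_win5_min5_ngram3-6_sg.epoch1")
compact_w2v = parse_model_name("w2v_dim100_win5_min5_sg")
```

Under this reading, the best FastText model is a 100-dimensional skip-gram model with a window of 5, a minimum count of 5, and character n-grams of length 3 to 6.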
Performance Highlights
| Model | Composite Score | Semantic Similarity | OOV Handling |
|---|---|---|---|
| Best FastText | 0.7019 | 0.7368 | 1.0000 |
| Meta cc.tt.300 | 0.2000 | - | - |
| Improvement | 3.5× | Significant | Perfect |
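The improvement factor in the table can be checked by dividing the two composite scores. The snippet uses only the numbers listed above.

```python
# Composite scores as reported in the table above.
best_fasttext_score = 0.7019
meta_cc_tt_300_score = 0.2000

# Ratio of the two scores, rounded to one decimal place as in the table.
improvement = best_fasttext_score / meta_cc_tt_300_score
improvement_rounded = round(improvement, 1)
print(f"{improvement_rounded}x")
```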
Quick Start
Try These Examples:
Word Similarity
```python
# Find words similar to "мәктәп" (school)
# `model` is a loaded gensim model; in gensim 4 the vectors live on `model.wv`
similar_words = model.wv.most_similar('мәктәп', topn=10)
```
Word Analogies
```python
# Doctor - man + woman = ?
analogy = model.wv.most_similar(
    positive=['табиб', 'хатын'],  # doctor, woman
    negative=['ир'],              # man
)
```
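To show what an analogy query does with the vectors, here is a minimal pure-Python sketch: it adds the `positive` vectors, subtracts the `negative` ones, and ranks the remaining words by cosine similarity to the result. The toy vectors and English labels are invented for illustration, and the arithmetic is simplified (gensim normalizes the vectors before combining them), so treat it as an outline of the idea rather than the library's exact computation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, positive, negative, topn=1):
    """Rank words by similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    # Query words are excluded from the candidates, as gensim does.
    exclude = set(positive) | set(negative)
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical 3-dimensional vectors, not taken from any Tatar2Vec model.
toy_vectors = {
    "man": [1.0, 0.0, 0.2],
    "woman": [0.0, 1.0, 0.2],
    "king": [1.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
    "apple": [0.5, 0.5, -1.0],
}
result = analogy(toy_vectors, positive=["king", "woman"], negative=["man"])
```

With real embeddings the top result is not guaranteed to be the expected word, which is why analogy accuracy is reported as an evaluation metric.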
OOV Testing (FastText Only)
```python
# Handle unknown words: FastText builds a vector from character n-grams
vector = model.wv['технологияләштерү']  # technology-related word
```
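FastText can return a vector for an unseen word because it represents each word by its character n-grams and combines the n-gram vectors. The sketch below only enumerates those n-grams, assuming the 3 to 6 range suggested by `ngram3-6` in the model names; `char_ngrams` is a hypothetical helper, and the hashing and vector lookup that FastText performs afterwards are left out.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """List the character n-grams of a word, FastText style."""
    # FastText wraps each word in boundary markers before slicing it.
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


# "мәктәп" is the Tatar word for "school".
grams = char_ngrams("мәктәп")
print(len(grams), grams[:3])
```

An out-of-vocabulary word that shares n-grams with known words therefore ends up close to them in the vector space, which is what the OOV test in the demo exercises.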
Technical Details
Training Corpus
- Total Tokens: 203.2 million
- Vocabulary Size: 637.7K words
- Unique Words: 1.8 million
- Domains: Wikipedia, news, books, social media
Model Architecture
- FastText: Subword information support
- Word2Vec: Classical word embeddings
- Optimized: Skip-gram architecture, 100 dimensions
Use Cases
Education
- Language learning applications
- Educational content analysis
- Academic research
Business
- Content recommendation systems
- Search engine enhancement
- Customer feedback analysis
Research
- Linguistic studies
- Cross-lingual comparisons
- AI model development
Installation
Local Development
```shell
git clone https://huggingface.co/spaces/arabovs-ai-lab/tatar2vec-demo
cd tatar2vec-demo
pip install -r requirements.txt
streamlit run app.py
```
Docker Deployment
```shell
docker build -t tatar2vec-demo .
docker run -p 7860:7860 tatar2vec-demo
```
API Access
```python
from huggingface_hub import snapshot_download
from gensim.models import FastText

# Download and load the best model
model_dir = snapshot_download(repo_id="arabovs-ai-lab/Tatar2Vec")
model = FastText.load(f"{model_dir}/fasttext/ft_dim100_win5_min5_ngram3-6_sg.epoch1/ft_dim100_win5_min5_ngram3-6_sg.epoch1.model")

# Use the model
similar_words = model.wv.most_similar('мәктәп')
```
Evaluation Metrics
Our models were evaluated on multiple dimensions:
- Semantic Similarity: Human-judged word pairs
- Analogy Accuracy: Word relationship tasks
- OOV Handling: Unknown word processing
- Neighbor Coherence: Semantic consistency
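The project does not state how the Semantic Similarity score is computed. A common approach for human-judged word pairs is the Spearman rank correlation between the human ratings and the model's cosine similarities, and the sketch below illustrates that approach under this assumption. The ratings and similarities are invented toy values, not Tatar2Vec results.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: human ratings and model cosine similarities for four word pairs.
human_scores = [9.0, 7.5, 3.0, 1.0]
model_scores = [0.82, 0.64, 0.40, 0.05]
rho = spearman(human_scores, model_scores)
```

A value near 1 means the model orders the word pairs the same way the human annotators do; a value near 0 means the two orderings are unrelated.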
Contributing
We welcome contributions from the community! Areas of interest:
- Additional evaluation benchmarks
- New model architectures
- Expanded training data
- Multilingual applications
Citation
If you use Tatar2Vec in your research, please cite:
```bibtex
@misc{tatar2vec2025,
  title = {Tatar2Vec: High-Quality Tatar Word Embeddings},
  author = {Arabovs AI Lab},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/arabovs-ai-lab/Tatar2Vec},
  note = {Version 1.0}
}
```
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Tatar language speakers and contributors
- Hugging Face for platform support
- Open-source community for tools and libraries
Empowering Tatar Language Technology
Brought to you by Arabovs AI Lab