---
title: EN-VI-JA Triplet Dataset Viewer
emoji: ๐
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: cc-by-4.0
datasets:
- sotalab/en-vi-ja-300k-triplets
---

# SOTA Lab

SOTA Lab is a research lab dedicated to quality, strong core values, and the development of core technologies in software, hardware, and robotics.

We focus on building foundational technologies and high-quality datasets that push the boundaries of what's possible. Our work spans:

- **Software** - Machine learning, natural language processing, and data engineering
- **Hardware** - Embedded systems and computing infrastructure
- **Robotics** - Intelligent systems and automation

### AI/ML Research

Our AI and machine learning efforts focus on:

- **Multilingual NLP** - Building parallel corpora and translation datasets across multiple languages
- **Data Quality** - Developing pipelines for cleaning, filtering, and validating large-scale datasets
- **Model Training** - Creating high-quality training data for language models and translation systems
- **Open Datasets** - Publishing curated datasets for the research community

We believe in open research and contributing to the global community through open-source projects and publicly available datasets.
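
As a rough illustration of the cleaning, filtering, and validation steps mentioned under Data Quality, the sketch below normalizes, filters, and de-duplicates translation triplets. It is an illustrative example only, not the pipeline behind any published dataset: the field names `en`, `vi`, and `ja`, the helper names `normalize` and `clean_triplets`, and the `max_len_ratio` threshold are all assumptions made for the example.

```python
import unicodedata

LANGS = ("en", "vi", "ja")  # hypothetical field names, assumed for this sketch


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def clean_triplets(rows, max_len_ratio=6.0):
    """Yield normalized, de-duplicated triplets from an iterable of dicts.

    A row is dropped when any language is missing or empty, when the
    longest side is more than `max_len_ratio` times the shortest (a crude
    misalignment check; the default is an arbitrary choice), or when the
    same normalized triplet has already been seen.
    """
    seen = set()
    for row in rows:
        triplet = tuple(normalize(row.get(lang) or "") for lang in LANGS)
        if not all(triplet):
            continue  # at least one side is missing or blank
        lengths = [len(part) for part in triplet]
        if max(lengths) / min(lengths) > max_len_ratio:
            continue  # lengths differ too much to be a plausible translation
        if triplet in seen:
            continue  # exact duplicate after normalization
        seen.add(triplet)
        yield dict(zip(LANGS, triplet))


sample = [
    {"en": "Hello", "vi": "Xin chào", "ja": "こんにちは"},
    {"en": "  Hello ", "vi": "Xin  chào", "ja": "こんにちは"},  # duplicate
    {"en": "Thanks", "vi": "", "ja": "ありがとう"},  # empty side
]
for cleaned in clean_triplets(sample):
    print(cleaned)
```

Character-length ratios are only a coarse signal across scripts, since Japanese text is typically much shorter in characters than its English or Vietnamese counterpart, so any real threshold would need tuning against actual data.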