metadata
license: apache-2.0
language:
  - vi
  - en
  - zh
  - ja
  - ko
  - fr
  - de
  - es
  - th
  - lo
  - km
tags:
  - language-detection
  - language-identification
  - vietnamese
  - multilingual
library_name: underthesea
pipeline_tag: text-classification
metrics:
  - accuracy
  - f1

Radar-1

Radar-1 is a language detection model developed by UnderTheSea NLP. Given an input text, it predicts which of 11 supported languages the text is written in.

Model Description

  • Model Type: Language Detection (Text Classification)
  • Task: Identify the language of input text
  • Language: Multilingual
  • License: Apache 2.0

Supported Languages

Code  Language
vi    Vietnamese
en    English
zh    Chinese
ja    Japanese
ko    Korean
fr    French
de    German
es    Spanish
th    Thai
lo    Lao
km    Khmer
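
When a predicted code needs to be shown as a readable name, the table above can be kept as a small lookup. This is only a sketch and not part of the Radar-1 package: `SUPPORTED_LANGUAGES` and `language_name` are hypothetical names introduced here for illustration.

```python
# Hypothetical helper, not part of underthesea or radar.
# Maps the language codes from the Supported Languages table to their names.
SUPPORTED_LANGUAGES = {
    "vi": "Vietnamese",
    "en": "English",
    "zh": "Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "th": "Thai",
    "lo": "Lao",
    "km": "Khmer",
}


def language_name(code: str) -> str:
    """Return the language name for a supported code, or "Unknown" otherwise."""
    return SUPPORTED_LANGUAGES.get(code, "Unknown")


print(language_name("vi"))  # Vietnamese
print(language_name("xx"))  # Unknown
```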

Installation

pip install underthesea

Usage

from underthesea import lang_detect

text = "Xin chào, tôi là người Việt Nam"
language = lang_detect(text)
print(language)  # vi
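
A common next step is routing several texts by their detected language. The sketch below assumes only that the detector is a function from a string to a language code, as in the call above. `group_by_language` and `fake_detect` are hypothetical names; `fake_detect` is a stand-in so the example runs without the model installed.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_language(
    texts: Iterable[str], detect_fn: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group texts by the language code that detect_fn returns for each one."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        groups[detect_fn(text)].append(text)
    return dict(groups)


# Stand-in detector (an assumption for this sketch, not the real model).
# In real use, pass underthesea's lang_detect instead.
def fake_detect(text: str) -> str:
    return "vi" if "Xin chào" in text else "en"


groups = group_by_language(["Xin chào Việt Nam", "Hello world"], fake_detect)
print(groups)  # {'vi': ['Xin chào Việt Nam'], 'en': ['Hello world']}
```

Passing the detector as an argument keeps the helper independent of how the model is loaded, so the same code should work with any callable that returns a language code.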

API

from radar import RadarLangDetector, detect

# Quick detection
lang = detect("Hello world")
print(lang)  # en

# With confidence scores
detector = RadarLangDetector.load("models/radar-1")
result = detector.predict("Xin chào Việt Nam")
print(result.lang)   # vi
print(result.score)  # confidence score, e.g. 0.98
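
The confidence score can be used to reject uncertain predictions. The sketch below assumes only that a prediction exposes `.lang` and `.score`, as shown above. `Prediction` is a stand-in class, while `label_or_unknown` and the 0.5 threshold are hypothetical choices for illustration, not values recommended by the model authors.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Stand-in for the object returned by detector.predict (assumed shape)."""

    lang: str
    score: float


def label_or_unknown(result, threshold: float = 0.5) -> str:
    """Return result.lang if result.score reaches the threshold, else "unknown"."""
    return result.lang if result.score >= threshold else "unknown"


print(label_or_unknown(Prediction("vi", 0.98)))  # vi
print(label_or_unknown(Prediction("en", 0.30)))  # unknown
```

A suitable threshold depends on the application and is best chosen on held-out data.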

Training

python src/train.py

Technical Report

See TECHNICAL_REPORT.md for detailed methodology and evaluation.
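
The metadata lists accuracy and F1 as metrics, but how they are computed is not stated here. The following is a minimal sketch of one common choice, assuming F1 is macro-averaged over the labels present in the gold data. `accuracy_and_macro_f1` is a hypothetical helper and the labels are toy data, not results from the report.

```python
from typing import Sequence, Tuple


def accuracy_and_macro_f1(
    gold: Sequence[str], pred: Sequence[str]
) -> Tuple[float, float]:
    """Compute accuracy and macro-averaged F1 over the labels found in gold."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    pairs = list(zip(gold, pred))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)

    f1_scores = []
    for label in sorted(set(gold)):
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive here
        # because every label in gold has at least one TP or FN.
        f1_scores.append(2 * tp / (2 * tp + fp + fn))

    return accuracy, sum(f1_scores) / len(f1_scores)


# Toy labels for illustration only.
acc, f1 = accuracy_and_macro_f1(["vi", "vi", "en", "en"], ["vi", "en", "en", "en"])
print(f"{acc:.2f} {f1:.2f}")  # 0.75 0.73
```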