---
license: apache-2.0
language:
- vi
- en
- zh
- ja
- ko
- fr
- de
- es
- th
- lo
- km
tags:
- language-detection
- language-identification
- vietnamese
- multilingual
library_name: underthesea
pipeline_tag: text-classification
metrics:
- accuracy
- f1
---

# Radar-1

Radar-1 is a language detection model developed by UnderTheSea NLP. Given a piece of text, it predicts which of its 11 supported languages the text is written in.

## Model Description

- Model Type: Language Detection (Text Classification)
- Task: Identify the language of input text
- Language: Multilingual
- License: Apache 2.0

## Supported Languages

| Code | Language |
|------|----------|
| vi | Vietnamese |
| en | English |
| zh | Chinese |
| ja | Japanese |
| ko | Korean |
| fr | French |
| de | German |
| es | Spanish |
| th | Thai |
| lo | Lao |
| km | Khmer |
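If your application needs to turn the codes returned by the detector into display names, or to check that a code is one Radar-1 covers, the table above can be mirrored as a dictionary. This is a minimal sketch; `SUPPORTED_LANGUAGES` and `language_name` are hypothetical helpers, not part of `underthesea` or `radar`.

```python
# Hypothetical helper mirroring the Supported Languages table
# (not part of underthesea or radar).
SUPPORTED_LANGUAGES = {
    "vi": "Vietnamese",
    "en": "English",
    "zh": "Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "th": "Thai",
    "lo": "Lao",
    "km": "Khmer",
}


def language_name(code: str) -> str:
    """Return the English name for a supported language code.

    Raises ValueError for codes outside the 11 languages listed above.
    """
    normalized = code.strip().lower()  # tolerate stray whitespace and upper case
    try:
        return SUPPORTED_LANGUAGES[normalized]
    except KeyError:
        raise ValueError(f"unsupported language code: {code!r}") from None


print(language_name("vi"))  # Vietnamese
```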

## Installation

```bash
pip install underthesea
```

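## Usage

The simplest way to detect the language of a text is the `lang_detect` function from `underthesea`, which returns the language code as a string:

```python
from underthesea import lang_detect

text = "Xin chào, tôi là người Việt Nam"
language = lang_detect(text)
print(language)  # vi
```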
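For many texts, such as a mixed-language corpus, a common pattern is to detect each one and bucket it by language. The sketch below assumes only what the example above shows: `lang_detect` takes a string and returns a language code. `group_by_language` is a hypothetical helper, and its `detect_fn` parameter lets you swap in another detector (for example `radar.detect`) or a stub for testing.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional


def group_by_language(
    texts: Iterable[str],
    detect_fn: Optional[Callable[[str], str]] = None,
) -> Dict[str, List[str]]:
    """Group texts by detected language code (hypothetical helper).

    Defaults to underthesea's lang_detect; pass detect_fn to use a
    different detector with the same str -> str signature.
    """
    if detect_fn is None:
        # Imported lazily so the helper itself works without underthesea installed.
        from underthesea import lang_detect

        detect_fn = lang_detect

    groups: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        groups[detect_fn(text)].append(text)  # preserves input order within each group
    return dict(groups)


# Example (requires underthesea):
# group_by_language(["Xin chào, tôi là người Việt Nam", "Hello world"])
```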

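## API

The `radar` module offers a quick `detect` function and a `RadarLangDetector` class whose predictions also carry a confidence score. `models/radar-1` is a relative path, so it is resolved against the current working directory.

```python
from radar import RadarLangDetector, detect

# Quick detection: returns the language code
lang = detect("Hello world")
print(lang)  # en

# With confidence scores: load the model once, then call predict()
detector = RadarLangDetector.load("models/radar-1")
result = detector.predict("Xin chào Việt Nam")
print(result.lang)   # vi
print(result.score)  # confidence value, e.g. 0.98
```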
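The confidence score can be used to reject uncertain predictions, which may occur with very short or mixed-language inputs. A minimal sketch, assuming only that `predict` returns an object with `lang` and `score` attributes as shown above; the helper name `detect_or_unknown`, the `"unknown"` label, and the default threshold of 0.5 are illustrative choices, not values documented for Radar-1.

```python
from typing import Any


def detect_or_unknown(detector: Any, text: str, threshold: float = 0.5) -> str:
    """Return the predicted language code, or "unknown" below the threshold.

    `detector` is any object whose predict(text) returns something with
    `lang` and `score` attributes, such as a loaded RadarLangDetector.
    The 0.5 default is an assumption; tune it on your own data.
    """
    result = detector.predict(text)
    return result.lang if result.score >= threshold else "unknown"


# Usage with the detector loaded above:
# detect_or_unknown(detector, "Xin chào Việt Nam", threshold=0.8)
```

Returning a sentinel label rather than raising keeps batch pipelines simple: low-confidence texts can be routed for manual review without interrupting processing.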

## Training

To train the model, run the training script:

```bash
python src/train.py
```
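The model card metadata lists accuracy and F1 as the evaluation metrics. To check a trained model against your own labelled data, a minimal plain-Python sketch follows. It assumes F1 is macro-averaged over languages (how Radar-1's own evaluation averages F1 is not stated here), and `accuracy_and_macro_f1` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def accuracy_and_macro_f1(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, float]:
    """Compute accuracy and macro-averaged F1 over language labels.

    Macro averaging (unweighted mean of per-language F1) is an assumption.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)

    labels = sorted(set(gold) | set(pred))
    f1_scores: List[float] = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)

    return accuracy, sum(f1_scores) / len(f1_scores)


gold = ["vi", "en", "vi", "ja"]
pred = ["vi", "en", "en", "ja"]
acc, f1 = accuracy_and_macro_f1(gold, pred)
print(f"accuracy={acc:.2f} macro_f1={f1:.2f}")  # accuracy=0.75 macro_f1=0.78
```

Macro averaging weights every language equally, so weak performance on a rarely seen language (for example Lao or Khmer) is not hidden by strong results on frequent ones.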

## Technical Report

See [TECHNICAL_REPORT.md](TECHNICAL_REPORT.md) for detailed methodology and evaluation.