barom2 committed on Apr 21

Commit 12e49bf (verified) · 1 parent: 4bebd58

Update README.md

Files changed (1)

README.md +176 -45


@@ -38,53 +38,46 @@ Our flagship **Datumo Platform** is Korea's first end-to-end AI trust evaluation
 ## 🎯 What We Do
-<table>
-<tr>
-<td width="33%" valign="top">
-### 🗂️ Data Construction
-Training data design &amp; build
-Pre-training data licensing
-RAG knowledge pipelines
-Crowdsourced at scale
-</td>
-<td width="33%" valign="top">
-### 🛡️ AI Trust &amp; Safety
-LLM red-teaming
-Reliability benchmarks
-Safety evaluation datasets
-Guardrail testing
-</td>
-<td width="33%" valign="top">
-### 📊 Datumo Platform
-Automated LLM evaluation
-Dashboard analytics
-**45 days → 45 minutes**
-End-to-end eval pipeline
-</td>
-</tr>
-</table>
 ---
 ## 📚 Featured Collections
 ### 🛡️ [Safety-Data](https://huggingface.co/collections/datumo/safety-data)
 Curated by our **AI Safety team** — Korean-language safety and reliability benchmarks for LLM evaluation.
-- 🔸 [**KorSET**](https://huggingface.co/datasets/datumo/KorSET) — Korean Safety Evaluation Toolkit
-- 🔸 [**KorNAT**](https://huggingface.co/datasets/datumo/KorNAT) — Korea's first LLM reliability / national-alignment benchmark
 ### 📦 [Data-Data](https://huggingface.co/collections/datumo/data-data)
-Research outputs from our **Data team**.
-- 🔸 [**CAC-CoT**](https://huggingface.co/datumo/CAC-CoT) — 7B Chain-of-Thought feature extraction model
-- 🔸 [**CAC-CoT dataset**](https://huggingface.co/datasets/datumo/CAC-CoT) — accompanying training data
 > 💡 Follow us to get notified when new datasets and models are released.
@@ -92,13 +85,150 @@ Research outputs from our **Data team**.
 ## 🏆 Milestones
-- 🇰🇷 **K-AI Company** — Selected for Korea's Sovereign AI Foundation Model Project *(SKT Consortium, data lead)*
-- 🏅 **Forbes Korea "2025 AI 50"**
-- 🏅 **Forbes "30 Under 30 Asia"** — Enterprise Technology
-- 🚀 **Datumo Eval** — Korea's first automated LLM reliability evaluation platform (2025)
-- 📈 **200M+ annotations** · **287+ enterprise clients** · **250K+ crowdworkers**
-- 📝 Co-built landmark Korean benchmarks including **KLUE** and **KorQuAD 2.0**
-- 🔬 Publications at **NeurIPS · EMNLP · CVPR**
 ---
@@ -109,10 +239,11 @@ Research outputs from our **Data team**.
 | 🌐 Website | [selectstar.ai](https://selectstar.ai/) |
 | 📰 Blog | [selectstar.ai/blog](https://selectstar.ai/blog/) |
 | 💼 Enterprise inquiries | [Contact form](https://selectstar.ai/contact_page/) |
-| 💬 Community | Join the [discussion tab](https://huggingface.co/spaces/datumo/README/discussions) |
 ---
 <div align="center">
-<sub>⭐ Building the data foundation for trustworthy AI &middot; Made with care in Seoul 🇰🇷</sub>
 </div>

 ## 🎯 What We Do
+Across the full evolution of AI, from Perception AI (since 2018) to Generative AI (since 2022) to **Agentic AI (since 2026)**, we provide an **end-to-end pipeline** that runs from data construction to reliability verification.
+| 🗂️ Data Construction | 🛡️ AI Trust & Safety | 📊 Datumo Platform |
+|---|---|---|
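+| High-difficulty reasoning data generation (CAC-CoT, GRADE, ATA, CoBA) | LLM red-teaming (CAGE, STAR-Teaming) | Korea's first automated LLM reliability evaluation platform |
+| Pre-training and fine-tuning data licensing | Korean safety benchmarks (KorNAT, KorSET, FinRED) | **Evaluation time: 45 days → 45 minutes** |
+| RAG knowledge pipelines | Safety Judge (Datumo-Guard) | Support for on-premises and air-gapped environments |
+| 250K+ crowdworkers · 200M+ annotations | Domain-specific evaluation for finance, healthcare, and the public sector | Dashboard Analytics & Reporting |
+> 🤝 **Key partnerships**: SKT consortium for Korea's Sovereign AI Foundation Model project · GSMA Open Telco AI · Samsung Life C-Lab Outside · Financial Security Institute · Ministry of Food and Drug Safety (MFDS) medical red team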
 ---
 ## 📚 Featured Collections
 ### 🛡️ [Safety-Data](https://huggingface.co/collections/datumo/safety-data)
 Curated by our **AI Safety team** — Korean-language safety and reliability benchmarks for LLM evaluation.
+| Dataset | Description | Venue |
+|---|---|---|
+| 🔸 [**KorSET**](https://huggingface.co/datasets/datumo/KorSET) | Korean red-teaming benchmark built with the CAGE framework (5 risk domains · 12 categories · 53 subtypes · ~8,000 samples) | **ICLR 2026** (CAGE) |
+| 🔸 [**KorNAT**](https://huggingface.co/datasets/datumo/KorNAT) | Korea's first LLM reliability / national-alignment benchmark | ACL 2024 Findings |
+| 🔸 **FinRED** | Financial-domain LLM **red-teaming** evaluation benchmark (co-built with the Financial Security Institute's AI Innovation Office) | KDD 2026 D&B Track |
 ### 📦 [Data-Data](https://huggingface.co/collections/datumo/data-data)
+Research outputs from our **Data team** — models and datasets built in-house.
+| Resource | Description | Type |
+|---|---|---|
+| 🔸 [**CAC-CoT**](https://huggingface.co/datumo/CAC-CoT) | 7B Connector-Aware Compact Chain-of-Thought reasoning model | Model |
+| 🔸 [**CAC-CoT dataset**](https://huggingface.co/datasets/datumo/CAC-CoT) | Accompanying training data for CAC-CoT | Dataset |
+### 🏗️ Co-built Benchmarks
+Representative Korean benchmarks that Selectstar helped build.
+- 🔹 [**KLUE**](https://huggingface.co/datasets/klue/klue) — Korean Language Understanding Evaluation (NeurIPS 2021 Datasets & Benchmarks)
+- 🔹 [**KorQuAD 2.0**](https://huggingface.co/datasets/KETI-AIR/korquad) — Korean Question Answering Dataset (co-built with LG CNS)
 > 💡 Follow us to get notified when new datasets and models are released.
 ## 🏆 Milestones
+**Highlights (recent key achievements)**
+- 🏅 **Forbes "30 Under 30 Asia" 2021**, Enterprise Technology (four co-founders selected)
+- 🏅 Selected for **Forbes Korea "2025 Korea AI 50"**
+- 🏅 Named to **Forbes Asia "100 to Watch" 2025**
+- 🇰🇷 Passed the first evaluation round of Korea's **Sovereign AI Foundation Model project** (2026.01, data lead for the SKT consortium)
+- 🌐 Joined **GSMA Open Telco AI** as an official partner (2026.03, MWC Barcelona)
+- 💰 Cumulative funding surpassed **KRW 43.4B** (2025.12, Series B extension)
+- 📈 **200M+** cumulative annotations · **287+** enterprise clients · **250K+** crowdworkers
+<details>
+<summary><b>📜 View full history (2018–2026)</b></summary>
+### 🌱 Founding & Early Traction (2018–2020)
+| Date | Event |
+|---|---|
+| 2018.11 | Selectstar Inc. founded |
+| 2018.12 | Top prize, KAIST startup competition (E*5) |
+| 2019.07 | Raised KRW 400M seed investment from Kakao Ventures |
+| 2019.09 | Built the KorQuAD 2.0 dataset (with LG CNS) |
+| 2019.10 | Selected for the TIPS program |
+| 2019.12 | Corporate R&D center officially recognized |
+| 2020.09 | Raised KRW 4B Series A (Kakao Ventures · Kolon Investment · Company K Partners) |
+| 2020.10 | **SideGuide** paper at IROS 2020: a large-scale sidewalk dataset |
+| 2020.11 | Top prize, Data Stars competition (Minister of Science and ICT Award) |
+### 🚀 Scale-Up & Global Recognition (2021–2022)
+| Date | Event |
+|---|---|
+| 2021.01 | Selected for Samsung C-Lab Outside |
+| **2021.04** | 🏅 Named to **Forbes "30 Under 30 Asia"** Enterprise Technology (four co-founders) |
+| 2021.11 | **KLUE** paper at NeurIPS 2021 Datasets & Benchmarks |
+| 2022.01 | Participated in CES 2022 (Samsung C-Lab) |
+| 2022.02 | Appointed to the technology subcommittee of the 1st AI Ethics Policy Forum |
+| 2022.03 | **Instance-wise Occlusion and Depth Orders** paper at CVPR 2022 |
+| 2022.07 | Raised KRW 9B Series A extension |
+| 2022.07 | Certified as an innovative technology SME (Inno-Biz) |
+| 2022.11 | Papers: **KOLD** (EMNLP 2022), **CochlScene** (APSIPA 2022), **Split-GCN** (TPAMI, first author) |
+### 🧠 LLM Era & AI Safety (2023–2024)
+| Date | Event |
+|---|---|
+| 2023.05 | Raised KRW 4B Series A extension (Korea Development Bank) |
+| 2023.06 | Grand prize, AI-based Defense Innovation Forum (Army Chief of Staff Award) |
+| 2023.07 | Keynote speaker at "AI Talk with Andrew Ng" |
+| 2023.10 | Speaker at Samsung Developer Conference 2023 |
+| 2023.11 | Special award, Korea Digital Innovation Award |
+| 2023.12 | **Analyzing Norm Violations in Live-Stream Chat** paper at EMNLP 2023 |
+| 2023.12 | Built Korea's first "hyperscale language model reliability benchmark dataset" (NIA) |
+| 2024.04 | Planned and ran the **Gen AI Korea 2024: Generative AI Red Team Challenge** conference (Ministry of Science and ICT) |
+| **2024.08** | **KorNAT** paper at ACL 2024 Findings: the first dataset paper at a top global AI conference first-authored by a Korean AI data company |
+| 2024.10 | Appointed to KT's Responsible AI Advisory Committee |
+| 2024.11 | Excellence award, 2nd AI Trustworthiness Awards (KISDI President's Award) |
+| 2024.11 | Speaker at GSMA AI Summit 2024 |
+| 2024.12 | Obtained Korea's first **DQ (Data Quality) certification for LLM harmlessness evaluation data** (TTA) |
+| 2024.12 | 2024 Asia AI Awards, Korea Venture Business Association Chairman's Award |
+### 🌐 Agentic AI & Global Expansion (2025–2026)
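+| Date | Event |
+|---|---|
+| 2025.02 | **Datumo Eval launched**: Korea's first automated LLM evaluation platform |
+| 2025.03 | Co-hosted the **Gen AI Red Team Challenge** (MWC Barcelona, GSMA), the world's first in-person global red-team challenge |
+| 2025.04 | CEO Kim Se-yeop appointed to the AI Basic Act safety guideline task force (Ministry of Science and ICT · AI Safety Institute) |
+| 2025.05 | 🏅 Selected for **Forbes Korea "2025 Korea AI 50"** |
+| 2025.06 | Final selection for Samsung Financial C-Lab Outside (financial AI reliability verification with Samsung Life) |
+| 2025.07 | Joined the private AI trustworthiness certification 'AI-MASTER' as a testing body (Korea's first private-sector-led scheme) |
+| 2025.08 | **Raised KRW 20.5B Series B** |
+| 2025.08 | 🏅 Named to **Forbes Asia "100 to Watch 2025"** |
+| 2025.08 | Selected as an elite team for Korea's **Sovereign AI Foundation Model project** (data lead for the SKT consortium) |
+| 2025.09 | CEO Kim Se-yeop appointed as a **data subcommittee member of the National AI Strategy Committee** |
+| 2025.09 | Sponsored the MFDS red-team challenge for advanced AI digital medical products, Asia's first 'medical red team' |
+| 2025.10 | Named **best startup** at Samsung Financial C-Lab Outside (Samsung Life) |
+| 2025.11 | Three papers accepted at EMNLP 2025: **CAC-CoT · CoBA · GRADE** |
+| 2025.11 | 2025 Edaily AI Korea Awards (Korea AI Industry Association Chairman's Award) |
+| 2025.11 | Good AI Awards 2025, NIA President's Award |
+| 2025.12 | Additional KRW 5.5B Series B investment; **cumulative funding surpassed KRW 43.4B** |
+| **2026.01** | 🇰🇷 **Passed the first evaluation round of Korea's Sovereign AI Foundation Model project** (SKT consortium) |
+| 2026.02 | **CAGE** paper accepted at the ICLR 2026 main conference |
+| 2026.03 | Joined the global **GSMA 'Open Telco AI'** alliance as an official partner (MWC Barcelona) |
+| 2026.03 | Co-organized the MWC 2026 Gen AI Red Team Challenge (GSMA · LG U+) |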
+</details>
+---
+## 📖 Publications
+Papers that Selectstar authored, co-authored, or supported, organized with a focus on top international AI/ML venues.
+<details open>
+<summary><b>🔥 2026 (4 papers)</b></summary>
+| Paper | Co-authors | Venue |
+|---|---|---|
+| **STAR-Teaming**: A Strategy-Response Multiplex Network Approach to Automated LLM Red Teaming | Selectstar | ACL 2026 |
+| **FinRED**: An Expert-Guided Red-Teaming Benchmark for Financial LLM Safety | Selectstar · Financial Security Institute AI Innovation Office | KDD 2026 Dataset & Benchmark Track |
+| [**CAGE**](https://openreview.net/forum?id=gCm55KYiqz): A Framework for Culturally Adaptive Red-Teaming Benchmark Generation | Selectstar | **ICLR 2026** Main |
+| **E-star-12B**: Rubric-Following Evaluator Adaptive Across Industrial Domains | Selectstar | ACL 2026 Workshop (in progress) |
+</details>
+<details>
+<summary><b>📄 2025 (4 papers)</b></summary>
+| Paper | Co-authors | Venue |
+|---|---|---|
+| [**CoBA**](https://aclanthology.org/2025.emnlp-main.520/): Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples | Selectstar · Chung-Ang University | EMNLP 2025 Main |
+| [**GRADE**](https://aclanthology.org/2025.findings-emnlp.236/): Generating multi-hop QA and fine-gRAined Difficulty matrix for RAG Evaluation | Selectstar · KAIST | EMNLP 2025 Findings |
+| [**CAC-CoT**](https://aclanthology.org/2025.findings-emnlp.1062/): Connector-Aware Compact Chain-of-Thought for Efficient Reasoning Data Synthesis Across Dual-System Cognitive Tasks | Selectstar | EMNLP 2025 Findings |
+| **ATA**: Autonomous Tabular-data Analysis for Insight Generation via Statistical Methods | Selectstar · Samsung Securities Financial AI Center | Under submission (co-authored) |
+</details>
+<details>
+<summary><b>📄 2024 (1 paper)</b></summary>
+| Paper | Co-authors | Venue |
+|---|---|---|
+| [**KorNAT**](https://arxiv.org/abs/2402.13605): LLM Alignment Benchmark for Korean Social Values and Common Knowledge | Selectstar · KAIST · SKT · LG · NAVER · KT · NIA | ACL 2024 Findings |
+> The first dataset paper at a top global AI conference first-authored by a Korean AI data company
+</details>
+<details>
+<summary><b>📄 2020–2023 (7 papers)</b></summary>
+| Year | Paper | Venue |
+|---|---|---|
+| 2023 | [**Analyzing Norm Violations in Live-Stream Chat**](https://aclanthology.org/2023.emnlp-main.55/) | EMNLP 2023 |
+| 2022 | [**KOLD**](https://aclanthology.org/2022.emnlp-main.744/): Korean Offensive Language Dataset | EMNLP 2022 |
+| 2022 | [**Split-GCN**](https://ieeexplore.ieee.org/document/9984937): Effective Interactive Annotation for Segmentation of Disconnected Instance | IEEE TPAMI (first author) |
+| 2022 | [**Instance-wise Occlusion and Depth Orders**](https://openaccess.thecvf.com/content/CVPR2022/html/Lee_Instance-Wise_Occlusion_and_Depth_Orders_in_Natural_Scenes_CVPR_2022_paper.html) | CVPR 2022 |
+| 2022 | [**CochlScene**](https://ieeexplore.ieee.org/document/9979822): Acquisition of acoustic scene data using crowdsourcing | APSIPA 2022 |
+| 2021 | [**KLUE**](https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/98dce83da57b0395e163467c9dae521b-Abstract-round2.html): Korean Language Understanding Evaluation | NeurIPS 2021 Datasets & Benchmarks |
+| 2020 | [**SideGuide**](https://ieeexplore.ieee.org/document/9340734): A Large-scale Sidewalk Dataset for Guiding Impaired People | IROS 2020 |
+</details>
+> For the full publication list and details, visit our [blog](https://selectstar.ai/blog/) or [contact us](https://selectstar.ai/contact_page/).
 ---
 | 🌐 Website | [selectstar.ai](https://selectstar.ai/) |
 | 📰 Blog | [selectstar.ai/blog](https://selectstar.ai/blog/) |
 | 💼 Enterprise inquiries | [Contact form](https://selectstar.ai/contact_page/) |
+| 💬 Community | [Discussion tab](https://huggingface.co/spaces/datumo/README/discussions) |
+| 🔔 Updates | Follow us on Hugging Face to get notified of new releases |
 ---
 <div align="center">
+<sub>⭐ Building the data foundation for trustworthy AI · Made with care in Seoul 🇰🇷</sub>
 </div>