Update README.md
README.md
CHANGED
@@ -1,46 +1,20 @@
+---
+title: Rye AI
+emoji: 🐨
+colorFrom: yellow
+colorTo: purple
+thumbnail: >-
+  https://cdn-uploads.huggingface.co/production/uploads/651e9c3d8645f13f18eb3af4/ToSGmNsxw8iWNUPu63Eaj.png
+---
 # 🥪 Rye AI
+Open-source Danish NLP models and tools, layered like a proper smørrebrød.
 
-
+### 🔪 What’s on the menu?
+- **DaMorph** – Morphological segmentation for Danish. Because long compound words deserve proper slicing.
+- **DaMedSum** – Medical summarization models trained on LUMI HPC, turning complex clinical text into clear, concise outputs.
+- **Morphology-Aware Tokenizers** – Danish tokenizers built to preserve linguistic structure and boost downstream performance.
 
-
-
-- layered 🧠
-- occasionally overengineered 😏
+### 🇩🇰 Why Rye AI?
+We build open, layered, and occasionally overengineered Danish NLP. Expect rigorous experiments, slightly playful ideas, and models that _probably_ shouldn’t work… but do.
 
-
-
-### 🧠 DaMorph
-A collection of experimental models exploring **morphological segmentation for Danish NLP**.
-
-👉 Because Danish words are long… and deserve to be sliced properly.
-
-**Includes:**
-- DaMorph
-- DaMorph Tokenizers (yes, we slice at every layer)
-- DaMedSum
-
-### 🏥 DaMedSum
-Danish medical summarization models trained on **LUMI HPC**.
-
-👉 Turning long, complicated medical text into something (slightly) more digestible.
-
-**Includes:**
-- T5-large
-- large / base / small variants
-
-### 🔪 Tokenizers (a.k.a. precision slicing)
-Because no good sandwich starts without proper slicing.
-
-- Morphological tokenizers for Danish
-- Built to explore how structure impacts understanding
-
-## 🤔 What to expect
-
-- Serious experiments 🤓
-- Slightly cursed ideas 😈
-- Danish NLP in all its glory 🇩🇰
-- Things that *probably* shouldn’t work… but do
-
-## ⚡ Slogan
-
-**Open source never tasted this good.**
+> ⚡ **Open source never tasted this good.**