Commit ea95949
Parent: bac26dd
README update
Files changed:
- README.md +3 -3
- src/test_t5.py +1 -1
README.md
CHANGED
@@ -44,8 +44,8 @@ model-index:
 - Less stable on very long or morphologically complex words
 
 > Development information
-> - 🚧 **Current version:**
-> - ⏳ **Upcoming release:**
+> - 🚧 **Current version:** Baseline (stage 1)
+> - ⏳ **Upcoming release:** v1 (stage 2)
 >
 > **Note:** As of May 19, 2026, AramT5's training process, which was at stage 4, was reset a baseline level due to inconsistencies found in previous versions of the Serto-Madnḥaya mapping code and lack of data for individual words, which mostly invalidated prior learning efforts
 
@@ -149,4 +149,4 @@ uv run python src/train_t5.py --stage 2 --hf-model your-username/model-name
 
 ## 📋 Version Changelog
 
-* **AramT5 Baseline (May
+* **AramT5 Baseline (May 20, 2026):** T5 fine-tuned on 20k records, across 30 epochs, leveraging the stage 1 configuration. Baseline version with a surprisingly good initial understanding of how to transliterate properly, shown to capture some roots and Syriac morphology in a limited manner
src/test_t5.py
CHANGED
|
@@ -1,7 +1,7 @@
|
|
| 1 |
from transformers import AutoTokenizer, T5ForConditionalGeneration, pipeline
|
| 2 |
|
| 3 |
# HF Hub path config
|
| 4 |
-
model_path = "
|
| 5 |
|
| 6 |
# Unicode directional formatting for RTL text (Syriac)
|
| 7 |
RLI = "\u2067" # Right-to-Left Isolate
|
|
|
|
| 1 |
from transformers import AutoTokenizer, T5ForConditionalGeneration, pipeline
|
| 2 |
|
| 3 |
# HF Hub path config
|
| 4 |
+
model_path = "crossroderick/aramt5"
|
| 5 |
|
| 6 |
# Unicode directional formatting for RTL text (Syriac)
|
| 7 |
RLI = "\u2067" # Right-to-Left Isolate
|