Spaces: PLTAT/README
welyjesch committed on Apr 3

Commit 609edf2 (verified) · 1 Parent(s): 4ec96ef

Update README.md

Files changed (1)

README.md +3 -3

@@ -53,7 +53,7 @@ Interested parties may reach out via the Hugging Face discussion board or review
 </details>
-# PROGRESS REPORT: Phase 1 Foundation Model Alignment
+## Progress Report for Phase 1
 <details>
 <summary><b>Summary:</b> Phase 1 is underway, but achieving a high-fidelity "Teacher" model for Philippine languages using Llama 3.1 and machine-translated Alpaca data is currently bottlenecked. Llama 3.1's inherent English-centric bias combined with syntactically flawed, machine-translated training data creates a compounding error loop. This results in grammatical corruption, dialect mixing, and severe hallucinations rather than true Neural Machine Translation (NMT) parity. There is still a long way to go to build a reliable teacher model; we must pivot away from machine-translated shortcuts and invest in human-curated, native-first datasets before progressing to knowledge distillation.</summary>
@@ -114,7 +114,7 @@ Building high-performance NLP architectures for Philippine languages cannot rely
 </details>
-# SOLUTION DOCUMENT: Crowdsourced Authentic Dataset Generation Strategy
+## Current Status: Crowdsourced Authentic Dataset Generation Strategy
 <details>
 <summary><b>Summary:</b> In response to the hallucination loop caused by machine-translated training data, stakeholders have pivoted towards authentic, native-first dataset curation. To facilitate this, we have developed the PLTAT App—an all-in-one "Swiss Army knife" platform for crowdsourcing the translation, generation, evaluation, and correction of NLP datasets. Because building a high-fidelity teacher model is a long-term, iterative process, we are actively seeking institutional stakeholders (universities, government agencies) to sustain this effort. Technical resources, including the PLTAT Chat App and our Ollama Colab Server Notebook, are now live for community testing.</summary>
@@ -124,7 +124,7 @@ Building high-performance NLP architectures for Philippine languages cannot rely
 **Organization:** Philippine Languages Translation and AI Training Community (PLTAT)
 **Project Phase:** Phase 1.5 - Authentic Data Remediation & HITL Integration
-**Date:** [Current Date]
+**Date:** April 6, 2026
 ---