Update README.md
README.md
CHANGED
@@ -24,48 +24,44 @@ The Karibu project is a collaboration between pleIAs, Bibliothèque sans fronti

 ## Karibu Language Level Classifier
 Karibu is a DeBERTa-based classifier that automatically assigns CEFR language proficiency levels (A1-C2) to French educational content.
-Model Characteristics

 ## Architecture: DeBERTa with multi-head classification
-Base Model: PleIAs/celadon
-Model Size: Fine-tuned from DeBERTa-v3-small
-Output: 6 classification levels (A1, A2, B1, B2, C1, C2)
+- Base Model: PleIAs/celadon
+- Model Size: Fine-tuned from DeBERTa-v3-small
+- Output : 6 classification levels (A1, A2, B1, B2, C1, C2)

 🤖 [Explore the Celadon model](https://huggingface.co/PleIAs/celadon)


 ## Training Details

-Training Data: 9,000 synthetic samples
+- Training Data: 9,000 synthetic samples

-Source: French press articles + Wikimedia content
-Processing: Sequential text simplification using an open source model (to come)
-Validation: 1,000 samples per level manually verified by BSF experts
+- Source: French press articles + Wikimedia content
+- Processing: Sequential text simplification using an open source model (to come)
+- Validation: 1,000 samples per level manually verified by BSF experts

 ## Topics Coverage:
 - solidarity, geography, African literature, agriculture, tourism, cultural events, African history, geopolitics, communication
-Topic Filtering: Meta-Llama-3-8B-Instruct for content categorization
-Annotation Method:
+- Topic Filtering: Meta-Llama-3-8B-Instruct for content categorization

 🔍 [Explore the full dataset](https://huggingface.co/datasets/PleIAs/KaribuAI/viewer/default)


 ## levels
-Manual verification using CEFR framework criteria
-Statistical validation using Louvain word-level classification
+- Manual verification using CEFR framework criteria
+- Statistical validation using Louvain word-level classification

 ## Technical Integration

-Deployment: Offline-capable via microSD cards
-Format: H5P-compatible for interactive exercises
-Input Processing: Handles various text types (academic writing, press articles, emails, letters, stories)
+- Deployment: Offline-capable via microSD cards
+- Format: H5P-compatible for interactive exercises
+- Input Processing: Handles various text types (academic writing, press articles, emails, letters, stories)


 ## Collaborators

-PleIAs: Technical development
-Bibliothèque Sans Frontières (BSF): Educational expertise
-Kajou: Distribution platform
+PleIAs: Technical development, Bibliothèque Sans Frontières (BSF): Educational expertise, Kajou: Distribution platform



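
The README in this diff describes an output of six classification levels (A1 to C2). As a rough illustration of what consuming such an output could look like, here is a minimal sketch that turns six raw scores into a CEFR label. It is not the Karibu code: the label order, the single six-way head, and the names `CEFR_LEVELS`, `softmax`, and `predict_level` are all assumptions made for this example, and the logits are invented.

```python
import math

# Assumed label order, taken from the order in the README's "Output" line.
# The actual index-to-label mapping of the model is not given in the diff.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_level(logits):
    """Return the most likely CEFR level and its probability.

    Hypothetical helper: assumes one score per level, in CEFR_LEVELS order.
    """
    if len(logits) != len(CEFR_LEVELS):
        raise ValueError(f"expected {len(CEFR_LEVELS)} scores, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CEFR_LEVELS[best], probs[best]


if __name__ == "__main__":
    # Invented scores for illustration only; not real model output.
    example_logits = [0.1, 0.3, 2.4, 1.1, -0.5, -1.2]
    level, confidence = predict_level(example_logits)
    print(f"predicted level: {level} (p={confidence:.3f})")
```

In practice the scores would come from the classifier itself, and the mapping from index to level should be read from the model's own configuration rather than hard-coded as it is here.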