---
language: en
license: mit
tags:
- regression
- soulprint
- ubuntu
- xgboost
- culturally-rooted
model-index:
- name: Ubuntu_xgb_model
  results:
  - task:
      type: regression
      name: Ubuntu Regression
    dataset:
      name: Ubuntu-regression_data.jsonl
      type: synthetic
    metrics:
    - type: mse
      value: 0.0121
    - type: rmse
      value: 0.1101
    - type: r2
      value: 0.8817
---

# Ubuntu Regression Model (Soulprint Archetype)

## Overview

The **Ubuntu_xgb_model** is part of the Soulprint archetype family of models.
It predicts an **Ubuntu alignment score (0.0–1.0)** for text inputs, where Ubuntu represents *"I am because we are"*: harmony, inclusion, and community bridge-building.

- **0.0–0.3 → Low Ubuntu** (exclusion, selfishness, division)
- **0.4–0.7 → Medium Ubuntu** (partial inclusion, effort but incomplete)
- **0.8–1.0 → High Ubuntu** (harmony, belonging, collective well-being)
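
The bands above can be turned into labels with a small helper. The following is a minimal sketch, not part of the released model: the function name `ubuntu_band` is hypothetical, and because the documented bands leave gaps (0.3 to 0.4 and 0.7 to 0.8), it assumes cutoffs at the midpoints 0.35 and 0.75. It also assumes that scores outside 0.0–1.0 should be clipped, since a regressor's raw output is not guaranteed to stay inside that range.

```python
def ubuntu_band(score: float) -> str:
    """Map an Ubuntu alignment score to a Low / Medium / High label.

    Assumption: the documented bands (0.0-0.3, 0.4-0.7, 0.8-1.0) leave gaps,
    so this sketch splits them at the midpoints 0.35 and 0.75.
    """
    # Assumption: clip out-of-range predictions into [0.0, 1.0].
    clipped = min(max(float(score), 0.0), 1.0)
    if clipped < 0.35:
        return "Low"
    if clipped < 0.75:
        return "Medium"
    return "High"


# Hypothetical scores, for illustration only.
for example_score in (0.12, 0.55, 0.91):
    print(example_score, ubuntu_band(example_score))
```

If a different treatment of the gaps is wanted (for example, reporting them as "unclassified"), only the two cutoffs need to change.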

The model was trained with **XGBoost regression** on a custom dataset of **918 rows**, balanced across Low, Medium, and High Ubuntu examples. The data was generated from culturally diverse contexts (family, school, workplace, community, cultural rituals).

---

## Training Details

- **Framework:** Python 3, scikit-learn, XGBoost
- **Embeddings:** SentenceTransformer `"all-mpnet-base-v2"`
- **Algorithm:** `XGBRegressor`
- **Dataset Size:** 918 rows (before the train/test split)
- **Train/Test Split:** 80/20

### Hyperparameters

- `n_estimators=300`
- `learning_rate=0.05`
- `max_depth=6`
- `subsample=0.8`
- `colsample_bytree=0.8`
- `random_state=42`
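
For readers who want to reproduce a similar setup, the configuration above could be wired together roughly as follows. This is a sketch under stated assumptions, not the original training script: the function name `train_ubuntu_regressor`, the use of scikit-learn's `train_test_split`, and the reuse of `42` as the split seed are all assumptions. The heavier libraries are imported inside the function so the configuration can be inspected without them installed.

```python
# Hyperparameters copied from the list above.
XGB_PARAMS = {
    "n_estimators": 300,
    "learning_rate": 0.05,
    "max_depth": 6,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "random_state": 42,
}
TEST_FRACTION = 0.2  # 80/20 train/test split


def train_ubuntu_regressor(texts, scores):
    """Embed texts, split 80/20, and fit an XGBRegressor (illustrative only)."""
    # Lazy imports: these packages are only needed when training actually runs.
    import numpy as np
    from sentence_transformers import SentenceTransformer
    from sklearn.model_selection import train_test_split
    from xgboost import XGBRegressor

    # Turn each text into a fixed-size sentence embedding.
    embedder = SentenceTransformer("all-mpnet-base-v2")
    features = embedder.encode(list(texts))
    targets = np.asarray(scores, dtype=float)

    # Assumption: the split seed is not documented; 42 is reused here.
    X_train, X_test, y_train, y_test = train_test_split(
        features, targets, test_size=TEST_FRACTION, random_state=42
    )

    model = XGBRegressor(**XGB_PARAMS)
    model.fit(X_train, y_train)
    return model, (X_test, y_test)
```

The function returns the held-out split alongside the fitted model so that evaluation can use exactly the rows the model never saw.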

---

## Evaluation Results

On the held-out test set (20% of data):

- **MSE:** 0.0121
- **RMSE:** 0.1101
- **R² Score:** 0.882
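
As a reminder of how these three metrics relate, the sketch below computes them from scratch in plain Python. The function name `regression_metrics` and the toy values are hypothetical; they are not taken from the model's test set. Note that RMSE is the square root of MSE, so the reported figures are consistent up to rounding (√0.0121 ≈ 0.110).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, and R^2 for two equal-length sequences of floats."""
    n = len(y_true)
    # Sum of squared residuals between targets and predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mse = ss_res / n
    rmse = math.sqrt(mse)
    # Total variance of the targets around their mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical toy values, for illustration only.
toy_true = [0.1, 0.5, 0.9]
toy_pred = [0.2, 0.5, 0.8]
print(regression_metrics(toy_true, toy_pred))
```

An R² of 0.882 means the model explains roughly 88% of the variance in the held-out scores relative to always predicting their mean.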

---

## Usage

### Load Model

```python
import joblib
import xgboost as xgb  # not called directly, but must be installed to unpickle the model
from sentence_transformers import SentenceTransformer
from huggingface_hub import hf_hub_download

# -----------------------------
# 1. Download model from Hugging Face Hub
# -----------------------------
REPO_ID = "mjpsm/Ubuntu_xgb_model"  # change if you used a different repo name
FILENAME = "Ubuntu_xgb_model.pkl"

model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# -----------------------------
# 2. Load model + embedder
# -----------------------------
model = joblib.load(model_path)
embedder = SentenceTransformer("all-mpnet-base-v2")

# -----------------------------
# 3. Example prediction
# -----------------------------
text = "During our class project, I made sure everyone's ideas were included."
embedding = embedder.encode([text])  # shape: (1, embedding_dim)
score = model.predict(embedding)[0]

print("Predicted Ubuntu Score:", round(float(score), 3))
```
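
To score several texts at once, the same two calls can be wrapped in a helper. The sketch below is an illustration rather than part of the released code: the name `score_texts` is hypothetical, and clipping to 0.0–1.0 is an assumption, added because a regressor's raw output is not guaranteed to stay within the documented range. It accepts any objects that expose `encode` and `predict`, so the `embedder` and `model` loaded above can be passed in directly.

```python
def score_texts(texts, embedder, model):
    """Return one clipped Ubuntu score per input text.

    `embedder` must provide encode(list_of_str) and `model` must provide
    predict(embeddings), as the objects loaded in the snippet above do.
    """
    embeddings = embedder.encode(list(texts))
    raw_scores = model.predict(embeddings)
    # Assumption: clip predictions into the documented 0.0-1.0 range.
    return [min(max(float(s), 0.0), 1.0) for s in raw_scores]
```

With the objects from the previous snippet, a call would look like `score_texts(["first story", "second story"], embedder, model)`.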

## Applications

- Community storytelling evaluation
- Character alignment in cultural narratives
- AI assistants tuned to Afrocentric archetypes
- Training downstream models in the Soulprint system

## Limitations

- The dataset is synthetic (generated and curated). Real-world generalization should be validated.
- The model is context-specific to Ubuntu values and may not generalize beyond Afrocentric cultural framing.
- Scores are approximate indicators; interpretation depends on narrative context.