---
language: en
license: mit
tags:
- regression
- soulprint
- ubuntu
- xgboost
- culturally-rooted
model-index:
- name: Ubuntu_xgb_model
  results:
  - task:
      type: regression
      name: Ubuntu Regression
    dataset:
      name: Ubuntu-regression_data.jsonl
      type: synthetic
    metrics:
    - type: mse
      value: 0.0121
    - type: rmse
      value: 0.1101
    - type: r2
      value: 0.8817
---
# Ubuntu Regression Model (Soulprint Archetype)
## 🧩 Overview
The **Ubuntu_xgb_model** is part of the Soulprint archetype family of models.
It predicts an **Ubuntu alignment score (0.0–1.0)** for text inputs, where Ubuntu represents *"I am because we are"*: harmony, inclusion, and community bridge-building.
- **0.0–0.3 → Low Ubuntu** (exclusion, selfishness, division)
- **0.4–0.7 → Medium Ubuntu** (partial inclusion; effort made but incomplete)
- **0.8–1.0 → High Ubuntu** (harmony, belonging, collective well-being)
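The bands above leave gaps (0.3–0.4 and 0.7–0.8), while the regressor outputs a continuous value that is not guaranteed to stay inside 0.0–1.0. The helper below is a minimal sketch of turning a raw prediction into a band label. `score_to_band` and its cut-offs of 0.35 and 0.75 (the midpoints of the gaps) are hypothetical choices, not part of the released model.

```python
def score_to_band(score: float, low_max: float = 0.35, high_min: float = 0.75) -> str:
    """Map a predicted Ubuntu score to a Low/Medium/High label.

    The cut-offs are hypothetical: they split the undocumented gaps
    (0.3-0.4 and 0.7-0.8) at their midpoints.
    """
    # A tree-ensemble regressor can overshoot slightly, so clamp to 0.0-1.0 first.
    clipped = min(max(float(score), 0.0), 1.0)
    if clipped < low_max:
        return "Low"
    if clipped < high_min:
        return "Medium"
    return "High"


for s in (0.12, 0.38, 0.55, 0.79, 1.04):
    print(s, score_to_band(s))
```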
This model was trained with **XGBoost regression** on a custom dataset of **918 rows**, balanced across Low, Medium, and High Ubuntu examples. The examples were generated from scenarios set in culturally diverse contexts (family, school, workplace, community, cultural rituals).
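The card names the training file `Ubuntu-regression_data.jsonl` but does not describe its schema. A minimal sketch for reading it and checking the Low/Medium/High balance might look like this, assuming each line holds a JSON object with hypothetical `text` and `score` keys and using hypothetical band cut-offs of 0.35 and 0.75. All function names here are illustrative.

```python
import json
from collections import Counter


def parse_jsonl_lines(lines):
    """Parse JSONL lines into (texts, scores); the "text"/"score" keys are hypothetical."""
    texts, scores = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        texts.append(record["text"])
        scores.append(float(record["score"]))
    return texts, scores


def load_ubuntu_jsonl(path):
    """Read a JSONL file from disk and parse it line by line."""
    with open(path, encoding="utf-8") as fh:
        return parse_jsonl_lines(fh)


def band_counts(scores, low_max=0.35, high_min=0.75):
    """Count examples per band; the 0.35 / 0.75 cut-offs are hypothetical."""
    counts = Counter()
    for s in scores:
        if s < low_max:
            counts["Low"] += 1
        elif s < high_min:
            counts["Medium"] += 1
        else:
            counts["High"] += 1
    return counts
```

With the file on disk, `texts, scores = load_ubuntu_jsonl("Ubuntu-regression_data.jsonl")` followed by `band_counts(scores)` should show how the rows split across the three bands, provided the schema assumption holds.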
---
## 📊 Training Details
- **Framework:** Python 3, scikit-learn, XGBoost
- **Embeddings:** SentenceTransformer `"all-mpnet-base-v2"`
- **Algorithm:** `XGBRegressor`
- **Training Size:** 918 rows
- **Train/Test Split:** 80/20
### ⚙️ Hyperparameters
- `n_estimators=300`
- `learning_rate=0.05`
- `max_depth=6`
- `subsample=0.8`
- `colsample_bytree=0.8`
- `random_state=42`
---
## 📈 Evaluation Results
On the held-out test set (20% of data):
- **MSE:** 0.0121
- **RMSE:** 0.1101
- **R² Score:** 0.8817
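As a quick sanity check, 0.1101² ≈ 0.0121, so the reported RMSE and MSE agree. The hypothetical helper below computes the three metrics directly with NumPy, using R² = 1 − SS_res / SS_tot. It is a sketch for reproducing these numbers on your own predictions, not the evaluation code used for this card.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and R² for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(np.mean((y_true - y_pred) ** 2))
    # SS_tot is zero (and R² undefined) if every target value is identical.
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    ss_res = mse * y_true.size
    return {"mse": mse, "rmse": mse ** 0.5, "r2": 1.0 - ss_res / ss_tot}


print(regression_metrics([0.1, 0.5, 0.9], [0.2, 0.5, 0.8]))
```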
---
## 🚀 Usage
### Load Model
```python
import joblib
import xgboost as xgb  # not called directly, but xgboost must be installed to unpickle the model
from sentence_transformers import SentenceTransformer
from huggingface_hub import hf_hub_download
# -----------------------------
# 1. Download model from Hugging Face Hub
# -----------------------------
REPO_ID = "mjpsm/Ubuntu_xgb_model" # change if you used a different repo name
FILENAME = "Ubuntu_xgb_model.pkl"
model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)
# -----------------------------
# 2. Load model + embedder
# -----------------------------
model = joblib.load(model_path)
embedder = SentenceTransformer("all-mpnet-base-v2")
# -----------------------------
# 3. Example prediction
# -----------------------------
text = "During our class project, I made sure everyone’s ideas were included."
embedding = embedder.encode([text])
score = model.predict(embedding)[0]
print("Predicted Ubuntu Score:", round(float(score), 3))
```
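To score several texts at once, the same `embedder` and `model` objects can be reused in a single `encode`/`predict` call. `score_texts` below is a hypothetical helper; clipping to 0.0–1.0 and rounding to three decimals are added assumptions, because `XGBRegressor` does not itself bound its output.

```python
import numpy as np


def score_texts(texts, embedder, model):
    """Score a batch of texts with one encode/predict call.

    `embedder` is anything with an `encode(list_of_str)` method (e.g. the
    SentenceTransformer loaded above) and `model` anything with `predict(array)`
    (e.g. the loaded XGBRegressor).
    """
    embeddings = np.asarray(embedder.encode(list(texts)))
    raw = model.predict(embeddings)
    # Assumption: clamp to the documented 0.0-1.0 range, then round for display.
    return np.clip(raw, 0.0, 1.0).round(3).tolist()
```

After running the loading snippet, `score_texts(["text one", "text two"], embedder, model)` returns one rounded score per input text.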
## 🌍 Applications
- Community storytelling evaluation
- Character alignment in cultural narratives
- AI assistants tuned to Afrocentric archetypes
- Training downstream models in the Soulprint system
## ⚠️ Limitations
- The dataset is synthetic (generated and curated); generalization to real-world text still needs to be validated.
- The model is specific to Ubuntu values and may not generalize beyond an Afrocentric cultural framing.
- Scores are approximate indicators; their interpretation depends on narrative context.