---
language: en
license: mit
tags:
- regression
- soulprint
- ubuntu
- xgboost
- culturally-rooted
model-index:
- name: Ubuntu_xgb_model
  results:
  - task:
      type: regression
      name: Ubuntu Regression
    dataset:
      name: Ubuntu-regression_data.jsonl
      type: synthetic
    metrics:
    - type: mse
      value: 0.0121
    - type: rmse
      value: 0.1101
    - type: r2
      value: 0.8817
---

# Ubuntu Regression Model (Soulprint Archetype)

## 🧩 Overview
The **Ubuntu_xgb_model** is part of the Soulprint archetype family of models.  
It predicts an **Ubuntu alignment score (0.0–1.0)** for text inputs, where Ubuntu represents *"I am because we are"*: harmony, inclusion, and community bridge-building.

- **0.0–0.3 → Low Ubuntu** (exclusion, selfishness, division)  
- **0.4–0.7 → Medium Ubuntu** (partial inclusion, effort but incomplete)  
- **0.8–1.0 → High Ubuntu** (harmony, belonging, collective well-being)  
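
The bands above can be turned into a small lookup helper. The sketch below is illustrative only: `ubuntu_band` is a hypothetical name, and because the card does not say how scores that fall between the listed ranges (for example 0.35 or 0.75) should be labelled, the helper reports them as `"Unspecified"` rather than guessing.

```python
def ubuntu_band(score: float) -> str:
    """Map an Ubuntu alignment score to the band names listed in this card.

    Assumption: scores in the gaps between the documented ranges
    (0.3-0.4 and 0.7-0.8) are returned as "Unspecified".
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0.0, 1.0], got {score}")
    if score <= 0.3:
        return "Low"
    if 0.4 <= score <= 0.7:
        return "Medium"
    if score >= 0.8:
        return "High"
    return "Unspecified"  # falls between two documented bands


if __name__ == "__main__":
    for value in (0.25, 0.55, 0.9, 0.35):
        print(value, ubuntu_band(value))
```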

The model was trained with **XGBoost regression** on a custom dataset of **918 rows**, balanced across Low, Medium, and High Ubuntu examples. The data was generated from culturally diverse contexts (family, school, workplace, community, cultural rituals).

---

## 📊 Training Details
- **Framework:** Python 3, scikit-learn, XGBoost  
- **Embeddings:** SentenceTransformer `"all-mpnet-base-v2"`  
- **Algorithm:** `XGBRegressor`  
- **Dataset Size:** 918 rows (before the train/test split)  
- **Train/Test Split:** 80/20  

### ⚙️ Hyperparameters
- `n_estimators=300`  
- `learning_rate=0.05`  
- `max_depth=6`  
- `subsample=0.8`  
- `colsample_bytree=0.8`  
- `random_state=42`  
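
For orientation, the training setup described above could be reproduced along these lines. This is a minimal sketch, not the original training script: `train_ubuntu_regressor` is a hypothetical helper, the inputs are assumed to be precomputed sentence embeddings with their scores, and reusing `random_state=42` for the split is an assumption (the card only lists it as a model hyperparameter).

```python
# Hyperparameters exactly as listed in this card.
XGB_PARAMS = {
    "n_estimators": 300,
    "learning_rate": 0.05,
    "max_depth": 6,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "random_state": 42,
}

TEST_FRACTION = 0.2  # the card reports an 80/20 train/test split


def train_ubuntu_regressor(embeddings, scores):
    """Fit an XGBRegressor on precomputed embeddings (hypothetical helper).

    Returns the fitted model and the held-out (X_test, y_test) pair.
    """
    # Imported here so the configuration above can be inspected
    # without scikit-learn or xgboost installed.
    from sklearn.model_selection import train_test_split
    from xgboost import XGBRegressor

    X_train, X_test, y_train, y_test = train_test_split(
        embeddings,
        scores,
        test_size=TEST_FRACTION,
        random_state=42,  # assumption: the split seed is not stated in the card
    )
    model = XGBRegressor(**XGB_PARAMS)
    model.fit(X_train, y_train)
    return model, (X_test, y_test)
```

With 918 rows, an 80/20 split leaves about 734 rows for fitting and about 184 for evaluation.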

---

## 📈 Evaluation Results
On the held-out test set (20% of data):  
- **MSE:** 0.0121  
- **RMSE:** 0.1101  
- **R² Score:** 0.882  
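
The three metrics are closely related: RMSE is the square root of MSE, and R² compares the residual error with the variance of the true scores. The sketch below shows those definitions in plain Python; `regression_metrics` is a hypothetical helper written for illustration, not the evaluation code used for this card.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and R² for two equal-length sequences of floats."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Sum of squared residuals and total sum of squares around the mean.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    mse = ss_res / n
    # R² is undefined when every true value is identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": r2}


if __name__ == "__main__":
    print(regression_metrics([0.0, 0.5, 1.0], [0.1, 0.5, 0.9]))
```

As a consistency check on the reported figures, the square root of 0.0121 is 0.11, which agrees with the reported RMSE of 0.1101 up to rounding of the MSE.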

---

## 🚀 Usage

### Load Model
```python
import joblib
import xgboost as xgb  # not called directly, but must be installed so joblib can unpickle the model
from sentence_transformers import SentenceTransformer
from huggingface_hub import hf_hub_download

# -----------------------------
# 1. Download model from Hugging Face Hub
# -----------------------------
REPO_ID = "mjpsm/Ubuntu_xgb_model"  # change if you used a different repo name
FILENAME = "Ubuntu_xgb_model.pkl"

model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# -----------------------------
# 2. Load model + embedder
# -----------------------------
model = joblib.load(model_path)
embedder = SentenceTransformer("all-mpnet-base-v2")

# -----------------------------
# 3. Example prediction
# -----------------------------
text = "During our class project, I made sure everyone’s ideas were included."
embedding = embedder.encode([text])
score = model.predict(embedding)[0]

print("Predicted Ubuntu Score:", round(float(score), 3))

```
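
To score several texts at once, the same two calls can be wrapped in a helper. This is a sketch under stated assumptions: `score_texts` is a hypothetical name, it expects objects with the same `encode` and `predict` methods as the `embedder` and `model` loaded above, and the clipping step is an addition here, since a boosted-tree regressor is not guaranteed to keep its output inside 0.0–1.0.

```python
def score_texts(texts, model, embedder):
    """Return one Ubuntu score per text, clipped to the documented 0.0-1.0 range.

    `model` needs a `predict(embeddings)` method and `embedder` an
    `encode(list_of_texts)` method, as in the loading example.
    """
    embeddings = embedder.encode(list(texts))
    raw_scores = model.predict(embeddings)
    # Clipping is an assumption of this sketch, not part of the model itself.
    return [min(1.0, max(0.0, float(s))) for s in raw_scores]
```

Assuming the objects from the loading example are in scope, `score_texts(["We shared the work equally.", "I kept the credit for myself."], model, embedder)` returns a list with two floats.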

## 🌍 Applications

- Community storytelling evaluation
- Character alignment in cultural narratives
- AI assistants tuned to Afrocentric archetypes
- Training downstream models in the Soulprint system

## ⚠️ Limitations

- The dataset is synthetic (generated and curated), so performance on real-world text should be validated before use.
- The model is specific to Ubuntu values and may not generalize beyond an Afrocentric cultural framing.
- Scores are approximate indicators; their interpretation depends on the narrative context.