---
language: en
license: mit
tags:
- regression
- soulprint
- ubuntu
- xgboost
- culturally-rooted
model-index:
- name: Ubuntu_xgb_model
  results:
  - task:
      type: regression
      name: Ubuntu Regression
    dataset:
      name: Ubuntu-regression_data.jsonl
      type: synthetic
    metrics:
    - type: mse
      value: 0.0121
    - type: rmse
      value: 0.1101
    - type: r2
      value: 0.8817
---
# Ubuntu Regression Model (Soulprint Archetype)

## Overview
The Ubuntu_xgb_model is part of the Soulprint archetype family of models.
It predicts an Ubuntu alignment score (0.0 to 1.0) for text inputs, where Ubuntu represents "I am because we are": harmony, inclusion, and community bridge-building.

- 0.0 to 0.3 → Low Ubuntu (exclusion, selfishness, division)
- 0.4 to 0.7 → Medium Ubuntu (partial inclusion, effort but incomplete)
- 0.8 to 1.0 → High Ubuntu (harmony, belonging, collective well-being)
The model was trained with XGBoost regression on a custom dataset of 918 rows, balanced across Low, Medium, and High Ubuntu examples. The data was generated using culturally diverse contexts (family, school, workplace, community, cultural rituals).
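The score bands above can be turned into a small post-processing helper. This is an illustrative sketch, not part of the released model: the function name `ubuntu_band` and the handling of the unassigned gaps between bands (e.g. 0.3 to 0.4, resolved here with simple `< 0.4` / `< 0.8` cut-offs) are assumptions.

```python
def ubuntu_band(score: float) -> str:
    """Map a predicted Ubuntu score to its qualitative band.

    Cut-offs follow the ranges in this model card; the treatment of
    the in-between values (0.3-0.4, 0.7-0.8) is an assumption.
    """
    score = max(0.0, min(1.0, score))  # clamp to the model's output range
    if score < 0.4:
        return "Low Ubuntu"
    if score < 0.8:
        return "Medium Ubuntu"
    return "High Ubuntu"

print(ubuntu_band(0.91))  # High Ubuntu
```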
## Training Details

- Framework: Python 3, scikit-learn, XGBoost
- Embeddings: SentenceTransformer `all-mpnet-base-v2`
- Algorithm: XGBRegressor
- Training size: 918 rows
- Train/test split: 80/20
## Hyperparameters

- `n_estimators=300`
- `learning_rate=0.05`
- `max_depth=6`
- `subsample=0.8`
- `colsample_bytree=0.8`
- `random_state=42`
## Evaluation Results

On the held-out test set (20% of the data):

- MSE: 0.0121
- RMSE: 0.1101
- R² Score: 0.882
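As a sanity check, RMSE is the square root of MSE (√0.0121 ≈ 0.11, matching the figures above). The sketch below computes all three metrics from scratch; the `y_true`/`y_pred` values are hypothetical and only illustrate the formulas.

```python
import math

# Hypothetical true scores and predictions, for illustration only.
y_true = [0.10, 0.55, 0.90, 0.30, 0.75]
y_pred = [0.15, 0.50, 0.85, 0.40, 0.70]

n = len(y_true)
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
rmse = math.sqrt(mse)

# R² = 1 - (residual sum of squares / total sum of squares)
mean_y = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_y) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  R2={r2:.4f}")
```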
## Usage

### Load Model

```python
import joblib
from sentence_transformers import SentenceTransformer
from huggingface_hub import hf_hub_download

# -----------------------------
# 1. Download model from Hugging Face Hub
# -----------------------------
REPO_ID = "mjpsm/Ubuntu_xgb_model"  # change if you used a different repo name
FILENAME = "Ubuntu_xgb_model.pkl"

model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# -----------------------------
# 2. Load model + embedder
# -----------------------------
model = joblib.load(model_path)  # xgboost must be installed to unpickle the regressor
embedder = SentenceTransformer("all-mpnet-base-v2")

# -----------------------------
# 3. Example prediction
# -----------------------------
text = "During our class project, I made sure everyone's ideas were included."
embedding = embedder.encode([text])
score = model.predict(embedding)[0]
print("Predicted Ubuntu Score:", round(float(score), 3))
```
## Applications

- Community storytelling evaluation
- Character alignment in cultural narratives
- AI assistants tuned to Afrocentric archetypes
- Training downstream models in the Soulprint system
## Limitations

- The dataset is synthetic (generated and curated); real-world generalization should be validated.
- The model is context-specific to Ubuntu values and may not generalize beyond Afrocentric cultural framing.
- Scores are approximate indicators; interpretation depends on narrative context.