---
language: en
license: mit
tags:
- regression
- soulprint
- ubuntu
- xgboost
- culturally-rooted
model-index:
- name: Ubuntu_xgb_model
  results:
  - task:
      type: regression
      name: Ubuntu Regression
    dataset:
      name: Ubuntu-regression_data.jsonl
      type: synthetic
    metrics:
    - type: mse
      value: 0.0121
    - type: rmse
      value: 0.1101
    - type: r2
      value: 0.8817
---
# Ubuntu Regression Model (Soulprint Archetype)

## Overview
The Ubuntu_xgb_model is part of the Soulprint archetype family of models.
It predicts an Ubuntu alignment score (0.0 to 1.0) for text inputs, where Ubuntu represents "I am because we are": harmony, inclusion, and community bridge-building.

- 0.0 to 0.3 → Low Ubuntu (exclusion, selfishness, division)
- 0.4 to 0.7 → Medium Ubuntu (partial inclusion, effort but incomplete)
- 0.8 to 1.0 → High Ubuntu (harmony, belonging, collective well-being)
The model was trained with XGBoost regression on a custom dataset of 918 rows, balanced across Low, Medium, and High Ubuntu examples. The data was generated using culturally diverse contexts (family, school, workplace, community, cultural rituals).
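The score bands above can be turned into a small post-processing helper. This is an illustrative sketch, not part of the released model: the function name `ubuntu_band` and the handling of the unassigned gaps between bands (e.g. 0.3 to 0.4, resolved here with simple `< 0.4` / `< 0.8` cut-offs) are assumptions.

```python
def ubuntu_band(score: float) -> str:
    """Map a predicted Ubuntu score to its qualitative band.

    Cut-offs follow the ranges in this model card; the treatment of
    the in-between values (0.3-0.4, 0.7-0.8) is an assumption.
    """
    score = max(0.0, min(1.0, score))  # clamp to the model's output range
    if score < 0.4:
        return "Low Ubuntu"
    if score < 0.8:
        return "Medium Ubuntu"
    return "High Ubuntu"

print(ubuntu_band(0.91))  # High Ubuntu
```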
## Training Details

- Framework: Python 3, scikit-learn, XGBoost
- Embeddings: SentenceTransformer `all-mpnet-base-v2`
- Algorithm: XGBRegressor
- Training size: 918 rows
- Train/test split: 80/20
## Hyperparameters

- `n_estimators=300`
- `learning_rate=0.05`
- `max_depth=6`
- `subsample=0.8`
- `colsample_bytree=0.8`
- `random_state=42`
## Evaluation Results

On the held-out test set (20% of the data):

- MSE: 0.0121
- RMSE: 0.1101
- R² Score: 0.882
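As a sanity check, RMSE is the square root of MSE (√0.0121 ≈ 0.11, matching the figures above). The sketch below computes all three metrics from scratch; the `y_true`/`y_pred` values are hypothetical and only illustrate the formulas.

```python
import math

# Hypothetical true scores and predictions, for illustration only.
y_true = [0.10, 0.55, 0.90, 0.30, 0.75]
y_pred = [0.15, 0.50, 0.85, 0.40, 0.70]

n = len(y_true)
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
rmse = math.sqrt(mse)

# R² = 1 - (residual sum of squares / total sum of squares)
mean_y = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_y) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  R2={r2:.4f}")
```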
## Usage

### Load Model

```python
import joblib
from sentence_transformers import SentenceTransformer
from huggingface_hub import hf_hub_download

# -----------------------------
# 1. Download model from Hugging Face Hub
# -----------------------------
REPO_ID = "mjpsm/Ubuntu_xgb_model"  # change if you used a different repo name
FILENAME = "Ubuntu_xgb_model.pkl"

model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# -----------------------------
# 2. Load model + embedder
# -----------------------------
model = joblib.load(model_path)  # xgboost must be installed to unpickle the regressor
embedder = SentenceTransformer("all-mpnet-base-v2")

# -----------------------------
# 3. Example prediction
# -----------------------------
text = "During our class project, I made sure everyone's ideas were included."
embedding = embedder.encode([text])
score = model.predict(embedding)[0]
print("Predicted Ubuntu Score:", round(float(score), 3))
```
## Applications

- Community storytelling evaluation
- Character alignment in cultural narratives
- AI assistants tuned to Afrocentric archetypes
- Training downstream models in the Soulprint system
## Limitations

- The dataset is synthetic (generated and curated); real-world generalization should be validated.
- The model is context-specific to Ubuntu values and may not generalize beyond Afrocentric cultural framing.
- Scores are approximate indicators; interpretation depends on narrative context.