---
language: en
license: mit
tags:
- regression
- soulprint
- ubuntu
- xgboost
- culturally-rooted
model-index:
- name: Ubuntu_xgb_model
  results:
  - task:
      type: regression
      name: Ubuntu Regression
    dataset:
      name: Ubuntu-regression_data.jsonl
      type: synthetic
    metrics:
    - type: mse
      value: 0.0121
    - type: rmse
      value: 0.1101
    - type: r2
      value: 0.8817
---

# Ubuntu Regression Model (Soulprint Archetype)

## 🧩 Overview

The **Ubuntu_xgb_model** is part of the Soulprint archetype family of models. It predicts an **Ubuntu alignment score (0.0–1.0)** for text inputs, where Ubuntu represents *"I am because we are"*: harmony, inclusion, and community bridge-building.

- **0.0–0.3 → Low Ubuntu** (exclusion, selfishness, division)
- **0.4–0.7 → Medium Ubuntu** (partial inclusion, effort but incomplete)
- **0.8–1.0 → High Ubuntu** (harmony, belonging, collective well-being)

This model is trained with **XGBoost regression** on a custom dataset of **918 rows**, balanced across Low, Medium, and High Ubuntu examples. Data was generated using culturally diverse contexts (family, school, workplace, community, cultural rituals).

---

## 📊 Training Details

- **Framework:** Python 3, scikit-learn, XGBoost
- **Embeddings:** SentenceTransformer `"all-mpnet-base-v2"`
- **Algorithm:** `XGBRegressor`
- **Training Size:** 918 rows
- **Train/Test Split:** 80/20

### ⚙️ Hyperparameters

- `n_estimators=300`
- `learning_rate=0.05`
- `max_depth=6`
- `subsample=0.8`
- `colsample_bytree=0.8`
- `random_state=42`

---

## 📈 Evaluation Results

On the held-out test set (20% of data):

- **MSE:** 0.0121
- **RMSE:** 0.1101
- **R² Score:** 0.882

---

## 🚀 Usage

### Load Model

```python
import joblib
import xgboost as xgb
from sentence_transformers import SentenceTransformer
from huggingface_hub import hf_hub_download

# -----------------------------
# 1. Download model from Hugging Face Hub
# -----------------------------
REPO_ID = "mjpsm/Ubuntu_xgb_model"  # change if you used a different repo name
FILENAME = "Ubuntu_xgb_model.pkl"

model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# -----------------------------
# 2. Load model + embedder
# -----------------------------
model = joblib.load(model_path)
embedder = SentenceTransformer("all-mpnet-base-v2")

# -----------------------------
# 3. Example prediction
# -----------------------------
text = "During our class project, I made sure everyone’s ideas were included."
embedding = embedder.encode([text])
score = model.predict(embedding)[0]

print("Predicted Ubuntu Score:", round(float(score), 3))
```

## 🌍 Applications

- Community storytelling evaluation
- Character alignment in cultural narratives
- AI assistants tuned to Afrocentric archetypes
- Training downstream models in the Soulprint system

## ⚠️ Limitations

- Dataset is synthetic (generated + curated). Real-world generalization should be validated.
- The model is context-specific to Ubuntu values and may not generalize beyond Afrocentric cultural framing.
- Scores are approximate indicators; interpretation depends on narrative context.
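
The score bands listed in the Overview can be turned into labels with a small helper. The following is a minimal sketch and not part of the released model: the names `BANDS`, `clip_score`, and `ubuntu_band` are hypothetical. Two behaviours are assumptions, since the card does not specify them: raw predictions outside 0.0–1.0 are clamped into that range, and scores that fall in the gaps between the documented bands (0.3–0.4 and 0.7–0.8) are reported as "Between bands".

```python
# Band edges copied from the Overview section. The gaps between bands
# (0.3-0.4 and 0.7-0.8) are not defined by the model card.
BANDS = (
    (0.0, 0.3, "Low Ubuntu"),
    (0.4, 0.7, "Medium Ubuntu"),
    (0.8, 1.0, "High Ubuntu"),
)


def clip_score(score: float) -> float:
    """Clamp a raw regressor output into the documented 0.0-1.0 range.

    Assumption: clamping is one reasonable choice; the card does not say
    how out-of-range predictions should be handled.
    """
    return max(0.0, min(1.0, float(score)))


def ubuntu_band(score: float) -> str:
    """Return the band label for a score, or 'Between bands' for the gaps."""
    clipped = clip_score(score)
    for low, high, label in BANDS:
        if low <= clipped <= high:
            return label
    # Assumption: the card assigns no label to these values.
    return "Between bands"


if __name__ == "__main__":
    for example in (0.12, 0.35, 0.55, 0.91, 1.07):
        print(f"{example:.2f} -> {ubuntu_band(example)}")
```

The `score` produced by the usage snippet could be passed straight to `ubuntu_band(score)`. Given the reported RMSE of about 0.11, labels for scores near a band edge should be read with caution.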