Upload folder using huggingface_hub

Files changed (12)

README.md +203 -3
class_mapping.json +38 -0
class_mapping_single.json +38 -0
config.json +27 -0
feature_normalizer.json +222 -0
feature_selector.json +113 -0
mlp_weights.pt +3 -0
mlp_weights_multitask.pt +3 -0
mlp_weights_single.pt +3 -0
tfidf_vectorizer.json +0 -0
tfidf_vectorizer_multitask.json +0 -0
tfidf_vectorizer_single.json +0 -0

README.md CHANGED

@@ -1,3 +1,203 @@
----
-license: gpl-3.0
----

+---
+language: en
+license: mit
+tags:
+- mbti
+- personality-classification
+- text-classification
+- multi-task-learning
+- rust
+datasets:
+- mbti-kaggle
+metrics:
+- accuracy
+model-index:
+- name: psycial-mbti-multitask
+  results:
+  - task:
+      type: text-classification
+      name: MBTI Personality Classification
+    dataset:
+      name: MBTI Kaggle Dataset
+      type: mbti-kaggle
+    metrics:
+    - type: accuracy
+      value: 49.80
+      name: Overall Test Accuracy
+    - type: accuracy
+      value: 80.58
+      name: E/I Accuracy
+    - type: accuracy
+      value: 87.15
+      name: S/N Accuracy
+    - type: accuracy
+      value: 81.90
+      name: T/F Accuracy
+    - type: accuracy
+      value: 75.33
+      name: J/P Accuracy
+---
+# MBTI Personality Classifier - Multi-Task Model
+Production-grade MBTI (Myers-Briggs Type Indicator) personality classification model implemented in Rust with GPU acceleration.
+## Model Description
+This model predicts an MBTI personality type from text with a multi-task approach that trains one binary classifier for each of the four MBTI dimensions:
+- **E/I**: Extraversion vs Introversion
+- **S/N**: Sensing vs Intuition
+- **T/F**: Thinking vs Feeling
+- **J/P**: Judging vs Perceiving
+### Architecture
+- **Input Features**: 5384 dimensions
+  - 5000 TF-IDF features (top words from vocabulary)
+  - 384-dimensional sentence embedding (sentence-transformers/all-MiniLM-L6-v2)
+- **Network**: 4-layer deep MLP
+  - Architecture: 5384 → [1024, 768, 512, 256] → 4×2 outputs
+  - Dropout: 0.5 (disabled during inference)
+  - Optimizer: Adam (lr=0.001)
+- **Training**: Per-dimension epochs with weighted loss
+  - E/I: 30 epochs, weight=1.2
+  - S/N: 30 epochs, weight=1.0
+  - T/F: 25 epochs, weight=1.0
+  - J/P: 30 epochs, weight=1.3
+## Performance
+### Overall Metrics
+| Metric | Value |
+|--------|-------|
+| **Test Accuracy** | **49.80%** |
+| vs Random Baseline (6.25%) | 8.0x better |
+| vs TF-IDF Baseline (21.73%) | +129.2% improvement |
+| Training Time | ~50 seconds (GPU) |
+### Per-Dimension Accuracy
+| Dimension | Accuracy | Correct / Total |
+|-----------|----------|---------|
+| E/I | 80.58% | 1398/1735 |
+| S/N | 87.15% | 1512/1735 |
+| T/F | 81.90% | 1421/1735 |
+| J/P | 75.33% | 1307/1735 |
+## Training Data
+- **Dataset**: MBTI Kaggle Dataset (8675 samples)
+- **Split**: 80% train (6940), 20% test (1735)
+- **Classes**: 16 MBTI types (INTJ, ENFP, etc.)
+- **Class Distribution**:
+  - I: 77%, E: 23% (imbalanced)
+  - N: 86%, S: 14% (highly imbalanced)
+  - F: 54%, T: 46% (balanced)
+  - P: 60%, J: 40% (moderately imbalanced)
+## Usage
+### Requirements
+- Rust 1.70+
+- PyTorch/libtorch (for tch-rs bindings)
+- CUDA (optional, for GPU acceleration)
+### Installation
+```bash
+git clone https://github.com/your-username/psycial
+cd psycial
+# Set up environment
+conda create -n psycial python=3.10
+conda activate psycial
+conda install pytorch
+# Build
+export LIBTORCH_USE_PYTORCH=1
+export LIBTORCH_BYPASS_VERSION_CHECK=1
+cargo build --release
+```
+### Training and Inference
+```bash
+# Train model
+./target/release/psycial hybrid train --multi-task
+# Predict single text
+./target/release/psycial hybrid predict "I love solving complex problems and thinking deeply about abstract concepts."
+```
+### Programmatic Usage (Rust)
+```rust
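+use psycial::hybrid::predict::predict_single;
+// `?` needs a fallible context; this assumes the error converts into Box<dyn Error>.
+fn main() -> Result<(), Box<dyn std::error::Error>> {
+    predict_single("Your text here")?; // load model and predict
+    Ok(())
+}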
+```
+## Model Files
+This model repository includes:
+1. **mlp_weights_multitask.pt** (27MB) - Neural network weights
+2. **tfidf_vectorizer_multitask.json** (213KB) - TF-IDF vocabulary and IDF weights
+3. **feature_normalizer.json** (5KB) - Feature normalization parameters (optional)
+4. **feature_selector.json** (1KB) - Pearson feature selector indices (optional)
+## Limitations
+1. **Moderate Accuracy**: 49.80% overall accuracy is well above the random (6.25%) and TF-IDF (21.73%) baselines, but about half of the test predictions are still wrong
+2. **Class Imbalance**: Model may favor majority classes (I, N, F, P)
+3. **Data Bias**: Trained on online forum posts, so it may not generalize to other kinds of text
+4. **Language**: English only
+5. **MBTI Validity**: MBTI itself has limited scientific validity
+## Ethical Considerations
+- MBTI is not scientifically validated for hiring/clinical decisions
+- Predictions should be used for entertainment/research only
+- Be aware of class imbalances and potential biases
+- Do not use for discriminatory purposes
+## Technical Details
+### Framework
+- **Language**: Rust
+- **ML Framework**: tch-rs (PyTorch bindings)
+- **BERT Model**: sentence-transformers/all-MiniLM-L6-v2
+- **Device**: CUDA-capable GPU (falls back to CPU)
+### Key Innovations
+1. **Dropout Bug Fix**: Found and fixed a critical bug in which dropout stayed active during inference, causing ~5% accuracy loss
+2. **Per-Dimension Optimization**: Different epochs and loss weights for each MBTI dimension
+3. **Multi-Task Learning**: Four independent binary classifiers instead of 16-way classification
+## Citation
+If you use this model, please cite:
+```bibtex
+@software{psycial_mbti_2025,
+  title = {Polyjuice: Multi-Task MBTI Personality Classifier},
+  author = {lderRyan},
+  year = {2025},
+  url = {https://huggingface.co/lderRyan/polyjuice}
+}
+```
+## License
+MIT License - See LICENSE file for details
+## Acknowledgments
+- **rust-bert**: Guillaume Becquin (guillaume-be) for the Rust BERT implementation
+- **Dataset**: MBTI Kaggle Dataset
+- **sentence-transformers**: all-MiniLM-L6-v2 model
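
Note on the README's performance tables: the percentages can be recomputed from the correct/total counts it lists. The sketch below does that arithmetic in plain Rust. The function names are illustrative and not part of the psycial crate, and the last function assumes that "overall" accuracy means all four dimensions were predicted correctly, which the README does not state explicitly.

```rust
/// Accuracy in percent from correct/total counts.
fn accuracy_pct(correct: u32, total: u32) -> f64 {
    100.0 * f64::from(correct) / f64::from(total)
}

/// Relative improvement of `model` over `baseline`, in percent.
fn relative_improvement_pct(model: f64, baseline: f64) -> f64 {
    100.0 * (model - baseline) / baseline
}

/// Correct/total counts copied from the README's per-dimension table.
const PER_DIMENSION: [(&str, u32, u32); 4] = [
    ("E/I", 1398, 1735),
    ("S/N", 1512, 1735),
    ("T/F", 1421, 1735),
    ("J/P", 1307, 1735),
];

/// Exact-type accuracy (percent) that would result if the four
/// per-dimension errors were statistically independent.
fn independent_product_pct() -> f64 {
    let product: f64 = PER_DIMENSION
        .iter()
        .map(|&(_, correct, total)| f64::from(correct) / f64::from(total))
        .product();
    100.0 * product
}
```

Under that assumption the independence product comes out near 43%, below the reported 49.80%, which would suggest that errors on the four dimensions tend to occur together on the same samples.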

class_mapping.json ADDED

	@@ -0,0 +1,38 @@

+{
+  "class_to_idx": {
+    "ENFJ": 0,
+    "ENFP": 1,
+    "ENTJ": 2,
+    "ENTP": 3,
+    "ESFJ": 4,
+    "ESFP": 5,
+    "ESTJ": 6,
+    "ESTP": 7,
+    "INFJ": 8,
+    "INFP": 9,
+    "INTJ": 10,
+    "INTP": 11,
+    "ISFJ": 12,
+    "ISFP": 13,
+    "ISTJ": 14,
+    "ISTP": 15
+  },
+  "idx_to_class": {
+    "0": "ENFJ",
+    "1": "ENFP",
+    "10": "INTJ",
+    "11": "INTP",
+    "12": "ISFJ",
+    "13": "ISFP",
+    "14": "ISTJ",
+    "15": "ISTP",
+    "2": "ENTJ",
+    "3": "ENTP",
+    "4": "ESFJ",
+    "5": "ESFP",
+    "6": "ESTJ",
+    "7": "ESTP",
+    "8": "INFJ",
+    "9": "INFP"
+  }
+}
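
Note on this mapping: the README describes four two-way outputs, while this file indexes the 16 four-letter types. A minimal sketch of how the two could be connected is shown below. It assumes that output index 0 of each pair means the first letter (E, S, T, J) and index 1 the second. That ordering is a guess, and `decode_type` and `class_index` are hypothetical helpers, not functions from the repository.

```rust
/// Letter pairs per dimension. Which logit maps to which letter is an assumption.
const LETTERS: [[char; 2]; 4] = [['E', 'I'], ['S', 'N'], ['T', 'F'], ['J', 'P']];

/// The 16 types in the order given by `class_to_idx` in class_mapping.json.
const CLASSES: [&str; 16] = [
    "ENFJ", "ENFP", "ENTJ", "ENTP", "ESFJ", "ESFP", "ESTJ", "ESTP",
    "INFJ", "INFP", "INTJ", "INTP", "ISFJ", "ISFP", "ISTJ", "ISTP",
];

/// Pick the larger logit in each pair and join the four letters.
fn decode_type(logits: &[[f32; 2]; 4]) -> String {
    logits
        .iter()
        .zip(LETTERS.iter())
        .map(|(pair, letters)| if pair[1] > pair[0] { letters[1] } else { letters[0] })
        .collect()
}

/// Look up the class index of a four-letter type, if it is one of the 16.
fn class_index(mbti: &str) -> Option<usize> {
    CLASSES.iter().position(|&c| c == mbti)
}
```

The keys of `idx_to_class` are strings and appear in string-sorted order ("0", "1", "10", ...), so a reader of this file should look entries up by key rather than by position.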

class_mapping_single.json ADDED

	@@ -0,0 +1,38 @@

+{
+  "class_to_idx": {
+    "ENFJ": 0,
+    "ENFP": 1,
+    "ENTJ": 2,
+    "ENTP": 3,
+    "ESFJ": 4,
+    "ESFP": 5,
+    "ESTJ": 6,
+    "ESTP": 7,
+    "INFJ": 8,
+    "INFP": 9,
+    "INTJ": 10,
+    "INTP": 11,
+    "ISFJ": 12,
+    "ISFP": 13,
+    "ISTJ": 14,
+    "ISTP": 15
+  },
+  "idx_to_class": {
+    "0": "ENFJ",
+    "1": "ENFP",
+    "10": "INTJ",
+    "11": "INTP",
+    "12": "ISFJ",
+    "13": "ISFP",
+    "14": "ISTJ",
+    "15": "ISTP",
+    "2": "ENTJ",
+    "3": "ENTP",
+    "4": "ESFJ",
+    "5": "ESFP",
+    "6": "ESTJ",
+    "7": "ESTP",
+    "8": "INFJ",
+    "9": "INFP"
+  }
+}

config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "model_type": "multi-task-mlp",
+  "task": "mbti-classification",
+  "num_classes": 16,
+  "dimensions": [
+    "E/I",
+    "S/N",
+    "T/F",
+    "J/P"
+  ],
+  "input_features": 5384,
+  "architecture": [
+    1024,
+    768,
+    512,
+    256
+  ],
+  "accuracy": {
+    "overall": 49.8,
+    "e_i": 80.58,
+    "s_n": 87.15,
+    "t_f": 81.9,
+    "j_p": 75.33
+  },
+  "framework": "rust-tch",
+  "language": "en"
+}
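
Note on the four dimensions listed in this config: the README gives each one its own epoch budget and loss weight but does not say how they are combined. The sketch below shows one possible rule, a weighted sum of two-class cross-entropies in which a dimension stops contributing once its epoch budget is used up. The combination rule and both function names are assumptions made for illustration; only the weights and epoch counts come from the README.

```rust
/// Loss weights for E/I, S/N, T/F, J/P as listed in the README.
const LOSS_WEIGHTS: [f64; 4] = [1.2, 1.0, 1.0, 1.3];
/// Epoch budgets for E/I, S/N, T/F, J/P as listed in the README.
const EPOCHS: [usize; 4] = [30, 30, 25, 30];

/// Cross-entropy of a two-class softmax against the true class `target` (0 or 1).
fn two_class_cross_entropy(logits: [f64; 2], target: usize) -> f64 {
    // Subtract the max before exponentiating for numerical stability.
    let m = logits[0].max(logits[1]);
    let log_sum_exp = m + ((logits[0] - m).exp() + (logits[1] - m).exp()).ln();
    log_sum_exp - logits[target]
}

/// Hypothetical combination rule: weighted sum over the dimensions whose
/// epoch budget has not been exhausted at the (0-based) `epoch`.
fn weighted_loss(logits: &[[f64; 2]; 4], targets: &[usize; 4], epoch: usize) -> f64 {
    (0..4)
        .filter(|&d| epoch < EPOCHS[d])
        .map(|d| LOSS_WEIGHTS[d] * two_class_cross_entropy(logits[d], targets[d]))
        .sum()
}
```

With this rule, the higher weights on E/I and J/P make mistakes on those dimensions cost more, and T/F drops out of the sum after epoch 25.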

feature_normalizer.json ADDED

	@@ -0,0 +1,222 @@

+{
+  "means": [
+    0.11892745237758537,
+    0.05718321435078807,
+    0.08903009701198665,
+    0.05311462616045365,
+    0.10074856229995091,
+    0.005238788351734905,
+    0.1180565619501122,
+    0.05558708917655197,
+    0.031797216820838296,
+    0.025987183053747098,
+    0.923267019267142,
+    0.032976511489937715,
+    1.0692035554006571,
+    0.04727388342378716,
+    0.12665535318825866,
+    -0.10929009269139744,
+    0.17288452633307436,
+    0.047284178704204746,
+    0.09925854481169885,
+    2.8348703170028817,
+    0.4242427405525002,
+    0.6173349876551278,
+    0.08227528755008601,
+    0.038406755473586636,
+    0.0899492856840066,
+    0.04214351200078721,
+    0.08220681709204476,
+    7.903036887559477,
+    2.092115911872459,
+    1.5237228256225628,
+    0.0064792351086982135,
+    0.0062672765514501195,
+    0.008339967538536535,
+    0.0031623437761733658,
+    0.005632692180832851,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    0.0,
+    2.2659337849287593,
+    1.044037763138872,
+    2.479041585729291,
+    1.1919732148401274,
+    6.907797511424016,
+    0.20183860256393302,
+    1.3890878243731999,
+    2.674333824093939,
+    12.975266745390156,
+    0.6689204055239117,
+    0.13398951634897222,
+    125.31657060518732,
+    10.990893262551307,
+    8.618075627427794,
+    0.039010665253703836,
+    2.5448203222323675,
+    74.92319884726224,
+    0.335258863617196,
+    0.10991749021008423,
+    0.023568500337880372,
+    0.43417112277016273,
+    0.8410876756504063,
+    2.131120421546073,
+    0.006522501353850472,
+    2.3742059106274946,
+    15.0585619143013,
+    1.7338250157090132,
+    46.82445812913767,
+    0.2229548365349718,
+    0.5657628928051207,
+    0.5608774940650731,
+    0.02932614561269527,
+    4.250217510635379,
+    0.8674383811441574,
+    3449.8171163739294,
+    18.427569571055827,
+    20.2482088181966,
+    10.554981450928512,
+    7.313789687435963,
+    5.710843411129353,
+    3.517305492116866,
+    2.4741733752721022,
+    1.6224146200739884,
+    0.9174680975210207,
+    0.4181542340068293,
+    0.2537107680799646,
+    0.06454837319169232,
+    0.04664554518020532,
+    3.588184438040346,
+    4.536167146974063,
+    0.034829471123004815
+  ],
+  "stds": [
+    0.16222398426461307,
+    0.09092354715482365,
+    0.11179568665118766,
+    0.08312728942311748,
+    0.1126019164924034,
+    0.02269594928598018,
+    0.1251666937390458,
+    0.10924194954707005,
+    0.05872550850917025,
+    0.0633645799292151,
+    0.42475437786398945,
+    0.05674599679544609,
+    0.42017963662050706,
+    0.13162506345234012,
+    0.13047416627398492,
+    0.1334994167431149,
+    0.1354726430001486,
+    0.10387263332800313,
+    0.13281477052945212,
+    1.2523615377448563,
+    0.2860078155175164,
+    1.0323647098520268,
+    0.11874272981883065,
+    0.0737292652829932,
+    0.1250957943956574,
+    0.07920780206910542,
+    0.05964781374122256,
+    1.9770418991295584,
+    1.0051282322538722,
+    0.8070361908342224,
+    0.018907235885132503,
+    0.017544468539039473,
+    0.026064837088552832,
+    0.011571376284607058,
+    0.015007920642550205,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    1.0,
+    0.5586922378464478,
+    0.3972695963551329,
+    0.6755919465210544,
+    0.41404904091977174,
+    1.140085237740577,
+    0.06711088335149644,
+    0.49855911829224,
+    0.7313228888125592,
+    2.085136928326834,
+    0.3134181772934974,
+    0.12607994728745905,
+    32.108918326188885,
+    3.1681212101343434,
+    2.954850038906221,
+    0.015510895435743266,
+    0.3538567879733598,
+    27.298478451929025,
+    0.04154412039723594,
+    0.10792443827290753,
+    0.047089307032740446,
+    0.2654527395327808,
+    0.2186173461677225,
+    0.5459354574322072,
+    0.024801460473129467,
+    0.17411130220169632,
+    2.271383011432748,
+    0.6518006121681411,
+    2.349981541194138,
+    0.18364740637981875,
+    0.2929219359188755,
+    0.8111240117428083,
+    0.06061569013629006,
+    3.174016329540834,
+    0.044197254841717507,
+    1112.3845909439165,
+    1.8178255230687432,
+    1.9076853556265896,
+    1.4599790880963919,
+    1.0092494794625304,
+    1.0606798564631996,
+    0.7720238147404723,
+    0.6463716717065361,
+    0.5350779979831183,
+    0.4089743982652186,
+    0.24577101556951275,
+    0.17584871595800494,
+    0.08405248047643799,
+    0.07606227100268607,
+    2.4320456375803654,
+    2.634615558689861,
+    0.05362220775257933
+  ]
+}
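
Note on this file: it stores 108 `means` and 108 `stds`, which suggests per-feature z-score normalization. A minimal sketch of that operation follows. It assumes the normalizer computes `(x - mean) / std`, which the commit does not confirm, and `normalize` is a hypothetical name.

```rust
/// Z-score each feature with its stored mean and standard deviation.
/// Returns None when the three slices differ in length.
fn normalize(features: &[f64], means: &[f64], stds: &[f64]) -> Option<Vec<f64>> {
    if features.len() != means.len() || features.len() != stds.len() {
        return None;
    }
    Some(
        features
            .iter()
            .zip(means)
            .zip(stds)
            // Guard against a zero std so the result never becomes NaN or infinite.
            .map(|((&x, &m), &s)| if s > 0.0 { (x - m) / s } else { 0.0 })
            .collect(),
    )
}
```

The 22 entries with mean 0.0 and std 1.0 would pass through such a transform unchanged. One plausible reading is that those columns were constant in the training data and were given a unit std to avoid division by zero.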

feature_selector.json ADDED

	@@ -0,0 +1,113 @@

+{
+  "threshold": 0.85,
+  "selected_indices": [
+    0,
+    1,
+    2,
+    3,
+    4,
+    5,
+    6,
+    7,
+    10,
+    13,
+    14,
+    15,
+    16,
+    17,
+    18,
+    19,
+    20,
+    21,
+    22,
+    23,
+    24,
+    25,
+    26,
+    27,
+    28,
+    29,
+    32,
+    33,
+    34,
+    35,
+    36,
+    37,
+    38,
+    39,
+    40,
+    41,
+    47,
+    53,
+    59,
+    65,
+    71,
+    77,
+    83,
+    89,
+    95,
+    101,
+    107,
+    113,
+    119,
+    125,
+    131,
+    137,
+    143,
+    149,
+    155,
+    161,
+    167,
+    271,
+    272,
+    273,
+    274,
+    275,
+    276,
+    277,
+    278,
+    279,
+    282,
+    283,
+    284,
+    285,
+    286,
+    287,
+    288,
+    289,
+    290,
+    440,
+    441,
+    442,
+    443,
+    444,
+    445,
+    448,
+    449,
+    450,
+    451,
+    452,
+    453,
+    454,
+    456,
+    457,
+    460,
+    464,
+    465,
+    466,
+    467,
+    468,
+    469,
+    470,
+    471,
+    472,
+    473,
+    474,
+    475,
+    476,
+    477,
+    478,
+    479,
+    496
+  ]
+}
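
Note on this file: the README calls it a "Pearson feature selector", and the file holds a `threshold` of 0.85 plus 108 `selected_indices` (the largest is 496, so the vector before selection has at least 497 entries). How the threshold is applied is not shown in the commit. The sketch below illustrates one common scheme, a greedy filter that drops a column when its absolute Pearson correlation with an already kept column exceeds the threshold, followed by applying stored indices to a feature vector. All three functions are hypothetical.

```rust
/// Pearson correlation of two equal-length columns; 0.0 if either is constant.
fn pearson(a: &[f64], b: &[f64]) -> f64 {
    let n = a.len() as f64;
    let (ma, mb) = (a.iter().sum::<f64>() / n, b.iter().sum::<f64>() / n);
    let (mut cov, mut va, mut vb) = (0.0, 0.0, 0.0);
    for (x, y) in a.iter().zip(b) {
        cov += (x - ma) * (y - mb);
        va += (x - ma).powi(2);
        vb += (y - mb).powi(2);
    }
    if va == 0.0 || vb == 0.0 { 0.0 } else { cov / (va * vb).sqrt() }
}

/// Greedy filter: keep a column unless |r| with an already kept column
/// exceeds `threshold`. Returns the indices of the kept columns.
fn select_indices(columns: &[Vec<f64>], threshold: f64) -> Vec<usize> {
    let mut kept: Vec<usize> = Vec::new();
    for (i, col) in columns.iter().enumerate() {
        if kept.iter().all(|&k| pearson(&columns[k], col).abs() <= threshold) {
            kept.push(i);
        }
    }
    kept
}

/// Apply stored indices to one feature vector; None if an index is out of range.
fn apply_selection(features: &[f64], selected: &[usize]) -> Option<Vec<f64>> {
    selected.iter().map(|&i| features.get(i).copied()).collect()
}
```

At inference time only `apply_selection` would be needed, since the indices are already stored in the JSON.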

mlp_weights.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:38c4f45b8865cdac630224b72f6566288790e5e1f9bd2cba38cbf8e9e35b4add
+size 24701783

mlp_weights_multitask.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aeecb0db78953acf0be88c455839a92e06de3f8c16db5f827bfe9f07c805903e
+size 27320451
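
Note on this file size: it can be checked against the architecture in the README (5384 → [1024, 768, 512, 256] → 4×2 outputs). The sketch below counts the parameters of such a stack, assuming plain fully connected layers with biases, 32-bit floats, no other stored tensors, and the four two-way heads treated as a single 256 → 8 layer. None of these details are confirmed by the commit, and `dense_param_count` is an illustrative name.

```rust
/// Parameters (weights plus biases) of a fully connected stack
/// with the given layer widths, input first.
fn dense_param_count(widths: &[u64]) -> u64 {
    widths.windows(2).map(|w| w[0] * w[1] + w[1]).sum()
}

/// Size in bytes if every parameter is stored as a 32-bit float.
fn f32_bytes(params: u64) -> u64 {
    params * 4
}
```

For the widths `[5384, 1024, 768, 512, 256, 8]` this gives 6,828,552 parameters, or 27,314,208 bytes, which is 6,243 bytes less than the 27,320,451 bytes recorded in the LFS pointer. The small remainder would be consistent with serialization overhead, and the match fits a shared trunk feeding four heads.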

mlp_weights_single.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1b3de609d7e6990dac2df96313ca25ab0662f1f57f8d8a1bb88fe62df8ee9549
+size 18663360

tfidf_vectorizer.json ADDED

The diff for this file is too large to render. See raw diff

tfidf_vectorizer_multitask.json ADDED

The diff for this file is too large to render. See raw diff

tfidf_vectorizer_single.json ADDED

The diff for this file is too large to render. See raw diff