4nkh/theme_model

@@ -1,63 +1,43 @@
----
-library_name: transformers
-license: apache-2.0
-base_model: bert-base-uncased
-tags:
-- generated_from_trainer
-model-index:
-- name: theme_model
-  results: []
-datasets:
-- 4nkh/theme_data
----
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# theme_model
-This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.1822
-- Micro/precision: 1.0
-- Micro/recall: 1.0
-- Micro/f1: 1.0
-- Macro/precision: 1.0
-- Macro/recall: 1.0
-- Macro/f1: 1.0
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 8
-- eval_batch_size: 16
-- seed: 42
-- optimizer: Use adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- num_epochs: 5
-### Training results
-### Framework versions
-- Transformers 4.57.3
-- Pytorch 2.8.0
-- Datasets 4.4.2
-- Tokenizers 0.22.2

+# Theme classification model (multi-label)
+This repository contains a fine-tuned BERT model for classifying short texts into community-oriented themes. The model was trained locally and pushed to the Hugging Face Hub.
+## Model details
+- Model architecture: bert-base-uncased (fine-tuned)
+- Problem type: multi-label classification
+- Labels: `mentorship`, `entrepreneurship`, `startup success`
+- Training data: `train_theme.jsonl` (included)
+- Final evaluation (example run):
+  - eval_loss: 0.1822
+  - eval_micro/f1: 1.0
+  - eval_macro/f1: 1.0
+## Usage
+```python
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+# Load the tokenizer and fine-tuned classifier from the Hub.
+repo = "4nkh/theme_model"
+tokenizer = AutoTokenizer.from_pretrained(repo)
+model = AutoModelForSequenceClassification.from_pretrained(repo)
+
+texts = ["Our co-op paired first-time founders with veteran shop owners to troubleshoot setbacks."]
+inputs = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
+
+# Inference only, so gradient tracking is disabled.
+with torch.no_grad():
+    logits = model(**inputs).logits
+
+# Multi-label: apply an independent sigmoid per label, then threshold at 0.5.
+probs = torch.sigmoid(logits)
+preds = (probs >= 0.5).int()
+print("probs", probs.numpy(), "preds", preds.numpy())
+```
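The `preds` tensor in the usage snippet holds one 0/1 flag per label. A minimal sketch of turning those flags into label names follows. It assumes the label order matches the list under Model details, which this card does not confirm; in practice, read the mapping from `model.config.id2label` instead. The helper name `decode_predictions` is hypothetical.

```python
# Assumed label order (hypothetical); prefer model.config.id2label when available.
ID2LABEL = {0: "mentorship", 1: "entrepreneurship", 2: "startup success"}


def decode_predictions(pred_rows, id2label=ID2LABEL):
    """Map rows of 0/1 flags to lists of label names."""
    return [
        [id2label[i] for i, flag in enumerate(row) if flag]
        for row in pred_rows
    ]


# Hand-written flags for illustration; with the model, pass preds.tolist().
print(decode_predictions([[1, 1, 0], [0, 0, 0]]))
# [['mentorship', 'entrepreneurship'], []]
```

An empty list means no label cleared the threshold for that text, which is a valid outcome in multi-label classification.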
+## Notes
+- The usage example applies a sigmoid to the logits and marks a label as predicted when its probability is at least 0.5. Adjust the threshold per class as needed.
+- If you want to re-train or fine-tune further, see `train_theme_model.py` in this folder.
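Per-class thresholds can replace the single 0.5 cutoff. The sketch below is one way to do that on plain Python lists, so it can take `probs.tolist()` from the usage snippet. The function name `apply_thresholds` and the threshold values are hypothetical; real values would need to be tuned on held-out data.

```python
def apply_thresholds(prob_rows, thresholds):
    """Return 0/1 flags, comparing each probability to its class threshold."""
    flags = []
    for row in prob_rows:
        if len(row) != len(thresholds):
            raise ValueError("expected one threshold per class")
        flags.append([int(p >= t) for p, t in zip(row, thresholds)])
    return flags


# Hypothetical thresholds, one per label in the model's output order.
# These numbers are placeholders, not values tuned for this model.
thresholds = [0.5, 0.4, 0.6]

# Hand-written probabilities for illustration; with the model, pass probs.tolist().
print(apply_thresholds([[0.55, 0.45, 0.55]], thresholds))  # [[1, 1, 0]]
```

Lowering a class threshold trades precision for recall on that class, and raising it does the opposite.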
+## License
+Specify your license here (e.g., Apache-2.0) or remove this section if you prefer a different license.