KingTechnician committed · verified · Commit 0a194f3 · Parent: 574d6d7

Upload README.md with huggingface_hub

Files changed (1): README.md added (+169 lines)

---
language: en
license: apache-2.0
tags:
- education
- coverage-assessment
- bert
- regression
- domain-agnostic
- educational-ai
datasets:
- synthetic-educational-conversations
metrics:
- pearson_correlation
- mae
- r_squared
model-index:
- name: BERT Coverage Assessment
  results:
  - task:
      type: regression
      name: Educational Coverage Assessment
    metrics:
    - type: pearson_correlation
      value: 0.865
      name: Pearson Correlation
    - type: r_squared
      value: 0.749
      name: R-squared
    - type: mae
      value: 0.133
      name: Mean Absolute Error
---

# BERT Coverage Assessment Model

🎯 **A domain-agnostic BERT model for assessing educational conversation coverage**

## Model Description

This model fine-tunes BERT for educational coverage assessment, predicting how well a student conversation addresses a given learning objective. It achieves a **0.865 Pearson correlation** with ground-truth coverage assessments, making it suitable for real-time educational applications.

## Key Features

- 🌍 **Domain-agnostic**: works across subjects without retraining
- 📊 **Continuous scoring**: outputs 0.0-1.0 coverage scores
- ⚡ **Real-time capable**: fast inference for live systems
- 🎓 **Research-validated**: exceeds academic benchmarks

## Performance

| Metric | Value |
|--------|-------|
| Pearson Correlation | 0.865 |
| R-squared | 0.749 |
| Mean Absolute Error | 0.133 |
| RMSE | 0.165 |

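These figures can be reproduced from a set of model predictions and reference scores. A minimal sketch using `scipy` and `scikit-learn`; the arrays here are placeholders, not the actual evaluation data:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_absolute_error, r2_score

# Placeholder arrays: substitute the model's predictions and the
# reference coverage scores from your own evaluation set.
y_true = np.array([0.10, 0.40, 0.60, 0.80, 0.95])
y_pred = np.array([0.15, 0.35, 0.58, 0.85, 0.90])

pearson_r, _ = pearsonr(y_true, y_pred)                  # Pearson correlation
r2 = r2_score(y_true, y_pred)                            # R-squared
mae = mean_absolute_error(y_true, y_pred)                # Mean absolute error
rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))   # Root mean squared error

print(f"r={pearson_r:.3f}  R2={r2:.3f}  MAE={mae:.3f}  RMSE={rmse:.3f}")
```
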
## Usage

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BERTCoverageRegressor(nn.Module):
    """BERT encoder with a dropout + linear regression head."""

    def __init__(self, model_name='bert-base-uncased', dropout_rate=0.3):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(dropout_rate)
        self.regressor = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output  # pooled [CLS] representation
        output = self.dropout(pooled_output)
        return self.regressor(output)

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('KingTechnician/bert-osmosis-coverage')
model = BERTCoverageRegressor()

# Load the fine-tuned weights
model_path = "pytorch_model.bin"  # download from the repo
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Make a prediction
def predict_coverage(objective, conversation, max_length=512):
    encoding = tokenizer(
        objective,
        conversation,
        truncation=True,
        padding='max_length',
        max_length=max_length,
        return_tensors='pt'
    )

    with torch.no_grad():
        output = model(encoding['input_ids'], encoding['attention_mask'])
        score = torch.clamp(output.squeeze(), 0.0, 1.0).item()

    return score

# Example usage
objective = "Understand the process of photosynthesis"
conversation = "Student explains light reactions and Calvin cycle with examples..."
coverage_score = predict_coverage(objective, conversation)
print(f"Coverage Score: {coverage_score:.3f}")
```

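The weights file can also be fetched programmatically rather than downloaded by hand. One way, using the `huggingface_hub` client (assuming the weights are stored as `pytorch_model.bin` in this repo, as the snippet above does):

```python
from huggingface_hub import hf_hub_download

# Downloads (and caches) the weights file from the model repo.
model_path = hf_hub_download(
    repo_id="KingTechnician/bert-osmosis-coverage",
    filename="pytorch_model.bin",
)
```
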
## Input Format

The model expects input in the format:

```
[CLS] learning_objective [SEP] student_conversation [SEP]
```

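Passing the two texts to the tokenizer as a sentence pair, as `predict_coverage` above does, produces exactly this layout; decoding the token IDs is a quick way to verify it:

```python
# The tokenizer inserts the [CLS]/[SEP] markers automatically for a pair.
encoding = tokenizer("Understand photosynthesis", "Student explains light reactions")
print(tokenizer.decode(encoding['input_ids']))
# -> [CLS] understand photosynthesis [SEP] student explains light reactions [SEP]
```
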
## Output

Returns a continuous score between 0.0 and 1.0:

- **0.0-0.2**: Minimal coverage
- **0.2-0.4**: Low coverage
- **0.4-0.6**: Moderate coverage
- **0.6-0.8**: High coverage
- **0.8-1.0**: Complete coverage

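A small helper can turn the raw score into these labels. The band edges mirror the list above and are an interpretation aid, not part of the model:

```python
def coverage_label(score: float) -> str:
    """Map a 0.0-1.0 coverage score to a human-readable band."""
    bands = [
        (0.2, "Minimal coverage"),
        (0.4, "Low coverage"),
        (0.6, "Moderate coverage"),
        (0.8, "High coverage"),
    ]
    for upper, label in bands:
        if score <= upper:
            return label
    return "Complete coverage"

print(coverage_label(0.73))  # High coverage
```
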
## Training Data

Trained on synthetic educational conversations across multiple domains:

- Computer Science (algorithms, data structures)
- Statistics (hypothesis testing, regression)
- Multi-domain conversations

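Each training example pairs a learning objective and a conversation with a target coverage score. A hypothetical record in that shape; the field names are illustrative, since the actual dataset schema is not published with this card:

```python
# Illustrative only: field names and values are not from the real dataset.
example = {
    "objective": "Explain the time complexity of binary search",
    "conversation": "Student: Each step halves the search space, so it is O(log n)...",
    "coverage_score": 0.72,  # target in [0.0, 1.0]
}
```
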
## Research Background

This model implements the methodology from research on domain-agnostic educational assessment, achieving significant improvements over traditional similarity-based approaches:

- **269% improvement** over baseline similarity features
- **Domain transfer capability** without retraining
- **Real-time processing** under 100 ms per assessment (see the timing sketch below)

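The sub-100 ms figure depends on hardware, so it is worth measuring on your own machine. A simple latency check reusing `predict_coverage` from the Usage section:

```python
import time

# Warm-up run so one-time allocation costs don't skew the timing.
predict_coverage(objective, conversation)

n_runs = 20
start = time.perf_counter()
for _ in range(n_runs):
    predict_coverage(objective, conversation)
elapsed_ms = (time.perf_counter() - start) / n_runs * 1000
print(f"Average latency: {elapsed_ms:.1f} ms per assessment")
```
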
## Limitations

- Trained primarily on synthetic data (validation on real conversations is recommended)
- Optimized for English-language conversations
- Performance may vary in highly specialized technical domains

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{bert-coverage-assessment,
  title={Domain-Agnostic Coverage Assessment Through BERT Fine-tuning},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/KingTechnician/bert-osmosis-coverage}
}
```

## Contact

For questions or collaborations, please open an issue in the model repository.

---

**Model Type**: Educational AI | **Task**: Coverage Assessment | **Performance**: r = 0.865