---
license: cc-by-4.0
language:
- en
metrics:
- matthews_correlation
- accuracy
base_model:
- state-spaces/mamba-130m
tags:
- text-classification
- nli
- mamba
---

# Model Card for 11128093-11066053-nli

A binary natural language inference (NLI) classifier fine-tuned on the provided COMP34812 dataset using the Mamba state space model.

## Model Details

### Model Description

This model extends the state-spaces/mamba-130m architecture for binary NLI (entailment vs. non-entailment). It adds a custom classification head and was fine-tuned on the COMP34812 NLI dataset.

- **Developed by:** Patrick Mermelstein Lyons and Dev Soneji
- **Language(s):** English
- **Model type:** Supervised
- **Model architecture:** Non-Transformer (selective state spaces)
- **Fine-tuned from model:** state-spaces/mamba-130m

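The card does not spell out the head's architecture, so the following is an illustrative sketch only: a common pattern is to pool the backbone's hidden states and project them to two logits. The hidden size (768), mean pooling, and dropout rate are assumptions, and the Mamba backbone is stubbed with random tensors:

```python
import torch
import torch.nn as nn

class NLIClassificationHead(nn.Module):
    """Hypothetical sketch of a binary NLI head over a sequence encoder.
    The real model uses state-spaces/mamba-130m as the backbone; here the
    backbone is stubbed so only the head's wiring is shown."""

    def __init__(self, hidden_size: int = 768, num_labels: int = 2):
        super().__init__()
        self.dropout = nn.Dropout(0.1)  # assumed regularisation
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Mean-pool over the sequence dimension, then project to two
        # logits (entailment vs. non-entailment).
        pooled = hidden_states.mean(dim=1)
        return self.classifier(self.dropout(pooled))

head = NLIClassificationHead()
fake_backbone_output = torch.randn(4, 128, 768)  # (batch, seq_len, hidden)
logits = head(fake_backbone_output)              # shape (4, 2)
```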
### Model Resources

- **Repository:** https://huggingface.co/state-spaces/mamba-130m
- **Paper or documentation:** https://arxiv.org/pdf/2312.00752.pdf

## Training Details

### Training Data

The COMP34812 NLI training dataset (a closed-source, task-specific dataset): 24.4K premise-hypothesis pairs, each with a binary entailment label.

### Training Procedure

#### Training Hyperparameters

- learning_rate: 5e-5
- train_batch_size: 4
- eval_batch_size: 16
- num_train_epochs: 5
- lr_scheduler_type: cosine
- warmup_ratio: 0.1

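Together, the scheduler settings mean the learning rate climbs linearly from 0 to 5e-5 over the first 10% of optimiser steps, then follows a cosine decay back towards 0. A minimal standalone sketch of that curve (the exact Trainer implementation may differ in rounding details):

```python
import math

def lr_at_step(step: int, total_steps: int,
               base_lr: float = 5e-5, warmup_ratio: float = 0.1) -> float:
    """Cosine schedule with linear warmup, mirroring the listed
    hyperparameters (learning_rate=5e-5, warmup_ratio=0.1)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down towards 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1,000 total steps the peak rate of 5e-5 is reached at step 100 and decays towards 0 by the final step.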

#### Speeds, Sizes, Times

- total training time: 1 hour 17 minutes
- number of epochs: 5
- model size: ~500 MB


## Evaluation

### Testing Data & Metrics

#### Testing Data

The COMP34812 NLI dev dataset (a closed-source, task-specific dataset): 6.7K premise-hypothesis pairs, each with a binary entailment label.

#### Metrics

- Accuracy
- Matthews Correlation Coefficient (MCC)

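MCC is reported alongside accuracy because it remains informative when the two classes are imbalanced. For binary labels it reduces to a single formula over the confusion matrix, sketched here in plain Python:

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels.
    Ranges from -1 (total disagreement) to +1 (perfect prediction)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any confusion-matrix margin is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```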
### Results

The model achieved an accuracy of 82.4% and an MCC of 0.649 on the COMP34812 dev set.

## Technical Specifications

### Hardware

- GPU: NVIDIA T4 (Google Colab)
- VRAM: 15.0 GB
- RAM: 12.7 GB
- Disk: 2 GB for model and data

### Software

- Python 3.10+
- PyTorch
- Hugging Face Transformers
- mamba-ssm
- datasets, evaluate, accelerate


## Bias, Risks, and Limitations

The model is limited to binary entailment detection and was trained exclusively on the COMP34812 dataset; generalization beyond that data is untested. Sentence pairs longer than 128 tokens will be truncated.

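The 128-token limit means that for long pairs, trailing context is silently dropped. Schematically (whitespace splitting stands in for the model's real subword tokenizer, and `<sep>` is a placeholder separator, not a confirmed detail of the card):

```python
def encode_pair(premise: str, hypothesis: str, max_len: int = 128):
    """Hypothetical sketch of pair encoding with truncation: join the
    premise and hypothesis around a separator token, then keep only the
    first max_len tokens."""
    tokens = premise.split() + ["<sep>"] + hypothesis.split()
    return tokens[:max_len]
```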
## Additional Information

Model checkpoints and the tokenizer are available at https://huggingface.co/patrickmlml/mamba_nli_ensemble. Hyperparameters were chosen by closely following the referenced literature.