KhoaUIT committed on
Commit b5986f2 · verified · 1 parent: 6764d63

Update README.md

add dataset information

Files changed (1)
  1. README.md +139 -139
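The file list reports 139 lines removed and 139 added. However, the only line whose visible text differs between the two versions is the Training Dataset entry (line 22). One way to confirm this locally is to normalize each line before comparing the two versions. The sketch below uses Python's standard `difflib.SequenceMatcher`. The helper name `substantive_changes` and the CRLF line endings in the sample strings are illustrative assumptions. They are not a claim about what actually differs in this commit.

```python
import difflib


def substantive_changes(old_text: str, new_text: str) -> list[str]:
    """Diff two texts after normalizing line endings and trailing whitespace.

    Note: rstrip() also hides intentional trailing spaces (e.g. Markdown hard
    line breaks), so those edits would not be reported.
    """
    old = [line.rstrip() for line in old_text.splitlines()]  # splitlines() handles \r\n and \n
    new = [line.rstrip() for line in new_text.splitlines()]
    changes: list[str] = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical after normalization
        changes += [f"-{line}" for line in old[i1:i2]]
        changes += [f"+{line}" for line in new[j1:j2]]
    return changes


# Hypothetical excerpts: the old version uses CRLF endings, the new one LF.
old_readme = (
    "- **Similarity Function:** Cosine Similarity\r\n"
    "<!-- - **Training Dataset:** Unknown -->\r\n"
    "<!-- - **Language:** Unknown -->\r\n"
)
new_readme = (
    "- **Similarity Function:** Cosine Similarity\n"
    "- **Training Dataset:** [R2GQA](https://link.springer.com/article/10.1007/s10506-025-09457-7)\n"
    "<!-- - **Language:** Unknown -->\n"
)

for line in substantive_changes(old_readme, new_readme):
    print(line)
```

Only the Training Dataset line is reported. The surrounding lines compare equal once line endings are normalized.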
README.md CHANGED
@@ -1,140 +1,140 @@
- ---
- tags:
- - sentence-transformers
- - sentence-similarity
- - feature-extraction
- pipeline_tag: sentence-similarity
- library_name: sentence-transformers
- ---
-
- # SentenceTransformer
-
- This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
-
- ## Model Details
-
- ### Model Description
- - **Model Type:** Sentence Transformer
- <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
- - **Maximum Sequence Length:** 256 tokens
- - **Output Dimensionality:** 768 dimensions
- - **Similarity Function:** Cosine Similarity
- <!-- - **Training Dataset:** Unknown -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->
-
- ### Model Sources
-
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
-
- ### Full Model Architecture
-
- ```
- SentenceTransformer(
-   (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: RobertaModel
-   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
- )
- ```
-
- ## Usage
-
- ### Direct Usage (Sentence Transformers)
-
- First install the Sentence Transformers library:
-
- ```bash
- pip install -U sentence-transformers
- ```
-
- Then you can load this model and run inference.
- ```python
- from sentence_transformers import SentenceTransformer
-
- # Download from the 🤗 Hub
- model = SentenceTransformer("KhoaUIT/Phobert-UIT-R2GQA")
- # Run inference
- sentences = [
-     'The weather is lovely today.',
-     "It's so sunny outside!",
-     'He drove to the stadium.',
- ]
- embeddings = model.encode(sentences)
- print(embeddings.shape)
- # [3, 768]
-
- # Get the similarity scores for the embeddings
- similarities = model.similarity(embeddings, embeddings)
- print(similarities.shape)
- # [3, 3]
- ```
-
- <!--
- ### Direct Usage (Transformers)
-
- <details><summary>Click to see the direct usage in Transformers</summary>
-
- </details>
- -->
-
- <!--
- ### Downstream Usage (Sentence Transformers)
-
- You can finetune this model on your own dataset.
-
- <details><summary>Click to expand</summary>
-
- </details>
- -->
-
- <!--
- ### Out-of-Scope Use
-
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
- -->
-
- <!--
- ## Bias, Risks and Limitations
-
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
- -->
-
- <!--
- ### Recommendations
-
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
- -->
-
- ## Training Details
-
- ### Framework Versions
- - Python: 3.12.3
- - Sentence Transformers: 3.2.0
- - Transformers: 4.45.2
- - PyTorch: 2.3.0+cpu
- - Accelerate:
- - Datasets: 3.1.0
- - Tokenizers: 0.20.1
-
- ## Citation
-
- ### BibTeX
-
- <!--
- ## Glossary
-
- *Clearly define terms in order to be accessible across audiences.*
- -->
-
- <!--
- ## Model Card Authors
-
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->
-
- <!--
- ## Model Card Contact
-
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer
+
+ This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
+ - **Maximum Sequence Length:** 256 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:** [R2GQA](https://link.springer.com/article/10.1007/s10506-025-09457-7)
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: RobertaModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("KhoaUIT/Phobert-UIT-R2GQA")
+ # Run inference
+ sentences = [
+     'The weather is lovely today.',
+     "It's so sunny outside!",
+     'He drove to the stadium.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Framework Versions
+ - Python: 3.12.3
+ - Sentence Transformers: 3.2.0
+ - Transformers: 4.45.2
+ - PyTorch: 2.3.0+cpu
+ - Accelerate:
+ - Datasets: 3.1.0
+ - Tokenizers: 0.20.1
+
+ ## Citation
+
+ ### BibTeX
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
  -->
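The README's Full Model Architecture pairs a `RobertaModel` encoder with a `Pooling` layer in mean-token mode (`pooling_mode_mean_tokens: True`). The card also lists cosine similarity as its similarity function. Below is a minimal NumPy sketch of those two steps. It is not the sentence-transformers implementation: random arrays stand in for the encoder's token embeddings, and `mean_pool` and `cosine_similarity_matrix` are hypothetical helper names.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padding positions (mask == 0).

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len).
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # (batch, 1); avoids divide-by-zero
    return summed / counts


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Toy stand-in for encoder output: 3 sentences, padded to 256 tokens, 768 dims.
rng = np.random.default_rng(0)
batch, max_seq_length, dim = 3, 256, 768
token_embeddings = rng.normal(size=(batch, max_seq_length, dim)).astype(np.float32)

lengths = [12, 7, 30]  # hypothetical number of real (non-padding) tokens per sentence
attention_mask = np.zeros((batch, max_seq_length), dtype=np.int64)
for i, n in enumerate(lengths):
    attention_mask[i, :n] = 1

embeddings = mean_pool(token_embeddings, attention_mask)
similarities = cosine_similarity_matrix(embeddings, embeddings)
print(embeddings.shape)    # (3, 768)
print(similarities.shape)  # (3, 3)
```

Padding positions are excluded from the average. As a result, a sentence's embedding does not depend on how much padding the batch adds, and each sentence's similarity with itself is 1.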