dpanea committed
Commit 99607f1 · verified · 1 Parent(s): 5743096

Update README.md

Files changed (1)
  1. README.md +27 -35
README.md CHANGED
@@ -230,9 +230,9 @@ pipeline_tag: sentence-similarity
230
  library_name: sentence-transformers
231
  ---
232
 
233
- # SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5
234
 
235
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
236
 
237
  ## Model Details
238
 
@@ -242,25 +242,12 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [A
242
  - **Maximum Sequence Length:** 8192 tokens
243
  - **Output Dimensionality:** 1024 dimensions
244
  - **Similarity Function:** Cosine Similarity
245
- <!-- - **Training Dataset:** Unknown -->
 
 
246
  <!-- - **Language:** Unknown -->
247
  <!-- - **License:** Unknown -->
248
 
249
- ### Model Sources
250
-
251
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
252
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
253
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
254
-
255
- ### Full Model Architecture
256
-
257
- ```
258
- SentenceTransformer(
259
- (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel
260
- (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
261
- )
262
- ```
263
-
264
  ## Usage
265
 
266
  ### Direct Usage (Sentence Transformers)
@@ -271,26 +258,31 @@ First install the Sentence Transformers library:
271
  pip install -U sentence-transformers
272
  ```
273
 
274
- Then you can load this model and run inference.
 
275
  ```python
276
  from sentence_transformers import SentenceTransformer
277
 
278
  # Download from the 🤗 Hub
279
  model = SentenceTransformer("dpanea/skill-assignment-transformer")
280
- # Run inference
281
- sentences = [
282
- 'What is the artefact? This is catter this affect is called a cold Almond What are the features of the artefact? The features of this artefact are it looks like a gold snake with inscribed writing on the inside Question 2 What aspect of Ancient Roman society does this artefact represent? This aspect represents the 1st century AD What does the artefact tell us about Ancient Roman Society? This artefact tells us about the 1st century AD The plantations keep the stones gave them girls How does this artefact give us an understanding about Ancient Roman society? it gives us an understanding about Ancient Roman society, because the plantations keep slaves and gift the stones. and forced them to wear it',
283
- 'Writing Convention Skills: Conventions of Writing',
284
- 'Sentence Construction Skills: I can construct basic sentences',
 
 
 
 
285
  ]
286
- embeddings = model.encode(sentences)
287
- print(embeddings.shape)
288
- # [3, 1024]
289
-
290
- # Get the similarity scores for the embeddings
291
- similarities = model.similarity(embeddings, embeddings)
292
- print(similarities.shape)
293
- # [3, 3]
294
  ```
295
 
296
  <!--
@@ -336,14 +328,14 @@ You can finetune this model on your own dataset.
336
  #### Unnamed Dataset
337
 
338
  * Size: 11,779 training samples
339
- * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
340
  * Approximate statistics based on the first 1000 samples:
341
- | | sentence_0 | sentence_1 | sentence_2 |
342
  |:--------|:---------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
343
  | type | string | string | string |
344
  | details | <ul><li>min: 124 tokens</li><li>mean: 615.96 tokens</li><li>max: 1566 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 19.72 tokens</li><li>max: 69 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 19.55 tokens</li><li>max: 53 tokens</li></ul> |
345
  * Samples:
346
- | sentence_0 | sentence_1 | sentence_2 |
347
  |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
348
  | <code>2024 POETRY FEATURE ARTICLE – SCAFFOLD - blank<br>Name: <br>Song Chosen: SET IT ALL FREE<br>Poem Chosen: STILL, I RISE<br>Common theme: These form together to give the message of overcoming challenges and rising above difficulties with confidence and strength.<br>]<br>THIS Scaffold could be submitted as your draft. <br>HEADLINE: It needs to be strong, catchy and stimulate the reader. Try for ‘ear appeal’ or ‘brain appeal’ if you can. Possibly use alliteration or a pun. Just use the title of your poem until you can think of a title for the article. FOCUS BLUB: A brief, gripping sentence or two that lets readers know more specifically what the article is about. It gives a sense of the style of your piece. / Voiceworks - Whispers Of Wisdom Discover the themes of resilience and empowerment in Scarlett Johanssons “set it all Free” and mya Angelou’s “still I rise” I will explore how these works help us to overcome adversity and embrace our true strength...</code> | <code>Emotionally Engaging Language: I can evoke an emotional response through emotive language.</code> | <code>Reference Formatting Skills: Formats the reference list/bibliography correctly.</code> |
349
  | <code>Why is there no fuel for the next 500 kilometers? We need fuel and there is no way to turn back.This is such a bad time.We need fuel and i am gonna rage quit and drive us off the bridge if we can't get fuel any time soon pull over it's my turn, to drive you have been driving for the last hour and i want t go speeding, down this hill and get to the fuel station quicker, you drive way to slow and it is annoying me.Ok fine i'm pulling over.Finally ok i see that red car coming ,he wants to race and im racing him.ya i beat him but now we only have enough fuel for the next 200 km and the next fuel station is 250 km away i will drive until we run out of fuel then we will have to push and i'm paying for the fuel don't even think about paying for the fuel little brother.Ok time to push.No i am not pushing the car and you can not make me just because u are 1 year older than me does no mean can boss me around.Fine i will push lazy boy.What Why is the gas station shut down and the next one is 300k...</code> | <code>Essay Organization Skills: Essay Writing</code> | <code>Case Evaluation Skills: Does the student include discerning evaluation of ideas to support their case for positive change? </code> |
 
230
  library_name: sentence-transformers
231
  ---
232
 
233
+ # Skill Assignment SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5
234
 
235
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5). It maps sentences & paragraphs to a 1024-dimensional dense vector space and has been fine-tuned to match essay texts with relevant skills for pedagogical evaluation.
236
 
237
  ## Model Details
238
 
 
242
  - **Maximum Sequence Length:** 8192 tokens
243
  - **Output Dimensionality:** 1024 dimensions
244
  - **Similarity Function:** Cosine Similarity
245
+ - **Training Dataset:** 11,779 triplets (anchor, positive, negative), each consisting of (essay text, relevant skill, irrelevant skill)
246
+ - **Training Loss:** Triplet loss
247
+ - **Final evaluation:** 100% accuracy using the [Triplet Evaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#tripletevaluator) with 0 margin on 619 validation triplets.
248
  <!-- - **Language:** Unknown -->
249
  <!-- - **License:** Unknown -->
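
The "Final evaluation" figure above is a triplet accuracy. As a rough illustration of what such a metric measures (a minimal sketch, not the library's implementation), the snippet below assumes a triplet counts as correct when the anchor is more cosine-similar to the positive than to the negative by more than the margin. The function names and the toy 2-D embeddings are made up for illustration and are unrelated to the model's real 1024-dimensional outputs.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n, d) arrays."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a_unit * b_unit, axis=1)


def triplet_accuracy(anchors, positives, negatives, margin: float = 0.0) -> float:
    """Fraction of triplets with sim(anchor, positive) > sim(anchor, negative) + margin.

    Assumption: this mirrors the idea behind a triplet evaluator with a
    cosine metric; it is a hypothetical helper, not the library's code.
    """
    pos_sim = cosine_similarity(anchors, positives)
    neg_sim = cosine_similarity(anchors, negatives)
    return float(np.mean(pos_sim > neg_sim + margin))


# Toy embeddings (hypothetical values): rows are (essay, relevant skill, irrelevant skill).
anchors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
positives = np.array([[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]])
negatives = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.9]])

# The third triplet is deliberately wrong: its "positive" points away from the anchor.
toy_accuracy = triplet_accuracy(anchors, positives, negatives, margin=0.0)
print(f"triplet accuracy: {toy_accuracy:.3f}")
```

With a margin of 0, a triplet only needs the relevant skill to score strictly higher than the irrelevant one, so the metric says nothing about how large the gap is.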
250
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
251
  ## Usage
252
 
253
  ### Direct Usage (Sentence Transformers)
 
258
  pip install -U sentence-transformers
259
  ```
260
 
261
+ Then you can load this model and run inference to find matching skills for a given essay.
262
+ The essay should be in plain text, and the skills should ideally be of the form "Short skill name: detailed skill description".
263
  ```python
264
  from sentence_transformers import SentenceTransformer
265
 
266
  # Download from the 🤗 Hub
267
  model = SentenceTransformer("dpanea/skill-assignment-transformer")
268
+ # Prepare data
269
+ essay_text = ['Fighter Jet\nGreetings my fellow friends. I am going to talk about my greatest passion fighter jets...']
270
+ skills = [
271
+ 'Noun Consistency Skills: I can use nouns, pronouns, plurals and tenses accurately and consistently throughout.',
272
+ 'Adventurous Vocabulary Skills: I can select from a range of known adventurous vocabulary. (tier 2 and tier 3 words).',
273
+ 'Descriptive Language Skills: I can use appropriate, interesting and varied word choice (adjectives, adverbs and descriptive phrases).',
274
+ 'Dialogue Tagging Skills: I can use dialogue tags successfully (eg correct positioning, new line for new speaker).',
275
+ 'Spell Words: I can spell commonly used words accurately.',
276
+ # ... (further skills elided)
277
  ]
278
+ # Get embeddings
279
+ essay_embedding = model.encode(essay_text)
280
+ skill_embeddings = model.encode(skills)
281
+ k = 3  # number of most relevant skills to retrieve for the essay (example value)
282
+ from sentence_transformers.util import cos_sim
283
+ similarities = cos_sim(essay_embedding, skill_embeddings).flatten()
284
+ top_indices = similarities.argsort(descending=True)[:k].tolist()
285
+ top_skills = [skills[i] for i in top_indices]
286
  ```
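
The ranking step in the usage snippet can also be tried without downloading the model. The following is a minimal sketch under stated assumptions: `top_k_skills` is a hypothetical helper, and the 3-dimensional embeddings and skill names are invented stand-ins for real `model.encode` outputs.

```python
import numpy as np


def top_k_skills(essay_embedding, skill_embeddings, skills, k=3):
    """Return the k (skill, score) pairs most cosine-similar to the essay, best first."""
    essay_unit = essay_embedding / np.linalg.norm(essay_embedding)
    skill_units = skill_embeddings / np.linalg.norm(skill_embeddings, axis=1, keepdims=True)
    scores = skill_units @ essay_unit          # cosine similarity per skill
    order = np.argsort(scores)[::-1][:k]       # indices sorted by descending score
    return [(skills[i], float(scores[i])) for i in order]


# Toy stand-ins for encoded texts (hypothetical values, not real model output).
toy_skills = ["skill A", "skill B", "skill C", "skill D"]
toy_skill_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])
toy_essay_embedding = np.array([0.9, 0.3, 0.0])

top = top_k_skills(toy_essay_embedding, toy_skill_embeddings, toy_skills, k=2)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

Returning the score alongside each skill makes it easier to apply a similarity threshold later, rather than always accepting a fixed number of skills.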
287
 
288
  <!--
 
328
  #### Unnamed Dataset
329
 
330
  * Size: 11,779 training samples
331
+ * Columns: <code>Essay text</code>, <code>Relevant skill</code>, and <code>Irrelevant skill</code>
332
  * Approximate statistics based on the first 1000 samples:
333
+ | | Essay text | Relevant skill | Irrelevant skill |
334
  |:--------|:---------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
335
  | type | string | string | string |
336
  | details | <ul><li>min: 124 tokens</li><li>mean: 615.96 tokens</li><li>max: 1566 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 19.72 tokens</li><li>max: 69 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 19.55 tokens</li><li>max: 53 tokens</li></ul> |
337
  * Samples:
338
+ | Essay text | Relevant skill | Irrelevant skill |
339
  |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
340
  | <code>2024 POETRY FEATURE ARTICLE – SCAFFOLD - blank<br>Name: <br>Song Chosen: SET IT ALL FREE<br>Poem Chosen: STILL, I RISE<br>Common theme: These form together to give the message of overcoming challenges and rising above difficulties with confidence and strength.<br>]<br>THIS Scaffold could be submitted as your draft. <br>HEADLINE: It needs to be strong, catchy and stimulate the reader. Try for ‘ear appeal’ or ‘brain appeal’ if you can. Possibly use alliteration or a pun. Just use the title of your poem until you can think of a title for the article. FOCUS BLUB: A brief, gripping sentence or two that lets readers know more specifically what the article is about. It gives a sense of the style of your piece. / Voiceworks - Whispers Of Wisdom Discover the themes of resilience and empowerment in Scarlett Johanssons “set it all Free” and mya Angelou’s “still I rise” I will explore how these works help us to overcome adversity and embrace our true strength...</code> | <code>Emotionally Engaging Language: I can evoke an emotional response through emotive language.</code> | <code>Reference Formatting Skills: Formats the reference list/bibliography correctly.</code> |
341
  | <code>Why is there no fuel for the next 500 kilometers? We need fuel and there is no way to turn back.This is such a bad time.We need fuel and i am gonna rage quit and drive us off the bridge if we can't get fuel any time soon pull over it's my turn, to drive you have been driving for the last hour and i want t go speeding, down this hill and get to the fuel station quicker, you drive way to slow and it is annoying me.Ok fine i'm pulling over.Finally ok i see that red car coming ,he wants to race and im racing him.ya i beat him but now we only have enough fuel for the next 200 km and the next fuel station is 250 km away i will drive until we run out of fuel then we will have to push and i'm paying for the fuel don't even think about paying for the fuel little brother.Ok time to push.No i am not pushing the car and you can not make me just because u are 1 year older than me does no mean can boss me around.Fine i will push lazy boy.What Why is the gas station shut down and the next one is 300k...</code> | <code>Essay Organization Skills: Essay Writing</code> | <code>Case Evaluation Skills: Does the student include discerning evaluation of ideas to support their case for positive change? </code> |