dpanea/skill-assignment-transformer

@@ -230,9 +230,9 @@ pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
-# SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 ## Model Details
@@ -242,25 +242,12 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [A
 - **Maximum Sequence Length:** 8192 tokens
 - **Output Dimensionality:** 1024 dimensions
 - **Similarity Function:** Cosine Similarity
-<!-- - **Training Dataset:** Unknown -->
 <!-- - **Language:** Unknown -->
 <!-- - **License:** Unknown -->
-### Model Sources
-- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
-- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
-- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
-### Full Model Architecture
-```
-SentenceTransformer(
-  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel
-  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-)
-```
 ## Usage
 ### Direct Usage (Sentence Transformers)
@@ -271,26 +258,31 @@ First install the Sentence Transformers library:
 pip install -U sentence-transformers
 ```
-Then you can load this model and run inference.
 ```python
 from sentence_transformers import SentenceTransformer
 # Download from the 🤗 Hub
 model = SentenceTransformer("dpanea/skill-assignment-transformer")
-# Run inference
-sentences = [
-    'What is the artefact?    This is catter this affect is called a cold   Almond   What are the features of the artefact?  The features of this artefact are it looks like   a gold snake with inscribed writing on the   inside   Question 2  What aspect of Ancient Roman society does this artefact represent?  This aspect represents the 1st century AD   What does the artefact tell us about Ancient Roman Society?  This artefact tells us about the 1st century AD The plantations keep the stones gave them girls How does this artefact give us an understanding about Ancient Roman society? it gives us an understanding about Ancient Roman society, because the plantations keep slaves and gift the stones. and forced them to wear it',
-    'Writing Convention Skills: Conventions of Writing',
-    'Sentence Construction Skills: I can construct basic sentences',
 ]
-embeddings = model.encode(sentences)
-print(embeddings.shape)
-# [3, 1024]
-# Get the similarity scores for the embeddings
-similarities = model.similarity(embeddings, embeddings)
-print(similarities.shape)
-# [3, 3]
 ```
 <!--
@@ -336,14 +328,14 @@ You can finetune this model on your own dataset.
 #### Unnamed Dataset
 * Size: 11,779 training samples
-* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
 * Approximate statistics based on the first 1000 samples:
-  |         | sentence_0                                                                             | sentence_1                                                                        | sentence_2                                                                        |
   |:--------|:---------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
   | type    | string                                                                                 | string                                                                            | string                                                                            |
   | details | <ul><li>min: 124 tokens</li><li>mean: 615.96 tokens</li><li>max: 1566 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 19.72 tokens</li><li>max: 69 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 19.55 tokens</li><li>max: 53 tokens</li></ul> |
 * Samples:
-  | sentence_0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | sentence_1                                                                                              | sentence_2                                                                                                                                                       |
   |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
   | <code>2024 POETRY FEATURE ARTICLE – SCAFFOLD - blank<br>Name:  <br>Song Chosen: SET IT ALL FREE<br>Poem Chosen: STILL, I RISE<br>Common theme:  These form together to give the message of overcoming challenges and rising above difficulties with confidence and strength.<br>]<br>THIS Scaffold could be submitted as your draft.	 											                                             <br>HEADLINE: It needs to be strong, catchy and stimulate the reader. Try for ‘ear appeal’ or ‘brain appeal’ if you can.  Possibly use alliteration or a pun. Just use the title of your poem until you can think of a title for the article. FOCUS BLUB: A brief, gripping sentence or two that lets readers know more specifically what the article is about. It gives a sense of the style of your piece. / Voiceworks - Whispers Of Wisdom Discover the themes of resilience and empowerment in Scarlett Johanssons “set it all Free” and mya Angelou’s “still I rise” I will explore how these works help us to overcome adversity and embrace our true strength...</code> | <code>Emotionally Engaging Language: I can evoke an emotional response through emotive language.</code> | <code>Reference Formatting Skills: Formats the reference list/bibliography correctly.</code>                                                                     |
   | <code>Why is there no fuel for the next 500 kilometers? We need fuel and there is no way to turn back.This is such a bad time.We need fuel and i am gonna rage quit and drive us off the bridge if we can't get fuel any time soon pull over it's my turn, to drive you have been driving for the last hour and i want t go speeding, down this hill and get to the fuel station quicker, you drive way to slow and it is annoying me.Ok fine i'm pulling over.Finally ok i see that red car coming ,he wants to race and im racing him.ya i beat him but now we only have enough fuel for the next 200 km and the next fuel station is 250 km away i will drive until we run out of fuel then we will have to push and i'm paying for the fuel don't even think about paying for the fuel little brother.Ok time to push.No i am not pushing the car and you can not make me just because u are 1 year older than me does no mean can boss me around.Fine i will push lazy boy.What Why is the gas station shut down and the next one is 300k...</code>                      | <code>Essay Organization Skills: Essay Writing</code>                                                   | <code>Case Evaluation Skills: Does the student include discerning evaluation of ideas to support their case for positive change? </code>                         |

 library_name: sentence-transformers
 ---
+# Skill Assignment SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5). It maps sentences & paragraphs to a 1024-dimensional dense vector space and has been fine-tuned to match essay texts with relevant skills for pedagogical evaluation.
 ## Model Details
 - **Maximum Sequence Length:** 8192 tokens
 - **Output Dimensionality:** 1024 dimensions
 - **Similarity Function:** Cosine Similarity
+- **Training Dataset:** 11,779 triplets (anchor, positive, negative), each consisting of (essay text, relevant skill, irrelevant skill)
+- **Training Loss:** Triplet loss
+- **Final evaluation:** 100% accuracy using the [Triplet Evaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#tripletevaluator) with 0 margin on 619 validation triplets.
 <!-- - **Language:** Unknown -->
 <!-- - **License:** Unknown -->
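
As a rough illustration of what a triplet accuracy with a margin of 0 measures, the sketch below counts a triplet as correct when the anchor is more similar to the positive than to the negative. It uses cosine similarity on toy vectors; the function names and data are hypothetical, and this is not the library's `TripletEvaluator` implementation.

```python
import numpy as np


def cosine(a, b):
    """Row-wise cosine similarity between two (n, d) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def triplet_accuracy(anchors, positives, negatives, margin=0.0):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative) + margin."""
    pos = cosine(anchors, positives)
    neg = cosine(anchors, negatives)
    return float(np.mean(pos > neg + margin))


# Toy embeddings standing in for (essay, relevant skill, irrelevant skill).
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[0.9, 0.1], [0.1, 0.9]])
negatives = np.array([[0.0, 1.0], [0.8, 0.2]])

accuracy = triplet_accuracy(anchors, positives, negatives)
print(accuracy)
```

With real data, the three arrays would come from `model.encode` applied to the essay texts, relevant skills, and irrelevant skills of the validation triplets.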
 ## Usage
 ### Direct Usage (Sentence Transformers)
 pip install -U sentence-transformers
 ```
+Then you can load this model and run inference to find matching skills for a given essay.
+The essay should be plain text, and each skill should ideally have the form "Short skill name: detailed skill description".
 ```python
 from sentence_transformers import SentenceTransformer
 # Download from the 🤗 Hub
 model = SentenceTransformer("dpanea/skill-assignment-transformer")
+# Prepare data
+essay_text = ['Fighter Jet\nGreetings my fellow friends. I am going to talk about my greatest passion fighter jets...']
+skills = [
+  'Noun Consistency Skills: I can use nouns, pronouns, plurals and tenses accurately and consistently throughout.',
+  'Adventurous Vocabulary Skills: I can select from a range of known adventurous vocabulary. (tier 2 and tier 3 words).',
+  'Descriptive Language Skills: I can use appropriate, interesting and varied word choice (adjectives, adverbs and descriptive phrases).',
+  'Dialogue Tagging Skills: I can use dialogue tags successfully (eg correct positioning, new line for new speaker).',
+  'Spell Words: I can spell commonly used words accurately.',
+  # ...
 ]
+# Get embeddings
+essay_embedding = model.encode(essay_text)
+skill_embeddings = model.encode(skills)
+# Get the k most relevant skills for the given essay
+import numpy as np
+from sentence_transformers.util import cos_sim
+k = 3  # number of skills to return
+similarities = cos_sim(essay_embedding, skill_embeddings).flatten().cpu().numpy()
+top_indices = np.argsort(similarities)[-k:][::-1]  # highest similarity first
+top_skills = [skills[i] for i in top_indices]
 ```
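
The ranking step can also be wrapped in a small helper that returns each skill together with its score. The following is a minimal sketch using NumPy only; `top_k_skills` and the toy vectors are hypothetical and stand in for the embeddings returned by `model.encode`.

```python
import numpy as np


def top_k_skills(essay_vec, skill_vecs, skills, k=3):
    """Return the k skills most cosine-similar to the essay, best first."""
    essay_vec = np.asarray(essay_vec, dtype=float).ravel()
    skill_vecs = np.asarray(skill_vecs, dtype=float)
    # Cosine similarity between the essay and every skill embedding.
    sims = skill_vecs @ essay_vec / (
        np.linalg.norm(skill_vecs, axis=1) * np.linalg.norm(essay_vec)
    )
    order = np.argsort(sims)[::-1][:k]
    return [(skills[i], float(sims[i])) for i in order]


# Toy 2-dimensional vectors in place of real 1024-dimensional embeddings.
toy_skills = ["skill A", "skill B", "skill C"]
toy_skill_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_essay_vec = np.array([1.0, 0.2])

ranked = top_k_skills(toy_essay_vec, toy_skill_vecs, toy_skills, k=2)
for skill, score in ranked:
    print(f"{score:.3f}  {skill}")
```

Returning the scores alongside the skill texts makes it easier to apply a similarity cutoff instead of a fixed `k`, if that suits the use case better.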
 <!--
 #### Unnamed Dataset
 * Size: 11,779 training samples
+* Columns: <code>Essay text</code>, <code>Relevant skill</code>, and <code>Irrelevant skill</code>
 * Approximate statistics based on the first 1000 samples:
+  |         | Essay text                                                                             | Relevant skill                                                                    | Irrelevant skill                                                                  |
   |:--------|:---------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
   | type    | string                                                                                 | string                                                                            | string                                                                            |
   | details | <ul><li>min: 124 tokens</li><li>mean: 615.96 tokens</li><li>max: 1566 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 19.72 tokens</li><li>max: 69 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 19.55 tokens</li><li>max: 53 tokens</li></ul> |
 * Samples:
+  | Essay text                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | Relevant skill                                                                                          | Irrelevant skill                                                                                                                                                 |
   |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
   | <code>2024 POETRY FEATURE ARTICLE – SCAFFOLD - blank<br>Name:  <br>Song Chosen: SET IT ALL FREE<br>Poem Chosen: STILL, I RISE<br>Common theme:  These form together to give the message of overcoming challenges and rising above difficulties with confidence and strength.<br>]<br>THIS Scaffold could be submitted as your draft.	 											                                             <br>HEADLINE: It needs to be strong, catchy and stimulate the reader. Try for ‘ear appeal’ or ‘brain appeal’ if you can.  Possibly use alliteration or a pun. Just use the title of your poem until you can think of a title for the article. FOCUS BLUB: A brief, gripping sentence or two that lets readers know more specifically what the article is about. It gives a sense of the style of your piece. / Voiceworks - Whispers Of Wisdom Discover the themes of resilience and empowerment in Scarlett Johanssons “set it all Free” and mya Angelou’s “still I rise” I will explore how these works help us to overcome adversity and embrace our true strength...</code> | <code>Emotionally Engaging Language: I can evoke an emotional response through emotive language.</code> | <code>Reference Formatting Skills: Formats the reference list/bibliography correctly.</code>                                                                     |
   | <code>Why is there no fuel for the next 500 kilometers? We need fuel and there is no way to turn back.This is such a bad time.We need fuel and i am gonna rage quit and drive us off the bridge if we can't get fuel any time soon pull over it's my turn, to drive you have been driving for the last hour and i want t go speeding, down this hill and get to the fuel station quicker, you drive way to slow and it is annoying me.Ok fine i'm pulling over.Finally ok i see that red car coming ,he wants to race and im racing him.ya i beat him but now we only have enough fuel for the next 200 km and the next fuel station is 250 km away i will drive until we run out of fuel then we will have to push and i'm paying for the fuel don't even think about paying for the fuel little brother.Ok time to push.No i am not pushing the car and you can not make me just because u are 1 year older than me does no mean can boss me around.Fine i will push lazy boy.What Why is the gas station shut down and the next one is 300k...</code>                      | <code>Essay Organization Skills: Essay Writing</code>                                                   | <code>Case Evaluation Skills: Does the student include discerning evaluation of ideas to support their case for positive change? </code>                         |