This is a Cross Encoder model finetuned from jhu-clsp/mmBERT-base using the sentence-transformers library. It scores pairs of texts (here, a fixed rating prompt and a document) and returns six logits per pair, one for each label on the 6-point educational-quality scale, so it can be used for text-pair classification.

The full model architecture:
CrossEncoder(
(0): Transformer({'transformer_task': 'sequence-classification', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'logits'}}, 'module_output_name': 'scores', 'architecture': 'ModernBertForSequenceClassification'})
)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import CrossEncoder
# Download from the 🤗 Hub
model = CrossEncoder("davanstrien/fineweb-c-quality-classifier-v4")
# Each pair is [prompt, text]: the fixed rating prompt followed by the document to rate
pairs = [
['Rate the educational quality of this text on a 6-point scale (Problematic / None / Minimal / Basic / Good / Excellent):', 'It’s important to include healthy sources of protein in your diet each day. Protein helps your body with a number of important functions and helps you maintain muscle mass. When you think of protein, steak or chicken might come to mind. But if you’re not a big meat eater, you have other options to make sure you get the recommended amount of protein that your body needs. Worry not, because there are plenty of protein-rich vegetables available year-round. Try out these options for plenty of variety.\nYou can enjoy each of them alone as a side dish, or as indifferent recipes for a filling main course. Keep in mind that the protein content may change depending on how you prepare each vegetable. The values below match the cooking method indicated for each food. Here are some vegetables high in protein.\n7 Vegetables High in Protein\nSeitan is a popular protein source for many vegetarians and vegans.\nIt’s made from gluten, the main protein in wheat. Unlike many soy-based mock meats, it resembles the look and texture of meat when cooked.\nAlso known as wheat meat or wheat gluten, it contains about 25 grams of protein per 3.5 ounces (100 grams). This makes it the richest plant protein source on this list. Seitan is also a good source of selenium and contains small amounts of iron, calcium and phosphorus.\nYou can find this meat alternative in the refrigerated section of most health food stores, or make your own version with vital wheat gluten using this recipe.\nSeitan can be pan-fried, sautéed and even grilled. Therefore, it can be easily incorporated in a variety of recipes.\nHowever, seitan should be avoided by people with celiac disease or gluten sensitivity.\n2. Tofu, Tempeh and Edamame\nTofu, tempeh and edamame all originate from soybeans.\nSoybeans are considered a whole source of protein. 
This means that they provide the body with all the essential amino acids it needs.\nEdamame are immature soybeans with a sweet and slightly grassy taste. They need to be steamed or boiled prior to consumption and can be eaten on their own or added to soups and salads.\nTofu is made from bean curds pressed together in a process similar to cheesemaking. Tempeh is made by cooking and slightly fermenting mature soybeans prior to pressing them into a patty.\nTofu doesn’t have much taste, but easily absorbs the flavor of the ingredients it’s prepared with. Comparatively, tempeh has a characteristic nutty flavor.\nBoth tofu and tempeh can be used in a variety of recipes, ranging from burgers to soups and chilis.\nAll three contain iron, calcium and 10-19 grams of protein per 3.5 ounces (100 grams).\nEdamame are also rich in folate, vitamin K and fiber. Tempeh contains a good amount of probiotics, B vitamins and minerals such as magnesium and phosphorus.\nCooked chickpeas are high in protein, containing around 7.25 g per ½ cup.\nChickpeas can be eaten hot or cold, and are highly versatile with plenty of recipes available online. They can, for example, be added to stews and curries, or spiced with paprika and roasted in the oven.\nA person can add hummus, which is made from chickpea paste, to a sandwich for a healthful, protein-rich alternative to butter.\nPeanuts are protein-rich, full of healthful fats, and may improve heart health. They contain around 20.5 g of protein per ½ cup.\nPeanut butter is also rich in protein, with 3.6 g per tablespoon, making peanut butter sandwiches a healthful complete protein snack.\nQuinoa is a grain with a high-protein content, and is a complete protein. Cooked quinoa contains 8 g of protein per cup.\nThis grain is also rich in other nutrients, including magnesium, iron, fiber, and manganese. It is also highly versatile.\nQuinoa can fill in for pasta in soups and stews. 
It can be sprinkled on a salad or eaten as the main course.\nMycoprotein is a fungus-based protein. Mycoprotein products contain around 13 g of protein per ½ cup serving.\nProducts with mycoprotein are often advertised as meat substitutes and are available in forms such as “chicken” nuggets or cutlets. However, many of these products contain egg white, so people must be sure to check the label.\nEggs are an easily available, cheap source of nutrients. A single hard-boiled egg contains around 7g of protein and makes a nutritious, filling breakfast or lunchtime meal. They’re also easily digestible and low in calories. Try our protein-rich scrambled egg and feta hash.'],
['Rate the educational quality of this text on a 6-point scale (Problematic / None / Minimal / Basic / Good / Excellent):', "หลังเสร็จภารกิจรักต่างคนต่างเข้าสู่นิทราในสภาพไร้เสื้อผ้าอาภรณ์อาจจะเป็นเพราะเหนื่อยเกินกว่าที่จะควานหาอะไรมาปิดกายหรือไม่ต้อง\nอายคู่รักก็แล้วแต่เหตุผลของใคร แต่ที่แน่ๆ ตื่นมาแล้วมีแต่รอยยิ้มฟินไม่หายรู้สึกสดชื่นแจ่มใส อาจจะยกความดีความชอบให้เซ็กซ์เมื่อคืน\nที่แสนจะดุเด็ดเผ็ดอร่อย แต่ล่าสุดมีงานวิจัยว่าเหตุผลใดจึงควรนอนเปลือยกาย หลับไปโดยไร้แพรพรรณใดๆ สวมใส่ ซึ่งไม่เพียงก้าวสู่นิทรา\nด้วยรอยยิ้มเท่านั้น เรื่องดังกล่าวยังมีส่วนทำให้ชีวิต 'ดี๊ดี' ขึ้นอีกด้วย\n1. หลับง่าย\nเราอาจคิดว่าร่างกายมนุษย์ต้องการเครื่องปกป้องผิวหนังตลอดเวลาเพื่อสร้างความอบอุ่น แต่เชื่อหรือไม่ว่าเสื้อผ้าที่สวมใส่ไปรบกวนกระบวน\nการปรับอุณหภูมิในร่างกาย มีงานวิจัยจาก The American Academy of Sleep Medicine หรือ AASM รายงานว่า ขณะเข้าสู่\nนิทรา ร่างกายจะค่อยๆ ลดอุณหภูมิลง อันเป็นส่วนหนึ่งของระบบนาฬิกาชีวภาพ (Circadian Rhythm หรือ Human Biological Clock)\nหากมีการสวมใส่เสื้อผ้า ร่างกายจะเกิดความร้อนสะสมสูงขึ้น เมื่อลดอุณหภูมิยากกว่าเดิมจึงหลับยากขึ้น นอนไม่เต็มอิ่ม ส่งผลต่อสภาพร่างกาย\nโดยตรง อาจรู้สึกอ่อนเพลียช่วงระหว่างวันก็เป็นได้\n2. หน้าท้องแบนราบ?\nนอกจากเรื่องความสดชื่นแจ่มใสมีพลังหลังตื่นนอนแล้ว กระบวนการของร่างกายขณะปรับอุณหภูมิลดลงส่งผลต่อฮอร์โมนเกี่ยวข้องกับการเจริญ\nเติบโต หรือ Growth Hormone ที่จะค่อยๆ เพิ่มขึ้น ตรงกันข้ามจะมีการปรับลดคอร์ติซอล (Cortisol) หรือฮอร์โมนแห่งความเครียด\nเพื่อให้ร่างกายฟื้นฟูระบบต่างๆ อย่างสมดุลหลังหลับสนิท ทว่าการสวมชุดนอนซึ่งทำให้อุณหภูมิร่างกายสูงขึ้นนั้น คอร์ติซอลจะถูกผลิตออกมา\nมากกว่าปกติเพื่อชดเชยพลังงานที่เสียไป คุณจึงอยากอาหารมากขึ้น นั่นคือยิ่งหลับไม่สนิท คุณยิ่งมีความต้องการบริโภคอาหารมากขึ้น นำไปสู่\nการเกิดพุงง่ายกว่าปกติ\n3. 
ยั่วเย้ายวนใจ\nไม่แปลกหากการนอนแก้ผ้าทำให้คนสองคนอยากฟีชเจอริ่งกันบ่อยครั้งขึ้น ข้อดีคือช่วยสานสัมพันธ์ระหว่างคู่รัก ไม่ว่าถูกกระตุ้นด้วยวิธีมองด้วย\nตายามเห็นเธอนอนเผยเรือนร่าง หรือระหว่างนอนเนื้อแนบเนื้อจุดประกายไฟในตัวให้สปาร์กอยากขยับไปจับตรงนั้นสอดใส่ตรงนี้ สุดท้ายหลังมี\nเพศสัมพันธ์ไม่ว่าเป็นแบบเนิบๆ ร้อนแรง หรือ ขุดกระบวนยุทธ์ใดมาใช้กระหน่ำรัก ย่อมนำมาสู่การหลั่งสารเคมีจำพวกเอนดอร์ฟิน(Endorphin)\nหรือเคมีแห่งความสุข ช่วยลดความเครียดจากเรื่องราวระหว่างวัน รวมถึงอีกหนึ่งเคมีที่ขาดไม่ได้คือออกซิโตซิน (Oxytocin) ฮอร์โมนแห่ง\nความ ผูกพัน\nคลิปเสียวประกอบเนื้อหา\n4. มั่นใจในเรือนร่างของตัวเองมากกว่าเดิม\nต่อยอดจากข้อสาม เมื่อคุณถูกคนที่นอนอยู่ข้างๆ รุกเข้าหา ไม่ว่าจะเป็นฝ่ายชายเปิดศึกเกมรักก่อนอย่างเร่าร้อน หรือฝ่ายหญิงค่อยๆ ประโลมรัก\nให้อย่างช้าๆ สิ่งที่ตามมาคือความรู้สึกว่าตนเองมีความน่าสนใจในสายตาของเขา หรือเธอ สรุปคือเมื่อนอนเปลือยเปล่าโดยไร้เสื้อผ้าแล้วได้รับ\nความสนใจ จะบังเกิดเป็นความเชื่อมั่นว่าหุ่นของเรายังน่ากินอยู่นั่นเอง\n5. ไม่ต้องซักผ้าเยอะ\nแน่นอน เมื่อไม่ต้องเลือกชุดนอนมาสวมใส่ ก็ไม่เปลืองเวลารับมือกับจำนวนชุดที่ต้องซัก ไม่เปลืองผงซักฟอก น้ำยาปรับผ้านุ่ม ไม่เปลืองน้ำด้วย\nอีกเหตุผลคือเราสามารถใช้เวลาหลังอาบน้ำได้กระชับขึ้น อาบน้ำ เช็ดตัว ทาครีมเสร็จปุ๊บก็กระโดดขึ้นเตียงปั๊บเลย\n6. เป็นเหตุผลที่ดีให้คนที่ไม่อยากอาบน้ำตอนเช้า\nตามข้อมูลเบื้องต้นบอกอยู่แล้วว่าการสวมเสื้อผ้านอนทำให้ร่างกายมีอุณหภูมิสูงขึ้น สิ่งที่ตามมาคือกลิ่นอับชื้นจากเหงื่อที่สะสมมาทั้งคืน แต่\nหากเราลดอุณหภูมิให้ร่างกายด้วยวิธีนอนเปลือยกาย ผลที่ตามมาคือเหงื่อไคลย่อมน้อยลง หรือในบางรายอาจแทบไม่มีเลยด้วยซ้ำ และเมื่อ\nไม่มีเหงื่อย่อมไม่มีกลิ่นตัวใช่มั้ยล่ะ แต่สิ่งนี้ขอให้เป็นทางเลือกเฉพาะบุคคลที่มักยอมแพ้การสัมผัสน้ำยามเช้าจริงๆ รวมถึงจำกัดเฉพาะผู้มีเครื่อง\nปรับอากาศ หรืออยู่ในสภาพแวดล้อมกลางอุณหภูมิต่ำเท่านั้น เพราะคุณอาจไม่หงุดหงิดกับกลิ่นตัวตัวเอง แต่กับคนรอบข้างมันคือหายนะดีๆ\nนี่เอง\n7. 
เบาสบาย สำหรับจุดซ่อนเร้น\nเหตุผลสุดท้ายเน้นสำหรับสาวๆ โดยเฉพาะ (อาจใช้เป็นข้ออ้างของคุณผู้ชายเพื่อกระซิบบอกเธอว่าอย่าอาย ไม่ต้องสวมอะไรนอนเลยก็ได้นะ)\nเพราะทำให้บริเวณจุดซ่อนเร้นมีกลิ่นไม่พึงประสงค์ อย่าลืมว่าตลอดวันผู้หญิงต้องเผชิญกับมลภาวะ ความอับชื้นจากเหงื่อไคลมามากพอแล้ว\nลองปล่อยให้จุดซ่อนเร้นได้สูดอากาศบริสุทธิ์บ้าง เมื่อไม่มีกลิ่นจากเชื้อแบคทีเรียมาทำลายบรรยากาศ การทำออรัลเซ็กซ์ให้เธอย่อมใส่ลีลาได้\nอย่างเต็มที่ โดยไม่ต้องกังวลใจทั้งสองฝ่าย"],
['Rate the educational quality of this text on a 6-point scale (Problematic / None / Minimal / Basic / Good / Excellent):', 'ideen fur grabbepflanzung fa 1 4 r fa 1 4 r ede ideen fur grabbepflanzung im fruhjahr.\nideen grabbepflanzung herbst fur sommer im fruhjahr fa 1 4 r,ideen fur grabbepflanzung im fruhjahr gesucht o,ideen grabbepflanzung grab in 5 fa 1 4 r co fur im fruhjahr gesucht sommer,ideen fur grabbepflanzung im sommer fa 1 4 r den allerheiligen herbst, ideen fur grabbepflanzung fruhjahr sommer,ideen fur grabbepflanzung best images on cemetery flowers funeral fruhjahr allerheiligen,ideen fur grabbepflanzung im sommer gallery of die allerheiligen,ideen grabbepflanzung herbst fur im sommer ,ideen fur grabbepflanzung im fruhjahr n nu 1 4 u 2 gesucht,ideen grabbepflanzung herbst sommer fur best images about on style heart and.'],
['Rate the educational quality of this text on a 6-point scale (Problematic / None / Minimal / Basic / Good / Excellent):', 'Татар мәгарифе порталы\nӨстәмә һөнәри белем бирү дәүләт автоном белем бирү учреждениесе «Татарстан Республикасы Мәгарифне үстерү институты» «Иң яхшы цифрлы белем бирү практикалары» авторлык методик эшкәртмәләре конкурсы (конкурс) игълан итә.\nКонкурс 2017–2021 еллар һәм 2030 елга кадәрге чорга Татарстан Республикасында мәгарифне үстерү стратегиясе бурычларын тормышка ашыру һәм 2022 елда Татарстан Республикасында цифрлаштыру елы кысаларында үткәрелә.\nКонкурсның максаты ‒ мәктәпкәчә, гомуми һәм өстәмә белем бирү учреждениеләре педагогларының электрон белем бирү ресурсларын булдыру, мәгълүмати-коммуникацион технологияләрдән нәтиҗәле файдалану мәсьәләләре буенча һөнәри компетентлыгын арттыруга ярдәм итү.\nКонкурста Татарстан Республикасы мәгариф оешмалары педагоглары катнаша ала, яшь һәм педагогик стаж буенча чикләүләр юк. Методик эшкәртмәләр рус һәм татар телләрендә кабул ителә.\nТулырак мәгълүмат белән беркетелгән НИГЕЗЛӘМӘдә таныша аласыз.\nХәзер online: 0 кулланучы'],
['Rate the educational quality of this text on a 6-point scale (Problematic / None / Minimal / Basic / Good / Excellent):', 'Sau quy trình rà soát, xét duyệt công khai và minh bạch, 36 phạm nhân thuộc các tội ít nghiêm trọng đủ điều kiện, đã được trao quyết định đặc xá của Chủ tịch nước tại Trại tạm giam số 1 CATP Hà Nội. Trong số này có 5 phạm nhân nữ chủ yếu liên quan đến các tội danh môi giới mại dâm, người trẻ nhất được đặc xá 19 tuổi, người già nhất 63 tuổi.\nĐại tá Nguyễn Đức Niên, Phó Cục trưởng Cục Hướng dẫn tạm giam, tạm giữ -Tổng cục VIII , Bộ Công an trao quyết định đặc xá của Chủ tịch nước cho đại diện các phạm nhân được đặc xá\nThượng tá Trịnh Đình Hùng, Phó Giám thị Trại tạm giam số 1 CATP Hà Nội cho biết, khác với mọi năm, việc đặc xá thường được tổ chức vào dịp Quốc khánh 2-9 nhưng năm nay, quyết định đặc xá của Chủ tịch nước được công bố vào dịp tháng 11.\nQuyết định đặc xá của Chủ tịch nước thể hiện chính sách khoan hồng của Đảng và Nhà nước với người phạm tội\nTrước đó khoảng 1 tháng, Hội đồng đặc xá các cấp mới ban hành kế hoạch chuẩn bị công tác đặc xá các cấp. 
Theo đó, tiêu chuẩn được đặc xá năm nay quy định chặt chẽ hơn, phạm nhân phải thi hành ít nhất 1/2 bản án mới thuộc đối tượng được xét duyệt.\nDo đó việc thực hiện đầy đủ quy trình để được Hội đồng đặc xá Trung ương thông qua rất gấp gáp, song với tinh thần và trách nhiệm cao, thể hiện tính nhân văn cao cả của pháp luật nước CHXHCN Việt Nam, Hội đồng đặc xá Trại tạm giam số 1 đã làm việc khẩn trương, công tâm, rà soát kỹ lưỡng không để sót lọt các đối tượng đủ điều kiện đặc xá.\nRời cánh cổng trại giam về với cuộc sống thường ngày\nTâm trạng rưng rưng, anh Nguyễn Văn Hưng, người phải thi hành bản án 40 tháng tù giam về tội môi giới mại dâm, đại diện 36 phạm nhân được đặc xá đã bày tỏ lòng cảm ơn đến các cán bộ quản giáo, Ban Giám thị trại, CATP Hà Nội, VKSND, TAND đã tạo điều kiện, động viên để cá nhân mình cùng các phạm nhân khác được học tập, rèn luyện, phấn đấu trở thành một công dân tốt, đủ điều kiện được đặc xá hôm nay.\nChị Mai Thị Thu Hường chia sẻ niềm vui của người được đặc xá\nLà một trong 5 phạm nhân nữ được đặc xá vào dịp này, chị Mai Thị Thu Hường chia sẻ, do đã được giảm án một lần nên phần thời gian còn lại tính đến ngày hôm nay, chị phải tiếp tục thi hành án là 11 tháng, nhưng do phấn đấu tốt, chị đã được đặc xá.\n"Ở bên ngoài kia, tôi có gia đình, những đứa con đang chờ đợi mình. Trong giờ phút chia tay này, tôi không biết nói gì hơn ngoài lời cảm ơn cán bộ quản giáo trong những tháng ngày tôi lao động cải tạo tại đây đã cho tôi được học tập, rèn luyện, từ đó sửa đổi từ trong suy nghĩ, trở thành người công dân có ích" - chị Hường bày tỏ.\nÔng Nguyễn Chí Hồng, 63 tuổi, trú tại xã Trung Giã, huyện Sóc Sơn, Hà Nội, phạm nhân lớn tuổi nhất được nhận quyết định đặc xá:\nTừ 1 tháng qua, khi được Ban giám thị trại phát phiếu rà soát xem xét có thuộc diện được đặc xá hay không, tôi đã nhiều đêm không ngủ. Gia đình tôi ở xã Trung Giã, huyện Sóc Sơn, Hà Nội cũng mong đợi ngày tôi được trở về nhà. 
Lúc này tôi cảm thấy sung sướng và xúc động, nhận thức rõ về cái sai của bản thân mình trong quá khứ, nhưng tin tưởng vào tương lai phía trước. Sau khi rời khỏi cánh cổng nhà tạm giam, tôi sẽ tiếp tục phấn đấu trở thành một công dân tốt, có ích cho xã hội bằng việc làm phù hợp với sức khỏe của bản thân.'],
]
scores = model.predict(pairs)  # one row of six label logits per pair
print(scores)
# [[-6.0938 -5.9688 -0.9062 3.4531 5.4062 2.6719]
# [-0.8516 -0.8984 -0.0908 0.3613 0.7461 -1.7109]
# [ 9.8125 3.1094 -1.6562 -1.7969 -3.5 -3.1562]
# [-1.8516 3.9844 2.5156 -0.4043 -2.9531 -4.2188]
# [-0.1514 4. 2.8594 -1.7109 -3.4688 -5.125 ]]
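The scores printed above are raw logits, one column per label id. Below is a minimal NumPy sketch of turning them into probabilities and predicted labels, using the logits copied from that output so no model download is needed. The label-name list `ASSUMED_LABELS` is an assumption: it supposes ids 0 to 5 follow the order of the scale named in the prompt (Problematic, None, Minimal, Basic, Good, Excellent), which this card does not state. Check the model's own label mapping (for example an `id2label` entry in its config, if one is defined) before relying on it.

```python
import numpy as np

# Logits copied from the example output above: one row per (prompt, text) pair,
# one column per label id 0-5.
logits = np.array([
    [-6.0938, -5.9688, -0.9062, 3.4531, 5.4062, 2.6719],
    [-0.8516, -0.8984, -0.0908, 0.3613, 0.7461, -1.7109],
    [9.8125, 3.1094, -1.6562, -1.7969, -3.5, -3.1562],
    [-1.8516, 3.9844, 2.5156, -0.4043, -2.9531, -4.2188],
    [-0.1514, 4.0, 2.8594, -1.7109, -3.4688, -5.125],
])

# ASSUMPTION: label ids follow the order of the scale named in the prompt.
# Verify against the model's own label mapping before relying on these names.
ASSUMED_LABELS = ["Problematic", "None", "Minimal", "Basic", "Good", "Excellent"]


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


probs = softmax(logits)
pred_ids = probs.argmax(axis=1)  # identical to logits.argmax(axis=1)

for row, (pid, p) in enumerate(zip(pred_ids, probs)):
    print(f"pair {row}: id={pid} ({ASSUMED_LABELS[pid]}), p={p[pid]:.2f}")
```

Because softmax is monotonic, the argmax of the probabilities equals the argmax of the raw logits; the probabilities only matter when you want a confidence value or a decision threshold.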
Training dataset columns: query, passage, and label.

| | query | passage | label |
|---|---|---|---|
| type | string | string | int |
| modality | text | text | |

Training samples:

| query | passage | label |
|---|---|---|
| Rate the educational quality of this text on a 6-point scale (Problematic / None / Minimal / Basic / Good / Excellent): | โดย วัชรากร หนูทอง และ อนุกูล น้อยไม้ | 2 |
| Rate the educational quality of this text on a 6-point scale (Problematic / None / Minimal / Basic / Good / Excellent): | According to a 2018 study by the U.S. Department of Health and Human Services, there are approximately 5.7 million Native Americans in the United States. This comprises about 1.7% of the total U.S. population. Because of various social issues that disproportionately affect Native American communities, rates of substance abuse and alcohol abuse are considerably higher in these communities compared to the rest of the U.S. population. | 5 |
| Rate the educational quality of this text on a 6-point scale (Problematic / None / Minimal / Basic / Good / Excellent): | Hat Lady Gaga Fibromyalgie? | 0 |
Loss: CrossEntropyLoss

Evaluation dataset columns: query, passage, and label.

| | query | passage | label |
|---|---|---|---|
| type | string | string | int |
| modality | text | text | |

Evaluation samples:

| query | passage | label |
|---|---|---|
| Rate the educational quality of this text on a 6-point scale (Problematic / None / Minimal / Basic / Good / Excellent): | It’s important to include healthy sources of protein in your diet each day. Protein helps your body with a number of important functions and helps you maintain muscle mass. When you think of protein, steak or chicken might come to mind. But if you’re not a big meat eater, you have other options to make sure you get the recommended amount of protein that your body needs. Worry not, because there are plenty of protein-rich vegetables available year-round. Try out these options for plenty of variety. | 5 |
| Rate the educational quality of this text on a 6-point scale (Problematic / None / Minimal / Basic / Good / Excellent): | หลังเสร็จภารกิจรักต่างคนต่างเข้าสู่นิทราในสภาพไร้เสื้อผ้าอาภรณ์อาจจะเป็นเพราะเหนื่อยเกินกว่าที่จะควานหาอะไรมาปิดกายหรือไม่ต้อง | 2 |
| Rate the educational quality of this text on a 6-point scale (Problematic / None / Minimal / Basic / Good / Excellent): | ideen fur grabbepflanzung fa 1 4 r fa 1 4 r ede ideen fur grabbepflanzung im fruhjahr. | 0 |
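This card lists CrossEntropyLoss as the loss for both datasets. For one example with logits z and gold label y, that loss is logsumexp(z) - z[y]. The first three texts in the usage example appear to match the three evaluation samples above (gold labels 5, 2 and 0; the table shows only their opening lines), so a small NumPy sketch can compute the per-example loss from the logits printed earlier. The function name `cross_entropy` is illustrative, and averaging over examples mirrors the default "mean" reduction of `torch.nn.CrossEntropyLoss`, which is assumed here to be what the library's loss applies.

```python
import numpy as np

# Logits from the usage example for the three texts that appear to match the
# evaluation samples, paired with those samples' gold labels.
logits = np.array([
    [-6.0938, -5.9688, -0.9062, 3.4531, 5.4062, 2.6719],   # protein article
    [-0.8516, -0.8984, -0.0908, 0.3613, 0.7461, -1.7109],  # Thai text
    [9.8125, 3.1094, -1.6562, -1.7969, -3.5, -3.1562],     # German keyword list
])
labels = np.array([5, 2, 0])


def cross_entropy(z: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-example cross-entropy, logsumexp(z) - z[y], computed stably."""
    m = z.max(axis=1, keepdims=True)
    logsumexp = m[:, 0] + np.log(np.exp(z - m).sum(axis=1))
    return logsumexp - z[np.arange(len(y)), y]


per_example = cross_entropy(logits, labels)
print(per_example.round(4))
# ASSUMPTION: mean reduction, the torch.nn.CrossEntropyLoss default.
print("mean:", per_example.mean().round(4))
```

The German keyword list gets a loss close to 0 because nearly all probability mass sits on label 0, while the protein article is penalized because the model's top logit is label 4 rather than the gold label 5.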
Loss: CrossEntropyLoss

Non-default hyperparameters:

- `per_device_train_batch_size`: 32
- `num_train_epochs`: 2
- `learning_rate`: 2e-05
- `warmup_steps`: 0.1
- `weight_decay`: 0.01
- `bf16`: True
- `per_device_eval_batch_size`: 64
- `push_to_hub`: True
- `hub_model_id`: davanstrien/fineweb-c-quality-classifier-v4
- `load_best_model_at_end`: True
- `seed`: 12

All hyperparameters:

- `per_device_train_batch_size`: 32
- `num_train_epochs`: 2
- `max_steps`: -1
- `learning_rate`: 2e-05
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_steps`: 0.1
- `optim`: adamw_torch_fused
- `optim_args`: None
- `weight_decay`: 0.01
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `optim_target_modules`: None
- `gradient_accumulation_steps`: 1
- `average_tokens_across_devices`: True
- `max_grad_norm`: 1.0
- `label_smoothing_factor`: 0.0
- `bf16`: True
- `fp16`: False
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `use_cache`: False
- `neftune_noise_alpha`: None
- `torch_empty_cache_steps`: None
- `auto_find_batch_size`: False
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `include_num_input_tokens_seen`: no
- `log_level`: passive
- `log_level_replica`: warning
- `disable_tqdm`: False
- `project`: huggingface
- `trackio_space_id`: None
- `trackio_bucket_id`: None
- `trackio_static_space_id`: None
- `per_device_eval_batch_size`: 64
- `prediction_loss_only`: True
- `eval_on_start`: False
- `eval_do_concat_batches`: True
- `eval_use_gather_object`: False
- `eval_accumulation_steps`: None
- `include_for_metrics`: []
- `batch_eval_metrics`: False
- `save_only_model`: False
- `save_on_each_node`: False
- `enable_jit_checkpoint`: False
- `push_to_hub`: True
- `hub_private_repo`: None
- `hub_model_id`: davanstrien/fineweb-c-quality-classifier-v4
- `hub_strategy`: every_save
- `hub_always_push`: False
- `hub_revision`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `restore_callback_states_from_checkpoint`: False
- `full_determinism`: False
- `seed`: 12
- `data_seed`: None
- `use_cpu`: False
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `dataloader_prefetch_factor`: None
- `remove_unused_columns`: True
- `label_names`: None
- `train_sampling_strategy`: random
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `ddp_static_graph`: None
- `ddp_backend`: None
- `ddp_timeout`: 1800
- `fsdp`: []
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `deepspeed`: None
- `debug`: []
- `skip_memory_metrics`: True
- `do_predict`: False
- `resume_from_checkpoint`: None
- `warmup_ratio`: None
- `local_rank`: -1
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training logs:

| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.0009 | 1 | 3.4543 | - |
| 0.0208 | 23 | 2.9107 | - |
| 0.0416 | 46 | 2.0832 | - |
| 0.0624 | 69 | 1.3031 | - |
| 0.0832 | 92 | 1.1345 | - |
| 0.1040 | 115 | 1.0150 | - |
| 0.1248 | 138 | 1.0059 | - |
| 0.1456 | 161 | 1.0075 | - |
| 0.1664 | 184 | 0.9778 | - |
| 0.1872 | 207 | 1.0201 | - |
| 0.2007 | 222 | - | 0.9319 |
| 0.2080 | 230 | 0.9740 | - |
| 0.2288 | 253 | 0.9830 | - |
| 0.2495 | 276 | 0.8856 | - |
| 0.2703 | 299 | 0.8201 | - |
| 0.2911 | 322 | 0.8963 | - |
| 0.3119 | 345 | 0.9093 | - |
| 0.3327 | 368 | 0.8476 | - |
| 0.3535 | 391 | 0.8476 | - |
| 0.3743 | 414 | 0.8178 | - |
| 0.3951 | 437 | 0.8832 | - |
| 0.4014 | 444 | - | 0.7573 |
| 0.4159 | 460 | 0.8325 | - |
| 0.4367 | 483 | 0.8152 | - |
| 0.4575 | 506 | 0.7976 | - |
| 0.4783 | 529 | 0.8702 | - |
| 0.4991 | 552 | 0.7890 | - |
| 0.5199 | 575 | 0.8179 | - |
| 0.5407 | 598 | 0.8462 | - |
| 0.5615 | 621 | 0.8134 | - |
| 0.5823 | 644 | 0.8272 | - |
| 0.6022 | 666 | - | 0.7021 |
| 0.6031 | 667 | 0.8505 | - |
| 0.6239 | 690 | 0.7768 | - |
| 0.6447 | 713 | 0.8334 | - |
| 0.6655 | 736 | 0.8041 | - |
| 0.6863 | 759 | 0.8329 | - |
| 0.7071 | 782 | 0.8292 | - |
| 0.7278 | 805 | 0.7947 | - |
| 0.7486 | 828 | 0.7570 | - |
| 0.7694 | 851 | 0.8126 | - |
| 0.7902 | 874 | 0.7791 | - |
| 0.8029 | 888 | - | 0.6981 |
| 0.8110 | 897 | 0.6901 | - |
| 0.8318 | 920 | 0.7495 | - |
| 0.8526 | 943 | 0.8146 | - |
| 0.8734 | 966 | 0.8269 | - |
| 0.8942 | 989 | 0.8353 | - |
| 0.9150 | 1012 | 0.7531 | - |
| 0.9358 | 1035 | 0.8264 | - |
| 0.9566 | 1058 | 0.7690 | - |
| 0.9774 | 1081 | 0.7914 | - |
| 0.9982 | 1104 | 0.8223 | - |
| 1.0036 | 1110 | - | 0.7141 |
| 1.0190 | 1127 | 0.6954 | - |
| 1.0398 | 1150 | 0.6326 | - |
| 1.0606 | 1173 | 0.7012 | - |
| 1.0814 | 1196 | 0.6841 | - |
| 1.1022 | 1219 | 0.6464 | - |
| 1.1230 | 1242 | 0.6649 | - |
| 1.1438 | 1265 | 0.6476 | - |
| 1.1646 | 1288 | 0.6242 | - |
| 1.1854 | 1311 | 0.6742 | - |
| 1.2043 | 1332 | - | 0.6382 |
| 1.2061 | 1334 | 0.6531 | - |
| 1.2269 | 1357 | 0.6511 | - |
| 1.2477 | 1380 | 0.6150 | - |
| 1.2685 | 1403 | 0.6621 | - |
| 1.2893 | 1426 | 0.6222 | - |
| 1.3101 | 1449 | 0.6264 | - |
| 1.3309 | 1472 | 0.6178 | - |
| 1.3517 | 1495 | 0.6288 | - |
| 1.3725 | 1518 | 0.6549 | - |
| 1.3933 | 1541 | 0.6500 | - |
| 1.4051 | 1554 | - | 0.6324 |
| 1.4141 | 1564 | 0.6260 | - |
| 1.4349 | 1587 | 0.6085 | - |
| 1.4557 | 1610 | 0.6279 | - |
| 1.4765 | 1633 | 0.6121 | - |
| 1.4973 | 1656 | 0.6495 | - |
| 1.5181 | 1679 | 0.5821 | - |
| 1.5389 | 1702 | 0.6589 | - |
| 1.5597 | 1725 | 0.6012 | - |
| 1.5805 | 1748 | 0.6777 | - |
| 1.6013 | 1771 | 0.6340 | - |
| 1.6058 | 1776 | - | 0.6266 |
| 1.6221 | 1794 | 0.6347 | - |
| 1.6429 | 1817 | 0.6074 | - |
| 1.6637 | 1840 | 0.6082 | - |
| 1.6844 | 1863 | 0.5885 | - |
| 1.7052 | 1886 | 0.5885 | - |
| 1.7260 | 1909 | 0.6712 | - |
| 1.7468 | 1932 | 0.6356 | - |
| 1.7676 | 1955 | 0.5806 | - |
| 1.7884 | 1978 | 0.5942 | - |
| 1.8065 | 1998 | - | 0.6198 |
| 1.8092 | 2001 | 0.6307 | - |
| 1.8300 | 2024 | 0.6837 | - |
| 1.8508 | 2047 | 0.5723 | - |
| 1.8716 | 2070 | 0.6014 | - |
| 1.8924 | 2093 | 0.6037 | - |
| 1.9132 | 2116 | 0.6628 | - |
| 1.9340 | 2139 | 0.6169 | - |
| 1.9548 | 2162 | 0.5935 | - |
| 1.9756 | 2185 | 0.5934 | - |
| 1.9964 | 2208 | 0.6555 | - |
| 2.0 | 2212 | - | 0.6119 |
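The training log ends at step 2212 after two epochs, with evaluation every 222 steps. Below is a minimal sketch of the learning-rate curve implied by `learning_rate: 2e-05` and `lr_scheduler_type: linear`, plus the checkpoint that `load_best_model_at_end: True` would keep. It assumes the float `warmup_steps: 0.1` means a fraction of the total steps, rounded up (the card also lists `warmup_ratio: None`), and that the best checkpoint is the one with the lowest evaluation loss, which is the Trainer's default when `metric_for_best_model` is not set. The names `linear_lr` and `EVAL_LOG` are hypothetical.

```python
import math

# Values taken from this card's hyperparameters and training log.
PEAK_LR = 2e-05
TOTAL_STEPS = 2212        # last step in the training log (2 epochs)
WARMUP_FRACTION = 0.1     # warmup_steps: 0.1, ASSUMED to be a fraction of TOTAL_STEPS

# ASSUMPTION: the fractional warmup is rounded up to whole steps.
WARMUP_STEPS = math.ceil(WARMUP_FRACTION * TOTAL_STEPS)


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)


# (step, validation loss) rows copied from the training log.
EVAL_LOG = [
    (222, 0.9319), (444, 0.7573), (666, 0.7021), (888, 0.6981), (1110, 0.7141),
    (1332, 0.6382), (1554, 0.6324), (1776, 0.6266), (1998, 0.6198), (2212, 0.6119),
]

# ASSUMPTION: with no metric_for_best_model set, the best checkpoint is the one
# with the lowest evaluation loss.
best_step, best_loss = min(EVAL_LOG, key=lambda row: row[1])

for s in (0, 111, WARMUP_STEPS, 1217, TOTAL_STEPS):
    print(f"step {s:>4}: lr = {linear_lr(s):.2e}")
print("best checkpoint:", best_step, best_loss)
```

Under these assumptions the rate peaks at 2e-05 at step 222, falls to 1e-05 at step 1217 and reaches 0 at step 2212. The lowest validation loss (0.6119) is at the final step, so the best and the last checkpoints coincide.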
Citation for Sentence Transformers (BibTeX):
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Base model: jhu-clsp/mmBERT-base