Buckets:
| Using device: cpu | |
| Loaded existing RL model. | |
| Iter 0, Prompt: [BOS]What is the definition of jussory?, Mean Reward: -0.2500 | |
| /app/src/grpo_train.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor). | |
| m = torch.tensor(mask[i]).to(model.embedding.weight.device) | |
| Iter 1, Prompt: [BOS]What is (91 plus 40) times 79?, Mean Reward: -0.4500 | |
| Iter 2, Prompt: [BOS]What is the definition of willawa?, Mean Reward: -0.2875 | |
| Iter 3, Prompt: [BOS]What is the definition of prayerful?, Mean Reward: -0.3375 | |
| Iter 4, Prompt: [BOS]What is the definition of mirrorize?, Mean Reward: -0.2000 | |
| Iter 5, Prompt: [BOS]What is the definition of inornate?, Mean Reward: -0.2000 | |
| Iter 6, Prompt: [BOS]What is (62 minus 88) times 33?, Mean Reward: -0.3625 | |
| Iter 7, Prompt: [BOS]What is (4 plus 7) plus 71?, Mean Reward: -0.3688 | |
| Iter 8, Prompt: [BOS]What is the definition of undertake?, Mean Reward: -0.2500 | |
| Iter 9, Prompt: [BOS]What is the sum of 400 and 240?, Mean Reward: -0.7750 | |
| Saved checkpoint at iter 10 | |
| Iter 10, Prompt: [BOS]What is (57 plus 20) plus 83?, Mean Reward: -0.3000 | |
| Iter 11, Prompt: [BOS]What is the definition of colcannon?, Mean Reward: -0.2000 | |
| Iter 12, Prompt: [BOS]Which word is longer: 'untaped' or 'duckweed'? If they are equal, say 'both'., Mean Reward: -0.8750 | |
| Iter 13, Prompt: [BOS]What is the definition of remand?, Mean Reward: -0.2000 | |
| Iter 14, Prompt: [BOS]What is the definition of Grizel?, Mean Reward: -0.2000 | |
| Iter 15, Prompt: [BOS]Which word is longer: 'myriapoda' or 'uninformed'? If they are equal, say 'both'., Mean Reward: -1.0875 | |
| Iter 16, Prompt: [BOS]What is the product of 219 and 219?, Mean Reward: -0.7000 | |
| Iter 17, Prompt: [BOS]What is (98 minus 2) plus 41?, Mean Reward: -0.3625 | |
| Iter 18, Prompt: [BOS]Which word is longer: 'diligency' or 'dothienenteritis'? If they are equal, say 'both'., Mean Reward: -0.9500 | |
| Iter 19, Prompt: [BOS]What is the definition of twinling?, Mean Reward: -0.2500 | |
| Saved checkpoint at iter 20 | |
| Iter 20, Prompt: [BOS]What is the difference between 8 and 134?, Mean Reward: -0.3750 | |
| Iter 21, Prompt: [BOS]What is the definition of whitter?, Mean Reward: -0.3125 | |
| Iter 22, Prompt: [BOS]What is the difference between 462 and 201?, Mean Reward: -0.4875 | |
| Iter 23, Prompt: [BOS]What is the difference between 322 and 339?, Mean Reward: -0.3625 | |
| Iter 24, Prompt: [BOS]Which word is longer: 'posttubercular' or 'chirping'? If they are equal, say 'both'., Mean Reward: -1.3875 | |
| Iter 25, Prompt: [BOS]What is the definition of clavola?, Mean Reward: -0.2375 | |
| Iter 26, Prompt: [BOS]What is the definition of abduction?, Mean Reward: -0.2000 | |
| Iter 27, Prompt: [BOS]What is (64 minus 60) plus 90?, Mean Reward: -0.6125 | |
| Iter 28, Prompt: [BOS]What is the sum of 226 and 299?, Mean Reward: -1.0125 | |
| Iter 29, Prompt: [BOS]What is (38 plus 72) plus 15?, Mean Reward: -0.4125 | |
| Saved checkpoint at iter 30 | |
| Iter 30, Prompt: [BOS]What is (39 minus 43) times 58?, Mean Reward: -0.2500 | |
| Iter 31, Prompt: [BOS]What is the definition of solarium?, Mean Reward: -0.2000 | |
| Iter 32, Prompt: [BOS]What is the sum of 63 and 330?, Mean Reward: -0.8125 | |
| Iter 33, Prompt: [BOS]What is the definition of zeolitize?, Mean Reward: -0.3375 | |
| Iter 34, Prompt: [BOS]Which word is longer: 'quakerbird' or 'dinitrophenol'? If they are equal, say 'both'., Mean Reward: -1.0625 | |
| Iter 35, Prompt: [BOS]What is the sum of 80 and 11?, Mean Reward: -0.2688 | |
| Iter 36, Prompt: [BOS]What is the product of 254 and 205?, Mean Reward: -0.5000 | |
| Iter 37, Prompt: [BOS]What is the definition of plisky?, Mean Reward: -0.5875 | |
| Iter 38, Prompt: [BOS]What is the definition of sough?, Mean Reward: -0.4875 | |
| Iter 39, Prompt: [BOS]What is the difference between 306 and 434?, Mean Reward: -0.8125 | |
| Saved checkpoint at iter 40 | |
| Iter 40, Prompt: [BOS]What is the product of 231 and 23?, Mean Reward: -0.5500 | |
| Iter 41, Prompt: [BOS]What is (24 minus 12) times 66?, Mean Reward: -0.3375 | |
| Iter 42, Prompt: [BOS]What is the definition of maledict?, Mean Reward: -0.2000 | |
| Iter 43, Prompt: [BOS]What is the antonym of 'reducing'?, Mean Reward: -0.7375 | |
| Iter 44, Prompt: [BOS]Which word is longer: 'stomatotomy' or 'safranine'? If they are equal, say 'both'., Mean Reward: -1.4875 | |
| Iter 45, Prompt: [BOS]What is (18 minus 53) times 49?, Mean Reward: -0.3375 | |
| Iter 46, Prompt: [BOS]What is the definition of primy?, Mean Reward: -0.1375 | |
| Iter 47, Prompt: [BOS]What is the definition of mulga?, Mean Reward: -0.3500 | |
| Iter 48, Prompt: [BOS]What is the sum of 17 and 281?, Mean Reward: -0.8000 | |
| Iter 49, Prompt: [BOS]What is the definition of chester?, Mean Reward: -0.2125 | |
| Saved checkpoint at iter 50 | |
| Iter 50, Prompt: [BOS]What is the definition of quei?, Mean Reward: -0.4125 | |
| Iter 51, Prompt: [BOS]Which word is longer: 'depravation' or 'woolen'? If they are equal, say 'both'., Mean Reward: -1.0875 | |
| Iter 52, Prompt: [BOS]What is the definition of paleolate?, Mean Reward: -0.2000 | |
| Iter 53, Prompt: [BOS]Which word is longer: 'tetrabasicity' or 'garabato'? If they are equal, say 'both'., Mean Reward: -1.0250 | |
| Iter 54, Prompt: [BOS]What is (84 minus 66) plus 11?, Mean Reward: -0.3625 | |
| Iter 55, Prompt: [BOS]Which word is longer: 'dotterel' or 'cribrately'? If they are equal, say 'both'., Mean Reward: -1.3500 | |
| Iter 56, Prompt: [BOS]What is the antonym of 'decreasing'?, Mean Reward: -0.2750 | |
| Iter 57, Prompt: [BOS]Which word is longer: 'savation' or 'dilatedly'? If they are equal, say 'both'., Mean Reward: -0.7500 | |
| Iter 58, Prompt: [BOS]Which word is longer: 'steinerian' or 'affluence'? If they are equal, say 'both'., Mean Reward: -0.9000 | |
| Iter 59, Prompt: [BOS]What is the product of 173 and 224?, Mean Reward: -0.4000 | |
| Saved checkpoint at iter 60 | |
| Iter 60, Prompt: [BOS]What is the antonym of 'slug'?, Mean Reward: -0.4500 | |
| Iter 61, Prompt: [BOS]What is the definition of boschbok?, Mean Reward: -0.4500 | |
| Iter 62, Prompt: [BOS]What is the definition of idiotry?, Mean Reward: -0.2000 | |
| Iter 63, Prompt: [BOS]What is (58 minus 59) times 47?, Mean Reward: -1.1500 | |
| Iter 64, Prompt: [BOS]Which word is longer: 'whipgraft' or 'excommunicatory'? If they are equal, say 'both'., Mean Reward: -1.4875 | |
| Iter 65, Prompt: [BOS]What is the sum of 173 and 133?, Mean Reward: -0.5250 | |
| Iter 66, Prompt: [BOS]What is a synonym for 'interim'?, Mean Reward: -0.4125 | |
| Iter 67, Prompt: [BOS]Which word is longer: 'junglewood' or 'afforest'? If they are equal, say 'both'., Mean Reward: -0.8625 | |
| Iter 68, Prompt: [BOS]What is (60 minus 60) times 33?, Mean Reward: -0.6000 | |
| Iter 69, Prompt: [BOS]What is the definition of schnabel?, Mean Reward: -0.2000 | |
| Saved checkpoint at iter 70 | |
| Iter 70, Prompt: [BOS]What is the definition of pretense?, Mean Reward: -0.2000 | |
| Iter 71, Prompt: [BOS]What is the product of 179 and 302?, Mean Reward: -0.8125 | |
| Iter 72, Prompt: [BOS]What is the definition of jatamansi?, Mean Reward: -0.3875 | |
| Iter 73, Prompt: [BOS]What is the definition of unmoral?, Mean Reward: -0.2000 | |
| Iter 74, Prompt: [BOS]What is the difference between 338 and 356?, Mean Reward: -0.7375 | |
| Iter 75, Prompt: [BOS]What is the definition of diagnosis?, Mean Reward: -0.2500 | |
| Iter 76, Prompt: [BOS]What is the definition of wisdomful?, Mean Reward: -0.1750 | |
| Iter 77, Prompt: [BOS]What is the sum of 25 and 122?, Mean Reward: -0.4750 | |
| Iter 78, Prompt: [BOS]What is the definition of foodstuff?, Mean Reward: -0.3750 | |
| Iter 79, Prompt: [BOS]What is the sum of 499 and 115?, Mean Reward: -0.6750 | |
| Saved checkpoint at iter 80 | |
| Iter 80, Prompt: [BOS]What is (95 plus 64) plus 47?, Mean Reward: -0.1875 | |
| Iter 81, Prompt: [BOS]Which word is longer: 'supervisance' or 'electropotential'? If they are equal, say 'both'., Mean Reward: -0.7750 | |
| Iter 82, Prompt: [BOS]Which word is longer: 'zymophosphate' or 'utriculosaccular'? If they are equal, say 'both'., Mean Reward: -1.2500 | |
| Iter 83, Prompt: [BOS]What is the definition of untested?, Mean Reward: -0.2000 | |
| Iter 84, Prompt: [BOS]What is the antonym of 'disliking'?, Mean Reward: -0.1750 | |
| Iter 85, Prompt: [BOS]What is (70 minus 4) times 49?, Mean Reward: -0.6125 | |
| Iter 86, Prompt: [BOS]What is the difference between 281 and 383?, Mean Reward: -0.6375 | |
| Iter 87, Prompt: [BOS]Which word is longer: 'bedspring' or 'chucklingly'? If they are equal, say 'both'., Mean Reward: -0.7750 | |
| Iter 88, Prompt: [BOS]What is the sum of 27 and 442?, Mean Reward: -0.6625 | |
| Iter 89, Prompt: [BOS]What is the product of 357 and 358?, Mean Reward: -0.5125 | |
| Saved checkpoint at iter 90 | |
| Iter 90, Prompt: [BOS]What is (47 minus 91) times 58?, Mean Reward: -0.4375 | |
| Iter 91, Prompt: [BOS]What is the product of 380 and 363?, Mean Reward: -0.5125 | |
| Iter 92, Prompt: [BOS]What is the definition of logoi?, Mean Reward: -0.2500 | |
| Iter 93, Prompt: [BOS]What is the definition of adenose?, Mean Reward: -0.2500 | |
| Iter 94, Prompt: [BOS]Which word is longer: 'extremist' or 'steerability'? If they are equal, say 'both'., Mean Reward: -1.3375 | |
| Iter 95, Prompt: [BOS]What is the product of 115 and 480?, Mean Reward: -0.4125 | |
| Iter 96, Prompt: [BOS]What is the product of 183 and 378?, Mean Reward: -0.8000 | |
| Iter 97, Prompt: [BOS]What is (71 minus 45) plus 78?, Mean Reward: -0.4000 | |
| Iter 98, Prompt: [BOS]What is the definition of paraffle?, Mean Reward: -0.2125 | |
| Iter 99, Prompt: [BOS]What is the product of 412 and 203?, Mean Reward: -0.8500 | |
| Saved checkpoint at iter 100 | |
| Iter 100, Prompt: [BOS]What is the definition of unifloral?, Mean Reward: -0.2125 | |
| Iter 101, Prompt: [BOS]What is a synonym for 'pastime'?, Mean Reward: -0.3875 | |
| Iter 102, Prompt: [BOS]What is the definition of pongee?, Mean Reward: -0.2000 | |
| Iter 103, Prompt: [BOS]What is (45 plus 48) plus 76?, Mean Reward: -0.9000 | |
| Iter 104, Prompt: [BOS]What is (44 plus 75) plus 13?, Mean Reward: -0.4625 | |
| Iter 105, Prompt: [BOS]What is the sum of 280 and 234?, Mean Reward: -0.7625 | |
| Iter 106, Prompt: [BOS]What is the definition of undittoed?, Mean Reward: -0.2000 | |
| Iter 107, Prompt: [BOS]Which word is longer: 'plurification' or 'wardress'? If they are equal, say 'both'., Mean Reward: -1.1500 | |
| Iter 108, Prompt: [BOS]What is (84 minus 69) plus 75?, Mean Reward: -0.7625 | |
| Iter 109, Prompt: [BOS]What is the definition of gibber?, Mean Reward: -0.4000 | |
| Saved checkpoint at iter 110 | |
| Iter 110, Prompt: [BOS]What is the sum of 115 and 328?, Mean Reward: -0.6125 | |
| Iter 111, Prompt: [BOS]Which word is longer: 'muchness' or 'aeschynanthus'? If they are equal, say 'both'., Mean Reward: -0.9625 | |
| Iter 112, Prompt: [BOS]What is (18 plus 33) plus 13?, Mean Reward: -0.2125 | |
| Iter 113, Prompt: [BOS]What is the definition of crankily?, Mean Reward: -0.3375 | |
| Iter 114, Prompt: [BOS]Which word is longer: 'superdreadnought' or 'daphnin'? If they are equal, say 'both'., Mean Reward: -0.8375 | |
| Iter 115, Prompt: [BOS]What is the difference between 225 and 351?, Mean Reward: -0.9125 | |
| Iter 116, Prompt: [BOS]What is the definition of bemoaner?, Mean Reward: -0.2125 | |
| Iter 117, Prompt: [BOS]What is the definition of Lushai?, Mean Reward: -0.2000 | |
| Iter 118, Prompt: [BOS]What is the definition of drawsheet?, Mean Reward: -0.3500 | |
| Iter 119, Prompt: [BOS]What is the definition of cleanable?, Mean Reward: -0.2000 | |
| Saved checkpoint at iter 120 | |
| Iter 120, Prompt: [BOS]What is the definition of Baganda?, Mean Reward: -0.2500 | |
| Iter 121, Prompt: [BOS]What is the definition of canoodler?, Mean Reward: -0.3125 | |
| Iter 122, Prompt: [BOS]What is the definition of grigri?, Mean Reward: -0.3875 | |
| Iter 123, Prompt: [BOS]What is the product of 130 and 150?, Mean Reward: -0.8500 | |
| Iter 124, Prompt: [BOS]What is the product of 106 and 53?, Mean Reward: -0.4625 | |
| Iter 125, Prompt: [BOS]What is (19 plus 10) times 91?, Mean Reward: -0.1937 | |
| Iter 126, Prompt: [BOS]What is the difference between 410 and 496?, Mean Reward: -0.5875 | |
| Iter 127, Prompt: [BOS]What is the definition of Bihari?, Mean Reward: -0.3500 | |
| Iter 128, Prompt: [BOS]What is (28 plus 77) plus 15?, Mean Reward: -0.4750 | |
| Iter 129, Prompt: [BOS]What is the product of 17 and 430?, Mean Reward: -0.7625 | |
| Saved checkpoint at iter 130 | |
| Iter 130, Prompt: [BOS]What is (81 minus 68) times 26?, Mean Reward: -0.4000 | |
| Iter 131, Prompt: [BOS]Which word is longer: 'abietene' or 'unbound'? If they are equal, say 'both'., Mean Reward: -0.8375 | |
| Iter 132, Prompt: [BOS]What is the definition of anodynia?, Mean Reward: -0.2000 | |
| Iter 133, Prompt: [BOS]What is the definition of Teri?, Mean Reward: -0.2250 | |
| Iter 134, Prompt: [BOS]What is (19 plus 28) times 37?, Mean Reward: -0.5250 | |
| Iter 135, Prompt: [BOS]What is the definition of soggily?, Mean Reward: -0.1625 | |
| Iter 136, Prompt: [BOS]What is (86 plus 9) times 51?, Mean Reward: -0.9250 | |
| Iter 137, Prompt: [BOS]What is the antonym of 'cautiousness'?, Mean Reward: -0.1750 | |
| Iter 138, Prompt: [BOS]What is (4 plus 3) plus 88?, Mean Reward: -0.6375 | |
| Iter 139, Prompt: [BOS]What is the difference between 86 and 286?, Mean Reward: -0.6375 | |
| Saved checkpoint at iter 140 | |
| Iter 140, Prompt: [BOS]What is (3 plus 4) plus 48?, Mean Reward: -0.7375 | |
| Iter 141, Prompt: [BOS]What is the definition of palus?, Mean Reward: -0.3000 | |
| Iter 142, Prompt: [BOS]Which word is longer: 'periarctic' or 'atomiferous'? If they are equal, say 'both'., Mean Reward: -1.1125 | |
| Iter 143, Prompt: [BOS]What is the product of 428 and 93?, Mean Reward: -0.7875 | |
| Iter 144, Prompt: [BOS]What is (47 plus 30) times 27?, Mean Reward: -0.5375 | |
| Iter 145, Prompt: [BOS]What is the definition of fluidible?, Mean Reward: -0.2125 | |
| Iter 146, Prompt: [BOS]What is (29 plus 52) times 71?, Mean Reward: -0.4875 | |
| Iter 147, Prompt: [BOS]Which word is longer: 'unlocalize' or 'igniter'? If they are equal, say 'both'., Mean Reward: -0.6500 | |
| Iter 148, Prompt: [BOS]What is the definition of sciophyte?, Mean Reward: -0.4625 | |
| Iter 149, Prompt: [BOS]What is the product of 400 and 3?, Mean Reward: -0.3438 | |
| Saved checkpoint at iter 150 | |
| Iter 150, Prompt: [BOS]What is the definition of Visayan?, Mean Reward: -0.0500 | |
| Iter 151, Prompt: [BOS]What is the product of 231 and 458?, Mean Reward: -0.6375 | |
| Iter 152, Prompt: [BOS]What is the definition of alvar?, Mean Reward: -0.3750 | |
| Iter 153, Prompt: [BOS]What is the definition of asiderite?, Mean Reward: -0.2000 | |
| Iter 154, Prompt: [BOS]What is (83 minus 99) times 13?, Mean Reward: -0.2625 | |
| Iter 155, Prompt: [BOS]What is the sum of 50 and 400?, Mean Reward: -1.1125 | |
| Iter 156, Prompt: [BOS]What is (46 minus 64) times 92?, Mean Reward: -0.5000 | |
| Iter 157, Prompt: [BOS]Which word is longer: 'polyglottist' or 'stinkardly'? If they are equal, say 'both'., Mean Reward: -1.4375 | |
| Iter 158, Prompt: [BOS]What is the definition of queenhood?, Mean Reward: -0.2625 | |
| Iter 159, Prompt: [BOS]What is (20 minus 81) plus 48?, Mean Reward: -0.6000 | |
| Saved checkpoint at iter 160 | |
| Iter 160, Prompt: [BOS]What is (8 minus 60) times 45?, Mean Reward: -0.8062 | |
| Iter 161, Prompt: [BOS]What is the difference between 440 and 242?, Mean Reward: -0.6000 | |
| Iter 162, Prompt: [BOS]Which word is longer: 'managerially' or 'cyclosporous'? If they are equal, say 'both'., Mean Reward: -0.7625 | |
| Iter 163, Prompt: [BOS]Which word is longer: 'kingcraft' or 'gerygone'? If they are equal, say 'both'., Mean Reward: -0.5250 | |
| Iter 164, Prompt: [BOS]What is the sum of 88 and 234?, Mean Reward: -1.0625 | |
| Iter 165, Prompt: [BOS]What is the definition of crower?, Mean Reward: -0.2625 | |
| Iter 166, Prompt: [BOS]What is the definition of jubilate?, Mean Reward: -0.3250 | |
| Iter 167, Prompt: [BOS]What is (6 plus 84) plus 15?, Mean Reward: -0.2437 | |
| Iter 168, Prompt: [BOS]What is the product of 491 and 55?, Mean Reward: -0.6500 | |
| Iter 169, Prompt: [BOS]What is the product of 109 and 319?, Mean Reward: -0.7000 | |
| Saved checkpoint at iter 170 | |
| Iter 170, Prompt: [BOS]What is (18 minus 7) times 7?, Mean Reward: -0.1875 | |
| Iter 171, Prompt: [BOS]What is the sum of 26 and 382?, Mean Reward: -0.8625 | |
| Iter 172, Prompt: [BOS]What is the product of 347 and 172?, Mean Reward: -0.6625 | |
| Iter 173, Prompt: [BOS]Which word is longer: 'kusum' or 'cyperaceous'? If they are equal, say 'both'., Mean Reward: -0.7250 | |
| Iter 174, Prompt: [BOS]Which word is longer: 'montezuma' or 'crowberry'? If they are equal, say 'both'., Mean Reward: -0.9000 | |
| Iter 175, Prompt: [BOS]Which word is longer: 'heracliteanism' or 'syncrasy'? If they are equal, say 'both'., Mean Reward: -0.9375 | |
| Iter 176, Prompt: [BOS]Which word is longer: 'feculency' or 'accusable'? If they are equal, say 'both'., Mean Reward: -1.3125 | |
| Iter 177, Prompt: [BOS]What is (9 minus 67) plus 20?, Mean Reward: -0.5250 | |
| Iter 178, Prompt: [BOS]Which word is longer: 'deconsider' or 'mulletry'? If they are equal, say 'both'., Mean Reward: -0.4750 | |
| Iter 179, Prompt: [BOS]What is the sum of 94 and 117?, Mean Reward: -0.6750 | |
| Saved checkpoint at iter 180 | |
| Iter 180, Prompt: [BOS]What is the definition of levanter?, Mean Reward: -0.2625 | |
| Iter 181, Prompt: [BOS]What is the definition of conspire?, Mean Reward: -0.2000 | |
| Iter 182, Prompt: [BOS]What is the definition of caulicule?, Mean Reward: -0.3625 | |
| Iter 183, Prompt: [BOS]What is the antonym of 'opposite'?, Mean Reward: -0.5500 | |
| Iter 184, Prompt: [BOS]What is the product of 328 and 146?, Mean Reward: -0.9125 | |
| Iter 185, Prompt: [BOS]What is the sum of 304 and 396?, Mean Reward: -0.6250 | |
| Iter 186, Prompt: [BOS]What is the definition of finikin?, Mean Reward: -0.2000 | |
| Iter 187, Prompt: [BOS]What is the product of 294 and 244?, Mean Reward: -0.5250 | |
| Iter 188, Prompt: [BOS]What is the difference between 328 and 410?, Mean Reward: -0.5000 | |
| Iter 189, Prompt: [BOS]What is (36 plus 18) times 77?, Mean Reward: -0.4500 | |
| Saved checkpoint at iter 190 | |
| Iter 190, Prompt: [BOS]Which word is longer: 'carnalism' or 'aloelike'? If they are equal, say 'both'., Mean Reward: -0.6750 | |
| Iter 191, Prompt: [BOS]What is the definition of cruent?, Mean Reward: -0.4000 | |
| Iter 192, Prompt: [BOS]What is the definition of scabbler?, Mean Reward: -0.2000 | |
| Iter 193, Prompt: [BOS]Which word is longer: 'photomapper' or 'neet'? If they are equal, say 'both'., Mean Reward: -0.7000 | |
| Iter 194, Prompt: [BOS]What is the difference between 20 and 307?, Mean Reward: -0.5000 | |
| Iter 195, Prompt: [BOS]What is the definition of ambrosin?, Mean Reward: -0.2000 | |
| Iter 196, Prompt: [BOS]Which word is longer: 'mephitical' or 'chalutzim'? If they are equal, say 'both'., Mean Reward: -1.1125 | |
| Iter 197, Prompt: [BOS]Which word is longer: 'nondealer' or 'indigestibility'? If they are equal, say 'both'., Mean Reward: -1.0375 | |
| Iter 198, Prompt: [BOS]What is the definition of spoonbill?, Mean Reward: -0.2000 | |
| Iter 199, Prompt: [BOS]What is the definition of Crambe?, Mean Reward: -0.2125 | |
| Saved checkpoint at iter 200 | |
| Iter 200, Prompt: [BOS]What is (93 plus 24) times 73?, Mean Reward: -0.7375 | |
| Iter 201, Prompt: [BOS]Which word is longer: 'chapournetted' or 'ayah'? If they are equal, say 'both'., Mean Reward: -0.6375 | |
| Iter 202, Prompt: [BOS]What is (82 minus 39) times 83?, Mean Reward: 0.0750 | |
| Iter 203, Prompt: [BOS]What is the definition of Lyonnais?, Mean Reward: -0.2000 | |
| Iter 204, Prompt: [BOS]What is the difference between 413 and 161?, Mean Reward: -0.7000 | |
| Iter 205, Prompt: [BOS]What is the definition of sleety?, Mean Reward: -0.2000 | |
| Iter 206, Prompt: [BOS]What is (54 minus 71) plus 40?, Mean Reward: -0.4250 | |
| Iter 207, Prompt: [BOS]What is the difference between 68 and 387?, Mean Reward: -0.5000 | |
| Iter 208, Prompt: [BOS]Which word is longer: 'disputeless' or 'shaft'? If they are equal, say 'both'., Mean Reward: -1.0875 | |
| Iter 209, Prompt: [BOS]What is (81 minus 12) times 53?, Mean Reward: -0.0375 | |
| Saved checkpoint at iter 210 | |
| Iter 210, Prompt: [BOS]What is the definition of grieved?, Mean Reward: -0.2375 | |
| Iter 211, Prompt: [BOS]What is (71 minus 58) plus 46?, Mean Reward: -0.2375 | |
| Iter 212, Prompt: [BOS]Which word is longer: 'edifyingly' or 'bigemina'? If they are equal, say 'both'., Mean Reward: -0.8875 | |
| Iter 213, Prompt: [BOS]What is (6 minus 74) times 48?, Mean Reward: -0.1562 | |
| Iter 214, Prompt: [BOS]What is the definition of viva?, Mean Reward: -0.2625 | |
| Iter 215, Prompt: [BOS]What is the antonym of 'uxorial'?, Mean Reward: -0.4375 | |
| Iter 216, Prompt: [BOS]What is the sum of 297 and 108?, Mean Reward: -0.4875 | |
| Iter 217, Prompt: [BOS]What is (86 plus 47) plus 85?, Mean Reward: -0.3500 | |
| Iter 218, Prompt: [BOS]What is the definition of etiology?, Mean Reward: -0.2375 | |
| Iter 219, Prompt: [BOS]What is the definition of Yadava?, Mean Reward: -0.2000 | |
| Saved checkpoint at iter 220 | |
| Iter 220, Prompt: [BOS]What is the antonym of 'opposed'?, Mean Reward: -0.3750 | |
| Iter 221, Prompt: [BOS]What is the definition of unplowed?, Mean Reward: -0.2000 | |
| Iter 222, Prompt: [BOS]What is the sum of 484 and 21?, Mean Reward: -0.5188 | |
| Iter 223, Prompt: [BOS]Which word is longer: 'glabrous' or 'ironical'? If they are equal, say 'both'., Mean Reward: -0.9500 | |
| Iter 224, Prompt: [BOS]What is the sum of 360 and 396?, Mean Reward: -0.6750 | |
| Iter 225, Prompt: [BOS]What is the difference between 194 and 377?, Mean Reward: -0.6500 | |
| Iter 226, Prompt: [BOS]What is the definition of auld?, Mean Reward: -0.2000 | |
| Iter 227, Prompt: [BOS]What is the difference between 403 and 487?, Mean Reward: -0.6000 | |
| Iter 228, Prompt: [BOS]What is (88 minus 13) times 95?, Mean Reward: -0.2625 | |
| Iter 229, Prompt: [BOS]What is (41 plus 20) times 37?, Mean Reward: -0.6500 | |
| Saved checkpoint at iter 230 | |
| Iter 230, Prompt: [BOS]What is the sum of 165 and 295?, Mean Reward: -0.7125 | |
| Iter 231, Prompt: [BOS]What is the definition of secondary?, Mean Reward: -0.2000 | |
| Iter 232, Prompt: [BOS]What is the definition of Babelize?, Mean Reward: -0.2000 | |
| Iter 233, Prompt: [BOS]What is the product of 404 and 126?, Mean Reward: -0.4750 | |
| Iter 234, Prompt: [BOS]What is the definition of envoy?, Mean Reward: -0.3250 | |
| Iter 235, Prompt: [BOS]What is a synonym for 'dupery'?, Mean Reward: -0.4375 | |
| Iter 236, Prompt: [BOS]What is the definition of dolldom?, Mean Reward: -0.2625 | |
| Iter 237, Prompt: [BOS]What is the definition of Ephemera?, Mean Reward: -0.2000 | |
| Iter 238, Prompt: [BOS]What is (21 minus 66) times 68?, Mean Reward: -0.6938 | |
| Iter 239, Prompt: [BOS]What is the antonym of 'apprize'?, Mean Reward: -0.2875 | |
| Saved checkpoint at iter 240 | |
| Iter 240, Prompt: [BOS]What is the sum of 172 and 48?, Mean Reward: -0.4750 | |
| Iter 241, Prompt: [BOS]What is (72 plus 60) plus 18?, Mean Reward: -0.6437 | |
| Iter 242, Prompt: [BOS]What is the difference between 203 and 140?, Mean Reward: -0.5125 | |
| Iter 243, Prompt: [BOS]What is the definition of ancillary?, Mean Reward: -0.2125 | |
| Iter 244, Prompt: [BOS]What is (3 minus 24) plus 20?, Mean Reward: -1.0125 | |
| Iter 245, Prompt: [BOS]Which word is longer: 'unpitiable' or 'aggregant'? If they are equal, say 'both'., Mean Reward: -1.3750 | |
| Iter 246, Prompt: [BOS]What is the antonym of 'contestable'?, Mean Reward: -0.4250 | |
| Iter 247, Prompt: [BOS]What is the product of 172 and 242?, Mean Reward: -0.8750 | |
| Iter 248, Prompt: [BOS]What is the definition of drongo?, Mean Reward: -0.4125 | |
| Iter 249, Prompt: [BOS]Which word is longer: 'fabes' or 'mendelssohnic'? If they are equal, say 'both'., Mean Reward: -0.9375 | |
| Saved checkpoint at iter 250 | |
| Iter 250, Prompt: [BOS]What is the definition of camwood?, Mean Reward: -0.2375 | |
| Iter 251, Prompt: [BOS]What is the sum of 337 and 458?, Mean Reward: -0.5000 | |
| Iter 252, Prompt: [BOS]What is the sum of 337 and 235?, Mean Reward: -0.7000 | |
| Iter 253, Prompt: [BOS]What is the definition of athyrosis?, Mean Reward: -0.3750 | |
| Iter 254, Prompt: [BOS]What is (62 plus 12) times 71?, Mean Reward: -0.4750 | |
| Iter 255, Prompt: [BOS]What is (72 plus 66) times 70?, Mean Reward: -0.2812 | |
| Iter 256, Prompt: [BOS]What is (73 plus 64) times 40?, Mean Reward: -0.4375 | |
| Iter 257, Prompt: [BOS]What is the sum of 372 and 269?, Mean Reward: -0.5250 | |
| Iter 258, Prompt: [BOS]What is the definition of wetly?, Mean Reward: -0.2500 | |
| Iter 259, Prompt: [BOS]What is the definition of disenjoy?, Mean Reward: -0.4000 | |
| Saved checkpoint at iter 260 | |
| Iter 260, Prompt: [BOS]Which word is longer: 'incorporealism' or 'schedar'? If they are equal, say 'both'., Mean Reward: -1.1500 | |
| Iter 261, Prompt: [BOS]What is the definition of Nepa?, Mean Reward: -0.4875 | |
| Iter 262, Prompt: [BOS]What is (37 plus 66) times 82?, Mean Reward: -0.6750 | |
| Iter 263, Prompt: [BOS]What is the product of 162 and 125?, Mean Reward: -0.5875 | |
| Iter 264, Prompt: [BOS]What is the definition of wingy?, Mean Reward: -0.2000 | |
| Iter 265, Prompt: [BOS]What is (31 plus 20) times 38?, Mean Reward: -0.5625 | |
| Iter 266, Prompt: [BOS]What is the definition of hatchable?, Mean Reward: -0.2000 | |
| Iter 267, Prompt: [BOS]What is the definition of remain?, Mean Reward: -0.2000 | |
| Iter 268, Prompt: [BOS]What is (90 plus 25) times 66?, Mean Reward: -0.2625 | |
| Iter 269, Prompt: [BOS]What is the difference between 274 and 141?, Mean Reward: -0.8000 | |
| Saved checkpoint at iter 270 | |
| Iter 270, Prompt: [BOS]Which word is longer: 'microrefractometer' or 'frough'? If they are equal, say 'both'., Mean Reward: -0.9125 | |
| Iter 271, Prompt: [BOS]What is the sum of 361 and 418?, Mean Reward: -0.6625 | |
| Iter 272, Prompt: [BOS]What is the definition of epigraphy?, Mean Reward: -0.4500 | |
| Iter 273, Prompt: [BOS]What is the sum of 344 and 78?, Mean Reward: -0.6625 | |
| Iter 274, Prompt: [BOS]What is (74 plus 39) plus 3?, Mean Reward: -0.1812 | |
| Iter 275, Prompt: [BOS]What is the definition of ascend?, Mean Reward: -0.2000 | |
| Iter 276, Prompt: [BOS]What is the definition of takedown?, Mean Reward: -0.4000 | |
| Iter 277, Prompt: [BOS]What is the product of 281 and 13?, Mean Reward: -0.6500 | |
| Iter 278, Prompt: [BOS]What is the sum of 446 and 405?, Mean Reward: -0.6625 | |
| Iter 279, Prompt: [BOS]What is (32 plus 56) times 39?, Mean Reward: -0.5813 | |
| Saved checkpoint at iter 280 | |
| Iter 280, Prompt: [BOS]What is (88 minus 26) plus 34?, Mean Reward: -0.5500 | |
| Iter 281, Prompt: [BOS]What is the definition of bluffable?, Mean Reward: -0.5250 | |
| Iter 282, Prompt: [BOS]What is the product of 462 and 48?, Mean Reward: -0.5500 | |
| Iter 283, Prompt: [BOS]Which word is longer: 'heteronymously' or 'cantalite'? If they are equal, say 'both'., Mean Reward: -1.0375 | |
| Iter 284, Prompt: [BOS]Which word is longer: 'pondside' or 'dissuade'? If they are equal, say 'both'., Mean Reward: -0.6375 | |
| Iter 285, Prompt: [BOS]What is the difference between 466 and 302?, Mean Reward: -0.5250 | |
| Iter 286, Prompt: [BOS]Which word is longer: 'craniate' or 'berrypicking'? If they are equal, say 'both'., Mean Reward: -1.0125 | |
| Iter 287, Prompt: [BOS]What is the definition of overpick?, Mean Reward: -0.4125 | |
| Iter 288, Prompt: [BOS]What is (76 minus 27) plus 86?, Mean Reward: -0.5625 | |
| Iter 289, Prompt: [BOS]What is the product of 198 and 104?, Mean Reward: -0.6375 | |
| Saved checkpoint at iter 290 | |
| Iter 290, Prompt: [BOS]Which word is longer: 'grudgingness' or 'allergy'? If they are equal, say 'both'., Mean Reward: -1.3000 | |
| Iter 291, Prompt: [BOS]What is the product of 264 and 451?, Mean Reward: -0.6875 | |
| Iter 292, Prompt: [BOS]What is the antonym of 'entozoan'?, Mean Reward: -0.5125 | |
| Iter 293, Prompt: [BOS]What is the definition of grege?, Mean Reward: -0.3375 | |
| Iter 294, Prompt: [BOS]What is (74 minus 39) times 58?, Mean Reward: -0.1000 | |
| Iter 295, Prompt: [BOS]What is the product of 396 and 82?, Mean Reward: -0.6125 | |
| Iter 296, Prompt: [BOS]What is the definition of Achakzai?, Mean Reward: -0.2000 | |
| Iter 297, Prompt: [BOS]What is the definition of biaxially?, Mean Reward: -0.2000 | |
| Iter 298, Prompt: [BOS]What is the definition of knapweed?, Mean Reward: -0.2125 | |
| Iter 299, Prompt: [BOS]What is (64 minus 12) plus 62?, Mean Reward: -0.6375 | |
| Saved checkpoint at iter 300 | |
| Iter 300, Prompt: [BOS]What is the antonym of 'take'?, Mean Reward: -0.2500 | |
| Iter 301, Prompt: [BOS]What is the definition of locate?, Mean Reward: -0.2125 | |
| Iter 302, Prompt: [BOS]What is the definition of presignal?, Mean Reward: -0.2000 | |
| Iter 303, Prompt: [BOS]What is the product of 46 and 95?, Mean Reward: -0.8000 | |
| Iter 304, Prompt: [BOS]What is (44 plus 62) plus 63?, Mean Reward: -0.6625 | |
| Iter 305, Prompt: [BOS]Which word is longer: 'uncontended' or 'quinopyrin'? If they are equal, say 'both'., Mean Reward: -0.1000 | |
| Iter 306, Prompt: [BOS]What is the definition of localist?, Mean Reward: -0.2875 | |
| Iter 307, Prompt: [BOS]What is the sum of 202 and 104?, Mean Reward: -0.5375 | |
| Iter 308, Prompt: [BOS]What is the difference between 341 and 437?, Mean Reward: -0.8625 | |
| Iter 309, Prompt: [BOS]What is the product of 240 and 133?, Mean Reward: -0.4000 | |
| Saved checkpoint at iter 310 | |
| Iter 310, Prompt: [BOS]What is the definition of maliceful?, Mean Reward: -0.2000 | |
| Iter 311, Prompt: [BOS]Which word is longer: 'prestudiousness' or 'curavecan'? If they are equal, say 'both'., Mean Reward: -1.2875 | |
| Iter 312, Prompt: [BOS]What is (94 plus 43) plus 58?, Mean Reward: -0.2125 | |
| Iter 313, Prompt: [BOS]What is (44 plus 45) plus 4?, Mean Reward: -0.1000 | |
| Iter 314, Prompt: [BOS]What is the definition of cavicorn?, Mean Reward: -0.2000 | |
| Iter 315, Prompt: [BOS]What is (79 minus 54) times 91?, Mean Reward: -0.6375 | |
| Iter 316, Prompt: [BOS]What is (68 plus 92) times 7?, Mean Reward: -0.3188 | |
| Iter 317, Prompt: [BOS]What is the definition of wyve?, Mean Reward: -0.4000 | |
| Iter 318, Prompt: [BOS]Which word is longer: 'macropinacoid' or 'connect'? If they are equal, say 'both'., Mean Reward: -0.7000 | |
| Iter 319, Prompt: [BOS]What is (81 plus 80) times 93?, Mean Reward: -0.8500 | |
| Saved checkpoint at iter 320 | |
| Iter 320, Prompt: [BOS]Which word is longer: 'selfly' or 'suppering'? If they are equal, say 'both'., Mean Reward: -0.4250 | |
| Iter 321, Prompt: [BOS]What is the definition of chider?, Mean Reward: -0.3000 | |
| Iter 322, Prompt: [BOS]What is the definition of drupetum?, Mean Reward: -0.3000 | |
| Iter 323, Prompt: [BOS]What is the product of 167 and 62?, Mean Reward: -0.5375 | |
| Iter 324, Prompt: [BOS]Which word is longer: 'five' or 'nooked'? If they are equal, say 'both'., Mean Reward: -1.0125 | |
| Iter 325, Prompt: [BOS]What is the difference between 250 and 311?, Mean Reward: -0.5250 | |
| Iter 326, Prompt: [BOS]What is the definition of bellows?, Mean Reward: -0.2000 | |
| Iter 327, Prompt: [BOS]What is the definition of perioikoi?, Mean Reward: -0.4000 | |
| Iter 328, Prompt: [BOS]What is the antonym of 'squander'?, Mean Reward: -0.5250 | |
| Iter 329, Prompt: [BOS]What is the difference between 301 and 316?, Mean Reward: -0.6375 | |
| Saved checkpoint at iter 330 | |
| Iter 330, Prompt: [BOS]What is the product of 154 and 440?, Mean Reward: -1.0000 | |
| Iter 331, Prompt: [BOS]What is the product of 299 and 271?, Mean Reward: -0.9375 | |
| Iter 332, Prompt: [BOS]What is the definition of pimelitis?, Mean Reward: -0.2250 | |
| Iter 333, Prompt: [BOS]Which word is longer: 'ronga' or 'circumlocutory'? If they are equal, say 'both'., Mean Reward: -1.2750 | |
| Iter 334, Prompt: [BOS]What is the sum of 168 and 296?, Mean Reward: -0.4625 | |
| Iter 335, Prompt: [BOS]What is (55 plus 29) times 21?, Mean Reward: -0.2625 | |
| Iter 336, Prompt: [BOS]What is (77 plus 11) times 70?, Mean Reward: -0.3625 | |
| Iter 337, Prompt: [BOS]What is the difference between 489 and 170?, Mean Reward: -0.8125 | |
| Iter 338, Prompt: [BOS]What is (37 plus 29) plus 89?, Mean Reward: -0.1625 | |
| Iter 339, Prompt: [BOS]What is the definition of pyrosome?, Mean Reward: -0.4000 | |
| Saved checkpoint at iter 340 | |
| Iter 340, Prompt: [BOS]Which word is longer: 'unknotted' or 'mysticete'? If they are equal, say 'both'., Mean Reward: -0.9875 | |
| Iter 341, Prompt: [BOS]What is the definition of mesoderm?, Mean Reward: -0.2000 | |
| Iter 342, Prompt: [BOS]What is the difference between 376 and 220?, Mean Reward: -0.5000 | |
| Iter 343, Prompt: [BOS]What is (3 minus 56) plus 30?, Mean Reward: -0.7750 | |
| Iter 344, Prompt: [BOS]What is the sum of 159 and 197?, Mean Reward: -0.6875 | |
| Iter 345, Prompt: [BOS]What is the product of 85 and 188?, Mean Reward: -0.6375 | |
| Iter 346, Prompt: [BOS]What is the product of 145 and 140?, Mean Reward: -0.5125 | |
| Iter 347, Prompt: [BOS]What is the definition of inversive?, Mean Reward: -0.2250 | |
| Iter 348, Prompt: [BOS]What is the definition of Dantonist?, Mean Reward: -0.2000 | |
| Iter 349, Prompt: [BOS]What is the product of 253 and 324?, Mean Reward: -0.7125 | |
| Saved checkpoint at iter 350 | |
| Iter 350, Prompt: [BOS]What is (92 minus 17) plus 28?, Mean Reward: -0.1750 | |
| Iter 351, Prompt: [BOS]What is (49 minus 68) times 38?, Mean Reward: -0.2750 | |
| Iter 352, Prompt: [BOS]What is the product of 120 and 1?, Mean Reward: -0.0062 | |
| Iter 353, Prompt: [BOS]What is (29 minus 16) times 91?, Mean Reward: -0.6250 | |
| Iter 354, Prompt: [BOS]Which word is longer: 'overgloom' or 'monoammonium'? If they are equal, say 'both'., Mean Reward: -1.1375 | |
| Iter 355, Prompt: [BOS]What is the antonym of 'inconvenient'?, Mean Reward: -0.2750 | |
| Iter 356, Prompt: [BOS]What is (51 plus 86) plus 55?, Mean Reward: -0.2625 | |
| Iter 357, Prompt: [BOS]Which word is longer: 'lactoglobulin' or 'fakirism'? If they are equal, say 'both'., Mean Reward: -0.9875 | |
| Iter 358, Prompt: [BOS]What is (76 minus 45) times 90?, Mean Reward: -0.7250 | |
| Iter 359, Prompt: [BOS]What is the definition of Montagnac?, Mean Reward: -0.3625 | |
| Saved checkpoint at iter 360 | |
| Iter 360, Prompt: [BOS]What is the definition of sinapize?, Mean Reward: -0.3625 | |
| Iter 361, Prompt: [BOS]What is the difference between 317 and 213?, Mean Reward: -0.4500 | |
| Iter 362, Prompt: [BOS]What is the antonym of 'partially'?, Mean Reward: -0.2375 | |
| Iter 363, Prompt: [BOS]What is the definition of Felinae?, Mean Reward: -0.2750 | |
| Iter 364, Prompt: [BOS]What is the definition of vocalness?, Mean Reward: -0.2750 | |
| Iter 365, Prompt: [BOS]Which word is longer: 'ultramontane' or 'malebolgic'? If they are equal, say 'both'., Mean Reward: -1.0500 | |
| Iter 366, Prompt: [BOS]What is the sum of 405 and 154?, Mean Reward: -0.8375 | |
| Iter 367, Prompt: [BOS]What is the sum of 346 and 375?, Mean Reward: -0.6125 | |
| Iter 368, Prompt: [BOS]What is the definition of Chatillon?, Mean Reward: -0.2375 | |
| Iter 369, Prompt: [BOS]Which word is longer: 'guarrau' or 'mucid'? If they are equal, say 'both'., Mean Reward: -1.3625 | |
| Saved checkpoint at iter 370 | |
| Iter 370, Prompt: [BOS]What is the definition of Brahmoism?, Mean Reward: -0.2125 | |
| Iter 371, Prompt: [BOS]What is (85 plus 52) times 20?, Mean Reward: -0.3500 | |
| Iter 372, Prompt: [BOS]What is the sum of 472 and 265?, Mean Reward: -0.6500 | |
| Iter 373, Prompt: [BOS]What is (58 minus 22) times 37?, Mean Reward: -0.1063 | |
| Iter 374, Prompt: [BOS]What is the definition of balmlike?, Mean Reward: -0.2875 | |
| Iter 375, Prompt: [BOS]What is the definition of unminded?, Mean Reward: -0.3000 | |
| Iter 376, Prompt: [BOS]What is (95 minus 59) plus 77?, Mean Reward: -0.3375 | |
| Iter 377, Prompt: [BOS]Which word is longer: 'unnaturalistic' or 'countermine'? If they are equal, say 'both'., Mean Reward: -0.5750 | |
| Iter 378, Prompt: [BOS]What is the definition of precoiler?, Mean Reward: -0.4250 | |
| Iter 379, Prompt: [BOS]What is the difference between 457 and 415?, Mean Reward: -0.5375 | |
| Saved checkpoint at iter 380 | |
| Iter 380, Prompt: [BOS]What is the product of 290 and 283?, Mean Reward: -0.6250 | |
| Iter 381, Prompt: [BOS]What is the product of 125 and 364?, Mean Reward: -0.4875 | |
| Iter 382, Prompt: [BOS]What is (94 plus 33) times 97?, Mean Reward: -0.3375 | |
| Iter 383, Prompt: [BOS]What is (99 minus 11) times 5?, Mean Reward: -0.5500 | |
| Iter 384, Prompt: [BOS]What is (21 plus 11) times 36?, Mean Reward: -0.0875 | |
| Iter 385, Prompt: [BOS]What is the product of 475 and 18?, Mean Reward: -0.6750 | |
| Iter 386, Prompt: [BOS]What is the definition of theologue?, Mean Reward: -0.2125 | |
| Iter 387, Prompt: [BOS]What is the product of 24 and 301?, Mean Reward: -0.9000 | |
| Iter 388, Prompt: [BOS]Which word is longer: 'upstart' or 'undecolic'? If they are equal, say 'both'., Mean Reward: -0.9375 | |
| Iter 389, Prompt: [BOS]What is the definition of endosome?, Mean Reward: -0.3750 | |
| Saved checkpoint at iter 390 | |
| Iter 390, Prompt: [BOS]What is (73 minus 48) times 23?, Mean Reward: -0.7250 | |
| Iter 391, Prompt: [BOS]What is (43 plus 87) plus 23?, Mean Reward: -0.3750 | |
| Iter 392, Prompt: [BOS]What is the sum of 171 and 432?, Mean Reward: -0.4750 | |
| Iter 393, Prompt: [BOS]What is the product of 201 and 203?, Mean Reward: -0.4000 | |
| Iter 394, Prompt: [BOS]What is the sum of 463 and 294?, Mean Reward: -0.4875 | |
| Iter 395, Prompt: [BOS]Which word is longer: 'ironwork' or 'ineligible'? If they are equal, say 'both'., Mean Reward: -0.6000 | |
| Iter 396, Prompt: [BOS]What is the difference between 280 and 86?, Mean Reward: -0.8250 | |
| Iter 397, Prompt: [BOS]What is the definition of sketchee?, Mean Reward: -0.4000 | |
| Iter 398, Prompt: [BOS]What is the definition of lordly?, Mean Reward: -0.2125 | |
| Iter 399, Prompt: [BOS]What is the definition of stuck?, Mean Reward: -0.4375 | |
| Saved checkpoint at iter 400 | |
| Iter 400, Prompt: [BOS]What is (98 minus 82) plus 94?, Mean Reward: -0.1375 | |
| Iter 401, Prompt: [BOS]What is (38 plus 23) plus 54?, Mean Reward: -0.9187 | |
| Iter 402, Prompt: [BOS]What is (90 minus 35) plus 43?, Mean Reward: -0.2625 | |
| Iter 403, Prompt: [BOS]What is the definition of snareless?, Mean Reward: -0.2500 | |
| Iter 404, Prompt: [BOS]What is the antonym of 'indispose'?, Mean Reward: -0.4000 | |
| Iter 405, Prompt: [BOS]What is a synonym for 'anguis'?, Mean Reward: -0.5250 | |
| Iter 406, Prompt: [BOS]Which word is longer: 'covarecas' or 'linkedness'? If they are equal, say 'both'., Mean Reward: -0.6125 | |
| Iter 407, Prompt: [BOS]What is the difference between 66 and 458?, Mean Reward: -0.6250 | |
| Iter 408, Prompt: [BOS]What is the definition of memory?, Mean Reward: -0.4125 | |
| Iter 409, Prompt: [BOS]What is a synonym for 'botanical'?, Mean Reward: -0.1875 | |
| Saved checkpoint at iter 410 | |
| Iter 410, Prompt: [BOS]What is the definition of enspirit?, Mean Reward: -0.2000 | |
| Iter 411, Prompt: [BOS]Which word is longer: 'mumruffin' or 'kasolite'? If they are equal, say 'both'., Mean Reward: -0.6500 | |
| Iter 412, Prompt: [BOS]What is the product of 182 and 230?, Mean Reward: -0.6750 | |
| Iter 413, Prompt: [BOS]Which word is longer: 'raticidal' or 'moly'? If they are equal, say 'both'., Mean Reward: -1.2000 | |
| Iter 414, Prompt: [BOS]What is the definition of Silphidae?, Mean Reward: -0.2000 | |
| Iter 415, Prompt: [BOS]What is (6 minus 49) plus 56?, Mean Reward: -0.6687 | |
| Iter 416, Prompt: [BOS]What is the difference between 335 and 32?, Mean Reward: -0.6125 | |
| Iter 417, Prompt: [BOS]What is the definition of unrent?, Mean Reward: -0.3000 | |
| Iter 418, Prompt: [BOS]What is the product of 220 and 183?, Mean Reward: -0.6625 | |
| Iter 419, Prompt: [BOS]What is the definition of uricemia?, Mean Reward: -0.3250 | |
| Saved checkpoint at iter 420 | |
| Iter 420, Prompt: [BOS]What is the antonym of 'acknowledged'?, Mean Reward: -0.3500 | |
| Iter 421, Prompt: [BOS]Which word is longer: 'basitemporal' or 'pertusaria'? If they are equal, say 'both'., Mean Reward: -0.8125 | |
| Iter 422, Prompt: [BOS]Which word is longer: 'shower' or 'rustication'? If they are equal, say 'both'., Mean Reward: -0.6250 | |
| Iter 423, Prompt: [BOS]What is (41 plus 31) plus 73?, Mean Reward: -0.6125 | |
| Iter 424, Prompt: [BOS]What is the definition of bivittate?, Mean Reward: -0.4750 | |
| Iter 425, Prompt: [BOS]What is the product of 178 and 494?, Mean Reward: -0.6875 | |
| Iter 426, Prompt: [BOS]What is the definition of phasm?, Mean Reward: -0.3625 | |
| Iter 427, Prompt: [BOS]What is the definition of fillmass?, Mean Reward: -0.3625 | |
| Iter 428, Prompt: [BOS]What is (79 plus 34) plus 3?, Mean Reward: -0.1750 | |
| Iter 429, Prompt: [BOS]What is the antonym of 'sadden'?, Mean Reward: -0.2000 | |
| Saved checkpoint at iter 430 | |
| Iter 430, Prompt: [BOS]What is (9 minus 25) plus 89?, Mean Reward: -0.0875 | |
| Iter 431, Prompt: [BOS]What is the definition of clovene?, Mean Reward: -0.2125 | |
| Iter 432, Prompt: [BOS]What is (53 plus 19) times 78?, Mean Reward: -0.3625 | |
| Iter 433, Prompt: [BOS]Which word is longer: 'polybranchia' or 'atik'? If they are equal, say 'both'., Mean Reward: -0.9125 | |
| Iter 434, Prompt: [BOS]What is the definition of evocatory?, Mean Reward: -0.3625 | |
| Iter 435, Prompt: [BOS]What is the definition of bacula?, Mean Reward: -0.2000 | |
| Iter 436, Prompt: [BOS]What is the difference between 422 and 240?, Mean Reward: -0.5500 | |
| Iter 437, Prompt: [BOS]What is the definition of ovoidal?, Mean Reward: -0.1875 | |
| Iter 438, Prompt: [BOS]What is the antonym of 'multiple'?, Mean Reward: -0.1625 | |
| Iter 439, Prompt: [BOS]What is the definition of aconitic?, Mean Reward: -0.2625 | |
| Saved checkpoint at iter 440 | |
| Iter 440, Prompt: [BOS]What is the definition of Piete?, Mean Reward: -0.2000 | |
| Iter 441, Prompt: [BOS]Which word is longer: 'dennstaedtia' or 'monostichous'? If they are equal, say 'both'., Mean Reward: -0.5250 | |
| Iter 442, Prompt: [BOS]What is the difference between 319 and 174?, Mean Reward: -0.3000 | |
| Iter 443, Prompt: [BOS]What is (75 plus 22) times 7?, Mean Reward: -0.5750 | |
| Iter 444, Prompt: [BOS]Which word is longer: 'pelasgi' or 'eldest'? If they are equal, say 'both'., Mean Reward: -0.8250 | |
| Iter 445, Prompt: [BOS]What is (42 minus 23) times 60?, Mean Reward: -0.7625 | |
| Iter 446, Prompt: [BOS]What is the difference between 63 and 326?, Mean Reward: -0.5750 | |
| Iter 447, Prompt: [BOS]What is (46 minus 2) times 93?, Mean Reward: -0.0875 | |
| Iter 448, Prompt: [BOS]What is the difference between 50 and 51?, Mean Reward: -0.4750 | |
| Iter 449, Prompt: [BOS]What is the definition of unrigging?, Mean Reward: -0.3000 | |
| Saved checkpoint at iter 450 | |
| Iter 450, Prompt: [BOS]What is the definition of ungroined?, Mean Reward: -0.2125 | |
| Iter 451, Prompt: [BOS]What is the definition of impolicy?, Mean Reward: -0.3375 | |
| Iter 452, Prompt: [BOS]What is (18 minus 26) times 73?, Mean Reward: -0.3750 | |
| Iter 453, Prompt: [BOS]What is the difference between 40 and 8?, Mean Reward: -0.7188 | |
| Iter 454, Prompt: [BOS]What is (73 minus 20) times 33?, Mean Reward: -0.8875 | |
| Iter 455, Prompt: [BOS]What is the definition of spikefish?, Mean Reward: -0.3000 | |
| Iter 456, Prompt: [BOS]What is the definition of gather?, Mean Reward: -0.2000 | |
| Iter 457, Prompt: [BOS]What is the definition of windwards?, Mean Reward: -0.2250 | |
| Iter 458, Prompt: [BOS]What is the definition of Calixtus?, Mean Reward: -0.2125 | |
| Iter 459, Prompt: [BOS]What is the definition of Montagnac?, Mean Reward: -0.2625 | |
| Saved checkpoint at iter 460 | |
| Iter 460, Prompt: [BOS]What is the product of 349 and 446?, Mean Reward: -0.5500 | |
| Iter 461, Prompt: [BOS]What is (8 plus 6) plus 12?, Mean Reward: -0.0438 | |
| Iter 462, Prompt: [BOS]What is (98 minus 72) plus 42?, Mean Reward: -0.7375 | |
| Iter 463, Prompt: [BOS]Which word is longer: 'serving' or 'farandole'? If they are equal, say 'both'., Mean Reward: -1.1375 | |
| Iter 464, Prompt: [BOS]Which word is longer: 'yawniness' or 'awner'? If they are equal, say 'both'., Mean Reward: -0.6125 | |
| Iter 465, Prompt: [BOS]What is the definition of breeches?, Mean Reward: -0.2750 | |
| Iter 466, Prompt: [BOS]What is the definition of Ghedda?, Mean Reward: -0.2125 | |
| Iter 467, Prompt: [BOS]What is the definition of rasen?, Mean Reward: -0.2000 | |
| Iter 468, Prompt: [BOS]What is the definition of Aimee?, Mean Reward: -0.2000 | |
| Iter 469, Prompt: [BOS]What is the difference between 399 and 97?, Mean Reward: -0.4750 | |
| Saved checkpoint at iter 470 | |
| Iter 470, Prompt: [BOS]What is (55 plus 4) times 38?, Mean Reward: 0.0688 | |
| Iter 471, Prompt: [BOS]What is the definition of resh?, Mean Reward: -0.2250 | |
| Iter 472, Prompt: [BOS]What is (89 plus 7) times 25?, Mean Reward: -0.4375 | |
| Iter 473, Prompt: [BOS]What is (31 minus 30) plus 69?, Mean Reward: -0.3375 | |
| Iter 474, Prompt: [BOS]What is (86 plus 76) times 69?, Mean Reward: -0.4500 | |
| Iter 475, Prompt: [BOS]What is (85 plus 48) times 68?, Mean Reward: -0.1875 | |
| Iter 476, Prompt: [BOS]What is (36 minus 24) plus 83?, Mean Reward: -0.1250 | |
| Iter 477, Prompt: [BOS]What is the product of 204 and 453?, Mean Reward: -0.7250 | |
| Iter 478, Prompt: [BOS]What is the definition of neuric?, Mean Reward: -0.2000 | |
| Iter 479, Prompt: [BOS]What is (53 plus 52) times 76?, Mean Reward: -0.6625 | |
| Saved checkpoint at iter 480 | |
| Iter 480, Prompt: [BOS]What is the definition of bigroot?, Mean Reward: -0.2750 | |
| Iter 481, Prompt: [BOS]What is (60 minus 22) times 46?, Mean Reward: -0.3875 | |
| Iter 482, Prompt: [BOS]What is the definition of bracingly?, Mean Reward: -0.3250 | |
| Iter 483, Prompt: [BOS]What is the definition of flatway?, Mean Reward: -0.2000 | |
| Iter 484, Prompt: [BOS]What is the definition of preallow?, Mean Reward: -0.2875 | |
| Iter 485, Prompt: [BOS]What is (83 plus 16) times 43?, Mean Reward: -0.1750 | |
| Iter 486, Prompt: [BOS]Which word is longer: 'intempestive' or 'sulfionide'? If they are equal, say 'both'., Mean Reward: -0.7500 | |
| Iter 487, Prompt: [BOS]What is the antonym of 'unbound'?, Mean Reward: -0.9000 | |
| Iter 488, Prompt: [BOS]What is the difference between 481 and 473?, Mean Reward: -0.6000 | |
| Iter 489, Prompt: [BOS]Which word is longer: 'octastrophic' or 'internode'? If they are equal, say 'both'., Mean Reward: -0.5375 | |
| Saved checkpoint at iter 490 | |
| Iter 490, Prompt: [BOS]What is the definition of overseal?, Mean Reward: -0.2750 | |
| Iter 491, Prompt: [BOS]What is the definition of soreness?, Mean Reward: -0.2125 | |
| Iter 492, Prompt: [BOS]What is the sum of 177 and 406?, Mean Reward: -0.6500 | |
| Iter 493, Prompt: [BOS]Which word is longer: 'degradable' or 'slammerkin'? If they are equal, say 'both'., Mean Reward: -0.8625 | |
| Iter 494, Prompt: [BOS]Which word is longer: 'dang' or 'clubridden'? If they are equal, say 'both'., Mean Reward: -1.1875 | |
| Iter 495, Prompt: [BOS]What is the product of 372 and 461?, Mean Reward: -0.4875 | |
| Iter 496, Prompt: [BOS]What is the difference between 152 and 364?, Mean Reward: -0.7125 | |
| Iter 497, Prompt: [BOS]What is the product of 366 and 189?, Mean Reward: -0.4625 | |
| Iter 498, Prompt: [BOS]What is the antonym of 'unrealistic'?, Mean Reward: -0.4000 | |
| Iter 499, Prompt: [BOS]What is a synonym for 'prescript'?, Mean Reward: -0.3250 | |
| Saved checkpoint at iter 500 | |
| RL training complete. Model saved to models/rl_model.pt | |