GiuLeo01 committed on
Commit
6add377
·
verified ·
1 Parent(s): a8afb5e

Update README.md

Files changed (1)
  1. README.md +45 -0
README.md CHANGED
@@ -75,5 +75,50 @@ This simple process led to the creation of a small, synthetically labeled traini
75
 
76
  IMPORTANT: Do not validate this model on MBPP-derived benchmarks due to the data overlap.
77
 
78
+ #### Reinforcement Learning with Verifiable Rewards Data
79
+
80
+ In this phase, both the programming tasks and their test cases were generated automatically using a large language model (OpenAI o3-mini).
81
+
82
+ 1) The model received detailed instructions regarding:
83
+ - the expected format of the task descriptions
84
+ - the difficulty level of the problems
85
+ - the structure and format of the test cases
86
+ 2) To ensure a wide variety of tasks, 30 distinct themes were defined, including:
87
+ * string manipulation and formatting
88
+ * basic array processing (1D arrays)
89
+ * simple numeric sequences
90
+ * frequency counting in arrays
91
+ * finding prime numbers
92
+ * basic sorting algorithms on 1D arrays
93
+ * simple recursive functions
94
+ * pattern detection in strings
95
+ * calculating GCD and LCM
96
+ * basic statistics (mean, median)
97
+ * string encoding/decoding
98
+ * subarray sums
99
+ * basic combinatorial calculations
100
+ * bitwise operations
101
+ * date and time manipulation
102
+ * palindrome substring detection
103
+ * basic hashing techniques
104
+ * number base conversions
105
+ * array rotation (1D)
106
+ * counting unique elements
107
+ * string compression
108
+ * validating numeric strings
109
+ * string reversal with conditions
110
+ * generating the Fibonacci sequence
111
+ * checking balanced parentheses
112
+ * basic queue and stack problems (using 1D arrays)
113
+ * counting vowels and consonants
114
+ * integer factorization
115
+ * simple encryption/decryption
116
+ * basic logical puzzles
117
+
118
+ 3) For each theme, the model was prompted once to generate 10 unique programming problems and their corresponding test cases.
119
+
120
+ Guiding generation by theme was key to producing high-quality synthetic data. Without a clearly defined theme, the model tends to repeat itself or default to similar types of tasks.
121
+ By guiding generation through specific topics, I built a synthetic dataset of 300 examples (30 themes × 10 problems), each composed of a task and a corresponding test case.
122
+
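The theme-guided loop described above can be sketched as follows. This is only an illustration under stated assumptions: `build_prompt` is a hypothetical template (the actual instructions given to o3-mini are not shown here), the `task` / `test_case` record fields are assumed names, and `fake_generate` is a stand-in for the real model call so the example runs offline.

```python
from typing import Callable

# The 30 themes listed above, copied verbatim from the README.
THEMES = [
    "string manipulation and formatting",
    "basic array processing (1D arrays)",
    "simple numeric sequences",
    "frequency counting in arrays",
    "finding prime numbers",
    "basic sorting algorithms on 1D arrays",
    "simple recursive functions",
    "pattern detection in strings",
    "calculating GCD and LCM",
    "basic statistics (mean, median)",
    "string encoding/decoding",
    "subarray sums",
    "basic combinatorial calculations",
    "bitwise operations",
    "date and time manipulation",
    "palindrome substring detection",
    "basic hashing techniques",
    "number base conversions",
    "array rotation (1D)",
    "counting unique elements",
    "string compression",
    "validating numeric strings",
    "string reversal with conditions",
    "generating the Fibonacci sequence",
    "checking balanced parentheses",
    "basic queue and stack problems (using 1D arrays)",
    "counting vowels and consonants",
    "integer factorization",
    "simple encryption/decryption",
    "basic logical puzzles",
]

PROBLEMS_PER_THEME = 10


def build_prompt(theme: str, n: int = PROBLEMS_PER_THEME) -> str:
    """Hypothetical prompt template; not the wording actually used."""
    return (
        f"Generate {n} unique programming problems on the theme '{theme}'. "
        "For each problem, provide a task description and one test case."
    )


def generate_dataset(generate: Callable[[str], list[dict]]) -> list[dict]:
    """Prompt the generator once per theme and tag each result with its theme."""
    dataset = []
    for theme in THEMES:
        for problem in generate(build_prompt(theme)):
            dataset.append({"theme": theme, **problem})
    return dataset


def fake_generate(prompt: str) -> list[dict]:
    """Stand-in for the LLM call; returns placeholder records."""
    return [
        {"task": f"placeholder task {i}", "test_case": f"placeholder test {i}"}
        for i in range(PROBLEMS_PER_THEME)
    ]


dataset = generate_dataset(fake_generate)
print(len(THEMES), "themes,", len(dataset), "examples")
```

Passing the generator in as a callable keeps the loop independent of any particular API client: swapping `fake_generate` for a function that calls a real model and parses its response would leave the rest unchanged.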
123
  ### Training Procedure
124