GiuLeo01/FortranCodeGen-3B-SynthData


GiuLeo01 committed on May 19, 2025

Commit 6add377 · verified · 1 Parent(s): a8afb5e

Update README.md

Files changed (1)

README.md +45 -0


@@ -75,5 +75,50 @@ This simple process led to the creation of a small, synthetically labeled traini

 IMPORTANT: Do not validate this model on MBPP-derived benchmarks due to the data overlap.
+#### Reinforcement Learning with Verifiable Rewards Data
+In this phase, both the programming tasks and their test cases were generated automatically using a large language model (OpenAI o3-mini).
+1) The model received detailed instructions regarding:
+  - the expected format of the task descriptions
+  - the difficulty level of the problems
+  - the structure and format of the test cases
+2) To ensure a wide variety of tasks, 30 distinct themes were defined:
+  * string manipulation and formatting
+  * basic array processing (1D arrays)
+  * simple numeric sequences
+  * frequency counting in arrays
+  * finding prime numbers
+  * basic sorting algorithms on 1D arrays
+  * simple recursive functions
+  * pattern detection in strings
+  * calculating GCD and LCM
+  * basic statistics (mean, median)
+  * string encoding/decoding
+  * subarray sums
+  * basic combinatorial calculations
+  * bitwise operations
+  * date and time manipulation
+  * palindrome substring detection
+  * basic hashing techniques
+  * number base conversions
+  * array rotation (1D)
+  * counting unique elements
+  * string compression
+  * validating numeric strings
+  * string reversal with conditions
+  * generating the Fibonacci sequence
+  * checking balanced parentheses
+  * basic queue and stack problems (using 1D arrays)
+  * counting vowels and consonants
+  * integer factorization
+  * simple encryption/decryption
+  * basic logical puzzles
+3) For each theme, the model was prompted once to generate 10 unique programming problems and their corresponding test cases.
+Prompting by theme was key to generating high-quality synthetic data. Without a clearly defined theme, the model tends to repeat itself or default to similar types of tasks.
+By guiding generation through specific topics, I built a synthetic dataset of 300 examples (30 themes, 10 problems each), each composed of a task and a corresponding test case.
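
The theme-guided procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the prompt wording, the `build_prompt` and `generate_dataset` helpers, the record fields, and the `fake_generate` stand-in for the o3-mini call are all hypothetical, since the README does not show the actual prompts or code. Only three of the 30 themes are included to keep it short.

```python
from typing import Callable, Dict, List

# Three of the 30 themes from the README; the full list would go here.
THEMES = [
    "string manipulation and formatting",
    "basic array processing (1D arrays)",
    "simple numeric sequences",
]
PROBLEMS_PER_THEME = 10


def build_prompt(theme: str, n: int = PROBLEMS_PER_THEME) -> str:
    """Hypothetical prompt template; the author's real instructions are not shown."""
    return (
        f"Generate {n} unique programming problems on the theme '{theme}'. "
        "For each problem, provide a task description and a test case."
    )


def generate_dataset(
    themes: List[str], generate: Callable[[str], List[Dict[str, str]]]
) -> List[Dict[str, str]]:
    """Prompt once per theme and collect every returned problem."""
    dataset = []
    for theme in themes:
        for problem in generate(build_prompt(theme)):
            dataset.append({"theme": theme, **problem})
    return dataset


def fake_generate(prompt: str) -> List[Dict[str, str]]:
    """Stand-in for the LLM call (assumption): returns placeholder problems."""
    return [
        {"task": f"placeholder task {i}", "test_case": f"placeholder test {i}"}
        for i in range(PROBLEMS_PER_THEME)
    ]


dataset = generate_dataset(THEMES, fake_generate)
print(len(dataset))  # 3 themes x 10 problems = 30
```

With all 30 themes and a real model call in place of `fake_generate`, the same loop would yield the 300 examples mentioned above, provided the model returns exactly 10 problems per prompt.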
 ### Training Procedure