Add comprehensive model card for FR3E

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +81 -3
README.md CHANGED
@@ -1,3 +1,81 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - qwen
+ - llm
+ - reinforcement-learning
+ - reasoning
+ ---
+
+ # FR3E (First Return, Entropy-Eliciting Explore)
+
+ FR3E (First Return, Entropy-Eliciting Explore) is a structured exploration framework designed to enhance the reasoning capabilities of Large Language Models (LLMs); this model was trained with it. The framework was presented in the paper [First Return, Entropy-Eliciting Explore](https://huggingface.co/papers/2507.07017).
+
+ ## Model Description
+
+ FR3E addresses unstable exploration in Reinforcement Learning from Verifiable Rewards (RLVR) by identifying high-uncertainty decision points within reasoning trajectories. It then performs targeted rollouts from those points to construct semantically grounded intermediate feedback, providing precise guidance without the need for dense supervision.
+
+ Empirical results on mathematical reasoning benchmarks (AIME24) show that FR3E promotes more stable training, produces longer and more coherent responses, and significantly increases the proportion of fully correct trajectories, demonstrating that robust, structured exploration improves LLM reasoning.
+
+ ## Paper
+
+ For more details, please refer to the research paper:
+ [**First Return, Entropy-Eliciting Explore**](https://huggingface.co/papers/2507.07017)
+
+ ## Project Page
+
+ More information is available on the project's Hugging Face organization page:
+ [**FR3E-Bytedance**](https://huggingface.co/FR3E-Bytedance)
+
+ ## Usage
+
+ You can use FR3E with the Hugging Face `transformers` library. The model is based on the Qwen2 architecture and can be loaded as a causal language model for text generation, especially in a conversational format.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ # Replace "FR3E-Bytedance/FR3E-Qwen2-7B-Instruct" with the specific model ID if different
+ model_id = "FR3E-Bytedance/FR3E-Qwen2-7B-Instruct"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
+
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": "What is the capital of France?"},
+ ]
+
+ # Apply the chat template and tokenize
+ input_ids = tokenizer.apply_chat_template(
+     messages,
+     tokenize=True,
+     add_generation_prompt=True,
+     return_tensors="pt",
+ ).to(model.device)
+
+ # Generate a response
+ outputs = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
+
+ # Decode the generated text, skipping the input prompt
+ generated_text = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
+ print(generated_text)
+
+ # Example with a reasoning prompt
+ reasoning_messages = [
+     {"role": "user", "content": "Solve the following problem step-by-step: A rectangular garden has a length of 15 meters and a width of 10 meters. If you want to put a fence around it, and the fencing costs $5 per meter, how much will it cost to fence the entire garden?"},
+ ]
+
+ reasoning_input_ids = tokenizer.apply_chat_template(
+     reasoning_messages,
+     tokenize=True,
+     add_generation_prompt=True,
+     return_tensors="pt",
+ ).to(model.device)
+
+ reasoning_outputs = model.generate(reasoning_input_ids, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
+ reasoning_generated_text = tokenizer.decode(reasoning_outputs[0][reasoning_input_ids.shape[1]:], skip_special_tokens=True)
+ print(reasoning_generated_text)
+ ```
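
The card's core idea, branching targeted rollouts from high-entropy decision points, can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: the toy per-step probability distributions, the function names, and the simple top-k selection are all illustrative assumptions; in practice the distributions would come from the model's next-token logits.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def find_exploration_points(step_distributions, top_k=2):
    """Return the indices of the top_k highest-entropy steps in a trajectory.

    An FR3E-style procedure would branch targeted rollouts from these
    high-uncertainty decision points. (Illustrative selection rule only.)
    """
    entropies = [token_entropy(p) for p in step_distributions]
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:top_k]), entropies

# Toy trajectory: one next-token distribution per step (4-token vocabulary).
trajectory = [
    [0.97, 0.01, 0.01, 0.01],  # confident step
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain step
    [0.90, 0.05, 0.03, 0.02],  # confident step
    [0.40, 0.35, 0.15, 0.10],  # moderately uncertain step
]

points, entropies = find_exploration_points(trajectory, top_k=2)
print(points)  # → [1, 3]: the two most uncertain steps
```

In a real pipeline the same idea would apply softmax to `outputs.scores` (from `model.generate(..., output_scores=True, return_dict_in_generate=True)`) to obtain per-step distributions before ranking them by entropy.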