Methodology

Step I: Success and Failure Detection
At each turn \( t \), the agent and the user simulator generate their responses, and a critic model assigns a scalar reward \( r_t \).
We determine the status of the turn as success or failure by evaluating whether the reward is higher than at the previous turn:
\begin{equation}
\text{status}(s_t, a_t, u_t) =
\begin{cases}
1 & \text{if } r_t > r_{t-1} \\
0 & \text{otherwise}
\end{cases}
\end{equation}

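As a sketch, Step I reduces to a reward comparison. The following minimal Python illustration assumes the critic reward is already available as a float; the critic model itself is not shown:

```python
def status(r_t: float, r_prev: float) -> int:
    """Success/failure detection for turn t: the turn counts as a
    success (1) only if the critic reward strictly improved over
    the previous turn, and as a failure (0) otherwise."""
    return 1 if r_t > r_prev else 0
```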
Step II: Strategy Revision
Upon detecting a failure, the simulation invokes a revision step to refine the previously failed strategic decision. It then generates a revised strategy \(\sigma_t^{\prime}\) to re-simulate from the failure point, leveraging prior failed attempts at turn \(t\). Formally, the revised strategy is generated as:
\begin{equation}
\sigma_t^{\prime} = \texttt{LLM}_{\theta}(\rho_{r}; s_t, \mathcal{F}_t)
\end{equation}
where \(\rho_{r}\) is the revision prompt and \(\mathcal{F}_t\) denotes the set of previously failed trials at turn \(t\), defined as \( \mathcal{F}_t = \{ (\sigma_t^{1}, a_t^{1}, u_t^{1}), \dots, (\sigma_t^{n}, a_t^{n}, u_t^{n}) \} \), where \(n\) is the maximum number of failed attempts. This failure history guides the model to avoid previously ineffective strategies.
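A minimal sketch of the revision call, assuming a generic `llm` callable that maps a prompt string to text; the function name `revise_strategy` and the prompt layout are illustrative, not from the paper:

```python
def revise_strategy(llm, revision_prompt, state, failed_trials):
    """Step II sketch: generate a revised strategy conditioned on the
    state s_t and the failed-trial set F_t, so the model can avoid
    previously ineffective strategies."""
    # failed_trials: list of (strategy, agent_response, user_reply) triples
    failures = "\n".join(
        f"- strategy: {s} | agent: {a} | user: {u}"
        for s, a, u in failed_trials
    )
    prompt = f"{revision_prompt}\nState: {state}\nFailed attempts:\n{failures}"
    return llm(prompt)
```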
Step III: Re-simulation via Backtracking
After generating a revised strategy \(\sigma_t^{\prime}\), the simulation backtracks to the original state \(s_t\) preceding the failure and re-simulates turn \(t\) using \(\sigma_t^{\prime}\). The agent generates a revised response \(a_t^{\prime}\), and the user simulator produces a new reply \(u_t^{\prime}\) based on the updated context.
\begin{equation}
a_t^{\prime} = \texttt{LLM}_{\theta}(\rho_{a}; s_t, \sigma_t^{\prime})
\end{equation}
\begin{equation}
u_t^{\prime} = \texttt{LLM}_{\theta}(\rho_{u}; s_t, a_t^{\prime})
\end{equation}
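Backtracked re-simulation can be sketched in the same style, again assuming a generic `llm` callable and illustrative prompt formatting:

```python
def resimulate_turn(llm, agent_prompt, user_prompt, state, revised_strategy):
    """Step III sketch: backtrack to state s_t (the context before the
    failed turn) and regenerate the turn under the revised strategy.
    Returns the revised agent response a'_t and user reply u'_t."""
    a_new = llm(f"{agent_prompt}\nState: {state}\nStrategy: {revised_strategy}")
    u_new = llm(f"{user_prompt}\nState: {state}\nAgent: {a_new}")
    return a_new, u_new
```

Because only \(s_t\) and \(\sigma_t^{\prime}\) are passed in, the failed turn itself leaves no trace in the regenerated context.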
Step IV: Principle Derivation

If the corrected turn is re-evaluated as successful (\(\text{status} = 1\)), indicating a transition from failure to success, we derive a principle \( \tilde{p}_t \) as a result of overcoming the failure:
\begin{equation}
\tilde{p}_t = \texttt{LLM}_{\theta}(\rho_{\psi}; s_{t}, \mathcal{T}_t^{*}, \mathcal{F}_{t})
\end{equation}
where \(\rho_{\psi}\) is a prompt designed to extract a principle from failure, and the successful revised interaction is denoted as \(\mathcal{T}_t^{*} = (\sigma_t^{*}, a_t^{*}, u_t^{*})\). The extracted principle is then added to the principle set \(\mathcal{P}\):
\begin{equation}
\mathcal{P} \leftarrow \mathcal{P} \cup \{ \tilde{p}_t \}
\end{equation}

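Putting the four steps together, one turn of the offline self-play loop can be sketched as follows. All function names, the prompt dictionary, and the retry cap are illustrative assumptions, not the paper's implementation:

```python
def principle_from_failure(llm, principle_prompt, state, success_trial, failed_trials):
    """Step IV sketch: contrast the successful revised trial T*_t with
    the failed set F_t and distill a reusable principle."""
    prompt = (
        f"{principle_prompt}\nState: {state}\n"
        f"Successful trial: {success_trial}\nFailed trials: {failed_trials}"
    )
    return llm(prompt)

def simulate_turn(llm, prompts, state, strategy, r_prev, critic,
                  principles, max_retries=3):
    """One turn of the loop: generate, detect failure against r_{t-1},
    revise and backtrack on failure, and derive a principle whenever a
    failed turn flips to success."""
    failed = []  # F_t: failed (strategy, agent, user) triples at this turn
    for _ in range(max_retries + 1):
        a = llm(f"{prompts['agent']}\nState: {state}\nStrategy: {strategy}")
        u = llm(f"{prompts['user']}\nState: {state}\nAgent: {a}")
        r_t = critic(state, a, u)
        if r_t > r_prev:  # status = 1: success
            if failed:  # failure-to-success transition -> derive principle
                p = principle_from_failure(
                    llm, prompts["principle"], state, (strategy, a, u), failed)
                principles.add(p)
            return a, u, r_t
        failed.append((strategy, a, u))  # record the failed trial in F_t
        strategy = llm(f"{prompts['revise']}\nState: {state}\nFailed: {failed}")
    return a, u, r_t  # retries exhausted; keep the last attempt
```

A principle is added only on a genuine failure-to-success flip (`failed` is non-empty), matching the condition that Step IV fires only after at least one failed attempt.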
PRINCIPLES: Synthetic Strategy Memory for Proactive Dialogue Agents
PRINCIPLES is derived through offline self-play simulations and serves as reusable knowledge that guides strategy planning during inference, eliminating the need for additional training and data annotation.
We evaluate PRINCIPLES in both emotional support and persuasion domains, demonstrating consistent improvements over strong baselines.
Furthermore, PRINCIPLES maintains its robustness across extended and more diverse evaluation settings.