new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Jan 12

OpenCSP: A Deep Learning Framework for Crystal Structure Prediction from Ambient to High Pressure

High-pressure crystal structure prediction (CSP) underpins advances in condensed matter physics, planetary science, and materials discovery. Yet, most large atomistic models are trained on near-ambient, equilibrium data, leading to degraded stress accuracy at tens to hundreds of gigapascals and sparse coverage of pressure-stabilized stoichiometries and dense coordination motifs. Here, we introduce OpenCSP, a machine learning framework for CSP tasks spanning ambient to high-pressure conditions. This framework comprises an open-source pressure-resolved dataset alongside a suite of publicly available atomistic models that are jointly optimized for accuracy in energy, force, and stress predictions. The dataset is constructed via randomized high-pressure sampling and iteratively refined through an uncertainty-guided concurrent learning strategy, which enriches underrepresented compression regimes while suppressing redundant DFT labeling. Despite employing a training corpus one to two orders of magnitude smaller than those of leading large models, OpenCSP achieves comparable or superior performance in high-pressure enthalpy ranking and stability prediction. Across benchmark CSP tasks spanning a wide pressure window, our models match or surpass MACE-MPA-0, MatterSim v1 5M, and GRACE-2L-OAM, with the largest gains observed at elevated pressures. These results demonstrate that targeted, pressure-aware data acquisition coupled with scalable architectures enables data-efficient, high-fidelity CSP, paving the way for autonomous materials discovery under ambient and extreme conditions.

  • 6 authors
·
Sep 12, 2025

Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback

Direct preference optimization (DPO) methods have shown strong potential in aligning text-to-image diffusion models with human preferences by training on paired comparisons. These methods improve training stability by avoiding the REINFORCE algorithm but still struggle with challenges such as accurately estimating image probabilities due to the non-linear nature of the sigmoid function and the limited diversity of offline datasets. In this paper, we introduce Diffusion Denoising Ranking Optimization (Diffusion-DRO), a new preference learning framework grounded in inverse reinforcement learning. Diffusion-DRO removes the dependency on a reward model by casting preference learning as a ranking problem, thereby simplifying the training objective into a denoising formulation and overcoming the non-linear estimation issues found in prior methods. Moreover, Diffusion-DRO uniquely integrates offline expert demonstrations with online policy-generated negative samples, enabling it to effectively capture human preferences while addressing the limitations of offline data. Comprehensive experiments show that Diffusion-DRO delivers improved generation quality across a range of challenging and unseen prompts, outperforming state-of-the-art baselines in both both quantitative metrics and user studies. Our source code and pre-trained models are available at https://github.com/basiclab/DiffusionDRO.

  • 4 authors
·
Oct 21, 2025 1

CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment

Language models trained on large-scale corpus often generate content that is harmful, toxic, or contrary to human preferences, making their alignment with human values a critical concern. Reinforcement learning from human feedback (RLHF) with algorithms like PPO is a prevalent approach for alignment but is often complex, unstable, and resource-intensive. Recently, ranking-based alignment methods have emerged, offering stability and effectiveness by replacing the RL framework with supervised fine-tuning, but they are costly due to the need for annotated data. Considering that existing large language models (LLMs) like ChatGPT are already relatively well-aligned and cost-friendly, researchers have begun to align the language model with human preference from AI feedback. The common practices, which unidirectionally distill the instruction-following responses from LLMs, are constrained by their bottleneck. Thus we introduce CycleAlign to distill alignment capabilities from parameter-invisible LLMs (black-box) to a parameter-visible model (white-box) in an iterative manner. With in-context learning (ICL) as the core of the cycle, the black-box models are able to rank the model-generated responses guided by human-craft instruction and demonstrations about their preferences. During iterative interaction, the white-box models also have a judgment about responses generated by them. Consequently, the agreement ranking could be viewed as a pseudo label to dynamically update the in-context demonstrations and improve the preference ranking ability of black-box models. Through multiple interactions, the CycleAlign framework could align the white-box model with the black-box model effectively in a low-resource way. Empirical results illustrate that the model fine-tuned by CycleAlign remarkably exceeds existing methods, and achieves the state-of-the-art performance in alignment with human value.

  • 6 authors
·
Oct 24, 2023 1

PerSEval: Assessing Personalization in Text Summarizers

Personalized summarization models cater to individuals' subjective understanding of saliency, as represented by their reading history and current topics of attention. Existing personalized text summarizers are primarily evaluated based on accuracy measures such as BLEU, ROUGE, and METEOR. However, a recent study argued that accuracy measures are inadequate for evaluating the degree of personalization of these models and proposed EGISES, the first metric to evaluate personalized text summaries. It was suggested that accuracy is a separate aspect and should be evaluated standalone. In this paper, we challenge the necessity of an accuracy leaderboard, suggesting that relying on accuracy-based aggregated results might lead to misleading conclusions. To support this, we delve deeper into EGISES, demonstrating both theoretically and empirically that it measures the degree of responsiveness, a necessary but not sufficient condition for degree-of-personalization. We subsequently propose PerSEval, a novel measure that satisfies the required sufficiency condition. Based on the benchmarking of ten SOTA summarization models on the PENS dataset, we empirically establish that -- (i) PerSEval is reliable w.r.t human-judgment correlation (Pearson's r = 0.73; Spearman's rho = 0.62; Kendall's tau = 0.42), (ii) PerSEval has high rank-stability, (iii) PerSEval as a rank-measure is not entailed by EGISES-based ranking, and (iv) PerSEval can be a standalone rank-measure without the need of any aggregated ranking.

  • 5 authors
·
Jun 29, 2024