ianshank committed
Commit 73b0d95 · verified · 1 Parent(s): 840ad0d

Upload training_data/arxiv/raw_test_harvest.jsonl with huggingface_hub

training_data/arxiv/raw_test_harvest.jsonl ADDED
@@ -0,0 +1,25 @@
+ {"id": "http://arxiv.org/abs/2509.09680v1", "arxiv_id": "2509.09680v1", "title": "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning\n Dataset and Comprehensive Benchmark", "authors": ["Rongyao Fang", "Aldrich Yu", "Chengqi Duan", "Linjiang Huang", "Shuai Bai", "Yuxuan Cai", "Kun Wang", "Si Liu", "Xihui Liu", "Hongsheng Li"], "categories": ["cs.CV", "cs.CL"], "primary_category": "cs.CV", "published": "2025-09-11T17:59:59Z", "updated": "2025-09-11T17:59:59Z", "doi": null, "comment": "Project page: https://flux-reason-6m.github.io/", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09680v1", "url_pdf": "http://arxiv.org/pdf/2509.09680v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "The advancement of open-source text-to-image (T2I) models has been hindered\nby the absence of large-scale, reasoning-focused datasets and comprehensive\nevaluation benchmarks, resulting in a performance gap compared to leading\nclosed-source systems. To address this challenge, we introduce FLUX-Reason-6M\nand PRISM-Bench (Precise and Robust Image Synthesis Measurement Benchmark).\nFLUX-Reason-6M is a massive dataset consisting of 6 million high-quality\nFLUX-generated images and 20 million bilingual (English and Chinese)\ndescriptions specifically designed to teach complex reasoning. The images are\norganized according to six key characteristics: Imagination, Entity, Text\nrendering, Style, Affection, and Composition, and we design explicit Generation\nChain-of-Thought (GCoT) to provide detailed breakdowns of image generation\nsteps. The whole data curation takes 15,000 A100 GPU days, providing the\ncommunity with a resource previously unattainable outside of large industrial\nlabs. PRISM-Bench offers a novel evaluation standard with seven distinct\ntracks, including a formidable Long Text challenge using GCoT. Through\ncarefully designed prompts, it utilizes advanced vision-language models for\nnuanced human-aligned assessment of prompt-image alignment and image\naesthetics. Our extensive evaluation of 19 leading models on PRISM-Bench\nreveals critical performance gaps and highlights specific areas requiring\nimprovement. Our dataset, benchmark, and evaluation code are released to\ncatalyze the next wave of reasoning-oriented T2I generation. Project page:\nhttps://flux-reason-6m.github.io/ .", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09679v1", "arxiv_id": "2509.09679v1", "title": "ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable\n Orthogonal Butterfly Transforms", "authors": ["Bingxin Xu", "Zhen Dong", "Oussama Elachqar", "Yuzhang Shang"], "categories": ["cs.LG", "cs.AI", "cs.CL"], "primary_category": "cs.LG", "published": "2025-09-11T17:59:51Z", "updated": "2025-09-11T17:59:51Z", "doi": null, "comment": "Replace discrete Hadamard transforms with continuous Butterfly\n transforms to facilitate the learning of rotation matrices in LLM\n quantization", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09679v1", "url_pdf": "http://arxiv.org/pdf/2509.09679v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "Large language models require massive memory footprints, severely limiting\ndeployment on consumer hardware. Quantization reduces memory through lower\nnumerical precision, but extreme 2-bit quantization suffers from catastrophic\nperformance loss due to outliers in activations. Rotation-based methods such as\nQuIP and QuaRot apply orthogonal transforms to eliminate outliers before\nquantization, using computational invariance: $\\mathbf{y} = \\mathbf{Wx} =\n(\\mathbf{WQ}^T)(\\mathbf{Qx})$ for orthogonal $\\mathbf{Q}$. However, these\nmethods use fixed transforms--Hadamard matrices achieving optimal worst-case\ncoherence $\\mu = 1/\\sqrt{n}$--that cannot adapt to specific weight\ndistributions. We identify that different transformer layers exhibit distinct\noutlier patterns, motivating layer-adaptive rotations rather than\none-size-fits-all approaches. We propose ButterflyQuant, which replaces\nHadamard rotations with learnable butterfly transforms parameterized by\ncontinuous Givens rotation angles. Unlike Hadamard's discrete $\\{+1, -1\\}$\nentries that are non-differentiable and prohibit gradient-based learning,\nbutterfly transforms' continuous parameterization enables smooth optimization\nwhile guaranteeing orthogonality by construction. This orthogonal constraint\nensures theoretical guarantees in outlier suppression while achieving $O(n \\log\nn)$ computational complexity with only $\\frac{n \\log n}{2}$ learnable\nparameters. We further introduce a uniformity regularization on\npost-transformation activations to promote smoother distributions amenable to\nquantization. Learning requires only 128 calibration samples and converges in\nminutes on a single GPU--a negligible one-time cost. On LLaMA-2-7B with 2-bit\nquantization, ButterflyQuant achieves 15.4 perplexity versus 22.1 for QuaRot.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09677v1", "arxiv_id": "2509.09677v1", "title": "The Illusion of Diminishing Returns: Measuring Long Horizon Execution in\n LLMs", "authors": ["Akshit Sinha", "Arvindh Arun", "Shashwat Goel", "Steffen Staab", "Jonas Geiping"], "categories": ["cs.AI"], "primary_category": "cs.AI", "published": "2025-09-11T17:59:34Z", "updated": "2025-09-11T17:59:34Z", "doi": null, "comment": null, "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09677v1", "url_pdf": "http://arxiv.org/pdf/2509.09677v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "Does continued scaling of large language models (LLMs) yield diminishing\nreturns? Real-world value often stems from the length of task an agent can\ncomplete. We start this work by observing the simple but counterintuitive fact\nthat marginal gains in single-step accuracy can compound into exponential\nimprovements in the length of a task a model can successfully complete. Then,\nwe argue that failures of LLMs when simple tasks are made longer arise from\nmistakes in execution, rather than an inability to reason. We propose isolating\nexecution capability, by explicitly providing the knowledge and plan needed to\nsolve a long-horizon task. We find that larger models can correctly execute\nsignificantly more turns even when small models have 100\\% single-turn\naccuracy. We observe that the per-step accuracy of models degrades as the\nnumber of steps increases. This is not just due to long-context limitations --\ncuriously, we observe a self-conditioning effect -- models become more likely\nto make mistakes when the context contains their errors from prior turns.\nSelf-conditioning does not reduce by just scaling the model size. In contrast,\nrecent thinking models do not self-condition, and can also execute much longer\ntasks in a single turn. We conclude by benchmarking frontier thinking models on\nthe length of task they can execute in a single turn. Overall, by focusing on\nthe ability to execute, we hope to reconcile debates on how LLMs can solve\ncomplex reasoning problems yet fail at simple tasks when made longer, and\nhighlight the massive benefits of scaling model size and sequential test-time\ncompute for long-horizon tasks.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09676v1", "arxiv_id": "2509.09676v1", "title": "SpatialVID: A Large-Scale Video Dataset with Spatial Annotations", "authors": ["Jiahao Wang", "Yufeng Yuan", "Rujie Zheng", "Youtian Lin", "Jian Gao", "Lin-Zhuo Chen", "Yajie Bao", "Yi Zhang", "Chang Zeng", "Yanxi Zhou", "Xiaoxiao Long", "Hao Zhu", "Zhaoxiang Zhang", "Xun Cao", "Yao Yao"], "categories": ["cs.CV"], "primary_category": "cs.CV", "published": "2025-09-11T17:59:31Z", "updated": "2025-09-11T17:59:31Z", "doi": null, "comment": "Project page: https://nju-3dv.github.io/projects/SpatialVID/", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09676v1", "url_pdf": "http://arxiv.org/pdf/2509.09676v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "Significant progress has been made in spatial intelligence, spanning both\nspatial reconstruction and world exploration. However, the scalability and\nreal-world fidelity of current models remain severely constrained by the\nscarcity of large-scale, high-quality training data. While several datasets\nprovide camera pose information, they are typically limited in scale,\ndiversity, and annotation richness, particularly for real-world dynamic scenes\nwith ground-truth camera motion. To this end, we collect \\textbf{SpatialVID}, a\ndataset consisting of a large corpus of in-the-wild videos with diverse scenes,\ncamera movements and dense 3D annotations such as per-frame camera poses,\ndepth, and motion instructions. Specifically, we collect more than 21,000 hours\nof raw video, and process them into 2.7 million clips through a hierarchical\nfiltering pipeline, totaling 7,089 hours of dynamic content. A subsequent\nannotation pipeline enriches these clips with detailed spatial and semantic\ninformation, including camera poses, depth maps, dynamic masks, structured\ncaptions, and serialized motion instructions. Analysis of SpatialVID's data\nstatistics reveals a richness and diversity that directly foster improved model\ngeneralization and performance, establishing it as a key asset for the video\nand 3D vision research community.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09674v1", "arxiv_id": "2509.09674v1", "title": "SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning", "authors": ["Haozhan Li", "Yuxin Zuo", "Jiale Yu", "Yuhao Zhang", "Zhaohui Yang", "Kaiyan Zhang", "Xuekai Zhu", "Yuchen Zhang", "Tianxing Chen", "Ganqu Cui", "Dehui Wang", "Dingxiang Luo", "Yuchen Fan", "Youbang Sun", "Jia Zeng", "Jiangmiao Pang", "Shanghang Zhang", "Yu Wang", "Yao Mu", "Bowen Zhou", "Ning Ding"], "categories": ["cs.RO", "cs.AI", "cs.CL", "cs.LG"], "primary_category": "cs.RO", "published": "2025-09-11T17:59:17Z", "updated": "2025-09-11T17:59:17Z", "doi": null, "comment": null, "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09674v1", "url_pdf": "http://arxiv.org/pdf/2509.09674v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "Vision-Language-Action (VLA) models have recently emerged as a powerful\nparadigm for robotic manipulation. Despite substantial progress enabled by\nlarge-scale pretraining and supervised fine-tuning (SFT), these models face two\nfundamental challenges: (i) the scarcity and high cost of large-scale\nhuman-operated robotic trajectories required for SFT scaling, and (ii) limited\ngeneralization to tasks involving distribution shift. Recent breakthroughs in\nLarge Reasoning Models (LRMs) demonstrate that reinforcement learning (RL) can\ndramatically enhance step-by-step reasoning capabilities, raising a natural\nquestion: Can RL similarly improve the long-horizon step-by-step action\nplanning of VLA? In this work, we introduce SimpleVLA-RL, an efficient RL\nframework tailored for VLA models. Building upon veRL, we introduce\nVLA-specific trajectory sampling, scalable parallelization, multi-environment\nrendering, and optimized loss computation. When applied to OpenVLA-OFT,\nSimpleVLA-RL achieves SoTA performance on LIBERO and even outperforms $\\pi_0$\non RoboTwin 1.0\\&2.0 with the exploration-enhancing strategies we introduce.\nSimpleVLA-RL not only reduces dependence on large-scale data and enables robust\ngeneralization, but also remarkably surpasses SFT in real-world tasks.\nMoreover, we identify a novel phenomenon ``pushcut'' during RL training,\nwherein the policy discovers previously unseen patterns beyond those seen in\nthe previous training process. Github: https://github.com/PRIME-RL/SimpleVLA-RL", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09675v1", "arxiv_id": "2509.09675v1", "title": "CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning\n in Large Language Models", "authors": ["Runpeng Dai", "Linfeng Song", "Haolin Liu", "Zhenwen Liang", "Dian Yu", "Haitao Mi", "Zhaopeng Tu", "Rui Liu", "Tong Zheng", "Hongtu Zhu", "Dong Yu"], "categories": ["cs.CL", "cs.AI", "cs.LG"], "primary_category": "cs.CL", "published": "2025-09-11T17:59:17Z", "updated": "2025-09-11T17:59:17Z", "doi": null, "comment": "21 pages", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09675v1", "url_pdf": "http://arxiv.org/pdf/2509.09675v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "Reinforcement Learning with Verifiable Rewards (RLVR) is a powerful paradigm\nfor enhancing the reasoning ability of Large Language Models (LLMs). Yet\ncurrent RLVR methods often explore poorly, leading to premature convergence and\nentropy collapse. To address this challenge, we introduce Curiosity-Driven\nExploration (CDE), a framework that leverages the model's own intrinsic sense\nof curiosity to guide exploration. We formalize curiosity with signals from\nboth the actor and the critic: for the actor, we use perplexity over its\ngenerated response, and for the critic, we use the variance of value estimates\nfrom a multi-head architecture. Both signals serve as an exploration bonus\nwithin the RLVR framework to guide the model. Our theoretical analysis shows\nthat the actor-wise bonus inherently penalizes overconfident errors and\npromotes diversity among correct responses; moreover, we connect the\ncritic-wise bonus to the well-established count-based exploration bonus in RL.\nEmpirically, our method achieves an approximate +3 point improvement over\nstandard RLVR using GRPO/PPO on AIME benchmarks. Further analysis identifies a\ncalibration collapse mechanism within RLVR, shedding light on common LLM\nfailure modes.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09672v1", "arxiv_id": "2509.09672v1", "title": "Locality in Image Diffusion Models Emerges from Data Statistics", "authors": ["Artem Lukoianov", "Chenyang Yuan", "Justin Solomon", "Vincent Sitzmann"], "categories": ["cs.CV"], "primary_category": "cs.CV", "published": "2025-09-11T17:59:08Z", "updated": "2025-09-11T17:59:08Z", "doi": null, "comment": "30 pages, 18 figures, 6 tables", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09672v1", "url_pdf": "http://arxiv.org/pdf/2509.09672v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "Among generative models, diffusion models are uniquely intriguing due to the\nexistence of a closed-form optimal minimizer of their training objective, often\nreferred to as the optimal denoiser. However, diffusion using this optimal\ndenoiser merely reproduces images in the training set and hence fails to\ncapture the behavior of deep diffusion models. Recent work has attempted to\ncharacterize this gap between the optimal denoiser and deep diffusion models,\nproposing analytical, training-free models that can generate images that\nresemble those generated by a trained UNet. The best-performing method\nhypothesizes that shift equivariance and locality inductive biases of\nconvolutional neural networks are the cause of the performance gap, hence\nincorporating these assumptions into its analytical model. In this work, we\npresent evidence that the locality in deep diffusion models emerges as a\nstatistical property of the image dataset, not due to the inductive bias of\nconvolutional neural networks. Specifically, we demonstrate that an optimal\nparametric linear denoiser exhibits similar locality properties to the deep\nneural denoisers. We further show, both theoretically and experimentally, that\nthis locality arises directly from the pixel correlations present in natural\nimage datasets. Finally, we use these insights to craft an analytical denoiser\nthat better matches scores predicted by a deep diffusion model than the prior\nexpert-crafted alternative.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09671v1", "arxiv_id": "2509.09671v1", "title": "Dexplore: Scalable Neural Control for Dexterous Manipulation from\n Reference-Scoped Exploration", "authors": ["Sirui Xu", "Yu-Wei Chao", "Liuyu Bian", "Arsalan Mousavian", "Yu-Xiong Wang", "Liang-Yan Gui", "Wei Yang"], "categories": ["cs.RO", "cs.CV"], "primary_category": "cs.RO", "published": "2025-09-11T17:59:07Z", "updated": "2025-09-11T17:59:07Z", "doi": null, "comment": "CoRL 2025", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09671v1", "url_pdf": "http://arxiv.org/pdf/2509.09671v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "Hand-object motion-capture (MoCap) repositories offer large-scale,\ncontact-rich demonstrations and hold promise for scaling dexterous robotic\nmanipulation. Yet demonstration inaccuracies and embodiment gaps between human\nand robot hands limit the straightforward use of these data. Existing methods\nadopt a three-stage workflow, including retargeting, tracking, and residual\ncorrection, which often leaves demonstrations underused and compound errors\nacross stages. We introduce Dexplore, a unified single-loop optimization that\njointly performs retargeting and tracking to learn robot control policies\ndirectly from MoCap at scale. Rather than treating demonstrations as ground\ntruth, we use them as soft guidance. From raw trajectories, we derive adaptive\nspatial scopes, and train with reinforcement learning to keep the policy\nin-scope while minimizing control effort and accomplishing the task. This\nunified formulation preserves demonstration intent, enables robot-specific\nstrategies to emerge, improves robustness to noise, and scales to large\ndemonstration corpora. We distill the scaled tracking policy into a\nvision-based, skill-conditioned generative controller that encodes diverse\nmanipulation skills in a rich latent representation, supporting generalization\nacross objects and real-world deployment. Taken together, these contributions\nposition Dexplore as a principled bridge that transforms imperfect\ndemonstrations into effective training signals for dexterous manipulation.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09670v1", "arxiv_id": "2509.09670v1", "title": "Optimal symmetry operators", "authors": ["Leandro Martinek"], "categories": ["hep-th", "math-ph", "math.MP"], "primary_category": "hep-th", "published": "2025-09-11T17:59:01Z", "updated": "2025-09-11T17:59:01Z", "doi": null, "comment": "48 pages, 5 figures", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09670v1", "url_pdf": "http://arxiv.org/pdf/2509.09670v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "We present a constructive method to maximize the expectation value of\noperators that implement a symmetry on a subsystem, making use of modular\ntools. More generally, we study the positive cones associated with a von\nNeumann algebra, as defined by Araki. Given a reference vector, an algebra, and\na state on the algebra, the purification of the state in the cone $\\alpha = 0$,\nassociated with the reference vector and the algebra, yields the unique vector\nwhose overlap with the reference vector is maximal among all possible\npurifications. This establishes that the supremum in Uhlmann's theorem is\nuniquely attained by this vector, thereby providing the fidelity between the\ngiven state and the state obtained by restricting the reference vector to the\nalgebra. Moreover, this purification can be explicitly constructed using\nmodular tools. In addition, given an automorphism of the algebra, we show how\nto construct isometries implementing the automorphism using the positive cones.\nWe prove that the isometry constructed from the cone $\\alpha = 0$ is the one\nwith maximal expectation value among all possible isometries implementing the\nautomorphism. We illustrate these ideas with two simple examples: one involving\na system of two spins, and the other in the theory of the massless scalar field\nin 3+1 dimensions.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09669v1", "arxiv_id": "2509.09669v1", "title": "Strong-to-Weak Symmetry Breaking Phases in Steady States of Quantum\n Operations", "authors": ["Niklas Ziereis", "Sanjay Moudgalya", "Michael Knap"], "categories": ["cond-mat.stat-mech", "cond-mat.str-el", "hep-th", "math-ph", "math.MP", "quant-ph"], "primary_category": "cond-mat.stat-mech", "published": "2025-09-11T17:58:48Z", "updated": "2025-09-11T17:58:48Z", "doi": null, "comment": "35 pages, 8 figures", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09669v1", "url_pdf": "http://arxiv.org/pdf/2509.09669v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "Mixed states can exhibit two distinct kinds of symmetries, either on the\nlevel of the individual states (strong symmetry), or only on the level of the\nensemble (weak symmetry). Strong symmetries can be spontaneously broken down to\nweak ones, a mechanism referred to as Strong-to-Weak Spontaneous Symmetry\nBreaking (SW-SSB). In this work, we first show that maximally mixed symmetric\ndensity matrices, which appear, for example, as steady states of symmetric\nrandom quantum circuits have SW-SSB when the symmetry is an on-site\nrepresentation of a compact Lie or finite group. We then show that this can be\nregarded as an isolated point within an entire SW-SSB phase that is stable to\nmore general quantum operations such as measurements followed by weak\npostselection. With sufficiently strong postselection, a second-order\ntransition can be driven to a phase where the steady state is strongly\nsymmetric. We provide analytical and numerical results for such SW-SSB phases\nand their transitions for both abelian $\\mathbb{Z}_2$ and non-abelian $S_3$\nsymmetries in the steady state of Brownian random quantum circuits with\nmeasurements. We also show that such continuous SW-SSB transitions are absent\nin the steady-state of general strongly symmetric, trace-preserving quantum\nchannels (including unital, Brownian, or Lindbladian dynamics) by analyzing the\ndegeneracies of the steady states in the presence of symmetries. Our results\ndemonstrate robust SW-SSB phases and their transitions in the steady states of\nnoisy quantum operations, and provide a framework for realizing various kinds\nof mixed-state quantum phases based on their symmetries.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09667v1", "arxiv_id": "2509.09667v1", "title": "Geometric Neural Distance Fields for Learning Human Motion Priors", "authors": ["Zhengdi Yu", "Simone Foti", "Linguang Zhang", "Amy Zhao", "Cem Keskin", "Stefanos Zafeiriou", "Tolga Birdal"], "categories": ["cs.CV"], "primary_category": "cs.CV", "published": "2025-09-11T17:58:18Z", "updated": "2025-09-11T17:58:18Z", "doi": null, "comment": "8 pages", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09667v1", "url_pdf": "http://arxiv.org/pdf/2509.09667v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "We introduce Neural Riemannian Motion Fields (NRMF), a novel 3D generative\nhuman motion prior that enables robust, temporally consistent, and physically\nplausible 3D motion recovery. Unlike existing VAE or diffusion-based methods,\nour higher-order motion prior explicitly models the human motion in the zero\nlevel set of a collection of neural distance fields (NDFs) corresponding to\npose, transition (velocity), and acceleration dynamics. Our framework is\nrigorous in the sense that our NDFs are constructed on the product space of\njoint rotations, their angular velocities, and angular accelerations,\nrespecting the geometry of the underlying articulations. We further introduce:\n(i) a novel adaptive-step hybrid algorithm for projecting onto the set of\nplausible motions, and (ii) a novel geometric integrator to \"roll out\"\nrealistic motion trajectories during test-time-optimization and generation. Our\nexperiments show significant and consistent gains: trained on the AMASS\ndataset, NRMF remarkably generalizes across multiple input modalities and to\ndiverse tasks ranging from denoising to motion in-betweening and fitting to\npartial 2D / 3D observations.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09666v1", "arxiv_id": "2509.09666v1", "title": "Can Understanding and Generation Truly Benefit Together -- or Just\n Coexist?", "authors": ["Zhiyuan Yan", "Kaiqing Lin", "Zongjian Li", "Junyan Ye", "Hui Han", "Zhendong Wang", "Hao Liu", "Bin Lin", "Hao Li", "Xue Xu", "Xinyan Xiao", "Jingdong Wang", "Haifeng Wang", "Li Yuan"], "categories": ["cs.CV"], "primary_category": "cs.CV", "published": "2025-09-11T17:57:59Z", "updated": "2025-09-11T17:57:59Z", "doi": null, "comment": null, "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09666v1", "url_pdf": "http://arxiv.org/pdf/2509.09666v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "In this paper, we introduce an insightful paradigm through the Auto-Encoder\nlens: understanding as the encoder (I2T) that compresses images into text, and\ngeneration as the decoder (T2I) that reconstructs images from that text. Using\nreconstruction fidelity as the unified training objective, we enforce the\ncoherent bidirectional information flow between the understanding and\ngeneration processes, bringing mutual gains. To implement this, we propose UAE,\na novel framework for unified multimodal learning. We begin by pre-training the\ndecoder with large-scale long-context image captions to capture fine-grained\nsemantic and complex spatial relationships. We then propose Unified-GRPO via\nreinforcement learning (RL), which covers three stages: (1) A cold-start phase\nto gently initialize both encoder and decoder with a semantic reconstruction\nloss; (2) Generation for Understanding, where the encoder is trained to\ngenerate informative captions that maximize the decoder's reconstruction\nquality, enhancing its visual understanding; (3) Understanding for Generation,\nwhere the decoder is refined to reconstruct from these captions, forcing it to\nleverage every detail and improving its long-context instruction following and\ngeneration fidelity. For evaluation, we introduce Unified-Bench, the first\nbenchmark tailored to assess the degree of unification of the UMMs. A\nsurprising \"aha moment\" arises within the multimodal learning domain: as RL\nprogresses, the encoder autonomously produces more descriptive captions, while\nthe decoder simultaneously demonstrates a profound ability to understand these\nintricate descriptions, resulting in reconstructions of striking fidelity.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09662v1", "arxiv_id": "2509.09662v1", "title": "Galois' Professor's Revenge", "authors": ["M. Damele", "A. Loi", "M. Mereb", "L. Vendramin"], "categories": ["math.NT", "math.GR"], "primary_category": "math.NT", "published": "2025-09-11T17:55:35Z", "updated": "2025-09-11T17:55:35Z", "doi": null, "comment": "10 pages, several figures", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09662v1", "url_pdf": "http://arxiv.org/pdf/2509.09662v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "We prove that the groups associated with the Revenge Cube and the Professor's\nCube can be realized as Galois groups over the rationals.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09661v1", "arxiv_id": "2509.09661v1", "title": "Cohomological invariants of $\\mathscr{M}_{3,n}$ via level structures", "authors": ["Andrea Di Lorenzo"], "categories": ["math.AG", "14F20, 14H1, 14D23"], "primary_category": "math.AG", "published": "2025-09-11T17:55:29Z", "updated": "2025-09-11T17:55:29Z", "doi": null, "comment": "19 pages, first version: might, or might not, be expanded in the\n future. Comments welcome!", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09661v1", "url_pdf": "http://arxiv.org/pdf/2509.09661v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "We show that mod $2$ cohomological invariants of the moduli stack\n$\\mathscr{M}_{3,n}$ of smooth pointed curves of genus three contain a free\nmodule with generators in degree $0$, $2$, $3$, $4$ and $6$, formed by the\ninvariants of the symplectic group $\\mathrm{Sp}_6(2)$. We achieve this by\nshowing that the torsor of full level two structures $\\mathscr{M}_{3,n}(2) \\to\n\\mathscr{M}_{3,n}$ is versal. Along the way, we prove that the invariants of\nthe stack of del Pezzo surfaces of degree two contain the invariants of the\nWeyl group $W(\\mathsf{E}_7)$ and that the mod $2$ cohomology of\n$\\mathscr{M}_{3,n}$ is non-zero in degree three. Our main result holds also for\nthe stack $\\mathscr{A}_3$ of principally polarized abelian threefolds.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09660v1", "arxiv_id": "2509.09660v1", "title": "Steering MoE LLMs via Expert (De)Activation", "authors": ["Mohsen Fayyaz", "Ali Modarressi", "Hanieh Deilamsalehy", "Franck Dernoncourt", "Ryan Rossi", "Trung Bui", "Hinrich Schütze", "Nanyun Peng"], "categories": ["cs.CL", "cs.LG"], "primary_category": "cs.CL", "published": "2025-09-11T17:55:09Z", "updated": "2025-09-11T17:55:09Z", "doi": null, "comment": null, "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09660v1", "url_pdf": "http://arxiv.org/pdf/2509.09660v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "Mixture-of-Experts (MoE) in Large Language Models (LLMs) routes each token\nthrough a subset of specialized Feed-Forward Networks (FFN), known as experts.\nWe present SteerMoE, a framework for steering MoE models by detecting and\ncontrolling behavior-linked experts. Our detection method identifies experts\nwith distinct activation patterns across paired inputs exhibiting contrasting\nbehaviors. By selectively (de)activating such experts during inference, we\ncontrol behaviors like faithfulness and safety without retraining or modifying\nweights. Across 11 benchmarks and 6 LLMs, our steering raises safety by up to\n+20% and faithfulness by +27%. In adversarial attack mode, it drops safety by\n-41% alone, and -100% when combined with existing jailbreak methods, bypassing\nall safety guardrails and exposing a new dimension of alignment faking hidden\nwithin experts.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09658v1", "arxiv_id": "2509.09658v1", "title": "Measuring Epistemic Humility in Multimodal Large Language Models", "authors": ["Bingkui Tong", "Jiaer Xia", "Sifeng Shang", "Kaiyang Zhou"], "categories": ["cs.CV"], "primary_category": "cs.CV", "published": "2025-09-11T17:54:00Z", "updated": "2025-09-11T17:54:00Z", "doi": null, "comment": null, "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09658v1", "url_pdf": "http://arxiv.org/pdf/2509.09658v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "Hallucinations in multimodal large language models (MLLMs) -- where the model\ngenerates content inconsistent with the input image -- pose significant risks\nin real-world applications, from misinformation in visual question answering to\nunsafe errors in decision-making. Existing benchmarks primarily test\nrecognition accuracy, i.e., evaluating whether models can select the correct\nanswer among distractors. This overlooks an equally critical capability for\ntrustworthy AI: recognizing when none of the provided options are correct, a\nbehavior reflecting epistemic humility. We present HumbleBench, a new\nhallucination benchmark designed to evaluate MLLMs' ability to reject plausible\nbut incorrect answers across three hallucination types: object, relation, and\nattribute. Built from a panoptic scene graph dataset, we leverage fine-grained\nscene graph annotations to extract ground-truth entities and relations, and\nprompt GPT-4-Turbo to generate multiple-choice questions, followed by a\nrigorous manual filtering process. Each question includes a \"None of the above\"\noption, requiring models not only to recognize correct visual information but\nalso to identify when no provided answer is valid. We evaluate a variety of\nstate-of-the-art MLLMs -- including both general-purpose and specialized\nreasoning models -- on HumbleBench and share valuable findings and insights\nwith the community. By incorporating explicit false-option rejection,\nHumbleBench fills a key gap in current evaluation suites, providing a more\nrealistic measure of MLLM reliability in safety-critical settings. Our code and\ndataset are released publicly and can be accessed at\nhttps://github.com/maifoundations/HumbleBench.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09657v1", "arxiv_id": "2509.09657v1", "title": "Uniformity within Parameterized Circuit Classes", "authors": ["Steef Hegeman", "Jan Martens", "Alfons Laarman"], "categories": ["cs.CC", "cs.LO"], "primary_category": "cs.CC", "published": "2025-09-11T17:52:41Z", "updated": "2025-09-11T17:52:41Z", "doi": null, "comment": null, "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09657v1", "url_pdf": "http://arxiv.org/pdf/2509.09657v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "We study uniformity conditions for parameterized Boolean circuit families.\nUniformity conditions require that the infinitely many circuits in a circuit\nfamily are in some sense easy to construct from one shared description. For\nshallow circuit families, logtime-uniformity is often desired but quite\ntechnical to prove. Despite that, proving it is often left as an exercise for\nthe reader -- even for recently introduced classes in parameterized circuit\ncomplexity, where uniformity conditions have not yet been explicitly studied.\nWe formally define parameterized versions of linear-uniformity,\nlogtime-uniformity, and FO-uniformity, and prove that these result in\nequivalent complexity classes when imposed on $\\text{para-}\\textsf{AC}^0$ and\n$\\text{para-}\\textsf{AC}^{0\\uparrow}$. Overall, we provide a convenient way to\nverify uniformity for shallow parameterized circuit classes, and thereby\nsubstantiate claims of uniformity in the literature.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09655v1", "arxiv_id": "2509.09655v1", "title": "Feasibility-Guided Fair Adaptive Offline Reinforcement Learning for\n Medicaid Care Management", "authors": ["Sanjay Basu", "Sadiq Y. Patel", "Parth Sheth", "Bhairavi Muralidharan", "Namrata Elamaran", "Aakriti Kinra", "Rajaie Batniji"], "categories": ["cs.LG", "cs.AI", "cs.LO", "stat.AP"], "primary_category": "cs.LG", "published": "2025-09-11T17:50:06Z", "updated": "2025-09-11T17:50:06Z", "doi": null, "comment": "12 pages, 5 figures, 3 tables", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09655v1", "url_pdf": "http://arxiv.org/pdf/2509.09655v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "We introduce Feasibility-Guided Fair Adaptive Reinforcement Learning\n(FG-FARL), an offline RL procedure that calibrates per-group safety thresholds\nto reduce harm while equalizing a chosen fairness target (coverage or harm)\nacross protected subgroups. Using de-identified longitudinal trajectories from\na Medicaid population health management program, we evaluate FG-FARL against\nbehavior cloning (BC) and HACO (Hybrid Adaptive Conformal Offline RL; a global\nconformal safety baseline). We report off-policy value estimates with bootstrap\n95% confidence intervals and subgroup disparity analyses with p-values. FG-FARL\nachieves comparable value to baselines while improving fairness metrics,\ndemonstrating a practical path to safer and more equitable decision support.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09653v1", "arxiv_id": "2509.09653v1", "title": "Towards A High-Performance Quantum Data Center Network Architecture", "authors": ["Yufeng Xin", "Liang Zhang"], "categories": ["quant-ph", "cs.DC", "cs.NI"], "primary_category": "quant-ph", "published": "2025-09-11T17:46:31Z", "updated": "2025-09-11T17:46:31Z", "doi": null, "comment": "IEEE International Conference on Communications 2025 (ICC 2025)", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09653v1", "url_pdf": "http://arxiv.org/pdf/2509.09653v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "Quantum Data Centers (QDCs) are needed to support large-scale quantum\nprocessing for both academic and commercial applications. While large-scale\nquantum computers are constrained by technological and financial barriers, a\nmodular approach that clusters small quantum computers offers an alternative.\nThis approach, however, introduces new challenges in network scalability,\nentanglement generation, and quantum memory management. In this paper, we\npropose a three-layer fat-tree network architecture for QDCs, designed to\naddress these challenges. Our architecture features a unique leaf switch and an\nadvanced swapping spine switch design, optimized to handle high volumes of\nentanglement requests as well as a queue scheduling mechanism that efficiently\nmanages quantum memory to prevent decoherence. Through queuing-theoretical\nmodels and simulations in NetSquid, we demonstrate the proposed architecture's\nscalability and effectiveness in maintaining high entanglement fidelity,\noffering a practical path forward for modular QDC networks.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09652v1", "arxiv_id": "2509.09652v1", "title": "Additive Approximation Schemes for Low-Dimensional Embeddings", "authors": ["Prashanti Anderson", "Ainesh Bakshi", "Samuel B. Hopkins"], "categories": ["cs.DS"], "primary_category": "cs.DS", "published": "2025-09-11T17:45:21Z", "updated": "2025-09-11T17:45:21Z", "doi": null, "comment": "57 pages", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09652v1", "url_pdf": "http://arxiv.org/pdf/2509.09652v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "We consider the task of fitting low-dimensional embeddings to\nhigh-dimensional data. In particular, we study the $k$-Euclidean Metric\nViolation problem ($\\textsf{$k$-EMV}$), where the input is $D \\in\n\\mathbb{R}^{\\binom{n}{2}}_{\\geq 0}$ and the goal is to find the closest vector\n$X \\in \\mathbb{M}_{k}$, where $\\mathbb{M}_k \\subset\n\\mathbb{R}^{\\binom{n}{2}}_{\\geq 0}$ is the set of all $k$-dimensional Euclidean\nmetrics on $n$ points, and closeness is formulated as the following\noptimization problem, where $\\| \\cdot \\|$ is the entry-wise $\\ell_2$ norm: \\[\n \\textsf{OPT}_{\\textrm{EMV}} = \\min_{X \\in \\mathbb{M}_{k} } \\Vert D - X\n\\Vert_2^2\\,.\\] Cayton and Dasgupta [CD'06] showed that this problem is NP-Hard,\neven when $k=1$. Dhamdhere [Dha'04] obtained a $O(\\log(n))$-approximation for\n$\\textsf{$1$-EMV}$ and leaves finding a PTAS for it as an open question\n(reiterated recently by Lee [Lee'25]). Although $\\textsf{$k$-EMV}$ has been\nstudied in the statistics community for over 70 years, under the name\n\"multi-dimensional scaling\", there are no known efficient approximation\nalgorithms for $k > 1$, to the best of our knowledge.\n We provide the first polynomial-time additive approximation scheme for\n$\\textsf{$k$-EMV}$. In particular, we obtain an embedding with objective value\n$\\textsf{OPT}_{\\textrm{EMV}} + \\varepsilon \\Vert D\\Vert_2^2$ in $(n\\cdot\nB)^{\\mathsf{poly}(k, \\varepsilon^{-1})}$ time, where each entry in $D$ can be\nrepresented by $B$ bits. We believe our algorithm is a crucial first step\ntowards obtaining a PTAS for $\\textsf{$k$-EMV}$. Our key technical contribution\nis a new analysis of correlation rounding for Sherali-Adams / Sum-of-Squares\nrelaxations, tailored to low-dimensional embeddings. We also show that our\ntechniques allow us to obtain additive approximation schemes for two related\nproblems: a weighted variant of $\\textsf{$k$-EMV}$ and $\\ell_p$ low-rank\napproximation for $p>2$.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09651v1", "arxiv_id": "2509.09651v1", "title": "Retrieval-Augmented Generation for Reliable Interpretation of Radio\n Regulations", "authors": ["Zakaria El Kassimi", "Fares Fourati", "Mohamed-Slim Alouini"], "categories": ["cs.IR", "cs.AI", "cs.CL", "cs.LG", "eess.SP"], "primary_category": "cs.IR", "published": "2025-09-11T17:43:42Z", "updated": "2025-09-11T17:43:42Z", "doi": null, "comment": null, "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09651v1", "url_pdf": "http://arxiv.org/pdf/2509.09651v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "We study question answering in the domain of radio regulations, a legally\nsensitive and high-stakes area. We propose a telecom-specific\nRetrieval-Augmented Generation (RAG) pipeline and introduce, to our knowledge,\nthe first multiple-choice evaluation set for this domain, constructed from\nauthoritative sources using automated filtering and human validation. To assess\nretrieval quality, we define a domain-specific retrieval metric, under which\nour retriever achieves approximately 97% accuracy. Beyond retrieval, our\napproach consistently improves generation accuracy across all tested models. In\nparticular, while naively inserting documents without structured retrieval\nyields only marginal gains for GPT-4o (less than 1%), applying our pipeline\nresults in nearly a 12% relative improvement. These findings demonstrate that\ncarefully targeted grounding provides a simple yet strong baseline and an\neffective domain-specific solution for regulatory question answering. All code\nand evaluation scripts, along with our derived question-answer dataset, are\navailable at https://github.com/Zakaria010/Radio-RAG.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09650v1", "arxiv_id": "2509.09650v1", "title": "All for One: LLMs Solve Mental Math at the Last Token With Information\n Transferred From Other Tokens", "authors": ["Siddarth Mamidanna", "Daking Rai", "Ziyu Yao", "Yilun Zhou"], "categories": ["cs.CL", "I.2.7"], "primary_category": "cs.CL", "published": "2025-09-11T17:41:29Z", "updated": "2025-09-11T17:41:29Z", "doi": null, "comment": "EMNLP 2025 Main Conference", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09650v1", "url_pdf": "http://arxiv.org/pdf/2509.09650v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "Large language models (LLMs) demonstrate proficiency across numerous\ncomputational tasks, yet their inner workings remain unclear. In theory, the\ncombination of causal self-attention and multilayer perceptron layers allows\nevery token to access and compute information based on all preceding tokens. In\npractice, to what extent are such operations present? In this paper, on mental\nmath tasks (i.e., direct math calculation via next-token prediction without\nexplicit reasoning), we investigate this question in three steps: inhibiting\ninput-specific token computations in the initial layers, restricting the routes\nof information transfer across token positions in the next few layers, and\nforcing all computation to happen at the last token in the remaining layers.\nWith two proposed techniques, Context-Aware Mean Ablation (CAMA) and\nAttention-Based Peeking (ABP), we identify an All-for-One subgraph (AF1) with\nhigh accuracy on a wide variety of mental math tasks, where meaningful\ncomputation occurs very late (in terms of layer depth) and only at the last\ntoken, which receives information of other tokens in few specific middle\nlayers. Experiments on a variety of models and arithmetic expressions show that\nthis subgraph is sufficient and necessary for high model performance, transfers\nacross different models, and works on a variety of input styles. Ablations on\ndifferent CAMA and ABP alternatives reveal their unique advantages over other\nmethods, which may be of independent interest.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09649v1", "arxiv_id": "2509.09649v1", "title": "Large character sums with multiplicative coefficients", "authors": ["Zikang Dong", "Yutong Song", "Weijia Wang", "Hao Zhang", "Shengbo Zhao"], "categories": ["math.NT"], "primary_category": "math.NT", "published": "2025-09-11T17:41:13Z", "updated": "2025-09-11T17:41:13Z", "doi": null, "comment": "8 pages", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09649v1", "url_pdf": "http://arxiv.org/pdf/2509.09649v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "In this paper, we investigate large values of Dirichlet character sums with\nmultiplicative coefficients $\\sum_{n\\le N}f(n)\\chi(n)$. We prove a new Omega\nresult in the region $\\exp((\\log q)^{\\frac12+\\delta})\\le N\\le\\sqrt q$, where\n$q$ is the prime modulus.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09648v1", "arxiv_id": "2509.09648v1", "title": "Stability and asymptotic behaviour of one-dimensional solutions in\n cylinders", "authors": ["Francesca De Marchis", "Lisa Mazzuoli", "Filomena Pacella"], "categories": ["math.AP"], "primary_category": "math.AP", "published": "2025-09-11T17:37:45Z", "updated": "2025-09-11T17:37:45Z", "doi": null, "comment": null, "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09648v1", "url_pdf": "http://arxiv.org/pdf/2509.09648v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "We consider positive one-dimensional solutions of a Lane-Emden relative\nDirichlet problem in a cylinder and study their stability/instability\nproperties as the energy varies with respect to domain perturbations. This\ndepends on the exponent $p >1$ of the nonlinearity and we obtain results for\n$p$ close to 1 and for $p$ large. This is achieved by a careful asymptotic\nanalysis of the one-dimensional solution as $p \\to 1$ or $p \\to \\infty$, which\nis of independent interest. It allows us to detect the limit profile and other\nqualitative properties of these solutions.", "purpose": "lm_pretraining"}
+ {"id": "http://arxiv.org/abs/2509.09646v1", "arxiv_id": "2509.09646v1", "title": "Rigidifying simplicial complexes and realizing group actions", "authors": ["Cristina Costoya", "Rafael Gomes", "Antonio Viruel"], "categories": ["math.AT", "math.CO", "math.GR", "55P10 (Primary) 55U99, 20B25, 06A06, 06A11 (Secondary)"], "primary_category": "math.AT", "published": "2025-09-11T17:35:55Z", "updated": "2025-09-11T17:35:55Z", "doi": null, "comment": "18 pages. Comments welcome!", "journal_ref": null, "url_abs": "http://arxiv.org/abs/2509.09646v1", "url_pdf": "http://arxiv.org/pdf/2509.09646v1", "license_url": null, "source": "arXiv", "chunk_index": 0, "n_chunks": 1, "text": "We show that any action of a finite group on a finitely presentable group\narises as the action of the group of self-homotopy equivalences of a space on\nits fundamental group. In doing so, we prove that any finite connected\n(abstract) simplicial complex $\\mathbf{K}$ can be rigidified -- meaning it can\nbe perturbed in a way that reduces the full automorphism group to any subgroup\n-- while preserving the homotopy type of the geometric realization $|\n\\mathbf{K} |$.", "purpose": "lm_pretraining"}