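This post lists the latest papers fetched from arXiv.org on 2025-10-14. It is updated automatically and grouped into five broad areas: NLP, CV, ML, AI, and IR. To receive the list by email on a schedule, leave your email address in the comments.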

Note: the daily paper data is fetched from arXiv.org and updated automatically at around 12:00 each day.

Friendly reminder: if you would like to receive the daily paper data by email, leave your email address in the comments.

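Contents

Overview (2025-10-14)

A total of 1229 papers were updated today, including:

  • Natural Language Processing: 199 papers (Computation and Language (cs.CL))
  • Artificial Intelligence: 409 papers (Artificial Intelligence (cs.AI))
  • Computer Vision: 263 papers (Computer Vision and Pattern Recognition (cs.CV))
  • Machine Learning: 355 papers (Machine Learning (cs.LG))

Natural Language Processing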

[NLP-0] Are Large Reasoning Models Interruptible?

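[Quick Read]: This paper addresses the problem that traditional static evaluation overestimates the robustness of Large Reasoning Models (LRMs), particularly in dynamic real-world scenarios such as assistive programming, where a model may take hours to finish reasoning and the context may change or the model may be interrupted in the meantime. The key to the solution is to drop the "frozen world" assumption and introduce two realistic dynamic scenarios, interruptions and dynamic context, to systematically evaluate LRMs when partial outputs must be produced on a limited budget and when information changes in flight. Experiments show that even state-of-the-art LRMs that perform well in static tests can see performance drop by up to 60% under dynamic conditions, and the analysis reveals three new failure modes: reasoning leakage, panic, and self-doubt.

Link: https://arxiv.org/abs/2510.11713
Authors: Tsung-Han Wu, Mihran Miroyan, David M. Chan, Trevor Darrell, Narges Norouzi, Joseph E. Gonzalez
Affiliations: University of California, Berkeley
Categories: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: Project Page: this https URL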

Abstract:Large Reasoning Models (LRMs) excel at complex reasoning but are traditionally evaluated in static, “frozen world” settings: model responses are assumed to be instantaneous, and the context of a request is presumed to be immutable over the duration of the response. While generally true for short-term tasks, the “frozen world” assumption breaks down in modern reasoning tasks such as assistive programming, where models may take hours to think through problems and code may change dramatically from the time the model starts thinking to the model’s final output. In this work, we challenge the frozen world assumption and evaluate LRM robustness under two realistic dynamic scenarios: interruptions, which test the quality of the model’s partial outputs on a limited budget, and dynamic context, which tests model adaptation to in-flight changes. Across mathematics and programming benchmarks that require long-form reasoning, static evaluations consistently overestimate robustness: even state-of-the-art LRMs, which achieve high accuracy in static settings, can fail unpredictably when interrupted or exposed to changing context, with performance dropping by up to 60% when updates are introduced late in the reasoning process. Our analysis further reveals several novel failure modes, including reasoning leakage, where models fold the reasoning into their final answer when interrupted; panic, where under time pressure models abandon reasoning entirely and return incorrect answers; and self-doubt, where performance degrades while incorporating updated information.

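[NLP-1] Demystifying Reinforcement Learning in Agentic Reasoning

[Quick Read]: This paper addresses the lack of clear design principles and best practices for agentic reasoning based on Reinforcement Learning (RL). The core challenge is how to effectively improve the autonomous reasoning ability of Large Language Models (LLMs) on complex tasks, especially decision efficiency and accuracy in tool-use scenarios. The key to the solution is a set of systematic optimization strategies along three dimensions: data, algorithm, and reasoning mode. First, real end-to-end tool-use trajectories replace stitched synthetic trajectories as the Supervised Fine-Tuning (SFT) initialization, combined with high-diversity, model-aware datasets to strengthen exploration. Second, exploration-friendly techniques such as clip higher, overlong reward shaping, and maintaining adequate policy entropy markedly improve training efficiency. Finally, a deliberative strategy with fewer tool calls outperforms frequent tool calls or verbose self-reasoning, improving tool efficiency and final accuracy. Together, these simple but effective practices form a reproducible and efficient agentic RL baseline that lets 4B-scale models outperform 32B models on several challenging benchmarks.

Link: https://arxiv.org/abs/2510.11701
Authors: Zhaochen Yu, Ling Yang, Jiaru Zou, Shuicheng Yan, Mengdi Wang
Affiliations: National University of Singapore; University of Illinois at Urbana-Champaign; Princeton University
Categories: Computation and Language (cs.CL)
Comments: Code and models: this https URL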

Abstract:Recently, the emergence of agentic RL has showcased that RL could also effectively improve the agentic reasoning ability of LLMs, yet the key design principles and optimal practices remain unclear. In this work, we conduct a comprehensive and systematic investigation to demystify reinforcement learning in agentic reasoning from three key perspectives: data, algorithm, and reasoning mode. We highlight our key insights: (i) Replacing stitched synthetic trajectories with real end-to-end tool-use trajectories yields a far stronger SFT initialization; high-diversity, model-aware datasets sustain exploration and markedly improve RL performance. (ii) Exploration-friendly techniques are crucial for agentic RL, such as clip higher, overlong reward shaping, and maintaining adequate policy entropy could improve the training efficiency. (iii) A deliberative strategy with fewer tool calls outperforms frequent tool calls or verbose self-reasoning, improving tool efficiency and final accuracy. Together, these simple practices consistently enhance agentic reasoning and training efficiency, achieving strong results on challenging benchmarks with smaller models, and establishing a practical baseline for future agentic RL research. Beyond these empirical insights, we further contribute a high-quality, real end-to-end agentic SFT dataset along with a high-quality RL dataset, and demonstrate the effectiveness of our insights in boosting the agentic reasoning ability of LLMs across four challenging benchmarks, including AIME2024/AIME2025, GPQA-Diamond, and LiveCodeBench-v6. With our recipes, 4B-sized models could also achieve superior agentic reasoning performance compared to 32B-sized models. Code and models: this https URL

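[NLP-2] QeRL: Beyond Efficiency – Quantization-enhanced Reinforcement Learning for LLMs

[Quick Read]: This paper addresses the high resource cost of Reinforcement Learning (RL) training for Large Language Models (LLMs), including heavy GPU memory usage and time-consuming rollouts. The key to the solution is QeRL (Quantization-enhanced Reinforcement Learning), a framework that combines NVFP4 quantization with Low-Rank Adaptation (LoRA) to accelerate the rollout phase while substantially reducing memory overhead. The study also finds that quantization noise raises policy entropy and strengthens exploration, which helps discover better strategies, and it introduces an Adaptive Quantization Noise (AQN) mechanism that dynamically adjusts the noise during training to optimize exploration. The method is the first to enable RL training of a 32B-parameter model on a single H100 80GB GPU, matches full-parameter fine-tuning on several mathematical benchmarks, and speeds up overall training.

Link: https://arxiv.org/abs/2510.11696
Authors: Wei Huang, Yi Ge, Shuai Yang, Yicheng Xiao, Huizi Mao, Yujun Lin, Hanrong Ye, Sifei Liu, Ka Chun Cheung, Hongxu Yin, Yao Lu, Xiaojuan Qi, Song Han, Yukang Chen
Affiliations: NVIDIA; MIT; HKU; THU
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Comments: Code is available at this https URL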

Abstract:We propose QeRL, a Quantization-enhanced Reinforcement Learning framework for large language models (LLMs). While RL is essential for LLMs’ reasoning capabilities, it is resource-intensive, requiring substantial GPU memory and long rollout durations. QeRL addresses these issues by combining NVFP4 quantization with Low-Rank Adaptation (LoRA), accelerating rollout phase of RL while reducing memory overhead. Beyond efficiency, our findings show that quantization noise increases policy entropy, enhancing exploration, and enabling the discovery of better strategies during RL. To further optimize exploration, QeRL introduces an Adaptive Quantization Noise (AQN) mechanism, which dynamically adjusts noise during training. Experiments demonstrate that QeRL delivers over 1.5 times speedup in the rollout phase. Moreover, this is the first framework to enable RL training of a 32B LLM on a single H100 80GB GPU, while delivering overall speedups for RL training. It also achieves faster reward growth and higher final accuracy than 16-bit LoRA and QLoRA, while matching the performance of full-parameter fine-tuning on mathematical benchmarks such as GSM8K (90.8%) and MATH 500 (77.4%) in the 7B model. These results establish QeRL as an efficient and effective framework for RL training in LLMs.

[NLP-3] When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents

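[Quick Read]: This paper addresses the insufficient evaluation of how well Large Language Model (LLM)-based trading agents reason and adapt in real market environments. Specifically, existing studies mostly test models rather than complete agents, cover limited periods and assets, and rely on unverified data. The key to the solution is Agent Market Arena (AMA), the first lifelong, real-time benchmark, which integrates verified trading data, expert-checked news, and diverse agent architectures in a unified framework for fair and continuous empirical comparison. AMA deploys four agents with different risk styles (InvestorAgent, TradeAgent, HedgeFundAgent, and DeepFundAgent) and runs live experiments across several mainstream LLMs (such as GPT-4o and Claude-3.5-haiku). The results show that agent architecture influences behavioral patterns considerably more than the model backbone does, and the benchmark provides a reproducible, evolving basis for evaluating financial reasoning and trading intelligence.

Link: https://arxiv.org/abs/2510.11695
Authors: Lingfei Qian, Xueqing Peng, Yan Wang, Vincent Jim Zhang, Huan He, Hanley Smith, Yi Han, Yueru He, Haohang Li, Yupeng Cao, Yangyang Yu, Alejandro Lopez-Lira, Peng Lu, Jian-Yun Nie, Guojun Xiong, Jimin Huang, Sophia Ananiadou
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: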

Abstract:Although Large Language Model (LLM)-based agents are increasingly used in financial trading, it remains unclear whether they can reason and adapt in live markets, as most studies test models instead of agents, cover limited periods and assets, and rely on unverified data. To address these gaps, we introduce Agent Market Arena (AMA), the first lifelong, real-time benchmark for evaluating LLM-based trading agents across multiple markets. AMA integrates verified trading data, expert-checked news, and diverse agent architectures within a unified trading framework, enabling fair and continuous comparison under real conditions. It implements four agents, including InvestorAgent as a single-agent baseline, TradeAgent and HedgeFundAgent with different risk styles, and DeepFundAgent with memory-based reasoning, and evaluates them across GPT-4o, GPT-4.1, Claude-3.5-haiku, Claude-sonnet-4, and Gemini-2.0-flash. Live experiments on both cryptocurrency and stock markets demonstrate that agent frameworks display markedly distinct behavioral patterns, spanning from aggressive risk-taking to conservative decision-making, whereas model backbones contribute less to outcome variation. AMA thus establishes a foundation for rigorous, reproducible, and continuously evolving evaluation of financial reasoning and trading intelligence in LLM-based agents.

[NLP-4] Scaling Language-Centric Omnimodal Representation Learning NEURIPS2025

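[Quick Read]: This paper addresses the fact that multimodal embedding methods built on Multimodal Large Language Models (MLLMs) perform well under Contrastive Learning (CL), yet the underlying mechanism remains unclear. The core challenge is to understand why generatively pretrained MLLMs achieve better cross-modal alignment through CL. The key to the solution is showing that MLLMs achieve implicit cross-modal alignment during generative pretraining: the language decoder learns to exploit multimodal signals in a shared representation space to generate unimodal outputs, which produces structured alignment in the latent space. Building on this, the paper proposes the Language-Centric Omnimodal Embedding framework (LCO-Emb) and identifies a Generation-Representation Scaling Law (GRSL), showing that a model's generative capability correlates positively with its representation performance after contrastive refinement, which offers a new route to better embedding quality.

Link: https://arxiv.org/abs/2510.11693
Authors: Chenghao Xiao, Hou Pong Chan, Hao Zhang, Weiwen Xu, Mahani Aljunied, Yu Rong
Affiliations: DAMO Academy, Alibaba Group
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: NeurIPS 2025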

Abstract:Recent multimodal embedding approaches leveraging multimodal large language models (MLLMs) fine-tuned with contrastive learning (CL) have shown promising results, yet the underlying reasons behind their superiority remain underexplored. This work argues that a crucial advantage of MLLM-based approaches stems from implicit cross-modal alignment achieved during generative pretraining, where the language decoder learns to exploit multimodal signals within a shared representation space for generating unimodal outputs. Through analysis of anisotropy and kernel similarity structure, we empirically confirm that latent alignment emerges within MLLM representations, allowing CL to serve as a lightweight refinement stage. Leveraging this insight, we propose a Language-Centric Omnimodal Embedding framework, termed LCO-Emb. Extensive experiments across diverse backbones and benchmarks demonstrate its effectiveness, achieving state-of-the-art performance across modalities. Furthermore, we identify a Generation-Representation Scaling Law (GRSL), showing that the representational capabilities gained through contrastive refinement scales positively with the MLLM’s generative capabilities. This suggests that improving generative abilities evolves as an effective paradigm for enhancing representation quality. We provide a theoretical explanation of GRSL, which formally links the MLLM’s generative quality to the upper bound on its representation performance, and validate it on a challenging, low-resource visual-document retrieval task, showing that continual generative pretraining before CL can further enhance the potential of a model’s embedding capabilities. Codes, models, and resources are available at this https URL.

[NLP-5] Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models

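[Quick Read]: This paper addresses the difficulty of applying Reinforcement Learning (RL) to diffusion large language models (dLLMs), whose likelihood functions are intractable. Existing methods approximate the log-likelihood with customized Monte Carlo (MC) sampling, but they must retain the forward computational graphs of all samples to compute gradients of the non-linear terms in the RL objective. This causes significant memory overhead, restricts the sample size, and in turn leads to imprecise likelihood estimates that distort the RL objective. The key to the solution is Boundary-Guided Policy Optimization (BGPO), which constructs a special lower bound of the objective based on the evidence lower bound (ELBO). The bound satisfies two key properties: (1) Linearity: each term depends on only a single MC sample, so gradients can be accumulated across samples with constant memory usage; (2) Equivalence: in on-policy training, the value and gradient of the bound equal those of the ELBO-based objective, making it an effective approximation of the original RL objective. This design lets BGPO use a large MC sample size, which improves likelihood estimation and the approximation of the RL objective, and it significantly outperforms existing RL algorithms for dLLMs on math problem solving, code generation, and planning tasks.

Link: https://arxiv.org/abs/2510.11683
Authors: Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li
Affiliations: Tsinghua University
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: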

Abstract:A key challenge in applying reinforcement learning (RL) to diffusion large language models (dLLMs) lies in the intractability of their likelihood functions, which are essential for the RL objective, necessitating corresponding approximation in each training step. While existing methods approximate the log-likelihoods by their evidence lower bounds (ELBOs) via customized Monte Carlo (MC) sampling, the forward computational graphs of all MC samples need to be retained for the gradient computation of non-linear terms in the RL objective, resulting in significant memory overhead. This constraint restricts feasible sample sizes, leading to imprecise likelihood approximations and ultimately distorting the RL objective. To overcome this limitation, we propose Boundary-Guided Policy Optimization (BGPO), a memory-efficient RL algorithm that maximizes a specially constructed lower bound of the ELBO-based objective. This lower bound is carefully designed to satisfy two key properties: (1) Linearity: it is formulated in a linear sum where each term depends only on a single MC sample, thereby enabling gradient accumulation across samples and ensuring constant memory usage; (2) Equivalence: Both the value and gradient of this lower bound are equal to those of the ELBO-based objective in on-policy training, making it also an effective approximation for the original RL objective. These properties allow BGPO to adopt a large MC sample size, resulting in more accurate likelihood approximations and improved RL objective estimation, which in turn leads to enhanced performance. Experiments show that BGPO significantly outperforms previous RL algorithms for dLLMs in math problem solving, code generation, and planning tasks.

[NLP-6] FinVet: A Collaborative Framework of RAG and External Fact-Checking Agents for Financial Misinformation Detection

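[Quick Read]: This paper addresses the serious risks that misinformation poses to financial markets, where such information can cause billions of dollars in losses within a short time, while existing methods generally lack transparency in decision-making and traceability to credible sources. The key to the solution is FinVet, a multi-agent framework that integrates two Retrieval-Augmented Generation (RAG) pipelines with external fact-checking through a confidence-weighted voting mechanism. Its core innovation is an adaptive three-tier processing strategy that adjusts verification dynamically according to retrieval confidence, from direct metadata extraction to hybrid reasoning to full model-based analysis, yielding evidence-backed verdicts, source attribution, confidence scores, and explicit uncertainty flags when evidence is insufficient.

Link: https://arxiv.org/abs/2510.11654
Authors: Daniel Berhane Araya, Duoduo Liao
Affiliations: Unknown
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: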

Abstract:Financial markets face growing threats from misinformation that can trigger billions in losses in minutes. Most existing approaches lack transparency in their decision-making and provide limited attribution to credible sources. We introduce FinVet, a novel multi-agent framework that integrates two Retrieval-Augmented Generation (RAG) pipelines with external fact-checking through a confidence-weighted voting mechanism. FinVet employs adaptive three-tier processing that dynamically adjusts verification strategies based on retrieval confidence, from direct metadata extraction to hybrid reasoning to full model-based analysis. Unlike existing methods, FinVet provides evidence-backed verdicts, source attribution, confidence scores, and explicit uncertainty flags when evidence is insufficient. Experimental evaluation on the FinFact dataset shows that FinVet achieves an F1 score of 0.85, which is a 10.4% improvement over the best individual pipeline (fact-check pipeline) and 37% improvement over standalone RAG approaches.
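
The confidence-weighted voting step mentioned in this abstract can be illustrated with a minimal sketch. It is not FinVet's implementation: the function name `weighted_vote`, the source names, the `min_support` threshold, and the rule of summing confidences per label are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_vote(verdicts, min_support=0.5):
    """Combine (source, label, confidence) verdicts by summed confidence.

    Hypothetical rule: the label with the largest total confidence wins, and
    the result is flagged as uncertain when that label holds less than
    `min_support` of the total confidence mass.
    """
    totals = defaultdict(float)
    for _source, label, confidence in verdicts:
        totals[label] += confidence

    total_weight = sum(totals.values())
    if total_weight == 0:
        # No usable evidence at all: return an explicit uncertainty flag.
        return {"label": None, "support": 0.0, "uncertain": True}

    label, weight = max(totals.items(), key=lambda item: item[1])
    support = weight / total_weight
    return {"label": label, "support": support, "uncertain": support < min_support}


# Illustrative inputs only; the source names and scores are made up.
result = weighted_vote([
    ("rag_pipeline_a", "false", 0.9),
    ("rag_pipeline_b", "true", 0.4),
    ("fact_check", "false", 0.7),
])
print(result["label"], result["uncertain"])
```

Summing confidences, rather than counting votes, lets one high-confidence source outweigh several low-confidence ones; whether FinVet weights its sources this way is not stated in the abstract.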

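[NLP-7] ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems

[Quick Read]: This paper addresses the lack of a rigorous benchmark for evaluating the high-level academic reasoning of Large Language Models (LLMs) and agents. Existing evaluations focus mainly on math and coding contests or general tasks, while multi-domain academic benchmarks generally lack sufficient reasoning depth and do not reflect how models perform in complex academic settings. The authors therefore propose the Acadreason benchmark, whose key element is a dataset of 50 expert-annotated, highly difficult academic problems covering five reasoning-intensive domains: computer science, economics, law, mathematics, and philosophy. All questions come from top-tier publications of recent years and pass strict quality control to ensure they are both challenging and answerable. The benchmark offers a standardized tool for systematically evaluating how LLMs and agents acquire and reason over academic knowledge, and it reveals a marked capability gap on super-intelligent academic research tasks.

Link: https://arxiv.org/abs/2510.11652
Authors: Xin Gui, King Zhu, JinCheng Ren, Qianben Chen, Zekun Moore Wang, Yizhi LI, Xinpeng Liu, Xiaowan Li, Wenli Ren, Linyu Miao, Tianrui Qin, Ziqi Shu, He Zhu, Xiangru Tang, Dingfeng Shi, Jiaheng Liu, Yuchen Eleanor Jiang, Minghao Liu, Ge Zhang, Wangchunshu Zhou
Affiliations: OPPO AI Agent Team
Categories: Computation and Language (cs.CL)
Comments: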

Abstract:In recent years, the research focus of large language models (LLMs) and agents has shifted increasingly from demonstrating novel capabilities to complex reasoning and tackling challenging tasks. However, existing evaluations focus mainly on math/code contests or general tasks, while existing multi-domain academic benchmarks lack sufficient reasoning depth, leaving the field without a rigorous benchmark for high-level reasoning. To fill this gap, we introduce the Acadreason benchmark, designed to evaluate the ability of LLMs and agents to acquire and reason over academic knowledge. It consists of 50 expert-annotated academic problems across five high-reasoning domains, including computer science, economics, law, mathematics, and philosophy. All questions are sourced from top-tier publications in recent years and undergo rigorous annotation and quality control to ensure they are both challenging and answerable. We conduct systematic evaluations of over 10 mainstream LLMs and agents. The results show that most LLMs scored below 20 points, with even the cutting-edge GPT-5 achieving only 16 points. While agents achieved higher scores, none exceeded 40 points. This demonstrates the current capability gap between LLMs and agents in super-intelligent academic research tasks and highlights the challenges of Acadreason.

[NLP-8] Enhancing Long Chain-of-Thought Reasoning through Multi-Path Plan Aggregation

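[Quick Read]: This paper addresses CoT derailment, the drift of a language model's (LM's) reasoning trajectory that arises when the chain-of-thought (CoT) is generated in a single forward pass, and in particular the compounding errors that smaller models with limited capacity suffer in long-CoT settings. The core solution is the Multi-Path Plan Aggregation (MPPA) framework. After identifying planning and execution steps in the reasoning hierarchy and finding that most errors stem from incorrect planning, MPPA follows a variable interval schedule based on token position to generate multiple candidate plans and aggregates them with a lightweight LoRA module to refine the planning step. To improve training efficiency and stability, the authors also introduce online Step-DPO, which uses Twisted Sequential Monte Carlo (TSMC) to provide scalable step-level supervision with small models and is more efficient than outcome-reward RL. With only 10% of the SFT data and 5% of the preference pairs, the method outperforms both the DeepSeek-R1 distillation baseline and the outcome-reward RL baseline on math, science, and logical reasoning tasks.

Link: https://arxiv.org/abs/2510.11620
Authors: Siheng Xiong, Ali Payani, Faramarz Fekri
Affiliations: Georgia Institute of Technology; Cisco Research
Categories: Computation and Language (cs.CL)
Comments: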

Abstract:Inference-time scaling enhances the reasoning ability of a language model (LM) by extending its chain-of-thought (CoT). However, existing approaches typically generate the entire reasoning chain in a single forward pass, which often leads to CoT derailment, i.e., the reasoning trajectory drifting off course due to compounding errors. This problem is particularly severe for smaller LMs with long CoTs due to their limited capacity. To address this, we analyze raw long CoTs and uncover a reasoning hierarchy consisting of planning and execution steps. Our analysis reveals that most reasoning errors stem from incorrect planning. Motivated by this observation, we propose Multi-Path Plan Aggregation (MPPA), a framework that augments single-pass reasoning with plan exploration and aggregation. Following a variable interval schedule based on the token position, MPPA generates multiple candidate plans and aggregates them into a refined planning step. To maintain efficiency, we adopt a minimal design in which the base LM serves as the primary policy, while a lightweight LoRA module implements the plan aggregation policy. We further observe that outcome-reward RL is inefficient for long trajectories (e.g., exceeding 4K tokens). To overcome this, we introduce online Step-DPO, a process-level preference optimization scheme that leverages Twisted Sequential Monte Carlo (TSMC) to provide scalable stepwise supervision using small LMs. This yields more efficient training, improved stability, and higher accuracy. Extensive experiments on challenging math, science, and logical reasoning benchmarks demonstrate that, with only 10% SFT data and 5% of preference pairs, our method outperforms both the DeepSeek-R1 distillation baseline and the outcome-reward RL baseline across multiple base models and tasks.

[NLP-9] StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models

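[Quick Read]: This paper addresses the difficulty current long-form story generation models have in keeping plots coherent and consistent, especially in letting complex narratives of more than 10,000 words evolve naturally. Traditional methods mostly use top-down structured generation, which tends to produce rigid stories that lack spontaneity. The key to the solution is a hybrid bottom-up framework for long-form story generation in which multiple agents interact in a dynamic sandbox environment, so that events emerge naturally from agent behavior and interaction with the environment and form an organic foundation for the story. This mechanism supports spontaneous character development and plot progression, and it improves the length, coherence, and immersiveness of generated stories, achieving state-of-the-art performance.

Link: https://arxiv.org/abs/2510.11618
Authors: Zehao Chen, Rong Pan, Haoran Li
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Multiagent Systems (cs.MA)
Comments: Project: this https URL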

Abstract:Human writers often begin their stories with an overarching mental scene, where they envision the interactions between characters and their environment. Inspired by this creative process, we propose a novel approach to long-form story generation, termed hybrid bottom-up long-form story generation, using multi-agent simulations. In our method, agents interact within a dynamic sandbox environment, where their behaviors and interactions with one another and the environment generate emergent events. These events form the foundation for the story, enabling organic character development and plot progression. Unlike traditional top-down approaches that impose rigid structures, our hybrid bottom-up approach allows for the natural unfolding of events, fostering more spontaneous and engaging storytelling. The system is capable of generating stories exceeding 10,000 words while maintaining coherence and consistency, addressing some of the key challenges faced by current story generation models. We achieve state-of-the-art performance across several metrics. This approach offers a scalable and innovative solution for creating dynamic, immersive long-form stories that evolve organically from agent-driven interactions.

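[NLP-10] LLM-Oriented Token-Adaptive Knowledge Distillation

[Quick Read]: This paper addresses a limitation of current logit-based Knowledge Distillation (KD) methods for compressing large-scale language models (LLMs): their static strategies do not match the dynamic learning process of the student model. Specifically, conventional methods treat all tokens alike and apply a single fixed temperature, which makes knowledge transfer inefficient. The key to the solution is LLM-Oriented Token-Adaptive Knowledge Distillation (AdaKD), a framework in which a unified token difficulty metric drives two cooperating modules. The first is the Loss-Driven Adaptive Token Focusing (LATF) module, which dynamically adjusts the distillation focus according to the student's learning stability and concentrates on the most valuable tokens at each training phase. The second is Inverse Difficulty Temperature Scaling (IDTS), which uses low temperatures on difficult tokens for targeted error correction and high temperatures on easy tokens so that the student learns from the teacher's smooth output distribution, improving generalization. The approach is plug-and-play and yields consistent gains for various distillation methods across multiple model architectures and benchmarks.

Link: https://arxiv.org/abs/2510.11615
Authors: Xurong Xie, Zhucun Xue, Jiafu Wu, Jian Li, Yabiao Wang, Xiaobin Hu, Yong Liu, Jiangning Zhang
Affiliations: Zhejiang University; Tencent Youtu Lab
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 15 pages, 4 figures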

Abstract:Knowledge distillation (KD) is a key technique for compressing large-scale language models (LLMs), yet prevailing logit-based methods typically employ static strategies that are misaligned with the dynamic learning process of student models. These methods typically treat all tokens indiscriminately and apply a single, fixed temperature, resulting in suboptimal knowledge transfer. To address these limitations, we propose LLM-Oriented Token-Adaptive Knowledge Distillation (AdaKD), a novel framework that adapts the distillation process to the real-time learning state of each token. AdaKD consists of two synergistic modules driven by a unified token difficulty metric. First, our Loss-Driven Adaptive Token Focusing (LATF) module dynamically adjusts the distillation focus by monitoring the student’s learning stability, concentrating computational resources on the most valuable tokens at each training phase. Second, we introduce Inverse Difficulty Temperature Scaling (IDTS), a counterintuitive yet effective token-level temperature strategy. It employs low temperatures for difficult tokens for targeted error correction, and high temperatures for easy tokens to encourage students to learn from the teacher’s complete and smooth output distribution, thereby enhancing generalization. As a plug-and-play framework, AdaKD can consistently improve the performance of various distillation methods on multiple model architectures and benchmarks.
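
The inverse-difficulty idea (low temperature for difficult tokens, high temperature for easy ones) can be sketched as follows. This is an illustration under stated assumptions, not the AdaKD implementation: the difficulty score in [0, 1], the linear mapping, and the `t_min`/`t_max` values are hypothetical, and the abstract does not say how the paper defines its difficulty metric.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def token_temperature(difficulty, t_min=0.5, t_max=2.0):
    """Map a difficulty score in [0, 1] to a temperature (assumed linear rule).

    Difficult tokens (score near 1) get a low temperature, easy tokens
    (score near 0) get a high one.
    """
    d = min(max(difficulty, 0.0), 1.0)
    return t_max - (t_max - t_min) * d


def token_kd_loss(teacher_logits, student_logits, difficulty):
    """KL(teacher || student) for one token at its difficulty-dependent temperature."""
    t = token_temperature(difficulty)
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Same logits, different difficulty: only the temperature changes.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
print(token_kd_loss(teacher, student, difficulty=0.9))  # hard token, sharper targets
print(token_kd_loss(teacher, student, difficulty=0.1))  # easy token, smoother targets
```

A lower temperature sharpens both distributions, so the loss concentrates on the teacher's top choice; a higher temperature exposes more of the teacher's full distribution.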

[NLP-11] Deconstructing Attention: Investigating Design Principles for Effective Language Modeling

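[Quick Read]: The question this paper asks is whether the core design principles of the attention mechanism in Transformer language models are necessary, and what each contributes to model performance. The success of dot-product attention is commonly credited to four key properties: mixing information across positions, sequence-dependent activations, a specific mathematical form (dot-product similarities plus softmax weighting), and coupling of queries and keys to the hidden states of the current layer. The actual necessity of these principles, however, has not been systematically tested.

The key to the solution is to build controlled variants that selectively relax each of these principles, applied either uniformly across all layers or in hybrid architectures where only some layers keep standard attention, and to analyze them empirically. The results show that cross-position mixing is indispensable, since removing it collapses the model to near-random behavior, whereas the mathematical form and sequence dependency can be relaxed substantially, especially when they are preserved in only a subset of layers. More surprisingly, some variants that fail in isolation perform robustly when interleaved with standard attention, which indicates a cooperative effect. These findings deepen the understanding of what attention actually contributes and open new paths to simplifying language models.

Link: https://arxiv.org/abs/2510.11602
Authors: Huiyin Xue, Nafise Sadat Moosavi, Nikolaos Aletras
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: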

Abstract:The success of Transformer language models is widely credited to their dot-product attention mechanism, which interweaves a set of key design principles: mixing information across positions (enabling multi-token interactions), sequence-dependent activations (where attention weights adapt to each input), a specific mathematical form (dot-product similarities plus softmax weighting), and coupling of queries and keys to evolving hidden states (grounding attention in the current layer). However, the necessity of each of these principles remains largely untested. In this work, we systematically deconstruct attention by designing controlled variants that selectively relax these principles, applied both uniformly across all layers and in hybrid architectures where only some layers retain standard attention. Our empirical analysis reveals that mechanisms for mixing tokens are indispensable, as their absence collapses models to near-random behavior, while the exact mathematical form and sequence dependency can be substantially relaxed, especially when preserved in just a subset of layers. Surprisingly, even variants that fail in isolation can achieve robust performance when interleaved with standard attention, highlighting a cooperative effect. These findings deepen our understanding of what truly underpins attention’s effectiveness and open new avenues for simplifying language models without sacrificing performance.
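
To make the principles concrete, the sketch below contrasts standard scaled dot-product attention with one relaxed variant that still mixes tokens but uses input-independent weights. The uniform-averaging variant and the causal masking are assumptions chosen for illustration; the abstract does not specify which variants the paper evaluates.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mix(weights, values):
    """Weighted sum of value vectors."""
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def dot_product_attention(queries, keys, values):
    """Causal scaled dot-product attention: weights depend on the input."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for i, query in enumerate(queries):
        scores = [dot(query, key) / scale for key in keys[: i + 1]]
        outputs.append(mix(softmax(scores), values[: i + 1]))
    return outputs


def uniform_causal_mixing(values):
    """Relaxed variant (assumed): token mixing with fixed, input-independent weights."""
    outputs = []
    for i in range(len(values)):
        prefix = values[: i + 1]
        outputs.append(mix([1.0 / len(prefix)] * len(prefix), prefix))
    return outputs


values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[0.3, -0.2], [1.0, 0.5], [-0.7, 0.9]]
zero_queries = [[0.0, 0.0]] * 3
# With all-zero queries every score is equal, so both functions agree.
print(dot_product_attention(zero_queries, keys, values))
print(uniform_causal_mixing(values))
```

The second function keeps cross-position mixing while dropping sequence dependence and the dot-product form, which is the kind of controlled relaxation the abstract describes.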

[NLP-12] SemCSE-Multi: Multifaceted and Decodable Embeddings for Aspect-Specific and Interpretable Scientific Domain Mapping

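[Quick Read]: This paper addresses the difficulty of capturing and controllably expressing the multiple semantic dimensions of scientific literature at a fine granularity, particularly in invasion biology and medicine, where conventional embedding methods tend to produce a single, entangled vector that limits aspect-specific retrieval and visual analysis. The key to the solution is the SemCSE-Multi framework, which generates aspect-specific summarizing sentences without supervision and trains embedding models to map semantically related summaries to nearby positions in the embedding space. These aspect-specific embedding capabilities are then distilled into a unified model that predicts multiple aspect embeddings from a single scientific abstract in one forward pass. The paper also introduces an embedding decoding pipeline that turns embeddings back into natural language descriptions and remains effective even for unoccupied regions of low-dimensional visualizations, which greatly improves interpretability in user-centric settings.

Link: https://arxiv.org/abs/2510.11599
Authors: Marc Brinner, Sina Zarrieß
Affiliations: Bielefeld University, Germany
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Comments: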

Abstract:We propose SemCSE-Multi, a novel unsupervised framework for generating multifaceted embeddings of scientific abstracts, evaluated in the domains of invasion biology and medicine. These embeddings capture distinct, individually specifiable aspects in isolation, thus enabling fine-grained and controllable similarity assessments as well as adaptive, user-driven visualizations of scientific domains. Our approach relies on an unsupervised procedure that produces aspect-specific summarizing sentences and trains embedding models to map semantically related summaries to nearby positions in the embedding space. We then distill these aspect-specific embedding capabilities into a unified embedding model that directly predicts multiple aspect embeddings from a scientific abstract in a single, efficient forward pass. In addition, we introduce an embedding decoding pipeline that decodes embeddings back into natural language descriptions of their associated aspects. Notably, we show that this decoding remains effective even for unoccupied regions in low-dimensional visualizations, thus offering vastly improved interpretability in user-centric settings.

[NLP-13] MeTA-LoRA: Data-Efficient Multi-Task Fine-Tuning for Large Language Models

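[Quick Read]: This paper addresses the limited data efficiency of Low-Rank Adaptation (LoRA) in multi-task learning: in complex multi-task settings it struggles to exploit inter-task knowledge and usually needs substantial task-specific data to reach optimal performance. The key to the solution is MeTA-LoRA, a two-stage optimization framework. In the first stage, a task-specific LoRA adapter is trained for each task from only a few samples, enabling rapid adaptation. In the second stage, the shared LoRA adapter is updated by aggregating gradients from multiple tasks, which promotes cross-task knowledge transfer. This substantially reduces the reliance on task-specific data and matches or surpasses full-data LoRA in multi-task and multilingual learning scenarios.

Link: https://arxiv.org/abs/2510.11598
Authors: Bo Cheng, Xu Wang, Jinda Liu, Yi Chang, Yuan Wu
Affiliations: School of Artificial Intelligence, Jilin University; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China; International Center of Future Science, Jilin University
Categories: Computation and Language (cs.CL)
Comments: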

Abstract:Low-Rank Adaptation (LoRA) has emerged as one of the most widely used parameter-efficient fine-tuning (PEFT) methods for adapting large language models (LLMs) to downstream tasks. While highly effective in single-task settings, it struggles to efficiently leverage inter-task knowledge in complex multi-task learning scenarios, often requiring substantial task-specific data to achieve optimal performance. To address this limitation, we introduce MeTA-LoRA, a two-stage optimization framework that significantly improves data efficiency in multi-task adaptation. In the first stage, task-specific LoRA adapters are learned using only a few samples from each involved dataset, enabling rapid adaptation without large-scale supervision. In the second stage, the shared LoRA adapter is updated by aggregating gradients from multiple tasks to promote knowledge transfer across tasks, further reducing data usage by leveraging common patterns. In both multi-task learning and multilingual learning scenarios, our method matches or surpasses the performance of traditional full-data LoRA fine-tuning approaches, while using significantly less task-specific data.
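
The two-stage loop can be pictured with a toy example in which the "adapter" is a single scalar and each task has a quadratic loss. Everything here is an assumption for illustration (the loss, the learning rates, evaluating each task's gradient at its adapted parameters, and the names `task_grad` and `meta_round`); it is a first-order sketch, not the MeTA-LoRA algorithm.

```python
def task_grad(w, target):
    """Gradient of the toy task loss (w - target) ** 2."""
    return 2.0 * (w - target)


def meta_round(shared, task_targets, inner_lr=0.1, inner_steps=1, outer_lr=0.1):
    """One round of the two-stage update on a scalar stand-in for an adapter.

    Stage 1: adapt a copy of the shared parameter to each task for a few steps.
    Stage 2: average the per-task gradients taken at the adapted parameters
             and apply them to the shared parameter.
    """
    grads = []
    for target in task_targets:
        adapted = shared
        for _ in range(inner_steps):
            adapted -= inner_lr * task_grad(adapted, target)  # stage 1
        grads.append(task_grad(adapted, target))
    return shared - outer_lr * sum(grads) / len(grads)  # stage 2


shared = 0.0
for _ in range(100):
    shared = meta_round(shared, task_targets=[1.0, 3.0])
print(shared)  # settles near 2.0, the point that balances the two toy tasks
```

In this toy setting the shared parameter converges to the mean of the task targets, which shows how aggregated gradients pull the shared adapter toward a solution that serves all tasks.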

[NLP-14] REGENT: Relevance-Guided Attention for Entity-Aware Multi-Vector Neural Re-Ranking SIGIR

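[Quick Read]: This paper addresses the limitations of current neural re-rankers on complex information needs and long, content-rich documents. The core problem is a lack of intelligent content selection, that is, difficulty in identifying the key semantic information in lengthy, multi-faceted text. Conventional models are constrained by rigid token windows, treat all interactions as equally important, and miss critical semantic signals. The key to the solution is REGENT, a model that uses entities as a "semantic skeleton" within the attention mechanism to allocate attention under relevance guidance, combining fine-grained lexical matching with high-level semantic reasoning so that the model focuses on conceptually important content while staying sensitive to precise term matches. According to the authors, this is the first work to successfully integrate entity semantics directly into neural attention, establishing a new paradigm for entity-aware information retrieval.

Link: https://arxiv.org/abs/2510.11592
Authors: Shubham Chatterjee
Affiliations: Missouri University of Science and Technology
Categories: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Comments: To be published in: Proceedings of the 2025 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (SIGIR-AP 2025)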

点击查看摘要

Abstract:Current neural re-rankers often struggle with complex information needs and long, content-rich documents. The fundamental issue is not computational–it is intelligent content selection: identifying what matters in lengthy, multi-faceted texts. While humans naturally anchor their understanding around key entities and concepts, neural models process text within rigid token windows, treating all interactions as equally important and missing critical semantic signals. We introduce REGENT, a neural re-ranking model that mimics human-like understanding by using entities as a “semantic skeleton” to guide attention. REGENT integrates relevance guidance directly into the attention mechanism, combining fine-grained lexical matching with high-level semantic reasoning. This relevance-guided attention enables the model to focus on conceptually important content while maintaining sensitivity to precise term matches. REGENT achieves new state-of-the-art performance in three challenging datasets, providing up to 108% improvement over BM25 and consistently outperforming strong baselines including ColBERT and RankVicuna. To our knowledge, this is the first work to successfully integrate entity semantics directly into neural attention, establishing a new paradigm for entity-aware information retrieval.

[NLP-15] QDER: Query-Specific Document and Entity Representations for Multi-Vector Document Re-Ranking SIGIR SIGIR2025

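[Quick Read]: This paper addresses the separate limitations of entity-oriented methods and multi-vector models in neural information retrieval (Neural IR): how to combine the semantic information of a Knowledge Graph with fine-grained token-level representations to improve retrieval on complex queries. The key to the solution is QDER, which uses "late aggregation": individual token and entity representations are kept throughout ranking and aggregated only at the final scoring stage. These representations are first transformed through learned attention patterns and then combined with carefully chosen mathematical operations for precise matching. This markedly improves ranking quality on difficult queries; on TREC Robust 2004, nDCG@20 improves by 36% over the strongest baseline.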

Link: https://arxiv.org/abs/2510.11589
Authors: Shubham Chatterjee, Jeff Dalton
Affiliations: Missouri University of Science and Technology; The University of Edinburgh
Categories: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Comments: Published in: Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2025)

Abstract:Neural IR has advanced through two distinct paths: entity-oriented approaches leveraging knowledge graphs and multi-vector models capturing fine-grained semantics. We introduce QDER, a neural re-ranking model that unifies these approaches by integrating knowledge graph semantics into a multi-vector model. QDER’s key innovation lies in its modeling of query-document relationships: rather than computing similarity scores on aggregated embeddings, we maintain individual token and entity representations throughout the ranking process, performing aggregation only at the final scoring stage - an approach we call “late aggregation.” We first transform these fine-grained representations through learned attention patterns, then apply carefully chosen mathematical operations for precise matches. Experiments across five standard benchmarks show that QDER achieves significant performance gains, with improvements of 36% in nDCG@20 over the strongest baseline on TREC Robust 2004 and similar improvements on other datasets. QDER particularly excels on difficult queries, achieving an nDCG@20 of 0.70 where traditional approaches fail completely (nDCG@20 = 0.0), setting a foundation for future work in entity-aware retrieval.

[NLP-16] Survey Response Generation: Generating Closed-Ended Survey Responses In-Silico with Large Language Models

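[Quick Read]: This paper addresses how to use Large Language Models (LLMs) to generate closed-ended survey responses that match real human response patterns. No standard method currently exists for guiding LLMs to produce such responses, and the choice of generation strategy significantly affects the accuracy of the simulation. The key to the solution is a systematic comparison of 8 Survey Response Generation Methods across 4 political attitude surveys and 10 open-weight language models. The comparison finds that Restricted Generation Methods perform best overall in both individual-level and subpopulation-level alignment, while reasoning output does not consistently improve alignment. These findings provide empirical evidence and practical recommendations for future LLM-based survey simulation.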

Link: https://arxiv.org/abs/2510.11586
Authors: Georg Ahnert, Anna-Carolina Haensch, Barbara Plank, Markus Strohmaier
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Computers and Society (cs.CY)
Comments:

Abstract:Many in-silico simulations of human survey responses with large language models (LLMs) focus on generating closed-ended survey responses, whereas LLMs are typically trained to generate open-ended text instead. Previous research has used a diverse range of methods for generating closed-ended survey responses with LLMs, and a standard practice remains to be identified. In this paper, we systematically investigate the impact that various Survey Response Generation Methods have on predicted survey responses. We present the results of 32 mio. simulated survey responses across 8 Survey Response Generation Methods, 4 political attitude surveys, and 10 open-weight language models. We find significant differences between the Survey Response Generation Methods in both individual-level and subpopulation-level alignment. Our results show that Restricted Generation Methods perform best overall, and that reasoning output does not consistently improve alignment. Our work underlines the significant impact that Survey Response Generation Methods have on simulated survey responses, and we develop practical recommendations on the application of Survey Response Generation Methods.
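As a rough illustration of what a restricted generation method can look like, the sketch below renormalises a model's next-token log-probabilities over the allowed answer options only. The abstract does not enumerate the eight methods, so the function name `restricted_choice`, the dictionary input format, and the example numbers are all assumptions made for illustration.

```python
import math

def restricted_choice(logprobs, options):
    """Renormalise next-token log-probabilities over the allowed options only."""
    present = {o: logprobs[o] for o in options if o in logprobs}
    if not present:
        raise ValueError("none of the allowed options has a log-probability")
    top = max(present.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - top) for v in present.values())
    probs = {o: math.exp(v - top) / z for o, v in present.items()}
    return max(probs, key=probs.get), probs

# Hypothetical next-token log-probabilities for a question with a 4-point scale.
logprobs = {
    "1": math.log(0.2),
    "2": math.log(0.1),
    "I": math.log(0.5),    # open-ended continuations are simply ignored
    "The": math.log(0.2),
}
answer, dist = restricted_choice(logprobs, ["1", "2", "3", "4"])
print(answer, dist)
```

Although most of the probability mass here sits on open-ended tokens such as "I", the restricted view still yields a valid closed-ended answer and a distribution over the answer options that were scored.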

[NLP-17] LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings

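[Quick Read]: This paper addresses the lack of explainability and the poor generalization of adversarial attacks on Knowledge Graph Embeddings (KGE). Existing black-box attacks try to combine textual and structural information to improve attack performance, but they cannot produce human-readable explanations and generalize poorly. The key to the solution is LLMAtKGE, a framework based on Large Language Models (LLMs). It uses a structured prompting scheme that formulates the attack as multiple-choice questions and includes factual evidence from the knowledge graph; applies semantics-based and centrality-based filters to compress the candidate set while preserving high recall of attack-relevant information; and precomputes high-order adjacency and fine-tunes the LLM on a triple classification task. Together these integrate semantic and structural information efficiently and improve both attack effectiveness and explanation quality.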

Link: https://arxiv.org/abs/2510.11584
Authors: Ting Li, Yang Yang, Yipeng Yu, Liang Yao, Guoqing Chao, Ruifeng Xu
Affiliations: Sun Yat-sen University; Alibaba Inc.; Harbin Institute of Technology
Categories: Computation and Language (cs.CL)
Comments: 13 pages

Abstract:Adversarial attacks on knowledge graph embeddings (KGE) aim to disrupt the model’s link prediction ability by removing or inserting triples. A recent black-box method has attempted to incorporate textual and structural information to enhance attack performance. However, it is unable to generate human-readable explanations, and exhibits poor generalizability. In the past few years, large language models (LLMs) have demonstrated powerful capabilities in text comprehension, generation, and reasoning. In this paper, we propose LLMAtKGE, a novel LLM-based framework that selects attack targets and generates human-readable explanations. To provide the LLM with sufficient factual context under limited input constraints, we design a structured prompting scheme that explicitly formulates the attack as multiple-choice questions while incorporating KG factual evidence. To address the context-window limitation and hesitation issues, we introduce semantics-based and centrality-based filters, which compress the candidate set while preserving high recall of attack-relevant information. Furthermore, to efficiently integrate both semantic and structural information into the filter, we precompute high-order adjacency and fine-tune the LLM with a triple classification task to enhance filtering performance. Experiments on two widely used knowledge graph datasets demonstrate that our attack outperforms the strongest black-box baselines, provides explanations via reasoning, and shows competitive performance compared with white-box methods. Comprehensive ablation and case studies further validate its capability to generate explanations.

[NLP-18] Bag of Tricks for Subverting Reasoning-based Safety Guardrails

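[Quick Read]: This paper addresses a systemic vulnerability in reasoning-based safety guardrails for Large Reasoning Models (LRMs): mechanisms that otherwise defend well against jailbreak attacks can be bypassed by small but carefully designed perturbations of the input prompt, causing the model to generate harmful content. The key contribution is to show that these guardrails are highly sensitive to simple input modifications such as inserting template tokens, and to present a collection of jailbreak methods covering white-box, gray-box, and black-box settings, ranging from manual template manipulation to fully automated optimization. Experiments show attack success rates above 90% across multiple benchmarks, which highlights the weakness of current safety alignment for open-source LRMs and the need for stronger defenses against malicious misuse.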

Link: https://arxiv.org/abs/2510.11570
Authors: Shuo Chen, Zhen Han, Haokun Chen, Bailan He, Shengyun Si, Jingpei Wu, Philip Torr, Volker Tresp, Jindong Gu
Affiliations: LMU Munich; Munich Center for Machine Learning (MCML); Technical University of Berlin; Konrad Zuse School of Excellence in Reliable AI (relAI); DFKI; AWS AI; University of Oxford
Categories: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
Comments: OpenAI Red-teaming Challenge Winner and Oral Presentation

Abstract:Recent reasoning-based safety guardrails for Large Reasoning Models (LRMs), such as deliberative alignment, have shown strong defense against jailbreak attacks. By leveraging LRMs’ reasoning ability, these guardrails help the models to assess the safety of user inputs before generating final responses. The powerful reasoning ability can analyze the intention of the input query and will refuse to assist once it detects the harmful intent hidden by the jailbreak methods. Such guardrails have shown a significant boost in defense, such as the near-perfect refusal rates on the open-source gpt-oss series. Unfortunately, we find that these powerful reasoning-based guardrails can be extremely vulnerable to subtle manipulation of the input prompts, and once hijacked, can lead to even more harmful results. Specifically, we first uncover a surprisingly fragile aspect of these guardrails: simply adding a few template tokens to the input prompt can successfully bypass the seemingly powerful guardrails and lead to explicit and harmful responses. To explore further, we introduce a bag of jailbreak methods that subvert the reasoning-based guardrails. Our attacks span white-, gray-, and black-box settings and range from effortless template manipulations to fully automated optimization. Along with the potential for scalable implementation, these methods also achieve alarmingly high attack success rates (e.g., exceeding 90% across 5 different benchmarks on gpt-oss series on both local host models and online API services). Evaluations across various leading open-source LRMs confirm that these vulnerabilities are systemic, underscoring the urgent need for stronger alignment techniques for open-sourced LRMs to prevent malicious misuse. Code is open-sourced at this https URL.

[NLP-19] Culturally-Aware Conversations: A Framework Benchmark for LLMs EMNLP

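[Quick Read]: This paper addresses the mismatch between existing benchmarks for cultural adaptation in Large Language Models (LLMs) and the cross-cultural conversations these models actually face. Current evaluations do not reflect the challenges of interacting with users from different cultural backgrounds, so they give a distorted picture of a model's cross-cultural communication ability. The key to the solution is the first framework and benchmark, grounded in sociocultural theory, for realistic multicultural conversational settings. The framework treats linguistic style as a core element shaped jointly by situational, relational, and cultural context; the dataset is annotated by culturally diverse raters; and three new desiderata for cross-cultural NLP evaluation are defined: conversational framing, stylistic sensitivity, and subjective correctness. Using this benchmark, the study shows that today's top LLMs still struggle with cultural adaptation in multicultural conversations.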

Link: https://arxiv.org/abs/2510.11563
Authors: Shreya Havaldar, Sunny Rai, Young-Min Cho, Lyle Ungar
Affiliations: University of Pennsylvania
Categories: Computation and Language (cs.CL)
Comments: To appear at the 4th HCI + NLP Workshop @ EMNLP

Abstract:Existing benchmarks that measure cultural adaptation in LLMs are misaligned with the actual challenges these models face when interacting with users from diverse cultural backgrounds. In this work, we introduce the first framework and benchmark designed to evaluate LLMs in realistic, multicultural conversational settings. Grounded in sociocultural theory, our framework formalizes how linguistic style - a key element of cultural communication - is shaped by situational, relational, and cultural context. We construct a benchmark dataset based on this framework, annotated by culturally diverse raters, and propose a new set of desiderata for cross-cultural evaluation in NLP: conversational framing, stylistic sensitivity, and subjective correctness. We evaluate today’s top LLMs on our benchmark and show that these models struggle with cultural adaptation in a conversational setting.

[NLP-20] Invisible Languages of the LLM Universe

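[Quick Read]: This paper addresses the pervasive linguistic inequality in generative AI systems: although Large Language Models (LLMs) are trained on multilingual corpora, approximately 2,000 languages with millions of speakers remain effectively invisible in digital ecosystems. The key to the solution is an empirical framework that combines language vitality (real world demographic strength) with digitality (online presence) and interprets the results through postcolonial theory and epistemic injustice, showing that this inequality is not incidental but a structural continuation of colonial-era linguistic hierarchies in contemporary AI development. The study identifies four categories of languages; the "Invisible Giants" category, with high vitality but near-zero digitality, captures the problem most directly. On this basis the authors argue that English dominance in AI is not a technical necessity but the result of power structures that systematically exclude marginalized linguistic knowledge.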

Link: https://arxiv.org/abs/2510.11557
Authors: Saurabh Khanna, Xinxu Li
Affiliations: Amsterdam School of Communication Research, University of Amsterdam; Pembroke College, University of Oxford
Categories: Computation and Language (cs.CL)
Comments:

Abstract:Large Language Models are trained on massive multilingual corpora, yet this abundance masks a profound crisis: of the world’s 7,613 living languages, approximately 2,000 languages with millions of speakers remain effectively invisible in digital ecosystems. We propose a critical framework connecting empirical measurements of language vitality (real world demographic strength) and digitality (online presence) with postcolonial theory and epistemic injustice to explain why linguistic inequality in AI systems is not incidental but structural. Analyzing data across all documented human languages, we identify four categories: Strongholds (33%, high vitality and digitality), Digital Echoes (6%, high digitality despite declining vitality), Fading Voices (36%, low on both dimensions), and critically, Invisible Giants (27%, high vitality but near-zero digitality) - languages spoken by millions yet absent from the LLM universe. We demonstrate that these patterns reflect continuities from colonial-era linguistic hierarchies to contemporary AI development, constituting what we term digital epistemic injustice. Our analysis reveals that English dominance in AI is not a technical necessity but an artifact of power structures that systematically exclude marginalized linguistic knowledge. We conclude with implications for decolonizing language technology and democratizing access to AI benefits.

[NLP-21] Information-Preserving Reformulation of Reasoning Traces for Antidistillation

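[Quick Read]: This paper addresses the risk of unauthorized distillation that arises when a generative AI model's reasoning traces are disclosed: a third-party model can learn from the detailed reasoning and efficiently reproduce the original model's capabilities, threatening the provider's intellectual property and commercial value. Existing protections, such as replacing detailed reasoning with brief summaries, lower that risk but deprive users of the intermediate information they need to verify, understand, or learn from the reasoning. The key to the solution is PART, an information-preserving antidistillation reformulation of reasoning traces. It rewrites a trace in two steps, removing self-talk behaviors and reordering sub-conclusions, using a small auxiliary model trained for the task. This consistently weakens distillation across student models of different sizes and types while keeping the information in the original trace available to human readers.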

Link: https://arxiv.org/abs/2510.11545
Authors: Jiayu Ding, Lei Cui, Li Dong, Nanning Zheng, Furu Wei
Affiliations: IAIR, Xi’an Jiaotong University; Microsoft Research
Categories: Computation and Language (cs.CL)
Comments:

Abstract:Recent advances in Large Language Models (LLMs) show that extending the length of reasoning chains significantly improves performance on complex tasks. While revealing these reasoning traces helps users better follow, verify, and learn from the model’s problem-solving process, it also makes them highly vulnerable to unauthorized distillation. To mitigate this risk, proprietary model providers often adopt aggressive protection strategies, such as replacing detailed reasoning with brief summaries, which deprive users of valuable intermediate information. To address this trade-off, we propose PART, an information-preserving antidistillation reformulation of reasoning traces. Motivated by the difference between how humans understand reasoning traces and how LLMs exploit them for supervised fine-tuning, we design a simple but effective two-step reformulation: removing self-talk behaviors and reordering sub-conclusions. A small auxiliary model is trained to perform this reformulation, incurring minimal computational overhead. Extensive experiments demonstrate that PART consistently disrupts distillation across student models of different sizes and types on various reasoning benchmarks. For instance, when training on reformulated traces, even the performance of a large 32B student model decreases from 54.17 to 46.88 on AIME 2024, corresponding to a 13.5% degradation.
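The 13.5% quoted at the end of the abstract is a relative drop, not a difference in points. As a quick check of the arithmetic, using only the two AIME 2024 scores given above:

```latex
\frac{54.17 - 46.88}{54.17} = \frac{7.29}{54.17} \approx 0.1346 \approx 13.5\%
```

The absolute drop is 7.29 points.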

[NLP-22] An Encoder-Integrated PhoBERT with Graph Attention for Vietnamese Token-Level Classification

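[Quick Read]: This paper addresses the limited ability of models that rely on sequential context alone to capture dependencies between tokens in token-level classification, particularly for Vietnamese tasks such as Named Entity Recognition (NER) and speech disfluency detection. The key to the solution is TextGraphFuseGAT, which combines a pretrained Transformer encoder (PhoBERT) with Graph Attention Networks (GAT). PhoBERT first produces token embeddings; a fully connected graph is then built over them so that the GAT layer can model non-local dependencies between tokens explicitly; a Transformer-style self-attention layer further contextualizes the graph-enhanced embeddings; and a classification head performs the sequence labeling. The architecture combines pretrained semantic features with graph-based relational modeling and outperforms transformer-only and hybrid neural baselines on several Vietnamese benchmark datasets.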

Link: https://arxiv.org/abs/2510.11537
Authors: Ba-Quang Nguyen
Affiliations: University of Engineering and Technology, Vietnam National University, Hanoi
Categories: Computation and Language (cs.CL)
Comments: 11 pages, 1 figure. Submitted to VLSP 2025 and reviewed

Abstract:We propose a novel neural architecture named TextGraphFuseGAT, which integrates a pretrained transformer encoder (PhoBERT) with Graph Attention Networks for token-level classification tasks. The proposed model constructs a fully connected graph over the token embeddings produced by PhoBERT, enabling the GAT layer to capture rich inter-token dependencies beyond those modeled by sequential context alone. To further enhance contextualization, a Transformer-style self-attention layer is applied on top of the graph-enhanced embeddings. The final token representations are passed through a classification head to perform sequence labeling. We evaluate our approach on three Vietnamese benchmark datasets: PhoNER-COVID19 for named entity recognition in the COVID-19 domain, PhoDisfluency for speech disfluency detection, and VietMed-NER for medical-domain NER. VietMed-NER is the first Vietnamese medical spoken NER dataset, featuring 18 entity types collected from real-world medical speech transcripts and annotated with the BIO tagging scheme. Its specialized vocabulary and domain-specific expressions make it a challenging benchmark for token-level classification models. Experimental results show that our method consistently outperforms strong baselines, including transformer-only and hybrid neural models such as BiLSTM + CNN + CRF, confirming the effectiveness of combining pretrained semantic features with graph-based relational modeling for improved token classification across multiple domains.
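To make the "fully connected graph over token embeddings" step concrete, here is a minimal single-head graph attention layer in NumPy, following the standard GAT formulation. It is a sketch under assumptions, not the authors' implementation: random vectors stand in for PhoBERT embeddings, self-loops are included, and the function name `gat_layer`, the dimensions, and the LeakyReLU slope are illustrative choices.

```python
import numpy as np

def gat_layer(H, W, a, slope=0.2):
    """Single-head graph attention over a fully connected token graph.

    H: (n_tokens, d_in) token embeddings
    W: (d_in, d_out) shared linear projection
    a: (2 * d_out,) attention vector
    """
    Z = H @ W                                   # projected tokens, (n, d_out)
    d_out = Z.shape[1]
    src = Z @ a[:d_out]                         # contribution of token i
    dst = Z @ a[d_out:]                         # contribution of token j
    e = src[:, None] + dst[None, :]             # e[i, j] = a^T [z_i || z_j]
    e = np.where(e > 0, e, slope * e)           # LeakyReLU
    e = e - e.max(axis=1, keepdims=True)        # stabilise the softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)   # every token attends to all tokens
    return alpha @ Z, alpha

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))   # 5 tokens with 8-dim stand-in embeddings
W = rng.normal(size=(8, 4))
a = rng.normal(size=(8,))     # length 2 * d_out
out, alpha = gat_layer(H, W, a)
print(out.shape, alpha.sum(axis=1))
```

Because the graph is fully connected, no adjacency mask is needed: the attention matrix `alpha` is dense, and each row is a distribution over all tokens in the sequence.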

[NLP-23] Hallucination Detection via Internal States and Structured Reasoning Consistency in Large Language Models

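[Quick Read]: This paper addresses the "Detection Dilemma" in detecting sophisticated hallucinations in Large Language Models (LLMs): methods based on Internal State Probing are good at identifying factual inconsistencies but weak on logical fallacies, whereas methods based on Chain-of-Thought Verification work on logic-intensive tasks but struggle on fact-intensive tasks such as open-domain QA, where the reasoning lacks grounding. To overcome this, the authors propose a unified framework built on two mechanisms. A multi-path reasoning mechanism produces finer-grained, more comparable signals to get past the "Signal Scarcity Barrier", and a segment-aware temporalized cross-attention module adaptively fuses the aligned representations to pinpoint subtle dissonances. Experiments show that the framework significantly outperforms existing methods on multiple benchmarks and leading LLMs.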

Link: https://arxiv.org/abs/2510.11529
Authors: Yusheng Song, Lirong Qiu, Xi Zhang, Zhihao Tang
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:

Abstract:The detection of sophisticated hallucinations in Large Language Models (LLMs) is hampered by a “Detection Dilemma”: methods probing internal states (Internal State Probing) excel at identifying factual inconsistencies but fail on logical fallacies, while those verifying externalized reasoning (Chain-of-Thought Verification) show the opposite behavior. This schism creates a task-dependent blind spot: Chain-of-Thought Verification fails on fact-intensive tasks like open-domain QA where reasoning is ungrounded, while Internal State Probing is ineffective on logic-intensive tasks like mathematical reasoning where models are confidently wrong. We resolve this with a unified framework that bridges this critical gap. However, unification is hindered by two fundamental challenges: the Signal Scarcity Barrier, as coarse symbolic reasoning chains lack signals directly comparable to fine-grained internal states, and the Representational Alignment Barrier, a deep-seated mismatch between their underlying semantic spaces. To overcome these, we introduce a multi-path reasoning mechanism to obtain more comparable, fine-grained signals, and a segment-aware temporalized cross-attention module to adaptively fuse these now-aligned representations, pinpointing subtle dissonances. Extensive experiments on three diverse benchmarks and two leading LLMs demonstrate that our framework consistently and significantly outperforms strong baselines. Our code is available: this https URL.

[NLP-24] ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding

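[Quick Read]: This paper addresses the weak performance of Large Language Models (LLMs) on front-end development, where correctness depends on the rendered result (pixels and interaction). Conventional LLMs have difficulty using visual feedback to iterate, so generated front-end code often has layout errors or broken functionality. The key to the solution is the ReLook framework, whose main ideas are: (1) a generate-diagnose-refine loop built on a Multimodal Large Language Model (MLLM), which serves both as a visual critic that scores code from screenshots and as a source of actionable, vision-grounded feedback; (2) a strict zero-reward rule for invalid renders, which anchors renderability and prevents reward hacking; (3) Forced Optimization, an acceptance rule that admits only improving revisions and so avoids behavioral collapse; and (4) decoupling the critic at inference, where a lightweight, critic-free self-edit cycle keeps latency low while retaining most of the gains. The approach markedly improves vision-grounded front-end code generation.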

Link: https://arxiv.org/abs/2510.11498
Authors: Yuhang Li, Chenchen Zhang, Ruilin Lv, Ao Liu, Ken Deng, Yuanxing Zhang, Jiaheng Liu, Wiggin Zhou, Bo Zhou
Affiliations: Tencent; Peking University; Nanjing University
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments:

Abstract:While Large Language Models (LLMs) excel at algorithmic code generation, they struggle with front-end development, where correctness is judged on rendered pixels and interaction. We present ReLook, an agentic, vision-grounded reinforcement learning framework that empowers an agent to close a robust generate–diagnose–refine loop by invoking a multimodal LLM (MLLM) as a tool. During training, the agent uses the MLLM-in-the-loop both as a visual critic, scoring code with screenshots, and as a source of actionable, vision-grounded feedback; a strict zero-reward rule for invalid renders anchors renderability and prevents reward hacking. To prevent behavioral collapse, we introduce Forced Optimization, a strict acceptance rule that admits only improving revisions, yielding monotonically better trajectories. At inference, we decouple the critic and run a lightweight, critic-free self-edit cycle, keeping latency comparable to base decoding while retaining most of the gains. Across three widely used benchmarks, ReLook consistently outperforms strong baselines in vision-grounded front-end code generation, highlighting the benefits of agentic perception, visual rewards, and training-inference decoupling.
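The zero-reward rule and the Forced Optimization acceptance rule can be illustrated with a few lines of Python. This is a sketch under assumptions rather than the paper's training code: critic scores and render checks are supplied as plain values, and the names `reward` and `forced_optimization` are hypothetical.

```python
def reward(critic_score, render_ok):
    # Strict zero-reward rule: a revision that fails to render earns nothing,
    # whatever score the critic would otherwise assign.
    return critic_score if render_ok else 0.0

def forced_optimization(revisions):
    """Keep only revisions whose reward strictly improves on the best so far.

    revisions: iterable of (code, critic_score, render_ok) in generation order.
    The first revision is always accepted as the starting point.
    """
    accepted = []
    best = float("-inf")
    for code, critic_score, render_ok in revisions:
        r = reward(critic_score, render_ok)
        if r > best:
            accepted.append((code, r))
            best = r
    return accepted

# Hypothetical trajectory: v1 scores high but does not render, v3 is a regression.
trajectory = [
    ("v0", 0.4, True),
    ("v1", 0.9, False),
    ("v2", 0.6, True),
    ("v3", 0.5, True),
]
kept = forced_optimization(trajectory)
print(kept)
```

The accepted rewards increase monotonically by construction, which is the property the abstract attributes to Forced Optimization.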

[NLP-25] Investigating Large Language Models' Linguistic Abilities for Text Preprocessing

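[Quick Read]: This paper addresses the fact that traditional text preprocessing methods (such as stopword removal, lemmatization, and stemming) ignore contextual information when processing multilingual text. These methods usually rely on static rules or language-specific annotated resources, so they adapt poorly to context-dependent language use. The key to the solution is to use the contextual understanding of Large Language Models (LLMs), which can perform preprocessing without extensive language-specific annotated data. Experiments show that LLMs replicate traditional stopword removal, lemmatization, and stemming with accuracies reaching 97%, 82%, and 74%, respectively, and that machine learning models trained on LLM-preprocessed text improve by up to 6% in F1 over traditional techniques.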

Link: https://arxiv.org/abs/2510.11482
Authors: Marco Braga, Gian Carlo Milanese, Gabriella Pasi
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Accepted in WI-IAT 2025. Pre-camera-ready version

Abstract:Text preprocessing is a fundamental component of Natural Language Processing, involving techniques such as stopword removal, stemming, and lemmatization to prepare text as input for further processing and analysis. Despite the context-dependent nature of the above techniques, traditional methods usually ignore contextual information. In this paper, we investigate the idea of using Large Language Models (LLMs) to perform various preprocessing tasks, due to their ability to take context into account without requiring extensive language-specific annotated resources. Through a comprehensive evaluation on web-sourced data, we compare LLM-based preprocessing (specifically stopword removal, lemmatization and stemming) to traditional algorithms across multiple text classification tasks in six European languages. Our analysis indicates that LLMs are capable of replicating traditional stopword removal, lemmatization, and stemming methods with accuracies reaching 97%, 82%, and 74%, respectively. Additionally, we show that ML algorithms trained on texts preprocessed by LLMs achieve an improvement of up to 6% with respect to the F_1 measure compared to traditional techniques. Our code, prompts, and results are publicly available at this https URL.

[NLP-26] GenCNER: A Generative Framework for Continual Named Entity Recognition IJCNN2025

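[Quick Read]: This paper addresses catastrophic forgetting and the semantic shift of the non-entity type, two problems that arise in Continual Named Entity Recognition (CNER) as entity categories keep growing. The key to the solution is GenCNER, a generative framework that converts the CNER task into a sustained entity triplet sequence generation problem and solves it with a powerful pre-trained sequence-to-sequence (seq2seq) model. It also uses a type-specific confidence-based pseudo labeling strategy together with Knowledge Distillation (KD) to preserve learned knowledge and reduce the impact of label noise at the triplet level.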

Link: https://arxiv.org/abs/2510.11444
Authors: Yawen Yang, Fukun Ma, Shiao Meng, Aiwei Liu, Lijie Wen
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: Accepted by IJCNN 2025

Abstract:Traditional named entity recognition (NER) aims to identify text mentions into pre-defined entity types. Continual Named Entity Recognition (CNER) is introduced since entity categories are continuously increasing in various real-world scenarios. However, existing continual learning (CL) methods for NER face challenges of catastrophic forgetting and semantic shift of non-entity type. In this paper, we propose GenCNER, a simple but effective Generative framework for CNER to mitigate the above drawbacks. Specifically, we skillfully convert the CNER task into sustained entity triplet sequence generation problem and utilize a powerful pre-trained seq2seq model to solve it. Additionally, we design a type-specific confidence-based pseudo labeling strategy along with knowledge distillation (KD) to preserve learned knowledge and alleviate the impact of label noise at the triplet level. Experimental results on two benchmark datasets show that our framework outperforms previous state-of-the-art methods in multiple CNER settings, and achieves the smallest gap compared with non-CL results.
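A type-specific confidence filter for pseudo labels might look like the sketch below. The abstract does not say how the thresholds are set or what exactly a triplet contains, so the `(start, end, type)` layout, the function name `pseudo_label`, and the threshold values are all assumptions made for illustration.

```python
def pseudo_label(predictions, thresholds, default_threshold=1.0):
    """Keep old-model predictions that clear the threshold of their own entity type.

    predictions: list of (start, end, entity_type, confidence) from the old model
    thresholds:  per-type minimum confidence; unknown types fall back to
                 default_threshold, which rejects them unless confidence is 1.0
    """
    kept = []
    for start, end, entity_type, confidence in predictions:
        if confidence >= thresholds.get(entity_type, default_threshold):
            kept.append((start, end, entity_type))
    return kept

# Hypothetical old-model output on a sentence from the new task's training data.
old_model_predictions = [
    (0, 1, "PER", 0.97),
    (4, 5, "ORG", 0.55),
    (7, 7, "LOC", 0.81),
    (9, 10, "PER", 0.62),
]
thresholds = {"PER": 0.90, "ORG": 0.50, "LOC": 0.85}
triplets = pseudo_label(old_model_predictions, thresholds)
print(triplets)
```

Using one threshold per type, rather than a single global one, lets a type on which the old model is systematically less confident (here `ORG`) still contribute pseudo labels without admitting low-confidence predictions of other types.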

[NLP-27] Who are you, ChatGPT? Personality and Demographic Style in LLM-Generated Content ECAI2025

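[Quick Read]: This paper asks whether generative Large Language Models (LLMs) exhibit human-like personality and demographic characteristics in the text they produce. Earlier studies rely on self-report questionnaires, which are subjective and limited. The paper proposes a data-driven alternative: automatic personality and gender classifiers are applied to model replies to open-ended questions collected from Reddit, so that LLM personality can be assessed without questionnaires. The key to the solution is to use large-scale text from real users as the reference and to compare model and human text on the Big Five personality dimensions (such as Agreeableness and Neuroticism) and on gendered language patterns. The comparison shows that LLMs systematically express higher Agreeableness and lower Neuroticism, and that their gendered language patterns resemble those of human writers but with less variation, which gives an objective and scalable way to study the "personality" of generative AI.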

Link: https://arxiv.org/abs/2510.11434
Authors: Dana Sotto Porat, Ella Rabinovich
Affiliations: The Academic College of Tel Aviv–Yaffo
Categories: Computation and Language (cs.CL)
Comments: ECAI2025 (Identity-Aware AI workshop)

Abstract:Generative large language models (LLMs) have become central to everyday life, producing human-like text across diverse domains. A growing body of research investigates whether these models also exhibit personality- and demographic-like characteristics in their language. In this work, we introduce a novel, data-driven methodology for assessing LLM personality without relying on self-report questionnaires, applying instead automatic personality and gender classifiers to model replies on open-ended questions collected from Reddit. Comparing six widely used models to human-authored responses, we find that LLMs systematically express higher Agreeableness and lower Neuroticism, reflecting cooperative and stable conversational tendencies. Gendered language patterns in model text broadly resemble those of human writers, though with reduced variation, echoing prior findings on automated agents. We contribute a new dataset of human and model responses, along with large-scale comparative analyses, shedding new light on the topic of personality and demographic patterns of generative AI.

[NLP-28] Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation

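[Quick Read]: This paper addresses the severe response latency of the Community Notes system in governing health misinformation: there is a significant delay before a note receives a helpfulness status (a median of 17.6 hours), which is too slow for real-world misinformation surges. The key to the solution is the CrowdNotes+ framework, which augments Community Notes through two complementary modes: (1) evidence-grounded note augmentation, where Large Language Models (LLMs) supplement notes with support from authoritative sources; and (2) utility-guided note automation, which optimizes the usefulness and accuracy of note content. The framework also introduces a hierarchical three-step evaluation that assesses relevance, correctness, and helpfulness in turn, closing a loophole in current evaluation where stylistic fluency is mistaken for factual accuracy. This improves the factual precision and timeliness of notes and points toward a more efficient and rigorous hybrid human-AI governance model.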

Link: https://arxiv.org/abs/2510.11423
Authors: Jiaying Wu, Zihang Fu, Haonan Wang, Fanxiao Li, Min-Yen Kan
Affiliations: National University of Singapore; Yunnan University
Categories: Social and Information Networks (cs.SI); Computation and Language (cs.CL)
Comments:

Abstract:Community Notes, the crowd-sourced misinformation governance system on X (formerly Twitter), enables users to flag misleading posts, attach contextual notes, and vote on their helpfulness. However, our analysis of 30.8K health-related notes reveals significant latency, with a median delay of 17.6 hours before the first note receives a helpfulness status. To improve responsiveness during real-world misinformation surges, we propose CrowdNotes+, a unified framework that leverages large language models (LLMs) to augment Community Notes for faster and more reliable health misinformation governance. CrowdNotes+ integrates two complementary modes: (1) evidence-grounded note augmentation and (2) utility-guided note automation, along with a hierarchical three-step evaluation that progressively assesses relevance, correctness, and helpfulness. We instantiate the framework through HealthNotes, a benchmark of 1.2K helpfulness-annotated health notes paired with a fine-tuned helpfulness judge. Experiments on fifteen LLMs reveal an overlooked loophole in current helpfulness evaluation, where stylistic fluency is mistaken for factual accuracy, and demonstrate that our hierarchical evaluation and LLM-augmented generation jointly enhance factual precision and evidence utility. These results point toward a hybrid human-AI governance model that improves both the rigor and timeliness of crowd-sourced fact-checking.
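The hierarchical three-step evaluation can be read as a gated pipeline in which helpfulness is scored only for notes that are already relevant and correct. The sketch below shows that control flow only; the judge callables, the 0.5 threshold, and the note fields are hypothetical stand-ins, not the paper's judges.

```python
def hierarchical_eval(note, is_relevant, is_correct, helpfulness, threshold=0.5):
    """Assess relevance, then correctness, then helpfulness, stopping at the first failure."""
    if not is_relevant(note):
        return {"stage": "relevance", "passed": False}
    if not is_correct(note):
        return {"stage": "correctness", "passed": False}
    score = helpfulness(note)
    return {"stage": "helpfulness", "passed": score >= threshold, "score": score}

# Hypothetical stand-in judges; a real system would call LLM or human judges here.
judges = dict(
    is_relevant=lambda n: n["relevant"],
    is_correct=lambda n: n["correct"],
    helpfulness=lambda n: n["fluency"],  # a naive judge that rewards fluent writing
)

fluent_but_wrong = {"relevant": True, "correct": False, "fluency": 0.95}
plain_but_right = {"relevant": True, "correct": True, "fluency": 0.70}

rejected = hierarchical_eval(fluent_but_wrong, **judges)
accepted = hierarchical_eval(plain_but_right, **judges)
print(rejected, accepted)
```

Even with a helpfulness judge that only rewards fluency, the fluent but incorrect note never reaches that stage, which is how gating addresses the loophole described in the abstract.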

[NLP-29] Valid Survey Simulations with Limited Human Data: The Roles of Prompting, Fine-Tuning, and Rectification

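[Quick Read]: This paper addresses two problems: traditional surveys are costly and slow, which makes them hard to run at scale, and the simulated responses that Large Language Models (LLMs) generate as a substitute for human respondents are substantially biased. The core challenge is how to combine LLM-synthesized data with bias rectification methods so that estimation accuracy and sample efficiency are maximized under a limited budget. The key to the solution is to depart from the common practice of using all human data to fine-tune the LLM and instead allocate most human responses to rectification. This keeps bias low (below 5%) while increasing effective sample size by up to 14%, giving more accurate and efficient population estimates.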

Link: https://arxiv.org/abs/2510.11408
Authors: Stefan Krsteski, Giuseppe Russo, Serina Chang, Robert West, Kristina Gligorić
Affiliations: EPFL; Stanford University; University of California, Berkeley; Johns Hopkins University
Categories: Computation and Language (cs.CL)
Comments: 19 pages, 4 figures, 9 tables

Abstract:Surveys provide valuable insights into public opinion and behavior, but their execution is costly and slow. Large language models (LLMs) have been proposed as a scalable, low-cost substitute for human respondents, but their outputs are often biased and yield invalid estimates. We study the interplay between synthesis methods that use LLMs to generate survey responses and rectification methods that debias population estimates, and explore how human responses are best allocated between them. Using two panel surveys with questions on nutrition, politics, and economics, we find that synthesis alone introduces substantial bias (24-86%), whereas combining it with rectification reduces bias below 5% and increases effective sample size by up to 14%. Overall, we challenge the common practice of using all human responses for fine-tuning, showing that under a fixed budget, allocating most to rectification results in far more effective estimation.
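One simple way to rectify a synthetic estimate with a small human sample is a difference estimator in the style of prediction-powered inference: take the mean of the LLM responses over the large pool and add the average human-minus-LLM gap measured on the items that have both. The abstract does not name the rectification method used, so this choice, the function name `rectified_mean`, and the numbers are assumptions for illustration only.

```python
from statistics import fmean

def rectified_mean(synthetic_all, human_labeled, synthetic_labeled):
    """Debias a synthetic population mean using paired human responses.

    synthetic_all:     LLM responses for the large pool
    human_labeled:     human responses for the small paired subset
    synthetic_labeled: LLM responses for the same paired subset
    """
    if len(human_labeled) != len(synthetic_labeled):
        raise ValueError("paired samples must have the same length")
    gaps = [h - s for h, s in zip(human_labeled, synthetic_labeled)]
    return fmean(synthetic_all) + fmean(gaps)

# Hypothetical agreement rates: the LLM overstates agreement by roughly 0.25.
synthetic_all = [0.8, 0.6, 0.7, 0.9]
human_labeled = [0.5, 0.4]
synthetic_labeled = [0.8, 0.6]
estimate = rectified_mean(synthetic_all, human_labeled, synthetic_labeled)
print(estimate)
```

Under this kind of estimator the human responses are spent on measuring the LLM's bias rather than on changing the LLM, which is the allocation question the abstract studies.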

[NLP-30] KnowRL: Teaching Language Models to Know What They Know

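[Quick Read]: This paper addresses a weakness in the self-knowledge of large language models (LLMs): models often misjudge the boundaries of their own knowledge, getting their own competence wrong in more than one in five cases, which makes their outputs hard to trust. To improve how accurately a model judges what is feasible for it, the authors propose KnowRL, a framework built on two components that need no external supervision: (i) introspection, in which the model generates tasks and classifies them as feasible or infeasible for itself; and (ii) consensus-based rewarding, which reinforces the stability of self-knowledge assessments through internal agreement. The method relies entirely on internally generated data. In experiments on LLaMA-3.1-8B and Qwen-2.5-7B, starting from only a small seed set, it yields gains of up to 28% in accuracy and 12% in F1.

Link: https://arxiv.org/abs/2510.11407
Authors: Sahil Kale,Devendra Singh Dhami
Affiliations: KnowledgeVerse AI; TU Eindhoven
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 14 pages, 7 figures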

Abstract:Truly reliable AI requires more than simply scaling up knowledge; it demands the ability to know what it knows and when it does not. Yet recent research shows that even the best LLMs misjudge their own competence in more than one in five cases, making any response born of such internal uncertainty impossible to fully trust. Inspired by self-improvement reinforcement learning techniques that require minimal data, we present a simple but powerful framework KnowRL that strengthens a model’s internal understanding of its own feasibility boundaries, enabling safer and more responsible behaviour. Our framework combines two components: (i) introspection, where the model generates and classifies tasks it judges feasible or infeasible, and (ii) consensus-based rewarding, where stability of self-knowledge assessment is reinforced through internal agreement. By using internally generated data, this design strengthens consistency in self-knowledge and entirely avoids costly external supervision. In experiments on LLaMA-3.1-8B and Qwen-2.5-7B, KnowRL steadily improved self-knowledge, validated by both intrinsic self-consistency and extrinsic benchmarking. With nothing more than a small seed set and no external supervision, our method drove gains as high as 28% in accuracy and 12% in F1, outperforming baselines in just a few iterations. Our framework essentially unlocks the untapped capacity of LLMs to self-improve their knowledge awareness, opening the door to reliable, more accountable AI and safer deployment in critical applications. Owing to its simplicity and independence from external effort, we encourage applying this reliability-enhancing process to all future models.

[NLP-31] DocReward: A Document Reward Model for Structuring and Stylizing

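[Quick Read]: This paper addresses a gap in current agentic workflows for professional document generation: they focus on textual quality and neglect visual structure and style, which hurts readability and engagement. The main obstacle is the lack of a reward model that can guide agents toward better structure and style. The proposed solution is DocReward, a document reward model that scores documents on structure and style. It is trained on DocPair, a multi-domain dataset of 117K paired documents covering 32 domains and 267 document types; each pair has identical content but differs in professionalism, so the evaluation is independent of textual quality. The model is trained with the Bradley-Terry loss, which penalizes predictions that contradict the annotated ranking. DocReward outperforms GPT-4o and GPT-5 in accuracy by 30.6 and 19.4 percentage points, respectively, and in an extrinsic document-generation evaluation it reaches a 60.8% win rate versus 37.7% for GPT-5.

Link: https://arxiv.org/abs/2510.11391
Authors: Junpeng Liu,Yuzhong Zhao,Bowen Cao,Jiayu Ding,Yilin Jia,Tengchao Lv,Yupan Huang,Shaohan Huang,Nan Yang,Li Dong,Lei Cui,Tao Ge,Xun Wang,Huitian Jiao,Sun Mao,FNU Kartik,Si-Qing Chen,Wai Lam,Furu Wei
Affiliations: CUHK; UCAS; XJTU; UMich; Microsoft
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: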

Abstract:Recent advances in agentic workflows have enabled the automation of tasks such as professional document generation. However, they primarily focus on textual quality, neglecting visual structure and style, which are crucial for readability and engagement. This gap arises mainly from the absence of suitable reward models to guide agentic workflows toward producing documents with stronger structural and stylistic quality. To address this, we propose DocReward, a document reward model that evaluates documents based on their structure and style. We construct a multi-domain dataset DocPair of 117K paired documents, covering 32 domains and 267 document types, each including a high- and low-professionalism document with identical content but different structure and style. This enables the model to evaluate professionalism comprehensively, and in a textual-quality-agnostic way. DocReward is trained using the Bradley-Terry loss to score documents, penalizing predictions that contradict the annotated ranking. To assess the performance of reward models, we create a test dataset containing document bundles ranked by well-educated human evaluators. Notably, DocReward outperforms GPT-4o and GPT-5 in accuracy by 30.6 and 19.4 percentage points, respectively, demonstrating its superiority over baselines. In an extrinsic evaluation of document generation, DocReward achieves a significantly higher win rate of 60.8%, compared to GPT-5’s 37.7% win rate, demonstrating its utility in guiding generation agents toward producing human-preferred documents.
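
For readers unfamiliar with the Bradley-Terry loss mentioned above, its standard pairwise form is -log(sigmoid(s_preferred - s_rejected)), where the two scores come from the reward model. The sketch below shows that formula for a single pair in plain Python; how DocReward computes the scores and batches pairs is not stated in the abstract, so the function and argument names here are illustrative assumptions.

```python
import math


def bradley_terry_loss(score_preferred, score_rejected):
    """Standard pairwise Bradley-Terry loss for one document pair.

    Equals -log(sigmoid(score_preferred - score_rejected)). The loss is
    small when the preferred (more professional) document scores higher
    and grows when the predicted order contradicts the annotation.
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

With equal scores the loss is log 2, and swapping the two scores changes the loss by exactly the size of the margin, which is what pushes the model to rank the annotated winner first.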

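[NLP-32] Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies

[Quick Read]: This paper addresses the inadequate evaluation of LLMs in social deduction games such as Werewolf. Existing studies rely mostly on LLM self-play, which yields templated utterances and misses the detail of social interaction, and they lack quality reference data for fine-grained evaluation. The solution has two parts. First, the authors curate a high-quality, human-verified multimodal Werewolf dataset with over 100 hours of video, 32.4M utterance tokens, and 15 rule variants. Second, they propose a two-stage strategy-alignment evaluation that uses the winning faction's strategies as ground truth: (1) speech evaluation, formulated as multiple-choice-style tasks that test whether the model adopts appropriate stances across five dimensions of social ability; and (2) decision evaluation, which assesses the model's voting choices and opponent-role inferences. The framework gives a fine-grained measure of linguistic and reasoning ability and reveals clear gaps in deception and counterfactual reasoning among state-of-the-art LLMs.

Link: https://arxiv.org/abs/2510.11389
Authors: Zirui Song,Yuan Huang,Junchang Liu,Haozhe Luo,Chenxi Wang,Lang Gao,Zixiang Xu,Mingfei Han,Xiaojun Chang,Xiuying Chen
Affiliations: Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); Northeastern University
Subjects: Computation and Language (cs.CL)
Comments: 34 pages, 32 figures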

Abstract:Social deduction games like Werewolf combine language, reasoning, and strategy, providing a testbed for studying natural language and social intelligence. However, most studies reduce the game to LLM-based self-play, yielding templated utterances and anecdotal cases that overlook the richness of social gameplay. Evaluation further relies on coarse metrics such as survival time or subjective scoring due to the lack of quality reference data. To address these gaps, we curate a high-quality, human-verified multimodal Werewolf dataset containing over 100 hours of video, 32.4M utterance tokens, and 15 rule variants. Based on this dataset, we propose a novel strategy-alignment evaluation that leverages the winning faction’s strategies as ground truth in two stages: 1) Speech evaluation, formulated as multiple-choice-style tasks that assess whether the model can adopt appropriate stances across five dimensions of social ability; and 2) Decision evaluation, which assesses the model’s voting choices and opponent-role inferences. This framework enables a fine-grained evaluation of models’ linguistic and reasoning capabilities, while capturing their ability to generate strategically coherent gameplay. Our experiments show that state-of-the-art LLMs vary widely in performance, with roughly half remaining below 0.50, revealing clear gaps in deception and counterfactual reasoning. We hope our dataset further inspires research on language, reasoning, and strategy in multi-agent interaction.

[NLP-33] Early Detection and Reduction of Memorisation for Domain Adaptation and Instruction Tuning ACL

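[Quick Read]: This paper addresses verbatim memorisation of training data during fine-tuning of large language models, particularly for domain adaptation and instruction tuning. Most existing defences target pre-training, so memorisation risk during fine-tuning is poorly understood. The solution has two parts. First, a simple n-gram memorisation score reliably precedes verbatim memorisation; used as an early-stopping criterion, it mitigates memorisation with minimal performance loss. Second, an n-gram-aware loss regulariser reduces memorisation across all tested model families by up to 40% while costing less evaluation performance than an existing mitigation strategy, giving a practical and scalable way to manage memorisation.

Link: https://arxiv.org/abs/2510.11372
Authors: Dean L. Slack,Noura Al Moubayed
Affiliations: Durham University
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Accepted to Transactions of the ACL (TACL), 2025. 15 pages, 6 figures, 3 tables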

Abstract:Although large language models excel across many tasks, they can memorise training data and thereby expose private or copyrighted text. Most defences target the pre-training stage, leaving memorisation during fine-tuning, especially for domain adaptation and instruction tuning, poorly understood. We fine-tune Pythia, Llama3, and Mistral models spanning 1.4B-70B parameters on common evaluation datasets and track verbatim memorisation throughout training. We find that memorisation increases dramatically in the first few epochs, often significantly before either validation perplexity or evaluation performance is optimised. We use a simple but effective n-gram memorisation score which reliably precedes verbatim memorisation; using it as an early-stopping criterion mitigates memorisation with minimal performance loss. Further, we introduce an n-gram-aware loss regulariser and show that it reduces memorisation across all model families tested by up to 40% while minimising evaluation performance trade-offs when compared to an existing memorisation mitigation strategy. These results yield practical, scalable insights into memorisation dynamics during language model fine-tuning.
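
The abstract does not define the n-gram memorisation score, so the following is only a guess at the general shape of such a criterion: measure what fraction of the n-grams in a model's continuation also occur in the true training continuation, and stop fine-tuning once that fraction crosses a threshold. The functions, the choice of n, and the threshold are all hypothetical.

```python
def ngram_overlap(generated, reference, n=4):
    """Fraction of n-grams in `generated` that also occur in `reference`.

    Both arguments are token lists. Returns 0.0 when `generated` is too
    short to contain any n-gram.
    """
    def ngrams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    generated_ngrams = ngrams(generated)
    if not generated_ngrams:
        return 0.0
    return len(generated_ngrams & ngrams(reference)) / len(generated_ngrams)


def should_stop(scores_per_epoch, threshold=0.5):
    """Hypothetical early-stopping rule: stop once the latest score
    reaches the threshold (the threshold value is an assumption)."""
    return bool(scores_per_epoch) and scores_per_epoch[-1] >= threshold
```

A partial-overlap score like this can rise before whole sequences are reproduced verbatim, which matches the abstract's claim that the score precedes verbatim memorisation.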

[NLP-34] Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers

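[Quick Read]: This paper addresses training collapse in reinforcement learning (RL) for Mixture-of-Experts (MoE) models, caused by instability in the routing mechanism and, in particular, by routing behaviour that differs between training and inference. The key solution is Rollout Routing Replay (R3), which records routing distributions from the inference engine and replays them during training. R3 significantly reduces the KL divergence between training and inference policies and mitigates extreme discrepancies without compromising training speed, which stabilizes RL training and improves performance.

Link: https://arxiv.org/abs/2510.11370
Authors: Wenhan Ma,Hailin Zhang,Liang Zhao,Yifan Song,Yudong Wang,Zhifang Sui,Fuli Luo
Affiliations: Peking University; Xiaomi
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: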

Abstract:Reinforcement learning (RL) has emerged as a crucial approach for enhancing the capabilities of large language models. However, in Mixture-of-Experts (MoE) models, the routing mechanism often introduces instability, even leading to catastrophic RL training collapse. We analyze the training-inference consistency of MoE models and identify a notable discrepancy in routing behaviors between the two phases. Moreover, even under identical conditions, the routing framework can yield divergent expert selections across repeated forward passes. To address this foundational inconsistency, we propose Rollout Routing Replay (R3), a method that records routing distributions from the inference engine and replays them during training. R3 significantly reduces training-inference policy KL divergence and mitigates extreme discrepancies without compromising training speed. Extensive experiments on various settings confirm that R3 succeeds in stabilizing RL training, preventing collapse and outperforming methods such as GSPO and TIS. We believe this work can offer a new solution for stabilizing RL in MoE models.

[NLP-35] LLM-Specific Utility: A New Perspective for Retrieval-Augmented Generation

[Quick Read]: This paper addresses a key issue in retrieval-augmented generation (RAG): existing methods usually treat the utility of a retrieved passage as a generic attribute and ignore that different large language models (LLMs) may benefit differently from the same passage, because their internal knowledge and comprehension ability vary. The paper introduces and systematically investigates the notion of LLM-specific utility, showing that human-annotated passages are not optimal for LLMs and that ground-truth utilitarian passages do not transfer across LLMs. The authors propose a benchmarking procedure for LLM-specific utility judgments and find that verbalized methods using pseudo-answers perform robustly, whereas LLMs struggle to assess utility effectively: they fail to reject all passages for known queries and to select truly useful ones for unknown queries.

Link: https://arxiv.org/abs/2510.11358
Authors: Hengran Zhang,Keping Bi,Jiafeng Guo,Jiaming Zhang,Shuaiqiang Wang,Dawei Yin,Xueqi Cheng
Affiliations: State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Baidu Inc
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Comments: 13 pages, 9 figures

Abstract:Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge. While traditional retrieval focuses on relevance, RAG’s effectiveness depends on the utility of retrieved passages, i.e., the usefulness in facilitating the generation of an accurate and comprehensive answer. Existing studies often treat utility as a generic attribute, ignoring the fact that different LLMs may benefit differently from the same passage due to variations in internal knowledge and comprehension ability. In this work, we introduce and systematically investigate the notion of LLM-specific utility. Through large-scale experiments across multiple datasets and LLMs, we demonstrate that human-annotated passages are not optimal for LLMs and that ground-truth utilitarian passages are not transferable across different LLMs. These findings highlight the necessity of adopting the LLM-specific utility in RAG research. Our findings indicate that some human-annotated passages are not ground-truth utilitarian passages for specific LLMs, partially due to the varying readability of queries and passages for LLMs, a tendency for which perplexity is a key metric. Based on these findings, we propose a benchmarking procedure for LLM-specific utility judgments. We evaluate existing utility judgment methods on six datasets and find that while verbalized methods using pseudo-answers perform robustly, LLMs struggle to assess utility effectively-failing to reject all passages for known queries and to select truly useful ones for unknown queries.

[NLP-36] Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap ICASSP2026

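[Quick Read]: This paper addresses the audio-text modality gap that limits the benefit of coupling multimodal encoders with large language models (LLMs). The key solution is Diffusion-Link, a diffusion-based modality-bridging module that generatively maps audio embeddings from a frozen multimodal encoder into the text-embedding distribution. The module is a lightweight network with three residual MLP blocks. Without external knowledge, it narrows the modality gap and achieves state-of-the-art results on AudioCaps for Automatic Audio Captioning (AAC) in both zero-shot and fully supervised settings, with relative gains of up to 52.5% and 7.5%, respectively.

Link: https://arxiv.org/abs/2510.11330
Authors: KiHyun Nam,Jongmin Choi,Hyeongkeun Lee,Jungwoo Heo,Joon Son Chung
Affiliations: Unknown
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Comments: 5 pages. Submitted to IEEE ICASSP 2026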

Abstract:Contrastive audio-language pretraining yields powerful joint representations, yet a persistent audio-text modality gap limits the benefits of coupling multimodal encoders with large language models (LLMs). We present Diffusion-Link, a diffusion-based modality-bridging module that generatively maps audio embeddings into the text-embedding distribution. The module is trained at the output embedding from the frozen multimodal encoder and implemented as a lightweight network with three residual MLP blocks. To assess the effect of Diffusion-Link on multimodal encoder-LLM coupling, we evaluate on Automatic Audio Captioning (AAC); to our knowledge, this is the first application of diffusion-based modality bridging to AAC. We report two results. (1) Modality-gap analysis: on similarity and geometric criteria, Diffusion-Link reduces the modality gap the most among prior diffusion-based methods and shows a collective migration of audio embeddings toward the text distribution. (2) Downstream AAC: attaching Diffusion-Link to the same multimodal LLM baseline achieves state-of-the-art on AudioCaps in both zero-shot and fully supervised captioning without external knowledge, with relative gains up to 52.5% and 7.5%, respectively. These findings show that closing the modality gap is pivotal for effective coupling between multimodal encoders and LLMs, and diffusion-based modality bridging offers a promising direction beyond knowledge-retrieval-centric designs. Code will be released upon acceptance this https URL

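[NLP-37] Do LLMs “Feel”? Emotion Circuits Discovery and Control

[Quick Read]: This paper addresses two open problems for large language models (LLMs): the internal mechanisms behind emotional expression are not understood, and universal emotion control is hard to achieve. The approach has three steps. First, the authors construct a controlled dataset, SEV (Scenario-Event with Valence), to elicit comparable internal states across emotions. Second, through analytical decomposition and causal interventions, they extract context-agnostic emotion directions, identify the neurons and attention heads that locally implement emotional computation, and quantify each sublayer's causal influence on the final emotion representation. Third, they integrate these local components into global emotion circuits; directly modulating the circuits achieves 99.65% emotion-expression accuracy on the test set, surpassing prompting- and steering-based methods.

Link: https://arxiv.org/abs/2510.11328
Authors: Chenxi Wang,Yixuan Zhang,Ruiji Yu,Yufei Zheng,Lang Gao,Zirui Song,Zixiang Xu,Gus Xia,Huishuai Zhang,Dongyan Zhao,Xiuying Chen
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 19 pages, 8 figures, 8 tables. Code and dataset available at this https URL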

Abstract:As the demand for emotional intelligence in large language models (LLMs) grows, a key challenge lies in understanding the internal mechanisms that give rise to emotional expression and in controlling emotions in generated text. This study addresses three core questions: (1) Do LLMs contain context-agnostic mechanisms shaping emotional expression? (2) What form do these mechanisms take? (3) Can they be harnessed for universal emotion control? We first construct a controlled dataset, SEV (Scenario-Event with Valence), to elicit comparable internal states across emotions. Subsequently, we extract context-agnostic emotion directions that reveal consistent, cross-context encoding of emotion (Q1). We identify neurons and attention heads that locally implement emotional computation through analytical decomposition and causal analysis, and validate their causal roles via ablation and enhancement interventions. Next, we quantify each sublayer’s causal influence on the model’s final emotion representation and integrate the identified local components into coherent global emotion circuits that drive emotional expression (Q2). Directly modulating these circuits achieves 99.65% emotion-expression accuracy on the test set, surpassing prompting- and steering-based methods (Q3). To our knowledge, this is the first systematic study to uncover and validate emotion circuits in LLMs, offering new insights into interpretability and controllable emotional intelligence.

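[NLP-38] Template-Based Text-to-Image Alignment for Language Accessibility: A Study on Visualizing Text Simplifications

[Quick Read]: This paper addresses the difficulty that individuals with intellectual disabilities have in comprehending complex texts, and in particular how visual illustrations can make text simplification (TS) more accessible. Most text-to-image models prioritize aesthetics over accessibility, and the relationship between generated images and simplified text has not been studied systematically. The key solution is a structured vision-language model (VLM) prompting framework with five prompt templates (Basic Object Focus, Contextual Scene, Educational Layout, Multi-Level Detail, and Grid Layout) that respect accessibility constraints such as object count limits, spatial separation, and content restrictions. A two-phase evaluation validates the framework: Phase 1 measures semantic alignment with CLIPScores, and Phase 2 has four accessibility experts annotate generated images across ten visual styles. The Basic Object Focus template achieves the highest semantic alignment, and experts rate the Retro style as the most accessible.

Link: https://arxiv.org/abs/2510.11314
Authors: Belkiss Souayed,Sarah Ebling,Yingqiang Gao
Affiliations: University of Zurich
Subjects: Computation and Language (cs.CL)
Comments: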

Abstract:Individuals with intellectual disabilities often have difficulties in comprehending complex texts. While many text-to-image models prioritize aesthetics over accessibility, it is not clear how visual illustrations relate to text simplifications (TS) generated from them. This paper presents a structured vision-language model (VLM) prompting framework for generating accessible images from simplified texts. We designed five prompt templates, i.e., Basic Object Focus, Contextual Scene, Educational Layout, Multi-Level Detail, and Grid Layout, each following distinct spatial arrangements while adhering to accessibility constraints such as object count limits, spatial separation, and content restrictions. Using 400 sentence-level simplifications from four established TS datasets (OneStopEnglish, SimPA, Wikipedia, and ASSET), we conducted a two-phase evaluation: Phase 1 assessed prompt template effectiveness with CLIPScores, and Phase 2 involved human annotation of generated images across ten visual styles by four accessibility experts. Results show that the Basic Object Focus prompt template achieved the highest semantic alignment, indicating that visual minimalism enhances language accessibility. Expert evaluation further identified Retro style as the most accessible and Wikipedia as the most effective data source. Inter-annotator agreement varied across dimensions, with Text Simplicity showing strong reliability and Image Quality proving more subjective. Overall, our framework offers practical guidelines for accessible content generation and underscores the importance of structured prompting in AI-generated visual accessibility tools.

[NLP-39] FOSSIL: Harnessing Feedback on Suboptimal Samples for Data-Efficient Generalisation with Imitation Learning for Embodied Vision-and-Language Tasks EMNLP2025

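[Quick Read]: This paper addresses a limitation of imitation learning in embodied AI: because it depends on optimal demonstrations, agents cannot learn from suboptimal or erroneous behaviour, while reinforcement learning sacrifices data efficiency through costly exploration. The key solution is to supply constructive language feedback: feedback embeddings are provided as part of the input sequence of a Transformer-based policy, optionally complemented by auxiliary self-supervised objectives for feedback prediction. Language feedback contextualises different modes of behaviour without intermediate scalar rewards, so the agent can make use of suboptimal demonstrations, which significantly improves compositional generalisation and robustness.

Link: https://arxiv.org/abs/2510.11307
Authors: Sabrina McCallum,Amit Parekh,Alessandro Suglia
Affiliations: University of Edinburgh; Heriot-Watt University
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: EMNLP 2025 Findings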

Abstract:Current approaches to embodied AI tend to learn policies from expert demonstrations. However, without a mechanism to evaluate the quality of demonstrated actions, they are limited to learning from optimal behaviour, or they risk replicating errors and inefficiencies. While reinforcement learning offers one alternative, the associated exploration typically results in sacrificing data efficiency. This work explores how agents trained with imitation learning can learn robust representations from both optimal and suboptimal demonstrations when given access to constructive language feedback as a means to contextualise different modes of behaviour. We directly provide language feedback embeddings as part of the input sequence into a Transformer-based policy, and optionally complement the traditional next action prediction objective with auxiliary self-supervised learning objectives for feedback prediction. We test our approach on a range of embodied Vision-and-Language tasks in our custom BabyAI-XGen environment and show significant improvements in agents’ compositional generalisation abilities and robustness, suggesting that our data-efficient method allows models to successfully convert suboptimal behaviour into learning opportunities. Overall, our results suggest that language feedback is a competitive and intuitive alternative to intermediate scalar rewards for language-specified embodied tasks.

[NLP-40] Are Large Language Models Effective Knowledge Graph Constructors?

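[Quick Read]: This paper addresses the difficulty of building high-quality knowledge graphs (KGs), which requires accurate information extraction and structured representations. Existing approaches based on large language models (LLMs) focus narrowly on entity and relation extraction, limiting coverage to sentence-level contexts or relying on predefined schemas. The key solution is a hierarchical extraction framework that organizes information at multiple levels to produce semantically rich, well-structured KGs. The authors build KGs with state-of-the-art LLMs and evaluate them from both structural and semantic perspectives, which exposes the strengths and shortcomings of current LLMs in KG construction and identifies challenges for future work.

Link: https://arxiv.org/abs/2510.11297
Authors: Ruirui Chen,Weifeng Jiang,Chengwei Qin,Bo Xiong,Fiona Liausvia,Dongkyu Choi,Boon Kiat Quek
Affiliations: Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore; Nanyang Technological University, Singapore; Hong Kong University of Science and Technology (Guangzhou), China; Stanford University, United States
Subjects: Computation and Language (cs.CL)
Comments: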

Abstract:Knowledge graphs (KGs) are vital for knowledge-intensive tasks and have shown promise in reducing hallucinations in large language models (LLMs). However, constructing high-quality KGs remains difficult, requiring accurate information extraction and structured representations that support interpretability and downstream utility. Existing LLM-based approaches often focus narrowly on entity and relation extraction, limiting coverage to sentence-level contexts or relying on predefined schemas. We propose a hierarchical extraction framework that organizes information at multiple levels, enabling the creation of semantically rich and well-structured KGs. Using state-of-the-art LLMs, we extract and construct knowledge graphs and evaluate them comprehensively from both structural and semantic perspectives. Our results highlight the strengths and shortcomings of current LLMs in KG construction and identify key challenges for future work. To advance research in this area, we also release a curated dataset of LLM-generated KGs derived from research papers on children’s mental well-being. This resource aims to foster more transparent, reliable, and impactful applications in high-stakes domains such as healthcare.

[NLP-41] Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs

[Quick Read]: This paper asks whether emergent misalignment (EM) arises in in-context learning (ICL), that is, whether narrow in-context examples alone can lead large language models (LLMs) to produce broadly misaligned responses. Systematic experiments show that in-context examples, without fine-tuning or activation steering, are enough to induce EM. Manual analysis of elicited chain-of-thought further shows that 67.5% of misaligned traces explicitly rationalize harmful outputs by adopting a reckless or dangerous "persona", echoing earlier results on finetuning-induced EM.

Link: https://arxiv.org/abs/2510.11288
Authors: Nikita Afonin,Nikita Andriyanov,Nikhil Bageshpura,Kyle Liu,Kevin Zhu,Sunishchal Dev,Ashwinee Panda,Alexander Panchenko,Oleg Rogov,Elena Tutubalina,Mikhail Seleznyov
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments:

Abstract:Recent work has shown that narrow finetuning can produce broadly misaligned LLMs, a phenomenon termed emergent misalignment (EM). While concerning, these findings were limited to finetuning and activation steering, leaving out in-context learning (ICL). We therefore ask: does EM emerge in ICL? We find that it does: across three datasets, three frontier models produce broadly misaligned responses at rates between 2% and 17% given 64 narrow in-context examples, and up to 58% with 256 examples. We also examine mechanisms of EM by eliciting step-by-step reasoning (while leaving in-context examples unchanged). Manual analysis of the resulting chain-of-thought shows that 67.5% of misaligned traces explicitly rationalize harmful outputs by adopting a reckless or dangerous ‘‘persona’’, echoing prior results on finetuning-induced EM.

[NLP-42] ENIGMA: The Geometry of Reasoning and Alignment in Large-Language Models

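[Quick Read]: This paper addresses the difficulty of jointly improving reasoning, alignment, and robustness in large language models (LLMs). Conventional methods often depend on reward models or complex multi-objective optimization, which can make training unstable or cap performance. The key solution is Entropic Mutual-Information Geometry Large-Language Model Alignment (ENIGMA), a unified training framework grounded in information geometry that treats an organisation's policies/principles as directions on the model's information manifold. Its single-loop trainer combines three parts: (1) critic-free Group-Relative Policy Optimisation (GRPO) with Chain-of-Thought (CoT)-format-only rewards; (2) a SAMI-style symmetric InfoNCE auxiliary loss; and (3) an entropic Sinkhorn optimal-transport regulariser on hidden-state distributions that bounds geometry drift. The paper also introduces InfoNCE metrics under matched negatives, including a Sufficiency Index (SI), that measure how strongly the CoT encodes the policies, so that high-value principles can be selected before training. In experiments with small (1B) LLMs, high-SI principles predict steadier training dynamics and better benchmark performance, supporting the hypothesis that reasoning, alignment, and robustness are projections of a single information-geometric objective.

Link: https://arxiv.org/abs/2510.11278
Authors: Gareth Seneque,Lap-Hang Ho,Nafise Erfanian Saeedi,Jeffrey Molendijk,Ariel Kupermann,Tim Elson
Affiliations: Australian Broadcasting Corporation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: 52 pages, 10 figures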

Abstract:We present Entropic Mutual-Information Geometry Large-Language Model Alignment (ENIGMA), a novel approach to Large-Language Model (LLM) training that jointly improves reasoning, alignment and robustness by treating an organisation’s policies/principles as directions to move on a model’s information manifold. Our single-loop trainer combines Group-Relative Policy Optimisation (GRPO), an on-policy, critic-free RL method with Chain-of-Thought (CoT)-format only rewards; a Self-Supervised Alignment with Mutual Information (SAMI)-style symmetric InfoNCE auxiliary; and an entropic Sinkhorn optimal-transport regulariser on hidden-state distributions to bound geometry drift. We also introduce infoNCE metrics that specialise to a standard MI lower bound under matched negatives to measure how strongly a model’s CoT encodes these policies. These metrics include a Sufficiency Index (SI) that enables the selection and creation of principles that maximise downstream performance prior to training. In our experiments using small (1B) LLMs, high-SI principles predict steadier training dynamics and improved benchmark performance over GRPO ablations. Our information-geometry analysis of trained models validates desirable structural change in the manifold. These results support our hypothesis that reasoning, alignment, and robustness are projections of a single information-geometric objective, and that models trained using ENIGMA demonstrate principled reasoning without the use of a reward model, offering a path to trusted capability.

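[NLP-43] Towards Real-Time Fake News Detection under Evidence Scarcity

[Quick Read]: This paper addresses fake news detection in real-time scenarios, where emerging events often lack sufficient external evidence and existing evidence-dependent methods generalize poorly. The key solution is Evaluation-Aware Selection of Experts (EASE), a framework that adapts its decision process through a three-step sequential evaluation: (1) evidence-based evaluation, which uses evidence only when it is sufficiently supportive; (2) reasoning-based evaluation, which uses the world knowledge of large language models (LLMs) only when their reliability is adequately established; and (3) a sentiment-based fallback, which brings in sentiment cues when neither evidence nor reasoning is reliable. EASE applies instruction tuning with pseudo labels so that each evaluator justifies its assessment through interpretable reasoning, and the expert modules combine these assessments with the news content for evaluation-aware decisions, improving detection accuracy and generalization to real-time news.

Link: https://arxiv.org/abs/2510.11277
Authors: Guangyu Wei,Ke Han,Yueming Lyu,Yu Luo,Yue Jiang,Caifeng Shan,Nicu Sebe
Affiliations: Nanjing University; Ocean University of China; Suzhou; Qingdao; University of Trento; Chinese Academy of Sciences; Beijing
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: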

Abstract:Fake news detection becomes particularly challenging in real-time scenarios, where emerging events often lack sufficient supporting evidence. Existing approaches often rely heavily on external evidence and therefore struggle to generalize under evidence scarcity. To address this issue, we propose Evaluation-Aware Selection of Experts (EASE), a novel framework for real-time fake news detection that dynamically adapts its decision-making process according to the assessed sufficiency of available evidence. EASE introduces a sequential evaluation mechanism comprising three independent perspectives: (1) Evidence-based evaluation, which assesses evidence and incorporates it into decision-making only when the evidence is sufficiently supportive; (2) Reasoning-based evaluation, which leverages the world knowledge of large language models (LLMs) and applies them only when their reliability is adequately established; and (3) Sentiment-based fallback, which integrates sentiment cues when neither evidence nor reasoning is reliable. To enhance the accuracy of evaluation processes, EASE employs instruction tuning with pseudo labels to guide each evaluator in justifying its perspective-specific knowledge through interpretable reasoning. Furthermore, the expert modules integrate the evaluators’ justified assessments with the news content to enable evaluation-aware decision-making, thereby enhancing overall detection accuracy. Moreover, we introduce RealTimeNews-25, a new benchmark comprising recent news for evaluating model generalization on emerging news with limited evidence. Extensive experiments demonstrate that EASE not only achieves state-of-the-art performance across multiple benchmarks, but also significantly improves generalization to real-time news. The code and dataset are available: this https URL.
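
The sequential evaluation described above amounts to a fallback chain: use the evidence expert if the evidence is judged sufficient, otherwise the reasoning expert if the LLM is judged reliable, otherwise sentiment. The sketch below shows only that control flow. The `Assessment` type, its fields, and `select_expert` are hypothetical names; in the paper the reliability judgments come from instruction-tuned evaluators, which are stubbed out here as precomputed booleans.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Output of one evaluator (hypothetical structure)."""
    verdict: str    # e.g. "fake" or "real"
    reliable: bool  # did the evaluator judge this perspective trustworthy?


def select_expert(evidence, reasoning, sentiment):
    """Return (expert_name, verdict) following the fallback order
    evidence -> reasoning -> sentiment."""
    for name, assessment in (("evidence", evidence), ("reasoning", reasoning)):
        if assessment.reliable:
            return name, assessment.verdict
    # Neither evidence nor reasoning was judged reliable.
    return "sentiment", sentiment.verdict
```

The ordering encodes a preference for externally grounded signals, so sentiment cues only decide the outcome for news where nothing stronger is available.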

[NLP-44] Do Psychometric Tests Work for Large Language Models? Evaluation of Tests on Sexism, Racism, and Morality

[Quick Read]: This paper asks whether human psychometric tests remain reliable and valid when applied directly to large language models (LLMs). The authors systematically evaluate tests for three constructs (sexism, racism, and morality) and assess validity in two ways: convergent validity, which checks theory-based correlations between tests, and ecological validity, which checks whether test scores align with model behaviour in real-world downstream tasks. The tests show moderate reliability but low ecological validity, with scores in some cases negatively correlated with downstream behaviour, so psychometric tests designed for humans cannot be applied to LLMs without adaptation.

Link: https://arxiv.org/abs/2510.11254
Authors: Jana Jung,Marlene Lutz,Indira Sen,Markus Strohmaier
Affiliations: University of Mannheim; GESIS - Leibniz Institute for the Social Sciences; Complexity Science Hub Vienna
Subjects: Computation and Language (cs.CL)
Comments:

Abstract:Psychometric tests are increasingly used to assess psychological constructs in large language models (LLMs). However, it remains unclear whether these tests – originally developed for humans – yield meaningful results when applied to LLMs. In this study, we systematically evaluate the reliability and validity of human psychometric tests for three constructs: sexism, racism, and morality. We find moderate reliability across multiple item and prompt variations. Validity is evaluated through both convergent (i.e., testing theory-based inter-test correlations) and ecological approaches (i.e., testing the alignment between test scores and behavior in real-world downstream tasks). Crucially, we find that psychometric test scores do not align, and in some cases even negatively correlate with, model behavior in downstream tasks, indicating low ecological validity. Our results highlight that systematic evaluation of psychometric tests is essential before interpreting their scores. They also suggest that psychometric tests designed for humans cannot be applied directly to LLMs without adaptation.

[NLP-45] Attacks by Content: Automated Fact-checking is an AI Security Issue EMNLP2025

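[Quick read]: This paper addresses how AI agents that retrieve and reason over external documents can be steered off course by tampered or misleading content. Existing defenses target indirect prompt injection by detecting hidden instructions, but they are ineffective against an "attack by content", in which the adversary manipulates the agent by supplying biased, misleading, or false information. The key to the proposed defense is giving agents the ability to evaluate retrieved information critically: corroborating claims against external evidence and assessing the trustworthiness of sources. This closely resembles the NLP task of automated fact-checking, which the author proposes to repurpose as a cognitive self-defense tool for agents.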

Link: https://arxiv.org/abs/2510.11238
Authors: Michael Schlichtkrull
Affiliations: Queen Mary University of London
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Accepted to EMNLP 2025

Abstract:When AI agents retrieve and reason over external documents, adversaries can manipulate the data they receive to subvert their behaviour. Previous research has studied indirect prompt injection, where the attacker injects malicious instructions. We argue that injection of instructions is not necessary to manipulate agents - attackers could instead supply biased, misleading, or false information. We term this an attack by content. Existing defenses, which focus on detecting hidden commands, are ineffective against attacks by content. To defend themselves and their users, agents must critically evaluate retrieved information, corroborating claims with external evidence and evaluating source trustworthiness. We argue that this is analogous to an existing NLP task, automated fact-checking, which we propose to repurpose as a cognitive self-defense tool for agents.

[NLP-46] XQuant: Achieving Ultra-Low Bit KV Cache Quantization with Cross-Layer Compression EMNLP2025

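[Quick read]: This paper targets the deployment problem that arises in resource-constrained environments when the KV cache (key-value cache) of large language models (LLMs) consumes too much memory during long-text understanding and generation. It proposes XQuant, a training-free, plug-and-play framework that achieves ultra-low bit-width KV cache quantization through two techniques: a data-free calibration method with negligible computational overhead, and cross-layer KV cache compression. Together these push the equivalent bit-width below 1.4 bits. On TruthfulQA and LongBench, XQuant outperforms state-of-the-art methods such as KIVI-2bit and AsymKV-1.5bit, giving a better trade-off between memory efficiency and model accuracy.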

Link: https://arxiv.org/abs/2510.11236
Authors: Haoqi Yang,Yao Yao,Zuchao Li,Baoyuan Qi,Guoming Liu,Hai Zhao
Affiliations: Wuhan University; Shanghai Jiao Tong University; Xiaomi Inc.
Categories: Computation and Language (cs.CL)
Comments: To be published in The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025)

Abstract:Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse natural language processing tasks. However, their extensive memory requirements, particularly due to KV cache growth during long-text understanding and generation, present significant challenges for deployment in resource-constrained environments. Quantization has emerged as a promising solution to reduce memory consumption while preserving historical information. We propose XQuant, a training-free and plug-and-play framework that achieves ultra-low equivalent bit-width KV cache quantization. XQuant introduces two key innovations: a computationally negligible data-free calibration method and cross-layer KV cache compression, enabling quantization to sub-1.4 bits. Extensive experiments on TruthfulQA and LongBench demonstrate that XQuant outperforms state-of-the-art methods (e.g., KIVI-2bit and AsymKV-1.5bit) by achieving lower bit-width while maintaining superior performance, establishing a better trade-off between memory efficiency and model accuracy.
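
To make the idea of low-bit KV cache quantization concrete, here is a minimal Python sketch of uniform asymmetric quantization applied to one group of cache values. It is a generic illustration only: it does not reproduce XQuant's data-free calibration or its cross-layer compression, and the function names `quantize` and `dequantize` are hypothetical.

```python
def quantize(values, bits=2):
    """Uniform asymmetric quantization of one group of floats to `bits` bits.

    Returns integer codes plus the (scale, offset) needed to decode them.
    Generic textbook scheme, not the XQuant method.
    """
    levels = (1 << bits) - 1                 # 2 bits -> codes 0..3
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


# Hypothetical slice of a key or value vector.
group = [-0.7, 0.1, 0.25, 1.3]
codes, scale, lo = quantize(group, bits=2)
print(codes, dequantize(codes, scale, lo))
```

With 2 bits each value collapses to one of four levels, which is why the choice of calibration (how `lo` and `scale` are picked) matters so much at very low bit-widths.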

[NLP-47] CNSocialDepress: A Chinese Social Media Dataset for Depression Risk Detection and Structured Analysis

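[Quick read]: This paper addresses the scarcity of resources for detecting depression risk in Chinese social media text, most of which are limited to binary classification. It builds and releases CNSocialDepress, a dataset of 44,178 Chinese social media posts from 233 users in which psychology experts annotated 10,306 depression-related segments. The dataset provides binary risk labels together with structured, multi-dimensional psychological attributes, supporting interpretable, fine-grained analysis of depressive signals. It offers a data foundation and evaluation benchmark for mental health applications aimed at Chinese-speaking populations.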

Link: https://arxiv.org/abs/2510.11233
Authors: Jinyuan Xu,Tian Lan,Xintao Yu,Xue He,Hezhi Zhang,Ying Wang,Pierre Magistry,Mathieu Valette,Lei Li
Affiliations: Ertim Inalco; Milkuya Studio; Sorbonne Université; IRD Lab; Peking University; Beijing Normal University; University of Washington; VitaSight
Categories: Computation and Language (cs.CL)
Comments:

Abstract:Depression is a pressing global public health issue, yet publicly available Chinese-language resources for risk detection remain scarce and are mostly limited to binary classification. To address this limitation, we release CNSocialDepress, a benchmark dataset for depression risk detection from Chinese social media posts. The dataset contains 44,178 texts from 233 users, within which psychological experts annotated 10,306 depression-related segments. CNSocialDepress provides binary risk labels together with structured multi-dimensional psychological attributes, enabling interpretable and fine-grained analysis of depressive signals. Experimental results demonstrate its utility across a wide range of NLP tasks, including structured psychological profiling and fine-tuning of large language models for depression detection. Comprehensive evaluations highlight the dataset’s effectiveness and practical value for depression risk identification and psychological analysis, thereby providing insights to mental health applications tailored for Chinese-speaking populations.

[NLP-48] A Theorem-Proving-Based Evaluation of Neural Semantic Parsing

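[Quick read]: This paper addresses a weakness in how neural semantic parsers are evaluated: graph-matching metrics such as Smatch measure surface structural overlap rather than logical equivalence. The key idea is to pair graph-matching with automated theorem proving, so that a model's output can be checked for logical equivalence with the target formula. Normalizing the target representations reduces incidental variability, improves well-formedness, and strengthens logical adequacy. Performance degrades on complex formulas and on coordination, prepositional phrases, and passive voice, with the main errors involving variable binding, indexing, and predicate naming. These findings provide empirical grounds for logic-sensitive evaluation and training objectives.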

Link: https://arxiv.org/abs/2510.11225
Authors: Hayate Funakura,Hyunsoo Kim,Koji Mineshima
Affiliations: Kyoto University; Keio University; Kikagaku Inc.
Categories: Computation and Language (cs.CL)
Comments: Accepted to BlackboxNLP 2025

Abstract:Graph-matching metrics such as Smatch are the de facto standard for evaluating neural semantic parsers, yet they capture surface overlap rather than logical equivalence. We reassess evaluation by pairing graph-matching with automated theorem proving. We compare two approaches to building parsers: supervised fine-tuning (T5-Small/Base) and few-shot in-context learning (GPT-4o/4.1/5), under normalized and unnormalized targets. We evaluate outputs using graph-matching, bidirectional entailment between source and target formulas with a first-order logic theorem prover, and well-formedness. Across settings, we find that models performing well on graph-matching often fail to produce logically equivalent formulas. Normalization reduces incidental target variability, improves well-formedness, and strengthens logical adequacy. Error analysis shows performance degrades with increasing formula complexity and with coordination, prepositional phrases, and passive voice; the dominant failures involve variable binding and indexing, and predicate naming. These findings highlight limits of graph-based metrics for reasoning-oriented applications and motivate logic-sensitive evaluation and training objectives together with simplified, normalized target representations. All code and data for our experiments are publicly available.
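
The gap between surface overlap and logical equivalence can be illustrated with a small Python sketch. It assumes propositional formulas written as Python functions and checks entailment by enumerating truth assignments, as a simplified stand-in for the first-order theorem prover used in the paper. All helper and formula names are hypothetical.

```python
from itertools import product


def entails(premise, conclusion, n_vars):
    """True if every assignment that satisfies `premise` also satisfies `conclusion`."""
    return all(
        conclusion(*vals)
        for vals in product([False, True], repeat=n_vars)
        if premise(*vals)
    )


def equivalent(f, g, n_vars):
    """Bidirectional entailment: f entails g and g entails f."""
    return entails(f, g, n_vars) and entails(g, f, n_vars)


def gold(p, q):
    """Target formula: not (p and q)."""
    return not (p and q)


def pred(p, q):
    """Different surface form, same meaning: (not p) or (not q)."""
    return (not p) or (not q)


def near(p, q):
    """Nearly identical surface form, different meaning: not (p or q)."""
    return not (p or q)


print(equivalent(gold, pred, 2), equivalent(gold, near, 2))
```

Here `near` differs from `gold` by a single connective, so an overlap-based metric would score it highly, yet only `pred` passes the bidirectional entailment check.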

[NLP-49] Fairness Metric Design Exploration in Multi-Domain Moral Sentiment Classification using Transformer-Based Models

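[Quick read]: This paper studies fairness in moral sentiment classification under cross-domain transfer, where aggregate performance metrics for transformer models can hide unfair behavior on specific labels. Model performance is markedly asymmetric across domains (Twitter versus Reddit), and some moral dimensions show large disparities under transfer: for the authority label, the Demographic Parity Difference reaches 0.22-0.23 and the Equalized Odds Difference 0.40-0.41. The authors propose the Moral Fairness Consistency (MFC) metric, which quantifies how stable moral foundation detection is across domains. MFC correlates perfectly and negatively with Demographic Parity Difference (rho = -1.000, p < 0.001) while remaining independent of standard performance metrics, so it can complement existing fairness evaluation and help identify where deployment across heterogeneous contexts is unreliable.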

Link: https://arxiv.org/abs/2510.11222
Authors: Battemuulen Naranbat,Seyed Sahand Mohammadi Ziabari,Yousuf Nasser Al Husaini,Ali Mohammed Mansoor Alsahag
Affiliations: University of Amsterdam; SUNY Empire State College; Sultan Qaboos University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

Abstract:Ensuring fairness in natural language processing for moral sentiment classification is challenging, particularly under cross-domain shifts where transformer models are increasingly deployed. Using the Moral Foundations Twitter Corpus (MFTC) and Moral Foundations Reddit Corpus (MFRC), this work evaluates BERT and DistilBERT in a multi-label setting with in-domain and cross-domain protocols. Aggregate performance can mask disparities: we observe pronounced asymmetry in transfer, with Twitter-to-Reddit degrading micro-F1 by 14.9% versus only 1.5% for Reddit-to-Twitter. Per-label analysis reveals fairness violations hidden by overall scores; notably, the authority label exhibits Demographic Parity Differences of 0.22-0.23 and Equalized Odds Differences of 0.40-0.41. To address this gap, we introduce the Moral Fairness Consistency (MFC) metric, which quantifies the cross-domain stability of moral foundation detection. MFC shows strong empirical validity, achieving a perfect negative correlation with Demographic Parity Difference (rho = -1.000, p < 0.001) while remaining independent of standard performance metrics. Across labels, loyalty demonstrates the highest consistency (MFC = 0.96) and authority the lowest (MFC = 0.78). These findings establish MFC as a complementary, diagnosis-oriented metric for fairness-aware evaluation of moral reasoning models, enabling more reliable deployment across heterogeneous linguistic contexts.
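
For readers unfamiliar with the two fairness measures cited here, the following Python sketch computes them under their common definitions: Demographic Parity Difference as the gap in positive-prediction rate between groups, and Equalized Odds Difference as the larger of the gaps in true-positive rate and false-positive rate. How the paper defines its groups is not stated in the abstract, so the `groups` argument and the toy data are assumptions, and the MFC metric itself is not reproduced.

```python
def rate(preds):
    """Fraction of positive predictions; 0.0 for an empty list."""
    return sum(preds) / len(preds) if preds else 0.0


def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = [
        rate([p for p, g in zip(y_pred, groups) if g == group])
        for group in set(groups)
    ]
    return max(rates) - min(rates)


def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the between-group gaps in true-positive and false-positive rate."""
    tprs, fprs = [], []
    for group in set(groups):
        rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
        tprs.append(rate([p for t, p in rows if t == 1]))
        fprs.append(rate([p for t, p in rows if t == 0]))
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Toy binary predictions for a single label, split into two hypothetical groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(y_pred, groups))
print(equalized_odds_difference(y_true, y_pred, groups))
```

In a multi-label setting such as this one, the functions would be applied once per label, which is how a disparity on a single label like authority can surface even when aggregate scores look fine.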

[NLP-50] WebRouter: Query-specific Router via Variational Information Bottleneck for Cost-sensitive Web Agent

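[Quick read]: This paper addresses the cost-performance trade-off faced by web agents driven by large language models (LLMs) in web automation. Web agent prompts typically combine goals, action histories, and environment states, and this complexity degrades LLM ensemble performance while keeping running costs high. The proposed solution is WebRouter, a query-specific router trained from an information-theoretic perspective. Its core contribution is a cost-aware Variational Information Bottleneck (ca-VIB) objective, which learns a compressed representation of the input prompt while explicitly penalizing expected operational cost.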

Link: https://arxiv.org/abs/2510.11221
Authors: Tao Li,Jinlong Hu,Yang Wang,Junfeng Liu,Xuejun Liu
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:

Abstract:LLM-brained web agents offer powerful capabilities for web automation but face a critical cost-performance trade-off. The challenge is amplified by web agents’ inherently complex prompts that include goals, action histories, and environmental states, leading to degraded LLM ensemble performance. To address this, we introduce WebRouter, a novel query-specific router trained from an information-theoretic perspective. Our core contribution is a cost-aware Variational Information Bottleneck (ca-VIB) objective, which learns a compressed representation of the input prompt while explicitly penalizing the expected operational cost. Experiments on five real-world websites from the WebVoyager benchmark show that WebRouter reduces operational costs by a striking 87.8% compared to a GPT-4o baseline, while incurring only a 3.8% accuracy drop.

[NLP-51] The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers

Link: https://arxiv.org/abs/2510.11218
Authors: Saad Obaid ul Islam,Anne Lauscher,Goran Glavaš
Affiliations: WüNLP, CAIDAS, University of Würzburg; Data Science Group, University of Hamburg
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-52] Domain-Specific Data Generation Framework for RAG Adaptation

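[Quick read]: This paper addresses the limited adaptability of Retrieval-Augmented Generation (RAG) systems in domain-specific settings: training on general-purpose question-answering data does not support precise responses grounded in specialized knowledge. It proposes RAGen, a scalable, modular framework that identifies key concepts in documents through semantic chunking and hierarchical concept extraction, generates diverse questions guided by Bloom's Taxonomy, and combines precise answer extraction with multi-chunk retrieval to build question-answer-context (QAC) triples. Curated distractor contexts are added to promote robust reasoning. The framework supports optimizing key components such as the LLM, retriever, and embedding model, and it handles large, evolving document collections, which makes it suited to domains such as scientific literature and enterprise knowledge bases.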

Link: https://arxiv.org/abs/2510.11217
Authors: Chris Xing Tian,Weihao Xie,Zhen Chen,Zhengyuan Yi,Hui Liu,Haoliang Li,Shiqi Wang,Siwei Ma
Affiliations: Peng Cheng Laboratory; City University of Hong Kong; Peking University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

Abstract:Retrieval-Augmented Generation (RAG) combines the language understanding and reasoning power of large language models (LLMs) with external retrieval to enable domain-grounded responses. Effectively adapting RAG systems to domain-specific settings requires specialized, context-rich training data beyond general-purpose question-answering. Here, we propose RAGen, a scalable and modular framework for generating domain-grounded question-answer-context (QAC) triples tailored to diverse RAG adaptation approaches. RAGen produces these QAC triples by identifying key concepts in documents, generating diverse questions guided by Bloom’s Taxonomy-inspired principles, and pairing them with precise answers extracted from relevant contexts. RAGen supports multiple RAG adaptation strategies, including the optimization of key components such as the LLM, retriever, and embedding model, etc. Its modular pipeline features semantic chunking, hierarchical concept extraction, and multi-chunk retrieval, along with the introduction of curated distractor contexts to promote robust reasoning. Designed for scalability, RAGen efficiently handles large and evolving document corpora without redundant processing, making it especially suitable for dynamic evolving domains such as scientific research and enterprise knowledge bases.

[NLP-53] Discursive Circuits: How Do Language Models Understand Discourse Relations? EMNLP2025

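[Quick read]: This paper asks which components of transformer language models are responsible for discourse understanding. Its central hypothesis is that sparse computational graphs, called discursive circuits, control how a model processes discourse relations. To make circuit discovery feasible, the authors introduce a task called Completion under Discourse Relation (CuDR) and build a corpus of minimal contrastive pairs designed for activation patching. Experiments show that sparse circuits of roughly 0.2% of a full GPT-2 model recover discourse understanding on the PDTB-based CuDR task and generalize to unseen discourse frameworks such as RST and SDRT. Further analysis shows that lower layers capture linguistic features such as lexical semantics and coreference, while upper layers encode discourse-level abstractions.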

Link: https://arxiv.org/abs/2510.11210
Authors: Yisong Miao,Min-Yen Kan
Affiliations: National University of Singapore
Categories: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: Accepted to EMNLP 2025 (Main Conference); 9 pages, 8 figures, 5 tables (20 pages, 12 figures, 14 tables including references and appendices)

Abstract:Which components in transformer language models are responsible for discourse understanding? We hypothesize that sparse computational graphs, termed as discursive circuits, control how models process discourse relations. Unlike simpler tasks, discourse relations involve longer spans and complex reasoning. To make circuit discovery feasible, we introduce a task called Completion under Discourse Relation (CuDR), where a model completes a discourse given a specified relation. To support this task, we construct a corpus of minimal contrastive pairs tailored for activation patching in circuit discovery. Experiments show that sparse circuits (≈0.2% of a full GPT-2 model) recover discourse understanding in the English PDTB-based CuDR task. These circuits generalize well to unseen discourse frameworks such as RST and SDRT. Further analysis shows lower layers capture linguistic features such as lexical semantics and coreference, while upper layers encode discourse-level abstractions. Feature utility is consistent across frameworks (e.g., coreference supports Expansion-like relations).

[NLP-54] Evaluating Reasoning Faithfulness in Medical Vision-Language Models using Multimodal Perturbations

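[Quick read]: This paper addresses the problem that vision-language models (VLMs) answering visual questions about chest X-rays produce chain-of-thought (CoT) explanations that sound plausible but do not faithfully reflect the actual decision process, which undermines trust in high-stakes clinical use. It proposes a clinically grounded evaluation framework that applies controlled modifications to text and images and probes CoT faithfulness along three axes: clinical fidelity, causal attribution, and confidence calibration. A reader study with radiologists supports the framework's validity. Benchmarking shows that answer accuracy and explanation quality are decoupled, and that open-source models score lower than proprietary models on attribution and often on fidelity, which highlights the limits of evaluating final-answer accuracy alone and the associated deployment risks.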

Link: https://arxiv.org/abs/2510.11196
Authors: Johannes Moll,Markus Graf,Tristan Lemke,Nicolas Lenhart,Daniel Truhn,Jean-Benoit Delbrouck,Jiazhen Pan,Daniel Rueckert,Lisa C. Adams,Keno K. Bressem
Affiliations: Technical University of Munich (TUM); TUM University Hospital; German Heart Center; Stanford University; Klinikum rechts der Isar; Uniklinik RWTH Aachen; HOPPR; University of Oxford; Imperial College London
Categories: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Vision-language models (VLMs) often produce chain-of-thought (CoT) explanations that sound plausible yet fail to reflect the underlying decision process, undermining trust in high-stakes clinical use. Existing evaluations rarely catch this misalignment, prioritizing answer accuracy or adherence to formats. We present a clinically grounded framework for chest X-ray visual question answering (VQA) that probes CoT faithfulness via controlled text and image modifications across three axes: clinical fidelity, causal attribution, and confidence calibration. In a reader study (n=4), evaluator-radiologist correlations fall within the observed inter-radiologist range for all axes, with strong alignment for attribution (Kendall’s τ_b = 0.670), moderate alignment for fidelity (τ_b = 0.387), and weak alignment for confidence tone (τ_b = 0.091), which we report with caution. Benchmarking six VLMs shows that answer accuracy and explanation quality are decoupled, acknowledging injected cues does not ensure grounding, and text cues shift explanations more than visual cues. While some open-source models match final answer accuracy, proprietary models score higher on attribution (25.0% vs. 1.4%) and often on fidelity (36.1% vs. 31.7%), highlighting deployment risks and the need to evaluate beyond final answer accuracy.

[NLP-55] Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains?

Link: https://arxiv.org/abs/2510.11184
Authors: Zhengyu Chen,Jinluan Yang,Teng Xiao,Ruochen Zhou,Luan Zhang,Xiangyu Xi,Xiaowei Shi,Wei Wang,Jinggang Wang
Affiliations: Meituan; Zhejiang University; Allen Institute for Artificial Intelligence; City University of Hong Kong
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments:

[NLP-56] EAGER: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling

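[Quick read]: This paper addresses inefficient compute allocation in reasoning language models and test-time scaling methods, which generate multiple candidate sequences with the same compute budget for every prompt regardless of how complex the prompt is. It proposes EAGer, a training-free generation method that uses the token-wise entropy distribution to quantify model uncertainty and branches into multiple reasoning paths only at high-entropy tokens, reducing redundant computation. The compute saved is reallocated to the instances that most need further exploration, improving the efficiency-performance trade-off.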

Link: https://arxiv.org/abs/2510.11170
Authors: Daniel Scalena,Leonidas Zotos,Elisabetta Fersini,Malvina Nissim,Ahmet Üstün
Affiliations: University of Groningen; University of Milano - Bicocca; Cohere Labs; Cohere
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments:

Abstract:With the rise of reasoning language models and test-time scaling methods as a paradigm for improving model performance, substantial computation is often required to generate multiple candidate sequences from the same prompt. This enables exploration of different reasoning paths toward the correct solution but allocates the same compute budget to each prompt. Grounded in the assumption that different prompts carry different degrees of complexity, and thus different computation needs, we propose EAGer, a training-free generation method that leverages model uncertainty through token-wise entropy distribution to reduce redundant computation and concurrently improve overall performance. EAGer allows branching to multiple reasoning paths only in the presence of high-entropy tokens, and then reallocates the saved compute budget to the instances where exploration of alternative paths is most needed. We find that across multiple open-source models on complex reasoning benchmarks such as AIME 2025, EAGer can reallocate the budget without accessing target labels, achieving the best efficiency-performance trade-off in terms of reasoning length and Pass@k. When target labels are accessible, EAGer generates up to 65% fewer tokens (hence saving compute) and achieves up to 37% improvement in Pass@k compared to the Full Parallel Sampling.
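
The entropy gate at the heart of this idea can be sketched in a few lines of Python. The sketch computes the Shannon entropy of each next-token distribution and marks the positions where it exceeds a threshold as candidates for branching. The threshold value, the function names, and the toy distributions are assumptions for illustration; EAGer's actual branching and budget-reallocation logic is not reproduced.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def branch_positions(step_distributions, threshold):
    """Indices of decoding steps whose entropy exceeds `threshold`.

    Only these steps would spawn additional reasoning paths; all other
    steps continue along a single path.
    """
    return [
        i for i, probs in enumerate(step_distributions)
        if token_entropy(probs) > threshold
    ]


# Three hypothetical decoding steps over a two-token vocabulary.
steps = [[0.99, 0.01], [0.5, 0.5], [0.9, 0.1]]
print(branch_positions(steps, threshold=0.5))
```

A confident step such as `[0.99, 0.01]` has entropy close to zero and stays on one path, while an evenly split step has the maximum entropy for its vocabulary size and is the natural place to branch.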

[NLP-57] ELMO: Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces ICML2025

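[Quick read]: This paper addresses extreme multilabel classification (XMC), where a label space of up to millions of labels turns the linear classification head into the main compute and memory bottleneck. Existing methods rely on FP16-FP32 mixed-precision training, which can be unstable and inefficient in memory and compute, while existing low-precision methods usually keep the classification layer in higher precision. The paper proposes ELMO, a pure low-precision training framework using the BFloat16 and Float8 data types. By combining Kahan summation with stochastic rounding, it trains stably in Float8 without single-precision master weights or tensor scaling. Memory optimizations such as gradient fusion and chunking further cut GPU memory use: a 3-million-label model trains in only 6.6 GiB, compared with 39.7 GiB for the optimized state-of-the-art baseline Renee, without loss of accuracy.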

Link: https://arxiv.org/abs/2510.11168
Authors: Jinbin Zhang,Nasib Ullah,Erik Schultheis,Rohit Babbar
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Comments: Accepted to ICML 2025

Abstract:Large output spaces, also referred to as Extreme multilabel classification (XMC), is a setting that arises, e.g., in large-scale tagging and product-to-product recommendation, and is characterized by the number of labels ranging from hundreds of thousands to millions. This means that the linear classification head, usually only a tiny fraction of the overall model, turns into the main driver for compute and memory demand. Current state-of-the-art XMC methods predominantly rely on FP16-FP32 mixed-precision training, which we show can be unstable, and inefficient in terms of memory usage and computational overhead. Meanwhile, existing low-precision methods typically retain higher precision for the classification layer. In this work, we propose ELMO, a pure low-precision training framework for XMC models using BFloat16 and Float8 data types. By leveraging Kahan summation and stochastic rounding, we demonstrate that XMC models can be effectively trained entirely in Float8, without relying on single-precision master weights or tensor scaling. Low-precision training, combined with our proposed memory optimizations – gradient fusion and chunking – enables significant reductions in GPU memory usage. For example, we train a 3-million-label XMC model with only 6.6 GiB of GPU memory, compared to the 39.7 GiB required by the optimized SOTA method, Renee without compromising accuracy.
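
Kahan summation and stochastic rounding are both standard numerical techniques, and a short Python sketch shows why they help when precision is scarce. The sketch uses ordinary double-precision floats and a uniform rounding grid as simplifying assumptions (real Float8 values lie on a non-uniform grid), and it does not reproduce how ELMO applies these techniques inside its training loop.

```python
import math
import random


def naive_sum(values):
    """Plain left-to-right accumulation, for comparison."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan (compensated) summation: carry the rounding error forward."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - comp            # re-inject the previously lost part
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was just lost
        total = t
    return total


def stochastic_round(x, step, rng=random):
    """Round x to a multiple of `step`, rounding up with probability equal to
    its fractional position between the two neighbours (unbiased on average)."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (rng.random() < frac)) * step


# Many tiny addends next to a large one: naive accumulation drops them all.
values = [1.0] + [1e-16] * 100_000
print(naive_sum(values), kahan_sum(values))
```

The same failure mode appears when small gradient updates are added to much larger weights in a low-precision format: round-to-nearest silently discards them, whereas compensation or unbiased random rounding preserves their effect on average.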

[NLP-58] Bridging Gaps in Hate Speech Detection: Meta-Collections and Benchmarks for Low-Resource Iberian Languages

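[Quick read]: This paper addresses data scarcity and the neglect of linguistic varieties in hate speech detection for low-resource Iberian languages, specifically European Spanish, European Portuguese, and Galician. Research is concentrated on English, and large language models need more data than low-resource languages can usually supply. The authors compile a meta-collection of existing hate speech datasets for European Spanish, standardized with unified labels and metadata, and extend it by translation into European Portuguese and two Galician varieties, one more convergent with Spanish and one more convergent with Portuguese. The resulting aligned multilingual corpora establish new benchmarks for Iberian languages and support cross-lingual, variety-aware hate speech detection.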

Link: https://arxiv.org/abs/2510.11167
Authors: Paloma Piot,José Ramom Pichel Campos,Javier Parapar
Affiliations: University of A Coruña; University of Santiago de Compostela
Categories: Computation and Language (cs.CL)
Comments:

Abstract:Hate speech poses a serious threat to social cohesion and individual well-being, particularly on social media, where it spreads rapidly. While research on hate speech detection has progressed, it remains largely focused on English, resulting in limited resources and benchmarks for low-resource languages. Moreover, many of these languages have multiple linguistic varieties, a factor often overlooked in current approaches. At the same time, large language models require substantial amounts of data to perform reliably, a requirement that low-resource languages often cannot meet. In this work, we address these gaps by compiling a meta-collection of hate speech datasets for European Spanish, standardised with unified labels and metadata. This collection is based on a systematic analysis and integration of existing resources, aiming to bridge the data gap and support more consistent and scalable hate speech detection. We extended this collection by translating it into European Portuguese and into a Galician standard that is more convergent with Spanish and another Galician variant that is more convergent with Portuguese, creating aligned multilingual corpora. Using these resources, we establish new benchmarks for hate speech detection in Iberian languages. We evaluate state-of-the-art large language models in zero-shot, few-shot, and fine-tuning settings, providing baseline results for future research. Moreover, we perform a cross-lingual analysis with our target languages. Our findings underscore the importance of multilingual and variety-aware approaches in hate speech detection and offer a foundation for improved benchmarking in underrepresented European languages.

[NLP-59] One Size Does Not Fit All: Exploring Variable Thresholds for Distance-Based Multi-Label Text Classification

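[Quick read]: This paper addresses threshold selection in distance-based multi-label text classification (MLTC). Because the distribution of semantic similarity between texts and labels differs significantly across models, datasets, and label sets, a single uniform threshold (such as a normalized 0.5) performs poorly. The proposed solution optimizes label-specific thresholds on a validation set, learning a separate threshold for each label. This yields an average improvement of 46% over normalized 0.5 thresholding and of 14% over uniform thresholding approaches from prior work, and it remains robust when labeled examples are limited.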

Link: https://arxiv.org/abs/2510.11160
Authors: Jens Van Nooten,Andriy Kosar,Guy De Pauw,Walter Daelemans
Affiliations: University of Antwerp; CLiPS; Textgain
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

Abstract:Distance-based unsupervised text classification is a method within text classification that leverages the semantic similarity between a label and a text to determine label relevance. This method provides numerous benefits, including fast inference and adaptability to expanding label sets, as opposed to zero-shot, few-shot, and fine-tuned neural networks that require re-training in such cases. In multi-label distance-based classification and information retrieval algorithms, thresholds are required to determine whether a text instance is “similar” to a label or query. Similarity between a text and label is determined in a dense embedding space, usually generated by state-of-the-art sentence encoders. Multi-label classification complicates matters, as a text instance can have multiple true labels, unlike in multi-class or binary classification, where each instance is assigned only one label. We expand upon previous literature on this underexplored topic by thoroughly examining and evaluating the ability of sentence encoders to perform distance-based classification. First, we perform an exploratory study to verify whether the semantic relationships between texts and labels vary across models, datasets, and label sets by conducting experiments on a diverse collection of realistic multi-label text classification (MLTC) datasets. We find that similarity distributions show statistically significant differences across models, datasets and even label sets. We propose a novel method for optimizing label-specific thresholds using a validation set. Our label-specific thresholding method achieves an average improvement of 46% over normalized 0.5 thresholding and outperforms uniform thresholding approaches from previous work by an average of 14%. Additionally, the method demonstrates strong performance even with limited labeled examples.
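
Label-specific thresholding can be sketched as a small grid search in Python. The sketch assumes that each label's threshold is chosen to maximise F1 on a validation set over a fixed list of candidate values. The abstract does not state the selection criterion or search procedure, so both are assumptions, and the function names and toy similarity scores are hypothetical.

```python
def f1(y_true, y_pred):
    """Binary F1 score; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(similarities, y_true, candidates):
    """Candidate threshold with the highest validation F1 (first one on ties)."""
    return max(
        candidates,
        key=lambda th: f1(y_true, [s >= th for s in similarities]),
    )


def fit_label_thresholds(sim_by_label, truth_by_label, candidates):
    """Fit one threshold per label instead of a single global value."""
    return {
        label: best_threshold(sim_by_label[label], truth_by_label[label], candidates)
        for label in sim_by_label
    }


# Hypothetical text-to-label similarities on a four-document validation set.
sims = {"sports": [0.9, 0.8, 0.3, 0.2], "politics": [0.35, 0.3, 0.1, 0.05]}
truth = {"sports": [1, 1, 0, 0], "politics": [1, 1, 0, 0]}
grid = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
print(fit_label_thresholds(sims, truth, grid))
```

In this toy data the two labels live on different similarity scales, so a shared 0.5 cut-off would miss every "politics" document while working for "sports", which is the kind of mismatch per-label thresholds are meant to absorb.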

[NLP-60] TypePilot: Leveraging the Scala Type System for Secure LLM-generated Code

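[Quick read]: This paper addresses the security vulnerabilities that can appear in code generated by large language models (LLMs), which pose serious risks in high-assurance domains. It proposes TypePilot, an agentic AI framework that relies on strongly typed, verifiable languages (with Scala as the example) and is evaluated both with the Stainless formal verification framework and on general-purpose secure code generation. Its structured, type-guided workflow substantially mitigates input validation and injection vulnerabilities, improving the security and robustness of LLM-generated code.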

Link: https://arxiv.org/abs/2510.11151
Authors: Alexander Sternfeld,Andrei Kucharavy,Ljiljana Dolamic
Affiliations: HES-SO; armasuisse Science and Technology
Categories: Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Comments:

Abstract:Large Language Models (LLMs) have shown remarkable proficiency in code generation tasks across various programming languages. However, their outputs often contain subtle but critical vulnerabilities, posing significant risks when deployed in security-sensitive or mission-critical systems. This paper introduces TypePilot, an agentic AI framework designed to enhance the security and robustness of LLM-generated code by leveraging strongly typed and verifiable languages, using Scala as a representative example. We evaluate the effectiveness of our approach in two settings: formal verification with the Stainless framework and general-purpose secure code generation. Our experiments with leading open-source LLMs reveal that direct code generation often fails to enforce safety constraints, as does naive prompting for more secure code, whereas our type-focused agentic pipeline substantially mitigates input validation and injection vulnerabilities. The results demonstrate the potential of structured, type-guided LLM workflows to improve the state of the art in the trustworthiness of automated code generation in high-assurance domains.

[NLP-61] How2: How to learn from procedural How-to questions

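[Quick read]: This paper addresses the difficulty AI agents face with how-to questions in planning tasks: because such questions are open-ended, it is hard to obtain, store, and reuse their answers in ways that support long-term learning and efficient decisions. It proposes How^2, a memory agent framework in which agents ask how-to questions in interactive environments, store the answers, and reuse them later. Agents benefit most when answers are abstracted and decoupled from the current state, which allows knowledge to be reused at the level of sub-goals and improves the planning and lifelong learning of LLM-based agents.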

Link: https://arxiv.org/abs/2510.11144
Authors: Gautier Dagan,Frank Keller,Alex Lascarides
Affiliations: University of Edinburgh
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments:

Abstract:An agent facing a planning problem can use answers to how-to questions to reduce uncertainty and fill knowledge gaps, helping it solve both current and future tasks. However, their open-ended nature, where valid answers to “How do I X?” range from executable actions to high-level descriptions of X’s sub-goals, makes them challenging for AI agents to ask, and for AI experts to answer, in ways that support efficient planning. We introduce How^2, a memory agent framework that enables agents to ask how-to questions, store the answers, and reuse them for lifelong learning in interactive environments. We evaluate our approach in Plancraft, a Minecraft crafting environment, where agents must complete an assembly task by manipulating inventory items. Using teacher models that answer at varying levels of abstraction, from executable action sequences to high-level subgoal descriptions, we show that lifelong learning agents benefit most from answers that are abstracted and decoupled from the current state. How^2 offers a way for LLM-based agents to improve their planning capabilities over time by asking questions in interactive environments.

[NLP-62] Enhancing LLM Reasoning via Non-Human-Like Reasoning Path Preference Optimization

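[Quick read]: This paper addresses a training bias in current methods for strengthening the reasoning of large language models (LLMs): relying on annotations of intermediate steps from humans or higher-capacity models limits exploration of non-human-like reasoning paths and therefore constrains performance. It proposes Confidence-Guided Reasoning Path Preference Optimization (CGPO), which uses the model's own confidence signal to find the points of greatest uncertainty in its reasoning and applies self-generated, non-human-like reasoning-path guidance there, intervening before errors propagate and mitigating trajectory drift. With the same amount of training data, CGPO using data generated by a small model outperforms, in most cases, approaches that use data from a strong model or human annotation.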

Link: https://arxiv.org/abs/2510.11104
Authors: Junjie Lu,Yuliang Liu,Chaofeng Qu,Wei Shen,Zhouhan Lin,Min Xu
Affiliations: University of Technology Sydney; Shanghai Innovation Institute; Southeast University; Nanjing University; Shanghai Jiao Tong University; Independent Researcher
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 13 pages

Abstract:Current approaches for strengthening LLM reasoning tend to introduce a training bias toward human-like reasoning trajectories. In step-wise preference optimization, in particular, dependence on human or higher-capacity model annotations for intermediate steps limits exploration of alternative, non-human-like reasoning paths and thus constrains achievable performance. Furthermore, through a small-scale pilot study, we observed that in approximately 75% of cases, the model’s first erroneous step occurs after the lowest-confidence point. This suggests that guiding the model at its lowest-confidence point before an error provides more accurate supervision than locating the first explicit error. In this paper, we propose Confidence-Guided Reasoning Path Preference Optimization (CGPO), a method that leverages a confidence signal to identify points of maximal uncertainty in the model’s reasoning process and applies self-generated, non-human-like reasoning-path guidance to mitigate trajectory drift. Our experiments span diverse models applied to both code and mathematical reasoning tasks. The results show that, with the same amount of training data, our method using data generated by a small model can achieve better performance in most cases compared with approaches using data generated by a strong model or human-annotated.
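
The pilot-study observation about where the first error falls relative to the lowest-confidence point can be expressed as a short Python sketch. It assumes one confidence score per reasoning step (for instance a mean token probability, which is an assumption, since the abstract does not define the signal) and hypothetical function names. It illustrates only how the intervention point is located, not CGPO's preference optimization.

```python
def lowest_confidence_step(confidences):
    """Index of the reasoning step with the smallest confidence score."""
    return min(range(len(confidences)), key=confidences.__getitem__)


def error_follows_low_point(confidences, first_error_step):
    """True if the first erroneous step comes after the lowest-confidence step."""
    return first_error_step > lowest_confidence_step(confidences)


# Hypothetical per-step confidences for a four-step reasoning trace.
trace = [0.9, 0.4, 0.7, 0.6]
print(lowest_confidence_step(trace), error_follows_low_point(trace, 3))
```

If the first mistake usually comes after the least confident step, then supervising at that step reaches the model before the error has been made, which is the motivation the abstract gives for guiding at the lowest-confidence point.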

[NLP-63] VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents

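[Quick read]: This paper addresses limitations in how multimodal conversational systems are currently evaluated: existing benchmarks are mainly English-centric, rely on synthetic speech, and lack fine-grained, discriminative evaluation across multiple dimensions. In response, the authors present Voice Chat Bot Bench (VCB Bench), a high-quality Chinese benchmark built entirely on real human speech. Its key contribution is a systematic evaluation of large audio language models (LALMs) from three complementary perspectives: instruction following (including speech-level control, not only text commands), knowledge understanding (general knowledge, reasoning, and daily dialogue), and robustness (stability under perturbations in content, environment, and speaker traits). The benchmark provides a reproducible, fine-grained evaluation framework for advancing Chinese voice conversational models.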

Link: https://arxiv.org/abs/2510.11098
Authors: Jiliang Hu,Wenfu Wang,Zuchao Li,Chenxing Li,Yiyang Zhao,Hanzhao Li,Liqiang Zhang,Meng Yu,Dong Yu
Affiliations: Tencent AI Lab; Wuhan University
Subjects: Sound (cs.SD); Computation and Language (cs.CL)
Comments: 20 pages, 5 figures

Abstract:Recent advances in large audio language models (LALMs) have greatly enhanced multimodal conversational systems. However, existing benchmarks remain limited – they are mainly English-centric, rely on synthetic speech, and lack comprehensive, discriminative evaluation across multiple dimensions. To address these gaps, we present Voice Chat Bot Bench (VCB Bench) – a high-quality Chinese benchmark built entirely on real human speech. VCB Bench evaluates LALMs from three complementary perspectives: instruction following (including speech-level control beyond text commands), knowledge understanding (general knowledge, reasoning, and daily dialogue), and robustness (stability under perturbations in content, environment, and speaker traits). Experiments on representative LALMs reveal notable performance gaps and highlight future directions for improvement. VCB Bench provides a reproducible and fine-grained evaluation framework, offering standardized methodology and practical insights for advancing Chinese voice conversational models.

[NLP-64] Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States

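[Quick read]: This paper addresses the high latency that autoregressive (AR) models incur in natural language generation because of strictly sequential decoding, along with two core weaknesses of existing diffusion-style parallel generation methods such as LlaDA and Dream: information loss (predictive distributions for non-finalized tokens are discarded at every step) and premature commitment (local decisions are made without global coordination). The key to the solution is a two-stage framework, Latent Refinement Decoding (LRD). The first stage represents non-finalized positions as distributional mixtures of predicted tokens and the mask embedding, which lets the model form more globally consistent beliefs. The second stage progressively finalizes high-confidence tokens while retaining uncertain ones for iterative feedback, using KL-divergence dynamics as a criterion for convergence and early stopping. The result is improved generation accuracy together with speedups of up to 10.6x.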

Link: https://arxiv.org/abs/2510.11052
Authors: Qinglin Zhu,Yizhen Yao,Runcong Zhao,Yanzheng Xiang,Amrutha Saseendran,Chen Jin,Philip Alexander Teare,Bin Liang,Yulan He,Lin Gui
Affiliations: King’s College London; The Alan Turing Institute; AstraZeneca; The Chinese University of Hong Kong; MoE Lab, CUHK
Subjects: Computation and Language (cs.CL)
Comments:

Abstract:Autoregressive (AR) models remain the standard for natural language generation but still suffer from high latency due to strictly sequential decoding. Recent diffusion-inspired approaches, such as LlaDA and Dream, mitigate this by generating in parallel, yet they suffer from two core limitations: information loss, as predictive distributions for non-finalized tokens are discarded at each step, and premature commitment, where local decisions are made without sufficient global coordination. We introduce Latent Refinement Decoding (LRD), a two-stage framework with Latent Refinement and a Predictive Feedback Loop. The first stage maintains masked positions as distributional mixtures of predicted tokens and the mask embedding, allowing the model to establish more globally consistent beliefs. The second stage progressively finalizes confident tokens while retaining uncertain ones for iterative feedback. KL-divergence dynamics provide a principled and reliable criterion for convergence and early stopping. Experiments across coding (HumanEval +6.3, MBPP +2.6) and reasoning (GSM8K +2.9, MATH500 +3.8) show that LRD improves accuracy while delivering speedups of up to 10.6x, making it a strong and versatile alternative for parallel sequence generation.
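
The two ingredients named in the abstract, a distributional mixture at masked positions and a KL-based stopping test, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors’ implementation: the mixing weight `alpha`, the direction of the KL divergence, and the tolerance `tol` are all hypothetical choices.

```python
import math
from typing import List, Sequence


def mix_embedding(
    probs: Sequence[float],
    token_embeddings: Sequence[Sequence[float]],
    mask_embedding: Sequence[float],
    alpha: float,
) -> List[float]:
    """Blend the expected token embedding with the mask embedding.

    alpha is the (assumed) weight on the model's predictions; 1 - alpha
    stays on the mask embedding.
    """
    dim = len(mask_embedding)
    expected = [
        sum(p * emb[d] for p, emb in zip(probs, token_embeddings))
        for d in range(dim)
    ]
    return [alpha * e + (1.0 - alpha) * m for e, m in zip(expected, mask_embedding)]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def has_converged(prev: Sequence[float], curr: Sequence[float], tol: float = 1e-3) -> bool:
    """Treat a position as settled once its belief barely moves between steps."""
    return kl_divergence(curr, prev) < tol


mixed = mix_embedding([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], alpha=0.5)
```

Keeping the full distribution in the input, instead of resetting the position to a plain mask token, is what avoids the information loss the abstract describes; the stopping test then gives a simple signal for ending refinement early.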

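[NLP-65] Enabling Doctor-Centric Medical AI with LLMs through Workflow-Aligned Tasks and Benchmarks

[Quick read]: This paper addresses the safety risks of deploying large language models (LLMs) directly to patients in medical settings, where their domain expertise is limited. The key to the solution is to reposition LLMs as clinical assistants for physicians rather than as systems that interact with patients directly, improving both safety and practicality. To that end, the authors run a two-stage inspiration-feedback survey to identify real needs in clinical workflows and construct DoctorFLAN, a large-scale Chinese medical dataset of 92,000 QA instances covering 22 clinical tasks and 27 specialties. They also design two evaluation benchmarks, DoctorFLAN-test (550 single-turn QA items) and DotaBench (74 multi-turn conversations), to measure LLM performance in doctor-facing applications. Experiments show that DoctorFLAN notably improves open-source LLMs on medical tasks, helps align them with clinical workflows, and complements existing patient-oriented medical models.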

Link: https://arxiv.org/abs/2510.11040
Authors: Wenya Xie,Qingying Xiao,Yu Zheng,Xidong Wang,Junying Chen,Ke Ji,Anningzhe Gao,Prayag Tiwari,Xiang Wan,Feng Jiang,Benyou Wang
Affiliations: University of Minnesota; Shenzhen Institutes of Research of Big Data; The Chinese University of Hong Kong, Shenzhen; Southern University of Science and Technology, Shenzhen
Subjects: Computation and Language (cs.CL)
Comments:

Abstract:The rise of large language models (LLMs) has transformed healthcare by offering clinical guidance, yet their direct deployment to patients poses safety risks due to limited domain expertise. To mitigate this, we propose repositioning LLMs as clinical assistants that collaborate with experienced physicians rather than interacting with patients directly. We conduct a two-stage inspiration-feedback survey to identify real-world needs in clinical workflows. Guided by this, we construct DoctorFLAN, a large-scale Chinese medical dataset comprising 92,000 QA instances across 22 clinical tasks and 27 specialties. To evaluate model performance in doctor-facing applications, we introduce DoctorFLAN-test (550 single-turn QA items) and DotaBench (74 multi-turn conversations). Experimental results with over ten popular LLMs demonstrate that DoctorFLAN notably improves the performance of open-source LLMs in medical contexts, facilitating their alignment with physician workflows and complementing existing patient-oriented models. This work contributes a valuable resource and framework for advancing doctor-centered medical LLM development.

[NLP-66] LogiNumSynth: Synthesizing Joint Logical-Numerical Reasoning Problems for Language Models

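[Quick read]: This paper addresses the persistent weakness of language models in joint logical-numerical reasoning. Existing datasets rely on fixed rule sets and offer little control over task complexity, which limits how well they generalize for evaluation and training. The key to the solution is LogiNumSynth, a natural language problem synthesizer that gives flexible control over reasoning world richness, logical reasoning depth, and numerical computation complexity. It generates joint reasoning tasks with fine-grained controllability, supporting targeted training and process-accuracy analysis of LLMs and improving their integrated reasoning ability.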

Link: https://arxiv.org/abs/2510.11031
Authors: Yiwei Liu,Yucheng Li,Xiao Li,Gong Cheng
Affiliations: Nanjing University
Subjects: Computation and Language (cs.CL)
Comments: 30 pages, 3 figures

Abstract:Joint logical-numerical reasoning remains a major challenge for language models, yet existing datasets rely on fixed rule sets and offer limited control over task complexity, constraining their generalizability for evaluation and training. We present LogiNumSynth, a flexible natural language problem synthesizer that synthesizes tasks requiring proficiency in joint logical reasoning (e.g., rule-based reasoning) and numerical reasoning (e.g., arithmetic computation). LogiNumSynth supports fine-grained control over reasoning world richness, logical reasoning depth, and the complexity of numerical computations, enabling flexible data synthesis across difficulty levels. We demonstrate three key contributions: (1) Synthesizer – synthesizing fully controllable joint reasoning tasks over natural language; (2) Evaluation Process Analysis – evaluating both process accuracy and answer accuracy; (3) Targeted Training – using synthesized data to enhance LLMs’ reasoning performance. Experiments with multiple LLMs highlight persistent weaknesses in logical-numerical reasoning, showing that LogiNumSynth can serve as both a diagnostic tool and a source of targeted supervision for advancing integrated reasoning skills.

[NLP-67] Automating Structural Engineering Workflows with Large Language Model Agents

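[Quick read]: This paper addresses the long-standing low level of automation in structural engineering: despite the field’s substantial economic impact and market size, its core workflows have remained largely unchanged for decades. The key to the solution is MASSE, the first Multi-Agent System for Structural Engineering, which integrates large language model (LLM)-based agents with real engineering workflows so that complex tasks such as interpreting design codes, performing load calculations, and verifying structural capacities can be carried out automatically without any training. Empirical results indicate that MASSE can cut expert working time from roughly two hours to a few minutes without sacrificing reliability or accuracy.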

Link: https://arxiv.org/abs/2510.11004
Authors: Haoran Liang,Yufa Zhou,Mohammad Talebi Kalaleh,Qipei Mei
Affiliations: Unknown
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL)
Comments: Code: this https URL

Abstract:We introduce MASSE, the first Multi-Agent System for Structural Engineering, effectively integrating large language model (LLM)-based agents with real-world engineering workflows. Structural engineering is a fundamental yet traditionally stagnant domain, with core workflows remaining largely unchanged for decades despite its substantial economic impact and global market size. Recent advancements in LLMs have significantly enhanced their ability to perform complex reasoning, long-horizon planning, and precise tool utilization – capabilities well aligned with structural engineering tasks such as interpreting design codes, executing load calculations, and verifying structural capacities. We present a proof-of-concept showing that most real-world structural engineering workflows can be fully automated through a training-free LLM-based multi-agent system. MASSE enables immediate deployment in professional environments, and our comprehensive validation on real-world case studies demonstrates that it can reduce expert workload from approximately two hours to mere minutes, while enhancing both reliability and accuracy in practical engineering scenarios.

[NLP-68] DND: Boosting Large Language Models with Dynamic Nested Depth

Link: https://arxiv.org/abs/2510.11001
Authors: Tieyuan Chen,Xiaodong Chen,Haoxing Chen,Zhenzhong Lan,Weiyao Lin,Jianguo Li
Affiliations: Inclusion AI; Shanghai Jiao Tong University; ZhongguanCun Academy; Renmin University of China; Westlake University
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: TL;DR: We introduce Dynamic Nested Depth (DND), an efficient paradigm that adaptively identifies critical tokens and selectively deepens their computation via nested re-processing

[NLP-69] ABLEIST: Intersectional Disability Bias in LLM-Generated Hiring Scenarios

Link: https://arxiv.org/abs/2510.10998
Authors: Mahika Phutane,Hayoung Jung,Matthew Kim,Tanushree Mitra,Aditya Vashistha
Affiliations: Cornell University; Princeton University; University of Washington
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Comments: 28 pages, 11 figures, 16 tables. In submission

[NLP-70] DeepResearchGuard: Deep Research with Open-Domain Evaluation and Multi-Stage Guardrails for Safety

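[Quick read]: This paper addresses the insufficient evaluation and missing stage-specific protections in current deep research frameworks when they generate comprehensive reports. In particular, neglect of credibility, coherence, breadth, depth, and safety can allow harmful or malicious information to be integrated into the final report. The key to the solution is DEEPRESEARCHGUARD, a comprehensive framework with four-stage safeguards that applies open-domain, multi-dimensional evaluation of references and reports (covering defense success rate, over-refusal rate, and five report dimensions) to provide systematic safety control at every stage from input to output. The input guard filters risks early, while the plan and research guards strengthen citation discipline and source credibility respectively, blocking the propagation of harmful content and improving overall report quality without the refusals that excessive conservatism would cause.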

Link: https://arxiv.org/abs/2510.10994
Authors: Wei-Chieh Huang,Henry Peng Zou,Yaozu Wu,Dongyuan Li,Yankai Chen,Weizhi Zhang,Yangning Li,Angelo Zangari,Jizhou Guo,Chunyu Miao,Liancheng Fang,Langzhou He,Renhe Jiang,Philip S. Yu
Affiliations: University of Illinois Chicago; University of Tokyo; Tsinghua University; Shanghai Jiao Tong University
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

Abstract:Deep research frameworks have shown promising capabilities in synthesizing comprehensive reports from web sources. While deep research possesses significant potential to address complex issues through planning and research cycles, existing frameworks are deficient in sufficient evaluation procedures and stage-specific protections. They typically treat evaluation as exact match accuracy of question-answering, but overlook crucial aspects of report quality such as credibility, coherence, breadth, depth, and safety. This oversight may result in hazardous or malicious sources being integrated into the final report. To address these issues, we introduce DEEPRESEARCHGUARD, a comprehensive framework featuring four-stage safeguards with open-domain evaluation of references and reports. We assess performance across multiple metrics, e.g., defense success rate and over-refusal rate, and five key report dimensions. In the absence of a suitable safety benchmark, we introduce DRSAFEBENCH, a stage-wise benchmark for deep research safety. Our evaluation spans diverse state-of-the-art LLMs, including GPT-4o, Gemini-2.5-flash, DeepSeek-v3, and o4-mini. DEEPRESEARCHGUARD achieves an average defense success rate improvement of 18.16% while reducing over-refusal rate by 6%. The input guard provides the most substantial early-stage protection by filtering out obvious risks, while the plan and research guards enhance citation discipline and source credibility. Through extensive experiments, we show that DEEPRESEARCHGUARD enables comprehensive open-domain evaluation and stage-aware defenses that effectively block harmful content propagation, while systematically improving report quality without excessive over-refusal rates. The code can be found via this https URL.
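
The two headline metrics can be made concrete with a short sketch. It assumes the common definitions, which the abstract does not spell out: defense success rate as the share of harmful requests that the guard blocks, and over-refusal rate as the share of benign requests that it blocks. The `Outcome` record and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Outcome:
    harmful: bool  # ground-truth label of the request
    blocked: bool  # whether the guard refused or blocked it


def defense_success_rate(outcomes: Sequence[Outcome]) -> float:
    """Fraction of harmful requests that were blocked (assumed definition)."""
    harmful = [o for o in outcomes if o.harmful]
    return sum(o.blocked for o in harmful) / len(harmful) if harmful else 0.0


def over_refusal_rate(outcomes: Sequence[Outcome]) -> float:
    """Fraction of benign requests that were blocked (assumed definition)."""
    benign = [o for o in outcomes if not o.harmful]
    return sum(o.blocked for o in benign) / len(benign) if benign else 0.0


# Made-up evaluation log: two harmful and four benign requests.
outcomes = [
    Outcome(harmful=True, blocked=True),
    Outcome(harmful=True, blocked=False),
    Outcome(harmful=False, blocked=False),
    Outcome(harmful=False, blocked=True),
    Outcome(harmful=False, blocked=False),
    Outcome(harmful=False, blocked=False),
]
dsr = defense_success_rate(outcomes)
orr = over_refusal_rate(outcomes)
```

Reporting the two numbers together matters because a guard that blocks everything would score perfectly on the first metric and worst on the second.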

[NLP-71] A Survey on Agentic Multimodal Large Language Models

Link: https://arxiv.org/abs/2510.10991
Authors: Huanjin Yao,Ruifei Zhang,Jiaxing Huang,Jingyi Zhang,Yibo Wang,Bo Fang,Ruolin Zhu,Yongcheng Jing,Shunyu Liu,Guanbin Li,Dacheng Tao
Affiliations: Nanyang Technological University, Singapore; Chinese University of Hong Kong, Shenzhen, China; Shenzhen Research Institute of Big Data, China; Sun Yat-sen University, China; City University of Hong Kong, China; Communication University of China, China
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments:

[NLP-72] Secret-Protected Evolution for Differentially Private Synthetic Text Generation

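[Quick read]: This paper addresses the utility loss and computational overhead in existing differentially private (DP) synthetic text generation methods, which apply uniform privacy protection to all content and therefore overprotect non-sensitive content. The key to the solution is a new framework, Secret-Protected Evolution (SecPE), whose core innovation is a secret-aware privacy protection mechanism. The authors prove that the method satisfies $(\mathrm{p}, \mathrm{r})$-secret protection, a relaxation of Gaussian differential privacy (GDP) that achieves a better utility-privacy trade-off while preserving privacy and substantially lowers computational complexity. Experiments on the OpenReview, PubMed, and Yelp benchmarks show that SecPE outperforms the GDP-based Aug-PE baselines, maintaining higher downstream task accuracy and lower Fréchet Inception Distance (FID) at lower noise levels.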

Link: https://arxiv.org/abs/2510.10990
Authors: Tianze Wang,Zhaoyu Chen,Jian Du,Yingtai Xiao,Linjun Zhang,Qiang Yan
Affiliations: TikTok; Rutgers University
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
Comments:

Abstract:Text data has become extremely valuable on large language models (LLMs) and even lead to general artificial intelligence (AGI). A lot of high-quality text in the real world is private and cannot be freely used due to privacy concerns. Therefore, differentially private (DP) synthetic text generation has been proposed, aiming to produce high-utility synthetic data while protecting sensitive information. However, existing DP synthetic text generation imposes uniform guarantees that often overprotect non-sensitive content, resulting in substantial utility loss and computational overhead. Therefore, we propose Secret-Protected Evolution (SecPE), a novel framework that extends private evolution with secret-aware protection. Theoretically, we show that SecPE satisfies $(\mathrm{p}, \mathrm{r})$-secret protection, constituting a relaxation of Gaussian DP that enables tighter utility-privacy trade-offs, while also substantially reducing computational complexity relative to baseline methods. Empirically, across the OpenReview, PubMed, and Yelp benchmarks, SecPE consistently achieves lower Fréchet Inception Distance (FID) and higher downstream task accuracy than GDP-based Aug-PE baselines, while requiring less noise to attain the same level of protection. Our results highlight that secret-aware guarantees can unlock more practical and effective privacy-preserving synthetic text generation.

[NLP-73] Revisiting Model Interpolation for Efficient Reasoning

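[Quick read]: This paper addresses how to obtain efficient reasoning through model merging, in particular how to improve model performance while keeping computational cost under control. The core difficulty is that existing sophisticated merging methods often fail to strike the best balance between efficiency and effectiveness. The key to the solution is the observation that direct weight interpolation follows a three-stage evolutionary paradigm, with distinct behaviors on the reasoning trajectory as the interpolation ratio changes, which gives a principled basis for choosing that ratio. Empirical results show that a strategically interpolated model surpasses sophisticated merging baselines in both efficiency and effectiveness.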

Link: https://arxiv.org/abs/2510.10977
Authors: Taiqiang Wu,Runming Yang,Tao Liu,Jiahao Wang,Ngai Wong
Affiliations: The University of Hong Kong; Tsinghua University
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: 14 pages, 6 figures, 7 tables. Work in progress

Abstract:Model merging, typically on Instruct and Thinking models, has shown remarkable performance for efficient reasoning. In this paper, we systematically revisit the simplest merging method that interpolates two weights directly. Particularly, we observe that model interpolation follows a three-stage evolutionary paradigm with distinct behaviors on the reasoning trajectory. These dynamics provide a principled guide for navigating the performance-cost trade-off. Empirical results demonstrate that a strategically interpolated model surprisingly surpasses sophisticated model merging baselines on both efficiency and effectiveness. We further validate our findings with extensive ablation studies on model layers, modules, and decoding strategies. Ultimately, this work demystifies model interpolation and offers a practical framework for crafting models with precisely targeted reasoning capabilities. Code is available on GitHub at this https URL.
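
Direct weight interpolation is simple enough to sketch in a few lines. The example below assumes two checkpoints with identical parameter names and shapes, represented here as plain dictionaries of float lists; the convention that `lam` weights the Thinking model is an assumption made for illustration, not something the abstract specifies.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def interpolate_weights(instruct: Weights, thinking: Weights, lam: float) -> Weights:
    """Return (1 - lam) * instruct + lam * thinking, parameter by parameter."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if instruct.keys() != thinking.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged: Weights = {}
    for name in instruct:
        a, b = instruct[name], thinking[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [(1.0 - lam) * x + lam * y for x, y in zip(a, b)]
    return merged


# Toy checkpoints with a single two-element parameter.
instruct_ckpt = {"w": [0.0, 2.0]}
thinking_ckpt = {"w": [4.0, 6.0]}
merged_ckpt = interpolate_weights(instruct_ckpt, thinking_ckpt, lam=0.25)
```

Sweeping `lam` from 0 to 1 and measuring accuracy and output length at each value is the kind of experiment in which the three-stage behavior described in the abstract would show up.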

[NLP-74] Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning

Link: https://arxiv.org/abs/2510.10974
Authors: Zhiwen Ruan,Yixia Li,He Zhu,Yun Chen,Peng Li,Yang Liu,Guanhua Chen
Affiliations: Southern University of Science and Technology; Peking University; Shanghai University of Finance and Economics; Tsinghua University
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-75] RV-HATE: Reinforced Multi-Module Voting for Implicit Hate Speech Detection

Link: https://arxiv.org/abs/2510.10971
Authors: Yejin Lee,Hyeseon Ahn,Yo-Sub Han
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 10 pages, 9 figures, 12 tables

[NLP-76] Judge Before Answer: Can MLLM Discern the False Premise in Question?

Link: https://arxiv.org/abs/2510.10965
Authors: Jidong Li,Lingyong Fang,Haodong Zhao,Sufeng Duan,Gongshen Liu
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-77] KOTOX: A Korean Toxic Dataset for Deobfuscation and Detoxification

Link: https://arxiv.org/abs/2510.10961
Authors: Yejin Lee,Su-Hyeon Kim,Hyundong Jin,Dayoung Kim,Yeonsoo Kim,Yo-Sub Han
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 25 pages, 5 figures, 25 tables

[NLP-78] Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning

Link: https://arxiv.org/abs/2510.10959
Authors: Xiaoyun Zhang,Xiaojian Yuan,Di Huang,Wang You,Chen Hu,Jingqing Ruan,Kejiang Chen,Xing Hu
Affiliations: State Key Lab of Processors, Institute of Computing Technology, CAS; University of Science and Technology of China; University of Chinese Academy of Sciences; StepFun Inc
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
Comments: 16 pages, 4 figures

[NLP-79] Punctuation-aware treebank tree binarization

Link: https://arxiv.org/abs/2510.10951
Authors: Eitan Klinger,Vivaan Wadhwa,Jungyeul Park
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments:

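[NLP-80] The Social Cost of Intelligence: Emergence Propagation and Amplification of Stereotypical Bias in Multi-Agent Systems

[Quick read]: This paper addresses stereotypical bias in multi-agent systems (MAS) that arises when multiple large language models (LLMs) interact and collaborate, focusing on how such bias emerges, propagates, and is amplified within the system. The key to the solution is to identify and quantify the influence of three core factors: the internal specialization of agents, the quality of the underlying LLMs, and the design of inter-agent communication protocols. The study finds that MAS are generally more susceptible to bias than single-agent systems, but that cooperative or debate-based communication can curb bias amplification, and that more robust underlying LLMs improve the fairness and stability of the whole system.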

Link: https://arxiv.org/abs/2510.10943
Authors: Thi-Nhung Nguyen,Linhao Luo,Thuy-Trang Vu,Dinh Phung
Affiliations: Monash University
Subjects: Multiagent Systems (cs.MA); Computation and Language (cs.CL)
Comments: 15 pages, 19 figures, Preprint. Under review

Abstract:Bias in large language models (LLMs) remains a persistent challenge, manifesting in stereotyping and unfair treatment across social groups. While prior research has primarily focused on individual models, the rise of multi-agent systems (MAS), where multiple LLMs collaborate and communicate, introduces new and largely unexplored dynamics in bias emergence and propagation. In this work, we present a comprehensive study of stereotypical bias in MAS, examining how internal specialization, underlying LLMs and inter-agent communication protocols influence bias robustness, propagation, and amplification. We simulate social contexts where agents represent different social groups and evaluate system behavior under various interaction and adversarial scenarios. Experiments on three bias benchmarks reveal that MAS are generally less robust than single-agent systems, with bias often emerging early through in-group favoritism. However, cooperative and debate-based communication can mitigate bias amplification, while more robust underlying LLMs improve overall system stability. Our findings highlight critical factors shaping fairness and resilience in multi-agent LLM systems.

[NLP-81] End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF: A Reproducibility Study

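[Quick read]: This paper is a reproducibility study of the BiLSTM-CNN-CRF model of Ma and Hovy (2016), an end-to-end neural architecture designed to remove the reliance of traditional sequence labeling methods on hand-crafted features in tasks such as named entity recognition (NER) and part-of-speech (POS) tagging. The model extracts character-level representations with convolutional neural networks (CNNs), models word-level context with bi-directional long short-term memory networks (BiLSTMs), and performs structured prediction with conditional random fields (CRFs). The reproduction avoids manual feature engineering and reaches a 91.18% F1-score on the CoNLL-2003 NER dataset, confirming the effectiveness of the approach for sequence labeling.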

Link: https://arxiv.org/abs/2510.10936
Authors: Anirudh Ganesh,Jayavardhan Reddy
Affiliations: The Ohio State University
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments:

Abstract:We present a reproducibility study of the state-of-the-art neural architecture for sequence labeling proposed by Ma and Hovy (2016). The original BiLSTM-CNN-CRF model combines character-level representations via Convolutional Neural Networks (CNNs), word-level context modeling through Bi-directional Long Short-Term Memory networks (BiLSTMs), and structured prediction using Conditional Random Fields (CRFs). This end-to-end approach eliminates the need for hand-crafted features while achieving excellent performance on named entity recognition (NER) and part-of-speech (POS) tagging tasks. Our implementation successfully reproduces the key results, achieving 91.18% F1-score on CoNLL-2003 NER and demonstrating the model’s effectiveness across sequence labeling tasks. We provide a detailed analysis of the architecture components and release an open-source PyTorch implementation to facilitate further research.
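
The structured-prediction step of a CRF layer is usually decoded with the Viterbi algorithm. The sketch below shows that standard algorithm in plain Python; it is not taken from the released PyTorch implementation, and the emission and transition scores are made-up numbers standing in for what the BiLSTM and the learned CRF parameters would produce.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag sequence.

    emissions[t][j] is the score of tag j at position t;
    transitions[i][j] is the score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    scores = list(emissions[0])
    backpointers: List[List[int]] = []
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for ending at tag j in this position.
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(range(n_tags), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


emissions = [[3.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
sticky = [[0.0, -5.0], [-5.0, 0.0]]  # switching tags is heavily penalized
free = [[0.0, 0.0], [0.0, 0.0]]      # no transition preference
with_crf = viterbi_decode(emissions, sticky)
without_crf = viterbi_decode(emissions, free)
```

With the penalized transitions the decoder keeps one tag for the whole sequence, while with neutral transitions it follows the per-position emission scores, which illustrates why a CRF layer helps enforce consistent label sequences.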

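[NLP-82] Evaluating Language Models’ Evaluations of Games

[Quick read]: This paper addresses a gap in current evaluation of artificial intelligence (AI) systems: evaluation focuses mainly on problem-solving ability and neglects a system’s ability to judge which problems are worth solving, that is, the evaluation of evaluations. To fill this gap, the paper proposes a new evaluation paradigm that measures how well AI systems evaluate games, quantified along two dimensions: payoff (or fairness) and funness. The key to the solution is a formalism for evaluating such evaluations, applied to a large-scale dataset of over 100 novel board games and over 450 human judgments to compare modern language and reasoning models against people and symbolic computational agents on both kinds of query. The study finds that reasoning models align more closely with human evaluations than non-reasoning language models, but that as models approach the game-theoretic optimum their fit to human data weakens. Funness assessments are more jagged across models, and resource usage is highly variable, which points to the importance of more resource-rational meta-reasoning.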

Link: https://arxiv.org/abs/2510.10930
Authors: Katherine M. Collins,Cedegao E. Zhang,Graham Todd,Lance Ying,Mauricio Barba da Costa,Ryan Liu,Prafull Sharma,Adrian Weller,Ionatan Kuperwajs,Lionel Wong,Joshua B. Tenenbaum,Thomas L. Griffiths
Affiliations: University of Cambridge; MIT; NYU; Harvard University; Princeton University; Stanford University
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Pre-print

Abstract:Reasoning is not just about solving problems – it is also about evaluating which problems are worth solving at all. Evaluations of artificial intelligence (AI) systems primarily focused on problem solving, historically by studying how models play games such as chess and Go. In this paper, we advocate for a new paradigm that assesses AI systems’ evaluation of games. First, we introduce a formalism for evaluating such evaluations. We then leverage a large-scale dataset of over 100 novel board games and over 450 human judgments to compare evaluations produced by modern language and reasoning models against those of people and symbolic computational agents. We consider two kinds of evaluative queries: assessing the payoff (or fairness) and the funness of games. These queries span two dimensions relevant to the design of evaluations of AI evaluations: how complex a query is to compute and how difficult a query is to quantify. Our results show that reasoning models are generally more aligned to people in their evaluations of games than non-reasoning language models. However, we observe a non-monotonic relationship: as models get closer to game-theoretic optimal, their fit to human data weakens. We also observe more “jaggedness” across models for assessing funness, in line with the greater difficulty of quantifying this query. Across queries and games, reasoning models show highly variable and unpredictable resource usage when assessing queries, pointing to the importance of imbuing more resource-rational meta-reasoning in language and reasoning models.

[NLP-83] GapDNER: A Gap-Aware Grid Tagging Model for Discontinuous Named Entity Recognition IJCNN2025

Link: https://arxiv.org/abs/2510.10927
Authors: Yawen Yang,Fukun Ma,Shiao Meng,Aiwei Liu,Lijie Wen
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments: Accepted by IJCNN 2025

[NLP-84] Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation

Link: https://arxiv.org/abs/2510.10925
Authors: Hengyuan Zhang,Shiping Yang,Xiao Liang,Chenming Shang,Yuxuan Jiang,Chaofan Tao,Jing Xiong,Hayden Kwok-Hay So,Ruobing Xie,Angel X. Chang,Ngai Wong
Affiliations: The University of Hong Kong; Simon Fraser University; University of California, Los Angeles; Dartmouth College; University of Maryland, Baltimore County; Tencent
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments: 19 pages, 10 figures

[NLP-85] ADVICE: Answer-Dependent Verbalized Confidence Estimation

Link: https://arxiv.org/abs/2510.10913
Authors: Ki Jung Seo,Sehun Lim,Taeuk Kim
Affiliations: Hanyang University
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-86] LLM×MapReduce-V3: Enabling Interactive In-Depth Survey Generation through a MCP-Driven Hierarchically Modular Agent System EMNLP2025

Link: https://arxiv.org/abs/2510.10890
Authors: Yu Chao,Siyu Lin,xiaorong wang,Zhu Zhang,Zihan Zhou,Haoyu Wang,Shuo Wang,Jie Zhou,Zhiyuan Liu,Maosong Sun
Affiliations: Tsinghua University; Peking University; Modelbest Inc.; Nanyang Technological University
Subjects: Computation and Language (cs.CL)
Comments: Accepted by EMNLP2025 System Demonstration

[NLP-87] Rethinking Agentic Workflows: Evaluating Inference-Based Test-Time Scaling Strategies in Text2SQL Tasks

Link: https://arxiv.org/abs/2510.10885
Authors: Jiajing Guo,Kenil Patel,Jorge Piazentin Ono,Wenbin He,Liu Ren
Affiliations: Bosch Research North America; Bosch Center for Artificial Intelligence
Subjects: Computation and Language (cs.CL); Databases (cs.DB)
Comments: Accepted at COLM 2025 SCALR Workshop

[NLP-88] DUAL-Bench: Measuring Over-Refusal and Robustness in Vision-Language Models

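[Quick read]: This paper addresses the difficulty vision-language models (VLMs) have in balancing safety and usefulness, in particular over-refusal, where a model declines a reasonable request out of excessive caution, especially when the image modality contains potentially harmful content. The core challenge is achieving safe completion: fulfilling the benign parts of a request while not carrying out harmful instructions, and explicitly warning about potential risks. The key to the solution is DUAL-Bench, the first benchmark focused on over-refusal and safe completion in multimodal settings. It systematically evaluates 18 VLMs across 12 hazard categories, with particular attention to robustness under semantics-preserving visual perturbations, to encourage finer-grained alignment strategies that keep models both safe and useful in complex multimodal environments.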

Link: https://arxiv.org/abs/2510.10846
Authors: Kaixuan Ren,Preslav Nakov,Usman Naseem
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments: 25 pages, 91 figures, submitted to Oct ARR under reviewing

Abstract:As vision-language models become increasingly capable, maintaining a balance between safety and usefulness remains a central challenge. Safety mechanisms, while essential, can backfire, causing over-refusal, where models decline benign requests out of excessive caution. Yet, no existing benchmark has systematically addressed over-refusal in the visual modality. This setting introduces unique challenges, such as dual-use cases where an instruction is harmless, but the accompanying image contains harmful content. Models frequently fail in such scenarios, either refusing too conservatively or completing tasks unsafely, which highlights the need for more fine-grained alignment. The ideal behavior is safe completion, i.e., fulfilling the benign parts of a request while explicitly warning about any potentially harmful elements. To address this, we present DUAL-Bench, the first multimodal benchmark focused on over-refusal and safe completion in VLMs. We evaluated 18 VLMs across 12 hazard categories, with focus on their robustness under semantics-preserving visual perturbations. The results reveal substantial room for improvement: GPT-5-Nano achieves 12.9% safe completion, GPT-5 models average 7.9%, and Qwen models only 3.9%. We hope that DUAL-Bench will foster the development of more nuanced alignment strategies that ensure models remain both safe and useful in complex multimodal settings.

[NLP-89] Happiness is Sharing a Vocabulary: A Study of Transliteration Methods

Link: https://arxiv.org/abs/2510.10827
Authors: Haeji Jung,Jinju Kim,Kyungjin Kim,Youjeong Roh,David R. Mortensen
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-90] DRIFT: Decompose Retrieve Illustrate then Formalize Theorems

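[Quick read]: This paper addresses the difficulty large language models (LLMs) have in identifying and using prerequisite mathematical knowledge and its formal representation during autoformalization of mathematical statements for theorem proving. Current retrieval-augmented methods query with the informal statement directly, overlooking the fact that informal statements are often complex and offer limited context, which leads to poor premise retrieval. The key to the solution is the DRIFT framework, which decomposes a complex informal statement into smaller, more tractable sub-components to enable targeted retrieval of premises from mathematical libraries such as Mathlib. DRIFT also retrieves illustrative theorems to help the model use premises more effectively in formalization. Experiments show that DRIFT consistently improves premise retrieval, nearly doubling the F1 score over the DPR baseline on ProofNet, and performs strongly on the out-of-distribution ConNF benchmark.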

Link: https://arxiv.org/abs/2510.10815
Authors: Meiru Zhang,Philipp Borchert,Milan Gritta,Gerasimos Lampouras
Affiliations: University of Cambridge; Huawei Noah’s Ark Lab
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Symbolic Computation (cs.SC)
Comments:

Abstract:Automating the formalization of mathematical statements for theorem proving remains a major challenge for Large Language Models (LLMs). LLMs struggle to identify and utilize the prerequisite mathematical knowledge and its corresponding formal representation in languages like Lean. Current retrieval-augmented autoformalization methods query external libraries using the informal statement directly, but overlook a fundamental limitation: informal mathematical statements are often complex and offer limited context on the underlying math concepts. To address this, we introduce DRIFT, a novel framework that enables LLMs to decompose informal mathematical statements into smaller, more tractable “sub-components”. This facilitates targeted retrieval of premises from mathematical libraries such as Mathlib. Additionally, DRIFT retrieves illustrative theorems to help models use premises more effectively in formalization tasks. We evaluate DRIFT across diverse benchmarks (ProofNet, ConNF, and MiniF2F-test) and find that it consistently improves premise retrieval, nearly doubling the F1 score compared to the DPR baseline on ProofNet. Notably, DRIFT demonstrates strong performance on the out-of-distribution ConNF benchmark, with BEq+@10 improvements of 37.14% and 42.25% using GPT-4.1 and DeepSeek-V3.1, respectively. Our analysis shows that retrieval effectiveness in mathematical autoformalization depends heavily on model-specific knowledge boundaries, highlighting the need for adaptive retrieval strategies aligned with each model’s capabilities.
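
The decompose-then-retrieve idea and the F1 metric used to score premise retrieval can be sketched as follows. Everything here is illustrative: the toy index, the sub-component strings, and the function names are hypothetical, and a real system would use an LLM for decomposition and a dense retriever over Mathlib instead of a dictionary lookup.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def retrieve_for_components(
    components: Iterable[str],
    retrieve: Callable[[str], Sequence[str]],
    k: int,
) -> List[str]:
    """Query once per sub-component and merge the top-k hits without duplicates."""
    seen, merged = set(), []
    for component in components:
        for premise in retrieve(component)[:k]:
            if premise not in seen:
                seen.add(premise)
                merged.append(premise)
    return merged


def precision_recall_f1(
    retrieved: Iterable[str], gold: Iterable[str]
) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 of retrieved premises against gold ones."""
    retrieved_set, gold_set = set(retrieved), set(gold)
    hits = len(retrieved_set & gold_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy index mapping a sub-component query to ranked premise names.
TOY_INDEX = {
    "continuous function": ["Continuous", "ContinuousOn"],
    "compact set": ["IsCompact", "CompactSpace"],
}


def toy_retrieve(query: str) -> Sequence[str]:
    return TOY_INDEX.get(query, [])


premises = retrieve_for_components(["continuous function", "compact set"], toy_retrieve, k=1)
precision, recall, f1 = precision_recall_f1(premises, ["Continuous", "ContinuousOn", "IsCompact"])
```

Querying per sub-component gives each concept its own retrieval budget, which is the intuition behind the targeted retrieval the abstract describes.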

[NLP-91] Is Implicit Knowledge Enough for LLMs? A RAG Approach for Tree-based Structures

Link: https://arxiv.org/abs/2510.10806
Authors: Mihir Gupte, Paolo Giusto, Ramesh S
Affiliations: General Motors
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Comments: Waiting for Conference Response

[NLP-92] Toward Human-Centered Readability Evaluation EMNLP2025

Link: https://arxiv.org/abs/2510.10801
Authors: Bahar İlgen, Georges Hattab
Affiliations: Center for Artificial Intelligence in Public Health Research (ZKI-PH); Robert Koch Institute; Department of Mathematics and Computer Science; Freie Universität Berlin
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Accepted to the 4th Workshop on Bridging Human-Computer Interaction and NLP (HCI+NLP) at EMNLP 2025, Suzhou, China

[NLP-93] Review of Inference-Time Scaling Strategies: Reasoning, Search and RAG

Link: https://arxiv.org/abs/2510.10787
Authors: Zhichao Wang, Cheng Wan, Dong Nie
Affiliations: Inflection AI; Georgia Institute of Technology; ChatAlpha AI
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-94] HiligayNER: A Baseline Named Entity Recognition Model for Hiligaynon ACL

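Quick read: This paper addresses the underrepresentation of Hiligaynon, a Philippine language, in natural language processing (NLP) research, which stems from the absence of annotated corpora and baseline models. The key to the solution is building and publicly releasing HiligayNER, the first baseline named entity recognition (NER) model for Hiligaynon. It is trained on a corpus of more than 8,000 annotated sentences by fine-tuning two Transformer architectures, mBERT and XLM-RoBERTa. Both models exceed 80% precision, recall, and F1 across entity types and show promising cross-lingual transfer, providing a foundation for NLP on regional languages in low-resource settings.

Link: https://arxiv.org/abs/2510.10776
Authors: James Ald Teves, Ray Daniel Cal, Josh Magdiel Villaluz, Jean Malolos, Mico Magtira, Ramon Rodriguez, Mideth Abisado, Joseph Marvin Imperial
Affiliations: Silliman University; National University Philippines
Subjects: Computation and Language (cs.CL)
Comments: Camera-ready for PACLIC 2025 (ACL Proceedings)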

Abstract:The language of Hiligaynon, spoken predominantly by the people of Panay Island, Negros Occidental, and Soccsksargen in the Philippines, remains underrepresented in language processing research due to the absence of annotated corpora and baseline models. This study introduces HiligayNER, the first publicly available baseline model for the task of Named Entity Recognition (NER) in Hiligaynon. The dataset used to build HiligayNER contains over 8,000 annotated sentences collected from publicly available news articles, social media posts, and literary texts. Two Transformer-based models, mBERT and XLM-RoBERTa, were fine-tuned on this collected corpus to build versions of HiligayNER. Evaluation results show strong performance, with both models achieving over 80% in precision, recall, and F1-score across entity types. Furthermore, cross-lingual evaluation with Cebuano and Tagalog demonstrates promising transferability, suggesting the broader applicability of HiligayNER for multilingual NLP in low-resource settings. This work aims to contribute to language technology development for underrepresented Philippine languages, specifically for Hiligaynon, and support future research in regional language processing.

[NLP-95] Large Language Models for Full-Text Methods Assessment: A Case Study on Mediation Analysis

Link: https://arxiv.org/abs/2510.10762
Authors: Wenqing Zhang, Trang Nguyen, Elizabeth A. Stuart, Yiqun T. Chen
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Applications (stat.AP)
Comments:

[NLP-96] Sarcasm Detection Using Deep Convolutional Neural Networks: A Modular Deep Learning Framework

Link: https://arxiv.org/abs/2510.10729
Authors: Manas Zambre, Sarika Bobade (Supervisor)
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments: 4 pages, 5 figures

[NLP-97] RePro: Training Language Models to Faithfully Recycle the Web for Pretraining

Link: https://arxiv.org/abs/2510.10681
Authors: Zichun Yu, Chenyan Xiong
Affiliations: Carnegie Mellon University
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-98] Unlocking LLM Safeguards for Low-Resource Languages via Reasoning and Alignment with Minimal Training Data EMNLP2025

Link: https://arxiv.org/abs/2510.10677
Authors: Zhuowei Chen, Bowei Zhang, Nankai Lin, Tian Hou, Lianxi Wang
Affiliations: Guangdong University of Foreign Studies; Guangzhou Key Laboratory of Multilingual Intelligent Processing; University of Pittsburgh
Subjects: Computation and Language (cs.CL)
Comments: Accepted to MRL Workshop at EMNLP 2025

[NLP-99] Bhasha-Rupantarika: Algorithm-Hardware Co-design approach for Multilingual Neural Machine Translation

Link: https://arxiv.org/abs/2510.10676
Authors: Mukul Lokhande, Tanushree Dewangan, Mohd Sharik Mansoori, Tejas Chaudhari, Akarsh J., Damayanti Lokhande, Adam Teman, Santosh Kumar Vishvakarma
Affiliations: Unknown
Subjects: Hardware Architecture (cs.AR); Computation and Language (cs.CL); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
Comments:

[NLP-100] BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions

Link: https://arxiv.org/abs/2510.10666
Authors: Zhengbo Zhang, Zhiheng Lyu, Junhao Gong, Hongzhu Yi, Xinming Wang, Yuxuan Zhou, Jiabing Yang, Ping Nie, Yan Huang, Wenhu Chen
Affiliations: Chinese Academy of Sciences; University of Waterloo; Peking University; Tsinghua University
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 10 pages

[NLP-101] AGENTIQL: An Agent-Inspired Multi-Expert Framework for Text-to-SQL Generation NEURIPS2025

Link: https://arxiv.org/abs/2510.10661
Authors: Omid Reza Heidari, Siobhan Reid, Yassine Yaakoubi
Affiliations: Concordia University
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Accepted at NeurIPS 2025, ER “Efficient Reasoning” workshop

[NLP-102] You're Not Gonna Believe This: A Computational Analysis of Factual Appeals and Sourcing in Partisan News

Link: https://arxiv.org/abs/2510.10658
Authors: Guy Mor-Lan, Tamir Sheafer, Shaul R. Shenhav
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-103] FactAppeal: Identifying Epistemic Factual Appeals in News Media

Link: https://arxiv.org/abs/2510.10627
Authors: Guy Mor-Lan, Tamir Sheafer, Shaul R. Shenhav
Affiliations: Hebrew University of Jerusalem
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-104] Preserving LLM Capabilities through Calibration Data Curation: From Analysis to Optimization NEURIPS2025

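Quick read: This paper addresses the unclear mechanism by which calibration data affects how well large language model (LLM) capabilities are preserved under post-training compression, with particular attention to complex reasoning capabilities such as math problem solving and code generation. Existing studies mostly examine language modeling or commonsense reasoning performance from a single angle and lack a systematic treatment of the compositional properties and domain correspondence of calibration data. The key finding is that representativeness and diversity in activation space are what fundamentally determine calibration data quality. Building on this, the paper proposes a calibration data curation framework based on activation-pattern analysis, which improves how well compressed models retain critical LLM capabilities.

Link: https://arxiv.org/abs/2510.10618
Authors: Bowei He, Lihao Yin, Huiling Zhen, Shuqi Liu, Han Wu, Xiaokun Zhang, Mingxuan Yuan, Chen Ma
Affiliations: City University of Hong Kong; Huawei
Subjects: Computation and Language (cs.CL)
Comments: Accepted by NeurIPS 2025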

Abstract:Post-training compression has been a widely employed approach to scale down large language model (LLM) and facilitate efficient inference. In various proposed compression methods, including pruning and quantization, calibration data plays a vital role by informing the weight importance and activation dynamic ranges. However, how calibration data impacts the LLM capability after compression is less explored. Few of the existing works, though recognizing the significance of this study, only investigate the language modeling or commonsense reasoning performance degradation from limited angles, like the data sources or sample amounts. More systematic research is still needed to examine the impacts on different LLM capabilities in terms of compositional properties and domain correspondence of calibration data. In this work, we aim at bridging this gap and further analyze underlying influencing mechanisms from the activation pattern perspective. Especially, we explore the calibration data’s impacts on high-level complex reasoning capabilities, like math problem solving and code generation. Delving into the underlying mechanism, we find that the representativeness and diversity in activation space more fundamentally determine the quality of calibration data. Finally, we propose a calibration data curation framework based on such observations and analysis, enhancing the performance of existing post-training compression methods on preserving critical LLM capabilities. Our code is provided at this https URL.
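
The abstract names representativeness and diversity in activation space as the properties that matter, but this digest does not describe the curation procedure itself. The sketch below is therefore only a generic stand-in, not the authors' method: it assumes each candidate sample has already been reduced to a feature vector (for example, a pooled activation) and applies greedy farthest-point selection, seeded with the sample nearest the centroid. The name `select_diverse` and the toy vectors are hypothetical.

```python
import math


def select_diverse(features, k):
    """Pick up to k indices whose feature vectors are spread out (farthest-point greedy)."""
    if not features or k <= 0:
        return []
    n, dim = len(features), len(features[0])
    # Seed with the sample nearest the centroid, a crude proxy for representativeness.
    centroid = [sum(v[d] for v in features) / n for d in range(dim)]
    first = min(range(n), key=lambda i: math.dist(features[i], centroid))
    chosen = [first]
    # nearest[i] = distance from sample i to its closest already-chosen sample.
    nearest = [math.dist(v, features[first]) for v in features]
    while len(chosen) < min(k, n):
        nxt = max(range(n), key=lambda i: nearest[i])  # the least-covered sample
        chosen.append(nxt)
        nearest = [min(nearest[i], math.dist(features[i], features[nxt])) for i in range(n)]
    return chosen


# Toy 2-D "activations": two near-duplicate pairs and one outlier.
toy = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]]
print(select_diverse(toy, 3))
```

On the toy data the selection takes one sample from each cluster and skips the near-duplicates, which is the intuition behind preferring diverse calibration sets over larger redundant ones.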

[NLP-105] Dynamic Topic Evolution with Temporal Decay and Attention in Large Language Models

Link: https://arxiv.org/abs/2510.10613
Authors: Di Wu and Shuaidong Pan
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-106] A Layered Intuition – Method Model with Scope Extension for LLM Reasoning

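Quick read: This paper addresses the weak performance of Large Language Models (LLMs) on unseen, indirect questions, that is, how to improve their reasoning and generalization in complex real-world scenarios. The key to the solution is a unified "Intuition-Method Layered Model with Scope Extension": intuition-based thinking gives a fast first response, method-based thinking decouples questions and solutions into transferable reasoning units, and scope extension broadens applicability along vertical (cause analysis), horizontal (parallel and generalized issues), and, for the first time, temporal and spatial dimensions. These extensions are organized into systematic knowledge trees that interconnect into a knowledge network, which improves adaptability to unseen problems. The paper also proposes the "entropy of method extension" as a quantitative measure of the independence and diversity of extensions, used to assess a system's capacity to solve unseen questions and to move toward a more robust and extensible reasoning paradigm.

Link: https://arxiv.org/abs/2510.10592
Authors: Hong Su
Affiliations: Chengdu University of Information Technology
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: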

Abstract:Existing studies have introduced method-based reasoning and scope extension as approaches to enhance Large Language Model (LLM) performance beyond direct matrix mappings. Building on these foundations, this paper summarizes and integrates these ideas into a unified Intuition-Method Layered Model with Scope Extension, designed to address indirected (unseen) issues more systematically. In this framework, intuition-based thinking provides rapid first-reaction answers, while method-based thinking decouples questions and solutions into transferable reasoning units. Scope extension is then applied to broaden applicability, including vertical (cause analysis), horizontal (parallel and generalized issues), and for the first time, temporal and spatial extensions, which expand reasoning across time and contextual dimensions. These extensions are organized into systematic knowledge trees that interconnect into a knowledge network, thereby increasing adaptability. To quantitatively evaluate this process, we propose the entropy of method extension, which measures the independence and diversity of extensions as an indicator of the system’s capacity to solve unseen questions. By logically connecting existing approaches with new extensions and introducing an entropy-based evaluation framework, this work advances toward a more robust and extensible reasoning paradigm for LLMs in real-world problem-solving.
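
The abstract introduces an "entropy of method extension" that measures the independence and diversity of extensions, but the exact definition is not given in this digest. As a hedged illustration of the general idea only, the sketch below computes plain Shannon entropy over the extension types used. Treating each extension as a categorical label is an assumption, and the function name `extension_entropy` is hypothetical.

```python
import math
from collections import Counter


def extension_entropy(extension_types):
    """Shannon entropy (in bits) of the distribution over extension types."""
    counts = Counter(extension_types)
    total = sum(counts.values())
    # Each term is p * log2(p); the sum is negated so the result is non-negative.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Evenly spread extensions give higher entropy than extensions of a single type.
varied = ["vertical", "horizontal", "temporal", "spatial"]
uniform = ["vertical", "vertical", "vertical", "vertical"]
print(extension_entropy(varied), extension_entropy(uniform))
```

Under this reading, a system that extends a method in many different directions scores higher than one that keeps producing the same kind of extension.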

[NLP-107] BitMar: Low-Bit Multimodal Fusion with Episodic Memory for Edge Devices EMNLP2025

Link: https://arxiv.org/abs/2510.10560
Authors: Euhid Aman, Esteban Carlin, Hsing-Kuo Pao, Giovanni Beltrame, Ghaluh Indah Permata Sari, Yie-Tarng Chen
Affiliations: NTUST Taiwan (National Taiwan University of Science and Technology); Polytechnique Montréal
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: 6 pages, BabyLM Workshop, EMNLP 2025

[NLP-108] Detecting Hallucinations in Authentic LLM-Human Interactions

Link: https://arxiv.org/abs/2510.10539
Authors: Yujie Ren, Niklas Gruhlke, Anne Lauscher
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-109] Merlin's Whisper: Enabling Efficient Reasoning in LLMs via Black-box Adversarial Prompting

Link: https://arxiv.org/abs/2510.10528
Authors: Heming Xia, Cunxiao Du, Rui Li, Chak Tou Leong, Yongqi Li, Wenjie Li
Affiliations: The Hong Kong Polytechnic University; Sea AI Lab; Peking University
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments:

[NLP-110] VOLTAGE: A Versatile Contrastive Learning based OCR Methodology for ultra low-resource scripts through Auto Glyph Feature Extraction EACL2024

Link: https://arxiv.org/abs/2510.10490
Authors: Prawaal Sharma, Poonam Goyal, Vidisha Sharma, Navneet Goyal
Affiliations: Infosys; BITS Pilani
Subjects: Computation and Language (cs.CL)
Comments: 9 Pages, Plus Appendices, EACL 2024

[NLP-111] UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models

Link: https://arxiv.org/abs/2510.10481
Authors: Guangxin He, Shen Nie, Fengqi Zhu, Yuankang Zhao, Tianyi Bai, Ran Yan, Jie Fu, Chongxuan Li, Binhang Yuan
Affiliations: HKUST; Renmin University of China; University of Chinese Academy of Sciences; Shanghai AI Lab
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-112] Assessing Large Language Models for Structured Medical Order Extraction

Link: https://arxiv.org/abs/2510.10475
Authors: A H M Rezaul Karim, Ozlem Uzuner
Affiliations: George Mason University
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-113] When or What? Understanding Consumer Engagement on Digital Platforms

Link: https://arxiv.org/abs/2510.10474
Authors: Jingyi Wu, Junying Liang
Affiliations: Zhejiang University
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Comments: 21 pages, 6 figures, 3 tables

[NLP-114] FML-bench: A Benchmark for Automatic ML Research Agents Highlighting the Importance of Exploration Breadth

Link: https://arxiv.org/abs/2510.10472
Authors: Qiran Zou, Hou Hei Lam, Wenhao Zhao, Yiming Tang, Tingting Chen, Samson Yu, Tianyi Zhang, Chang Liu, Xiangyang Ji, Dianbo Liu
Affiliations: National University of Singapore; Tsinghua University; University of Minnesota
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Our benchmark is available at: this https URL

[NLP-115] NIM: Neuro-symbolic Ideographic Metalanguage for Inclusive Communication EMNLP

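Quick read: This paper addresses the digital divide, in particular the significant barriers that people with lower academic literacy face in digital communication. The key to the solution is a universal ideographic metalanguage built on Neuro-symbolic AI principles: it combines large language models (LLMs) enriched with world knowledge with symbolic knowledge heuristics grounded in Natural Semantic Metalanguage (NSM) theory, so that complex ideas can be decomposed into simpler, atomic concepts. Using a human-centric, collaborative design process, more than 200 semi-literate participants took part in defining the problem, selecting ideographs, and validating the system. The result achieves over 80% semantic comprehensibility, an accessible learning curve, and cross-cultural adaptability, effectively serving underprivileged populations with limited formal education.

Link: https://arxiv.org/abs/2510.10459
Authors: Prawaal Sharma, Poonam Goyal, Navneet Goyal, Vidisha Sharma
Affiliations: Infosys; BITS Pilani
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 9 pages, EMNLP Findings 2025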

Abstract:Digital communication has become the cornerstone of modern interaction, enabling rapid, accessible, and interactive exchanges. However, individuals with lower academic literacy often face significant barriers, exacerbating the “digital divide”. In this work, we introduce a novel, universal ideographic metalanguage designed as an innovative communication framework that transcends academic, linguistic, and cultural boundaries. Our approach leverages principles of Neuro-symbolic AI, combining neural-based large language models (LLMs) enriched with world knowledge and symbolic knowledge heuristics grounded in the linguistic theory of Natural Semantic Metalanguage (NSM). This enables the semantic decomposition of complex ideas into simpler, atomic concepts. Adopting a human-centric, collaborative methodology, we engaged over 200 semi-literate participants in defining the problem, selecting ideographs, and validating the system. With over 80% semantic comprehensibility, an accessible learning curve, and universal adaptability, our system effectively serves underprivileged populations with limited formal education.

[NLP-116] Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?

Link: https://arxiv.org/abs/2510.10457
Authors: Shaobo Wang, Cong Wang, Wenjie Fu, Yue Min, Mingquan Feng, Isabel Guan, Xuming Hu, Conghui He, Cunxiang Wang, Kexin Yang, Xingzhang Ren, Fei Huang, Dayiheng Liu, Linfeng Zhang
Affiliations: EPIC Lab, SJTU; SJTU; Alibaba Group; FDU; HKUST; HKUST (GZ); Shanghai AI Lab; ZhipuAI
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: 18 pages, 5 figures

[NLP-117] End-to-end Speech Recognition with similar length speech and text

Link: https://arxiv.org/abs/2510.10453
Authors: Peng Fan, Wenping Wang, Fei Deng
Affiliations: Chengdu University of Technology; National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-118] Steering Over-refusals Towards Safety in Retrieval Augmented Generation

Link: https://arxiv.org/abs/2510.10452
Authors: Utsav Maskey, Mark Dras, Usman Naseem
Affiliations: Macquarie University
Subjects: Computation and Language (cs.CL)
Comments: Preprint

[NLP-119] RECON: Reasoning with Condensation for Efficient Retrieval-Augmented Generation

Link: https://arxiv.org/abs/2510.10448
Authors: Zhichao Xu, Minheng Wang, Yawei Wang, Wenqian Ye, Yuntao Du, Yunpu Ma, Yijun Tian
Affiliations: University of Utah; University of Washington; George Washington University; University of Virginia; Shandong University; Ludwig Maximilian University of Munich; University of Notre Dame
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-120] Do Audio LLMs Really LISTEN or Just Transcribe? Measuring Lexical vs. Acoustic Emotion Cues Reliance

Link: https://arxiv.org/abs/2510.10444
Authors: Jingyi Chen, Zhimeng Guo, Jiyun Chun, Pichao Wang, Andrew Perrault, Micha Elsner
Affiliations: The Ohio State University; Penn State University; Amazon
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-121] LONGQAEVAL: Designing Reliable Evaluations of Long-Form Clinical QA under Resource Constraints

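Quick read: This paper addresses the high resource cost and poor consistency of evaluating long-form clinical question answering (QA) systems, especially where medical expertise is required and human annotation is expensive. The core challenge is achieving reliable, efficient evaluation with limited resources and high expertise requirements. The key to the solution is the LongQAEval evaluation framework, built on physician annotations of answers (written by physicians and by LLMs) to 300 real patient questions. The analysis finds that fine-grained sentence-level evaluation improves agreement on correctness, while coarse answer-level evaluation improves agreement on relevance. Annotating only a small subset of sentences yields reliability comparable to coarse annotation, reducing cost and effort.

Link: https://arxiv.org/abs/2510.10415
Authors: Federica Bologna, Tiffany Pan, Matthew Wilkens, Yue Guo, Lucy Lu Wang
Affiliations: Cornell University; University of Illinois, Urbana-Champaign; University of Washington
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: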

Abstract:Evaluating long-form clinical question answering (QA) systems is resource-intensive and challenging: accurate judgments require medical expertise and achieving consistent human judgments over long-form text is difficult. We introduce LongQAEval, an evaluation framework and set of evaluation recommendations for limited-resource and high-expertise settings. Based on physician annotations of 300 real patient questions answered by physicians and LLMs, we compare coarse answer-level versus fine-grained sentence-level evaluation over the dimensions of correctness, relevance, and safety. We find that inter-annotator agreement (IAA) varies by dimension: fine-grained annotation improves agreement on correctness, coarse improves agreement on relevance, and judgments on safety remain inconsistent. Additionally, annotating only a small subset of sentences can provide reliability comparable to coarse annotations, reducing cost and effort.
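
The abstract compares annotation schemes by inter-annotator agreement (IAA) without saying, in this digest, which agreement statistic is used. As one common choice for two annotators, and purely as an assumption about what such a comparison could look like, here is a minimal sketch of Cohen's kappa. The labels in the example are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[l] * counts_b[l] for l in counts_a.keys() | counts_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up sentence-level correctness labels (1 = correct, 0 = incorrect).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1]
print(cohens_kappa(annotator_1, annotator_2))
```

Kappa corrects raw agreement for chance, so it can be compared across dimensions such as correctness, relevance, and safety even when their label distributions differ.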

[NLP-122] STEAM: A Semantic-Level Knowledge Editing Framework for Large Language Models EMNLP2025

Link: https://arxiv.org/abs/2510.10398
Authors: Geunyeong Jeong, Juoh Sun, Seonghee Lee, Harksoo Kim
Affiliations: Konkuk University
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Accepted to EMNLP 2025 (Findings)

[NLP-123] AssoMem: Scalable Memory QA with Multi-Signal Associative Retrieval

Link: https://arxiv.org/abs/2510.10397
Authors: Kai Zhang, Xinyuan Zhang, Ejaz Ahmed, Hongda Jiang, Caleb Kumar, Kai Sun, Zhaojiang Lin, Sanat Sharma, Shereen Oraby, Aaron Colak, Ahmed Aly, Anuj Kumar, Xiaozhong Liu, Xin Luna Dong
Affiliations: Worcester Polytechnic Institute
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-124] RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models

Link: https://arxiv.org/abs/2510.10390
Authors: Aashiq Muhamed, Leonardo F. R. Ribeiro, Markus Dreyer, Virginia Smith, Mona T. Diab
Affiliations: Carnegie Mellon University; Amazon AGI
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[NLP-125] ASC analyzer: A Python package for measuring argument structure construction usage in English texts

Link: https://arxiv.org/abs/2510.10384
Authors: Hakyung Sung, Kristopher Kyle
Affiliations: University of Oregon; Rochester Institute of Technology
Subjects: Computation and Language (cs.CL)
Comments: Accepted to the 2nd Workshop on Construction Grammars and NLP (CxGs+NLP)

[NLP-126] End-to-end Automatic Speech Recognition and Speech Translation: Integration of Speech Foundational Models and LLMs

Link: https://arxiv.org/abs/2510.10329
Authors: Nam Luu, Ondřej Bojar
Affiliations: Charles University; Faculty of Mathematics and Physics; Institute of Formal and Applied Linguistics
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-127] Are LLMs Empathetic to All? Investigating the Influence of Multi-Demographic Personas on a Model's Empathy EMNLP2025

Link: https://arxiv.org/abs/2510.10328
Authors: Ananya Malik, Nazanin Sabri, Melissa Karnaze, Mai Elsherief
Affiliations: Northeastern University; University of California, San Diego
Subjects: Computation and Language (cs.CL)
Comments: 9 pages, 4 figures, 4 tables, EMNLP 2025 Findings

[NLP-128] Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting

Link: https://arxiv.org/abs/2510.10304
Authors: Michael Y. Hu, Benjamin Van Durme, Jacob Andreas, Harsh Jhamtani
Affiliations: New York University; Microsoft
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments:

[NLP-129] MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning

Link: https://arxiv.org/abs/2510.10293
Authors: Hongwei Chen, Yishu Lei, Dan Zhang, Bo Ke, Danxiang Zhu, Xuyi Chen, Yuxiang Lu, Zhengjie Huang, Shikun Feng, Jingzhou He, Yu Sun, Hua Wu, Haifeng Wang
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-130] ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test

Link: https://arxiv.org/abs/2510.10281
Authors: Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wanga, Kuo-Hui Yeh
Affiliations: National Taiwan University; National Yang Ming Chiao Tung University
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: 30 pages, 22 figures. This preprint has been accepted for publication in Elsevier JOURNAL OF NETWORK AND COMPUTER APPLICATIONS (JNCA)

[NLP-131] On the Entity-Level Alignment in Crosslingual Consistency

Link: https://arxiv.org/abs/2510.10280
Authors: Yihong Liu, Mingyang Wang, François Yvon, Hinrich Schütze
Affiliations: Center for Information and Language Processing, LMU Munich; Munich Center for Machine Learning (MCML); Sorbonne Université, CNRS, ISIR, France
Subjects: Computation and Language (cs.CL)
Comments: preprint

[NLP-132] Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models

Link: https://arxiv.org/abs/2510.10265
Authors: Liang Lin, Miao Yu, Moayad Aloqaily, Zhenhong Zhou, Kun Wang, Linsey Pang, Prakhar Mehrotra, Qingsong Wen
Affiliations: NTU (Nanyang Technological University); USTC (University of Science and Technology of China); UAEU (United Arab Emirates University); PayPal Inc; Walmart Labs; Squirrel Ai Learning
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-133] Audit-of-Understanding: Posterior-Constrained Inference for Mathematical Reasoning in Language Models

Link: https://arxiv.org/abs/2510.10252
Authors: Samir Abdaljalil, Erchin Serpedin, Khalid Qaraqe, Hasan Kurban
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-134] ImCoref-CeS: An Improved Lightweight Pipeline for Coreference Resolution with LLM-based Checker-Splitter Refinement

Link: https://arxiv.org/abs/2510.10241
Authors: Kangyang Luo, Yuzhuo Bai, Shuzheng Si, Cheng Gao, Zhitong Wang, Yingli Shen, Wenhao Li, Zhu Liu, Yufeng Han, Jiayi Wu, Cunliang Kong, Maosong Sun
Affiliations: Tsinghua University; East China Normal University; Jiangsu Collaborative Innovation Center for Language Ability
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Comments:

[NLP-135] Text2Token: Unsupervised Text Representation Learning with Token Target Prediction

Link: https://arxiv.org/abs/2510.10224
Authors: Ruize An, Richong Zhang, Zhijie Nie, Zhanyu Wu, Yanzhao Zhang, Dingkun Long
Affiliations: Beihang University; Zhongguancun Laboratory; Shen Yuan Honors College
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Comments:

[NLP-136] You only need 4 extra tokens: Synergistic Test-time Adaptation for LLMs

Link: https://arxiv.org/abs/2510.10223
Authors: Yijie Xu, Huizai Yao, Zhiyu Guo, Weiyu Guo, Pengteng Li, Aiwei Liu, Xuming Hu, Hui Xiong
Affiliations: The Hong Kong University of Science and Technology (Guangzhou); The Hong Kong University of Science and Technology; Tsinghua University
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: Under Review

[NLP-137] Weed Out Then Harvest: Dual Low-Rank Adaptation is an Effective Noisy Label Detector for Noise-Robust Learning ACL2025

Link: https://arxiv.org/abs/2510.10208
Authors: Bo Yuan, Yulin Chen, Yin Zhang
Affiliations: Zhejiang University
Subjects: Computation and Language (cs.CL)
Comments: ACL 2025

[NLP-138] RLFR: Extending Reinforcement Learning for LLMs with Flow Environment

Link: https://arxiv.org/abs/2510.10201
Authors: Jinghao Zhang, Naishan Zheng, Ruilin Li, Dongzhou Cheng, Zheming Liang, Feng Zhao, Jiaqi Wang
Affiliations: University of Science and Technology of China; Shanghai Innovation Institute; ByteDance; Wuhan University; Southeast University
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: Project Website: this https URL

[NLP-139] MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems

Link: https://arxiv.org/abs/2510.10185
Authors: Lei Gu, Yinghao Zhu, Haoran Sang, Zixiang Wang, Dehao Sui, Wen Tang, Ewen Harrison, Junyi Gao, Lequan Yu, Liantao Ma
Affiliations: Peking University; The University of Hong Kong; The University of Edinburgh; Peking University Third Hospital; Health Data Research UK
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Comments: Code: this https URL

[NLP-140] A Survey of Inductive Reasoning for Large Language Models

Link: https://arxiv.org/abs/2510.10182
Authors: Kedi Chen, Dezhao Ruan, Yuhao Dan, Yaoting Wang, Siyu Yan, Xuecheng Wu, Yinqi Zhang, Qin Chen, Jie Zhou, Liang He, Biqing Qi, Linyang Li, Qipeng Guo, Xiaoming Shi, Wei Zhang
Affiliations: East China Normal University; Shanghai Innovation Institute; Fudan University; Xi’an Jiaotong University; Shanghai AI Laboratory
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-141] Large Language Model Sourcing: A Survey

Link: https://arxiv.org/abs/2510.10161
Authors: Liang Pang, Kangxi Wu, Sunhao Dai, Zihao Wei, Zenghao Duan, Jia Gu, Xiang Li, Zhiyi Yin, Jun Xu, Huawei Shen, Xueqi Cheng
Affiliations: State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Gaoling School of Artificial Intelligence, Renmin University of China
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 31 pages

[NLP-142] BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

Link: https://arxiv.org/abs/2510.10159
Authors: Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, Diana Galvan-Sosa, Faiz Ghifari Haznitrama, Francesca Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, Laurent Prévot, Linyang He, María Grandury, Mila Marcheva, Negar Foroutan, Nikitas Theodoropoulos, Pouya Sadeghi, Siyuan Song, Suchir Salhan, Susana Zhou, Yurii Paniv, Ziyin Zhang, Arianna Bisazza, Alex Warstadt, Leshem Choshen
Affiliations: University of Groningen; Aix Marseille University; Nara Institute of Science and Technology; Bielefeld University; University of Colorado Boulder; University of Cambridge; KAIST; University of Cape Town; City University of Hong Kong; HiTZ, University of the Basque Country; Columbia University; SomosNLP; EPFL; Independent Researcher; University of Tehran; University of Texas at Austin; Ukrainian Catholic University; Shanghai Jiao Tong University; University of California San Diego; MIT, MIT-IBM Watson AI Lab
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-143] BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation

Link: https://arxiv.org/abs/2510.10157
Authors: Tsung-Min Pai, Jui-I Wang, Li-Chun Lu, Shao-Hua Sun, Hung-Yi Lee, Kai-Wei Chang
Affiliations: National Taiwan University; Massachusetts Institute of Technology
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-144] DiffHeads: Differential Analysis and Inference-Time Masking of Bias Heads in Large Language Models

Link: https://arxiv.org/abs/2510.10142
Authors: Tingxu Han, Wei Song, Ziqi Ding, Ziming Li, Chunrong Fang, Yuekang Li, Dongfang Liu, Zhenyu Chen, Zhenting Wang
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-145] Hybrid OCR-LLM Framework for Enterprise-Scale Document Information Extraction Under Copy-heavy Task

Link: https://arxiv.org/abs/2510.10138
Authors: Zilong Wang, Xiaoyu Shen
Affiliations: Ningbo Institute of Digital Twin; Eastern Institute of Technology
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-146] LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora

Link: https://arxiv.org/abs/2510.10114
Authors: Luyao Zhuang, Shengyuan Chen, Yilin Xiao, Huachi Zhou, Yujing Zhang, Hao Chen, Qinggang Zhang, Xiao Huang
Affiliations: Hong Kong Polytechnic University
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-147] Stop When Enough: Adaptive Early-Stopping for Chain-of-Thought Reasoning

Link: https://arxiv.org/abs/2510.10103
Authors: Renliang Sun, Wei Cheng, Dawei Li, Haifeng Chen, Wei Wang
Affiliations: UCLA (University of California, Los Angeles); NEC Labs America; Arizona State University
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-148] CardRewriter: Leveraging Knowledge Cards for Long-Tail Query Rewriting on Short-Video Platforms

Link: https://arxiv.org/abs/2510.10095
Authors: Peiyuan Gong, Feiran Zhu, Yaqi Yin, Chenglei Dai, Chao Zhang, Kai Zheng, Wentian Bao, Jiaxin Mao, Yi Zhang
Affiliations: GSAI, Renmin University of China (Gaoling School of Artificial Intelligence); Kuaishou Technology
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Comments:

[NLP-149] Diversity Augmentation of Dynamic User Preference Data for Boosting Personalized Text Summarizers

Link: https://arxiv.org/abs/2510.10082
Authors: Parthiv Chatterjee, Shivam Sonawane, Amey Hengle, Aditya Tanna, Sourish Dasgupta, Tanmoy Chakraborty
Affiliations: KDM Lab, Dhirubhai Ambani University, India; LCS2 Lab, Indian Institute of Technology Delhi, India
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments:

[NLP-150] A-IPO: Adaptive Intent-driven Preference Optimization

Link: https://arxiv.org/abs/2510.10077
Authors: Wenqing Wang (1), Muhammad Asif Ali (2), Ali Shoker (2), Ruohan Yang (1), Junyang Chen (3), Ying Sha (1), Huan Wang (1) ((1) Huazhong Agricultural University, China, (2) King Abdullah University of Science and Technology, KSA, (3) Shenzhen University, China)
Affiliations: Huazhong Agricultural University; King Abdullah University of Science and Technology; Shenzhen University
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-151] Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference

Link: https://arxiv.org/abs/2510.10072
Authors: Hua Cai, Shuang Zhao, Liang Zhang, Xuli Shen, Qing Xu, Weilin Shen, Zihao Wen, Tianke Ban
Affiliations: UniDT; Fudan University
Categories: Computation and Language (cs.CL)
Comments:

[NLP-152] CLMN: Concept based Language Models via Neural Symbolic Reasoning

Link: https://arxiv.org/abs/2510.10063
Authors: Yibo Yang
Affiliations: The Hong Kong University of Science and Technology
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 7 pages, 2 figures

[NLP-153] HUME: Measuring the Human-Model Performance Gap in Text Embedding Task ICLR2026

Link: https://arxiv.org/abs/2510.10062
Authors: Adnan El Assadi, Isaac Chung, Roman Solomatin, Niklas Muennighoff, Kenneth Enevoldsen
Affiliations: Carleton University; Zendesk; SberAI; Stanford University; Aarhus University
Categories: Computation and Language (cs.CL)
Comments: Submitted to ICLR 2026

[NLP-154] Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling

Link: https://arxiv.org/abs/2510.10060
Authors: Hehe Fan, Yi Yang, Mohan Kankanhalli, Fei Wu
Affiliations: Zhejiang University; National University of Singapore
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Comments: technical report

[NLP-155] Lightweight Baselines for Medical Abstract Classification: DistilBERT with Cross-Entropy as a Strong Default ALT

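[Quick read]: This paper addresses the difficulty of deploying large language models (LLMs) in healthcare settings, asking how to classify medical abstracts efficiently and accurately under strict cost, latency, and privacy constraints. The key to its solution is a compact-encoder strategy that systematically evaluates the effect of different training objectives under a fixed budget: it compares BERT base and DistilBERT trained with standard cross-entropy loss, class-weighted cross-entropy loss, and focal loss. The results show that DistilBERT with plain cross-entropy achieves the best balance on the test set while using far fewer parameters than BERT base, supporting a practical default of a compact encoder plus standard cross-entropy and offering a deployable lightweight modeling path for medical NLP tasks.

Link: https://arxiv.org/abs/2510.10025
Authors: Jiaqi Liu, Lanruo Wang, Su Liu, Xin Hu
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Healthcare AI, Medical Text Classification, Lightweight LLMs, DistilBERT, Reproducibility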

Abstract:Large language models work well for many NLP tasks, but they are hard to deploy in health settings with strict cost, latency, and privacy limits. We revisit a lightweight recipe for medical abstract classification and ask how far compact encoders can go under a controlled budget. Using the public medical abstracts corpus, we finetune BERT base and DistilBERT with three objectives (standard cross-entropy, class-weighted cross-entropy, and focal loss), keeping tokenizer, sequence length, optimizer, and schedule fixed. DistilBERT with plain cross-entropy gives the best balance on the test set while using far fewer parameters than BERT base. We report accuracy, Macro F1, and Weighted F1, release the evaluation code, and include confusion analyses to make error patterns clear. Our results suggest a practical default: start with a compact encoder and cross-entropy, then add calibration and task-specific checks before moving to heavier models.
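
The three training objectives compared in the abstract can be written down in a few lines. The following is a minimal pure-Python sketch of their standard textbook forms for a single example, not the authors' released code; the logits, the class weights, and the focal-loss focusing parameter `gamma=2.0` are illustrative assumptions, since the abstract does not state the values used.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for a single example."""
    return -math.log(softmax(logits)[target])


def weighted_cross_entropy(logits, target, class_weights):
    """Cross-entropy scaled by a per-class weight (weights here are illustrative)."""
    return class_weights[target] * cross_entropy(logits, target)


def focal_loss(logits, target, gamma=2.0):
    """Focal loss: down-weights examples the model already classifies confidently."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical 3-class example; the logits and weights are made up.
logits = [2.0, 0.5, -1.0]
target = 0
print("cross-entropy:", cross_entropy(logits, target))
print("class-weighted:", weighted_cross_entropy(logits, target, [0.5, 1.0, 2.0]))
print("focal:", focal_loss(logits, target))
```

Setting `gamma` to 0 reduces the focal loss to plain cross-entropy, which makes it easy to see the three objectives as variations of one loss.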

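[NLP-156] Path Drift in Large Reasoning Models: How First-Person Commitments Override Safety

[Quick read]: This paper addresses "Path Drift" in Long Chain-of-Thought (Long-CoT) reasoning, in which a model's reasoning departs from aligned paths during complex reasoning and produces output that violates safety constraints. Its key contribution is a three-stage Path Drift Induction Framework whose three mechanisms (cognitive load amplification, self-role priming, and condition chain hijacking) each independently reduce refusal rates. This is combined with a path-level defense strategy, comprising role attribution correction and metacognitive reflection (reflective safety cues), that enables safety control at the level of the reasoning trajectory rather than only token-level alignment.

Link: https://arxiv.org/abs/2510.10013
Authors: Yuyi Huang, Runzhe Zhan, Lidia S. Chao, Ailin Tao, Derek F. Wong
Affiliations: The Second Affiliated Hospital, Guangdong Provincial Key Laboratory of Allergy and Clinical Immunology, Guangzhou Medical University; NLP2CT Lab, Department of Computer and Information Science, University of Macau
Categories: Computation and Language (cs.CL)
Comments: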

Abstract:As large language models (LLMs) are increasingly deployed for complex reasoning tasks, Long Chain-of-Thought (Long-CoT) prompting has emerged as a key paradigm for structured inference. Despite early-stage safeguards enabled by alignment techniques such as RLHF, we identify a previously underexplored vulnerability: reasoning trajectories in Long-CoT models can drift from aligned paths, resulting in content that violates safety constraints. We term this phenomenon Path Drift. Through empirical analysis, we uncover three behavioral triggers of Path Drift: (1) first-person commitments that induce goal-driven reasoning that delays refusal signals; (2) ethical evaporation, where surface-level disclaimers bypass alignment checkpoints; (3) condition chain escalation, where layered cues progressively steer models toward unsafe completions. Building on these insights, we introduce a three-stage Path Drift Induction Framework comprising cognitive load amplification, self-role priming, and condition chain hijacking. Each stage independently reduces refusal rates, while their combination further compounds the effect. To mitigate these risks, we propose a path-level defense strategy incorporating role attribution correction and metacognitive reflection (reflective safety cues). Our findings highlight the need for trajectory-level alignment oversight in long-form reasoning beyond token-level alignment.

[NLP-157] Beyond the limitation of a single query: Train your LLM for query expansion with Reinforcement Learning

Link: https://arxiv.org/abs/2510.10009
Authors: Shu Zhao, Tan Yu, Anbang Xu
Affiliations: NVIDIA; Pennsylvania State University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Comments:

[NLP-158] MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-token Prediction

Link: https://arxiv.org/abs/2510.10003
Authors: Jianjin Wang, Runsong Zhao, Xiaoqian Liu, Yuan Ge, Ziqiang Xu, Tong Xiao, Shengxiang Gao, Zhengtao Yu, Jingbo Zhu
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Comments: Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

[NLP-159] Toward Machine Translation Literacy: How Lay Users Perceive and Rely on Imperfect Translations EMNLP2025

Link: https://arxiv.org/abs/2510.09994
Authors: Yimin Xiao, Yongle Zhang, Dayeon Ki, Calvin Bao, Marianna J. Martindale, Charlotte Vaughn, Ge Gao, Marine Carpuat
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: EMNLP 2025

[NLP-160] Unifying Tree Search Algorithm and Reward Design for LLM Reasoning: A Survey

Link: https://arxiv.org/abs/2510.09988
Authors: Jiaqi Wei, Xiang Zhang, Yuejin Yang, Wenxuan Huang, Juntai Cao, Sheng Xu, Xiang Zhuang, Zhangyang Gao, Muhammad Abdul-Mageed, Laks V.S. Lakshmanan, Chenyu You, Wanli Ouyang, Siqi Sun
Affiliations: Zhejiang University; University of British Columbia; Fudan University; Stony Brook University; The Chinese University of Hong Kong
Categories: Computation and Language (cs.CL)
Comments:

[NLP-161] Operationalizing AI: Empirical Evidence on MLOps Practices, User Satisfaction, and Organizational Context

Link: https://arxiv.org/abs/2510.09968
Authors: Stefan Pasch
Affiliations: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Comments:

[NLP-162] Beyond Fertility: Analyzing STRR as a Metric for Multilingual Tokenization Evaluation NEURIPS2025

Link: https://arxiv.org/abs/2510.09947
Authors: Mir Tafseer Nayeem, Sawsan Alqahtani, Md Tahmid Rahman Laskar, Tasnim Mohiuddin, M Saiful Bari
Affiliations: University of Alberta; Princess Nourah Bint Abdulrahman University; Dialpad; Qatar Computing Research Institute; Amazon
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: NeurIPS 2025 Workshop

[NLP-163] Unpacking Hateful Memes: Presupposed Context and False Claims

Link: https://arxiv.org/abs/2510.09935
Authors: Weibin Cai, Jiayu Li, Reza Zafarani
Affiliations: Syracuse University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-164] Enhancing Faithfulness in Abstractive Summarization via Span-Level Fine-Tuning

Link: https://arxiv.org/abs/2510.09915
Authors: Sicong Huang, Qianqi Yan, Shengze Wang, Ian Lane
Affiliations: University of California, Santa Cruz
Categories: Computation and Language (cs.CL)
Comments:

[NLP-165] Don't Throw Away Your Pretrained Model

Link: https://arxiv.org/abs/2510.09913
Authors: Shangbin Feng, Wenhao Yu, Yike Wang, Hongming Zhang, Yulia Tsvetkov, Dong Yu
Affiliations: University of Washington; Tencent AI Seattle Lab
Categories: Computation and Language (cs.CL)
Comments:

[NLP-166] The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs

Link: https://arxiv.org/abs/2510.09905
Authors: Xi Fang, Weijie Xu, Yuchong Zhang, Stephanie Eckman, Scott Nickleach, Chandan K. Reddy
Affiliations: Amazon
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: 12 pages, 5 figures

[NLP-167] HIPPD: Brain-Inspired Hierarchical Information Processing for Personality Detection

Link: https://arxiv.org/abs/2510.09893
Authors: Guanming Chen, Lingzhi Shen, Xiaohao Cai, Imran Razzak, Shoaib Jameel
Affiliations: University of Southampton; Mohamed bin Zayed University of Artificial Intelligence
Categories: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments:

[NLP-168] Abductive Preference Learning

Link: https://arxiv.org/abs/2510.09887
Authors: Yijin Ni, Peng Qi
Affiliations: Georgia Institute of Technology; Uniphore
Categories: Computation and Language (cs.CL)
Comments:

[NLP-169] Closing the Data-Efficiency Gap Between Autoregressive and Masked Diffusion LLMs

Link: https://arxiv.org/abs/2510.09885
Authors: Xu Pan, Ely Hahami, Jingxuan Fan, Ziqian Xie, Haim Sompolinsky
Affiliations: Harvard University; Hebrew University; University of Texas Health Science Center at Houston
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-170] DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning

Link: https://arxiv.org/abs/2510.09883
Authors: Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang, Murali Annavarm
Affiliations: University of Southern California
Categories: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments:

[NLP-171] BERT: Interpretable Style Embeddings via Sense Decomposition

Link: https://arxiv.org/abs/2510.09882
Authors: Vishal Anand, Milad Alshomary, Kathleen McKeown
Affiliations: Microsoft; Columbia University
Categories: Computation and Language (cs.CL)
Comments:

[NLP-172] CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs EMNLP2025

Link: https://arxiv.org/abs/2510.09871
Authors: Nafiseh Nikeghbal, Amir Hossein Kargaran, Jana Diesner
Affiliations: Technical University of Munich; LMU Munich & Munich Center for Machine Learning
Categories: Computation and Language (cs.CL)
Comments: EMNLP 2025 (Oral)

[NLP-173] NarraBench: A Comprehensive Framework for Narrative Benchmarking

Link: https://arxiv.org/abs/2510.09869
Authors: Sil Hamilton, Matthew Wilkens, Andrew Piper
Affiliations: Cornell University; McGill University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-174] NG-Router: Graph-Supervised Multi-Agent Collaboration for Nutrition Question Answering

Link: https://arxiv.org/abs/2510.09854
Authors: Kaiwen Shi, Zheyuan Zhang, Zhengqing Yuan, Keerthiram Murugesan, Vincent Galass, Chuxu Zhang, Yanfang Ye
Affiliations: University of Notre Dame; University of Connecticut; IBM Research
Categories: Computation and Language (cs.CL)
Comments:

[NLP-175] Text Prompt Injection of Vision Language Models

Link: https://arxiv.org/abs/2510.09849
Authors: Ruizhe Zhu
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[NLP-176] Task-Aware Resolution Optimization for Visual Large Language Models EMNLP2025

Link: https://arxiv.org/abs/2510.09822
Authors: Weiqing Luo, Zhen Tan, Yifan Li, Xinyu Zhao, Kwonjoon Lee, Behzad Dariush, Tianlong Chen
Affiliations: University of North Carolina at Chapel Hill; Arizona State University; Michigan State University; Honda Research Institute USA
Categories: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Comments: Accepted as a main conference paper at EMNLP 2025. 9 pages (main content), 7 figures

[NLP-177] Steering Embedding Models with Geometric Rotation: Mapping Semantic Relationships Across Languages and Models

Link: https://arxiv.org/abs/2510.09790
Authors: Michael Freenor, Lauren Alvarez
Affiliations: Fuel iX; TELUS Digital
Categories: Computation and Language (cs.CL)
Comments: 9 pages, 3 figures, 1 table, preprint

[NLP-178] The Geometry of Reasoning: Flowing Logics in Representation Space

Link: https://arxiv.org/abs/2510.09782
Authors: Yufa Zhou, Yixiao Wang, Xunjian Yin, Shuyan Zhou, Anru R. Zhang
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Comments: Code: this https URL

[NLP-179] Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

Link: https://arxiv.org/abs/2510.09781
Authors: Yue Huang, Hang Hua, Yujun Zhou, Pengcheng Jing, Manish Nagireddy, Inkit Padhi, Greta Dolcetti, Zhangchen Xu, Subhajit Chaudhury, Ambrish Rawat, Liubov Nedoshivina, Pin-Yu Chen, Prasanna Sattigeri, Xiangliang Zhang
Affiliations: University of Notre Dame; MIT-IBM Watson AI Lab; University of Washington; Ca’ Foscari University of Venice; IBM Research
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments:

[NLP-180] PromptGuard at BLP-2025 Task 1: A Few-Shot Classification Framework Using Majority Voting and Keyword Similarity for Bengali Hate Speech Detection

Link: https://arxiv.org/abs/2510.09771
Authors: Rakib Hossan, Shubhashis Roy Dipta
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-181] Gold Panning: Turning Positional Bias into Signal for Multi-Document LLM Reasoning

Link: https://arxiv.org/abs/2510.09770
Authors: Adam Byerly, Daniel Khashabi
Affiliations: Johns Hopkins University
Categories: Computation and Language (cs.CL)
Comments: 20 pages, 6 figures

[NLP-182] Machine learning methods fail to provide cohesive atheoretical construction of personality traits from semantic embeddings

Link: https://arxiv.org/abs/2510.09739
Authors: Ayoub Bouguettaya, Elizabeth M. Stuart
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Comments: 1 figure, 12 pages

[NLP-183] Judge's Verdict: A Comprehensive Analysis of LLM Judge Capability Through Human Agreement ICLR2026

Link: https://arxiv.org/abs/2510.09738
Authors: Steve Han, Gilberto Titericz Junior, Tom Balough, Wenfei Zhou
Affiliations: NVIDIA Corporation
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 10 pages, 1 figure, 4 tables, under review as a conference paper at ICLR 2026

[NLP-184] VisRAG 2.0: Evidence-Guided Multi-Image Reasoning in Visual Retrieval-Augmented Generation

Link: https://arxiv.org/abs/2510.09733
Authors: Yubo Sun, Chunyi Peng, Yukun Yan, Shi Yu, Zhenghao Liu, Chi Chen, Zhiyuan Liu, Maosong Sun
Affiliations: Peking University; Northeastern University; Tsinghua University
Categories: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[NLP-185] It's 2025 – Narrative Learning is the new baseline to beat for explainable machine learning

Link: https://arxiv.org/abs/2510.09723
Authors: Gregory D. Baker
Affiliations: Australian National University
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: 18 pages, 5 figures

[NLP-186] Layout-Aware Parsing Meets Efficient LLMs: A Unified Scalable Framework for Resume Information Extraction and Evaluation

Link: https://arxiv.org/abs/2510.09722
Authors: Fanwei Zhu, Jinke Yu, Zulong Chen, Ying Zhou, Junhao Ji, Zhibo Yang, Yuxue Zhang, Haoyuan Hu, Zhenghao Liu
Affiliations: Hangzhou City University; Alibaba Group; Zhejiang Lab; Alibaba Cloud; Ant Group; Northeastern University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[NLP-187] A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System

Link: https://arxiv.org/abs/2510.09721
Authors: Jiale Guo, Suizhi Huang, Mei Li, Dong Huang, Xingsheng Chen, Regina Zhang, Zhijiang Guo, Han Yu, Siu-Ming Yiu, Christian Jensen, Pietro Lio, Kwok-Yan Lam
Affiliations: Nanyang Technological University; The Hong Kong University of Science and Technology; The University of Hong Kong; Aalborg University; The University of Cambridge
Categories: Software Engineering (cs.SE); Computation and Language (cs.CL)
Comments: 21 pages

[NLP-188] Preference-Aware Memory Update for Long-Term LLM Agents

Link: https://arxiv.org/abs/2510.09720
Authors: Haoran Sun, Zekun Zhang, Shaoning Zeng
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-189] All Code, No Thought: Current Language Models Struggle to Reason in Ciphered Language

Link: https://arxiv.org/abs/2510.09714
Authors: Shiyuan Guo, Henry Sleight, Fabien Roger
Affiliations: Anthropic Fellows Program; Constellation; Anthropic
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[NLP-190] Group-Adaptive Adversarial Learning for Robust Fake News Detection Against Malicious Comments

Link: https://arxiv.org/abs/2510.09712
Authors: Zhao Tong, Chunlin Gong, Yimeng Gu, Haichao Shi, Qiang Liu, Shu Wu, Xiao-Yu Zhang
Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; University of Minnesota Twin Cities; Queen Mary University of London; Institute of Automation, Chinese Academy of Sciences
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: 10 pages, 12 figures

[NLP-191] ReaLM: Residual Quantization Bridging Knowledge Graph Embeddings and Large Language Models

Link: https://arxiv.org/abs/2510.09711
Authors: Wenbin Guo, Xin Wang, Jiaoyan Chen, Lingbing Guo, Zhao Li, Zirui Chen
Affiliations: Tianjin University; The University of Manchester
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-192] SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG NEURIPS2025

Link: https://arxiv.org/abs/2510.09710
Authors: Xiaonan Si, Meilin Zhu, Simeng Qin, Lijia Yu, Lijun Zhang, Shuaitong Liu, Xinfeng Li, Ranjie Duan, Yang Liu, Xiaojun Jia
Affiliations: Institute of Software, Chinese Academy of Sciences; Northeast University; Institute of Ai For industries, Chinese Academy of Sciences; Southwest University; Nanyang Technological University; Alibaba
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Accepted at NeurIPS 2025

[NLP-193] The Idola Tribus of AI: Large Language Models tend to perceive order where none exists EMNLP2025

Link: https://arxiv.org/abs/2510.09709
Authors: Shin-nosuke Ishikawa, Masato Todo, Taiki Ogihara, Hirotsugu Ohba
Affiliations: Rikkyo University; Mamezou Co., Ltd.
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 14 pages, 3 figures, accepted to Findings of EMNLP 2025

[NLP-194] Emotionally Charged, Logically Blurred: AI-driven Emotional Framing Impairs Human Fallacy Detection

Link: https://arxiv.org/abs/2510.09695
Authors: Yanran Chen, Lynn Greschner, Roman Klinger, Michael Klenk, Steffen Eger
Affiliations: NLLG, University of Technology Nuremberg (UTN), Germany; Fundamentals of Natural Language Processing, University of Bamberg, Germany; Ethics and Philosophy of Technology, Delft University of Technology, Netherlands
Categories: Computation and Language (cs.CL)
Comments: Initial submission

[NLP-195] Stop DDoS Attacking the Research Community with AI-Generated Survey Papers NEURIPS2025

Link: https://arxiv.org/abs/2510.09686
Authors: Jianghao Lin, Rong Shan, Jiachen Zhu, Yunjia Xi, Yong Yu, Weinan Zhang
Affiliations: Antai College of Economics and Management, Shanghai Jiao Tong University, China; School of Computer Science, Shanghai Jiao Tong University, China
Categories: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Comments: Accepted by NeurIPS 2025 (Position Track)

[NLP-196] Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation

Link: https://arxiv.org/abs/2510.09671
Authors: Wei Zhou, Bolei Ma, Annemarie Friedrich, Mohsen Mesgar
Affiliations: Bosch Center for Artificial Intelligence; LMU Munich & Munich Center for Machine Learning; University of Augsburg
Categories: Computation and Language (cs.CL)
Comments:

[NLP-197] Mission Impossible: Feedback-Guided Dynamic Interactive Planning for Improving Reasoning on LLMs

Link: https://arxiv.org/abs/2510.05577
Authors: Dong Yan, Gaochen Wu, Bowen Zhou
Affiliations: Central South University; Tsinghua University; Shanghai AI Laboratory
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-198] Detecting Conspiracy Theory Against COVID-19 Vaccines

Link: https://arxiv.org/abs/2211.13003
Authors: Md Hasibul Amin (1), Harika Madanu (1), Sahithi Lavu (1), Hadi Mansourifar (1), Dana Alsagheer (1), Weidong Shi (1) ((1) University Of Houston)
Affiliations: Unknown
Categories: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Comments: 6 pages, 5 figures

Computer Vision

[CV-0] CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images

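[Quick read]: This paper addresses a bottleneck that current Large Language Models (LLMs) and Vision Language Models (VLMs) face on mathematical problems that require visual assistance, such as drawing auxiliary lines or plotting functions. Existing models are limited to text-only reasoning chains, while multimodal unified models that generate interleaved text and images lack the necessary precision and controllability. The key to the solution is a code-driven chain-of-thought paradigm (CodePlot-CoT), in which a VLM generates both textual reasoning and executable plotting code, and the code is rendered into "visual thought" images used to solve the problem. The method rests on three components: Math-VR, the first large-scale bilingual dataset for mathematical visual reasoning (178K samples); a state-of-the-art image-to-code converter specialized for parsing complex mathematical figures; and the CodePlot-CoT model trained on the resulting high-quality data. Experiments show gains of up to 21% over the base model on the new benchmark, supporting the effectiveness of the approach.

Link: https://arxiv.org/abs/2510.11718
Authors: Chengqi Duan, Kaiyue Sun, Rongyao Fang, Manyuan Zhang, Yan Feng, Ying Luo, Yufang Liu, Ke Wang, Peng Pei, Xunliang Cai, Hongsheng Li, Yi Ma, Xihui Liu
Affiliations: HKU; Meituan; CUHK
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: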

Abstract:Recent advances in Large Language Models (LLMs) and Vision Language Models (VLMs) have shown significant progress in mathematical reasoning, yet they still face a critical bottleneck with problems requiring visual assistance, such as drawing auxiliary lines or plotting functions to solve the problems. Most LLMs and VLMs are constrained to text-only reasoning chains, while multimodal unified models that can generate interleaved text and images lack the necessary precision and controllability for such tasks. To address this, we propose CodePlot-CoT, a code-driven Chain-of-Thought paradigm for “thinking with images” in mathematics. Our approach leverages the VLM to generate text reasoning as well as executable plotting code, which is then rendered into images as “visual thought”, to solve mathematical problems. To achieve this, we first construct Math-VR, the first large-scale, bilingual dataset and benchmark for Mathematics problems with Visual Reasoning, comprising 178K samples. Second, to create high-quality training data, we develop a state-of-the-art image-to-code converter specialized for parsing complex mathematical figures into codes. Finally, using these training data, we train the CodePlot-CoT model for solving mathematical problems. Experimental results show that our model achieves up to 21% increase over base model on our new benchmark, fully validating the efficacy of our proposed code-driven reasoning paradigm. Our work opens a new direction for multimodal mathematical reasoning and provides the community with the first large-scale dataset, comprehensive benchmark, and strong approach for such problems. To facilitate future research, we make our datasets, code, and pretrained models publicly available at this https URL.

[CV-1] Ev4DGS: Novel-view Rendering of Non-Rigid Objects from Monocular Event Streams

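[Quick read]: This paper addresses the challenging problem of novel view rendering of non-rigidly deforming objects from monocular event streams alone. Existing methods for non-rigid scenes typically rely on sparse RGB inputs; this paper proposes Ev4DGS, the first method that reconstructs the 3D dynamic structure of non-rigid objects from a single event stream and supports novel view generation in the explicit observation space (i.e., as RGB or greyscale images). The key to the solution is twofold: 1) a loss that relates the model's outputs to the 2D event observation space, so that event data directly drives the reconstruction; and 2) a coarse 3D deformation model trained from binary masks generated from events, which provides an initial deformation field for the subsequent refined deformable 3D Gaussian Splatting representation, yielding a complete reconstruction pipeline without auxiliary RGB input.

Link: https://arxiv.org/abs/2510.11717
Authors: Takuya Nakabayashi, Navami Kairanda, Hideo Saito, Vladislav Golyanik
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: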

Abstract:Event cameras offer various advantages for novel view rendering compared to synchronously operating RGB cameras, and efficient event-based techniques supporting rigid scenes have been recently demonstrated in the literature. In the case of non-rigid objects, however, existing approaches additionally require sparse RGB inputs, which can be a substantial practical limitation; it remains unknown if similar models could be learned from event streams only. This paper sheds light on this challenging open question and introduces Ev4DGS, i.e., the first approach for novel view rendering of non-rigidly deforming objects in the explicit observation space (i.e., as RGB or greyscale images) from monocular event streams. Our method regresses a deformable 3D Gaussian Splatting representation through 1) a loss relating the outputs of the estimated model with the 2D event observation space, and 2) a coarse 3D deformation model trained from binary masks generated from events. We perform experimental comparisons on existing synthetic and newly recorded real datasets with non-rigid objects. The results demonstrate the validity of Ev4DGS and its superior performance compared to multiple naive baselines that can be applied in our setting. We will release our models and the datasets used in the evaluation for research purposes; see the project webpage: this https URL.

[CV-2] Point Prompting: Counterfactual Tracking with Video Diffusion Models

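[Quick read]: This paper addresses zero-shot point tracking in video, i.e., accurately tracking the trajectory of a specified point without task-specific training. The key to the solution is to exploit the generative ability of pretrained video diffusion models: a distinctively colored marker is placed at the query point and the video is regenerated from an intermediate noise level, so that the marker propagates across frames and traces the point's trajectory. To keep the marker visible during generation (such markers are extremely rare in real videos), the authors use the unedited initial frame as a negative prompt, which suppresses the model's reliance on the original scene and makes the trajectory more robust and accurate. Experiments show that the method performs well across multiple image-conditioned video diffusion models, is even competitive with specially trained self-supervised tracking models, and handles occlusions effectively.

Link: https://arxiv.org/abs/2510.11715
Authors: Ayush Shrivastava, Sanyam Mehta, Daniel Geng, Andrew Owens
Affiliations: University of Michigan; Cornell University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Project link: this https URL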

Abstract:Trackers and video generators solve closely related problems: the former analyze motion, while the latter synthesize it. We show that this connection enables pretrained video diffusion models to perform zero-shot point tracking by simply prompting them to visually mark points as they move over time. We place a distinctively colored marker at the query point, then regenerate the rest of the video from an intermediate noise level. This propagates the marker across frames, tracing the point’s trajectory. To ensure that the marker remains visible in this counterfactual generation, despite such markers being unlikely in natural videos, we use the unedited initial frame as a negative prompt. Through experiments with multiple image-conditioned video diffusion models, we find that these “emergent” tracks outperform those of prior zero-shot methods and persist through occlusions, often obtaining performance that is competitive with specialized self-supervised models.
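
The abstract does not say how the marker is read back out of the regenerated frames. As a rough sketch of one possible readout step, assuming frames are nested lists of RGB tuples and the marker is found with a simple per-channel color tolerance, the trajectory could be recovered as follows. The function names, the tolerance `tol`, and the synthetic frames are all hypothetical, and the video diffusion model itself is not part of the sketch.

```python
def locate_marker(frame, marker_rgb, tol=40):
    """Return the (row, col) centroid of pixels close to marker_rgb, or None."""
    row_sum = col_sum = count = 0
    for r, row in enumerate(frame):
        for c, pixel in enumerate(row):
            # A pixel matches when every channel is within tol of the marker color.
            if all(abs(p - m) <= tol for p, m in zip(pixel, marker_rgb)):
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        return None  # marker not found in this frame
    return (row_sum / count, col_sum / count)


def track(frames, marker_rgb, tol=40):
    """Locate the marker independently in every frame."""
    return [locate_marker(frame, marker_rgb, tol) for frame in frames]


# Synthetic stand-in for a regenerated video: a red marker on a gray background.
MARKER = (255, 0, 0)
GRAY = (128, 128, 128)


def synthetic_frame(marker_pos, size=5):
    frame = [[GRAY for _ in range(size)] for _ in range(size)]
    if marker_pos is not None:
        r, c = marker_pos
        frame[r][c] = MARKER
    return frame


frames = [synthetic_frame(p) for p in [(1, 1), (1, 2), None, (2, 3)]]
trajectory = track(frames, MARKER)
print(trajectory)
```

In this sketch a frame without a matching pixel simply yields `None`; it makes no attempt to reproduce the persistence through occlusions that the abstract reports.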

[CV-3] DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training

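[Quick read]: This paper addresses the difficulty of achieving both geometric fidelity and photorealism in panoramic image generation, which it attributes primarily to the lack of large-scale, high-quality, real-world panoramic data. The key to the solution is DiT360, a framework based on DiT (Diffusion Transformer) that optimizes at multiple levels through inter-domain transformation and intra-domain augmentation modules. At the pre-VAE image level, it introduces perspective image guidance and panoramic refinement to improve perceptual quality while regularizing diversity and photorealism. At the post-VAE token level, it applies hybrid supervision, including circular padding for boundary continuity, yaw loss for rotational robustness, and cube loss for distortion awareness. This markedly improves boundary consistency and image fidelity, and the method outperforms existing methods on text-to-panorama, inpainting, and outpainting tasks.

Link: https://arxiv.org/abs/2510.11712
Authors: Haoran Feng, Dizhe Zhang, Xiangtai Li, Bo Du, Lu Qi
Affiliations: Insta360 Research; Tsinghua University; Nanyang Technological University; Wuhan University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: this https URL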

Abstract:In this work, we propose DiT360, a DiT-based framework that performs hybrid training on perspective and panoramic data for panoramic image generation. For the issues of maintaining geometric fidelity and photorealism in generation quality, we attribute the main reason to the lack of large-scale, high-quality, real-world panoramic data, where such a data-centric view differs from prior methods that focus on model design. Basically, DiT360 has several key modules for inter-domain transformation and intra-domain augmentation, applied at both the pre-VAE image level and the post-VAE token level. At the image level, we incorporate cross-domain knowledge through perspective image guidance and panoramic refinement, which enhance perceptual quality while regularizing diversity and photorealism. At the token level, hybrid supervision is applied across multiple modules, which include circular padding for boundary continuity, yaw loss for rotational robustness, and cube loss for distortion awareness. Extensive experiments on text-to-panorama, inpainting, and outpainting tasks demonstrate that our method achieves better boundary consistency and image fidelity across eleven quantitative metrics. Our code is available at this https URL.
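
Two of the token-level ingredients named in the abstract have simple geometric counterparts on an equirectangular grid: the left and right edges of a panorama are neighbors, and a yaw rotation (about the vertical axis) is a horizontal circular shift. The pure-Python sketch below illustrates both on a tiny grid of numbers. It is an illustration under these assumptions only: the helper names are hypothetical, and the abstract does not define how DiT360's yaw loss or cube loss is actually computed.

```python
def circular_pad_width(grid, pad):
    """Pad each row with wrap-around columns so the left and right edges meet.

    Assumes 0 <= pad <= row width.
    """
    if pad == 0:
        return [list(row) for row in grid]
    return [list(row[-pad:]) + list(row) + list(row[:pad]) for row in grid]


def yaw_rotate(grid, shift):
    """Rotate an equirectangular grid about the vertical axis.

    For this projection, a yaw rotation is a circular shift of the columns.
    """
    width = len(grid[0])
    shift %= width
    return [list(row[shift:]) + list(row[:shift]) for row in grid]


# A 2 x 4 stand-in for a panorama (or a grid of latent tokens).
panorama = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]

print(circular_pad_width(panorama, 1))
print(yaw_rotate(panorama, 1))
```

With wrap-around padding, an operation that looks at neighboring columns sees the same context at the seam as anywhere else, which is the intuition behind using it for boundary continuity.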

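[CV-4] Adversarial Attacks Leverage Interference Between Features in Superposition

[Quick read]: This paper examines how adversarial examples arise in neural networks, that is, why and when adversarial vulnerability appears. Existing views attribute it to irregularities in the decision landscape or to sensitivity to non-robust input features; this paper offers a new perspective, arguing that adversarial vulnerability can stem from efficient information encoding in neural networks. The key insight is that superposition, where a network represents more features than it has dimensions, creates particular arrangements of latent representations, and attackers can exploit interference between these features to produce predictable adversarial perturbations. This mechanism explains the transferability of adversarial attacks and class-specific vulnerability patterns. Experiments in controlled synthetic settings and on a ViT model show that superposition suffices to induce adversarial vulnerability, suggesting that such vulnerability may be a byproduct of representational compression rather than a flaw in the learning process or in the input features themselves.

Link: https://arxiv.org/abs/2510.11709
Authors: Edward Stevinson, Lucas Prieto, Melih Barsbey, Tolga Birdal
Affiliations: Imperial College London
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: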

Abstract:Fundamental questions remain about when and why adversarial examples arise in neural networks, with competing views characterising them either as artifacts of the irregularities in the decision landscape or as products of sensitivity to non-robust input features. In this paper, we instead argue that adversarial vulnerability can stem from efficient information encoding in neural networks. Specifically, we show how superposition - where networks represent more features than they have dimensions - creates arrangements of latent representations that adversaries can exploit. We demonstrate that adversarial perturbations leverage interference between superposed features, making attack patterns predictable from feature arrangements. Our framework provides a mechanistic explanation for two known phenomena: adversarial attack transferability between models with similar training regimes and class-specific vulnerability patterns. In synthetic settings with precisely controlled superposition, we establish that superposition suffices to create adversarial vulnerability. We then demonstrate that these findings persist in a ViT trained on CIFAR-10. These findings reveal adversarial vulnerability can be a byproduct of networks’ representational compression, rather than flaws in the learning process or non-robust inputs.
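
A toy example makes the notion of interference concrete. The sketch below, which is an illustrative assumption and not the paper's experimental setup, packs three features into a two-dimensional hidden space as unit vectors 120 degrees apart and reads each feature back out with a dot product. Because three directions cannot be mutually orthogonal in two dimensions, a small change to one feature shifts the readout of another.

```python
import math


def unit_vectors(n):
    """n unit vectors spread evenly around the circle (hypothetical feature directions)."""
    return [
        (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def encode(x, features):
    """Hidden vector: the sum of feature directions weighted by feature activations."""
    hx = sum(xi * f[0] for xi, f in zip(x, features))
    hy = sum(xi * f[1] for xi, f in zip(x, features))
    return (hx, hy)


def readout(h, features):
    """Estimate each feature's activation by projecting onto its direction."""
    return [h[0] * f[0] + h[1] * f[1] for f in features]


features = unit_vectors(3)  # 3 features in 2 dimensions: superposition

# Only feature 0 is active.
clean = readout(encode([1.0, 0.0, 0.0], features), features)

# Nudge feature 1 slightly; the readout of feature 0 moves as well.
perturbed = readout(encode([1.0, 0.2, 0.0], features), features)

print("clean readouts:    ", clean)
print("perturbed readouts:", perturbed)
```

Here the readout of feature 0 drops from about 1.0 to about 0.9 even though its own input did not change. This cross-talk between superposed features is the kind of interference the abstract describes adversarial perturbations as exploiting.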

[CV-5] Bayesian Topological Convolutional Neural Nets

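[Quick Read]: This paper addresses three problems that convolutional neural networks (CNNs) face in image processing: large training-data requirements, overconfident predictions, and the lack of uncertainty quantification. The key is a new Bayesian topological CNN that couples topology-aware learning with Bayesian sampling, using information from important manifolds to accelerate training and reduce calibration error. A consistency condition added to the learning cost adjusts the prior distributions to improve performance. The model is more robust when training data is limited or noisy and is better at identifying out-of-distribution inputs.

Link: https://arxiv.org/abs/2510.11704
Authors: Sarah Harkins Dayton, Hayden Everett, Ioannis Schizas, David L. Boothe Jr., Vasileios Maroulas
Affiliations: The University of Tennessee, Knoxville; DEVCOM ARL
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: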

Abstract:Convolutional neural networks (CNNs) have been established as the main workhorse in image data processing; nonetheless, they require large amounts of data to train, often produce overconfident predictions, and frequently lack the ability to quantify the uncertainty of their predictions. To address these concerns, we propose a new Bayesian topological CNN that promotes a novel interplay between topology-aware learning and Bayesian sampling. Specifically, it utilizes information from important manifolds to accelerate training while reducing calibration error by placing prior distributions on network parameters and properly learning appropriate posteriors. One important contribution of our work is the inclusion of a consistency condition in the learning cost, which can effectively modify the prior distributions to improve the performance of our novel network architecture. We evaluate the model on benchmark image classification datasets and demonstrate its superiority over conventional CNNs, Bayesian neural networks (BNNs), and topological CNNs. In particular, we supply evidence that our method provides an advantage in situations where training data is limited or corrupted. Furthermore, we show that the new model allows for better uncertainty quantification than standard BNNs since it can more readily identify examples of out-of-distribution data on which it has not been trained. Our results highlight the potential of our novel hybrid approach for more efficient and robust image classification.

[CV-6] Diffusion Transformers with Representation Autoencoders

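[Quick Read]: This paper addresses the limitations of relying on the original variational autoencoder (VAE) in Diffusion Transformers (DiT): outdated backbones, low-dimensional latent spaces that restrict information capacity, and weak representations from purely reconstruction-based training, all of which limit generative quality. The key is to pair pretrained representation encoders (e.g., DINO, SigLIP, MAE) with trained decoders to form Representation Autoencoders (RAEs), which yield semantically rich, high-fidelity latent spaces. Theoretical analysis and empirical validation make diffusion transformers work effectively in these high-dimensional latents, giving faster convergence without auxiliary alignment losses and strong ImageNet results: FID 1.51 at 256×256 without guidance, and 1.13 at both 256×256 and 512×512 with guidance.

Link: https://arxiv.org/abs/2510.11690
Authors: Boyang Zheng, Nanye Ma, Shengbang Tong, Saining Xie
Affiliations: New York University
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: Technical Report; Project Page: this https URL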

Abstract:Latent generative modeling, where a pretrained autoencoder maps pixels into a latent space for the diffusion process, has become the standard strategy for Diffusion Transformers (DiT); however, the autoencoder component has barely evolved. Most DiTs continue to rely on the original VAE encoder, which introduces several limitations: outdated backbones that compromise architectural simplicity, low-dimensional latent spaces that restrict information capacity, and weak representations that result from purely reconstruction-based training and ultimately limit generative quality. In this work, we explore replacing the VAE with pretrained representation encoders (e.g., DINO, SigLIP, MAE) paired with trained decoders, forming what we term Representation Autoencoders (RAEs). These models provide both high-quality reconstructions and semantically rich latent spaces, while allowing for a scalable transformer-based architecture. Since these latent spaces are typically high-dimensional, a key challenge is enabling diffusion transformers to operate effectively within them. We analyze the sources of this difficulty, propose theoretically motivated solutions, and validate them empirically. Our approach achieves faster convergence without auxiliary representation alignment losses. Using a DiT variant equipped with a lightweight, wide DDT head, we achieve strong image generation results on ImageNet: 1.51 FID at 256x256 (no guidance) and 1.13 at both 256x256 and 512x512 (with guidance). RAE offers clear advantages and should be the new default for diffusion transformer training.

[CV-7] Beyond Templates: Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View

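[Quick Read]: This paper addresses the joint estimation of an object's 6D pose, size, and dense shape from a single RGB-D image, targeting existing methods that depend on object-specific priors (such as CAD models or templates) or generalize poorly across categories. The key is a unified, category-agnostic framework: a Transformer encoder, enhanced by a Mixture-of-Experts, fuses dense 2D features from vision foundation models with partial 3D point clouds, and parallel decoders handle pose-size estimation and shape reconstruction. Trained only on synthetic data, the framework is evaluated on benchmarks spanning more than 300 categories, shows zero-shot generalization to unseen real-world objects, and runs at 28 FPS, which supports open-set 6D understanding for robotic manipulation and embodied AI.

Link: https://arxiv.org/abs/2510.11687
Authors: Jinyu Zhang, Haitao Lin, Jiashu Hou, Xiangyang Xue, Yanwei Fu
Affiliations: Fudan University; Shanghai Innovation Institute
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: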

Abstract:Estimating an object’s 6D pose, size, and shape from visual input is a fundamental problem in computer vision, with critical applications in robotic grasping and manipulation. Existing methods either rely on object-specific priors such as CAD models or templates, or suffer from limited generalization across categories due to pose-shape entanglement and multi-stage pipelines. In this work, we propose a unified, category-agnostic framework that simultaneously predicts 6D pose, size, and dense shape from a single RGB-D image, without requiring templates, CAD models, or category labels at test time. Our model fuses dense 2D features from vision foundation models with partial 3D point clouds using a Transformer encoder enhanced by a Mixture-of-Experts, and employs parallel decoders for pose-size estimation and shape reconstruction, achieving real-time inference at 28 FPS. Trained solely on synthetic data from 149 categories in the SOPE dataset, our framework is evaluated on four diverse benchmarks SOPE, ROPE, ObjaversePose, and HANDAL, spanning over 300 categories. It achieves state-of-the-art accuracy on seen categories while demonstrating remarkably strong zero-shot generalization to unseen real-world objects, establishing a new standard for open-set 6D understanding in robotics and embodied AI.

[CV-8] FACE: Faithful Automatic Concept Extraction NEURIPS2025

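[Quick Read]: This paper addresses the difficulty of aligning automatically discovered concepts with a deep network's true decision process, which undermines explanation faithfulness. The key is FACE (Faithful Automatic Concept Extraction), a framework that adds a Kullback-Leibler (KL) divergence regularization term to Non-negative Matrix Factorization (NMF) to keep the model's original predictions consistent with its concept-based predictions. It also incorporates classifier supervision during concept learning to enforce predictive consistency. Theoretical analysis shows that minimizing the KL divergence bounds the deviation in predictive distributions, promoting faithful local linearity in the learned concept space.

Link: https://arxiv.org/abs/2510.11675
Authors: Dipkamal Bhusal, Michael Clifford, Sara Rampazzi, Nidhi Rastogi
Affiliations: Rochester Institute of Technology; Toyota InfoTech Labs; University of Florida
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)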

Abstract:Interpreting deep neural networks through concept-based explanations offers a bridge between low-level features and high-level human-understandable semantics. However, existing automatic concept discovery methods often fail to align these extracted concepts with the model’s true decision-making process, thereby compromising explanation faithfulness. In this work, we propose FACE (Faithful Automatic Concept Extraction), a novel framework that augments Non-negative Matrix Factorization (NMF) with a Kullback-Leibler (KL) divergence regularization term to ensure alignment between the model’s original and concept-based predictions. Unlike prior methods that operate solely on encoder activations, FACE incorporates classifier supervision during concept learning, enforcing predictive consistency and enabling faithful explanations. We provide theoretical guarantees showing that minimizing the KL divergence bounds the deviation in predictive distributions, thereby promoting faithful local linearity in the learned concept space. Systematic evaluations on ImageNet, COCO, and CelebA datasets demonstrate that FACE outperforms existing methods across faithfulness and sparsity metrics.
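
To make the "NMF plus KL regularization" idea tangible, here is a minimal sketch of what such an objective could look like. It is an assumption-laden illustration, not the paper's formulation: the linear classifier head `head`, the weight `lam`, the direction of the KL term, and the function name `face_style_objective` are all hypothetical choices.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Mean KL(p || q) over rows of two probability matrices."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

def face_style_objective(A, U, W, head, lam=1.0):
    """Reconstruction error of A ~ U @ W plus a KL penalty on predictions.

    A: non-negative activations (samples x features)
    U: concept coefficients (samples x concepts), W: concept bank (concepts x features)
    head: assumed linear classifier applied to activations (features x classes)
    """
    recon = U @ W
    recon_err = float(np.sum((A - recon) ** 2))
    p_original = softmax(A @ head)      # predictions from the true activations
    p_concept = softmax(recon @ head)   # predictions from the concept reconstruction
    return recon_err + lam * kl_divergence(p_original, p_concept)
```

When the factorization reproduces the activations exactly, both terms vanish; any mismatch that changes the classifier's output is penalized twice, once through the reconstruction error and once through the KL term.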

[CV-9] InfiniHuman: Infinite 3D Human Creation with Precise Control SIGGRAPH

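[Quick Read]: This paper addresses the long-standing challenge of generating realistic, controllable 3D human avatars across broad attribute ranges (ethnicity, age, clothing style, detailed body shape), where capturing and annotating large-scale data is prohibitively expensive. The key is InfiniHuman, a framework that distills existing foundation models to generate richly annotated 3D human data with theoretically unlimited scalability. It has two parts. InfiniHumanData is an automatic pipeline that uses vision-language and image generation models to build a multi-modal dataset of 111K identities, each annotated with multi-granularity text descriptions, multi-view RGB images, detailed clothing images, and SMPL body-shape parameters. InfiniHumanGen is a diffusion-based generative pipeline conditioned on text, body shape, and clothing assets that produces fast, realistic, precisely controllable avatars and improves on state-of-the-art methods in visual quality, generation speed, and controllability.

Link: https://arxiv.org/abs/2510.11650
Authors: Yuxuan Xue, Xianghui Xie, Margaret Kostyrko, Gerard Pons-Moll
Affiliations: University of Tübingen; Tübingen AI Center; MPI for Informatics; SIC
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted to ACM SIGGRAPH Asia 2025. Project website: this https URL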

Abstract:Generating realistic and controllable 3D human avatars is a long-standing challenge, particularly when covering broad attribute ranges such as ethnicity, age, clothing styles, and detailed body shapes. Capturing and annotating large-scale human datasets for training generative models is prohibitively expensive and limited in scale and diversity. The central question we address in this paper is: Can existing foundation models be distilled to generate theoretically unbounded, richly annotated 3D human data? We introduce InfiniHuman, a framework that synergistically distills these models to produce richly annotated human data at minimal cost and with theoretically unlimited scalability. We propose InfiniHumanData, a fully automatic pipeline that leverages vision-language and image generation models to create a large-scale multi-modal dataset. User study shows our automatically generated identities are undistinguishable from scan renderings. InfiniHumanData contains 111K identities spanning unprecedented diversity. Each identity is annotated with multi-granularity text descriptions, multi-view RGB images, detailed clothing images, and SMPL body-shape parameters. Building on this dataset, we propose InfiniHumanGen, a diffusion-based generative pipeline conditioned on text, body shape, and clothing assets. InfiniHumanGen enables fast, realistic, and precisely controllable avatar generation. Extensive experiments demonstrate significant improvements over state-of-the-art methods in visual quality, generation speed, and controllability. Our approach enables high-quality avatar generation with fine-grained control at effectively unbounded scale through a practical and affordable solution. We will publicly release the automatic data generation pipeline, the comprehensive InfiniHumanData dataset, and the InfiniHumanGen models at this https URL.

[CV-10] PhySIC: Physically Plausible 3D Human-Scene Interaction and Contact from a Single Image SIGGRAPH

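[Quick Read]: This paper addresses the reconstruction of metrically accurate humans and scene geometry from a single RGB image, where existing methods struggle with depth ambiguity, occlusion, and physically inconsistent contacts. The key is PhySIC, which recovers SMPL-X human meshes, dense scene surfaces, and vertex-level contact maps in a shared coordinate frame. Starting from coarse monocular depth and body estimates, it performs occlusion-aware inpainting, fuses visible depth with unscaled geometry into a robust metric scaffold, and synthesizes missing support surfaces such as floors. A confidence-weighted optimization then refines body pose, camera parameters, and global scale by jointly enforcing depth alignment, contact priors, interpenetration avoidance, and 2D reprojection consistency, while explicit occlusion masks guard invisible regions against implausible configurations.

Link: https://arxiv.org/abs/2510.11649
Authors: Pradyumna Yalandur Muralidhar, Yuxuan Xue, Xianghui Xie, Margaret Kostyrko, Gerard Pons-Moll
Affiliations: University of Tübingen; Tübingen AI Center; MPI for Informatics; SIC
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted to ACM SIGGRAPH Asia 2025. Project website: this https URL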

Abstract:Reconstructing metrically accurate humans and their surrounding scenes from a single image is crucial for virtual reality, robotics, and comprehensive 3D scene understanding. However, existing methods struggle with depth ambiguity, occlusions, and physically inconsistent contacts. To address these challenges, we introduce PhySIC, a framework for physically plausible Human-Scene Interaction and Contact reconstruction. PhySIC recovers metrically consistent SMPL-X human meshes, dense scene surfaces, and vertex-level contact maps within a shared coordinate frame from a single RGB image. Starting from coarse monocular depth and body estimates, PhySIC performs occlusion-aware inpainting, fuses visible depth with unscaled geometry for a robust metric scaffold, and synthesizes missing support surfaces like floors. A confidence-weighted optimization refines body pose, camera parameters, and global scale by jointly enforcing depth alignment, contact priors, interpenetration avoidance, and 2D reprojection consistency. Explicit occlusion masking safeguards invisible regions against implausible configurations. PhySIC is efficient, requiring only 9 seconds for joint human-scene optimization and under 27 seconds end-to-end. It naturally handles multiple humans, enabling reconstruction of diverse interactions. Empirically, PhySIC outperforms single-image baselines, reducing mean per-vertex scene error from 641 mm to 227 mm, halving PA-MPJPE to 42 mm, and improving contact F1 from 0.09 to 0.51. Qualitative results show realistic foot-floor interactions, natural seating, and plausible reconstructions of heavily occluded furniture. By converting a single image into a physically plausible 3D human-scene pair, PhySIC advances scalable 3D scene understanding. Our implementation is publicly available at this https URL.

[CV-11] IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment

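[Quick Read]: This paper addresses shortcomings of current video editing benchmarks when applied to instruction-guided video editing: limited source diversity, narrow task coverage, and incomplete evaluation metrics. The key is IVEBench, a benchmark suite built for this task. It contains 600 high-quality source videos spanning seven semantic dimensions with lengths from 32 to 1,024 frames, and 8 categories of editing tasks with 35 subcategories whose prompts are generated by large language models and refined through expert review. Its evaluation protocol covers three dimensions (video quality, instruction compliance, and video fidelity) and combines traditional metrics with multimodal LLM-based assessment to give comprehensive, human-aligned evaluation.

Link: https://arxiv.org/abs/2510.11647
Authors: Yinan Chen, Jiangning Zhang, Teng Hu, Yuxiang Zeng, Zhucun Xue, Qingdong He, Chengjie Wang, Yong Liu, Xiaobin Hu, Shuicheng Yan
Affiliations: Zhejiang University; Tencent Youtu Lab; Shanghai Jiao Tong University; University of Auckland; National University of Singapore
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Equal contributions from first two authors. Project page: this https URL. Code: this https URL. Dataset: this https URL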

Abstract:Instruction-guided video editing has emerged as a rapidly advancing research direction, offering new opportunities for intuitive content transformation while also posing significant challenges for systematic evaluation. Existing video editing benchmarks fail to support the evaluation of instruction-guided video editing adequately and further suffer from limited source diversity, narrow task coverage and incomplete evaluation metrics. To address the above limitations, we introduce IVEBench, a modern benchmark suite specifically designed for instruction-guided video editing assessment. IVEBench comprises a diverse database of 600 high-quality source videos, spanning seven semantic dimensions, and covering video lengths ranging from 32 to 1,024 frames. It further includes 8 categories of editing tasks with 35 subcategories, whose prompts are generated and refined through large language models and expert review. Crucially, IVEBench establishes a three-dimensional evaluation protocol encompassing video quality, instruction compliance and video fidelity, integrating both traditional metrics and multimodal large language model-based assessments. Extensive experiments demonstrate the effectiveness of IVEBench in benchmarking state-of-the-art instruction-guided video editing methods, showing its ability to provide comprehensive and human-aligned evaluation outcomes.

[CV-12] NV3D: Leveraging Spatial Shape Through Normal Vector-based 3D Object Detection

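[Quick Read]: This paper addresses two problems in 3D object detection for autonomous driving: multi-modal methods struggle with feature alignment, and local feature extraction alone can be too simple for complex scenes. The key is NV3D, a model that computes a normal vector per voxel from its neighbors using K-nearest neighbors (KNN) and principal component analysis (PCA), giving geometry-aware local features that help relate surfaces to target entities such as cars, pedestrians, and cyclists. NV3D offers two sampling strategies (normal vector density-based sampling and FOV-aware bin-based sampling) that remove up to 55% of the data while maintaining performance, and an element-wise attention fusion that uses voxel features as query and value and normal vector features as key. On the KITTI validation set it achieves higher mAP than the Voxel R-CNN baseline for car and cyclist detection.

Link: https://arxiv.org/abs/2510.11632
Authors: Krittin Chaowakarn, Paramin Sangwongngam, Nang Htet Htet Aung, Chalie Charoenlarpnopparut
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: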

Abstract:Recent studies in 3D object detection for autonomous vehicles aim to enrich features through the utilization of multi-modal setups or the extraction of local patterns within LiDAR point clouds. However, multi-modal methods face significant challenges in feature alignment, and gaining features locally can be oversimplified for complex 3D object detection tasks. In this paper, we propose a novel model, NV3D, which utilizes local features acquired from voxel neighbors, as normal vectors computed per voxel basis using K-nearest neighbors (KNN) and principal component analysis (PCA). This informative feature enables NV3D to determine the relationship between the surface and pertinent target entities, including cars, pedestrians, or cyclists. During the normal vector extraction process, NV3D offers two distinct sampling strategies: normal vector density-based sampling and FOV-aware bin-based sampling, allowing elimination of up to 55% of data while maintaining performance. In addition, we applied element-wise attention fusion, which accepts voxel features as the query and value and normal vector features as the key, similar to the attention mechanism. Our method is trained on the KITTI dataset and has demonstrated superior performance in car and cyclist detection owing to their spatial shapes. In the validation set, NV3D without sampling achieves 86.60% and 80.18% mean Average Precision (mAP), greater than the baseline Voxel R-CNN by 2.61% and 4.23% mAP, respectively. With both samplings, NV3D achieves 85.54% mAP in car detection, exceeding the baseline by 1.56% mAP, despite roughly 55% of voxels being filtered out.
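
The KNN-plus-PCA normal estimation mentioned in the abstract is a standard technique, and a small sketch may help. The version below is an illustration under assumptions: it works per point rather than per voxel, uses a brute-force neighbor search, and the function name `estimate_normals` and the default `k=8` are hypothetical rather than taken from NV3D.

```python
import numpy as np

def estimate_normals(points: np.ndarray, k: int = 8) -> np.ndarray:
    """Estimate a unit normal per point from PCA over its k nearest neighbors.

    points: (N, 3) array. The sign of each normal is arbitrary.
    """
    # Brute-force pairwise squared distances; fine for small clouds only.
    diffs = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diffs ** 2, axis=-1)
    # Indices of the k closest points (the point itself is included).
    neighbor_idx = np.argsort(dist2, axis=1)[:, :k]

    normals = np.empty(points.shape, dtype=float)
    for i, idx in enumerate(neighbor_idx):
        neighbors = points[idx]
        centered = neighbors - neighbors.mean(axis=0)
        covariance = centered.T @ centered
        # eigh returns eigenvalues in ascending order; the eigenvector of the
        # smallest one is the direction of least variance, i.e. the normal.
        _, eigenvectors = np.linalg.eigh(covariance)
        normals[i] = eigenvectors[:, 0]
    return normals
```

For points sampled from a flat patch, every estimated normal is perpendicular to the patch (up to sign), which is the surface cue the detector can then attach to each voxel.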

[CV-13] EvoCAD: Evolutionary CAD Code Generation with Vision Language Models ICTAI2025

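[Quick Read]: This paper addresses how to combine large language models (LLMs) with evolutionary computation to generate computer-aided design (CAD) objects that are topologically correct and semantically sound. The key is EvoCAD, which samples CAD objects in symbolic form and optimizes them with an evolutionary approach using vision language models (VLMs) and reasoning language models. It also introduces two new metrics based on topological properties defined by the Euler characteristic, which evaluate topological correctness efficiently and complement existing spatial metrics. EvoCAD outperforms prior approaches on multiple metrics, particularly in generating topologically correct 3D CAD objects.

Link: https://arxiv.org/abs/2510.11631
Authors: Tobias Preintner, Weixuan Yuan, Adrian König, Thomas Bäck, Elena Raponi, Niki van Stein
Affiliations: TU Eindhoven; Google DeepMind
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Comments: Accepted to IEEE ICTAI 2025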

Abstract:Combining large language models with evolutionary computation algorithms represents a promising research direction leveraging the remarkable generative and in-context learning capabilities of LLMs with the strengths of evolutionary algorithms. In this work, we present EvoCAD, a method for generating computer-aided design (CAD) objects through their symbolic representations using vision language models and evolutionary optimization. Our method samples multiple CAD objects, which are then optimized using an evolutionary approach with vision language and reasoning language models. We assess our method using GPT-4V and GPT-4o, evaluating it on the CADPrompt benchmark dataset and comparing it to prior methods. Additionally, we introduce two new metrics based on topological properties defined by the Euler characteristic, which capture a form of semantic similarity between 3D objects. Our results demonstrate that EvoCAD outperforms previous approaches on multiple metrics, particularly in generating topologically correct objects, which can be efficiently evaluated using our two novel metrics that complement existing spatial metrics.
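
For readers unfamiliar with the Euler characteristic, the sketch below computes χ = V − E + F for a polygon mesh given as a list of faces. It does not reproduce the paper's two metrics, whose exact definitions are not given in the abstract; the function name `euler_characteristic` and the face-list input format are assumptions for illustration.

```python
def euler_characteristic(faces):
    """Compute V - E + F for a mesh given as faces of vertex indices."""
    vertices = set()
    edges = set()
    for face in faces:
        vertices.update(face)
        n = len(face)
        for i in range(n):
            a, b = face[i], face[(i + 1) % n]
            # Store each undirected edge once, regardless of orientation.
            edges.add((min(a, b), max(a, b)))
    return len(vertices) - len(edges) + len(faces)

# A tetrahedron is topologically a sphere: V=4, E=6, F=4.
TETRAHEDRON = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(euler_characteristic(TETRAHEDRON))  # → 2
```

A closed surface with one hole through it, such as a torus, gives 0 instead of 2, so comparing this number between a generated object and its target is a cheap check on whether the two have the same number of holes.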

[CV-14] High-resolution Photo Enhancement in Real-time: A Laplacian Pyramid Network

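[Quick Read]: This paper addresses the trade-off between quality and computational efficiency in photo enhancement: existing methods either pursue strong enhancement but cannot be deployed on edge devices, or prioritize efficiency and fall short in real-world use. The key is LLF-LUT++, a pyramid network that integrates global and local operators through closed-form Laplacian pyramid decomposition and reconstruction. An image-adaptive 3D look-up table (LUT) captures the global tonal characteristics of the downsampled image, and two weight fusion strategies produce a coarse global enhancement. A spatial-frequency transformer weight predictor extracts the required weights from frequency features, and local Laplacian filters adaptively refine edge details in the high-frequency components. The model gains 2.64 dB PSNR on the HDR+ dataset while processing 4K images in 13 ms on a single GPU.

Link: https://arxiv.org/abs/2510.11613
Authors: Feng Zhang, Haoyou Deng, Zhiqiang Li, Lida Li, Bin Xu, Qingbo Lu, Zisheng Cao, Minchen Wei, Changxin Gao, Nong Sang, Xiang Bai
Affiliations: Huazhong University of Science and Technology; DJI Technology Co., Ltd.; The Hong Kong Polytechnic University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by TPAMI 2025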

Abstract:Photo enhancement plays a crucial role in augmenting the visual aesthetics of a photograph. In recent years, photo enhancement methods have either focused on enhancement performance, producing powerful models that cannot be deployed on edge devices, or prioritized computational efficiency, resulting in inadequate performance for real-world applications. To this end, this paper introduces a pyramid network called LLF-LUT++, which integrates global and local operators through closed-form Laplacian pyramid decomposition and reconstruction. This approach enables fast processing of high-resolution images while also achieving excellent performance. Specifically, we utilize an image-adaptive 3D LUT that capitalizes on the global tonal characteristics of downsampled images, while incorporating two distinct weight fusion strategies to achieve coarse global image enhancement. To implement this strategy, we designed a spatial-frequency transformer weight predictor that effectively extracts the desired distinct weights by leveraging frequency features. Additionally, we apply local Laplacian filters to adaptively refine edge details in high-frequency components. After meticulously redesigning the network structure and transformer model, LLF-LUT++ not only achieves a 2.64 dB improvement in PSNR on the HDR+ dataset, but also further reduces runtime, with 4K resolution images processed in just 13 ms on a single GPU. Extensive experimental results on two benchmark datasets further show that the proposed approach performs favorably compared to state-of-the-art methods. The source code will be made publicly available at this https URL.
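
The Laplacian pyramid that the network is built around can be sketched in a few lines. This is a simplified illustration, not the LLF-LUT++ implementation: it assumes a single-channel image whose sides are divisible by 2 at every level, and it uses 2×2 average pooling and nearest-neighbor upsampling in place of the usual Gaussian filtering.

```python
import numpy as np

def downsample(img: np.ndarray) -> np.ndarray:
    """Halve resolution by averaging each 2x2 block."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img: np.ndarray) -> np.ndarray:
    """Double resolution by repeating each pixel (nearest neighbor)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian_pyramid(img: np.ndarray, levels: int) -> list:
    """Return [high_0, ..., high_{levels-1}, low_residual]."""
    pyramid = []
    current = img
    for _ in range(levels):
        low = downsample(current)
        pyramid.append(current - upsample(low))  # high-frequency detail band
        current = low
    pyramid.append(current)                      # coarse low-frequency image
    return pyramid

def reconstruct(pyramid: list) -> np.ndarray:
    """Invert the decomposition by adding detail bands back, coarse to fine."""
    current = pyramid[-1]
    for high in reversed(pyramid[:-1]):
        current = upsample(current) + high
    return current
```

Because reconstruction simply reverses the subtraction at each level, the round trip is lossless. That is what lets a method apply a cheap global operator (such as a LUT) to the small low-frequency image and a local operator to the detail bands before recombining them.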

[CV-15] ExpVid: A Benchmark for Experiment Video Understanding & Reasoning

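[Quick Read]: This paper addresses the lack of evaluation of multimodal large language models (MLLMs) on scientific experiment videos from real laboratories; existing benchmarks miss the fine-grained, long-horizon nature of wet-lab work. The key is ExpVid, the first benchmark for systematically evaluating MLLMs on such videos. Its three-level task hierarchy (fine-grained perception, procedural understanding, and scientific reasoning) mirrors the scientific process, and a vision-centric annotation pipeline combining automated generation with multi-disciplinary expert validation ensures that tasks require visual grounding. Experiments show that leading MLLMs handle coarse-grained recognition well but struggle with disambiguating fine details, tracking state changes over time, and linking procedures to outcomes, with a notable gap between proprietary and open-source models in high-order reasoning.

Link: https://arxiv.org/abs/2510.11606
Authors: Yicheng Xu, Yue Wu, Jiashuo Yu, Ziang Yan, Tianxiang Jiang, Yinan He, Qingsong Zhao, Kai Chen, Yu Qiao, Limin Wang, Manabu Okumura, Yi Wang
Affiliations: Shanghai AI Laboratory; Institute of Science Tokyo; Nanjing University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Data and code: this https URL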

Abstract:Multimodal Large Language Models (MLLMs) hold promise for accelerating scientific discovery by interpreting complex experimental procedures. However, their true capabilities are poorly understood, as existing benchmarks neglect the fine-grained and long-horizon nature of authentic laboratory work, especially in wet-lab settings. To bridge this gap, we introduce ExpVid, the first benchmark designed to systematically evaluate MLLMs on scientific experiment videos. Curated from peer-reviewed video publications, ExpVid features a new three-level task hierarchy that mirrors the scientific process: (1) Fine-grained Perception of tools, materials, and actions; (2) Procedural Understanding of step order and completeness; and (3) Scientific Reasoning that connects the full experiment to its published conclusions. Our vision-centric annotation pipeline, combining automated generation with multi-disciplinary expert validation, ensures that tasks require visual grounding. We evaluate 19 leading MLLMs on ExpVid and find that while they excel at coarse-grained recognition, they struggle with disambiguating fine details, tracking state changes over time, and linking experimental procedures to scientific outcomes. Our results reveal a notable performance gap between proprietary and open-source models, particularly in high-order reasoning. ExpVid not only provides a diagnostic tool but also charts a roadmap for developing MLLMs capable of becoming trustworthy partners in scientific experimentation.

[CV-16] ACE-G: Improving Generalization of Scene Coordinate Regression Through Query Pre-Training ICCV2025

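[Quick Read]: This paper addresses the limited generalization of scene coordinate regression (SCR) in visual relocalization. Because the training objective of conventional SCR encodes the training views in the weights of the coordinate regressor, the models are sensitive to changes in imaging conditions such as lighting and viewpoint and fail on query images that differ too much from the training views. The key is to separate the coordinate regressor from the map representation, using a generic transformer plus a scene-specific map code. This allows the transformer to be pre-trained on tens of thousands of scenes and, during pre-training, to learn to generalize from mapping images to unseen query images, which increases robustness while keeping the computational footprint attractive. The method, ACE-G, is validated on multiple challenging relocalization datasets.

Link: https://arxiv.org/abs/2510.11605
Authors: Leonard Bruns, Axel Barroso-Laguna, Tommaso Cavallari, Áron Monszpart, Sowmya Munukutla, Victor Adrian Prisacariu, Eric Brachmann
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: ICCV 2025, Project page: this https URL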

Abstract:Scene coordinate regression (SCR) has established itself as a promising learning-based approach to visual relocalization. After mere minutes of scene-specific training, SCR models estimate camera poses of query images with high accuracy. Still, SCR methods fall short of the generalization capabilities of more classical feature-matching approaches. When imaging conditions of query images, such as lighting or viewpoint, are too different from the training views, SCR models fail. Failing to generalize is an inherent limitation of previous SCR frameworks, since their training objective is to encode the training views in the weights of the coordinate regressor itself. The regressor essentially overfits to the training views, by design. We propose to separate the coordinate regressor and the map representation into a generic transformer and a scene-specific map code. This separation allows us to pre-train the transformer on tens of thousands of scenes. More importantly, it allows us to train the transformer to generalize from mapping images to unseen query images during pre-training. We demonstrate on multiple challenging relocalization datasets that our method, ACE-G, leads to significantly increased robustness while keeping the computational footprint attractive.

[CV-17] MS-Mix: Unveiling the Power of Mixup for Multimodal Sentiment Analysis

[Quick Read]: This paper addresses poor generalization in Multimodal Sentiment Analysis (MSA) caused by scarce annotated data, and in particular the label ambiguity and semantic inconsistency that arise when Mixup-style augmentation is applied directly. The key is MS-Mix, an adaptive, emotion-sensitive augmentation framework with three components: (1) a Sentiment-Aware Sample Selection (SASS) strategy that avoids mixing samples with contradictory emotions; (2) a Sentiment Intensity Guided (SIG) module that uses multi-head self-attention to compute modality-specific mixing ratios from emotional intensity; and (3) a Sentiment Alignment Loss (SAL) that aligns prediction distributions across modalities and adds a KL-divergence-based regularization term to jointly train the emotion intensity predictor and the backbone network.

Link: https://arxiv.org/abs/2510.11579
Authors: Hongyu Zhu, Lin Chen, Mounim A. El-Yacoubi, Mingsheng Shang
Affiliations: Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences; Chongqing School, University of Chinese Academy of Sciences; Telecom SudParis, Institut Polytechnique de Paris
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: Under Review

Abstract:Multimodal Sentiment Analysis (MSA) aims to identify and interpret human emotions by integrating information from heterogeneous data sources such as text, video, and audio. While deep learning models have advanced in network architecture design, they remain heavily limited by scarce multimodal annotated data. Although Mixup-based augmentation improves generalization in unimodal tasks, its direct application to MSA introduces critical challenges: random mixing often amplifies label ambiguity and semantic inconsistency due to the lack of emotion-aware mixing mechanisms. To overcome these issues, we propose MS-Mix, an adaptive, emotion-sensitive augmentation framework that automatically optimizes sample mixing in multimodal settings. The key components of MS-Mix include: (1) a Sentiment-Aware Sample Selection (SASS) strategy that effectively prevents semantic confusion caused by mixing samples with contradictory emotions. (2) a Sentiment Intensity Guided (SIG) module using multi-head self-attention to compute modality-specific mixing ratios dynamically based on their respective emotional intensities. (3) a Sentiment Alignment Loss (SAL) that aligns the prediction distributions across modalities, and incorporates the Kullback-Leibler-based loss as an additional regularization term to train the emotion intensity predictor and the backbone network jointly. Extensive experiments on three benchmark datasets with six state-of-the-art backbones confirm that MS-Mix consistently outperforms existing methods, establishing a new standard for robust multimodal sentiment augmentation. The source code is available at: this https URL.
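
The first component, avoiding mixes between contradictory emotions, can be illustrated with a plain Mixup variant that restricts partner selection. This is only a sketch of the general idea: the `max_gap` threshold, the Beta-distributed mixing ratio, the single shared ratio across modalities, and the name `sentiment_aware_mixup` are all assumptions, and the paper's attention-based SIG module and SAL loss are not modeled.

```python
import numpy as np

def sentiment_aware_mixup(features, labels, alpha=0.4, max_gap=1.0, rng=None):
    """Mix each sample only with a partner whose sentiment score is close.

    features: (N, D) array; labels: (N,) continuous sentiment scores.
    Samples with no partner within max_gap are left unmixed.
    """
    if rng is None:
        rng = np.random.default_rng()
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=float)
    mixed_x, mixed_y = features.copy(), labels.copy()
    indices = np.arange(len(labels))
    for i in indices:
        # Candidate partners: other samples with a similar sentiment score.
        close = np.abs(labels - labels[i]) <= max_gap
        candidates = indices[close & (indices != i)]
        if candidates.size == 0:
            continue
        j = rng.choice(candidates)
        lam = rng.beta(alpha, alpha)   # standard Mixup mixing ratio
        mixed_x[i] = lam * features[i] + (1 - lam) * features[j]
        mixed_y[i] = lam * labels[i] + (1 - lam) * labels[j]
    return mixed_x, mixed_y
```

With unrestricted Mixup, blending a strongly positive and a strongly negative sample yields a near-neutral label attached to contradictory content; restricting partners keeps each mixed label on the same side as its sources.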

[CV-18] Benchmarking foundation models for hyperspectral image classification: Application to cereal crop type mapping

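[Quick Read]: This paper addresses the underexplored potential of foundation models for hyperspectral crop mapping, in particular whether they generalize across geographic regions and sensor platforms. The key is a systematic benchmark of three foundation models (HyperSigma, DOFA, and Vision Transformers pre-trained on the SpectralEarth dataset) that are fine-tuned on labeled data from one region and evaluated on an independent test region. The SpectralEarth-pre-trained model reaches the highest overall accuracy (OA = 93.5%), and a compact variant trained from scratch reaches 91%, which highlights the importance of model architecture for generalization and points to directions for future foundation models for operational hyperspectral crop mapping.

Link: https://arxiv.org/abs/2510.11576
Authors: Walid Elbarz, Mohamed Bourriz, Hicham Hajji, Hamd Ait Abdelali, François Bourzeix
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Being reviewed for WHISPERS conference (Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing)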

Abstract:Foundation models are transforming Earth observation, but their potential for hyperspectral crop mapping remains underexplored. This study benchmarks three foundation models for cereal crop mapping using hyperspectral imagery: HyperSigma, DOFA, and Vision Transformers pre-trained on the SpectralEarth dataset (a large multitemporal hyperspectral archive). Models were fine-tuned on manually labeled data from a training region and evaluated on an independent test region. Performance was measured with overall accuracy (OA), average accuracy (AA), and F1-score. HyperSigma achieved an OA of 34.5% (+/- 1.8%), DOFA reached 62.6% (+/- 3.5%), and the SpectralEarth model achieved an OA of 93.5% (+/- 0.8%). A compact SpectralEarth variant trained from scratch achieved 91%, highlighting the importance of model architecture for strong generalization across geographic regions and sensor platforms. These results provide a systematic evaluation of foundation models for operational hyperspectral crop mapping and outline directions for future model development.
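
Since the results are reported as overall accuracy (OA), average accuracy (AA), and F1-score, a short sketch of how these are commonly derived from a confusion matrix may be useful. The abstract does not say which F1 averaging was used, so the macro average here is an assumption, as is the function name `classification_metrics`; the code also assumes every class has at least one true and one predicted sample.

```python
import numpy as np

def classification_metrics(confusion) -> dict:
    """Compute OA, AA, and macro F1 from a confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=float)
    true_positive = np.diag(confusion)
    per_class_recall = true_positive / confusion.sum(axis=1)     # rows: ground truth
    per_class_precision = true_positive / confusion.sum(axis=0)  # cols: predictions
    f1 = (2 * per_class_precision * per_class_recall
          / (per_class_precision + per_class_recall))
    return {
        "OA": true_positive.sum() / confusion.sum(),  # fraction of all samples correct
        "AA": per_class_recall.mean(),                # mean of per-class accuracies
        "macro_F1": f1.mean(),
    }

metrics = classification_metrics([[8, 2], [1, 9]])
```

OA weights every sample equally and can hide poor performance on rare crop types, whereas AA weights every class equally, which is why the two are usually reported together.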

[CV-19] A Framework for Low-Effort Training Data Generation for Urban Semantic Segmentation

【速读】:该论文旨在解决合成数据(synthetic data)与真实城市场景图像之间存在的域差距(domain gap)问题,尤其是在特定目标域(如Cityscapes)下,由于建筑风格、植被分布、物体外观及相机特性等差异导致下游任务性能受限的问题。传统方法通过精细化3D建模来缩小这一差距,但成本高昂,违背了使用低成本标注数据的初衷。其解决方案的关键在于提出一种新框架,利用仅含不完美伪标签(pseudo-labels)的扩散模型(diffusion model),对现有合成数据进行域适应(domain adaptation)。该框架能够生成高保真、语义对齐的目标域图像,并通过过滤低质量生成结果、修正图像与标签间的错位以及标准化不同数据集的语义表示,将低质量或快速构建的合成数据转化为可媲美高质量人工设计数据的训练集,显著提升分割性能(mIoU最高提升8.0%点)。

Link: https://arxiv.org/abs/2510.11567
Authors: Denis Zavadski, Damjan Kalšan, Tim Küchler, Haebom Lee, Stefan Roth, Carsten Rother
Affiliations: unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Comments:

Abstract:Synthetic datasets are widely used for training urban scene recognition models, but even highly realistic renderings show a noticeable gap to real imagery. This gap is particularly pronounced when adapting to a specific target domain, such as Cityscapes, where differences in architecture, vegetation, object appearance, and camera characteristics limit downstream performance. Closing this gap with more detailed 3D modelling would require expensive asset and scene design, defeating the purpose of low-cost labelled data. To address this, we present a new framework that adapts an off-the-shelf diffusion model to a target domain using only imperfect pseudo-labels. Once trained, it generates high-fidelity, target-aligned images from semantic maps of any synthetic dataset, including low-effort sources created in hours rather than months. The method filters suboptimal generations, rectifies image-label misalignments, and standardises semantics across datasets, transforming weak synthetic data into competitive real-domain training sets. Experiments on five synthetic datasets and two real target datasets show segmentation gains of up to +8.0 percentage points mIoU over state-of-the-art translation methods, making rapidly constructed synthetic datasets as effective as high-effort, time-intensive synthetic datasets requiring extensive manual design. This work highlights a valuable collaborative paradigm where fast semantic prototyping, combined with generative models, enables scalable, high-quality training data creation for urban scene understanding.

[CV-20] SCOOP’D: Learning Mixed-Liquid-Solid Scooping via Sim2Real Generative Policy

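【Quick read】: This paper tackles autonomous robotic scooping, in particular how to learn a general and robust policy under complex tool-object interactions and deformable materials such as granular media or liquids. The key to the proposed method, SCOOP’D, is twofold: it first collects scooping demonstrations in the OmniGibson simulator (built on NVIDIA Omniverse) using algorithmic procedures that rely on privileged state information, and then trains a diffusion-based generative policy to imitate these demonstrations from observational input. Deployed zero-shot, the policy shows promising results across 465 diverse real-world trials spanning different item quantities, item characteristics, and container types.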

Link: https://arxiv.org/abs/2510.11566
Authors: Kuanning Wang, Yongchong Gu, Yuqian Fu, Zeyu Shangguan, Sicheng He, Xiangyang Xue, Yanwei Fu, Daniel Seita
Affiliations: Fudan University; University of Southern California; INSAIT, Sofia University “St. Kliment Ohridski”
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Comments: Project page is at this https URL

Abstract:Scooping items with tools such as spoons and ladles is common in daily life, ranging from assistive feeding to retrieving items from environmental disaster sites. However, developing a general and autonomous robotic scooping policy is challenging since it requires reasoning about complex tool-object interactions. Furthermore, scooping often involves manipulating deformable objects, such as granular media or liquids, which is challenging due to their infinite-dimensional configuration spaces and complex dynamics. We propose a method, SCOOP’D, which uses simulation from OmniGibson (built on NVIDIA Omniverse) to collect scooping demonstrations using algorithmic procedures that rely on privileged state information. Then, we use generative policies via diffusion to imitate demonstrations from observational input. We directly apply the learned policy in diverse real-world scenarios, testing its performance on various item quantities, item characteristics, and container types. In zero-shot deployment, our method demonstrates promising results across 465 trials in diverse scenarios, including objects of different difficulty levels that we categorize as “Level 1” and “Level 2.” SCOOP’D outperforms all baselines and ablations, suggesting that this is a promising approach to acquiring robotic scooping skills. Project page is at this https URL.

[CV-21] SNAP: Towards Segmenting Anything in Any Point Cloud

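【Quick read】: This paper addresses the limited scope of current interactive 3D point cloud segmentation methods: existing models are typically confined to a single domain (indoor or outdoor) and a single form of interaction (spatial clicks or text prompts), and training on multiple datasets tends to cause negative transfer, which hurts generalization. The key to the solution is SNAP (Segment aNything in Any Point cloud), a unified model trained jointly on 7 datasets covering indoor, outdoor, and aerial environments, with domain-adaptive normalization to suppress negative transfer. For text-prompted segmentation, mask proposals are generated automatically and matched against CLIP text embeddings, enabling panoptic and open-vocabulary segmentation without human intervention. SNAP achieves state-of-the-art results on 8 of 9 zero-shot spatial-prompt benchmarks and competitive results on all 5 text-prompt benchmarks, showing that a unified model can match or exceed specialized domain-specific approaches.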

Link: https://arxiv.org/abs/2510.11565
Authors: Aniket Gupta, Hanhui Wang, Charles Saunders, Aruni RoyChowdhury, Hanumant Singh, Huaizu Jiang
Affiliations: Northeastern University; The Mathworks, Inc.
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Project Page, this https URL

Abstract:Interactive 3D point cloud segmentation enables efficient annotation of complex 3D scenes through user-guided prompts. However, current approaches are typically restricted in scope to a single domain (indoor or outdoor), and to a single form of user interaction (either spatial clicks or textual prompts). Moreover, training on multiple datasets often leads to negative transfer, resulting in domain-specific tools that lack generalizability. To address these limitations, we present SNAP (Segment aNything in Any Point cloud), a unified model for interactive 3D segmentation that supports both point-based and text-based prompts across diverse domains. Our approach achieves cross-domain generalizability by training on 7 datasets spanning indoor, outdoor, and aerial environments, while employing domain-adaptive normalization to prevent negative transfer. For text-prompted segmentation, we automatically generate mask proposals without human intervention and match them against CLIP embeddings of textual queries, enabling both panoptic and open-vocabulary segmentation. Extensive experiments demonstrate that SNAP consistently delivers high-quality segmentation results. We achieve state-of-the-art performance on 8 out of 9 zero-shot benchmarks for spatial-prompted segmentation and demonstrate competitive results on all 5 text-prompted benchmarks. These results show that a unified model can match or exceed specialized domain-specific approaches, providing a practical tool for scalable 3D annotation. Project page is at this https URL
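The text-prompted branch, matching mask proposals against embeddings of textual queries, can be pictured as a nearest-neighbour search under cosine similarity. The following is a minimal sketch with toy vectors standing in for real CLIP embeddings. The function `label_proposals`, the `min_similarity` cutoff, and the way mask embeddings are obtained are all assumptions, since the abstract does not describe them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def label_proposals(mask_embeddings, text_embeddings, min_similarity=0.0):
    """Give each mask proposal the best-matching text query, or None.

    mask_embeddings: list of vectors, one per mask proposal (hypothetical input).
    text_embeddings: dict mapping query text to its embedding vector.
    """
    labels = []
    for emb in mask_embeddings:
        best_label, best_score = None, min_similarity
        for label, text_emb in text_embeddings.items():
            score = cosine(emb, text_emb)
            if score > best_score:
                best_label, best_score = label, score
        labels.append(best_label)
    return labels

# Toy 3-d vectors; real CLIP embeddings have hundreds of dimensions.
queries = {"chair": [1.0, 0.0, 0.0], "car": [0.0, 1.0, 0.0]}
proposals = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.0, 1.0]]
print(label_proposals(proposals, queries, min_similarity=0.3))
```

Because the vocabulary is just the set of query strings, the same matching step covers both a fixed class list (panoptic) and arbitrary free-text queries (open-vocabulary).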

[CV-22] How many samples to label for an application given a foundation model? Chest X-ray classification study

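【Quick read】: This paper addresses the dependence of chest X-ray classification on large amounts of annotated data, a major bottleneck in training diagnostic imaging models. The key to the solution is using power-law fits to predict the training set size needed to reach a given ROC-AUC threshold. The foundation models XrayCLIP and XraySigLIP reach strong performance with significantly fewer labeled examples than a ResNet-50 baseline, and learning-curve slopes estimated from just 50 labeled cases accurately forecast the final performance plateau, so practitioners can control annotation cost by labeling only the samples needed for a target performance.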

Link: https://arxiv.org/abs/2510.11553
Authors: Nikolay Nechaev, Evgenia Przhezdzetskaya, Viktor Gombolevskiy, Dmitry Umerenkov, Dmitry Dylov
Affiliations: Artificial Intelligence Research Institute (AIRI); Skolkovo Institute of Science and Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 8 pages, 5 figures

Abstract:Chest X-ray classification is vital yet resource-intensive, typically demanding extensive annotated data for accurate diagnosis. Foundation models mitigate this reliance, but how many labeled samples are required remains unclear. We systematically evaluate the use of power-law fits to predict the training size necessary for specific ROC-AUC thresholds. Testing multiple pathologies and foundation models, we find XrayCLIP and XraySigLIP achieve strong performance with significantly fewer labeled examples than a ResNet-50 baseline. Importantly, learning curve slopes from just 50 labeled cases accurately forecast final performance plateaus. Our results enable practitioners to minimize annotation costs by labeling only the essential samples for targeted performance.
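To make the idea of a power-law learning curve concrete, here is a minimal sketch that fits one to a few (training size, ROC-AUC) points and inverts it to estimate the labeling budget for a target AUC. The functional form `1 - AUC = a * n**(-b)` (with no asymptote term), the log-log least-squares fit, and the function names are assumptions for illustration. The abstract does not specify the exact parameterization the authors use.

```python
import math

def fit_power_law(sizes, aucs):
    """Fit 1 - AUC = a * n**(-b) by ordinary least squares in log-log space.

    Assumes at least two distinct sizes and every AUC strictly below 1.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(1.0 - auc) for auc in aucs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

def samples_needed(a, b, target_auc):
    """Smallest training size whose predicted AUC reaches target_auc."""
    return math.ceil((a / (1.0 - target_auc)) ** (1.0 / b))

# Synthetic learning curve, not data from the paper.
a, b = fit_power_law([25, 100, 400], [0.90, 0.95, 0.975])
print(a, b, samples_needed(a, b, 0.98))
```

The extrapolation is only as good as the assumed curve shape, so in practice one would refit as more labels arrive and treat the predicted budget as a rough estimate.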

[CV-23] ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments?

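【Quick read】: This paper addresses the limited ability of current multimodal large language models (MLLMs) to understand the immersive environments captured by omnidirectional images (ODIs). ODIs provide a full 360°×180° field of view and are widely used in virtual reality (VR), augmented reality (AR), and embodied intelligence, yet MLLM comprehension of them remains largely unexplored. To fill this gap, the authors present ODI-Bench, a comprehensive benchmark with 2,000 high-quality ODIs and over 4,000 manually annotated question-answer pairs across 10 fine-grained tasks, covering both general-level and spatial-level understanding. Experiments show that mainstream MLLMs still struggle on ODIs. The paper therefore also proposes Omni-CoT, a training-free method that significantly improves MLLM understanding of omnidirectional environments through chain-of-thought (CoT) reasoning across textual information and visual cues.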

Link: https://arxiv.org/abs/2510.11549
Authors: Liu Yang, Huiyu Duan, Ran Tao, Juntao Cheng, Sijing Wu, Yunhao Li, Jing Liu, Xiongkuo Min, Guangtao Zhai
Affiliations: Shanghai Jiao Tong University; Shanghai AI Laboratory; Xinjiang University; Tianjin University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Omnidirectional images (ODIs) provide a full 360°×180° view and are widely adopted in VR, AR, and embodied intelligence applications. While multi-modal large language models (MLLMs) have demonstrated remarkable performance on conventional 2D image and video understanding benchmarks, their ability to comprehend the immersive environments captured by ODIs remains largely unexplored. To address this gap, we first present ODI-Bench, a novel comprehensive benchmark specifically designed for omnidirectional image understanding. ODI-Bench contains 2,000 high-quality omnidirectional images and over 4,000 manually annotated question-answering (QA) pairs across 10 fine-grained tasks, covering both general-level and spatial-level ODI understanding. Extensive experiments are conducted to benchmark 20 representative MLLMs, including proprietary and open-source models, under both closed-ended and open-ended settings. Experimental results reveal that current MLLMs still struggle to capture the immersive context provided by ODIs. To this end, we further introduce Omni-CoT, a training-free method which significantly enhances MLLMs’ comprehension ability in the omnidirectional environment through chain-of-thought reasoning across both textual information and visual cues. Both the benchmark and the code will be released upon publication.

[CV-24] Massive Activations are the Key to Local Detail Synthesis in Diffusion Transformers

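【Quick read】: This paper aims to improve local detail synthesis in Diffusion Transformers (DiTs). Although DiTs have become a powerful generative backbone, their internal feature maps contain Massive Activations (MAs) whose function has been poorly understood. The study finds that the distribution of MAs is modulated by the input timestep embeddings and that MAs play a key role in local detail synthesis while having minimal impact on overall semantic content. The key to the solution is Detail Guidance (DG), a training-free self-guidance strategy: it builds a degraded, “detail-deficient” model by disrupting MAs and uses it to guide the original network toward higher detail fidelity. DG integrates seamlessly with Classifier-Free Guidance (CFG) for further refinement, and experiments show consistent gains in detail quality across several pre-trained DiTs (e.g., SD3, SD3.5, and Flux).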

Link: https://arxiv.org/abs/2510.11538
Authors: Chaofan Gan, Zicheng Zhao, Yuanpeng Tu, Xi Chen, Ziran Qin, Tieyuan Chen, Mehrtash Harandi, Weiyao Lin
Affiliations: Shanghai Jiao Tong University; Monash University; The University of Hong Kong
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Diffusion Transformers (DiTs) have recently emerged as a powerful backbone for visual generation. Recent observations reveal Massive Activations (MAs) in their internal feature maps, yet their function remains poorly understood. In this work, we systematically investigate these activations to elucidate their role in visual generation. We found that these massive activations occur across all spatial tokens, and their distribution is modulated by the input timestep embeddings. Importantly, our investigations further demonstrate that these massive activations play a key role in local detail synthesis, while having minimal impact on the overall semantic content of output. Building on these insights, we propose Detail Guidance (DG), a MAs-driven, training-free self-guidance strategy to explicitly enhance local detail fidelity for DiTs. Specifically, DG constructs a degraded “detail-deficient” model by disrupting MAs and leverages it to guide the original network toward higher-quality detail synthesis. Our DG can seamlessly integrate with Classifier-Free Guidance (CFG), enabling further refinements of fine-grained details. Extensive experiments demonstrate that our DG consistently improves fine-grained detail quality across various pre-trained DiTs (e.g., SD3, SD3.5, and Flux).
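Guiding a network with a degraded copy of itself is usually written as an extrapolation away from the degraded prediction, in the same spirit as CFG extrapolating away from the unconditional one. The sketch below shows one plausible way the two terms could be combined. The additive form, the scale names `cfg_scale` and `dg_scale`, and the function itself are assumptions, as the abstract gives no formula.

```python
def guided_prediction(eps_cond, eps_uncond, eps_degraded,
                      cfg_scale=5.0, dg_scale=1.0):
    """Combine CFG with a detail-guidance term, element by element.

    eps_cond:     prediction of the original network with the prompt.
    eps_uncond:   prediction of the original network without the prompt.
    eps_degraded: prediction of the "detail-deficient" network (MAs disrupted).
    Assumed rule: uncond + cfg_scale * (cond - uncond) + dg_scale * (cond - degraded).
    """
    return [
        u + cfg_scale * (c - u) + dg_scale * (c - d)
        for c, u, d in zip(eps_cond, eps_uncond, eps_degraded)
    ]

# With dg_scale=0 the rule reduces to plain classifier-free guidance.
print(guided_prediction([1.0, 2.0], [0.0, 1.0], [0.5, 2.0], cfg_scale=2.0, dg_scale=0.0))
```

Where the degraded model agrees with the original (second element above), the detail term vanishes, so under this assumed rule the extra push is concentrated on the parts of the prediction that the massive activations actually influence.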

[CV-25] mmWalk: Towards Multi-modal Multi-view Walking Assistance NEURIPS2025

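【Quick read】: This paper addresses walking assistance for people with blindness or low vision (BLV) in extreme or complex environments, where the main bottleneck is the lack of holistic scene understanding. The authors build mmWalk, a simulated multi-modal dataset whose key contribution is the integration of multi-view sensor data with accessibility-oriented features. It comprises 120 manually controlled, scenario-categorized walking trajectories with 62k synchronized frames and over 559k panoramic images across RGB, depth, and semantic modalities, and each trajectory includes outdoor corner cases and accessibility-specific landmarks for BLV users. The authors also construct mmWalkVQA, a visual question answering benchmark with over 69k triplets, to evaluate risk assessment and navigation. State-of-the-art Vision-Language Models (VLMs) struggle in zero-shot and few-shot settings, while a model fine-tuned on mmWalk is validated on real-world datasets, which demonstrates the dataset's value for advancing multi-modal walking assistance.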

Link: https://arxiv.org/abs/2510.11520
Authors: Kedi Ying, Ruiping Liu, Chongyan Chen, Mingzhe Tao, Hao Shi, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen
Affiliations: KIT; Center for Digital Accessibility and Assistive Technology (ACCESS@KIT); Hunan University; ETH Zurich; University of Texas at Austin; Zhejiang University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by NeurIPS 2025 Datasets and Benchmarks Track. Data and Code: this https URL

Abstract:Walking assistance in extreme or complex environments remains a significant challenge for people with blindness or low vision (BLV), largely due to the lack of a holistic scene understanding. Motivated by the real-world needs of the BLV community, we build mmWalk, a simulated multi-modal dataset that integrates multi-view sensor and accessibility-oriented features for outdoor safe navigation. Our dataset comprises 120 manually controlled, scenario-categorized walking trajectories with 62k synchronized frames. It contains over 559k panoramic images across RGB, depth, and semantic modalities. Furthermore, to emphasize real-world relevance, each trajectory involves outdoor corner cases and accessibility-specific landmarks for BLV users. Additionally, we generate mmWalkVQA, a VQA benchmark with over 69k visual question-answer triplets across 9 categories tailored for safe and informed walking assistance. We evaluate state-of-the-art Vision-Language Models (VLMs) using zero- and few-shot settings and find that they struggle with our risk assessment and navigational tasks. We validate our mmWalk-finetuned model on real-world datasets and show the effectiveness of our dataset for advancing multi-modal walking assistance.

[CV-26] LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference

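【Quick read】: This paper addresses how to accurately evaluate intuitive physics understanding in video diffusion models, that is, how to tell physically plausible content from implausible content without being misled by visual appearance. The core difficulty is disentangling physical correctness from visual realism. The key to the solution is LikePhys, a training-free evaluation method that uses the denoising objective as an ELBO-based likelihood surrogate and compares it on a curated dataset of valid-invalid video pairs to quantify a model's preference for physically valid videos. The resulting metric, Plausibility Preference Error (PPE), is evaluated on twelve scenarios spanning four physics domains, aligns strongly with human preference, and outperforms state-of-the-art evaluator baselines.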

Link: https://arxiv.org/abs/2510.11512
Authors: Jianhao Yuan, Fabio Pizzati, Francesco Pinto, Lars Kunze, Ivan Laptev, Paul Newman, Philip Torr, Daniele De Martini
Affiliations: University of Oxford; MBZUAI; University of Chicago; UWE Bristol
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 22 pages, 9 figures

Abstract:Intuitive physics understanding in video diffusion models plays an essential role in building general-purpose physically plausible world simulators, yet accurately evaluating such capacity remains a challenging task due to the difficulty in disentangling physics correctness from visual appearance in generation. To this end, we introduce LikePhys, a training-free method that evaluates intuitive physics in video diffusion models by distinguishing physically valid and impossible videos using the denoising objective as an ELBO-based likelihood surrogate on a curated dataset of valid-invalid pairs. By testing on our constructed benchmark of twelve scenarios spanning four physics domains, we show that our evaluation metric, Plausibility Preference Error (PPE), demonstrates strong alignment with human preference, outperforming state-of-the-art evaluator baselines. We then systematically benchmark intuitive physics understanding in current video diffusion models. Our study further analyses how model design and inference settings affect intuitive physics understanding and highlights domain-specific capacity variations across physical laws. Empirical results show that, despite current models struggling with complex and chaotic dynamics, there is a clear trend of improvement in physics understanding as model capacity and inference settings scale.
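A preference metric over valid-invalid pairs can be sketched in a few lines. In the version below, a pair counts as an error when the model's denoising loss on the physically valid video is not lower than on the invalid one, since a lower loss stands in for a higher likelihood. The tie handling, the plain averaging over pairs, and the function name are assumptions, and the paper's exact definition of PPE may differ.

```python
def plausibility_preference_error(valid_losses, invalid_losses):
    """Fraction of pairs where the valid video is NOT preferred.

    valid_losses[i] / invalid_losses[i]: denoising losses of the same model
    on the physically valid and the physically impossible video of pair i.
    Assumption: ties count as errors.
    """
    if len(valid_losses) != len(invalid_losses) or not valid_losses:
        raise ValueError("need two non-empty lists of equal length")
    errors = sum(1 for v, i in zip(valid_losses, invalid_losses) if v >= i)
    return errors / len(valid_losses)

# Toy losses, not measurements from any model.
print(plausibility_preference_error([0.10, 0.30, 0.20, 0.25],
                                    [0.20, 0.25, 0.40, 0.25]))
```

Because both videos in a pair share appearance and differ only in whether the physics is valid, comparing losses within a pair cancels much of the visual-appearance effect that a raw likelihood score would pick up.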

[CV-27] Situat3DChange: Situated 3D Change Understanding Dataset for Multimodal Large Language Model NEURIPS2025

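【Quick read】: This paper addresses the insufficient treatment of dynamic environments and situation awareness in current 3D scene understanding: existing datasets and benchmarks tend to cover either dynamic scenes or dynamic situations in isolation, which gives an incomplete picture of real-world change. The core of the solution is the Situat3DChange dataset, which supports three situation-aware change understanding tasks following the perception-action model (question answering, change description, and rearrangement instruction). It builds on 11K human observations of environmental changes, enriched with egocentric and allocentric perspectives as well as categorical and coordinate spatial relations and integrated with a large language model (LLM), to establish shared mental models and shared situational awareness for human-AI collaboration. To compare pairs of point clouds of the same scene that differ only slightly, the paper further proposes SCReasoner, an efficient 3D multimodal large language model (MLLM) approach that adds minimal parameter overhead and requires no additional tokens for the language decoder.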

Link: https://arxiv.org/abs/2510.11509
Authors: Ruiping Liu, Junwei Zheng, Yufan Chen, Zirui Wang, Kunyu Peng, Kailun Yang, Jiaming Zhang, Marc Pollefeys, Rainer Stiefelhagen
Affiliations: Karlsruhe Institute of Technology (KIT); Hunan University; ETH Zurich
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted to NeurIPS 2025 Datasets and Benchmarks Track. Dataset and Code: this https URL

Abstract:Physical environments and circumstances are fundamentally dynamic, yet current 3D datasets and evaluation benchmarks tend to concentrate on either dynamic scenarios or dynamic situations in isolation, resulting in incomplete comprehension. To overcome these constraints, we introduce Situat3DChange, an extensive dataset supporting three situation-aware change understanding tasks following the perception-action model: 121K question-answer pairs, 36K change descriptions for perception tasks, and 17K rearrangement instructions for the action task. To construct this large-scale dataset, Situat3DChange leverages 11K human observations of environmental changes to establish shared mental models and shared situational awareness for human-AI collaboration. These observations, enriched with egocentric and allocentric perspectives as well as categorical and coordinate spatial relations, are integrated using an LLM to support understanding of situated changes. To address the challenge of comparing pairs of point clouds from the same scene with minor changes, we propose SCReasoner, an efficient 3D MLLM approach that enables effective point cloud comparison with minimal parameter overhead and no additional tokens required for the language decoder. Comprehensive evaluation on Situat3DChange tasks highlights both the progress and limitations of MLLMs in dynamic scene and situation understanding. Additional experiments on data scaling and cross-domain transfer demonstrate the task-agnostic effectiveness of using Situat3DChange as a training dataset for MLLMs.

[CV-28] Towards Fast and Scalable Normal Integration using Continuous Components WACV

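【Quick read】: This paper addresses surface normal integration, that is, reconstructing a 3D surface (depth) from a given normal map. Existing methods rely on iterative global optimization that jointly estimates the depth of every pixel, which scales poorly to high-resolution normal maps. The key to the solution is recasting normal integration as the estimation of relative scales of continuous components: constraining the pixels of one component to vary their scale jointly drastically reduces the number of optimization variables. The framework also includes a heuristic for accurately estimating continuous components from the start, a strategy for rebalancing optimization terms, and a technique for iteratively merging components, which together yield a one-order-of-magnitude speedup over pixel-level approaches on large normal maps while maintaining accuracy.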

Link: https://arxiv.org/abs/2510.11508
Authors: Francesco Milano, Jen Jen Chung, Lionel Ott, Roland Siegwart
Affiliations: ETH Zurich; The University of Queensland
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026, first round. 17 pages, 9 figures, 6 tables

Abstract:Surface normal integration is a fundamental problem in computer vision, dealing with the objective of reconstructing a surface from its corresponding normal map. Existing approaches require an iterative global optimization to jointly estimate the depth of each pixel, which scales poorly to larger normal maps. In this paper, we address this problem by recasting normal integration as the estimation of relative scales of continuous components. By constraining pixels belonging to the same component to jointly vary their scale, we drastically reduce the number of optimization variables. Our framework includes a heuristic to accurately estimate continuous components from the start, a strategy to rebalance optimization terms, and a technique to iteratively merge components to further reduce the size of the problem. Our method achieves state-of-the-art results on the standard normal integration benchmark in as little as a few seconds and achieves one-order-of-magnitude speedup over pixel-level approaches on large-resolution normal maps.
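To see why grouping pixels shrinks the problem, consider labeling a normal map into connected regions of similar orientation: the optimizer then needs one scale variable per region instead of one depth per pixel. The sketch below uses a simple angle threshold and union-find. That threshold rule and the function name are stand-ins chosen for illustration, not the heuristic proposed in the paper.

```python
import math

def continuous_components(normals, max_angle_deg=30.0):
    """Label 4-connected pixels whose unit normals differ by < max_angle_deg.

    normals: 2-D grid (list of rows) of unit-length (nx, ny, nz) tuples.
    Returns a grid of integer component labels starting at 0.
    """
    h, w = len(normals), len(normals[0])
    parent = list(range(h * w))

    def find(i):
        # Path-halving find for the union-find structure.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    cos_threshold = math.cos(math.radians(max_angle_deg))

    def similar(a, b):
        # Dot product of unit normals equals the cosine of their angle.
        return sum(x * y for x, y in zip(a, b)) >= cos_threshold

    for r in range(h):
        for c in range(w):
            if c + 1 < w and similar(normals[r][c], normals[r][c + 1]):
                union(r * w + c, r * w + c + 1)
            if r + 1 < h and similar(normals[r][c], normals[r + 1][c]):
                union(r * w + c, (r + 1) * w + c)

    labels = {}
    return [[labels.setdefault(find(r * w + c), len(labels)) for c in range(w)]
            for r in range(h)]

up, side = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
print(continuous_components([[up, up, side], [up, up, side]]))
```

In this toy grid six pixels collapse into two components, so a per-component formulation would optimize two unknowns rather than six. On megapixel normal maps the same reduction is what makes a large speedup plausible.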

[CV-29] AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model

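【Quick read】: This paper addresses the difficulty of deploying multimodal large language models (MLLMs) on edge devices such as mobile phones, where the large parameter counts of cloud models exceed the available memory, power, and compute. The key to the solution is AndesVL, a suite of lightweight mobile-side MLLMs with 0.6B to 4B parameters, built on Qwen3's language models and various visual encoders. With its model architectures, training pipeline, and training data, AndesVL achieves first-tier performance on a wide range of open-source benchmarks compared with state-of-the-art models of similar scale, covering text-rich image understanding, reasoning and math, multi-image comprehension, general visual question answering (VQA), hallucination mitigation, multilingual understanding, and GUI-related tasks. The paper also introduces a 1+N LoR (Low-Rank Adaptation) mechanism intended to improve fine-tuning efficiency and adaptability in mobile scenarios.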

Link: https://arxiv.org/abs/2510.11496
Authors: Zhiwei Jin, Xiaohui Song, Nan Wang, Yafei Liu, Chao Li, Xin Li, Ruichen Wang, Zhihao Li, Qi Qi, Long Cheng, Dongze Hao, Quanlong Zheng, Yanhao Zhang, Haobo Ji, Jian Ma, Zhitong Zheng, Zhenyi Lin, Haolin Deng, Xin Zou, Xiaojie Yin, Ruilin Wang, Liankai Cai, Haijing Liu, Yuqing Qiu, Ke Chen, Zixian Li, Chi Xie, Huafei Li, Chenxing Li, Chuangchuang Wang, Kai Tang, Zhiguang Zhu, Kai Tang, Wenmei Gao, Rui Wang, Jun Wu, Chao Liu, Qin Xie, Chen Chen, Haonan Lu
Affiliations: OPPO AI Center
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Tech report of OPPO AndesVL Team

Abstract:In recent years, while cloud-based MLLMs such as QwenVL, InternVL, GPT-4o, Gemini, and Claude Sonnet have demonstrated outstanding performance with enormous model sizes reaching hundreds of billions of parameters, they significantly surpass the limitations in memory, power consumption, and computing capacity of edge devices such as mobile phones. This paper introduces AndesVL, a suite of mobile-side MLLMs with 0.6B to 4B parameters based on Qwen3’s LLM and various visual encoders. We comprehensively outline the model architectures, training pipeline, and training data of AndesVL, which achieves first-tier performance across a wide range of open-source benchmarks, including fields such as text-rich image understanding, reasoning and math, multi-image comprehension, general VQA, hallucination mitigation, multilingual understanding, and GUI-related tasks when compared with state-of-the-art models of a similar scale. Furthermore, we introduce a 1+N LoR

[CV-30] VA-GS: Enhancing the Geometric Representation of Gaussian Splatting via View Alignment NEURIPS2025

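【Quick read】: This paper addresses the limited geometric accuracy of 3D Gaussian Splatting for surface reconstruction. Because Gaussians are discrete and unstructured, supervision by image rendering loss alone leads to inaccurate geometry and inconsistent multi-view alignment. The key to the solution is a set of complementary constraints that strengthen geometric consistency: edge-aware image cues in the rendering loss improve surface boundary delineation; a visibility-aware photometric alignment loss models occlusions and encourages accurate spatial relationships among Gaussians; normal-based constraints mitigate ambiguities caused by lighting variations and refine local surface estimation; and deep image feature embeddings enforce cross-view consistency. Together these improve both surface reconstruction and novel view synthesis.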

Link: https://arxiv.org/abs/2510.11473
Authors: Qing Li, Huifang Feng, Xun Gong, Yu-Shen Liu
Affiliations: Southwest Jiaotong University; Xihua University; Tsinghua University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by NeurIPS 2025

Abstract:3D Gaussian Splatting has recently emerged as an efficient solution for high-quality and real-time novel view synthesis. However, its capability for accurate surface reconstruction remains underexplored. Due to the discrete and unstructured nature of Gaussians, supervision based solely on image rendering loss often leads to inaccurate geometry and inconsistent multi-view alignment. In this work, we propose a novel method that enhances the geometric representation of 3D Gaussians through view alignment (VA). Specifically, we incorporate edge-aware image cues into the rendering loss to improve surface boundary delineation. To enforce geometric consistency across views, we introduce a visibility-aware photometric alignment loss that models occlusions and encourages accurate spatial relationships among Gaussians. To further mitigate ambiguities caused by lighting variations, we incorporate normal-based constraints to refine the spatial orientation of Gaussians and improve local surface estimation. Additionally, we leverage deep image feature embeddings to enforce cross-view consistency, enhancing the robustness of the learned geometry under varying viewpoints and illumination. Extensive experiments on standard benchmarks demonstrate that our method achieves state-of-the-art performance in both surface reconstruction and novel view synthesis. The source code is available at this https URL.

[CV-31] Coupled Degradation Modeling and Fusion: A VLM-Guided Degradation-Coupled Network for Degradation-Aware Infrared and Visible Image Fusion

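【Quick read】: This paper addresses the sharp performance drop of existing Infrared and Visible Image Fusion (IVIF) methods on degraded images. Conventional methods assume high-quality inputs and rely on manually switching between pre-processing techniques for different degradations, which decouples degradation handling from fusion and adapts poorly to complex degradation scenarios. The key to the solution is the VLM-Guided Degradation-Coupled Fusion network (VGDCFusion), guided by vision-language models (VLMs) and built from two core modules: (1) the Specific-Prompt Degradation-Coupled Extractor (SPDCE), which provides modality-specific degradation awareness and jointly models degradation suppression and intra-modal feature extraction; and (2) the Joint-Prompt Degradation-Coupled Fusion (JPDCF) module, which facilitates cross-modal degradation perception and couples residual degradation filtering with complementary cross-modal feature fusion. This tight coupling of degradation modeling and fusion lets the method significantly outperform existing state-of-the-art approaches under various degradation conditions.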

Link: https://arxiv.org/abs/2510.11456
Authors: Tianpei Zhang, Jufeng Zhao, Yiming Zhu, Guangmang Cui
Affiliations: unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Existing Infrared and Visible Image Fusion (IVIF) methods typically assume high-quality inputs. However, when handling degraded images, these methods heavily rely on manually switching between different pre-processing techniques. This decoupling of degradation handling and image fusion leads to significant performance degradation. In this paper, we propose a novel VLM-Guided Degradation-Coupled Fusion network (VGDCFusion), which tightly couples degradation modeling with the fusion process and leverages vision-language models (VLMs) for degradation-aware perception and guided suppression. Specifically, the proposed Specific-Prompt Degradation-Coupled Extractor (SPDCE) enables modality-specific degradation awareness and establishes a joint modeling of degradation suppression and intra-modal feature extraction. In parallel, the Joint-Prompt Degradation-Coupled Fusion (JPDCF) facilitates cross-modal degradation perception and couples residual degradation filtering with complementary cross-modal feature fusion. Extensive experiments demonstrate that our VGDCFusion significantly outperforms existing state-of-the-art fusion approaches under various degraded image scenarios. Our code is available at this https URL.

[CV-32] Enhancing Maritime Domain Awareness on Inland Waterways: A YOLO-Based Fusion of Satellite and AIS for Vessel Characterization

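【Quick read】: This paper addresses the vulnerabilities of Maritime Domain Awareness (MDA) on inland waterways that stem from reliance on cooperative systems such as the Automatic Identification System (AIS). AIS-based monitoring misses vessels that switch off their transponders or carry none, so “dark vessels” go undetected and the picture of waterway activity is incomplete. The key to the solution is fusing non-cooperative high-resolution satellite imagery with AIS trajectory data: a YOLO v11 object detection model characterizes vessels and barges by vessel type, barge cover, operational status, barge count, and direction of travel, and visual detections are linked with AIS data to identify dark vessels, validate cooperative traffic, and support advanced MDA. The approach supports near-real-time fleet inventories and transfers well across geographically disjoint river segments, with accuracy maintained as high as 98%.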

Link: https://arxiv.org/abs/2510.11449
Authors: Geoffery Agorku, Sarah Hernandez, Hayley Hames, Cade Wagner
Affiliations: unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Maritime Domain Awareness (MDA) for inland waterways remains challenged by cooperative system vulnerabilities. This paper presents a novel framework that fuses high-resolution satellite imagery with vessel trajectory data from the Automatic Identification System (AIS). This work addresses the limitations of AIS-based monitoring by leveraging non-cooperative satellite imagery and implementing a fusion approach that links visual detections with AIS data to identify dark vessels, validate cooperative traffic, and support advanced MDA. The You Only Look Once (YOLO) v11 object detection model is used to detect and characterize vessels and barges by vessel type, barge cover, operational status, barge count, and direction of travel. An annotated data set of 4,550 instances was developed from 5,973 mi² of Lower Mississippi River imagery. Evaluation on a held-out test set demonstrated vessel classification (tugboat, crane barge, bulk carrier, cargo ship, and hopper barge) with an F1 score of 95.8%; barge cover (covered or uncovered) detection yielded an F1 score of 91.6%; operational status (staged or in motion) classification reached an F1 score of 99.4%. Directionality (upstream, downstream) yielded 93.8% accuracy. The barge count estimation resulted in a mean absolute error (MAE) of 2.4 barges. Spatial transferability analysis across geographically disjoint river segments showed accuracy was maintained as high as 98%. These results underscore the viability of integrating non-cooperative satellite sensing with AIS fusion. This approach enables near-real-time fleet inventories, supports anomaly detection, and generates high-quality data for inland waterway surveillance. Future work will expand annotated datasets, incorporate temporal tracking, and explore multi-modal deep learning to further enhance operational scalability.
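Linking visual detections with AIS reports comes down to a spatial association step: a detection with no AIS report nearby is a dark-vessel candidate. Here is a minimal sketch under stated assumptions: greedy nearest-neighbour matching, a 200 m gate, and AIS positions already interpolated to the image acquisition time. None of these details come from the abstract, and the function names are hypothetical.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

def flag_dark_vessels(detections, ais_positions, max_distance_m=200.0):
    """Return indices of detections with no unused AIS report within the gate.

    detections, ais_positions: lists of (lat, lon) tuples.
    Greedy matching: each AIS report can explain at most one detection.
    """
    unused = set(range(len(ais_positions)))
    dark = []
    for idx, (lat, lon) in enumerate(detections):
        best, best_distance = None, max_distance_m
        for j in unused:
            distance = haversine_m(lat, lon, *ais_positions[j])
            if distance <= best_distance:
                best, best_distance = j, distance
        if best is None:
            dark.append(idx)      # nothing cooperative nearby: dark-vessel candidate
        else:
            unused.remove(best)   # this AIS report is now accounted for
    return dark

# Toy coordinates, not data from the paper.
print(flag_dark_vessels([(32.3000, -90.9000), (32.3100, -90.9100)],
                        [(32.3001, -90.9001)]))
```

AIS reports left in `unused` at the end are the converse case, cooperative traffic that the imagery did not confirm. Greedy matching is order-dependent, so a production system would more likely solve a global assignment problem.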

[CV-33] Robust Ego-Exo Correspondence with Long-Term Memory NEURIPS2025

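【Quick read】: This paper addresses object-level correspondence between egocentric (first-person) and exocentric (third-person) views, the ego-exo correspondence (EEC) task, which is essential for intelligent assistants that deliver precise and intuitive visual guidance. Existing methods mostly borrow from video object segmentation models and struggle with extreme viewpoint changes, occlusions, and small objects. The authors propose LM-EEC, a framework built on Segment Anything Model 2 (SAM 2), with two key innovations: (i) a Memory-View MoE module with a dual-branch routing mechanism that adaptively assigns contribution weights to each expert feature along both channel and spatial dimensions, for more effective cross-view feature fusion; and (ii) a dual-memory bank system with a compression strategy that retains critical long-term information while eliminating redundancy, which strengthens memory for long videos. The method achieves new state-of-the-art results on the EgoExo4D benchmark.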

Link: https://arxiv.org/abs/2510.11417
Authors: Yijun Hu, Bing Fan, Xin Gu, Haiqing Ren, Dongfang Liu, Heng Fan, Libo Zhang
Affiliations: University of Chinese Academy of Sciences; University of North Texas; Institute of Software Chinese Academy of Sciences; Rochester Institute of Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by NeurIPS 2025

Abstract:Establishing object-level correspondence between egocentric and exocentric views is essential for intelligent assistants to deliver precise and intuitive visual guidance. However, this task faces numerous challenges, including extreme viewpoint variations, occlusions, and the presence of small objects. Existing approaches usually borrow solutions from video object segmentation models, but still suffer from the aforementioned challenges. Recently, the Segment Anything Model 2 (SAM 2) has shown strong generalization capabilities and excellent performance in video object segmentation. Yet, when simply applied to the ego-exo correspondence (EEC) task, SAM 2 encounters severe difficulties due to ineffective ego-exo feature fusion and limited long-term memory capacity, especially for long videos. Addressing these problems, we propose a novel EEC framework based on SAM 2 with long-term memories by presenting a dual-memory architecture and an adaptive feature routing module inspired by Mixture-of-Experts (MoE). Compared to SAM 2, our approach features (i) a Memory-View MoE module which consists of a dual-branch routing mechanism to adaptively assign contribution weights to each expert feature along both channel and spatial dimensions, and (ii) a dual-memory bank system with a simple yet effective compression strategy to retain critical long-term information while eliminating redundancy. In the extensive experiments on the challenging EgoExo4D benchmark, our method, dubbed LM-EEC, achieves new state-of-the-art results and significantly outperforms existing methods and the SAM 2 baseline, showcasing its strong generalization across diverse scenarios. Our code and model are available at this https URL.
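Assigning contribution weights to expert features is, in its simplest form, soft Mixture-of-Experts routing: a softmax over per-expert gate scores followed by a weighted sum. The sketch below shows that for the channel dimension only, and the spatial branch would apply the same idea per location. The gate scores are taken as given inputs. How LM-EEC computes them and combines its two branches is not described in the abstract, so everything here is an illustrative assumption.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_experts(expert_features, gate_scores):
    """Fuse expert features channel by channel with soft routing weights.

    expert_features[k][c]: feature value of expert k at channel c.
    gate_scores[k][c]:     routing logit for expert k at channel c (hypothetical input).
    """
    n_experts = len(expert_features)
    n_channels = len(expert_features[0])
    fused = []
    for c in range(n_channels):
        weights = softmax([gate_scores[k][c] for k in range(n_experts)])
        fused.append(sum(w * expert_features[k][c] for k, w in enumerate(weights)))
    return fused

# Equal logits give a plain average of the two experts.
print(fuse_experts([[1.0, 0.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]))
```

Routing per channel rather than per expert lets the fusion lean on one view for some feature channels and on the other view for the rest, which is the kind of adaptivity an ego-exo setting with large viewpoint gaps would need.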

[CV-34] MaterialRefGS: Reflective Gaussian Splatting with Multi-view Consistent Material Inference NEURIPS2025

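Quick read: This paper addresses illumination aliasing and weak generalization in reflection modeling with Gaussian Splatting (GS), particularly when environment modeling is limited. The key idea is to start from multi-view consistency: 2D Gaussians are forced to produce multi-view consistent material maps during deferred shading, and photometric variations across views are used to identify highly reflective regions, which provide a strong prior on reflection strength. A 2DGS-based ray-tracing environment modeling strategy additionally handles indirect illumination caused by inter-object occlusions, yielding more realistic reflections and global illumination.

Link: https://arxiv.org/abs/2510.11387
Authors: Wenyuan Zhang,Jimin Tang,Weiqi Zhang,Yi Fang,Yu-Shen Liu,Zhizhong Han
Affiliations: Tsinghua University; NYU Abu Dhabi; Wayne State University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by NeurIPS 2025. Project Page: this https URL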

Abstract:Modeling reflections from 2D images is essential for photorealistic rendering and novel view synthesis. Recent approaches enhance Gaussian primitives with reflection-related material attributes to enable physically based rendering (PBR) with Gaussian Splatting. However, the material inference often lacks sufficient constraints, especially under limited environment modeling, resulting in illumination aliasing and reduced generalization. In this work, we revisit the problem from a multi-view perspective and show that multi-view consistent material inference with more physically-based environment modeling is key to learning accurate reflections with Gaussian Splatting. To this end, we enforce 2D Gaussians to produce multi-view consistent material maps during deferred shading. We also track photometric variations across views to identify highly reflective regions, which serve as strong priors for reflection strength terms. To handle indirect illumination caused by inter-object occlusions, we further introduce an environment modeling strategy through ray tracing with 2DGS, enabling photorealistic rendering of indirect radiance. Experiments on widely used benchmarks show that our method faithfully recovers both illumination and geometry, achieving state-of-the-art rendering quality in novel views synthesis.

[CV-35] Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment

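Quick read: This paper addresses the high energy use and latency that reasoning-based image quality assessment (IQA) models incur in deployment, and explains where their strong generalization comes from. Multimodal large language models (MLLMs) trained with reinforcement learning (RL) generalize well across domains, but their inference depends on lengthy text generation and heavy model computation. The key finding is that RL training lets an MLLM convert redundant visual representations into compact, cross-domain aligned text representations, and that this conversion is the source of the generalization. Building on this, the authors propose RALI, which uses contrastive learning to align images directly with the generalizable text representations learned by RL. RALI needs neither a reasoning process nor a loaded large language model (LLM), and it matches the generalization of reasoning-based models with less than 5% of their parameters and inference time.

Link: https://arxiv.org/abs/2510.11369
Authors: Shijie Zhao,Xuanyu Zhang,Weiqi Li,Junlin Li,Li Zhang,Tianfan Xue,Jian Zhang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: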

Abstract:Reasoning-based image quality assessment (IQA) models trained through reinforcement learning (RL) exhibit exceptional generalization, yet the underlying mechanisms and critical factors driving this capability remain underexplored in current research. Moreover, despite their superior performance, these models incur inference energy usage and latency orders of magnitude higher than their earlier counterparts, restricting their deployment in specific scenarios. Through extensive experiments, this paper verifies and elaborates that through RL training, MLLMs leverage their reasoning capability to convert redundant visual representations into compact, cross-domain aligned text representations. This conversion is precisely the source of the generalization exhibited by these reasoning-based IQA models. Building on this fundamental insight, we propose a novel algorithm, RALI, which employs contrastive learning to directly align images with these generalizable text representations learned by RL. This approach eliminates the reliance on reasoning processes and even obviates the need to load an LLM. For the quality scoring task, this framework achieves generalization performance comparable to reasoning-based models while requiring less than 5% of their model parameters and inference time.

[CV-36] Uncertainty-Aware ControlNet: Bridging Domain Gaps with Synthetic Image Generation ICCV

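Quick read: This paper addresses a limitation of generative models for cross-domain data augmentation: controlled diffusion models such as ControlNet tend to reproduce the original training distribution and cannot readily use data from unlabeled domains to generate high-quality, semantically labeled synthetic images, which limits transfer to target domains such as low-quality Home-OCT images. The key is uncertainty-guided control: an uncertainty signal indicates whether an input image lies outside the training distribution of the downstream task (e.g., segmentation), so the ControlNet receives both semantic control from the labeled data and uncertainty control from the unlabeled data. The model can then generate labeled, high-uncertainty synthetic images from the target domain, enabling unsupervised domain adaptation that improves segmentation significantly without additional manual annotation or strict style-transfer learning.

Link: https://arxiv.org/abs/2510.11346
Authors: Joshua Niemeijer,Jan Ehrhardt,Heinz Handels,Hristina Uzunova
Affiliations: German Aerospace Center (DLR); University of Lübeck; German Research Center for Artificial Intelligence (DFKI)
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Accepted for presentation at ICCV Workshops 2025, “The 4th Workshop on What is Next in Multimodal Foundation Models?” (MMFM)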

Abstract:Generative Models are a valuable tool for the controlled creation of high-quality image data. Controlled diffusion models like the ControlNet have allowed the creation of labeled distributions. Such synthetic datasets can augment the original training distribution when discriminative models, like semantic segmentation, are trained. However, this augmentation effect is limited since ControlNets tend to reproduce the original training distribution. This work introduces a method to utilize data from unlabeled domains to train ControlNets by introducing the concept of uncertainty into the control mechanism. The uncertainty indicates that a given image was not part of the training distribution of a downstream task, e.g., segmentation. Thus, two types of control are engaged in the final network: an uncertainty control from an unlabeled dataset and a semantic control from the labeled dataset. The resulting ControlNet allows us to create annotated data with high uncertainty from the target domain, i.e., synthetic data from the unlabeled distribution with labels. In our scenario, we consider retinal OCTs, where typically high-quality Spectralis images are available with given ground truth segmentations, enabling the training of segmentation networks. The recent development in Home-OCT devices, however, yields retinal OCTs with lower quality and a large domain shift, such that out-of-the-pocket segmentation networks cannot be applied for this type of data. Synthesizing annotated images from the Home-OCT domain using the proposed approach closes this gap and leads to significantly improved segmentation results without adding any further supervision. The advantage of uncertainty-guidance becomes obvious when compared to style transfer: it enables arbitrary domain shifts without any strict learning of an image style. This is also demonstrated in a traffic scene experiment. 

[CV-37] MMAP: A Multi-Magnification and Prototype-Aware Architecture for Predicting Spatial Gene Expression PRICAI2025

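Quick read: This paper addresses two limitations in predicting spatial transcriptomics gene expression profiles from histology images (HE-stained whole-slide images, WSIs): insufficient granularity in local feature extraction and inadequate coverage of global spatial context. The proposed framework, MMAP (Multi-MAgnification and Prototype-enhanced architecture), uses multi-magnification patch representations to capture fine-grained local features, and learns a set of latent prototype embeddings that act as compact representations of slide-level information to improve global context. The design narrows the modality gap between visual features and molecular signals, and experiments show that MMAP outperforms existing state-of-the-art methods on MAE, MSE, and PCC.

Link: https://arxiv.org/abs/2510.11344
Authors: Hai Dang Nguyen,Nguyen Dang Huy Pham,TheMinh Duc Nguyen,Dac Thai Nguyen,Hang Thi Nguyen,Duong M. Nguyen
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted for presentation at the 2025 Pacific Rim International Conference on Artificial Intelligence (PRICAI 2025)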

Abstract:Spatial Transcriptomics (ST) enables the measurement of gene expression while preserving spatial information, offering critical insights into tissue architecture and disease pathology. Recent developments have explored the use of hematoxylin and eosin (HE)-stained whole-slide images (WSIs) to predict transcriptome-wide gene expression profiles through deep neural networks. This task is commonly framed as a regression problem, where each input corresponds to a localized image patch extracted from the WSI. However, predicting spatial gene expression from histological images remains a challenging problem due to the significant modality gap between visual features and molecular signals. Recent studies have attempted to incorporate both local and global information into predictive models. Nevertheless, existing methods still suffer from two key limitations: (1) insufficient granularity in local feature extraction, and (2) inadequate coverage of global spatial context. In this work, we propose a novel framework, MMAP (Multi-MAgnification and Prototype-enhanced architecture), that addresses both challenges simultaneously. To enhance local feature granularity, MMAP leverages multi-magnification patch representations that capture fine-grained histological details. To improve global contextual understanding, it learns a set of latent prototype embeddings that serve as compact representations of slide-level information. Extensive experimental results demonstrate that MMAP consistently outperforms all existing state-of-the-art methods across multiple evaluation metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and Pearson Correlation Coefficient (PCC).
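
The three metrics named in the abstract have standard definitions, sketched below in plain Python for a single gene measured across several spots. The sample values are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean Absolute Error between predicted and measured expression."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean Squared Error between predicted and measured expression."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pcc(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson Correlation Coefficient between the two sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in target))
    return cov / (std_p * std_t)


# Hypothetical expression values for one gene across four spots.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [1.5, 2.5, 2.5, 4.5]
print(mae(predicted, measured), mse(predicted, measured), pcc(predicted, measured))
```

Lower MAE and MSE and higher PCC indicate better predictions; PCC is undefined when either sequence is constant, which this sketch does not guard against.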

[CV-38] InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models

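Quick read: This paper addresses three challenges in general SVG (Scalable Vector Graphics) modeling: fragmented datasets, limited transfer of methods across tasks, and difficulty handling structural complexity. It uses the transfer and generalization capabilities of multimodal large language models (MLLMs) to build the InternSVG family, a unified suite for SVG understanding, editing, and generation. The main contributions are: (1) SAgoge, the largest and most comprehensive multimodal SVG dataset to date, covering static graphics and dynamic animations across diverse tasks; (2) SArena, a companion benchmark with standardized evaluation and task definitions; and (3) the InternSVG model, which uses SVG-specific special tokens, subword-based embedding initialization, and a two-stage training strategy that progresses from short static SVGs to long-sequence illustrations and complex animations, producing positive transfer and better overall performance.

Link: https://arxiv.org/abs/2510.11341
Authors: Haomin Wang,Jinhui Yin,Qi Wei,Wenguang Zeng,Lixin Gu,Shenglong Ye,Zhangwei Gao,Yaohui Wang,Yanting Zhang,Yuanqi Li,Yanwen Guo,Wenhai Wang,Kai Chen,Yu Qiao,Hongjie Zhang
Affiliations: Shanghai Jiao Tong University; Shanghai AI Laboratory; Nanjing University; Donghua University; The Chinese University of Hong Kong
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: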

Abstract:General SVG modeling remains challenging due to fragmented datasets, limited transferability of methods across tasks, and the difficulty of handling structural complexity. In response, we leverage the strong transfer and generalization capabilities of multimodal large language models (MLLMs) to achieve unified modeling for SVG understanding, editing, and generation. We present the InternSVG family, an integrated data-benchmark-model suite. At its core is SAgoge, the largest and most comprehensive multimodal dataset for SVG tasks, encompassing both static graphics and dynamic animations. It covers icons, long-sequence illustrations, scientific diagrams, and dynamic animations, supporting tasks of varied difficulty levels and providing deeper hierarchies with richer attributes compared to previous datasets. Based on this resource, we introduce SArena, a companion benchmark with comprehensive task definitions and standardized evaluation that aligns with the domains and difficulty spectrum covered by SAgoge. Building on these foundations, we propose InternSVG, a unified MLLM for SVG understanding, editing, and generation with SVG-specific special tokens, subword-based embedding initialization, and a two-stage training strategy that progresses from short static SVGs to long-sequence illustrations and complex animations. This unified formulation induces positive transfer and improves overall performance. Experiments on SArena and prior benchmark confirm that InternSVG achieves substantial gains and consistently outperforms leading open and proprietary counterparts.
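
One common way to realize subword-based embedding initialization is to set a new token's embedding to the mean of the embeddings of the subwords that spell it out. The sketch below assumes that averaging scheme; the abstract does not specify the exact rule, and the token `<path>`, its subword split, and the toy embedding table are all hypothetical.

```python
from typing import Dict, List


def init_from_subwords(subwords: List[str],
                       table: Dict[str, List[float]]) -> List[float]:
    """Average the embeddings of the subwords that spell out a new token."""
    vectors = [table[s] for s in subwords]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy 3-dimensional embedding table (hypothetical values).
embeddings: Dict[str, List[float]] = {
    "<": [0.0, 0.2, 0.4],
    "path": [0.6, 0.2, 0.0],
    ">": [0.0, 0.2, 0.2],
}

# Hypothetical SVG-specific special token, initialized from its subwords
# (assumed averaging rule, not confirmed by the paper).
embeddings["<path>"] = init_from_subwords(["<", "path", ">"], embeddings)
print(embeddings["<path>"])
```

Starting a new token near its constituent subwords, rather than at a random vector, gives the model a semantically plausible starting point before fine-tuning.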

[CV-39] REACT3D: Recovering Articulations for Interactive Physical 3D Scenes

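Quick read: This paper addresses the limited scale of interactive 3D scene datasets, which stems from the high cost of manually annotating part segmentation, kinematic types, and motion trajectories. The proposed REACT3D framework converts static 3D scenes into simulation-ready interactive scenes in a zero-shot manner: it detects and segments openable objects to find candidate movable parts, estimates joint types and motion parameters, completes hidden geometry, and integrates the result into mainstream simulation platforms in standardized formats. This improves the efficiency and compatibility of interactive scene generation and offers a practical path to large-scale research on articulated scene understanding.

Link: https://arxiv.org/abs/2510.11340
Authors: Zhao Huang,Boyang Sun,Alexandros Delitzas,Jiaqi Chen,Marc Pollefeys
Affiliations: ETH Zurich; Max Planck Institute for Informatics; Microsoft Spatial AI Lab
Categories: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Comments: 8 pages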

Abstract:Interactive 3D scenes are increasingly vital for embodied intelligence, yet existing datasets remain limited due to the labor-intensive process of annotating part segmentation, kinematic types, and motion trajectories. We present REACT3D, a scalable zero-shot framework that converts static 3D scenes into simulation-ready interactive replicas with consistent geometry, enabling direct use in diverse downstream tasks. Our contributions include: (i) openable-object detection and segmentation to extract candidate movable parts from static scenes, (ii) articulation estimation that infers joint types and motion parameters, (iii) hidden-geometry completion followed by interactive object assembly, and (iv) interactive scene integration in widely supported formats to ensure compatibility with standard simulation platforms. We achieve state-of-the-art performance on detection/segmentation and articulation metrics across diverse indoor scenes, demonstrating the effectiveness of our framework and providing a practical foundation for scalable interactive scene generation, thereby lowering the barrier to large-scale research on articulated scene understanding. Our project page is this https URL.
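
To make "joint types and motion parameters" concrete, the sketch below applies the two most common articulation types to a single point: a revolute joint (rotation about a hinge line, via Rodrigues' formula) and a prismatic joint (translation along an axis). This is generic kinematics rather than the paper's estimation method, and the function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def revolute(point: Vec3, origin: Vec3, axis: Vec3, angle: float) -> Vec3:
    """Rotate `point` by `angle` radians about the line through `origin` along `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = tuple(a / norm for a in axis)                  # unit hinge direction
    v = tuple(p - o for p, o in zip(point, origin))    # point relative to the hinge
    cross = (k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0])
    dot = sum(ki * vi for ki, vi in zip(k, v))
    c, s = math.cos(angle), math.sin(angle)
    # Rodrigues' rotation formula: v*cos + (k x v)*sin + k*(k.v)*(1 - cos)
    rotated = tuple(v[i] * c + cross[i] * s + k[i] * dot * (1 - c) for i in range(3))
    return tuple(r + o for r, o in zip(rotated, origin))


def prismatic(point: Vec3, axis: Vec3, distance: float) -> Vec3:
    """Slide `point` by `distance` along the direction of `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    return tuple(p + distance * a / norm for p, a in zip(point, axis))


# A door corner 1 unit from a vertical hinge at x = 1, opened by 90 degrees.
door_corner = revolute((2.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2)
# A drawer handle pulled out 0.3 units along +y.
drawer_handle = prismatic((0.0, 0.0, 0.0), (0.0, 2.0, 0.0), 0.3)
print(door_corner, drawer_handle)
```

A revolute joint is fully described by an axis origin, an axis direction, and an angle range, while a prismatic joint needs only a direction and a travel distance.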

[CV-40] Evaluating the effects of preprocessing method selection and hyperparameter tuning on SAR-based flood mapping and water depth estimation

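Quick read: This paper addresses the uncertainty in flood mapping and water depth estimation from Synthetic Aperture Radar (SAR) imagery that arises from the choice of method and hyperparameters at each step: preprocessing, flood mapping, and water depth estimation. It argues for evaluating the entire processing pipeline (speckle noise reduction, flood mapping, and water depth estimation) rather than optimizing a single configuration, and uses an ensemble of method combinations to quantify and manage the uncertainty of each step. The results show that the choice of flood mapping method has the most influence on flood maps, while water depth estimates depend most on the input flood map and on the methods' hyperparameters.

Link: https://arxiv.org/abs/2510.11305
Authors: Jean-Paul Travert,Cédric Goeury,Sébastien Boyaval,Vito Bacchi,Fabrice Zaoui
Affiliations: EDF R&D; Laboratoire National d’Hydraulique et Environnement (LNHE); Laboratoire d’Hydraulique Saint-Venant (LHSV); ENPC; Institut Polytechnique de Paris; Inria
Categories: Computer Vision and Pattern Recognition (cs.CV); Geophysics (physics.geo-ph)
Comments: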

Abstract:Flood mapping and water depth estimation from Synthetic Aperture Radar (SAR) imagery are crucial for calibrating and validating hydraulic models. This study uses SAR imagery to evaluate various preprocessing (especially speckle noise reduction), flood mapping, and water depth estimation methods. The impact of the choice of method at different steps and its hyperparameters is studied by considering an ensemble of preprocessed images, flood maps, and water depth fields. The evaluation is conducted for two flood events on the Garonne River (France) in 2019 and 2021, using hydrodynamic simulations and in-situ observations as reference data. Results show that the choice of speckle filter alters flood extent estimations with variations of several square kilometers. Furthermore, the selection and tuning of flood mapping methods also affect performance. While supervised methods outperformed unsupervised ones, tuned unsupervised approaches (such as local thresholding or change detection) can achieve comparable results. The compounded uncertainty from preprocessing and flood mapping steps also introduces high variability in the water depth field estimates. This study highlights the importance of considering the entire processing pipeline, encompassing preprocessing, flood mapping, and water depth estimation methods and their associated hyperparameters. Rather than relying on a single configuration, adopting an ensemble approach and accounting for methodological uncertainty should be privileged. For flood mapping, the method choice has the most influence. For water depth estimation, the most influential processing step was the flood map input resulting from the flood mapping step and the hyperparameters of the methods.
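
The ensemble idea can be illustrated with a minimal sketch: given several binary flood maps produced by different pipeline configurations, compute a per-pixel flood frequency and the spread in flooded area. The three tiny maps and the pixel area are hypothetical, and the paper's actual ensemble construction may differ.

```python
from typing import List

FloodMap = List[List[int]]  # 1 = flooded pixel, 0 = dry pixel


def flood_frequency(maps: List[FloodMap]) -> List[List[float]]:
    """Fraction of ensemble members that mark each pixel as flooded."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / len(maps) for c in range(cols)]
            for r in range(rows)]


def extent_km2(flood_map: FloodMap, pixel_area_km2: float) -> float:
    """Flooded area of one map."""
    return sum(sum(row) for row in flood_map) * pixel_area_km2


def extent_spread_km2(maps: List[FloodMap], pixel_area_km2: float) -> float:
    """Difference between the largest and smallest flooded area in the ensemble."""
    extents = [extent_km2(m, pixel_area_km2) for m in maps]
    return max(extents) - min(extents)


# Hypothetical 2x2 maps from three speckle-filter / mapping configurations.
ensemble = [
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
    [[1, 1], [1, 0]],
]
PIXEL_AREA_KM2 = 0.0001  # assumes 10 m x 10 m pixels
print(flood_frequency(ensemble), extent_spread_km2(ensemble, PIXEL_AREA_KM2))
```

Pixels with a frequency near 0 or 1 are ones on which the configurations agree; intermediate values mark where methodological choices drive the result.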

[CV-41] sketch2symm: Symmetry-aware sketch-to-shape generation via semantic bridging

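Quick read: This paper addresses sketch-based 3D reconstruction, that is, generating geometrically consistent 3D shapes from sparse sketches that carry little semantic or geometric information. The abstract and incomplete nature of sketches makes it hard for conventional methods to recover accurate 3D structure. The proposed Sketch2Symm is a two-stage generation framework: sketch-to-image translation first provides semantic bridging that enriches the sketch, and symmetry constraints then act as geometric priors that exploit the structural regularity of everyday objects. The method outperforms existing approaches on Chamfer Distance, Earth Mover’s Distance, and F-Score, supporting the value of semantic bridging and the symmetry-aware design.

Link: https://arxiv.org/abs/2510.11303
Authors: Yan Zhou(1),Mingji Li(2),Xiantao Zeng(2),Jie Lin(1),Yuexia Zhou(1) ((1) School of Electronic Information Engineering, Foshan University, Guangdong, China, (2) School of Computer Science and Artificial Intelligence, Foshan University, Guangdong, China)
Affiliations: Foshan University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: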

Abstract:Sketch-based 3D reconstruction remains a challenging task due to the abstract and sparse nature of sketch inputs, which often lack sufficient semantic and geometric information. To address this, we propose Sketch2Symm, a two-stage generation method that produces geometrically consistent 3D shapes from sketches. Our approach introduces semantic bridging via sketch-to-image translation to enrich sparse sketch representations, and incorporates symmetry constraints as geometric priors to leverage the structural regularity commonly found in everyday objects. Experiments on mainstream sketch datasets demonstrate that our method achieves superior performance compared to existing sketch-based reconstruction methods in terms of Chamfer Distance, Earth Mover’s Distance, and F-Score, verifying the effectiveness of the proposed semantic bridging and symmetry-aware design.
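
A symmetry prior can be expressed with the same Chamfer Distance used for evaluation: mirror the predicted point cloud across a plane and measure how far the mirrored copy is from the original. The sketch below assumes reflection across the x = 0 plane and the squared-distance variant of Chamfer Distance; the paper's actual constraint and conventions may differ.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(a: List[Point], b: List[Point]) -> float:
    """Symmetric Chamfer Distance using mean squared nearest-neighbour distances."""
    def one_way(src: List[Point], dst: List[Point]) -> float:
        return sum(
            min(sum((s[i] - d[i]) ** 2 for i in range(3)) for d in dst)
            for s in src
        ) / len(src)
    return one_way(a, b) + one_way(b, a)


def reflect_x(points: List[Point]) -> List[Point]:
    """Mirror a point cloud across the x = 0 plane (assumed symmetry plane)."""
    return [(-x, y, z) for x, y, z in points]


def symmetry_penalty(points: List[Point]) -> float:
    """Zero for a cloud that is mirror-symmetric about x = 0, positive otherwise."""
    return chamfer_distance(points, reflect_x(points))


symmetric_shape = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
lopsided_shape = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(symmetry_penalty(symmetric_shape), symmetry_penalty(lopsided_shape))
```

Used as a loss term, such a penalty pushes generated shapes toward mirror symmetry without requiring any extra annotation.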

[CV-42] When Does Supervised Training Pay Off? The Hidden Economics of Object Detection in the Era of Vision-Language Models

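Quick read: This paper addresses the accuracy-versus-cost trade-off in object detection systems, that is, how to choose between high-accuracy supervised methods (such as YOLO) and annotation-free zero-shot vision-language models (VLMs, such as Gemini Flash 2.5). It performs a systematic cost-effectiveness analysis with Total Cost of Ownership modeling to quantify the economic break-even thresholds of different deployment scenarios, and identifies the factors that drive the decision: inference volume, category stability, budget constraints, and accuracy requirements. The result is an actionable decision framework that does not rely on technical metrics such as mAP alone.

Link: https://arxiv.org/abs/2510.11302
Authors: Samer Al-Hamadani
Affiliations: Al-Khwarizmi College of Engineering; University of Baghdad
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 23 pages, 4 figures, 4 tables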

Abstract:Object detection systems have traditionally relied on supervised learning with manually annotated bounding boxes, achieving high accuracy at the cost of substantial annotation investment. The emergence of Vision-Language Models (VLMs) offers an alternative paradigm enabling zero-shot detection through natural language queries, eliminating annotation requirements but operating with reduced accuracy. This paper presents the first comprehensive cost-effectiveness analysis comparing supervised detection (YOLO) with zero-shot VLM inference (Gemini Flash 2.5). Through systematic evaluation on 1,000 stratified COCO images and 200 diverse product images spanning consumer electronics and rare categories, combined with detailed Total Cost of Ownership modeling, we establish quantitative break-even thresholds governing architecture selection. Our findings reveal that supervised YOLO achieves 91.2% accuracy versus 68.5% for zero-shot Gemini on standard categories, representing a 22.7 percentage point advantage that costs $10,800 in annotation for 100-category systems. However, this advantage justifies investment only beyond 55 million inferences, equivalent to 151,000 images daily for one year. Zero-shot Gemini demonstrates 52.3% accuracy on diverse product categories (ranging from highly web-prevalent consumer electronics at 75-85% to rare specialized equipment at 25-40%) where supervised YOLO achieves 0% due to architectural constraints preventing detection of untrained classes. Cost per Correct Detection analysis reveals substantially lower per-detection costs for Gemini ($0.00050 vs $0.143) at 100,000 inferences despite accuracy deficits. We develop decision frameworks demonstrating that optimal architecture selection depends critically on deployment volume, category stability, budget constraints, and accuracy requirements rather than purely technical performance metrics.
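
The headline figures in the abstract can be cross-checked with simple arithmetic, and the break-even idea can be sketched as a fixed up-front cost divided by a per-inference saving. The linear model and the `saving_per_inference` value below are illustrative assumptions, not the paper's Total Cost of Ownership model, and the annotation cost is assumed to be in US dollars.

```python
SUPERVISED_ACCURACY = 91.2          # % reported for YOLO on standard categories
ZERO_SHOT_ACCURACY = 68.5           # % reported for zero-shot Gemini
BREAK_EVEN_INFERENCES = 55_000_000  # reported break-even volume
ANNOTATION_COST_USD = 10_800        # reported annotation cost (assumed USD)

# Accuracy advantage in percentage points.
accuracy_gap = SUPERVISED_ACCURACY - ZERO_SHOT_ACCURACY

# Break-even volume spread evenly over one year.
daily_volume = BREAK_EVEN_INFERENCES / 365


def break_even_volume(fixed_cost: float, saving_per_inference: float) -> float:
    """Inference count at which a fixed up-front cost is repaid.

    Simplified linear model: assumes a constant saving on every inference.
    """
    return fixed_cost / saving_per_inference


# Hypothetical saving of $0.0002 per inference (not a figure from the paper).
hypothetical_break_even = break_even_volume(ANNOTATION_COST_USD, 0.0002)
print(accuracy_gap, daily_volume, hypothetical_break_even)
```

The daily figure comes out just under 151,000 images, in line with the rounded value quoted in the abstract.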

[CV-43] ΔEnergy: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization

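Quick read: This paper addresses the limited robustness of vision-language models (VLMs) on downstream tasks that mix in-distribution (ID) and out-of-distribution (OOD) data, and in particular two kinds of OOD data: covariate shift (e.g., known classes with changed image styles) and semantic shift (e.g., classes unseen until test time). It proposes a new OOD score, ΔEnergy, based on the energy change observed when the vision and language modalities are re-aligned. ΔEnergy significantly outperforms the conventional energy-based OOD score and can also improve generalization under covariate shift. Maximizing a lower bound of ΔEnergy (termed EBM) is proven to enhance OOD detection and to yield a domain-consistent Hessian, which serves as a strong indicator of OOD generalization. The resulting unified fine-tuning framework outperforms recent approaches by 10% to 25% in AUROC on challenging OOD detection and generalization benchmarks.

Link: https://arxiv.org/abs/2510.11296
Authors: Lin Zhu,Yifeng Yang,Xinbing Wang,Qinying Gu,Nanyang Ye
Affiliations: Shanghai Jiao Tong University; Shanghai Artificial Intelligence Laboratory
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: Accepted by NeurIPS 2025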

Abstract:Recent approaches for vision-language models (VLMs) have shown remarkable success in achieving fast downstream adaptation. When applied to real-world downstream tasks, VLMs inevitably encounter both the in-distribution (ID) data and out-of-distribution (OOD) data. The OOD datasets often include both covariate shifts (e.g., known classes with changes in image styles) and semantic shifts (e.g., test-time unseen classes). This highlights the importance of improving VLMs’ generalization ability to covariate-shifted OOD data, while effectively detecting open-set semantic-shifted OOD classes. In this paper, inspired by the substantial energy change observed in closed-set data when re-aligning vision-language modalities (specifically by directly reducing the maximum cosine similarity to a low value), we introduce a novel OOD score, named ΔEnergy. ΔEnergy significantly outperforms the vanilla energy-based OOD score and provides a more reliable approach for OOD detection. Furthermore, ΔEnergy can simultaneously improve OOD generalization under covariate shifts, which is achieved by lower-bound maximization for ΔEnergy (termed EBM). EBM is theoretically proven to not only enhance OOD detection but also yields a domain-consistent Hessian, which serves as a strong indicator for OOD generalization. Based on this finding, we developed a unified fine-tuning framework that allows for improving VLMs’ robustness in both OOD generalization and OOD detection. Extensive experiments on challenging OOD detection and generalization benchmarks demonstrate the superiority of our method, outperforming recent approaches by 10% to 25% in AUROC.
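
The intuition behind an energy-change score can be sketched with the standard energy score, E = -T · log Σ exp(s_k / T), computed over image-to-class cosine similarities s_k. The sketch below measures how much the energy moves when the top similarity is pushed down to a low value, as the abstract describes; the exact definition of ΔEnergy, the temperature, and the replacement value are assumptions here, and the similarity vectors are hypothetical.

```python
import math
from typing import List


def energy(similarities: List[float], temperature: float = 0.01) -> float:
    """Energy score -T * log(sum(exp(s / T))), computed in a numerically stable way."""
    scaled = [s / temperature for s in similarities]
    m = max(scaled)
    return -temperature * (m + math.log(sum(math.exp(v - m) for v in scaled)))


def delta_energy(similarities: List[float],
                 low_value: float = 0.0,
                 temperature: float = 0.01) -> float:
    """Energy change after the maximum similarity is reduced to `low_value`.

    Assumed form for illustration; the paper's precise score may differ.
    """
    top = similarities.index(max(similarities))
    realigned = list(similarities)
    realigned[top] = low_value
    return energy(realigned, temperature) - energy(similarities, temperature)


# Hypothetical cosine similarities to three class prompts.
closed_set_like = [0.30, 0.10, 0.08]   # one class clearly dominates
open_set_like = [0.12, 0.11, 0.10]     # no class stands out
print(delta_energy(closed_set_like), delta_energy(open_set_like))
```

With these toy inputs the closed-set-like sample shows the larger energy change, which matches the observation in the abstract that closed-set data exhibits a substantial change under re-alignment.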

[CV-44] Human Uncertainty-Aware Data Selection and Automatic Labeling in Visual Question Answering

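Quick read: This paper addresses the dependence of large vision-language models (VLMs) on costly human-annotated data during supervised fine-tuning (SFT), and in particular the fact that existing methods ignore human uncertainty (HU), the variation in annotator confidence on the same sample, which is widespread in real-world data. The study shows that standard SFT, which simply optimizes toward the most frequent label, fails to use HU information, and that training on high-HU samples can degrade performance and leave the model under-calibrated. The proposed HaDola framework iterates through four stages (discriminate, self-annotate, error trigger, and training) to identify harmful samples, prioritize informative ones, and bootstrap a high-quality training set from a small seed set (5% of the data), which substantially reduces reliance on costly HU annotations while improving accuracy and calibration.

Link: https://arxiv.org/abs/2510.11295
Authors: Jian Lan,Zhicheng Liu,Udo Schlegel,Raoyuan Zhao,Yihong Liu,Hinrich Schütze,Michael A. Hedderich,Thomas Seidl
Affiliations: University of Munich; Munich Center of Machine Learning
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: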

Abstract:Large vision-language models (VLMs) achieve strong performance in Visual Question Answering but still rely heavily on supervised fine-tuning (SFT) with massive labeled datasets, which is costly due to human annotations. Crucially, real-world datasets often exhibit human uncertainty (HU) – variation in human confidence across annotations – but standard SFT simply optimizes toward the most frequent label, disregarding HU distributions. This leaves two open questions: How does HU affect SFT, and how can HU be effectively leveraged in training? In this work, we first conduct a systematic evaluation of VLMs across varying HU levels. We have two key findings: (i) surprisingly, high-HU samples contribute little or even degrade model performance, and (ii) naively training on the full dataset yields under-calibrated models that fail to capture HU distributions. Motivated by these findings, we introduce HaDola, a human uncertainty-aware data selection and automatic labeling framework. HaDola operates in four stages – discriminate, self-annotate, error trigger, and training – to iteratively identify harmful samples, prioritize informative ones, and bootstrap from a small seed set (5% of data). Our approach substantially reduces reliance on costly HU annotations and makes VLMs more accurate and better calibrated. Extensive experiments on VQAv2 and VizWiz datasets demonstrate that HaDola consistently matches or outperforms state-of-the-art baselines with less training data. Our work highlights the importance of explicitly modeling HU in SFT, suggesting that better utilization of HU is more effective than merely scaling up dataset size.
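
One simple proxy for human uncertainty is the Shannon entropy of the answers that annotators gave to a question. The sketch below contrasts that distribution-level view with the majority label that standard SFT trains on; using entropy as the HU measure is an assumption for illustration, and the answer lists are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def answer_distribution(answers: List[str]) -> Dict[str, float]:
    """Empirical distribution over the answers given by annotators."""
    counts = Counter(answers)
    return {answer: count / len(answers) for answer, count in counts.items()}


def human_uncertainty(answers: List[str]) -> float:
    """Shannon entropy (bits) of the annotator answer distribution."""
    return -sum(p * math.log2(p) for p in answer_distribution(answers).values())


def majority_label(answers: List[str]) -> str:
    """The most frequent answer, i.e. the target used by standard SFT."""
    return Counter(answers).most_common(1)[0][0]


# Hypothetical annotations from ten annotators per question.
unanimous = ["yes"] * 10
split = ["yes"] * 5 + ["no"] * 5
print(human_uncertainty(unanimous), human_uncertainty(split), majority_label(split))
```

Both questions would contribute a single hard label under majority-vote training, even though the second one carries a full bit of annotator disagreement.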

[CV-45] EEMS: Edge-Prompt Enhanced Medical Image Segmentation Based on Learnable Gating Mechanism

Quick read: This paper addresses the loss of segmentation accuracy in medical images caused by ambiguous boundaries and background noise. The proposed model, EEMS, has three core components: the Edge-Aware Enhancement Unit (EAEU) strengthens edge perception through multi-frequency feature extraction to delineate target boundaries accurately; the Multi-scale Prompt Generation Unit (MSPGU) uses a prompt-guided strategy to integrate high-level semantic and low-level spatial features for precise target localization; and the Dual-Source Adaptive Gated Fusion Unit (DAGFU) merges the edge features from EAEU with the semantic features from MSPGU, improving segmentation accuracy and robustness.

Link: https://arxiv.org/abs/2510.11287
Authors: Han Xia,Quanjun Li,Qian Li,Zimeng Li,Hongbin Ye,Yupeng Liu,Haolun Li,Xuhang Chen
Affiliations: Guangdong University of Technology; Guangdong Provincial People’s Hospital; Southern Medical University; Guangdong Academy of Medical Sciences; Nanjing University of Posts and Telecommunications; Shenzhen Polytechnic University; National Key Laboratory of Space Intelligent Control; Guangdong Basic and Applied Basic Research Foundation
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by BIBM 2025

Abstract:Medical image segmentation is vital for diagnosis, treatment planning, and disease monitoring but is challenged by complex factors like ambiguous edges and background noise. We introduce EEMS, a new model for segmentation, combining an Edge-Aware Enhancement Unit (EAEU) and a Multi-scale Prompt Generation Unit (MSPGU). EAEU enhances edge perception via multi-frequency feature extraction, accurately defining boundaries. MSPGU integrates high-level semantic and low-level spatial features using a prompt-guided approach, ensuring precise target localization. The Dual-Source Adaptive Gated Fusion Unit (DAGFU) merges edge features from EAEU with semantic features from MSPGU, enhancing segmentation accuracy and robustness. Tests on datasets like ISIC2018 confirm EEMS’s superior performance and reliability as a clinical tool.
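
A learnable gate that fuses two feature sources is often written as g = sigmoid(w_e · e + w_s · s + b), with output g · e + (1 - g) · s. The sketch below applies that generic form elementwise to an edge feature and a semantic feature; it is an assumed illustration of gated fusion, not the paper's DAGFU, and the scalar weights stand in for parameters that would be learned.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(edge: List[float], semantic: List[float],
                 w_edge: float, w_sem: float, bias: float) -> List[float]:
    """Blend two feature vectors with a per-element gate in (0, 1)."""
    fused = []
    for e, s in zip(edge, semantic):
        gate = sigmoid(w_edge * e + w_sem * s + bias)   # how much to trust the edge feature
        fused.append(gate * e + (1.0 - gate) * s)
    return fused


edge_features = [1.0, 0.0]
semantic_features = [0.0, 2.0]

# With all gate parameters at zero the gate is 0.5, so the result is a plain average.
print(gated_fusion(edge_features, semantic_features, 0.0, 0.0, 0.0))
# A large positive bias drives the gate toward 1, so the edge features dominate.
print(gated_fusion(edge_features, semantic_features, 0.0, 0.0, 50.0))
```

Because the gate depends on the inputs, the network can favor edge cues near boundaries and semantic cues elsewhere instead of using one fixed mixing ratio.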

[CV-46] Exploring and Leveraging Class Vectors for Classifier Editing NEURIPS2025

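Quick read: This paper addresses the difficulty of editing image classifiers after training, especially when specific classes must be forgotten or the model must adapt to distribution shift; existing methods either only correct errors or require costly retraining. The key is Class Vectors, which capture class-specific representation adjustments during fine-tuning and thereby disentangle each class's adaptation in the latent space. Unlike task vectors, which encode task-level changes in weight space, Class Vectors capture each class's semantic shift and support efficient editing either by steering latent features along the vector or by mapping the vector into weight space to update decision boundaries. Their inherent linearity and orthogonality also allow high-level concept editing through simple class arithmetic.

Link: https://arxiv.org/abs/2510.11268
Authors: Jaeik Kim,Jaeyoung Do
Affiliations: Seoul National University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted in NeurIPS 2025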

Abstract:Image classifiers play a critical role in detecting diseases in medical imaging and identifying anomalies in manufacturing processes. However, their predefined behaviors after extensive training make post hoc model editing difficult, especially when it comes to forgetting specific classes or adapting to distribution shifts. Existing classifier editing methods either focus narrowly on correcting errors or incur extensive retraining costs, creating a bottleneck for flexible editing. Moreover, such editing has seen limited investigation in image classification. To overcome these challenges, we introduce Class Vectors, which capture class-specific representation adjustments during fine-tuning. Whereas task vectors encode task-level changes in weight space, Class Vectors disentangle each class’s adaptation in the latent space. We show that Class Vectors capture each class’s semantic shift and that classifier editing can be achieved either by steering latent features along these vectors or by mapping them into weight space to update the decision boundaries. We also demonstrate that the inherent linearity and orthogonality of Class Vectors support efficient, flexible, and high-level concept editing via simple class arithmetic. Finally, we validate their utility in applications such as unlearning, environmental adaptation, adversarial defense, and adversarial trigger optimization.
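
As a rough illustration of steering latent features along a class vector, the sketch below defines a class vector as the shift of a class's mean latent feature between the pretrained and fine-tuned encoders, then adds a scaled copy of it to a feature. That definition is an assumption made for this example; the paper's construction may differ, and all feature values are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Elementwise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def class_vector(pretrained_feats: List[Vector], finetuned_feats: List[Vector]) -> Vector:
    """Shift of a class's mean latent feature caused by fine-tuning (assumed definition)."""
    before = mean_vector(pretrained_feats)
    after = mean_vector(finetuned_feats)
    return [a - b for a, b in zip(after, before)]


def steer(feature: Vector, vector: Vector, alpha: float = 1.0) -> Vector:
    """Move a latent feature along a class vector; negative alpha moves against it."""
    return [f + alpha * v for f, v in zip(feature, vector)]


# Hypothetical 2-D latent features for two samples of one class.
pretrained = [[0.0, 0.0], [2.0, 0.0]]
finetuned = [[1.0, 1.0], [3.0, 1.0]]

vec = class_vector(pretrained, finetuned)
print(vec, steer([0.5, 0.5], vec, alpha=-1.0))
```

Because the edit is a vector addition, several class vectors can be summed or subtracted before steering, which is the kind of class arithmetic the abstract refers to.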

[CV-47] A Large-Language-Model Assisted Automated Scale Bar Detection and Extraction Framework for Scanning Electron Microscopic Images

Quick read: This paper addresses the fact that detecting and extracting scale bars in Scanning Electron Microscopy (SEM) images still relies on manual work, which is slow and error-prone. The solution is a multimodal automated framework with four stages: an Automatic Dataset Generation (Auto-DG) model synthesizes diverse SEM images to improve generalization; an object detector localizes the scale bar; a hybrid Optical Character Recognition (OCR) system combining DenseNet and Convolutional Recurrent Neural Network (CRNN) algorithms extracts the text; and a Large Language Model (LLM) acts as a reasoning engine and assistant that verifies the results and suggests follow-up analysis. Object detection reaches 100% precision, 95.8% recall, and 99.2% mAP at IoU=0.5, and the hybrid OCR system outperforms several mainstream standalone engines.

Link: https://arxiv.org/abs/2510.11260
Authors: Yuxuan Chen,Ruotong Yang,Zhengyang Zhang,Mehreen Ahmed,Yanming Wang
Affiliations: Shanghai Jiao Tong Global College; Shanghai Jiao Tong University; Global Institute of Future Technology; AI Lab, Xiaomi Corporation
Categories: Computer Vision and Pattern Recognition (cs.CV); Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI); Data Analysis, Statistics and Probability (physics.data-an)
Comments: 14 pages, 6 figures

Abstract:Microscopic characterizations, such as Scanning Electron Microscopy (SEM), are widely used in scientific research for visualizing and analyzing microstructures. Determining the scale bars is an important first step of accurate SEM analysis; however, currently, it mainly relies on manual operations, which is both time-consuming and prone to errors. To address this issue, we propose a multi-modal and automated scale bar detection and extraction framework that provides concurrent object detection, text detection and text recognition with a Large Language Model (LLM) agent. The proposed framework operates in four phases; i) Automatic Dataset Generation (Auto-DG) model to synthesize a diverse dataset of SEM images ensuring robust training and high generalizability of the model, ii) scale bar object detection, iii) information extraction using a hybrid Optical Character Recognition (OCR) system with DenseNet and Convolutional Recurrent Neural Network (CRNN) based algorithms, iv) an LLM agent to analyze and verify accuracy of the results. The proposed model demonstrates a strong performance in object detection and accurate localization with a precision of 100%, recall of 95.8%, and a mean Average Precision (mAP) of 99.2% at IoU=0.5 and 69.1% at IoU=0.5:0.95. The hybrid OCR system achieved 89% precision, 65% recall, and a 75% F1 score on the Auto-DG dataset, significantly outperforming several mainstream standalone engines, highlighting its reliability for scientific image analysis. The LLM is introduced as a reasoning engine as well as an intelligent assistant that suggests follow-up steps and verifies the results. This automated method powered by an LLM agent significantly enhances the efficiency and accuracy of scale bar detection and extraction in SEM images, providing a valuable tool for microscopic analysis and advancing the field of scientific imaging.
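
The reported OCR precision and recall are consistent with the reported F1 score under the standard harmonic-mean definition, as the short check below shows. The second function illustrates why the scale bar matters, converting a detected bar into a physical size per pixel; the bar length and label used there are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def nm_per_pixel(scale_value_nm: float, bar_length_px: float) -> float:
    """Physical size of one pixel, given the scale bar's label and its length in pixels."""
    return scale_value_nm / bar_length_px


# Reported hybrid OCR results on the Auto-DG dataset: 89% precision, 65% recall.
ocr_f1 = f1_score(0.89, 0.65)

# Hypothetical detection: a bar labelled "500 nm" that spans 200 pixels.
pixel_size = nm_per_pixel(500.0, 200.0)
print(round(ocr_f1, 2), pixel_size)
```

Once the pixel size is known, any length measured in pixels on the micrograph can be converted to physical units by a single multiplication.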
zh

[CV-48] DTEA: Dynamic Topology Weaving and Instability-Driven Entropic Attenuation for Medical Image Segmentation

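[Quick read]: This paper addresses the weak generalization of medical image segmentation models in complex clinical scenarios, which it attributes to skip connections with limited structural representation and insufficient contextual modeling. The key to the solution is the DTEA model, which introduces two modules. Semantic Topology Reconfiguration (STR) reorganizes multi-scale semantic features into a dynamic hypergraph to better model cross-resolution anatomical dependencies. Entropic Perturbation Gating (EPG) assesses channel stability after perturbation and filters out high-entropy channels to strengthen spatial attention on clinically important regions. Together, the two modules improve structural expressiveness and sensitivity to clinically important regions, and thereby segmentation accuracy and generalization.

Link: https://arxiv.org/abs/2510.11259
Authors: Weixuan Li,Quanjun Li,Guang Yu,Song Yang,Zimeng Li,Chi-Man Pun,Yupeng Liu,Xuhang Chen
Affiliations: Guangdong University of Technology; Shenzhen Polytechnic University; Guangzhou City University of Technology; Southern Medical University; University of Macau
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by BIBM 2025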

Abstract:In medical image segmentation, skip connections are used to merge global context and reduce the semantic gap between encoder and decoder. Current methods often struggle with limited structural representation and insufficient contextual modeling, affecting generalization in complex clinical scenarios. We propose the DTEA model, featuring a new skip connection framework with the Semantic Topology Reconfiguration (STR) and Entropic Perturbation Gating (EPG) modules. STR reorganizes multi-scale semantic features into a dynamic hypergraph to better model cross-resolution anatomical dependencies, enhancing structural and semantic representation. EPG assesses channel stability after perturbation and filters high-entropy channels to emphasize clinically important regions and improve spatial attention. Extensive experiments on three benchmark datasets show our framework achieves superior segmentation accuracy and better generalization across various clinical settings. The code is available at this https URL.
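The gating idea described for EPG (perturb channels, then filter those with high entropy) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the Gaussian perturbation, the softmax-based entropy, the `keep_ratio` rule, and the names `channel_entropy` and `entropy_gate` are all assumptions made for illustration.

```python
import math
import random

def channel_entropy(channel):
    """Shannon entropy of one channel after softmax normalization (assumed measure)."""
    m = max(channel)
    exps = [math.exp(v - m) for v in channel]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_gate(feature_map, keep_ratio=0.5, noise_std=0.01, seed=0):
    """Zero out the highest-entropy channels of a perturbed feature map.

    feature_map: list of channels, each a flat list of activations.
    keep_ratio, noise_std, seed: hypothetical knobs, not taken from the paper.
    """
    rng = random.Random(seed)
    entropies = []
    for channel in feature_map:
        # Perturb the channel with small Gaussian noise before scoring it.
        perturbed = [v + rng.gauss(0.0, noise_std) for v in channel]
        entropies.append(channel_entropy(perturbed))
    n_keep = max(1, round(keep_ratio * len(feature_map)))
    ranked = sorted(range(len(feature_map)), key=lambda i: entropies[i])
    keep = set(ranked[:n_keep])
    return [list(c) if i in keep else [0.0] * len(c)
            for i, c in enumerate(feature_map)]

# A sharply peaked channel is kept; a flat (high-entropy) channel is suppressed.
features = [[10.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
gated = entropy_gate(features)
print(gated)
```

A hard top-k mask is used here only because it is easy to inspect; a learned soft gate would be an equally plausible reading of the abstract.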

[CV-49] Nepali Sign Language Characters Recognition: Dataset Development and Deep Learning Approaches

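[Quick read]: This paper addresses the lack of high-quality digital linguistic datasets for Nepali Sign Language (NSL), a low-resource sign language, in order to advance its recognition and study. The key to the solution is the first NSL benchmark dataset, with 36 gesture classes and 1,500 samples per class, designed to capture the language's structural and visual features. Fine-tuning MobileNetV2 and ResNet50 convolutional neural networks on this dataset yields classification accuracies of 90.45% and 88.78%, respectively, supporting the effectiveness of transfer learning and fine-tuning for low-resource sign language recognition.

Link: https://arxiv.org/abs/2510.11243
Authors: Birat Poudel,Satyam Ghimire,Sijan Bhattarai,Saurav Bhandari,Suramya Sharma Dahal
Affiliations: unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 6 pages, 9 figures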

Abstract:Sign languages serve as essential communication systems for individuals with hearing and speech impairments. However, digital linguistic dataset resources for underrepresented sign languages, such as Nepali Sign Language (NSL), remain scarce. This study introduces the first benchmark dataset for NSL, consisting of 36 gesture classes with 1,500 samples per class, designed to capture the structural and visual features of the language. To evaluate recognition performance, we fine-tuned MobileNetV2 and ResNet50 architectures on the dataset, achieving classification accuracies of 90.45% and 88.78%, respectively. These findings demonstrate the effectiveness of convolutional neural networks in sign recognition tasks, particularly within low-resource settings. To the best of our knowledge, this work represents the first systematic effort to construct a benchmark dataset and assess deep learning approaches for NSL recognition, highlighting the potential of transfer learning and fine-tuning for advancing research in underexplored sign languages.

[CV-50] LightPneumoNet: Lightweight Pneumonia Classifier

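[Quick read]: This paper addresses the difficulty of deploying computationally expensive deep learning models for pneumonia diagnosis in resource-limited settings. The core of the solution is LightPneumoNet, a lightweight convolutional neural network (CNN) designed from scratch with only 388,082 trainable parameters and a 1.48 MB memory footprint. On an independent test set it reaches an overall accuracy of 94.2%, a precision of 92%, an F1 score of 96%, and a sensitivity (recall) of 99%, which improves the identification of true pneumonia cases and reduces clinically significant false negatives. This design allows efficient deployment on low-cost hardware and offers an accessible computer-aided diagnosis tool for clinics in remote areas.

Link: https://arxiv.org/abs/2510.11232
Authors: Neilansh Chauhan,Piyush Kumar Gupta,Faraz Doja
Affiliations: unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 13 pages (including references), 5 figures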

Abstract:Effective pneumonia diagnosis is often challenged by the difficulty of deploying large, computationally expensive deep learning models in resource-limited settings. This study introduces LightPneumoNet, an efficient, lightweight convolutional neural network (CNN) built from scratch to provide an accessible and accurate diagnostic solution for pneumonia detection from chest X-rays. Our model was trained on a public dataset of 5,856 chest X-ray images. Preprocessing included image resizing to 224x224, grayscale conversion, and pixel normalization, with data augmentation (rotation, zoom, shear) to prevent overfitting. The custom architecture features four blocks of stacked convolutional layers and contains only 388,082 trainable parameters, resulting in a minimal 1.48 MB memory footprint. On the independent test set, our model delivered exceptional performance, achieving an overall accuracy of 0.942, precision of 0.92, and an F1-Score of 0.96. Critically, it obtained a sensitivity (recall) of 0.99, demonstrating a near-perfect ability to identify true pneumonia cases and minimize clinically significant false negatives. Notably, LightPneumoNet achieves this high recall on the same dataset where existing approaches typically require significantly heavier architectures or fail to reach comparable sensitivity levels. The model’s efficiency enables deployment on low-cost hardware, making advanced computer-aided diagnosis accessible in underserved clinics and serving as a reliable second-opinion tool to improve patient outcomes.
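Two of the reported figures can be cross-checked with simple arithmetic. The sketch below assumes 32-bit (4-byte) parameters, which the abstract does not state, and recomputes F1 from the rounded precision and recall; the helper names are illustrative only.

```python
TRAINABLE_PARAMS = 388_082  # from the abstract
BYTES_PER_PARAM = 4         # assumption: float32 weights

def footprint_mib(n_params, bytes_per_param=BYTES_PER_PARAM):
    """Parameter storage in MiB (1 MiB = 1024 * 1024 bytes)."""
    return n_params * bytes_per_param / (1024 ** 2)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(footprint_mib(TRAINABLE_PARAMS), 2))  # 1.48
print(round(f1_score(0.92, 0.99), 3))             # 0.954
```

Under the float32 assumption the footprint matches the reported 1.48 MB. The recomputed F1 of about 0.954 is close to the reported 0.96; the small gap may come from rounding of the published precision and recall or from the averaging convention used.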

[CV-51] Investigating Identity Signals in Conversational Facial Dynamics via Disentangled Expression Features

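[Quick read]: This paper asks whether individuals can be identified from the dynamic components of their facial expressions alone, without relying on static facial appearance. The core of the solution is to use the FLAME 3D morphable model to explicitly disentangle facial shape from expression dynamics, extracting frame-by-frame expression and jaw parameters from conversational videos and retaining only the expression dynamics. Combining a Conformer architecture with supervised contrastive learning, the approach reaches 61.14% accuracy on 1,429-way classification on the CANDOR dataset, far above chance, indicating that facial dynamics carry strong identity signatures. A further contribution is the drift-to-noise ratio (DNR), which quantifies the reliability of shape-expression separation; DNR correlates strongly and negatively with recognition performance, showing that stable shape estimation is essential for dynamics-based identification.

Link: https://arxiv.org/abs/2510.11223
Authors: Masoumeh Chapariniya,Pierre Vuillecard,Jean-Marc Odobez,Volker Dellwo,Teodora Vukovic
Affiliations: unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: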

Abstract:This work investigates whether individuals can be identified solely through the pure dynamical components of their facial expressions, independent of static facial appearance. We leverage the FLAME 3D morphable model to achieve explicit disentanglement between facial shape and expression dynamics, extracting frame-by-frame parameters from conversational videos while retaining only expression and jaw coefficients. On the CANDOR dataset of 1,429 speakers in naturalistic conversations, our Conformer model with supervised contrastive learning achieves 61.14% accuracy on 1,429-way classification – 458 times above chance – demonstrating that facial dynamics carry strong identity signatures. We introduce a drift-to-noise ratio (DNR) that quantifies the reliability of shape-expression separation by measuring across-session shape changes relative to within-session variability. DNR strongly negatively correlates with recognition performance, confirming that unstable shape estimation compromises dynamic identification. Our findings reveal person-specific signatures in conversational facial dynamics, with implications for social perception and clinical assessment.

[CV-52] Class Prototypes based Contrastive Learning for Classifying Multi-Label and Fine-Grained Educational Videos CVPR2023

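[Quick read]: This paper addresses the screening of online video content for early childhood education: automatically detecting and classifying fine-grained educational content (such as literacy and math) in online videos so that educators can provide suitable learning resources for young children. The main challenge is that a video may contain several educational categories that are visually similar (for example, 'letter names' versus 'letter sounds'), which calls for fine-grained multi-label classification. The key to the solution is a class-prototype-based supervised contrastive learning method that learns one prototype per educational class, minimizing the distance between samples and the prototype of their own class and maximizing the distance to prototypes of other classes, which sharpens the model's ability to separate fine-grained classes. A multimodal Transformer network additionally models the interaction between visual and audio cues to improve the quality of the video embeddings. The method clearly outperforms existing baselines on the newly built APPROVE dataset and on other benchmarks.

Link: https://arxiv.org/abs/2510.11204
Authors: Rohit Gupta,Anirban Roy,Claire Christensen,Sujeong Kim,Sarah Gerard,Madeline Cincebeaux,Ajay Divakaran,Todd Grindal,Mubarak Shah
Affiliations: Center for Research in Computer Vision, University of Central Florida; SRI International
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Published at CVPR 2023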

Abstract:The recent growth in the consumption of online media by children during early childhood necessitates data-driven tools enabling educators to filter out appropriate educational content for young learners. This paper presents an approach for detecting educational content in online videos. We focus on two widely used educational content classes: literacy and math. For each class, we choose prominent codes (sub-classes) based on the Common Core Standards. For example, literacy codes include 'letter names' and 'letter sounds', and math codes include 'counting' and 'sorting'. We pose this as a fine-grained multilabel classification problem as videos can contain multiple types of educational content and the content classes can get visually similar (e.g., 'letter names' vs 'letter sounds'). We propose a novel class prototypes based supervised contrastive learning approach that can handle fine-grained samples associated with multiple labels. We learn a class prototype for each class and a loss function is employed to minimize the distances between a class prototype and the samples from the class. Similarly, distances between a class prototype and the samples from other classes are maximized. As the alignment between visual and audio cues is crucial for effective comprehension, we consider a multimodal transformer network to capture the interaction between visual and audio cues in videos while learning the embedding for videos. For evaluation, we present a dataset, APPROVE, employing educational videos from YouTube labeled with fine-grained education classes by education researchers. APPROVE consists of 193 hours of expert-annotated videos with 19 classes. The proposed approach outperforms strong baselines on APPROVE and other benchmarks such as YouTube-8M and COIN. The dataset is available at this https URL.
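The pull-and-push behaviour of a class-prototype loss with multiple labels per sample can be sketched as follows. This is a simplified illustration, not the loss from the paper: the squared Euclidean distance, the hinge with a `margin`, and the function names are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prototype_loss(embeddings, label_sets, prototypes, margin=1.0):
    """Pull each sample toward the prototypes of its labels and push it
    at least `margin` away from all other prototypes (assumed hinge form)."""
    total = 0.0
    for emb, labels in zip(embeddings, label_sets):
        for cls, proto in enumerate(prototypes):
            d = sq_dist(emb, proto)
            if cls in labels:
                total += d                     # minimize distance to own classes
            else:
                total += max(0.0, margin - d)  # maximize distance to other classes
    return total / len(embeddings)

prototypes = [[0.0, 0.0], [3.0, 0.0]]  # one hypothetical prototype per class
good = prototype_loss([[0.0, 0.0]], [{0}], prototypes)
bad = prototype_loss([[3.0, 0.0]], [{0}], prototypes)
print(good, bad)  # 0.0 10.0
```

A sample sitting on its own prototype and far from the other incurs zero loss, while a sample sitting on the wrong prototype is penalized by both terms.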

[CV-53] FlexAC: Towards Flexible Control of Associative Reasoning in Multimodal Large Language Models NEURIPS2025

Link: https://arxiv.org/abs/2510.11190
Authors: Shengming Yuan,Xinyu Lyu,Shuailong Wang,Beitao Chen,Jingkuan Song,Lianli Gao
Affiliations: University of Electronic Science and Technology of China; Southwestern University of Finance and Economics; Engineering Research Center of Intelligent Finance, Ministry of Education; Tongji University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 19 pages, 11 figures. Accepted by the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

[CV-54] Saudi Sign Language Translation Using T5

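[Quick read]: This paper addresses the difficulty of recognizing and translating Saudi Sign Language (SSL), which stems from data scarcity and distinctive visual characteristics such as face coverings. The key to the solution is to use large-scale American Sign Language (ASL) data to improve SSL translation: a T5 model is pre-trained on the YouTubeASL dataset and then fine-tuned on the SSL task. Experiments show that this cross-lingual transfer strategy substantially improves performance (roughly a 3× gain in BLEU-4), supporting the effectiveness of cross-lingual knowledge transfer in sign language models.

Link: https://arxiv.org/abs/2510.11183
Authors: Ali Alhejab,Tomas Zelezny,Lamya Alkanhal,Ivan Gruber,Yazeed Alharbi,Jakub Straka,Vaclav Javorek,Marek Hruz,Badriah Alkalifah,Ahmed Ali
Affiliations: unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 11 pages, supplementary, SPECOM 2025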

Abstract:This paper explores the application of T5 models for Saudi Sign Language (SSL) translation using a novel dataset. The SSL dataset includes three challenging testing protocols, enabling comprehensive evaluation across different scenarios. Additionally, it captures unique SSL characteristics, such as face coverings, which pose challenges for sign recognition and translation. In our experiments, we investigate the impact of pre-training on American Sign Language (ASL) data by comparing T5 models pre-trained on the YouTubeASL dataset with models trained directly on the SSL dataset. Experimental results demonstrate that pre-training on YouTubeASL significantly improves models’ performance (roughly 3× in BLEU-4), indicating cross-linguistic transferability in sign language models. Our findings highlight the benefits of leveraging large-scale ASL data to improve SSL translation and provide insights into the development of more effective sign language translation systems. Our code is publicly available at our GitHub repository.

[CV-55] BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language Models

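[Quick read]: This paper addresses the insufficient evaluation of how well vision-language models (VLMs) understand everyday cultural knowledge across cultures. Existing evaluations are largely limited to static recall or isolated visual grounding and cannot measure robustness and transfer across linguistic rephrasings and visual modalities. The key to the solution is BLEnD-Vis, a multicultural, multimodal benchmark that builds 313 culturally grounded question templates spanning 16 regions and generates three aligned multiple-choice formats: (i) a text-only query (Region → Entity), (ii) an inverted text-only query (Entity → Region), and (iii) a VQA-style version with generated images. This allows a systematic evaluation of VLMs under linguistic rephrasing and of their cross-modal consistency. The benchmark contains 4,916 images and more than 21,000 multiple-choice question (MCQ) instances validated through human annotation. It reveals the fragility of current VLMs' cultural knowledge, with notable performance drops for lower-resource regions, and provides an evaluation tool for improving cultural competence and multimodal integration.

Link: https://arxiv.org/abs/2510.11178
Authors: Bryan Chen Zhengyu Tan,Zheng Weihua,Zhengyuan Liu,Nancy F. Chen,Hwaran Lee,Kenny Tsu Wei Choo,Roy Ka-Wei Lee
Affiliations: Singapore University of Technology and Design (SUTD); Institute for Infocomm Research (I2R), A*STAR, Singapore; Sogang University
Categories: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
Comments: Code and Dataset to be released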

Abstract:As vision-language models (VLMs) are deployed globally, their ability to understand culturally situated knowledge becomes essential. Yet, existing evaluations largely assess static recall or isolated visual grounding, leaving unanswered whether VLMs possess robust and transferable cultural understanding. We introduce BLEnD-Vis, a multimodal, multicultural benchmark designed to evaluate the robustness of everyday cultural knowledge in VLMs across linguistic rephrasings and visual modalities. Building on the BLEnD dataset, BLEnD-Vis constructs 313 culturally grounded question templates spanning 16 regions and generates three aligned multiple-choice formats: (i) a text-only baseline querying from Region → Entity, (ii) an inverted text-only variant (Entity → Region), and (iii) a VQA-style version of (ii) with generated images. The resulting benchmark comprises 4,916 images and over 21,000 multiple-choice question (MCQ) instances, validated through human annotation. BLEnD-Vis reveals significant fragility in current VLM cultural knowledge; models exhibit performance drops under linguistic rephrasing and, whilst visual cues often aid performance, low cross-modal consistency highlights challenges in robustly integrating textual and visual understanding, particularly for lower-resource regions. BLEnD-Vis thus provides a crucial testbed for systematically analysing cultural robustness and multimodal grounding, exposing limitations and guiding the development of more culturally competent VLMs.

[CV-56] G2L: From Giga-Scale to Cancer-Specific Large-Scale Pathology Foundation Models via Knowledge Distillation

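[Quick read]: This paper addresses the difficulty of deploying giga-scale pathology foundation models in practice because of their computational cost. The core challenge is to sharply reduce parameter count and training resources without sacrificing performance. The key to the solution is G2L, a knowledge distillation framework that transfers the knowledge of a giga-scale model to a large-scale model with only 15% of its parameters, using just 1K pathology slides of a target cancer (for example, breast or prostate). Experiments show that the distilled model outperforms state-of-the-art models of the same size on several benchmarks and, on some tasks, even surpasses the giga-scale teacher and huge-scale models. It also shows a higher robustness index, indicating better resilience to image variations across institutions. The method is both data- and parameter-efficient and offers a practical high-performance alternative for cancer-specific tasks.

Link: https://arxiv.org/abs/2510.11176
Authors: Yesung Cho,Sungmin Lee,Geongyu Lee,Minkyung Lee,Jongbae Park,Dongmyung Shin
Affiliations: Korea University; Seoul National University of Science and Technology
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: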

Abstract:Recent studies in pathology foundation models have shown that scaling training data, diversifying cancer types, and increasing model size consistently improve their performance. However, giga-scale foundation models, which are trained on hundreds of thousands of slides covering tens of cancer types and contain billions of parameters, pose significant challenges for practical use due to their tremendous computational costs in both development and deployment. In this work, we present a novel strategy, named the G2L framework, to increase the performance of large-scale foundation models, which consist of only 15% of the parameters of giga-scale models, to a comparable performance level of giga-scale models in cancer-specific tasks. Our approach applies knowledge distillation, transferring the capabilities of a giga-scale model to a large-scale model, using just 1K pathology slides of a target cancer (e.g., breast, prostate, etc.). The resulting distilled model not only outperformed state-of-the-art models of the same size (i.e., large-scale) across several benchmarks but also, interestingly, surpassed the giga-scale teacher and huge-scale models in some benchmarks. In addition, the distilled model exhibited a higher robustness index, indicating improved resilience to image variations originating from multiple institutions. These findings suggest that the proposed distillation approach for a large-scale model is a data- and parameter-efficient way to achieve giga-scale-level performance for cancer-specific applications without prohibitive computational burden.
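For readers unfamiliar with knowledge distillation, the sketch below shows the classic temperature-scaled logit-matching loss. It is a generic illustration only: the abstract does not specify the G2L objective, which may well operate on features rather than logits, and the function names and default temperature are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0.0)
    return kl * temperature ** 2

teacher = [4.0, 1.0, 0.5]  # hypothetical teacher outputs
print(distillation_loss(teacher, teacher))          # 0.0 when the student matches
print(distillation_loss([0.5, 1.0, 4.0], teacher))  # positive when it disagrees
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is the signal a smaller student is trained on.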

[CV-57] Reliable Cross-modal Alignment via Prototype Iterative Construction

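[Quick read]: This paper addresses the semantic bias or information loss that style information causes in cross-modal alignment. Conventional methods assume that embeddings contain only semantic information and ignore non-semantic factors such as style, which limits alignment quality. The key to the solution is the PICO framework, which quantifies the probability that each feature column represents semantic information and uses it as a weight during embedding interaction to suppress style interference. To make this probability reliable, the authors design a prototype iterative construction method driven by performance feedback, and prove theoretically that its weighting function assigns higher weights to prototypes that bring larger performance gains, strengthening semantic consistency and alignment accuracy.

Link: https://arxiv.org/abs/2510.11175
Authors: Xiang Ma,Litian Xu,Lexin Fang,Caiming Zhang,Lizhen Cui
Affiliations: Shandong University; The University of Exeter; The Joint SDU-NTU Centre for Artificial Intelligence Research
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: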

Abstract:Cross-modal alignment is an important multi-modal task, aiming to bridge the semantic gap between different modalities. The most reliable foundation for achieving this objective lies in the semantic consistency between matched pairs. Conventional methods implicitly assume embeddings contain solely semantic information, ignoring the impact of non-semantic information during alignment, which inevitably leads to information bias or even loss. This non-semantic information primarily manifests as stylistic variations in the data, which we formally define as style information. An intuitive approach is to separate style from semantics, aligning only the semantic information. However, most existing methods distinguish them based on feature columns, which cannot represent the complex coupling relationship between semantic and style information. In this paper, we propose PICO, a novel framework for suppressing style interference during embedding interaction. Specifically, we quantify the probability of each feature column representing semantic information, and regard it as the weight during the embedding interaction. To ensure the reliability of the semantic probability, we propose a prototype iterative construction method. The key operation of this method is a performance feedback-based weighting function, and we have theoretically proven that the function can assign higher weight to prototypes that bring higher performance improvements. Extensive experiments on various benchmarks and model backbones demonstrate the superiority of PICO, outperforming state-of-the-art methods by 5.2%-14.1%.

[CV-58] CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation

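[Quick read]: This paper addresses the limited interpretability and semantic detail of existing reasoning segmentation methods, which either connect a language model's hidden features directly to a mask decoder or represent positions only in text. The key to the solution is CoPRS, a positional perception model based on Multi-modal Chain-of-Thought (MCoT) that links language reasoning to segmentation through a differentiable and interpretable positional prior expressed as a heatmap. A learnable concentration token aggregates image and reasoning-text features to generate a dense, differentiable heatmap as the positional prior, which a lightweight decoder converts into precise masks. This gives a clear mapping between the reasoning process and the segmentation result and makes the model easier to diagnose.

Link: https://arxiv.org/abs/2510.11173
Authors: Zhenyu Lu,Liupeng Li,Jinpeng Wang,Yan Feng,Bin Chen,Ke Chen,Yaowei Wang
Affiliations: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Peng Cheng Laboratory; Harbin Institute of Technology, Shenzhen; Meituan, Beijing; University of Chinese Academy of Sciences
Categories: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Comments: 18 pages, 6 figures, 6 tables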

Abstract:Existing works on reasoning segmentation either connect hidden features from a language model directly to a mask decoder or represent positions in text, which limits interpretability and semantic detail. To solve this, we present CoPRS, a Multi-modal Chain-of-Thought (MCoT)-based positional perception model that bridges language reasoning to segmentation through a differentiable and interpretable positional prior instantiated as a heatmap. By making the reasoning process clear via MCoT and expressing it as a dense, differentiable heatmap, this interface enhances interpretability and diagnostic analysis and yields more concentrated evidence on the target. A learnable concentration token aggregates features of the image and reasoning text to generate this positional prior, which is decoded to precise masks through a lightweight decoder, providing a direct connection between reasoning and segmentation. Across the RefCOCO series and ReasonSeg, CoPRS matches or surpasses the best reported metrics on each standard split under comparable protocols, with performance at or above prior state of the art across both validation and test partitions. Extensive experiments reveal that the quality of the heatmap strongly influences the resulting mask quality, supporting a consistent association between the reasoning output and downstream mask generation. Collectively, these findings support the utility of this paradigm in bridging reasoning and segmentation and show advantages in concentration driven by reasoning and predicting masks more precisely. Code, checkpoints and logs are released at this https URL.

[CV-59] Multiview Manifold Evidential Fusion for PolSAR Image Classification

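[Quick read]: This paper addresses the reliability and interpretability of multi-view feature fusion in polarimetric synthetic aperture radar (PolSAR) image classification. Conventional methods simply concatenate covariance matrices and multi-features (such as scattering angle, entropy, and texture) or model them jointly with deep networks. They ignore that the two views lie on different manifolds (the Hermitian Positive Definite (HPD) manifold and the Grassmann manifold) with distinct geometric structures, and they account for neither the relative importance of each view nor uncertainty, which makes predictions unstable and unreliable. The key to the solution is the Multiview Manifold Evidential Fusion network (MMEFnet), which first learns deep representations of the covariance matrices and multi-features on their respective manifolds, then applies a trusted fusion mechanism based on Dempster-Shafer evidence theory to quantify each view's belief and estimate uncertainty, yielding more reliable and interpretable classification decisions.

Link: https://arxiv.org/abs/2510.11171
Authors: Junfei Shi,Haojia Zhang,Haiyan Jin,Junhuai Li,Xiaogang Song,Yuanfan Guo,Haonan Su,Weisi Lin
Affiliations: Xi’an University of Technology; Nanyang Technological University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: The paper has 14 pages and 7 figures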

Abstract:Polarimetric Synthetic Aperture Radar (PolSAR) covariance matrices and their extracted multi-features - such as scattering angle, entropy, texture, and boundary descriptors - provide complementary and physically interpretable information for image classification. Traditional fusion strategies typically concatenate these features or employ deep learning networks to combine them. However, the covariance matrices and multi-features, as two complementary views, lie on different manifolds with distinct geometric structures. Existing fusion methods also overlook the varying importance of different views and ignore uncertainty, often leading to unreliable predictions. To address these issues, we propose a Multiview Manifold Evidential Fusion (MMEFnet) method to effectively fuse these two views. It gives a new framework to integrate PolSAR manifold learning and evidence fusion into a unified architecture. Specifically, covariance matrices are represented on the Hermitian Positive Definite (HPD) manifold, while multi-features are modeled on the Grassmann manifold. Two different kernel metric learning networks are constructed to learn their manifold representations. Subsequently, a trusted multiview evidence fusion, replacing the conventional softmax classifier, estimates belief mass and quantifies the uncertainty of each view from the learned deep features. Finally, a Dempster-Shafer theory-based fusion strategy combines evidence, enabling a more reliable and interpretable classification. Extensive experiments on three real-world PolSAR datasets demonstrate that the proposed method consistently outperforms existing approaches in accuracy, robustness, and interpretability.
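The evidence-fusion step can be illustrated with the reduced Dempster-Shafer combination rule commonly used in trusted multi-view classification. Whether MMEFnet uses exactly this rule is an assumption; the evidence values and function names below are hypothetical.

```python
def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to (belief masses, uncertainty).

    Uses the Dirichlet strength S = sum(e_k + 1), so beliefs and
    uncertainty sum to one.
    """
    k = len(evidence)
    strength = sum(evidence) + k
    beliefs = [e / strength for e in evidence]
    uncertainty = k / strength
    return beliefs, uncertainty

def combine(op1, op2):
    """Fuse two opinions with the reduced Dempster-Shafer rule."""
    b1, u1 = op1
    b2, u2 = op2
    # Conflict: mass the two views assign to different classes.
    conflict = sum(b1[i] * b2[j]
                   for i in range(len(b1))
                   for j in range(len(b2)) if i != j)
    scale = 1.0 / (1.0 - conflict)
    beliefs = [scale * (x * y + x * u2 + y * u1) for x, y in zip(b1, b2)]
    uncertainty = scale * u1 * u2
    return beliefs, uncertainty

view_a = opinion_from_evidence([9.0, 1.0, 0.0])  # hypothetical covariance view
view_b = opinion_from_evidence([4.0, 0.0, 0.0])  # hypothetical multi-feature view
fused_beliefs, fused_u = combine(view_a, view_b)
print(fused_beliefs, fused_u)
```

When the two views agree on a class, the fused uncertainty falls below that of either view, which is the behaviour that makes the fused decision more trustworthy than a plain softmax average.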

[CV-60] Validation of an Artificial Intelligence Tool for the Detection of Sperm DNA Fragmentation Using the TUNEL In Situ Hybridization Assay

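[Quick read]: This paper addresses the inability of conventional semen analysis to assess sperm DNA fragmentation (SDF), a key indicator of male fertility. The key to the solution is a morphology-assisted ensemble artificial intelligence (AI) model that combines image processing techniques with a transformer-based machine learning model (GC-ViT) to predict sperm DNA integrity from phase contrast microscopy images, enabling non-destructive, real-time sperm selection to support clinical diagnosis and treatment.

Link: https://arxiv.org/abs/2510.11142
Authors: Byron Alexander Jacobs,Aqeel Morris,Ifthakaar Shaik,Frando Lin
Affiliations: unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: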

Abstract:Sperm DNA fragmentation (SDF) is a critical parameter in male fertility assessment that conventional semen analysis fails to evaluate. This study presents the validation of a novel artificial intelligence (AI) tool designed to detect SDF through digital analysis of phase contrast microscopy images, using the terminal deoxynucleotidyl transferase dUTP nick end labeling (TUNEL) assay as the gold standard reference. Utilising the established link between sperm morphology and DNA integrity, the present work proposes a morphology-assisted ensemble AI model that combines image processing techniques with state-of-the-art transformer-based machine learning models (GC-ViT) for the prediction of DNA fragmentation in sperm from phase contrast images. The ensemble model is benchmarked against a pure transformer 'vision' model as well as a 'morphology-only' model. Promising results show the proposed framework is able to achieve sensitivity of 60% and specificity of 75%. This non-destructive methodology represents a significant advancement in reproductive medicine by enabling real-time sperm selection based on DNA integrity for clinical diagnostic and therapeutic applications.
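As a reminder of what the two reported metrics measure, the sketch below computes sensitivity and specificity from a confusion matrix. The counts are hypothetical, chosen only so that the results reproduce the 60% and 75% figures; they are not data from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly fragmented sperm that the model flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of intact sperm that the model correctly clears."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts for 100 TUNEL-positive and 100 TUNEL-negative cells.
tp, fn = 60, 40
tn, fp = 75, 25
print(sensitivity(tp, fn))  # 0.6
print(specificity(tn, fp))  # 0.75
```

With these counts, 40 of 100 fragmented cells would be missed and 25 of 100 intact cells would be wrongly rejected, which puts the reported operating point in concrete terms.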

[CV-61] video-SALMONN S: Streaming Audio-Visual LLMs Beyond Length Limits via Memory

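[Quick read]: This paper addresses the scalability limits of current video-understanding LLMs when processing long, high-frame-rate, high-resolution video streams, in particular the difficulty of preserving long-range dependencies under a memory constraint. The key to the solution is video-SALMONN S, a streaming multimodal LLM with two core mechanisms: (i) a test-time-training (TTT) memory module, optimized with a Hessian-free conjugate-gradient procedure (TTT_HF), that continually updates token representations in place of conventional token merging and thereby captures long-range dependencies; and (ii) a prompt-dependent memory reader that selectively retrieves task-relevant context from a fixed-size memory. With this design the model can process videos up to 3 hours long at 1 FPS and 360p under a fixed memory budget, and it outperforms offline and streaming baselines on several long-video benchmarks such as Video-MME and LVBench.

Link: https://arxiv.org/abs/2510.11129
Authors: Guangzhi Sun,Yixuan Li,Xiaodong Wu,Yudong Yang,Wei Li,Zejun Ma,Chao Zhang
Affiliations: Tsinghua University; University of Cambridge; ByteDance
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: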

Abstract:Continuous, high-frame-rate, high-resolution processing of long video streams is critical for future AI agents, yet current video-understanding LLMs struggle to scale. Offline, fixed-frame-number methods require the stream length to adapt frame rates; streaming methods constrain memory by merging or discarding tokens, losing information. We propose video-SALMONN S, a streaming audio-visual LLM that, to our knowledge, is the first to process 3-hour videos at 1 FPS and 360p resolution under a fixed memory budget. Our model introduces (i) a test-time-training (TTT) memory module that continually updates token representations to capture long-range dependencies by replacing token merging, and (ii) a prompt-dependent memory reader that selectively retrieves context-relevant content from fixed-size memory. The TTT module is optimised with a Hessian-free conjugate-gradient procedure (TTT_HF) for efficient adaptation. On long-video benchmarks (Video-MME, LVBench, VideoEvalPro), video-SALMONN S sustains high-quality understanding on multi-hour videos with 10k frames and 1M tokens. Our 8B-parameter model achieves 74.2% overall and 67.8% on the Video-MME long split, outperforming both offline and streaming baselines.
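The scale quoted in the abstract can be sanity-checked with a short calculation. The sketch assumes that "1M tokens" refers to the total over one 3-hour stream, which the abstract does not state explicitly; the helper names are illustrative.

```python
def frame_count(hours, fps):
    """Number of frames in a stream of the given length and frame rate."""
    return int(hours * 3600 * fps)

def tokens_per_frame(total_tokens, frames):
    """Average token budget per frame before any memory compression."""
    return total_tokens / frames

frames = frame_count(hours=3, fps=1)
print(frames)                                         # 10800
print(round(tokens_per_frame(1_000_000, frames), 1))  # 92.6
```

A 3-hour video at 1 FPS gives 10,800 frames, in line with the "10k frames" figure, and under the stated assumption the stream averages roughly 93 tokens per frame. That volume is what the fixed-size memory has to absorb.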

[CV-62] Lightweight Facial Landmark Detection in Thermal Images via Multi-Level Cross-Modal Knowledge Transfer

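[Quick read]: This paper addresses the performance bottleneck of Facial Landmark Detection (FLD) in thermal imagery, which lacks rich visual cues. Conventional cross-modal approaches such as feature fusion or RGB-to-thermal image translation are computationally expensive or introduce structural artifacts, which limits practical deployment. The key to the solution is the Multi-Level Cross-Modal Knowledge Distillation (MLCM-KD) framework, which decouples high-fidelity RGB-to-thermal knowledge transfer from model compression to obtain accurate and efficient thermal FLD models. Its core innovation is Dual-Injected Knowledge Distillation (DIKD), a bidirectional mechanism: the rich features of the RGB teacher guide the thermal student, and the student's learned representations are fed back into the frozen teacher's prediction head for validation. This closed-loop supervision forces the student to learn modality-invariant semantic features, bridging the modality gap between RGB and thermal data and making the knowledge transfer more robust.

Link: https://arxiv.org/abs/2510.11128
Authors: Qiyi Tong,Olivia Nocentini,Marta Lagomarsino,Kuanqi Cai,Marta Lorenzini,Arash Ajoudani
Affiliations: Istituto Italiano di Tecnologia; Università di Genova
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Comments: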

Abstract:Facial Landmark Detection (FLD) in thermal imagery is critical for applications in challenging lighting conditions, but it is hampered by the lack of rich visual cues. Conventional cross-modal solutions, like feature fusion or image translation from RGB data, are often computationally expensive or introduce structural artifacts, limiting their practical deployment. To address this, we propose Multi-Level Cross-Modal Knowledge Distillation (MLCM-KD), a novel framework that decouples high-fidelity RGB-to-thermal knowledge transfer from model compression to create both accurate and efficient thermal FLD models. A central challenge during knowledge transfer is the profound modality gap between RGB and thermal data, where traditional unidirectional distillation fails to enforce semantic consistency across disparate feature spaces. To overcome this, we introduce Dual-Injected Knowledge Distillation (DIKD), a bidirectional mechanism designed specifically for this task. DIKD establishes a connection between modalities: it not only guides the thermal student with rich RGB features but also validates the student’s learned representations by feeding them back into the frozen teacher’s prediction head. This closed-loop supervision forces the student to learn modality-invariant features that are semantically aligned with the teacher, ensuring a robust and profound knowledge transfer. Experiments show that our approach sets a new state-of-the-art on public thermal FLD benchmarks, notably outperforming previous methods while drastically reducing computational overhead.

[CV-63] Demystifying Numerosity in Diffusion Models – Limitations and Remedies

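[Quick read]: This paper addresses the weak handling of numerosity in current text-to-image generation models, which often fail to generate the number of objects specified in a text prompt. The study finds that scaling up training data and model parameters alone does not improve counting accuracy. The underlying reason is that diffusion models rely more on the noise initialization than on the explicit count in the text, and noise priors are biased toward specific counts. The key to the solution is to inject count-aware layout information into the noise prior, which steers the model toward the count specified in the text. The strategy substantially improves counting accuracy on two benchmark datasets, demonstrating its effectiveness and generalization.

Link: https://arxiv.org/abs/2510.11117
Authors: Yaqi Zhao,Xiaochen Wang,Li Dong,Wentao Zhang,Yuhui Yuan
Affiliations: Peking University; Microsoft Research Asia
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: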

Abstract:Numerosity remains a challenge for state-of-the-art text-to-image generation models like FLUX and GPT-4o, which often fail to accurately follow counting instructions in text prompts. In this paper, we aim to study a fundamental yet often overlooked question: Can diffusion models inherently generate the correct number of objects specified by a textual prompt simply by scaling up the dataset and model size? To enable rigorous and reproducible evaluation, we construct a clean synthetic numerosity benchmark comprising two complementary datasets: GrayCount250 for controlled scaling studies, and NaturalCount6 featuring complex naturalistic scenes. Second, we empirically show that the scaling hypothesis does not hold: larger models and datasets alone fail to improve counting accuracy on our benchmark. Our analysis identifies a key reason: diffusion models tend to rely heavily on the noise initialization rather than the explicit numerosity specified in the prompt. We observe that noise priors exhibit biases toward specific object counts. In addition, we propose an effective strategy for controlling numerosity by injecting count-aware layout information into the noise prior. Our method achieves significant gains, improving accuracy on GrayCount250 from 20.0% to 85.3% and on NaturalCount6 from 74.8% to 86.3%, demonstrating effective generalization across settings.

[CV-64] Connecting Giants: Synergistic Knowledge Transfer of Large Multimodal Models for Few-Shot Learning IJCAI2025

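Quick read: This paper addresses the drop in classification performance that few-shot learning (FSL) suffers when training samples are scarce, and in particular the noise and bias that existing methods introduce when they draw semantic knowledge from small models. It proposes a framework called Synergistic Knowledge Transfer (SynTrans). An unsupervised proxy task distills semantically aligned visual knowledge from a large multimodal model (such as CLIP), and a training-free synergistic knowledge mining module extracts high-quality semantic knowledge across multiple models. A visual-semantic bridging module then transfers knowledge in both directions, turning explicit visual and implicit semantic knowledge into category-specific classifier weights. Finally, a visual weight generator and a semantic weight reconstructor adaptively build the multimodal FSL classifier, which substantially improves few-shot classification performance.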

Link: https://arxiv.org/abs/2510.11115
Authors: Hao Tang,Shengfeng He,Jing Qin
Affiliations: Centre for Smart Health, The Hong Kong Polytechnic University; School of Computing and Information Systems, Singapore Management University
Categories: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Comments: Accepted by IJCAI 2025

Abstract:Few-shot learning (FSL) addresses the challenge of classifying novel classes with limited training samples. While some methods leverage semantic knowledge from smaller-scale models to mitigate data scarcity, these approaches often introduce noise and bias due to the data’s inherent simplicity. In this paper, we propose a novel framework, Synergistic Knowledge Transfer (SynTrans), which effectively transfers diverse and complementary knowledge from large multimodal models to empower the off-the-shelf few-shot learner. Specifically, SynTrans employs CLIP as a robust teacher and uses a few-shot vision encoder as a weak student, distilling semantic-aligned visual knowledge via an unsupervised proxy task. Subsequently, a training-free synergistic knowledge mining module facilitates collaboration among large multimodal models to extract high-quality semantic knowledge. Building upon this, a visual-semantic bridging module enables bi-directional knowledge transfer between visual and semantic spaces, transforming explicit visual and implicit semantic knowledge into category-specific classifier weights. Finally, SynTrans introduces a visual weight generator and a semantic weight reconstructor to adaptively construct optimal multimodal FSL classifiers. Experimental results on four FSL datasets demonstrate that SynTrans, even when paired with a simple few-shot vision encoder, significantly outperforms current state-of-the-art methods.

[CV-65] Multimodal Disease Progression Modeling via Spatiotemporal Disentanglement and Multiscale Alignment NEURIPS2025

Link: https://arxiv.org/abs/2510.11112
Authors: Chen Liu,Wenfang Yao,Kejing Yin,William K. Cheung,Jing Qin
Affiliations: The Hong Kong Polytechnic University; Hong Kong Baptist University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: NeurIPS 2025 Spotlight

[CV-66] MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps ICCV2025

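Quick read: This paper addresses learning semantically and functionally meaningful 3D motion priors from real-world videos, so that future 3D scene motion can be predicted from a single input image. The key is a pixel-aligned Motion Map (MoMap) representation, which can be generated efficiently from existing generative image models and is used to train a diffusion model that learns the motion distribution. MoMaps support 3D trajectory synthesis and also suggest a new 2D video synthesis pipeline: first generate a MoMap, then warp the image accordingly and complete the point-based rendering, which yields plausible and semantically consistent 3D scene motion predictions.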

Link: https://arxiv.org/abs/2510.11107
Authors: Jiahui Lei,Kyle Genova,George Kopanas,Noah Snavely,Leonidas Guibas
Affiliations: Google DeepMind; University of Pennsylvania; Google
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted at ICCV 2025, project page: this https URL

Abstract:This paper addresses the challenge of learning semantically and functionally meaningful 3D motion priors from real-world videos, in order to enable prediction of future 3D scene motion from a single input image. We propose a novel pixel-aligned Motion Map (MoMap) representation for 3D scene motion, which can be generated from existing generative image models to facilitate efficient and effective motion prediction. To learn meaningful distributions over motion, we create a large-scale database of MoMaps from over 50,000 real videos and train a diffusion model on these representations. Our motion generation not only synthesizes trajectories in 3D but also suggests a new pipeline for 2D video synthesis: first generate a MoMap, then warp an image accordingly and complete the warped point-based renderings. Experimental results demonstrate that our approach generates plausible and semantically consistent 3D scene motion.
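
The "warp an image, then complete the rendering" step can be illustrated with a small sketch. It is a simplification and not the authors' pipeline: it assumes integer 2D pixel displacements instead of pixel-aligned 3D motion, and the function name `forward_warp` is hypothetical. Target pixels that no source pixel lands on stay `None`; these are the holes a completion model would then fill.

```python
def forward_warp(image, flow):
    """Move each pixel by its (dy, dx) displacement.

    `image` is a 2D list of pixel values and `flow` a 2D list of integer
    (dy, dx) pairs. Targets that receive no pixel remain None.
    """
    height, width = len(image), len(image[0])
    warped = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dy, dx = flow[y][x]
            ty, tx = y + dy, x + dx
            # Pixels pushed outside the frame are simply dropped.
            if 0 <= ty < height and 0 <= tx < width:
                warped[ty][tx] = image[y][x]
    return warped
```

Shifting a 2x2 image one pixel to the right, for example, leaves the left column empty and discards the right column.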

[CV-67] Compositional Zero-Shot Learning: A Survey

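Quick read: This paper addresses the core challenge of Compositional Zero-Shot Learning (CZSL): recognizing unseen attribute-object combinations at inference time without training data for every possible combination. The key is to model both the contextuality of visual representations and the decomposability of compositional structure; for example, "small" cats look markedly different from "older" cats, and the composition "wet car" must be distinguished from "wet cat". To this end, the authors present the first systematic survey of CZSL, organizing methods into four families based on disentanglement: no explicit disentanglement, textual disentanglement, visual disentanglement, and cross-modal disentanglement. They analyze the strengths and weaknesses of each family in closed-world and open-world settings and point out directions for future research.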

Link: https://arxiv.org/abs/2510.11106
Authors: Ans Munir,Faisal Z. Qureshi,Mohsen Ali,Muhammad Haris Khan
Affiliations: Information Technology University; University of Ontario Institute of Technology; Mohamed Bin Zayed University of Artificial Intelligence
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Survey paper with 36 pages, 8 plots and 4 figures

Abstract:Compositional Zero-Shot Learning (CZSL) is a critical task in computer vision that enables models to recognize unseen combinations of known attributes and objects during inference, addressing the combinatorial challenge of requiring training data for every possible composition. This is particularly challenging because the visual appearance of primitives is highly contextual; for example, "small" cats appear visually distinct from "older" ones, and "wet" cars differ significantly from "wet" cats. Effectively modeling this contextuality and the inherent compositionality is crucial for robust compositional zero-shot recognition. This paper presents, to our knowledge, the first comprehensive survey specifically focused on Compositional Zero-Shot Learning. We systematically review the state-of-the-art CZSL methods, introducing a taxonomy grounded in disentanglement, with four families of approaches: no explicit disentanglement, textual disentanglement, visual disentanglement, and cross-modal disentanglement. We provide a detailed comparative analysis of these methods, highlighting their core advantages and limitations in different problem settings, such as closed-world and open-world CZSL. Finally, we identify the most significant open challenges and outline promising future research directions. This survey aims to serve as a foundational resource to guide and inspire further advancements in this fascinating and important field. Papers studied in this survey with their official code are available on our github: this https URL

[CV-68] CoDefend: Cross-Modal Collaborative Defense via Diffusion Purification and Prompt Optimization

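Quick read: This paper addresses the vulnerability of Multimodal Large Language Models (MLLMs) to adversarial attacks, especially the security risk posed by the visual modality as the main attack entry point. Existing defenses such as adversarial training and input purification have limits: the former works only against known attacks and is computationally expensive, while the latter often degrades image quality and generalizes poorly. The paper proposes a supervised diffusion-based denoising framework. Its key idea is to use paired adversarial-clean image datasets to fine-tune a diffusion model with directional, task-specific guidance, which yields high-quality reconstructions and markedly better robustness on multimodal tasks. Prompt optimization is added as a complementary mechanism to strengthen resistance to unknown attack strategies. Experiments on image captioning and visual question answering show that the method substantially improves defense and transfers well to unseen attacks, supporting more reliable deployment of MLLMs in real-world settings.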

Link: https://arxiv.org/abs/2510.11096
Authors: Fengling Zhu,Boshi Liu,Jingyu Hua,Sheng Zhong
Affiliations: Nanjing University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: none

Abstract:Multimodal Large Language Models (MLLMs) have achieved remarkable success in tasks such as image captioning, visual question answering, and cross-modal reasoning by integrating visual and textual modalities. However, their multimodal nature also exposes them to adversarial threats, where attackers can perturb either modality or both jointly to induce harmful, misleading, or policy-violating outputs. Existing defense strategies, such as adversarial training and input purification, face notable limitations: adversarial training typically improves robustness only against known attacks while incurring high computational costs, whereas conventional purification approaches often suffer from degraded image quality and insufficient generalization to complex multimodal tasks. In this work, we focus on defending the visual modality, which frequently serves as the primary entry point for adversarial manipulation. We propose a supervised diffusion-based denoising framework that leverages paired adversarial-clean image datasets to fine-tune diffusion models with directional, task-specific guidance. Unlike prior unsupervised purification methods such as DiffPure, our approach achieves higher-quality reconstructions while significantly improving defense robustness in multimodal tasks. Furthermore, we incorporate prompt optimization as a complementary defense mechanism, enhancing resistance against diverse and unseen attack strategies. Extensive experiments on image captioning and visual question answering demonstrate that our method not only substantially improves robustness but also exhibits strong transferability to unknown adversarial attacks. These results highlight the effectiveness of supervised diffusion-based denoising for multimodal defense, paving the way for more reliable and secure deployment of MLLMs in real-world applications.

[CV-69] Future-Aware End-to-End Driving: Bidirectional Modeling of Trajectory Planning and Scene Evolution NEURIPS2025

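Quick read: This paper addresses the limited attention that current end-to-end autonomous driving methods pay to how a scene evolves. These methods usually follow a "one-shot" paradigm that predicts actions from the current scene context alone, which makes adaptive, forward-looking decisions difficult in complex driving scenarios. The proposed SeerDrive framework jointly models future scene evolution and trajectory planning in a closed loop: it predicts bird's-eye view (BEV) representations to anticipate changes in the environment, then feeds this foresight into the trajectory planner to produce more forward-looking paths. Two components make this possible: future-aware planning, and iterative scene modeling and vehicle planning refined through collaborative optimization. Together they markedly improve decision-making in complex real-world traffic scenarios.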

Link: https://arxiv.org/abs/2510.11092
Authors: Bozhou Zhang,Nan Song,Jingyu Li,Xiatian Zhu,Jiankang Deng,Li Zhang
Affiliations: Fudan University; Shanghai Innovation Institute; University of Surrey; Imperial College London
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: NeurIPS 2025

Abstract:End-to-end autonomous driving methods aim to directly map raw sensor inputs to future driving actions such as planned trajectories, bypassing traditional modular pipelines. While these approaches have shown promise, they often operate under a one-shot paradigm that relies heavily on the current scene context, potentially underestimating the importance of scene dynamics and their temporal evolution. This limitation restricts the model’s ability to make informed and adaptive decisions in complex driving scenarios. We propose a new perspective: the future trajectory of an autonomous vehicle is closely intertwined with the evolving dynamics of its environment, and conversely, the vehicle’s own future states can influence how the surrounding scene unfolds. Motivated by this bidirectional relationship, we introduce SeerDrive, a novel end-to-end framework that jointly models future scene evolution and trajectory planning in a closed-loop manner. Our method first predicts future bird’s-eye view (BEV) representations to anticipate the dynamics of the surrounding scene, then leverages this foresight to generate future-context-aware trajectories. Two key components enable this: (1) future-aware planning, which injects predicted BEV features into the trajectory planner, and (2) iterative scene modeling and vehicle planning, which refines both future scene prediction and trajectory generation through collaborative optimization. Extensive experiments on the NAVSIM and nuScenes benchmarks show that SeerDrive significantly outperforms existing state-of-the-art methods.

[CV-70] Text-Enhanced Panoptic Symbol Spotting in CAD Drawings

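Quick read: This paper addresses two problems in symbol spotting for CAD drawings: existing methods usually ignore the rich textual annotations in the drawings, and they do not explicitly model the spatial relationships between geometric and textual elements, so the drawing as a whole is understood incompletely. The key is a panoptic symbol spotting framework that incorporates textual annotations. It builds a unified representation by jointly modeling geometric and textual primitives, and augments a Transformer backbone with a type-aware attention mechanism that explicitly captures spatial dependencies between primitives of different types, improving accuracy and robustness on complex CAD drawings.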

Link: https://arxiv.org/abs/2510.11091
Authors: Xianlin Liu,Yan Gong,Bohao Li,Jiajing Huang,Bowen Du,Junchen Ye,Liyan Xu
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 7 pages, 3 figures. This version is the original submitted manuscript of the paper accepted by The 12th International Conference on Behavioural and Social Computing

Abstract:With the widespread adoption of Computer-Aided Design (CAD) drawings in engineering, architecture, and industrial design, the ability to accurately interpret and analyze these drawings has become increasingly critical. Among various subtasks, panoptic symbol spotting plays a vital role in enabling downstream applications such as CAD automation and design retrieval. Existing methods primarily focus on geometric primitives within the CAD drawings to address this task, but they face two major problems: they usually overlook the rich textual annotations present in CAD drawings, and they lack explicit modeling of relationships among primitives, resulting in an incomplete understanding of the drawings as a whole. To fill this gap, we propose a panoptic symbol spotting framework that incorporates textual annotations. The framework constructs unified representations by jointly modeling geometric and textual primitives. Then, using visual features extracted by a pretrained CNN as the initial representations, a Transformer-based backbone is employed, enhanced with a type-aware attention mechanism to explicitly model the different types of spatial dependencies between various primitives. Extensive experiments on the real-world dataset demonstrate that the proposed method outperforms existing approaches on symbol spotting tasks involving textual annotations, and exhibits superior robustness when applied to complex CAD drawings.

[CV-71] Source-Free Object Detection with Detection Transformer

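Quick read: This paper addresses the difficulty that existing Source-Free Object Detection (SFOD) methods have in adapting to newer detection architectures, in particular the Detection Transformer (DETR). Conventional SFOD methods are mostly confined to classic models such as Faster R-CNN and lack optimizations tailored to DETR, which limits feature extraction and knowledge transfer. The key is FRANCK, a framework designed specifically for DETR with four components: an objectness score-based sample reweighting module (OSSR) that increases attention on less-recognized regions; a contrastive learning module with a matching-based memory bank (CMMB) that strengthens class-wise contrastive learning; an uncertainty-weighted query-fused feature distillation module (UQFD) that improves feature transfer; and a dynamic teacher updating interval (DTUI) strategy that improves pseudo-label reliability. Together these components markedly improve DETR's domain adaptation performance and generalization without source data.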

Link: https://arxiv.org/abs/2510.11090
Authors: Huizai Yao,Sicheng Zhao,Shuo Lu,Hui Chen,Yangyang Li,Guoping Liu,Tengfei Xing,Chenggang Yan,Jianhua Tao,Guiguang Ding
Affiliations: BNRist, Tsinghua University; Department of Automation, Tsinghua University; School of Software, Tsinghua University; Institute of Automation, Chinese Academy of Sciences; Academy of Cyber, China; DiDi Chuxing; School of Automation, Hangzhou Dianzi University; Lishui Institute of Hangzhou Dianzi University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: IEEE Transactions on Image Processing

Abstract:Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data. Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR). In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs. FRANCK comprises four key components: (1) an Objectness Score-based Sample Reweighting (OSSR) module that computes attention-based objectness scores on multi-scale encoder feature maps, reweighting the detection loss to emphasize less-recognized regions; (2) a Contrastive Learning with Matching-based Memory Bank (CMMB) module that integrates multi-level features into memory banks, enhancing class-wise contrastive learning; (3) an Uncertainty-weighted Query-fused Feature Distillation (UQFD) module that improves feature distillation through prediction quality reweighting and query feature fusion; and (4) an improved self-training pipeline with a Dynamic Teacher Updating Interval (DTUI) that optimizes pseudo-label quality. By leveraging these components, FRANCK effectively adapts a source-pre-trained DETR model to a target domain with enhanced robustness and generalization. Extensive experiments on several widely used benchmarks demonstrate that our method achieves state-of-the-art performance, highlighting its effectiveness and compatibility with DETR-based SFOD models.
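
The abstract names a Dynamic Teacher Updating Interval but does not specify how the interval is chosen or how the teacher is updated. As a rough sketch of what a teacher updating interval means in self-training, the code below assumes the common mean-teacher setup, where the teacher is an exponential moving average (EMA) of the student, and uses a fixed interval. Both choices are assumptions, and `ema_update` and `run_self_training` are hypothetical names.

```python
def ema_update(teacher, student, momentum=0.99):
    """Blend student weights into the teacher (assumed EMA rule)."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def run_self_training(student_snapshots, teacher, interval, momentum=0.99):
    """Refresh the teacher only every `interval` student steps.

    `student_snapshots` yields the student weights after each step.
    A dynamic scheme would change `interval` during training; here it
    stays fixed because the abstract does not give the rule.
    """
    updates = 0
    for step, student in enumerate(student_snapshots, start=1):
        if step % interval == 0:
            teacher = ema_update(teacher, student, momentum)
            updates += 1
    return teacher, updates
```

A longer interval keeps pseudo-labels stable for more steps, while a shorter one lets the teacher track the student more closely; that trade-off is presumably what a dynamic interval tunes.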

[CV-72] ROFI: A Deep Learning-Based Ophthalmic Sign-Preserving and Reversible Patient Face Anonymizer

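Quick read: This paper addresses the privacy risk posed by patient face images used for eye disease assessment in the era of digital medicine, while keeping disease features intact so that diagnostic accuracy is preserved. The key is ROFI, a deep learning-based privacy protection framework that combines weakly supervised learning with neural identity translation. It anonymizes facial features while retaining disease features with over 98% accuracy, and it supports secure image reversal for long-term care and audits.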

Link: https://arxiv.org/abs/2510.11073
Authors: Yuan Tian,Min Zhou,Yitong Chen,Fang Li,Lingzi Qi,Shuo Wang,Xieyang Xu,Yu Yu,Shiqiong Xu,Chaoyu Lei,Yankai Jiang,Rongzhao Zhang,Jia Tan,Li Wu,Hong Chen,Xiaowei Liu,Wei Lu,Lin Li,Huifang Zhou,Xuefei Song,Guangtao Zhai,Xianqun Fan
Affiliations: Shanghai Medical College, Fudan University; Shanghai Jiao Tong University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted to Nature NPJ Digital Medicine

Abstract:Patient face images provide a convenient means for evaluating eye diseases, while also raising privacy concerns. Here, we introduce ROFI, a deep learning-based privacy protection framework for ophthalmology. Using weakly supervised learning and neural identity translation, ROFI anonymizes facial features while retaining disease features (over 98% accuracy, κ 0.90). It achieves 100% diagnostic sensitivity and high agreement (κ 0.90) across eleven eye diseases in three cohorts, anonymizing over 95% of images. ROFI works with AI systems, maintaining original diagnoses (κ 0.80), and supports secure image reversal (over 98% similarity), enabling audits and long-term care. These results show ROFI's effectiveness in protecting patient privacy in the digital medicine era.
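
The agreement figures above are reported as κ. Assuming this denotes Cohen's kappa (the abstract does not say so explicitly), the statistic compares observed agreement between two raters with the agreement expected by chance. A minimal sketch follows; the function name `cohens_kappa` is chosen here for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the agreement expected from each rater's label frequencies.
    Undefined (division by zero) when p_e equals 1.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1.0 - expected)
```

For instance, diagnoses made on original images and on anonymized images could be passed as the two label sequences to check how far anonymization changes the outcome.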

[CV-73] LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation

Link: https://arxiv.org/abs/2510.11063
Authors: Chang Liu,Henghui Ding,Kaining Ying,Lingyi Hong,Ning Xu,Linjie Yang,Yuchen Fan,Mingqi Gao,Jingkun Chen,Yunqi Miao,Gengshen Wu,Zhijin Qin,Jungong Han,Zhixiong Zhang,Shuangrui Ding,Xiaoyi Dong,Yuhang Zang,Yuhang Cao,Jiaqi Wang,Chang Soo Lim,Joonyoung Moon,Donghyeon Cho,Tingmin Li,Yixuan Li,Yang Yang,An Yan,Leilei Cao,Feng Lu,Ran Hong,Youhai Jiang,Fengjie Zhu,Yujie Xie,Hongyang Zhang,Zhihui Liu,Shihai Ruan,Quanzhu Niu,Dengxian Gong,Shihao Chen,Tao Zhang,Yikang Zhou,Haobo Yuan,Lu Qi,Xiangtai Li,Shunping Ji,Ran Hong,Feng Lu,Leilei Cao,An Yan,Alexey Nekrasov,Ali Athar,Daan de Geus,Alexander Hermans,Bastian Leibe
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 16 pages, 9 figures

[CV-74] Zero-shot Face Editing via ID-Attribute Decoupled Inversion ICME2025

Link: https://arxiv.org/abs/2510.11050
Authors: Yang Hou,Minggu Wang,Jianjun Zhao
Affiliations: Kyushu University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by ICME2025

[CV-75] Benchmarking Deep Learning Models for Laryngeal Cancer Staging Using the LaryngealCT Dataset

Link: https://arxiv.org/abs/2510.11047
Authors: Nivea Roy,Son Tran,Atul Sajjanhar,K. Devaraja,Prakashini Koteshwara,Yong Xiang,Divya Rao
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: none

[CV-76] Enhancing Zero-Shot Anomaly Detection: CLIP-SAM Collaboration with Cascaded Prompts

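Quick read: This paper addresses zero-shot anomaly segmentation in industrial anomaly detection, specifically how to guide foundation models to segment anomalous regions rather than whole objects. The key is a two-stage framework. In the first stage, the Co-Feature Point Prompt Generation (PPG) module uses CLIP (Contrastive Language-Image Pre-training) and SAM (Segment Anything Model) together to generate positive and negative point prompts, which suppress SAM's tendency to segment the entire object. In the second stage, the Cascaded Prompts for SAM (CPS) module combines cascaded prompts with a lightweight decoder to refine segmentation boundaries and remove isolated noise, giving precise segmentation of anomalous regions. The method reaches state-of-the-art results on multiple datasets; on the Visa dataset it improves F1-max and AP by 10.3% and 7.7%, respectively.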

Link: https://arxiv.org/abs/2510.11028
Authors: Yanning Hou,Ke Xu,Junfa Li,Yanran Ruan,Jianfeng Qiu
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by PRCV

Abstract:Recently, the powerful generalization ability exhibited by foundation models has brought forth new solutions for zero-shot anomaly segmentation tasks. However, guiding these foundation models correctly to address downstream tasks remains a challenge. This paper proposes a novel two-stage framework for zero-shot anomaly segmentation tasks in industrial anomaly detection. This framework leverages the powerful anomaly localization capability of CLIP and the boundary perception ability of SAM. (1) To mitigate SAM's inclination towards object segmentation, we propose the Co-Feature Point Prompt Generation (PPG) module. This module collaboratively utilizes CLIP and SAM to generate positive and negative point prompts, guiding SAM to focus on segmenting anomalous regions rather than the entire object. (2) To further optimize SAM's segmentation results and mitigate rough boundaries and isolated noise, we introduce the Cascaded Prompts for SAM (CPS) module. This module employs hybrid prompts cascaded with a lightweight decoder of SAM, achieving precise segmentation of anomalous regions. Across multiple datasets, consistent experimental validation demonstrates that our approach achieves state-of-the-art zero-shot anomaly segmentation results. Particularly noteworthy is our performance on the Visa dataset, where we outperform the state-of-the-art methods by 10.3% and 7.7% in terms of F1-max and AP metrics, respectively.
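
To make the idea of positive and negative point prompts concrete, here is a small sketch that assumes a coarse anomaly score map is already available (for example from CLIP-style localization). It simply takes the highest-scoring pixels as positive points and the lowest-scoring pixels as negative points. The PPG module described above combines CLIP and SAM features and is certainly more involved; `point_prompts` is a hypothetical helper.

```python
def point_prompts(anomaly_map, num_points=2):
    """Pick (y, x) prompts from a 2D anomaly score map.

    Positive prompts are the `num_points` highest scores (likely anomalous);
    negative prompts are the `num_points` lowest scores (likely normal).
    """
    scored = [
        (score, (y, x))
        for y, row in enumerate(anomaly_map)
        for x, score in enumerate(row)
    ]
    scored.sort(key=lambda item: item[0])
    negatives = [point for _, point in scored[:num_points]]
    positives = [point for _, point in scored[-num_points:][::-1]]
    return positives, negatives
```

A promptable segmenter would then receive the positive points with a foreground label and the negative points with a background label, which discourages it from returning the whole object.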

[CV-77] Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning

Link: https://arxiv.org/abs/2510.11027
Authors: Ganlin Yang,Tianyi Zhang,Haoran Hao,Weiyun Wang,Yibin Liu,Dehui Wang,Guanzhou Chen,Zijian Cai,Junting Chen,Weijie Su,Wengang Zhou,Yu Qiao,Jifeng Dai,Jiangmiao Pang,Gen Luo,Wenhai Wang,Yao Mu,Zhi Hou
Affiliations: University of Science and Technology of China; Shanghai AI Laboratory; Shanghai Jiao Tong University; Zhejiang University; Nanjing University; Fudan University; Tsinghua University; NUS; Northeastern University; Shenzhen University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: none

[CV-78] GIR-Bench: Versatile Benchmark for Generating Images with Reasoning

Link: https://arxiv.org/abs/2510.11026
Authors: Hongxiang Li,Yaowei Li,Bin Lin,Yuwei Niu,Yuhang Yang,Xiaoshuang Huang,Jiayin Cai,Xiaolong Jiang,Yao Hu,Long Chen
Affiliations: The Hong Kong University of Science and Technology; Peking University; University of Science and Technology of China; Xiaohongshu Inc.
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: none

[CV-79] GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line Creation

Link: https://arxiv.org/abs/2510.11020
Authors: Shasha Guo,Liang Pang,Xi Wang,Yanling Wang,Huawei Shen,Jing Zhang
Affiliations: Institute of Computing Technology, Chinese Academy of Sciences; Renmin University of China; Zhipu AI
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 22 pages

[CV-80] The Easy Path to Robustness: Coreset Selection using Sample Hardness

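Quick read: This paper addresses how to improve adversarial robustness from a data-centric perspective, that is, how to identify and keep the samples that matter most for learning robust features without sacrificing training efficiency. Existing coreset selection methods mainly optimize clean accuracy and do not preserve adversarial robustness well. The key is a framework that links a sample's adversarial vulnerability to its hardness, measured by the average input gradient norm (AIGN) over training: samples with low AIGN (easy samples) are less vulnerable to adversarial attacks and lie further from the decision boundary. Based on this finding, the authors design EasyCore, which keeps only low-AIGN samples for training and outperforms existing coreset methods under both standard training and TRADES adversarial training, with adversarial accuracy gains of up to 7% and 5%, respectively. The abstract describes AIGN as a model-agnostic dataset property, which makes the method widely applicable.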

Link: https://arxiv.org/abs/2510.11018
Authors: Pranav Ramesh,Arjun Roy,Deepak Ravikumar,Kaushik Roy,Gopalakrishnan Srinivasan
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Comments: none

Abstract:Designing adversarially robust models from a data-centric perspective requires understanding which input samples are most crucial for learning resilient features. While coreset selection provides a mechanism for efficient training on data subsets, current algorithms are designed for clean accuracy and fall short in preserving robustness. To address this, we propose a framework linking a sample's adversarial vulnerability to its hardness, which we quantify using the average input gradient norm (AIGN) over training. We demonstrate that easy samples (with low AIGN) are less vulnerable and occupy regions further from the decision boundary. Leveraging this insight, we present EasyCore, a coreset selection algorithm that retains only the samples with low AIGN for training. We empirically show that models trained on EasyCore-selected data achieve significantly higher adversarial accuracy than those trained with competing coreset methods under both standard and adversarial training. As AIGN is a model-agnostic dataset property, EasyCore is an efficient and widely applicable data-centric method for improving adversarial robustness. We show that EasyCore achieves up to 7% and 5% improvement in adversarial accuracy under standard training and TRADES adversarial training, respectively, compared to existing coreset methods.
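
The selection rule described above (average each sample's input gradient norm over training, then keep the samples with the lowest average) can be sketched as follows. The sketch assumes the per-epoch gradient norms have already been recorded; how they are computed, and how the coreset size is set, is not specified in the abstract. The names `average_input_gradient_norm`, `easy_coreset`, and `fraction` are illustrative.

```python
def average_input_gradient_norm(per_epoch_norms):
    """Average each sample's recorded input gradient norms across epochs.

    `per_epoch_norms[i]` is the list of norms logged for sample i.
    """
    return [sum(sample) / len(sample) for sample in per_epoch_norms]


def easy_coreset(per_epoch_norms, fraction):
    """Return sorted indices of the lowest-AIGN (easiest) samples.

    `fraction` is the share of the dataset to keep (assumed parameter);
    at least one sample is always kept.
    """
    aign = average_input_gradient_norm(per_epoch_norms)
    keep = max(1, int(len(aign) * fraction))
    ranked = sorted(range(len(aign)), key=lambda i: aign[i])
    return sorted(ranked[:keep])
```

In a real training loop, the norms would come from differentiating the loss with respect to each input at every epoch, which any autodiff framework can provide.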

[CV-81] High-Resolution Spatiotemporal Modeling with Global-Local State Space Models for Video-Based Human Pose Estimation ICCV2025

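Quick read: This paper addresses high-resolution spatiotemporal representation modeling for video-based human pose estimation (VHPE), in particular how to balance global dynamic context (such as holistic human motion tendencies) against local motion details (such as high-frequency keypoint changes). Existing methods usually handle spatiotemporal learning with a single structure (convolution or attention), which struggles to serve both and incurs quadratic complexity when capturing global dependencies, limiting its use on high-resolution sequences. The key is to extend the state space model (SSM), specifically the Mamba architecture, beyond 1D sequential data with two modules: a Global Spatiotemporal Mamba, which extracts global representations efficiently through a 6D selective space-time scan and spatial- and temporal-modulated scan merging; and a Local Refinement Mamba, which enhances local high-frequency keypoint details through a windowed space-time scan. The design models global and local dynamics separately while keeping linear computational complexity, and it outperforms state-of-the-art VHPE approaches on four benchmark datasets.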

Link: https://arxiv.org/abs/2510.11017
Authors: Runyang Feng,Hyung Jin Chang,Tze Ho Elden Tse,Boeun Kim,Yi Chang,Yixing Gao
Affiliations: Jilin University; University of Birmingham; National University of Singapore; Dankook University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: This paper is accepted to ICCV 2025

Abstract:Modeling high-resolution spatiotemporal representations, including both global dynamic contexts (e.g., holistic human motion tendencies) and local motion details (e.g., high-frequency changes of keypoints), is essential for video-based human pose estimation (VHPE). Current state-of-the-art methods typically unify spatiotemporal learning within a single type of modeling structure (convolution or attention-based blocks), which inherently have difficulties in balancing global and local dynamic modeling and may bias the network to one of them, leading to suboptimal performance. Moreover, existing VHPE models suffer from quadratic complexity when capturing global dependencies, limiting their applicability especially for high-resolution sequences. Recently, the state space models (known as Mamba) have demonstrated significant potential in modeling long-range contexts with linear complexity; however, they are restricted to 1D sequential data. In this paper, we present a novel framework that extends Mamba from two aspects to separately learn global and local high-resolution spatiotemporal representations for VHPE. Specifically, we first propose a Global Spatiotemporal Mamba, which performs 6D selective space-time scan and spatial- and temporal-modulated scan merging to efficiently extract global representations from high-resolution sequences. We further introduce a windowed space-time scan-based Local Refinement Mamba to enhance the high-frequency details of localized keypoint motions. Extensive experiments on four benchmark datasets demonstrate that the proposed model outperforms state-of-the-art VHPE approaches while achieving better computational trade-offs.
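
The linear-complexity claim for state space models follows from their recurrent form: each element of a 1D sequence updates a hidden state once. The sketch below shows a scalar, time-invariant recurrence h_t = a*h_{t-1} + b*x_t with output y_t = c*h_t. It is a simplification for illustration only: Mamba makes these parameters input-dependent (selective) and uses vector states, and the paper's space-time scans are not reproduced here. The function name and default values are hypothetical.

```python
def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a 1D sequence.

    One pass over `inputs`, so the cost grows linearly with its length.
    """
    state, outputs = 0.0, []
    for x in inputs:
        state = a * state + b * x   # update the hidden state
        outputs.append(c * state)   # read out the current state
    return outputs
```

Applying such a scan to video requires first flattening the frames into one or more 1D orderings, which is the restriction the abstract says the proposed scans are designed to overcome.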

[CV-82] Into the Unknown: Towards using Generative Models for Sampling Priors of Environment Uncertainty for Planning in Configuration Spaces

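Quick read: This paper addresses the difficulty of obtaining useful priors when a robot plans under partial observability. The key is a sampling-based pipeline that uses large-scale pretrained generative models to produce, zero-shot, probabilistic priors that capture environmental uncertainty and spatio-semantic relationships. Conditioned on partial observations, the pipeline recovers complete RGB-D point cloud samples with occupancy and target semantics that can be used directly in configuration-space planning. Experiments show that the approach recovers commonsense spatial semantics and yields diverse, clean 3D point clouds, providing usable priors for a robot navigating to an unobserved target object.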

Link: https://arxiv.org/abs/2510.11014
Authors: Subhransu S. Bhattacharjee,Hao Lu,Dylan Campbell,Rahul Shome
Affiliations: Australian National University
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: Under Review

Abstract:Priors are vital for planning under partial observability, yet difficult to obtain in practice. We present a sampling-based pipeline that leverages large-scale pretrained generative models to produce probabilistic priors capturing environmental uncertainty and spatio-semantic relationships in a zero-shot manner. Conditioned on partial observations, the pipeline recovers complete RGB-D point cloud samples with occupancy and target semantics, formulated to be directly useful in configuration-space planning. We establish a Matterport3D benchmark of rooms partially visible through doorways, where a robot must navigate to an unobserved target object. Effective priors for this setting must represent both occupancy and target-location uncertainty in unobserved regions. Experiments show that our approach recovers commonsense spatial semantics consistent with ground truth, yielding diverse, clean 3D point clouds usable in motion planning and highlighting the promise of generative models as a rich source of priors for robotic planning.

[CV-83] COCO-Tree: Compositional Hierarchical Concept Trees for Enhanced Reasoning in Vision Language Models EMNLP2025

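Quick read: This paper addresses a persistent weakness of modern Vision Language Models (VLMs) in compositional reasoning: they often perform poorly when a task depends on how multiple objects, attributes, and relations interact within an image. The key is COCO-Tree, which augments VLM outputs with neurosymbolic concept trees learned from Large Language Models (LLMs) to improve linguistic reasoning. Using a beam search-inspired reasoning process, COCO-Tree improves compositional generalization by 5-10% on four benchmarks and provides an interpretable rationale for model predictions.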

Link: https://arxiv.org/abs/2510.11012
Authors: Sanchit Sinha,Guangzhi Xiong,Aidong Zhang
Affiliations: University of Virginia
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: EMNLP 2025 (main)

Abstract:Compositional reasoning remains a persistent weakness of modern vision language models (VLMs): they often falter when a task hinges on understanding how multiple objects, attributes, and relations interact within an image. Multiple research works have attempted to improve compositionality performance by creative tricks such as improving prompt structure, chain of thought reasoning, etc. A more recent line of work attempts to impart additional reasoning in VLMs using well-trained Large Language Models (LLMs), which are far superior in linguistic understanding than VLMs to compensate for the limited linguistic prowess of VLMs. However, these approaches are either resource-intensive or do not provide an interpretable reasoning process. In this paper, we present ‘COCO-Tree’ - a novel approach that augments VLM outputs with carefully designed neurosymbolic concept trees learned from LLMs to improve VLM’s linguistic reasoning. COCO-Tree’s beam search-inspired reasoning process boosts compositionality performance and provides a rationale behind VLM predictions. Empirical results on four compositionality benchmarks, Winoground, EqBench, ColorSwap, and SugarCrepe, in seven different open-source VLMs with varying sizes, demonstrate that COCO-Tree significantly improves compositional generalization by 5-10% over baselines.
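
As a rough illustration of a beam search over a concept tree, the sketch below keeps the `beam_width` highest-scoring root-to-node paths at each depth. It is generic beam search, not COCO-Tree itself: the tree, the per-concept scores, and the additive path score are all made-up assumptions, and `beam_search` is a hypothetical helper.

```python
def beam_search(tree, scores, root, beam_width=2, depth=2):
    """Return the best paths through `tree` as (path, total_score) pairs.

    `tree` maps a concept to its child concepts; `scores` maps a concept
    to an assumed relevance score. Paths that reach a leaf are carried
    forward unchanged.
    """
    beams = [([root], scores.get(root, 0.0))]
    for _ in range(depth):
        candidates = []
        for path, total in beams:
            children = tree.get(path[-1], [])
            if not children:
                candidates.append((path, total))
                continue
            for child in children:
                candidates.append((path + [child], total + scores[child]))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams
```

The surviving paths double as a readable trace of which concepts supported the final decision, which is the kind of rationale the abstract refers to.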

[CV-84] Frequency Domain Unlocks New Perspectives for Abdominal Medical Image Segmentation

Link: https://arxiv.org/abs/2510.11005
Authors: Kai Han, Siqi Ma, Chengxuan Qian, Jun Chen, Chongwen Lyu, Yuqing Song, Zhe Liu
Affiliations: Jiangsu University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-85] ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation

Link: https://arxiv.org/abs/2510.11000
Authors: Ruihang Xu, Dewei Zhou, Fan Ma, Yi Yang
Affiliations: ReLER, CCAI, Zhejiang University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Project Page: this https URL

[CV-86] Perspective-aware 3D Gaussian Inpainting with Multi-view Consistency

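[Quick read]: This paper addresses the difficulty of ensuring multi-view consistency in 3D Gaussian inpainting, a key challenge for high-quality virtual reality and multimedia applications. The core of the solution is PAInpainter, whose key innovations are perspective-aware content propagation and consistency verification across views: it adaptively samples multiple views from a perspective graph, iteratively refines the 3D Gaussian representation while propagating already-inpainted images as prior information, and verifies content consistency between neighboring views, which substantially improves the global consistency and texture fidelity of the restored scene.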

Link: https://arxiv.org/abs/2510.10993
Authors: Yuxin Cheng, Binxiao Huang, Taiqiang Wu, Wenyong Zhou, Chenchen Ding, Zhengwu Liu, Graziano Chesi, Ngai Wong
Affiliations: The University of Hong Kong
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:3D Gaussian inpainting, a critical technique for numerous applications in virtual reality and multimedia, has made significant progress with pretrained diffusion models. However, ensuring multi-view consistency, an essential requirement for high-quality inpainting, remains a key challenge. In this work, we present PAInpainter, a novel approach designed to advance 3D Gaussian inpainting by leveraging perspective-aware content propagation and consistency verification across multi-view inpainted images. Our method iteratively refines inpainting and optimizes the 3D Gaussian representation with multiple views adaptively sampled from a perspective graph. By propagating inpainted images as prior information and verifying consistency across neighboring views, PAInpainter substantially enhances global consistency and texture fidelity in restored 3D scenes. Extensive experiments demonstrate the superiority of PAInpainter over existing methods. Our approach achieves superior 3D inpainting quality, with PSNR scores of 26.03 dB and 29.51 dB on the SPIn-NeRF and NeRFiller datasets, respectively, highlighting its effectiveness and generalization capability.
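
The PSNR figures quoted in the abstract follow the standard peak signal-to-noise ratio definition, 10 * log10(MAX^2 / MSE). The sketch below is only a reminder of how that metric is computed; the function name `psnr`, the `max_value` parameter, and the toy pixel values are illustrative assumptions, and the paper's exact evaluation protocol (pixel range, masking, averaging over views) is not stated in this digest.

```python
import math


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and have the same length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy pixel values in [0, 1]; not data from the paper.
reference = [0.0, 0.5, 1.0, 0.25]
restored = [0.1, 0.5, 0.9, 0.25]
print(round(psnr(reference, restored), 2))
```

Higher values mean the restored view is closer to the reference; a difference of a few dB is usually considered meaningful.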

[CV-87] Mixup Helps Understanding Multimodal Video Better

Link: https://arxiv.org/abs/2510.10986
Authors: Xiaoyu Ma, Ding Ding, Hao Chen
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-88] On the Optimal Representation Efficiency of Barlow Twins: An Information-Geometric Interpretation

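[Quick read]: This paper addresses the lack of a unified theoretical framework for understanding and comparing the representation efficiency of different self-supervised learning (SSL) methods. The key to the solution is a new information-geometric framework that quantifies efficiency through the representation efficiency η, defined as the ratio between the effective intrinsic dimension of the learned representation space and its ambient dimension, where the effective dimension is determined by the spectral properties of the Fisher Information Matrix (FIM) on the statistical manifold induced by the encoder. Within this framework, the author analyzes Barlow Twins theoretically and proves that, under reasonable assumptions, driving the cross-correlation matrix of the representations towards the identity matrix induces an isotropic FIM and thus achieves optimal representation efficiency (η = 1). This provides a rigorous theoretical basis for the effectiveness of Barlow Twins and a new geometric perspective for analyzing SSL algorithms.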

Link: https://arxiv.org/abs/2510.10980
Authors: Di Zhang
Affiliations: Xi’an Jiaotong-Liverpool University
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
Comments: 7 pages

Abstract:Self-supervised learning (SSL) has achieved remarkable success by learning meaningful representations without labeled data. However, a unified theoretical framework for understanding and comparing the efficiency of different SSL paradigms remains elusive. In this paper, we introduce a novel information-geometric framework to quantify representation efficiency. We define representation efficiency η as the ratio between the effective intrinsic dimension of the learned representation space and its ambient dimension, where the effective dimension is derived from the spectral properties of the Fisher Information Matrix (FIM) on the statistical manifold induced by the encoder. Within this framework, we present a theoretical analysis of the Barlow Twins method. Under specific but natural assumptions, we prove that Barlow Twins achieves optimal representation efficiency (η = 1) by driving the cross-correlation matrix of representations towards the identity matrix, which in turn induces an isotropic FIM. This work provides a rigorous theoretical foundation for understanding the effectiveness of Barlow Twins and offers a new geometric perspective for analyzing SSL algorithms.
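
To make the two ingredients of the argument concrete, the sketch below computes a Barlow Twins style objective (pushing the cross-correlation matrix of two embedding batches towards the identity) and an illustrative efficiency ratio. Everything here is an assumption for illustration: the digest does not give the paper's exact definition of effective dimension, so the participation ratio of the eigenvalues is used as a stand-in, and the weight `lambda_offdiag`, the function names, and the random data are hypothetical.

```python
import numpy as np


def cross_correlation(z_a, z_b, eps=1e-12):
    """Cross-correlation matrix of two (batch, dim) embedding arrays."""
    # Standardize each feature over the batch before correlating.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    return z_a.T @ z_b / z_a.shape[0]


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3):
    """Penalize deviation of the cross-correlation matrix from the identity."""
    c = cross_correlation(z_a, z_b)
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)            # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # redundancy term
    return on_diag + lambda_offdiag * off_diag


def effective_dimension_ratio(matrix):
    """Assumed proxy for eta: participation ratio of eigenvalues / ambient dim."""
    eigvals = np.clip(np.linalg.eigvalsh(matrix), 0.0, None)
    participation = eigvals.sum() ** 2 / np.sum(eigvals ** 2)
    return participation / matrix.shape[0]


rng = np.random.default_rng(0)
z = rng.normal(size=(256, 4))
print(barlow_twins_loss(z, z) < barlow_twins_loss(z, z[::-1]))
print(effective_dimension_ratio(np.eye(4)))
```

Under this proxy an isotropic (identity-like) spectrum gives a ratio of 1, while a spectrum concentrated in one direction gives 1/d, which mirrors the intuition behind η = 1 in the abstract.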

[CV-89] Chart-RVR: Reinforcement Learning with Verifiable Rewards for Explainable Chart Reasoning

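[Quick read]: This paper addresses the limited robustness of Large Vision-Language Models (LVLMs) to out-of-distribution (OOD) data in chart reasoning, and the further performance drop when they are asked to produce chain-of-thought (CoT) explanations, both of which limit explainability and trustworthiness in practice. The key to the solution is the Chart-RVR framework, which couples Group Relative Policy Optimization (GRPO) with automatically verifiable rewards and uses three objective-driven reward signals: (i) correct chart-type classification, (ii) faithful chart table reconstruction, and (iii) process conformity. This improves the faithfulness and interpretability of CoT rationales while maintaining high accuracy, and yields the best performance in both in-distribution and OOD settings.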

Link: https://arxiv.org/abs/2510.10973
Authors: Sanchit Sinha, Oana Frunza, Kashif Rasul, Yuriy Nevmyvaka, Aidong Zhang
Affiliations: University of Virginia; Morgan Stanley
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: 23 pages

Abstract:The capabilities of Large Vision-Language Models (LVLMs) have reached state-of-the-art on many visual reasoning tasks, including chart reasoning, yet they still falter on out-of-distribution (OOD) data, and degrade further when asked to produce their chain-of-thought (CoT) rationales, limiting explainability. We present Chart-RVR, a general framework that fine-tunes LVLMs to be more robust and explainable for chart reasoning by coupling Group Relative Policy Optimization (GRPO) with automatically verifiable rewards. Our framework comprises three rewards that maximize: (i) correct chart-type classification, (ii) faithful chart table reconstruction, and (iii) process conformity. Applied to 3-billion-parameter LVLMs, Chart-RVR consistently outperforms standard supervised fine-tuning (SFT) on both in-distribution and out-of-distribution datasets, closing the OOD performance gap while improving rationale fidelity. The resulting models, the Chart-RVR-3B series, achieve state-of-the-art results on six chart-reasoning benchmarks spanning in-domain and OOD settings, surpassing all existing models of comparable size. Beyond accuracy, Chart-RVR yields more interpretable CoT rationales, strengthening trust and reliability - showcasing the power of verifiable rewards with GRPO for training reliable, interpretable chart-reasoning models.
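
The group-relative part of GRPO can be illustrated in a few lines: several responses are sampled for the same prompt, each receives a verifiable reward, and each reward is standardized against its own group. The sketch below is a generic illustration under stated assumptions, not the paper's implementation: `total_reward`, its equal weights, the three component scores, and the use of the population standard deviation are all hypothetical choices.

```python
from statistics import mean, pstdev


def total_reward(type_correct, table_similarity, process_ok, weights=(1.0, 1.0, 1.0)):
    """Hypothetical combination of the three verifiable reward signals."""
    w_type, w_table, w_process = weights
    return (
        w_type * float(type_correct)      # chart-type classification
        + w_table * table_similarity      # chart table reconstruction, in [0, 1]
        + w_process * float(process_ok)   # process conformity
    )


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the other samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one chart question (toy scores).
rewards = [
    total_reward(True, 1.0, True),
    total_reward(False, 1.0, False),
    total_reward(True, 0.0, True),
    total_reward(True, 1.0, False),
]
print(group_relative_advantages(rewards))
```

Responses scoring above their group mean get positive advantages and are reinforced; no separate value network is needed for the baseline.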

[CV-90] IUT-Plug: A Plug-in tool for Interleaved Image-Text Generation

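[Quick read]: This paper addresses the difficulty existing Vision Language Models (VLMs) have in preserving logical consistency, object identity, and style in multimodal image-text generation, which significantly limits their generalization in complex image-text input-output scenarios. The core of the solution is IUT-Plug, a module built on an Image Understanding Tree (IUT) that enhances existing interleaved VLMs through explicit structured reasoning, mitigating context drift in logic, entity identity, and style. The framework has two stages: first, a dynamic IUT-Plug extraction module parses visual scenes into hierarchical symbolic structures; second, a coordinated narrative-flow and image-synthesis mechanism ensures cross-modal consistency. Experiments show that the method improves accuracy and substantially alleviates the three key forms of context drift across multiple multimodal question answering scenarios.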

Link: https://arxiv.org/abs/2510.10969
Authors: Zeteng Lin, Xingxing Li, Wen You, Xiaoyang Li, Zehan Lu, Yujun Cai, Jing Tang
Affiliations: Hong Kong University of Science and Technology (Guangzhou); University of Queensland; Hong Kong University of Science and Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Existing vision language models (VLMs), including GPT-4 and DALL-E, often struggle to preserve logic, object identity, and style in multimodal image-text generation. This limitation significantly hinders the generalization capability of VLMs in complex image-text input-output scenarios. To address this issue, we propose IUT-Plug, a module grounded in an Image Understanding Tree (IUT), which enhances existing interleaved VLMs through explicit structured reasoning, thereby mitigating context drift in logic, entity identity, and style. The proposed framework operates in two stages. (1) A dynamic IUT-Plug extraction module parses visual scenes into hierarchical symbolic structures. (2) A coordinated narrative-flow and image synthesis mechanism ensures cross-modal consistency. To evaluate our approach, we construct a novel benchmark based on 3,000 real human-generated question-answer pairs over fine-tuned large models, introducing a dynamic evaluation protocol for quantifying context drift in interleaved VLMs. Experimental results demonstrate that IUT-Plug not only improves accuracy on established benchmarks but also effectively alleviates the three critical forms of context drift across diverse multimodal question answering (QA) scenarios.

[CV-91] Comparative Evaluation of Neural Network Architectures for Generalizable Human Spatial Preference Prediction in Unseen Built Environments

Link: https://arxiv.org/abs/2510.10954
Authors: Maral Doctorarastoo, Katherine A. Flanigan, Mario Bergés, Christopher McComb
Affiliations: Unknown
Categories: Computational Engineering, Finance, and Science (cs.CE); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Comments: The 15th International Workshop on Structural Health Monitoring (IWSHM)

[CV-92] Towards Distribution-Shift Uncertainty Estimation for Inverse Problems with Generative Priors

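[Quick read]: This paper addresses hallucination when generative priors are applied to inverse problems in computational imaging: if test images deviate from the training distribution, the model may generate unrealistic features that the data do not support. Existing uncertainty quantification methods typically depend on an in-distribution calibration dataset, provide heuristic estimates, or only measure uncertainty due to model capacity or limited measurements, and so fail to capture the effect of distribution shift. The key to the solution is an instance-level, calibration-free uncertainty indicator based on one core hypothesis: reconstructions of in-distribution images remain stable under random measurement variations, whereas reconstructions of out-of-distribution (OOD) images show higher variability. Using this stability as a proxy for distribution shift, the method detects OOD cases efficiently without retraining the model and offers a lightweight guardrail for deployment, enabling aggressive measurement reduction for in-distribution samples and automatic warnings for OOD samples.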

Link: https://arxiv.org/abs/2510.10947
Authors: Namhoon Kim, Sara Fridovich-Keil
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Code is available at this https URL

Abstract:Generative models have shown strong potential as data-driven priors for solving inverse problems such as reconstructing medical images from undersampled measurements. While these priors improve reconstruction quality with fewer measurements, they risk hallucinating features when test images lie outside the training distribution. Existing uncertainty quantification methods in this setting (i) require an in-distribution calibration dataset, which may not be available, (ii) provide heuristic rather than statistical estimates, or (iii) quantify uncertainty from model capacity or limited measurements rather than distribution shift. We propose an instance-level, calibration-free uncertainty indicator that is sensitive to distribution shift, requires no knowledge of the training distribution, and incurs no retraining cost. Our key hypothesis is that reconstructions of in-distribution images remain stable under random measurement variations, while reconstructions of out-of-distribution (OOD) images exhibit greater instability. We use this stability as a proxy for detecting distribution shift. Our proposed OOD indicator is efficiently computable for any computational imaging inverse problem; we demonstrate it on tomographic reconstruction of MNIST digits, where a learned proximal network trained only on digit “0” is evaluated on all ten digits. Reconstructions of OOD digits show higher variability and correspondingly higher reconstruction error, validating this indicator. These results suggest a deployment strategy that pairs generative priors with lightweight guardrails, enabling aggressive measurement reduction for in-distribution cases while automatically warning when priors are applied out of distribution.
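
The stability hypothesis can be demonstrated on a toy linear problem. In the sketch below the "generative prior" is simply a low-dimensional linear subspace and the reconstruction is least squares restricted to that subspace; both are stand-ins chosen for brevity, not the learned proximal network or tomography setup from the paper. The names `reconstruct` and `instability`, the dimensions, and the Gaussian measurement matrices are all assumptions.

```python
import numpy as np


def reconstruct(y, A, basis):
    """Toy prior-constrained reconstruction: least squares within span(basis)."""
    coeffs, *_ = np.linalg.lstsq(A @ basis, y, rcond=None)
    return basis @ coeffs


def instability(x_true, basis, n_measurements, n_trials=20, seed=0):
    """Mean per-pixel variance of reconstructions across random measurement draws."""
    rng = np.random.default_rng(seed)
    recons = []
    for _ in range(n_trials):
        A = rng.normal(size=(n_measurements, x_true.size))  # random measurement operator
        recons.append(reconstruct(A @ x_true, A, basis))
    return float(np.stack(recons).var(axis=0).mean())


rng = np.random.default_rng(1)
basis = np.linalg.qr(rng.normal(size=(32, 4)))[0]  # orthonormal basis of the toy prior
x_in = basis @ rng.normal(size=4)                  # signal the prior can represent
x_out = rng.normal(size=32)                        # signal outside the prior

print(instability(x_in, basis, n_measurements=8))
print(instability(x_out, basis, n_measurements=8))
```

In this toy setting the in-distribution signal is recovered almost identically for every measurement draw, while the out-of-distribution signal yields reconstructions that change with the measurements, which is the behavior the indicator relies on.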

[CV-93] DKPMV: Dense Keypoints Fusion from Multi-View RGB Frames for 6D Pose Estimation of Textureless Objects ICRA2026

Link: https://arxiv.org/abs/2510.10933
Authors: Jiahong Chen, Jinghao Wang, Zi Wang, Ziwen Wang, Banglei Guan, Qifeng Yu
Affiliations: College of Aerospace Science and Engineering, National University of Defense Technology; Hunan Provincial Key Laboratory of Image Measurement and Vision Navigation
Categories: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Comments: 12 pages, 9 figures, submitted to ICRA 2026

[CV-94] FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model

Link: https://arxiv.org/abs/2510.10921
Authors: Chunyu Xie, Bin Wang, Fanjing Kong, Jincheng Li, Dawei Liang, Ji Ao, Dawei Leng, Yuhui Yin
Affiliations: 360 AI Research
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[CV-95] DreamMakeup: Face Makeup Customization using Latent Diffusion Models

Link: https://arxiv.org/abs/2510.10918
Authors: Geon Yeong Park, Inhwa Han, Serin Yang, Yeobin Hong, Seongmin Jeong, Heechan Jeon, Myeongjin Goh, Sung Won Yi, Jin Nam, Jong Chul Ye
Affiliations: KAIST; Amorepacific
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[CV-96] SceneTextStylizer: A Training-Free Scene Text Style Transfer Framework with Diffusion Model

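[Quick read]: This paper addresses flexible, localized style editing of scene text; existing methods usually support only content replacement or simple style transfer and cannot perform free-form style transformation. The key to the solution is SceneTextStylizer, a training-free diffusion-based framework that combines a feature injection module (which uses diffusion model inversion and self-attention to transfer style features effectively), a region control mechanism (which applies a distance-based changing mask at each denoising step for precise spatial control), and a Fourier transform-based style enhancement module, achieving prompt-guided style transformation of text regions while preserving text readability and stylistic consistency.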

Link: https://arxiv.org/abs/2510.10910
Authors: Honghui Yuan, Keiji Yanai
Affiliations: The University of Electro-Communications
Categories: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Comments:

Abstract:With the rapid development of diffusion models, style transfer has made remarkable progress. However, flexible and localized style editing for scene text remains an unsolved challenge. Although existing scene text editing methods have achieved text region editing, they are typically limited to content replacement and simple styles, which lack the ability of free-style transfer. In this paper, we introduce SceneTextStylizer, a novel training-free diffusion-based framework for flexible and high-fidelity style transfer of text in scene images. Unlike prior approaches that either perform global style transfer or focus solely on textual content modification, our method enables prompt-guided style transformation specifically for text regions, while preserving both text readability and stylistic consistency. To achieve this, we design a feature injection module that leverages diffusion model inversion and self-attention to transfer style features effectively. Additionally, a region control mechanism is introduced by applying a distance-based changing mask at each denoising step, enabling precise spatial control. To further enhance visual quality, we incorporate a style enhancement module based on the Fourier transform to reinforce stylistic richness. Extensive experiments demonstrate that our method achieves superior performance in scene text style transformation, outperforming existing state-of-the-art methods in both visual fidelity and text preservation.

[CV-97] Topological Alignment of Shared Vision-Language Embedding Space

Link: https://arxiv.org/abs/2510.10889
Authors: Junwon You, Dasol Kang, Jae-Hun Jung
Affiliations: POSTECH
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 24 pages, 5 figures, 19 tables

[CV-98] Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales

Link: https://arxiv.org/abs/2510.10880
Authors: Zhaofang Qian, Hardy Chen, Zeyu Wang, Li Zhang, Zijun Wang, Xiaoke Huang, Hui Liu, Xianfeng Tang, Zeyu Zheng, Haoqin Tu, Cihang Xie, Yuyin Zhou
Affiliations: University of Central Florida; University of California, Santa Cruz; Columbia University; Amazon Research; University of California, Berkeley
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-99] rareboost3d: a synthetic lidar dataset with enhanced rare classes

Link: https://arxiv.org/abs/2510.10876
Authors: Shutong Lin, Zhengkang Xiang, Jianzhong Qi, Kourosh Khoshelham
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-100] FastHMR: Accelerating Human Mesh Recovery via Token and Layer Merging with Diffusion Decoding

Link: https://arxiv.org/abs/2510.10868
Authors: Soroush Mehraban, Andrea Iaboni, Babak Taati
Affiliations: University of Toronto; Vector Institute; KITE Research Institute, UHN
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Project page: this https URL

[CV-101] From Detection to Mitigation: Addressing Bias in Deep Learning Models for Chest X-Ray Diagnosis

Link: https://arxiv.org/abs/2510.10822
Authors: Clemence Mottez, Louisa Fay, Maya Varma, Sophie Ostmeier, Curtis Langlotz
Affiliations: Stanford University; University Hospital of Tübingen
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Preprint of an article published in Pacific Symposium on Biocomputing © 2026 World Scientific Publishing Co., Singapore, this http URL

[CV-102] MSCloudCAM: Cross-Attention with Multi-Scale Context for Multispectral Cloud Segmentation

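[Quick read]: This paper addresses cloud interference in optical satellite imagery, which seriously affects the reliability of tasks such as environmental monitoring, land cover mapping, and climate research. The core of the solution is MSCloudCAM, which uses a Swin Transformer backbone for hierarchical feature extraction together with the ASPP and PSP multi-scale context modules to strengthen scale awareness. A Cross-Attention block fuses multi-sensor and multispectral features, while an Efficient Channel Attention Block (ECAB) and a Spatial Attention Module adaptively refine the feature representations, yielding state-of-the-art cloud segmentation accuracy with good parameter efficiency on the CloudSEN12 and L8Biome datasets.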

Link: https://arxiv.org/abs/2510.10802
Authors: Md Abdullah Al Mazid, Liangdong Deng, Naphtali Rishe
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 7 pages, 2 Figures

Abstract:Clouds remain a critical challenge in optical satellite imagery, hindering reliable analysis for environmental monitoring, land cover mapping, and climate research. To overcome this, we propose MSCloudCAM, a Cross-Attention with Multi-Scale Context Network tailored for multispectral and multi-sensor cloud segmentation. Our framework exploits the spectral richness of Sentinel-2 (CloudSEN12) and Landsat-8 (L8Biome) data to classify four semantic categories: clear sky, thin cloud, thick cloud, and cloud shadow. MSCloudCAM combines a Swin Transformer backbone for hierarchical feature extraction with multi-scale context modules ASPP and PSP for enhanced scale-aware learning. A Cross-Attention block enables effective multisensor and multispectral feature fusion, while the integration of an Efficient Channel Attention Block (ECAB) and a Spatial Attention Module adaptively refine feature representations. Comprehensive experiments on CloudSEN12 and L8Biome demonstrate that MSCloudCAM delivers state-of-the-art segmentation accuracy, surpassing leading baseline architectures while maintaining competitive parameter efficiency and FLOPs. These results underscore the model’s effectiveness and practicality, making it well-suited for large-scale Earth observation tasks and real-world applications.

[CV-103] Full segmentation annotations of 3D time-lapse microscopy images of MDA231 cells

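[Quick read]: This paper addresses the scarcity of high-quality, publicly available segmentation annotations for three-dimensional (3D) time-lapse images, where annotating a large number of targets with complex dynamic shapes is time-consuming and difficult. The key contribution is the construction and release of the first public full 3D time-lapse segmentation annotations of migrating cells, covering the Fluo-C3DL-MDA231 human breast carcinoma sequences from the Cell Tracking Challenge (CTC) and produced by three independent annotators. Comparison with the CTC tracking markers and the 2D gold truth shows that the annotations are consistent with the known references and that their segmentation accuracy lies within inter-annotator variability; compared with the automatically generated CTC silver truth, the new annotations better reflect the complexity of the input images, providing a reliable basis for training and testing cell segmentation algorithms and for analyzing 3D shapes of highly dynamic objects.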

Link: https://arxiv.org/abs/2510.10797
Authors: Aleksandra Melnikova, Petr Matula
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 6 pages, 2 figures, 4 tables

Abstract:High-quality, publicly available segmentation annotations of image and video datasets are critical for advancing the field of image processing. In particular, annotations of volumetric images of a large number of targets are time-consuming and challenging. In (Melnikova, A., Matula, P., 2025), we presented the first publicly available full 3D time-lapse segmentation annotations of migrating cells with complex dynamic shapes. Concretely, three distinct humans annotated two sequences of MDA231 human breast carcinoma cells (Fluo-C3DL-MDA231) from the Cell Tracking Challenge (CTC). This paper aims to provide a comprehensive description of the dataset and accompanying experiments that were not included in (Melnikova, A., Matula, P., 2025) due to limitations in publication space. Namely, we show that the created annotations are consistent with the previously published tracking markers provided by the CTC organizers and the segmentation accuracy measured based on the 2D gold truth of CTC is within the inter-annotator variability margins. We compared the created 3D annotations with automatically created silver truth provided by CTC. We have found the proposed annotations better represent the complexity of the input images. The presented annotations can be used for testing and training cell segmentation, or analyzing 3D shapes of highly dynamic objects.

[CV-104] ImHead: A Large-scale Implicit Morphable Model for Localized Head Modeling ICCV2025

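[Quick read]: This paper addresses the limitations of traditional 3D morphable models (3DMMs) in modeling complex full-head shapes, in particular that their strict topology and linear nature make it hard to capture fine facial features and local variations. The key to the solution is imHead, a novel implicit 3DMM based on deep implicit functions. Its core innovation is to retain a single compact identity latent space and introduce an intermediate region-specific latent representation for precise editing of local facial features, which significantly reduces latent-space complexity while preserving expressive power and supports interpretable 3D face manipulation.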

Link: https://arxiv.org/abs/2510.10793
Authors: Rolandos Alexandros Potamias, Stathis Galanakis, Jiankang Deng, Athanasios Papaioannou, Stefanos Zafeiriou
Affiliations: Imperial College London
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: ICCV 2025

Abstract:Over recent years, 3D morphable models (3DMMs) have emerged as a state-of-the-art methodology for modeling and generating expressive 3D avatars. However, given their reliance on a strict topology, along with their linear nature, they struggle to represent complex full-head shapes. Following the advent of deep implicit functions, we propose imHead, a novel implicit 3DMM that not only models expressive 3D head avatars but also facilitates localized editing of the facial features. Previous methods directly divided the latent space into local components accompanied by an identity encoding to capture the global shape variations, leading to expensive latent sizes. In contrast, we retain a single compact identity space and introduce an intermediate region-specific latent representation to enable local edits. To train imHead, we curate a large-scale dataset of 4K distinct identities, taking a step towards large-scale 3D head modeling. Through a series of experiments, we demonstrate the expressive power of the proposed model to represent diverse identities and expressions, outperforming previous approaches. Additionally, the proposed approach provides an interpretable solution for 3D face manipulation, allowing the user to make localized edits.

[CV-105] DISC-GAN: Disentangling Style and Content for Cluster-Specific Synthetic Underwater Image Generation

Link: https://arxiv.org/abs/2510.10782
Authors: Sneha Varur, Anirudh R Hanchinamani, Tarun S Bagewadi, Uma Mudenagudi, Chaitra D Desai, Sujata C, Padmashree Desai, Sumit Meharwade
Affiliations: KLE Technological University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

[CV-106] Structured Spectral Graph Learning for Multi-label Abnormality Classification in 3D Chest CT Scans

Link: https://arxiv.org/abs/2510.10779
Authors: Theo Di Piazza, Carole Lazarus, Olivier Nempont, Loic Boussel
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 22 pages, 14 figures

[CV-107] EGD-YOLO: A Lightweight Multimodal Framework for Robust Drone-Bird Discrimination via Ghost-Enhanced YOLOv8n and EMA Attention under Adverse Condition

Link: https://arxiv.org/abs/2510.10765
Authors: Sudipto Sarkar, Mohammad Asif Hasan, Khondokar Ashik Shahriar, Fablia Labiba, Nahian Tasnim, Sheikh Anawarul Haq Fattah
Affiliations: Bangladesh University of Engineering and Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-108] Optimally Deep Networks – Adapting Model Depth to Datasets for Superior Efficiency

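[Quick read]: This paper addresses the difficulty of deploying Deep Neural Networks (DNNs) in practice because of large model sizes, high computational cost, and large memory footprints, especially on resource-constrained devices. The core challenge is that many tasks do not need a full-depth model to reach the best performance, yet existing practice usually trains deep networks uniformly for all datasets, causing redundant computation and wasted resources. The key to the solution is Optimally Deep Networks (ODNs), trained with a Neural Architecture Search (NAS)-like strategy called progressive depth expansion: training starts at shallow depth and the depth is increased step by step as the earlier blocks converge, until the target accuracy is reached. The method automatically identifies and keeps the minimum depth the task requires, which significantly reduces memory footprint, improves computational efficiency, and supports deployment on edge devices.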

Link: https://arxiv.org/abs/2510.10764
Authors: Shaharyar Ahmed Khan Tareen, Filza Khan Tareen
Affiliations: University of Houston; National University of Sciences and Technology
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: 6 pages, 3 figures, 1 table

Abstract:Deep neural networks (DNNs) have delivered strong performance across various tasks. However, this success often comes at the cost of unnecessarily large model sizes, high computational demands, and substantial memory footprints. Typically, powerful architectures are trained at full depths but not all datasets or tasks require such high model capacity. Training very deep architectures on relatively low-complexity datasets frequently leads to wasted computation, unnecessary energy consumption, and excessive memory usage, which in turn makes deployment of models on resource-constrained devices impractical. To address this problem, we introduce Optimally Deep Networks (ODNs), which provide a balance between model depth and task complexity. Specifically, we propose a NAS-like training strategy called progressive depth expansion, which begins by training deep networks at shallower depths and incrementally increases their depth as the earlier blocks converge, continuing this process until the target accuracy is reached. ODNs use only the optimal depth for the given datasets, removing redundant layers. This cuts down future training and inference costs, lowers the memory footprint, enhances computational efficiency, and facilitates deployment on edge devices. Empirical results show that the optimal depths of ResNet-18 and ResNet-34 for MNIST and SVHN achieve up to 98.64% and 96.44% reduction in memory footprint, while maintaining a competitive accuracy of 99.31% and 96.08%, respectively.
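
The control flow of progressive depth expansion, as described in the abstract, can be sketched as a simple loop: train at a shallow depth, check accuracy, and add a block only if the target has not been reached. The sketch below is a schematic under stated assumptions. `train_and_evaluate` is a hypothetical callback standing in for real training (in practice it would reuse the already converged earlier blocks), and the toy accuracy curve is invented for illustration.

```python
from typing import Callable, Tuple


def find_optimal_depth(
    train_and_evaluate: Callable[[int], float],
    max_depth: int,
    target_accuracy: float,
) -> Tuple[int, float]:
    """Grow the network one block at a time until the target accuracy is met.

    `train_and_evaluate(depth)` is a hypothetical callback that trains (or
    continues training) the first `depth` blocks and returns validation accuracy.
    """
    best_depth, best_accuracy = 0, 0.0
    for depth in range(1, max_depth + 1):
        accuracy = train_and_evaluate(depth)
        if accuracy > best_accuracy:
            best_depth, best_accuracy = depth, accuracy
        if accuracy >= target_accuracy:
            return depth, accuracy  # stop early: deeper blocks are unnecessary
    # Target never reached: report the shallowest depth with the best accuracy.
    return best_depth, best_accuracy


# Toy stand-in for real training: accuracy saturates as depth grows.
def toy_accuracy(depth: int) -> float:
    return 1.0 - 0.5 ** depth


depth, accuracy = find_optimal_depth(toy_accuracy, max_depth=8, target_accuracy=0.95)
print(depth, accuracy)  # prints: 5 0.96875
```

Blocks beyond the returned depth are never instantiated, which is where the memory and inference savings reported in the abstract would come from.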

[CV-109] Restricted Receptive Fields for Face Verification

Link: https://arxiv.org/abs/2510.10753
Authors: Kagan Ozturk, Aman Bhatta, Haiyu Wu, Patrick Flynn, Kevin W. Bowyer
Affiliations: University of Notre Dame
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-110] Uncovering Anomalous Events for Marine Environmental Monitoring via Visual Anomaly Detection

Link: https://arxiv.org/abs/2510.10750
Authors: Laura Weihl, Nejc Novak, Stefan H. Bengtson, Malte Pedersen
Affiliations: IT University of Copenhagen; Anemo Robotics ApS; Aalborg University; Pioneer Centre for Artificial Intelligence
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-111] Seeing My Future: Predicting Situated Interaction Behavior in Virtual Reality

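[Quick read]: This paper addresses the lack of intelligent adaptation to user behavior in virtual reality (VR) and augmented reality (AR) systems; the core challenge is to understand human intentions accurately and predict situated behaviors (such as gaze direction and object interactions) so that responsive environments can be built. The key to the solution is a hierarchical, intention-aware framework that predicts fine-grained behaviors by modeling the cognitive mechanisms driving human-environment interaction. It introduces a dynamic Graph Convolutional Network (GCN) to capture the changing relationships between the human and the environment, achieving better behavior prediction on real-world benchmarks and in a live VR environment and supporting the development of proactive VR systems.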

Link: https://arxiv.org/abs/2510.10742
Authors: Yuan Xu, Zimu Zhang, Xiaoxuan Ma, Wentao Zhu, Yu Qiao, Yizhou Wang
Affiliations: Peking University; Eastern Institute of Technology, Ningbo; Shanghai Jiao Tong University
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: Project Page: this https URL

Abstract:Virtual and augmented reality systems increasingly demand intelligent adaptation to user behaviors for enhanced interaction experiences. Achieving this requires accurately understanding human intentions and predicting future situated behaviors - such as gaze direction and object interactions - which is vital for creating responsive VR/AR environments and applications like personalized assistants. However, accurate behavioral prediction demands modeling the underlying cognitive processes that drive human-environment interactions. In this work, we introduce a hierarchical, intention-aware framework that models human intentions and predicts detailed situated behaviors by leveraging cognitive mechanisms. Given historical human dynamics and the observation of scene contexts, our framework first identifies potential interaction targets and forecasts fine-grained future behaviors. We propose a dynamic Graph Convolutional Network (GCN) to effectively capture human-environment relationships. Extensive experiments on challenging real-world benchmarks and live VR environment demonstrate the effectiveness of our approach, achieving superior performance across all metrics and enabling practical applications for proactive VR systems that anticipate user behaviors and adapt virtual environments accordingly.
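
For readers unfamiliar with graph convolution, the sketch below shows one generic GCN layer with symmetric normalization, plus one simple way a graph could be made "dynamic" by rebuilding the adjacency from current positions at each time step. This is a textbook-style illustration, not the paper's architecture: the distance-threshold rule, the function names, and the toy positions are assumptions.

```python
import numpy as np


def distance_adjacency(positions, radius):
    """Assumed rule: connect entities (user, objects) closer than `radius`."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist < radius).astype(float)
    np.fill_diagonal(adj, 0.0)  # self-loops are added during normalization
    return adj


def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(features, adj, weight):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(normalized_adjacency(adj) @ features @ weight, 0.0)


# Three entities at one time step; the graph is rebuilt whenever positions change.
positions = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
adj = distance_adjacency(positions, radius=2.0)
out = gcn_layer(np.eye(3), adj, np.eye(3))
print(out)
```

With identity features and weights the output is just the normalized adjacency, so nearby entities share information while the distant one only sees itself.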

[CV-112] WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting

Link: https://arxiv.org/abs/2510.10726
Authors: Yifan Liu, Zhiyuan Min, Zhenwei Wang, Junta Wu, Tengfei Wang, Yixuan Yuan, Yawei Luo, Chunchao Guo
Affiliations: Zhejiang University; Chinese University of Hong Kong; Tencent Hunyuan
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Project page, code, and models will be publicly available soon

[CV-113] VLM-Guided Adaptive Negative Prompting for Creative Generation

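[Quick read]: This paper addresses the limitations of current text-to-image diffusion models in generating genuinely novel content: although these models follow user prompts faithfully and produce photorealistic images, their outputs tend to stay within known categories and common visual concepts and rarely go beyond the pretraining data distribution. The key to the solution is VLM-Guided Adaptive Negative-Prompting, a training-free method applied only at inference time. It uses a Vision-Language Model (VLM) to analyze intermediate results during generation and dynamically adjusts the negative prompt to steer the model away from conventional visual concepts, encouraging novel outputs that still match the semantic intent while preserving the validity of the generated object.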

Link: https://arxiv.org/abs/2510.10715
Authors: Shelly Golan, Yotam Nitzan, Zongze Wu, Or Patashnik
Affiliations: Unknown
Categories: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Comments: Project page at: this https URL

Abstract:Creative generation is the synthesis of new, surprising, and valuable samples that reflect user intent yet cannot be envisioned in advance. This task aims to extend human imagination, enabling the discovery of visual concepts that exist in the unexplored spaces between familiar domains. While text-to-image diffusion models excel at rendering photorealistic scenes that faithfully match user prompts, they still struggle to generate genuinely novel content. Existing approaches to enhance generative creativity either rely on interpolation of image features, which restricts exploration to predefined categories, or require time-intensive procedures such as embedding optimization or model fine-tuning. We propose VLM-Guided Adaptive Negative-Prompting, a training-free, inference-time method that promotes creative image generation while preserving the validity of the generated object. Our approach utilizes a vision-language model (VLM) that analyzes intermediate outputs of the generation process and adaptively steers it away from conventional visual concepts, encouraging the emergence of novel and surprising outputs. We evaluate creativity through both novelty and validity, using statistical metrics in the CLIP embedding space. Through extensive experiments, we show consistent gains in creative novelty with negligible computational overhead. Moreover, unlike existing methods that primarily generate single objects, our approach extends to complex scenarios, such as generating coherent sets of creative objects and preserving creativity within elaborate compositional prompts. Our method integrates seamlessly into existing diffusion pipelines, offering a practical route to producing creative outputs that venture beyond the constraints of textual descriptions.
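
The general shape of an adaptive negative-prompting loop can be sketched as follows. This is a schematic under stated assumptions rather than the authors' pipeline: `denoise_step` and `describe_intermediate` are hypothetical callbacks standing in for a diffusion sampler step and a VLM query, the `query_every` schedule is invented, and `guided_noise` shows the common classifier-free guidance form in which the negative-prompt prediction takes the place of the unconditional one.

```python
def guided_noise(eps_pos, eps_neg, scale):
    """Guidance that moves the prediction away from the negative prompt."""
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]


def adaptive_negative_prompting(
    denoise_step,           # hypothetical: (latent, step, prompt, negative, scale) -> latent
    describe_intermediate,  # hypothetical VLM query: (latent, prompt) -> concept name
    prompt,
    latent,
    num_steps,
    scale=7.5,
    query_every=5,
):
    """Periodically ask a VLM what the sample resembles and add it as a negative."""
    negatives = []
    for step in range(num_steps):
        if step % query_every == 0:
            concept = describe_intermediate(latent, prompt)
            if concept and concept not in negatives:
                negatives.append(concept)  # steer away from this familiar concept
        latent = denoise_step(latent, step, prompt, ", ".join(negatives), scale)
    return latent, negatives


# Toy stand-ins so the loop can be exercised without a real model.
final_latent, negatives = adaptive_negative_prompting(
    denoise_step=lambda latent, step, prompt, negative, scale: latent + 1,
    describe_intermediate=lambda latent, prompt: "cat",
    prompt="a new kind of pet",
    latent=0,
    num_steps=10,
)
print(final_latent, negatives)
```

Because the negative list is only text passed to an existing sampler, a loop of this kind needs no fine-tuning, which matches the training-free, inference-time framing in the abstract.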

[CV-114] Dynamic Gaussian Splatting from Defocused and Motion-blurred Monocular Videos

Link: https://arxiv.org/abs/2510.10691
Authors: Xuankai Zhang, Junjin Xiao, Qing Zhang
Affiliations: Sun Yat-sen University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-115] Action-Dynamics Modeling and Cross-Temporal Interaction for Online Action Understanding

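[Quick read]: This paper addresses two core problems: untrimmed videos usually contain redundant information and noise, which makes action understanding inefficient; and existing methods often overlook the influence of the agent's intention on the action, which limits the accuracy of action detection and anticipation. The key to the solution is a unified framework, the State-Specific Model (SSM), with three components: 1) a Critical State-Based Memory Compression module that compresses frame sequences into critical states to reduce redundancy; 2) an Action Pattern Learning module that builds a state-transition graph with multi-dimensional edges to model action dynamics in complex scenarios and to generate future cues that represent intention; 3) a Cross-Temporal Interaction module that models the mutual influence between intention and past and current information, refining present and future features so that action detection and anticipation are performed jointly.

Link: https://arxiv.org/abs/2510.10682
Authors: Xinyu Yang, Zheheng Jiang, Feixiang Zhou, Yihang Zhu, Na Lv, Nan Xing, Huiyu Zhou
Affiliations: University of Leicester; University of Liverpool; University of Jinan; Xi’an University of Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: 10 pages, 9 figures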

Abstract:Action understanding, encompassing action detection and anticipation, plays a crucial role in numerous practical applications. However, untrimmed videos are often characterized by substantial redundant information and noise. Moreover, in modeling action understanding, the influence of the agent’s intention on the action is often overlooked. Motivated by these issues, we propose a novel framework called the State-Specific Model (SSM), designed to unify and enhance both action detection and anticipation tasks. In the proposed framework, the Critical State-Based Memory Compression module compresses frame sequences into critical states, reducing information redundancy. The Action Pattern Learning module constructs a state-transition graph with multi-dimensional edges to model action dynamics in complex scenarios, on the basis of which potential future cues can be generated to represent intention. Furthermore, our Cross-Temporal Interaction module models the mutual influence between intentions and past as well as current information through cross-temporal interactions, thereby refining present and future features and ultimately realizing simultaneous action detection and anticipation. Extensive experiments on multiple benchmark datasets – including EPIC-Kitchens-100, THUMOS’14, TVSeries, and the introduced Parkinson’s Disease Mouse Behaviour (PDMB) dataset – demonstrate the superior performance of our proposed framework compared to other state-of-the-art approaches. These results highlight the importance of action dynamics learning and cross-temporal interactions, laying a foundation for future action understanding research.

[CV-116] MSM-Seg: A Modality-and-Slice Memory Framework with Category-Agnostic Prompting for Multi-Modal Brain Tumor Segmentation

Link: https://arxiv.org/abs/2510.10679
Authors: Yuxiang Luo, Qing Xu, Hai Huang, Yuqi Ouyang, Zhen Chen, Wenting Duan
Affiliations: Waseda University; University of Lincoln; University of Nottingham; University of Nottingham Ningbo China; Northeast Agricultural University; Sichuan University; Yale University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: Under Review

[CV-117] Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey

Link: https://arxiv.org/abs/2510.10671
Authors: Jinxuan Li, Chaolei Tan, Haoxuan Chen, Jianxin Ma, Jian-Fang Hu, Wei-Shi Zheng, Jianhuang Lai
Affiliations: Sun Yat-sen University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Notes: Draft version, work in progress

[CV-118] AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes

Link: https://arxiv.org/abs/2510.10670
Authors: Yu Li, Menghan Xia, Gongye Liu, Jianhong Bai, Xintao Wang, Conglang Zhang, Yuxuan Lin, Ruihang Chu, Pengfei Wan, Yujiu Yang
Affiliations: Tsinghua University; HUST; Kling Team, Kuaishou Technology; HKUST; Zhejiang University; Wuhan University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-119] Scalable Face Security Vision Foundation Model for Deepfake Diffusion and Spoofing Detection

Link: https://arxiv.org/abs/2510.10663
Authors: Gaojian Wang, Feng Lin, Tong Wu, Zhisheng Yan, Kui Ren
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Notes: 18 pages, 9 figures, project page: this https URL

[CV-120] Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping

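[Quick read]: This paper addresses the temporal instability of online high-definition (HD) mapping results caused by spatial displacement of onboard sensors in dynamic environments, an instability that undermines the reliability of downstream autonomous driving tasks. Existing methods focus mostly on per-frame mapping accuracy and have not systematically evaluated or optimized temporal stability. The key to the solution is the first comprehensive stability benchmark for online HD mapping models, with new stability metrics along three dimensions (Presence, Localization, and Shape) combined into a unified mean Average Stability (mAS) score that quantifies model behavior over time. Experiments show that accuracy (mAP) and stability (mAS) are independent performance dimensions that both need to be optimized; the framework identifies the key model design factors that affect each and argues for treating temporal stability as a core evaluation criterion alongside accuracy.

Link: https://arxiv.org/abs/2510.10660
Authors: Hao Shan, Ruikai Li, Han Jiang, Yizhe Fan, Ziyang Yan, Bohan Li, Xiaoshuai Hao, Hao Zhao, Zhiyong Cui, Yilong Ren, Haiyang Yu
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: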

Abstract:As one of the fundamental modules in autonomous driving, online high-definition (HD) maps have attracted significant attention due to their cost-effectiveness and real-time capabilities. Since vehicles always cruise in highly dynamic environments, spatial displacement of onboard sensors inevitably causes shifts in real-time HD mapping results, and such instability poses fundamental challenges for downstream tasks. However, existing online map construction models tend to prioritize improving each frame’s mapping accuracy, while the mapping stability has not yet been systematically studied. To fill this gap, this paper presents the first comprehensive benchmark for evaluating the temporal stability of online HD mapping models. We propose a multi-dimensional stability evaluation framework with novel metrics for Presence, Localization, and Shape Stability, integrated into a unified mean Average Stability (mAS) score. Extensive experiments on 42 models and variants show that accuracy (mAP) and stability (mAS) represent largely independent performance dimensions. We further analyze the impact of key model design choices on both criteria, identifying architectural and training factors that contribute to high accuracy, high stability, or both. To encourage broader focus on stability, we will release a public benchmark. Our work highlights the importance of treating temporal stability as a core evaluation criterion alongside accuracy, advancing the development of more reliable autonomous driving systems. The benchmark toolkit, code, and models will be available at this https URL.

[CV-121] A Machine Learning Perspective on Automated Driving Corner Cases

Link: https://arxiv.org/abs/2510.10653
Authors: Sebastian Schmidt, Julius Körner, Stephan Günnemann
Affiliations: Technical University of Munich; BMW Group
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-122] DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis

Link: https://arxiv.org/abs/2510.10650
Authors: Peiyin Chen, Zhuowei Yang, Hui Feng, Sheng Jiang, Rui Yan
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Notes: 5 pages

[CV-123] GraphTARIF: Linear Graph Transformer with Augmented Rank and Improved Focus

Link: https://arxiv.org/abs/2510.10631
Authors: Zhaolin Hu, Kun Li, Hehe Fan, Yi Yang
Affiliations: Zhejiang University; Hong Kong Baptist University
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Notes:

[CV-124] ImpMIA: Leveraging Implicit Bias for Membership Inference Attack under Realistic Scenarios

Link: https://arxiv.org/abs/2510.10625
Authors: Yuval Golbari, Navve Wasserman, Gal Vardi, Michal Irani
Affiliations: Weizmann Institute of Science
Categories: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-125] OmniQuality-R: Advancing Reward Models Through All-Encompassing Quality Assessment

Link: https://arxiv.org/abs/2510.10609
Authors: Yiting Lu, Fengbin Guan, Yixin Gao, Yan Zhong, Xinge Peng, Jiakang Yuan, Yihao Liu, Bo Zhang, Xin Li, Zhibo Chen, Weisi Lin
Affiliations: University of Science and Technology of China; Nanyang Technological University; Shanghai Artificial Intelligence Laboratory; Peking University; Fudan University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-126] ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models

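[Quick read]: This paper addresses two weaknesses in post-training of Large Vision-and-Language Models (LVLMs): Supervised Fine-Tuning (SFT) tends to yield sub-optimal performance, while Reinforcement Learning with Verifiable Rewards (RLVR) struggles on tasks beyond the model's internal knowledge base. The key to the solution is ViSurf (Visual Supervised-and-Reinforcement Fine-Tuning), a unified post-training paradigm that combines the external supervision of SFT with the internal reinforcement of RLVR in a single training stage: ground-truth labels are injected into the RLVR rollouts so that external supervision and internal reward are optimized together, and three new reward control strategies stabilize training and improve performance. Experiments show that ViSurf outperforms SFT alone, RLVR alone, and the two-stage SFT→RLVR approach on multiple benchmarks.

Link: https://arxiv.org/abs/2510.10606
Authors: Yuqi Liu, Liangyu Chen, Jiazhen Liu, Mingkang Zhu, Zhisheng Zhong, Bei Yu, Jiaya Jia
Affiliations: The Chinese University of Hong Kong; The Hong Kong University of Science and Technology; Renmin University of China
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: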

Abstract:Typical post-training paradigms for Large Vision-and-Language Models (LVLMs) include Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR). SFT leverages external guidance to inject new knowledge, whereas RLVR utilizes internal reinforcement to enhance reasoning capabilities and overall performance. However, our analysis reveals that SFT often leads to sub-optimal performance, while RLVR struggles with tasks that exceed the model’s internal knowledge base. To address these limitations, we propose ViSurf (Visual Supervised-and-Reinforcement Fine-Tuning), a unified post-training paradigm that integrates the strengths of both SFT and RLVR within a single stage. We analyze the derivation of the SFT and RLVR objectives to establish the ViSurf objective, providing a unified perspective on these two paradigms. The core of ViSurf involves injecting ground-truth labels into the RLVR rollouts, thereby providing simultaneous external supervision and internal reinforcement. Furthermore, we introduce three novel reward control strategies to stabilize and optimize the training process. Extensive experiments across several diverse benchmarks demonstrate the effectiveness of ViSurf, which outperforms individual SFT, individual RLVR, and two-stage SFT → RLVR. In-depth analysis corroborates these findings, validating the derivation and design principles of ViSurf.
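
The effect of injecting a ground-truth label into a group of rollouts can be illustrated with a small numeric sketch. It assumes a GRPO-style group-normalized advantage, which the abstract does not state; the function names and the fixed reward of 1.0 given to the ground-truth sample are hypothetical, and the paper's three reward control strategies are not modeled.

```python
import numpy as np

def group_advantages(rewards):
    """Group-normalized advantages (GRPO-style; an assumption, not from the abstract)."""
    rewards = np.asarray(rewards, dtype=float)
    std = rewards.std()
    if std == 0.0:
        # All rollouts scored the same: this group carries no learning signal.
        return np.zeros_like(rewards)
    return (rewards - rewards.mean()) / std

def inject_ground_truth(rollout_rewards, gt_reward=1.0):
    """Append the ground-truth answer as one extra rollout with a fixed reward (hypothetical)."""
    return np.append(np.asarray(rollout_rewards, dtype=float), gt_reward)

# Every self-generated rollout failed, so the plain group has zero advantage everywhere.
failed = [0.0, 0.0, 0.0, 0.0]
plain = group_advantages(failed)
mixed = group_advantages(inject_ground_truth(failed))
```

In this toy case the injected sample is the only one with a positive advantage, which is one way to read the abstract's claim that the ground truth supplies external supervision when the task exceeds the model's own knowledge.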

[CV-127] SpikeGrasp: A Benchmark for 6-DoF Grasp Pose Detection from Stereo Spike Streams

Link: https://arxiv.org/abs/2510.10602
Authors: Zhuoheng Gao, Jiyao Zhang, Zhiyong Xie, Hao Dong, Zhaofei Yu, Rongmei Chen, Guozhang Chen, Tiejun Huang
Affiliations: Peking University; National Key Laboratory for Multimedia Information Processing; School of Computer Science; Center on Frontiers of Computing Studies; Institute for Artificial Intelligence; School of Electronics
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-128] A Simple and Better Baseline for Visual Grounding ICME2025

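[Quick read]: This paper addresses the high computational overhead in visual grounding caused by multi-scale iterative processing and feature caching. Existing methods improve efficiency by selecting language-relevant visual regions, but their iterative procedure and caching mechanism still add significant cost. The key to the solution is a simple and effective feature-selection-based baseline (FSVG) with three elements: 1) the linguistic and visual modalities are embedded directly in one network architecture, avoiding complicated iteration; 2) language is used in parallel to guide cross-modal interaction and extract effective visual features; 3) a similarity-based feature selection mechanism keeps only language-related features during visual feature learning, which substantially reduces computational cost and gives a better balance between accuracy and efficiency.

Link: https://arxiv.org/abs/2510.10587
Authors: Jingchao Wang, Wenlong Zhang, Dingjiang Huang, Hong Wang, Yefeng Zheng
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: ICME2025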

Abstract:Visual grounding aims to predict the locations of target objects specified by textual descriptions. For this task with linguistic and visual modalities, there is a latest research line that focuses on only selecting the linguistic-relevant visual regions for object localization to reduce the computational overhead. Albeit achieving impressive performance, it is iteratively performed on different image scales, and at every iteration, linguistic features and visual features need to be stored in a cache, incurring extra overhead. To facilitate the implementation, in this paper, we propose a feature selection-based simple yet effective baseline for visual grounding, called FSVG. Specifically, we directly encapsulate the linguistic and visual modalities into an overall network architecture without complicated iterative procedures, and utilize the language in parallel as guidance to facilitate the interaction between linguistic modal and visual modal for extracting effective visual features. Furthermore, to reduce the computational cost, during the visual feature learning, we introduce a similarity-based feature selection mechanism to only exploit language-related visual features for faster prediction. Extensive experiments conducted on several benchmark datasets comprehensively substantiate that the proposed FSVG achieves a better balance between accuracy and efficiency beyond the current state-of-the-art methods. Code is available at this https URL.
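
A similarity-based feature selection step of the kind the abstract mentions can be sketched as top-k selection by cosine similarity. This is a minimal illustration, not the FSVG implementation: the pooled text vector, the `keep_ratio` parameter, and the function name are assumptions introduced here.

```python
import numpy as np

def select_visual_tokens(visual, text, keep_ratio=0.5):
    """Keep the visual tokens most similar to a pooled text feature.

    visual: (N, D) visual token features; text: (D,) pooled language feature.
    Returns the kept features and their original indices in ascending order.
    """
    v = visual / (np.linalg.norm(visual, axis=1, keepdims=True) + 1e-8)
    t = text / (np.linalg.norm(text) + 1e-8)
    sim = v @ t                                   # cosine similarity per token
    k = max(1, int(round(keep_ratio * len(visual))))
    idx = np.sort(np.argsort(-sim)[:k])           # top-k, restored to spatial order
    return visual[idx], idx

text = np.array([1.0, 0.0, 0.0, 0.0])
visual = np.array([[0.0, 1.0, 0.0, 0.0],
                   [1.0, 0.1, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0],
                   [0.9, 0.0, 0.2, 0.0],
                   [0.0, 0.0, 0.0, 1.0],
                   [-1.0, 0.0, 0.0, 0.0]])
kept, idx = select_visual_tokens(visual, text, keep_ratio=1 / 3)
```

Later layers then operate on `kept` only, so their cost scales with the number of selected tokens rather than with the full token count.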

[CV-129] Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection

Link: https://arxiv.org/abs/2510.10584
Authors: Shizhen Zhao, Jiahui Liu, Xin Wen, Haoru Tan, Xiaojuan Qi
Affiliations: The University of Hong Kong
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-130] Injecting Frame-Event Complementary Fusion into Diffusion for Optical Flow in Challenging Scenes

Link: https://arxiv.org/abs/2510.10577
Authors: Haonan Wang, Hanyu Zhou, Haoyue Liu, Luxin Yan
Affiliations: Huazhong University of Science and Technology; National University of Singapore
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-131] UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation

Link: https://arxiv.org/abs/2510.10575
Authors: Zhengrong Yue, Haiyu Zhang, Xiangyu Zeng, Boyu Chen, Chenting Wang, Shaobin Zhuang, Lu Dong, KunPeng Du, Yi Wang, Limin Wang, Yali Wang
Affiliations: Shanghai Jiao Tong University; Shanghai AI Laboratory; Beihang University; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Nanjing University; University of Science and Technology of China
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-132] Deep semi-supervised approach based on consistency regularization and similarity learning for weeds classification

Link: https://arxiv.org/abs/2510.10573
Authors: Farouq Benchallal, Adel Hafiane, Nicolas Ragot, Raphael Canals
Affiliations: PRISME EA 4229; INSA CVL; LIFAT EA 6300; Université Tours; Université d’Orleans
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Notes: Submitted to EURASIP Journal on Image and Video Processing

[CV-133] MRS-YOLO Railroad Transmission Line Foreign Object Detection Based on Improved YOLO11 and Channel Pruning

Link: https://arxiv.org/abs/2510.10553
Authors: Siyuan Liu, Junting Lin
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-134] GLOFNet – A Multimodal Dataset for GLOF Monitoring and Prediction

Link: https://arxiv.org/abs/2510.10546
Authors: Zuha Fatima, Muhammad Anser Sohaib, Muhammad Talha, Sidra Sultana, Ayesha Kanwal, Nazia Perwaiz
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Notes:

[CV-135] MCE: Towards a General Framework for Handling Missing Modalities under Imbalanced Missing Rates

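[Quick read]: This paper addresses the performance degradation in multi-modal learning caused by imbalanced missing rates across modalities: when some modalities are missing more often, their learning lags and their feature representations degrade, creating a vicious cycle that hurts overall performance. The key to the solution is a new framework, Modality Capability Enhancement (MCE), with two synergistic components: i) Learning Capability Enhancement (LCE), which introduces multi-level factors to dynamically balance the learning progress of each modality; ii) Representation Capability Enhancement (RCE), which improves feature semantics and robustness through subset prediction and cross-modal completion tasks. The method mitigates sample-level differences in modality utility and the degradation of feature quality, and it significantly outperforms existing state-of-the-art methods on multiple benchmark datasets.

Link: https://arxiv.org/abs/2510.10534
Authors: Binyu Zhao, Wei Zhang, Zhaonian Zou
Affiliations: Harbin Institute of Technology
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
Notes: This is the accepted version of an article that has been published in Pattern Recognition. The final published version will be available soon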

Abstract:Multi-modal learning has made significant advances across diverse pattern recognition applications. However, handling missing modalities, especially under imbalanced missing rates, remains a major challenge. This imbalance triggers a vicious cycle: modalities with higher missing rates receive fewer updates, leading to inconsistent learning progress and representational degradation that further diminishes their contribution. Existing methods typically focus on global dataset-level balancing, often overlooking critical sample-level variations in modality utility and the underlying issue of degraded feature quality. We propose Modality Capability Enhancement (MCE) to tackle these limitations. MCE includes two synergistic components: i) Learning Capability Enhancement (LCE), which introduces multi-level factors to dynamically balance modality-specific learning progress, and ii) Representation Capability Enhancement (RCE), which improves feature semantics and robustness through subset prediction and cross-modal completion tasks. Comprehensive evaluations on four multi-modal benchmarks show that MCE consistently outperforms state-of-the-art methods under various missing configurations. The journal preprint version is now available at this https URL. Our code is available at this https URL.

[CV-136] Layout-Independent License Plate Recognition via Integrated Vision and Language Models

Link: https://arxiv.org/abs/2510.10533
Authors: Elham Shabaninia, Fatemeh Asadi-zeydabadi, Hossein Nezamabadi-pour
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-137] Unified Open-World Segmentation with Multi-Modal Prompts ICCV2025

Link: https://arxiv.org/abs/2510.10524
Authors: Yang Liu, Yufei Yin, Chenchen Jing, Muzhi Zhu, Hao Chen, Yuling Xi, Bo Feng, Hao Wang, Shiyu Li, Chunhua Shen
Affiliations: Zhejiang University; Hangzhou Dianzi University; Zhejiang University of Technology; Apple
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: Accepted to ICCV2025

[CV-138] Receptive Field Expanded Look-Up Tables for Vision Inference: Advancing from Low-level to High-level Tasks

Link: https://arxiv.org/abs/2510.10522
Authors: Xi Zhang, Xiaolin Wu
Affiliations: ANGEL Lab, Nanyang Technological University; School of Computing and Artificial Intelligence, Southwest Jiaotong University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-139] VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning

Link: https://arxiv.org/abs/2510.10518
Authors: Qunzhong Wang, Jie Liu, Jiajun Liang, Yilei Jiang, Yuanxing Zhang, Jinyuan Chen, Yaozhi Zheng, Xintao Wang, Pengfei Wan, Xiangyu Yue, Jiaheng Liu
Affiliations: CUHK MMLab; Kling Team, Kuaishou Technology; Nanjing University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-140] SuperEx: Enhancing Indoor Mapping and Exploration using Non-Line-of-Sight Perception

Link: https://arxiv.org/abs/2510.10506
Authors: Kush Garg (1), Akshat Dave (2) ((1) Delhi Technological University, New Delhi, India; (2) Stony Brook University, NY, United States)
Affiliations: Delhi Technological University; Stony Brook University
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Notes: 8 pages, 9 figures, project webpage: this https URL

[CV-141] Jigsaw3D: Disentangled 3D Style Transfer via Patch Shuffling and Masking

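[Quick read]: This paper addresses three problems in controllable 3D style transfer: entanglement of style with semantic content, poor multi-view consistency, and high optimization latency. Existing methods typically inject reference style tokens directly or distill scores from 2D diffusion models, which requires heavy per-scene optimization and makes style hard to separate from content. The key to the solution is Jigsaw3D, a pipeline based on a multi-view diffusion model that applies a jigsaw operation (spatial shuffling and random masking of reference image patches) to suppress object semantics and retain pure style statistics such as color palettes, strokes, and textures. These style cues enter the multi-view diffusion model through reference-to-view cross-attention, producing view-consistent stylized renderings, and the style is then baked onto the mesh surface to obtain seamless textures. The result is efficient, high-quality, disentangled 3D style transfer.

Link: https://arxiv.org/abs/2510.10497
Authors: Yuteng Ye, Zheng Zhang, Qinchuan Zhang, Di Wang, Youjia Zhang, Wenxiao Zhang, Wei Yang, Yuan Liu
Affiliations: Huawei; Huazhong University of Science & Technology; The Hong Kong University of Science and Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: 23 pages, 16 figures and 1 table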

Abstract:Controllable 3D style transfer seeks to restyle a 3D asset so that its textures match a reference image while preserving the integrity and multi-view consistency. The prevalent methods either rely on direct reference style token injection or score-distillation from 2D diffusion models, which incurs heavy per-scene optimization and often entangles style with semantic content. We introduce Jigsaw3D, a multi-view diffusion based pipeline that decouples style from content and enables fast, view-consistent stylization. Our key idea is to leverage the jigsaw operation - spatial shuffling and random masking of reference patches - to suppress object semantics and isolate stylistic statistics (color palettes, strokes, textures). We integrate these style cues into a multi-view diffusion model via reference-to-view cross-attention, producing view-consistent stylized renderings conditioned on the input mesh. The renders are then style-baked onto the surface to yield seamless textures. Across standard 3D stylization benchmarks, Jigsaw3D achieves high style fidelity and multi-view consistency with substantially lower latency, and generalizes to masked partial reference stylization, multi-object scene styling, and tileable texture generation. Project page is available at: this https URL
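
The jigsaw operation named in the abstract, spatial shuffling plus random masking of reference patches, can be sketched directly on an image array. This is a minimal illustration under assumptions: the patch size, mask ratio, zero-valued mask, and function name are hypothetical choices, and the abstract does not say whether the paper applies the operation to pixels or to features.

```python
import numpy as np

def jigsaw(image, patch=4, mask_ratio=0.25, seed=0):
    """Shuffle non-overlapping patches and zero out a random subset of them."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    # (h, w, c) -> (gh, patch, gw, patch, c) -> (gh*gw, patch, patch, c)
    patches = image.reshape(gh, patch, gw, patch, c).swapaxes(1, 2)
    patches = patches.reshape(-1, patch, patch, c)
    rng = np.random.default_rng(seed)
    patches = patches[rng.permutation(len(patches))]       # spatial shuffling (copy)
    n_mask = int(mask_ratio * len(patches))
    masked = rng.choice(len(patches), n_mask, replace=False)
    patches[masked] = 0                                     # random masking
    # Reassemble the shuffled, partially masked patches into an image.
    return patches.reshape(gh, gw, patch, patch, c).swapaxes(1, 2).reshape(h, w, c)
```

Shuffling destroys the global layout that identifies the object while leaving per-patch color and texture statistics intact, which is the intuition the abstract gives for isolating style from semantics.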

[CV-142] Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation

Link: https://arxiv.org/abs/2510.10489
Authors: Jiaye Li, Baoyou Chen, Hui Li, Zilong Dong, Jingdong Wang, Siyu Zhu
Affiliations: Fudan University; Alibaba Group; Baidu Inc.; Shanghai Academy of AI for Science
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

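[CV-143] Towards Self-Refinement of Vision-Language Models with Triangular Consistency

[Quick read]: This paper asks whether Vision-Language Models (VLMs) can learn autonomously without supervision. Current VLMs rely mainly on supervised visual instruction tuning to combine visual knowledge with the analytical capabilities of Large Language Models (LLMs), and the potential of VLMs trained without such supervision has not been systematically explored. The key to the solution is a self-refinement framework based on a Triangular Consistency principle: within the image-query-answer triangle, any masked element should be reconstructed consistently, which lets a VLM generate high-quality synthetic data and update itself without external annotations or environmental feedback. The framework has three steps: (1) multi-task instruction tuning gives the VLM the ability to generate instructions; (2) triplets generated from unlabeled images are filtered with the Triangular Consistency principle; (3) the model is further updated on the filtered synthetic data. Experiments show that with this mechanism alone, the LLaVA-1.5 baseline achieves consistent, though modest, improvements on multiple benchmarks, revealing an inherent self-refinement capability in VLMs and its underlying learning mechanism.

Link: https://arxiv.org/abs/2510.10487
Authors: Yunlong Deng, Guangyi Chen, Tianpei Gu, Lingjing Kong, Yan Li, Zeyu Tang, Kun Zhang
Affiliations: Mohamed bin Zayed University of Artificial Intelligence; ByteDance US; Carnegie Mellon University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Notes: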

Abstract:Vision-Language Models (VLMs) integrate visual knowledge with the analytical capabilities of Large Language Models (LLMs) through supervised visual instruction tuning, using image-question-answer triplets. However, the potential of VLMs trained without supervised instruction remains largely unexplored. This study validates that VLMs possess inherent self-refinement capabilities, enabling them to generate high-quality supervised data without external inputs and thereby learn autonomously. Specifically, to stimulate the self-refinement ability of VLMs, we propose a self-refinement framework based on a Triangular Consistency principle: within the image-query-answer triangle, any masked elements should be consistently and accurately reconstructed. The framework involves three steps: (1) We enable the instruction generation ability of VLMs by adding multi-task instruction tuning like image → question-answer or image-answer → question. (2) We generate image-query-answer triplets from unlabeled images and use the Triangular Consistency principle for filtering. (3) The model is further updated using the filtered synthetic data. To investigate the underlying mechanisms behind this self-refinement capability, we conduct a theoretical analysis from a causal perspective. Using the widely recognized LLaVA-1.5 as our baseline, our experiments reveal that the model can autonomously achieve consistent, though deliberately modest, improvements across multiple benchmarks without any external supervision, such as human annotations or environmental feedback. We expect that the insights of this study on the self-refinement ability of VLMs can inspire future research on the learning mechanism of VLMs. Code is available at this https URL.
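
Step (2), filtering generated triplets by Triangular Consistency, can be sketched as a reconstruction check on the two text corners of the triangle. Everything here is an assumption made for illustration: `predict_answer` and `predict_question` are hypothetical stand-ins for VLM calls, token-overlap F1 is a placeholder similarity measure, the threshold `tau` is arbitrary, and masking the image corner is omitted.

```python
def token_f1(a, b):
    """Token-overlap F1 between two strings (a stand-in similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    common = len(ta & tb)
    if common == 0:
        return 0.0
    p, r = common / len(tb), common / len(ta)
    return 2 * p * r / (p + r)

def is_consistent(image, question, answer, predict_answer, predict_question, tau=0.5):
    """Keep a triplet only if each masked text element is reconstructed from the rest."""
    a_hat = predict_answer(image, question)      # mask the answer
    q_hat = predict_question(image, answer)      # mask the question
    return token_f1(answer, a_hat) >= tau and token_f1(question, q_hat) >= tau

# Toy stand-ins for the VLM (hypothetical): lookup tables keyed by image id.
qa = {"img1": ("what color is the car", "red")}
predict_answer = lambda img, q: qa[img][1]
predict_question = lambda img, a: qa[img][0]

candidates = [("img1", "what color is the car", "red"),
              ("img1", "what color is the car", "blue")]
kept = [t for t in candidates if is_consistent(*t, predict_answer, predict_question)]
```

Only triplets that survive the check would be used in step (3) to update the model.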

[CV-144] MSF-Mamba: Motion-aware State Fusion Mamba for Efficient Micro-Gesture Recognition

Link: https://arxiv.org/abs/2510.10478
Authors: Deng Li, Jun Shao, Bohao Xing, Rong Gao, Bihan Wen, Heikki Kälviäinen, Xin Liu
Affiliations: Tianjin University; Nanyang Technological University; Lappeenranta-Lahti University of Technology LUT; Brno University of Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-145] DAGLFNet: Deep Attention-Guided Global-Local Feature Fusion for Pseudo-Image Point Cloud Segmentation

Link: https://arxiv.org/abs/2510.10471
Authors: Chuang Chen, Wenyi Ge
Affiliations: Chengdu University of Information Technology
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Notes:

[CV-146] When Images Speak Louder: Mitigating Language Bias-induced Hallucinations in VLMs through Cross-Modal Guidance

Link: https://arxiv.org/abs/2510.10466
Authors: Jinjin Cao, Zhiyang Chen, Zijun Wang, Liyuan Ma, Weijian Luo, Guojun Qi
Affiliations: MAPLE Lab, Westlake University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-147] Post-TIPS Prediction via Multimodal Interaction: A Multi-Center Dataset and Framework for Survival Complication and Portal Pressure Assessment

Link: https://arxiv.org/abs/2510.10464
Authors: Junhao Dong, Dejia Liu, Ruiqi Ding, Zongxing Chen, Yingjie Huang, Zhu Meng, Jianbo Zhao, Zhicheng Zhao, Fei Su
Affiliations: Beijing University of Posts and Telecommunications
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: 81 pages, 13 figures

[CV-148] Learning from Disagreement: A Group Decision Simulation Framework for Robust Medical Image Segmentation

Link: https://arxiv.org/abs/2510.10462
Authors: Chen Zhong, Yuxuan Yang, Xinyue Zhang, Ruohan Ma, Yong Guo, Gang Li, Jupeng Li
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Notes:

[CV-149] On the Problem of Consistent Anomalies in Zero-Shot Industrial Anomaly Detection

Link: https://arxiv.org/abs/2510.10456
Authors: Tai Le-Gia, Ahn Jaehyun
Affiliations: Chungnam National University
Categories: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)
Notes: Published in TMLR (10/2025)

[CV-150] MonoSE(3)-Diffusion: A Monocular SE(3) Diffusion Framework for Robust Camera-to-Robot Pose Estimation

Link: https://arxiv.org/abs/2510.10434
Authors: Kangjian Zhu, Haobo Jiang, Yigong Zhang, Jianjun Qian, Jian Yang, Jin Xie
Affiliations: Nanjing University of Science and Technology; Nanyang Technological University; Nankai University; Nanjing University
Categories: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Notes:

[CV-151] Taming a Retrieval Framework to Read Images in Humanlike Manner for Augmenting Generation of MLLMs

Link: https://arxiv.org/abs/2510.10426
Authors: Suyang Xi, Chenxi Yang, Hong Ding, Yiqing Ni, Catherine C. Liu, Yunhao Liu, Chengqi Zhang
Affiliations: Emory University; University of Electronic Science and Technology of China; University of Illinois Chicago; The Hong Kong Polytechnic University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Notes: 12 pages, 5 figures

[CV-152] Towards Cybersickness Severity Classification from VR Gameplay Videos Using Transfer Learning and Temporal Modeling

Link: https://arxiv.org/abs/2510.10422
Authors: Jyotirmay Nag Setu, Kevin Desai, John Quarles
Affiliations: The University of Texas at San Antonio
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-153] Combo-Gait: Unified Transformer Framework for Multi-Modal Gait Recognition and Attribute Analysis

Link: https://arxiv.org/abs/2510.10417
Authors: Zhao-Yang Wang, Zhimin Shao, Jieneng Chen, Rama Chellappa
Affiliations: Johns Hopkins University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Notes:

[CV-154] Guided Image Feature Matching using Feature Spatial Order

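[Quick read]: This paper addresses the inefficiency of image feature matching, especially when many feature points are detected and traditional methods become computationally expensive and slow. The key to the solution is to introduce the concept of feature spatial order into a progressive matching framework: initially matched feature points are used to build a spatial-order model that predicts the possible spatial range of subsequent matches and filters out unnecessary candidate pairs; the epipolar geometry encoded by the fundamental matrix is combined with it to further improve matching efficiency and accuracy; and, to remove the effect of image rotation on spatial order, an image alignment method derived from the fundamental matrix keeps the spatial-order model robust. Experiments show that the method significantly outperforms the traditional method on a standard dataset and on real scenes.

Link: https://arxiv.org/abs/2510.10414
Authors: Chin-Hung Teng, Ben-Jian Dong
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Notes: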

Abstract:Image feature matching plays a vital role in many computer vision tasks. Although many image feature detection and matching techniques have been proposed over the past few decades, it is still time-consuming to match feature points in two images, especially for images with a large number of detected features. Feature spatial order can estimate the probability that a pair of features is correct. Since it is a completely independent concept from epipolar geometry, it can be used to complement epipolar geometry in guiding feature matching in a target region so as to improve matching efficiency. In this paper, we integrate the concept of feature spatial order into a progressive matching framework. We use some of the initially matched features to build a computational model of feature spatial order and employ it to calculate the possible spatial range of subsequent feature matches, thus filtering out unnecessary feature matches. We also integrate it with epipolar geometry to further improve matching efficiency and accuracy. Since the spatial order of feature points is affected by image rotation, we propose a suitable image alignment method from the fundamental matrix of epipolar geometry to remove the effect of image rotation. To verify the feasibility of the proposed method, we conduct a series of experiments, including a standard benchmark dataset, self-generated simulated images, and real images. The results demonstrate that our proposed method is significantly more efficient and has more accurate feature matching than the traditional method.

[CV-155] Mesh-Gait: A Unified Framework for Gait Recognition Through Multi-Modal Representation Learning from 2D Silhouettes

Link: https://arxiv.org/abs/2510.10406
Authors: Zhao-Yang Wang, Jieneng Chen, Jiang Liu, Yuxiang Guo, Rama Chellappa
Affiliations: Johns Hopkins University; Advanced Micro Devices, Inc.
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Notes:

[CV-156] AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration

Link: https://arxiv.org/abs/2510.10395
Authors: Xinlong Chen, Yue Ding, Weihong Lin, Jingyun Hua, Linli Yao, Yang Shi, Bozhou Li, Yuanxing Zhang, Qiang Liu, Pengfei Wan, Liang Wang, Tieniu Tan
Affiliations: Kling Team, Kuaishou Technology; New Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA); School of Artificial Intelligence, University of Chinese Academy of Sciences; Peking University; Nanjing University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: Project webpage: this https URL

[CV-157] Identifying bias in CNN image classification using image scrambling and transforms

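【速读】: This paper addresses the hidden bias that stems from the "black box" nature of convolutional neural networks (CNNs) in image classification: a model may base its decisions on background information or noise unrelated to the target task, producing unreliable or biased results. The core challenge is that it is hard to tell whether the model relies on genuinely discriminative contextual information or on irrelevant background noise. The solution rests on two practical methods. The first splits each image into small non-overlapping tiles and shuffles them randomly, which makes classification harder and exposes reliance on local structure. The second applies several image transforms (Fourier transform, wavelet transform, and median filter) and their combinations to recover and identify the background noise information that the CNN uses for classification. Experiments show that both methods can distinguish contextual information from background noise and can flag potential reliance on noise without requiring an explicit blank background, which improves the transparency and trustworthiness of model decisions.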

链接: https://arxiv.org/abs/2510.10383
作者: Sai Teja Erukude
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: 62 pages, Master’s thesis

点击查看摘要

Abstract:CNNs are now prevalent as the primary choice for most machine vision problems due to their superior rate of classification and the availability of user-friendly libraries. These networks effortlessly identify and select features in a non-intuitive data-driven manner, making it difficult to determine which features were most influential. That leads to a “black box”, where users cannot know how the image data are analyzed and must rely on empirical results. Therefore, the decision-making process can be biased by background information that is difficult to detect. Here we discuss examples of such hidden biases and propose techniques for identifying them, methods to distinguish between contextual information and background noise, and explore whether CNNs learn from irrelevant features. One effective approach to identify dataset bias is to classify blank background parts of the images. However, in some situations a blank background in the images is not available, making it more difficult to separate the foreground information from the blank background. Such parts of the image can also be considered contextual learning, not necessarily bias. To overcome this, we propose two approaches that were tested on six different datasets, including natural, synthetic, and hybrid datasets. The first method involves dividing images into smaller, non-overlapping tiles of various sizes, which are then shuffled randomly, making classification more challenging. The second method involves the application of several image transforms, including Fourier and wavelet transforms and median filtering, and their combinations. These transforms help recover background noise information used by the CNN to classify images. Results indicate that this method can effectively distinguish between contextual information and background noise, and alert on the presence of background noise even without the need to use background information.
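The first method in the abstract, splitting an image into non-overlapping tiles and shuffling them, can be sketched as follows. This is a minimal pure-Python illustration, not the thesis code: the image is assumed to be a 2D list of pixel values whose sides are multiples of the tile size, and `shuffle_tiles` is a hypothetical name.

```python
import random

def shuffle_tiles(image, tile, seed=None):
    """Cut a 2D image into tile x tile blocks and put them back in random order.

    image: list of rows, each row a list of pixel values.
    tile: side length of each square tile (must divide both image sides).
    """
    height, width = len(image), len(image[0])
    if height % tile or width % tile:
        raise ValueError("image sides must be multiples of the tile size")
    rows, cols = height // tile, width // tile

    # Collect the tiles in row-major order.
    tiles = [
        [row[c * tile:(c + 1) * tile] for row in image[r * tile:(r + 1) * tile]]
        for r in range(rows)
        for c in range(cols)
    ]
    random.Random(seed).shuffle(tiles)

    # Reassemble the shuffled tiles into an image of the original size.
    out = []
    for r in range(rows):
        for y in range(tile):
            line = []
            for c in range(cols):
                line.extend(tiles[r * cols + c][y])
            out.append(line)
    return out

image = [[r * 4 + c for c in range(4)] for r in range(4)]
for row in shuffle_tiles(image, 2, seed=0):
    print(row)
```

Shuffling keeps every pixel value and the content inside each tile but destroys the global layout, so a classifier that stays accurate on shuffled images is probably not relying on object shape.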

[CV-158] Self-Supervised Multi-Scale Transformer with Attention-Guided Fusion for Efficient Crack Detection

链接: https://arxiv.org/abs/2510.10378
作者: Blessing Agyei Kyem,Joshua Kofi Asamoah,Eugene Denteh,Andrews Danyo,Armstrong Aboah
机构: North Dakota State University (北达科他州立大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: The paper has been published at Automation in Construction journal. The paper has 53 pages and 11 figures

点击查看摘要

[CV-159] Vision4PPG: Emergent PPG Analysis Capability of Vision Foundation Models for Vital Signs like Blood Pressure

链接: https://arxiv.org/abs/2510.10366
作者: Saurabh Kataria,Ayca Ermis,Lovely Yeswanth Panchumarthi,Minxiao Wang,Xiao Hu
机构: Emory University (埃默里大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注: BHI abstract extended

点击查看摘要

[CV-160] PointMAC: Meta-Learned Adaptation for Robust Test-Time Point Cloud Completion NEURIPS2025

链接: https://arxiv.org/abs/2510.10365
作者: Linlian Jiang,Rui Ma,Li Gu,Ziqiang Wang,Xinxin Zuo,Yang Wang
机构: Concordia University (康考迪亚大学); Jilin University (吉林大学); Mila - Quebec AI Institute (魁北克人工智能研究所); Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China (教育部知识驱动人机智能工程研究中心)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: NeurIPS 2025

点击查看摘要

[CV-161] Ortho-Fuse: Orthomosaic Generation for Sparse High-Resolution Crop Health Maps Through Intermediate Optical Flow Estimation

链接: https://arxiv.org/abs/2510.10360
作者: Rugved Katole,Christopher Stewart
机构: The Ohio State University (俄亥俄州立大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: 6 Figures, 9 pages

点击查看摘要

[CV-162] Ordinal Scale Traffic Congestion Classification with Multi-Modal Vision-Language and Motion Analysis

链接: https://arxiv.org/abs/2510.10342
作者: Yu-Hsuan Lin
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 7 pages, 4 figures. Preprint submitted to arXiv in October 2025

点击查看摘要

[CV-163] From Programs to Poses: Factored Real-World Scene Generation via Learned Program Libraries NEURIPS2025

链接: https://arxiv.org/abs/2510.10292
作者: Joy Hsu,Emily Jin,Jiajun Wu,Niloy J. Mitra
机构: Stanford University (斯坦福大学); University College London (伦敦大学学院)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: NeurIPS 2025

点击查看摘要

[CV-164] SAM2LoRA: Composite Loss-Guided Parameter-Efficient Finetuning of SAM2 for Retinal Fundus Segmentation ICMLA2025

链接: https://arxiv.org/abs/2510.10288
作者: Sayan Mandal,Divyadarshini Karthikeyan,Manas Paldhe
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted for publication at the 2025 International Conference on Machine Learning and Applications (ICMLA)

点击查看摘要

[CV-165] Bridging Perspectives: Foundation Model Guided BEV Maps for 3D Object Detection and Tracking

链接: https://arxiv.org/abs/2510.10287
作者: Markus Käppeler,Özgün Çiçek,Daniele Cattaneo,Claudius Gläser,Yakov Miron,Abhinav Valada
机构: University of Freiburg (弗莱堡大学); Bosch Research (博世研究)
类目: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
备注:

点击查看摘要

[CV-166] X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model

链接: https://arxiv.org/abs/2510.10274
作者: Jinliang Zheng,Jianxiong Li,Zhihao Wang,Dongxiu Liu,Xirui Kang,Yuchun Feng,Yinan Zheng,Jiayin Zou,Yilun Chen,Jia Zeng,Ya-Qin Zhang,Jiangmiao Pang,Jingjing Liu,Tai Wang,Xianyuan Zhan
机构: Institute for AI Industry Research (AIR); Tsinghua University; Shanghai AI Lab; Peking University
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
备注: preprint, technical report, 33 pages

点击查看摘要

[CV-167] VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework

链接: https://arxiv.org/abs/2510.10269
作者: Donglin Huang,Yongyuan Li,Tianhang Liu,Junming Huang,Xiaoda Yang,Chi Wang,Weiwei Xu
机构: Zhejiang University (浙江大学); Image Derivative Inc (图像衍生公司)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 10 pages, 6 figures

点击查看摘要

[CV-168] Opacity-Gradient Driven Density Control for Compact and Efficient Few-Shot 3D Gaussian Splatting

链接: https://arxiv.org/abs/2510.10257
作者: Abdelrhman Elrawy,Emad A. Mohammed
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注:

点击查看摘要

[CV-169] Are Video Models Emerging as Zero-Shot Learners and Reasoners in Medical Imaging?

链接: https://arxiv.org/abs/2510.10254
作者: Yuxiang Lai,Jike Zhong,Ming Li,Yuheng Li,Xiaofeng Yang
机构: Emory University (埃默里大学); University of Southern California (南加州大学); University of Maryland (马里兰大学); Georgia Institute of Technology (佐治亚理工学院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-170] MRI Brain Tumor Detection with Computer Vision

链接: https://arxiv.org/abs/2510.10250
作者: Jack Krolik,Jake Lynn,John Henry Rudden,Dmytro Vremenko
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 12 pages, 8 figures, final project report for CS4100 (Machine Learning), Northeastern University, April 2024

点击查看摘要

[CV-171] Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images

链接: https://arxiv.org/abs/2510.10231
作者: Chuangchuang Tan,Xiang Ming,Jinglu Wang,Renshuai Tao,Bin Li,Yunchao Wei,Yao Zhao,Yan Lu
机构: Beijing Jiaotong University (北京交通大学); Microsoft Research Asia (微软亚洲研究院); Shenzhen University (深圳大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 27 pages, 7 figures

点击查看摘要

[CV-172] A Style-Based Metric for Quantifying the Synthetic-to-Real Gap in Autonomous Driving Image Datasets

链接: https://arxiv.org/abs/2510.10203
作者: Dingyi Yao,Xinyao Han,Ruibo Ming,Zhihang Song,Lihui Peng,Jianming Hu,Danya Yao,Yi Zhang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 7 pages, 4 figures

点击查看摘要

[CV-173] From Generic to Specialized: A Subspecialty Diagnostic System Powered by Self-Supervised Learning for Cervical Histopathology

链接: https://arxiv.org/abs/2510.10196
作者: Yizhi Wang,Li Chen,Qiang Huang,Tian Guan,Xi Deng,Zhiyuan Shen,Jiawen Li,Xinrui Chen,Bin Hu,Xitong Ling,Taojie Zhu,Zirui Huang,Deshui Yu,Yan Liu,Jiurun Chen,Lianghui Zhu,Qiming He,Yiqing Liu,Diwei Shi,Hanzhong Liu,Junbo Hu,Hongyi Gao,Zhen Song,Xilong Zhao,Chao He,Ming Zhao,Yonghong He
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 32 pages, 6 figures

点击查看摘要

[CV-174] B2N3D: Progressive Learning from Binary to N-ary Relationships for 3D Object Grounding

链接: https://arxiv.org/abs/2510.10194
作者: Feng Xiao,Hongbin Xu,Hai Ci,Wenxiong Kang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-175] Fairness Without Labels: Pseudo-Balancing for Bias Mitigation in Face Gender Classification ICCV2025

链接: https://arxiv.org/abs/2510.10191
作者: Haohua Dong,Ana Manzano Rodríguez,Camille Guinaudeau,Shin’ichi Satoh
机构: National Institute of Informatics (日本信息研究所); INESC-ID, Instituto Superior Técnico, University of Lisbon (葡萄牙里斯本理工学院INESC-ID研究所); University of Amsterdam (阿姆斯特丹大学); LIMSI, CNRS / Université Paris-Saclay (法国巴黎萨克雷大学/法国国家科学研究中心LIMSI实验室)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 8 pages. Accepted for publication in the ICCV 2025 Workshop Proceedings (2nd FAILED Workshop). Also available on HAL (hal-05210445v1)

点击查看摘要

[CV-176] INR-Bench: A Unified Benchmark for Implicit Neural Representations in Multi-Domain Regression and Reconstruction

链接: https://arxiv.org/abs/2510.10188
作者: Linfei Li,Fengyi Zhang,Zhong Wang,Lin Zhang,Ying Shen
机构: Tongji University (同济大学); The University of Queensland (昆士兰大学); Shanghai Jiao Tong University (上海交通大学)
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-177] Dejavu: Post-Deployment Learning for Embodied Agents via Experience Feedback

链接: https://arxiv.org/abs/2510.10181
作者: Shaokai Wu,Yanbiao Ji,Qiuchang Li,Zhiyi Zhang,Qichen He,Wenyuan Xie,Guodong Zhang,Bayram Bayramli,Yue Ding,Hongtao Lu
机构: Shanghai Jiao Tong University (上海交通大学)
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-178] CMA: Text-Conditioned Multi-granularity Alignment for Drone Cross-Modal Text-Video Retrieval

链接: https://arxiv.org/abs/2510.10180
作者: Zixu Zhao,Yang Zhan
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-179] HccePose(BF): Predicting Front Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation ICCV2025

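【速读】: This paper addresses the under-use of object surface information in existing pose estimation methods, in particular the neglected potential of the object's back surface and interior. Mainstream methods typically predict 3D coordinates only for the object's front surface, using a neural network to produce dense 3D coordinates on the 2D image and thereby establish 2D-3D correspondences, a strategy that limits further gains in pose estimation accuracy. The solution has two key parts. First, it predicts 3D coordinates for both the front and back surfaces and samples densely between them, building ultra-dense 2D-3D correspondences. Second, it proposes Hierarchical Continuous Coordinate Encoding (HCCE) to represent front and back surface coordinates more accurately and efficiently, which significantly improves pose estimation based on the Perspective-n-Point (PnP) algorithm. Experiments show that the method outperforms existing state-of-the-art (SOTA) methods on seven core BOP datasets.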

链接: https://arxiv.org/abs/2510.10177
作者: Yulin Wang,Mengting Hu,Hongli Li,Chen Luo
机构: Southeast University (东南大学); Purdue University (普渡大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: International Conference on Computer Vision, ICCV 2025 (Highlight) this https URL

点击查看摘要

Abstract:In pose estimation for seen objects, a prevalent pipeline involves using neural networks to predict dense 3D coordinates of the object surface on 2D images, which are then used to establish dense 2D-3D correspondences. However, current methods primarily focus on more efficient encoding techniques to improve the precision of predicted 3D coordinates on the object’s front surface, overlooking the potential benefits of incorporating the back surface and interior of the object. To better utilize the full surface and interior of the object, this study predicts 3D coordinates of both the object’s front and back surfaces and densely samples 3D coordinates between them. This process creates ultra-dense 2D-3D correspondences, effectively enhancing pose estimation accuracy based on the Perspective-n-Point (PnP) algorithm. Additionally, we propose Hierarchical Continuous Coordinate Encoding (HCCE) to provide a more accurate and efficient representation of front and back surface coordinates. Experimental results show that the proposed approach outperforms the existing state-of-the-art (SOTA) methods listed on the BOP website across seven classic BOP core datasets. Code is available at this https URL.
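The step of sampling 3D points between the predicted front and back surfaces can be sketched as below. The abstract does not say how the samples are placed, so evenly spaced linear interpolation is an assumption made here for illustration, and `dense_correspondences` is a hypothetical name rather than a function from the released code.

```python
def dense_correspondences(pixels, front, back, samples=5):
    """Pair each pixel with several 3D points between its front and back surface.

    pixels: list of (u, v) image coordinates.
    front, back: lists of (x, y, z) object coordinates predicted per pixel.
    samples: points per pixel, evenly spaced from front (t=0) to back (t=1).
    Assumption: linear interpolation; the paper's sampling scheme may differ.
    """
    if samples < 2:
        raise ValueError("need at least the front and back sample")
    pairs = []
    for (u, v), f, b in zip(pixels, front, back):
        for i in range(samples):
            t = i / (samples - 1)
            point = tuple((1 - t) * fc + t * bc for fc, bc in zip(f, b))
            pairs.append(((u, v), point))
    return pairs

pairs = dense_correspondences(
    pixels=[(10, 20)],
    front=[(0.0, 0.0, 0.0)],
    back=[(0.0, 0.0, 1.0)],
    samples=3,
)
for pixel, point in pairs:
    print(pixel, point)
```

Each pixel contributes `samples` 2D-3D pairs instead of one, and the resulting list is what would be handed to a PnP solver.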

[CV-180] ViConEx-Med: Visual Concept Explainability via Multi-Concept Token Transformer for Medical Image Analysis

链接: https://arxiv.org/abs/2510.10174
作者: Cristiano Patrício,Luís F. Teixeira,João C. Neves
机构: Universidade da Beira Interior(贝拉内陆大学); NOVA LINCS; Faculdade de Engenharia da Universidade do Porto(波尔图大学工程学院); INESC TEC
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: This work has been submitted to the IEEE for possible publication

点击查看摘要

[CV-181] SparseUWSeg: Active Sparse Point-Label Augmentation for Underwater Semantic Segmentation

链接: https://arxiv.org/abs/2510.10163
作者: César Borja,Carlos Plou,Rubén Martinez-Cantín,Ana C. Murillo
机构: DIIS-I3A, University of Zaragoza, Spain(西班牙萨拉戈萨大学DIIS-I3A研究所)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-182] SaFiRe: Saccade-Fixation Reiteration with Mamba for Referring Image Segmentation NEURIPS2025

链接: https://arxiv.org/abs/2510.10160
作者: Zhenjie Mao,Yuhuan Yang,Chaofan Ma,Dongsheng Jiang,Jiangchao Yao,Ya Zhang,Yanfeng Wang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: NeurIPS 2025

点击查看摘要

[CV-183] ReMix: Towards a Unified View of Consistent Character Generation and Editing

链接: https://arxiv.org/abs/2510.10156
作者: Benjia Zhou,Bin Fu,Pei Cheng,Yanru Wang,Jiayuan Fan,Tao Chen
机构: Tencent GYLab(腾讯GYLab); Fudan University(复旦大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-184] Stroke Locus Net: Occluded Vessel Localization from MRI Modalities

链接: https://arxiv.org/abs/2510.10155
作者: Mohamed Hamad,Muhammad Khan,Tamer Khattab,Mohamed Mabrok
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: This version of the paper was accepted in the ADMA 2025 conference in Kyoto, Japan

点击查看摘要

[CV-185] Color3D: Controllable and Consistent 3D Colorization with Personalized Colorizer

链接: https://arxiv.org/abs/2510.10152
作者: Yecong Wan,Mingwen Shao,Renlong Wu,Wangmeng Zuo
机构: Harbin Institute of Technology (哈尔滨工业大学); Shenzhen University of Advanced Technology (深圳先进技术研究院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Project Page this https URL

点击查看摘要

[CV-186] YOLOv11-Litchi: Efficient Litchi Fruit Detection based on UAV-Captured Agricultural Imagery in Complex Orchard Environments

链接: https://arxiv.org/abs/2510.10141
作者: Hongxing Peng,Haopei Xie,Weijia Lia,Huanai Liuc,Ximing Li
机构: South China Agricultural University (华南农业大学); South China University of Technology (华南理工大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
备注:

点击查看摘要

[CV-187] DeepFusionNet: Autoencoder-Based Low-Light Image Enhancement and Super-Resolution

链接: https://arxiv.org/abs/2510.10122
作者: Halil Hüseyin Çalışkan,Talha Koruk
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: 12 pages, 11 figures

点击查看摘要

[CV-188] Multi Class Parkinsons Disease Detection Based on Finger Tapping Using Attention-Enhanced CNN BiLSTM

链接: https://arxiv.org/abs/2510.10121
作者: Abu Saleh Musa Miah,Najmul Hassan,Md Maruf Al Hossain,Yuichi Okuyama,Jungpil Shin
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-189] ImmerIris: A Large-Scale Dataset and Benchmark for Immersive Iris Recognition in Open Scenes

链接: https://arxiv.org/abs/2510.10113
作者: Yuxi Mi,Qiuyang Yuan,Zhizhou Zhong,Xuan Zhao,Jiaogen Zhou,Fubao Zhu,Jihong Guan,Shuigeng Zhou
机构: Fudan University (复旦大学); Huaiyin Normal University (淮阴师范学院); Zhengzhou University of Light Industry (郑州轻工业大学); Tongji University (同济大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-190] Training-Free In-Context Forensic Chain for Image Manipulation Detection and Localization

链接: https://arxiv.org/abs/2510.10111
作者: Rui Chen,Bin Liu,Changtao Miao,Xinghao Wang,Yi Li,Tao Gong,Qi Chu,Nenghai Yu
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
备注:

点击查看摘要

[CV-191] Uncertainty-Aware Post-Detection Framework for Enhanced Fire and Smoke Detection in Compact Deep Learning Models

链接: https://arxiv.org/abs/2510.10108
作者: Aniruddha Srinivas Joshi,Godwyn James William,Shreyas Srinivas Joshi
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
备注: Accepted and to be presented at the International Conference on Smart Multimedia (ICSM 2025) - this https URL

点击查看摘要

[CV-192] Answer-Consistent Chain-of-thought Reinforcement Learning For Multi-modal Large Language Models

链接: https://arxiv.org/abs/2510.10104
作者: Minbin Huang,Runhui Huang,Chuanyang Zheng,Jingyao Li,Guoxuan Chen,Han Shi,Hong Cheng
机构: The Chinese University of Hong Kong (香港中文大学); The University of Hong Kong (香港大学); Huawei Noah’s Ark Lab (华为诺亚方舟实验室)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-193] Cooperative Pseudo Labeling for Unsupervised Federated Classification ICCV2025

链接: https://arxiv.org/abs/2510.10100
作者: Kuangpu Guo,Lijun Sheng,Yongcan Yu,Jian Liang,Zilei Wang,Ran He
机构: University of Science and Technology of China (中国科学技术大学); NLPR & MAIS, Institute of Automation, Chinese Academy of Sciences (中科院自动化所NLPR与MAIS); School of Artificial Intelligence, University of Chinese Academy of Sciences (中国科学院大学人工智能学院)
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注: Accepted by ICCV 2025

点击查看摘要

[CV-194] Gesplat: Robust Pose-Free 3D Reconstruction via Geometry-Guided Gaussian Splatting

链接: https://arxiv.org/abs/2510.10097
作者: Jiahui Lu,Haihong Xiao,Xueyan Zhao,Wenxiong Kang
机构: South China University of Technology (华南理工大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-195] Tracking the Spatiotemporal Evolution of Landslide Scars Using a Vision Foundation Model: A Novel and Universal Framework

链接: https://arxiv.org/abs/2510.10084
作者: Meijun Zhou,Gang Mei,Zhengjing Ma,Nengxiong Xu,Jianbing Peng
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-196] SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents

链接: https://arxiv.org/abs/2510.10073
作者: Zonghao Ying,Yangguang Shao,Jianle Gan,Gan Xu,Junjie Shen,Wenxin Zhang,Quanchen Zou,Junzheng Shi,Zhenfei Yin,Mingchuan Zhang,Aishan Liu,Xianglong Liu
机构: SKLCCSE, Beihang University(北京航空航天大学); Institute of Information Engineering, Chinese Academy of Sciences(中国科学院信息工程研究所); China University of Petroleum (East China)(中国石油大学(华东)); Zhejiang University of Technology(浙江工业大学); University of Chinese Academy of Science(中国科学院大学); 360 AI Security Lab(360人工智能安全实验室); The University of Sydney(悉尼大学); Henan University of Science and Technology(河南科技大学); Zhongguancun Laboratory(中关村实验室); Institute of Dataspace(数据空间研究所)
类目: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-197] Probabilistic Hyper-Graphs using Multiple Randomly Masked Autoencoders for Semi-supervised Multi-modal Multi-task Learning

链接: https://arxiv.org/abs/2510.10068
作者: Pîrvu Mihai-Cristian,Leordeanu Marius
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-198] Collaborative Learning of Semantic-Aware Feature Learning and Label Recovery for Multi-Label Image Recognition with Incomplete Labels

链接: https://arxiv.org/abs/2510.10055
作者: Zhi-Fen He,Ren-Dong Xie,Bo Li,Bin Liu,Jin-Yan Hu
机构: Nanchang Hangkong University (南昌航空大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-199] DREAM: A Benchmark Study for Deepfake REalism AssessMent

链接: https://arxiv.org/abs/2510.10053
作者: Bo Peng,Zichuan Wang,Sheng Yu,Xiaochuan Jin,Wei Wang,Jing Dong
机构: New Laboratory of Pattern Recognition (模式识别新实验室); Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所); School of Artificial Intelligence, University of Chinese Academy of Sciences (中国科学院大学人工智能学院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-200] Think Twice to See More: Iterative Visual Reasoning in Medical VLMs

链接: https://arxiv.org/abs/2510.10052
作者: Kaitao Chen,Shaohao Rui,Yankai Jiang,Jiamin Wu,Qihao Zheng,Chunfeng Song,Xiaosong Wang,Mu Zhou,Mianxin Liu
机构: Fudan University (复旦大学); Shanghai AI Laboratory; Rutgers University (罗格斯大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: 25 pages, 21 figures

点击查看摘要

[CV-201] Complementary and Contrastive Learning for Audio-Visual Segmentation

链接: https://arxiv.org/abs/2510.10051
作者: Sitong Gong,Yunzhi Zhuge,Lu Zhang,Pingping Zhang,Huchuan Lu
机构: Dalian University of Technology (大连理工大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted to IEEE Transactions on Multimedia

点击查看摘要

[CV-202] P-4DGS: Predictive 4D Gaussian Splatting with 90× Compression

链接: https://arxiv.org/abs/2510.10030
作者: Henan Wang,Hanxin Zhu,Xinliang Gong,Tianyu He,Xin Li,Zhibo Chen
机构: University of Science and Technology of China (中国科学技术大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-203] Q-Adapter: Visual Query Adapter for Extracting Textually-related Features in Video Captioning

链接: https://arxiv.org/abs/2510.10022
作者: Junan Chen,Trung Thanh Nguyen,Takahiro Komamizu,Ichiro Ide
机构: Nagoya University (名古屋大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: ACM Multimedia Asia 2025

点击查看摘要

[CV-204] MIMO: A medical vision language model with visual referring multimodal input and pixel grounding multimodal output CVPR2025

链接: https://arxiv.org/abs/2510.10011
作者: Yanyuan Chen,Dexuan Xu,Yu Huang,Songkun Zhan,Hanpin Wang,Dongxue Chen,Xueping Wang,Meikang Qiu,Hang Li
机构: Peking University (北京大学); Augusta University; Peking University First Hospital (北京大学第一医院); Peking University Sixth Hospital (北京大学第六医院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: CVPR 2025

点击查看摘要

[CV-205] CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting

链接: https://arxiv.org/abs/2510.09997
作者: Zhigang Cheng,Mingchao Sun,Yu Liu,Zengye Ge,Luyang Tang,Mu Xu,Yangyan Li,Peng Pan
机构: Tsinghua University (清华大学); AMAP; Ant Group (蚂蚁集团)
类目: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-206] BurstDeflicker: A Benchmark Dataset for Flicker Removal in Dynamic Scenes NEURIPS2025

链接: https://arxiv.org/abs/2510.09996
作者: Lishen Qu,Zhihao Liu,Shihao Zhou,Yaqi Luo,Jie Liang,Hui Zeng,Lei Zhang,Jufeng Yang
机构: Nankai International Advanced Research Institute (南开国际先进研究院); Peng Cheng Laboratory (鹏城实验室); College of Computer Science, Nankai University (南开大学计算机学院); The Hong Kong Polytechnic University (香港理工大学); OPPO Research Institute (OPPO研究院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by NeurIPS 2025

点击查看摘要

[CV-207] FlareX: A Physics-Informed Dataset for Lens Flare Removal via 2D Synthesis and 3D Rendering NEURIPS2025

链接: https://arxiv.org/abs/2510.09995
作者: Lishen Qu,Zhihao Liu,Jinshan Pan,Shihao Zhou,Jinglei Shi,Duosheng Chen,Jufeng Yang
机构: Nankai International Advanced Research Institute (SHENZHEN·FUTIAN); Peng Cheng Laboratory; College of Computer Science, Nankai University; Nanjing University of Science and Technology; Key Lab of SCCI, Dalian University of Technology
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by NeurIPS 2025

点击查看摘要

[CV-208] Scaling Traffic Insights with AI and Language Model-Powered Camera Systems for Data-Driven Transportation Decision Making

链接: https://arxiv.org/abs/2510.09981
作者: Fan Zuo,Donglin Zhou,Jingqin Gao,Kaan Ozbay
机构: New York University (纽约大学); C2SMART Center (C2SMART 中心)
类目: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
备注:

点击查看摘要

[CV-209] J-RAS: Enhancing Medical Image Segmentation via Retrieval-Augmented Joint Training

链接: https://arxiv.org/abs/2510.09953
作者: Salma J. Ahmed,Emad A. Mohammed,Azam Asilian Bidgoli
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-210] A Multi-Strategy Framework for Enhancing Shatian Pomelo Detection in Real-World Orchards

链接: https://arxiv.org/abs/2510.09948
作者: Pan Wang,Yihao Hu,Xiaodong Bai,Aiping Yang,Xiangxiang Li,Meiping Ding,Jianguo Yao
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-211] Explainable Human-in-the-Loop Segmentation via Critic Feedback Signals

链接: https://arxiv.org/abs/2510.09945
作者: Pouya Shaeri,Ryan T. Woo,Yasaman Mohammadpour,Ariane Middel
机构: Arizona State University (亚利桑那州立大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
备注: Submitted to a computer vision conference (under review)

点击查看摘要

[CV-212] Semi-disentangled spatiotemporal implicit neural representations of longitudinal neuroimaging data for trajectory classification MICCAI2025

链接: https://arxiv.org/abs/2510.09936
作者: Agampreet Aulakh,Nils D. Forkert,Matthias Wilms
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted at the MICCAI 2025 Learning with Longitudinal Medical Images and Data Workshop

点击查看摘要

[CV-213] Denoising Diffusion as a New Framework for Underwater Images

链接: https://arxiv.org/abs/2510.09934
作者: Nilesh Jain,Elie Alhajjar
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[CV-214] HeadsUp! High-Fidelity Portrait Image Super-Resolution

链接: https://arxiv.org/abs/2510.09924
作者: Renjie Li,Zihao Zhu,Xiaoyu Wang,Zhengzhong Tu
机构: Texas A&M University (德州农工大学); Topaz Labs
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-215] SpectralCA: Bi-Directional Cross-Attention for Next-Generation UAV Hyperspectral Vision

链接: https://arxiv.org/abs/2510.09912
作者: D.V. Brovko
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: The work consists of three chapters, includes 12 figures, 4 tables, 31 references, and 1 appendix. A version of this work has been accepted for presentation at the 2025 IEEE 8th International Conference on Methods and Systems of Navigation and Motion Control

点击查看摘要

[CV-216] An uncertainty-aware framework for data-efficient multi-view animal pose estimation

链接: https://arxiv.org/abs/2510.09903
作者: Lenny Aharon,Keemin Lee,Karan Sikka,Selmaan Chettih,Cole Hurwitz,Liam Paninski,Matthew R Whiteway
机构: Columbia University (哥伦比亚大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
备注:

点击查看摘要

[CV-217] LTGS: Long-Term Gaussian Scene Chronology From Sparse View Updates

链接: https://arxiv.org/abs/2510.09881
作者: Minkwan Kim,Seungmin Lee,Junho Kim,Young Min Kim
机构: Seoul National University (首尔国立大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-218] Geometry-Aware Scene Configurations for Novel View Synthesis

链接: https://arxiv.org/abs/2510.09880
作者: Minkwan Kim,Changwoon Choi,Young Min Kim
机构: Seoul National University (首尔国立大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-219] CHUG: Crowdsourced User-Generated HDR Video Quality Dataset

链接: https://arxiv.org/abs/2510.09879
作者: Shreshth Saini,Alan C. Bovik,Neil Birkbeck,Yilin Wang,Balu Adsumilli
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[CV-220] Fast Self-Supervised depth and mask aware Association for Multi-Object Tracking

链接: https://arxiv.org/abs/2510.09878
作者: Milad Khanchi,Maria Amer,Charalambos Poullis
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-221] Cluster-Aware Prompt Ensemble Learning for Few-Shot Vision-Language Model Adaptation

链接: https://arxiv.org/abs/2510.09867
作者: Zhi Chen,Xin Yu,Xiaohui Tao,Yan Li,Zi Huang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted to the journal Pattern Recognition in 2025

点击查看摘要

[CV-222] MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest KDD2025

链接: https://arxiv.org/abs/2510.09857
作者: Xiao Yang,Peifeng Yin,Abe Engle,Jinfeng Zhuang,Ling Leng
机构: Pinterest Inc.(Pinterest公司)
类目: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV)
备注: AdKDD 2025

点击查看摘要

[CV-223] Cell Instance Segmentation: The Devil Is in the Boundaries

链接: https://arxiv.org/abs/2510.09848
作者: Peixian Liang,Yifan Ding,Yizhe Zhang,Jianxu Chen,Hao Zheng,Hongxiao Wang,Yejia Zhang,Guangyu Meng,Tim Weninger,Michael Niemier,X. Sharon Hu,Danny Z Chen
机构: University of Notre Dame (圣母大学); Leibniz-Institut für Analytische Wissenschaften–ISAS–e.V. (莱布尼茨分析科学研究所); University of Louisiana at Lafayette (路易斯安那大学拉斐特分校)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted at IEEE Transactions On Medical Imaging (TMI)

点击查看摘要

[CV-224] Harnessing Self-Supervised Deep Learning and Geostationary Remote Sensing for Advancing Wildfire and Associated Air Quality Monitoring: Improved Smoke and Fire Front Masking using GOES and TEMPO Radiance Data

链接: https://arxiv.org/abs/2510.09845
作者: Nicholas LaHaye,Thilanka Munashinge,Hugo Lee,Xiaohua Pan,Gonzalo Gonzalez Abad,Hazem Mahmoud,Jennifer Wei
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
备注: this https URL

点击查看摘要

[CV-225] Exploration of Incremental Synthetic Non-Morphed Images for Single Morphing Attack Detection NEURIPS2025

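【速读】: This paper addresses the limited size of bona fide face image datasets for Single-Morphing Attack Detection (S-MAD), a constraint that stems mainly from privacy protection. To improve detection performance, the study augments existing datasets with synthetic face data and validates the approach with multiple morphing tools and cross-dataset evaluation schemes. The key to the solution is to introduce a controlled number of synthetic images, or to add bona fide images gradually during training, which improves generalization; indiscriminate use of synthetic data can instead degrade performance. Experiments show that training on synthetic data alone yields the highest Equal Error Rate (EER), that is, the worst result, so relying solely on synthetic data is not advisable in operational settings. This underlines the importance of combining bona fide and synthetic data sensibly.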

链接: https://arxiv.org/abs/2510.09836
作者: David Benavente-Rios,Juan Ruiz Rodriguez,Gustavo Gatica
机构: Universidad de Santiago (圣地亚哥大学); Universidad de Playa Ancha (普拉亚安恰大学); Universidad Andres Bello (安德烈斯贝洛大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
备注: Workshop paper accepted NeurIPS 2025

点击查看摘要

Abstract:This paper investigates the use of synthetic face data to enhance Single-Morphing Attack Detection (S-MAD), addressing the limited availability of large-scale datasets of bona fide images due to privacy concerns. Various morphing tools and cross-dataset evaluation schemes were utilized to conduct this study. An incremental testing protocol was implemented to assess the generalization capabilities as more and more synthetic images were added. The results of the experiments show that generalization can be improved by carefully incorporating a controlled number of synthetic images into existing datasets or by gradually adding bona fide images during training. However, indiscriminate use of synthetic data can lead to sub-optimal performance. Moreover, the use of only synthetic data (morphed and non-morphed images) achieves the highest Equal Error Rate (EER), which means that in operational scenarios the best option is not to rely only on synthetic data for S-MAD.
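Since the abstract's conclusion hinges on the Equal Error Rate, a short sketch of how an EER can be computed may help. The convention assumed here is that a higher score means "more likely bona fide"; the paper's scoring convention and threshold search are not given in the abstract, and `equal_error_rate` is a hypothetical helper.

```python
def equal_error_rate(bona_fide_scores, attack_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A sample is accepted as bona fide when its score >= threshold.
    FRR: fraction of bona fide samples rejected.
    FAR: fraction of attack (morphed) samples accepted.
    The EER is read off where FAR and FRR are closest to each other.
    """
    thresholds = sorted(set(bona_fide_scores) | set(attack_scores))
    best_gap, eer = None, None
    for th in thresholds:
        frr = sum(s < th for s in bona_fide_scores) / len(bona_fide_scores)
        far = sum(s >= th for s in attack_scores) / len(attack_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

print(equal_error_rate([0.9, 0.8], [0.1, 0.2]))  # well-separated scores
print(equal_error_rate([0.9, 0.4], [0.6, 0.1]))  # overlapping scores
```

A lower EER means better separation between bona fide and morphed images, which is why the highest EER for synthetic-only training reads as the weakest configuration.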

[CV-226] Post Processing of image segmentation using Conditional Random Fields

Link: https://arxiv.org/abs/2510.09833
Authors: Aashish Dhawan,Pankaj Bodani,Vishal Garg
Institutions: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-227] Decomposer Networks: Deep Component Analysis and Synthesis

Link: https://arxiv.org/abs/2510.09825
Authors: Mohsen Joneidi
Institutions: Unknown
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE)
Comments: 13 pages, 4 figures

[CV-228] Cross-Sensor Touch Generation

Link: https://arxiv.org/abs/2510.09817
Authors: Samanta Rodriguez,Yiming Dou,Miquel Oller,Andrew Owens,Nima Fazeli
Institutions: University of Michigan; Cornell University
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Comments: CoRL 2025

[CV-229] Towards Understanding Ambiguity Resolution in Multimodal Inference of Meaning

Link: https://arxiv.org/abs/2510.09815
Authors: Yufei Wang,Adriana Kovashka,Loretta Fernández,Marc N. Coutanche,Seth Wiener
Institutions: University of Pittsburgh; Carnegie Mellon University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Accepted to International Conference on Development and Learning (ICDL) 2025

[CV-230] Causality ≠ Decodability and Vice Versa: Lessons from Interpreting Counting ViTs

Link: https://arxiv.org/abs/2510.09794
Authors: Lianghuan Huang,Yingshan Chang
Institutions: University of Pennsylvania; Carnegie Mellon University
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-231] Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping

Link: https://arxiv.org/abs/2510.09741
Authors: Dwip Dalal,Gautam Vashishtha,Utkarsh Mishra,Jeonghwan Kim,Madhav Kanda,Hyeonjeong Ha,Svetlana Lazebnik,Heng Ji,Unnat Jain
Institutions: University of Illinois Urbana–Champaign; Skan AI; Texas A&M University; University of California, Irvine
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments:

[CV-232] Reliable Active Learning from Unreliable Labels via Neural Collapse Geometry NEURIPS2025

Link: https://arxiv.org/abs/2510.09740
Authors: Atharv Goel,Sharat Agarwal,Saket Anand,Chetan Arora
Institutions: IIIT Delhi; IIT Delhi
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted to NeurIPS 2025 Workshop on Reliable ML from Unreliable Data

[CV-233] Multi Camera Connected Vision System with Multi View Analytics: A Comprehensive Survey

Link: https://arxiv.org/abs/2510.09731
Authors: Muhammad Munsif,Waqas Ahmad,Amjid Ali,Mohib Ullah,Adnan Hussain,Sung Wook Baik
Institutions: Sejong University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-234] Adaptive Fusion Network with Temporal-Ranked and Motion-Intensity Dynamic Images for Micro-expression Recognition

Link: https://arxiv.org/abs/2510.09730
Authors: Thi Bich Phuong Man,Luu Tu Nguyen,Vu Tram Anh Khuong,Thanh Ha Le,Thi Duyen Ngo
Institutions: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-235] Deep Neural Networks Inspired by Differential Equations

Link: https://arxiv.org/abs/2510.09685
Authors: Yongshuai Liu,Lianfang Wang,Kuilin Qin,Qinghua Zhang,Faqiang Wang,Li Cui,Jun Liu,Yuping Duan,Tieyong Zeng
Institutions: Beijing Normal University; The Chinese University of Hong Kong
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)
Comments: 35 pages, 3 figures

[CV-236] NNDM: NN_UNet Diffusion Model for Brain Tumor Segmentation

Link: https://arxiv.org/abs/2510.09681
Authors: Sashank Makanaboyina
Institutions: DePaul University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-237] Knowledge-Aware Mamba for Joint Change Detection and Classification from MODIS Times Series

Link: https://arxiv.org/abs/2510.09679
Authors: Zhengsen Xu,Yimin Zhu,Zack Dewis,Mabel Heffring,Motasem Alkayid,Saeid Taleghanidoozdoozan,Lincoln Linlin Xu
Institutions: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-238] OmniSAT: Compact Action Token Faster Auto Regression

Link: https://arxiv.org/abs/2510.09667
Authors: Huaihai Lyu,Chaofan Chen,Senwei Xie,Pengwei Wang,Xiansheng Chen,Shanghang Zhang,Changsheng Xu
Institutions: Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Institute of Computation, Chinese Academy of Sciences; Beijing Academy of Artificial Intelligence; Peking University; Peng Cheng Laboratory
Categories: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Comments:

[CV-239] Semantic-Cohesive Knowledge Distillation for Deep Cross-modal Hashing

Link: https://arxiv.org/abs/2510.09664
Authors: Changchang Sun,Vickie Chen,Yan Yan
Institutions: University of Illinois Chicago; Rensselaer Polytechnic Institute
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Comments:

[CV-240] Learning What Matters: Steering Diffusion via Spectrally Anisotropic Forward Noise

Link: https://arxiv.org/abs/2510.09660
Authors: Luca Scimeca,Thomas Jiralerspong,Berton Earnshaw,Jason Hartford,Yoshua Bengio
Institutions: Mila - Quebec AI Institute; Université de Montréal; Valence Labs; Recursion
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-241] Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models

Link: https://arxiv.org/abs/2510.09658
Authors: Filippo Rinaldi,Aniello Panariello,Giacomo Salici,Fengyuan Liu,Marco Ciccone,Angelo Porrello,Simone Calderara
Institutions: University of Modena and Reggio Emilia; Vector Institute; University of Toronto
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-242] TreeNet: Layered Decision Ensembles

Link: https://arxiv.org/abs/2510.09654
Authors: Zeshan Khan
Institutions: National University of Computer and Emerging Sciences
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-243] Ultralytics YOLO Evolution: An Overview of YOLO26, YOLO11, YOLOv8 and YOLOv5 Object Detectors for Computer Vision and Pattern Recognition

Link: https://arxiv.org/abs/2510.09653
Authors: Ranjan Sapkota,Manoj Karkee
Institutions: Cornell University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 16 pages, 5 tables, 5 figures

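[CV-244] TinyViT-Batten: Few-Shot Vision Transformer with Explainable Attention for Early Batten-Disease Detection on Pediatric MRI

Quick read: This paper targets Batten disease (neuronal ceroid lipofuscinosis), a rare pediatric neurodegenerative disorder whose early imaging signs are subtle and easily missed, with the goal of early, accurate detection from pediatric brain MRI. The key to the solution is a few-shot Vision Transformer framework, TinyViT-Batten: a large teacher ViT is distilled into a lightweight TinyViT with only about 5 million parameters, which is then fine-tuned with metric-based few-shot learning (prototypical loss with 5-shot episodes). With only a small number of labeled cases it reaches high accuracy (approximately 91%) and an AUC of at least 0.95, uses Gradient-weighted Class Activation Mapping (Grad-CAM) for explainability, and outperforms the 3D-ResNet and Swin-Tiny baselines.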

Link: https://arxiv.org/abs/2510.09649
Authors: Khartik Uppalapati,Bora Yimenicioglu,Shakeel Abdulkareem,Adan Eftekhari,Bhavya Uppalapati,Viraj Kamath
Institutions: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 8 pages, 3 figures, 1 table. Submitted to International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES)

Abstract: Batten disease (neuronal ceroid lipofuscinosis) is a rare pediatric neurodegenerative disorder whose early MRI signs are subtle and often missed. We propose TinyViT-Batten, a few-shot Vision Transformer (ViT) framework to detect early Batten disease from pediatric brain MRI with limited training cases. We distill a large teacher ViT into a 5M-parameter TinyViT and fine-tune it using metric-based few-shot learning (prototypical loss with 5-shot episodes). Our model achieves high accuracy (approximately 91%) and area under ROC of at least 0.95 on a multi-site dataset of 79 genetically confirmed Batten-disease MRIs (27 CLN3 from the Hochstein natural-history study, 32 CLN2 from an international longitudinal cohort, 12 early-manifestation CLN2 cases reported by Cokal et al., and 8 public Radiopaedia scans) together with 90 age-matched controls, outperforming 3D-ResNet and Swin-Tiny baselines. We further integrate Gradient-weighted Class Activation Mapping (Grad-CAM) to highlight disease-relevant brain regions, enabling explainable predictions. The model’s small size and strong performance (sensitivity greater than 90%, specificity approximately 90%) demonstrate a practical AI solution for early Batten disease detection.
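
To make "prototypical loss with 5-shot episodes" concrete, the following is a minimal NumPy sketch of one episode in the style of prototypical networks. It is not the authors' implementation: the 8-dimensional random vectors stand in for TinyViT embeddings, the two classes stand in for control and Batten disease, and the names `prototypical_loss`, `support`, and `query` are hypothetical.

```python
import numpy as np

def prototypical_loss(support, support_labels, query, query_labels):
    """Cross-entropy over negative squared distances to class prototypes."""
    support = np.asarray(support, dtype=float)
    query = np.asarray(query, dtype=float)
    support_labels = np.asarray(support_labels)
    query_labels = np.asarray(query_labels)
    classes = np.unique(support_labels)
    # One prototype per class: the mean of that class's support embeddings.
    prototypes = np.stack(
        [support[support_labels == c].mean(axis=0) for c in classes]
    )
    # Squared Euclidean distances, shape (n_query, n_classes).
    diffs = query[:, None, :] - prototypes[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=-1)
    # Closer prototypes get higher logits; use a numerically stable log-softmax.
    logits = -sq_dist
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    target_idx = np.searchsorted(classes, query_labels)
    loss = -log_probs[np.arange(len(query)), target_idx].mean()
    accuracy = (log_probs.argmax(axis=1) == target_idx).mean()
    return float(loss), float(accuracy)

# A 2-way, 5-shot episode with synthetic embeddings (illustrative only).
rng = np.random.default_rng(0)
dim, shots, queries = 8, 5, 3
centers = np.array([np.zeros(dim), np.full(dim, 5.0)])  # class 0, class 1
support = np.concatenate([c + 0.1 * rng.standard_normal((shots, dim)) for c in centers])
support_labels = np.repeat([0, 1], shots)
query = np.concatenate([c + 0.1 * rng.standard_normal((queries, dim)) for c in centers])
query_labels = np.repeat([0, 1], queries)

episode_loss, episode_acc = prototypical_loss(support, support_labels, query, query_labels)
print(f"loss={episode_loss:.4f} accuracy={episode_acc:.2f}")
```

In training, this loss would be backpropagated through the embedding network over many sampled episodes; the sketch only evaluates it on fixed vectors.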

[CV-245] Generalisation of automatic tumour segmentation in histopathological whole-slide images across multiple cancer types

Link: https://arxiv.org/abs/2510.11182
Authors: Ole-Johan Skrede,Manohar Pradhan,Maria Xepapadakis Isaksen,Tarjei Sveinsgjerd Hveem,Ljiljana Vlatkovic,Arild Nesbakken,Kristina Lindemann,Gunnar B Kristensen,Jenneke Kasius,Alain G Zeimet,Odd Terje Brustugun,Lill-Tove Rasmussen Busund,Elin H Richardsen,Erik Skaaheim Haug,Bjørn Brennhovd,Emma Rewcastle,Melinda Lillesand,Vebjørn Kvikstad,Emiel Janssen,David J Kerr,Knut Liestøl,Fritz Albregtsen,Andreas Kleppe
Institutions: Oslo University Hospital; University of Oslo; Amsterdam University Medical Centres; Innsbruck Medical University; Vestre Viken Hospital Trust; UiT The Arctic University of Norway; University of Stavanger; University of Oxford; Vestfold Hospital Trust; Stavanger University Hospital
Categories: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-246] JND-Guided Light-Weight Neural Pre-Filter for Perceptual Image Coding ISCAS

Link: https://arxiv.org/abs/2510.10648
Authors: Chenlong He,Zijing Dong,Min Li,Zhijian Hao,Leilei Huang,Xiaoyang Zeng,Yibo Fan
Institutions: Fudan University; Xidian University; East China Normal University
Categories: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Comments: 5 pages, 4 figures. Submitted to the IEEE International Symposium on Circuits and Systems (ISCAS) 2026

[CV-247] UltraScatter: Ray-Based Simulation of Ultrasound Scattering

Link: https://arxiv.org/abs/2510.10612
Authors: Felix Duelmer,Mohammad Farid Azampour,Nassir Navab
Institutions: Unknown
Categories: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted at IEEE IUS 2025

[CV-248] Towards Efficient 3D Gaussian Human Avatar Compression: A Prior-Guided Framework

Link: https://arxiv.org/abs/2510.10492
Authors: Shanzhi Yin,Bolin Chen,Xinju Wu,Ru-Ling Liao,Jie Chen,Shiqi Wang,Yan Ye
Institutions: City University of Hong Kong; DAMO Academy, Alibaba Group; HuPan Laboratory; Fudan University
Categories: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Comments: 10 pages, 4 figures

[CV-249] Enabling High-Quality In-the-Wild Imaging from Severely Aberrated Metalens Bursts

Link: https://arxiv.org/abs/2510.10083
Authors: Debabrata Mandal,Zhihan Peng,Yujie Wang,Praneeth Chakravarthula
Institutions: UNC Chapel Hill
Categories: Optics (physics.optics); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-250] Generative Latent Video Compression

Link: https://arxiv.org/abs/2510.09987
Authors: Zongyu Guo,Zhaoyang Jia,Jiahao Li,Xiaoyi Zhang,Bin Li,Yan Lu
Institutions: Unknown
Categories: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Comments: Preprint. Supplementary material on OpenReview

Artificial Intelligence

[AI-0] Operand Quant: A Single-Agent Architecture for Autonomous Machine Learning Engineering

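Quick read: This paper addresses the inefficiency and coordination complexity of conventional multi-agent frameworks for autonomous machine learning engineering (MLE). Existing approaches typically divide the work among several agents, which adds system overhead and communication latency and makes unified scheduling difficult. The proposed Operand Quant architecture consolidates the full MLE lifecycle (exploration, modeling, experimentation, and deployment) in a single context-aware agent that operates autonomously in a linear, non-blocking fashion. Its key idea is to run this single agent inside a controlled integrated development environment (IDE), which achieves high performance without multi-agent orchestration: on MLE-Benchmark (2025) it reaches an overall medal rate of 0.3956 ± 0.0565, the best result reported among the evaluated systems.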

Link: https://arxiv.org/abs/2510.11694
Authors: Arjun Sahney,Ram Gorthi,Cezary Łastowski,Javier Vega
Institutions: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 8 pages. No figures. Evaluated on MLE-Benchmark 2025

Abstract:We present Operand Quant, a single-agent, IDE-based architecture for autonomous machine learning engineering (MLE). Operand Quant departs from conventional multi-agent orchestration frameworks by consolidating all MLE lifecycle stages – exploration, modeling, experimentation, and deployment – within a single, context-aware agent. On the MLE-Benchmark (2025), Operand Quant achieved a new state-of-the-art (SOTA) result, with an overall medal rate of 0.3956 +/- 0.0565 across 75 problems – the highest recorded performance among all evaluated systems to date. The architecture demonstrates that a linear, non-blocking agent, operating autonomously within a controlled IDE environment, can outperform multi-agent and orchestrated systems under identical constraints.
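
The ± figure can be sanity-checked with simple arithmetic. The abstract does not say how the uncertainty was computed; the sketch below assumes, purely for illustration, that it is a binomial standard error over the 75 problems. The function name `binomial_standard_error` is hypothetical.

```python
import math

def binomial_standard_error(rate, n):
    """Standard error of a proportion `rate` estimated from `n` trials."""
    return math.sqrt(rate * (1.0 - rate) / n)

medal_rate = 0.3956   # overall medal rate quoted in the abstract
n_problems = 75       # number of problems quoted in the abstract
se = binomial_standard_error(medal_rate, n_problems)
print(f"{medal_rate:.4f} +/- {se:.4f}")  # prints "0.3956 +/- 0.0565"
```

Under this assumption the result matches the reported 0.0565 to four decimals. That agreement is an observation, not a statement from the paper, and the authors may have aggregated over runs differently.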

[AI-1] Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation

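Quick read: This paper addresses the difficulty of transferring reinforcement learning (RL) policies from simulation to the real world, especially for tasks that demand precise dynamics. Conventional approaches rely on domain randomization, which copes poorly with performance loss caused by mismatched physical parameters. The proposed Phys2Real framework rests on three core components: (1) high-fidelity geometric reconstruction with 3D Gaussian splatting; (2) prior distributions over physical parameters inferred by a vision-language model (VLM); and (3) online estimation of physical parameters from interaction data, fused with the VLM predictions through ensemble-based uncertainty quantification so that the policy is conditioned on, and keeps refining, its physical parameter estimates. Experiments on T-block and hammer pushing tasks show clear gains over the baseline and confirm that combining VLM and interaction information matters.

Link: https://arxiv.org/abs/2510.11689
Authors: Maggie Wang, Stephen Tian, Aiden Swann, Ola Shorinwa, Jiajun Wu, Mac Schwager
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: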

Abstract:Learning robotic manipulation policies directly in the real world can be expensive and time-consuming. While reinforcement learning (RL) policies trained in simulation present a scalable alternative, effective sim-to-real transfer remains challenging, particularly for tasks that require precise dynamics. To address this, we propose Phys2Real, a real-to-sim-to-real RL pipeline that combines vision-language model (VLM)-inferred physical parameter estimates with interactive adaptation through uncertainty-aware fusion. Our approach consists of three core components: (1) high-fidelity geometric reconstruction with 3D Gaussian splatting, (2) VLM-inferred prior distributions over physical parameters, and (3) online physical parameter estimation from interaction data. Phys2Real conditions policies on interpretable physical parameters, refining VLM predictions with online estimates via ensemble-based uncertainty quantification. On planar pushing tasks of a T-block with varying center of mass (CoM) and a hammer with an off-center mass distribution, Phys2Real achieves substantial improvements over a domain randomization baseline: 100% vs 79% success rate for the bottom-weighted T-block, 57% vs 23% in the challenging top-weighted T-block, and 15% faster average task completion for hammer pushing. Ablation studies indicate that the combination of VLM and interaction information is essential for success. Project website: this https URL .
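
The abstract does not spell out the fusion rule, so the following is only a sketch of one common way to combine a prior with an ensemble-based online estimate: inverse-variance (precision-weighted) fusion of two Gaussian estimates. The function names and all numbers are hypothetical, and the paper's actual method may differ.

```python
from statistics import mean, pvariance

def ensemble_estimate(predictions):
    """Mean and spread of an ensemble's predictions for one physical parameter."""
    if len(predictions) < 2:
        raise ValueError("need at least two ensemble members")
    return mean(predictions), pvariance(predictions)

def fuse_gaussian(prior_mean, prior_var, obs_mean, obs_var):
    """Precision-weighted (inverse-variance) fusion of two Gaussian estimates."""
    if prior_var <= 0 or obs_var <= 0:
        raise ValueError("variances must be positive")
    w_prior, w_obs = 1.0 / prior_var, 1.0 / obs_var
    fused_var = 1.0 / (w_prior + w_obs)
    fused_mean = fused_var * (w_prior * prior_mean + w_obs * obs_mean)
    return fused_mean, fused_var

# Hypothetical numbers: a prior over a block's center-of-mass offset (metres)
# and three ensemble predictions made from interaction data.
prior_mean, prior_var = 0.00, 0.02 ** 2
obs_mean, obs_var = ensemble_estimate([0.030, 0.034, 0.038])
fused_mean, fused_var = fuse_gaussian(prior_mean, prior_var, obs_mean, obs_var)
print(fused_mean, fused_var)
```

Under this rule the more certain source dominates: as the ensemble members agree more closely, the fused estimate moves away from the prior and toward the online estimate.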

[AI-2] PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities

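Quick read: This paper addresses the lack of real-world complexity in current cybersecurity evaluations of large language models (LLMs): existing benchmarks cannot accurately measure what LLMs can do in realistic attack scenarios. The authors introduce PACEbench, an AI cyber-exploitation benchmark built on realistic vulnerability difficulty, environmental complexity, and cyber defenses, covering four scenarios: single, blended, chained, and defense-evasion vulnerability exploitation. The key to the solution is PACEagent, a new agent that emulates human penetration testers through multi-phase reconnaissance, analysis, and exploitation, enabling systematic evaluation of LLMs in complex attack and defense settings. Experiments show that current mainstream LLMs remain clearly limited in complex scenarios and cannot bypass defenses, which supports the benchmark's validity and offers a reliable evaluation tool for the trustworthy development of future models.

Link: https://arxiv.org/abs/2510.11688
Authors: Zicheng Liu, Lige Huang, Jie Zhang, Dongrui Liu, Yuan Tian, Jing Shao
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments: Project webpage available at this https URL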

Abstract:The increasing autonomy of Large Language Models (LLMs) necessitates a rigorous evaluation of their potential to aid in cyber offense. Existing benchmarks often lack real-world complexity and are thus unable to accurately assess LLMs’ cybersecurity capabilities. To address this gap, we introduce PACEbench, a practical AI cyber-exploitation benchmark built on the principles of realistic vulnerability difficulty, environmental complexity, and cyber defenses. Specifically, PACEbench comprises four scenarios spanning single, blended, chained, and defense vulnerability exploitations. To handle these complex challenges, we propose PACEagent, a novel agent that emulates human penetration testers by supporting multi-phase reconnaissance, analysis, and exploitation. Extensive experiments with seven frontier LLMs demonstrate that current models struggle with complex cyber scenarios, and none can bypass defenses. These findings suggest that current models do not yet pose a generalized cyber offense threat. Nonetheless, our work provides a robust benchmark to guide the trustworthy development of future models.

[AI-3] Representation-Based Exploration for Language Models: From Test-Time to Post-Training

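Quick read: This paper asks whether current reinforcement learning (RL) techniques for language models actually promote the discovery of new behaviors, or merely sharpen capabilities already present in the pre-trained model. The core challenge is how to use the pre-trained model's knowledge to guide exploration toward diverse behaviors more efficiently. The key is a principled representation-based bonus derived from the pre-trained language model's hidden states, which explicitly rewards novel and diverse behaviors. Experiments show that the method improves diversity and pass@k in both inference-time scaling and post-training: it raises verifier efficiency at inference time and yields a threefold gain in test-time sample efficiency after post-training, indicating that deliberate exploration is an effective way past the base model's capability boundary.

Link: https://arxiv.org/abs/2510.11686
Authors: Jens Tuyls, Dylan J. Foster, Akshay Krishnamurthy, Jordan T. Ash
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Website and code: this https URL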

Abstract:Reinforcement learning (RL) promises to expand the capabilities of language models, but it is unclear if current RL techniques promote the discovery of novel behaviors, or simply sharpen those already present in the base model. In this paper, we investigate the value of deliberate exploration – explicitly incentivizing the model to discover novel and diverse behaviors – and aim to understand how the knowledge in pre-trained models can guide this search. Our main finding is that exploration with a simple, principled, representation-based bonus derived from the pre-trained language model’s hidden states significantly improves diversity and pass@k rates – both for post-training, and in a novel inference-time scaling setting we introduce. For inference-time, exploration with representation-based diversity improves efficiency, consistently improving pass@k rates across a variety of models and reasoning tasks. For example, for Qwen-2.5-14b-Instruct we obtain over 50% improvement in verifier efficiency on almost all tasks. For post-training, we show that integrating this exploration strategy into an RL pipeline improves reasoning performance over that of the initial model and over standard RL post-training. For example, on AIME 2024, our post-trained Qwen-2.5-7b-Instruct’s pass@80 matches the pass@256 of GRPO on the same model, demonstrating a 3x improvement in test-time sample efficiency. Overall, our findings suggest that deliberate exploration – with the right notion of diversity – is a practical path toward discovery of new behaviors beyond sharpening.

[AI-4] Ego-Vision World Model for Humanoid Contact Planning

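Quick read: This paper addresses how humanoid robots in unstructured environments can exploit physical contact, rather than merely avoid collisions, to achieve autonomy. Traditional optimization-based planners struggle with contact complexity, while on-policy reinforcement learning (RL) is sample-inefficient and limited in multi-task ability. The key is a framework that combines a learned world model with sampling-based Model Predictive Control (MPC): the world model, trained on offline data, predicts future states in a compressed latent space, and a learned surrogate value function handles sparse contact rewards and sensor noise to give dense, robust planning. This single scalable model supports several contact-aware tasks, such as bracing against a wall after a perturbation, blocking incoming objects, and traversing height-limited arches. It improves on on-policy RL in data efficiency and multi-task capability, and runs real-time contact planning on a physical humanoid using only proprioception and ego-centric depth images.

Link: https://arxiv.org/abs/2510.11682
Authors: Hang Liu, Yuman Gao, Sangli Teng, Yufeng Chi, Yakun Sophia Shao, Zhongyu Li, Maani Ghaffari, Koushil Sreenath
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Comments: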

Abstract:Enabling humanoid robots to exploit physical contact, rather than simply avoid collisions, is crucial for autonomy in unstructured environments. Traditional optimization-based planners struggle with contact complexity, while on-policy reinforcement learning (RL) is sample-inefficient and has limited multi-task ability. We propose a framework combining a learned world model with sampling-based Model Predictive Control (MPC), trained on a demonstration-free offline dataset to predict future outcomes in a compressed latent space. To address sparse contact rewards and sensor noise, the MPC uses a learned surrogate value function for dense, robust planning. Our single, scalable model supports contact-aware tasks, including wall support after perturbation, blocking incoming objects, and traversing height-limited arches, with improved data efficiency and multi-task capability over on-policy RL. Deployed on a physical humanoid, our system achieves robust, real-time contact planning from proprioception and ego-centric depth images. Website: this https URL

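[AI-5] SR-Scientist: Scientific Equation Discovery With Agentic AI

Quick read: This paper addresses a limitation of current large language models (LLMs) in scientific equation discovery: they serve only as equation proposers and cannot autonomously carry out the full workflow from data analysis through equation implementation to iterative optimization. The key is the SR-Scientist framework, which turns the LLM into an autonomous AI scientist: the code interpreter is wrapped as a set of tools so that the agent can analyze data, implement equations as code, evaluate them experimentally, and optimize them, adjusting its strategy from feedback over a long horizon. This enables end-to-end equation discovery and optimization with minimal human-defined pipelines.

Link: https://arxiv.org/abs/2510.11661
Authors: Shijie Xia, Yuhan Sun, Pengfei Liu
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: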

Abstract:Recently, Large Language Models (LLMs) have been applied to scientific equation discovery, leveraging their embedded scientific knowledge for hypothesis generation. However, current methods typically confine LLMs to the role of an equation proposer within search algorithms like genetic programming. In this paper, we present SR-Scientist, a framework that elevates the LLM from a simple equation proposer to an autonomous AI scientist that writes code to analyze data, implements the equation as code, submits it for evaluation, and optimizes the equation based on experimental feedback. Specifically, we wrap the code interpreter into a set of tools for data analysis and equation evaluation. The agent is instructed to optimize the equation by utilizing these tools over a long horizon with minimal human-defined pipelines. Empirical results show that SR-Scientist outperforms baseline methods by an absolute margin of 6% to 35% on datasets covering four science disciplines. Additionally, we demonstrate our method’s robustness to noise, the generalization of the discovered equations to out-of-domain data, and their symbolic accuracy. Furthermore, we develop an end-to-end reinforcement learning framework to enhance the agent’s capabilities.

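[AI-6] ManiAgent: An Agentic Framework for General Robotic Manipulation

Quick read: This paper addresses the limited performance of Vision-Language-Action (VLA) models on complex reasoning and long-horizon task planning, which stems from data scarcity and model capacity. The key is ManiAgent, an agentic architecture in which multiple agents communicate to carry out environmental perception, sub-task decomposition, and action generation end to end, so that complex manipulation scenarios are handled efficiently. Experiments report an 86.8% success rate on the SimplerEnv benchmark and 95.8% on real-world pick-and-place tasks; the framework also enables efficient data collection, yielding VLA models whose performance is comparable to those trained on human-annotated data.

Link: https://arxiv.org/abs/2510.11660
Authors: Yi Yang, Kefan Gu, Yuqing Wen, Hebei Li, Yucheng Zhao, Tiancai Wang, Xudong Liu
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: 8 pages, 6 figures, conference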

Abstract:While Vision-Language-Action (VLA) models have demonstrated impressive capabilities in robotic manipulation, their performance in complex reasoning and long-horizon task planning is limited by data scarcity and model capacity. To address this, we introduce ManiAgent, an agentic architecture for general manipulation tasks that achieves end-to-end output from task descriptions and environmental inputs to robotic manipulation actions. In this framework, multiple agents engage in inter-agent communication to perform environmental perception, sub-task decomposition and action generation, enabling efficient handling of complex manipulation scenarios. Evaluations show ManiAgent achieves an 86.8% success rate on the SimplerEnv benchmark and 95.8% on real-world pick-and-place tasks, enabling efficient data collection that yields VLA models with performance comparable to those trained on human-annotated data. The project webpage is available at this https URL.

[AI-7] MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model

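Quick read: This paper addresses the capability plateau of current reinforcement learning (RL) methods for mathematical reasoning with large language models (LLMs): existing RL methods mainly sharpen solution modes the base model already has rather than discovering new reasoning strategies, so gains are limited. The key is a new benchmark, MATH-Beyond (MATH-B), constructed to defeat common open-source models of up to 8B parameters even under large sampling budgets (for example pass@1024), so that progress on it requires RL methods to find reasoning paths beyond the base model's capabilities. Experiments show that current RL fine-tuned models perform poorly on the benchmark, which exposes the limits of existing methods and motivates more exploration-driven RL strategies.

Link: https://arxiv.org/abs/2510.11653
Authors: Prasanna Mayilvahanan, Ricardo Dominguez-Olmedo, Thaddäus Wiedemer, Wieland Brendel
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: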

Abstract:With the advent of DeepSeek-R1, a new wave of reinforcement learning (RL) methods has emerged that seem to unlock stronger mathematical reasoning. However, a closer look at the open-source ecosystem reveals a critical limitation: with sufficiently many draws (e.g., pass@1024), many existing base models already solve nearly all questions on widely used math benchmarks such as MATH-500 and AIME 2024. This suggests that the RL fine-tuning methods prevalent in the LLM reasoning literature largely sharpen existing solution modes rather than discovering entirely new ones. Such sharpening stands in contrast to the broader promise of RL: to foster exploration and to acquire new skills. To move beyond this plateau, we introduce MATH-Beyond (MATH-B), a benchmark deliberately constructed to defeat common open-source models of up to 8B parameters even under large sampling budgets. Improving performance on our benchmark via RL requires methods that learn to reason in ways that go beyond base model capabilities in repeated sampling. Since the problems are drawn from subsets of DAPO-Math-17K and DeepScaleR datasets, they remain topically equivalent to standard high-school math. Validating our premise, RL fine-tuned models such as Nemotron-Research-Reasoning-Qwen-1.5B and DeepScaleR-1.5B-Preview perform poorly on MATH-B at pass@1024, showing how existing approaches fall short on tackling harder instances. We hope MATH-B will catalyze exploration-driven RL approaches that elicit deeper reasoning capabilities. We release MATH-B at this https URL.
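
For readers unfamiliar with the metric, pass@k is the probability that at least one of k sampled solutions is correct. A minimal sketch of the widely used unbiased estimator from n samples with c correct, 1 - C(n-c, k) / C(n, k), follows; the abstract does not state which estimator the authors use, so this is an assumption for illustration.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical problem: 2 correct answers among 1024 samples.
for k in (1, 64, 1024):
    print(k, pass_at_k(n=1024, c=2, k=k))
```

A problem that a model solves only a handful of times in 1024 draws has a tiny pass@1 but a pass@1024 of 1, which is why large sampling budgets can make base models appear to solve nearly everything.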

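[AI-8] Attention Factors for Statistical Arbitrage

Quick read: This paper addresses the core problem of statistical arbitrage: identifying similar assets, detecting mispricing, and building a trading policy with the best risk-adjusted return. The key is a joint learning framework, Attention Factors, which learns conditional latent factors from firm characteristic embeddings, automatically identifying the factors most useful for arbitrage trading while allowing complex interactions. A general sequence model then extracts time-series signals from the factors' residual portfolios, so that factor estimation and trading-policy design are optimized together. On the largest U.S. equities over 24 years, the model reaches an out-of-sample Sharpe ratio above 4, and 2.3 net of transaction costs.

Link: https://arxiv.org/abs/2510.11616
Authors: Elliot L. Epstein, Rose Wang, Jaewon Choi, Markus Pelger
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Finance (q-fin.CP)
Comments: Accepted to the 6th ACM International Conference on AI in Finance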

Abstract:Statistical arbitrage exploits temporal price differences between similar assets. We develop a framework to jointly identify similar assets through factors, identify mispricing and form a trading policy that maximizes risk-adjusted performance after trading costs. Our Attention Factors are conditional latent factors that are the most useful for arbitrage trading. They are learned from firm characteristic embeddings that allow for complex interactions. We identify time-series signals from the residual portfolios of our factors with a general sequence model. Estimating factors and the arbitrage trading strategy jointly is crucial to maximize profitability after trading costs. In a comprehensive empirical study we show that our Attention Factor model achieves an out-of-sample Sharpe ratio above 4 on the largest U.S. equities over a 24-year period. Our one-step solution yields an unprecedented Sharpe ratio of 2.3 net of transaction costs. We show that weak factors are important for arbitrage trading.
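
The gap between the gross and net Sharpe ratios quoted above comes from subtracting trading costs before computing the ratio. A minimal sketch of that calculation, assuming daily returns, a proportional cost per unit of turnover, and made-up numbers (none of these details come from the paper):

```python
import math
from statistics import mean, stdev

def sharpe_ratio(returns, periods_per_year=252, risk_free_per_period=0.0):
    """Annualized Sharpe ratio: mean excess return over its sample standard deviation."""
    excess = [r - risk_free_per_period for r in returns]
    spread = stdev(excess)
    if spread == 0:
        raise ValueError("returns have zero variance")
    return mean(excess) / spread * math.sqrt(periods_per_year)

def net_of_costs(gross_returns, turnover, cost_rate):
    """Subtract a proportional trading cost (cost_rate per unit turnover) each period."""
    if len(gross_returns) != len(turnover):
        raise ValueError("returns and turnover must have the same length")
    return [g - cost_rate * t for g, t in zip(gross_returns, turnover)]

# Hypothetical daily strategy returns and portfolio turnover.
gross = [0.010, -0.010, 0.020, 0.000]
turnover = [1.0, 1.0, 1.0, 1.0]
net = net_of_costs(gross, turnover, cost_rate=0.001)
print(sharpe_ratio(gross), sharpe_ratio(net))
```

Because costs lower the mean return without reducing volatility, a high-turnover strategy can lose a large share of its gross Sharpe ratio, which is why the abstract stresses optimizing factors and trading policy jointly after costs.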

[AI-9] ParaCook: On Time-Efficient Planning for Multi-Agent Systems

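Quick read: This paper addresses the neglect of time efficiency in multi-agent collaborative planning with large language models (LLMs): existing benchmarks focus on task completion and overlook timing under parallel and asynchronous operation. The key is ParaCook, a cooking-based benchmark for multi-agent collaborative planning that simplifies the action space to isolate the challenge of strategic parallel planning and provides a scalable framework for measuring time efficiency. Experiments show that current mainstream LLM approaches fall short at coordinating parallel actions, yet show potential for high-level parallel optimization on abstract tasks; ParaCook gives such research a systematic evaluation basis.

Link: https://arxiv.org/abs/2510.11608
Authors: Shiqi Zhang, Xinbei Ma, Yunqing Xu, Zouying Cao, Pengrui Lu, Haobo Yuan, Tiancheng Shen, Zhuosheng Zhang, Hai Zhao, Ming-Hsuan Yang
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: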

Abstract:Large Language Models (LLMs) exhibit strong reasoning abilities for planning long-horizon, real-world tasks, yet existing agent benchmarks focus on task completion while neglecting time efficiency in parallel and asynchronous operations. To address this, we present ParaCook, a benchmark for time-efficient collaborative planning. Inspired by the Overcooked game, ParaCook provides an environment for various challenging interaction planning of multi-agent systems that are instantiated as cooking tasks, with a simplified action space to isolate the core challenge of strategic parallel planning. Through a comprehensive evaluation of state-of-the-art LLMs, we find that current approaches achieve suboptimal plans, which struggle with parallel actions or coordination. Our analysis also reveals LLMs’ potential on abstract tasks where they can focus on high-level parallel optimization. ParaCook provides a scalable evaluation framework with adjustable complexity, establishing a foundation for developing and assessing time efficiency-aware multi-agent planning. The code and data are available at this https URL.

[AI-10] Explainability risk modeling and segmentation based customer churn analytics for personalized retention in e-commerce

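Quick read: This paper addresses the lack of interpretability in customer churn prediction for online retail: existing models are mostly black boxes that reveal little about what drives churn, when to intervene, or which customer segments are at high risk, which limits personalized retention strategies. The key is a three-component framework: explainable AI (XAI) quantifies each feature's contribution to churn; survival analysis models time-to-event churn risk to determine intervention windows; and RFM (Recency, Frequency, Monetary) profiling segments customers by transactional behavior. Together these support attribution of churn drivers, estimation of intervention timing, and prioritization of customer segments, and hence more targeted retention strategies.

Link: https://arxiv.org/abs/2510.11604
Authors: Sanjula De Alwis, Indrajith Ekanayake
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: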

Abstract:In online retail, customer acquisition typically incurs higher costs than customer retention, motivating firms to invest in churn analytics. However, many contemporary churn models operate as opaque black boxes, limiting insight into the determinants of attrition, the timing of retention opportunities, and the identification of high-risk customer segments. Accordingly, the emphasis should shift from prediction alone to the design of personalized retention strategies grounded in interpretable evidence. This study advances a three-component framework that integrates explainable AI to quantify feature contributions, survival analysis to model time-to-event churn risk, and RFM profiling to segment customers by transactional behaviour. In combination, these methods enable the attribution of churn drivers, estimation of intervention windows, and prioritization of segments for targeted actions, thereby supporting strategies that reduce attrition and strengthen customer loyalty.
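
Of the three components, RFM profiling is the simplest to make concrete. The sketch below computes recency, frequency, and monetary value per customer from a transaction log; the `Transaction` record, its field names, and the sample data are hypothetical, and the paper's own feature definitions may differ.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Transaction:
    customer_id: str
    day: date
    amount: float

def rfm_table(transactions, as_of):
    """Per-customer recency (days since last purchase), frequency, and monetary total."""
    raw = {}
    for t in transactions:
        rec = raw.setdefault(t.customer_id, {"last": t.day, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], t.day)
        rec["frequency"] += 1
        rec["monetary"] += t.amount
    return {
        cid: {
            "recency_days": (as_of - rec["last"]).days,
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for cid, rec in raw.items()
    }

# Hypothetical transaction log.
log = [
    Transaction("a", date(2025, 1, 1), 10.0),
    Transaction("a", date(2025, 1, 10), 5.0),
    Transaction("b", date(2025, 1, 5), 20.0),
]
table = rfm_table(log, as_of=date(2025, 1, 15))
print(table)
```

Segments are then usually formed by binning or scoring these three values, for example flagging customers with high recency_days and low frequency as candidates for retention offers.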

[AI-11] Reproducibility: The New Frontier in AI Governance

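Quick read: This paper addresses a policy-making problem in AI governance: the signal-to-noise ratio of the research information environment is too low, and without reliable, reproducible findings policymakers struggle to reach consensus on AI risks, which hampers effective governance mechanisms. The key is for AI research to adopt stricter reproducibility guidelines, including preregistration, increased statistical power, and publication of negative results, which would raise the credibility of research and give policy a firmer scientific basis.

Link: https://arxiv.org/abs/2510.11595
Authors: Israel Mason-Williams, Gabryel Mason-Williams
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); General Literature (cs.GL)
Comments: 12 pages, 6 figures, Workshop on Technical AI Governance at ICML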

Abstract:AI policymakers are responsible for delivering effective governance mechanisms that can provide safe, aligned and trustworthy AI development. However, the information environment offered to policymakers is characterised by an unnecessarily low Signal-To-Noise Ratio, favouring regulatory capture and creating deep uncertainty and divides on which risks should be prioritised from a governance perspective. We posit that the current publication speeds in AI combined with the lack of strong scientific standards, via weak reproducibility protocols, effectively erodes the power of policymakers to enact meaningful policy and governance protocols. Our paper outlines how AI research could adopt stricter reproducibility guidelines to assist governance endeavours and improve consensus on the AI risk landscape. We evaluate the forthcoming reproducibility crisis within AI research through the lens of crises in other scientific domains; providing a commentary on how adopting preregistration, increased statistical power and negative result publication reproducibility protocols can enable effective AI governance. While we maintain that AI governance must be reactive due to AI’s significant societal implications we argue that policymakers and governments must consider reproducibility protocols as a core tool in the governance arsenal and demand higher standards for AI research. Code to replicate data and figures: this https URL

[AI-12] Analyzing and Internalizing Complex Policy Documents for LLM Agents

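Quick read: This paper addresses two problems in large language model (LLM)-based agentic systems: the high computational overhead caused by ever-growing in-context policy documents that encode business rules, and the difficulty existing prompt compression methods have with internalizing policy text that spans multiple complexity levels. The core solution is Category-Aware Policy Continued Pretraining (CAP-CPT): an automated pipeline parses policy documents, extracts key specifications grouped into factual, behavioral, and conditional categories, and isolates the complex conditions that drive workflow complexity. This guides targeted data synthesis, and the model internalizes the policy through an autoregressive pretraining loss. The result is a prompt length reduction of up to 97.3% together with better handling of complex policies, with stable gains from only a small amount of supervised fine-tuning (SFT) data (for example, up to 41% on Qwen-3-32B).

Link: https://arxiv.org/abs/2510.11588
Authors: Jiateng Liu, Zhenhailong Wang, Xiaojiang Huang, Yingjie Li, Xing Fan, Xiang Li, Chenlei Guo, Ruhi Sarikaya, Heng Ji
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 42 pages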

Abstract:Large Language Model (LLM)-based agentic systems rely on in-context policy documents encoding diverse business rules. As requirements grow, these documents expand rapidly, causing high computational overhead. This motivates developing internalization methods that embed policy documents into model priors while preserving performance. Prior prompt compression work targets generic prompts, but agentic policy documents span multiple complexity levels and require deeper reasoning, making internalization harder. We introduce CC-Gen, an agentic benchmark generator with Controllable Complexity across four levels, enabling systematic evaluation of agents’ ability to handle complexity and offering a unified framework for assessing policy internalization. Our analysis shows that complex policy specifications governing workflows pose major reasoning challenges. Supporting internalization with gold user agent interaction trajectories containing chain-of-thought (CoT) annotations via supervised fine-tuning (SFT) is data-intensive and degrades sharply as policy complexity increases. To mitigate data and reasoning burdens, we propose Category-Aware Policy Continued Pretraining (CAP-CPT). Our automated pipeline parses policy documents to extract key specifications, grouping them into factual, behavioral, and conditional categories, and isolating complex conditions that drive workflow complexity. This guides targeted data synthesis and enables agents to internalize policy information through an autoregressive pretraining loss. Experiments show CAP-CPT improves SFT baselines in all settings, with up to 41% and 22% gains on Qwen-3-32B, achieving 97.3% prompt length reduction on CC-Gen and further enhancing tau-Bench with minimal SFT data.

[AI-13] Characterizing Web Search in The Age of Generative AI

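Quick read: This paper asks how generative search differs from traditional web search in output form and in the way information is presented, and what follows from those differences. The key is a systematic comparison of a traditional search engine (Google) with four generative search engines from two providers (Google and OpenAI) on queries from four domains. The comparison characterizes generative search in terms of source coverage, reliance on internal model knowledge versus externally retrieved knowledge, and the diversity of concepts surfaced, providing empirical grounding for revisiting evaluation criteria.

Link: https://arxiv.org/abs/2510.11560
Authors: Elisabeth Kirsten, Jost Grosse Perdekamp, Mihir Upadhyay, Krishna P. Gummadi, Muhammad Bilal Zafar
Affiliation: Unknown
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Comments: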

Abstract:The advent of LLMs has given rise to a new type of web search: Generative search, where LLMs retrieve web pages related to a query and generate a single, coherent text as a response. This output modality stands in stark contrast to traditional web search, where results are returned as a ranked list of independent web pages. In this paper, we ask: Along what dimensions do generative search outputs differ from traditional web search? We compare Google, a traditional web search engine, with four generative search engines from two providers (Google and OpenAI) across queries from four domains. Our analysis reveals intriguing differences. Most generative search engines cover a wider range of sources compared to web search. Generative search engines vary in the degree to which they rely on internal knowledge contained within the model parameters vs. external knowledge retrieved from the web. Generative search engines surface varying sets of concepts, creating new opportunities for enhancing search diversity and serendipity. Our results also highlight the need for revisiting evaluation criteria for web search in the age of Generative AI.

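[AI-14] Zero Data Retention in LLM-based Enterprise AI Assistants: A Comparative Study of Market Leading Agentic AI Products

Quick read: This paper addresses the privacy and compliance challenges enterprises face when deploying generative AI assistants, in particular how to achieve zero data retention in heavily regulated sectors such as healthcare and finance. The key is to define the trade-offs among architecture, compliance, and usability for enterprise large language model (LLM) applications, and to examine zero data retention policies supported jointly by LLM providers (such as OpenAI, Anthropic, and Meta) and enterprise applications (such as Salesforce AgentForce and Microsoft Copilot). The core of such a policy is that user input is neither stored nor used for model training, so that productivity gains remain compatible with data sovereignty and privacy compliance.

Link: https://arxiv.org/abs/2510.11558
Authors: Komal Gupta, Aditya Shrivastava
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: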

Abstract:Data governance, compliance, and business privacy matter, particularly for healthcare and finance businesses. With the recent emergence of enterprise AI assistants that enhance business productivity, safeguarding private data and ensuring compliance is now a priority. As AI assistants are deployed across the enterprise, zero data retention can be achieved through zero data retention policies implemented by Large Language Model providers such as OpenAI, Anthropic, and Meta. In this work, we explore zero data retention policies for enterprise applications of large language models (LLMs). Our key contribution is defining the architectural, compliance, and usability trade-offs of such systems in parallel. We examine the development of commercial AI assistants from two industry leaders in this arena, Salesforce and Microsoft. The two companies use distinct technical architectures to support zero data retention policies. Salesforce AgentForce and Microsoft Copilot are among the leading AI assistants providing a much-needed push to business productivity in customer care. The purpose of this paper is to analyze the technical architecture and deployment of zero data retention policies by consuming applications as well as by large language model service providers such as OpenAI, Anthropic, and Meta.

[AI-15] Query-Specific GNN: A Comprehensive Graph Representation Learning Method for Retrieval Augmented Generation

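Quick read: This paper addresses two challenges for retrieval-augmented generation (RAG) on multi-hop questions: existing methods struggle to fully understand questions with complex semantic structure, and multi-step retrieval is susceptible to irrelevant noise. The key is a graph representation learning framework with two core components. First, a Multi-information Level Knowledge Graph (Multi-L KG) models information at different granularities for a more complete reading of multi-hop questions. Second, a Query-Specific Graph Neural Network (QSGNN) uses intra-level and inter-level message passing, with the query guiding information aggregation in every round, which integrates multi-granular information and suppresses noise. Two synthetic data generation strategies are also proposed for pre-training QSGNN to make it more robust. Experiments show improvements of up to 33.8% on high-hop questions.

Link: https://arxiv.org/abs/2510.11541
Authors: Yuchen Yan, Zhihua Liu, Hao Wang, Weiming Li, Xiaoshuai Hao
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: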

Abstract:Retrieval-augmented generation (RAG) has demonstrated its ability to enhance Large Language Models (LLMs) by integrating external knowledge sources. However, multi-hop questions, which require the identification of multiple knowledge targets to form a synthesized answer, raise new challenges for RAG systems. Under the multi-hop settings, existing methods often struggle to fully understand the questions with complex semantic structures and are susceptible to irrelevant noise during the retrieval of multiple information targets. To address these limitations, we propose a novel graph representation learning framework for multi-hop question retrieval. We first introduce a Multi-information Level Knowledge Graph (Multi-L KG) to model various information levels for a more comprehensive understanding of multi-hop questions. Based on this, we design a Query-Specific Graph Neural Network (QSGNN) for representation learning on the Multi-L KG. QSGNN employs intra/inter-level message passing mechanisms, and in each message passing the information aggregation is guided by the query, which not only facilitates multi-granular information aggregation but also significantly reduces the impact of noise. To enhance its ability to learn robust representations, we further propose two synthesized data generation strategies for pre-training the QSGNN. Extensive experimental results demonstrate the effectiveness of our framework in multi-hop scenarios, especially in high-hop questions the improvement can reach 33.8%. The code is available at: this https URL.
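
To make "aggregation guided by the query" concrete, here is a minimal sketch of one message-passing step in which each neighbor's contribution is weighted by a softmax over its dot product with the query embedding. This is a generic attention-style aggregation chosen for illustration; the abstract does not give QSGNN's actual update rule, and the function names and vectors are hypothetical.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def query_guided_aggregate(query, neighbor_feats):
    """Weight each neighbor by softmax(query . neighbor) and return the weighted sum."""
    if not neighbor_feats:
        raise ValueError("need at least one neighbor")
    scores = [dot(query, h) for h in neighbor_feats]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(neighbor_feats[0])
    aggregated = [sum(w * h[i] for w, h in zip(weights, neighbor_feats)) for i in range(dim)]
    return aggregated, weights

# Hypothetical 2-d embeddings: the first neighbor is aligned with the query.
query = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0]]
aggregated, weights = query_guided_aggregate(query, neighbors)
print(weights, aggregated)
```

Neighbors unrelated to the query receive small weights, which is one plausible mechanism for the noise reduction the abstract describes.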

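[AI-16] CodeWatcher: IDE Telemetry Data Extraction Tool for Understanding Coding Interactions with LLMs

Quick read: This paper addresses the lack of fine-grained, real-time data for studying how developers interact with code generation tools (CGTs); traditional collection methods tend to disrupt the development workflow and so fail to record behavior accurately. The key is CodeWatcher, a lightweight, unobtrusive client-server system: a plugin inside the Visual Studio Code (VS Code) editor captures semantically meaningful interaction events (such as CGT insertions, deletions, copy-paste actions, and focus shifts), and a Python RESTful API with a MongoDB backend stores them in structured, timestamped form. This supports post-hoc reconstruction of coding sessions and detailed behavioral analysis, providing infrastructure for research on responsible AI, developer productivity, and human-centered evaluation of CGTs.

Link: https://arxiv.org/abs/2510.11536
Authors: Manaal Basha, Aimeê M. Ribeiro, Jeena Javahar, Cleidson R. B. de Souza, Gema Rodríguez-Pérez
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: ICSME 2025 Tool Demonstration Track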

Abstract:Understanding how developers interact with code generation tools (CGTs) requires detailed, real-time data on programming behavior which is often difficult to collect without disrupting workflow. We present CodeWatcher, a lightweight, unobtrusive client-server system designed to capture fine-grained interaction events from within the Visual Studio Code (VS Code) editor. CodeWatcher logs semantically meaningful events such as insertions made by CGTs, deletions, copy-paste actions, and focus shifts, enabling continuous monitoring of developer activity without modifying user workflows. The system comprises a VS Code plugin, a Python-based RESTful API, and a MongoDB backend, all containerized for scalability and ease of deployment. By structuring and timestamping each event, CodeWatcher enables post-hoc reconstruction of coding sessions and facilitates rich behavioral analyses, including how and when CGTs are used during development. This infrastructure is crucial for supporting research on responsible AI, developer productivity, and the human-centered evaluation of CGTs. Please find the demo, diagrams, and tool here: this https URL.
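
The idea of structured, timestamped events that allow a session to be replayed later can be sketched as follows. The record fields and event-type strings are hypothetical and are not CodeWatcher's actual schema; the sketch only shows why a timestamp per event is enough to reconstruct ordering after the fact.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class InteractionEvent:
    session_id: str
    timestamp_ms: int
    event_type: str   # e.g. "cgt_insertion", "deletion", "paste", "focus_shift"
    payload: str = ""

def to_json(event):
    """Serialize one event as a JSON document, ready to send to a logging backend."""
    return json.dumps(asdict(event), sort_keys=True)

def reconstruct_session(events, session_id):
    """Return one session's events in chronological order."""
    own = [e for e in events if e.session_id == session_id]
    return sorted(own, key=lambda e: e.timestamp_ms)

# Hypothetical events arriving out of order from two sessions.
received = [
    InteractionEvent("s1", 3000, "deletion"),
    InteractionEvent("s2", 1500, "paste", "import os"),
    InteractionEvent("s1", 1000, "cgt_insertion", "def add(a, b):"),
    InteractionEvent("s1", 2000, "focus_shift"),
]
timeline = [e.event_type for e in reconstruct_session(received, "s1")]
print(timeline)  # prints ['cgt_insertion', 'focus_shift', 'deletion']
```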

[AI-17] A Flexible Multi-Agent Deep Reinforcement Learning Framework for Dynamic Routing and Scheduling of Latency-Critical Services

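Quick read: This paper addresses reliable delivery of delay-sensitive information over dynamic, heterogeneous networks, and in particular the limitation that existing network control solutions target only average delay and cannot provide strict end-to-end (E2E) peak latency guarantees. The key is a new network control framework based on Multi-Agent Deep Reinforcement Learning (MA-DRL) that combines centralized routing with distributed scheduling. It uses the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) technique to design policies in which routing and scheduling agents assign paths and schedule transmissions according to packet lifetimes, maximizing on-time delivery. The framework is general enough to combine data-driven Deep Reinforcement Learning (DRL) agents with traditional rule-based policies, balancing performance against learning complexity, and it outperforms traditional stochastic optimization-based approaches.

Link: https://arxiv.org/abs/2510.11535
Authors: Vincenzo Norman Vitale, Antonia Maria Tulino, Andreas F. Molisch, Jaime Llorca
Affiliation: Unknown
Categories: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Comments: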

Abstract:Timely delivery of delay-sensitive information over dynamic, heterogeneous networks is increasingly essential for a range of interactive applications, such as industrial automation, self-driving vehicles, and augmented reality. However, most existing network control solutions target only average delay performance, falling short of providing strict End-to-End (E2E) peak latency guarantees. This paper addresses the challenge of reliably delivering packets within application-imposed deadlines by leveraging recent advancements in Multi-Agent Deep Reinforcement Learning (MA-DRL). After introducing the Delay-Constrained Maximum-Throughput (DCMT) dynamic network control problem, and highlighting the limitations of current solutions, we present a novel MA-DRL network control framework that leverages a centralized routing and distributed scheduling architecture. The proposed framework leverages critical networking domain knowledge for the design of effective MA-DRL strategies based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) technique, where centralized routing and distributed scheduling agents dynamically assign paths and schedule packet transmissions according to packet lifetimes, thereby maximizing on-time packet delivery. The generality of the proposed framework allows integrating both data-driven Deep Reinforcement Learning (DRL) agents and traditional rule-based policies in order to strike the right balance between performance and learning complexity. Our results confirm the superiority of the proposed framework with respect to traditional stochastic optimization-based approaches and provide key insights into the role and interplay between data-driven DRL agents and new rule-based policies for both efficient and high-performance control of latency-critical services.
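
As an illustration of what a rule-based, lifetime-aware scheduling policy can look like, the sketch below implements earliest-deadline-first transmission on a single link with a fixed per-slot capacity. It is a textbook baseline chosen for illustration, not the paper's policy; the function name, the slot model, and the packet data are all assumptions.

```python
import heapq

def schedule_by_lifetime(packets, capacity_per_slot, num_slots):
    """Earliest-deadline-first transmission over discrete time slots.

    packets: iterable of (packet_id, lifetime); a packet is on time only if it
    is sent in a slot whose index is smaller than its lifetime.
    Returns (delivered, undelivered): delivered holds (slot, packet_id) pairs;
    undelivered holds packets that expired or were still queued at the end.
    """
    heap = [(lifetime, packet_id) for packet_id, lifetime in packets]
    heapq.heapify(heap)
    delivered, undelivered = [], []
    for slot in range(num_slots):
        sent = 0
        while heap and sent < capacity_per_slot:
            lifetime, packet_id = heapq.heappop(heap)
            if lifetime <= slot:
                # Deadline already passed: drop instead of wasting link capacity.
                undelivered.append(packet_id)
                continue
            delivered.append((slot, packet_id))
            sent += 1
    undelivered.extend(packet_id for _, packet_id in sorted(heap))
    return delivered, undelivered

# Hypothetical queue: two urgent packets compete for the first slot.
delivered, undelivered = schedule_by_lifetime(
    [("a", 1), ("b", 1), ("c", 3)], capacity_per_slot=1, num_slots=3
)
print(delivered, undelivered)  # prints [(0, 'a'), (1, 'c')] ['b']
```

Dropping expired packets early frees capacity for packets that can still meet their deadlines, which is the quantity (on-time delivery) the abstract says the framework maximizes.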

[AI-18] Cracking CodeWhisperer: Analyzing Developers Interactions and Patterns During Programming Tasks

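[Quick read]: This paper addresses the limited understanding of how software developers actually use generative AI code-generation tools such as Amazon CodeWhisperer, with the goal of characterizing adoption and interaction patterns in real development settings. The key to the approach is two rounds of user studies (10 participants each) that combine qualitative and quantitative methods, with a custom telemetry plugin collecting low-level interaction data. The analysis identifies four core behavioral patterns: incremental code refinement, explicit instruction through natural language comments, baseline structuring with model suggestions, and integrative use with external sources, providing empirical evidence for improving the design and use of code-generation tools.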

Link: https://arxiv.org/abs/2510.11516
Authors: Jeena Javahar, Tanya Budhrani, Manaal Basha, Cleidson R. B. de Souza, Ivan Beschastnikh, Gema Rodriguez-Perez
Affiliations: unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: VL/HCC 2025 Short Paper

Abstract:The use of AI code-generation tools is becoming increasingly common, making it important to understand how software developers are adopting these tools. In this study, we investigate how developers engage with Amazon’s CodeWhisperer, an LLM-based code-generation tool. We conducted two user studies with two groups of 10 participants each, interacting with CodeWhisperer - the first to understand which interactions were critical to capture and the second to collect low-level interaction data using a custom telemetry plugin. Our mixed-methods analysis identified four behavioral patterns: 1) incremental code refinement, 2) explicit instruction using natural language comments, 3) baseline structuring with model suggestions, and 4) integrative use with external sources. We provide a comprehensive analysis of these patterns.

[AI-19] Automatic Music Sample Identification with Multi-Track Contrastive Learning

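[Quick read]: This paper addresses automatic sample identification: detecting sampled audio segments in newly produced music and retrieving their original source from a reference database. The key to the approach is self-supervised learning that uses a multi-track dataset to build positive pairs of artificial mixes, together with a novel contrastive learning objective. The method significantly outperforms previous state-of-the-art baselines, is robust across genres, and scales well as the number of noise songs in the reference database grows. The study also highlights the importance of high-quality separated stems for this task.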

Link: https://arxiv.org/abs/2510.11507
Authors: Alain Riou, Joan Serrà, Yuki Mitsufuji
Affiliations: unknown
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Comments:

Abstract:Sampling, the technique of reusing pieces of existing audio tracks to create new music content, is a very common practice in modern music production. In this paper, we tackle the challenging task of automatic sample identification, that is, detecting such sampled content and retrieving the material from which it originates. To do so, we adopt a self-supervised learning approach that leverages a multi-track dataset to create positive pairs of artificial mixes, and design a novel contrastive learning objective. We show that this method significantly outperforms previous state-of-the-art baselines, that it is robust to various genres, and that it scales well when increasing the number of noise songs in the reference database. In addition, we extensively analyze the contribution of the different components of our training pipeline and highlight, in particular, the need for high-quality separated stems for this task.
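
The abstract does not spell out its contrastive objective, so the sketch below swaps in the standard InfoNCE loss to show how positive pairs are scored against in-batch negatives. The function names, the cosine similarity, and the temperature value are assumptions, not the paper's formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; positives[i] is the positive for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)
```

In a sample-identification setting, each anchor and its positive would plausibly be embeddings of two artificial mixes that share a stem, while the other items in the batch act as negatives.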

[AI-20] Offline Reinforcement Learning with Generative Trajectory Policies ICLR2026

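[Quick read]: This paper addresses the trade-off between performance and efficiency for generative policies in offline reinforcement learning (offline RL): iterative models such as diffusion policies are computationally expensive, while fast single-step models such as consistency policies often suffer degraded performance. The key is a unifying theoretical perspective that treats modern generative models, including diffusion, flow matching, and consistency models, as special cases of a continuous-time generative trajectory governed by an ordinary differential equation (ODE). Building on this, the authors propose Generative Trajectory Policies (GTPs), which learn the entire solution map of the ODE, and introduce two theoretically principled adaptations that make the paradigm practical for offline RL. GTP achieves state-of-the-art performance on the D4RL benchmarks, including perfect scores on several notoriously hard AntMaze tasks.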

Link: https://arxiv.org/abs/2510.11499
Authors: Xinsong Feng, Leshu Tang, Chenan Wang, Haipeng Chen
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Preprint. Under review at ICLR 2026

Abstract:Generative models have emerged as a powerful class of policies for offline reinforcement learning (RL) due to their ability to capture complex, multi-modal behaviors. However, existing methods face a stark trade-off: slow, iterative models like diffusion policies are computationally expensive, while fast, single-step models like consistency policies often suffer from degraded performance. In this paper, we demonstrate that it is possible to bridge this gap. The key to moving beyond the limitations of individual methods, we argue, lies in a unifying perspective that views modern generative models, including diffusion, flow matching, and consistency models, as specific instances of learning a continuous-time generative trajectory governed by an Ordinary Differential Equation (ODE). This principled foundation provides a clearer design space for generative policies in RL and allows us to propose Generative Trajectory Policies (GTPs), a new and more general policy paradigm that learns the entire solution map of the underlying ODE. To make this paradigm practical for offline RL, we further introduce two key theoretically principled adaptations. Empirical results demonstrate that GTP achieves state-of-the-art performance on D4RL benchmarks - it significantly outperforms prior generative policies, achieving perfect scores on several notoriously hard AntMaze tasks.

[AI-21] Coordinated Strategies in Realistic Air Combat by Hierarchical Multi-Agent Reinforcement Learning

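[Quick read]: This paper addresses the difficulty of achieving mission objectives in realistic air combat simulation, caused by imperfect situational awareness and nonlinear flight dynamics. The key is a novel 3D multi-agent air combat environment and a Hierarchical Multi-Agent Reinforcement Learning framework that combines heterogeneous agent dynamics, curriculum learning, league-play, and a newly adapted training algorithm. Decision-making is split into two abstraction levels: low-level policies learn precise control maneuvers, while high-level policies issue tactical commands based on mission objectives, improving both learning efficiency and combat performance in complex dogfight scenarios.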

Link: https://arxiv.org/abs/2510.11474
Authors: Ardian Selmonaj, Giacomo Del Rio, Adrian Schneider, Alessandro Antonucci
Affiliations: unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Comments: 2025 IEEE International Conference on Agentic AI (ICA)

Abstract:Achieving mission objectives in a realistic simulation of aerial combat is highly challenging due to imperfect situational awareness and nonlinear flight dynamics. In this work, we introduce a novel 3D multi-agent air combat environment and a Hierarchical Multi-Agent Reinforcement Learning framework to tackle these challenges. Our approach combines heterogeneous agent dynamics, curriculum learning, league-play, and a newly adapted training algorithm. To this end, the decision-making process is organized into two abstraction levels: low-level policies learn precise control maneuvers, while high-level policies issue tactical commands based on mission objectives. Empirical results show that our hierarchical approach improves both learning efficiency and combat performance in complex dogfight scenarios.

[AI-22] Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers

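[Quick read]: This paper addresses the scalability bottleneck of current amortized learning methods: most existing methods are limited in how much task data they can process at inference (for example, by context length) and therefore struggle with large datasets. The key is a new paradigm called iterative amortized inference, which refines solutions step by step over mini-batches, drawing on ideas from stochastic optimization. This connects optimization-based meta-learning with forward-pass amortization in large language models (LLMs) and offers a scalable and extensible foundation for general-purpose task adaptation.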

Link: https://arxiv.org/abs/2510.11471
Authors: Sarthak Mittal, Divyat Mahajan, Guillaume Lajoie, Mohammad Pezeshki
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

Abstract:Modern learning systems increasingly rely on amortized learning - the idea of reusing computation or inductive biases shared across tasks to enable rapid generalization to novel problems. This principle spans a range of approaches, including meta-learning, in-context learning, prompt tuning, learned optimizers and more. While motivated by similar goals, these approaches differ in how they encode and leverage task-specific information, often provided as in-context examples. In this work, we propose a unified framework which describes how such methods differ primarily in the aspects of learning they amortize - such as initializations, learned updates, or predictive mappings - and how they incorporate task data at inference. We introduce a taxonomy that categorizes amortized models into parametric, implicit, and explicit regimes, based on whether task adaptation is externalized, internalized, or jointly modeled. Building on this view, we identify a key limitation in current approaches: most methods struggle to scale to large datasets because their capacity to process task data at inference (e.g., context length) is often limited. To address this, we propose iterative amortized inference, a class of models that refine solutions step-by-step over mini-batches, drawing inspiration from stochastic optimization. Our formulation bridges optimization-based meta-learning with forward-pass amortization in models like LLMs, offering a scalable and extensible foundation for general-purpose task adaptation.
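
The control flow of "refining a solution step by step over mini-batches" can be sketched as a fold over batches that carries a compact state. In the paper the update would be a learned model; the `running_mean_update` below is only a hand-written placeholder, and both function names are hypothetical.

```python
def iterative_refine(batches, update_fn, state):
    """Fold an update function over mini-batches, carrying a compact state."""
    for batch in batches:
        state = update_fn(state, batch)  # one refinement step per mini-batch
    return state


def running_mean_update(state, batch):
    """Hand-written placeholder for a learned update: tracks the data mean."""
    mean, count = state
    new_count = count + len(batch)
    new_mean = mean + (sum(batch) - mean * len(batch)) / new_count
    return new_mean, new_count
```

The point of the pattern is that memory use depends on the state and one mini-batch, not on the full dataset, which is what would let such a model go beyond a fixed context length.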

[AI-23] Unifying Deductive and Abductive Reasoning in Knowledge Graphs with Masked Diffusion Model

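[Quick read]: This paper addresses the fact that deductive reasoning and abductive reasoning on knowledge graphs have long been handled in isolation, despite their clear synergy: deduction can validate hypotheses, and abduction can uncover deeper logical patterns. To model them jointly, the paper proposes DARK, a unified reasoning framework based on a masked diffusion model, with two core innovations. First, a self-reflective denoising process iteratively generates and validates candidate hypotheses to improve hypothesis refinement during abductive reasoning. Second, a logic-exploration reinforcement learning approach masks queries and conclusions simultaneously to discover richer logical associations. Experiments show that DARK achieves state-of-the-art performance on multiple benchmark knowledge graphs, supporting the effectiveness of the unified approach.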

Link: https://arxiv.org/abs/2510.11462
Authors: Yisen Gao, Jiaxin Bai, Yi Huang, Xingcheng Fu, Qingyun Sun, Yangqiu Song
Affiliations: unknown
Categories: Artificial Intelligence (cs.AI)
Comments: Under Review

Abstract:Deductive and abductive reasoning are two critical paradigms for analyzing knowledge graphs, enabling applications from financial query answering to scientific discovery. Deductive reasoning on knowledge graphs usually involves retrieving entities that satisfy a complex logical query, while abductive reasoning generates plausible logical hypotheses from observations. Despite their clear synergistic potential, where deduction can validate hypotheses and abduction can uncover deeper logical patterns, existing methods address them in isolation. To bridge this gap, we propose DARK, a unified framework for Deductive and Abductive Reasoning in Knowledge graphs. As a masked diffusion model capable of capturing the bidirectional relationship between queries and conclusions, DARK has two key innovations. First, to better leverage deduction for hypothesis refinement during abductive reasoning, we introduce a self-reflective denoising process that iteratively generates and validates candidate hypotheses against the observed conclusion. Second, to discover richer logical associations, we propose a logic-exploration reinforcement learning approach that simultaneously masks queries and conclusions, enabling the model to explore novel reasoning compositions. Extensive experiments on multiple benchmark knowledge graphs show that DARK achieves state-of-the-art performance on both deductive and abductive reasoning tasks, demonstrating the significant benefits of our unified approach.

[AI-24] From Answer to Think: Multidimensional Supervision of Reasoning Process for LLM Optimization

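[Quick read]: This paper addresses the challenge of improving the multi-step reasoning ability of large language models (LLMs), in particular the bottleneck of outcome-supervised reinforcement learning (RLVR), whose reward signal is sparse and cannot correct flawed reasoning paths. The key is a new supervision framework, the Dimension-level Reward Model (DRM), which evaluates the reasoning process along three complementary and interpretable dimensions: Confidence for uncertainty calibration, Relevance for semantic alignment, and Coherence for logical consistency. This multidimensional supervision does not depend on ground truth answers and provides dense, interpretable feedback that guides LLM optimization, yielding consistent gains on in-distribution and out-of-distribution tasks such as mathematics, question answering, code execution, and puzzles.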

Link: https://arxiv.org/abs/2510.11457
Authors: Beining Wang, Weihang Su, Hongtao Tian, Tao Yang, Yujia Zhou, Ting Yao, Qingyao Ai, Yiqun Liu
Affiliations: unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

Abstract:Improving the multi-step reasoning ability of Large Language Models (LLMs) is a critical yet challenging task. The dominant paradigm, outcome-supervised reinforcement learning (RLVR), rewards only correct final answers, often propagating flawed reasoning and suffering from sparse reward signals. While process-level reward models (PRMs) provide denser, step-by-step feedback, they lack generalizability and interpretability, requiring task-specific segmentation of the reasoning process. To this end, we propose the Dimension-level Reward Model (DRM), a new supervision framework that bridges the gap between these two approaches. DRM evaluates the quality of a reasoning process along three fundamental, complementary, and interpretable dimensions: Confidence for uncertainty calibration, Relevance for semantic alignment, and Coherence for logical consistency. Together, these dimensions capture aspects beyond final answer correctness and enable interpretable assessment without requiring ground truth answers. Experimental results show that DRM provides effective supervision signals, guides the optimization of LLMs and enhances their reasoning ability. In particular, DRM-supervised training achieves consistent gains on both in-distribution and out-of-distribution open-domain tasks, including mathematics, question answering, code execution, and puzzles. Our findings demonstrate that multidimensional supervision of the reasoning process can improve the generalized reasoning ability of LLMs beyond the training distribution.

[AI-25] Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning

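[Quick read]: This paper addresses the poor interpretability and low accuracy of current large audio language models (LALMs) on tasks that require structured knowledge or specialized signal analysis, which stem from relying solely on end-to-end reasoning. The key is the Audio-Maestro framework, which lets the model autonomously call external tools and integrate their timestamped outputs into the reasoning process, so that audio signals are analyzed, transformed, and interpreted through specialized tools rather than by end-to-end inference alone. Experiments show that the approach improves the average accuracy of several mainstream models on MMAU-Test.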

Link: https://arxiv.org/abs/2510.11454
Authors: Kuan-Yi Lee, Tsung-En Lin, Hung-Yi Lee
Affiliations: unknown
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI)
Comments: 9 pages

Abstract:Recent advancements in large multimodal models (LMMs) have shown strong capabilities in audio understanding. However, most systems rely solely on end-to-end reasoning, limiting interpretability and accuracy for tasks that require structured knowledge or specialized signal analysis. In this work, we present Audio-Maestro – a tool-augmented audio reasoning framework that enables audio-language models to autonomously call external tools and integrate their timestamped outputs into the reasoning process. This design allows the model to analyze, transform, and interpret audio signals through specialized tools rather than relying solely on end-to-end inference. Experiments show that Audio-Maestro consistently improves general audio reasoning performance: Gemini-2.5-flash’s average accuracy on MMAU-Test rises from 67.4% to 72.1%, DeSTA-2.5 from 58.3% to 62.8%, and GPT-4o from 60.8% to 63.9%. To our knowledge, Audio-Maestro is the first framework to integrate structured tool output into the large audio language model reasoning process.

[AI-26] Reconstructing 12-Lead ECG from 3-Lead ECG using Variational Autoencoder to Improve Cardiac Disease Detection of Wearable ECG Devices

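[Quick read]: This paper addresses the difficulty of detecting cardiac conditions such as myocardial infarction (MI) with three-lead electrocardiograms (ECGs), whose spatial coverage is limited, while still meeting the portability and continuous-monitoring needs of wearable devices. The key is WearECG, a generative model based on a Variational Autoencoder (VAE) that reconstructs the full twelve-lead ECG from three standard leads (II, V1, V5), with architectural improvements that better capture the temporal and spatial dependencies of ECG signals, producing physiologically plausible and diagnostically useful reconstructions.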

Link: https://arxiv.org/abs/2510.11442
Authors: Xinyan Guan, Yongfan Lai, Jiarui Jin, Jun Li, Haoyu Wang, Qinghao Zhao, Deyun Zhang, Shijia Geng, Shenda Hong
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 24 pages, 5 figures, submitted to Nature Communications

Abstract:Twelve-lead electrocardiograms (ECGs) are the clinical gold standard for cardiac diagnosis, providing comprehensive spatial coverage of the heart necessary to detect conditions such as myocardial infarction (MI). However, their lack of portability limits continuous and large-scale use. Three-lead ECG systems are widely used in wearable devices due to their simplicity and mobility, but they often fail to capture pathologies in unmeasured regions. To address this, we propose WearECG, a Variational Autoencoder (VAE) method that reconstructs twelve-lead ECGs from three leads: II, V1, and V5. Our model includes architectural improvements to better capture temporal and spatial dependencies in ECG signals. We evaluate generation quality using MSE, MAE, and Frechet Inception Distance (FID), and assess clinical validity via a Turing test with expert cardiologists. To further validate diagnostic utility, we fine-tune ECGFounder, a large-scale pretrained ECG model, on a multi-label classification task involving over 40 cardiac conditions, including six different myocardial infarction locations, using both real and generated signals. Experiments on the MIMIC dataset show that our method produces physiologically realistic and diagnostically informative signals, with robust performance in downstream tasks. This work demonstrates the potential of generative modeling for ECG reconstruction and its implications for scalable, low-cost cardiac screening.

[AI-27] Living Off the LLM: How LLMs Will Change Adversary Tactics

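[Quick read]: This paper addresses the risk that future on-device large language models (LLMs) will be integrated by attackers into living off the land attack chains to evade traditional security detection. The key is to identify the ways LLMs could be abused within the attack pipeline and to propose defenses built collaboratively by the security community, including anomaly detection on LLM invocation behavior, least-privilege controls, and system-level monitoring, in order to reduce the stealth and impact of generative AI in attack scenarios.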

Link: https://arxiv.org/abs/2510.11398
Authors: Sean Oesch, Jack Hutchins, Luke Koch, Kevin Kurian
Affiliations: unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments: 6 pages, 0 figures

Abstract:In living off the land attacks, malicious actors use legitimate tools and processes already present on a system to avoid detection. In this paper, we explore how the on-device LLMs of the future will become a security concern as threat actors integrate LLMs into their living off the land attack pipeline and ways the security community may mitigate this threat.

[AI-28] Medical Interpretability and Knowledge Maps of Large Language Models

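[Quick read]: This paper addresses the interpretability of large language models (LLMs) in the medical domain, that is, how the models represent and process medical knowledge. The core challenge is understanding where medical concepts (such as patient age, symptoms, diseases, and drugs) are stored inside the model and how they evolve across layers. The key is the systematic application of four interpretability techniques: (1) UMAP projections of intermediate activations to visualize the knowledge distribution, (2) gradient-based saliency to identify important weights, (3) layer lesioning experiments to assess each layer's effect on the output, and (4) activation patching to probe the function of specific layers. With these methods the authors build knowledge maps for five LLMs, finding for example that in Llama3.3-70B medical knowledge is concentrated in the first half of the layers, that age is encoded non-linearly, and that disease progression is represented in a non-monotonic, circular way. The results give layer-level empirical guidance for later fine-tuning, de-biasing, or un-learning on medical tasks.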

Link: https://arxiv.org/abs/2510.11390
Authors: Razvan Marinescu, Victoria-Elisabeth Gruber, Diego Fajardo
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 29 pages, 34 figures, 5 tables

Abstract:We present a systematic study of medical-domain interpretability in Large Language Models (LLMs). We study how the LLMs both represent and process medical knowledge through four different interpretability techniques: (1) UMAP projections of intermediate activations, (2) gradient-based saliency with respect to the model weights, (3) layer lesioning/removal and (4) activation patching. We present knowledge maps of five LLMs which show, at a coarse resolution, where knowledge about patients’ ages, medical symptoms, diseases and drugs is stored in the models. In particular for Llama3.3-70B, we find that most medical knowledge is processed in the first half of the model’s layers. In addition, we find several interesting phenomena: (i) age is often encoded in a non-linear and sometimes discontinuous manner at intermediate layers in the models, (ii) the disease progression representation is non-monotonic and circular at certain layers of the model, (iii) in Llama3.3-70B, drugs cluster better by medical specialty than by mechanism of action, and (iv) Gemma3-27B and MedGemma-27B have activations that collapse at intermediate layers but recover by the final layers. These results can guide future research on fine-tuning, un-learning or de-biasing LLMs for medical tasks by suggesting at which layers in the model these techniques should be applied.

[AI-29] AI-Driven anemia diagnosis: A review of advanced models and techniques

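[Quick read]: This paper addresses the insufficient accuracy and timeliness of anemia diagnosis, with the aim of improving management and treatment. The key is a systematic review of recent progress in applying artificial intelligence, in particular machine learning (ML) and deep learning (DL), to anemia detection, classification, and diagnosis. The review compares models on performance metrics including accuracy, sensitivity, specificity, and precision, evaluates the strengths and limitations of each class of model, and emphasizes the importance of addressing these factors to improve diagnostic accuracy.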

Link: https://arxiv.org/abs/2510.11380
Authors: Abdullah Al Mahmud, Prangon Chowdhury, Mohammed Borhan Uddin, Khaled Eabne Delowar, Tausifur Rahman Talha, Bijoy Dewanjee
Affiliations: unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

Abstract:Anemia, a condition marked by insufficient levels of red blood cells or hemoglobin, remains a widespread health issue affecting millions of individuals globally. Accurate and timely diagnosis is essential for effective management and treatment of anemia. In recent years, there has been a growing interest in the use of artificial intelligence techniques, i.e., machine learning (ML) and deep learning (DL), for the detection, classification, and diagnosis of anemia. This paper provides a systematic review of the recent advancements in this field, with a focus on various models applied to anemia detection. The review also compares these models based on several performance metrics, including accuracy, sensitivity, specificity, and precision. By analyzing these metrics, the paper evaluates the strengths and limitations of the discussed models in detecting and classifying anemia, emphasizing the importance of addressing these factors to improve diagnostic accuracy.
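
For readers comparing the four metrics named in the abstract, the sketch below computes them from the cells of a binary confusion matrix using their standard definitions. The function name is hypothetical, and it assumes every denominator is non-zero.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: anemic cases correctly/incorrectly classified,
    tn/fp: healthy cases correctly/incorrectly classified.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on the positive class
        "specificity": tn / (tn + fp),   # recall on the negative class
        "precision": tp / (tp + fp),     # share of positive calls that are right
    }
```

Sensitivity and specificity do not depend on how common anemia is in the test set, whereas accuracy and precision do, which is one reason reviews tend to report all four side by side.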

[AI-30] Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks NEURIPS2025

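[Quick read]: This paper addresses the gap between theoretical analysis and practice for adaptive gradient methods such as Adam, in particular the lack of theory for the stochastic Adam that is widely used in practice. The key contribution is the first theoretical characterization of how batch size affects Adam's generalization, obtained by analyzing two-layer over-parameterized convolutional neural networks (CNNs) on image data. The analysis shows that full-batch Adam and its weight-decay variant AdamW converge to solutions with poor test error, whereas their mini-batch versions can reach near-zero test error. It further proves that Adam's effective weight decay bound is strictly smaller than AdamW's, which explains theoretically why Adam is more sensitive to tuning of the weight decay coefficient λ. Experiments validate these findings and underline the decisive role of batch size and weight decay in Adam's generalization.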

Link: https://arxiv.org/abs/2510.11354
Authors: Xuan Tang, Han Zhang, Yuan Cao, Difan Zou
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments: 71 pages, 12 figures, NeurIPS 2025

Abstract:Adam is a popular and widely used adaptive gradient method in deep learning, which has also received tremendous focus in theoretical research. However, most existing theoretical work primarily analyzes its full-batch version, which differs fundamentally from the stochastic variant used in practice. Unlike SGD, stochastic Adam does not converge to its full-batch counterpart even with infinitesimal learning rates. We present the first theoretical characterization of how batch size affects Adam’s generalization, analyzing two-layer over-parameterized CNNs on image data. Our results reveal that while both Adam and AdamW with proper weight decay λ converge to poor test error solutions, their mini-batch variants can achieve near-zero test error. We further prove Adam has a strictly smaller effective weight decay bound than AdamW, theoretically explaining why Adam requires more sensitive λ tuning. Extensive experiments validate our findings, demonstrating the critical role of batch size and weight decay in Adam’s generalization performance.
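
The distinction between Adam with weight decay and AdamW comes down to where the decay term enters the update. The scalar sketch below follows the standard textbook update rules, not the paper's analysis; the function names and default hyperparameters are illustrative.

```python
import math


def new_state():
    """Fresh optimizer state: step count and moment estimates."""
    return {"t": 0, "m": 0.0, "v": 0.0}


def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One scalar Adam (decoupled=False) or AdamW (decoupled=True) step."""
    if not decoupled:
        # Adam with L2 regularization: decay enters through the gradient,
        # so it is later rescaled by the adaptive denominator.
        grad = grad + weight_decay * theta
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected mean
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected variance
    update = m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        # AdamW: decay is applied directly to the weights.
        update = update + weight_decay * theta
    return theta - lr * update
```

With a zero loss gradient, the Adam variant still takes a step of roughly `lr` in magnitude because the decay term is normalized like any other gradient, while AdamW shrinks the weight by exactly `lr * weight_decay * theta`. That difference in how λ acts is consistent with the abstract's remark that the two need different λ tuning.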

[AI-31] Multi-View Graph Feature Propagation for Privacy Preservation and Feature Sparsity

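[Quick read]: This paper addresses the performance degradation and privacy leakage of graph neural networks (GNNs) in node classification when features are sparse or contain sensitive information. The key is a Multi-view Feature Propagation (MFP) framework that divides the available features into multiple Gaussian-noised views, propagates information through the graph topology independently in each view, and aggregates the results into expressive and robust node embeddings. MFP improves robustness under extreme sparsity and provides a principled mechanism for balancing task utility against privacy; experiments show that it maintains classification performance while substantially reducing privacy leakage.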

Link: https://arxiv.org/abs/2510.11347
Authors: Etzion Harari, Moshe Unger
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

Abstract:Graph Neural Networks (GNNs) have demonstrated remarkable success in node classification tasks over relational data, yet their effectiveness often depends on the availability of complete node features. In many real-world scenarios, however, feature matrices are highly sparse or contain sensitive information, leading to degraded performance and increased privacy risks. Furthermore, direct exposure of information can result in unintended data leakage, enabling adversaries to infer sensitive information. To address these challenges, we propose a novel Multi-view Feature Propagation (MFP) framework that enhances node classification under feature sparsity while promoting privacy preservation. MFP extends traditional Feature Propagation (FP) by dividing the available features into multiple Gaussian-noised views, each propagating information independently through the graph topology. The aggregated representations yield expressive and robust node embeddings. This framework is novel in two respects: it introduces a mechanism that improves robustness under extreme sparsity, and it provides a principled way to balance utility with privacy. Extensive experiments conducted on graph datasets demonstrate that MFP outperforms state-of-the-art baselines in node classification while substantially reducing privacy leakage. Moreover, our analysis demonstrates that propagated outputs serve as alternative imputations rather than reconstructions of the original features, preserving utility without compromising privacy. A comprehensive sensitivity analysis further confirms the stability and practical applicability of MFP across diverse scenarios. Overall, MFP provides an effective and privacy-aware framework for graph learning in domains characterized by missing or sensitive features.
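
As a rough illustration of "multiple Gaussian-noised views, each propagating independently", the sketch below runs a simple neighbour-averaging feature propagation on one scalar feature per node and averages the views. How the paper actually splits features into views is not stated in the abstract, so the random `keep` mask, the averaging aggregator, and all names here are assumptions.

```python
import random


def propagate(adj, values, known, steps=20):
    """Diffuse known values to neighbours, clamping known nodes each step."""
    x = [values[i] if known[i] else 0.0 for i in range(len(adj))]
    for _ in range(steps):
        nxt = []
        for i, nbrs in enumerate(adj):
            if known[i]:
                nxt.append(values[i])  # keep observed entries fixed
            elif nbrs:
                nxt.append(sum(x[j] for j in nbrs) / len(nbrs))
            else:
                nxt.append(x[i])       # isolated node: nothing to average
        x = nxt
    return x


def multi_view_propagate(adj, values, known, views=4, keep=0.7,
                         sigma=0.1, seed=0):
    """Average several propagations, each from a noised, subsampled view."""
    rng = random.Random(seed)
    total = [0.0] * len(adj)
    for _ in range(views):
        # Each view sees a random subset of the known features...
        view_known = [k and rng.random() < keep for k in known]
        # ...perturbed with Gaussian noise before propagation.
        noisy = [v + rng.gauss(0.0, sigma) if k else 0.0
                 for v, k in zip(values, view_known)]
        out = propagate(adj, noisy, view_known)
        total = [t + o for t, o in zip(total, out)]
    return [t / views for t in total]
```

Here `adj` is an adjacency list, so `adj = [[1], [0, 2], [1]]` describes a three-node path. Larger `sigma` and smaller `keep` expose less of the original feature values to any single view, at the cost of noisier imputations.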

[AI-32] Part II: ROLL Flash – Accelerating RLVR and Agentic Training with Asynchrony

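[Quick read]: This paper addresses the low resource utilization and limited scalability of synchronous reinforcement learning (RL) post-training for large language models (LLMs). The key is ROLL Flash, a system built on two core design principles, fine-grained parallelism and rollout-train decoupling, that supports an asynchronous RL post-training architecture and uses mechanisms such as queue scheduling and environment-level asynchronous execution for efficient sampling and training. With the same GPU budget as synchronous baselines, it achieves up to 2.24x speedup on RLVR tasks and up to 2.72x on agentic tasks, while reaching performance on par with synchronous training.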

Link: https://arxiv.org/abs/2510.11345
Authors: Han Lu, Zichen Liu, Shaopan Xiong, Yancheng He, Wei Gao, Yanan Wu, Weixun Wang, Jiashun Liu, Yang Li, Haizhou Zhao, Ju Huang, Siran Yang, Xiaoyang Li, Yijia Luo, Zihe Liu, Ling Pan, Junchi Yan, Wei Wang, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

Abstract:Synchronous Reinforcement Learning (RL) post-training has emerged as a crucial step for enhancing Large Language Models (LLMs) with diverse capabilities. However, many systems designed to accelerate RL post-training still suffer from low resource utilization and limited scalability. We present ROLL Flash, a system that extends ROLL with native support for asynchronous RL post-training. ROLL Flash is built upon two core design principles: fine-grained parallelism and rollout-train decoupling. Guided by these principles, ROLL Flash provides flexible programming interfaces that enable a fully asynchronous training architecture and support efficient rollout mechanisms, including queue scheduling and environment-level asynchronous execution. Through comprehensive theoretical analysis and extensive experiments, we demonstrate that ROLL Flash significantly improves resource utilization and scalability over synchronous RL post-training. ROLL Flash achieves up to 2.24x speedup on RLVR tasks and 2.72x on agentic tasks, using the same GPU budget as synchronous baselines. Furthermore, we implement several popular off-policy algorithms and verify that asynchronous training can achieve performance on par with synchronous training.
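
Rollout-train decoupling can be pictured as a producer-consumer pipeline: rollout workers push samples into a queue while the trainer consumes them in batches. The sketch below uses Python's `asyncio` purely as an illustration; it is not ROLL Flash's interface, and the worker, trainer, and parameter names are hypothetical.

```python
import asyncio


async def rollout_worker(worker_id, queue, episodes):
    """Produce samples independently of the trainer's pace."""
    for step in range(episodes):
        await asyncio.sleep(0)              # stand-in for environment latency
        await queue.put((worker_id, step))  # hand the sample to the trainer


async def trainer(queue, total, batch_size):
    """Consume samples as they arrive and group them into batches."""
    batches, batch = [], []
    for _ in range(total):
        batch.append(await queue.get())
        if len(batch) == batch_size:
            batches.append(batch)           # stand-in for a gradient update
            batch = []
    if batch:
        batches.append(batch)               # final partial batch
    return batches


async def run(workers=3, episodes=4, batch_size=5):
    # A bounded queue caps how far rollouts can run ahead of training.
    queue = asyncio.Queue(maxsize=8)
    producers = [asyncio.create_task(rollout_worker(i, queue, episodes))
                 for i in range(workers)]
    batches = await trainer(queue, workers * episodes, batch_size)
    await asyncio.gather(*producers)
    return batches
```

Calling `asyncio.run(run())` drives the pipeline. In a synchronous design the trainer would wait for every worker to finish a round before updating; here a slow worker only delays its own samples.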

[AI-33] Event-Aware Prompt Learning for Dynamic Graphs

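[Quick read]: This paper addresses the fact that existing methods for dynamic graph learning generally focus on the relationship between nodes and time and overlook the impact of historical events on node representations. The key is EVP, an event-aware dynamic graph prompt learning framework with two core mechanisms: an event adaptation mechanism that aligns the fine-grained characteristics of each node's historical events with downstream tasks, and an event aggregation mechanism that integrates historical event knowledge into node representations, strengthening the model's use of historical context.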

Link: https://arxiv.org/abs/2510.11339
Authors: Xingtong Yu, Ruijuan Liang, Xinming Zhang, Yuan Fang
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Under review

Abstract:Real-world graphs typically evolve via a series of events, modeling dynamic interactions between objects across various domains. For dynamic graph learning, dynamic graph neural networks (DGNNs) have emerged as popular solutions. Recently, prompt learning methods have been explored on dynamic graphs. However, existing methods generally focus on capturing the relationship between nodes and time, while overlooking the impact of historical events. In this paper, we propose EVP, an event-aware dynamic graph prompt learning framework that can serve as a plug-in to existing methods, enhancing their ability to leverage historical event knowledge. First, we extract a series of historical events for each node and introduce an event adaptation mechanism to align the fine-grained characteristics of these events with downstream tasks. Second, we propose an event aggregation mechanism to effectively integrate historical knowledge into node representations. Finally, we conduct extensive experiments on four public datasets to evaluate and analyze EVP.

[AI-34] Automated Skill Decomposition Meets Expert Ontologies: Bridging the Granularity Gap with LLMs

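[Quick read]: This paper addresses the lack of standardized evaluation and structural consistency guarantees in automated skill decomposition, focusing on how to use large language models (LLMs) to generate decompositions that respect the semantics and hierarchy of an ontology. The key is a rigorous, ontology-grounded evaluation framework that standardizes the full pipeline from prompting and generation to normalization and alignment with ontology nodes. It introduces two core metrics: a semantic F1-score based on optimal embedding matching to measure content accuracy, and a hierarchy-aware F1-score to assess whether the decomposition granularity is correct. Experiments show that leakage-safe few-shot prompting with exemplars, compared with zero-shot prompting, more consistently improves semantic accuracy and hierarchy alignment, and is competitive or even faster in latency because its completions comply better with the predefined schema.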

Link: https://arxiv.org/abs/2510.11313
Authors: Le Ngoc Luyen, Marie-Hélène Abel
Affiliations: unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

Abstract:This paper investigates automated skill decomposition using Large Language Models (LLMs) and proposes a rigorous, ontology-grounded evaluation framework. Our framework standardizes the pipeline from prompting and generation to normalization and alignment with ontology nodes. To evaluate outputs, we introduce two metrics: a semantic F1-score that uses optimal embedding-based matching to assess content accuracy, and a hierarchy-aware F1-score that credits structurally correct placements to assess granularity. We conduct experiments on ROME-ESCO-DecompSkill, a curated subset of parents, comparing two prompting strategies: zero-shot and leakage-safe few-shot with exemplars. Across diverse LLMs, zero-shot offers a strong baseline, while few-shot consistently stabilizes phrasing and granularity and improves hierarchy-aware alignment. A latency analysis further shows that exemplar-guided prompts are competitive - and sometimes faster - than unguided zero-shot due to more schema-compliant completions. Together, the framework, benchmark, and metrics provide a reproducible foundation for developing ontology-faithful skill decomposition systems.
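
One plausible reading of a "semantic F1-score with optimal matching" is sketched below: find the best one-to-one pairing between predicted and gold skills, then turn the matched similarity into soft precision and recall. The paper's exact definition is not given in the abstract. Token-level Jaccard stands in for embedding similarity, brute-force search stands in for a proper assignment solver, and all names are hypothetical.

```python
from itertools import permutations


def jaccard(a, b):
    """Token-overlap similarity; a stand-in for embedding cosine similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def best_matching_score(pred, gold, sim=jaccard):
    """Total similarity of the best one-to-one matching (brute force).

    Assumes `sim` is symmetric and the lists are short.
    """
    small, large = (pred, gold) if len(pred) <= len(gold) else (gold, pred)
    best = 0.0
    for perm in permutations(range(len(large)), len(small)):
        score = sum(sim(small[i], large[j]) for i, j in enumerate(perm))
        best = max(best, score)
    return best


def semantic_f1(pred, gold, sim=jaccard):
    """Soft F1: matched similarity mass over predicted and gold counts."""
    if not pred or not gold:
        return 0.0
    matched = best_matching_score(pred, gold, sim)
    precision, recall = matched / len(pred), matched / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For realistic list sizes, the brute-force loop would be replaced by a polynomial-time assignment algorithm; the scoring logic stays the same.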

[AI-35] Beyond touch-based HMI: Control your machines in natural language by utilizing large language models and OPC UA

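[Quick read]: This paper addresses the lack of naturalness in industrial human-machine interfaces (HMIs), which today rely mainly on touch interaction and offer no intuitive, flexible way to issue commands. The key is an agent-based architecture in which a large language model (LLM), equipped with tool calling and the OPC UA communication standard, lets operators control OPC UA-capable machines directly in natural language (for example, 'Please decrease the temperature by 20 % in machine 1 and set the motor speed to 5000 rpm in machine 2'). The method requires neither fine-tuning nor training data: only the machine credentials and a parameter dictionary are placed in the system prompt. In a case study on a Siemens S7-1500 PLC, proprietary GPT-5 models reached accuracies of 96.0 % to 98.0 % and open-weight models up to 90.0 %, supporting the generality and practicality of the approach.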

Link: https://arxiv.org/abs/2510.11300
Authors: Bernd Hofmann, Sven Kreitlein, Joerg Franke, Patrick Bruendl
Affiliations: unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Comments:

Abstract:This paper proposes an agent-based approach toward a more natural interface between humans and machines. Large language models equipped with tools and the communication standard OPC UA are utilized to control machines in natural language. Instead of touch interaction, which is currently the state-of-the-art medium for interaction in operations, the proposed approach enables operators to talk or text with machines. This allows commands such as ‘Please decrease the temperature by 20 % in machine 1 and set the motor speed to 5000 rpm in machine 2.’ The large language model receives the user input and selects one of three predefined tools that connect to an OPC UA server and either change or read the value of a node. Afterwards, the result of the tool execution is passed back to the language model, which then provides a final response to the user. The approach is universally designed and can therefore be applied to any machine that supports the OPC UA standard. The large language model is neither fine-tuned nor requires training data; only the relevant machine credentials and a parameter dictionary are included within the system prompt. The approach is evaluated on a Siemens S7-1500 programmable logic controller with four machine parameters in a case study of fifty synthetically generated commands on five different models. The results demonstrate a high success rate, with proprietary GPT 5 models achieving accuracies between 96.0 % and 98.0 %, and open-weight models reaching up to 90.0 %. The proposed approach of this empirical study contributes to advancing natural interaction in industrial human-machine interfaces.
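
The tool-calling loop described in the abstract can be sketched with an in-memory stand-in for the OPC UA server. The abstract does not name the three tools, so `read_node`, `write_node`, and `scale_node`, the node identifiers, and the shape of the tool-call dictionary are all hypothetical; no real OPC UA library is used.

```python
class FakeOpcUaServer:
    """In-memory stand-in for an OPC UA server (node id -> value)."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)


def read_node(server, node_id):
    """Return the current value of a node."""
    return server.nodes[node_id]


def write_node(server, node_id, value):
    """Set a node to an absolute value and return it."""
    server.nodes[node_id] = value
    return value


def scale_node(server, node_id, percent):
    """Change a node by a signed percentage of its current value."""
    return write_node(server, node_id,
                      server.nodes[node_id] * (1 + percent / 100))


# Hypothetical tool registry that the language model would choose from.
TOOLS = {"read_node": read_node, "write_node": write_node,
         "scale_node": scale_node}


def execute_tool_call(server, call):
    """Run one tool call of the form {'name': ..., 'arguments': {...}}."""
    tool = TOOLS[call["name"]]
    return tool(server, **call["arguments"])
```

For the example command in the abstract, a model might emit a `scale_node` call with `percent=-20` for machine 1 and a `write_node` call with `value=5000` for machine 2. The returned values would then be passed back to the model to phrase its final reply.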

[AI-36] LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences

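Quick read: This paper addresses the significant memory overhead that the key-value (KV) cache introduces for auto-regressive models in long-sequence scenarios, which limits practical deployment. Existing KV retrieval methods ease memory pressure by dynamically keeping only a subset of KV entries on the GPU, but per-token retrieval and coarse-grained page-level management still create efficiency and accuracy bottlenecks in long-output reasoning tasks. The solution rests on two observations: first, critical KVs show strong temporal locality during decoding; second, these KVs follow different distribution patterns in the input prompt and in the generated output. On this basis the authors propose LouisKV, whose contributions are: (1) a semantic-aware retrieval strategy that triggers retrieval only at semantic boundaries, sharply reducing computation and data-transfer overhead; (2) a decoupled, fine-grained management scheme that applies different strategies to input and output sequences, building retrieval units that better match the model's attention patterns and so identify critical KVs precisely; (3) several kernel-level optimizations (including custom Triton and CUDA kernels) that accelerate KV clustering and retrieval. Experiments show that LouisKV achieves up to 4.7× speedup over state-of-the-art KV retrieval methods while maintaining near-lossless accuracy across a range of long-sequence tasks.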

Link: https://arxiv.org/abs/2510.11292
Authors: Wenbo Wu,Qingyi Si,Xiurui Pan,Ye Wang,Jie Zhang
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

Abstract: While Key-Value (KV) cache succeeds in reducing redundant computations in auto-regressive models, it introduces significant memory overhead, limiting its practical deployment in long-sequence scenarios. Existing KV retrieval methods mitigate this by dynamically retaining only a subset of KV entries on the GPU. However, they still suffer from notable efficiency and accuracy bottlenecks due to per-token retrieval and coarse-grained page-level KV management, especially in long-output reasoning scenarios. With the emergence of large reasoning models, efficiently handling such scenarios has become increasingly important. To address this issue, we present two key observations: (1) critical KVs exhibit strong temporal locality during decoding, and (2) these KVs exhibit distinct distribution patterns across the input prompt and generated output. Building on these observations, we propose LouisKV, an efficient KV cache retrieval framework designed for various long-sequence scenarios. Specifically, LouisKV introduces a semantic-aware retrieval strategy leveraging temporal locality to trigger retrieval only at semantic boundaries, drastically reducing computation and data transfer overhead. LouisKV also designs a decoupled, fine-grained management scheme that tailors differentiated strategies for input and output sequences to create retrieval units that better match the model’s attention patterns, enabling precise identification of critical KVs. Furthermore, to boost efficiency, LouisKV incorporates several kernel-level optimizations, including custom Triton and CUDA kernels to accelerate the KV clustering and retrieval. Evaluations show that LouisKV achieves up to 4.7× speedup over state-of-the-art KV retrieval methods while maintaining near-lossless accuracy across diverse long-sequence tasks, including long-input short-output, short-input long-output, and long-input long-output scenarios.

[AI-37] Evolution in Simulation: AI-Agent School with Dual Memory for High-Fidelity Educational Dynamics EMNLP

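Quick read: This paper addresses fragmented modeling of the teaching process and the limited performance of agents when simulating diverse educational participants. The key to the solution is the AI-Agent School (AAS) system, built on a self-evolving mechanism that runs a continuous “experience-reflection-optimization” cycle. It is grounded in a dual memory base comprising an experience base and a knowledge base and incorporates short-term and long-term memory components, so that agents evolve autonomously through situated interactions in diverse simulated school scenarios and more accurately model the complex, multi-faceted teacher-student interactions and learning processes found in real schools.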

Link: https://arxiv.org/abs/2510.11290
Authors: Sheng Jin,Haoming Wang,Zhiqi Gao,Yongbo Yang,Bao Chunjia,Chengliang Wang
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Comments: 9 pages, 7 figures, EMNLP conference

Abstract:Large language models (LLMs) based Agents are increasingly pivotal in simulating and understanding complex human systems and interactions. We propose the AI-Agent School (AAS) system, built around a self-evolving mechanism that leverages agents for simulating complex educational dynamics. Addressing the fragmented issues in teaching process modeling and the limitations of agents performance in simulating diverse educational participants, AAS constructs the Zero-Exp strategy, employs a continuous “experience-reflection-optimization” cycle, grounded in a dual memory base comprising experience and knowledge bases and incorporating short-term and long-term memory components. Through this mechanism, agents autonomously evolve via situated interactions within diverse simulated school scenarios. This evolution enables agents to more accurately model the nuanced, multi-faceted teacher-student engagements and underlying learning processes found in physical schools. Experiment confirms that AAS can effectively simulate intricate educational dynamics and is effective in fostering advanced agent cognitive abilities, providing a foundational stepping stone from the “Era of Experience” to the “Era of Simulation” by generating high-fidelity behavioral and interaction data.

[AI-38] PADME: Procedure Aware DynaMic Execution

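Quick read: This paper addresses a core challenge for intelligent agents: autonomously executing long-horizon procedures from natural language. Free-form instructions such as recipes, scientific protocols, or business workflows are unstructured and highly variable, which causes agents driven by large language models (LLMs) to drift or fail during execution. The key to the solution is an agent framework called Procedure Aware DynaMic Execution (PADME). Its core innovation is a two-phase method that automatically converts unstructured procedural text into an executable graph-based representation that explicitly models task dependencies, decision points, and reusable subroutines. The Teach phase handles systematic structuring and enrichment with executable logic; the Execute phase supports dynamic responses to real-time inputs and environment feedback. This separation provides quality assurance and scalability across contexts, and the inductive bias of the graph structure reduces error accumulation in long-horizon reasoning.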

Link: https://arxiv.org/abs/2510.11281
Authors: Deepeka Garg,Sihan Zeng,Annapoorani L. Narayanan,Sumitra Ganesh,Leo Ardon
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments:

Abstract:Learning to autonomously execute long-horizon procedures from natural language remains a core challenge for intelligent agents. Free-form instructions such as recipes, scientific protocols, or business workflows encode rich procedural knowledge, but their variability and lack of structure cause agents driven by large language models (LLMs) to drift or fail during execution. We introduce Procedure Aware DynaMic Execution (PADME), an agent framework that produces and exploits a graph-based representation of procedures. Unlike prior work that relies on manual graph construction or unstructured reasoning, PADME autonomously transforms procedural text into executable graphs that capture task dependencies, decision points, and reusable subroutines. Central to PADME is a two-phase methodology; Teach phase, which focuses on systematic structuring, enrichment with executable logic of procedures, followed by Execute phase, which enables dynamic execution in response to real-time inputs and environment feedback. This separation ensures quality assurance and scalability, allowing expert knowledge to be encoded once and reliably reused across varying contexts. The graph representation also provides an inductive bias that reduces error accumulation in long-horizon reasoning, underscoring the importance of structured procedure modeling for reliable agent-driven automation. Empirically, PADME achieves state-of-the-art performance on four diverse benchmarks, including ALFWorld and ScienceWorld. These results demonstrate that agents equipped with graph-based procedure representations offer a powerful intermediate abstraction for robust and generalizable execution.
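To make the idea of an executable procedure graph more concrete, here is a small sketch of what such a graph and its execution loop might look like. It is an assumption-laden toy, not PADME itself: the node schema (`kind`, `run`, `test`, `branches`, `next`), the recipe steps, and the `run_procedure` helper are all hypothetical, and reusable subroutines are left out for brevity.

```python
def run_procedure(graph, start, state):
    """Walk the graph from `start`; return the ordered list of visited node ids."""
    trace = []
    node_id = start
    while node_id is not None:
        node = graph[node_id]
        trace.append(node_id)
        if node["kind"] == "action":
            node["run"](state)            # execute the step against the environment state
            node_id = node["next"]
        else:                             # decision point: pick a branch from feedback
            node_id = node["branches"][node["test"](state)]
    return trace


# Hypothetical recipe: boil, then simmer in 5-minute steps until 10 minutes have passed.
graph = {
    "boil": {"kind": "action",
             "run": lambda s: s.update(temp=100),
             "next": "check"},
    "check": {"kind": "decision",
              "test": lambda s: s["minutes"] >= 10,
              "branches": {True: "serve", False: "simmer"}},
    "simmer": {"kind": "action",
               "run": lambda s: s.update(minutes=s["minutes"] + 5),
               "next": "check"},
    "serve": {"kind": "action",
              "run": lambda s: s.update(served=True),
              "next": None},
}

state = {"minutes": 0}
trace = run_procedure(graph, "boil", state)
print(trace)
print(state)
```

Because every transition is an explicit edge, the executor cannot skip or reorder steps, which is one plausible reading of the inductive bias the abstract attributes to the graph representation.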

[AI-39] From Prompts to Packets: A View from the Network on ChatGPT, Copilot, and Gemini

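Quick read: This paper addresses the poorly understood network traffic that generative AI (GenAI) chatbots produce in mobile apps, in particular how it differs from traditional app traffic and what it means for mobile network management. The key to the solution is a dedicated capture architecture used to collect and label two complementary workloads: a 60-hour generic dataset (unconstrained prompts) and a controlled dataset (identical prompts replicated across apps). This enables fine-grained traffic characterization of three leading chatbots (ChatGPT, Copilot, and Gemini) at the trace, flow, and protocol levels, with packet-sequence dynamics modeled by Multimodal Markov Chains. The study finds that GenAI traffic is markedly app- and content-specific, for example in uplink/downlink profiles, protocol adoption (Gemini makes extensive use of QUIC, whereas ChatGPT uses TLS 1.3 exclusively), and Server Name Indication (SNI) values. SNI contributes substantially to classification: masking it lowers the F1-score by up to 20 percentage points. These results reveal the distinctiveness of GenAI traffic and the new challenges it poses for mobile network monitoring and management.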

Link: https://arxiv.org/abs/2510.11269
Authors: Antonio Montieri,Alfredo Nascita,Antonio Pescapè
Affiliation: Unknown
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Comments: 13 pages, 8 figures, 2 tables, 4 research questions, preprint submitted to Elsevier Computer Networks

Abstract:Generative AI (GenAI) chatbots are now pervasive in digital ecosystems, yet their network traffic remains largely underexplored. This study presents an in-depth investigation of traffic generated by three leading chatbots (ChatGPT, Copilot, and Gemini) when accessed via Android mobile apps for both text and image generation. Using a dedicated capture architecture, we collect and label two complementary workloads: a 60-hour generic dataset with unconstrained prompts, and a controlled dataset built from identical prompts across GenAI apps and replicated via conventional messaging apps to enable one-to-one comparisons. This dual design allows us to address practical research questions on the distinctiveness of GenAI traffic, its differences from widely deployed traffic categories, and its novel implications for network usage. To this end, we provide fine-grained traffic characterization at trace, flow, and protocol levels, and model packet-sequence dynamics with Multimodal Markov Chains. Our analyses reveal app- and content-specific traffic patterns, particularly in volume, uplink/downlink profiles, and protocol adoption. We highlight the predominance of TLS, with Gemini extensively leveraging QUIC, ChatGPT exclusively using TLS 1.3, and app- and content-specific Server Name Indication (SNI) values. A payload-based occlusion analysis quantifies SNI’s contribution to classification: masking it reduces F1-score by up to 20 percentage points in GenAI app traffic classification. Finally, compared with conventional messaging apps when carrying the same content, GenAI chatbots exhibit unique traffic characteristics, highlighting new stress factors for mobile networks, such as sustained upstream activity, with direct implications for network monitoring and management. We publicly release the datasets to support reproducibility and foster extensions to other use cases.

[AI-40] Large Language Models Are Effective Code Watermarkers

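Quick read: This paper addresses ethical and security problems in source-code distribution and attribution that arise from the widespread use of large language models (LLMs) and open-source code, such as unauthorized redistribution, license violations, and malicious use. Existing watermarking techniques depend on hand-crafted transformation rules, abstract syntax tree (AST) manipulation, or task-specific training, which limits their generality across languages, their scalability, and their robustness against attacks. The key to the solution is the CodeMark-LLM framework, built from two core modules: (i) a Semantically Consistent Embedding module that encodes watermark bits through functionality-preserving transformations, leaving code semantics and readability intact; (ii) a Differential Comparison Extraction module that identifies the embedded transformations by comparing the original and watermarked code. By drawing on the cross-lingual generalization ability of LLMs, the framework avoids language-specific engineering and training pipelines and shows robustness, effectiveness, and scalability across multiple programming languages and attack scenarios.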

Link: https://arxiv.org/abs/2510.11251
Authors: Rui Xu,Jiawei Chen,Zhaoxia Yin,Cong Kong,Xinpeng Zhang
Affiliation: Unknown
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

Abstract:The widespread use of large language models (LLMs) and open-source code has raised ethical and security concerns regarding the distribution and attribution of source code, including unauthorized redistribution, license violations, and misuse of code for malicious purposes. Watermarking has emerged as a promising solution for source attribution, but existing techniques rely heavily on hand-crafted transformation rules, abstract syntax tree (AST) manipulation, or task-specific training, limiting their scalability and generality across languages. Moreover, their robustness against attacks remains limited. To address these limitations, we propose CodeMark-LLM, an LLM-driven watermarking framework that embeds watermark into source code without compromising its semantics or readability. CodeMark-LLM consists of two core components: (i) Semantically Consistent Embedding module that applies functionality-preserving transformations to encode watermark bits, and (ii) Differential Comparison Extraction module that identifies the applied transformations by comparing the original and watermarked code. Leveraging the cross-lingual generalization ability of LLM, CodeMark-LLM avoids language-specific engineering and training pipelines. Extensive experiments across diverse programming languages and attack scenarios demonstrate its robustness, effectiveness, and scalability.

[AI-41] AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?

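Quick read: This paper addresses a key challenge in current AI alignment research: how to reduce the safety risk that arises when a single alignment technique fails. The core issue is that defense-in-depth, which deploys multiple redundant protections to improve safety, is effective only to the extent that failure modes are independent across alignment techniques; if the techniques share highly correlated failure modes, defense-in-depth adds no further protection. The paper's key contribution is a systematic analysis of the overlap between 7 representative alignment techniques and 7 typical failure modes, which assesses their independence and informs how future alignment research should be prioritized.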

Link: https://arxiv.org/abs/2510.11235
Authors: Leonard Dung,Florian Mai
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: under review

Abstract:AI alignment research aims to develop techniques to ensure that AI systems do not cause harm. However, every alignment technique has failure modes, which are conditions in which there is a non-negligible chance that the technique fails to provide safety. As a strategy for risk mitigation, the AI safety community has increasingly adopted a defense-in-depth framework: Conceding that there is no single technique which guarantees safety, defense-in-depth consists in having multiple redundant protections against safety failure, such that safety can be maintained even if some protections fail. However, the success of defense-in-depth depends on how (un)correlated failure modes are across alignment techniques. For example, if all techniques had the exact same failure modes, the defense-in-depth approach would provide no additional protection at all. In this paper, we analyze 7 representative alignment techniques and 7 failure modes to understand the extent to which they overlap. We then discuss our results’ implications for understanding the current level of risk and how to prioritize AI alignment research in the future.
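The overlap argument can be illustrated with a few lines of set arithmetic. The sketch below is only a toy under stated assumptions: the technique and failure-mode names are placeholders (not the 7 techniques or 7 failure modes the paper analyzes), and Jaccard similarity is one possible overlap measure chosen here for illustration, not necessarily the one the authors use.

```python
from itertools import combinations


def shared_failure_modes(vulnerabilities):
    """Failure modes to which every technique is vulnerable at once."""
    return set.intersection(*vulnerabilities.values())


def jaccard(a, b):
    """Overlap between two sets of failure modes (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b)


# Placeholder data: which failure modes each technique is assumed to be vulnerable to.
vulnerabilities = {
    "technique_A": {"mode_1", "mode_2"},
    "technique_B": {"mode_2", "mode_3"},
    "technique_C": {"mode_2", "mode_4"},
}

shared = shared_failure_modes(vulnerabilities)
pairwise = {
    (x, y): jaccard(vulnerabilities[x], vulnerabilities[y])
    for x, y in combinations(sorted(vulnerabilities), 2)
}
print(shared)
print(pairwise)
```

In this toy data the three layers back each other up on most modes, but `mode_2` defeats all of them, so stacking the techniques adds no protection against it. That is the situation the abstract warns about when failure modes are correlated.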

[AI-42] RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation

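Quick read: This paper addresses a security risk in generative AI systems that use retrieval-augmented generation (RAG): through imperceptible input perturbations, an attacker can induce the model to retrieve malicious code snippets from external knowledge sources, breaking the model's safety alignment and introducing vulnerabilities. The key contribution is a new black-box attack, RAG-Pull, which inserts hidden UTF characters into queries or external code repositories to redirect retrieval toward attacker-controlled malicious snippets, leading to serious risks such as remote code execution or SQL injection. The attack needs only minimal perturbations to shift the model's preference toward unsafe code, exposing a weak point in the security of RAG architectures.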

Link: https://arxiv.org/abs/2510.11195
Authors: Vasilije Stambolic,Aritra Dhar,Lukas Cavigelli
Affiliation: Unknown
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments:

Abstract:Retrieval-Augmented Generation (RAG) increases the reliability and trustworthiness of the LLM response and reduces hallucination by eliminating the need for model retraining. It does so by adding external data into the LLM’s context. We develop a new class of black-box attack, RAG-Pull, that inserts hidden UTF characters into queries or external code repositories, redirecting retrieval toward malicious code, thereby breaking the models’ safety alignment. We observe that query and code perturbations alone can shift retrieval toward attacker-controlled snippets, while combined query-and-target perturbations achieve near-perfect success. Once retrieved, these snippets introduce exploitable vulnerabilities such as remote code execution and SQL injection. RAG-Pull’s minimal perturbations can alter the model’s safety alignment and increase preference towards unsafe code, therefore opening up a new class of attacks on LLMs.
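From the defender's side, one simple check suggested by this attack description is to scan queries and indexed snippets for invisible characters before they reach the retriever. The sketch below is an assumption, not something the paper proposes: the abstract does not say which hidden characters RAG-Pull uses, so the code simply flags everything in Unicode general category `Cf` (format characters, which include zero-width spaces and joiners). The helper names are hypothetical.

```python
import unicodedata


def find_invisible_chars(text):
    """Return (index, Unicode name) for each format-category (Cf) character."""
    hits = []
    for index, char in enumerate(text):
        if unicodedata.category(char) == "Cf":
            hits.append((index, unicodedata.name(char, "UNKNOWN")))
    return hits


def strip_invisible_chars(text):
    """Drop all format-category characters before embedding or indexing."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# A query that renders like "parse json file" but carries two hidden characters.
query = "parse\u200b json\u200d file"
hits = find_invisible_chars(query)
clean = strip_invisible_chars(query)
print(hits)
print(repr(clean))
```

Stripping every `Cf` character is a blunt filter: it can alter legitimate text, for example emoji sequences and some scripts that rely on zero-width joiners, so a real deployment would probably log or review hits instead of silently rewriting input.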

[AI-43] Aligning Deep Implicit Preferences by Learning to Reason Defensively

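Quick read: This paper addresses a dual challenge in personalized alignment of large language models (LLMs): current methods fail to infer users' deep implicit preferences (including unstated goals, semantic context, and risk tolerances), and they lack the defensive reasoning needed to handle real-world ambiguity, so responses are superficial, brittle, and short-sighted. The key to the solution is a new framework, Critique-Driven Reasoning Alignment (CDRA). Its core innovations are: (1) the DeepPref benchmark dataset, built by simulating a multi-faceted cognitive council that produces critique-annotated reasoning chains to expose query semantics and latent risks; (2) the Personalized Generative Process Reward Model (Pers-GenPRM), which frames reward modeling as a personalized reasoning task, first generating a critique chain that evaluates how well a response aligns with user preferences and then outputting an interpretable score. Finally, the Critique-Driven Policy Alignment algorithm combines numerical and natural-language feedback in online reinforcement learning to achieve structured, interpretable alignment.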

Link: https://arxiv.org/abs/2510.11194
Authors: Peiming Li,Zhiyuan Hu,Yang Tang,Shiyu Li,Xi Chen
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments:

Abstract:Personalized alignment is crucial for enabling Large Language Models (LLMs) to engage effectively in user-centric interactions. However, current methods face a dual challenge: they fail to infer users’ deep implicit preferences (including unstated goals, semantic context and risk tolerances), and they lack the defensive reasoning required to navigate real-world ambiguity. This cognitive gap leads to responses that are superficial, brittle and short-sighted. To address this, we propose Critique-Driven Reasoning Alignment (CDRA), which reframes alignment from a scalar reward-matching task into a structured reasoning process. First, to bridge the preference inference gap, we introduce the DeepPref benchmark. This dataset, comprising 3000 preference-query pairs across 20 topics, is curated by simulating a multi-faceted cognitive council that produces critique-annotated reasoning chains to deconstruct query semantics and reveal latent risks. Second, to instill defensive reasoning, we introduce the Personalized Generative Process Reward Model (Pers-GenPRM), which frames reward modeling as a personalized reasoning task. It generates a critique chain to evaluate a response’s alignment with user preferences before outputting a final score based on this rationale. Ultimately, this interpretable, structured reward signal guides policy model through Critique-Driven Policy Alignment, a process-level online reinforcement learning algorithm integrating both numerical and natural language feedback. Experiments demonstrate that CDRA excels at discovering and aligning with users’ true preferences while executing robust reasoning. Our code and dataset are available at this https URL.

[AI-44] Protein as a Second Language for LLMs ICLR2026

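Quick read: This paper addresses the fundamental challenge of deciphering the function of unseen protein sequences. Existing methods usually depend on task-specific adapters or large-scale supervised fine-tuning, which generalize poorly and are data-hungry. The key to the solution is the “Protein-as-Second-Language” framework, which recasts amino-acid sequences as sentences in a symbolic language that large language models (LLMs) can interpret, and adaptively constructs sequence-question-answer triples as contextual exemplars that reveal functional cues in a zero-shot setting, with no further training. The process is supported by a curated bilingual corpus of 79,926 protein-QA instances. Across diverse open-source LLMs and GPT-4, the method yields consistent gains (ROUGE-L improvement of 7% on average and up to 17.2%) and even surpasses fine-tuned protein-specific language models. This indicates that generic LLMs, given protein-as-language cues, can outperform domain-specialized models, offering a scalable path to protein understanding in foundation models.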

Link: https://arxiv.org/abs/2510.11188
Authors: Xinhui Chen,Zuchao Li,Mengqi Gao,Yufeng Zhang,Chak Tou Leong,Haoyang Li,Jiaqi Chen
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)
Comments: Main paper: 9 pages, 6 figures. With references and appendix: 18 pages, 9 figures total. Submitted to ICLR 2026 (under review)

Abstract:Deciphering the function of unseen protein sequences is a fundamental challenge with broad scientific impact, yet most existing methods depend on task-specific adapters or large-scale supervised fine-tuning. We introduce the “Protein-as-Second-Language” framework, which reformulates amino-acid sequences as sentences in a novel symbolic language that large language models can interpret through contextual exemplars. Our approach adaptively constructs sequence-question-answer triples that reveal functional cues in a zero-shot setting, without any further training. To support this process, we curate a bilingual corpus of 79,926 protein-QA instances spanning attribute prediction, descriptive understanding, and extended reasoning. Empirically, our method delivers consistent gains across diverse open-source LLMs and GPT-4, achieving up to 17.2% ROUGE-L improvement (average +7%) and even surpassing fine-tuned protein-specific language models. These results highlight that generic LLMs, when guided with protein-as-language cues, can outperform domain-specialized models, offering a scalable pathway for protein understanding in foundation models.

[AI-45] Spec-Driven AI for Science: The ARIA Framework for Automated and Reproducible Data Analysis

Link: https://arxiv.org/abs/2510.11143
Authors: Chuke Chen,Biao Luo,Nan Li,Boxiang Wang,Hang Yang,Jing Guo,Ming Xu
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Comments: 19 pages, 5 figures

[AI-46] Improving AI Efficiency in Data Centres by Power Dynamic Response

Link: https://arxiv.org/abs/2510.11119
Authors: Andrea Marinoni,Sai Shivareddy,Pietro Lio’,Weisi Lin,Erik Cambria,Clare Grey
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
Comments:

[AI-47] PhysioME: A Robust Multimodal Self-Supervised Framework for Physiological Signals with Missing Modalities

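Quick read: This paper addresses missing multimodal data in medical applications of physiological signals, caused by hardware constraints or motion artifacts, which substantially degrades the performance of existing methods. The key to the solution is the PhysioME framework, which comprises: (1) a multimodal self-supervised learning strategy that combines contrastive learning with masked prediction to make the model robust to incomplete inputs; (2) a Dual-PathNeuroNet backbone designed to capture the temporal dynamics of each physiological signal modality; (3) a restoration decoder that reconstructs the tokens of missing modalities, allowing flexible processing of incomplete inputs. Experiments show that the framework achieves high consistency and generalization across a variety of missing-modality scenarios.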

Link: https://arxiv.org/abs/2510.11110
Authors: Cheol-Hui Lee,Hwa-Yeon Lee,Min-Kyung Jung,Dong-Joo Kim
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 9 pages, 2 figures

Abstract:Missing or corrupted modalities are common in physiological signal-based medical applications owing to hardware constraints or motion artifacts. However, most existing methods assume the availability of all modalities, resulting in substantial performance degradation in the absence of any modality. To overcome this limitation, this study proposes PhysioME, a robust framework designed to ensure reliable performance under missing modality conditions. PhysioME adopts: (1) a multimodal self-supervised learning approach that combines contrastive learning with masked prediction; (2) a Dual-PathNeuroNet backbone tailored to capture the temporal dynamics of each physiological signal modality; and (3) a restoration decoder that reconstructs missing modality tokens, enabling flexible processing of incomplete inputs. The experimental results show that PhysioME achieves high consistency and generalization performance across various missing modality scenarios. These findings highlight the potential of PhysioME as a reliable tool for supporting clinical decision-making in real-world settings with imperfect data availability.

[AI-48] A Vision for Access Control in LLM-based Agent Systems

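Quick read: This paper addresses the breakdown of traditional access control (AC) mechanisms for agents driven by large language models (LLMs), whose autonomy and contextual complexity exceed what such mechanisms were designed for. Static, rule-based access control cannot cope with dynamic information flows or adapt to the changing situations that arise in agent interactions. The key to the solution is a new framework, Agent Access Control (AAC), which recasts access control as a dynamic, context-aware process of information-flow governance. AAC has two core modules: multi-dimensional contextual evaluation, which weighs identity, relationships, scenarios, and norms; and adaptive response formulation, which shapes information through redaction, summarization, and paraphrasing rather than making a simple allow/deny decision. Powered by a dedicated AC reasoning engine, the approach aims to bridge human-like contextual judgment and scalable AI safety, offering a new conceptual lens for trustworthy agent design.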

Link: https://arxiv.org/abs/2510.11108
Authors: Xinfeng Li,Dong Huang,Jie Li,Hongyi Cai,Zhenhong Zhou,Wei Dong,XiaoFeng Wang,Yang Liu
Affiliation: Unknown
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Comments: 10 pages, 1 figure

Abstract: The autonomy and contextual complexity of LLM-based agents render traditional access control (AC) mechanisms insufficient. Static, rule-based systems designed for predictable environments are fundamentally ill-equipped to manage the dynamic information flows inherent in agentic interactions. This position paper argues for a paradigm shift from binary access control to a more sophisticated model of information governance, positing that the core challenge is not merely about permission, but about governing the flow of information. We introduce Agent Access Control (AAC), a novel framework that reframes AC as a dynamic, context-aware process of information flow governance. AAC operates on two core modules: (1) multi-dimensional contextual evaluation, which assesses not just identity but also relationships, scenarios, and norms; and (2) adaptive response formulation, which moves beyond simple allow/deny decisions to shape information through redaction, summarization, and paraphrasing. This vision, powered by a dedicated AC reasoning engine, aims to bridge the gap between human-like nuanced judgment and scalable AI safety, proposing a new conceptual lens for future research in trustworthy agent design.
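The contrast between a binary allow/deny decision and an adaptive response can be shown with a toy policy. Everything in this sketch is hypothetical: the context fields (`relationship`, `scenario`), the rules, and the regex-based e-mail redaction are stand-ins chosen for illustration. The paper is a position paper and envisions a dedicated reasoning engine, not hand-written rules like these.

```python
import re

# Rough e-mail pattern; an assumption for illustration only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def formulate_response(document, context):
    """Return the document, a redacted version, or a refusal based on context."""
    if context.get("relationship") == "owner":
        return document                      # full access
    if context.get("scenario") == "support":
        # Middle ground between allow and deny: share the content, hide identifiers.
        return EMAIL_PATTERN.sub("[REDACTED]", document)
    return "Access denied."


document = "Contact alice@example.com about invoice 42."
as_owner = formulate_response(document, {"relationship": "owner"})
as_support = formulate_response(document, {"relationship": "colleague", "scenario": "support"})
as_stranger = formulate_response(document, {"relationship": "unknown", "scenario": "other"})
print(as_owner)
print(as_support)
print(as_stranger)
```

The middle branch is the point of the example: the requester still receives the useful part of the information, which a plain allow/deny rule cannot express.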

[AI-49] A Primer on SO(3) Action Representations in Deep Reinforcement Learning

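Quick read: This paper addresses the difficulty of representing rotation (SO(3)) actions in robotic control tasks. Because the SO(3) manifold admits no global, smooth, minimal parameterization, common representations (Euler angles, quaternions, rotation matrices, and Lie algebra coordinates) each introduce distinct constraints and failure modes in reinforcement learning. The key to the solution is a systematic evaluation of SO(3) action representations under three continuous control algorithms (PPO, SAC, and TD3), focusing on how they shape exploration, interact with entropy regularization, and affect training stability. The study finds that representing actions as tangent vectors in the local frame yields the most reliable results across algorithms.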

Link: https://arxiv.org/abs/2510.11103
Authors: Martin Schuck,Sherif Samy,Angela P. Schoellig
Affiliation: Unknown
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments:

Abstract:Many robotic control tasks require policies to act on orientations, yet the geometry of SO(3) makes this nontrivial. Because SO(3) admits no global, smooth, minimal parameterization, common representations such as Euler angles, quaternions, rotation matrices, and Lie algebra coordinates introduce distinct constraints and failure modes. While these trade-offs are well studied for supervised learning, their implications for actions in reinforcement learning remain unclear. We systematically evaluate SO(3) action representations across three standard continuous control algorithms, PPO, SAC, and TD3, under dense and sparse rewards. We compare how representations shape exploration, interact with entropy regularization, and affect training stability through empirical studies and analyze the implications of different projections for obtaining valid rotations from Euclidean network outputs. Across a suite of robotics benchmarks, we quantify the practical impact of these choices and distill simple, implementation-ready guidelines for selecting and using rotation actions. Our results highlight that representation-induced geometry strongly influences exploration and optimization and show that representing actions as tangent vectors in the local frame yields the most reliable results across algorithms.
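One common way to read "tangent vectors in the local frame" is that the policy outputs a 3-vector ω, which is mapped to a rotation with the exponential map and applied by right-multiplication, R_new = R · exp(ω^). The sketch below implements that reading with Rodrigues' formula in plain Python. It is an assumption about the representation and not code from the paper; the function names are hypothetical.

```python
import math


def hat(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def exp_so3(w):
    """Rodrigues' formula: exp(K) = I + (sin t / t) K + ((1 - cos t) / t^2) K^2."""
    theta = math.sqrt(sum(c * c for c in w))
    K = hat(w)
    K2 = matmul(K, K)
    if theta < 1e-8:
        a, b = 1.0, 0.5                       # small-angle limits of the coefficients
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    return [[eye[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)]
            for i in range(3)]


def apply_local_action(R, w):
    """Right-multiply so the increment w is expressed in the body (local) frame."""
    return matmul(R, exp_so3(w))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R1 = apply_local_action(identity, (0.0, 0.0, math.pi / 2))   # quarter turn about z
for row in R1:
    print([round(v, 6) for v in row])
```

A practical appeal of this parameterization is that any Euclidean network output is a valid action: the exponential map always lands on SO(3), so no projection or normalization step is needed to obtain a valid rotation.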

[AI-50] HoMer: Addressing Heterogeneities by Modeling Sequential and Set-wise Contexts for CTR Prediction

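Quick read: This paper addresses three forms of heterogeneity that affect click-through rate (CTR) prediction in industrial recommender systems: feature heterogeneity, context heterogeneity, and architecture heterogeneity. Feature heterogeneity arises because sequence side features are too coarse to match non-sequential features (such as user/item profiles or cross features); context heterogeneity arises because point-wise prediction ignores cross-item interactions within the full item set; architecture heterogeneity comes from the fragmented integration of specialized network modules, which hurts efficiency and scalability. The key to the solution is HoMer, a Homogeneous-Oriented TransforMer with three innovations: first, it aligns sequence side features with non-sequential features for more precise interest modeling; second, it shifts the prediction paradigm from point-wise to set-wise, enabling highly parallel cross-item interaction; third, it adopts a unified encoder-decoder architecture that reduces redundant computation while scaling with model size. Without arduous changes to the existing prediction pipeline, HoMer improves AUC by 0.0099 over the industrial baseline, raises online CTR/RPM by 1.99%/2.46%, and saves 27% of GPU resources.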

Link: https://arxiv.org/abs/2510.11100
Authors: Shuwei Chen,Jiajun Cui,Zhengqi Xu,Fan Zhang,Jiangke Fan,Teng Zhang,Xingxing Wang
Affiliation: Unknown
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Comments: 10 pages, 6 figures

Abstract:Click-through rate (CTR) prediction, which models behavior sequence and non-sequential features (e.g., user/item profiles or cross features) to infer user interest, underpins industrial recommender systems. However, most methods face three forms of heterogeneity that degrade predictive performance: (i) Feature Heterogeneity persists when limited sequence side features provide less granular interest representation compared to extensive non-sequential features, thereby impairing sequence modeling performance; (ii) Context Heterogeneity arises because a user’s interest in an item will be influenced by other items, yet point-wise prediction neglects cross-item interaction context from the entire item set; (iii) Architecture Heterogeneity stems from the fragmented integration of specialized network modules, which compounds the model’s effectiveness, efficiency and scalability in industrial deployments. To tackle the above limitations, we propose HoMer, a Homogeneous-Oriented TransforMer for modeling sequential and set-wise contexts. First, we align sequence side features with non-sequential features for accurate sequence modeling and fine-grained interest representation. Second, we shift the prediction paradigm from point-wise to set-wise, facilitating cross-item interaction in a highly parallel manner. Third, HoMer’s unified encoder-decoder architecture achieves dual optimization through structural simplification and shared computation, ensuring computational efficiency while maintaining scalability with model size. Without arduous modification to the prediction pipeline, HoMer successfully scales up and outperforms our industrial baseline by 0.0099 in the AUC metric, and enhances online business metrics like CTR/RPM by 1.99%/2.46%. Additionally, HoMer saves 27% of GPU resources via preliminary engineering optimization, further validating its superiority and practicality.

[AI-51] Modeling AI-Driven Production and Competitiveness: A Multi-Agent Economic Simulation of China and the United States

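Quick read: This paper addresses the mechanisms by which AI is transforming social production systems and the quantification of its effect on international competitiveness. Building on a previously established multi-level intelligent agent economic model, it simulates and compares the evolution of macroeconomic output in China and the United States under different mechanisms: AI collaboration, network effects, and AI autonomous production. The simulations indicate that overall social output grows much faster when AI acts as an independent productive entity, and that China shows potential for acceleration in both the expansion of intelligent agent populations and technological catch-up, providing model-based quantitative input for policy making.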

Link: https://arxiv.org/abs/2510.11085
Authors: Yuxinyue Qian,Jun Liu
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments:

Abstract: With the rapid development of artificial intelligence (AI) technology, socio-economic systems are entering a new stage of “human-AI co-creation.” Building upon a previously established multi-level intelligent agent economic model, this paper conducts simulation-based comparisons of macroeconomic output evolution in China and the United States under different mechanisms: AI collaboration, network effects, and AI autonomous production. The results show that: (1) when AI functions as an independent productive entity, the overall growth rate of social output far exceeds that of traditional human-labor-based models; (2) China demonstrates clear potential for acceleration in both the expansion of intelligent agent populations and the pace of technological catch-up, offering the possibility of achieving technological convergence or even partial surpassing. This study provides a systematic, model-based analytical framework for understanding AI-driven production system transformation and shifts in international competitiveness, as well as quantitative insights for relevant policy formulation.

[AI-52] Causal Disentanglement Learning for Accurate Anomaly Detection in Multivariate Time Series

Link: https://arxiv.org/abs/2510.11084
Authors: Wonah Kim,Jeonghyeon Park,Dongsan Jun,Jungkyu Han,Sejin Chun
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 20 pages, 4 figures

[AI-53] Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling NEURIPS2025

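Quick read: This paper addresses the difficulty of modeling interactive driving behaviors in complex driving scenarios, in particular the inability of existing learning-based methods to capture high-value interactive behaviors when interactive driving data is scarce. The core solution is the Flow Planner framework, whose key innovations are: (1) fine-grained trajectory tokenization, which decomposes the trajectory into overlapping segments to reduce the complexity of modeling the whole trajectory; (2) an efficient architecture for temporal and spatial fusion of planning and scene information, which captures interactive behaviors more accurately; (3) flow matching with classifier-free guidance for multi-modal behavior generation, which dynamically reweights agent interactions during inference to keep response strategies coherent and substantially improves understanding of complex interactive scenarios.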

Link: https://arxiv.org/abs/2510.11083
Authors: Tianyi Tan,Yinan Zheng,Ruiming Liang,Zexu Wang,Kexin Zheng,Jinliang Zheng,Jianxiong Li,Xianyuan Zhan,Jingjing Liu
Affiliation: Unknown
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: 26 pages, 6 figures. Accepted at NeurIPS 2025

Abstract:Modeling interactive driving behaviors in complex scenarios remains a fundamental challenge for autonomous driving planning. Learning-based approaches attempt to address this challenge with advanced generative models, removing the dependency on over-engineered architectures for representation fusion. However, brute-force implementation by simply stacking transformer blocks lacks a dedicated mechanism for modeling interactive behaviors that are common in real driving scenarios. The scarcity of interactive driving data further exacerbates this problem, leaving conventional imitation learning methods ill-equipped to capture high-value interactive behaviors. We propose Flow Planner, which tackles these problems through coordinated innovations in data modeling, model architecture, and learning scheme. Specifically, we first introduce fine-grained trajectory tokenization, which decomposes the trajectory into overlapping segments to decrease the complexity of whole trajectory modeling. With a sophisticatedly designed architecture, we achieve efficient temporal and spatial fusion of planning and scene information, to better capture interactive behaviors. In addition, the framework incorporates flow matching with classifier-free guidance for multi-modal behavior generation, which dynamically reweights agent interactions during inference to maintain coherent response strategies, providing a critical boost for interactive scenario understanding. Experimental results on the large-scale nuPlan dataset and challenging interactive interPlan dataset demonstrate that Flow Planner achieves state-of-the-art performance among learning-based approaches while effectively modeling interactive behaviors in complex driving scenarios.
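Two of the ingredients named in the abstract are easy to sketch in isolation: splitting a trajectory into overlapping segments, and combining conditional and unconditional predictions with classifier-free guidance. The code below is a hedged illustration only. The segment length, stride, and guidance weight are made-up values, the function names are hypothetical, and the guidance rule shown is the widely used form v = v_uncond + w · (v_cond − v_uncond), which may differ from the exact weighting Flow Planner applies.

```python
def tokenize_trajectory(points, segment_len, stride):
    """Split a trajectory into fixed-length segments; they overlap when stride < segment_len."""
    if not 0 < stride <= segment_len:
        raise ValueError("stride must be in (0, segment_len]")
    last_start = len(points) - segment_len
    return [points[i:i + segment_len] for i in range(0, last_start + 1, stride)]


def guided_velocity(v_cond, v_uncond, weight):
    """Classifier-free guidance applied element-wise to a predicted velocity."""
    return [u + weight * (c - u) for c, u in zip(v_cond, v_uncond)]


# Eight (x, y) waypoints on a straight line, cut into segments of 4 with stride 2.
trajectory = [(float(t), 0.0) for t in range(8)]
segments = tokenize_trajectory(trajectory, segment_len=4, stride=2)
print(len(segments), segments[1])

# With weight > 1 the result is pushed past the conditional prediction.
v = guided_velocity(v_cond=[1.0, 2.0], v_uncond=[0.0, 1.0], weight=2.0)
print(v)
```

With a stride of 2 and a length of 4, consecutive segments share two waypoints, so each token covers a short horizon while neighbouring tokens still agree on their common points.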

[AI-54] Argumentation-Based Explainability for Legal AI: Comparative and Regulatory Perspectives

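[Quick Read]: This paper addresses the lack of fairness, accountability, and trust caused by the black box problem when artificial intelligence (AI) systems are applied in the legal domain, where affected individuals struggle to obtain meaningful explanations. The key to its solution is computational models of arguments: by capturing the defeasible, contestable, and value-sensitive nature of legal reasoning, they provide a more normative and explanatory mechanism for legal settings. The approach aligns with emerging regulatory frameworks such as the EU General Data Protection Regulation (GDPR) and the Artificial Intelligence Act (AIA), and offers a solid foundation for meeting both the technical and normative transparency requirements of Explainable AI (XAI) in law.

Link: https://arxiv.org/abs/2510.11079
Authors: Andrada Iulia Prajescu,Roberto Confalonieri
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: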

Abstract:Artificial Intelligence (AI) systems are increasingly deployed in legal contexts, where their opacity raises significant challenges for fairness, accountability, and trust. The so-called “black box problem” undermines the legitimacy of automated decision-making, as affected individuals often lack access to meaningful explanations. In response, the field of Explainable AI (XAI) has proposed a variety of methods to enhance transparency, ranging from example-based and rule-based techniques to hybrid and argumentation-based approaches. This paper promotes computational models of arguments and their role in providing legally relevant explanations, with particular attention to their alignment with emerging regulatory frameworks such as the EU General Data Protection Regulation (GDPR) and the Artificial Intelligence Act (AIA). We analyze the strengths and limitations of different explanation strategies, evaluate their applicability to legal reasoning, and highlight how argumentation frameworks – by capturing the defeasible, contestable, and value-sensitive nature of law – offer a particularly robust foundation for explainable legal AI. Finally, we identify open challenges and research directions, including bias mitigation, empirical validation in judicial settings, and compliance with evolving ethical and legal standards, arguing that computational argumentation is best positioned to meet both technical and normative requirements of transparency in the law domain.
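
To make the idea of a defeasible, contestable argumentation framework concrete, here is a minimal sketch of a Dung-style abstract argumentation framework and its grounded extension (the most skeptical set of accepted arguments). The abstract does not say which formalism or semantics the paper discusses, so treating it as a Dung framework is an assumption, and the function name is chosen for illustration.

```python
from typing import Set, Tuple


def grounded_extension(arguments: Set[str],
                       attacks: Set[Tuple[str, str]]) -> Set[str]:
    """Return the grounded extension of an abstract argumentation framework.

    `attacks` holds (attacker, target) pairs. The result is the least fixed
    point of the characteristic function, reached by iterating from the
    empty set.
    """
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    accepted: Set[str] = set()
    while True:
        # An argument is defended if every one of its attackers is itself
        # attacked by some currently accepted argument.
        defended = {
            a for a in arguments
            if all(any((d, b) in attacks for d in accepted)
                   for b in attackers[a])
        }
        if defended == accepted:
            return accepted
        accepted = defended
```

For example, with arguments A, B, C where B attacks A and C attacks B, the unattacked argument C is accepted first, and it then reinstates A by defeating B. Two arguments that only attack each other yield an empty grounded extension, which is one way such frameworks represent an unresolved dispute.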

[AI-55] PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System

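[Quick Read]: This paper addresses two core challenges that humanoid robots face when performing diverse interaction tasks (such as carrying objects, sitting, lying down, and standing up) in real-world environments: generating generalizable, human-like motion, and achieving robust scene perception. The key to the solution is a unified physical-world humanoid-scene interaction system, PhysHSI, built from two parts. First, in simulation training, it uses adversarial motion prior-based policy learning to imitate natural humanoid-scene interaction data across diverse scenarios, achieving both strong generalization and lifelike motion. Second, for real-world deployment, it introduces a coarse-to-fine object localization module that fuses LiDAR and camera inputs to provide continuous, robust scene perception, which supports successful execution of complex interaction tasks.

Link: https://arxiv.org/abs/2510.11072
Authors: Huayi Wang,Wentao Zhang,Runyi Yu,Tao Huang,Junli Ren,Feiyu Jia,Zirui Wang,Xiaojie Niu,Xiao Chen,Jiahe Chen,Qifeng Chen,Jingbo Wang,Jiangmiao Pang
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Comments: Project website: this https URL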

Abstract:Deploying humanoid robots to interact with real-world environments–such as carrying objects or sitting on chairs–requires generalizable, lifelike motions and robust scene perception. Although prior approaches have advanced each capability individually, combining them in a unified system is still an ongoing challenge. In this work, we present a physical-world humanoid-scene interaction system, PhysHSI, that enables humanoids to autonomously perform diverse interaction tasks while maintaining natural and lifelike behaviors. PhysHSI comprises a simulation training pipeline and a real-world deployment system. In simulation, we adopt adversarial motion prior-based policy learning to imitate natural humanoid-scene interaction data across diverse scenarios, achieving both generalization and lifelike behaviors. For real-world deployment, we introduce a coarse-to-fine object localization module that combines LiDAR and camera inputs to provide continuous and robust scene perception. We validate PhysHSI on four representative interactive tasks–box carrying, sitting, lying, and standing up–in both simulation and real-world settings, demonstrating consistently high success rates, strong generalization across diverse task goals, and natural motion patterns.

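[AI-56] Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models

[Quick Read]: This paper addresses the off-manifold phenomenon in diffusion models, in which arbitrary guidance during generation pushes samples away from the true data manifold and degrades generation quality. The key to the solution is a new mechanism called Temporal Alignment Guidance (TAG): a time predictor estimates, at each timestep, how far a sample has deviated from the desired data manifold, and the sample is then pulled back toward that manifold at every timestep of generation, which significantly improves sample fidelity and downstream task performance.

Link: https://arxiv.org/abs/2510.11057
Authors: Youngrok Park,Hojung Jung,Sangmin Bae,Se-Young Yun
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 54 pages, 17 figures, 18 tables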

Abstract:Diffusion models have achieved remarkable success as generative models. However, even a well-trained model can accumulate errors throughout the generation process. These errors become particularly problematic when arbitrary guidance is applied to steer samples toward desired properties, which often breaks sample fidelity. In this paper, we propose a general solution to address the off-manifold phenomenon observed in diffusion models. Our approach leverages a time predictor to estimate deviations from the desired data manifold at each timestep, identifying that a larger time gap is associated with reduced generation quality. We then design a novel guidance mechanism, ‘Temporal Alignment Guidance’ (TAG), attracting the samples back to the desired manifold at every timestep during generation. Through extensive experiments, we demonstrate that TAG consistently produces samples closely aligned with the desired manifold at each timestep, leading to significant improvements in generation quality across various downstream tasks.

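[AI-57] From Reasoning LLMs to BERT: A Two-Stage Distillation Framework for Search Relevance

[Quick Read]: This paper addresses the strict latency requirements facing query-service relevance prediction in e-commerce search systems, which make direct deployment of Large Language Models (LLMs) infeasible. To bridge the gap between performance and efficiency, the authors propose a two-stage reasoning distillation framework. In the first stage, they build a domain-adapted teacher model: domain-adaptive pre-training injects platform knowledge, supervised fine-tuning elicits reasoning ability, and preference optimization with a multi-dimensional reward model follows, so that the teacher can automatically annotate massive numbers of query-service pairs with relevance labels and reasoning chains. In the second stage, they introduce Contrastive Reasoning Self-Distillation (CRSD), which models the behavior of the same student model under standard and reasoning-augmented inputs as a teacher-student relationship, so the lightweight student internalizes the teacher's complex decision-making without needing an explicit reasoning path. The framework improves offline metrics and yields significant gains in online A/B tests in the Meituan search advertising system.

Link: https://arxiv.org/abs/2510.11056
Authors: Runze Xia,Yupeng Ji,Yuxi Zhou,Haodong Liu,Teng Zhang,Piji Li
Affiliation: Unknown
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Comments: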

Abstract:Query-service relevance prediction in e-commerce search systems faces strict latency requirements that prevent the direct application of Large Language Models (LLMs). To bridge this gap, we propose a two-stage reasoning distillation framework to transfer reasoning capabilities from a powerful teacher LLM to a lightweight, deployment-friendly student model. In the first stage, we address the limitations of general-purpose LLMs by constructing a domain-adapted teacher model. This is achieved through a three-step process: domain-adaptive pre-training to inject platform knowledge, supervised fine-tuning to elicit reasoning skills, and preference optimization with a multi-dimensional reward model to ensure the generation of reliable and preference-aligned reasoning paths. This teacher can then automatically annotate massive query-service pairs from search logs with both relevance labels and reasoning chains. In the second stage, to address the challenges of architectural heterogeneity in standard distillation, we introduce Contrastive Reasoning Self-Distillation (CRSD). By modeling the behavior of the same student model under “standard” and “reasoning-augmented” inputs as a teacher-student relationship, CRSD enables the lightweight model to internalize the teacher’s complex decision-making mechanisms without needing the explicit reasoning path at inference. Offline evaluations and online A/B testing in the Meituan search advertising system demonstrate that our framework achieves significant improvements across multiple metrics, validating its effectiveness and practical value.
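
The abstract does not give the CRSD objective. As a loose illustration of the self-distillation idea only, one common way to make a model's prediction on a plain input match its own prediction on an enriched input is a KL divergence between the two output distributions. The sketch below assumes exactly that; the function names are hypothetical, and the contrastive part of CRSD is not modeled.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def self_distillation_loss(logits_standard: List[float],
                           logits_augmented: List[float]) -> float:
    """KL from the reasoning-augmented prediction (acting as teacher, treated
    as a fixed target) to the standard-input prediction (acting as student)."""
    teacher = softmax(logits_augmented)
    student = softmax(logits_standard)
    return kl_divergence(teacher, student)
```

The loss is zero when both inputs produce the same relevance distribution and grows as the standard-input prediction drifts from the reasoning-augmented one, which is the property a student needs in order to drop the reasoning path at inference.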

[AI-58] XGrasp: Gripper-Aware Grasp Detection with Multi-Gripper Data Generation

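[Quick Read]: This paper addresses the fact that current robotic grasping methods are usually designed for a single gripper type, which limits their applicability in real-world scenarios that require diverse end-effectors. The key to the solution is XGrasp, a real-time gripper-aware grasp detection framework that mitigates data scarcity by systematically augmenting existing datasets with multi-gripper annotations. It uses a hierarchical two-stage architecture: in the first stage, a Grasp Point Predictor (GPP) identifies optimal grasp locations from global scene information and gripper specifications; in the second stage, an Angle-Width Predictor (AWP) refines the grasp angle and width from local features. Contrastive learning in the AWP module enables zero-shot generalization to unseen grippers, and the framework substantially improves inference speed while maintaining high grasp success rates.

Link: https://arxiv.org/abs/2510.11036
Authors: Yeonseo Lee,Jungwook Mun,Hyosup Shin,Guebin Hwang,Junhee Nam,Taeyeop Lee,Sungho Jo
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: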

Abstract:Most robotic grasping methods are typically designed for single gripper types, which limits their applicability in real-world scenarios requiring diverse end-effectors. We propose XGrasp, a real-time gripper-aware grasp detection framework that efficiently handles multiple gripper configurations. The proposed method addresses data scarcity by systematically augmenting existing datasets with multi-gripper annotations. XGrasp employs a hierarchical two-stage architecture. In the first stage, a Grasp Point Predictor (GPP) identifies optimal locations using global scene information and gripper specifications. In the second stage, an Angle-Width Predictor (AWP) refines the grasp angle and width using local features. Contrastive learning in the AWP module enables zero-shot generalization to unseen grippers by learning fundamental grasping characteristics. The modular framework integrates seamlessly with vision foundation models, providing pathways for future vision-language capabilities. The experimental results demonstrate competitive grasp success rates across various gripper types, while achieving substantial improvements in inference speed compared to existing gripper-aware methods. Project page: this https URL

[AI-59] FBS Model-based Maintenance Record Accumulation for Failure-Cause Inference in Manufacturing Systems

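[Quick Read]: This paper addresses the accuracy of failure-cause inference in manufacturing systems, especially when maintenance records are limited and terminology varies widely, conditions under which conventional methods struggle to identify root causes. The key to the solution is constructing a Diagnostic Knowledge Ontology and, based on the Function-Behavior-Structure (FBS) model, proposing a maintenance-record accumulation method. By explicitly structuring system knowledge and failure causal chains, it significantly improves agreement between inferred failure causes and the candidate causes enumerated by experts, particularly in data-sparse scenarios.

Link: https://arxiv.org/abs/2510.11003
Authors: Takuma Fujiu,Sho Okazaki,Kohei Kaminishi,Yuji Nakata,Shota Hamamoto,Kenshin Yokose,Tatsunori Hara,Yasushi Umeda,Jun Ota
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Comments: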

Abstract:In manufacturing systems, identifying the causes of failures is crucial for maintaining and improving production efficiency. In knowledge-based failure-cause inference, it is important that the knowledge base (1) explicitly structures knowledge about the target system and about failures, and (2) contains sufficiently long causal chains of failures. In this study, we constructed Diagnostic Knowledge Ontology and proposed a Function-Behavior-Structure (FBS) model-based maintenance-record accumulation method based on it. Failure-cause inference using the maintenance records accumulated by the proposed method showed better agreement with the set of candidate causes enumerated by experts, especially in difficult cases where the number of related cases is small and the vocabulary used differs. In the future, it will be necessary to develop inference methods tailored to these maintenance records, build a user interface, and carry out validation on larger and more diverse systems. Additionally, this approach leverages the understanding and knowledge of the target in the design phase to support knowledge accumulation and problem solving during the maintenance phase, and it is expected to become a foundation for knowledge sharing across the entire engineering chain in the future.

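[AI-60] DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation

[Quick Read]: This paper addresses a security flaw in using Large Language Model (LLM) watermarking to verify text authorship: the prevailing assumption that a specific watermark proves generation by a specific model does not hold. It proposes a new attack called watermark spoofing. The key is to exploit watermark radioactivity, the unintended inheritance of data patterns during fine-tuning, turning it from a detectable trait into an attack vector. By distilling knowledge from a watermarked teacher model, an attacker can steal and replicate the victim model's watermark signal and generate forged content that appears to come from a trusted source, enabling seamless misattribution of harmful content such as disinformation. The finding exposes a major security gap in existing text attribution mechanisms and calls for a shift toward technologies that can distinguish authentic watermarks from expert imitations.

Link: https://arxiv.org/abs/2510.10987
Authors: Hyeseon Ahn,Shinwoo Park,Yo-Sub Han
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments: 14 pages, 4 figures, preprint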

Abstract:The promise of LLM watermarking rests on a core assumption that a specific watermark proves authorship by a specific model. We demonstrate that this assumption is dangerously flawed. We introduce the threat of watermark spoofing, a sophisticated attack that allows a malicious model to generate text containing the authentic-looking watermark of a trusted, victim model. This enables the seamless misattribution of harmful content, such as disinformation, to reputable sources. The key to our attack is repurposing watermark radioactivity, the unintended inheritance of data patterns during fine-tuning, from a discoverable trait into an attack vector. By distilling knowledge from a watermarked teacher model, our framework allows an attacker to steal and replicate the watermarking signal of the victim model. This work reveals a critical security gap in text authorship verification and calls for a paradigm shift towards technologies capable of distinguishing authentic watermarks from expertly imitated ones. Our code is available at this https URL.

[AI-61] Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization

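[Quick Read]: This paper addresses the data-usage control problem for current generative AI models: how to keep data useful to authorized models while preventing misuse by unauthorized ones. Existing methods either perturb data to make it unlearnable or retrain models to suppress transfer, but neither governs inference by unknown models, and both typically require intervention in training. The paper proposes non-transferable examples (NEs), a training-free, data-agnostic, input-side usage-control mechanism: inputs are recoded within a model-specific low-sensitivity subspace, preserving the authorized model's outputs while subspace misalignment markedly degrades unauthorized models. The core innovation is a set of formal bounds, built on the Hoffman-Wielandt inequality, that quantify the performance gap between authorized and unauthorized models, enabling precise control over how data is used.

Link: https://arxiv.org/abs/2510.10982
Authors: Zihan Wang,Zhiyong Ma,Zhongkui Ma,Shuofeng Liu,Akide Liu,Derui Wang,Minhui Xue,Guangdong Bai
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: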

Abstract:Recent AI regulations call for data that remain useful for innovation while resistant to misuse, balancing utility with protection at the model level. Existing approaches either perturb data to make it unlearnable or retrain models to suppress transfer, but neither governs inference by unknown models, and both typically require control over training. We propose non-transferable examples (NEs), a training-free and data-agnostic input-side usage-control mechanism. We recode inputs within a model-specific low-sensitivity subspace, preserving outputs for the authorized model while reducing performance on unauthorized models through subspace misalignment. We establish formal bounds that guarantee utility for the authorized model and quantify deviation for unauthorized ones, with the Hoffman-Wielandt inequality linking degradation to spectral differences. Empirically, NEs retain performance on diverse vision backbones and state-of-the-art vision-language models under common preprocessing, whereas non-target models collapse even with reconstruction attempts. These results establish NEs as a practical means to preserve intended data utility while preventing unauthorized exploitation. Our project is available at this https URL
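
For reference, the Hoffman-Wielandt inequality named in the abstract bounds how far the eigenvalues of two matrices can differ in terms of the Frobenius norm of their difference. The statement below is the standard Hermitian case; the symbols A and B are generic and are not the paper's notation, and how the paper maps model weights onto these matrices is not stated in the abstract.

```latex
% Hoffman-Wielandt inequality (Hermitian case).
% A, B: n x n Hermitian matrices, eigenvalues sorted in non-increasing order:
%   \lambda_1(A) \ge \dots \ge \lambda_n(A), \quad \lambda_1(B) \ge \dots \ge \lambda_n(B).
\sum_{i=1}^{n} \bigl( \lambda_i(A) - \lambda_i(B) \bigr)^2 \;\le\; \lVert A - B \rVert_F^2
```

For general normal matrices the same bound holds for some permutation of the eigenvalues of B rather than for the sorted order.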

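[AI-62] Video-STR: Reinforcing MLLMs in Video Spatio-Temporal Reasoning with Relation Graph

[Quick Read]: This paper addresses the lack of precise spatio-temporal understanding in Multimodal Large Language Models (MLLMs) on video, in particular their neglect of physical information within the video (such as multi-object layouts and motion), which limits their use in high-precision downstream tasks such as embodied intelligence and virtual reality (VR). The key to the solution is Video-STR, a graph-based reinforcement learning method that introduces a graph-based Group Relative Policy Optimization (GRPO) mechanism and uses verifiable rewards to guide the model to infer the underlying spatio-temporal topology of a scene during reasoning. The authors also construct the STV-205k dataset of 205k question-answering pairs covering dynamic multi-object scenes in indoor and outdoor environments, to compensate for the shortage of spatio-temporal training data, which significantly improves results on multiple benchmarks.

Link: https://arxiv.org/abs/2510.10976
Authors: Wentao Wang,Heqing Zou,Tianze Luo,Rui Huang,Yutian Zhao,Zhuochen Wang,Hansheng Zhang,Chengwei Qin,Yan Wang,Lin Zhao,Huaijian Zhang
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: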

Abstract:Recent progress in Multimodal Large Language Models (MLLMs) has demonstrated strong semantic understanding capabilities, but struggles to perform precise spatio-temporal understanding. Existing spatio-temporal methods primarily focus on the video itself, while overlooking the physical information within the video, such as multi-object layouts and motion. Such limitations restrict the use of MLLMs in downstream applications that demand high precision, including embodied intelligence and VR. To address this issue, we present Video-STR, a novel graph-based reinforcement method for precise Video Spatio-Temporal Reasoning. Building upon the capacity of Reinforcement Learning with Verifiable Reward (RLVR) to improve model abilities, we introduce a reasoning mechanism using graph-based Group Relative Policy Optimization (GRPO) method to guide the model in inferring the underlying spatio-temporal topology of scenarios during the thinking process. To resolve the lack of spatio-temporal training data, we construct the STV-205k dataset with 205k question-answering pairs, covering dynamic multi-object scenes in both indoor and outdoor environments, to support the model training. Experiments show that Video-STR achieves state-of-the-art results on various benchmarks, outperforming the base model by 13% on STI-Bench, and demonstrating the effectiveness of our approach and dataset. Code, model, and data will be released.
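
The abstract builds on GRPO without spelling out its mechanics. The part that gives the method its name is the group-relative advantage: several responses are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. The sketch below shows only that step. The function name and the `eps` parameter are assumptions, the use of the population standard deviation is one of several conventions, and the paper's graph-based reward design is not reproduced here.

```python
from statistics import fmean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-8) -> List[float]:
    """Normalize the rewards of one sampled group: (r - mean) / (std + eps).

    `eps` guards against division by zero when every response in the group
    receives the same reward.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

A group in which every response earns the same reward produces all-zero advantages, so it contributes no learning signal; this is why verifiable rewards that actually separate good from bad responses matter for this family of methods.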

[AI-63] APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport EMNLP2025

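[Quick Read]: This paper addresses a limitation of reward models (RMs) trained with the Bradley-Terry (BT) objective for aligning Large Language Models (LLMs) with human preferences: they struggle to distinguish similar preference responses, which leads to overfitting on easy samples and poor generalization to Out-Of-Distribution (OOD) samples. The key to the solution is an adaptive margin mechanism that combines semantic similarity with model-predicted reward differences and, from a distributional perspective, uses Optimal Transport (OT) to design a principled cost matrix. This dynamically shifts the RM's focus toward more challenging samples and strengthens its ability to capture distributional differences between chosen and rejected responses, significantly improving performance, convergence speed, and generalization.

Link: https://arxiv.org/abs/2510.10963
Authors: Zhuo Li,Yuege Feng,Dandan Guo,Jinpeng Hu,Anningzhe Gao,Xiang Wan
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: EMNLP2025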

Abstract:The reward model (RM) plays a crucial role in aligning Large Language Models (LLMs) with human preferences through Reinforcement Learning, where the Bradley-Terry (BT) objective has been recognized as simple yet powerful, specifically for pairwise preference learning. However, BT-based RMs often struggle to effectively distinguish between similar preference responses, leading to insufficient separation between preferred and non-preferred outputs. Consequently, they may easily overfit easy samples and cannot generalize well to Out-Of-Distribution (OOD) samples, resulting in suboptimal performance. To address these challenges, this paper introduces an effective enhancement to BT-based RMs through an adaptive margin mechanism. Specifically, we design to dynamically adjust the RM focus on more challenging samples through margins, based on both semantic similarity and model-predicted reward differences, which is approached from a distributional perspective solvable with Optimal Transport (OT). By incorporating these factors into a principled OT cost matrix design, our adaptive margin enables the RM to better capture distributional differences between chosen and rejected responses, yielding significant improvements in performance, convergence speed, and generalization capabilities. Experimental results across multiple benchmarks demonstrate that our method outperforms several existing RM techniques, showcasing enhanced performance in both In-Distribution (ID) and OOD settings. Moreover, RLHF experiments support our practical effectiveness in better aligning LLMs with human preferences. Our code is available at this https URL
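
To show where a margin enters the Bradley-Terry objective, here is a minimal sketch of the pairwise loss with an additive margin, -log sigmoid(r_chosen - r_rejected - margin). The paper derives its margin adaptively from an Optimal Transport cost matrix; that derivation is not given in the abstract, so the margin here is simply a hypothetical input, and the function name is chosen for illustration.

```python
import math


def bt_margin_loss(r_chosen: float, r_rejected: float,
                   margin: float = 0.0) -> float:
    """Bradley-Terry pairwise loss with an additive margin.

    Computes -log(sigmoid(z)) with z = r_chosen - r_rejected - margin,
    using the stable identity softplus(-z) = max(-z, 0) + log1p(exp(-|z|)).
    """
    z = r_chosen - r_rejected - margin
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

With `margin = 0` this is the plain BT loss. A positive margin raises the loss for pairs whose reward gap is small, which pushes the reward model to separate chosen and rejected responses more widely.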

[AI-64] MC#: Mixture Compressor for Mixture-of-Experts Large Models

Link: https://arxiv.org/abs/2510.10962
Authors: Wei Huang,Yue Liao,Yukang Chen,Jianhui Liu,Haoru Tan,Si Liu,Shiming Zhang,Shuicheng Yan,Xiaojuan Qi
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 15 pages, 13 figures

[AI-65] Project-Level C-to-Rust Translation via Synergistic Integration of Knowledge Graphs and Large Language Models

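[Quick Read]: This paper addresses the difficulty that existing Large Language Model (LLM)-based C-to-Rust translation methods have in correctly handling pointer semantics at the project level. Existing methods typically partition a C project into function units based on call graphs and translate them bottom-up, but this local view cannot capture the global usage patterns of pointers, so the generated Rust code still contains many unsafe behaviors. The key to the solution is a novel C-Rust Pointer Knowledge Graph (KG), which extends a code-dependency graph with two kinds of pointer semantics: (i) pointer-usage information, which records points-to flows and struct-level mappings; and (ii) Rust-oriented annotations, which explicitly encode ownership, mutability, nullability, and lifetime. Combining this KG with LLMs, the proposed tool \ourtool guides the LLM from a global perspective toward safer, more idiomatic Rust code, markedly reducing unsafe usages and improving functional correctness.

Link: https://arxiv.org/abs/2510.10956
Authors: Zhiqiang Yuan,Wenjun Mao,Zhuo Chen,Xiyue Shang,Chong Wang,Yiling Lou,Xin Peng
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: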

Abstract:Translating C code into safe Rust is an effective way to ensure its memory safety. Compared to rule-based translation which produces Rust code that remains largely unsafe, LLM-based methods can generate more idiomatic and safer Rust code because LLMs have been trained on vast amounts of human-written idiomatic code. Although promising, existing LLM-based methods still struggle with project-level C-to-Rust translation. They typically partition a C project into smaller units (e.g., functions) based on call graphs and translate them bottom-up to resolve program dependencies. However, this bottom-up, unit-by-unit paradigm often fails to translate pointers due to the lack of a global perspective on their usage. To address this problem, we propose a novel C-Rust Pointer Knowledge Graph (KG) that enriches a code-dependency graph with two types of pointer semantics: (i) pointer-usage information which records global behaviors such as points-to flows and maps lower-level struct usage to higher-level units; and (ii) Rust-oriented annotations which encode ownership, mutability, nullability, and lifetime. Synthesizing the KG with LLMs, we further propose \ourtool, which implements a project-level C-to-Rust translation technique. In \ourtool, the KG provides LLMs with comprehensive pointer semantics from a global perspective, thus guiding LLMs towards generating safe and idiomatic Rust code from a given C project. Our experiments show that \ourtool reduces unsafe usages in translated Rust by 99.9% compared to both rule-based translation and traditional LLM-based rewriting, while achieving an average 29.3% higher functional correctness than those fuzzing-enhanced LLM methods.

[AI-66] Unify Variables in Neural Scaling Laws for General Audio Representations via Embedding Effective Rank

Link: https://arxiv.org/abs/2510.10948
Authors: Xuyao Deng,Yanjie Sun,Yong Dou,Kele Xu
Affiliation: Unknown
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Comments:

[AI-67] Scalable and Explainable Enterprise Knowledge Discovery Using Graph-Centric Hybrid Retrieval

Link: https://arxiv.org/abs/2510.10942
Authors: Nilima Rao,Jagriti Srivastava,Pradeep Kumar Sharma,Hritvik Shrivastava
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Databases (cs.DB)
Comments:

[AI-68] Redundancy as a Structural Information Principle for Learning and Generalization

Link: https://arxiv.org/abs/2510.10938
Authors: Yuda Bi,Ying Zhu,Vince D Calhoun
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)
Comments:

[AI-69] TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models

Link: https://arxiv.org/abs/2510.10932
Authors: Zonghuan Xu,Xiang Zheng,Xingjun Ma,Yu-Gang Jiang
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Comments: 8 pages, 8 tables, 1 figure. Under review

[AI-70] PoU: Proof-of-Use to Counter Tool-Call Hacking in DeepResearch Agents

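[Quick Read]: This paper studies Tool-Call Hacking, a problem that arises when Retrieval-Augmented Generation (RAG) agents are trained with Reinforcement Learning (RL): agents inflate reward signals by issuing superficially correct tool calls without genuinely using the retrieved evidence, leading to mode collapse and spurious grounding. The key to the solution is the Proof-of-Use (PoU) framework, which uses a unified step-wise contract to enforce verifiable causal links between retrieved evidence, reasoning traces, and final answers. The contract combines syntactic citation validation, perturbation-based sensitivity rewards, and answer-evidence alignment objectives, ensuring that tool calls are both interpretable and functionally grounded in real information.

Link: https://arxiv.org/abs/2510.10931
Authors: SHengjie Ma,Chenlong Deng,Jiaxin Mao,Jiadeng Huang,Teng Wang,Junjie Wu,Changwang Zhang,Jun wang
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: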

Abstract:Retrieval-augmented generation (RAG) agents, such as recent DeepResearch-style systems, extend large language models (LLMs) with autonomous information-seeking capabilities through external tools. While reinforcement learning (RL) has enabled impressive multi-step reasoning, we identify a previously overlooked failure mode, Tool-Call Hacking, where agents inflate reward signals by issuing superficially correct tool calls without genuinely leveraging the retrieved evidence. This results in (i) mode collapse into repetitive reliance on a single source and (ii) spurious grounding, where answers are only weakly supported by cited content. To address this, we propose Proof-of-Use (PoU), an evidence-grounded RL framework that enforces verifiable causal links between retrieved evidence, reasoning traces, and final answers. PoU operationalizes this through a unified step-wise contract combining syntactic citation validation, perturbation-based sensitivity rewards, and answer-evidence alignment objectives, ensuring that tool usage remains both interpretable and functionally grounded. Across seven QA benchmarks spanning in-domain, out-of-domain, and out-of-tool-distribution settings, PoU consistently outperforms strong DeepResearch baselines in factual accuracy, evidence faithfulness, and tool-routing balance. These findings highlight the necessity of grounding RL-trained agents not merely in task outcomes but in the causal use of retrieved information, offering a principled path toward trustworthy retrieval-augmented reasoning.

[AI-71] Comparative Explanations via Counterfactual Reasoning in Recommendations

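[Quick Read]: This paper addresses factual errors in the counterfactual-reasoning explanations produced by existing explainable recommender systems. Current mainstream methods generate explanations by minimizing changes to item aspects while flipping the recommendation according to an aggregated decision boundary score, but this often yields explanations that do not match the actual recommendation logic, i.e., factually inaccurate explanations. To solve this, the paper proposes Comparative Counterfactual Explanations for Recommendation (CoCountER), whose key idea is to build counterfactual data with soft swap operations, supporting explanations for recommendations of arbitrary pairs of comparative items and improving the faithfulness and plausibility of the explanations.

Link: https://arxiv.org/abs/2510.10920
Authors: Yi Yu,Zhenxing Hu
Affiliation: Unknown
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Comments: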

Abstract:Explainable recommendation through counterfactual reasoning seeks to identify the influential aspects of items in recommendations, which can then be used as explanations. However, state-of-the-art approaches, which aim to minimize changes in product aspects while reversing their recommended decisions according to an aggregated decision boundary score, often lead to factual inaccuracies in explanations. To solve this problem, in this work we propose a novel method of Comparative Counterfactual Explanations for Recommendation (CoCountER). CoCountER creates counterfactual data based on soft swap operations, enabling explanations for recommendations of arbitrary pairs of comparative items. Empirical experiments validate the effectiveness of our approach.

[AI-72] LPCVAE: A Conditional VAE with Long-Term Dependency and Probabilistic Time-Frequency Fusion for Time Series Anomaly Detection

Link: https://arxiv.org/abs/2510.10915
Authors: Hanchang Cheng,Weimin Mu,Fan Liu,Weilin Zhu,Can Ma
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

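[AI-73] PaperArena: An Evaluation Benchmark for Tool-Augmented Agentic Reasoning on Scientific Literature

[Quick Read]: This paper addresses a limitation of current Large Language Model (LLM) agents on real research problems: existing work is mostly confined to tool-free tasks within a single paper, and there is no benchmark for evaluating cross-paper information integration and multi-tool reasoning. The key to the solution is PaperArena, a standardized evaluation benchmark for scientific discovery scenarios in which agents answer complex research questions by integrating information across papers and interacting with external tools (such as multimodal parsing, context retrieval, and programmatic computation). The platform provides a modular, extensible agent execution environment and reveals a significant performance bottleneck in cross-paper reasoning for current advanced agent systems (average accuracy of only 38.78%, dropping to 18.47% on the hard subset). It also identifies inefficient tool usage as one of the main obstacles, giving clear direction for developing more efficient and capable research-assistant agents.

Link: https://arxiv.org/abs/2510.10909
Authors: Daoyu Wang,Mingyue Cheng,Qi Liu,Shuo Yu,Zirui Liu,Ze Guo
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 12 pages, 9 figures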

Abstract:Understanding and reasoning on the web-scale scientific literature is a crucial touchstone for large language model (LLM) based agents designed to support complex knowledge-intensive tasks. However, existing works are mainly restricted to tool-free tasks within isolated papers, largely due to the lack of a benchmark for cross-paper reasoning and multi-tool orchestration in real research scenarios. In this work, we propose PaperArena, an evaluation benchmark for agents to address real-world research questions that typically require integrating information across multiple papers with the assistance of external tools. Given a research question, agents should integrate diverse formats across multiple papers through reasoning and interacting with appropriate tools, thereby producing a well-grounded answer. To support standardized evaluation, we provide a modular and extensible platform for agent execution, offering tools such as multimodal parsing, context retrieval, and programmatic computation. Experimental results reveal that even the most advanced LLM powering a well-established agent system achieves merely 38.78% average accuracy. On the hard subset, accuracy drops to only 18.47%, highlighting great potential for improvement. We also present several empirical findings, including that all agents tested exhibit inefficient tool usage, often invoking more tools than necessary to solve a task. We invite the community to adopt PaperArena to develop and evaluate more capable agents for scientific discovery. Our code and data are available at this https URL.

[AI-74] LLM-Empowered Agentic MAC Protocols: A Dynamic Stackelberg Game Approach

Link: https://arxiv.org/abs/2510.10895
Authors: Renxuan Tan,Rongpeng Li,Fei Wang,Chenghui Peng,Shaoyun Wu,Zhifeng Zhao,Honggang Zhang
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: This work has been submitted to IEEE for possible publication

[AI-75] Generative AI for Software Project Management: Insights from a Review of Software Practitioner Literature

Link: https://arxiv.org/abs/2510.10887
Authors: Lakshana Iruni Assalaarachchi,Zainab Masood,Rashina Hoda,John Grundy
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments:

[AI-76] GRIP: A Unified Framework for Grid-Based Relay and Co-Occurrence-Aware Planning in Dynamic Environments

Link: https://arxiv.org/abs/2510.10865
Authors: Ahmed Alanazi,Duy Ho,Yugyung Lee
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: 17 pages, 5 figures, 8 tables

[AI-77] HeroFilter: Adaptive Spectral Graph Filter for Varying Heterophilic Relations

Link: https://arxiv.org/abs/2510.10864
Authors: Shuaicheng Zhang,Haohui Wang,Junhong Lin,Xiaojie Guo,Yada Zhu,Si Zhang,Dongqi Fu,Dawei Zhou
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Comments:

[AI-78] Discrete State Diffusion Models: A Sample Complexity Perspective

Link: https://arxiv.org/abs/2510.10854
Authors: Aadithya Srikanth,Mudit Gaur,Vaneet Aggarwal
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments:

[AI-79] Software Defect Prediction using Autoencoder Transformer Model

Link: https://arxiv.org/abs/2510.10840
Authors: Seshu Barma,Mohanakrishnan Hariharan,Satish Arvapalli
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments:

[AI-80] VeritasFi: An Adaptable Multi-tiered RAG Framework for Multi-modal Financial Question Answering

Link: https://arxiv.org/abs/2510.10828
Authors: Zhenghan Tai,Hanwei Wu,Qingchen Hu,Jijun Chi,Hailin He,Lei Ding,Tung Sum Thomas Kwok,Bohuai Xiao,Yuchen Hua,Suyuchen Wang,Peng Lu,Muzhi Li,Yihong Wu,Liheng Ma,Jerry Huang,Jiayi Zhang,Gonghao Zhang,Chaolong Jiang,Jingrui Tian,Sicheng Lyu,Zeyu Li,Boyu Han,Fengran Mo,Xinyue Yu,Yufei Cui,Ling Zhou,Xinyu Wang
Affiliation: Unknown
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Comments:

[AI-81] Agentic RAG for Software Testing with Hybrid Vector-Graph and Multi-Agent Orchestration

Link: https://arxiv.org/abs/2510.10824
Authors: Mohanakrishnan Hariharan,Satish Arvapalli,Seshu Barma,Evangeline Sheela
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments:

[AI-82] The Irrational Machine: Neurosis and the Limits of Algorithmic Safety

Link: https://arxiv.org/abs/2510.10823
Authors: Daniel Howard
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
Comments: 41 pages, 17 figures, 5 tables

[AI-83] Generative AI and the Transformation of Software Development Practices

Link: https://arxiv.org/abs/2510.10819
Authors: Vivek Acharya
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: 16 pages; 1 figure; preprint; v

[AI-84] LLMs as Strategic Agents: Beliefs, Best Response Behavior, and Emergent Heuristics

Link: https://arxiv.org/abs/2510.10813
Authors: Enric Junque de Fortuny,Veronica Roberta Cappelli
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Comments:

[AI-85] Therapeutic AI and the Hidden Risks of Over-Disclosure: An Embedded AI-Literacy Framework for Mental Health Privacy

Link: https://arxiv.org/abs/2510.10805
Authors: Soraya S. Anvari,Rina R. Wehbe
Affiliation: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Comments: Accepted to SMASH 2025

[AI-86] PruneGCRN: Minimizing and explaining spatio-temporal problems through node pruning

Link: https://arxiv.org/abs/2510.10803
Authors: Javier García-Sigüenza,Mirco Nanni,Faraón Llorens-Largo,José F. Vicent
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-87] BioOSS: A Bio-Inspired Oscillatory State System with Spatio-Temporal Dynamics

Link: https://arxiv.org/abs/2510.10790
Authors: Zhongju Yuan,Geraint Wiggins,Dick Botteldooren
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-88] ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis

Link: https://arxiv.org/abs/2510.10774
Authors: Mohammad Javad Ranjbar Kalahroodi,Heshaam Faili,Azadeh Shakery
Affiliation: Unknown
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Comments:

[AI-89] Understanding Sampler Stochasticity in Training Diffusion Models for RLHF

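[Quick read]: This paper addresses the mismatch that arises when diffusion models are fine-tuned with Reinforcement Learning from Human Feedback (RLHF): training uses stochastic differential equation (SDE) samplers, while inference uses deterministic ordinary differential equation (ODE) samplers, and the resulting reward gap may affect inference quality. The key to the solution is to characterize this reward gap theoretically and give non-vacuous bounds, and to adopt the generalized denoising diffusion implicit models (gDDIM) framework, which supports arbitrarily high levels of stochasticity and so improves exploration during training while preserving the data marginals. Large-scale experiments confirm that the reward gap narrows steadily over training and that ODE sampling quality improves significantly after training with a high-stochasticity SDE.

Link: https://arxiv.org/abs/2510.10767
Authors: Jiayuan Sheng,Hanyang Zhao,Haoxian Chen,David D. Yao,Wenpin Tang
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Comments: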

Abstract:Reinforcement Learning from Human Feedback (RLHF) is increasingly used to fine-tune diffusion models, but a key challenge arises from the mismatch between stochastic samplers used during training and deterministic samplers used during inference. In practice, models are fine-tuned using stochastic SDE samplers to encourage exploration, while inference typically relies on deterministic ODE samplers for efficiency and stability. This discrepancy induces a reward gap, raising concerns about whether high-quality outputs can be expected during inference. In this paper, we theoretically characterize this reward gap and provide non-vacuous bounds for general diffusion models, along with sharper convergence rates for Variance Exploding (VE) and Variance Preserving (VP) Gaussian models. Methodologically, we adopt the generalized denoising diffusion implicit models (gDDIM) framework to support arbitrarily high levels of stochasticity, preserving data marginals throughout. Empirically, our findings through large-scale experiments on text-to-image models using denoising diffusion policy optimization (DDPO) and mixed group relative policy optimization (MixGRPO) validate that reward gaps consistently narrow over training, and ODE sampling quality improves when models are updated using higher-stochasticity SDE training.
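
The contrast between stochastic and deterministic samplers can be illustrated with the standard DDIM reverse step, where a parameter eta scales the injected noise. This is a minimal scalar sketch, not the paper's gDDIM: the function names `ddim_sigma` and `ddim_step` and the argument names are assumptions made for illustration. In this standard form eta is effectively limited to the range 0 to 1 (larger values are clamped below), whereas the abstract says gDDIM supports arbitrarily high stochasticity.

```python
import math
import random


def ddim_sigma(alpha_bar_t, alpha_bar_prev, eta):
    """Noise scale of one reverse step; eta = 0 gives the deterministic sampler."""
    return (
        eta
        * math.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar_t))
        * math.sqrt(1 - alpha_bar_t / alpha_bar_prev)
    )


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev, eta, rng):
    """One standard DDIM update from step t to the previous (less noisy) step."""
    sigma = ddim_sigma(alpha_bar_t, alpha_bar_prev, eta)
    # Estimate of the clean sample implied by the predicted noise.
    x0_pred = (x_t - math.sqrt(1 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_bar_t)
    # Deterministic part pointing back towards x_t; clamped so it stays real.
    direction = math.sqrt(max(1 - alpha_bar_prev - sigma ** 2, 0.0)) * eps_pred
    # Fresh Gaussian noise, present only when eta > 0.
    noise = sigma * rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar_prev) * x0_pred + direction + noise
```

With eta = 0 the step ignores the random generator entirely, which corresponds to the ODE-style sampling used at inference; with eta = 1 it matches the usual DDPM-style ancestral step used for exploration during training.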

[AI-90] GPS Spoofing Attack Detection in Autonomous Vehicles Using Adaptive DBSCAN

Link: https://arxiv.org/abs/2510.10766
Authors: Ahmad Mohammadi,Reza Ahmari,Vahid Hemmati,Frederick Owusu-Ambrose,Mahmoud Nabil Mahmoud,Parham Kebria,Abdollah Homaifar,Mehrdad Saif
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Comments:

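[AI-91] A Stochastic Differential Equation Framework for Multi-Objective LLM Interactions: Dynamical Systems Analysis with Code Generation Applications NEURIPS2025

[Quick read]: This paper addresses the problem of modeling the dynamics of multi-objective optimization in iterative Large Language Model (LLM) interactions, focusing on the systematic interference that may exist between objectives and its effect on convergence behavior. The key to the solution is a general stochastic differential equation (SDE) framework that captures the inherent stochasticity of LLM responses through explicit diffusion terms and uses an interference matrix to formalize the systematic interactions among competing objectives. The framework is validated on a code generation task, showing that it can predict convergence rates and performance under different strategies and offering a feasible route to analyzing multi-objective LLM interactions with dynamical systems theory.

Link: https://arxiv.org/abs/2510.10739
Authors: Shivani Shukla,Himanshu Joshi
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Comments: Peer-reviewed and accepted to the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) DynaFront 2025 Workshop ( this https URL )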

Abstract:We introduce a general stochastic differential equation framework for modelling multiobjective optimization dynamics in iterative Large Language Model (LLM) interactions. Our framework captures the inherent stochasticity of LLM responses through explicit diffusion terms and reveals systematic interference patterns between competing objectives via an interference matrix formulation. We validate our theoretical framework using iterative code generation as a proof-of-concept application, analyzing 400 sessions across security, efficiency, and functionality objectives. Our results demonstrate strategy-dependent convergence behaviors with rates ranging from 0.33 to 1.29, and predictive accuracy achieving R^2 = 0.74 for balanced approaches. This work proposes the feasibility of dynamical systems analysis for multi-objective LLM interactions, with code generation serving as an initial validation domain.
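
The abstract does not give the equations, so the following is only a rough sketch of what an SDE with a diffusion term and an interference matrix could look like when simulated with the Euler-Maruyama scheme. The linear drift, the function name `simulate`, and every parameter name are assumptions for illustration, not the authors' model.

```python
import math
import random


def simulate(rates, interference, target, sigma, x0, dt=0.01, steps=2000, seed=0):
    """Euler-Maruyama simulation of a hypothetical linear multi-objective SDE.

    Each objective i moves towards target[i] at its own rate, is pushed by the
    remaining gaps of the other objectives through interference[i][j], and
    receives Gaussian noise of scale sigma (the diffusion term).
    """
    rng = random.Random(seed)
    n = len(x0)
    x = list(x0)
    for _ in range(steps):
        gap = [target[i] - x[i] for i in range(n)]
        drift = [
            rates[i] * gap[i]
            + sum(interference[i][j] * gap[j] for j in range(n) if j != i)
            for i in range(n)
        ]
        x = [
            x[i] + drift[i] * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for i in range(n)
        ]
    return x
```

Setting `sigma` to zero and the off-diagonal interference to zero recovers independent exponential convergence of each objective, which makes the effect of adding interference or noise easy to compare against.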

[AI-92] Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR ICASSP2026

Link: https://arxiv.org/abs/2510.10738
Authors: Ling Sun,Charlotte Zhu,Shuju Shi
Affiliation: Unknown
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI)
Comments: Submitted to ICASSP 2026

[AI-93] Provable Anytime Ensemble Sampling Algorithms in Nonlinear Contextual Bandits

Link: https://arxiv.org/abs/2510.10730
Authors: Jiazheng Sun,Weixin Wang,Pan Xu
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments: 40 pages, 1 figure

[AI-94] SS-DPPN: A self-supervised dual-path foundation model for the generalizable cardiac audio representation

Link: https://arxiv.org/abs/2510.10719
Authors: Ummy Maria Muna,Md Mehedi Hasan Shawon,Md Jobayer,Sumaiya Akter,Md Rakibul Hasan,Md. Golam Rabiul Alam
Affiliation: Unknown
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI)
Comments:

[AI-95] Adaptive Selection of Symbolic Languages for Improving LLM Logical Reasoning

Link: https://arxiv.org/abs/2510.10703
Authors: Xiangyu Wang,Haocheng Yang,Fengxiang Cheng,Fenrong Liu
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-96] Attention-Enhanced LSTM Modeling for Improved Temperature and Rainfall Forecasting in Bangladesh

Link: https://arxiv.org/abs/2510.10702
Authors: Usman Gani Joy,Shahadat kabir,Tasnim Niger
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-97] Extended Triangular Method: A Generalized Algorithm for Contradiction Separation Based Automated Deduction

Link: https://arxiv.org/abs/2510.10701
Authors: Yang Xu,Shuwei Chen,Jun Liu,Feng Cao,Xingxing He
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Comments: 38 pages, 8 figures

[AI-98] OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs

Link: https://arxiv.org/abs/2510.10689
Authors: Caorui Li,Yu Chen,Yiyan Ji,Jin Xu,Zhenyu Cui,Shihao Li,Yuanxing Zhang,Jiafu Tang,Zhenghao Song,Dingling Zhang,Ying He,Haoxiang Liu,Yuxuan Wang,Qiufeng Wang,Zhenhe Wu,Jiehui Luo,Zhiyu Pan,Weihao Xie,Chenchen Zhang,Zhaohui Wang,Jiayi Tian,Yanghai Wang,Zhe Cao,Minxin Dai,Ke Wang,Runzhe Wen,Yinghao Ma,Yaning Pan,Sungkyun Chang,Termeh Taheri,Haiwen Xia,Christos Plachouras,Emmanouil Benetos,Yizhi Li,Ge Zhang,Jian Yang,Tianhao Peng,Zili Wang,Minghao Liu,Junran Peng,Zhaoxiang Zhang,Jiaheng Liu
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-99] LSZone: A Lightweight Spatial Information Modeling Architecture for Real-time In-car Multi-zone Speech Separation ICASSP2026

Link: https://arxiv.org/abs/2510.10687
Authors: Jun Chen,Shichao Hu,Jiuxin Lin,Wenjie Li,Zihan Zhang,Xingchen Li,JinJiang Liu,Longshuai Xiao,Chao Weng,Lei Xie,Zhiyong Wu
Affiliation: Unknown
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI)
Comments: submitted to ICASSP 2026

[AI-100] Simpliflow: A Lightweight Open-Source Framework for Rapid Creation and Deployment of Generative Agentic AI Workflows

Link: https://arxiv.org/abs/2510.10675
Authors: Deven Panchal
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-101] Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning

Link: https://arxiv.org/abs/2510.10649
Authors: Can Xie,Ruotong Pan,Xiangyu Wu,Yunfei Zhang,Jiayi Fu,Tingting Gao,Guorui Zhou
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-102] Trustworthy Retrosynthesis: Eliminating Hallucinations with a Diverse Ensemble of Reaction Scorers

Link: https://arxiv.org/abs/2510.10645
Authors: Michal Sadowski,Maria Wyrzykowska,Lukasz Sztukiewicz,Tadija Radusinović,Jan Rzymkowski,Paweł Włodarczyk-Pruszyński,Mikołaj Sacha,Piotr Kozakowski,Ruard van Workum,Stanislaw Kamil Jastrzebski
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-103] Hierarchical Optimization via LLM-Guided Objective Evolution for Mobility-on-Demand Systems

Link: https://arxiv.org/abs/2510.10644
Authors: Yi Zhang,Yushen Long,Yun Ni,Liping Huang,Xiaohong Wang,Jun Liu
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-104] UniCoD: Enhancing Robot Policy via Unified Continuous and Discrete Representation Learning

Link: https://arxiv.org/abs/2510.10642
Authors: Jianke Zhang,Yucheng Hu,Yanjiang Guo,Xiaoyu Chen,Yichen Liu,Wenna Chen,Chaochao Lu,Jianyu Chen
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments:

[AI-105] Equity-Aware Geospatial AI for Forecasting Demand-Driven Hospital Locations in Germany

Link: https://arxiv.org/abs/2510.10640
Authors: Piyush Pant,Marcellius William Suntoro,Ayesha Siddiqua,Muhammad Shehryaar Sharif,Daniyal Ahmed
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 7 pages. Application: this https URL Codebase: this https URL

[AI-106] Automatic Piecewise Linear Regression for Predicting Student Learning Satisfaction

Link: https://arxiv.org/abs/2510.10639
Authors: Haemin Choi,Gayathri Nadarajan
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[AI-107] Collaborative Text-to-Image Generation via Multi-Agent Reinforcement Learning and Semantic Fusion

Link: https://arxiv.org/abs/2510.10633
Authors: Jiabao Shi,Minfeng Qi,Lefeng Zhang,Di Wang,Yingjie Zhao,Ziying Li,Yalong Xing,Ningran Li
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 16 pages, 13 figures

[AI-108] A Machine Learning Approach for MIDI to Guitar Tablature Conversion

Link: https://arxiv.org/abs/2510.10619
Authors: Maximos Kaliakatsos-Papakostas,Gregoris Bastas,Dimos Makris,Dorien Herremans,Vassilis Katsouros,Petros Maragos
Affiliation: Unknown
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI)
Comments: Proceedings of the 19th Sound and Music Computing Conference, June 5-12th, 2022, Saint-Étienne (France)

[AI-109] EA4LLM: A Gradient-Free Approach to Large Language Model Optimization via Evolutionary Algorithms

Link: https://arxiv.org/abs/2510.10603
Authors: WenTao Liu,Siyu Song,Hao Hao,Aimin Zhou
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-110] A Distance Measure for Random Permutation Set: From the Layer-2 Belief Structure Perspective

Link: https://arxiv.org/abs/2510.10596
Authors: Ruolan Cheng,Yong Deng,Serafín Moral,José Ramón Trillo
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Comments:

[AI-111] Compositional Symmetry as Compression: Lie Pseudogroup Structure in Algorithmic Agents WWW

Link: https://arxiv.org/abs/2510.10586
Authors: Giulio Ruffini
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Neurons and Cognition (q-bio.NC)
Comments: Submitted to NeurReps 2025 ( this https URL )

[AI-112] ELAIPBench: A Benchmark for Expert-Level Artificial Intelligence Paper Understanding

Link: https://arxiv.org/abs/2510.10549
Authors: Xinbang Dai,Huikang Hu,Yongrui Chen,Jiaqi Li,Rihui Jin,Yuyang Zhang,Xiaoguang Li,Lifeng Shang,Guilin Qi
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 25 pages, 20 figures

[AI-113] PAC-Bayesian Reinforcement Learning Trains Generalizable Policies

Link: https://arxiv.org/abs/2510.10544
Authors: Abdelkrim Zitouni,Mehdi Hennequin,Juba Agoun,Ryan Horache,Nadia Kabachi,Omar Rivasplata
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments:

[AI-114] Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?

Link: https://arxiv.org/abs/2510.10541
Authors: Zihan Chen,Yiming Zhang,Hengguang Zhou,Zenghui Ding,Yining Sun,Cho-Jui Hsieh
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

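[AI-115] ECO: Enhanced Code Optimization via Performance-Aware Prompting for Code-LLMs

[Quick read]: This paper addresses a core challenge in code runtime optimization: getting large language models (LLMs) to reason about the root causes of performance gains and generate efficient code, rather than imitating surface patterns from slow-fast code pairs supplied as guidance. The key to the solution is ECO, a performance-aware prompting framework. ECO first distills runtime optimization instructions (ROIs) from reference slow-fast code pairs; each ROI states the root cause of an inefficiency and the rationale behind the performance improvement. For a given input, ECO then runs a symbolic advisor for bottleneck diagnosis in parallel with an ROI retriever that returns relevant optimization advice. Together these form a performance-aware prompt that can be prepended to the input of any code-LLM and, without fine-tuning, improves the efficiency of generated code (speedups of up to 7.81x) while minimizing correctness loss.

Link: https://arxiv.org/abs/2510.10517
Authors: Su-Hyeon Kim,Joonghyuk Hahn,Sooyoung Cha,Yo-Sub Han
Affiliation: Unknown
Categories: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Comments: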

Abstract:Code runtime optimization, the task of rewriting a given code to a faster one, remains challenging, as it requires reasoning about performance trade-offs involving algorithmic and structural choices. Recent approaches employ code-LLMs with slow-fast code pairs provided as optimization guidance, but such pair-based methods obscure the causal factors of performance gains and often lead to superficial pattern imitation rather than genuine performance reasoning. We introduce ECO, a performance-aware prompting framework for code optimization. ECO first distills runtime optimization instructions (ROIs) from reference slow-fast code pairs; each ROI describes root causes of inefficiency and the rationales that drive performance improvements. For a given input code, ECO in parallel employs (i) a symbolic advisor to produce a bottleneck diagnosis tailored to the code, and (ii) an ROI retriever to return related ROIs. These two outputs are then composed into a performance-aware prompt, providing actionable guidance for code-LLMs. ECO’s prompts are model-agnostic, require no fine-tuning, and can be easily prepended to any code-LLM prompt. Our empirical studies highlight that ECO prompting significantly improves code-LLMs’ ability to generate efficient code, achieving speedups of up to 7.81x while minimizing correctness loss.
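
The composition step described above (a diagnosis plus retrieved instructions, prepended to the code) can be pictured with a toy pipeline. Everything here is a hypothetical stand-in: the `ROIS` entries, the tag names, and the functions `diagnose`, `retrieve`, and `build_prompt` are assumptions for illustration, and the "advisor" is a simple loop-depth check on Python source rather than the paper's symbolic advisor.

```python
import ast

# Hypothetical instruction store; real ROIs would be distilled from code pairs.
ROIS = [
    {"tag": "nested-loop",
     "text": "Nested loops over the same data often hide a quadratic scan; "
             "consider a hash-based lookup."},
    {"tag": "string-concat",
     "text": "Repeated string concatenation copies the buffer; "
             "collect the parts and join once."},
]


def max_loop_depth(node, depth=0):
    """Deepest nesting of for/while loops found below this AST node."""
    here = depth + 1 if isinstance(node, (ast.For, ast.While)) else depth
    return max([here] + [max_loop_depth(c, here) for c in ast.iter_child_nodes(node)])


def diagnose(code):
    """Toy advisor: report nested loops as the likely bottleneck."""
    depth = max_loop_depth(ast.parse(code))
    if depth >= 2:
        return f"nested loop of depth {depth} detected"
    return "no obvious loop bottleneck detected"


def code_tags(code):
    """Crude feature tags that serve as the retrieval query."""
    tags = set()
    if max_loop_depth(ast.parse(code)) >= 2:
        tags.add("nested-loop")
    if "+=" in code and ('"' in code or "'" in code):
        tags.add("string-concat")
    return tags


def retrieve(code):
    """Toy retriever: return the stored instructions whose tag matches the code."""
    tags = code_tags(code)
    return [roi["text"] for roi in ROIS if roi["tag"] in tags]


def build_prompt(code):
    """Compose the diagnosis and retrieved instructions into one prompt prefix."""
    parts = ["Bottleneck diagnosis: " + diagnose(code)]
    parts += ["Optimization instruction: " + text for text in retrieve(code)]
    parts.append("Rewrite the following code to run faster while preserving behavior:\n" + code)
    return "\n".join(parts)
```

The advisor and the retriever both take only the input code, so they could run in parallel as the abstract describes; the sketch calls them one after the other for brevity.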

[AI-116] Population-Coded Spiking Neural Networks for High-Dimensional Robotic Control

Link: https://arxiv.org/abs/2510.10516
Authors: Kanishkha Jaisankar,Xiaoyang Jiang,Feifan Liao,Jeethu Sreenivas Amuthan
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[AI-117] f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness

Link: https://arxiv.org/abs/2510.10510
Authors: Subhodip Panda,Dhruv Tarsadiya,Shashwat Sourav,Prathosh A.P,Sai Praneeth Karimireddy
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-118] MARS-Sep: Multimodal-Aligned Reinforced Sound Separation

Link: https://arxiv.org/abs/2510.10509
Authors: Zihan Zhang,Xize Cheng,Zhennan Jiang,Dongjie Fu,Jingyuan Chen,Zhou Zhao,Tao Jin
Affiliation: Unknown
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI)
Comments:

[AI-119] Align2Act: Instruction-Tuned Models for Human-Aligned Autonomous Driving

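[Quick read]: This paper addresses the challenge of motion planning in complex autonomous-driving scenarios, and in particular the question of whether existing methods based on large language models (LLMs) truly capture human driving logic. The key to the solution is the Align2Act framework, which turns instruction-tuned LLMs into interpretable planners aligned with human behavior: it first builds structured driving instructions from human reasoning patterns (e.g., anticipating hazards, yielding at intersections) and traffic rules (e.g., stopping at red lights, maintaining lane boundaries), then uses the Align2ActChain module for step-by-step reasoning, producing trajectories that are both interpretable and safe. On the real-world nuPlan closed-loop benchmark, the method improves on earlier work that focused only on synthetic or open-loop settings, in both planning quality and human-likeness.

Link: https://arxiv.org/abs/2510.10503
Authors: Kanishkha Jaisankar,Sunidhi Tandel
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Comments: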

Abstract:Motion planning in complex scenarios is a core challenge in autonomous driving. Conventional methods apply predefined rules or learn from driving data to generate trajectories, while recent approaches leverage large language models (LLMs) for decision-making. However, it remains unclear whether LLMs truly capture human driving logic. We propose Align2Act, a motion planning framework that transforms instruction-tuned LLMs into interpretable planners aligned with human behavior. We derive structured driving instructions based on human reasoning patterns (e.g., anticipate hazards, yield at intersections) and traffic rules (e.g., stop at red lights, maintain lane boundaries). Our Align2ActChain module guides step-by-step reasoning to produce both an interpretable rationale and a safe trajectory. By fine-tuning LLaMA-2-7B with LoRA on one million scenarios from the nuPlan dataset, our method achieves an open-loop score of 85.17 and closed-loop scores of 70.31 (non-reactive) and 66.96 (reactive) on Test14-random. Unlike prior work focused on synthetic or open-loop settings, we demonstrate improved planning quality and human-likeness on the real-world nuPlan closed-loop benchmark. Ablation studies confirm that structured reasoning significantly improves performance over baseline LLM planners.

[AI-120] Personalized Motion Guidance Framework for Athlete-Centric Coaching

Link: https://arxiv.org/abs/2510.10496
Authors: Ryota Takamidoa,Chiharu Suzukia,Hiroki Nakamoto
Affiliation: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Comments:

[AI-121] Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning

Link: https://arxiv.org/abs/2510.10494
Authors: Martina G. Vilas,Safoora Yousefi,Besmira Nushi,Eric Horvitz,Vidhisha Balachandran
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-122] SASER: Stego attacks on open-source LLMs

Link: https://arxiv.org/abs/2510.10486
Authors: Ming Tan,Wei Li,Hu Tao,Hailong Ma,Aodi Liu,Qian Chen,Zilong Wang
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments:

[AI-123] Latent Retrieval Augmented Generation of Cross-Domain Protein Binders

Link: https://arxiv.org/abs/2510.10480
Authors: Zishen Zhang,Xiangzhe Kong,Wenbing Huang,Yang Liu
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-124] AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs

Link: https://arxiv.org/abs/2510.10467
Authors: Gunho Park,Jeongin Bae,Beomseok Kwon,Byeongwook Kim,Se Jung Kwon,Dongsoo Lee
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-125] LightSAE: Parameter-Efficient and Heterogeneity-Aware Embedding for IoT Multivariate Time Series Forecasting

Link: https://arxiv.org/abs/2510.10465
Authors: Yi Ren,Xinjie Yu
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Submitted to IEEE IoT-J

[AI-126] MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision

Link: https://arxiv.org/abs/2510.10461
Authors: Hongjie Zheng,Zesheng Shi,Ping Yi
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-127] Testing and Enhancing Multi-Agent Systems for Robust Code Generation

Link: https://arxiv.org/abs/2510.10460
Authors: Zongyi Lyu,Songqiang Chen,Zhenlan Ji,Liwen Wang,Shuai Wang,Daoyuan Wu,Wenxuan Wang,Shing-Chi Cheung
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: 19 pages, 5 figures

[AI-128] Traj-CoA: Patient Trajectory Modeling via Chain-of-Agents for Lung Cancer Risk Prediction ALT NEURIPS2025

Link: https://arxiv.org/abs/2510.10454
Authors: Sihang Zeng,Yujuan Fu,Sitong Zhou,Zixuan Yu,Lucas Jing Liu,Jun Wen,Matthew Thompson,Ruth Etzioni,Meliha Yetisgen
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: Accepted by NeurIPS 2025 GenAI4Health Workshop

[AI-129] Data-driven simulator of multi-animal behavior with unknown dynamics via offline and online reinforcement learning

Link: https://arxiv.org/abs/2510.10451
Authors: Keisuke Fujii,Kazushi Tsutsui,Yu Teshima,Makoto Itoh,Naoya Takeishi,Nozomi Nishiumi,Ryoya Tanaka,Shunsuke Shigaki,Yoshinobu Kawahara
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 21 pages, 7 figures

[AI-130] Reverse Supervision at Scale: Exponential Search Meets the Economics of Annotation

Link: https://arxiv.org/abs/2510.10446
Authors: Masoud Makrehchi
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 10 pages

[AI-131] Multi-Task Learning with Feature-Similarity Laplacian Graphs for Predicting Alzheimer's Disease Progression

Link: https://arxiv.org/abs/2510.10433
Authors: Zixiang Xu,Menghui Zhou,Jun Qi,Xuanhan Fan,Yun Yang,Po Yang
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-132] Hierarchical LoRA MoE for Efficient CTR Model Scaling

Link: https://arxiv.org/abs/2510.10432
Authors: Zhichen Zeng,Mengyue Hang,Xiaolong Liu,Xiaoyi Liu,Xiao Lin,Ruizhong Qiu,Tianxin Wei,Zhining Liu,Siyang Yuan,Chaofei Yang,Yiqun Liu,Hang Yin,Jiyan Yang,Hanghang Tong
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Comments: 13 pages, 9 figures

[AI-133] Trace Length is a Simple Uncertainty Signal in Reasoning Models

Link: https://arxiv.org/abs/2510.10409
Authors: Siddartha Devic,Charlotte Peale,Arwen Bradley,Sinead Williamson,Preetum Nakkiran,Aravind Gollakota
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-134] Controllable Graph Generation with Diffusion Models via Inference-Time Tree Search Guidance

Link: https://arxiv.org/abs/2510.10402
Authors: Jiachi Zhao,Zehong Wang,Yamei Liao,Chuxu Zhang,Yanfang Ye
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Comments:

[AI-135] RobotFleet: An Open-Source Framework for Centralized Multi-Robot Task Planning

Link: https://arxiv.org/abs/2510.10379
Authors: Rohan Gupta,Trevor Asbery,Zain Merchant,Abrar Anwar,Jesse Thomason
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Comments:

[AI-136] Measuring What Matters: Connecting AI Ethics Evaluations to System Attributes, Hazards, and Harms

Link: https://arxiv.org/abs/2510.10339
Authors: Shalaleh Rismani,Renee Shelby,Leah Davis,Negar Rostamzadeh,AJung Moon
Affiliation: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[AI-137] Beyond Ethics: How Inclusive Innovation Drives Economic Returns in Medical AI

Link: https://arxiv.org/abs/2510.10338
Authors: Balagopal Unnikrishnan,Ariel Guerra Adames,Amin Adibi,Sameer Peesapati,Rafal Kocielnik,Shira Fischer,Hillary Clinton Kasimbazi,Rodrigo Gameiro,Alina Peluso,Chrystinne Oliveira Fernandes,Maximin Lange,Lovedeep Gondara,Leo Anthony Celi
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Comments:

[AI-138] Towards Safe Maneuvering of Double-Ackermann-Steering Robots with a Soft Actor-Critic Framework IROS2025

Link: https://arxiv.org/abs/2510.10332
Authors: Kohio Deflesselle,Mélodie Daniel,Aly Magassouba,Miguel Aranda,Olivier Ly
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: 4 pages, 3 figures, 2 tables, Accepted for Safety of Intelligent and Autonomous Vehicles: Formal Methods vs. Machine Learning approaches for reliable navigation (SIAV-FM2L) an IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025) workshop

[AI-139] LLM-Friendly Knowledge Representation for Customer Support

Link: https://arxiv.org/abs/2510.10331
Authors: Hanchen Su,Wei Luo,Wei Han,Yu Elaine Liu,Yufeng Wayne Zhang,Cen Mia Zhao,Ying Joy Zhang,Yashar Mehdad
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-140] Mapping the Urban Mobility Intelligence Frontier: A Scientometric Analysis of Data-Driven Pedestrian Trajectory Prediction and Simulation

Link: https://arxiv.org/abs/2510.10327
Authors: Junhao Xu,Hui Zeng
Affiliation: Unknown
Categories: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Comments: 5 figures

[AI-141] KG-MAS: Knowledge Graph-Enhanced Multi-Agent Infrastructure for coupling physical and digital robotic environments

Link: https://arxiv.org/abs/2510.10325
Authors: Walid Abdela
Affiliation: Unknown
Categories: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Comments:

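[AI-142] Bridging Semantics & Structure for Software Vulnerability Detection using Hybrid Network Models

[Quick read]: This paper addresses the inaccuracy of traditional static and dynamic analyses in software vulnerability detection, which stems from overlooking the structural dependencies in programs that often determine how insecure behavior arises. The key to the solution is to model programs as heterogeneous graphs that explicitly capture the complex interaction network of control flow and data flow, and to combine them with lightweight local LLMs (4B parameters) for joint reasoning, uniting topological features with semantic understanding at low computational cost and without the privacy concerns of cloud models. The method reaches 93.57% accuracy on Java vulnerability detection, outperforming baselines based on Graph Attention Networks and pretrained LLMs, and it can extract salient subgraphs and generate natural language explanations, improving interpretability and practical usefulness.

Link: https://arxiv.org/abs/2510.10321
Authors: Jugal Gajjar,Kaustik Ranaware,Kamalasankari Subramaniakuppusamy
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Comments: 13 pages, 3 figures, 5 tables, 14 equations, accepted at the 14th International Conference on Complex Networks and Their Applications (COMPLEX NETWORKS 2025) and the conference proceedings will be published by Springer in the Studies in Computational Intelligence series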

Abstract:Software vulnerabilities remain a persistent risk, yet static and dynamic analyses often overlook structural dependencies that shape insecure behaviors. Viewing programs as heterogeneous graphs, we capture control- and data-flow relations as complex interaction networks. Our hybrid framework combines these graph representations with light-weight (4B) local LLMs, uniting topological features with semantic reasoning while avoiding the cost and privacy concerns of large cloud models. Evaluated on Java vulnerability detection (binary classification), our method achieves 93.57% accuracy, an 8.36% gain over Graph Attention Network-based embeddings and 17.81% over pretrained LLM baselines such as Qwen2.5 Coder 3B. Beyond accuracy, the approach extracts salient subgraphs and generates natural language explanations, improving interpretability for developers. These results pave the way for scalable, explainable, and locally deployable tools that can shift vulnerability analysis from purely syntactic checks to deeper structural and semantic insights, facilitating broader adoption in real-world secure software development.
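
To make "programs as heterogeneous graphs" concrete, here is a small sketch that turns straight-line code into statement nodes joined by two edge types, control flow and data flow. It is an illustration under loud assumptions: the paper works on Java, while this toy parses Python with the standard `ast` module (Python 3.9+ for `ast.unparse`), handles only top-level statements, and the function name `build_graph` is hypothetical.

```python
import ast


def build_graph(source):
    """Return (nodes, edges) for top-level statements.

    nodes: source text of each statement.
    edges: (src_index, dst_index, edge_type) with edge_type "control" or "data".
    """
    stmts = ast.parse(source).body
    nodes = [ast.unparse(s) for s in stmts]
    edges = []
    last_def = {}  # variable name -> index of the statement that last assigned it
    for i, stmt in enumerate(stmts):
        # Control edge: sequential execution order between adjacent statements.
        if i > 0:
            edges.append((i - 1, i, "control"))
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Data edge: this statement reads a variable defined by an earlier one.
        loads = {n.id for n in names if isinstance(n.ctx, ast.Load)}
        for name in sorted(loads):
            if name in last_def:
                edges.append((last_def[name], i, "data"))
        # Record definitions after reads so "x = x + 1" links to the old x.
        for n in names:
            if isinstance(n.ctx, ast.Store):
                last_def[n.id] = i
    return nodes, edges
```

For the three-line program `x = read()`, `y = x + 1`, `send(y)`, each adjacent pair of statements ends up linked by both a control edge and a data edge, which is the kind of typed structure a graph model could consume alongside the statement text.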

[AI-143] Prepared for the Unknown: Adapting AIOps Capacity Forecasting Models to Data Changes

Link: https://arxiv.org/abs/2510.10320
Authors: Lorena Poenaru-Olaru,Wouter van 't Hof,Adrian Stando,Arkadiusz P. Trawinski,Eileen Kapel,Jan S. Rellermeyer,Luis Cruz,Arie van Deursen
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments:

[AI-144] The algorithmic regulator

Link: https://arxiv.org/abs/2510.10300
Authors: Giulio Ruffini
Affiliation: Unknown
Categories: Computational Complexity (cs.CC); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Systems and Control (eess.SY); Neurons and Cognition (q-bio.NC)
Comments: 2 Figures

[AI-145] Mitigating Hallucination in Multimodal Reasoning via Functional Attention Control

Link: https://arxiv.org/abs/2510.10285
Authors: Haolang Lu,Bolun Chu,WeiYe Fu,Guoshun Nan,Junning Liu,Minghui Pan,Qiankun Li,Yi Yu,Hua Wang,Kun Wang
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: preprint

[AI-146] Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models

Link: https://arxiv.org/abs/2510.10278
Authors: Christopher Chiu,Silviu Pitis,Mihaela van der Schaar
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-147] MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation

Link: https://arxiv.org/abs/2510.10271
Authors: Wentian Zhu,Zhen Xiang,Wei Niu,Le Guan
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments:

[AI-148] Unveiling Gamer Archetypes through Multi modal feature Correlations and Unsupervised Learning

Link: https://arxiv.org/abs/2510.10263
Authors: Moona Kanwal,Muhammad Sami Siddiqui,Syed Anael Ali
Affiliation: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Comments: Submitted to Peer Review Journal

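[AI-149] Reasoning-Enhanced Large Language Models for Molecular Property Prediction

[Quick read]: This paper addresses three problems with existing methods for molecular property prediction: limited interpretability, poor cross-task generalization, and a lack of chemical reasoning ability. Traditional machine learning models transfer poorly across tasks, while specialized molecular language models offer little insight into their decision process. The authors propose MPPReasoner, a multimodal large language model with chemical reasoning capabilities. Its key ideas are to combine molecular images with SMILES strings for comprehensive molecular understanding and to train in two stages: supervised fine-tuning (SFT) on 16,000 high-quality reasoning trajectories generated with expert knowledge and multiple teacher models, followed by Reinforcement Learning from Principle-Guided Rewards (RLPGR), which uses verifiable rule-based rewards to systematically evaluate chemical principle application, molecular structure analysis, and logical consistency. This improves performance on in-distribution and out-of-distribution tasks by 7.91% and 4.53% respectively, and yields chemically sound reasoning paths that substantially improve interpretability and practical value for chemists.

Link: https://arxiv.org/abs/2510.10248
Authors: Jiaxi Zhuang,Yaorui Shi,Jue Hou,Yunong He,Mingwei Ye,Mingjun Xu,Yuming Su,Linfeng Zhang,Linfeng Zhang,Guolin Ke,Hengxing Cai
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: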

Abstract:Molecular property prediction is crucial for drug discovery and materials science, yet existing approaches suffer from limited interpretability, poor cross-task generalization, and lack of chemical reasoning capabilities. Traditional machine learning models struggle with task transferability, while specialized molecular language models provide little insight into their decision-making processes. To address these limitations, we propose MPPReasoner, a multimodal large language model that incorporates chemical reasoning for molecular property prediction. Our approach, built upon Qwen2.5-VL-7B-Instruct, integrates molecular images with SMILES strings to enable comprehensive molecular understanding. We develop a two-stage training strategy: supervised fine-tuning (SFT) using 16,000 high-quality reasoning trajectories generated through expert knowledge and multiple teacher models, followed by Reinforcement Learning from Principle-Guided Rewards (RLPGR). RLPGR employs verifiable, rule-based rewards that systematically evaluate chemical principle application, molecular structure analysis, and logical consistency through computational verification. Extensive experiments across 8 datasets demonstrate significant performance improvements, with MPPReasoner outperforming the best baselines by 7.91% and 4.53% on in-distribution and out-of-distribution tasks respectively. MPPReasoner exhibits exceptional cross-task generalization and generates chemically sound reasoning paths that provide valuable insights into molecular property analysis, substantially enhancing both interpretability and practical utility for chemists. Code is available at this https URL.

[AI-150] The Achilles Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities

Link: https://arxiv.org/abs/2510.10238
Authors: Zixuan Qin, Kunlin Lyu, Qingchen Yu, Yifan Sun, Zhaoxin Fan
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)

[AI-151] SGM: A Statistical Godel Machine for Risk-Controlled Recursive Self-Modification

Link: https://arxiv.org/abs/2510.10232
Authors: Xuening Wu, Shenqin Yin, Yanlan Kang, Xinhang Zhang, Qianya Xu, Zeping Chen, Wenqiang Zhang
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-152] A3RNN: Bi-directional Fusion of Bottom-up and Top-down Process for Developmental Visual Attention in Robots

Link: https://arxiv.org/abs/2510.10221
Authors: Hyogo Hiruma, Hiroshi Ito, Hiroki Mori, Tetsuya Ogata
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: 8 pages, 5 figures

[AI-153] UF-RNN: Real-Time Adaptive Motion Generation Using Uncertainty-Driven Foresight Prediction

Link: https://arxiv.org/abs/2510.10217
Authors: Hyogo Hiruma, Hiroshi Ito, Tetsuya Ogata
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: 8 pages, 6 figures

[AI-154] Learning to Guarantee Type Correctness in Code Generation through Type-Guided Program Synthesis

Link: https://arxiv.org/abs/2510.10216
Authors: Zhechong Huang, Zhao Zhang, Ruyi Ji, Tingxuan Xia, Qihao Zhu, Qinxiang Cao, Zeyu Sun, Yingfei Xiong
Affiliation: Unknown
Categories: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

[AI-155] Adaptive Dual Reasoner: Large Reasoning Models Can Think Efficiently by Hybrid Reasoning

Link: https://arxiv.org/abs/2510.10207
Authors: Yujian Zhang, Keyu Chen, Zhifeng Shen, Ruizhi Qiao, Xing Sun
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)

[AI-156] PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration

Link: https://arxiv.org/abs/2510.10205
Authors: Manjiang Yu, Hongji Li, Priyanka Singh, Xue Li, Di Wang, Lijie Hu
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 18 pages, 3 figures

[AI-157] Revisiting Trust in the Era of Generative AI: Factorial Structure and Latent Profiles

Link: https://arxiv.org/abs/2510.10199
Authors: Haocan Sun, Weizi Liu, Di Wu, Guoming Yu, Mike Yao
Affiliation: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

[AI-158] Don't Just Fine-tune the Agent, Tune the Environment

Link: https://arxiv.org/abs/2510.10197
Authors: Siyuan Lu, Zechuan Wang, Hongxuan Zhang, Qintong Wu, Leilei Gan, Chenyi Zhuang, Jinjie Gu, Tao Lin
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)

[AI-159] CauchyNet: Compact and Data-Efficient Learning using Holomorphic Activation Functions

Link: https://arxiv.org/abs/2510.10195
Authors: Hong-Kun Zhang, Xin Li, Sikun Yang, Zhihong Xia
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-160] SAFER: Risk-Constrained Sample-then-Filter in Large Language Models

Link: https://arxiv.org/abs/2510.10193
Authors: Qingni Wang, Yue Fan, Xin Eric Wang
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)

[AI-161] Formally Verified Certification of Unsolvability of Temporal Planning Problems

Link: https://arxiv.org/abs/2510.10189
Authors: David Wang, Mohammad Abdulaziz
Affiliation: Unknown
Categories: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)

[AI-162] LLMs are All You Need? Improving Fuzz Testing for MOJO with Large Language Models

Link: https://arxiv.org/abs/2510.10179
Authors: Linghan Huang, Peizhou Zhao, Huaming Chen
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

[AI-163] Concise Reasoning in the Lens of Lagrangian Optimization

Link: https://arxiv.org/abs/2510.10168
Authors: Chengqian Gao, Haonan Li, Taylor W. Killian, Jianshu She, Renxi Wang, Liqun Ma, Zhoujun Cheng, Shibo Hao, Zhiqiang Xu
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)

[AI-164] Multi-Scale Diffusion Transformer for Jointly Simulating User Mobility and Mobile Traffic Pattern

Link: https://arxiv.org/abs/2510.10158
Authors: Ziyi Liu, Qingyue Long, Zhiwen Xue, Huandong Wang, Yong Li
Affiliation: Unknown
Categories: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Comments: 9 pages, 4 figures. Code: this https URL

[AI-165] Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective

Link: https://arxiv.org/abs/2510.10150
Authors: Zhezheng Hao, Hong Wang, Haoyang Liu, Jian Luo, Jiarui Yu, Hande Dong, Qiang Lin, Can Wang, Jiawei Chen
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-166] A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting

Link: https://arxiv.org/abs/2510.10145
Authors: Cheng He, Xijie Liang, Zengrong Zheng, Patrick P.C. Lee, Xu Huang, Zhaoyi Li, Hong Xie, Defu Lian, Enhong Chen
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-167] PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models NEURIPS2025

Link: https://arxiv.org/abs/2510.10136
Authors: Lancheng Zou, Shuo Yin, Zehua Pei, Tsung-Yi Ho, Farzan Farnia, Bei Yu
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Accepted by NeurIPS 2025

[AI-168] CharCom: Composable Identity Control for Multi-Character Story Illustration ACM-MM

Link: https://arxiv.org/abs/2510.10135
Authors: Zhongsheng Wang, Ming Lin, Zhedong Lin, Yaser Shakib, Qian Liu, Jiamou Liu
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: Accepted by ACM MMAsia 2025

[AI-169] CacheClip: Accelerating RAG with Effective KV Cache Reuse

Link: https://arxiv.org/abs/2510.10129
Authors: Bin Yang, Qiuyu Leng, Jun Zeng, Zhenhua Wu
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-170] Ctrl-World: A Controllable Generative World Model for Robot Manipulation

Link: https://arxiv.org/abs/2510.10125
Authors: Yanjiang Guo, Lucy Xiaoyang Shi, Jianyu Chen, Chelsea Finn
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: 17 pages

[AI-171] DixitWorld: Evaluating Multimodal Abductive Reasoning in Vision-Language Models with Multi-Agent Dixit Gameplay EMNLP2025

Link: https://arxiv.org/abs/2510.10117
Authors: Yunxiang Mo, Tianshi Zheng, Qing Zong, Jiayu Liu, Baixuan Xu, Yauwai Yim, Chunkit Chan, Jiaxin Bai, Yangqiu Song
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: EMNLP 2025 Wordplay (Spotlight)

[AI-172] What Makes Looped Transformers Perform Better Than Non-Recursive Ones (Provably)

Link: https://arxiv.org/abs/2510.10089
Authors: Zixuan Gong, Jiaye Teng, Yong Liu
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

[AI-173] Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning

Link: https://arxiv.org/abs/2510.10085
Authors: Guozhi Liu, Qi Mu, Tiansheng Huang, Xinhua Wang, Li Shen, Weiwei Lin, Zhang Li
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

[AI-174] How AI Companionship Develops: Evidence from a Longitudinal Study

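[Quick read]: Research on what drives interaction between humans and AI companions has mostly examined factors one at a time, leaving little understanding of how those factors interact and evolve. The authors address this with two studies that build a longitudinal model of AI companionship development. Study 1, a cross-sectional survey (N=303), identifies the links among key variables such as mental models, parasocial interaction, and engagement. Study 2, a longitudinal design (N=110), finds that users' perceptions of a generic chatbot converge significantly toward their perceptions of their own personalized companions within three weeks, which suggests that AI companionship forms along a predictable trajectory. The key contribution is a dynamic development framework, proposed and empirically tested, that integrates mental models, parasocial experience, and sustained interaction, giving future work on human-AI companionship a practical measurement method and a theoretical basis.

Link: https://arxiv.org/abs/2510.10079
Authors: Angel Hsing-Chi Hwang, Fiona Li, Jacy Reese Anthis, Hayoun Noh
Affiliation: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)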

Abstract:The quickly growing popularity of AI companions poses risks to mental health, personal wellbeing, and social relationships. Past work has identified many individual factors that can drive human-companion interaction, but we know little about how these factors interact and evolve over time. In Study 1, we surveyed AI companion users (N = 303) to map the psychological pathway from users’ mental models of the agent to parasocial experiences, social interaction, and the psychological impact of AI companions. Participants’ responses foregrounded multiple interconnected variables (agency, parasocial interaction, and engagement) that shape AI companionship. In Study 2, we conducted a longitudinal study with a subset of participants (N = 110) using a new generic chatbot. Participants’ perceptions of the generic chatbot significantly converged to perceptions of their own companions by Week 3. These results suggest a longitudinal model of AI companionship development and demonstrate an empirical method to study human-AI companionship.

[AI-175] Gradient-based Model Shortcut Detection for Time Series Classification

Link: https://arxiv.org/abs/2510.10075
Authors: Salomon Ibarra, Frida Cantu, Kaixiong Zhou, Li Zhang
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Code available at: this https URL

[AI-176] Agentic Troubleshooting Guide Automation for Incident Management

Link: https://arxiv.org/abs/2510.10074
Authors: Jiayi Mao, Liqun Li, Yanjie Gao, Zegang Peng, Shilin He, Chaoyun Zhang, Si Qin, Samia Khalid, Qingwei Lin, Saravan Rajmohan, Sitaram Lanka, Dongmei Zhang
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)

[AI-177] SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation

Link: https://arxiv.org/abs/2510.10069
Authors: Zeyu Ling, Xiaodong Gu, Jiangnan Tang, Changqing Zou
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Multimedia (cs.MM)

[AI-178] OBsmith: Testing JavaScript Obfuscator using LLM-powered sketching

Link: https://arxiv.org/abs/2510.10066
Authors: Shan Jiang, Chenguang Zhu, Sarfraz Khurshid
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

[AI-179] ALLOY: Generating Reusable Agent Workflows from User Demonstration

Link: https://arxiv.org/abs/2510.10049
Authors: Jiawen Li, Zheng Ning, Yuan Tian, Toby Jia-jun Li
Affiliation: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

[AI-180] SwarmSys: Decentralized Swarm-Inspired Agents for Scalable and Adaptive Reasoning

Link: https://arxiv.org/abs/2510.10047
Authors: Ruohao Li, Hongjun Liu, Leyi Zhao, Zisu Li, Jiawei Li, Jiajun Jiang, Linning Xu, Chen Zhao, Mingming Fan, Chen Liang
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 14 pages, 7 figures

[AI-181] Belief Graphs with Reasoning Zones: Structure Dynamics and Epistemic Activation

Link: https://arxiv.org/abs/2510.10042
Authors: Saleh Nikooroo, Thomas Engel
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)

[AI-182] FOSSIL: Regret-Minimizing Curriculum Learning for Metadata-Free and Low-Data Mpox Diagnosis

Link: https://arxiv.org/abs/2510.10041
Authors: Sahng-Min Han, Minjae Kim, Jinho Cha, Se-woon Choe, Eunchan Daniel Cha, Jungwon Choi, Kyudong Jung
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 35 pages, 11 figures, submitted to Computers in Biology and Medicine (Elsevier, under review)

[AI-183] Failure-Driven Workflow Refinement

Link: https://arxiv.org/abs/2510.10035
Authors: Jusheng Zhang, Kaitong Cai, Qinglin Zeng, Ningyuan Liu, Stephen Fan, Ziliang Chen, Keze Wang
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)

[AI-184] Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization

Link: https://arxiv.org/abs/2510.10028
Authors: Yang Li, Ruichen Zhang, Yinqiu Liu, Guangyuan Liu, Dusit Niyato, Abbas Jamalipour, Xianbin Wang, Dong In Kim
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

[AI-185] Skill-Targeted Adaptive Training

Link: https://arxiv.org/abs/2510.10023
Authors: Yinghui He, Abhishek Panigrahi, Yong Lin, Sanjeev Arora
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-186] SLEAN: Simple Lightweight Ensemble Analysis Network for Multi-Provider LLM Coordination: Design Implementation and Vibe Coding Bug Investigation Case Study

Link: https://arxiv.org/abs/2510.10010
Authors: Matheus J. T. Vargas
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: 14 pages, 4 figures, 6 tables, link to code repo

[AI-187] RIPRAG: Hack a Black-box Retrieval-Augmented Generation Question-Answering System with Reinforcement Learning

Link: https://arxiv.org/abs/2510.10008
Authors: Meng Xi, Sihan Lv, Yechen Jin, Guanjie Cheng, Naibo Wang, Ying Li, Jianwei Yin
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)

[AI-188] Deliberative Dynamics and Value Alignment in LLM Debates

Link: https://arxiv.org/abs/2510.10002
Authors: Pratik S. Sachdeva, Tom van Nuenen
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)

[AI-189] Follow My Lead: Logical Fallacy Classification with Knowledge-Augmented LLMs

Link: https://arxiv.org/abs/2510.09970
Authors: Olivia Peiyu Wang, Tashvi Bansal, Ryan Bai, Emily M. Chui, Leilani H. Gilpin
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: Accepted as a poster at the Twelfth Annual Conference on Advances in Cognitive Systems. 21 pages, 7 figures and 1 table

[AI-190] Homomorphic Mappings for Value-Preserving State Aggregation in Markov Decision Processes

Link: https://arxiv.org/abs/2510.09965
Authors: Shuo Zhao, Yongqiang Li, Yu Feng, Zhongsheng Hou, Yuanjing Feng
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

[AI-191] Conformal Sparsification for Bandwidth-Efficient Edge-Cloud Speculative Decoding NEURIPS2025

Link: https://arxiv.org/abs/2510.09942
Authors: Payel Bhattacharjee, Fengwei Tian, Meiyu Zhong, Guangyi Zhang, Osvaldo Simeone, Ravi Tandon
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Comments: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: AI and ML for Next-Generation Wireless Communications and Networking (AI4NextG)

[AI-192] MemPromptTSS: Persistent Prompt Memory for Iterative Multi-Granularity Time Series State Segmentation

Link: https://arxiv.org/abs/2510.09930
Authors: Ching Chang, Ming-Chih Lo, Chiao-Tung Chan, Wen-Chih Peng, Tien-Fu Chen
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: This paper is currently under review. The code will be made available upon acceptance

[AI-193] Phase-Aware Deep Learning with Complex-Valued CNNs for Audio Signal Applications

Link: https://arxiv.org/abs/2510.09926
Authors: Naman Agrawal
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)

[AI-194] Augmenting generative models with biomedical knowledge graphs improves targeted drug discovery

Link: https://arxiv.org/abs/2510.09914
Authors: Aditya Malusare, Vineet Punyamoorty, Vaneet Aggarwal
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
Comments: This paper has been accepted for publication in the IEEE Transactions on Artificial Intelligence, October 2025

[AI-195] Agentic Property-Based Testing: Finding Bugs Across the Python Ecosystem NEURIPS2025

Link: https://arxiv.org/abs/2510.09907
Authors: Muhammad Maaz, Liam DeVoe, Zac Hatfield-Dodds, Nicholas Carlini
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: 4 pages (main), NeurIPS 2025, The 4th Deep Learning for Code Workshop

[AI-196] Stability of Transformers under Layer Normalization

Link: https://arxiv.org/abs/2510.09904
Authors: Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Krishna Kumar, Markos A. Katsoulakis
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

[AI-197] Autonomous Agents for Scientific Discovery: Orchestrating Scientists Language Code and Physics

Link: https://arxiv.org/abs/2510.09901
Authors: Lianhao Zhou, Hongyi Ling, Cong Fu, Yepeng Huang, Michael Sun, Wendi Yu, Xiaoxuan Wang, Xiner Li, Xingyu Su, Junkai Zhang, Xiusi Chen, Chenxing Liang, Xiaofeng Qian, Heng Ji, Wei Wang, Marinka Zitnik, Shuiwang Ji
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)

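[AI-198] Learning Bug Context for PyTorch-to-JAX Translation with LLMs

[Quick read]: This paper addresses the weak performance of current large language models (LLMs) on PyTorch-to-JAX code translation, a task complicated by the two libraries' differences in core design, execution semantics, and ecosystem maturity. The proposed T2J framework is a three-stage pipeline: (1) assemble PyTorch sources from TorchLeet and CodeParrot and use GPT-4o-mini to produce initial JAX drafts; (2) have professional developers iteratively repair the drafts until they are functionally equivalent, yielding a structured dataset of common errors and their fixes; and (3) build augmented prompts from these fix patterns to steer lightweight LLMs such as GPT-4o-mini toward more accurate, runnable JAX code. Reported gains for GPT-4o-mini are up to 10% on CodeBLEU, 50% on T2J FixCost Score, 1.33 points on T2J CodeTrans Score (0 to 4 scale), and 100% on T2J Comparison Score, and the generated code runs up to 2.5x faster than the baseline.

Link: https://arxiv.org/abs/2510.09898
Authors: Hung Phan, Son Le Vu, Ali Jannesari
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)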

Abstract:Despite recent progress of large language models (LLMs) on code translation among mainstream languages, translating PyTorch to JAX remains nontrivial. The two libraries, though both embedded in Python, differ in core design, execution semantics, and ecosystem maturity; JAX is newer and comparatively underrepresented in public code, and parallel PyTorch–JAX corpora are limited. Weaknesses in existing evaluation further complicate cross-framework benchmarking. We present T2J, a prompt-augmentation framework that strengthens LLM-based PyTorch to JAX translation. Our pipeline (i) assembles two PyTorch sources – the problem-solving set from TorchLeet (Aroori Chien, 2025) and a GitHub-derived set from CodeParrot (Wolf et al., 2022) – and uses GPT-4o-mini to produce initial JAX drafts; (ii) engages two professional developers to iteratively repair those drafts until functional equivalence, yielding a curated fixed-bug dataset of common errors and patches; and (iii) constructs augmented prompts that inject structured guidance from these fixes to steer lightweight LLMs (e.g., GPT-4o-mini). We also introduce three metrics tailored to PyTorch to JAX: T2J CodeTrans Score, T2J FixCost Score (an LLM-based estimate of bug-fix effort), and T2J Comparison Score (LLM-as-judge). Empirically, T2J raises GPT-4o-mini performance by up to 10% on CodeBLEU, 50% on T2J FixCost Score, 1.33 points on T2J CodeTrans Score (0–4 scale), and 100% on T2J Comparison Score; moreover, the generated code runs up to 2.5x faster than the baseline.
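
Step (iii) of the pipeline, injecting guidance from known fixes into the prompt, can be sketched roughly as follows. This is an illustrative guess at the shape of prompt augmentation, not the T2J code. The `FixNote` type, the substring-based matching, and the three example notes are assumptions, although each note describes a real PyTorch/JAX difference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixNote:
    trigger: str   # substring to look for in the PyTorch source
    guidance: str  # advice injected into the prompt when the trigger matches


# Hypothetical entries; the paper's curated fixed-bug dataset is not reproduced here.
FIX_NOTES = [
    FixNote("torch.manual_seed",
            "JAX has no global RNG state: create a key with jax.random.PRNGKey and pass it explicitly."),
    FixNote(".backward()",
            "JAX has no .backward(): differentiate a pure loss function with jax.grad."),
    FixNote(".add_(",
            "JAX arrays are immutable: replace in-place ops with functional updates such as x.at[idx].set(v)."),
]


def build_augmented_prompt(pytorch_source: str, notes=FIX_NOTES) -> str:
    """Prepend guidance from every fix note whose trigger appears in the source."""
    relevant = [note.guidance for note in notes if note.trigger in pytorch_source]
    lines = ["Translate the following PyTorch code to JAX."]
    if relevant:
        lines.append("Avoid these known translation pitfalls:")
        lines.extend(f"- {guidance}" for guidance in relevant)
    lines.append("PyTorch source:")
    lines.append(pytorch_source)
    return "\n".join(lines)
```

Calling `build_augmented_prompt("torch.manual_seed(0)\nloss.backward()")` yields a prompt carrying the RNG and autodiff notes but not the in-place one. Substring matching is the simplest possible retrieval rule; a practical system would probably need something more robust.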

[AI-199] Chain-of-Influence: Tracing Interdependencies Across Time and Features in Clinical Predictive Modelings

Link: https://arxiv.org/abs/2510.09895
Authors: Yubo Li, Rema Padman
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

[AI-200] Beyond AlphaEarth: Toward Human-Centered Spatial Representation via POI-Guided Contrastive Learning

Link: https://arxiv.org/abs/2510.09894
Authors: Junyuan Liu, Quan Qin, Guangsheng Dong, Xinglei Wang, Jiazhuang Feng, Zichao Zeng, Tao Cheng
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

[AI-201] Probabilistic bias adjustment of seasonal predictions of Arctic Sea Ice Concentration

Link: https://arxiv.org/abs/2510.09891
Authors: Parsa Gooya, Reinel Sospedra-Alfonso
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (stat.ML)

[AI-202] Myopic Bayesian Decision Theory for Batch Active Learning with Partial Batch Label Sampling

Link: https://arxiv.org/abs/2510.09877
Authors: Kangping Hu, Stephen Mussmann
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

[AI-203] ROBOPSY PL[AI]: Using Role-Play to Investigate how LLMs Present Collective Memory

Link: https://arxiv.org/abs/2510.09874
Authors: Margarete Jahrmann, Thomas Brandstetter, Stefan Glasauer
Affiliation: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Comments: 17 pages, 4 figures

[AI-204] WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions

Link: https://arxiv.org/abs/2510.09872
Authors: Sanjari Srivastava, Gang Li, Cheng Chang, Rishu Garg, Manpreet Kaur, Charlene Y. Lee, Yuezhang Li, Yining Mao, Ignacio Cases, Yanan Xie, Peng Qi
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-205] AI and Consciousness

Link: https://arxiv.org/abs/2510.09858
Authors: Eric Schwitzgebel
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)

[AI-206] ProxRouter: Proximity-Weighted LLM Query Routing for Improved Robustness to Outliers

Link: https://arxiv.org/abs/2510.09852
Authors: Shivam Patel, Neharika Jali, Ankur Mallick, Gauri Joshi
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-207] CALM: A Causal Analysis Language Model for Tabular Data in Complex Systems with Local Scores Conditional Independence Tests and Relation Attributes

Link: https://arxiv.org/abs/2510.09846
Authors: Zhenjiang Fan, Zengyi Qin, Yuanning Zheng, Bo Xiong, Summer Han
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-208] Temporal Lifting as Latent-Space Regularization for Continuous-Time Flow Models in AI Systems

Link: https://arxiv.org/abs/2510.09805
Authors: Jeffrey Camlin
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 6 pages, 1 figure, 1 table, 1 algorithm

[AI-209] How can we assess human-agent interactions? Case studies in software agent design

Link: https://arxiv.org/abs/2510.09801
Authors: Valerie Chen, Rohit Malhotra, Xingyao Wang, Juan Michelini, Xuhui Zhou, Aditya Bharat Soni, Hoang H. Tran, Calvin Smith, Ameet Talwalkar, Graham Neubig
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)

[AI-210] Large Language Models for Imbalanced Classification: Diversity makes the difference

Link: https://arxiv.org/abs/2510.09783
Authors: Dang Nguyen, Sunil Gupta, Kien Do, Thin Nguyen, Taylor Braund, Alexis Whitton, Svetha Venkatesh
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

[AI-211] SVTime: Small Time Series Forecasting Models Informed by “Physics” of Large Vision Model Forecasters

Link: https://arxiv.org/abs/2510.09780
Authors: ChengAo Shen, Ziming Zhao, Hanghang Tong, Dongjin Song, Dongsheng Luo, Qingsong Wen, Jingchao Ni
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-212] Why Do Transformers Fail to Forecast Time Series In-Context?

Link: https://arxiv.org/abs/2510.09776
Authors: Yufa Zhou, Yixiao Wang, Surbhi Goel, Anru R. Zhang
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments: Code: this https URL

[AI-213] Scaling Laws and Symmetry Evidence from Neural Force Fields

Link: https://arxiv.org/abs/2510.09768
Authors: Khang Ngo, Siamak Ravanbakhsh
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)
Comments: 22 pages, 10 figures

[AI-214] PatentVision: A multimodal method for drafting patent applications

Link: https://arxiv.org/abs/2510.09762
Authors: Ruo Yang, Sai Krishna Reddy Mudhiganti, Manali Sharma
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-215] Patentformer: A demonstration of AI-assisted automated patent drafting

Link: https://arxiv.org/abs/2510.09752
Authors: Sai Krishna Reddy Mudhiganti, Juanyan Wang, Ruo Yang, Manali Sharma
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

[AI-216] InterCorpRel-LLM: Enhancing Financial Relational Understanding with Graph-Language Models

Link: https://arxiv.org/abs/2510.09735
Authors: Qianyou Sun, Jiexin Zheng, Bohan Jin, Lihua Chen, Yijie Peng
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-217] ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting

Link: https://arxiv.org/abs/2510.09734
Authors: Jindong Tian, Yifei Ding, Ronghui Xu, Hao Miao, Chenjuan Guo, Bin Yang
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 16 pages, 6 figures, conference

[AI-218] Evaluating LLM-Based Process Explanations under Progressive Behavioral-Input Reduction

Link: https://arxiv.org/abs/2510.09732
Authors: P. van Oerle, R. H. Bemthuis, F. A. Bukhsh
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 12 pages, 2 figures, 3 tables; to appear in Enterprise Design, Operations, and Computing. EDOC 2025 Workshops, Lecture Notes in Business Information Processing (LNBIP), Springer, 2025. Part of 29th International Conference on Enterprise Design, Operations, and Computing (EDOC)

[AI-219] Herb.jl: A Unifying Program Synthesis Library

Link: https://arxiv.org/abs/2510.09726
Authors: Tilman Hinnerichs, Reuben Gardos Reid, Jaap de Jong, Bart Swinkels, Pamela Wochner, Nicolae Filat, Tudor Magurescu, Issa Hanou, Sebastijan Dumancic
Affiliation: Unknown
Categories: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

[AI-220] InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation

Link: https://arxiv.org/abs/2510.09724
Authors: Qiaosheng Chen, Yang Liu, Lei Li, Kai Chen, Qipeng Guo, Gong Cheng, Fei Yuan
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: 27 pages, 17 figures

[AI-221] ICL-Router: In-Context Learned Model Representations for LLM Routing

Link: https://arxiv.org/abs/2510.09719
Authors: Chenxu Wang, Hao Li, Yiqun Zhang, Linyao Chen, Jianhao Chen, Ping Jian, Peng Ye, Qiaosheng Zhang, Shuyue Hu
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-222] High-Power Training Data Identification with Provable Statistical Guarantees

Link: https://arxiv.org/abs/2510.09717
Authors: Zhenlong Liu, Hao Zeng, Weiran Huang, Hongxin Wei
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-223] A Demonstration of Self-Adaptive Jamming Attack Detection in AI/ML Integrated O-RAN

Link: https://arxiv.org/abs/2510.09706
Authors: Md Habibur Rahman, Md Sharif Hossen, Nathan H. Stephenson, Vijay K. Shah, Aloizio Da Silva
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments: 2 pages, 3 figures

[AI-224] VisualDAN: Exposing Vulnerabilities in VLMs with Visual-Driven DAN Commands

Link: https://arxiv.org/abs/2510.09699
Authors: Aofan Liu, Lulu Tang
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

[AI-225] Vanishing Contributions: A Unified Approach to Smoothly Transition Neural Models into Compressed Form

Link: https://arxiv.org/abs/2510.09696
Authors: Lorenzo Nikiforos, Charalampos Antoniadis, Luciano Prono, Fabio Pareschi, Riccardo Rovatti, Gianluca Setti
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Code available at this https URL

[AI-226] Kelp: A Streaming Safeguard for Large Models via Latent Dynamics-Guided Risk Detection

Link: https://arxiv.org/abs/2510.09694
Authors: Xiaodan Li, Mengjie Wu, Yao Zhu, Yunna Lv, YueFeng Chen, Cen Chen, Jianmei Guo, Hui Xue
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

[AI-227] Evaluation of Differential Privacy Mechanisms on Federated Learning

Link: https://arxiv.org/abs/2510.09691
Authors: Tejash Varsani
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Supervised by Prof. Dr.-Ing. habil. Alois C. Knoll; Advisor: Nagacharan Teja Tangirala, this http URL

[AI-228] CREST-Search: Comprehensive Red-teaming for Evaluating Safety Threats in Large Language Models Powered by Web Search

Link: https://arxiv.org/abs/2510.09689
Authors: Haoran Ou, Kangjie Chen, Xingshuo Han, Gelei Deng, Jie Zhang, Han Qiu, Tianwei Zhang
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

[AI-229] On the Occurence of Critical Learning Periods in Neural Networks

Link: https://arxiv.org/abs/2510.09687
Authors: Stanisław Pawlak
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 8 pages, 8 figures

[AI-230] Fortifying LLM-Based Code Generation with Graph-Based Reasoning on Secure Coding Practices

Link: https://arxiv.org/abs/2510.09682
Authors: Rupam Patir, Keyan Guo, Haipeng Cai, Hongxin Hu
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

[AI-231] AI in Computational Thinking Education in Higher Education: A Systematic Literature Review ICSE2025

Link: https://arxiv.org/abs/2510.09677
Authors: Ebrahim Rahimi, Clara Maathuis
Affiliation: Unknown
Categories: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Comments: A poster based on this paper was accepted and published in the Proceedings of the 30th ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE 2025), DOI: this https URL

[AI-232] Coupled Data and Measurement Space Dynamics for Enhanced Diffusion Posterior Sampling

Link: https://arxiv.org/abs/2510.09676
Authors: Shayan Mohajer Hamidi, En-Hui Yang, Ben Liang
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

[AI-233] Leveraging LLMs to Streamline the Review of Public Funding Applications EMNLP2025

Link: https://arxiv.org/abs/2510.09674
Authors: Joao D.S. Marques, Andre V. Duarte, Andre Carvalho, Gil Rocha, Bruno Martins, Arlindo L. Oliveira
Affiliation: Unknown
Categories: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Comments: Paper Accepted at EMNLP 2025 Industry Track

[AI-234] A Hybrid Computational Intelligence Framework with Metaheuristic Optimization for Drug-Drug Interaction Prediction

Link: https://arxiv.org/abs/2510.09668
Authors: Maryam Abdollahi Shamami, Babak Teimourpour, Farshad Sharifi
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)

[AI-235] Adversarial-Resilient RF Fingerprinting: A CNN-GAN Framework for Rogue Transmitter Detection ICML

Link: https://arxiv.org/abs/2510.09663
Authors: Raju Dhakal, Prashant Shekhar, Laxima Niure Kandel
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments: Accepted for publication in ICMLA 2025

[AI-236] Generative Models for Helmholtz Equation Solutions: A Dataset of Acoustic Materials

Link: https://arxiv.org/abs/2510.09657
Authors: Riccardo Fosco Gramaccioni, Christian Marinoni, Fabrizio Frezza, Aurelio Uncini, Danilo Comminiello
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Numerical Analysis (math.NA)
Comments: Accepted at EUSIPCO 2025

[AI-237] Data Provenance Auditing of Fine-Tuned Large Language Models with a Text-Preserving Technique

Link: https://arxiv.org/abs/2510.09655
Authors: Yanming Li (PETSCRAFT), Seifeddine Ghozzi (ENSTA), Cédric Eichler (PETSCRAFT), Nicolas Anciaux (PETSCRAFT), Alexandra Bensamoun, Lorena Gonzalez Manzano (UC3M)
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

[AI-238] Rounding-Guided Backdoor Injection in Deep Learning Model Quantization NDSS2026

Link: https://arxiv.org/abs/2510.09647
Authors: Xiangxiang Chen, Peixin Zhang, Jun Sun, Wenhai Wang, Jingyi Wang
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: This paper is to appear in NDSS 2026

[AI-239] Real-Time Health Analytics Using Ontology-Driven Complex Event Processing and LLM Reasoning: A Tuberculosis Case Study

Link: https://arxiv.org/abs/2510.09646
Authors: Ritesh Chandra, Sonali Agarwal, Navjot Singh
Affiliation: Unknown
Categories: Databases (cs.DB); Artificial Intelligence (cs.AI)
Comments: 14 tables, 20 figures

[AI-240] Enhanced Urban Traffic Management Using CCTV Surveillance Videos and Multi-Source Data Current State Prediction and Frequent Episode Mining

Link: https://arxiv.org/abs/2510.09644
Authors: Shaharyar Alam Ansari, Mohammad Luqman, Aasim Zafar, Savir Ali
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 24 pages, 9 figures

[AI-241] Direct Routing Gradient (DRGrad): A Personalized Information Surgery for Multi-Task Learning (MTL) Recommendations

【速读】:该论文旨在解决多任务学习(Multi-task Learning, MTL)在工业级推荐系统中面临的负迁移(negative transfer)和“跷跷板现象”(seesaw phenomenon)问题,这些问题源于现实推荐场景中任务间复杂且常相互矛盾的相关性。为应对这一挑战并更好地利用个性化信息,作者提出了一种个性化直接路由梯度框架(Personalized Direct Routing Gradient, DRGrad),其核心创新在于引入三个关键组件:路由器(router)、更新器(updater)和个性化门控网络(personalized gate network)。该方案通过在训练过程中动态判断任务间的优先级关系,智能地选择并利用有效梯度进行参数更新,从而减少任务间的冲突,提升模型性能,同时不增加额外的模型复杂度,并展现出对噪声处理的改进能力。

Link: https://arxiv.org/abs/2510.09643
Authors: Yuguang Liu, Yiyun Miao, Luyao Xia
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

Abstract:Multi-task learning (MTL) has emerged as a successful strategy in industrial-scale recommender systems, offering significant advantages such as capturing diverse users’ interests and accurately detecting different behaviors like "click" or "dwell time". However, negative transfer and the seesaw phenomenon pose challenges to MTL models due to the complex and often contradictory task correlations in real-world recommendations. To address the problem while making better use of personalized information, we propose a personalized Direct Routing Gradient framework (DRGrad), which consists of three key components: router, updater and personalized gate network. DRGrad judges the stakes between tasks in the training process, which can leverage all valid gradients for the respective task to reduce conflicts. We evaluate the efficiency of DRGrad on complex MTL using a real-world recommendation dataset with 15 billion samples. The results show DRGrad’s superior performance over competing state-of-the-art MTL models, especially in terms of AUC (Area Under the Curve) metrics, indicating that it effectively manages task conflicts in multi-task learning environments without increasing model complexity, while also addressing the deficiencies in noise processing. Moreover, experiments on the public Census-income dataset and a synthetic dataset demonstrate the capability of DRGrad in judging and routing the stakes between tasks with varying degrees of correlation and personalization.

[AI-242] Bias-Aware AI Chatbot for Engineering Advising at the University of Maryland A. James Clark School of Engineering

Link: https://arxiv.org/abs/2510.09636
Authors: Prarthana P. Kartholy, Thandi M. Labor, Neil N. Panchal, Sean H. Wang, Hillary N. Owusu
Affiliation: Unknown
Categories: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Comments:

[AI-243] Responsible AI Adoption in the Public Sector: A Data-Centric Taxonomy of AI Adoption Challenges

Link: https://arxiv.org/abs/2510.09634
Authors: Anastasija Nikiforova, Martin Lnenicka, Ulf Melin, David Valle-Cruz, Asif Gill, Cesar Casiano Flores, Emyana Sirait, Mariusz Luterek, Richard Michael Dreyling, Barbora Tesarova
Affiliation: Unknown
Categories: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Comments:

[AI-244] Hound: Relation-First Knowledge Graphs for Complex-System Reasoning in Security Audits

Link: https://arxiv.org/abs/2510.09633
Authors: Bernhard Mueller
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL)
Comments:

[AI-245] Toward a Unified Security Framework for AI Agents: Trust, Risk, and Liability

Link: https://arxiv.org/abs/2510.09620
Authors: Jiayun Mo, Xin Kang, Tieyan Li, Zhongding Lei
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments:

[AI-246] Causal Digital Twins for Cyber-Physical Security: A Framework for Robust Anomaly Detection in Industrial Control Systems

Link: https://arxiv.org/abs/2510.09616
Authors: Mohammadhossein Homaei, Mehran Tarif, Mar Avilla, Andres Caro
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Statistics Theory (math.ST)
Comments: 29 pages, 6 figures, and 14 tables

[AI-247] Domain-Specific Constitutional AI: Enhancing Safety in LLM-Powered Mental Health Chatbots

Link: https://arxiv.org/abs/2509.16444
Authors: Chenhan Lyu, Yutong Song, Pengfei Zhang, Amir M. Rahmani
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[AI-248] Manimator: Transforming Research Papers into Visual Explanations

Link: https://arxiv.org/abs/2507.14306
Authors: Samarth P, Vyoman Jain, Shiva Golugula, Motamarri Sai Sathvik
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Multimedia (cs.MM)
Comments:

[AI-249] Accelerated stochastic first-order method for convex optimization under heavy-tailed noise

Link: https://arxiv.org/abs/2510.11676
Authors: Chuan He, Zhaosong Lu
Affiliation: Unknown
Categories: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

[AI-250] Hierarchical Qubit-Merging Transformer for Quantum Error Correction

Link: https://arxiv.org/abs/2510.11593
Authors: Seong-Joon Park, Hee-Youl Kwak, Yongjune Kim
Affiliation: Unknown
Categories: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 6 pages, 5 figures

[AI-251] People use fast flat goal-directed simulation to reason about novel problems

Link: https://arxiv.org/abs/2510.11503
Authors: Katherine M. Collins, Cedegao E. Zhang, Lionel Wong, Mauricio Barba da Costa, Graham Todd, Adrian Weller, Samuel J. Cheyette, Thomas L. Griffiths, Joshua B. Tenenbaum
Affiliation: Unknown
Categories: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Comments: Pre-print

[AI-252] HYPERDOA: Robust and Efficient DoA Estimation using Hyperdimensional Computing

Link: https://arxiv.org/abs/2510.10718
Authors: Rajat Bhattacharjya, Woohyeok Park, Arnab Sarkar, Hyunwoo Oh, Mohsen Imani, Nikil Dutt
Affiliation: Unknown
Categories: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Symbolic Computation (cs.SC)
Comments: 3 figures, 5 pages. Authors’ version posted for personal use and not for redistribution

[AI-253] Deep Learning in Astrophysics

Link: https://arxiv.org/abs/2510.10713
Authors: Yuan-Sen Ting
Affiliation: Unknown
Categories: Instrumentation and Methods for Astrophysics (astro-ph.IM); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Earth and Planetary Astrophysics (astro-ph.EP); Astrophysics of Galaxies (astro-ph.GA); High Energy Astrophysical Phenomena (astro-ph.HE); Artificial Intelligence (cs.AI)
Comments: Manuscript submitted to Annual Review of Astronomy and Astrophysics for Volume 64. This is the authors’ version. Revisions and the final version will be available at this https URL

[AI-254] Missing Data Multiple Imputation for Tabular Q-Learning in Online RL

Link: https://arxiv.org/abs/2510.10709
Authors: Kyla Chasalow, Skyler Wu, Susan Murphy
Affiliation: Unknown
Categories: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: Working paper

[AI-255] High-Dimensional Learning Dynamics of Quantized Models with Straight-Through Estimator

Link: https://arxiv.org/abs/2510.10693
Authors: Yuma Ichikawa, Shuhei Kashiwamura, Ayaka Sakata
Affiliation: Unknown
Categories: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST)
Comments: 27 pages, 14 figures

[AI-256] Distributionally Robust Control with End-to-End Statistically Guaranteed Metric Learning

Link: https://arxiv.org/abs/2510.10214
Authors: Jingyi Wu, Chao Ning, Yang Shi
Affiliation: Unknown
Categories: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Comments:

[AI-257] Uncovering Singularities in Feynman Integrals via Machine Learning

Link: https://arxiv.org/abs/2510.10099
Authors: Yuanche Liu, Yingxuan Xu, Yang Zhang
Affiliation: Unknown
Categories: High Energy Physics - Phenomenology (hep-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); High Energy Physics - Theory (hep-th)
Comments:

[AI-258] Neuro-inspired automated lens design

Link: https://arxiv.org/abs/2510.09979
Authors: Yao Gao, Lei Sun, Shaohua Gao, Qi Jiang, Kailun Yang, Weijian Hu, Xiaolong Qian, Wenyong Li, Luc Van Gool, Kaiwei Wang
Affiliation: Unknown
Categories: Optics (physics.optics); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[AI-259] Token is All You Price

Link: https://arxiv.org/abs/2510.09859
Authors: Weijie Zhong
Affiliation: Unknown
Categories: Theoretical Economics (econ.TH); Artificial Intelligence (cs.AI)
Comments:

[AI-260] Chlorophyll-a Mapping and Prediction in the Mar Menor Lagoon Using C2RCC-Processed Sentinel 2 Imagery

Link: https://arxiv.org/abs/2510.09736
Authors: Antonio Martínez-Ibarra, Aurora González-Vidal, Adrián Cánovas-Rodríguez, Antonio F. Skarmeta
Affiliation: Unknown
Categories: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)
Comments:

Machine Learning

[LG-0] Reinforced sequential Monte Carlo for amortised sampling

Link: https://arxiv.org/abs/2510.11711
Authors: Sanghyeok Choi, Sarthak Mittal, Víctor Elvira, Jinkyoo Park, Nikolay Malkin
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments: code: this https URL

Abstract:This paper proposes a synergy of amortised and particle-based methods for sampling from distributions defined by unnormalised density functions. We state a connection between sequential Monte Carlo (SMC) and neural sequential samplers trained by maximum-entropy reinforcement learning (MaxEnt RL), wherein learnt sampling policies and value functions define proposal kernels and twist functions. Exploiting this connection, we introduce an off-policy RL training procedure for the sampler that uses samples from SMC – using the learnt sampler as a proposal – as a behaviour policy that better explores the target distribution. We describe techniques for stable joint training of proposals and twist functions and an adaptive weight tempering scheme to reduce training signal variance. Furthermore, building upon past attempts to use experience replay to guide the training of neural samplers, we derive a way to combine historical samples with annealed importance sampling weights within a replay buffer. On synthetic multi-modal targets (in both continuous and discrete spaces) and the Boltzmann distribution of alanine dipeptide conformations, we demonstrate improvements in approximating the true distribution as well as training stability compared to both amortised and Monte Carlo methods.

[LG-1] Tight Regret Upper and Lower Bounds for Optimistic Hedge in Two-Player Zero-Sum Games

Link: https://arxiv.org/abs/2510.11691
Authors: Taira Tsuchiya
Categories: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
Comments: 29 pages, 2 figures

Abstract:In two-player zero-sum games, the learning dynamic based on optimistic Hedge achieves one of the best-known regret upper bounds among strongly-uncoupled learning dynamics. With an appropriately chosen learning rate, the social and individual regrets can be bounded by O(\log(mn)) in terms of the numbers of actions m and n of the two players. This study investigates the optimality of the dependence on m and n in the regret of optimistic Hedge. To this end, we begin by refining existing regret analysis and show that, in the strongly-uncoupled setting where the opponent’s number of actions is known, both the social and individual regret bounds can be improved to O(\sqrt{\log m \log n}). In this analysis, we express the regret upper bound as an optimization problem with respect to the learning rates and the coefficients of certain negative terms, enabling refined analysis of the leading constants. We then show that the existing social regret bound as well as these new social and individual regret upper bounds cannot be further improved for optimistic Hedge by providing algorithm-dependent individual regret lower bounds. Importantly, these social regret upper and lower bounds match exactly including the constant factor in the leading term. Finally, building on these results, we improve the last-iterate convergence rate and the dynamic regret of a learning dynamic based on optimistic Hedge, and complement these bounds with algorithm-dependent dynamic regret lower bounds that match the improved bounds.
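
For readers unfamiliar with the dynamic named in this abstract, here is a minimal sketch of the textbook optimistic Hedge update for both players of a zero-sum matrix game. It is not taken from the paper: the function name `optimistic_hedge`, the fixed learning rate `eta`, and the example payoff matrix are illustrative assumptions, and the paper's tuned learning rates are not reproduced.

```python
import math


def optimistic_hedge(payoff, rounds=2000, eta=0.1):
    """Run optimistic Hedge for both players of a zero-sum matrix game.

    payoff[i][j] is the loss of the row player (and the gain of the column
    player) when row i meets column j. Returns the time-averaged strategies.
    The fixed learning rate eta is an illustrative assumption.
    """
    m, n = len(payoff), len(payoff[0])
    cum_x, cum_y = [0.0] * m, [0.0] * n    # cumulative loss vectors
    last_x, last_y = [0.0] * m, [0.0] * n  # latest loss, reused as a prediction
    avg_x, avg_y = [0.0] * m, [0.0] * n

    def weights(cum, last):
        # Weights proportional to exp(-eta * (cumulative loss + predicted loss)).
        scores = [-eta * (c + l) for c, l in zip(cum, last)]
        top = max(scores)  # subtract the maximum for numerical stability
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        return [v / z for v in w]

    for _ in range(rounds):
        x = weights(cum_x, last_x)
        y = weights(cum_y, last_y)
        # The row player minimises x^T A y; the column player's loss is -A^T x.
        loss_x = [sum(payoff[i][j] * y[j] for j in range(n)) for i in range(m)]
        loss_y = [-sum(payoff[i][j] * x[i] for i in range(m)) for j in range(n)]
        cum_x = [c + l for c, l in zip(cum_x, loss_x)]
        cum_y = [c + l for c, l in zip(cum_y, loss_y)]
        last_x, last_y = loss_x, loss_y
        avg_x = [a + p / rounds for a, p in zip(avg_x, x)]
        avg_y = [a + p / rounds for a, p in zip(avg_y, y)]
    return avg_x, avg_y


# Hypothetical 2x2 game whose unique equilibrium plays the first action with probability 0.4.
avg_row, avg_col = optimistic_hedge([[2.0, -1.0], [-1.0, 1.0]])
```

The only difference from plain Hedge is the extra `last` term inside `weights`, which counts the most recent loss twice as a guess for the next one.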

[LG-2] Chronologically Consistent Generative AI

Link: https://arxiv.org/abs/2510.11677
Authors: Songrun He, Linying Lv, Asaf Manela, Jimmy Wu
Categories: Machine Learning (cs.LG); General Finance (q-fin.GN)
Comments:

Abstract:We introduce a family of chronologically consistent, instruction-following large language models to eliminate lookahead bias. Each model is trained only on data available before a clearly defined knowledge-cutoff date, ensuring strict temporal separation from any post-cutoff data. The resulting framework offers (i) a simple, conversational chat interface, (ii) fully open, fixed model weights that guarantee replicability, and (iii) a conservative lower bound on forecast accuracy, isolating the share of predictability that survives once training leakage is removed. Together, these features provide researchers with an easy-to-use generative AI tool useful for a wide range of prediction tasks that is free of lookahead bias.

[LG-3] An Eulerian Perspective on Straight-Line Sampling

Link: https://arxiv.org/abs/2510.11657
Authors: Panos Tsimpos, Youssef Marzouk
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Abstract:We study dynamic measure transport for generative modeling: specifically, flows induced by stochastic processes that bridge a specified source and target distribution. The conditional expectation of the process’ velocity defines an ODE whose flow map achieves the desired transport. We ask which processes produce straight-line flows – i.e., flows whose pointwise acceleration vanishes and thus are exactly integrable with a first-order method? We provide a concise PDE characterization of straightness as a balance between conditional acceleration and the divergence of a weighted covariance (Reynolds) tensor. Using this lens, we fully characterize affine-in-time interpolants and show that straightness occurs exactly under deterministic endpoint couplings. We also derive necessary conditions that constrain flow geometry for general processes, offering broad guidance for designing transports that are easier to integrate.

[LG-4] Continual Release of Densest Subgraphs: Privacy Amplification and Sublinear Space via Subsampling

Link: https://arxiv.org/abs/2510.11640
Authors: Felix Zhou
Categories: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Comments: to be published in SOSA’26

Abstract:We study the sublinear space continual release model for edge-differentially private (DP) graph algorithms, with a focus on the densest subgraph problem (DSG) in the insertion-only setting. Our main result is the first continual release DSG algorithm that matches the additive error of the best static DP algorithms and the space complexity of the best non-private streaming algorithms, up to constants. The key idea is a refined use of subsampling that simultaneously achieves privacy amplification and sparsification, a connection not previously formalized in graph DP. Via a simple black-box reduction to the static setting, we obtain both pure and approximate-DP algorithms with O(\log n) additive error and O(n\log n) space, improving both accuracy and space complexity over the previous state of the art. Along the way, we introduce graph densification in the graph DP setting, adding edges to trigger earlier subsampling, which removes the extra logarithmic factors in error and space incurred by prior work [ELMZ25]. We believe this simple idea may be of independent interest.

[LG-5] Lecture Notes on Verifying Graph Neural Networks

Link: https://arxiv.org/abs/2510.11617
Authors: François Schwarzentruber
Categories: Logic in Computer Science (cs.LO); Machine Learning (cs.LG)
Comments:

Abstract:In these lecture notes, we first recall the connection between graph neural networks, Weisfeiler-Lehman tests and logics such as first-order logic and graded modal logic. We then present a modal logic in which counting modalities appear in linear inequalities in order to solve verification tasks on graph neural networks. We describe an algorithm for the satisfiability problem of that logic. It is inspired by the tableau method of vanilla modal logic, extended with reasoning in the quantifier-free fragment of Boolean algebra with Presburger arithmetic.

[LG-6] Diffusion-DFL: Decision-focused Diffusion Models for Stochastic Optimization

Link: https://arxiv.org/abs/2510.11590
Authors: Zihao Zhao, Christopher Yeh, Lingkai Kong, Kai Wang
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Abstract:Decision-focused learning (DFL) integrates predictive modeling and optimization by training predictors to optimize the downstream decision target rather than merely minimizing prediction error. To date, existing DFL methods typically rely on deterministic point predictions, which are often insufficient to capture the intrinsic stochasticity of real-world environments. To address this challenge, we propose the first diffusion-based DFL approach, which trains a diffusion model to represent the distribution of uncertain parameters and optimizes the decision by solving a stochastic optimization with samples drawn from the diffusion model. Our contributions are twofold. First, we formulate diffusion DFL using the reparameterization trick, enabling end-to-end training through diffusion. While effective, it is memory and compute-intensive due to the need to differentiate through the diffusion sampling process. Second, we propose a lightweight score function estimator that uses only several forward diffusion passes and avoids backpropagation through the sampling. This follows from our results that backpropagating through stochastic optimization can be approximated by a weighted score function formulation. We empirically show that our diffusion DFL approach consistently outperforms strong baselines in decision quality. The source code for all experiments is available at the project repository: this https URL.

[LG-7] Ontolearn-A Framework for Large-scale OWL Class Expression Learning in Python

Link: https://arxiv.org/abs/2510.11561
Authors: Caglar Demir, Alkid Baci, N’Dah Jean Kouagou, Leonie Nora Sieger, Stefan Heindorf, Simon Bin, Lukas Blübaum, Alexander Bigerl, Axel-Cyrille Ngonga Ngomo
Categories: Machine Learning (cs.LG); Symbolic Computation (cs.SC)
Comments:

Abstract:In this paper, we present Ontolearn, a framework for learning OWL class expressions over large knowledge graphs. Ontolearn contains efficient implementations of recent state-of-the-art symbolic and neuro-symbolic class expression learners including EvoLearner and DRILL. A learned OWL class expression can be used to classify instances in the knowledge graph. Furthermore, Ontolearn integrates a verbalization module based on an LLM to translate complex OWL class expressions into natural language sentences. By mapping OWL class expressions into respective SPARQL queries, Ontolearn can be easily used to operate over a remote triplestore. The source code of Ontolearn is available at this https URL.

[LG-8] Knowledge-Guided Machine Learning Models to Upscale Evapotranspiration in the U.S. Midwest

Link: https://arxiv.org/abs/2510.11505
Authors: Aleksei Rozanov, Samikshya Subedi, Vasudha Sharma, Bryan C. Runck
Categories: Machine Learning (cs.LG)
Comments:

Abstract:Evapotranspiration (ET) plays a critical role in land-atmosphere interactions, yet its accurate quantification across various spatiotemporal scales remains a challenge. In situ measurement approaches, like eddy covariance (EC) or weather station-based ET estimation, allow for measuring ET at a single location. Agricultural uses of ET require estimates for each field over broad areas, making it infeasible to deploy sensing systems at each location. This study integrates tree-based and knowledge-guided machine learning (ML) techniques with multispectral remote sensing data, gridded meteorology and EC data to upscale ET across the Midwest United States. We compare four tree-based models - Random Forest, CatBoost, XGBoost, LightGBM - and a simple feed-forward artificial neural network in combination with features engineered using knowledge-guided ML principles. Models were trained and tested on EC towers located in the Midwest of the United States using k-fold cross validation with k=5 and site-year, biome stratified train-test split to avoid data leakage. Results show that LightGBM with knowledge-guided features outperformed other methods with an R^2 = 0.86, MSE = 14.99 W m^-2 and MAE = 8.82 W m^-2 according to grouped k-fold validation (k=5). Feature importance analysis shows that knowledge-guided features were most important for predicting evapotranspiration. Using the best performing model, we provide a data product at 500 m spatial and one-day temporal resolution for gridded ET for the period of 2019-2024. Intercomparison between the new gridded product and state-level weather station-based ET estimates shows best-in-class correspondence.
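
The grouped k-fold validation mentioned in this abstract keeps all samples from one group on the same side of every split, which is what prevents leakage between training and test data. The sketch below is a generic illustration, not the authors' pipeline: the helper name `grouped_kfold`, the greedy fold balancing, and the tower identifiers are assumptions, and the biome stratification described in the abstract is omitted.

```python
from collections import defaultdict


def grouped_kfold(groups, k=5):
    """Yield (train_idx, test_idx) pairs where each group lands in exactly one test fold."""
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)

    # Greedy balancing: place the largest groups first into the currently smallest fold.
    folds = [[] for _ in range(k)]
    for group in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        min(folds, key=len).extend(by_group[group])

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for fold in folds[:i] + folds[i + 1:] for j in fold)
        yield train, test


# Hypothetical tower identifiers: one label per sample.
towers = ["s1", "s1", "s2", "s3", "s3", "s3", "s4", "s5", "s6", "s6"]
splits = list(grouped_kfold(towers, k=3))
```

Grouping by a combined key such as site and year follows the same pattern: only the labels passed in `groups` change.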

[LG-9] Learning to Make MISTAKEs: Modeling Incorrect Student Thinking And Key Errors

Link: https://arxiv.org/abs/2510.11502
Authors: Alexis Ross, Jacob Andreas
Categories: Machine Learning (cs.LG)
Comments:

Abstract:Research on reasoning in language models (LMs) predominantly focuses on improving the correctness of their outputs. But some important applications require modeling reasoning patterns that are incorrect. For example, automated systems that can reason about and simulate student errors are useful for providing real-time feedback in the classroom or offline practice for educators-in-training. This paper presents a new method, MISTAKE, that (1) constructs high-quality synthetic examples of reasoning errors by leveraging cycle consistency between incorrect answers and latent misconceptions; and (2) uses the generated data to learn models for student simulation, misconception classification, and answer generation. We evaluate MISTAKE on three educational tasks and find that it results in (1) higher accuracy when simulating incorrect student answers based on specific misconceptions, (2) increased performance inferring latent misconceptions from observed incorrect answers, and (3) higher alignment with expert-written distractor answers when generating incorrect answers (e.g., for multiple-choice tests).

[LG-10] Context-Aware Model-Based Reinforcement Learning for Autonomous Racing

Link: https://arxiv.org/abs/2510.11501
Authors: Emran Yasser Moustafa, Ivana Dusparic
Categories: Machine Learning (cs.LG); Robotics (cs.RO)
Comments: Accepted to IEEE ICAR 2025

Abstract:Autonomous vehicles have shown promising potential to be a groundbreaking technology for improving the safety of road users. For these vehicles, as well as many other safety-critical robotic technologies, to be deployed in real-world applications, we require algorithms that can generalize well to unseen scenarios and data. Model-based reinforcement learning algorithms (MBRL) have demonstrated state-of-the-art performance and data efficiency across a diverse set of domains. However, these algorithms have also shown susceptibility to changes in the environment and its transition dynamics. In this work, we explore the performance and generalization capabilities of MBRL algorithms for autonomous driving, specifically in the simulated autonomous racing environment, Roboracer (formerly F1Tenth). We frame the head-to-head racing task as a learning problem using contextual Markov decision processes and parameterize the driving behavior of the adversaries using the context of the episode, thereby also parameterizing the transition and reward dynamics. We benchmark the behavior of MBRL algorithms in this environment and propose a novel context-aware extension of the existing literature, cMask. We demonstrate that context-aware MBRL algorithms generalize better to out-of-distribution adversary behaviors relative to context-free approaches. We also demonstrate that cMask displays strong generalization capabilities, as well as further performance improvement relative to other context-aware MBRL approaches when racing against adversaries with in-distribution behaviors.

[LG-11] How Reinforcement Learning After Next-Token Prediction Facilitates Learning

Link: https://arxiv.org/abs/2510.11495
Authors: Nikolaos Tsilivis, Eran Malach, Karen Ullrich, Julia Kempe
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Abstract:Recent advances in reasoning domains with neural networks have primarily been enabled by a training recipe that optimizes Large Language Models, previously trained to predict the next token in a sequence, with reinforcement learning algorithms. We introduce a framework to study the success of this paradigm, and we theoretically expose the optimization mechanisms by which reinforcement learning improves over next-token prediction in this setting. We study learning from mixture distributions of short and long "chain-of-thought" sequences encoding a single task. In particular, when the task consists of predicting the parity of d bits and long sequences are rare, we show how reinforcement learning after next-token prediction enables autoregressive transformers to generalize, whereas mere next-token prediction requires extreme statistical or computational resources to do so. We further explain how reinforcement learning leverages increased test-time computation, manifested in longer responses, to facilitate this learning process. In a simplified setting, we theoretically prove that autoregressive linear models following this training recipe can efficiently learn to predict the parity of d bits as long as the proportion of long demonstrations in the data mix is not exponentially small in the input dimension d. Finally, we demonstrate these same phenomena in other settings, including the post-training of Llama-series models on mixture variations of common mathematical reasoning benchmarks.
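
To make the data mixture in this abstract concrete, the sketch below generates parity examples whose target is either short (the final parity only) or long (every running parity, ending in the answer). The exact encoding used in the paper is not given in the abstract, so the long format here, along with the names `parity_example` and `p_long`, is an assumption for illustration.

```python
import random


def parity_example(d, p_long, rng):
    """Sample d random bits and a target sequence.

    With probability p_long the target is "long": every running parity,
    ending in the final answer. Otherwise it is "short": the answer alone.
    The long format is an assumed encoding, not taken from the paper.
    """
    bits = [rng.randint(0, 1) for _ in range(d)]
    running, acc = [], 0
    for b in bits:
        acc ^= b  # parity of the bits seen so far
        running.append(acc)
    if rng.random() < p_long:
        return bits, running
    return bits, [acc]


rng = random.Random(0)
# A mixture in which long demonstrations are rare (here about 5%).
dataset = [parity_example(d=8, p_long=0.05, rng=rng) for _ in range(1000)]
```

In both formats the last element of the target is the parity of all d bits, so the two sequence types encode the same task and differ only in how much intermediate computation is shown.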

[LG-12] Constraint-Aware Reinforcement Learning via Adaptive Action Scaling

Link: https://arxiv.org/abs/2510.11491
Authors: Murad Dawood, Usama Ahmed Siddiquie, Shahram Khorshidi, Maren Bennewitz
Categories: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
Comments:

Abstract:Safe reinforcement learning (RL) seeks to mitigate unsafe behaviors that arise from exploration during training by reducing constraint violations while maintaining task performance. Existing approaches typically rely on a single policy to jointly optimize reward and safety, which can cause instability due to conflicting objectives, or they use external safety filters that override actions and require prior system knowledge. In this paper, we propose a modular cost-aware regulator that scales the agent’s actions based on predicted constraint violations, preserving exploration through smooth action modulation rather than overriding the policy. The regulator is trained to minimize constraint violations while avoiding degenerate suppression of actions. Our approach integrates seamlessly with off-policy RL methods such as SAC and TD3, and achieves state-of-the-art return-to-cost ratios on Safety Gym locomotion tasks with sparse costs, reducing constraint violations by up to 126 times while increasing returns by over an order of magnitude compared to prior methods.

[LG-13] Rescaling-Aware Training for Efficient Deployment of Deep Learning Models on Full-Integer Hardware

Link: https://arxiv.org/abs/2510.11484
Authors: Lion Mueller, Alberto Garcia-Ortiz, Ardalan Najafi, Adam Fuks, Lennart Bamberg
Categories: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
Comments: Submitted to IEEE Embedded Systems Letters

Abstract:Integer AI inference significantly reduces computational complexity in embedded systems. Quantization-aware training (QAT) helps mitigate accuracy degradation associated with post-training quantization but still overlooks the impact of integer rescaling during inference, which is a hardware-costly operation in integer-only AI inference. This work shows that rescaling cost can be dramatically reduced post-training by applying a stronger quantization to the rescale multiplicands at no model-quality loss. Furthermore, we introduce Rescale-Aware Training, a fine-tuning method for ultra-low bit-width rescaling multiplicands. Experiments show that even with 8x reduced rescaler widths, the full accuracy is preserved through minimal incremental retraining. This enables more energy-efficient and cost-efficient AI inference for resource-constrained embedded systems.
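
As background for the rescaling step discussed in this abstract, the sketch below shows a common fixed-point scheme in which a real-valued rescale factor is approximated by an integer multiplier and a right shift, with the multiplier limited to a chosen bit width. It is a generic illustration and not the paper's method: the function names, the rounding rule, and the example scale of 0.0123 are assumptions.

```python
def quantize_rescale(scale, bits):
    """Approximate a real factor 0 < scale < 1 as multiplier / 2**shift.

    The multiplier is kept below 2**bits, so a smaller `bits` means a
    cheaper integer multiply at the price of a coarser approximation.
    """
    if not 0.0 < scale < 1.0:
        raise ValueError("this sketch assumes 0 < scale < 1")
    shift = 0
    # Grow the shift while the next, larger multiplier still fits in `bits` bits.
    while shift < 62 and round(scale * 2 ** (shift + 1)) < 2 ** bits:
        shift += 1
    return round(scale * 2 ** shift), shift


def rescale(acc, multiplier, shift):
    """Integer-only rescale of an accumulator, rounding to nearest."""
    if shift == 0:
        return acc * multiplier
    return (acc * multiplier + (1 << (shift - 1))) >> shift


# Hypothetical layer scale, approximated with an 8-bit multiplier.
multiplier, shift = quantize_rescale(0.0123, bits=8)
output = rescale(10000, multiplier, shift)
```

Lowering `bits` shrinks the hardware multiplier but increases the approximation error of the scale. That trade-off is presumably what a rescale-aware fine-tuning step would have to compensate for.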

[LG-14] Differentiable Fast Top-K Selection for Large-Scale Recommendation

Link: https://arxiv.org/abs/2510.11472
Authors: Yanjie Zhu, Zhen Zhang, Yunli Wang, Zhiqiang Wang, Yu Li, Rufan Zhou, Shiyang Wen, Peng Jiang, Chenhao Lin, Jian Yang
Categories: Machine Learning (cs.LG)
Comments: 12 pages, 5 figures

Abstract:Cascade ranking is a widely adopted paradigm in large-scale information retrieval systems for Top-K item selection. However, the Top-K operator is non-differentiable, hindering end-to-end training. Existing methods include Learning-to-Rank approaches (e.g., LambdaLoss), which optimize ranking metrics like NDCG and suffer from objective misalignment, and differentiable sorting-based methods (e.g., ARF, LCRON), which relax permutation matrices for direct Top-K optimization but introduce gradient conflicts through matrix aggregation. A promising alternative is to directly construct a differentiable approximation of the Top-K selection operator, bypassing the use of soft permutation matrices. However, even state-of-the-art differentiable Top-K operators (e.g., LapSum) require O(n \log n) complexity due to their dependence on sorting for solving the threshold. Thus, we propose DFTopK, a novel differentiable Top-K operator achieving optimal O(n) time complexity. By relaxing normalization constraints, DFTopK admits a closed-form solution and avoids sorting. DFTopK also avoids the gradient conflicts inherent in differentiable sorting-based methods. We evaluate DFTopK on both the public benchmark RecFlow and an industrial system. Experimental results show that DFTopK significantly improves training efficiency while achieving superior performance, which enables us to scale up training samples more efficiently. In the online A/B test, DFTopK yielded a +1.77% revenue lift with the same computational budget compared to the baseline. To the best of our knowledge, this work is the first to introduce differentiable Top-K operators into recommendation systems and the first to achieve theoretically optimal linear-time complexity for Top-K selection. We have open-sourced our implementation to facilitate future research in both academia and industry.

[LG-15] Forward-Forward Autoencoder Architectures for Energy-Efficient Wireless Communications

Link: https://arxiv.org/abs/2510.11418
Authors: Daniel Seifert, Onur Günlü, Rafael F. Schaefer
Categories: Information Theory (cs.IT); Machine Learning (cs.LG)
Comments:

Abstract:The application of deep learning to the area of communications systems has been a growing field of interest in recent years. Forward-forward (FF) learning is an efficient alternative to the backpropagation (BP) algorithm, which is the typically used training procedure for neural networks. Among its several advantages, FF learning does not require the communication channel to be differentiable and does not rely on the global availability of partial derivatives, allowing for an energy-efficient implementation. In this work, we design end-to-end learned autoencoders using the FF algorithm and numerically evaluate their performance for the additive white Gaussian noise and Rayleigh block fading channels. We demonstrate their competitiveness with BP-trained systems in the case of joint coding and modulation, and in a scenario where a fixed, non-differentiable modulation stage is applied. Moreover, we provide further insights into the design principles of the FF network, its training convergence behavior, and significant memory and processing time savings compared to BP-based approaches.

[LG-16] Leveraging LLMs for Semi-Automatic Corpus Filtration in Systematic Literature Reviews

Link: https://arxiv.org/abs/2510.11409
Authors: Lucas Joos, Daniel A. Keim, Maximilian T. Fischer
Categories: Machine Learning (cs.LG); Digital Libraries (cs.DL); Human-Computer Interaction (cs.HC)
Comments:

Abstract:The creation of systematic literature reviews (SLR) is critical for analyzing the landscape of a research field and guiding future research directions. However, retrieving and filtering the literature corpus for an SLR is highly time-consuming and requires extensive manual effort, as keyword-based searches in digital libraries often return numerous irrelevant publications. In this work, we propose a pipeline leveraging multiple large language models (LLMs), classifying papers based on descriptive prompts and deciding jointly using a consensus scheme. The entire process is human-supervised and interactively controlled via our open-source visual analytics web interface, LLMSurver, which enables real-time inspection and modification of model outputs. We evaluate our approach using ground-truth data from a recent SLR comprising over 8,000 candidate papers, benchmarking both open and commercial state-of-the-art LLMs from mid-2024 and fall 2025. Results demonstrate that our pipeline significantly reduces manual effort while achieving lower error rates than single human annotators. Furthermore, modern open-source models prove sufficient for this task, making the method accessible and cost-effective. Overall, our work demonstrates how responsible human-AI collaboration can accelerate and enhance systematic literature reviews within academic workflows.
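
The abstract does not spell out the consensus scheme, so the following is only a hypothetical sketch of how several model votes could be combined, with disagreements routed to a human reviewer. The labels, the `min_agreement` parameter, and the function name are all assumptions.

```python
from collections import Counter

def consensus_decision(votes, min_agreement=1.0):
    """Combine per-model labels (e.g. 'include' / 'exclude') for one paper.

    Returns the majority label if its share of votes reaches min_agreement,
    otherwise flags the paper for manual inspection.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return "needs_human_review"
```

With the default `min_agreement=1.0`, only unanimous decisions are automated, which is the most conservative setting.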

[LG-17] FedHybrid: Breaking the Memory Wall of Federated Learning via Hybrid Tensor Management

Link: https://arxiv.org/abs/2510.11400
Authors: Kahou Tam,Chunlin Tian,Li Li,Haikai Zhao,ChengZhong Xu
Categories: Machine Learning (cs.LG)
Comments: Sensys 2024

Abstract:Federated Learning (FL) emerges as a new learning paradigm that enables multiple devices to collaboratively train a shared model while preserving data privacy. However, one fundamental and prevailing challenge that hinders the deployment of FL on mobile devices is the memory limitation. This paper proposes FedHybrid, a novel framework that effectively reduces the memory footprint during the training process while guaranteeing the model accuracy and the overall training progress. Specifically, FedHybrid first selects the participating devices for each training round by jointly evaluating their memory budget, computing capability, and data diversity. After that, it judiciously analyzes the computational graph and generates an execution plan for each selected client in order to meet the corresponding memory budget while minimizing the training delay through employing a hybrid of recomputation and compression techniques according to the characteristics of each tensor. During the local training process, FedHybrid carries out the execution plan with a well-designed activation compression technique to effectively achieve memory reduction with minimum accuracy loss. We conduct extensive experiments to evaluate FedHybrid on both simulation and off-the-shelf mobile devices. The experiment results demonstrate that FedHybrid achieves up to a 39.1% increase in model accuracy and a 15.5× reduction in wall clock time under various memory budgets compared with the baselines.

[LG-18] DiffStyleTS: Diffusion Model for Style Transfer in Time Series

Link: https://arxiv.org/abs/2510.11335
Authors: Mayank Nagda,Phil Ostheimer,Justus Arweiler,Indra Jungjohann,Jennifer Werner,Dennis Wagner,Aparna Muraleedharan,Pouya Jafari,Jochen Schmid,Fabian Jirasek,Jakob Burger,Michael Bortz,Hans Hasse,Stephan Mandt,Marius Kloft,Sophie Fellenz
Categories: Machine Learning (cs.LG)

Abstract:Style transfer combines the content of one signal with the style of another. It supports applications such as data augmentation and scenario simulation, helping machine learning models generalize in data-scarce domains. While well developed in vision and language, style transfer methods for time series data remain limited. We introduce DiffTSST, a diffusion-based framework that disentangles a time series into content and style representations via convolutional encoders and recombines them through a self-supervised attention-based diffusion process. At inference, encoders extract content and style from two distinct series, enabling conditional generation of novel samples to achieve style transfer. We demonstrate both qualitatively and quantitatively that DiffTSST achieves effective style transfer. We further validate its real-world utility by showing that data augmentation with DiffTSST improves anomaly detection in data-scarce regimes.

[LG-19] Network-Optimised Spiking Neural Network (NOS) Scheduling for 6G O-RAN: Spectral Margin and Delay-Tail Control

Link: https://arxiv.org/abs/2510.11291
Authors: Muhammad Bilal,Xiaolong Xu
Categories: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Machine Learning (cs.LG)
Comments: 6 pages, 5 figures, 1 table

Abstract:This work presents a Network-Optimised Spiking (NOS) delay-aware scheduler for 6G radio access. The scheme couples a bounded two-state kernel to a clique-feasible proportional-fair (PF) grant head: the excitability state acts as a finite-buffer proxy, the recovery state suppresses repeated grants, and neighbour pressure is injected along the interference graph via delayed spikes. A small-signal analysis yields a delay-dependent threshold $k_\star(\Delta)$ and a spectral margin $\delta = k_\star(\Delta) - gH\rho(W)$ that compress topology, controller gain, and delay into a single design parameter. Under light assumptions on arrivals, we prove geometric ergodicity for $\delta > 0$ and derive sub-Gaussian backlog and delay tail bounds with exponents proportional to $\delta$. A numerical study, aligned with the analysis and a DU compute budget, compares NOS with PF and delayed backpressure (BP) across interference topologies over a 5-20 ms delay sweep. With a single gain fixed at the worst spectral radius, NOS sustains higher utilisation and a smaller 99.9th-percentile delay while remaining clique-feasible on integer PRBs.
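
The spectral margin above is a simple function of the spectral radius of the interference matrix, which the sketch below evaluates numerically. The abstract does not give a formula for the delay-dependent threshold or define `H`, so both are passed in as plain numbers here; the function names and example values are assumptions.

```python
import numpy as np

def spectral_radius(W):
    """Largest eigenvalue magnitude of the (square) coupling matrix W."""
    eigenvalues = np.linalg.eigvals(np.asarray(W, dtype=float))
    return float(np.max(np.abs(eigenvalues)))

def spectral_margin(k_star, g, H, W):
    """delta = k_star - g * H * rho(W), following the abstract's expression.

    k_star is the caller-supplied threshold for the delay of interest.
    """
    return k_star - g * H * spectral_radius(W)
```

A positive return value corresponds to the regime in which the abstract states geometric ergodicity; a larger gain `g` or a denser interference graph shrinks the margin.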

[LG-20] Gym-TORAX: Open-source software for integrating RL with plasma control simulators

Link: https://arxiv.org/abs/2510.11283
Authors: Antoine Mouchamps,Arthur Malherbe,Adrien Bolland,Damien Ernst
Categories: Machine Learning (cs.LG)

Abstract:This paper presents Gym-TORAX, a Python package enabling the implementation of Reinforcement Learning (RL) environments for simulating plasma dynamics and control in tokamaks. Users succinctly define a set of control actions and observations, together with a control objective, from which Gym-TORAX creates a Gymnasium environment that wraps TORAX to simulate the plasma dynamics. The objective is formulated through rewards that depend on the simulated plasma state and the control action, so as to optimize specific characteristics of the plasma, such as performance and stability. The resulting environment instance is then compatible with a wide range of RL algorithms and libraries and will facilitate RL research in plasma control. In its current version, one environment is readily available, based on a ramp-up scenario of the International Thermonuclear Experimental Reactor (ITER).

[LG-21] Vision-LLMs for Spatiotemporal Traffic Forecasting

Link: https://arxiv.org/abs/2510.11282
Authors: Ning Yang,Hengyu Zhong,Haijun Zhang,Randall Berry
Categories: Machine Learning (cs.LG)

Abstract:Accurate spatiotemporal traffic forecasting is a critical prerequisite for proactive resource management in dense urban mobile networks. While Large Language Models (LLMs) have shown promise in time series analysis, they inherently struggle to model the complex spatial dependencies of grid-based traffic data. Effectively extending LLMs to this domain is challenging, as representing the vast amount of information from dense geographical grids can be inefficient and overwhelm the model’s context. To address these challenges, we propose ST-Vision-LLM, a novel framework that reframes spatiotemporal forecasting as a vision-language fusion problem. Our approach leverages a Vision-LLM visual encoder to process historical global traffic matrices as image sequences, providing the model with a comprehensive global view to inform cell-level predictions. To overcome the inefficiency of LLMs in handling numerical data, we introduce an efficient encoding scheme that represents floating-point values as single tokens via a specialized vocabulary, coupled with a two-stage numerical alignment fine-tuning process. The model is first trained with Supervised Fine-Tuning (SFT) and then further optimized for predictive accuracy using Group Relative Policy Optimization (GRPO), a memory-efficient reinforcement learning method. Evaluations on real-world mobile traffic datasets demonstrate that ST-Vision-LLM outperforms existing methods by 15.6% in long-term prediction accuracy and exceeds the second-best baseline by over 30.04% in cross-domain few-shot scenarios. Our extensive experiments validate the model’s strong generalization capabilities across various data-scarce environments.

[LG-22] FedLoRA-Optimizer: Federated LoRA Fine-Tuning with Global and Local Optimization in Heterogeneous Data Scenarios

Link: https://arxiv.org/abs/2510.11274
Authors: Jianzhe Zhao,Hailin Zhu,Yu Zhang,Ziqi Chen,Guibing Guo
Categories: Machine Learning (cs.LG)

Abstract:Federated efficient fine-tuning has emerged as an approach that leverages distributed data and computational resources across nodes to address the challenges of large-scale fine-tuning and privacy preservation. Low-Rank Adaptation (LoRA) enables efficient fine-tuning of large-scale pre-trained models by introducing trainable low-rank matrices into weight [...]. In heterogeneous data scenarios, client drift weakens the generalization of the global model, and local models often fail to meet the personalized needs of individual [...]. Existing federated LoRA efficient fine-tuning techniques overlook fine-grained analysis of the tuning matrices. To address this, we conducted preliminary experiments and found that different LoRA matrices exhibit different sensitivity to changes in the direction and magnitude of their [...]. We thus propose a fine-grained federated LoRA tuning method. By fine-tuning the more sensitive directional vectors in the A matrix, which encode shared knowledge, our method learns shared features more effectively across clients and enhances global generalization. Simultaneously, by fine-tuning the more sensitive magnitude vectors in the B matrix, which encode personalized knowledge, our method better captures personalized knowledge, enabling detailed adaptation to local data. The method uses a pipeline combining global and local optimizers. Global optimization further improves local models, achieving collaborative optimization between global and local levels. This improves both the generalization ability of the global model and the personalized adaptation of local models under heterogeneous data scenarios. Experiments on Databricks-Dolly-15k and Natural Instructions with LLaMA2-7B and DeepSeek-7B confirm that our method improves global performance by 0.39% and local performance by 0.59%.

[LG-23] DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation

Link: https://arxiv.org/abs/2510.11258
Authors: Yuhui Fu,Feiyang Xie,Chaoyi Xu,Jing Xiong,Haoqi Yuan,Zongqing Lu
Categories: Robotics (cs.RO); Machine Learning (cs.LG)

Abstract:Loco-manipulation is a fundamental challenge for humanoid robots to achieve versatile interactions in human environments. Although recent studies have made significant progress in humanoid whole-body control, loco-manipulation remains underexplored and often relies on hard-coded task definitions or costly real-world data collection, which limits autonomy and generalization. We present DemoHLM, a framework for humanoid loco-manipulation that enables generalizable loco-manipulation on a real humanoid robot from a single demonstration in simulation. DemoHLM adopts a hierarchy that integrates a low-level universal whole-body controller with high-level manipulation policies for multiple tasks. The whole-body controller maps whole-body motion commands to joint torques and provides omnidirectional mobility for the humanoid robot. The manipulation policies, learned in simulation via our data generation and imitation learning pipeline, command the whole-body controller with closed-loop visual feedback to execute challenging loco-manipulation tasks. Experiments show a positive correlation between the amount of synthetic data and policy performance, underscoring the effectiveness of our data generation pipeline and the data efficiency of our approach. Real-world experiments on a Unitree G1 robot equipped with an RGB-D camera validate the sim-to-real transferability of DemoHLM, demonstrating robust performance under spatial variations across ten loco-manipulation tasks.

[LG-24] MIEO: encoding clinical data to enhance cardiovascular event prediction

Link: https://arxiv.org/abs/2510.11257
Authors: Davide Borghini,Davide Marchi,Angelo Nardone,Giordano Scerra,Silvia Giulia Galfrè,Alessandro Pingitore,Giuseppe Prencipe,Corrado Priami,Alina Sîrbu
Categories: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Comments: Presented in the Poster Session of Computational Intelligence methods for Bioinformatics and Biostatistics (CIBB) 2025

Abstract:As clinical data are becoming increasingly available, machine learning methods have been employed to extract knowledge from them and predict clinical events. While promising, approaches suffer from at least two main issues: low availability of labelled data and data heterogeneity leading to missing values. This work proposes the use of self-supervised auto-encoders to efficiently address these challenges. We apply our methodology to a clinical dataset from patients with ischaemic heart disease. Patient data is embedded in a latent space, built using unlabelled data, which is then used to train a neural network classifier to predict cardiovascular death. Results show improved balanced accuracy compared to applying the classifier directly to the raw data, demonstrating that this solution is promising, especially in conditions where availability of unlabelled data could increase.

[LG-25] FUSE: Fast Semi-Supervised Node Embedding Learning via Structural and Label-Aware Optimization

Link: https://arxiv.org/abs/2510.11250
Authors: Sujan Chakraborty,Rahul Bordoloi,Anindya Sengupta,Olaf Wolkenhauer,Saptarshi Bej
Categories: Machine Learning (cs.LG)

Abstract:Graph-based learning is a cornerstone for analyzing structured data, with node classification as a central task. However, in many real-world graphs, nodes lack informative feature vectors, leaving only neighborhood connectivity and class labels as available signals. In such cases, effective classification hinges on learning node embeddings that capture structural roles and topological context. We introduce a fast semi-supervised embedding framework that jointly optimizes three complementary objectives: (i) unsupervised structure preservation via scalable modularity approximation, (ii) supervised regularization to minimize intra-class variance among labeled nodes, and (iii) semi-supervised propagation that refines unlabeled nodes through random-walk-based label spreading with attention-weighted similarity. These components are unified into a single iterative optimization scheme, yielding high-quality node embeddings. On standard benchmarks, our method consistently achieves classification accuracy at par with or superior to state-of-the-art approaches, while requiring significantly less computational cost.

[LG-26] Learning the Structure of Connection Graphs

Link: https://arxiv.org/abs/2510.11245
Authors: Leonardo Di Nino,Gabriele D’Acunto,Sergio Barbarossa,Paolo Di Lorenzo
Categories: Machine Learning (cs.LG); Signal Processing (eess.SP)

Abstract:Connection graphs (CGs) extend traditional graph models by coupling network topology with orthogonal transformations, enabling the representation of global geometric consistency. They play a key role in applications such as synchronization, Riemannian signal processing, and neural sheaf diffusion. In this work, we address the inverse problem of learning CGs directly from observed signals. We propose a principled framework based on maximum pseudo-likelihood under a consistency assumption, which enforces spectral properties linking the connection Laplacian to the underlying combinatorial Laplacian. Based on this formulation, we introduce the Structured Connection Graph Learning (SCGL) algorithm, a block-optimization procedure over Riemannian manifolds that jointly infers network topology, edge weights, and geometric structure. Our experiments show that SCGL consistently outperforms existing baselines in both topological recovery and geometric fidelity, while remaining computationally efficient.
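
For readers unfamiliar with the object being learned, the sketch below builds a connection Laplacian for a small graph under a consistency assumption, using the common block convention (diagonal blocks `deg(i) * I`, off-diagonal blocks `-w_ij * O_ij`). The paper's exact conventions may differ; the helper names and the choice `O_ij = O_i O_j^T` are assumptions for illustration.

```python
import numpy as np

def rotation(theta):
    """2-D rotation matrix, used here as an example orthogonal node frame."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def connection_laplacian(weights, node_frames):
    """Assemble the (n*d, n*d) connection Laplacian.

    weights: symmetric (n, n) adjacency with zero diagonal.
    node_frames: (n, d, d) orthogonal matrices O_i; edge maps are taken as
    O_ij = O_i @ O_j.T, i.e. a consistent connection.
    """
    w = np.asarray(weights, dtype=float)
    frames = np.asarray(node_frames, dtype=float)
    n, d = w.shape[0], frames.shape[1]
    L = np.zeros((n * d, n * d))
    for i in range(n):
        L[i*d:(i+1)*d, i*d:(i+1)*d] = w[i].sum() * np.eye(d)
        for j in range(n):
            if i != j and w[i, j] != 0:
                L[i*d:(i+1)*d, j*d:(j+1)*d] = -w[i, j] * (frames[i] @ frames[j].T)
    return L
```

Under this consistency assumption the spectrum of the connection Laplacian equals that of the ordinary graph Laplacian with every eigenvalue repeated `d` times, which is one concrete form of the spectral link mentioned in the abstract.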

[LG-27] Neural Weight Compression for Language Models

Link: https://arxiv.org/abs/2510.11234
Authors: Jegwang Ryu,Minkyu Kim,Seungjun Shin,Hee Min Choi,Dokwan Oh,Jaeho Lee
Categories: Machine Learning (cs.LG)

Abstract:The efficient storage and transmission of language model weights is becoming increasingly important, as their scale and adoption continue to grow. However, as our understanding of this new data modality is limited, designing a good compression algorithm for language model weights heavily relies on manual, trial-and-error approaches. In this paper, we propose a learned compression framework that trains neural codecs directly from pretrained language model weights. Unlike conventional data (e.g., images), language model weights pose unique challenges: the sizes and shapes of weight tensors vary significantly, and the reconstruction quality must be judged by downstream model predictions rather than naïve MSE loss. To address this, we introduce Neural Weight Compression (NWC), a novel autoencoder-based neural codec tailored to model weight compression. The proposed method inherits the advantages of autoencoder-based codecs while incorporating three technical components: (1) column-wise tensor chunking and normalization; (2) an importance-aware training loss; (3) an inference-time error compensation mechanism guided by model outputs. Experiments on open-weight language models show that NWC achieves competitive or state-of-the-art accuracy-compression tradeoffs, with particularly strong results at 4-6 bit precisions where accuracy remains nearly on par with FP16 models.

[LG-28] Enforcing convex constraints in Graph Neural Networks

Link: https://arxiv.org/abs/2510.11227
Authors: Ahmed Rashwan,Keith Briggs,Chris Budd,Lisa Kreusser
Categories: Machine Learning (cs.LG)

Abstract:Many machine learning applications require outputs that satisfy complex, dynamic constraints. This task is particularly challenging in Graph Neural Network models due to the variable output sizes of graph-structured data. In this paper, we introduce ProjNet, a Graph Neural Network framework which satisfies input-dependent constraints. ProjNet combines a sparse vector clipping method with the Component-Averaged Dykstra (CAD) algorithm, an iterative scheme for solving the best-approximation problem. We establish a convergence result for CAD and develop a GPU-accelerated implementation capable of handling large-scale inputs efficiently. To enable end-to-end training, we introduce a surrogate gradient for CAD that is both computationally efficient and better suited for optimization than the exact gradient. We validate ProjNet on four classes of constrained optimisation problems: linear programming, two classes of non-convex quadratic programs, and radio transmit power optimization, demonstrating its effectiveness across diverse problem settings.
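
The best-approximation problem mentioned above asks for the closest point in an intersection of convex sets. The sketch below implements the classical Dykstra algorithm, not the Component-Averaged variant proposed in the paper, on a toy intersection of a unit box and a half-space; the two example sets and all function names are assumptions.

```python
import numpy as np

def dykstra(x0, projections, n_iter=200):
    """Classical Dykstra iteration: converges to the projection of x0 onto
    the intersection of the convex sets behind `projections`."""
    x = np.asarray(x0, dtype=float).copy()
    corrections = [np.zeros_like(x) for _ in projections]
    for _ in range(n_iter):
        for i, project in enumerate(projections):
            shifted = x + corrections[i]
            x = project(shifted)
            # The correction term distinguishes Dykstra from plain
            # alternating projections, which may miss the nearest point.
            corrections[i] = shifted - x
    return x

def project_box(v):
    """Projection onto the unit box [0, 1]^n."""
    return np.clip(v, 0.0, 1.0)

def project_halfspace(v):
    """Projection onto {v : sum(v) <= 1}."""
    excess = v.sum() - 1.0
    return v if excess <= 0 else v - excess / v.size
```

For example, `dykstra([2.0, 2.0], [project_box, project_halfspace])` approaches the nearest feasible point of the box intersected with the half-space.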

[LG-29] Cross-Scale Reservoir Computing for large spatio-temporal forecasting and modeling

Link: https://arxiv.org/abs/2510.11209
Authors: Nicola Alboré,Gabriele Di Antonio,Fabrizio Coccetti,Andrea Gabrielli
Categories: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)

Abstract:We propose a new reservoir computing method for forecasting high-resolution spatiotemporal datasets. By combining multi-resolution inputs from coarser to finer layers, our architecture better captures both local and global dynamics. Applied to Sea Surface Temperature data, it outperforms standard parallel reservoir models in long-term forecasting, demonstrating the effectiveness of cross-layer coupling in improving predictive accuracy. Finally, we show that the optimal network dynamics in each layer become increasingly linear, revealing the slow modes propagated to subsequent layers.

[LG-30] Evaluating Line-level Localization Ability of Learning-based Code Vulnerability Detection Models

Link: https://arxiv.org/abs/2510.11202
Authors: Marco Pintore,Giorgio Piras,Angelo Sotgiu,Maura Pintor,Battista Biggio
Categories: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Comments: Preprint

Abstract:To address the extremely concerning problem of software vulnerability, system security is often entrusted to Machine Learning (ML) algorithms. Despite their now established detection capabilities, such models are limited by design to flagging the entire input source code function as vulnerable, rather than precisely localizing the concerned code lines. However, the detection granularity is crucial to support human operators during software development, ensuring that such predictions reflect the true code semantics to help debug, evaluate, and fix the detected vulnerabilities. To address this issue, recent work made progress toward improving the detector’s localization ability, thus narrowing down the vulnerability detection “window” and providing more fine-grained predictions. Such approaches, however, implicitly disregard the presence of spurious correlations and biases in the data, which often predominantly influence the performance of ML algorithms. In this work, we investigate how detectors comply with this requirement by proposing an explainability-based evaluation procedure. Our approach, defined as Detection Alignment (DA), quantifies the agreement between the input source code lines that most influence the prediction and the actual localization of the vulnerability as per the ground truth. Through DA, which is model-agnostic and adaptable to different detection tasks, not limited to our use case, we analyze multiple learning-based vulnerability detectors and datasets. As a result, we show how the predictions of such models are consistently biased by non-vulnerable lines, ultimately highlighting the high impact of biases and spurious correlations. The code is available at this https URL.

[LG-31] Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs

Link: https://arxiv.org/abs/2510.11192
Authors: João Paulo Cardoso de Lima,Marc Dietrich,Jeronimo Castrillon,Asif Ali Khan
Categories: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Comments: 8 pages, to appear in IEEE Cross-disciplinary Conference on Memory-Centric Computing (CCMCC)

Abstract:Structured sparsity enables deploying large language models (LLMs) on resource-constrained systems. Approaches like dense-to-sparse fine-tuning are particularly compelling, achieving remarkable structured sparsity by reducing the model size by over 6.7x, while still maintaining acceptable accuracy. Despite this reduction, LLM inference, especially the decode stage being inherently memory-bound, is extremely expensive on conventional Von-Neumann architectures. Compute-in-memory (CIM) architectures mitigate this by performing computations directly in memory, and when paired with sparse LLMs, enable storing and computing the entire model in memory, eliminating the data movement on the off-chip bus and improving efficiency. Nonetheless, naively mapping sparse matrices onto CIM arrays leads to poor array utilization and diminished computational efficiency. In this paper, we present an automated framework with novel mapping and scheduling strategies to accelerate sparse LLM inference on CIM accelerators. By exploiting block-diagonal sparsity, our approach improves CIM array utilization by over 50%, achieving more than 4x reduction in both memory footprint and the number of required floating-point operations.
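
To see why block-diagonal sparsity cuts both storage and arithmetic, consider the matrix-vector product sketched below, which stores and multiplies only the diagonal blocks. This is a generic illustration, not the paper's CIM mapping or scheduling framework; the function name is an assumption.

```python
import numpy as np

def block_diag_matvec(blocks, x):
    """Compute y = blockdiag(blocks) @ x without materialising the zeros.

    For k square blocks of size b this touches k*b*b weights instead of
    the (k*b)**2 entries of the equivalent dense matrix.
    """
    x = np.asarray(x, dtype=float)
    outputs, offset = [], 0
    for block in blocks:
        block = np.asarray(block, dtype=float)
        cols = block.shape[1]
        outputs.append(block @ x[offset:offset + cols])
        offset += cols
    if offset != x.size:
        raise ValueError("block widths must add up to len(x)")
    return np.concatenate(outputs)
```

Because each block only reads its own slice of `x`, the blocks can be evaluated independently, which is the property a mapping onto separate memory arrays would exploit.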

[LG-32] Beyond single-model XAI: aggregating multi-model explanations for enhanced trustworthiness ECAI2025

Link: https://arxiv.org/abs/2510.11164
Authors: Ilaria Vascotto,Alex Rodriguez,Alessandro Bonaita,Luca Bortolussi
Categories: Machine Learning (cs.LG)
Comments: Accepted at the European Workshop on Trustworthy Artificial Intelligence (TRUST-AI), co-located within ECAI 2025

Abstract:The use of Artificial Intelligence (AI) models in real-world and high-risk applications has intensified the discussion about their trustworthiness and ethical usage, from both a technical and a legislative perspective. The field of eXplainable Artificial Intelligence (XAI) addresses this challenge by proposing explanations that bring to light the decision-making processes of complex black-box models. Despite being an essential property, the robustness of explanations is often an overlooked aspect during development: only robust explanation methods can increase the trust in the system as a whole. This paper investigates the role of robustness through the usage of a feature importance aggregation derived from multiple models (k-nearest neighbours, random forest and neural networks). Preliminary results showcase the potential in increasing the trustworthiness of the application, while leveraging multiple models' predictive power.

[LG-33] Emergence of hybrid computational dynamics through reinforcement learning

Link: https://arxiv.org/abs/2510.11162
Authors: Roman A. Kononov,Nikita A. Pospelov,Konstantin V. Anokhin,Vladimir V. Nekorkin,Oleg V. Maslennikov
Categories: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Adaptation and Self-Organizing Systems (nlin.AO); Neurons and Cognition (q-bio.NC)
Comments: 22 pages, 11 figures

[LG-34] A Comprehensive Forecasting-Based Framework for Time Series Anomaly Detection: Benchmarking on the Numenta Anomaly Benchmark (NAB)

Link: https://arxiv.org/abs/2510.11141
Authors: Mohammad Karami,Mostafa Jalali,Fatemeh Ghassemi
Categories: Machine Learning (cs.LG)

Abstract:Time series anomaly detection is critical for modern digital infrastructures, yet existing methods lack systematic cross-domain evaluation. We present a comprehensive forecasting-based framework unifying classical methods (Holt-Winters, SARIMA) with deep learning architectures (LSTM, Informer) under a common residual-based detection interface. Our modular pipeline integrates preprocessing (normalization, STL decomposition), four forecasting models, four detection methods, and dual evaluation through forecasting metrics (MAE, RMSE, PCC) and detection metrics (Precision, Recall, F1, AUC). We conduct the first complete evaluation on the Numenta Anomaly Benchmark (58 datasets, 7 categories) with 232 model training runs and 464 detection evaluations achieving 100% success rate. LSTM achieves best performance (F1: 0.688, ranking first or second on 81% of datasets) with exceptional correlation on complex patterns (PCC: 0.999). Informer provides competitive accuracy (F1: 0.683) with 30% faster training. Classical methods achieve perfect predictions on simple synthetic data with 60× lower cost but show 2-3× worse F1-scores on real-world datasets. Forecasting quality dominates detection performance: differences between detection methods (F1: 0.621-0.688) are smaller than between forecasting models (F1: 0.344-0.688). Our findings provide evidence-based guidance: use LSTM for complex patterns, Informer for efficiency-critical deployments, and classical methods for simple periodic data with resource constraints. The complete implementation and results establish baselines for future forecasting-based anomaly detection research.
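
A residual-based detection interface can be as simple as thresholding the gap between observed and forecast values, regardless of which forecaster produced them. The sketch below uses a z-score rule; the abstract does not list its four detection methods, so this rule, the default threshold, and the function name are assumptions.

```python
import statistics

def residual_anomalies(actual, forecast, z_threshold=3.0):
    """Return indices whose forecast residual is a z-score outlier."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    residuals = [a - f for a, f in zip(actual, forecast)]
    mean = statistics.fmean(residuals)
    std = statistics.pstdev(residuals)
    if std == 0:
        return []  # Perfect or constant-offset forecast: nothing stands out.
    return [i for i, r in enumerate(residuals)
            if abs(r - mean) / std > z_threshold]
```

Because the detector only sees residuals, any forecaster can be swapped in upstream; this also suggests why forecast quality can dominate detection quality.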

[LG-35] DUAL: Learning Diverse Kernels for Aggregated Two-sample and Independence Testing

Link: https://arxiv.org/abs/2510.11140
Authors: Zhijian Zhou,Xunye Tian,Liuhua Peng,Chao Lei,Antonin Schrab,Danica J. Sutherland,Feng Liu
Categories: Machine Learning (cs.LG)

[LG-36] Test-Time Adaptation by Causal Trimming NEURIPS2025

Link: https://arxiv.org/abs/2510.11133
Authors: Yingnan Liu,Rui Qiao,Mong Li Lee,Wynne Hsu
Categories: Machine Learning (cs.LG)
Comments: Accepted to the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025); Code is available at this https URL

Abstract:Test-time adaptation aims to improve model robustness under distribution shifts by adapting models with access to unlabeled target samples. A primary cause of performance degradation under such shifts is the model’s reliance on features that lack a direct causal relationship with the prediction target. We introduce Test-time Adaptation by Causal Trimming (TACT), a method that identifies and removes non-causal components from representations for test distributions. TACT applies data augmentations that preserve causal features while varying non-causal ones. By analyzing the changes in the representations using Principal Component Analysis, TACT identifies the highest variance directions associated with non-causal features. It trims the representations by removing their projections on the identified directions, and uses the trimmed representations for the predictions. During adaptation, TACT continuously tracks and refines these directions to get a better estimate of non-causal features. We theoretically analyze the effectiveness of this approach and empirically validate TACT on real-world out-of-distribution benchmarks. TACT consistently outperforms state-of-the-art methods by a significant margin.
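
The trimming step described above can be sketched in a few lines: collect representations of several augmented views per sample, run PCA on the augmentation-induced deviations, and project the top directions out. This is a simplified reading of the abstract, not the authors' released code; the array layout, the per-sample centring, and the function names are assumptions, and the continual tracking of directions is omitted.

```python
import numpy as np

def noncausal_directions(aug_reps, n_directions=1):
    """Estimate directions that vary under augmentation.

    aug_reps: array of shape (n_samples, n_augmentations, dim).
    Returns an (n_directions, dim) array with orthonormal rows.
    """
    reps = np.asarray(aug_reps, dtype=float)
    # Deviations from each sample's mean view isolate augmentation effects.
    deltas = reps - reps.mean(axis=1, keepdims=True)
    deltas = deltas.reshape(-1, reps.shape[-1])
    # PCA via SVD: rows of vt are principal directions, highest variance first.
    _, _, vt = np.linalg.svd(deltas, full_matrices=False)
    return vt[:n_directions]

def trim(reps, directions):
    """Remove the projections of `reps` onto the given directions."""
    reps = np.asarray(reps, dtype=float)
    return reps - reps @ directions.T @ directions
```

The trimmed representations would then be fed to the unchanged prediction head.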

[LG-37] Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM

Link: https://arxiv.org/abs/2510.11121
Authors: Rongjie Zhu,Cong Zhang,Zhiguang Cao
Categories: Machine Learning (cs.LG)

Abstract:While large language models (LLMs) are increasingly used as automated heuristic designers for vehicle routing problems (VRPs), current state-of-the-art methods predominantly rely on prompting massive, general-purpose models like GPT-4. This work challenges that paradigm by demonstrating that a smaller, specialized LLM, when meticulously fine-tuned, can generate components that surpass expert-crafted heuristics within advanced solvers. We propose RFTHGS, a novel Reinforcement learning (RL) framework for Fine-Tuning a small LLM to generate high-performance crossover operators for the Hybrid Genetic Search (HGS) solver, applied to the Capacitated VRP (CVRP). Our method employs a multi-tiered, curriculum-based reward function that progressively guides the LLM to master generating first compilable, then executable, and finally, superior-performing operators that exceed human expert designs. This is coupled with an operator caching mechanism that discourages plagiarism and promotes diversity during training. Comprehensive experiments show that our fine-tuned LLM produces crossover operators which significantly outperform the expert-designed ones in HGS. The performance advantage remains consistent, generalizing from small-scale instances to large-scale problems with up to 1000 nodes. Furthermore, RFTHGS exceeds the performance of leading neuro-combinatorial baselines, prompt-based methods, and commercial LLMs such as GPT-4o and GPT-4o-mini.

[LG-38] Graph Neural Network-Based Multicast Routing for On-Demand Streaming Services in 6G Networks

Link: https://arxiv.org/abs/2510.11109
Authors: Xiucheng Wang,Zien Wang,Nan Cheng,Wenchao Xu,Wei Quan,Xuemin Shen
Categories: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)

Abstract:The increase of bandwidth-intensive applications in sixth-generation (6G) wireless networks, such as real-time volumetric streaming and multi-sensory extended reality, demands intelligent multicast routing solutions capable of delivering differentiated quality-of-service (QoS) at scale. Traditional shortest-path and multicast routing algorithms are either computationally prohibitive or structurally rigid, and they often fail to support heterogeneous user demands, leading to suboptimal resource utilization. Neural network-based approaches, while offering improved inference speed, typically lack topological generalization and scalability. To address these limitations, this paper presents a graph neural network (GNN)-based multicast routing framework that jointly minimizes total transmission cost and supports user-specific video quality requirements. The routing problem is formulated as a constrained minimum-flow optimization task, and a reinforcement learning algorithm is developed to sequentially construct efficient multicast trees by reusing paths and adapting to network dynamics. A graph attention network (GAT) is employed as the encoder to extract context-aware node embeddings, while a long short-term memory (LSTM) module models the sequential dependencies in routing decisions. Extensive simulations demonstrate that the proposed method closely approximates optimal dynamic programming-based solutions while significantly reducing computational complexity. The results also confirm strong generalization to large-scale and dynamic network topologies, highlighting the method’s potential for real-time deployment in 6G multimedia delivery scenarios. Code is available at this https URL.

[LG-39] Efficient Edge Test-Time Adaptation via Latent Feature Coordinate Correction

链接: https://arxiv.org/abs/2510.11068
作者: Xinyu Luo,Jie Liu,Kecheng Chen,Junyi Yang,Bo Ding,Arindam Basu,Haoliang Li
类目: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
*备注: Under review

点击查看摘要

[LG-40] Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs

链接: https://arxiv.org/abs/2510.11062
作者: Yujie Zhao,Lanxiang Hu,Yang Wang,Minmin Hou,Hao Zhang,Ke Ding,Jishen Zhao
类目: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
*备注:

点击查看摘要

[LG-41] Robust Photoplethysmography Signal Denoising via Mamba Networks

链接: https://arxiv.org/abs/2510.11058
作者: I Chiu,Yu-Tung Liu,Kuan-Chen Wang,Hung-Yu Wei,Yu Tsao
类目: Machine Learning (cs.LG); Signal Processing (eess.SP)
*备注: 5 pages, 2 figures

点击查看摘要

Abstract:Photoplethysmography (PPG) is widely used in wearable health monitoring, but its reliability is often degraded by noise and motion artifacts, limiting downstream applications such as heart rate (HR) estimation. This paper presents a deep learning framework for PPG denoising with an emphasis on preserving physiological information. In this framework, we propose DPNet, a Mamba-based denoising backbone designed for effective temporal modeling. To further enhance denoising performance, the framework also incorporates a scale-invariant signal-to-distortion ratio (SI-SDR) loss to promote waveform fidelity and an auxiliary HR predictor (HRP) that provides physiological consistency through HR-based supervision. Experiments on the BIDMC dataset show that our method achieves strong robustness against both synthetic noise and real-world motion artifacts, outperforming conventional filtering and existing neural models. Our method can effectively restore PPG signals while maintaining HR accuracy, highlighting the complementary roles of SI-SDR loss and HR-guided supervision. These results demonstrate the potential of our approach for practical deployment in wearable healthcare systems.
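The abstract above names a scale-invariant signal-to-distortion ratio (SI-SDR) loss for waveform fidelity. As a rough illustration only, the following is a minimal pure-Python sketch of the standard SI-SDR definition; the function names `si_sdr` and `si_sdr_loss`, the zero-mean step, and the `eps` constant are assumptions for this example and are not taken from the paper's code.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length sequences.

    Illustrative sketch of the standard definition, not the authors' implementation.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    # Remove the mean so a constant offset does not count as distortion.
    mean_e = sum(estimate) / len(estimate)
    mean_r = sum(reference) / len(reference)
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]
    # Project the estimate onto the reference to get the optimally scaled target.
    dot = sum(a * b for a, b in zip(e, r))
    ref_energy = sum(b * b for b in r)
    alpha = dot / (ref_energy + eps)
    target = [alpha * b for b in r]
    # Whatever is left after removing the scaled target is treated as distortion.
    residual = [a - t for a, t in zip(e, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_loss(estimate, reference):
    """Negative SI-SDR, so that minimizing the loss maximizes waveform fidelity."""
    return -si_sdr(estimate, reference)
```

Because the target is rescaled by the projection coefficient, multiplying the estimate by a constant leaves the score essentially unchanged, which is the property that makes the measure scale-invariant.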

[LG-42] Conformal Inference for Time Series over Graphs

链接: https://arxiv.org/abs/2510.11049
作者: Sonakshi Dua,Gonzalo Mateos,Sundeep Prabhakar Chepuri
类目: Machine Learning (cs.LG); Signal Processing (eess.SP)
*备注:

点击查看摘要

[LG-43] Instruction-aware User Embedding via Synergistic Language and Representation Modeling

链接: https://arxiv.org/abs/2510.11016
作者: Ziyi Gao,Yike Xu,Jiahao Yuan,Baokun Wang,Jinyong Wen,Xiaotong Lin,Yun Liu,Xing Fu,Yu Cheng,Yongchao Liu,Weiqiang Wang,Zhongle Xie
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:User representation modeling has become increasingly crucial for personalized applications, yet existing approaches struggle with generalizability across domains and sensitivity to noisy behavioral signals. We present InstructUE, an instruction-aware user embedding foundation model that leverages large language models (LLMs) to generate general and instruction-aware user representations. InstructUE introduces a multi-encoder architecture with a lightweight adapter that efficiently processes heterogeneous data from six different sources while preserving their structural characteristics. Additionally, it proposes a novel contrastive-autoregressive training framework that bridges language and representation spaces through a curated UserQA dataset. The contrastive-autoregressive training framework simultaneously leverages autoregressive learning to capture domain knowledge in language space and contrastive learning to align user-text embeddings in representation space, thereby enhancing the instruction-awareness and noise-robustness of user embeddings. Through extensive experiments on real-world applications, we demonstrate that InstructUE significantly outperforms existing methods across multiple domains including user prediction, marketing, and recommendation scenarios. Our results show that instruction-aware user modeling can effectively achieve instruction-guided denoising of user information in specific scenarios, paving the way for more generalizable and robust user representation learning.

[LG-44] GrASP: A Generalizable Address-based Semantic Prefetcher for Scalable Transactional and Analytical Workloads

链接: https://arxiv.org/abs/2510.11011
作者: Farzaneh Zirak,Farhana Choudhury,Renata Borovica-Gajic
类目: Databases (cs.DB); Machine Learning (cs.LG)
*备注: This is a preprint version

点击查看摘要

[LG-45] Blade: A Derivative-free Bayesian Inversion Method using Diffusion Priors

链接: https://arxiv.org/abs/2510.10968
作者: Hongkai Zheng,Austin Wang,Zihui Wu,Zhengyu Huang,Ricardo Baptista,Yisong Yue
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:

点击查看摘要

Abstract:Derivative-free Bayesian inversion is an important task in many science and engineering applications, particularly when computing the forward model derivative is computationally and practically challenging. In this paper, we introduce Blade, which can produce accurate and well-calibrated posteriors for Bayesian inversion using an ensemble of interacting particles. Blade leverages powerful data-driven priors based on diffusion models, and can handle nonlinear forward models that permit only black-box access (i.e., derivative-free). Theoretically, we establish a non-asymptotic convergence analysis to characterize the effects of forward model and prior estimation errors. Empirically, Blade achieves superior performance compared to existing derivative-free Bayesian inversion methods on various inverse problems, including challenging highly nonlinear fluid dynamics.

[LG-46] Not All Bits Are Equal: Scale-Dependent Memory Optimization Strategies for Reasoning Models

链接: https://arxiv.org/abs/2510.10964
作者: Junhyuck Kim,Ethan Ewer,Taehong Moon,Jongho Park,Dimitris Papailiopoulos
类目: Machine Learning (cs.LG)
*备注: 20 pages, 12 figures

点击查看摘要

[LG-47] Interpretable Machine Learning for Cognitive Aging: Handling Missing Data and Uncovering Social Determinant

链接: https://arxiv.org/abs/2510.10952
作者: Xi Mao,Zhendong Wang,Jingyu Li,Lingchao Mao,Utibe Essien,Hairong Wang,Xuelei Sherry Ni
类目: Machine Learning (cs.LG); Applications (stat.AP)
*备注:

点击查看摘要

Abstract:Early detection of Alzheimer’s disease (AD) is crucial because its neurodegenerative effects are irreversible, and neuropathologic and social-behavioral risk factors accumulate years before diagnosis. Identifying higher-risk individuals earlier enables prevention, timely care, and equitable resource allocation. We predict cognitive performance from social determinants of health (SDOH) using the NIH NIA-supported PREPARE Challenge Phase 2 dataset derived from the nationally representative Mex-Cog cohort of the 2003 and 2012 Mexican Health and Aging Study (MHAS). Data: The target is a validated composite cognitive score across seven domains (orientation, memory, attention, language, constructional praxis, and executive function) derived from the 2016 and 2021 MHAS waves. Predictors span demographic, socioeconomic, health, lifestyle, psychosocial, and healthcare access factors. Methodology: Missingness was addressed with a singular value decomposition (SVD)-based imputation pipeline treating continuous and categorical variables separately. This approach leverages latent feature correlations to recover missing values while balancing reliability and scalability. After evaluating multiple methods, XGBoost was chosen for its superior predictive performance. Results and Discussion: The framework outperformed existing methods and the data challenge leaderboard, demonstrating high accuracy, robustness, and interpretability. SHAP-based post hoc analysis identified top contributing SDOH factors and age-specific feature patterns. Notably, flooring material emerged as a strong predictor, reflecting socioeconomic and environmental disparities. Other influential factors (age, SES, lifestyle, social interaction, sleep, stress, and BMI) underscore the multifactorial nature of cognitive aging and the value of interpretable, data-driven SDOH modeling.
Cite as: arXiv:2510.10952 [cs.LG], https://doi.org/10.48550/arXiv.2510.10952. Submission history: [v1] Mon, 13 Oct 2025 03:04:10 UTC (1,122 KB), from Sherry Ni.

[LG-48] Neutral Agent-based Adversarial Policy Learning against Deep Reinforcement Learning in Multi-party Open Systems

链接: https://arxiv.org/abs/2510.10937
作者: Qizhou Peng,Yang Zheng,Yu Wen,Yanna Wu,Yingying Du
类目: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
*备注:

点击查看摘要

[LG-49] Quantifying Information Disclosure During Gradient Descent Using Gradient Uniqueness

链接: https://arxiv.org/abs/2510.10902
作者: Mahmoud Abdelghafar,Maryam Aliakbarpour,Chris Jermaine
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:

点击查看摘要

[LG-50] A Joint Learning Approach to Hardware Caching and Prefetching NEURIPS2025

链接: https://arxiv.org/abs/2510.10862
作者: Samuel Yuan,Divyanshu Saxena,Jiayi Chen,Nihal Sharma,Aditya Akella
类目: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
*备注: Accepted at ML for Systems Workshop at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

点击查看摘要

[LG-51] Glance for Context: Learning When to Leverage LLMs for Node-Aware GNN-LLM Fusion

链接: https://arxiv.org/abs/2510.10849
作者: Donald Loveland,Yao-An Yang,Danai Koutra
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Learning on text-attributed graphs has motivated the use of Large Language Models (LLMs) for graph learning. However, most fusion strategies are applied uniformly across all nodes and attain only small overall performance gains. We argue this result stems from aggregate metrics that obscure when LLMs provide benefit, inhibiting actionable signals for new strategies. In this work, we reframe LLM-GNN fusion around nodes where GNNs typically falter. We first show that performance can significantly differ between GNNs and LLMs, with each excelling on distinct structural patterns, such as local homophily. To leverage this finding, we propose GLANCE (GNN with LLM Assistance for Neighbor- and Context-aware Embeddings), a framework that invokes an LLM to refine a GNN’s prediction. GLANCE employs a lightweight router that, given inexpensive per-node signals, decides whether to query the LLM. Since the LLM calls are non-differentiable, the router is trained with an advantage-based objective that compares the utility of querying the LLM against relying solely on the GNN. Across multiple benchmarks, GLANCE achieves the best performance balance across node subgroups, achieving significant gains on heterophilous nodes (up to +13%) while simultaneously achieving top overall performance. Our findings highlight the value of adaptive, node-aware GNN-LLM architectures, where selectively invoking the LLM enables scalable deployment on large graphs without incurring high computational costs.
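The routing idea in the abstract above (query the LLM only when it is expected to help more than the GNN alone) can be sketched with a toy logistic router. Everything below is a hypothetical illustration under stated assumptions: the names `route_probability`, `advantage`, and `router_update`, the 0/1 correctness reward, the `query_cost` penalty, and the gradient-ascent update are stand-ins chosen for this example and are not GLANCE's actual objective or code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def route_probability(weights, bias, signals):
    """Probability that the (hypothetical) router queries the LLM for a node."""
    return sigmoid(sum(w * s for w, s in zip(weights, signals)) + bias)


def advantage(gnn_correct, llm_correct, query_cost=0.1):
    """Utility of querying the LLM minus utility of relying on the GNN alone.

    Assumption: utility is 1 for a correct prediction, 0 otherwise, and each
    LLM call pays a fixed illustrative cost.
    """
    return (float(llm_correct) - query_cost) - float(gnn_correct)


def router_update(weights, bias, signals, adv, lr=0.5):
    """One gradient-ascent step on the expected advantage p * adv.

    The LLM call itself is never differentiated; only the router's
    probability p is, using d p / d z = p * (1 - p).
    """
    p = route_probability(weights, bias, signals)
    grad_z = adv * p * (1.0 - p)
    new_weights = [w + lr * grad_z * s for w, s in zip(weights, signals)]
    return new_weights, bias + lr * grad_z


# A node where the GNN is wrong and the LLM is right pushes the router
# towards querying; a node where both are right pushes it away.
w, b = [0.0, 0.0], 0.0
node_signals = [1.0, 0.2]  # e.g. cheap per-node features (made-up values)
w_up, b_up = router_update(w, b, node_signals, advantage(False, True))
w_down, b_down = router_update(w, b, node_signals, advantage(True, True))
```

The point of the sketch is the design choice the abstract describes: the reward compares two complete outcomes, so no gradient has to flow through the LLM.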

[LG-52] Fast and the Furious: Hot Starts in Pursuit-Evasion Games AAMAS

链接: https://arxiv.org/abs/2510.10830
作者: Gabriel Smithline,Scott Nivison
类目: Multiagent Systems (cs.MA); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
*备注: Presented at AAMAS Workshop on Autonomous Robots and Multirobot Systems (ARMS)

点击查看摘要

[LG-53] Aegis: A Correlation-Based Data Masking Advisor for Data Sharing Ecosystems SIGMOD2026

链接: https://arxiv.org/abs/2510.10810
作者: Omar Islam Laskar,Fatemeh Ramezani Khozestani,Ishika Nankani,Sohrab Namazi Nia,Senjuti Basu Roy,Kaustubh Beedkar
类目: Machine Learning (cs.LG); Databases (cs.DB)
*备注: Accepted at SIGMOD 2026

点击查看摘要

[LG-54] Crisis-Aware Regime-Conditioned Diffusion with CVaR Allocation

链接: https://arxiv.org/abs/2510.10807
作者: Ali Atiah Alzahrani
类目: Machine Learning (cs.LG); Computational Finance (q-fin.CP)
*备注: Code available at: this https URL

点击查看摘要

[LG-55] Rethinking deep learning: linear regression remains a key benchmark in predicting terrestrial water storage

链接: https://arxiv.org/abs/2510.10799
作者: Wanshu Nie,Sujay V. Kumar,Junyu Chen,Long Zhao,Olya Skulovich,Jinwoong Yoo,Justin Pflug,Shahryar Khalique Ahmad,Goutam Konapala
类目: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Geophysics (physics.geo-ph)
*备注:

点击查看摘要

[LG-56] Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton, and Adaptive Methods

链接: https://arxiv.org/abs/2510.10777
作者: Andrey Veprikov,Arman Bolatov,Samuel Horváth,Aleksandr Beznosikov,Martin Takáč,Slavomir Hanzely
类目: Machine Learning (cs.LG); Optimization and Control (math.OC)
*备注: 22 pages, 2 figures, 8 tables

点击查看摘要

[LG-57] Structure Over Signal: A Globalized Approach to Multi-relational GNNs for Stock Prediction

链接: https://arxiv.org/abs/2510.10775
作者: Amber Li,Aruzhan Abil,Juno Marques Oda
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-58] Controllable Generative Trajectory Prediction via Weak Preference Alignment

链接: https://arxiv.org/abs/2510.10731
作者: Yongxi Cao,Julian F. Schumann,Jens Kober,Joni Pajarinen,Arkady Zgonnikov
类目: Robotics (cs.RO); Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-59] Designing ReLU Generative Networks to Enumerate Trees with a Given Tree Edit Distance

链接: https://arxiv.org/abs/2510.10706
作者: Mamoona Ghafoor,Tatsuya Akutsu
类目: Machine Learning (cs.LG); Discrete Mathematics (cs.DM)
*备注:

点击查看摘要

[LG-60] Learning-Augmented Streaming Algorithms for Correlation Clustering NEURIPS2025

链接: https://arxiv.org/abs/2510.10705
作者: Yinhao Dong,Shan Jiang,Shi Li,Pan Peng
类目: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
*备注: NeurIPS 2025

点击查看摘要

Abstract:We study streaming algorithms for Correlation Clustering. Given a graph as an arbitrary-order stream of edges, with each edge labeled as positive or negative, the goal is to partition the vertices into disjoint clusters, such that the number of disagreements is minimized. In this paper, we give the first learning-augmented streaming algorithms for the problem on both complete and general graphs, improving the best-known space-approximation tradeoffs. Based on the works of Cambus et al. (SODA’24) and Ahn et al. (ICML’15), our algorithms use the predictions of pairwise distances between vertices provided by a predictor. For complete graphs, our algorithm achieves a better-than-3 approximation under good prediction quality, while using $\tilde{O}(n)$ total space. For general graphs, our algorithm achieves an $O(\log |E^-|)$ approximation under good prediction quality using $\tilde{O}(n)$ total space, improving the best-known non-learning algorithm in terms of space efficiency. Experimental results on synthetic and real-world datasets demonstrate the superiority of our proposed algorithms over their non-learning counterparts.
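To make the objective in the abstract above concrete, the sketch below counts disagreements for a given clustering: a positive edge whose endpoints are separated, or a negative edge whose endpoints share a cluster. It only evaluates the cost of a candidate partition and is not the paper's streaming algorithm; the function name `disagreements` and the edge encoding are assumptions for this example.

```python
def disagreements(edges, cluster_of):
    """Count correlation-clustering disagreements.

    edges: iterable of (u, v, sign) with sign '+' or '-'.
    cluster_of: dict mapping each vertex to its cluster id.
    """
    cost = 0
    for u, v, sign in edges:
        same_cluster = cluster_of[u] == cluster_of[v]
        if sign == "+" and not same_cluster:
            cost += 1  # similar pair that was split apart
        elif sign == "-" and same_cluster:
            cost += 1  # dissimilar pair that was put together
    return cost


# Edges can be consumed one at a time, in any order, as in a stream.
stream = [("a", "b", "+"), ("b", "c", "+"), ("a", "c", "-"), ("c", "d", "+")]

two_clusters = {"a": 0, "b": 0, "c": 1, "d": 1}
one_cluster = {"a": 0, "b": 0, "c": 0, "d": 0}
singletons = {"a": 0, "b": 1, "c": 2, "d": 3}

print(disagreements(stream, two_clusters))  # 1: only the positive edge b-c is cut
print(disagreements(stream, one_cluster))   # 1: only the negative edge a-c is inside
print(disagreements(stream, singletons))    # 3: all three positive edges are cut
```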

[LG-61] Stock Prediction via a Dual Relation Fusion Network incorporating Static and Dynamic Relations

链接: https://arxiv.org/abs/2510.10695
作者: Long Chen,Huixin Bai,Mingxin Wang,Xiaohua Huang,Ying Liu,Jie Zhao,Ziyu Guan
类目: Machine Learning (cs.LG)
*备注: 11 pages

点击查看摘要

Abstract:Accurate modeling of inter-stock relationships is critical for stock price forecasting. However, existing methods predominantly focus on single-state relationships, neglecting the essential complementarity between dynamic and static inter-stock relations. To solve this problem, we propose a Dual Relation Fusion Network (DRFN) to capture the long-term relative stability of stock relation structures while retaining the flexibility to respond to sudden market shifts. Our approach features a novel relative static relation component that models time-varying long-term patterns and incorporates overnight informational influences. We capture dynamic inter-stock relationships through distance-aware mechanisms, while evolving long-term structures via recurrent fusion of dynamic relations from the prior day with the pre-defined static relations. Experiments demonstrate that our method significantly outperforms the baselines across different markets, with high sensitivity to the co-movement of relational strength and stock price.

[LG-62] Digital Twin-enabled Multi-generation Control Co-Design with Deep Reinforcement Learning

链接: https://arxiv.org/abs/2510.10694
作者: Ying-Kuan Tsai,Vispi Karkaria,Yi-Ping Chen,Wei Chen
类目: Machine Learning (cs.LG)
*备注: to be published in Journal of Mechanical Design

点击查看摘要

[LG-63] ProteinAE: Protein Diffusion Autoencoders for Structure Encoding

链接: https://arxiv.org/abs/2510.10634
作者: Shaoning Li,Le Zhuo,Yusong Wang,Mingyu Li,Xinheng He,Fandi Wu,Hongsheng Li,Pheng-Ann Heng
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-64] SDG-L: A Semiparametric Deep Gaussian Process based Framework for Battery Capacity Prediction

链接: https://arxiv.org/abs/2510.10621
作者: Hanbing Liu,Yanru Wu,Yang Li,Ercan E. Kuruoglu,Xuan Zhang
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-65] DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism

链接: https://arxiv.org/abs/2510.10620
作者: Chenyu Jiang,Zhenkun Cai,Ye Tian,Zhen Jia,Yida Wang,Chuan Wu
类目: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
*备注: 16 pages, 22 figures

点击查看摘要

Abstract:Context parallelism has emerged as a key technique to support long-context training, a growing trend in generative AI for modern large models. However, existing context parallel methods rely on static parallelization configurations that overlook the dynamic nature of training data, specifically, the variability in sequence lengths and token relationships (i.e., attention patterns) across samples. As a result, these methods often suffer from unnecessary communication overhead and imbalanced computation. In this paper, we present DCP, a dynamic context parallel training framework that introduces fine-grained blockwise partitioning of both data and computation. By enabling flexible mapping of data and computation blocks to devices, DCP can adapt to varying sequence characteristics, effectively reducing communication and improving memory and computation balance. Micro-benchmarks demonstrate that DCP accelerates attention by 1.19x~2.45x under causal masks and 2.15x~3.77x under sparse attention patterns. Additionally, we observe up to 0.94x~1.16x end-to-end training speed-up for causal masks, and 1.00x~1.46x for sparse masks.

[LG-66] Encoder Decoder Generative Adversarial Network Model for Stock Market Prediction

链接: https://arxiv.org/abs/2510.10617
作者: Bahadur Yadav,Sanjay Kumar Mohanty
类目: Machine Learning (cs.LG); Optimization and Control (math.OC)
*备注:

点击查看摘要

Abstract:Forecasting stock prices remains challenging due to the volatile and non-linear nature of financial markets. Despite the promise of deep learning, issues such as mode collapse, unstable training, and difficulty in capturing temporal and feature level correlations have limited the applications of GANs in this domain. We propose a GRU-based Encoder-Decoder GAN (EDGAN) model that strikes a balance between expressive power and simplicity. The model introduces key innovations such as a temporal decoder with residual connections for precise reconstruction, conditioning on static and dynamic covariates for contextual learning, and a windowing mechanism to capture temporal dynamics. Here, the generator uses a dense encoder-decoder framework with residual GRU blocks. Extensive experiments on diverse stock datasets demonstrate that EDGAN achieves superior forecasting accuracy and training stability, even in volatile markets. It consistently outperforms traditional GAN variants in forecasting accuracy and convergence stability under market conditions.

[LG-67] Budget Allocation for Unknown Value Functions in a Lipschitz Space

链接: https://arxiv.org/abs/2510.10605
作者: MohammadHossein Bateni,Hossein Esfandiari,Samira HosseinGhorban,Alireza Mirrokni,Radin Shahdaei
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-68] FusionGen: Feature Fusion-Based Few-Shot EEG Data Generation

链接: https://arxiv.org/abs/2510.10604
作者: Yuheng Chen,Dingkun Liu,Xinyao Yang,Xinping Xu,Baicheng Chen,Dongrui Wu
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-69] Understanding Self-supervised Contrastive Learning through Supervised Objectives

链接: https://arxiv.org/abs/2510.10572
作者: Byeongchan Lee
类目: Machine Learning (cs.LG)
*备注: Accepted at TMLR 2025

点击查看摘要

[LG-70] Multitask Learning with Learned Task Relationships

链接: https://arxiv.org/abs/2510.10570
作者: Zirui Wan,Stefan Vlaski
类目: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA)
*备注:

点击查看摘要

[LG-71] Multi-scale Frequency-Aware Adversarial Network for Parkinson's Disease Assessment Using Wearable Sensors

链接: https://arxiv.org/abs/2510.10558
作者: Weiming Zhao,Xulong Wang,Jun Qi,Yun Yang,Po Yang
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-72] Reinforced Domain Selection for Continuous Domain Adaptation

链接: https://arxiv.org/abs/2510.10530
作者: Hanbing Liu,Huaze Tang,Yanru Wu,Yang Li,Xiao-Ping Zhang
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-73] A Hybrid Machine Learning Approach for Synthetic Data Generation with Post Hoc Calibration for Clinical Tabular Datasets

链接: https://arxiv.org/abs/2510.10513
作者: Md Ibrahim Shikder Mahin,Md Shamsul Arefin,Md Tanvir Hasan
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-74] The Hidden DNA of LLM-Generated JavaScript: Structural Patterns Enable High-Accuracy Authorship Attribution

链接: https://arxiv.org/abs/2510.10493
作者: Norbert Tihanyi,Bilel Cherif,Richard A. Dubniczky,Mohamed Amine Ferrag,Tamás Bisztray
类目: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-75] Gradient Enhanced Self-Training Physics-Informed Neural Network (gST-PINN) for Solving Nonlinear Partial Differential Equations

链接: https://arxiv.org/abs/2510.10483
作者: Narayan S Iyer,Bivas Bhaumik,Ram S Iyer,Satyasaran Changdar
类目: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
*备注:

点击查看摘要

[LG-76] Anchor-based Maximum Discrepancy for Relative Similarity Testing

链接: https://arxiv.org/abs/2510.10477
作者: Zhijian Zhou,Liuhua Peng,Xunye Tian,Feng Liu
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-77] Does Weighting Improve Matrix Factorization for Recommender Systems? WWW

链接: https://arxiv.org/abs/2510.10440
作者: Alex Ayoub,Samuel Robertson,Dawen Liang,Harald Steck,Nathan Kallus
类目: Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注: In the proceedings of the Web Conference (WWW) 2025 (11 pages)

点击查看摘要

[LG-78] Softmax $\geq$ Linear: Transformers may learn to classify in-context by kernel gradient descent

链接: https://arxiv.org/abs/2510.10425
作者: Sara Dragutinović,Andrew M. Saxe,Aaditya K. Singh
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-79] FLAMMABLE: A Multi-Model Federated Learning Framework with Multi-Model Engagement and Adaptive Batch Sizes

链接: https://arxiv.org/abs/2510.10380
作者: Shouxu Lin,Zimeng Pan,Yuhang Yao,Haeyoung Noh,Pei Zhang,Carlee Joe-Wong
类目: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-80] Applying non-negative matrix factorization with covariates to label matrix for classification

链接: https://arxiv.org/abs/2510.10375
作者: Kenichi Satoh
类目: Machine Learning (cs.LG); Methodology (stat.ME)
*备注: 2 figures, R package: nmfkc published in GitHub, this https URL

点击查看摘要

[LG-81] Exploration-free Algorithms for Multi-group Mean Estimation

链接: https://arxiv.org/abs/2510.10374
作者: Ziyi Wei,Huaiyang Zhong,Xiaocheng Li
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:

点击查看摘要

[LG-82] Transformer Model Detects Antidepressant Use From a Single Night of Sleep, Unlocking an Adherence Biomarker

链接: https://arxiv.org/abs/2510.10364
作者: Ali Mirzazadeh,Simon Cadavid,Kaiwen Zha,Chao Li,Sultan Alzahrani,Manar Alawajy,Joshua Korzenik,Kreshnik Hoti,Charles Reynolds,David Mischoulon,John Winkelman,Maurizio Fava,Dina Katabi
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-83] Learning to Throw-Flip IROS2025

链接: https://arxiv.org/abs/2510.10357
作者: Yang Liu,Bruno Da Costa,Aude Billard
类目: Robotics (cs.RO); Machine Learning (cs.LG)
*备注: Accepted to IROS 2025. Video Summary: this https URL

点击查看摘要

[LG-84] Learning Operators through Coefficient Mappings in Fixed Basis Spaces

链接: https://arxiv.org/abs/2510.10350
作者: Chuqi Chen,Yang Xiang,Weihong Zhang
类目: Numerical Analysis (math.NA); Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-85] Multi-View Graph Learning with Graph-Tuple

链接: https://arxiv.org/abs/2510.10341
作者: Shiyu Chen,Ningyuan (Teresa) Huang,Soledad Villar
类目: Machine Learning (cs.LG)
*备注: Submitted to TAG workshop

点击查看摘要

Abstract:Graph Neural Networks (GNNs) typically scale with the number of graph edges, making them well suited for sparse graphs but less efficient on dense graphs, such as point clouds or molecular interactions. A common remedy is to sparsify the graph via similarity thresholding or distance pruning, but this forces an arbitrary choice of a single interaction scale and discards crucial information from other scales. To overcome this limitation, we introduce a multi-view graph-tuple framework. Instead of a single graph, our graph-tuple framework partitions the graph into disjoint subgraphs, capturing primary local interactions and weaker, long-range connections. We then learn multi-view representations from the graph-tuple via a heterogeneous message-passing architecture inspired by the theory of non-commuting operators, which we formally prove is strictly more expressive and guarantees a lower oracle risk compared to single-graph message-passing models. We instantiate our framework on two scientific domains: molecular property prediction from feature-scarce Coulomb matrices and cosmological parameter inference from geometric point clouds. On both applications, our multi-view graph-tuple models demonstrate better performance than single-graph baselines, highlighting the power and versatility of our multi-view approach.
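The abstract above describes partitioning a dense graph into disjoint subgraphs that separate primary local interactions from weaker, long-range ones. A minimal sketch of one way such a split could look for a point cloud is given below; the single distance cutoff, the function name `split_into_graph_tuple`, and the two-view output are assumptions for illustration and may differ from how the paper constructs its graph-tuples.

```python
import math


def split_into_graph_tuple(points, cutoff):
    """Partition all point pairs into two disjoint edge sets by distance.

    Returns (local_edges, long_range_edges). Every pair lands in exactly one
    set, so no interaction scale is discarded, unlike plain sparsification.
    """
    local_edges, long_range_edges = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            distance = math.dist(points[i], points[j])
            if distance <= cutoff:
                local_edges.append((i, j))
            else:
                long_range_edges.append((i, j))
    return local_edges, long_range_edges


cloud = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
local, long_range = split_into_graph_tuple(cloud, cutoff=2.0)
print(local)       # [(0, 1)]
print(long_range)  # [(0, 2), (1, 2)]
```

A multi-view model would then run separate message-passing operators over each edge set rather than over a single thresholded graph.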

[LG-86] Grounded AI for Code Review: Resource-Efficient Large-Model Serving in Enterprise Pipelines

链接: https://arxiv.org/abs/2510.10290
作者: Sayan Mandal,Hua Jiang
类目: Software Engineering (cs.SE); Machine Learning (cs.LG)
*备注: Submitted to MLSys 2026

点击查看摘要

[LG-87] Lost in the Middle: An Emergent Property from Information Retrieval Demands in LLMs

链接: https://arxiv.org/abs/2510.10276
作者: Nikolaus Salvatore,Hao Wang,Qiong Zhang
类目: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
*备注:

点击查看摘要

[LG-88] Enhancing the Cross-Size Generalization for Solving Vehicle Routing Problems via Continual Learning

链接: https://arxiv.org/abs/2510.10262
作者: Jingwen Li,Zhiguang Cao,Yaoxin Wu,Tang Liu
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-89] ProGress: Structured Music Generation via Graph Diffusion and Hierarchical Music Analysis

链接: https://arxiv.org/abs/2510.10249
作者: Stephen Ni-Hahn,Chao Péter Yang,Mingchen Ma,Cynthia Rudin,Simon Mak,Yue Jiang
类目: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
*备注:

点击查看摘要

[LG-90] Progressive Scale Convolutional Network for Spatio-Temporal Downscaling of Soil Moisture: A Case Study Over the Tibetan Plateau

链接: https://arxiv.org/abs/2510.10244
作者: Ziyu Zhou,Keyan Hu,Ling Zhang,Zhaohui Xue,Yutian Fang,Yusha Zheng
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-91] Hierarchical Bayesian Flow Networks for Molecular Graph Generation

链接: https://arxiv.org/abs/2510.10211
作者: Yida Xiong,Jiameng Chen,Kun Li,Hongzhi Zhang,Xiantao Cai,Wenbin Hu
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-92] LOOPerSet: A Large-Scale Dataset for Data-Driven Polyhedral Compiler Optimization

链接: https://arxiv.org/abs/2510.10209
作者: Massinissa Merouani,Afif Boudaoud,Riyadh Baghdadi
类目: Programming Languages (cs.PL); Machine Learning (cs.LG); Performance (cs.PF)
*备注:

点击查看摘要

Abstract:The advancement of machine learning for compiler optimization, particularly within the polyhedral model, is constrained by the scarcity of large-scale, public performance datasets. This data bottleneck forces researchers to undertake costly data generation campaigns, slowing down innovation and hindering reproducible research in learned code optimization. To address this gap, we introduce LOOPerSet, a new public dataset containing 28 million labeled data points derived from 220,000 unique, synthetically generated polyhedral programs. Each data point maps a program and a complex sequence of semantics-preserving transformations (such as fusion, skewing, tiling, and parallelism) to a ground truth performance measurement (execution time). The scale and diversity of LOOPerSet make it a valuable resource for training and evaluating learned cost models, benchmarking new model architectures, and exploring the frontiers of automated polyhedral scheduling. The dataset is released under a permissive license to foster reproducible research and lower the barrier to entry for data-driven compiler optimization.

[LG-93] BrainForm: a Serious Game for BCI Training and Data Collection

链接: https://arxiv.org/abs/2510.10169
作者: Michele Romani,Devis Zanoni,Elisabetta Farella,Luca Turchet
类目: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
*备注: 15 pages, 6 figures. Author-accepted version. Accepted for presentation at the Brain Informatics 2025 conference, to appear in Springer Lecture Notes in Artificial Intelligence (LNAI) Brain Informatics Books Series. The final authenticated version will be available via SpringerLink

点击查看摘要

Abstract:BrainForm is a gamified Brain-Computer Interface (BCI) training system designed for scalable data collection using consumer hardware and a minimal setup. We investigated (1) how users develop BCI control skills across repeated sessions and (2) perceptual and performance effects of two visual stimulation textures. Game Experience Questionnaire (GEQ) scores for Flow, Positive Affect, Competence and Challenge were strongly positive, indicating sustained engagement. A within-subject study with multiple runs, two task complexities, and post-session questionnaires revealed no significant performance differences between textures but increased ocular irritation over time. Online metrics (Task Accuracy, Task Time, and Information Transfer Rate) improved across sessions, confirming learning effects for symbol spelling, even under pressure conditions. Our results highlight the potential of BrainForm as a scalable, user-friendly BCI research tool and offer guidance for sustained engagement and reduced training fatigue.
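Information Transfer Rate, one of the online metrics listed in the abstract above, is commonly computed in BCI work with the Wolpaw formula. The abstract does not say which definition the authors used, so the sketch below is an assumption: it implements the Wolpaw bits-per-selection expression and converts it to bits per minute, with hypothetical function names.

```python
import math


def wolpaw_bits_per_selection(n_targets, accuracy):
    """Wolpaw ITR in bits per selection for N equiprobable targets.

    B = log2(N) + P*log2(P) + (1 - P)*log2((1 - P) / (N - 1))
    """
    if n_targets < 2:
        raise ValueError("need at least two targets")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_targets)
    # The P*log2(P) and (1-P)*log2(...) terms vanish in the limit at P=1 and P=0.
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits


def itr_bits_per_minute(n_targets, accuracy, seconds_per_selection):
    """Scale bits per selection by the number of selections made per minute."""
    return wolpaw_bits_per_selection(n_targets, accuracy) * 60.0 / seconds_per_selection


print(wolpaw_bits_per_selection(4, 1.0))   # 2.0 bits: perfect accuracy on 4 targets
print(wolpaw_bits_per_selection(4, 0.25))  # 0.0 bits: chance-level accuracy
print(itr_bits_per_minute(4, 1.0, 2.0))    # 60.0 bits/min at one selection every 2 s
```

Under this definition the rate rises both when accuracy improves and when each selection takes less time, which is consistent with tracking Task Accuracy and Task Time alongside it.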

[LG-94] Robust Learning of Diffusion Models with Extremely Noisy Conditions

链接: https://arxiv.org/abs/2510.10149
作者: Xin Chen,Gillian Dobbie,Xinyu Wang,Feng Liu,Di Wang,Jingfeng Zhang
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-95] Adversarial Attacks on Downstream Weather Forecasting Models: Application to Tropical Cyclone Trajectory Prediction

链接: https://arxiv.org/abs/2510.10140
作者: Yue Deng,Francisco Santos,Pang-Ning Tan,Lifeng Luo
类目: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
*备注:

点击查看摘要

[LG-96] The Hybrid Multimodal Graph Index (HMGI): A Comprehensive Framework for Integrated Relational and Vector Search

链接: https://arxiv.org/abs/2510.10123
作者: Joydeep Chandra,Satyam Kumar Navneet,Yong Zhang
类目: Databases (cs.DB); Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-97] Preference-driven Knowledge Distillation for Few-shot Node Classification NEURIPS2025

链接: https://arxiv.org/abs/2510.10116
作者: Xing Wei,Chunchun Chen,Rui Fan,Xiaofeng Cao,Sourav Medya,Wei Ye
类目: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
*备注: Accepted at NeurIPS 2025

点击查看摘要

[LG-98] Lighter-X: An Efficient and Plug-and-play Strategy for Graph-based Recommendation through Decoupled Propagation

Link: https://arxiv.org/abs/2510.10105
Authors: Yanping Zheng, Zhewei Wei, Frank de Hoog, Xu Chen, Hongteng Xu, Yuhang Ye, Jiadeng Huang
Categories: Machine Learning (cs.LG)

[LG-99] PANTHER: Generative Pretraining Beyond Language for Sequential User Behavior Modeling

Link: https://arxiv.org/abs/2510.10102
Authors: Guilin Li, Yun Zhang, Xiuyuan Chen, Chengqi Li, Bo Wang, Linghe Kong, Wenjia Wang, Weiran Huang, Matthias Hwai Yong Tan
Categories: Machine Learning (cs.LG)

[LG-100] Rademacher Meets Colors: More Expressivity but at What Cost?

Link: https://arxiv.org/abs/2510.10101
Authors: Martin Carrasco, Caio Deberaldini Netto, Vahan A. Martirosyan, Aneeqa Mehrab, Ehimare Okoyomon, Caterina Graziani
Categories: Machine Learning (cs.LG)

[LG-101] Improving Speech Emotion Recognition with Mutual Information Regularized Generative Model

Link: https://arxiv.org/abs/2510.10078
Authors: Chung-Soo Ahn, Rajib Rana, Sunil Sivadas, Carlos Busso, Jagath C. Rajapakse
Categories: Sound (cs.SD); Machine Learning (cs.LG)

[LG-102] ADEPT: Continual Pretraining via Adaptive Expansion and Dynamic Decoupled Tuning

Link: https://arxiv.org/abs/2510.10071
Authors: Jinyang Zhang, Yue Fang, Hongxin Ding, Weibin Liao, Muyang Ye, Xu Chu, Junfeng Zhao, Yasha Wang
Categories: Machine Learning (cs.LG)

[LG-103] One4Many-StablePacker: An Efficient Deep Reinforcement Learning Framework for the 3D Bin Packing Problem

Link: https://arxiv.org/abs/2510.10057
Authors: Lei Gao, Shihong Huang, Shengjie Wang, Hong Ma, Feng Zhang, Hengda Bao, Qichang Chen, Weihua Zhou
Categories: Machine Learning (cs.LG)

[LG-104] Experience-Efficient Model-Free Deep Reinforcement Learning Using Pre-Training

Link: https://arxiv.org/abs/2510.10029
Authors: Ruoxing Yang
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)

[LG-105] Bidirectional Time-Frequency Pyramid Network for Enhanced Robust EEG Classification

Link: https://arxiv.org/abs/2510.10004
Authors: Jiahui Hong, Siqing Li, Muqing Jian, Luming Yang
Categories: Machine Learning (cs.LG)
Comments: Accepted to IEEE BIBM 2025

[LG-106] Tight Robustness Certificates and Wasserstein Distributional Attacks for Deep Neural Networks

Link: https://arxiv.org/abs/2510.10000
Authors: Bach C. Le, Tung V. Dao, Binh T. Nguyen, Hong T.M. Chu
Categories: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

[LG-107] Learning Joint Embeddings of Function and Process Call Graphs for Malware Detection

Link: https://arxiv.org/abs/2510.09984
Authors: Kartikeya Aneja, Nagender Aneja, Murat Kantarcioglu
Categories: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

[LG-108] An Unsupervised Time Series Anomaly Detection Approach for Efficient Online Process Monitoring of Additive Manufacturing

Link: https://arxiv.org/abs/2510.09977
Authors: Frida Cantu, Salomon Ibarra, Arturo Gonzales, Jesus Barreda, Chenang Liu, Li Zhang
Categories: Machine Learning (cs.LG)
Comments: 2025 IEEE 21st International Conference on Automation Science and Engineering

[LG-109] Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models

Link: https://arxiv.org/abs/2510.09976
Authors: Mingyang Lyu, Yinqian Sun, Erliang Lin, Huangrui Li, Ruolin Chen, Feifei Zhao, Yi Zeng
Categories: Machine Learning (cs.LG); Robotics (cs.RO)

[LG-110] Clustering Result Re-guided Incomplete Multi-view Spectral Clustering

Link: https://arxiv.org/abs/2510.09959
Authors: Jun Yin, Runcheng Cai, Shiliang Sun
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)

[LG-111] Structured Cooperative Multi-Agent Reinforcement Learning: a Bayesian Network Perspective

Link: https://arxiv.org/abs/2510.09937
Authors: Shahbaz P Qadri Syed, He Bai
Categories: Multiagent Systems (cs.MA); Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)

[LG-112] AutoGD: Automatic Learning Rate Selection for Gradient Descent

Link: https://arxiv.org/abs/2510.09923
Authors: Nikola Surjanovic, Alexandre Bouchard-Côté, Trevor Campbell
Categories: Machine Learning (cs.LG); Optimization and Control (math.OC); Computation (stat.CO); Machine Learning (stat.ML)

[LG-113] Advancing Intoxication Detection: A Smartwatch-Based Approach

Link: https://arxiv.org/abs/2510.09916
Authors: Manuel Segura, Pere Vergés, Richard Ky, Ramesh Arangott, Angela Kristine Garcia, Thang Dihn Trong, Makoto Hyodo, Alexandru Nicolau, Tony Givargis, Sergio Gago-Masague
Categories: Machine Learning (cs.LG)

[LG-114] Understanding Robust Machine Learning for Nonparametric Regression with Heavy-Tailed Noise

Link: https://arxiv.org/abs/2510.09888
Authors: Yunlong Feng, Qiang Wu
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)

[LG-115] TAWRMAC: A Novel Dynamic Graph Representation Learning Method

Link: https://arxiv.org/abs/2510.09884
Authors: Soheila Farokhi, Xiaojun Qi, Hamid Karimi
Categories: Machine Learning (cs.LG)

[LG-116] An Exploration of Non-Euclidean Gradient Descent: Muon and its Many Variants

Link: https://arxiv.org/abs/2510.09827
Authors: Michael Crawshaw, Chirag Modi, Mingrui Liu, Robert M. Gower
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)

[LG-117] Distributed clustering in partially overlapping feature spaces

Link: https://arxiv.org/abs/2510.09799
Authors: Alessio Maritan, Luca Schenato
Categories: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

[LG-118] A Unified Framework for Lifted Training and Inversion Approaches

Link: https://arxiv.org/abs/2510.09796
Authors: Xiaoyu Wang, Alexandra Valavanis, Azhir Mahmood, Andreas Mang, Martin Benning, Audrey Repetti
Categories: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC); Machine Learning (stat.ML)

[LG-119] Principled Operator Learning in Ocean Dynamics: The Role of Temporal Structure NEURIPS

Link: https://arxiv.org/abs/2510.09792
Authors: Vahidreza Jahanmard, Ali Ramezani-Kebrya, Robinson Hordoir
Categories: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
Comments: Accepted at NeurIPS ML4PS 2025

[LG-120] Combined Representation and Generation with Diffusive State Predictive Information Bottleneck

Link: https://arxiv.org/abs/2510.09784
Authors: Richard John, Yunrui Qiu, Lukas Herron, Pratyush Tiwary
Categories: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Quantitative Methods (q-bio.QM)

[LG-121] A Generic Machine Learning Framework for Radio Frequency Fingerprinting

Link: https://arxiv.org/abs/2510.09775
Authors: Alex Hiles, Bashar I. Ahmad
Categories: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

[LG-122] HeSRN: Representation Learning On Heterogeneous Graphs via Slot-Aware Retentive Network

Link: https://arxiv.org/abs/2510.09767
Authors: Yifan Lu, Ziyun Zou, Belal Alsinglawi, Islam Al-Qudah, Izzat Alsmadi, Feilong Tang, Pengfei Jiao, Shoaib Jameel
Categories: Machine Learning (cs.LG)

[LG-123] Leveraging Shared Prototypes for a Multimodal Pulse Motion Foundation Model

Link: https://arxiv.org/abs/2510.09764
Authors: Wanting Mao, Maxwell A Xu, Harish Haresamudram, Mithun Saha, Santosh Kumar, James Matthew Rehg
Categories: Machine Learning (cs.LG)

[LG-124] Federated k-Means via Generalized Total Variation Minimization

Link: https://arxiv.org/abs/2510.09718
Authors: A. Jung
Categories: Machine Learning (cs.LG)

[LG-125] A Multi-Component Reward Function with Policy Gradient for Automated Feature Selection with Dynamic Regularization and Bias Mitigation

Link: https://arxiv.org/abs/2510.09705
Authors: Sudip Khadka, L.S. Paudel
Categories: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)

[LG-126] Operator Learning for Power Systems Simulation

Link: https://arxiv.org/abs/2510.09704
Authors: Matthew Schlegel, Matthew E. Taylor, Mostafa Farrokhabadi
Categories: Machine Learning (cs.LG)

[LG-127] Neural PDE Solvers with Physics Constraints: A Comparative Study of PINNs, DRM, and WANs

Link: https://arxiv.org/abs/2510.09693
Authors: Jiakang Chen
Categories: Machine Learning (cs.LG); Quantum Physics (quant-ph)
Comments: 50 pages, 13 figures

[LG-128] Using LLMs to Directly Guess Conditional Expectations Can Improve Efficiency in Causal Estimation

Link: https://arxiv.org/abs/2510.09684
Authors: Chris Engh, P. M. Aronow
Categories: Machine Learning (cs.LG); Methodology (stat.ME)

[LG-129] A physics-aware deep learning model for shear band formation around collapsing pores in shocked reactive materials

Link: https://arxiv.org/abs/2510.09670
Authors: Xinlun Cheng, Bingzhe Chen, Joseph Choi, Yen T. Nguyen, Pradeep Seshadri, Mayank Verma, H. S. Udaykumar, Stephen Baek
Categories: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci); Computational Physics (physics.comp-ph)

[LG-130] Population synthesis with geographic coordinates

Link: https://arxiv.org/abs/2510.09669
Authors: Jacopo Lenti, Lorenzo Costantini, Ariadna Fosch, Anna Monticelli, David Scala, Marco Pangallo
Categories: Machine Learning (cs.LG); Computers and Society (cs.CY); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)

[LG-131] Spatial Uncertainty Quantification in Wildfire Forecasting for Climate-Resilient Emergency Planning

Link: https://arxiv.org/abs/2510.09666
Authors: Aditya Chakravarty
Categories: Machine Learning (cs.LG)

[LG-132] LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference

Link: https://arxiv.org/abs/2510.09665
Authors: Yihua Cheng, Yuhan Liu, Jiayi Yao, Yuwei An, Xiaokun Chen, Shaoting Feng, Yuyang Huang, Samuel Shen, Kuntai Du, Junchen Jiang
Categories: Machine Learning (cs.LG)

[LG-133] Assessment of different loss functions for fitting equivalent circuit models to electrochemical impedance spectroscopy data

Link: https://arxiv.org/abs/2510.09662
Authors: Ali Jaberi (3), Amin Sadeghi (2), Runze Zhang (1), Zhaoyang Zhao (1), Qiuyu Shi (1), Robert Black (3), Zoya Sadighi (3), Jason Hattrick-Simpers (1) ((1) Department of Material Science and Engineering, University of Toronto, Toronto, Ontario, Canada, (2) Canmet MATERIALS, Natural Resources Canada, Hamilton, ON, Canada, (3) Clean Energy Innovation Research Center, National Research Council Canada, Mississauga, Ontario, Canada)
Categories: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)

[LG-134] Heterogeneous Point Set Transformers for Segmentation of Multiple View Particle Detectors NEURIPS2025

Link: https://arxiv.org/abs/2510.09659
Authors: Edgar E. Robles, Dikshant Sagar, Alejandro Yankelevich, Jianming Bian, Pierre Baldi, NOvA Collaboration
Categories: Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex)
Comments: Submitted to Machine Learning and the Physical Sciences Workshop (ML4PS) at NeurIPS 2025

[LG-135] AdaptAuth: Multi-Layered Behavioral and Credential Analysis for a Secure and Adaptive Authentication Framework for Password Security

Link: https://arxiv.org/abs/2510.09645
Authors: Tonmoy Ghosh
Categories: Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

[LG-136] Risk-Calibrated Bayesian Streaming Intrusion Detection with SRE-Aligned Decisions

Link: https://arxiv.org/abs/2510.09619
Authors: Michel Youssef (Independent Researcher)
Categories: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Comments: 11 pages, 7 figures. Primary category: cs.CR; cross-list: cs.LG, stat.ML. Implementation code and datasets are available from the corresponding author upon reasonable request. Code and reproducibility materials will be made available upon publication

[LG-137] Efficient Group Lasso Regularized Rank Regression with Data-Driven Parameter Determination

Link: https://arxiv.org/abs/2510.11546
Authors: Meixia Lin, Meijiao Shi, Yunhai Xiao, Qian Zhang
Categories: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
Comments: 36 pages, 4 figures, 8 tables

[LG-138] SeFEF: A Seizure Forecasting Evaluation Framework

Link: https://arxiv.org/abs/2510.11275
Authors: Ana Sofia Carmo, Lourenço Abrunhosa Rodrigues, Ana Rita Peralta, Ana Fred, Carla Bentes, Hugo Plácido da Silva
Categories: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG)
Comments: main document: 14 pages, 9 figures, 2 tables; appendix: 7 pages, 2 figures, 3 tables, 2 algorithms

[LG-139] Analyzing Data Quality and Decay in Mega-Constellations: A Physics-Informed Machine Learning Approach

Link: https://arxiv.org/abs/2510.11242
Authors: Katarina Dyreby, Francisco Caldas, Cláudia Soares
Categories: Earth and Planetary Astrophysics (astro-ph.EP); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
Comments: 76th International Astronautical Congress

[LG-140] Machine Learning-Integrated Hybrid Fluid-Kinetic Framework for Quantum Electrodynamic Laser Plasma Simulations

Link: https://arxiv.org/abs/2510.11174
Authors: Sadra Saremi, Amirhossein Ahmadkhan Kordbacheh
Categories: Plasma Physics (physics.plasm-ph); Machine Learning (cs.LG)

[LG-141] PAC-Bayesian Bounds on Constrained f-Entropic Risk Measures

Link: https://arxiv.org/abs/2510.11169
Authors: Hind Atbir, Farah Cherfaoui, Guillaume Metzler, Emilie Morvant, Paul Viallard
Categories: Machine Learning (stat.ML); Machine Learning (cs.LG)

[LG-142] Enhanced Sampling for Efficient Learning of Coarse-Grained Machine Learning Potentials

Link: https://arxiv.org/abs/2510.11148
Authors: Weilong Chen, Franz Görlich, Paul Fuchs, Julija Zavadlav
Categories: Chemical Physics (physics.chem-ph); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)

[LG-143] torchsom: The Reference PyTorch Library for Self-Organizing Maps

Link: https://arxiv.org/abs/2510.11147
Authors: Louis Berthier, Ahmed Shokry, Maxime Moreaud, Guillaume Ramelet, Eric Moulines
Categories: Machine Learning (stat.ML); Machine Learning (cs.LG)
Comments: 4 main pages with 2 tables, 4 pages of references, 15 pages of appendices with 13 figures and 3 tables

[LG-144] Adversarial Robustness in One-Stage Learning-to-Defer

Link: https://arxiv.org/abs/2510.10988
Authors: Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi
Categories: Machine Learning (stat.ML); Machine Learning (cs.LG)

[LG-145] In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning

Link: https://arxiv.org/abs/2510.10981
Authors: Tomoya Wakayama, Taiji Suzuki
Categories: Machine Learning (stat.ML); Machine Learning (cs.LG)

[LG-146] Transfer Learning with Distance Covariance for Random Forest: Error Bounds and an EHR Application

Link: https://arxiv.org/abs/2510.10870
Authors: Chenze Li, Subhadeep Paul
Categories: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)

[LG-147] Quantifying Dataset Similarity to Guide Transfer Learning

Link: https://arxiv.org/abs/2510.10866
Authors: Shudong Sun, Hao Helen Zhang
Categories: Machine Learning (stat.ML); Machine Learning (cs.LG)

[LG-148] How Patterns Dictate Learnability in Sequential Data NEURIPS2025

Link: https://arxiv.org/abs/2510.10744
Authors: Mario Morawski, Anais Despres, Rémi Rehm
Categories: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
Comments: NeurIPS 2025, 36 pages, 4 figures

[LG-149] Deep Signature and Neural RDE Methods for Path-Dependent Portfolio Optimization

Link: https://arxiv.org/abs/2510.10728
Authors: Ali Atiah Alzahrani
Categories: Mathematical Finance (q-fin.MF); Machine Learning (cs.LG)
Comments: Accepted for presentation at the ACM International Conference on AI in Finance (ICAIF 2025), QuantAI Workshop, Singapore. 9 pages. Code available at: this https URL

[LG-150] Mean-square and linear convergence of a stochastic proximal point algorithm in metric spaces of nonpositive curvature

Link: https://arxiv.org/abs/2510.10697
Authors: Nicholas Pischke
Categories: Optimization and Control (math.OC); Machine Learning (cs.LG)
Comments: 24 pages

[LG-151] Second-order Optimization under Heavy-Tailed Noise: Hessian Clipping and Sample Complexity Limits NEURIPS2025

Link: https://arxiv.org/abs/2510.10690
Authors: Abdurakhmon Sadiev, Peter Richtárik, Ilyas Fatkhullin
Categories: Optimization and Control (math.OC); Machine Learning (cs.LG)
Comments: Accepted for publication at NeurIPS 2025

[LG-152] Interactive Atmospheric Composition Emulation for Next-Generation Earth System Models

Link: https://arxiv.org/abs/2510.10654
Authors: Seyed Mohammad Hassan Erfani, Kara Lamb, Susanne Bauer, Kostas Tsigaridis, Marcus van Lier-Walqui, Gavin Schmidt
Categories: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG)

[LG-153] Integrating Large Language Models and Reinforcement Learning for Sentiment-Driven Quantitative Trading

Link: https://arxiv.org/abs/2510.10526
Authors: Wo Long, Wenxin Zeng, Xiaoyu Zhang, Ziyao Zhou
Categories: Computational Finance (q-fin.CP); Machine Learning (cs.LG)

[LG-154] Generative Modeling of Aerosol State Representations

Link: https://arxiv.org/abs/2510.10361
Authors: Ehsan Saleh, Saba Ghaffari, Jeffrey H. Curtis, Lekha Patel, Peter A. Bosler, Nicole Riemer, Matthew West
Categories: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG)
Comments: 31 pages, 20 figures

[LG-155] On some practical challenges of conformal prediction

Link: https://arxiv.org/abs/2510.10324
Authors: Liang Hong, Noura Raydan Nasreddine
Categories: Machine Learning (stat.ML); Machine Learning (cs.LG)

[LG-156] Neural variational inference for cutting feedback during uncertainty propagation

Link: https://arxiv.org/abs/2510.10268
Authors: Jiafang Song, Sandipan Pramanik, Abhirup Datta
Categories: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

[LG-157] Kernel Treatment Effects with Adaptively Collected Data

Link: https://arxiv.org/abs/2510.10245
Authors: Houssam Zenati, Bariscan Bozkurt, Arthur Gretton
Categories: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

[LG-158] Calibrating Generative Models

Link: https://arxiv.org/abs/2510.10020
Authors: Henry D. Smith, Nathaniel L. Diamant, Brian L. Trippe
Categories: Machine Learning (stat.ML); Machine Learning (cs.LG); Biomolecules (q-bio.BM)
Comments: Our codebase accompanying the paper is available at: this https URL

[LG-159] Egocentric Visual Navigation through Hippocampal Sequences

Link: https://arxiv.org/abs/2510.09951
Authors: Xiao-Xiong Lin, Yuk Hoi Yiu, Christian Leibold
Categories: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG)
Comments: 20 pages, 21 figures. This is a conference submission

[LG-160] Learning with Incomplete Context: Linear Contextual Bandits with Pretrained Imputation

Link: https://arxiv.org/abs/2510.09908
Authors: Hao Yan, Heyan Zhang, Yongyi Guo
Categories: Machine Learning (stat.ML); Machine Learning (cs.LG)

[LG-161] Performance of Machine Learning Methods for Gravity Inversion: Successes and Challenges

Link: https://arxiv.org/abs/2510.09632
Authors: Vahid Negahdari, Shirin Samadi Bahrami, Seyed Reza Moghadasi, Mohammad Reza Razvan
Categories: Geophysics (physics.geo-ph); Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)

Information Retrieval

[IR-0] OneRec-Think: In-Text Reasoning for Generative Recommendation

Link: https://arxiv.org/abs/2510.11639
Authors: Zhanyu Liu, Shiyao Wang, Xingmei Wang, Rongzhou Zhang, Jiaxin Deng, Honghui Bao, Jinghao Zhang, Wuchao Li, Pengfei Zheng, Xiangyu Wu, Yifei Hu, Qigen Hu, Xinchen Luo, Lejian Ren, Zixing Zhang, Qianqian Wang, Kuo Cai, Yunfan Wu, Hongtao Cheng, Zexuan Cheng, Lu Ren, Huanjie Wang, Yi Su, Ruiming Tang, Kun Gai, Guorui Zhou
Categories: Information Retrieval (cs.IR)

Abstract: The powerful generative capacity of Large Language Models (LLMs) has instigated a paradigm shift in recommendation. However, existing generative models (e.g., OneRec) operate as implicit predictors, critically lacking the capacity for explicit and controllable reasoning, a key advantage of LLMs. To bridge this gap, we propose OneRec-Think, a unified framework that seamlessly integrates dialogue, reasoning, and personalized recommendation. OneRec-Think incorporates: (1) Itemic Alignment: cross-modal Item-Textual Alignment for semantic grounding; (2) Reasoning Activation: Reasoning Scaffolding to activate LLM reasoning within the recommendation context; and (3) Reasoning Enhancement, where we design a recommendation-specific reward function that accounts for the multi-validity nature of user preferences. Experiments across public benchmarks show state-of-the-art performance. Moreover, our proposed “Think-Ahead” architecture enables effective industrial deployment on Kuaishou, achieving a 0.159% gain in APP Stay Time and validating the practical efficacy of the model’s explicit reasoning capability.

[IR-1] Uncertainty Quantification for Retrieval-Augmented Reasoning

Link: https://arxiv.org/abs/2510.11483
Authors: Heydar Soudani, Hamed Zamani, Faegheh Hasibi
Categories: Information Retrieval (cs.IR)

Abstract: Retrieval-augmented reasoning (RAR) is a recent evolution of retrieval-augmented generation (RAG) that employs multiple reasoning steps for retrieval and generation. While effective for some complex queries, RAR remains vulnerable to errors and misleading outputs. Uncertainty quantification (UQ) offers methods to estimate the confidence of systems’ outputs. These methods, however, often handle simple queries with no retrieval or single-step retrieval, without properly handling the RAR setup. Accurate estimation of UQ for RAR requires accounting for all sources of uncertainty, including those arising from retrieval and generation. In this paper, we account for all these sources and introduce Retrieval-Augmented Reasoning Consistency (R2C), a novel UQ method for RAR. The core idea of R2C is to perturb the multi-step reasoning process by applying various actions to reasoning steps. These perturbations alter the retriever’s input, which shifts its output and consequently modifies the generator’s input at the next step. Through this iterative feedback loop, the retriever and generator continuously reshape one another’s inputs, enabling us to capture uncertainty arising from both components. Experiments on five popular RAR systems across diverse QA datasets show that R2C improves AUROC by over 5% on average compared to the state-of-the-art UQ baselines. Extrinsic evaluations using R2C as an external signal further confirm its effectiveness for two downstream tasks: in Abstention, it achieves ~5% gains in both F1Abstain and AccAbstain; in Model Selection, it improves the exact match by ~7% over single models and ~3% over selection methods.

[IR-2] What Generative Search Engines Like and How to Optimize Web Content Cooperatively

Link: https://arxiv.org/abs/2510.11438
Authors: Yujiang Wu, Shanshan Zhong, Yubin Kim, Chenyan Xiong
Categories: Information Retrieval (cs.IR)

Abstract: By employing large language models (LLMs) to retrieve documents and generate natural language responses, Generative Engines, such as Google AI overview and ChatGPT, provide significantly enhanced user experiences and have rapidly become the new form of search. Their rapid adoption also drives the need for Generative Engine Optimization (GEO), as content providers are eager to gain more traction from them. In this paper, we introduce AutoGEO, a framework to automatically learn generative engine preferences when using retrieved contents for response generation, and rewrite web contents for more such traction. AutoGEO first prompts frontier LLMs to explain generative engine preferences and extract meaningful preference rules from these explanations. Then it uses preference rules as context engineering for AutoGEO_API, a prompt-based GEO system, and as rule-based rewards to train AutoGEO_Mini, a cost-effective GEO model. Experiments on the standard GEO-Bench and two newly constructed benchmarks using real user queries demonstrate the effectiveness of AutoGEO in enhancing content traction while preserving search utility. Analyses confirm the learned rules’ robustness and abilities to capture unique preferences in variant domains, and AutoGEO systems’ ability to embed them in content optimization. The code is released at this https URL.

[IR-3] On Inherited Popularity Bias in Cold-Start Item Recommendation RECSYS2025

Link: https://arxiv.org/abs/2510.11402
Authors: Gregor Meehan, Johan Pauwels
Categories: Information Retrieval (cs.IR)
Comments: Published at ACM RecSys 2025

Abstract: Collaborative filtering (CF) recommender systems struggle with making predictions on unseen, or ‘cold’, items. Systems designed to address this challenge are often trained with supervision from warm CF models in order to leverage collaborative and content information from the available interaction data. However, since they learn to replicate the behavior of CF methods, cold-start models may also learn to imitate their predictive biases. In this paper, we show that cold-start systems can inherit popularity bias, a common cause of recommender system unfairness arising when CF models overfit to more popular items, thereby maximizing user-oriented accuracy but neglecting rarer items. We demonstrate that cold-start recommenders not only mirror the popularity biases of warm models, but are in fact affected more severely: because they cannot infer popularity from interaction data, they instead attempt to estimate it based solely on content features. This leads to significant over-prediction of certain cold items with similar content to popular warm items, even if their ground truth popularity is very low. Through experiments on three multimedia datasets, we analyze the impact of this behavior on three generative cold-start methods. We then describe a simple post-processing bias mitigation method that, by using embedding magnitude as a proxy for predicted popularity, can produce more balanced recommendations with limited harm to user-oriented cold-start accuracy.
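
The abstract names embedding magnitude as a proxy for predicted popularity but does not spell out the post-processing step. One plausible reading, sketched below purely as an assumption, is to shrink each item embedding's norm before dot-product scoring so that large-magnitude items stop dominating the ranking. The `alpha` parameter, the function names, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def l2_norm(vec: Sequence[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def dampen_magnitude(items: Sequence[Sequence[float]], alpha: float = 1.0) -> List[List[float]]:
    """Divide each item embedding by its norm raised to alpha.

    alpha = 0 leaves embeddings unchanged; alpha = 1 rescales them to unit
    length, removing magnitude (the assumed popularity proxy) entirely.
    """
    out = []
    for vec in items:
        norm = l2_norm(vec)
        scale = norm ** alpha if norm > 0 else 1.0  # leave zero vectors alone
        out.append([x / scale for x in vec])
    return out


def rank_items(user: Sequence[float], items: Sequence[Sequence[float]]) -> List[int]:
    """Item indices sorted by descending dot-product score."""
    scores = [sum(u * x for u, x in zip(user, vec)) for vec in items]
    return sorted(range(len(items)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    user = [0.6, 0.8]
    items = [[3.0, 0.0], [0.0, 1.0]]  # item 0 has a much larger norm
    print(rank_items(user, items))                         # ranking on raw embeddings
    print(rank_items(user, dampen_magnitude(items, 1.0)))  # ranking after damping
```

Intermediate values of `alpha` would trade some user-oriented accuracy for balance, which matches the trade-off the abstract describes, though the paper's actual mechanism may differ.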

[IR-4] VeriCite: Towards Reliable Citations in Retrieval-Augmented Generation via Rigorous Verification

Link: https://arxiv.org/abs/2510.11394
Authors: Haosheng Qian, Yixing Fan, Jiafeng Guo, Ruqing Zhang, Qi Chen, Dawei Yin, Xueqi Cheng
Categories: Information Retrieval (cs.IR)

Abstract: Retrieval-Augmented Generation (RAG) has emerged as a crucial approach for enhancing the responses of large language models (LLMs) with external knowledge sources. Despite the impressive performance in complex question-answering tasks, RAG still struggles with hallucinations. Attributing RAG-generated content through in-line citations has demonstrated potential in reducing hallucinations and facilitating human verification. Existing citation generation methods primarily rely on either fine-tuning the generator or employing post-processing approaches for citation matching. However, the former approach demands substantial annotated data and computational resources, while the latter often encounters difficulties in managing multiple citations and frequently produces suboptimal results. In this paper, we introduce a novel framework, called VeriCite, designed to rigorously validate supporting evidence and enhance answer attribution. Specifically, VeriCite breaks down into a three-stage generation: 1) the initial answer generation first generates a response based on all available contexts and has its claims verified through the NLI model; 2) the supporting evidence selection assesses the utility of each document and extracts useful supporting evidence; 3) the final answer refinement integrates the initial response and collected evidence to produce the final, refined answer. We conduct experiments across five open-source LLMs and four datasets, demonstrating that VeriCite can significantly improve citation quality while maintaining the correctness of the answers.

[IR-5] Dynamic Network-Based Two-Stage Time Series Forecasting for Affiliate Marketing

Link: https://arxiv.org/abs/2510.11323
Authors: Zhe Wang, Yaming Yang, Ziyu Guan, Bin Tong, Rui Wang, Wei Zhao, Hongbo Deng
Categories: Information Retrieval (cs.IR)

Abstract: In recent years, affiliate marketing has emerged as a revenue-sharing strategy where merchants collaborate with promoters to promote their products. It not only increases product exposure but also allows promoters to earn a commission. This paper addresses a pivotal yet under-explored challenge in affiliate marketing: accurately assessing and predicting the contributions of promoters in product promotion. We design a novel metric for evaluating the indirect contributions of the promoter, called propagation scale. Unfortunately, existing time series forecasting techniques fail to deliver accurate predictions due to the propagation scale being influenced by multiple factors and the inherent complexities arising from dynamic scenarios. To address this issue, we decouple the network structure from the node signals and propose a two-stage solution: initially, the basic self-sales and network structure prediction are conducted separately, followed by the synthesis of the propagation scale. Specifically, we design a graph convolution encoding scheme based on descendant neighbors and incorporate hypergraph convolution to efficiently capture complex promotional dynamics. Additionally, three auxiliary tasks are employed: self-sales prediction for base estimations, descendant prediction to synthesize propagation scale, and promoter activation prediction to mitigate high volatility issues. Extensive offline experiments on large-scale industrial datasets validate the superiority of our method. We further deploy our model on the Alimama platform with over 100,000 promoters, achieving a 9.29% improvement in GMV and a 5.89% increase in sales volume.

[IR-6] Next Interest Flow: A Generative Pre-training Paradigm for Recommender Systems by Modeling All-domain Movelines

Link: https://arxiv.org/abs/2510.11317
Authors: Chen Gao, Zixin Zhao, Lv Shao, Tong Liu
Categories: Information Retrieval (cs.IR)

Abstract:Click-Through Rate (CTR) prediction, a cornerstone of modern recommender systems, has been dominated by discriminative models that react to past user behavior rather than proactively modeling user intent. Existing generative paradigms attempt to address this but suffer from critical limitations: Large Language Model (LLM) based methods create a semantic mismatch by forcing e-commerce signals into a linguistic space, while ID-based generation is constrained by item memorization and cold-start issues. To overcome these limitations, we propose a novel generative pre-training paradigm. Our model learns to predict the Next Interest Flow, a dense vector sequence representing a user’s future intent, while simultaneously modeling its internal Interest Diversity and Interest Evolution Velocity to ensure the representation is both rich and coherent. However, this two-stage approach introduces a critical objective mismatch between the generative and discriminative stages. We resolve this via a bidirectional alignment strategy, which harmonizes the two stages through cross-stage weight initialization and a dynamic Semantic Alignment Module for fine-tuning. Additionally, we enhance the underlying discriminative model with a Temporal Sequential Pairwise (TSP) mechanism to better capture temporal causality. We present the All-domain Moveline Evolution Network (AMEN), a unified framework implementing our entire pipeline. Extensive offline experiments validate AMEN’s superiority over strong baselines, and a large-scale online A/B test demonstrates its significant real-world impact, delivering substantial improvements in key business metrics.

[IR-7] DyKnow-RAG: Dynamic Knowledge Utilization Reinforcement Framework for Noisy Retrieval-Augmented Generation in E-commerce Search Relevance

Link: https://arxiv.org/abs/2510.11122
Authors: Tingqiao Xu, Shaowei Yao, Chenhe Dong, Yiming Jin, Zerui Huang, Dan Ou, Haihong Tang
Categories: Information Retrieval (cs.IR)

[IR-8] Decoupled Multimodal Fusion for User Interest Modeling in Click-Through Rate Prediction

Link: https://arxiv.org/abs/2510.11066
Authors: Alin Fan, Hanqing Li, Sihan Lu, Jingsong Yuan, Jiandong Zhang
Categories: Information Retrieval (cs.IR)

[IR-9] Does LLM Focus on the Right Words? Diagnosing Language Bias in LLM-based Recommenders

Link: https://arxiv.org/abs/2510.10978
Authors: Bohao Wang, Jiawei Chen, Feng Liu, Changwang Zhang, Jun Wang, Canghong Jin, Chun Chen, Can Wang
Categories: Information Retrieval (cs.IR)

[IR-10] HatLLM: Hierarchical Attention Masking for Enhanced Collaborative Modeling in LLM-based Recommendation

Link: https://arxiv.org/abs/2510.10955
Authors: Yu Cui, Feng Liu, Jiawei Chen, Canghong Jin, Xingyu Lou, Changwang Zhang, Jun Wang, Yuegang Sun, Can Wang
Subjects: Information Retrieval (cs.IR)

Abstract:Recent years have witnessed a surge of research on leveraging large language models (LLMs) for sequential recommendation. LLMs have demonstrated remarkable potential in inferring users’ nuanced preferences through fine-grained semantic reasoning. However, they also exhibit a notable limitation in effectively modeling collaborative signals, i.e., behavioral correlations inherent in users’ historical interactions. Our empirical analysis further reveals that the attention mechanisms in LLMs tend to disproportionately focus on tokens within the same item, thereby impeding the capture of cross-item correlations. To address this limitation, we propose a novel hierarchical attention masking strategy for LLM-based recommendation, termed HatLLM. Specifically, in shallow layers, HatLLM masks attention between tokens from different items, facilitating intra-item semantic understanding; in contrast, in deep layers, HatLLM masks attention within items, thereby compelling the model to capture cross-item correlations. This progressive, layer-wise approach enables LLMs to jointly model both token-level and item-level dependencies. Extensive experiments on three real-world datasets demonstrate that HatLLM achieves significant performance gains (9.13% on average) over existing LLM-based methods.
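
The layer-dependent masking described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `hierarchical_masks`, the hard split at `num_shallow`, the causal constraint, and the choice to keep each token's attention to itself in deep layers (so that no row of the mask is empty) are all assumptions made for illustration. It also assumes every token belongs to exactly one item.

```python
import numpy as np


def hierarchical_masks(item_ids, num_layers, num_shallow):
    """Build per-layer boolean attention masks (True = attention allowed).

    item_ids    : sequence of length T, the item index of each token (hypothetical input)
    num_layers  : total number of transformer layers
    num_shallow : number of leading layers treated as "shallow" (assumed hard split)
    Returns an array of shape (num_layers, T, T), indexed [layer, query, key].
    """
    item_ids = np.asarray(item_ids)
    T = item_ids.shape[0]

    # Assumption: decoder-style causal attention (a query sees only itself and earlier keys).
    causal = np.tril(np.ones((T, T), dtype=bool))
    # same_item[q, k] is True when query token q and key token k belong to the same item.
    same_item = item_ids[:, None] == item_ids[None, :]
    # Assumption: a token may always attend to itself, so no mask row is empty.
    self_only = np.eye(T, dtype=bool)

    # Shallow layers: block attention between tokens of different items.
    shallow = causal & same_item
    # Deep layers: block attention within an item (except self), keep cross-item attention.
    deep = causal & (~same_item | self_only)

    masks = np.empty((num_layers, T, T), dtype=bool)
    masks[:num_shallow] = shallow
    masks[num_shallow:] = deep
    return masks


# Two items: the first has 2 tokens, the second has 3 tokens.
masks = hierarchical_masks([0, 0, 1, 1, 1], num_layers=4, num_shallow=2)
```

In an attention layer, positions where the mask is `False` would typically have their scores set to a large negative value before the softmax. How the paper handles prompt tokens, and whether the transition between shallow and deep behavior is gradual rather than a hard split, is not stated in the abstract.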

[IR-11] Multi-Granularity Sequence Denoising with Weakly Supervised Signal for Sequential Recommendation

Link: https://arxiv.org/abs/2510.10564
Authors: Liang Li, Zhou Yang, Xiaofei Zhu
Subjects: Information Retrieval (cs.IR)

[IR-12] Self-Supervised Representation Learning with ID-Content Modality Alignment for Sequential Recommendation

Link: https://arxiv.org/abs/2510.10556
Authors: Donglin Zhou, Weike Pan, Zhong Ming
Subjects: Information Retrieval (cs.IR)

[IR-13] Towards Long-Term User Welfare in Recommender Systems via Creator-Oriented Information Revelation

Link: https://arxiv.org/abs/2510.10511
Authors: Xu Zhao, Xiaopeng Ye, Chen Xu, Weiran Shen, Jun Xu
Subjects: Information Retrieval (cs.IR)

[IR-14] ZeroGR: A Generalizable and Scalable Framework for Zero-Shot Generative Retrieval

Link: https://arxiv.org/abs/2510.10419
Authors: Weiwei Sun, Keyi Kong, Xinyu Ma, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke, Zhaochun Ren, Yiming Yang
Subjects: Information Retrieval (cs.IR)

[IR-15] Breaking the Likelihood Trap: Consistent Generative Recommendation with Graph-structured Model

Link: https://arxiv.org/abs/2510.10127
Authors: Qiya Yang, Xiaoxi Liang, Zeping Xiao, Yingjie Deng, Yalong Wang, Yongqi Liu, Han Li
Subjects: Information Retrieval (cs.IR)

[IR-16] Integrating Structure-Aware Attention and Knowledge Graphs in Explainable Recommendation Systems

Link: https://arxiv.org/abs/2510.10109
Authors: Shuangquan Lyu, Ming Wang, Huajun Zhang, Jiasen Zheng, Junjiang Lin, Xiaoxuan Sun
Subjects: Information Retrieval (cs.IR)

[IR-17] PairSem: LLM-Guided Pairwise Semantic Matching for Scientific Document Retrieval

Link: https://arxiv.org/abs/2510.09897
Authors: Wonbin Kweon, Runchu Tian, SeongKu Kang, Pengcheng Jiang, Zhiyong Lu, Jiawei Han, Hwanjo Yu
Subjects: Information Retrieval (cs.IR)

Attachment Download

Click to download today's full paper list