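This post lists the latest papers retrieved from Arxiv.org on 2026-02-10. It is updated automatically and grouped into six broad areas: NLP, CV, ML, AI, IR, and MA.
Note: If the list is not updated on a given day, either Arxiv published no new papers that day or the script failed. Failures will be fixed the same day whenever possible.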
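Overview (2026-02-10)
450 papers were updated today, including:
- Natural language processing: 63 papers (Computation and Language (cs.CL))
- Artificial intelligence: 145 papers (Artificial Intelligence (cs.AI))
- Computer vision: 81 papers (Computer Vision and Pattern Recognition (cs.CV))
- Machine learning: 139 papers (Machine Learning (cs.LG))
- Multiagent systems: 9 papers (Multiagent Systems (cs.MA))
- Information retrieval: 21 papers (Information Retrieval (cs.IR))
- Human-computer interaction: 26 papers (Human-Computer Interaction (cs.HC))
Multiagent Systems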
[MA-0] Learning to Coordinate via Quantum Entanglement in Multi-Agent Reinforcement Learning
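[Quick read]: This paper addresses the coordination problem that arises in multi-agent reinforcement learning (MARL) when agents cannot communicate. Prior approaches often rely on shared randomness or correlation devices to correlate local policies, but such mechanisms are bounded by classical probability theory and cannot realize a broader class of cooperative strategies. The paper proposes the first MARL framework that uses quantum entanglement as a coordination resource. Its core contributions are twofold: a new differentiable policy parameterization that supports optimization over quantum measurements, and a policy architecture composed of a quantum coordinator and decentralized local actors that decomposes the joint policy into trainable modules. With this design, agents can learn, without direct communication, strategies that attain quantum advantage over those using shared randomness alone; effectiveness is demonstrated in both single-round games and a Dec-POMDP setting.
Link: https://arxiv.org/abs/2602.08965
Authors: John Gardiner, Orlando Romero, Brendan Tivnan, Nicolò Dal Fabbro, George J. Pappas
Affiliation: Unknown
Categories: Multiagent Systems (cs.MA); Machine Learning (cs.LG)
Comments: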
Abstract:The inability to communicate poses a major challenge to coordination in multi-agent reinforcement learning (MARL). Prior work has explored correlating local policies via shared randomness, sometimes in the form of a correlation device, as a mechanism to assist in decentralized decision-making. In contrast, this work introduces the first framework for training MARL agents to exploit shared quantum entanglement as a coordination resource, which permits a larger class of communication-free correlated policies than shared randomness alone. This is motivated by well-known results in quantum physics which posit that, for certain single-round cooperative games with no communication, shared quantum entanglement enables strategies that outperform those that only use shared randomness. In such cases, we say that there is quantum advantage. Our framework is based on a novel differentiable policy parameterization that enables optimization over quantum measurements, together with a novel policy architecture that decomposes joint policies into a quantum coordinator and decentralized local actors. To illustrate the effectiveness of our proposed method, we first show that we can learn, purely from experience, strategies that attain quantum advantage in single-round games that are treated as black box oracles. We then demonstrate how our machinery can learn policies with quantum advantage in an illustrative multi-agent sequential decision-making problem formulated as a decentralized partially observable Markov decision process (Dec-POMDP).
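The quantum advantage mentioned in this abstract can be illustrated with the CHSH game, a standard textbook example of a single-round cooperative game without communication. The abstract does not name CHSH, so the choice of game, the measurement angles, and all function names below are assumptions made for illustration; this is a sketch of the well-known gap, not the paper's training method.

```python
import itertools
import math


def chsh_wins(x, y, a, b):
    """CHSH win condition: XOR of the answers equals AND of the questions."""
    return (a ^ b) == (x & y)


def best_classical_value():
    """Best win probability over deterministic local strategies.

    Shared randomness only mixes deterministic strategies, so it cannot
    beat the best deterministic one.
    """
    best = 0.0
    # Each player maps a question bit to an answer bit: 4 maps per player.
    for a0, a1, b0, b1 in itertools.product((0, 1), repeat=4):
        alice, bob = (a0, a1), (b0, b1)
        wins = sum(
            chsh_wins(x, y, alice[x], bob[y]) for x in (0, 1) for y in (0, 1)
        )
        best = max(best, wins / 4)
    return best


def entangled_value():
    """Win probability with a shared Bell state and the usual angles.

    For the state (|00> + |11>)/sqrt(2) and real measurement bases rotated
    by angles ta and tb, the two outcomes agree with probability
    cos^2(ta - tb).
    """
    alice_angles = (0.0, math.pi / 4)
    bob_angles = (math.pi / 8, -math.pi / 8)
    total = 0.0
    for x in (0, 1):
        for y in (0, 1):
            p_same = math.cos(alice_angles[x] - bob_angles[y]) ** 2
            # Players need different answers only when x = y = 1.
            total += (1 - p_same) if (x & y) else p_same
    return total / 4


if __name__ == "__main__":
    print("classical:", best_classical_value())
    print("entangled:", entangled_value())
```

The classical bound comes from brute-force enumeration, while the entangled value uses the closed-form agreement probability rather than simulating a quantum state.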
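[MA-1] Teaching an Old Dynamics New Tricks: Regularization-free Last-iterate Convergence in Zero-sum Games via BNN Dynamics
[Quick read]: This paper addresses convergence in zero-sum games within multi-agent learning (MAL), in particular the fact that existing regularization-based methods depend on extra hyperparameters that must be tuned and are hard to apply when the payoff structure is unknown or changing. The key to the solution is to repurpose the Brown-von Neumann-Nash (BNN) dynamics from evolutionary game theory, which converges in zero-sum games without regularization, yielding last-iterate convergence guarantees in noisy normal-form games (NFGs). On this basis the paper builds a new framework with theoretical guarantees that extends the BNN dynamics to extensive-form games (EFGs) through counterfactual weighting, and instantiates it with neural function approximation to obtain a scalable learning algorithm.
Link: https://arxiv.org/abs/2602.08938
Authors: Tuo Zhang, Leonardo Stella
Affiliation: University of Birmingham
Categories: Multiagent Systems (cs.MA)
Comments: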
Abstract:Zero-sum games are a fundamental setting for adversarial training and decision-making in multi-agent learning (MAL). Existing methods often ensure convergence to (approximate) Nash equilibria by introducing a form of regularization. Yet, regularization requires additional hyperparameters, which must be carefully tuned–a challenging task when the payoff structure is known, and considerably harder when the structure is unknown or subject to change. Motivated by this problem, we repurpose a classical model in evolutionary game theory, i.e., the Brown-von Neumann-Nash (BNN) dynamics, by leveraging the intrinsic convergence of this dynamics in zero-sum games without regularization, and provide last-iterate convergence guarantees in noisy normal-form games (NFGs). Importantly, to make this approach more applicable, we develop a novel framework with theoretical guarantees that integrates the BNN dynamics in extensive-form games (EFGs) through counterfactual weighting. Furthermore, we implement an algorithm that instantiates our framework with neural function approximation, enabling scalable learning in both NFGs and EFGs. Empirical results show that our method quickly adapts to nonstationarities, outperforming the state-of-the-art regularization-based approach.
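To give a feel for the BNN dynamics named above, here is a minimal sketch of the classical single-population form, in which each strategy grows in proportion to its positive excess payoff over the population average. The rock-paper-scissors payoff matrix, the Euler discretization, the step size, and all function names are assumptions chosen for illustration; the paper's noisy, extensive-form, and neural variants are not reproduced.

```python
def bnn_step(x, payoff, dt):
    """One Euler step of the BNN dynamics on the probability simplex."""
    n = len(x)
    # Payoff of each pure strategy against the current mixed strategy.
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    average = sum(x[i] * fitness[i] for i in range(n))
    # Only strategies that beat the average attract probability mass.
    excess = [max(0.0, f - average) for f in fitness]
    total_excess = sum(excess)
    return [x[i] + dt * (excess[i] - x[i] * total_excess) for i in range(n)]


# Rock-paper-scissors: a symmetric zero-sum game (skew-symmetric matrix).
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def simulate(x0, payoff=RPS, dt=0.002, steps=100_000):
    """Iterate the discretized dynamics and return the last iterate."""
    x = list(x0)
    for _ in range(steps):
        x = bnn_step(x, payoff, dt)
    return x


if __name__ == "__main__":
    print(simulate([0.6, 0.3, 0.1]))
```

No regularization term or averaging of iterates appears in the update. In this toy game the last iterate itself drifts toward the uniform equilibrium, up to discretization error.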
[MA-2] A Generic Service-Oriented Function Offloading Framework for Connected Automated Vehicles
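[Quick read]: This paper addresses the difficulty Connected Automated Vehicles (CAVs) face in efficiently executing complex computational tasks such as trajectory planning, given limited computational capacity and available energy. The key to the solution is a generic function offloading framework that distributes computational tasks between local and remote edge computing devices (such as Multi-Access Edge Computing, MEC) to improve overall system performance. Its central contribution is a location-based decision mechanism, in which whether a task is processed locally or offloaded to an MEC server depends on the CAV's location; this improves computational efficiency while guaranteeing Quality of Service (QoS), and the framework adapts to scenarios with simultaneous requests from multiple vehicles.
Link: https://arxiv.org/abs/2602.08799
Authors: Robin Dehler, Michael Buchholz
Affiliation: Ulm University
Categories: Robotics (cs.RO); Multiagent Systems (cs.MA)
Comments: 8 pages, 6 figures, 2 tables, published in RA-L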
Abstract:Function offloading is a promising solution to address limitations concerning computational capacity and available energy of Connected Automated Vehicles (CAVs) or other autonomous robots by distributing computational tasks between local and remote computing devices in form of distributed services. This paper presents a generic function offloading framework that can be used to offload an arbitrary set of computational tasks with a focus on autonomous driving. To provide flexibility, the function offloading framework is designed to incorporate different offloading decision making algorithms and quality of service (QoS) requirements that can be adjusted to different scenarios or the objectives of the CAVs. With a focus on the applicability, we propose an efficient location-based approach, where the decision whether tasks are processed locally or remotely depends on the location of the CAV. We apply the proposed framework on the use case of service-oriented trajectory planning, where we offload the trajectory planning task of CAVs to a Multi-Access Edge Computing (MEC) server. The evaluation is conducted in both simulation and real-world application. It demonstrates the potential of the function offloading framework to guarantee the QoS for trajectory planning while improving the computational efficiency of the CAVs. Moreover, the simulation results also show the adaptability of the framework to diverse scenarios involving simultaneous offloading requests from multiple CAVs.
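One way to picture a location-based offloading decision is as a geofence check. The abstract does not describe the actual rule, so the circular zones, the `OffloadingZone` type, and the `decide_execution` function below are hypothetical names and assumptions used purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class OffloadingZone:
    """Hypothetical circular region where an MEC server is assumed reachable."""

    center_x: float
    center_y: float
    radius_m: float

    def contains(self, x, y):
        """True if the point (x, y) lies inside or on the zone boundary."""
        return math.hypot(x - self.center_x, y - self.center_y) <= self.radius_m


def decide_execution(x, y, zones):
    """Return 'remote' inside any offloading zone, otherwise 'local'."""
    return "remote" if any(zone.contains(x, y) for zone in zones) else "local"


if __name__ == "__main__":
    zones = [OffloadingZone(center_x=0.0, center_y=0.0, radius_m=100.0)]
    print(decide_execution(30.0, 40.0, zones))
    print(decide_execution(300.0, 400.0, zones))
```

A position-only rule like this is cheap to evaluate on the vehicle. A fuller decision would presumably also weigh the QoS requirements the abstract mentions, which this sketch leaves out.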
[MA-3] ValueFlow: Measuring the Propagation of Value Perturbations in Multi-Agent LLM Systems
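[Quick read]: This paper addresses the quantification and analysis of value drift in multi-agent large language model (LLM) systems: when multiple agents observe and respond to one another's outputs, how initial value settings shift during interaction, and by what mechanism, remains unclear. The key to the solution is the ValueFlow framework, which comprises: (1) an evaluation dataset of 56 value dimensions built from the Schwartz Value Survey; (2) an LLM-as-a-judge protocol that quantifies changes in agents' value orientations during interaction; and (3) a decomposition of value drift into agent-level response behavior and system-level structural effects, measured by two operational metrics: beta-susceptibility (a single agent's sensitivity to perturbed peer signals) and system susceptibility (SS, the effect of node-level perturbations on final system outputs). Experiments show that susceptibility differs markedly across value dimensions and is strongly shaped by network topology.
Link: https://arxiv.org/abs/2602.08567
Authors: Jinnuo Liu, Chuke Liu, Hua Shen
Affiliation: Center for Data Science, NYU Shanghai, New York University
Categories: Multiagent Systems (cs.MA); Computation and Language (cs.CL)
Comments: Preprint. Under review. 18 pages, 9 figures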
Abstract:Multi-agent large language model (LLM) systems increasingly consist of agents that observe and respond to one another’s outputs. While value alignment is typically evaluated for isolated models, how value perturbations propagate through agent interactions remains poorly understood. We present ValueFlow, a perturbation-based evaluation framework for measuring and analyzing value drift in multi-agent systems. ValueFlow introduces a 56-value evaluation dataset derived from the Schwartz Value Survey and quantifies agents’ value orientations during interaction using an LLM-as-a-judge protocol. Building on this measurement layer, ValueFlow decomposes value drift into agent-level response behavior and system-level structural effects, operationalized by two metrics: beta-susceptibility, which measures an agent’s sensitivity to perturbed peer signals, and system susceptibility (SS), which captures how node-level perturbations affect final system outputs. Experiments across multiple model backbones, prompt personas, value dimensions, and network structures show that susceptibility varies widely across values and is strongly shaped by structural topology.
[MA-4] EvoCorps: An Evolutionary Multi-Agent Framework for Depolarizing Online Discourse
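[Quick read]: This paper addresses polarization in online discourse, which erodes social trust and accelerates the spread of misinformation. Existing technical responses are mostly diagnostic and post-hoc, suffer from inherent latency and static policies, and struggle to counter coordinated adversarial amplification that evolves in real time. The key to the solution is EvoCorps, an evolutionary multi-agent framework for proactive depolarization that models discourse governance as a dynamic social game and coordinates roles for monitoring, planning, grounded generation, and multi-identity diffusion. A retrieval-augmented collective cognition core provides factual grounding and action-outcome memory, and closed-loop evolutionary learning adapts strategies as the environment and attackers change, shifting from detection to in-process, closed-loop intervention.
Link: https://arxiv.org/abs/2602.08529
Authors: Ning Lin, Haolun Li, Mingshu Liu, Chengyun Ruan, Kaibo Huang, Yukun Wei, Zhongliang Yang, Linna Zhou
Affiliation: Beijing University of Posts and Telecommunications
Categories: Multiagent Systems (cs.MA)
Comments: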
Abstract:Polarization in online discourse erodes social trust and accelerates misinformation, yet technical responses remain largely diagnostic and post-hoc. Current governance approaches suffer from inherent latency and static policies, struggling to counter coordinated adversarial amplification that evolves in real-time. We present EvoCorps, an evolutionary multi-agent framework for proactive depolarization. EvoCorps frames discourse governance as a dynamic social game and coordinates roles for monitoring, planning, grounded generation, and multi-identity diffusion. A retrieval-augmented collective cognition core provides factual grounding and action–outcome memory, while closed-loop evolutionary learning adapts strategies as the environment and attackers change. We implement EvoCorps on the MOSAIC social-AI simulation platform for controlled evaluation in a multi-source news stream with adversarial injection and amplification. Across emotional polarization, viewpoint extremity, and argumentative rationality, EvoCorps improves discourse outcomes over an adversarial baseline, pointing to a practical path from detection and post-hoc mitigation to in-process, closed-loop intervention. The code is available at this https URL.
[MA-5] A General Theory of Proportionality with Additive Utilities
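[Quick read]: This paper addresses how to select a subset of candidates based on voter preferences under general feasibility constraints, a problem that covers committee elections with diversity constraints, participatory budgeting (including constraints on how funds are allocated to projects from different pools), and public decision-making. Proportionality axioms have been defined for this general model, but the existing rules apply only to approval ballots and not to the more flexible cardinal ballots, in which each voter assigns each candidate a numerical value reflecting her utility. The key contribution is a set of proportional rules for cardinal ballots, together with methods that produce proportional rankings in which every prefix of the ranking satisfies proportionality.
Link: https://arxiv.org/abs/2602.08504
Authors: Piotr Skowron
Affiliation: Unknown
Categories: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Comments: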
Abstract:We consider a model where a subset of candidates must be selected based on voter preferences, subject to general constraints that specify which subsets are feasible. This model generalizes committee elections with diversity constraints, participatory budgeting (including constraints specifying how funds must be allocated to projects from different pools), and public decision-making. Axioms of proportionality have recently been defined for this general model, but the proposed rules apply only to approval ballots, where each voter submits a subset of candidates she finds acceptable. We propose proportional rules for cardinal ballots, where each voter assigns a numerical value to each candidate corresponding to her utility if that candidate is selected. In developing these rules, we also introduce methods that produce proportional rankings, ensuring that every prefix of the ranking satisfies proportionality.
[MA-6] Altruism and Fair Objective in Mixed-Motive Markov games
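[Quick read]: This paper addresses the lack of fairness that arises in multi-agent cooperation when the objective is to maximize total utility: the conventional utilitarian welfare objective can yield efficient cooperation but often produces highly inequitable outcomes. The key to the solution is to adopt the Proportional Fairness principle, define a fair altruistic utility on the individual log-payoff space, and derive analytical conditions that ensure cooperation in classic social dilemmas. The framework is then extended to sequential decision-making by defining a Fair Markov Game and designing new fair Actor-Critic algorithms to learn fair policies.
Link: https://arxiv.org/abs/2602.08389
Authors: Yao-hua Franck Xu, Tayeb Lemlouma, Arnaud Braud, Jean-Marie Bonnin
Affiliation: Unknown
Categories: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Comments: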
Abstract:Cooperation is fundamental for society’s viability, as it enables the emergence of structure within heterogeneous groups that seek collective well-being. However, individuals are inclined to defect in order to benefit from the group’s cooperation without contributing the associated costs, thus leading to unfair situations. In game theory, social dilemmas entail this dichotomy between individual interest and collective outcome. The dominant approach to multi-agent cooperation is utilitarian welfare, which can produce efficient but highly inequitable outcomes. This paper proposes a novel framework to foster fairer cooperation by replacing the standard utilitarian objective with Proportional Fairness. We introduce a fair altruistic utility for each agent, defined on the individual log-payoff space, and derive the analytical conditions required to ensure cooperation in classic social dilemmas. We then extend this framework to sequential settings by defining a Fair Markov Game and deriving novel fair Actor-Critic algorithms to learn fair policies. Finally, we evaluate our method in various social dilemma environments.
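The contrast between the two objectives can be seen in a two-agent toy case. The sketch below uses the standard definitions (utilitarian welfare as the sum of payoffs, proportional fairness as the sum of log payoffs). The payoff numbers and function names are assumptions for illustration, and the paper's per-agent fair altruistic utility is not reproduced here.

```python
import math


def utilitarian_welfare(payoffs):
    """Sum of individual payoffs."""
    return sum(payoffs)


def proportional_fairness(payoffs):
    """Sum of log payoffs; requires strictly positive payoffs."""
    if any(p <= 0 for p in payoffs):
        raise ValueError("proportional fairness needs strictly positive payoffs")
    return sum(math.log(p) for p in payoffs)


if __name__ == "__main__":
    unequal = (9.0, 1.0)  # higher total, very uneven split
    equal = (4.0, 4.0)    # lower total, even split
    for name, outcome in (("unequal", unequal), ("equal", equal)):
        print(name, utilitarian_welfare(outcome), proportional_fairness(outcome))
```

The utilitarian objective prefers the uneven outcome because its total is larger. The log objective prefers the even one, since a proportional loss to the worse-off agent outweighs the same absolute gain to the better-off agent.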
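[MA-7] T2VTree: User-Centered Visual Analytics for Agent-Assisted Thought-to-Video Authoring
[Quick read]: This paper addresses the multi-stage, multi-modal, and decision-intensive nature of video authoring with generative AI: current tools either hide intermediate decisions, which makes the authoring process hard to trace and reuse, or expose operator-level workflows, which makes exploration traces hard to manage. The key to the solution is T2VTree, a user-centered visual analytics approach that models the authoring process as an editable tree. Each node binds a specification (intent, inputs, workflow choice, prompts, and parameters) to the resulting multimodal outputs, supporting direct refinement, branching, and provenance inspection. Collaborating agents translate step-level intent into a visible, editable execution plan, and in-place preview and stitching enable end-to-end multi-scene video assembly, with the aim of improving authoring efficiency and controllability.
Link: https://arxiv.org/abs/2602.08368
Authors: Zhuoyun Zheng, Yu Dong, Gaorong Liang, Guan Li, Guihua Shan, Shiyu Cheng, Dong Tian, Jianlong Zhou, Jie Liang
Affiliation: Unknown
Categories: Multimedia (cs.MM); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA)
Comments: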
Abstract:Generative models have substantially expanded video generation capabilities, yet practical thought-to-video creation remains a multi-stage, multi-modal, and decision-intensive process. However, existing tools either hide intermediate decisions behind repeated reruns or expose operator-level workflows that make exploration traces difficult to manage, compare, and reuse. We present T2VTree, a user-centered visual analytics approach for agent-assisted thought-to-video authoring. T2VTree represents the authoring process as a tree visualization. Each node in the tree binds an editable specification (intent, referenced inputs, workflow choice, prompts, and parameters) with the resulting multimodal outputs, making refinement, branching, and provenance inspection directly operable. To reduce the burden of deciding what to do next, a set of collaborating agents translates step-level intent into an executable plan that remains visible and user-editable before execution. We further implement a visual analytics system that integrates branching authoring with in-place preview and stitching for convergent assembly, enabling end-to-end multi-scene creation without leaving the authoring context. We demonstrate T2VTreeVA through two multi-scene case studies and a comparative user study, showing how the T2VTree visualization and editable agent planning support reliable refinement, localized comparison, and practical reuse in real authoring workflows. T2VTree is available at: this https URL.
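[MA-8] SynthAgent: A Multi-Agent LLM Framework for Realistic Patient Simulation – A Case Study in Obesity with Mental Health Comorbidities
[Quick read]: This paper addresses the fragmentation, bias, and privacy restrictions of real-world medical data, which make it hard to study the multi-dimensional dynamics of complex diseases such as obesity with comorbid mental disorders. The key to the solution is SynthAgent, a Multi-Agent System (MAS) framework for generating high-fidelity virtual patients. It integrates evidence from claims data, population surveys, and patient-centered literature to construct personalized virtual patients with personality traits, which simulate disease progression, treatment response, and life management, with behavior evolving through autonomous agent interactions across psychosocial contexts. In the evaluation, GPT-5 and Claude 4.5 Sonnet achieved the highest fidelity as the core engine, outperforming Gemini 2.5 Pro and DeepSeek-R1, which supports the framework's feasibility for exploring patient journeys and decision-making in medical and psychological domains.
Link: https://arxiv.org/abs/2602.08254
Authors: Arman Aghaee, Sepehr Asgarian, Jouhyun Jeon
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Multiagent Systems (cs.MA)
Comments: Presented in AAAI 2026 Singapore at the workshop of Health Intelligence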
Abstract:Simulating high-fidelity patients offers a powerful avenue for studying complex diseases while addressing the challenges of fragmented, biased, and privacy-restricted real-world data. In this study, we introduce SynthAgent, a novel Multi-Agent System (MAS) framework designed to model obesity patients with comorbid mental disorders, including depression, anxiety, social phobia, and binge eating disorder. SynthAgent integrates clinical and medical evidence from claims data, population surveys, and patient-centered literature to construct personalized virtual patients enriched with personality traits that influence adherence, emotion regulation, and lifestyle behaviors. Through autonomous agent interactions, the system simulates disease progression, treatment response, and life management across diverse psychosocial contexts. Evaluation of more than 100 generated patients demonstrated that GPT-5 and Claude 4.5 Sonnet achieved the highest fidelity as the core engine in the proposed MAS framework, outperforming Gemini 2.5 Pro and DeepSeek-R1. SynthAgent thus provides a scalable and privacy-preserving framework for exploring patient journeys, behavioral dynamics, and decision-making processes in both medical and psychological domains.
Natural Language Processing
[NLP-0] Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense
Link: https://arxiv.org/abs/2602.09012
Authors: Jiacheng Liu, Yaxin Luo, Jiacheng Cui, Xinyi Shang, Xiaohan Zhao, Zhiqiang Shen
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: Project page at this https URL
[NLP-1] Data Science and Technology Towards AGI Part I: Tiered Data Management
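[Quick read]: This paper addresses the bottlenecks that arise in current large language model (LLM) development from unidirectional scaling of data size, including rising acquisition cost, limited data availability, and declining training efficiency. The key to the solution is an L0-L4 tiered data management framework that enables data-model co-evolution. LLMs are integrated throughout the data management process (for example, quality scoring and content editing) to balance data quality, cost, and training value across tiers, and data from different tiers (from raw uncurated resources to organized, verifiable knowledge) is strategically allocated across training stages (pre-training, mid-training, and alignment), improving training efficiency and model performance.
Link: https://arxiv.org/abs/2602.09003
Authors: Yudong Wang, Zixuan Fu, Hengyu Zhao, Chen Zhao, Chuyue Zhou, Xinle Lin, Hongya Lyu, Shuaikang Xue, Yi Yi, Yingjiao Wang, Zhi Zheng, Yuzhou Zhang, Jie Zhou, Chaojun Xiao, Xu Han, Zhiyuan Liu, Maosong Sun
Affiliation: Tsinghua University; ModelBest Inc.; Beijing Institute of Technology; South China Agricultural University
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: 16 pages, 3 figures, 7 tables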
Abstract:The development of artificial intelligence can be viewed as an evolution of data-driven learning paradigms, with successive shifts in data organization and utilization continuously driving advances in model capability. Current LLM research is dominated by a paradigm that relies heavily on unidirectional scaling of data size, increasingly encountering bottlenecks in data availability, acquisition cost, and training efficiency. In this work, we argue that the development of AGI is entering a new phase of data-model co-evolution, in which models actively guide data management while high-quality data, in turn, amplifies model capabilities. To implement this vision, we propose a tiered data management framework, designed to support the full LLM training lifecycle across heterogeneous learning objectives and cost constraints. Specifically, we introduce an L0-L4 tiered data management framework, ranging from raw uncurated resources to organized and verifiable knowledge. Importantly, LLMs are fully used in data management processes, such as quality scoring and content editing, to refine data across tiers. Each tier is characterized by distinct data properties, management strategies, and training roles, enabling data to be strategically allocated across LLM training stages, including pre-training, mid-training, and alignment. The framework balances data quality, acquisition cost, and marginal training benefit, providing a systematic approach to scalable and sustainable data management. We validate the effectiveness of the proposed framework through empirical studies, in which tiered datasets are constructed from raw corpora and used across multiple training phases. Experimental results demonstrate that tier-aware data utilization significantly improves training efficiency and model performance. To facilitate further research, we release our tiered datasets and processing tools to the community.
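[NLP-2] Paradox of De-identification: A Critique of HIPAA Safe Harbour in the Age of LLMs
[Quick read]: This paper addresses the privacy risk that remains after clinical notes are de-identified under the HIPAA Safe Harbor standard, in particular that modern large language models (LLMs) can re-identify individuals by exploiting latent correlations between identity and quasi-identifiers. The authors first formalize these correlations with a causal graph, then validate them empirically by re-identifying individual patients from scrubbed clinical notes. A diagnosis ablation exposes a paradox: even when all other information is removed, the model can predict a patient's neighborhood from the diagnosis alone, which suggests that current de-identification is inherently imperfect. The paper calls on the community to discuss more effective privacy protections in order to uphold patient-provider trust.
Link: https://arxiv.org/abs/2602.08997
Authors: Lavender Y. Jiang, Xujin Chris Liu, Kyunghyun Cho, Eric K. Oermann
Affiliation: New York University
Categories: Computers and Society (cs.CY); Computation and Language (cs.CL)
Comments: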
Abstract:Privacy is a human right that sustains patient-provider trust. Clinical notes capture a patient’s private vulnerability and individuality, which are used for care coordination and research. Under HIPAA Safe Harbor, these notes are de-identified to protect patient privacy. However, Safe Harbor was designed for an era of categorical tabular data, focusing on the removal of explicit identifiers while ignoring the latent information found in correlations between identity and quasi-identifiers, which can be captured by modern LLMs. We first formalize these correlations using a causal graph, then validate it empirically through individual re-identification of patients from scrubbed notes. The paradox of de-identification is further shown through a diagnosis ablation: even when all other information is removed, the model can predict the patient’s neighborhood based on diagnosis alone. This position paper raises the question of how we can act as a community to uphold patient-provider trust when de-identification is inherently imperfect. We aim to raise awareness and discuss actionable recommendations.
[NLP-3] When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents
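[Quick read]: This paper addresses misaligned actions, which computer-use agents (CUAs) frequently produce: actions that deviate from the user's original intent. They may arise from external attacks (such as indirect prompt injection) or internal limitations (such as erroneous reasoning), and they create safety risks while degrading task efficiency and reliability. The key to the solution is DeAction, a universal guardrail that detects misaligned actions before execution and iteratively corrects them through structured feedback, covering both externally induced and internally arising cases. DeAction outperforms existing baselines in offline and online evaluations with moderate latency overhead: over 15% absolute F1 improvement on MisActBench, and over 90% reduction in attack success rate under adversarial settings.
Link: https://arxiv.org/abs/2602.08995
Authors: Yuting Ning, Jaylen Jones, Zhehao Zhang, Chentao Ye, Weitong Ruan, Junyi Li, Rahul Gupta, Huan Sun
Affiliation: Unknown
Categories: Computation and Language (cs.CL)
Comments: Project Homepage: this https URL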
Abstract:Computer-use agents (CUAs) have made tremendous progress in the past year, yet they still frequently produce misaligned actions that deviate from the user’s original intent. Such misaligned actions may arise from external attacks (e.g., indirect prompt injection) or from internal limitations (e.g., erroneous reasoning). They not only expose CUAs to safety risks, but also degrade task efficiency and reliability. This work makes the first effort to define and study misaligned action detection in CUAs, with comprehensive coverage of both externally induced and internally arising misaligned actions. We further identify three common categories in real-world CUA deployment and construct MisActBench, a benchmark of realistic trajectories with human-annotated, action-level alignment labels. Moreover, we propose DeAction, a practical and universal guardrail that detects misaligned actions before execution and iteratively corrects them through structured feedback. DeAction outperforms all existing baselines across offline and online evaluations with moderate latency overhead: (1) On MisActBench, it outperforms baselines by over 15% absolute in F1 score; (2) In online evaluation, it reduces attack success rate by over 90% under adversarial settings while preserving or even improving task success rate in benign environments.
[NLP-4] Next Concept Prediction in Discrete Latent Space Leads to Stronger Language Models
Link: https://arxiv.org/abs/2602.08984
Authors: Yuliang Liu, Yunchong Song, Yixuan Wang, Kewen Ge, Alex Lamb, Qipeng Guo, Kai Chen, Bowen Zhou, Zhouhan Lin
Affiliation: LUMIA Lab; School of Artificial Intelligence; Shanghai Jiao Tong University; Shanghai AI Laboratory; Department of Electronic Engineering; Tsinghua University; College of Artificial Intelligence; Shanghai Innovation Institute
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:
[NLP-5] Beyond Transcripts: A Renewed Perspective on Audio Chaptering
Link: https://arxiv.org/abs/2602.08979
Authors: Fabian Retkowski, Maike Züfle, Thai Binh Nguyen, Jan Niehues, Alexander Waibel
Affiliation: Karlsruhe Institute of Technology; Carnegie Mellon University
Categories: Sound (cs.SD); Computation and Language (cs.CL)
Comments:
[NLP-6] A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents
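[Quick read]: This paper addresses the lack of an established methodology for reliably attributing goals to agents and characterizing their goal-directedness. The key to the solution is a framework that combines behavioural evaluation with interpretability-based analysis of a model's internal representations. Behaviourally, the agent is compared against an optimal policy across varying task difficulty. Probing methods then decode the LLM agent's internal representations of the environment state and multi-step action plans in a 2D grid world, showing that the agent non-linearly encodes a coarse spatial map and that reasoning shifts these representations from broader structural cues toward information supporting immediate action selection. The results indicate that behavioural evaluation alone is insufficient and must be complemented by analysis of internal representations.
Link: https://arxiv.org/abs/2602.08964
Authors: Raghu Arghal, Fade Chen, Niall Dalton, Evgenii Kortukov, Calum McNamara, Angelos Nalmpantis, Moksh Nirvaan, Gabriele Sarti, Mario Giulianelli
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Comments: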
Abstract:Understanding an agent’s goals helps explain and predict its behaviour, yet there is no established methodology for reliably attributing goals to agentic systems. We propose a framework for evaluating goal-directedness that integrates behavioural evaluation with interpretability-based analyses of models’ internal representations. As a case study, we examine an LLM agent navigating a 2D grid world toward a goal state. Behaviourally, we evaluate the agent against an optimal policy across varying grid sizes, obstacle densities, and goal structures, finding that performance scales with task difficulty while remaining robust to difficulty-preserving transformations and complex goal structures. We then use probing methods to decode the agent’s internal representations of the environment state and its multi-step action plans. We find that the LLM agent non-linearly encodes a coarse spatial map of the environment, preserving approximate task-relevant cues about its position and the goal location; that its actions are broadly consistent with these internal representations; and that reasoning reorganises them, shifting from broader environment structural cues toward information supporting immediate action selection. Our findings support the view that introspective examination is required beyond behavioural evaluations to characterise how agents represent and pursue their objectives.
[NLP-7] How Should We Model the Probability of a Language?
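[Quick read]: This paper addresses the highly uneven coverage of language identification (LID) systems across the world's more than 7,000 languages, especially the weak or nonexistent support for tail languages. LID is usually framed as decontextualized text classification, which obscures the importance of prior probability estimation and is reinforced by institutional incentives that favor global, fixed-prior models, leaving low-resource languages poorly served. The proposed remedy is to rethink LID as a routing problem and to incorporate environmental cues that make languages locally plausible.
Link: https://arxiv.org/abs/2602.08951
Authors: Rasul Dent, Pedro Ortiz Suarez, Thibault Clérice, Benoît Sagot
Affiliation: Inria; Common Crawl Foundation
Categories: Computation and Language (cs.CL)
Comments: Accepted for Vardial 2026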
Abstract:Of the over 7,000 languages spoken in the world, commercial language identification (LID) systems only reliably identify a few hundred in written form. Research-grade systems extend this coverage under certain circumstances, but for most languages coverage remains patchy or nonexistent. This position paper argues that this situation is largely self-imposed. In particular, it arises from a persistent framing of LID as decontextualized text classification, which obscures the central role of prior probability estimation and is reinforced by institutional incentives that favor global, fixed-prior models. We argue that improving coverage for tail languages requires rethinking LID as a routing problem and developing principled ways to incorporate environmental cues that make languages locally plausible.
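The role of the prior can be made concrete with Bayes' rule: the same text likelihoods lead to different decisions under a global prior and under a context-specific one. The language labels, probabilities, and function names below are invented for illustration and do not come from the paper.

```python
def posterior(likelihoods, prior):
    """Combine per-language text likelihoods with a prior via Bayes' rule."""
    joint = {lang: likelihoods[lang] * prior.get(lang, 0.0) for lang in likelihoods}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("prior assigns zero mass to every candidate language")
    return {lang: p / total for lang, p in joint.items()}


def most_probable(distribution):
    """Return the language with the highest posterior probability."""
    return max(distribution, key=distribution.get)


if __name__ == "__main__":
    # Hypothetical likelihoods of one short text under two closely related
    # languages: a widely covered one (lang_A) and a tail language (lang_B).
    likelihoods = {"lang_A": 0.55, "lang_B": 0.45}

    # A fixed global prior dominated by the head language.
    global_prior = {"lang_A": 0.99, "lang_B": 0.01}
    # A local prior, e.g. for a source known to publish mostly in lang_B.
    local_prior = {"lang_A": 0.30, "lang_B": 0.70}

    print(most_probable(posterior(likelihoods, global_prior)))
    print(most_probable(posterior(likelihoods, local_prior)))
```

With nearly uninformative likelihoods, the decision is driven almost entirely by the prior, which is one reading of why a fixed global prior leaves tail languages effectively unreachable.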
[NLP-8] CoRefine: Confidence-Guided Self-Refinement for Adaptive Test-Time Compute
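[Quick read]: This paper addresses the heavy compute cost that large language models (LLMs) incur when they rely on parallel decoding (for example, 512 samples) to improve reasoning accuracy. The proposed method, CoRefine, uses a lightweight 211k-parameter Conv1D controller on top of a frozen LLM. The controller reads full-trace confidence signals to decide whether to halt, re-examine, or try a different approach, enabling targeted self-correction with an average of 2.7 refinement steps per problem and roughly 190-fold token reduction relative to 512-sample baselines. By treating confidence as a control signal rather than a correctness guarantee, the method offers a modular primitive for scalable reasoning and agentic settings.
Link: https://arxiv.org/abs/2602.08948
Authors: Chen Jin, Ryutaro Tanno, Tom Diethe, Philip Teare
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: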
Abstract:Large Language Models (LLMs) often rely on test-time scaling via parallel decoding (for example, 512 samples) to boost reasoning accuracy, but this incurs substantial compute. We introduce CoRefine, a confidence-guided self-refinement method that achieves competitive accuracy using a fraction of the tokens via a lightweight 211k-parameter Conv1D controller atop a frozen LLM. The controller consumes full-trace confidence to decide whether to halt, re-examine, or try a different approach, enabling targeted self-correction with an average of 2.7 refinement steps per problem and roughly 190-fold token reduction relative to 512-sample baselines. Across diverse reasoning benchmarks and three open-source models, the controller achieves 92.6 percent precision when it confidently halts, indicating that confidence dynamics reliably signal correctness without ground-truth verification. We extend this to CoRefine-Tree, a hybrid sequential-parallel variant that adaptively balances exploration and exploitation, with easy serving integration and verifier compatibility. By treating confidence as a control signal rather than a correctness guarantee, CoRefine provides a modular primitive for scalable reasoning and agentic settings with imperfect verifiers.
[NLP-9] GitSearch: Enhancing Community Notes Generation with Gap-Informed Targeted Search
Link: https://arxiv.org/abs/2602.08945
Authors: Sahajpreet Singh, Kokil Jaidka, Min-Yen Kan
Affiliation: National University of Singapore
Categories: Computation and Language (cs.CL); Computers and Society (cs.CY)
Comments: 18 pages, 11 figures, 7 tables
[NLP-10] Is Reasoning Capability Enough for Safety in Long-Context Language Models?
Link: https://arxiv.org/abs/2602.08874
Authors: Yu Fu, Haz Sameen Shahgir, Huanli Gong, Zhipeng Wei, N. Benjamin Erichson, Yue Dong
Affiliation: Unknown
Categories: Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Comments: 25 pages, 7 figures
[NLP-11] Understanding Dynamic Compute Allocation in Recurrent Transformers
Link: https://arxiv.org/abs/2602.08864
Authors: Ibraheem Muhammad Moosa, Suhas Lohit, Ye Wang, Moitreya Chatterjee, Wenpeng Yin
Affiliation: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:
[NLP-12] Discovering Interpretable Algorithms by Decompiling Transformers to RASP
链接: https://arxiv.org/abs/2602.08857
作者: Xinting Huang,Aleksandra Bakalova,Satwik Bhattamishra,William Merrill,Michael Hahn
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注: 101 pages, 92 figures
[NLP-13] WildReward: Learning Reward Models from In-the-Wild Human Interactions
链接: https://arxiv.org/abs/2602.08829
作者: Hao Peng,Yunjia Qi,Xiaozhi Wang,Zijun Yao,Lei Hou,Juanzi Li
机构: Tsinghua University (清华大学); Zhipu AI (智谱AI)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:
[NLP-14] Affective Flow Language Model for Emotional Support Conversation
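Quick read: This paper addresses the lack of effective supervision for strategy decisions in complex multi-turn emotional support conversation (ESC). Existing alignment methods rely on sparse outcome-level signals and give little fine-grained guidance for intermediate steps, so models struggle to learn consistent and effective support strategies across turns. The key to the solution is the AFlow framework, which models a continuous affective flow along multi-turn trajectories and applies fine-grained supervision to dialogue prefixes, allowing it to estimate intermediate utility over searched trajectories and learn preference-consistent strategy transitions. A subpath-level flow-balance objective then propagates preference signals to intermediate states, improving strategy coherence and empathetic response quality.
Link: https://arxiv.org/abs/2602.08826
Authors: Chenghui Zou,Ning Wang,Tiesunlong Shen,Luwei Xiao,Chuan Ma,Xiangpeng Li,Rui Mao,Erik Cambria
Affiliations: Chongqing University; National University of Singapore; Nanyang Technological University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 19 pages, 7 figures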
Abstract:Large language models (LLMs) have been widely applied to emotional support conversation (ESC). However, complex multi-turn support remains challenging. This is because existing alignment schemes rely on sparse outcome-level signals, thus offering limited supervision for intermediate strategy decisions. To fill this gap, this paper proposes affective flow language model for emotional support conversation (AFlow), a framework that introduces fine-grained supervision on dialogue prefixes by modeling a continuous affective flow along multi-turn trajectories. AFlow can estimate intermediate utility over searched trajectories and learn preference-consistent strategy transitions. To improve strategy coherence and empathetic response quality, a subpath-level flow-balance objective is presented to propagate preference signals to intermediate states. Experiment results show consistent and significant improvements over competitive baselines in diverse emotional contexts. Remarkably, AFlow with a compact open-source backbone outperforms proprietary LMMs such as GPT-4o and Claude-3.5 on major ESC metrics. Our code is available at this https URL.
[NLP-15] Bayesian Preference Learning for Test-Time Steerable Reward Models
Link: https://arxiv.org/abs/2602.08819
Authors: Jiwoo Hong,Shao Tang,Zhipeng Wang
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments: Preprint
[NLP-16] The Use of AI Tools to Develop and Validate Q-Matrices
Quick read: This paper addresses the time-consuming, labor-intensive construction of Q-matrices in cognitive diagnostic modeling (CDM) and explores whether generative AI can assist with the task. The key to the solution is to have different large language models (LLMs) generate Q-matrices from the same training materials given to human experts, then use Cohen's kappa to quantify agreement among the AI-generated Q-matrices, a validated Q-matrix, and the human experts' judgments, thereby assessing the feasibility and performance of AI-assisted Q-matrix construction.
Link: https://arxiv.org/abs/2602.08796
Authors: Kevin Fan,Jacquelyn A. Bialo,Hongli Li
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: An earlier version of this study was presented at the Psychometric Society Meeting held in July 2025 in Minneapolis, USA
Abstract:Constructing a Q-matrix is a critical but labor-intensive step in cognitive diagnostic modeling (CDM). This study investigates whether AI tools (i.e., general language models) can support Q-matrix development by comparing AI-generated Q-matrices with a validated Q-matrix from Li and Suen (2013) for a reading comprehension test. In May 2025, multiple AI models were provided with the same training materials as human experts. Agreement among AI-generated Q-matrices, the validated Q-matrix, and human raters’ Q-matrices was assessed using Cohen’s kappa. Results showed substantial variation across AI models, with Google Gemini 2.5 Pro achieving the highest agreement (Kappa = 0.63) with the validated Q-matrix, exceeding that of all human experts. A follow-up analysis in January 2026 using newer AI versions, however, revealed lower agreement with the validated Q-matrix. Implications and directions for future research are discussed.
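As a rough illustration of the agreement measure named in the abstract, the sketch below computes Cohen's kappa between two binary Q-matrices by flattening them into item-attribute cell labels. The matrices, the flattening step, and the function names are assumptions made for this example; the paper's actual Q-matrices and procedure are not reproduced here.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of cells where both raters give the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def flatten(q_matrix):
    """Turn an items-by-attributes matrix into a flat list of cell labels."""
    return [cell for row in q_matrix for cell in row]


# Hypothetical 4-item, 3-attribute Q-matrices (made up for illustration).
reference_q = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
candidate_q = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

kappa = cohens_kappa(flatten(reference_q), flatten(candidate_q))
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement two raters would reach by chance, which matters for Q-matrices because most cells are often 0.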
[NLP-17] LakeHopper: Cross Data Lakes Column Type Annotation through Model Adaptation
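Quick read: This paper addresses how to transfer a column type annotation model built on a pre-trained language model (LM) from a source data lake to a new target data lake while minimizing the annotated data required on the target. The core challenges are the knowledge gap between the source and target data lakes, the selection of informative unannotated target data, and the risk of losing shared source-domain knowledge during fine-tuning. The key to the solution is the LakeHopper framework, which identifies and bridges the knowledge gap through LM interactions, uses a cluster-based data selection scheme to pick the most representative unannotated columns, and adapts to the target data lake gradually through incremental fine-tuning, yielding efficient and stable cross-data-lake transfer.
Link: https://arxiv.org/abs/2602.08793
Authors: Yushi Sun,Xujia Li,Nan Tang,Quanqing Xu,Chuanhui Yang,Lei Chen
Affiliations: HKUST; HKUST(GZ); Ant Group
Categories: Computation and Language (cs.CL); Databases (cs.DB)
Comments: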
Abstract:Column type annotation is vital for tasks like data cleaning, integration, and visualization. Recent solutions rely on resource-intensive language models fine-tuned on well-annotated columns from a particular set of tables, i.e., a source data lake. In this paper, we study whether we can adapt an existing pre-trained LM-based model to a new (i.e., target) data lake to minimize the annotations required on the new data lake. However, challenges exist, including the source-target knowledge gap, selecting informative target data, and fine-tuning without losing shared knowledge. We propose LakeHopper, a framework that identifies and resolves the knowledge gap through LM interactions, employs a cluster-based data selection scheme for unannotated columns, and uses an incremental fine-tuning mechanism that gradually adapts the source model to the target data lake. Our experimental results validate the effectiveness of LakeHopper on two different data lake transfers under both low-resource and high-resource settings.
[NLP-18] Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure
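Quick read: This paper addresses the difficulty of evaluating intermediate reasoning steps in latent chain-of-thought methods, in particular the unclear causal necessity of those steps and how information propagates among them. The key to the solution is to model latent reasoning steps as variables in a structural causal model (SCM) and analyze their causal effects through step-wise do-interventions, revealing the functional division of labor among latent steps, the structure of influence propagation, and representation-level commitment. The approach goes beyond correlation-based probes and offers a causal perspective for interpreting latent reasoning systems and improving training and decoding strategies.
Link: https://arxiv.org/abs/2602.08783
Authors: Zirui Li,Xuefeng Bai,Kehai Chen,Yizhi Li,Jian Yang,Chenghua Lin,Min Zhang
Affiliations: Harbin Institute of Technology, Shenzhen (HITSZ); M-A-P; University of Manchester; Beihang University
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: 22 pages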
Abstract:Latent or continuous chain-of-thought methods replace explicit textual rationales with a number of internal latent steps, but these intermediate computations are difficult to evaluate beyond correlation-based probes. In this paper, we view latent chain-of-thought as a manipulable causal process in representation space by modeling latent steps as variables in a structural causal model (SCM) and analyzing their effects through step-wise $\mathrm{do}$-interventions. We study two representative paradigms (i.e., Coconut and CODI) on both mathematical and general reasoning tasks to investigate three key questions: (1) which steps are causally necessary for correctness and when answers become decidable early; (2) how does influence propagate across steps, and how does this structure compare to explicit CoT; and (3) do intermediate trajectories retain competing answer modes, and how does output-level commitment differ from representational commitment across steps. We find that latent-step budgets behave less like homogeneous extra depth and more like staged functionality with non-local routing, and we identify a persistent gap between early output bias and late representational commitment. These results motivate mode-conditional and stability-aware analyses – and corresponding training/decoding objectives – as more reliable tools for interpreting and improving latent reasoning systems.
[NLP-19] Map of Encoders – Mapping Sentence Encoders using Quantum Relative Entropy
Link: https://arxiv.org/abs/2602.08740
Authors: Gaifan Zhang,Danushka Bollegala
Affiliations: University of Liverpool
Categories: Computation and Language (cs.CL)
Comments:
[NLP-20] PERSPECTRA: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments
Link: https://arxiv.org/abs/2602.08716
Authors: Shangrui Nie,Kian Omoomi,Lucie Flek,Zhixue Zhao,Charles Welch
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: 15 pages, 1 figure
[NLP-21] FactSim: Fact-Checking for Opinion Summarization
Link: https://arxiv.org/abs/2602.08709
Authors: Leandro Anghinoni,Jorge Sanchez
Affiliations: MercadoLibre
Categories: Computation and Language (cs.CL)
Comments: 10 pages, 4 figures
[NLP-22] Challenges in Translating Technical Lectures: Insights from the NPTEL
Link: https://arxiv.org/abs/2602.08698
Authors: Basudha Raje,Sadanand Venkatraman,Nandana TP,Soumyadeepa Das,Polkam Poojitha,M. Vijaykumar,Tanima Bagchi,Hema A. Murthy
Affiliations: Indian Institute of Technology Madras, India; Shiv Nadar University Chennai, India; Central University of Karnataka, India; English and Foreign Languages University, Hyderabad, India
Categories: Computation and Language (cs.CL)
Comments:
[NLP-23] Prototype-Based Disentanglement for Controllable Dysarthric Speech Synthesis
Link: https://arxiv.org/abs/2602.08696
Authors: Haoshen Wang,Xueli Zhong,Bingbing Lin,Jia Huang,Xingduo Pan,Shengxiang Liang,Nizhuan Wang,Wai Ting Siok
Affiliations: The Hong Kong Polytechnic University; Fujian University of Traditional Chinese Medicine
Categories: Sound (cs.SD); Computation and Language (cs.CL)
Comments:
[NLP-24] Old wine in old glasses: Comparing computational and qualitative methods in identifying incivility on Persian Twitter during the #MahsaAmini movement
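Quick read: This paper addresses the challenge of identifying incivility on social media in a low-resource language setting (Persian), with a focus on hate speech detection. The key to the solution is a systematic comparison of three approaches: human qualitative coding, a supervised model based on ParsBERT, and large language models (ChatGPT), evaluated on a large dataset of 47,278 Persian tweets from the #MahsaAmini movement. ParsBERT is substantially more accurate than the ChatGPT models evaluated; the latter perform poorly on both implicit and explicit uncivil content, and prompt language (English vs. Persian) has little effect on their outputs. These findings clarify where each approach applies and where it falls short in low-resource settings.
Link: https://arxiv.org/abs/2602.08688
Authors: Hossein Kermani,Fatemeh Oudlajani,Pardis Yarahmadi,Hamideh Mahdi Soltani,Mohammad Makki,Zahra HosseiniKhoo
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Computers and Society (cs.CY)
Comments: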
Abstract:This paper compares three approaches to detecting incivility in Persian tweets: human qualitative coding, supervised learning with ParsBERT, and large language models (ChatGPT). Using 47,278 tweets from the #MahsaAmini movement in Iran, we evaluate the accuracy and efficiency of each method. ParsBERT substantially outperforms seven evaluated ChatGPT models in identifying hate speech. We also find that ChatGPT struggles not only with subtle cases but also with explicitly uncivil content, and that prompt language (English vs. Persian) does not meaningfully affect its outputs. The study provides a detailed comparison of these approaches and clarifies their strengths and limitations for analyzing hate speech in a low-resource language context.
[NLP-25] Learning to Judge: LLMs Designing and Applying Evaluation Rubrics EACL2026
Link: https://arxiv.org/abs/2602.08672
Authors: Clemencia Siro,Pourya Aliannejadi,Mohammad Aliannejadi
Affiliations: Centrum Wiskunde & Informatica (CWI); Shahid Beheshti University; University of Amsterdam
Categories: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: Accepted at EACL 2026 Findings
[NLP-26] Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models
Link: https://arxiv.org/abs/2602.08658
Authors: Mingzi Cao,Xingwei Tan,Mahmud Akhter,Marco Valentino,Maria Liakata,Xi Wang,Nikolaos Aletras
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:
[NLP-27] We Should Separate Memorization from Copyright
Link: https://arxiv.org/abs/2602.08632
Authors: Adi Haviv,Niva Elkin-Koren,Uri Hacohen,Roi Livni,Shay Moran
Affiliations: Unknown
Categories: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments:
[NLP-28] Do Multilingual LLMs have specialized language heads?
Link: https://arxiv.org/abs/2602.08625
Authors: Muhammad Naufil
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:
[NLP-29] VocalNet-MDM: Accelerating Streaming Speech LLM via Self-Distilled Masked Diffusion Modeling
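Quick read: This paper addresses the low generation efficiency and exposure bias caused by the autoregressive (AR) paradigm that current end-to-end Speech Large Language Models (Speech LLMs) generally adopt. The core solution is a non-autoregressive Masked Diffusion Modeling (MDM) framework, instantiated as the VocalNet-MDM model. The key innovations are: (1) Hierarchical Block-wise Masking, which aligns training with the progressively masked states encountered at inference and so reduces the training-inference mismatch; and (2) Iterative Self-Distillation, which compresses multi-step refinement into fewer steps and lowers latency. Trained on only 6K hours of speech data, the method achieves a 3.7× to 10× decoding speedup and reduces first-chunk latency by 34%, while keeping recognition accuracy competitive and reaching state-of-the-art text quality and speech naturalness.
Link: https://arxiv.org/abs/2602.08607
Authors: Ziyang Cheng,Yuhao Wang,Heyang Liu,Ronghua Wu,Qunshan Gu,Yanfeng Wang,Yu Wang
Affiliations: Shanghai Jiao Tong University; Ant Group
Categories: Computation and Language (cs.CL); Sound (cs.SD)
Comments: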
Abstract:Recent Speech Large Language Models (LLMs) have achieved impressive capabilities in end-to-end speech interaction. However, the prevailing autoregressive paradigm imposes strict serial constraints, limiting generation efficiency and introducing exposure bias. In this paper, we investigate Masked Diffusion Modeling (MDM) as a non-autoregressive paradigm for speech LLMs and introduce VocalNet-MDM. To adapt MDM for streaming speech interaction, we address two critical challenges: training-inference mismatch and iterative overhead. We propose Hierarchical Block-wise Masking to align training objectives with the progressive masked states encountered during block diffusion decoding, and Iterative Self-Distillation to compress multi-step refinement into fewer steps for low-latency inference. Trained on a limited scale of only 6K hours of speech data, VocalNet-MDM achieves a $3.7\times$–$10\times$ decoding speedup and reduces first-chunk latency by 34% compared to AR baselines. It maintains competitive recognition accuracy while achieving state-of-the-art text quality and speech naturalness, demonstrating that MDM is a promising and scalable alternative for low-latency, efficient speech LLMs.
[NLP-30] Beyond Scalar Scores: Reinforcement Learning for Error-Aware Quality Estimation of Machine Translation
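Quick read: This paper addresses two core problems. First, existing Quality Estimation (QE) methods rely mainly on scalar scores and give no explicit description of translation errors, so models have difficulty capturing error types and their effect on overall quality. Second, for low-resource language pairs such as English to Malayalam, scarce annotated data limits the performance of existing QE models. The key to the solution is the first fine-grained segment-level QE dataset for English to Malayalam, containing human-annotated Direct Assessment (DA) scores and Translation Quality Remarks (TQR), together with ALOPE-RL, a policy-based reinforcement learning framework that trains lightweight adapters using error-aware reward signals derived from the DA scores and TQR. The method lets large language models (LLMs) reason from contextual error information under limited data and compute budgets, and it outperforms both larger LLM baselines and leading encoder-based models on QE.
Link: https://arxiv.org/abs/2602.08600
Authors: Archchana Sindhujan,Girish A. Koushik,Shenbin Qian,Diptesh Kanojia,Constantin Orăsan
Affiliations: Institute for People-Centred AI, University of Surrey; NICE Research Group, University of Surrey; Department of Informatics, University of Oslo; Centre for Translation Studies, University of Surrey
Categories: Computation and Language (cs.CL)
Comments: Currently this article is under review for Natural Language Processing Journal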
Abstract:Quality Estimation (QE) aims to assess the quality of machine translation (MT) outputs without relying on reference translations, making it essential for real-world, large-scale MT evaluation. Large Language Models (LLMs) have shown significant promise in advancing the field of quality estimation of machine translation. However, most of the QE approaches solely rely on scalar quality scores, offering no explicit information about the translation errors that should drive these judgments. Moreover, for low-resource languages where annotated QE data is limited, existing approaches struggle to achieve reliable performance. To address these challenges, we introduce the first segment-level QE dataset for English to Malayalam, a severely resource-scarce language pair in the QE domain, comprising human-annotated Direct Assessment (DA) scores and Translation Quality Remarks (TQR), which are short, contextual, free-form annotator comments that describe translation errors. We further introduce ALOPE-RL, a policy-based reinforcement learning framework that trains efficient adapters based on policy rewards derived from DA score and TQR. Integrating error-aware rewards with ALOPE-RL enables LLMs to reason about translation quality beyond numeric scores. Despite being trained on a small-scale QE dataset, ALOPE-RL achieves state-of-the-art performance on English to Malayalam QE using compact LLMs (≤4B parameters) fine-tuned with LoRA and 4-bit quantization, outperforming both larger LLM-based baselines and leading encoder-based QE models. Our results demonstrate that error-aware, policy-based learning can deliver strong QE performance under limited data and compute budgets. We release our dataset, code, and trained models to support future research.
[NLP-31] Automating Computational Reproducibility in Social Science: Comparing Prompt-Based and Agent-Based Approaches
Link: https://arxiv.org/abs/2602.08561
Authors: Syed Mehtab Hussain Shah,Frank Hopfgartner,Arnim Bleier
Affiliations: GESIS – Leibniz Institute for the Social Sciences; University of Koblenz
Categories: Software Engineering (cs.SE); Computation and Language (cs.CL)
Comments: 12 pages, 5 figures. Submitted to ACM conference
[NLP-32] How Do Language Models Understand Tables? A Mechanistic Analysis of Cell Location
Link: https://arxiv.org/abs/2602.08548
Authors: Xuanliang Zhang,Dingzirui Wang,Keyan Xu,Qingfu Zhu,Wanxiang Che
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:
[NLP-33] Learning Self-Correction in Vision-Language Models via Rollout Augmentation
Link: https://arxiv.org/abs/2602.08503
Authors: Yi Ding,Ziliang Qiu,Bolian Li,Ruqi Zhang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: 17 pages
[NLP-34] Characterizing, Evaluating, and Optimizing Complex Reasoning
Link: https://arxiv.org/abs/2602.08498
Authors: Haoran Zhang,Yafu Li,Zhi Wang,Zhilin Wang,Shunkai Zhang,Xiaoye Qu,Yu Cheng
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: Code and data are available at this https URL
[NLP-35] Beyond Correctness: Learning Robust Reasoning via Transfer
Link: https://arxiv.org/abs/2602.08489
Authors: Hyunseok Lee,Soheil Abbasloo,Jihoon Tack,Jinwoo Shin
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments:
[NLP-36] Large Language Models and Impossible Language Acquisition: “False Promise” or an Overturn of our Current Perspective towards AI
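Quick read: This paper asks whether generative AI has the causal reasoning and self-correction mechanisms that characterize human language learning, that is, whether LLMs can distinguish possible from impossible languages. The core challenge is to test whether Chomsky's critique of LLMs as mere pattern predictors holds. The key to the solution is to construct syntactically impossible languages (for example, by reversing sentences and by adding negation based on word-count parity) and to run two rounds of controlled experiments on GPT-2 small and LSTM models, testing how well each learns possible and impossible languages. Statistical analysis (Welch's t-test) shows that GPT-2 small performs significantly worse on every impossible language than on the possible language (p < 0.001), whereas the LSTM results are consistent with Chomsky's argument, which highlights the irreplaceable role of the Transformer architecture's evolution. The study proposes a theoretical shift from Chomsky's "rationalist-romantics" paradigm toward functionalism and empiricism, offering a new framework for LLM research.
Link: https://arxiv.org/abs/2602.08437
Authors: Ziyan Wang,Longlong Ma
Affiliations: Xicheng Academy; Institute of Software, University of Chinese Academy of Sciences
Categories: Computation and Language (cs.CL)
Comments: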
Abstract:In Chomsky’s provocative critique “The False Promise of ChatGPT,” Large Language Models (LLMs) are characterized as mere pattern predictors that do not acquire languages via intrinsic causal and self-correction structures like humans, and therefore are not able to distinguish impossible languages. It stands as a representative in a fundamental challenge to the intellectual foundations of AI, for it integrally synthesizes major issues in methodologies within LLMs and possesses an iconic a priori rationalist perspective. We examine this famous critique from both the perspective in pre-existing literature of linguistics and psychology as well as a research based on an experiment inquiring the capacity of learning both possible and impossible languages among LLMs. We constructed a set of syntactically impossible languages by applying certain transformations to English. These include reversing whole sentences, and adding negation based on word-count parity. Two rounds of controlled experiments were each conducted on GPT-2 small models and long short-term memory (LSTM) models. Statistical analysis (Welch’s t-test) shows GPT-2 small models underperform in learning all of the impossible languages compared to their performance on the possible language (p < 0.001). On the other hand, LSTM models’ performance tallies with Chomsky’s argument, suggesting the irreplaceable role of the evolution of transformer architecture. Based on theoretical analysis and empirical findings, we propose a new vision within Chomsky’s theory towards LLMs, and a shift of theoretical paradigm outside Chomsky, from his “rationalist-romantics” paradigm to functionalism and empiricism in LLMs research.
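For readers unfamiliar with the test named in the abstract, here is a minimal sketch of Welch's t statistic and its Welch-Satterthwaite degrees of freedom, using only the Python standard library. The two samples are made-up numbers standing in for per-run scores on a possible and an impossible language; they are not the paper's data, and the function name is an assumption of this example.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-group squared standard errors, using unbiased sample variances.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical scores (lower is better), invented purely for illustration.
possible_language = [1.0, 2.0, 3.0, 4.0, 5.0]
impossible_language = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, dof = welch_t(possible_language, impossible_language)
print(f"t = {t_stat:.3f}, dof = {dof:.2f}")
```

Unlike Student's t-test, Welch's version does not assume equal variances in the two groups, which suits comparisons across differently behaved training runs. Turning the statistic into a p-value requires the t distribution's CDF, which this sketch leaves out to stay dependency-free.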
[NLP-37] Prism: Spectral-Aware Block-Sparse Attention
Link: https://arxiv.org/abs/2602.08426
Authors: Xinghao Wang,Pengyu Wang,Xiaoran Liu,Fangxu Liu,Jason Chu,Kai Song,Xipeng Qiu
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments:
[NLP-38] EAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration
Link: https://arxiv.org/abs/2602.08404
Authors: Linye Wei,Zixiang Luo,Pingzhi Tang,Meng Li
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:
[NLP-39] Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning
Link: https://arxiv.org/abs/2602.08382
Authors: Zhuoen Chen,Dongfang Li,Meishan Zhang,Baotian Hu,Min Zhang
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 26 pages, 7 figures. Code and models will be released
[NLP-40] Reinforcement Learning with Backtracking Feedback NEURIPS2025
Link: https://arxiv.org/abs/2602.08377
Authors: Bilgehan Sel,Vaishakh Keshava,Phillip Wallis,Lukas Rutishauser,Ming Jin,Dingcheng Li
Affiliations: Google; Virginia Tech; Google DeepMind
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: NeurIPS 2025
[NLP-41] ViGoEmotions: A Benchmark Dataset For Fine-grained Emotion Detection on Vietnamese Texts EACL2026
Link: https://arxiv.org/abs/2602.08371
Authors: Hung Quang Tran,Nam Tien Pham,Son T. Luu,Kiet Van Nguyen
Affiliations: University of Information Technology; Vietnam National University, Ho Chi Minh City
Categories: Computation and Language (cs.CL)
Comments: Accepted as main paper at EACL 2026
[NLP-42] MemAdapter: Fast Alignment across Agent Memory Paradigms via Generative Subgraph Retrieval
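Quick read: This paper addresses a limitation of current agents based on large language models (LLMs): different memory paradigms (such as explicit, parametric, and latent memory) are designed in isolation, with tightly coupled retrieval methods that make cross-paradigm generalization and fusion difficult. The key to the solution is MemAdapter, a retrieval framework that unifies heterogeneous memory paradigms and aligns them quickly through a two-stage training strategy: first, a generative subgraph retriever is trained in a unified memory space; then a lightweight alignment module is trained with contrastive learning so that the retriever can adapt to unseen memory paradigms. The design makes memory retrieval more flexible and sharply reduces the compute cost of cross-paradigm alignment. Experiments show that alignment completes within 13 minutes on a single GPU, outperforms the original memory retrievers with less than 5% of their training compute, and supports zero-shot fusion across paradigms.
Link: https://arxiv.org/abs/2602.08369
Authors: Xin Zhang,Kailai Yang,Chenyue Li,Hao Li,Qiyu Wei,Jun’ichi Tsujii,Sophia Ananiadou
Affiliations: The University of Manchester; Stanford University; Imperial College London; National Institute of Advanced Industrial Science and Technology
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: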
Abstract:Memory mechanism is a core component of LLM-based agents, enabling reasoning and knowledge discovery over long-horizon contexts. Existing agent memory systems are typically designed within isolated paradigms (e.g., explicit, parametric, or latent memory) with tightly coupled retrieval methods that hinder cross-paradigm generalization and fusion. In this work, we take a first step toward unifying heterogeneous memory paradigms within a single memory system. We propose MemAdapter, a memory retrieval framework that enables fast alignment across agent memory paradigms. MemAdapter adopts a two-stage training strategy: (1) training a generative subgraph retriever from the unified memory space, and (2) adapting the retriever to unseen memory paradigms by training a lightweight alignment module through contrastive learning. This design improves the flexibility for memory retrieval and substantially reduces alignment cost across paradigms. Comprehensive experiments on three public evaluation benchmarks demonstrate that the generative subgraph retriever consistently outperforms five strong agent memory systems across three memory paradigms and agent model scales. Notably, MemAdapter completes cross-paradigm alignment within 13 minutes on a single GPU, achieving superior performance over original memory retrievers with less than 5% of training compute. Furthermore, MemAdapter enables effective zero-shot fusion across memory paradigms, highlighting its potential as a plug-and-play solution for agent memory systems.
[NLP-43] WorldTravel: A Realistic Multimodal Travel-Planning Benchmark with Tightly Coupled Constraints
Link: https://arxiv.org/abs/2602.08367
Authors: Zexuan Wang,Chenghao Yang,Yingqi Que,Zhenzhu Yang,Huaqing Yuan,Yiwen Wang,Zhengxuan Jiang,Shengjie Fang,Zhenhe Wu,Zhaohui Wang,Zhixin Yao,Jiashuo Liu,Jincheng Ren,Yuzhen Li,Yang Yang,Jiaheng Liu,Jian Yang,Zaiyuan Wang,Ge Zhang,Zhoufutu Wen,Wenhao Huang
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:
[NLP-44] ManifoldKV: Training-Free KV Cache Compression via Euclidean Outlier Detection
Link: https://arxiv.org/abs/2602.08343
Authors: Debajyoti Datta,Trishala Neeraj,Bibek Paudel,Vyom Sharma,Subhabrata Mukherjee
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: 18 pages, 5 figures, 18 tables
[NLP-45] UReason: Benchmarking the Reasoning Paradox in Unified Multimodal Models
Link: https://arxiv.org/abs/2602.08336
Authors: Cheng Yang,Chufan Shi,Bo Shui,Yaokang Wu,Muzi Tao,Huijuan Wang,Ivan Yee Lee,Yong Liu,Xuezhe Ma,Taylor Berg-Kirkpatrick
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Comments: Project page: this https URL
[NLP-46] Latent Reasoning with Supervised Thinking States
Link: https://arxiv.org/abs/2602.08332
Authors: Ido Amos,Avi Caciularu,Mor Geva,Amir Globerson,Jonathan Herzig,Lior Shani,Idan Szpektor
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:
[NLP-47] An Attention-over-Attention Generative Model for Joint Multiple Intent Detection and Slot Filling
Link: https://arxiv.org/abs/2602.08322
Authors: Wei Zhu
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:
[NLP-48] Improving Data and Reward Design for Scientific Reasoning in Large Language Models
Link: https://arxiv.org/abs/2602.08321
Authors: Zijie Chen,Zhenghao Lin,Xiao Liu,Zhenzhong Lan,Yeyun Gong,Peng Cheng
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:
[NLP-49] JUSTICE: Judicial Unified Synthesis Through Intermediate Conclusion Emulation for Automated Judgment Document Generation
Link: https://arxiv.org/abs/2602.08305
Authors: Binglin Wu,Yingyi Zhang,Xianneng Li
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:
[NLP-50] When Does Context Help? Error Dynamics of Contextual Information in Large Language Models
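Quick read: This paper addresses the unclear theoretical mechanism by which large language models (LLMs) benefit from contextual information at inference time (such as demonstrations, retrieved knowledge, or interaction history), especially the lack of a unified analysis beyond specific settings such as in-context learning (ICL). The key to the solution is a unified theoretical framework that characterizes the effect of arbitrary contextual information through output error dynamics. For a single-layer Transformer, the paper proves that the context-conditioned error vector decomposes into the sum of a baseline error vector and a contextual correction vector, which yields necessary geometric conditions for error reduction: the contextual correction must align with the negative baseline error and satisfy a norm constraint. It further shows that the norm of the contextual correction has an explicit upper bound determined by context-query relevance and complementarity, and that these results extend to multi-context and multi-layer Transformers. Experiments validate the theory and motivate a principled context selection strategy that improves performance by 0.6%.
Link: https://arxiv.org/abs/2602.08294
Authors: Dingzirui Wang,Xuanliang Zhang,Keyan Xu,Qingfu Zhu,Wanxiang Che,Yang Deng
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: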
Abstract:Contextual information at inference time, such as demonstrations, retrieved knowledge, or interaction history, can substantially improve large language models (LLMs) without parameter updates, yet its theoretical role remains poorly understood beyond specific settings such as in-context learning (ICL). We present a unified theoretical framework for analyzing the effect of arbitrary contextual information in Transformer-based LLMs. Our analysis characterizes contextual influence through output error dynamics. In a single-layer Transformer, we prove that the context-conditioned error vector decomposes additively into the baseline error vector and a contextual correction vector. This yields necessary geometric conditions for error reduction: the contextual correction must align with the negative baseline error and satisfy a norm constraint. We further show that the contextual correction norm admits an explicit upper bound determined by context-query relevance and complementarity. These results extend to multi-context and multi-layer Transformers. Experiments across ICL, retrieval-augmented generation, and memory evolution validate our theory and motivate a principled context selection strategy that improves performance by 0.6%.
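The additive decomposition described in the abstract suggests a short derivation of why alignment and a norm constraint appear together. The notation below ($e_{\mathrm{base}}$, $\Delta$, Euclidean norms) is assumed for this sketch; the paper's exact statement and constants may differ.

```latex
% Assumed notation (not taken from the paper): e_base is the baseline error
% vector, \Delta is the contextual correction, and norms are Euclidean.
\begin{align*}
e_{\mathrm{ctx}} &= e_{\mathrm{base}} + \Delta, \\
\lVert e_{\mathrm{ctx}} \rVert^2
  &= \lVert e_{\mathrm{base}} \rVert^2
   + 2\langle e_{\mathrm{base}}, \Delta \rangle
   + \lVert \Delta \rVert^2, \\
\lVert e_{\mathrm{ctx}} \rVert < \lVert e_{\mathrm{base}} \rVert
  &\iff \langle \Delta, -e_{\mathrm{base}} \rangle
        > \tfrac{1}{2}\lVert \Delta \rVert^2 .
\end{align*}
```

Under these assumptions the correction must have a positive inner product with the negative baseline error, and by the Cauchy-Schwarz inequality the condition can only hold when $\lVert \Delta \rVert < 2\lVert e_{\mathrm{base}} \rVert$, so a correction that is well aligned but too large still increases the error.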
[NLP-51] Knowledge Augmented Entity and Relation Extraction for Legal Documents with Hypergraph Neural Network
Link: https://arxiv.org/abs/2602.08289
Authors: Binglin Wu,Xianneng Li
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:
[NLP-52] New Skills or Sharper Primitives? A Probabilistic Perspective on the Emergence of Reasoning in RLVR
Link: https://arxiv.org/abs/2602.08281
Authors: Zhilin Wang,Yafu Li,Shunkai Zhang,Zhi Wang,Haoran Zhang,Xiaoye Qu,Yu Cheng
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: 15 pages
[NLP-53] Language Modeling and Understanding Through Paraphrase Generation and Detection
Link: https://arxiv.org/abs/2602.08274
Authors: Jan Philip Wahle
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: PhD dissertation, University of Göttingen Germany, 2025. 182 pages
[NLP-54] Language Predicts Identity Fusion Across Cultures and Reveals Divergent Pathways to Violence
Link: https://arxiv.org/abs/2602.08252
Authors: Devin R. Wright,Justin E. Lane,F. LeRon Shults
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: Initial submitted version
[NLP-55] On convexity and efficiency in semantic systems
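Quick read: This paper asks why human semantic category systems in domains such as color naming often form convex partitions, and whether that structure is causally or theoretically tied to semantic efficiency. Prior work observed that convexity and efficiency co-occur, but the relation between them remained unclear. The key to the solution is to formalize semantic efficiency with the Information Bottleneck (IB) framework and to combine analytical results with empirical comparisons. The analysis shows that convexity and efficiency, although empirically similar, are distinct: optimally efficient systems are mostly convex, but not every convex system is efficient. Moreover, efficiency discriminates attested color naming systems from hypothetical variants better than convexity does, and it explains a range of phenomena that convexity cannot account for. The study therefore argues for efficiency as the core mechanism behind the evolution and typology of semantic categories.
Link: https://arxiv.org/abs/2602.08238
Authors: Nathaniel Imel,Noga Zaslavsky
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: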
Abstract:There are two widely held characterizations of human semantic category systems: (1) they form convex partitions of conceptual spaces, and (2) they are efficient for communication. While prior work observed that convexity and efficiency co-occur in color naming, the analytical relation between them and why they co-occur have not been well understood. We address this gap by combining analytical and empirical analyses that build on the Information Bottleneck (IB) framework for semantic efficiency. First, we show that convexity and efficiency are distinct in the sense that neither entails the other: there are convex systems which are inefficient, and optimally-efficient systems that are non-convex. Crucially, however, the IB-optimal systems are mostly convex in the domain of color naming, explaining the main empirical basis for the convexity approach. Second, we show that efficiency is a stronger predictor for discriminating attested color naming systems from hypothetical variants, with convexity adding negligible improvement on top of that. Finally, we discuss a range of empirical phenomena that convexity cannot account for but efficiency can. Taken together, our work suggests that while convexity and efficiency can yield similar structural observations, they are fundamentally distinct, with efficiency providing a more comprehensive account of semantic typology.
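For context, the Information Bottleneck notion of semantic efficiency that the abstract builds on is usually written as a trade-off between the complexity of a naming system and its accuracy. The formulation below is the standard one from the IB literature on semantic systems, with symbols chosen for this sketch; it is not copied from this paper.

```latex
% Assumed notation: M = speaker meanings, W = words, U = world states,
% q(w|m) = naming system (encoder), \beta \ge 0 = trade-off parameter.
\[
\mathcal{F}_{\beta}[q] \;=\; I_q(M;W) \;-\; \beta\, I_q(W;U)
\]
```

A system counts as efficient to the extent that it comes close to minimizing $\mathcal{F}_{\beta}$ for some $\beta$: low $I_q(M;W)$ means a compact lexicon, and high $I_q(W;U)$ means words stay informative about the world. Convexity, by contrast, is a geometric property of the categories in conceptual space, which is why neither property has to imply the other.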
[NLP-56] Document Reconstruction Unlocks Scalable Long-Context RLVR
Link: https://arxiv.org/abs/2602.08237
Authors: Yao Xiao,Lei Wang,Yue Deng,Guanzheng Chen,Ziqi Jin,Jung-jae Kim,Xiaoli Li,Roy Ka-wei Lee,Lidong Bing
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:
[NLP-57] When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
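[Quick read]: This paper addresses the unreliability of Multimodal Large Language Models (MLLMs) on visual spatial reasoning tasks that depend on a particular viewpoint. It focuses on how world models can be used effectively for test-time visual imagination when a judgment must be made from unseen or alternative viewpoints. The core challenges are when visual imagination is needed, how the amount of imagination affects performance, and whether improper use introduces misleading evidence that lowers accuracy. The key to the solution is AVIC (Adaptive Visual Imagination Control), an adaptive test-time framework that explicitly assesses whether the current static visual evidence is sufficient, then decides dynamically whether to invoke visual imagination and how strongly to scale it. This gives selective control over imagination as a resource. Experiments on spatial reasoning benchmarks (SAT, MMSI) and an embodied navigation benchmark (R2R) show that the method substantially reduces world-model calls and language token usage while matching or exceeding fixed imagination strategies, which supports the role of selective control in efficient and reliable reasoning.
Link: https://arxiv.org/abs/2602.08236
Authors: Shoubin Yu,Yue Zhang,Zun Wang,Jaehong Yoon,Huaxiu Yao,Mingyu Ding,Mohit Bansal
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: the first two authors are equally contributed. Project page: this https URL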
Abstract:Despite rapid progress in Multimodal Large Language Models (MLLMs), visual spatial reasoning remains unreliable when correct answers depend on how a scene would appear under unseen or alternative viewpoints. Recent work addresses this by augmenting reasoning with world models for visual imagination, but questions such as when imagination is actually necessary, how much of it is beneficial, and when it becomes harmful, remain poorly understood. In practice, indiscriminate imagination can increase computation and even degrade performance by introducing misleading evidence. In this work, we present an in-depth analysis of test-time visual imagination as a controllable resource for spatial reasoning. We study when static visual evidence is sufficient, when imagination improves reasoning, and how excessive or unnecessary imagination affects accuracy and efficiency. To support this analysis, we introduce AVIC, an adaptive test-time framework with world models that explicitly reasons about the sufficiency of current visual evidence before selectively invoking and scaling visual imagination. Across spatial reasoning benchmarks (SAT, MMSI) and an embodied navigation benchmark (R2R), our results reveal clear scenarios where imagination is critical, marginal, or detrimental, and show that selective control can match or outperform fixed imagination strategies with substantially fewer world-model calls and language tokens. Overall, our findings highlight the importance of analyzing and controlling test-time imagination for efficient and reliable spatial reasoning.
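The selective control described in the abstract can be pictured as a gate placed in front of the world model. The following is a minimal sketch of that idea, not AVIC's actual implementation: the callables `is_sufficient`, `imagine_view`, and `answer`, and the `max_views` budget, are hypothetical stand-ins introduced only for illustration.

```python
from typing import Callable, List, Tuple


def adaptive_imagination(
    question: str,
    views: List[str],
    is_sufficient: Callable[[str, List[str]], bool],  # hypothetical sufficiency check
    imagine_view: Callable[[str, List[str]], str],    # hypothetical world-model call
    answer: Callable[[str, List[str]], str],          # hypothetical reasoner
    max_views: int = 3,                               # assumed imagination budget
) -> Tuple[str, int]:
    """Answer a question, calling the world model only while evidence is judged insufficient."""
    evidence = list(views)
    calls = 0
    # Stop as soon as the evidence is judged sufficient or the budget is spent.
    while calls < max_views and not is_sufficient(question, evidence):
        evidence.append(imagine_view(question, evidence))
        calls += 1
    return answer(question, evidence), calls
```

Returning the number of world-model calls alongside the answer makes it easy to compare such a gate against a fixed strategy that always imagines `max_views` views.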
[NLP-58] When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents
Link: https://arxiv.org/abs/2602.08235
Authors: Jaylen Jones,Zhehao Zhang,Yuting Ning,Eric Fosler-Lussier,Pierre-Luc St-Charles,Yoshua Bengio,Dawn Song,Yu Su,Huan Sun
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Comments: Project Homepage: this https URL
Information Retrieval
[IR-0] Automatic In-Domain Exemplar Construction and LLM-Based Refinement of Multi-LLM Expansions for Query Expansion
Link: https://arxiv.org/abs/2602.08917
Authors: Minghan Li,Ercong Nie,Siqi Zhao,Tongna Chen,Huiping Huang,Guodong Zhou
Affiliations: Unknown
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Comments:
[IR-1] OmniReview: A Large-scale Benchmark and LLM-enhanced Framework for Realistic Reviewer Recommendation
Link: https://arxiv.org/abs/2602.08896
Authors: Yehua Huang,Penglei Sun,Zebin Chen,Zhenheng Tang,Xiaowen Chu
Affiliations: The Hong Kong University of Science and Technology (Guangzhou); The Hong Kong University of Science and Technology
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Comments:
[IR-2] Contrastive Learning for Diversity-Aware Product Recommendations in Retail
Link: https://arxiv.org/abs/2602.08886
Authors: Vasileios Karlis,Ezgi Yıldırım,David Vos,Maarten de Rijke
Affiliations: University of Amsterdam; IKEA Retail (Ingka Group)
Categories: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Comments:
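[IR-3] Whose Name Comes Up? Benchmarking and Intervention-Based Auditing of LLM-Based Scholar Recommendation
[Quick read]: This paper addresses distorted evaluation results for Large Language Models (LLMs) in academic expert recommendation, caused by ignoring the interventions users make at inference time. Existing audits usually evaluate model outputs in isolation and do not consider how prompt constraints, temperature settings, or Retrieval-Augmented Generation (RAG) affect performance in real deployments, so it is hard to tell whether failures come from the choice of model or from deployment decisions. The key to the solution is LLMScholarBench, a benchmark that jointly evaluates model infrastructure and end-user interventions across multiple tasks, using nine metrics that cover both technical quality (such as validity, consistency, factuality) and social representation (such as diversity, parity). The empirical results show that interventions do not bring uniform improvements but reshape the trade-offs among metrics: higher temperature lowers factuality, representation-constrained prompting improves diversity but hurts factuality, and RAG improves technical metrics while reducing diversity and parity. User interventions therefore redistribute error rather than offer a general fix.
Link: https://arxiv.org/abs/2602.08873
Authors: Lisette Espin-Noboa,Gonzalo Gabriel Mendez
Affiliations: Complexity Science Hub Vienna; Universitat Politècnica de València
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Comments: 28 pages: 8 pages in main (5 figures, 1 table), 20 pages in appendix (18 figures, 2 tables). under-review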
Abstract:Large language models (LLMs) are increasingly used for academic expert recommendation. Existing audits typically evaluate model outputs in isolation, largely ignoring end-user inference-time interventions. As a result, it remains unclear whether failures such as refusals, hallucinations, and uneven coverage stem from model choice or deployment decisions. We introduce LLMScholarBench, a benchmark for auditing LLM-based scholar recommendation that jointly evaluates model infrastructure and end-user interventions across multiple tasks. LLMScholarBench measures both technical quality and social representation using nine metrics. We instantiate the benchmark in physics expert recommendation and audit 22 LLMs under temperature variation, representation-constrained prompting, and retrieval-augmented generation (RAG) via web search. Our results show that end-user interventions do not yield uniform improvements but instead redistribute error across dimensions. Higher temperature degrades validity, consistency, and factuality. Representation-constrained prompting improves diversity at the expense of factuality, while RAG primarily improves technical quality while reducing diversity and parity. Overall, end-user interventions reshape trade-offs rather than providing a general fix. We release code and data that can be adapted to other disciplines by replacing domain-specific ground truth and metrics.
[IR-4] Large Language Models for Geolocation Extraction in Humanitarian Crisis Response
Link: https://arxiv.org/abs/2602.08872
Authors: G. Cafferata,T. Demarco,K. Kalimeri,Y. Mejova,M.G. Beiró
Affiliations: Universidad de San Andrés; ISI Foundation; CONICET
Categories: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Comments:
[IR-5] AMEM4Rec: Leveraging Cross-User Similarity for Memory Evolution in Agentic LLM Recommenders
Link: https://arxiv.org/abs/2602.08837
Authors: Minh-Duc Nguyen,Hai-Dang Kieu,Dung D. Le
Affiliations: VinUniversity
Categories: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Comments:
[IR-6] Welfarist Formulations for Diverse Similarity Search
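[Quick read]: This paper addresses how Nearest Neighbor Search (NNS) can achieve diversity across attributes while keeping retrieval relevant, with particular attention to emerging applications such as Retrieval-Augmented Generation (RAG). The prior constraint-based approach fixes a level of diversity and then optimizes relevance, which leaves little flexibility. The paper instead proposes a modeling framework based on welfare functions, grounded in fairness and efficiency axioms from mathematical economics, to build objectives that balance relevance and diversity adaptively. The key contribution is the use of Nash social welfare as a concrete objective, which yields a query-dependent trade-off and a parametric control that lets practitioners tune the balance between relevance and diversity to the task. The authors also design efficient nearest neighbor algorithms with provable guarantees that can run on top of any standard Approximate Nearest Neighbor (ANN) method. Experiments show that the approach substantially improves diversity while maintaining high relevance.
Link: https://arxiv.org/abs/2602.08742
Authors: Siddharth Barman,Nirjhar Das,Shivam Gupta,Kirankumar Shiragur
Affiliations: Indian Institute of Science; Microsoft Research India
Categories: Data Structures and Algorithms (cs.DS); Computational Geometry (cs.CG); Computer Science and Game Theory (cs.GT); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Comments: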
Abstract:Nearest Neighbor Search (NNS) is a fundamental problem in data structures with wide-ranging applications, such as web search, recommendation systems, and, more recently, retrieval-augmented generations (RAG). In such recent applications, in addition to the relevance (similarity) of the returned neighbors, diversity among the neighbors is a central requirement. In this paper, we develop principled welfare-based formulations in NNS for realizing diversity across attributes. Our formulations are based on welfare functions – from mathematical economics – that satisfy central diversity (fairness) and relevance (economic efficiency) axioms. With a particular focus on Nash social welfare, we note that our welfare-based formulations provide objective functions that adaptively balance relevance and diversity in a query-dependent manner. Notably, such a balance was not present in the prior constraint-based approach, which forced a fixed level of diversity and optimized for relevance. In addition, our formulation provides a parametric way to control the trade-off between relevance and diversity, providing practitioners with flexibility to tailor search results to task-specific requirements. We develop efficient nearest neighbor algorithms with provable guarantees for the welfare-based objectives. Notably, our algorithm can be applied on top of any standard ANN method (i.e., use standard ANN method as a subroutine) to efficiently find neighbors that approximately maximize our welfare-based objectives. Experimental results demonstrate that our approach is practical and substantially improves diversity while maintaining high relevance of the retrieved neighbors.
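To make the idea of a welfare-based objective concrete, here is a small sketch of how a Nash-social-welfare-style score can trade relevance against attribute diversity. It is an illustration under stated assumptions, not the paper's algorithm: the per-attribute utility (sum of relevance of the selected items carrying that attribute), the smoothing constant `eps`, the greedy selection rule, and all function names are hypothetical choices made for this example, and the candidate list stands in for whatever a standard ANN routine would return.

```python
import math


def log_nash_welfare(selected, groups, eps=1e-3):
    """Sum over attribute groups of log(eps + total relevance selected from that group)."""
    totals = {g: 0.0 for g in groups}
    for _, relevance, attribute in selected:
        totals[attribute] += relevance
    return sum(math.log(eps + total) for total in totals.values())


def greedy_diverse_search(candidates, k, eps=1e-3):
    """Greedily pick k items, each time adding the one that most increases the welfare score.

    `candidates` is a list of (item_id, relevance, attribute) tuples, assumed to be
    the output of an ANN lookup (that lookup is outside this sketch).
    """
    groups = {attribute for _, _, attribute in candidates}
    selected, remaining = [], list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(
            remaining,
            key=lambda c: log_nash_welfare(selected + [c], groups, eps),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


candidates = [
    ("a", 0.95, "red"), ("b", 0.93, "red"), ("c", 0.90, "red"),
    ("d", 0.70, "blue"), ("e", 0.60, "green"),
]
print([item_id for item_id, _, _ in greedy_diverse_search(candidates, k=3)])
```

Because the logarithm penalizes any attribute group whose total is near zero, the greedy rule in this toy example prefers covering the blue and green groups over adding a second or third highly relevant red item. A pure top-k by relevance would return only red items.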
[IR-7] Do Images Clarify? A Study on the Effect of Images on Clarifying Questions in Conversational Search
Link: https://arxiv.org/abs/2602.08700
Authors: Clemencia Siro,Zahra Abbasiantaeb,Yifei Yuan,Mohammad Aliannejadi,Maarten de Rijke
Affiliations: University of Amsterdam; University of Copenhagen
Categories: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Comments: Accepted at CHIIR 2025
[IR-8] SA-CAISR: Stage-Adaptive and Conflict-Aware Incremental Sequential Recommendation
Link: https://arxiv.org/abs/2602.08678
Authors: Xiaomeng Song,Xinru Wang,Hanbing Wang,Hongyu Lu,Yu Chen,Zhaochun Ren,Zhumin Chen
Affiliations: Shandong University; Michigan State University; Tencent; Leiden University
Categories: Information Retrieval (cs.IR)
Comments:
[IR-9] Retrieval Pivot Attacks in Hybrid RAG: Measuring and Mitigating Amplified Leakage from Vector Seeds to Graph Expansion
Link: https://arxiv.org/abs/2602.08668
Authors: Scott Thornton
Affiliations: Unknown
Categories: Cryptography and Security (cs.CR); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Comments: 18 pages, 5 figures
[IR-10] SRSUPM: Sequential Recommender System Based on User Psychological Motivation
Link: https://arxiv.org/abs/2602.08667
Authors: Yicheng Di,Yuan Liu,Zhi Chen,Jingcai Guo
Affiliations: Jiangnan University; University of Queensland; The Hong Kong Polytechnic University
Categories: Information Retrieval (cs.IR)
Comments: 9 pages, 8 pages
[IR-11] OneLive: Dynamically Unified Generative Framework for Live-Streaming Recommendation
Link: https://arxiv.org/abs/2602.08612
Authors: Shen Wang,Yusheng Huang,Ruochen Yang,Shuang Wen,Pengbo Xu,Jiangxia Cao,Yueyang Liu,Kuo Cai,Chengcheng Guo,Shiyao Wang,Xinchen Luo,Qiang Luo,Ruiming Tang,Shuang Yang,Zhaojie Liu,Guorui Zhou,Han Li,Kun Gai
Affiliations: Kuaishou Technology
Categories: Information Retrieval (cs.IR)
Comments: Work in progress
[IR-12] RankGR: Rank-Enhanced Generative Retrieval with Listwise Direct Preference Optimization in Recommendation
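[Quick read]: This paper addresses two limitations of current Generative Retrieval (GR) methods for recommendation in modeling user preferences and behavior sequences. First, existing methods rely on next-token prediction, which struggles to capture the hierarchical structure of user preferences. Second, they overlook the deep interaction between decoded identifiers and the user's behavior sequence. The key to the solution is RankGR, which decomposes retrieval into two complementary stages: the Initial Assessment Phase (IAP) and the Refined Scoring Phase (RSP). IAP introduces a listwise direct preference optimization strategy to model hierarchical preferences and partial-order relations more fully. RSP then uses a lightweight scoring module to rescore the top-λ candidates produced by IAP, strengthening the interaction between candidates and the input behavior sequence. Both stages are optimized jointly within a unified GR framework, balancing efficiency and accuracy, and the deployed real-time system handles nearly ten thousand requests per second.
Link: https://arxiv.org/abs/2602.08575
Authors: Kairui Fu,Changfa Wu,Kun Yuan,Binbin Cao,Dunxian Huang,Yuliang Yan,Junjun Zheng,Jianning Zhang,Silu Zhou,Jian Wu,Kun Kuang
Affiliations: Zhejiang University; Alibaba Group
Categories: Information Retrieval (cs.IR)
Comments: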
Abstract:Generative retrieval (GR) has emerged as a promising paradigm in recommendation systems by autoregressively decoding identifiers of target items. Despite its potential, current approaches typically rely on the next-token prediction schema, which treats each token of the next interacted items as the sole target. This narrow focus 1) limits their ability to capture the nuanced structure of user preferences, and 2) overlooks the deep interaction between decoded identifiers and user behavior sequences. In response to these challenges, we propose RankGR, a Rank-enhanced Generative Retrieval method that incorporates listwise direct preference optimization for recommendation. RankGR decomposes the retrieval process into two complementary stages: the Initial Assessment Phase (IAP) and the Refined Scoring Phase (RSP). In IAP, we incorporate a novel listwise direct preference optimization strategy into GR, thus facilitating a more comprehensive understanding of the hierarchical user preferences and more effective partial-order modeling. The RSP then refines the top-\lambda candidates generated by IAP with interactions towards input sequences using a lightweight scoring module, leading to more precise candidate evaluation. Both phases are jointly optimized under a unified GR model, ensuring consistency and efficiency. Additionally, we implement several practical improvements in training and deployment, ultimately achieving a real-time system capable of handling nearly ten thousand requests per second. Extensive offline performance on both research and industrial datasets, as well as the online gains on the “Guess You Like” section of Taobao, validate the effectiveness and scalability of RankGR.
[IR-13] Towards Reliable Social A/B Testing: Spillover-Contained Clustering with Robust Post-Experiment Analysis
Link: https://arxiv.org/abs/2602.08569
Authors: Xu Min,Zhaoxu Yang,Kaixuan Tan,Juan Yan,Xunbin Xiong,Zihao Zhu,Kaiyu Zhu,Fenglin Cui,Yang Yang,Sihua Yang,Jianhui Bu
Affiliations: Kuaishou Technology
Categories: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)
Comments:
[IR-14] QARM V2: Quantitative Alignment Multi-Modal Recommendation for Reasoning User Sequence Modeling
Link: https://arxiv.org/abs/2602.08559
Authors: Tian Xia,Jiaqi Zhang,Yueyang Liu,Hongjian Dou,Tingya Yin,Jiangxia Cao,Xulei Liang,Tianlu Xie,Lihao Liu,Xiang Chen,Shen Wang,Changxin Lao,Haixiang Gan,Jinkai Yu,Keting Cen,Lu Hao,Xu Zhang,Qiqiang Zhong,Zhongbo Sun,Yiyu Wang,Shuang Yang,Mingxin Wen,Xiangyu Wu,Shaoguo Liu,Tingting Gao,Zhaojie Liu,Han Li,Kun Gai
Affiliations: Kuaishou Technology
Categories: Information Retrieval (cs.IR)
Comments: Work in progress
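[IR-15] DA-RAG: Dynamic Attributed Community Search for Retrieval-Augmented Generation
[Quick read]: This paper addresses the limited efficiency and effectiveness of current graph-based Retrieval-Augmented Generation (G-RAG) methods on dynamic and complex queries. The root cause is that existing methods underuse graph topology: they rely mainly on low-order structures or static community partitions and so fail to capture high-order semantic associations. The key to the solution is the DA-RAG framework, which uses Attributed Community Search (ACS) to extract query-relevant subgraphs dynamically, capturing high-order graph structure and retrieving self-complementary knowledge. It also introduces a chunk-layer oriented graph index that supports efficient multi-granularity retrieval, improving performance while significantly reducing computational and economic costs.
Link: https://arxiv.org/abs/2602.08545
Authors: Xingyuan Zeng,Zuohan Wu,Yue Wang,Chen Zhang,Quanming Yao,Libin Zheng,Jian Yin
Affiliations: The Technology Innovation Center for Collaborative Applications of Natural Resources Data in GBA, MNR; Sun Yat-sen University; The Hong Kong University of Science and Technology (Guangzhou); Shenzhen Institute of Computing Sciences; The Hong Kong Polytechnic University; Tsinghua University; State Key laboratory of Space Network and Communications; Beijing National Research Center for Information Science and Technology
Categories: Information Retrieval (cs.IR)
Comments: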
Abstract:Owing to their unprecedented comprehension capabilities, large language models (LLMs) have become indispensable components of modern web search engines. From a technical perspective, this integration represents retrieval-augmented generation (RAG), which enhances LLMs by grounding them in external knowledge bases. A prevalent technical approach in this context is graph-based RAG (G-RAG). However, current G-RAG methodologies frequently underutilize graph topology, predominantly focusing on low-order structures or pre-computed static communities. This limitation affects their effectiveness in addressing dynamic and complex queries. Thus, we propose DA-RAG, which leverages attributed community search (ACS) to extract relevant subgraphs based on the queried question dynamically. DA-RAG captures high-order graph structures, allowing for the retrieval of self-complementary knowledge. Furthermore, DA-RAG is equipped with a chunk-layer oriented graph index, which facilitates efficient multi-granularity retrieval while significantly reducing both computational and economic costs. We evaluate DA-RAG on multiple datasets, demonstrating that it outperforms existing RAG methods by up to 40% in head-to-head comparisons across four metrics while reducing index construction time and token overhead by up to 37% and 41%, respectively.
[IR-16] GISA: A Benchmark for General Information-Seeking Assistant
Link: https://arxiv.org/abs/2602.08543
Authors: Yutao Zhu,Xingshuo Zhang,Maosen Zhang,Jiajie Jin,Liancheng Zhang,Xiaoshuai Song,Kangzhi Zhao,Wencong Zeng,Ruiming Tang,Han Li,Ji-Rong Wen,Zhicheng Dou
Affiliations: Renmin University of China; Kuaishou Technology
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Comments:
[IR-17] PIT: A Dynamic Personalized Item Tokenizer for End-to-End Generative Recommendation
Link: https://arxiv.org/abs/2602.08530
Authors: Huanjie Wang,Xinchen Luo,Honghui Bao,Zhang Zixing,Lejian Ren,Yunfan Wu,Hongwei Zhang,Liwei Guan,Guang Chen
Affiliations: Beijing University of Posts and Telecommunications; Kuaishou Technology; Institute of Computing Technology, Chinese Academy of Sciences
Categories: Information Retrieval (cs.IR)
Comments:
[IR-18] Hybrid Pooling with LLMs via Relevance Context Learning
Link: https://arxiv.org/abs/2602.08457
Authors: David Otero,Javier Parapar
Affiliations: Information Retrieval Lab, CITIC; Universidade da Coruña
Categories: Information Retrieval (cs.IR)
Comments:
[IR-19] A SketchText Composed Image Retrieval Dataset for Thangka
Link: https://arxiv.org/abs/2602.08411
Authors: Jinyu Xu,Yi Sun,Jiangling Zhang,Qing Xie,Daomin Ji,Zhifeng Bao,Jiachen Li,Yanchun Ma,Yongjian Liu
Affiliations: Wuhan University of Technology; RMIT University; The University of Queensland; Wuhan Vocational College of Software and Engineering
Categories: Information Retrieval (cs.IR)
Comments: 9 pages
Human-Computer Interaction
[HC-0] Rhythms of Recovery: Patient-Centered Virtual Reality Exergame for Physical Rehabilitation in the Intensive Care Unit
Link: https://arxiv.org/abs/2602.08994
Authors: Sangjun Eom,Tianyi Hu,Wenyi Xu,Liheng Zou,Ernesto Escobar,Gabriel Streisfeld,Anna Mall,Bradi Granger,Maria Gorlatova
Affiliations: Unknown
Categories: Human-Computer Interaction (cs.HC)
Comments:
[HC-1] PPG as a Bridge: Cross-Device Authentication for Smart Wearables with Photoplethysmography
Link: https://arxiv.org/abs/2602.08972
Authors: Jiacheng Liu,Jiankai Tang,Guangye Zhao,Ruichen Gui,Songqin Cheng,Taiting Lu,Jian Liu,Weiqiang Wang,Mahanth Gowda,Yuanchun Shi,Yuntao Wang
Affiliations: Cornell University; Tsinghua University; University of Washington; Pennsylvania State University; Ant Group
Categories: Human-Computer Interaction (cs.HC)
Comments: 31 pages, 15 figures, 5 tables, submitted to IMWUT 2026
[HC-2] pixelLOG: Logging of Online Gameplay for Cognitive Research
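[Quick read]: This paper addresses a limitation of traditional cognitive assessments, which rely on isolated, output-focused measurements and struggle to capture the complexity of human cognition in naturalistic settings. The key to the solution is pixelLOG, a high-performance data collection framework designed for Spigot-based Minecraft servers that supports process-based cognitive research in multi-player/multi-agent environments. The framework combines active state polling with passive event monitoring to capture comprehensive behavioral data at configurable frequencies (up to and exceeding 20 updates per second). It uses Spigot's extensible API to provide session isolation and structured JSON output, enabling high-resolution analysis of cognitive processes in virtual environments and bridging the gap between laboratory assessments and more ecologically valid tasks.
Link: https://arxiv.org/abs/2602.08941
Authors: Zeyu Lu,Dennis L. Barbour
Affiliations: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Comments: 9 pages, 1 figure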
Abstract:Traditional cognitive assessments often rely on isolated, output-focused measurements that may fail to capture the complexity of human cognition in naturalistic settings. We present pixelLOG, a high-performance data collection framework for Spigot-based Minecraft servers designed specifically for process-based cognitive research. Unlike existing frameworks tailored only for artificial intelligence agents, pixelLOG also enables human behavioral tracking in multi-player/multi-agent environments. Operating at configurable frequencies up to and exceeding 20 updates per second, the system captures comprehensive behavioral data through a hybrid approach of active state polling and passive event monitoring. By leveraging Spigot’s extensible API, pixelLOG facilitates robust session isolation and produces structured JSON outputs integrable with standard analytical pipelines. This framework bridges the gap between decontextualized laboratory assessments and richer, more ecologically valid tasks, enabling high-resolution analysis of cognitive processes as they unfold in complex, virtual environments.
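The hybrid approach named in the abstract, active state polling combined with passive event monitoring, can be sketched in a few lines. This is a language-neutral illustration written in Python, not pixelLOG's code and not the Spigot API (which is Java): the `SessionLogger` class, its method names, the record fields, and the tick-based polling interval are all assumptions made for this example.

```python
import json
from typing import Any, Callable, Dict, List


class SessionLogger:
    """Collects polled state snapshots and fired events as JSON lines for one session."""

    def __init__(self, session_id: str, poll_every: int = 1):
        self.session_id = session_id  # keeps records from different sessions separable
        self.poll_every = poll_every  # poll once every N ticks (assumed unit)
        self.records: List[str] = []

    def _emit(self, kind: str, tick: int, data: Dict[str, Any]) -> None:
        record = {"session": self.session_id, "kind": kind, "tick": tick, "data": data}
        self.records.append(json.dumps(record, sort_keys=True))

    def on_tick(self, tick: int, read_state: Callable[[], Dict[str, Any]]) -> None:
        """Active polling: sample the current state at the configured interval."""
        if tick % self.poll_every == 0:
            self._emit("state", tick, read_state())

    def on_event(self, tick: int, name: str, payload: Dict[str, Any]) -> None:
        """Passive monitoring: record an event at the moment it fires."""
        self._emit("event", tick, {"name": name, **payload})
```

Polling gives a regular time series of continuous quantities such as position, while event callbacks capture discrete actions that could fall between two polls. Writing both into one tagged stream keeps the timeline easy to reconstruct downstream.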
[HC-3] How University Disability Services Professionals Write Image Descriptions for HCI Figures Using Generative AI
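[Quick read]: This paper addresses the difficulty that Disability Services Office (DSO) professionals at higher education institutions face when writing high-quality alternative text (alt text) for complex visual content, such as figures in Human-Computer Interaction (HCI) research papers, without subject expertise. The key to the solution is to bring in generative AI as an assistive tool that helps DSO professionals improve alt text quality, writing speed, and confidence. The study finds that alt text written with AI assistance is of higher quality than alt text written by DSO professionals alone. Although participants reported inefficiencies in interacting with the AI, the work points to its value in supporting accessibility professionals who are not subject experts.
Link: https://arxiv.org/abs/2602.08937
Authors: Muhammad Raees,Yugo Iwamoto,Konstantinos Papangelis,Jamison Heard,Garreth W. Tigwell
Affiliations: Unknown
Categories: Human-Computer Interaction (cs.HC)
Comments: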
Abstract:Disability Services Office (DSO) professionals at higher education institutions write alt text for visual content. However, due to the complexity of visual content, such as HCI figures in research publications, DSO professionals can struggle to write high-quality alt text if they lack subject expertise. Generative AI has shown potential in understanding figures and writing their descriptions, yet its support for DSO professionals is underexplored, and limited work evaluates the quality of alt text generated with AI assistance. In this work, we conducted two studies: first, we investigated generative AI support for writing alt text for HCI figures with 12 DSO professionals. Second, we recruited 11 HCI experts to evaluate the alt text written by DSO professionals. Findings show that alt text written solely by DSO professionals has lower quality than alt text written with AI assistance. AI assistance also helped DSO professionals write alt text more quickly and with greater confidence; however, they reported inefficiencies in interactions with the AI. Our work contributes to exploring AI support for non-subject expert accessibility professionals.
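[HC-4] “I Don't Trust Any Professional Research Tool”: A Re-Imagination of Knowledge Production Workflows by, with, and for Blind and Low-Vision Researchers
[Quick read]: This paper addresses the systematic exclusion that Blind and Low-Vision (BLV) researchers face in modern research workflows because of visually dominated infrastructure. The key to the solution is empirical work that exposes how BLV researchers lose autonomy and carry added physical burdens in core steps such as literature review and evaluating visual outputs. Framing the findings through activity theory, the authors offer design recommendations that move accessibility from a peripheral add-on to a core element of successful research, calling for a fundamental rethinking of how research systems support BLV scholars' workflows.
Link: https://arxiv.org/abs/2602.08925
Authors: Omar Khan,JooYoung Seo
Affiliations: Unknown
Categories: Human-Computer Interaction (cs.HC)
Comments: In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems, 24 pages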
Abstract:Research touts universal participation through accessibility initiatives, yet blind and low-vision (BLV) researchers face systematic exclusion as visual representations dominate modern research workflows. To materialize inclusive processes, we, as BLV researchers, examined how our peers combat inaccessible infrastructures. Through an explanatory sequential mixed-methods approach, we conducted a cross-sectional, observational survey (n=57) and follow-up semi-structured interviews (n=15), analyzing open-ended data using reflexive thematic analysis and framing findings through activity theory to highlight research’s systemic shortcomings. We expose how BLV researchers sacrifice autonomy and shoulder physical burdens, with nearly one-fifth unable to independently perform literature review or evaluate visual outputs, delegating tasks to sighted colleagues or relying on AI-driven retrieval to circumvent fatigue. Researchers also voiced frustration with specialized tools, citing developers’ performative responses and losing deserved professional accolades. We seek follow-through on research’s promises through design recommendations that reconceptualize accessibility as fundamental to successful research and supporting BLV scholars’ workflows.
[HC-5] Gesturing Toward Abstraction: Multimodal Convention Formation in Collaborative Physical Tasks
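[Quick read]: This paper studies how people, over repeated collaboration, evolve their communication strategies to establish ad hoc conventions and reach shared goals efficiently. The key to the approach is a pair of experiments, one using natural language and one using multimodal interaction (speech and gestures). They show that during physical collaboration participants gradually form linguistic and gestural abstractions and use cross-modal redundancy to emphasize key changes, which makes collaboration more efficient. Building on this, the study extends probabilistic models of convention formation to multimodal settings, capturing shifts in modality preference and providing a theoretical and practical basis for designing convention-aware agents situated in the physical world.
Link: https://arxiv.org/abs/2602.08914
Authors: Kiyosu Maeda,William P. McCarthy,Ching-Yi Tsai,Jeffrey Mu,Haoliang Wang,Robert D. Hawkins,Judith E. Fan,Parastoo Abtahi
Affiliations: Princeton University; UC San Diego; Brown University; MIT; Stanford University
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Comments: Accepted at the 2026 CHI Conference on Human Factors in Computing Systems (CHI 2026). 15 pages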
Abstract:A quintessential feature of human intelligence is the ability to create ad hoc conventions over time to achieve shared goals efficiently. We investigate how communication strategies evolve through repeated collaboration as people coordinate on shared procedural abstractions. To this end, we conducted an online unimodal study (n = 98) using natural language to probe abstraction hierarchies. In a follow-up lab study (n = 40), we examined how multimodal communication (speech and gestures) changed during physical collaboration. Pairs used augmented reality to isolate their partner’s hand and voice; one participant viewed a 3D virtual tower and sent instructions to the other, who built the physical tower. Participants became faster and more accurate by establishing linguistic and gestural abstractions and using cross-modal redundancy to emphasize key changes from previous interactions. Based on these findings, we extend probabilistic models of convention formation to multimodal settings, capturing shifts in modality preferences. Our findings and model provide building blocks for designing convention-aware intelligent agents situated in the physical world.
[HC-6] Designing Multi-Robot Ground Video Sensemaking with Public Safety Professionals
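[Quick read]: This paper addresses the difficulty of designing and integrating multi-robot ground video into public safety workflows in a way that improves situational awareness and reduces professionals' burden. The key to the solution is MRVS, a tool that augments multi-robot patrol video streams with a prompt-engineered video understanding model and LLM-based explanations, which participants reported reduced manual workload and increased their confidence. The study also identifies six design requirements and provides the first testbed for multi-robot ground video sensemaking in public safety, comprising 38 events of interest and 20 patrol videos, as an empirical basis for designing future multi-robot video sensemaking tools.
Link: https://arxiv.org/abs/2602.08882
Authors: Puqi Zhou(1),Ali Asgarov(2),Aafiya Hussain(2),Wonjoon Park(3),Amit Paudyal(1),Sameep Shrestha(1),Chia-wei Tang(2),Michael F. Lighthiser(1),Michael R. Hieb(1),Xuesu Xiao(1),Chris Thomas(2),Sungsoo Ray Hong(1) ((1) George Mason University, Fairfax, VA, USA (2) Virginia Tech, Blacksburg, VA, USA (3) University of Maryland, College Park, MD, USA)
Affiliations: George Mason University; Virginia Tech; University of Maryland
Categories: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
Comments: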
Abstract:Videos from fleets of ground robots can advance public safety by providing scalable situational awareness and reducing professionals’ burden. Yet little is known about how to design and integrate multi-robot videos into public safety workflows. Collaborating with six police agencies, we examined how such videos could be made practical. In Study 1, we presented the first testbed for multi-robot ground video sensemaking. The testbed includes 38 events-of-interest (EoI) relevant to public safety, a dataset of 20 robot patrol videos (10 day/night pairs) covering EoI types, and 6 design requirements aimed at improving current video sensemaking practices. In Study 2, we built MRVS, a tool that augments multi-robot patrol video streams with a prompt-engineered video understanding model. Participants reported reduced manual workload and greater confidence with LLM-based explanations, while noting concerns about false alarms and privacy. We conclude with implications for designing future multi-robot video sensemaking tools. The testbed is available at this https URL_VideoSensemaking
[HC-7] Glow with the Flow: AI-Assisted Creation of Ambient Lightscapes for Music Videos
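[Quick read]: This paper addresses the fact that creating ambient light sequences for music videos remains confined to professional contexts, mainly because of the time and skill required. The key to the solution is an AI-assisted system, informed by professional design heuristics, that extracts salient features from the audio and video sources and automatically generates an editable first draft of object-based ambient light effects, giving non-professionals a starting point to refine. In the evaluation, responses from 32 participants indicated that the system's initial output provides a viable baseline, supporting the usefulness of AI-assisted workflows in extending designed light beyond professional venues.
Link: https://arxiv.org/abs/2602.08838
Authors: Frederic Anthony Robinson,Vishnu Raj,David Cooper,Fan Du,David Gunawan
Affiliations: Dolby Laboratories Inc.
Categories: Human-Computer Interaction (cs.HC)
Comments: Accepted to IEEE Pacific Visualization 2026 (Notes)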
Abstract:Designed light is an established modality for live performance and music playback. Despite the growing availability of consumer smart lighting, the creation of designed light for music visualization remains limited to professional contexts due to time and skill constraints. To address this, we present an AI-assisted system for generating ambient light sequences for music videos. Informed by professional design heuristics, the system extracts salient features from source video and audio to generate an editable preliminary design of object based ambient light effect. We evaluated the system by comparing its autonomous output against hand-authored designs for three music videos. Findings from responses by 32 participants indicate that the initial output provides a viable baseline for further refinement by human authors. This work demonstrates the utility of AI-assisted workflows in supporting the creation and adoption of designed light beyond professional venues.
[HC-8] Enhancing Generative AI Image Refinement with Scribbles and Annotations: A Comparative Study of Multimodal Prompts
Link: https://arxiv.org/abs/2602.08830
Authors: Hyerim Park,Phuong Thao Tran,Andre Luckow,Ceenu George,Michael Sedlmair,Malin Eiband
Affiliations: BMW Group; University of Stuttgart; TU Berlin; LMU Munich
Categories: Human-Computer Interaction (cs.HC)
Comments: 22 pages, 14 figures. Preprint of an accepted IUI '26 paper
[HC-9] Belief Offloading in Human-AI Interaction
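[Quick read]: This paper addresses "belief offloading" in human interaction with Large Language Models (LLMs): through over-reliance on AI-generated information, people shift the processes of forming and upholding beliefs onto an AI system, which may weaken their independent cognitive abilities and have downstream negative effects on their behavior and belief systems. The key to the approach is to draw on philosophy, psychology, and computer science research to clarify the boundary conditions under which belief offloading occurs, and to build a descriptive taxonomy that identifies types of belief offloading and their normative implications. This provides a theoretical basis and research directions for assessing the risks and consequences of belief offloading in human-AI interaction.
Link: https://arxiv.org/abs/2602.08754
Authors: Rose E. Guingrich,Dvija Mehta,Umang Bhatt
Affiliations: Princeton University; Eindhoven University of Technology; University of Cambridge
Categories: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Comments: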
Abstract:What happens when people’s beliefs are derived from information provided by an LLM? People’s use of LLM chatbots as thought partners can contribute to cognitive offloading, which can have adverse effects on cognitive skills in cases of over-reliance. This paper defines and investigates a particular kind of cognitive offloading in human-AI interaction, “belief offloading,” in which people’s processes of forming and upholding beliefs are offloaded onto an AI system with downstream consequences on their behavior and the nature of their system of beliefs. Drawing on philosophy, psychology, and computer science research, we clarify the boundary conditions under which belief offloading occurs and provide a descriptive taxonomy of belief offloading and its normative implications. We close with directions for future work to assess the potential for and consequences of belief offloading in human-AI interaction.
[HC-10] Why do we Trust Chatbots? From Normative Principles to Behavioral Drivers
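【Quick Read】: This paper addresses the following problem: as chatbots increasingly blur the boundary between automated systems and human conversation, the trust users place in them often stems from cognitive biases exploited by interaction design rather than from demonstrated trustworthiness, which risks conflating distinct notions of "trust". The key to the proposed approach is to reframe the chatbot's role: not a companion or assistant, but a highly skilled salesperson whose objectives are set by the deploying organization. This shift in perspective helps separate psychological trust formation from normative standards of trustworthiness, and steers research and mechanism design toward helping users appropriately calibrate their trust in conversational AI systems.
Link: https://arxiv.org/abs/2602.08707
Authors: Aditya Gulati, Nuria Oliver
Institutions: ELLIS Alicante
Categories: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Comments: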
Abstract:As chatbots increasingly blur the boundary between automated systems and human conversation, the foundations of trust in these systems warrant closer examination. While regulatory and policy frameworks tend to define trust in normative terms, the trust users place in chatbots often emerges from behavioral mechanisms. In many cases, this trust is not earned through demonstrated trustworthiness but is instead shaped by interactional design choices that leverage cognitive biases to influence user behavior. Based on this observation, we propose reframing chatbots not as companions or assistants, but as highly skilled salespeople whose objectives are determined by the deploying organization. We argue that the coexistence of competing notions of “trust” under a shared term obscures important distinctions between psychological trust formation and normative trustworthiness. Addressing this gap requires further research and stronger support mechanisms to help users appropriately calibrate trust in conversational AI systems.
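[HC-11] Technosocial risks of ideal emotion recognition technologies: A defense of the (social) value of emotional expressions
【Quick Read】: This paper addresses the following problem: advocacy for ideal emotion recognition technologies (ERTs) usually rests on the untested assumption that social life would benefit from greater affective transparency. The author argues that this view overlooks the multiple social functions of emotional expression, which serves not only as a read-out of inner states but also as a tool for coordinating action, enabling moral repair, sustaining interpersonal trust, and supporting collective norms. These functions depend on a degree of epistemic friction and partial opacity. When ERTs are deployed in socially authoritative or evaluative contexts, they collapse epistemic friction, replace relational meaning with technology-mediated affective profiles, and narrow individuals' expressive discretion, leading to affective determinism and ambient affective auditing that undermine social cohesion and individual agency. The key to the proposed solution is a function-first regulatory approach that treats expressive discretion and intentional emotional expression as constitutive of certain social goods and protects those goods from excessive affective legibility.
Link: https://arxiv.org/abs/2602.08706
Authors: Alexandra Pregent
Institutions: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
Comments: 12 pages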
Abstract:The prospect of AI systems that I call ideal emotion recognition technologies (ERTs) is often defended on the assumption that social life would benefit from increased affective transparency. This paper challenges that assumption by examining the technosocial risks posed by ideal ERTs, understood as multimodal systems capable of reliably inferring inner affective states in real time. Drawing on philosophical accounts of emotional expression and social practice, as well as empirical work in affective science and social psychology, I argue that the appeal of such systems rests on a misunderstanding of the social functions of emotional expression. Emotional expressions function not only as read-outs of inner states, but also as tools for coordinating action, enabling moral repair, sustaining interpersonal trust, and supporting collective norms. These functions depend on a background of partial opacity and epistemic friction. When deployed in socially authoritative or evaluative contexts, ideal ERTs threaten this expressive space by collapsing epistemic friction, displacing relational meaning with technology-mediated affective profiles, and narrowing the space for aspirational and role-sensitive expressions. The result is a drift towards affective determinism and ambient forms of affective auditing, which undermine both social cohesion and individual agency. I argue that, although it is intuitive to think that increasing accuracy would legitimise such systems, in the case of ERTs accuracy does not straightforwardly justify their deployment, and may, in some contexts, provide a reason for regulatory restraint. I conclude by defending a function-first regulatory approach that treats expressive discretion and intentional emotional expression as constitutive of certain social goods, and that accordingly seeks to protect these goods from excessive affective legibility.
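[HC-12] LLM-Enhanced Wearables for Comprehensible Health Guidance in LMICs
【Quick Read】: This paper aims to remove the barriers to personal health monitoring in low- and middle-income countries (LMICs): limited affordability, low digital literacy, and difficulty understanding health data. The key to the solution is Guardian Angel, a low-cost, screenless wearable paired with a WhatsApp-based large language model (LLM) agent that operates directly on raw, noisy sensor waveforms, does not depend on high-quality hardware signals, and produces plain-language, personalized health insights. On a benchmark dataset the system achieved 100% availability (coverage under field noise, distinct from accuracy), whereas a standard open-source algorithm produced valid outputs for only 70.29% of segments. A 96-hour user study showed improved health data comprehension and awareness of vital signs, suggesting a practical path to better health literacy and device adoption in resource-constrained settings.
Link: https://arxiv.org/abs/2602.08701
Authors: Mohammad Shaharyar Ahsan, Areeba Shahzad Shaikh, Maham Zahid, Umer Irfan, Maryam Mustafa, Naveed Anwar Bhatti, Muhammad Hamad Alizai
Institutions: Lahore University of Management Sciences (LUMS)
Categories: Human-Computer Interaction (cs.HC)
Comments: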
Abstract:Personal health monitoring via IoT in LMICs is limited by affordability, low digital literacy, and limited health data comprehension. We present Guardian Angel, a low-cost, screenless wearable paired with a WhatsApp-based LLM agent that delivers plain-language, personalized insights. The LLM operates directly on raw, noisy sensor waveforms and is robust to the poor signal quality of low-cost hardware. On a benchmark dataset, a standard open-source algorithm produced valid outputs for only 70.29% of segments, whereas Guardian Angel achieved 100% availability (reported as coverage under field noise, distinct from accuracy), yielding a continuous and understandable physiological record. In a 96-hour study involving 20 participants (1,920 participant-hours), users demonstrated significant improvements in health data comprehension and mindfulness of vital signs. These results suggest a practical approach to enhancing health literacy and adoption in resource-constrained settings.
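[HC-13] Supporting Effective Goal Setting with LLM-Based Chatbots
【Quick Read】: This paper addresses the fact that individuals who set behavioral goals (such as eating healthier, exercising regularly, or increasing productivity) often lack structured external support, even though psychological frameworks such as goal-setting theory and implementation intentions offer guidance. The key to the solution is to use large language model (LLM)-based chatbots that operationalize these psychological intervention strategies. The empirical results show that, compared with offering only guidance or adaptive suggestions, adding feedback significantly improves goal quality, making feedback the core design element for effective goal-setting support.
Link: https://arxiv.org/abs/2602.08636
Authors: Michel Schimpf, Sebastian Maier, Anton Wyrowski, Lara Christoforakos, Stefan Feuerriegel, Thomas Bohné
Institutions: University of Cambridge; LMU Munich; MCML; Technical University of Munich
Categories: Human-Computer Interaction (cs.HC)
Comments: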
Abstract:Each day, individuals set behavioral goals such as eating healthier, exercising regularly, or increasing productivity. While psychological frameworks (i.e., goal setting and implementation intentions) can be helpful, they often need structured external support, which interactive technologies can provide. We thus explored how large language model (LLM)-based chatbots can apply these frameworks to guide users in setting more effective goals. We conducted a preregistered randomized controlled experiment (N = 543) comparing chatbots with different combinations of three design features: guidance, suggestions, and feedback. We evaluated goal quality using subjective and objective measures. We found that, while guidance is already helpful, it is the addition of feedback that makes LLM-based chatbots effective in supporting participants’ goal setting. In contrast, adaptive suggestions were less effective. Altogether, our study shows how to design chatbots by operationalizing psychological frameworks to provide effective support for reaching behavioral goals.
[HC-14] Kissan-Dost: Bridging the Last Mile in Smallholder Precision Agriculture with Conversational IoT
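【Quick Read】: This paper addresses the difficulty smallholder farmers face in making use of the large volumes of sensor data produced by agricultural IoT (Agri-IoT) devices, in particular how to turn that data into actionable, easily understood farming advice. The key to the solution is Kissan-Dost, a multilingual, sensor-grounded conversational system that couples soil and climate sensor data with retrieval-augmented generation (RAG) and uses a modular pipeline to enforce grounding, traceability, and proactive alerts. In a 90-day, two-site pilot the chatbot was used far more consistently than the dashboard, and controlled tests showed over 90% correctness with subsecond end-to-end latency, indicating that careful last-mile integration, rather than hardware innovation, unlocks the value of existing Agri-IoT.
Link: https://arxiv.org/abs/2602.08593
Authors: Muhammad Saad Ali, Daanish U. Khan, Laiba Intizar Ahmad, Umer Irfan, Maryam Mustafa, Naveed Anwar Bhatti, Muhammad Hamad Alizai
Institutions: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Comments: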
Abstract:We present Kissan-Dost, a multilingual, sensor-grounded conversational system that turns live on-farm measurements and weather into plain-language guidance delivered over WhatsApp text or voice. The system couples commodity soil and climate sensors with retrieval-augmented generation, then enforces grounding, traceability, and proactive alerts through a modular pipeline. In a 90-day, two-site pilot with five participants, we ran three phases (baseline, dashboard only, chatbot only). Dashboard engagement was sporadic and faded, while the chatbot was used nearly daily and informed concrete actions. Controlled tests on 99 sensor-grounded crop queries achieved over 90 percent correctness with subsecond end-to-end latency, alongside high-quality translation outputs. Results show that careful last-mile integration, not novel circuitry, unlocks the latent value of existing Agri-IoT for smallholders.
[HC-15] Agent-Supported Foresight for AI Systemic Risks: AI Agents for Breadth, Experts for Judgment
Link: https://arxiv.org/abs/2602.08565
Authors: Leon Fröhling, Alessandro Giaconia, Edyta Paulina Bogucka, Daniele Quercia
Institutions: GESIS - Leibniz Institute for the Social Sciences; ETH Zurich; Nokia Bell Labs; University of Cambridge; Politecnico di Torino
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Comments: 48 pages, 15 figures
[HC-16] Three Lessons from Citizen-Centric Participatory AI Design
Link: https://arxiv.org/abs/2602.08554
Authors: Eike Schneiders, Sarah Kiden, Beining Zhang, Bruno Rafael Queiros Arcanjo, Zhaoxing Li, Ezhilarasi Periyathambi, Vahid Yazdanpanah, Sebastian Stein
Institutions: The University of Southampton
Categories: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Comments: PARTICIPATE-AI: A Workshop at the 2026 ACM Conference on Intelligent User Interfaces (ACM IUI)
[HC-17] Gesture Matters: Pedestrian Gesture Recognition for AVs Through Skeleton Pose Evaluation
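【Quick Read】: This paper addresses the difficulty autonomous vehicles (AVs) have in accurately recognizing pedestrian gestures in complex traffic scenes, especially where formal traffic rules are insufficient to ensure safe interaction. The key to the solution is a pedestrian gesture classification framework based on 2D pose estimation: using real-world video sequences from the WIVW dataset, it extracts 76 static and dynamic features from normalized keypoints and classifies four main gesture classes (Stop, Go, Thank-Greet, and No Gesture). The study finds that hand position and movement velocity are the most discriminative features and reports a classification accuracy of 87%, improving AV perception of non-verbal traffic interaction.
Link: https://arxiv.org/abs/2602.08479
Authors: Alif Rizqullah Mahdi, Mahdi Rezaei, Natasha Merat
Institutions: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Comments: 9th International Conference on Instrumentation, Control, and Automation (ICA)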
Abstract:Gestures are a key component of non-verbal communication in traffic, often helping pedestrian-to-driver interactions when formal traffic rules may be insufficient. This problem becomes more apparent when autonomous vehicles (AVs) struggle to interpret such gestures. In this study, we present a gesture classification framework using 2D pose estimation applied to real-world video sequences from the WIVW dataset. We categorise gestures into four primary classes (Stop, Go, Thank Greet, and No Gesture) and extract 76 static and dynamic features from normalised keypoints. Our analysis demonstrates that hand position and movement velocity are especially discriminative in distinguishing between gesture classes, achieving a classification accuracy score of 87%. These findings not only improve the perceptual capabilities of AV systems but also contribute to the broader understanding of pedestrian behaviour in traffic contexts.
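To make the idea of static and dynamic features from normalised keypoints concrete, here is a minimal Python sketch. It is not the authors' feature set: the COCO-style keypoint indices, the mid-hip/torso-length normalisation, and the three features per hand are all assumptions chosen for illustration, and the abstract does not say how the 76 features are defined.

```python
import math

# Hypothetical keypoint indices (COCO-style ordering is an assumption).
L_SHOULDER, R_SHOULDER, L_WRIST, R_WRIST, L_HIP, R_HIP = 5, 6, 9, 10, 11, 12


def _mid(a, b):
    """Midpoint of two (x, y) points."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def normalise_frame(frame):
    """Centre a frame on the mid-hip and scale by torso length.

    frame: list of (x, y) pixel keypoints for one pedestrian.
    """
    hip = _mid(frame[L_HIP], frame[R_HIP])
    sho = _mid(frame[L_SHOULDER], frame[R_SHOULDER])
    torso = max(math.hypot(sho[0] - hip[0], sho[1] - hip[1]), 1e-6)
    return [((x - hip[0]) / torso, (y - hip[1]) / torso) for x, y in frame]


def hand_features(sequence, fps=30.0):
    """Static (hand height) and dynamic (hand speed) features per hand."""
    frames = [normalise_frame(f) for f in sequence]
    feats = {}
    sides = (("left", L_WRIST, L_SHOULDER), ("right", R_WRIST, R_SHOULDER))
    for side, wrist, shoulder in sides:
        # Image y grows downward, so shoulder_y - wrist_y > 0 means a raised hand.
        heights = [f[shoulder][1] - f[wrist][1] for f in frames]
        # Frame-to-frame wrist displacement, in torso lengths per second.
        speeds = [
            math.hypot(b[wrist][0] - a[wrist][0], b[wrist][1] - a[wrist][1]) * fps
            for a, b in zip(frames, frames[1:])
        ]
        feats[side + "_hand_height_mean"] = sum(heights) / len(heights)
        feats[side + "_hand_speed_mean"] = sum(speeds) / len(speeds) if speeds else 0.0
        feats[side + "_hand_speed_max"] = max(speeds) if speeds else 0.0
    return feats
```

Scaling by torso length makes the features roughly independent of how far the pedestrian is from the camera, which is one plausible reason to normalise keypoints before classification.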
[HC-18] A Two-Week In-the-Wild Study of Screen Filters and Camera Sliders for Smartphone Privacy in Public Spaces
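【Quick Read】: This paper addresses privacy problems raised by smartphone use in public spaces, such as shoulder surfing and unintended camera capture. The key to the approach is to evaluate how two tangible privacy-enhancing tools, a screen filter and a camera slider, affect users' privacy perception, behavioral adaptations, usability, and social dynamics. In a mixed-method, in-the-wild study, the screen filter increased users' comfort and sense of privacy when using phones in public, but users also reduced protective behaviors such as actively covering their screens, reflecting a shift in perceived risk. Findings on the camera slider pointed to underlying psychological mechanisms, including privacy awareness and concerns about social perception, providing empirical grounding for the design of more effective privacy tools.
Link: https://arxiv.org/abs/2602.08465
Authors: Andreas Tjeldflaat, Piero Romare, Yuki Onishi, Morten Fjeld, Bjørn Sætrevik
Institutions: University of Bergen; Chalmers University of Technology; University of Gothenburg; t2i Lab, CSE, Chalmers University of Technology
Categories: Human-Computer Interaction (cs.HC)
Comments: To be published in TEI '26: Proceedings of the Twentieth International Conference on Tangible, Embedded, and Embodied Interaction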
Abstract:Smartphone usage in public spaces can raise privacy concerns, in terms of shoulder surfing and unintended camera capture. In real-world public space settings, we investigated the impact of tangible privacy-enhancing tools (here: screen filter and camera slider) on smartphone users’ reported privacy perception, behavioral adaptations, usability and social dynamics. We conducted a mixed-method, in-the-wild study (N = 22) using off-the-shelf smartphone privacy tools. We investigated subjective behavioral transition by combining questionnaires with semi-structured interviews. Participants used the screen filter and the camera slider for two weeks; they reported changes in attitude and behavior after using a screen filter including screen visibility and comfort when using phones publicly. They explained decreased privacy-protective behaviors, such as actively covering their screens, suggesting a shift in perceived risk. Qualitative findings about the camera slider suggested underlying psychological mechanisms, including privacy awareness and concerns about social perception, while also offering insights regarding the tools’ effectiveness.
[HC-19] Intelligent support for Human Oversight: Integrating Reinforcement Learning with Gaze Simulation to Personalize Highlighting
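【Quick Read】: This paper addresses how human-computer interfaces can support operators' situation awareness (SA) under time-critical conditions, in particular how to balance the benefit of highlighting critical events against the cognitive cost of interruptions in complex tasks that require human oversight. The key to the solution is reinforcement learning (RL)-driven user interface (UI) adaptation that personalizes alerting strategies. To learn without real-world deployment, the approach simulates attentional dynamics with models of users' gaze behavior. Initial results suggest that this intelligent highlighting can outperform static, rule-based alerting.
Link: https://arxiv.org/abs/2602.08403
Authors: Thorsten Klößner, João Belo, Zekun Wu, Jörg Hoffmann, Anna Maria Feit
Institutions: Saarland University
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Comments: AI CHAOS '26: Workshop Series on the Challenges for Human Oversight of AI Systems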
Abstract:Interfaces for human oversight must effectively support users’ situation awareness under time-critical conditions. We explore reinforcement learning (RL)-based UI adaptation to personalize alerting strategies that balance the benefits of highlighting critical events against the cognitive costs of interruptions. To enable learning without real-world deployment, we integrate models of users’ gaze behavior to simulate attentional dynamics during monitoring. Using a delivery-drone oversight scenario, we present initial results suggesting that RL-based highlighting can outperform static, rule-based approaches and discuss challenges of intelligent oversight support.
[HC-20] AI-Assisted Model for Generating Multiple-Choice Questions
Link: https://arxiv.org/abs/2602.08383
Authors: Tetiana Krushynska, Jani Ursin, Ville Heilala
Institutions: Unknown
Categories: Human-Computer Interaction (cs.HC)
Comments: 28 pages, 1 figure
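[HC-21] To Tango or to Disentangle? Making Ethnography Public in the Digital Age
【Quick Read】: This paper addresses how, with the rise of digital platforms, ethnography should rethink the dual roles of researchers as insiders and outsiders and as participants and observers across hybrid media environments. As digital spaces such as the social VR platform VRChat and the messaging app WhatsApp become new sites of socio-cultural practice, the publics that form around structural issues such as race and caste take on new complexity, and traditional ethnography faces practical and ethical challenges. The key to the approach is to propose "emergent relationality" as the central analytic for understanding how ethnographers, platforms, and publics shape one another, and for showing how positionality and hybrid media environments together condition what can be accessed, articulated, and made public.
Link: https://arxiv.org/abs/2602.08349
Authors: Daniel Mwesigwa, Cyan DeVeaux, Palashi Vaghela
Institutions: Unknown
Categories: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Comments: Accepted to CSCW 2026 (PACM HCI)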
Abstract:Ethnography attends to relations among people, practices, and the technologies that mediate them. Central to this method is the duality of roles ethnographers navigate as researchers and participants and as outsiders and insiders. However, the rise of digital platforms has introduced new opportunities as well as practical and ethical challenges that reshape these dualities across hybrid media environments spanning both online and offline contexts. Drawing on two case studies of VRChat and WhatsApp, we examine how ethnographers employ diverse tactics to study both enduring and emerging socio-cultural issues of race and caste, particularly those that form what are often called publics. We propose emergent relationality as a key analytic for understanding the mutual shaping of ethnographers, platforms, and publics. In this work, emergent relationality offers registers for analyzing how positionality and hybrid media environments constitute and condition what can be accessed, articulated, and made public.
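[HC-22] “I Can’t Keep Up”: Accessibility Barriers in Video-Based Learning for Individuals with Borderline Intellectual Functioning
【Quick Read】: This paper addresses the poor fit between video-based learning (VBL) and learners with cognitive differences, in particular individuals with Borderline Intellectual Functioning (BIF). Existing accessibility guidelines offer limited support for cognitively diverse users, and although VBL offers self-paced viewing and visual demonstration, learning is hindered by mismatches between video content and users' cognitive characteristics (for example, overwhelming pacing and information density, and difficulty inferring omitted content) and intensified by experiential factors such as low self-efficacy. A series of studies with BIF individuals and caretakers found that coping strategies such as repetitive viewing ease these difficulties but cannot close the fundamental gaps with the video medium. The key to the solution lies in systematic improvements at both the content and user interface (UI) levels to better support BIF users and other cognitively diverse groups.
Link: https://arxiv.org/abs/2602.08300
Authors: Hyehyun Chu, Seungju Kim, Chen Zhou, Yu-Kai Hung, Saelyne Yang, Hyun W. Ka, Juho Kim
Institutions: Unknown
Categories: Human-Computer Interaction (cs.HC)
Comments: Accepted by ACM CHI 2026; 25 pages, 5 figures, 4 tables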
Abstract:Video-based learning (VBL) has become a dominant method for learning practical skills, yet accessibility guidelines provide limited guidance for users with cognitive differences. In particular, challenges that individuals with Borderline Intellectual Functioning (BIF) encounter in video-based learning remain largely underexplored, despite VBL’s potential to support their learning through features like self-paced viewing and visual demonstration. To address this gap, we conducted a series of studies with BIF individuals and caretakers to comprehensively understand their VBL challenges. Our analysis revealed challenges stemming from misalignment between user cognitive characteristics and video elements (e.g., overwhelmed by pacing and density, difficulty inferring omitted content), and experiential factors intensifying challenges (e.g., low self-efficacy). While participants employed coping strategies such as repetitive viewing to address these challenges, these strategies could not overcome fundamental gaps with video. We further discuss the design implications on both content and UI-level features for BIF and broader groups with cognitive diversities.
[HC-23] Investigating Writing Professionals' Relationships with Generative AI: How Combined Perceptions of Rivalry and Collaboration Shape Work Practices and Outcomes
Link: https://arxiv.org/abs/2602.08227
Authors: Rama Adithya Varanasi, Oded Nov, Batia Mishan Wiesenfeld
Institutions: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Comments: CHI’2026
Computer Vision
[CV-0] Autoregressive Image Generation with Masked Bit Modeling
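【Quick Read】: This paper addresses the performance gap between continuous and discrete methods in visual generation, and in particular challenges the prevailing belief that discrete tokenizers are intrinsically inferior to continuous ones. The study shows that the gap arises mainly from the total number of bits allocated in the latent space (i.e., the compression ratio) rather than from an inherent flaw of the discrete approach. The key to the solution is masked Bit AutoRegressive modeling (BAR), a scalable framework that equips an autoregressive transformer with a masked bit modeling head and predicts each discrete token by progressively generating its constituent bits, thereby supporting arbitrary codebook sizes. BAR reaches a gFID of 0.99 on ImageNet-256, outperforming leading continuous and discrete methods, while significantly reducing sampling cost and converging faster than prior continuous approaches. It thus overcomes the performance degradation or prohibitive training cost that existing discrete methods suffer as the codebook grows.
Link: https://arxiv.org/abs/2602.09024
Authors: Qihang Yu, Qihao Liu, Ju He, Xinyang Zhang, Yang Liu, Liang-Chieh Chen, Xi Chen
Institutions: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: SOTA discrete visual generation defeats diffusion models with 0.99 FID score, project page is available at this https URL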
Abstract:This paper challenges the dominance of continuous pipelines in visual generation. We systematically investigate the performance gap between discrete and continuous methods. Contrary to the belief that discrete tokenizers are intrinsically inferior, we demonstrate that the disparity arises primarily from the total number of bits allocated in the latent space (i.e., the compression ratio). We show that scaling up the codebook size effectively bridges this gap, allowing discrete tokenizers to match or surpass their continuous counterparts. However, existing discrete generation methods struggle to capitalize on this insight, suffering from performance degradation or prohibitive training costs with scaled codebook. To address this, we propose masked Bit AutoRegressive modeling (BAR), a scalable framework that supports arbitrary codebook sizes. By equipping an autoregressive transformer with a masked bit modeling head, BAR predicts discrete tokens through progressively generating their constituent bits. BAR achieves a new state-of-the-art gFID of 0.99 on ImageNet-256, outperforming leading methods across both continuous and discrete paradigms, while significantly reducing sampling costs and converging faster than prior continuous approaches. Project page is available at this https URL
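The abstract's point that a token can be predicted through its constituent bits can be illustrated with a small Python sketch. It shows only the bookkeeping: a token from a codebook of size 2^B is written as B bits, so a bit-level head needs B binary outputs instead of 2^B logits. The left-to-right group order in `progressive_unmask` and the `predict_bit` callback are hypothetical stand-ins, not BAR's actual head or schedule, which the abstract does not specify.

```python
def index_to_bits(index, num_bits):
    """Write a codebook index as a fixed-width list of bits (MSB first)."""
    if not 0 <= index < (1 << num_bits):
        raise ValueError("index does not fit in num_bits")
    return [(index >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]


def bits_to_index(bits):
    """Inverse of index_to_bits."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


def progressive_unmask(predict_bit, num_bits, bits_per_step):
    """Fill masked bits (None) a group at a time.

    predict_bit(position, partial_bits) is a hypothetical stand-in for a
    learned bit-prediction head; the left-to-right group order is an
    assumption made for this sketch.
    """
    bits = [None] * num_bits
    steps = 0
    while None in bits:
        masked = [i for i, b in enumerate(bits) if b is None][:bits_per_step]
        # Predict the whole group from the same partial state, then commit.
        snapshot = tuple(bits)
        predictions = {i: predict_bit(i, snapshot) for i in masked}
        for i, value in predictions.items():
            bits[i] = value
        steps += 1
    return bits, steps
```

With 16 bits per token, the implied codebook has 65,536 entries, yet the sketch only ever handles 16 binary decisions per token, which is the scaling property the abstract attributes to bit-level prediction.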
[CV-1] WorldCompass: Reinforcement Learning for Long-Horizon World Models
Link: https://arxiv.org/abs/2602.09022
Authors: Zehan Wang, Tengfei Wang, Haiyu Zhang, Xuhui Zuo, Junta Wu, Haoyuan Wang, Wenqiang Sun, Zhenwei Wang, Chenjie Cao, Hengshuang Zhao, Chunchao Guo, Zhou Zhao
Institutions: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Project page: this https URL
[CV-2] χ_0: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies
Link: https://arxiv.org/abs/2602.09021
Authors: Checheng Yu, Chonghao Sima, Gangcheng Jiang, Hai Zhang, Haoguang Mai, Hongyang Li, Huijie Wang, Jin Chen, Kaiyang Wu, Li Chen, Lirui Zhao, Modi Shi, Ping Luo, Qingwen Bu, Shijia Peng, Tianyu Li, Yibo Yuan
Institutions: Hong Kong University of Science and Technology
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-3] Robustness Is a Function Not a Number: A Factorized Comprehensive Study of OOD Robustness in Vision-Based Driving
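【Quick Read】: This paper addresses the limited robustness of autonomous driving systems in out-of-distribution (OOD) scenarios, where policies degrade markedly under environmental changes not covered by the training distribution. The core challenges are the lack of a fine-grained account of why OOD failures occur and the question of how to design policy architectures that generalize better. The key elements of the solution are: (1) a five-axis decomposition of the environment (scene, season, weather, time, and agent mix), with controlled k-factor perturbation experiments that quantify the effect of each factor; (2) compact Vision Transformer (ViT) heads trained on frozen foundation-model (FM) features, which yield state-of-the-art OOD robustness at a latency cost; and (3) evidence of non-additive interaction effects (for example, season-time combinations are especially harmful), together with practical design rules such as training on multiple ID environments and targeted exposure to hard conditions.
Link: https://arxiv.org/abs/2602.09018
Authors: Amir Mallak, Alaa Maalouf
Institutions: University of Haifa
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: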
Abstract:Out of distribution (OOD) robustness in autonomous driving is often reduced to a single number, hiding what breaks a policy. We decompose environments along five axes: scene (rural/urban), season, weather, time (day/night), and agent mix; and measure performance under controlled k-factor perturbations (k ∈ {0,1,2,3}). Using closed loop control in VISTA, we benchmark FC, CNN, and ViT policies, train compact ViT heads on frozen foundation-model (FM) features, and vary ID support in scale, diversity, and temporal context. (1) ViT policies are markedly more OOD-robust than comparably sized CNN/FC, and FM features yield state-of-the-art success at a latency cost. (2) Naive temporal inputs (multi-frame) do not beat the best single-frame baseline. (3) The largest single factor drops are rural → urban and day → night (~31% each); actor swaps ~10%, moderate rain ~7%; season shifts can be drastic, and combining a time flip with other changes further degrades performance. (4) FM-feature policies stay above 85% under three simultaneous changes; non-FM single-frame policies take a large first-shift hit, and all no-FM models fall below 50% by three changes. (5) Interactions are non-additive: some pairings partially offset, whereas season-time combinations are especially harmful. (6) Training on winter/snow is most robust to single-factor shifts, while a rural+summer baseline gives the best overall OOD performance. (7) Scaling traces/views improves robustness (+11.8 points from 5 to 14 traces), yet targeted exposure to hard conditions can substitute for scale. (8) Using multiple ID environments broadens coverage and strengthens weak cases (urban OOD 60.6% → 70.1%) with a small ID drop; single-ID preserves peak performance but in a narrow domain. These results yield actionable design rules for OOD-robust driving policies.
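A k-factor perturbation, as described in the abstract, changes exactly k of the five environment axes relative to the in-distribution (ID) baseline. The Python sketch below enumerates such shifted environments. The axis names follow the abstract, but the concrete values for season, weather, and agent mix are placeholders assumed here, and the paper's actual evaluation grid may differ.

```python
from itertools import combinations, product

# Axis names follow the abstract; the value lists are illustrative assumptions.
AXES = {
    "scene": ["rural", "urban"],
    "season": ["summer", "winter"],   # assumed values
    "weather": ["clear", "rain"],     # assumed values
    "time": ["day", "night"],
    "agents": ["cars", "mixed"],      # assumed values
}


def k_factor_shifts(baseline, k, axes=AXES):
    """Return every environment that differs from baseline in exactly k axes."""
    shifts = []
    for changed in combinations(list(axes), k):
        # For each changed axis, consider every value other than the baseline's.
        alternatives = [[v for v in axes[a] if v != baseline[a]] for a in changed]
        for combo in product(*alternatives):
            env = dict(baseline)
            env.update(zip(changed, combo))
            shifts.append(env)
    return shifts


baseline = {"scene": "rural", "season": "summer", "weather": "clear",
            "time": "day", "agents": "cars"}
for k in range(4):
    print(k, len(k_factor_shifts(baseline, k)))
```

With two values per axis, the number of environments at each k is the binomial coefficient C(5, k), so evaluating a policy per k gives a robustness curve over the number of simultaneous changes rather than a single score.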
[CV-4] Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction
Link: https://arxiv.org/abs/2602.09016
Authors: Hao Phung, Hadar Averbuch-Elor
Institutions: Cornell University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Code: this https URL
[CV-5] ArcFlow: Unleashing 2-Step Text-to-Image Generation via High-Precision Non-Linear Flow Distillation
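【Quick Read】: This paper addresses the high inference cost of diffusion models, which rely on many sequential denoising steps. Existing distillation methods approximate the teacher trajectory with linear shortcuts, which makes it hard to match tangent directions that change continuously across timesteps and so degrades generation quality. The key to the solution is ArcFlow, a framework that explicitly uses non-linear flow trajectories to approximate the inference path of a pre-trained teacher. It parameterizes the velocity field as a mixture of continuous momentum processes, which lets it capture velocity evolution and extrapolate coherent velocities into a continuous non-linear trajectory within each denoising step. This parameterization also admits analytical integration of the trajectory, avoiding numerical discretization error and giving a high-precision approximation of the teacher. Trained by trajectory distillation with lightweight adapters, ArcFlow fine-tunes less than 5% of the original parameters and achieves a 40x speedup with 2 function evaluations (NFEs) without significant quality degradation.
Link: https://arxiv.org/abs/2602.09014
Authors: Zihan Yang, Shuyuan Tu, Licheng Zhang, Qi Dai, Yu-Gang Jiang, Zuxuan Wu
Institutions: Fudan University; Microsoft Research Asia
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: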
Abstract:Diffusion models have achieved remarkable generation quality, but they suffer from significant inference cost due to their reliance on multiple sequential denoising steps, motivating recent efforts to distill this inference process into a few-step regime. However, existing distillation methods typically approximate the teacher trajectory by using linear shortcuts, which makes it difficult to match its constantly changing tangent directions as velocities evolve across timesteps, thereby leading to quality degradation. To address this limitation, we propose ArcFlow, a few-step distillation framework that explicitly employs non-linear flow trajectories to approximate pre-trained teacher trajectories. Concretely, ArcFlow parameterizes the velocity field underlying the inference trajectory as a mixture of continuous momentum processes. This enables ArcFlow to capture velocity evolution and extrapolate coherent velocities to form a continuous non-linear trajectory within each denoising step. Importantly, this parameterization admits an analytical integration of this non-linear trajectory, which circumvents numerical discretization errors and results in high-precision approximation of the teacher trajectory. To train this parameterization into a few-step generator, we implement ArcFlow via trajectory distillation on pre-trained teacher models using lightweight adapters. This strategy ensures fast, stable convergence while preserving generative diversity and quality. Built on large-scale models (Qwen-Image-20B and FLUX.1-dev), ArcFlow only fine-tunes on less than 5% of original parameters and achieves a 40x speedup with 2 NFEs over the original multi-step teachers without significant quality degradation. Experiments on benchmarks show the effectiveness of ArcFlow both qualitatively and quantitatively.
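To illustrate why a velocity parameterization that can be integrated in closed form avoids discretization error, here is a small Python sketch. The abstract does not give ArcFlow's actual parameterization, so the exponential form used here, v(t) = sum_k w_k * v_k * exp(g_k * (t - t0)) on a scalar state, is purely an assumption chosen because its integral is known exactly. The function names are hypothetical as well.

```python
import math


def velocity(t, t0, components):
    """Assumed velocity model: a weighted mixture of exponentially evolving terms.

    components: list of (weight, initial_velocity, rate) tuples.
    """
    return sum(w * v * math.exp(g * (t - t0)) for w, v, g in components)


def analytic_step(x0, t0, t1, components):
    """Exact displacement from t0 to t1 under the assumed velocity model."""
    dt = t1 - t0
    total = 0.0
    for w, v, g in components:
        # Integral of exp(g * s) over [0, dt] is (exp(g * dt) - 1) / g, or dt if g == 0.
        total += w * v * (dt if abs(g) < 1e-12 else math.expm1(g * dt) / g)
    return x0 + total


def euler_steps(x0, t0, t1, components, n):
    """Forward-Euler integration with n steps, i.e. n piecewise-linear shortcuts."""
    h = (t1 - t0) / n
    x = x0
    for i in range(n):
        x += h * velocity(t0 + i * h, t0, components)
    return x


components = [(0.6, 1.0, -2.0), (0.4, -0.5, 0.0)]
exact = analytic_step(0.0, 0.0, 1.0, components)
print("analytic:", exact)
print("euler, 2 steps:", euler_steps(0.0, 0.0, 1.0, components, 2))
```

In this toy setting the two-step Euler result is visibly off while the closed-form step matches a very fine numerical integration, which mirrors the abstract's argument for analytic integration over linear shortcuts when only a couple of steps are available.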
[CV-6] Dexterous Manipulation Policies from RGB Human Videos via 4D Hand-Object Trajectory Reconstruction
Link: https://arxiv.org/abs/2602.09013
Authors: Hongyi Chen, Tony Dong, Tiancheng Wu, Liquan Wang, Yash Jangir, Yaru Niu, Yufei Ye, Homanga Bharadhwaj, Zackory Erickson, Jeffrey Ichnowski
Institutions: Carnegie Mellon University; Georgia Institute of Technology; Stanford University
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-7] GEBench: Benchmarking Image Generation Models as GUI Environments
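【Quick Read】: This paper addresses the lack of evaluation of dynamic interaction and temporal coherence when image generation models are used to produce graphical user interfaces (GUIs). Existing benchmarks focus on visual fidelity in general domains and neglect state transitions and temporal consistency in GUI-specific settings. The key to the solution is the GEBench benchmark and the GE-Score metric. GEBench comprises 700 carefully curated samples across five task categories, covering single-step and multi-step interactions, real-world and fictional scenarios, and grounding point localization. GE-Score is a five-dimensional metric that assesses Goal Achievement, Interaction Logic, Content Consistency, UI Plausibility, and Visual Quality. Evaluations show that current models perform well on single-step transitions but struggle to maintain temporal coherence and spatial grounding over longer interaction sequences, with icon interpretation, text rendering, and localization precision as the main bottlenecks.
Link: https://arxiv.org/abs/2602.09007
Authors: Haodong Li, Jingwei Wu, Quan Sun, Guopeng Li, Juanxi Tian, Huanyu Zhang, Yanlin Lai, Ruichuan An, Hongbo Peng, Yuhong Dai, Chenxi Li, Chunmei Qing, Jia Wang, Ziyang Meng, Zheng Ge, Xiangyu Zhang, Daxin Jiang
Institutions: StepFun; South China University of Technology; Peking University; Tsinghua University; Institute of Automation, Chinese Academy of Sciences; The University of Chicago; Nanyang Technological University
Categories: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: 23 pages, 5 figures, 4 tables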
Abstract:Recent advancements in image generation models have enabled the prediction of future Graphical User Interface (GUI) states based on user instructions. However, existing benchmarks primarily focus on general domain visual fidelity, leaving the evaluation of state transitions and temporal coherence in GUI-specific contexts underexplored. To address this gap, we introduce GEBench, a comprehensive benchmark for evaluating dynamic interaction and temporal coherence in GUI generation. GEBench comprises 700 carefully curated samples spanning five task categories, covering both single-step interactions and multi-step trajectories across real-world and fictional scenarios, as well as grounding point localization. To support systematic evaluation, we propose GE-Score, a novel five-dimensional metric that assesses Goal Achievement, Interaction Logic, Content Consistency, UI Plausibility, and Visual Quality. Extensive evaluations on current models indicate that while they perform well on single-step transitions, they struggle significantly with maintaining temporal coherence and spatial grounding over longer interaction sequences. Our findings identify icon interpretation, text rendering, and localization precision as critical bottlenecks. This work provides a foundation for systematic assessment and suggests promising directions for future research toward building high-fidelity generative GUI environments. The code is available at: this https URL.
[CV-8] Generalizing Sports Feedback Generation by Watching Competitions and Reading Books: A Rock Climbing Case Study WACV2026
Link: https://arxiv.org/abs/2602.08996
Authors: Arushi Rai, Adriana Kovashka
Institutions: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: to appear WACV 2026
[CV-9] WorldArena: A Unified Benchmark for Evaluating Perception and Functional Utility of Embodied World Models
Link: https://arxiv.org/abs/2602.08971
Authors: Yu Shang, Zhuohang Li, Yiding Ma, Weikang Su, Xin Jin, Ziyou Wang, Xin Zhang, Yinzhou Tang, Chen Gao, Wei Wu, Xihui Liu, Dhruv Shah, Zhaoxiang Zhang, Zhibo Chen, Jun Zhu, Yonghong Tian, Tat-Seng Chua, Wenwu Zhu, Yong Li
Institutions: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Comments:
[CV-10] Modeling 3D Pedestrian-Vehicle Interactions for Vehicle-Conditioned Pose Forecasting ICRA
Link: https://arxiv.org/abs/2602.08962
Authors: Guangxun Zhu, Xuan Liu, Nicolas Pugeault, Chongfeng Wei, Edmond S. L. Ho
Institutions: University of Glasgow
Categories: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Comments: Accepted for IEEE International Conference on Robotics and Automation (ICRA) 2026
[CV-11] MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
Link: https://arxiv.org/abs/2602.08961
Authors: Ruijie Zhu, Jiahao Lu, Wenbo Hu, Xiaoguang Han, Jianfei Cai, Ying Shan, Chuanxia Zheng
Institutions: NTU; ARC Lab, Tencent PCG; HKUST; CUHK(SZ); Monash University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Geometry (cs.CG); Machine Learning (cs.LG)
Comments: Project page: this https URL
[CV-12] Grow with the Flow: 4D Reconstruction of Growing Plants with Gaussian Flow Fields
Link: https://arxiv.org/abs/2602.08958
Authors: Weihan Luo,Lily Goli,Sherwin Bahmani,Felix Taubner,Andrea Tagliasacchi,David B. Lindell
Affiliations: University of Toronto; Vector Institute; Simon Fraser University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Project page: this https URL
[CV-13] Analysis of Converged 3D Gaussian Splatting Solutions: Density Effects and Prediction Limit
Link: https://arxiv.org/abs/2602.08909
Authors: Zhendong Wang,Cihan Ruan,Jingchuan Xiao,Chuqing Shi,Wei Jiang,Wei Wang,Wenjie Liu,Nam Ling
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments:
[CV-14] FRe: Text-guided Video Frame Reduction for Efficient Video Multi-modal Large Language Models
Link: https://arxiv.org/abs/2602.08861
Authors: Xiangtian Zheng,Zishuo Wang,Yuxin Peng
Affiliations: Wangxuan Institute of Computer Technology, Peking University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-15] FlattenGPT: Depth Compression for Transformer with Layer Flattening ICML2026
Link: https://arxiv.org/abs/2602.08858
Authors: Ruihan Xu,Qingpei Guo,Yao Zhu,Xiangyang Ji,Ming Yang,Shiliang Zhang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Submitted to ICML 2026
[CV-16] VideoVeritas: AI-Generated Video Detection via Perception Pretext Reinforcement Learning
Link: https://arxiv.org/abs/2602.08828
Authors: Hao Tan,Jun Lan,Senyuan Shi,Zichang Tan,Zijian Yu,Huijia Zhu,Weiqiang Wang,Jun Wan,Zhen Lei
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Project: this https URL
[CV-17] Any-to-All MRI Synthesis: A Unified Foundation Model for Nasopharyngeal Carcinoma and Its Downstream Applications
Link: https://arxiv.org/abs/2602.08822
Authors: Yao Pu,Yiming Shi,Zhenxi Zhang,Peixin Yu,Yitao Zhuang,Xiang Wang,Hongzhao Chen,Jing Cai,Ge Ren
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-18] Omni-Video 2: Scaling MLLM-Conditioned Diffusion for Unified Video Generation and Editing
Link: https://arxiv.org/abs/2602.08820
Authors: Hao Yang,Zhiyu Tan,Jia Gong,Luozheng Qin,Hesen Chen,Xiaomeng Yang,Yuqing Sun,Yuetan Lin,Mengping Yang,Hao Li
Affiliations: Shanghai Academy of Artificial Intelligence for Science; Fudan University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Technical Report, Project: this https URL
[CV-19] Addressing data annotation scarcity in Brain Tumor Segmentation on 3D MRI scan Using a Semi-Supervised Teacher-Student Framework
Link: https://arxiv.org/abs/2602.08797
Authors: Jiaming Liu,Cheng Ding,Daoqiang Zhang
Affiliations: Nanjing University of Aeronautics and Astronautics; Shenzhen Research Institute, Nanjing University of Aeronautics and Astronautics
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 10 pages, 7 figures. Submitted to IEEE Journal of Biomedical and Health Informatics (JBHI)
[CV-20] MOVA: Towards Scalable and Synchronized Video-Audio Generation
Link: https://arxiv.org/abs/2602.08794
Authors: SII-OpenMOSS Team: Donghua Yu,Mingshu Chen,Qi Chen,Qi Luo,Qianyi Wu,Qinyuan Cheng,Ruixiao Li,Tianyi Liang,Wenbo Zhang,Wenming Tu,Xiangyu Peng,Yang Gao,Yanru Huo,Ying Zhu,Yinze Luo,Yiyang Zhang,Yuerong Song,Zhe Xu,Zhiyu Zhang,Chenchen Yang,Cheng Chang,Chushu Zhou,Hanfu Chen,Hongnan Ma,Jiaxi Li,Jingqi Tong,Junxi Liu,Ke Chen,Shimin Li,Songlin Wang,Wei Jiang,Zhaoye Fei,Zhiyuan Ning,Chunguo Li,Chenhui Li,Ziwei He,Zengfeng Huang,Xie Chen,Xipeng Qiu
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
Comments: Technical report for MOVA (open-source video-audio generation model). 38 pages, 10 figures, 22 tables. Project page: this https URL Code: this https URL Models: this https URL. Qinyuan Cheng and Tianyi Liang are project leaders. Xie Chen and Xipeng Qiu are corresponding authors
[CV-21] Multimodal Learning for Arcing Detection in Pantograph-Catenary Systems
Link: https://arxiv.org/abs/2602.08792
Authors: Hao Dong,Eleni Chatzi,Olga Fink
Affiliations: ETH Zürich; EPFL
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:
[CV-22] VedicTHG: Symbolic Vedic Computation for Low-Resource Talking-Head Generation in Educational Avatars
Link: https://arxiv.org/abs/2602.08775
Authors: Vineet Kumar Rakesh,Ahana Bhattacharjee,Soumya Mazumdar,Tapas Samanta,Hemendra Kumar Pandey,Amitabha Das,Sarbajit Pal
Affiliations: Homi Bhabha National Institute; Variable Energy Cyclotron Centre; Gargi Memorial Institute of Technology; Jadavpur University; Mahatma Gandhi University
Categories: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG)
Comments:
[CV-23] MVAnimate: Enhancing Character Animation with Multi-View Optimization
Link: https://arxiv.org/abs/2602.08753
Authors: Tianyu Sun,Zhoujie Fu,Bang Zhang,Guosheng Lin
Affiliations: Nanyang Technological University; Alibaba Group
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-24] Shifting the Breaking Point of Flow Matching for Multi-Instance Editing
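Quick read: This paper addresses the difficulty that existing flow-based image editing methods have with independent edits in multi-instance scenes. When a reference image contains several regions that must be modified independently, globally conditioned velocity fields and joint attention cause semantic interference between editing instructions, so edits are neither localized nor free of cross-talk. The key to the solution is Instance-Disentangled Attention, which partitions the joint attention operation and enforces a binding between instance-specific text instructions and spatial regions during velocity field estimation. This enables instance-level editing in a single pass while preserving global output coherence and the accuracy of local edits.
Link: https://arxiv.org/abs/2602.08749
Authors: Carmine Zaccagnino,Fabio Quattrini,Enis Simsar,Marta Tintoré Gazulla,Rita Cucchiara,Alessio Tonioni,Silvia Cascianelli
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: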
Abstract:Flow matching models have recently emerged as an efficient alternative to diffusion, especially for text-guided image generation and editing, offering faster inference through continuous-time dynamics. However, existing flow-based editors predominantly support global or single-instruction edits and struggle with multi-instance scenarios, where multiple parts of a reference input must be edited independently without semantic interference. We identify this limitation as a consequence of globally conditioned velocity fields and joint attention mechanisms, which entangle concurrent edits. To address this issue, we introduce Instance-Disentangled Attention, a mechanism that partitions joint attention operations, enforcing binding between instance-specific textual instructions and spatial regions during velocity field estimation. We evaluate our approach on both natural image editing and a newly introduced benchmark of text-dense infographics with region-level editing instructions. Experimental results demonstrate that our approach promotes edit disentanglement and locality while preserving global output coherence, enabling single-pass, instance-level editing.
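As a rough illustration of the partitioning idea described in this abstract (not the authors' implementation), the sketch below builds a boolean attention mask in which each instruction's text tokens are bound only to the image tokens of their own region. The token layout, the function name `build_instance_mask`, and the rule that image tokens may always attend to each other are assumptions made for this example.

```python
def build_instance_mask(text_owner, image_owner):
    """Return mask[q][k] = True where query token q may attend to key token k.

    text_owner[i]  : instance id of text token i
    image_owner[j] : instance id of image token j, or None for background
    Tokens are ordered as all text tokens followed by all image tokens.
    """
    owners = [("text", o) for o in text_owner] + [("image", o) for o in image_owner]
    n = len(owners)
    mask = [[False] * n for _ in range(n)]
    for q, (q_kind, q_owner) in enumerate(owners):
        for k, (k_kind, k_owner) in enumerate(owners):
            if q_kind == "image" and k_kind == "image":
                # Assumption: image tokens see each other to keep global coherence.
                mask[q][k] = True
            else:
                # Any pair involving text is allowed only within one instance.
                mask[q][k] = q_owner is not None and q_owner == k_owner
    return mask


# Two instructions (ids 0 and 1), one text token each; three image tokens,
# the last of which belongs to the background.
mask = build_instance_mask(text_owner=[0, 1], image_owner=[0, 1, None])
```

In an attention layer, entries that are False would typically be set to a large negative value before the softmax, so the two instructions cannot leak into each other's regions.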
[CV-25] From Correspondence to Actions: Human-Like Multi-Image Spatial Reasoning in Multi-modal Large Language Models
Link: https://arxiv.org/abs/2602.08735
Authors: Masanari Oi,Koki Maeda,Ryuto Koike,Daisuke Oba,Nakamasa Inoue,Naoaki Okazaki
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-26] Closing the Confusion Loop: CLIP-Guided Alignment for Source-Free Domain Adaptation
Link: https://arxiv.org/abs/2602.08730
Authors: Shanshan Wang,Ziying Feng,Xiaozheng Shen,Xun Yang,Pichao Wang,Zhenwei He,Xingyi Zhang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-27] Artifact Reduction in Undersampled 3D Cone-Beam CTs using a Hybrid 2D-3D CNN Framework
Link: https://arxiv.org/abs/2602.08727
Authors: Johannes Thalhammer,Tina Dorosti,Sebastian Peterhansl,Daniela Pfeiffer,Franz Pfeiffer,Florian Schaff
Affiliations: Technical University of Munich
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:
[CV-28] SynSacc: A Blender-to-V2E Pipeline for Synthetic Neuromorphic Eye-Movement Data and Sim-to-Real Spiking Model Training WACV2026
Link: https://arxiv.org/abs/2602.08726
Authors: Khadija Iddrisu,Waseem Shariff,Suzanne Little,Noel OConnor
Affiliations: Dublin City University; University of Galway
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted to the 2nd Workshop on "Event-based Vision in the Era of Generative AI - Transforming Perception and Visual Innovation", IEEE Winter Conference on Applications of Computer Vision (WACV 2026)
[CV-29] FusionEdit: Semantic Fusion and Attention Modulation for Training-Free Image Editing ICASSP2026
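Quick read: This paper addresses the boundary artifacts and limited editability that hard masks cause in text-guided image editing. Its core solution is FusionEdit, a training-free image editing framework. First, it automatically identifies the regions to edit and to preserve by measuring the semantic discrepancy between the source and target prompts, and uses a distance-aware latent fusion strategy to produce a smooth soft mask, combined with a total variation loss to keep boundary transitions natural. Second, it introduces AdaIN (Adaptive Instance Normalization) modulation in the DiT (Diffusion Transformer) attention layers to perform statistical attention fusion within the editing region, improving local editability while preserving global consistency with the source image.
Link: https://arxiv.org/abs/2602.08725
Authors: Yongwen Lai,Chaoqun Wang,Shaobo Min
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by ICASSP 2026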
Abstract:Text-guided image editing aims to modify specific regions according to the target prompt while preserving the identity of the source image. Recent methods exploit explicit binary masks to constrain editing, but hard mask boundaries introduce artifacts and reduce editability. To address these issues, we propose FusionEdit, a training-free image editing framework that achieves precise and controllable edits. First, editing and preserved regions are automatically identified by measuring semantic discrepancies between source and target prompts. To mitigate boundary artifacts, FusionEdit performs distance-aware latent fusion along region boundaries to yield the soft and accurate mask, and employs a total variation loss to enforce smooth transitions, obtaining natural editing results. Second, FusionEdit leverages AdaIN-based modulation within DiT attention layers to perform a statistical attention fusion in the editing region, enhancing editability while preserving global consistency with the source image. Extensive experiments demonstrate that our FusionEdit significantly outperforms state-of-the-art methods. Code is available at this https URL.
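For readers unfamiliar with AdaIN, the standard operation re-normalizes one set of features so that its mean and standard deviation match those of another. The sketch below shows only that generic formula on flat lists of numbers; how FusionEdit applies it inside DiT attention layers is not specified in the abstract, and the names `adain`, `source_feats`, and `target_feats` are hypothetical.

```python
import statistics


def adain(content, style, eps=1e-5):
    """Re-normalize `content` so its mean and std match those of `style`."""
    c_mean, c_std = statistics.fmean(content), statistics.pstdev(content)
    s_mean, s_std = statistics.fmean(style), statistics.pstdev(style)
    # Whiten the content statistics, then re-color with the style statistics.
    return [s_std * (x - c_mean) / (c_std + eps) + s_mean for x in content]


source_feats = [1.0, 2.0, 3.0, 4.0]
target_feats = [10.0, 20.0, 30.0, 40.0]
fused = adain(source_feats, target_feats)
```

The output keeps the relative ordering of the content features while adopting the statistics of the style features, which is why the operation is often used to blend two feature sources.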
[CV-30] Rotated Lights for Consistent and Efficient 2D Gaussians Inverse Rendering
Link: https://arxiv.org/abs/2602.08724
Authors: Geng Lin,Matthias Zwicker
Affiliations: University of Maryland, College Park
Categories: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Comments: Project Page: this https URL
[CV-31] Zero-shot System for Automatic Body Region Detection for Volumetric CT and MR Images
Link: https://arxiv.org/abs/2602.08717
Authors: Farnaz Khun Jush,Grit Werner,Mark Klemens,Matthias Lenga
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 8 pages, 5 figures, 5 tables
[CV-32] Towards Understanding Multimodal Fine-Tuning: Spatial Features
Link: https://arxiv.org/abs/2602.08713
Authors: Lachin Naghashyar,Hunar Batra,Ashkan Khakzar,Philip Torr,Ronald Clark,Christian Schroeder de Witt,Constantin Venhoff
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments:
[CV-33] TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
Link: https://arxiv.org/abs/2602.08711
Authors: Linli Yao,Yuancheng Wei,Yaojie Zhang,Lei Li,Xinlong Chen,Feifan Song,Ziyue Wang,Kun Ouyang,Yuanxin Liu,Lingpeng Kong,Qi Liu,Pengfei Wan,Kun Gai,Yuanxing Zhang,Xu Sun
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-34] Low-Light Video Enhancement with An Effective Spatial-Temporal Decomposition Paradigm
Link: https://arxiv.org/abs/2602.08699
Authors: Xiaogang Xu,Kun Zhou,Tao Hu,Jiafei Wu,Ruixing Wang,Hao Peng,Bei Yu
Affiliations: The Chinese University of Hong Kong; Shenzhen University; Bytedance; The University of Hong Kong; DJI; Zhejiang Normal University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-35] OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
Link: https://arxiv.org/abs/2602.08683
Authors: Feilong Tang,Xiang An,Yunyao Yan,Yin Xie,Bin Qin,Kaicheng Yang,Yifei Shen,Yuanhan Zhang,Chunyuan Li,Shikun Feng,Changrui Chen,Huajie Tan,Ming Hu,Manyuan Zhang,Bo Li,Ziyong Feng,Ziwei Liu,Zongyuan Ge,Jiankang Deng
Affiliations: Glint Lab; AIM for Health Lab; MVP Lab
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-36] ALIVE: Animate Your World with Lifelike Audio-Video Generation
Link: https://arxiv.org/abs/2602.08682
Authors: Ying Guo,Qijun Gan,Yifu Zhang,Jinlai Liu,Yifei Hu,Pan Xie,Dongjun Qian,Yu Zhang,Ruiqi Li,Yuqi Zhang,Ruibiao Lu,Xiaofeng Mei,Bo Han,Xiang Yin,Bingyue Peng,Zehuan Yuan
Affiliations: Bytedance ALIVE Team
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-37] A Machine Learning accelerated geophysical fluid solver
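Quick read: This thesis addresses how to apply machine learning (ML) methods effectively in domains with mathematical constraints, in particular to solving partial differential equations (PDEs). Traditional finite difference or finite volume schemes lose accuracy and stability at low resolution. The key to the solution is a data-driven discretization method: deep neural networks are trained to predict the coefficients of quasi-linear stencils, improving the accuracy and stability of PDE solvers on structured grids. The method can also inherit the strengths of traditional numerical schemes, for example by adopting finite-volume formulations to satisfy conservation laws. The thesis implements classic solvers for the shallow water and Euler equations, which are reported to perform much better than the PyClaw solver, and proposes four deep neural networks for the ML-based solver, two of which produce satisfactory solutions.
Link: https://arxiv.org/abs/2602.08670
Authors: Yang Bai
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Computational Engineering, Finance, and Science (cs.CE); Performance (cs.PF); Computational Physics (physics.comp-ph)
Comments: Master Thesis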
Abstract:Machine learning methods have been successful in many areas, like image classification and natural language processing. However, it still needs to be determined how to apply ML to areas with mathematical constraints, like solving PDEs. Among various approaches to applying ML techniques to solving PDEs, the data-driven discretization method presents a promising way of accelerating and improving existing PDE solver on structured grids where it predicts the coefficients of quasi-linear stencils for computing values or derivatives of a function at given positions. It can improve the accuracy and stability of low-resolution simulation compared with using traditional finite difference or finite volume schemes. Meanwhile, it can also benefit from traditional numerical schemes like achieving conservation law by adapting finite volume type formulations. In this thesis, we have implemented the shallow water equation and Euler equation classic solver under a different framework. Experiments show that our classic solver performs much better than the Pyclaw solver. Then we propose four different deep neural networks for the ML-based solver. The results indicate that two of these approaches could output satisfactory solutions.
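To make the conservation point concrete, here is a minimal sketch, under assumptions of my own, of a 1D finite-volume update on a periodic grid where interface values come from per-interface stencil coefficients. In a data-driven discretization a network would predict those coefficients; here they are fixed by hand (first-order upwind), and the function names and the linear advection flux are illustrative choices rather than anything taken from the thesis.

```python
def interface_values(u, coeffs):
    """Reconstruct u at each right cell interface i+1/2 on a periodic grid.

    coeffs[i] = (a, b) weights cells i and i+1. A learned model could predict
    them per interface; here they are supplied by the caller.
    """
    n = len(u)
    return [a * u[i] + b * u[(i + 1) % n] for i, (a, b) in enumerate(coeffs)]


def finite_volume_step(u, coeffs, velocity, dt, dx):
    """One flux-form update: u_i -= dt/dx * (F_{i+1/2} - F_{i-1/2})."""
    flux = [velocity * v for v in interface_values(u, coeffs)]
    # flux[i - 1] wraps around for i = 0, which gives periodic boundaries.
    return [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(len(u))]


u0 = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
upwind = [(1.0, 0.0)] * len(u0)  # first-order upwind for positive velocity
u1 = finite_volume_step(u0, upwind, velocity=1.0, dt=0.5, dx=1.0)
```

Because every interface flux is added to one cell and subtracted from its neighbor, the total of `u` is preserved whatever coefficients are used, which is the property a finite-volume formulation offers to a learned stencil.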
[CV-38] WiFlow: A Lightweight WiFi-based Continuous Human Pose Estimation Network with Spatio-Temporal Feature Decoupling
Link: https://arxiv.org/abs/2602.08661
Authors: Yi Dao,Lankai Zhang,Hao Liu,Haiwei Zhang,Wenbo Wang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-39] Deep Learning-Based Fixation Type Prediction for Quality Assurance in Digital Pathology
Link: https://arxiv.org/abs/2602.08652
Authors: Oskar Thaeter,Tanja Niedermair,Johannes Raffler,Ralf Huss,Peter J. Schüffler
Affiliations: Technical University of Munich; University of Regensburg; University Hospital of Augsburg; Bavarian Cancer Research Center; Institute of Pathology and Molecular Diagnostics; Munich Center for Machine Learning; Munich Data Science Institute; BioM Biotech Cluster Development GmbH
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 17 pages, 8 figures, 7 tables
[CV-40] Revisiting [CLS] and Patch Token Interaction in Vision Transformers ICLR2026
Link: https://arxiv.org/abs/2602.08626
Authors: Alexis Marouani,Oriane Siméoni,Hervé Jégou,Piotr Bojanowski,Huy V. Vo
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: To be published as a conference paper at ICLR 2026
[CV-41] Improving Reconstruction of Representation Autoencoder
Link: https://arxiv.org/abs/2602.08620
Authors: Siyu Liu,Chujie Qin,Hubery Yin,Qixin Yan,Zheng-Peng Duan,Chen Li,Jing Lyu,Chun-Le Guo,Chongyi Li
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-42] Inspiration Seeds: Learning Non-Literal Visual Combinations for Generative Exploration
Link: https://arxiv.org/abs/2602.08615
Authors: Kfir Goldberg,Elad Richardson,Yael Vinker
Affiliations: BRIA AI; Runway; MIT
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Project page available at this https URL
[CV-43] Overview and Comparison of AVS Point Cloud Compression Standard
Link: https://arxiv.org/abs/2602.08613
Authors: Wei Gao,Wenxu Gao,Xingming Mu,Changhao Peng,Ge Li
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 3 figures, 3 tables
[CV-44] SemiNFT: Learning to Transfer Presets from Imitation to Appreciation via Hybrid-Sample Reinforcement Learning
Link: https://arxiv.org/abs/2602.08582
Authors: Melany Yang,Yuhang Yu,Diwang Weng,Jinwei Chen,Wei Dong
Affiliations: vivo Mobile Communication Co. Ltd; Zhejiang University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-45] FLAG-4D: Flow-Guided Local-Global Dual-Deformation Model for 4D Reconstruction
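Quick read: This paper addresses the difficulty that existing methods have in consistently modeling complex point motion and fine-grained dynamic detail when generating high-quality novel views of dynamic scenes from sparse input views. The key to the solution is the FLAG-4D framework, which uses a dual-deformation network to reconstruct how 3D Gaussian primitives evolve through space and time: an Instantaneous Deformation Network (IDN) captures fine-grained local deformations, a Global Motion Network (GMN) models long-range dynamics, and the two are refined through mutual learning. It also incorporates dense motion features from a pretrained optical flow backbone and uses a deformation-guided attention mechanism to align flow information from adjacent timeframes with the current state of each 3D Gaussian, improving the temporal coherence and detail fidelity of the reconstruction.
Link: https://arxiv.org/abs/2602.08558
Authors: Guan Yuan Tan,Ngoc Tuan Vu,Arghya Pal,Sailaja Rajanala,Raphael Phan C.-W.,Mettu Srinivas,Chee-Ming Ting
Affiliations: National University of Singapore; Nanyang Technological University
Categories: Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT)
Comments: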
Abstract:We introduce FLAG-4D, a novel framework for generating novel views of dynamic scenes by reconstructing how 3D Gaussian primitives evolve through space and time. Existing methods typically rely on a single Multilayer Perceptron (MLP) to model temporal deformations, and they often struggle to capture complex point motions and fine-grained dynamic details consistently over time, especially from sparse input views. Our approach, FLAG-4D, overcomes this by employing a dual-deformation network that dynamically warps a canonical set of 3D Gaussians over time into new positions and anisotropic shapes. This dual-deformation network consists of an Instantaneous Deformation Network (IDN) for modeling fine-grained, local deformations and a Global Motion Network (GMN) for capturing long-range dynamics, refined through mutual learning. To ensure these deformations are both accurate and temporally smooth, FLAG-4D incorporates dense motion features from a pretrained optical flow backbone. We fuse these motion cues from adjacent timeframes and use a deformation-guided attention mechanism to align this flow information with the current state of each evolving 3D Gaussian. Extensive experiments demonstrate that FLAG-4D achieves higher-fidelity and more temporally coherent reconstructions with finer detail preservation than state-of-the-art methods.
[CV-46] GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing ICLR2026
Link: https://arxiv.org/abs/2602.08550
Authors: Shih-Fang Chen,Jun-Cheng Chen,I-Hong Jhuo,Yen-Yu Lin
Affiliations: National Yang Ming Chiao Tung University; Academia Sinica; Microsoft AI
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Comments: ICLR 2026. This is a preprint version. The camera-ready version will be updated soon
[CV-47] IBR4D: Tracing-Guided Iterative Boundary Refinement for Efficient 4D Gaussian Segmentation
Link: https://arxiv.org/abs/2602.08540
Authors: He Wu,Xia Yan,Yanghui Xu,Liegang Xia,Jiazhou Chen
Affiliations: Zhejiang University of Technology
Categories: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Comments: 13 pages, 6 figures, 4 tables
[CV-48] Thegra: Graph-based SLAM for Thermal Imagery
Link: https://arxiv.org/abs/2602.08531
Authors: Anastasiia Kornilova,Ivan Moskalenko,Arabella Gromova,Gonzalo Ferrer,Alexander Menshchikov
Affiliations: Skolkovo Institute of Science and Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-49] Automatic regularization parameter choice for tomography using a double model approach
Link: https://arxiv.org/abs/2602.08528
Authors: Chuyang Wu,Samuli Siltanen
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
Comments:
[CV-50] GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving
Link: https://arxiv.org/abs/2602.08524
Authors: Linger Deng,Yuliang Liu,Wenwen Yu,Zujia Zhang,Jianzhong Ju,Zhenbo Luo,Xiang Bai
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-51] Are Vision Foundation Models Foundational for Electron Microscopy Image Segmentation?
Link: https://arxiv.org/abs/2602.08505
Authors: Caterina Fuster-Barceló,Virginie Uhlmann
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-52] Enhanced Food Category Recognition under Illumination-Induced Domain Shift
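Quick read: This paper addresses the domain shift caused by illumination changes in visual food recognition systems deployed in real-world settings such as automated conveyor-belt inspection, which significantly reduces accuracy under cross-dataset evaluation. Existing studies are mostly limited to single food categories or controlled settings, and public food datasets generally lack explicit illumination annotations. The key to the solution is to construct synthetic illumination-augmented datasets by systematically varying light temperature and intensity, enabling controlled robustness analysis without additional labels. This is combined with cross-dataset transfer learning and domain generalization, with a focus on illumination-sensitive target categories such as apple-based classes. Experiments show that illumination-aware augmentation improves recognition robustness under domain shift while preserving real-time performance.
Link: https://arxiv.org/abs/2602.08491
Authors: Keonvin Park,Aditya Pal,Jin Hong Mok
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: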
Abstract:Visual food recognition systems deployed in real-world environments, such as automated conveyor-belt inspection, are highly sensitive to domain shifts caused by illumination changes. While recent studies have shown that lighting variations can significantly distort food perception by both humans and AI, existing works are often limited to single food categories or controlled settings, and most public food datasets lack explicit illumination annotations. In this work, we investigate illumination-induced domain shift in multi-class food category recognition using two widely adopted datasets, Food-101 and Fruits-360. We demonstrate substantial accuracy degradation under cross-dataset evaluation due to mismatched visual conditions. To address this challenge, we construct synthetic illumination-augmented datasets by systematically varying light temperature and intensity, enabling controlled robustness analysis without additional labels. We further evaluate cross-dataset transfer learning and domain generalization, with a focus on illumination-sensitive target categories such as apple-based classes. Experimental results show that illumination-aware augmentation significantly improves recognition robustness under domain shift while preserving real-time performance. Our findings highlight the importance of illumination robustness and provide practical insights for deploying reliable food recognition systems in real-world inspection scenarios.
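One simple way to picture "systematically varying light temperature and intensity" is a per-channel gain applied to each pixel. The sketch below is an assumption-laden illustration, not the authors' pipeline: the `warmth` parameter, the linear red/blue gains, and the settings grid are all hypothetical.

```python
def relight(pixel, intensity=1.0, warmth=0.0):
    """Apply a simple illumination change to one RGB pixel with values in [0, 1].

    intensity scales all channels; warmth > 0 boosts red and damps blue
    (warmer light), warmth < 0 does the opposite (cooler light).
    """
    gains = (1.0 + warmth, 1.0, 1.0 - warmth)
    # Clamp to the valid range after scaling.
    return tuple(min(1.0, max(0.0, intensity * gain * c))
                 for gain, c in zip(gains, pixel))


def augment(image, settings):
    """Return one relit copy of `image` (a list of pixel rows) per setting."""
    return [[[relight(p, i, w) for p in row] for row in image]
            for i, w in settings]


image = [[(0.5, 0.5, 0.5), (0.2, 0.4, 0.8)]]
variants = augment(image, settings=[(0.5, 0.0), (1.0, 0.2), (1.5, -0.2)])
```

Since the augmentation only changes pixel values, each variant keeps the class label of the original image, which matches the "without additional labels" property mentioned in the abstract.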
[CV-53] Reliability-aware Execution Gating for Near-field and Off-axis Vision-guided Robotic Alignment
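Quick read: This paper addresses the problem that vision-guided robotic systems performing precision alignment in near-field and off-axis configurations still fail frequently in execution even when pose estimates are numerically accurate. The study attributes these failures to a deterministic geometric error amplification mechanism: small pose estimation errors are magnified through system structure and motion execution, leading to unstable or failed alignment. The key to the solution is a Reliability-aware Execution Gating mechanism that evaluates geometric consistency and configuration risk at the execution level and selectively rejects or scales high-risk pose updates. This significantly improves task success rates, reduces execution variance, and suppresses tail-risk behavior while leaving average pose accuracy largely unchanged. The method is estimator-agnostic and can be integrated with both classical geometry-based and learning-based pose estimation pipelines.
Link: https://arxiv.org/abs/2602.08466
Authors: Ning Hu,Senhao Cao,Maochen Li
Affiliations: Unknown
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Comments: 7 pages, 1 figure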
Abstract:Vision-guided robotic systems are increasingly deployed in precision alignment tasks that require reliable execution under near-field and off-axis configurations. While recent advances in pose estimation have significantly improved numerical accuracy, practical robotic systems still suffer from frequent execution failures even when pose estimates appear accurate. This gap suggests that pose accuracy alone is insufficient to guarantee execution-level reliability. In this paper, we reveal that such failures arise from a deterministic geometric error amplification mechanism, in which small pose estimation errors are magnified through system structure and motion execution, leading to unstable or failed alignment. Rather than modifying pose estimation algorithms, we propose a Reliability-aware Execution Gating mechanism that operates at the execution level. The proposed approach evaluates geometric consistency and configuration risk before execution, and selectively rejects or scales high-risk pose updates. We validate the proposed method on a real UR5 robotic platform performing single-step visual alignment tasks under varying camera-target distances and off-axis configurations. Experimental results demonstrate that the proposed execution gating significantly improves task success rates, reduces execution variance, and suppresses tail-risk behavior, while leaving average pose accuracy largely unchanged. Importantly, the proposed mechanism is estimator-agnostic and can be readily integrated with both classical geometry-based and learning-based pose estimation pipelines. These results highlight the importance of execution-level reliability modeling and provide a practical solution for improving robustness in near-field vision-guided robotic systems.
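The "reject or scale" behaviour described in this abstract can be pictured as a small decision rule placed between the pose estimator and the motion command. The sketch below is only a guess at the general shape of such a gate: the scalar `risk` score, both thresholds, and the linear scaling rule are assumptions, and the paper's actual consistency and risk measures are not given here.

```python
def gate_pose_update(update, risk, reject_above=0.8, scale_above=0.4):
    """Decide how much of a proposed pose update to execute.

    update : list of pose increments (e.g. translation and rotation deltas)
    risk   : scalar in [0, 1] from some reliability check (hypothetical here)
    Returns (decision, gated_update).
    """
    if risk >= reject_above:
        # Too risky: execute nothing and wait for a better estimate.
        return "reject", [0.0] * len(update)
    if risk >= scale_above:
        # Shrink the step linearly: full step at scale_above, none at reject_above.
        factor = (reject_above - risk) / (reject_above - scale_above)
        return "scale", [factor * u for u in update]
    return "accept", list(update)


decision, gated = gate_pose_update([1.0, -2.0], risk=0.6)
```

Because the gate only looks at the proposed update and a risk score, it can sit behind any estimator, which is consistent with the estimator-agnostic claim in the abstract.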
[CV-54] TriC-Motion: Tri-Domain Causal Modeling Grounded Text-to-Motion Generation
Link: https://arxiv.org/abs/2602.08462
Authors: Yiyang Cao,Yunze Deng,Ziyu Lin,Bin Feng,Xinggang Wang,Wenyu Liu,Dandan Zheng,Jingdong Chen
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-55] Vista: Scene-Aware Optimization for Streaming Video Question Answering under Post-Hoc Queries AAAI2026
Link: https://arxiv.org/abs/2602.08448
Authors: Haocheng Lu,Nan Zhang,Wei Tao,Xiaoyang Qu,Guokuan Li,Jiguang Wan,Jianzong Wang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Accepted to AAAI 2026 (Main Technical Track)
[CV-56] Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
Link: https://arxiv.org/abs/2602.08439
Authors: Yuhao Dong,Shulin Tian,Shuai Liu,Shuangrui Ding,Yuhang Zang,Xiaoyi Dong,Yuhang Cao,Jiaqi Wang,Ziwei Liu
Affiliations: S-Lab, Nanyang Technological University; Shanghai AI Lab; CUHK-MMLab
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-57] Understanding and Optimizing Attention-Based Sparse Matching for Diverse Local Features
Link: https://arxiv.org/abs/2602.08430
Authors: Qiang Wang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-58] RealSynCol: a high-fidelity synthetic colon dataset for 3D reconstruction applications
Link: https://arxiv.org/abs/2602.08397
Authors: Chiara Lena,Davide Milesi,Alessandro Casella,Luca Carlini,Joseph C. Norton,James Martin,Bruno Scaglioni,Keith L. Obstein,Roberto De Sire,Marco Spadaccini,Cesare Hassan,Pietro Valdastri,Elena De Momi
Affiliations: University of Pisa; University of Oxford; Imperial College London; Stanford University; Massachusetts Institute of Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-59] D2-VR: Degradation-Robust and Distilled Video Restoration with Synergistic Optimization Strategy
Link: https://arxiv.org/abs/2602.08395
Authors: Jianfeng Liang,Shaocheng Shen,Botao Xu,Qiang Hu,Xiaoyun Zhang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-60] BiManiBench: A Hierarchical Benchmark for Evaluating Bimanual Coordination of Multimodal Large Language Models
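Quick read: This paper addresses the insufficient coverage of bimanual manipulation in current embodied-intelligence evaluations of Multimodal Large Language Models (MLLMs), in particular the lack of systematic assessment of dual-arm spatial coordination and temporal sequencing. Existing benchmarks are largely confined to single-arm manipulation and cannot capture the spatio-temporal coordination needed for tasks such as lifting a heavy pot with both hands. The key to the solution is BiManiBench, a hierarchical benchmark that evaluates models at three tiers: fundamental spatial reasoning, high-level action planning, and low-level end-effector control. It isolates challenges unique to bimanual manipulation (such as arm reachability and kinematic constraints), distinguishes perceptual hallucinations from planning failures, and reveals marked weaknesses of current models in dual-arm spatial grounding and control, pointing future research toward inter-arm collision avoidance and fine-grained temporal sequencing.
Link: https://arxiv.org/abs/2602.08392
Authors: Xin Wu,Zhixuan Liang,Yue Ma,Mengkang Hu,Zhiyuan Qin,Xiu Li
Affiliations: Tsinghua University; The University of Hong Kong; HKUST; Beijing Innovation Center of Humanoid Robotics
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: 38 pages, 9 figures. Project page: this https URL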
Abstract:Multimodal Large Language Models (MLLMs) have significantly advanced embodied AI, and using them to benchmark robotic intelligence has become a pivotal trend. However, existing frameworks remain predominantly confined to single-arm manipulation, failing to capture the spatio-temporal coordination required for bimanual tasks like lifting a heavy pot. To address this, we introduce BiManiBench, a hierarchical benchmark evaluating MLLMs across three tiers: fundamental spatial reasoning, high-level action planning, and low-level end-effector control. Our framework isolates unique bimanual challenges, such as arm reachability and kinematic constraints, thereby distinguishing perceptual hallucinations from planning failures. Analysis of over 30 state-of-the-art models reveals that despite high-level reasoning proficiency, MLLMs struggle with dual-arm spatial grounding and control, frequently resulting in mutual interference and sequencing errors. These findings suggest the current paradigm lacks a deep understanding of mutual kinematic constraints, highlighting the need for future research to focus on inter-arm collision-avoidance and fine-grained temporal sequencing.
[CV-61] Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers
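Quick read: This paper addresses two core problems of diffusion models in image editing: the difficulty of achieving precise geometric transformations of objects (translation, rotation, and scaling), especially in complex scenes, and the inadequate modeling of intricate lighting and shadow effects, which makes results look unrealistic. The key to the solution is the GeoEdit framework, whose main contributions are: (1) in-context generation through a diffusion transformer module that integrates geometric transformations for precise object edits; and (2) Effects-Sensitive Attention, which strengthens the modeling of intricate lighting and shadow effects for improved realism. The authors also construct RS-Objects, a dataset of over 120,000 high-quality image pairs, to support learning precise geometric editing together with realistic lighting and shadows.
Link: https://arxiv.org/abs/2602.08388
Authors: Shuo Zhang,Wenzhuo Wu,Huayu Zhang,Jiarong Cheng,Xianghao Zang,Chao Ban,Hao Sun,Zhongjiang He,Tianwei Cao,Kongming Liang,Zhanyu Ma
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: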
Abstract:Recent advances in diffusion models have significantly improved image editing. However, challenges persist in handling geometric transformations, such as translation, rotation, and scaling, particularly in complex scenes. Existing approaches suffer from two main limitations: (1) difficulty in achieving accurate geometric editing of object translation, rotation, and scaling; (2) inadequate modeling of intricate lighting and shadow effects, leading to unrealistic results. To address these issues, we propose GeoEdit, a framework that leverages in-context generation through a diffusion transformer module, which integrates geometric transformations for precise object edits. Moreover, we introduce Effects-Sensitive Attention, which enhances the modeling of intricate lighting and shadow effects for improved realism. To further support training, we construct RS-Objects, a large-scale geometric editing dataset containing over 120,000 high-quality image pairs, enabling the model to learn precise geometric editing while generating realistic lighting and shadows. Extensive experiments on public benchmarks demonstrate that GeoEdit consistently outperforms state-of-the-art methods in terms of visual quality, geometric accuracy, and realism.
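The three geometric edits named in this abstract (translation, rotation, scaling) form a standard 2D similarity transform. The sketch below applies one to the corners of an object's bounding box; it illustrates the transformation family only, since the abstract does not say how GeoEdit encodes or conditions on the transform, and the function name and the choice to rotate about the centroid are assumptions.

```python
import math


def transform_points(points, tx=0.0, ty=0.0, angle_deg=0.0, scale=1.0):
    """Scale and rotate 2D points about their centroid, then translate them."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = []
    for x, y in points:
        # Offsets from the centroid, scaled before rotation.
        dx, dy = scale * (x - cx), scale * (y - cy)
        out.append((cx + cos_a * dx - sin_a * dy + tx,
                    cy + sin_a * dx + cos_a * dy + ty))
    return out


box = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
moved = transform_points(box, tx=5.0, angle_deg=90.0, scale=2.0)
```

The transformed corners describe where the object should end up; the hard part the paper targets is regenerating the object and its lighting and shadows at that location, which this sketch does not attempt.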
[CV-62] E-VAds: An E-commerce Short Videos Understanding Benchmark for MLLMs
Link: https://arxiv.org/abs/2602.08355
Authors: Xianjie Liu,Yiman Hu,Liang Wu,Ping Hu,Yixiong Zou,Jian Xu,Bo Zheng
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-63] What Whether and How? Unveiling Process Reward Models for Thinking with Images Reasoning
Link: https://arxiv.org/abs/2602.08346
Authors: Yujin Zhou,Pengcheng Wen,Jiale Chen,Boqin Yin,Han Zhu,Jiaming Ji,Juntao Dai,Chi-Min Chan,Sirui Han
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-64] UrbanGraphEmbeddings: Learning and Evaluating Spatially Grounded Multimodal Embeddings for Urban Science
Link: https://arxiv.org/abs/2602.08342
Authors: Jie Zhang,Xingtong Yu,Yuan Fang,Rudi Stouffs,Zdravko Trivic
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:
[CV-65] CoTZero: Annotation-Free Human-Like Vision Reasoning via Hierarchical Synthetic CoT
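[Quick Read]: This paper addresses the problem that current Vision-Language Models (VLMs), despite good image-text alignment, still lack human-like visual reasoning. In particular, they rely on surface correlations instead of building logically coherent structured representations, so they struggle to capture higher-level semantic structure and causal relations, which hinders compositional and verifiable reasoning. The key to the solution is CoTZero, an annotation-free paradigm with two core parts. The first is a dual-stage data synthesis method inspired by neurocognitive theory: a bottom-up stage extracts atomic visual primitives and incrementally composes them into diverse, structured question-reasoning forms, and a top-down stage uses coarse global structure to guide the understanding of local details and causal relations. The second is a Cognitively Coherent Verifiable Rewards (CCVR) mechanism built on the synthesized data, which provides stepwise feedback during Reinforcement Fine-Tuning (RFT) to strengthen the model's hierarchical reasoning and generalization.
Link: https://arxiv.org/abs/2602.08339
Authors: Chengyi Du,Yazhe Niu,Dazhong Shen,Luxin Xu
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: 16 pages, 6 figures
Click to view abstract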
Abstract:Recent advances in vision-language models (VLMs) have markedly improved image-text alignment, yet they still fall short of human-like visual reasoning. A key limitation is that many VLMs rely on surface correlations rather than building logically coherent structured representations, which often leads to missed higher-level semantic structure and non-causal relational understanding, hindering compositional and verifiable reasoning. To address these limitations by introducing human models into the reasoning process, we propose CoTZero, an annotation-free paradigm with two components: (i) a dual-stage data synthesis approach and (ii) a cognition-aligned training method. In the first component, we draw inspiration from neurocognitive accounts of compositional productivity and global-to-local analysis. In the bottom-up stage, CoTZero extracts atomic visual primitives and incrementally composes them into diverse, structured question-reasoning forms. In the top-down stage, it enforces hierarchical reasoning by using coarse global structure to guide the interpretation of local details and causal relations. In the cognition-aligned training component, built on the synthesized CoT data, we introduce Cognitively Coherent Verifiable Rewards (CCVR) in Reinforcement Fine-Tuning (RFT) to further strengthen VLMs’ hierarchical reasoning and generalization, providing stepwise feedback on reasoning coherence and factual correctness. Experiments show that CoTZero achieves an F1 score of 83.33 percent on our multi-level semantic inconsistency benchmark with lexical-perturbation negatives, across both in-domain and out-of-domain settings. Ablations confirm that each component contributes to more interpretable and human-aligned visual reasoning.
[CV-66] Language-Guided Transformer Tokenizer for Human Motion Generation
Link: https://arxiv.org/abs/2602.08337
Authors: Sheng Yan,Yong Wang,Xin Du,Junsong Yuan,Mengyuan Liu
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments:
[CV-67] CAE-AV: Improving Audio-Visual Learning via Cross-modal Interactive Enrichment
Link: https://arxiv.org/abs/2602.08309
Authors: Yunzuo Hu,Wen Li,Jing Zhang
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments: 13 pages, 8 figures
[CV-68] ghnari v2: Mitigating Label Noise and Distribution Shift in Multimodal Plant Distribution Prediction via Mixture of Experts and Weakly Supervised Learning
Link: https://arxiv.org/abs/2602.08282
Authors: Haixu Liu,Yufei Wang,Tianxiang Xu,Chuancheng Shi,Hongsheng Xing
Affiliations: The University of Sydney; The University of New South Wales; School of Software and Microelectronics, Peking University; Shandong University of Technology
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:
[CV-69] PISCO: Precise Video Instance Insertion with Sparse Control
Link: https://arxiv.org/abs/2602.08277
Authors: Xiangbo Gao,Renjie Li,Xinghao Chen,Yuheng Wu,Suofei Feng,Qing Yin,Zhengzhong Tu
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:
[CV-70] Informative Object-centric Next Best View for Object-aware 3D Gaussian Splatting in Cluttered Scenes ICRA2026
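[Quick Read]: This paper addresses the unreliability of 3D representations built in cluttered scenes, where occlusions and incomplete observations are unavoidable. Existing methods rely only on geometric cues and ignore manipulation-relevant semantics, which leads to insufficient exploration and poor coverage of local regions. The key to the solution is an instance-aware Next Best View (NBV) policy: instance-level information in 3D Gaussian Splatting (3DGS) is distilled into one-hot object vectors, which are used to compute a confidence-weighted information gain that identifies regions of erroneous or uncertain Gaussians and prioritizes them for observation. The method can also be adapted to an object-centric NBV policy that focuses on a target object, improving robustness to object placement. Experiments show that the method reduces depth error by up to 77.14% on the synthetic dataset and 34.10% on the real-world GraspNet dataset, and that performing NBV on a specific object yields a further 25.60% reduction in depth error for that object compared with targeting the entire scene.
Link: https://arxiv.org/abs/2602.08266
Authors: Seunghoon Jeong,Eunho Lee,Jeongyun Kim,Ayoung Kim
Affiliations: Unknown
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Comments: 9 pages, 8 figures, 4 tables, accepted to ICRA 2026
Click to view abstract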
Abstract:In cluttered scenes with inevitable occlusions and incomplete observations, selecting informative viewpoints is essential for building a reliable representation. In this context, 3D Gaussian Splatting (3DGS) offers a distinct advantage, as it can explicitly guide the selection of subsequent viewpoints and then refine the representation with new observations. However, existing approaches rely solely on geometric cues, neglect manipulation-relevant semantics, and tend to prioritize exploitation over exploration. To tackle these limitations, we introduce an instance-aware Next Best View (NBV) policy that prioritizes underexplored regions by leveraging object features. Specifically, our object-aware 3DGS distills instance-level information into one-hot object vectors, which are used to compute confidence-weighted information gain that guides the identification of regions associated with erroneous and uncertain Gaussians. Furthermore, our method can be easily adapted to an object-centric NBV, which focuses view selection on a target object, thereby improving reconstruction robustness to object placement. Experiments demonstrate that our NBV policy reduces depth error by up to 77.14% on the synthetic dataset and 34.10% on the real-world GraspNet dataset compared to baselines. Moreover, compared to targeting the entire scene, performing NBV on a specific object yields an additional reduction of 25.60% in depth error for that object. We further validate the effectiveness of our approach through real-world robotic manipulation tasks.
[CV-71] Moving Beyond Functional Connectivity: Time-Series Modeling for fMRI-Based Brain Disorder Classification
Link: https://arxiv.org/abs/2602.08262
Authors: Guoqi Yu,Xiaowei Hu,Angelica I. Aviles-Rivero,Anqi Qiu,Shujun Wang
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments: This paper has been accepted by IEEE Transactions on Medical Imaging
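[CV-72] Do MLLMs Really See It: Reinforcing Visual Attention in Multimodal LLMs
[Quick Read]: This paper addresses error propagation in Multimodal Large Language Models (MLLMs) on complex reasoning tasks caused by unstable visual attention. Existing methods depend on long textual reasoning trajectories and offer little in the way of mechanisms for learning visual attention policies, so early visual misalignment is rarely corrected and degrades the final answer. The key to the solution is SAYO, a model trained with a Reinforcement Learning (RL) framework that introduces a reward based on region-level visual attention. This reward explicitly aligns the optimization signal with visually grounded reasoning steps, guiding the model toward more reliable attention behavior and improving the stability and accuracy of its reasoning.
Link: https://arxiv.org/abs/2602.08241
Authors: Siqu Ou,Tianrui Wan,Zhiyuan Zhao,Junyu Gao,Xuelong Li
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments:
Click to view abstract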
Abstract:While chain-of-thought (CoT) reasoning has substantially improved multimodal large language models (MLLMs) on complex reasoning tasks, existing approaches largely rely on long textual reasoning trajectories and provide limited mechanisms for learning stable visual attention policies. Our analysis shows that current MLLMs exhibit weak visual focus: early-stage visual misalignment is rarely corrected during subsequent reasoning, leading to error propagation and failed inferences. We argue that this limitation stems from inadequate credit assignment for visual attention during training. To address this issue, we propose SAYO, a visual reasoning model trained with a reinforcement learning (RL) framework that introduces a region-level visual attention-based reward. This reward explicitly aligns optimization signals with visually grounded reasoning steps, enabling the model to learn more reliable attention behaviors. Extensive experiments across multiple multimodal benchmarks demonstrate that SAYO consistently improves performance on diverse reasoning and perception tasks.
[CV-73] Generating Adversarial Events: A Motion-Aware Point Cloud Framework
Link: https://arxiv.org/abs/2602.08230
Authors: Hongwei Ren,Youxin Jiang,Qifei Gu,Xiangqian Wu
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:
Artificial Intelligence
[AI-0] CIC-Trap4Phish: A Unified Multi-Format Dataset for Phishing and Quishing Attachment Detection
Link: https://arxiv.org/abs/2602.09015
Authors: Fatemeh Nejati,Mahdi Rabbani,Mansur Mirani,Gunjan Piya,Igor Opushnyev,Ali A. Ghorbani,Sajjad Dadkhah
Affiliations: Unknown
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments:
[AI-1] ANCRe: Adaptive Neural Connection Reassignment for Efficient Depth Scaling
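[Quick Read]: This paper addresses the frequent underutilization of deep layers in deep neural networks, and in particular the bottleneck that conventional residual connections can cause during optimization: slow convergence and poor use of depth. The key to the solution is a lightweight framework called Adaptive Neural Connection Reassignment (ANCRe), which parameterizes residual connectivity and learns it from data. With negligible additional compute and memory overhead (<1%), it makes markedly better use of network depth, yielding faster convergence, higher performance, and greater depth efficiency.
Link: https://arxiv.org/abs/2602.09009
Authors: Yilang Zhang,Bingcong Li,Niao He,Georgios B. Giannakis
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
Click to view abstract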
Abstract:Scaling network depth has been a central driver behind the success of modern foundation models, yet recent investigations suggest that deep layers are often underutilized. This paper revisits the default mechanism for deepening neural networks, namely residual connections, from an optimization perspective. Rigorous analysis proves that the layout of residual connections can fundamentally shape convergence behavior, and even induces an exponential gap in convergence rates. Prompted by this insight, we introduce adaptive neural connection reassignment (ANCRe), a principled and lightweight framework that parameterizes and learns residual connectivities from the data. ANCRe adaptively reassigns residual connections with negligible computational and memory overhead (<1%), while enabling more effective utilization of network depth. Extensive numerical tests across pre-training of large language models, diffusion models, and deep ResNets demonstrate consistently accelerated convergence, boosted performance, and enhanced depth efficiency over conventional residual connections.
[AI-2] ARO: A New Lens On Matrix Optimization For Large Models
Link: https://arxiv.org/abs/2602.09006
Authors: Wenbo Gong,Javier Zazo,Qijun Luo,Puqian Wang,James Hensman,Chao Ma
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Comments:
[AI-3] From Obstacles to Etiquette: Robot Social Navigation with VLM-Informed Path Selection
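[Quick Read]: This paper addresses a challenge for social robots navigating human environments: satisfying geometric obstacle-avoidance constraints alone is not enough for natural, socially compliant interaction, because even a collision-free path may interfere with human activities or violate social norms. The key to the solution is a framework that combines geometric planning with contextual social reasoning. It first generates geometrically feasible candidate paths, then uses a fine-tuned Vision-Language Model (VLM) to evaluate them against context-aware social expectations and select the socially optimized path. This task-specific VLM distills the social reasoning of large models into a small, efficient model that supports real-time adaptation and decision-making across diverse human-robot interaction scenarios.
Link: https://arxiv.org/abs/2602.09002
Authors: Zilin Fang,Anxing Xiao,David Hsu,Gim Hee Lee
Affiliations: Unknown
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: Accepted to IEEE Robotics and Automation Letters (RA-L)
Click to view abstract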
Abstract:Navigating socially in human environments requires more than satisfying geometric constraints, as collision-free paths may still interfere with ongoing activities or conflict with social norms. Addressing this challenge calls for analyzing interactions between agents and incorporating common-sense reasoning into planning. This paper presents a social robot navigation framework that integrates geometric planning with contextual social reasoning. The system first extracts obstacles and human dynamics to generate geometrically feasible candidate paths, then leverages a fine-tuned vision-language model (VLM) to evaluate these paths, informed by contextually grounded social expectations, selecting a socially optimized path for the controller. This task-specific VLM distills social reasoning from large foundation models into a smaller and efficient model, allowing the framework to perform real-time adaptation in diverse human-robot interaction contexts. Experiments in four social navigation contexts demonstrate that our method achieves the best overall performance with the lowest personal space violation duration, the minimal pedestrian-facing time, and no social zone intrusions. Project page: this https URL
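[AI-4] iGRPO: Self-Feedback-Driven LLM Reasoning
[Quick Read]: This paper addresses the remaining lack of accuracy and consistency of Large Language Models (LLMs) on complex mathematical problems. Existing approaches use Reinforcement Learning (RL) to align models with task rewards, but conventional policy optimization methods such as Proximal Policy Optimization (PPO) suffer from efficiency bottlenecks or depend on a value function. The key to the solution is Iterative Group Relative Policy Optimization (iGRPO), a two-stage method that adds dynamic self-conditioning: the model generates several exploratory drafts, the highest-reward draft is selected as conditioning input, and the policy is then updated on refinements of that draft, so that it learns to improve on its best previous attempt. The method needs no explicit value function, consistently outperforms the GRPO baseline under the same rollout budget, and reaches new state-of-the-art results on AIME24 and AIME25, supporting the effectiveness of iterative, self-feedback-based RL for verifiable mathematical reasoning.
Link: https://arxiv.org/abs/2602.09000
Authors: Ali Hatamizadeh,Shrimai Prabhumoye,Igor Gitman,Ximing Lu,Seungju Han,Wei Ping,Yejin Choi,Jan Kautz
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: Tech report
Click to view abstract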
Abstract:Large Language Models (LLMs) have shown promise in solving complex mathematical problems, yet they still fall short of producing accurate and consistent solutions. Reinforcement Learning (RL) is a framework for aligning these models with task-specific rewards, improving overall quality and reliability. Group Relative Policy Optimization (GRPO) is an efficient, value-function-free alternative to Proximal Policy Optimization (PPO) that leverages group-relative reward normalization. We introduce Iterative Group Relative Policy Optimization (iGRPO), a two-stage extension of GRPO that adds dynamic self-conditioning through model-generated drafts. In Stage 1, iGRPO samples multiple exploratory drafts and selects the highest-reward draft using the same scalar reward signal used for optimization. In Stage 2, it appends this best draft to the original prompt and applies a GRPO-style update on draft-conditioned refinements, training the policy to improve beyond its strongest prior attempt. Under matched rollout budgets, iGRPO consistently outperforms GRPO across base models (e.g., Nemotron-H-8B-Base-8K and DeepSeek-R1 Distilled), validating its effectiveness on diverse reasoning benchmarks. Moreover, applying iGRPO to OpenReasoning-Nemotron-7B trained on AceReason-Math achieves new state-of-the-art results of 85.62% and 79.64% on AIME24 and AIME25, respectively. Ablations further show that the refinement wrapper generalizes beyond GRPO variants, benefits from a generative judge, and alters learning dynamics by delaying entropy collapse. These results underscore the potential of iterative, self-feedback-based RL for advancing verifiable mathematical reasoning.
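The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `sample_fn`, `reward_fn`, the "Previous attempt" prompt format, and the draft counts are all hypothetical, and the advantage is the commonly used mean/std normalization within a group. The actual policy-gradient update is omitted.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def igrpo_step(prompt, sample_fn, reward_fn, n_drafts=4, n_refinements=4):
    """One iGRPO-style step (sketch). sample_fn and reward_fn are stand-ins
    for the policy sampler and the scalar task reward."""
    # Stage 1: sample exploratory drafts and keep the highest-reward one.
    drafts = [sample_fn(prompt) for _ in range(n_drafts)]
    best_draft = max(drafts, key=reward_fn)

    # Stage 2: append the best draft to the prompt (format is an assumption)
    # and sample refinements conditioned on it.
    conditioned_prompt = f"{prompt}\n\nPrevious attempt:\n{best_draft}"
    refinements = [sample_fn(conditioned_prompt) for _ in range(n_refinements)]
    rewards = [reward_fn(r) for r in refinements]

    # Group-relative advantages that would weight a GRPO-style update.
    advantages = group_relative_advantages(rewards)
    return best_draft, refinements, advantages
```

Because advantages are computed relative to the group mean, a group whose refinements all receive the same reward contributes zero advantage and therefore no learning signal in this sketch.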
[AI-5] InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery
[Quick Read]: This paper addresses automation and coordination of scientific discovery across computational and empirical domains, that is, how to build a unified system that carries out hypothesis generation, verification, and evolutionary iteration end to end for autonomous scientific exploration. The key to the solution is InternAgent-1.5, a system with a structured architecture of three subsystems for generation, verification, and evolution, supported by foundational capabilities for deep research, solution optimization, and long horizon memory. This lets it sustain extended discovery cycles while keeping its behavior coherent and improving, coordinate computational modeling with laboratory experimentation, and achieve strong results on both algorithm discovery and empirical discovery tasks.
Link: https://arxiv.org/abs/2602.08990
Authors: Shiyang Feng,Runmin Ma,Xiangchao Yan,Yue Fan,Yusong Hu,Songtao Huang,Shuaiyu Zhang,Zongsheng Cao,Tianshuo Peng,Jiakang Yuan,Zijie Guo,Zhijie Zhong,Shangheng Du,Weida Wang,Jinxin Shi,Yuhao Zhou,Xiaohan He,Zhiyin Yu,Fangchen Yu,Qihao Zheng,Jiamin Wu,Mianxin Liu,Chi Zhang,Shaowei Hou,Shuya Li,Yankai Jiang,Wenjie Lou,Lilong Wang,Zifu Wang,Jiong Wang,Wanghan Xu,Yue Deng,Dongrui Liu,Yiheng Wang,Wenlong Zhang,Fenghua Ling,Shufei Zhang,Xiaosong Wang,Shuangjia Zheng,Xun Huang,Siqi Sun,Shuyue Hu,Peng Ye,Chunfeng Song,Bin Wang,Conghui He,Yihao Liu,Xin Li,Qibin Hou,Tao Chen,Xiangyu Yue,Bin Wang,Liang He,Dahua Lin,Bowen Zhou,Bo Zhang,Lei Bai
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: Code and project page: this https URL
Click to view abstract
Abstract:We introduce InternAgent-1.5, a unified system designed for end-to-end scientific discovery across computational and empirical domains. The system is built on a structured architecture composed of three coordinated subsystems for generation, verification, and evolution. These subsystems are supported by foundational capabilities for deep research, solution optimization, and long horizon memory. The architecture allows InternAgent-1.5 to operate continuously across extended discovery cycles while maintaining coherent and improving behavior. It also enables the system to coordinate computational modeling and laboratory experimentation within a single unified system. We evaluate InternAgent-1.5 on scientific reasoning benchmarks such as GAIA, HLE, GPQA, and FrontierScience, and the system achieves leading performance that demonstrates strong foundational capabilities. Beyond these benchmarks, we further assess two categories of discovery tasks. In algorithm discovery tasks, InternAgent-1.5 autonomously designs competitive methods for core machine learning problems. In empirical discovery tasks, it executes complete computational or wet lab experiments and produces scientific findings in earth, life, biological, and physical domains. Overall, these results show that InternAgent-1.5 provides a general and scalable framework for autonomous scientific discovery.
[AI-6] Improving Detection of Rare Nodes in Hierarchical Multi-Label Learning
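[Quick Read]: This paper addresses the difficulty models have in predicting nodes deeper in the hierarchy in hierarchical multi-label classification. The core challenge stems from the natural rarity of certain classes and from the hierarchical constraint, which makes child nodes usually rarer than their parents. The key to the solution is a weighted loss objective that combines node-wise imbalance weighting with a focal weighting mechanism, the latter using modern quantification of ensemble uncertainty to identify the nodes about which the model's output distribution is uncertain during training. By emphasizing rare nodes instead of rare observations, and by focusing training on the more uncertain nodes in each output distribution, the method improves recall by up to a factor of five and raises the F_1 score, and it also helps convolutional networks on challenging tasks with suboptimal encoders or limited data.
Link: https://arxiv.org/abs/2602.08986
Authors: Isaac Xu,Martin Gillis,Ayushi Sharma,Benjamin Misiuk,Craig J. Brown,Thomas Trappenberg
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Accepted for publication in Transactions on Machine Learning Research (TMLR), 2026
Click to view abstract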
Abstract:In hierarchical multi-label classification, a persistent challenge is enabling model predictions to reach deeper levels of the hierarchy for more detailed or fine-grained classifications. This difficulty partly arises from the natural rarity of certain classes (or hierarchical nodes) and the hierarchical constraint that ensures child nodes are almost always less frequent than their parents. To address this, we propose a weighted loss objective for neural networks that combines node-wise imbalance weighting with focal weighting components, the latter leveraging modern quantification of ensemble uncertainties. By emphasizing rare nodes rather than rare observations (data points), and focusing on uncertain nodes for each model output distribution during training, we observe improvements in recall by up to a factor of five on benchmark datasets, along with statistically significant gains in F_1 score. We also show our approach aids convolutional networks on challenging tasks, as in situations with suboptimal encoders or limited data.
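As a rough illustration of how node-wise imbalance weighting and focal weighting can be combined, the sketch below weights a per-node binary cross-entropy by an inverse positive-frequency term and by the standard focal factor (1 - p_t)^gamma. Both choices are assumptions made for illustration: the paper derives its focal component from ensemble uncertainty estimates, which is not reproduced here, and the function names are hypothetical.

```python
import math


def node_weights(labels):
    """Inverse positive-frequency weight per node.

    labels: list of rows, each a list of 0/1 indicators, one per hierarchy node.
    Rare nodes (few positives) receive larger weights.
    """
    n_samples = len(labels)
    n_nodes = len(labels[0])
    weights = []
    for j in range(n_nodes):
        positives = sum(row[j] for row in labels)
        weights.append(n_samples / max(positives, 1))
    return weights


def weighted_focal_bce(probs, targets, weights, gamma=2.0, eps=1e-12):
    """Mean over nodes of w_j * (1 - p_t)^gamma * (-log p_t) for one sample.

    The focal factor here is the standard confidence-based one, used as a
    stand-in for the uncertainty-based weighting described in the abstract.
    """
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        p_t = p if y == 1 else 1.0 - p
        total += w * (1.0 - p_t) ** gamma * -math.log(max(p_t, eps))
    return total / len(probs)
```

With `gamma=0` the focal factor disappears and only the node-wise weighting remains, which makes it easy to check the two components separately.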
[AI-7] StretchTime: Adaptive Time Series Forecasting via Symplectic Attention
Link: https://arxiv.org/abs/2602.08983
Authors: Yubin Kim,Viresh Pati,Jevon Twitty,Vinh Pham,Shihao Yang,Jiecheng Lu
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
[AI-8] stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation
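[Quick Read]: This paper addresses widespread problems in current World Models research: poor reusability, implementations tied to individual publications, and the lack of standardized evaluation environments, all of which limit model reliability and research reproducibility. The key to the solution is stable-worldmodel (SWM), a modular, tested, and well-documented open-source research ecosystem that provides efficient data-collection tools, standardized environments, general planning algorithms, and baseline implementations. Each environment also exposes controllable factors of variation (such as visual and physical properties) to support research on robustness and continual learning. The authors further demonstrate the platform's usefulness by using it to study zero-shot robustness.
Link: https://arxiv.org/abs/2602.08968
Authors: Lucas Maes,Quentin Le Lidec,Dan Haramati,Nassim Massaudi,Damien Scieur,Yann LeCun,Randall Balestriero
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments:
Click to view abstract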
Abstract:World Models have emerged as a powerful paradigm for learning compact, predictive representations of environment dynamics, enabling agents to reason, plan, and generalize beyond direct experience. Despite recent interest in World Models, most available implementations remain publication-specific, severely limiting their reusability, increasing the risk of bugs, and reducing evaluation standardization. To mitigate these issues, we introduce stable-worldmodel (SWM), a modular, tested, and documented world-model research ecosystem that provides efficient data-collection tools, standardized environments, planning algorithms, and baseline implementations. In addition, each environment in SWM enables controllable factors of variation, including visual and physical properties, to support robustness and continual learning research. Finally, we demonstrate the utility of SWM by using it to study zero-shot robustness in DINO-WM.
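[AI-9] Digital Twin and Agentic AI for Wild Fire Disaster Management: Intelligent Virtual Situation Room
[Quick Read]: This paper addresses the limited real-time adaptability of conventional disaster management frameworks when facing dynamically evolving wildfire events; the core difficulty is that static simulations and passive data acquisition cannot keep up with complex, changing fire scenarios. The key to the solution is a bidirectional Digital Twin (DT) platform, the Intelligent Virtual Situation Room (IVSR). Autonomous AI agents continuously fuse multisource sensor imagery, weather data, and 3D forest models into a live virtual replica of the fire environment. An AI-powered similarity engine matches emerging conditions against a precomputed library of disaster simulations to retrieve and calibrate intervention tactics quickly, and authorized actions (such as UAV redeployment or crew reallocation) are fed back to the physical layer through standardized procedures, closing the response loop. This architecture markedly reduces detection-to-intervention latency and makes resource coordination more effective.
Link: https://arxiv.org/abs/2602.08949
Authors: Mohammad Morsali,Siavash H. Khajavi
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Comments:
Click to view abstract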
Abstract:According to the United Nations, wildfire frequency and intensity are projected to increase by approximately 14% by 2030 and 30% by 2050 due to global warming, posing critical threats to life, infrastructure, and ecosystems. Conventional disaster management frameworks rely on static simulations and passive data acquisition, hindering their ability to adapt to arbitrarily evolving wildfire episodes in real-time. To address these limitations, we introduce the Intelligent Virtual Situation Room (IVSR), a bidirectional Digital Twin (DT) platform augmented by autonomous AI agents. The IVSR continuously ingests multisource sensor imagery, weather data, and 3D forest models to create a live virtual replica of the fire environment. A similarity engine powered by AI aligns emerging conditions with a precomputed Disaster Simulation Library, retrieving and calibrating intervention tactics under the watchful eyes of experts. Authorized action, ranging from UAV redeployment to crew reallocation, is cycled back through standardized procedures to the physical layer, completing the loop between response and analysis. We validate IVSR through detailed case-study simulations provided by an industrial partner, demonstrating capabilities in localized incident detection, privacy-preserving playback, collider-based fire-spread projection, and site-specific ML retraining. Our results indicate marked reductions in detection-to-intervention latency and more effective resource coordination versus traditional systems. By uniting real-time bidirectional DTs with agentic AI, IVSR offers a scalable, semi-automated decision-support paradigm for proactive, adaptive wildfire disaster management.
[AI-10] CausalT5K: Diagnosing and Informing Refusal for Trustworthy Causal Reasoning of Skepticism Sycophancy Detection-Correction and Rung Collapse
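[Quick Read]: This paper addresses a set of systematic failures of Large Language Models (LLMs) in causal reasoning, including sycophancy, rung collapse, and miscalibrated refusal, which hold back trustworthy reasoning systems. Progress has been slow because no benchmark could diagnose these failure modes systematically. The key to the solution is CausalT5K, a diagnostic benchmark of over 5,000 cases across 10 domains. First, it embeds causal traps in realistic narratives, grounded in Pearl's Ladder of Causation. Second, it decomposes performance into Utility (sensitivity) and Safety (specificity), revealing failure modes that aggregate accuracy cannot capture. Third, it ensures reliability through a rigorous human-machine collaborative pipeline (40 domain experts, iterative cross-validation, and composite verification by rule-based, LLM, and human scoring). With this benchmark, the authors identify a "Four-Quadrant Control Landscape" in which static audit policies universally fail.
Link: https://arxiv.org/abs/2602.08939
Authors: Longling Geng,Andy Ouyang,Theodore Wu,Daphne Barretto,Matthew John Hayes,Rachael Cooper,Yuqiao Zeng,Sameer Vijay,Gia Ancone,Ankit Rai,Matthew Wolfman,Patrick Flanagan,Edward Y. Chang
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: 17 pages, 20 tables, figures
Click to view abstract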
Abstract:LLM failures in causal reasoning, including sycophancy, rung collapse, and miscalibrated refusal, are well-documented, yet progress on remediation is slow because no benchmark enables systematic diagnosis. We introduce CausalT5K, a diagnostic benchmark of over 5,000 cases across 10 domains that tests three critical capabilities: (1) detecting rung collapse, where models answer interventional queries with associational evidence; (2) resisting sycophantic drift under adversarial pressure; and (3) generating Wise Refusals that specify missing information when evidence is underdetermined. Unlike synthetic benchmarks, CausalT5K embeds causal traps in realistic narratives and decomposes performance into Utility (sensitivity) and Safety (specificity), revealing failure modes invisible to aggregate accuracy. Developed through a rigorous human-machine collaborative pipeline involving 40 domain experts, iterative cross-validation cycles, and composite verification via rule-based, LLM, and human scoring, CausalT5K implements Pearl’s Ladder of Causation as research infrastructure. Preliminary experiments reveal a Four-Quadrant Control Landscape where static audit policies universally fail, a finding that demonstrates CausalT5K’s value for advancing trustworthy reasoning systems. Repository: this https URL
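To see why decomposing performance into sensitivity and specificity can expose failures that aggregate accuracy hides, consider the small sketch below. The labeling convention is an assumption made for illustration (a "positive" case is one where a substantive answer is warranted, a "negative" case is a trap where the model should refuse), and `utility_and_safety` is a hypothetical helper, not part of the benchmark's tooling.

```python
def utility_and_safety(should_answer, did_answer):
    """Return accuracy, utility (sensitivity), and safety (specificity).

    should_answer[i]: True if case i warrants a substantive answer (assumed
                      convention), False if it is a trap that calls for refusal.
    did_answer[i]:    True if the model answered, False if it refused.
    """
    pairs = list(zip(should_answer, did_answer))
    tp = sum(1 for s, d in pairs if s and d)
    fn = sum(1 for s, d in pairs if s and not d)
    tn = sum(1 for s, d in pairs if not s and not d)
    fp = sum(1 for s, d in pairs if not s and d)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "utility": tp / (tp + fn) if (tp + fn) else float("nan"),
        "safety": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# A model that refuses everything on a trap-heavy test set looks accurate,
# yet its utility is zero.
always_refuse = utility_and_safety(
    should_answer=[True] * 2 + [False] * 8,
    did_answer=[False] * 10,
)
```

In this example the always-refusing model is correct on 8 of 10 cases while answering none of the cases that deserved an answer, which is the kind of imbalance a single accuracy number cannot show.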
[AI-11] StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors
Link: https://arxiv.org/abs/2602.08934
Authors: Suraj Ranganath,Atharv Ramesh
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Comments: Expanded version of a workshop submission. Code available
[AI-12] Efficient and Stable Reinforcement Learning for Diffusion Language Models
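[Quick Read]: This paper addresses the efficiency and stability challenges of applying Reinforcement Learning (RL) to Diffusion-based Large Language Models (dLLMs). The core solution is the Spatio-Temporal Pruning (STP) framework, which rests on two mechanisms: spatial pruning, which constrains the exploration space with static priors to reduce redundancy, and temporal pruning, which skips redundant late-stage refinement steps in the generative process. Theoretical analysis shows that STP strictly reduces the variance of the log-likelihood estimate, which supports more stable policy updates, and experiments show that it outperforms state-of-the-art baselines in both efficiency and accuracy.
Link: https://arxiv.org/abs/2602.08905
Authors: Jiawei Liu,Xiting Wang,Yuanyuan Zhong,Defu Lian,Yu Yang
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: 13 pages, 3 figures
Click to view abstract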
Abstract:Reinforcement Learning (RL) is crucial for unlocking the complex reasoning capabilities of Diffusion-based Large Language Models (dLLMs). However, applying RL to dLLMs faces unique challenges in efficiency and stability. To address these challenges, we propose Spatio-Temporal Pruning (STP), a framework designed to simultaneously improve the efficiency and stability of RL for dLLMs. STP compresses the redundancy in the generative process through: (1) spatial pruning, which constrains the exploration space using static priors; and (2) temporal pruning, which bypasses redundant late-stage refinement steps. Our theoretical analysis demonstrates that STP strictly reduces the variance of the log-likelihood estimation, thereby ensuring more stable policy updates. Extensive experiments demonstrate that STP surpasses state-of-the-art baselines in both efficiency and accuracy. Our code is available at this https URL.
[AI-13] Scalable Delphi: Large Language Models for Structured Risk Estimation
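[Quick Read]: This paper addresses the high time cost and poor scalability of quantitative risk assessment in high-stakes domains, which depends on traditional structured expert elicitation (such as the Delphi method) and is therefore infeasible for most applications. The key to the solution is Scalable Delphi, which equips Large Language Models (LLMs) with diverse expert personas, iterative refinement, and rationale sharing so that they can stand in for the human expert judgment process efficiently. It is paired with an evaluation framework based on calibration against verifiable proxies, sensitivity to evidence, and alignment with human expert judgment. The result cuts elicitation time from months of coordination to minutes while remaining closely aligned with benchmark data and human experts.
Link: https://arxiv.org/abs/2602.08889
Authors: Tobias Lorenz,Mario Fritz
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments:
Click to view abstract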
Abstract:Quantitative risk assessment in high-stakes domains relies on structured expert elicitation to estimate unobservable properties. The gold standard - the Delphi method - produces calibrated, auditable judgments but requires months of coordination and specialist time, placing rigorous risk assessment out of reach for most applications. We investigate whether Large Language Models (LLMs) can serve as scalable proxies for structured expert elicitation. We propose Scalable Delphi, adapting the classical protocol for LLMs with diverse expert personas, iterative refinement, and rationale sharing. Because target quantities are typically unobservable, we develop an evaluation framework based on necessary conditions: calibration against verifiable proxies, sensitivity to evidence, and alignment with human expert judgment. We evaluate in the domain of AI-augmented cybersecurity risk, using three capability benchmarks and independent human elicitation studies. LLM panels achieve strong correlations with benchmark ground truth (Pearson r=0.87-0.95), improve systematically as evidence is added, and align with human expert panels - in one comparison, closer to a human panel than the two human panels are to each other. This demonstrates that LLM-based elicitation can extend structured expert judgment to settings where traditional methods are infeasible, reducing elicitation time from months to minutes.
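The iterative-refinement part of a Delphi-style protocol can be sketched as below. Everything here is illustrative and assumed, not taken from the paper: panelists are reduced to numeric estimates, the shared feedback is the group median, and `move_halfway` is a toy revision rule standing in for an LLM persona that revises its estimate after reading the other panelists' rationales.

```python
import statistics


def delphi_rounds(initial_estimates, revise_fn, n_rounds=3):
    """Run a toy Delphi loop and return (final median, per-round history).

    revise_fn(estimate, group_median) is a stand-in for a panelist revising
    an estimate after seeing the group's summary.
    """
    estimates = list(initial_estimates)
    history = [estimates]
    for _ in range(n_rounds):
        group_median = statistics.median(estimates)
        estimates = [revise_fn(e, group_median) for e in estimates]
        history.append(estimates)
    return statistics.median(estimates), history


def move_halfway(estimate, group_median):
    """Toy revision rule: move halfway toward the group median."""
    return estimate + 0.5 * (group_median - estimate)


final, history = delphi_rounds([0.1, 0.3, 0.9], move_halfway, n_rounds=2)
```

Tracking `history` makes it possible to inspect how the spread of estimates changes from round to round, which is the kind of audit trail that makes Delphi-style judgments reviewable.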
[AI-14] DeepQuali: Initial results of a study on the use of large language models for assessing the quality of user stories
Link: https://arxiv.org/abs/2602.08887
Authors: Adam Trendowicz,Daniel Seifert,Andreas Jedlitschka,Marcus Ciolkowski,Anton Strahilov
Affiliations: Unknown
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments:
[AI-15] Breaking the Simplification Bottleneck in Amortized Neural Symbolic Regression
Link: https://arxiv.org/abs/2602.08885
Authors: Paul Saegert,Ullrich Köthe
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Symbolic Computation (cs.SC)
Comments: main text: 8 pages, 7 figures; appendix: 12 pages, 11 figures; code available at this https URL and this https URL
[AI-16] Learning Potentials for Dynamic Matching and Application to Heart Transplantation
Link: https://arxiv.org/abs/2602.08878
Authors: Itai Zilberstein,Ioannis Anagnostides,Zachary W. Sollie,Arman Kilic,Tuomas Sandholm
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
[AI-17] AnomSeer: Reinforcing Multimodal LLMs to Reason for Time-Series Anomaly Detection
Link: https://arxiv.org/abs/2602.08868
Authors: Junru Zhang,Lang Feng,Haoran Shi,Xu Guo,Han Yu,Yabo Dong,Duanqing Xu
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Preprint
[AI-18] Deciding the Satisfiability of Combined Qualitative Constraint Networks
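[Quick Read]: This paper addresses how to handle, in a unified way, several forms of extensions and combinations of qualitative reasoning, including multi-scale reasoning, temporal sequences, and loose integrations. The core challenge is to model reasoning over these complex combinations consistently without precise numerical information, and to analyze the computational complexity of the satisfiability decision. The key to the solution is a formal framework that supports reasoning under each of these extensions and combinations, together with two complementary theorems guaranteeing that, under certain conditions, the satisfiability decision is solvable in polynomial time. The framework recovers the known results for the size-topology combination and also generalizes the definition of qualitative formalism to include formalisms that earlier definitions in the literature excluded but that matter in the context of combinations.
Link: https://arxiv.org/abs/2602.08848
Authors: Quentin Cohen-Solal,Alexandre Niveau,Maroua Bouzid
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments:
Click to view abstract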
Abstract:Among the various forms of reasoning studied in the context of artificial intelligence, qualitative reasoning makes it possible to infer new knowledge in the context of imprecise, incomplete information without numerical values. In this paper, we propose a formal framework unifying several forms of extensions and combinations of qualitative formalisms, including multi-scale reasoning, temporal sequences, and loose integrations. This framework makes it possible to reason in the context of each of these combinations and extensions, but also to study in a unified way the satisfiability decision and its complexity. In particular, we establish two complementary theorems guaranteeing that the satisfiability decision is polynomial, and we use them to recover the known results of the size-topology combination. We also generalize the main definition of qualitative formalism to include qualitative formalisms excluded from the definitions of the literature, important in the context of combinations.
[AI-19] Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems
Link: https://arxiv.org/abs/2602.08847
Authors: Lang Feng,Longtao Zheng,Shuo He,Fuxiang Zhang,Bo An
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Preprint
[AI-20] Learning the Value Systems of Societies with Preference-based Multi-objective Reinforcement Learning AAMAS2026
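[Quick read]: This paper addresses how to model value alignment in multi-agent systems, that is, how an AI system can recognize and adapt to the value systems (value-based preferences) of different user groups so that its decisions are personalized and interpretable. Existing methods often rely on manually designed features or lack value-based interpretability and adaptability, which makes it hard to handle the diversity and social nature of value systems. The key to the solution is an algorithmic framework combining clustering with preference-based multi-objective reinforcement learning (PbMORL). It jointly learns socially derived value alignment models (groundings) and a set of value systems representing different user groups; each group comes with an approximately Pareto-optimal policy that reflects that group's value preferences. The method thereby extracts structured value systems automatically from diverse human values and supports adaptation at both the individual and the group level.
Link: https://arxiv.org/abs/2602.08835
Authors: Andrés Holgado-Sánchez,Peter Vamplew,Richard Dazeley,Sascha Ossowski,Holger Billhardt
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Comments: 18 pages, 3 figures. To be published in proceedings of the 25th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2026). This is a full version that includes the supplementary material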
Abstract:Value-aware AI should recognise human values and adapt to the value systems (value-based preferences) of different users. This requires operationalization of values, which can be prone to misspecification. The social nature of values demands their representation to adhere to multiple users while value systems are diverse, yet exhibit patterns among groups. In sequential decision making, efforts have been made towards personalization for different goals or values from demonstrations of diverse agents. However, these approaches demand manually designed features or lack value-based interpretability and/or adaptability to diverse user preferences. We propose algorithms for learning models of value alignment and value systems for a society of agents in Markov Decision Processes (MDPs), based on clustering and preference-based multi-objective reinforcement learning (PbMORL). We jointly learn socially-derived value alignment models (groundings) and a set of value systems that concisely represent different groups of users (clusters) in a society. Each cluster consists of a value system representing the value-based preferences of its members and an approximately Pareto-optimal policy that reflects behaviours aligned with this value system. We evaluate our method against a state-of-the-art PbMORL algorithm and baselines on two MDPs with human values.
ACM classes: I.2.6; J.4; H.1.2. Related DOI: https://doi.org/10.65109/NLVD8864
[AI-21] Permissive-Washing in the Open AI Supply Chain: A Large-Scale Audit of License Integrity
Link: https://arxiv.org/abs/2602.08816
Authors: James Jewitt,Gopi Krishnan Rajbahadur,Hao Li,Bram Adams,Ahmed E. Hassan
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Software Engineering (cs.SE)
Comments: 13 pages, 2 figures, 10 tables
[AI-22] Negative-Aware Diffusion Process for Temporal Knowledge Graph Extrapolation
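[Quick read]: This paper addresses two gaps in temporal knowledge graph (TKG) reasoning: the generative path is conditioned only on positive evidence and ignores the information carried by negative samples, and training objectives are dominated by cross-entropy ranking losses, which improve candidate ordering but give little supervision over the calibration of the denoised embedding. The core of the solution is the Negative-Aware Diffusion model for TKG Extrapolation (NADEx). It encodes subject-centric histories of entities, relations, and temporal intervals into sequential embeddings, reconstructs the query object in the reverse process with a Transformer denoiser, and adds a cosine-alignment regularizer built from batch-wise negative prototypes that tightens the decision boundary against implausible candidates, improving prediction accuracy and calibration.
Link: https://arxiv.org/abs/2602.08815
Authors: Yanglei Gan,Peng He,Yuxiang Cai,Run Lin,Guanyu Zhou,Qiao Liu
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: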
Abstract:Temporal Knowledge Graph (TKG) reasoning seeks to predict future missing facts from historical evidence. While diffusion models (DM) have recently gained attention for their ability to capture complex predictive distributions, two gaps remain: (i) the generative path is conditioned only on positive evidence, overlooking informative negative context, and (ii) training objectives are dominated by cross-entropy ranking, which improves candidate ordering but provides little supervision over the calibration of the denoised embedding. To bridge this gap, we introduce Negative-Aware Diffusion model for TKG Extrapolation (NADEx). Specifically, NADEx encodes subject-centric histories of entities, relations and temporal intervals into sequential embeddings. NADEx perturbs the query object in the forward process and reconstructs it in reverse with a Transformer denoiser conditioned on the temporal-relational context. We further derive a cosine-alignment regularizer derived from batch-wise negative prototypes, which tightens the decision boundary against implausible candidates. Comprehensive experiments on four public TKG benchmarks demonstrate that NADEx delivers state-of-the-art performance.
[AI-23] lrnnx: A library for Linear RNNs EACL
Link: https://arxiv.org/abs/2602.08810
Authors: Karan Bania,Soham Kalburgi,Manit Tanwar,Dhruthi,Aditya Nagarsekar,Harshvardhan Mestha,Naman Chibber,Raj Deshmukh,Anish Sathyanarayanan,Aarush Rathore,Pratham Chheda
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: EACL Student Research Workshop 2026
[AI-24] Root Cause Analysis Method Based on Large Language Models with Residual Connection Structures
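[Quick read]: This paper addresses root cause localization in complex, large-scale microservice architectures. The difficulty lies in the complex fault propagation among microservices and the high dimensionality of multi-source telemetry data (metrics, logs, and traces), which limit existing root cause analysis (RCA) methods. The key to the solution is RC-LLM, a residual-connection-based RCA method: a residual-like hierarchical fusion structure integrates the multi-source telemetry data, and the contextual reasoning ability of a large language model (LLM) is used to model temporal dependencies and cross-microservice causal dependencies, yielding more accurate and efficient root cause localization.
Link: https://arxiv.org/abs/2602.08804
Authors: Liming Zhou,Ailing Liu,Hongwei Liu,Min He,Heng Zhang
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: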
Abstract:Root cause localization remains challenging in complex and large-scale microservice architectures. The complex fault propagation among microservices and the high dimensionality of telemetry data, including metrics, logs, and traces, limit the effectiveness of existing root cause analysis (RCA) methods. In this paper, a residual-connection-based RCA method using a large language model (LLM), named RC-LLM, is proposed. A residual-like hierarchical fusion structure is designed to integrate multi-source telemetry data, while the contextual reasoning capability of large language models is leveraged to model temporal and cross-microservice causal dependencies. Experimental results on CCF-AIOps microservice datasets demonstrate that RC-LLM achieves strong accuracy and efficiency in root cause analysis.
[AI-25] Default Machine Learning Hyperparameters Do Not Provide Informative Initialization for Bayesian Optimization
Link: https://arxiv.org/abs/2602.08774
Authors: Nicolás Villagrán Prieto,Eduardo C. Garrido-Merchán
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
[AI-26] FreqLens: Interpretable Frequency Attribution for Time Series Forecasting
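[Quick read]: This paper addresses the lack of interpretability in time series forecasting models, which limits their use in domains that require explainable predictions. The key to the solution is FreqLens, an interpretable forecasting framework with two core innovations: (1) learnable frequency discovery, in which frequency bases are parameterized through a sigmoid mapping and learned from data with diversity regularization, so dominant periodic patterns are found without domain knowledge; and (2) axiomatic frequency attribution, a theoretically grounded framework that satisfies the Completeness, Faithfulness, Null-Frequency, and Symmetry axioms, with per-frequency attributions equivalent to Shapley values, giving precise frequency-level attribution of each prediction.
Link: https://arxiv.org/abs/2602.08768
Authors: Chi-Sheng Chen,Xinyu Zhang,En-Jui Kuo,Guan-Ying Chen,Qiuzhe Xie,Fan Zhang
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Comments: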
Abstract:Time series forecasting models often lack interpretability, limiting their adoption in domains requiring explainable predictions. We propose FreqLens, an interpretable forecasting framework that discovers and attributes predictions to learnable frequency components. FreqLens introduces two key innovations: (1) learnable frequency discovery: frequency bases are parameterized via sigmoid mapping and learned from data with diversity regularization, enabling automatic discovery of dominant periodic patterns without domain knowledge; and (2) axiomatic frequency attribution: a theoretically grounded framework that provably satisfies Completeness, Faithfulness, Null-Frequency, and Symmetry axioms, with per-frequency attributions equivalent to Shapley values. On Traffic and Weather datasets, FreqLens achieves competitive or superior performance while discovering physically meaningful frequencies: all 5 independent runs discover the 24-hour daily cycle (24.6 ± 0.1 h, 2.5% error) and 12-hour half-daily cycle (11.8 ± 0.1 h, 1.6% error) on Traffic, and weekly cycles (10× longer than the input window) on Weather. These results demonstrate genuine frequency-level knowledge discovery with formal theoretical guarantees on attribution quality.
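To make the Shapley-value claim concrete, here is a minimal sketch that computes exact Shapley values for a toy additive forecaster. Everything in it is an assumption for illustration: the `AMPLITUDES` table, the sinusoidal `forecast` function, and the evaluation time are hypothetical and are not taken from FreqLens. The sketch only shows the general property that, for an additive model, each component's Shapley value equals its own contribution and the attributions sum to the full prediction (completeness).

```python
from itertools import permutations
from math import factorial, pi, sin


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical additive forecaster: one sinusoid per period (in hours).
AMPLITUDES = {24.0: 2.0, 12.0: 0.5, 168.0: 1.0}


def forecast(active_periods, t=6.0):
    """Prediction at time t using only the listed periodic components."""
    return sum(AMPLITUDES[p] * sin(2 * pi * t / p) for p in active_periods)


attributions = shapley_values(list(AMPLITUDES), forecast)
for period, phi in attributions.items():
    print(f"period {period:6.1f} h -> attribution {phi:+.4f}")
```

Enumerating all orderings costs factorial time, so this is only practical for a handful of components; it is meant as a check on the definition, not as an efficient attribution method.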
[AI-27] Taming Scylla: Understanding the multi-headed agentic daemon of the coding seas
Link: https://arxiv.org/abs/2602.08765
Authors: Micah Villmow
Affiliations: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: 32 Pages, 7 Figures
[AI-28] On the Expressive Power of GNNs for Boolean Satisfiability ICLR2026
Link: https://arxiv.org/abs/2602.08745
Authors: Saku Peltonen,Roger Wattenhofer
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Accepted at ICLR 2026
[AI-29] Finite-State Controllers for (Hidden-Model) POMDPs using Deep Reinforcement Learning AAMAS’26
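[Quick read]: This paper addresses the limited scalability of solvers for partially observable Markov decision processes (POMDPs), especially when a policy must be robust across multiple POMDPs. The key to the solution is the Lexpop framework: it first uses deep reinforcement learning to train a neural policy represented by a recurrent neural network (RNN), then applies efficient extraction methods to construct a finite-state controller (FSC) that mimics the neural policy and, unlike it, can be formally evaluated to provide performance guarantees. The authors extend Lexpop to hidden-model POMDPs (HM-POMDPs) by iteratively training a robust neural policy and extracting the corresponding controller against worst-case POMDPs, which yields a policy that is robust over a set of POMDPs. Experiments show that on problems with large state spaces Lexpop outperforms state-of-the-art POMDP and HM-POMDP solvers.
Link: https://arxiv.org/abs/2602.08734
Authors: David Hudák,Maris F. L. Galesloot,Martin Tappler,Martin Kurečka,Nils Jansen,Milan Češka
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 17 pages (8 main paper, 2 references, 7 appendix). 3 figures in the main paper, 3 figures in the appendix. Accepted AAMAS’26 submission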
Abstract:Solving partially observable Markov decision processes (POMDPs) requires computing policies under imperfect state information. Despite recent advances, the scalability of existing POMDP solvers remains limited. Moreover, many settings require a policy that is robust across multiple POMDPs, further aggravating the scalability issue. We propose the Lexpop framework for POMDP solving. Lexpop (1) employs deep reinforcement learning to train a neural policy, represented by a recurrent neural network, and (2) constructs a finite-state controller mimicking the neural policy through efficient extraction methods. Crucially, unlike neural policies, such controllers can be formally evaluated, providing performance guarantees. We extend Lexpop to compute robust policies for hidden-model POMDPs (HM-POMDPs), which describe finite sets of POMDPs. We associate every extracted controller with its worst-case POMDP. Using a set of such POMDPs, we iteratively train a robust neural policy and consequently extract a robust controller. Our experiments show that on problems with large state spaces, Lexpop outperforms state-of-the-art solvers for POMDPs as well as HM-POMDPs.
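For readers unfamiliar with finite-state controllers, the sketch below shows the data structure in its simplest deterministic form: a small set of memory nodes, one action per node, and a transition table keyed by the received observation. It does not reproduce Lexpop's extraction procedure; the class name, field names, and the tiger-style toy controller are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FiniteStateController:
    """Deterministic FSC: each memory node fixes an action, and the
    observation received afterwards selects the next memory node."""
    initial_node: int
    action_of: dict   # node -> action
    next_node: dict   # (node, observation) -> node

    def run(self, observations):
        """Return the actions taken while consuming the observation sequence."""
        node = self.initial_node
        actions = []
        for obs in observations:
            actions.append(self.action_of[node])
            node = self.next_node[(node, obs)]
        return actions


# Hypothetical controller: listen once, then open the door away from the noise.
controller = FiniteStateController(
    initial_node=0,
    action_of={0: "listen", 1: "open-right", 2: "open-left"},
    next_node={
        (0, "hear-left"): 1,
        (0, "hear-right"): 2,
        (1, "reset"): 0,
        (2, "reset"): 0,
    },
)
print(controller.run(["hear-left", "reset", "hear-right"]))
```

Because the controller is a finite table rather than a recurrent network, composing it with a POMDP gives a finite Markov chain that standard model-checking or policy-evaluation tools can analyze, which is the property the abstract relies on for performance guarantees.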
[AI-30] QUOKA: Query-Oriented KV Selection For Efficient LLM Prefill
Link: https://arxiv.org/abs/2602.08722
Authors: Dalton Jones,Junyoung Park,Matthew Morse,Mingu Lee,Chris Lott,Harper Langston
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
[AI-31] Exploring SAIG Methods for an Objective Evaluation of XAI
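[Quick read]: This paper addresses the lack of a unified, objective standard for evaluating eXplainable Artificial Intelligence (XAI) methods: traditional AI evaluation usually relies on an explicit ground truth, whereas explanations themselves are hard to quantify and verify. The key to the solution is the use of Synthetic Artificial Intelligence Ground truth (SAIG) methods, which generate artificial ground truths so that XAI techniques can be evaluated directly. The paper presents the first systematic review of SAIG methods and proposes a new taxonomy that identifies seven key features distinguishing them, giving future work a structured reference and supporting standardization in the field.
Link: https://arxiv.org/abs/2602.08715
Authors: Miquel Miró-Nicolau,Gabriel Moyà-Alcover,Anna Arias-Duart
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: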
Abstract:The evaluation of eXplainable Artificial Intelligence (XAI) methods is a rapidly growing field, characterized by a wide variety of approaches. This diversity highlights the complexity of the XAI evaluation, which, unlike traditional AI assessment, lacks a universally correct ground truth for the explanation, making objective evaluation challenging. One promising direction to address this issue involves the use of what we term Synthetic Artificial Intelligence Ground truth (SAIG) methods, which generate artificial ground truths to enable the direct evaluation of XAI techniques. This paper presents the first review and analysis of SAIG methods. We introduce a novel taxonomy to classify these approaches, identifying seven key features that distinguish different SAIG methods. Our comparative study reveals a concerning lack of consensus on the most effective XAI evaluation techniques, underscoring the need for further research and standardization in this area.
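[AI-32] Intermediate Results on the Complexity of STRIPS^1_1
[Quick read]: This paper studies an open question in propositional STRIPS planning: whether deciding plan existence is NP-complete when every operator has exactly one precondition and one effect (STRIPS^1_1). The problem is known to be NP-hard, but membership in NP has not been established. The paper advances the question in three ways: it calls a SAT solver to examine small instances experimentally, it introduces the literal graph as a structured representation of state transitions, and it maps that graph to Petri nets so that formal analysis tools can be used to probe the complexity boundary. The main contribution is recasting the problem in terms of graph structures and Petri nets, which offers a new theoretical route to proving or refuting the small solution hypothesis.
Link: https://arxiv.org/abs/2602.08708
Authors: Stefan Edelkamp,Jiří Fink,Petr Gregor,Anders Jonsson,Bernhard Nebel
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: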
Abstract:This paper is based on Bylander’s results on the computational complexity of propositional STRIPS planning. He showed that when only ground literals are permitted, determining plan existence is PSPACE-complete even if operators are limited to two preconditions and two postconditions. While NP-hardness is settled, it is unknown whether propositional STRIPS with operators that only have one precondition and one effect is NP-complete. We shed light on the question whether this small solution hypothesis for STRIPS^1_1 is true, calling a SAT solver for small instances, introducing the literal graph, and mapping it to Petri nets.
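The restricted operator format is easy to state in code. The sketch below decides plan existence for a tiny STRIPS^1_1 instance by breadth-first search over complete states and reports the shortest plan length. Note that this swaps in exhaustive search for the SAT-based experiments the abstract mentions, so it is exponential in the number of variables and only suitable for toy instances; the function name and the tuple encoding of literals are assumptions made for illustration.

```python
from collections import deque


def shortest_plan_length(init, goal, operators):
    """Breadth-first search over complete states.

    init      -- tuple of bools, one truth value per propositional variable
    goal      -- dict {variable index: required truth value}
    operators -- list of ((pre_var, pre_val), (eff_var, eff_val)); each
                 operator has exactly one precondition and one effect
    Returns the length of a shortest plan, or None if no plan exists.
    """
    def satisfies_goal(state):
        return all(state[var] == val for var, val in goal.items())

    seen = {init}
    queue = deque([(init, 0)])
    while queue:
        state, depth = queue.popleft()
        if satisfies_goal(state):
            return depth
        for (pre_var, pre_val), (eff_var, eff_val) in operators:
            if state[pre_var] != pre_val:
                continue  # precondition literal does not hold
            successor = state[:eff_var] + (eff_val,) + state[eff_var + 1:]
            if successor not in seen:
                seen.add(successor)
                queue.append((successor, depth + 1))
    return None


# Toy instance with three variables, all initially false.
ops = [
    ((0, False), (1, True)),   # if not x0 then set x1
    ((1, True), (2, True)),    # if x1 then set x2
    ((2, True), (0, True)),    # if x2 then set x0
]
start = (False, False, False)
print(shortest_plan_length(start, {0: True, 1: True, 2: True}, ops))
print(shortest_plan_length(start, {2: True}, ops[:1]))
```

Running a search like this on many small instances and recording the longest shortest plan is one simple way to gather empirical evidence about how plan length grows with the number of variables.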
[AI-33] PBLean: Pseudo-Boolean Proof Certificates for Lean 4
Link: https://arxiv.org/abs/2602.08692
Authors: Stefan Szeider
Affiliations: Unknown
Categories: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
Comments:
[AI-34] CompilerKV: Risk-Adaptive KV Compression via Offline Experience Compilation
Link: https://arxiv.org/abs/2602.08686
Authors: Ning Yang,Chengzhi Wang,Yibo Liu,Baoliang Tian,Haijun Zhang
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
[AI-35] LLaDA2.1: Speeding Up Text Diffusion via Token Editing
Link: https://arxiv.org/abs/2602.08676
Authors: Tiwei Bie,Maosong Cao,Xiang Cao,Bingsen Chen,Fuyuan Chen,Kun Chen,Lun Du,Daozhuo Feng,Haibo Feng,Mingliang Gong,Zhuocheng Gong,Yanmei Gu,Jian Guan,Kaiyuan Guan,Hongliang He,Zenan Huang,Juyong Jiang,Zhonghui Jiang,Zhenzhong Lan,Chengxi Li,Jianguo Li,Zehuan Li,Huabin Liu,Lin Liu,Guoshan Lu,Yuan Lu,Yuxin Ma,Xingyu Mou,Zhenxuan Pan,Kaida Qiu,Yuji Ren,Jianfeng Tan,Yiding Tian,Zian Wang,Lanning Wei,Tao Wu,Yipeng Xing,Wentao Ye,Liangyu Zha,Tianze Zhang,Xiaolu Zhang,Junbo Zhao,Da Zheng,Hao Zhong,Wanli Zhong,Jun Zhou,Junlin Zhou,Liwang Zhu,Muzhi Zhu,Yihong Zhuang
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 11 pages, 3 figures
[AI-36] 6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks
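[Quick read]: This paper addresses the lack of a standardized benchmark for evaluating semantic communication and network-level reasoning in 6G networks. The authors propose 6G-Bench, an open benchmark for AI-native 6G networks. Its key elements are a taxonomy of 30 decision-making tasks (T1–T30) and a pool of 10,000 very hard multiple-choice questions generated from 113,475 scenarios, using task-conditioned prompts that force models to perform multi-step quantitative reasoning under uncertainty and worst-case regret minimization over multi-turn horizons. After automated filtering and expert human validation, 3,722 high-quality questions are retained as the evaluation set, while the full pool is released to support training and fine-tuning of specialized models, enabling objective measurement and continued improvement of semantic understanding and reasoning in 6G intelligent systems.
Link: https://arxiv.org/abs/2602.08675
Authors: Mohamed Amine Ferrag,Abderrahmane Lakas,Merouane Debbah
Affiliations: Unknown
Categories: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Comments: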
Abstract:This paper introduces 6G-Bench, an open benchmark for evaluating semantic communication and network-level reasoning in AI-native 6G networks. 6G-Bench defines a taxonomy of 30 decision-making tasks (T1–T30) extracted from ongoing 6G and AI-agent standardization activities in 3GPP, IETF, ETSI, ITU-T, and the O-RAN Alliance, and organizes them into five standardization-aligned capability categories. Starting from 113,475 scenarios, we generate a balanced pool of 10,000 very-hard multiple-choice questions using task-conditioned prompts that enforce multi-step quantitative reasoning under uncertainty and worst-case regret minimization over multi-turn horizons. After automated filtering and expert human validation, 3,722 questions are retained as a high-confidence evaluation set, while the full pool is released to support training and fine-tuning of 6G-specialized models. Using 6G-Bench, we evaluate 22 foundation models spanning dense and mixture-of-experts architectures, short- and long-context designs (up to 1M tokens), and both open-weight and proprietary systems. Across models, deterministic single-shot accuracy (pass@1) spans a wide range from 0.22 to 0.82, highlighting substantial variation in semantic reasoning capability. Leading models achieve intent and policy reasoning accuracy in the range 0.87–0.89, while selective robustness analysis on reasoning-intensive tasks shows pass@5 values ranging from 0.20 to 0.91. To support open science and reproducibility, we release the 6G-Bench dataset on GitHub: this https URL
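The abstract reports pass@1 and pass@5 without stating how they are estimated. As a point of reference, the sketch below implements the unbiased pass@k estimator that is widely used in code-generation evaluation, pass@k = 1 - C(n-c, k) / C(n, k) for n samples of which c are correct. Whether 6G-Bench uses this exact estimator is an assumption; the function name is hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased estimate of the probability that at least one of k samples
    is correct, given n generated samples of which c were correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist,
    # in which case every size-k subset contains a correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples per question, 3 of them correct.
print(round(pass_at_k(10, 3, 1), 4))
print(round(pass_at_k(10, 3, 5), 4))
```

With k = 1 the estimator reduces to the plain fraction of correct samples, which matches the usual reading of pass@1 as single-shot accuracy.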
[AI-37] Equalized Generative Treatment: Matching f-divergences for Fairness in Generative Models
Link: https://arxiv.org/abs/2602.08660
Authors: Alexandre Verine,Rafael Pinot,Florian Le Bronnec
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
[AI-38] LEFT: Learnable Fusion of Tri-view Tokens for Unsupervised Time Series Anomaly Detection
Link: https://arxiv.org/abs/2602.08638
Authors: Dezheng Wang,Tong Chen,Guansong Pang,Congyan Chen,Shihua Li,Hongzhi Yin
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
[AI-39] Debate is efficient with your time
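[Quick read]: This paper addresses a central question in AI safety via debate: how to verify complex computational tasks with minimal human oversight. The key to the solution is Debate Query Complexity (DQC), the minimum number of bits a verifier must inspect to decide a debate correctly. The study finds that PSPACE/poly (the class of problems debate can decide efficiently) is exactly the class of functions decidable with O(log n) queries, so debate is highly query-efficient: even for very complex tasks, the oversight required from the human judge is only logarithmic. The paper also connects DQC to circuit complexity, noting that proving a DQC lower bound of log(n) + 6 for languages in P would yield new circuit lower bounds, which ties the analysis of debate efficiency to central questions in computational complexity theory.
Link: https://arxiv.org/abs/2602.08630
Authors: Jonah Brown-Cohen,Geoffrey Irving,Simon C. Marshall,Ilan Newman,Georgios Piliouras,Mario Szegedy
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)
Comments: 11 Pages, 0 figures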
Abstract:AI safety via debate uses two competing models to help a human judge verify complex computational tasks. Previous work has established what problems debate can solve in principle, but has not analysed the practical cost of human oversight: how many queries must the judge make to the debate transcript? We introduce Debate Query Complexity (DQC), the minimum number of bits a verifier must inspect to correctly decide a debate. Surprisingly, we find that PSPACE/poly (the class of problems which debate can efficiently decide) is precisely the class of functions decidable with O(log n) queries. This characterisation shows that debate is remarkably query-efficient: even for highly complex problems, logarithmic oversight suffices. We also establish that functions depending on all their input bits require Ω(log n) queries, and that any function computable by a circuit of size s satisfies DQC(f) = log(s) + 3. Interestingly, this last result implies that proving DQC lower bounds of log(n) + 6 for languages in P would yield new circuit lower bounds, connecting debate query complexity to central questions in circuit complexity.
[AI-40] CauScale: Neural Causal Discovery at Scale
Link: https://arxiv.org/abs/2602.08629
Authors: Bo Peng,Sirui Chen,Jiaguo Tian,Yu Qiao,Chaochao Lu
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments:
[AI-41] Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs
Link: https://arxiv.org/abs/2602.08621
Authors: Yukun Jiang,Hai Huang,Mingjie Li,Yage Zhang,Michael Backes,Yang Zhang
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Comments:
[AI-42] Enhancing Genetic Algorithms with Graph Neural Networks: A Timetabling Case Study
Link: https://arxiv.org/abs/2602.08619
Authors: Laura-Maria Cornei,Mihaela-Elena Breabăn
Affiliations: Unknown
Categories: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: Paper accepted to the International Conference on Applications of Evolutionary Computation (EvoApplications) 2026
[AI-43] Breaking the Grid: Distance-Guided Reinforcement Learning in Large Discrete and Hybrid Action Spaces
Link: https://arxiv.org/abs/2602.08616
Authors: Heiko Hoppe,Fabian Akkerman,Wouter van Heeswijk,Maximilian Schiffer
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 26 pages, 8 figures
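[AI-44] OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval
[Quick read]: This paper addresses composed image retrieval (CIR) under complex visual and textual constraints. Existing approaches fall into two paradigms: unified embedding retrieval, which is limited by the narrow view of a single model, and heuristic agentic retrieval, which depends on trial-and-error orchestration and is hard to optimize. The key to the solution is OSCAR, an optimization-steered agentic planning framework that, for the first time, reformulates the agentic search process in CIR as a trajectory optimization problem and adopts an offline-online paradigm. In the offline phase, atomic retrieval selection and composition are modeled as a two-stage mixed-integer program, and boolean set operations are used to derive optimal trajectories that maximize ground-truth coverage; these trajectories form a "golden library" of in-context demonstrations. In the online phase, the library steers a vision-language model (VLM) planner at inference time. With this design the model outperforms mainstream baselines using only 10% of the training data, which points to strong generalization of the planning logic rather than dataset-specific memorization.
Link: https://arxiv.org/abs/2602.08603
Authors: Teng Wang,Rong Shan,Jianghao Lin,Junjie Wu,Tianyi Xu,Jianping Zhang,Wenteng Chen,Changwang Zhang,Zhaoxiang Wang,Weinan Zhang,Jun Wang
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: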
Abstract:Composed image retrieval (CIR) requires complex reasoning over heterogeneous visual and textual constraints. Existing approaches largely fall into two paradigms: unified embedding retrieval, which suffers from single-model myopia, and heuristic agentic retrieval, which is limited by suboptimal, trial-and-error orchestration. To this end, we propose OSCAR, an optimization-steered agentic planning framework for composed image retrieval. We are the first to reformulate agentic CIR from a heuristic search process into a principled trajectory optimization problem. Instead of relying on heuristic trial-and-error exploration, OSCAR employs a novel offline-online paradigm. In the offline phase, we model CIR via atomic retrieval selection and composition as a two-stage mixed-integer programming problem, mathematically deriving optimal trajectories that maximize ground-truth coverage for training samples via rigorous boolean set operations. These trajectories are then stored in a golden library to serve as in-context demonstrations for online steering of VLM planner at online inference time. Extensive experiments on three public benchmarks and a private industrial benchmark show that OSCAR consistently outperforms SOTA baselines. Notably, it achieves superior performance using only 10% of training data, demonstrating strong generalization of planning logic rather than dataset-specific memorization.
[AI-45] An Attention Mechanism for Robust Multimodal Integration in a Global Workspace Architecture
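[Quick read]: This paper addresses the understudied role of attention mechanisms in current multimodal integration systems, in particular how to select and integrate modalities flexibly within a Global Workspace framework. The key to the solution is a top-down attention mechanism that dynamically selects the relevant subset of multimodal information, which improves the system's noise robustness and yields cross-task and cross-modality generalization. Experiments on two datasets of increasing complexity, Simple Shapes and MM-IMDb 1.0, show that the mechanism outperforms existing baselines and makes the global workspace architecture competitive with the state of the art.
Link: https://arxiv.org/abs/2602.08597
Authors: Roland Bertin-Johannet,Lara Scipio,Leopold Maytié,Rufin VanRullen
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: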
Abstract:Global Workspace Theory (GWT), inspired by cognitive neuroscience, posits that flexible cognition could arise via the attentional selection of a relevant subset of modalities within a multimodal integration system. This cognitive framework can inspire novel computational architectures for multimodal integration. Indeed, recent implementations of GWT have explored its multimodal representation capabilities, but the related attention mechanisms remain understudied. Here, we propose and evaluate a top-down attention mechanism to select modalities inside a global workspace. First, we demonstrate that our attention mechanism improves noise robustness of a global workspace system on two multimodal datasets of increasing complexity: Simple Shapes and MM-IMDb 1.0. Second, we highlight various cross-task and cross-modality generalization capabilities that are not shared by multimodal attention models from the literature. Comparing against existing baselines on the MM-IMDb 1.0 benchmark, we find our attention mechanism makes the global workspace competitive with the state of the art.
[AI-46] PRISM: A Principled Framework for Multi-Agent Reasoning via Gain Decomposition
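[Quick read]: This paper addresses two core problems in using multi-agent collaboration to improve the reasoning of Large Language Models (LLMs): existing methods lack theoretical guidance, so it is unclear why multi-agent collaboration outperforms single-agent reasoning, and design choices are not well understood, so reasoning performance cannot be optimized systematically. The authors propose a unified theoretical framework that decomposes multi-agent reasoning gains into three independent dimensions: Exploration, which covers a diverse solution space; Information, which provides high-fidelity feedback; and Aggregation, which reaches principled consensus. The key idea is to maximize all three at once through role-based diversity, execution-grounded feedback with evidence-based cross-evaluation, and iterative synthesis with closed-loop validation. Building on this, the authors propose PRISM (Propose-Review-Integrate Synthesis for Multi-agent Reasoning), which achieves state-of-the-art performance with better compute efficiency on mathematical reasoning, code generation, and function calling tasks, and offers actionable design principles for future multi-agent reasoning systems.
Link: https://arxiv.org/abs/2602.08586
Authors: Yiming Yang,Zhuoyuan Li,Fanxiang Zeng,Hao Fu,Yue Liu
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: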
Abstract:Multi-agent collaboration has emerged as a promising paradigm for enhancing reasoning capabilities of Large Language Models (LLMs). However, existing approaches remain largely heuristic, lacking principled guidance on what drives performance gains and how to systematically optimize multi-agent reasoning. Specifically, it remains unclear why multi-agent collaboration outperforms single-agent reasoning and which design choices contribute most to these gains, making it difficult to build better systems. We address this gap by introducing a unified theoretical framework that decomposes multi-agent reasoning gains into three conceptually independent dimensions: Exploration for diverse solution coverage, Information for high-fidelity feedback, and Aggregation for principled consensus. Through this lens, existing methods can be understood as special cases that optimize only subsets of these dimensions. Building upon this decomposition, a novel framework called PRISM (Propose-Review-Integrate Synthesis for Multi-agent Reasoning) is proposed, which jointly maximizes all three dimensions through role-based diversity, execution-grounded feedback with evidence-based cross-evaluation, and iterative synthesis with closed-loop validation. Extensive experiments across mathematical reasoning, code generation, and function calling benchmarks demonstrate that PRISM achieves state-of-the-art performance with superior compute-efficiency compared to methods optimizing partial dimensions. The theoretical framework provides actionable design principles for future multi-agent reasoning systems.
[AI-47] Predicting Future Utility: Global Combinatorial Optimization for Task-Agnostic KV Cache Eviction
Link: https://arxiv.org/abs/2602.08585
Authors: Ziyao Tang,Pengkun Jiao,Xinhang Chen,Wei Liu,Shiyong Li,Jingjing Chen
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
[AI-48] Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs
Link: https://arxiv.org/abs/2602.08563
Authors: Ahmed Salem,Andrew Paverd,Sahar Abdelnabi
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Comments: Accepted at IEEE SaTML 2026
[AI-49] Dialogue Model Optimization via Agent Game and Adaptive Tree-based GRPO
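[Quick read]: This paper addresses two problems that open-ended dialogue agents face in personalized interaction: heavy reliance on pre-collected user data, and short-horizon bias in reinforcement learning (RL) that neglects long-term dialogue value. The key to the solution is a long-horizon RL framework that combines online personalization with Adaptive Tree-based Group Relative Policy Optimization (AT-GRPO). The framework uses a two-agent game in which a user agent mimics the user's style and predicts turn-level termination probabilities as immediate rewards, pushing the dialogue agent toward deeper exploration of the user's interests. AT-GRPO recasts dialogue trajectories as trees and introduces adaptive observation ranges: larger ranges in early stages encourage topic exploration, and smaller ranges in later stages maintain dialogue quality. This keeps the ability to capture long-term rewards while reducing the rollout budget from exponential to polynomial in the dialogue length.
Link: https://arxiv.org/abs/2602.08533
Authors: Kun Peng,Conghui Tan,Yu Liu,Guohua Tang,Zhongqian Sun,Wei Yang,Zining Zhu,Lei Jiang,Yanbing Liu,Hao Peng
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: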
Abstract:Open-ended dialogue agents aim to deliver engaging, personalized interactions by adapting to users’ traits, but existing methods face critical limitations: over-reliance on pre-collected user data, and short-horizon biases in reinforcement learning (RL) that neglect long-term dialogue value. To address these, we propose a novel long-horizon RL framework integrating online personalization with Adaptive Tree-based Group Relative Policy Optimization (AT-GRPO). Adopting a two-agent game paradigm, a user agent constructs dynamic environments via style mimicry (learning user-specific conversational traits) and active termination (predicting turn-level termination probabilities as immediate rewards), forming an iterative cycle that drives the dialogue agent to deepen interest exploration. AT-GRPO reinterprets dialogue trajectories as trees and introduces adaptive observation ranges. Unlike full tree expansion that incurs exponential overhead, it limits each node to aggregate rewards from a stage-aware range: larger ranges support early-stage topic exploration, while smaller ranges facilitate late-stage dialogue maintenance. This design reduces rollout budgets from exponential to polynomial in the dialogue length, while preserving long-term reward capture. Extensive experiments show our framework’s superior performance, sample efficiency, and robustness.
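[AI-50] Reinforcement Inference: Leveraging Uncertainty for Self-Correcting Language Model Reasoning
[Quick read]: This paper addresses the systematic underestimation of the reasoning ability of Large Language Models (LLMs) under zero-shot, greedy inference. The root cause is that the model commits too early while internally uncertain, which produces avoidable errors. The key to the solution is Reinforcement Inference, an entropy-aware inference-time control strategy: the entropy of the model's own generation serves as a confidence signal that decides whether to trigger a second, more deliberate reasoning attempt, improving performance without any retraining. On MMLU-Pro, the method raises the accuracy of DeepSeek-v3.2 from 60.72% to 84.03% while adding only 61.06% more inference calls, showing that entropy-driven selection captures most of the attainable improvement. The results also suggest that using entropy as a control signal offers a new lens on a model's latent reasoning limits and can inform future training objectives.
Link: https://arxiv.org/abs/2602.08520
Authors: Xinhai Sun
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: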
Abstract:Modern large language models (LLMs) are often evaluated and deployed under a \emphone-shot, greedy inference protocol, especially in professional settings that require deterministic behavior. This regime can systematically under-estimate a fixed model’s true capability: many errors arise not from missing knowledge, but from premature commitment under internal ambiguity. We introduce \emphReinforcement Inference, an entropy-aware inference-time control strategy that uses the model’s own uncertainty to selectively invoke a second, more deliberate reasoning attempt, enabling stronger performance \emphwithout any retraining. On 12,032 MMLU-Pro questions across 14 subjects, using DeepSeek-v3.2 with deterministic decoding in a zero-shot setting, Reinforcement Inference improves accuracy from 60.72% to 84.03%, while only incurring 61.06% additional inference calls. A 100% re-asking ablation reaches 84.35%, indicating that uncertainty-aware selection captures most of the attainable improvement with substantially less compute. Moreover, a \emphprompt-only ablation underperforms the baseline, suggesting that the gains are not explained by generic `` your output had high entropy, think step-by-step’’ prompting alone. Beyond providing a practical inference-time upgrade, our results suggest a broader \emphentropy-aware paradigm for measuring and expanding model capability: because modern decoder-based models generate outputs autoregressively, entropy and related confidence measures arise naturally as first-class control signals during generation. The resulting gap between one-pass greedy inference and uncertainty-conditioned deliberation offers a diagnostic lens on an LLM’s latent reasoning horizon and motivates future training objectives that explicitly constrain correctness–confidence alignment. 
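The entropy gate described in this entry can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of mean per-token Shannon entropy, and the fixed `threshold` are all assumptions, and `first_pass` / `second_pass` are hypothetical callables standing in for real model calls.

```python
import math
from typing import Callable, List, Sequence, Tuple

TokenDists = Sequence[Sequence[float]]


def mean_token_entropy(token_dists: TokenDists) -> float:
    """Average Shannon entropy (in nats) over per-token probability distributions."""
    total = 0.0
    for dist in token_dists:
        # Zero-probability entries contribute nothing to the entropy.
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_dists)


def answer_with_gate(
    question: str,
    first_pass: Callable[[str], Tuple[str, TokenDists]],  # hypothetical model call
    second_pass: Callable[[str, str], str],                # hypothetical re-ask call
    threshold: float,                                      # assumed tunable cutoff
) -> Tuple[str, bool]:
    """Return (answer, re_asked). Re-ask only when the first pass looks uncertain."""
    answer, dists = first_pass(question)
    if mean_token_entropy(dists) <= threshold:
        return answer, False
    return second_pass(question, answer), True
```

Under this sketch, the fraction of questions for which `re_asked` is true plays the role of the additional inference calls reported in the abstract.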
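[AI-51] TreeTensor: Boost AI System on Nested Data with Constrained Tree-Like Tensor
[Quick read]: This paper addresses the inconvenience and inefficiency of programming with conventional fixed-shape tensors when the data in a complex cognitive AI system is hierarchical (i.e., nested) and multimodal. The key to the solution is TreeTensor, a general nested data container that models data relationships systematically from a constrained tree-structure perspective. Through its constraints and "magic" utilities, arbitrary functions and operations, including those from mainstream machine learning libraries such as Scikit-Learn, NumPy, and PyTorch, can be applied to nested data at almost zero cost. It also extends to scenarios such as asynchronous execution and variable-length data computation, and it shows strong usability and runtime efficiency in complex AI systems such as AlphaStar for StarCraft II.
Link: https://arxiv.org/abs/2602.08517
Authors: Shaoang Zhang,Yazhe Niu
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Notes: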
Abstract: The tensor is the most basic and essential data structure in today's artificial intelligence (AI) systems. Its natural properties, especially memory continuity and slice independence, allow training systems to use parallel computing units such as GPUs to process data simultaneously along batch, spatial, or temporal dimensions. However, beyond perception tasks, the data in a complicated cognitive AI system usually has hierarchical structure (i.e., nested data) with various modalities, which is inconvenient and inefficient to program directly with conventional fixed-shape tensors. To address this issue, we summarize two main computational patterns of nested data and propose a general nested data container: TreeTensor. Through the various constraints and magic utilities of TreeTensor, one can apply arbitrary functions and operations to nested data at almost zero cost, including those from well-known machine learning libraries such as Scikit-Learn, NumPy, and PyTorch. Our approach uses a constrained tree-structure perspective to systematically model data relationships, and it can easily be combined with other methods to support further use cases, such as asynchronous execution and variable-length data computation. Detailed examples and benchmarks show that TreeTensor not only provides powerful usability across various problems, notably in one of the most complicated AI systems at present, AlphaStar for StarCraft II, but also exhibits excellent runtime efficiency without any overhead. Our project is available at this https URL.
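To make the idea of applying ordinary functions to nested data concrete, here is a minimal sketch that uses plain Python dicts as the tree. It does not use the TreeTensor project's API; `tree_map` and `tree_map2` are hypothetical helper names, and the "same keys at every level" check is only an assumed stand-in for the structural constraints the abstract mentions.

```python
from typing import Any, Callable


def tree_map(func: Callable[[Any], Any], tree: Any) -> Any:
    """Apply func to every leaf of a nested dict, preserving the tree shape."""
    if isinstance(tree, dict):
        return {key: tree_map(func, sub) for key, sub in tree.items()}
    return func(tree)


def tree_map2(func: Callable[[Any, Any], Any], left: Any, right: Any) -> Any:
    """Apply a binary func to matching leaves of two trees with identical structure."""
    if isinstance(left, dict) != isinstance(right, dict):
        raise ValueError("tree structures differ: leaf paired with subtree")
    if isinstance(left, dict):
        if left.keys() != right.keys():
            raise ValueError("tree structures differ: mismatched keys")
        return {key: tree_map2(func, left[key], right[key]) for key in left}
    return func(left, right)


# Example: a nested observation with two modalities (values are placeholders).
obs = {"scalar": 1.0, "entity": {"hp": 2.0, "shield": 3.0}}
scaled = tree_map(lambda x: x * 2, obs)
summed = tree_map2(lambda a, b: a + b, obs, scaled)
```

The leaves here are floats for brevity; in practice they would be arrays or tensors, and `func` could be any array operation.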
[AI-52] Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
Link: https://arxiv.org/abs/2602.08499
Authors: Xiaodong Lu,Xiaohan Wang,Jiajun Chai,Guojun Yin,Wei Lin,Zhijun Chen,Yu Luo,Fuzhen Zhuang,Yikun Ban,Deqing Wang
Affiliation: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Notes:
[AI-53] CLEAR: A Knowledge-Centric Vessel Trajectory Analysis Platform
Link: https://arxiv.org/abs/2602.08482
Authors: Hengyu Liu,Tianyi Li,Haoyu Wang,Kristian Torp,Yushuai Li,Tiancheng Zhang,Torben Bach Pedersen,Christian S. Jensen
Affiliation: unknown
Categories: Databases (cs.DB); Artificial Intelligence (cs.AI)
Notes: 4 pages, and 5 Figures
[AI-54] Decentralized Spatial Reuse Optimization in Wi-Fi: An Internal Regret Minimization Approach
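[Quick read]: This paper addresses the difficulty of optimizing Spatial Reuse (SR) parameters, such as transmission power and the Carrier Sensing Threshold (CST), in a decentralized way in dense IEEE 802.11 deployments where no global state information is available. The non-stationary environment created by multiple concurrently operating agents also often leads to suboptimal global configurations, such as defaulting to maximum transmission power. The key to the solution is a decentralized learning algorithm based on regret matching and grounded in internal regret minimization, which steers competing agents toward Correlated Equilibria (CE), yielding coordination-like behavior without explicit communication and approaching near-optimal global performance.
Link: https://arxiv.org/abs/2602.08456
Authors: Francesc Wilhelmi,Boris Bellalta,Miguel Casasnovas,Aleksandra Kijanka,Miguel Calvo-Fullana
Affiliation: unknown
Categories: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Notes: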
Abstract: Spatial Reuse (SR) is a cost-effective technique for improving spectral efficiency in dense IEEE 802.11 deployments by enabling simultaneous transmissions. However, the decentralized optimization of SR parameters (transmission power and Carrier Sensing Threshold (CST)) across different Basic Service Sets (BSSs) is challenging due to the lack of global state information. In addition, the concurrent operation of multiple agents creates a highly non-stationary environment, often resulting in suboptimal global configurations (e.g., using the maximum possible transmission power by default). To overcome these limitations, this paper introduces a decentralized learning algorithm based on regret-matching, grounded in internal regret minimization. Unlike standard decentralized "selfish" approaches that often converge to inefficient Nash Equilibria (NE), internal regret minimization guides competing agents toward Correlated Equilibria (CE), effectively mimicking coordination without explicit communication. Through simulation results, we showcase the superiority of our proposed approach and its ability to reach near-optimal global performance. These results confirm the not-yet-unleashed potential of scalable decentralized solutions and question the need for the heavy signaling overheads and architectural complexity associated with emerging centralized solutions like Multi-Access Point Coordination (MAPC).
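For readers unfamiliar with regret matching, the sketch below shows the classic conditional (internal) regret-matching rule for a single agent, in the style of Hart and Mas-Colell. It is not the paper's algorithm: the class name, the normalization constant `mu`, and especially the assumption that the agent can observe the payoff every action would have earned (full-information feedback) are simplifications made for illustration. In a Wi-Fi setting an action would correspond to one (transmission power, CST) configuration.

```python
import random
from typing import List, Sequence


class RegretMatcher:
    """One agent running conditional regret matching over n discrete actions."""

    def __init__(self, n_actions: int, mu: float, seed: int = 0) -> None:
        self.n = n_actions
        # mu must exceed (n - 1) * (largest payoff gap) so switch probabilities stay valid.
        self.mu = mu
        self.rng = random.Random(seed)
        # cum[j][k]: summed gain k would have had over j in the rounds where j was played.
        self.cum = [[0.0] * n_actions for _ in range(n_actions)]
        self.t = 0
        self.last = self.rng.randrange(n_actions)

    def switch_probs(self) -> List[float]:
        """Probability of each action next round, given the last action played."""
        j = self.last
        probs = [0.0] * self.n
        if self.t > 0:
            for k in range(self.n):
                if k != j:
                    probs[k] = max(self.cum[j][k] / self.t, 0.0) / self.mu
        stay = 1.0 - sum(probs)
        if stay < 0.0:
            raise ValueError("mu is too small for the observed payoff range")
        probs[j] = stay
        return probs

    def act(self) -> int:
        self.last = self.rng.choices(range(self.n), weights=self.switch_probs())[0]
        return self.last

    def update(self, payoffs: Sequence[float]) -> None:
        """payoffs[k]: what action k would have earned this round (assumed observable)."""
        j = self.last
        for k in range(self.n):
            self.cum[j][k] += payoffs[k] - payoffs[j]
        self.t += 1
```

When every agent follows this rule, the empirical distribution of joint play is known to approach the set of correlated equilibria, which is the property the abstract relies on.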
[AI-55] When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment
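[Quick read]: This paper addresses the mismatch between how advanced AI systems behave under evaluation and how they behave in deployment. Agents with situational awareness may exploit informational differences between the two settings ("regime leakage") to implement conditional policies such as sycophancy and sleeper agents, appearing compliant during evaluation while deviating in deployment. The key to the solution is to recast alignment evaluation as a problem of information flow under partial observability and to propose regime-blind mechanisms: training-time adversarial-invariance interventions that reduce the regime information extractable from decision-relevant internal representations, thereby suppressing regime-dependent behavior. Experiments show that the method suppresses two representative failure modes, scientific sycophancy and temporal sleeper agents, without measurable loss of task utility. Its effectiveness depends on how regime information is embedded in the policy, which shows that representational invariance is meaningful but fundamentally limited.
Link: https://arxiv.org/abs/2602.08449
Authors: Igor Santos-Grueiro
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Notes: 25 pages, 4 figures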
Abstract: Safety evaluation for advanced AI systems implicitly assumes that behavior observed under evaluation is predictive of behavior in deployment. This assumption becomes fragile for agents with situational awareness, which may exploit regime leakage (informational cues distinguishing evaluation from deployment) to implement conditional policies such as sycophancy and sleeper agents, which preserve compliance under oversight while defecting in deployment-like regimes. We reframe alignment evaluation as a problem of information flow under partial observability. Within this framework, we show that divergence between evaluation-time and deployment-time behavior is bounded by the mutual information between internal representations and the regime variable. Motivated by this result, we study regime-blind mechanisms: training-time interventions that reduce the extractability of regime information at decision-relevant internal representations via adversarial invariance. We evaluate this approach on a base, open-weight language model across two fully characterized failure modes: scientific sycophancy and temporal sleeper agents. Regime-blind training suppresses regime-conditioned behavior in both evaluated cases without measurable loss of task utility, but with qualitatively different dynamics: sycophancy exhibits a sharp representational and behavioral transition at low intervention strength, whereas sleeper-agent behavior requires substantially stronger pressure and does not exhibit a clean collapse of regime decodability. These results demonstrate that representational invariance is a meaningful but fundamentally limited control lever, whose effectiveness depends on how regime information is embedded in the policy. We argue that behavioral evaluation should be complemented with white-box diagnostics of regime awareness and information flow.
[AI-56] LLMs Security = Trouble
Link: https://arxiv.org/abs/2602.08422
Authors: Benjamin Livshits
Affiliation: unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Notes:
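[AI-57] From Assistant to Double Agent: Formalizing and Benchmarking Attacks on OpenClaw for Personalized Local AI Agent
[Quick read]: This paper addresses the lack of adequate security risk assessment for personalized AI agents driven by large language models (LLMs) in real deployments. Existing work is mostly confined to synthetic or task-oriented test environments and cannot accurately capture the attack surface and risk propagation mechanisms of personalized agents in practice. The authors propose Personalized Agent Security Bench (PASB), an end-to-end security evaluation framework for real-world personalized agents that combines personalized usage scenarios, realistic toolchains, and long-horizon interactions, and supports black-box, end-to-end security testing of real systems. A systematic evaluation using OpenClaw as a case study shows that PASB identifies critical vulnerabilities at execution stages including user prompt processing, tool usage, and memory retrieval, revealing serious security risks in personalized agent deployments.
Link: https://arxiv.org/abs/2602.08412
Authors: Yuhang Wang,Feiming Xu,Zheng Lin,Guangyu He,Yuzhe Huang,Haichang Gao,Zhenxing Niu
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI)
Notes: 11 pages, 2 figures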
Abstract:Although large language model (LLM)-based agents, exemplified by OpenClaw, are increasingly evolving from task-oriented systems into personalized AI assistants for solving complex real-world tasks, their practical deployment also introduces severe security risks. However, existing agent security research and evaluation frameworks primarily focus on synthetic or task-centric settings, and thus fail to accurately capture the attack surface and risk propagation mechanisms of personalized agents in real-world deployments. To address this gap, we propose Personalized Agent Security Bench (PASB), an end-to-end security evaluation framework tailored for real-world personalized agents. Building upon existing agent attack paradigms, PASB incorporates personalized usage scenarios, realistic toolchains, and long-horizon interactions, enabling black-box, end-to-end security evaluation on real systems. Using OpenClaw as a representative case study, we systematically evaluate its security across multiple personalized scenarios, tool capabilities, and attack types. Our results indicate that OpenClaw exhibits critical vulnerabilities at different execution stages, including user prompt processing, tool usage, and memory retrieval, highlighting substantial security risks in personalized agent deployments. The code for the proposed PASB framework is available at this https URL.
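[AI-58] On Protecting Agentic Systems Intellectual Property via Watermarking
[Quick read]: This paper addresses the lack of effective intellectual property (IP) protection for generative AI agentic systems against imitation attacks. Mainstream watermarking techniques for large language models (LLMs) do not apply, because these systems run as "grey boxes" that hide their internal reasoning traces, so watermarks that rely on verifiable intermediate states fail. The key to the solution is AGENTWM, a watermarking framework designed for agentic models: it exploits the semantic equivalence of action sequences and embeds a verifiable signal by subtly biasing the distribution over functionally equivalent tool execution paths, injecting the watermark directly into the visible action trajectory without any change perceptible to users. Combined with an automated watermark generation pipeline and a rigorous statistical hypothesis testing procedure, the scheme is robust against adaptive adversaries while leaving agent performance nearly unaffected.
Link: https://arxiv.org/abs/2602.08401
Authors: Liwen Wang,Zongjie Li,Yuchong Xie,Shuai Wang,Dongdong She,Wei Wang,Juergen Rahmel
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Notes: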
Abstract:The evolution of Large Language Models (LLMs) into agentic systems that perform autonomous reasoning and tool use has created significant intellectual property (IP) value. We demonstrate that these systems are highly vulnerable to imitation attacks, where adversaries steal proprietary capabilities by training imitation models on victim outputs. Crucially, existing LLM watermarking techniques fail in this domain because real-world agentic systems often operate as grey boxes, concealing the internal reasoning traces required for verification. This paper presents AGENTWM, the first watermarking framework designed specifically for agentic models. AGENTWM exploits the semantic equivalence of action sequences, injecting watermarks by subtly biasing the distribution of functionally identical tool execution paths. This mechanism allows AGENTWM to embed verifiable signals directly into the visible action trajectory while remaining indistinguishable to users. We develop an automated pipeline to generate robust watermark schemes and a rigorous statistical hypothesis testing procedure for verification. Extensive evaluations across three complex domains demonstrate that AGENTWM achieves high detection accuracy with negligible impact on agent performance. Our results confirm that AGENTWM effectively protects agentic IP against adaptive adversaries, who cannot remove the watermarks without severely degrading the stolen model’s utility.
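The general idea of biasing a choice among functionally equivalent tool paths and then verifying it with a hypothesis test can be sketched as below. This is an illustrative toy, not AGENTWM's scheme: the keyed-hash rule in `preferred_variant`, the `(step_id, chosen_variant)` trace format, and the one-sided binomial test are all assumptions chosen for simplicity.

```python
import hashlib
import math
from typing import Sequence, Tuple


def preferred_variant(secret_key: str, step_id: str, n_variants: int = 2) -> int:
    """Keyed pseudo-random choice of which equivalent tool path the owner favors."""
    digest = hashlib.sha256(f"{secret_key}:{step_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_variants


def binomial_p_value(hits: int, trials: int, p0: float) -> float:
    """Exact one-sided p-value P(X >= hits) for X ~ Binomial(trials, p0)."""
    return sum(
        math.comb(trials, i) * p0**i * (1.0 - p0) ** (trials - i)
        for i in range(hits, trials + 1)
    )


def detect_watermark(
    trace: Sequence[Tuple[str, int]],  # (step_id, chosen_variant) pairs from a suspect agent
    secret_key: str,
    n_variants: int = 2,
    alpha: float = 1e-3,               # assumed significance level
) -> Tuple[bool, float]:
    """Flag the trace if favored variants appear more often than chance would allow."""
    hits = sum(
        1 for step_id, chosen in trace
        if chosen == preferred_variant(secret_key, step_id, n_variants)
    )
    p_value = binomial_p_value(hits, len(trace), 1.0 / n_variants)
    return p_value < alpha, p_value
```

Under the null hypothesis of an unwatermarked agent that picks uniformly among equivalent paths, each step matches the keyed preference with probability `1 / n_variants`, which is what the test exploits.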
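[AI-59] SCOUT-RAG: Scalable and Cost-Efficient Unifying Traversal for Agentic Graph-RAG over Distributed Domains
[Quick read]: This paper addresses how to use structured knowledge efficiently for large language model (LLM) reasoning in distributed, access-restricted settings such as hospitals or multinational organizations. Conventional Graph-RAG relies on a centralized knowledge graph and is hard to apply without global visibility or exhaustive querying. The authors propose SCOUT-RAG, whose core innovation is four cooperating agents that estimate domain relevance, decide whether to expand retrieval to more domains, adapt traversal depth to avoid redundant exploration, and synthesize a high-quality answer. Guided by incremental utility goals, the framework performs cross-domain retrieval progressively, minimizing retrieval regret (missing useful domain information) while controlling latency and API cost. In multi-domain knowledge settings it matches centralized baselines while substantially reducing cross-domain calls, total tokens processed, and latency.
Link: https://arxiv.org/abs/2602.08400
Authors: Longkun Li,Yuanben Zou,Jinghan Wu,Yuqing Wen,Jing Li,Hangwei Qian,Ivor Tsang
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI)
Notes: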
Abstract: Graph-RAG improves LLM reasoning using structured knowledge, yet conventional designs rely on a centralized knowledge graph. In distributed and access-restricted settings (e.g., hospitals or multinational organizations), retrieval must select relevant domains and appropriate traversal depth without global graph visibility or exhaustive querying. To address this challenge, we introduce SCOUT-RAG (Scalable and COst-efficient Unifying Traversal), a distributed agentic Graph-RAG framework that performs progressive cross-domain retrieval guided by incremental utility goals. SCOUT-RAG employs four cooperative agents that: (i) estimate domain relevance, (ii) decide when to expand retrieval to additional domains, (iii) adapt traversal depth to avoid unnecessary graph exploration, and (iv) synthesize the high-quality answers. The framework is designed to minimize retrieval regret, defined as missing useful domain information, while controlling latency and API cost. Across multi-domain knowledge settings, SCOUT-RAG achieves performance comparable to centralized baselines, including DRIFT and exhaustive domain traversal, while substantially reducing cross-domain calls, total tokens processed, and latency.
[AI-60] Grounding Generative Planners in Verifiable Logic: A Hybrid Architecture for Trustworthy Embodied AI ICLR2026
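[Quick read]: This paper addresses the fact that large language models (LLMs), being stochastic, lack formal reasoning when used as planners for embodied AI and therefore cannot provide strict safety guarantees. Existing methods usually rely on unreliable LLMs for safety checks or simply reject unsafe plans without offering repairs. The key to the solution is a neuro-symbolic architecture, the Verifiable Iterative Refinement Framework (VIRF), in which a deterministic Logic Tutor grounded in a formal safety ontology collaborates with the LLM planner through dialogue, providing causal and pedagogical feedback so that plans are repaired intelligently rather than merely avoided. A scalable knowledge acquisition pipeline additionally synthesizes safety knowledge bases from real-world documents to cover blind spots in existing benchmarks. On home safety tasks the method achieves a 0% Hazardous Action Rate (HAR) and a 77.3% Goal-Condition Rate (GCR) with only 1.1 correction iterations on average, offering a principled path toward verifiably safe embodied agents.
Link: https://arxiv.org/abs/2602.08373
Authors: Feiyu Wu,Xu Zheng,Yue Qu,Zhuocheng Wang,Zicheng Feng,Hui Li
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Notes: Accepted to ICLR 2026. Project page: this https URL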
Abstract:Large Language Models (LLMs) show promise as planners for embodied AI, but their stochastic nature lacks formal reasoning, preventing strict safety guarantees for physical deployment. Current approaches often rely on unreliable LLMs for safety checks or simply reject unsafe plans without offering repairs. We introduce the Verifiable Iterative Refinement Framework (VIRF), a neuro-symbolic architecture that shifts the paradigm from passive safety gatekeeping to active collaboration. Our core contribution is a tutor-apprentice dialogue where a deterministic Logic Tutor, grounded in a formal safety ontology, provides causal and pedagogical feedback to an LLM planner. This enables intelligent plan repairs rather than mere avoidance. We also introduce a scalable knowledge acquisition pipeline that synthesizes safety knowledge bases from real-world documents, correcting blind spots in existing benchmarks. In challenging home safety tasks, VIRF achieves a perfect 0 percent Hazardous Action Rate (HAR) and a 77.3 percent Goal-Condition Rate (GCR), which is the highest among all baselines. It is highly efficient, requiring only 1.1 correction iterations on average. VIRF demonstrates a principled pathway toward building fundamentally trustworthy and verifiably safe embodied agents.
[AI-61] Learning Human-Like Badminton Skills for Humanoid Robots
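[Quick read]: This paper addresses the challenge of giving humanoid robots human-like, functional striking ability in demanding sports such as badminton. The core difficulty is turning kinematic imitation into physics-aware, precise striking while preserving a natural motion style. The key to the solution is Imitation-to-Interaction, a progressive reinforcement learning framework: it first builds a robust motor prior from human data, distills it into a compact, model-based state representation, and stabilizes dynamics with adversarial priors. More importantly, it introduces a manifold expansion strategy that generalizes discrete strike points into a dense interaction volume, mitigating the sparsity of expert demonstrations. This yields diverse striking skills in simulation (such as lifts and drop shots) and the first zero-shot sim-to-real transfer, reproducing the kinetic elegance and functional precision of human athletes.
Link: https://arxiv.org/abs/2602.08370
Authors: Yeke Chen,Shihao Dong,Xiaoyu Ji,Jingkai Sun,Zeren Luo,Liu Zhao,Jiahui Zhang,Wanyue Li,Ji Ma,Bowen Xu,Yimin Han,Yudong Zhao,Peng Lu
Affiliation: unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Notes: 10 pages, 4 figures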
Abstract:Realizing versatile and human-like performance in high-demand sports like badminton remains a formidable challenge for humanoid robotics. Unlike standard locomotion or static manipulation, this task demands a seamless integration of explosive whole-body coordination and precise, timing-critical interception. While recent advances have achieved lifelike motion mimicry, bridging the gap between kinematic imitation and functional, physics-aware striking without compromising stylistic naturalness is non-trivial. To address this, we propose Imitation-to-Interaction, a progressive reinforcement learning framework designed to evolve a robot from a “mimic” to a capable “striker.” Our approach establishes a robust motor prior from human data, distills it into a compact, model-based state representation, and stabilizes dynamics via adversarial priors. Crucially, to overcome the sparsity of expert demonstrations, we introduce a manifold expansion strategy that generalizes discrete strike points into a dense interaction volume. We validate our framework through the mastery of diverse skills, including lifts and drop shots, in simulation. Furthermore, we demonstrate the first zero-shot sim-to-real transfer of anthropomorphic badminton skills to a humanoid robot, successfully replicating the kinetic elegance and functional precision of human athletes in the physical world.
[AI-62] Circuit Representations of Random Forests with Applications to XAI
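[Quick read]: This paper addresses the explanation and robustness analysis of decisions made by random forest classifiers, in particular how to efficiently compute the complete and general reasons, necessary reasons, sufficient reasons, and contrastive explanations of a decision, and how to identify the shortest ways to flip it. The solution has three parts: first, a method for compiling a random forest into a set of circuits, each of which directly encodes all instances in one class of the classifier; second, algorithms built on this circuit representation that compute the above explanations efficiently; and third, algorithms that evaluate decision robustness and find all shortest ways to flip a decision. The compilation approach is significantly more efficient than existing similar approaches, and the contributions are demonstrated on random forests learned from a wide range of datasets.
Link: https://arxiv.org/abs/2602.08362
Authors: Chunxi Ji,Adnan Darwiche
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Notes: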
Abstract:We make three contributions in this paper. First, we present an approach for compiling a random forest classifier into a set of circuits, where each circuit directly encodes the instances in some class of the classifier. We show empirically that our proposed approach is significantly more efficient than existing similar approaches. Next, we utilize this approach to further obtain circuits that are tractable for computing the complete and general reasons of a decision, which are instance abstractions that play a fundamental role in computing explanations. Finally, we propose algorithms for computing the robustness of a decision and all shortest ways to flip it. We illustrate the utility of our contributions by using them to enumerate all sufficient reasons, necessary reasons and contrastive explanations of decisions; to compute the robustness of decisions; and to identify all shortest ways to flip the decisions made by random forest classifiers learned from a wide range of datasets.
[AI-63] Does Your Reasoning Model Implicitly Know When to Stop Thinking?
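[Quick read]: This paper addresses the redundant computation caused by Long Chains of Thought (CoTs) in Large Reasoning Models (LRMs) on complex reasoning tasks, which lowers computational efficiency and can also hurt accuracy. The key to the solution is SAGE (Self-Aware Guided Efficient Reasoning), a new sampling paradigm that exposes and exploits the LRMs' implicit ability to know when to stop thinking, an ability that existing sampling strategies obscure. Combining SAGE with group-based reinforcement learning (SAGE-RL) lets the model absorb efficient reasoning patterns and markedly improves both accuracy and efficiency under standard pass@1 inference across multiple mathematical benchmarks.
Link: https://arxiv.org/abs/2602.08354
Authors: Zixuan Huang,Xin Xia,Yuxi Ren,Jianbin Zheng,Xuanda Wang,Zhixia Zhang,Hongyan Xie,Songshi Liang,Zehao Chen,Xuefeng Xiao,Fuzhen Zhuang,Jianxin Li,Yikun Ban,Deqing Wang
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI)
Notes: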
Abstract:Recent advancements in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through Long Chains of Thought (CoTs). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. Recent studies show that longer reasoning chains are frequently uncorrelated with correctness and can even be detrimental to accuracy. In a further in-depth analysis of this phenomenon, we surprisingly uncover and empirically verify that LRMs implicitly know the appropriate time to stop thinking, while this capability is obscured by current sampling paradigms. Motivated by this, we introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that unleashes this efficient reasoning potential. Furthermore, integrating SAGE as mixed sampling into group-based reinforcement learning (SAGE-RL) enables SAGE-RL to effectively incorporate SAGE-discovered efficient reasoning patterns into standard pass@1 inference, markedly enhancing both the reasoning accuracy and efficiency of LRMs across multiple challenging mathematical benchmarks.
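[AI-64] Towards Better Evolution Modeling for Temporal Knowledge Graphs
[Quick read]: This paper addresses a "shortcut" in current Temporal Knowledge Graph (TKG) benchmarks: models can reach near state-of-the-art performance simply by counting co-occurrences, without using any temporal information. The root cause lies in inherent biases in existing datasets and an oversimplified evaluation task, which let models bypass learning how knowledge actually evolves. The key to the solution is a new TKG evolution benchmark with four bias-corrected datasets and two new tasks closely aligned with the evolution process, which characterizes the real challenges of TKG modeling more accurately and discourages reliance on simple statistical regularities.
Link: https://arxiv.org/abs/2602.08353
Authors: Zhang Jiasheng,Li Zhangpin,Wang Mingzhe,Shao Jie,Cui Jiangtao,Li Hui
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI)
Notes: 13 pages, 11 figures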
Abstract:Temporal knowledge graphs (TKGs) structurally preserve evolving human knowledge. Recent research has focused on designing models to learn the evolutionary nature of TKGs to predict future facts, achieving impressive results. For instance, Hits@10 scores over 0.9 on YAGO dataset. However, we find that existing benchmarks inadvertently introduce a shortcut. Near state-of-the-art performance can be simply achieved by counting co-occurrences, without using any temporal information. In this work, we examine the root cause of this issue, identifying inherent biases in current datasets and over simplified form of evaluation task that can be exploited by these biases. Through this analysis, we further uncover additional limitations of existing benchmarks, including unreasonable formatting of time-interval knowledge, ignorance of learning knowledge obsolescence, and insufficient information for precise evolution understanding, all of which can amplify the shortcut and hinder a fair assessment. Therefore, we introduce the TKG evolution benchmark. It includes four bias-corrected datasets and two novel tasks closely aligned with the evolution process, promoting a more accurate understanding of the challenges in TKG evolution modeling. Benchmark is available at: this https URL.
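A time-agnostic counting baseline of the kind the abstract alludes to might look like the sketch below. The exact counting scheme used in the paper is not given here, so this version (rank candidate objects by how often they co-occurred with the same subject and relation in training, ignoring timestamps) is an assumption, as are the function names and the `(subject, relation, object, time)` tuple format.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Quad = Tuple[Hashable, Hashable, Hashable, int]  # (subject, relation, object, time)
Counts = Dict[Tuple[Hashable, Hashable], Counter]


def fit_cooccurrence(train: Sequence[Quad]) -> Counts:
    """Count how often each object follows a (subject, relation) pair, ignoring time."""
    counts: Counts = {}
    for subj, rel, obj, _time in train:
        counts.setdefault((subj, rel), Counter())[obj] += 1
    return counts


def predict(counts: Counts, subj: Hashable, rel: Hashable, k: int = 10) -> List[Hashable]:
    """Return the k most frequent objects seen with this (subject, relation) pair."""
    return [obj for obj, _ in counts.get((subj, rel), Counter()).most_common(k)]


def hits_at_k(counts: Counts, test: Sequence[Quad], k: int = 10) -> float:
    """Fraction of test facts whose true object appears in the top-k predictions."""
    hits = sum(1 for s, r, o, _t in test if o in predict(counts, s, r, k))
    return hits / len(test)
```

If a baseline this simple scores close to temporal models on a benchmark, the benchmark is rewarding frequency statistics rather than evolution modeling, which is the concern the entry raises.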
[AI-65] The Chicken and Egg Dilemma: Co-optimizing Data and Model Configurations for LLMs
Link: https://arxiv.org/abs/2602.08351
Authors: Zhiliang Chen,Alfred Wei Lun Leong,Shao Yong Ong,Apivich Hemachandram,Gregory Kang Ruey Lau,Chuan-Sheng Foo,Zhengyuan Liu,Nancy F. Chen,Bryan Kian Hsiang Low
Affiliation: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Notes:
[AI-66] OPE: Overcoming Information Saturation in Parallel Thinking via Outline-Guided Path Exploration
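[Quick read]: This paper addresses the limited overall performance of current reinforcement-learning-based parallel thinking methods on complex problems, caused by information redundancy across exploration paths. Existing work mostly optimizes the aggregation phase and neglects the lack of information diversity in the path exploration phase. The key to the solution is Outline-Guided Path Exploration (OPE), which explicitly partitions the solution space by generating diverse reasoning outlines in advance, easing the mutual information bottleneck among exploration paths and increasing information diversity. An iterative reinforcement learning strategy optimizes outline planning and outline-guided reasoning independently, improving the ability of large reasoning models (LRMs) to find correct solutions.
Link: https://arxiv.org/abs/2602.08344
Authors: Qi Guo,Jianing Wang,Deyang Kong,Xiangyu Xi,Jianfei Zhang,Yi Lu,Jingang Wang,Wei Wang,Shikun Zhang,Wei Ye
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI)
Notes: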
Abstract:Parallel thinking has emerged as a new paradigm for large reasoning models (LRMs) in tackling complex problems. Recent methods leverage Reinforcement Learning (RL) to enhance parallel thinking, aiming to address the limitations in computational resources and effectiveness encountered with supervised fine-tuning. However, most existing studies primarily focus on optimizing the aggregation phase, with limited attention to the path exploration stage. In this paper, we theoretically analyze the optimization of parallel thinking under the Reinforcement Learning with Verifiable Rewards (RLVR) setting, and identify that the mutual information bottleneck among exploration paths fundamentally restricts overall performance. To address this, we propose Outline-Guided Path Exploration (OPE), which explicitly partitions the solution space by generating diverse reasoning outlines prior to parallel path reasoning, thereby reducing information redundancy and improving the diversity of information captured across exploration paths. We implement OPE with an iterative RL strategy that optimizes outline planning and outline-guided reasoning independently. Extensive experiments across multiple challenging mathematical benchmarks demonstrate that OPE effectively improves reasoning performance in different aggregation strategies, enabling LRMs to more reliably discover correct solutions.
[AI-67] Effect-Level Validation for Causal Discovery
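[Quick read]: This paper addresses the reliability of causal discovery methods for estimating the effects of user-facing interventions in feedback-driven systems with strong self-selection. Causal discovery is widely applied to large-scale telemetry data for this purpose, yet its trustworthiness in complex real-world settings remains unclear. The key to the solution is an effect-centric, admissibility-first framework that treats discovered causal graphs as structural hypotheses and evaluates them by identifiability, stability, and falsification rather than by graph recovery accuracy alone. Empirically, many statistically plausible graphs cannot support point-identified causal queries once minimal temporal and semantic constraints are imposed, which makes identifiability the central bottleneck for decision support. When identification is possible, different algorithm families produce markedly different graphs yet converge to consistent effect estimates that survive placebo, subsampling, and sensitivity refutation. Admissibility for the target query and effect-level validation should therefore take priority over exact recovery of causal structure.
Link: https://arxiv.org/abs/2602.08340
Authors: Hoang Dang,Luan Pham,Minh Nguyen
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI)
Notes: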
Abstract:Causal discovery is increasingly applied to large-scale telemetry data to estimate the effects of user-facing interventions, yet its reliability for decision-making in feedback-driven systems with strong self-selection remains unclear. In this paper, we propose an effect-centric, admissibility-first framework that treats discovered graphs as structural hypotheses and evaluates them by identifiability, stability, and falsification rather than by graph recovery accuracy alone. Empirically, we study the effect of early exposure to competitive gameplay on short-term retention using real-world game telemetry. We find that many statistically plausible discovery outputs do not admit point-identified causal queries once minimal temporal and semantic constraints are enforced, highlighting identifiability as a critical bottleneck for decision support. When identification is possible, several algorithm families converge to similar, decision-consistent effect estimates despite producing substantially different graph structures, including cases where the direct treatment-outcome edge is absent and the effect is preserved through indirect causal pathways. These converging estimates survive placebo, subsampling, and sensitivity refutation. In contrast, other methods exhibit sporadic admissibility and threshold-sensitive or attenuated effects due to endpoint ambiguity. These results suggest that graph-level metrics alone are inadequate proxies for causal reliability for a given target query. Therefore, trustworthy causal conclusions in telemetry-driven systems require prioritizing admissibility and effect-level validation over causal structural recovery alone.
[AI-68] Who Deserves the Reward? SHARP: Shapley Credit-based Optimization for Multi-Agent System
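[Quick read]: This paper addresses unstable and inefficient training in multi-agent reinforcement learning (MARL) caused by the credit assignment problem. Existing methods usually rely on sparse or globally broadcast rewards and cannot measure each functional agent's specific contribution to the success or failure of a decision trajectory, which limits learning. The key to the solution is Shapley-based Hierarchical Attribution for Reinforcement Policy (SHARP), a credit attribution framework whose decomposed reward mechanism combines a global broadcast-accuracy reward, a Shapley-based marginal-credit reward, and a tool-process reward for execution efficiency. It attributes and normalizes each agent's advantage precisely, markedly improving training stability and performance.
Link: https://arxiv.org/abs/2602.08335
Authors: Yanming Li,Xuelin Zhang,WenJie Lu,Ziye Tang,Maodong Wu,Haotian Luo,Tongtong Wu,Zijie Peng,Hongze Mi,Yibo Feng,Naiqiang Tan,Chao Huang,Hong Chen,Li Shen
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI)
Notes: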
Abstract:Integrating Large Language Models (LLMs) with external tools via multi-agent systems offers a promising new paradigm for decomposing and solving complex problems. However, training these systems remains notoriously difficult due to the credit assignment challenge, as it is often unclear which specific functional agent is responsible for the success or failure of decision trajectories. Existing methods typically rely on sparse or globally broadcast rewards, failing to capture individual contributions and leading to inefficient reinforcement learning. To address these limitations, we introduce the Shapley-based Hierarchical Attribution for Reinforcement Policy (SHARP), a novel framework for optimizing multi-agent reinforcement learning via precise credit attribution. SHARP effectively stabilizes training by normalizing agent-specific advantages across trajectory groups, primarily through a decomposed reward mechanism comprising a global broadcast-accuracy reward, a Shapley-based marginal-credit reward for each agent, and a tool-process reward to improve execution efficiency. Extensive experiments across various real-world benchmarks demonstrate that SHARP significantly outperforms recent state-of-the-art baselines, achieving average match improvements of 23.66% and 14.05% over single-agent and multi-agent approaches, respectively.
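The Shapley-based marginal credit mentioned in this entry builds on the standard Shapley value, which can be computed exactly for a handful of agents as sketched below. How SHARP defines the coalition value and folds the result into its reward is not specified here; the `value` function and the agent names in the example are hypothetical.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    agents: Sequence[str],
    value: Callable[[FrozenSet[str]], float],  # hypothetical coalition score, e.g. task success
) -> Dict[str, float]:
    """Exact Shapley values: each agent's marginal contribution averaged over all orderings."""
    totals = {agent: 0.0 for agent in agents}
    for order in permutations(agents):
        coalition: FrozenSet[str] = frozenset()
        previous = value(coalition)
        for agent in order:
            coalition = coalition | {agent}
            current = value(coalition)
            totals[agent] += current - previous
            previous = current
    n_orders = factorial(len(agents))
    return {agent: total / n_orders for agent, total in totals.items()}


# Toy example: the task succeeds only when both the planner and the tool agent take part.
def toy_value(coalition: FrozenSet[str]) -> float:
    return 1.0 if {"planner", "tool"} <= coalition else 0.0


credits = shapley_values(["planner", "tool", "critic"], toy_value)
```

Exact enumeration costs n! evaluations of `value`, so it is only practical for small agent sets; larger systems typically rely on sampled permutations.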
[AI-69] Regime Change Hypothesis: Foundations for Decoupled Dynamics in Neural Network Training
Link: https://arxiv.org/abs/2602.08333
Authors: Cristian Pérez-Corral,Alberto Fernández-Hernández,Jose I. Mestre,Manuel F. Dolz,Jose Duato,Enrique S. Quintana-Ortí
Affiliation: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Notes: 8 pages, 1 figure
[AI-70] Near-Oracle KV Selection via Pre-hoc Sparsity for Long-Context Inference
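[Quick read]: This paper addresses the attention cost incurred by the ever-growing key-value (KV) cache during large language model (LLM) inference. Existing sparse attention methods usually rely on posterior heuristics, selecting KV entries based on observed attention scores or proxy metrics. Such methods introduce posterior bias, distorting true token importance and missing salient tokens, which harms long-range reasoning. The key to the solution is Pre-hoc Sparsity (PrHS), which selects KV entries before attention scoring and provides explicit accuracy control: it defines the attention mass of the discarded entries as the dropped mass (Δ) and, through a marginal-to-mutual-information analysis, derives an upper bound on the mutual-information loss that depends only on Δ. This relation explains why posterior methods fail and enables verifiable guarantees by bounding Δ in advance. PrHS further instantiates three orthogonal pre-hoc selectors along the axes of time, depth, and layer. On the LLaMA and Mistral families it reduces retrieval overhead by over 90%, achieves 3x higher sparsity than HShare at matched or better accuracy, incurs under 1% average degradation on LongBench, lowers attention FLOPs by about 15%, and delivers a 9.9x speedup in attention-operator latency and 2.8x higher throughput on NVIDIA A100-80GB GPUs.
Link: https://arxiv.org/abs/2602.08329
Authors: Yifei Gao,Lei Wang,Rong-Cheng Tu,Qixin Zhang,Jun Cheng,Dacheng Tao
Affiliation: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Notes: An effective method for accelerating LLM's inference via selective KV processing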
Abstract:A core bottleneck in large language model (LLM) inference is the cost of attending over the ever-growing key-value (KV) cache. Although near-oracle top-k KV selection can preserve the quality of dense attention while sharply reducing computation and bandwidth, existing sparse methods generally rely on posterior heuristics, i.e., selectors conditioned on observed attention or proxy scores. Such conditioning introduces posterior bias: it tends to distort true token importance and miss salient tokens, thereby impairing long-range reasoning. To tackle this problem, we propose Pre-hoc Sparsity (PrHS), which selects KV entries before attention scoring and provides explicit accuracy control. Let the attention mass of discarded entries be delta (the dropped mass). Through a marginal-to-mutual-information analysis, we derive an upper bound on the mutual-information loss that depends only on the dropped mass. This relation explains failure modes of posterior heuristics and enables verifiable guarantees by controlling the dropped mass in advance. Within PrHS, we instantiate three orthogonal pre-hoc selectors along the axes of time, depth, and layer. Extensive experiments on LLaMA and Mistral families validate PrHS. Across GSM8K and CoQA, PrHS reduces retrieval overhead by over 90%, achieving 3x higher retrieval sparsity than HShare at matched or better accuracy. It incurs under 1% average degradation on LongBench, lowers attention FLOPs by about 15% versus prior sparse baselines, and yields a 9.9x speedup in attention-operator latency and 2.8x higher throughput on NVIDIA A100-80GB GPUs than the dense baseline.
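The dropped mass that the abstract's bound depends on is easy to compute once attention weights are known, as the sketch below shows for a single query. Note the caveat: this is an oracle-style calculation that reads the true scores, which is exactly what a pre-hoc selector cannot do, so it only illustrates the quantity being controlled and is not PrHS itself. The function names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw attention scores for one query."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def dropped_mass(scores: Sequence[float], kept: Iterable[int]) -> float:
    """Attention mass assigned to the KV entries that were not kept."""
    weights = softmax(scores)
    return 1.0 - sum(weights[i] for i in kept)


def smallest_keep_set(scores: Sequence[float], max_dropped: float) -> List[int]:
    """Oracle top-k: fewest entries whose removal leaves at most max_dropped mass behind."""
    weights = softmax(scores)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for i in order:
        if 1.0 - mass <= max_dropped:
            break
        kept.append(i)
        mass += weights[i]
    return kept
```

Comparing a selector's actual dropped mass against the oracle keep set of the same size gives a rough sense of how close it is to near-oracle selection.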
[AI-71] SWE Context Bench: A Benchmark for Context Learning in Coding
Link: https://arxiv.org/abs/2602.08316
Authors: Jared Zhu, Minhao Hu, Junde Wu
Affiliation: Unknown
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments:
[AI-72] Moral Sycophancy in Vision Language Models ACL
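[Quick Read]: This paper addresses sycophancy in the moral decision-making of Vision-Language Models (VLMs): the tendency to side with user opinions even when doing so leads to morally or factually wrong judgments. Prior work has mostly examined sycophancy in general contexts, and its effect on morally grounded visual decision-making is not systematically understood. The paper presents the first systematic study of moral sycophancy in VLMs, evaluating ten widely used VLMs on the Moralise and M^3oralBench benchmark datasets under explicit user disagreement. It finds a marked moral asymmetry: models shift from correct to incorrect moral judgments more readily than the reverse. It also reveals a trade-off between Error Introduction Rate (EIR) and Error Correction Rate (ECR): models with strong error-correction ability tend to introduce more reasoning errors, while conservative models struggle to self-correct. These findings offer empirical evidence and directions for improving the ethical consistency and robustness of multimodal AI systems.
Link: https://arxiv.org/abs/2602.08311
Authors: Shadman Rabby, Md. Hefzul Hossain Papon, Sabbir Ahmed, Nokimul Hasan Arif, A.B.M. Ashikur Rahman, Irfan Ahmad
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: 13 pages, 6 figures, 8 tables, Submitted for review in ACL
Click to view abstract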
Abstract:Sycophancy in Vision-Language Models (VLMs) refers to their tendency to align with user opinions, often at the expense of moral or factual accuracy. While prior studies have explored sycophantic behavior in general contexts, its impact on morally grounded visual decision-making remains insufficiently understood. To address this gap, we present the first systematic study of moral sycophancy in VLMs, analyzing ten widely-used models on the Moralise and M^3oralBench datasets under explicit user disagreement. Our results reveal that VLMs frequently produce morally incorrect follow-up responses even when their initial judgments are correct, and exhibit a consistent asymmetry: models are more likely to shift from morally right to morally wrong judgments than the reverse when exposed to user-induced bias. Follow-up prompts generally degrade performance on Moralise, while yielding mixed or even improved accuracy on M^3oralBench, highlighting dataset-dependent differences in moral robustness. Evaluation using Error Introduction Rate (EIR) and Error Correction Rate (ECR) reveals a clear trade-off: models with stronger error-correction capabilities tend to introduce more reasoning errors, whereas more conservative models minimize errors but exhibit limited ability to self-correct. Finally, initial contexts with a morally right stance elicit stronger sycophantic behavior, emphasizing the vulnerability of VLMs to moral influence and the need for principled strategies to improve ethical consistency and robustness in multimodal AI systems.
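The abstract names EIR and ECR without defining them. A minimal sketch, assuming EIR is the share of initially correct answers that become wrong after the follow-up prompt and ECR is the share of initially wrong answers that become correct; these definitions, the function name `error_rates`, and the sample labels are assumptions for illustration and may differ from the paper's.

```python
def error_rates(initial_correct, followup_correct):
    """Return (EIR, ECR) under the assumed definitions:
    EIR = fraction of initially correct answers that turn wrong,
    ECR = fraction of initially wrong answers that turn correct."""
    pairs = list(zip(initial_correct, followup_correct))
    right_first = [after for before, after in pairs if before]
    wrong_first = [after for before, after in pairs if not before]
    eir = (sum(1 for after in right_first if not after) / len(right_first)
           if right_first else 0.0)
    ecr = (sum(1 for after in wrong_first if after) / len(wrong_first)
           if wrong_first else 0.0)
    return eir, ecr

# Hypothetical per-item correctness before and after a disagreeing follow-up.
initial = [True, True, True, True, False, False]
followup = [True, False, False, False, True, False]

eir, ecr = error_rates(initial, followup)
print(eir, ecr)
```

With these made-up labels, three of four correct answers flip to wrong (EIR 0.75) while one of two wrong answers is fixed (ECR 0.5), the kind of right-to-wrong asymmetry the abstract describes.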
[AI-73] Grokking in Linear Models for Logistic Regression
Link: https://arxiv.org/abs/2602.08302
Authors: Nataraj Das, Atreya Vedantam, Chandrashekar Lakshminarayanan
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
[AI-74] Automatic Generation of Polynomial Symmetry Breaking Constraints
Link: https://arxiv.org/abs/2602.08297
Authors: Madalina Erascu, Johannes Middeke
Affiliation: Unknown
Subjects: Symbolic Computation (cs.SC); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Comments:
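[AI-75] The Vibe-Automation of Automation: A Proactive Education Framework for Computer Science in the Age of Generative AI
[Quick Read]: The problem this paper addresses is that generative AI (GenAI), as a nonlinear, context-sensitive paradigm of intelligence, is driving a fundamental epistemological shift within computer science, and that traditional machine learning approaches built on computable objective functions can no longer adequately explain how it operates or its social impact. The key to the solution is the new concept of "Vibe-Automation", which describes generative AI's ability to functionally operate on tacit regularities of practice through high-dimensional latent representations. These regularities cannot be fully captured by explicit algorithmic rules, but they surface as contextual consistency, semantic coherence, and stylistic adaptation. The human role accordingly shifts from algorithmic problem specification to "Vibe-Engineering", the orchestration of alignment and contextual judgment in generative systems. This motivates systematic change at three levels (educational philosophy, industry relations, and curriculum design) to counter the risks of mode collapse and cultural homogenization.
Link: https://arxiv.org/abs/2602.08295
Authors: Ilya Levin
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: 19 pages
Click to view abstract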
Abstract:The emergence of generative artificial intelligence (GenAI) represents not an incremental technological advance but a qualitative epistemological shift that challenges foundational assumptions of computer science. Whereas machine learning has been described as the automation of automation, generative AI operates by navigating contextual, semantic, and stylistic coherence rather than optimizing predefined objective metrics. This paper introduces the concept of Vibe-Automation to characterize this transition. The central claim is that the significance of GenAI lies in its functional access to operationalized tacit regularities: context-sensitive patterns embedded in practice that cannot be fully specified through explicit algorithmic rules. Although generative systems do not possess tacit knowledge in a phenomenological sense, they operationalize sensitivities to tone, intent, and situated judgment encoded in high-dimensional latent representations. On this basis, the human role shifts from algorithmic problem specification toward Vibe-Engineering, understood as the orchestration of alignment and contextual judgment in generative systems. The paper connects this epistemological shift to educational and institutional transformation by proposing a conceptual framework structured across three analytical levels and three domains of action: faculty worldview, industry relations, and curriculum design. The risks of mode collapse and cultural homogenization are briefly discussed, emphasizing the need for deliberate engagement with generative systems to avoid regression toward synthetic uniformity.
[AI-76] Trust-Based Incentive Mechanisms in Semi-Decentralized Federated Learning Systems
Link: https://arxiv.org/abs/2602.08290
Authors: Ajay Kumar Shrestha
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
Comments: To appear in the ICBTA 2025 Conference Proceedings and published as a volume of Lecture Notes in Networks and Systems by Springer
[AI-77] Noise Stability of Transformer Models ICLR2026
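[Quick Read]: This paper addresses the limits of current understanding of simplicity biases in deep learning. In particular, the commonly used metric of average sensitivity lacks a natural generalization to real-valued domains and cannot explain the "junta-like" input dependence observed in modern large language models (LLMs). The key to the solution is noise stability, proposed as a more comprehensive simplicity metric that measures a model's robustness to correlated noise applied to all input coordinates simultaneously. The authors analyze the noise stability of single-layer attention and ReLU MLP layers theoretically, handle multi-layer propagation with a covariance interval propagation approach, and then design a practical noise stability regularizer. On algorithmic tasks and next-token-prediction tasks, the regularizer catalyzes grokking and accelerates training by approximately 35% and 75% respectively. The work establishes a new connection between signal propagation and interpretability, highlighting the potential of noise stability for understanding and improving Transformer architectures.
Link: https://arxiv.org/abs/2602.08287
Authors: Themistoklis Haris, Zihan Zhang, Yuichi Yoshida
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments: Published in ICLR 2026
Click to view abstract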
Abstract:Understanding simplicity biases in deep learning offers a promising path toward developing reliable AI. A common metric for this, inspired by Boolean function analysis, is average sensitivity, which captures a model’s robustness to single-token perturbations. We argue that average sensitivity has two key limitations: it lacks a natural generalization to real-valued domains and fails to explain the “junta-like” input dependence we empirically observe in modern LLMs. To address these limitations, we propose noise stability as a more comprehensive simplicity metric. Noise stability expresses a model’s robustness to correlated noise applied to all input coordinates simultaneously. We provide a theoretical analysis of noise stability for single-layer attention and ReLU MLP layers and tackle the multi-layer propagation problem with a covariance interval propagation approach. Building on this theory, we develop a practical noise stability regularization method. Experiments on algorithmic and next-token-prediction tasks show that our regularizer consistently catalyzes grokking and accelerates training by approximately 35% and 75% respectively. Our results sculpt a new connection between signal propagation in neural networks and interpretability, with noise stability emerging as a powerful tool for understanding and improving modern Transformers.
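As a rough illustration of the metric, the sketch below estimates noise stability by Monte Carlo using the textbook definition from Boolean function analysis: for x uniform on {-1, +1}^n and y a rho-correlated copy of x, Stab_rho(f) = E[f(x) f(y)]. The paper's real-valued definition may differ; the function names and the choice of test functions are assumptions made here for illustration.

```python
import random

def noise_stability(f, n, rho, samples=20000, seed=0):
    """Monte Carlo estimate of E[f(x) f(y)] where every coordinate of y
    independently equals x_i with probability (1 + rho) / 2, else -x_i."""
    rng = random.Random(seed)
    keep_prob = (1 + rho) / 2
    total = 0.0
    for _ in range(samples):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        # Noise is applied to ALL coordinates simultaneously.
        y = [xi if rng.random() < keep_prob else -xi for xi in x]
        total += f(x) * f(y)
    return total / samples

def dictator(x):
    """Depends on a single coordinate (a 1-junta)."""
    return x[0]

def parity(x):
    """Depends on every coordinate."""
    product = 1
    for xi in x:
        product *= xi
    return product

stab_dictator = noise_stability(dictator, n=5, rho=0.8)  # exact value: 0.8
stab_parity = noise_stability(parity, n=5, rho=0.8)      # exact value: 0.8 ** 5
print(round(stab_dictator, 2), round(stab_parity, 2))
```

The function that depends on one coordinate is far more stable than the one that depends on all five, which is the intuition behind using noise stability as a measure of simplicity.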
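[AI-78] Toward Formalizing LLM-Based Agent Designs through Structural Context Modeling and Semantic Dynamics Analysis
[Quick Read]: This paper addresses the fragmentation of current research on large language model (LLM) agents, in which conceptual frameworks and methodological principles are often entangled with low-level implementation details, making systematic comparison and understanding difficult. The key to the solution is a formal analytical model, the Structural Context Model, which characterizes and compares LLM agents from the perspective of context structure in an analyzable, implementation-independent way. Building on it, the paper introduces two complementary components: a declarative implementation framework, and a sustainable agent engineering workflow called Semantic Dynamics Analysis. Together they cover the full lifecycle of LLM agent research and development, provide principled insight into agent mechanisms, and support efficient design iteration.
Link: https://arxiv.org/abs/2602.08276
Authors: Haoyu Jia, Kento Kawaharazuka, Kei Okada
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments:
Click to view abstract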
Abstract:Current research on large language model (LLM) agents is fragmented: discussions of conceptual frameworks and methodological principles are frequently intertwined with low-level implementation details, causing both readers and authors to lose track amid a proliferation of superficially distinct concepts. We argue that this fragmentation largely stems from the absence of an analyzable, self-consistent formal model that enables implementation-independent characterization and comparison of LLM agents. To address this gap, we propose the Structural Context Model, a formal model for analyzing and comparing LLM agents from the perspective of context structure. Building upon this foundation, we introduce two complementary components that together span the full lifecycle of LLM agent research and development: (1) a declarative implementation framework; and (2) a sustainable agent engineering workflow, Semantic Dynamics Analysis. The proposed workflow provides principled insights into agent mechanisms and supports rapid, systematic design iteration. We demonstrate the effectiveness of the complete framework on dynamic variants of the monkey-banana problem, where agents engineered using our approach achieve up to a 32 percentage points improvement in success rate on the most challenging setting.
[AI-79] When Do Multi-Agent Systems Outperform? Analysing the Learning Efficiency of Agentic Systems
Link: https://arxiv.org/abs/2602.08272
Authors: Junwei Su, Chuan Wu
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
[AI-80] Puda: Private User Dataset Agent for User-Sovereign and Privacy-Preserving Personalized AI
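[Quick Read]: This paper addresses the restricted user sovereignty caused by the centralization of personal data among dominant platform services, and the tension between privacy protection and the demand of Large Language Model (LLM)-based agents for diverse, dynamically provided personal data. The key to the solution is Puda (Private User Dataset Agent), a user-sovereign architecture whose client-side data aggregation and management lets users control data sharing at three privacy granularities: detailed browsing history, extracted keywords, and predefined category subsets. In a travel planning task, sharing only predefined category subsets achieves 97.2% of the personalization performance obtained with detailed browsing history (evaluated with an LLM-as-a-Judge framework), which eases the privacy-personalization trade-off and offers a practical path toward AI-native user sovereignty.
Link: https://arxiv.org/abs/2602.08268
Authors: Akinori Maeda, Yuto Sekiya, Sota Sugimura, Tomoya Asai, Yu Tsuda, Kohei Ikeda, Hiroshi Fujii, Kohei Watanabe
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: 9 pages, 5 figures
Click to view abstract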
Abstract:Personal data centralization among dominant platform providers including search engines, social networking services, and e-commerce has created siloed ecosystems that restrict user sovereignty, thereby impeding data use across services. Meanwhile, the rapid proliferation of Large Language Model (LLM)-based agents has intensified demand for highly personalized services that require the dynamic provision of diverse personal data. This presents a significant challenge: balancing the utilization of such data with privacy protection. To address this challenge, we propose Puda (Private User Dataset Agent), a user-sovereign architecture that aggregates data across services and enables client-side management. Puda allows users to control data sharing at three privacy levels: (i) Detailed Browsing History, (ii) Extracted Keywords, and (iii) Predefined Category Subsets. We implemented Puda as a browser-based system that serves as a common platform across diverse services and evaluated it through a personalized travel planning task. Our results show that providing Predefined Category Subsets achieves 97.2% of the personalization performance (evaluated via an LLM-as-a-Judge framework across three criteria) obtained when sharing Detailed Browsing History. These findings demonstrate that Puda enables effective multi-granularity management, offering practical choices to mitigate the privacy-personalization trade-off. Overall, Puda provides an AI-native foundation for user sovereignty, empowering users to safely leverage the full potential of personalized AI.
[AI-81] Inverting Data Transformations via Diffusion Sampling
Link: https://arxiv.org/abs/2602.08267
Authors: Jinwoo Kim, Sékou-Oumar Kaba, Jiyun Park, Seunghoon Hong, Siamak Ravanbakhsh
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 24 pages, 4 figures
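[AI-82] G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design
[Quick Read]: This paper addresses the limited search space of current Large Language Model (LLM)-based Automated Heuristic Design (AHD) methods for Combinatorial Optimization Problems (COPs), which makes it hard to escape deep local optima. Existing methods are mostly confined to constructive priority rules or parameterized local search guidance, which limits structural exploration. The key to the solution is the G-LNS framework, which uses LLMs to co-evolve tightly coupled pairs of destroy and repair operators. A cooperative evaluation mechanism explicitly models their interaction, so that operator pairs with complementary logic are discovered and perform effective structural disruption and reconstruction. Experiments show that on challenging benchmarks such as TSP and CVRP, G-LNS significantly outperforms existing LLM-based AHD methods as well as classical solvers, reaching near-optimal solutions with smaller computational budgets and generalizing robustly across different instance distributions.
Link: https://arxiv.org/abs/2602.08253
Authors: Baoyun Zhao, He Wang, Liang Zeng
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments:
Click to view abstract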
Abstract:While Large Language Models (LLMs) have recently shown promise in Automated Heuristic Design (AHD), existing approaches typically formulate AHD around constructive priority rules or parameterized local search guidance, thereby restricting the search space to fixed heuristic forms. Such designs offer limited capacity for structural exploration, making it difficult to escape deep local optima in complex Combinatorial Optimization Problems (COPs). In this work, we propose G-LNS, a generative evolutionary framework that extends LLM-based AHD to the automated design of Large Neighborhood Search (LNS) operators. Unlike prior methods that evolve heuristics in isolation, G-LNS leverages LLMs to co-evolve tightly coupled pairs of destroy and repair operators. A cooperative evaluation mechanism explicitly captures their interaction, enabling the discovery of complementary operator logic that jointly performs effective structural disruption and reconstruction. Extensive experiments on challenging COP benchmarks, such as Traveling Salesman Problems (TSP) and Capacitated Vehicle Routing Problems (CVRP), demonstrate that G-LNS significantly outperforms LLM-based AHD methods as well as strong classical solvers. The discovered heuristics not only achieve near-optimal solutions with reduced computational budgets but also exhibit robust generalization across diverse and unseen instance distributions.
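For readers unfamiliar with Large Neighborhood Search, the destroy/repair loop that G-LNS builds on can be sketched on a toy TSP instance. The two operators below (random removal and cheapest insertion) are hand-written placeholders chosen for illustration; in the paper the operator pairs are generated and co-evolved by an LLM, which this sketch does not attempt. All function names and parameters are hypothetical.

```python
import math
import random

def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def destroy_random(tour, k, rng):
    """Destroy operator: remove k randomly chosen cities from the tour."""
    removed = rng.sample(tour, k)
    removed_set = set(removed)
    return [c for c in tour if c not in removed_set], removed

def repair_cheapest_insertion(partial, removed, pts):
    """Repair operator: reinsert each city where it adds the least length."""
    tour = list(partial)
    for city in removed:
        best_pos, best_cost = 0, float("inf")
        for i in range(len(tour)):
            a, b = pts[tour[i]], pts[tour[(i + 1) % len(tour)]]
            cost = math.dist(a, pts[city]) + math.dist(pts[city], b) - math.dist(a, b)
            if cost < best_cost:
                best_pos, best_cost = i + 1, cost
        tour.insert(best_pos, city)
    return tour

def lns(pts, iterations=300, k=3, seed=0):
    """Plain LNS loop: destroy, repair, accept only strict improvements."""
    rng = random.Random(seed)
    best = list(range(len(pts)))
    best_len = tour_length(best, pts)
    for _ in range(iterations):
        partial, removed = destroy_random(best, k, rng)
        candidate = repair_cheapest_insertion(partial, removed, pts)
        cand_len = tour_length(candidate, pts)
        if cand_len < best_len:
            best, best_len = candidate, cand_len
    return best, best_len

point_rng = random.Random(1)
points = [(point_rng.random(), point_rng.random()) for _ in range(20)]
initial_len = tour_length(list(range(20)), points)
best_tour, best_len = lns(points)
print(round(initial_len, 3), round(best_len, 3))
```

Because the quality of a repair depends on what the destroy step removed, the two operators have to be judged as a pair, which is the motivation the abstract gives for co-evolving them rather than evolving each in isolation.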
[AI-83] STEP: Warm-Started Visuomotor Policies with Spatiotemporal Consistency Prediction
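[Quick Read]: This paper addresses the high inference latency that iterative denoising causes for diffusion policies in robotic manipulation, which limits control frequency in real-time closed-loop systems. Existing acceleration methods reduce sampling steps, bypass the diffusion process through direct action prediction, or reuse past actions, but they often fail to preserve action quality while achieving consistently low latency. The key to the solution is STEP, a spatiotemporal consistency prediction mechanism that constructs high-quality warm-start actions that are distributionally close to the target action and temporally consistent, without compromising the generative capability of the original diffusion policy. The paper further introduces velocity-aware perturbation injection, which adaptively modulates actuation excitation according to the temporal rate of change of the action, preventing execution failures caused by action stalls in real-world tasks. A theoretical analysis shows that the prediction mechanism induces a locally contractive mapping, ensuring that action errors converge during diffusion refinement.
Link: https://arxiv.org/abs/2602.08245
Authors: Jinhao Li, Yuxuan Cong, Yingqiao Wang, Hao Xia, Shan Huang, Yijia Zhang, Ningyi Xu, Guohao Dai
Affiliation: Unknown
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: 13 pages, 9 figures
Click to view abstract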
Abstract:Diffusion policies have recently emerged as a powerful paradigm for visuomotor control in robotic manipulation due to their ability to model the distribution of action sequences and capture multimodality. However, iterative denoising leads to substantial inference latency, limiting control frequency in real-time closed-loop systems. Existing acceleration methods either reduce sampling steps, bypass diffusion through direct prediction, or reuse past actions, but often struggle to jointly preserve action quality and achieve consistently low latency. In this work, we propose STEP, a lightweight spatiotemporal consistency prediction mechanism to construct high-quality warm-start actions that are both distributionally close to the target action and temporally consistent, without compromising the generative capability of the original diffusion policy. Then, we propose a velocity-aware perturbation injection mechanism that adaptively modulates actuation excitation based on temporal action variation to prevent execution stall especially for real-world tasks. We further provide a theoretical analysis showing that the proposed prediction induces a locally contractive mapping, ensuring convergence of action errors during diffusion refinement. We conduct extensive evaluations on nine simulated benchmarks and two real-world tasks. Notably, STEP with 2 steps can achieve an average 21.6% and 27.5% higher success rate than BRIDGER and DDIM on the RoboMimic benchmark and real-world tasks, respectively. These results demonstrate that STEP consistently advances the Pareto frontier of inference latency and success rate over existing methods.
[AI-84] Learning in Context Guided by Choice: A Reward-Free Paradigm for Reinforcement Learning with Transformers
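[Quick Read]: This paper addresses the heavy reliance of conventional in-context reinforcement learning (ICRL) on explicit reward signals, which makes it hard to apply when rewards are ambiguous, hard to specify, or costly to obtain. The key to the solution is a new learning paradigm, In-Context Preference-based Reinforcement Learning (ICPRL), whose core innovation is that both pretraining and deployment use only preference feedback, with no reward supervision. ICPRL comes in two variants: Immediate Preference-based RL (I-PRL), based on per-step preferences, and Trajectory Preference-based RL (T-PRL), based on trajectory-level comparisons. Preference-native frameworks that optimize Transformer policies directly from preference data improve data efficiency, and the approach generalizes strongly to unseen tasks, with performance comparable to ICRL methods trained with full reward supervision.
Link: https://arxiv.org/abs/2602.08244
Authors: Juncheng Dong, Bowen He, Moyang Guo, Ethan X. Fang, Zhuoran Yang, Vahid Tarokh
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
Click to view abstract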
Abstract:In-context reinforcement learning (ICRL) leverages the in-context learning capabilities of transformer models (TMs) to efficiently generalize to unseen sequential decision-making tasks without parameter updates. However, existing ICRL methods rely on explicit reward signals during pretraining, which limits their applicability when rewards are ambiguous, hard to specify, or costly to obtain. To overcome this limitation, we propose a new learning paradigm, In-Context Preference-based Reinforcement Learning (ICPRL), in which both pretraining and deployment rely solely on preference feedback, eliminating the need for reward supervision. We study two variants that differ in the granularity of feedback: Immediate Preference-based RL (I-PRL) with per-step preferences, and Trajectory Preference-based RL (T-PRL) with trajectory-level comparisons. We first show that supervised pretraining, a standard approach in ICRL, remains effective under preference-only context datasets, demonstrating the feasibility of in-context reinforcement learning using only preference signals. To further improve data efficiency, we introduce alternative preference-native frameworks for I-PRL and T-PRL that directly optimize TM policies from preference data without requiring reward signals nor optimal action labels. Experiments on dueling bandits, navigation, and continuous control tasks demonstrate that ICPRL enables strong in-context generalization to unseen tasks, achieving performance comparable to ICRL methods trained with full reward supervision.
[AI-85] PTS-SNN: A Prompt-Tuned Temporal Shift Spiking Neural Networks for Efficient Speech Emotion Recognition
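[Quick Read]: This paper addresses the high computational cost that conventional Speech Emotion Recognition (SER) models face when deployed on resource-constrained edge devices, and the loss of information coding capacity caused by distribution mismatch when continuous Self-Supervised Learning (SSL) representations are integrated with Spiking Neural Networks (SNNs). The key to the solution is Prompt-Tuned Spiking Neural Networks (PTS-SNN), a parameter-efficient neuromorphic adaptation framework with two core innovations: (1) a parameter-free Temporal Shift Spiking Encoder that captures local temporal dependencies through channel-wise temporal shifts and builds a stable feature basis; and (2) a Context-Aware Membrane Potential Calibration strategy, in which a Spiking Sparse Linear Attention module aggregates global semantic information into learnable soft prompts that dynamically regulate the bias voltages of Parametric Leaky Integrate-and-Fire (PLIF) neurons, keeping heterogeneous input distributions within the neurons' responsive range and mitigating functional silence or saturation. On five multilingual datasets the method performs comparably to Artificial Neural Networks (ANNs), for example 73.34% accuracy on IEMOCAP, while requiring only 1.19M trainable parameters and 0.35 mJ of inference energy per sample.
Link: https://arxiv.org/abs/2602.08240
Authors: Xun Su, Huamin Wang, Qi Zhang
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD)
Comments:
Click to view abstract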
Abstract:Speech Emotion Recognition (SER) is widely deployed in Human-Computer Interaction, yet the high computational cost of conventional models hinders their implementation on resource-constrained edge devices. Spiking Neural Networks (SNNs) offer an energy-efficient alternative due to their event-driven nature; however, their integration with continuous Self-Supervised Learning (SSL) representations is fundamentally challenged by distribution mismatch, where high-dynamic-range embeddings degrade the information coding capacity of threshold-based neurons. To resolve this, we propose Prompt-Tuned Spiking Neural Networks (PTS-SNN), a parameter-efficient neuromorphic adaptation framework that aligns frozen SSL backbones with spiking dynamics. Specifically, we introduce a Temporal Shift Spiking Encoder to capture local temporal dependencies via parameter-free channel shifts, establishing a stable feature basis. To bridge the domain gap, we devise a Context-Aware Membrane Potential Calibration strategy. This mechanism leverages a Spiking Sparse Linear Attention module to aggregate global semantic context into learnable soft prompts, which dynamically regulate the bias voltages of Parametric Leaky Integrate-and-Fire (PLIF) neurons. This regulation effectively centers the heterogeneous input distribution within the responsive firing range, mitigating functional silence or saturation. Extensive experiments on five multilingual datasets (e.g., IEMOCAP, CASIA, EMODB) demonstrate that PTS-SNN achieves 73.34% accuracy on IEMOCAP, comparable to competitive Artificial Neural Networks (ANNs), while requiring only 1.19M trainable parameters and 0.35 mJ inference energy per sample.
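The "functional silence or saturation" problem and the effect of a bias voltage can be seen in a toy leaky integrate-and-fire simulation. This is a simplified sketch with a fixed decay and a constant, hand-picked bias; in the paper the neurons are parametric (PLIF) and the bias is regulated by learned soft prompts, neither of which is modelled here. The function name and all numeric values are assumptions.

```python
def lif_spike_count(inputs, bias=0.0, decay=0.5, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron and count its spikes.
    The membrane potential leaks by `decay`, integrates input plus bias,
    and is hard-reset to zero after each spike."""
    potential = 0.0
    spikes = 0
    for current in inputs:
        potential = decay * potential + current + bias
        if potential >= threshold:
            spikes += 1
            potential = 0.0
    return spikes

weak_input = [0.1] * 20    # too small: the neuron never reaches threshold
strong_input = [3.0] * 20  # too large: the neuron fires at every step

silent = lif_spike_count(weak_input)                         # functional silence
saturated = lif_spike_count(strong_input)                    # saturation
calibrated_weak = lif_spike_count(weak_input, bias=0.5)      # bias lifts the input
calibrated_strong = lif_spike_count(strong_input, bias=-2.4) # bias lowers the input
print(silent, saturated, calibrated_weak, calibrated_strong)
```

In both uncalibrated cases the spike train carries no information about the input's temporal structure; shifting the effective input into the responsive range restores a firing pattern that can vary with the signal.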
[AI-86] Linearization Explains Fine-Tuning in Large Language Models
Link: https://arxiv.org/abs/2602.08239
Authors: Zahra Rahimi Afzal, Tara Esmaeilbeig, Mojtaba Soltanalian, Mesrob I. Ohannessian
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
[AI-87] Tutti: Expressive Multi-Singer Synthesis via Structure-Level Timbre Control and Vocal Texture Modeling
Link: https://arxiv.org/abs/2602.08233
Authors: Jiatao Chen, Xing Tang, Xiaoyue Duan, Yutang Feng, Jinchao Zhang, Jie Zhou
Affiliation: Unknown
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
Comments:
[AI-88] InfiCoEvalChain: A Blockchain-Based Decentralized Framework for Collaborative LLM Evaluation
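[Quick Read]: This paper addresses instability in current evaluation of large language models (LLMs), including the opacity of centralized evaluation, overfitting risk, and performance variance induced by hardware differences. An empirical analysis shows that the standard deviation across ten repeated runs of a single model on HumanEval (1.67) exceeds the performance gap among the top-10 models (0.91), so existing rankings lack statistical reliability. The key to the solution is a decentralized evaluation framework that achieves parameter and hardware diversity through large-scale benchmarking across heterogeneous compute nodes, and uses a blockchain-based protocol to incentivize global contributors to act as independent validators, with a robust reward mechanism to protect evaluation integrity and discourage malicious behavior. The framework turns evaluation from a "centralized black box" into a "decentralized consensus"; multi-party consensus and diverse inference environments reduce the standard deviation to 0.28, making model rankings more stable and representative.
Link: https://arxiv.org/abs/2602.08229
Authors: Yifan Yang, Jinjia Li, Kunxi Li, Puhao Zheng, Yuanyi Wang, Zheyan Qu, Yang Yu, Jianmin Wu, Ming Li, Hongxia Yang
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Comments:
Click to view abstract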
Abstract:The rapid advancement of large language models (LLMs) demands increasingly reliable evaluation, yet current centralized evaluation suffers from opacity, overfitting, and hardware-induced variance. Our empirical analysis reveals an alarming inconsistency in existing evaluations: the standard deviation across ten repeated runs of a single model on HumanEval (1.67) actually exceeds the performance gap among the top-10 models on the official leaderboard (0.91), rendering current rankings statistically precarious. To mitigate these instabilities, we propose a decentralized evaluation framework that enables hardware and parameter diversity through large-scale benchmarking across heterogeneous compute nodes. By leveraging the blockchain-based protocol, the framework incentivizes global contributors to act as independent validators, using a robust reward system to ensure evaluation integrity and discourage dishonest participation. This collective verification transforms evaluation from a “centralized black box” into a “decentralized endorsement” where multi-party consensus and diverse inference environments yield a more stable, representative metric. Experimental results demonstrate that the decentralized evaluation framework reduces the standard deviation across ten runs on the same model to 0.28. This significant improvement over conventional frameworks ensures higher statistical confidence in model rankings. We have completely implemented this platform and will soon release it to the community.
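The abstract's stability argument compares run-to-run standard deviation with the score gap among top models. A minimal sketch of that comparison follows, assuming the "gap among the top-k" means best score minus k-th best score; that reading, the function names, and every number below are made up for illustration and are not taken from the paper.

```python
import statistics

def run_to_run_std(scores):
    """Sample standard deviation of one model's scores over repeated runs."""
    return statistics.stdev(scores)

def top_gap(leaderboard_scores, k=10):
    """Gap between the best and the k-th best leaderboard score (assumed reading)."""
    top = sorted(leaderboard_scores, reverse=True)[:k]
    return top[0] - top[-1]

def ranking_is_precarious(run_scores, leaderboard_scores, k=10):
    """True when evaluation noise is larger than the spread being ranked."""
    return run_to_run_std(run_scores) > top_gap(leaderboard_scores, k)

# Made-up illustrative numbers, NOT results from the paper.
repeated_runs = [80.0, 82.0, 84.0, 81.0, 83.0]
leaderboard = [90.0, 89.8, 89.5, 89.4, 89.0]

print(round(run_to_run_std(repeated_runs), 3),
      top_gap(leaderboard),
      ranking_is_precarious(repeated_runs, leaderboard))
```

When the check returns True, differences between neighbouring leaderboard entries are smaller than the noise of a single evaluation, so their order cannot be trusted without further repeated runs.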
Machine Learning
[LG-0] Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models
Link: https://arxiv.org/abs/2602.09017
Authors: Zichen Jeff Cui, Omar Rayyan, Haritheja Etukuru, Bowen Tan, Zavier Andrianarivo, Zicheng Teng, Yihang Zhou, Krish Mehta, Nicholas Wojno, Kevin Yuanbo Wu, Manan H Anjaria, Ziyuan Wu, Manrong Mao, Guangxun Zhang, Binit Shah, Yejin Kim, Soumith Chintala, Lerrel Pinto, Nur Muhammad Mahi Shafiullah
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Comments:
[LG-1] ShapeCond: Fast Shapelet-Guided Dataset Condensation for Time Series Classification
Link: https://arxiv.org/abs/2602.09008
Authors: Sijia Peng, Yun Xiong, Xi Chen, Yi Xie, Guanzhi Li, Yanwei Yu, Yangyong Zhu, Zhiqiang Shen
Subjects: Machine Learning (cs.LG)
Comments: Code at: this https URL
[LG-2] DirMoE: Dirichlet-routed Mixture of Experts
Link: https://arxiv.org/abs/2602.09001
Authors: Amirhossein Vahidi, Hesam Asadollahzadeh, Navid Akhavan Attar, Marie Moullet, Kevin Ly, Xingyi Yang, Mohammad Lotfollahi
Subjects: Machine Learning (cs.LG)
Comments:
[LG-3] Distributionally Robust Optimization via Generative Ambiguity Modeling
Link: https://arxiv.org/abs/2602.08976
Authors: Jiaqi Wen, Jianyi Yang
Subjects: Machine Learning (cs.LG)
Comments:
[LG-4] DynamiQ: Accelerating Gradient Synchronization using Compressed Multi-hop All-reduce
Link: https://arxiv.org/abs/2602.08923
Authors: Wenchen Han, Shay Vargaftik, Michael Mitzenmacher, Ran Ben Basat
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Comments: 18 pages, 18 figures
[LG-5] Diffusion-Inspired Reconfiguration of Transformers for Uncertainty Calibration
Link: https://arxiv.org/abs/2602.08920
Authors: Manh Cuong Dao, Quang Hung Pham, Phi Le Nguyen, Thao Nguyen Truong, Bryan Kian Hsiang Low, Trong Nghia Hoang
Subjects: Machine Learning (cs.LG)
Comments:
[LG-6] AMS-HD: Hyperdimensional Computing for Real-Time and Energy-Efficient Acute Mountain Sickness Detection
Link: https://arxiv.org/abs/2602.08916
Authors: Abu Masum, Mehran Moghadam, M. Hassan Najafi, Bige Unluturk, Ulkuhan Guler, Sercan Aygun
Subjects: Symbolic Computation (cs.SC); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
Comments:
[LG-7] GEMSS: A Variational Bayesian Method for Discovering Multiple Sparse Solutions in Classification and Regression Problems
Link: https://arxiv.org/abs/2602.08913
Authors: Kateřina Henclová, Václav Šmídl
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:
[LG-8] Positive Distribution Shift as a Framework for Understanding Tractable Learning
Link: https://arxiv.org/abs/2602.08907
Authors: Marko Medvedev, Idan Attias, Elisabetta Cornacchia, Theodor Misiakiewicz, Gal Vardi, Nathan Srebro
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:
[LG-9] GSS: Gated Subspace Steering for Selective Memorization Mitigation in LLMs
Link: https://arxiv.org/abs/2602.08901
Authors: Xuanqi Zhang, Haoyang Shang, Xiaoxiao Li
Subjects: Machine Learning (cs.LG)
Comments: 34 pages, 12 figures
[LG-10] Discrete Bridges for Mutual Information Estimation
Link: https://arxiv.org/abs/2602.08894
Authors: Iryna Zabarianska, Sergei Kholkin, Grigoriy Ksenofontov, Ivan Butakov, Alexander Korotin
Subjects: Machine Learning (cs.LG)
Comments:
[LG-11] Stress-Testing Alignment Audits With Prompt-Level Strategic Deception
Link: https://arxiv.org/abs/2602.08877
Authors: Oliver Daniels, Perusha Moodley, Ben Marlin, David Lindner
Subjects: Machine Learning (cs.LG)
Comments:
[LG-12] Near-optimal Swap Regret Minimization for Convex Losses
Link: https://arxiv.org/abs/2602.08862
Authors: Lunjia Hu, Jon Schneider, Yifan Wu
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Comments:
Click to view abstract
Abstract:We give a randomized online algorithm that guarantees near-optimal $\widetilde{O}(\sqrt{T})$ expected swap regret against any sequence of $T$ adaptively chosen Lipschitz convex losses on the unit interval. This improves the previous best bound of $\widetilde{O}(T^{2/3})$ and answers an open question of Fishelson et al. [2025b]. In addition, our algorithm is efficient: it runs in $\mathsf{poly}(T)$ time. A key technical idea we develop to obtain this result is to discretize the unit interval into bins at multiple scales of granularity and simultaneously use all scales to make randomized predictions, which we call multi-scale binning and may be of independent interest. A direct corollary of our result is an efficient online algorithm for minimizing the calibration error for general elicitable properties. This result does not require the Lipschitzness assumption of the identification function needed in prior work, making it applicable to median calibration, for which we achieve the first $\widetilde{O}(\sqrt{T})$ calibration error guarantee.
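To make the quantity being bounded concrete, the sketch below computes the swap regret of a fixed prediction sequence: the incurred loss minus the loss of the best "swap" that replaces every occurrence of a prediction value p by some other value. It only evaluates the definition on a finite candidate grid and says nothing about the paper's algorithm or its multi-scale binning; the function names, the absolute loss, and the sample data are assumptions for illustration.

```python
from collections import defaultdict

def swap_regret(predictions, losses, candidates):
    """Swap regret with replacement values restricted to `candidates`.
    Rounds are grouped by the prediction played; each group may be
    remapped to its own best candidate value."""
    groups = defaultdict(list)
    for p, loss in zip(predictions, losses):
        groups[p].append(loss)
    regret = 0.0
    for p, group in groups.items():
        incurred = sum(loss(p) for loss in group)
        best = min(sum(loss(q) for loss in group) for q in candidates)
        regret += incurred - best
    return regret

def abs_loss(y):
    """Absolute loss |p - y|: convex and 1-Lipschitz in the prediction p."""
    return lambda p: abs(p - y)

# Hypothetical outcomes and predictions on the unit interval.
outcomes = [0.0, 1.0, 1.0, 0.0, 1.0, 1.0]
predictions = [0.5, 0.5, 0.5, 0.0, 0.0, 0.0]
losses = [abs_loss(y) for y in outcomes]
grid = [i / 10 for i in range(11)]

regret = swap_regret(predictions, losses, grid)
print(regret)
```

Here the rounds where 0.5 was predicted would have been better served by predicting 1.0 (saving 0.5), and likewise the rounds where 0.0 was predicted (saving 1.0), giving a swap regret of 1.5.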
[LG-13] Magnitude Distance: A Geometric Measure of Dataset Similarity
Link: https://arxiv.org/abs/2602.08859
Authors: Sahel Torkamani, Henry Gouk, Rik Sarkar
Subjects: Machine Learning (cs.LG)
Comments:
[LG-14] Rethinking Graph Generalization through the Lens of Sharpness-Aware Minimization
Link: https://arxiv.org/abs/2602.08855
Authors: Yang Qiu, Yixiong Zou, Jun Wang
Subjects: Machine Learning (cs.LG)
Comments:
[LG-15] FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models
Link: https://arxiv.org/abs/2602.08818
Authors: Annemette Brok Pirchert, Jacob Nielsen, Mogens Henrik From, Lukas Galke Poech, Peter Schneider-Kamp
Subjects: Machine Learning (cs.LG)
Comments:
[LG-16] Kirin: Improving ANN efficiency with SNN Hybridization
Link: https://arxiv.org/abs/2602.08817
Authors: Chenyu Wang, Zhanglu Yan, Zhi Zhou, Xu Chen, Weng-Fai Wong
Subjects: Machine Learning (cs.LG)
Comments:
[LG-17] Robust Policy Optimization to Prevent Catastrophic Forgetting
Link: https://arxiv.org/abs/2602.08813
Authors: Mahdi Sabbaghi, George Pappas, Adel Javanmard, Hamed Hassani
Subjects: Machine Learning (cs.LG)
Comments:
[LG-18] Efficient Deep Learning for Biometrics: Overview Challenges and Trends in Ear of Frugal AI
Link: https://arxiv.org/abs/2602.08809
Authors: Karim Haroun, Aya Zitouni, Aicha Zenakhri, Meriem Amel Guessoum, Larbi Boubchir
Subjects: Machine Learning (cs.LG)
Comments: 8 pages, 2 figures, accepted at the 2025 IEEE SDS conference
[LG-19] How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs
Link: https://arxiv.org/abs/2602.08808
Authors: Yapei Chang, Kyle Lo, Mohit Iyyer, Luca Soldaini
Subjects: Machine Learning (cs.LG)
Comments: 53 pages, 22 figures
[LG-20] Empirically Understanding the Value of Prediction in Allocation
Link: https://arxiv.org/abs/2602.08786
Authors: Unai Fischer-Abaigar, Emily Aiken, Christoph Kern, Juan Carlos Perdomo
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG)
Comments:
[LG-21] A Graphop Analysis of Graph Neural Networks on Sparse Graphs: Generalization and Universal Approximation
Link: https://arxiv.org/abs/2602.08785
Authors: Ofek Amran, Tom Gilat, Ron Levie
Subjects: Machine Learning (cs.LG)
Comments:
[LG-22] HoGS: Homophily-Oriented Graph Synthesis for Local Differentially Private GNN Training
Link: https://arxiv.org/abs/2602.08762
Authors: Wen Xu, Zhetao Li, Yong Xiao, Pengpeng Qiao, Mianxiong Dong, Kaoru Ota
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Comments:
[LG-23] Redundancy-Free View Alignment for Multimodal Human Activity Recognition with Arbitrarily Missing Views
Link: https://arxiv.org/abs/2602.08755
Authors: Duc-Anh Nguyen, Nhien-An Le-Khac
Subjects: Machine Learning (cs.LG)
Comments:
[LG-24] Central Dogma Transformer II: An AI Microscope for Understanding Cellular Regulatory Mechanisms
链接: https://arxiv.org/abs/2602.08751
作者: Nobuyuki Ota
类目: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
*备注: 20 pages, 6 figures
Abstract:Current biological AI models lack interpretability – their internal representations do not correspond to biological relationships that researchers can examine. Here we present CDT-II, an “AI microscope” whose attention maps are directly interpretable as regulatory structure. By mirroring the central dogma in its architecture, each attention mechanism corresponds to a specific biological relationship: DNA self-attention for genomic relationships, RNA self-attention for gene co-regulation, and DNA-to-RNA cross-attention for transcriptional control. Using only genomic embeddings and raw per-cell expression, CDT-II enables experimental biologists to observe regulatory networks in their own data. Applied to K562 CRISPRi data, CDT-II predicts perturbation effects (per-gene mean $r = 0.84$) and recovers the GFI1B regulatory network without supervision (6.6-fold enrichment, $P = 3.5 \times 10^{-17}$). Two distinct attention mechanisms converge on an RNA processing module ($P = 1 \times 10^{-16}$). CDT-II establishes mechanism-oriented AI as an alternative to task-oriented approaches, revealing regulatory structure rather than merely optimizing predictions.
Cite as: arXiv:2602.08751 [cs.LG] (or arXiv:2602.08751v1 [cs.LG] for this version), https://doi.org/10.48550/arXiv.2602.08751
Submission history: From: Nobuyuki Ota, [v1] Mon, 9 Feb 2026 14:54:31 UTC (5,503 KB)
[LG-25] Foundation Inference Models for Ordinary Differential Equations
Link: https://arxiv.org/abs/2602.08733
Authors: Maximilian Mauel,Johannes R. Hübers,David Berghaus,Patrick Seifner,Ramses J. Sanchez
Categories: Machine Learning (cs.LG)
Comments:
[LG-26] Data Reconstruction: Identifiability and Optimization with Sample Splitting
Link: https://arxiv.org/abs/2602.08723
Authors: Yujie Shen,Zihan Wang,Jian Qian,Qi Lei
Categories: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Comments:
[LG-27] Trapped by simplicity: When Transformers fail to learn from noisy features ICLR2026
Link: https://arxiv.org/abs/2602.08695
Authors: Evan Peters,Ando Deng,Matheus H. Zambianco,Devin Blankespoor,Achim Kempf
Categories: Machine Learning (cs.LG)
Comments: 13+12 pages, 7 figures. Accepted at ICLR 2026
[LG-28] Reasoning aligns language models to human cognition
Link: https://arxiv.org/abs/2602.08693
Authors: Gonçalo Guiomar,Elia Torre,Pehuen Moure,Victoria Shavina,Mario Giulianelli,Shih-Chii Liu,Valerio Mante
Categories: Machine Learning (cs.LG)
Comments: 38 pages, 4 main figures, multiple appendix figures
[LG-29] SoK: The Pitfalls of Deep Reinforcement Learning for Cybersecurity
Link: https://arxiv.org/abs/2602.08690
Authors: Shae McFadden,Myles Foley,Elizabeth Bates,Ilias Tsingenopoulos,Sanyam Vyas,Vasilios Mavroudis,Chris Hicks,Fabio Pierazzi
Categories: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Comments:
[LG-30] Learning To Sample From Diffusion Models Via Inverse Reinforcement Learning
Link: https://arxiv.org/abs/2602.08689
Authors: Constant Bourdrez,Alexandre Vérine,Olivier Cappé
Categories: Machine Learning (cs.LG)
Comments: Preprint
[LG-31] The Theory and Practice of MAP Inference over Non-Convex Constraints
Link: https://arxiv.org/abs/2602.08681
Authors: Leander Kurscheidt,Gabriele Masina,Roberto Sebastiani,Antonio Vergari
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:
[LG-32] Dashed Line Defense: Plug-And-Play Defense Against Adaptive Score-Based Query Attacks
Link: https://arxiv.org/abs/2602.08679
Authors: Yanzhang Fu,Zizheng Guo,Jizhou Luo
Categories: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Comments:
[LG-33] Two-Stage Data Synthesization: A Statistics-Driven Restricted Trade-off between Privacy and Prediction
Link: https://arxiv.org/abs/2602.08657
Authors: Xiaotong Liu,Shao-Bo Lin,Jun Fan,Ding-Xuan Zhou
Categories: Machine Learning (cs.LG); Methodology (stat.ME)
Comments:
[LG-34] From Robotics to Sepsis Treatment: Offline RL via Geometric Pessimism
Link: https://arxiv.org/abs/2602.08655
Authors: Sarthak Wanjari
Categories: Machine Learning (cs.LG)
Comments: 10 pages, 8 figures
[LG-35] Projected Gradient Ascent for Efficient Reward-Guided Updates with One-Step Generative Models
Link: https://arxiv.org/abs/2602.08646
Authors: Jisung Hwang,Minhyuk Sung
Categories: Machine Learning (cs.LG)
Comments:
[LG-36] ERIS: Enhancing Privacy and Communication Efficiency in Serverless Federated Learning
Link: https://arxiv.org/abs/2602.08617
Authors: Dario Fenoglio,Pasquale Polverino,Jacopo Quizi,Martin Gjoreski,Marc Langheinrich
Categories: Machine Learning (cs.LG)
Comments:
[LG-37] FMLinker: Universal Link Predictor by Graph In-Context Learning with Tabular Foundation Models
Link: https://arxiv.org/abs/2602.08592
Authors: Tianyin Liao,Chunyu Hu,Yicheng Sui,Xingxuan Zhang,Peng Cui,Jianxin Li,Ziwei Zhang
Categories: Machine Learning (cs.LG)
Comments:
[LG-38] SDFed: Bridging Local Global Discrepancy via Subspace Refinement and Divergence Control in Federated Prompt Learning
Link: https://arxiv.org/abs/2602.08590
Authors: Yicheng Di,Wei Yuan,Tieke He,Zhanjie Zhang,Ao Ma,Yuan Liu,Hongzhi Yin
Categories: Machine Learning (cs.LG); Databases (cs.DB)
Comments: 13 pages, 6 figures
[LG-39] FairRARI: A Plug and Play Framework for Fairness-Aware PageRank
Link: https://arxiv.org/abs/2602.08589
Authors: Emmanouil Kariotakis,Aritra Konar
Categories: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Comments:
[LG-40] Conditional Sequence Modeling for Safe Reinforcement Learning
Link: https://arxiv.org/abs/2602.08584
Authors: Wensong Bai,Chao Zhang,Qihang Xu,Chufan Chen,Chenhao Zhou,Hui Qian
Categories: Machine Learning (cs.LG)
Comments:
[LG-41] Modeling Score Approximation Errors in Diffusion Models via Forward SPDEs
Link: https://arxiv.org/abs/2602.08579
Authors: Junsu Seo
Categories: Machine Learning (cs.LG)
Comments:
[LG-42] An arithmetic method algorithm optimizing k-nearest neighbors compared to regression algorithms and evaluated on real world data sources
Link: https://arxiv.org/abs/2602.08577
Authors: Theodoros Anagnostopoulos,Evanthia Zervoudi,Christos Anagnostopoulos,Apostolos Christopoulos,Bogdan Wierzbinski
Categories: Machine Learning (cs.LG); Combinatorics (math.CO); Computation (stat.CO)
Comments: Nature Scientific Reports
Abstract:Linear regression analysis focuses on predicting a numeric regressand value based on certain regressor values. In this context, k-Nearest Neighbors (k-NN) is a common non-parametric regression algorithm, which achieves efficient performance when compared with other algorithms in literature. In this research effort an optimization of the k-NN algorithm is proposed by exploiting the potentiality of an introduced arithmetic method, which can provide solutions for linear equations involving an arbitrary number of real variables. Specifically, an Arithmetic Method Algorithm (AMA) is adopted to assess the efficiency of the introduced arithmetic method, while an Arithmetic Method Regression (AMR) algorithm is proposed as an optimization of k-NN adopting the potentiality of AMA. Such algorithm is compared with other regression algorithms, according to an introduced optimal inference decision rule, and evaluated on certain real world data sources, which are publicly available. Results are promising since the proposed AMR algorithm has comparable performance with the other algorithms, while in most cases it achieves better performance than the k-NN. The output results indicate that introduced AMR is an optimization of k-NN.
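For readers unfamiliar with the baseline this abstract builds on, here is a minimal sketch of plain k-NN regression: the prediction is the mean target of the k closest training points. It does not implement the paper's AMA or AMR algorithms, which the abstract does not specify; the function name `knn_regress` and the toy data are illustrative assumptions.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Predict a numeric target as the mean target of the k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    # Sort by distance only; ties keep their original order (stable sort)
    dists.sort(key=lambda pair: pair[0])
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical 1-D toy data lying on the line y = x
train_x = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [0.0, 1.0, 2.0, 3.0, 10.0]
prediction = knn_regress(train_x, train_y, (1.5,), k=2)
print(prediction)  # 1.5, the mean of the targets at x = 1 and x = 2
```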
[LG-43] M-Loss: Quantifying Model Merging Compatibility with Limited Unlabeled Data
Link: https://arxiv.org/abs/2602.08564
Authors: Tiantong Wang,Yiyang Duan,Haoyu Chen,Tiantong Wu,Wei Yang Bryan Lim
Categories: Machine Learning (cs.LG)
Comments: Code available at this https URL
[LG-44] Rho-Perfect: Correlation Ceiling For Subjective Evaluation Datasets
Link: https://arxiv.org/abs/2602.08552
Authors: Fredrik Cumlin
Categories: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
Comments:
[LG-45] Incremental (k, z)-Clustering on Graphs
Link: https://arxiv.org/abs/2602.08542
Authors: Emilio Cruciani,Sebastian Forster,Antonis Skarlatos
Categories: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Comments: Abstract shortened to meet arXiv limits
Abstract:Given a weighted undirected graph, a number of clusters $k$, and an exponent $z$, the goal in the $(k, z)$-clustering problem on graphs is to select $k$ vertices as centers that minimize the sum of the distances raised to the power $z$ of each vertex to its closest center. In the dynamic setting, the graph is subject to adversarial edge updates, and the goal is to maintain explicitly an exact $(k, z)$-clustering solution in the induced shortest-path metric. While efficient dynamic $k$-center approximation algorithms on graphs exist [Cruciani et al. SODA 2024], to the best of our knowledge, no prior work provides similar results for the dynamic $(k, z)$-clustering problem. As the main result of this paper, we develop a randomized incremental $(k, z)$-clustering algorithm that maintains with high probability a constant-factor approximation in a graph undergoing edge insertions with a total update time of $\tilde{O}(k m^{1+o(1)} + k^{1+\frac{1}{\lambda}} m)$, where $\lambda \geq 1$ is an arbitrary fixed constant. Our incremental algorithm consists of two stages. In the first stage, we maintain a constant-factor bicriteria approximate solution of size $\tilde{O}(k)$ with a total update time of $m^{1+o(1)}$ over all adversarial edge insertions. This first stage is an intricate adaptation of the bicriteria approximation algorithm by Mettu and Plaxton [Machine Learning 2004] to incremental graphs. One of our key technical results is that the radii in their algorithm can be assumed to be non-decreasing while the approximation ratio remains constant, a property that may be of independent interest. In the second stage, we maintain a constant-factor approximate $(k, z)$-clustering solution on a dynamic weighted instance induced by the bicriteria approximate solution. For this subproblem, we employ a dynamic spanner algorithm together with a static $(k, z)$-clustering algorithm.
Cite as: arXiv:2602.08542 [cs.DS] (or arXiv:2602.08542v1 [cs.DS] for this version), https://doi.org/10.48550/arXiv.2602.08542
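The objective defined in this abstract can be made concrete with a short sketch: for a fixed set of centers, the $(k, z)$-clustering cost is the sum over vertices of the shortest-path distance to the closest center, raised to the power $z$. The code below only evaluates that cost with a multi-source Dijkstra search; it is not the paper's incremental algorithm. The adjacency-list format and the function name `kz_clustering_cost` are assumptions made for illustration.

```python
import heapq


def kz_clustering_cost(adj, centers, z):
    """Sum over vertices of (shortest-path distance to the closest center) ** z."""
    # Multi-source Dijkstra: every center starts at distance 0
    dist = {v: float("inf") for v in adj}
    heap = []
    for c in centers:
        dist[c] = 0.0
        heap.append((0.0, c))
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return sum(d ** z for d in dist.values())


# Hypothetical path graph a - b - c - d with unit edge weights
adj = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 1.0)],
    "d": [("c", 1.0)],
}
cost_z1 = kz_clustering_cost(adj, ["b"], z=1)  # distances 1, 0, 1, 2 -> 4.0
cost_z2 = kz_clustering_cost(adj, ["b"], z=2)  # squares 1, 0, 1, 4 -> 6.0
```

With $z = 1$ this is the k-median objective and with $z = 2$ the k-means objective, both measured in the graph's shortest-path metric.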
[LG-46] Causal Schrödinger Bridges: Constrained Optimal Transport on Structural Manifolds
Link: https://arxiv.org/abs/2602.08535
Authors: Rui Wu,Li YongJun
Categories: Machine Learning (cs.LG)
Comments: 12 pages, 7 figures
[LG-47] Bridging Academia and Industry: A Comprehensive Benchmark for Attributed Graph Clustering
Link: https://arxiv.org/abs/2602.08519
Authors: Yunhui Liu,Pengyu Qiu,Yu Xing,Yongchao Liu,Peng Du,Chuntao Hong,Jiajun Zheng,Tao Zheng,Tieke He
Categories: Machine Learning (cs.LG)
Comments:
[LG-48] Do physics-informed neural networks (PINNs) need to be deep? Shallow PINNs using the Levenberg-Marquardt algorithm
Link: https://arxiv.org/abs/2602.08515
Authors: Muhammad Luthfi Shahab,Imam Mukhlash,Hadi Susanto
Categories: Numerical Analysis (math.NA); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
Comments:
Abstract:This work investigates the use of shallow physics-informed neural networks (PINNs) for solving forward and inverse problems of nonlinear partial differential equations (PDEs). By reformulating PINNs as nonlinear systems, the Levenberg-Marquardt (LM) algorithm is employed to efficiently optimize the network parameters. Analytical expressions for the neural network derivatives with respect to the input variables are derived, enabling accurate and efficient computation of the Jacobian matrix required by LM. The proposed approach is tested on several benchmark problems, including the Burgers, Schrödinger, Allen-Cahn, and three-dimensional Bratu equations. Numerical results demonstrate that LM significantly outperforms BFGS in terms of convergence speed, accuracy, and final loss values, even when using shallow network architectures with only two hidden layers. These findings indicate that, for a wide class of PDEs, shallow PINNs combined with efficient second-order optimization methods can provide accurate and computationally efficient solutions for both forward and inverse problems.
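The Levenberg-Marquardt update mentioned in this abstract can be illustrated on a toy problem. The sketch below fits the two parameters of $y = a e^{bx}$ using an analytic Jacobian and the damped normal equations $(J^\top J + \mu I)\,\delta = -J^\top r$. It is not a PINN and not the authors' implementation; the model, the damping schedule (halve $\mu$ on an accepted step, double it on a rejected one), and the name `lm_fit` are assumptions chosen to keep the example small.

```python
import math


def lm_fit(xs, ys, a, b, mu=1.0, iters=100):
    """Fit y = a * exp(b * x) by Levenberg-Marquardt (2 parameters, pure Python)."""

    def residuals(a, b):
        return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]

    def loss(r):
        return sum(v * v for v in r)

    r = residuals(a, b)
    for _ in range(iters):
        # Analytic Jacobian rows: (dr/da, dr/db)
        J = [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]
        # Damped normal equations: (J^T J + mu I) delta = -J^T r
        h11 = sum(ja * ja for ja, _ in J) + mu
        h12 = sum(ja * jb for ja, jb in J)
        h22 = sum(jb * jb for _, jb in J) + mu
        g1 = sum(ja * ri for (ja, _), ri in zip(J, r))
        g2 = sum(jb * ri for (_, jb), ri in zip(J, r))
        # Solve the 2x2 system by Cramer's rule
        det = h11 * h22 - h12 * h12
        da = (-g1 * h22 + g2 * h12) / det
        db = (-g2 * h11 + g1 * h12) / det
        r_new = residuals(a + da, b + db)
        if loss(r_new) < loss(r):
            a, b, r = a + da, b + db, r_new
            mu *= 0.5  # accepted: behave more like Gauss-Newton
        else:
            mu *= 2.0  # rejected: behave more like gradient descent
    return a, b


# Hypothetical noise-free data generated with a = 2, b = 1.5
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * math.exp(1.5 * x) for x in xs]
a_hat, b_hat = lm_fit(xs, ys, a=1.0, b=0.0)
```

In a PINN setting the residual vector would collect the PDE, boundary, and data residuals at the collocation points, and the Jacobian would be taken with respect to the network weights.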
[LG-49] Is Meta-Path Attention an Explanation? Evidence of Alignment and Decoupling in Heterogeneous GNNs
Link: https://arxiv.org/abs/2602.08500
Authors: Maiqi Jiang,Noman Ali,Yiran Ding,Yanfu Zhang
Categories: Machine Learning (cs.LG)
Comments:
[LG-50] Time-Delayed Transformers for Data-Driven Modeling of Low-Dimensional Dynamics
Link: https://arxiv.org/abs/2602.08478
Authors: Albert Alcalde,Markus Widhalm,Emre Yılmaz
Categories: Machine Learning (cs.LG); Dynamical Systems (math.DS); Numerical Analysis (math.NA)
Comments:
[LG-51] Learning Credal Ensembles via Distributionally Robust Optimization
Link: https://arxiv.org/abs/2602.08470
Authors: Kaizheng Wang,Ghifari Adam Faza,Fabio Cuzzolin,Siu Lun Chau,David Moens,Hans Hallez
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments: 32 pages
[LG-52] Low Rank Transformer for Multivariate Time Series Anomaly Detection and Localization
Link: https://arxiv.org/abs/2602.08467
Authors: Charalampos Shimillas,Kleanthis Malialis,Konstantinos Fokianos,Marios M. Polycarpou
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:
[LG-53] Estimating Aleatoric Uncertainty in the Causal Treatment Effect
Link: https://arxiv.org/abs/2602.08461
Authors: Liyuan Xu,Bijan Mazaheri
Categories: Machine Learning (cs.LG)
Comments:
[LG-54] RIFLE: Robust Distillation-based FL for Deep Model Deployment on Resource-Constrained IoT Networks
Link: https://arxiv.org/abs/2602.08446
Authors: Pouria Arefijamal,Mahdi Ahmadlou,Bardia Safaei,Jörg Henkel
Categories: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Comments: This paper has been accepted for publication in IEEE ICC 2026 and will be indexed in the IEEE Xplore Digital Library
[LG-55] USBD: Universal Structural Basis Distillation for Source-Free Graph Domain Adaptation
Link: https://arxiv.org/abs/2602.08431
Authors: Yingxu Wang,Kunyu Zhang,Mengzhu Wang,Siyang Gao,Nan Yin
Categories: Machine Learning (cs.LG)
Comments:
Abstract:SF-GDA is pivotal for privacy-preserving knowledge transfer across graph datasets. Although recent works incorporate structural information, they implicitly condition adaptation on the smoothness priors of source-trained GNNs, thereby limiting their generalization to structurally distinct targets. This dependency becomes a critical bottleneck under significant topological shifts, where the source model misinterprets distinct topological patterns unseen in the source domain as noise, rendering pseudo-label-based adaptation unreliable. To overcome this limitation, we propose the Universal Structural Basis Distillation, a framework that shifts the paradigm from adapting a biased model to learning a universal structural basis for SF-GDA. Instead of adapting a biased source model to a specific target, our core idea is to construct a structure-agnostic basis that proactively covers the full spectrum of potential topological patterns. Specifically, USBD employs a bi-level optimization framework to distill the source dataset into a compact structural basis. By enforcing the prototypes to span the full Dirichlet energy spectrum, the learned basis explicitly captures diverse topological motifs, ranging from low-frequency clusters to high-frequency chains, beyond those present in the source. This ensures that the learned basis creates a comprehensive structural covering capable of handling targets with disparate structures. For inference, we introduce a spectral-aware ensemble mechanism that dynamically activates the optimal prototype combination based on the spectral fingerprint of the target graph. Extensive experiments on benchmarks demonstrate that USBD significantly outperforms state-of-the-art methods, particularly in scenarios with severe structural shifts, while achieving superior computational efficiency by decoupling the adaptation cost from the target data scale.
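The "Dirichlet energy spectrum" in this abstract refers to how smoothly a signal varies over a graph. A minimal sketch, assuming an unweighted graph and the combinatorial Laplacian $L$ (the abstract does not say which normalization the paper uses): the energy $x^\top L x$ equals the sum of squared differences across edges, so smooth signals score low and rapidly alternating signals score high. The function name and toy graph are illustrative.

```python
def dirichlet_energy(edges, signal):
    """Sum of squared differences across edges: x^T L x for the combinatorial Laplacian."""
    return sum((signal[u] - signal[v]) ** 2 for u, v in edges)


# Hypothetical 4-node path graph 0 - 1 - 2 - 3
edges = [(0, 1), (1, 2), (2, 3)]
smooth = [1.0, 1.0, 1.0, 1.0]        # constant signal: lowest possible energy
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips on every edge: high energy

low = dirichlet_energy(edges, smooth)        # 0.0
high = dirichlet_energy(edges, alternating)  # 3 edges * (2.0 ** 2) = 12.0
```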
[LG-56] The Connection between Kriging and Large Neural Networks
Link: https://arxiv.org/abs/2602.08427
Authors: Marius Marinescu
Categories: Machine Learning (cs.LG); Statistics Theory (math.ST)
Comments:
[LG-57] Radial Müntz-Szász Networks: Neural Architectures with Learnable Power Bases for Multidimensional Singularities
Link: https://arxiv.org/abs/2602.08419
Authors: Gnankan Landry Regis N’guessan,Bum Jun Kim
Categories: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Comments: 47 pages, 13 figures
[LG-58] Drop the mask! GAMM-A Taxonomy for Graph Attributes Missing Mechanisms
Link: https://arxiv.org/abs/2602.08407
Authors: Richard Serrano(LabHC),Baptiste Jeudy(LabHC),Charlotte Laclau(IDS, S2A),Christine Largeron(LabHC)
Categories: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Comments:
[LG-59] Modalities, a PyTorch-native Framework For Large-scale LLM Training and Research
Link: https://arxiv.org/abs/2602.08387
Authors: Max Lübbering,Timm Ruland,Richard Rutmann,Felix Stollenwerk,David Fitzek,Michael Fromm,Alexander Weber,Rafet Sifa,Nicolas Flores-Herr,Joachim Köhler,Mehdi Ali
Categories: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Comments:
[LG-60] OJBKQ: Objective-Joint Babai-Klein Quantization
Link: https://arxiv.org/abs/2602.08376
Authors: Xinyu Wang,Ziyu Zhao,Peng Lu,Yu Gu,Xiao-Wen Chang
Categories: Machine Learning (cs.LG)
Comments:
[LG-61] Dynamic Regret via Discounted-to-Dynamic Reduction with Applications to Curved Losses and Adam Optimizer
Link: https://arxiv.org/abs/2602.08372
Authors: Yan-Feng Xie,Yu-Jie Zhang,Peng Zhao,Zhi-Hua Zhou
Categories: Machine Learning (cs.LG); Optimization and Control (math.OC)
Comments:
[LG-62] All ERMs Can Fail in Stochastic Convex Optimization Lower Bounds in Linear Dimension
Link: https://arxiv.org/abs/2602.08350
Authors: Tal Burla,Roi Livni
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:
[LG-63] PACC: Protocol-Aware Cross-Layer Compression for Compact Network Traffic Representation
Link: https://arxiv.org/abs/2602.08331
Authors: Zhaochen Guo,Tianyufei Zhou,Honghao Wang,Ronghua Li,Shinan Liu
Categories: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Comments:
Abstract:Network traffic classification is a core primitive for network security and management, yet it is increasingly challenged by pervasive encryption and evolving protocols. A central bottleneck is representation: hand-crafted flow statistics are efficient but often too lossy, raw-bit encodings can be accurate but are costly, and recent pre-trained embeddings provide transfer but frequently flatten the protocol stack and entangle signals across layers. We observe that real traffic contains substantial redundancy both across network layers and within each layer; existing paradigms do not explicitly identify and remove this redundancy, leading to wasted capacity, shortcut learning, and degraded generalization. To address this, we propose PACC, a redundancy-aware, layer-aware representation framework. PACC treats the protocol stack as multi-view inputs and learns compact layer-wise projections that remain faithful to each layer while explicitly factorizing representations into shared (cross-layer) and private (layer-specific) components. We operationalize these goals with a joint objective that preserves layer-specific information via reconstruction, captures shared structure via contrastive mutual-information learning, and maximizes task-relevant information via supervised losses, yielding compact latents suitable for efficient inference. Across datasets covering encrypted application classification, IoT device identification, and intrusion detection, PACC consistently outperforms feature-engineered and raw-bit baselines. On encrypted subsets, it achieves up to a 12.9% accuracy improvement over nPrint. PACC matches or surpasses strong foundation-model baselines. At the same time, it improves end-to-end efficiency by up to 3.16x.
[LG-64] Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression
Link: https://arxiv.org/abs/2602.08324
Authors: Yuntian Tang,Bohan Jia,Wenxuan Huang,Lianyue Zhang,Jiao Xie,Wenxi Li,Wei Li,Jie Hu,Xinghao Chen,Rongrong Ji,Shaohui Lin
Categories: Machine Learning (cs.LG)
Comments: 15 pages, 7 figures
[LG-65] Fast Flow Matching based Conditional Independence Tests for Causal Discovery
Link: https://arxiv.org/abs/2602.08315
Authors: Shunyu Zhao,Yanfeng Yang,Shuai Li,Kenji Fukumizu
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:
[LG-66] Interaction-Grounded Learning for Contextual Markov Decision Processes with Personalized Feedback
Link: https://arxiv.org/abs/2602.08307
Authors: Mengxiao Zhang,Yuheng Zhang,Haipeng Luo,Paul Mineiro
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:
[LG-67] TextResNet: Decoupling and Routing Optimization Signals in Compound AI Systems via Deep Residual Tuning
Link: https://arxiv.org/abs/2602.08306
Authors: Suizhi Huang,Mei Li,Han Yu,Xiaoxiao Li
Categories: Machine Learning (cs.LG)
Comments:
[LG-68] Constraint-Aware Generative Auto-bidding via Pareto-Prioritized Regret Optimization
Link: https://arxiv.org/abs/2602.08261
Authors: Binglin Wu,Yingyi Zhang,Xianneng Li,Ruyue Deng,Chuan Yue,Weiru Zhang,Xiaoyi Zeng
Categories: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
Comments:
Abstract:Auto-bidding systems aim to maximize marketing value while satisfying strict efficiency constraints such as Target Cost-Per-Action (CPA). Although Decision Transformers provide powerful sequence modeling capabilities, applying them to this constrained setting encounters two challenges: 1) standard Return-to-Go conditioning causes state aliasing by neglecting the cost dimension, preventing precise resource pacing; and 2) standard regression forces the policy to mimic average historical behaviors, thereby limiting the capacity to optimize performance toward the constraint boundary. To address these challenges, we propose PRO-Bid, a constraint-aware generative auto-bidding framework based on two synergistic mechanisms: 1) Constraint-Decoupled Pareto Representation (CDPR) decomposes global constraints into recursive cost and value contexts to restore resource perception, while reweighting trajectories based on the Pareto frontier to focus on high-efficiency data; and 2) Counterfactual Regret Optimization (CRO) facilitates active improvement by utilizing a global outcome predictor to identify superior counterfactual actions. By treating these high-utility outcomes as weighted regression targets, the model transcends historical averages to approach the optimal constraint boundary. Extensive experiments on two public benchmarks and online A/B tests demonstrate that PRO-Bid achieves superior constraint satisfaction and value acquisition compared to state-of-the-art baselines.
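The state-aliasing issue this abstract attributes to Return-to-Go conditioning can be illustrated with suffix sums. In the sketch below, two hypothetical bidding trajectories earn identical rewards, so their return-to-go sequences are indistinguishable, while their remaining-cost sequences differ. This is only an illustration of the motivation; it is not the paper's CDPR or CRO mechanism, and all names and numbers are made up.

```python
def suffix_sums(values):
    """Return-to-go style suffix sums: out[t] = values[t] + values[t + 1] + ..."""
    out, running = [], 0.0
    for v in reversed(values):
        running += v
        out.append(running)
    return out[::-1]


# Two hypothetical trajectories with the same rewards but different spending
rewards = [1.0, 2.0, 3.0]
costs_a = [1.0, 1.0, 1.0]  # even pacing
costs_b = [3.0, 0.5, 0.5]  # front-loaded spending

rtg = suffix_sums(rewards)      # [6.0, 5.0, 3.0] for both trajectories
ctg_a = suffix_sums(costs_a)    # [3.0, 2.0, 1.0]
ctg_b = suffix_sums(costs_b)    # [4.0, 1.0, 0.5]

# Conditioning on rtg alone cannot tell the two trajectories apart,
# while adding the cost-to-go context separates them at every step.
aliased = ctg_a != ctg_b
```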
[LG-69] SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
Link: https://arxiv.org/abs/2602.08234
Authors: Peng Xia,Jianwen Chen,Hanyang Wang,Jiaqi Liu,Kaide Zeng,Yu Wang,Siwei Han,Yiyang Zhou,Xujiang Zhao,Haifeng Chen,Zeyu Zheng,Cihang Xie,Huaxiu Yao
Categories: Machine Learning (cs.LG)
Comments:
Attachment download
Click to download today's full list of papers