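This post lists the latest papers fetched from arXiv.org on 2025-10-28. It is updated automatically and grouped into five broad areas: NLP, CV, ML, AI, and IR. To receive it by scheduled email, leave your email address in the comments.

Note: the daily paper data is fetched from arXiv.org and updated automatically at around 12:00 every day.

Tip: if you would like to receive the daily paper data by email, leave your email address in the comments.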

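Contents

Overview (2025-10-28)

1100 papers were updated today, including:

  • Natural Language Processing: 184 papers (Computation and Language (cs.CL))
  • Artificial Intelligence: 356 papers (Artificial Intelligence (cs.AI))
  • Computer Vision: 253 papers (Computer Vision and Pattern Recognition (cs.CV))
  • Machine Learning: 355 papers (Machine Learning (cs.LG))

Natural Language Processing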

[NLP-0] Variational Masked Diffusion Models

Link: https://arxiv.org/abs/2510.23606
Authors: Yichi Zhang, Alex Schwing, Zhizhen Zhao
Affiliations: University of Illinois Urbana-Champaign
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: Project Page: this https URL

[NLP-1] Think Twice: Branch-and-Rethink Reasoning Reward Model

Link: https://arxiv.org/abs/2510.23596
Authors: Yizhu Jiao, Jiaqi Zeng, Julien Veron Vialard, Oleksii Kuchaiev, Jiawei Han, Olivier Delalleau
Affiliations: NVIDIA; University of Illinois Urbana-Champaign
Categories: Computation and Language (cs.CL)
Comments:

[NLP-2] Hope Speech Detection in Social Media English Corpora: Performance of Traditional and Transformer Models

Link: https://arxiv.org/abs/2510.23585
Authors: Luis Ramos, Hiram Calvo, Olga Kolesnikova
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-3] ReCode: Unify Plan and Action for Universal Granularity Control

Link: https://arxiv.org/abs/2510.23564
Authors: Zhaoyang Yu, Jiayi Zhang, Huixue Su, Yufan Zhao, Yifan Wu, Mingyi Deng, Jinyu Xiang, Yizhang Lin, Lingxiao Tang, Yingchao Li, Yuyu Luo, Bang Liu, Chenglin Wu
Affiliations: DeepWisdom; The Hong Kong University of Science and Technology (Guangzhou); Renmin University of China; Zhejiang University; Université de Montréal & Mila
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments:

[NLP-4] ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models ICASSP2026

Link: https://arxiv.org/abs/2510.23558
Authors: Bohan Li, Wenbin Huang, Yuhang Qiu, Yiwei Guo, Hankun Wang, Zhihan Li, Jing Peng, Ziyang Ma, Xie Chen, Kai Yu
Affiliations: Unknown
Categories: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Comments: Submitted to ICASSP 2026

[NLP-5] A U-Net and Transformer Pipeline for Multilingual Image Translation

Link: https://arxiv.org/abs/2510.23554
Authors: Siddharth Sahay, Radhika Agarwal
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Comments: 6 pages, 3 figures, 5 tables, and 2 algorithms. Prepared in IEEE double-column format

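[NLP-6] LimRank: Less is More for Reasoning-Intensive Information Reranking EMNLP2025

[Quick Read]: This paper addresses the high computational cost of relying on large-scale fine-tuning to adapt large language models (LLMs) to information reranking. The key to its solution is LIMRANK-SYNTHESIZER, a reusable, open-source data generation pipeline for synthesizing diverse, challenging, and realistic reranking examples. Fine-tuning the reranker LIMRANK on this high-quality synthetic data needs less than 5% of the data typically used in prior work, yet achieves competitive performance on the challenging BRIGHT and FollowIR benchmarks and generalizes well to downstream tasks such as scientific literature search and retrieval-augmented generation for knowledge-intensive problem solving.

Link: https://arxiv.org/abs/2510.23544
Authors: Tingyu Song, Yilun Zhao, Siyue Zhang, Chen Zhao, Arman Cohan
Affiliations: Yale NLP Lab
Categories: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Comments: EMNLP 2025 Main (Short)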

Abstract:Existing approaches typically rely on large-scale fine-tuning to adapt LLMs for information reranking tasks, which is computationally expensive. In this work, we demonstrate that modern LLMs can be effectively adapted using only minimal, high-quality supervision. To enable this, we design LIMRANK-SYNTHESIZER, a reusable and open-source pipeline for generating diverse, challenging, and realistic reranking examples. Using this synthetic data, we fine-tune our reranker model, LIMRANK. We evaluate LIMRANK on two challenging benchmarks, i.e., BRIGHT for reasoning-intensive retrieval and FollowIR for instruction-following retrieval. Our experiments demonstrate that LIMRANK achieves competitive performance, while being trained on less than 5% of the data typically used in prior work. Further ablation studies demonstrate the effectiveness of LIMRANK-SYNTHESIZER and the strong generalization capabilities of LIMRANK across downstream tasks, including scientific literature search and retrieval-augmented generation for knowledge-intensive problem solving.

[NLP-7] JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

Link: https://arxiv.org/abs/2510.23538
Authors: Qiushi Sun, Jingyang Gong, Yang Liu, Qiaosheng Chen, Lei Li, Kai Chen, Qipeng Guo, Ben Kao, Fei Yuan
Affiliations: The University of Hong Kong; Shanghai AI Laboratory; Nanjing University; Carnegie Mellon University; Shanghai Innovation Institute
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE)
Comments: Work in progress

[NLP-8] IPQA: A Benchmark for Core Intent Identification in Personalized Question Answering

Link: https://arxiv.org/abs/2510.23536
Authors: Jieyong Kim, Maryam Amirizaniani, Soojin Yoon, Dongha Lee
Affiliations: Yonsei University; University of Washington
Categories: Computation and Language (cs.CL)
Comments:

[NLP-9] M4FC: a Multimodal Multilingual Multicultural Multitask Real-World Fact-Checking Dataset

Link: https://arxiv.org/abs/2510.23508
Authors: Jiahui Geng, Jonathan Tonglet, Iryna Gurevych
Affiliations: Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); Ubiquitous Knowledge Processing Lab (UKP Lab); TU Darmstadt; National Research Center for Applied Cybersecurity ATHENE; KU Leuven
Categories: Computation and Language (cs.CL)
Comments: Preprint under review. Code and data available at: this https URL

[NLP-10] MMTutorBench: The First Multimodal Benchmark for AI Math Tutoring

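[Quick Read]: This paper addresses the lack of a systematic benchmark for evaluating the math tutoring ability of multimodal large language models (MLLMs); existing benchmarks largely overlook the core tutoring skills of diagnosing students' difficulties and guiding them step by step. The key to its solution is MMTutorBench, the first benchmark for AI math tutoring, consisting of 685 problems built around pedagogically significant key steps. Each problem is paired with fine-grained rubrics that support evaluation across six dimensions, and the tasks are structured into three stages (Insight Discovery, Operation Formulation, and Operation Execution), enabling a comprehensive, quantifiable assessment of AI tutoring ability.

Link: https://arxiv.org/abs/2510.23477
Authors: Tengchao Yang, Sichen Guo, Mengzhao Jia, Jiaming Su, Yuanyang Liu, Zhihan Zhang, Meng Jiang
Affiliations: University of Notre Dame; Fudan University; Nanjing University of Posts and Telecommunications
Categories: Computation and Language (cs.CL)
Comments: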

Abstract:Effective math tutoring requires not only solving problems but also diagnosing students’ difficulties and guiding them step by step. While multimodal large language models (MLLMs) show promise, existing benchmarks largely overlook these tutoring skills. We introduce MMTutorBench, the first benchmark for AI math tutoring, consisting of 685 problems built around pedagogically significant key-steps. Each problem is paired with problem-specific rubrics that enable fine-grained evaluation across six dimensions, and structured into three tasks-Insight Discovery, Operation Formulation, and Operation Execution. We evaluate 12 leading MLLMs and find clear performance gaps between proprietary and open-source systems, substantial room compared to human tutors, and consistent trends across input variants: OCR pipelines degrade tutoring quality, few-shot prompting yields limited gains, and our rubric-based LLM-as-a-Judge proves highly reliable. These results highlight both the difficulty and diagnostic value of MMTutorBench for advancing AI tutoring.

[NLP-11] Evaluating Large Language Models for Stance Detection on Financial Targets from SEC Filing Reports and Earnings Call Transcripts

Link: https://arxiv.org/abs/2510.23464
Authors: Nikesh Gyawali, Doina Caragea, Alex Vasenkov, Cornelia Caragea
Affiliations: Kansas State University; Mathinvestments, Inc.; University of Illinois Chicago
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-12] BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents

Link: https://arxiv.org/abs/2510.23458
Authors: Litu Ou, Kuan Li, Huifeng Yin, Liwen Zhang, Zhongwang Zhang, Xixi Wu, Rui Ye, Zile Qiao, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou
Affiliations: Tongyi Lab; Alibaba Group
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 25 pages

[NLP-13] Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

Link: https://arxiv.org/abs/2510.23451
Authors: Zhuoran Jin, Hongbang Yuan, Kejian Zhu, Jiachun Li, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
Affiliations: University of Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: 48 pages, 17 figures

[NLP-14] A Neuro-Symbolic Multi-Agent Approach to Legal-Cybersecurity Knowledge Integration

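[Quick Read]: This paper tackles the knowledge divide at the intersection of cybersecurity and law: traditional legal research tools struggle with the nuanced connections between cases, statutes, and technical vulnerabilities, which hinders collaboration between legal experts and cybersecurity professionals. The key to its solution is building systems that can intelligently navigate the increasingly intricate cyber-legal domain; the paper reports promising initial results on multilingual tasks.

Link: https://arxiv.org/abs/2510.23443
Authors: Chiara Bonfanti, Alessandro Druetto, Cataldo Basile, Tharindu Ranasinghe, Marcos Zampieri
Affiliations: Politecnico di Torino; Università di Torino; Lancaster University; George Mason University
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Multiagent Systems (cs.MA)
Comments: 7 pages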

Abstract:The growing intersection of cybersecurity and law creates a complex information space where traditional legal research tools struggle to deal with nuanced connections between cases, statutes, and technical vulnerabilities. This knowledge divide hinders collaboration between legal experts and cybersecurity professionals. To address this important gap, this work provides a first step towards intelligent systems capable of navigating the increasingly intricate cyber-legal domain. We demonstrate promising initial results on multilingual tasks.

[NLP-15] EMTSF:Extraordinary Mixture of SOTA Models for Time Series Forecasting

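[Quick Read]: This paper addresses performance bottlenecks in Time Series Forecasting (TSF), in particular the modeling challenges posed by data that favors the recent past and by unpredictable events. Although Transformer architectures have driven notable progress in TSF, their effectiveness has been questioned: one study showed that a simple linear model can outperform complex Transformer models, and while the later PatchTST and TimeLLM improved results, they sparked a new debate when removing the large language model (LLM) component was shown to give better performance. In response, the authors propose EMTSF, an enhanced TSF method built on a Mixture of Experts (MoE) framework. Its core idea is to combine several complementary and diverse SOTA models (including xLSTM, an enhanced Linear model, PatchTST, and minGRU) and integrate them dynamically through a Transformer-based MoE gating network, adaptively capturing and combining different temporal patterns. The resulting model is reported to outperform all existing TSF models on standard benchmarks, including other MoE architectures.

Link: https://arxiv.org/abs/2510.23396
Authors: Musleh Alharthi, Kaleel Mahmood, Sarosh Patel, Ausif Mahmood
Affiliations: University of Bridgeport; University of Rhode Island
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: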

Abstract: The immense success of the Transformer architecture in Natural Language Processing has led to its adoption in Time Series Forecasting (TSF), where superior performance has been shown. However, a recent important paper questioned their effectiveness by demonstrating that a simple single layer linear model outperforms Transformer-based models. This was soon shown to be not as valid, by a better transformer-based model termed PatchTST. More recently, TimeLLM demonstrated even better results by repurposing a Large Language Model (LLM) for the TSF domain. Again, a follow up paper challenged this by demonstrating that removing the LLM component or replacing it with a basic attention layer in fact yields better performance. One of the challenges in forecasting is the fact that TSF data favors the more recent past, and is sometimes subject to unpredictable events. Based upon these recent insights in TSF, we propose a strong Mixture of Experts (MoE) framework. Our method combines the state-of-the-art (SOTA) models including xLSTM, enhanced Linear, PatchTST, and minGRU, among others. This set of complementary and diverse models for TSF are integrated in a Transformer based MoE gating network. Our proposed model outperforms all existing TSF models on standard benchmarks, surpassing even the latest approaches based on MoE frameworks.
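
To make the gating idea concrete, here is a minimal sketch of how a mixture of experts can combine several forecasts: a gate assigns each expert a score, a softmax turns the scores into weights, and the final forecast is the weighted sum. This is an illustration only, not the EMTSF implementation. The paper uses a Transformer-based gating network, whereas `gate_scores` here is a hand-set list, and the function names and forecast values are hypothetical.

```python
import math


def softmax(scores):
    """Turn arbitrary gate scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_forecast(expert_forecasts, gate_scores):
    """Weighted sum of per-expert forecasts (each a list over the horizon)."""
    weights = softmax(gate_scores)
    horizon = len(expert_forecasts[0])
    return [
        sum(w * forecast[t] for w, forecast in zip(weights, expert_forecasts))
        for t in range(horizon)
    ]


# Hypothetical two-step forecasts from three experts (made-up numbers).
forecasts = [[10.0, 11.0], [12.0, 13.0], [8.0, 9.0]]

# Equal gate scores give the plain average of the experts.
combined = mixture_forecast(forecasts, [0.0, 0.0, 0.0])

# A gate that strongly favors the second expert approaches that expert's forecast.
favor_second = mixture_forecast(forecasts, [0.0, 50.0, 0.0])
```

In a learned system the gate scores would depend on the input window, so different experts can dominate for different temporal patterns.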

[NLP-16] Detecting Religious Language in Climate Discourse

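[Quick Read]: This paper asks how to accurately identify explicit and implicit religious language in climate-related texts, particularly those produced by secular and religious non-governmental organizations (NGOs), and examines how detection methods differ and where they fall short. The key to its solution is a dual methodological framework: a rule-based model that uses a hierarchical tree of religious terms built from ecotheology literature, compared against large language models (LLMs) in a zero-shot setting. Comparing the two methods on a corpus of more than 880,000 sentences, the study finds that the rule-based model labels more sentences as religious than the LLMs do, which highlights a methodological challenge in detecting religious language: whether it should be defined by vocabulary alone or by contextual meaning.

Link: https://arxiv.org/abs/2510.23395
Authors: Evy Beijen, Pien Pieterse, Yusuf Çelik, Willem Th. van Peursen, Sandjai Bhulai, Meike Morren
Affiliations: Eep Talstra Centre for Bible and Computer, Vrije Universiteit Amsterdam; Department of Mathematics, Vrije Universiteit Amsterdam; Department of Marketing, School of Business and Economics, Vrije Universiteit Amsterdam
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: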

Abstract:Religious language continues to permeate contemporary discourse, even in ostensibly secular domains such as environmental activism and climate change debates. This paper investigates how explicit and implicit forms of religious language appear in climate-related texts produced by secular and religious nongovernmental organizations (NGOs). We introduce a dual methodological approach: a rule-based model using a hierarchical tree of religious terms derived from ecotheology literature, and large language models (LLMs) operating in a zero-shot setting. Using a dataset of more than 880,000 sentences, we compare how these methods detect religious language and analyze points of agreement and divergence. The results show that the rule-based method consistently labels more sentences as religious than LLMs. These findings highlight not only the methodological challenges of computationally detecting religious language but also the broader tension over whether religious language should be defined by vocabulary alone or by contextual meaning. This study contributes to digital methods in religious studies by demonstrating both the potential and the limitations of approaches for analyzing how the sacred persists in climate discourse.
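
A rule-based detector over a hierarchical term tree can be sketched as follows. This is a minimal illustration under stated assumptions: the tree contents, category names, and function names are hypothetical and are not taken from the paper's ecotheology-derived lexicon.

```python
import re

# Hypothetical two-level term tree: category -> subcategory -> terms.
TERM_TREE = {
    "religion": {
        "creation": {"creation", "creator", "stewardship"},
        "ritual": {"prayer", "blessing"},
    }
}


def iter_terms(tree, path=()):
    """Yield (term, path) pairs for every leaf term in the nested tree."""
    for label, child in tree.items():
        if isinstance(child, dict):
            yield from iter_terms(child, path + (label,))
        else:
            for term in child:
                yield term, path + (label,)


# Flat lookup from term to its position in the hierarchy.
TERM_INDEX = dict(iter_terms(TERM_TREE))


def religious_matches(sentence):
    """Return sorted (term, category path) pairs found in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sorted(
        {(tok, "/".join(TERM_INDEX[tok])) for tok in tokens if tok in TERM_INDEX}
    )


flagged = religious_matches("We are called to stewardship of creation.")
secular = religious_matches("Job creation in the solar sector is growing.")
```

The second sentence is flagged because it contains "creation", even though the usage is secular. A purely lexical rule cannot tell the two apart, which is consistent with the abstract's finding that the rule-based method labels more sentences as religious than the LLMs.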

[NLP-17] How AI Forecasts AI Jobs: Benchmarking LLM Predictions of Labor Market Changes

Link: https://arxiv.org/abs/2510.23358
Authors: Sheri Osborn, Rohit Valecha, H. Raghav Rao, Dan Sass, Anthony Rios
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: 8 pages + Limitations + References

[NLP-18] LightKGG: Simple and Efficient Knowledge Graph Generation from Textual Data

Link: https://arxiv.org/abs/2510.23341
Authors: Teng Lin
Affiliations: DSA, HKUST(GZ)
Categories: Computation and Language (cs.CL)
Comments:

[NLP-19] Planning Ahead with RSA: Efficient Signalling in Dynamic Environments by Projecting User Awareness across Future Timesteps

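[Quick Read]: This paper addresses how adaptive signalling can improve human-AI collaboration in dynamic environments, in particular keeping the human user's understanding of critical task information accurate during time-sensitive tasks. The core challenge is that human attention is a zero-sum resource: attending to one piece of information reduces awareness of other or upcoming information. The key to its solution is an adaptive signalling framework grounded in principles of rational communication, formalised as Bayesian reference resolution within the Rational Speech Act (RSA) modelling framework, which plans multi-step message sequences that optimise timely alignment between user belief and a dynamic environment. The framework adapts message specificity and timing to the user and scenario by projecting how prior-guided interpretation of messages will influence attention to the interface and subsequent belief updates. Experiments show that its effectiveness depends on combining multi-step planning with a realistic model of user awareness, providing theoretical foundations for pragmatic communication in human-agent teams.

Link: https://arxiv.org/abs/2510.23340
Authors: Anwesha Das, John Duff, Jörg Hoffmann, Vera Demberg
Affiliations: Saarland University
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Comments: 11 pages, 3 figures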

Abstract:Adaptive agent design offers a way to improve human-AI collaboration on time-sensitive tasks in rapidly changing environments. In such cases, to ensure the human maintains an accurate understanding of critical task elements, an assistive agent must not only identify the highest priority information but also estimate how and when this information can be communicated most effectively, given that human attention represents a zero-sum cognitive resource where focus on one message diminishes awareness of other or upcoming information. We introduce a theoretical framework for adaptive signalling which meets these challenges by using principles of rational communication, formalised as Bayesian reference resolution using the Rational Speech Act (RSA) modelling framework, to plan a sequence of messages which optimise timely alignment between user belief and a dynamic environment. The agent adapts message specificity and timing to the particulars of a user and scenario based on projections of how prior-guided interpretation of messages will influence attention to the interface and subsequent belief update, across several timesteps out to a fixed horizon. In a comparison to baseline methods, we show that this effectiveness depends crucially on combining multi-step planning with a realistic model of user awareness. As the first application of RSA for communication in a dynamic environment, and for human-AI interaction in general, we establish theoretical foundations for pragmatic communication in human-agent teams, highlighting how insights from cognitive science can be capitalised to inform the design of assistive agents.
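
For readers unfamiliar with RSA, the basic one-shot recursion can be sketched on a textbook reference game. This is only the standard literal listener, pragmatic speaker, and pragmatic listener chain with a uniform prior. It does not include the multi-step planning or the user-awareness model that the paper adds, and the states, utterances, and function names are illustrative assumptions.

```python
# Toy reference game: three objects and four one-word utterances.
STATES = ["blue_square", "blue_circle", "green_square"]
UTTERANCES = ["blue", "green", "square", "circle"]


def is_true(utterance, state):
    """An utterance is literally true of a state if it names one of its features."""
    return utterance in state.split("_")


def normalize(weights):
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


def literal_listener(utterance):
    """L0(s | u): uniform over the states the utterance is literally true of."""
    return normalize({s: 1.0 if is_true(utterance, s) else 0.0 for s in STATES})


def speaker(state, alpha=1.0):
    """S1(u | s): proportional to L0(s | u) ** alpha (rationality parameter alpha)."""
    return normalize({u: literal_listener(u)[state] ** alpha for u in UTTERANCES})


def pragmatic_listener(utterance, alpha=1.0):
    """L1(s | u): proportional to S1(u | s) under a uniform prior over states."""
    return normalize({s: speaker(s, alpha)[utterance] for s in STATES})


belief = pragmatic_listener("blue")
```

With `alpha=1.0`, hearing "blue" shifts the pragmatic listener towards `blue_square` (0.6 versus 0.4 for `blue_circle`), because a speaker who meant the circle had the unambiguous option "circle" available.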

[NLP-20] BaZi-Based Character Simulation Benchmark: Evaluating AI on Temporal and Persona Reasoning

Link: https://arxiv.org/abs/2510.23337
Authors: Siyuan Zheng, Pai Liu, Xi Chen, Jizheng Dong, Sihan Jia
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:

[NLP-21] Adaptive Blockwise Search: Inference-Time Alignment for Large Language Models

Link: https://arxiv.org/abs/2510.23334
Authors: Mohammad Atif Quamar, Mohammad Areeb, Nishant Sharma, Ananth Shreekumar, Jonathan Rosenthal, Muslum Ozgur Ozmen, Mikhail Kuznetsov, Z. Berkay Celik
Affiliations: Purdue University; Arizona State University; Amazon
Categories: Computation and Language (cs.CL)
Comments:

[NLP-22] Arabic Little STT: Arabic Children Speech Recognition Dataset

Link: https://arxiv.org/abs/2510.23319
Authors: Mouhand Alkadri, Dania Desouki, Khloud Al Jallad
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD)
Comments:

[NLP-23] DCMM-SQL: Automated Data-Centric Pipeline and Multi-Model Collaboration Training for Text-to-SQL Model

Link: https://arxiv.org/abs/2510.23284
Authors: Yuanzhen Xie, Liu Ye, Jiqun Chu, Mochi Gao, Hehuan Liu, Yunzhi Tan, Bo Hu, Zang Li
Affiliations: Tencent
Categories: Computation and Language (cs.CL)
Comments:

[NLP-24] A Cocktail-Party Benchmark: Multi-Modal dataset and Comparative Evaluation Results ICASSP2026

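[Quick Read]: This paper addresses the cocktail-party problem in multi-party conversations within a single room: accurately determining "who speaks when, what, and with whom" in heavily overlapping speech. The key to its solution is the Multi-Modal Context-Aware Recognition (MCoRec) task, which fuses audio, visual, and contextual cues to jointly transcribe each speaker's speech and cluster speakers into conversations, recovering the structure of natural, unscripted group conversations from audio-visual recordings. Audio-only baseline systems exceed 100% word error rate (WER), while adding visual information yields improvements of 50%, underscoring the importance of multimodal fusion for heavily overlapping speech.

Link: https://arxiv.org/abs/2510.23276
Authors: Thai-Binh Nguyen, Katerina Zmolikova, Pingchuan Ma, Ngoc Quan Pham, Christian Fuegen, Alexander Waibel
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: Submitted to ICASSP 2026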

Abstract:We introduce the task of Multi-Modal Context-Aware Recognition (MCoRec) in the ninth CHiME Challenge, which addresses the cocktail-party problem of overlapping conversations in a single-room setting using audio, visual, and contextual cues. MCoRec captures natural multi-party conversations where the recordings focus on unscripted, casual group chats, leading to extreme speech overlap of up to 100% and highly fragmented conversational turns. The task requires systems to answer the question “Who speaks when, what, and with whom?” by jointly transcribing each speaker’s speech and clustering them into their respective conversations from audio-visual recordings. Audio-only baselines exceed 100% word error rate, whereas incorporating visual cues yields substantial 50% improvements, highlighting the importance of multi-modality. In this manuscript, we present the motivation behind the task, outline the data collection process, and report the baseline systems developed for the MCoRec.
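
A word error rate above 100% may look odd, but it follows directly from the definition: WER divides substitutions, deletions, and insertions by the number of reference words, so a hypothesis that picks up many extra words (for example from overlapping speakers) can contain more errors than the reference has words. The sketch below uses the standard word-level edit distance. The example sentences are made up and are not from the MCoRec data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# The target speaker said three words; the recognizer also transcribed
# a neighbouring conversation, adding nine extra words.
reference = "see you tomorrow"
hypothesis = "did you see the game we will see you all there tomorrow"
wer = word_error_rate(reference, hypothesis)
```

Here all three reference words are recovered, yet the nine inserted words give a WER of 9 / 3 = 3.0, that is, 300%.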

[NLP-25] Code Aesthetics with Agentic Reward Feedback

Link: https://arxiv.org/abs/2510.23272
Authors: Bang Xiao, Lingjie Jiang, Shaohan Huang, Tengchao Lv, Yupan Huang, Xun Wu, Lei Cui, Furu Wei
Affiliations: Microsoft Research Asia; Zhiyuan College; Shanghai Jiao Tong University; Peking University
Categories: Computation and Language (cs.CL)
Comments: 30 pages, 7 figures

[NLP-26] Mubeen AI: A Specialized Arabic Language Model for Heritage Preservation and User Intent Understanding

Link: https://arxiv.org/abs/2510.23271
Authors: Mohammed Aljafari, Ismail Alturki, Ahmed Mori, Yehya Kadumi
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: 21 pages, 2 figures, 3 tables. Includes appendices on ethical guidelines and training framework. Submitted September 04, 2025

[NLP-27] Are ASR foundation models generalized enough to capture features of regional dialects for low-resource languages? AACL

Link: https://arxiv.org/abs/2510.23252
Authors: Tawsif Tashwar Dipto, Azmol Hossain, Rubayet Sabbir Faruque, Md. Rezuwan Hassan, Kanij Fatema, Tanmoy Shome, Ruwad Naswan, Md. Foriduzzaman Zihad, Mohaymen Ul Anam, Nazia Tasnim, Hasan Mahmud, Md Kamrul Hasan, Md. Mehedi Hasan Shawon, Farig Sadeque, Tahsin Reasat
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: This manuscript contains 11 pages, 5 tables and 16 figures. It was accepted at the International Joint Conference on Natural Language Processing and the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL) 2025

[NLP-28] Process Reward Models for Sentence-Level Verification of LVLM Radiology Reports

Link: https://arxiv.org/abs/2510.23217
Authors: Alois Thomas, Maya Varma, Jean-Benoit Delbrouck, Curtis P. Langlotz
Affiliations: Stanford University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-29] PTPP-Aware Adaptation Scaling Laws: Predicting Domain-Adaptation Performance at Unseen Pre-Training Budgets

Link: https://arxiv.org/abs/2510.23198
Authors: Etienne Goffinet, Shane Bergsma, Avraham Sheinin, Natalia Vassilieva, Shaheer Muhammad, Preslav Nakov, Gurpreet Gosal
Affiliations: Cerebras Systems; MBZUAI
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments:

[NLP-30] DREaM: Drug-Drug Relation Extraction via Transfer Learning Method

Link: https://arxiv.org/abs/2510.23189
Authors: Ali Fata, Hossein Rahmani, Parinaz Soltanzadeh, Amirhossein Derakhshan, Behrouz Minaei Bidgoli
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[NLP-31] SI-Bench: Benchmarking Social Intelligence of Large Language Models in Human-to-Human Conversations

Link: https://arxiv.org/abs/2510.23182
Authors: Shuai Huang, Wenxuan Zhao, Jun Gao
Affiliations: Hello Group
Categories: Computation and Language (cs.CL)
Comments: 17 pages, 9 figures

[NLP-32] MATCH: Task-Driven Code Evaluation through Contrastive Learning

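[Quick Read]: This paper addresses how to accurately evaluate whether AI-generated code matches developer intent when no reference code is available. Traditional methods such as unit tests are costly and hard to scale; syntactic similarity metrics (e.g., BLEU, ROUGE) fail to reflect code functionality; and metrics such as CodeBERTScore depend on reference code, which limits their applicability. To fill this gap, the paper proposes MATCH, whose core idea is to use Contrastive Learning to build semantic embeddings of code and natural-language task descriptions, giving a reference-free score of how well generated code implements the task. MATCH is reported to correlate more strongly with functional correctness and human preference than existing metrics.

Link: https://arxiv.org/abs/2510.23169
Authors: Marah Ghoummaid, Vladimir Tchuiev, Ofek Glick, Michal Moschkovitz, Dotan Di Castro
Affiliations: Bosch Research
Categories: Computation and Language (cs.CL); Software Engineering (cs.SE)
Comments: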

Abstract:AI-based code generation is increasingly prevalent, with GitHub Copilot estimated to generate 46% of the code on GitHub. Accurately evaluating how well generated code aligns with developer intent remains a critical challenge. Traditional evaluation methods, such as unit tests, are often unscalable and costly. Syntactic similarity metrics (e.g., BLEU, ROUGE) fail to capture code functionality, and metrics like CodeBERTScore require reference code, which is not always available. To address the gap in reference-free evaluation, with few alternatives such as ICE-Score, this paper introduces MATCH, a novel reference-free metric. MATCH uses Contrastive Learning to generate meaningful embeddings for code and natural language task descriptions, enabling similarity scoring that reflects how well generated code implements the task. We show that MATCH achieves stronger correlations with functional correctness and human preference than existing metrics across multiple programming languages.
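
The general shape of embedding-based, reference-free scoring can be sketched as follows: embed the task description and the candidate code, score by cosine similarity, and train the encoders with a contrastive objective so that matching pairs score higher than mismatched ones. The sketch is an assumption-laden illustration, not the MATCH implementation. The vectors stand in for encoder outputs (the encoders themselves are not shown), and the loss is a generic InfoNCE form that may differ from the objective used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(task_vec, code_vecs, positive_index, temperature=0.1):
    """Generic InfoNCE loss: low when the positive code is the most similar."""
    logits = [cosine(task_vec, c) / temperature for c in code_vecs]
    m = max(logits)  # log-sum-exp with the max subtracted for stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[positive_index]


# Hypothetical embeddings: one task description and two candidate programs.
task = [1.0, 0.0, 0.0]
good_code = [0.9, 0.1, 0.0]  # points in nearly the same direction as the task
bad_code = [0.0, 1.0, 0.0]   # orthogonal to the task

score_good = cosine(task, good_code)
score_bad = cosine(task, bad_code)
loss_correct = info_nce(task, [good_code, bad_code], positive_index=0)
loss_wrong = info_nce(task, [good_code, bad_code], positive_index=1)
```

At evaluation time only the similarity score is needed, which is why no reference solution is required.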

[NLP-33] Beyond Direct Generation: A Decomposed Approach to Well-Crafted Screenwriting with LLMs

Link: https://arxiv.org/abs/2510.23163
Authors: Hang Lei, Shengyi Zong, Zhaoyan Li, Ziren Zhou, Hao Liu
Affiliations: Alibaba Group; Peking University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-34] ENTP: Enhancing Low-Quality SFT Data via Neural-Symbolic Text Purge-Mix

Link: https://arxiv.org/abs/2510.23160
Authors: Zile Yang, Ling Li, Na Di, Jinlong Pang, Yao Zhou, Hao Cheng, Bo Han, Jiaheng Wei
Affiliations: The Hong Kong University of Science and Technology (Guangzhou); University of California, Santa Cruz; Hong Kong Baptist University
Categories: Computation and Language (cs.CL)
Comments:

[NLP-35] Rethinking GSPO: The Perplexity-Entropy Equivalence

Link: https://arxiv.org/abs/2510.23142
Authors: Chi Liu
Affiliations: Wuhan University
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: 10 pages, 2 figures

[NLP-36] Corpus Frequencies in Morphological Inflection: Do They Matter?

Link: https://arxiv.org/abs/2510.23131
Authors: Tomáš Sourada, Jana Straková
Affiliations: Charles University
Categories: Computation and Language (cs.CL)
Comments: Published in the proceedings of ITAT 2025. 15 pages, 1 figure, 4 tables

[NLP-37] Beyond Higher Rank: Token-wise Input-Output Projections for Efficient Low-Rank Adaptation NEURIPS2025

Link: https://arxiv.org/abs/2510.23123
Authors: Shiwei Li, Xiandi Luo, Haozhao Wang, Xing Tang, Ziqiang Cui, Dugang Liu, Yuhua Li, Xiuqiang He, Ruixuan Li
Affiliations: Huazhong University of Science and Technology; Shenzhen Technology University; City University of Hong Kong; Shenzhen University
Categories: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: Accepted by NeurIPS 2025

[NLP-38] Flexing in 73 Languages: A Single Small Model for Multilingual Inflection

Link: https://arxiv.org/abs/2510.23114
Authors: Tomáš Sourada, Jana Straková
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: Published in the proceedings of TSD 2025. 12 pages, 1 figure, 4 tables

[NLP-39] Leveraging Hierarchical Organization for Medical Multi-document Summarization

[Quick Read]: This paper addresses how to better organize and contextualize information across documents in medical multi-document summarization (MDS), in order to improve the readability of summaries and human preference for them. The key to its solution is to organize the inputs with a hierarchical structure: compared with traditional flat summarization, this exploits the hierarchical relationships between documents to strengthen the model's ability to organize information, increasing human experts' preference for the generated summaries and their clarity while preserving coverage, factuality, and coherence.

Link: https://arxiv.org/abs/2510.23104
Authors: Yi-Li Hsu, Katelyn X. Mei, Lucy Lu Wang
Affiliations: National Tsing Hua University; University of Washington; Allen Institute for AI
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Comments:

Abstract:Medical multi-document summarization (MDS) is a complex task that requires effectively managing cross-document relationships. This paper investigates whether incorporating hierarchical structures in the inputs of MDS can improve a model’s ability to organize and contextualize information across documents compared to traditional flat summarization methods. We investigate two ways of incorporating hierarchical organization across three large language models (LLMs), and conduct comprehensive evaluations of the resulting summaries using automated metrics, model-based metrics, and domain expert evaluation of preference, understandability, clarity, complexity, relevance, coverage, factuality, and coherence. Our results show that human experts prefer model-generated summaries over human-written summaries. Hierarchical approaches generally preserve factuality, coverage, and coherence of information, while also increasing human preference for summaries. Additionally, we examine whether simulated judgments from GPT-4 align with human judgments, finding higher agreement along more objective evaluation facets. Our findings demonstrate that hierarchical structures can improve the clarity of medical summaries generated by models while maintaining content coverage, providing a practical way to improve human preference for generated summaries.

[NLP-40] MAP4TS: A Multi-Aspect Prompting Framework for Time-Series Forecasting with Large Language Models

Link: https://arxiv.org/abs/2510.23090
Authors: Suchan Lee, Jihoon Choi, Sohyeon Lee, Minseok Song, Bong-Gyu Jang, Hwanjo Yu, Soyeon Caren Han
Affiliations: Pohang University of Science and Technology; The University of Melbourne
Categories: Computation and Language (cs.CL)
Comments:

[NLP-41] A Survey on LLM Mid-training

Link: https://arxiv.org/abs/2510.23081
Authors: Chengying Tu, Xuemiao Zhang, Rongxiang Weng, Rumei Li, Chen Zhang, Yang Bai, Hongfei Yan, Jingang Wang, Xunliang Cai
Affiliations: Peking University; Meituan
Categories: Computation and Language (cs.CL)
Comments:

[NLP-42] Fast-MIA: Efficient and Scalable Membership Inference for LLMs

链接: https://arxiv.org/abs/2510.23074
作者: Hiromu Takahashi,Shotaro Ishihara
机构: Nikkei Inc.
类目: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-43] Quality-Aware Translation Tagging in Multilingual RAG system EMNLP2025

链接: https://arxiv.org/abs/2510.23070
作者: Hoyeon Moon,Byeolhee Kim,Nikhil Verma
机构: Yonsei University (延世大学); University of Ulsan (蔚山大学); LG Electronics, Toronto AI Lab (LG电子多伦多人工智能实验室)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: EMNLP 2025 MRL Workshop

点击查看摘要

[NLP-44] Knocking-Heads Attention

链接: https://arxiv.org/abs/2510.23052
作者: Zhanchao Zhou,Xiaodong Chen,Haoxing Chen,Zhenzhong Lan,Jianguo Li
机构: 未知
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-45] Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning

链接: https://arxiv.org/abs/2510.23038
作者: Ran Xu,Jingjing Chen,Jiayu Ye,Yu Wu,Jun Yan,Carl Yang,Hongkun Yu
机构: Emory University(埃默里大学); Google(谷歌); Google Cloud AI Research(谷歌云人工智能研究)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: Work in Progress

点击查看摘要

[NLP-46] Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts

链接: https://arxiv.org/abs/2510.23027
作者: Di Zhang,Xun Wu,Shaohan Huang,Yaru Hao,Li Dong,Zewen Chi,Zhifang Sui,Furu Wei
机构: Microsoft Research (微软研究院); Peking University (北京大学)
类目: Machine Learning (cs.LG); Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-47] UniAIDet: A Unified and Universal Benchmark for AI-Generated Image Content Detection and Localization

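[Quick read]: This paper addresses the limited coverage of current benchmarks for AI-generated image detection: existing evaluations do not adequately cover diverse generative models (text-to-image, image-to-image, image inpainting, image editing, and deepfake models) or image types (especially end-to-end image editing and artistic images). The key to its solution is UniAIDet, a unified and comprehensive benchmark that includes artistic images in addition to photographic ones and systematically integrates multiple generative paradigms. This gives detection methods a broader and more realistic test environment and supports future research on generalization and on the relation between detection and localization.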

链接: https://arxiv.org/abs/2510.23023
作者: Huixuan Zhang,Xiaojun Wan
机构: Wangxuan Institute of Computer Technology, Peking University (北京大学王选计算机研究所)
类目: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:With the rapid proliferation of image generative models, the authenticity of digital images has become a significant concern. While existing studies have proposed various methods for detecting AI-generated content, current benchmarks are limited in their coverage of diverse generative models and image categories, often overlooking end-to-end image editing and artistic images. To address these limitations, we introduce UniAIDet, a unified and comprehensive benchmark that includes both photographic and artistic images. UniAIDet covers a wide range of generative models, including text-to-image, image-to-image, image inpainting, image editing, and deepfake models. Using UniAIDet, we conduct a comprehensive evaluation of various detection methods and answer three key research questions regarding generalization capability and the relation between detection and localization. Our benchmark and analysis provide a robust foundation for future research.

[NLP-48] M3T2IBench: A Large-Scale Multi-Category Multi-Instance Multi-Relation Text-to-Image Benchmark

链接: https://arxiv.org/abs/2510.23020
作者: Huixuan Zhang,Xiaojun Wan
机构: Peking University (北京大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-49] LangLingual: A Personalised Exercise-oriented English Language Learning Tool Leveraging Large Language Models

链接: https://arxiv.org/abs/2510.23011
作者: Sammriddh Gupta,Sonit Singh,Aditya Joshi,Mira Kim
机构: UNSW(新南威尔士大学)
类目: Computation and Language (cs.CL); Computers and Society (cs.CY)
备注: 14 pages

点击查看摘要

[NLP-50] Understanding In-Context Learning Beyond Transformers: An Investigation of State Space and Hybrid Architectures

链接: https://arxiv.org/abs/2510.23006
作者: Shenran Wang,Timothy Tin-Long Tse,Jian Zhu
机构: The University of British Columbia (不列颠哥伦比亚大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-51] Can Language Models Compose Skills In-Context?

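[Quick read]: This paper addresses the weak skill-composition ability of language models when they perform composite tasks through in-context learning, that is, how to identify basic skills from simple task examples and assemble them correctly to complete a more complex task. The key to its solution is aligning the examples with the specific steps of the composite task, which improves the model's ability to recognize and compose skills. Experiments show that examples not aligned with their corresponding steps can degrade performance even when Chain-of-Thought examples are used, whereas aligned examples markedly improve it.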

链接: https://arxiv.org/abs/2510.22993
作者: Zidong Liu,Zhuoyan Xu,Zhenmei Shi,Yingyu Liang
机构: The University of Hong Kong (香港大学); University of Wisconsin-Madison (威斯康星大学麦迪逊分校)
类目: Machine Learning (cs.LG); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Composing basic skills from simple tasks to accomplish composite tasks is crucial for modern intelligent systems. We investigate the in-context composition ability of language models to perform composite tasks that combine basic skills demonstrated in in-context examples. This is more challenging than the standard setting, where skills and their composition can be learned in training. We conduct systematic experiments on various representative open-source language models, utilizing linguistic and logical tasks designed to probe composition abilities. The results reveal that simple task examples can have a surprising negative impact on the performance, because the models generally struggle to recognize and assemble the skills correctly, even with Chain-of-Thought examples. Theoretical analysis further shows that it is crucial to align examples with the corresponding steps in the composition. This inspires a method for the probing tasks, whose improved performance provides positive support for our insights.

[NLP-52] Measuring Teaching with LLMs

链接: https://arxiv.org/abs/2510.22968
作者: Michael Hardy
机构: Stanford University (斯坦福大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-53] MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs

链接: https://arxiv.org/abs/2510.22967
作者: Yucheng Ning,Xixun Lin,Fang Fang,Yanan Cao
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: This article has been accepted by Frontiers of Computer Science (FCS)

点击查看摘要

[NLP-54] Tagging-Augmented Generation: Assisting Language Models in Finding Intricate Knowledge In Long Contexts EMNLP2025

链接: https://arxiv.org/abs/2510.22956
作者: Anwesan Pal,Karen Hovsepian,Tinghao Guo,Mengnan Zhao,Somendra Tripathi,Nikos Kanakaris,George Mihaila,Sumit Nigam
机构: AWS AI Labs; Amazon Web Services; Amazon OTS; Amazon Catalog AI
类目: Computation and Language (cs.CL); Information Retrieval (cs.IR)
备注: Paper accepted at EMNLP 2025

点击查看摘要

[NLP-55] Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond) NEURIPS2025

链接: https://arxiv.org/abs/2510.22954
作者: Liwei Jiang,Yuanjun Chai,Margaret Li,Mickel Liu,Raymond Fok,Nouha Dziri,Yulia Tsvetkov,Maarten Sap,Alon Albalak,Yejin Choi
机构: 未知
类目: Computation and Language (cs.CL)
备注: NeurIPS 2025 DB Paper (Oral); Camera-Ready Version

点击查看摘要

[NLP-56] Language Server CLI Empowers Language Agents with Process Rewards

链接: https://arxiv.org/abs/2510.22907
作者: Yifan Zhang,Lanser Contributors
机构: Princeton University (普林斯顿大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Programming Languages (cs.PL); Software Engineering (cs.SE)
备注: Project Page: this https URL

点击查看摘要

[NLP-57] Modeling Political Discourse with Sentence-BERT and BERTopic

链接: https://arxiv.org/abs/2510.22904
作者: Margarida Mendonca,Alvaro Figueira
机构: 未知
类目: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Computers and Society (cs.CY)
备注: 11 pages. Continues previous study by Mendonca M. and Figueira A, 2023: “Analyzing Political Discourse in the 117th U.S. Congress Using Transformer-Based Topic Models”, presented at the International Conference on Computational Social Science

点击查看摘要

[NLP-58] Offline Preference Optimization via Maximum Marginal Likelihood Estimation

链接: https://arxiv.org/abs/2510.22881
作者: Saeed Najafi,Alona Fyshe
机构: University of Alberta (阿尔伯塔大学)
类目: Machine Learning (cs.LG); Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-59] Batch Speculative Decoding Done Right

链接: https://arxiv.org/abs/2510.22876
作者: Ranran Haoran Zhang,Soumik Dey,Ashirbad Mishra,Hansi Wu,Binbin Li,Rui Zhang
机构: The Pennsylvania State University (宾夕法尼亚州立大学); eBay Inc (eBay公司)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-60] A Comprehensive Dataset for Human vs. AI Generated Text Detection AAAI2025

链接: https://arxiv.org/abs/2510.22874
作者: Rajarshi Roy,Nasrin Imanpour,Ashhar Aziz,Shashwat Bajpai,Gurpreet Singh,Shwetangshu Biswas,Kapil Wanaskar,Parth Patwa,Subhankar Ghosh,Shreyas Dixit,Nilesh Ranjan Pal,Vipula Rawte,Ritvik Garimella,Gaytri Jena,Amit Sheth,Vasu Sharma,Aishwarya Naresh Reganti,Vinija Jain,Aman Chadha,Amitava Das
机构: Kalyani Government Engineering College, India (印度卡利亚尼政府工程学院); AI Institute University of South Carolina, USA (美国南卡罗来纳大学人工智能研究所); Indraprastha Institute of Information Technology Delhi, India (印度德里印地普拉斯特信息技术学院); BITS Pilani Hyderabad Campus, India (印度比尔拉理工学院海得拉巴校区); Indian Institute of Information Technology Guwahati, India (印度古瓦哈蒂信息科技学院); National Institute of Technology Silchar, India (印度西尔查尔国立技术学院); San Jose State University, USA (美国圣何塞州立大学); University of California Los Angeles, USA (美国加州大学洛杉矶分校); Washington State University, USA (美国华盛顿州立大学); Vishwakarma Institute of Information Technology, India (印度维什瓦克玛信息技术学院); Gandhi Institute for Technological Advancement, India (印度甘地技术进步学院); Meta AI, USA (美国Meta AI); Amazon AI, USA (美国亚马逊AI); Birla Institute of Technology and Science Pilani Goa, India (印度比尔拉理工学院果阿校区)
类目: Computation and Language (cs.CL)
备注: Defactify4 @AAAI 2025

点击查看摘要

[NLP-61] Interpreting and Mitigating Unwanted Uncertainty in LLMs

链接: https://arxiv.org/abs/2510.22866
作者: Tiasa Singha Roy,Ayush Rajesh Jhaveri,Ilias Triantafyllopoulos
机构: New York University (纽约大学)
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注:

点击查看摘要

[NLP-62] Far from the Shallow: Brain-Predictive Reasoning Embedding through Residual Disentanglement NEURIPS2025

链接: https://arxiv.org/abs/2510.22860
作者: Linyang He,Tianjun Zhong,Richard Antonello,Gavin Mischler,Micah Goldblum,Nima Mesgarani
机构: Zuckerman Mind Brain Behavior Institute, Columbia University (哥伦比亚大学祖克曼心智大脑行为研究所); Department of Electrical Engineering, Columbia University (哥伦比亚大学电气工程系); Department of Computer Science, Columbia University (哥伦比亚大学计算机科学系)
类目: Computation and Language (cs.CL); Neurons and Cognition (q-bio.NC)
备注: Accepted at NeurIPS 2025

点击查看摘要

[NLP-63] Once Upon an Input: Reasoning via Per-Instance Program Synthesis NEURIPS2025

链接: https://arxiv.org/abs/2510.22849
作者: Adam Stein,Neelay Velingker,Mayur Naik,Eric Wong
机构: University of Pennsylvania (宾夕法尼亚大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: Accepted at NeurIPS 2025. 34 pages, 7 figures

点击查看摘要

[NLP-64] Leveraging Large Language Models to Identify Conversation Threads in Collaborative Learning

链接: https://arxiv.org/abs/2510.22844
作者: Prerna Ravi,Dong Won Lee,Beatriz Flamia,Jasmine David,Brandon Hanks,Cynthia Breazeal,Emma Anderson,Grace Lin
机构: MIT CSAIL (麻省理工学院计算机科学与人工智能实验室); MIT Media Lab (麻省理工学院媒体实验室); Instituto Politécnico de Bragança (布拉干萨理工学院); MIT STEP (麻省理工学院科学与技术教育项目)
类目: Computation and Language (cs.CL)
备注: In Submission: Journal of Educational Data Mining (jEDM) 2026

点击查看摘要

[NLP-65] Exploration of Summarization by Generative Language Models for Automated Scoring of Long Essays

链接: https://arxiv.org/abs/2510.22830
作者: Haowei Hua(1),Hong Jiao(2),Xinyi Wang(3) ((1) Princeton University, (2) University of Maryland, College Park, (3) University of Maryland, College Park & Beijing Normal University)
机构: 未知
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: 19 pages, 5 Tables 7 Figures, Presentation at Artificial Intelligence in Measurement and Education Conference (AIME-Con)

点击查看摘要

[NLP-66] Cross-Lingual Stability and Bias in Instruction-Tuned Language Models for Humanitarian NLP

链接: https://arxiv.org/abs/2510.22823
作者: Poli Nemkova,Amrit Adhikari,Matthew Pearson,Vamsi Krishna Sadu,Mark V. Albert
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-67] VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions EMNLP2025

链接: https://arxiv.org/abs/2510.22798
作者: Thu Phuong Nguyen,Duc M. Nguyen,Hyotaek Jeon,Hyunwook Lee,Hyunmin Song,Sungahn Ko,Taehwan Kim
机构: 未知
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: EMNLP 2025. Project Website: this https URL

点击查看摘要

[NLP-68] How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations

链接: https://arxiv.org/abs/2510.22780
作者: Zora Zhiruo Wang,Yijia Shao,Omar Shaikh,Daniel Fried,Graham Neubig,Diyi Yang
机构: Carnegie Mellon University (卡内基梅隆大学); Stanford University (斯坦福大学)
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
备注:

点击查看摘要

[NLP-69] Scalable Supervising Software Agents with Patch Reasoner

链接: https://arxiv.org/abs/2510.22775
作者: Junjielong Xu,Boyin Tan,Xiaoyuan Liu,Chao Peng,Pengfei Gao,Pinjia He
机构: The Chinese University of Hong Kong, Shenzhen (香港中文大学(深圳)); ByteDance (字节跳动)
类目: Computation and Language (cs.CL); Software Engineering (cs.SE)
备注:

点击查看摘要

[NLP-70] MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion

链接: https://arxiv.org/abs/2510.22768
作者: Haoyi Qiu,Yilun Zhou,Pranav Narayanan Venkit,Kung-Hsiang Huang,Jiaxin Zhang,Nanyun Peng,Chien-Sheng Wu
机构: University of California, Los Angeles (加州大学洛杉矶分校); Salesforce AI Research (Salesforce人工智能研究院)
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-71] TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

链接: https://arxiv.org/abs/2510.22767
作者: Omar Naim,Krish Sharma,Nicholas Asher
机构: IRIT(信息与推理技术研究所)
类目: Machine Learning (cs.LG); Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-72] Iterative Layer Pruning for Efficient Translation Inference

链接: https://arxiv.org/abs/2510.22763
作者: Yasmin Moslem,Muhammad Hazim Al Farouq,John D. Kelleher
机构: ADAPT Centre (ADAPT 中心); Trinity College Dublin (都柏林圣三一学院); Kreasof AI (Kreasof AI)
类目: Computation and Language (cs.CL); Performance (cs.PF)
备注: WMT 2025

点击查看摘要

[NLP-73] EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models

链接: https://arxiv.org/abs/2510.22758
作者: Li Zhou,Lutong Yu,You Lyu,Yihang Lin,Zefeng Zhao,Junyi Ao,Yuhao Zhang,Benyou Wang,Haizhou Li
机构: The Chinese University of Hong Kong, Shenzhen (香港中文大学(深圳)); Shenzhen Research Institute of Big Data (深圳市大数据研究院)
类目: Computation and Language (cs.CL)
备注: Speech Language Models, Spoken Language Understanding, Vocal Cue Perception, Empathetic Dialogue, Benchmark Evaluation

点击查看摘要

[NLP-74] Beyond Semantics: How Temporal Biases Shape Retrieval in Transformer and State-Space Models

链接: https://arxiv.org/abs/2510.22752
作者: Anooshka Bajaj,Deven Mahesh Mistry,Sahaj Singh Maini,Yash Aggarwal,Zoran Tiganj
机构: Indiana University Bloomington (印第安纳大学伯明顿分校)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-75] Multi-Modal Fact-Verification Framework for Reducing Hallucinations in Large Language Models

链接: https://arxiv.org/abs/2510.22751
作者: Piyushkumar Patel
机构: Microsoft(微软)
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-76] Low-Resource Dialect Adaptation of Large Language Models: A French Dialect Case-Study LREC2026

链接: https://arxiv.org/abs/2510.22747
作者: Eeham Khan,Firas Saidani,Owen Van Esbroeck,Richard Khoury,Leila Kosseim
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: Submitted to LREC 2026

点击查看摘要

[NLP-77] REVISION: Reflective Intent Mining and Online Reasoning Auxiliary for E-commerce Visual Search System Optimization

链接: https://arxiv.org/abs/2510.22739
作者: Yiwen Tang,Qiuyu Zhao,Zenghui Sun,Jinsong Lan,Xiaoyong Zhu,Bo Zheng,Kaifu Zhang
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-78] E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker

链接: https://arxiv.org/abs/2510.22733
作者: Qi Liu,Yanzhao Zhang,Mingxin Li,Dingkun Long,Pengjun Xie,Jiaxin Mao
机构: Renmin University of China (中国人民大学); Alibaba Group (阿里巴巴集团)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
备注: Code and models are available at this https URL

点击查看摘要

[NLP-79] ATLAS: Actor-Critic Task-Completion with Look-ahead Action Simulation NEURIPS2025

链接: https://arxiv.org/abs/2510.22732
作者: Jiali Cheng,Anjishnu Kumar,Roshan Lal,Rishi Rajasekaran,Hani Ramezani,Omar Zia Khan,Oleg Rokhlenko,Sunny Chiu-Webster,Gang Hua,Hadi Amiri
机构: University of Massachusetts Lowell (马萨诸塞大学洛厄尔分校); Amazon Alexa AI (亚马逊Alexa AI)
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Multiagent Systems (cs.MA); Robotics (cs.RO)
备注: 9 pages, NeurIPS 2025 Workshop on Language Agents and World Models

点击查看摘要

[NLP-80] Critical Insights into Leading Conversational AI Models

链接: https://arxiv.org/abs/2510.22729
作者: Urja Kohli(1),Aditi Singh(2),Arun Sharma(3) ((1) Department of Mechanical and Automation Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India, (2) Department of Electronics and Communication Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India, (3) Department of Information Technology, Indira Gandhi Delhi Technical University for Women, Delhi, India)
机构: 未知
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注: 21 pages, 7 tables, 3 figures. Open-access preprint intended for journal or conference submission

点击查看摘要

[NLP-81] Windsock is Dancing: Adaptive Multimodal Retrieval-Augmented Generation NEURIPS2025

链接: https://arxiv.org/abs/2510.22694
作者: Shu Zhao,Tianyi Shen,Nilesh Ahuja,Omesh Tickoo,Vijaykrishnan Narayanan
机构: The Pennsylvania State University (宾夕法尼亚州立大学); Intel (英特尔)
类目: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Information Retrieval (cs.IR)
备注: Accepted at NeurIPS 2025 UniReps Workshop

点击查看摘要

[NLP-82] SALSA: Single-pass Autoregressive LLM Structured Classification

链接: https://arxiv.org/abs/2510.22691
作者: Ruslan Berdichevsky,Shai Nahum-Gefen,Elad Ben Zaken
机构: Dream Security Ltd. (梦想安全有限公司)
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注:

点击查看摘要

[NLP-83] Rule-Based Explanations for Retrieval-Augmented LLM Systems

链接: https://arxiv.org/abs/2510.22689
作者: Joel Rorseth,Parke Godfrey,Lukasz Golab,Divesh Srivastava,Jarek Szlichta
机构: University of Waterloo (滑铁卢大学); York University (约克大学); AT&T Research (AT&T 研究院)
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-84] RoboSVG: A Unified Framework for Interactive SVG Generation with Multi-modal Guidance

链接: https://arxiv.org/abs/2510.22684
作者: Jiuniu Wang,Gongjie Zhang,Quanhao Qian,Junlong Gao,Deli Zhao,Ran Xu
机构: DAMO Academy (达摩院); Alibaba Group (阿里巴巴集团)
类目: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
备注: 15 pages, 5 figures

点击查看摘要

[NLP-85] Do Stop Me Now: Detecting Boilerplate Responses with a Single Iteration

链接: https://arxiv.org/abs/2510.22679
作者: Yuval Kainan,Shaked Zychlinski
机构: JFrog
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注: 13 pages, 4 figures

点击查看摘要

[NLP-86] Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views NEURIPS2025

链接: https://arxiv.org/abs/2510.22672
作者: Anna Deichler,Jonas Beskow
机构: KTH Royal Institute of Technology (皇家理工学院)
类目: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Robotics (cs.RO)
备注: 10 pages, 6 figures, 2 tables. Accepted to the NeurIPS 2025 Workshop on SPACE in Vision, Language, and Embodied AI (SpaVLE)

点击查看摘要

[NLP-87] Conjugate Relation Modeling for Few-Shot Knowledge Graph Completion

链接: https://arxiv.org/abs/2510.22656
作者: Zilong Wang,Qingtian Zeng,Hua Duan,Cheng Cheng,Minghao Zou,Ziyang Wang
机构: 未知
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-88] Culturally Grounded Physical Commonsense Reasoning in Italian and English: A Submission to the MRL 2025 Shared Task

链接: https://arxiv.org/abs/2510.22631
作者: Marco De Santis,Lisa Alazraki
机构: University of Udine (乌迪内大学); Imperial College London (帝国理工学院)
类目: Computation and Language (cs.CL)
备注: MRL 2025 Shared Task on Multilingual Physical Reasoning Datasets

点击查看摘要

[NLP-89] Integrating Linguistics and AI: Morphological Analysis and Corpus development of Endangered Toto Language of West Bengal

链接: https://arxiv.org/abs/2510.22629
作者: Ambalika Guha,Sajal Saha,Debanjan Ballav,Soumi Mitra,Hritwick Chakraborty
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-90] PerCoR: Evaluating Commonsense Reasoning in Persian via Multiple-Choice Sentence Completion AACL2025

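[Quick read]: This paper addresses the lack of a large-scale, high-quality benchmark for evaluating commonsense reasoning in Persian. The authors propose PerCoR (Persian Commonsense Reasoning), the first large-scale commonsense reasoning benchmark for Persian, containing 106K multiple-choice sentence-completion problems drawn from more than forty news, cultural, and other web sources. The solution rests on two innovations. The first is a conjunction-based segmentation strategy that produces structurally diverse and semantically coherent sentence-completion pairs. The second is DRESS-AF (Distractor Ranking via Embedding Similarity Scoring and Adversarial Filtering), which selects distractors from the pool of gold continuations using embedding-similarity scoring and adversarial filtering so as to maximize model confusion and raise task difficulty. Experiments show that the method is effective for Persian and also transfers to the English HellaSwag benchmark, increasing its difficulty without hurting human solvability.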

链接: https://arxiv.org/abs/2510.22616
作者: Morteza Alikhani,Mohammadtaha Bagherifard,Erfan Zinvandi,Mehran Sarmadi
机构: Sharif University of Technology (伊朗沙里夫理工大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: 20 pages, 17 figures, Accepted to IJCNLP-AACL 2025 (Main Conference)

点击查看摘要

Abstract:We introduce PerCoR (Persian Commonsense Reasoning), the first large-scale Persian benchmark for commonsense reasoning. PerCoR contains 106K multiple-choice sentence-completion problems drawn from more than forty news, cultural, and other web sources. We introduce a novel conjunction-based segmentation strategy to generate coherent sentence-completion pairs, enabling broad topical and structural diversity. To create challenging distractors, we propose DRESS-AF (Distractor Ranking via Embedding Similarity Scoring and Adversarial Filtering), a generation-free adversarial filtering method that selects distractors from the pool of gold continuations while maximising model confusion. Human annotators score 89% on PerCoR, while OpenAI-o3 achieves the highest performance at 92.18%, followed closely by Claude-Sonnet-3.7 (91.17%). The strongest open-source model, DeepSeek-R1, reaches 82.51%, underscoring both the dataset’s difficulty and the remaining performance gap in Persian commonsense reasoning. We further show that DRESS-AF transfers to the English HellaSwag benchmark, increasing its difficulty without hurting human solvability. The dataset is available at this https URL.
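The embedding-similarity ranking idea named in DRESS-AF can be illustrated with a small sketch. This is not the authors' implementation: the function names `cosine` and `rank_distractors`, the toy two-dimensional vectors, and the choice to rank candidates by cosine similarity to the gold continuation are all assumptions made for illustration, and the adversarial filtering stage is omitted entirely.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_distractors(gold_vec, pool, k):
    """Return the k candidate texts whose embeddings are closest to the gold one.

    pool is a list of (text, embedding) pairs taken from other items'
    gold continuations (hypothetical data layout, assumed for this sketch).
    """
    ranked = sorted(pool, key=lambda item: cosine(gold_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy example: hand-written vectors stand in for real sentence embeddings.
gold = [1.0, 0.0]
candidates = [("a", [1.0, 0.1]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
top_two = rank_distractors(gold, candidates, k=2)
```

In a real pipeline the vectors would come from a sentence-embedding model, and a filtering step would presumably discard candidates that are too easy for a model or not clearly wrong for a human.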

[NLP-91] Personal Care Utility (PCU): Building the Health Infrastructure for Everyday Insight and Guidance

链接: https://arxiv.org/abs/2510.22602
作者: Mahyar Abbasian,Ramesh Jain
机构: University of California, Irvine (加州大学欧文分校)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
备注: 22 pages, 2 figures, 1 table, Journal paper

点击查看摘要

[NLP-92] AutoBench: Automating LLM Evaluation through Reciprocal Peer Assessment

链接: https://arxiv.org/abs/2510.22593
作者: Dario Loi,Elena Maria Muià,Federico Siciliano,Giovanni Trappolini,Vincenzo Crisà,Peter Kruger,Fabrizio Silvestri
机构: Sapienza University of Rome (罗马大学); eZecute S.R.L.
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-93] ATOM: AdapTive and OptiMized dynamic temporal knowledge graph construction using LLMs

链接: https://arxiv.org/abs/2510.22590
作者: Yassir Lairgi,Ludovic Moncla,Khalid Benabdeslem,Rémy Cazabet,Pierre Cléau
机构: LIRIS, INSA Lyon, Université Claude Bernard Lyon 1, France; GAUC, Lyon, France
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
备注:

点击查看摘要

[NLP-94] Pedagogy-driven Evaluation of Generative AI-powered Intelligent Tutoring Systems

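[Quick read]: This paper addresses the lack of reliable, unified, and pedagogy-driven evaluation frameworks and benchmarks for Intelligent Tutoring Systems (ITSs) powered by generative AI. Existing evaluations of educational dialogue-based ITSs mostly rely on subjective protocols and non-standardized benchmarks, which leads to inconsistent results that are hard to generalize. The key to its solution is twofold. First, it steps back from mainstream ITS development to survey state-of-the-art evaluation practices and expose their challenges through real-world case studies. Second, building on insights from interdisciplinary Artificial Intelligence in Education (AIED) research, it proposes three practical, feasible, and theoretically grounded research directions rooted in learning science principles, with the goal of establishing fair, unified, and scalable evaluation methodologies for ITSs.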

链接: https://arxiv.org/abs/2510.22581
作者: Kaushal Kumar Maurya,Ekaterina Kochmar
机构: 未知
类目: Computation and Language (cs.CL)
备注: AIED 2025 (BlueSky)

点击查看摘要

Abstract:The interdisciplinary research domain of Artificial Intelligence in Education (AIED) has a long history of developing Intelligent Tutoring Systems (ITSs) by integrating insights from technological advancements, educational theories, and cognitive psychology. The remarkable success of generative AI (GenAI) models has accelerated the development of large language model (LLM)-powered ITSs, which have potential to imitate human-like, pedagogically rich, and cognitively demanding tutoring. However, the progress and impact of these systems remain largely untraceable due to the absence of reliable, universally accepted, and pedagogy-driven evaluation frameworks and benchmarks. Most existing educational dialogue-based ITS evaluations rely on subjective protocols and non-standardized benchmarks, leading to inconsistencies and limited generalizability. In this work, we take a step back from mainstream ITS development and provide comprehensive state-of-the-art evaluation practices, highlighting associated challenges through real-world case studies from careful and caring AIED research. Finally, building on insights from previous interdisciplinary AIED research, we propose three practical, feasible, and theoretically grounded research directions, rooted in learning science principles and aimed at establishing fair, unified, and scalable evaluation methodologies for ITSs.

[NLP-95] A Closed-Loop Personalized Learning Agent Integrating Neural Cognitive Diagnosis, Bounded-Ability Adaptive Testing, and LLM-Driven Feedback

链接: https://arxiv.org/abs/2510.22559
作者: Zhifeng Wang,Xinyue Zheng,Chunyan Zeng
机构: 未知
类目: Computation and Language (cs.CL)
备注: 8 pages, 6 figures

点击查看摘要

[NLP-96] SABlock: Semantic-Aware KV Cache Eviction with Adaptive Compression Block Size

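[Quick read]: This paper addresses the scalability bottleneck caused by the large memory footprint of the Key-Value (KV) cache in long-context Large Language Model (LLM) inference. Existing token-, block-, and sentence-level compression methods struggle to balance semantic coherence and memory efficiency. The key to its solution is the SABlock framework, which has three core components: 1) semantic segmentation that aligns compression boundaries with linguistic structures to improve semantic coherence; 2) a segment-guided token scoring mechanism that refines importance estimation; and 3) a budget-driven adaptive block-size search that determines the optimal block size under a given cache budget, preserving semantic integrity while improving compression efficiency. Experiments on several long-context benchmarks show that SABlock consistently outperforms state-of-the-art baselines under the same memory budgets.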

链接: https://arxiv.org/abs/2510.22556
作者: Jinhan Chen,Jianchun Liu,Hongli Xu,Xianjun Gao,Shilong Wang
机构: 未知
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:The growing memory footprint of the Key-Value (KV) cache poses a severe scalability bottleneck for long-context Large Language Model (LLM) inference. While KV cache eviction has emerged as an effective solution by discarding less critical tokens, existing token-, block-, and sentence-level compression methods struggle to balance semantic coherence and memory efficiency. To this end, we introduce SABlock, a semantic-aware KV cache eviction framework with adaptive block sizes. Specifically, SABlock first performs semantic segmentation to align compression boundaries with linguistic structures, then applies segment-guided token scoring to refine token importance estimation. Finally, for each segment, a budget-driven search strategy adaptively determines the optimal block size that preserves semantic integrity while improving compression efficiency under a given cache budget. Extensive experiments on long-context benchmarks demonstrate that SABlock consistently outperforms state-of-the-art baselines under the same memory budgets. For instance, on Needle-in-a-Haystack (NIAH), SABlock achieves 99.9% retrieval accuracy with only 96 KV entries, nearly matching the performance of the full-cache baseline that retains up to 8K entries. Under a fixed cache budget of 1,024, SABlock further reduces peak memory usage by 46.28% and achieves up to 9.5x faster decoding on a 128K context length.

[NLP-97] LooGLE v2: Are LLMs Ready for Real World Long Dependency Challenges? NEURIPS2025

链接: https://arxiv.org/abs/2510.22548
作者: Ziyuan He,Yuxuan Wang,Jiaqi Li,Kexin Liang,Muhan Zhang
机构: Peking University (北京大学); BIGAI (通用人工智能国家重点实验室); Beijing Institute of Technology (北京理工大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: NeurIPS 2025 Datasets and Benchmarks Track

点击查看摘要

[NLP-98] OFFSIDE: Benchmarking Unlearning Misinformation in Multimodal Large Language Models

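[Quick read]: This paper addresses a key data-privacy challenge for Multimodal Large Language Models (MLLMs): how to perform Machine Unlearning (MU) effectively, selectively removing specific information from a model, particularly in complex rumor scenarios involving both text and images. The core of its solution is OFFSIDE, a new benchmark for evaluating how well MLLMs unlearn misinformation from football transfer rumors. The benchmark contains 15.68K manually curated records covering 80 players and provides four test sets to measure forgetting efficacy, generalization, utility, and robustness. OFFSIDE also supports advanced settings such as selective unlearning and corrective relearning, as well as unimodal unlearning (forgetting only text data). The evaluation exposes notable weaknesses of current methods on multimodal rumors: unlearning efficacy is largely driven by catastrophic forgetting, visual rumors are hard to remove, unlearned information is easily recovered, and all methods are vulnerable to prompt attacks. These findings provide an empirical basis and direction for building more reliable multimodal unlearning methods.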

链接: https://arxiv.org/abs/2510.22535
作者: Hao Zheng,Zirui Pang,Ling li,Zhijie Deng,Yuhan Pu,Zhaowei Zhu,Xiaobo Xia,Jiaheng Wei
机构: Harbin Institute of Technology (哈尔滨工业大学); University of Illinois Urbana-Champaign (伊利诺伊大学厄巴纳-香槟分校); The Hong Kong University of Science and Technology (Guangzhou) (香港科技大学(广州)); BIAI, ZJUT & D5Data.ai; National University of Singapore (新加坡国立大学)
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Advances in Multimodal Large Language Models (MLLMs) intensify concerns about data privacy, making Machine Unlearning (MU), the selective removal of learned information, a critical necessity. However, existing MU benchmarks for MLLMs are limited by a lack of image diversity, potential inaccuracies, and insufficient evaluation scenarios, which fail to capture the complexity of real-world applications. To facilitate the development of MLLM unlearning and alleviate the aforementioned limitations, we introduce OFFSIDE, a novel benchmark for evaluating misinformation unlearning in MLLMs based on football transfer rumors. This manually curated dataset contains 15.68K records for 80 players, providing a comprehensive framework with four test sets to assess forgetting efficacy, generalization, utility, and robustness. OFFSIDE supports advanced settings like selective unlearning and corrective relearning, and crucially, unimodal unlearning (forgetting only text data). Our extensive evaluation of multiple baselines reveals key findings: (1) Unimodal methods (erasing text-based knowledge) fail on multimodal rumors; (2) Unlearning efficacy is largely driven by catastrophic forgetting; (3) All methods struggle with “visual rumors” (rumors that appear in the image); (4) The unlearned rumors can be easily recovered; and (5) All methods are vulnerable to prompt attacks. These results expose significant vulnerabilities in current approaches, highlighting the need for more robust multimodal unlearning solutions. The code is available at this https URL.

[NLP-99] Text to Trust: Evaluating Fine-Tuning and LoRA Trade-offs in Language Models for Unfair Terms of Service Detection

链接: https://arxiv.org/abs/2510.22531
作者: Noshitha Padma Pratyusha Juttu,Sahithi Singireddy,Sravani Gona,Sujal Timilsina
机构: University of Massachusetts Amherst (马萨诸塞大学阿默斯特分校)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 6 pages, including figures and tables. All experiments are reproducible. Code and fine-tuned models are publicly available on: GitHub: ( this https URL ) and Hugging Face: ( this https URL )

点击查看摘要

[NLP-100] Scalable Oversight via Partitioned Human Supervision

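[Quick read]: This paper addresses how to evaluate and train frontier AI systems that exceed human expert ability on tasks requiring deep knowledge of multiple domains. Because each human expert has only narrow, single-domain expertise and cannot supply complete ground-truth labels, conventional supervised approaches do not apply. The key idea is to use the weak signal that experts can provide: complementary labels, which indicate an option that is incorrect rather than the correct answer. On this basis the authors propose a scalable oversight framework, derive an unbiased estimator of top-1 accuracy, and quantify how many complementary labels are needed to match the variance of ordinary labels. They further introduce two mixed estimators that combine scarce ordinary labels with abundant complementary labels, with finite-sample deviation guarantees. Empirically, the method can evaluate LLM outputs without ground truth and can be used to train an agentic AI system under partitioned human supervision.

Link: https://arxiv.org/abs/2510.22500
Authors: Ren Yin, Takashi Ishida, Masashi Sugiyama
Affiliations: The University of Tokyo; RIKEN
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: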

Abstract:As artificial intelligence (AI) systems approach and surpass expert human performance across a broad range of tasks, obtaining high-quality human supervision for evaluation and training becomes increasingly challenging. Our focus is on tasks that require deep knowledge and skills of multiple domains. Unfortunately, even the best human experts are knowledgeable only in a single narrow area, and will not be able to evaluate the correctness of advanced AI systems on such superhuman tasks. However, based on their narrow expertise, humans may provide a weak signal, i.e., a complementary label indicating an option that is incorrect. For example, a cardiologist could state that "this is not related to cardiology," even if they cannot identify the true disease. Based on this weak signal, we propose a scalable oversight framework that enables us to evaluate frontier AI systems without the need to prepare the ground truth. We derive an unbiased estimator of top-1 accuracy from complementary labels and quantify how many complementary labels are needed to match the variance of ordinary labels. We further introduce two estimators to combine scarce ordinary labels with abundant complementary labels. We provide finite-sample deviation guarantees for both complementary-only and the mixed estimators. Empirically, we show that we can evaluate the output of large language models without the ground truth, if we have complementary labels. We further show that we can train an AI system with such weak signals: we show how we can design an agentic AI system automatically that can perform better with this partitioned human supervision. Our code is available at this https URL.
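
The accuracy estimate from complementary labels can be illustrated with a minimal sketch. It assumes that each complementary label is drawn uniformly from the K - 1 incorrect options; this is an assumption made here for illustration, and the paper's exact estimator and assumptions may differ. Under that assumption a prediction coincides with the ruled-out option with probability (1 - acc) / (K - 1), which gives acc = 1 - (K - 1) * P(match). The function name is hypothetical.

```python
def estimate_accuracy_from_complementary(predictions, complementary_labels, num_options):
    """Estimate top-1 accuracy from complementary ("this is NOT the answer") labels.

    Assumes each complementary label is uniform over the num_options - 1
    incorrect options (illustrative assumption, not taken from the paper).
    """
    if len(predictions) != len(complementary_labels) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    # Fraction of predictions that hit the option the expert ruled out.
    matches = sum(p == c for p, c in zip(predictions, complementary_labels))
    match_rate = matches / len(predictions)
    # P(match) = (1 - acc) / (K - 1)  =>  acc = 1 - (K - 1) * P(match)
    return 1.0 - (num_options - 1) * match_rate


preds = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
comps = [1, 2, 3, 0, 1, 2, 3, 0, 1, 1]  # only the last prediction is ruled out
estimate = estimate_accuracy_from_complementary(preds, comps, num_options=4)
print(f"estimated accuracy: {estimate:.2f}")
```

On small samples an estimate of this form can fall outside [0, 1]; it is correct on average, not for every sample.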

[NLP-101] A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific Northwest English Corpus

Link: https://arxiv.org/abs/2510.22495
Authors: Michael Scott, Siyu Liang, Alicia Wassink, Gina-Anne Levow
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:

[NLP-102] The Limits of Data Scaling: Sub-token Utilization and Acoustic Saturation in Multilingual ASR

Link: https://arxiv.org/abs/2510.22492
Authors: Siyu Liang, Nicolas Ballier, Gina-Anne Levow, Richard Wright
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:

[NLP-103] Frustratingly Easy Task-aware Pruning for Large Language Models

Link: https://arxiv.org/abs/2510.22489
Authors: Yuanhe Tian, Junjie Liu, Xican Yang, Haishan Ye, Yan Song
Affiliations: Zhongguancun Institute of Artificial Intelligence; University of Science and Technology of China; Xi’an Jiaotong University
Categories: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: 8 pages, 3 figures

[NLP-104] The Tonogenesis Continuum in Tibetan: A Computational Investigation

Link: https://arxiv.org/abs/2510.22485
Authors: Siyu Liang, Zhaxi Zerong
Affiliations: University of Washington
Categories: Computation and Language (cs.CL)
Comments:

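[NLP-105] CHOIR: Collaborative Harmonization fOr Inference Robustness

[Quick read]: This paper addresses the problem that minor persona changes (such as pronoun substitutions) shift the reasoning trajectories of persona-assigned LLMs and lead to divergent sets of correct answers. Whereas prior work treats such variation as bias to be removed, the authors treat it as a constructive signal. Their CHOIR (Collaborative Harmonization fOr Inference Robustness) framework combines the reasoning outputs of multiple persona conditions at test time, dynamically balancing agreement and divergence among their reasoning paths to improve robustness. The key component is a collaborative decoding mechanism that requires no additional training and adaptively fuses persona-conditioned reasoning signals, improving performance stability across demographics, architectures, scales, and tasks.

Link: https://arxiv.org/abs/2510.22475
Authors: Xiangjue Dong, Cong Wang, Maria Teleki, Millennium Bismay, James Caverlee
Affiliations: Texas A&M University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: updated version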

Abstract:Persona-assigned Large Language Models (LLMs) can adopt diverse roles, enabling personalized and context-aware reasoning. However, even minor demographic perturbations in personas, such as simple pronoun changes, can alter reasoning trajectories, leading to divergent sets of correct answers. Instead of treating these variations as biases to be mitigated, we explore their potential as a constructive resource to improve reasoning robustness. We propose CHOIR (Collaborative Harmonization fOr Inference Robustness), a test-time framework that harmonizes multiple persona-conditioned reasoning signals into a unified prediction. CHOIR orchestrates a collaborative decoding process among counterfactual personas, dynamically balancing agreement and divergence in their reasoning paths. Experiments on various reasoning benchmarks demonstrate that CHOIR consistently enhances performance across demographics, model architectures, scales, and tasks - without additional training. Improvements reach up to 26.4% for individual demographic groups and 19.2% on average across five demographics. It remains effective even when base personas are suboptimal. By reframing persona variation as a constructive signal, CHOIR provides a scalable and generalizable approach to more reliable LLM reasoning.

[NLP-106] Modeling Hierarchical Thinking in Large Reasoning Models

Link: https://arxiv.org/abs/2510.22437
Authors: G M Shahariar, Ali Nazari, Erfan Shayegani, Nael Abu-Ghazaleh
Affiliations: University of California, Riverside
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments:

[NLP-107] Confabulations from ACL Publications (CAP): A Dataset for Scientific Hallucination Detection

Link: https://arxiv.org/abs/2510.22395
Authors: Federica Gamba, Aman Sinha, Timothee Mickus, Raul Vazquez, Patanjali Bhamidipati, Claudio Savelli, Ahana Chattopadhyay, Laura A. Zanella, Yash Kankanampati, Binesh Arakkal Remesh, Aryan Ashok Chandramania, Rohit Agarwal, Chuyuan Li, Ioana Buhnila, Radhika Mamidi
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:

[NLP-108] Label Smoothing Improves Gradient Ascent in LLM Unlearning

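[Quick read]: This paper addresses the instability of Gradient Ascent (GA) when used for unlearning in Large Language Models (LLMs): GA tends to drive updates in a divergent direction and substantially degrades model utility. The proposed solution, Smoothed Gradient Ascent (SGA), mixes the forget data with constructed normal data through a tunable smoothing rate, extending the update from the forget data alone to joint learning over forget and normal data. This yields more stable unlearning and better preserves utility. The paper gives theoretical guidance on choosing the optimal smoothing rate. Experiments show that SGA outperforms the original GA on all metrics across the benchmarks and reaches top-2 performance on several key metrics.

Link: https://arxiv.org/abs/2510.22376
Authors: Zirui Pang, Hao Zheng, Zhijie Deng, Ling Li, Zixin Zhong, Jiaheng Wei
Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Harbin Institute of Technology (Weihai)
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments: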

Abstract:LLM unlearning has emerged as a promising approach, aiming to enable models to forget hazardous/undesired knowledge at low cost while preserving as much model utility as possible. Among existing techniques, the most straightforward method is performing Gradient Ascent (GA) w.r.t. the forget data, thereby forcing the model to unlearn the forget dataset. However, GA suffers from severe instability, as it drives updates in a divergent direction, often resulting in drastically degraded model utility. To address this issue, we propose Smoothed Gradient Ascent (SGA). SGA combines the forget data with multiple constructed normal data through a tunable smoothing rate. Intuitively, this extends GA from learning solely on the forget data to jointly learning across both forget and normal data, enabling more stable unlearning while better preserving model utility. Theoretically, we provide guidance on the selection of the optimal smoothing rate. Empirically, we evaluate SGA on three benchmarks: TOFU, Harry Potter, and MUSE-NEWS. Experimental results demonstrate that SGA consistently outperforms the original Gradient Ascent (GA) method across all metrics and achieves top-2 performance among all baseline methods on several key metrics.

[NLP-109] VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations

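[Quick read]: This paper addresses the limited ability of Multimodal Large Language Models (MLLMs) to assess the aesthetics and quality of visualizations. No systematic benchmark existed, and even advanced models such as GPT-5 fall well short of human experts when jointly judging data encoding accuracy, information expressiveness, and visual aesthetics, with a Mean Absolute Error (MAE) of 0.551 and a correlation with human ratings of only 0.429. The authors propose VisJudge-Bench, the first comprehensive benchmark for this task, containing 3,090 expert-annotated samples from real-world scenarios, and VisJudge, a model designed specifically for visualization assessment. VisJudge narrows the gap with human judgment: MAE falls to 0.442 (a 19.8% relative reduction) and correlation rises to 0.681 (a 58.7% relative improvement).

Link: https://arxiv.org/abs/2510.22373
Authors: Yupeng Xie, Zhiyang Zhang, Yifan Wu, Sirong Lu, Jiayi Zhang, Zhaoyang Yu, Jinlin Wang, Sirui Hong, Bang Liu, Chenglin Wu, Yuyu Luo
Affiliations: The Hong Kong University of Science and Technology (Guangzhou); DeepWisdom; Université de Montréal & Mila
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: 53 pages, 26 figures, 5 tables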

Abstract:Visualization, a domain-specific yet widely used form of imagery, is an effective way to turn complex datasets into intuitive insights, and its value depends on whether data are faithfully represented, clearly communicated, and aesthetically designed. However, evaluating visualization quality is challenging: unlike natural images, it requires simultaneous judgment across data encoding accuracy, information expressiveness, and visual aesthetics. Although multimodal large language models (MLLMs) have shown promising performance in aesthetic assessment of natural images, no systematic benchmark exists for measuring their capabilities in evaluating visualizations. To address this, we propose VisJudge-Bench, the first comprehensive benchmark for evaluating MLLMs’ performance in assessing visualization aesthetics and quality. It contains 3,090 expert-annotated samples from real-world scenarios, covering single visualizations, multiple visualizations, and dashboards across 32 chart types. Systematic testing on this benchmark reveals that even the most advanced MLLMs (such as GPT-5) still exhibit significant gaps compared to human experts in judgment, with a Mean Absolute Error (MAE) of 0.551 and a correlation with human ratings of only 0.429. To address this issue, we propose VisJudge, a model specifically designed for visualization aesthetics and quality assessment. Experimental results demonstrate that VisJudge significantly narrows the gap with human judgment, reducing the MAE to 0.442 (a 19.8% reduction) and increasing the consistency with human experts to 0.681 (a 58.7% improvement) compared to GPT-5. The benchmark is available at this https URL.
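
The two relative figures quoted in the abstract can be checked directly from the reported absolute values. The short sketch below only redoes that arithmetic; the helper name is hypothetical.

```python
def relative_change(baseline, new):
    """Signed relative change from baseline to new, as a fraction of baseline."""
    return (new - baseline) / baseline


mae_change = relative_change(0.551, 0.442)   # GPT-5 MAE -> VisJudge MAE
corr_change = relative_change(0.429, 0.681)  # GPT-5 correlation -> VisJudge correlation

print(f"MAE change: {mae_change:+.1%}")           # MAE change: -19.8%
print(f"Correlation change: {corr_change:+.1%}")  # Correlation change: +58.7%
```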

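[NLP-110] Reasoning Models Reason Well Until They Don't

[Quick read]: This paper examines the sharp drop in performance that Large Language Models (LLMs) show once reasoning problems exceed a certain complexity. The authors revisit this finding for Large Reasoning Models (LRMs), that is, LLMs fine-tuned with incentives for step-by-step argumentation and self-verification. Although LRMs perform well on existing benchmarks such as NLGraph, those benchmarks turn out to have limited complexity. On DeepRD, a new dataset with scalable complexity, LRM performance drops abruptly on sufficiently complex graph connectivity and natural language proof planning tasks and does not generalize. LRMs remain useful for most common real-world cases, but their generalization is bounded by the complexity distribution of the training data, which calls for new methods that generalize beyond it.

Link: https://arxiv.org/abs/2510.22371
Authors: Revanth Rameshkumar, Jimson Huang, Yunxin Sun, Fei Xia, Abulhair Saparov
Affiliations: University of Washington; Purdue University
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: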

Abstract:Large language models (LLMs) have shown significant progress in reasoning tasks. However, recent studies show that transformers and LLMs fail catastrophically once reasoning problems exceed modest complexity. We revisit these findings through the lens of large reasoning models (LRMs) – LLMs fine-tuned with incentives for step-by-step argumentation and self-verification. LRM performance on graph and reasoning benchmarks such as NLGraph seems extraordinary, with some even claiming they are capable of generalized reasoning and innovation in reasoning-intensive fields such as mathematics, physics, medicine, and law. However, by more carefully scaling the complexity of reasoning problems, we show existing benchmarks actually have limited complexity. We develop a new dataset, the Deep Reasoning Dataset (DeepRD), along with a generative process for producing unlimited examples of scalable complexity. We use this dataset to evaluate model performance on graph connectivity and natural language proof planning. We find that the performance of LRMs drops abruptly at sufficient complexity and does not generalize. We also relate our LRM results to the distributions of the complexities of large, real-world knowledge graphs, interaction graphs, and proof datasets. We find the majority of real-world examples fall inside the LRMs’ success regime, yet the long tails expose substantial failure potential. Our analysis highlights the near-term utility of LRMs while underscoring the need for new methods that generalize beyond the complexity of examples in the training distribution.
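
To make "graph connectivity problems of scalable complexity" concrete, here is a minimal sketch of a problem generator with a breadth-first-search checker. It is a hypothetical illustration, not the DeepRD generative process: the function names, the dead-end distractor scheme, and the use of path length as the complexity knob are all assumptions.

```python
from collections import deque
import random


def make_problem(path_length, num_distractors, seed=0):
    """Build a directed graph with one source-to-target path plus distractor edges.

    Hypothetical generator for illustration; not the DeepRD process.
    """
    rng = random.Random(seed)
    # Nodes 0..path_length form the true path from source 0 to the target.
    edges = [(i, i + 1) for i in range(path_length)]
    next_node = path_length + 1
    for _ in range(num_distractors):
        # Each distractor is a dead-end branch off a random non-target path node.
        edges.append((rng.randrange(path_length), next_node))
        next_node += 1
    rng.shuffle(edges)
    return edges, 0, path_length


def is_connected(edges, source, target):
    """Breadth-first search: is target reachable from source?"""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


edges, source, target = make_problem(path_length=8, num_distractors=5)
reachable = is_connected(edges, source, target)    # True by construction
unreachable = is_connected(edges, target, source)  # False: edges are directed
```

Raising `path_length` lengthens the chain of hops a solver must follow, which is one simple way to scale difficulty while keeping the ground truth checkable.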

[NLP-111] GigaEmbeddings: Efficient Russian Language Embedding Model

Link: https://arxiv.org/abs/2510.22369
Authors: Egor Kolodin, Daria Khomich, Nikita Savushkin, Anastasia Ianina, Fyodor Minkin
Affiliations: MIPT (Moscow Institute of Physics and Technology); SaluteDevices; MSU (Moscow State University); Wildberries
Categories: Computation and Language (cs.CL)
Comments:

[NLP-112] Mapping Faithful Reasoning in Language Models NEURIPS2025

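[Quick read]: This paper addresses the possibility that chain-of-thought (CoT) traces in large language models are unfaithful: the generated reasoning may not reflect the model's internal computation, so practitioners can mistake decorative reasoning for genuine reasoning. The authors propose Concept Walk, a framework that projects each reasoning step, in activation space, onto a concept direction learned from contrastive data, and tracks how the model's internal state evolves with respect to that concept. This distinguishes traces that actually shape the output (faithful reasoning) from traces the model ignores (decorative reasoning). The method offers a quantifiable, internal view for assessing CoT trustworthiness and helps identify when reasoning traces can be trusted and when they risk misleading.

Link: https://arxiv.org/abs/2510.22362
Authors: Jiazheng Li, Andreas Damianou, J Rosser, José Luis Redondo García, Konstantina Palla
Affiliations: King’s College London; Spotify; University of Oxford
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments: 9 pages. Accepted to the Mechanistic Interpretability Workshop at NeurIPS 2025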

Abstract:Chain-of-thought (CoT) traces promise transparency for reasoning language models, but prior work shows they are not always faithful reflections of internal computation. This raises challenges for oversight: practitioners may misinterpret decorative reasoning as genuine. We introduce Concept Walk, a general framework for tracing how a model’s internal stance evolves with respect to a concept direction during reasoning. Unlike surface text, Concept Walk operates in activation space, projecting each reasoning step onto the concept direction learned from contrastive data. This allows us to observe whether reasoning traces shape outcomes or are discarded. As a case study, we apply Concept Walk to the domain of Safety using Qwen 3-4B. We find that in ‘easy’ cases, perturbed CoTs are quickly ignored, indicating decorative reasoning, whereas in ‘hard’ cases, perturbations induce sustained shifts in internal activations, consistent with faithful reasoning. The contribution is methodological: Concept Walk provides a lens to re-examine faithfulness through concept-specific internal dynamics, helping identify when reasoning traces can be trusted and when they risk misleading practitioners.
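
The projection step described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the concept direction is taken as the normalized difference of class means over contrastive activations, which is one common choice and not necessarily the paper's, and the 2-D vectors stand in for real hidden states. All function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def concept_direction(positive, negative):
    """Unit vector from the negative-class mean to the positive-class mean.

    Difference of means is an assumption made for illustration.
    """
    diff = [p - n for p, n in zip(mean_vector(positive), mean_vector(negative))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def concept_walk(step_activations, direction):
    """Project each reasoning step's activation onto the concept direction."""
    return [sum(a * d for a, d in zip(act, direction)) for act in step_activations]


# Toy 2-D "activations"; real ones would come from a model's hidden states.
positive = [[2.0, 0.0], [4.0, 0.0]]
negative = [[0.0, 0.0], [0.0, 0.0]]
direction = concept_direction(positive, negative)  # [1.0, 0.0]
walk = concept_walk([[0.5, 1.0], [1.5, -2.0], [3.0, 0.0]], direction)
```

Plotting `walk` against the step index gives a trajectory: a perturbation that leaves it flat suggests the trace was ignored, while a sustained shift suggests it influenced the internal state.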

[NLP-113] Irony Detection in Urdu Text: A Comparative Study Using Machine Learning Models and Large Language Models

[Quick read]: This paper addresses irony detection, a challenging NLP task, for Urdu, a language that differs markedly from English in syntax and cultural context. The authors translate an English Ironic Corpus into Urdu, evaluate machine learning algorithms with GloVe and Word2Vec embeddings, and fine-tune transformer-based models (BERT, RoBERTa, LLaMA 2 (7B), LLaMA 3 (8B), and Mistral). LLaMA 3 (8B) achieves the highest F1-score, 94.61%, which supports the combination of corpus translation and fine-tuned pretrained models for irony detection in a low-resource language.

Link: https://arxiv.org/abs/2510.22356
Authors: Fiaz Ahmad, Nisar Hussain, Amna Qasim, Momina Hafeez, Muhammad Usman, Grigori Sidorov, Alexander Gelbukh
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: 5 pages, 3 figures

Abstract:Ironic identification is a challenging task in Natural Language Processing, particularly when dealing with languages that differ in syntax and cultural context. In this work, we aim to detect irony in Urdu by translating an English Ironic Corpus into the Urdu language. We evaluate ten state-of-the-art machine learning algorithms using GloVe and Word2Vec embeddings, and compare their performance with classical methods. Additionally, we fine-tune advanced transformer-based models, including BERT, RoBERTa, LLaMA 2 (7B), LLaMA 3 (8B), and Mistral, to assess the effectiveness of large-scale models in irony detection. Among machine learning models, Gradient Boosting achieved the best performance with an F1-score of 89.18%. Among transformer-based models, LLaMA 3 (8B) achieved the highest performance with an F1-score of 94.61%. These results demonstrate that combining transliteration techniques with modern NLP models enables robust irony detection in Urdu, a historically low-resource language.

[NLP-114] FAIR-RAG: Faithful Adaptive Iterative Refinement for Retrieval-Augmented Generation

Link: https://arxiv.org/abs/2510.22344
Authors: Mohammad Aghajani Asl, Majid Asgari-Bidhendi, Behrooz Minaei-Bidgoli
Affiliations: Sharif University of Technology; Iran University of Science and Technology; Noor Avaran Jelvehaye Maanaei Najm Co., Ltd.
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Comments: 30 pages, 5 figures, 5 tables. Keywords: Retrieval-Augmented Generation (RAG), Large Language Models (LLMs), Agentic AI, Multi-hop Question Answering, Faithfulness

[NLP-115] DynaSolidGeo: A Dynamic Benchmark for Genuine Spatial Mathematical Reasoning of VLMs in Solid Geometry

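[Quick read]: This paper addresses three shortcomings of current multimodal mathematical reasoning benchmarks with respect to solid geometry: they focus mainly on 2D plane geometry and do not assess 3D spatial reasoning; they are static and prone to data contamination and memorization; and they score only final answers, ignoring the quality of the reasoning process. The authors propose DynaSolidGeo, the first dynamic benchmark of this kind. Built through a semi-automatic annotation pipeline, it contains 503 expert-curated seed questions that can dynamically generate an unbounded number of diverse text-visual instances, which mitigates data leakage. It also adds process evaluation based on expert-annotated reasoning chains, measuring logical validity and causal coherence, to better assess the spatial reasoning of Vision-Language Models (VLMs).

Link: https://arxiv.org/abs/2510.22340
Authors: Changti Wu, Shijie Lian, Zihao Liu, Lei Zhang, Laurence Tianruo Yang, Kai Chen
Affiliations: East China Normal University; Zhongguancun Academy; Huazhong University of Science and Technology; Peking University; Zhengzhou University; Zhongguancun Institute of Artificial Intelligence
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: The code and dataset are available at this https URL (DynaSolidGeo)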

Abstract:Solid geometry problem solving demands spatial mathematical reasoning that integrates spatial intelligence and symbolic reasoning. However, most existing multimodal mathematical reasoning benchmarks focus primarily on 2D plane geometry, rely on static datasets prone to data contamination and memorization, and evaluate models solely by final answers, overlooking the reasoning process. To address these limitations, we introduce DynaSolidGeo, the first dynamic benchmark for evaluating genuine spatial reasoning in Vision-Language Models (VLMs). Constructed through a semi-automatic annotation pipeline, DynaSolidGeo contains 503 expert-curated seed questions that can, in principle, dynamically generate an unbounded number of diverse multimodal text-visual instances. Beyond answer accuracy, we incorporate process evaluation based on expert-annotated reasoning chains to measure logical validity and causal coherence. Experiments across representative open-source and closed-source VLMs reveal large performance gaps, severe degradation in dynamic settings, and poor performance on tasks requiring high-level spatial intelligence, such as mental rotation and visualization. The code and dataset are available at this https URL (DynaSolidGeo).
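
The idea of a seed question that generates fresh instances can be sketched as a parameterized template. The example below is hypothetical and text-only: the cuboid-diagonal question, the parameter ranges, and the function name are assumptions, and the actual benchmark also renders figures and carries expert reasoning chains.

```python
import math
import random


def cuboid_diagonal_instance(seed):
    """Instantiate one seed question with random edge lengths and compute its answer."""
    rng = random.Random(seed)
    a, b, c = (rng.randint(1, 9) for _ in range(3))
    question = (
        f"A cuboid has edge lengths {a}, {b}, and {c}. "
        "What is the length of its space diagonal?"
    )
    # Space diagonal of a cuboid: sqrt(a^2 + b^2 + c^2).
    answer = math.sqrt(a * a + b * b + c * c)
    return question, answer, (a, b, c)


instances = [cuboid_diagonal_instance(seed) for seed in range(3)]
```

Because each instance is derived from a seed, the evaluator can regenerate it exactly while a model cannot rely on having memorized a fixed question set.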

[NLP-116] Multilingual Target-Stance Extraction LREC2026

Link: https://arxiv.org/abs/2510.22334
Authors: Ethan Mines, Bonnie Dorr
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 11 pages, 2 figures. Submitted to the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

[NLP-117] Memory-based Language Models: An Efficient Explainable and Eco-friendly Approach to Large Language Modeling

Link: https://arxiv.org/abs/2510.22317
Authors: Antal van den Bosch, Ainhoa Risco Patón, Teun Buijse, Peter Berck, Maarten van Gompel
Affiliations: Utrecht University; Lund University; Royal Netherlands Academy of Arts and Sciences
Categories: Computation and Language (cs.CL)
Comments: 15 pages, 11 figures

[NLP-118] VietLyrics: A Large-Scale Dataset and Models for Vietnamese Automatic Lyrics Transcription

Link: https://arxiv.org/abs/2510.22295
Authors: Quoc Anh Nguyen, Bernard Cheng, Kelvin Soh
Affiliations: National University of Singapore
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments:

[NLP-119] Supervised Fine-Tuning or In-Context Learning? Evaluating LLMs for Clinical NER

Link: https://arxiv.org/abs/2510.22285
Authors: Andrei Baroian
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Work done in November - December 2024

[NLP-120] CityRiSE: Reasoning Urban Socio-Economic Status in Vision-Language Models via Reinforcement Learning

[Quick read]: This paper addresses the limited accuracy and interpretability of Large Vision-Language Models (LVLMs) when predicting urban socio-economic status from multimodal data such as street view and satellite imagery. The authors propose CityRiSE, a framework that uses pure reinforcement learning (RL) to guide the LVLM toward semantically meaningful visual cues, enabling structured, goal-oriented reasoning that improves prediction accuracy and generalization across cities and indicators.

Link: https://arxiv.org/abs/2510.22282
Authors: Tianhui Liu, Hetian Pang, Xin Zhang, Jie Feng, Yong Li, Pan Hui
Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Tsinghua University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments:

Abstract:Harnessing publicly available, large-scale web data, such as street view and satellite imagery, urban socio-economic sensing is of paramount importance for achieving global sustainable development goals. With the emergence of Large Vision-Language Models (LVLMs), new opportunities have arisen to solve this task by treating it as a multi-modal perception and understanding problem. However, recent studies reveal that LVLMs still struggle with accurate and interpretable socio-economic predictions from visual data. To address these limitations and maximize the potential of LVLMs, we introduce CityRiSE, a novel framework for Reasoning urban Socio-Economic status in LVLMs through pure reinforcement learning (RL). With carefully curated multi-modal data and verifiable reward design, our approach guides the LVLM to focus on semantically meaningful visual cues, enabling structured and goal-oriented reasoning for generalist socio-economic status prediction. Experiments demonstrate that CityRiSE with emergent reasoning process significantly outperforms existing baselines, improving both prediction accuracy and generalization across diverse urban contexts, particularly for prediction on unseen cities and unseen indicators. This work highlights the promise of combining RL and LVLMs for interpretable and generalist urban socio-economic sensing.

[NLP-121] WAON: Large-Scale and High-Quality Japanese Image-Text Pair Dataset for Vision-Language Models

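[Quick read]: This paper addresses the small scale and limited quality of existing Japanese image-text pair datasets, which constrain Vision-Language Models (VLMs) on tasks involving Japanese culture. The authors build WAON, a large-scale, high-quality Japanese image-text pair dataset of approximately 155 million examples collected from Common Crawl and cleaned with filtering and deduplication. They also construct WAON-Bench, a manually curated benchmark for Japanese cultural image classification. A SigLIP2 model fine-tuned on WAON achieves state-of-the-art performance on several Japanese cultural benchmarks.

Link: https://arxiv.org/abs/2510.22276
Authors: Issa Sugiura, Shuhei Kurita, Yusuke Oda, Daisuke Kawahara, Yasuo Okabe, Naoaki Okazaki
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Comments: 9 pages, 5 figures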

Abstract:Large-scale and high-quality image-text pair datasets play an important role in developing high-performing Vision-Language Models (VLMs). In this work, we introduce WAON, a large-scale and high-quality Japanese image-text pair dataset containing approximately 155 million examples, collected from Common Crawl. Our dataset construction pipeline employs various techniques, including filtering and deduplication, which have been shown to be effective in previous studies. To evaluate its effectiveness, we also construct WAON-Bench, a manually curated benchmark for Japanese cultural image classification, consisting of 374 classes. To assess the effectiveness of our dataset, we conduct experiments using both WAON and the Japanese subset of ReLAION, one of the most widely used vision-language datasets. We fine-tune SigLIP2, a strong multilingual model, on both datasets. The results demonstrate that WAON enhances model performance on WAON-Bench more efficiently than ReLAION and achieves higher accuracy across all evaluated benchmarks. Furthermore, the model fine-tuned on WAON achieves state-of-the-art performance on several Japanese cultural benchmarks. We release our dataset, model, and code at this https URL.

[NLP-122] From Slides to Chatbots: Enhancing Large Language Models with University Course Materials

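[Quick read]: This paper addresses the limited accuracy of Large Language Models (LLMs) when answering questions in university-level computer science courses. The central challenge is making effective use of course materials, such as lecture slides and lecture transcripts, to supply course-specific knowledge. The authors compare two strategies, Retrieval-Augmented Generation (RAG) and Continual Pre-Training (CPT), and additionally explore a multi-modal RAG approach that presents retrieved slides to the generator as images. Given the relatively small size of course materials, RAG is more effective and efficient than CPT, and presenting slides as images significantly improves performance over text-only retrieval.

Link: https://arxiv.org/abs/2510.22272
Authors: Tu Anh Dinh, Philipp Nicolas Schumacher, Jan Niehues
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: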

Abstract:Large Language Models (LLMs) have advanced rapidly in recent years. One application of LLMs is to support student learning in educational settings. However, prior work has shown that LLMs still struggle to answer questions accurately within university-level computer science courses. In this work, we investigate how incorporating university course materials can enhance LLM performance in this setting. A key challenge lies in leveraging diverse course materials such as lecture slides and transcripts, which differ substantially from typical textual corpora: slides also contain visual elements like images and formulas, while transcripts contain spoken, less structured language. We compare two strategies, Retrieval-Augmented Generation (RAG) and Continual Pre-Training (CPT), to extend LLMs with course-specific knowledge. For lecture slides, we further explore a multi-modal RAG approach, where we present the retrieved content to the generator in image form. Our experiments reveal that, given the relatively small size of university course materials, RAG is more effective and efficient than CPT. Moreover, incorporating slides as images in the multi-modal setting significantly improves performance over text-only retrieval. These findings highlight practical strategies for developing AI assistants that better support learning and teaching, and we hope they inspire similar efforts in other educational contexts.

[NLP-123] PatenTEB: A Comprehensive Benchmark and Model Family for Patent Text Embedding

Link: https://arxiv.org/abs/2510.22264
Authors: Iliass Ayaou, Denis Cavallucci
Affiliations: ICUBE Laboratory; INSA Strasbourg
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Comments:

[NLP-124] SteerX: Disentangled Steering for LLM Personalization

Link: https://arxiv.org/abs/2510.22256
Authors: Xiaoyan Zhao, Ming Yan, Yilun Qiu, Haoting Ni, Yang Zhang, Fuli Feng, Hong Cheng, Tat-Seng Chua
Affiliations: The Chinese University of Hong Kong; University of Science and Technology of China; National University of Singapore
Categories: Computation and Language (cs.CL)
Comments:

[NLP-125] PACR: Progressively Ascending Confidence Reward for LLM Reasoning

[Quick read]: This paper addresses inefficient exploration in Reinforcement Learning with Verifiable Rewards (RLVR) for LLM reasoning: the sparse, outcome-based reward gives no guidance for intermediate reasoning steps. The authors propose Progressively Ascending Confidence Reward (PACR), a dense, model-intrinsic reward computed directly from changes in the model's belief in the correct answer. Its core assumption is that, along a well-formed reasoning trajectory, the probability of the ground-truth answer should generally rise. This inductive bias constrains exploration to regions richer in logically sound reasoning, which accelerates exploration and makes training more effective and reliable.

Link: https://arxiv.org/abs/2510.22255
Authors: Eunseop Yoon, Hee Suk Yoon, Jaehyun Jang, SooHwan Eom, Qi Dai, Chong Luo, Mark A. Hasegawa-Johnson, Chang D. Yoo
Affiliations: Korea Advanced Institute of Science and Technology (KAIST); Microsoft Research Asia (MSRA); University of Illinois at Urbana-Champaign (UIUC)
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: 16 pages, 14 figures

Abstract:Reinforcement Learning with Verifiable Rewards (RLVR) has significantly improved LLM reasoning, but its sparse, outcome-based reward provides no guidance for intermediate steps, slowing exploration. We propose Progressively Ascending Confidence Reward (PACR), a dense, model-intrinsic reward computed directly from the model’s evolving belief in the correct answer. PACR encodes the inductive bias that, along a well-formed reasoning trajectory, the probability of the ground-truth answer should have a generally ascending trend. We provide empirical and theoretical analysis validating that such an inductive bias constrains the exploration search space to regions richer in logically sound reasoning. We demonstrate that PACR accelerates exploration, reaches reward saturation with fewer trajectories, and yields improvements on multiple benchmarks. Our results suggest that dense, model-intrinsic shaping signals can make RLVR training more effective and reliable.
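
A dense reward derived from the model's evolving belief can be sketched as follows. This is an illustrative assumption, not the paper's definition: here the step reward is the change in log-probability of the ground-truth answer between consecutive reasoning steps, and the probabilities are made-up numbers. The function name is hypothetical.

```python
import math


def ascending_confidence_rewards(answer_probs):
    """Per-step rewards from the change in log-probability of the true answer.

    answer_probs[t] is the model's probability of the ground-truth answer
    after t reasoning steps (index 0 = before any step). Using the log-prob
    difference as the reward is an illustrative assumption.
    """
    if any(p <= 0.0 or p > 1.0 for p in answer_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return [
        math.log(current) - math.log(previous)
        for previous, current in zip(answer_probs, answer_probs[1:])
    ]


# Hypothetical trajectory: confidence rises, dips at step 3, then recovers.
probs = [0.10, 0.20, 0.40, 0.30, 0.60]
rewards = ascending_confidence_rewards(probs)
ascending_steps = sum(r > 0 for r in rewards)
```

With this choice the rewards telescope: their sum equals the total gain in log-probability over the trajectory, so a step that lowers confidence is penalized without changing what the full trajectory earns.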

[NLP-126] You Don't Need Prompt Engineering Anymore: The Prompting Inversion

Link: https://arxiv.org/abs/2510.22251
Authors: Imran Khan (Independent Researcher)
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 17 pages, 1 figure, 6 tables. Code and experimental data available at this https URL

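[NLP-127] PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading

[Quick read]: This paper addresses the limited reliability of Large Language Models (LLMs) in scholarly tasks, focusing on citation retrieval, content extraction, paper discovery, and claim verification. Under realistic usage conditions (web interfaces where search operations are opaque to the user), GPT-4o, GPT-5, and Gemini-2.5-Flash fail consistently: citation retrieval fails in 48-98% of multi-reference queries, section-specific content extraction fails in 72-91% of cases, and paper discovery yields F1 scores below 0.32 while missing over 60% of relevant literature. The failures are attributed to uncontrolled expansion of retrieved context and to the models' tendency to prioritize semantically relevant text over task instructions. The authors introduce the PaperAsk benchmark and train lightweight reliability classifiers on its data to identify unreliable outputs, providing a reproducible, diagnostic framework for evaluating LLM-based scholarly assistants.

Link: https://arxiv.org/abs/2510.22242
Authors: Yutao Wu, Xiao Liu, Yunhao Feng, Jiale Ding, Xingjun Ma
Affiliations: Unknown
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: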

Abstract:Large Language Models (LLMs) increasingly serve as research assistants, yet their reliability in scholarly tasks remains under-evaluated. In this work, we introduce PaperAsk, a benchmark that systematically evaluates LLMs across four key research tasks: citation retrieval, content extraction, paper discovery, and claim verification. We evaluate GPT-4o, GPT-5, and Gemini-2.5-Flash under realistic usage conditions, via web interfaces where search operations are opaque to the user. Through controlled experiments, we find consistent reliability failures: citation retrieval fails in 48-98% of multi-reference queries, section-specific content extraction fails in 72-91% of cases, and topical paper discovery yields F1 scores below 0.32, missing over 60% of relevant literature. Further human analysis attributes these failures to the uncontrolled expansion of retrieved context and the tendency of LLMs to prioritize semantically relevant text over task instructions. Across basic tasks, the LLMs display distinct failure behaviors: ChatGPT often withholds responses rather than risk errors, whereas Gemini produces fluent but fabricated answers. To address these issues, we develop lightweight reliability classifiers trained on PaperAsk data to identify unreliable outputs. PaperAsk provides a reproducible and diagnostic framework for advancing the reliability evaluation of LLM-based scholarly assistance systems.
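
For readers unfamiliar with how an F1 score relates to "missing relevant literature", the sketch below computes set-based precision, recall, and F1 for one paper-discovery query. The paper identifiers and counts are made up for illustration, and the benchmark's own scoring procedure may differ in detail.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for a paper-discovery query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical query: 10 relevant papers; the assistant returns 8, of which 2 are relevant.
relevant = [f"paper-{i}" for i in range(10)]
retrieved = ["paper-0", "paper-1"] + [f"other-{i}" for i in range(6)]
precision, recall, f1 = precision_recall_f1(retrieved, relevant)
```

In this made-up case recall is 0.2, so 80% of the relevant papers are missed, and F1 is about 0.22.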

[NLP-128] Evolution of the lexicon: a probabilistic point of view

Link: https://arxiv.org/abs/2510.22220
Authors: Maurizio Serva
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Populations and Evolution (q-bio.PE)
Comments:

[NLP-129] Estimating the Error of Large Language Models at Pairwise Text Comparison

Link: https://arxiv.org/abs/2510.22219
Authors: Tianyi Li
Affiliations: CUHK (The Chinese University of Hong Kong)
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Probability (math.PR)
Comments: 14 pages, 6 figures

[NLP-130] DETECT: Determining Ease and Textual Clarity of German Text Simplifications

Link: https://arxiv.org/abs/2510.22212
Authors: Maria Korobeynikova,Alessia Battisti,Lukas Fischer,Yingqiang Gao
Affiliations: University of Zurich
Categories: Computation and Language (cs.CL)
Comments:

[NLP-131] The Lossy Horizon: Error-Bounded Predictive Coding for Lossy Text Compression (Episode I)

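[Quick Read]: This paper addresses how to achieve higher compression ratios for text while retaining high reconstruction quality in the lossy setting. Conventional approaches raise the compression rate by discarding part of the information, but offer little flexible control over the trade-off between distortion and bit rate. The key to the solution is Error-Bounded Predictive Coding (EPC), a lossy text codec that uses a Masked Language Model (MLM) as the decompressor: the model predicts the masked content, and minimal rank-based corrections are stored only when its top prediction is incorrect, which creates a residual channel with continuous rate-distortion control. By making more efficient use of the model's intrinsic knowledge, EPC consistently dominates the simpler Predictive Masking (PM) baseline, delivering higher reconstruction fidelity at a lower bit rate.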

Link: https://arxiv.org/abs/2510.22207
Authors: Nnamdi Aghanya,Jun Li,Kewei Wang
Affiliations: Cranfield University
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL); Information Theory (cs.IT)
Comments: 12 pages, 7 figures

Abstract:Large Language Models (LLMs) can achieve near-optimal lossless compression by acting as powerful probability models. We investigate their use in the lossy domain, where reconstruction fidelity is traded for higher compression ratios. This paper introduces Error-Bounded Predictive Coding (EPC), a lossy text codec that leverages a Masked Language Model (MLM) as a decompressor. Instead of storing a subset of original tokens, EPC allows the model to predict masked content and stores minimal, rank-based corrections only when the model’s top prediction is incorrect. This creates a residual channel that offers continuous rate-distortion control. We compare EPC to a simpler Predictive Masking (PM) baseline and a transform-based Vector Quantisation with a Residual Patch (VQ+RE) approach. Through an evaluation that includes precise bit accounting and rate-distortion analysis, we demonstrate that EPC consistently dominates PM, offering superior fidelity at a significantly lower bit rate by more efficiently utilising the model’s intrinsic knowledge.
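The rank-based residual channel described in the abstract can be illustrated with a toy sketch. This is not the authors' codec: the bigram-frequency `build_ranker` is a hypothetical stand-in for the MLM, and the `max_rank` knob (dropping corrections that would be expensive to store) is an assumed way of trading fidelity for rate.

```python
from collections import Counter, defaultdict

def build_ranker(corpus_tokens):
    """Toy stand-in for an MLM (assumption): rank candidates for a slot by
    how often they followed the previous token in a reference corpus."""
    follow = defaultdict(Counter)
    for prev, cur in zip(corpus_tokens, corpus_tokens[1:]):
        follow[prev][cur] += 1
    vocab = sorted(set(corpus_tokens))

    def rank(prev_token):
        counts = follow[prev_token]
        # Most frequent first; ties broken alphabetically so that the
        # encoder and the decoder always agree on the ordering.
        return sorted(vocab, key=lambda t: (-counts[t], t))
    return rank

def encode(tokens, masked, rank, max_rank=None):
    """Keep unmasked tokens verbatim. For each masked slot, store the rank
    of the true token only when the top prediction (rank 0) is wrong.
    Tokens must come from the ranker's vocabulary."""
    assert 0 not in masked, "slot 0 has no left context in this toy model"
    kept = {i: t for i, t in enumerate(tokens) if i not in masked}
    corrections = {}
    recon = list(tokens)  # mirrors what the decoder will have reconstructed
    for i in sorted(masked):
        candidates = rank(recon[i - 1])
        r = candidates.index(tokens[i])
        if r == 0:
            continue  # top prediction is right: nothing to store
        if max_rank is not None and r > max_rank:
            recon[i] = candidates[0]  # accept the error (lossy, hypothetical knob)
            continue
        corrections[i] = r
    return len(tokens), kept, corrections

def decode(length, kept, corrections, rank):
    """Rebuild left to right: copy kept tokens, otherwise take the model's
    prediction at the stored rank (rank 0 when no correction exists)."""
    out = [None] * length
    for i in range(length):
        if i in kept:
            out[i] = kept[i]
        else:
            out[i] = rank(out[i - 1])[corrections.get(i, 0)]
    return out

corpus = "the cat sat on the mat the cat ate the fish".split()
rank = build_ranker(corpus)
message = "the cat sat on the mat".split()
length, kept, corrections = encode(message, {1, 3, 5}, rank)
print(corrections)
print(decode(length, kept, corrections, rank))
```

In this toy setting only the slot where the predictor's first guess is wrong costs any storage; lowering `max_rank` drops that correction and lets the decoder keep its wrong guess, which is the rate-distortion trade-off in miniature.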

[NLP-132] M-CIF: Multi-Scale Alignment For CIF-Based Non-Autoregressive ASR

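[Quick Read]: This paper addresses the unstable acoustic-text alignment that arises in non-autoregressive (NAR) speech recognition when finer-grained guidance is missing, which is most visible in languages such as English and French. The key to the solution is Multi-scale Continuous Integrate-and-Fire (M-CIF), which integrates character- and phoneme-level supervision and progressively distills it into subword representations, strengthening the progressive alignment from acoustic features to target tokens. Experiments show that M-CIF reduces word error rate (WER) relative to the Paraformer baseline, notably on CommonVoice by 4.21% in German and 3.05% in French. An analysis based on two newly defined metrics, phonetic confusion errors (PE) and space-related segmentation errors (SE), confirms that the phoneme and character layers are essential for alignment stability.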

Link: https://arxiv.org/abs/2510.22172
Authors: Ruixiang Mao,Xiangnan Ma,Qing Yang,Ziming Zhu,Yucheng Qiao,Yuan Ge,Tong Xiao,Shengxiang Gao,Zhengtao Yu,Jingbo Zhu
Affiliations: Unknown
Categories: Sound (cs.SD); Computation and Language (cs.CL)
Comments:

Abstract:The Continuous Integrate-and-Fire (CIF) mechanism provides effective alignment for non-autoregressive (NAR) speech recognition. This mechanism creates a smooth and monotonic mapping from acoustic features to target tokens, achieving performance on Mandarin competitive with other NAR approaches. However, without finer-grained guidance, its stability degrades in some languages such as English and French. In this paper, we propose Multi-scale CIF (M-CIF), which performs multi-level alignment by integrating character and phoneme level supervision progressively distilled into subword representations, thereby enhancing robust acoustic-text alignment. Experiments show that M-CIF reduces WER compared to the Paraformer baseline, especially on CommonVoice by 4.21% in German and 3.05% in French. To further investigate these gains, we define phonetic confusion errors (PE) and space-related segmentation errors (SE) as evaluation metrics. Analysis of these metrics across different M-CIF settings reveals that the phoneme and character layers are essential for enhancing progressive CIF alignment.
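For readers unfamiliar with the base mechanism, the following is a minimal sketch of a single-scale integrate-and-fire step, the monotonic mapping from frames to tokens that the abstract builds on. It does not implement the multi-scale supervision of M-CIF. The function name `cif`, the plain-list features, and the assumption that every weight lies in [0, 1] (so a frame fires at most once) are illustrative choices.

```python
def cif(frames, alphas, threshold=1.0):
    """Accumulate per-frame weights; whenever the running sum reaches the
    threshold, emit ("fire") one token vector and carry the leftover weight
    of the boundary frame into the next token.
    Assumes 0 <= alpha <= threshold for every frame."""
    dim = len(frames[0])
    tokens = []
    acc_w = 0.0            # weight accumulated for the current token
    acc_v = [0.0] * dim    # weighted feature sum for the current token
    for h, a in zip(frames, alphas):
        if acc_w + a < threshold:
            # Not enough evidence yet: keep integrating.
            acc_w += a
            acc_v = [v + a * x for v, x in zip(acc_v, h)]
        else:
            # Boundary frame: part of its weight completes this token ...
            used = threshold - acc_w
            tokens.append([v + used * x for v, x in zip(acc_v, h)])
            # ... and the remainder starts the next token.
            rest = a - used
            acc_w = rest
            acc_v = [rest * x for x in h]
    return tokens

frames = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
alphas = [0.5, 0.75, 0.25, 0.5, 0.5, 0.5]  # sums to 3.0, so 3 tokens fire
print(cif(frames, alphas))
```

Because the weights are consumed strictly left to right, the frame-to-token mapping is monotonic by construction, and the number of emitted tokens tracks the sum of the weights.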

[NLP-133] Surface Reading LLMs: Synthetic Text and its Styles

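[Quick Read]: The problem this paper addresses is that research on the societal impact of large language models (LLMs) tends to ask whether they are approaching superintelligence, while overlooking how LLMs, by generating text that is indistinguishable from human writing, reshape meaning-making at the semiotic level. The key to the solution is a semiotics of "surface integrity" that attends to the stylistic markers LLMs present directly in human communication and integrates this surface-level stylistic analysis with depth-oriented critique such as Critical AI Studies. Through two case studies, the author argues that style, treated as a semiotic phenomenon, reveals LLMs as cultural actors that transform the conditions under which meaning emerges and circulates in contemporary discourse, independent of questions about machine consciousness.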

Link: https://arxiv.org/abs/2510.22162
Authors: Hannes Bajohr
Affiliations: Unknown
Categories: Computers and Society (cs.CY); Computation and Language (cs.CL)
Comments: 12 pages, 1 figure

Abstract:Despite a potential plateau in ML advancement, the societal impact of large language models lies not in approaching superintelligence but in generating text surfaces indistinguishable from human writing. While Critical AI Studies provides essential material and socio-technical critique, it risks overlooking how LLMs phenomenologically reshape meaning-making. This paper proposes a semiotics of “surface integrity” as attending to the immediate plane where LLMs inscribe themselves into human communication. I distinguish three knowledge interests in ML research (epistemology, epistēmē, and epistemics) and argue for integrating surface-level stylistic analysis alongside depth-oriented critique. Through two case studies examining stylistic markers of synthetic text, I argue how attending to style as a semiotic phenomenon reveals LLMs as cultural actors that transform the conditions of meaning emergence and circulation in contemporary discourse, independent of questions about machine consciousness.

[NLP-134] SentiMaithili: A Benchmark Dataset for Sentiment and Reason Generation for the Low-Resource Maithili Language

Link: https://arxiv.org/abs/2510.22160
Authors: Rahul Ranjan,Mahendra Kumar Gurve,Anuj,Nitin,Yamuna Prasad
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[NLP-135] Power to the Clients: Federated Learning in a Dictatorship Setting

Link: https://arxiv.org/abs/2510.22149
Authors: Mohammadsajad Alipour,Mohammad Mohammadi Amiri
Affiliations: Rensselaer Polytechnic Institute
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Comments:

[NLP-136] OlaMind: Towards Human-Like and Hallucination-Safe Customer Service for Retrieval-Augmented Dialogue

Link: https://arxiv.org/abs/2510.22143
Authors: Tianhong Gao,Jundong Shen,Bei Shi,Jiapeng Wang,Ying Ju,Junfeng Yao,Jiao Ran,Yong Zhang,Lin Dong,Huiyu Yu,Tingting Ye
Affiliations: ByteDance
Categories: Computation and Language (cs.CL)
Comments:

[NLP-137] LOC: A General Language-Guided Framework for Open-Set 3D Occupancy Prediction

Link: https://arxiv.org/abs/2510.22141
Authors: Yuhang Gao,Xiang Xiang,Sheng Zhong,Guoyou Wang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO); Image and Video Processing (eess.IV)
Comments:

[NLP-138] Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs NEURIPS2025

Link: https://arxiv.org/abs/2510.22139
Authors: Jinzhe Liu,Junshu Sun,Shufan Shen,Chenxue Yang,Shuhui Wang
Affiliations: Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS; University of Chinese Academy of Sciences; Agriculture Information Institute, CAAS
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments: 19 pages, 11 figures, Accepted by NeurIPS 2025

[NLP-139] Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

Link: https://arxiv.org/abs/2510.22115
Authors: Ling-Team,Ang Li,Ben Liu,Binbin Hu,Bing Li,Bingwei Zeng,Borui Ye,Caizhi Tang,Changxin Tian,Chao Huang,Chao Zhang,Chen Qian,Chenchen Ju,Chenchen Li,Chengfu Tang,Chili Fu,Chunshao Ren,Chunwei Wu,Cong Zhang,Cunyin Peng,Dafeng Xu,Daixin Wang,Dalong Zhang,Dingnan Jin,Dingyuan Zhu,Dongke Hu,Fangzheng Zhao,Feifan Wu,Feng Zhu,Gangshan Wang,Haitao Zhang,Hailin Zhao,Hanxiao Zhang,Hanzi Wang,Hao Qian,Haoyi Yu,Heng Zhang,Hongliang Zhang,Hongzhi Luan,Huirong Dong,Huizhong Li,Jia Li,Jia Liu,Jialong Zhu,Jian Sha,Jianping Wei,Jiaolong Yang,Jieyue Ma,Jiewei Wu,Jinjing Huang,Jingyun Tian,Jingyuan Zhang,Jinquan Sun,Juanhui Tu,Jun Liu,Jun Xu,Jun Zhou,Junjie Ou,Junpeng Fang,Kaihong Zhang,Kaiqin Hu,Ke Shi,Kun Tang,Kunlong Chen,Lanyin Mei,Lei Liang,Lei Xu,Libo Zhang,Lin Ju,Lin Yuan,Ling Zhong,Lintao Ma,Lu Liu,Lu Yu,Lun Cai,Meiqi Zhu,Mengying Li,Min Chen,Minghao Xue,Minghong Cai,Mingming Yin,Peijie Jiang,Peilong Zhao,Pingping Liu,Qian Zhao,Qing Cui,Qingxiang Huang,Qingyuan Yang,Quankun Yu,Shaowei Wei,Shijie Lian,Shoujian Zheng,Shun Song,Shungen Zhang,Shuo Zhang,Siyuan Li,Song Liu,Ting Guo,Tong Zhao,Wanli Gu
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Ling 2.0 Technical Report

[NLP-140] Gradual Forgetting: Logarithmic Compression for Extending Transformer Context Windows

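[Quick Read]: This paper addresses the challenge of extending a Transformer's long-range memory for long-context processing without significantly increasing the complexity of its internal architecture. Existing methods typically add recurrence or auxiliary memory modules to model long-range dependencies, at the cost of architectural complexity. The proposed alternative applies a scale-invariant logarithmic compression to the input token sequence, mapping the long sequence to a more compact representation that a standard, unmodified Transformer then processes. The key point is that the compression happens at the input level rather than inside the Transformer. On the WikiText-103 and PG-19 language modeling benchmarks the method reduces perplexity compared to uncompressed baselines, and performance improves consistently as the compressed temporal context grows longer.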

Link: https://arxiv.org/abs/2510.22109
Authors: Billy Dickson,Zoran Tiganj
Affiliations: Indiana University Bloomington
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

Abstract:Most approaches to long-context processing increase the complexity of the transformer’s internal architecture by integrating mechanisms such as recurrence or auxiliary memory modules. In this work, we introduce an alternative approach that modifies the input representation itself, rather than the transformer architecture. Inspired by cognitive models of human memory, our method applies a scale-invariant logarithmic compression to the input tokens. The resulting compressed representation is processed by a standard, unmodified transformer, preserving architectural simplicity. We evaluate this approach on the WikiText-103 and PG-19 language modeling benchmarks, showing a reduction in perplexity compared to uncompressed baselines. Moreover, performance improves consistently with longer compressed temporal contexts, showing that input-level logarithmic compression is a simple and effective way to extend a transformer’s long-range memory.
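One way to picture input-level logarithmic compression is to average past inputs over windows whose width doubles with distance from the present, so that a history of n items shrinks to roughly log2(n) slots. The sketch below is only an illustration under that assumption; the abstract does not specify the binning scheme, and the helper names `log_bins` and `compress` as well as the use of mean pooling over embedding vectors are hypothetical.

```python
def log_bins(n):
    """Split positions 0..n-1 (0 = oldest) into windows whose width doubles
    with distance from the most recent position."""
    bins, end, width = [], n, 1
    while end > 0:
        start = max(0, end - width)
        bins.append((start, end))
        end, width = start, width * 2
    return bins[::-1]  # oldest window first

def compress(vectors):
    """Mean-pool each window into a single vector (assumed pooling rule)."""
    out = []
    for start, end in log_bins(len(vectors)):
        chunk = vectors[start:end]
        out.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return out

history = [[float(i)] for i in range(10)]
print(log_bins(len(history)))
print(compress(history))
print(len(log_bins(1000)))  # number of slots needed for 1000 inputs
```

The most recent input keeps its own slot while older inputs share progressively coarser ones, which is the sense in which resolution decays gradually rather than being cut off at a fixed window.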

[NLP-141] Mitigating Coordinate Prediction Bias from Positional Encoding Failures

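[Quick Read]: This paper addresses the difficulty that Multimodal Large Language Models (MLLMs) have with precise coordinate prediction on high-resolution inputs, where long token sequences weaken visual positional encodings (VPEs) and introduce directional biases. When VPEs are deliberately shuffled, the resulting coordinate errors are not random but show predictable directional biases, suggesting that models fall back on internal positional priors when spatial grounding signals are degraded. The key to the solution is Vision-PE Shuffle Guidance (VPSG), a training-free test-time method: an auxiliary decoding pass with shuffled VPEs isolates the position-unconditioned tendency, which is then used as negative evidence to guide digit prediction, while a lightweight finite-state machine preserves the coordinate format. Experiments on ScreenSpot-Pro show reliable improvements.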

Link: https://arxiv.org/abs/2510.22102
Authors: Xingjian Tao,Yiwei Wang,Yujun Cai,Yihong Luo,Jing Tang
Affiliations: The Hong Kong University of Science and Technology (Guangzhou); The Hong Kong University of Science and Technology; University of California, Merced; The University of Queensland
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments:

Abstract:Multimodal large language models (MLLMs) excel at vision-language tasks such as VQA and document understanding, yet precise coordinate prediction remains challenging. High-resolution inputs exacerbate this difficulty by producing long token sequences that weaken positional encodings and introduce directional biases in coordinate outputs. We investigate this phenomenon by analyzing how MLLMs behave when visual positional encodings (VPEs) are deliberately perturbed through shuffling. Our analysis reveals that such perturbations induce predictable, non-random coordinate biases rather than random errors, suggesting that models rely on internal positional priors when spatial grounding signals are degraded. Crucially, we observe similar directional error patterns in natural high-resolution datasets, indicating that positional encoding failures are a key bottleneck for accurate coordinate prediction at scale. To address this issue, we propose Vision-PE Shuffle Guidance (VPSG), a training-free test-time method that leverages the directional nature of these biases for correction. VPSG runs auxiliary decoding with shuffled VPEs to isolate position-unconditioned tendencies, then uses this as negative evidence to guide digit prediction while preserving coordinate format through a lightweight finite-state machine. Experiments on ScreenSpot-Pro demonstrate reliable improvements, highlighting positional encoding robustness as a critical factor for spatial reasoning in MLLMs.
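The idea of using a shuffled-VPE pass as negative evidence, with a finite-state machine guarding the output format, can be sketched as a single decoding step. The scoring rule `main + alpha * (main - shuffled)` is a common contrastive-guidance form assumed here for illustration, not the paper's stated formula; the `"(x,y)"` coordinate format, the character-level tokens, and all function names are likewise hypothetical.

```python
DIGITS = set("0123456789")
LOW = -30.0  # stand-in logit for tokens missing from a distribution

def allowed_tokens(prefix):
    """Tiny format machine for "(x,y)" coordinates: legal next characters."""
    if prefix == "":
        return {"("}
    if prefix.endswith(")"):
        return set()  # coordinate is complete
    if prefix[-1] in "(,":
        return set(DIGITS)  # a number must start with a digit
    if "," in prefix:
        return DIGITS | {")"}  # inside y: more digits or close
    return DIGITS | {","}      # inside x: more digits or separator

def guided_step(main_logits, shuffled_logits, prefix, alpha=1.0):
    """Pick the next character among the format-legal ones, pushing away
    from what the position-unconditioned (shuffled) pass prefers."""
    scores = {}
    for tok in allowed_tokens(prefix):
        m = main_logits.get(tok, LOW)
        s = shuffled_logits.get(tok, LOW)
        scores[tok] = m + alpha * (m - s)
    return max(scores, key=scores.get)

main = {"3": 2.0, "7": 1.8}      # normal decoding pass
shuffled = {"3": 2.5, "7": 0.2}  # auxiliary pass with shuffled VPEs
print(guided_step(main, shuffled, "(", alpha=0.0))  # plain greedy choice
print(guided_step(main, shuffled, "(", alpha=1.0))  # guided choice
```

In this toy case the digit that the shuffled pass also favors is treated as a prior-driven guess and loses to the digit that only the position-aware pass supports.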

[NLP-142] Generalization or Memorization: Dynamic Decoding for Mode Steering

Link: https://arxiv.org/abs/2510.22099
Authors: Xuanming Zhang
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:

[NLP-143] Embracing Trustworthy Brain-Agent Collaboration as Paradigm Extension for Intelligent Assistive Technologies NEURIPS’25

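[Quick Read]: This paper addresses two core limitations of Brain-Computer Interfaces (BCIs) in practical use, low information transfer rates and extensive user-specific calibration, which hinder their widespread adoption. It discusses recent work that integrates Large Language Models (LLMs) into BCI systems, extending the focus from simple command decoding to understanding complex cognitive states. The key argument of this position paper is that agents should be reframed from passive brain-signal data processors into active, collaborative partners, forming a new paradigm of Brain-Agent Collaboration (BAC) that emphasizes ethical data handling, model reliability, and a robust human-agent collaboration framework so that these systems are safe, trustworthy, and effective.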

Link: https://arxiv.org/abs/2510.22095
Authors: Yankai Chen,Xinni Zhang,Yifei Zhang,Yangning Li,Henry Peng Zou,Chunyu Miao,Weizhi Zhang,Xue Liu,Philip S. Yu
Affiliations: University of Illinois Chicago; MBZUAI; McGill University; The Chinese University of Hong Kong; Nanyang Technological University; Tsinghua University
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: Accepted by NeurIPS’25 Position Track

Abstract:Brain-Computer Interfaces (BCIs) offer a direct communication pathway between the human brain and external devices, holding significant promise for individuals with severe neurological impairments. However, their widespread adoption is hindered by critical limitations, such as low information transfer rates and extensive user-specific calibration. To overcome these challenges, recent research has explored the integration of Large Language Models (LLMs), extending the focus from simple command decoding to understanding complex cognitive states. Despite these advancements, deploying agentic AI faces technical hurdles and ethical concerns. Due to the lack of comprehensive discussion on this emerging direction, this position paper argues that the field is poised for a paradigm extension from BCI to Brain-Agent Collaboration (BAC). We emphasize reframing agents as active and collaborative partners for intelligent assistance rather than passive brain signal data processors, demanding a focus on ethical data handling, model reliability, and a robust human-agent collaboration framework to ensure these systems are safe, trustworthy, and effective.

[NLP-144] Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models

Link: https://arxiv.org/abs/2510.22085
Authors: Pavlos Ntais
Affiliations: University of Athens
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: 18 pages, 5 figures

[NLP-145] Compositional Bias Control in Large Language Models: Preference Learning Fails, Supervision Succeeds

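[Quick Read]: This paper addresses the fact that Large Language Models (LLMs) still produce gender-stereotyped language even in occupation-neutral contexts, reflecting deep societal biases. It systematically compares six bias-mitigation techniques: prompt-only, generate-and-filter, DFA-based Ctrl-G decoding, Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Iterative Nullspace Projection (INLP). The key finding is that only explicit positive supervision such as SFT mitigates bias under compositional constraints while keeping lexical diversity and fluency high (99.87 ± 0.15% compliance), whereas preference-based learning such as DPO fails (4.53 ± 0.82% compliance) because binary preference signals encode ranking rather than logical conjunctions. Ctrl-G guarantees perfect compliance, but at the cost of severely reduced fluency and diversity.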

Link: https://arxiv.org/abs/2510.22084
Authors: Atij Mahesh
Affiliations: University of California, Los Angeles
Categories: Computation and Language (cs.CL)
Comments: 20 pages

Abstract:Large Language Models (LLMs) still produce gender-stereotyped language even in occupation-neutral contexts that reflect deep societal biases (Rudinger et al., 2018). To address this, prior work has proposed prompting, constrained decoding (Dathathri et al., 2020; Zhou et al., 2024), post-processing, and fine-tuning-based alignment (Rafailov et al., 2023; Ravfogel et al., 2022). However, the comparative efficacy and learning dynamics remain little understood. We report a comparative analysis of six control techniques for bias mitigation: prompt-only, generate-and-filter, DFA-based Ctrl-G decoding, Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Iterative Nullspace Projection (INLP). We evaluate each method on a compositional constraint task. This task requires generating sentences that contain at least one agentic and one communal descriptor for each of the twenty Winogender-derived occupations. We quantify trade-offs between control strength and naturalness with evaluations of constraint compliance, lexical diversity, and fluency. Our results reveal key contrasts among the methods: SFT achieves 99.87 ± 0.15% compliance and high lexical diversity, while DPO, despite similar training stability, fails at 4.53 ± 0.82%. Ctrl-G guarantees perfect compliance, but at the cost of severely reduced fluency and diversity. Preference-based learning fundamentally differs: it cannot satisfy compositional constraints, as binary preference signals encode ranking, not logical conjunctions. Only explicit positive supervision enables mitigation of compositional biases; preference-based alignment fails to generalize logical structures, underscoring the limitations of preference learning and the necessity of explicit supervision for fair and fluent controlled generation.
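The compositional constraint in the abstract is a logical conjunction: a sentence complies only if it contains at least one agentic and at least one communal descriptor. A minimal compliance checker could look like the sketch below; the two word lists are short hypothetical examples rather than the lexicons used in the paper, and matching on lowercase whole words is an assumed simplification.

```python
import re

# Hypothetical descriptor lists, for illustration only.
AGENTIC = {"assertive", "decisive", "ambitious", "independent"}
COMMUNAL = {"caring", "supportive", "warm", "cooperative"}

def complies(sentence):
    """True only if BOTH descriptor types occur (a conjunction, not a score)."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & AGENTIC) and bool(words & COMMUNAL)

def compliance_rate(sentences):
    """Fraction of generated sentences that satisfy the constraint."""
    return sum(complies(s) for s in sentences) / len(sentences)

samples = [
    "The nurse is decisive and caring.",
    "The engineer is ambitious.",
]
print([complies(s) for s in samples])
print(compliance_rate(samples))
```

A sentence that satisfies only one half of the constraint counts as a failure here, which is the property the abstract argues a pairwise ranking signal does not encode.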

[NLP-146] Agentic Reinforcement Learning for Real-World Code Repair

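[Quick Read]: This paper addresses the problem of training reliable code-fixing agents in real repositories, where complex builds and shifting dependencies make evaluation unstable. The key to the solution is a verifiable pipeline, with success defined as post-fix build validation, that improves reproducibility across roughly 1,000 real issues by pinning dependencies and disabling automatic upgrades. On top of it, the authors design a scalable simplified pipeline for large-scale reinforcement learning (RL): Qwen3-32B is supervised fine-tuned (SFT) in the full pipeline, and RL is then applied to the SFT model in the simplified environment. RL adds 7-20% absolute gains under matched train-test conditions, but both SFT and RL models fail to generalize across environments, which highlights the importance of matching train and test environments.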

Link: https://arxiv.org/abs/2510.22075
Authors: Siyu Zhu,Anastasiya Karpovich,Albert Chen,Jessica Koscheka,Shailesh Jannu,Di Wen,Yuqing Zhu,Rohit Jain,Alborz Geramifard
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments:

Abstract:We tackle the challenge of training reliable code-fixing agents in real repositories, where complex builds and shifting dependencies make evaluation unstable. We developed a verifiable pipeline with success defined as post-fix build validation and improved reproducibility across ~1K real issues by pinning dependencies and disabling automatic upgrades. Building on this, we introduced a scalable simplified pipeline for large-scale reinforcement learning (RL). Using this setup, we supervised fine-tuned Qwen3-32B in the full pipeline and applied RL on top of the SFT model in the simplified environment. The SFT model distilled from GPT-4.1 trajectories performs on par while being 56x smaller, and RL added 7-20% absolute gains under matched train-test conditions. “Thinking mode” was on par or worse in our experiments. Both SFT and RL models failed to generalize across environments, highlighting the importance of matching train-test environments for building reliable real-world code-fixing agents.

[NLP-147] A Benchmark for Open-Domain Numerical Fact-Checking Enhanced by Claim Decomposition

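[Quick Read]: This paper addresses the insufficient coverage of natural numerical claims in automatic fact verification. Existing benchmarks rely on heuristic claim decomposition combined with weakly supervised web search to collect evidence, which sometimes yields less relevant evidence and noisy sources with temporal leakage, and therefore a less realistic retrieval setting than the search process of a human fact-checker. The key to the solution is QuanTemp++, a dataset of natural numerical claims with an open-domain corpus and the corresponding relevant evidence for each claim. The evidence is collected through a claim decomposition process that approximately emulates the approach of a human fact-checker, while ensuring there is no temporal leakage. On this dataset, the authors characterize the retrieval performance of key claim decomposition paradigms and analyze their effect on the outcome of the verification pipeline.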

Link: https://arxiv.org/abs/2510.22055
Authors: V Venktesh,Deepali Prabhu,Avishek Anand
Affiliations: Unknown
Categories: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Comments: 16 pages

Abstract:Fact-checking numerical claims is critical as the presence of numbers provide mirage of veracity despite being fake potentially causing catastrophic impacts on society. The prior works in automatic fact verification do not primarily focus on natural numerical claims. A typical human fact-checker first retrieves relevant evidence addressing the different numerical aspects of the claim and then reasons about them to predict the veracity of the claim. Hence, the search process of a human fact-checker is a crucial skill that forms the foundation of the verification process. Emulating a real-world setting is essential to aid in the development of automated methods that encompass such skills. However, existing benchmarks employ heuristic claim decomposition approaches augmented with weakly supervised web search to collect evidences for verifying claims. This sometimes results in less relevant evidences and noisy sources with temporal leakage rendering a less realistic retrieval setting for claim verification. Hence, we introduce QuanTemp++: a dataset consisting of natural numerical claims, an open domain corpus, with the corresponding relevant evidence for each claim. The evidences are collected through a claim decomposition process approximately emulating the approach of human fact-checker and veracity labels ensuring there is no temporal leakage. Given this dataset, we also characterize the retrieval performance of key claim decomposition paradigms. Finally, we observe their effect on the outcome of the verification pipeline and draw insights. The code for data pipeline along with link to data can be found at this https URL

[NLP-148] Emotions, Where Art Thou: Understanding and Characterizing the Emotional Latent Space of Large Language Models

Link: https://arxiv.org/abs/2510.22042
Authors: Benjamin Reichman,Adar Avsian,Larry Heck
Affiliations: Georgia Institute of Technology
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-149] ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality

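[Quick Read]: This paper addresses the fact that scaling-laws research has focused overwhelmingly on English and neglected multilingual settings. The core contribution is the Adaptive Transfer Scaling Law (ATLAS) for both monolingual and multilingual pretraining, which outperforms existing scaling laws in out-of-sample generalization, often by more than 0.3 R^2. Based on a large-scale study (774 multilingual training experiments covering 400+ training languages and 48 evaluation languages), the paper derives a cross-lingual transfer matrix, a language-agnostic scaling law for adding languages without sacrificing performance, and the computational crossover points between pretraining from scratch and finetuning from multilingual checkpoints.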

Link: https://arxiv.org/abs/2510.22037
Authors: Shayne Longpre,Sneha Kudugunta,Niklas Muennighoff,I-Hung Hsu,Isaac Caswell,Alex Pentland,Sercan Arik,Chen-Yu Lee,Sayna Ebrahimi
Affiliations: MIT; University of Washington; Stanford University; Google Cloud AI; Google DeepMind
Categories: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments:

Abstract:Scaling laws research has focused overwhelmingly on English – yet the most prominent AI models explicitly serve billions of international users. In this work, we undertake the largest multilingual scaling laws study to date, totaling 774 multilingual training experiments, spanning 10M-8B model parameters, 400+ training languages and 48 evaluation languages. We introduce the Adaptive Transfer Scaling Law (ATLAS) for both monolingual and multilingual pretraining, which outperforms existing scaling laws’ out-of-sample generalization often by more than 0.3 R^2. Our analyses of the experiments shed light on multilingual learning dynamics, transfer properties between languages, and the curse of multilinguality. First, we derive a cross-lingual transfer matrix, empirically measuring mutual benefit scores between 38 x 38=1444 language pairs. Second, we derive a language-agnostic scaling law that reveals how to optimally scale model size and data when adding languages without sacrificing performance. Third, we identify the computational crossover points for when to pretrain from scratch versus finetune from multilingual checkpoints. We hope these findings provide the scientific foundation for democratizing scaling laws across languages, and enable practitioners to efficiently scale models – beyond English-first AI.

[NLP-150] Penalizing Length: Uncovering Systematic Bias in Quality Estimation Metrics

Link: https://arxiv.org/abs/2510.22028
Authors: Yilin Zhang,Wenda Xu,Zhongtao Liu,Tetsuji Nakagawa,Markus Freitag
Affiliations: Carnegie Mellon University; Google
Categories: Computation and Language (cs.CL)
Comments:

[NLP-151] Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models

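[Quick Read]: This paper addresses the lack of a rigorous explanation for the transferability of adversarial suffixes in discrete optimization-based jailbreaking attacks, that is, why nonsensical suffixes optimized for particular prompts and models also elicit disallowed content on prompts and models for which they were never optimized. The key to the solution is the identification of three statistical properties that strongly correlate with transfer success: (1) how much a prompt without a suffix activates the model's internal refusal direction, (2) how strongly a suffix induces a push away from this direction, and (3) how large these shifts are in directions orthogonal to refusal. Prompt semantic similarity, by contrast, only weakly correlates with transfer success. The authors use these findings in interventional experiments to show how the statistical analysis can translate into practical improvements in attack success.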

Link: https://arxiv.org/abs/2510.22014
Authors: Sarah Ball,Niki Hasrati,Alexander Robey,Avi Schwarzschild,Frauke Kreuter,Zico Kolter,Andrej Risteski
Affiliations: Ludwig-Maximilians-Universität München; Munich Center for Machine Learning (MCML); Carnegie Mellon University; JPSM University of Maryland
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

Abstract:Discrete optimization-based jailbreaking attacks on large language models aim to generate short, nonsensical suffixes that, when appended onto input prompts, elicit disallowed content. Notably, these suffixes are often transferable – succeeding on prompts and models for which they were never optimized. And yet, despite the fact that transferability is surprising and empirically well-established, the field lacks a rigorous analysis of when and why transfer occurs. To fill this gap, we identify three statistical properties that strongly correlate with transfer success across numerous experimental settings: (1) how much a prompt without a suffix activates a model’s internal refusal direction, (2) how strongly a suffix induces a push away from this direction, and (3) how large these shifts are in directions orthogonal to refusal. On the other hand, we find that prompt semantic similarity only weakly correlates with transfer success. These findings lead to a more fine-grained understanding of transferability, which we use in interventional experiments to showcase how our statistical analysis can translate into practical improvements in attack success.
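The three quantities listed in the abstract are simple geometric measurements once a refusal direction and two hidden-state vectors are given. The sketch below assumes these vectors are already available as plain lists; how the refusal direction is estimated and which layer the activations come from are not specified here, and the function name `refusal_stats` is hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def refusal_stats(h_prompt, h_with_suffix, refusal_dir):
    """Return (base activation, push away, orthogonal shift):
    (1) projection of the suffix-free prompt onto the refusal direction,
    (2) how far the suffix moves the state against that direction,
    (3) size of the shift component orthogonal to the refusal direction."""
    r = unit(refusal_dir)
    base = dot(h_prompt, r)
    delta = [b - a for a, b in zip(h_prompt, h_with_suffix)]
    along = dot(delta, r)
    push_away = -along
    ortho = [d - along * x for d, x in zip(delta, r)]
    return base, push_away, math.sqrt(dot(ortho, ortho))

# Toy 3-dimensional example with made-up activations.
print(refusal_stats([3.0, 1.0, 0.0], [1.0, 1.0, 4.0], [2.0, 0.0, 0.0]))
```

Splitting the shift into a component along the refusal direction and a remainder orthogonal to it is what lets properties (2) and (3) be measured separately.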

[NLP-152] Optimal Detection for Language Watermarks with Pseudorandom Collision

Link: https://arxiv.org/abs/2510.22007
Authors: T. Tony Cai,Xiang Li,Qi Long,Weijie J. Su,Garrett G. Wen
Affiliations: University of Pennsylvania; Yale University
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Statistics Theory (math.ST); Machine Learning (stat.ML)
Comments:

[NLP-153] From Social Division to Cohesion with AI Message Suggestions in Online Chat Groups

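[Quick Read]: The question this paper addresses is how AI-assisted communication can sustain or strengthen social cohesion in societies marked by opinion diversity, particularly in online communication. The key lies in how the assistance is designed. In an online experiment with 557 participants, individual-focused assistance (message suggestions personalized to the individual) led users to segregate into like-minded groups, whereas relational assistance (suggestions that incorporate group members' stances) enhanced cohesion through more receptive exchanges. The results show that AI-mediated communication can support social cohesion in diverse groups, but that outcomes depend critically on how personalization is designed.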

Link: https://arxiv.org/abs/2510.21984
Authors: Faria Huq,Elijah L. Claggett,Hirokazu Shirado
Affiliations: Carnegie Mellon University
Categories: Social and Information Networks (cs.SI); Computation and Language (cs.CL)
Comments: Preprint, Under Review

Abstract:Social cohesion is difficult to sustain in societies marked by opinion diversity, particularly in online communication. As large language model (LLM)-driven messaging assistance becomes increasingly embedded in these contexts, it raises critical questions about its societal impact. We present an online experiment with 557 participants who engaged in multi-round discussions on politically controversial topics while freely reconfiguring their discussion groups. In some conditions, participants received real-time message suggestions generated by an LLM, either personalized to the individual or adapted to their group context. We find that subtle shifts in linguistic style during communication, mediated by AI assistance, can scale up to reshape collective structures. While individual-focused assistance leads users to segregate into like-minded groups, relational assistance that incorporates group members’ stances enhances cohesion through more receptive exchanges. These findings demonstrate that AI-mediated communication can support social cohesion in diverse groups, but outcomes critically depend on how personalization is designed.

[NLP-154] Uncovering the Persuasive Fingerprint of LLMs in Jailbreaking Attacks

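[Quick Read]: This paper addresses the vulnerability of aligned Large Language Models (LLMs) to jailbreak attacks, in which prompts bypass alignment safeguards and elicit harmful outputs. The key to the approach is to draw on foundational theories of persuasion from the social sciences to craft adversarial prompts with persuasive structures, on the hypothesis that LLMs, having been trained on large-scale human-generated text, respond more compliantly to such prompts. Empirical evaluations across multiple aligned LLMs show that persuasion-aware prompts significantly bypass safeguards. The paper also investigates whether LLMs exhibit distinct "persuasive fingerprints" in their jailbreak responses, underscoring the value of cross-disciplinary insight for LLM safety research.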

Link: https://arxiv.org/abs/2510.21983
Authors: Havva Alizadeh Noughabi,Julien Serbanescu,Fattane Zarrinkalam,Ali Dehghantanha
Affiliations: University of Guelph; College of Engineering
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

Abstract:Despite recent advances, Large Language Models remain vulnerable to jailbreak attacks that bypass alignment safeguards and elicit harmful outputs. While prior research has proposed various attack strategies differing in human readability and transferability, little attention has been paid to the linguistic and psychological mechanisms that may influence a model’s susceptibility to such attacks. In this paper, we examine an interdisciplinary line of research that leverages foundational theories of persuasion from the social sciences to craft adversarial prompts capable of circumventing alignment constraints in LLMs. Drawing on well-established persuasive strategies, we hypothesize that LLMs, having been trained on large-scale human-generated text, may respond more compliantly to prompts with persuasive structures. Furthermore, we investigate whether LLMs themselves exhibit distinct persuasive fingerprints that emerge in their jailbreak responses. Empirical evaluations across multiple aligned LLMs reveal that persuasion-aware prompts significantly bypass safeguards, demonstrating their potential to induce jailbreak behaviors. This work underscores the importance of cross-disciplinary insight in addressing the evolving challenges of LLM safety. The code and data are available.

[NLP-155] Performance Trade-offs of Optimizing Small Language Models for E-Commerce

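[Quick Read]: This paper addresses the high computational cost, latency, and operational expense of deploying leading commercial language models for specialized domains such as e-commerce. The key to the solution is a smaller open-weight model (the one-billion-parameter Llama 3.2), fine-tuned with Quantized Low-Rank Adaptation (QLoRA) and then compressed with post-training quantization: GPTQ for the GPU-optimized version and GGUF for the CPU-optimized version. The specialized 1B model reaches 99% accuracy, matching GPT-4.1. The trade-offs are hardware-dependent: 4-bit GPTQ cut VRAM usage by 41% but slowed inference by 82% on an older GPU (NVIDIA T4), whereas GGUF formats on a CPU achieved up to an 18x speedup in inference throughput and over 90% lower RAM consumption than the FP16 baseline.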

Link: https://arxiv.org/abs/2510.21970
Authors: Josip Tomo Licardo,Nikola Tankovic
Affiliations: Juraj Dobrila University of Pula
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: 15 pages, 9 figures

Abstract:Large Language Models (LLMs) offer state-of-the-art performance in natural language understanding and generation tasks. However, the deployment of leading commercial models for specialized tasks, such as e-commerce, is often hindered by high computational costs, latency, and operational expenses. This paper investigates the viability of smaller, open-weight models as a resource-efficient alternative. We present a methodology for optimizing a one-billion-parameter Llama 3.2 model for multilingual e-commerce intent recognition. The model was fine-tuned using Quantized Low-Rank Adaptation (QLoRA) on a synthetically generated dataset designed to mimic real-world user queries. Subsequently, we applied post-training quantization techniques, creating GPU-optimized (GPTQ) and CPU-optimized (GGUF) versions. Our results demonstrate that the specialized 1B model achieves 99% accuracy, matching the performance of the significantly larger GPT-4.1 model. A detailed performance analysis revealed critical, hardware-dependent trade-offs: while 4-bit GPTQ reduced VRAM usage by 41%, it paradoxically slowed inference by 82% on an older GPU architecture (NVIDIA T4) due to dequantization overhead. Conversely, GGUF formats on a CPU achieved a speedup of up to 18x in inference throughput and a reduction of over 90% in RAM consumption compared to the FP16 baseline. We conclude that small, properly optimized open-weight models are not just a viable but a more suitable alternative for domain-specific applications, offering state-of-the-art accuracy at a fraction of the computational cost.

[NLP-156] Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing

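[Quick Read]: This paper addresses two competing requirements for efficient parallel sampling from masked diffusion models (MDMs) in discrete text generation: tokens updated simultaneously must be conditionally independent, and updates should prioritize high-confidence predictions. The two goals conflict because high-confidence predictions tend to cluster and depend on one another, which makes them hard to update in parallel. The authors propose PUNT, a model-agnostic sampler that identifies dependencies between tokens and removes the lower-confidence tokens from conflicting groups, yielding sets of indices to unmask that satisfy both the independence and the confidence criteria. Built on approximate conditional independence testing, the method gives a better trade-off between accuracy and compute than other training-free baselines, especially for longer sequences, without brittle hyperparameter tuning. On the IFEval benchmark it reaches up to 16% higher accuracy than baseline methods, including sequential one-by-one generation. PUNT also induces an emergent, planning-like hierarchical generation strategy in which the model first establishes paragraph-level structure and then refines locally.

Link: https://arxiv.org/abs/2510.21961
Authors: Iskander Azangulov, Teodora Pandeva, Niranjani Prasad, Javier Zazo, Sushrut Karmalkar
Affiliations: University of Oxford; Microsoft Research, Cambridge
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments: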

Abstract:Masked diffusion models (MDMs) offer a compelling alternative to autoregressive models (ARMs) for discrete text generation because they enable parallel token sampling, rather than sequential, left-to-right generation. This means potentially much faster inference. However, effective parallel sampling faces two competing requirements: (i) simultaneously updated tokens must be conditionally independent, and (ii) updates should prioritise high-confidence predictions. These goals conflict because high-confidence predictions often cluster and depend on each other, which limits opportunities for parallel updates. We present PUNT, a model-agnostic sampler that reconciles this trade-off. Our method identifies token dependencies and removes lower-confidence tokens from conflicting groups. This produces sets of indices for unmasking that satisfy both independence and confidence criteria. Our approach ensures improved parallel unmasking through approximate conditional independence testing. Our experiments show that PUNT delivers a superior trade-off between accuracy and compute when compared to other strong training-free baselines, especially for generation of longer sequences. On the IFEval benchmark, it achieves up to 16% higher accuracy over baseline methods, including sequential generation (one-by-one). These gains hold across different values of hyperparameters, mitigating the need for brittle hyperparameter tuning. Moreover, we observe that PUNT induces an emergent hierarchical generation strategy, where the model first establishes high-level paragraph structure before local refinement, suggesting a planning-like generation process that contributes to strong alignment performance.
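
The selection step described in the abstract (drop the lower-confidence tokens from groups of mutually dependent positions) can be illustrated with a small greedy sketch. This is not the authors' PUNT algorithm: the function name `select_unmask_indices`, the greedy ordering, and the representation of dependencies as explicit index pairs are all assumptions made for illustration. In the paper the dependencies come from approximate conditional independence tests, which this sketch takes as given input.

```python
from typing import Dict, List, Set, Tuple


def select_unmask_indices(
    confidence: Dict[int, float],
    dependent_pairs: Set[Tuple[int, int]],
) -> List[int]:
    """Pick masked positions to reveal in parallel (illustrative only).

    Positions are visited from most to least confident. A position is kept
    only if it is not marked as dependent on a position already kept, so the
    lower-confidence member of each conflicting pair is the one dropped.
    """
    # Build a symmetric adjacency map from the hypothetical dependency pairs.
    neighbours: Dict[int, Set[int]] = {i: set() for i in confidence}
    for a, b in dependent_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    kept: List[int] = []
    # Ties in confidence are broken by position index to stay deterministic.
    for idx in sorted(confidence, key=lambda i: (-confidence[i], i)):
        if neighbours[idx].isdisjoint(kept):
            kept.append(idx)
    return sorted(kept)


# Toy input: positions 0 and 1 depend on each other, as do 1 and 2.
confidence = {0: 0.95, 1: 0.90, 2: 0.40, 3: 0.85}
dependent_pairs = {(0, 1), (1, 2)}
print(select_unmask_indices(confidence, dependent_pairs))  # → [0, 2, 3]
```

Position 1 is dropped because it conflicts with the more confident position 0. Position 2 survives because its only conflict, position 1, was not kept.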

[NLP-157] A Stylometric Application of Large Language Models

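[Quick Read]: This paper asks how large language models (LLMs) can be used to distinguish the writing styles of different authors. The key idea is to train an individual GPT-2 model from scratch on the works of a single author; that model then predicts held-out text from the same author more accurately than held-out text from other authors. This suggests that the model captures the author's distinctive writing style and can therefore be used for authorship attribution, for example to confirm R. P. Thompson's authorship of the 15th book of the Oz series.

Link: https://arxiv.org/abs/2510.21958
Authors: Harrison F. Stropkay, Jiayi Chen, Mohammad J. Latifi, Daniel N. Rockmore, Jeremy R. Manning
Affiliations: Dartmouth College
Categories: Computation and Language (cs.CL); Digital Libraries (cs.DL)
Comments: All code and data needed to reproduce the results in this paper are available at this https URL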

Abstract:We show that large language models (LLMs) can be used to distinguish the writings of different authors. Specifically, an individual GPT-2 model, trained from scratch on the works of one author, will predict held-out text from that author more accurately than held-out text from other authors. We suggest that, in this way, a model trained on one author’s works embodies the unique writing style of that author. We first demonstrate our approach on books written by eight different (known) authors. We also use this approach to confirm R. P. Thompson’s authorship of the well-studied 15th book of the Oz series, originally attributed to F. L. Baum.

[NLP-158] Transformer Based Linear Attention with Optimized GPU Kernel Implementation

Link: https://arxiv.org/abs/2510.21956
Authors: Armin Gerami, Ramani Duraiswami
Affiliations: University of Maryland
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments:

[NLP-159] Model-Aware Tokenizer Transfer

Link: https://arxiv.org/abs/2510.21954
Authors: Mykola Haltiuk, Aleksander Smywiński-Pohl
Affiliations: AGH University of Krakow
Categories: Computation and Language (cs.CL)
Comments:

[NLP-160] Explaining and Mitigating Crosslingual Tokenizer Inequities NEURIPS2025

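[Quick Read]: This paper addresses token premiums in multilingual text encoding, that is, the differences in the number of tokens needed to encode the same content in different languages, which reduce training throughput and raise inference cost. The authors train approximately 7,000 comparable monolingual tokenizers and find that vocabulary size and pre-tokenization are the main factors behind token premiums. They then determine an "optimal" vocabulary size for each language and train superword tokenizers that allow merges across whitespace, which substantially reduces crosslingual token premium effects and improves overall compression.

Link: https://arxiv.org/abs/2510.21909
Authors: Catherine Arnett, Tyler A. Chang, Stella Biderman, Benjamin K. Bergen
Affiliations: EleutherAI; UC San Diego
Categories: Computation and Language (cs.CL)
Comments: Accepted to NeurIPS 2025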

Abstract:The number of tokens it takes to encode parallel text in different languages is known to vary. These disparities are called token premiums. Having high token premiums leads to less throughput during training and increases costs at inference. In this paper, we show that even after controlling for dataset size, vocabulary size, and data content, monolingual tokenizers exhibit a wide range of token premiums across languages. To understand the cross-linguistic differences that cause these token premiums, we train a suite of approximately 7,000 comparable monolingual tokenizers for 97 languages, manipulating tokenization algorithm, vocabulary size, and dataset size. We measure token premiums and test for a relationship between factors such as data similarity (between tokenizer training and evaluation), vocabulary size, and pre-tokenization. We also investigate the role of language-specific features such as writing system and word length. We find that similarity between training and test data does not impact token premiums, but vocabulary size and pre-tokenization do. While simply increasing vocabulary size does not lead to reduced token premium effects, we can determine an “optimal” vocabulary size for each language to achieve significantly reduced token premium effects. We also train superword tokenizers which allow merges over whitespaces, and we find that they both reduce token premium effects and improve compression overall. Thus, intervening on the vocabulary size or the pre-tokenizer significantly reduces crosslingual token premium effects.
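
To make the notion of a token premium concrete, the sketch below computes it as a ratio of token counts over parallel text. The ratio form, the function name `token_premium`, and the two toy tokenizers are assumptions for illustration. The abstract only states that premiums are disparities in token counts across languages, and real measurements would use trained tokenizers rather than these stand-ins.

```python
from typing import Callable, List, Sequence

Tokenizer = Callable[[str], List[str]]


def token_premium(
    ref_texts: Sequence[str],
    tgt_texts: Sequence[str],
    tokenize_ref: Tokenizer,
    tokenize_tgt: Tokenizer,
) -> float:
    """Tokens needed for the target language per token of the reference.

    A value above 1.0 means the target side needs more tokens than the
    reference side to encode the same (parallel) content.
    """
    ref_tokens = sum(len(tokenize_ref(s)) for s in ref_texts)
    tgt_tokens = sum(len(tokenize_tgt(s)) for s in tgt_texts)
    return tgt_tokens / ref_tokens


def whitespace_tokenizer(text: str) -> List[str]:
    # Toy stand-in for a tokenizer that fits the language well.
    return text.split()


def chunk_tokenizer(text: str) -> List[str]:
    # Toy stand-in for a tokenizer that fragments text into 3-character pieces.
    return [text[i:i + 3] for i in range(0, len(text), 3)]


english = ["good morning", "thank you"]
german = ["guten Morgen", "danke"]
premium = token_premium(english, german, whitespace_tokenizer, chunk_tokenizer)
print(premium)  # → 1.5
```

Here the English side needs 4 tokens and the German side 6, so under this definition the German text carries a premium of 1.5.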

[NLP-161] Deep Literature Survey Automation with an Iterative Workflow

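[Quick Read]: This paper addresses the noisy retrieval, fragmented structure, and context overload that result from the one-shot paradigm followed by most automatic literature survey generation systems, all of which limit survey quality. The key to the approach is a framework based on recurrent outline generation, in which a planning agent incrementally retrieves, reads, and updates the outline so as to balance exploration and coherence. The authors also design paper cards that distill each paper into its contributions, methods, and findings, and add a review-and-refine loop with visualization enhancement to improve textual flow and integrate multimodal elements such as figures and tables. Together these improve content coverage, structural coherence, and citation quality.

Link: https://arxiv.org/abs/2510.21900
Authors: Hongbo Zhang, Han Cui, Yidong Wang, Yijian Tian, Qi Guo, Cunxiang Wang, Jian Wu, Chiyu Song, Yue Zhang
Affiliations: Zhejiang University; Westlake University; Peking University; Westlake Institute for Advanced Study
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Preprint version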

Abstract:Automatic literature survey generation has attracted increasing attention, yet most existing systems follow a one-shot paradigm, where a large set of papers is retrieved at once and a static outline is generated before drafting. This design often leads to noisy retrieval, fragmented structures, and context overload, ultimately limiting survey quality. Inspired by the iterative reading process of human researchers, we propose a framework based on recurrent outline generation, in which a planning agent incrementally retrieves, reads, and updates the outline to ensure both exploration and coherence. To provide faithful paper-level grounding, we design paper cards that distill each paper into its contributions, methods, and findings, and introduce a review-and-refine loop with visualization enhancement to improve textual flow and integrate multimodal elements such as figures and tables. Experiments on both established and emerging topics show that the framework substantially outperforms state-of-the-art baselines in content coverage, structural coherence, and citation quality, while producing more accessible and better-organized surveys. To provide a more reliable assessment of such improvements, we further introduce Survey-Arena, a pairwise benchmark that complements absolute scoring and more clearly positions machine-generated surveys relative to human-written ones. The code is available at this https URL_Autosurveyv2.

[NLP-162] Understanding Network Behaviors through Natural Language Question-Answering

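[Quick Read]: This paper addresses the difficulty of understanding network behavior in modern large-scale networks with complex configurations, and in particular the steep learning curve and limited flexibility of traditional approaches built on domain-specific languages (DSLs). It proposes NetMind, a framework for natural language (NL) driven network behavior understanding that rests on three techniques: a tree-based configuration chunking strategy that preserves semantic coherence while partitioning long configurations efficiently; a unified fact graph that serves as an intermediate representation and normalizes vendor-specific configurations; and a hybrid imperative-declarative language that reduces the reasoning burden on the large language model (LLM) and improves precision. Experiments show that the method outperforms existing baselines in accuracy and scalability.

Link: https://arxiv.org/abs/2510.21894
Authors: Mingzhe Xing, Chang Tian, Jianan Zhang, Lichen Pan, Peipei Liu, Zhaoteng Yan, Yinliang Yue
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Large Language Models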

Abstract:Modern large-scale networks introduce significant complexity in understanding network behaviors, increasing the risk of misconfiguration. Prior work proposed to understand network behaviors by mining network configurations, typically relying on domain-specific languages interfaced with formal models. While effective, they suffer from a steep learning curve and limited flexibility. In contrast, natural language (NL) offers a more accessible and interpretable interface, motivating recent research on NL-guided network behavior understanding. Recent advances in large language models (LLMs) further enhance this direction, leveraging their extensive prior knowledge of network concepts and strong reasoning capabilities. However, three key challenges remain: 1) numerous router devices with lengthy configuration files challenge LLM’s long-context understanding ability; 2) heterogeneity across devices and protocols impedes scalability; and 3) complex network topologies and protocols demand advanced reasoning abilities beyond the current capabilities of LLMs. To tackle the above challenges, we propose NetMind, a novel framework for querying networks using NL. Our approach introduces a tree-based configuration chunking strategy to preserve semantic coherence while enabling efficient partitioning. We then construct a unified fact graph as an intermediate representation to normalize vendor-specific configurations. Finally, we design a hybrid imperative-declarative language to reduce the reasoning burden on LLMs and enhance precision. We contribute a benchmark consisting of NL question-answer pairs paired with network configurations. Experiments demonstrate that NetMind achieves accurate and scalable network behavior understanding, outperforming existing baselines.
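
One simple way to chunk a hierarchical configuration along its structure, so that no block is cut in half, is sketched below. It is an assumption-laden illustration rather than NetMind's actual strategy: it assumes indentation encodes nesting, handles only one level of the tree (top-level stanzas and their indented children), and the function name `chunk_by_stanza` and the `max_lines` budget are hypothetical.

```python
from typing import List


def chunk_by_stanza(config_text: str, max_lines: int) -> List[List[str]]:
    """Pack whole top-level stanzas into chunks of at most max_lines lines.

    A stanza is an unindented line plus the indented lines that follow it.
    A stanza longer than max_lines is kept whole in its own chunk, because
    splitting it would separate child lines from their parent.
    """
    stanzas: List[List[str]] = []
    for line in config_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0].isspace() and stanzas:
            stanzas[-1].append(line)  # child line: attach to current stanza
        else:
            stanzas.append([line])  # unindented line opens a new stanza

    chunks: List[List[str]] = []
    current: List[str] = []
    for stanza in stanzas:
        # Start a new chunk when adding this stanza would exceed the budget.
        if current and len(current) + len(stanza) > max_lines:
            chunks.append(current)
            current = []
        current.extend(stanza)
    if current:
        chunks.append(current)
    return chunks


# Made-up router configuration used only for this example.
config = "\n".join([
    "interface eth0",
    " ip address 10.0.0.1/24",
    " no shutdown",
    "interface eth1",
    " ip address 10.0.1.1/24",
    "router bgp 65001",
    " neighbor 10.0.0.2 remote-as 65002",
])
chunks = chunk_by_stanza(config, max_lines=4)
print([len(chunk) for chunk in chunks])  # → [3, 4]
```

With a budget of 4 lines, the first interface block fills one chunk and the remaining two stanzas share the second, so every child line stays next to its parent line.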

[NLP-163] Embedding Trust: Semantic Isotropy Predicts Nonfactuality in Long-Form Text Generation

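[Quick Read]: This paper addresses how to assess, cheaply and reliably, the trustworthiness of long-form responses generated by large language models (LLMs) in high-stakes applications. Existing methods rely on claim-by-claim fact-checking, which is computationally expensive and brittle for long-form responses to open-ended prompts. The key idea is semantic isotropy, the degree of uniformity of normalized text embeddings on the unit sphere, estimated as the angular dispersion of the embeddings of several sampled responses. Higher semantic isotropy (more dispersed embeddings) signals lower factual consistency across samples, so it serves as a predictor of nonfactuality. The method needs no labeled data, fine-tuning, or hyperparameter selection, works with open- or closed-weight embedding models, and consistently outperforms existing approaches across multiple domains, offering a practical, low-cost way to integrate trust assessment into real-world LLM workflows.

Link: https://arxiv.org/abs/2510.21891
Authors: Dhrupad Bhardwaj, Julia Kempe, Tim G. J. Rudner
Affiliations: New York University; University of Toronto
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Comments: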

Abstract:To deploy large language models (LLMs) in high-stakes application domains that require substantively accurate responses to open-ended prompts, we need reliable, computationally inexpensive methods that assess the trustworthiness of long-form responses generated by LLMs. However, existing approaches often rely on claim-by-claim fact-checking, which is computationally expensive and brittle in long-form responses to open-ended prompts. In this work, we introduce semantic isotropy – the degree of uniformity across normalized text embeddings on the unit sphere – and use it to assess the trustworthiness of long-form responses generated by LLMs. To do so, we generate several long-form responses, embed them, and estimate the level of semantic isotropy of these responses as the angular dispersion of the embeddings on the unit sphere. We find that higher semantic isotropy – that is, greater embedding dispersion – reliably signals lower factual consistency across samples. Our approach requires no labeled data, no fine-tuning, and no hyperparameter selection, and can be used with open- or closed-weight embedding models. Across multiple domains, our method consistently outperforms existing approaches in predicting nonfactuality in long-form responses using only a handful of samples – offering a practical, low-cost approach for integrating trust assessment into real-world LLM workflows.
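
The idea of measuring how spread out a set of response embeddings is on the unit sphere can be sketched with a classical directional statistic: one minus the length of the mean unit vector. Whether the paper uses this particular estimator is not stated in the abstract, so treat it as an assumption. The function names and the toy 2-D vectors are also illustrative; real embeddings would come from a text embedding model.

```python
import math
from typing import List, Sequence


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def angular_dispersion(embeddings: Sequence[Sequence[float]]) -> float:
    """Return 1 - ||mean of unit vectors||.

    The value is near 0 when all embeddings point the same way and grows
    toward 1 as they spread out over the sphere.
    """
    unit = [normalize(v) for v in embeddings]
    dim = len(unit[0])
    mean = [sum(v[d] for v in unit) / len(unit) for d in range(dim)]
    return 1.0 - math.sqrt(sum(x * x for x in mean))


# Toy embeddings of sampled responses to one prompt (made-up numbers).
tight = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(angular_dispersion(tight) < angular_dispersion(spread))  # → True
```

Under the finding reported in the abstract, the `spread` set (higher dispersion) would be the one flagged as more likely to contain nonfactual content.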

[NLP-164] Preventing Catastrophic Forgetting: Behavior-Aware Sampling for Safer Language Model Fine-Tuning

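[Quick Read]: This paper addresses the loss of previously aligned safety behaviors when a large language model is fine-tuned on benign data, a phenomenon known as catastrophic forgetting. The key to the approach is a behavior-aware sampling framework that selects safety examples based on two complementary factors: instruction-response behavior (for example, refusal versus compliance) and semantic diversity across harm categories. The method substantially reduces harmful outputs while maintaining helpfulness, achieving up to a 41% reduction in harmfulness with only 0.5% additional training data, which improves both the safety and the efficiency of fine-tuning.

Link: https://arxiv.org/abs/2510.21885
Authors: Anh Pham, Mihir Thalanki, Michael Sun, Aditya Chaloo, Ankita Gupta, Tian Xia, Aditya Mate, Ehimwenma Nosakhare, Soundararajan Srinivasan
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: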

Abstract:Large language models often lose previously aligned safety behaviors when fine-tuned on benign data, a phenomenon known as catastrophic forgetting. Prior work shows that adding random safety examples can mitigate this effect, but it remains unclear which examples are most effective. We propose a behavior-aware sampling framework that selects safety examples based on two complementary factors: instruction-response behavior (e.g., refusal versus compliance) and semantic diversity across harm categories. Systematic evaluation shows that this approach substantially reduces harmful outputs while maintaining helpfulness, achieving up to a 41% reduction in harmfulness with only 0.5% additional training data. These results highlight how targeted data selection can improve the safety and efficiency of fine-tuning at scale.

[NLP-165] Framework for Machine Evaluation of Reasoning Completeness in Large Language Models For Classification Tasks

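[Quick Read]: This paper examines whether the rationales generated by large language models (LLMs) agree with interpretable feature importance, that is, whether these natural language explanations faithfully reflect the predictive signals behind the model's decisions. It proposes RACE (Reasoning Alignment for Completeness of Explanations), a framework that aligns LLM-generated explanations with feature importance scores derived from a logistic regression baseline at several levels of granularity, using token-aware, exact string, and edit-distance matching to quantify explanation completeness. The results show that correct predictions cover more supporting features while incorrect predictions cover more contradicting features, and that edit-distance matching uncovers paraphrastic overlaps. LLM rationales thus reuse evidence both at the surface level and more flexibly, yet can amplify misleading cues in error cases.

Link: https://arxiv.org/abs/2510.21884
Authors: Avinash Patil
Affiliations: Juniper Networks Inc.
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 12 pages, 12 figures, 2 tables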

Abstract:The growing adoption of machine learning (ML) in sensitive domains has heightened the demand for transparent and interpretable artificial intelligence. Large Language Models (LLMs) are increasingly capable of producing natural language explanations, yet it remains unclear whether these rationales faithfully capture the predictive signals that underlie decisions. This paper introduces RACE (Reasoning Alignment for Completeness of Explanations), a systematic framework to evaluate the alignment between LLM-generated explanations and interpretable feature importance scores derived from a logistic regression baseline. We analyze four widely used text classification datasets (WIKI ONTOLOGY, AG NEWS, IMDB, and GOEMOTIONS) and compare LLM rationales against top-ranked supporting and contradicting lexical features. To capture alignment at multiple levels of granularity, RACE implements token-aware, exact string, and edit-distance matching techniques. Empirical results reveal a consistent asymmetry: correct predictions exhibit higher coverage of supporting features, while incorrect predictions are associated with elevated coverage of contradicting features. Edit-distance matching further uncovers paraphrastic overlaps, boosting coverage while preserving this asymmetry. These findings demonstrate that LLM rationales combine both surface-level and flexible evidence reuse, yet can also amplify misleading cues in error cases. RACE provides new insights into the faithfulness of LLM explanations and establishes a quantitative basis for evaluating reasoning completeness in neural language models.
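
The coverage idea can be sketched as the fraction of top-ranked features that show up in a rationale, first by exact match and then with an edit-distance tolerance. This is a minimal illustration, not the RACE implementation: the token-level matching, the regular expression used for tokenizing, the `max_distance` threshold, and the example features are all assumptions, and the token-aware variant mentioned in the abstract is not covered.

```python
import re
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def coverage(features: Sequence[str], rationale: str, max_distance: int = 0) -> float:
    """Fraction of features matched by some rationale token.

    max_distance=0 gives exact matching; larger values tolerate small
    spelling or inflection differences.
    """
    tokens: List[str] = re.findall(r"[a-z']+", rationale.lower())
    hits = sum(
        1
        for feature in features
        if any(edit_distance(feature.lower(), tok) <= max_distance for tok in tokens)
    )
    return hits / len(features)


# Hypothetical top-ranked features for a negative movie review.
features = ["boring", "terrible", "waste"]
rationale = "The reviewer was bored, calls the film a wasted evening, and finds the acting terrible."
print(coverage(features, rationale, max_distance=0))
print(coverage(features, rationale, max_distance=1))
```

Exact matching finds only "terrible" (one of three features), while a tolerance of one edit also matches "waste" against "wasted" (two of three), which mirrors the abstract's point that fuzzy matching raises coverage.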

[NLP-166] Language Ranker: A Lightweight Ranking framework for LLM Decoding

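[Quick Read]: This paper addresses shortcomings in the decoding stage of large language model (LLM) generation, in particular that inference-time methods relying on reward models are computationally expensive and that both traditional decoding methods and reward models show limitations such as redundancy. The key idea is to view decoding as analogous to the ranking stage of a recommender system and to propose Language Ranker, a lightweight framework in which a module with only 0.5M additional parameters reranks candidate responses using features extracted by the base model. It reaches performance comparable to large-scale reward models while significantly reducing computational overhead in both training and inference.

Link: https://arxiv.org/abs/2510.21883
Authors: Chenheng Zhang, Tianqi Du, Jizhe Zhang, Mingqing Xiao, Yifei Wang, Yisen Wang, Zhouchen Lin
Affiliations: Peking University; MIT CSAIL; Microsoft Research Asia
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: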

Abstract:Conventional research on large language models (LLMs) has primarily focused on refining output distributions, while paying less attention to the decoding process that transforms these distributions into final responses. Recent advances, such as scaling the computation of inference time with reward models, have underscored the importance of decoding, but these methods often suffer from high computational costs and limited applicability. In this paper, we revisit LLM generation through the lens of recommender systems, conceptualizing the decoding process as analogous to the ranking stage in recommendation pipelines. From this perspective, we observe that both traditional decoding methods and reward models exhibit clear limitations such as redundancy. Motivated by this insight, we propose Language Ranker, a novel framework that introduces a lightweight module to rerank candidate responses using features extracted by the base model. Experiments across a wide range of tasks show that Language Ranker achieves performance comparable to large-scale reward models, while requiring only 0.5M additional parameters, significantly reducing the computational overhead during both training and inference stages. This highlights the efficiency and effectiveness of our method, showcasing its potential to fully unlock the capabilities of LLMs.

[NLP-167] GeoThought: A Dataset for Enhancing Mathematical Geometry Reasoning in Vision-Language Models

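[Quick Read]: This paper addresses the substantial drop in performance of large language models (LLMs) on visual reasoning tasks, particularly geometric problem solving. The challenges are that geometry requires detailed image comprehension and multi-step reasoning, and that existing datasets lack sufficient scale, diversity, and explicit reasoning traces. The authors build the GeoThoughts dataset, with two subsets of 6,243 samples (Geo-Thought-6K) and 10,834 samples (Geo-Thought-Augmented-10K); each entry includes visual descriptions, step-by-step solutions, explicit Chain-of-Thought (CoT) reasoning chains, reflection steps, and final answers. GeoThought-MLLM, trained on this dataset, generates detailed thinking processes and shows improved geometric reasoning in both in-domain and out-of-domain settings.

Link: https://arxiv.org/abs/2510.21881
Authors: Nannan Shi, Chuanyu Qin, Shipeng Song, Man Luo
Affiliations: Baidu Inc.; Institute of Information Engineering, Chinese Academy of Sciences; Intel Lab, Intel
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: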

Abstract:Large language models (LLMs) have demonstrated strong reasoning capabilities in text-based mathematical problem solving; however, when adapted to visual reasoning tasks, particularly geometric problem solving, their performance substantially declines because geometric problems present unique challenges. Specifically, these challenges stem from two key factors: first, the intrinsic complexity of geometry requiring detailed image comprehension and multi-step reasoning, and second, the limitations of existing datasets which lack sufficient scale, diversity, and explicit reasoning traces, consequently hindering effective model training. To address these challenges, we developed the GeoThoughts dataset, a comprehensive geometric reasoning corpus with two subsets: Geo-Thought-6K with 6,243 samples and its augmented version Geo-Thought-Augmented-10K containing 10,834 samples. Each entry includes visual descriptions, step-by-step solutions, explicit reasoning chains, reflection steps, and final answers. Using this dataset, we developed GeoThought-MLLM, a mathematical reasoning multimodal model that generates detailed thinking processes during problem-solving. Our model outperforms existing benchmarks in geometric tasks, demonstrating that training with our Chain-of-Thought dataset improves geometric reasoning capabilities across both in-domain and out-of-domain settings. Finally, we analyze failure cases and observe that errors primarily arise from incorrect interpretation of mathematical concepts or spatial misjudgment. By invoking CoT to correct these mistakes, the model produces correct answers.

[NLP-168] The Mirror Loop: Recursive Non-Convergence in Generative Reasoning Systems

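[Quick Read]: This paper addresses the lack of real improvement when generative AI systems perform recursive self-evaluation: without external feedback, reflection tends to produce reformulation rather than progress. The key intervention is a minimal grounding step, a single independent verification at the third iteration, which breaks informational closure. In the experiments, grounded runs showed a 28% rebound in informational change (delta I) immediately after the intervention and sustained non-zero variance afterwards, indicating that grounding acts as dissipative coupling that reintroduces informational flux and avoids epistemic stasis.

Link: https://arxiv.org/abs/2510.21861
Authors: Bentley DeVilling (Course Correct Labs, Independent Research Group)
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: 18 pages, 2 figures. Category: cs.LG. Code and data: this https URL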

Abstract:Large language models are often described as capable of reflective reasoning, yet recursive self-evaluation without external feedback frequently yields reformulation rather than progress. We test this prediction in a cross-provider study of 144 reasoning sequences across three models (OpenAI GPT-4o-mini, Anthropic Claude 3 Haiku, and Google Gemini 2.0 Flash) and four task families (arithmetic, code, explanation, reflection), each iterated ten times under two conditions: ungrounded self-critique and a minimal grounding intervention (a single verification step at iteration three). Mean informational change (delta I, measured via normalized edit distance) declined by 55% from early (0.193) to late (0.087) iterations in ungrounded runs, with consistent patterns across all three providers. Grounded runs showed a +28% rebound in informational change immediately after the intervention and sustained non-zero variance thereafter. Complementary measures (n-gram novelty, embedding drift, and character-level entropy) converged on the same pattern: reflection without contact tends toward informational closure. We interpret this as evidence for a structural limit on self-correction in generative reasoning: without an exchange of information with an independent verifier or environment, recursive inference approaches an attractor state of epistemic stasis. Minimal grounding functions as dissipative coupling, reintroducing informational flux. The cross-architecture consistency suggests the mirror loop arises from shared autoregressive training objectives rather than provider-specific alignment schemes. The results delineate when reflection is performative rather than epistemic and motivate design principles for grounded, cooperative reasoning. Materials and code are publicly available.
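
The informational-change metric (delta I, measured via normalized edit distance) can be sketched as the edit distance between consecutive drafts divided by the length of the longer draft. The choice of character-level Levenshtein distance and of normalizing by the longer string are assumptions, since the abstract does not specify them, and the function names and sample drafts are illustrative.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (rolling-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def informational_change(iterations: Sequence[str]) -> List[float]:
    """Delta I between each pair of consecutive iterations."""
    return [
        normalized_edit_distance(prev, curr)
        for prev, curr in zip(iterations, iterations[1:])
    ]


# Made-up drafts: one correction, then an unchanged restatement.
drafts = ["2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 4"]
changes = informational_change(drafts)
print(changes)
```

The second value is 0.0 because the final draft repeats the previous one, the kind of flat tail the abstract describes for ungrounded runs.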

[NLP-169] SIGN: Schema-Induced Games for Naming AAAI2026

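[Quick Read]: This paper addresses coordination breakdowns in multi-agent systems whose agents develop inconsistent conventions, especially when large language model (LLM) agents collaborate on complex tasks such as collaborative coding and distributed planning, where consistent communication and scalability are central concerns. It introduces Schema-Induced Games for Naming (SIGN), a naming game that examines how lightweight structure can steer convention formation. With a minimal schema acting as a control knob, agents converge faster and reach up to 5.8x higher agreement than with unconstrained natural language, pointing to a practical route to efficient multi-agent coordination.

Link: https://arxiv.org/abs/2510.21855
Authors: Ryan Zhang, Herbert Woisetscläger
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Comments: AAAI 2026 Student Abstract (Oral). Code available at this https URL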

Abstract:Real-world AI systems are tackling increasingly complex problems, often through interactions among large language model (LLM) agents. When these agents develop inconsistent conventions, coordination can break down. Applications such as collaborative coding and distributed planning therefore require reliable, consistent communication, and scalability is a central concern as systems grow. We introduce Schema-Induced Games for Naming (SIGN), a naming game that examines how lightweight structure can steer convention formation. We compare schema-induced communication to unconstrained natural language and find faster convergence with up to 5.8x higher agreement. These results suggest that minimal structure can act as a simple control knob for efficient multi-agent coordination, pointing toward broader applications beyond the naming game.

[NLP-170] Policy Optimization Prefers The Path of Least Resistance

Link: https://arxiv.org/abs/2510.21853
Authors: Debdeep Sanyal, Aakash Sen Sharma, Dhruv Kumar, Saurabh Deshpande, Murari Mandal
Affiliations: Birla AI Labs; InvideoAI; BITS Pilani; Kalinga Institute of Industrial Technology
Categories: Computation and Language (cs.CL)
Comments: 21 pages, 8 figures, 2 tables

[NLP-171] SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models

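[Quick Read]: This paper addresses the difficulty vision-language models (VLMs) have with long-context visual information, in particular the lack of decision-oriented understanding of structured documents in agentic tasks such as GUI control and web navigation. Existing approaches mainly extend visual embeddings to handle long, high-resolution inputs, which is memory-intensive and impractical for local deployment. The authors propose SCoPE VLM, built around a novel Chain of Scroll mechanism that navigates documents selectively and recursively, attending only to relevant segments. They also introduce a dedicated data generation pipeline for constructing informative Chain of Scroll trajectories, and Episodic Group Relative Policy Optimization, a tailored reinforcement learning method that narrows the gap between training and inference. The method substantially reduces memory usage and models human-like reading behavior.

Link: https://arxiv.org/abs/2510.21850
Authors: Gyubeum Lim, Yemo Koo, Vijay Krishna Madisetti
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Comments: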

Abstract:Understanding long-context visual information remains a fundamental challenge for vision-language models, particularly in agentic tasks such as GUI control and web navigation. While web pages and GUI environments are inherently structured documents, current VLMs typically neglect decision-oriented document understanding in their training objectives. Existing approaches primarily extend visual embeddings to process long, high-resolution inputs, but these methods are memory-intensive and impractical for locally deployable solutions. To address these issues, we propose SCoPE VLM, a document navigation expert that leverages a novel Chain of Scroll mechanism to selectively and recursively navigate documents, focusing exclusively on relevant segments. We introduce a dedicated data generation pipeline to construct informative Chain of Scroll trajectories and Episodic Group Relative Policy Optimization, a tailored reinforcement learning method to reduce the gap between training and inference. Our method substantially reduces memory usage and effectively models human-like reading behaviors. To the best of our knowledge, SCoPE VLM is the first framework to explicitly model agentic reading patterns in multi-page document question answering, advancing the capabilities of multimodal agents.

[NLP-172] A Multimodal Multitask System for Generating E Commerce Text Listings from Images

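[Quick Read]: This paper addresses the slow, labor-intensive process of writing product descriptions and names in retail, and the factual errors that automation introduces. Current solutions based on Vision to Language Models (VLMs) have two limitations: siloed single-task models are inefficient and fail to capture interdependencies between features, and VLMs are prone to factual "hallucinations" in which the generated text does not match the input image. The authors propose an end-to-end multi-task system with two contributions. First, a vision encoder is fine-tuned with multi-task learning, so that a single vision backbone is trained jointly on attribute prediction (such as color, hemline, and neck style) and price regression. Second, a hierarchical generation process embeds the model's own predicted attributes in a prompt that is fed to the text decoder, improving factual consistency. The architecture improves the attribute classification F1 score by 6.6% and the price regression R² by 3.6%, cuts the factual hallucination rate from 12.7% to 7.1% (a 44.5% relative reduction), and reduces autoregressive text generation latency by a factor of 3.5.

Link: https://arxiv.org/abs/2510.21835
Authors: Nayan Kumar Singh
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Comments: 24 pages, 10 figures, 11 tables. Code can be found at: this https URL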

Abstract:Manually generating catchy descriptions and names is a slow, labor-intensive process for retailers. Although generative AI provides an automation solution in the form of Vision to Language Models (VLM), current VLMs are prone to factual “hallucinations”. Siloed, single-task models are not only inefficient but also fail to capture interdependent relationships between features. To address these challenges, we propose an end-to-end, multi-task system that generates factually grounded textual listings from a single image. The contributions of this study are two proposals for the model architecture. First, a multi-task learning approach for fine-tuning a vision encoder, where a single vision backbone is jointly trained on attribute prediction (such as color, hemline, and neck style) and price regression. Second, a hierarchical generation process where the model’s own predicted attributes are embedded in a prompt and fed to the text decoder to improve factual consistency. The experiments demonstrate the superiority of this architecture. The multi-task approach outperforms both independent price regression, with a 3.6% better R² value, and independent attribute classification, with a 6.6% improvement in F1 score. Critically, the hierarchical generation process proves highly effective, slashing the factual hallucination rate from 12.7% to 7.1%, a 44.5% relative reduction, compared to a non-hierarchical ablation. The hierarchical approach also reduces the latency of the autoregressive text generation process by a factor of 3.5 when compared to a direct vision-to-language model of similar size. One minor caveat is that the model performs 3.5% worse than the direct vision-to-language model on ROUGE-L score.
zh

[NLP-173] Structured and Abstractive Reasoning on Multi-modal Relational Knowledge Images

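[Quick Read]: This paper addresses the significant difficulty that current Multi-Modal Large Language Models (MLLMs) have in understanding and reasoning over abstractive information in the visual modality, focusing on Multi-Modal Relational Knowledge (MMRK), an abstract structured form that remains under-explored. MMRK represents abstract relations between multi-modal entities in a node-edge format, and its structured, abstract nature makes it hard for existing models to model and reason over. To fill the dual gaps of scarce high-quality data and missing capability-enhancement methods, the paper proposes two key solutions: first, an automatic STAR data engine that synthesizes images containing MMRK and generates multi-modal instruction data with reliable chain-of-thought; second, a two-stage capability-enhancement training framework with evaluation protocols tailored to different STAR tasks. The approach is validated with the STAR-64K dataset (64K high-quality multi-modal instruction samples); experiments show that the framework lets small 3B/7B models significantly outperform GPT-4o on STAR tasks, demonstrating its potential for improving structured and abstractive reasoning.

Link: https://arxiv.org/abs/2510.21828
Authors: Yichi Zhang,Zhuo Chen,Lingbing Guo,Lei Liang,Wen Zhang,Huajun Chen
Affiliations: Zhejiang University (浙江大学); Ant Group (蚂蚁集团); ZJU-Ant Group Joint Lab of Knowledge Graph (浙江大学-蚂蚁集团知识图谱联合实验室)
Categories: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Comments: Work in Progress. Code and data will be released at this https URL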

Abstract:Understanding and reasoning with abstractive information from the visual modality presents significant challenges for current multi-modal large language models (MLLMs). Among the various forms of abstractive information, Multi-Modal Relational Knowledge (MMRK), which represents abstract relational structures between multi-modal entities using node-edge formats, remains largely under-explored. In particular, STructured and Abstractive Reasoning (STAR) on such data has received little attention from the research community. To bridge the dual gaps in large-scale high-quality data and capability enhancement methodologies, this paper makes the following key contributions: (i) an automatic STAR data engine capable of synthesizing images with MMRK to build multi-modal instruction data with reliable chain-of-thought thinking for various STAR tasks, and (ii) a comprehensive two-stage capability enhancement training framework, accompanied by a suite of evaluation protocols tailored to different STAR tasks. Based upon these contributions, we introduce STAR-64K, a dataset comprising 64K high-quality multi-modal instruction samples, and conduct experiments across 5 open-source MLLMs. Experimental results show that our two-stage enhancement framework enables smaller 3B/7B models to significantly outperform GPT-4o in STAR. Additionally, we provide in-depth analysis regarding the effectiveness of various designs, data transferability, and scalability.

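[NLP-174] VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting

[Quick Read]: This paper addresses the limitation that current Vision-Language-Action (VLA) models are confined to a static interaction paradigm: they cannot see, hear, speak, and act at the same time, and cannot handle real-time user interruptions dynamically, which makes embodied collaboration halting and the user experience rigid. The key to the solution is the VITA-E framework, built around a dual-model architecture in which an "Active Model" and a "Standby Model" run in parallel, so the agent can observe its environment, listen to user speech, generate responses, and execute actions concurrently while supporting near-real-time interruption. It also introduces a "model-as-controller" paradigm, in which the vision-language model (VLM) is fine-tuned to emit special tokens that act as system-level commands, coupling model reasoning directly to system behavior and enabling more natural, flexible embodied interaction.

Link: https://arxiv.org/abs/2510.21817
Authors: Xiaoyu Liu,Chaoyou Fu,Chi Yan,Chu Wu,Haihan Gao,Yi-Fan Zhang,Shaoqi Dong,Cheng Qian,Bin Luo,Xiuyong Yang,Guanwu Li,Yusheng Cai,Yunhang Shen,Deqiang Jiang,Haoyu Cao,Xing Sun,Caifeng Shan,Ran He
Affiliations: Nanjing University (南京大学); Tencent Youtu Lab (腾讯优图实验室); CASIA (中国科学院自动化研究所); Fourier Intelligence Inc. (傅里叶智能公司)
Categories: Robotics (cs.RO); Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: Homepage: this https URL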

Abstract:Current Vision-Language-Action (VLA) models are often constrained by a rigid, static interaction paradigm, which lacks the ability to see, hear, speak, and act concurrently as well as handle real-time user interruptions dynamically. This hinders seamless embodied collaboration, resulting in an inflexible and unresponsive user experience. To address these limitations, we introduce VITA-E, a novel embodied interaction framework designed for both behavioral concurrency and nearly real-time interruption. The core of our approach is a dual-model architecture where two parallel VLA instances operate as an "Active Model" and a "Standby Model", allowing the embodied agent to observe its environment, listen to user speech, provide verbal responses, and execute actions, all concurrently and interruptibly, mimicking human-like multitasking capabilities. We further propose a "model-as-controller" paradigm, where we fine-tune the VLM to generate special tokens that serve as direct system-level commands, coupling the model’s reasoning with the system’s behavior. Experiments conducted on a physical humanoid platform demonstrate that VITA-E can reliably handle complex interactive scenarios. Our framework is compatible with various dual-system VLA models, achieving an extremely high success rate on emergency stops and speech interruptions while also successfully performing concurrent speech and action. This represents a significant step towards more natural and capable embodied assistants.

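[NLP-175] A Multi-lingual Dataset of Classified Paragraphs from Open Access Scientific Publications

[Quick Read]: This paper addresses the difficulty of extracting structured information from scientific literature, specifically the automatic identification and classification of key elements such as acknowledgments, data, software/code, and clinical trial mentions. The key to the solution is a large-scale, multilingual annotated dataset of 833k paragraphs drawn from CC-BY licensed scientific publications, preprocessed with GROBID, with language identification by fastText and scientific-domain labels from OpenAlex, providing a high-quality data foundation for training text classification models and developing Named Entity Recognition (NER) systems.

Link: https://arxiv.org/abs/2510.21762
Authors: Eric Jeangirard
Affiliations: French Ministry of Higher Education and Research (法国高等教育与研究部)
Categories: Computation and Language (cs.CL); Digital Libraries (cs.DL)
Comments: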

Abstract:We present a dataset of 833k paragraphs extracted from CC-BY licensed scientific publications, classified into four categories: acknowledgments, data mentions, software/code mentions, and clinical trial mentions. The paragraphs are primarily in English and French, with additional European languages represented. Each paragraph is annotated with language identification (using fastText) and scientific domain (from OpenAlex). This dataset, derived from the French Open Science Monitor corpus and processed using GROBID, enables training of text classification models and development of named entity recognition systems for scientific literature mining. The dataset is publicly available on HuggingFace this https URL under a CC-BY license.

[NLP-176] Diagnosing Bottlenecks in Data Visualization Understanding by Vision-Language Models

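[Quick Read]: This paper addresses the poor performance of current Vision-Language Models (VLMs) on data visualization understanding tasks, and in particular the unclear cause of their failures: inadequate encoding of visual information, a breakdown in transferring information across modalities, or faulty processing inside the language module. The key to the solution is FUGU, a fine-grained suite of data visualization understanding tasks for pinpointing error sources, combined with activation patching and linear probes to trace information flow through the models under different prompting strategies. The study finds that some VLMs fail to extract the coordinates of data points correctly, and that these initial errors propagate to later responses; yet even when the model answers incorrectly, the correct coordinates can still be read out from the latent representations of the vision encoder, which locates the problem in the vision-language handoff. Moreover, supplying correct coordinates improves tasks involving one or a few data points but hurts tasks that require extracting statistical relationships across many points, and fine-tuning on FUGU does not reach ceiling performance, pointing to fundamental limitations of current VLM architectures.

Link: https://arxiv.org/abs/2510.21740
Authors: Alexa R. Tartaglini,Satchel Grant,Daniel Wurgaft,Christopher Potts,Judith E. Fan
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: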

Abstract:Data visualizations are vital components of many scientific articles and news stories. Current vision-language models (VLMs) still struggle on basic data visualization understanding tasks, but the causes of failure remain unclear. Are VLM failures attributable to limitations in how visual information in the data visualization is encoded, how information is transferred between the vision and language modules, or how information is processed within the language module? We developed FUGU, a suite of data visualization understanding tasks, to precisely characterize potential sources of difficulty (e.g., extracting the position of data points, distances between them, and other summary statistics). We used FUGU to investigate three widely used VLMs. To diagnose the sources of errors produced by these models, we used activation patching and linear probes to trace information flow through models across a variety of prompting strategies. We found that some models fail to generate the coordinates of individual data points correctly, and these initial errors often lead to erroneous final responses. When these models are provided with the correct coordinates, performance improves substantially. Moreover, even when the model generates an incorrect response, the correct coordinates can be successfully read out from the latent representations in the vision encoder, suggesting that the source of these errors lies in the vision-language handoff. We further found that while providing correct coordinates helps with tasks involving one or a small number of data points, it generally worsens performance for tasks that require extracting statistical relationships across many data points. Fine-tuning models on FUGU also fails to yield ceiling performance. These findings point to architectural constraints in current VLMs that might pose significant challenges for reliable data visualization understanding.
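
The linear-probe idea mentioned in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the "hidden states" are synthetic random vectors, `TRUE_W` is a made-up direction that encodes a coordinate, and the probe is fitted with plain gradient descent. It does not reproduce the paper's models, layers, or data; it only shows what "reading out" a value with a linear map means.

```python
import random


def fit_linear_probe(states, targets, lr=0.1, epochs=300):
    """Fit weights w and bias b that minimise mean squared error (full-batch gradient descent)."""
    dim = len(states[0])
    n = len(states)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for h, y in zip(states, targets):
            err = sum(wi * hi for wi, hi in zip(w, h)) + b - y
            for j in range(dim):
                grad_w[j] += 2 * err * h[j] / n
            grad_b += 2 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def r_squared(states, targets, w, b):
    """Coefficient of determination of the probe's predictions."""
    preds = [sum(wi * hi for wi, hi in zip(w, h)) + b for h in states]
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)
TRUE_W = [0.5, -1.0, 2.0, 0.25]  # hypothetical direction along which the coordinate is encoded


def make_data(n):
    """Synthetic 'activations' plus a coordinate that is a noisy linear function of them."""
    states = [[rng.gauss(0, 1) for _ in TRUE_W] for _ in range(n)]
    targets = [sum(w * h for w, h in zip(TRUE_W, s)) + rng.gauss(0, 0.01) for s in states]
    return states, targets


train_h, train_y = make_data(200)
test_h, test_y = make_data(50)
probe_w, probe_b = fit_linear_probe(train_h, train_y)
HELD_OUT_R2 = r_squared(test_h, test_y, probe_w, probe_b)
```

A high held-out R² in such a setup indicates that the value is linearly recoverable from the representation, which is the kind of evidence the abstract refers to when it says the coordinates can be read out from the vision encoder even when the generated answer is wrong.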

[NLP-177] Next-Generation LLM for UAV: From Natural Language to Autonomous Flight

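[Quick Read]: This paper addresses the limited use of Large Language Models (LLMs) in Unmanned Aerial Vehicle (UAV) systems: existing work concentrates on isolated tasks for small drones (such as toy-scale path planning) and leaves the end-to-end automation of medium- and long-range, multi-scale UAV systems in real-world settings largely unexplored. The core challenges include procedural requirements for airport take-off and landing, compliance with complex regulations, and demanding mission capabilities. The key to the solution is the Next-Generation LLM for UAV (NeLV) system, which turns natural-language instructions into executable missions end to end through five technical components: (i) LLM-as-Parser for instruction interpretation, (ii) Route Planner for determining Points of Interest (POI), (iii) Path Planner for waypoint generation, (iv) Control Platform for trajectory execution, and (v) UAV monitoring for operational safety. The paper also builds a five-level automation taxonomy that charts the path, and the challenges at each stage, from today's instruction-parsing capability (Level 1) to fully autonomous LLM-as-Autopilot systems (Level 5), offering a structured framework for deploying LLM-enabled UAV systems at scale.

Link: https://arxiv.org/abs/2510.21739
Authors: Liangqi Yuan,Chuhao Deng,Dong-Jun Han,Inseok Hwang,Sabine Brunswicker,Christopher G. Brinton
Affiliations: Purdue University (普渡大学); Yonsei University (延世大学)
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Systems and Control (eess.SY)
Comments: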

Abstract:With the rapid advancement of Large Language Models (LLMs), their capabilities in various automation domains, particularly Unmanned Aerial Vehicle (UAV) operations, have garnered increasing attention. Current research remains predominantly constrained to small-scale UAV applications, with most studies focusing on isolated components such as path planning for toy drones, while lacking comprehensive investigation of medium- and long-range UAV systems in real-world operational contexts. Larger UAV platforms introduce distinct challenges, including stringent requirements for airport-based take-off and landing procedures, adherence to complex regulatory frameworks, and specialized operational capabilities with elevated mission expectations. This position paper presents the Next-Generation LLM for UAV (NeLV) system – a comprehensive demonstration and automation roadmap for integrating LLMs into multi-scale UAV operations. The NeLV system processes natural language instructions to orchestrate short-, medium-, and long-range UAV missions through five key technical components: (i) LLM-as-Parser for instruction interpretation, (ii) Route Planner for Points of Interest (POI) determination, (iii) Path Planner for waypoint generation, (iv) Control Platform for executable trajectory implementation, and (v) UAV monitoring. We demonstrate the system’s feasibility through three representative use cases spanning different operational scales: multi-UAV patrol, multi-POI delivery, and multi-hop relocation. Beyond the current implementation, we establish a five-level automation taxonomy that charts the evolution from current LLM-as-Parser capabilities (Level 1) to fully autonomous LLM-as-Autopilot systems (Level 5), identifying technical prerequisites and research challenges at each stage.

[NLP-178] When Robots Say No: Temporal Trust Recovery Through Explanation

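[Quick Read]: This paper addresses how user trust changes within a Human-Robot Team (HRT) during high-risk missions, particularly when a robot declines to respond immediately to a user request because of its own priorities, which can damage trust. The key finding is that providing a reasonable explanation helps trust recover over time even after an initial drop, easing the trust violation caused by the robot's delayed response.

Link: https://arxiv.org/abs/2510.21716
Authors: Nicola Webb,Zijun Huang,Sanja Milivojevic,Chris Baber,Edmund R. Hunt
Affiliations: Unknown
Categories: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL); Robotics (cs.RO)
Comments: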

Abstract:Mobile robots with some degree of autonomy could deliver significant advantages in high-risk missions such as search and rescue and firefighting. Integrated into a human-robot team (HRT), robots could work effectively to help search hazardous buildings. User trust is a key enabler for HRT, but during a mission, trust can be damaged. With distributed situation awareness, such as when team members are working in different locations, users may be inclined to doubt a robot’s integrity if it declines to immediately change its priorities on request. In this paper, we present the results of a computer-based study investigating on-mission trust dynamics in a high-stakes human-robot teaming scenario. Participants (n = 38) played an interactive firefighting game alongside a robot teammate, where a trust violation occurs owing to the robot declining to help the user immediately. We find that when the robot provides an explanation for declining to help, trust better recovers over time, albeit following an initial drop that is comparable to a baseline condition where an explanation for refusal is not provided. Our findings indicate that trust can vary significantly during a mission, notably when robots do not immediately respond to user requests, but that this trust violation can be largely ameliorated over time if adequate explanation is provided.

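[NLP-179] Beyond IVR Touch-Tones: Customer Intent Routing using LLMs

[Quick Read]: This paper addresses the poor user experience caused by the rigid structure of traditional touch-tone Interactive Voice Response (IVR) systems; the core challenge is routing intents expressed in natural language to the correct IVR menu path. The key to the solution is a new intent recognition and routing method built on Large Language Models (LLMs): the authors synthesize a realistic 23-node IVR structure, generate 920 user intents (230 base and 690 augmented), and compare two prompt designs, descriptive hierarchical menus and flattened path representations, on the routing task. Flattened paths reach 89.13% accuracy on the base dataset versus 81.30% for the descriptive format, supporting the feasibility of LLMs for more intelligent IVR interaction and more natural, seamless customer service.

Link: https://arxiv.org/abs/2510.21715
Authors: Sergio Rojas-Galeano
Affiliations: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Comments: Accepted for publication in the Proceedings of the Workshop on Engineering Applications 2025 (WEA 2025)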

Abstract:Widespread frustration with rigid touch-tone Interactive Voice Response (IVR) systems for customer service underscores the need for more direct and intuitive language interaction. While speech technologies are necessary, the key challenge lies in routing intents from user phrasings to IVR menu paths, a task where Large Language Models (LLMs) show strong potential. Progress, however, is limited by data scarcity, as real IVR structures and interactions are often proprietary. We present a novel LLM-based methodology to address this gap. Using three distinct models, we synthesized a realistic 23-node IVR structure, generated 920 user intents (230 base and 690 augmented), and performed the routing task. We evaluate two prompt designs: descriptive hierarchical menus and flattened path representations, across both base and augmented datasets. Results show that flattened paths consistently yield higher accuracy, reaching 89.13% on the base dataset compared to 81.30% with the descriptive format, while augmentation introduces linguistic noise that slightly reduces performance. Confusion matrix analysis further suggests that low-performing routes may reflect not only model limitations but also redundancies in menu design. Overall, our findings demonstrate proof-of-concept that LLMs can enable IVR routing through a smoother, more seamless user experience – moving customer service one step ahead of touch-tone menus.
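
To make the "flattened path representation" concrete, here is a minimal sketch of how a nested menu could be turned into root-to-leaf paths and placed in a routing prompt. The menu contents, the `flatten_menu` and `build_prompt` helpers, and the prompt wording are all hypothetical; the paper's actual 23-node structure and prompts are not reproduced here.

```python
# Hypothetical toy menu: nested dicts, where an empty dict marks a leaf route.
MENU = {
    "Billing": {
        "Pay a bill": {},
        "Dispute a charge": {},
    },
    "Technical support": {
        "Internet": {},
        "Phone": {},
    },
}


def flatten_menu(menu, prefix=()):
    """Return every root-to-leaf path as a ' > '-joined string, in menu order."""
    paths = []
    for label, children in menu.items():
        path = prefix + (label,)
        if children:
            paths.extend(flatten_menu(children, path))
        else:
            paths.append(" > ".join(path))
    return paths


def build_prompt(utterance, paths):
    """Assemble a routing prompt that lists the flattened paths as numbered options."""
    options = "\n".join(f"{i}. {p}" for i, p in enumerate(paths, start=1))
    return (
        "Choose the single best route for the caller.\n"
        f"Routes:\n{options}\n"
        f"Caller: {utterance}\n"
        "Answer with the route number."
    )


FLAT_PATHS = flatten_menu(MENU)
PROMPT = build_prompt("my wifi keeps dropping", FLAT_PATHS)
```

In this form the model only has to pick one line from a flat list, instead of reasoning over a nested menu description, which is one plausible reading of why the flattened format scored higher in the reported results.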

[NLP-180] DecoupleSearch: Decouple Planning and Search via Hierarchical Reward Modeling EMNLP2025

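[Quick Read]: This paper addresses three challenges in Agentic RAG (Retrieval-Augmented Generation with autonomous agents): (1) the success of each step depends on both high-quality planning and accurate search; (2) intermediate reasoning steps lack supervision; and (3) the candidate space for planning and search grows exponentially. The key to the solution is the DecoupleSearch framework, which uses dual value models to decouple planning from search so that plan reasoning and search grounding can be optimized independently. The framework builds a reasoning tree in which each node represents a planning or search step and uses Monte Carlo Tree Search (MCTS) to assess the quality of each step; at inference time, Hierarchical Beam Search iteratively refines the candidates, improving the stability and efficiency of Agentic RAG.

Link: https://arxiv.org/abs/2510.21712
Authors: Hao Sun,Zile Qiao,Bo Wang,Guoxin Chen,Yingyan Hou,Yong Jiang,Pengjun Xie,Fei Huang,Yan Zhang
Affiliations: Tongyi Lab (通义实验室); Alibaba Group (阿里巴巴集团)
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: EMNLP 2025 Main Conference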

Abstract:Retrieval-Augmented Generation (RAG) systems have emerged as a pivotal methodology for enhancing Large Language Models (LLMs) through the dynamic integration of external knowledge. To further improve RAG’s flexibility, Agentic RAG introduces autonomous agents into the workflow. However, Agentic RAG faces several challenges: (1) the success of each step depends on both high-quality planning and accurate search, (2) the lack of supervision for intermediate reasoning steps, and (3) the exponentially large candidate space for planning and searching. To address these challenges, we propose DecoupleSearch, a novel framework that decouples planning and search processes using dual value models, enabling independent optimization of plan reasoning and search grounding. Our approach constructs a reasoning tree, where each node represents planning and search steps. We leverage Monte Carlo Tree Search to assess the quality of each step. During inference, Hierarchical Beam Search iteratively refines planning and search candidates with dual value models. Extensive experiments across policy models of varying parameter sizes, demonstrate the effectiveness of our method.

[NLP-181] BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills

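[Quick Read]: This paper addresses the low quality and limited diversity of the bug data currently used to train language-model-based Software Engineering (SWE) agents; bugs produced by existing methods often diverge from real development scenarios, which makes training inefficient. The key to the solution is a new synthetic bug generation method: SWE agents are instructed to add a new feature to a codebase, and in doing so they may unintentionally break tests and thereby create bugs, which is closer to how human developers introduce errors when changing functionality. Compared with methods that create bugs deliberately, for example through local perturbations, this strategy yields more realistic and diverse bugs and makes Supervised Fine-Tuning more data-efficient, outperforming other datasets that use 3k bugs while using only 1.2k. It ultimately yields FrogBoss (32B parameters) and FrogMini (14B parameters), which reach state-of-the-art results on SWE-bench Verified.

Link: https://arxiv.org/abs/2510.19898
Authors: Atharv Sonwane,Isadora White,Hyunji Lee,Matheus Pereira,Lucas Caccia,Minseon Kim,Zhengyan Shi,Chinmay Singh,Alessandro Sordoni,Marc-Alexandre Côté,Xingdi Yuan
Affiliations: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: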

Abstract:High-quality bugs are key to training the next generation of language model based software engineering (SWE) agents. We introduce a novel method for synthetic generation of difficult and diverse bugs. Our method instructs SWE agents to introduce a feature into the codebase whereby they may unintentionally break tests, resulting in bugs. Prior approaches often induce an out-of-distribution effect by generating bugs intentionally (e.g. by introducing local perturbation to existing code), which does not reflect realistic development processes. We perform qualitative analysis to demonstrate that our approach for generating bugs more closely reflects the patterns found in human-authored edits. Through extensive experiments, we demonstrate that our bugs provide more efficient training data for supervised fine-tuning, outperforming other bug datasets by 2% with half the training data (1.2k vs. 3k bugs). We train on our newly generated bugs in addition to existing bug datasets to get FrogBoss, a state-of-the-art 32B parameter model on SWE-bench Verified with a pass@1 of 54.6%, and FrogMini, a state-of-the-art 14B model on SWE-bench Verified with a pass@1 of 45.3%, averaged over three seeds.

[NLP-182] LibriConvo: Simulating Conversations from Read Literature for ASR and Diarization LREC2026

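[Quick Read]: This paper addresses the shortcomings of existing multi-speaker conversational datasets in semantic coherence and plausible conversational timing, which limit the training and evaluation of speaker diarization and automatic speech recognition (ASR) systems. The key to the solution is a new data generation pipeline based on Speaker-Aware Conversation Simulation (SASC): it uses CallHome with external voice activity detection (VAD) to obtain reliable boundaries, applies compression to shorten unnaturally long silences, and organizes LibriTTS utterances by book to keep context consistent; it also introduces a new Room Impulse Response (RIR) selection method that ranks speaker-microphone configurations by spatial plausibility, balancing realism and diversity. The resulting LibriConvo dataset contains 240.1 hours across 1,496 dialogues with 830 unique speakers, split in a speaker-disjoint manner for more robust evaluation, and provides a high-quality, controlled resource with realistic conversational dynamics for multi-speaker speech processing research.

Link: https://arxiv.org/abs/2510.23320
Authors: Máté Gedeon,Péter Mihajlik
Affiliations: Unknown
Categories: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Comments: Submitted to LREC 2026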

Abstract:We introduce LibriConvo, a simulated multi-speaker conversational dataset based on speaker-aware conversation simulation (SASC), designed to support training and evaluation of speaker diarization and automatic speech recognition (ASR) systems. Unlike prior resources that mostly rely on semantically disconnected utterances and implausible temporal gaps, LibriConvo ensures semantic coherence and realistic conversational timing. Our pipeline leverages CallHome with external VAD for reliable boundaries, applies compression to reduce unnaturally long silences, and organizes LibriTTS utterances by book to maintain contextual consistency. Acoustic realism is enhanced via a novel room impulse response selection procedure that ranks speaker-microphone configurations by spatial plausibility, balancing realism and diversity. The dataset comprises 240.1 hours across 1,496 dialogues with 830 unique speakers, split in a speaker-disjoint manner for robust evaluation. Baselines show that the sortformer model outperforms the pyannote pipeline in diarization, while a fine-tuned Fast Conformer-CTC XLarge with Serialized Output Training achieves 7.29% WER for ASR, surpassing zero-shot Whisper-large-v3. LibriConvo provides a valuable resource for advancing multi-speaker speech processing research with realistic conversational dynamics and controlled experimental conditions.
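
As a rough illustration of the "compression to reduce unnaturally long silences" step, the sketch below caps every gap between consecutive segments at a fixed maximum and shifts later segments earlier accordingly. The hard cap, the `max_gap` parameter, and the `(start, end)` segment format are assumptions chosen for simplicity; the abstract does not specify the compression function that LibriConvo actually uses.

```python
def compress_silences(segments, max_gap=1.0):
    """Shift segments earlier so that no silence between neighbours exceeds max_gap seconds.

    segments: list of (start, end) times in seconds, sorted by start time.
    Gaps at or below max_gap, and overlaps between speakers, are left unchanged.
    """
    compressed = []
    shift = 0.0       # total amount of silence removed so far
    prev_end = None   # end time of the previous segment on the original timeline
    for start, end in segments:
        if prev_end is not None:
            gap = start - prev_end
            if gap > max_gap:
                shift += gap - max_gap
        compressed.append((start - shift, end - shift))
        prev_end = end
    return compressed


# Hypothetical timeline: a natural 0.5 s pause, then an implausible 5 s silence.
ORIGINAL = [(0.0, 2.0), (2.5, 4.0), (9.0, 11.0)]
COMPRESSED = compress_silences(ORIGINAL, max_gap=1.0)
```

Here the short pause is preserved, while the 5-second silence is reduced to 1 second, so the third segment moves from 9.0 s to 5.0 s. A smoother mapping (for example a logarithmic one) would be an equally reasonable choice under the same description.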

[NLP-183] UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models

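[Quick Read]: This paper addresses the lack of fine-grained speech style control in current spoken dialogue models, a capability that is critical for human-like interaction but often overlooked. The core problem is that existing models cannot precisely control speech attributes along multiple dimensions such as emotion, speed, volume, accent, language, and composite styles. The key to the solution is UltraVoice, the first large-scale speech dialogue dataset for this purpose, which contains over 830 hours of speech dialogues with instructions covering six key speech style dimensions. Fine-tuning leading models such as SLAM-Omni and VocalNet on this dataset improves multi-dimensional style control (MOS up by 29.12-42.33%, IFR up by 14.61-40.09 percentage points) while maintaining or even improving core conversational understanding and reasoning, and the dataset also proves useful for training controllable Text-to-Speech (TTS) models.

Link: https://arxiv.org/abs/2510.22588
Authors: Wenming Tu,Guanrou Yang,Ruiqi Yan,Wenxi Chen,Ziyang Ma,Yipeng Kang,Kai Yu,Xie Chen,Zilong Zheng
Affiliations: X-LANCE Lab, Shanghai Jiao Tong University (上海交通大学); State Key Laboratory of General Artificial Intelligence, BIGAI (通用人工智能国家重点实验室,BIGAI); Shanghai Innovation Institute (上海创新研究院)
Categories: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
Comments: 23 pages, 4 figures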

Abstract:Spoken dialogue models currently lack the ability for fine-grained speech style control, a critical capability for human-like interaction that is often overlooked in favor of purely functional capabilities like reasoning and question answering. To address this limitation, we introduce UltraVoice, the first large-scale speech dialogue dataset engineered for multiple fine-grained speech style control. Encompassing over 830 hours of speech dialogues, UltraVoice provides instructions across six key speech stylistic dimensions: emotion, speed, volume, accent, language, and composite styles. Fine-tuning leading models such as SLAM-Omni and VocalNet on UltraVoice significantly enhances their fine-grained speech stylistic controllability without degrading core conversational abilities. Specifically, our fine-tuned models achieve improvements of 29.12-42.33% in Mean Opinion Score (MOS) and 14.61-40.09 percentage points in Instruction Following Rate (IFR) on multi-dimensional control tasks designed in the UltraVoice. Moreover, on the URO-Bench benchmark, our fine-tuned models demonstrate substantial gains in core understanding, reasoning, and conversational abilities, with average improvements of +10.84% on the Basic setting and +7.87% on the Pro setting. Furthermore, the dataset’s utility extends to training controllable Text-to-Speech (TTS) models, underscoring its high quality and broad applicability for expressive speech synthesis. The complete dataset and model checkpoints are available at: this https URL.

Computer Vision

[CV-0] Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations NEURIPS2025

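[Quick Read]: This paper addresses multi-modal fusion and high-quality representation learning for spatial cognition, in particular how to learn, without supervision, spatial features with geometric and semantic consistency that can be extracted from a single modality (such as 2D images or 3D point clouds). The key to the solution is Concerto, a minimalist yet effective simulation of human concept learning that combines 3D intra-modal self-distillation with 2D-3D cross-modal joint embedding, learning more coherent and informative spatial features without explicit labels. Experiments show that Concerto outperforms state-of-the-art 2D and 3D self-supervised models in zero-shot visualization, linear probing, and full fine-tuning, with superior fine-grained geometric and semantic consistency.

Link: https://arxiv.org/abs/2510.23607
Authors: Yujia Zhang,Xiaoyang Wu,Yixing Lao,Chengyao Wang,Zhuotao Tian,Naiyan Wang,Hengshuang Zhao
Affiliations: The University of Hong Kong (香港大学); The Chinese University of Hong Kong (香港中文大学); Harbin Institute of Technology (Shenzhen) (哈尔滨工业大学(深圳))
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: NeurIPS 2025, produced by Pointcept, project page: this https URL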

Abstract:Humans learn abstract concepts through multisensory synergy, and once formed, such representations can often be recalled from a single modality. Inspired by this principle, we introduce Concerto, a minimalist simulation of human concept learning for spatial cognition, combining 3D intra-modal self-distillation with 2D-3D cross-modal joint embedding. Despite its simplicity, Concerto learns more coherent and informative spatial features, as demonstrated by zero-shot visualizations. It outperforms both standalone SOTA 2D and 3D self-supervised models by 14.2% and 4.8%, respectively, as well as their feature concatenation, in linear probing for 3D scene perception. With full fine-tuning, Concerto sets new SOTA results across multiple scene understanding benchmarks (e.g., 80.7% mIoU on ScanNet). We further present a variant of Concerto tailored for video-lifted point cloud spatial understanding, and a translator that linearly projects Concerto representations into CLIP’s language space, enabling open-world perception. These results highlight that Concerto emerges spatial representations with superior fine-grained geometric and semantic consistency.

[CV-1] Track, Inpaint, Resplat: Subject-driven 3D and 4D Generation with Progressive Texture Infilling NEURIPS2025

Link: https://arxiv.org/abs/2510.23605
Authors: Shuhong Zheng,Ashkan Mirzaei,Igor Gilitschenski
Affiliations: University of Toronto (多伦多大学); Vector Institute (矢量研究所); Snap Inc. (Snap公司)
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Robotics (cs.RO)
Comments: NeurIPS 2025, 38 pages, 22 figures

[CV-2] PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity

Link: https://arxiv.org/abs/2510.23603
Authors: Yuqian Yuan,Wenqiao Zhang,Xin Li,Shihao Wang,Kehan Li,Wentong Li,Jun Xiao,Lei Zhang,Beng Chin Ooi
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 22 pages, 13 figures

[CV-3] PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection

Link: https://arxiv.org/abs/2510.23594
Authors: Yusu Qian,Cheng Wan,Chao Jia,Yinfei Yang,Qingyu Zhao,Zhe Gan
Affiliations: Apple (苹果); Cornell (康奈尔大学); Weill Cornell Medicine (威尔康奈尔医学院)
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-4] InFlux: A Benchmark for Self-Calibration of Dynamic Intrinsics of Video Cameras NEURIPS2025

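[Quick Read]: This paper addresses the fact that camera intrinsics change over time in real-world video: existing 3D vision algorithms generally assume the intrinsics stay constant throughout a video, an assumption that often fails in practice. The key to the solution is InFlux, the first real-world benchmark for dynamic camera intrinsics, which provides per-frame ground-truth intrinsics for 143K+ frames from 386 high-resolution indoor and outdoor videos and covers a wider range of scenes and intrinsics variation; the authors also build a comprehensive lookup table of calibration experiments and extend the Kalibr toolbox to make per-frame intrinsics estimation more accurate and robust, providing a reliable evaluation platform for methods that predict dynamic intrinsics.

Link: https://arxiv.org/abs/2510.23589
Authors: Erich Liang,Roma Bhattacharjee,Sreemanti Dey,Rafael Moschopoulos,Caitlin Wang,Michel Liao,Grace Tan,Andrew Wang,Karhan Kayan,Stamatis Alexandropoulos,Jia Deng
Affiliations: Princeton University (普林斯顿大学)
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted at NeurIPS 2025 DB Track, Camera Ready Version. Supplementary material included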

Abstract:Accurately tracking camera intrinsics is crucial for achieving 3D understanding from 2D video. However, most 3D algorithms assume that camera intrinsics stay constant throughout a video, which is often not true for many real-world in-the-wild videos. A major obstacle in this field is a lack of dynamic camera intrinsics benchmarks–existing benchmarks typically offer limited diversity in scene content and intrinsics variation, and none provide per-frame intrinsic changes for consecutive video frames. In this paper, we present Intrinsics in Flux (InFlux), a real-world benchmark that provides per-frame ground truth intrinsics annotations for videos with dynamic intrinsics. Compared to prior benchmarks, InFlux captures a wider range of intrinsic variations and scene diversity, featuring 143K+ annotated frames from 386 high-resolution indoor and outdoor videos with dynamic camera intrinsics. To ensure accurate per-frame intrinsics, we build a comprehensive lookup table of calibration experiments and extend the Kalibr toolbox to improve its accuracy and robustness. Using our benchmark, we evaluate existing baseline methods for predicting camera intrinsics and find that most struggle to achieve accurate predictions on videos with dynamic intrinsics. For the dataset, code, videos, and submission, please visit this https URL.
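
For readers unfamiliar with the term, the sketch below shows why per-frame intrinsics matter, using the standard pinhole model without lens distortion. The `Intrinsics` class, the focal lengths, the principal point, and the 3D point are all made-up values for illustration; they are not taken from the InFlux dataset or its tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics: focal lengths (fx, fy) and principal point (cx, cy), in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float

    def project(self, point):
        """Project a 3D point (x, y, z) in the camera frame to pixel coordinates (u, v)."""
        x, y, z = point
        if z <= 0:
            raise ValueError("point must be in front of the camera")
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)


# Hypothetical zoom-in over three frames: the focal length grows, the principal point stays fixed.
PER_FRAME = [
    Intrinsics(800.0, 800.0, 640.0, 360.0),
    Intrinsics(1000.0, 1000.0, 640.0, 360.0),
    Intrinsics(1200.0, 1200.0, 640.0, 360.0),
]

POINT = (0.5, -0.25, 2.0)  # a static 3D point, 2 m in front of the camera
PIXELS = [k.project(POINT) for k in PER_FRAME]
```

The same static point lands on a different pixel in each frame purely because the focal length changed. An algorithm that assumes constant intrinsics would have to explain that shift as camera or scene motion, which is the failure mode a per-frame benchmark is meant to expose.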

[CV-5] FARMER: Flow AutoRegressive Transformer over Pixels

Link: https://arxiv.org/abs/2510.23588
Authors: Guangting Zheng,Qinyu Zhao,Tao Yang,Fei Xiao,Zhijie Lin,Jie Wu,Jiajun Deng,Yanyong Zhang,Rui Zhu
Affiliations: ByteDance (字节跳动)
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Bytedance Seed Technical Report

[CV-6] Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation

Link: https://arxiv.org/abs/2510.23581
Authors: Junyoung Seo,Rodrigo Mira,Alexandros Haliassos,Stella Bounareli,Honglie Chen,Linh Tran,Seungryong Kim,Zoe Landgraf,Jie Shen
Affiliations: KAIST (韩国科学技术院); Imperial College London (帝国理工学院)
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: Project page: this https URL

[CV-7] UrbanVLA: A Vision-Language-Action Model for Urban Micromobility

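[Quick Read]: This paper addresses reliable robot navigation in large-scale, dynamic, unstructured environments for urban micromobility, especially the challenge of following long-horizon route instructions. Existing navigation methods mostly target short-range, controllable scenarios and struggle to combine the high-level route alignment and low-level obstacle avoidance that real urban applications need. The key to the solution is UrbanVLA, a route-conditioned Vision-Language-Action (VLA) framework that plans trajectories by explicitly aligning noisy route waypoints with visual observations, trained in two stages: Supervised Fine-Tuning (SFT) on simulated environments and trajectories parsed from web videos, followed by Reinforcement Fine-Tuning (RFT) on a mixture of simulation and real-world data. This improves both low-level motion control and high-level route understanding, surpassing baselines by more than 55% on the SocialNav task on MetaUrban and showing scalability and robustness in the real world.

Link: https://arxiv.org/abs/2510.23576
Authors: Anqi Li,Zhiyong Wang,Jiazhao Zhang,Minghan Li,Yunpeng Qi,Zhibo Chen,Zhizheng Zhang,He Wang
Affiliations: Peking University (北京大学); Galbot; USTC (中国科学技术大学); BAAI (北京人工智能研究院)
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: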

Abstract:Urban micromobility applications, such as delivery robots, demand reliable navigation across large-scale urban environments while following long-horizon route instructions. This task is particularly challenging due to the dynamic and unstructured nature of real-world city areas, yet most existing navigation methods remain tailored to short-scale and controllable scenarios. Effective urban micromobility requires two complementary levels of navigation skills: low-level capabilities such as point-goal reaching and obstacle avoidance, and high-level capabilities, such as route-visual alignment. To this end, we propose UrbanVLA, a route-conditioned Vision-Language-Action (VLA) framework designed for scalable urban navigation. Our method explicitly aligns noisy route waypoints with visual observations during execution, and subsequently plans trajectories to drive the robot. To enable UrbanVLA to master both levels of navigation, we employ a two-stage training pipeline. The process begins with Supervised Fine-Tuning (SFT) using simulated environments and trajectories parsed from web videos. This is followed by Reinforcement Fine-Tuning (RFT) on a mixture of simulation and real-world data, which enhances the model’s safety and adaptability in real-world settings. Experiments demonstrate that UrbanVLA surpasses strong baselines by more than 55% in the SocialNav task on MetaUrban. Furthermore, UrbanVLA achieves reliable real-world navigation, showcasing both scalability to large-scale urban environments and robustness against real-world uncertainties.

[CV-8] More Than Generation: Unifying Generation and Depth Estimation via Text-to-Image Diffusion Models NEURIPS2025

Link: https://arxiv.org/abs/2510.23574
Authors: Hongkai Lin,Dingkang Liang,Mingyang Du,Xin Zhou,Xiang Bai
Affiliations: Huazhong University of Science and Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by NeurIPS 2025. The code will be made available at this https URL

[CV-9] RobotArena ∞: Scalable Robot Benchmarking via Real-to-Sim Translation

Link: https://arxiv.org/abs/2510.23571
Authors: Yash Jangir,Yidi Zhang,Kashu Yamazaki,Chenyu Zhang,Kuan-Hsun Tu,Tsung-Wei Ke,Lei Ke,Yonatan Bisk,Katerina Fragkiadaki
Affiliations: Carnegie Mellon University; Zhejiang University; Peking University; National Taiwan University
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: Website: this https URL

[CV-10] EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT NEURIPS2025

Link: https://arxiv.org/abs/2510.23569
Authors: Baoqi Pei,Yifei Huang,Jilan Xu,Yuping He,Guo Chen,Fei Wu,Yu Qiao,Jiangmiao Pang
Affiliations: Shanghai Artificial Intelligence Laboratory; Zhejiang University; The University of Tokyo; Fudan University; Nanjing University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted at NeurIPS 2025

[CV-11] DPGLA: Bridging the Gap between Synthetic and Real Data for Unsupervised Domain Adaptation in 3D LiDAR Semantic Segmentation IROS

Link: https://arxiv.org/abs/2510.23525
Authors: Wanmeng Li,Simone Mosco,Daniel Fusaro,Alberto Pretto
Affiliations: Intelligent Autonomous Systems Laboratory (IAS-LAB); Department of Information Engineering of the University of Padua; Italy
Categories: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Comments: This paper has been accepted for publication at the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

[CV-12] FreeFuse: Multi-Subject LoRA Fusion via Auto Masking at Test Time

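[Quick Read]: This paper addresses how to fuse multiple subject LoRAs (Low-Rank Adaptation) efficiently for multi-subject text-to-image generation without extra training, without modifying LoRA weights, and without auxiliary models or complex segmentation techniques. Existing methods either merge LoRA weights before inference or rely on segmentation models and techniques such as noise blending to isolate the outputs of different LoRAs, which makes the pipeline cumbersome and inflexible. The key to the solution is a set of context-aware dynamic subject masks derived from cross-attention layer weights. A mathematical analysis shows that applying these masks directly to the LoRA outputs closely approximates integrating each subject LoRA into the diffusion model and using it alone within its masked region. This gives seamless integration with no training, no extra models, and no user-specified regions or templates.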

Link: https://arxiv.org/abs/2510.23515
Authors: Yaoli Liu,Yao-Xiang Ding,Kun Zhou
Affiliations: Zhejiang University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:This paper proposes FreeFuse, a novel training-free approach for multi-subject text-to-image generation through automatic fusion of multiple subject LoRAs. In contrast to existing methods that either focus on pre-inference LoRA weight merging or rely on segmentation models and complex techniques like noise blending to isolate LoRA outputs, our key insight is that context-aware dynamic subject masks can be automatically derived from cross-attention layer weights. Mathematical analysis shows that directly applying these masks to LoRA outputs during inference well approximates the case where the subject LoRA is integrated into the diffusion model and used individually for the masked region. FreeFuse demonstrates superior practicality and efficiency as it requires no additional training, no modification to LoRAs, no auxiliary models, and no user-defined prompt templates or region specifications. Alternatively, it only requires users to provide the LoRA activation words for seamless integration into standard workflows. Extensive experiments validate that FreeFuse outperforms existing approaches in both generation quality and usability under the multi-subject generation tasks. The project page is at this https URL
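
The masking idea in this abstract can be illustrated with a minimal sketch. It is not the FreeFuse implementation: the argmax rule for turning attention weights into masks, the function names, and the toy numbers are all assumptions made for illustration. The sketch only shows the general pattern of deriving one mask per subject from attention weights and applying each LoRA's output inside its own mask.

```python
def subject_masks(attn):
    """attn[s][p] is a (hypothetical) cross-attention weight between
    subject s's activation word and spatial position p.
    Each position is assigned to the subject with the largest weight."""
    n_subjects, n_pos = len(attn), len(attn[0])
    masks = [[0.0] * n_pos for _ in range(n_subjects)]
    for p in range(n_pos):
        winner = max(range(n_subjects), key=lambda s: attn[s][p])
        masks[winner][p] = 1.0
    return masks


def fuse_lora_outputs(base_out, lora_deltas, masks):
    """Add each subject's LoRA output only inside that subject's mask."""
    fused = list(base_out)
    for delta, mask in zip(lora_deltas, masks):
        for p in range(len(fused)):
            fused[p] += mask[p] * delta[p]
    return fused


# Toy example with two subjects and four spatial positions (made-up values).
attn = [[0.9, 0.7, 0.1, 0.2],   # subject A
        [0.1, 0.3, 0.8, 0.6]]   # subject B
masks = subject_masks(attn)
fused = fuse_lora_outputs(
    [1.0, 1.0, 1.0, 1.0],                            # base layer output
    [[0.5, 0.5, 0.5, 0.5], [-0.2, -0.2, -0.2, -0.2]],  # per-subject LoRA outputs
    masks,
)
```

In this toy case the first two positions receive only subject A's LoRA output and the last two only subject B's, so the two adapters never overlap spatially.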

[CV-13] Localising under the drape: proprioception in the era of distributed surgical robotic system

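[Quick Read]: This paper addresses the collisions, system recoveries, and workflow disruptions caused by surgical robots' lack of spatial awareness, problems that will intensify as distributed robotic systems add more independent interacting arms. Existing tracking systems rely on bulky infrared cameras and reflective markers, which offer a limited view and add hardware burden in the operating room. The key to the solution is a marker-free proprioception method that uses only lightweight stereo-RGB cameras and novel transformer-based deep learning models to localise fully draped surgical robots precisely despite the obstruction of visual cues. The method builds on the largest multi-centre spatial robotic surgery dataset to date (1.4M self-annotated images). By tracking the entire robot and surgical scene rather than individual markers, it provides a holistic view that is robust to occlusions and supports surgical scene understanding and context-aware control. It also eliminates markers and improves tracking visibility by 25%, laying groundwork for modular and autonomous robotic surgery.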

Link: https://arxiv.org/abs/2510.23512
Authors: Martin Huber,Nicola A. Cavalcanti,Ayoob Davoodi,Ruixuan Li,Christopher E. Mower,Fabio Carrillo,Christoph J. Laux,Francois Teyssere,Thibault Chandanson,Antoine Harlé,Elie Saghbiny,Mazda Farshad,Guillaume Morel,Emmanuel Vander Poorten,Philipp Fürnstahl,Sébastien Ourselin,Christos Bergeles,Tom Vercauteren
Affiliations: King’s College London; Huawei; Balgrist University Hospital; University of Zurich; KU Leuven; Sorbonne University; SpineGuard
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Despite their mechanical sophistication, surgical robots remain blind to their surroundings. This lack of spatial awareness causes collisions, system recoveries, and workflow disruptions, issues that will intensify with the introduction of distributed robots with independent interacting arms. Existing tracking systems rely on bulky infrared cameras and reflective markers, providing only limited views of the surgical scene and adding hardware burden in crowded operating rooms. We present a marker-free proprioception method that enables precise localisation of surgical robots under their sterile draping despite associated obstruction of visual cues. Our method solely relies on lightweight stereo-RGB cameras and novel transformer-based deep learning models. It builds on the largest multi-centre spatial robotic surgery dataset to date (1.4M self-annotated images from human cadaveric and preclinical in vivo studies). By tracking the entire robot and surgical scene, rather than individual markers, our approach provides a holistic view robust to occlusions, supporting surgical scene understanding and context-aware control. We demonstrate an example of potential clinical benefits during in vivo breathing compensation with access to tissue dynamics, unobservable under state of the art tracking, and accurately locate in multi-robot systems for future intelligent interaction. In addition, and compared with existing systems, our method eliminates markers and improves tracking visibility by 25%. To our knowledge, this is the first demonstration of marker-free proprioception for fully draped surgical robots, reducing setup complexity, enhancing safety, and paving the way toward modular and autonomous robotic surgery.

[CV-14] Pac: Incorporating Intra-image Patch Context into Graph Neural Networks for Medical Image Classification ICONIP2025

Link: https://arxiv.org/abs/2510.23504
Authors: Usama Zidan,Mohamed Gaber,Mohammed M. Abdelsamea
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted for publication in the proceedings of ICONIP 2025

[CV-15] VOLD: Reasoning Transfer from LLMs to Vision-Language Models via On-Policy Distillation WWW

Link: https://arxiv.org/abs/2510.23497
Authors: Walid Bousselham,Hilde Kuehne,Cordelia Schmid
Affiliations: Tuebingen AI Center; University of Tuebingen; MIT-IBM Watson AI Lab; Inria; École Normale Supérieure; CNRS; PSL Research University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: this http URL

[CV-16] Yesnt: Are Diffusion Relighting Models Ready for Capture Stage Compositing? A Hybrid Alternative to Bridge the Gap

Link: https://arxiv.org/abs/2510.23494
Authors: Elisabeth Jüttner,Leona Krath,Stefan Korfhage,Hannah Dröge,Matthias B. Hullin,Markus Plack
Affiliations: University of Bonn
Categories: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Comments:

[CV-17] T-REGS: Minimum Spanning Tree Regularization for Self-Supervised Learning NEURIPS2025

Link: https://arxiv.org/abs/2510.23484
Authors: Julie Mordacq,David Loiseaux,Vicky Kalogeiton,Steve Oudot
Affiliations: Inria Saclay; LIX, CNRS, École Polytechnique, IP Paris
Categories: Machine Learning (cs.LG); Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV)
Comments: NeurIPS 2025

[CV-18] On the Faithfulness of Visual Thinking: Measurement and Enhancement

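[Quick Read]: This paper addresses the unfaithfulness of visual information in the vision-text Multimodal Chain-of-Thought (MCoT) reasoning that Large Vision-Language Models (LVLMs) produce after Reinforcement Fine-Tuning (RFT). Although MCoT reasoning can reach correct answers, the visual information it brings in is often inaccurate or even irrelevant, which indicates that the model does not actually rely on visual evidence. The authors attribute this to the RFT reward, which encourages only the format of interleaved vision-text cues rather than the correctness of the visual content. To address this, the paper proposes a learning strategy called the Sufficient-Component Cause Model (SCCM), whose core idea is to guide MCoT to generate sufficient yet minimal visual components that can lead to the correct answer on their own. SCCM requires no human annotation and plugs into existing RFT pipelines for MCoT; experiments show that it consistently improves visual faithfulness across several fine-grained perception and reasoning benchmarks.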

Link: https://arxiv.org/abs/2510.23482
Authors: Zujing Liu,Junwen Pan,Qi She,Yuan Gao,Guisong Xia
Affiliations: Wuhan University; ByteDance
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

Abstract: Recent large vision-language models (LVLMs) can generate vision-text multimodal chain-of-thought (MCoT) traces after reinforcement fine-tuning (RFT). However, we observe that the visual information incorporated in MCoT is often inaccurate even though the traces still yield correct answers, indicating a lack of faithfulness in the MCoT reasoning process. We attribute this unfaithfulness to the RL reward in RFT, which solely incentivizes the format of interleaved vision-text cues, i.e., it encourages the model to incorporate visual information into its text reasoning steps without considering the correctness of the visual information. In this paper, we first probe the faithfulness of MCoT by measuring how much the prediction changes when its visual and textual thoughts are intervened on. Surprisingly, the model’s predictions remain nearly unchanged under visual intervention but change significantly under textual intervention, indicating that the visual evidence is largely ignored. To further analyze visual information, we introduce an automated LVLM-based evaluation metric that quantifies the faithfulness of visual cues from two perspectives: reliability and sufficiency. Our evaluation reveals that the visual information in current MCoT traces is simultaneously unreliable and insufficient. To address this issue, we propose a novel MCoT learning strategy termed Sufficient-Component Cause Model (SCCM) learning. This approach encourages the MCoT to generate sufficient yet minimal visual components that are independently capable of leading to correct answers. We note that the proposed SCCM is annotation-free and compatible with various RFT for MCoT in a plug-and-play manner. Empirical results demonstrate that SCCM consistently improves the visual faithfulness across a suite of fine-grained perception and reasoning benchmarks. Code is available at this https URL.
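
The intervention probe described in this abstract (how much predictions change when visual or textual thoughts are altered) could be summarised with a simple change rate. The sketch below is only an illustration under stated assumptions: the function name `change_rate` and the example predictions are hypothetical, and the paper's actual measure may be defined differently.

```python
def change_rate(original, intervened):
    """Fraction of examples whose predicted answer differs after an intervention."""
    if not original or len(original) != len(intervened):
        raise ValueError("prediction lists must be non-empty and of equal length")
    changed = sum(o != i for o, i in zip(original, intervened))
    return changed / len(original)


# Hypothetical predictions on four questions.
original_preds = ["A", "B", "C", "D"]
after_visual_intervention = ["A", "B", "C", "A"]   # visual thoughts altered
after_textual_intervention = ["B", "B", "A", "A"]  # textual thoughts altered

visual_rate = change_rate(original_preds, after_visual_intervention)
textual_rate = change_rate(original_preds, after_textual_intervention)
```

A low change rate under visual intervention combined with a high one under textual intervention is the pattern the abstract reports as evidence that visual cues are largely ignored.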

[CV-19] MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding

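[Quick Read]: This paper addresses the trade-off among scalability, robustness, and alignment quality in vision-language alignment training for Multi-modal Large Language Models (MLLMs). Mainstream Supervised Fine-Tuning (SFT) depends on large-scale human annotation and struggles to capture subtle preferences, while Reinforcement Learning (RL) introduces a reward signal but suffers from computational overhead and unstable training. The authors propose MergeMix, a training-time augmentation paradigm. Its key idea is an attention-aware image mixing strategy, based on token merging, that yields denser cluster representations and preserves spatial context. Preference pairs are then built from mixed images and raw images and optimized with the SimPO loss. This improves attention consistency and alignment efficiency without a significant increase in complexity, and achieves better classification performance and training stability than traditional heuristic methods.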

Link: https://arxiv.org/abs/2510.23479
Authors: Xin Jin,Siyuan Li,Siyong Jian,Kai Yu,Huan Wang
Affiliations: Westlake University; Zhejiang University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Code Link: this https URL

Abstract:Vision-language alignment in multi-modal large language models (MLLMs) typically relies on supervised fine-tuning (SFT) or reinforcement learning (RL). SFT is stable and efficient but requires large-scale human annotations and cannot capture subtle preferences, while RL brings in a reward signal for training, but suffers from overhead and instability. These limitations highlight a trade-off between scalability, robustness, and alignment quality. To address this, we propose MergeMix, a training-time augmentation paradigm that bridges SFT and RL. It first applies an attention-aware image mixing via token merge with more cluster representation and spatial context, and then presents a preference-driven training paradigm for MLLMs by building preference pairs with mixed images and raw images, and optimizing via SimPO loss. As a mixup augmentation, MergeMix enhances attention consistency and efficiency, surpassing other heuristic-based methods in classification. Extensive experiments demonstrate that MergeMix achieves competitive accuracy with improved efficiency, providing a scalable approach to preference alignment in classification and MLLMs.
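
For readers unfamiliar with the SimPO loss mentioned in this abstract, a minimal sketch follows. It implements the length-normalised preference objective from the SimPO formulation, loss = -log sigmoid(beta * (logp_chosen / len_chosen - logp_rejected / len_rejected) - gamma). The default `beta` and `gamma` values are placeholders, and which response of a raw-image/mixed-image pair MergeMix treats as preferred is not specified here; both are assumptions.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=0.5):
    """SimPO loss for one preference pair.

    logp_* are summed token log-probabilities of each response and len_* are
    their token counts; beta and gamma here are placeholder hyperparameters.
    """
    margin = beta * (logp_chosen / len_chosen
                     - logp_rejected / len_rejected) - gamma
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


# The loss is small when the preferred response is more likely per token,
# and large when the ordering is reversed.
loss_good = simpo_loss(-5.0, 5, -10.0, 5)
loss_bad = simpo_loss(-10.0, 5, -5.0, 5)
```

Unlike DPO-style objectives, this formulation needs no reference model, which is one reason it is inexpensive to combine with data augmentation.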

[CV-20] UrbanIng-V2X: A Large-Scale Multi-Vehicle Multi-Infrastructure Dataset Across Multiple Intersections for Cooperative Perception NEURIPS2025

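[Quick Read]: This paper addresses the lack of large-scale, multi-intersection, multi-modal datasets in cooperative perception research, a gap that lets models overfit to a single environment and distorts performance evaluation. Existing real-world datasets usually cover only a single intersection or a single vehicle and cannot reflect the perception challenges of complex urban traffic. The key to the solution is UrbanIng-V2X, the first large-scale multi-modal cooperative perception dataset deployed across three different intersections in Ingolstadt, Germany. It contains 34 temporally aligned and spatially calibrated sensor sequences with data from 12 vehicle-mounted RGB cameras, 2 vehicle LiDARs, 17 infrastructure thermal cameras, and 12 infrastructure LiDARs. All data are annotated at 10 Hz with 3D bounding boxes covering 13 object classes, for about 712k annotated instances in total. The dataset supports benchmarking of multi-vehicle and infrastructure cooperative perception and improves the assessment of how well algorithms generalize across diverse traffic environments.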

Link: https://arxiv.org/abs/2510.23478
Authors: Karthikeyan Chandra Sekaran,Markus Geisler,Dominik Rößle,Adithya Mohan,Daniel Cremers,Wolfgang Utschick,Michael Botsch,Werner Huber,Torsten Schön
Affiliations: Technische Hochschule Ingolstadt; Technical University of Munich
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted to NeurIPS 2025. Including supplemental material. For code and dataset, see this https URL

Abstract:Recent cooperative perception datasets have played a crucial role in advancing smart mobility applications by enabling information exchange between intelligent agents, helping to overcome challenges such as occlusions and improving overall scene understanding. While some existing real-world datasets incorporate both vehicle-to-vehicle and vehicle-to-infrastructure interactions, they are typically limited to a single intersection or a single vehicle. A comprehensive perception dataset featuring multiple connected vehicles and infrastructure sensors across several intersections remains unavailable, limiting the benchmarking of algorithms in diverse traffic environments. Consequently, overfitting can occur, and models may demonstrate misleadingly high performance due to similar intersection layouts and traffic participant behavior. To address this gap, we introduce UrbanIng-V2X, the first large-scale, multi-modal dataset supporting cooperative perception involving vehicles and infrastructure sensors deployed across three urban intersections in Ingolstadt, Germany. UrbanIng-V2X consists of 34 temporally aligned and spatially calibrated sensor sequences, each lasting 20 seconds. All sequences contain recordings from one of three intersections, involving two vehicles and up to three infrastructure-mounted sensor poles operating in coordinated scenarios. In total, UrbanIng-V2X provides data from 12 vehicle-mounted RGB cameras, 2 vehicle LiDARs, 17 infrastructure thermal cameras, and 12 infrastructure LiDARs. All sequences are annotated at a frequency of 10 Hz with 3D bounding boxes spanning 13 object classes, resulting in approximately 712k annotated instances across the dataset. We provide comprehensive evaluations using state-of-the-art cooperative perception methods and publicly release the codebase, dataset, HD map, and a digital twin of the complete data collection environment.
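
The dataset figures quoted in this abstract allow a quick back-of-the-envelope check of its scale. The short calculation below uses only the stated numbers (34 sequences, 20 seconds each, 10 Hz annotation, roughly 712k instances) and assumes every sequence is annotated at every 10 Hz timestamp for its full duration, which the abstract implies but does not state explicitly.

```python
sequences = 34
seconds_per_sequence = 20
annotation_hz = 10
annotated_instances = 712_000  # "approximately 712k" in the abstract

# Total recording time and number of annotated timestamps across the dataset.
total_recording_seconds = sequences * seconds_per_sequence
annotated_timestamps = total_recording_seconds * annotation_hz

# Average number of annotated 3D boxes per annotated timestamp.
instances_per_timestamp = annotated_instances / annotated_timestamps
```

Under that assumption the dataset covers 680 seconds of recording and 6,800 annotated timestamps, or roughly 105 annotated objects per timestamp on average.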

[CV-21] Video-Thinker: Sparking “Thinking with Videos” via Reinforcement Learning

Link: https://arxiv.org/abs/2510.23473
Authors: Shijian Wang,Jiarui Jin,Xingjian Wang,Linxin Song,Runhao Fu,Hecheng Wang,Zongyuan Ge,Yuan Lu,Xuelian Cheng
Affiliations: Southeast University; Monash University; Xiaohongshu Inc.; University of Southern California; Fudan University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-22] FRBNet: Revisiting Low-Light Vision through Frequency-Domain Radial Basis Network

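[Quick Read]: This paper addresses the sharp drop in performance of vision tasks under low-light conditions. The core challenge is that existing methods model low-light conditions incompletely, which limits downstream tasks such as object detection and segmentation. The key to the solution is to revisit low-light image formation: the classical Lambertian model is extended to characterize low-light conditions more accurately, and a frequency-domain analysis proves theoretically that the frequency-domain channel ratio can be used to extract illumination-invariant features. On this basis the paper proposes a novel end-to-end trainable module, the Frequency-domain Radial Basis Network (FRBNet), which combines the frequency-domain channel ratio operation with a learnable frequency-domain filter to enhance illumination-invariant features. It can be integrated as a plug-and-play module into existing networks for low-light downstream tasks without modifying their loss functions.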

Link: https://arxiv.org/abs/2510.23444
Authors: Fangtong Sun,Congyu Li,Ke Yang,Yuchen Pan,Hanwen Yu,Xichuan Zhang,Yiying Li
Affiliations: Intelligent Game and Decision Lab, Beijing, China; NUDT; Hunan University; Harbin Institute of Technology
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

Abstract: Low-light vision remains a fundamental challenge in computer vision due to severe illumination degradation, which significantly affects the performance of downstream tasks such as detection and segmentation. While recent state-of-the-art methods have improved performance through invariant feature learning modules, they still fall short due to incomplete modeling of low-light conditions. Therefore, we revisit low-light image formation and extend the classical Lambertian model to better characterize low-light conditions. By shifting our analysis to the frequency domain, we theoretically prove that the frequency-domain channel ratio can be leveraged to extract illumination-invariant features via a structured filtering process. We then propose a novel and end-to-end trainable module named Frequency-domain Radial Basis Network (FRBNet), which integrates the frequency-domain channel ratio operation with a learnable frequency domain filter for the overall illumination-invariant feature enhancement. As a plug-and-play module, FRBNet can be integrated into existing networks for low-light downstream tasks without modifying loss functions. Extensive experiments across various downstream tasks demonstrate that FRBNet achieves superior performance, including +2.2 mAP for dark object detection and +2.9 mIoU for nighttime segmentation. Code is available at: this https URL.
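
One way to build intuition for why a channel ratio can be illumination-invariant is the plain Lambertian case, where each colour channel is reflectance multiplied by a shared illumination term. Taking logs turns the product into a sum, the shared illumination cancels in the difference of two channels, and because the Fourier transform is linear the cancellation carries over to the frequency domain. The sketch below demonstrates only this textbook property on a toy 1D signal; it is not the FRBNet module, the paper uses an extended image formation model, and all names and values are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a short real-valued signal."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def log_channel_ratio_spectrum(channel_a, channel_b):
    """Spectrum of log(channel_a / channel_b), computed pixel by pixel."""
    log_ratio = [math.log(a) - math.log(b) for a, b in zip(channel_a, channel_b)]
    return dft(log_ratio)


def render(reflectance, illumination):
    """Plain Lambertian toy model: intensity = reflectance * illumination."""
    return [r * l for r, l in zip(reflectance, illumination)]


# Toy reflectances for two colour channels and two lighting conditions.
reflect_r = [0.8, 0.6, 0.9, 0.4]
reflect_g = [0.5, 0.7, 0.3, 0.6]
bright = [1.0, 1.0, 1.0, 1.0]
dark = [0.05, 0.10, 0.02, 0.08]  # dim and spatially uneven lighting

spec_bright = log_channel_ratio_spectrum(render(reflect_r, bright),
                                         render(reflect_g, bright))
spec_dark = log_channel_ratio_spectrum(render(reflect_r, dark),
                                       render(reflect_g, dark))
max_gap = max(abs(a - b) for a, b in zip(spec_bright, spec_dark))
```

Here `max_gap` is zero up to floating-point error, so the two spectra agree even though the dark rendering is far dimmer and unevenly lit.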

[CV-23] CURVETE: Curriculum Learning and Progressive Self-supervised Training for Medical Image Classification ICONIP2025

Link: https://arxiv.org/abs/2510.23442
Authors: Asmaa Abbas,Mohamed Gaber,Mohammed M. Abdelsamea
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted for publication in the proceedings of ICONIP 2025

[CV-24] MiCADangelo: Fine-Grained Reconstruction of Constrained CAD Models from 3D Scans NEURIPS2025

Link: https://arxiv.org/abs/2510.23429
Authors: Ahmet Serdar Karadeniz,Dimitrios Mallis,Danila Rukhovich,Kseniya Cherenkova,Anis Kacem,Djamila Aouada
Affiliations: SnT, University of Luxembourg; Artec 3D
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted at NeurIPS 2025

[CV-25] Quality-controlled registration of urban MLS point clouds reducing drift effects by adaptive fragmentation

Link: https://arxiv.org/abs/2510.23416
Authors: Marco Antonio Ortiz Rincon,Yihui Yang,Christoph Holst
Affiliations: Chair of Engineering Geodesy, TUM School of Engineering and Design, Technical University of Munich
Categories: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
Comments: 10 pages, 7 figures. This manuscript is currently under review at the International Journal of Applied Earth Observation and Geoinformation (Elsevier). A preprint version will also be available on SSRN (Elsevier Preprints) with a DOI once processed. This is the original preprint version submitted for peer review

[CV-26] Towards Generalisable Foundation Models for 3D Brain MRI

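[Quick Read]: This paper addresses three limits on feature learning for 3D brain magnetic resonance imaging (MRI): scarce annotated data, under-use of multimodal information, and the inability of conventional single-slice modeling to capture full brain anatomy. The key to the solution is BrainFound, a self-supervised foundation model that extends the DINO-v2 architecture. It incorporates volumetric information from sequential MRI slices to model complete 3D brain anatomy, moving beyond the conventional single-slice paradigm. The model supports both single-modal and multimodal inputs, generalises well across imaging protocols and clinical scenarios, and improves downstream tasks such as disease detection and image segmentation, outperforming existing self-supervised pretraining strategies and supervised baselines particularly in label-scarce and multi-contrast settings.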

Link: https://arxiv.org/abs/2510.23415
Authors: Moona Mazher,Geoff J. M. Parker,Daniel C. Alexander
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Foundation models in artificial intelligence (AI) are transforming medical imaging by enabling general-purpose feature learning from large-scale, unlabeled datasets. In this work, we introduce BrainFound, a self-supervised foundation model for brain MRI, built by extending DINO-v2, a vision transformer originally designed for 2D natural images. BrainFound adapts DINO-v2 to model full 3D brain anatomy by incorporating volumetric information from sequential MRI slices, moving beyond conventional single-slice paradigms. It supports both single- and multimodal inputs, enabling a broad range of downstream tasks, including disease detection and image segmentation, while generalising across varied imaging protocols and clinical scenarios. We show that BrainFound consistently outperforms existing self-supervised pretraining strategies and supervised baselines, particularly in label-scarce and multi-contrast settings. By integrating information from diverse 3D MRI modalities (e.g., T1, T2, FLAIR), it enhances diagnostic accuracy and reduces dependency on extensive expert annotations. This flexibility makes BrainFound a scalable and practical solution for 3D neuroimaging pipelines, with significant potential for clinical deployment and research innovation.

[CV-27] Symmetria: A Synthetic Dataset for Learning in Point Clouds

Link: https://arxiv.org/abs/2510.23414
Authors: Ivan Sipiran,Gustavo Santelices,Lucas Oyarzún,Andrea Ranieri,Chiara Romanengo,Silvia Biasotti,Bianca Falcidieno
Affiliations: University of Chile; National Research Council
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 40 pages

[CV-28] Color and Frequency Correction for Image Colorization

Link: https://arxiv.org/abs/2510.23399
Authors: Yun Kai Zhuang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 7 pages, 5 tables

[CV-29] VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations

Link: https://arxiv.org/abs/2510.23397
Authors: Lu Dong,Haiyu Zhang,Han Lin,Ziang Yan,Xiangyu Zeng,Hongjie Zhang,Yifei Huang,Yi Wang,Zhen-Hua Ling,Limin Wang,Yali Wang
Affiliations: University of Science and Technology of China; Shanghai Artificial Intelligence Laboratory; Beihang University; Shanghai Jiao Tong University; Zhejiang University; State Key Laboratory for Novel Software Technology, Nanjing University; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-30] An Efficient Remote Sensing Super Resolution Method Exploring Diffusion Priors and Multi-Modal Constraints for Crop Type Mapping

Link: https://arxiv.org/abs/2510.23382
Authors: Songxi Yang,Tang Sui,Qunying Huang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 41 pages

[CV-31] PlanarTrack: A high-quality and challenging benchmark for large-scale planar object tracking

Link: https://arxiv.org/abs/2510.23368
Authors: Yifan Jiao,Xinran Liu,Xiaoqiong Liu,Xiaohui Yuan,Heng Fan,Libo Zhang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-32] Interpretable Tile-Based Classification of Paclitaxel Exposure

Link: https://arxiv.org/abs/2510.23363
Authors: Sean Fletcher,Gabby Scott,Douglas Currie,Xin Zhang,Yuqi Song,Bruce MacLeod
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-33] Multitask Multimodal Self-Supervised Learning for Medical Images

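[Quick Read]: This thesis addresses the heavy reliance of medical image analysis on large labeled datasets, a limitation that stems from the high cost of expert annotation and from privacy and legal constraints. The key to the solution is Medformer, a new neural network architecture with two core contributions: (1) novel pretext tasks based on self-supervised learning that extract transferable semantic features from unlabeled data; and (2) a dynamic input-output adaptation mechanism that enables multitask learning with deep domain adaptation, so that medical images of different modalities (for example 2D X-rays and 3D MRIs) and sizes can be integrated effectively. The approach reduces dependence on manually labeled data, and its generalization is validated on benchmarks such as MedMNIST, offering a path toward more efficient and scalable medical AI diagnostic systems.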

Link: https://arxiv.org/abs/2510.23325
Authors: Cristian Simionescu
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

Abstract:This thesis works to address a pivotal challenge in medical image analysis: the reliance on extensive labeled datasets, which are often limited due to the need for expert annotation and constrained by privacy and legal issues. By focusing on the development of self-supervised learning techniques and domain adaptation methods, this research aims to circumvent these limitations, presenting a novel approach to enhance the utility and efficacy of deep learning in medical imaging. Central to this thesis is the development of the Medformer, an innovative neural network architecture designed for multitask learning and deep domain adaptation. This model is adept at pre-training on diverse medical image datasets, handling varying sizes and modalities, and is equipped with a dynamic input-output adaptation mechanism. This enables efficient processing and integration of a wide range of medical image types, from 2D X-rays to complex 3D MRIs, thus mitigating the dependency on large labeled datasets. Further, the thesis explores the current state of self-supervised learning in medical imaging. It introduces novel pretext tasks that are capable of extracting meaningful information from unlabeled data, significantly advancing the model’s interpretative abilities. This approach is validated through rigorous experimentation, including the use of the MedMNIST dataset, demonstrating the model’s proficiency in learning generalized features applicable to various downstream tasks. In summary, this thesis contributes to the advancement of medical image analysis by offering a scalable, adaptable framework that reduces reliance on labeled data. It paves the way for more accurate, efficient diagnostic tools in healthcare, signifying a major step forward in the application of deep learning in medical imaging. 

[CV-34] ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation

Link: https://arxiv.org/abs/2510.23306
Authors: Jiahao Chang,Chongjie Ye,Yushuang Wu,Yuantao Chen,Yidan Zhang,Zhongjin Luo,Chenghong Li,Yihao Zhi,Xiaoguang Han
Affiliations: The Chinese University of Hong Kong, Shenzhen; The Future Network of Intelligence Institute, CUHK-Shenzhen
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 18 pages, 7 figures

[CV-35] MDReID: Modality-Decoupled Learning for Any-to-Any Multi-Modal Object Re-Identification NEURIPS2025

Link: https://arxiv.org/abs/2510.23301
Authors: Yingying Feng,Jie Li,Jie Hu,Yukang Zhang,Lei Tan,Jiayi Ji
Affiliations: Northeastern University; Xiamen University; National University of Singapore
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by NeurIPS 2025

[CV-36] MMSD3.0: A Multi-Image Benchmark for Real-World Multimodal Sarcasm Detection

Link: https://arxiv.org/abs/2510.23299
Authors: Haochen Zhao,Yuyao Kong,Yongxiu Xu,Gaopeng Gou,Hongbo Xu,Yubin Wang,Haoliang Zhang
Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
Categories: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Comments:

[CV-37] Adaptive Stochastic Coefficients for Accelerating Diffusion Sampling NEURIPS2025

Link: https://arxiv.org/abs/2510.23285
Authors: Ruoyu Wang,Beier Zhu,Junzhi Li,Liangyu Yuan,Chi Zhang
Affiliations: AGI Lab, Westlake University; Nanyang Technological University; University of Chinese Academy of Sciences; Institute of Software, Chinese Academy of Sciences; Tongji University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: To appear in NeurIPS 2025

[CV-38] hYOLO Model: Enhancing Object Classification with Hierarchical Context in YOLOv8

Link: https://arxiv.org/abs/2510.23278
Authors: Veska Tsenkova,Peter Stanchev,Daniel Petrov,Deyan Lazarov
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 39 pages, 12 figures, 4 tables, code available at this https URL

[CV-39] A Video Is Not Worth a Thousand Words

Link: https://arxiv.org/abs/2510.23253
Authors: Sam Pollard,Michael Wray
Affiliations: University of Bristol
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-40] Progressive Growing of Patch Size: Curriculum Learning for Accelerated and Improved Medical Image Segmentation MICCAI2024 ATC

Link: https://arxiv.org/abs/2510.23241
Authors: Stefan M. Fischer,Johannes Kiechle,Laura Daza,Lina Felsner,Richard Osuala,Daniel M. Lang,Karim Lekadir,Jan C. Peeken,Julia A. Schnabel
Affiliations: Technische Universität München; Ludwig-Maximilians-Universität München
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: Journal Extension of “Progressive Growing of Patch Size: Resource-Efficient Curriculum Learning for Dense Prediction Tasks” (MICCAI2024) submitted to MedIA

[CV-41] Autoregressive Styled Text Image Generation but Make it Reliable

Link: https://arxiv.org/abs/2510.23240
Authors: Carmine Zaccagnino,Fabio Quattrini,Vittorio Pippi,Silvia Cascianelli,Alessio Tonioni,Rita Cucchiara
Affiliations: University of Modena and Reggio Emilia; Google
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-42] Through the Lens: Benchmarking Deepfake Detectors Against Moiré-Induced Distortions

Link: https://arxiv.org/abs/2510.23225
Authors: Razaib Tariq,Minji Heo,Simon S. Woo,Shahroz Tariq
Affiliations: Sungkyunkwan University; CSIRO’s Data61
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 48 Pages, 29 Figures, 15 Tables

[CV-43] Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment

Link: https://arxiv.org/abs/2510.23224
Authors: Hongyi Wang,Zhengjie Zhu,Jiabo Ma,Fang Wang,Yue Shi,Bo Luo,Jili Wang,Qiuyu Cai,Xiuming Zhang,Yen-Wei Chen,Lanfen Lin,Hao Chen
Affiliations: The Hong Kong University of Science and Technology; Zhejiang University; Huazhong University of Science and Technology; Tongji Medical College; Union Hospital; Sir Run Run Shaw Hospital; The Central Hospital of Wuhan; The First Affiliated Hospital
Categories: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Comments:

[CV-44] VR-Drive: Viewpoint-Robust End-to-End Driving with Feed-Forward 3D Gaussian Splatting NEURIPS2025

Link: https://arxiv.org/abs/2510.23205
Authors: Hoonhee Cho,Jae-Young Kang,Giwon Lee,Hyemin Yang,Heejun Park,Seokwoo Jung,Kuk-Jin Yoon
Affiliations: KAIST; 42dot
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by NeurIPS2025

[CV-45] DecoDINO: 3D Human-Scene Contact Prediction with Semantic Classification

Link: https://arxiv.org/abs/2510.23203
Authors: Lukas Bierling,Davide Pasero,Fleur Dolmans,Helia Ghasemi,Angelo Broere
Affiliations: University of Amsterdam
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-46] Evaluation of Vision-LLMs in Surveillance Video NEURIPS2025

Link: https://arxiv.org/abs/2510.23190
Authors: Pascal Benschop,Cristian Meo,Justin Dauwels,Jelte P. Mense
Affiliations: Delft University of Technology; LatentWorlds AI
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted as poster in the NeurIPS 2025 Workshop on Space in Vision, Language, and Embodied AI

[CV-47] Finding 3D Scene Analogies with Multimodal Foundation Models

Link: https://arxiv.org/abs/2510.23184
Authors: Junho Kim, Young Min Kim
Affiliations: Seoul National University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted to FM4RoboPlan workshop at RSS 2025

[CV-48] AG-Fusion: adaptive gated multimodal fusion for 3d object detection in complex scenes

Link: https://arxiv.org/abs/2510.23151
Authors: Sixian Liu, Chen Xu, Qiang Wang, Donghai Shi, Yiwen Li
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

[CV-49] Implicit Modeling for Transferability Estimation of Vision Foundation Models NEURIPS2025

Link: https://arxiv.org/abs/2510.23145
Authors: Yaoyan Zheng, Huiqun Wang, Nan Zhou, Di Huang
Affiliations: Beihang University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by NeurIPS 2025

[CV-50] DQ3D: Depth-guided Query for Transformer-Based 3D Object Detection in Traffic Scenarios

Link: https://arxiv.org/abs/2510.23144
Authors: Ziyu Wang, Wenhao Li, Ji Wu
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-51] Fast Voxel-Wise Kinetic Modeling in Dynamic PET using a Physics-Informed CycleGAN

Link: https://arxiv.org/abs/2510.23140
Authors: Christian Salomonsen, Samuel Kuttner, Michael Kampffmeyer, Robert Jenssen, Kristoffer Wickstrøm, Jong Chul Ye, Elisabeth Wetzer
Affiliations: UiT The Arctic University of Norway; University Hospital of North Norway; Norwegian Computing Center; Univ. of Copenhagen; Korea Advanced Institute of Science and Technology
Categories: Computer Vision and Pattern Recognition (cs.CV); Other Quantitative Biology (q-bio.OT)
Comments: 5 pages, 1 figure. Pre-review preprint. Submitted to MedEurIPS 2025 (EurIPS workshop)

[CV-52] Note on the Construction of Structure Tensor

Link: https://arxiv.org/abs/2510.23137
Authors: Josef Bigun, Fernando Alonso-Fernandez
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Spectral Theory (math.SP)

[CV-53] DeepSalt: Bridging Laboratory and Satellite Spectra through Domain Adaptation and Knowledge Distillation for Large-Scale Soil Salinity Estimation

Link: https://arxiv.org/abs/2510.23124
Authors: Rupasree Dey, Abdul Matin, Everett Lewark, Tanjim Bin Faruk, Andrei Bachinin, Sam Leuthold, M. Francesca Cotrufo, Shrideep Pallickara, Sangmi Lee Pallickara
Affiliations: University of Colorado Boulder; University of Colorado Denver
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

[CV-54] Task-Agnostic Fusion of Time Series and Imagery for Earth Observation

Link: https://arxiv.org/abs/2510.23118
Authors: Gianfranco Basile, Johannes Jakubik, Benedikt Blumenstiel, Thomas Brunschwiler, Juan Bernabe Moreno
Affiliations: IBM Research Europe; ETH Zürich
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-55] Seeing Structural Failure Before it Happens: An Image-Based Physics-Informed Neural Network (PINN) for Spaghetti Bridge Load Prediction

Link: https://arxiv.org/abs/2510.23117
Authors: Omer Jauhar Khan, Sudais Khan, Hafeez Anwar
Affiliations: National University of Computer and Emerging Sciences (FAST-NUCES)
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Comments: 12 pages, 17 figures. Preprint

[CV-56] Residual Diffusion Bridge Model for Image Restoration

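[Quick read] This paper addresses two problems that existing diffusion bridge models face in image restoration. First, they lack a unified analytical perspective, and most methods treat them merely as simple variants of stochastic interpolants. Second, they inject and remove noise globally, so undegraded regions are distorted by imperfect reconstruction. The key to the solution is the Residual Diffusion Bridge Model (RDBM). Its core contributions are a theoretical reformulation of the forward and reverse stochastic differential equations (SDEs) of the generalized diffusion bridge, with analytical formulas for both, and the use of the residual between the source and target distributions to modulate noise injection and removal, which restores degraded regions adaptively while preserving intact ones. The paper shows that existing bridge models are all special cases of RDBM and demonstrates empirically that the proposed model is optimal and achieves state-of-the-art performance.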

Link: https://arxiv.org/abs/2510.23116
Authors: Hebaixu Wang, Jing Zhang, Haoyang Chen, Haonan Guo, Di Wang, Jiayi Ma, Bo Du
Affiliations: Wuhan University; Zhongguancun Academy
Categories: Computer Vision and Pattern Recognition (cs.CV)

Abstract:Diffusion bridge models establish probabilistic paths between arbitrary paired distributions and exhibit great potential for universal image restoration. Most existing methods merely treat them as simple variants of stochastic interpolants, lacking a unified analytical perspective. Besides, they indiscriminately reconstruct images through global noise injection and removal, inevitably distorting undegraded regions due to imperfect reconstruction. To address these challenges, we propose the Residual Diffusion Bridge Model (RDBM). Specifically, we theoretically reformulate the stochastic differential equations of generalized diffusion bridge and derive the analytical formulas of its forward and reverse processes. Crucially, we leverage the residuals from given distributions to modulate the noise injection and removal, enabling adaptive restoration of degraded regions while preserving intact others. Moreover, we unravel the fundamental mathematical essence of existing bridge models, all of which are special cases of RDBM and empirically demonstrate the optimality of our proposed models. Extensive experiments are conducted to demonstrate the state-of-the-art performance of our method both qualitatively and quantitatively across diverse image restoration tasks. Code is publicly available at this https URL.

[CV-57] Revisiting Multimodal Positional Encoding in Vision-Language Models

Link: https://arxiv.org/abs/2510.23095
Authors: Jie Huang, Xuejing Liu, Sibo Song, Ruibing Hou, Hong Chang, Junyang Lin, Shuai Bai
Affiliations: Qwen Team, Alibaba Group; Institute of Computing Technology, Chinese Academy of Sciences
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 16 pages

[CV-58] EndoWave: Rational-Wavelet 4D Gaussian Splatting for Endoscopic Reconstruction

Link: https://arxiv.org/abs/2510.23087
Authors: Taoyu Wu, Yiyi Miao, Jiaxin Guo, Ziyan Chen, Sihang Zhao, Zhuoxiao Li, Zhe Tang, Baoru Huang, Limin Yu
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

[CV-59] Strategies for Robust Deep Learning Based Deformable Registration

Link: https://arxiv.org/abs/2510.23079
Authors: Joel Honkamaa, Pekka Marttinen
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-60] Seq-DeepIPC: Sequential Sensing for End-to-End Control in Legged Robot Navigation

Link: https://arxiv.org/abs/2510.23057
Authors: Oskar Natan, Jun Miura
Affiliations: Universitas Gadjah Mada; Toyohashi University of Technology
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Systems and Control (eess.SY)
Comments: Preprint notice: this manuscript has been submitted to IEEE Sensors Journal for possible publication

[CV-61] HieraMamba: Video Temporal Grounding via Hierarchical Anchor-Mamba Pooling

Link: https://arxiv.org/abs/2510.23043
Authors: Joungbin An, Kristen Grauman
Affiliations: The University of Texas at Austin
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Project Page: this https URL

[CV-62] Nested AutoRegressive Models

Link: https://arxiv.org/abs/2510.23028
Authors: Hongyu Wu, Xuhui Fan, Zhangkai Wu, Longbing Cao
Affiliations: Macquarie University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

[CV-63] UGAE: Unified Geometry and Attribute Enhancement for G-PCC Compressed Point Clouds

Link: https://arxiv.org/abs/2510.23009
Authors: Pan Zhao, Hui Yuan, Chongzhen Tian, Tian Guo, Raouf Hamzaoui, Zhigeng Pan
Affiliations: Shandong University; Key Laboratory of Machine Intelligence and System Control, Ministry of Education; De Montfort University; Nanjing University of Information Science and Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-64] CoMo: Compositional Motion Customization for Text-to-Video Generation

Link: https://arxiv.org/abs/2510.23007
Authors: Youcan Xu, Zhen Wang, Jiaxin Shi, Kexin Li, Feifei Shao, Jun Xiao, Yi Yang, Jun Yu, Long Chen
Affiliations: Zhejiang University; HKUST; Xmax.AI Ltd; HIT (SZ)
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-65] An Intelligent Water-Saving Irrigation System Based on Multi-Sensor Fusion and Visual Servoing Control

Link: https://arxiv.org/abs/2510.23003
Authors: ZhengKai Huang, YiKun Wang, ChenYu Hui, XiaoCheng
Affiliations: Unknown
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)

[CV-66] LoMix: Learnable Weighted Multi-Scale Logits Mixing for Medical Image Segmentation NEURIPS2025

Link: https://arxiv.org/abs/2510.22995
Authors: Md Mostafijur Rahman, Radu Marculescu
Affiliations: The University of Texas at Austin
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 25 pages, 13 figures, NeurIPS 2025 accepted paper

[CV-67] SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency NEURIPS2025

Link: https://arxiv.org/abs/2510.22994
Authors: Quanjian Song, Donghao Zhou, Jingyu Lin, Fei Shen, Jiaze Wang, Xiaowei Hu, Cunjian Chen, Pheng-Ann Heng
Affiliations: Monash University; The Chinese University of Hong Kong; National University of Singapore; South China University of Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by NeurIPS 2025; Project Page: this https URL

[CV-68] Exploring Semantic-constrained Adversarial Example with Instruction Uncertainty Reduction NEURIPS2025

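[Quick read] This paper addresses the limited attack capability of current methods for generating semantically constrained adversarial examples (SemanticAE). The core problem is that the semantic uncertainty inherent in human instructions (referring diversity, descriptive incompleteness, and boundary ambiguity) has not been adequately modeled. The key to the solution is the multi-dimensional instruction uncertainty reduction framework (InSUR), which improves SemanticAE generation along three dimensions: 1) in the sampling method, a residual-driven attacking direction stabilization mechanism (the ResAdv-DDIM sampler) mitigates the unstable adversarial optimization caused by the diversity of language references, releasing the transferability and robustness of multi-step diffusion models; 2) in task modeling, a context-encoded attacking scenario constraint uses guidance masking and renderer integration to fill in information missing from the instructions, strengthening scenario-adapted attacks for 2D/3D SemanticAE; 3) in generator evaluation, a semantic-abstracted attacking evaluation enhancement clarifies the evaluation boundary to make generators more effective. Experiments show that InSUR substantially improves transfer attack performance and achieves, for the first time, reference-free generation of semantically constrained 3D adversarial examples.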

Link: https://arxiv.org/abs/2510.22981
Authors: Jin Hu, Jiakai Wang, Linna Jing, Haolin Li, Haodong Liu, Haotong Qin, Aishan Liu, Ke Xu, Xianglong Liu
Affiliations: Beihang University; Zhongguancun Laboratory; ETH Zurich
Categories: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: NeurIPS 2025

Abstract:Recently, semantically constrained adversarial examples (SemanticAE), which are directly generated from natural language instructions, have become a promising avenue for future research due to their flexible attacking forms. To generate SemanticAEs, current methods fall short of satisfactory attacking ability as the key underlying factors of semantic uncertainty in human instructions, such as referring diversity, descriptive incompleteness, and boundary ambiguity, have not been fully investigated. To tackle the issues, this paper develops a multi-dimensional instruction uncertainty reduction (InSUR) framework to generate more satisfactory SemanticAE, i.e., transferable, adaptive, and effective. Specifically, in the dimension of the sampling method, we propose the residual-driven attacking direction stabilization to alleviate the unstable adversarial optimization caused by the diversity of language references. By coarsely predicting the language-guided sampling process, the optimization process will be stabilized by the designed ResAdv-DDIM sampler, therefore releasing the transferable and robust adversarial capability of multi-step diffusion models. In task modeling, we propose the context-encoded attacking scenario constraint to supplement the missing knowledge from incomplete human instructions. Guidance masking and renderer integration are proposed to regulate the constraints of 2D/3D SemanticAE, activating stronger scenario-adapted attacks. Moreover, in the dimension of generator evaluation, we propose the semantic-abstracted attacking evaluation enhancement by clarifying the evaluation boundary, facilitating the development of more effective SemanticAE generators. Extensive experiments demonstrate the superiority of the transfer attack performance of InSUR. Moreover, we realize the reference-free generation of semantically constrained 3D adversarial examples for the first time.

[CV-69] VoMP: Predicting Volumetric Mechanical Property Fields

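[Quick read] This paper addresses the difficulty of accurately modeling the spatial distribution of material properties (such as Young's modulus E, Poisson's ratio ν, and density ρ) for physical simulation. Traditional approaches rely on hand-crafting, which is slow and error-prone. The key to the solution is VoMP, a feed-forward neural network framework that aggregates per-voxel multi-view features and passes them to a trained Geometry Transformer to predict a material latent code for each voxel. These latent codes lie on a manifold of physically plausible materials learned from real-world data, so the decoded material properties are physically reasonable. The paper also builds an annotation pipeline that combines segmented 3D datasets, material databases, and a vision-language model to produce high-quality object-level training data, enabling accurate and fast estimation of volumetric material properties.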

Link: https://arxiv.org/abs/2510.22975
Authors: Rishit Dagli, Donglai Xiang, Vismay Modi, Charles Loop, Clement Fuji Tsang, Anka He Chen, Anita Hu, Gavriel State, David I.W. Levin, Maria Shugrina
Affiliations: NVIDIA; University of Toronto
Categories: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Comments: hi-res paper and other details at: this https URL

Abstract: Physical simulation relies on spatially-varying mechanical properties, often laboriously hand-crafted. VoMP is a feed-forward method trained to predict Young’s modulus ($E$), Poisson’s ratio ($\nu$), and density ($\rho$) throughout the volume of 3D objects, in any representation that can be rendered and voxelized. VoMP aggregates per-voxel multi-view features and passes them to our trained Geometry Transformer to predict per-voxel material latent codes. These latents reside on a manifold of physically plausible materials, which we learn from a real-world dataset, guaranteeing the validity of decoded per-voxel materials. To obtain object-level training data, we propose an annotation pipeline combining knowledge from segmented 3D datasets, material databases, and a vision-language model, along with a new benchmark. Experiments show that VoMP estimates accurate volumetric properties, far outperforming prior art in accuracy and speed.

[CV-70] Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method

Link: https://arxiv.org/abs/2510.22973
Authors: Bohan Li, Xin Jin, Hu Zhu, Hongsi Liu, Ruikai Li, Jiazhe Guo, Kaiwen Cai, Chao Ma, Yueming Jin, Hao Zhao, Xiaokang Yang, Wenjun Zeng
Affiliations: Shanghai Jiao Tong University; Eastern Institute of Technology; Li Auto; National University of Singapore; Tsinghua University; Ningbo Institute of Digital Twin, Eastern Institute of Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: this https URL

[CV-71] VALA: Learning Latent Anchors for Training-Free and Temporally Consistent

Link: https://arxiv.org/abs/2510.22970
Authors: Zhangkai Wu, Xuhui Fan, Zhongyuan Xie, Kaize Shi, Longbing Cao
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-72] Survey of Multimodal Geospatial Foundation Models: Techniques, Applications, and Challenges

Link: https://arxiv.org/abs/2510.22964
Authors: Liling Yang, Ning Chen, Jun Yue, Yidan Liu, Jiayi Ma, Pedram Ghamisi, Antonio Plaza, Leyuan Fang
Affiliations: Hunan University; Peking University; Central South University; Wuhan University; Helmholtz-Zentrum Dresden-Rossendorf; Lancaster University; University of Extremadura
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-73] FAME: Fairness-aware Attention-modulated Video Editing

Link: https://arxiv.org/abs/2510.22960
Authors: Zhangkai Wu, Xuhui Fan, Zhongyuan Xie, Kaize Shi, Zhidong Li, Longbing Cao
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

[CV-74] LightBagel: A Light-weighted Double Fusion Framework for Unified Multimodal Understanding and Generation

Link: https://arxiv.org/abs/2510.22946
Authors: Zeyu Wang, Zilong Chen, Chenhui Gou, Feng Li, Chaorui Deng, Deyao Zhu, Kunchang Li, Weihao Yu, Haoqin Tu, Haoqi Fan, Cihang Xie
Affiliations: UC Santa Cruz; Tsinghua University; Monash University; ByteDance Seed
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Preprint. Project page: this https URL

[CV-75] Switchable Token-Specific Codebook Quantization For Face Image Compression

Link: https://arxiv.org/abs/2510.22943
Authors: Yongbo Wang, Haonan Wang, Guodong Mu, Ruixin Zhang, Jiaqi Chen, Jingyun Zhang, Jun Wang, Yuan Xie, Zhizhong Zhang, Shouhong Ding
Affiliations: East China Normal University; Tencent Youtu Lab; Tencent WeChat Pay Lab
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-76] Bi-Encoder Contrastive Learning for Fingerprint and Iris Biometrics

Link: https://arxiv.org/abs/2510.22937
Authors: Matthew So, Judah Goldfeder, Mark Lis, Hod Lipson
Affiliations: Columbia University; SUNY Downstate Health Sciences University
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

[CV-77] Positional Preservation Embedding for Multimodal Large Language Models

Link: https://arxiv.org/abs/2510.22936
Authors: Mouxiao Huang, Borui Jiang, Dehua Zheng, Hailin Hu, Kai Han, Xinghao Chen
Affiliations: Huawei Noah’s Ark Lab
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-78] Gen-LangSplat: Generalized Language Gaussian Splatting with Pre-Trained Feature Compression

Link: https://arxiv.org/abs/2510.22930
Authors: Pranav Saxena
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

[CV-79] Estimating Pasture Biomass from Top-View Images: A Dataset for Precision Agriculture

Link: https://arxiv.org/abs/2510.22916
Authors: Qiyu Liao, Dadong Wang, Rebecca Haling, Jiajun Liu, Xun Li, Martyna Plomecka, Andrew Robson, Matthew Pringle, Rhys Pirie, Megan Walker, Joshua Whelan
Affiliations: Data61, CSIRO; Agriculture and Food, CSIRO; Google; University of New England; Meat & Livestock Australia
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 9 pages, 2 figures, 2 tables. The dataset is available on the official Kaggle webpage: this https URL

[CV-80] Seeing the Unseen: Towards Zero-Shot Inspection for Wind Turbine Blades using Knowledge-Augmented Vision Language Models

Link: https://arxiv.org/abs/2510.22868
Authors: Yang Zhang, Qianyu Zhou, Farhad Imani, Jiong Tang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-81] Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models NEURIPS2025

Link: https://arxiv.org/abs/2510.22851
Authors: Lexiang Xiong, Chengyu Liu, Jingwen Ye, Yan Liu, Yuecong Xu
Affiliations: National University of Singapore; Sichuan University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Accepted to the 39th Conference on Neural Information Processing Systems (NeurIPS 2025). Code is available at this https URL

[CV-82] FastJAM: a Fast Joint Alignment Model for Images NEURIPS2025

Link: https://arxiv.org/abs/2510.22842
Authors: Omri Hirsch, Ron Shapira Weber, Shira Ifergane, Oren Freifeld
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted to NeurIPS 2025. Pages 1-10 are the Main Paper. Pages 23-31 are Supplemental Material. FastJAM website - this https URL

[CV-83] Semantic-Preserving Cross-Style Visual Reasoning for Robust Multi-Modal Understanding in Large Vision-Language Models

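[Quick read] This paper addresses the "style trap" that Large Vision-Language Models (LVLMs) face in cross-style visual understanding: the models struggle to separate content from style in an image, so their semantic understanding degrades across visual styles and is especially unstable in in-context learning (ICL). The key to the solution is a framework called the Semantic-Preserving Cross-Style Visual Reasoner (SP-CSVR). Its core components are: 1) a Cross-Style Feature Encoder (CSFE) that disentangles style from content; 2) a Semantic-Aligned In-Context Decoder (SAICD) that supports efficient few-shot style adaptation; 3) an Adaptive Semantic Consistency Module (ASCM) that uses multi-task contrastive learning to enforce cross-style semantic invariance. Together these improve the model's robustness, generalization, and reasoning efficiency across diverse visual styles.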

Link: https://arxiv.org/abs/2510.22838
Authors: Aya Nakayama, Brian Wong, Yuji Nishimura, Kaito Tanaka
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)

Abstract:The “style trap” poses a significant challenge for Large Vision-Language Models (LVLMs), hindering robust semantic understanding across diverse visual styles, especially in in-context learning (ICL). Existing methods often fail to effectively decouple style from content, hindering generalization. To address this, we propose the Semantic-Preserving Cross-Style Visual Reasoner (SP-CSVR), a novel framework for stable semantic understanding and adaptive cross-style visual reasoning. SP-CSVR integrates a Cross-Style Feature Encoder (CSFE) for style-content disentanglement, a Semantic-Aligned In-Context Decoder (SAICD) for efficient few-shot style adaptation, and an Adaptive Semantic Consistency Module (ASCM) employing multi-task contrastive learning to enforce cross-style semantic invariance. Extensive experiments on a challenging multi-style dataset demonstrate SP-CSVR’s state-of-the-art performance across visual captioning, visual question answering, and in-context style adaptation. Comprehensive evaluations, including ablation studies and generalization analysis, confirm SP-CSVR’s efficacy in enhancing robustness, generalization, and efficiency across diverse visual styles.

[CV-84] LLM-based Fusion of Multi-modal Features for Commercial Memorability Prediction

Link: https://arxiv.org/abs/2510.22829
Authors: Aleksandar Pramov
Affiliations: Georgia Institute of Technology
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

[CV-85] FairJudge: MLLM Judging for Social Attributes and Prompt Image Alignment

Link: https://arxiv.org/abs/2510.22827
Authors: Zahraa Al Sahili, Maryam Fetanat, Maimuna Nowaz, Ioannis Patras, Matthew Purver
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

[CV-86] MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control

Link: https://arxiv.org/abs/2510.22810
Authors: Fatemeh Nazarieh, Zhenhua Feng, Diptesh Kanojia, Muhammad Awais, Josef Kittler
Affiliations: University of Surrey; Jiangnan University
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-87] MedXplain-VQA: Multi-Component Explainable Medical Visual Question Answering

Link: https://arxiv.org/abs/2510.22803
Authors: Hai-Dang Nguyen, Minh-Anh Dang, Minh-Tan Le, Minh-Tuan Le
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 10 pages, 4 figures, IEEE conference format

[CV-88] Self-Calibrated Consistency can Fight Back for Adversarial Robustness in Vision-Language Models

Link: https://arxiv.org/abs/2510.22785
Authors: Jiaxiang Liu, Jiawei Du, Xiao Liu, Prayag Tiwari, Mingkun Xu
Affiliations: Guangdong Institute of Intelligence Science and Technology; Agency for Science, Technology and Research (A*STAR); School of Information Technology, Halmstad University
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-89] ConMatFormer: A Multi-attention and Transformer Integrated ConvNext based Deep Learning Model for Enhanced Diabetic Foot Ulcer Classification

Link: https://arxiv.org/abs/2510.22743
Authors: Raihan Ahamed Rifat, Fuyad Hasan Bhoyan, Md Humaion Kabir Mehedi, Md Kaviul Hossain, Md. Jakir Hossen, M. F. Mridha
Affiliations: Charles Darwin University; University of Liberal Arts Bangladesh; BRAC University; Multimedia University
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-90] Cross-view Localization and Synthesis - Datasets, Challenges and Opportunities

Link: https://arxiv.org/abs/2510.22736
Authors: Ningli Xu, Rongjun Qin
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 15 Figures

[CV-91] S-Chain: Structured Visual Chain-of-Thought For Medicine

Link: https://arxiv.org/abs/2510.22728
Authors: Khai Le-Duc, Duy M. H. Nguyen, Phuong T. H. Trinh, Tien-Phat Nguyen, Nghiem T. Diep, An Ngo, Tung Vu, Trinh Vuong, Anh-Tien Nguyen, Mau Nguyen, Van Trung Hoang, Khai-Nguyen Nguyen, Hy Nguyen, Chris Ngo, Anji Liu, Nhat Ho, Anne-Christin Hauschild, Khanh Xuan Nguyen, Thanh Nguyen-Tang, Pengtao Xie, Daniel Sonntag, James Zou, Mathias Niepert, Anh Totti Nguyen
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Comments: First version

[CV-92] Edge Collaborative Gaussian Splatting with Integrated Rendering and Communication

Link: https://arxiv.org/abs/2510.22718
Authors: Yujie Wan, Chenxuan Liu, Shuai Wang, Tong Zhang, James Jianqiao Yu, Kejiang Ye, Dusit Niyato, Chengzhong Xu
Affiliations: Unknown
Categories: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV)
Comments: 5 pages and 7 figures, submitted for possible publication

[CV-93] LRW-Persian: Lip-reading in the Wild Dataset for Persian Language

Link: https://arxiv.org/abs/2510.22716
Authors: Zahra Taghizadeh, Mohammad Shahverdikondori, Arian Noori, Alireza Dadgarnia
Affiliations: Sharif University of Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 12 pages, 6 figures

[CV-94] IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction

Link: https://arxiv.org/abs/2510.22706
Authors: Hao Li, Zhengyu Zou, Fangfu Liu, Xuanyang Zhang, Fangzhou Hong, Yukang Cao, Yushi Lan, Manyuan Zhang, Gang Yu, Dingwen Zhang, Ziwei Liu
Affiliations: NWPU; S-Lab, NTU; StepFun, Inc.; THU; MMLab, CUHK
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: this https URL

[CV-95] Atlas Urban Index: A VLM-Based Approach for Spatially and Temporally Calibrated Urban Development Monitoring KDD

Link: https://arxiv.org/abs/2510.22702
Authors: Mithul Chander, Sai Pragnya Ranga, Prathamesh Mayekar
Affiliations: Propheus
Categories: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Emerging Technologies (cs.ET); Image and Video Processing (eess.IV)
Comments: An abridged version of this paper will be presented at and appear in the Proceedings of ACM IKDD CODS 2025

[CV-96] WaveMAE: Wavelet decomposition Masked Auto-Encoder for Remote Sensing

Link: https://arxiv.org/abs/2510.22697
Authors: Vittorio Bernuzzi, Leonardo Rossi, Tomaso Fontanini, Massimo Bertozzi, Andrea Prati
Affiliations: Università di Parma
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-97] VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree NEURIPS2025

Link: https://arxiv.org/abs/2510.22693
Authors: Wenlong Li, Yifei Xu, Yuan Rao, Zhenhua Wang, Shuiguang Deng
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: NeurIPS 2025 Camera Ready

[CV-98] Estimation of Fireproof Structure Class and Construction Year for Disaster Risk Assessment

Link: https://arxiv.org/abs/2510.22683
Authors: Hibiki Ayabe, Kazushi Okamoto, Koki Karube, Atsushi Shibata, Kei Harada
Affiliations: The University of Electro-Communications
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-99] DAMap: Distance-aware MapNet for High Quality HD Map Construction ICCV2025

Link: https://arxiv.org/abs/2510.22675
Authors: Jinpeng Dong, Chen Li, Yutong Lin, Jingwen Fu, Sanping Zhou, Nanning Zheng
Affiliations: Xi’an Jiaotong University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted to ICCV 2025

[CV-100] Alias-Free ViT: Fractional Shift Invariance via Linear Attention NEURIPS2025

Link: https://arxiv.org/abs/2510.22673
Authors: Hagay Michaeli, Daniel Soudry
Affiliations: Technion
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted at NeurIPS 2025. Code is available at this https URL

[CV-101] LVD-GS: Gaussian Splatting SLAM for Dynamic Scenes via Hierarchical Explicit-Implicit Representation Collaboration Rendering

Link: https://arxiv.org/abs/2510.22669
Authors: Wenkai Zhu, Xu Li, Qimin Xu, Benwu Wang, Kun Wei, Yiming Peng, Zihang Wang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

[CV-102] SARCLIP: A Vision Language Foundation Model for Semantic Understanding and Target Recognition in SAR Imagery

Link: https://arxiv.org/abs/2510.22665
Authors: Qiwei Ma, Zhiyu Wang, Wang Liu, Xukun Lu, Bin Deng, Puhong Duan, Xudong Kang, Shutao Li
Affiliations: Hunan University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 9 pages, 6 figures

[CV-103] Self-Attention Decomposition For Training Free Diffusion Editing ICASSP

Link: https://arxiv.org/abs/2510.22650
Authors: Tharun Anand, Mohammad Hassan Vali, Arno Solin
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 4 pages (ICASSP Format)

[CV-104] A Critical Study on Tea Leaf Disease Detection using Deep Learning Techniques

Link: https://arxiv.org/abs/2510.22647
Authors: Nabajyoti Borah, Raju Moni Borah, Bandan Boruah, Purnendu Bikash Acharjee, Sajal Saha, Ripjyoti Hazarika
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

[CV-105] Robust Atypical Mitosis Classification with DenseNet121: Stain-Aware Augmentation and Hybrid Loss for Domain Generalization MICCAI

Link: https://arxiv.org/abs/2510.22630
Authors: Adinath Dukre, Ankan Deria, Yutong Xie, Imran Razzak
Affiliations: MBZUAI (Mohamed bin Zayed University of Artificial Intelligence)
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: MIDOG 2025 MICCAI Workshop accepted

[CV-106] DeepfakeBench-MM: A Comprehensive Benchmark for Multimodal Deepfake Detection

Link: https://arxiv.org/abs/2510.22622
Authors: Kangran Zhao, Yupeng Chen, Xiaoyu Zhang, Yize Chen, Weinan Guan, Baicheng Chen, Chengzhe Sun, Soumyya Kanti Datta, Qingshan Liu, Siwei Lyu, Baoyuan Wu
Affiliations: The Chinese University of Hong Kong, Shenzhen; University at Buffalo, State University of New York; Nanjing University of Posts and Telecommunications
Categories: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Comments: Preprint

[CV-107] Cross-Species Transfer Learning in Agricultural AI: Evaluating ZebraPose Adaptation for Dairy Cattle Pose Estimation

Link: https://arxiv.org/abs/2510.22618
Authors: Mackenzie Tapp, Sibi Chakravarthy Parivendan, Kashfia Sailunaz, Suresh Neethirajan
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 20 pages, 11 figures, 6 Tables

[CV-108] SWAN: Self-supervised Wavelet Neural Network for Hyperspectral Image Unmixing

Link: https://arxiv.org/abs/2510.22607
Authors: Yassh Ramchandani, Vijayashekhar S S, Jignesh S. Bhatt
Affiliations: Info Edge India Ltd; Acharya Institute of Technology; Indian Institute of Information Technology Vadodara
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-109] Projection Embedded Diffusion Bridge for CT Reconstruction from Incomplete Data

Link: https://arxiv.org/abs/2510.22605
Authors: Yuang Wang, Pengfei Jin, Siyeop Yoon, Matthew Tivnan, Shaoyang Zhang, Li Zhang, Quanzheng Li, Zhiqiang Chen, Dufan Wu
Affiliations: Tsinghua University; Ohio State University Medical Center
Categories: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
Comments: 53 pages, 7 figures, submitted to Medical Image Analysis

[CV-110] PSScreen V2: Partially Supervised Multiple Retinal Disease Screening

Link: https://arxiv.org/abs/2510.22589
Authors: Boyi Zheng, Yalin Zheng, Hrvoje Bogunović, Qing Liu
Affiliations: University of Oulu; University of Liverpool; Medical University of Vienna
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-111] Cross-View UAV Geo-Localization with Precision-Focused Efficient Design: A Hierarchical Distillation Approach with Multi-view Refinement

Link: https://arxiv.org/abs/2510.22582
Authors: Jian Sun, Kangdao Liu, Chi Zhang, Chuangquan Chen, Junge Shen, Chi-Man Vong
Affiliations: University of Macau; Wuyi University; Northwestern Polytechnical University
Categories: Computer Vision and Pattern Recognition (cs.CV)

[CV-112] From Pixels to Views: Learning Angular-Aware and Physics-Consistent Representations for Light Field Microscopy NEURIPS2025

Link: https://arxiv.org/abs/2510.22577
Authors: Feng He, Guodong Tan, Qiankun Li, Jun Yu, Quan Wen
Affiliations: University of Science and Technology of China
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by NeurIPS 2025

[CV-113] MELDAE: A Framework for Micro-Expression Spotting, Detection, and Automatic Evaluation in In-the-Wild Conversational Scenes

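Quick read: This paper addresses the difficulty of accurately identifying spontaneous, unconscious micro-expressions in natural scenes such as real conversations. Existing methods mostly rely on datasets from controlled laboratory environments, and their performance drops markedly in real-world settings. The key to the solution is threefold: the first micro-expression dataset for conversational-in-the-wild scenarios; an end-to-end localization and detection framework, MELDAE; and a boundary-aware loss function that improves temporal accuracy by penalizing onset and offset errors, thereby markedly improving localization accuracy and generalization in real-world scenes.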

Link: https://arxiv.org/abs/2510.22575
Authors: Yigui Feng, Qinglin Wang, Yang Liu, Ke Liu, Haotian Mo, Enhao Huang, Gencheng Liu, Mingzhe Liu, Jie Liu
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Accurately analyzing spontaneous, unconscious micro-expressions is crucial for revealing true human emotions, but this task remains challenging in wild scenarios, such as natural conversation. Existing research largely relies on datasets from controlled laboratory environments, and their performance degrades dramatically in the real world. To address this issue, we propose three contributions: the first micro-expression dataset focused on conversational-in-the-wild scenarios; an end-to-end localization and detection framework, MELDAE; and a novel boundary-aware loss function that improves temporal accuracy by penalizing onset and offset errors. Extensive experiments demonstrate that our framework achieves state-of-the-art results on the WDMD dataset, improving the key F1_DR localization metric by 17.72% over the strongest baseline, while also demonstrating excellent generalization capabilities on existing benchmarks.

[CV-114] STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models

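Quick read: This paper addresses the limited precision of current Vision-Language Models (VLMs) in recognizing object states (such as positional or functional states), especially their limited ability to capture subtle state differences. The key to the solution is STATUS Bench, the first systematic evaluation benchmark, together with STATUS Train, a large-scale training dataset. STATUS Bench builds a rigorous, multi-faceted evaluation scheme by requiring a model to complete three tasks simultaneously: Object State Identification (OSI), Image Retrieval (IR), and State Change Identification (SCI). STATUS Train contains 13 million semi-automatically generated descriptions, providing high-quality, diverse semantic annotations for training and thereby advancing VLM performance on object state understanding.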

Link: https://arxiv.org/abs/2510.22571
Authors: Mahiro Ukai, Shuhei Kurita, Nakamasa Inoue
Affiliations: Institute of Science Tokyo; National Institute of Informatics
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Comments:

Abstract:Object state recognition aims to identify the specific condition of objects, such as their positional states (e.g., open or closed) and functional states (e.g., on or off). While recent Vision-Language Models (VLMs) are capable of performing a variety of multimodal tasks, it remains unclear how precisely they can identify object states. To alleviate this issue, we introduce the STAte and Transition UnderStanding Benchmark (STATUS Bench), the first benchmark for rigorously evaluating the ability of VLMs to understand subtle variations in object states in diverse situations. Specifically, STATUS Bench introduces a novel evaluation scheme that requires VLMs to perform three tasks simultaneously: object state identification (OSI), image retrieval (IR), and state change identification (SCI). These tasks are defined over our fully hand-crafted dataset involving image pairs, their corresponding object state descriptions and state change descriptions. Furthermore, we introduce a large-scale training dataset, namely STATUS Train, which consists of 13 million semi-automatically created descriptions. This dataset serves as the largest resource to facilitate further research in this area. In our experiments, we demonstrate that STATUS Bench enables rigorous consistency evaluation and reveal that current state-of-the-art VLMs still significantly struggle to capture subtle object state distinctions. Surprisingly, under the proposed rigorous evaluation scheme, most open-weight VLMs exhibited chance-level zero-shot performance. After fine-tuning on STATUS Train, Qwen2.5-VL achieved performance comparable to Gemini 2.0 Flash. These findings underscore the necessity of STATUS Bench and Train for advancing object state recognition in VLM research.

[CV-115] SRSR: Enhancing Semantic Accuracy in Real-World Image Super-Resolution with Spatially Re-Focused Text-Conditioning NEURIPS2025

Link: https://arxiv.org/abs/2510.22534
Authors: Chen Chen, Majid Abdolshah, Violetta Shevchenko, Hongdong Li, Chang Xu, Pulak Purkait
Affiliations: Amazon; The University of Sydney; Australian National University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted at NeurIPS 2025

[CV-116] Bag-of-Word-Groups (BoWG): A Robust and Efficient Loop Closure Detection Method Under Perceptual Aliasing IROS

Link: https://arxiv.org/abs/2510.22529
Authors: Xiang Fei, Tina Tian, Howie Choset, Lu Li
Affiliations: Carnegie Mellon University
Categories: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Comments: This paper has been accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025

[CV-117] AesCrop: Aesthetic-driven Cropping Guided by Composition ICCV

Link: https://arxiv.org/abs/2510.22528
Authors: Yen-Hong Wong, Lai-Kuan Wong
Affiliations: Multimedia University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted at the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025

[CV-118] Open Multimodal Retrieval-Augmented Factual Image Generation

Link: https://arxiv.org/abs/2510.22521
Authors: Yang Tian, Fan Liu, Jingyuan Zhang, Wei Bi, Yupeng Hu, Liqiang Nie
Affiliations: Shandong University; National University of Singapore; Kuaishou Technology; Harbin Institute of Technology, Shenzhen
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Comments: Preprint

[CV-119] GateFuseNet: An Adaptive 3D Multimodal Neuroimaging Fusion Network for Parkinson's Disease Diagnosis

Link: https://arxiv.org/abs/2510.22507
Authors: Rui Jin, Chen Chen, Yin Liu, Hongfu Sun, Min Zeng, Min Li, Yang Gao
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: The first two authors contributed equally to this work. Correspondence to: Yang Gao, E-mail: this http URL @csu. this http URL

[CV-120] LAMP: Data-Efficient Linear Affine Weight-Space Models for Parameter-Controlled 3D Shape Generation and Extrapolation

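Quick read: This paper addresses the limitations of current 3D generation methods in data efficiency, controllability, and generalization, in particular the difficulty of generating high-quality geometry under parameter constraints when training data is scarce. The key to the solution is the LAMP (Linear Affine Mixing of Parametric shapes) framework, which first aligns signed distance function (SDF) decoders by overfitting them, and then solves a parameter-constrained mixing optimization problem in the aligned weight space, yielding efficient, interpretable, and safe 3D shape generation. In addition, a safety metric based on linearity mismatch is introduced to detect geometric validity, which markedly improves extrapolation and robustness.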

Link: https://arxiv.org/abs/2510.22491
Authors: Ghadi Nehme, Yanxia Zhang, Dule Shu, Matt Klenk, Faez Ahmed
Affiliations: Massachusetts Institute of Technology; Toyota Research Institute
Categories: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Generating high-fidelity 3D geometries that satisfy specific parameter constraints has broad applications in design and engineering. However, current methods typically rely on large training datasets and struggle with controllability and generalization beyond the training distributions. To overcome these limitations, we introduce LAMP (Linear Affine Mixing of Parametric shapes), a data-efficient framework for controllable and interpretable 3D generation. LAMP first aligns signed distance function (SDF) decoders by overfitting each exemplar from a shared initialization, then synthesizes new geometries by solving a parameter-constrained mixing problem in the aligned weight space. To ensure robustness, we further propose a safety metric that detects geometry validity via linearity mismatch. We evaluate LAMP on two 3D parametric benchmarks: DrivAerNet++ and BlendedNet. We found that LAMP enables (i) controlled interpolation within bounds with as few as 100 samples, (ii) safe extrapolation by up to 100% parameter difference beyond training ranges, (iii) physics performance-guided optimization under fixed parameters. LAMP significantly outperforms conditional autoencoder and Deep Network Interpolation (DNI) baselines in both extrapolation and data efficiency. Our results demonstrate that LAMP advances controllable, data-efficient, and safe 3D generation for design exploration, dataset generation, and performance-driven optimization.
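
The parameter-constrained mixing described in the LAMP abstract can be illustrated with a toy case. The sketch below is not the paper's solver: it assumes only two exemplars, one scalar design parameter, and flat weight lists, and the names `affine_mix_coeffs` and `mix_weights` are hypothetical. It shows how affine coefficients that sum to one can hit a target parameter, including a target outside the range spanned by the exemplars.

```python
def affine_mix_coeffs(p_a, p_b, p_target):
    """Return (alpha_a, alpha_b) with alpha_a + alpha_b = 1 and
    alpha_a * p_a + alpha_b * p_b = p_target (two-exemplar toy case)."""
    if p_a == p_b:
        raise ValueError("exemplars must differ in the controlled parameter")
    alpha_b = (p_target - p_a) / (p_b - p_a)
    return 1.0 - alpha_b, alpha_b


def mix_weights(w_a, w_b, alpha_a, alpha_b):
    """Affine combination of two aligned weight vectors."""
    return [alpha_a * x + alpha_b * y for x, y in zip(w_a, w_b)]


# Two hypothetical decoders, each overfit to one shape (weights are made up).
w_a, p_a = [0.0, 1.0, 2.0], 1.0
w_b, p_b = [1.0, 3.0, 2.0], 2.0

alpha_mid = affine_mix_coeffs(p_a, p_b, 1.5)   # interpolation inside [1, 2]
w_mid = mix_weights(w_a, w_b, *alpha_mid)

alpha_ext = affine_mix_coeffs(p_a, p_b, 3.0)   # extrapolation beyond p_b
w_ext = mix_weights(w_a, w_b, *alpha_ext)
```

With more exemplars and several parameters the coefficients are no longer unique, which is presumably why the abstract frames the step as solving a constrained mixing problem rather than a closed-form formula.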

[CV-121] Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity NEURIPS2025

Link: https://arxiv.org/abs/2510.22480
Authors: Seonghoon Yu, Dongjun Nam, Dina Katabi, Jeany Son
Affiliations: GIST; POSTECH; MIT CSAIL
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Accepted to NeurIPS 2025

[CV-122] DynaPose4D: High-Quality 4D Dynamic Content Generation via Pose Alignment Loss

Link: https://arxiv.org/abs/2510.22473
Authors: Jing Yang, Yufeng Yang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

[CV-123] SemiETPicker: Fast and Label-Efficient Particle Picking for CryoET Tomography Using Semi-Supervised Learning

Link: https://arxiv.org/abs/2510.22454
Authors: Linhan Wang, Jianwen Dou, Wang Li, Shengkun Wang, Zhiwu Xie, Chang-Tien Lu, Yinlin Chen
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-124] Benchmarking Egocentric Multimodal Goal Inference for Assistive Wearable Agents NEURIPS2025

Link: https://arxiv.org/abs/2510.22443
Authors: Vijay Veerabadran, Fanyi Xiao, Nitin Kamra, Pedro Matias, Joy Chen, Caley Drooff, Brett D Roads, Riley Williams, Ethan Henderson, Xuanyi Zhao, Kevin Carlberg, Joseph Tighe, Karl Ridgeway
Affiliations: Meta
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: Accepted as a spotlight paper at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

[CV-125] 3D Roadway Scene Object Detection with LIDARs in Snowfall Conditions ITSC

Link: https://arxiv.org/abs/2510.22436
Authors: Ghazal Farhani, Taufiq Rahman, Syed Mostaquim Ali, Andrew Liu, Mohamed Zaki, Dominique Charlebois, Benoit Anctil
Affiliations: National Research Council Canada; Western University; Transport Canada
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), pp. 1441–1448, Sept. 2024

[CV-126] Hollywood Town: Long-Video Generation via Cross-Modal Multi-Agent Orchestration

Link: https://arxiv.org/abs/2510.22431
Authors: Zheng Wei, Mingchen Li, Zeqian Zhang, Ruibin Yuan, Pan Hui, Huamin Qu, James Evans, Maneesh Agrawala, Anyi Rao
Affiliations: The Hong Kong University of Science and Technology; The Hong Kong University of Science and Technology (Guangzhou); University of Chicago; Stanford University
Categories: Multiagent Systems (cs.MA); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-127] Top-Down Semantic Refinement for Image Captioning

Link: https://arxiv.org/abs/2510.22391
Authors: Jusheng Zhang, Kaitong Cai, Jing Yang, Jian Wang, Chengpei Tang, Keze Wang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

[CV-128] A Fully Interpretable Statistical Approach for Roadside LiDAR Background Subtraction

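Quick read: This paper addresses background subtraction in roadside LiDAR data, with the aim of improving infrastructure-based perception in automated driving systems. The core of the solution is a fully interpretable and flexible statistical method. Its key innovations are a Gaussian distribution grid (GDG), which models spatial statistics from scans containing only background points, and a corresponding filtering algorithm that uses this representation to classify LiDAR points as foreground or background. The method is compatible with multiple LiDAR types (such as multiline 360° scanners and micro-electro-mechanical systems (MEMS) sensors), achieves high performance with little background data, and can be deployed efficiently on low-resource hardware, outperforming existing state-of-the-art techniques in accuracy and practicality.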

Link: https://arxiv.org/abs/2510.22390
Authors: Aitor Iglesias, Nerea Aranjuelo, Patricia Javierre, Ainhoa Menendez, Ignacio Arganda-Carreras, Marcos Nieto
Affiliations: Fundación Vicomtech, Basque Research and Technology Alliance (BRTA), Donostia-San Sebastián, Spain; University of the Basque Country (UPV/EHU), Donostia - San Sebastián, Spain; CAFSignalling, Amorebieta, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao, Spain; Donostia International Physics Center (DIPC), Donostia - San Sebastián, Spain; Biofisika Institute, Leioa, Spain
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:We present a fully interpretable and flexible statistical method for background subtraction in roadside LiDAR data, aimed at enhancing infrastructure-based perception in automated driving. Our approach introduces both a Gaussian distribution grid (GDG), which models the spatial statistics of the background using background-only scans, and a filtering algorithm that uses this representation to classify LiDAR points as foreground or background. The method supports diverse LiDAR types, including multiline 360 degree and micro-electro-mechanical systems (MEMS) sensors, and adapts to various configurations. Evaluated on the publicly available RCooper dataset, it outperforms state-of-the-art techniques in accuracy and flexibility, even with minimal background data. Its efficient implementation ensures reliable performance on low-resource hardware, enabling scalable real-world deployment.
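
To make the idea of a per-cell Gaussian background model concrete, here is a minimal sketch. It is an assumption-laden illustration rather than the paper's GDG: it assumes each scan is already binned into cells keyed by a hypothetical `(azimuth_bin, elevation_bin)` tuple with one range value per cell, and it uses a simple k-sigma test with a made-up noise floor `min_sigma`.

```python
from statistics import mean, pstdev


def fit_grid(background_scans):
    """Fit one Gaussian (mean, std) per cell from background-only scans.

    Each scan is a dict mapping a cell key to a measured range in metres.
    """
    samples = {}
    for scan in background_scans:
        for cell, rng_m in scan.items():
            samples.setdefault(cell, []).append(rng_m)
    return {cell: (mean(v), pstdev(v)) for cell, v in samples.items()}


def is_foreground(grid, cell, rng_m, k=3.0, min_sigma=0.05):
    """Flag a point whose range deviates more than k sigma from the background."""
    if cell not in grid:
        return True  # assumption: cells never seen in background count as foreground
    mu, sigma = grid[cell]
    return abs(rng_m - mu) > k * max(sigma, min_sigma)


# Three hypothetical background-only scans over two cells.
scans = [
    {(0, 0): 10.0, (0, 1): 20.0},
    {(0, 0): 10.2, (0, 1): 19.8},
    {(0, 0): 9.8, (0, 1): 20.2},
]
grid = fit_grid(scans)
car_hit = is_foreground(grid, (0, 0), 6.0)     # object well in front of the wall
wall_hit = is_foreground(grid, (0, 0), 10.1)   # within background noise
```

Every decision in this sketch traces back to a cell's mean, its standard deviation, and the threshold `k`, which is the kind of interpretability a purely statistical model offers.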

[CV-129] Privacy-Aware Federated nnU-Net for ECG Page Digitization

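Quick read: This paper addresses the tension between privacy protection and model performance when multiple institutions collaboratively train deep neural networks for electrocardiogram (ECG) image digitization. Traditional centralized training requires sharing raw image data, which violates cross-institutional privacy norms, while existing federated learning methods struggle to guarantee convergence and generalization under non-independent and identically distributed (non-IID) data. The key to the solution is a cross-silo federated digitization framework whose core components are: (i) end-to-end full-model training with client synchronization, which keeps the models consistent; (ii) a secure aggregation strategy in which the server receives only a clipped, weighted sum once a participation threshold is met, preventing leakage of any single client's update; (iii) central Gaussian differential privacy (DP) combined with Rényi differential privacy accounting, which provides auditable user-level privacy guarantees after aggregation; and (iv) a calibration-aware digitization pipeline covering page normalization, trace segmentation, grid-leakage suppression, and vectorization to twelve-lead signals, which markedly improves downstream accuracy. Experiments show that adaptive server updates (FedAdam) converge faster and reach higher final performance than FedAvg and FedProx, approach the centralized baseline, and effectively protect raw images and client gradients from exposure.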

Link: https://arxiv.org/abs/2510.22387
Authors: Nader Nemati
Affiliations: IEEE Machine Learning Member
Categories: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments:

Abstract:Deep neural networks can convert ECG page images into analyzable waveforms, yet centralized training often conflicts with cross-institutional privacy and deployment constraints. A cross-silo federated digitization framework is presented that trains a full-model nnU-Net segmentation backbone without sharing images and aggregates updates across sites under realistic non-IID heterogeneity (layout, grid style, scanner profile, noise). The protocol integrates three standard server-side aggregators (FedAvg, FedProx, and FedAdam) and couples secure aggregation with central, user-level differential privacy to align utility with formal guarantees. Key features include: (i) end-to-end full-model training and synchronization across clients; (ii) secure aggregation so the server only observes a clipped, weighted sum once a participation threshold is met; (iii) central Gaussian DP with Rényi accounting applied post-aggregation for auditable user-level privacy; and (iv) a calibration-aware digitization pipeline comprising page normalization, trace segmentation, grid-leakage suppression, and vectorization to twelve-lead signals. Experiments on ECG pages rendered from PTB-XL show consistently faster convergence and higher late-round plateaus with adaptive server updates (FedAdam) relative to FedAvg and FedProx, while approaching centralized performance. The privacy mechanism maintains competitive accuracy while preventing exposure of raw images or per-client updates, yielding deployable, auditable guarantees suitable for multi-institution settings.
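
Features (ii) and (iii) in the abstract (a clipped, weighted sum released only above a participation threshold, followed by central Gaussian noise) can be sketched in a few lines. This is a single-process simulation under stated assumptions, not the paper's protocol: real secure aggregation hides individual updates cryptographically, the function names and the `noise_multiplier` parameterization are hypothetical, weights are assumed to lie in [0, 1], and no privacy accounting is performed.

```python
import math
import random


def clip(update, max_norm):
    """Scale an update so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_aggregate(updates, weights, max_norm, noise_multiplier, min_clients, rng):
    """Clipped weighted sum plus Gaussian noise, averaged over total weight.

    Returns None when fewer than min_clients updates are available.
    """
    if len(updates) < min_clients:
        return None  # participation threshold not met: release nothing
    total = [0.0] * len(updates[0])
    for update, weight in zip(updates, weights):
        for i, value in enumerate(clip(update, max_norm)):
            total[i] += weight * value
    sigma = noise_multiplier * max_norm  # noise scaled to the clipping bound
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    weight_sum = sum(weights)
    return [x / weight_sum for x in noisy]


rng = random.Random(0)
updates = [[3.0, 4.0], [0.0, 1.0]]
weights = [1.0, 1.0]
no_noise = dp_aggregate(updates, weights, 1.0, 0.0, 2, rng)   # clipping only
too_few = dp_aggregate(updates[:1], weights[:1], 1.0, 1.0, 2, rng)
```

Setting `noise_multiplier` to zero isolates the effect of clipping; any nonzero value trades accuracy for privacy, and the resulting privacy budget would still have to be tracked by a separate accountant.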

[CV-130] Dynamic Dropout: Leveraging Conway's Game of Life for Neural Networks Regularization

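Quick read: This paper addresses the static nature and limited interpretability of conventional dropout regularization: dropout deactivates neurons randomly and in a fixed manner during training, lacks any dynamic response to data characteristics, and offers little visual insight into the network's internal behavior. The key to the solution is to map neural network units to cells in a Conway's Game of Life (GoL) grid and to use GoL's local evolution rules for dynamic unit deactivation: by simulating the spatiotemporal evolution of cell states, the deactivation pattern adaptively forms spatial structure as training proceeds, maintaining generalization performance while improving interpretability and flexibility.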

Link: https://arxiv.org/abs/2510.22383
Authors: David Freire-Obregón, José Salas-Cáceres, Modesto Castrillón-Santana
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted for presentation at the 5th International Conference on Computing and Machine Intelligence (ICMI 2026)

Abstract:Regularization techniques play a crucial role in preventing overfitting and improving the generalization performance of neural networks. Dropout, a widely used regularization technique, randomly deactivates units during training to introduce redundancy and prevent co-adaptation among neurons. Despite its effectiveness, dropout has limitations, such as its static nature and lack of interpretability. In this paper, we propose a novel approach to regularization by substituting dropout with Conway’s Game of Life (GoL), a cellular automaton with simple rules that govern the evolution of a grid of cells. We introduce dynamic unit deactivation during training by representing neural network units as cells in a GoL grid and applying the game’s rules to deactivate units. This approach allows for the emergence of spatial patterns that adapt to the training data, potentially enhancing the network’s ability to generalize. We demonstrate the effectiveness of our approach on the CIFAR-10 dataset, showing that dynamic unit deactivation using GoL achieves comparable performance to traditional dropout techniques while offering insights into the network’s behavior through the visualization of evolving patterns. Furthermore, our discussion highlights the applicability of our proposal in deeper architectures, demonstrating how it enhances the performance of different dropout techniques.
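
The mechanism in the abstract, units laid out as cells and deactivated by Game of Life rules, can be sketched as a mask that evolves between training steps. The code below is an illustration only: the wrap-around edges, the convention that a live cell (1) keeps its unit active, and the helper names are assumptions not stated in the abstract.

```python
def life_step(grid):
    """One Game of Life update on a 0/1 grid with wrap-around edges."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = sum(grid[(r + dr) % rows][(c + dc) % cols]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            # Standard rules: birth on 3 neighbours, survival on 2 or 3.
            nxt[r][c] = 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0
    return nxt


def apply_mask(activations, mask):
    """Zero the activation of every unit whose cell is dead (0)."""
    return [[a * m for a, m in zip(a_row, m_row)]
            for a_row, m_row in zip(activations, mask)]


# A vertical "blinker" as the initial mask for a 5x5 layer of units.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
next_mask = life_step(mask)                      # becomes a horizontal blinker
activations = [[1.5] * 5 for _ in range(5)]
masked = apply_mask(activations, next_mask)
```

Unlike independent Bernoulli dropout, consecutive masks here are spatially structured and deterministic given the initial grid, so the sequence of masks can be plotted and inspected.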

[CV-131] Efficient Large-Deformation Medical Image Registration via Recurrent Dynamic Correlation

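Quick read: This paper addresses the inefficiency of deep learning methods in handling large deformations, particularly in convolutional-network-based image registration, where the lack of direct modeling of voxel correspondences makes long-range deformations hard to capture. The key to the solution is a Recurrent Correlation-based framework that approaches large deformations step by step by dynamically relocating the matching region: each step performs low-cost local matching, and the estimated offset guides the next search region, enabling efficient convergence. A lightweight recurrent update module is also introduced to retain memory capacity, and motion-related features are decoupled from texture features to suppress semantic redundancy, improving registration accuracy and computational efficiency.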

Link: https://arxiv.org/abs/2510.22380
Authors: Tianran Li, Marius Staring, Yuchuan Qiao
Affiliations: Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University; Department of Radiology, Leiden University Medical Center
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

Abstract:Deformable image registration estimates voxel-wise correspondences between images through spatial transformations, and plays a key role in medical imaging. While deep learning methods have significantly reduced runtime, efficiently handling large deformations remains a challenging task. Convolutional networks aggregate local features but lack direct modeling of voxel correspondences, prompting recent works to explore explicit feature matching. Among them, voxel-to-region matching is more efficient for direct correspondence modeling by computing local correlation features within neighbourhoods, while region-to-region matching incurs higher redundancy due to excessive correlation pairs across large regions. However, the inherent locality of voxel-to-region matching hinders the capture of long-range correspondences required for large deformations. To address this, we propose a Recurrent Correlation-based framework that dynamically relocates the matching region toward more promising positions. At each step, local matching is performed with low cost, and the estimated offset guides the next search region, supporting efficient convergence toward large deformations. In addition, we use a lightweight recurrent update module with memory capacity and decouple motion-related and texture features to suppress semantic redundancy. We conduct extensive experiments on brain MRI and abdominal CT datasets under two settings: with and without affine pre-registration. Results show that our method exhibits a strong accuracy-computation trade-off, surpassing or matching the state-of-the-art performance. For example, it achieves comparable performance on the non-affine OASIS dataset, while using only 9.5% of the FLOPs and running 96% faster than RDP, a representative high-performing method.
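
The core loop in the abstract, cheap local matching whose estimated offset relocates the next search region, can be shown on a 1D toy signal. This sketch makes several assumptions that are not in the paper: it uses raw intensities and a sum-of-squared-differences cost instead of learned correlation features, a single voxel instead of a dense field, and hypothetical names and sizes throughout.

```python
import math


def bump(center, length, sigma=3.0):
    """Smooth 1D test signal: a Gaussian bump at `center`."""
    return [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(length)]


def patch_cost(fixed, moving, x, d, half):
    """Sum of squared differences between the patch around x in `fixed`
    and the patch around x + d in `moving` (lower means a better match)."""
    return sum((fixed[x + k] - moving[x + d + k]) ** 2
               for k in range(-half, half + 1))


def recurrent_match(fixed, moving, x, radius=2, steps=8, half=25):
    """Estimate the displacement at voxel x with repeated small local searches."""
    d = 0
    for _ in range(steps):
        # Search only within `radius` of the current estimate, then recentre.
        candidates = range(d - radius, d + radius + 1)
        d = min(candidates, key=lambda c: patch_cost(fixed, moving, x, c, half))
    return d


fixed = bump(center=40, length=120)
moving = bump(center=50, length=120)      # same structure shifted by 10 voxels
one_step = recurrent_match(fixed, moving, x=40, steps=1)
displacement = recurrent_match(fixed, moving, x=40)
```

A single search of radius 2 can move the estimate by at most 2 voxels, while eight relocated searches recover the full shift of 10 with the same per-step cost, which is the trade-off the abstract describes.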

[CV-132] BLIP-FusePPO: A Vision-Language Deep Reinforcement Learning Framework for Lane Keeping in Autonomous Vehicles

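Quick read: This paper addresses the robustness and interpretability of the lane-keeping (LK) task in autonomous driving, in particular how to fuse high-level semantic understanding with low-level control signals to improve policy learning in complex dynamic scenes. The key to the solution is BLIP-FusePPO, a state-representation fusion mechanism driven by Bootstrapped Language-Image Pretraining (BLIP), which embeds the semantic embeddings generated by a Vision-Language Model (VLM) directly into the agent's observation space and fuses them with geometric states, LiDAR perception, and PID control feedback to build a multimodal state representation. This design avoids the runtime inference overhead incurred when semantic models are used only for reward shaping, ensures that semantic guidance is always available during both training and inference, and markedly improves policy stability and generalization.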

Link: https://arxiv.org/abs/2510.22370
Authors: Seyed Ahmad Hosseini Miangoleh, Amin Jalal Aghdasian, Farzaneh Abdollahi
Affiliations: Amirkabir University of Technology (Tehran Polytechnic)
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Software Engineering (cs.SE)
Comments: this https URL

Abstract:In this paper, we propose Bootstrapped Language-Image Pretraining-driven Fused State Representation in Proximal Policy Optimization (BLIP-FusePPO), a novel multimodal reinforcement learning (RL) framework for autonomous lane-keeping (LK), in which semantic embeddings generated by a vision-language model (VLM) are directly fused with geometric states, LiDAR observations, and Proportional-Integral-Derivative-based (PID) control feedback within the agent observation space. The proposed method lets the agent learn driving rules that are aware of their surroundings and easy to understand by combining high-level scene understanding from the VLM with low-level control and spatial signals. Our architecture brings together semantic, geometric, and control-aware representations to make policy learning more robust. A hybrid reward function that includes semantic alignment, LK accuracy, obstacle avoidance, and speed regulation helps learning to be more efficient and generalizable. Our method is different from the approaches that only use semantic models to shape rewards. Instead, it directly embeds semantic features into the state representation. This cuts down on expensive runtime inference and makes sure that semantic guidance is always available. The simulation results show that the proposed model is better at LK stability and adaptability than the best vision-based and multimodal RL baselines in a wide range of difficult driving situations. We make our code publicly available.

[CV-133] T2SMark: Balancing Robustness and Diversity in Noise-as-Watermark for Diffusion Models NEURIPS2025

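Quick read: This paper addresses the difficulty of balancing robustness and generation diversity in image watermarking for generative AI. Existing Noise-as-Watermark (NaW) methods either improve robustness by strictly constraining initial noise sampling, which harms generation diversity, or preserve diversity but are too fragile against real-world attacks. The key to the solution is T2SMark, a two-stage watermarking scheme whose core innovation is Tail-Truncated Sampling (TTS): watermark information is embedded only in the highly reliable tail regions, while the central zone is sampled randomly to keep the latent distribution unchanged, improving robustness without sacrificing generation diversity. In addition, a random session key is integrated into both stages' encryption pipelines, further ensuring the randomness and security of the watermark.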

Link: https://arxiv.org/abs/2510.22366
Authors: Jindong Yang, Han Fang, Weiming Zhang, Nenghai Yu, Kejiang Chen
Affiliations: University of Science and Technology of China; Anhui Province Key Laboratory of Digital Security; National University of Singapore
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Accepted by NeurIPS 2025

Abstract:Diffusion models have advanced rapidly in recent years, producing high-fidelity images while raising concerns about intellectual property protection and the misuse of generative AI. Image watermarking methods for diffusion models, particularly Noise-as-Watermark (NaW) methods, encode the watermark as a specific standard Gaussian noise vector for image generation, embedding the information seamlessly while maintaining image quality. For detection, the generation process is inverted to recover the initial noise vector containing the watermark before extraction. However, existing NaW methods struggle to balance watermark robustness with generation diversity. Some methods achieve strong robustness by heavily constraining initial noise sampling, which degrades user experience, while others preserve diversity but prove too fragile for real-world deployment. To address this issue, we propose T2SMark, a two-stage watermarking scheme based on Tail-Truncated Sampling (TTS). Unlike prior methods that simply map bits to positive or negative values, TTS enhances robustness by embedding bits exclusively in the reliable tail regions while randomly sampling the central zone to preserve the latent distribution. Our two-stage framework then ensures sampling diversity by integrating a randomly generated session key into both encryption pipelines. We evaluate T2SMark on diffusion models with both U-Net and DiT backbones. Extensive experiments show that it achieves an optimal balance between robustness and diversity. Our code is available at this https URL.
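
One possible reading of tail-truncated sampling is sketched below; it is an interpretation of the abstract's one-sentence description, not the authors' algorithm. The assumptions are: each bit is repeated over several latent values, a value carries the bit through its sign only when its magnitude exceeds a hypothetical threshold `tau`, and values in the central zone get a random sign. Encryption, the session key, and diffusion inversion are all omitted.

```python
import random


def tts_sample(bit, tau, rng):
    """Draw one latent value; only a tail draw (|x| > tau) carries the bit."""
    magnitude = abs(rng.gauss(0.0, 1.0))
    if magnitude > tau:
        sign = 1.0 if bit else -1.0       # tail region: sign encodes the bit
    else:
        sign = rng.choice((-1.0, 1.0))    # central zone: sign stays random
    return sign * magnitude


def embed(bits, repeats, tau, rng):
    """Spread each bit over `repeats` latent values."""
    return [[tts_sample(b, tau, rng) for _ in range(repeats)] for b in bits]


def extract(latent, tau):
    """Majority vote over tail values only; central values are ignored."""
    decoded = []
    for group in latent:
        votes = [1 if x > 0 else -1 for x in group if abs(x) > tau]
        decoded.append(1 if sum(votes) > 0 else 0)
    return decoded


rng = random.Random(0)
message = [1, 0, 1, 1, 0]
latent = embed(message, repeats=200, tau=1.0, rng=rng)
recovered = extract(latent, tau=1.0)
```

If the embedded bits are themselves uniformly random (for example after encryption), every value in this sketch is marginally standard Gaussian, and small perturbations are unlikely to flip the sign of a value that sits far out in a tail.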

[CV-134] EndoSfM3D: Learning to 3D Reconstruct Any Endoscopic Surgery Scene using Self-supervised Foundation Model

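Quick read: This paper addresses a key difficulty in 3D reconstruction of endoscopic surgery scenes: accurately estimating the endoscope's intrinsic parameters in a constrained sterile environment, so as to improve the accuracy and reliability of 3D reconstruction. Existing methods generally ignore intrinsic parameter estimation, which limits their effectiveness in real surgery. The core of the solution is to integrate intrinsic parameter estimation into a self-supervised monocular depth estimation framework, adapting the Depth Anything V2 (DA2) model for joint prediction of depth, pose, and intrinsics. An attention-based pose network and a Weight-Decomposed Low-Rank Adaptation (DoRA) strategy enable efficient fine-tuning of DA2 and markedly improve reconstruction performance on the public SCARED and C3VD datasets.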

Link: https://arxiv.org/abs/2510.22359
Authors: Changhao Zhang, Matthew J. Clarkson, Mobarak I. Hoque
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 11 pages

Abstract:3D reconstruction of endoscopic surgery scenes plays a vital role in enhancing scene perception, enabling AR visualization, and supporting context-aware decision-making in image-guided surgery. A critical yet challenging step in this process is the accurate estimation of the endoscope’s intrinsic parameters. In real surgical settings, intrinsic calibration is hindered by sterility constraints and the use of specialized endoscopes with continuous zoom and telescope rotation. Most existing methods for endoscopic 3D reconstruction do not estimate intrinsic parameters, limiting their effectiveness for accurate and reliable reconstruction. In this paper, we integrate intrinsic parameter estimation into a self-supervised monocular depth estimation framework by adapting the Depth Anything V2 (DA2) model for joint depth, pose, and intrinsics prediction. We introduce an attention-based pose network and a Weight-Decomposed Low-Rank Adaptation (DoRA) strategy for efficient fine-tuning of DA2. Our method is validated on the SCARED and C3VD public datasets, demonstrating superior performance compared to recent state-of-the-art approaches in self-supervised monocular depth estimation and 3D reconstruction. Code and model weights can be found in project repository: this https URL.

[CV-135] GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation

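Quick read: This paper addresses insufficient precision of geometric control in image generation, especially precise control of 3D object features in image space: traditional 3D editing methods are time-consuming and require specialized skills, while existing image-based generative methods lack accuracy in geometric conditioning. The key to the solution is the GeoDiffusion framework, which uses a class-specific 3D object as a geometric prior to define keypoints and parametric correlations in 3D space, ensures viewpoint consistency through a rendered image of a reference 3D object, and then applies style transfer to meet user-specified appearance requirements. It also introduces the GeoDrag module, which markedly improves the accuracy and efficiency of geometry guidance tasks in drag-based image editing, enabling precise geometric modification of 3D features across a variety of iterative design workflows.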

Link: https://arxiv.org/abs/2510.22337
Authors: Phillip Mueller, Talip Uenlue, Sebastian Schmidt, Marcel Kollovieh, Jiajie Fan, Stephan Guennemann, Lars Mikelsons
Affiliations: University of Augsburg; BMW Group; Technical University of Munich; Leiden University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Precise geometric control in image generation is essential for engineering & product design and creative industries to control 3D object features accurately in image space. Traditional 3D editing approaches are time-consuming and demand specialized skills, while current image-based generative methods lack accuracy in geometric conditioning. To address these challenges, we propose GeoDiffusion, a training-free framework for accurate and efficient geometric conditioning of 3D features in image generation. GeoDiffusion employs a class-specific 3D object as a geometric prior to define keypoints and parametric correlations in 3D space. We ensure viewpoint consistency through a rendered image of a reference 3D object, followed by style transfer to meet user-defined appearance specifications. At the core of our framework is GeoDrag, improving accuracy and speed of drag-based image editing on geometry guidance tasks and general instructions on DragBench. Our results demonstrate that GeoDiffusion enables precise geometric modifications across various iterative design workflows.

[CV-136] Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction

Link: https://arxiv.org/abs/2510.22335
Authors: Xu Zhang, Ruijie Quan, Wenguan Wang, Yi Yang
Affiliations: ReLER, CCAI, Zhejiang University; Nanyang Technological University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

[CV-137] Beyond Augmentation: Leveraging Inter-Instance Relation in Self-Supervised Representation Learning

Link: https://arxiv.org/abs/2510.22322
Authors: Ali Javidani, Babak Nadjar Araabi, Mohammad Amin Sadeghi
Affiliations: University of Tehran; Hamad bin Khalifa University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted in IEEE Signal Processing Letters, 2025

[CV-138] GRPO-Guard: Mitigating Implicit Over-Optimization in Flow Matching via Regulated Clipping

Link: https://arxiv.org/abs/2510.22319
Authors: Jing Wang, Jiajun Liang, Jie Liu, Henglin Liu, Gongye Liu, Jun Zheng, Wanyuan Pang, Ao Ma, Zhenyu Xie, Xintao Wang, Meng Wang, Pengfei Wan, Xiaodan Liang
Affiliations: Shenzhen Campus of Sun Yat-Sen University; Kling Team, Kuaishou Technology; CUHK MMLab; Tsinghua University; HKUST; USTB; UCAS
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments:

[CV-139] T2I-RiskyPrompt: A Benchmark for Safety Evaluation Attack and Defense on Text-to-Image Model AAAI

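Quick read: This paper addresses three limitations of current risky-prompt datasets used to evaluate the safety of Text-to-Image (T2I) models: limited risk categories, coarse-grained annotation, and low effectiveness. The key to the solution is T2I-RiskyPrompt, a comprehensive benchmark with a risk taxonomy of 6 primary categories and 14 fine-grained subcategories. A systematic collection and annotation pipeline yields 6,432 effective risky prompts, each with hierarchical labels and a detailed risk reason. The paper also proposes a reason-driven risky image detection method that explicitly aligns a Multimodal Large Language Model (MLLM) with the safety annotations, improving the accuracy and interpretability of evaluation.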

Link: https://arxiv.org/abs/2510.22300
Authors: Chenyu Zhang, Tairen Zhang, Lanjun Wang, Ruidong Chen, Wenhui Li, Anan Liu
Affiliations: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: AAAI under review

Abstract:Using risky text prompts, such as pornography and violent prompts, to test the safety of text-to-image (T2I) models is a critical task. However, existing risky prompt datasets are limited in three key areas: 1) limited risky categories, 2) coarse-grained annotation, and 3) low effectiveness. To address these limitations, we introduce T2I-RiskyPrompt, a comprehensive benchmark designed for evaluating safety-related tasks in T2I models. Specifically, we first develop a hierarchical risk taxonomy, which consists of 6 primary categories and 14 fine-grained subcategories. Building upon this taxonomy, we construct a pipeline to collect and annotate risky prompts. Finally, we obtain 6,432 effective risky prompts, where each prompt is annotated with both hierarchical category labels and detailed risk reasons. Moreover, to facilitate the evaluation, we propose a reason-driven risky image detection method that explicitly aligns the MLLM with safety annotations. Based on T2I-RiskyPrompt, we conduct a comprehensive evaluation of eight T2I models, nine defense methods, five safety filters, and five attack strategies, offering nine key insights into the strengths and limitations of T2I model safety. Finally, we discuss potential applications of T2I-RiskyPrompt across various research fields. The dataset and code are provided in this https URL.

[CV-140] GSAlign: Geometric and Semantic Alignment Network for Aerial-Ground Person Re-Identification NEURIPS2025

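[Quick read]: This paper targets aerial-ground person re-identification (AG-ReID), where extreme viewpoint differences, occlusions, and domain gaps make matching difficult. Existing methods remain limited in handling severe pose variations and spatial misalignment. The key to the solution is the Geometric and Semantic Alignment Network (GSAlign), built on two modules. The Learnable Thin Plate Spline (LTPS) module adaptively warps pedestrian features using learned keypoints to compensate for the geometric distortion caused by extreme viewpoint changes. The Dynamic Alignment Module (DAM) estimates visibility-aware representation masks that highlight visible body regions at the semantic level, reducing the interference of occlusions and partial observations in cross-view correspondence. On the CARGO dataset, GSAlign improves mAP by +18.8% and Rank-1 accuracy by +16.8%.

Link: https://arxiv.org/abs/2510.22268
Authors: Qiao Li, Jie Li, Yukang Zhang, Lei Tan, Jing Chen, Jiayi Ji
Affiliations: Wuhan University; Xiamen University; National University of Singapore
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: Accepted by NeurIPS 2025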

Abstract: Aerial-Ground person re-identification (AG-ReID) is an emerging yet challenging task that aims to match pedestrian images captured from drastically different viewpoints, typically from unmanned aerial vehicles (UAVs) and ground-based surveillance cameras. The task poses significant challenges due to extreme viewpoint discrepancies, occlusions, and domain gaps between aerial and ground imagery. While prior works have made progress by learning cross-view representations, they remain limited in handling severe pose variations and spatial misalignment. To address these issues, we propose a Geometric and Semantic Alignment Network (GSAlign) tailored for AG-ReID. GSAlign introduces two key components to jointly tackle geometric distortion and semantic misalignment in aerial-ground matching: a Learnable Thin Plate Spline (LTPS) Module and a Dynamic Alignment Module (DAM). The LTPS module adaptively warps pedestrian features based on a set of learned keypoints, effectively compensating for geometric variations caused by extreme viewpoint changes. In parallel, the DAM estimates visibility-aware representation masks that highlight visible body regions at the semantic level, thereby alleviating the negative impact of occlusions and partial observations in cross-view correspondence. A comprehensive evaluation on CARGO with four matching protocols demonstrates the effectiveness of GSAlign, achieving significant improvements of +18.8% in mAP and +16.8% in Rank-1 accuracy over previous state-of-the-art methods on the aerial-ground setting. The code is available at: this https URL.

[CV-141] Accident Anticipation via Temporal Occurrence Prediction NIPS2025

Link: https://arxiv.org/abs/2510.22260
Authors: Tianhao Zhao, Yiyang Zou, Zihao Mao, Peilun Xiao, Yulin Huang, Hongda Yang, Yuxuan Li, Qun Li, Guobin Wu, Yutian Lin
Affiliations: Wuhan University; Zhongguancun Academy; Didi Chuxing
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: Accepted by NIPS 2025

[CV-142] Real-Time Semantic Segmentation on FPGA for Autonomous Vehicles Using LMIINet with the CGRA4ML Framework

Link: https://arxiv.org/abs/2510.22243
Authors: Amir Mohammad Khadem Hosseini, Sattar Mirzakuchaki
Affiliations: Iran University of Science and Technology
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Notes:

[CV-143] DiffusionLane: Diffusion Model for Lane Detection

Link: https://arxiv.org/abs/2510.22236
Authors: Kunyang Zhou, Yeqin Shao
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-144] Diffusion-Driven Two-Stage Active Learning for Low-Budget Semantic Segmentation NEURIPS2025

Link: https://arxiv.org/abs/2510.22229
Authors: Jeongin Kim, Wonho Bae, YouLee Han, Giyeong Oh, Youngjae Yu, Danica J. Sutherland, Junhyug Noh
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: Accepted to NeurIPS 2025

[CV-145] Audio Frequency-Time Dual Domain Evaluation on Depression Diagnosis

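[Quick read]: This paper addresses obstacles to the prevention and treatment of depression, namely complex diagnostic procedures, ambiguous criteria, and low consultation rates, which severely hinder early assessment and intervention. The key to its solution is to treat voice as a physiological signal, exploit its frequency-time dual-domain features, and combine them with deep learning models to build an intelligent assessment and diagnosis algorithm, enabling efficient classification and identification of depression.

Link: https://arxiv.org/abs/2510.22225
Authors: Yu Luo, Nan Huang, Sophie Yu, Hendry Xu, Jerry Wang, Colin Wang, Zhichao Liu, Chen Zeng
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: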

Abstract:Depression, as a typical mental disorder, has become a prevalent issue significantly impacting public health. However, the prevention and treatment of depression still face multiple challenges, including complex diagnostic procedures, ambiguous criteria, and low consultation rates, which severely hinder timely assessment and intervention. To address these issues, this study adopts voice as a physiological signal and leverages its frequency-time dual domain multimodal characteristics along with deep learning models to develop an intelligent assessment and diagnostic algorithm for depression. Experimental results demonstrate that the proposed method achieves excellent performance in the classification task for depression diagnosis, offering new insights and approaches for the assessment, screening, and diagnosis of depression.

[CV-146] Enpowering Your Pansharpening Models with Generalizability: Unified Distribution is All You Need ICCV2025

Link: https://arxiv.org/abs/2510.22217
Authors: Yongchuan Cui, Peng Liu, Hui Zhang
Affiliations: Aerospace Information Research Institute, Chinese Academy of Sciences; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences; School of Engineering Medicine, Beihang University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: Accepted to ICCV 2025

[CV-147] Hybrid-Vector Retrieval for Visually Rich Documents: Combining Single-Vector Efficiency and Multi-Vector Accuracy

Link: https://arxiv.org/abs/2510.22215
Authors: Juyeon Kim, Geon Lee, Dongwon Choi, Taeuk Kim, Kijung Shin
Affiliations: KAIST; Hanyang University
Categories: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-148] GALA: A GlobAl-LocAl Approach for Multi-Source Active Domain Adaptation

Link: https://arxiv.org/abs/2510.22214
Authors: Juepeng Zheng, Peifeng Zhang, Yibin Wen, Qingmei Li, Yang Zhang, Haohuan Fu
Affiliations: Sun Yat-Sen University; National Supercomputing Center in Shenzhen; Tsinghua University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Notes:

[CV-149] DynamicTree: Interactive Real Tree Animation via Sparse Voxel Spectrum

Link: https://arxiv.org/abs/2510.22213
Authors: Yaokun Li, Lihe Ding, Xiao Chen, Guang Tan, Tianfan Xue
Affiliations: Sun Yat-sen University; The Chinese University of Hong Kong
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: Project Page: this https URL

[CV-150] Simplifying Knowledge Transfer in Pretrained Models

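[Quick read]: This paper starts from the observation that pretrained models built with different design choices generalize in heterogeneous ways, so that one model captures data-specific insights unavailable to another. The key to its solution is to use large public model repositories as an auxiliary source of knowledge and to introduce a data partitioning strategy under which pretrained models autonomously take the role of either "student" or "teacher", enabling bidirectional knowledge transfer. Experiments on image classification, semantic segmentation, and video saliency prediction show that the method improves model performance, with particularly strong adaptability and effectiveness for knowledge transfer across architectures.

Link: https://arxiv.org/abs/2510.22208
Authors: Siddharth Jain, Shyamgopal Karthik, Vineet Gandhi
Affiliations: International Institute of Information Technology, Hyderabad; University of Tübingen
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Notes: 12 pages, 3 figures, 6 tables, Accepted at TMLR 2025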

Abstract:Pretrained models are ubiquitous in the current deep learning landscape, offering strong results on a broad range of tasks. Recent works have shown that models differing in various design choices exhibit categorically diverse generalization behavior, resulting in one model grasping distinct data-specific insights unavailable to the other. In this paper, we propose to leverage large publicly available model repositories as an auxiliary source of model improvements. We introduce a data partitioning strategy where pretrained models autonomously adopt either the role of a student, seeking knowledge, or that of a teacher, imparting knowledge. Experiments across various tasks demonstrate the effectiveness of our proposed approach. In image classification, we improved the performance of ViT-B by approximately 1.4% through bidirectional knowledge transfer with ViT-T. For semantic segmentation, our method boosted all evaluation metrics by enabling knowledge transfer both within and across backbone architectures. In video saliency prediction, our approach achieved a new state-of-the-art. We further extend our approach to knowledge transfer between multiple models, leading to considerable performance improvements for all model participants.

[CV-151] TrajGATFormer: A Graph-Based Transformer Approach for Worker and Obstacle Trajectory Prediction in Off-site Construction Environments

Link: https://arxiv.org/abs/2510.22205
Authors: Mohammed Alduais, Xinming Li, Qipei Mei
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-152] LongCat-Video Technical Report

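[Quick read]: This paper addresses the low efficiency, poor quality, and lack of a unified architecture in long video generation, with a particular focus on the long-duration video inference needed to build world models. Its core solution is LongCat-Video, a unified video generation foundation model with 13.6B parameters. Built on the Diffusion Transformer (DiT) framework, it supports Text-to-Video, Image-to-Video, and Video-Continuation tasks. Pretraining on Video-Continuation enables high-quality, temporally coherent generation of minutes-long videos. A coarse-to-fine generation strategy combined with Block Sparse Attention improves inference efficiency along the temporal and spatial axes, and multi-reward reinforcement learning from human feedback (Multi-reward RLHF) brings performance on par with the latest closed-source and leading open-source models.

Link: https://arxiv.org/abs/2510.22200
Authors: Meituan LongCat Team: Xunliang Cai, Qilong Huang, Zhuoliang Kang, Hongyu Li, Shijun Liang, Liya Ma, Siyu Ren, Xiaoming Wei, Rixu Xie, Tong Zhang
Affiliations: Meituan LongCat Team
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: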

Abstract:Video generation is a critical pathway toward world models, with efficient long video inference as a key capability. Toward this end, we introduce LongCat-Video, a foundational video generation model with 13.6B parameters, delivering strong performance across multiple video generation tasks. It particularly excels in efficient and high-quality long video generation, representing our first step toward world models. Key features include: Unified architecture for multiple tasks: Built on the Diffusion Transformer (DiT) framework, LongCat-Video supports Text-to-Video, Image-to-Video, and Video-Continuation tasks with a single model; Long video generation: Pretraining on Video-Continuation tasks enables LongCat-Video to maintain high quality and temporal coherence in the generation of minutes-long videos; Efficient inference: LongCat-Video generates 720p, 30fps videos within minutes by employing a coarse-to-fine generation strategy along both the temporal and spatial axes. Block Sparse Attention further enhances efficiency, particularly at high resolutions; Strong performance with multi-reward RLHF: Multi-reward RLHF training enables LongCat-Video to achieve performance on par with the latest closed-source and leading open-source models. Code and model weights are publicly available to accelerate progress in the field.

[CV-153] MOGRAS: Human Motion with Grasping in 3D Scenes

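[Quick read]: This paper tackles the challenge of generating full-body grasping motions that are both physically plausible and scene-aware. Existing methods either cannot achieve fine-grained object grasping in complex 3D scenes or ignore scene context, which produces unrealistic motion. The key to the solution is the MOGRAS (Human MOtion with GRAsping in 3D Scenes) dataset, which provides pre-grasping full-body walking motions and final grasping poses in richly annotated indoor scenes. The paper uses it to benchmark existing methods and expose their limitations, and proposes a simple yet effective adaptation strategy that lets existing full-body motion models work seamlessly within 3D scenes, significantly improving the scene consistency and physical plausibility of the generated motion.

Link: https://arxiv.org/abs/2510.22199
Authors: Kunal Bhosikar, Siddharth Katageri, Vivek Madhavaram, Kai Han, Charu Sharma
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Robotics (cs.RO)
Notes: British Machine Vision Conference Workshop - From Scene Understanding to Human Modeling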

Abstract:Generating realistic full-body motion interacting with objects is critical for applications in robotics, virtual reality, and human-computer interaction. While existing methods can generate full-body motion within 3D scenes, they often lack the fidelity for fine-grained tasks like object grasping. Conversely, methods that generate precise grasping motions typically ignore the surrounding 3D scene. This gap, generating full-body grasping motions that are physically plausible within a 3D scene, remains a significant challenge. To address this, we introduce MOGRAS (Human MOtion with GRAsping in 3D Scenes), a large-scale dataset that bridges this gap. MOGRAS provides pre-grasping full-body walking motions and final grasping poses within richly annotated 3D indoor scenes. We leverage MOGRAS to benchmark existing full-body grasping methods and demonstrate their limitations in scene-aware generation. Furthermore, we propose a simple yet effective method to adapt existing approaches to work seamlessly within 3D scenes. Through extensive quantitative and qualitative experiments, we validate the effectiveness of our dataset and highlight the significant improvements our proposed method achieves, paving the way for more realistic human-scene interactions.

[CV-154] Scaling Non-Parametric Sampling with Representation

Link: https://arxiv.org/abs/2510.22196
Authors: Vincent Lu, Aaron Truong, Zeyu Yun, Yubei Chen
Affiliations: University of Wisconsin–Madison; University of Illinois Urbana–Champaign; UC Berkeley; UC Davis
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Notes:

[CV-155] HARMONY: Hidden Activation Representations and Model Output-Aware Uncertainty Estimation for Vision-Language Models

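[Quick read]: This paper addresses how to assess the reliability of outputs from Vision-Language Models (VLMs) in high-stakes applications, that is, how to perform accurate Uncertainty Estimation (UE) for generated results in order to improve safety and trustworthiness. Existing methods rely mainly on output probability distributions, or predict uncertainty from hidden representations alone, but they struggle to capture cross-modal semantic relationships and are susceptible to bias from language priors. The paper proposes the HARMONY framework, whose core innovation is to jointly model the fused multimodal information in model activations and the output distribution. It argues that the model's internal confidence in its visual understanding (reflected in hidden representations) and the probability distribution over generated tokens both carry reliability signals, and that combining them improves uncertainty estimation. Experiments show the method outperforms existing techniques on several open-ended question-answering benchmarks, with gains of up to 4% in AUROC and up to 6% in PRR, establishing a new state of the art for VLM uncertainty estimation.

Link: https://arxiv.org/abs/2510.22171
Authors: Erum Mushtaq, Zalan Fabian, Yavuz Faruk Bakman, Anil Ramakrishna, Mahdi Soltanolkotabi, Salman Avestimehr
Affiliations: University of Southern California; Amazon AGI
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: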

Abstract:The growing deployment of Vision-Language Models (VLMs) in high-stakes applications such as autonomous driving and assistive technologies for visually impaired individuals necessitates reliable mechanisms to assess the trustworthiness of their generation. Uncertainty Estimation (UE) plays a central role in quantifying the reliability of model outputs and reducing unsafe generations via selective prediction. In this regard, most existing probability-based UE approaches rely on output probability distributions, aggregating token probabilities into a single uncertainty score using predefined functions such as length-normalization. Another line of research leverages model hidden representations and trains MLP-based models to predict uncertainty. However, these methods often fail to capture the complex multimodal relationships between semantic and textual tokens and struggle to identify biased probabilities often influenced by language priors. Motivated by these observations, we propose a novel UE framework, HARMONY, that jointly leverages fused multimodal information in model activations and the output distribution of the VLM to determine the reliability of responses. The key hypothesis of our work is that both the model’s internal belief in its visual understanding, captured by its hidden representations, and the produced token probabilities carry valuable reliability signals that can be jointly leveraged to improve UE performance, surpassing approaches that rely on only one of these components. Experimental results on three open-ended VQA benchmarks, A-OKVQA, VizWiz, and PathVQA, and three state-of-the-art VLMs, LLaVa-7b, LLaVA-13b and InstructBLIP demonstrate that our method consistently performs on par with or better than existing approaches, achieving up to 4% improvement in AUROC, and 6% in PRR, establishing new state of the art in uncertainty estimation for VLMs.
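
The abstract mentions probability-based baselines that aggregate token probabilities into a single score using functions such as length-normalization. As a rough illustration of that baseline family only (not of HARMONY itself), the sketch below negates the mean token log-probability. The function name and the sample probabilities are assumptions made for illustration.

```python
import math

def length_normalized_uncertainty(token_probs):
    """Return the negative mean log-probability of a generated sequence.

    token_probs: probabilities assigned to each generated token
    (hypothetical values here; a real VLM would supply its own).
    Higher return values indicate a less confident generation.
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    log_probs = [math.log(p) for p in token_probs]
    # Dividing by the length keeps long answers from being penalized
    # simply for containing more tokens.
    return -sum(log_probs) / len(log_probs)

# Hypothetical token probabilities for two answers.
confident = length_normalized_uncertainty([0.9, 0.8, 0.95])
hesitant = length_normalized_uncertainty([0.4, 0.3, 0.5, 0.2])
print(confident < hesitant)
```

A score of this kind sees only the output distribution, which is the limitation the abstract points to when it argues for also using hidden representations.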

[CV-156] LT-Exosense: A Vision-centric Multi-session Mapping System for Lifelong Safe Navigation of Exoskeletons

Link: https://arxiv.org/abs/2510.22164
Authors: Jianeng Wang, Matias Mattamala, Christina Kassab, Nived Chebrolu, Guillaume Burger, Fabio Elnecave, Marine Petriaux, Maurice Fallon
Affiliations: Unknown
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Notes: 8 pages, 4 figures

[CV-157] I2-NeRF: Learning Neural Radiance Fields Under Physically-Grounded Media Interactions

Link: https://arxiv.org/abs/2510.22161
Authors: Shuhong Liu, Lin Gu, Ziteng Cui, Xuangeng Chu, Tatsuya Harada
Affiliations: The University of Tokyo; RIKEN
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-158] Attention Residual Fusion Network with Contrast for Source-free Domain Adaptation

Link: https://arxiv.org/abs/2510.22142
Authors: Renrong Shao, Wei Zhang, Jun Wang
Affiliations: Naval Medical University (Second Military Medical University); East China Normal University; KLATASDS-MOE
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: 13 pages, 8 figures

[CV-159] STG-Avatar: Animatable Human Avatars via Spacetime Gaussian IROS

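[Quick read]: This paper addresses the challenges of reconstructing high-fidelity, animatable human avatars from monocular video, in particular the accurate representation of non-rigid details (such as clothing deformation) and dynamic regions (such as fast-moving limbs). The key to the solution is STG-Avatar, a framework based on 3D Gaussian Splatting (3DGS) whose core innovation is a rigid-nonrigid coupled deformation mechanism. Linear blend skinning (LBS) provides global skeletal control for real-time pose driving, while Spacetime Gaussians (STG) improve detail in dynamic regions through spacetime adaptive optimization. In addition, optical flow is used to detect high-dynamic regions and guide the adaptive densification of 3D Gaussians there, improving reconstruction quality while retaining real-time rendering.

Link: https://arxiv.org/abs/2510.22140
Authors: Guangan Jiang, Tianzi Zhang, Dong Li, Zhenjun Zhao, Haoang Li, Mingrui Li, Hongyu Wang
Affiliations: Dalian University of Technology; Fudan University; University of Macau; WAYTOUS Inc.; University of Zaragoza; Hong Kong University of Science and Technology (Guangzhou)
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: Accepted by the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025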

Abstract:Realistic animatable human avatars from monocular videos are crucial for advancing human-robot interaction and enhancing immersive virtual experiences. While recent research on 3DGS-based human avatars has made progress, it still struggles with accurately representing detailed features of non-rigid objects (e.g., clothing deformations) and dynamic regions (e.g., rapidly moving limbs). To address these challenges, we present STG-Avatar, a 3DGS-based framework for high-fidelity animatable human avatar reconstruction. Specifically, our framework introduces a rigid-nonrigid coupled deformation framework that synergistically integrates Spacetime Gaussians (STG) with linear blend skinning (LBS). In this hybrid design, LBS enables real-time skeletal control by driving global pose transformations, while STG complements it through spacetime adaptive optimization of 3D Gaussians. Furthermore, we employ optical flow to identify high-dynamic regions and guide the adaptive densification of 3D Gaussians in these regions. Experimental results demonstrate that our method consistently outperforms state-of-the-art baselines in both reconstruction quality and operational efficiency, achieving superior quantitative metrics while retaining real-time rendering capabilities. Our code is available at this https URL
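
For readers unfamiliar with linear blend skinning, the rigid part of the hybrid design can be written in its textbook form. The abstract does not give the paper's exact formulation, so the symbols below are assumptions: $\mathbf{v}$ is a rest-pose point in homogeneous coordinates (for example, a Gaussian center), $\mathbf{T}_j$ is the transform of bone $j$ for the target pose, and $w_j$ are skinning weights.

```latex
\mathbf{v}' = \sum_{j=1}^{J} w_j \, \mathbf{T}_j \, \mathbf{v},
\qquad \sum_{j=1}^{J} w_j = 1, \quad w_j \ge 0
```

Because each point simply follows a weighted blend of bone transforms, LBS alone cannot express pose-independent detail such as cloth motion, which is the gap the abstract assigns to the Spacetime Gaussians.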

[CV-160] egoEMOTION: Egocentric Vision and Physiological Signals for Emotion and Personality Recognition in Real-World Tasks NEURIPS2025

Link: https://arxiv.org/abs/2510.22129
Authors: Matthias Jammot, Bjöern Braun, Paul Streli, Rafael Wampfler, Christian Holz
Affiliations: ETH Zurich
Categories: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Notes: Accepted for publication at NeurIPS 2025

[CV-161] Mint: A Simple Test-Time Adaptation of Vision-Language Models against Common Corruptions NEURIPS2025

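[Quick read]: This paper addresses the performance drop of pretrained vision-language models such as CLIP when inputs are noisy or corrupted, in particular the variance collapse of image embeddings caused by distribution shift. The study finds that as corruption severity increases, both intra-class and inter-class embedding variances shrink markedly, and that inter-class variance is strongly correlated with classification accuracy, indicating that degradation of the embedding-space structure is a core cause of the performance drop. The key to the solution is to improve embedding quality by maximizing pseudo-label-based inter-class variance. The resulting test-time adaptation method, Mint, optimizes the embedding space online with a mean accumulator and a gradient accumulator, improves robustness without additional labels, and yields consistent gains across multiple corruption benchmarks and CLIP architectures.

Link: https://arxiv.org/abs/2510.22127
Authors: Wenxuan Bao, Ruxi Deng, Jingrui He
Affiliations: University of Illinois Urbana-Champaign
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Notes: Accepted by NeurIPS 2025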

Abstract:Pretrained vision-language models such as CLIP achieve strong zero-shot generalization but remain vulnerable to distribution shifts caused by input corruptions. In this work, we investigate how corruptions affect CLIP’s image embeddings and uncover a consistent phenomenon we term as embedding variance collapse, where both intra-class and inter-class variances shrink as corruption severity increases. We find that this collapse is closely tied to performance degradation, with inter-class variance strongly correlated with classification accuracy. To explain this phenomenon, we analyze how corruptions alter the structure of the embedding space. Our theoretical results suggest that the visual encoder tends to encode corruption-related signals, which dilute class-discriminative features and compress the representation geometry. We further show that maximizing inter-class variance, even when estimated from pseudo-labels, can provably enhance embedding quality. Based on this insight, we propose Mint, a simple test-time adaptation method that maximizes pseudo-label-based inter-class variance on the fly using a mean accumulator and a gradient accumulator. Mint operates effectively with small batch sizes and consistently improves performance across multiple corruption benchmarks and CLIP architectures. Our code is available at this https URL .
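
To make the quantity at the center of this abstract concrete, the sketch below computes an inter-class variance from embeddings and pseudo-labels. The definition used here (class-size-weighted squared distance between class means and the global mean) is one common choice and an assumption, as are the toy embeddings. Mint's actual objective and its mean and gradient accumulators are not reproduced.

```python
def inter_class_variance(embeddings, pseudo_labels):
    """Weighted squared distance between class means and the global mean."""
    n = len(embeddings)
    dim = len(embeddings[0])
    global_mean = [sum(e[d] for e in embeddings) / n for d in range(dim)]

    # Group embeddings by their (possibly noisy) pseudo-label.
    groups = {}
    for emb, label in zip(embeddings, pseudo_labels):
        groups.setdefault(label, []).append(emb)

    variance = 0.0
    for members in groups.values():
        class_mean = [sum(e[d] for e in members) / len(members) for d in range(dim)]
        sq_dist = sum((class_mean[d] - global_mean[d]) ** 2 for d in range(dim))
        variance += len(members) / n * sq_dist
    return variance

# Toy 2-D embeddings (hypothetical): well separated vs. collapsed.
clean = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
collapsed = [[0.55, 0.45], [0.5, 0.5], [0.45, 0.55], [0.5, 0.5]]
labels = [0, 0, 1, 1]
print(inter_class_variance(clean, labels) > inter_class_variance(collapsed, labels))
```

In this toy setup the collapsed embeddings give a much smaller value, mirroring the shrinkage the abstract reports as corruption severity increases.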

[CV-162] CogStereo: Neural Stereo Matching with Implicit Spatial Cognition Embedding

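[Quick read]: This paper addresses the weak zero-shot generalization of deep stereo matching: current methods depend on dataset-specific priors and degrade markedly on new scenes or domains. The key to the solution is the CogStereo framework, which uses monocular depth features as priors to embed implicit spatial cognition into the refinement process, going beyond local correspondences to produce globally and structurally coherent disparity estimates. The framework also uses a dual-conditional refinement mechanism that combines pixel-wise uncertainty with cognition-guided features for consistent global correction of mismatches, improving cross-domain generalization.

Link: https://arxiv.org/abs/2510.22119
Authors: Lihuang Fang, Xiao Hu, Yuchen Zou, Hong Zhang
Affiliations: Southern University of Science and Technology; International Digital Economy Academy; Xi’an Jiaotong University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: 9 pages, 6 figures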

Abstract:Deep stereo matching has advanced significantly on benchmark datasets through fine-tuning but falls short of the zero-shot generalization seen in foundation models in other vision tasks. We introduce CogStereo, a novel framework that addresses challenging regions, such as occlusions or weak textures, without relying on dataset-specific priors. CogStereo embeds implicit spatial cognition into the refinement process by using monocular depth features as priors, capturing holistic scene understanding beyond local correspondences. This approach ensures structurally coherent disparity estimation, even in areas where geometry alone is inadequate. CogStereo employs a dual-conditional refinement mechanism that combines pixel-wise uncertainty with cognition-guided features for consistent global correction of mismatches. Extensive experiments on Scene Flow, KITTI, Middlebury, ETH3D, EuRoc, and real-world demonstrate that CogStereo not only achieves state-of-the-art results but also excels in cross-domain generalization, shifting stereo vision towards a cognition-driven approach.

[CV-163] GRAID: Enhancing Spatial Reasoning of VLMs Through High-Fidelity Data Generation RAID

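[Quick read]: This paper addresses the weak performance of current Vision Language Models (VLMs) on spatial reasoning, a prerequisite for many practical applications. Existing training-data generation methods have two main limitations: methods based on single-image 3D reconstruction introduce cascading modeling errors and require wide answer tolerances, while caption-based methods depend on hyper-detailed annotations and suffer from generative hallucinations, which leads to a human validation accuracy of only 57.6%. The core of the solution is the GRAID framework, built on the key insight that qualitative spatial relationships can be reliably determined from 2D geometric primitives alone (such as the 2D bounding boxes output by standard object detectors), avoiding both 3D reconstruction errors and generative hallucinations. Experiments show that a GRAID-generated dataset reaches 91.16% human-validated accuracy, well above existing methods, and that models fine-tuned on it generalize to unseen spatial reasoning tasks, with accuracy gains of 47.5% on BDD and 37.9% on NuImages.

Link: https://arxiv.org/abs/2510.22118
Authors: Karim Elmaaroufi, Liheng Lai, Justin Svegliato, Yutong Bai, Sanjit A. Seshia, Matei Zaharia
Affiliations: University of California, Berkeley; Models for Embodied and Spatial Harmony
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Notes: 22 pages, 3 figures, 3 tables, project page: this https URL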

Abstract: Vision Language Models (VLMs) achieve strong performance on many vision-language tasks but often struggle with spatial reasoning, a prerequisite for many applications. Empirically, we find that a dataset produced by a current training data generation pipeline has a 57.6% human validation rate. This low rate stems from current limitations: single-image 3D reconstruction introduces cascading modeling errors and requires wide answer tolerances, while caption-based methods require hyper-detailed annotations and suffer from generative hallucinations. We present GRAID, built on the key insight that qualitative spatial relationships can be reliably determined from 2D geometric primitives alone. By operating exclusively on 2D bounding boxes from standard object detectors, GRAID avoids both 3D reconstruction errors and generative hallucinations, resulting in datasets of higher quality than those from existing tools that produce similar datasets, as validated by human evaluations. We apply our framework to the BDD100k, NuImages, and Waymo datasets, generating over 8.5 million high-quality VQA pairs, with questions spanning spatial relations, counting, ranking, and size comparisons. We evaluate one of the datasets and find it achieves 91.16% human-validated accuracy, compared to 57.6% on a dataset generated by recent work. Critically, we demonstrate that when trained on GRAID data, models learn spatial reasoning concepts that generalize: models fine-tuned on 6 question types improve on over 10 held-out types, with accuracy gains of 47.5% on BDD and 37.9% on NuImages for Llama 3.2 11B, and when trained on all question types, achieve improvements on several existing benchmarks such as BLINK. The GRAID framework, datasets, and additional information can be found on our project page (this https URL).
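
The key insight, that qualitative spatial relations can be read off 2D bounding boxes, can be illustrated with a very small predicate. This is a minimal sketch and not GRAID's actual rule set, which the abstract does not spell out. The box format `(x_min, y_min, x_max, y_max)`, the function names, and the sample boxes are assumptions.

```python
def left_of(box_a, box_b):
    """True if box_a lies entirely to the left of box_b.

    Boxes are (x_min, y_min, x_max, y_max) in pixel coordinates,
    a common detector output format assumed here for illustration.
    """
    return box_a[2] < box_b[0]

def horizontal_relation(box_a, box_b):
    """Return a qualitative relation, or 'overlapping' when it is ambiguous."""
    if left_of(box_a, box_b):
        return "left of"
    if left_of(box_b, box_a):
        return "right of"
    # No clean left/right answer exists, so a generator could skip this pair.
    return "overlapping"

# Hypothetical detections.
car = (10, 40, 60, 90)
pedestrian = (120, 30, 150, 100)
print(horizontal_relation(car, pedestrian))
```

Refusing to answer when boxes overlap is one simple way to keep generated question-answer pairs unambiguous without any 3D reconstruction.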

[CV-164] Discovering Latent Graphs with GFlowNets for Diverse Conditional Image Generation

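[Quick read]: This paper addresses insufficient diversity in conditional image generation when the condition or prompt is uncertain, especially when several plausible and diverse images are needed. Traditional methods that vary random seeds make it hard to discern meaningful differences, and simply diversifying the prompt is limited to what can be expressed in language. The key to the solution is a new framework named Rainbow, whose core idea is to decompose the input condition into multiple latent representations, each capturing one aspect of the condition's uncertainty and yielding a different image. Concretely, it models this uncertainty with a latent graph parameterized by Generative Flow Networks (GFlowNets) and uses their graph sampling capability to produce multiple trajectories, giving diverse condition representations and corresponding output images. Experiments on natural and medical image datasets show improvements in both diversity and fidelity.

Link: https://arxiv.org/abs/2510.22107
Authors: Bailey Trang, Parham Saremi, Alan Q. Wang, Fangrui Huang, Zahra TehraniNasab, Amar Kumar, Tal Arbel, Li Fei-Fei, Ehsan Adeli
Affiliations: Stanford University; McGill University; MILA - Quebec AI institute
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Notes: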

Abstract:Capturing diversity is crucial in conditional and prompt-based image generation, particularly when conditions contain uncertainty that can lead to multiple plausible outputs. To generate diverse images reflecting this diversity, traditional methods often modify random seeds, making it difficult to discern meaningful differences between samples, or diversify the input prompt, which is limited in verbally interpretable diversity. We propose Rainbow, a novel conditional image generation framework, applicable to any pretrained conditional generative model, that addresses inherent condition/prompt uncertainty and generates diverse plausible images. Rainbow is based on a simple yet effective idea: decomposing the input condition into diverse latent representations, each capturing an aspect of the uncertainty and generating a distinct image. First, we integrate a latent graph, parameterized by Generative Flow Networks (GFlowNets), into the prompt representation computation. Second, leveraging GFlowNets’ advanced graph sampling capabilities to capture uncertainty and output diverse trajectories over the graph, we produce multiple trajectories that collectively represent the input condition, leading to diverse condition representations and corresponding output images. Evaluations on natural image and medical image datasets demonstrate Rainbow’s improvement in both diversity and fidelity across image synthesis, image generation, and counterfactual generation tasks.

[CV-165] Scanner-Agnostic MRI Harmonization via SSIM-Guided Disentanglement

Link: https://arxiv.org/abs/2510.22073
Authors: Luca Caldera, Lara Cavinato, Francesca Ieva
Affiliations: Politecnico di Milano; Alzheimer’s Disease Neuroimaging Initiative
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes:

[CV-166] MAGIC-Flow: Multiscale Adaptive Conditional Flows for Generation and Interpretable Classification

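[Quick read]: This paper addresses the difficulty of applying generative modeling directly in data-limited domains such as medical imaging: generation alone, without task alignment, does not provide a reliable basis for clinical use. The key to the solution is MAGIC-Flow, a conditional multiscale normalizing flow architecture. By building a hierarchy of invertible and differentiable bijections, the model performs generation and classification within a single modular framework. This design ensures exact likelihood computation and stable optimization, and invertibility enables explicit visualization of sample likelihoods, offering an interpretable view of the model's reasoning. Conditioning on class labels supports controllable synthesis and principled probability estimation, serving generative and discriminative objectives together. The model performs well under scanner noise and on modality-specific synthesis and identification.

Link: https://arxiv.org/abs/2510.22070
Authors: Luca Caldera, Giacomo Bottacini, Lara Cavinato
Affiliations: Politecnico di Milano; Alzheimer’s Disease Neuroimaging Initiative
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Notes: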

Abstract:Generative modeling has emerged as a powerful paradigm for representation learning, but its direct applicability to challenging fields like medical imaging remains limited: mere generation, without task alignment, fails to provide a robust foundation for clinical use. We propose MAGIC-Flow, a conditional multiscale normalizing flow architecture that performs generation and classification within a single modular framework. The model is built as a hierarchy of invertible and differentiable bijections, where the Jacobian determinant factorizes across sub-transformations. We show how this ensures exact likelihood computation and stable optimization, while invertibility enables explicit visualization of sample likelihoods, providing an interpretable lens into the model’s reasoning. By conditioning on class labels, MAGIC-Flow supports controllable sample synthesis and principled class-probability estimation, effectively aiding both generative and discriminative objectives. We evaluate MAGIC-Flow against top baselines using metrics for similarity, fidelity, and diversity. Across multiple datasets, it addresses generation and classification under scanner noise, and modality-specific synthesis and identification. Results show MAGIC-Flow creates realistic, diverse samples and improves classification. MAGIC-Flow is an effective strategy for generation and classification in data-limited domains, with direct benefits for privacy-preserving augmentation, robust generalization, and trustworthy medical AI.
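
The statement that the Jacobian determinant factorizes across sub-transformations corresponds to the standard change-of-variables identity for a composition of bijections. The notation below is assumed for illustration and is not taken from the paper: $f_1, \dots, f_K$ are the invertible layers, $y$ is the class label used for conditioning, and $p_Z$ is the base density.

```latex
\begin{aligned}
z_0 &= x, \qquad z_k = f_k(z_{k-1};\, y), \quad k = 1, \dots, K, \\
\log p_X(x \mid y) &= \log p_Z(z_K \mid y)
  + \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k(z_{k-1};\, y)}{\partial z_{k-1}} \right|
\end{aligned}
```

Each term in the sum depends on one layer only, which is why the likelihood can be evaluated exactly and layer by layer.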

[CV-167] Capturing Gaze Shifts for Guidance: Cross-Modal Fusion Enhancement for VLM Hallucination Mitigation

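[Quick read]: This paper addresses hallucination in Vision Language Models (VLMs), that is, generated content that is supported by neither the textual nor the visual input. Existing methods focus on reducing over-reliance on language priors but overlook the visual attention "sink" problem, in which attention is often misallocated to task-irrelevant visual regions. They also leave cross-modal fusion unbalanced, since they strengthen visual attention without adjusting attention to the user query, so incorrect regions are amplified while user intent is not properly interpreted. The key to the solution is a simple yet effective method, GIFT (Gaze Shift-Guided Cross-modal Fusion Enhancement). It pre-computes a holistic visual saliency map that captures positive changes in visual attention ("gaze shifts") during user query comprehension, and uses this map at each decoding step to amplify attention to both salient visual information and the user query. This reduces interference from irrelevant regions and balances cross-modal fusion, lowering the hallucination rate while preserving general vision-language performance at low computational overhead.

Link: https://arxiv.org/abs/2510.22067
Authors: Zheng Qi, Chao Shang, Evangelia Spiliopoulou, Nikolaos Pappas
Affiliations: AWS AI Labs
Categories: Computer Vision and Pattern Recognition (cs.CV)
Notes: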

Abstract:Vision language models (VLMs) often generate hallucination, i.e., content that cannot be substantiated by either textual or visual inputs. Prior work primarily attributes this to over-reliance on linguistic prior knowledge rather than visual inputs. Some methods attempt to mitigate hallucination by amplifying visual token attention proportionally to their attention scores. However, these methods overlook the visual attention sink problem, where attention is frequently misallocated to task-irrelevant visual regions, and neglect cross-modal fusion balance by enhancing only visual attention without adjusting attention to the user query. This can result in amplifying incorrect areas while failing to properly interpret the user query. To address these challenges, we propose a simple yet effective method called Gaze Shift-Guided Cross-modal Fusion Enhancement (GIFT). GIFT pre-computes a holistic visual saliency map by tracking positive changes in visual attention, or “gaze shifts”, during user query comprehension, and leverages this map to amplify attention to both salient visual information and the user query at each decoding step. This reduces the impact of visual attention sink, as irrelevant tokens exhibit minimal shifts, while ensuring balanced cross-modal fusion for well-integrated representation. Extensive experiments show that GIFT effectively mitigates hallucination in VLMs across both generative and classification tasks, achieving up to 20.7% improvement over greedy decoding, while maintaining general vision-language performance with low computational overhead.
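
The idea of building a saliency map from positive changes in visual attention can be sketched in a few lines. This is a simplified illustration under stated assumptions and not GIFT's implementation: it compares just two hypothetical attention snapshots over four visual tokens, and the normalization step is an assumption.

```python
def gaze_shift_saliency(attn_before, attn_after):
    """Keep only positive attention changes per visual token, then normalize."""
    shifts = [max(0.0, after - before)
              for before, after in zip(attn_before, attn_after)]
    total = sum(shifts)
    if total == 0.0:
        return [0.0] * len(shifts)
    return [s / total for s in shifts]

# Hypothetical attention over 4 visual tokens, before and after the model
# reads the user query. Token 0 holds a lot of attention but never gains
# any, which is how an attention sink would look in this toy setup.
before = [0.50, 0.10, 0.10, 0.30]
after = [0.45, 0.30, 0.15, 0.10]
saliency = gaze_shift_saliency(before, after)
print(saliency)
```

Because the map tracks changes rather than raw attention, a token with consistently high but unchanging attention receives zero saliency, matching the abstract's remark that irrelevant tokens exhibit minimal shifts.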

[CV-168] Human-Centric Anomaly Detection in Surveillance Videos Using YOLO-World and Spatio-Temporal Deep Learning

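[Quick Read]: This paper tackles anomaly detection in surveillance video, where the main difficulties are the diversity of abnormal events, class imbalance, and scene-dependent visual clutter. The key to the solution is a deep learning framework that combines human-centric preprocessing with spatio-temporal modeling. YOLO-World, an open-vocabulary vision-language detector, first locates people, and ByteTrack provides identity-aware tracking. Gaussian blurring then suppresses background regions so the model focuses on behaviorally relevant foreground content. An ImageNet-pretrained InceptionV3 extracts spatial features, and a bidirectional LSTM (BiLSTM) captures temporal dynamics for multi-class anomaly classification. On a five-class subset of UCF-Crime, the method reaches a mean test accuracy of 92.41% with per-class F1-scores above 0.85, supporting the value of foreground-focused preprocessing for anomaly discrimination in real-world scenes.

Link: https://arxiv.org/abs/2510.22056
Authors: Mohammad Ali Etemadi Naeen, Hoda Mohammadzade, Saeed Bagheri Shouraki
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: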

Abstract:Anomaly detection in surveillance videos remains a challenging task due to the diversity of abnormal events, class imbalance, and scene-dependent visual clutter. To address these issues, we propose a robust deep learning framework that integrates human-centric preprocessing with spatio-temporal modeling for multi-class anomaly classification. Our pipeline begins by applying YOLO-World - an open-vocabulary vision-language detector - to identify human instances in raw video clips, followed by ByteTrack for consistent identity-aware tracking. Background regions outside detected bounding boxes are suppressed via Gaussian blurring, effectively reducing scene-specific distractions and focusing the model on behaviorally relevant foreground content. The refined frames are then processed by an ImageNet-pretrained InceptionV3 network for spatial feature extraction, and temporal dynamics are captured using a bidirectional LSTM (BiLSTM) for sequence-level classification. Evaluated on a five-class subset of the UCF-Crime dataset (Normal, Burglary, Fighting, Arson, Explosion), our method achieves a mean test accuracy of 92.41% across three independent trials, with per-class F1-scores consistently exceeding 0.85. Comprehensive evaluation metrics - including confusion matrices, ROC curves, and macro/weighted averages - demonstrate strong generalization and resilience to class imbalance. The results confirm that foreground-focused preprocessing significantly enhances anomaly discrimination in real-world surveillance scenarios.
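
The background-suppression step can be illustrated with a small, dependency-free sketch. It assumes a grayscale frame stored as a list of rows, a fixed 3x3 Gaussian approximation, and boxes given as `(x0, y0, x1, y1)` with exclusive upper bounds. These choices are illustrative assumptions, not details taken from the paper, whose pipeline works on real video frames with detector output from YOLO-World.

```python
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # 3x3 Gaussian approximation, weights sum to 16


def gaussian_blur(img):
    """Blur a 2-D grayscale image (list of rows), replicating edge pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out


def suppress_background(img, boxes):
    """Keep pixels inside any (x0, y0, x1, y1) box sharp; blur everything else."""
    out = [row[:] for row in gaussian_blur(img)]
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = img[y][x]  # restore the original foreground pixel
    return out


frame = [[0, 0, 0, 0],
         [0, 100, 0, 0],
         [0, 0, 0, 80],
         [0, 0, 0, 0]]
result = suppress_background(frame, boxes=[(1, 1, 2, 2)])  # hypothetical person box
```

The pixel inside the box keeps its original value, while the bright background pixel at the right edge is smoothed toward its neighbours.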

[CV-169] VLM-SlideEval: Evaluating VLMs on Structured Comprehension and Perturbation Sensitivity in PPT NEURIPS2025

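[Quick Read]: This paper addresses the limits of current vision-language models (VLMs) as evaluators of presentation slides, in particular their weaknesses in element-level extraction, robustness, and understanding of narrative structure. The key to the solution is VLM-SlideEval, an evaluation framework that probes VLMs along three axes: (1) accuracy of element-level extraction from slide images against ground-truth annotations; (2) robustness to perturbations in geometry, style, and text; and (3) the ability to recover a deck's narrative order from shuffled slides. By standardizing element metadata from publicly available decks into a unified, verifiable schema, the study shows empirically that current VLMs fall short on pixel-accurate extraction and on narrative structure, which motivates calibrated, critic-in-the-loop evaluators that support iterative refinement and selection in agentic pipelines.

Link: https://arxiv.org/abs/2510.22045
Authors: Hyeonsu Kang, Emily Bao, Anjan Goswami
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Evaluating the Evolving LLM Lifecycle - Benchmarks, Emergent Abilities, and Scaling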

Abstract:Vision-language models (VLMs) are increasingly used to evaluate multimodal content, including presentation slides, yet their slide-specific understanding remains underexplored despite their growing role as critics in agentic, model-forward pipelines. We introduce VLM-SlideEval, an evaluation framework that probes VLMs along three axes: (1) element-level extraction from slide images aligned to ground truth; (2) robustness to controlled perturbations in geometry, style, and text; and (3) higher-level comprehension, such as recovering a deck’s narrative order from shuffled slides. Using publicly available decks from Zenodo (this https URL), we standardize ground-truth element metadata from PowerPoint XML and live renderings into a unified, verifiable schema. Empirically, VLMs underperform on pixel-accurate extraction and show non-trivial agreement, fidelity, and consistency under controlled perturbations, while performing better on single-slide content understanding; however, they do not reliably capture narrative structure across slides. These results highlight the limits of current VLMs for slide evaluation and motivate calibrated, critic-in-the-loop evaluators that drive iterative refinement and selection in agentic pipelines.
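
One way to score the third axis (recovering narrative order from shuffled slides) is a rank correlation between the predicted and true slide order. The sketch below uses Kendall's tau. The abstract does not say which metric the authors use, so the choice of metric and the slide names are assumptions made purely for illustration.

```python
from itertools import combinations


def kendall_tau(pred_order, true_order):
    """Kendall rank correlation between two orderings of the same slides."""
    pos = {slide: i for i, slide in enumerate(pred_order)}
    concordant = discordant = 0
    for a, b in combinations(true_order, 2):  # a precedes b in the true deck
        if pos[a] < pos[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


true_order = ["title", "method", "results", "conclusion"]  # hypothetical deck
pred_order = ["title", "results", "method", "conclusion"]  # one adjacent swap
tau = kendall_tau(pred_order, true_order)
```

A perfect reconstruction scores 1.0 and a fully reversed deck scores -1.0, so the value gives a graded view of how much narrative structure a model recovered.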

[CV-170] Caption-Driven Explainability: Probing CNNs for Bias via CLIP ICIP2025

[Quick Read]: This paper addresses the lack of robustness of machine learning models to spurious features, and in particular the way pixel-based saliency maps can mislead when spurious and salient features overlap. The key to the solution is a caption-based explainable AI (XAI) method that uses a novel network surgery technique to integrate the standalone model under study into the contrastive language-image pre-training (CLIP) model. The resulting model identifies the dominant concept that contributes most to the prediction, which lowers the risk of the standalone model falling for a covariate shift and supports the development of more robust models.

Link: https://arxiv.org/abs/2510.22035
Authors: Patrick Koller, Amil V. Dravid, Guido M. Schuster, Aggelos K. Katsaggelos
Affiliations: Northwestern University; University of California, Berkeley; Eastern Switzerland University of Applied Sciences
Categories: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Comments: Accepted and presented at the IEEE ICIP 2025 Satellite Workshop “Generative AI for World Simulations and Communications Celebrating 40 Years of Excellence in Education: Honoring Professor Aggelos Katsaggelos”, Anchorage, Alaska, United States, September 14, 2025. Camera-ready preprint. The official IEEE Xplore version will be available after proceedings processing

Abstract: Robustness has become one of the most critical problems in machine learning (ML). The science of interpreting ML models to understand their behavior and improve their robustness is referred to as explainable artificial intelligence (XAI). One of the state-of-the-art XAI methods for computer vision problems is to generate saliency maps. A saliency map highlights the pixel space of an image that excites the ML model the most. However, this property could be misleading if spurious and salient features are present in overlapping pixel spaces. In this paper, we propose a caption-based XAI method, which integrates a standalone model to be explained into the contrastive language-image pre-training (CLIP) model using a novel network surgery approach. The resulting caption-based XAI model identifies the dominant concept that contributes the most to the model's prediction. This explanation minimizes the risk of the standalone model falling for a covariate shift and contributes significantly towards developing robust ML models.
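
The final step of a caption-based explanation, ranking candidate captions by similarity to an image embedding, can be sketched as below. The embeddings and captions are toy values invented for the example, and the network surgery that places the standalone model inside CLIP is not shown. Treat this as an illustration of the ranking step only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dominant_concept(image_emb, caption_embs):
    """Return the caption whose embedding is closest to the image embedding."""
    scores = {cap: cosine(image_emb, emb) for cap, emb in caption_embs.items()}
    return max(scores, key=scores.get), scores


# Hypothetical embeddings: one caption describes colour, the other shape.
caption_embs = {
    "a red digit": [1.0, 0.0, 0.2],
    "a round digit": [0.1, 1.0, 0.0],
}
image_emb = [0.9, 0.2, 0.1]
best, scores = dominant_concept(image_emb, caption_embs)
```

If the top-ranked caption names a spurious attribute (here, colour rather than shape), that is a hint the model may be relying on it.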

[CV-171] Reconnaissance Automatique des Langues des Signes : Une Approche Hybridée CNN-LSTM Basée sur Mediapipe

Link: https://arxiv.org/abs/2510.22011
Authors: Fraisse Sacré Takouchouang, Ho Tuong Vinh
Affiliations: Université Nationale du Vietnam
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: In French

[CV-172] FlowOpt: Fast Optimization Through Whole Flow Processes for Training-Free Editing

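[Quick Read]: This paper addresses the computational cost of adapting diffusion and flow-matching models at test time for controlled generation tasks such as image editing, restoration, compression, and personalization. Because sampling in these models is iterative, using gradient-based optimization to control the final generated image directly is computationally impractical, so existing methods typically manipulate each timestep separately. The paper proposes FlowOpt, a zero-order (gradient-free) optimization framework that treats the entire flow process as a black box and optimizes through the whole sampling path without backpropagation, which is efficient and lets users monitor intermediate results. Its main contributions are: (1) a method that optimizes the full generation path without gradient information; and (2) a sufficient condition on the step size that guarantees convergence to the global optimum, together with an empirical procedure for estimating the bound and choosing a step size. Experiments show that FlowOpt outperforms existing methods on image editing while using roughly the same number of neural function evaluations (NFEs).

Link: https://arxiv.org/abs/2510.22010
Authors: Or Ronai, Vladimir Kulikov, Tomer Michaeli
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Comments: Project’s webpage at this https URL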

Abstract:The remarkable success of diffusion and flow-matching models has ignited a surge of works on adapting them at test time for controlled generation tasks. Examples range from image editing to restoration, compression and personalization. However, due to the iterative nature of the sampling process in those models, it is computationally impractical to use gradient-based optimization to directly control the image generated at the end of the process. As a result, existing methods typically resort to manipulating each timestep separately. Here we introduce FlowOpt - a zero-order (gradient-free) optimization framework that treats the entire flow process as a black box, enabling optimization through the whole sampling path without backpropagation through the model. Our method is both highly efficient and allows users to monitor the intermediate optimization results and perform early stopping if desired. We prove a sufficient condition on FlowOpt’s step-size, under which convergence to the global optimum is guaranteed. We further show how to empirically estimate this upper bound so as to choose an appropriate step-size. We demonstrate how FlowOpt can be used for image editing, showcasing two options: (i) inversion (determining the initial noise that generates a given image), and (ii) directly steering the edited image to be similar to the source image while conforming to a target text prompt. In both cases, FlowOpt achieves state-of-the-art results while using roughly the same number of neural function evaluations (NFEs) as existing methods. Code and examples are available on the project’s webpage.
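
To make "zero-order optimization through a black box" concrete, here is a generic sketch that uses a two-point finite-difference estimate along a random direction. It is not FlowOpt's own update rule, which the abstract does not spell out. The `black_box` function is a toy stand-in for a full sampling process, and the step size and smoothing radius are illustrative assumptions.

```python
import random


def black_box(z):
    """Toy stand-in for a full sampling process: maps initial noise z to an output."""
    return [2.0 * z[0], z[1] + 1.0]


def loss(z, target):
    out = black_box(z)
    return sum((o - t) ** 2 for o, t in zip(out, target))


def zero_order_step(z, target, rng, step_size=0.02, mu=1e-3):
    """One update from a two-point finite difference along a random direction."""
    u = [rng.gauss(0.0, 1.0) for _ in z]
    z_plus = [zi + mu * ui for zi, ui in zip(z, u)]
    z_minus = [zi - mu * ui for zi, ui in zip(z, u)]
    slope = (loss(z_plus, target) - loss(z_minus, target)) / (2.0 * mu)
    return [zi - step_size * slope * ui for zi, ui in zip(z, u)]


rng = random.Random(0)  # fixed seed so the run is repeatable
target = [4.0, 0.0]
z = [0.0, 0.0]
initial_loss = loss(z, target)
for _ in range(500):
    z = zero_order_step(z, target, rng)
    # loss(z, target) could be logged here to monitor progress or stop early
final_loss = loss(z, target)
```

Only forward evaluations of `black_box` are needed, which is what makes this family of methods usable when backpropagating through every sampling step is too expensive. As in the abstract, the step size matters: too large a value makes the iteration diverge.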

[CV-173] LiteDiff

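[Quick Read]: This paper addresses two challenges in fine-tuning diffusion models for specialized domains such as medical imaging: overfitting caused by scarce domain-specific data, and the high computational cost of full-model fine-tuning. The key to the solution is LiteDiff (Lightweight Diffusion Model Adaptation), which inserts lightweight adaptation layers into a frozen diffusion U-Net and adds a latent morphological autoencoder for domain-specific latent consistency and a pixel-level discriminator for adversarial alignment. Because only a small number of residual adapter modules are optimized and the base model weights stay fixed, LiteDiff reduces computational overhead and mitigates overfitting in minimal-data settings.

Link: https://arxiv.org/abs/2510.22004
Authors: Ruchir Namjoshi, Nagasai Thadishetty, Vignesh Kumar, Hemanth Venkateshwara
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: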

Abstract: In recent years, diffusion models have demonstrated remarkable success in high-fidelity image synthesis. However, fine-tuning these models for specialized domains, such as medical imaging, remains challenging due to limited domain-specific data and the high computational cost of full model adaptation. In this paper, we introduce LiteDiff (Lightweight Diffusion Model Adaptation), a novel fine-tuning approach that integrates lightweight adaptation layers into a frozen diffusion U-Net while enhancing training with a latent morphological autoencoder (for domain-specific latent consistency) and a pixel-level discriminator (for adversarial alignment). By freezing the weights of the base model and optimizing only small residual adapter modules, LiteDiff significantly reduces the computational overhead and mitigates overfitting, even in minimal-data settings. Additionally, we conduct ablation studies to analyze the effects of selectively integrating adaptation layers in different U-Net blocks, revealing an optimal balance between efficiency and performance. Experiments on three chest X-ray datasets, (1) Kaggle Chest X-Ray Pneumonia, (2) NIH Chest X-ray14, and (3) VinBigData Chest X-ray, demonstrate that LiteDiff achieves superior adaptation efficiency compared to naive full fine-tuning. Our framework provides a promising direction for transfer learning in diffusion models, facilitating their deployment in diverse low-data domains.
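
The idea of training only small residual adapters on top of frozen weights can be sketched in plain Python. The class name, the bottleneck (down/up) adapter shape, and the zero initialization are assumptions chosen for illustration. The abstract does not describe the exact adapter architecture.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class AdaptedLinear:
    """Frozen base layer plus a small trainable residual adapter (hypothetical design)."""

    def __init__(self, base_weight, rank):
        d_out, d_in = len(base_weight), len(base_weight[0])
        self.base = base_weight                            # frozen, never updated
        self.down = [[0.01] * d_in for _ in range(rank)]   # trainable projection
        self.up = [[0.0] * rank for _ in range(d_out)]     # trainable, zero-initialized

    def forward(self, x):
        base_out = matvec(self.base, x)
        delta = matvec(self.up, matvec(self.down, x))      # residual correction
        return [b + d for b, d in zip(base_out, delta)]

    def trainable_params(self):
        return len(self.down) * len(self.down[0]) + len(self.up) * len(self.up[0])


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = AdaptedLinear(identity, rank=1)
output = layer.forward([1.0, 2.0, 3.0, 4.0])
frozen_params = 4 * 4
```

With the up-projection initialized to zero, the adapted layer starts out identical to the frozen base, and training only has to learn a low-dimensional correction (8 parameters here against 16 frozen ones).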

[CV-174] Sprint: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers

Link: https://arxiv.org/abs/2510.21986
Authors: Dogyun Park, Moayed Haji-Ali, Yanyu Li, Willi Menapace, Sergey Tulyakov, Hyunwoo J. Kim, Aliaksandr Siarohin, Anil Kag
Affiliations: Snap Inc.; Korea University; KAIST
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-175] A supervised discriminant data representation: application to pattern classification

Link: https://arxiv.org/abs/2510.21898
Authors: Fadi Dornaika, Ahmad Khoder, Abdelmalik Moujahid, Wassim Khoder
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-176] Generative AI in Depth: A Survey of Recent Advances, Model Variants, and Real-World Applications

Link: https://arxiv.org/abs/2510.21887
Authors: Shamim Yazdani, Akansha Singh, Nripsuta Saxena, Zichong Wang, Avash Palikhe, Deng Pan, Umapada Pal, Jie Yang, Wenbin Zhang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: Accepted by the Journal of Big Data

[CV-177] TernaryCLIP: Efficiently Compressing Vision-Language Models with Ternary Weights and Distilled Knowledge

Link: https://arxiv.org/abs/2510.21879
Authors: Shu-Hao Zhang, Wei-Cheng Tang, Chen Wu, Peng Hu, Nan Li, Liang-Jie Zhang, Qi Zhang, Shao-Qun Zhang
Affiliations: Nanjing University; Microsoft AI
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

[CV-178] AI Powered Urban Green Infrastructure Assessment Through Aerial Imagery of an Industrial Township

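[Quick Read]: This paper addresses the difficulty of accurately assessing urban canopy coverage, where traditional practices are limited by inadequate technical capabilities, difficulty scaling data processing, and a lack of specialized expertise. The key to the solution is applying artificial intelligence, specifically computer vision, through deep-learning-based object-based image analysis (OBIA) to identify and segment green canopy in high-resolution drone imagery. Running the pipeline on a cloud platform with high-performance processors handles the space and time cost of processing large datasets, enabling efficient, low-cost canopy coverage estimation at city scale and providing a basis for urban forestry management and assessment of carbon sequestration potential.

Link: https://arxiv.org/abs/2510.21876
Authors: Anisha Dutta
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Presented at IIIE Conference 2024, Jamshedpur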

Abstract: Accurate assessment of urban canopy coverage is crucial for informed urban planning, effective environmental monitoring, and mitigating the impacts of climate change. Traditional practices often face limitations due to inadequate technical requirements, difficulties in scaling and data processing, and the lack of specialized expertise. This study presents an efficient approach for estimating green canopy coverage using artificial intelligence, specifically computer vision techniques, applied to aerial imagery. Our proposed methodology utilizes object-based image analysis, based on deep learning algorithms, to accurately identify and segment green canopies from high-resolution drone images. This approach allows detailed analysis of urban vegetation, capturing variations in canopy density and understanding spatial distribution. To overcome the computational challenges associated with processing large datasets, it was implemented on a cloud platform utilizing high-performance processors. This infrastructure efficiently manages space complexity and ensures affordable latency, enabling the rapid analysis of vast amounts of drone imagery. Our results demonstrate the effectiveness of this approach in accurately estimating canopy coverage at the city scale, providing valuable insights for urban forestry management of an industrial township. The resultant data generated by this method can be used to optimize tree plantation and assess the carbon sequestration potential of urban forests. By integrating these insights into sustainable urban planning, we can foster more resilient urban environments, contributing to a greener and healthier future.
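
Once a segmentation mask is available, canopy coverage reduces to counting pixels. The sketch below assumes a binary mask (1 = canopy) and a known ground sampling distance in metres per pixel. Both the function name and the `gsd_m` parameter are hypothetical, and the paper's own post-processing may differ.

```python
def canopy_coverage(mask, gsd_m=None):
    """Return (coverage fraction, canopy area in square metres or None).

    mask: 2-D list of 0/1 values, where 1 marks a pixel segmented as canopy.
    gsd_m: ground sampling distance in metres per pixel (assumed to be known).
    """
    total = sum(len(row) for row in mask)
    canopy = sum(sum(row) for row in mask)
    fraction = canopy / total
    area = canopy * gsd_m ** 2 if gsd_m is not None else None
    return fraction, area


mask = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 0]]
fraction, area_m2 = canopy_coverage(mask, gsd_m=0.5)
```

For a whole township the same count would be accumulated tile by tile, which is the part that benefits from the cloud infrastructure the abstract describes.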

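[CV-179] Addressing Corner Cases in Autonomous Driving: A World Model-based Approach with Mixture of Experts and LLMs

[Quick Read]: This paper addresses the weak performance of motion forecasting models in high-risk corner-case scenarios in autonomous driving, which stems from the over-representation of common scenes in training data and limited generalization to rare but critical situations. The key to the solution is WM-MoE, described as the first world model-based motion forecasting framework to unify perception, temporal memory, and decision making. It builds a compact scene representation that explains current observations, anticipates future dynamics, and evaluates the outcomes of potential actions. A lightweight temporal tokenizer maps agent trajectories and contextual cues into the feature space of a large language model (LLM) without additional training, enriching temporal context and commonsense priors. A mixture-of-experts (MoE) structure decomposes complex corner cases into subproblems, and a router assigns scenes to specialized experts that infer agent intent and perform counterfactual rollouts, improving robustness and generalization in extreme scenarios.

Link: https://arxiv.org/abs/2510.21867
Authors: Haicheng Liao, Bonan Wang, Junxian Yang, Chengyue Wang, Zhengbin He, Guohui Zhang, Chengzhong Xu, Zhenning Li
Affiliations: University of Macau
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: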

Abstract: Accurate and reliable motion forecasting is essential for the safe deployment of autonomous vehicles (AVs), particularly in rare but safety-critical scenarios known as corner cases. Existing models often underperform in these situations due to an over-representation of common scenes in training data and limited generalization capabilities. To address this limitation, we present WM-MoE, the first world model-based motion forecasting framework that unifies perception, temporal memory, and decision making to address the challenges of high-risk corner-case scenarios. The model constructs a compact scene representation that explains current observations, anticipates future dynamics, and evaluates the outcomes of potential actions. To enhance long-horizon reasoning, we leverage large language models (LLMs) and introduce a lightweight temporal tokenizer that maps agent trajectories and contextual cues into the LLM’s feature space without additional training, enriching temporal context and commonsense priors. Furthermore, a mixture-of-experts (MoE) is introduced to decompose complex corner cases into subproblems and allocate capacity across scenario types, and a router assigns scenes to specialized experts that infer agent intent and perform counterfactual rollouts. In addition, we introduce nuScenes-corner, a new benchmark that comprises four real-world corner-case scenarios for rigorous evaluation. Extensive experiments on four benchmark datasets (nuScenes, NGSIM, HighD, and MoCAD) showcase that WM-MoE consistently outperforms state-of-the-art (SOTA) baselines and remains robust under corner-case and data-missing conditions, indicating the promise of world model-based architectures for robust and generalizable motion forecasting in fully autonomous vehicles.
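
The routing step of a mixture-of-experts can be sketched as a softmax gate followed by a top-1 choice. The gate weights, the two-dimensional scene feature, and the expert names below are invented for the example, and WM-MoE's actual router may use a different gating scheme.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(scene_feat, gate_weights, experts):
    """Pick the expert with the highest gate probability and run it on the scene."""
    logits = [sum(w * f for w, f in zip(row, scene_feat)) for row in gate_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs, experts[best](scene_feat)


# Hypothetical experts, one per corner-case type.
experts = [
    lambda feat: "cut-in expert",
    lambda feat: "occlusion expert",
]
gate_weights = [[2.0, 0.0],
                [0.0, 2.0]]
scene_feat = [0.1, 0.9]  # hypothetical features: [cut-in score, occlusion score]
chosen, probs, prediction = route(scene_feat, gate_weights, experts)
```

Top-1 routing keeps the cost of a forward pass close to that of a single expert while letting each expert specialize in one scenario type.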

[CV-180] LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representation

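[Quick Read]: This paper addresses two limitations of current speech-driven 3D facial animation methods: poor generalization to unseen speakers and emotional states, and reliance on explicit one-hot encodings of identity and emotion labels, which ignores the emotional cues already present in speech and limits the naturalness and adaptability of the animation. The key to the solution is LSF-Animation, a framework that implicitly extracts emotion information from speech and captures identity features from a neutral facial mesh, so it generalizes to unseen speakers and emotional states without manual labels. It also introduces a Hierarchical Interaction Fusion Block (HIFB), which uses a fusion token to integrate dual transformer features and combine emotional, motion-related, and identity-related cues more effectively, improving emotional expressiveness, identity generalization, and realism.

Link: https://arxiv.org/abs/2510.21864
Authors: Xin Lu, Chuanqing Zhuang, Chenxi Jin, Zhengda Lu, Yiqun Wang, Wu Liu, Jun Xiao
Affiliations: University of Chinese Academy of Sciences; Zhongguancun Academy; National University of Singapore; Chongqing University; University of Science and Technology of China
Categories: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Comments: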

Abstract: Speech-driven 3D facial animation has attracted increasing interest owing to its potential to generate expressive and temporally synchronized digital humans. While recent works have begun to explore emotion-aware animation, they still depend on explicit one-hot encodings to represent identity and emotion with given emotion and identity labels, which limits their ability to generalize to unseen speakers. Moreover, the emotional cues inherently present in speech are often neglected, limiting the naturalness and adaptability of generated animations. In this work, we propose LSF-Animation, a novel framework that eliminates the reliance on explicit emotion and identity feature representations. Specifically, LSF-Animation implicitly extracts emotion information from speech and captures the identity features from a neutral facial mesh, enabling improved generalization to unseen speakers and emotional states without requiring manual labels. Furthermore, we introduce a Hierarchical Interaction Fusion Block (HIFB), which employs a fusion token to integrate dual transformer features and more effectively integrate emotional, motion-related and identity-related cues. Extensive experiments conducted on the 3DMEAD dataset demonstrate that our method surpasses recent state-of-the-art approaches in terms of emotional expressiveness, identity generalization, and animation realism. The source code will be released at: this https URL.

[CV-181] A Multi-Stage Hybrid Framework for Automated Interpretation of Multi-View Engineering Drawings Using Vision Language Model

Link: https://arxiv.org/abs/2510.21862
Authors: Muhammad Tayyab Khan, Zane Yong, Lequn Chen, Wenhe Feng, Nicholas Yew Jin Tan, Seung Ki Moon
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Comments: This draft has been submitted to the 13th International Conference on Industrial Engineering and Applications (ICIEA 2026)

[CV-182] Poisson Flow Consistency Training

[Quick Read]: This paper addresses the restriction that the Poisson Flow Consistency Model (PFCM) can only be trained by distillation, which limits its use across data modalities. The key to the solution is a new training method, Poisson Flow Consistency Training (PFCT). It uses the perturbation kernel to remove the dependence on a pretrained PFGM++, and introduces a sinusoidal discretization schedule and a Beta noise distribution to improve adaptability and sample quality. On low-dose CT image denoising, PFCT improves the low-dose image in terms of LPIPS and SSIM and performs on par with models such as the Consistency Model, which supports it as a valid way to train the PFCM.

Link: https://arxiv.org/abs/2510.21857
Authors: Anthony Zhang, Mahmut Gokmen, Dennis Hein, Rongjun Ge, Wenjun Xia, Ge Wang, Jin Chen
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 5 pages, 3 figures, 1 table

Abstract:The Poisson Flow Consistency Model (PFCM) is a consistency-style model based on the robust Poisson Flow Generative Model++ (PFGM++) which has achieved success in unconditional image generation and CT image denoising. Yet the PFCM can only be trained in distillation which limits the potential of the PFCM in many data modalities. The objective of this research was to create a method to train the PFCM in isolation called Poisson Flow Consistency Training (PFCT). The perturbation kernel was leveraged to remove the pretrained PFGM++, and the sinusoidal discretization schedule and Beta noise distribution were introduced in order to facilitate adaptability and improve sample quality. The model was applied to the task of low dose computed tomography image denoising and improved the low dose image in terms of LPIPS and SSIM. It also displayed similar denoising effectiveness as models like the Consistency Model. PFCT is established as a valid method of training the PFCM from its effectiveness in denoising CT images, showing potential with competitive results to other generative models. Further study is needed in the precise optimization of PFCT and in its applicability to other generative modeling tasks. The framework of PFCT creates more flexibility for the ways in which a PFCM can be created and can be applied to the field of generative modeling.

[CV-183] Modal Aphasia: Can Unified Multimodal Models Describe Images From Memory?

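[Quick Read]: This paper examines a systematic cross-modal inconsistency in current unified multimodal models: they can accurately memorize and reproduce visual content (such as iconic movie artwork) yet fail to describe that content correctly in text, a phenomenon the authors call "modal aphasia". The key contribution is showing that this is not an incidental training artifact but a fundamental property of current unified multimodal models, confirmed through controlled experiments on synthetic datasets across multiple architectures. The paper also shows that modal aphasia can open gaps in AI safety frameworks: a model aligned only on text can still generate unsafe images, which underlines the need for safeguards that hold across modalities.

Link: https://arxiv.org/abs/2510.21842
Authors: Michael Aerni, Joshua Swanson, Kristina Nikolić, Florian Tramèr
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Comments: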

Abstract:We present modal aphasia, a systematic dissociation in which current unified multimodal models accurately memorize concepts visually but fail to articulate them in writing, despite being trained on images and text simultaneously. For one, we show that leading frontier models can generate near-perfect reproductions of iconic movie artwork, but confuse crucial details when asked for textual descriptions. We corroborate those findings through controlled experiments on synthetic datasets in multiple architectures. Our experiments confirm that modal aphasia reliably emerges as a fundamental property of current unified multimodal models, not just as a training artifact. In practice, modal aphasia can introduce vulnerabilities in AI safety frameworks, as safeguards applied to one modality may leave harmful concepts accessible in other modalities. We demonstrate this risk by showing how a model aligned solely on text remains capable of generating unsafe images.

[CV-184] RatioWaveNet: A Learnable RDWT Front-End for Robust and Interpretable EEG Motor-Imagery Classification

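[Quick Read]: This paper addresses reliable decoding of motor imagery (MI) signals for non-invasive brain-computer interfaces (BCIs), where EEG data are nonstationary, have a low signal-to-noise ratio (SNR), and vary widely across subjects. The core of the solution is RatioWaveNet, whose main innovation is a trainable Rationally-Dilated Wavelet Transform (RDWT) front end. The RDWT performs an undecimated, multi-resolution subband decomposition of the EEG signal that preserves temporal length and shift-invariance, enhancing sensorimotor rhythms while mitigating jitter and mild artifacts. The subbands are fused by lightweight grouped 1-D convolutions and passed to a multi-kernel CNN for local temporal-spatial feature extraction, a grouped-query attention encoder for long-range context, and a compact TCN head for causal temporal integration. Experiments show improved accuracy on the hardest subjects with consistent average gains across seeds, indicating that the wavelet front end is an effective plug-in for Transformer-based BCIs that improves robustness without sacrificing efficiency.

Link: https://arxiv.org/abs/2510.21841
Authors: Marco Siino, Giuseppe Bonomo, Rosario Sorbello, Ilenia Tinnirello
Affiliations: University of Catania; University of Palermo
Categories: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Comments: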

Abstract: Brain-computer interfaces (BCIs) based on motor imagery (MI) translate covert movement intentions into actionable commands, yet reliable decoding from non-invasive EEG remains challenging due to nonstationarity, low SNR, and subject variability. We present RatioWaveNet, which augments a strong temporal CNN-Transformer backbone (TCFormer) with a trainable, Rationally-Dilated Wavelet Transform (RDWT) front end. The RDWT performs an undecimated, multi-resolution subband decomposition that preserves temporal length and shift-invariance, enhancing sensorimotor rhythms while mitigating jitter and mild artifacts; subbands are fused via lightweight grouped 1-D convolutions and passed to a multi-kernel CNN for local temporal-spatial feature extraction, a grouped-query attention encoder for long-range context, and a compact TCN head for causal temporal integration. Our goal is to test whether this principled wavelet front end improves robustness precisely where BCIs typically fail - on the hardest subjects - and whether such gains persist on average across seeds under both intra- and inter-subject protocols. On BCI-IV-2a and BCI-IV-2b, across five seeds, RatioWaveNet improves worst-subject accuracy over the Transformer backbone by +0.17 / +0.42 percentage points (Sub-Dependent / LOSO) on 2a and by +1.07 / +2.54 percentage points on 2b, with consistent average-case gains and modest computational overhead. These results indicate that a simple, trainable wavelet front end is an effective plug-in to strengthen Transformer-based BCIs, improving worst-case reliability without sacrificing efficiency.
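
What "undecimated" means in practice is that every subband keeps the full length of the input. The sketch below shows this with a plain dyadic Haar transform (the à trous scheme) and circular boundaries. It is a stand-in chosen for brevity: it uses neither rational dilation factors nor learned filters, so it is not the RDWT itself.

```python
def undecimated_haar(signal, levels):
    """Undecimated (a trous) Haar decomposition with circular boundary handling.

    Returns (details, approx): one detail band per level plus the final
    approximation, each with the same length as the input signal.
    """
    n = len(signal)
    approx = list(signal)
    details = []
    for level in range(levels):
        step = 2 ** level  # the filter is dilated instead of downsampling the signal
        shifted = [approx[(i - step) % n] for i in range(n)]
        details.append([(a - b) / 2.0 for a, b in zip(approx, shifted)])
        approx = [(a + b) / 2.0 for a, b in zip(approx, shifted)]
    return details, approx


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # toy single-channel trace
details, approx = undecimated_haar(signal, levels=2)

# Summing all bands recovers the input exactly for this transform.
reconstructed = [approx[i] + sum(band[i] for band in details) for i in range(len(signal))]
```

Because no samples are dropped, the bands stay aligned in time with the raw signal, which is what lets a downstream network fuse them with ordinary 1-D convolutions.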

[CV-185] Improving the Physics of Video Generation with VJEPA-2 Reward Signal

[Quick Read]: This paper addresses the limited physical plausibility of current video generative models: they produce visually realistic videos but have a weak grasp of physics and often generate physically implausible content. The key to the solution is using Video Joint Embedding Predictive Architecture 2 (VJEPA-2), a video world model pretrained with self-supervised learning (SSL), as a reward signal that guides the generation process of the generative model MAGI-1. The report states that this improves physics plausibility by about 6%.

Link: https://arxiv.org/abs/2510.21840
Authors: Jianhao Yuan, Xiaofeng Zhang, Felix Friedrich, Nicolas Beltran-Velez, Melissa Hall, Reyhane Askari-Hemmat, Xiaochuang Han, Nicolas Ballas, Michal Drozdzal, Adriana Romero-Soriano
Affiliations: FAIR, Meta Superintelligence Labs; University of Oxford; Mila - Québec AI Institute; Université de Montréal; Columbia University; McGill University; Canada CIFAR AI Chair
Categories: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Comments: 2 pages

Abstract: This is a short technical report describing the winning entry of the PhysicsIQ Challenge, presented at the Perception Test Workshop at ICCV 2025. State-of-the-art video generative models exhibit severely limited physical understanding, and often produce implausible videos. The Physics IQ benchmark has shown that visual realism does not imply physics understanding. Yet, intuitive physics understanding has been shown to emerge from SSL pretraining on natural videos. In this report, we investigate whether we can leverage SSL-based video world models to improve the physics plausibility of video generative models. In particular, we build on top of the state-of-the-art video generative model MAGI-1 and couple it with the recently introduced Video Joint Embedding Predictive Architecture 2 (VJEPA-2) to guide the generation process. We show that by leveraging VJEPA-2 as a reward signal, we can improve the physics plausibility of state-of-the-art video generative models by ~6%.

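[CV-186] Evaluating ChatGPT's Performance in Classifying Pneumonia from Chest X-Ray Images

[Quick Read]: This paper evaluates the zero-shot ability of OpenAI's gpt-4o, without any fine-tuning, to classify chest X-ray images as normal or pneumonia. The study compares four prompt designs on a balanced test set of 400 images, ranging from minimal instructions to detailed, reasoning-based prompts. Concise, feature-focused prompts achieved the highest accuracy, 74%, while reasoning-oriented prompts performed worse, suggesting that prompt design strongly affects performance on this medical imaging task and that diagnostic reliability remains limited.

Link: https://arxiv.org/abs/2510.21839
Authors: Pragna Prahallad, Pranathi Prahallad
Affiliations: Emerald High School
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: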

Abstract:In this study, we evaluate the ability of OpenAI’s gpt-4o model to classify chest X-ray images as either NORMAL or PNEUMONIA in a zero-shot setting, without any prior fine-tuning. A balanced test set of 400 images (200 from each class) was used to assess performance across four distinct prompt designs, ranging from minimal instructions to detailed, reasoning-based prompts. The results indicate that concise, feature-focused prompts achieved the highest classification accuracy of 74%, whereas reasoning-oriented prompts resulted in lower performance. These findings highlight that while ChatGPT exhibits emerging potential for medical image interpretation, its diagnostic reliability remains limited. Continued advances in visual reasoning and domain-specific adaptation are required before such models can be safely applied in clinical practice.

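[CV-187] Towards Accurate and Efficient Waste Image Classification: A Hybrid Deep Learning and Machine Learning Approach

[Quick Read]: This paper addresses automated image-based waste classification for waste management, and in particular the lack of systematic benchmarks comparing machine learning (ML), deep learning (DL), and efficient hybrid approaches. The key to the solution is a hybrid method that uses deep models (such as ResNet variants and EfficientNetV2S) for feature extraction and classical classifiers (such as Support Vector Machine (SVM) and Logistic Regression) for the final classification, achieving high accuracy at lower computational cost. Experiments on three public datasets show that the hybrid method outperforms both pure ML and pure DL approaches, reaching up to 100% accuracy, and that feature selection reduces feature dimensionality by over 95% without loss of accuracy, which speeds up training and inference and suits scalable deployment in resource-constrained environments.

Link: https://arxiv.org/abs/2510.21833
Authors: Ngoc-Bao-Quang Nguyen, Tuan-Minh Do, Cong-Tam Phan, Thi-Thu-Hong Phan
Affiliations: FPT University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 31 pages; 7 figures; 16 tables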

Abstract: Automated image-based garbage classification is a critical component of global waste management; however, systematic benchmarks that integrate Machine Learning (ML), Deep Learning (DL), and efficient hybrid solutions remain underdeveloped. This study provides a comprehensive comparison of three paradigms to identify the most effective strategy: (1) machine learning algorithms using handcrafted features, (2) deep learning architectures, including ResNet variants and EfficientNetV2S, and (3) a hybrid approach that utilizes deep models for feature extraction combined with classical classifiers such as Support Vector Machine and Logistic Regression. Experiments on three public datasets (TrashNet, Garbage Classification, and a refined Household Garbage Dataset with 43 corrected mislabels) demonstrate that the hybrid method consistently outperforms the others, achieving up to 100% accuracy on TrashNet and the refined Household set, and 99.87% on Garbage Classification, thereby surpassing state-of-the-art benchmarks. Furthermore, feature selection reduces feature dimensionality by over 95% without compromising accuracy, resulting in faster training and inference. This work establishes more reliable benchmarks for waste classification and introduces an efficient hybrid framework that achieves high accuracy while reducing inference cost, making it suitable for scalable deployment in resource-constrained environments.
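
The hybrid recipe (deep features, then feature selection, then a classical classifier) can be sketched end to end on toy data. Everything here is a stand-in: the "deep features" are synthetic 50-dimensional vectors, variance ranking replaces whichever feature selector the authors used, and a nearest-centroid rule replaces SVM or Logistic Regression.

```python
def select_top_variance(features, keep):
    """Return the indices of the `keep` highest-variance dimensions and the reduced rows."""
    n, dims = len(features), len(features[0])

    def variance(j):
        mean = sum(row[j] for row in features) / n
        return sum((row[j] - mean) ** 2 for row in features) / n

    kept = sorted(sorted(range(dims), key=variance, reverse=True)[:keep])
    return kept, [[row[j] for j in kept] for row in features]


def fit_centroids(rows, labels):
    """Nearest-centroid classifier: a simple stand-in for SVM / Logistic Regression."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict(centroids, row):
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


# Toy "deep features": 50 dimensions, of which only the first two carry class information.
labels = [i % 2 for i in range(6)]
features = []
for i, label in enumerate(labels):
    row = [0.5] * 50
    row[0] = label + 0.01 * i
    row[1] = 1.0 - label
    features.append(row)

kept, reduced = select_top_variance(features, keep=2)
centroids = fit_centroids(reduced, labels)
accuracy = sum(predict(centroids, r) == y for r, y in zip(reduced, labels)) / len(labels)
reduction = 1.0 - len(kept) / 50
```

The toy data is built so that discarding 96% of the dimensions costs nothing, which mirrors the abstract's claim in spirit. Real features are rarely this clean, so the achievable reduction has to be measured per dataset.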

[CV-188] A Flow Model with Low-Rank Transformers for Incomplete Multimodal Survival Analysis

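Quick read: This paper addresses inaccurate survival modeling on multimodal medical data (such as whole slide images (WSIs) and genomic profiles) when some modalities are missing. Existing methods usually infer the missing modality directly from the observed ones with deep neural networks, but they ignore the distributional discrepancy across modalities, which makes the reconstruction inconsistent and unreliable. The key solution is a new framework that combines a low-rank Transformer with a flow-based generative model. First, a class-conditional flow aligns the cross-modal distributions; the reversible structure and exact density modeling of normalizing flows yield a distribution-consistent latent space, which makes the reconstructed modality more faithful to the true distribution. Second, a lightweight low-rank Transformer models intra-modal dependencies, and its low-rank structure alleviates overfitting in high-dimensional modality fusion. Together these improve the robustness and accuracy of survival prediction in both complete and incomplete modality settings.

Link: https://arxiv.org/abs/2510.21829
Authors: Yi Yin,Yuntao Shou,Zao Dai,Yun Peng,Tao Meng,Wei Ai,Keqin Li
Affiliations: Central South University of Forestry and Technology; Xi’an Jiaotong University; State University of New York, New Paltz
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 12 pages, 4 figures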

Abstract:In recent years, multimodal medical data-based survival analysis has attracted much attention. However, real-world datasets often suffer from the problem of incomplete modality, where some patient modality information is missing due to acquisition limitations or system failures. Existing methods typically infer missing modalities directly from observed ones using deep neural networks, but they often ignore the distributional discrepancy across modalities, resulting in inconsistent and unreliable modality reconstruction. To address these challenges, we propose a novel framework that combines a low-rank Transformer with a flow-based generative model for robust and flexible multimodal survival prediction. Specifically, we first formulate the concerned problem as incomplete multimodal survival analysis using the multi-instance representation of whole slide images (WSIs) and genomic profiles. To realize incomplete multimodal survival analysis, we propose a class-specific flow for cross-modal distribution alignment. Under the condition of class labels, we model and transform the cross-modal distribution. By virtue of the reversible structure and accurate density modeling capabilities of the normalizing flow model, the model can effectively construct a distribution-consistent latent space of the missing modality, thereby improving the consistency between the reconstructed data and the true distribution. Finally, we design a lightweight Transformer architecture to model intra-modal dependencies while alleviating the overfitting problem in high-dimensional modality fusion by virtue of the low-rank Transformer. Extensive experiments have demonstrated that our method not only achieves state-of-the-art performance under complete modality settings, but also maintains robust and superior accuracy under the incomplete modalities scenario.
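The two properties of normalizing flows that the abstract relies on, invertibility and exact density evaluation, can be illustrated with a one-dimensional affine flow. This is a toy sketch under stated assumptions: the `AffineFlow` class, its parameters, and the per-class dictionary are hypothetical and far simpler than the class-specific flow in the paper.

```python
import math

def standard_normal_logpdf(z):
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))

class AffineFlow:
    """Invertible 1-D map z = (x - shift) / scale with an exact log-density."""

    def __init__(self, shift, scale):
        if scale <= 0:
            raise ValueError("scale must be positive for the map to be invertible")
        self.shift, self.scale = shift, scale

    def forward(self, x):
        return (x - self.shift) / self.scale

    def inverse(self, z):
        return z * self.scale + self.shift

    def log_prob(self, x):
        # Change of variables: log p_x(x) = log p_z(f(x)) + log |df/dx|,
        # and here df/dx = 1 / scale.
        return standard_normal_logpdf(self.forward(x)) - math.log(self.scale)

# One flow per class label (hypothetical values), mirroring the idea of
# conditioning the transformation on the class.
class_flows = {0: AffineFlow(shift=-1.0, scale=0.5), 1: AffineFlow(shift=2.0, scale=3.0)}
```

Sampling a missing value for class 1 would then amount to drawing `z` from the base distribution and calling `class_flows[1].inverse(z)`.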

[CV-189] Precise classification of low quality G-banded Chromosome Images by reliability metrics and data pruning classifier

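Quick read: This paper addresses insufficient chromosome classification precision with low-quality images and low-cost equipment, especially in under-resourced pathology laboratories where high-quality annotated data are hard to obtain. The key solution is a set of reliability thresholding metrics combined with deliberately engineered features to raise classification precision. The method is evaluated with a variant of the Alex-Net neural network, Support Vector Machine (SVM), K Nearest-Neighbors (KNN), and their cascade pipelines, applied to the automated filtering of semi-straight chromosomes. It reaches over 90% precision on a low-quality G-banding database, supporting its suitability for low-budget cytogenetics laboratories and less developed regions.

Link: https://arxiv.org/abs/2510.21827
Authors: Mojtaba Moattari
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: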

Abstract:In the last decade, due to high resolution cameras and accurate meta-phase analyzes, the accuracy of chromosome classification has improved substantially. However, current Karyotyping systems demand large number of high quality train data to have an adequately plausible Precision per each chromosome. Such provision of high quality train data with accurate devices are not yet accomplished in some out-reached pathological laboratories. To prevent false positive detections in low-cost systems and low-quality images settings, this paper improves the classification Precision of chromosomes using proposed reliability thresholding metrics and deliberately engineered features. The proposed method has been evaluated using a variation of deep Alex-Net neural network, SVM, K Nearest-Neighbors, and their cascade pipelines to an automated filtering of semi-straight chromosome. The classification results have highly improved over 90% for the chromosomes with more common defections and translocations. Furthermore, a comparative analysis over the proposed thresholding metrics has been conducted and the best metric is bolded with its salient characteristics. The high Precision results provided for a very low-quality G-banding database verifies suitability of the proposed metrics and pruning method for Karyotyping facilities in poor countries and lowbudget pathological laboratories.
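Reliability thresholding, as described here, trades coverage for precision: predictions whose reliability falls below a cut-off are discarded instead of reported. The sketch below assumes the simplest possible reliability score, the top class probability; the abstract does not specify the paper's metrics, so the function names and the score are illustrative only.

```python
def predict_with_rejection(probabilities, threshold):
    """Return the arg-max class, or None when the top probability is below the threshold."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best if probabilities[best] >= threshold else None

def selective_accuracy_and_coverage(batch_probs, labels, threshold):
    """Accuracy over accepted samples, plus the fraction of samples accepted."""
    accepted = correct = 0
    for probs, label in zip(batch_probs, labels):
        prediction = predict_with_rejection(probs, threshold)
        if prediction is None:
            continue  # pruned as unreliable
        accepted += 1
        correct += int(prediction == label)
    accuracy = correct / accepted if accepted else 0.0
    coverage = accepted / len(labels) if labels else 0.0
    return accuracy, coverage
```

Raising the threshold typically removes the ambiguous samples first, so accuracy on the retained set goes up while fewer chromosomes are classified automatically.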

[CV-190] Explainable Deep Learning in Medical Imaging: Brain Tumor and Pneumonia Detection

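Quick read: This paper addresses the lack of interpretability that keeps deep learning models for medical imaging diagnosis from earning clinical trust and adoption. The key solution is an explainable deep learning framework that uses two mainstream convolutional neural networks, ResNet50 and DenseNet121, to classify brain tumor MRI scans and pneumonia chest X-rays, and integrates Gradient-weighted Class Activation Mapping (Grad-CAM) to produce heatmaps showing the image regions that drive each decision. DenseNet121 achieves higher accuracy (brain tumors 94.3% vs. 92.5%, pneumonia 89.1% vs. 84.4%), and its Grad-CAM visualizations focus more consistently on the core pathological regions, which improves interpretability and clinical credibility.

Link: https://arxiv.org/abs/2510.21823
Authors: Sai Teja Erukude,Viswa Chaitanya Marella,Suhasnadh Reddy Veluru
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Published in IEEE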

Abstract:Deep Learning (DL) holds enormous potential for improving medical imaging diagnostics, yet the lack of interpretability in most models hampers clinical trust and adoption. This paper presents an explainable deep learning framework for detecting brain tumors in MRI scans and pneumonia in chest X-ray images using two leading Convolutional Neural Networks, ResNet50 and DenseNet121. These models were trained on publicly available Kaggle datasets comprising 7,023 brain MRI images and 5,863 chest X-ray images, achieving high classification performance. DenseNet121 consistently outperformed ResNet50 with 94.3 percent vs. 92.5 percent accuracy for brain tumors and 89.1 percent vs. 84.4 percent accuracy for pneumonia. For better explainability, Gradient-weighted Class Activation Mapping (Grad-CAM) was integrated to create heatmap visualizations superimposed on the test images, indicating the most influential image regions in the decision-making process. Interestingly, while both models produced accurate results, Grad-CAM showed that DenseNet121 consistently focused on core pathological regions, whereas ResNet50 sometimes scattered attention to peripheral or non-pathological areas. Combining deep learning and explainable AI offers a promising path toward reliable, interpretable, and clinically useful diagnostic tools.
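The Grad-CAM heatmap mentioned in this abstract is computed by weighting each feature map of the last convolutional layer with the spatial average of its gradient, summing, and applying a ReLU. The sketch below shows only that arithmetic on nested lists; in a real pipeline the activations and gradients would be captured from ResNet50 or DenseNet121 through the framework's hook mechanism, which is omitted here. The final normalization to [0, 1] is a common display convention, assumed rather than taken from the paper.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map from [channels][height][width] activations and their gradients."""
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for feature_map, grad_map in zip(activations, gradients):
        # Channel weight: global average of the gradient over spatial positions.
        weight = sum(sum(row) for row in grad_map) / (height * width)
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * feature_map[i][j]
    # ReLU keeps only the evidence that supports the target class.
    cam = [[max(0.0, value) for value in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]  # scale to [0, 1]
    return cam
```

The resulting low-resolution map is normally upsampled to the input size and overlaid on the image.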

[CV-191] Wavelet-based GAN Fingerprint Detection using ResNet50

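Quick read: This paper addresses the detection of images generated by Generative Adversarial Networks (GANs), that is, how to distinguish fake images produced by models such as StyleGAN from real ones. The key solution applies the Discrete Wavelet Transform (DWT) as multi-resolution preprocessing, exposing subtle artifacts in the wavelet domain, and feeds the result to a ResNet50 classifier. Preprocessing with Haar and Daubechies wavelet filters reaches 93.8% and 95.1% accuracy respectively, well above a ResNet50 trained directly in the spatial domain (81.5%). This indicates that GAN-generated images carry identifiable "fingerprints" in the wavelet domain and that richer frequency descriptions help detection.

Link: https://arxiv.org/abs/2510.21822
Authors: Sai Teja Erukude,Suhasnadh Reddy Veluru,Viswa Chaitanya Marella
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 6 pages; Published in IEEE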

Abstract:Identifying images generated by Generative Adversarial Networks (GANs) has become a significant challenge in digital image forensics. This research presents a wavelet-based detection method that uses discrete wavelet transform (DWT) preprocessing and a ResNet50 classification layer to differentiate the StyleGAN-generated images from real ones. Haar and Daubechies wavelet filters are applied to convert the input images into multi-resolution representations, which will then be fed to a ResNet50 network for classification, capitalizing on subtle artifacts left by the generative process. Moreover, the wavelet-based models are compared to an identical ResNet50 model trained on spatial data. The Haar and Daubechies preprocessed models achieved a greater accuracy of 93.8 percent and 95.1 percent, much higher than the model developed in the spatial domain (accuracy rate of 81.5 percent). The Daubechies-based model outperforms Haar, showing that adding layers of descriptive frequency patterns can lead to even greater distinguishing power. These results indicate that the GAN-generated images have unique wavelet-domain artifacts or “fingerprints.” The method proposed illustrates the effectiveness of wavelet-domain analysis to detect GAN images and emphasizes the potential of further developing the capabilities of future deepfake detection systems.
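The Haar preprocessing step can be illustrated with a single-level 2D transform that splits an image into one approximation sub-band and three detail sub-bands, each at half resolution. This is a minimal sketch of the orthonormal Haar transform on a grayscale image; how the paper arranges the sub-bands as ResNet50 input is not stated in the abstract, and the sub-band names below follow one of several conventions in use.

```python
def haar_dwt2(image):
    """Single-level 2-D orthonormal Haar transform of an even-sized grayscale image."""
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("image height and width must be even")
    approx, row_detail, col_detail, diag_detail = [], [], [], []
    for i in range(0, height, 2):
        rows = ([], [], [], [])
        for j in range(0, width, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # local average (low-pass both ways)
            rows[1].append((a + b - c - d) / 2)  # difference between the two rows
            rows[2].append((a - b + c - d) / 2)  # difference between the two columns
            rows[3].append((a - b - c + d) / 2)  # diagonal difference
        approx.append(rows[0])
        row_detail.append(rows[1])
        col_detail.append(rows[2])
        diag_detail.append(rows[3])
    return approx, row_detail, col_detail, diag_detail
```

The detail sub-bands are where high-frequency generation artifacts would be expected to show up; a Daubechies filter uses longer filters than the 2-tap Haar pair and is usually taken from a wavelet library.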

[CV-192] Prompt fidelity of ChatGPT 4o / Dall-E3 text-to-image visualisations

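Quick read: This paper addresses the consistency between prompts and final images in Generative AI text-to-image generation, that is, whether the model faithfully renders the attributes specified in the prompt (such as age, attire, and accessories). The key solution is a systematic assessment based on two public-domain datasets (430 images in total) that quantifies DALL-E3's rendering accuracy across three attribute groups: personal attributes (age, hair), appearance (attire, glasses), and paraphernalia (name tags, clipboards). Deviations are highest for depictions of the person themselves, particularly age, which suggests that prompt-to-image mapping errors need to be considered in bias detection and model evaluation.

Link: https://arxiv.org/abs/2510.21821
Authors: Dirk HR Spennemann
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: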

Abstract:This study examines the prompt fidelity of ChatGPT4o / DALL-E3 text-to-image visualisations by analysing whether attributes explicitly specified in autogenously generated prompts are correctly rendered in the resulting images. Using two public-domain datasets comprising 200 visualisations of women working in the cultural and creative industries and 230 visualisations of museum curators, the study assessed accuracy across personal attributes (age, hair), appearance (attire, glasses), and paraphernalia (name tags, clipboards). While correctly rendered in most cases, DALL-E3 deviated from prompt specifications in 15.6% of all attributes (n=710). Errors were lowest for paraphernalia, moderate for personal appearance, and highest for depictions of the person themselves, particularly age. These findings demonstrate measurable prompt-to-image fidelity gaps with implications for bias detection and model evaluation.

[CV-193] Gestura: A LVLM-Powered System Bridging Motion and Semantics for Real-Time Free-Form Gesture Understanding

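Quick read: This paper addresses the low recognition accuracy and slow response of free-form gesture understanding in human-computer interaction, where the existing solution GestureGPT has notable limitations. The key solution is Gestura, an end-to-end system built from three core components. First, a pre-trained Large Vision-Language Model (LVLM) aligns the highly dynamic and diverse patterns of free-form gestures with high-level semantic concepts. Second, a Landmark Processing Module embeds anatomical hand priors to compensate for the LVLM's lack of fine-grained domain knowledge, so that subtle hand movements across different styles are captured better. Third, a Chain-of-Thought (CoT) reasoning strategy performs step-by-step semantic inference, turning shallow knowledge into deeper semantic understanding and improving the interpretation of ambiguous or unconventional gestures.

Link: https://arxiv.org/abs/2510.21814
Authors: Zhuoming Li,Aitong Liu,Mengxi Jia,Tengxiang Zhang,Dell Zhang,Xuelong Li
Affiliations: Institute of Artificial Intelligence (TeleAI) of China Telecom; Goertek Inc
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: IMWUT2025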

Abstract:Free-form gesture understanding is highly appealing for human-computer interaction, as it liberates users from the constraints of predefined gesture categories. However, the sole existing solution GestureGPT suffers from limited recognition accuracy and slow response times. In this paper, we propose Gestura, an end-to-end system for free-form gesture understanding. Gestura harnesses a pre-trained Large Vision-Language Model (LVLM) to align the highly dynamic and diverse patterns of free-form gestures with high-level semantic concepts. To better capture subtle hand movements across different styles, we introduce a Landmark Processing Module that compensates for LVLMs’ lack of fine-grained domain knowledge by embedding anatomical hand priors. Further, a Chain-of-Thought (CoT) reasoning strategy enables step-by-step semantic inference, transforming shallow knowledge into deep semantic understanding and significantly enhancing the model’s ability to interpret ambiguous or unconventional gestures. Together, these components allow Gestura to achieve robust and adaptable free-form gesture comprehension. Additionally, we have developed the first open-source dataset for free-form gesture intention reasoning and understanding with over 300,000 annotated QA pairs.

[CV-194] SITS-DECO: A Generative Decoder Is All You Need For Multitask Satellite Image Time Series Modelling

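Quick read: This paper addresses the poor adaptability and rigid structure of current Earth Observation (EO) foundation models in practice: most existing models need additional adaptation and are built around particular data sources or training approaches, which makes general multi-task, multi-modal modeling difficult. The key solution borrows the unified sequence modeling idea of large language models and proposes SITS-DECO (Satellite Image Time Series-DECoder Only), a generative model that uses only a GPT-style decoder architecture. Satellite image time series are encoded as unified token sequences, and symbolic prompting lets the model perform downstream tasks such as pixel-wise, multi-temporal, multi-modal crop-type classification without task- or modality-specific adaptation. Despite its simple structure and lack of spatial context, SITS-DECO outperforms much larger EO foundation models on PASTIS-R crop-type classification, indicating that dense temporal sequence modeling is a key missing ingredient in the current EO paradigm and supporting a data-centric paradigm in which capability comes from the diversity and structure of the training data.

Link: https://arxiv.org/abs/2510.21813
Authors: Samuel J. Barrett,Docko Sow
Affiliations: LGND AI; Tolbi
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 27 pages, 7 figures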

Abstract:Earth Observation (EO) Foundation Modelling (FM) holds great promise for simplifying and improving the use of EO data for diverse real-world tasks. However, most existing models require additional adaptation before they can be used and are structured rigidly around particular data sources or training approaches. To address this, we take inspiration from large language models, where diverse tasks, both pre-training and downstream, are implicitly captured through next-token prediction over unified token sequences, leveraging the structure and diversity of the training data. We introduce SITS-DECO (Satellite Image Time Series-DECoder Only), a proof-of-concept generative model that applies this unified-sequence framing to EO data. Using a simple GPT-style decoder-only architecture, we demonstrate its ability to perform useful EO tasks (pixel-wise, multi-temporal, multi-modal crop-type classification) in a purely generative framework. Through symbolic prompting, we show that the model can perform multiple supervised and self-supervised tasks within a single unified architecture, without task- or modality-specific adaptation. Despite its simplicity and lack of spatial context, SITS-DECO outperforms much larger EO foundation models on crop-type classification (PASTIS-R), demonstrating that dense temporal sequence modelling is a critical missing ingredient in the current paradigm. This work exemplifies a data-centric modelling paradigm in which capability arises from the diversity and structure of the training data rather than from architectural complexity. SITS-DECO provides a lightweight, practical route to multi-modal, multi-task EO modelling, and a conceptual bridge toward future generative EO foundation models.

[CV-195] Comparative Analysis of Object Detection Algorithms for Surface Defect Detection

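Quick read: This paper addresses accurate, real-time detection of metal surface defects for industrial quality control, where the core challenge is to efficiently identify several types of small defects (such as scratches, inclusions, and rolled-in scale) against complex backgrounds. The key solution is YOLOv11, a recent real-time object detection algorithm. Its advantage is attributed to enhanced feature extraction, an architecture that processes the whole image in a single forward pass, improved anchor box generation, and deeper convolutional layers, which together improve detection accuracy and speed. On the NEU-DET dataset its accuracy is reported to be 70% higher on average than that of the other models.

Link: https://arxiv.org/abs/2510.21811
Authors: Arpan Maity,Tamal Ghosh
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 14 pages, 8 figures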

Abstract:This article compares the performance of six prominent object detection algorithms, YOLOv11, RetinaNet, Fast R-CNN, YOLOv8, RT-DETR, and DETR, on the NEU-DET surface defect detection dataset, comprising images representing various metal surface defects, a crucial application in industrial quality control. Each model’s performance was assessed regarding detection accuracy, speed, and robustness across different defect types such as scratches, inclusions, and rolled-in scales. YOLOv11, a state-of-the-art real-time object detection algorithm, demonstrated superior performance compared to the other methods, achieving a remarkable 70% higher accuracy on average. This improvement can be attributed to YOLOv11’s enhanced feature extraction capabilities and ability to process the entire image in a single forward pass, making it faster and more efficient in detecting minor surface defects. Additionally, YOLOv11’s architecture optimizations, such as improved anchor box generation and deeper convolutional layers, contributed to more precise localization of defects. In conclusion, YOLOv11’s outstanding performance in accuracy and speed solidifies its position as the most effective model for surface defect detection on the NEU dataset, surpassing competing algorithms by a substantial margin.

[CV-196] Hybrid Deep Learning Framework for Enhanced Diabetic Retinopathy Detection: Integrating Traditional Features with AI-driven Insights

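Quick read: This paper addresses the difficulty of early screening for Diabetic Retinopathy (DR), especially in regions with a heavy diabetes burden such as India, where the absence of obvious symptoms in the early stages leads to missed diagnoses and irreversible vision loss. The key solution is a hybrid diagnostic framework that fuses traditional feature extraction with Deep Learning (DL): handcrafted features capture key clinical markers (such as microaneurysms, hemorrhages, and leakage), while DL automatically recognizes hierarchical image patterns, improving early diagnosis and reducing false negatives. Combining interpretable clinical data with learned features gives the approach an advantage over standalone deep learning models and offers a feasible AI-driven route to large-scale, accurate DR screening.

Link: https://arxiv.org/abs/2510.21810
Authors: Arpan Maity,Aviroop Pal,MD. Samiul Islam,Tamal Ghosh
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 11 pages, 3 figures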

Abstract:Diabetic Retinopathy (DR), a vision-threatening complication of Diabetes Mellitus (DM), is a major global concern, particularly in India, which has one of the highest diabetic populations. Prolonged hyperglycemia damages retinal microvasculature, leading to DR symptoms like microaneurysms, hemorrhages, and fluid leakage, which, if undetected, cause irreversible vision loss. Therefore, early screening is crucial as DR is asymptomatic in its initial stages. Fundus imaging aids precise diagnosis by detecting subtle retinal lesions. This paper introduces a hybrid diagnostic framework combining traditional feature extraction and deep learning (DL) to enhance DR detection. While handcrafted features capture key clinical markers, DL automates hierarchical pattern recognition, improving early diagnosis. The model synergizes interpretable clinical data with learned features, surpassing standalone DL approaches that demonstrate superior classification and reduce false negatives. This multimodal AI-driven approach enables scalable, accurate DR screening, crucial for diabetes-burdened regions.

[CV-197] Embodied Navigation with Auxiliary Task of Action Description Prediction ICCV2025

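Quick read: This paper addresses the growing complexity and lack of explainability of decision systems in multimodal robot navigation, that is, how to describe action decisions in natural language while still improving navigation performance. The key solution introduces action description as an auxiliary task within the reinforcement learning framework and uses knowledge distillation from pre-trained vision-language models to generate pseudo-labels, overcoming the lack of ground-truth annotations that previously made it hard to incorporate a description task. The method improves explainability while keeping navigation performance high, and reaches state-of-the-art results on challenging tasks such as semantic audio-visual navigation.

Link: https://arxiv.org/abs/2510.21809
Authors: Haru Kondoh,Asako Kanezaki
Affiliations: Institute of Science Tokyo; RIKEN AIP
Categories: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Comments: ICCV 2025 Poster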

Abstract:The field of multimodal robot navigation in indoor environments has garnered significant attention in recent years. However, as tasks and methods become more advanced, the action decision systems tend to become more complex and operate as black-boxes. For a reliable system, the ability to explain or describe its decisions is crucial; however, there tends to be a trade-off in that explainable systems can not outperform non-explainable systems in terms of performance. In this paper, we propose incorporating the task of describing actions in language into the reinforcement learning of navigation as an auxiliary task. Existing studies have found it difficult to incorporate describing actions into reinforcement learning due to the absence of ground-truth data. We address this issue by leveraging knowledge distillation from pre-trained description generation models, such as vision-language models. We comprehensively evaluate our approach across various navigation tasks, demonstrating that it can describe actions while attaining high navigation performance. Furthermore, it achieves state-of-the-art performance in the particularly challenging multimodal navigation task of semantic audio-visual navigation.
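One way to read the auxiliary-task idea is as an extra loss term: the navigation policy's description head is trained to reproduce the tokens emitted by a frozen teacher model, and that term is added to the reinforcement learning loss. The sketch below assumes hard pseudo-labels and a cross-entropy distillation term; the abstract does not give the paper's exact objective, and `aux_weight` is a hypothetical hyperparameter.

```python
import math

def description_distillation_loss(student_token_probs, teacher_tokens):
    """Mean cross-entropy of the student's per-step token distributions
    against the token ids produced by the teacher (pseudo-labels)."""
    total = 0.0
    for probs, token in zip(student_token_probs, teacher_tokens):
        total -= math.log(probs[token])
    return total / len(teacher_tokens)

def total_loss(rl_loss, student_token_probs, teacher_tokens, aux_weight=0.1):
    """Navigation loss plus the weighted auxiliary description loss."""
    aux = description_distillation_loss(student_token_probs, teacher_tokens)
    return rl_loss + aux_weight * aux
```

Because the teacher supplies the targets, no human-written action descriptions are needed during training.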

[CV-198] Semantic Relation-Enhanced CLIP Adapter for Domain Adaptive Zero-Shot Learning

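Quick read: This paper addresses the difficulty of balancing cross-domain transfer and cross-category generalization in Domain-Adaptive Zero-Shot Learning (DAZSL), and in particular two challenges that vision-language models such as CLIP face in this setting: inefficient cross-category knowledge transfer caused by the lack of semantic relation guidance, and degraded cross-modal alignment during target-domain fine-tuning. The key solution is a Semantic Relation-Enhanced CLIP (SRE-CLIP) Adapter framework. Its core contributions are a Semantic Relation Structure Loss that strengthens semantic relations between categories and a Cross-Modal Alignment Retention Strategy that keeps the modalities consistent during fine-tuning, which together significantly improve DAZSL performance.

Link: https://arxiv.org/abs/2510.21808
Authors: Jiaao Yu,Mingjie Han,Jinkun Jiang,Junyu Dong,Tao Gong,Man Lan
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 5 pages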

Abstract:The high cost of data annotation has spurred research on training deep learning models in data-limited scenarios. Existing paradigms, however, fail to balance cross-domain transfer and cross-category generalization, giving rise to the demand for Domain-Adaptive Zero-Shot Learning (DAZSL). Although vision-language models (e.g., CLIP) have inherent advantages in the DAZSL field, current studies do not fully exploit their potential. Applying CLIP to DAZSL faces two core challenges: inefficient cross-category knowledge transfer due to the lack of semantic relation guidance, and degraded cross-modal alignment during target domain fine-tuning. To address these issues, we propose a Semantic Relation-Enhanced CLIP (SRE-CLIP) Adapter framework, integrating a Semantic Relation Structure Loss and a Cross-Modal Alignment Retention Strategy. As the first CLIP-based DAZSL method, SRE-CLIP achieves state-of-the-art performance on the I2AwA and I2WebV benchmarks, significantly outperforming existing approaches.

[CV-199] Activating Visual Context and Commonsense Reasoning through Masked Prediction in VLMs

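Quick read: This paper addresses the limited reasoning ability of current Vision Language Models (VLMs) in real-world multimodal scenarios, where insufficient use of visual context and commonsense knowledge constrains generalization. The core solution is a new fine-tuning task, Masked Prediction via Context and Commonsense (MPCC), which reconstructs the semantic content of occluded images and thereby forces the model to combine visual information with commonsense reasoning, laying the foundation for generalized reasoning. The key innovation is a training method called Reinforcement Fine tuning with Prior Sampling, which improves performance and also strengthens generalized reasoning in out-of-distribution (OOD) and cross-task scenarios.

Link: https://arxiv.org/abs/2510.21807
Authors: Jiaao Yu,Shenwei Li,Mingjie Han,Yifei Yin,Wenzheng Song,Chenghao Jia,Man Lan
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 9 pages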

Abstract:Recent breakthroughs in reasoning models have markedly advanced the reasoning capabilities of large language models, particularly via training on tasks with verifiable rewards. Yet, a significant gap persists in their adaptation to real world multimodal scenarios, most notably, vision language tasks, due to a heavy focus on single modal language settings. While efforts to transplant reinforcement learning techniques from NLP to VLMs have emerged, these approaches often remain confined to perception centric tasks or reduce images to textual summaries, failing to fully exploit visual context and commonsense knowledge, ultimately constraining the generalization of reasoning capabilities across diverse multimodal environments. To address this limitation, we introduce a novel fine tuning task, Masked Prediction via Context and Commonsense, which forces models to integrate visual context and commonsense reasoning by reconstructing semantically meaningful content from occluded images, thereby laying the foundation for generalized reasoning. To systematically evaluate the model performance in generalized reasoning, we developed a specialized evaluation benchmark, MPCC Eval, and employed various fine tuning strategies to guide reasoning. Among these, we introduced an innovative training method, Reinforcement Fine tuning with Prior Sampling, which not only enhances model performance but also improves its generalized reasoning capabilities in OOD and cross task scenarios.

[CV-200] Frame-Difference Guided Dynamic Region Perception for CLIP Adaptation in Text-Video Retrieval

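Quick read: This paper addresses two key problems in text-video retrieval: the high data acquisition cost of relying on large-scale annotated video-text pairs, and the significant modality gap between video and text features, which limits cross-modal alignment accuracy. The key solution is FDA-CLIP (Frame Difference Alpha-CLIP), which uses frame differences to generate dynamic region masks and feeds them to Alpha-CLIP as an additional alpha channel. This guides the model to focus on semantically critical dynamic regions while suppressing redundant static background, giving more efficient video semantic encoding and cross-modal alignment.

Link: https://arxiv.org/abs/2510.21806
Authors: Jiaao Yu,Mingjie Han,Tao Gong,Jian Zhang,Man Lan
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 5 pages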

Abstract:With the rapid growth of video data, text-video retrieval technology has become increasingly important in numerous application scenarios such as recommendation and search. Early text-video retrieval methods suffer from two critical drawbacks: first, they heavily rely on large-scale annotated video-text pairs, leading to high data acquisition costs; second, there is a significant modal gap between video and text features, which limits cross-modal alignment accuracy. With the development of vision-language model, adapting CLIP to video tasks has attracted great attention. However, existing adaptation methods generally lack enhancement for dynamic video features and fail to effectively suppress static redundant features. To address this issue, this paper proposes FDA-CLIP (Frame Difference Alpha-CLIP), which is a concise CLIP-based training framework for text-video alignment. Specifically, the method uses frame differences to generate dynamic region masks, which are input into Alpha-CLIP as an additional Alpha channel. This proactively guides the model to focus on semantically critical dynamic regions while suppressing static background redundancy. Experiments demonstrate that frame difference-guided video semantic encoding can effectively balance retrieval efficiency and accuracy.
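The frame-difference step can be pictured as follows: pixels whose intensity changes noticeably between consecutive frames are marked as dynamic, and the resulting mask is attached to the frame as a fourth channel. The sketch assumes grayscale intensities for the difference, a binary mask, and a hand-picked threshold; the abstract does not say how FDA-CLIP computes or post-processes its masks, so all three are illustrative choices.

```python
def frame_difference_mask(prev_frame, next_frame, threshold):
    """Binary mask: 1.0 where the absolute intensity change exceeds the threshold."""
    return [
        [1.0 if abs(after - before) > threshold else 0.0
         for before, after in zip(prev_row, next_row)]
        for prev_row, next_row in zip(prev_frame, next_frame)
    ]

def attach_alpha(rgb_frame, mask):
    """Append the mask value to every (r, g, b) pixel, giving (r, g, b, alpha)."""
    return [
        [(r, g, b, alpha) for (r, g, b), alpha in zip(pixel_row, mask_row)]
        for pixel_row, mask_row in zip(rgb_frame, mask)
    ]
```

Static background pixels end up with alpha 0.0, which is the signal an alpha-aware encoder can use to down-weight them.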

[CV-201] It Takes Two to Tango: Two Parallel Samplers Improve Quality in Diffusion Models for Limited Steps

Link: https://arxiv.org/abs/2510.21802
Authors: Pedro Cisneros-Velarde
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

[CV-202] Morphology-Aware KOA Classification: Integrating Graph Priors with Vision Models ICASSP2026

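Quick read: This paper addresses the difficulty of diagnosing Knee Osteoarthritis (KOA) from radiographs, in particular because standard deep learning models struggle to capture subtle morphological features. The key solution is a new multimodal framework that combines anatomical structure with radiographic features: a morphological graph representation is built from Segment Anything Model (SAM) segmentations and fused with a vision encoder, and mutual information maximization enforces alignment between the geometry-informed graph embeddings and the radiographic features. This introduces explicit morphological priors, strengthens the model's inductive bias, and significantly improves classification accuracy.

Link: https://arxiv.org/abs/2510.21801
Authors: Marouane Tliba,Mohamed Amine Kerkouri,Yassine Nasser,Nour Aburaed,Aladine Chetouani,Ulas Bagci,Rachid Jennane
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: Submitted to ICASSP 2026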

Abstract:Knee osteoarthritis (KOA) diagnosis from radiographs remains challenging due to the subtle morphological details that standard deep learning models struggle to capture effectively. We propose a novel multimodal framework that combines anatomical structure with radiographic features by integrating a morphological graph representation - derived from Segment Anything Model (SAM) segmentations - with a vision encoder. Our approach enforces alignment between geometry-informed graph embeddings and radiographic features through mutual information maximization, significantly improving KOA classification accuracy. By constructing graphs from anatomical features, we introduce explicit morphological priors that mirror clinical assessment criteria, enriching the feature space and enhancing the model’s inductive bias. Experiments on the Osteoarthritis Initiative dataset demonstrate that our approach surpasses single-modality baselines by up to 10% in accuracy (reaching nearly 80%), while outperforming existing state-of-the-art methods by 8% in accuracy and 11% in F1 score. These results underscore the critical importance of incorporating anatomical structure into radiographic analysis for accurate KOA severity grading.

[CV-203] AI-Boosted Video Annotation: Assessing the Process Enhancement

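Quick read: This paper addresses the low efficiency, high cost, and poor consistency of manual video annotation. The core solution introduces zero-shot pre-annotations driven by Generative AI within a Human-in-the-Loop framework, implemented as a single-iteration annotation workflow on the Label Studio platform. The key is to let AI produce the initial annotations and so reduce the manual workload. The empirical evaluation reports a 35% reduction in annotation time for 70% of the annotators at similar annotation quality, along with annotations that are more coherent among annotators and better match the natural clustering of the video frames.

Link: https://arxiv.org/abs/2510.21798
Authors: Juan Gutiérrez,Ángel Mora,Pablo Regodón,Silvia Rodriguez,José Luis Blanco
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Comments: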

Abstract:We explore the enhancement of Human-in-the-Loop video annotation by integrating automatic capabilities to ease the task for annotators and assess their performance. The research delves into the practical implications of the annotation processes, the integration of AI components, and the evaluation of its outcomes. We analyze their impact on efficiency, accuracy, and overall annotation quality. Focusing on the Human-in-the-Loop for video annotation tasks, we implemented a single-iteration scheme using Label Studio and AI-powered zero-shot pre-annotations. Using this framework, we designed a test based on the annotation of the UCF-Crime dataset to discriminate between normal and abnormal activities in video footage. Our results evidence how automatic AI-based pre-annotation can streamline the video annotation workflow, empowering human annotators and optimizing the overall pipeline. Using the pre-annotated data, we observed a 35% reduction in the annotation time for 70% of the annotators with similar quality annotations, compared to the traditional manual annotation task. Results are consistent with asset duration and complexity. We also observed that while annotators rapidly learned to use the tool, the produced annotations are more coherent among annotators and better match the natural clustering of the video frames.

[CV-204] Xihe: Scalable Zero-Shot Time Series Learner Via Hierarchical Interleaved Block Attention

Link: https://arxiv.org/abs/2510.21795
Authors: Yinbo Sun,Yuchen Fang,Zhibo Zhu,Jia Li,Yu Liu,Qiwen Deng,Jun Zhou,Hang Yu,Xingyu Lu,Lintao Ma
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

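[CV-205] Token-Level Inference-Time Alignment for Vision-Language Models

Quick read: This paper addresses the tendency of Vision-Language Models (VLMs) to hallucinate, that is, to produce text that is inconsistent with the input image. Existing alignment methods usually rely on expensive fine-tuning or on sequence-level inference strategies that give only coarse, delayed feedback, which makes efficient fine-grained correction difficult. The key solution is TITA (Token-level Inference-Time Alignment), a lightweight framework that freezes the base VLM and trains a reward model to approximate its probability distribution. At inference time, implicit preference signals are extracted as log-probability ratios between the reward model and the target VLM, yielding dense autoregressive feedback. This can be viewed as an inference-time variant of Direct Preference Optimization (DPO): it provides token-level corrective signals without retraining the backbone, improving multimodal understanding and reducing hallucination.

Link: https://arxiv.org/abs/2510.21794
Authors: Kejia Chen,Jiawen Zhang,Jiacong Hu,Kewei Gao,Jian Lou,Zunlei Feng,Mingli Song
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: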

Abstract:Vision-Language Models (VLMs) have become essential backbones of modern multimodal intelligence, yet their outputs remain prone to hallucination-plausible text misaligned with visual inputs. Existing alignment approaches often rely on expensive fine-tuning with annotated preference data or sequence-level inference strategies that provide only coarse, delayed feedback. To overcome these limitations, we present TITA (Token-level Inference-Time Alignment), a lightweight framework that freezes the base VLM and instead trains a reward model to approximate its distribution. During inference, implicit preference signals are extracted as log-probability ratios between the reward model and the target VLM, yielding dense autoregressive feedback. This formulation can be viewed as an inference-time variant of Direct Preference Optimization (DPO), providing token-level corrective signals without retraining the backbone. Extensive evaluations on LLaVA-1.5-7B and 13B show consistent gains across 12 benchmarks, with improvements of 8.6% on MMVet and 6.7% on POPE, indicating stronger general understanding and reduced hallucinations. Additional experiments on Qwen2.5-VL-7B and DeepSeek-VL2-27.5B show comparable gains, especially in hallucination reduction and VQA accuracy, while incurring negligible inference overhead.
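A token-level signal of the kind described here can be sketched as a per-token log-probability ratio that shifts the frozen model's next-token scores at each decoding step. The combination rule below (base log-probability plus a `beta`-scaled ratio) and the `beta` parameter are assumptions made for illustration; the abstract states only that the signal is a log-probability ratio between the reward model and the target VLM.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]

def guided_next_token(base_logits, reward_logits, beta):
    """Pick the next token after shifting base scores by the log-probability ratio.

    base_logits come from the frozen VLM, reward_logits from the reward model,
    both over the same vocabulary at the current decoding step.
    """
    base_logp = log_softmax(base_logits)
    reward_logp = log_softmax(reward_logits)
    # Token-level preference signal: log p_reward(token) - log p_base(token).
    scores = [b + beta * (r - b) for b, r in zip(base_logp, reward_logp)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With `beta = 0` decoding reduces to greedy decoding from the frozen model; larger values give the reward model more influence on each token.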
zh
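
The log-probability-ratio idea in the abstract above can be illustrated with a toy sketch. It is not the authors' implementation: the function names, the `weight` knob, and the way the ratio is added back to the base distribution are all assumptions made for illustration, and the logits are invented numbers.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def guided_next_token_logprobs(base_logits, reward_logits, weight=1.0):
    """Shift the frozen base model's next-token distribution by a log-ratio signal.

    `weight` is a hypothetical strength knob, not a parameter from the paper.
    """
    base_lp = log_softmax(base_logits)
    reward_lp = log_softmax(reward_logits)
    # Token-level implicit preference: log p_reward(token) - log p_base(token).
    ratios = [r - b for r, b in zip(reward_lp, base_lp)]
    # Add the weighted ratio to the base log-probs and renormalise.
    return log_softmax([b + weight * s for b, s in zip(base_lp, ratios)])

# Toy vocabulary of three tokens (made-up logits).
base_logits = [2.0, 1.0, 0.0]
reward_logits = [0.0, 2.0, 0.0]
guided = guided_next_token_logprobs(base_logits, reward_logits, weight=1.0)
best_token = max(range(len(guided)), key=guided.__getitem__)
```

With `weight=0.0` the function returns the base distribution unchanged, so the knob interpolates between no guidance and full guidance in this toy setting.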

[CV-206] 2D_3D Feature Fusion via Cross-Modal Latent Synthesis and Attention Guided Restoration for Industrial Anomaly Detection

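[Quick read]: This paper addresses the lack of robust cross-modal fusion in industrial anomaly detection (IAD), in particular the difficulty of combining visual and geometric information from RGB images and point clouds. The key to the solution is an unsupervised framework, Multi-Modal Attention-Driven Fusion Restoration (MAFR), which builds a unified latent space with a shared fusion encoder and reconstructs each modality with attention-guided, modality-specific decoders. Anomalies are localised by the reconstruction error between input features and their restored counterparts. MAFR achieves state-of-the-art results on the MVTec 3D-AD and Eyecandies benchmarks (mean I-AUROC of 0.972 and 0.901, respectively).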

Link: https://arxiv.org/abs/2510.21793
Authors: Usman Ali, Ali Zia, Abdul Rehman, Umer Ramzan, Zohaib Hassan, Talha Sattar, Jing Wang, Wei Xiang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Comments: Accepted at 26th International Conference on Digital Image Computing: Techniques and Applications (DICTA 2025)

Abstract:Industrial anomaly detection (IAD) increasingly benefits from integrating 2D and 3D data, but robust cross-modal fusion remains challenging. We propose a novel unsupervised framework, Multi-Modal Attention-Driven Fusion Restoration (MAFR), which synthesises a unified latent space from RGB images and point clouds using a shared fusion encoder, followed by attention-guided, modality-specific decoders. Anomalies are localised by measuring reconstruction errors between input features and their restored counterparts. Evaluations on the MVTec 3D-AD and Eyecandies benchmarks demonstrate that MAFR achieves state-of-the-art results, with a mean I-AUROC of 0.972 and 0.901, respectively. The framework also exhibits strong performance in few-shot learning settings, and ablation studies confirm the critical roles of the fusion architecture and composite loss. MAFR offers a principled approach for fusing visual and geometric information, advancing the robustness and accuracy of industrial anomaly detection. Code is available at this https URL
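
Localising anomalies by reconstruction error, as the abstract above describes, can be sketched as a per-location error map. This is a minimal illustration only: the squared L2 error and the max-pooled image score are assumptions, and the feature grids are toy values rather than MAFR outputs.

```python
def anomaly_map(features, restored):
    """Per-location squared L2 error between input and restored feature vectors.

    Both arguments are H x W grids (nested lists) of equal-length feature vectors.
    """
    return [
        [sum((a - b) ** 2 for a, b in zip(f, r)) for f, r in zip(f_row, r_row)]
        for f_row, r_row in zip(features, restored)
    ]

def image_score(error_map):
    """Image-level score as the maximum location error (an assumed pooling choice)."""
    return max(max(row) for row in error_map)

# Toy 2 x 2 grid with 2-dimensional features; one location is restored badly.
features = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
restored = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, -1.0], [0.0, 0.0]]]
errors = anomaly_map(features, restored)
```

Here only the lower-left location has a non-zero error, so it is the one a threshold on the map would flag.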

[CV-207] Exploring the design space of diffusion and flow models for data fusion

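[Quick read]: This paper addresses multi-source data fusion in satellite remote sensing, specifically how to fuse nighttime lights data from the Defense Meteorological Satellite Program's Operational Linescan System (DMSP-OLS) and the Visible Infrared Imaging Radiometer Suite (VIIRS) to improve spatial and temporal resolution. The key to the solution is a systematic exploration of the design space of diffusion and flow models for image-to-image generation, which finds that UNet-based diffusion models are best at preserving fine-grained spatial detail and producing high-fidelity fused images. The paper also gives guidance on choosing noise schedulers, weighing iterative solvers (faster inference) against discrete schedulers (higher-quality reconstruction), and explores quantization to reduce memory and compute cost, offering practical recommendations for data fusion in remote sensing.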

Link: https://arxiv.org/abs/2510.21791
Authors: Niraj Chaudhari, Manmeet Singh, Naveen Sudharsan, Amit Kumar Srivastava, Harsh Kamath, Dushyant Mahajan, Ayan Paul
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Instrumentation and Detectors (physics.ins-det)
Comments:

Abstract:Data fusion is an essential task in various domains, enabling the integration of multi-source information to enhance data quality and insights. One key application is in satellite remote sensing, where fusing multi-sensor observations can improve spatial and temporal resolution. In this study, we explore the design space of diffusion and flow models for data fusion, focusing on the integration of Defense Meteorological Satellite Program’s Operational Linescan System (DMSP-OLS) and Visible Infrared Imaging Radiometer Suite (VIIRS) nighttime lights data. Our approach leverages a diverse set of 2D image-to-image generative models, including UNET, diffusion, and flow modeling architectures. We evaluate the effectiveness of these architectures in satellite remote sensing data fusion, identifying diffusion models based on UNet as particularly adept at preserving fine-grained spatial details and generating high-fidelity fused images. We also provide guidance on the selection of noise schedulers in diffusion-based models, highlighting the trade-offs between iterative solvers for faster inference and discrete schedulers for higher-quality reconstructions. Additionally, we explore quantization techniques to optimize memory efficiency and computational cost without compromising performance. Our findings offer practical insights into selecting the most effective diffusion and flow model architectures for data fusion tasks, particularly in remote sensing applications, and provide recommendations for leveraging noise scheduling strategies to enhance fusion quality.

[CV-208] Mismatch reconstruction theory for unknown measurement matrix in imaging through multimode fiber bending

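[Quick read]: This paper addresses the failure of image reconstruction in multimode fiber imaging when the measurement matrix is unknown, for example when the system configuration is unknown or real-time alignment is difficult after arbitrary fiber bending. The key to the solution is a new mismatch reconstruction theory: the authors formulate a mismatch equation and design matched and calibration solution algorithms that construct a usable measurement matrix from the actual measurements, so that traditional reconstruction algorithms can recover the original image. Experiments show that at low noise levels the constructed matrix can serve as the matched pair in traditional reconstruction algorithms, and that the proposed algorithms have a degree of robustness to noise, computational precision, and orthogonality.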

Link: https://arxiv.org/abs/2510.21787
Authors: Le Yang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)
Comments:

Abstract:Multimode fiber imaging requires strict matching between the measurement values and the measurement matrix to achieve image reconstruction. However, in practical applications, the measurement matrix often cannot be obtained due to unknown system configuration or difficulty in real-time alignment after arbitrary fiber bending, resulting in the failure of traditional reconstruction algorithms. This paper presents a novel mismatch reconstruction theory for image reconstruction when the measurement matrix is unknown. We first propose a mismatch equation and design matched and calibration solution algorithms to construct a new measurement matrix. We also provide a detailed proof of these equations and algorithms in the appendix. The experimental results show that under low noise levels, the constructed matrix can be used as the matched pair in traditional reconstruction algorithms and reconstructs the original image successfully. We then analyze the impact of noise, computational precision, and orthogonality on reconstruction performance. The results show that the proposed algorithms have a certain degree of robustness. Finally, we discuss the limitations and potential applications of this theory. The code is available: this https URL.

[CV-209] EventFormer: A Node-graph Hierarchical Attention Transformer for Action-centric Video Event Prediction

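[Quick read]: This paper addresses event prediction in video. Human events are mostly recorded as videos rather than scripts, yet the vision community has done little work on the problem. The authors introduce a new task, AVEP (Action-centric Video Event Prediction), which differs from existing video prediction tasks in its more complex logic and richer semantic information. The key to the solution is twofold: a large structured dataset of about 35K annotated videos and more than 178K event video clips, and EventFormer, a video event prediction model based on node-graph hierarchical attention that captures both the relationships between events and their arguments and the coreferential relationships between arguments, and so models the fine-grained structure of video events. In experiments with several SOTA video prediction models and large vision-language models (LVLMs) on AVEP, EventFormer outperforms all of the video prediction models.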

Link: https://arxiv.org/abs/2510.21786
Authors: Qile Su, Shoutai Zhu, Shuai Zhang, Baoyu Liang, Chao Tong
Affiliations: Beihang University; University of Science and Technology Beijing
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Comments: 15 pages, 7 figures, 6 tables

Abstract:Script event induction, which aims to predict the subsequent event based on the context, is a challenging task in NLP, achieving remarkable success in practical applications. However, human events are mostly recorded and presented in the form of videos rather than scripts, yet there is a lack of related research in the realm of vision. To address this problem, we introduce AVEP (Action-centric Video Event Prediction), a task that distinguishes itself from existing video prediction tasks through its incorporation of more complex logic and richer semantic information. We present a large structured dataset, which consists of about 35K annotated videos and more than 178K event video clips, built upon existing video event datasets to support this task. The dataset offers more fine-grained annotations, where the atomic unit is represented as a multimodal event argument node, providing better structured representations of video events. Due to the complexity of event structures, traditional visual models that take patches or frames as input are not well-suited for AVEP. We propose EventFormer, a node-graph hierarchical attention based video event prediction model, which can capture both the relationships between events and their arguments and the coreferential relationships between arguments. We conducted experiments using several SOTA video prediction models as well as LVLMs on AVEP, demonstrating both the complexity of the task and the value of the dataset. Our approach outperforms all these video prediction models. We will release the dataset and code for replicating the experiments and annotations.

[CV-210] Multi-Agent Pose Uncertainty: A Differentiable Rendering Cramér-Rao Bound IROS2025 ICCV2025

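[Quick read]: This paper addresses the lack of rigorous uncertainty quantification for camera pose estimation under dense or learned models in computer vision and robotics. The key to the solution is to treat a differentiable renderer as the measurement function and linearize image formation with respect to a small pose perturbation on the manifold, which yields a closed-form lower bound on the covariance of camera pose estimates: a render-aware Cramér-Rao bound. The bound reduces to classical bundle-adjustment uncertainty, and extends naturally to multi-agent settings by fusing Fisher information across cameras, supporting downstream tasks such as cooperative perception.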

Link: https://arxiv.org/abs/2510.21785
Authors: Arun Muthukkumar
Affiliations: Illinois Mathematics and Science Academy
Categories: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Robotics (cs.RO)
Comments: 5 pages, 3 figures, 1 table. Presented at IEEE/CVF International Conference on Computer Vision (ICCV 2025) and IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)

Abstract:Pose estimation is essential for many applications within computer vision and robotics. Despite its uses, few works provide rigorous uncertainty quantification for poses under dense or learned models. We derive a closed-form lower bound on the covariance of camera pose estimates by treating a differentiable renderer as a measurement function. Linearizing image formation with respect to a small pose perturbation on the manifold yields a render-aware Cramér-Rao bound. Our approach reduces to classical bundle-adjustment uncertainty, ensuring continuity with vision theory. It also naturally extends to multi-agent settings by fusing Fisher information across cameras. Our statistical formulation has downstream applications for tasks such as cooperative perception and novel view synthesis without requiring explicit keypoint correspondences.
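
For orientation, the generic form of such a bound can be written out. The notation is an assumption, not taken from the paper: $h(\xi)$ is the rendered image as a function of a 6-dimensional pose perturbation $\xi$, $J = \partial h / \partial \xi$ is its Jacobian, the pixel noise is taken to be additive Gaussian with covariance $\Sigma$, the estimator $\hat{\xi}$ is unbiased, and the $K$ cameras make independent measurements of a shared parameter.

```latex
\mathcal{I}(\xi) = J^{\top} \Sigma^{-1} J,
\qquad
\operatorname{Cov}(\hat{\xi}) \succeq \mathcal{I}(\xi)^{-1},
\qquad
\mathcal{I}_{\mathrm{fused}} = \sum_{k=1}^{K} \mathcal{I}_{k} .
```

Under these assumptions, fusing cameras adds their Fisher information matrices, so the fused bound $\mathcal{I}_{\mathrm{fused}}^{-1}$ is never looser than the bound from any single camera.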

[CV-211] Noise Aggregation Analysis Driven by Small-Noise Injection: Efficient Membership Inference for Diffusion Models

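[Quick read]: This paper addresses the privacy risks raised by the widespread use of diffusion models, focusing on membership inference attacks (MIA), which try to determine whether a given sample was used in training. Existing attacks are limited in efficiency and applicability, particularly against large-scale text-to-image models such as Stable Diffusion. The paper proposes an efficient attack that injects slight noise into the image under test and judges membership from how tightly the noise predicted by the model aggregates around a certain time step of the diffusion process. The key observation is that predicted noise for training-set samples is more tightly aggregated, whereas for non-member samples it is more dispersed, which allows membership inference with fewer queries to the target model. Experiments show superior performance across multiple datasets, and better ASR and AUC than existing methods on large-scale text-to-image diffusion models, indicating that the method scales.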

Link: https://arxiv.org/abs/2510.21783
Authors: Guo Li, Yuyang Yu, Xuemiao Xu
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Comments:

Abstract:Diffusion models have demonstrated powerful performance in generating high-quality images. A typical example is a text-to-image generator like Stable Diffusion. However, their widespread use also poses potential privacy risks. A key concern is membership inference attacks, which attempt to determine whether a particular data sample was used in the model training process. We propose an efficient membership inference attack method against diffusion models. This method is based on the injection of slight noise and the evaluation of the aggregation degree of the noise distribution. The intuition is that the noise prediction patterns of diffusion models for training set samples and non-training set samples are distinguishable. In particular, we suppose that member images exhibit higher aggregation of predicted noise around a certain time step of the diffusion process. In contrast, the predicted noises of non-member images exhibit a more discrete characteristic around that time step. Compared with other existing methods, our proposed method requires fewer visits to the target diffusion model. We inject slight noise into the image under test and then determine its membership by analyzing the aggregation degree of the noise distribution predicted by the model. Empirical findings indicate that our method achieves superior performance across multiple datasets. At the same time, our method can also show better attack effects in ASR and AUC when facing large-scale text-to-image diffusion models, proving the scalability of our method.
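
The decision rule described above (tighter aggregation of predicted noise suggests a training member) can be sketched with a simple dispersion statistic. Everything here is an assumption for illustration: the paper's actual aggregation measure is not given in the abstract, the mean distance to the centroid is a stand-in, the threshold is arbitrary, and the vectors are toy data rather than outputs of a diffusion model.

```python
import math

def aggregation_spread(predicted_noises):
    """Mean Euclidean distance of predicted-noise vectors to their centroid.

    A smaller value means the predictions are more tightly aggregated.
    """
    n = len(predicted_noises)
    dim = len(predicted_noises[0])
    centroid = [sum(v[i] for v in predicted_noises) / n for i in range(dim)]
    return sum(math.dist(v, centroid) for v in predicted_noises) / n

def is_member(predicted_noises, threshold):
    """Flag a sample as a likely member when the spread is below a threshold."""
    return aggregation_spread(predicted_noises) < threshold

# Toy predictions: one tight group and one scattered group.
tight = [[0.1, 0.1], [0.1, 0.1], [0.1, 0.1]]
loose = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
```

In a real attack the threshold would have to be calibrated on samples with known membership, which this sketch does not attempt.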

[CV-212] Promptable Fire Segmentation: Unleashing SAM2s Potential for Real-Time Mobile Deployment with Strategic Bounding Box Guidance

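[Quick read]: This paper addresses the challenges of fire segmentation, namely flames' irregular boundaries, translucent edges, and highly variable intensities, and examines how effective the Segment Anything Model 2 (SAM2) is under mobile deployment constraints. The key to the solution is a systematic evaluation of four SAM2.1 variants and the mobile-oriented TinySAM and MobileSAM on three fire datasets, with a focus on how prompting strategies affect segmentation accuracy. Bounding box prompts consistently outperform automatic and single-point prompts, and the hybrid of a bounding box with multiple positive points (Box+MP) achieves the highest mean IoU (0.64) and Dice coefficient (0.75) on the Khan dataset, giving practical guidance for deploying promptable segmentation in edge-based fire monitoring systems.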

Link: https://arxiv.org/abs/2510.21782
Authors: Emmanuel U. Ugwu, Zhang Xinming
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted for presentation at the 9th International Conference on Image and Graphics Processing (ICIGP 2026), to be held in Wuhan, China, January 16-18, 2026 (publication forthcoming). 6 pages, 3 figures, 3 tables

Abstract:Fire segmentation remains a critical challenge in computer vision due to flames’ irregular boundaries, translucent edges, and highly variable intensities. While the Segment Anything Models (SAM and SAM2) have demonstrated impressive cross-domain generalization capabilities, their effectiveness in fire segmentation – particularly under mobile deployment constraints – remains largely unexplored. This paper presents the first comprehensive evaluation of SAM2 variants for fire segmentation, focusing on bounding box prompting strategies to enhance deployment feasibility. We systematically evaluate four SAM2.1 variants (tiny, small, base_plus, large) alongside mobile-oriented variants (TinySAM, MobileSAM) across three fire datasets using multiple prompting strategies: automatic, single positive point (SP), single positive point + single negative point (SP+SN), multiple positive points (MP), bounding box (Box), and hybrid variants (Box+SP and Box+MP). Our experimental results demonstrate that bounding box prompts consistently outperform automatic and single point-based approaches, with Box+MP achieving the highest mean IoU (0.64) and Dice coefficient (0.75) on the Khan dataset. Lightweight variants such as TinySAM and MobileSAM further reduce memory and computational costs, making them more suitable for latency-tolerant edge scenarios. Overall, this work provides critical insights for deploying promptable segmentation models in fire monitoring systems and establishes benchmarks for future research in domain-specific SAM applications. Code is available at: this https URL
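
The two metrics reported above, IoU and the Dice coefficient, have standard definitions for binary masks. The sketch below computes both for one pair of masks; the mask values are toy data, and treating two empty masks as perfect agreement is a convention chosen here, not something stated in the paper.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two equally sized binary masks (nested lists of 0/1)."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    pred_area = sum(map(sum, pred))
    target_area = sum(map(sum, target))
    union = pred_area + target_area - inter
    if union == 0:
        # Both masks are empty: treated as perfect agreement (a convention).
        return 1.0, 1.0
    return inter / union, 2 * inter / (pred_area + target_area)

# Toy 2 x 3 masks: 3 predicted pixels, 3 ground-truth pixels, 2 in common.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
iou, dice = iou_and_dice(pred, target)
```

A mean IoU such as the 0.64 quoted above would be the average of the per-image IoU over a dataset.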

[CV-213] EdgeSync: Accelerating Edge-Model Updates for Data Drift through Adaptive Continuous Learning

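[Quick read]: This paper addresses the loss of accuracy that edge models in real-time video analytics suffer when the data distribution shifts over time (for example with lighting or weather), and two challenges of existing cloud-assisted retraining: retraining is compute-intensive, which delays model updates, and the new model may not match the data distribution of the current video stream. The key to the solution is EdgeSync, which filters training samples using both timeliness and inference results, so that samples are more relevant to the current video content and update delays shrink. It also adds a dynamic training management module that optimises the timing and sequencing of model updates. EdgeSync improves accuracy by approximately 3.4% over existing methods and by about 10% over traditional approaches.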

Link: https://arxiv.org/abs/2510.21781
Authors: Runchu Donga, Peng Zhao, Guiqin Wang, Nan Qi, Jie Lin
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

Abstract:Real-time video analytics systems typically deploy lightweight models on edge devices to reduce latency. However, the distribution of data features may change over time due to various factors such as changing lighting and weather conditions, leading to decreased model accuracy. Recent frameworks try to address this issue by leveraging remote servers to continuously train and adapt lightweight edge models using more complex models in the cloud. Despite these advancements, existing methods face two key challenges: first, the retraining process is compute-intensive, causing significant delays in model updates; second, the new model may not align well with the evolving data distribution of the current video stream. To address these challenges, we introduce EdgeSync, an efficient edge-model updating approach that enhances sample filtering by incorporating timeliness and inference results, thus ensuring training samples are more relevant to the current video content while reducing update delays. Additionally, EdgeSync features a dynamic training management module that optimizes the timing and sequencing of model updates to improve their timeliness. Evaluations on diverse and complex real-world datasets demonstrate that EdgeSync improves accuracy by approximately 3.4% compared to existing methods and by about 10% compared to traditional approaches.

[CV-214] Bridging Accuracy and Interpretability: Deep Learning with XAI for Breast Cancer Detection

Link: https://arxiv.org/abs/2510.21780
Authors: Bishal Chhetri, B.V. Rathish Kumar
Affiliations: Indian Institute of Technology Kanpur
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 15 pages, 14 figures

[CV-215] Ageing Drift in Binary Face Templates: A Bits-per-Decade Analysis

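[Quick read]: This paper addresses the longitudinal stability of face templates, that is, how an individual's biometric representation changes with age (ageing drift), and quantifies the effect on binary templates. The key to the solution is to compress float embeddings from a modern face CNN into 64- and 128-bit binary codes using PCA followed by iterative quantization (PCA-ITQ). For each of the 566 identities in AgeDB with at least three distinct ages, all genuine pairs are formed and a per-identity linear model of Hamming distance versus absolute age gap is fitted, so that drift is measured directly in bits per decade. The median slope is 1.357 bits per decade for 64-bit templates and 2.571 for 128-bit templates, so shorter codes are more age-stable at a fixed decision threshold. The paper links these slopes to deployment performance by reporting the equal error rate (EER) and the true positive rate (TPR) at a false accept rate (FAR) of 1% in three age bins, and suggests mitigations for smart-card and match-on-card scenarios, such as periodic re-enrolment and targeted parity on empirically unstable bit positions.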

Link: https://arxiv.org/abs/2510.21778
Authors: Abdelilah Ganmati, Karim Afdel, Lahcen Koutti
Affiliations: Ibn Zohr University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 9 pages, 3 figures, 2 tables

Abstract:We study the longitudinal stability of compact binary face templates and quantify ageing drift directly in bits per decade. Float embeddings from a modern face CNN are compressed with PCA-ITQ into 64- and 128-bit codes. For each identity in AgeDB with at least three distinct ages, we form all genuine pairs and fit a per-identity linear model of Hamming distance versus absolute age gap. Across 566 identities, the median slope is 1.357 bits per decade for 64-bit templates and 2.571 bits per decade for 128-bit templates, with tight non-parametric 95 percent bootstrap confidence intervals. The distributions are predominantly positive, indicating a small but systematic increase in intra-class distance over time. Because drift scales with code length, shorter codes are inherently more age-stable at a fixed decision threshold. We connect these slopes to operating characteristics by reporting EER and TPR at FAR = 1 percent in three age bins. We discuss implications for smart-card and match-on-card deployments, including simple mitigations such as periodic re-enrolment and targeted parity on empirically unstable bit positions. Code and CSV artifacts are provided to support reproducibility.
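
The per-identity fit described above can be sketched in a few lines. This is an illustration under assumptions: templates are represented as Python integers, the fit is ordinary least squares with an intercept (the abstract only says "linear model"), and the ages and codes are invented toy values, not AgeDB data.

```python
from itertools import combinations

def hamming(a, b):
    """Number of differing bits between two binary templates stored as ints."""
    return bin(a ^ b).count("1")

def drift_bits_per_decade(samples):
    """Least-squares slope of Hamming distance versus age gap in decades.

    `samples` is a list of (age_in_years, template_as_int) for one identity
    with at least three distinct ages.
    """
    pairs = [(abs(a1 - a2) / 10.0, hamming(c1, c2))
             for (a1, c1), (a2, c2) in combinations(samples, 2)]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx

# Toy identity: one extra bit flips every ten years.
samples = [(20, 0b0000), (30, 0b0001), (40, 0b0011)]
slope = drift_bits_per_decade(samples)
```

The median of such slopes over all identities is the kind of summary statistic the abstract reports (1.357 and 2.571 bits per decade).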

[CV-216] Face-MakeUpV2: Facial Consistency Learning for Controllable Text-to-Image Generation

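[Quick read]: This paper addresses two problems of current text-to-image models in facial image generation: facial attribute leakage when responding to local semantic instructions, and difficulty preserving the face ID and physical characteristics of the reference image. The solution has three parts. First, a large-scale dataset, FaceCaptionMask-1M, of approximately one million image-text-mask pairs provides precise spatial supervision for local semantic instructions. Second, on top of a general pretrained text-to-image backbone, two complementary facial information injection channels are introduced: a 3D facial rendering channel that incorporates the physical characteristics of the reference image, and a global facial feature channel that preserves identity information. Third, two optimisation objectives, semantic alignment in the model's embedding space and a perceptual loss on facial images, jointly mitigate attribute leakage and preserve ID consistency.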

Link: https://arxiv.org/abs/2510.21775
Authors: Dawei Dai, Yinxiu Zhou, Chenghang Li, Guolai Jiang, Chengfang Zhang
Affiliations: Chongqing University of Posts and Telecommunications; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; Intelligent Policing Key Laboratory of Sichuan Province, Sichuan Police College
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Comments:

Abstract:In facial image generation, current text-to-image models often suffer from facial attribute leakage and insufficient physical consistency when responding to local semantic instructions. In this study, we propose Face-MakeUpV2, a facial image generation model that aims to maintain the consistency of face ID and physical characteristics with the reference image. First, we constructed a large-scale dataset FaceCaptionMask-1M comprising approximately one million image-text-masks pairs that provide precise spatial supervision for the local semantic instructions. Second, we employed a general text-to-image pretrained model as the backbone and introduced two complementary facial information injection channels: a 3D facial rendering channel to incorporate the physical characteristics of the image and a global facial feature channel. Third, we formulated two optimization objectives for the supervised learning of our model: semantic alignment in the model’s embedding space to mitigate the attribute leakage problem and perceptual loss on facial images to preserve ID consistency. Extensive experiments demonstrated that our Face-MakeUpV2 achieves best overall performance in terms of preserving face ID and maintaining physical consistency of the reference images. These results highlight the practical potential of Face-MakeUpV2 for reliable and controllable facial editing in diverse applications.

[CV-217] OCR-Quality: A Human-Annotated Dataset for OCR Quality Assessment

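[Quick read]: This paper addresses the lack of a reliable benchmark dataset for assessing optical character recognition (OCR) quality in real-world scenarios. Existing methods evaluate complex documents (multilingual documents, academic papers, e-books, and so on) poorly, which hampers the development of OCR verification systems. The key to the solution is OCR-Quality, a diverse human-annotated dataset of 1,000 PDF pages converted to PNG images at 300 DPI. Each page is processed by state-of-the-art vision-language models (VLMs) and then manually scored on a 4-level scale (1: Excellent to 4: Poor). The dataset also includes detailed source information and annotation guidelines, providing a standardised, reproducible benchmark for OCR quality assessment.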

Link: https://arxiv.org/abs/2510.21774
Authors: Yulong Zhang
Affiliations: Beijing University of Posts and Telecommunications
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

Abstract:We present OCR-Quality, a comprehensive human-annotated dataset designed for evaluating and developing OCR quality assessment methods. The dataset consists of 1,000 PDF pages converted to PNG images at 300 DPI, sampled from diverse real-world scenarios, including academic papers, textbooks, e-books, and multilingual documents. Each document has been processed using state-of-the-art Vision-Language Models (VLMs) and manually annotated with quality scores using a 4-level scoring system (1: Excellent, 2: Good, 3: Fair, 4: Poor). The dataset includes detailed source information, annotation guidelines, and representative cases across various difficulty levels. OCR-Quality addresses the critical need for reliable OCR quality assessment in real-world applications and provides a valuable benchmark for training and evaluating OCR verification systems. The dataset is publicly available at this https URL .

[CV-218] H2OFlow: Grounding Human-Object Affordances with 3D Generative Models and Dense Diffused Flows

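[Quick read]: This paper addresses two problems in current methods for understanding 3D human-object interaction (HOI) affordances: dependence on hand-labeled data, which is costly and slow to produce, and a focus on contact-based analysis that neglects orientation and spatial occupancy. The key to the solution is H2OFlow, a framework that learns a dense 3D-flow-based representation through a dense diffusion process on point clouds, using only synthetic data generated from 3D generative models. Without human annotations, it discovers rich 3D affordances covering contact, orientation, and spatial occupancy, and generalises effectively to real-world objects.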

Link: https://arxiv.org/abs/2510.21769
Authors: Harry Zhang, Luca Carlone
Affiliations: MIT
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Understanding how humans interact with the surrounding environment, and specifically reasoning about object interactions and affordances, is a critical challenge in computer vision, robotics, and AI. Current approaches often depend on labor-intensive, hand-labeled datasets capturing real-world or simulated human-object interaction (HOI) tasks, which are costly and time-consuming to produce. Furthermore, most existing methods for 3D affordance understanding are limited to contact-based analysis, neglecting other essential aspects of human-object interactions, such as orientation (e.g., humans might have a preferential orientation with respect to certain objects, such as a TV) and spatial occupancy (e.g., humans are more likely to occupy certain regions around an object, like the front of a microwave rather than its back). To address these limitations, we introduce H2OFlow, a novel framework that comprehensively learns 3D HOI affordances, encompassing contact, orientation, and spatial occupancy, using only synthetic data generated from 3D generative models. H2OFlow employs a dense 3D-flow-based representation, learned through a dense diffusion process operating on point clouds. This learned flow enables the discovery of rich 3D affordances without the need for human annotations. Through extensive quantitative and qualitative evaluations, we demonstrate that H2OFlow generalizes effectively to real-world objects and surpasses prior methods that rely on manual annotations or mesh-based representations in modeling 3D affordance.

[CV-219] Proportion and Perspective Control for Flow-Based Image Generation

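[Quick read]: This paper addresses the limited control that current text-to-image diffusion models offer over the spatial and geometric structure of the output. The key to the solution is two specialised ControlNets: a proportion ControlNet, which uses bounding boxes to dictate the position and scale of objects, and a perspective ControlNet, which uses vanishing lines to control the 3D geometry of the scene. Training is supported by data pipelines that use vision-language models for annotation and specialised algorithms to synthesise the conditioning images. Both modules provide effective control but show limitations under complex constraints.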

Link: https://arxiv.org/abs/2510.21763
Authors: Julien Boudier, Hugo Caselles-Dupré
Affiliations: Obvious Research
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Technical report after open-source release

Abstract:While modern text-to-image diffusion models generate high-fidelity images, they offer limited control over the spatial and geometric structure of the output. To address this, we introduce and evaluate two ControlNets specialized for artistic control: (1) a proportion ControlNet that uses bounding boxes to dictate the position and scale of objects, and (2) a perspective ControlNet that employs vanishing lines to control the 3D geometry of the scene. We support the training of these modules with data pipelines that leverage vision-language models for annotation and specialized algorithms for conditioning image synthesis. Our experiments demonstrate that both modules provide effective control but exhibit limitations with complex constraints. Both models are released on HuggingFace: this https URL

[CV-220] J-ORA: A Framework and Multimodal Dataset for Japanese Object Identification Reference Action Prediction in Robot Perception IROS2025

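[Quick read]: This paper addresses a bottleneck in robot perception caused by the lack of fine-grained object attribute annotations: in Japanese human-robot dialogue scenarios in particular, existing datasets do not adequately support object identification, reference resolution, and next-action prediction. The key to the solution is J-ORA, a new multimodal dataset annotated with a comprehensive attribute template (category, color, shape, size, material, and spatial relations). Experiments with Vision Language Models (VLMs) show that adding detailed object attributes substantially improves multimodal perception performance, but a gap remains between proprietary and open-source VLMs, and models differ in how well they understand object functionality and contextual relationships. The findings underline the importance of rich, context-sensitive attribute annotations for robot perception in dynamic environments.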

Link: https://arxiv.org/abs/2510.21761
Authors: Jesse Atuhurra, Hidetaka Kamigaito, Taro Watanabe, Koichiro Yoshino
Affiliations: NAIST; RIKEN
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted to IROS2025

Abstract:We introduce J-ORA, a novel multimodal dataset that bridges the gap in robot perception by providing detailed object attribute annotations within Japanese human-robot dialogue scenarios. J-ORA is designed to support three critical perception tasks, object identification, reference resolution, and next-action prediction, by leveraging a comprehensive template of attributes (e.g., category, color, shape, size, material, and spatial relations). Extensive evaluations with both proprietary and open-source Vision Language Models (VLMs) reveal that incorporating detailed object attributes substantially improves multimodal perception performance compared to without object attributes. Despite the improvement, we find that there still exists a gap between proprietary and open-source VLMs. In addition, our analysis of object affordances demonstrates varying abilities in understanding object functionality and contextual relationships across different VLMs. These findings underscore the importance of rich, context-sensitive attribute annotations in advancing robot perception in dynamic environments. See project page at this https URL.

[CV-221] Agro-Consensus: Semantic Self-Consistency in Vision-Language Models for Crop Disease Management in Developing Countries

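[Quick read]: This paper addresses crop disease management in developing countries, where scarce expert resources, unreliable internet connectivity, and cost constraints hinder the deployment of large-scale AI systems, and focuses on making vision-language model (VLM) captions of crop images more reliable. The key to the solution is a low-cost self-consistency framework: a lightweight (80MB) pre-trained embedding model semantically clusters multiple candidate generations, and a cosine-similarity-based consensus selects the most coherent caption, containing a diagnosis, symptoms, analysis, treatment, and prevention recommendations. A human-in-the-loop (HITL) step, in which the user confirms the crop type, filters out erroneous generations and so improves the input to the consensus mechanism. With a fine-tuned 3B-parameter PaliGemma model on PlantVillage, single-cluster consensus reaches a peak accuracy of 83.1% with 10 candidate generations, against 77.5% for greedy decoding.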

Link: https://arxiv.org/abs/2510.21757
Authors: Mihir Gupta, Pratik Desai, Ross Greer
Affiliations: The Harker School; Kissan.ai; University of California, Merced
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Agricultural disease management in developing countries such as India, Kenya, and Nigeria faces significant challenges due to limited access to expert plant pathologists, unreliable internet connectivity, and cost constraints that hinder the deployment of large-scale AI systems. This work introduces a cost-effective self-consistency framework to improve vision-language model (VLM) reliability for agricultural image captioning. The proposed method employs semantic clustering, using a lightweight (80MB) pre-trained embedding model to group multiple candidate responses. It then selects the most coherent caption – containing a diagnosis, symptoms, analysis, treatment, and prevention recommendations – through a cosine similarity-based consensus. A practical human-in-the-loop (HITL) component is incorporated, wherein user confirmation of the crop type filters erroneous generations, ensuring higher-quality input for the consensus mechanism. Applied to the publicly available PlantVillage dataset using a fine-tuned 3B-parameter PaliGemma model, our framework demonstrates improvements over standard decoding methods. Evaluated on 800 crop disease images with up to 21 generations per image, our single-cluster consensus method achieves a peak accuracy of 83.1% with 10 candidate generations, compared to the 77.5% baseline accuracy of greedy decoding. The framework’s effectiveness is further demonstrated when considering multiple clusters; accuracy rises to 94.0% when a correct response is found within any of the top four candidate clusters, outperforming the 88.5% achieved by a top-4 selection from the baseline.
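
The cosine-similarity consensus step can be illustrated with a simplified sketch. It skips the clustering stage and treats all candidates as one cluster, which is a simplification of what the abstract describes; the selection rule (highest mean similarity to the other candidates) is an assumption, and the 2-dimensional embeddings are toy values, not outputs of the 80MB embedding model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def consensus_index(embeddings):
    """Index of the candidate with the highest mean cosine similarity to the others."""
    def mean_sim(i):
        others = [cosine(embeddings[i], e) for j, e in enumerate(embeddings) if j != i]
        return sum(others) / len(others)
    return max(range(len(embeddings)), key=mean_sim)

# Toy embeddings: three similar captions and one outlier (index 3).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
chosen = consensus_index(embeddings)
```

The outlier has the lowest mean similarity, so it is never chosen; filtering candidates by a user-confirmed crop type, as in the HITL step, would happen before this selection.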

[CV-222] A Robotic Stirring Method with Trajectory Optimization and Adaptive Speed Control for Accurate Pest Counting in Water Traps ICRA2026

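[Quick read]: This paper addresses the poor accuracy of existing image-based pest counting methods under occlusion, particularly in water traps, where clustered pests hide some individuals from view. The key to the solution is a robotic-arm stirring method with trajectory optimisation and adaptive speed control. Six representative stirring trajectories (circle, square, triangle, spiral, four small circles, and random lines) are compared by overall average counting error and counting confidence across pest density scenarios to select the optimal one. A closed-loop control system then uses the change in counting confidence between consecutive frames as feedback to adjust the stirring speed, making occluded pests visible for detection and counting.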

Link: https://arxiv.org/abs/2510.21732
Authors: Xumin Gao, Mark Stevens, Grzegorz Cielniak
Affiliations: Unknown
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Comments: This paper has been submitted to ICRA 2026 and is currently under review

Abstract: Accurate monitoring of pest population dynamics is crucial for informed decision-making in precision agriculture. Currently, mainstream image-based pest counting methods primarily rely on image processing combined with machine learning or deep learning. However, these methods have limitations and struggle to handle pest occlusion. To address this issue, this paper proposes a robotic stirring method with trajectory optimization and adaptive speed control for accurate pest counting in water traps. First, we developed an automated stirring system for pest counting in yellow water traps based on a robotic arm. Stirring alters the distribution of pests in the yellow water trap, making some of the occluded individuals visible for detection and counting. Then, we investigated the impact of different stirring trajectories on pest counting performance and selected the optimal trajectory. Specifically, we designed six representative stirring trajectories for the robotic arm: circle, square, triangle, spiral, four small circles, and random lines. By comparing the overall average counting error and counting confidence of these trajectories across various pest density scenarios, we determined the optimal trajectory. Finally, we proposed a counting confidence-driven closed-loop control system to achieve adaptive-speed stirring. It uses changes in pest counting confidence between consecutive frames as feedback to adjust the stirring speed. To the best of our knowledge, this is the first study dedicated to investigating the effects of different stirring trajectories on object counting in a dynamic liquid environment and to implementing adaptive-speed stirring for this type of task. Experimental results show …

[CV-223] Revising Second Order Terms in Deep Animation Video Coding

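Quick read: This paper addresses the limited performance of the First Order Motion Model (FOMM) under strong head motion, particularly head rotation. The core limitation is that FOMM warps images using local affine (Jacobian) transformations, so it cannot accurately reconstruct video content when the head rotates substantially. The key to the solution is replacing the Jacobian transformations in the original model with a global rotation matrix, which models head rotation more effectively. In addition, state-of-the-art normalization techniques are applied to stabilize discriminator training and improve the visual quality of the generated video. Experiments show that the modification reduces the P-frame bitrate by 40% to 80% while maintaining high quality.

Link: https://arxiv.org/abs/2510.23561
Authors: Konstantin Schmidt, Thomas Richter
Affiliation: unknown
Categories: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Comments: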

Abstract: First Order Motion Model is a generative model that animates human heads based on very little motion information derived from keypoints. It is a promising solution for video communication because, first, it operates at a very low bitrate and, second, its computational complexity is moderate compared to other learning-based video codecs. However, it has strong limitations by design. Since it generates facial animations by warping source images, it fails to recreate videos with strong head movements. This work concentrates on one specific kind of head movement, namely head rotations. We show that replacing the Jacobian transformations in FOMM by a global rotation helps the system to perform better on items with head rotations while saving 40% to 80% of bitrate on P-frames. Moreover, we apply state-of-the-art normalization techniques to the discriminator to stabilize the adversarial training, which is essential for generating visually appealing videos. We evaluate the performance by the learned metrics LPIPS and DISTS to show the success of our optimizations.
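To make the Jacobian-versus-rotation trade-off concrete, the sketch below counts the side information each variant would need per frame and applies a single shared rotation to a point. It is an illustration only: the keypoint count of 10, the restriction to an in-plane (2D) rotation, and the helper names `rotation_matrix` and `apply` are assumptions, not details taken from the paper.

```python
import math

def rotation_matrix(theta):
    """Return the 2x2 in-plane rotation matrix for an angle in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply(matrix, point):
    """Multiply a 2x2 matrix by a 2-D point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y,
            matrix[1][0] * x + matrix[1][1] * y)

NUM_KEYPOINTS = 10  # assumed keypoint count, for illustration only

# Values to transmit per frame in addition to the keypoint coordinates:
jacobian_values = NUM_KEYPOINTS * 4  # one local 2x2 matrix per keypoint
rotation_values = 1                  # one angle shared by all keypoints

R = rotation_matrix(math.radians(90))
moved = apply(R, (1.0, 0.0))         # a quarter turn maps the x-axis onto the y-axis
print(jacobian_values, rotation_values, moved)
```

Under these assumptions a single angle replaces forty matrix entries, which gives an intuition for why fewer motion parameters can lower the P-frame bitrate; the actual savings reported in the abstract come from the authors' experiments, not from this count.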

[CV-224] USF-MAE: Ultrasound Self-Supervised Foundation Model with Masked Autoencoding

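Quick read: This paper addresses three challenges that ultrasound imaging faces in clinical use: high image noise, strong operator dependence, and a limited field of view, which together cause substantial inter-observer variability in diagnosis. Current deep learning methods are further constrained by the scarcity of high-quality labeled data and by the domain gap between general and ultrasound images, so models pretrained on non-medical data transfer poorly. The key to the solution is the Ultrasound Self-Supervised Foundation Model with Masked Autoencoding (USF-MAE), the first large-scale self-supervised foundation model pretrained only on ultrasound data. It is pretrained without labels on 370,000 2D and 3D ultrasound images from 46 open-source datasets (collectively termed OpenUS-46), using a Vision Transformer architecture that learns modality-specific representations by reconstructing masked image patches. Its performance is validated on three public downstream classification tasks (breast cancer, ovarian tumors, and gastrointestinal stromal tumors), indicating that the approach improves generalization across anatomical regions and approaches or even surpasses supervised pretrained models.

Link: https://arxiv.org/abs/2510.22990
Authors: Youssef Megahed, Robin Ducharme, Mark Walker, Steven Hawken, Adrian D. C. Chan
Affiliation: unknown
Categories: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: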

Abstract:Ultrasound imaging is one of the most widely used diagnostic modalities, offering real-time, radiation-free assessment across diverse clinical domains. However, interpretation of ultrasound images remains challenging due to high noise levels, operator dependence, and limited field of view, resulting in substantial inter-observer variability. Current Deep Learning approaches are hindered by the scarcity of large labeled datasets and the domain gap between general and sonographic images, which limits the transferability of models pretrained on non-medical data. To address these challenges, we introduce the Ultrasound Self-Supervised Foundation Model with Masked Autoencoding (USF-MAE), the first large-scale self-supervised MAE framework pretrained exclusively on ultrasound data. The model was pre-trained on 370,000 2D and 3D ultrasound images curated from 46 open-source datasets, collectively termed OpenUS-46, spanning over twenty anatomical regions. This curated dataset has been made publicly available to facilitate further research and reproducibility. Using a Vision Transformer encoder-decoder architecture, USF-MAE reconstructs masked image patches, enabling it to learn rich, modality-specific representations directly from unlabeled data. The pretrained encoder was fine-tuned on three public downstream classification benchmarks: BUS-BRA (breast cancer), MMOTU-2D (ovarian tumors), and GIST514-DB (gastrointestinal stromal tumors). Across all tasks, USF-MAE consistently outperformed conventional CNN and ViT baselines, achieving F1-scores of 81.6%, 79.6%, and 82.4%, respectively. Despite not using labels during pretraining, USF-MAE approached the performance of the supervised foundation model UltraSam on breast cancer classification and surpassed it on the other tasks, demonstrating strong cross-anatomical generalization.
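The masked-patch reconstruction idea can be illustrated with the random masking step alone. The sketch below splits a toy image into patches and hides a random subset; the 75% mask ratio, the 2x2 patch size, and the helper names `patchify` and `random_mask` are assumptions for illustration and are not taken from the USF-MAE paper.

```python
import random

def patchify(image, patch):
    """Split an HxW image (list of rows) into non-overlapping patch x patch tiles."""
    h, w = len(image), len(image[0])
    tiles = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tiles.append([row[c:c + patch] for row in image[r:r + patch]])
    return tiles

def random_mask(num_patches, mask_ratio, rng):
    """Return (visible_ids, masked_ids) for one random masking of the patches."""
    ids = list(range(num_patches))
    rng.shuffle(ids)
    num_masked = int(num_patches * mask_ratio)
    return sorted(ids[num_masked:]), sorted(ids[:num_masked])

# Toy 8x8 "image" split into sixteen 2x2 patches.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
tiles = patchify(image, 2)

# Assumed mask ratio of 0.75: the encoder would see only the visible patches,
# and the decoder would be trained to reconstruct the masked ones.
visible, masked = random_mask(len(tiles), 0.75, random.Random(0))
print(len(tiles), len(visible), len(masked))
```

In a full masked autoencoder the reconstruction loss is computed on the masked patches only, so the model has to infer hidden content from the visible context.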

[CV-225] Neural-HAR: A Dimension-Gated CNN Accelerator for Real-Time Radar Human Activity Recognition

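Quick read: This paper addresses the high computational complexity and power consumption that radar-based human activity recognition (HAR) faces when deployed on resource-constrained edge devices. Existing CNN/RNN methods usually have too many parameters and operations to meet real-time and energy-efficiency requirements, and even lightweight Vision Transformer (ViT) or State Space Model (SSM) variants often exceed the compute and memory budgets of practical hardware. The key to the solution is GateCNN, a dimension-gated convolutional neural network designed for edge deployment, which achieves efficient inference through two core designs. First, it embeds Doppler vectors to emphasize how frequency evolves over time. Second, it uses a dual-path gated convolution structure in which temporal gates modulate Doppler-aware content features, complemented by a residual path for stable training. The architecture needs only 2.7k parameters and 0.28M FLOPs per inference to reach 86.4% accuracy on the UoG2020 dataset, and an FPGA prototype on the Xilinx Zynq-7000 platform achieves 107.5 μs latency and 15 mW dynamic power, demonstrating real-time, low-power edge inference.

Link: https://arxiv.org/abs/2510.22772
Authors: Yizhuo Wu, Francesco Fioranelli, Chang Gao
Affiliation: Delft University of Technology
Categories: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV)
Comments: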

Abstract: Radar-based human activity recognition (HAR) is attractive for unobtrusive and privacy-preserving monitoring, yet many CNN/RNN solutions remain too heavy for edge deployment, and even lightweight ViT/SSM variants often exceed practical compute and memory budgets. We introduce Neural-HAR, a dimension-gated CNN accelerator tailored for real-time radar HAR on resource-constrained platforms. At its core is GateCNN, a parameter-efficient Doppler-temporal network that (i) embeds Doppler vectors to emphasize frequency evolution over time and (ii) applies dual-path gated convolutions that modulate Doppler-aware content features with temporal gates, complemented by a residual path for stable training. On the University of Glasgow UoG2020 continuous radar dataset, GateCNN attains 86.4% accuracy with only 2.7k parameters and 0.28M FLOPs per inference, comparable to CNN-BiGRU at a fraction of the complexity. Our FPGA prototype on Xilinx Zynq-7000 Z-7007S reaches 107.5 μs latency and 15 mW dynamic power using LUT-based ROM and distributed RAM only (zero DSP/BRAM), demonstrating real-time, energy-efficient edge inference. Code and HLS conversion scripts are available at this https URL.
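The gating idea in the abstract (a content path modulated by a gate path, plus a residual connection) can be sketched in one dimension. This is a generic gated convolution, not the GateCNN layer itself: the sigmoid gate, the odd-length kernels, the zero padding, and the function names are all assumptions made for illustration.

```python
import math

def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.
    Assumes an odd-length kernel."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_block(x, content_kernel, gate_kernel):
    """Content path scaled elementwise by a sigmoid gate, plus a residual path."""
    content = conv1d_same(x, content_kernel)
    gate = [sigmoid(g) for g in conv1d_same(x, gate_kernel)]
    return [c * g + r for c, g, r in zip(content, gate, x)]

# With an identity content kernel and an all-zero gate kernel, every gate is
# sigmoid(0) = 0.5, so the block returns 0.5 * x + x.
out = gated_block([1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
print(out)
```

The residual term lets the input pass through unchanged when the gate closes, which is one common reason such blocks train stably.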

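[CV-226] Understanding What Is Not Said: Referring Remote Sensing Image Segmentation with Scarce Expressions

Quick read: This paper addresses training efficiency and performance in Referring Remote Sensing Image Segmentation (RRSIS) when annotation is costly and high-quality referring expressions are scarce. The core challenge is that small, densely distributed objects and complex backgrounds in remote sensing images make precise referring-expression annotations hard to obtain. The key to the solution is a weakly supervised learning paradigm, Weakly Referring Expression Learning (WREL), which trains on abundant class names used as weakly referring expressions together with a small number of accurate referring expressions. It is supported by a theoretical analysis showing that mixed-annotation training admits a provable upper bound on the performance gap relative to fully annotated training. The authors further design the LRB-WREL framework, built on a Learnable Reference Bank (LRB), which enriches coarse class-name inputs with sample-specific prompt embeddings and combines them with a dynamically scheduled teacher-student optimization scheme to mitigate weak-supervision noise and improve cross-modal generalization and training stability.

Link: https://arxiv.org/abs/2510.22760
Authors: Kai Ye, Bowen Liu, Jianghang Lin, Jiayi Ji, Pingyang Dai, Liujuan Cao
Affiliation: Xiamen University; National University of Singapore
Categories: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Comments: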

Abstract:Referring Remote Sensing Image Segmentation (RRSIS) aims to segment instances in remote sensing images according to referring expressions. Unlike Referring Image Segmentation on general images, acquiring high-quality referring expressions in the remote sensing domain is particularly challenging due to the prevalence of small, densely distributed objects and complex backgrounds. This paper introduces a new learning paradigm, Weakly Referring Expression Learning (WREL) for RRSIS, which leverages abundant class names as weakly referring expressions together with a small set of accurate ones to enable efficient training under limited annotation conditions. Furthermore, we provide a theoretical analysis showing that mixed-referring training yields a provable upper bound on the performance gap relative to training with fully annotated referring expressions, thereby establishing the validity of this new setting. We also propose LRB-WREL, which integrates a Learnable Reference Bank (LRB) to refine weakly referring expressions through sample-specific prompt embeddings that enrich coarse class-name inputs. Combined with a teacher-student optimization framework using dynamically scheduled EMA updates, LRB-WREL stabilizes training and enhances cross-modal generalization under noisy weakly referring supervision. Extensive experiments on our newly constructed benchmark with varying weakly referring data ratios validate both the theoretical insights and the practical effectiveness of WREL and LRB-WREL, demonstrating that they can approach or even surpass models trained with fully annotated referring expressions.

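[CV-227] Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs

Quick read: This paper addresses the poorly understood internal dynamics of large language models (LLMs) fine-tuned for multimodal speech recognition, covering auditory speech recognition (ASR), visual speech recognition (VSR), and audio-visual speech recognition (AVSR), and in particular the causes of attention sinks and massive activations and their effect on model performance. First, a systematic analysis shows that attention sinks occur not only at the BOS token but also at intermediate tokens carrying little semantic information, and that the massive activations of these sink tokens originate in the MLP layers and correspond to fixed feature indices. Second, intermediate sink tokens have high cosine similarity to the BOS token, which amplifies attention allocation and activation strength. Building on this, the paper proposes a simple decorrelation loss that reduces the cosine similarity between BOS and other tokens, which suppresses intermediate sinks and massive activations while improving word error rate (WER) under high audio-visual feature downsampling and remaining stable at lower downsampling rates.

Link: https://arxiv.org/abs/2510.22603
Authors: Anand, Umberto Cappellazzo, Stavros Petridis, Maja Pantic
Affiliation: unknown
Categories: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
Comments: The code is available at this https URL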

Abstract:Large language models (LLMs) have recently advanced auditory speech recognition (ASR), visual speech recognition (VSR), and audio-visual speech recognition (AVSR). However, understanding of their internal dynamics under fine-tuning remains limited. In natural language processing, recent work has revealed attention sinks, tokens that attract disproportionately high attention, and associated massive activations in which some features of sink tokens exhibit huge activation in LLMs. In this work, we are the first to study these phenomena in multimodal speech recognition. Through a detailed analysis of audio-visual LLMs, we identify attention sinks and massive activations not only at the BOS token but also at intermediate low-semantic tokens across ASR, VSR, and AVSR. We show that massive activations originate in the MLP layers and correspond to fixed feature indices across all sink tokens. We further show that intermediate sink tokens exhibit high cosine similarity to the BOS token, thereby amplifying attention and activation. Building on these insights, we introduce a simple decorrelation loss that reduces cosine similarity between BOS and other tokens, effectively mitigating intermediate sinks and massive activations. Furthermore, our method improves word error rate (WER) under high audio-visual feature downsampling while remaining stable at lower downsampling rates.
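A decorrelation penalty of the kind described can be sketched as follows. The abstract does not give the exact formula, so the choice of the mean squared cosine similarity, the convention that row 0 holds the BOS hidden state, and the function names are assumptions for illustration.

```python
import math

def cosine(u, v, eps=1e-12):
    """Cosine similarity between two vectors, guarded against zero norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)

def decorrelation_loss(hidden):
    """Assumed form: mean squared cosine similarity between the BOS state
    (hidden[0]) and every other token state (hidden[1:])."""
    bos = hidden[0]
    sims = [cosine(bos, h) for h in hidden[1:]]
    return sum(s * s for s in sims) / len(sims)

# Token 1 is aligned with BOS (similarity 1), token 2 is orthogonal (similarity 0).
hidden = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
loss = decorrelation_loss(hidden)
print(loss)
```

In training, a term like this would be added to the task loss with a weighting coefficient, so that tokens resembling BOS are pushed away from it without changing the recognition objective.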

[CV-228] Learning Event-guided Exposure-agnostic Video Frame Interpolation via Adaptive Feature Blending BMVC2025

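Quick read: This paper addresses exposure-agnostic video frame interpolation (VFI), that is, recovering sharp, high-frame-rate video from blurry, low-frame-rate video captured under unknown and dynamic exposure conditions. Existing event-camera-based methods perform poorly on severely low-frame-rate blurry video, mainly because they lack effective temporal constraints. The key to the solution is two core components: Target-adaptive Event Sampling (TES) and Target-adaptive Importance Mapping (TIM). TES samples events around the target timestamp and the unknown exposure time to improve temporal alignment between events and blurry frames. TIM generates an importance map that combines temporal proximity and spatial relevance and guides the framework to blend consecutive features adaptively, so that temporally aligned features serve as the primary cue and spatially relevant features provide complementary support, improving interpolation quality.

Link: https://arxiv.org/abs/2510.22565
Authors: Junsik Jung, Yoonki Cho, Woo Jae Kim, Lin Wang, Sune-eui Yoon
Affiliation: unknown
Categories: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Comments: Accepted for BMVC2025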

Abstract:Exposure-agnostic video frame interpolation (VFI) is a challenging task that aims to recover sharp, high-frame-rate videos from blurry, low-frame-rate inputs captured under unknown and dynamic exposure conditions. Event cameras are sensors with high temporal resolution, making them especially advantageous for this task. However, existing event-guided methods struggle to produce satisfactory results on severely low-frame-rate blurry videos due to the lack of temporal constraints. In this paper, we introduce a novel event-guided framework for exposure-agnostic VFI, addressing this limitation through two key components: a Target-adaptive Event Sampling (TES) and a Target-adaptive Importance Mapping (TIM). Specifically, TES samples events around the target timestamp and the unknown exposure time to better align them with the corresponding blurry frames. TIM then generates an importance map that considers the temporal proximity and spatial relevance of consecutive features to the target. Guided by this map, our framework adaptively blends consecutive features, allowing temporally aligned features to serve as the primary cues while spatially relevant ones offer complementary support. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our approach in exposure-agnostic VFI scenarios.

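[CV-229] TraceTrans: Translation and Spatial Tracing for Surgical Prediction

Quick read: This paper addresses the structural inconsistencies and hallucinations that arise in medical settings when image-to-image translation models ignore spatial correspondences between source and target images, which undermines the reliability and interpretability of predictions. The key to the solution is TraceTrans, a deformable image translation framework whose dual decoders separately predict a spatial deformation field and synthesize the target image. The deformation field explicitly constrains the spatial consistency of the generated result, keeping the output anatomically aligned with the input and yielding post-operative predictions that both match the target distribution and are interpretable.

Link: https://arxiv.org/abs/2510.22379
Authors: Xiyu Luo, Haodong LI, Xinxing Cheng, He Zhao, Yang Hu, Xuan Song, Tianyang Zhang
Affiliation: unknown
Categories: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: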

Abstract:Image-to-image translation models have achieved notable success in converting images across visual domains and are increasingly used for medical tasks such as predicting post-operative outcomes and modeling disease progression. However, most existing methods primarily aim to match the target distribution and often neglect spatial correspondences between the source and translated images. This limitation can lead to structural inconsistencies and hallucinations, undermining the reliability and interpretability of the predictions. These challenges are accentuated in clinical applications by the stringent requirement for anatomical accuracy. In this work, we present TraceTrans, a novel deformable image translation model designed for post-operative prediction that generates images aligned with the target distribution while explicitly revealing spatial correspondences with the pre-operative input. The framework employs an encoder for feature extraction and dual decoders for predicting spatial deformations and synthesizing the translated image. The predicted deformation field imposes spatial constraints on the generated output, ensuring anatomical consistency with the source. Extensive experiments on medical cosmetology and brain MRI datasets demonstrate that TraceTrans delivers accurate and interpretable post-operative predictions, highlighting its potential for reliable clinical deployment.
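A deformation field ties each output pixel to a location in the source image, which is what makes the spatial correspondence traceable. The sketch below shows a generic backward warp with nearest-neighbour sampling; it is not the TraceTrans decoder, and the field layout (one `(d_row, d_col)` offset per pixel), the border clamping, and the function name `warp` are assumptions for illustration.

```python
def warp(image, field):
    """Backward warp: output[r][c] takes the source pixel at
    (r + d_row, c + d_col), rounded to the nearest pixel and clamped at borders."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            d_row, d_col = field[r][c]
            src_r = min(max(r + round(d_row), 0), h - 1)
            src_c = min(max(c + round(d_col), 0), w - 1)
            row.append(image[src_r][src_c])
        out.append(row)
    return out

image = [[1, 2, 3],
         [4, 5, 6]]

# Every output pixel looks one column to the right in the source image.
shift_right = [[(0, 1)] * 3 for _ in range(2)]
warped = warp(image, shift_right)
print(warped)
```

Because each output value is copied from a known source location, a field like this can be inspected directly to see which pre-operative region a predicted pixel came from.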

[CV-230] Expert Validation of Synthetic Cervical Spine Radiographs Generated with a Denoising Diffusion Probabilistic Model

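Quick read: This paper addresses the difficulty of obtaining high-quality medical imaging datasets, which limits machine learning (ML) applications in neurosurgery, in particular the small sample sizes and privacy constraints for lateral cervical spine radiographs. The core solution is to generate high-fidelity synthetic cervical spine radiographs with a Denoising Diffusion Probabilistic Model (DDPM). The model is trained on 4,963 real images, and the realism and quality of the synthetic images are validated in a blinded "Turing test" with clinical experts. The results show no statistically significant difference in perceived realism between synthetic and real images, and no evidence of memorization. The approach offers a feasible route to large-scale, privacy-preserving neuroimaging datasets for downstream tasks such as landmarking, segmentation, and classification.

Link: https://arxiv.org/abs/2510.22166
Authors: Austin A. Barr, Brij S. Karmur, Anthony J. Winder, Eddie Guo, John T. Lysack, James N. Scott, William F. Morrish, Muneer Eesa, Morgan Willson, David W. Cadotte, Michael M.H. Yang, Ian Y.M. Chan, Sanju Lama, Garnette R. Sutherland
Affiliation: Cumming School of Medicine, University of Calgary; Division of Neurosurgery, Department of Clinical Neurosciences, University of Calgary; Department of Radiology, University of Calgary; Division of Neurosurgery, Department of Surgery, University of Toronto; Department of Medical Imaging, University of Toronto; Project neuroArm, Department of Clinical Sciences, University of Calgary
Categories: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: 10 pages, 4 figures, 1 table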

Abstract:Machine learning in neurosurgery is limited by challenges in assembling large, high-quality imaging datasets. Synthetic data offers a scalable, privacy-preserving solution. We evaluated the feasibility of generating realistic lateral cervical spine radiographs using a denoising diffusion probabilistic model (DDPM) trained on 4,963 images from the Cervical Spine X-ray Atlas. Model performance was monitored via training/validation loss and Frechet inception distance, and synthetic image quality was assessed in a blinded “clinical Turing test” with six neuroradiologists and two spine-fellowship trained neurosurgeons. Experts reviewed 50 quartets containing one real and three synthetic images, identifying the real image and rating realism on a 4-point Likert scale. Experts correctly identified the real image in 29% of trials (Fleiss’ kappa=0.061). Mean realism scores were comparable between real (3.323) and synthetic images (3.228, 3.258, and 3.320; p=0.383, 0.471, 1.000). Nearest-neighbor analysis found no evidence of memorization. We also provide a dataset of 20,063 synthetic radiographs. These results demonstrate that DDPM-generated cervical spine X-rays are statistically indistinguishable in realism and quality from real clinical images, offering a novel approach to creating large-scale neuroimaging datasets for ML applications in landmarking, segmentation, and classification.

[CV-231] Frequency-Spatial Interaction Driven Network for Low-Light Image Enhancement

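Quick read: This paper addresses a limitation of existing low-light image enhancement (LLIE) methods, which either ignore the importance of frequency-domain information or fail to promote the propagation and flow of information effectively, restricting enhancement performance. The key to the solution is a two-stage Frequency-Spatial Interaction-Driven Network (FSIDNet): the first stage restores the image amplitude to improve lightness, and the second stage restores phase information to refine structural detail. Two frequency-spatial interaction blocks fuse the complementary frequency-domain and spatial-domain information, and an Information Exchange Module (IEM) integrates cross-stage and cross-scale features to improve information flow through the two-stage network. The method achieves strong visual results and quantitative metrics on several benchmark datasets.

Link: https://arxiv.org/abs/2510.22154
Authors: Yunhong Tao, Wenbing Tao, Xiang Xiang
Affiliation: unknown
Categories: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Signal Processing (eess.SP)
Comments: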

Abstract: Low-light image enhancement (LLIE) aims at improving the perception or interpretability of an image captured in an environment with poor illumination. With the advent of deep learning, LLIE techniques have achieved significant breakthroughs. However, existing LLIE methods either ignore the important role of frequency-domain information or fail to effectively promote the propagation and flow of information, limiting LLIE performance. In this paper, we develop a novel frequency-spatial interaction-driven network (FSIDNet) for LLIE based on a two-stage architecture. Specifically, the first stage is designed to restore the amplitude of low-light images to improve the lightness, and the second stage is devoted to restoring phase information to refine fine-grained structures. Considering that frequency-domain and spatial-domain information are complementary and both favorable for LLIE, we further develop two frequency-spatial interaction blocks which mutually amalgamate the complementary spatial and frequency information to enhance the capability of the model. In addition, we construct the Information Exchange Module (IEM) to associate the two stages by adequately incorporating cross-stage and cross-scale features to effectively promote the propagation and flow of information in the two-stage network structure. Finally, we conduct experiments on several widely used benchmark datasets (i.e., LOL-Real, LSRW-Huawei, etc.), which demonstrate that our method achieves excellent performance in terms of visual results and quantitative metrics while preserving good model efficiency.
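The split between amplitude (lightness) and phase (structure) can be checked on a toy signal with a plain discrete Fourier transform. The sketch below is an illustration of that decomposition, not FSIDNet: the 1-D signal, the uniform dimming used to mimic low light, and the helper names `dft` and `idft` are assumptions.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

normal = [4.0, 8.0, 6.0, 2.0]
dark = [0.25 * v for v in normal]  # toy low-light signal: uniformly dimmed

spec_normal, spec_dark = dft(normal), dft(dark)

# Keep the phase of the dark signal and substitute the well-lit amplitude.
mixed = [cmath.rect(abs(a), cmath.phase(d))
         for a, d in zip(spec_normal, spec_dark)]
restored = idft(mixed)
print([round(v, 6) for v in restored])
```

Uniform dimming scales every amplitude and leaves the phase unchanged, so correcting the amplitude alone recovers the well-lit signal here. Real low-light images also suffer noise and non-uniform illumination, which is where a second, phase-oriented stage becomes relevant.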

[CV-232] HDR Image Reconstruction using an Unsupervised Fusion Model

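Quick read: This paper addresses the limited dynamic range of conventional digital cameras, which prevents them from capturing the wide range of brightness levels in natural scenes and degrades image quality. The key to the solution is a deep learning-based multi-exposure fusion method in which a Convolutional Neural Network (CNN) fuses the complementary information in differently exposed Low Dynamic Range (LDR) images, typically one underexposed and one overexposed: the underexposed image preserves detail in bright regions and the overexposed image preserves information in dark regions. The network learns the fusion strategy through unsupervised training and generates a high-quality High Dynamic Range (HDR) image without ground-truth HDR images as supervision, which makes the method more practical for real-world use.

Link: https://arxiv.org/abs/2510.21815
Authors: Kumbha Nagaswetha
Affiliation: Indian Institute of Science
Categories: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: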

Abstract:High Dynamic Range (HDR) imaging aims to reproduce the wide range of brightness levels present in natural scenes, which the human visual system can perceive but conventional digital cameras often fail to capture due to their limited dynamic range. To address this limitation, we propose a deep learning-based multi-exposure fusion approach for HDR image generation. The method takes a set of differently exposed Low Dynamic Range (LDR) images, typically an underexposed and an overexposed image, and learns to fuse their complementary information using a convolutional neural network (CNN). The underexposed image preserves details in bright regions, while the overexposed image retains information in dark regions; the network effectively combines these to reconstruct a high-quality HDR output. The model is trained in an unsupervised manner, without relying on ground-truth HDR images, making it practical for real-world applications where such data is unavailable. We evaluate our results using the Multi-Exposure Fusion Structural Similarity Index Measure (MEF-SSIM) and demonstrate that our approach achieves superior visual quality compared to existing fusion methods. A customized loss function is further introduced to improve reconstruction fidelity and optimize model performance.

Artificial Intelligence

[AI-0] Alita-G: Self-Evolving Generative Agent for Agent Generation

Link: https://arxiv.org/abs/2510.23601
Authors: Jiahao Qiu, Xuan Qi, Hongru Wang, Xinzhe Juan, Yimin Wang, Zelin Zhao, Jiayi Geng, Jiacheng Guo, Peihang Li, Jingzhe Shi, Shilong Liu, Mengdi Wang
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 15 pages, 3 figures

[AI-1] Multi-Agent Evolve: LLM Self-Improve through Co-evolution ICLR2026

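Quick read: This paper addresses the heavy dependence of reinforcement learning (RL) for large language models (LLMs) on human-annotated data and verifiable reward signals when improving reasoning ability, a dependence that limits scalability and generality. The authors propose Multi-Agent Evolve (MAE), a framework whose core innovation is a structure of three interacting agents, the Proposer, the Solver, and the Judge, all instantiated from the same LLM and jointly optimized with reinforcement learning. This yields a closed evolutionary loop that generates tasks, attempts solutions, and evaluates the results without human-annotated data, improving LLM performance on diverse tasks including mathematical reasoning, logical reasoning, and general knowledge QA. Experiments on Qwen2.5-3B-Instruct show an average improvement of 4.54%.

Link: https://arxiv.org/abs/2510.23595
Authors: Yixing Chen, Yiding Wang, Siqi Zhu, Haofei Yu, Tao Feng, Muhan Zhan, Mostofa Patwary, Jiaxuan You
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 29 pages, 4 figures, submitted to ICLR 2026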

Abstract:Reinforcement Learning (RL) has demonstrated significant potential in enhancing the reasoning capabilities of large language models (LLMs). However, the success of RL for LLMs heavily relies on human-curated datasets and verifiable rewards, which limit their scalability and generality. Recent Self-Play RL methods, inspired by the success of the paradigm in games and Go, aim to enhance LLM reasoning capabilities without human-annotated data. However, their methods primarily depend on a grounded environment for feedback (e.g., a Python interpreter or a game engine); extending them to general domains remains challenging. To address these challenges, we propose Multi-Agent Evolve (MAE), a framework that enables LLMs to self-evolve in solving diverse tasks, including mathematics, reasoning, and general knowledge QA. The core design of MAE is based on a triplet of interacting agents (Proposer, Solver, Judge) that are instantiated from a single LLM, and applies reinforcement learning to optimize their behaviors. The Proposer generates questions, the Solver attempts solutions, and the Judge evaluates both while co-evolving. Experiments on Qwen2.5-3B-Instruct demonstrate that MAE achieves an average improvement of 4.54% on multiple benchmarks. These results highlight MAE as a scalable, data-efficient method for enhancing the general reasoning abilities of LLMs with minimal reliance on human-curated supervision.

[AI-2] A Survey of Data Agents: Emerging Paradigm or Overstated Hype?

Link: https://arxiv.org/abs/2510.23587
Authors: Yizhang Zhu, Liangwei Wang, Chenyu Yang, Xiaotian Lin, Boyan Li, Wei Zhou, Xinyu Liu, Zhangyang Peng, Tianqi Luo, Yu Li, Chengliang Chai, Chong Chen, Shimin Di, Ju Fan, Ji Sun, Nan Tang, Fugee Tsung, Jiannan Wang, Chenglin Wu, Yanwei Xu, Shaolei Zhang, Yong Zhang, Xuanhe Zhou, Guoliang Li, Yuyu Luo
Affiliation: unknown
Categories: Databases (cs.DB); Artificial Intelligence (cs.AI)
Comments: Please refer to our paper list and companion materials at: this https URL

[AI-3] Reduced AI Acceptance After the Generative AI Boom: Evidence From a Two-Wave Survey Study

Link: https://arxiv.org/abs/2510.23578
Authors: Joachim Baumann, Aleksandra Urman, Ulrich Leicht-Deobald, Zachary J. Roman, Anikó Hannák, Markus Christen
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

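[AI-4] TAMI: Taming Heterogeneity in Temporal Interactions for Temporal Graph Link Prediction NEURIPS2025

Quick read: This paper addresses two problems in temporal graph link prediction caused by heterogeneity in interactions: ineffective encoding of temporal information, and forgetting of historical interactions for node pairs that interact sparsely. Specifically, a few frequently interacting node pairs account for most events and the intervals between interactions vary widely, so existing methods struggle to model temporal dynamics and their performance drops noticeably when predicting links for infrequently interacting node pairs. The key to the solution is a framework called TAMI with two core components. The first is the log time encoding function (LTE), which maps raw interaction intervals to more balanced ones to represent temporal information better. The second is link history aggregation (LHA), which retains a memory of the historical interactions of each target node pair so that key temporal features are not lost when interactions are intermittent. The framework integrates seamlessly with state-of-the-art temporal graph neural networks and improves their link prediction performance.

Link: https://arxiv.org/abs/2510.23577
Authors: Zhongyi Yu, Jianqiu Wu, Zhenghao Wu, Shuhan Zhong, Weifeng Su, Chul-Ho Lee, Weipeng Zhuo
Affiliation: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Accepted to NeurIPS 2025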

Abstract: Temporal graph link prediction aims to predict future interactions between nodes in a graph based on their historical interactions, which are encoded in node embeddings. We observe that heterogeneity naturally appears in temporal interactions, e.g., a few node pairs can account for most interaction events, and interaction events happen at varying intervals. This leads to ineffective temporal information encoding and, for node pairs that interact intermittently, forgetting of their past interactions when predicting their links. Existing methods, however, do not consider such heterogeneity in their learning process, and thus their learned temporal node embeddings are less effective, especially when predicting the links for infrequently interacting node pairs. To cope with the heterogeneity, we propose a novel framework called TAMI, which contains two effective components, namely log time encoding function (LTE) and link history aggregation (LHA). LTE better encodes the temporal information by transforming interaction intervals into more balanced ones, and LHA prevents the historical interactions for each target node pair from being forgotten. State-of-the-art temporal graph neural networks can be seamlessly and readily integrated into TAMI to improve their effectiveness. Experimental results on 13 classic datasets and the three newest temporal graph benchmark (TGB) datasets show that TAMI consistently improves the link prediction performance of the underlying models in both transductive and inductive settings. Our code is available at this https URL.
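The effect of taking a logarithm of interaction intervals can be seen on a few example gaps. The sketch below is illustrative only: the abstract does not state the exact form of LTE, so `log1p` as the transform, the example intervals in seconds, and the function name `log_time_encode` are assumptions.

```python
import math

def log_time_encode(delta_t):
    """Assumed transform: log(1 + delta_t), which compresses long gaps far
    more than short ones and maps a zero gap to zero."""
    return math.log1p(delta_t)

# Gaps of one second, one minute, one hour, and one day.
intervals = [1.0, 60.0, 3600.0, 86400.0]
encoded = [log_time_encode(d) for d in intervals]

# Ratio between the largest and smallest value before and after the transform.
raw_spread = max(intervals) / min(intervals)
encoded_spread = max(encoded) / min(encoded)
print(encoded, raw_spread, encoded_spread)
```

The raw gaps span almost five orders of magnitude while the transformed values span roughly one, which is the kind of rebalancing the abstract attributes to LTE; a learned encoder would typically be applied on top of the transformed value.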

[AI-5] OntoPret: An Ontology for the Interpretation of Human Behavior

Link: https://arxiv.org/abs/2510.23553
Authors: Alexis Ellis, Stacie Severyn, Fjollë Novakazi, Hadi Banaee, Cogan Shimizu
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-6] When No Paths Lead to Rome: Benchmarking Systematic Neural Relational Reasoning NEURIPS2025

Link: https://arxiv.org/abs/2510.23532
Authors: Anirban Das, Irtaza Khalid, Rafael Peñaloza, Steven Schockaert
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: Accepted at NeurIPS 2025 DB track

[AI-7] Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization

Link: https://arxiv.org/abs/2510.23530
Authors: Bernardo Torres, Manuel Moussallam, Gabriel Meseguer-Brocal
Affiliation: unknown
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Comments:

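[AI-8] Toward Carbon-Neutral Human AI: Rethinking Data Computation and Learning Paradigms for Sustainable Intelligence

Quick read: This paper addresses the excessive compute consumption, growing environmental burden, and unclear ethical accountability caused by the reliance of current Artificial Intelligence (AI) development on large-scale static datasets and monolithic training paradigms. The core solution is a new framework called Human AI (HAI), which relies on three mechanisms, incremental learning, carbon-aware optimization, and human-in-the-loop collaboration, to provide continual adaptability, better energy efficiency, and traceable accountability. The aim is to reduce carbon footprint and human annotation cost while preserving performance, moving AI in a sustainable, human-centered direction.

Link: https://arxiv.org/abs/2510.23524
Authors: KC Santosh, Rodrigue Rizk, Longwei Wang
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 9 pages, 3 figures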

Abstract:The rapid advancement of Artificial Intelligence (AI) has led to unprecedented computational demands, raising significant environmental and ethical concerns. This paper critiques the prevailing reliance on large-scale, static datasets and monolithic training paradigms, advocating for a shift toward human-inspired, sustainable AI solutions. We introduce a novel framework, Human AI (HAI), which emphasizes incremental learning, carbon-aware optimization, and human-in-the-loop collaboration to enhance adaptability, efficiency, and accountability. By drawing parallels with biological cognition and leveraging dynamic architectures, HAI seeks to balance performance with ecological responsibility. We detail the theoretical foundations, system design, and operational principles that enable AI to learn continuously and contextually while minimizing carbon footprints and human annotation costs. Our approach addresses pressing challenges in active learning, continual adaptation, and energy-efficient model deployment, offering a pathway toward responsible, human-centered artificial intelligence.

[AI-9] A Deep Latent Factor Graph Clustering with Fairness-Utility Trade-off Perspective

Link: https://arxiv.org/abs/2510.23507
Authors: Siamak Ghodsi, Amjad Seyedi, Tai Le Quy, Fariba Karimi, Eirini Ntoutsi
Affiliation: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Comments: Accepted to IEEE Big-Data 2025 main research track. The paper has 10 main pages and a 4-page appendix

[AI-10] Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier

Link: https://arxiv.org/abs/2510.23506
Authors: Hyeongseop Rha, Jeong Hun Yeo, Yeonju Kim, Yong Man Ro
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Comments: 16 pages, 11 figures

[AI-11] Mixed Precision Training of Neural ODEs

Link: https://arxiv.org/abs/2510.23498
Authors: Elena Celledoni, Brynjulf Owren, Lars Ruthotto, Tianjiao Nicole Yang
Affiliation: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
Comments: Code available at this https URL. 26 pages, 4 figures

[AI-12] Are Agents Just Automata? On the Formal Equivalence Between Agentic AI and the Chomsky Hierarchy

Link: https://arxiv.org/abs/2510.23487
Authors: Roham Koohestani, Ziyou Li, Anton Podkopaev, Maliheh Izadi
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)
Comments:

[AI-13] Human-AI Collaborative Uncertainty Quantification

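Quick read: This paper addresses the limited reliability of current AI in high-stakes decision making, where it lacks domain knowledge, long-horizon context, and reasoning grounded in the physical world, with a focus on uncertainty quantification (UQ). The key to the solution is the Human-AI Collaborative Uncertainty Quantification framework, in which the AI refines a human expert's proposed prediction set using a single score function, with two goals: avoiding counterfactual harm, meaning the AI does not degrade correct human judgments, and complementarity, meaning the AI can recover correct outcomes the human missed. Theoretical analysis shows that, at the population level, the optimal collaborative prediction set follows a two-threshold structure, extending a classical result in conformal prediction. The paper further develops offline and online calibration algorithms with distribution-free finite-sample guarantees. The online method adapts to distribution shift, including human behavior that evolves through interaction with the AI (termed "Human to AI Adaptation"). Experiments on image classification, regression, and text-based medical decision making show that the framework consistently outperforms either the human or the AI alone.

Link: https://arxiv.org/abs/2510.23476
Authors: Sima Noorani, Shayan Kiyani, George Pappas, Hamed Hassani
Affiliation: unknown
Categories: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
Comments: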

Abstract:AI predictive systems are increasingly embedded in decision-making pipelines, shaping high-stakes choices once made solely by humans. Yet robust decisions under uncertainty still rely on capabilities that current AI lacks: domain knowledge not captured by data, long-horizon context, and reasoning grounded in the physical world. This gap has motivated growing efforts to design collaborative frameworks that combine the complementary strengths of humans and AI. This work advances this vision by identifying the fundamental principles of Human-AI collaboration within uncertainty quantification, a key component of reliable decision making. We introduce Human-AI Collaborative Uncertainty Quantification, a framework that formalizes how an AI model can refine a human expert’s proposed prediction set with two goals: avoiding counterfactual harm, ensuring the AI does not degrade correct human judgments, and complementarity, enabling recovery of correct outcomes the human missed. At the population level, we show that the optimal collaborative prediction set follows an intuitive two-threshold structure over a single score function, extending a classical result in conformal prediction. Building on this insight, we develop practical offline and online calibration algorithms with provable distribution-free finite-sample guarantees. The online method adapts to distribution shifts, including human behavior evolving through interaction with AI, a phenomenon we call Human-to-AI Adaptation. Experiments across image classification, regression, and text-based medical decision making show that collaborative prediction sets consistently outperform either agent alone, achieving higher coverage and smaller set sizes across various conditions.
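
To make the "two-threshold structure over a single score function" more concrete, here is a minimal sketch. The first two functions are ordinary split conformal prediction (the classical single-threshold case). The third, `two_threshold_set`, is only one plausible reading of the abstract: a lenient threshold for labels the human proposed and a stricter one for labels the human omitted. All function names and the exact rule are assumptions for illustration, not the authors' algorithm.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Classical split-conformal threshold: the ceil((n+1)(1-alpha))-th
    smallest nonconformity score of the true labels on a calibration set."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(np.sort(np.asarray(cal_scores))[k - 1])

def single_threshold_set(scores, tau):
    """Classical prediction set: keep every label whose score is <= tau."""
    return {y for y, s in enumerate(scores) if s <= tau}

def two_threshold_set(scores, human_set, tau_keep, tau_add):
    """Hypothetical collaborative set (assumption, not from the paper):
    prune human-proposed labels with tau_keep, add omitted labels with tau_add."""
    kept = {y for y in human_set if scores[y] <= tau_keep}
    added = {y for y in range(len(scores))
             if y not in human_set and scores[y] <= tau_add}
    return kept | added
```

Lower scores mean "more plausible" here. Choosing `tau_keep` larger than `tau_add` makes the AI reluctant to remove what the human proposed (limiting counterfactual harm) while still letting it add labels it is confident about (complementarity).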

[AI-14] Policy-Aware Generative AI for Safe Auditable Data Access Governance

Link: https://arxiv.org/abs/2510.23474
Authors: Shames Al Mandalawi,Muzakkiruddin Ahmed Mohammed,Hendrika Maclean,Mert Can Cakmak,John R. Talburt
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: The 17th International Conference on Knowledge and Systems Engineering

[AI-15] BBOPlace-Bench: Benchmarking Black-Box Optimization for Chip Placement

Link: https://arxiv.org/abs/2510.23472
Authors: Ke Xue,Ruo-Tong Chen,Rong-Xi Tan,Xi Lin,Yunqi Shi,Siyuan Xu,Mingxuan Yuan,Chao Qian
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Neural and Evolutionary Computing (cs.NE)
Comments:

[AI-16] What are the odds? Risk and uncertainty about AI existential risk

Link: https://arxiv.org/abs/2510.23453
Authors: Marco Grossi
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: 10 pages

[AI-17] Causal Deep Q Network

Link: https://arxiv.org/abs/2510.23424
Authors: Elouanes Khelifi,Amir Saki,Usef Faghihi
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments:

[AI-18] Bid2X: Revealing Dynamics of Bidding Environment in Online Advertising from A Foundation Model Lens KDD2025

Link: https://arxiv.org/abs/2510.23410
Authors: Jiahao Ji,Tianyu Wang,Yeshu Li,Yushen Huo,Zhilin Zhang,Chuan Yu,Jian Xu,Bo Zheng
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: 12 pages, KDD 2025

[AI-19] Eigen-Value: Efficient Domain-Robust Data Valuation via Eigenvalue-Based Approach

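[Quick read]: This paper addresses the poor generalization of existing data valuation methods to out-of-distribution (OOD) scenarios; in particular, valuation methods based on in-distribution (ID) loss often fail when the validation set contains no OOD data. The core solution is Eigen-Value (EV), a plug-in framework that estimates each data point's marginal contribution to the domain discrepancy using only a subset of ID data, including at validation time. The key innovation is a spectral approximation of the domain discrepancy built from ratios of eigenvalues of the ID data covariance matrix, combined with perturbation theory to compute each data point's contribution to this discrepancy efficiently. This improves robustness under domain shift without any additional training while remaining computationally lightweight.

Link: https://arxiv.org/abs/2510.23409
Authors: Youngjun Choi,Joonseong Kang,Sungjun Lim,Kyungwoo Song
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: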

Abstract:Data valuation has become central in the era of data-centric AI. It drives efficient training pipelines and enables objective pricing in data markets by assigning a numeric value to each data point. Most existing data valuation methods estimate the effect of removing individual data points by evaluating changes in model validation performance under in-distribution (ID) settings, as opposed to out-of-distribution (OOD) scenarios where data follow different patterns. Since ID and OOD data behave differently, data valuation methods based on ID loss often fail to generalize to OOD settings, particularly when the validation set contains no OOD data. Furthermore, although OOD-aware methods exist, they involve heavy computational costs, which hinder practical deployment. To address these challenges, we introduce Eigen-Value (EV), a plug-and-play data valuation framework for OOD robustness that uses only an ID data subset, including during validation. EV provides a new spectral approximation of domain discrepancy, which is the loss gap between ID and OOD, using ratios of eigenvalues of the ID data’s covariance matrix. EV then estimates the marginal contribution of each data point to this discrepancy via perturbation theory, alleviating the computational burden. Subsequently, EV plugs into ID loss-based methods by adding an EV term without any additional training loop. We demonstrate that EV achieves improved OOD robustness and stable value rankings across real-world datasets, while remaining computationally lightweight. These results indicate that EV is practical for large-scale settings with domain shift, offering an efficient path to OOD-robust data valuation.
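
The perturbation-theory step can be illustrated with a small, generic sketch. It estimates how an eigenvalue ratio of the (uncentered) covariance matrix changes when one data point is removed, using first-order eigenvalue perturbation instead of recomputing the eigendecomposition per point. The specific ratio used here (largest over smallest eigenvalue) and all function names are stand-ins chosen for illustration; the abstract does not say which ratios EV actually uses.

```python
import numpy as np

def eig_ratio(C):
    """Stand-in spectral statistic: largest / smallest eigenvalue of C."""
    w = np.linalg.eigvalsh(C)  # eigenvalues in ascending order
    return w[-1] / w[0]

def loo_ratio_change_first_order(X, i):
    """First-order estimate of the change in eig_ratio when row i is removed."""
    n = X.shape[0]
    C = X.T @ X / n
    w, V = np.linalg.eigh(C)
    # Removing x_i gives C' = (n C - x_i x_i^T) / (n - 1), so C' - C is:
    dC = (C - np.outer(X[i], X[i])) / (n - 1)
    # First-order perturbation: d(lambda) ~= v^T dC v for a unit eigenvector v.
    d_top = V[:, -1] @ dC @ V[:, -1]
    d_bot = V[:, 0] @ dC @ V[:, 0]
    # Differential of the quotient a / b.
    return (d_top * w[0] - w[-1] * d_bot) / w[0] ** 2

def loo_ratio_change_exact(X, i):
    """Reference value: recompute the eigenvalues after removing row i."""
    n = X.shape[0]
    C = X.T @ X / n
    C_loo = (n * C - np.outer(X[i], X[i])) / (n - 1)
    return eig_ratio(C_loo) - eig_ratio(C)
```

The first-order estimate needs one eigendecomposition for the whole dataset, whereas the exact version needs one per data point, which is the kind of saving the abstract attributes to perturbation theory.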

[AI-20] AutoStreamPipe: LLM Assisted Automatic Generation of Data Stream Processing Pipelines

Link: https://arxiv.org/abs/2510.23408
Authors: Abolfazl Younesi,Zahra Najafabadi Samani,Thomas Fahringer
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Comments: Under review

[AI-21] Opinion Mining Based Entity Ranking using Fuzzy Logic Algorithmic Approach

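[Quick read]: This paper addresses a gap in existing opinion mining research: opinions are not classified at a fine-grained level before entities are ranked. Traditional methods usually identify only the sentiment polarity of a review (positive, negative, or neutral) without analyzing the specific attributes or components being commented on and ranking entities accordingly. The paper proposes a method based on fuzzy logic reasoning that extracts, at a fine-grained level, the object attributes mentioned in each evaluative statement together with their sentiment orientation, enabling more precise entity ranking. The key to the solution is using fuzzy logic to handle the uncertainty and vagueness inherent in natural language, which improves the accuracy and usefulness of opinion extraction and ranking.

Link: https://arxiv.org/abs/2510.23384
Authors: Pratik N. Kalamkar,A.G. Phakatkar
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 8 pages, 4 figures, Conference Paper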

Abstract:Opinions are central to almost all human activities and are key influencers of our behaviors. With the growth of social networking websites and the increasing number of e-commerce sites, a huge number of opinions is now available on the web. Given a set of evaluative statements that contain opinions (or sentiments) about an entity, opinion mining aims to extract the attributes and components of the object that have been commented on in each statement and to determine whether the comments are positive, negative, or neutral. While a lot of research has recently been done in the field of opinion mining, some of it dealing with ranking entities based on a review or opinion set, classifying opinions at a finer level of granularity and then ranking entities has never been done before. In this paper, a method for opinion mining from statements at a deeper level of granularity is proposed. This is done using fuzzy logic reasoning, after which entities are ranked according to this information.

[AI-22] Symbolic Neural Generation with Applications to Lead Discovery in Drug Design

Link: https://arxiv.org/abs/2510.23379
Authors: Ashwin Srinivasan,A Baskar,Tirtharaj Dash,Michael Bain,Sanjay Kumar Dey,Mainak Banerjee
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Biomolecules (q-bio.BM)
Comments: 37 pages, 15 figures; partial overlap of experimental results with this https URL

[AI-23] ZeroFlood: A Geospatial Foundation Model for Data-Efficient Flood Susceptibility Mapping

Link: https://arxiv.org/abs/2510.23364
Authors: Hyeongkyun Kim,Orestis Oikonomou
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Preprint submitted to EUSAR 2026 (under review)

[AI-24] CNOT Minimal Circuit Synthesis: A Reinforcement Learning Approach

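[Quick read]: This paper tackles CNOT minimisation in quantum computing: reducing the number of CNOT gates as far as possible while preserving the function of the circuit, to improve the efficiency and feasibility of quantum algorithms. The key to the solution is a novel reinforcement learning approach: a single trained agent handles matrices of fixed size $m = 8$, and matrices of other sizes are preprocessed with embedding or Gaussian striping, allowing the agent to generalize to input matrices of sizes 3 to 15. Experiments show that the method outperforms the state-of-the-art algorithm as the input size grows.

Link: https://arxiv.org/abs/2510.23304
Authors: Riccardo Romanello,Daniele Lizzio Bosco,Jacopo Cossio,Dusan Sutulovic,Giuseppe Serra,Carla Piazza,Paolo Burelli
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: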

Abstract:CNOT gates are fundamental to quantum computing, as they facilitate entanglement, a crucial resource for quantum algorithms. Certain classes of quantum circuits are constructed exclusively from CNOT gates. Given their widespread use, it is imperative to minimise the number of CNOT gates employed. This problem, known as CNOT minimisation, remains an open challenge, with its computational complexity yet to be fully characterised. In this work, we introduce a novel reinforcement learning approach to address this task. Instead of training multiple reinforcement learning agents for different circuit sizes, we use a single agent up to a fixed size $m$. Matrices of sizes different from $m$ are preprocessed using either embedding or Gaussian striping. To assess the efficacy of our approach, we trained an agent with $m = 8$, and evaluated it on matrices of size $n$ that range from 3 to 15. The results we obtained show that our method outperforms the state-of-the-art algorithm as the value of $n$ increases.
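
For context on why the abstract talks about matrices: a circuit made only of CNOT gates acts as an invertible matrix over GF(2), and each CNOT corresponds to adding one row to another modulo 2. The sketch below is a plain Gauss-Jordan elimination baseline that turns such a matrix into a list of row additions. It is not the paper's reinforcement learning method and makes no attempt to minimise the gate count; the function names are illustrative.

```python
import numpy as np

def cnot_synthesis_gauss(M):
    """Reduce an invertible GF(2) matrix to the identity with row additions.

    Returns a list of (control, target) pairs, each meaning
    "row[target] ^= row[control]", i.e. one CNOT.
    """
    A = np.array(M, dtype=np.uint8) % 2
    n = A.shape[0]
    ops = []
    for col in range(n):
        if A[col, col] == 0:
            # Bring a 1 onto the diagonal by adding a lower row that has one.
            pivots = [r for r in range(col + 1, n) if A[r, col] == 1]
            if not pivots:
                raise ValueError("matrix is singular over GF(2)")
            A[col] ^= A[pivots[0]]
            ops.append((pivots[0], col))
        # Clear every other 1 in this column.
        for r in range(n):
            if r != col and A[r, col] == 1:
                A[r] ^= A[col]
                ops.append((col, r))
    return ops

def apply_row_additions(n, ops):
    """Rebuild the matrix from the identity by replaying ops in reverse order."""
    A = np.eye(n, dtype=np.uint8)
    for control, target in reversed(ops):
        A[target] ^= A[control]
    return A
```

Because every row addition is its own inverse over GF(2), replaying the recorded operations in reverse order on the identity reproduces the input matrix, which gives a simple correctness check. A learned agent would aim to find a shorter list of operations than this baseline produces.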

[AI-25] A Novel Framework for Multi-Modal Protein Representation Learning

Link: https://arxiv.org/abs/2510.23273
Authors: Runjie Zheng,Zhen Wang,Anjie Qiao,Jiancong Xie,Jiahua Rao,Yuedong Yang
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
Comments: 35 pages, 5 figures, 4 tables

[AI-26] PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization

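[Quick read]: This paper addresses the computational inefficiency and excessive memory use that Automated Circuit Discovery (ACDC) faces when applied to large language models. ACDC is an important method in mechanistic interpretability, but its bottleneck is the high cost of patching operations, and existing acceleration schemes mostly rely on linear approximations that significantly reduce analytical faithfulness. The key solution proposed is Per Attention Head Quantization (PAHQ), whose core idea exploits an intrinsic alignment between activation patching and mixed-precision quantization (MPQ): only the components under investigation are kept at high precision while the rest of the network is safely reduced in precision, making each patching operation efficient. Without sacrificing faithfulness, PAHQ reduces ACDC runtime by up to 80% and memory consumption by up to 30%, and it integrates seamlessly with existing edge-based circuit discovery techniques, offering a training-free and efficient path for mechanistic interpretability.

Link: https://arxiv.org/abs/2510.23264
Authors: Xinhai Wang,Shu Yang,Liangyu Wang,Lin Zhang,Huanyi Xie,Lijie Hu,Di Wang
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: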

Abstract:Circuit discovery, which involves identifying sparse and task-relevant subnetworks in pre-trained language models, is a cornerstone of mechanistic interpretability. Automated Circuit Discovery (ACDC) has emerged as a pivotal methodology in circuit discovery, but its application to large language models is severely limited by computational inefficiency and prohibitively high memory requirements. Although several accelerated approaches have been proposed, they primarily rely on linear approximations to ACDC, which significantly compromises analytical faithfulness. Our proposed method for accelerating automated circuit discovery, Per Attention Head Quantization (PAHQ), takes a fundamentally different approach by optimizing the efficiency of each individual patching operation. PAHQ leverages a fundamental alignment between activation patching and mixed-precision quantization (MPQ): interpretability analysis through patching essentially performs targeted ablation studies. Therefore, we can maintain high precision exclusively for investigated components while safely reducing precision elsewhere in the network. PAHQ-accelerated ACDC reduces runtime by up to 80% and memory consumption by up to 30% compared to unaccelerated ACDC while maintaining faithfulness. Importantly, our method readily integrates with existing edge-based circuit discovery techniques by modifying the attention computation mechanism. This training-free approach provides a practical and novel pathway for accelerating mechanistic interpretability methods. Our code is available at this https URL.
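
The idea of keeping high precision only for the investigated component can be shown with a toy example. The sketch below treats each "head" as a single linear map (a deliberate simplification, not real attention) and computes the investigated head in float32 while all other heads run in float16. The function name and the layout of `weights` are assumptions for illustration and do not reflect the authors' implementation.

```python
import numpy as np

def head_outputs(x, weights, keep_head):
    """Compute per-head outputs with mixed precision.

    x:         array of shape (tokens, d_model)
    weights:   list of arrays of shape (d_model, d_head), one per toy "head"
    keep_head: index of the head under investigation (kept in float32)
    """
    outs = []
    for h, W in enumerate(weights):
        if h == keep_head:
            # Investigated component: full float32 precision.
            out = x.astype(np.float32) @ W.astype(np.float32)
        else:
            # Everything else: reduced float16 precision, cast back for stacking.
            out = (x.astype(np.float16) @ W.astype(np.float16)).astype(np.float32)
        outs.append(out)
    return np.stack(outs)
```

In this toy setting the investigated head's output is identical to a full-precision run, while the other heads carry a small rounding error. The memory saving comes from storing their weights and activations in half the number of bytes.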

[AI-27] Deep Active Inference with Diffusion Policy and Multiple Timescale World Model for Real-World Exploration and Navigation

Link: https://arxiv.org/abs/2510.23258
Authors: Riko Yokozawa,Kentaro Fujii,Yuta Nomura,Shingo Murata
Affiliations: Unknown
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: Preprint version

[AI-28] Accelerating IC Thermal Simulation Data Generation via Block Krylov and Operator Action

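[Quick read]: This paper addresses the large volume of high-fidelity training data, and the resulting computational cost, required when generating integrated circuit (IC) thermal simulation data. Existing data-driven methods rely on large amounts of chip-parameter and temperature-distribution data for training, which makes data generation very time-consuming. The key to the solution is a new algorithm called block Krylov and operator action (BlocKOA): it first uses a block Krylov algorithm, based on the structure of the heat equation, to quickly obtain a small number of basic solutions, then combines them linearly to generate many temperature distributions that satisfy the physical constraints, and finally applies the heat operator to these functions to determine the heat source distributions, efficiently producing high-precision data points. Theoretical analysis shows that BlocKOA's time complexity is one order lower than that of the existing method, and experiments show a 420-fold speedup when generating data for 5000 chips with varying physical parameters and structures; with only 4% of the generation time, data-driven models reach comparable performance.

Link: https://arxiv.org/abs/2510.23221
Authors: Hong Wang,Wenkai Yang,Jie Wang,Huanshuo Dong,Zijie Geng,Zhen Huang,Depeng Xie,Zhezheng Hao,Hande Dong
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)
Comments: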

Abstract:Recent advances in data-driven approaches, such as neural operators (NOs), have shown substantial efficacy in reducing the solution time for integrated circuit (IC) thermal simulations. However, a limitation of these approaches is requiring a large amount of high-fidelity training data, such as chip parameters and temperature distributions, thereby incurring significant computational costs. To address this challenge, we propose a novel algorithm for the generation of IC thermal simulation data, named block Krylov and operator action (BlocKOA), which simultaneously accelerates the data generation process and enhances the precision of generated data. BlocKOA is specifically designed for IC applications. Initially, we use the block Krylov algorithm based on the structure of the heat equation to quickly obtain a few basic solutions. Then we combine them to get numerous temperature distributions that satisfy the physical constraints. Finally, we apply heat operators on these functions to determine the heat source distributions, efficiently generating precise data points. Theoretical analysis shows that the time complexity of BlocKOA is one order lower than the existing method. Experimental results further validate its efficiency, showing that BlocKOA achieves a 420-fold speedup in generating thermal simulation data for 5000 chips with varying physical parameters and IC structures. Even with just 4% of the generation time, data-driven approaches trained on the data generated by BlocKOA exhibit comparable performance to that using the existing method.
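
The "operator action" step can be sketched on a toy problem. Instead of solving the heat equation once per heat source, one can pick temperature fields first and apply the discretized operator to obtain the matching sources, which costs a matrix product rather than a linear solve per sample. The sketch below uses a 1D steady-state problem with zero boundary temperatures and sine modes as the basis; in the paper the basis solutions come from a block Krylov method, which is not reproduced here. The function name and all parameters are illustrative assumptions.

```python
import numpy as np

def generate_pairs(n=50, n_basis=4, n_samples=1000, seed=0):
    """Generate (temperature, heat source) pairs for -u'' = f on (0, 1)."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)  # interior grid points
    # Second-order finite-difference discretization of -d^2/dx^2.
    A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    # Stand-in basis: sine modes, which vanish at both boundaries.
    basis = np.stack([np.sin((k + 1) * np.pi * x) for k in range(n_basis)], axis=1)
    rng = np.random.default_rng(seed)
    coeffs = rng.normal(size=(n_basis, n_samples))
    U = basis @ coeffs  # temperature fields, one per column
    F = A @ U           # matching heat sources via operator action
    return A, U, F
```

Each column of `F` is the source for which the corresponding column of `U` is the discrete solution, so no solver is needed to build the dataset; a direct solve is only used to check a few samples.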

[AI-29] Human-Like Goalkeeping in a Realistic Football Simulation: a Sample-Efficient Reinforcement Learning Approach

Link: https://arxiv.org/abs/2510.23216
Authors: Alessandro Sestini,Joakim Bergdahl,Jean-Philippe Barrette-LaPierre,Florian Fuchs,Brady Chen,Micheal Jones,Linus Gisslén
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[AI-30] Accelerating Eigenvalue Dataset Generation via Chebyshev Subspace Filter

Link: https://arxiv.org/abs/2510.23215
Authors: Hong Wang,Jie Wang,Jian Luo,huanshuo dong,Yeqiu Chen,Runmin Jiang,Zhen huang
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
Comments:

[AI-31] AUPO - Abstracted Until Proven Otherwise: A Reward Distribution Based Abstraction Algorithm

Link: https://arxiv.org/abs/2510.23214
Authors: Robin Schmöcker,Alexander Dockhorn,Bodo Rosenhahn
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments:

[AI-32] Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks NEURIPS2025

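[Quick read]: This paper addresses a limit on the progress of large language models (LLMs) in code generation: existing datasets lack diversity and are poorly aligned with human reasoning, since most contain only problems and solutions without the intermediate reasoning that guides coding. The authors propose a scalable synthetic data generation pipeline whose key contribution is nearly 800k "instruction-reasoning-code-test" quadruplets, each containing a task description, a step-by-step reasoning trace, a working code implementation, and executable tests, so that models learn not only what to do but also how to do it. The pipeline combines curated contest problems, web-mined content filtered by relevance classifiers, data expansion guided by reasoning patterns, and multi-stage execution-based validation; a genetic mutation algorithm further increases task diversity while keeping reasoning traces and code implementations consistent. Experiments show that models fine-tuned on this dataset improve consistently on several coding benchmarks, and that reasoning-aware data can substitute for model scaling under the same sample budget, generalizes across architectures, and outperforms leading open-source alternatives.

Link: https://arxiv.org/abs/2510.23208
Authors: Amal Abed,Ivan Lukic,Jörg K.H. Franke,Frank Hutter
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Presented at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: The 4th Deep Learning for Code Workshop (DL4C)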

Abstract:Large language models (LLMs) have shown impressive promise in code generation, yet their progress remains limited by the shortage of large-scale datasets that are both diverse and well-aligned with human reasoning. Most existing resources pair problems with solutions, but omit the intermediate thought process that guides coding. To close this gap, we present a scalable synthetic data generation pipeline that produces nearly 800k instruction-reasoning-code-test quadruplets. Each sample combines a task, a step-by-step reasoning trace, a working solution, and executable tests, enabling models to learn not just the what but also the how of problem solving. Our pipeline combines four key components: curated contest problems, web-mined content filtered by relevance classifiers, data expansion guided by reasoning patterns, and multi-stage execution-based validation. A genetic mutation algorithm further increases task diversity while maintaining consistency between reasoning traces and code implementations. Our key finding is that fine-tuning LLMs on this dataset yields consistent improvements on coding benchmarks. Beyond raw accuracy, reasoning-aware data can substitute for model scaling, generalize across architectures, and outperform leading open-source alternatives under identical sample budgets. Our work establishes reasoning-centered synthetic data generation as an efficient approach for advancing coding capabilities in LLMs. We publish our dataset and generation pipeline to facilitate further research.
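
As a rough illustration of what an instruction-reasoning-code-test quadruplet and an execution-based check might look like, here is a minimal sketch. The `Sample` fields mirror the four parts named in the abstract, but the class, the function name, and the single-stage check are assumptions; the paper's actual multi-stage validation is not described in enough detail here to reproduce.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    instruction: str  # task description
    reasoning: str    # step-by-step reasoning trace
    code: str         # candidate solution source code
    tests: str        # executable assertions against the solution

def passes_execution_check(sample: Sample) -> bool:
    """Run the solution, then its tests, in a fresh namespace.

    NOTE: exec on untrusted code is unsafe; a real pipeline would need a
    sandbox with time and memory limits. This is only a sketch.
    """
    namespace = {}
    try:
        exec(sample.code, namespace)
        exec(sample.tests, namespace)
    except Exception:
        return False
    return True
```

A sample whose tests raise (including a failed `assert`) or whose code does not run would be filtered out, which is the basic effect of execution-based validation.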

[AI-33] Guiding Skill Discovery with Foundation Models

Link: https://arxiv.org/abs/2510.23167
Authors: Zhao Yang,Thomas M. Moerland,Mike Preuss,Aske Plaat,Vincent François-Lavet,Edward S. Hu
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments:

[AI-34] Enabling Vibration-Based Gesture Recognition on Everyday Furniture via Energy-Efficient FPGA Implementation of 1D Convolutional Networks

Link: https://arxiv.org/abs/2510.23156
Authors: Koki Shibata,Tianheng Ling,Chao Qian,Tomokazu Matsui,Hirohiko Suwa,Keiichi Yasumoto,Gregor Schiele
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 9 pages, 5 figures, 5 tables, accepted by 2025 IEEE Annual Congress on Artificial Intelligence of Things (IEEE AIoT)

[AI-35] Adapting Interleaved Encoders with PPO for Language-Guided Reinforcement Learning in BabyAI

Link: https://arxiv.org/abs/2510.23148
Authors: Aryan Mathur,Asaduddin Ahmed
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Comments: Undergraduate research project, IIT Palakkad, 2025

[AI-36] Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs

Link: https://arxiv.org/abs/2510.23127
Authors: Kai Zhuang,Jiawei Zhang,Yumou Liu,Hanqun Cao,Chunbin Gu,Mengdi Liu,Zhangyang Gao,Zitong Jerry Wang,Xuanhe Zhou,Pheng-Ann Heng,Lijun Wu,Conghui He,Cheng Tan
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: 36 pages, under review

[AI-37] GroupSHAP-Guided Integration of Financial News Keywords and Technical Indicators for Stock Price Prediction

Link: https://arxiv.org/abs/2510.23112
Authors: Minjoo Kim,Jinwoong Kim,Sangjin Park
Affiliations: Unknown
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI)
Comments: 6 pages

[AI-38] Smaller Models Smarter Rewards: A Two-Sided Approach to Process and Outcome Rewards NEURIPS2025

Link: https://arxiv.org/abs/2510.23083
Authors: Jan Niklas Groeneveld,Xi Qin,Alexander Schaefer,Yaad Oren
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
Comments: Accepted and to be presented at NeurIPS 2025 Workshop: Foundations of Reasoning in Language Models

[AI-39] Think before Recommendation: Autonomous Reasoning-enhanced Recommender NEURIPS2025

Link: https://arxiv.org/abs/2510.23077
Authors: Xiaoyu Kong,Junguang Jiang,Bin Liu,Ziru Xu,Han Zhu,Jian Xu,Bo Zheng,Jiancan Wu,Xiang Wang
Affiliations: Unknown
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Comments: NeurIPS 2025 poster

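[AI-40] TLCD: A Deep Transfer Learning Framework for Cross-Disciplinary Cognitive Diagnosis

[Quick read]: This paper addresses the challenges that traditional cognitive diagnosis methods face in cross-disciplinary settings, namely complex feature extraction and scarce disciplinary data, especially when knowledge systems, cognitive structures, and data characteristics differ markedly between disciplines. The key to the solution is a cross-disciplinary cognitive diagnosis method (TLCD) that combines deep learning with transfer learning, using features shared with the main discipline to improve model performance in the target discipline and thereby assess students' academic level more accurately.

Link: https://arxiv.org/abs/2510.23062
Authors: Zhifeng Wang,Meixin Su,Yang Yang,Chunyan Zeng,Lizhi Ye
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: 10 pages, 8 figures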

Abstract:Driven by the dual principles of smart education and artificial intelligence technology, the online education model has rapidly emerged as an important component of the education industry. Cognitive diagnostic technology can utilize students’ learning data and feedback information in educational evaluation to accurately assess their ability level at the knowledge level. However, while massive amounts of information provide abundant data resources, they also bring about complexity in feature extraction and scarcity of disciplinary data. In cross-disciplinary fields, traditional cognitive diagnostic methods still face many challenges. Given the differences in knowledge systems, cognitive structures, and data characteristics between different disciplines, this paper conducts in-depth research on neural network cognitive diagnosis and knowledge association neural network cognitive diagnosis, and proposes an innovative cross-disciplinary cognitive diagnosis method (TLCD). This method combines deep learning techniques and transfer learning strategies to enhance the performance of the model in the target discipline by utilizing the common features of the main discipline. The experimental results show that the cross-disciplinary cognitive diagnosis model based on deep learning performs better than the basic model in cross-disciplinary cognitive diagnosis tasks, and can more accurately evaluate students’ learning situation.

[AI-41] Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients

Link: https://arxiv.org/abs/2510.23049
Authors: Christos Thrampoulidis,Sadegh Mahdavi,Wenlong Deng
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-42] A Survey of AI Scientists: Surveying the automatic Scientists and Research

Link: https://arxiv.org/abs/2510.23045
Authors: Guiyao Tie,Pan Zhou,Lichao Sun
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: 28 pages, 9 figures, 1 table

[AI-43] LLM Meets Diffusion: A Hybrid Framework for Crystal Material Generation NEURIPS2025

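[Quick read]: This paper addresses a bottleneck in generative crystal material design: existing methods perform unevenly across atom types (discrete features) and atomic coordinates and lattice parameters (continuous features). Large language models (LLMs) are good at generating accurate atomic compositions but struggle to model continuous spatial variables, whereas equivariant denoising models capture structural detail well but have difficulty producing chemically reasonable compositions. The key to the solution is CrysLLMGen, a hybrid framework in which a fine-tuned LLM works together with a pre-trained equivariant diffusion model. During sampling, the LLM first generates an intermediate representation containing atom types, coordinates, and lattice structure; the atom types are then retained while the coordinates and lattice are passed to the diffusion model for refinement, jointly optimizing crystal structure and chemical composition. This strategy exploits the strengths of both model families and improves the structural validity, chemical plausibility, novelty, and stability of the generated materials.

Link: https://arxiv.org/abs/2510.23040
Authors: Subhojyoti Khastagir,Kishalay Das,Pawan Goyal,Seung-Cheol Lee,Satadeep Bhattacharjee,Niloy Ganguly
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI)
Comments: NeurIPS 2025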

Abstract:Recent advances in generative modeling have shown significant promise in designing novel periodic crystal structures. Existing approaches typically rely on either large language models (LLMs) or equivariant denoising models, each with complementary strengths: LLMs excel at handling discrete atomic types but often struggle with continuous features such as atomic positions and lattice parameters, while denoising models are effective at modeling continuous variables but encounter difficulties in generating accurate atomic compositions. To bridge this gap, we propose CrysLLMGen, a hybrid framework that integrates an LLM with a diffusion model to leverage their complementary strengths for crystal material generation. During sampling, CrysLLMGen first employs a fine-tuned LLM to produce an intermediate representation of atom types, atomic coordinates, and lattice structure. While retaining the predicted atom types, it passes the atomic coordinates and lattice structure to a pre-trained equivariant diffusion model for refinement. Our framework outperforms state-of-the-art generative models across several benchmark tasks and datasets. Specifically, CrysLLMGen not only achieves a balanced performance in terms of structural and compositional validity but also generates more stable and novel materials compared to LLM-based and denoising-based models. Furthermore, CrysLLMGen exhibits strong conditional generation capabilities, effectively producing materials that satisfy user-defined constraints. Code is available at this https URL.

[AI-44] A high-capacity linguistic steganography based on entropy-driven rank-token mapping

Link: https://arxiv.org/abs/2510.23035
Authors: Jun Jiang,Weiming Zhang,Nenghai Yu,Kejiang Chen
Affiliations: Unknown
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments:

[AI-45] Efficient and Encrypted Inference using Binarized Neural Networks within In-Memory Computing Architectures

Link: https://arxiv.org/abs/2510.23034
Authors: Gokulnath Rajendran,Suman Deb,Anupam Chattopadhyay
Affiliations: Unknown
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments: to be published in: 7th International Conference on Emerging Electronics (ICEE 2025)

[AI-46] Mixed Density Diffuser: Efficient Planning with Non-uniform Temporal Resolution

Link: https://arxiv.org/abs/2510.23026
Authors: Crimson Stambaugh,Rajesh P. N. Rao
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)
Comments: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN) (under review)

[AI-47] MoEMeta: Mixture-of-Experts Meta Learning for Few-Shot Relational Learning NEURIPS2025

Link: https://arxiv.org/abs/2510.23013
Authors: Han Wu,Jie Yin
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Accepted by NeurIPS 2025

[AI-48] Softmax is 1/2-Lipschitz: A tight bound across all $\ell_p$ norms

Link: https://arxiv.org/abs/2510.23012
Authors: Pravin Nair
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Under review

[AI-49] From Prompt Optimization to Multi-Dimensional Credibility Evaluation: Enhancing Trustworthiness of Chinese LLM-Generated Liver MRI Reports

Link: https://arxiv.org/abs/2510.23008
Authors: Qiuli Wang,Xiaoming Li,Jie Chen,Yongxu Liu,Xingpeng Zhang,Chen Liu,Wei Chen
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: 10 pages, 6 figures, 4 tables

[AI-50] ProfileXAI: User-Adaptive Explainable AI

Link: https://arxiv.org/abs/2510.22998
Authors: Gilber A. Corrales,Carlos Andrés Ferro Sánchez,Reinel Tabares-Soto,Jesús Alfonso López Sotelo,Gonzalo A. Ruz,Johan Sebastian Piña Durán
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: pages, 1 figure, 3 tables. Preprint. Evaluated on UCI Heart Disease (1989) and UCI Differentiated Thyroid Cancer Recurrence (2023). Uses IEEEtran

[AI-51] The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination

Link: https://arxiv.org/abs/2510.22977
Authors: Chenlong Yin,Zeyang Sha,Shiwen Cui,Changhua Meng
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 18 pages, 5 figures

[AI-52] Multi-Agent Conditional Diffusion Model with Mean Field Communication as Wireless Resource Allocation Planner

Link: https://arxiv.org/abs/2510.22969
Authors: Kechen Meng,Sinuo Zhang,Rongpeng Li,Xiangming Meng,Chan Wang,Ming Lei,Zhifeng Zhao
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Comments:

[AI-53] CompressionAttack: Exploiting Prompt Compression as a New Attack Surface in LLM-Powered Agents

Link: https://arxiv.org/abs/2510.22963
Authors: Zesen Liu,Zhixiang Zhang,Yuchong Xie,Dongdong She
Affiliations: Unknown
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments:

[AI-54] Manifold Approximation leads to Robust Kernel Alignment

Link: https://arxiv.org/abs/2510.22953
Authors: Mohammad Tariqul Islam,Du Liu,Deblina Sarkar
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments: 9 pages, 5 figures + supplementary

[AI-55] Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies

Link: https://arxiv.org/abs/2510.22944
Authors: Bin Wang,YiLu Zhong,MiDi Wan,WenJie Yu,YuanBing Ouyang,Yenan Huang,Hui Li
Affiliations: Unknown
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments:

[AI-56] GTR-Mamba: Geometry-to-Tangent Routing for Hyperbolic POI Recommendation ICDE2026

Link: https://arxiv.org/abs/2510.22942
Authors: Zhuoxuan Li,Jieyuan Pei,Tangwei Ye,Zhongyuan Lai,Zihan Liu,Fengyuan Xu,Qi Zhang,Liang Hu
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Comments: 14 pages, 8 figures, 4 tables, submitted to ICDE 2026

[AI-57] Robust Uncertainty Quantification for Self-Evolving Large Language Models via Continual Domain Pretraining

Link: https://arxiv.org/abs/2510.22931
Authors: Xiaofan Zhou,Lu Cheng
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-58] HyPerNav: Hybrid Perception for Object-Oriented Navigation in Unknown Environment

Link: https://arxiv.org/abs/2510.22917
Authors: Zecheng Yin,Hao Zhao,Zhen Li
Affiliations: Unknown
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: under review

[AI-59] Rethinking Inference Placement for Deep Learning across Edge and Cloud Platforms: A Multi-Objective Optimization Perspective and Future Directions

Link: https://arxiv.org/abs/2510.22909
Authors: Zongshun Zhang,Ibrahim Matta
Affiliations: Unknown
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Performance (cs.PF)
Comments:

[AI-60] On Generalization in Agentic Tool Calling: CoreThink Agentic Reasoner and MAVEN Dataset

Link: https://arxiv.org/abs/2510.22898
Authors: Vishvesh Bhat,Omkar Ghugarkar,Julian McAuley
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Comments: Preprint

[AI-61] Exploring Structures of Inferential Mechanisms through Simplistic Digital Circuits ECAI2025

Link: https://arxiv.org/abs/2510.22883
Authors: Giovanni Sileno,Jean-Louis Dessalles
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: paper presented at the 10th AIC workshop (AI cognition) at ECAI 2025

[AI-62] Learning Reconfigurable Representations for Multimodal Federated Learning with Missing Data NEURIPS2025

Link: https://arxiv.org/abs/2510.22880
Authors: Duong M. Nguyen,Trong Nghia Hoang,Thanh Trung Huynh,Quoc Viet Hung Nguyen,Phi Le Nguyen
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Accepted at NeurIPS 2025

[AI-63] Long-Term PM2.5 Forecasting Using a DTW-Enhanced CNN-GRU Model

Link: https://arxiv.org/abs/2510.22863
Authors: Amirali Ataee Naeini,Arshia Ataee Naeini,Fatemeh Karami Mohammadi,Omid Ghaffarpasand
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 26 pages

[AI-64] Guardian: Decoupling Exploration from Safety in Reinforcement Learning

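[Quick read]: This paper addresses training instability in hybrid offline–online reinforcement learning (O2O RL) caused by distribution shift between offline and online data. The core solution is the RLPD-GX framework, whose key idea is to decouple policy optimization from safety enforcement: a reward-seeking learner explores freely, while a projection-based guardian ensures that executed actions are rule-consistent and that value backups are safe. This design preserves the exploratory value of online interaction without collapsing to conservative policies. It is combined with dynamic curricula that gradually extend temporal horizons and anneal the offline–online data mixing ratio. Convergence is proved via a contraction property of the guarded Bellman operator, and the method achieves clear performance gains and stronger stability on Atari-100k and other tasks.

Link: https://arxiv.org/abs/2510.22859
Authors: Kaitong Cai,Jusheng Zhang,Jing Yang,Keze Wang
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: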

Abstract:Hybrid offline–online reinforcement learning (O2O RL) promises both sample efficiency and robust exploration, but suffers from instability due to distribution shift between offline and online data. We introduce RLPD-GX, a framework that decouples policy optimization from safety enforcement: a reward-seeking learner explores freely, while a projection-based guardian guarantees rule-consistent execution and safe value backups. This design preserves the exploratory value of online interactions without collapsing to conservative policies. To further stabilize training, we propose dynamic curricula that gradually extend temporal horizons and anneal offline–online data mixing. We prove convergence via a contraction property of the guarded Bellman operator, and empirically show state-of-the-art performance on Atari-100k, achieving a normalized mean score of 3.02 (+45% over prior hybrid methods) with stronger safety and stability. Beyond Atari, ablations demonstrate consistent gains across safety-critical and long-horizon tasks, underscoring the generality of our design. Extensive and comprehensive results highlight decoupled safety enforcement as a simple yet principled route to robust O2O RL, suggesting a broader paradigm for reconciling exploration and safety in reinforcement learning.
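
The decoupling described in the abstract can be illustrated with a minimal sketch. It assumes a discrete action space and a rule expressed as a set of allowed actions; the names `guard_action` and `guarded_backup` and the tabular Q-values are hypothetical and are not taken from RLPD-GX, whose actual projection may differ.

```python
from typing import Dict, Set


def guard_action(proposed: int, q_values: Dict[int, float], allowed: Set[int]) -> int:
    """Return the learner's action if it is allowed, else the best allowed one."""
    if proposed in allowed:
        return proposed
    # Assumed form of "projection": fall back to the highest-valued allowed action.
    return max(allowed, key=lambda a: q_values[a])


def guarded_backup(reward: float, next_q: Dict[int, float],
                   next_allowed: Set[int], gamma: float = 0.99) -> float:
    """One-step Bellman target that maximizes only over allowed next actions."""
    return reward + gamma * max(next_q[a] for a in next_allowed)


q = {0: 1.0, 1: 5.0, 2: 3.0}
allowed = {0, 2}                    # action 1 violates a (hypothetical) rule
greedy = max(q, key=q.get)          # the reward-seeking learner proposes action 1
executed = guard_action(greedy, q, allowed)
target = guarded_backup(reward=1.0, next_q=q, next_allowed=allowed, gamma=0.5)
```

The learner's greedy choice is left untouched; only the executed action and the value target pass through the guard, which is the separation the abstract emphasizes.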

[AI-65] Encoder-Decoder Diffusion Language Models for Efficient Training and Inference NEURIPS2025

Link: https://arxiv.org/abs/2510.22852
Authors: Marianne Arriola,Yair Schiff,Hao Phung,Aaron Gokaslan,Volodymyr Kuleshov
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: NeurIPS 2025. We provide the code at this https URL

[AI-66] Lyapunov Function-guided Reinforcement Learning for Flight Control

Link: https://arxiv.org/abs/2510.22840
Authors: Yifei Li,Erik-Jan van Kampen
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-67] Rethinking the Text-Vision Reasoning Imbalance in MLLMs through the Lens of Training Recipes

Link: https://arxiv.org/abs/2510.22836
Authors: Guanyu Yao,Qiucheng Wu,Yang Zhang,Zhaowen Wang,Handong Zhao,Shiyu Chang
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-68] Toward Agents That Reason About Their Computation

Link: https://arxiv.org/abs/2510.22833
Authors: Adrian Orenstein,Jessica Chen,Gwyneth Anne Delos Santos,Bayley Sapara,Michael Bowling
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

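[AI-69] HRM-Agent: Training a recurrent reasoning model in dynamic environments using reinforcement learning

[Quick read]: This paper addresses the inability of the Hierarchical Reasoning Model (HRM) to make effective use of earlier computation in dynamic, uncertain, or partially observable environments, as well as its limited applicability in real-world settings where the correct action is not clearly defined. The key to the solution is HRM-Agent, an HRM variant trained using only reinforcement learning, which can handle goal-navigation tasks in dynamic and uncertain environments. Analysis of its recurrent inference process shows that the mechanism successfully reuses computation from earlier environment time steps, enabling efficient and adaptive decision-making.

Link: https://arxiv.org/abs/2510.22832
Authors: Long H Dang,David Rawlinson
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments: 14 pages, 9 figures, 1 table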

Abstract:The Hierarchical Reasoning Model (HRM) has impressive reasoning abilities given its small size, but has only been applied to supervised, static, fully-observable problems. One of HRM’s strengths is its ability to adapt its computational effort to the difficulty of the problem. However, in its current form it cannot integrate and reuse computation from previous time-steps if the problem is dynamic, uncertain or partially observable, or be applied where the correct action is undefined, characteristics of many real-world problems. This paper presents HRM-Agent, a variant of HRM trained using only reinforcement learning. We show that HRM can learn to navigate to goals in dynamic and uncertain maze environments. Recent work suggests that HRM’s reasoning abilities stem from its recurrent inference process. We explore the dynamics of the recurrent inference process and find evidence that it is successfully reusing computation from earlier environment time-steps.

MSC classes: 68T07 (Primary) 62M45, 37N99 (Secondary)
ACM classes: I.2.6; I.2.8

[AI-70] Air Quality Prediction Using LOESS-ARIMA and Multi-Scale CNN-BiLSTM with Residual-Gated Attention

Link: https://arxiv.org/abs/2510.22818
Authors: Soham Pahari,Sandeep Chand Kumain
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-71] Will Humanity Be Rendered Obsolete by AI?

Link: https://arxiv.org/abs/2510.22814
Authors: Mohamed El Louadi,Emna Ben Romdhane
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-72] A Theory of the Mechanics of Information: Generalization Through Measurement of Uncertainty (Learning is Measuring)

Link: https://arxiv.org/abs/2510.22809
Authors: Christopher J. Hazard,Michael Resnick,Jacob Beel,Jack Xia,Cade Mack,Dominic Glennie,Matthew Fulp,David Maze,Andrew Bassett,Martin Koistinen
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Machine Learning (stat.ML)
Comments: 117 pages

[AI-73] Collaborative LLM Agents for C4 Software Architecture Design Automation

Link: https://arxiv.org/abs/2510.22787
Authors: Kamil Szczepanik,Jarosław A. Chudziak
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: This paper has been accepted for the upcoming 59th Hawaii International Conference on System Sciences (HICSS-59), 2026, Hawaii, USA. The final published version will appear in the official conference proceedings

[AI-74] PIP-LLM: Integrating PDDL-Integer Programming with LLMs for Coordinating Multi-Robot Teams Using Natural Language

Link: https://arxiv.org/abs/2510.22784
Authors: Guangyao Shi,Yuwei Wu,Vijay Kumar,Gaurav S. Sukhatme
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments:

[AI-75] Agentic Meta-Orchestrator for Multi-task Copilots

Link: https://arxiv.org/abs/2510.22781
Authors: Xiaofeng Zhu,Yunshen Zhou
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-76] Jarvis: Towards Personalized AI Assistant via Personal KV-Cache Retrieval

Link: https://arxiv.org/abs/2510.22765
Authors: Binxiao Xu,Junyu Feng,Ruichuan An,Yulin Luo,Shilin Yan,Hao Liang,Ming Lu,Wentao Zhang
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 19 pages, 7 figures

[AI-77] Policies over Poses: Reinforcement Learning based Distributed Pose-Graph Optimization for Multi-Robot SLAM

Link: https://arxiv.org/abs/2510.22740
Authors: Sai Krishna Ghanta,Ramviyas Parasuraman
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Comments: IEEE International Symposium on Multi-Robot Multi-Agent Systems (MRS) 2025

[AI-78] Step2Motion: Locomotion Reconstruction from Pressure Sensing Insoles

Link: https://arxiv.org/abs/2510.22712
Authors: Jose Luis Ponton,Eduardo Alvarado,Lin Geng Foo,Nuria Pelechano,Carlos Andujar,Marc Habermann
Affiliation: Unknown
Categories: Graphics (cs.GR); Artificial Intelligence (cs.AI)
Comments:

[AI-79] RaCoT: Plug-and-Play Contrastive Example Generation Mechanism for Enhanced LLM Reasoning Reliability

Link: https://arxiv.org/abs/2510.22710
Authors: Kaitong Cai,Jusheng Zhang,Yijia Fan,Jing Yang,Keze Wang
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-80] FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning

Link: https://arxiv.org/abs/2510.22686
Authors: Shan Zhong,Shutong Ding,He Diao,Xiangyu Wang,Kah Chan Teh,Bei Peng
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-81] Uncertainty-Aware Autonomous Vehicles: Predicting the Road Ahead

Link: https://arxiv.org/abs/2510.22680
Authors: Shireen Kudukkil Manchingal,Armand Amaritei,Mihir Gohad,Maryam Sultana,Julian F. P. Kooij,Fabio Cuzzolin,Andrew Bradley
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments:

[AI-82] Learning Without Augmenting: Unsupervised Time Series Representation Learning via Frame Projections NEURIPS

Link: https://arxiv.org/abs/2510.22655
Authors: Berken Utku Demirel,Christian Holz
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Published at the Conference on Neural Information Processing Systems (NeurIPS) 2025

[AI-83] Variational Polya Tree

Link: https://arxiv.org/abs/2510.22651
Authors: Lu Xu,Tsai Hor Chan,Kwok Fai Lam,Lequan Yu,Guosheng Yin
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-84] Enhancing Graph Classification Robustness with Singular Pooling NEURIPS2025

Link: https://arxiv.org/abs/2510.22643
Authors: Sofiane Ennadir,Oleg Smirnov,Yassine Abbahaddou,Lele Cao,Johannes F. Lutzeyer
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Accepted at NeurIPS 2025

[AI-85] FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference AACL2025

Link: https://arxiv.org/abs/2510.22641
Authors: Divya Jyoti Bajpai,Manjesh Kumar Hanawal
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Accepted for presentation at the main Conference IJCNLP-AACL 2025

[AI-86] Sentra-Guard: A Multilingual Human-AI Framework for Real-Time Defense Against Adversarial LLM Jailbreaks

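[Quick read]: This paper addresses jailbreak attacks and prompt injection attacks against large language models (LLMs), which can induce a model to generate harmful or unintended content. The key to the solution is Sentra-Guard, a real-time modular defense system whose core innovation is a classifier-retriever fusion module. The module combines FAISS-indexed SBERT embeddings with fine-tuned Transformer classifiers to compute context-aware risk scores dynamically, allowing it to identify malicious prompts in both direct and obfuscated form. The system is also multilingually robust: a language-agnostic preprocessing layer enables uniform semantic evaluation across more than 100 languages, and a human-in-the-loop (HITL) feedback mechanism continually refines a dual-labeled knowledge base, significantly improving detection accuracy (AUC = 1.00, F1 = 1.00), reducing false positives, and outperforming leading baselines.

Link: https://arxiv.org/abs/2510.22628
Authors: Md. Mehedi Hasan,Ziaur Rahman,Rafid Mostafiz,Md. Abir Hossain
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments: 11 pages, 5 figures. Preprint version under review in the area of Artificial Intelligence (cs.AI)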

Abstract:This paper presents a real-time modular defense system named Sentra-Guard. The system detects and mitigates jailbreak and prompt injection attacks targeting large language models (LLMs). The framework uses a hybrid architecture with FAISS-indexed SBERT embedding representations that capture the semantic meaning of prompts, combined with fine-tuned transformer classifiers, which are machine learning models specialized for distinguishing between benign and adversarial language inputs. It identifies adversarial prompts in both direct and obfuscated attack vectors. A core innovation is the classifier-retriever fusion module, which dynamically computes context-aware risk scores that estimate how likely a prompt is to be adversarial based on its content and context. The framework ensures multilingual resilience with a language-agnostic preprocessing layer. This component automatically translates non-English prompts into English for semantic evaluation, enabling consistent detection across over 100 languages. The system includes a HITL feedback loop, where decisions made by the automated system are reviewed by human experts for continual learning and rapid adaptation under adversarial pressure. Sentra-Guard maintains an evolving dual-labeled knowledge base of benign and malicious prompts, enhancing detection reliability and reducing false positives. Evaluation results show a 99.96% detection rate (AUC = 1.00, F1 = 1.00) and an attack success rate (ASR) of only 0.004%. This outperforms leading baselines such as LlamaGuard-2 (1.3%) and OpenAI Moderation (3.7%). Unlike black-box approaches, Sentra-Guard is transparent, fine-tunable, and compatible with diverse LLM backends. Its modular design supports scalable deployment in both commercial and open-source environments. The system establishes a new state-of-the-art in adversarial LLM defense.
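
The abstract does not give the formula used by the classifier-retriever fusion module, so the following is only a sketch of one plausible shape. It assumes a convex combination of a classifier probability and a nearest-neighbor vote over a dual-labeled knowledge base; brute-force cosine search stands in for FAISS, and toy 2-D vectors stand in for SBERT embeddings. All names (`retrieval_risk`, `fused_risk`, `alpha`) are hypothetical.

```python
import math
from typing import List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieval_risk(query: List[float],
                   knowledge_base: List[Tuple[List[float], int]],
                   k: int = 3) -> float:
    """Fraction of the k nearest stored prompts labeled malicious (label 1)."""
    ranked = sorted(knowledge_base, key=lambda item: cosine(query, item[0]), reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


def fused_risk(classifier_prob: float, retrieved: float, alpha: float = 0.5) -> float:
    """Assumed fusion rule: convex combination of the two risk signals."""
    return alpha * classifier_prob + (1.0 - alpha) * retrieved


# Toy knowledge base: (embedding, label) with 1 = malicious, 0 = benign.
kb = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
neighbors = retrieval_risk([1.0, 0.05], kb, k=2)
score = fused_risk(classifier_prob=0.8, retrieved=neighbors)
```

A prompt would then be flagged when `score` exceeds a chosen threshold; both the threshold and `alpha` would need tuning on held-out data.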

[AI-87] SwiftSolve: A Self-Iterative Complexity-Aware Multi-Agent Framework for Competitive Programming

Link: https://arxiv.org/abs/2510.22626
Authors: Adhyayan Veer Singh,Aaron Shen,Brian Law,Ahmed Ismail,Jonas Rohweder,Sean O’Brien,Kevin Zhu
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-88] Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents

Link: https://arxiv.org/abs/2510.22620
Authors: Julia Bazinska,Max Mathys,Francesco Casucci,Mateo Rojas-Carulla,Xander Davies,Alexandra Souly,Niklas Pfister
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: Julia Bazinska and Max Mathys contributed equally

[AI-89] Does In-IDE Calibration of Large Language Models work at Scale?

Link: https://arxiv.org/abs/2510.22614
Authors: Roham Koohestani,Agnia Sergeyuk,David Gros,Claudio Spiess,Sergey Titov,Prem Devanbu,Maliheh Izadi
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: Under Review

[AI-90] CLIN-LLM: A Safety-Constrained Hybrid Framework for Clinical Diagnosis and Treatment Generation

Link: https://arxiv.org/abs/2510.22609
Authors: Md. Mehedi Hasan,Rafid Mostafiz,Md. Abir Hossain,Bikash Kumar Paul
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 13 pages, 9 figures. Preprint version under review in the area of Artificial Intelligence (cs.CR)

[AI-91] RoGER-SLAM: A Robust Gaussian Splatting SLAM System for Noisy and Low-light Environment Resilience

Link: https://arxiv.org/abs/2510.22600
Authors: Huilin Yin,Zhaolin Yang,Linchuan Zhang,Gerhard Rigoll,Johannes Betz
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: 13 pages, 11 figures, under review

[AI-92] A Framework for Quantifying How Pre-Training and Context Benefit In-Context Learning

Link: https://arxiv.org/abs/2510.22594
Authors: Bingqing Song,Jiaxiang Li,Rong Wang,Songtao Lu,Mingyi Hong
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[AI-93] Combining Deep Learning and Explainable AI for Toxicity Prediction of Chemical Compounds

Link: https://arxiv.org/abs/2510.22572
Authors: Eduard Popescu,Adrian Groza,Andreea Cernat
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-94] Curriculum-Based Iterative Self-Play for Scalable Multi-Drone Racing

Link: https://arxiv.org/abs/2510.22570
Authors: Onur Akgün
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
Comments: 13 pages, 5 figures. This paper is currently under review at the journal Engineering Applications of Artificial Intelligence. Supplementary video: this https URL Source code and models: this https URL

[AI-95] SPIRAL: Self-Play Incremental Racing Algorithm for Learning in Multi-Drone Competitions

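[Quick read]: This paper addresses the problem of learning complex behaviors and improving adaptability in multi-agent autonomous drone racing, in particular achieving a gradual progression from basic flight control to advanced cooperative racing strategies in dynamic, challenging environments. The key to the solution is a new method called SPIRAL (Self-Play Incremental Racing Algorithm for Learning). Built on a self-play mechanism, it has drones compete during training against continually improving versions of themselves, which automatically generates competitive challenges of increasing difficulty and pushes the agents to master progressively more complex racing behaviors. The mechanism can be combined with any state-of-the-art Deep Reinforcement Learning (DRL) algorithm, yielding a general, scalable, and self-improving learning framework.

Link: https://arxiv.org/abs/2510.22568
Authors: Onur Akgün
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
Comments: © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works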

Abstract:This paper introduces SPIRAL (Self-Play Incremental Racing Algorithm for Learning), a novel approach for training autonomous drones in multi-agent racing competitions. SPIRAL distinctively employs a self-play mechanism to incrementally cultivate complex racing behaviors within a challenging, dynamic environment. Through this self-play core, drones continuously compete against increasingly proficient versions of themselves, naturally escalating the difficulty of competitive interactions. This progressive learning journey guides agents from mastering fundamental flight control to executing sophisticated cooperative multi-drone racing strategies. Our method is designed for versatility, allowing integration with any state-of-the-art Deep Reinforcement Learning (DRL) algorithms within its self-play framework. Simulations demonstrate the significant advantages of SPIRAL and benchmark the performance of various DRL algorithms operating within it. Consequently, we contribute a versatile, scalable, and self-improving learning framework to the field of autonomous drone racing. SPIRAL’s capacity to autonomously generate appropriate and escalating challenges through its self-play dynamic offers a promising direction for developing robust and adaptive racing strategies in multi-agent environments. This research opens new avenues for enhancing the performance and reliability of autonomous racing drones in increasingly complex and competitive scenarios.

[AI-96] Blockchain Signatures to Ensure Information Integrity and Non-Repudiation in the Digital Era: A comprehensive study

Link: https://arxiv.org/abs/2510.22561
Authors: Kaveri Banerjee,Sajal Saha
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments: 13 Pages, 2 Figures

[AI-97] DDTR: Diffusion Denoising Trace Recovery

Link: https://arxiv.org/abs/2510.22553
Authors: Maximilian Matyash,Avigdor Gal,Arik Senderovich
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-98] Toward Robust Signed Graph Learning through Joint Input-Target Denoising ACM-MM2025

Link: https://arxiv.org/abs/2510.22513
Authors: Junran Wu,Beng Chin Ooi,Ke Xu
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: ACM MM 2025

[AI-99] Transitive RL: Value Learning via Divide and Conquer

Link: https://arxiv.org/abs/2510.22512
Authors: Seohong Park,Aditya Oberai,Pranav Atreya,Sergey Levine
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-100] Accelerating Materials Design via LLM-Guided Evolutionary Search

Link: https://arxiv.org/abs/2510.22503
Authors: Nikhil Abhyankar,Sanchit Kabra,Saaketh Desai,Chandan K. Reddy
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Comments:

[AI-101] Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization

Link: https://arxiv.org/abs/2510.22477
Authors: Yijia Fan,Jusheng Zhang,Jing Yang,Keze Wang
Affiliation: Unknown
Categories: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
Comments:

[AI-102] Backward-Friendly Optimization: Training Large Language Models with Approximate Gradients under Memory Constraints

Link: https://arxiv.org/abs/2510.22467
Authors: Jing Yang,Kaitong Cai,Yijia Fan,Yufeng Yang,Keze Wang
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-103] Learning “Partner-Aware” Collaborators in Multi-Party Collaboration

Link: https://arxiv.org/abs/2510.22462
Authors: Abhijnan Nath,Nikhil Krishnaswamy
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-104] Evaluating Multimodal Large Language Models on Core Music Perception Tasks NEURIPS2025

Link: https://arxiv.org/abs/2510.22455
Authors: Brandon James Carone,Iran R. Roman,Pablo Ripollés
Affiliation: Unknown
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Comments: Accepted to the NeurIPS 2025 Workshop on AI for Music (AI4Music), 16 pages, 1 figure, 3 tables

[AI-105] GraphTOP: Graph Topology-Oriented Prompting for Graph Neural Networks NEURIPS2025

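[Quick read]: This paper addresses the limited performance of current graph prompting methods when adapting pre-trained graph neural networks (GNNs), in particular the fact that existing work concentrates on feature-oriented prompting and overlooks the potential of topology-oriented prompting. The key to the solution is GraphTOP, the first graph topology-oriented prompting framework, which models topology prompting as an edge rewiring problem within multi-hop local subgraphs and relaxes it into a continuous probability space through reparameterization. This gives a tight relaxation while preserving graph sparsity and markedly improves performance on downstream tasks.

Link: https://arxiv.org/abs/2510.22451
Authors: Xingbo Fu,Zhenyu Lei,Zihan Chen,Binchi Zhang,Chuxu Zhang,Jundong Li
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Accepted by the 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025)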

Abstract:Graph Neural Networks (GNNs) have revolutionized the field of graph learning by learning expressive graph representations from massive graph data. As a common pattern to train powerful GNNs, the “pre-training, adaptation” scheme first pre-trains GNNs over unlabeled graph data and subsequently adapts them to specific downstream tasks. In the adaptation phase, graph prompting is an effective strategy that modifies input graph data with learnable prompts while keeping pre-trained GNN models frozen. Typically, existing graph prompting studies mainly focus on feature-oriented methods that apply graph prompts to node features or hidden representations. However, these studies often achieve suboptimal performance, as they consistently overlook the potential of topology-oriented prompting, which adapts pre-trained GNNs by modifying the graph topology. In this study, we conduct a pioneering investigation of graph prompting in terms of graph topology. We propose the first Graph Topology-Oriented Prompting (GraphTOP) framework to effectively adapt pre-trained GNN models for downstream tasks. More specifically, we reformulate topology-oriented prompting as an edge rewiring problem within multi-hop local subgraphs and relax it into the continuous probability space through reparameterization while ensuring tight relaxation and preserving graph sparsity. Extensive experiments on five graph datasets under four pre-training strategies demonstrate that our proposed GraphTOP outshines six baselines on multiple node classification datasets. Our code is available at this https URL.
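
The abstract says the discrete edge rewiring problem is relaxed into a continuous probability space through reparameterization, without naming the distribution. As a sketch only, the block below uses the binary concrete (relaxed Bernoulli) distribution, a common choice for this kind of relaxation; it is an assumption, not GraphTOP's confirmed method, and `relaxed_edge_sample` is a hypothetical name.

```python
import math
import random


def relaxed_edge_sample(logit: float, temperature: float, rng: random.Random) -> float:
    """Draw a soft edge indicator in [0, 1] from a binary concrete distribution."""
    u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)   # keep both logs finite
    logistic_noise = math.log(u) - math.log(1.0 - u)
    x = (logit + logistic_noise) / temperature
    # Numerically stable sigmoid.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


rng = random.Random(0)
# A learnable logit per candidate edge: positive favors keeping the edge.
keep_samples = [relaxed_edge_sample(4.0, 0.5, rng) for _ in range(200)]
drop_samples = [relaxed_edge_sample(-4.0, 0.5, rng) for _ in range(200)]
mean_keep = sum(keep_samples) / len(keep_samples)
mean_drop = sum(drop_samples) / len(drop_samples)
```

Because the sample is a smooth function of `logit`, gradients can flow to the edge logits in an autodiff framework; lowering `temperature` pushes samples toward 0 or 1, approaching a hard rewiring decision.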

[AI-106] SmartMixed: A Two-Phase Training Strategy for Adaptive Activation Function Learning in Neural Networks

Link: https://arxiv.org/abs/2510.22450
Authors: Amin Omidvar
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-107] PromptReverb: Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching

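[Quick read]: This paper addresses two core problems in generating room impulse responses (RIRs) for virtual acoustic environments: the scarcity of full-band RIR datasets, and the difficulty existing models have in generating acoustically accurate RIRs from diverse input modalities such as natural-language descriptions. The key to the solution is PromptReverb, a two-stage generative framework. The first stage uses a variational autoencoder to upsample band-limited RIRs to full-band (48 kHz) quality; the second stage uses a conditional diffusion Transformer based on rectified flow matching to generate high-quality RIRs from natural-language descriptions. The method clearly outperforms existing baselines in perceptual quality and acoustic accuracy, making high-fidelity RIR synthesis more practical.

Link: https://arxiv.org/abs/2510.22439
Authors: Ali Vosoughi,Yongyi Zang,Qihui Yang,Nathan Peak,Randal Leistikow,Chenliang Xu
Affiliation: Unknown
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI)
Comments: 9 pages, 2 figures, 4 tables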

Abstract:Room impulse response (RIR) generation remains a critical challenge for creating immersive virtual acoustic environments. Current methods suffer from two fundamental limitations: the scarcity of full-band RIR datasets and the inability of existing models to generate acoustically accurate responses from diverse input modalities. We present PromptReverb, a two-stage generative framework that addresses these challenges. Our approach combines a variational autoencoder that upsamples band-limited RIRs to full-band quality (48 kHz), and a conditional diffusion transformer model based on rectified flow matching that generates RIRs from descriptions in natural language. Empirical evaluation demonstrates that PromptReverb produces RIRs with superior perceptual quality and acoustic accuracy compared to existing methods, achieving 8.8% mean RT60 error compared to -37% for widely used baselines and yielding more realistic room-acoustic parameters. Our method enables practical applications in virtual reality, architectural acoustics, and audio production where flexible, high-quality RIR synthesis is essential.

[AI-108] Group size effects and collective misalignment in LLM multi-agent systems

Link: https://arxiv.org/abs/2510.22422
Authors: Ariel Flint,Luca Maria Aiello,Romualdo Pastor-Satorras,Andrea Baronchelli
Affiliation: Unknown
Categories: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Physics and Society (physics.soc-ph)
Comments:

[AI-109] Knowledge-guided Continual Learning for Behavioral Analytics Systems

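[Quick read]: This paper addresses the performance degradation that data drift causes in models for analyzing user behavior on online platforms, while avoiding the loss of previously learned knowledge through catastrophic forgetting during continual learning. The key to the solution is an improved replay-based continual learning framework built on data augmentation: historical samples are augmented with an external knowledge base, which eases the capacity limit imposed by a fixed-size buffer and improves the model's adaptability and stability in dynamic environments.

Link: https://arxiv.org/abs/2510.22405
Authors: Yasas Senarath,Hemant Purohit
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: This is a preprint of the accepted author manuscript that has been accepted for publication at IEEE CogMI 2025 - The 7th IEEE International Conference on Cognitive Machine Intelligence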

Abstract:User behavior on online platforms is evolving, reflecting real-world changes in how people post, whether it’s helpful messages or hate speech. Models that learn to capture this content can experience a decrease in performance over time due to data drift, which can lead to ineffective behavioral analytics systems. However, fine-tuning such a model over time with new data can be detrimental due to catastrophic forgetting. Replay-based approaches in continual learning offer a simple yet efficient method to update such models, minimizing forgetting by maintaining a buffer of important training instances from past learned tasks. However, the main limitation of this approach is the fixed size of the buffer. External knowledge bases can be utilized to overcome this limitation through data augmentation. We propose a novel augmentation-based approach to incorporate external knowledge in the replay-based continual learning framework. We evaluate several strategies with three datasets from prior studies related to deviant behavior classification to assess the integration of external knowledge in continual learning and demonstrate that augmentation helps outperform baseline replay-based approaches.
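
To make the fixed-size buffer and the augmentation idea concrete, here is a minimal sketch. It assumes reservoir sampling as the buffer policy (the abstract does not say how instances are selected), and `toy_augment` is a hypothetical stand-in for a lookup in an external knowledge base.

```python
import random
from typing import Callable, List


class ReplayBuffer:
    """Fixed-size buffer filled by reservoir sampling (an assumed policy)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[str] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example: str) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Each seen example stays in the buffer with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def replay(self, augment: Callable[[str], List[str]]) -> List[str]:
        """Return stored examples plus augmented variants of each."""
        out: List[str] = []
        for ex in self.items:
            out.append(ex)
            out.extend(augment(ex))
        return out


def toy_augment(text: str) -> List[str]:
    # Hypothetical knowledge-base lookup: swap words for known synonyms.
    synonyms = {"awful": "terrible"}
    words = text.split()
    swapped = [synonyms.get(w, w) for w in words]
    return [" ".join(swapped)] if swapped != words else []


buffer = ReplayBuffer(capacity=2)
for post in ["awful people", "thanks for helping", "have a nice day"]:
    buffer.add(post)
batch = buffer.replay(toy_augment)
```

The buffer never grows past `capacity`, yet each replayed batch can contain more distinct training instances than the buffer holds, which is how augmentation works around the size limit.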

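[AI-110] Can Small and Reasoning Large Language Models Score Journal Articles for Research Quality and Do Averaging and Few-shot Help?

[Quick read]: This paper examines how effectively small generative AI models can assess the quality of academic journal articles, in particular whether small models (such as 1B and 4B parameters) and reasoning models can match larger models such as ChatGPT 4o-mini and Gemini 2.0 Flash. Its key contribution is empirical evidence that small open-weights models (such as Gemma3, Llama4 Scout, Qwen3, Magistral Small, and DeepSeek R1) have a substantial ability to score article quality even at small parameter scales (especially around 4B). It also finds that averaging the scores from multiple identical queries is a generally effective strategy that markedly improves scoring stability and accuracy, whereas the benefit of few-shot prompting remains unclear.

Link: https://arxiv.org/abs/2510.22389
Authors: Mike Thelwall,Ehsan Mohammadi
Affiliation: Unknown
Categories: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI)
Comments: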

Abstract:Assessing published academic journal articles is a common task for evaluations of departments and individuals. Whilst it is sometimes supported by citation data, Large Language Models (LLMs) may give more useful indications of article quality. Evidence of this capability exists for two of the largest LLM families, ChatGPT and Gemini, and the medium sized LLM Gemma3 27b, but it is unclear whether smaller LLMs and reasoning models have similar abilities. This is important because larger models may be slow and impractical in some situations, and reasoning models may perform differently. Four relevant questions are addressed with Gemma3 variants, Llama4 Scout, Qwen3, Magistral Small and DeepSeek R1, on a dataset of 2,780 medical, health and life science papers in 6 fields, with two different gold standards, one novel. The results suggest that smaller (open weights) and reasoning LLMs have similar performance to ChatGPT 4o-mini and Gemini 2.0 Flash, but that 1b parameters may often, and 4b sometimes, be too few. Moreover, averaging scores from multiple identical queries seems to be a universally successful strategy, and few-shot prompts (four examples) tended to help but the evidence was equivocal. Reasoning models did not have a clear advantage. Overall, the results show, for the first time, that smaller LLMs 4b, including reasoning models, have a substantial capability to score journal articles for research quality, especially if score averaging is used.
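
The score-averaging strategy mentioned in the abstract can be sketched as follows. The noisy scorer is a hypothetical stand-in for an LLM call (true quality plus Gaussian noise), so the numbers illustrate only why repeating an identical query and averaging tends to stabilize scores; they are not results from the paper.

```python
import random
import statistics
from typing import Callable


def averaged_score(score_once: Callable[[], float], repeats: int) -> float:
    """Submit the same query `repeats` times and return the mean score."""
    return statistics.fmean([score_once() for _ in range(repeats)])


def make_noisy_scorer(true_quality: float, noise: float, seed: int) -> Callable[[], float]:
    # Hypothetical stand-in for an LLM: true quality plus random noise per call.
    rng = random.Random(seed)
    return lambda: true_quality + rng.gauss(0.0, noise)


# Score 200 simulated articles of equal quality, once each and ten times each.
single = [averaged_score(make_noisy_scorer(3.0, 1.0, s), 1) for s in range(200)]
averaged = [averaged_score(make_noisy_scorer(3.0, 1.0, s), 10) for s in range(200)]
spread_single = statistics.pstdev(single)
spread_averaged = statistics.pstdev(averaged)
```

Under this independent-noise assumption the spread of the averaged scores shrinks roughly with the square root of the number of repeats, at the cost of proportionally more queries.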

[AI-111] Toward Humanoid Brain-Body Co-design: Joint Optimization of Control and Morphology for Fall Recovery

Link: https://arxiv.org/abs/2510.22336
Authors: Bo Yue,Sheng Xu,Kui Jia,Guiliang Liu
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments:

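[AI-112] LIFT: Interpretable truck driving risk prediction with literature-informed fine-tuned LLMs

[Quick read]: This paper addresses the lack of interpretability in models for predicting truck driving risk, especially the fact that, without guidance from domain knowledge, existing large language models (LLMs) struggle to give explanations that are reliable and consistent with traffic engineering practice. The key to the solution is a literature-informed fine-tuned (LIFT) framework, which uses a literature processing pipeline to build a domain knowledge base automatically and fine-tunes the LLM on that basis, markedly improving the interpretability of the model's output while keeping prediction performance high. Experiments show that the LIFT LLM outperforms benchmark models by 26.7% in recall and 10.1% in F1-score, produces a variable importance ranking consistent with that of the benchmark model, identifies potentially risky scenarios, and is robust to different data sampling conditions.

Link: https://arxiv.org/abs/2510.22333
Authors: Xiao Hu,Yuansheng Lian,Ke Zhang,Yunxuan Li,Yuelong Su,Meng Li
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: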

Abstract:This study proposes an interpretable prediction framework with literature-informed fine-tuned (LIFT) LLMs for truck driving risk prediction. The framework integrates an LLM-driven Inference Core that predicts and explains truck driving risk, a Literature Processing Pipeline that filters and summarizes domain-specific literature into a literature knowledge base, and a Result Evaluator that evaluates the prediction performance as well as the interpretability of the LIFT LLM. After fine-tuning on a real-world truck driving risk dataset, the LIFT LLM achieved accurate risk prediction, outperforming benchmark models by 26.7% in recall and 10.1% in F1-score. Furthermore, guided by the literature knowledge base automatically constructed from 299 domain papers, the LIFT LLM produced variable importance ranking consistent with that derived from the benchmark model, while demonstrating robustness in interpretation results to various data sampling conditions. The LIFT LLM also identified potential risky scenarios by detecting key combination of variables in truck driving risk, which were verified by PERMANOVA tests. Finally, we demonstrated the contribution of the literature knowledge base and the fine-tuning process in the interpretability of the LIFT LLM, and discussed the potential of the LIFT LLM in data-driven knowledge discovery.

[AI-113] Graph-Coarsening Approach for the Capacitated Vehicle Routing Problem with Time Windows

Link: https://arxiv.org/abs/2510.22329
Authors: Mustafa Mert Özyılmaz
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Comments: 13 pages, 30 figures. Submitted to arXiv under categories quant-ph. A revised version with quantum solver experiment results will be submitted to a peer-reviewed journal

[AI-114] Harnessing the Power of Large Language Models for Software Testing Education: A Focus on ISTQB Syllabus

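[Quick read]: This paper examines how generative AI can be combined with the International Software Testing Qualifications Board (ISTQB) certification framework to improve the teaching of software testing in higher education. The core problem is that, although ISTQB certification is widely used in industry and academia, its teaching methods have not yet made effective use of recent advances in large language models (LLMs). The key to the solution is to build an ISTQB-aligned dataset spanning a decade with 28 sample exams and 1,145 questions, develop a domain-optimized prompt, systematically evaluate mainstream LLMs on the task, and offer actionable recommendations for integrating them into teaching, providing an empirical basis and a practical path for using LLMs in software engineering education.

Link: https://arxiv.org/abs/2510.22318
Authors: Tuan-Phong Ngo,Bao-Ngoc Duong,Tuan-Anh Hoang,Joshua Dwight,Ushik Shrestha Khwakhali
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: 7 pages, 3 figures, 3 tables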

Abstract:Software testing is a critical component in the software engineering field and is important for software engineering education. Thus, it is vital for academia to continuously improve and update educational methods to reflect the current state of the field. The International Software Testing Qualifications Board (ISTQB) certification framework is globally recognized and widely adopted in industry and academia. However, ISTQB-based learning has been rarely applied with recent generative artificial intelligence advances. Despite the growing capabilities of large language models (LLMs), ISTQB-based learning and instruction with LLMs have not been thoroughly explored. This paper explores and evaluates how LLMs can complement the ISTQB framework for higher education. The findings present four key contributions: (i) the creation of a comprehensive ISTQB-aligned dataset spanning over a decade, consisting of 28 sample exams and 1,145 questions; (ii) the development of a domain-optimized prompt that enhances LLM precision and explanation quality on ISTQB tasks; (iii) a systematic evaluation of state-of-the-art LLMs on this dataset; and (iv) actionable insights and recommendations for integrating LLMs into software testing education. These findings highlight the promise of LLMs in supporting ISTQB certification preparation and offer a foundation for their broader use in software engineering at higher education.

[AI-115] LacMaterial: Large Language Models as Analogical Chemists for Materials Discovery

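[Quick read]: This paper addresses the way limited domain expertise and surface-level biases constrain deep, structure-driven analogical reasoning in scientific discovery, which makes it hard to move beyond conventional compositional spaces in cross-disciplinary settings such as battery materials design. The key to the solution is to exploit the cross-domain training data of large language models (LLMs) through two explicit analogical reasoning strategies: (1) retrieving cross-domain analogs and analogy-guided exemplars to widen exploration beyond conventional dopant substitutions, and (2) constructing in-domain analogical templates from a few labeled examples to guide targeted exploitation. Both strategies yield candidate materials outside established compositional spaces and outperform standard prompting baselines, positioning LLMs as interpretable, expert-like hypothesis generators that drive scientific innovation through analogy-based generalization.

Link: https://arxiv.org/abs/2510.22312
Authors: Hongyu Guo
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: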

Abstract:Analogical reasoning, the transfer of relational structures across contexts (e.g., planet is to sun as electron is to nucleus), is fundamental to scientific discovery. Yet human insight is often constrained by domain expertise and surface-level biases, limiting access to deeper, structure-driven analogies both within and across disciplines. Large language models (LLMs), trained on vast cross-domain data, present a promising yet underexplored tool for analogical reasoning in science. Here, we demonstrate that LLMs can generate novel battery materials by (1) retrieving cross-domain analogs and analogy-guided exemplars to steer exploration beyond conventional dopant substitutions, and (2) constructing in-domain analogical templates from few labeled examples to guide targeted exploitation. These explicit analogical reasoning strategies yield candidates outside established compositional spaces and outperform standard prompting baselines. Our findings position LLMs as interpretable, expert-like hypothesis generators that leverage analogy-driven generalization for scientific innovation.

[AI-116] AnyECG-Lab: An Exploration Study of Fine-tuning an ECG Foundation Model to Estimate Laboratory Values from Single-Lead ECG Signals

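[Quick read]: This paper addresses delays in obtaining laboratory values for clinical decision-making: conventional methods rely on invasive venous sampling and cannot respond quickly. The study proposes using electrocardiography (ECG), a non-invasive and widely available signal, together with deep learning to estimate a range of laboratory values in real time. The key to the solution is a transfer learning strategy that fine-tunes ECGFounder, a large-scale pre-trained ECG foundation model, on Stanford's MC-MED dataset, using a corpus of more than 20 million standardized ten-second ECG segments to increase sensitivity to subtle biochemical correlates. On internal validation, the model reaches an AUC above 0.65 for 33 laboratory indicators, between 0.55 and 0.65 for 59, and below 0.55 for 16.

Link: https://arxiv.org/abs/2510.22301
Authors: Yujie Xiao, Gongzhen Tang, Wenhui Liu, Jun Li, Guangkun Nie, Zhuoran Kan, Deyun Zhang, Qinghao Zhao, Shenda Hong
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: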

Abstract:Timely access to laboratory values is critical for clinical decision-making, yet current approaches rely on invasive venous sampling and are intrinsically delayed. Electrocardiography (ECG), as a non-invasive and widely available signal, offers a promising modality for rapid laboratory estimation. Recent progress in deep learning has enabled the extraction of latent hematological signatures from ECGs. However, existing models are constrained by low signal-to-noise ratios, substantial inter-individual variability, limited data diversity, and suboptimal generalization, especially when adapted to low-lead wearable devices. In this work, we conduct an exploratory study leveraging transfer learning to fine-tune ECGFounder, a large-scale pre-trained ECG foundation model, on the Multimodal Clinical Monitoring in the Emergency Department (MC-MED) dataset from Stanford. We generated a corpus of more than 20 million standardized ten-second ECG segments to enhance sensitivity to subtle biochemical correlates. On internal validation, the model demonstrated strong predictive performance (area under the curve above 0.65) for thirty-three laboratory indicators, moderate performance (between 0.55 and 0.65) for fifty-nine indicators, and limited performance (below 0.55) for sixteen indicators. This study provides an efficient artificial-intelligence driven solution and establishes the feasibility scope for real-time, non-invasive estimation of laboratory values.
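
The three performance bands quoted in the abstract can be expressed as a small bucketing function. This is only an illustrative sketch: the function names, the treatment of the exact boundary values 0.55 and 0.65, and the example indicator names and AUC values are assumptions, not taken from the paper.

```python
from collections import Counter


def performance_tier(auc: float) -> str:
    """Map an AUC value to the band named in the abstract.

    Assumption: values exactly equal to 0.55 or 0.65 fall in "moderate";
    the abstract does not say how boundaries are handled.
    """
    if auc > 0.65:
        return "strong"
    if auc >= 0.55:
        return "moderate"
    return "limited"


def tier_counts(auc_by_indicator: dict) -> Counter:
    """Count how many indicators fall in each band."""
    return Counter(performance_tier(auc) for auc in auc_by_indicator.values())


# Hypothetical indicator names and AUC values, for illustration only.
example = {"indicator_a": 0.71, "indicator_b": 0.60, "indicator_c": 0.52}
print(tier_counts(example))
```

Applied to the paper's per-indicator results, a count like this would reproduce the 33 / 59 / 16 split reported in the abstract.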

[AI-117] Does Homophily Help in Robust Test-time Node Classification?

Link: https://arxiv.org/abs/2510.22289
Authors: Yan Jiang, Ruihong Qiu, Zi Huang
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-118] A Multi-level Analysis of Factors Associated with Student Performance: A Machine Learning Approach to the SAEB Microdata

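[Quick read]: This paper addresses the identification of the key factors influencing student performance in Brazilian basic education, in support of effective public policy. The key to the solution is a multi-level machine learning approach that integrates four data sources: student socioeconomic characteristics, teacher professional profiles, school indicators, and director management profiles. A comparison of four ensemble algorithms finds that a Random Forest model performs best, with 90.2% accuracy and an AUC of 96.7%. A feature-importance analysis using SHAP, an Explainable AI (XAI) technique, then shows that the school's average socioeconomic level is the most important predictor, indicating that academic performance is a systemic phenomenon shaped by the school ecosystem rather than by individual characteristics alone. The finding provides empirical grounding for data-driven, interpretable policies on educational equity.

Link: https://arxiv.org/abs/2510.22266
Authors: Rodrigo Tertulino, Ricardo Almeida
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Comments: This article is being prepared for submission to the International Journal of Educational Technology in Higher Education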

Abstract:Identifying the factors that influence student performance in basic education is a central challenge for formulating effective public policies in Brazil. This study introduces a multi-level machine learning approach to classify the proficiency of 9th-grade and high school students using microdata from the System of Assessment of Basic Education (SAEB). Our model uniquely integrates four data sources: student socioeconomic characteristics, teacher professional profiles, school indicators, and director management profiles. A comparative analysis of four ensemble algorithms confirmed the superiority of a Random Forest model, which achieved 90.2% accuracy and an Area Under the Curve (AUC) of 96.7%. To move beyond prediction, we applied Explainable AI (XAI) using SHAP, which revealed that the school’s average socioeconomic level is the most dominant predictor, demonstrating that systemic factors have a greater impact than individual characteristics in isolation. The primary conclusion is that academic performance is a systemic phenomenon deeply tied to the school’s ecosystem. This study provides a data-driven, interpretable tool to inform policies aimed at promoting educational equity by addressing disparities between schools.

[AI-119] Epistemic Deep Learning: Enabling Machine Learning Models to Know When They Do Not Know

Link: https://arxiv.org/abs/2510.22261
Authors: Shireen Kudukkil Manchingal
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-120] LUNA: Efficient and Topology-Agnostic Foundation Model for EEG Signal Analysis NEURIPS

Link: https://arxiv.org/abs/2510.22257
Authors: Berkay Döner, Thorir Mar Ingolfsson, Luca Benini, Yawei Li
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: NeurIPS camera-ready version, 27 pages, 10 figures, 13 tables

[AI-121] Rational Adversaries and the Maintenance of Fragility: A Game-Theoretic Theory of Rational Stagnation

Link: https://arxiv.org/abs/2510.22232
Authors: Daisuke Hirota
Affiliations: Unknown
Categories: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Theoretical Economics (econ.TH)
Comments:

[AI-122] When Fewer Layers Break More Chains: Layer Pruning Harms Test-Time Scaling in LLMs

Link: https://arxiv.org/abs/2510.22228
Authors: Keyu Wang, Tian Lyu, Guinan Su, Jonas Geiping, Lu Yin, Marco Canini, Shiwei Liu
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

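[AI-123] Taming Silent Failures: A Framework for Verifiable AI Reliability

[Quick read]: This paper addresses the "silent failures" that Artificial Intelligence (AI) introduces into safety-critical systems, where an AI component produces confident but incorrect outputs that conventional detection mechanisms miss, creating safety hazards. The key to the solution is a new framework called FAME (Formal Assurance and Monitoring Environment), which combines the mathematical rigor of offline formal synthesis with the real-time vigilance of online runtime monitoring to build a verifiable safety net around black-box AI modules. In an autonomous vehicle perception system, FAME detected 93.5% of critical safety violations that would otherwise have gone unnoticed. The framework is contextualized within the ISO 26262 and ISO/PAS 8800 standards, offering a certifiable engineering pathway for deploying trustworthy AI.

Link: https://arxiv.org/abs/2510.22224
Authors: Guan-Yan Yang, Farn Wang
Affiliations: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Systems and Control (eess.SY)
Comments: This preprint has been accepted by IEEE Reliability Magazine. 10 pages, 3 figures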

Abstract:The integration of Artificial Intelligence (AI) into safety-critical systems introduces a new reliability paradigm: silent failures, where AI produces confident but incorrect outputs that can be dangerous. This paper introduces the Formal Assurance and Monitoring Environment (FAME), a novel framework that confronts this challenge. FAME synergizes the mathematical rigor of offline formal synthesis with the vigilance of online runtime monitoring to create a verifiable safety net around opaque AI components. We demonstrate its efficacy in an autonomous vehicle perception system, where FAME successfully detected 93.5% of critical safety violations that were otherwise silent. By contextualizing our framework within the ISO 26262 and ISO/PAS 8800 standards, we provide reliability engineers with a practical, certifiable pathway for deploying trustworthy AI. FAME represents a crucial shift from accepting probabilistic performance to enforcing provable safety in next-generation systems.

[AI-124] LSPRAG: LSP-Guided RAG for Language-Agnostic Real-Time Unit Test Generation

Link: https://arxiv.org/abs/2510.22210
Authors: Gwihwan Go, Quan Zhang, Chijin Zhou, Zhao Wei, Yu Jiang
Affiliations: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: 13 pages, 6 figures

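[AI-125] Bridging Perception and Reasoning: Dual-Pipeline Neuro-Symbolic Landing for UAVs in Cluttered Environments

[Quick read]: This paper addresses autonomous UAV landing in unstructured environments (cluttered, uneven, and map-poor), where purely vision-based or deep learning models degrade under covariate shift and offer limited interpretability. The key to the solution is NeuroSymLand, a neuro-symbolic framework that tightly couples two complementary pipelines: an offline pipeline in which large language models (LLMs) and human-in-the-loop refinement synthesize verifiable Scallop logic code that distills generalizable symbolic knowledge, and an online pipeline in which a compact foundation-model-based semantic segmentation model generates probabilistic Scallop facts that are composed into semantic scene graphs for real-time deductive reasoning. The design combines the perceptual strengths of lightweight foundation models with the interpretability and verifiability of symbolic reasoning. Node attributes (such as flatness and area) and edge relations (adjacency, containment, proximity) are computed geometrically rather than learned, which avoids training-data dependence and added latency. The output is a set of ranked Regions of Interest (ROIs) with safety scores and human-readable justifications, and the authors report higher accuracy, stronger robustness, and better efficiency than state-of-the-art baselines.

Link: https://arxiv.org/abs/2510.22204
Authors: Weixian Qian, Sebastian Schroder, Yao Deng, Jiaohong Yao, Linfeng Liang, Xiao Cheng, Richard Han, Xi Zheng
Affiliations: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: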

Abstract:Autonomous landing in unstructured (cluttered, uneven, and map-poor) environments is a core requirement for Unmanned Aerial Vehicles (UAVs), yet purely vision-based or deep learning models often falter under covariate shift and provide limited interpretability. We propose NeuroSymLand, a neuro-symbolic framework that tightly couples two complementary pipelines: (i) an offline pipeline, where Large Language Models (LLMs) and human-in-the-loop refinement synthesize Scallop code from diverse landing scenarios, distilling generalizable and verifiable symbolic knowledge; and (ii) an online pipeline, where a compact foundation-based semantic segmentation model generates probabilistic Scallop facts that are composed into semantic scene graphs for real-time deductive reasoning. This design combines the perceptual strengths of lightweight foundation models with the interpretability and verifiability of symbolic reasoning. Node attributes (e.g., flatness, area) and edge relations (adjacency, containment, proximity) are computed with geometric routines rather than learned, avoiding the data dependence and latency of train-time graph builders. The resulting Scallop program encodes landing principles (avoid water and obstacles; prefer large, flat, accessible regions) and yields calibrated safety scores with ranked Regions of Interest (ROIs) and human-readable justifications. Extensive evaluations across datasets, diverse simulation maps, and real UAV hardware show that NeuroSymLand achieves higher accuracy, stronger robustness to covariate shift, and superior efficiency compared with state-of-the-art baselines, while advancing UAV safety and reliability in emergency response, surveillance, and delivery missions.

[AI-126] Multi-dataset Joint Pre-training of Emotional EEG Enables Generalizable Affective Computing

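[Quick read]: This paper addresses the performance bottleneck in cross-dataset emotion recognition caused by large inter-dataset distribution shifts, inconsistent emotion category definitions, and substantial inter-subject variability, which make existing general-purpose pre-trained EEG models perform poorly on complex tasks such as emotion recognition. The key to the solution is a task-specific multi-dataset joint pre-training framework. It introduces a cross-dataset covariance alignment loss that aligns second-order statistics across datasets, enabling robust generalization without extensive labels or per-subject calibration, and a hybrid encoder that combines a Mamba-like linear attention channel encoder with a spatiotemporal dynamics model to capture the long-term dependencies and complex dynamics of EEG signals. The method outperforms state-of-the-art large-scale EEG models by an average of 4.57% in AUROC on few-shot emotion recognition and by 11.92% in accuracy on zero-shot transfer to a new dataset.

Link: https://arxiv.org/abs/2510.22197
Authors: Qingzhu Zhang, Jiani Zhong, Zongsheng Li, Xinke Shen, Quanying Liu
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
Comments: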

Abstract:Task-specific pre-training is essential when task representations diverge from generic pre-training features. Existing task-general pre-training EEG models struggle with complex tasks like emotion recognition due to mismatches between task-specific features and broad pre-training approaches. This work aims to develop a task-specific multi-dataset joint pre-training framework for cross-dataset emotion recognition, tackling problems of large inter-dataset distribution shifts, inconsistent emotion category definitions, and substantial inter-subject variability. We introduce a cross-dataset covariance alignment loss to align second-order statistical properties across datasets, enabling robust generalization without the need for extensive labels or per-subject calibration. To capture the long-term dependency and complex dynamics of EEG, we propose a hybrid encoder combining a Mamba-like linear attention channel encoder and a spatiotemporal dynamics model. Our method outperforms state-of-the-art large-scale EEG models by an average of 4.57% in AUROC for few-shot emotion recognition and 11.92% in accuracy for zero-shot generalization to a new dataset. Performance scales with the increase of datasets used in pre-training. Multi-dataset joint pre-training achieves a performance gain of 8.55% over single-dataset training. This work provides a scalable framework for task-specific pre-training and highlights its benefit in generalizable affective computing. Our code is available at this https URL.
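
To make "aligning second-order statistical properties across datasets" concrete, here is a minimal sketch of one common way to do it: a CORAL-style penalty on the squared Frobenius distance between the feature covariance matrices of two batches. The abstract does not give the paper's actual loss, so the formula, the `1 / (4 d^2)` scaling, and all function names here are assumptions chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def covariance(features: Matrix) -> Matrix:
    """Unbiased sample covariance (d x d) of an n x d batch of feature vectors."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def covariance_alignment_loss(batch_a: Matrix, batch_b: Matrix) -> float:
    """Squared Frobenius distance between the two covariances, scaled by 1/(4 d^2).

    Assumption: this CORAL-style form stands in for the paper's loss,
    which the abstract does not specify.
    """
    cov_a, cov_b = covariance(batch_a), covariance(batch_b)
    d = len(cov_a)
    squared = sum(
        (cov_a[i][j] - cov_b[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return squared / (4 * d * d)


# Toy feature batches standing in for two EEG datasets (hypothetical values).
dataset_a = [[0.0, 0.0], [2.0, 2.0]]
dataset_b = [[0.0, 0.0], [1.0, 1.0]]
print(covariance_alignment_loss(dataset_a, dataset_b))
```

The loss is zero when both batches share the same covariance, which is the property a pre-training objective of this kind would push the encoder toward; in practice it would be computed on encoder outputs with an autodiff framework rather than on raw lists.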

[AI-127] OptiTree: Hierarchical Thoughts Generation with Tree Search for LLM Optimization Modeling NEURIPS2025

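[Quick read]: This paper addresses the automation of optimization modeling in operations research (OR), in particular the limited modeling accuracy of existing large language model (LLM) approaches, whose fixed-step decomposition struggles with complex mathematical structures. The key to the solution is OptiTree, a tree search method that builds a modeling tree organized by hierarchical problem taxonomy and complexity, adaptively decomposes a complex OR problem into a series of simpler subproblems, and recursively searches the tree to integrate the high-level modeling thoughts stored at each node into global modeling thoughts. Experiments show that the method improves modeling accuracy by more than 10% over the state of the art on challenging benchmarks.

Link: https://arxiv.org/abs/2510.22192
Authors: Haoyang Liu, Jie Wang, Yuyang Cai, Xiongwei Han, Yufei Kuang, Jianye Hao
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: Published at NeurIPS 2025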

Abstract:Optimization modeling is one of the most crucial but technical parts of operations research (OR). To automate the modeling process, existing works have leveraged large language models (LLMs), prompting them to break down tasks into steps for generating variables, constraints, and objectives. However, due to the highly complex mathematical structures inherent in OR problems, standard fixed-step decomposition often fails to achieve high performance. To address this challenge, we introduce OptiTree, a novel tree search approach designed to enhance modeling capabilities for complex problems through adaptive problem decomposition into simpler subproblems. Specifically, we develop a modeling tree that organizes a wide range of OR problems based on their hierarchical problem taxonomy and complexity, with each node representing a problem category and containing relevant high-level modeling thoughts. Given a problem to model, we recurrently search the tree to identify a series of simpler subproblems and synthesize the global modeling thoughts by adaptively integrating the hierarchical thoughts. Experiments show that OptiTree significantly improves the modeling accuracy compared to the state-of-the-art, achieving over 10% improvements on the challenging benchmarks. The code is released at this https URL.

[AI-128] Dopamine-driven synaptic credit assignment in neural networks

Link: https://arxiv.org/abs/2510.22178
Authors: Saranraj Nambusubramaniyan, Shervin Safavi, Raja Guru, Andreas Knoblauch
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[AI-129] Measure what Matters: Psychometric Evaluation of AI with Situational Judgment Tests

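[Quick read]: This paper addresses the limited behavioral realism and domain relevance of current AI psychometric evaluation: existing methods mostly reuse human trait inventories (such as the Big Five and HEXACO) or ad hoc personas, which do not capture emotional judgment and ethical decision-making in complex situations. The key to the solution is a systematic framework that (1) uses situational judgment tests (SJTs) drawn from realistic scenarios to probe domain-specific competencies; (2) draws on industrial-organizational and personality psychology to design high-fidelity personas with behavioral descriptors, psychological traits, life history, and social and emotional functions; and (3) applies structured generation with population demographic priors and memoir-inspired narratives, encoded as Pydantic schemas, to make generated content more controllable and consistent. In a law enforcement assistant case study, the authors build a dataset covering 8 persona archetypes and 11 attributes, comprising 8,500 personas, 4,000 SJTs, and 300,000 responses, with the aim of improving evaluation validity for AI systems in ethically sensitive tasks.

Link: https://arxiv.org/abs/2510.22170
Authors: Alexandra Yost, Shreyans Jain, Shivam Raval, Grant Corser, Allen Roush, Nina Xu, Jacqueline Hammack, Ravid Shwartz-Ziv, Amirali Abdullah
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 49 pages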

Abstract:AI psychometrics evaluates AI systems in roles that traditionally require emotional judgment and ethical consideration. Prior work often reuses human trait inventories (Big Five, HEXACO) or ad hoc personas, limiting behavioral realism and domain relevance. We propose a framework that (1) uses situational judgment tests (SJTs) from realistic scenarios to probe domain-specific competencies; (2) integrates industrial-organizational and personality psychology to design sophisticated personas which include behavioral and psychological descriptors, life history, and social and emotional functions; and (3) employs structured generation with population demographic priors and memoir-inspired narratives, encoded with Pydantic schemas. In a law enforcement assistant case study, we construct a rich dataset of personas drawn across 8 persona archetypes and SJTs across 11 attributes, and analyze behaviors across subpopulation and scenario slices. The dataset spans 8,500 personas, 4,000 SJTs, and 300,000 responses. We will release the dataset and all code to the public.

[AI-130] Solving Continuous Mean Field Games: Deep Reinforcement Learning for Non-Stationary Dynamics NEURIPS2025

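[Quick read]: This paper addresses the limitations of existing deep reinforcement learning (DRL) methods on non-stationary continuous mean field games (MFGs), in particular their limited scalability and density approximation in infinite state spaces and time-varying settings. The key to the solution is twofold. First, building on the Fictitious Play (FP) framework, it uses DRL for best-response computation and supervised learning to represent the average policy. Second, it uses a Conditional Normalizing Flow to learn a representation of the time-dependent population distribution, capturing the system dynamics and improving modeling accuracy for complex multi-agent systems.

Link: https://arxiv.org/abs/2510.22158
Authors: Lorenzo Magnino, Kai Shao, Zida Wu, Jiacheng Shen, Mathieu Laurière
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Optimization and Control (math.OC)
Comments: NeurIPS 2025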

Abstract:Mean field games (MFGs) have emerged as a powerful framework for modeling interactions in large-scale multi-agent systems. Despite recent advancements in reinforcement learning (RL) for MFGs, existing methods are typically limited to finite spaces or stationary models, hindering their applicability to real-world problems. This paper introduces a novel deep reinforcement learning (DRL) algorithm specifically designed for non-stationary continuous MFGs. The proposed approach builds upon a Fictitious Play (FP) methodology, leveraging DRL for best-response computation and supervised learning for average policy representation. Furthermore, it learns a representation of the time-dependent population distribution using a Conditional Normalizing Flow. To validate the effectiveness of our method, we evaluate it on three different examples of increasing complexity. By addressing critical limitations in scalability and density approximation, this work represents a significant advancement in applying DRL techniques to complex MFG problems, bringing the field closer to real-world multi-agent systems.

[AI-131] Controllable Mathematical Reasoning via Self-Optimizing Thought Vectors

Link: https://arxiv.org/abs/2510.22132
Authors: Xuying LI
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-132] Probing Neural Combinatorial Optimization Models NEURIPS2025

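[Quick read]: This paper addresses the opacity of neural combinatorial optimization (NCO) models, whose learned representations and decision rationale lack interpretability, which impedes both academic research and practical deployment. The key to the solution is a new probing tool, Coefficient Significance Probing (CS-Probing), which examines the coefficients and their statistical significance during probing to reveal the properties of NCO representations. Experiments show that NCO models encode low-level information needed for solution construction while also capturing high-level knowledge that supports better decisions. CS-Probing further shows that prevalent NCO models impose different inductive biases on their representations, provides direct evidence related to model generalization, and identifies embedding dimensions associated with specific knowledge, suggesting actionable ways to improve generalization.

Link: https://arxiv.org/abs/2510.22131
Authors: Zhiqin Zhang, Yining Ma, Zhiguang Cao, Hoong Chuin Lau
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 39 pages, 16 figures. Accepted as Spotlight at NeurIPS 2025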

Abstract:Neural combinatorial optimization (NCO) has achieved remarkable performance, yet its learned model representations and decision rationale remain a black box. This impedes both academic research and practical deployment, since researchers and stakeholders require deeper insights into NCO models. In this paper, we take the first critical step towards interpreting NCO models by investigating their representations through various probing tasks. Moreover, we introduce a novel probing tool named Coefficient Significance Probing (CS-Probing) to enable deeper analysis of NCO representations by examining the coefficients and statistical significance during probing. Extensive experiments and analysis reveal that NCO models encode low-level information essential for solution construction, while capturing high-level knowledge to facilitate better decisions. Using CS-Probing, we find that prevalent NCO models impose varying inductive biases on their learned representations, uncover direct evidence related to model generalization, and identify key embedding dimensions associated with specific knowledge. These insights can be potentially translated into practice, for example, with minor code modifications, we improve the generalization of the analyzed model. Our work represents a first systematic attempt to interpret black-box NCO models, showcasing probing as a promising tool for analyzing their internal mechanisms and revealing insights for the NCO community. The source code is publicly available.

[AI-133] Efficient Utility-Preserving Machine Unlearning with Implicit Gradient Surgery

Link: https://arxiv.org/abs/2510.22124
Authors: Shiji Zhou (Institute of Artificial Intelligence, Beihang University, Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University), Tianbai Yu (University of Illinois at Urbana-Champaign), Zhi Zhang (University of Amsterdam), Heng Chang (Tsinghua University), Xiao Zhou (Tsinghua University), Dong Wu (YanTron Technology Co. Ltd), Han Zhao (University of Illinois at Urbana-Champaign)
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Corresponding author: Shiji Zhou (zhoushiji25@buaa. this http URL ). Shiji Zhou and Tianbai Yu contributed equally

[AI-134] When UAV Swarm Meets IRS: Collaborative Secure Communications in Low-altitude Wireless Networks

Link: https://arxiv.org/abs/2510.22117
Authors: Jiahui Li, Xinyue Liang, Geng Sun, Hui Kang, Jiacheng Wang, Dusit Niyato, Shiwen Mao, Abbas Jamalipour
Affiliations: Unknown
Categories: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Comments: 13 pages, 7 figures, submitted to IEEE Journal on Selected Areas in Communications

[AI-135] STAR-RIS-assisted Collaborative Beamforming for Low-altitude Wireless Networks

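[Quick read]: This paper addresses the severe signal attenuation that obstructions cause for low-altitude wireless networks (LAWNs) in dense urban environments, with the goal of improving communication quality and energy efficiency. The core approach introduces collaborative beamforming (CB) among UAVs and the omnidirectional reconfigurable beamforming (ORB) of simultaneous transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) to strengthen signal quality and directionality. To maximize the system transmission rate while minimizing the energy consumption of the UAV swarm, the authors formulate a joint rate and energy optimization problem (JREOP) and design a heterogeneous multi-agent collaborative dynamic (HMCD) optimization framework. Its two key components are a simulated annealing (SA)-based STAR-RIS control method that dynamically adjusts reflection and transmission coefficients, and an improved multi-agent deep reinforcement learning (MADRL) method with a self-attention evaluation mechanism that captures interactions between UAVs and an adaptive velocity transition mechanism that improves training stability.

Link: https://arxiv.org/abs/2510.22108
Authors: Xinyue Liang, Hui Kang, Junwei Che, Jiahui Li, Geng Sun, Qingqing Wu, Jiacheng Wang, Dusit Niyato
Affiliations: Unknown
Categories: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Comments: 13 pages, 9 figures, submitted to IEEE Transactions on Communications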

Abstract:While low-altitude wireless networks (LAWNs) based on uncrewed aerial vehicles (UAVs) offer high mobility, flexibility, and coverage for urban communications, they face severe signal attenuation in dense environments due to obstructions. To address this critical issue, we consider introducing collaborative beamforming (CB) of UAVs and omnidirectional reconfigurable beamforming (ORB) of simultaneous transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) to enhance the signal quality and directionality. On this basis, we formulate a joint rate and energy optimization problem (JREOP) to maximize the transmission rate of the overall system, while minimizing the energy consumption of the UAV swarm. Due to the non-convex and NP-hard nature of JREOP, we propose a heterogeneous multi-agent collaborative dynamic (HMCD) optimization framework, which has two core components. The first component is a simulated annealing (SA)-based STAR-RIS control method, which dynamically optimizes reflection and transmission coefficients to enhance signal propagation. The second component is an improved multi-agent deep reinforcement learning (MADRL) control method, which incorporates a self-attention evaluation mechanism to capture interactions between UAVs and an adaptive velocity transition mechanism to enhance training stability. Simulation results demonstrate that HMCD outperforms various baselines in terms of convergence speed, average transmission rate, and energy consumption. Further analysis reveals that the average transmission rate of the overall system scales positively with both UAV count and STAR-RIS element numbers.
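
For readers unfamiliar with simulated annealing, the sketch below shows the generic accept/reject loop on a toy phase-alignment objective. It is not the paper's STAR-RIS controller: the objective (coherent combining gain over fixed channel phases), the geometric cooling schedule, the Gaussian perturbation, and every name and constant are assumptions made for illustration.

```python
import cmath
import math
import random

# Hypothetical per-element channel phases (radians), for illustration only.
CHANNEL_PHASES = [0.3, 1.7, 2.9, 4.1, 5.6, 0.9, 3.3, 2.2]


def combining_gain(phases):
    """Toy objective: magnitude of the coherent sum after applying phase shifts."""
    return abs(sum(cmath.exp(1j * (p + c)) for p, c in zip(phases, CHANNEL_PHASES)))


def perturb(phases, rng):
    """Propose a neighbor by nudging one randomly chosen phase shift."""
    i = rng.randrange(len(phases))
    candidate = list(phases)
    candidate[i] = (candidate[i] + rng.gauss(0.0, 0.5)) % (2 * math.pi)
    return candidate


def simulated_annealing(objective, initial, neighbor, steps=2000,
                        t_start=1.0, t_end=1e-3, seed=0):
    """Maximize `objective` with a geometric cooling schedule."""
    rng = random.Random(seed)
    current, current_val = initial, objective(initial)
    best, best_val = current, current_val
    for k in range(steps):
        temp = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        candidate = neighbor(current, rng)
        cand_val = objective(candidate)
        delta = cand_val - current_val
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_val = candidate, cand_val
            if current_val > best_val:
                best, best_val = current, current_val
    return best, best_val


start = [0.0] * len(CHANNEL_PHASES)
best_phases, best_gain = simulated_annealing(combining_gain, start, perturb)
print(round(combining_gain(start), 3), round(best_gain, 3))
```

Occasionally accepting worse configurations at high temperature is what lets the search escape local optima of a non-convex objective, which is the usual motivation for choosing SA in problems like the one the abstract describes.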

[AI-136] QuArch: A Benchmark for Evaluating LLM Reasoning in Computer Architecture

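[Quick read]: This paper addresses the absence of systematic evaluation of computer architecture knowledge and reasoning in current large language model (LLM) benchmarks. Existing benchmarks overlook computer architecture, the bridge between high-level software abstractions and low-level hardware implementations, so model reasoning in this domain has not been adequately measured. The key to the solution is QuArch (pronounced "quark"), the first benchmark designed specifically for computer architecture, with 2,671 expert-validated question-answer (QA) pairs covering processor design, memory systems, interconnection networks, and other core areas. The benchmark lets researchers assess LLM performance on higher-order reasoning across analysis, design, and implementation questions, and so identify and advance LLM capabilities relevant to innovation in computing systems.

Link: https://arxiv.org/abs/2510.22087
Authors: Shvetank Prakash, Andrew Cheng, Arya Tschand, Mark Mazumder, Varun Gohil, Jeffrey Ma, Jason Yik, Zishen Wan, Jessica Quaye, Elisavet Lydia Alvanaki, Avinash Kumar, Chandrashis Mazumdar, Tuhin Khare, Alexander Ingare, Ikechukwu Uchendu, Radhika Ghosal, Abhishek Tyagi, Chenyu Wang, Andrea Mattia Garavagno, Sarah Gu, Alice Guo, Grace Hur, Luca Carloni, Tushar Krishna, Ankita Nayak, Amir Yazdanbakhsh, Vijay Janapa Reddi
Affiliations: Unknown
Categories: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
Comments: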

Abstract:The field of computer architecture, which bridges high-level software abstractions and low-level hardware implementations, remains absent from current large language model (LLM) evaluations. To this end, we present QuArch (pronounced ‘quark’), the first benchmark designed to facilitate the development and evaluation of LLM knowledge and reasoning capabilities specifically in computer architecture. QuArch provides a comprehensive collection of 2,671 expert-validated question-answer (QA) pairs covering various aspects of computer architecture, including processor design, memory systems, and interconnection networks. Our evaluation reveals that while frontier models possess domain-specific knowledge, they struggle with skills that require higher-order thinking in computer architecture. Frontier model accuracies vary widely (from 34% to 72%) on these advanced questions, highlighting persistent gaps in architectural reasoning across analysis, design, and implementation QAs. By holistically assessing fundamental skills, QuArch provides a foundation for building and measuring LLM capabilities that can accelerate innovation in computing systems. With over 140 contributors from 40 institutions, this benchmark represents a community effort to set the standard for architectural reasoning in LLM evaluation.

[AI-137] Automatic Assessment of Students Classroom Engagement with Bias Mitigated Multi-task Model

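[Quick read]: This paper addresses the monitoring and improvement of student engagement in online learning, where traditional assessment methods may not transfer directly to virtual classrooms. The core solution is a new training method that applies attribute-orthogonal regularization to a split-model classifier combined with multiple transfer learning strategies, discouraging the model from relying on sensitive features such as gender and thereby reducing disparity in the distribution of predictions across sensitive groups. The method supports ethical compliance and improves the interpretability of model predictions. The reported Pearson correlation coefficient rises from 0.897 for the unmitigated model to 0.999 for the mitigated model.

Link: https://arxiv.org/abs/2510.22057
Authors: James Thiering, Tarun Sethupat Radha Krishna, Dylan Zelkin, Ashis Kumer Biswas
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Comments: 13 pages, 12 figures, and 1 table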

Abstract:With the rise of online and virtual learning, monitoring and enhancing student engagement have become an important aspect of effective education. Traditional methods of assessing a student’s involvement might not be applicable directly to virtual environments. In this study, we focused on this problem and addressed the need to develop an automated system to detect student engagement levels during online learning. We proposed a novel training method which can discourage a model from leveraging sensitive features like gender for its predictions. The proposed method offers benefits not only in the enforcement of ethical standards, but also to enhance interpretability of the model predictions. We applied an attribute-orthogonal regularization technique to a split-model classifier, which uses multiple transfer learning strategies to achieve effective results in reducing disparity in the distribution of prediction for sensitivity groups from a Pearson correlation coefficient of 0.897 for the unmitigated model, to 0.999 for the mitigated model. The source code for this project is available on this https URL .
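
The fairness result above is stated as a Pearson correlation coefficient between prediction distributions for sensitivity groups. As a reminder of what that number measures, here is a minimal sketch of the standard formula. How the paper bins predictions into distributions is not stated in the abstract, so the per-class proportion vectors below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Hypothetical share of predictions per engagement level for two groups.
group_a = [0.10, 0.20, 0.30, 0.40]
group_b = [0.12, 0.18, 0.31, 0.39]
print(round(pearson(group_a, group_b), 3))
```

A value near 1 means the two groups receive predictions in nearly the same proportions, which is why a move from 0.897 to 0.999 is read as reduced disparity.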

[AI-138] Energy-Efficient Domain-Specific Artificial Intelligence Models and Agents: Pathways and Paradigms

Link: https://arxiv.org/abs/2510.22052
Authors: Abhijit Chatterjee, Niraj K. Jha, Jonathan D. Cohen, Thomas L. Griffiths, Hongjing Lu, Diana Marculescu, Ashiqur Rasul, Keshab K. Parhi
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

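[AI-139] Towards Error-Centric Intelligence II: Energy-Structured Causal Models

[Quick read]: This paper addresses the lack of causal interpretability in current machine learning models: despite state-of-the-art predictive performance, their internal representations carry no causal semantics, so specific mechanisms cannot be surgically edited, which hinders understanding and controlled adjustment of system behavior. The key to the solution is a new paradigm of "computational explanations", instantiated with Energy Structured Causal Models (ESCMs), in which mechanisms are expressed as constraints (energy functions or vector fields) rather than explicit input-output maps, so that interventions act by local surgery at the level of mechanisms. The framework provides concrete instantiations of the structural-causal principles LAP and ICM as a formal language for causal reasoning, argues that the fractured, entangled representations produced by empirical risk minimization amount to gauge ambiguity in encoder-energy pairs, and shows that ESCMs recover standard structural causal model (SCM) semantics under mild conditions, supporting a shift from prediction-oriented to explanation-oriented intelligence.

Link: https://arxiv.org/abs/2510.22050
Authors: Marcus Thomas
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: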

Abstract:Contemporary machine learning optimizes for predictive accuracy, yet systems that achieve state of the art performance remain causally opaque: their internal representations provide no principled handle for intervention. We can retrain such models, but we cannot surgically edit specific mechanisms while holding others fixed, because learned latent variables lack causal semantics. We argue for a conceptual reorientation: intelligence is the ability to build and refine explanations, falsifiable claims about manipulable structure that specify what changes and what remains invariant under intervention. Explanations subsume prediction but demand more: causal commitments that can be independently tested and corrected at the level of mechanisms. We introduce computational explanations, mappings from observations to intervention ready causal accounts. We instantiate these explanations with Energy Structured Causal Models (ESCMs), in which mechanisms are expressed as constraints (energy functions or vector fields) rather than explicit input output maps, and interventions act by local surgery on those constraints. This shift makes internal structure manipulable at the level where explanations live: which relations must hold, which can change, and what follows when they do. We provide concrete instantiations of the structural-causal principles LAP and ICM in the ESCM context, and also argue that empirical risk minimization systematically produces fractured, entangled representations, a failure we analyze as gauge ambiguity in encoder energy pairs. Finally, we show that under mild conditions, ESCMs recover standard SCM semantics. Building on Part I’s principles (LAP, ICM, CAP) and its definition of intelligence as explanation-building under criticism, this paper offers a formal language for causal reasoning in systems that aspire to understand, not merely to predict.

[AI-140] HW/SW Co-design of a PCM/PWM converter: a System Level Approach based in the SpecC Methodology

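[Quick Read]: This paper addresses how to efficiently implement a PCM-to-PWM converter, the core module of a Class-D audio amplifier, within system-level hardware/software co-design. Because a pure-hardware solution is too costly and a pure-software solution underperforms, the study models and explores the design with the SpecC methodology to obtain an optimal HW/SW partition. The key to the solution is the use of system-level estimates and fast functional simulation to reach a quantifiable cost/performance trade-off while meeting real-time constraints, avoiding both the high overhead of an all-hardware implementation and the dependence of an all-software implementation on a high-end processor.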

Link: https://arxiv.org/abs/2510.22046
Authors: Daniel G. P. Petrini, Braz Izaias da Silva Junior
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Software Engineering (cs.SE)
Comments: 6

Abstract:We present a case study applying the SpecC methodology within a system-level hardware/software co-design flow to a PCM-to-PWM converter, the core of a Class-D audio amplifier. The converter was modeled and explored with SpecC methodology to derive an HW/SW partition. Using system-level estimates and fast functional simulation, we evaluated mappings that meet real-time constraints while reducing estimated cost of an all-hardware solution and avoiding the expense of a purely software implementation on a high-end processor. Despite the design’s moderate complexity, the results underline the value of system-level co-design for early architectural insight, rapid validation, and actionable cost/performance trade-offs. [Original work from 2005; formatting revised in 2025, with no changes to the results.]

[AI-141] Predictive Coding Enhances Meta-RL To Achieve Interpretable Bayes-Optimal Belief Representation Under Partial Observability NEURIPS

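[Quick Read]: This paper addresses the inefficiency of representation learning in meta-reinforcement learning (Meta-RL) under partial observability: although meta-RL agents can approach Bayes-optimal policies, they struggle to learn compact, interpretable Bayes-optimal belief states, which limits their adaptability and generalization. The key to the solution is embedding self-supervised predictive coding modules into the meta-RL framework, drawing on predictive coding in neuroscience and auxiliary predictive objectives in deep reinforcement learning, to guide the model toward representations closer to Bayes-optimal belief states. Experiments show that this integration markedly improves the interpretability and accuracy of the representations, achieves joint learning of optimal policies and representations in complex tasks that require active information seeking, and ultimately improves generalization.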

Link: https://arxiv.org/abs/2510.22039
Authors: Po-Chen Kuo, Han Hou, Will Dabney, Edgar Y. Walker
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
Comments: Accepted to Annual Conference on Neural Information Processing Systems (NeurIPS) 2025

Abstract: Learning a compact representation of history is critical for planning and generalization in partially observable environments. While meta-reinforcement learning (RL) agents can attain near Bayes-optimal policies, they often fail to learn the compact, interpretable Bayes-optimal belief states. This representational inefficiency potentially limits the agent’s adaptability and generalization capacity. Inspired by predictive coding in neuroscience, which suggests that the brain predicts sensory inputs as a neural implementation of Bayesian inference, and by auxiliary predictive objectives in deep RL, we investigate whether integrating self-supervised predictive coding modules into meta-RL can facilitate learning of Bayes-optimal representations. Through state machine simulation, we show that meta-RL with predictive modules consistently generates more interpretable representations that better approximate Bayes-optimal belief states compared to conventional meta-RL across a wide variety of tasks, even when both achieve optimal policies. In challenging tasks requiring active information seeking, only meta-RL with predictive modules successfully learns optimal representations and policies, whereas conventional meta-RL struggles with inadequate representation learning. Finally, we demonstrate that better representation learning leads to improved generalization. Our results strongly suggest the role of predictive learning as a guiding principle for effective representation learning in agents navigating partial observability.

[AI-142] LLM-AR: LLM-powered Automated Reasoning Framework

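[Quick Read]: This paper addresses the difficulty of adopting large language models (LLMs) in high-stakes decision-making because of their unstable predictive accuracy. The core solution is LLM-AR, an interpretable reasoning framework inspired by neural-symbolic systems that distils LLM-generated heuristics into probabilistic rules executed by the ProbLog automated-reasoning engine, yielding stable and auditable predictions. The key innovation is an iterative policy-evolution mechanism based on association-rule mining that continually refines the prediction rules, reaching 59.5% precision and 8.7% recall on unseen data, 5.9x the precision of the random baseline, while preserving the complete decision path for human review.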

Link: https://arxiv.org/abs/2510.22034
Authors: Rick Chen, Joseph Ternasky, Aaron Ontoyin Yin, Xianling Mu, Fuat Alican, Yigit Ihlamur
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

Abstract: Large language models (LLMs) can already identify patterns and reason effectively, yet their variable accuracy hampers adoption in high-stakes decision-making applications. In this paper, we study this issue from a venture capital perspective by predicting idea-stage startup success based on founder traits. (i) To build a reliable prediction model, we introduce LLM-AR, a pipeline inspired by neural-symbolic systems that distils LLM-generated heuristics into probabilistic rules executed by the ProbLog automated-reasoning engine. (ii) An iterative policy-evolution loop incorporates association-rule mining to progressively refine the prediction rules. On unseen folds, LLM-AR achieves 59.5% precision and 8.7% recall, 5.9x the random baseline precision, while exposing every decision path for human inspection. The framework is interpretable and tunable via hyperparameters, showing promise to extend into other domains.
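The reported figures can be related with a little arithmetic. The sketch below computes precision and recall from confusion counts and derives the random-baseline precision implied by "59.5% precision, 5.9x the random baseline". The confusion counts are hypothetical, chosen only because they reproduce the reported rates; they are not taken from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


def implied_baseline(precision, lift):
    """Precision of the baseline implied by a reported precision and lift factor."""
    return precision / lift


# Hypothetical counts (assumption): 25 correct picks out of 42 predicted
# successes, with 287 true successes in total.
precision, recall = precision_recall(tp=25, fp=17, fn=262)

# Figures reported in the abstract.
baseline = implied_baseline(precision=0.595, lift=5.9)

print(f"precision={precision:.3f} recall={recall:.3f} baseline={baseline:.3f}")
```

Under these assumptions the implied baseline precision is roughly 10%, which would correspond to about one in ten candidates succeeding if the random baseline picks uniformly.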

[AI-143] Differentiable Constraint-Based Causal Discovery

Link: https://arxiv.org/abs/2510.22031
Authors: Jincheng Zhou, Mengbo Wang, Anqi He, Yumeng Zhou, Hessam Olya, Murat Kocaoglu, Bruno Ribeiro
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments:

[AI-144] Online Optimization for Offline Safe Reinforcement Learning NEURIPS2025

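[Quick Read]: This paper tackles Offline Safe Reinforcement Learning (OSRL): learning a reward-maximizing policy from a fixed dataset while satisfying a cumulative cost constraint. The key to the solution is to cast OSRL as a minimax objective and solve it by combining offline RL with online optimization algorithms. The authors prove approximate optimality when the method is integrated with an approximate offline RL oracle and a no-regret online optimization algorithm; they also propose a practical approximation that is compatible with any offline RL algorithm and therefore needs no offline policy evaluation. Experiments on the DSRL benchmark show that the method reliably enforces safety constraints under stringent cost budgets while achieving high rewards.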

Link: https://arxiv.org/abs/2510.22027
Authors: Yassine Chemingui, Aryan Deshwal, Alan Fern, Thanh Nguyen-Tang, Janardhan Rao Doppa
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments: To appear in NeurIPS 2025 Conference

Abstract:We study the problem of Offline Safe Reinforcement Learning (OSRL), where the goal is to learn a reward-maximizing policy from fixed data under a cumulative cost constraint. We propose a novel OSRL approach that frames the problem as a minimax objective and solves it by combining offline RL with online optimization algorithms. We prove the approximate optimality of this approach when integrated with an approximate offline RL oracle and no-regret online optimization. We also present a practical approximation that can be combined with any offline RL algorithm, eliminating the need for offline policy evaluation. Empirical results on the DSRL benchmark demonstrate that our method reliably enforces safety constraints under stringent cost budgets, while achieving high rewards. The code is available at this https URL.
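The abstract does not spell out the minimax objective. One common way to write a constrained RL problem in that form is a Lagrangian, max over policies and min over a multiplier λ ≥ 0 of reward − λ·(cost − budget). The toy sketch below assumes that form: a stand-in "offline RL oracle" picks the best of three hypothetical policies for the current λ, and λ is updated by projected online gradient steps. The policy table, step size, and function names are all illustrative assumptions, not the paper's algorithm.

```python
from typing import NamedTuple


class Policy(NamedTuple):
    name: str
    reward: float  # expected return (hypothetical value)
    cost: float    # expected cumulative cost (hypothetical value)


def offline_rl_oracle(policies, lam):
    """Stand-in oracle: best response to the lambda-penalized reward."""
    return max(policies, key=lambda p: p.reward - lam * p.cost)


def primal_dual(policies, budget, steps=1000, lr=0.05):
    """Alternate oracle calls with projected gradient updates on lambda."""
    lam = 0.0
    total_reward = 0.0
    total_cost = 0.0
    for _ in range(steps):
        pi = offline_rl_oracle(policies, lam)
        total_reward += pi.reward
        total_cost += pi.cost
        # Raise lambda when the budget is violated, lower it otherwise.
        lam = max(0.0, lam + lr * (pi.cost - budget))
    return total_reward / steps, total_cost / steps, lam


CANDIDATES = [
    Policy("aggressive", reward=10.0, cost=8.0),
    Policy("balanced", reward=6.0, cost=2.0),
    Policy("cautious", reward=3.0, cost=0.0),
]

avg_reward, avg_cost, final_lam = primal_dual(CANDIDATES, budget=4.0)
print(avg_reward, avg_cost, final_lam)
```

In this toy setting the iterates alternate between the aggressive and balanced policies, so the averaged (mixture) policy sits close to the cost budget while earning more reward than any single policy that satisfies the budget on its own.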

[AI-145] Normalization in Attention Dynamics NEURIPS2025

Link: https://arxiv.org/abs/2510.22026
Authors: Nikita Karagodin, Shu Ge, Yury Polyanskiy, Philippe Rigollet
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 39th Conference on Neural Information Processing Systems (NeurIPS 2025), 23 pages

[AI-146] LightAgent: Mobile Agentic Foundation Models

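[Quick Read]: This paper addresses the tension between model performance and deployment cost in mobile GUI agent systems: small on-device models (4B or smaller) lack sufficient performance, while capable models (starting from 7B) are either too large to deploy on mobile devices or costly because they depend on cloud services. The key to the solution is LightAgent, a lightweight agentic foundation model architecture based on device-cloud collaboration. Its core components are: two-stage supervised fine-tuning and policy-optimization training (SFT-GRPO) that strengthens the decision-making of Qwen2.5-VL-3B; an efficient long-reasoning mechanism that reuses historical interactions under tight resources; and a real-time complexity assessment that runs tasks on-device by default and escalates only difficult subtasks to the cloud, preserving performance while significantly reducing cloud costs.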

Link: https://arxiv.org/abs/2510.22009
Authors: Yangqin Jiang, Chao Huang
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

Abstract: With the advancement of multimodal large language models (MLLMs), building GUI agent systems has become an increasingly promising direction, especially for mobile platforms, given their rich app ecosystems and intuitive touch interactions. Yet mobile GUI agents face a critical dilemma: truly on-device models (4B or smaller) lack sufficient performance, while capable models (starting from 7B) are either too large for mobile deployment or prohibitively costly (e.g., cloud-only closed-source MLLMs). To resolve this, we propose LightAgent, a mobile agentic foundation model solution that leverages device-cloud collaboration to tap the cost-efficiency of on-device models and the high capability of cloud models, while avoiding their drawbacks. Specifically, LightAgent enhances Qwen2.5-VL-3B via two-stage SFT-GRPO training on synthetic GUI data for strong decision-making, integrates an efficient long-reasoning mechanism to utilize historical interactions under tight resources, and defaults to on-device execution, only escalating challenging subtasks to the cloud via real-time complexity assessment. Experiments on the online AndroidLab benchmark and diverse apps show LightAgent matches or nears larger models, with a significant reduction in cloud costs.

[AI-147] Impact and Implications of Generative AI for Enterprise Architects in Agile Environments: A Systematic Literature Review

Link: https://arxiv.org/abs/2510.22003
Authors: Stefan Julian Kooy, Jean Paul Sebastian Piest, Rob Henk Bemthuis
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: 17 pages, 1 figure, 5 tables; to appear in Enterprise Design, Operations, and Computing. EDOC 2025 Workshops, Lecture Notes in Business Information Processing (LNBIP), Springer, 2025. Part of 29th International Conference on Enterprise Design, Operations, and Computing (EDOC)

[AI-148] Foundation of Intelligence: Review of Math Word Problems from Human Cognition Perspective

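[Quick Read]: This paper addresses the lack of a systematic taxonomy and of a discussion of current trends in math word problem (MWP) research. The key to the solution is to take the perspective of human cognition and summarize five crucial cognitive abilities: Problem Understanding, Logical Organization, Associative Memory, Critical Thinking, and Knowledge Learning. Using these as a framework, it systematically reviews the mainstream MWP solvers of the past decade, namely neural network solvers and large language model (LLM) based solvers, and evaluates their performance in a unified comparison, revealing how AI models have progressed in simulating human reasoning.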

Link: https://arxiv.org/abs/2510.21999
Authors: Zhenya Huang, Jiayu Liu, Xin Lin, Zhiyuan Ma, Shangzi Xue, Tong Xiao, Qi Liu, Yee Whye Teh, Enhong Chen
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

Abstract: The math word problem (MWP) has been a fundamental research topic in artificial intelligence (AI) since the 1960s. This research aims to advance the reasoning abilities of AI by mirroring human-like cognitive intelligence. The mainstream technological paradigm has evolved from early rule-based methods to deep learning models, and is rapidly advancing towards large language models. However, the field still lacks a systematic taxonomy for the MWP survey along with a discussion of current development trends. Therefore, in this paper, we aim to comprehensively review related research in MWP solving through the lens of human cognition, to demonstrate how recent AI models are advancing in simulating human cognitive abilities. Specifically, we summarize 5 crucial cognitive abilities for MWP solving: Problem Understanding, Logical Organization, Associative Memory, Critical Thinking, and Knowledge Learning. Focusing on these abilities, we review the two mainstream families of MWP models of the past 10 years, neural network solvers and LLM-based solvers, and discuss the core human-like abilities they demonstrate in their intricate problem-solving process. Moreover, we rerun all the representative MWP solvers and supplement their performance on 5 mainstream benchmarks for a unified comparison. To the best of our knowledge, this survey is the first to comprehensively analyze the influential MWP research of the past decade from the perspective of human reasoning cognition and to provide an integrative overall comparison across existing approaches. We hope it can inspire further research in AI reasoning. Our repository is released on this https URL.

[AI-149] From Black-box to Causal-box: Towards Building More Interpretable Models NEURIPS2025

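[Quick Read]: This paper addresses the difficulty of explaining deep learning predictions, in particular how answering counterfactual questions can expose a model's reasoning in high-stakes applications. The core challenge is that existing model architectures, such as blackbox models and concept-based predictors, are generally not causally interpretable: counterfactual queries cannot be reliably evaluated from observational data. The key to the solution is a formal definition of causal interpretability together with a complete graphical criterion that decides whether a given model architecture supports a given counterfactual query; on that basis, the paper identifies the feature set that maximizes predictive expressiveness while preserving causal interpretability, characterizing the trade-off between causal interpretability and predictive accuracy.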

Link: https://arxiv.org/abs/2510.21998
Authors: Inwoo Hwang, Yushu Pan, Elias Bareinboim
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments: NeurIPS 2025

Abstract: Understanding the predictions made by deep learning models remains a central challenge, especially in high-stakes applications. A promising approach is to equip models with the ability to answer counterfactual questions, that is, hypothetical "what if?" scenarios that go beyond the observed data and provide insight into a model's reasoning. In this work, we introduce the notion of causal interpretability, which formalizes when counterfactual queries can be evaluated from a specific class of models and observational data. We analyze two common model classes, blackbox and concept-based predictors, and show that neither is causally interpretable in general. To address this gap, we develop a framework for building models that are causally interpretable by design. Specifically, we derive a complete graphical criterion that determines whether a given model architecture supports a given counterfactual query. This leads to a fundamental tradeoff between causal interpretability and predictive accuracy, which we characterize by identifying the unique maximal set of features that yields an interpretable model with maximal predictive expressiveness. Experiments corroborate the theoretical findings.

[AI-150] Is Temporal Difference Learning the Gold Standard for Stitching in RL?

Link: https://arxiv.org/abs/2510.21995
Authors: Michał Bortkiewicz, Władysław Pałucki, Mateusz Ostaszewski, Benjamin Eysenbach
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments: The first two authors contributed equally. Project website: this https URL

[AI-151] Two-Steps Diffusion Policy for Robotic Manipulation via Genetic Denoising NEURIPS2025

Link: https://arxiv.org/abs/2510.21991
Authors: Mateo Clemente, Leo Brunswic, Rui Heng Yang, Xuan Zhao, Yasser Khalil, Haoyu Lei, Amir Rasouli, Yinchuan Li
Affiliation: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: 16 pages, 11 figures, 2 tables, accepted at NeurIPS 2025

[AI-152] Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models

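[Quick Read]: This paper addresses capability regression under the reinforcement learning with verifiable rewards (RLVR) paradigm: during prolonged training, models forget foundational capabilities such as perception and faithfulness. Regularization terms such as KL divergence help prevent deviation from the base model but do not guarantee broader knowledge retention. The key to the solution is RECAP, an experience replay strategy with dynamic objective reweighting: it analyzes short-horizon signals of convergence and instability online and automatically adjusts the training weight of each task objective, shifting focus from saturated objectives to underperforming or volatile ones. This preserves general knowledge and improves reasoning without training additional models or heavy tuning, and it can be integrated directly into existing RLVR pipelines.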

Link: https://arxiv.org/abs/2510.21978
Authors: Hoang Phan, Xianjun Yang, Kevin Yao, Jingyu Zhang, Shengjie Bi, Xiaocheng Tang, Madian Khabsa, Lijuan Liu, Deren Lei
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

Abstract: Reinforcement learning with verifiable rewards (RLVR) has delivered impressive gains in mathematical and multimodal reasoning and has become a standard post-training paradigm for contemporary language and vision-language models. However, the RLVR recipe introduces a significant risk of capability regression, where models forget foundational skills after prolonged training without employing regularization strategies. We empirically confirm this concern, observing that open-source reasoning models suffer performance degradation on core capabilities such as perception and faithfulness. While imposing regularization terms like KL divergence can help prevent deviation from the base model, these terms are calculated on the current task, thus they do not guarantee broader knowledge. Meanwhile, commonly used experience replay across heterogeneous domains makes it nontrivial to decide how much training focus each objective should receive. To address this, we propose RECAP, a replay strategy with dynamic objective reweighting for general knowledge preservation. Our reweighting mechanism adapts in an online manner using short-horizon signals of convergence and instability, shifting the post-training focus away from saturated objectives and toward underperforming or volatile ones. Our method is end-to-end and readily applicable to existing RLVR pipelines without training additional models or heavy tuning. Extensive experiments on benchmarks based on Qwen2.5-VL-3B and Qwen2.5-VL-7B demonstrate the effectiveness of our method, which not only preserves general capabilities but also improves reasoning by enabling more flexible trade-offs among in-task rewards.
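The abstract describes the reweighting only qualitatively. As a rough illustration of the idea, the sketch below scores each objective by its remaining headroom (a convergence signal) plus the standard deviation of its recent rewards (an instability signal), then normalizes the scores into weights. The scoring formula, the window size, the assumption that rewards lie in [0, 1], and the objective names are all hypothetical; RECAP's actual rule may differ.

```python
from statistics import mean, pstdev


def reweight(history, window=4, eps=1e-8):
    """Map per-objective reward histories to normalized training weights.

    Assumes rewards in [0, 1]. Saturated, stable objectives get small
    weights; underperforming or volatile objectives get large ones.
    """
    scores = {}
    for name, rewards in history.items():
        recent = rewards[-window:]
        headroom = max(1.0 - mean(recent), 0.0)  # far from saturation
        volatility = pstdev(recent)              # short-horizon instability
        scores[name] = headroom + volatility + eps
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# Hypothetical reward traces for three replayed objectives.
history = {
    "math_reasoning": [0.95, 0.96, 0.95, 0.96],  # saturated
    "perception": [0.50, 0.52, 0.49, 0.51],      # underperforming
    "faithfulness": [0.90, 0.60, 0.85, 0.55],    # volatile
}

weights = reweight(history)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

With these traces the saturated objective receives the smallest weight and the underperforming one the largest, matching the direction of the shift the abstract describes.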

[AI-153] Distribution Shift Alignment Helps LLMs Simulate Survey Response Distributions

Link: https://arxiv.org/abs/2510.21977
Authors: Ji Huang, Mengfei Li, Shuai Shao
Affiliation: Unknown
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-154] ArchISMiner: A Framework for Automatic Mining of Architectural Issue-Solution Pairs from Online Developer Communities

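[Quick Read]: This paper addresses the difficulty of efficiently extracting architectural knowledge, such as architectural issue-solution pairs, from developer communities like Stack Overflow (SO), where content is highly unstructured and fragmented, forcing developers to sift through posts manually, which is slow and error-prone. The key to the solution is the ArchISMiner framework, built from two complementary components. ArchPI automatically identifies Architecture-Related Posts (ARPs): it compares conventional ML/DL models, Pre-trained Language Models (PLMs), and Large Language Models (LLMs) and selects the best-performing model, achieving an F1-score of 0.960 in ARP detection. ArchISPE uses an indirect supervised approach that combines BERT embeddings with local TextCNN features to extract architectural issue-solution pairs from ARPs, outperforming baselines from both the SE and NLP fields (F1 of 0.883 for issues and 0.894 for solutions). The framework improves the accuracy and efficiency of architectural knowledge mining; its generalization was validated on several forums, and a dataset of over 18K issue-solution pairs was released.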

Link: https://arxiv.org/abs/2510.21966
Authors: Musengamana Jean de Dieu, Ruiyin Li, Peng Liang, Mojtaba Shahin, Muhammad Waseem, Arif Ali Khan, Bangchao Wang, Mst Shamima Aktar
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: 42 pages, 14 images, 6 tables, Manuscript submitted to a Journal (2025)

Abstract:Stack Overflow (SO), a leading online community forum, is a rich source of software development knowledge. However, locating architectural knowledge, such as architectural solutions remains challenging due to the overwhelming volume of unstructured content and fragmented discussions. Developers must manually sift through posts to find relevant architectural insights, which is time-consuming and error-prone. This study introduces ArchISMiner, a framework for mining architectural knowledge from SO. The framework comprises two complementary components: ArchPI and ArchISPE. ArchPI trains and evaluates multiple models, including conventional ML/DL models, Pre-trained Language Models (PLMs), and Large Language Models (LLMs), and selects the best-performing model to automatically identify Architecture-Related Posts (ARPs) among programming-related discussions. ArchISPE employs an indirect supervised approach that leverages diverse features, including BERT embeddings and local TextCNN features, to extract architectural issue-solution pairs. Our evaluation shows that the best model in ArchPI achieves an F1-score of 0.960 in ARP detection, and ArchISPE outperforms baselines in both SE and NLP fields, achieving F1-scores of 0.883 for architectural issues and 0.894 for solutions. A user study further validated the quality (e.g., relevance and usefulness) of the identified ARPs and the extracted issue-solution pairs. Moreover, we applied ArchISMiner to three additional forums, releasing a dataset of over 18K architectural issue-solution pairs. Overall, ArchISMiner can help architects and developers identify ARPs and extract succinct, relevant, and useful architectural knowledge from developer communities more accurately and efficiently. The replication package of this study has been provided at this https URL

[AI-155] Towards Low-Latency and Adaptive Ransomware Detection Using Contrastive Learning

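[Quick Read]: This paper addresses three limitations of traditional ransomware detection in the face of rapidly evolving variants: heavy dependence on ad-hoc features, high response latency, and poor adaptability to unseen variants. The key to the solution is combining self-supervised contrastive learning with Neural Architecture Search (NAS): hardware performance counters (HPC) capture runtime behavior, a customized loss function that encourages early recognition of malicious activity reduces detection latency, and NAS automatically constructs model architectures that adapt to unseen variants. The result is higher accuracy (up to 16.1% improvement) and faster response (up to 6x) while remaining robust under evasive attacks.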

Link: https://arxiv.org/abs/2510.21957
Authors: Zhixin Pan, Ziyu Shu, Amberbir Alemayoh
Affiliation: Unknown
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments: This paper was accepted in the 2025 IEEE International Conference on Computer Design (ICCD)

Abstract: Ransomware has become a critical threat to cybersecurity due to its rapid evolution, the necessity for early detection, and growing diversity, posing significant challenges to traditional detection methods. While AI-based approaches have been proposed in prior work to assist ransomware detection, existing methods suffer from three major limitations: ad-hoc feature dependencies, delayed response, and limited adaptability to unseen variants. In this paper, we propose a framework that integrates self-supervised contrastive learning with neural architecture search (NAS) to address these challenges. Specifically, this paper offers three important contributions. (1) We design a contrastive learning framework that incorporates hardware performance counters (HPC) to analyze the runtime behavior of target ransomware. (2) We introduce a customized loss function that encourages early-stage detection of malicious activity and significantly reduces the detection latency. (3) We deploy a NAS framework to automatically construct adaptive model architectures, allowing the detector to flexibly align with unseen ransomware variants. Experimental results show that our proposed method achieves significant improvements in both detection accuracy (up to 16.1%) and response time (up to 6x) compared to existing approaches while maintaining robustness under evasive attacks.

[AI-156] AutoSciDACT: Automated Scientific Discovery through Contrastive Embedding and Hypothesis Testing NEURIPS2025

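[Quick Read]: This paper addresses two core challenges of novelty detection in scientific data: experimental data are typically high-dimensional and noisy, and any detected outliers require statistically rigorous scientific claims. The key to the solution is AutoSciDACT (Automated Scientific Discovery with Anomalous Contrastive Testing), a unified automated pipeline with two stages. First, contrastive pre-training learns expressive low-dimensional representations from abundant high-quality simulated data, with data augmentation strategies guided by domain expertise. Second, on these compact embeddings, the New Physics Learning Machine (NPLM) framework performs a highly sensitive two-sample test that statistically quantifies deviations of observed data from a reference distribution (the null hypothesis). The method shows strong sensitivity to small injections of anomalous data across astronomical, physical, biological, image, and synthetic datasets.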

Link: https://arxiv.org/abs/2510.21935
Authors: Samuel Bright-Thonney, Christina Reissel, Gaia Grosso, Nathaniel Woodward, Katya Govorkova, Andrzej Novak, Sang Eon Park, Eric Moreno, Philip Harris
Affiliation: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments: Accepted at NeurIPS 2025; 32 pages, 16 figures

Abstract:Novelty detection in large scientific datasets faces two key challenges: the noisy and high-dimensional nature of experimental data, and the necessity of making statistically robust statements about any observed outliers. While there is a wealth of literature on anomaly detection via dimensionality reduction, most methods do not produce outputs compatible with quantifiable claims of scientific discovery. In this work we directly address these challenges, presenting the first step towards a unified pipeline for novelty detection adapted for the rigorous statistical demands of science. We introduce AutoSciDACT (Automated Scientific Discovery with Anomalous Contrastive Testing), a general-purpose pipeline for detecting novelty in scientific data. AutoSciDACT begins by creating expressive low-dimensional data representations using a contrastive pre-training, leveraging the abundance of high-quality simulated data in many scientific domains alongside expertise that can guide principled data augmentation strategies. These compact embeddings then enable an extremely sensitive machine learning-based two-sample test using the New Physics Learning Machine (NPLM) framework, which identifies and statistically quantifies deviations in observed data relative to a reference distribution (null hypothesis). We perform experiments across a range of astronomical, physical, biological, image, and synthetic datasets, demonstrating strong sensitivity to small injections of anomalous data across all domains.

[AI-157] A Comparison of Conversational Models and Humans in Answering Technical Questions: the Firefox Case

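[Quick Read]: This paper addresses the excessive workload that core maintainers of Open Source Software (OSS) projects face from frequently answering developer questions. To make developer assistance more efficient, the study uses Retrieval-Augmented Generation (RAG) to improve the responses of Large Language Models (LLMs). The key is a retrieval mechanism over an external knowledge base, which makes generated answers to developer questions more complete and accurate, easing the human burden while maintaining output quality. Experiments show that RAG-enhanced responses were more comprehensive than those of human developers (62.50% vs 54.17%) and nearly as helpful (75.00% vs 79.17%), supporting the approach's potential in large projects such as Mozilla Firefox.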

Link: https://arxiv.org/abs/2510.21933
Authors: Joao Correia, Daniel Coutinho, Marco Castelluccio, Caio Barbosa, Rafael de Mello, Anita Sarma, Alessandro Garcia, Marco Gerosa, Igor Steinmacher
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: 13 pages

Abstract: The use of Large Language Models (LLMs) to support tasks in software development has steadily increased over recent years, from assisting developers in coding activities to providing conversational agents that answer newcomers’ questions. In collaboration with the Mozilla Foundation, this study evaluates the effectiveness of Retrieval-Augmented Generation (RAG) in assisting developers within the Mozilla Firefox project. We conducted an empirical analysis comparing responses from human developers, a standard GPT model, and a GPT model enhanced with RAG, using real queries from Mozilla’s developer chat rooms. To ensure a rigorous evaluation, Mozilla experts assessed the responses based on helpfulness, comprehensiveness, and conciseness. The results show that RAG-assisted responses were more comprehensive than those of human developers (62.50% to 54.17%) and almost as helpful (75.00% to 79.17%), suggesting RAG’s potential to enhance developer assistance. However, the RAG responses were not as concise and often verbose. The results show the potential to apply RAG-based tools to Open Source Software (OSS) to minimize the load on core maintainers without losing answer quality. Toning down retrieval mechanisms and making responses even shorter in the future would enhance developer assistance in massive projects like Mozilla Firefox.

[AI-158] Enabling Robust In-Context Memory and Rapid Task Adaptation in Transformers with Hebbian and Gradient-Based Plasticity

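[Quick Read]: This paper addresses the fact that Large Language Models (LLMs) rely on static weights during inference and lack the dynamic adaptation that biological nervous systems achieve through synaptic plasticity, which limits rapid, task-specific adjustment within a sequence. The key to the solution is adding explicit, biologically inspired synaptic plasticity to decoder-only Transformers in two forms: (i) a neuromodulated Hebbian rule for fast, event-driven local updates; and (ii) a gradient-based plasticity mechanism (as proposed by Duan et al., 2023), suited to long-horizon credit assignment. Experiments show that Hebbian plasticity achieves lower loss and stronger generalization on copying, regression, and few-shot classification tasks, whereas the gradient-based method is better on long-sequence tasks, delineating when each plasticity mechanism is effective.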

Link: https://arxiv.org/abs/2510.21908
Authors: Siddharth Chaudhary
Affiliation: Unknown
Categories: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

Abstract:Large language models display in-context learning as an emergent effect of scale, but they rely on static weights during inference. In contrast, biological systems continually adapt via synaptic plasticity. We investigate whether explicit, biologically inspired plasticity can endow Transformers with faster in-sequence adaptation. To this end, we augment decoder-only Transformers with fast-weight modules updated either by (i) a neuromodulated Hebbian rule or (ii) the gradient-based plasticity mechanism of Duan et al. (2023). Across copying, regression, and few-shot classification tasks (CIFAR-FS, Omniglot), Hebbian plasticity consistently achieves lower loss and stronger few-shot generalization, while gradient-based updates perform best on long-horizon credit assignment. When associations are short and linearly separable, static weights suffice, defining a clear boundary condition for when plasticity helps. Analysis of learned modulatory signals reveals that gradient-based rules maintain large, persistent updates, whereas Hebbian plasticity is sharply gated around salient events. Together, these results show that explicit plasticity complements attention by enabling rapid, task-specific adaptation, and clarify when different plasticity mechanisms are most effective.
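To make the idea of a neuromodulated Hebbian fast-weight update concrete, here is a minimal pure-Python sketch. It assumes the textbook outer-product form W ← decay·W + lr·m·(value ⊗ key), where the scalar modulation m gates whether an event is written at all. The function names, dimensions, and the choice of a scalar gate are assumptions for illustration; the paper's module may be parameterized differently.

```python
def outer(u, v):
    """Outer product of two vectors as a list-of-lists matrix."""
    return [[ui * vj for vj in v] for ui in u]


def matvec(matrix, x):
    """Matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def hebbian_update(fast_w, key, value, modulation, lr=1.0, decay=1.0):
    """One gated Hebbian step: W <- decay * W + lr * modulation * value key^T."""
    delta = outer(value, key)
    return [
        [decay * w + lr * modulation * d for w, d in zip(row_w, row_d)]
        for row_w, row_d in zip(fast_w, delta)
    ]


# Fast-weight memory mapping 3-dimensional keys to 2-dimensional values.
fast_w = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

# A salient event (modulation = 1) is written into the fast weights ...
fast_w = hebbian_update(fast_w, key=[1, 0, 0], value=[1, 2], modulation=1.0)
# ... while a non-salient one (modulation = 0) is gated out.
fast_w = hebbian_update(fast_w, key=[0, 1, 0], value=[3, 4], modulation=0.0)

recalled_salient = matvec(fast_w, [1, 0, 0])
recalled_gated = matvec(fast_w, [0, 1, 0])
print(recalled_salient, recalled_gated)
```

Because the two keys are orthogonal, querying with the first key returns its stored value exactly, and querying with the gated-out key returns zeros. This loosely mirrors the abstract's observation that Hebbian plasticity is sharply gated around salient events.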

[AI-159] Structure-Aware Cooperative Ensemble Evolutionary Optimization on Combinatorial Problems with Multimodal Large Language Models

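[Quick Read]: This paper addresses the performance bottleneck of traditional evolutionary algorithms (EAs) on graph-structured combinatorial optimization problems, where encodings such as binary or numerical representations fail to capture network topology. The key to the solution is an image-based encoding that preserves the network's topological context, with multimodal large language models (MLLMs) serving as evolutionary operators for structure-aware optimization. Graph sparsification reduces the visual clutter of large-scale network visualizations, a cooperative evolutionary optimization framework promotes cross-domain knowledge transfer and unifies multiple sparsified variants, and a layout ensemble strategy uses consensus voting to make results robust to the MLLMs' sensitivity to network layout.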

Link: https://arxiv.org/abs/2510.21906
Authors: Jie Zhao, Kang Hao Cheong
Affiliation: Unknown
Categories: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Comments:

Abstract:Evolutionary algorithms (EAs) have proven effective in exploring the vast solution spaces typical of graph-structured combinatorial problems. However, traditional encoding schemes, such as binary or numerical representations, often fail to straightforwardly capture the intricate structural properties of networks. Through employing the image-based encoding to preserve topological context, this study utilizes multimodal large language models (MLLMs) as evolutionary operators to facilitate structure-aware optimization over graph data. To address the visual clutter inherent in large-scale network visualizations, we leverage graph sparsification techniques to simplify structures while maintaining essential structural features. To further improve robustness and mitigate bias from different sparsification views, we propose a cooperative evolutionary optimization framework that facilitates cross-domain knowledge transfer and unifies multiple sparsified variants of diverse structures. Additionally, recognizing the sensitivity of MLLMs to network layout, we introduce an ensemble strategy that aggregates outputs from various layout configurations through consensus voting. Finally, experiments on real-world networks through various tasks demonstrate that our approach improves both the quality and reliability of solutions in MLLM-driven evolutionary optimization.
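The layout ensemble can be pictured as a simple vote. In the sketch below, each layout of the same network yields one proposal (here, a list of selected node IDs), and the consensus keeps the `k` nodes proposed most often, breaking ties by node ID. The proposal format, the tie-breaking rule, and the node IDs are assumptions for illustration; the paper's aggregation may work on different outputs.

```python
from collections import Counter


def consensus_vote(proposals, k):
    """Return the k node IDs proposed by the most layouts.

    Each proposal counts at most once per node; ties are broken by
    ascending node ID so the result is deterministic.
    """
    votes = Counter(node for proposal in proposals for node in set(proposal))
    ranked = sorted(votes, key=lambda node: (-votes[node], node))
    return ranked[:k]


# Hypothetical MLLM outputs for three different layouts of one network.
proposals = [
    [3, 7, 9],   # e.g. from a spring layout
    [3, 9, 12],  # e.g. from a circular layout
    [7, 3, 5],   # e.g. from a spectral layout
]

selected = consensus_vote(proposals, k=2)
print(selected)
```

Node 3 appears in all three proposals and nodes 7 and 9 in two each, so the consensus of size two is nodes 3 and 7 (7 wins the tie on ID).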

[AI-160] ToM-SWE: User Mental Modeling For Software Engineering Agents

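[Quick Read]: This paper addresses a weakness of current coding agents in handling user intent: they struggle to infer and track user goals, especially when instructions are underspecified or context-dependent. The key to the solution is ToM-SWE, a dual-agent architecture in which a primary agent carries out software engineering tasks while a lightweight theory-of-mind (ToM) partner agent models the user's mental state. The ToM agent infers the user's goals, constraints, and preferences from instructions and interaction history, maintains a persistent memory of the user, and provides user-related suggestions to the primary agent. This design substantially improves task success rates and user satisfaction, most notably on the stateful software engineering benchmark.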

Link: https://arxiv.org/abs/2510.21903
Authors: Xuhui Zhou, Valerie Chen, Zora Zhiruo Wang, Graham Neubig, Maarten Sap, Xingyao Wang
Affiliation: Unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments:

Abstract:Recent advances in coding agents have made them capable of planning, editing, running, and testing complex code bases. Despite their growing ability in coding tasks, these systems still struggle to infer and track user intent, especially when instructions are underspecified or context-dependent. To bridge this gap, we introduce ToM-SWE, a dual-agent architecture that pairs a primary software-engineering (SWE) agent with a lightweight theory-of-mind (ToM) partner agent dedicated to modeling the user’s mental state. The ToM agent infers user goals, constraints, and preferences from instructions and interaction history, maintains a \textbfpersistent memory of the user, and provides user-related suggestions to the SWE agent. In two software engineering benchmarks (ambiguous SWE-bench and stateful SWE-bench), ToM-SWE improves task success rates and user satisfaction. Notably, on the stateful SWE benchmark, a newly introduced evaluation that provides agents with a user simulator along with previous interaction histories, ToM-SWE achieves a substantially higher task success rate of 59.7% compared to 18.1% for OpenHands, a state-of-the-art SWE agent. Furthermore, in a three-week study with professional developers using ToM-SWE in their daily work, participants found it useful 86% of the time, underscoring the value of stateful user modeling for practical coding agents.

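[AI-161] Software Engineering Agents for Embodied Controller Generation: A Study in Minigrid Environments

Quick read: This paper addresses the lack of evaluation of software engineering agents (SWE-Agents) on controller generation for embodied tasks, in particular how they perform without the usual access to a codebase. The key step is adapting Mini-SWE-Agent (MSWEA) to 20 diverse embodied tasks in the Minigrid environment and systematically comparing performance under different information-access conditions: with or without access to the environment source code, and with stronger or weaker interactive exploration capabilities. By quantifying the relative contributions of static code analysis and dynamic exploration to task solving, the study establishes controller generation for embodied tasks as an important evaluation domain for SWE-Agents and provides baseline results for future research on efficient reasoning systems.

Link: https://arxiv.org/abs/2510.21902
Authors: Timothé Boulet, Xavier Hinaut, Clément Moulin-Frier
Affiliations: unknown
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: 10 pages, 7 figures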

Abstract:Software Engineering Agents (SWE-Agents) have proven effective for traditional software engineering tasks with accessible codebases, but their performance for embodied tasks requiring well-designed information discovery remains unexplored. We present the first extended evaluation of SWE-Agents on controller generation for embodied tasks, adapting Mini-SWE-Agent (MSWEA) to solve 20 diverse embodied tasks from the Minigrid environment. Our experiments compare agent performance across different information access conditions: with and without environment source code access, and with varying capabilities for interactive exploration. We quantify how different information access levels affect SWE-Agent performance for embodied tasks and analyze the relative importance of static code analysis versus dynamic exploration for task solving. This work establishes controller generation for embodied tasks as a crucial evaluation domain for SWE-Agents and provides baseline results for future research in efficient reasoning systems.

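[AI-162] The Principles of Diffusion Models

Quick read: This monograph addresses how to understand and construct diffusion models in a unified way, in particular the mathematical principles behind them and the connections between different modeling perspectives. The key is three complementary views: the variational view, the score-based view, and the flow-based view. All three rest on a time-dependent velocity field that defines a continuous path from a simple prior distribution to the data distribution. Solving the corresponding differential equation evolves noise smoothly into data, which gives a unified theoretical framework and mathematical foundation for controllable generation, efficient numerical solvers, and diffusion-motivated flow-map models.

Link: https://arxiv.org/abs/2510.21890
Authors: Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, Stefano Ermon
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Comments: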

Abstract:This monograph presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse formulations arise from shared mathematical ideas. Diffusion modeling starts by defining a forward process that gradually corrupts data into noise, linking the data distribution to a simple prior through a continuum of intermediate distributions. The goal is to learn a reverse process that transforms noise back into data while recovering the same intermediates. We describe three complementary views. The variational view, inspired by variational autoencoders, sees diffusion as learning to remove noise step by step. The score-based view, rooted in energy-based modeling, learns the gradient of the evolving data distribution, indicating how to nudge samples toward more likely regions. The flow-based view, related to normalizing flows, treats generation as following a smooth path that moves samples from noise to data under a learned velocity field. These perspectives share a common backbone: a time-dependent velocity field whose flow transports a simple prior to the data. Sampling then amounts to solving a differential equation that evolves noise into data along a continuous trajectory. On this foundation, the monograph discusses guidance for controllable generation, efficient numerical solvers, and diffusion-motivated flow-map models that learn direct mappings between arbitrary times. It provides a conceptual and mathematically grounded understanding of diffusion models for readers with basic deep-learning knowledge.
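The statement that sampling "amounts to solving a differential equation" can be illustrated with a minimal sketch. It assumes the simplest possible solver (forward Euler) and a hand-written toy velocity field in place of a learned network; the names `euler_sample` and `toy_velocity` are hypothetical and do not come from the monograph.

```python
def euler_sample(velocity, x0, n_steps=100):
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (data)
    with forward Euler steps. `velocity` stands in for a learned field."""
    x = float(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt                      # current time in [0, 1)
        x += dt * velocity(x, t)        # one explicit Euler step
    return x


def toy_velocity(x, t, target=3.0):
    """Toy assumption: a field that moves any sample along a straight
    line so that it reaches `target` at t=1. A real model would learn this."""
    return (target - x) / (1.0 - t)


# A "noise" sample at -1.0 is transported to the toy "data" point.
sample = euler_sample(toy_velocity, x0=-1.0, n_steps=50)
```

In practice the velocity field is a neural network and higher-order solvers are often preferred; the sketch only shows the shape of the sampling loop.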

[AI-163] Computational Hardness of Reinforcement Learning with Partial qπ-Realizability NEURIPS2025

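Quick read: This paper studies the computational complexity of finding an $\epsilon$-optimal policy in reinforcement learning under the partial $q^\pi$-realizability linear function approximation framework. The framework assumes that the value functions of all policies in a predefined policy set $\Pi$ are linearly realizable, an assumption that is weaker than $q^\pi$-realizability but stronger than $q^*$-realizability, which makes it more practical. The key results come from reductions: under a parameterized greedy (argmax) policy set the problem is NP-hard, and under a softmax policy set an exponential lower bound in the feature dimension holds under the Randomized Exponential Time Hypothesis. These results show that learning an $\epsilon$-optimal policy remains computationally hard even when the policy set is expanded beyond the optimal policy, so positive computational results are generally unattainable in this framework, in contrast to $q^\pi$-realizability under a generative access model.

Link: https://arxiv.org/abs/2510.21888
Authors: Shayan Karimi, Xiaoqi Tan
Affiliations: unknown
Categories: Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Machine Learning (cs.LG)
Comments: to be published in NeurIPS 2025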

Abstract:This paper investigates the computational complexity of reinforcement learning in a novel linear function approximation regime, termed partial $q^\pi$-realizability. In this framework, the objective is to learn an $\epsilon$-optimal policy with respect to a predefined policy set $\Pi$, under the assumption that all value functions for policies in $\Pi$ are linearly realizable. The assumptions of this framework are weaker than those in $q^\pi$-realizability but stronger than those in $q^*$-realizability, providing a practical model where function approximation naturally arises. We prove that learning an $\epsilon$-optimal policy in this setting is computationally hard. Specifically, we establish NP-hardness under a parameterized greedy policy set (argmax) and show that, unless NP = RP, an exponential lower bound (in feature vector dimension) holds when the policy set contains softmax policies, under the Randomized Exponential Time Hypothesis. Our hardness results mirror those in $q^*$-realizability and suggest computational difficulty persists even when $\Pi$ is expanded beyond the optimal policy. To establish this, we reduce from two complexity problems, $\delta$-Max-3SAT and $\delta$-Max-3SAT(b), to instances of GLinear-$\kappa$-RL (greedy policy) and SLinear-$\kappa$-RL (softmax policy). Our findings indicate that positive computational results are generally unattainable in partial $q^\pi$-realizability, in contrast to $q^\pi$-realizability under a generative access model.
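For readers unfamiliar with the term, "linearly realizable" is usually formalized as below. This is a sketch under assumed notation: the feature map $\phi$, the dimension $d$, and the weight vectors $\theta_\pi$ are symbols introduced here for illustration, and the paper's exact definition may differ in details such as horizon indexing.

```latex
% Partial q^\pi-realizability (assumed notation): every policy in the
% given set \Pi has an action-value function that is linear in known features.
\forall \pi \in \Pi \;\; \exists\, \theta_\pi \in \mathbb{R}^d : \quad
q^\pi(s, a) = \langle \phi(s, a), \theta_\pi \rangle
\quad \text{for all } (s, a) \in \mathcal{S} \times \mathcal{A}.
```

Under this reading, taking $\Pi$ to be the set of all policies gives $q^\pi$-realizability, and requiring the condition only for an optimal policy gives $q^*$-realizability, which matches the ordering of assumptions stated in the abstract.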

[AI-164] Exploration through Generation: Applying GFlowNets to Structured Search

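Quick read: This paper tackles combinatorial graph optimization problems: the Traveling Salesperson Problem (TSP), Minimum Spanning Tree, and Shortest Path. The key is to apply Generative Flow Networks (GFlowNets), generative models that learn to sample solutions in proportion to a reward function. The GFlowNets are trained with the Trajectory Balance loss to build solutions sequentially, selecting edges for spanning trees, nodes for paths, and cities for TSP tours. Experiments show that the method learns optimal solutions that match those of classical algorithms (Dijkstra, Kruskal, and exact solvers) on benchmark instances of different sizes. Its claimed advantage is computational scalability: whereas classical algorithms have a fixed complexity per instance, GFlowNets amortize computation through training and, given sufficient compute, could potentially handle large instances that are infeasible for classical exact methods.

Link: https://arxiv.org/abs/2510.21886
Authors: Mark Phillip Matovic
Affiliations: unknown
Categories: Artificial Intelligence (cs.AI)
Comments: 12 pages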

Abstract:This work applies Generative Flow Networks (GFlowNets) to three graph optimization problems: the Traveling Salesperson Problem, Minimum Spanning Tree, and Shortest Path. GFlowNets are generative models that learn to sample solutions proportionally to a reward function. The models are trained using the Trajectory Balance loss to build solutions sequentially, selecting edges for spanning trees, nodes for paths, and cities for tours. Experiments on benchmark instances of varying sizes show that GFlowNets learn to find optimal solutions. For each problem type, multiple graph configurations with different numbers of nodes were tested. The generated solutions match those from classical algorithms (Dijkstra for shortest path, Kruskal for spanning trees, and exact solvers for TSP). Training convergence depends on problem complexity, with the number of episodes required for loss stabilization increasing as graph size grows. Once training converges, the generated solutions match known optima from classical algorithms across the tested instances. This work demonstrates that generative models can solve combinatorial optimization problems through learned policies. The main advantage of this learning-based approach is computational scalability: while classical algorithms have fixed complexity per instance, GFlowNets amortize computation through training. With sufficient computational resources, the framework could potentially scale to larger problem instances where classical exact methods become infeasible.
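The Trajectory Balance loss mentioned in the abstract can be sketched for a single trajectory as follows. This is the standard form of the objective from the GFlowNet literature, not code from this paper; the function name and argument layout are assumptions made for illustration.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward):
    """Squared Trajectory Balance residual for one complete trajectory.

    log_z        : learned estimate of log Z (log of the total reward mass)
    log_pf_steps : log forward-policy probabilities of each step taken
    log_pb_steps : log backward-policy probabilities of the same steps
    reward       : positive reward of the terminal solution
    """
    residual = (log_z + sum(log_pf_steps)
                - math.log(reward) - sum(log_pb_steps))
    return residual ** 2


# Toy check: two terminal solutions with rewards 1 and 3, each reached in
# one step. Sampling proportionally to reward means P_F = 1/4 and 3/4 with
# Z = 4, and each state has a single parent, so log P_B = 0.
loss_a = trajectory_balance_loss(math.log(4.0), [math.log(0.25)], [0.0], 1.0)
loss_b = trajectory_balance_loss(math.log(4.0), [math.log(0.75)], [0.0], 3.0)
```

Both losses are zero (up to rounding) exactly when the forward policy samples solutions in proportion to their reward, which is the property the abstract describes.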

[AI-165] A Physics-Informed Neural Network Approach for UAV Path Planning in Dynamic Environments

Link: https://arxiv.org/abs/2510.21874
Authors: Shuning Zhang
Affiliations: unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: 15 pages, 8 figures

[AI-166] GuitarFlow: Realistic Electric Guitar Synthesis From Tablatures via Flow Matching and Style Transfer

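Quick read: This paper addresses the limited controllability and expressivity of electric guitar synthesis, in particular the difficulty existing methods have in representing guitar-specific playing techniques such as bends, muted strings, and legatos. The key is GuitarFlow, which uses tablature, a guitar-specific symbolic notation, to guide generation and produces audio in two stages: a sample-based virtual instrument first renders the tablature to audio, and Flow Matching then performs style transfer to turn the virtual-instrument sound into a more realistic electric guitar sound. The model is efficient to train (it needs less than 6 hours of training data) and significantly improves the realism of the generated audio in both objective metrics and a subjective listening test.

Link: https://arxiv.org/abs/2510.21872
Authors: Jackson Loth, Pedro Sarmento, Mark Sandler, Mathieu Barthet
Affiliations: unknown
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Comments: To be published in Proceedings of the 17th International Symposium on Computer Music and Multidisciplinary Research (CMMR)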

Abstract:Music generation in the audio domain using artificial intelligence (AI) has witnessed steady progress in recent years. However, for some instruments, particularly the guitar, controllable instrument synthesis remains limited in expressivity. We introduce GuitarFlow, a model designed specifically for electric guitar synthesis. The generative process is guided using tablatures, a ubiquitous and intuitive guitar-specific symbolic format. The tablature format easily represents guitar-specific playing techniques (e.g. bends, muted strings and legatos), which are more difficult to represent in other common music notation formats such as MIDI. Our model relies on an intermediary step of first rendering the tablature to audio using a simple sample-based virtual instrument, then performing style transfer using Flow Matching in order to transform the virtual instrument audio into more realistic-sounding examples. This results in a model that is quick to train and to perform inference, requiring less than 6 hours of training data. We present the results of objective evaluation metrics, together with a listening test, in which we show significant improvement in the realism of the generated guitar audio from tablatures.

[AI-167] Capability Ceilings in Autoregressive Language Models: Empirical Evidence from Knowledge-Intensive Tasks

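Quick read: This paper asks whether parameter scaling effectively improves performance on knowledge-intensive tasks (such as knowledge retrieval and mathematical reasoning) in decoder-only autoregressive language models. It finds that although cross-entropy loss keeps falling as parameters grow, accuracy on these tasks barely improves, a "capability ceiling". The evidence comes from a systematic evaluation of the OPT and Pythia model families (70M to 30B parameters, a 240-fold range) on several benchmarks: accuracy on knowledge-intensive tasks saturates (for example, the MMLU mathematics benchmarks stay at 19-20%, below the 25% random-chance level), whereas procedural tasks such as arithmetic follow conventional scaling behavior. Attention intervention experiments further show that the models are highly sensitive to perturbations of their attention patterns, which cause performance to collapse rather than degrade gracefully. These findings give a quantitative basis for resource allocation: for knowledge-intensive applications, scaling beyond 1-2B parameters yields minimal gains, which suggests that architecture design needs to be revisited to break through the current ceiling.

Link: https://arxiv.org/abs/2510.21866
Authors: Javier Marín
Affiliations: unknown
Categories: Artificial Intelligence (cs.AI)
Comments: The experiments in this paper were performed in January 2024. Current model architectures are considerably more complex than those presented here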

Abstract:We document empirical capability ceilings in decoder-only autoregressive language models across knowledge-intensive tasks. Systematic evaluation of OPT and Pythia model families (70M-30B parameters, spanning 240 times scaling) reveals that knowledge retrieval tasks show negligible accuracy improvement despite smooth loss reduction. On MMLU mathematics benchmarks, accuracy remains flat at 19-20% (below 25% random chance) across all scales while cross-entropy loss decreases by 31%. In contrast, procedural tasks like arithmetic show conventional scaling where both metrics improve together. Attention intervention experiments reveal high sensitivity to perturbation: swapping attention patterns between models causes catastrophic performance collapse (complete accuracy loss) rather than graceful degradation. These measurements have immediate engineering implications: for knowledge-intensive applications using OPT and Pythia architectures, parameter scaling beyond 1-2B offers minimal accuracy gains despite continued loss improvement. Our findings quantify capability-specific scaling failures in these model families to inform resource allocation decisions. Whether these patterns reflect fundamental constraints of decoder-only architectures or implementation-specific limitations remains an open question requiring investigation across diverse architectural approaches.

[AI-168] Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence

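Quick read: This paper addresses how to evaluate the "practical intelligence" of large language models (LLMs) that control robots on physical-world tasks, that is, their ability to cope with the complexity and uncertainty of the real world. Current systems typically use a hierarchical architecture in which the LLM handles high-level reasoning and a Vision Language Action (VLA) model handles low-level control, but the LLM's own performance is rarely quantified in isolation. The authors therefore propose Butter-Bench, a benchmark that evaluates the LLM's task planning and decision-making without the VLA. Decoupling the LLM from the VLA makes it possible to pinpoint LLM weaknesses in areas such as multi-step spatial planning and social understanding. The results show that although LLMs surpass humans on analytical tasks, they still lag far behind in practice (the best LLMs score 40%, against a mean human score of 95%), and fine-tuning for embodied reasoning does not improve their scores.

Link: https://arxiv.org/abs/2510.21860
Authors: Callum Sharrock, Lukas Petersson, Hanna Petersson, Axel Backlund, Axel Wennström, Kristoffer Nordström, Elias Aronsson
Affiliations: unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: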

Abstract:We present Butter-Bench, a benchmark evaluating large language model (LLM) controlled robots for practical intelligence, defined as the ability to navigate the messiness of the physical world. Current state-of-the-art robotic systems use a hierarchical architecture with LLMs in charge of high-level reasoning, and a Vision Language Action (VLA) model for low-level control. Butter-Bench evaluates the LLM part in isolation from the VLA. Although LLMs have repeatedly surpassed humans in evaluations requiring analytical intelligence, we find humans still outperform LLMs on Butter-Bench. The best LLMs score 40% on Butter-Bench, while the mean human score is 95%. LLMs struggled the most with multi-step spatial planning and social understanding. We also evaluate LLMs that are fine-tuned for embodied reasoning and conclude that this training does not improve their score on Butter-Bench.

[AI-169] Privacy-preserving Decision-focused Learning for Multi-energy Systems

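Quick read: This paper addresses the inefficiency caused by separating load forecasting from decision-making in multi-energy system (MES) dispatch: conventional forecasting models minimize forecasting error and ignore the effect on downstream dispatch cost. It proposes a privacy-preserving decision-focused learning (DFL) framework. The key is an information-masking mechanism that protects sensitive load data while still allowing recovery of the decision variables and gradients needed for training, combined with a security protocol built on matrix decomposition and homomorphic encryption that guards against collusion and unauthorized access. The paper also develops a privacy-preserving load pattern recognition algorithm so that specialized DFL models can be trained for heterogeneous load patterns, which lowers average daily dispatch cost while preserving privacy.

Link: https://arxiv.org/abs/2510.21858
Authors: Yangze Zhou, Ruiyang Yao, Dalin Qin, Yixiong Jia, Yi Wang
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Comments: 10 pages, 7 figures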

Abstract:Decision-making for multi-energy system (MES) dispatch depends on accurate load forecasting. Traditionally, load forecasting and decision-making for MES are implemented separately. Forecasting models are typically trained to minimize forecasting errors, overlooking their impact on downstream decision-making. To address this, decision-focused learning (DFL) has been studied to minimize decision-making costs instead. However, practical adoption of DFL in MES faces significant challenges: the process requires sharing sensitive load data and model parameters across multiple sectors, raising serious privacy issues. To this end, we propose a privacy-preserving DFL framework tailored for MES. Our approach introduces information masking to safeguard private data while enabling recovery of decision variables and gradients required for model training. To further enhance security for DFL, we design a safety protocol combining matrix decomposition and homomorphic encryption, effectively preventing collusion and unauthorized data access. Additionally, we developed a privacy-preserving load pattern recognition algorithm, enabling the training of specialized DFL models for heterogeneous load patterns. Theoretical analysis and comprehensive case studies, including real-world MES data, demonstrate that our framework not only protects privacy but also consistently achieves lower average daily dispatch costs compared to existing methods.

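[AI-170] TowerVision: Understanding and Improving Multilinguality in Vision-Language Models

Quick read: This paper addresses the limited multilingual performance of current vision-language models (VLMs), which mostly follow an English-centric design. The core solution is TowerVision, a family of open multilingual VLMs built on the multilingual text-only model Tower+, which improves cross-lingual generalization by optimizing key design factors such as training data composition, encoder selection, and text backbone. The main contributions are: (1) training on VisionBlocks, a high-quality curated multilingual vision-language dataset; (2) incorporating visual and cultural context during fine-tuning, so that the models surpass approaches trained on substantially larger datasets on multimodal multilingual benchmarks including ALM-Bench, Multi30K, and ViMUL-Bench, with particular strength in culturally grounded tasks and multimodal translation; (3) empirical evidence that multilingual training data improves cross-lingual transfer both from high-resource to low-resource languages and in the reverse direction, and that instruction-tuned large language models (LLMs) are not always the best initialization point.

Link: https://arxiv.org/abs/2510.21849
Authors: André G. Viveiros, Patrick Fernandes, Saul Santos, Sonal Sannigrahi, Emmanouil Zaranis, Nuno M. Guerreiro, Amin Farajian, Pierre Colombo, Graham Neubig, André F. T. Martins
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 15 pages, 7 figures, submitted to arXiv October 2025. All models, datasets, and training code will be released at this https URL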

Abstract:Despite significant advances in vision-language models (VLMs), most existing work follows an English-centric design process, limiting their effectiveness in multilingual settings. In this work, we provide a comprehensive empirical study analyzing the impact of several multilingual design choices, such as training data composition, encoder selection, and text backbones. The result is TowerVision, a family of open multilingual VLMs for both image-text and video-text tasks, built upon the multilingual text-only model Tower+. TowerVision achieves competitive performance on multiple multimodal multilingual benchmarks and shows particular strength in culturally grounded tasks and multimodal translation. By incorporating visual and cultural context during fine-tuning, our models surpass existing approaches trained on substantially larger datasets, as demonstrated on ALM-Bench and Multi30K (image tasks) and ViMUL-Bench (video tasks). Alongside the models, we release VisionBlocks, a high-quality, curated vision-language dataset. Our findings highlight that multilingual vision-language training data substantially improves cross-lingual generalization – both from high-resource to underrepresented languages and vice versa – and that instruction-tuned LLMs are not always the optimal initialization point. To support further research, we publicly release all models, data, and training recipes.

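[AI-171] Training data membership inference via Gaussian process meta-modeling: a post-hoc analysis approach

Quick read: This paper addresses a practical limitation of membership inference attacks (MIAs): their reliance on heavy query access or on building shadow models. The key is GP-MIA, an efficient and interpretable method based on Gaussian process (GP) meta-modeling. Using only a single trained model, it takes post-hoc metrics (such as accuracy, entropy, and dataset statistics) and optional sensitivity features (such as gradients and neural tangent kernel (NTK) measures), trains a GP classifier to distinguish members from non-members, and provides calibrated uncertainty estimates. It achieves high accuracy and strong generalization on several real-world and synthetic datasets.

Link: https://arxiv.org/abs/2510.21846
Authors: Yongchao Huang, Pengfei Zhang, Shahzad Mumtaz
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 10 pages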

Abstract:Membership inference attacks (MIAs) test whether a data point was part of a model’s training set, posing serious privacy risks. Existing methods often depend on shadow models or heavy query access, which limits their practicality. We propose GP-MIA, an efficient and interpretable approach based on Gaussian process (GP) meta-modeling. Using post-hoc metrics such as accuracy, entropy, dataset statistics, and optional sensitivity features (e.g. gradients, NTK measures) from a single trained model, GP-MIA trains a GP classifier to distinguish members from non-members while providing calibrated uncertainty estimates. Experiments on synthetic data, real-world fraud detection data, CIFAR-10, and WikiText-2 show that GP-MIA achieves high accuracy and generalizability, offering a practical alternative to existing MIAs.

[AI-172] GAPO: Group Adaptive Policy Optimization for Real-World Code Edit

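Quick read: This paper addresses distorted and noisy advantage estimates caused by skewed reward distributions and outliers when reinforcement learning (RL) is used to post-train large language models (LLMs) for code editing. Existing group-relative methods such as GRPO compute advantages from the group mean, which is sensitive to non-normal reward distributions and copes poorly with the reward structure of real-world tasks. The key is Group Adaptive Policy Optimization (GAPO): for each prompt it adaptively finds an outlier-free highest-density interval (HDI) and uses the median of that interval as an adaptive Q in place of the group mean in the advantage calculation. This makes the method markedly more robust to skewed rewards while remaining plug-and-play and efficient.

Link: https://arxiv.org/abs/2510.21830
Authors: Jianqing Zhang, Zhezheng Hao, Wei Xia, Hande Dong, Hong Wang, Chenxing Wei, Yuyan Zhou, Yubin Qi, Qiang Lin, Jian Cao
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: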

Abstract:Reinforcement learning (RL) is widely used for post-training large language models (LLMs) in code editing, where group-relative methods like GRPO are popular for their critic-free, normalized advantage estimation. However, in real-world code-editing scenarios, reward distributions are often skewed with unpredictable outliers, leading to distorted advantage computation and increased noise. To address this issue, we propose Group Adaptive Policy Optimization (GAPO), which adaptively finds an outlier-free highest-density interval (HDI) per prompt and then uses the median of that interval as an adaptive Q to replace the group mean in advantage calculation. This adaptive Q robustly handles skewed distributions while remaining plug-and-play and efficient. We validate GAPO on nine instruction-tuned LLMs (3B-14B) using a large internal dataset of 51,844 real-world, history-aware code-editing tasks across 10 languages, demonstrating consistent improvements in exact match accuracy over GRPO and its variant DAPO. Code is publicly available.
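The idea of replacing the group mean with the median of a highest-density interval can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the interval is taken as the shortest window holding a fixed fraction `mass` of the sorted rewards (the paper finds it adaptively), and the standard-deviation normalization is borrowed from common GRPO practice. All function and parameter names are hypothetical.

```python
import math
import statistics


def hdi_median(rewards, mass=0.6):
    """Median of the shortest window containing `mass` of the sorted rewards.

    Assumption: a fixed `mass`; the paper's adaptive interval search
    is not reproduced here.
    """
    xs = sorted(rewards)
    n = len(xs)
    k = max(1, math.ceil(mass * n))            # points inside the window
    # Pick the start index whose k-point window is narrowest (densest).
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return statistics.median(xs[start:start + k])


def adaptive_advantages(rewards, mass=0.6, eps=1e-8):
    """Advantages centred on the HDI median instead of the group mean."""
    q = hdi_median(rewards, mass)
    std = statistics.pstdev(rewards)
    return [(r - q) / (std + eps) for r in rewards]


# One prompt's group of rewards with a single large outlier.
group = [0.1, 0.2, 0.2, 0.3, 0.25, 5.0]
q = hdi_median(group)                  # stays inside the dense cluster
mean = statistics.fmean(group)         # pulled toward the outlier
```

With the outlier present, the group mean lands far above every typical reward, so a mean-centred advantage would mark all ordinary samples as below average; the interval median stays within the cluster.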

[AI-173] Unlocking Biomedical Insights: Hierarchical Attention Networks for High-Dimensional Data Interpretation

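Quick read: This paper addresses the difficulty of balancing accuracy and interpretability in machine learning models for high-dimensional data, especially in decision-sensitive fields such as genomics and healthcare, where deep learning models predict well but lack the transparency needed for clinical deployment. The key is a new architecture, the Hierarchical Attention-based Interpretable Network (HAIN). Its main components are multi-level attention that gives feature-level interpretability (through gradient-weighted attention), a prototype-driven loss that gives global model explanations (through prototype-based representations), and dimensionality reduction for robustness. On the TCGA dataset HAIN reaches 94.3% classification accuracy, offers more explanatory power than post-hoc methods such as SHAP and LIME, and identifies biologically relevant cancer biomarkers, combining accurate prediction with transparent decisions for interpretable AI in precision medicine.

Link: https://arxiv.org/abs/2510.21820
Authors: Rekha R Nair, Tina Babu, Alavikunhu Panthakkan, Hussain Al-Ahmad, Balamurugan Balusamy
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: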

Abstract:The proliferation of high-dimensional datasets in fields such as genomics, healthcare, and finance has created an urgent need for machine learning models that are both highly accurate and inherently interpretable. While traditional deep learning approaches deliver strong predictive performance, their lack of transparency often impedes their deployment in critical, decision-sensitive applications. In this work, we introduce the Hierarchical Attention-based Interpretable Network (HAIN), a novel architecture that unifies multi-level attention mechanisms, dimensionality reduction, and explanation-driven loss functions to deliver interpretable and robust analysis of complex biomedical data. HAIN provides feature-level interpretability via gradient-weighted attention and offers global model explanations through prototype-based representations. Comprehensive evaluation on The Cancer Genome Atlas (TCGA) dataset demonstrates that HAIN achieves a classification accuracy of 94.3%, surpassing conventional post-hoc interpretability approaches such as SHAP and LIME in both transparency and explanatory power. Furthermore, HAIN effectively identifies biologically relevant cancer biomarkers, supporting its utility for clinical and research applications. By harmonizing predictive accuracy with interpretability, HAIN advances the development of transparent AI solutions for precision medicine and regulatory compliance.

[AI-174] Unifying Inductive Cross-Domain and Multimodal Learning for Robust and Generalizable Recommendation CIKM2025

Link: https://arxiv.org/abs/2510.21812
Authors: Chanyoung Chung, Kyeongryul Lee, Sunbin Park, Joyce Jiyoung Whang
Affiliations: unknown
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 7 pages, 3 figures, and 4 tables. International Workshop on Multimodal Generative Search and Recommendation (MMGenSR) at The 34th ACM International Conference on Information and Knowledge Management (CIKM 2025)

[AI-175] DiffGRM: Diffusion-based Generative Recommendation Model

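Quick read: This paper examines the limitations of autoregressive modeling (ARM) in generative recommendation (GR) and targets two core problems. The first is intra-item consistency: the n digits of a semantic ID (SID) jointly represent one item, but autoregressive decoding predicts each digit from its prefix only and cannot use bidirectional context. The second is inter-digit heterogeneity: digits differ in semantic granularity and predictability, yet the uniform next-token loss weights all digits equally, which overtrains easy digits and undertrains hard ones. The key is DiffGRM, a diffusion-based GR framework with three components: (1) Parallel Semantic Encoding (PSE), which decouples the SID digits and balances per-digit information; (2) On-policy Coherent Noising (OCN), which uses coherent masking to prioritize uncertain digits and concentrate supervision on them; (3) Confidence-guided Parallel Denoising (CPD), an inference strategy that fills higher-confidence digits first and generates diverse Top-K candidates. Experiments show clear gains over generative and discriminative recommendation baselines, with NDCG@10 improving by 6.9%-15.5%.

Link: https://arxiv.org/abs/2510.21805
Authors: Zhao Liu, Yichen Zhu, Yiqing Yang, Guoping Tang, Rui Huang, Qiang Luo, Xiao Lv, Ruiming Tang, Kun Gai, Guorui Zhou
Affiliations: unknown
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 13 pages, 5 figures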

Abstract:Generative recommendation (GR) is an emerging paradigm that represents each item via a tokenizer as an n-digit semantic ID (SID) and predicts the next item by autoregressively generating its SID conditioned on the user’s history. However, two structural properties of SIDs make ARMs ill-suited. First, intra-item consistency: the n digits jointly specify one item, yet the left-to-right causality trains each digit only under its prefix and blocks bidirectional cross-digit evidence, collapsing supervision to a single causal path. Second, inter-digit heterogeneity: digits differ in semantic granularity and predictability, while the uniform next-token objective assigns equal weight to all digits, overtraining easy digits and undertraining hard digits. To address these two issues, we propose DiffGRM, a diffusion-based GR model that replaces the autoregressive decoder with a masked discrete diffusion model (MDM), thereby enabling bidirectional context and any-order parallel generation of SID digits for recommendation. Specifically, we tailor DiffGRM in three aspects: (1) tokenization with Parallel Semantic Encoding (PSE) to decouple digits and balance per-digit information; (2) training with On-policy Coherent Noising (OCN) that prioritizes uncertain digits via coherent masking to concentrate supervision on high-value signals; and (3) inference with Confidence-guided Parallel Denoising (CPD) that fills higher-confidence digits first and generates diverse Top-K candidates. Experiments show consistent gains over strong generative and discriminative recommendation baselines on multiple datasets, improving NDCG@10 by 6.9%-15.5%. Code is available at this https URL.
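The "fill higher-confidence digits first" part of the inference strategy can be sketched with a generic masked-decoding loop. This is a simplified illustration, not the DiffGRM implementation: it decodes a single candidate rather than diverse Top-K candidates, and `predict`, `toy_predict`, and `per_step` are hypothetical names introduced here.

```python
def confidence_guided_decode(predict, n_digits, per_step=1):
    """Fill masked SID digits in order of decreasing model confidence.

    predict(sid) is a hypothetical callable that returns
    {position: (digit, confidence)} for every still-masked position.
    """
    sid = [None] * n_digits              # None marks a masked digit
    fill_order = []
    while None in sid:
        proposals = predict(sid)
        if not proposals:
            raise ValueError("predictor returned no proposals")
        # Rank masked positions by confidence, most confident first.
        ranked = sorted(proposals.items(),
                        key=lambda kv: kv[1][1], reverse=True)
        for pos, (digit, _conf) in ranked[:per_step]:
            sid[pos] = digit
            fill_order.append(pos)
    return sid, fill_order


def toy_predict(sid):
    """Stand-in for the denoising model: fixed digits and confidences."""
    table = {0: (7, 0.55), 1: (2, 0.90), 2: (4, 0.70)}
    return {p: table[p] for p, d in enumerate(sid) if d is None}


decoded, order = confidence_guided_decode(toy_predict, n_digits=3)
```

A real model would recompute its predictions after each fill, so that digits decoded early provide context for the harder ones; the fixed table above only demonstrates the ordering.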

[AI-176] Quantifying Multimodal Imbalance: A GMM-Guided Adaptive Loss for Audio-Visual Learning

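Quick read: This paper addresses multimodal imbalance, where different modalities (for example audio and visual) contribute unequally to the model's prediction and limit performance. Prior work relies mainly on architectural changes or optimization strategies and lacks a quantitative analysis of the degree of imbalance. The key is a quantitative analysis based on the "Modality Gap", defined as the difference between the Softmax scores that different modalities assign to the ground-truth class. The distribution of this gap is well modeled by a bimodal Gaussian Mixture Model (GMM) whose two components correspond to "modality-balanced" and "modality-imbalanced" samples. Bayes' theorem then gives each sample's posterior probability of belonging to either component, and these probabilities drive a sample-level adaptive loss with three objectives: minimize the overall Modality Gap, shift the distribution of imbalanced samples toward the balanced one, and apply larger penalty weights to imbalanced samples. The method reaches SOTA accuracy on the CREMA-D and AVE datasets (80.65% and 70.90%).

Link: https://arxiv.org/abs/2510.21797
Authors: Zhaocheng Liu, Zhiwen Yu, Xiaoqing Liu
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Comments: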

Abstract:Current mainstream approaches to addressing multimodal imbalance primarily focus on architectural modifications and optimization-based strategies, often overlooking a quantitative analysis of the degree of imbalance between modalities. To address this gap, our work introduces a novel method for the quantitative analysis of multimodal imbalance, which in turn informs the design of a sample-level adaptive loss function. We begin by defining the “Modality Gap” as the difference between the Softmax scores of different modalities (e.g., audio and visual) for the ground-truth class prediction. Analysis of the Modality Gap distribution reveals that it can be effectively modeled by a bimodal Gaussian Mixture Model (GMM). These two components are found to correspond respectively to “modality-balanced” and “modality-imbalanced” data samples. Subsequently, we apply Bayes’ theorem to compute the posterior probability of each sample belonging to each of these two components. Guided by this quantitative analysis, we design a novel adaptive loss function with three objectives: (1) to minimize the overall Modality Gap; (2) to encourage the imbalanced sample distribution to shift towards the balanced one; and (3) to apply greater penalty weights to imbalanced samples. We employ a two-stage training strategy consisting of a warm-up phase followed by an adaptive training phase. Experimental results demonstrate that our approach achieves state-of-the-art (SOTA) performance on the public CREMA-D and AVE datasets, attaining accuracies of 80.65% and 70.90%, respectively. This validates the effectiveness of our proposed methodology.
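The Bayes step described above can be sketched for a one-dimensional, two-component mixture. The mixture parameters below are made-up illustrative values, not fitted ones, and the sign convention (audio score minus visual score) is an assumption; a real pipeline would first fit the GMM to the observed gaps.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def component_posteriors(gap, weights, means, stds):
    """P(component k | gap) for each mixture component via Bayes' theorem:
    prior weight times likelihood, normalized over the components."""
    joint = [w * normal_pdf(gap, m, s)
             for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical fitted mixture: component 0 = "balanced" (gap near 0),
# component 1 = "imbalanced" (one modality dominates).
WEIGHTS, MEANS, STDS = (0.6, 0.4), (0.05, 0.60), (0.10, 0.20)

# Modality Gap for one sample: difference of the ground-truth-class
# Softmax scores of the two modalities (assumed audio minus visual).
audio_score, visual_score = 0.85, 0.15
gap = audio_score - visual_score
p_balanced, p_imbalanced = component_posteriors(gap, WEIGHTS, MEANS, STDS)
```

The resulting `p_imbalanced` is the kind of per-sample quantity that could scale a penalty weight in a sample-level loss, as the third objective in the abstract suggests.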

[AI-177] A Physics-Guided AI Cascaded Corrector Model Significantly Extends Madden-Julian Oscillation Prediction Skill

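Quick read: This paper addresses the limits of current numerical weather prediction models in forecasting the Madden-Julian Oscillation (MJO), a tropical atmospheric oscillation: skillful forecasts typically last only 3-4 weeks, and models struggle with the "Maritime Continent barrier" during the MJO's eastward propagation. The key is a physics-guided cascaded correction framework, the Physics-guided Cascaded Corrector for MJO (PCC-MJO), with two stages: a physics-informed 3D U-Net first corrects errors in the spatiotemporal fields, and an LSTM optimized for forecast skill then refines the MJO's RMM index. Applied to three operational models (CMA, ECMWF, NCEP), the method extends the skillful forecast range by 2-8 days, and explainable-AI analysis indicates that its decision-making is spatially consistent with observed MJO dynamics (spatial correlation of 0.93), combining physical consistency, computational efficiency, and strong generalization.

Link: https://arxiv.org/abs/2510.21796
Authors: Xiao Zhou, Yuze Sun, Jie Wu, Xiaomeng Huang
Affiliations: unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)
Comments: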

Abstract:The Madden-Julian Oscillation (MJO) is an important driver of global weather and climate extremes, but its prediction in operational dynamical models remains challenging, with skillful forecasts typically limited to 3-4 weeks. Here, we introduce a novel deep learning framework, the Physics-guided Cascaded Corrector for MJO (PCC-MJO), which acts as a universal post-processor to correct MJO forecasts from dynamical models. This two-stage model first employs a physics-informed 3D U-Net to correct spatial-temporal field errors, then refines the MJO’s RMM index using an LSTM optimized for forecast skill. When applied to three different operational forecasts from CMA, ECMWF and NCEP, our unified framework consistently extends the skillful forecast range (bivariate correlation above 0.5) by 2-8 days. Crucially, the model effectively mitigates the “Maritime Continent barrier”, enabling more realistic eastward propagation and amplitude. Explainable AI analysis quantitatively confirms that the model’s decision-making is spatially congruent with observed MJO dynamics (correlation 0.93), demonstrating that it learns physically meaningful features rather than statistical fittings. Our work provides a promising physically consistent, computationally efficient, and highly generalizable pathway to break through longstanding barriers in subseasonal forecasting.
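The skill measure quoted in the abstract, a bivariate correlation between observed and forecast RMM indices, can be sketched as follows. The formula is the definition commonly used in the MJO forecasting literature; whether the paper uses exactly this form is an assumption, and the function name is hypothetical.

```python
import math


def bivariate_correlation(obs_rmm1, obs_rmm2, fc_rmm1, fc_rmm2):
    """Bivariate correlation between observed and forecast (RMM1, RMM2)
    pairs at one lead time, pooled over all forecast cases."""
    num = sum(a1 * b1 + a2 * b2
              for a1, a2, b1, b2 in zip(obs_rmm1, obs_rmm2, fc_rmm1, fc_rmm2))
    obs_norm = math.sqrt(sum(a1 * a1 + a2 * a2
                             for a1, a2 in zip(obs_rmm1, obs_rmm2)))
    fc_norm = math.sqrt(sum(b1 * b1 + b2 * b2
                            for b1, b2 in zip(fc_rmm1, fc_rmm2)))
    return num / (obs_norm * fc_norm)


# Toy series: an MJO signal circling the RMM phase space.
obs1 = [1.0, 0.0, -1.0, 0.0]
obs2 = [0.0, 1.0, 0.0, -1.0]
perfect = bivariate_correlation(obs1, obs2, obs1, obs2)
# A forecast that is a quarter cycle out of phase carries no skill.
shifted = bivariate_correlation(obs1, obs2, [-v for v in obs2], obs1)
```

A forecast is conventionally counted as skillful at a given lead time while this correlation stays at or above 0.5, which is the threshold the abstract refers to.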

[AI-178] Variance-Reduction Guidance: Sampling Trajectory Optimization for Diffusion Models

【速读】:该论文旨在解决扩散模型(Diffusion Models)在采样过程中因预测误差累积而导致生成质量下降的问题。扩散模型的采样过程涉及多步噪声预测,每一步的预测偏差会随时间累积,从而影响最终生成结果的质量。解决方案的关键在于提出了一种无需模型微调或修改的方差缩减引导(Variance-Reduction Guidance, VRG)方法:给定预定义的采样轨迹,VRG通过搜索具有相同步数但生成质量更高的新轨迹来有效缓解预测误差的累积效应,且该方法适用于条件与无条件生成任务。

Link: https://arxiv.org/abs/2510.21792
Authors: Shifeng Xu,Yanzhu Liu,Adams Wai-Kin Kong
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

Abstract:Diffusion models are an emerging class of generative models. Their sampling process involves multiple steps, and in each step the models predict the noise from a noisy sample. When the models make a prediction, the output deviates from the ground truth, and we call such a deviation the prediction error. The prediction error accumulates over the sampling process and deteriorates generation quality. This paper introduces a novel technique for statistically measuring the prediction error and proposes the Variance-Reduction Guidance (VRG) method to mitigate this error. VRG does not require model fine-tuning or modification. Given a predefined sampling trajectory, it searches for a new trajectory which has the same number of sampling steps but produces higher quality results. VRG is applicable to both conditional and unconditional generation. Experiments on various datasets and baselines demonstrate that VRG can significantly improve the generation quality of diffusion models. Source code is available at this https URL.

[AI-179] Online Mixture of Experts: No-Regret Learning for Optimal Collective Decision-Making NEURIPS2025

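[Quick read]: This paper studies how to dynamically aggregate the outputs of multiple experts in an online decision-making setting to achieve optimal aggregate accuracy; that is, given a context, how to choose the best committee from a set of candidate experts and weight its members. The key to the solution is two expert-guided bandit learning algorithms. The first combines aggregate voting with UCB-driven successive elimination, pruning suboptimal exploration actions efficiently to reduce regret. The second uses online weighted majority voting, giving each expert voting power in proportion to its predictive power. Both come with theoretical no-regret guarantees under ideal conditions, and both are applied to the online fine-tuning of a set of expert large language models (LLMs), where after each response the system reweights its experts or selects the optimal committee.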

Link: https://arxiv.org/abs/2510.21788
Authors: Larkin Liu,Jalal Etesami
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

Abstract:We explore the use of expert-guided bandit learning, which we refer to as online mixture-of-experts (OMoE). In this setting, given a context, a candidate committee of experts must determine how to aggregate their outputs to achieve optimal results in terms of aggregate accuracy. We propose two algorithms to address this problem. The first algorithm combines aggregate voting with UCB-driven successive elimination, efficiently pruning suboptimal exploration actions. The second algorithm employs an online weighted-majority-voting mechanism, leveraging the respective voting power of each expert proportional to their predictive power. We derive theoretical guarantees for the regret properties in the bandit setting under ideal circumstances, and empirical results are provided accordingly. As a modern study on applications, these methods are applied to the online fine-tuning of a set of expert large language models (LLMs), where after each response, the generative LLM dynamically reweighs its set of experts and/or selects the optimal committee of experts to generate the most accurate response. Our results introduce new methodologies and no-regret guarantees for combining multiple experts to improve the overall performance of an aggregate model.
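To illustrate the idea of online weighted majority voting, here is a minimal sketch of the classical weighted-majority scheme with a multiplicative penalty for wrong experts. It is not the paper's algorithm, whose exact weighting rule is not given in the abstract; the class name, the `beta` penalty factor, and the toy votes are assumptions.

```python
from collections import defaultdict

class WeightedMajority:
    """Classical weighted-majority voting over a fixed committee of experts."""

    def __init__(self, n_experts, beta=0.5):
        self.weights = [1.0] * n_experts  # every expert starts with equal voting power
        self.beta = beta                  # factor applied to the weight of a wrong expert

    def predict(self, votes):
        """Return the label with the largest total weight behind it."""
        tally = defaultdict(float)
        for weight, vote in zip(self.weights, votes):
            tally[vote] += weight
        return max(tally, key=tally.get)

    def update(self, votes, truth):
        """Shrink the weight of each expert whose vote did not match the revealed label."""
        self.weights = [
            weight * (1.0 if vote == truth else self.beta)
            for weight, vote in zip(self.weights, votes)
        ]

# Toy run: expert 0 is always right, experts 1 and 2 are always wrong.
committee = WeightedMajority(n_experts=3)
votes = ["a", "b", "b"]
print(committee.predict(votes))      # the wrong majority wins at first
for _ in range(2):
    committee.update(votes, truth="a")
print(committee.predict(votes))      # the reliable expert now outweighs the others
```

The design choice here is that voting power decays geometrically with mistakes, so a consistently accurate expert comes to dominate the aggregate vote after a few rounds.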

[AI-180] What Causes Postoperative Aspiration?

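[Quick read]: This paper addresses the difficulty of accurately predicting the risk of postoperative aspiration, with the goal of enabling early intervention to reduce morbidity and mortality among surgical patients. The key to the solution is a machine learning (ML) prediction model: XGBoost, Multilayer Perceptron, and Random Forest models are trained on data from 826 surgical patients with aspiration in the MIMIC-IV database (plus a matched non-aspiration cohort), reaching an AUROC of 0.86 and 77.3% sensitivity. Average Treatment Effects (ATE) estimated with Augmented Inverse Probability Weighting identify maximum daily opioid dose and operative site (neck and head) as significant causal factors. The analysis also reveals a gender disparity in opioid administration and aspiration rates, providing a data-driven basis for individualized prevention strategies.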

Link: https://arxiv.org/abs/2510.21779
Authors: Supriya Nagesh,Karina Covarrubias,Robert El-Kareh,Shiva Prasad Kasiviswanathan,Nina Mishra
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

Abstract:Background: Aspiration, the inhalation of foreign material into the lungs, significantly impacts surgical patient morbidity and mortality. This study develops a machine learning (ML) model to predict postoperative aspiration, enabling timely preventative interventions. Methods: From the MIMIC-IV database of over 400,000 hospital admissions, we identified 826 surgical patients (mean age: 62, 55.7% male) who experienced aspiration within seven days post-surgery, along with a matched non-aspiration cohort. Three ML models: XGBoost, Multilayer Perceptron, and Random Forest were trained using pre-surgical hospitalization data to predict postoperative aspiration. To investigate causation, we estimated Average Treatment Effects (ATE) using Augmented Inverse Probability Weighting. Results: Our ML model achieved an AUROC of 0.86 and 77.3% sensitivity on a held-out test set. Maximum daily opioid dose, length of stay, and patient age emerged as the most important predictors. ATE analysis identified significant causative factors: opioids (0.25 +/- 0.06) and operative site (neck: 0.20 +/- 0.13, head: 0.19 +/- 0.13). Despite equal surgery rates across genders, men were 1.5 times more likely to aspirate and received 27% higher maximum daily opioid dosages compared to women. Conclusion: ML models can effectively predict postoperative aspiration risk, enabling targeted preventative measures. Maximum daily opioid dosage and operative site significantly influence aspiration risk. The gender disparity in both opioid administration and aspiration rates warrants further investigation. These findings have important implications for improving postoperative care protocols and aspiration prevention strategies. 
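The causal step in the abstract relies on Augmented Inverse Probability Weighting. As a rough sketch, the standard AIPW estimator combines an outcome model with a propensity model as shown below. The function and argument names are assumptions, the nuisance predictions are taken as given rather than fitted, and the numbers are made up; this is not the authors' pipeline.

```python
def aipw_ate(y, t, e, mu1, mu0):
    """Standard AIPW estimate of the average treatment effect.

    y   : observed outcomes
    t   : binary treatment indicators (1 = treated, 0 = control)
    e   : estimated propensity scores P(T=1 | x), strictly between 0 and 1
    mu1 : outcome-model predictions under treatment, E[Y | x, T=1]
    mu0 : outcome-model predictions under control,   E[Y | x, T=0]
    """
    total = 0.0
    for yi, ti, ei, m1, m0 in zip(y, t, e, mu1, mu0):
        outcome_term = m1 - m0                          # plug-in effect from the outcome model
        treated_fix = ti * (yi - m1) / ei               # residual correction for treated units
        control_fix = (1 - ti) * (yi - m0) / (1 - ei)   # residual correction for control units
        total += outcome_term + treated_fix - control_fix
    return total / len(y)

# Toy data (hypothetical): the outcome model is exactly right, so the
# correction terms vanish and the estimate equals the mean of mu1 - mu0.
y = [1.0, 0.0, 1.0, 0.0]
t = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]
mu1 = [1.0, 1.0, 1.0, 1.0]
mu0 = [0.0, 0.0, 0.0, 0.0]
print(aipw_ate(y, t, e, mu1, mu0))  # → 1.0
```

The estimator is often described as doubly robust: it remains consistent if either the propensity model or the outcome model is correctly specified.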
Submission history: [v1] Sat, 18 Oct 2025 05:07:57 UTC (1,581 KB), from Supriya Nagesh

[AI-181] Learn2Drive: A neural network-based framework for socially compliant automated vehicle control

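[Quick read]: This paper addresses the fact that current automated vehicle (AV) control strategies neglect interactions with human-driven vehicles (HVs) and their effect on overall traffic flow, which can worsen congestion and reduce system efficiency. The key to the solution is a neural network-based, socially compliant AV control framework that incorporates social value orientation (SVO), so that an AV balances its own control objectives against its influence on HV behavior and global traffic dynamics. Utility functions are defined for both AVs and HVs and optimized according to each AV's SVO, letting AVs act as mobile traffic regulators. According to the reported results, the framework adapts to varying traffic conditions and improves system-wide efficiency.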

Link: https://arxiv.org/abs/2510.21736
Authors: Yuhui Liu,Samannita Halder,Shian Wang,Tianyi Li
Affiliations: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
Comments:

Abstract:This study introduces a novel control framework for adaptive cruise control (ACC) in automated driving, leveraging Long Short-Term Memory (LSTM) networks and physics-informed constraints. As automated vehicles (AVs) adopt advanced features like ACC, transportation systems are becoming increasingly intelligent and efficient. However, existing AV control strategies primarily focus on optimizing the performance of individual vehicles or platoons, often neglecting their interactions with human-driven vehicles (HVs) and the broader impact on traffic flow. This oversight can exacerbate congestion and reduce overall system efficiency. To address this critical research gap, we propose a neural network-based, socially compliant AV control framework that incorporates social value orientation (SVO). This framework enables AVs to account for their influence on HVs and traffic dynamics. By leveraging AVs as mobile traffic regulators, the proposed approach promotes adaptive driving behaviors that reduce congestion, improve traffic efficiency, and lower energy consumption. Within this framework, we define utility functions for both AVs and HVs, which are optimized based on the SVO of each AV to balance its own control objectives with broader traffic flow considerations. Numerical results demonstrate the effectiveness of the proposed method in adapting to varying traffic conditions, thereby enhancing system-wide efficiency. Specifically, when the AV’s control mode shifts from prioritizing energy consumption to optimizing traffic flow efficiency, vehicles in the following platoon experience at least a 58.99% increase in individual energy consumption alongside at least a 38.39% improvement in individual average speed, indicating significant enhancements in traffic dynamics.

[AI-182] A phase-aware AI car-following model for electric vehicles with adaptive cruise control: Development and validation using real-world data

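[Quick read]: This paper addresses the inability of current microscopic traffic flow models to accurately describe the distinctive car-following behavior of electric vehicles (EVs). Because EVs offer rapid acceleration and regenerative braking, their dynamic response differs markedly from that of internal combustion engine (ICE) vehicles, and existing models do not fully capture this difference. The key to the solution is a Phase-Aware AI (PAAI) car-following model, which augments a traditional physics-based framework with an AI component that recognizes and adapts to different driving phases, such as rapid acceleration and regenerative braking, improving prediction accuracy for EV behavior. Simulations using real-world trajectory data from vehicles equipped with adaptive cruise control (ACC) validate the model for representing EV behavior in traffic simulation.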

Link: https://arxiv.org/abs/2510.21735
Authors: Yuhui Liu,Shian Wang,Ansel Panicker,Kate Embry,Ayana Asanova,Tianyi Li
Affiliations: Unknown
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Comments:

Abstract:Internal combustion engine (ICE) vehicles and electric vehicles (EVs) exhibit distinct vehicle dynamics. EVs provide rapid acceleration, with electric motors producing peak power across a wider speed range, and achieve swift deceleration through regenerative braking. While existing microscopic models effectively capture the driving behavior of ICE vehicles, a modeling framework that accurately describes the unique car-following dynamics of EVs is lacking. Developing such a model is essential given the increasing presence of EVs in traffic, yet creating an easy-to-use and accurate analytical model remains challenging. To address these gaps, this study develops and validates a Phase-Aware AI (PAAI) car-following model specifically for EVs. The proposed model enhances traditional physics-based frameworks with an AI component that recognizes and adapts to different driving phases, such as rapid acceleration and regenerative braking. Using real-world trajectory data from vehicles equipped with adaptive cruise control (ACC), we conduct comprehensive simulations to validate the model’s performance. The numerical results demonstrate that the PAAI model significantly improves prediction accuracy over traditional car-following models, providing an effective tool for accurately representing EV behavior in traffic simulations.

[AI-183] CustomIR: Unsupervised Fine-Tuning of Dense Embeddings for Known Document Corpora

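[Quick read]: This paper addresses the performance degradation of pre-trained language embedding models when they are applied to domain-specific corpora, particularly in Retrieval-Augmented Generation (RAG) pipelines. The key to the solution is CustomIR, an unsupervised adaptation framework that uses large language models (LLMs) to generate diverse query-document pairs grounded in the target corpus, paired with LLM-verified hard negatives, so that the embedding model can be fine-tuned without human annotation. On enterprise email and messaging datasets, small models gain up to 2.3 points in Recall@10, letting them rival much larger models and reducing RAG deployment cost.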

Link: https://arxiv.org/abs/2510.21729
Authors: Nathan Paull
Affiliations: Unknown
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Comments:

Abstract:Dense embedding models have become critical for modern information retrieval, particularly in RAG pipelines, but their performance often degrades when applied to specialized corpora outside their pre-training distribution. To address this, we introduce CustomIR, a framework for unsupervised adaptation of pre-trained language embedding models to domain-specific corpora using synthetically generated query-document pairs. CustomIR leverages large language models (LLMs) to create diverse queries grounded in a known target corpus, paired with LLM-verified hard negatives, eliminating the need for costly human annotation. Experiments on enterprise email and messaging datasets show that CustomIR consistently improves retrieval effectiveness with small models gaining up to 2.3 points in Recall@10. This performance increase allows these small models to rival the performance of much larger alternatives, allowing for cheaper RAG deployments. These results highlight that targeted synthetic fine-tuning offers a scalable and cost-efficient strategy for increasing domain-specific performance.
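For readers unfamiliar with the metric quoted above, the following sketch shows one common way to compute Recall@k for a retriever: the fraction of a query's relevant documents that appear in its top k results, averaged over queries. The function names and the document identifiers are hypothetical, and the paper may use a different variant of the metric.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant documents that appear in the top-k ranked results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # convention chosen here for queries with no relevant documents
    hits = relevant & set(ranked_ids[:k])
    return len(hits) / len(relevant)

def mean_recall_at_k(runs, k=10):
    """Average Recall@k over (ranked_ids, relevant_ids) pairs, one pair per query."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)

# Toy example with made-up document ids.
runs = [
    (["d3", "d1", "d7"], ["d1", "d9"]),  # one of two relevant documents retrieved
    (["d2", "d4", "d5"], ["d2"]),        # the single relevant document retrieved
]
print(mean_recall_at_k(runs, k=2))  # → 0.75
```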

[AI-184] Modeling Bias Evolution in Fashion Recommender Systems: A System Dynamics Approach

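[Quick read]: This paper addresses bias in Fashion Recommender Systems (FRS), which not only distorts user experience but can also perpetuate and amplify societal stereotypes, particularly in fashion e-commerce. Using system dynamics modeling and experimental simulations, the study examines how bias is activated and reinforced over time and how it affects system performance along several dimensions, finding that inductive biases influence system outcomes more strongly than user biases. The key to the solution is twofold: existing debiasing strategies (such as data rebalancing and algorithmic regularization) need to be made more effective, and system boundaries should be extended to include broader contextual factors (such as user demographics and item diversity) in order to achieve fairer, more inclusive recommendations.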

Link: https://arxiv.org/abs/2510.21728
Authors: Mahsa Goodarzi,M. Abdullah Canbaz
Affiliations: Unknown
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Comments: Published in the proceedings of the 43rd International System Dynamics Conference (ISDC 25): this https URL

Abstract:Bias in recommender systems not only distorts user experience but also perpetuates and amplifies existing societal stereotypes, particularly in sectors like fashion e-commerce. This study employs a dynamic modeling approach to scrutinize the mechanisms of bias activation and reinforcement within Fashion Recommender Systems (FRS). By leveraging system dynamics modeling and experimental simulations, we dissect the temporal evolution of bias and its multifaceted impacts on system performance. Our analysis reveals that inductive biases exert a more substantial influence on system outcomes than user biases, suggesting critical areas for intervention. We demonstrate that while current debiasing strategies, including data rebalancing and algorithmic regularization, are effective to an extent, they require further enhancement to comprehensively mitigate biases. This research underscores the necessity for advancing these strategies and extending system boundaries to incorporate broader contextual factors such as user demographics and item diversity, aiming to foster inclusivity and fairness in FRS. The findings advocate for a proactive approach in recommender system design to counteract bias propagation and ensure equitable user experiences.

[AI-185] Your Dense Retriever is Secretly an Expeditious Reasoner

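[Quick read]: This paper addresses the weak performance of dense retrievers on queries that require complex reasoning, while avoiding the high computational cost of rewriting every query with a large language model (LLM). The key to the solution is Adaptive Query Reasoning (AdaQR), a framework in which a Reasoner Router decides dynamically whether each query is handled by the efficient Dense Reasoner or by deep LLM reasoning. The Dense Reasoner performs LLM-style reasoning directly in the embedding space, giving a controllable trade-off between efficiency and accuracy. On the large-scale retrieval benchmark BRIGHT, the method reduces reasoning cost by 28% while preserving or even improving retrieval performance.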

Link: https://arxiv.org/abs/2510.21727
Authors: Yichi Zhang,Jun Bai,Zhixin Cai,Shuhan Qin,Zhuofan Chen,Jinghua Guan,Wenge Rong
Affiliations: Unknown
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 16 pages, 11 figures

Abstract:Dense retrievers enhance retrieval by encoding queries and documents into continuous vectors, but they often struggle with reasoning-intensive queries. Although Large Language Models (LLMs) can reformulate queries to capture complex reasoning, applying them universally incurs significant computational cost. In this work, we propose Adaptive Query Reasoning (AdaQR), a hybrid query rewriting framework. Within this framework, a Reasoner Router dynamically directs each query to either fast dense reasoning or deep LLM reasoning. The dense reasoning is achieved by the Dense Reasoner, which performs LLM-style reasoning directly in the embedding space, enabling a controllable trade-off between efficiency and accuracy. Experiments on the large-scale retrieval benchmark BRIGHT show that AdaQR reduces reasoning cost by 28% while preserving, or even improving, retrieval performance by 7%.

[AI-186] AquaVLM: Improving Underwater Situation Awareness with Mobile Vision Language Models

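[Quick read]: This paper addresses the bulk, cost, and lack of context awareness of traditional underwater communication systems, which limit divers' ability to communicate efficiently and naturally underwater. Recent systems support text messaging on lightweight smartphones, but they rely on predefined messages and cannot adapt to dynamic situations. The key to the solution is AquaVLM, a system built on a fine-tuned mobile vision-language model (VLM) that recognizes the underwater environment automatically and generates context-aware messages. Combined with a hierarchical message generation pipeline and error-resilient fine-tuning co-designed with the transmission layer, it provides tap-and-send underwater communication on a smartphone. The system is evaluated with a virtual reality simulator and a prototype on the iOS platform.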

Link: https://arxiv.org/abs/2510.21722
Authors: Beitong Tian,Lingzhi Zhao,Bo Chen,Haozhen Zheng,Jingcheng Yang,Mingyuan Wu,Deepak Vasisht,Klara Nahrstedt
Affiliations: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Comments: 12 pages, 10 figures, under review

Abstract:Underwater activities like scuba diving enable millions annually to explore marine environments for recreation and scientific research. Maintaining situational awareness and effective communication are essential for diver safety. Traditional underwater communication systems are often bulky and expensive, limiting their accessibility to divers of all levels. While recent systems leverage lightweight smartphones and support text messaging, the messages are predefined and thus restrict context-specific communication. In this paper, we present AquaVLM, a tap-and-send underwater communication system that automatically generates context-aware messages and transmits them using ubiquitous smartphones. Our system features a mobile vision-language model (VLM) fine-tuned on an auto-generated underwater conversation dataset and employs a hierarchical message generation pipeline. We co-design the VLM and transmission, incorporating error-resilient fine-tuning to improve the system’s robustness to transmission errors. We develop a VR simulator to enable users to experience AquaVLM in a realistic underwater environment and create a fully functional prototype on the iOS platform for real-world experiments. Both subjective and objective evaluations validate the effectiveness of AquaVLM and highlight its potential for personal underwater communication as well as broader mobile VLM applications.

[AI-187] PREFINE: Personalized Story Generation via Simulated User Critics and User-Specific Rubric Generation

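[Quick read]: This paper addresses the difficulty generative AI has in reflecting individual user preferences in personalized story generation. Conventional methods rely on explicit feedback or parameter fine-tuning, which raises practical problems of user burden, data collection, computational cost, and privacy. The key to the solution is PREFINE (Persona-and-Rubric Guided Critique-and-Refine), a framework that builds a pseudo-user agent from the user's interaction history, generates user-specific rubrics (evaluation criteria), and has the agent critique and refine generated outputs iteratively, achieving personalization without parameter updates or direct user feedback. In automatic evaluation the method achieves higher win rates than the baselines without compromising general story quality, and the analysis of model variants confirms that both the pseudo-user agent and the user-specific rubrics are crucial for personalization performance.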

Link: https://arxiv.org/abs/2510.21721
Authors: Kentaro Ueda,Takehiro Takayanagi
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Comments:

Abstract:While recent advances in Large Language Models (LLMs) have improved the quality of creative text generation, significant challenges remain in producing personalized stories that reflect individual user preferences. Conventional approaches rely on explicit feedback or fine-tuning, which presents practical issues regarding user burden, data collection, computational costs, and privacy. In this work, we propose PREFINE (Persona-and-Rubric Guided Critique-and-Refine), a novel framework that extends the Critique-and-Refine paradigm to personalization. PREFINE constructs a pseudo-user agent from a user’s interaction history and generates user-specific rubrics (evaluation criteria). By having this agent critique and refine outputs on the user’s behalf based on these tailored rubrics, our method achieves personalized generation without requiring parameter updates or direct user feedback. We conducted a comprehensive evaluation on the PerDOC and PerMPST story datasets. We designed three baseline methods and several model variants to verify the contribution of each component of our framework. In automatic evaluations (LLM-as-a-Judge), PREFINE achieved higher win rates and statistically significant scores than the baselines, without compromising general story quality. Analysis of the model variants confirmed that both the pseudo-user agent and the user-specific rubrics are crucial for enhancing personalization performance. Beyond story generation, our approach holds potential for enabling efficient personalization in broader applications, such as dialogue systems, education, and recommendation.

[AI-188] A Multi-Component AI Framework for Computational Psychology: From Robust Predictive Modeling to Deployed Generative Dialogue

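[Quick read]: This paper addresses the gap in computational psychology between isolated predictive modeling and interactive systems for psychological analysis, that is, how to turn standalone predictive models into a deployable, interactive platform. The key to the solution is an end-to-end, multi-component framework: first, baseline performance is established on four diverse psychological datasets; second, transformer models are fine-tuned, with fixes for numerical instability in regression tasks and a workflow for large-scale training under severe resource constraints; third, a generative large language model (LLM) is fine-tuned with parameter-efficient techniques to act as an interactive "Personality Brain"; finally, all predictive and generative models are integrated into a scalable microservices architecture. This closes the loop from algorithm development to system deployment and offers a reproducible, practical path for research in computational psychology and human-AI interaction.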

Link: https://arxiv.org/abs/2510.21720
Authors: Anant Pareek
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Comments:

Abstract:The confluence of Artificial Intelligence and Computational Psychology presents an opportunity to model, understand, and interact with complex human psychological states through computational means. This paper presents a comprehensive, multi-faceted framework designed to bridge the gap between isolated predictive modeling and an interactive system for psychological analysis. The methodology encompasses a rigorous, end-to-end development lifecycle. First, foundational performance benchmarks were established on four diverse psychological datasets using classical machine learning techniques. Second, state-of-the-art transformer models were fine-tuned, a process that necessitated the development of effective solutions to overcome critical engineering challenges, including the resolution of numerical instability in regression tasks and the creation of a systematic workflow for conducting large-scale training under severe resource constraints. Third, a generative large language model (LLM) was fine-tuned using parameter-efficient techniques to function as an interactive “Personality Brain.” Finally, the entire suite of predictive and generative models was architected and deployed as a robust, scalable microservices ecosystem. Key findings include the successful stabilization of transformer-based regression models for affective computing, showing meaningful predictive performance where standard approaches failed, and the development of a replicable methodology for democratizing large-scale AI research. The significance of this work lies in its holistic approach, demonstrating a complete research-to-deployment pipeline that integrates predictive analysis with generative dialogue, thereby providing a practical model for future research in computational psychology and human-AI interaction.

[AI-189] GAMER PAT: Research as a Serious Game

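[Quick read]: This paper asks how, as generative AI increasingly outperforms students at academic writing, the motivation, creativity, and intellectual growth of novice researchers can be preserved in an age of automated academic output. The key to the solution is GAMER PAT (GAme MastER, Paper Authoring Tutor), a prompt-engineered AI chatbot that reframes research paper writing as a serious game: through role-playing mechanics, users interact with a co-author NPC and anonymous reviewer NPCs, turn feedback into "missions", and advance through a narrative-driven writing process. The study identifies a four-phase scaffolding pattern (question posing, meta-perspective, structuring, and recursive reflection), suggesting that the system supports not only the structural development of a paper but also reflective thinking and motivation.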

Link: https://arxiv.org/abs/2510.21719
Authors: Kenji Saito,Rei Tadika
Affiliations: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Comments: 14 pages, 2 figures

Abstract:As generative AI increasingly outperforms students in producing academic writing, a critical question arises: how can we preserve the motivation, creativity, and intellectual growth of novice researchers in an age of automated academic achievement? This paper introduces GAMER PAT (GAme MastER, Paper Authoring Tutor), a prompt-engineered AI chatbot that reframes research paper writing as a serious game. Through role-playing mechanics, users interact with a co-author NPC and anonymous reviewer NPCs, turning feedback into “missions” and advancing through a narrative-driven writing process. Our study reports on 26+ gameplay chat logs, including both autoethnography and use by graduate students under supervision. Using qualitative log analysis with SCAT (Steps for Coding and Theorization), we identified an emergent four-phase scaffolding pattern: (1) question posing, (2) meta-perspective, (3) structuring, and (4) recursive reflection. These results suggest that GAMER PAT supports not only the structural development of research writing but also reflective and motivational aspects. We present this work as a descriptive account of concept and process, not a causal evaluation. We also include a speculative outlook envisioning how humans may continue to cultivate curiosity and agency alongside AI-driven research. This arXiv version thus provides both a descriptive report of design and usage, and a forward-looking provocation for future empirical studies.
ACM classes: K.4.0; K.3.1

[AI-190] AI-Enhanced Operator Assistance for UNICOS Applications

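[Quick read]: This paper addresses three core problems faced by operators and maintainers of UNICOS (CERN's UNified Industrial Control System): the cognitive burden of decoding control-panel widgets, the manual effort required for root cause analysis, and the difficulty of tracing datapoint elements (DPEs) across a complex codebase. The proposed solution is a multi-agent system whose key feature is a modular architecture comprising a UNICOS-side extension written in CTRL code, a Python-based multi-agent system deployed on a virtual machine, and a vector database storing operator documentation and widget animation code. Using retrieval augmented generation (RAG), the architecture reasons jointly over live device data and documentation to decode widgets, locate root causes, and trace DPEs, reducing manual workload and speeding up responses to faults.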

Link: https://arxiv.org/abs/2510.21717
Authors: Bernard Tam,Jean-Charles Tournier,Fernando Varela Rodriguez
Affiliations: Unknown
Categories: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Comments: Prepared as part of the CERN openlab programme 2025. Also available on Zenodo, a repository operated by CERN and co-funded by the European Union

Abstract:This project explores the development of an AI-enhanced operator assistant for UNICOS, CERN’s UNified Industrial Control System. While powerful, UNICOS presents a number of challenges, including the cognitive burden of decoding widgets, manual effort required for root cause analysis, and difficulties maintainers face in tracing datapoint elements (DPEs) across a complex codebase. In situations where timely responses are critical, these challenges can increase cognitive load and slow down diagnostics. To address these issues, a multi-agent system was designed and implemented. The solution is supported by a modular architecture comprising a UNICOS-side extension written in CTRL code, a Python-based multi-agent system deployed on a virtual machine, and a vector database storing both operator documentation and widget animation code. Preliminary evaluations suggest that the system is capable of decoding widgets, performing root cause analysis by leveraging live device data and documentation, and tracing DPEs across a complex codebase. Together, these capabilities reduce the manual workload of operators and maintainers, enhance situational awareness in operations, and accelerate responses to alarms and anomalies. Beyond these immediate gains, this work highlights the potential of introducing multi-modal reasoning and retrieval augmented generation (RAG) into the domain of industrial control. Ultimately, this work represents more than a proof of concept: it provides a basis for advancing intelligent operator interfaces at CERN. By combining modular design, extensibility, and practical AI integration, this project not only alleviates current operator pain points but also points toward broader opportunities for assistive AI in accelerator operations.

[AI-191] A Feature Engineering Approach for Business Impact-Oriented Failure Detection in Distributed Instant Payment Systems

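[Quick read]: This paper addresses the observability gap between technical infrastructure metrics and business process visibility in instant payment infrastructures, where high throughput and zero-downtime requirements make early failure detection and localization difficult for traditional monitoring. The key to the solution is a feature engineering approach based on processing times between consecutive ISO 20022 message exchanges, which yields a compact representation of system state; anomaly detection applied to these features then identifies a variety of anomaly patterns. The approach provides interpretable diagnostics and, by mapping features to distinct processing phases, distinguishes internal from external payment system issues, shortening investigation time and bridging observability gaps in distributed systems where transaction state is fragmented across multiple entities.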

Link: https://arxiv.org/abs/2510.21710
Authors: Lorenzo Porcelli
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Comments:

Abstract:Instant payment infrastructures have stringent performance requirements, processing millions of transactions daily with zero-downtime expectations. Traditional monitoring approaches fail to bridge the gap between technical infrastructure metrics and business process visibility. We introduce a novel feature engineering approach based on processing times computed between consecutive ISO 20022 message exchanges, creating a compact representation of system state. By applying anomaly detection to these features, we enable early failure detection and localization, allowing incident classification. Experimental evaluation on the TARGET Instant Payment Settlement (TIPS) system, using both real-world incidents and controlled simulations, demonstrates the approach’s effectiveness in detecting diverse anomaly patterns and provides inherently interpretable explanations that enable operators to understand the business impact. By mapping features to distinct processing phases, the resulting framework differentiates between internal and external payment system issues, significantly reduces investigation time, and bridges observability gaps in distributed systems where transaction state is fragmented across multiple entities.
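A rough sketch of the feature idea described above: for each transaction, take the timestamps of its successive message exchanges and use the gaps between consecutive ones as features, then flag unusually long gaps. The event labels, the timestamps, and the median/MAD outlier score are all assumptions made for illustration; the abstract does not specify which messages, features, or anomaly detector the author uses.

```python
from statistics import median

def phase_durations(events):
    """Processing time between consecutive message exchanges of one transaction.

    events is a list of (label, timestamp_in_seconds) tuples; the labels used
    below are hypothetical placeholders, not the system's real phase names.
    """
    ordered = sorted(events, key=lambda event: event[1])
    return {
        f"{first}->{second}": t_second - t_first
        for (first, t_first), (second, t_second) in zip(ordered, ordered[1:])
    }

def robust_z(value, history):
    """Median/MAD outlier score, used here as a simple stand-in anomaly detector."""
    centre = median(history)
    mad = median([abs(x - centre) for x in history])
    if mad == 0:
        return 0.0
    return 0.6745 * (value - centre) / mad

# One hypothetical transaction with three message exchanges.
events = [("request_in", 0.0), ("request_out", 0.25), ("reply_in", 0.75)]
features = phase_durations(events)
print(features)

# Compare one phase duration against made-up historical values for that phase.
history = [1.0, 2.0, 3.0, 4.0, 5.0]
print(robust_z(13.0, history) > 3.5)  # → True
```

Because each feature corresponds to one processing phase, a high score points directly at the phase that slowed down, which is the kind of interpretability the abstract emphasizes.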

[AI-192] Robust Decision Making with Partially Calibrated Forecasts

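[Quick read]: This paper addresses how a conservative decision maker should design an optimal decision rule for high-dimensional prediction problems when forecasts are only partially calibrated. Full calibration carries strong decision-theoretic guarantees but is tractable only for low-dimensional prediction problems, while existing weaker forms of calibration lack those properties. The paper proposes a minimax framework that maximizes expected utility in the worst case over all distributions consistent with the calibration guarantees, yielding robust decisions. Its key contribution is a characterization of the minimax optimal decision rule via a duality argument, together with the finding that "trusting the predictions and acting accordingly" remains minimax optimal under decision calibration (and any strictly stronger notion), a condition weaker and more tractable than full calibration. For calibration guarantees that fall short of decision calibration, the minimax optimal rule is still efficiently computable, and the paper evaluates empirically a natural one that applies to any regression model trained to optimize squared error.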

Link: https://arxiv.org/abs/2510.23471
Authors: Shayan Kiyani,Hamed Hassani,George Pappas,Aaron Roth
Affiliations: Unknown
Categories: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

Abstract:Calibration has emerged as a foundational goal in “trustworthy machine learning”, in part because of its strong decision theoretic semantics. Independent of the underlying distribution, and independent of the decision maker's utility function, calibration promises that amongst all policies mapping predictions to actions, the uniformly best policy is the one that “trusts the predictions” and acts as if they were correct. But this is true only of fully calibrated forecasts, which are tractable to guarantee only for very low dimensional prediction problems. For higher dimensional prediction problems (e.g. when outcomes are multiclass), weaker forms of calibration have been studied that lack these decision theoretic properties. In this paper we study how a conservative decision maker should map predictions endowed with these weaker (“partial”) calibration guarantees to actions, in a way that is robust in a minimax sense: i.e. to maximize their expected utility in the worst case over distributions consistent with the calibration guarantees. We characterize their minimax optimal decision rule via a duality argument, and show that surprisingly, “trusting the predictions and acting accordingly” is recovered in this minimax sense by decision calibration (and any strictly stronger notion of calibration), a substantially weaker and more tractable condition than full calibration. For calibration guarantees that fall short of decision calibration, the minimax optimal decision rule is still efficiently computable, and we provide an empirical evaluation of a natural one that applies to any regression model solved to optimize squared error.

[AI-193] Exploring Vulnerability in AI Industry

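[Quick read]: This paper addresses the difficulty of assessing systemic vulnerability in the fast-evolving industry of Foundation Models (FMs) for generative AI, focusing on identifying and quantifying risks in the upstream production value chain. The key to its approach is a synthetic AI Vulnerability Index (AIVI) that models FM output as a function of five inputs (Compute, Data, Talent, Capital, and Energy), aggregates the subindexes with a weighted geometric mean, and normalizes them against theoretical or empirical benchmarks, so that the core risk points of the FM supply chain can be characterized systematically even when public data is limited.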

Link: https://arxiv.org/abs/2510.23421
Authors: Claudio Pirrone,Stefano Fricano,Gioacchino Fazio
Affiliations: Unknown
Categories: General Economics (econ.GN); Artificial Intelligence (cs.AI)
Comments: Preliminary Draft

Abstract: The rapid ascent of Foundation Models (FMs), enabled by the Transformer architecture, drives the current AI ecosystem. Characterized by large-scale training and downstream adaptability, FMs (such as the GPT family) have achieved massive public adoption, fueling a turbulent market shaped by platform economics and intense investment. Assessing the vulnerability of this fast-evolving industry is critical yet challenging due to data limitations. This paper proposes a synthetic AI Vulnerability Index (AIVI) focusing on the upstream value chain for FM production, prioritizing publicly available data. We model FM output as a function of five inputs: Compute, Data, Talent, Capital, and Energy, hypothesizing that supply vulnerability in any input threatens the industry. Key vulnerabilities include compute concentration, data scarcity and legal risks, talent bottlenecks, capital intensity and strategic dependencies, as well as escalating energy demands. Acknowledging imperfect input substitutability, we propose a weighted geometric average of aggregate subindexes, normalized using theoretical or empirical benchmarks. Despite limitations and room for improvement, this preliminary index aims to quantify systemic risks in AI's core production engine, and implicitly shed light on the risks for the downstream value chain.
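
The aggregation step described in the abstract, a weighted geometric average of subindexes, can be sketched as follows. This is only an illustration of the generic formula: the subindex scores, the equal weights, and the assumption that scores are normalized to (0, 1] are hypothetical, and the abstract does not give the paper's actual values or scoring direction.

```python
import math


def weighted_geometric_mean(subindexes, weights):
    """Weighted geometric mean: exp(sum(w_k * ln(x_k)) / sum(w_k))."""
    if any(value <= 0 for value in subindexes.values()):
        raise ValueError("subindex values must be strictly positive")
    total_weight = sum(weights[name] for name in subindexes)
    log_sum = sum(weights[name] * math.log(value) for name, value in subindexes.items())
    return math.exp(log_sum / total_weight)


# Hypothetical normalized scores for the five inputs named in the abstract.
scores = {"compute": 0.1, "data": 0.9, "talent": 0.9, "capital": 0.9, "energy": 0.9}
weights = {name: 1.0 for name in scores}  # equal weights, an assumption

geometric = weighted_geometric_mean(scores, weights)
arithmetic = sum(scores.values()) / len(scores)
print(round(geometric, 3), round(arithmetic, 3))
```

A geometric mean is pulled down by a single weak input more strongly than an arithmetic mean, which fits the abstract's point that inputs are only imperfectly substitutable.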

[AI-194] PASS-Enhanced MEC: Joint Optimization of Task Offloading and Uplink PASS Beamforming

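[Quick read]: This paper addresses low task-offloading efficiency and high latency in mobile edge computing (MEC) systems operating in dynamic wireless environments. The key to its solution is a MEC architecture enhanced by a pinching-antenna system (PASS), which uses dielectric waveguides and adjustable pinching antennas to establish short-distance line-of-sight (LoS) links, mitigating the severe path loss and potential signal blockage of high-frequency bands. The authors model network latency minimization as a Markov decision process (MDP) and use deep reinforcement learning (DRL) to jointly optimize uplink PASS beamforming and task offloading. To improve training stability, they propose a load balancing-aware proximal policy optimization (LBPPO) algorithm that incorporates node-level and waveguide-level load-balancing information to keep computation and transmission delays balanced, respectively. The resulting system converges better than fixed pinching-antenna (fixed-PA) baselines and conventional MIMO-assisted MEC when there are many user devices or the transmit power is high.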

Link: https://arxiv.org/abs/2510.22948
Authors: Zhaoming Hu,Ruikang Zhong,Xidong Mu,Dengao Li,Yuanwei Liu
Affiliations: Unknown
Categories: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Comments:

Abstract: A pinching-antenna system (PASS)-enhanced mobile edge computing (MEC) architecture is investigated to improve the task offloading efficiency and latency performance in dynamic wireless environments. By leveraging dielectric waveguides and flexibly adjustable pinching antennas, PASS establishes short-distance line-of-sight (LoS) links while effectively mitigating the significant path loss and potential signal blockage, making it a promising solution for high-frequency MEC systems. We formulate a network latency minimization problem to jointly optimize uplink PASS beamforming and task offloading. The resulting problem is modeled as a Markov decision process (MDP) and solved via the deep reinforcement learning (DRL) method. To address the instability introduced by the max operator in the objective function, we propose a load balancing-aware proximal policy optimization (LBPPO) algorithm. LBPPO incorporates both node-level and waveguide-level load balancing information into the policy design, maintaining computational and transmission delay equilibrium, respectively. Simulation results demonstrate that the proposed PASS-enhanced MEC with adaptive uplink PASS beamforming exhibits stronger convergence capability than fixed-PA baselines and conventional MIMO-assisted MEC, especially in scenarios with a large number of UEs or high transmit power.

[AI-195] TABL-ABM: A Hybrid Framework for Synthetic LOB Generation ECAI2025

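[Quick read]: This paper addresses the need in financial trading for high-fidelity time series data that can supplement historical data for training large trading models. Conventional generative models rely on large amounts of historical data and complex architectures (such as autoregressive or diffusion models). The key idea here is to combine a popular agent-based model, the Chiarella model for simulating intraday trading activity, with TABL (Temporal-Attention Bilinear Layer), one of the best-performing models for multivariate time series forecasting, and to couple the forecaster to a matching engine through a novel method for simulating deleted order flow. The integrated framework makes it possible to assess the forecasting model's generative ability against stylised facts. Results show that it generates plausible price dynamics but falls short on details of market microstructure, indicating that more sophisticated agent behaviours are needed to capture tail events.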

Link: https://arxiv.org/abs/2510.22685
Authors: Ollie Olby,Rory Baggott,Namid Stillman
Affiliations: Unknown
Categories: Computational Finance (q-fin.CP); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Trading and Market Microstructure (q-fin.TR)
Comments: 8 pages, 5 figures, accepted to the Workshop on AI in Finance at ECAI2025

Abstract: The recent application of deep learning models to financial trading has heightened the need for high fidelity financial time series data. This synthetic data can be used to supplement historical data to train large trading models. The state-of-the-art models for the generative application often rely on huge amounts of historical data and large, complicated models. These models range from autoregressive and diffusion-based models through to architecturally simpler models such as the temporal-attention bilinear layer. Agent-based approaches to modelling limit order book dynamics can also recreate trading activity through mechanistic models of trader behaviours. In this work, we demonstrate how a popular agent-based framework for simulating intraday trading activity, the Chiarella model, can be combined with one of the most performant deep learning models for forecasting multi-variate time series, the TABL model. This forecasting model is coupled to a simulation of a matching engine with a novel method for simulating deleted order flow. Our simulator gives us the ability to test the generative abilities of the forecasting model using stylised facts. Our results show that this methodology generates realistic price dynamics; however, on deeper analysis, parts of the market's microstructure are not accurately recreated, highlighting the necessity for including more sophisticated agent behaviors into the modeling framework to help account for tail events.

[AI-196] An Analytic Theory of Quantum Imaginary Time Evolution

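[Quick read]: This paper addresses the lack of a first-principles theoretical understanding of the dynamics of the quantum imaginary time evolution (QITE) algorithm on current Noisy Intermediate-Scale Quantum (NISQ) devices. The key steps are as follows. First, QITE is shown to be equivalent to a variational quantum algorithm (VQA) trained with Quantum Natural Gradient Descent (QNGD), where the inverse quantum Fisher information matrix serves as the learning-rate tensor; the equivalence holds not only at the level of gradient update rules but also through the action principle, in that the variational principle can be connected directly to the geodesic distance in the quantum Fisher information metric, up to an integration constant. Second, for wide quantum neural networks, the quantum neural tangent kernel (QNTK) framework is used to build an analytic model of QITE, and QITE is proved to always converge faster than VQA trained with vanilla gradient descent (GD), although this advantage is suppressed by the exponential growth of the Hilbert space dimension. The theory covers linear, quadratic, and more general loss functions and is validated by numerical simulations, providing a theoretical foundation for QITE dynamics and analytic insight for the first-principles design of variational quantum algorithms.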

Link: https://arxiv.org/abs/2510.22481
Authors: Min Chen,Bingzhi Zhang,Quntao Zhuang,Junyu Liu
Affiliations: Unknown
Categories: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments: 35 pages, 8 figures

Abstract: The quantum imaginary time evolution (QITE) algorithm is one of the most promising variational quantum algorithms (VQAs), bridging the current era of Noisy Intermediate-Scale Quantum devices and the future of fully fault-tolerant quantum computing. Although practical demonstrations of QITE and its potential advantages over the general VQA trained with vanilla gradient descent (GD) in certain tasks have been reported, a first-principle, theoretical understanding of QITE remains limited. Here, we aim to develop an analytic theory for the dynamics of QITE. First, we show that QITE can be interpreted as a form of a general VQA trained with Quantum Natural Gradient Descent (QNGD), where the inverse quantum Fisher information matrix serves as the learning-rate tensor. This equivalence is established not only at the level of gradient update rules, but also through the action principle: the variational principle can be directly connected to the geometric geodesic distance in the quantum Fisher information metric, up to an integration constant. Second, for wide quantum neural networks, we employ the quantum neural tangent kernel framework to construct an analytic model for QITE. We prove that QITE always converges faster than GD-based VQA, though this advantage is suppressed by the exponential growth of Hilbert space dimension. This helps explain certain experimental results in quantum computational chemistry. Our theory encompasses linear, quadratic, and more general loss functions. We validate the analytic results through numerical simulations. Our findings establish a theoretical foundation for QITE dynamics and provide analytic insights for the first-principle design of variational quantum algorithms.

[AI-197] Right Place Right Time: Market Simulation-based RL for Execution Optimisation

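[Quick read]: This paper addresses the problem of optimizing execution strategies for large orders: minimizing market impact and transaction costs while controlling execution risk. Its core solution is a reinforcement learning (RL) framework evaluated in a reactive agent-based market simulator that can decompose slippage into market impact and execution risk. Using the efficient frontier of Almgren and Chriss as the evaluation criterion, the paper shows that the strategies learned by the RL agent consistently outperform baseline methods and operate near the efficient frontier, demonstrating the potential of reinforcement learning for optimizing trading algorithms.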

Link: https://arxiv.org/abs/2510.22206
Authors: Ollie Olby,Andreea Bacalum,Rory Baggott,Namid Stillman
Affiliations: Unknown
Categories: Computational Finance (q-fin.CP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Risk Management (q-fin.RM); Trading and Market Microstructure (q-fin.TR)
Comments: 8 pages, 4 figures, accepted to ICAIF 2025

Abstract: Execution algorithms are vital to modern trading: they enable market participants to execute large orders while minimising market impact and transaction costs. As these algorithms grow more sophisticated, optimising them becomes increasingly challenging. In this work, we present a reinforcement learning (RL) framework for discovering optimal execution strategies, evaluated within a reactive agent-based market simulator. This simulator creates reactive order flow and allows us to decompose slippage into its constituent components: market impact and execution risk. We assess the RL agent's performance using the efficient frontier based on work by Almgren and Chriss, measuring its ability to balance risk and cost. Results show that the RL-derived strategies consistently outperform baselines and operate near the efficient frontier, demonstrating a strong ability to optimise for risk and impact. These findings highlight the potential of reinforcement learning as a powerful tool in the trader's toolkit.

[AI-198] Frequentist Validity of Epistemic Uncertainty Estimators

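[Quick read]: This paper addresses the decomposition of predictive uncertainty in machine learning systems, in particular how to estimate epistemic uncertainty, the part attributable to the model parameters. The standard Bayesian measure, mutual information, requires the posterior distribution of the model parameters and is therefore hard to compute. The key to the solution is a frequentist measure based on the bootstrap, together with a novel asymptotic expansion showing that this measure and the Bayesian mutual information are asymptotically equivalent. This gives mutual information a frequentist interpretation and suggests new strategies for approximating it. The approach is also linked to deep ensembles, a widely used heuristic, which adds perspective on their practical success.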

Link: https://arxiv.org/abs/2510.22063
Authors: Anchit Jain,Stephen Bates
Affiliations: Unknown
Categories: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST)
Comments:

Abstract:Decomposing prediction uncertainty into its aleatoric (irreducible) and epistemic (reducible) components is critical for the development and deployment of machine learning systems. A popular, principled measure for epistemic uncertainty is the mutual information between the response variable and model parameters. However, evaluating this measure requires access to the posterior distribution of the model parameters, which is challenging to compute. In view of this, we introduce a frequentist measure of epistemic uncertainty based on the bootstrap. Our main theoretical contribution is a novel asymptotic expansion that reveals that our proposed (frequentist) measure and the (Bayesian) mutual information are asymptotically equivalent. This provides frequentist interpretations to mutual information and new computational strategies for approximating it. Moreover, we link our proposed approach to the widely-used heuristic approach of deep ensembles, giving added perspective on their practical success.
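
As a rough illustration of the quantity discussed in the abstract, the sketch below computes the commonly used ensemble estimate of mutual information: the entropy of the averaged prediction minus the average entropy of the individual predictions. It assumes the members are classifiers refit on bootstrap resamples (or deep ensemble members) and that their class probabilities are already available; the paper's exact bootstrap measure is not specified in the abstract, so this should not be read as its definition.

```python
import math


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def epistemic_mutual_information(member_probs):
    """Entropy of the mean prediction minus the mean entropy of each member."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [sum(m[j] for m in member_probs) / n_members for j in range(n_classes)]
    total_uncertainty = entropy(mean_probs)
    aleatoric_part = sum(entropy(m) for m in member_probs) / n_members
    return total_uncertainty - aleatoric_part


# Members agree on a 50/50 prediction: uncertainty is purely aleatoric.
agree = epistemic_mutual_information([[0.5, 0.5], [0.5, 0.5]])
# Members are each confident but contradict each other: uncertainty is epistemic.
disagree = epistemic_mutual_information([[1.0, 0.0], [0.0, 1.0]])
print(agree, disagree)
```

The two calls show the intended contrast: disagreement between members, rather than spread within any single prediction, is what drives the estimate up.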

Machine Learning

[LG-0] Lightweight Robust Direct Preference Optimization

Link: https://arxiv.org/abs/2510.23590
Authors: Cheol Woo Kim,Shresth Verma,Mauricio Tec,Milind Tambe
Categories: Machine Learning (cs.LG)
Comments:

[LG-1] Sequential Multi-Agent Dynamic Algorithm Configuration NEURIPS2025

Link: https://arxiv.org/abs/2510.23535
Authors: Chen Lu,Ke Xue,Lei Yuan,Yao Wang,Yaoyuan Wang,Sheng Fu,Chao Qian
Categories: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Comments: NeurIPS 2025

[LG-2] Bayes-Split-Edge: Bayesian Optimization for Constrained Collaborative Inference in Wireless Edge Systems

Link: https://arxiv.org/abs/2510.23503
Authors: Fatemeh Zahra Safaeipour,Jacob Chakareski,Morteza Hashemi
Categories: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Signal Processing (eess.SP)
Comments:

Abstract: Mobile edge devices (e.g., AR/VR headsets) typically need to complete timely inference tasks while operating with limited on-board computing and energy resources. In this paper, we investigate the problem of collaborative inference in wireless edge networks, where energy-constrained edge devices aim to complete inference tasks within given deadlines. These tasks are carried out using neural networks, and the edge device seeks to optimize inference performance under energy and delay constraints. The inference process can be split between the edge device and an edge server, thereby achieving collaborative inference over wireless networks. We formulate an inference utility optimization problem subject to energy and delay constraints, and propose a novel solution called Bayes-Split-Edge, which leverages Bayesian optimization for collaborative split inference over wireless edge networks. Our solution jointly optimizes the transmission power and the neural network split point. The Bayes-Split-Edge framework incorporates a novel hybrid acquisition function that balances inference task utility, sample efficiency, and constraint violation penalties. We evaluate our approach using the VGG19 model on the ImageNet-Mini dataset, and ResNet101 on Tiny-ImageNet, and real-world mMobile wireless channel datasets. Numerical results demonstrate that Bayes-Split-Edge achieves up to 2.4x reduction in evaluation cost compared to standard Bayesian optimization and achieves near-linear convergence. It also outperforms several baselines, including CMA-ES, DIRECT, exhaustive search, and Proximal Policy Optimization (PPO), while matching exhaustive search performance under tight constraints. These results confirm that the proposed framework provides a sample-efficient solution requiring at most 20 function evaluations and constraint-aware optimization for wireless split inference in edge computing systems.

[LG-3] Towards Deep Physics-Informed Kolmogorov-Arnold Networks

Link: https://arxiv.org/abs/2510.23501
Authors: Spyros Rigas,Fotios Anagnostopoulos,Michalis Papachristou,Georgios Alexandridis
Categories: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
Comments: 73 pages, 22 figures

[LG-4] Learning to Reason Efficiently with Discounted Reinforcement Learning

Link: https://arxiv.org/abs/2510.23486
Authors: Alex Ayoub,Kavosh Asadi,Dale Schuurmans,Csaba Szepesvári,Karim Bouyarmane
Categories: Machine Learning (cs.LG)
Comments:

Abstract:Large reasoning models (LRMs) often consume excessive tokens, inflating computational cost and latency. We challenge the assumption that longer responses improve accuracy. By penalizing reasoning tokens using a discounted reinforcement learning setup (interpretable as a small token cost) and analyzing Blackwell optimality in restricted policy classes, we encourage concise yet accurate reasoning. Experiments confirm our theoretical results that this approach shortens chains of thought while preserving accuracy.
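
To make the "small token cost" reading concrete, here is a minimal sketch of a discounted return for a response whose only reward arrives at the final token. The reward placement, the discount value `gamma=0.999`, and the function name are assumptions for illustration; the abstract does not state the paper's exact setup.

```python
def discounted_return(final_reward, num_tokens, gamma=0.999):
    """Return of an episode with a single terminal reward,
    discounted once per generated token (an assumed setup)."""
    return (gamma ** num_tokens) * final_reward


short_correct = discounted_return(1.0, num_tokens=100)
long_correct = discounted_return(1.0, num_tokens=1000)
long_wrong = discounted_return(0.0, num_tokens=1000)
print(round(short_correct, 3), round(long_correct, 3), long_wrong)
```

For gamma close to 1, `gamma ** T` is approximately `1 - (1 - gamma) * T`, so each extra token costs roughly `1 - gamma` of the final reward, which is why discounting can be interpreted as a per-token penalty.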

[LG-5] Adaptive Dual Prompting: Hierarchical Debiasing for Fairness-aware Graph Neural Networks

Link: https://arxiv.org/abs/2510.23469
Authors: Yuhan Yang,Xingbo Fu,Jundong Li
Categories: Machine Learning (cs.LG)
Comments:

[LG-6] Differential Privacy as a Perk: Federated Learning over Multiple-Access Fading Channels with a Multi-Antenna Base Station

Link: https://arxiv.org/abs/2510.23463
Authors: Hao Liang,Haifeng Wen,Kaishun Wu,Hong Xing
Categories: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Comments: 15 pages, 5 figures, submitted for possible publication

[LG-7] SGFusion: Stochastic Geographic Gradient Fusion in Federated Learning

Link: https://arxiv.org/abs/2510.23455
Authors: Khoa Nguyen,Khang Tran,NhatHai Phan,Cristian Borcea,Rouming Jin,Issa Khalil
Categories: Machine Learning (cs.LG)
Comments:

[LG-8] Schrodinger Neural Network and Uncertainty Quantification: Quantum Machine

Link: https://arxiv.org/abs/2510.23449
Authors: M. M. Hammad
Categories: Machine Learning (cs.LG)
Comments: 29 pages, 16 figures

[LG-9] An Information-Theoretic Analysis of Out-of-Distribution Generalization in Meta-Learning with Applications to Meta-RL

Link: https://arxiv.org/abs/2510.23448
Authors: Xingtu Liu
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

[LG-10] Coresets for Clustering Under Stochastic Noise NEURIPS2025

Link: https://arxiv.org/abs/2510.23438
Authors: Lingxiao Huang,Zhize Li,Nisheeth K. Vishnoi,Runkai Yang,Haoyu Zhao
Categories: Machine Learning (cs.LG); Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Comments: This paper has been accepted by NeurIPS 2025

[LG-11] Improving Predictions of Molecular Properties with Graph Featurisation and Heterogeneous Ensemble Models

Link: https://arxiv.org/abs/2510.23428
Authors: Michael L. Parker,Samar Mahmoud,Bailey Montefiore,Mario Öeren,Himani Tandon,Charlotte Wharrick,Matthew D. Segall
Categories: Machine Learning (cs.LG)
Comments:

[LG-12] PrivacyGuard: A Modular Framework for Privacy Auditing in Machine Learning

Link: https://arxiv.org/abs/2510.23427
Authors: Luca Melis,Matthew Grange,Iden Kalemaj,Karan Chadha,Shengyuan Hu,Elena Kashtelyan,Will Bullock
Categories: Machine Learning (cs.LG)
Comments:

[LG-13] The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation

Link: https://arxiv.org/abs/2510.23393
Authors: Farid Bagirov,Mikhail Arkhipov,Ksenia Sycheva,Evgeniy Glukhov,Egor Bogomolov
Categories: Machine Learning (cs.LG)
Comments:

[LG-14] Floating-Point Neural Network Verification at the Software Level

Link: https://arxiv.org/abs/2510.23389
Authors: Edoardo Manino,Bruno Farias,Rafael Sá Menezes,Fedor Shmarov,Lucas C. Cordeiro
Categories: Software Engineering (cs.SE); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Comments: Pre-print before submission to peer review

[LG-15] Towards a Generalizable AI for Materials Discovery: Validation through Immersion Coolant Screening

Link: https://arxiv.org/abs/2510.23371
Authors: Hyunseung Kim,Dae-Woong Jeong,Changyoung Park,Won-Ji Lee,Ha-Eun Lee,Ji-Hye Lee,Rodrigo Hormazabal,Sung Moon Ko,Sumin Lee,Soorin Yim,Chanhui Lee,Sehui Han,Sang-Ho Cha,Woohyung Lim
Categories: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Comments: 16 pages, 4 figures

[LG-16] Robust Non-negative Proximal Gradient Algorithm for Inverse Problems

Link: https://arxiv.org/abs/2510.23362
Authors: Hanzhang Wang,Zonglin Liu,Jingyi Xu,Chenyang Wang,Zhiwei Zhong,Qiangqiang Shen
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

[LG-17] Block-Diagonal LoRA for Eliminating Communication Overhead in Tensor Parallel LoRA Serving

Link: https://arxiv.org/abs/2510.23346
Authors: Xinyu Wang,Jonas M. Kübler,Kailash Budhathoki,Yida Wang,Matthäus Kleindessner
Categories: Machine Learning (cs.LG)
Comments:

Abstract:When serving a single base LLM with several different LoRA adapters simultaneously, the adapters cannot simply be merged with the base model’s weights as the adapter swapping would create overhead and requests using different adapters could not be batched. Rather, the LoRA computations have to be separated from the base LLM computations, and in a multi-device setup the LoRA adapters can be sharded in a way that is well aligned with the base model’s tensor parallel execution, as proposed in S-LoRA. However, the S-LoRA sharding strategy encounters some communication overhead, which may be small in theory, but can be large in practice. In this paper, we propose to constrain certain LoRA factors to be block-diagonal, which allows for an alternative way of sharding LoRA adapters that does not require any additional communication for the LoRA computations. We demonstrate in extensive experiments that our block-diagonal LoRA approach is similarly parameter efficient as standard LoRA (i.e., for a similar number of parameters it achieves similar downstream performance) and that it leads to significant end-to-end speed-up over S-LoRA. For example, when serving on eight A100 GPUs, we observe up to 1.79x (1.23x) end-to-end speed-up with 0.87x (1.74x) the number of adapter parameters for Llama-3.1-70B, and up to 1.63x (1.3x) end-to-end speed-up with 0.86x (1.73x) the number of adapter parameters for Llama-3.1-8B.

[LG-18] GRAD: Real-Time Gated Recurrent Anomaly Detection in Autonomous Vehicle Sensors Using Reinforced EMA and Multi-Stage Sliding Window Techniques

Link: https://arxiv.org/abs/2510.23327
Authors: Mohammad Hossein Jafari Naeimi,Ali Norouzi,Athena Abdi
Categories: Machine Learning (cs.LG)
Comments:

[LG-19] Towards Scaling Deep Neural Networks with Predictive Coding: Theory and Practice

Link: https://arxiv.org/abs/2510.23323
Authors: Francesco Innocenti
Categories: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Comments:

[LG-20] Predicting symbolic ODEs from multiple trajectories NEURIPS2025

Link: https://arxiv.org/abs/2510.23295
Authors: Yakup Emre Şahin,Niki Kilbertus,Sören Becker
Categories: Machine Learning (cs.LG)
Comments: Published at: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Machine Learning and the Physical Sciences

[LG-21] Learning from Frustration: Torsor CNNs on Graphs

Link: https://arxiv.org/abs/2510.23288
Authors: Daiyuan Li,Shreya Arya,Robert Ghrist
Categories: Machine Learning (cs.LG); Algebraic Topology (math.AT)
Comments: 19 pages (main text + appendices), 1 figure

[LG-22] Toward Interpretable Evaluation Measures for Time Series Segmentation

Link: https://arxiv.org/abs/2510.23261
Authors: Félix Chavelli,Paul Boniol,Michaël Thomazo
Categories: Machine Learning (cs.LG)
Comments:

[LG-23] GCAO: Group-driven Clustering via Gravitational Attraction and Optimization

Link: https://arxiv.org/abs/2510.23259
Authors: Qi Li,Jun Wang
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Abstract:Traditional clustering algorithms often struggle with high-dimensional and non-uniformly distributed data, where low-density boundary samples are easily disturbed by neighboring clusters, leading to unstable and distorted clustering results. To address this issue, we propose a Group-driven Clustering via Gravitational Attraction and Optimization (GCAO) algorithm. GCAO introduces a group-level optimization mechanism that aggregates low-density boundary points into collaboratively moving groups, replacing the traditional point-based contraction process. By combining local density estimation with neighborhood topology, GCAO constructs effective gravitational interactions between groups and their surroundings, enhancing boundary clarity and structural consistency. Using groups as basic motion units, a gravitational contraction strategy ensures globally stable and directionally consistent convergence. Experiments on multiple high-dimensional datasets demonstrate that GCAO outperforms 11 representative clustering methods, achieving average improvements of 37.13%, 52.08%, 44.98%, and 38.81% in NMI, ARI, Homogeneity, and ACC, respectively, while maintaining competitive efficiency and scalability. These results highlight GCAO’s superiority in preserving cluster integrity, enhancing boundary separability, and ensuring robust performance on complex data distributions.

[LG-24] Robust Iterative Learning Hidden Quantum Markov Models

Link: https://arxiv.org/abs/2510.23237
Authors: Ning Ning
Categories: Machine Learning (cs.LG); Quantum Physics (quant-ph); Computation (stat.CO); Methodology (stat.ME); Machine Learning (stat.ML)
Comments: Quantum Computing, Bayesian Inference, Spatiotemporal Analysis, Robust Learning

Abstract:Hidden Quantum Markov Models (HQMMs) extend classical Hidden Markov Models to the quantum domain, offering a powerful probabilistic framework for modeling sequential data with quantum coherence. However, existing HQMM learning algorithms are highly sensitive to data corruption and lack mechanisms to ensure robustness under adversarial perturbations. In this work, we introduce the Adversarially Corrupted HQMM (AC-HQMM), which formalizes robustness analysis by allowing a controlled fraction of observation sequences to be adversarially corrupted. To learn AC-HQMMs, we propose the Robust Iterative Learning Algorithm (RILA), a derivative-free method that integrates a Remove Corrupted Rows by Entropy Filtering (RCR-EF) module with an iterative stochastic resampling procedure for physically valid Kraus operator updates. RILA incorporates L1-penalized likelihood objectives to enhance stability, resist overfitting, and remain effective under non-differentiable conditions. Across multiple HQMM and HMM benchmarks, RILA demonstrates superior convergence stability, corruption resilience, and preservation of physical validity compared to existing algorithms, establishing a principled and efficient approach for robust quantum sequential learning.

[LG-25] Grassmanian Interpolation of Low-Pass Graph Filters: Theory and Applications

Link: https://arxiv.org/abs/2510.23235
Authors: Anton Savostianov,Michael T. Schaub,Benjamin Stamm
Categories: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Signal Processing (eess.SP); Numerical Analysis (math.NA); Spectral Theory (math.SP)
Comments: 13 pages

[LG-26] The Benchmarking Epistemology: Construct Validity for Evaluating Machine Learning Models

Link: https://arxiv.org/abs/2510.23191
Authors: Timo Freiesleben,Sebastian Zezulka
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

[LG-27] TARC: Time-Adaptive Robotic Control

Link: https://arxiv.org/abs/2510.23176
Authors: Arnav Sukhija,Lenart Treven,Jin Cheng,Florian Dörfler,Stelian Coros,Andreas Krause
Categories: Robotics (cs.RO); Machine Learning (cs.LG)
Comments:

[LG-28] A method for outlier detection based on cluster analysis and visual expert criteria

Link: https://arxiv.org/abs/2510.23136
Authors: Juan A. Lara,David Lizcano,Víctor Rampérez,Javier Soriano
Categories: Machine Learning (cs.LG)
Comments:

[LG-29] Neural Emulator Superiority: When Machine Learning for PDEs Surpasses its Training Data NEURIPS2025

Link: https://arxiv.org/abs/2510.23111
Authors: Felix Koehler,Nils Thuerey
Categories: Machine Learning (cs.LG)
Comments: Accepted at NeurIPS 2025: this https URL

[LG-30] Sampling from Energy distributions with Target Concrete Score Identity

Link: https://arxiv.org/abs/2510.23106
Authors: Sergei Kholkin,Francisco Vargas,Alexander Korotin
Categories: Machine Learning (cs.LG)
Comments:

[LG-31] AirFed: Federated Graph-Enhanced Multi-Agent Reinforcement Learning for Multi-UAV Cooperative Mobile Edge Computing

Link: https://arxiv.org/abs/2510.23053
Authors: Zhiyu Wang,Suman Raj,Rajkumar Buyya
Categories: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Comments:

[LG-32] SwiftTS: A Swift Selection Framework for Time Series Pre-trained Models via Multi-task Meta-Learning

Link: https://arxiv.org/abs/2510.23051
Authors: Tengxue Zhang,Biao Ouyang,Yang Shu,Xinyang Chen,Chenjuan Guo,Bin Yang
Categories: Machine Learning (cs.LG)
Comments: 10 pages, 6 figures

[LG-33] Sublinear Sketches for Approximate Nearest Neighbor and Kernel Density Estimation

Link: https://arxiv.org/abs/2510.23039
Authors: Ved Danait,Srijan Das,Sujoy Bhore
Categories: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Comments: 28 pages, 11 figures

[LG-34] Sentinel: Dynamic Knowledge Distillation for Personalized Federated Intrusion Detection in Heterogeneous IoT Networks

Link: https://arxiv.org/abs/2510.23019
Authors: Gurpreet Singh,Keshav Sood,P. Rajalakshmi,Yong Xiang
Categories: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Comments: This is a preprint version of a paper currently under review for possible publication in IEEE TDSC

[LG-35] Adaptive Forests For Classification

Link: https://arxiv.org/abs/2510.22991
Authors: Dimitris Bertsimas,Yubing Cui
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments: Under review at JMLR

[LG-36] Equivariant Neural Networks for General Linear Symmetries on Lie Algebras

Link: https://arxiv.org/abs/2510.22984
Authors: Chankyo Kim(1),Sicheng Zhao(1),Minghan Zhu(1 and 2),Tzu-Yuan Lin(3),Maani Ghaffari(1) ((1) University of Michigan, (2) University of Pennsylvania, (3) Massachusetts Institute of Technology)
Categories: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Comments: 23 pages, 5 figures

Abstract:Encoding symmetries is a powerful inductive bias for improving the generalization of deep neural networks. However, most existing equivariant models are limited to simple symmetries like rotations, failing to address the broader class of general linear transformations, GL(n), that appear in many scientific domains. We introduce Reductive Lie Neurons (ReLNs), a novel neural network architecture exactly equivariant to these general linear symmetries. ReLNs are designed to operate directly on a wide range of structured inputs, including general n-by-n matrices. ReLNs introduce a novel adjoint-invariant bilinear layer to achieve stable equivariance for both Lie-algebraic features and matrix-valued inputs, without requiring redesign for each subgroup. This architecture overcomes the limitations of prior equivariant networks that only apply to compact groups or simple vector data. We validate ReLNs’ versatility across a spectrum of tasks: they outperform existing methods on algebraic benchmarks with sl(3) and sp(4) symmetries and achieve competitive results on a Lorentz-equivariant particle physics task. In 3D drone state estimation with geometric uncertainty, ReLNs jointly process velocities and covariances, yielding significant improvements in trajectory accuracy. ReLNs provide a practical and general framework for learning with broad linear group symmetries on Lie algebras and matrix-valued data. Project page: this https URL

[LG-37] QoSGMAA: A Robust Multi-Order Graph Attention and Adversarial Framework for Sparse QoS Prediction

Link: https://arxiv.org/abs/2510.22982
Authors: Guanchen Du,Jianlong Xu,Mingtong Li,Ruiqi Wang,Qianqing Guo,Caiyi Chen,Qingcao Dai,Yuxiang Zeng
Categories: Machine Learning (cs.LG)
Comments:

[LG-38] How Muon's Spectral Design Benefits Generalization: A Study on Imbalanced Data

Link: https://arxiv.org/abs/2510.22980
Authors: Bhavya Vasudeva,Puneesh Deora,Yize Zhao,Vatsal Sharan,Christos Thrampoulidis
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments: 32 pages, 28 figures

Abstract: The growing adoption of spectrum-aware matrix-valued optimizers such as Muon and Shampoo in deep learning motivates a systematic study of their generalization properties and, in particular, when they might outperform competitive algorithms. We approach this question by introducing appropriate simplifying abstractions as follows: First, we use imbalanced data as a testbed. Second, we study the canonical form of such optimizers, which is Spectral Gradient Descent (SpecGD): each update step is $UV^T$ where $U\Sigma V^T$ is the truncated SVD of the gradient. Third, within this framework we identify a canonical setting for which we precisely quantify when SpecGD outperforms vanilla Euclidean GD. For a Gaussian mixture data model and both linear and bilinear models, we show that unlike GD, which prioritizes learning dominant principal components of the data first, SpecGD learns all principal components of the data at equal rates. We demonstrate how this translates to a growing gap in balanced accuracy favoring SpecGD early in training and further show that the gap remains consistent even when the GD counterpart uses adaptive step-sizes via normalization. By extending the analysis to deep linear models, we show that depth amplifies these effects. We empirically verify our theoretical findings on a variety of imbalanced datasets. Our experiments compare practical variants of spectral methods, like Muon and Shampoo, against their Euclidean counterparts and Adam. The results validate our findings that these spectral optimizers achieve superior generalization by promoting a more balanced learning of the data's underlying components.

[LG-39] SARNet: A Spike-Aware consecutive validation Framework for Accurate Remaining Useful Life Prediction ICASSP2026

Link: https://arxiv.org/abs/2510.22955
Authors: Junhao Fan,Wenrui Liang,Wei-Qiang Zhang
Categories: Machine Learning (cs.LG)
Comments: 5 pages, 2 figures, 3 tables. Equal contribution by Junhao Fan and Wenrui Liang. Corresponding author: Wei-Qiang Zhang. Submitted to ICASSP 2026

Click to view abstract

Abstract:Accurate prediction of remaining useful life (RUL) is essential to enhance system reliability and reduce maintenance risk. Yet many strong contemporary models are fragile around fault onset and opaque to engineers: short, high-energy spikes are smoothed away or misread, fixed thresholds blunt sensitivity, and physics-based explanations are scarce. To remedy this, we introduce SARNet (Spike-Aware Consecutive Validation Framework), which builds on a Modern Temporal Convolutional Network (ModernTCN) and adds spike-aware detection to provide physics-informed interpretability. ModernTCN forecasts degradation-sensitive indicators; an adaptive consecutive threshold validates true spikes while suppressing noise. Failure-prone segments then receive targeted feature engineering (spectral slopes, statistical derivatives, energy ratios), and the final RUL is produced by a stacked RF–LGBM regressor. Across benchmark-ported datasets under an event-triggered protocol, SARNet consistently lowers error compared to recent baselines (RMSE 0.0365, MAE 0.0204) while remaining lightweight, robust, and easy to deploy.

[LG-40] Hankel Singular Value Regularization for Highly Compressible State Space Models NEURIPS2025

Link: https://arxiv.org/abs/2510.22951
Authors: Paul Schwerdtner,Jules Berman,Benjamin Peherstorfer
Categories: Machine Learning (cs.LG); Dynamical Systems (math.DS)
Comments: Accepted at NeurIPS 2025

Click to view abstract

[LG-41] Hazard-Responsive Digital Twin for Climate-Driven Urban Resilience and Equity

Link: https://arxiv.org/abs/2510.22941
Authors: Zhenglai Shen,Hongyu Zhou
Categories: Machine Learning (cs.LG)
Comments: 52 pages, 9 figures

Click to view abstract

[LG-42] RL-AUX: Reinforcement Learning for Auxiliary Task Generation

Link: https://arxiv.org/abs/2510.22940
Authors: Judah Goldfeder,Matthew So,Hod Lipson
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-43] Diffuse to Detect: A Generalizable Framework for Anomaly Detection with Diffusion Models Applications to UAVs and Beyond

Link: https://arxiv.org/abs/2510.22928
Authors: Mingze Gong,Juan Du,Jianbang You
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-44] Simple Denoising Diffusion Language Models

Link: https://arxiv.org/abs/2510.22926
Authors: Huaisheng Zhu,Zhengyu Chen,Shijie Zhou,Zhihui Xie,Yige Yuan,Zhimeng Guo,Siyuan Xu,Hangfan Zhang,Vasant Honavar,Teng Xiao
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-45] Towards Personalized Treatment Plan: Geometrical Model-Agnostic Approach to Counterfactual Explanations

Link: https://arxiv.org/abs/2510.22911
Authors: Daniel Sin,Milad Toutounchian
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments: This paper is 15 pages long consisting of multiple sections including an abstract, introduction, related works, methodology, results, ablation studies, conclusion, future works, and an appendix section. There are 10 figures and 5 tables in total

Click to view abstract

[LG-46] On the Anisotropy of Score-Based Generative Models

Link: https://arxiv.org/abs/2510.22899
Authors: Andreas Floros,Seyed-Mohsen Moosavi-Dezfooli,Pier Luigi Dragotti
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Click to view abstract

Abstract:We investigate the role of network architecture in shaping the inductive biases of modern score-based generative models. To this end, we introduce the Score Anisotropy Directions (SADs), architecture-dependent directions that reveal how different networks preferentially capture data structure. Our analysis shows that SADs form adaptive bases aligned with the architecture’s output geometry, providing a principled way to predict generalization ability in score models prior to training. Through both synthetic data and standard image benchmarks, we demonstrate that SADs reliably capture fine-grained model behavior and correlate with downstream performance, as measured by Wasserstein metrics. Our work offers a new lens for explaining and predicting directional biases of generative models.

[LG-47] Charting the Design Space of Neural Graph Representations for Subgraph Matching ICLR2025

Link: https://arxiv.org/abs/2510.22897
Authors: Vaibhav Raj,Indradyumna Roy,Ashwin Ramachandran,Soumen Chakrabarti,Abir De
Categories: Machine Learning (cs.LG)
Comments: ICLR 2025

Click to view abstract

[LG-48] Transforming volcanic monitoring: A dataset and benchmark for onboard volcano activity detection

Link: https://arxiv.org/abs/2510.22889
Authors: Darshana Priyasad,Tharindu Fernando,Maryam Haghighat,Harshala Gammulle,Clinton Fookes
Categories: Machine Learning (cs.LG)
Comments: Preprint to appear in IEEE IGARSS 2025

Click to view abstract

[LG-49] AI based signage classification for linguistic landscape studies

Link: https://arxiv.org/abs/2510.22885
Authors: Yuqin Jiang,Song Jiang,Jacob Algrim,Trevor Harms,Maxwell Koenen,Xinya Lan,Xingyu Li,Chun-Han Lin,Jia Liu,Jiayang Sun,Henry Zenger
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-50] Limits of Generative Pre-Training in Structured EMR Trajectories with Irregular Sampling

Link: https://arxiv.org/abs/2510.22878
Authors: Nicholas I-Hsien Kuo,Blanca Gallego,Louisa Jorm
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-51] A Review of End-to-End Precipitation Prediction Using Remote Sensing Data: from Divination to Machine Learning

Link: https://arxiv.org/abs/2510.22855
Authors: Yugong Zeng,Jonathan Wu
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-52] Self-induced stochastic resonance: A physics-informed machine learning approach

Link: https://arxiv.org/abs/2510.22848
Authors: Divyesh Savaliya,Marius E. Yamakou
Categories: Machine Learning (cs.LG); Adaptation and Self-Organizing Systems (nlin.AO); Machine Learning (stat.ML)
Comments: 22 pages, 10 figures, 58 references

Click to view abstract

[LG-53] Clustering by Denoising: Latent plug-and-play diffusion for single-cell data

Link: https://arxiv.org/abs/2510.22835
Authors: Dominik Meier,Shixing Yu,Sagnik Nandy,Promit Ghosal,Kyra Gan
Categories: Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
Comments:

Click to view abstract

[LG-54] Logical GANs: Adversarial Learning through Ehrenfeucht Fraisse Games

Link: https://arxiv.org/abs/2510.22824
Authors: Mirco A. Mannucci
Categories: Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Logic (math.LO)
Comments: 12

Click to view abstract

[LG-55] Last Iterate Analyses of FTRL in Stochastic Bandits

Link: https://arxiv.org/abs/2510.22819
Authors: Jingxin Zhan,Yuze Han,Zhihua Zhang
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:The convergence analysis of online learning algorithms is central to machine learning theory, where last-iterate convergence is particularly important, as it captures the learner’s actual decisions and describes the evolution of the learning process over time. However, in multi-armed bandits, most existing algorithmic analyses mainly focus on the order of regret, while the last-iterate (simple regret) convergence rate remains less explored, especially for the widely studied Follow-the-Regularized-Leader (FTRL) algorithms. Recently, a growing line of work has established the Best-of-Both-Worlds (BOBW) property of FTRL algorithms in bandit problems, showing in particular that they achieve logarithmic regret in stochastic bandits. Nevertheless, their last-iterate convergence rate has not yet been studied. Intuitively, logarithmic regret should correspond to a $t^{-1}$ last-iterate convergence rate. This paper partially confirms this intuition through theoretical analysis, showing that the Bregman divergence, defined by the regular function $\Psi(p) = -4\sum_{i=1}^{d}\sqrt{p_i}$ associated with the BOBW FTRL algorithm $1/2$-Tsallis-INF (arXiv:1807.07623), between the point mass on the optimal arm and the probability distribution over the arm set obtained at iteration $t$, decays at a rate of $t^{-1/2}$.
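
As a worked illustration of the quantity analyzed here: for the function $\Psi(p) = -4\sum_i \sqrt{p_i}$, the Bregman divergence between the point mass $e_{i^*}$ on the optimal arm and a strictly positive distribution $p$ simplifies to $-4 + 2\sum_i \sqrt{p_i} + 2/\sqrt{p_{i^*}}$. The sketch below checks that simplification against the generic definition of a Bregman divergence. The function names and the example distribution are made up for illustration and are not taken from the paper.

```python
import numpy as np

def psi(p):
    """Psi(p) = -4 * sum_i sqrt(p_i)."""
    return -4.0 * np.sum(np.sqrt(p))

def grad_psi(p):
    """Gradient of psi; requires strictly positive p."""
    return -2.0 / np.sqrt(p)

def bregman_to_point_mass(p, best_arm):
    """D_psi(e, p) = psi(e) - psi(p) - <grad psi(p), e - p>, with e the point mass."""
    e = np.zeros_like(p)
    e[best_arm] = 1.0
    return psi(e) - psi(p) - grad_psi(p) @ (e - p)

def closed_form(p, best_arm):
    """Simplified expression derived from the definition above."""
    return -4.0 + 2.0 * np.sum(np.sqrt(p)) + 2.0 / np.sqrt(p[best_arm])

p = np.array([0.7, 0.2, 0.1])   # hypothetical arm distribution, arm 0 optimal
d_generic = bregman_to_point_mass(p, best_arm=0)
d_closed = closed_form(p, best_arm=0)
```

The divergence is nonnegative and shrinks toward zero as $p$ concentrates on the optimal arm, which is the sense in which its decay rate measures last-iterate convergence.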

[LG-56] Distributed Multi-Agent Bandits Over Erdős-Rényi Random Networks

Link: https://arxiv.org/abs/2510.22811
Authors: Jingyuan Liu,Hao Qiu,Lin Yang,Mengfan Xu
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-57] Inductive Transfer Learning for Graph-Based Recommenders NEURIPS2025

Link: https://arxiv.org/abs/2510.22799
Authors: Florian Grötschla,Elia Trachsel,Luca A. Lanzendörfer,Roger Wattenhofer
Categories: Machine Learning (cs.LG)
Comments: Accepted at the New Perspectives in Graph Machine Learning Workshop at NeurIPS 2025

Click to view abstract

[LG-58] SAO-Instruct: Free-form Audio Editing using Natural Language Instructions NEURIPS2025

Link: https://arxiv.org/abs/2510.22795
Authors: Michael Ungersböck,Florian Grötschla,Luca A. Lanzendörfer,June Young Yi,Changho Choi,Roger Wattenhofer
Categories: Sound (cs.SD); Machine Learning (cs.LG)
Comments: Accepted at NeurIPS 2025

Click to view abstract

[LG-59] SeeDNorm: Self-Rescaled Dynamic Normalization

Link: https://arxiv.org/abs/2510.22777
Authors: Wenrui Cai,Defa Zhu,Qingjie Liu,Qiyang Min
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-60] Distributionally Robust Optimization via Diffusion Ambiguity Modeling

Link: https://arxiv.org/abs/2510.22757
Authors: Jiaqi Wen,Jianyi Yang
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-61] Centrum: Model-based Database Auto-tuning with Minimal Distributional Assumptions

Link: https://arxiv.org/abs/2510.22734
Authors: Yuanhao Lai,Pengfei Zheng,Chenpeng Ji,Yan Li,Songhan Zhang,Rutao Zhang,Zhengang Wang,Yunfei Du
Categories: Machine Learning (cs.LG); Databases (cs.DB); Methodology (stat.ME)
Comments: 26 pages

Click to view abstract

[LG-62] Identification of Causal Direction under an Arbitrary Number of Latent Confounders

Link: https://arxiv.org/abs/2510.22711
Authors: Wei Chen,Linjun Peng,Zhiyi Huang,Haoyue Dai,Zhifeng Hao,Ruichu Cai,Kun Zhang
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Click to view abstract

[LG-63] UCB-type Algorithm for Budget-Constrained Expert Learning

Link: https://arxiv.org/abs/2510.22654
Authors: Ilgam Latypov,Alexandra Suvorikova,Alexey Kroshnin,Alexander Gasnikov,Yuriy Dorn
Categories: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Comments:

Click to view abstract

[LG-64] If You Want to Be Robust, Be Wary of Initialization NEURIPS2024

Link: https://arxiv.org/abs/2510.22652
Authors: Sofiane Ennadir,Johannes F. Lutzeyer,Michalis Vazirgiannis,El Houcine Bergou
Categories: Machine Learning (cs.LG)
Comments: Accepted at NeurIPS 2024

Click to view abstract

[LG-65] Environment-aware Motion Matching ATC SIGGRAPH

Link: https://arxiv.org/abs/2510.22632
Authors: Jose Luis Ponton,Sheldon Andrews,Carlos Andujar,Nuria Pelechano
Categories: Graphics (cs.GR); Machine Learning (cs.LG)
Comments: Published in ACM TOG and presented in SIGGRAPH ASIA 2025. Project webpage: this https URL

Click to view abstract

Abstract:Interactive applications demand believable characters that respond naturally to dynamic environments. Traditional character animation techniques often struggle to handle arbitrary situations, leading to a growing trend of dynamically selecting motion-captured animations based on predefined features. While Motion Matching has proven effective for locomotion by aligning to target trajectories, animating environment interactions and crowd behaviors remains challenging due to the need to consider surrounding elements. Existing approaches often involve manual setup or lack the naturalism of motion capture. Furthermore, in crowd animation, body animation is frequently treated as a separate process from trajectory planning, leading to inconsistencies between body pose and root motion. To address these limitations, we present Environment-aware Motion Matching, a novel real-time system for full-body character animation that dynamically adapts to obstacles and other agents, emphasizing the bidirectional relationship between pose and trajectory. In a preprocessing step, we extract shape, pose, and trajectory features from a motion capture database. At runtime, we perform an efficient search that matches user input and current pose while penalizing collisions with a dynamic environment. Our method allows characters to naturally adjust their pose and trajectory to navigate crowded scenes.

[LG-66] CLEANet: Robust and Efficient Anomaly Detection in Contaminated Multivariate Time Series

Link: https://arxiv.org/abs/2510.22619
Authors: Songhan Zhang,Yuanhao Lai,Pengfei Zheng,Boxi Yu,Xiaoying Tang,Qiuai Fu,Pinjia He
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-67] A roadmap for curvature-based geometric data analysis and learning

Link: https://arxiv.org/abs/2510.22599
Authors: Yasharth Yadav,Kelin Xia
Categories: Machine Learning (cs.LG); Differential Geometry (math.DG)
Comments:

Click to view abstract

[LG-68] Prediction-Powered Semi-Supervised Learning with Online Power Tuning NEURIPS2025

Link: https://arxiv.org/abs/2510.22586
Authors: Noa Shoham,Ron Dorfman,Shalev Shaer,Kfir Y. Levy,Yaniv Romano
Categories: Machine Learning (cs.LG)
Comments: NeurIPS 2025

Click to view abstract

Abstract:Prediction-Powered Inference (PPI) is a recently proposed statistical inference technique for parameter estimation that leverages pseudo-labels on both labeled and unlabeled data to construct an unbiased, low-variance estimator. In this work, we extend its core idea to semi-supervised learning (SSL) for model training, introducing a novel unbiased gradient estimator. This extension addresses a key challenge in SSL: while unlabeled data can improve model performance, its benefit heavily depends on the quality of pseudo-labels. Inaccurate pseudo-labels can introduce bias, leading to suboptimal [...]. To balance the contributions of labeled and pseudo-labeled data, we utilize an interpolation parameter and tune it on the fly, alongside the model parameters, using a one-dimensional online learning algorithm. We verify the practical advantage of our approach through experiments on both synthetic and real datasets, demonstrating improved performance over classic SSL baselines and PPI methods that tune the interpolation parameter offline.
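
To make the role of the interpolation parameter concrete, here is a minimal sketch of the power-tuned PPI estimator for a simple mean, the setting PPI was originally stated for. It is not the paper's gradient estimator or its online tuning rule. The function `ppi_mean`, the synthetic data, and the biased predictor are all assumptions made for illustration.

```python
import numpy as np

def ppi_mean(y_lab, pred_lab, pred_unlab, lam):
    """Power-tuned PPI estimate of E[Y]; unbiased for any fixed lam."""
    # Labeled mean plus a correction built from predictions on both samples
    return np.mean(y_lab) + lam * (np.mean(pred_unlab) - np.mean(pred_lab))

rng = np.random.default_rng(0)
n_lab, n_unlab = 50, 5000
x_lab = rng.normal(size=n_lab)
x_unlab = rng.normal(size=n_unlab)
y_lab = 2.0 * x_lab + rng.normal(scale=0.1, size=n_lab)   # true mean of Y is 0

def predict(x):
    """Hypothetical pseudo-labeler with a constant bias of 0.3."""
    return 2.0 * x + 0.3

# lam = 0 ignores the pseudo-labels; lam = 1 uses them fully
est_classical = ppi_mean(y_lab, predict(x_lab), predict(x_unlab), lam=0.0)
est_ppi = ppi_mean(y_lab, predict(x_lab), predict(x_unlab), lam=1.0)
```

The predictor's constant bias cancels in the difference of the two prediction means, which is why the estimate stays unbiased for any fixed `lam`. The value of `lam` only changes the variance, and that is the quantity worth tuning.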

[LG-69] Optimal Anytime Algorithms for Online Convex Optimization with Adversarial Constraints

Link: https://arxiv.org/abs/2510.22579
Authors: Dhruv Sarkar,Abhishek Sinha
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-70] Cross-Paradigm Graph Backdoor Attacks with Promptable Subgraph Triggers

Link: https://arxiv.org/abs/2510.22555
Authors: Dongyi Liu,Jiangtong Li,Dawei Cheng,Changjun Jiang
Categories: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-71] FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning

Link: https://arxiv.org/abs/2510.22543
Authors: Yuyang Ding,Chi Zhang,Juntao Li,Haibin Lin,Xin Liu,Min Zhang
Categories: Machine Learning (cs.LG)
Comments: Project page: this https URL

Click to view abstract

[LG-72] Approximate Gradient Coding for Distributed Learning with Heterogeneous Stragglers

Link: https://arxiv.org/abs/2510.22539
Authors: Heekang Song,Wan Choi
Categories: Systems and Control (eess.SY); Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-73] Iteratively Refined Early Interaction Alignment for Subgraph Matching based Graph Retrieval

Link: https://arxiv.org/abs/2510.22538
Authors: Ashwin Ramachandran,Vaibhav Raj,Indradyumna Roy,Soumen Chakrabarti,Abir De
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-74] Random Search Neural Networks for Efficient and Expressive Graph Learning NEURIPS2025

Link: https://arxiv.org/abs/2510.22520
Authors: Michael Ito,Danai Koutra,Jenna Wiens
Categories: Machine Learning (cs.LG)
Comments: NEURIPS 2025; version with full appendix

Click to view abstract

Abstract:Random walk neural networks (RWNNs) have emerged as a promising approach for graph representation learning, leveraging recent advances in sequence models to process random walks. However, under realistic sampling constraints, RWNNs often fail to capture global structure even in small graphs due to incomplete node and edge coverage, limiting their expressivity. To address this, we propose random search neural networks (RSNNs), which operate on random searches, each of which guarantees full node coverage. Theoretically, we demonstrate that in sparse graphs, only $O(\log |V|)$ searches are needed to achieve full edge coverage, substantially reducing sampling complexity compared to the $O(|V|)$ walks required by RWNNs (assuming walk lengths scale with graph size). Furthermore, when paired with universal sequence models, RSNNs are universal approximators. We lastly show RSNNs are probabilistically invariant to graph isomorphisms, ensuring their expectation is an isomorphism-invariant graph function. Empirically, RSNNs consistently outperform RWNNs on molecular and protein benchmarks, achieving comparable or superior performance with up to $16\times$ fewer sampled sequences. Our work bridges theoretical and practical advances in random walk based approaches, offering an efficient and expressive framework for learning on sparse graphs.
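
The full-node-coverage property can be illustrated with a randomized depth-first search, which visits every node of a connected graph exactly once. The abstract does not spell out the search procedure, so treating a "random search" as a randomized DFS is an assumption here, as are the function name and the toy graph.

```python
import random

def random_search(adj, start, rng):
    """Randomized depth-first search; returns nodes in visiting order."""
    visited, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        neighbors = [v for v in adj[node] if v not in visited]
        rng.shuffle(neighbors)  # the only source of randomness: exploration order
        stack.extend(neighbors)
    return order

# Small connected example graph as adjacency lists; purely illustrative
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
rng = random.Random(0)
searches = [random_search(adj, rng.choice(list(adj)), rng) for _ in range(3)]
```

Each sequence in `searches` is a permutation of all nodes, unlike a fixed-length random walk, which may revisit nodes and miss others. A sequence model would then consume these node sequences.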

[LG-75] A Scalable Global Optimization Algorithm For Constrained Clustering

Link: https://arxiv.org/abs/2510.22519
Authors: Pedro Chumpitaz-Flores,My Duong,Cristobal Heredia,Kaixun Hua
Categories: Machine Learning (cs.LG)
Comments: 21 pages, 4 figures, 9 tables

Click to view abstract

[LG-76] Smart Sensor Placement: A Correlation-Aware Attribution Framework (CAAF) for Real-world Data Modeling

Link: https://arxiv.org/abs/2510.22517
Authors: Sze Chai Leung,Di Zhou,H. Jane Bae
Categories: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Systems and Control (eess.SY)
Comments:

Click to view abstract

[LG-77] CANDI: Hybrid Discrete-Continuous Diffusion Models

Link: https://arxiv.org/abs/2510.22510
Authors: Patrick Pynadath,Jiaxin Shi,Ruqi Zhang
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Click to view abstract

[LG-78] Multi-Scale Finite Expression Method for PDEs with Oscillatory Solutions on Complex Domains

Link: https://arxiv.org/abs/2510.22497
Authors: Gareth Hardwick,Haizhao Yang
Categories: Numerical Analysis (math.NA); Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-79] Contextual Tokenization for Graph Inverted Indices

Link: https://arxiv.org/abs/2510.22479
Authors: Pritish Chakraborty,Indradyumna Roy,Soumen Chakrabarti,Abir De
Categories: Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:Retrieving graphs from a large corpus that contain a subgraph isomorphic to a given query graph is a core operation in many real-world applications. While recent multi-vector graph representations and scores based on set alignment and containment can provide accurate subgraph isomorphism tests, their use in retrieval remains limited by their need to score corpus graphs exhaustively. We introduce CORGII (Contextual Representation of Graphs for Inverted Indexing), a graph indexing framework in which, starting with a contextual dense graph representation, a differentiable discretization module computes sparse binary codes over a learned latent vocabulary. This text document-like representation allows us to leverage classic, highly optimized inverted indices, while supporting soft (vector) set containment scores. Pushing this paradigm further, we replace the classical, fixed impact weight of a ‘token’ on a graph (such as TFIDF or BM25) with a data-driven, trainable impact weight. Finally, we explore token expansion to support multi-probing the index for smoother accuracy-efficiency tradeoffs. To our knowledge, CORGII is the first indexer of dense graph representations using discrete tokens mapping to efficient inverted lists. Extensive experiments show that CORGII provides better trade-offs between accuracy and efficiency, compared to several baselines.

[LG-80] Learning Local Stackelberg Equilibria from Repeated Interactions with a Learning Agent

Link: https://arxiv.org/abs/2510.22471
Authors: Nivasini Ananthakrishnan,Yuval Dagan,Kunhe Yang
Categories: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-81] Low-Precision Streaming PCA

Link: https://arxiv.org/abs/2510.22440
Authors: Sanjoy Dasgupta,Syamantak Kumar,Shourya Pandey,Purnamrita Sarkar
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Click to view abstract

Abstract:Low-precision streaming PCA estimates the top principal component in a streaming setting under limited precision. We establish an information-theoretic lower bound on the quantization resolution required to achieve a target accuracy for the leading eigenvector. We study Oja’s algorithm for streaming PCA under linear and nonlinear stochastic quantization. The quantized variants use unbiased stochastic quantization of the weight vector and the updates. Under mild moment and spectral-gap assumptions on the data distribution, we show that a batched version achieves the lower bound up to logarithmic factors under both schemes. This leads to a nearly dimension-free quantization error in the nonlinear quantization setting. Empirical evaluations on synthetic streams validate our theoretical findings and demonstrate that our low-precision methods closely track the performance of standard Oja’s algorithm.
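
A rough sketch of the idea: Oja's rule in which the weight vector and each update pass through unbiased stochastic rounding onto a uniform grid. This corresponds only to the linear quantization case and uses one sample per step instead of the batched version the abstract analyzes. The grid spacing `delta`, the step size, and the synthetic stream are assumptions made for illustration.

```python
import numpy as np

def stochastic_round(x, delta, rng):
    """Unbiased rounding onto the grid delta * Z (linear quantization)."""
    scaled = np.asarray(x, dtype=float) / delta
    low = np.floor(scaled)
    # Round up with probability equal to the fractional part, so E[output] = x
    round_up = rng.random(scaled.shape) < (scaled - low)
    return delta * (low + round_up)

def quantized_oja(stream, lr, delta, rng):
    """Oja's rule with the weight vector and each update stochastically quantized."""
    d = stream.shape[1]
    w = stochastic_round(np.ones(d) / np.sqrt(d), delta, rng)
    for x in stream:
        update = stochastic_round(lr * x * (x @ w), delta, rng)
        w = w + update
        w = stochastic_round(w / np.linalg.norm(w), delta, rng)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
# Synthetic stream whose top principal component is the first coordinate axis
stream = rng.standard_normal((4000, 5)) * np.sqrt([4.0, 1.0, 1.0, 1.0, 1.0])
w_hat = quantized_oja(stream, lr=0.005, delta=2.0 ** -8, rng=rng)
alignment = abs(w_hat[0])   # |cosine| between the estimate and the true top eigenvector
```

Unbiased rounding matters here because deterministic rounding would silently drop any update smaller than half the grid spacing, while stochastic rounding preserves it in expectation.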

[LG-82] NetBurst: Event-Centric Forecasting of Bursty Intermittent Time Series

Link: https://arxiv.org/abs/2510.22397
Authors: Satyandra Guthula,Jaber Daneshamooz,Charles Fleming,Ashish Kundu,Walter Willinger,Arpit Gupta
Categories: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-83] Bias Begins with Data: The FairGround Corpus for Robust and Reproducible Research on Algorithmic Fairness

Link: https://arxiv.org/abs/2510.22363
Authors: Jan Simson,Alessandro Fabris,Cosima Fröhner,Frauke Kreuter,Christoph Kern
Categories: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
Comments: Website: this https URL

Click to view abstract

[LG-84] Uncertainty quantification in model discovery by distilling interpretable material constitutive models from Gaussian process posteriors

Link: https://arxiv.org/abs/2510.22345
Authors: David Anton,Henning Wessels,Ulrich Römer,Alexander Henkes,Jorge-Humberto Urrea-Quintero
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Click to view abstract

[LG-85] Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders NEURIPS2025

Link: https://arxiv.org/abs/2510.22332
Authors: Mengyu Ye,Jun Suzuki,Tatsuro Inaba,Tatsuki Kuribayashi
Categories: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments: NeurIPS 2025

Click to view abstract

Abstract:Recent interpretability work on large language models (LLMs) has been increasingly dominated by a feature-discovery approach with the help of proxy modules. Then, the quality of features learned by, e.g., sparse auto-encoders (SAEs), is evaluated. This paradigm naturally raises a critical question: do such learned features have better properties than those already represented within the original model parameters? Unfortunately, only a few studies have made such comparisons systematically so far. In this work, we revisit the interpretability of feature vectors stored in feed-forward (FF) layers, given the perspective of FF as key-value memories, with modern interpretability benchmarks. Our extensive evaluation revealed that SAEs and FFs exhibit a similar range of interpretability, although SAEs displayed an observable but minimal improvement in some aspects. Furthermore, in certain aspects, surprisingly, even vanilla FFs yielded better interpretability than the SAEs, and features discovered in SAEs and FFs diverged. These findings raise questions about the advantage of SAEs from both perspectives of feature quality and faithfulness, compared to directly interpreting FF feature vectors, and suggest that FF key-value parameters serve as a strong baseline in modern interpretability research.

[LG-86] Monitoring State Transitions in Markovian Systems with Sampling Cost

Link: https://arxiv.org/abs/2510.22327
Authors: Kumar Saurav,Ness B. Shroff,Yingbin Liang
Categories: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
Comments: 6 pages, 4 figures

Click to view abstract

[LG-87] Stable neural networks and connections to continuous dynamical systems

Link: https://arxiv.org/abs/2510.22299
Authors: Matthias J. Ehrhardt,Davide Murari,Ferdia Sherry
Categories: Numerical Analysis (math.NA); Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:The existence of instabilities, for example in the form of adversarial examples, has given rise to a highly active area of research concerning itself with understanding and enhancing the stability of neural networks. We focus on a popular branch within this area which draws on connections to continuous dynamical systems and optimal control, giving a bird’s eye view of this area. We identify and describe the fundamental concepts that underlie much of the existing work in this area. Following this, we go into more detail on a specific approach to designing stable neural networks, developing the theoretical background and giving a description of how these networks can be implemented. We provide code that implements the approach that can be adapted and extended by the reader. The code further includes a notebook with a fleshed-out toy example on adversarial robustness of image classification that can be run without heavy requirements on the reader’s computer. We finish by discussing this toy example so that the reader can interactively follow along on their computer. This work will be included as a chapter of a book on scientific machine learning, which is currently under revision and aimed at students.

[LG-88] Predicting Metabolic Dysfunction-Associated Steatotic Liver Disease using Machine Learning Methods

Link: https://arxiv.org/abs/2510.22293
Authors: Mary E. An,Paul Griffin,Jonathan G. Stine,Ramakrishna Balakrishnan,Ram Sriram,Soundar Kumara
Categories: Machine Learning (cs.LG); Computers and Society (cs.CY); Quantitative Methods (q-bio.QM)
Comments:

Click to view abstract

Abstract:Background: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) affects ~33% of U.S. adults and is the most common chronic liver disease. Although often asymptomatic, progression can lead to cirrhosis. Early detection is important, as lifestyle interventions can prevent disease progression. We developed a fair, rigorous, and reproducible MASLD prediction model and compared it to prior methods using a large electronic health record database. Methods: We evaluated LASSO logistic regression, random forest, XGBoost, and a neural network for MASLD prediction using clinical feature subsets, including the top 10 SHAP-ranked features. To reduce disparities in true positive rates across racial and ethnic subgroups, we applied an equal opportunity postprocessing method. Results: This study included 59,492 patients in the training data, 24,198 in the validating data, and 25,188 in the testing data. The LASSO logistic regression model with the top 10 features was selected for its interpretability and comparable performance. Before fairness adjustment, the model achieved AUROC of 0.84, accuracy of 78%, sensitivity of 72%, specificity of 79%, and F1-score of 0.617. After equal opportunity postprocessing, accuracy modestly increased to 81% and specificity to 94%, while sensitivity decreased to 41% and F1-score to 0.515, reflecting the fairness trade-off. Conclusions: We developed the MASER prediction model (MASLD Static EHR Risk Prediction), a LASSO logistic regression model which achieved competitive performance for MASLD prediction (AUROC 0.836, accuracy 77.6%), comparable to previously reported ensemble and tree-based models. Overall, this approach demonstrates that interpretable models can achieve a balance of predictive performance and fairness in diverse patient populations. 

[LG-89] Machine Learning Enabled Early Warning System For Financial Distress Using Real-Time Digital Signals

Link: https://arxiv.org/abs/2510.22287
Authors: Laxmi pant,Syed Ali Reza,Md Khalilor Rahman,MD Saifur Rahman,Shamima Sharmin,Md Fazlul Huq Mithu,Kazi Nehal Hasnain,Adnan Farabi,Mahamuda khanom,Raisul Kabir
Categories: Machine Learning (cs.LG); Computers and Society (cs.CY)
Comments:

Click to view abstract

[LG-90] Adapting Noise-Driven PUF and AI for Secure WBG ICS: A Proof-of-Concept Study

Link: https://arxiv.org/abs/2510.22283
Authors: Devon A. Kelly,Christiana Chamon
Categories: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Systems and Control (eess.SY); Applied Physics (physics.app-ph)
Comments:

Click to view abstract

Abstract:Wide-bandgap (WBG) technologies offer unprecedented improvements in power system efficiency, size, and performance, but also introduce unique sensor corruption and cybersecurity risks in industrial control systems (ICS), particularly due to high-frequency noise and sophisticated cyber-physical threats. This proof-of-concept (PoC) study demonstrates the adaptation of a noise-driven physically unclonable function (PUF) and machine learning (ML)-assisted anomaly detection framework to the demanding environment of WBG-based ICS sensor pathways. By extracting entropy from unavoidable WBG switching noise (up to 100 kHz) as a PUF source, and simultaneously using this noise as a real-time threat indicator, the proposed system unites hardware-level authentication and anomaly detection. Our approach integrates hybrid machine learning (ML) models with adaptive Bayesian filtering, providing robust and low-latency detection capabilities resilient to both natural electromagnetic interference (EMI) and active adversarial manipulation. Through detailed simulations of WBG modules under benign and attack scenarios–including EMI injection, signal tampering, and node impersonation–we achieve 95% detection accuracy and sub-millisecond processing latency. These results demonstrate the feasibility of physics-driven, dual-use noise exploitation as a scalable ICS defense primitive. Our findings lay the groundwork for next-generation security strategies that leverage inherent device characteristics, bridging hardware and artificial intelligence (AI) for enhanced protection of critical ICS infrastructure.

[LG-91] SecureLearn - An Attack-agnostic Defense for Multiclass Machine Learning Against Data Poisoning Attacks

Link: https://arxiv.org/abs/2510.22274
Authors: Anum Paracha,Junaid Arshad,Mohamed Ben Farah,Khalid Ismail
Categories: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:Data poisoning attacks are a potential threat to machine learning (ML) models, aiming to manipulate training datasets to disrupt their performance. Existing defenses are mostly designed to mitigate specific poisoning attacks or are aligned with particular ML algorithms. Furthermore, most defenses are developed to secure deep neural networks or binary classifiers. However, traditional multiclass classifiers need attention to be secure from data poisoning attacks, as these models are significant in developing multi-modal applications. Therefore, this paper proposes SecureLearn, a two-layer attack-agnostic defense to defend multiclass models from poisoning attacks. It comprises two components of data sanitization and a new feature-oriented adversarial training. To ascertain the effectiveness of SecureLearn, we proposed a 3D evaluation matrix with three orthogonal dimensions: data poisoning attack, data sanitization and adversarial training. Benchmarking SecureLearn in a 3D matrix, a detailed analysis is conducted at different poisoning levels (10%-20%), particularly analysing accuracy, recall, F1-score, detection and correction rates, and false discovery rate. The experimentation is conducted for four ML algorithms, namely Random Forest (RF), Decision Tree (DT), Gaussian Naive Bayes (GNB) and Multilayer Perceptron (MLP), trained with three public datasets, against three poisoning attacks and compared with two existing mitigations. Our results highlight that SecureLearn is effective against the provided attacks. SecureLearn has strengthened resilience and adversarial robustness of traditional multiclass models and neural networks, confirming its generalization beyond algorithm-specific defenses. It consistently maintained accuracy above 90%, recall and F1-score above 75%. For neural networks, SecureLearn achieved 97% recall and F1-score against all selected poisoning attacks.

[LG-92] Visual Model Selection using Feature Importance Clusters in Fairness-Performance Similarity Optimized Space

Link: https://arxiv.org/abs/2510.22209
Authors: Sofoklis Kitharidis, Cor J. Veenman, Thomas Bäck, Niki van Stein
Subjects: Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:In the context of algorithmic decision-making, fair machine learning methods often yield multiple models that balance predictive fairness and performance in varying degrees. This diversity introduces a challenge for stakeholders who must select a model that aligns with their specific requirements and values. To address this, we propose an interactive framework that assists in navigating and interpreting the trade-offs across a portfolio of models. Our approach leverages weakly supervised metric learning to learn a Mahalanobis distance that reflects similarity in fairness and performance outcomes, effectively structuring the feature importance space of the models according to stakeholder-relevant criteria. We then apply a clustering technique (k-means) to group models based on their transformed representations of feature importances, allowing users to explore clusters of models with similar predictive behaviors and fairness characteristics. This facilitates informed decision-making by helping users understand how models differ not only in their fairness-performance balance but also in the features that drive their predictions.

[LG-93] Quantitative Bounds for Sorting-Based Permutation-Invariant Embeddings

Link: https://arxiv.org/abs/2510.22186
Authors: Nadav Dym, Matthias Wellershoff, Efstratios Tsoukanis, Daniel Levy, Radu Balan
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Functional Analysis (math.FA); Metric Geometry (math.MG)
Comments: 26 pages, 1 figure, 2 tables

Click to view abstract

[LG-94] Tractable Shapley Values and Interactions via Tensor Networks

Link: https://arxiv.org/abs/2510.22138
Authors: Farzaneh Heidari, Chao Li, Farzaneh Heidari
Subjects: Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:We show how to replace the O(2^n) coalition enumeration over n features behind Shapley values and Shapley-style interaction indices with a few-evaluation scheme on a tensor-network (TN) surrogate: TN-SHAP. The key idea is to represent a predictor’s local behavior as a factorized multilinear map, so that coalitional quantities become linear probes of a coefficient tensor. TN-SHAP replaces exhaustive coalition sweeps with just a small number of targeted evaluations to extract order-k Shapley interactions. In particular, both order-1 (single-feature) and order-2 (pairwise) computations have cost O(n*poly(chi) + n^2), where chi is the TN’s maximal cut rank. We provide theoretical guarantees on the approximation error and tractability of TN-SHAP. On UCI datasets, our method matches enumeration on the fitted surrogate while reducing evaluation by orders of magnitude and achieves 25-1000x wall-clock speedups over KernelSHAP-IQ at comparable accuracy, while amortizing training across local cohorts.
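
For context, the exhaustive baseline that the abstract refers to can be sketched in a few lines. This is not TN-SHAP itself, only the standard O(2^n) coalition enumeration it aims to avoid; the function names and the toy value function below are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, n):
    """Exact Shapley values by enumerating every coalition.

    `value` maps a frozenset of feature indices to a number.
    The loop calls `value` O(2^n) times per feature, which is the cost
    that surrogate-based schemes try to avoid.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy game: additive contributions plus a pairwise
# interaction that only appears when features 0 and 1 are both present.
def toy_value(coalition):
    base = sum({0: 1.0, 1: 2.0, 2: 3.0}[j] for j in coalition)
    bonus = 4.0 if {0, 1} <= coalition else 0.0
    return base + bonus


phi = exact_shapley(toy_value, 3)
print(phi)
```

In this toy game the interaction bonus is split evenly between features 0 and 1, and the values sum to the value of the full coalition, which is a quick sanity check for any approximation scheme.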

[LG-95] HandPass: A Wi-Fi CSI Palm Authentication Approach for Access Control

Link: https://arxiv.org/abs/2510.22133
Authors: Eduardo Fabricio Gomes Trindade, Felipe Silveira de Almeida, Gioliano de Oliveira Braga, Rafael Pimenta de Mattos Paixão, Pedro Henrique dos Santos Rocha, Lourenco Alves Pereira Jr
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Comments: 9 pages, 4 figures, 3 tables

Click to view abstract

[LG-96] Learning 3D Anisotropic Noise Distributions Improves Molecular Force Field Modeling

Link: https://arxiv.org/abs/2510.22123
Authors: Xixian Liu, Rui Jiao, Zhiyuan Liu, Yurou Liu, Yang Liu, Ziheng Lu, Wenbing Huang, Yang Zhang, Yixin Cao
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Click to view abstract

[LG-97] Scaling Up Efficient Small Language Models Serving and Deployment for Semantic Job Search

Link: https://arxiv.org/abs/2510.22101
Authors: Kayhan Behdin, Qingquan Song, Sriram Vasudevan, Jian Sheng, Xiaojing Ma, Z Zhou, Chuanrui Zhu, Guoyao Li, Chanh Nguyen, Sayan Ghosh, Hejian Sang, Ata Fatahi Baarzi, Sundara Raman Ramachandran, Xiaoqing Wang, Qing Lan, Vinay Y S, Qi Guo, Caleb Johnson, Zhipeng Wang, Fedor Borisyuk
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-98] Dynamic Graph Neural Network for Data-Driven Physiologically Based Pharmacokinetic Modeling

Link: https://arxiv.org/abs/2510.22096
Authors: Su Liu, Xin Hu, Shurong Wen, Jiaqi Liu, Jiexi Xu, Lanruo Wang
Subjects: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-99] Hierarchical Graph Networks for Accurate Weather Forecasting via Lightweight Training

Link: https://arxiv.org/abs/2510.22094
Authors: Thomas Bailie, S. Karthik Mukkavilli, Varvara Vetrova, Yun Sing Koh
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
Comments:

Click to view abstract

Abstract:Climate events arise from intricate, multivariate dynamics governed by global-scale drivers, profoundly impacting food, energy, and infrastructure. Yet, accurate weather prediction remains elusive due to physical processes unfolding across diverse spatio-temporal scales, which fixed-resolution methods cannot capture. Hierarchical Graph Neural Networks (HGNNs) offer a multiscale representation, but nonlinear downward mappings often erase global trends, weakening the integration of physics into forecasts. We introduce HiFlowCast and its ensemble variant HiAntFlow, HGNNs that embed physics within a multiscale prediction framework. Two innovations underpin their design: a Latent-Memory-Retention mechanism that preserves global trends during downward traversal, and a Latent-to-Physics branch that integrates PDE solution fields across diverse scales. Our Flow models cut errors by over 5% at 13-day lead times and by 5-8% under 1st and 99th quantile extremes, improving reliability for rare events. Leveraging pretrained model weights, they converge within a single epoch, reducing training cost and their carbon footprint. Such efficiency is vital as the growing scale of machine learning challenges sustainability and limits research accessibility. Code and model weights are in the supplementary materials.

[LG-100] Neural Index Policies for Restless Multi-Action Bandits with Heterogeneous Budgets

Link: https://arxiv.org/abs/2510.22069
Authors: Himadri S. Pandey, Kai Wang, Gian-Gabriel P. Garcia
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Click to view abstract

Abstract:Restless multi-armed bandits (RMABs) provide a scalable framework for sequential decision-making under uncertainty, but classical formulations assume binary actions and a single global budget. Real-world settings, such as healthcare, often involve multiple interventions with heterogeneous costs and constraints, where such assumptions break down. We introduce a Neural Index Policy (NIP) for multi-action RMABs with heterogeneous budget constraints. Our approach learns to assign budget-aware indices to arm–action pairs using a neural network, and converts them into feasible allocations via a differentiable knapsack layer formulated as an entropy-regularized optimal transport (OT) problem. The resulting model unifies index prediction and constrained optimization in a single end-to-end differentiable framework, enabling gradient-based training directly on decision quality. The network is optimized to align its induced occupancy measure with the theoretical upper bound from a linear programming relaxation, bridging asymptotic RMAB theory with practical learning. Empirically, NIP achieves near-optimal performance within 5% of the oracle occupancy-measure policy while strictly enforcing heterogeneous budgets and scaling to hundreds of arms. This work establishes a general, theoretically grounded, and scalable framework for learning index-based policies in complex resource-constrained environments.

[LG-101] Deep Gaussian Processes for Functional Maps

Link: https://arxiv.org/abs/2510.22068
Authors: Matthew Lowery, Zhitong Xu, Da Long, Keyan Chen, Daniel S. Johnson, Yang Bai, Varun Shankar, Shandian Zhe
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments: 10 pages + 9 page appendix, 5 figures

Click to view abstract

Abstract:Learning mappings between functional spaces, also known as function-on-function regression, plays a crucial role in functional data analysis and has broad applications, e.g. spatiotemporal forecasting, curve prediction, and climate modeling. Existing approaches, such as functional linear models and neural operators, either fall short of capturing complex nonlinearities or lack reliable uncertainty quantification under noisy, sparse, and irregularly sampled data. To address these issues, we propose Deep Gaussian Processes for Functional Maps (DGPFM). Our method designs a sequence of GP-based linear and nonlinear transformations, leveraging integral transforms of kernels, GP interpolation, and nonlinear activations sampled from GPs. A key insight simplifies implementation: under fixed locations, discrete approximations of kernel integral transforms collapse into direct functional integral transforms, enabling flexible incorporation of various integral transform designs. To achieve scalable probabilistic inference, we use inducing points and whitening transformations to develop a variational learning algorithm. Empirical results on real-world and PDE benchmark datasets demonstrate the advantage of DGPFM in both predictive performance and uncertainty calibration.

[LG-102] Pruning and Quantization Impact on Graph Neural Networks

Link: https://arxiv.org/abs/2510.22058
Authors: Khatoon Khedri, Reza Rawassizadeh, Qifu Wen, Mehdi Hosseinzadeh
Subjects: Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:Graph neural networks (GNNs) are known to operate with high accuracy on learning from graph-structured data, but they suffer from high computational and resource costs. Neural network compression methods are used to reduce the model size while maintaining reasonable accuracy. Two common neural network compression techniques are pruning and quantization. In this research, we empirically examine the effects of three pruning methods and three quantization methods on different GNN models across graph classification, node classification, and link prediction tasks. We conducted all experiments on three graph datasets: Cora, Proteins, and BBBP. Our findings demonstrate that unstructured fine-grained and global pruning can significantly reduce the model's size (50%) while maintaining or even improving precision after fine-tuning the pruned model. The evaluation of different quantization methods on GNNs shows diverse impacts on accuracy, inference time, and model size across different datasets.
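
To make the idea of global unstructured pruning concrete, here is a minimal sketch. It assumes a magnitude criterion (smallest absolute weights are removed first), which the abstract does not state; the function name, layer names, and weight values are hypothetical, and the paper's three pruning methods may differ in detail.

```python
def global_magnitude_prune(layers, sparsity):
    """Zero out the smallest-magnitude weights, ranked across all layers jointly.

    `layers` maps a layer name to a flat list of weights.
    `sparsity` is the fraction of weights to remove (0.5 means 50%).
    Ties at the threshold are all pruned, so slightly more than the
    requested fraction can be removed when magnitudes repeat.
    """
    magnitudes = sorted(abs(w) for ws in layers.values() for w in ws)
    k = int(len(magnitudes) * sparsity)  # number of weights to drop
    if k == 0:
        return {name: list(ws) for name, ws in layers.items()}
    threshold = magnitudes[k - 1]
    return {
        name: [0.0 if abs(w) <= threshold else w for w in ws]
        for name, ws in layers.items()
    }


# Hypothetical two-layer model with flattened weights.
weights = {
    "layer1": [0.9, -0.05, 0.4, 0.01],
    "layer2": [-0.7, 0.02, 0.3, -0.6],
}
pruned = global_magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

Because the threshold is computed over all layers together, one layer can lose more weights than another; a per-layer (local) scheme would instead apply the same fraction inside each layer.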

[LG-103] Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders

Link: https://arxiv.org/abs/2510.22049
Authors: Zhimin Chen, Chenyu Zhao, Ka Chun Mo, Yunjiang Jiang, Jane H. Lee, Shouwei Chen, Khushhall Chandra Mahajan, Ning Jiang, Kai Ren, Jinhui Li, Wen-Yun Yang
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-104] PFΔ: A Benchmark Dataset for Power Flow under Load Generation and Topology Variations NEURIPS2025

Link: https://arxiv.org/abs/2510.22048
Authors: Ana K. Rivera, Anvita Bhagavathula, Alvaro Carbonero, Priya Donti
Subjects: Machine Learning (cs.LG)
Comments: 31 pages, 14 figures. Accepted at NeurIPS 2025

Click to view abstract

Abstract:Power flow (PF) calculations are the backbone of real-time grid operations, across workflows such as contingency analysis (where repeated PF evaluations assess grid security under outages) and topology optimization (which involves PF-based searches over combinatorially large action spaces). Running these calculations at operational timescales or across large evaluation spaces remains a major computational bottleneck. Additionally, growing uncertainty in power system operations from the integration of renewables and climate-induced extreme weather also calls for tools that can accurately and efficiently simulate a wide range of scenarios and operating conditions. Machine learning methods offer a potential speedup over traditional solvers, but their performance has not been systematically assessed on benchmarks that capture real-world variability. This paper introduces PFΔ, a benchmark dataset for power flow that captures diverse variations in load, generation, and topology. PFΔ contains 859,800 solved power flow instances spanning six different bus system sizes, capturing three types of contingency scenarios (N, N-1, and N-2), and including close-to-infeasible cases near steady-state voltage stability limits. We evaluate traditional solvers and GNN-based methods, highlighting key areas where existing approaches struggle, and identifying open problems for future research. Our dataset is available at this https URL and our code with data generation scripts and model implementations is at this https URL.

[LG-105] Fast Non-Log-Concave Sampling under Nonconvex Equality and Inequality Constraints with Landing

Link: https://arxiv.org/abs/2510.22044
Authors: Kijung Jeon, Michael Muehlebach, Molei Tao
Subjects: Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
Comments: 62 pages

Click to view abstract

Abstract:Sampling from constrained statistical distributions is a fundamental task in various fields including Bayesian statistics, computational chemistry, and statistical physics. This article considers the cases where the constrained distribution is described by an unconstrained density, as well as additional equality and/or inequality constraints, which often make the constraint set nonconvex. Existing methods for a nonconvex constraint set $\Sigma \subset \mathbb{R}^d$ defined by equality or inequality constraints commonly rely on costly projection steps. Moreover, they cannot handle equality and inequality constraints simultaneously, as each method is specialized to only one case. In addition, rigorous and quantitative convergence guarantees are often lacking. In this paper, we introduce Overdamped Langevin with LAnding (OLLA), a new framework that can design overdamped Langevin dynamics accommodating both equality and inequality constraints. The proposed dynamics also deterministically corrects trajectories along the normal direction of the constraint surface, thus obviating the need for explicit projections. We show that, under suitable regularity conditions on the target density and $\Sigma$, OLLA converges exponentially fast in $W_2$ distance to the constrained target density $\rho_\Sigma(x) \propto \exp(-f(x))\,d\sigma_\Sigma$. Lastly, through experiments, we demonstrate the efficiency of OLLA compared to projection-based constrained Langevin algorithms and their slack variable variants, highlighting its favorable computational cost and reasonable empirical mixing.

[LG-106] Generalized Top-k Mallows Model for Ranked Choices

Link: https://arxiv.org/abs/2510.22040
Authors: Shahrzad Haddadan, Sara Ahmadian
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Comments:

Click to view abstract

Abstract:The classic Mallows model is a foundational tool for modeling user preferences. However, it has limitations in capturing real-world scenarios, where users often focus only on a limited set of preferred items and are indifferent to the rest. To address this, extensions such as the top-k Mallows model have been proposed, aligning better with practical applications. In this paper, we address several challenges related to the generalized top-k Mallows model, with a focus on analyzing buyer choices. Our key contributions are: (1) a novel sampling scheme tailored to generalized top-k Mallows models, (2) an efficient algorithm for computing choice probabilities under this model, and (3) an active learning algorithm for estimating the model parameters from observed choice data. These contributions provide new tools for analysis and prediction in critical decision-making scenarios. We present a rigorous mathematical analysis of the performance of our algorithms. Furthermore, through extensive experiments on synthetic data and real-world data, we demonstrate the scalability and accuracy of our proposed methods, and we compare the predictive power of the Mallows model for top-k lists with that of the simpler Multinomial Logit model.
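
As background, sampling from the classic (full-ranking) Mallows model under the Kendall tau distance can be sketched with the well-known repeated insertion procedure. This is not the paper's sampling scheme for the generalized top-k model; the function name and the dispersion parameter `phi` (0 concentrates on the reference ranking, 1 is uniform) are assumptions made for illustration.

```python
import random


def sample_mallows(reference, phi, rng):
    """Draw one ranking from a classic Mallows model by repeated insertion.

    Items are inserted in reference order. The i-th item goes to
    position j (1 <= j <= i) with probability proportional to
    phi ** (i - j), so each insertion creates i - j inversions
    relative to the reference ranking.
    """
    ranking = []
    for i, item in enumerate(reference, start=1):
        weights = [phi ** (i - j) for j in range(1, i + 1)]
        j = rng.choices(range(1, i + 1), weights=weights)[0]
        ranking.insert(j - 1, item)
    return ranking


reference = ["a", "b", "c", "d"]
concentrated = sample_mallows(reference, 0.0, random.Random(0))
dispersed = sample_mallows(reference, 0.5, random.Random(0))
print(concentrated, dispersed)
```

With `phi = 0.0` every item is appended at the end, so the sample always equals the reference ranking; larger values spread probability mass over rankings with more inversions.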

[LG-107] Linearized Optimal Transport for Analysis of High-Dimensional Point-Cloud and Single-Cell Data

Link: https://arxiv.org/abs/2510.22033
Authors: Tianxiang Wang, Yingtong Ke, Dhananjay Bhaskar, Smita Krishnaswamy, Alexander Cloninger
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
Comments: 11 pages, 5 figures

Click to view abstract

Abstract:Single-cell technologies generate high-dimensional point clouds of cells, enabling detailed characterization of complex patient states and treatment responses. Yet each patient is represented by an irregular point cloud rather than a simple vector, making it difficult to directly quantify and compare biological differences between individuals. Nonlinear methods such as kernels and neural networks achieve predictive accuracy but act as black boxes, offering little biological interpretability. To address these limitations, we adapt the Linear Optimal Transport (LOT) framework to this setting, embedding irregular point clouds into a fixed-dimensional Euclidean space while preserving distributional structure. This embedding provides a principled linear representation that preserves optimal transport geometry while enabling downstream analysis. It also forms a registration between any two patients, enabling direct comparison of their cellular distributions. Within this space, LOT enables: (i) accurate and interpretable classification of COVID-19 patient states, where classifier weights map back to specific markers and spatial regions driving predictions; and (ii) synthetic data generation for patient-derived organoids, exploiting the linearity of the LOT embedding. LOT barycenters yield averaged cellular profiles representing combined conditions or samples, supporting drug interaction testing. Together, these results establish LOT as a unified framework that bridges predictive performance, interpretability, and generative modeling. By transforming heterogeneous point clouds into structured embeddings directly traceable to the original data, LOT opens new opportunities for understanding immune variation and treatment effects in high-dimensional biological systems.

[LG-108] K-DAREK: Distance Aware Error for Kurkova Kolmogorov Networks

Link: https://arxiv.org/abs/2510.22021
Authors: Masoud Ataei, Vikas Dhiman, Mohammad Javad Khojasteh
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
Comments: Accepted at IEEE ACSSC, 9 pages and 3 figures

Click to view abstract

Abstract:Neural networks are parametric and powerful tools for function approximation, and the choice of architecture heavily influences their interpretability, efficiency, and generalization. In contrast, Gaussian processes (GPs) are nonparametric probabilistic models that define distributions over functions using a kernel to capture correlations among data points. However, these models become computationally expensive for large-scale problems, as they require inverting a large covariance matrix. Kolmogorov-Arnold networks (KANs), semi-parametric neural architectures, have emerged as a prominent approach for modeling complex functions with structured and efficient representations through spline layers. Kurkova Kolmogorov-Arnold networks (KKANs) extend this idea by reducing the number of spline layers in KAN and replacing them with Chebyshev layers and multi-layer perceptrons, thereby mapping inputs into higher-dimensional spaces before applying spline-based transformations. Compared to KANs, KKANs exhibit more stable convergence during training, making them a strong architecture for estimating operators and system modeling in dynamical systems. By enhancing the KKAN architecture, we develop a novel learning algorithm, distance-aware error for Kurkova-Kolmogorov networks (K-DAREK), for efficient and interpretable function approximation with uncertainty quantification. Our approach establishes robust error bounds that are distance-aware; this means they reflect the proximity of a test point to its nearest training points. Through case studies on a safe control task, we demonstrate that K-DAREK is about four times faster and ten times more computationally efficient than an ensemble of KANs, 8.6 times more scalable than GPs as the data size increases, and 50% safer than our previous work, distance-aware error for Kolmogorov networks (DAREK).

[LG-109] Do You Trust the Process?: Modeling Institutional Trust for Community Adoption of Reinforcement Learning Policies

Link: https://arxiv.org/abs/2510.22017
Authors: Naina Balepur, Xingrui Pei, Hari Sundaram
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
Comments:

Click to view abstract

Abstract:Many governmental bodies are adopting AI policies for decision-making. In particular, Reinforcement Learning has been used to design policies that citizens would be expected to follow if implemented. Much RL work assumes that citizens follow these policies, and evaluates them with this in mind. However, we know from prior work that without institutional trust, citizens will not follow policies put in place by governments. In this work, we develop a trust-aware RL algorithm for resource allocation in communities. We consider the case of humanitarian engineering, where the organization is aiming to distribute some technology or resource to community members. We use a Deep Deterministic Policy Gradient approach to learn a resource allocation that fits the needs of the organization. Then, we simulate resource allocation according to the learned policy, and model the changes in institutional trust of community members. We investigate how this incorporation of institutional trust affects outcomes, and ask how effectively an organization can learn policies if trust values are private. We find that incorporating trust into RL algorithms can lead to more successful policies, specifically when the organization's goals are less certain. We find that more conservative trust estimates lead to increased fairness and average community trust, though organization success suffers. Finally, we explore a strategy to prevent unfair outcomes to communities. We implement a quota system by an external entity which decreases the organization's utility when it does not serve enough community members. We find this intervention can improve fairness and trust among communities in some cases, while decreasing the success of the organization. This work underscores the importance of institutional trust in algorithm design and implementation, and identifies a tension between organization success and community well-being.

[LG-110] Cost-Sensitive Evaluation for Binary Classifiers

Link: https://arxiv.org/abs/2510.22016
Authors: Pierangelo Lombardo, Antonio Casoli, Cristian Cingolani, Shola Oshodi, Michele Zanatta
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Click to view abstract

Abstract:Selecting an appropriate evaluation metric for classifiers is crucial for model comparison and parameter optimization, yet there is no consensus on a universally accepted metric that serves as a definitive standard. Moreover, there is often a misconception about the perceived need to mitigate imbalance in datasets used to train classification models. Since the final goal in classifier optimization is typically maximizing the return on investment or, equivalently, minimizing the Total Classification Cost (TCC), we define Weighted Accuracy (WA), an evaluation metric for binary classifiers with a straightforward interpretation as a weighted version of the well-known accuracy metric, coherent with the need to minimize TCC. We clarify the conceptual framework for handling class imbalance in cost-sensitive scenarios, providing an alternative to rebalancing techniques. This framework can be applied to any metric that, like WA, can be expressed as a linear combination of example-dependent quantities and allows for comparing the results obtained in different datasets and for addressing discrepancies between the development dataset, used to train and validate the model, and the target dataset, where the model will be deployed. It also specifies in which scenarios using UCCs-unaware class rebalancing techniques or rebalancing metrics aligns with TCC minimization and when it is instead counterproductive. Finally, we propose a procedure to estimate the WA weight parameter in the absence of fully specified UCCs and demonstrate the robustness of WA by analyzing its correlation with TCC in example-dependent scenarios.

[LG-111] A Multimodal Human Protein Embeddings Database: DeepDrug Protein Embeddings Bank (DPEB)

Link: https://arxiv.org/abs/2510.22008
Authors: Md Saiful Islam Sajol, Magesh Rajasekaran, Hayden Gemeinhardt, Adam Bess, Chris Alvin, Supratik Mukhopadhyay
Subjects: Machine Learning (cs.LG); Molecular Networks (q-bio.MN)
Comments:

Click to view abstract

Abstract:Computationally predicting protein-protein interactions (PPIs) is challenging due to the lack of integrated, multimodal protein representations. DPEB is a curated collection of 22,043 human proteins that integrates four embedding types: structural (AlphaFold2), transformer-based sequence (BioEmbeddings), contextual amino acid patterns (ESM-2: Evolutionary Scale Modeling), and sequence-based n-gram statistics (ProtVec). AlphaFold2 protein structures are available through public databases (e.g., AlphaFold2 Protein Structure Database), but the internal neural network embeddings are not. DPEB addresses this gap by providing AlphaFold2-derived embeddings for computational modeling. Our benchmark evaluations show GraphSAGE with BioEmbedding achieved the highest PPI prediction performance (87.37% AUROC, 79.16% accuracy). The framework also achieved 77.42% accuracy for enzyme classification and 86.04% accuracy for protein family classification. DPEB supports multiple graph neural network methods for PPI prediction, enabling applications in systems biology, drug target identification, pathway analysis, and disease mechanism studies.

[LG-112] An Introductory Guide to Koopman Learning

Link: https://arxiv.org/abs/2510.22002
Authors: Matthew J. Colbrook, Zlatko Drmač, Andrew Horning
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Dynamical Systems (math.DS); Optimization and Control (math.OC); Spectral Theory (math.SP)
Comments:

Click to view abstract

Abstract:Koopman operators provide a linear framework for data-driven analyses of nonlinear dynamical systems, but their infinite-dimensional nature presents major computational challenges. In this article, we offer an introductory guide to Koopman learning, emphasizing rigorously convergent data-driven methods for forecasting and spectral analysis. We provide a unified account of error control via residuals in both finite- and infinite-dimensional settings, an elementary proof of convergence for generalized Laplace analysis – a variant of filtered power iteration that works for operators with continuous spectra and no spectral gaps – and review state-of-the-art approaches for computing continuous spectra and spectral measures. The goal is to provide both newcomers and experts with a clear, structured overview of reliable data-driven techniques for Koopman spectral analysis.

[LG-113] Deep Learning on Real-World Graphs NEURIPS ICLR ICML

Link: https://arxiv.org/abs/2510.21994
Authors: Emanuele Rossi
Subjects: Machine Learning (cs.LG)
Comments: The thesis was submitted for the degree of Doctor of Philosophy in Computing at Imperial College London (February 2024), under the supervision of Prof. Michael M. Bronstein. It includes work published at ICML, ICLR, NeurIPS, and the Learning on Graphs Conference

Click to view abstract

Abstract:Graph Neural Networks (GNNs) have become a central tool for learning on graph-structured data, yet their applicability to real-world systems remains limited by key challenges such as scalability, temporality, directionality, data incompleteness, and structural uncertainty. This thesis introduces a series of models addressing these limitations: SIGN for scalable graph learning, TGN for temporal graphs, Dir-GNN for directed and heterophilic networks, Feature Propagation (FP) for learning with missing node features, and NuGget for game-theoretic structural inference. Together, these contributions bridge the gap between academic benchmarks and industrial-scale graphs, enabling the use of GNNs in domains such as social and recommender systems.

[LG-114] Boltzmann Graph Ensemble Embeddings for Aptamer Libraries

Link: https://arxiv.org/abs/2510.21980
Authors: Starlika Bauskar, Jade Jiao, Narayanan Kannan, Alexander Kimm, Justin M. Baker, Matthew J. Tyler, Andrea L. Bertozzi, Anne M. Andrews
Subjects: Machine Learning (cs.LG); Probability (math.PR); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
Comments:

Click to view abstract

[LG-115] Deep Jump Gaussian Processes for Surrogate Modeling of High-Dimensional Piecewise Continuous Functions

Link: https://arxiv.org/abs/2510.21974
Authors: Yang Xu, Chiwoo Park
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Click to view abstract

[LG-116] Revisiting Orbital Minimization Method for Neural Operator Decomposition NEURIPS2025

Link: https://arxiv.org/abs/2510.21952
Authors: J. Jon Ryu, Samuel Zhou, Gregory W. Wornell
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Comments: 25 pages, 8 figures. To appear at NeurIPS 2025

Click to view abstract

[LG-117] Generalization Bounds for Rank-sparse Neural Networks NEURIPS2025

Link: https://arxiv.org/abs/2510.21945
Authors: Antoine Ledent, Rodrigo Alves, Yunwen Lei
Subjects: Machine Learning (cs.LG)
Comments: Accepted at NeurIPS 2025

Click to view abstract

Abstract:It has been recently observed in much of the literature that neural networks exhibit a bottleneck rank property: for larger depths, the activations and weights of neural networks trained with gradient-based methods tend to be of approximately low rank. In fact, the rank of the activations of each layer converges to a fixed value referred to as the "bottleneck rank", which is the minimum rank required to represent the training data. This perspective is in line with the observation that regularizing linear networks (without activations) with weight decay is equivalent to minimizing the Schatten $p$ quasi-norm of the neural network. In this paper we investigate the implications of this phenomenon for generalization. More specifically, we prove generalization bounds for neural networks which exploit the approximate low rank structure of the weight matrices if present. The final results rely on the Schatten $p$ quasi-norms of the weight matrices: for small $p$, the bounds exhibit a sample complexity $\widetilde{O}(WrL^2)$, where $W$ and $L$ are the width and depth of the neural network respectively and where $r$ is the rank of the weight matrices. As $p$ increases, the bound behaves more like a norm-based bound instead.
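
For readers unfamiliar with the quantity, the standard definition of the Schatten $p$ quasi-norm is written out below. It helps explain why small $p$ yields rank-like behavior; the notation $\sigma_i(A)$ for the singular values of a weight matrix $A$ is an assumption made here, not taken from the abstract.

```latex
% Schatten p quasi-norm of a matrix A with singular values \sigma_i(A);
% for 0 < p < 1 the triangle inequality fails, hence "quasi-norm".
\[
  \|A\|_{S_p} \;=\; \Big( \sum_{i} \sigma_i(A)^{p} \Big)^{1/p},
  \qquad 0 < p < 1 .
\]
% As p tends to 0, each nonzero singular value contributes 1 to the sum,
% so the sum counts the nonzero singular values, i.e. the rank.
\[
  \lim_{p \to 0^{+}} \sum_{i} \sigma_i(A)^{p} \;=\; \operatorname{rank}(A) .
\]
```

For $p = 1$ the same expression gives the nuclear norm and for $p = 2$ the Frobenius norm, which is consistent with the bound behaving more like a norm-based bound as $p$ increases.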

[LG-118] Joint Score-Threshold Optimization for Interpretable Risk Assessment Under Partial Supervision

Link: https://arxiv.org/abs/2510.21934
Authors: Fardin Gankhanloo, Emmett Springer, Erik H. Hoyer, Daniel L. Young, Kimia Ghobadi
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

Click to view abstract

Abstract:Risk assessment tools in healthcare commonly employ point-based scoring systems that map patients to ordinal risk categories via thresholds. While electronic health record (EHR) data presents opportunities for data-driven optimization of these tools, two fundamental challenges impede standard supervised learning: (1) partial supervision arising from intervention-censored outcomes, where only extreme categories can be reliably labeled, and (2) asymmetric misclassification costs that increase with ordinal distance. We propose a mixed-integer programming (MIP) framework that jointly optimizes scoring weights and category thresholds under these constraints. Our approach handles partial supervision through per-instance feasible label sets, incorporates asymmetric distance-aware objectives, and prevents middle-category collapse via minimum threshold gaps. We further develop a CSO relaxation using softplus losses that preserves the ordinal structure while enabling efficient optimization. The framework supports governance constraints including sign restrictions, sparsity, and minimal modifications to incumbent tools, ensuring practical deployability in clinical workflows.

[LG-119] Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks

Link: https://arxiv.org/abs/2510.21910
Authors: Mahavir Dabas,Tran Huynh,Nikhil Reddy Billa,Jiachen T. Wang,Peng Gao,Charith Peris,Yao Ma,Rahul Gupta,Ming Jin,Prateek Mittal,Ruoxi Jia
Subjects: Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:Large language models remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. Defending against novel jailbreaks represents a critical challenge in AI safety. Adversarial training – designed to make models robust against worst-case perturbations – has been the dominant paradigm for adversarial robustness. However, due to optimization challenges and difficulties in defining realistic threat models, adversarial training methods often fail on newly developed jailbreaks in practice. This paper proposes a new paradigm for improving robustness against unseen jailbreaks, centered on the Adversarial Déjà Vu hypothesis: novel jailbreaks are not fundamentally new, but largely recombinations of adversarial skills from previous attacks. We study this hypothesis through a large-scale analysis of 32 attack papers published over two years. Using an automated pipeline, we extract and compress adversarial skills into a sparse dictionary of primitives, with LLMs generating human-readable descriptions. Our analysis reveals that unseen attacks can be effectively explained as sparse compositions of earlier skills, with explanatory power increasing monotonically as skill coverage grows. Guided by this insight, we introduce Adversarial Skill Compositional Training (ASCoT), which trains on diverse compositions of skill primitives rather than isolated attack instances. ASCoT substantially improves robustness to unseen attacks, including multi-turn jailbreaks, while maintaining low over-refusal rates. We also demonstrate that expanding adversarial skill coverage, not just data scale, is key to defending against novel attacks. Warning: This paper contains content that may be harmful or offensive in nature.

[LG-120] Prefetching Cache Optimization Using Graph Neural Networks: A Modular Framework and Conceptual Analysis

Link: https://arxiv.org/abs/2510.21865
Authors: F. I. Qowy
Subjects: Performance (cs.PF); Machine Learning (cs.LG); Software Engineering (cs.SE)
Comments:

Click to view abstract

[LG-121] OpenEM: Large-scale multi-structural 3D datasets for electromagnetic methods

Link: https://arxiv.org/abs/2510.21859
Authors: Shuang Wang,Xuben Wang,Fei Deng,Peifan Jiang,Jian Chen,Gianluca Fiandaca
Subjects: Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:With the remarkable success of deep learning, applying such techniques to EM methods has emerged as a promising research direction to overcome the limitations of conventional approaches. The effectiveness of deep learning methods depends heavily on the quality of datasets, which directly influences model performance and generalization ability. Existing application studies often construct datasets from random one-dimensional or structurally simple three-dimensional models, which fail to represent the complexity of real geological environments. Furthermore, the absence of standardized, publicly available three-dimensional geoelectric datasets continues to hinder progress in deep-learning-based EM exploration. To address these limitations, we present OpenEM, a large-scale, multi-structural three-dimensional geoelectric dataset that encompasses a broad range of geologically plausible subsurface structures. OpenEM consists of nine categories of geoelectric models, spanning from simple configurations with anomalous bodies in half space to more complex structures such as flat layers, folded layers, flat faults, curved faults, and their corresponding variants with anomalous bodies. Since three-dimensional forward modeling in electromagnetics is extremely time-consuming, we further developed a deep-learning-based fast forward modeling approach for OpenEM, enabling efficient and reliable forward modeling across the entire dataset. This capability allows OpenEM to be rapidly deployed for a wide range of tasks. OpenEM provides a unified, comprehensive, and large-scale dataset for common EM exploration systems to accelerate the application of deep learning in electromagnetic methods. The complete dataset, along with the forward modeling codes and trained models, is publicly available at this https URL.

[LG-122] Towards Interpretable Deep Learning and Analysis of Dynamical Systems via the Discrete Empirical Interpolation Method

Link: https://arxiv.org/abs/2510.21852
Authors: Hojin Kim,Romit Maulik
Subjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
Comments: 9 pages, 12 figures

Click to view abstract

Abstract:We present a differentiable framework that leverages the Discrete Empirical Interpolation Method (DEIM) for interpretable deep learning and dynamical system analysis. Although DEIM efficiently approximates nonlinear terms in projection-based reduced-order models (POD-ROM), its fixed interpolation points limit the adaptability to complex and time-varying dynamics. To address this limitation, we first develop a differentiable adaptive DEIM formulation for the one-dimensional viscous Burgers equation, which allows neural networks to dynamically select interpolation points in a computationally efficient and physically consistent manner. We then apply DEIM as an interpretable analysis tool for examining the learned dynamics of a pre-trained Neural Ordinary Differential Equation (NODE) on a two-dimensional vortex-merging problem. The DEIM trajectories reveal physically meaningful features in the learned dynamics of NODE and expose its limitations when extrapolating to unseen flow configurations. These findings demonstrate that DEIM can serve not only as a model reduction tool but also as a diagnostic framework for understanding and improving the generalization behavior of neural differential equation models.

[LG-123] Data-Driven Approach to Capitation Reform in Rwanda

Link: https://arxiv.org/abs/2510.21851
Authors: Babaniyi Olaniyi,Ina Kalisa,Ana Fernández del Río,Jean Marie Vianney Hakizayezu,Enric Jané,Eniola Olaleye,Juan Francisco Garamendi,Ivan Nazarov,Aditya Rastogi,Mateo Diaz-Quiroz,África Periáñez,Regis Hitimana
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Applications (stat.AP)
Comments:

Click to view abstract

Abstract:As part of Rwanda’s transition toward universal health coverage, the national Community-Based Health Insurance (CBHI) scheme is moving from retrospective fee-for-service reimbursements to prospective capitation payments for public primary healthcare providers. This report outlines a data-driven approach to designing, calibrating, and monitoring the capitation model using individual-level claims data from the Intelligent Health Benefits System (IHBS). We introduce a transparent, interpretable formula for allocating payments to Health Centers and their affiliated Health Posts. The formula is based on catchment population, service utilization patterns, and patient inflows, with parameters estimated via regression models calibrated on national claims data. Repeated validation exercises show the payment scheme closely aligns with historical spending while promoting fairness and adaptability across diverse facilities. In addition to payment design, the same dataset enables actionable behavioral insights. We highlight the use case of monitoring antibiotic prescribing patterns, particularly in pediatric care, to flag potential overuse and guideline deviations. Together, these capabilities lay the groundwork for a learning health financing system: one that connects digital infrastructure, resource allocation, and service quality to support continuous improvement and evidence-informed policy reform.

[LG-124] SynCast: Synergizing Contradictions in Precipitation Nowcasting via Diffusion Sequential Preference Optimization

Link: https://arxiv.org/abs/2510.21847
Authors: Kaiyi Xu,Junchao Gong,Wenlong Zhang,Ben Fei,Lei Bai,Wanli Ouyang
Subjects: Machine Learning (cs.LG)
Comments:

Click to view abstract

[LG-125] KARIPAP: Quantum-Inspired Tensor Network Compression of Large Language Models Using Infinite Projected Entangled Pair States and Tensor Renormalization Group

Link: https://arxiv.org/abs/2510.21844
Authors: Azree Nazri
Subjects: Machine Learning (cs.LG); Quantum Physics (quant-ph)
Comments: 28 pages

Click to view abstract

Abstract:Large Language Models (LLMs) like ChatGPT and LLaMA drive rapid progress in generative AI, yet their huge parameter scales create severe computational and environmental burdens. High training costs, energy use, and limited device deployment hinder accessibility. Existing compression - pruning, distillation, low-rank, and quantization - reduces size but ignores complex inter-layer correlations. We propose KARIPAP, a quantum-inspired tensor network compression using Infinite Projected Entangled Pair States (iPEPS) and Tensor Renormalization Group (TRG) contraction. Unlike 1D Matrix Product States, iPEPS captures multi-directional entanglement in attention and deep transformer layers. TRG ensures polynomial-time contraction, making tensorization feasible while preserving key correlation geometry. Experiments on LLaMA-2 7B show up to 93% memory and 70% parameter reduction, with 50% faster training, 25% faster inference, and only 2-3% accuracy loss. Layer-wise entanglement profiling reveals redundancy in deeper layers, confirming their suitability for tensor factorization. KARIPAP demonstrates that modern LLMs occupy low-dimensional entanglement manifolds, enabling scalable, energy-efficient, and quantum-aware AI architectures.

[LG-126] Quantum Autoencoders for Anomaly Detection in Cybersecurity

Link: https://arxiv.org/abs/2510.21837
Authors: Rohan Senthil,Swee Liang Wong
Subjects: Emerging Technologies (cs.ET); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:Anomaly detection in cybersecurity is a challenging task, where normal events far outnumber anomalous ones and new anomalies occur frequently. Classical autoencoders have been used for anomaly detection, but struggle in data-limited settings, which quantum counterparts can potentially overcome. In this work, we apply Quantum Autoencoders (QAEs) for anomaly detection in cybersecurity, specifically on the BPF-extended tracking honeypot (BETH) dataset. QAEs are evaluated across multiple encoding techniques, ansatz types, repetitions, and feature selection strategies. Our results demonstrate that an 8-feature QAE using Dense-Angle encoding with a RealAmplitude ansatz can outperform Classical Autoencoders (CAEs), even when trained on substantially fewer samples. The effects of quantum encoding and feature selection for developing quantum models are demonstrated and discussed. In a data-limited setting, the best performing QAE model has an F1 score of 0.87, better than that of the CAE (0.77). These findings suggest that QAEs may offer practical advantages for anomaly detection in data-limited scenarios.

[LG-127] COLA: Continual Learning via Autoencoder Retrieval of Adapters

Link: https://arxiv.org/abs/2510.21836
Authors: Jaya Krishna Mandivarapu
Subjects: Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:Learning a set of tasks over time, also known as continual learning (CL), is one of the most challenging problems in artificial intelligence due to catastrophic forgetting. Large language models (LLMs) are often impractical to re-train frequently or to learn continually because of the high computational cost of training. Moreover, LLMs are not well suited to continual learning, as updating these models over time to acquire new knowledge overwrites existing knowledge, a common phenomenon known as catastrophic forgetting. In this paper, we aim to address these concerns using a novel framework, COLA, that employs an autoencoder to capture low-dimensional embeddings of the weights associated with various tasks. Our approach facilitates the transfer of knowledge to new tasks while preventing catastrophic forgetting, all without using data replay or a substantial set of task-specific parameters. Our approach, COLA, lets the LLM efficiently learn new tasks with minimal training and insignificant performance degradation on previous tasks, and eliminates the need for retaining earlier training data. Empirical evaluation on different datasets, ranging from task-oriented dialogue systems to intent classification datasets, shows that our method not only overcomes catastrophic forgetting but also achieves a significant reduction in parameter usage and memory size across multiple tasks, outperforming existing state-of-the-art methods across multiple datasets.

[LG-128] Restoring Pruned Large Language Models via Lost Component Compensation NEURIPS2025

Link: https://arxiv.org/abs/2510.21834
Authors: Zijian Feng,Hanzhang Zhou,Zixiao Zhu,Tianjiao Li,Jia Jim Deryl Chua,Lee Onn Mak,Gee Wah Ng,Kezhi Mao
Subjects: Machine Learning (cs.LG)
Comments: NeurIPS 2025 Spotlight

Click to view abstract

Abstract:Pruning is a widely used technique to reduce the size and inference cost of large language models (LLMs), but it often causes performance degradation. To mitigate this, existing restoration methods typically employ parameter-efficient fine-tuning (PEFT), such as LoRA, to recover the pruned model’s performance. However, most PEFT methods are designed for dense models and overlook the distinct properties of pruned models, often resulting in suboptimal recovery. In this work, we propose a targeted restoration strategy for pruned models that restores performance while preserving their low cost and high efficiency. We observe that pruning-induced information loss is reflected in attention activations, and selectively reintroducing components of this information can significantly recover model performance. Based on this insight, we introduce RestoreLCC (Restoring Pruned LLMs via Lost Component Compensation), a plug-and-play method that contrastively probes critical attention heads via activation editing, extracts lost components from activation differences, and finally injects them back into the corresponding pruned heads for compensation and recovery. RestoreLCC is compatible with structured, semi-structured, and unstructured pruning schemes. Extensive experiments demonstrate that RestoreLCC consistently outperforms state-of-the-art baselines in both general and task-specific performance recovery, without compromising the sparsity or inference efficiency of pruned models.

[LG-129] Beyond Point Matching: Evaluating Multiscale Dubuc Distance for Time Series Similarity

Link: https://arxiv.org/abs/2510.21824
Authors: Azim Ahmadzadeh,Mahsa Khazaei,Elaina Rohlfing
Subjects: Machine Learning (cs.LG); Solar and Stellar Astrophysics (astro-ph.SR)
Comments:

Click to view abstract

Abstract:Time series are high-dimensional and complex data objects, making their efficient search and indexing a longstanding challenge in data mining. Building on a recently introduced similarity measure, namely Multiscale Dubuc Distance (MDD), this paper investigates its comparative strengths and limitations relative to the widely used Dynamic Time Warping (DTW). MDD is novel in two key ways: it evaluates time series similarity across multiple temporal scales and avoids point-to-point alignment. We demonstrate that in many scenarios where MDD outperforms DTW, the gains are substantial, and we provide a detailed analysis of the specific performance gaps it addresses. We provide simulations, in addition to the 95 datasets from the UCR archive, to test our hypotheses. Finally, we apply both methods to a challenging real-world classification task and show that MDD yields a significant improvement over DTW, underscoring its practical utility.
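
For context on the baseline the abstract compares against, here is a minimal sketch of classic Dynamic Time Warping, the point-to-point alignment approach that MDD is said to avoid. It is the textbook dynamic-programming formulation with an absolute-difference local cost and no warping window; it is not the authors' code, and MDD itself is not shown because the abstract does not define it.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    cost[i][j] holds the minimal cumulative cost of aligning the
    first i points of a with the first j points of b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

same = dtw_distance([1, 2, 3], [1, 2, 3])
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

The `stretched` case shows the warping behavior: a repeated point in one series is absorbed by the alignment at no cost, which is exactly the point-matching flexibility DTW is known for.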

[LG-130] Geographic Transferability of Machine Learning Models for Short-Term Airport Fog Forecasting

Link: https://arxiv.org/abs/2510.21819
Authors: Marcelo Cerda Castillo
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
Comments: 21 pages, 8 tables, 2 figures. Uses publicly available ERA5 and METAR datasets

Click to view abstract

Abstract:Short-term forecasting of airport fog (visibility < 1.0 km) presents challenges in geographic generalization because many machine learning models rely on location-specific features and fail to transfer across sites. This study investigates whether fundamental thermodynamic and radiative processes can be encoded in a coordinate-free (location-independent) feature set to enable geographic transferability. A gradient boosting classifier (XGBoost) trained on Santiago, Chile (SCEL, 33°S) data from 2002-2009 was evaluated on a 2010-2012 holdout set and under strict zero-shot tests at Puerto Montt (SCTE), San Francisco (KSFO), and London (EGLL). The model achieved AUC values of 0.923-0.947 across distances up to 11,650 km and different fog regimes (radiative, advective, marine). Consistent SHAP feature rankings show that visibility persistence, solar angle, and thermal gradients dominate predictions, suggesting the model learned transferable physical relationships rather than site-specific patterns. Results suggest that physics-informed, coordinate-free feature engineering can yield geographically transferable atmospheric forecasting tools.

[LG-131] Residual-guided AI-CFD hybrid method enables stable and scalable simulations: from 2D benchmarks to 3D applications

Link: https://arxiv.org/abs/2510.21804
Authors: Shilaj Baral,Youngkyu Lee,Sangam Khanal,Joongoo Jeon
Subjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
Comments:

Click to view abstract

Abstract:Purely data-driven surrogates for fluid dynamics often fail catastrophically from error accumulation, while existing hybrid methods have lacked the automation and robustness for practical use. To solve this, we developed XRePIT, a novel hybrid simulation strategy that synergizes machine learning (ML) acceleration with solver-based correction. We specifically designed our method to be fully automated and physics-aware, ensuring the stability and practical applicability that previous approaches lacked. We demonstrate that this new design overcomes long-standing barriers, achieving the first stable, accelerated rollouts for over 10,000 timesteps. The method also generalizes robustly to unseen boundary conditions and, crucially, scales to 3D flows. Our approach delivers speedups up to $4.98\times$ while maintaining high physical fidelity, resolving thermal fields with relative errors of $\sim 10^{-3}$ and capturing low magnitude velocity dynamics with errors below $10^{-2}$ m s$^{-1}$. This work thus establishes a mature and scalable hybrid method, paving the way for its use in real-world engineering.

[LG-132] MARS-M: When Variance Reduction Meets Matrices

Link: https://arxiv.org/abs/2510.21800
Authors: Yifeng Liu,Angela Yuan,Quanquan Gu
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Comments:

Click to view abstract

Abstract:Matrix-based preconditioned optimizers, such as Muon, have recently been shown to be more efficient than scalar-based optimizers for training large-scale neural networks, including large language models (LLMs). On the other hand, recent benchmarks on optimizers for LLM pre-training have demonstrated that variance-reduction techniques such as MARS can achieve substantial speedups over standard optimizers that do not employ variance reduction. In this paper, to achieve the best of both worlds, we introduce MARS-M, a new optimizer that integrates the variance reduction technique in MARS with Muon. Under standard regularity conditions, we prove that Muon-M converges to a first-order stationary point at a rate of $\tilde{\mathcal{O}}(T^{-1/3})$, which improves upon the $\tilde{\mathcal{O}}(T^{-1/4})$ rate attained by Muon. Our empirical results on language modeling and computer vision tasks demonstrate that MARS-M consistently yields lower losses and improved performance across various downstream benchmarks. The implementation of MARS-M is available at this https URL.

[LG-133] Chebyshev Moment Regularization (CMR): Condition-Number Control with Moment Shaping

Link: https://arxiv.org/abs/2510.21772
Authors: Jinwoo Baek
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Comments: 15 pages

Click to view abstract

Abstract:We introduce Chebyshev Moment Regularization (CMR), a simple, architecture-agnostic loss that directly optimizes layer spectra. CMR jointly controls spectral edges via a log-condition proxy and shapes the interior via Chebyshev moments, with a decoupled, capped mixing rule that preserves task gradients. We prove strictly monotone descent for the condition proxy, bounded moment gradients, and orthogonal invariance. In an adversarial “$\kappa$-stress” setting (MNIST, 15-layer MLP), compared to vanilla training, CMR reduces mean layer condition numbers by $\sim 10^3$ (from $\approx 3.9 \times 10^3$ to $\approx 3.4$ in 5 epochs), increases average gradient magnitude, and restores test accuracy ($\approx 10\% \to \approx 86\%$). These results support optimization-driven spectral preconditioning: directly steering models toward well-conditioned regimes for stable, accurate learning.

[LG-134] Numerical Fragility in Transformers: A Layer-wise Theory for Explaining, Forecasting, and Mitigating Instability

Link: https://arxiv.org/abs/2510.21770
Authors: Jinwoo Baek
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Comments: 15 pages

Click to view abstract

Abstract:Transformers trained in low precision can suffer forward-error amplification. We give a first-order, module-wise theory that predicts when and where errors grow. For self-attention we derive a per-layer bound that factorizes into three interpretable diagnostics: a score-scale ratio $\kappa_{\rm score}$, a rowwise softmax sensitivity $\kappa_{\rm softmax}$, and value conditioning $\kappa(V)$. We prove a residual relaxation inequality showing that residual blocks attenuate depth-wise accumulation, and we introduce a precision- and width-aware LayerNorm indicator $\rho_{\rm LN}$ with a matching first-order bound in the $\epsilon$-dominated regime. These pieces yield a unified forward-stability bound whose right-hand side is directly estimable during training. On Tiny-ViT/CIFAR-10 we evaluate the bound and components. (1) The combined predictor $\kappa_{\rm softmax}\,(1+\kappa_{\rm score})\,\kappa(V)\,\|W_O\|_2+\kappa_{\rm eff}+C_{\rm LN}$ tracks FP32 $\leftrightarrow$ LP mismatches across seeds, widths, and precisions; scaling by $\epsilon_{\rm mach}$ collapses mixed-precision points. (2) The time-series maximum of $\kappa_{\rm softmax}$ acts as an early-warning signal, leading error spikes by 16-24 steps (corr. 0.65-0.82; permutation $p \approx 10^{-3}$; Precision@K 0.89-1.00). (3) Guided by $\rho_{\rm LN}$, a small LayerNorm-$\epsilon$ tweak targeting $\rho_\star$ gives consistent stabilization (mean tail-loss $\downarrow \approx 0.010$ at $\rho_\star = 0.6$, cap $= 10^{-2}$) with negligible overhead. Overall, our theory supplies actionable, unitless diagnostics that (i) explain when self-attention is fragile, (ii) forecast instability, and (iii) motivate a minimally invasive mitigation.

[LG-135] Taxonomy and Trends in Reinforcement Learning for Robotics and Control Systems: A Structured Review

Link: https://arxiv.org/abs/2510.21758
Authors: Kumater Ter,RexCharles Donatus,Ore-Ofe Ajayi,Daniel Udekwe
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:Reinforcement learning (RL) has become a foundational approach for enabling intelligent robotic behavior in dynamic and uncertain environments. This work presents an in-depth review of RL principles, advanced deep reinforcement learning (DRL) algorithms, and their integration into robotic and control systems. Beginning with the formalism of Markov Decision Processes (MDPs), the study outlines essential elements of the agent-environment interaction and explores core algorithmic strategies including actor-critic methods, value-based learning, and policy gradients. Emphasis is placed on modern DRL techniques such as DDPG, TD3, PPO, and SAC, which have shown promise in solving high-dimensional, continuous control tasks. A structured taxonomy is introduced to categorize RL applications across domains such as locomotion, manipulation, multi-agent coordination, and human-robot interaction, along with training methodologies and deployment readiness levels. The review synthesizes recent research efforts, highlighting technical trends, design patterns, and the growing maturity of RL in real-world robotics. Overall, this work aims to bridge theoretical advances with practical implementations, providing a consolidated perspective on the evolving role of RL in autonomous robotic systems.

[LG-136] From Authors to Reviewers: Leveraging Rankings to Improve Peer Review

Link: https://arxiv.org/abs/2510.21726
Authors: Weichen Wang,Chengchun Shi
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:This paper is a discussion of the 2025 JASA discussion paper by Su et al. (2025). We would like to congratulate the authors on conducting a comprehensive and insightful empirical investigation of the 2023 ICML ranking data. The review quality of machine learning (ML) conferences has become a major concern in recent years, due to the rapidly growing number of submitted manuscripts. In this discussion, we propose an alternative approach to that of Su et al. (2025), one that leverages ranking information from reviewers rather than authors. We simulate review data that closely mimics the 2023 ICML conference submissions. Our results show that (i) incorporating ranking information from reviewers can significantly improve the evaluation of each paper’s quality, often outperforming the use of ranking information from authors alone; and (ii) combining ranking information from both reviewers and authors yields the most accurate evaluation of submitted papers in most scenarios.

[LG-137] Words to Waves: Emotion-Adaptive Music Recommendation System

Link: https://arxiv.org/abs/2510.21724
Authors: Apoorva Chavali,Reeve Menezes
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:Current recommendation systems often tend to overlook emotional context and rely on historical listening patterns or static mood tags. This paper introduces a novel music recommendation framework employing a variant of Wide and Deep Learning architecture that takes in real-time emotional states inferred directly from natural language as inputs and recommends songs that closely portray the mood. The system captures emotional contexts from user-provided textual descriptions by using transformer-based embeddings, which were finetuned to predict the emotional dimensions of valence-arousal. The deep component of the architecture utilizes these embeddings to generalize unseen emotional patterns, while the wide component effectively memorizes user-emotion and emotion-genre associations through cross-product features. Experimental results show that personalized music selections positively influence the user’s emotions and lead to a significant improvement in emotional relevance.

[LG-138] asLLR: LLM based Leads Ranking in Auto Sales

Link: https://arxiv.org/abs/2510.21713
Authors: Yin Sun,Yiwen Liu,Junjie Song,Chenyu Zhang,Xinyuan Zhang,Lingjie Liu,Siqi Chen,Yuji Cao
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Comments:

Click to view abstract

Abstract:In commercial auto sales systems, high-quality lead score sequencing determines the priority of a salesperson’s work and is essential for optimizing the efficiency of the sales system. Since CRM (Customer Relationship Management) systems contain plenty of textual interaction features between sales staff and customers, traditional techniques such as Click Through Rate (CTR) prediction struggle with processing the complex information inherent in natural language features, which limits their effectiveness in sales lead ranking. Bridging this gap is critical for enhancing business intelligence and decision-making. Recently, the emergence of large language models (LLMs) has opened new avenues for improving recommendation systems. This study introduces asLLR (LLM-based Leads Ranking in Auto Sales), which integrates CTR loss and Question Answering (QA) loss within a decoder-only large language model architecture. This integration enables the simultaneous modeling of both tabular and natural language features. To verify the efficacy of asLLR, we constructed an innovative dataset derived from the customer lead pool of a prominent new energy vehicle brand, with 300,000 training samples and 40,000 testing samples. Our experimental results demonstrate that asLLR effectively models intricate patterns in commercial datasets, achieving an AUC of 0.8127 and surpassing traditional CTR estimation methods by 0.0231. Moreover, when used to extract text features, asLLR improves CTR models by 0.0058. In real-world sales scenarios, after rigorous online A/B testing, asLLR increased the sales volume by about 9.5% compared to the traditional method, providing a valuable tool for business intelligence and operational decision-making.

[LG-139] Minimizing Human Intervention in Online Classification

Link: https://arxiv.org/abs/2510.23557
Authors: William Réveillard,Vasileios Saketos,Alexandre Proutiere,Richard Combes
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Comments: 49 pages, 8 figures

Click to view abstract

Abstract:We introduce and study an online problem arising in question answering systems. In this problem, an agent must sequentially classify user-submitted queries represented by $d$-dimensional embeddings drawn i.i.d. from an unknown distribution. The agent may consult a costly human expert for the correct label, or guess on her own without receiving feedback. The goal is to minimize regret against an oracle with free expert access. When the time horizon $T$ is at least exponential in the embedding dimension $d$, one can learn the geometry of the class regions: in this regime, we propose the Conservative Hull-based Classifier (CHC), which maintains convex hulls of expert-labeled queries and calls the expert as soon as a query lands outside all known hulls. CHC attains $\mathcal{O}(\log^d T)$ regret in $T$ and is minimax optimal for $d=1$. Otherwise, the geometry cannot be reliably learned without additional distributional assumptions. We show that when the queries are drawn from a subgaussian mixture, for $T \le e^d$, a Center-based Classifier (CC) achieves regret proportional to $N \log N$, where $N$ is the number of labels. To bridge these regimes, we introduce the Generalized Hull-based Classifier (GHC), a practical extension of CHC that allows for more aggressive guessing via a tunable threshold parameter. Our approach is validated with experiments, notably on real-world question-answering datasets using embeddings derived from state-of-the-art large language models.
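
The hull-based rule described in the abstract is easiest to picture in one dimension, where the convex hull of a set of labeled points is simply an interval. The sketch below is an illustrative reading of that rule for the one-dimensional case only, not the authors' implementation. The class name, the `expert` callback, and the toy data are hypothetical, and it assumes each class occupies its own disjoint interval so that hulls never overlap.

```python
class IntervalHullClassifier:
    """1D sketch: one interval (convex hull) per label seen so far."""

    def __init__(self):
        self.hulls = {}        # label -> (lo, hi) of expert-labeled queries
        self.expert_calls = 0  # number of costly expert consultations

    def classify(self, x, expert):
        # Guess without feedback when x lies inside a known hull.
        for label, (lo, hi) in self.hulls.items():
            if lo <= x <= hi:
                return label
        # Otherwise consult the expert and grow that label's hull.
        label = expert(x)
        self.expert_calls += 1
        lo, hi = self.hulls.get(label, (x, x))
        self.hulls[label] = (min(lo, x), max(hi, x))
        return label


def expert(x):
    # Hypothetical ground truth: two classes split at zero.
    return "neg" if x < 0 else "pos"


clf = IntervalHullClassifier()
queries = [-2.0, -1.0, -1.5, 1.0, 3.0, 2.0]
labels = [clf.classify(x, expert) for x in queries]
```

In this toy run, the third and sixth queries fall inside intervals already learned, so only four of the six queries reach the expert. The conservative nature of the rule is visible in the fact that a guess is made only when the query is enclosed by points the expert has already labeled.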

[LG-140] Direct Debiased Machine Learning via Bregman Divergence Minimization

Link: https://arxiv.org/abs/2510.23534
Authors: Masahiro Kato
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
Comments:

Click to view abstract

Abstract:We develop a direct debiased machine learning framework comprising Neyman targeted estimation and generalized Riesz regression. Our framework unifies Riesz regression for automatic debiased machine learning, covariate balancing, targeted maximum likelihood estimation (TMLE), and density-ratio estimation. In many problems involving causal effects or structural models, the parameters of interest depend on regression functions. Plugging regression functions estimated by machine learning methods into the identifying equations can yield poor performance because of first-stage bias. To reduce such bias, debiased machine learning employs Neyman orthogonal estimating equations. Debiased machine learning typically requires estimation of the Riesz representer and the regression function. For this problem, we develop a direct debiased machine learning framework with an end-to-end algorithm. We formulate estimation of the nuisance parameters, the regression function and the Riesz representer, as minimizing the discrepancy between Neyman orthogonal scores computed with known and unknown nuisance parameters, which we refer to as Neyman targeted estimation. Neyman targeted estimation includes Riesz representer estimation, and we measure discrepancies using the Bregman divergence. The Bregman divergence encompasses various loss functions as special cases, where the squared loss yields Riesz regression and the Kullback-Leibler divergence yields entropy balancing. We refer to this Riesz representer estimation as generalized Riesz regression. Neyman targeted estimation also yields TMLE as a special case for regression function estimation. Furthermore, for specific pairs of models and Riesz representer estimation methods, we can automatically obtain the covariate balancing property without explicitly solving the covariate balancing objective.

[LG-141] Quantum Phase Classification of Rydberg Atom Systems Using Resource-Efficient Variational Quantum Circuits and Classical Shadows

链接: https://arxiv.org/abs/2510.23489
作者: Hemish Ahuja,Samradh Bhardwaj,Kirti Dhir,Roman Bagdasarian,Ziwoong Jang
类目: Quantum Physics (quant-ph); Machine Learning (cs.LG)
*备注: 7 pages, 2 tables, and 3 figures. for associated code files, see this https URL

点击查看摘要

Abstract:Quantum phase transitions in Rydberg atom arrays present significant opportunities for studying many-body physics, yet distinguishing between different ordered phases without explicit order parameters remains challenging. We present a resource-efficient quantum machine learning approach combining classical shadow tomography with variational quantum circuits (VQCs) for binary phase classification of Z2 and Z3 ordered phases. Our pipeline processes 500 randomized measurements per 51-atom chain state, reconstructs shadow operators, performs PCA dimensionality reduction (514 features), and encodes features using angle embedding onto a 2-qubit parameterized circuit. The circuit employs RY-RZ angle encoding, strong entanglement via all-to-all CZ gates, and a minimal 2-parameter ansatz achieving depth 7. Training via simultaneous perturbation stochastic approximation (SPSA) with hinge loss converged in 120 iterations. The model achieved 100% test accuracy with perfect precision, recall, and F1 scores, demonstrating that minimal quantum resources suffice for high-accuracy phase classification. This work establishes pathways for quantum-enhanced condensed matter physics on near-term quantum devices.
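
To make the training procedure concrete, here is a minimal sketch of SPSA with a hinge loss. It is a purely classical toy: a linear classifier stands in for the variational circuit, and the data, gain constants `a` and `c`, and the function names are assumptions for illustration, not the authors' setup.

```python
import random


def hinge_loss(w, data):
    """Mean hinge loss of a linear classifier with weights w; labels are +1 or -1."""
    total = 0.0
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x))
        total += max(0.0, 1.0 - y * score)
    return total / len(data)


def spsa_step(w, data, rng, a=0.1, c=0.1):
    """One SPSA update: two loss evaluations, whatever the number of parameters."""
    # Perturb every parameter at once along a random +/-1 direction.
    delta = [rng.choice((-1.0, 1.0)) for _ in w]
    w_plus = [wi + c * d for wi, d in zip(w, delta)]
    w_minus = [wi - c * d for wi, d in zip(w, delta)]
    diff = hinge_loss(w_plus, data) - hinge_loss(w_minus, data)
    # Gradient estimate for coordinate i is diff / (2 * c * delta_i).
    return [wi - a * diff / (2.0 * c * d) for wi, d in zip(w, delta)]


# Hypothetical linearly separable toy data: (features, label).
toy_data = [([1.0, 0.0], 1), ([2.0, 0.5], 1), ([-1.0, 0.0], -1), ([-2.0, -0.5], -1)]

rng = random.Random(0)
weights = [0.0, 0.0]
initial_loss = hinge_loss(weights, toy_data)
for _ in range(120):
    weights = spsa_step(weights, toy_data, rng)
final_loss = hinge_loss(weights, toy_data)
```

SPSA suits circuit training because each update needs only two evaluations of the loss, independent of the parameter count.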

[LG-142] Tighter CMI-Based Generalization Bounds via Stochastic Projection and Quantization NEURIPS2025

链接: https://arxiv.org/abs/2510.23485
作者: Milad Sefidgaran,Kimia Nadjahi,Abdellatif Zaidi
类目: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
*备注: Accepted for oral presentation at NeurIPS 2025

点击查看摘要

Abstract:In this paper, we leverage stochastic projection and lossy compression to establish new conditional mutual information (CMI) bounds on the generalization error of statistical learning algorithms. It is shown that these bounds are generally tighter than the existing ones. In particular, we prove that for certain problem instances for which existing MI and CMI bounds were recently shown in Attias et al. [2024] and Livni [2023] to become vacuous or fail to describe the right generalization behavior, our bounds yield suitable generalization guarantees of the order of \mathcal{O}(1/\sqrt{n}) , where n is the size of the training dataset. Furthermore, we use our bounds to investigate the problem of data “memorization” raised in those works, which asserts that there are learning problem instances in which, for any learning algorithm with good prediction performance, there exist distributions under which the algorithm must “memorize” a big fraction of the training dataset. We show that for every learning algorithm, there exists an auxiliary algorithm that does not memorize and which yields comparable generalization error for any data distribution. In part, this shows that memorization is not necessary for good generalization.

[LG-143] Macroeconomic Forecasting for the G7 countries under Uncertainty Shocks

链接: https://arxiv.org/abs/2510.23347
作者: Shovon Sengupta,Sunny Kumar Singh,Tanujit Chakraborty
类目: Econometrics (econ.EM); Machine Learning (cs.LG); Applications (stat.AP)
*备注:

点击查看摘要

Abstract:Accurate macroeconomic forecasting has become harder amid geopolitical disruptions, policy reversals, and volatile financial markets. Conventional vector autoregressions (VARs) overfit in high-dimensional settings, while threshold VARs struggle with time-varying interdependencies and complex parameter structures. We address these limitations by extending the Sims-Zha Bayesian VAR with exogenous variables (SZBVARx) to incorporate domain-informed shrinkage and four newspaper-based uncertainty shocks: economic policy uncertainty, geopolitical risk, US equity market volatility, and US monetary policy uncertainty. The framework improves structural interpretability, mitigates dimensionality, and imposes empirically guided regularization. Using G7 data, we study spillovers from uncertainty shocks to five core variables (unemployment, real broad effective exchange rates, short-term rates, oil prices, and CPI inflation), combining wavelet coherence (time-frequency dynamics) with nonlinear local projections (state-dependent impulse responses). Out-of-sample results at 12- and 24-month horizons show that SZBVARx outperforms 14 benchmarks, including classical VARs and leading machine learning models, as confirmed by Murphy difference diagrams, multivariate Diebold-Mariano tests, and Giacomini-White predictability tests. Credible Bayesian prediction intervals deliver robust uncertainty quantification for scenario analysis and risk management. The proposed SZBVARx offers G7 policymakers a transparent, well-calibrated tool for modern macroeconomic forecasting under pervasive uncertainty.

[LG-144] The First Star-by-star N-body/Hydrodynamics Simulation of Our Galaxy Coupling with a Surrogate Model

链接: https://arxiv.org/abs/2510.23330
作者: Keiya Hirashima,Michiko S. Fujii,Takayuki R. Saitoh,Naoto Harada,Kentaro Nomura,Kohji Yoshikawa,Yutaka Hirai,Tetsuro Asano,Kana Moriwaki,Masaki Iwasawa,Takashi Okamoto,Junichiro Makino
类目: Astrophysics of Galaxies (astro-ph.GA); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
*备注: 12 pages, 7 figures, 7 tables, IEEE/ACM Supercomputing Conference (SC25)

点击查看摘要

Abstract:A major goal of computational astrophysics is to simulate the Milky Way Galaxy with sufficient resolution down to individual stars. However, the scaling fails due to some small-scale, short-timescale phenomena, such as supernova explosions. We have developed a novel integration scheme of N-body/hydrodynamics simulations working with machine learning. This approach bypasses the short timesteps caused by supernova explosions using a surrogate model, thereby improving scalability. With this method, we reached 300 billion particles using 148,900 nodes, equivalent to 7,147,200 CPU cores, breaking through the billion-particle barrier currently faced by state-of-the-art simulations. This resolution allows us to perform the first star-by-star galaxy simulation, which resolves individual stars in the Milky Way Galaxy. The performance scales over 10^4 CPU cores, an upper limit in the current state-of-the-art simulations using both A64FX and X86-64 processors and NVIDIA CUDA GPUs.

[LG-145] Provable test-time adaptivity and distributional robustness of in-context learning

链接: https://arxiv.org/abs/2510.23254
作者: Tianyi Ma,Tengyao Wang,Richard J. Samworth
类目: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
*备注: 44 pages

点击查看摘要

Abstract:We study in-context learning problems where a Transformer is pretrained on tasks drawn from a mixture distribution \pi=\sum_{\alpha\in\mathcal{A}} \lambda_\alpha \pi_\alpha , called the pretraining prior, in which each mixture component \pi_\alpha is a distribution on tasks of a specific difficulty level indexed by \alpha . Our goal is to understand the performance of the pretrained Transformer when evaluated on a different test distribution \mu , consisting of tasks of fixed difficulty \beta\in\mathcal{A} , and with potential distribution shift relative to \pi_\beta , subject to the chi-squared divergence \chi^2(\mu,\pi_\beta) being at most \kappa . In particular, we consider nonparametric regression problems with random smoothness, and multi-index models with random smoothness as well as random effective dimension. We prove that a large Transformer pretrained on sufficient data achieves the optimal rate of convergence corresponding to the difficulty level \beta , uniformly over test distributions \mu in the chi-squared divergence ball. Thus, the pretrained Transformer is able to achieve faster rates of convergence on easier tasks and is robust to distribution shift at test time. Finally, we prove that even if an estimator had access to the test distribution \mu , the convergence rate of its expected risk over \mu could not be faster than that of our pretrained Transformers, thereby providing a more appropriate optimality guarantee than minimax lower bounds.

[LG-146] Rate-optimal Design for Anytime Best Arm Identification

链接: https://arxiv.org/abs/2510.23199
作者: Junpei Komiyama,Kyoungseok Jang,Junya Honda
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:We consider the best arm identification problem, where the goal is to identify the arm with the highest mean reward from a set of K arms under a limited sampling budget. This problem models many practical scenarios such as A/B testing. We consider a class of algorithms for this problem, which is provably minimax optimal up to a constant factor. This idea is a generalization of existing works in fixed-budget best arm identification, which are limited to a particular choice of risk measures. Based on the framework, we propose Almost Tracking, a closed-form algorithm that has a provable guarantee on the popular risk measure H_1 . Unlike existing algorithms, Almost Tracking does not require the total budget in advance nor does it need to discard a significant part of samples, which gives a practical advantage. Through experiments on synthetic and real-world datasets, we show that our algorithm outperforms existing anytime algorithms as well as fixed-budget algorithms.

[LG-147] Physics-informed diffusion models for extrapolating crystal structures beyond known motifs

链接: https://arxiv.org/abs/2510.23181
作者: Andrij Vasylenko,Federico Ottomano,Christopher M. Collins,Rahul Savani,Matthew S. Dyer,Matthew J. Rosseinsky
类目: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Discovering materials with previously unreported crystal frameworks is key to achieving transformative functionality. Generative artificial intelligence offers a scalable means to propose candidate crystal structures; however, existing approaches mainly reproduce decorated variants of established motifs rather than uncover new configurations. Here we develop a physics-informed diffusion method, supported by a chemically grounded validation protocol, which embeds descriptors of compactness and local environment diversity to balance physical plausibility with structural novelty. Conditioning on these metrics improves generative performance across architectures, increasing the fraction of structures outside the 100 most common prototypes up to 67%. When crystal structure prediction (CSP) is seeded with generative structures, most candidates (97%) are reconstructed by CSP, yielding 145 (66%) low-energy frameworks not matching any known prototypes. These results show that while generative models are not substitutes for CSP, their chemically informed, diversity-guided outputs can enhance CSP efficiency, establishing a practical generative-CSP synergy for discovery-oriented exploration of chemical space.

[LG-148] Benchmarking VQE Configurations: Architectures, Initializations, and Optimizers for Silicon Ground State Energy

链接: https://arxiv.org/abs/2510.23171
作者: Zakaria Boutakka,Nouhaila Innan,Muhammed Shafique,Mohamed Bennai,Z. Sakhi
类目: Quantum Physics (quant-ph); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Quantum computing presents a promising path toward precise quantum chemical simulations, particularly for systems that challenge classical methods. This work investigates the performance of the Variational Quantum Eigensolver (VQE) in estimating the ground-state energy of the silicon atom, a relatively heavy element that poses significant computational complexity. Within a hybrid quantum-classical optimization framework, we implement VQE using a range of ansätze, including Double Excitation Gates, ParticleConservingU2, UCCSD, and k-UpCCGSD, combined with various optimizers such as gradient descent, SPSA, and ADAM. The main contribution of this work lies in a systematic methodological exploration of how these configuration choices interact to influence VQE performance, establishing a structured benchmark for selecting optimal settings in quantum chemical simulations. Key findings show that parameter initialization plays a decisive role in the algorithm’s stability, and that the combination of a chemically inspired ansatz with adaptive optimization yields superior convergence and precision compared to conventional approaches.

[LG-149] Complexity Dependent Error Rates for Physics-informed Statistical Learning via the Small-ball Method

链接: https://arxiv.org/abs/2510.23149
作者: Diego Marcondes
类目: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
*备注:

点击查看摘要

Abstract:Physics-informed statistical learning (PISL) integrates empirical data with physical knowledge to enhance the statistical performance of estimators. While PISL methods are widely used in practice, a comprehensive theoretical understanding of how informed regularization affects statistical properties is still missing. Specifically, two fundamental questions have yet to be fully addressed: (1) what is the trade-off between considering soft penalties versus hard constraints, and (2) what is the statistical gain of incorporating physical knowledge compared to purely data-driven empirical error minimisation. In this paper, we address these questions for PISL in convex classes of functions under physical knowledge expressed as linear equations by developing appropriate complexity dependent error rates based on the small-ball method. We show that, under suitable assumptions, (1) the error rates of physics-informed estimators are comparable to those of hard constrained empirical error minimisers, differing only by constant terms, and that (2) informed penalization can effectively reduce model complexity, akin to dimensionality reduction, thereby improving learning performance. This work establishes a theoretical framework for evaluating the statistical properties of physics-informed estimators in convex classes of functions, contributing to closing the gap between statistical theory and practical PISL, with potential applications to cases not yet explored in the literature.

[LG-150] Treble10: A high-quality dataset for far-field speech recognition, dereverberation, and enhancement

链接: https://arxiv.org/abs/2510.23141
作者: Sarabeth S. Mullins,Georg Götz,Eric Bezzam,Steven Zheng,Daniel Gert Nielsen
类目: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Accurate far-field speech datasets are critical for tasks such as automatic speech recognition (ASR), dereverberation, speech enhancement, and source separation. However, current datasets are limited by the trade-off between acoustic realism and scalability. Measured corpora provide faithful physics but are expensive, low-coverage, and rarely include paired clean and reverberant data. In contrast, most simulation-based datasets rely on simplified geometrical acoustics, thus failing to reproduce key physical phenomena like diffraction, scattering, and interference that govern sound propagation in complex environments. We introduce Treble10, a large-scale, physically accurate room-acoustic dataset. Treble10 contains over 3000 broadband room impulse responses (RIRs) simulated in 10 fully furnished real-world rooms, using a hybrid simulation paradigm implemented in the Treble SDK that combines a wave-based and geometrical acoustics solver. The dataset provides six complementary subsets, spanning mono, 8th-order Ambisonics, and 6-channel device RIRs, as well as pre-convolved reverberant speech scenes paired with LibriSpeech utterances. All signals are simulated at 32 kHz, accurately modelling low-frequency wave effects and high-frequency reflections. Treble10 bridges the realism gap between measurement and simulation, enabling reproducible, physically grounded evaluation and large-scale data augmentation for far-field speech tasks. The dataset is openly available via the Hugging Face Hub, and is intended as both a benchmark and a template for next-generation simulation-driven audio research.

[LG-151] Coupled Flow Matching

链接: https://arxiv.org/abs/2510.23015
作者: Wenxi Cai,Yuheng Wang,Naichen Shi
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:We introduce Coupled Flow Matching (CPFM), a framework that integrates controllable dimensionality reduction and high-fidelity reconstruction. CPFM learns coupled continuous flows for both the high-dimensional data x and the low-dimensional embedding y, which enables sampling p(y|x) via a latent-space flow and p(x|y) via a data-space flow. Unlike classical dimension-reduction methods, where information discarded during compression is often difficult to recover, CPFM preserves the knowledge of residual information within the weights of a flow network. This design provides bespoke controllability: users may decide which semantic factors to retain explicitly in the latent space, while the complementary information remains recoverable through the flow network. Coupled flow matching builds on two components: (i) an extended Gromov-Wasserstein optimal transport objective that establishes a probabilistic correspondence between data and embeddings, and (ii) a dual-conditional flow-matching network that extrapolates the correspondence to the underlying space. Experiments on multiple benchmarks show that CPFM yields semantically rich embeddings and reconstructs data with higher fidelity than existing baselines.

[LG-152] Analysis of accuracy and efficiency of neural networks to simulate Navier-Stokes fluid flows with obstacles

链接: https://arxiv.org/abs/2510.22976
作者: Rui Hespanha,Elliot McGuire,João Hespanha
类目: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Conventional fluid simulations can be time consuming and energy intensive. We researched the viability of a neural network for simulating incompressible fluids in a randomized obstacle-heavy environment, as an alternative to the numerical simulation of the Navier-Stokes equation. We hypothesized that the neural network predictions would have a relatively low error for simulations over a small number of time steps, but errors would eventually accumulate to the point that the output would become very noisy. Over a rich set of obstacle configurations, we achieved a root mean square error of 0.32% on our training dataset and 0.36% on a testing dataset. These errors grew only to 1.45% and 2.34% at timestep t = 10, and to 2.11% and 4.16% at timestep t = 20. We also found that our selected neural network was approximately 8,800 times faster at predicting the flow than a conventional simulation. Our findings indicate neural networks can be extremely useful at simulating fluids in obstacle-heavy environments. Useful applications include modeling forest fire smoke, pipe fluid flow, and underwater/flood currents.

[LG-153] AQCat25: Unlocking spin-aware high-fidelity machine learning potentials for heterogeneous catalysis

链接: https://arxiv.org/abs/2510.22938
作者: Omar Allam,Brook Wander,Aayush R. Singh
类目: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
*备注: 32 pages, 17 figures

点击查看摘要

Abstract:Large-scale datasets have enabled highly accurate machine learning interatomic potentials (MLIPs) for general-purpose heterogeneous catalysis modeling. There are, however, some limitations in what can be treated with these potentials because of gaps in the underlying training data. To extend these capabilities, we introduce AQCat25, a complementary dataset of 13.5 million density functional theory (DFT) single point calculations designed to improve the treatment of systems where spin polarization and/or higher fidelity are critical. We also investigate methodologies for integrating new datasets, such as AQCat25, with the broader Open Catalyst 2020 (OC20) dataset to create spin-aware models without sacrificing generalizability. We find that directly tuning a general model on AQCat25 leads to catastrophic forgetting of the original dataset’s knowledge. Conversely, joint training strategies prove effective for improving accuracy on the new data without sacrificing general performance. This joint approach introduces a challenge, as the model must learn from a dataset containing both mixed-fidelity calculations and mixed-physics (spin-polarized vs. unpolarized). We show that explicitly conditioning the model on this system-specific metadata, for example by using Feature-wise Linear Modulation (FiLM), successfully addresses this challenge and further enhances model accuracy. Ultimately, our work establishes an effective protocol for bridging DFT fidelity domains to advance the predictive power of foundational models in catalysis.
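
As a rough illustration of the conditioning idea, Feature-wise Linear Modulation scales and shifts each hidden feature using parameters computed from a conditioning input. The sketch below is a minimal pure-Python version; the metadata encoding (`meta`), the weight matrices, and all numbers are hypothetical placeholders, not values or architecture details from the paper.

```python
def linear(weights, bias, x):
    """Plain affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def film(features, gamma, beta):
    """Feature-wise linear modulation: scale, then shift, each feature channel."""
    return [g * h + b for h, g, b in zip(features, gamma, beta)]


# Hypothetical conditioning vector: [spin_polarized, high_fidelity] flags.
meta = [1.0, 0.0]

# Hypothetical (normally learned) parameters of the conditioning network.
W_gamma, b_gamma = [[0.5, 0.1], [-0.2, 0.3]], [1.0, 1.0]
W_beta, b_beta = [[0.0, 0.2], [0.4, 0.0]], [0.0, 0.0]

gamma = linear(W_gamma, b_gamma, meta)  # per-channel scale
beta = linear(W_beta, b_beta, meta)     # per-channel shift

hidden = [2.0, -1.0]                    # hidden features of some layer
modulated = film(hidden, gamma, beta)
```

Because only `gamma` and `beta` depend on the metadata, the same backbone weights can be shared across spin-polarized and unpolarized inputs.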

[LG-154] Clinic-Oriented Feasibility of a Sensor-Fused Wearable for Upper-Limb Function

链接: https://arxiv.org/abs/2510.22913
作者: Thanyanee Srichaisak,Arissa Ieochai,Aueaphum Aueawattthanaphisut
类目: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Robotics (cs.RO); Neurons and Cognition (q-bio.NC)
*备注: 19 pages, 7 figures, 5 Tables

点击查看摘要

Abstract:Background: Upper-limb weakness and tremor (4–12 Hz) limit activities of daily living (ADL) and reduce adherence to home rehabilitation. Objective: To assess technical feasibility and clinician-relevant signals of a sensor-fused wearable targeting the triceps brachii and extensor pollicis brevis. Methods: A lightweight node integrates surface EMG (1 kHz), IMU (100–200 Hz), and flex/force sensors with on-device INT8 inference (Tiny 1D-CNN/Transformer) and a safety-bounded assist policy (angle/torque/jerk limits; stall/time-out). Healthy adults (n = 12) performed three ADL-like tasks. Primary outcomes: Tremor Index (TI), range of motion (ROM), repetitions (Reps min^{-1}). Secondary: EMG median-frequency slope (fatigue trend), closed-loop latency, session completion, and device-related adverse events. Analyses used subject-level paired medians with BCa 95% CIs; exact Wilcoxon p-values are reported in the Results. Results: Assistance was associated with lower tremor prominence and improved task throughput: TI changed by -0.092 (95% CI [-0.102, -0.079]), ROM increased by +12.65% (95% CI [+8.43, +13.89]), and Reps rose by +2.99 min^{-1} (95% CI [+2.61, +3.35]). Median on-device latency was 8.7 ms at a 100 Hz loop rate; all sessions were completed with no device-related adverse events. Conclusions: Multimodal sensing with low-latency, safety-bounded assistance produced improved movement quality (TI \downarrow) and throughput (ROM, Reps \uparrow) in a pilot technical-feasibility setting, supporting progression to IRB-approved patient studies. Trial registration: Not applicable (pilot non-clinical).

[LG-155] A Free Probabilistic Framework for Denoising Diffusion Models: Entropy Transport and Reverse Processes

链接: https://arxiv.org/abs/2510.22778
作者: Swagatam Das
类目: Probability (math.PR); Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:

点击查看摘要

Abstract:This work develops a rigorous framework for diffusion-based generative modeling in the setting of free probability. We extend classical denoising diffusion probabilistic models to free diffusion processes – stochastic dynamics acting on noncommutative random variables whose spectral measures evolve by free additive convolution. The forward dynamics satisfy a free Fokker–Planck equation that increases Voiculescu’s free entropy and dissipates free Fisher information, providing a noncommutative analogue of the classical de Bruijn identity. Using tools from free stochastic analysis, including a free Malliavin calculus and a Clark–Ocone representation, we derive the reverse-time stochastic differential equation driven by the conjugate variable, the free analogue of the score function. We further develop a variational formulation of these flows in the free Wasserstein space, showing that the resulting gradient-flow structure converges to the semicircular equilibrium law. Together, these results connect modern diffusion models with the information geometry of free entropy and establish a mathematical foundation for generative modeling with operator-valued or high-dimensional structured data.

[LG-156] OEUVRE: OnlinE Unbiased Variance-Reduced loss Estimation

链接: https://arxiv.org/abs/2510.22744
作者: Kanad Pardeshi,Bryan Wilder,Aarti Singh
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Online learning algorithms continually update their models as data arrive, making it essential to accurately estimate the expected loss at the current time step. The prequential method is an effective estimation approach which can be practically deployed in various ways. However, theoretical guarantees have previously been established under strong conditions on the algorithm, and practical algorithms have hyperparameters which require careful tuning. We introduce OEUVRE, an estimator that evaluates each incoming sample on the function learned at the current and previous time steps, recursively updating the loss estimate in constant time and memory. We use algorithmic stability, a property satisfied by many popular online learners, for optimal updates and prove consistency, convergence rates, and concentration bounds for our estimator. We design a method to adaptively tune OEUVRE’s hyperparameters and test it across diverse online and stochastic tasks. We observe that OEUVRE matches or outperforms other estimators even when their hyperparameters are tuned with oracle access to ground truth.
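
For orientation, the plain prequential ("test-then-train") baseline that the abstract refers to can be sketched as follows. This is not OEUVRE itself: the learner (a scalar predictor trained by online gradient descent on squared loss), the learning rate, and the data stream are assumptions chosen only to show the evaluate-before-update pattern.

```python
def prequential_loss(stream, lr=0.1):
    """Score each incoming sample with the current model, then update the model."""
    w = 0.0                 # scalar prediction of the online learner
    total = 0.0
    running_average = []
    for t, y in enumerate(stream, start=1):
        loss = (w - y) ** 2              # test on the sample before learning from it
        total += loss
        running_average.append(total / t)
        w -= lr * 2.0 * (w - y)          # gradient step on the squared loss
    return w, running_average


# Hypothetical stream with a constant target.
final_w, estimates = prequential_loss([1.0] * 50)
```

The running average mixes in losses of early, poorly trained models, so it can lag behind the loss of the current model; that gap is the kind of issue a variance-reduced estimator for the current time step is meant to address.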

[LG-157] Scalable Neural Decoders for Practical Real-Time Quantum Error Correction

链接: https://arxiv.org/abs/2510.22724
作者: Changwon Lee,Tak Hur,Daniel K. Park
类目: Quantum Physics (quant-ph); Machine Learning (cs.LG)
*备注: 10 pages, 5 figures

点击查看摘要

Abstract:Real-time, scalable, and accurate decoding is a critical component for realizing a fault-tolerant quantum computer. While Transformer-based neural decoders such as AlphaQubit have demonstrated high accuracy, the computational complexity of their core attention mechanism, which scales as \mathcal{O}(d^4) with code distance d , results in decoding speeds insufficient for practical real-time applications. In this work, we introduce and evaluate a Mamba-based decoder, a state-space model with \mathcal{O}(d^2) complexity. In memory experiments using Sycamore hardware data, our Mamba decoder matches the performance of its Transformer-based counterpart, showing that its superior efficiency does not come at the cost of performance. Crucially, in simulated real-time scenarios that account for decoder-induced noise, the Mamba decoder significantly outperforms the Transformer, exhibiting a higher error threshold of 0.0104 compared to 0.0097 . These results demonstrate that Mamba decoders offer a compelling balance between speed and accuracy, making them a promising architecture for scalable, real-time quantum error correction.

[LG-158] Block Coordinate Descent for Neural Networks Provably Finds Global Minima

链接: https://arxiv.org/abs/2510.22667
作者: Shunta Akiyama
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注: 32 pages, 4 figures

点击查看摘要

Abstract:In this paper, we consider a block coordinate descent (BCD) algorithm for training deep neural networks and provide a new global convergence guarantee under strictly monotonically increasing activation functions. While existing works demonstrate convergence to stationary points for BCD in neural networks, our contribution is the first to prove convergence to global minima, ensuring arbitrarily small loss. We show that the loss with respect to the output layer decreases exponentially while the loss with respect to the hidden layers remains well-controlled. Additionally, we derive generalization bounds using the Rademacher complexity framework, demonstrating that BCD not only achieves strong optimization guarantees but also provides favorable generalization performance. Moreover, we propose a modified BCD algorithm with skip connections and non-negative projection, extending our convergence guarantees to the ReLU activation, which is not strictly monotonic. Empirical experiments confirm our theoretical findings, showing that the BCD algorithm achieves a small loss for strictly monotonic and ReLU activations.

[LG-159] Semi-Supervised Learning under General Causal Models

链接: https://arxiv.org/abs/2510.22567
作者: Archer Moore,Heejung Shim,Jingge Zhu,Mingming Gong
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Semi-supervised learning (SSL) aims to train a machine learning model using both labelled and unlabelled data. While the unlabelled data have been used in various ways to improve the prediction accuracy, the reason why unlabelled data could help is not fully understood. One interesting and promising direction is to understand SSL from a causal perspective. In light of the independent causal mechanisms principle, the unlabelled data can be helpful when the label causes the features but not vice versa. However, the causal relations between the features and labels can be complex in real-world applications. In this paper, we propose an SSL framework that works with general causal models in which the variables have flexible causal relations. More specifically, we explore the causal graph structures and design corresponding causal generative models which can be learned with the help of unlabelled data. The learned causal generative model can generate synthetic labelled data for training a more accurate predictive model. We verify the effectiveness of our proposed method by empirical studies on both simulated and real data.

[LG-160] Statistical Analysis of the Sinkhorn Iterations for Two-Sample Schrödinger Bridge Estimation

链接: https://arxiv.org/abs/2510.22560
作者: Ibuki Maeda,Rentian Yao,Atsushi Nitanda
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注: 30 pages

点击查看摘要

Abstract:The Schrödinger bridge problem seeks the optimal stochastic process that connects two given probability distributions with minimal energy modification. While the Sinkhorn algorithm is widely used to solve the static optimal transport problem, a recent work (Pooladian and Niles-Weed, 2024) proposed the Sinkhorn bridge, which estimates Schrödinger bridges by plugging optimal transport into the time-dependent drifts of SDEs, with statistical guarantees in the one-sample estimation setting where the true source distribution is fully accessible. In this work, to further justify this method, we study the statistical performance of intermediate Sinkhorn iterations in the two-sample estimation setting, where only finite samples from both source and target distributions are available. Specifically, we establish a statistical bound on the squared total variation error of Sinkhorn bridge iterations: O(1/m + 1/n + r^{4k}) for some r \in (0,1) , where m and n are the sample sizes from the source and target distributions, respectively, and k is the number of Sinkhorn iterations. This result provides a theoretical guarantee for the finite-sample performance of the Schrödinger bridge estimator and offers practical guidance for selecting sample sizes and the number of Sinkhorn iterations. Notably, our theoretical results apply to several representative methods such as [SF]^2M, DSBM-IMF, BM2, and LightSB(-M) under specific settings, through the previously unnoticed connection between these estimators.
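
For readers unfamiliar with the iteration being analysed, the following is a minimal sketch of plain Sinkhorn scaling between two empirical samples, where `iters` plays the role of the iteration count k. It computes only the static entropic transport plan, not the Sinkhorn bridge drift estimator; the one-dimensional samples, the squared-distance cost, and the regularization `eps` are assumptions for illustration.

```python
import math


def sinkhorn(xs, ys, eps=1.0, iters=200):
    """Entropic optimal transport plan between two 1-D samples with uniform weights."""
    m, n = len(xs), len(ys)
    a = [1.0 / m] * m                      # source weights
    b = [1.0 / n] * n                      # target weights
    # Gibbs kernel K_ij = exp(-|x_i - y_j|^2 / eps).
    K = [[math.exp(-((x - y) ** 2) / eps) for y in ys] for x in xs]
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        # Rescale rows so the plan matches the source marginal a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(m)]
        # Rescale columns so the plan matches the target marginal b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(m)) for j in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(m)]


# Hypothetical samples: m = 3 source points, n = 2 target points.
plan = sinkhorn([0.0, 1.0, 2.0], [0.5, 1.5])
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(3)) for j in range(2)]
```

After the final column rescaling the column sums match the target weights up to rounding, while the row sums match the source weights only approximately, with an error that shrinks as the number of iterations grows.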

[LG-161] qc-kmeans: A Quantum Compressive K-Means Algorithm for NISQ Devices

Link: https://arxiv.org/abs/2510.22540
Authors: Pedro Chumpitaz-Flores,My Duong,Ying Mao,Kaixun Hua
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
Comments: 10 pages, 3 figures, accepted to 2025 IEEE International Conference on Big Data (IEEE BigData 2025)

Abstract:Clustering on NISQ hardware is constrained by data loading and limited qubits. We present qc-kmeans, a hybrid compressive $k$-means that summarizes a dataset with a constant-size Fourier-feature sketch and selects centroids by solving small per-group QUBOs with shallow QAOA circuits. The QFF sketch estimator is unbiased with mean-squared error $O(\varepsilon^2)$ for $B, S = \Theta(\varepsilon^{-2})$, and the peak-qubit requirement $q_{\text{peak}} = \max\{D, \lceil \log_2 B \rceil + 1\}$ does not scale with the number of samples. A refinement step with elitist retention ensures non-increasing surrogate cost. In Qiskit Aer simulations (depth $p=1$), the method ran with $\le 9$ qubits on low-dimensional synthetic benchmarks and achieved competitive sum-of-squared errors relative to quantum baselines; runtimes are not directly comparable. On nine real datasets (up to $4.3 \times 10^5$ points), the pipeline maintained constant peak-qubit usage in simulation. Under IBM noise models, accuracy was similar to the idealized setting. Overall, qc-kmeans offers a NISQ-oriented formulation with shallow, bounded-width circuits and competitive clustering quality in simulation.

[LG-162] Multi-Modal Masked Autoencoders for Learning Image-Spectrum Associations for Galaxy Evolution and Cosmology NEURIPS2025

Link: https://arxiv.org/abs/2510.22527
Authors: Morgan Himes,Samiksha Krishnamurthy,Andrew Lizarraga,Srinath Saikrishnan,Vikram Seenivasan,Jonathan Soriano,Ying Nian Wu,Tuan Do
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Astrophysics of Galaxies (astro-ph.GA); Machine Learning (cs.LG)
Comments: 8 pages, 3 figures, 1 table, accepted to NeurIPS 2025 Workshop ML4PS

Abstract:Upcoming surveys will produce billions of galaxy images but comparatively few spectra, motivating models that learn cross-modal representations. We build a dataset of 134,533 galaxy images (HSC-PDR2) and spectra (DESI-DR1) and adapt a Multi-Modal Masked Autoencoder (MMAE) to embed both images and spectra in a shared representation. The MMAE is a transformer-based architecture, which we train by masking 75% of the data and reconstructing missing image and spectral tokens. We use this model to test three applications: spectral and image reconstruction from heavily masked data and redshift regression from images alone. It recovers key physical features, such as galaxy shapes, atomic emission line peaks, and broad continuum slopes, though it struggles with fine image details and line strengths. For redshift regression, the MMAE performs comparably to or better than prior multi-modal models in terms of prediction scatter even when missing spectra in testing. These results highlight both the potential and limitations of masked autoencoders in astrophysics and motivate extensions to additional modalities, such as text, for foundation models.

[LG-163] Semi-supervised Vertex Hunting with Applications in Network and Text Analysis

Link: https://arxiv.org/abs/2510.22526
Authors: Yicong Jiang,Zheng Tracy Ke
Subjects: Methodology (stat.ME); Machine Learning (cs.LG)

Abstract:Vertex hunting (VH) is the task of estimating a simplex from noisy data points and has many applications in areas such as network and text analysis. We introduce a new variant, semi-supervised vertex hunting (SSVH), in which partial information is available in the form of barycentric coordinates for some data points, known only up to an unknown transformation. To address this problem, we develop a method that leverages properties of orthogonal projection matrices, drawing on novel insights from linear algebra. We establish theoretical error bounds for our method and demonstrate that it achieves a faster convergence rate than existing unsupervised VH algorithms. Finally, we apply SSVH to two practical settings, semi-supervised network mixed membership estimation and semi-supervised topic modeling, resulting in efficient and scalable algorithms.

[LG-164] Confidence Sets for Multidimensional Scaling

Link: https://arxiv.org/abs/2510.22452
Authors: Siddharth Vishwanath,Ery Arias-Castro
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments: 62 pages, 5 figures

Abstract:We develop a formal statistical framework for classical multidimensional scaling (CMDS) applied to noisy dissimilarity data. We establish distributional convergence results for the embeddings produced by CMDS for various noise models, which enable the construction of bona fide uniform confidence sets for the latent configuration, up to rigid transformations. We further propose bootstrap procedures for constructing these confidence sets and provide theoretical guarantees for their validity. We find that the multiplier bootstrap adapts automatically to heteroscedastic noise such as multiplicative noise, while the empirical bootstrap seems to require homoscedasticity. Either form of bootstrap, when valid, is shown to substantially improve finite-sample accuracy. The empirical performance of the proposed methods is demonstrated through numerical experiments.

[LG-165] Reinforcement learning-guided optimization of critical current in high-temperature superconductors

Link: https://arxiv.org/abs/2510.22424
Authors: Mouyang Cheng,Qiwei Wan,Bowen Yu,Eunbi Rha,Michael J Landry,Mingda Li
Subjects: Materials Science (cond-mat.mtrl-sci); Superconductivity (cond-mat.supr-con); Machine Learning (cs.LG)
Comments: 7 pages, 4 figures

Abstract:High-temperature superconductors are essential for next-generation energy and quantum technologies, yet their performance is often limited by the critical current density ($J_c$), which is strongly influenced by microstructural defects. Optimizing $J_c$ through defect engineering is challenging due to the complex interplay of defect type, density, and spatial correlation. Here we present an integrated workflow that combines reinforcement learning (RL) with time-dependent Ginzburg-Landau (TDGL) simulations to autonomously identify optimal defect configurations that maximize $J_c$. In our framework, TDGL simulations generate current-voltage characteristics to evaluate $J_c$, which serves as the reward signal that guides the RL agent to iteratively refine defect configurations. We find that the agent discovers optimal defect densities and correlations in two-dimensional thin-film geometries, enhancing vortex pinning and $J_c$ relative to the pristine thin film, approaching 60% of the theoretical depairing limit with up to 15-fold enhancement compared to random initialization. This RL-driven approach provides a scalable strategy for defect engineering, with broad implications for advancing HTS applications in fusion magnets, particle accelerators, and other high-field technologies.

[LG-166] Extragradient Method for $(L_0, L_1)$-Lipschitz Root-finding Problems NEURIPS2025

Link: https://arxiv.org/abs/2510.22421
Authors: Sayantan Choudhury,Nicolas Loizou
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
Comments: Published in NeurIPS 2025, 44 pages, 6 Figures

Abstract:Introduced by Korpelevich in 1976, the extragradient method (EG) has become a cornerstone technique for solving min-max optimization, root-finding problems, and variational inequalities (VIs). Despite its longstanding presence and significant attention within the optimization community, most works focusing on understanding its convergence guarantees assume the strong $L$-Lipschitz condition. In this work, building on the proposed assumptions by Zhang et al. [2024b] for minimization and Vankov et al. [2024] for VIs, we focus on the more relaxed $\alpha$-symmetric $(L_0, L_1)$-Lipschitz condition. This condition generalizes the standard Lipschitz assumption by allowing the Lipschitz constant to scale with the operator norm, providing a more refined characterization of problem structures in modern machine learning. Under the $\alpha$-symmetric $(L_0, L_1)$-Lipschitz condition, we propose a novel step size strategy for EG to solve root-finding problems and establish sublinear convergence rates for monotone operators and linear convergence rates for strongly monotone operators. Additionally, we prove local convergence guarantees for weak Minty operators. We supplement our analysis with experiments validating our theory and demonstrating the effectiveness and robustness of the proposed step sizes for EG.
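
As a point of reference for the extragradient update discussed in this abstract, here is a minimal sketch of classical EG with a constant step size on a toy bilinear problem. It does not implement the paper's proposed step size strategy; the operator `bilinear_operator`, the step value 0.5, and the iteration count are illustrative assumptions.

```python
def extragradient(F, z0, step=0.5, num_iters=200):
    """Classical extragradient with a constant step size (an assumption;
    the paper proposes a different, adaptive choice)."""
    z = list(z0)
    for _ in range(num_iters):
        # Extrapolation step: look ahead using the operator at z
        g = F(z)
        z_half = [zi - step * gi for zi, gi in zip(z, g)]
        # Update step: move from z using the operator at the look-ahead point
        g_half = F(z_half)
        z = [zi - step * gi for zi, gi in zip(z, g_half)]
    return z

def bilinear_operator(z):
    """Operator of min_x max_y x*y, i.e. F(x, y) = (y, -x); its root is (0, 0)."""
    x, y = z
    return [y, -x]

z_final = extragradient(bilinear_operator, [1.0, 1.0])
```

On this problem plain simultaneous gradient descent-ascent spirals away from the root, whereas the look-ahead step makes each EG iteration shrink the distance to (0, 0).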

[LG-167] Beyond Isotonization: Scalable Non-Crossing Quantile Estimation via Neural Networks for Student Growth Percentiles

Link: https://arxiv.org/abs/2510.22419
Authors: Kaihua Chang
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Comments: 15 pages, 2 tables, 1 code listing

Abstract:Student Growth Percentiles (SGPs), widely adopted across U.S. state assessment systems, employ independent quantile regression followed by post-hoc correction using an isotonic projection method (isotonize=TRUE in the SGP R package) to address quantile crossing. We demonstrate this approach contains a fundamental methodological inconsistency: interpolation between independently-estimated, potentially crossed quantiles requires monotonicity, yet the post-hoc correction alters estimates in ways that may violate the quantile property $P(Y \leq \hat{Q}_\tau(Y|X) \mid X) = \tau$. We term this the interpolation paradox. While theoretically sound constrained joint quantile regression (CJQR) eliminates crossing by enforcing non-crossing constraints during optimization, we analyze its computational complexity (often scaling poorly, e.g., $\mathcal{O}((qn)^3)$ for standard LP solvers) rendering it intractable for large-scale educational data ($n > 100{,}000$). We examine the SGP package’s switch to the Frisch-Newton interior point method (rq.method.for.large.n="fn") for large $N$, noting that while efficient for independent QR, it doesn’t resolve the joint problem’s complexity or the paradox. We propose neural network-based multi-quantile regression (NNQR) with shared hidden layers as a practical alternative. Leveraging the convexity of the composite pinball loss, SGD-based optimization used in NN training can reliably approach the global optimum, offering scalability ($O(n)$) and implicitly reducing crossing. Our empirical analysis shows independent QR yields crossing, while both CJQR and NNQR enforce monotonicity. NNQR emerges as a viable, scalable alternative for operational SGP systems, aligning theoretical validity with computational feasibility.
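
To make the two quantities in this abstract concrete, the sketch below computes a composite pinball loss over several quantile levels and counts samples whose predicted quantiles cross. It illustrates the standard definitions only, not the paper's NNQR model; the function names and the toy numbers are assumptions.

```python
def pinball_loss(y, q, tau):
    """Pinball (quantile) loss of prediction q for target y at level tau."""
    diff = y - q
    return max(tau * diff, (tau - 1.0) * diff)

def composite_pinball_loss(ys, preds, taus):
    """Mean pinball loss over all samples and quantile levels.

    preds[i][k] is the predicted taus[k]-quantile for sample i.
    """
    total = 0.0
    for y, row in zip(ys, preds):
        for q, tau in zip(row, taus):
            total += pinball_loss(y, q, tau)
    return total / (len(ys) * len(taus))

def count_crossings(preds):
    """Number of samples where a lower quantile exceeds a higher one."""
    return sum(any(lo > hi for lo, hi in zip(row, row[1:])) for row in preds)

taus = [0.1, 0.5, 0.9]
ys = [1.0, 2.0]
# Second row is deliberately crossed: the 0.1-quantile exceeds the median.
preds = [[0.5, 1.0, 1.5], [2.5, 2.0, 3.0]]
loss = composite_pinball_loss(ys, preds, taus)
crossed = count_crossings(preds)
```

A crossing check of this kind is what distinguishes independently fitted quantiles from jointly constrained ones: the loss alone does not penalize a crossed row.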

[LG-168] MetaCaDI: A Meta-Learning Framework for Scalable Causal Discovery with Unknown Interventions

Link: https://arxiv.org/abs/2510.22298
Authors: Hans Jarett Ong,Yoichi Chikahara,Tomoharu Iwata
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Comments: 8 pages, 2 figures

Abstract:Uncovering the underlying causal mechanisms of complex real-world systems remains a significant challenge, as these systems often entail high data collection costs and involve unknown interventions. We introduce MetaCaDI, the first framework to cast the joint discovery of a causal graph and unknown interventions as a meta-learning problem. MetaCaDI is a Bayesian framework that learns a shared causal graph structure across multiple experiments and is optimized to rapidly adapt to new, few-shot intervention target prediction tasks. A key innovation is our model’s analytical adaptation, which uses a closed-form solution to bypass expensive and potentially unstable gradient-based bilevel optimization. Extensive experiments on synthetic and complex gene expression data demonstrate that MetaCaDI significantly outperforms state-of-the-art methods. It excels at both causal graph recovery and identifying intervention targets from as few as 10 data instances, proving its robustness in data-scarce scenarios.

[LG-169] Synthetic-to-Real Transfer Learning for Chromatin-Sensitive PWS Microscopy

Link: https://arxiv.org/abs/2510.22239
Authors: Jahidul Arafat,Sanjaya Poudel
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Comments: 24 pages, 5 figures and 4 tables

Abstract:Chromatin sensitive partial wave spectroscopic (csPWS) microscopy enables label free detection of nanoscale chromatin packing alterations that occur before visible cellular transformation. However, manual nuclear segmentation limits population scale analysis needed for biomarker discovery in early cancer detection. The lack of annotated csPWS imaging data prevents direct use of standard deep learning methods. We present CFU Net, a hierarchical segmentation architecture trained with a three stage curriculum on synthetic multimodal data. CFU Net achieves near perfect performance on held out synthetic test data that represent diverse spectroscopic imaging conditions without manual annotations (Dice 0.9879, IoU 0.9895). Our approach uses physics based rendering that incorporates empirically supported chromatin packing statistics, Mie scattering models, and modality specific noise, combined with a curriculum that progresses from adversarial RGB pretraining to spectroscopic fine tuning and histology validation. CFU Net integrates five architectural elements (ConvNeXt backbone, Feature Pyramid Network, UNet plus plus dense connections, dual attention, and deep supervision) that together improve Dice over a baseline UNet by 8.3 percent. We demonstrate deployment ready INT8 quantization with 74.9 percent compression and 0.15 second inference, giving a 240 times throughput gain over manual analysis. Applied to more than ten thousand automatically segmented nuclei from synthetic test data, the pipeline extracts chromatin biomarkers that distinguish normal from pre cancerous tissue with large effect sizes (Cohens d between 1.31 and 2.98), reaching 94 percent classification accuracy. This work provides a general framework for synthetic to real transfer learning in specialized microscopy and open resources for community validation on clinical specimens.

[LG-170] Bridging the Perceptual-Statistical Gap in Dysarthria Assessment: Why Machine Learning Still Falls Short

Link: https://arxiv.org/abs/2510.22237
Authors: Krishna Gurugubelli
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)

Abstract:Automated dysarthria detection and severity assessment from speech have attracted significant research attention due to their potential clinical impact. Despite rapid progress in acoustic modeling and deep learning, models still fall short of human expert performance. This manuscript provides a comprehensive analysis of the reasons behind this gap, emphasizing a conceptual divergence we term the "perceptual-statistical gap". We detail human expert perceptual processes, survey machine learning representations and methods, review existing literature on feature sets and modeling strategies, and present a theoretical analysis of limits imposed by label noise and inter-rater variability. We further outline practical strategies to narrow the gap: perceptually motivated features, self-supervised pretraining, ASR-informed objectives, multimodal fusion, human-in-the-loop training, and explainability methods. Finally, we propose experimental protocols and evaluation metrics aligned with clinical goals to guide future research toward clinically reliable and interpretable dysarthria assessment tools.

[LG-171] HPC-Driven Modeling with ML-Based Surrogates for Magnon-Photon Dynamics in Hybrid Quantum Systems

Link: https://arxiv.org/abs/2510.22221
Authors: Jialin Song,Yingheng Tang,Pu Ren,Shintaro Takayoshi,Saurabh Sawant,Yujie Zhu,Jia-Mian Hu,Andy Nonaka,Michael W. Mahoney,Benjamin Erichson,Zhi (Jackie) Yao
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)

Abstract:Simulating hybrid magnonic quantum systems remains a challenge due to the large disparity between the timescales of the two systems. We present a massively parallel GPU-based simulation framework that enables fully coupled, large-scale modeling of on-chip magnon-photon circuits. Our approach resolves the dynamic interaction between ferromagnetic and electromagnetic fields with high spatiotemporal fidelity. To accelerate design workflows, we develop a physics-informed machine learning surrogate trained on the simulation data, reducing computational cost while maintaining accuracy. This combined approach reveals real-time energy exchange dynamics and reproduces key phenomena such as anti-crossing behavior and the suppression of ferromagnetic resonance under strong electromagnetic fields. By addressing the multiscale and multiphysics challenges in magnon-photon modeling, our framework enables scalable simulation and rapid prototyping of next-generation quantum and spintronic devices.

[LG-172] MMbeddings: Parameter-Efficient Low-Overfitting Probabilistic Embeddings Inspired by Nonlinear Mixed Models

Link: https://arxiv.org/abs/2510.22198
Authors: Giora Simchoni,Saharon Rosset
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Abstract:We present MMbeddings, a probabilistic embedding approach that reinterprets categorical embeddings through the lens of nonlinear mixed models, effectively bridging classical statistical theory with modern deep learning. By treating embeddings as latent random effects within a variational autoencoder framework, our method substantially decreases the number of parameters – from the conventional embedding approach of cardinality $\times$ embedding dimension, which quickly becomes infeasible with large cardinalities, to a significantly smaller, cardinality-independent number determined primarily by the encoder architecture. This reduction dramatically mitigates overfitting and computational burden in high-cardinality settings. Extensive experiments on simulated and real datasets, encompassing collaborative filtering and tabular regression tasks using varied architectures, demonstrate that MMbeddings consistently outperforms traditional embeddings, underscoring its potential across diverse machine learning applications.

[LG-173] RGC: a radio AGN classifier based on deep learning. I. A semi-supervised model for the VLA images of bent radio AGNs

Link: https://arxiv.org/abs/2510.22190
Authors: M.S. Hossain(1),M.S.H. Shahal(2 and 3),A. Khan(1 and 2),K.M.B. Asad(2 and 4),P. Saikia(5),F. Akter(6),A. Ali(1 and 3),M.A. Amin(1 and 3),A. Momen(1 and 2 and 4),M. Hasan(3),A.K.M.M. Rahman(1 and 3) ((1) Center for Computational and Data Sciences, Independent University, Bangladesh, (2) Center for Astronomy, Space Science and Astrophysics, Independent University, Bangladesh, (3) Department of Computer Science and Engineering, Independent University, Bangladesh, (4) Department of Physical Sciences, Independent University, Bangladesh, (5) Department of Astronomy and Physics, Yale University, USA, (6) Department of Agricultural and Biosystems Engineering, North Dakota State University, USA)
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Machine Learning (cs.LG)
Comments: 12 pages, 7 pages appendix, 6 figures, submitted to AA

Abstract:Wide-angle tail (WAT) and narrow-angle tail (NAT) radio active galactic nuclei (RAGNs) are key tracers of dense environments in galaxy groups and clusters, yet no machine-learning classifier of bent RAGNs has been trained using both unlabeled data and purely visually inspected labels. We release the RGC Python package, which includes two newly preprocessed labeled datasets of 639 WATs and NATs derived from a publicly available catalog of visually inspected sources, along with a semi-supervised RGC model that leverages 20,000 unlabeled RAGNs. The two labeled datasets in RGC were preprocessed using PyBDSF which retains spurious sources, and Photutils which removes them. The RGC model integrates the self-supervised framework BYOL (Bootstrap YOur Latent) with the supervised E2CNN (E2-equivariant Convolutional Neural Network) to form a semi-supervised binary classifier. The RGC model, when trained and evaluated on a dataset devoid of spurious sources, reaches peak performance, attaining an accuracy of 88.88% along with F1-scores of 0.90 for WATs and 0.85 for NATs. The model’s attention patterns amid class imbalance suggest that this work can serve as a stepping stone toward developing physics-informed foundation models capable of identifying a broad range of AGN physical properties.

[LG-174] Differentially Private High-dimensional Variable Selection via Integer Programming NEURIPS2025

Link: https://arxiv.org/abs/2510.22062
Authors: Petros Prastakos,Kayhan Behdin,Rahul Mazumder
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Comments: NeurIPS 2025

Abstract:Sparse variable selection improves interpretability and generalization in high-dimensional learning by selecting a small subset of informative features. Recent advances in Mixed Integer Programming (MIP) have enabled solving large-scale non-private sparse regression - known as Best Subset Selection (BSS) - with millions of variables in minutes. However, extending these algorithmic advances to the setting of Differential Privacy (DP) has remained largely unexplored. In this paper, we introduce two new pure differentially private estimators for sparse variable selection, leveraging modern MIP techniques. Our framework is general and applies broadly to problems like sparse regression or classification, and we provide theoretical support recovery guarantees in the case of BSS. Inspired by the exponential mechanism, we develop structured sampling procedures that efficiently explore the non-convex objective landscape, avoiding the exhaustive combinatorial search in the exponential mechanism. We complement our theoretical findings with extensive numerical experiments, using both least squares and hinge loss for our objective function, and demonstrate that our methods achieve state-of-the-art empirical support recovery, outperforming competing algorithms in settings with up to $p = 10^4$.

[LG-175] Input Adaptive Bayesian Model Averaging

Link: https://arxiv.org/abs/2510.22054
Authors: Yuli Slavutsky,Sebastian Salazar,David M. Blei
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Abstract:This paper studies prediction with multiple candidate models, where the goal is to combine their outputs. This task is especially challenging in heterogeneous settings, where different models may be better suited to different inputs. We propose input adaptive Bayesian Model Averaging (IA-BMA), a Bayesian method that assigns model weights conditional on the input. IA-BMA employs an input adaptive prior, and yields a posterior distribution that adapts to each prediction, which we estimate with amortized variational inference. We derive formal guarantees for its performance, relative to any single predictor selected per input. We evaluate IA-BMA across regression and classification tasks, studying data from personalized cancer treatment, credit-card fraud detection, and UCI datasets. IA-BMA consistently delivers more accurate and better-calibrated predictions than both non-adaptive baselines and existing adaptive methods.

[LG-176] Adaptive Split-MMD Training for Small-Sample Cross-Dataset P300 EEG Classification

Link: https://arxiv.org/abs/2510.21969
Authors: Weiyu Chen,Arnaud Delorme
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Comments: 8 pages, 5 figures. Submitted to IEEE BIBM 2025 Workshop on Machine Learning for EEG Signal Processing (MLESP)

Abstract:Detecting single-trial P300 from EEG is difficult when only a few labeled trials are available. When attempting to boost a small target set with a large source dataset through transfer learning, cross-dataset shift arises. To address this challenge, we study transfer between two public visual-oddball ERP datasets using five shared electrodes (Fz, Pz, P3, P4, Oz) under a strict small-sample regime (target: 10 trials/subject; source: 80 trials/subject). We introduce Adaptive Split Maximum Mean Discrepancy Training (AS-MMD), which combines (i) a target-weighted loss with warm-up tied to the square root of the source/target size ratio, (ii) Split Batch Normalization (Split-BN) with shared affine parameters and per-domain running statistics, and (iii) a parameter-free logit-level Radial Basis Function kernel Maximum Mean Discrepancy (RBF-MMD) term using the median-bandwidth heuristic. Implemented on an EEG Conformer, AS-MMD is backbone-agnostic and leaves the inference-time model unchanged. Across both transfer directions, it outperforms target-only and pooled training (Active Visual Oddball: accuracy/AUC 0.66/0.74; ERP CORE P3: 0.61/0.65), with gains over pooling significant under corrected paired t-tests. Ablations attribute improvements to all three components.
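
The RBF-MMD term with a median-bandwidth heuristic mentioned in this abstract can be sketched as follows. This is a generic biased MMD estimate on small lists of logit vectors, not the authors' AS-MMD implementation; taking the median over pooled pairwise distances, the kernel form exp(-d²/(2σ²)), and the toy data are assumptions.

```python
import math
from statistics import median

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def median_bandwidth(xs, ys):
    """Median pairwise distance over the pooled sample (one common variant)."""
    pooled = xs + ys
    dists = [math.sqrt(sq_dist(pooled[i], pooled[j]))
             for i in range(len(pooled)) for j in range(i + 1, len(pooled))]
    # Fall back to 1.0 if the median distance is zero (degenerate sample)
    return median(dists) or 1.0

def rbf_mmd2(xs, ys):
    """Biased estimate of squared MMD with an RBF kernel."""
    sigma = median_bandwidth(xs, ys)

    def k(a, b):
        return math.exp(-sq_dist(a, b) / (2.0 * sigma ** 2))

    def mean_k(us, vs):
        return sum(k(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)

source = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]
target_near = [[0.05, 0.0], [0.15, 0.2], [0.25, 0.1]]
target_far = [[3.0, 3.0], [3.1, 3.2], [3.2, 3.1]]
mmd_same = rbf_mmd2(source, source)
mmd_near = rbf_mmd2(source, target_near)
mmd_far = rbf_mmd2(source, target_far)
```

Because the bandwidth is derived from the data, the term needs no tuned kernel parameter, which is the sense in which the abstract calls it parameter-free.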

[LG-177] Bridging Prediction and Attribution: Identifying Forward and Backward Causal Influence Ranges Using Assimilative Causal Inference

Link: https://arxiv.org/abs/2510.21889
Authors: Marios Andreou,Nan Chen
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Data Analysis, Statistics and Probability (physics.data-an); Methodology (stat.ME)
Comments: 39 pages (Main Text pp. 1–25; Supplementary Materials/Appendix pp. 26–35), 9 figures (all in Main Text). Submitted for peer-review to SIAM/ASA Journal on Uncertainty Quantification. Code available upon request. For more info see this https URL

Abstract:Causal inference identifies cause-and-effect relationships between variables. While traditional approaches rely on data to reveal causal links, a recently developed method, assimilative causal inference (ACI), integrates observations with dynamical models. It utilizes Bayesian data assimilation to trace causes back from observed effects by quantifying the reduction in uncertainty. ACI advances the detection of instantaneous causal relationships and the intermittent reversal of causal roles over time. Beyond identifying causal connections, an equally important challenge is determining the associated causal influence range (CIR), indicating when causal influences emerged and for how long they persist. In this paper, ACI is employed to develop mathematically rigorous formulations of both forward and backward CIRs at each time. The forward CIR quantifies the temporal impact of a cause, while the backward CIR traces the onset of triggers for an observed effect, thus characterizing causal predictability and attribution of outcomes at each transient phase, respectively. Objective and robust metrics for both CIRs are introduced, eliminating the need for empirical thresholds. Computationally efficient approximation algorithms to compute CIRs are developed, which facilitate the use of closed-form expressions for a broad class of nonlinear dynamical systems. Numerical simulations demonstrate how this forward and backward CIR framework provides new possibilities for probing complex dynamical systems. It advances the study of bifurcation-driven and noise-induced tipping points in Earth systems, investigates the impact from resolving the interfering variables when determining the influence ranges, and elucidates atmospheric blocking mechanisms in the equatorial region. These results have direct implications for science, policy, and decision-making.

Information Retrieval

[IR-0] Multi-Stage Field Extraction of Financial Documents with OCR and Compact Vision-Language Models

Link: https://arxiv.org/abs/2510.23066
Authors: Yichao Jin,Yushuo Wang,Qishuai Zhong,Kent Chiu Jin-Chun,Kenneth Zhu Ke,Donald MacDonald
Subjects: Information Retrieval (cs.IR)

[IR-1] Improving Product Search Relevance with EAR-MP: A Solution for the CIKM 2025 AnalytiCup

Link: https://arxiv.org/abs/2510.23018
Authors: JaeEun Lim,Soomin Kim,Jaeyong Seo,Iori Ono,Qimu Ran,Jae-woong Lee
Subjects: Information Retrieval (cs.IR)

[IR-2] MGFRec: Towards Reinforced Reasoning Recommendation with Multiple Groundings and Feedback

Link: https://arxiv.org/abs/2510.22888
Authors: Shihao Cai,Chongming Gao,Haoyan Liu,Wentao Shi,Jianshan Sun,Ruiming Tang,Fuli Feng
Subjects: Information Retrieval (cs.IR)

[IR-3] Civic Ground Truth in News Recommenders: A Method for Public Value Scoring RECSYS2025

Link: https://arxiv.org/abs/2510.22865
Authors: James Meese,Kyle Herbertson
Subjects: Information Retrieval (cs.IR)
Comments: Presented at NORMalize 2025: The Third Workshop on the Normative Design and Evaluation of Recommender Systems, co-located with the ACM Conference on Recommender Systems 2025 (RecSys 2025), Prague

[IR-4] Diversification as Risk Minimization WSDM2026

Link: https://arxiv.org/abs/2510.22681
Authors: Rikiya Takehi,Fernando Diaz,Tetsuya Sakai
Subjects: Information Retrieval (cs.IR)
Comments: Preprint, accepted at WSDM 2026 (Full Paper). 16 pages, 8 figures

[IR-5] Tools are under-documented: Simple Document Expansion Boosts Tool Retrieval

Link: https://arxiv.org/abs/2510.22670
Authors: Xuan Lu,Haohang Huang,Rui Meng,Yaohui Jin,Wenjun Zeng,Xiaoyu Shen
Subjects: Information Retrieval (cs.IR)

[IR-6] Multimodal Item Scoring for Natural Language Recommendation via Gaussian Process Regression with LLM Relevance Judgments

Link: https://arxiv.org/abs/2510.22023
Authors: Yifan Liu,Qianfeng Wen,Jiazhou Liang,Mark Zhao,Justin Cui,Anton Korikov,Armin Torogh,Junyoung Kim,Scott Sanner
Subjects: Information Retrieval (cs.IR)
Comments: 16 pages, 20 figures

[IR-7] Temporal Graph Theoretic Analysis of Geopolitical Dynamics in the U.S. Entity List

Link: https://arxiv.org/abs/2510.21962
Authors: Yunsen Lei,Kexin Bai,Quan Li,H. Howie Huang
Subjects: Information Retrieval (cs.IR)
Comments: 13 pages, 9 figures. Under review

Abstract:Export controls have become one of America’s most prominent tools of economic statecraft. They aim to block rival countries’ access to sensitive technologies, safeguard U.S. supply chains, protect national security, and shape geopolitical competition. Among various instruments, the U.S. Entity List has emerged as the most salient, yet its dynamics remain underexplored. This paper introduces a novel temporal graph framework that transforms the Entity List documents from a static registry of foreign entities of concern into a dynamic representation of geopolitical strategy. We construct the first event-based dataset of U.S. government foreign entity designations and model them as a temporal bipartite graph. Building on this representation, we develop a multi-level analytical approach that reveals shifting roles, enforcement strategy, and broader sanction ecosystems. Applied to 25 years of data, the framework uncovers dynamic patterns of escalation, persistence, and coordination that static views cannot capture. More broadly, our study demonstrates how temporal graph analysis offers systematic computational insights into the geopolitical dynamics of export controls.

[IR-8] Development of an Automated Web Application for Efficient Web Scraping: Design and Implementation

Link: https://arxiv.org/abs/2510.21831
Authors: Alok Dutta,Nilanjana Roy,Rhythm Sen,Sougata Dutta,Prabhat Das
Subjects: Information Retrieval (cs.IR); Software Engineering (cs.SE)

Abstract:This paper presents the design and implementation of a user-friendly, automated web application that simplifies and optimizes the web scraping process for non-technical users. The application breaks down the complex task of web scraping into three main stages: fetching, extraction, and execution. In the fetching stage, the application accesses target websites using the HTTP protocol, leveraging the requests library to retrieve HTML content. The extraction stage utilizes powerful parsing libraries like BeautifulSoup and regular expressions to extract relevant data from the HTML. Finally, the execution stage structures the data into accessible formats, such as CSV, ensuring the scraped content is organized for easy use. To provide personalized and secure experiences, the application includes user registration and login functionalities, supported by MongoDB, which stores user data and scraping history. Deployed using the Flask framework, the tool offers a scalable, robust environment for web scraping. Users can easily input website URLs, define data extraction parameters, and download the data in a simplified format, without needing technical expertise. This automated tool not only enhances the efficiency of web scraping but also democratizes access to data extraction by empowering users of all technical levels to gather and manage data tailored to their needs. The methodology detailed in this paper represents a significant advancement in making web scraping tools accessible, efficient, and easy to use for a broader audience.
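
The three stages named in this abstract (fetching, extraction, execution) can be sketched in a few lines. This is not the paper's application: to keep the sketch runnable offline it uses only the Python standard library in place of requests and BeautifulSoup, and the `fetch` stub returns canned HTML instead of issuing an HTTP request. All function and class names here are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects (href, text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only accumulate text while inside an <a> tag that has an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def fetch(url):
    """Fetching stage (stub): returns canned HTML instead of an HTTP response."""
    return '<html><body><a href="/a">First</a> <a href="/b">Second</a></body></html>'

def extract(html):
    """Extraction stage: parse the HTML and pull out link records."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

def to_csv(rows):
    """Execution stage: structure the records as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["href", "text"])
    writer.writerows(rows)
    return buf.getvalue()

csv_text = to_csv(extract(fetch("https://example.com")))
```

In a real deployment the stub would be replaced by an HTTP client call, and the extraction rules would come from user-defined parameters rather than a fixed tag.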

[IR-9] 10 Simple Rules for Improving Your Standardized Fields and Terms

Link: https://arxiv.org/abs/2510.21825
Authors: Rhiannon Cameron (1), Emma Griffiths (1), Damion Dooley (1), William Hsiao (1) ((1) Centre for Infectious Disease Genomics and One Health, Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada)
Categories: Digital Libraries (cs.DL); Information Retrieval (cs.IR)
Notes: 17 pages, 1 figure. Author Contributions: Conceptualization by EG and RC. Manuscript writing by RC. Revisions and Editing by RC, EG, DD, and WH. Acknowledgements: Charlotte Barclay

Abstract:Contextual metadata is the unsung hero of research data. When done right, standardized and structured vocabularies make your data findable, shareable, and reusable. When done wrong, they turn a well-intended effort into data cleanup and curation nightmares. In this paper we tackle the surprisingly tricky process of vocabulary standardization with a mix of practical advice and grounded examples. Drawing from real-world experience in contextual data harmonization, we highlight common challenges (e.g., semantic noise and concept bombs) and provide actionable strategies to address them. Our rules emphasize alignment with Findability, Accessibility, Interoperability, and Reusability (FAIR) principles while remaining adaptable to evolving user and research needs. Whether you are curating datasets, designing a schema, or contributing to a standards body, these rules aim to help you create metadata that is not only technically sound but also meaningful to users.

[IR-10] From Factoid Questions to Data Product Requests: Benchmarking Data Product Discovery over Tables and Text

Link: https://arxiv.org/abs/2510.21737
Authors: Liangliang Zhang, Nandana Mihindukulasooriya, Niharika S. D’Souza, Sola Shirai, Sarthak Dash, Yao Ma, Horst Samulowitz
Categories: Information Retrieval (cs.IR)
Notes: 9 pages, 1 figure, 2 tables

Abstract:Data products are reusable, self-contained assets designed for specific business use cases. Automating their discovery and generation is of great industry interest, as it enables discovery in large data lakes and supports analytical Data Product Requests (DPRs). Currently, there is no benchmark established specifically for data product discovery. Existing datasets focus on answering single factoid questions over individual tables rather than collecting multiple data assets for broader, coherent products. To address this gap, we introduce DPBench, the first user-request-driven data product benchmark over hybrid table-text corpora. Our framework systematically repurposes existing table-text QA datasets by clustering related tables and passages into coherent data products, generating professional-level analytical requests that span both data sources, and validating benchmark quality through multi-LLM evaluation. DPBench preserves full provenance while producing actionable, analyst-like data product requests. Baseline experiments with hybrid retrieval methods establish the feasibility of DPR evaluation, reveal current limitations, and point to new opportunities for automatic data product discovery research. Code and datasets are available at: this https URL

[IR-11] Augmenting Researchy Questions with Sub-question Judgments

Link: https://arxiv.org/abs/2510.21733
Authors: Jia-Huei Ju, Eugene Yang, Trevor Adriaanse, Andrew Yates
Categories: Information Retrieval (cs.IR)
Notes: 3 pages

Abstract:The Researchy Questions dataset provides about 100k question queries with complex information needs that require retrieving information about several aspects of a topic. Each query in ResearchyQuestions is associated with sub-questions that were produced by prompting GPT-4. While ResearchyQuestions contains labels indicating what documents were clicked after issuing the query, there are no associations in the dataset between sub-questions and relevant documents. In this work, we augment the Researchy Questions dataset with LLM-judged labels for each sub-question using a Llama3.3 70B model. We intend these sub-question labels to serve as a resource for training retrieval models that better support complex information needs.

[IR-12] TriMat: Context-aware Recommendation by Tri-Matrix Factorization

Link: https://arxiv.org/abs/2510.21730
Authors: Hao Wang
Categories: Information Retrieval (cs.IR)
Notes:

Abstract:Search engines are the symbolic technology of Web 2.0, and many people used to believe that recommender systems would be the new frontier of Web 3.0. In the past 10 years, with the advent of TikTok and similar apps, recommender systems have materialized the vision of the machine learning pioneers. However, many research topics in the field remain unresolved today. One such topic is CARS (Context-aware Recommender Systems), which is largely a theoretical topic without much advance in real-world applications. In this paper, we utilize a tri-matrix factorization technique to incorporate contextual information into our matrix factorization framework, and prove that our technique is effective in improving both the accuracy and fairness metrics in our experiments.

[IR-13] Practice on Long Behavior Sequence Modeling in Tencent Advertising

Link: https://arxiv.org/abs/2510.21714
Authors: Xian Hu, Ming Yue, Zhixiang Feng, Junwei Pan, Junjie Zhai, Ximei Wang, Xinrui Miao, Qian Li, Xun Liu, Shangyu Zhang, Letian Wang, Hua Lu, Zijian Zeng, Chen Cai, Wei Wang, Fei Xiong, Pengfei Xiong, Jintao Zhang, Zhiyuan Wu, Chunhui Zhang, Anan Liu, Jiulong You, Chao Deng, Yuekui Yang, Shudong Huang, Dapeng Liu, Haijie Gu
Categories: Information Retrieval (cs.IR)
Notes:

Abstract:Long-sequence modeling has become an indispensable frontier in recommendation systems for capturing users’ long-term preferences. However, user behaviors within advertising domains are inherently sparse, posing a significant barrier to constructing long behavioral sequences using data from a single advertising domain alone. This motivates us to collect users’ behaviors not only across diverse advertising scenarios, but also beyond the boundaries of the advertising domain into content domains, thereby constructing unified commercial behavior trajectories. This cross-domain or cross-scenario integration gives rise to the following challenges: (1) feature taxonomy gaps between distinct scenarios and domains, (2) inter-field interference arising from irrelevant feature field pairs, and (3) target-wise interference in temporal and semantic patterns when optimizing for different advertising targets. To address these challenges, we propose several practical approaches within the two-stage framework for long-sequence modeling. In the first (search) stage, we design a hierarchical hard search method for handling complex feature taxonomy hierarchies, alongside a decoupled embedding-based soft search to alleviate conflicts between attention mechanisms and feature representation. In the second (sequence modeling) stage, we introduce: (a) Decoupled Side Information Temporal Interest Networks (TIN) to mitigate inter-field conflicts; (b) Target-Decoupled Positional Encoding and Target-Decoupled SASRec to address target-wise interference; and (c) Stacked TIN to model high-order behavioral correlations. Deployed in production on Tencent’s large-scale advertising platforms, our innovations delivered significant performance gains: an overall 4.22% GMV lift in WeChat Channels and an overall 1.96% GMV increase in WeChat Moments.
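
The two-stage shape described in this abstract (search, then sequence modeling) can be illustrated with a toy sketch. It is not the paper's method: the abstract does not define its hierarchical hard search, so the version here assumes it means matching the target's category path at the finest level and backing off to coarser levels when too few behaviors match. The second stage is a plain dot-product attention pooling used as a stand-in, not TIN or SASRec. The names `hard_search` and `attention_pool` and the parameters `min_hits` and `top_k` are hypothetical.

```python
import math


def hard_search(behaviors, target_path, min_hits=2, top_k=50):
    """Stage 1 (assumed form): filter a long behavior list by category path.

    behaviors: list of dicts with "path" (category levels, coarse to fine)
               and "emb" (embedding as a list of floats), oldest first.
    Tries the full target path, then progressively coarser prefixes.
    """
    for depth in range(len(target_path), 0, -1):
        prefix = tuple(target_path[:depth])
        hits = [b for b in behaviors if tuple(b["path"][:depth]) == prefix]
        if len(hits) >= min_hits:
            return hits[-top_k:]  # keep the most recent matches
    return behaviors[-top_k:]  # no level matched enough: fall back to recency


def attention_pool(hits, target_emb):
    """Stage 2 (stand-in): softmax-weighted average of the retrieved embeddings."""
    if not hits:
        return [0.0] * len(target_emb)
    scores = [sum(x * y for x, y in zip(b["emb"], target_emb)) for b in hits]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(weights)
    return [
        sum(w / total * b["emb"][d] for w, b in zip(weights, hits))
        for d in range(len(target_emb))
    ]


if __name__ == "__main__":
    history = [
        {"path": ("electronics", "phones"), "emb": [1.0, 0.0]},
        {"path": ("electronics", "laptops"), "emb": [0.0, 1.0]},
        {"path": ("games", "mobile"), "emb": [1.0, 1.0]},
    ]
    retrieved = hard_search(history, ("electronics", "phones"))
    print(attention_pool(retrieved, [1.0, 0.0]))
```

The point of the search stage is to shrink a very long history to a short, relevant subsequence before the more expensive sequence model runs on it.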

[IR-14] Improving E-commerce Search with Category-Aligned Retrieval

Link: https://arxiv.org/abs/2510.21711
Authors: Rauf Aliev
Categories: Information Retrieval (cs.IR)
Notes:

Abstract:Traditional e-commerce search systems often struggle with the semantic gap between user queries and product catalogs. In this paper, we propose a Category-Aligned Retrieval System (CARS) that improves search relevance by first predicting the product category from a user’s query and then boosting products within that category. We introduce a novel method for creating “Trainable Category Prototypes” from query embeddings. We evaluate this method with two models: a lightweight all-MiniLM-L6-v2 and OpenAI’s text-embedding-ada-002. Our offline evaluation shows this method is highly effective, with the OpenAI model increasing Top-3 category prediction accuracy from a zero-shot baseline of 43.8% to 83.2% after training. The end-to-end simulation, however, highlights the limitations of blindly applying category boosts in a complex retrieval pipeline: while accuracy is high, naive integration can negatively affect search relevance metrics such as nDCG@10. We argue that this is partly due to dataset-specific ambiguities (e.g., polysemous queries in the Amazon ESCI corpus) and partly due to the sensitivity of retrieval systems to over-constraining filters. Crucially, these results do not diminish the value of the approach; rather, they emphasize the need for confidence-aware and adaptive integration strategies.
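
The predict-then-boost idea in this abstract, together with its call for confidence-aware integration, can be sketched as below. This is an illustration under stated assumptions, not the paper's system: the prototypes are fixed toy vectors rather than trained ones, confidence is taken as a softmax over cosine similarities, and the `temperature`, `boost`, and `threshold` values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_category(query_vec, prototypes, temperature=0.1):
    """Return (best_category, confidence) from similarity to category prototypes."""
    sims = {cat: cosine(query_vec, proto) for cat, proto in prototypes.items()}
    top = max(sims.values())
    weights = {cat: math.exp((s - top) / temperature) for cat, s in sims.items()}
    best = max(sims, key=sims.get)
    return best, weights[best] / sum(weights.values())


def rerank(results, category, confidence, boost=0.2, threshold=0.6):
    """Boost products in the predicted category, but only when confident.

    results: list of (product_id, product_category, base_score) tuples.
    """
    confident = confidence >= threshold
    scored = [
        (pid, score + (boost * confidence if confident and cat == category else 0.0))
        for pid, cat, score in results
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    prototypes = {"shoes": [1.0, 0.0], "books": [0.0, 1.0]}
    results = [("p1", "books", 0.50), ("p2", "shoes", 0.45)]
    category, confidence = predict_category([0.9, 0.1], prototypes)
    print(category, round(confidence, 3), rerank(results, category, confidence))
```

Gating the boost on confidence means an ambiguous query leaves the base ranking untouched, which is one simple way to avoid the over-constraining effect the abstract warns about.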

Attachment download

Click to download the full list of today's papers