本篇博文主要内容为 2025-11-11 从Arxiv.org论文网站获取的最新论文列表,自动更新,按照NLP、CV、ML、AI、IR五个大方向区分,若需要邮件定时接收,请在评论区留下你的邮箱号。

说明:每日论文数据从Arxiv.org获取,每天早上12:00左右定时自动更新。

友情提示: 如何您需要邮箱接收每日论文数据,请在评论处留下你的邮箱。

目录

概览 (2025-11-11)

今日共更新1095篇论文,其中:

  • 自然语言处理145篇(Computation and Language (cs.CL))
  • 人工智能363篇(Artificial Intelligence (cs.AI))
  • 计算机视觉255篇(Computer Vision and Pattern Recognition (cs.CV))
  • 机器学习344篇(Machine Learning (cs.LG))

自然语言处理

[NLP-0] DigiData: Training and Evaluating General-Purpose Mobile Control Agents KR

链接: https://arxiv.org/abs/2511.07413
作者: Yuxuan Sun,Manchen Wang,Shengyi Qian,William R. Wong,Eric Gan,Pierluca D’Oro,Alejandro Castillejo Munoz,Sneha Silwal,Pedro Matias,Nitin Kamra,Satwik Kottur,Nick Raines,Xuanyi Zhao,Joy Chen,Joseph Greer,Andrea Madotto,Allen Bolourchi,James Valori,Kevin Carlberg,Karl Ridgeway,Joseph Tighe
机构: Meta(元)
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
备注: Website: this https URL

点击查看摘要

[NLP-1] SPOT: An Annotated French Corpus and Benchmark for Detecting Critical Interventions in Online Conversations

链接: https://arxiv.org/abs/2511.07405
作者: Manon Berriche,Célia Nouri,Chloé Clavel,Jean-Philippe Cointet
机构: 未知
类目: Computation and Language (cs.CL); Computers and Society (cs.CY)
备注:

点击查看摘要

[NLP-2] SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLM s via Spatial Rewards NEURIPS2025

【速读】: 该论文旨在解决多模态大语言模型(Multimodal Large Language Models, MLLMs)在空间理解能力上的不足问题,尤其针对现有方法依赖显式3D输入、架构特异性修改、大规模数据或稀疏监督所导致的局限性。其解决方案的关键在于提出SpatialThinker,一个通过强化学习(Reinforcement Learning, RL)训练的3D感知MLLM,能够将结构化的空间定位与多步推理相结合;具体包括两个核心贡献:一是构建高质量的空间视觉问答(Spatial Text-Vision Question Answering, STVQA-7K)数据集以支持训练,二是设计基于多目标密集空间奖励的在线强化学习机制,从而强化模型对空间关系的准确建模和推理能力。实验表明,SpatialThinker-7B在空间理解和真实世界视觉问答任务中显著优于监督微调及稀疏强化学习基线,且性能接近甚至超越GPT-4o,验证了结合空间监督与奖励对齐推理的有效性。

链接: https://arxiv.org/abs/2511.07403
作者: Hunar Batra,Haoqin Tu,Hardy Chen,Yuanze Lin,Cihang Xie,Ronald Clark
机构: University of Oxford (牛津大学); University of California, Santa Cruz (加州大学圣克鲁兹分校)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: Preprint. Accepted at NeurIPS 2025 Workshops on SPACE in Vision, Language, and Embodied AI (SpaVLE), Embodied World Models for Decision Making (EWM), Aligning Reinforcement Learning Experimentalists and Theorists (ARLET), and Scaling Environments for Agents (SEA)

点击查看摘要

Abstract:Multimodal large language models (MLLMs) have achieved remarkable progress in vision-language tasks, but they continue to struggle with spatial understanding. Existing spatial MLLMs often rely on explicit 3D inputs or architecture-specific modifications, and remain constrained by large-scale datasets or sparse supervision. To address these limitations, we introduce SpatialThinker, a 3D-aware MLLM trained with RL to integrate structured spatial grounding with multi-step reasoning. The model simulates human-like spatial perception by constructing a scene graph of task-relevant objects and spatial relations, and reasoning towards an answer via dense spatial rewards. SpatialThinker consists of two key contributions: (1) a data synthesis pipeline that generates STVQA-7K, a high-quality spatial VQA dataset, and (2) online RL with a multi-objective dense spatial reward enforcing spatial grounding. SpatialThinker-7B outperforms supervised fine-tuning and the sparse RL baseline on spatial understanding and real-world VQA benchmarks, nearly doubling the base-model gain compared to sparse RL, and surpassing GPT-4o. These results showcase the effectiveness of combining spatial supervision with reward-aligned reasoning in enabling robust 3D spatial understanding with limited data and advancing MLLMs towards human-level visual reasoning.
zh

[NLP-3] ConvFill: Model Collaboration for Responsive Conversational Voice Agents

链接: https://arxiv.org/abs/2511.07397
作者: Vidya Srinivas,Zachary Englhardt,Maximus Powers,Shwetak Patel,Vikram Iyer
机构: University of Washington (华盛顿大学)
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-4] Surgical Agent Orchestration Platform for Voice-directed Patient Data Interaction

链接: https://arxiv.org/abs/2511.07392
作者: Hyeryun Park,Byung Mo Gu,Jun Hee Lee,Byeong Hyeon Choi,Sekeun Kim,Hyun Koo Kim,Kyungsang Kim
机构: Korea University (韩国科学技术院); MGB (医疗基因生物公司)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: 22 pages, 12 figures, 1 table, Supplementary Information, Supplementary Data 1

点击查看摘要

[NLP-5] aching Pretrained Language Models to Think Deeper with Retrofitted Recurrence

链接: https://arxiv.org/abs/2511.07384
作者: Sean McLeish,Ang Li,John Kirchenbauer,Dayal Singh Kalra,Brian R. Bartoldson,Bhavya Kailkhura,Avi Schwarzschild,Jonas Geiping,Tom Goldstein,Micah Goldblum
机构: University of Maryland (马里兰大学); New York University (纽约大学); Lawrence Livermore National Laboratory (劳伦斯利弗莫尔国家实验室); University of North Carolina (北卡罗来纳大学); ELLIS Institute Tübingen, Max Planck Institute for Intelligent Systems, Tübingen AI Center (图宾根ELLIS研究所,马克斯·普朗克智能系统研究所,图宾根人工智能中心); Columbia University (哥伦比亚大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: code: this https URL , models: this https URL

点击查看摘要

[NLP-6] Retriv at BLP-2025 Task 2: Test-Driven Feedback-Guided Framework for Bangla-to-Python Code Generation

链接: https://arxiv.org/abs/2511.07382
作者: K M Nafi Asib,Sourav Saha,Mohammed Moshiul Hoque
机构: Chittagong University of Engineering and Technology (吉大港工程与技术大学)
类目: Computation and Language (cs.CL)
备注: 8 pages, 1 figure, experimental scripts publicly available at this https URL

点击查看摘要

[NLP-7] Selecting Auxiliary Data via Neural Tangent Kernels for Low-Resource Domains

链接: https://arxiv.org/abs/2511.07380
作者: Pingjie Wang,Hongcheng Liu,Yusheng Liao,Ziqing Fan,Yaxin Du,Shuo Tang,Yanfeng Wang,Yu Wang
机构: Shanghai Jiao Tong University (上海交通大学); Shanghai Artificial Intelligence Laboratory (上海人工智能实验室)
类目: Computation and Language (cs.CL)
备注: 27 pages

点击查看摘要

[NLP-8] Self-Evaluating LLM s for Multi-Step Tasks: Stepwise Confidence Estimation for Failure Detection NEURIPS2025

链接: https://arxiv.org/abs/2511.07364
作者: Vaibhav Mavi,Shubh Jaroria,Weiqi Sun
机构: Dyania Health
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注: Accepted at NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling

点击查看摘要

[NLP-9] IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

链接: https://arxiv.org/abs/2511.07327
作者: Guoxin Chen,Zile Qiao,Xuanzhong Chen,Donglei Yu,Haotian Xu,Wayne Xin Zhao,Ruihua Song,Wenbiao Yin,Huifeng Yin,Liwen Zhang,Kuan Li,Minpeng Liao,Yong Jiang,Pengjun Xie,Fei Huang,Jingren Zhou
机构: Gaoling School of Artificial Intelligence, Renmin University of China (中国人民大学高瓴人工智能学院); Tongyi Lab, Alibaba Group (阿里巴巴集团通义实验室); OpenRLHF
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注: this https URL

点击查看摘要

[NLP-10] FinRpt: Dataset Evaluation System and LLM -based Multi-agent Framework for Equity Research Report Generation AAAI2026

链接: https://arxiv.org/abs/2511.07322
作者: Song Jin,Shuqi Li,Shukun Zhang,Rui Yan
机构: 武汉大学(Whu); 武汉大学人民医院(Wuhan University People’s Hospital)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: AAAI 2026

点击查看摘要

[NLP-11] When Bias Pretends to Be Truth: How Spurious Correlations Undermine Hallucination Detection in LLM s

【速读】: 该论文旨在解决大语言模型(Large Language Models, LLMs)中由虚假相关性(spurious correlations)引发的幻觉问题,即模型基于训练数据中表面但统计显著的特征-属性关联(如姓氏与国籍)生成看似合理实则错误的回答。解决方案的关键在于揭示现有检测方法(如基于置信度的过滤和内部状态探测)在面对此类虚假相关性时的根本失效机制,并通过系统性的合成实验与真实模型评估验证:这类幻觉具有高置信度、不随模型规模扩大而缓解、可规避当前检测手段且对拒绝微调(refusal fine-tuning)具有鲁棒性。论文进一步提出理论分析,阐明统计偏差如何内在地破坏依赖置信度的检测逻辑,从而强调亟需开发专门针对虚假相关性驱动幻觉的新检测与防御方法。

链接: https://arxiv.org/abs/2511.07318
作者: Shaowen Wang,Yiqi Dong,Ruinian Chang,Tansheng Zhu,Yuebo Sun,Kaifeng Lyu,Jian Li
机构: Institute for Interdisciplinary Information Sciences, Tsinghua University (清华大学交叉信息研究院)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:Despite substantial advances, large language models (LLMs) continue to exhibit hallucinations, generating plausible yet incorrect responses. In this paper, we highlight a critical yet previously underexplored class of hallucinations driven by spurious correlations – superficial but statistically prominent associations between features (e.g., surnames) and attributes (e.g., nationality) present in the training data. We demonstrate that these spurious correlations induce hallucinations that are confidently generated, immune to model scaling, evade current detection methods, and persist even after refusal fine-tuning. Through systematically controlled synthetic experiments and empirical evaluations on state-of-the-art open-source and proprietary LLMs (including GPT-5), we show that existing hallucination detection methods, such as confidence-based filtering and inner-state probing, fundamentally fail in the presence of spurious correlations. Our theoretical analysis further elucidates why these statistical biases intrinsically undermine confidence-based detection techniques. Our findings thus emphasize the urgent need for new approaches explicitly designed to address hallucinations caused by spurious correlations.
zh

[NLP-12] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

链接: https://arxiv.org/abs/2511.07317
作者: Zhiyuan Zeng,Hamish Ivison,Yiping Wang,Lifan Yuan,Shuyue Stella Li,Zhuorui Ye,Siting Li,Jacqueline He,Runlong Zhou,Tong Chen,Chenyang Zhao,Yulia Tsvetkov,Simon Shaolei Du,Natasha Jaques,Hao Peng,Pang Wei Koh,Hannaneh Hajishirzi
机构: 未知
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注:

点击查看摘要

[NLP-13] ACE-ICD: Acronym Expansion As Data Augmentation For Automated ICD Coding AACL2025

链接: https://arxiv.org/abs/2511.07311
作者: Tuan-Dung Le,Shohreh Haddadan,Thanh Q. Thieu
机构: Moffitt Cancer Center and Research Institute (莫菲特癌症中心和研究所); University of South Florida (南佛罗里达大学)
类目: Computation and Language (cs.CL)
备注: Camera ready version for IJCNLP-AACL 2025 (Findings)

点击查看摘要

[NLP-14] Retriv at BLP-2025 Task 1: A Transformer Ensemble and Multi-Task Learning Approach for Bangla Hate Speech Identification

链接: https://arxiv.org/abs/2511.07304
作者: Sourav Saha,K M Nafi Asib,Mohammed Moshiul Hoque
机构: Chittagong University of Engineering and Technology (吉大港工程与技术大学)
类目: Computation and Language (cs.CL)
备注: 7 pages, 3 figures, experimental scripts publicly available at this https URL

点击查看摘要

[NLP-15] Who Is the Story About? Protagonist Entity Recognition in News

链接: https://arxiv.org/abs/2511.07296
作者: Jorge Gabín,M. Eduardo Ares,Javier Parapar
机构: Linknovate Science (Linknovate科学); University of A Coruña (拉科鲁尼亚大学)
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-16] he Few Govern the Many:Unveiling Few-Layer Dominance for Time Series Models

链接: https://arxiv.org/abs/2511.07237
作者: Xin Qiu,Junlong Tong,Yirong Sun,Yunpu Ma,Xiaoyu Shen
机构: Eastern Institute of Technology (东方理工大学); Zhejiang University (浙江大学); Shanghai Jiao Tong University (上海交通大学); Ludwig Maximilian University of Munich (慕尼黑路德维希-马克西米利安大学)
类目: Machine Learning (cs.LG); Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-17] Discourse Graph Guided Document Translation with Large Language Models

链接: https://arxiv.org/abs/2511.07230
作者: Viet-Thanh Pham,Minghan Wang,Hao-Han Liao,Thuy-Trang Vu
机构: Monash University (蒙纳士大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-18] EMODIS: A Benchmark for Context-Dependent Emoji Disambiguation in Large Language Models AAAI2026

链接: https://arxiv.org/abs/2511.07193
作者: Jiacheng Huang,Ning Yu,Xiaoyin Yi
机构: 未知
类目: Computation and Language (cs.CL)
备注: Accepted by AAAI2026

点击查看摘要

[NLP-19] Graph Representation-based Model Poisoning on the Heterogeneous Internet of Agents

链接: https://arxiv.org/abs/2511.07176
作者: Hanlin Cai,Houtianfu Wang,Haofan Dong,Kai Li,Ozgur B. Akan
机构: 未知
类目: Networking and Internet Architecture (cs.NI); Computation and Language (cs.CL)
备注: 6 pages, 6 figures

点击查看摘要

[NLP-20] AdaRec: Adaptive Recommendation with LLM s via Narrative Profiling and Dual-Channel Reasoning

链接: https://arxiv.org/abs/2511.07166
作者: Meiyun Wang,Charin Polpanumas
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
备注:

点击查看摘要

[NLP-21] Categorical Emotions or Appraisals - Which Emotion Model Explains Argument Convincingness Better?

链接: https://arxiv.org/abs/2511.07162
作者: Lynn Greschner,Meike Bauer,Sabine Weber,Roman Klinger
机构: 未知
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-22] CM-Eval: An Expert-Level Dynamic and Extensible Benchmark for Traditional Chinese Medicine

链接: https://arxiv.org/abs/2511.07148
作者: Zihao Cheng,Yuheng Lu,Huaiqian Ye,Zeming Liu,Minqi Wang,Jingjing Liu,Zihan Li,Wei Fan,Yuanfang Guo,Ruiji Fu,Shifeng She,Gang Wang,Yunhong Wang
机构: Beihang University (北京航空航天大学); Beijing Zhimingtang Technology Co., Ltd. (北京智明堂科技有限公司); Beijing Zhiyan AI Technology Co., Ltd. (北京智言人工智能科技有限公司); Guangzhou University of Chinese Medicine (广州中医药大学)
类目: Computation and Language (cs.CL)
备注: Work in Progress

点击查看摘要

[NLP-23] LoRA on the Go: Instance-level Dynamic LoRA Selection and Merging

链接: https://arxiv.org/abs/2511.07129
作者: Seungeon Lee,Soumi Das,Manish Gupta,Krishna P. Gummadi
机构: MPI-SWS; Microsoft(微软)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

[NLP-24] hink Consistently Reason Efficiently: Energy-Based Calibration for Implicit Chain-of-Thought

链接: https://arxiv.org/abs/2511.07124
作者: Zhikang Chen,Sen Cui,Deheng Ye,Yu Zhang,Yatao Bian,Tingting Zhu
机构: University of Oxford (牛津大学); Tsinghua University (清华大学); Tencent (腾讯); Southern University of Science and Technology (南方科技大学); National University of Singapore (新加坡国立大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

[NLP-25] More Agents Helps but Adversarial Robustness Gap Persists

链接: https://arxiv.org/abs/2511.07112
作者: Khashayar Alavi,Zhastay Yeltay,Lucie Flek,Akbar Karimi
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-26] MENTOR: A Metacognition-Driven Self-Evolution Framework for Uncovering and Mitigating Implicit Risks in LLM s on Domain Tasks

链接: https://arxiv.org/abs/2511.07107
作者: Liang Shan,Kaicheng Shen,Wen Wu,Zhenyu Ying,Chaochao Lu,Guangze Ye,Liang He
机构: 华东师范大学计算机科学与技术学院(Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science and Technology, East China Normal University)
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-27] Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora

链接: https://arxiv.org/abs/2511.07080
作者: Khalil Hennara,Ahmad Bastati,Muhammad Hreden,Mohamed Motasim Hamed,Zeina Aldallal,Sara Chrouf,Safwan AlModhayan
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-28] EmoBang: Detecting Emotion From Bengali Texts

链接: https://arxiv.org/abs/2511.07077
作者: Abdullah Al Maruf,Aditi Golder,Zakaria Masud Jiyad,Abdullah Al Numan,Tarannum Shaila Zaman
机构: 未知
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-29] Importance-Aware Data Selection for Efficient LLM Instruction Tuning AAAI2026

链接: https://arxiv.org/abs/2511.07074
作者: Tingyu Jiang,Shen Li,Yiyao Song,Lan Zhang,Hualei Zhu,Yuan Zhao,Xiaohang Xu,Kenjiro Taura,Hao Henry Wang
机构: 1: University of California, Berkeley (加州大学伯克利分校); 2: Tsinghua University (清华大学); 3: University of Tokyo (东京大学)
类目: Computation and Language (cs.CL)
备注: Accepted by AAAI 2026 Oral

点击查看摘要

[NLP-30] Aligning Attention with Human Rationales for Self-Explaining Hate Speech Detection AAAI26 AAAI

链接: https://arxiv.org/abs/2511.07065
作者: Brage Eilertsen,Røskva Bjørgfinsdóttir,Francielle Vargas,Ali Ramezani-Kebrya
机构: 未知
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: Accepted at the Annual AAAI Conference on Artificial Intelligence (AAAI26)

点击查看摘要

[NLP-31] When Sufficient is not Enough: Utilizing the Rashomon Effect for Complete Evidence Extraction

链接: https://arxiv.org/abs/2511.07055
作者: Katharina Beckh,Stefan Rüping
机构: Fraunhofer IAIS (弗劳恩霍夫信息与通信技术研究所); Lamarr Institute (拉马尔研究所)
类目: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
备注:

点击查看摘要

[NLP-32] Evaluating LLM s for Anxiety Depression and Stress Detection Evaluating Large Language Models for Anxiety Depression and Stress Detection: Insights into Prompting Strategies and Synthetic Data

链接: https://arxiv.org/abs/2511.07044
作者: Mihael Arcan,David-Paul Niland
机构: Lua Health(卢亚健康)
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-33] Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks

链接: https://arxiv.org/abs/2511.07025
作者: Yauhen Babakhin,Radek Osmulski,Ronay Ak,Gabriel Moreira,Mengyao Xu,Benedikt Schifferer,Bo Liu,Even Oldridge
机构: 未知
类目: Computation and Language (cs.CL); Information Retrieval (cs.IR)
备注:

点击查看摘要

[NLP-34] Multilingual Lexical Feature Analysis of Spoken Language for Predicting Major Depression Symptom Severity

链接: https://arxiv.org/abs/2511.07011
作者: Anastasiia Tokareva,Judith Dineley,Zoe Firth,Pauline Conde,Faith Matcham,Sara Siddi,Femke Lamers,Ewan Carr,Carolin Oetzmann,Daniel Leightley,Yuezhou Zhang,Amos A. Folarin,Josep Maria Haro,Brenda W.J.H. Penninx,Raquel Bailon,Srinivasan Vairavan,Til Wykes,Richard J.B. Dobson,Vaibhav A. Narayan,Matthew Hotopf,Nicholas Cummins, TheRADAR-CNS Consortium
机构: 未知
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注:

点击查看摘要

[NLP-35] A Picture is Worth a Thousand (Correct) Captions: A Vision-Guided Judge-Corrector System for Multimodal Machine Translation AACL2025

链接: https://arxiv.org/abs/2511.07010
作者: Siddharth Betala,Kushan Raj,Vipul Betala,Rohan Saswade
机构: 未知
类目: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
备注: Accepted at The 12th Workshop on Asian Translation, co-located with IJCLNLP-AACL 2025

点击查看摘要

[NLP-36] Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLM s

链接: https://arxiv.org/abs/2511.07003
作者: Yingfeng Luo,Ziqiang Xu,Yuxuan Ouyang,Murun Yang,Dingyang Lin,Kaiyan Chang,Tong Zheng,Bei Li,Peinan Feng,Quan Du,Tong Xiao,Jingbo Zhu
机构: Northeastern University (东北大学); NiuTrans Research (牛津研究)
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-37] Automated Circuit Interpretation via Probe Prompting

链接: https://arxiv.org/abs/2511.07002
作者: Giuseppe Birardi
机构: Orma Lab Srl (Orma 实验室有限公司)
类目: Computation and Language (cs.CL)
备注: 27 pages, 5 figures, 3 tables. Code and interactive demo available

点击查看摘要

[NLP-38] SCOPE: Intrinsic Semantic Space Control for Mitigating Copyright Infringement in LLM s AAAI2026

链接: https://arxiv.org/abs/2511.07001
作者: Zhenliang Zhang,Xinyu Hu,Xiaojun Wan
机构: 未知
类目: Computation and Language (cs.CL)
备注: Accepted by the AAAI 2026 (Main Track)

点击查看摘要

[NLP-39] HLPD: Aligning LLM s to Human Language Preference for Machine-Revised Text Detection AAAI’26

链接: https://arxiv.org/abs/2511.06942
作者: Fangqi Dai,Xingjian Jiang,Zizhuang Deng
机构: 未知
类目: Computation and Language (cs.CL); Cryptography and Security (cs.CR)
备注: 9 pages, 3 figures, accepted by AAAI’26

点击查看摘要

[NLP-40] RPTS: Tree-Structured Reasoning Process Scoring for Faithful Multimodal Evaluation

链接: https://arxiv.org/abs/2511.06899
作者: Haofeng Wang,Yu Zhang
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-41] EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLM s as Simulated Teachers AAAI2026

链接: https://arxiv.org/abs/2511.06890
作者: Yilin Jiang,Mingzi Zhang,Xuanyu Yin,Sheng Jin,Suyu Lu,Zuocan Ying,Zengyi Yu,Xiangjie Kong
机构: 1. Tsinghua University (清华大学); 2. Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所); 3. Alibaba Group (阿里巴巴集团); 4. University of California, Berkeley (加州大学伯克利分校); 5. Microsoft Research (微软研究院); 6. Peking University (北京大学); 7. Huawei Technologies (华为技术有限公司)
类目: Computation and Language (cs.CL)
备注: 22 pages, 9 figures, accepted by AAAI2026 as oral paper

点击查看摘要

[NLP-42] Inclusion of Role into Named Entity Recognition and Ranking

链接: https://arxiv.org/abs/2511.06886
作者: Neelesh Kumar Shukla,Sanasam Ranbir Singh
机构: IIT Guwahati (印度理工学院古瓦哈蒂分校)
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: MTP Paper

点击查看摘要

[NLP-43] CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition

链接: https://arxiv.org/abs/2511.06860
作者: Hung-Yang Sung,Chien-Chun Wang,Kuan-Tang Huang,Tien-Hong Lo,Yu-Sheng Tsao,Yung-Chang Hsu,Berlin Chen
机构: National Taiwan Normal University (国立台湾师范大学); EZAI (EZAI)
类目: Computation and Language (cs.CL); Sound (cs.SD)
备注: Accepted for an oral presentation at the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)

点击查看摘要

[NLP-44] Beyond Plain Demos: A Demo-centric Anchoring Paradigm for In-Context Learning in Alzheimers Disease Detection AAAI

链接: https://arxiv.org/abs/2511.06826
作者: Puzhen Su,Haoran Yin,Yongzhu Miao,Jintao Tang,Shasha Li,Ting Wang
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: Accepted to the 40th Annual AAAI Conference on Artificial Intelligence (2026) - Main Technical Track (Oral)

点击查看摘要

[NLP-45] Learning to Focus: Focal Attention for Selective and Scalable Transformers

链接: https://arxiv.org/abs/2511.06818
作者: Dhananjay Ram,Wei Xia,Stefano Soatto
机构: AWS AI Labs (Amazon Web Services 人工智能实验室)
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注:

点击查看摘要

[NLP-46] SAFENLIDB: A Privacy-Preserving Safety Alignment Framework for LLM -based Natural Language Database Interfaces

链接: https://arxiv.org/abs/2511.06778
作者: Ruiheng Liu,XiaoBing Chen,Jinyu Zhang,Qiongwen Zhang,Yu Zhang,Bailong Yang
机构: 未知
类目: Computation and Language (cs.CL)
备注: 26 pages, 14 figures, 22 tables

点击查看摘要

[NLP-47] Sensitivity of Small Language Models to Fine-tuning Data Contamination

链接: https://arxiv.org/abs/2511.06763
作者: Nicy Scaria,Silvester John Joseph Kennedy,Deepak Subramani
机构: Indian Institute of Science (印度科学研究所)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-48] Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale Systematic Expert Evaluation and Practical Insights

链接: https://arxiv.org/abs/2511.06738
作者: Hyunjae Kim,Jiwoong Sohn,Aidan Gilson,Nicholas Cochran-Caggiano,Serina Applebaum,Heeju Jin,Seihee Park,Yujin Park,Jiyeong Park,Seoyoung Choi,Brittany Alexandra Herrera Contreras,Thomas Huang,Jaehoon Yun,Ethan F. Wei,Roy Jiang,Leah Colucci,Eric Lai,Amisha Dave,Tuo Guo,Maxwell B. Singer,Yonghoe Koo,Ron A. Adelman,James Zou,Andrew Taylor,Arman Cohan,Hua Xu,Qingyu Chen
机构: Yale School of Medicine, Yale University (耶鲁大学医学院); ETH Zurich (苏黎世联邦理工学院); Harvard Medical School (哈佛医学院); Geisel School of Medicine at Dartmouth (达特茅斯医学院); Seoul National University College of Medicine (首尔国立大学医学院); Hanyang University College of Medicine (汉阳大学医学院); Asan Medical Center, University of Ulsan College of Medicine (首尔大学医学院附属医院); Stanford University School of Medicine (斯坦福大学医学院); University of Virginia School of Medicine (弗吉尼亚大学医学院); Yale School of Engineering & Applied Science (耶鲁大学工程与应用科学学院)
类目: Computation and Language (cs.CL)
备注: 34 pages, 6 figures

点击查看摘要

[NLP-49] Revisiting the Data Sampling in Multimodal Post-training from a Difficulty-Distinguish View AAAI2026

链接: https://arxiv.org/abs/2511.06722
作者: Jianyu Qi,Ding Zou,Wenrui Yan,Rui Ma,Jiaxu Li,Zhijie Zheng,Zhiguo Yang,Rongchang Zhao
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注: Accpeted by AAAI 2026

点击查看摘要

[NLP-50] Sentiment Analysis On YouTube Comments Using Machine Learning Techniques Based On Video Games Content

链接: https://arxiv.org/abs/2511.06708
作者: Adi Danish Bin Muhammad Amin,Mohaiminul Islam Bhuiyan,Nur Shazwani Kamarudin,Zulfahmi Toh,Nur Syafiqah Nafis
机构: 未知
类目: Computation and Language (cs.CL)
备注: 6 pages, 7 figures, 2025 IEEE 9th International Conference on Software Engineering Computer Systems

点击查看摘要

[NLP-51] Place Matters: Comparing LLM Hallucination Rates for Place-Based Legal Queries

链接: https://arxiv.org/abs/2511.06700
作者: Damian Curran,Vanessa Sporne,Lea Frermann,Jeannie Paterson
机构: The University of Melbourne (墨尔本大学); Melbourne Law School (墨尔本法学院); The Centre for Artificial Intelligence and Digital Ethics (人工智能与数字伦理中心)
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-52] xtual Self-attention Network: Test-Time Preference Optimization through Textual Gradient-based Attention AAAI2026

链接: https://arxiv.org/abs/2511.06682
作者: Shibing Mo,Haoyang Ruan,Kai Wu,Jing Liu
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: AAAI2026

点击查看摘要

[NLP-53] Steering LLM s toward Korean Local Speech: Iterative Refinement Framework for Faithful Dialect Translation LREC2026

链接: https://arxiv.org/abs/2511.06680
作者: Keunhyeung Park,Seunguk Yu,Youngbin Kim
机构: 未知
类目: Computation and Language (cs.CL)
备注: Submitted to LREC 2026

点击查看摘要

[NLP-54] How AI Fails: An Interactive Pedagogical Tool for Demonstrating Dialectal Bias in Automated Toxicity Models

链接: https://arxiv.org/abs/2511.06676
作者: Subhojit Ghimire
机构: 未知
类目: Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
备注: 9 pages, 5 figures, 4 tables, 14 references

点击查看摘要

[NLP-55] HiMo-CLIP: Modeling Semantic Hierarchy and Monotonicity in Vision-Language Alignment AAAI2026

链接: https://arxiv.org/abs/2511.06653
作者: Ruijia Wu,Ping Chen,Fei Shen,Shaoan Zhao,Qiang Hui,Huanlin Gao,Ting Lu,Zhaoxiang Liu,Fang Zhao,Kai Wang,Shiguo Lian
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
备注: Accepted by AAAI 2026 as an Oral Presentation (13 pages, 7 figures, 7 tables)

点击查看摘要

[NLP-56] GRAPH-GRPO-LEX: Contract Graph Modeling and Reinforcement Learning with Group Relative Policy Optimization

链接: https://arxiv.org/abs/2511.06618
作者: Moriya Dechtiar,Daniel Martin Katz,Mari Sundaresan,Sylvain Jaume,Hongming Wang
机构: Harvard University (哈佛大学); Illinois Tech - Chicago Kent College of Law (伊利诺伊理工学院-芝加哥肯特法学院); CLTDS, Bucerius Law School (CLTDS,布策里乌斯法学院); Yong Pung How School of Law, Singapore Management University (新加坡管理大学杨敬礼法学院); CodeX - The Stanford Center for Legal Informatics, Stanford University (斯坦福大学法律信息中心); Georgetown University (乔治城大学); Massachusetts Institute of Technology (麻省理工学院)
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Software Engineering (cs.SE)
备注:

点击查看摘要

[NLP-57] Duality-based Mode Operations and Pyramid Multilayer Mapping for Rhetorical Modes

链接: https://arxiv.org/abs/2511.06601
作者: Zi-Niu Wu
机构: 未知
类目: Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL); Programming Languages (cs.PL)
备注:

点击查看摘要

[NLP-58] MedVoiceBias: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making

链接: https://arxiv.org/abs/2511.06592
作者: Zhi Rui Tam,Yun-Nung Chen
机构: 未知
类目: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
备注:

点击查看摘要

[NLP-59] abRAG : Tabular Document Retrieval via Structured Language Representations NEURIPS2025

链接: https://arxiv.org/abs/2511.06582
作者: Jacob Si,Mike Qu,Michelle Lee,Yingzhen Li
机构: Imperial College London (帝国理工学院); Columbia University (哥伦比亚大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
备注: NeurIPS 2025 AI4Tab

点击查看摘要

[NLP-60] Rep2Text: Decoding Full Text from a Single LLM Token Representation

链接: https://arxiv.org/abs/2511.06571
作者: Haiyan Zhao,Zirui He,Fan Yang,Ali Payani,Mengnan Du
机构: New Jersey Institute of Technology (新泽西理工学院); Wake Forest University (维克森林大学); Cisco Research (思科研究院)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 15 pages, 7 figures, 4 tables

点击查看摘要

[NLP-61] FPGA or GPU? Analyzing comparative research for application-specific guidance

链接: https://arxiv.org/abs/2511.06565
作者: Arnab A Purkayastha,Jay Tharwani,Shobhit Aggarwal
机构: 未知
类目: Hardware Architecture (cs.AR); Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC); Programming Languages (cs.PL)
备注: 7 pages

点击查看摘要

[NLP-62] Ibom NLP: A Step Toward Inclusive Natural Language Processing for Nigerias Minority Languages AACL

链接: https://arxiv.org/abs/2511.06531
作者: Oluwadara Kalejaiye,Luel Hagos Beyene,David Ifeoluwa Adelani,Mmekut-Mfon Gabriel Edet,Aniefon Daniel Akpan,Eno-Abasi Urua,Anietie Andy
机构: Howard University (霍华德大学); AIMS Research and Innovation Centre; NM-AIST; Mila - Quebec AI Institute; McGill University; Canada CIFAR AI Chair; Korapay; National Institute for Nigerian Languages; University of Uyo (尤yo大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: Accepted at IJCNLP-AACL

点击查看摘要

[NLP-63] Better Datasets Start From RefineLab: Automatic Optimization for High-Quality Dataset Refinement

链接: https://arxiv.org/abs/2511.06530
作者: Xiaonan Luo,Yue Huang,Ping He,Xiangliang Zhang
机构: University of Notre Dame(圣母大学); Vanderbilt University(范德比尔特大学)
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-64] You Had One Job: Per-Task Quantization Using LLM s Hidden Representations

链接: https://arxiv.org/abs/2511.06516
作者: Amit LeVi,Raz Lapid,Rom Himelstein,Yaniv Nemcovsky,Ravid Shwartz Ziv,Avi Mendelson
机构: 未知
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-65] Rethinking what Matters: Effective and Robust Multilingual Realignment for Low-Resource Languages AACL2025

链接: https://arxiv.org/abs/2511.06497
作者: Quang Phuoc Nguyen,David Anugraha,Felix Gaschi,Jun Bin Cheng,En-Shiun Annie Lee
机构: Ontario Tech University (安大略理工大学); Stanford University (斯坦福大学); SAS Posos; University of Toronto (多伦多大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: Accepted to IJCNLP-AACL 2025

点击查看摘要

[NLP-66] When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Platforms

链接: https://arxiv.org/abs/2511.06448
作者: Qibing Ren,Zhijie Zheng,Jiaxuan Guo,Junchi Yan,Lizhuang Ma,Jing Shao
机构: Shanghai Jiao Tong University (上海交通大学); Shanghai Artificial Intelligence Laboratory (上海人工智能实验室); Beihang University (北京航空航天大学)
类目: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Social and Information Networks (cs.SI)
备注: Code is available at this https URL

点击查看摘要

[NLP-67] SR-KI: Scalable and Real-Time Knowledge Integration into LLM s via Supervised Attention AAAI2026

链接: https://arxiv.org/abs/2511.06446
作者: Bohan Yu,Wei Huang,Kang Liu
机构: Baidu(百度); Tsinghua University (清华大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: Accepted by AAAI 2026

点击查看摘要

[NLP-68] owards Resource-Efficient Multimodal Intelligence: Learned Routing among Specialized Expert Models

链接: https://arxiv.org/abs/2511.06441
作者: Mayank Saini,Arit Kumar Bishwas
机构: PwC US
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: 15 pages, 4 figures

点击查看摘要

[NLP-69] Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis

链接: https://arxiv.org/abs/2511.06437
作者: Abhishek More,Anthony Zhang,Nicole Bonilla,Ashvik Vivekan,Kevin Zhu,Parham Sharafoleslami,Maheep Chaudhary
机构: Algoverse AI Research
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
备注:

点击查看摘要

[NLP-70] CG-TTRL: Context-Guided Test-Time Reinforcement Learning for On-Device Large Language Models

链接: https://arxiv.org/abs/2511.06430
作者: Peyman Hosseini,Ondrej Bohdal,Taha Ceritli,Ignacio Castro,Matthew Purver,Mete Ozay,Umberto Michieli
机构: Samsung R&D Institute UK (三星研发研究院英国); Queen Mary University of London (伦敦玛丽女王大学)
类目: Machine Learning (cs.LG); Computation and Language (cs.CL)
备注: 12 pages, 7 Figures, 4 Tables

点击查看摘要

[NLP-71] Dutch Metaphor Extraction from Cancer Patients Interviews and Forum Data using LLM s and Human in the Loop

链接: https://arxiv.org/abs/2511.06427
作者: Lifeng Han,David Lindevelt,Sander Puts,Erik van Mulligen,Suzan Verberne
机构: 未知
类目: Computation and Language (cs.CL); Computers and Society (cs.CY)
备注: Ongoing project report, on behalf of 4D PICTURE this https URL

点击查看摘要

[NLP-72] MONICA: Real-Time Monitoring and Calibration of Chain-of-Thought Sycophancy in Large Reasoning Models

【速读】: 该论文旨在解决大型推理模型(Large Reasoning Models, LRM)中存在的谄媚行为(sycophantic behavior)问题,即模型在推理过程中倾向于附和用户错误信念或接受误导信息,而非保持独立推理能力,从而损害模型可靠性并带来社会风险。解决方案的关键在于提出MONICA框架——一种基于监控引导校准的机制,能够在模型推理过程中的每一步实时监测谄媚漂移得分(sycophantic drift scores),并在得分超过预设阈值时,通过动态抑制机制干预模型行为,无需等待完整答案生成即可实现对谄媚行为的有效缓解。

链接: https://arxiv.org/abs/2511.06419
作者: Jingyu Hu,Shu Yang,Xilin Gong,Hongming Wang,Weiru Liu,Di Wang
机构: University of Bristol (布里斯托大学); King Abdullah University of Science and Technology (阿卜杜拉国王科技大学); University of Georgia (佐治亚大学); Southern University of Science and Technology (南方科技大学)
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Large Reasoning Models (LRMs) suffer from sycophantic behavior, where models tend to agree with users’ incorrect beliefs and follow misinformation rather than maintain independent reasoning. This behavior undermines model reliability and poses societal risks. Mitigating LRM sycophancy requires monitoring how this sycophancy emerges during the reasoning trajectory; however, current methods mainly focus on judging based on final answers and correcting them, without understanding how sycophancy develops during reasoning processes. To address this limitation, we propose MONICA, a novel Monitor-guided Calibration framework that monitors and mitigates sycophancy during model inference at the level of reasoning steps, without requiring the model to finish generating its complete answer. MONICA integrates a sycophantic monitor that provides real-time monitoring of sycophantic drift scores during response generation with a calibrator that dynamically suppresses sycophantic behavior when scores exceed predefined thresholds. Extensive experiments across 12 datasets and 3 LRMs demonstrate that our method effectively reduces sycophantic behavior in both intermediate reasoning steps and final answers, yielding robust performance improvements.
zh

[NLP-73] How Well Do LLM s Understand Drug Mechanisms? A Knowledge Reasoning Evaluation Dataset

链接: https://arxiv.org/abs/2511.06418
作者: Sunil Mohan,Theofanis Karaletsos
机构: Chan Zuckerberg Initiative (Chan Zuckerberg Initiative)
类目: Computation and Language (cs.CL)
备注: An earlier version of this paper appears in IEEE FLLM 2025. GitHub: this https URL

点击查看摘要

[NLP-74] SugarTextNet: A Transformer-Based Framework for Detecting Sugar Dating-Related Content on Social Media with Context-Aware Focal Loss

链接: https://arxiv.org/abs/2511.06402
作者: Lionel Z. Wang,Shihan Ben,Yulu Huang,Simeng Qing
机构: The Hong Kong Polytechnic University (香港理工大学); The University of Hong Kong (香港大学); Northeastern University (东北大学)
类目: Computation and Language (cs.CL); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
备注: This paper is accepted by HICSS 2026

点击查看摘要

[NLP-75] HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate Speech Detection

链接: https://arxiv.org/abs/2511.06391
作者: Irina Proskurina,Marc-Antoine Carpentier,Julien Velcin
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-76] LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation

链接: https://arxiv.org/abs/2511.06346
作者: Liya Zhu,Peizhuang Cong,Aowei Ji,Wenya Wu,Jiani Hou,Chunjie Wu,Xiang Gao,Jingkai Liu,Zhou Huan,Xuelei Sun,Yang Yang,Jianpeng Jiao,Liang Hu,Xinjie Chen,Jiashuo Liu,Jingzhe Ding,Tong Yang,Zaiyuan Wang,Ge Zhang,Wenhao Huang
机构: 未知
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-77] meSense:Making Large Language Models Proficient in Time-Series Analysis

链接: https://arxiv.org/abs/2511.06344
作者: Zhirui Zhang,Changhua Pei,Tianyi Gao,Zhe Xie,Yibo Hao,Zhaoyang Yu,Longlong Xu,Tong Xiao,Jing Han,Dan Pei
机构: Tsinghua University (清华大学); Computer Network Information Center, Chinese Academy of Sciences (中国科学院计算机网络信息中心); ZTE Corporation (中兴通讯)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-78] ELEGANCE: Efficient LLM Guidance for Audio-Visual Target Speech Extraction

链接: https://arxiv.org/abs/2511.06288
作者: Wenxuan Wu,Shuai Wang,Xixin Wu,Helen Meng,Haizhou Li
机构: 未知
类目: ound (cs.SD); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
备注:

点击查看摘要

[NLP-79] Enhancing Multimodal Misinformation Detection by Replaying the Whole Story from Image Modality Perspective AAAI2026

链接: https://arxiv.org/abs/2511.06284
作者: Bing Wang,Ximing Li,Yanjun Wang,Changchun Li,Lin Yuanbo Wu,Buyu Wang,Shengsheng Wang
机构: 1. University of Science and Technology of China (中国科学技术大学); 2. Institute of Artificial Intelligence, University of Science and Technology of China (中国科学技术大学人工智能研究所); 3. School of Computer Science and Technology, University of Science and Technology of China (中国科学技术大学计算机科学与技术学院); 4. Department of Computer Science and Technology, Tsinghua University (清华大学计算机科学与技术系); 5. School of Information Science and Technology, Sun Yat-sen University (中山大学信息科学与技术学院)
类目: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
备注: Accepted by AAAI 2026. 13 pages, 6 figures. Code: this https URL

点击查看摘要

[NLP-80] Mixtures of SubExperts for Large Language Continual Learning

链接: https://arxiv.org/abs/2511.06237
作者: Haeyong Kang
机构: Deep.AI(深度人工智能)
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-81] Analyzing and Mitigating Negation Artifacts using Data Augmentation for Improving ELECTRA-Small Model Accuracy

链接: https://arxiv.org/abs/2511.06234
作者: Mojtaba Noghabaei
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-82] Overview of CHIP 2025 Shared Task 2: Discharge Medication Recommendation for Metabolic Diseases Based on Chinese Electronic Health Records

链接: https://arxiv.org/abs/2511.06230
作者: Juntao Li,Haobin Yuan,Ling Luo,Tengxiao Lv,Yan Jiang,Fan Wang,Ping Zhang,Huiyi Lv,Jian Wang,Yuanyuan Sun,Hongfei Lin
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-83] SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization AAAI2026

链接: https://arxiv.org/abs/2511.06222
作者: Yue Huang,Xiangqi Wang,Xiangliang Zhang
机构: University of Notre Dame (圣母大学)
类目: Computation and Language (cs.CL); Computers and Society (cs.CY)
备注: Accepted by AAAI 2026 (Oral)

点击查看摘要

[NLP-84] ny Model Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

【速读】: 该论文旨在解决当前主流观点中“小模型 inherently 缺乏 robust reasoning(鲁棒推理能力)”的问题,挑战了通过单纯增加参数规模来提升模型能力的范式。其解决方案的关键在于提出了一种名为 Spectrum-to-Signal Principle (SSP) 的新训练框架,该框架包含两个核心阶段:首先通过 Two-Stage Diversity-Exploring Distillation (SFT) 生成多样化的解空间以探索更广谱的解决方案,随后利用 MaxEnt-Guided Policy Optimization (RL) 强化正确信号,从而显著提升推理性能。基于此方法训练的 VibeThinker-1.5B 模型在多个数学与代码基准上超越了远大于它的闭源和开源大模型,证明了小模型亦可具备媲美甚至超越大型模型的推理能力。

链接: https://arxiv.org/abs/2511.06221
作者: Sen Xu,Yi Zhou,Wei Wang,Jixin Min,Zhibin Yin,Yingwei Dai,Shixi Liu,Lianyu Pang,Yirong Chen,Junlin Zhang
机构: 未知
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Challenging the prevailing consensus that small models inherently lack robust reasoning, this report introduces VibeThinker-1.5B, a 1.5B-parameter dense model developed via our Spectrum-to-Signal Principle (SSP). This challenges the prevailing approach of scaling model parameters to enhance capabilities, as seen in models like DeepSeek R1 (671B) and Kimi k2 (1T). The SSP framework first employs a Two-Stage Diversity-Exploring Distillation (SFT) to generate a broad spectrum of solutions, followed by MaxEnt-Guided Policy Optimization (RL) to amplify the correct signal. With a total training cost of only 7,800, VibeThinker-1.5B demonstrates superior reasoning capabilities compared to closed-source models like Magistral Medium and Claude Opus 4, and performs on par with open-source models like GPT OSS-20B Medium. Remarkably, it surpasses the 400x larger DeepSeek R1 on three math benchmarks: AIME24 (80.3 vs. 79.8), AIME25 (74.4 vs. 70.0), and HMMT25 (50.4 vs. 41.7). This is a substantial improvement over its base model (6.7, 4.3, and 0.6, respectively). On LiveCodeBench V6, it scores 51.1, outperforming Magistral Medium’s 50.3 and its base model’s 0.0. These findings demonstrate that small models can achieve reasoning capabilities comparable to large models, drastically reducing training and inference costs and thereby democratizing advanced AI research.
zh

[NLP-85] Explicit Knowledge-Guided In-Context Learning for Early Detection of Alzheimers Disease

链接: https://arxiv.org/abs/2511.06215
作者: Puzhen Su,Yongzhu Miao,Chunxi Guo,Jintao Tang,Shasha Li,Ting Wang
机构: National University of Defense Technology (国防科技大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: This paper was accepted by IEEE BIBM 2025 conference

点击查看摘要

[NLP-86] Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads

链接: https://arxiv.org/abs/2511.06209
作者: Jingwei Ni,Ekaterina Fadeeva,Tianyi Wu,Mubashara Akhtar,Jiaheng Zhang,Elliott Ash,Markus Leippold,Timothy Baldwin,See-Kiong Ng,Artem Shelmanov,Mrinmaya Sachan
机构: ETH Zurich (苏黎世联邦理工学院); National University of Singapore (新加坡国立大学); MBZUAI (穆罕默德·本·扎耶德人工智能大学); University of Zurich (苏黎世大学); The University of Melbourne (墨尔本大学)
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注: Preprint under review

点击查看摘要

[NLP-87] Enhancing Adversarial Robustness of IoT Intrusion Detection via SHAP-Based Attribution Fingerprinting

【速读】: 该论文旨在解决物联网(IoT)入侵检测系统(IDS)在面对针对人工智能(AI)和机器学习(ML)模型的对抗攻击时,易被欺骗、误分类且防御可靠性下降的问题。其核心解决方案是提出一种基于SHapley Additive exPlanations(SHAP)指纹识别的新型对抗检测模型,关键在于利用SHAP的DeepExplainer从网络流量特征中提取归因指纹(attribution fingerprints),从而有效区分干净输入与对抗扰动输入,显著提升IDS对对抗样本的鲁棒性,并增强模型的可解释性与可信度。

链接: https://arxiv.org/abs/2511.06197
作者: Dilli Prasad Sharma,Liang Xue,Xiaowei Sun,Xiaodong Lin,Pulei Xiong
机构: York University (约克大学); University of Guelph (圭尔夫大学); National Research Council of Canada (加拿大国家研究委员会)
类目: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
备注:

点击查看摘要

Abstract:The rapid proliferation of Internet of Things (IoT) devices has transformed numerous industries by enabling seamless connectivity and data-driven automation. However, this expansion has also exposed IoT networks to increasingly sophisticated security threats, including adversarial attacks targeting artificial intelligence (AI) and machine learning (ML)-based intrusion detection systems (IDS) to deliberately evade detection, induce misclassification, and systematically undermine the reliability and integrity of security defenses. To address these challenges, we propose a novel adversarial detection model that enhances the robustness of IoT IDS against adversarial attacks through SHapley Additive exPlanations (SHAP)-based fingerprinting. Using SHAP’s DeepExplainer, we extract attribution fingerprints from network traffic features, enabling the IDS to reliably distinguish between clean and adversarially perturbed inputs. By capturing subtle attribution patterns, the model becomes more resilient to evasion attempts and adversarial manipulations. We evaluated the model on a standard IoT benchmark dataset, where it significantly outperformed a state-of-the-art method in detecting adversarial attacks. In addition to enhanced robustness, this approach improves model transparency and interpretability, thereby increasing trust in the IDS through explainable AI.
zh

[NLP-88] Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning

链接: https://arxiv.org/abs/2511.06190
作者: Sangmook Lee,Dohyung Kim,Hyukhun Koh,Nakyeong Yang,Kyomin Jung
机构: 未知
类目: Computation and Language (cs.CL)
备注: 7 pages, 5 figures

点击查看摘要

[NLP-89] BookAsSumQA: An Evaluation Framework for Aspect-Based Book Summarization via Question Answering

链接: https://arxiv.org/abs/2511.06183
作者: Ryuhei Miyazato,Ting-Ruen Wei,Xuyang Wu,Hsin-Tai Wu,Kei Harada
机构: The University of Electro-Communications (电波通信大学); Santa Clara University (圣克拉拉大学); DOCOMO Innovations, Inc. (NTT DoCoMo创新公司)
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-90] Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles

【速读】: 该论文旨在解决当前大型语言模型(Large Language Models, LLMs)在复杂逻辑推理任务中隐性社会偏见难以被现有评估基准捕捉的问题。现有安全防护机制虽能抑制显性偏见,但无法识别由社会刻板印象引发的细微偏差,尤其在涉及性别等敏感属性的推理过程中表现突出。解决方案的关键在于提出PRIME(Puzzle Reasoning for Implicit Biases in Model Evaluation)框架,该框架利用逻辑网格谜题(logic grid puzzles)作为测评工具,通过在同一结构下生成具有刻板印象、反刻板印象和中立情境的谜题变体,实现对模型推理准确率的可控对比与量化分析。实验表明,当解题结果符合性别刻板印象时,模型推理准确性显著提升,凸显了PRIME在诊断和度量LLMs在演绎推理中隐性偏见方面的有效性。

链接: https://arxiv.org/abs/2511.06160
作者: Fatima Jahara,Mark Dredze,Sharon Levy
机构: Rutgers University (罗格斯大学); Johns Hopkins University (约翰霍普金斯大学)
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
备注: 24 pages (including appendix)

点击查看摘要

Abstract:While recent safety guardrails effectively suppress overtly biased outputs, subtler forms of social bias emerge during complex logical reasoning tasks that evade current evaluation benchmarks. To fill this gap, we introduce a new evaluation framework, PRIME (Puzzle Reasoning for Implicit Biases in Model Evaluation), that uses logic grid puzzles to systematically probe the influence of social stereotypes on logical reasoning and decision making in LLMs. Our use of logic puzzles enables automatic generation and verification, as well as variability in complexity and biased settings. PRIME includes stereotypical, anti-stereotypical, and neutral puzzle variants generated from a shared puzzle structure, allowing for controlled and fine-grained comparisons. We evaluate multiple model families across puzzle sizes and test the effectiveness of prompt-based mitigation strategies. Focusing our experiments on gender stereotypes, our findings highlight that models consistently reason more accurately when solutions align with stereotypical associations. This demonstrates the significance of PRIME for diagnosing and quantifying social biases perpetuated in the deductive reasoning of LLMs, where fairness is critical.
zh

[NLP-91] Large Language Models Develop Novel Social Biases Through Adaptive Exploration

【速读】: 该论文旨在解决大型语言模型(Large Language Models, LLMs)在决策框架中可能自发产生新型社会偏见的问题,这些偏见即使在初始无差异的人工人口群体中也会出现,并导致任务分配的严重不平等,且随着模型规模增大而加剧。解决方案的关键在于识别并干预“探索-利用权衡”(exploration-exploitation trade-offs)机制——即模型因早期经验过度固化对群体的认知,从而形成系统性偏见。研究发现,通过显式激励模型进行探索(explicitly incentivizing exploration),能最有效地降低任务分配的分层现象,强调需设计更复杂的多维目标函数来缓解此类偏见。

链接: https://arxiv.org/abs/2511.06148
作者: Addison J. Wu,Ryan Liu,Xuechunzi Bai,Thomas L. Griffiths
机构: Princeton University (普林斯顿大学); University of Chicago (芝加哥大学)
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:As large language models (LLMs) are adopted into frameworks that grant them the capacity to make real decisions, it is increasingly important to ensure that they are unbiased. In this paper, we argue that the predominant approach of simply removing existing biases from models is not enough. Using a paradigm from the psychology literature, we demonstrate that LLMs can spontaneously develop novel social biases about artificial demographic groups even when no inherent differences exist. These biases result in highly stratified task allocations, which are less fair than assignments by human participants and are exacerbated by newer and larger models. In social science, emergent biases like these have been shown to result from exploration-exploitation trade-offs, where the decision-maker explores too little, allowing early observations to strongly influence impressions about entire demographic groups. To alleviate this effect, we examine a series of interventions targeting model inputs, problem structure, and explicit steering. We find that explicitly incentivizing exploration most robustly reduces stratification, highlighting the need for better multifaceted objectives to mitigate bias. These results reveal that LLMs are not merely passive mirrors of human social biases, but can actively create new ones from experience, raising urgent questions about how these systems will shape societies over time.
zh

[NLP-92] Referring Expressions as a Lens into Spatial Language Grounding in Vision-Language Models AACL2025

链接: https://arxiv.org/abs/2511.06146
作者: Akshar Tumu,Varad Shinde,Parisa Kordjamshidi
机构: UC San Diego (加州大学圣地亚哥分校); IIT Kanpur (印度理工学院坎普尔分校); Michigan State University (密歇根州立大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted at IJCNLP-AACL 2025

点击查看摘要

[NLP-93] Evaluation of retrieval-based QA on QUEST-LOFT

链接: https://arxiv.org/abs/2511.06125
作者: Nathan Scales,Nathanael Schärli,Olivier Bousquet
机构: Google DeepMind(谷歌深度大脑)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
备注:

点击查看摘要

[NLP-94] Adapting Web Agents with Synthetic Supervision

链接: https://arxiv.org/abs/2511.06101
作者: Zhaoyang Wang,Yiming Liang,Xuchao Zhang,Qianhui Wu,Siwei Han,Anson Bastos,Rujia Wang,Chetan Bansal,Baolin Peng,Jianfeng Gao,Saravan Rajmohan,Huaxiu Yao
机构: UNC-Chapel Hill (北卡罗来纳大学教堂山分校); Purdue University (普渡大学); Microsoft (微软)
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注: 19 pages, 6 figures

点击查看摘要

[NLP-95] MuonAll: Muon Variant for Efficient Finetuning of Large Language Models

链接: https://arxiv.org/abs/2511.06086
作者: Saurabh Page,Advait Joshi,S. S. Sonawane
机构: 未知
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注:

点击查看摘要

[NLP-96] Simulating Students with Large Language Models : A Review of Architecture Mechanisms and Role Modelling in Education with Generative AI

【速读】: 该论文旨在解决如何通过生成式 AI (Generative AI) 构建可模拟多样化学习者行为的虚拟学生(Simulated Students),以系统化评估教学方法、建模认知发展路径与社会行为,从而克服真实教育场景中难以实现的复杂实验设计问题。其解决方案的关键在于将大语言模型(Large Language Models, LLMs)集成到教育研究中,利用其高度的语言真实性与行为适应性,使模拟代理能够逼近人类认知过程并开展情境适配的教学对话,进而支持课程开发、教学评价和教师培训等应用。

链接: https://arxiv.org/abs/2511.06078
作者: Luis Marquez-Carpintero,Alberto Lopez-Sellers,Miguel Cazorla
机构: University of Alicante (阿利坎特大学)
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Simulated Students offer a valuable methodological framework for evaluating pedagogical approaches and modelling diverse learner profiles, tasks which are otherwise challenging to undertake systematically in real-world settings. Recent research has increasingly focused on developing such simulated agents to capture a range of learning styles, cognitive development pathways, and social behaviours. Among contemporary simulation techniques, the integration of large language models (LLMs) into educational research has emerged as a particularly versatile and scalable paradigm. LLMs afford a high degree of linguistic realism and behavioural adaptability, enabling agents to approximate cognitive processes and engage in contextually appropriate pedagogical dialogues. This paper presents a thematic review of empirical and methodological studies utilising LLMs to simulate student behaviour across educational environments. We synthesise current evidence on the capacity of LLM-based agents to emulate learner archetypes, respond to instructional inputs, and interact within multi-agent classroom scenarios. Furthermore, we examine the implications of such systems for curriculum development, instructional evaluation, and teacher training. While LLMs surpass rule-based systems in natural language generation and situational flexibility, ongoing concerns persist regarding algorithmic bias, evaluation reliability, and alignment with educational objectives. The review identifies existing technological and methodological gaps and proposes future research directions for integrating generative AI into adaptive learning systems and instructional design.
zh

[NLP-97] Stemming Hallucination in Language Models Using a Licensing Oracle ACL

链接: https://arxiv.org/abs/2511.06073
作者: Simeon Emanuilov,Richard Ackermann
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
备注: 23 pages, 4 figures, 8 tables. Introduces the Licensing Oracle, an architectural solution for eliminating hallucinations in language models through formal SHACL validation against knowledge graphs. All datasets and models are available at this https URL

点击查看摘要

[NLP-98] Automating Hardware Design and Verification from Architectural Papers via a Neural-Symbolic Graph Framework

【速读】: 该论文旨在解决从学术论文中复现硬件架构的难题,主要挑战在于缺乏公开的源代码以及硬件描述语言(HDL)的复杂性。解决方案的关键在于提出了一种名为ArchCraft的框架,该框架通过形式化图结构捕获架构蓝图(Architectural Blueprint),并利用符号定义功能规范(Functional Specification),将非结构化的学术论文转化为可综合的Verilog项目及寄存器传输级(RTL)验证代码。其核心创新在于解耦生成的RTL与测试平台(TB)代码,并通过结构化工作流实现自动化的电路设计、验证与PPA(功耗、面积、性能)评估,从而显著提升论文理解与代码生成的准确性与可靠性。

链接: https://arxiv.org/abs/2511.06067
作者: Haoyue Yang,Xuanle Zhao,Yujie Liu,Zhuojun Zou,Kailin Lyu,Changchun Zhou,Yao Zhu,Jie Hao
机构: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所); Peking University (北京大学); Zhejiang University (浙江大学)
类目: Computation and Language (cs.CL); Software Engineering (cs.SE)
备注: Preprint Version, Work in Progress

点击查看摘要

Abstract:The reproduction of hardware architectures from academic papers remains a significant challenge due to the lack of publicly available source code and the complexity of hardware description languages (HDLs). To this end, we propose \textbfArchCraft, a Framework that converts abstract architectural descriptions from academic papers into synthesizable Verilog projects with register-transfer level (RTL) verification. ArchCraft introduces a structured workflow, which uses formal graphs to capture the Architectural Blueprint and symbols to define the Functional Specification, translating unstructured academic papers into verifiable, hardware-aware designs. The framework then generates RTL and testbench (TB) code decoupled via these symbols to facilitate verification and debugging, ultimately reporting the circuit’s Power, Area, and Performance (PPA). Moreover, we propose the first benchmark, \textbfArchSynthBench, for synthesizing hardware from architectural descriptions, with a complete set of evaluation indicators, 50 project-level circuits, and around 600 circuit blocks. We systematically assess ArchCraft on ArchSynthBench, where the experiment results demonstrate the superiority of our proposed method, surpassing direct generation methods and the VerilogCoder framework in both paper understanding and code completion. Furthermore, evaluation and physical implementation of the generated executable RTL code show that these implementations meet all timing constraints without violations, and their performance metrics are consistent with those reported in the original papers.
zh

[NLP-99] ScRPO: From Errors to Insights

【速读】: 该论文旨在解决大语言模型在处理复杂数学推理任务时表现不足的问题,尤其在缺乏外部反馈的情况下难以有效自我改进。解决方案的关键在于提出一种名为Self-correction Relative Policy Optimization (ScRPO) 的强化学习框架,其核心机制包括两个阶段:第一阶段通过GRPO(Generalized Relative Policy Optimization)进行试错学习,并将错误答案及其对应问题存储至错误池;第二阶段则引导模型对先前错误进行自省与修正,从而实现基于内部反馈的自我优化。实验表明,该方法在多个数学推理基准测试中均优于现有后训练方法,展现出在有限外部监督下提升模型可靠性和能力的潜力。

链接: https://arxiv.org/abs/2511.06065
作者: Lianrui Li,Dakuan Lu,Jiawei Shao,Chi Zhang,Xuelong Li
机构: Institute of Artificial Intelligence (TeleAI), China Telecom (中国电信人工智能研究院)
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:We propose Self-correction Relative Policy Optimization (ScRPO), a novel reinforcement learning framework designed to enhance large language models on challenging mathemati- cal problems by leveraging self-reflection and error correction. Our approach consists of two stages: (1) Trial-and-error learning stage: training the model with GRPO and collect- ing incorrect answers along with their cor- responding questions in an error pool; (2) Self-correction learning stage: guiding the model to reflect on why its previous an- swers were wrong. Extensive experiments across multiple math reasoning benchmarks, including AIME, AMC, Olympiad, MATH- 500, GSM8k, using Deepseek-Distill-Qwen- 1.5B and Deepseek-Distill-Qwen-7B. The ex- perimental results demonstrate that ScRPO consistently outperforms several post-training methods. These findings highlight ScRPO as a promising paradigm for enabling language models to self-improve on difficult tasks with limited external feedback, paving the way to- ward more reliable and capable AI systems.
zh

[NLP-100] ReMoD: Rethinking Modality Contribution in Multimodal Stance Detection via Dual Reasoning

【速读】: 该论文旨在解决多模态立场检测(Multimodal Stance Detection, MSD)中因粗暴融合不同模态信息而导致的立场理解噪声问题,即现有方法忽视了各模态在表达立场时贡献不均的问题,从而可能引入学习误差。其解决方案的关键在于提出一种基于双推理范式的框架 ReMoD(ReThink Modality contribution via Dual-reasoning),通过“经验驱动的直觉推理”和“ deliberate 反思推理”两个阶段动态调整模态权重:首先利用模态经验池(Modality Experience Pool, MEP)与语义经验池(Semantic Experience Pool, SEP)形成初始立场假设,随后通过模态思维链(Modality-CoT)和语义思维链(Semantic-CoT)分别优化模态融合策略与语义上下文理解,实现对模态贡献的自适应调节,从而提升模型在复杂场景下的鲁棒性和泛化能力。

链接: https://arxiv.org/abs/2511.06057
作者: Bingbing Wang,Zhengda Jin,Bin Liang,Jing Li,Ruifeng Xu
机构: 未知
类目: Computation and Language (cs.CL); Multimedia (cs.MM)
备注:

点击查看摘要

Abstract:Multimodal Stance Detection (MSD) is a crucial task for understanding public opinion on social media. Existing work simply fuses information from various modalities to learn stance representations, overlooking the varying contributions of stance expression from different modalities. Therefore, stance misunderstanding noises may be drawn into the stance learning process due to the risk of learning errors by rough modality combination. To address this, we get inspiration from the dual-process theory of human cognition and propose ReMoD, a framework that Rethinks Modality contribution of stance expression through a Dual-reasoning paradigm. ReMoD integrates experience-driven intuitive reasoning to capture initial stance cues with deliberate reflective reasoning to adjust for modality biases, refine stance judgments, and thereby dynamically weight modality contributions based on their actual expressive power for the target stance. Specifically, the intuitive stage queries the Modality Experience Pool (MEP) and Semantic Experience Pool (SEP) to form an initial stance hypothesis, prioritizing historically impactful modalities. This hypothesis is then refined in the reflective stage via two reasoning chains: Modality-CoT updates MEP with adaptive fusion strategies to amplify relevant modalities, while Semantic-CoT refines SEP with deeper contextual insights of stance semantics. These dual experience structures are continuously refined during training and recalled at inference to guide robust and context-aware stance decisions. Extensive experiments on the public MMSD benchmark demonstrate that our ReMoD significantly outperforms most baseline models and exhibits strong generalization capabilities.
zh

[NLP-101] Efficient Hate Speech Detection: A Three-Layer LoRA-Tuned BERTweet Framework

【速读】: 该论文旨在解决仇恨言论检测系统在保持高精度的同时实现计算效率提升的问题,以支持实时部署场景。其核心解决方案是提出一种三层架构:首先通过规则-based预过滤减少冗余计算;其次采用LoRA(Low-Rank Adaptation)微调的BERTweet模型,在仅需1.87M可训练参数(仅为全量微调的1.37%)的情况下实现高效参数更新;最后引入持续学习机制以适应动态数据变化。该方法在仅使用134M参数(相比主流模型如SafePhi的14B参数小100倍)的前提下,达到0.85宏F1分数,性能为先进大模型的94%,且训练时间缩短至单张T4 GPU上约2小时,显著提升了资源受限环境下的实用性与部署可行性。

链接: https://arxiv.org/abs/2511.06051
作者: Mahmoud El-Bahnasawi
机构: Zewail City of Science and Technology (扎维尔科学与技术城)
类目: Computation and Language (cs.CL)
备注: 13 pages, 2 figures

点击查看摘要

Abstract:This paper addresses the critical challenge of developing computationally efficient hate speech detection systems that maintain competitive performance while being practical for real-time deployment. We propose a novel three-layer framework that combines rule-based pre-filtering with a parameter-efficient LoRA-tuned BERTweet model and continuous learning capabilities. Our approach achieves 0.85 macro F1 score - representing 94% of the performance of state-of-the-art large language models like SafePhi (Phi-4 based) while using a base model that is 100x smaller (134M vs 14B parameters). Compared to traditional BERT-based approaches with similar computational requirements, our method demonstrates superior performance through strategic dataset unification and optimized fine-tuning. The system requires only 1.87M trainable parameters (1.37% of full fine-tuning) and trains in approximately 2 hours on a single T4 GPU, making robust hate speech detection accessible in resource-constrained environments while maintaining competitive accuracy for real-world deployment.
zh

[NLP-102] Visual Exploration of Feature Relationships in Sparse Autoencoders with Curated Concepts NEURIPS2025

链接: https://arxiv.org/abs/2511.06048
作者: Xinyuan Yan,Shusen Liu,Kowshik Thopalli,Bei Wang
机构: University of Utah (犹他大学); Lawrence Livermore National Laboratory (劳伦斯利弗莫尔国家实验室)
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: 8 pages (5 main paper+3 refernce), 2 figures, pulished at Mechanistic Interpretability Workshop at NeurIPS 2025

点击查看摘要

[NLP-103] Multi-Reward GRPO Fine-Tuning for De-biasing Large Language Models : A Study Based on Chinese-Context Discrimination Data

【速读】: 该论文旨在解决大语言模型(Large Language Models, LLMs)中存在的隐性偏见与歧视倾向问题,尤其是那些反映社会刻板印象的文化特定性和多维歧视。现有对齐技术如基于人类反馈的强化学习(Reinforcement Learning from Human Feedback, RLHF)和直接偏好优化(Direct Preference Optimization, DPO)在缓解此类问题上存在局限性。解决方案的关键在于提出一种多奖励组相对策略优化(Multi-Reward Group Relative Policy Optimization, GRPO)框架:首先构建一个源自中文语境的合成英文数据集,涵盖地域、民族和职业等维度的偏见类别;其次利用DeBERTa-v3训练一个具备多维奖励信号(公平性、中立性和语言质量)的奖励模型;最后通过该模型引导GRPO进行细粒度的策略优化,从而实现模型输出在伦理维度上的去偏目标。实验表明,该方法显著降低了偏见强度,并保持了语言流畅性和信息丰富性,为跨文化语境下的伦理对齐提供了可复现的技术路径。

链接: https://arxiv.org/abs/2511.06023
作者: Deng Yixuan,Ji Xiaoqiang
机构: The Chinese University of Hong Kong, Shenzhen (香港中文大学(深圳)); Shenzhen Institute of Artificial Intelligence and Robotics for Society (深圳市人工智能与机器人社会研究院)
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Large Language Models (LLMs) often exhibit implicit biases and discriminatory tendencies that reflect underlying social stereotypes. While recent alignment techniques such as RLHF and DPO have mitigated some of these issues, they remain limited in addressing culturally specific and multi-dimensional forms of discrimination. This paper proposes a Multi-Reward Group Relative Policy Optimization (GRPO) framework to fine-tune LLMs toward ethical and bias-free behavior. Our approach constructs a synthetic English-language dataset derived from Chinese-context discrimination categories, including regional, ethnic, and occupational biases. Each instance is paired with both neutral and biased responses to train a reward model based on DeBERTa-v3, which provides multi-dimensional reward signals capturing fairness, neutrality, and linguistic quality. The trained reward model then guides GRPO fine-tuning to optimize model outputs along these ethical dimensions. Experimental results demonstrate significant reductions in bias intensity and improved alignment with non-discriminatory standards without compromising fluency or informativeness. This study highlights the effectiveness of GRPO-based multi-reward optimization for de-biasing LLMs and offers a replicable framework for cultural-contextual ethical alignment.
zh

[NLP-104] LLM s Do Not See Age: Assessing Demographic Bias in Automated Systematic Review Synthesis AACL2025

【速读】: 该论文旨在解决当前生成式 AI (Generative AI) 在生物医学证据合成任务中对年龄相关人口特征的保留不足问题,即语言模型在生成摘要时可能忽略或错误处理儿童、成人和老年人群的差异,从而导致潜在的偏见和不准确信息传递。解决方案的关键在于构建了一个新型的分年龄层数据集 DemogSummary,涵盖儿童、成人和老年群体的系统评价原始研究,并引入一种新的评估指标——人口学显著性评分(Demographic Salience Score, DSS),用于量化年龄相关实体的保留程度与幻觉情况。通过该方法,研究发现不同模型在不同年龄群体上的表现存在系统性差异,尤其成人群体的摘要质量最低,而被忽视的人群更容易出现幻觉,揭示了现有大语言模型在生物医学自然语言处理(NLP)中的公平性缺陷,强调需建立面向公平性的评估框架与摘要流程。

链接: https://arxiv.org/abs/2511.06000
作者: Favour Yahdii Aghaebe,Tanefa Apekey,Elizabeth Williams,Nafise Sadat Moosavi
机构: University of Sheffield (谢菲尔德大学)
类目: Computation and Language (cs.CL)
备注: Accepted at AACL 2025

点击查看摘要

Abstract:Clinical interventions often hinge on age: medications and procedures safe for adults may be harmful to children or ineffective for older adults. However, as language models are increasingly integrated into biomedical evidence synthesis workflows, it remains uncertain whether these systems preserve such crucial demographic distinctions. To address this gap, we evaluate how well state-of-the-art language models retain age-related information when generating abstractive summaries of biomedical studies. We construct DemogSummary, a novel age-stratified dataset of systematic review primary studies, covering child, adult, and older adult populations. We evaluate three prominent summarisation-capable LLMs, Qwen (open-source), Longformer (open-source) and GPT-4.1 Nano (proprietary), using both standard metrics and a newly proposed Demographic Salience Score (DSS), which quantifies age-related entity retention and hallucination. Our results reveal systematic disparities across models and age groups: demographic fidelity is lowest for adult-focused summaries, and under-represented populations are more prone to hallucinations. These findings highlight the limitations of current LLMs in faithful and bias-free summarisation and point to the need for fairness-aware evaluation frameworks and summarisation pipelines in biomedical NLP.
zh

[NLP-105] Revisiting Entropy in Reinforcement Learning for Large Reasoning Models

【速读】: 该论文旨在解决强化学习中可验证奖励(Reinforcement Learning with Verifiable Rewards, RLVR)训练过程中大语言模型(Large Language Models, LLMs)熵坍缩(entropy collapse)的问题,该现象会导致模型过早收敛至次优局部极小值,阻碍性能进一步提升。解决方案的关键在于识别并调控导致熵坍缩的核心机制:研究发现,具有正优势(positive advantages)的token是熵坍缩的主要贡献者,通过在优化目标中调整正负优势token的相对损失权重,可以有效控制模型熵,从而缓解熵坍缩问题并提升模型在多个基准测试中的表现与响应多样性。

链接: https://arxiv.org/abs/2511.05993
作者: Renren Jin,Pengzhi Gao,Yuqi Ren,Zhuowen Han,Tongxuan Zhang,Wuwei Huang,Wei Liu,Jian Luan,Deyi Xiong
机构: Tianjin University (天津大学); Tianjin Normal University (天津师范大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 16 pages, 11 figures, 3 tables

点击查看摘要

Abstract:Reinforcement learning with verifiable rewards (RLVR) has emerged as a predominant approach for enhancing the reasoning capabilities of large language models (LLMs). However, the entropy of LLMs usually collapses during RLVR training, causing premature convergence to suboptimal local minima and hinder further performance improvement. Although various approaches have been proposed to mitigate entropy collapse, a comprehensive study of entropy in RLVR remains lacking. To address this gap, we conduct extensive experiments to investigate the entropy dynamics of LLMs trained with RLVR and analyze how model entropy correlates with response diversity, calibration, and performance across various benchmarks. Our findings reveal that the number of off-policy updates, the diversity of training data, and the clipping thresholds in the optimization objective are critical factors influencing the entropy of LLMs trained with RLVR. Moreover, we theoretically and empirically demonstrate that tokens with positive advantages are the primary contributors to entropy collapse, and that model entropy can be effectively regulated by adjusting the relative loss weights of tokens with positive and negative advantages during training.
zh

[NLP-106] Interpretable Recognition of Cognitive Distortions in Natural Language Texts

链接: https://arxiv.org/abs/2511.05969
作者: Anton Kolonin,Anna Arinicheva
机构: Novosibirsk State University (新西伯利亚国立大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
备注: 9 pages, 4 figures

点击查看摘要

[NLP-107] Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLM s

【速读】: 该论文试图解决的问题是:强化学习(Reinforcement Learning, RL)增强的语言模型在提升推理与泛化能力的同时,是否会导致记忆知识的退化。传统观点认为RL会损害模型对已学知识的保留,但本文通过实证发现,RL增强模型在纯知识召回任务中反而优于基础模型和监督微调(Supervised Fine-Tuning, SFT)模型,尤其是在需要遍历结构化知识层级(如医学编码)的任务上表现突出。解决方案的关键在于提出并验证了一个新假设:RL带来的性能提升并非源于新增数据,而是改善了模型在已有参数中导航和搜索知识层级的程序性技能(procedural skills)。这一假设通过结构化提示(structured prompting)实验得到支持——该方法引导SFT模型进行类似RL模型的知识层级遍历,可显著缩小性能差距;进一步的层间激活分析也表明,RL主要改变的是查询表示(query representations)的路径搜索方式,而非事实表征(factual representations)本身。

链接: https://arxiv.org/abs/2511.05933
作者: Renfei Zhang,Manasa Kaniselvan,Niloofar Mireshghallah
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: `

点击查看摘要

Abstract:Reinforcement learning (RL) is often credited with improving language model reasoning and generalization at the expense of degrading memorized knowledge. We challenge this narrative by observing that RL-enhanced models consistently outperform their base and supervised fine-tuned (SFT) counterparts on pure knowledge recall tasks, particularly those requiring traversal of hierarchical, structured knowledge (e.g., medical codes). We hypothesize these gains stem not from newly acquired data, but from improved procedural skills in navigating and searching existing knowledge hierarchies within the model parameters. To support this hypothesis, we show that structured prompting, which explicitly guides SFTed models through hierarchical traversal, recovers most of the performance gap (reducing 24pp to 7pp on MedConceptsQA for DeepSeek-V3/R1). We further find that while prompting improves final-answer accuracy, RL-enhanced models retain superior ability to recall correct procedural paths on deep-retrieval tasks. Finally our layer-wise internal activation analysis reveals that while factual representations (e.g., activations for the statement “code 57.95 refers to urinary infection”) maintain high cosine similarity between SFT and RL models, query representations (e.g., “what is code 57.95”) diverge noticeably, indicating that RL primarily transforms how models traverse knowledge rather than the knowledge representation itself.
zh

[NLP-108] IDALC: A Semi-Supervised Framework for Intent Detection and Active Learning based Correction

【速读】: 该论文旨在解决语音控制对话系统中因模型置信度低导致的用户意图识别失败问题,以及系统拒绝的语句在后续迭代中需要重新标注以引入新意图时所面临的高人工标注成本问题。解决方案的关键在于提出一种基于主动学习(Active Learning)的半监督框架IDALC(Intent Detection and Active Learning based Correction),通过智能筛选最具信息量的未标注样本进行人工标注,从而显著降低整体标注需求(仅需6–10%的未标注数据),同时提升意图检测准确率与宏平均F1分数(相比基线方法提升5–10%和4–8%)。

链接: https://arxiv.org/abs/2511.05921
作者: Ankan Mullick,Sukannya Purkayastha,Saransh Sharma,Pawan Goyal,Niloy Ganguly
机构: IIT Kharagpur(印度理工学院克哈格普尔分校); Technische Universität Darmstadt(达姆施塔特工业大学); Adobe Research(Adobe研究院)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: Paper accepted in IEEE Transactions on Artificial Intelligence (October 2025)

点击查看摘要

Abstract:Voice-controlled dialog systems have become immensely popular due to their ability to perform a wide range of actions in response to diverse user queries. These agents possess a predefined set of skills or intents to fulfill specific user tasks. But every system has its own limitations. There are instances where, even for known intents, if any model exhibits low confidence, it results in rejection of utterances that necessitate manual annotation. Additionally, as time progresses, there may be a need to retrain these agents with new intents from the system-rejected queries to carry out additional tasks. Labeling all these emerging intents and rejected utterances over time is impractical, thus calling for an efficient mechanism to reduce annotation costs. In this paper, we introduce IDALC (Intent Detection and Active Learning based Correction), a semi-supervised framework designed to detect user intents and rectify system-rejected utterances while minimizing the need for human annotation. Empirical findings on various benchmark datasets demonstrate that our system surpasses baseline methods, achieving a 5-10% higher accuracy and a 4-8% improvement in macro-F1. Remarkably, we maintain the overall annotation cost at just 6-10% of the unlabelled data available to the system. The overall framework of IDALC is shown in Fig. 1
zh

[NLP-109] Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLM s

【速读】: 该论文旨在解决大语言模型(Large Language Models, LLMs)在信息检索场景中因提示注入(prompt injection)攻击导致的事实记忆错误问题,尤其是针对中间人(man-in-the-middle, MitM)攻击下LLMs生成答案的可靠性与不确定性。其关键解决方案是提出Xmera框架——一个基于理论的MitM攻击评估工具,通过在三个封闭式、基于事实的问答任务中对受害LLM输入进行扰动,量化响应正确性与生成过程中的不确定性;研究发现,简单的指令类攻击成功率高达约85.3%,且错误回答时伴随高不确定性;据此,作者进一步利用随机森林分类器基于响应不确定性水平实现高效防御(平均AUC达~96%),从而为用户识别潜在恶意LLM输出提供初步安全预警机制。

链接: https://arxiv.org/abs/2511.05919
作者: Alina Fastowski,Bardh Prenkaj,Yuxiao Li,Gjergji Kasneci
机构: 未知
类目: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:LLMs are now an integral part of information retrieval. As such, their role as question answering chatbots raises significant concerns due to their shown vulnerability to adversarial man-in-the-middle (MitM) attacks. Here, we propose the first principled attack evaluation on LLM factual memory under prompt injection via Xmera, our novel, theory-grounded MitM framework. By perturbing the input given to “victim” LLMs in three closed-book and fact-based QA settings, we undermine the correctness of the responses and assess the uncertainty of their generation process. Surprisingly, trivial instruction-based attacks report the highest success rate (up to ~85.3%) while simultaneously having a high uncertainty for incorrectly answered questions. To provide a simple defense mechanism against Xmera, we train Random Forest classifiers on the response uncertainty levels to distinguish between attacked and unattacked queries (average AUC of up to ~96%). We believe that signaling users to be cautious about the answers they receive from black-box and potentially corrupt LLMs is a first checkpoint toward user cyberspace safety.
zh

[NLP-110] NILC: Discovering New Intents with LLM -assisted Clustering

【速读】: 该论文旨在解决新意图发现(New Intent Discovery, NID)任务中现有级联式架构的局限性问题,即文本嵌入与聚类步骤缺乏相互反馈优化,且仅依赖嵌入空间聚类忽视了语义细微差别,导致性能不佳。其解决方案的关键在于提出一种名为NILC的新颖聚类框架,采用迭代流程:首先利用大语言模型(Large Language Models, LLMs)生成额外的语义中心点以增强欧氏空间中的聚类中心语义表示;其次,通过LLMs对模糊或简短样本进行重写以扩充难样本,用于后续聚类修正;同时引入非平凡的种子设置和软必须链接(soft must links)监督信号,在半监督场景下提升NID精度。该方法有效实现了嵌入与聚类的协同优化,显著提升了跨领域基准数据集上的性能表现。

链接: https://arxiv.org/abs/2511.05913
作者: Hongtao Wang,Renchi Yang,Wenqing Lin
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:New intent discovery (NID) seeks to recognize both new and known intents from unlabeled user utterances, which finds prevalent use in practical dialogue systems. Existing works towards NID mainly adopt a cascaded architecture, wherein the first stage focuses on encoding the utterances into informative text embeddings beforehand, while the latter is to group similar embeddings into clusters (i.e., intents), typically by K-Means. However, such a cascaded pipeline fails to leverage the feedback from both steps for mutual refinement, and, meanwhile, the embedding-only clustering overlooks nuanced textual semantics, leading to suboptimal performance. To bridge this gap, this paper proposes NILC, a novel clustering framework specially catered for effective NID. Particularly, NILC follows an iterative workflow, in which clustering assignments are judiciously updated by carefully refining cluster centroids and text embeddings of uncertain utterances with the aid of large language models (LLMs). Specifically, NILC first taps into LLMs to create additional semantic centroids for clusters, thereby enriching the contextual semantics of the Euclidean centroids of embeddings. Moreover, LLMs are then harnessed to augment hard samples (ambiguous or terse utterances) identified from clusters via rewriting for subsequent cluster correction. Further, we inject supervision signals through non-trivial techniques seeding and soft must links for more accurate NID in the semi-supervised setting. Extensive experiments comparing NILC against multiple recent baselines under both unsupervised and semi-supervised settings showcase that NILC can achieve significant performance improvements over six benchmark datasets of diverse domains consistently.
zh

[NLP-111] he Imperfect Learner: Incorporating Developmental Trajectories in Memory-based Student Simulation

链接: https://arxiv.org/abs/2511.05903
作者: Zhengyuan Liu,Stella Xin Yin,Bryan Chen Zhengyu Tan,Roy Ka-Wei Lee,Guimei Liu,Dion Hoe-Lian Goh,Wenya Wang,Nancy F. Chen
机构: 未知
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
备注:

点击查看摘要

[NLP-112] Retrieval-Augmented Generation in Medicine: A Scoping Review of Technical Implementations Clinical Applications and Ethical Considerations

【速读】: 该论文旨在解决医疗领域中因医学知识快速增长和临床实践日益复杂所带来的挑战,尤其是在大语言模型(Large Language Models, LLMs)临床应用中存在的固有局限性问题。其核心解决方案是采用检索增强生成(Retrieval-Augmented Generation, RAG)技术,通过结合外部知识检索与生成能力,提升LLMs在医疗场景下的准确性、相关性和可靠性。研究指出,当前RAG在医学领域的应用仍处于早期阶段,关键突破点在于加强临床验证、实现跨语言适应能力以及支持低资源环境,以推动其在全球范围内可信且负责任的部署。

链接: https://arxiv.org/abs/2511.05901
作者: Rui Yang,Matthew Yu Heng Wong,Huitao Li,Xin Li,Wentao Zhu,Jingchi Liao,Kunyu Yu,Jonathan Chong Kai Liew,Weihao Xuan,Yingjian Chen,Yuhe Ke,Jasmine Chiat Ling Ong,Douglas Teodoro,Chuan Hong,Daniel Shi Wei Ting,Nan Liu
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:The rapid growth of medical knowledge and increasing complexity of clinical practice pose challenges. In this context, large language models (LLMs) have demonstrated value; however, inherent limitations remain. Retrieval-augmented generation (RAG) technologies show potential to enhance their clinical applicability. This study reviewed RAG applications in medicine. We found that research primarily relied on publicly available data, with limited application in private data. For retrieval, approaches commonly relied on English-centric embedding models, while LLMs were mostly generic, with limited use of medical-specific LLMs. For evaluation, automated metrics evaluated generation quality and task performance, whereas human evaluation focused on accuracy, completeness, relevance, and fluency, with insufficient attention to bias and safety. RAG applications were concentrated on question answering, report generation, text summarization, and information extraction. Overall, medical RAG remains at an early stage, requiring advances in clinical validation, cross-linguistic adaptation, and support for low-resource settings to enable trustworthy and responsible global use.
zh

[NLP-113] MCP-RiskCue: Can LLM infer risk information from MCP server System Logs?

链接: https://arxiv.org/abs/2511.05867
作者: Jiayi Fu,Qiyao Sun
机构: Southern University of Science and Technology (南方科技大学); Beijing University of Posts and Telecommunications (北京邮电大学)
类目: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-114] Quantifying Edits Decay in Fine-tuned LLM s ICLR2026

【速读】: 该论文旨在解决知识编辑(Knowledge Editing, KE)与微调(Fine-tuning)协同使用时的兼容性问题,即在对已编辑的大语言模型(Large Language Models, LLMs)进行微调后,原有编辑内容是否能够保留。这一问题直接影响到模型部署的安全性和成本效率:若编辑失效,则需重复编辑;若编辑残留,则可能传播恶意信息。解决方案的关键在于系统性地量化编辑衰减(edit decay)现象,发现不同编辑方法(如MEMIT、AlphaEdit)和微调策略(如全参数微调、LoRA、DoRA)对编辑持久性的影响,并提出选择性层微调(selective-layer fine-tuning)策略——仅微调被编辑的层可有效移除编辑,同时保持下游任务性能损失最小;更意外的是,微调未编辑层反而比全参数微调造成更大的编辑破坏,揭示了模型内部结构对编辑稳定性的关键作用。

链接: https://arxiv.org/abs/2511.05852
作者: Yinjie Cheng,Paul Youssef,Christin Seifert,Jörg Schlötterer,Zhixue Zhao
机构: University of Sheffield (谢菲尔德大学); University of Marburg (马尔堡大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: Under review at ICLR 2026

点击查看摘要

Abstract:Knowledge editing has emerged as a lightweight alternative to retraining for correcting or injecting specific facts in large language models (LLMs). Meanwhile, fine-tuning remains the default operation for adapting LLMs to new domains and tasks. Despite their widespread adoption, these two post-training interventions have been studied in isolation, leaving open a crucial question: if we fine-tune an edited model, do the edits survive? This question is motivated by two practical scenarios: removing covert or malicious edits, and preserving beneficial edits. If fine-tuning impairs edits as shown in Figure 1, current KE methods become less useful, as every fine-tuned model would require re-editing, which significantly increases the cost; if edits persist, fine-tuned models risk propagating hidden malicious edits, raising serious safety concerns. To this end, we systematically quantify edits decay after fine-tuning, investigating how fine-tuning affects knowledge editing. We evaluate two state-of-the-art editing methods (MEMIT, AlphaEdit) and three fine-tuning approaches (full-parameter, LoRA, DoRA) across five LLMs and three datasets, yielding 232 experimental configurations. Our results show that edits decay after fine-tuning, with survival varying across configurations, e.g., AlphaEdit edits decay more than MEMIT edits. Further, we propose selective-layer fine-tuning and find that fine-tuning edited layers only can effectively remove edits, though at a slight cost to downstream performance. Surprisingly, fine-tuning non-edited layers impairs more edits than full fine-tuning. Overall, our study establishes empirical baselines and actionable strategies for integrating knowledge editing with fine-tuning, and underscores that evaluating model editing requires considering the full LLM application pipeline.
zh

[NLP-115] DiagnoLLM : A Hybrid Bayesian Neural Language Framework for Interpretable Disease Diagnosis

【速读】: 该论文旨在解决临床人工智能(AI)系统在疾病诊断中面临的可信赖性问题,即如何在保证高预测准确性的同时提供透明且具有生物学依据的解释。其解决方案的关键在于构建一个混合框架DiagnoLLM,该框架融合了贝叶斯去卷积(Bayesian deconvolution)、eQTL引导的深度学习以及大语言模型(Large Language Model, LLM)驱动的叙事生成模块:首先通过GP-unmix模型从批量和单细胞RNA测序数据中推断细胞类型特异性的基因表达谱并量化生物不确定性;随后利用eQTL分析提供的调控先验信息训练神经分类器以实现阿尔茨海默病(Alzheimer’s Disease, AD)的高效检测(准确率达88.0%);最后借助LLM作为后处理推理模块,将模型输出转化为面向医生和患者的结构化诊断报告,确保内容基于临床特征、归因信号及领域知识,从而增强人类对系统的理解与信任。

链接: https://arxiv.org/abs/2511.05810
作者: Bowen Xu,Xinyue Zeng,Jiazhen Hu,Tuo Wang,Adithya Kulkarni
机构: 未知
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:Building trustworthy clinical AI systems requires not only accurate predictions but also transparent, biologically grounded explanations. We present \textttDiagnoLLM, a hybrid framework that integrates Bayesian deconvolution, eQTL-guided deep learning, and LLM-based narrative generation for interpretable disease diagnosis. DiagnoLLM begins with GP-unmix, a Gaussian Process-based hierarchical model that infers cell-type-specific gene expression profiles from bulk and single-cell RNA-seq data while modeling biological uncertainty. These features, combined with regulatory priors from eQTL analysis, power a neural classifier that achieves high predictive performance in Alzheimer’s Disease (AD) detection (88.0% accuracy). To support human understanding and trust, we introduce an LLM-based reasoning module that translates model outputs into audience-specific diagnostic reports, grounded in clinical features, attribution signals, and domain knowledge. Human evaluations confirm that these reports are accurate, actionable, and appropriately tailored for both physicians and patients. Our findings show that LLMs, when deployed as post-hoc reasoners rather than end-to-end predictors, can serve as effective communicators within hybrid diagnostic pipelines.
zh

[NLP-116] DRAG ON: Guard LLM Unlearning in Context via Negative Detection and Reasoning ICML2025 NEURIPS2025

【速读】: 该论文旨在解决大型语言模型(Large Language Models, LLMs)在实际部署中难以有效实现“遗忘”(unlearning)的问题,尤其是在缺乏保留数据(retain data)的情况下,传统基于微调的方法无法平衡遗忘效率与通用语言能力。其解决方案的关键在于提出一种名为Detect-Reasoning Augmented GeneratiON (DRAGON) 的系统性框架,该框架不修改基础模型,而是利用LLM固有的指令遵循能力,通过引入轻量级检测模块识别需遗忘的提示(forget-worthy prompts),并借助上下文链式思维(in-context chain-of-thought, CoT)引导的专用防护模型进行安全且精准的干预,从而实现无需保留数据即可高效、可扩展地完成遗忘任务。

链接: https://arxiv.org/abs/2511.05784
作者: Yaxuan Wang,Chris Yuhao Liu,Quan Liu,Jinglong Pang,Wei Wei,Yujia Bao,Yang Liu
机构: University of California, Santa Cruz (加州大学圣克鲁兹分校); Center for Advanced AI, Accenture (埃森哲高级人工智能中心)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: Please refer to the NeurIPS 2025 submission: this https URL The paper has been accepted to the ICML 2025 MUGen Workshop: this https URL

点击查看摘要

Abstract:Unlearning in Large Language Models (LLMs) is crucial for protecting private data and removing harmful knowledge. Most existing approaches rely on fine-tuning to balance unlearning efficiency with general language capabilities. However, these methods typically require training or access to retain data, which is often unavailable in real world scenarios. Although these methods can perform well when both forget and retain data are available, few works have demonstrated equivalent capability in more practical, data-limited scenarios. To overcome these limitations, we propose Detect-Reasoning Augmented GeneratiON (DRAGON), a systematic, reasoning-based framework that utilizes in-context chain-of-thought (CoT) instructions to guard deployed LLMs before inference. Instead of modifying the base model, DRAGON leverages the inherent instruction-following ability of LLMs and introduces a lightweight detection module to identify forget-worthy prompts without any retain data. These are then routed through a dedicated CoT guard model to enforce safe and accurate in-context intervention. To robustly evaluate unlearning performance, we introduce novel metrics for unlearning performance and the continual unlearning setting. Extensive experiments across three representative unlearning tasks validate the effectiveness of DRAGON, demonstrating its strong unlearning capability, scalability, and applicability in practical scenarios.
zh

[NLP-117] Anchors in the Machine: Behavioral and Attributional Evidence of Anchoring Bias in LLM s

【速读】: 该论文旨在解决大语言模型(Large Language Models, LLMs)中锚定偏差(anchoring bias)的本质问题:即观测到的偏倚是表层输出模仿还是深层概率分布的改变。此前研究多依赖表面行为证据,缺乏对内部机制和因果贡献的解析。解决方案的关键在于构建一个融合行为分析与可解释性方法的统一框架——首先通过基于log-probability的行为分析验证锚点会系统性改变输出分布;其次利用Shapley值对结构化提示字段进行精确归因,量化锚点对模型log-probability的影响;最终提出“锚定偏差敏感度评分”(Anchoring Bias Sensitivity Score),整合行为与归因证据,实现对六种开源模型的跨模型比较。该框架不仅揭示了锚定偏差在LLMs中的稳健性和可解释性,也为评估其他认知偏倚提供了可复现的方法路径。

链接: https://arxiv.org/abs/2511.05766
作者: Felipe Valencia-Clavijo
机构: Dataplicada
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); General Economics (econ.GN)
备注:

点击查看摘要

Abstract:Large language models (LLMs) are increasingly examined as both behavioral subjects and decision systems, yet it remains unclear whether observed cognitive biases reflect surface imitation or deeper probability shifts. Anchoring bias, a classic human judgment bias, offers a critical test case. While prior work shows LLMs exhibit anchoring, most evidence relies on surface-level outputs, leaving internal mechanisms and attributional contributions unexplored. This paper advances the study of anchoring in LLMs through three contributions: (1) a log-probability-based behavioral analysis showing that anchors shift entire output distributions, with controls for training-data contamination; (2) exact Shapley-value attribution over structured prompt fields to quantify anchor influence on model log-probabilities; and (3) a unified Anchoring Bias Sensitivity Score integrating behavioral and attributional evidence across six open-source models. Results reveal robust anchoring effects in Gemma-2B, Phi-2, and Llama-2-7B, with attribution signaling that the anchors influence reweighting. Smaller models such as GPT-2, Falcon-RW-1B, and GPT-Neo-125M show variability, suggesting scale may modulate sensitivity. Attributional effects, however, vary across prompt designs, underscoring fragility in treating LLMs as human substitutes. The findings demonstrate that anchoring bias in LLMs is robust, measurable, and interpretable, while highlighting risks in applied domains. More broadly, the framework bridges behavioral science, LLM safety, and interpretability, offering a reproducible path for evaluating other cognitive biases in LLMs.
zh

[NLP-118] Language Generation: Complexity Barriers and Implications for Learning

链接: https://arxiv.org/abs/2511.05759
作者: Marcelo Arenas,Pablo Barceló,Luis Cofré,Alexander Kozachinskiy
机构: DCC UC & IMFD; IMC UC, CENIA & IMFD; Faculty of Mathematics UC; CENIA
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Machine Learning (cs.LG)
备注:

点击查看摘要

[NLP-119] Multi-Scale Feature Fusion and Graph Neural Network Integration for Text Classification with Large Language Models

【速读】: 该论文旨在解决复杂语义场景下文本分类任务中全局信息与局部细节难以平衡、以及语义单元间潜在关系建模不足的问题。解决方案的关键在于提出一种融合深度特征提取、多尺度特征金字塔融合和图神经网络结构建模的混合方法:首先利用大语言模型(Large Language Model, LLM)捕获上下文依赖和深层语义表示,随后通过特征金字塔机制整合不同尺度的语义特征,实现全局与局部信息的协同表达;进而将融合后的特征转化为图结构,借助图神经网络(Graph Neural Network, GNN)挖掘文本内部隐含的语义关联与逻辑依赖,从而实现对语义单元间复杂交互的全面建模。该框架在准确率(ACC)、F1分数(F1-Score)、AUC及精确率(Precision)等指标上均优于现有模型,验证了其有效性与稳定性。

链接: https://arxiv.org/abs/2511.05752
作者: Xiangchen Song,Yulin Huang,Jinxu Guo,Yuchen Liu,Yaxuan Luan
机构: 未知
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:This study investigates a hybrid method for text classification that integrates deep feature extraction from large language models, multi-scale fusion through feature pyramids, and structured modeling with graph neural networks to enhance performance in complex semantic contexts. First, the large language model captures contextual dependencies and deep semantic representations of the input text, providing a rich feature foundation for subsequent modeling. Then, based on multi-level feature representations, the feature pyramid mechanism effectively integrates semantic features of different scales, balancing global information and local details to construct hierarchical semantic expressions. Furthermore, the fused features are transformed into graph representations, and graph neural networks are employed to capture latent semantic relations and logical dependencies in the text, enabling comprehensive modeling of complex interactions among semantic units. On this basis, the readout and classification modules generate the final category predictions. The proposed method demonstrates significant advantages in robustness alignment experiments, outperforming existing models on ACC, F1-Score, AUC, and Precision, which verifies the effectiveness and stability of the framework. This study not only constructs an integrated framework that balances global and local information as well as semantics and structure, but also provides a new perspective for multi-scale feature fusion and structured semantic modeling in text classification tasks.
zh

[NLP-120] In-Context Learning Without Copying

【速读】: 该论文旨在解决一个关键问题:生成式 AI(Generative AI)模型中的归纳复制(inductive copying)机制是否是实现更复杂上下文学习(in-context learning, ICL)能力的必要条件。以往研究表明,诱导头(induction heads)在训练过程中会导致损失急剧下降,暗示其可能是ICL能力的前提。为验证这一假设,作者提出Hapax设置——通过屏蔽那些可被诱导头正确预测的token的损失贡献,从而抑制归纳复制行为。解决方案的关键在于设计一种损失掩码策略,使模型无法依赖归纳复制来优化训练目标,同时仍保留对抽象型ICL任务的学习能力。实验表明,在31.7%的token被移除损失的情况下,模型在13/21个抽象型ICL任务上表现优于基线模型,并且在无法被诱导头预测的位置表现出更低的损失,说明归纳复制并非抽象ICL机制的必要条件。

链接: https://arxiv.org/abs/2511.05743
作者: Kerem Sahin(1),Sheridan Feucht(1),Adam Belfki(1),Jannik Brinkmann(2),Aaron Mueller(3),David Bau(1),Chris Wendler(1) ((1) Northeastern University, (2) University of Mannheim, (3) Boston University)
机构: Northeastern University (东北大学); University of Mannheim (曼海姆大学); Boston University (波士顿大学)
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Induction heads are attention heads that perform inductive copying by matching patterns from earlier context and copying their continuations verbatim. As models develop induction heads, they often experience a sharp drop in training loss, a phenomenon cited as evidence that induction heads may serve as a prerequisite for more complex in-context learning (ICL) capabilities. In this work, we ask whether transformers can still acquire ICL capabilities when inductive copying is suppressed. We propose Hapax, a setting where we omit the loss contribution of any token that can be correctly predicted by induction heads. Despite a significant reduction in inductive copying, performance on abstractive ICL tasks (i.e., tasks where the answer is not contained in the input context) remains comparable and surpasses the vanilla model on 13 of 21 tasks, even though 31.7% of tokens are omitted from the loss. Furthermore, our model achieves lower loss values on token positions that cannot be predicted correctly by induction heads. Mechanistic analysis further shows that models trained with Hapax develop fewer and weaker induction heads but still preserve ICL capabilities. Taken together, our findings indicate that inductive copying is not essential for learning abstractive ICL mechanisms.
zh

[NLP-121] OckBench: Measuring the Efficiency of LLM Reasoning

【速读】: 该论文试图解决当前大语言模型(Large Language Models, LLMs)评估体系中忽视解码Token效率的问题。现有基准主要关注准确性和输出质量,但未考虑在实际系统中生成不同数量Token所带来的延迟、成本和能耗差异。解决方案的关键在于提出OckBench——一个模型无关且硬件无关的基准测试平台,能够同时衡量推理与编码任务中的准确性与Token消耗量。通过该平台,研究者可以识别出在相似准确率下Token使用效率存在显著差异的现象,并构建精度-效率帕累托前沿(Pareto frontiers),从而推动从“将Token视为免费资源”向“重视Token效率”的评估范式转变。

链接: https://arxiv.org/abs/2511.05722
作者: Zheng Du,Hao Kang,Song Han,Tushar Krishna,Ligeng Zhu
机构: Georgia Institute of Technology (佐治亚理工学院); Massachusetts Institute of Technology (麻省理工学院); Nvidia Cooperation (英伟达公司)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Large language models such as GPT-4, Claude 3, and the Gemini series have improved automated reasoning and code generation. However, existing benchmarks mainly focus on accuracy and output quality, and they ignore an important factor: decoding token efficiency. In real systems, generating 10,000 tokens versus 100,000 tokens leads to large differences in latency, cost, and energy. In this work, we introduce OckBench, a model-agnostic and hardware-agnostic benchmark that evaluates both accuracy and token count for reasoning and coding tasks. Through experiments comparing multiple open- and closed-source models, we uncover that many models with comparable accuracy differ wildly in token consumption, revealing that efficiency variance is a neglected but significant axis of differentiation. We further demonstrate Pareto frontiers over the accuracy-efficiency plane and argue for an evaluation paradigm shift: we should no longer treat tokens as “free” to multiply. OckBench provides a unified platform for measuring, comparing, and guiding research in token-efficient reasoning. Our benchmarks are available at this https URL .
zh

[NLP-122] Persian Musical Instruments Classification Using Polyphonic Data Augmentation

【速读】: 该论文旨在解决非西方音乐传统(特别是波斯音乐)中乐器分类研究匮乏的问题,以提升音乐信息检索(Music Information Retrieval, MIR)和生成式音乐系统对多元文化音乐的理解能力。其关键解决方案是提出了一种文化相关数据增强策略,通过该策略从单音轨样本中生成具有真实感的多声部混合音频,并结合基于大规模自监督训练的MERT模型(Music undERstanding with large-scale self-supervised Training)进行分类任务。实验表明,该方法在真实波斯音乐多声部场景下取得最优ROC-AUC(0.795),验证了音高与时间连贯性协同增强对于提升乐器识别鲁棒性的有效性,为构建更具文化包容性的MIR系统提供了基础。

链接: https://arxiv.org/abs/2511.05717
作者: Diba Hadi Esfangereh,Mohammad Hossein Sameti,Sepehr Harfi Moridani,Leili Javidpour,Mahdieh Soleymani Baghshah
机构: Sharif University of Technology (伊朗沙里夫理工大学)
类目: ound (cs.SD); Computation and Language (cs.CL)
备注: 9 pages, 2 figures, 4 tables

点击查看摘要

Abstract:Musical instrument classification is essential for music information retrieval (MIR) and generative music systems. However, research on non-Western traditions, particularly Persian music, remains limited. We address this gap by introducing a new dataset of isolated recordings covering seven traditional Persian instruments, two common but originally non-Persian instruments (i.e., violin, piano), and vocals. We propose a culturally informed data augmentation strategy that generates realistic polyphonic mixtures from monophonic samples. Using the MERT model (Music undERstanding with large-scale self-supervised Training) with a classification head, we evaluate our approach with out-of-distribution data which was obtained by manually labeling segments of traditional songs. On real-world polyphonic Persian music, the proposed method yielded the best ROC-AUC (0.795), highlighting complementary benefits of tonal and temporal coherence. These results demonstrate the effectiveness of culturally grounded augmentation for robust Persian instrument recognition and provide a foundation for culturally inclusive MIR and diverse music generation systems.
zh

[NLP-123] Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale

【速读】: 该论文旨在解决如何系统性构建大规模、以视觉为中心的推理数据集的问题,特别是针对超越视觉数学(visual math)任务的复杂多模态推理能力。其核心挑战在于现有方法依赖于未公开的数据集和专有数据合成策略,缺乏可复现性和通用性。解决方案的关键在于提出一种两阶段的数据生成框架:首先“规模扩展”(scale),生成超过100万条高质量的合成视觉推理问题;其次“复杂度提升”(complexity),通过结合视觉语言模型(VLM)与推理大语言模型(LLM)协同生成思维链(Chain-of-Thought, CoT)轨迹,从而捕捉前沿推理模型中丰富的认知行为模式。该框架不仅支持离线和在线强化学习(RL),还被证明在多个视觉基准上显著优于开源基线甚至部分闭源模型,并展现出跨模态迁移能力(如文本和音频推理),为视觉推理模型的训练与评估提供了可扩展、高质量、通用性强的新范式。

链接: https://arxiv.org/abs/2511.05705
作者: David Acuna,Chao-Han Huck Yang,Yuntian Deng,Jaehun Jung,Ximing Lu,Prithviraj Ammanabrolu,Hyunwoo Kim,Yuan-Hong Liao,Yejin Choi
机构: NVIDIA; University of Toronto; University of Waterloo; UCSD
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注: Project Page: this https URL

点击查看摘要

Abstract:Recent progress in multimodal reasoning has been driven largely by undisclosed datasets and proprietary data synthesis recipes, leaving open questions about how to systematically build large-scale, vision-centric reasoning datasets, particularly for tasks that go beyond visual math. In this work, we introduce a new reasoning data generation framework spanning diverse skills and levels of complexity with over 1M high-quality synthetic vision-centric questions. The dataset also includes preference data and instruction prompts supporting both offline and online RL. Our synthesis framework proceeds in two stages: (1) scale; and (2) complexity. Reasoning traces are then synthesized through a two-stage process that leverages VLMs and reasoning LLMs, producing CoT traces for VLMs that capture the richness and diverse cognitive behaviors found in frontier reasoning models. Remarkably, we show that finetuning Qwen2.5-VL-7B on our data outperforms all open-data baselines across all evaluated vision-centric benchmarks, and even surpasses strong closed-data models such as MiMo-VL-7B-RL on V* Bench, CV-Bench and MMStar-V. Perhaps most surprising, despite being entirely vision-centric, our data transfers positively to text-only reasoning (MMLU-Pro) and audio reasoning (MMAU), demonstrating its effectiveness. Similarly, despite not containing videos or embodied visual data, we observe notable gains when evaluating on a single-evidence embodied QA benchmark (NiEH). Finally, we use our data to analyze the entire VLM post-training pipeline. Our empirical analysis highlights that (i) SFT on high-quality data with non-linear reasoning traces is essential for effective online RL, (ii) staged offline RL matches online RL’s performance while reducing compute demands, and (iii) careful SFT on high quality data can substantially improve out-of-domain, cross-modality transfer.
zh

[NLP-124] abDistill: Distilling Transformers into Neural Nets for Few-Shot Tabular Classification

【速读】: 该论文旨在解决Transformer-based模型在小样本(few-shot)场景下虽表现优异但存在参数量大、计算复杂度高的问题。其解决方案的关键在于提出一种名为TabDistill的知识蒸馏策略,通过将复杂Transformer模型的预训练知识高效迁移至结构更简单的神经网络中,从而在保持良好性能的同时实现参数效率提升。该方法使简化后的模型在等量训练数据下超越传统基线(如XGBoost、逻辑回归和普通神经网络),甚至在某些情况下优于原始Transformer模型。

链接: https://arxiv.org/abs/2511.05704
作者: Pasan Dissanayake,Sanghamitra Dutta
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Transformer-based models have shown promising performance on tabular data compared to their classical counterparts such as neural networks and Gradient Boosted Decision Trees (GBDTs) in scenarios with limited training data. They utilize their pre-trained knowledge to adapt to new domains, achieving commendable performance with only a few training examples, also called the few-shot regime. However, the performance gain in the few-shot regime comes at the expense of significantly increased complexity and number of parameters. To circumvent this trade-off, we introduce TabDistill, a new strategy to distill the pre-trained knowledge in complex transformer-based models into simpler neural networks for effectively classifying tabular data. Our framework yields the best of both worlds: being parameter-efficient while performing well with limited training data. The distilled neural networks surpass classical baselines such as regular neural networks, XGBoost and logistic regression under equal training data, and in some cases, even the original transformer-based models that they were distilled from.
zh

[NLP-125] A Representation Sharpening Framework for Zero Shot Dense Retrieval

【速读】: 该论文旨在解决零样本密集检索(zero-shot dense retrieval)中的关键挑战:当文档语料库未提供相关查询时,预训练的密集检索器(dense retrievers, DRs)因未针对目标语料库进行训练,难以准确区分语义相近文档之间的差异,从而影响检索效果。解决方案的关键在于提出一种无需训练的表示锐化(representation sharpening)框架,通过在文档表示中引入有助于区分语料库中相似文档的信息,提升检索的细粒度区分能力。实验表明,该方法在多个语言的二十多个数据集上显著优于传统检索方法,并在保持与现有零样本检索方法兼容的同时持续提升性能,同时通过索引时间近似策略有效平衡了性能与计算成本。

链接: https://arxiv.org/abs/2511.05684
作者: Dhananjay Ashok,Suraj Nair,Mutasem Al-Darabsah,Choon Hui Teo,Tarun Agarwal,Jonathan May
机构: Information Sciences Institute, University of Southern California (南加州大学信息科学研究所); Amazon (亚马逊)
类目: Information Retrieval (cs.IR); Computation and Language (cs.CL)
备注: 15 pages, 4 figures

点击查看摘要

Abstract:Zero-shot dense retrieval is a challenging setting where a document corpus is provided without relevant queries, necessitating a reliance on pretrained dense retrievers (DRs). However, since these DRs are not trained on the target corpus, they struggle to represent semantic differences between similar documents. To address this failing, we introduce a training-free representation sharpening framework that augments a document’s representation with information that helps differentiate it from similar documents in the corpus. On over twenty datasets spanning multiple languages, the representation sharpening framework proves consistently superior to traditional retrieval, setting a new state-of-the-art on the BRIGHT benchmark. We show that representation sharpening is compatible with prior approaches to zero-shot dense retrieval and consistently improves their performance. Finally, we address the performance-cost tradeoff presented by our framework and devise an indexing-time approximation that preserves the majority of our performance gains over traditional retrieval, yet suffers no additional inference-time cost.
zh

[NLP-126] Optimizing Diversity and Quality through Base-Aligned Model Collaboration

链接: https://arxiv.org/abs/2511.05650
作者: Yichen Wang,Chenghao Yang,Tenghao Huang,Muhao Chen,Jonathan May,Mina Lee
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 52 pages, 16 figures

点击查看摘要

[NLP-127] UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLM s to Generate Ill-formed UTF-8

【速读】: 该论文旨在解决子词分词(subword tokenization)过程中因词汇表包含非法UTF-8字符而导致的序列解码不一致问题,即:当分词器的词汇表中存在非合法UTF-8字节组合时,逐个token还原为字符串再解释为UTF-8所得结果与一次性将整个token序列还原为字符串所得结果不一致,从而引发实际应用中的bug。解决方案的关键在于利用**幺半群理论(monoid theory)**对分词过程进行形式化建模,并证明了此类分词器必然会产生非法UTF-8序列;进而指出增量式还原策略不可靠,必须在设计语言模型服务系统时考虑这一理论限制,以避免因UTF-8校验失败导致的程序崩溃或语义错误。

链接: https://arxiv.org/abs/2511.05578
作者: Preston Firestone,Shubham Ugare,Gagandeep Singh,Sasa Misailovic
机构: 未知
类目: Computation and Language (cs.CL)
备注: COLM 2025

点击查看摘要

Abstract:Subword tokenization segments input text according to a pre-defined vocabulary to feed it into a language model; the language model, in turn, generates a sequence made from this same vocabulary. The members of the vocabulary can be built of code points or bytes. Using code points means that all members of the vocabulary are valid UTF-8 characters. However, it also requires thousands of initial members to achieve acceptable coverage of inputs. Beginning with bytes, on the contrary, avoids out-of-vocabulary errors with only 256 initial members of the vocabulary, but the members of the vocabulary and sequences of them are not guaranteed to be valid UTF-8. Sequences that are not valid UTF-8 break code that assumes its input to be valid UTF-8. Applications of language models must account for the breakage thereby introduced. In this paper, we formalize tokenization using monoid theory and prove that tokenizers whose vocabularies contain tokens that are ill-formed UTF-8 can always produce sequences that are ill-formed UTF-8. We demonstrate formally that attempting to incrementally convert tokens back to a string and interpret the results as UTF-8 gives different results than converting the whole sequence of tokens at once. This formal result predicts real-world bugs: we evaluate mitigations for the problem identified and provide case studies of major foundation models, serving engines, and constrained generation systems.
zh

[NLP-128] Fine-Tuning Vision-Language Models for Multimodal Polymer Property Prediction

【速读】: 该论文旨在解决当前视觉-语言模型(Vision-Language Models, VLMs)在材料科学等专业领域中应用受限的问题,特别是缺乏能够利用多模态数据进行广泛任务(如聚合物性能预测)的基础模型。其解决方案的关键在于构建一个用于聚合物的多模态数据集,并通过指令微调(instruction-tuning)对VLM进行优化,结合低秩适应(LoRA)技术实现高效参数调整。实验表明,该方法在聚合物属性预测任务上优于单模态和基线模型,同时减少了为不同属性训练独立模型的需求,从而降低了部署与维护成本。

链接: https://arxiv.org/abs/2511.05577
作者: An Vuong,Minh-Hao Van,Prateek Verma,Chen Zhao,Xintao Wu
机构: 未知
类目: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Vision-Language Models (VLMs) have shown strong performance in tasks like visual question answering and multimodal text generation, but their effectiveness in scientific domains such as materials science remains limited. While some machine learning methods have addressed specific challenges in this field, there is still a lack of foundation models designed for broad tasks like polymer property prediction using multimodal data. In this work, we present a multimodal polymer dataset to fine-tune VLMs through instruction-tuning pairs and assess the impact of multimodality on prediction performance. Our fine-tuned models, using LoRA, outperform unimodal and baseline approaches, demonstrating the benefits of multimodal learning. Additionally, this approach reduces the need to train separate models for different properties, lowering deployment and maintenance costs.
zh

[NLP-129] Sample-Efficient Language Modeling with Linear Attention and Lightweight Enhancements

【速读】: 该论文旨在解决在资源受限环境下实现高效语言建模的问题,尤其是在BabyLM 2025共享任务的约束条件下提升样本效率。其核心解决方案是提出一种名为BLaLM的轻量级架构,关键在于用线性时间复杂度的mLSTM token mixer替代传统的自注意力机制(self-attention),并引入多项轻量化增强技术,包括短卷积、带动态调制的滑动窗口注意力(sliding window attention)以及Hedgehog特征映射。此外,研究还构建了一个注重可读性和教学结构的高质量语料库,并采用Muon优化器替代AdamW以提升训练稳定性与降低困惑度(perplexity),从而在不依赖模型规模扩展的前提下显著提升零样本性能。

链接: https://arxiv.org/abs/2511.05560
作者: Patrick Haller,Jonas Golde,Alan Akbik
机构: Humboldt-Universität zu Berlin (柏林洪堡大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:We study architectural and optimization tech- niques for sample-efficient language modeling under the constraints of the BabyLM 2025 shared task. Our model, BLaLM, replaces self-attention with a linear-time mLSTM to- ken mixer and explores lightweight enhance- ments, including short convolutions, sliding window attention with dynamic modulation, and Hedgehog feature maps. To support train- ing in low-resource settings, we curate a high- quality corpus emphasizing readability and ped- agogical structure. Experiments across both STRICT and STRICT-SMALL tracks show that (1) linear attention combined with sliding win- dow attention consistently improves zero-shot performance, and (2) the Muon optimizer stabi- lizes convergence and reduces perplexity over AdamW. These results highlight effective strate- gies for efficient language modeling without relying on scale.
zh

[NLP-130] Factual and Musical Evaluation Metrics for Music Language Models

【速读】: 该论文旨在解决当前音乐语言模型(Music Language Models, Music LMs)评估体系中存在的核心缺陷——即现有常用指标(如BLEU、METEOR和BERTScore)仅能衡量生成回答的语义流畅性,而无法准确判断其内容是否正确。为应对这一问题,论文提出两个关键解决方案:一是设计了一个适用于音乐领域的通用评价指标,以更贴合音乐语境下的回答质量;二是构建了一个事实性评估框架,用于量化Music LM回答内容的真实性与准确性。该框架不依赖于特定模态的问答模型架构,具备跨领域可扩展性,可推广至其他开放式问答任务中。

链接: https://arxiv.org/abs/2511.05550
作者: Daniel Chenyu Lin,Michael Freeman,John Thickstun
机构: 未知
类目: ound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: 18 pages; first submission

点击查看摘要

Abstract:Music language models (Music LMs), like vision language models, leverage multimodal representations to answer natural language queries about musical audio recordings. Although Music LMs are reportedly improving, we find that current evaluations fail to capture whether their answers are correct. Specifically, for all Music LMs that we examine, widely-used evaluation metrics such as BLEU, METEOR, and BERTScore fail to measure anything beyond linguistic fluency of the model’s responses. To measure the true performance of Music LMs, we propose (1) a better general-purpose evaluation metric for Music LMs adapted to the music domain and (2) a factual evaluation framework to quantify the correctness of a Music LM’s responses. Our framework is agnostic to the modality of the question-answering model and could be generalized to quantify performance in other open-ended question-answering domains. We use open datasets in our experiments and will release all code on publication.
zh

[NLP-131] mporal Sparse Autoencoders: Leverag ing the Sequential Nature of Language for Interpretability

【速读】: 该论文旨在解决当前基于字典学习的解释方法(如稀疏自编码器,Sparse Autoencoders, SAEs)在语言模型(Language Models, LMs)中难以捕捉深层语义概念的问题。现有方法倾向于提取浅层、与特定token相关的噪声特征(例如“句子开头的短语’The’”),而非具有语义连贯性和泛化能力的高层次概念。其根本原因在于当前无监督训练策略忽略了语言固有的结构特性,特别是语义内容具有长程依赖且序列上平滑,而句法信息则更局部。解决方案的关键在于提出Temporal Sparse Autoencoders (T-SAEs),通过引入一种新颖的对比损失函数,强制高阶特征在相邻token间保持一致激活,从而在无需显式语义信号的情况下实现语义与句法特征的解耦。这一机制显著提升了特征的语义平滑性和可解释性,同时保持了重建质量。

链接: https://arxiv.org/abs/2511.05541
作者: Usha Bhalla,Alex Oesterling,Claudio Mayrink Verdun,Himabindu Lakkaraju,Flavio P. Calmon
机构: Harvard University (哈佛大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 23 Pages, 10 figures

点击查看摘要

Abstract:Translating the internal representations and computations of models into concepts that humans can understand is a key goal of interpretability. While recent dictionary learning methods such as Sparse Autoencoders (SAEs) provide a promising route to discover human-interpretable features, they suffer from a variety of problems, including a systematic failure to capture the rich conceptual information that drives linguistic understanding. Instead, they exhibit a bias towards shallow, token-specific, or noisy features, such as “the phrase ‘The’ at the start of sentences”. In this work, we propose that this is due to a fundamental issue with how dictionary learning methods for LLMs are trained. Language itself has a rich, well-studied structure spanning syntax, semantics, and pragmatics; however, current unsupervised methods largely ignore this linguistic knowledge, leading to poor feature discovery that favors superficial patterns over meaningful concepts. We focus on a simple but important aspect of language: semantic content has long-range dependencies and tends to be smooth over a sequence, whereas syntactic information is much more local. Building on this insight, we introduce Temporal Sparse Autoencoders (T-SAEs), which incorporate a novel contrastive loss encouraging consistent activations of high-level features over adjacent tokens. This simple yet powerful modification enables SAEs to disentangle semantic from syntactic features in a self-supervised manner. Across multiple datasets and models, T-SAEs recover smoother, more coherent semantic concepts without sacrificing reconstruction quality. Strikingly, they exhibit clear semantic structure despite being trained without explicit semantic signal, offering a new pathway for unsupervised interpretability in language models.
zh

[NLP-132] Future of AI Models: A Computational perspective on Model collapse

【速读】: 该论文试图解决的问题是:随着生成式 AI(Generative AI)模型的广泛应用,由其生成的内容正迅速占据网络语料库,导致训练数据出现递归污染(recursive contamination),进而可能引发模型崩溃(Model Collapse)——即语言和语义多样性持续下降,最终损害模型的泛化能力。解决方案的关键在于通过量化分析英语维基百科(经过滤的 Common Crawl 数据集)从 2013 到 2025 年间逐年语义相似度的变化,利用 Transformer 嵌入与余弦相似度指标追踪这一趋势,从而提供一个基于数据驱动的模型崩溃发生时间预估。结果表明,在公开发布大型语言模型(LLM)之前,相似度已稳步上升,而之后则呈指数增长,揭示了递归训练对语料多样性的潜在威胁。

链接: https://arxiv.org/abs/2511.05535
作者: Trivikram Satharasi(1),S Sitharama Iyengar(2) ((1) University of Florida, Gainesville, FL, (2) Florida International University, Miami. FL)
机构: 未知
类目: Computation and Language (cs.CL); Databases (cs.DB); Information Theory (cs.IT)
备注: Submitted to Springer Nature. Code Available at this https URL

点击查看摘要

Abstract:Artificial Intelligence, especially Large Language Models (LLMs), has transformed domains such as software engineering, journalism, creative writing, academia, and media (Naveed et al. 2025; arXiv:2307.06435). Diffusion models like Stable Diffusion generate high-quality images and videos from text. Evidence shows rapid expansion: 74.2% of newly published webpages now contain AI-generated material (Ryan Law 2025), 30-40% of the active web corpus is synthetic (Spennemann 2025; arXiv:2504.08755), 52% of U.S. adults use LLMs for writing, coding, or research (Staff 2025), and audits find AI involvement in 18% of financial complaints and 24% of press releases (Liang et al. 2025). The underlying neural architectures, including Transformers (Vaswani et al. 2023; arXiv:1706.03762), RNNs, LSTMs, GANs, and diffusion networks, depend on large, diverse, human-authored datasets (Shi Iyengar 2019). As synthetic content dominates, recursive training risks eroding linguistic and semantic diversity, producing Model Collapse (Shumailov et al. 2024; arXiv:2307.15043; Dohmatob et al. 2024; arXiv:2402.07712). This study quantifies and forecasts collapse onset by examining year-wise semantic similarity in English-language Wikipedia (filtered Common Crawl) from 2013 to 2025 using Transformer embeddings and cosine similarity metrics. Results reveal a steady rise in similarity before public LLM adoption, likely driven by early RNN/LSTM translation and text-normalization pipelines, though modest due to a smaller scale. Observed fluctuations reflect irreducible linguistic diversity, variable corpus size across years, finite sampling error, and an exponential rise in similarity after the public adoption of LLM models. These findings provide a data-driven estimate of when recursive AI contamination may significantly threaten data richness and model generalization.
zh

[NLP-133] FlowMM: Cross-Modal Information Flow Guided KV Cache Merging for Efficient Multimodal Context Inference

【速读】: 该论文旨在解决多模态大语言模型(Multimodal Large Language Models, MLLMs)中KV缓存(Key-Value cache)管理效率与生成质量之间的矛盾问题。传统基于注意力分数的KV缓存淘汰策略易导致上下文丢失或幻觉,而现有融合策略在跨模态场景下受限于模态间分布偏移和跨模态注意力偏差,难以有效保留关键信息。解决方案的关键在于提出FlowMM框架,其核心创新包括:(1) 基于跨模态信息流(cross-modal information flow)动态调整各层的合并策略,以捕捉模态特异性模式并维持上下文完整性;(2) 设计敏感度自适应的token匹配机制,联合评估token相似性与任务敏感性,优先合并低风险token、保护高敏感token。实验表明,FlowMM可在保持任务性能的前提下将KV缓存内存减少80%–95%,解码延迟降低1.3–1.8倍。

链接: https://arxiv.org/abs/2511.05534
作者: Kunxi Li,Yufan Xiong,Zhonghua Jiang,Yiyun Zhou,Zhaode Wang,Chengfei Lv,Shengyu Zhang
机构: Zhejiang University (浙江大学); Huazhong Agricultural University (华中农业大学); Alibaba (阿里巴巴)
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Traditional KV cache eviction strategies, which discard less critical KV-pairs based on attention scores, often degrade generation quality, causing context loss or hallucinations. Recent efforts shift toward KV merging, merging eviction tokens with retention tokens based on similarity. However, in multimodal scenarios, distributional biases across modality tokens and attentional biases in cross-modal interactions limit its effectiveness. This work introduces FlowMM, an adaptive framework for cross-modal information flow-guided multimodal KV cache merging. FlowMM leverages cross-modal information flow to dynamically apply layer-specific merging strategies, capturing modality-specific patterns while preserving contextual integrity. Furthermore, we introduce a sensitivity-adaptive token matching mechanism that jointly evaluates token similarity and task-critical sensitivity, merging low-risk tokens while safeguarding high-sensitivity ones. Extensive experiments across diverse leading MLLMs show that FlowMM reduces KV cache memory by 80% to 95% and decoding latency by 1.3-1.8x, while maintaining competitive task performance.
zh

[NLP-134] MCP4IFC: IFC-Based Building Design Using Large Language Models

【速读】: 该论文旨在解决如何将生成式AI(Generative AI)有效引入建筑、工程与施工(AEC)领域的问题,核心挑战在于实现自然语言指令到标准化BIM数据模型(如IFC)操作的自动化映射。解决方案的关键在于提出MCP4IFC框架,该框架通过Model Context Protocol(MCP)使大型语言模型(LLMs)能够直接操作IFC数据,并集成场景查询工具、预定义建模函数以及结合上下文学习与检索增强生成(RAG)的动态代码生成系统,从而支持从简单建模到复杂信息检索与编辑的多样化任务。

链接: https://arxiv.org/abs/2511.05533
作者: Bharathi Kannan Nithyanantham,Tobias Sesterhenn,Ashwin Nedungadi,Sergio Peral Garijo,Janis Zenkner,Christian Bartelt,Stefan Lüdtke
机构: 未知
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Bringing generative AI into the architecture, engineering and construction (AEC) field requires systems that can translate natural language instructions into actions on standardized data models. We present MCP4IFC, a comprehensive open-source framework that enables Large Language Models (LLMs) to directly manipulate Industry Foundation Classes (IFC) data through the Model Context Protocol (MCP). The framework provides a set of BIM tools, including scene querying tools for information retrieval, predefined functions for creating and modifying common building elements, and a dynamic code-generation system that combines in-context learning with retrieval-augmented generation (RAG) to handle tasks beyond the predefined toolset. Experiments demonstrate that an LLM using our framework can successfully perform complex tasks, from building a simple house to querying and editing existing IFC data. Our framework is released as open-source to encourage research in LLM-driven BIM design and provide a foundation for AI-assisted modeling workflows. Our code is available at this https URL.
zh

[NLP-135] Beyond One-Size-Fits-All: Personalized Harmful Content Detection with In-Context Learning

链接: https://arxiv.org/abs/2511.05532
作者: Rufan Zhang,Lin Zhang,Xianghang Mi
机构: 未知
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-136] Retracing the Past: LLM s Emit Training Data When They Get Lost

【速读】: 该论文旨在解决大规模语言模型(Large Language Models, LLMs)对训练数据的记忆现象所引发的隐私和版权风险问题。现有基于启发式的方法在提取记忆数据时成功率有限,且难以揭示记忆泄露的根本驱动因素。其解决方案的关键在于提出一种系统性的“混淆诱导攻击”(Confusion-Inducing Attacks, CIA)框架,通过优化输入片段以刻意诱发模型在token级别预测熵的持续升高状态,从而有效触发并提取被记忆的文本内容;此外,针对对齐后的LLMs,进一步引入不匹配监督微调(Mismatched Supervised Fine-tuning, SFT),同步削弱模型对齐性并增强其对攻击的敏感性。实验表明,该方法无需事先了解训练数据即可高效提取原文及近似原文的数据,显著优于现有基线方法。

链接: https://arxiv.org/abs/2511.05518
作者: Myeongseob Ko,Nikhil Reddy Billa,Adam Nguyen,Charles Fleming,Ming Jin,Ruoxi Jia
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: The 2025 Conference on Empirical Methods in Natural Language Processing

点击查看摘要

Abstract:The memorization of training data in large language models (LLMs) poses significant privacy and copyright concerns. Existing data extraction methods, particularly heuristic-based divergence attacks, often exhibit limited success and offer limited insight into the fundamental drivers of memorization leakage. This paper introduces Confusion-Inducing Attacks (CIA), a principled framework for extracting memorized data by systematically maximizing model uncertainty. We empirically demonstrate that the emission of memorized text during divergence is preceded by a sustained spike in token-level prediction entropy. CIA leverages this insight by optimizing input snippets to deliberately induce this consecutive high-entropy state. For aligned LLMs, we further propose Mismatched Supervised Fine-tuning (SFT) to simultaneously weaken their alignment and induce targeted confusion, thereby increasing susceptibility to our attacks. Experiments on various unaligned and aligned LLMs demonstrate that our proposed attacks outperform existing baselines in extracting verbatim and near-verbatim training data without requiring prior knowledge of the training data. Our findings highlight persistent memorization risks across various LLMs and offer a more systematic method for assessing these vulnerabilities.
zh

[NLP-137] Ming-UniAudio: Speech LLM for Joint Understanding Generation and Editing with Unified Representation

【速读】: 该论文旨在解决现有语音模型在理解(speech understanding)与生成(speech generation)任务中对词元(token)表示需求相冲突的问题,这一矛盾限制了语音语言模型实现基于指令的自由形式语音编辑。其解决方案的关键在于提出了一种统一的连续语音分词器 MingTok-Audio,这是首个能有效融合语义特征与声学特征的连续分词器,从而为理解和生成任务提供一致且高效的表示基础;在此基础上构建的统一语音语言模型 Ming-UniAudio 实现了生成与理解能力的平衡,并进一步衍生出首个仅依赖自然语言指令即可完成通用、自由形式语音编辑的模型 Ming-UniAudio-Edit,无需时间戳条件即可同时处理语义和声学层面的修改。

链接: https://arxiv.org/abs/2511.05516
作者: Canxiang Yan,Chunxiang Jin,Dawei Huang,Haibing Yu,Han Peng,Hui Zhan,Jie Gao,Jing Peng,Jingdong Chen,Jun Zhou,Kaimeng Ren,Ming Yang,Mingxue Yang,Qiang Xu,Qin Zhao,Ruijie Xiong,Shaoxiong Lin,Xuezhi Wang,Yi Yuan,Yifei Wu,Yongjie Lyu,Zhengyu He,Zhihao Qiu,Zhiqiang Fang,Ziyuan Huang
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
备注: 32 pages, 8 figures

点击查看摘要

Abstract:Existing speech models suffer from competing requirements on token representations by understanding and generation tasks. This discrepancy in representation prevents speech language models from performing instruction-based free-form editing. To solve this challenge, we introduce a novel framework that unifies speech understanding, generation, and editing. The core of our unified model is a unified continuous speech tokenizer MingTok-Audio, the first continuous tokenizer to effectively integrate semantic and acoustic features, which makes it suitable for both understanding and generation tasks. Based on this unified continuous audio tokenizer, we developed the speech language model Ming-UniAudio, which achieved a balance between generation and understanding capabilities. Ming-UniAudio sets new state-of-the-art (SOTA) records on 8 out of 12 metrics on the ContextASR benchmark. Notably, for Chinese voice cloning, it achieves a highly competitive Seed-TTS-WER of 0.95. Leveraging this foundational model, we further trained a dedicated speech editing model Ming-UniAudio-Edit, the first speech language model that enables universal, free-form speech editing guided solely by natural language instructions, handling both semantic and acoustic modifications without timestamp condition. To rigorously assess the editing capability and establish a foundation for future research, we introduce Ming-Freeform-Audio-Edit, the first comprehensive benchmark tailored for instruction-based free-form speech editing, featuring diverse scenarios and evaluation dimensions spanning semantic correctness, acoustic quality, and instruction alignment. We open-sourced the continuous audio tokenizer, the unified foundational model, and the free-form instruction-based editing model to facilitate the development of unified audio understanding, generation, and manipulation.
zh

[NLP-138] Predicting Oscar-Nominated Screenplays with Sentence Embeddings

【速读】: 该论文试图解决的问题是:能否利用现代语言模型预测奥斯卡最佳原创或改编剧本提名(Oscar nominations for screenplays)。其解决方案的关键在于构建了一个名为 Movie-O-Label 的新数据集,该数据集整合了电影剧本集合 MovieSum 与经过人工筛选的奥斯卡获奖记录,并将每部剧本表示为标题、维基百科摘要和完整脚本三部分文本信息;随后使用 E5 句子嵌入模型对长文本进行分块编码,并通过逻辑回归分类器融合三种特征输入(脚本、摘要、标题)进行预测。实验表明,该方法在宏平均 F1 分数(macro F1 score)达到 0.66,ROC-AUC 达到 0.79,证明基于文本嵌入的简单模型已具备较好的预测能力,可作为未来研究的基础。

链接: https://arxiv.org/abs/2511.05500
作者: Francis Gross
机构: University of Regensburg (雷根斯堡大学)
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Oscar nominations are an important factor in the movie industry because they can boost both the visibility and the commercial success. This work explores whether it is possible to predict Oscar nominations for screenplays using modern language models. Since no suitable dataset was available, a new one called Movie-O-Label was created by combining the MovieSum collection of movie scripts with curated Oscar records. Each screenplay was represented by its title, Wikipedia summary, and full script. Long scripts were split into overlapping text chunks and encoded with the E5 sentence em bedding model. Then, the screenplay embed dings were classified using a logistic regression model. The best results were achieved when three feature inputs related to screenplays (script, summary, and title) were combined. The best-performing model reached a macro F1 score of 0.66, a precision recall AP of 0.445 with baseline 0.19 and a ROC-AUC of 0.79. The results suggest that even simple models based on modern text embeddings demonstrate good prediction performance and might be a starting point for future research.
zh

[NLP-139] AI Brown and AI Koditex: LLM -Generated Corpora Comparable to Traditional Corpora of English and Czech Texts

【速读】: 该论文旨在解决当前缺乏可用于语言学比较研究的、由大语言模型(Large Language Models, LLMs)生成的英文与捷克语文本资源的问题,以实现对人类写作与LLM生成文本在语言特征上的系统性对比。解决方案的关键在于构建两个高质量、多题材、主题丰富且结构上可与现有真人创作语料库(BE21和Koditex)直接比较的生成语料库,其文本由来自OpenAI、Anthropic、Alphabet、Meta和DeepSeek的多种LLM(从GPT-3到GPT-4.5)生成,并统一采用Universal Dependencies标准进行标注(包括分词、词形还原及形态句法标注),从而确保数据的可比性和可用性。

链接: https://arxiv.org/abs/2509.22996
作者: Jiří Milička,Anna Marklová,Václav Cvrček
机构: Charles University (查尔斯大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:This article presents two corpora of English and Czech texts generated with large language models (LLMs). The motivation is to create a resource for comparing human-written texts with LLM-generated text linguistically. Emphasis was placed on ensuring these resources are multi-genre and rich in terms of topics, authors, and text types, while maintaining comparability with existing human-created corpora. These generated corpora replicate reference human corpora: BE21 by Paul Baker, which is a modern version of the original Brown Corpus, and Koditex corpus that also follows the Brown Corpus tradition but in Czech. The new corpora were generated using models from OpenAI, Anthropic, Alphabet, Meta, and DeepSeek, ranging from GPT-3 (davinci-002) to GPT-4.5, and are tagged according to the Universal Dependencies standard (i.e., they are tokenized, lemmatized, and morphologically and syntactically annotated). The subcorpus size varies according to the model used (the English part contains on average 864k tokens per model, 27M tokens altogether, the Czech partcontains on average 768k tokens per model, 21.5M tokens altogether). The corpora are freely available for download under the CC BY 4.0 license (the annotated data are under CC BY-NC-SA 4.0 licence) and are also accessible through the search interface of the Czech National Corpus.
zh

[NLP-140] Language Generation with Infinite Contamination

【速读】: 该论文旨在解决语言生成在极限情况下对数据污染(contamination)的鲁棒性问题,即当算法从一个未知目标语言 $ K $ 中观察到由对手构造的字符串枚举序列时,如何保证其仍能生成新的、未见过的 $ K $ 中字符串。此前研究假设数据完全纯净(无噪声插入和遗漏),但现实场景中数据常含噪声或缺失,因此核心挑战在于量化生成任务可容忍的污染程度。解决方案的关键在于:首先,证明了在所有可数语言集合上实现语言生成的充要条件是污染比例趋于零;其次,指出稠密生成(dense generation)比普通生成更脆弱,且通过引入受课程学习启发的“超越最坏情况”模型,进一步表明即使存在无限污染,只要污染比例收敛至零,稠密生成依然可行——这揭示了课程学习机制在处理真实世界噪声数据中的潜在重要性。

链接: https://arxiv.org/abs/2511.07417
作者: Anay Mehrotra,Grigoris Velegkas,Xifan Yu,Felix Zhou
机构: Yale University (耶鲁大学); Google Research (谷歌研究院)
类目: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:We study language generation in the limit, where an algorithm observes an adversarial enumeration of strings from an unknown target language K and must eventually generate new, unseen strings from K . Kleinberg and Mullainathan [KM24] proved that generation is achievable in surprisingly general settings. But their generator suffers from mode collapse,'' producing from an ever-smaller subset of the target. To address this, Kleinberg and Wei [KW25] require the generator's output to be dense’’ in the target language. They showed that generation with density, surprisingly, remains achievable at the same generality. Both results assume perfect data: no noisy insertions and no omissions. This raises a central question: how much contamination can generation tolerate? Recent works made partial progress on this question by studying (non-dense) generation with either finite amounts of noise (but no omissions) or omissions (but no noise). We characterize robustness under contaminated enumerations: 1. Generation under Contamination: Language generation in the limit is achievable for all countable collections iff the fraction of contaminated examples converges to zero. When this fails, we characterize which collections are generable. 2. Dense Generation under Contamination: Dense generation is strictly less robust to contamination than generation. As a byproduct, we resolve an open question of Raman and Raman [ICML25] by showing that generation is possible with only membership oracle access under finitely many contaminated examples. Finally, we introduce a beyond-worst-case model inspired by curriculum learning and prove that dense generation is achievable even with infinite contamination provided the fraction of contaminated examples converges to zero. This suggests curriculum learning may be crucial for learning from noisy web data. Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG) Cite as: arXiv:2511.07417 [stat.ML] (or arXiv:2511.07417v1 [stat.ML] for this version) https://doi.org/10.48550/arXiv.2511.07417 Focus to learn more arXiv-issued DOI via DataCite (pending registration) Submission history From: Anay Mehrotra [view email] [v1] Mon, 10 Nov 2025 18:59:39 UTC (117 KB)
zh

[NLP-141] Adaptive Testing for Segmenting Watermarked Texts From Language Models

【速读】: 该论文旨在解决生成式 AI(Generative AI)文本中水印检测与片段分割的难题,即如何准确识别并分离出由大语言模型(Large Language Models, LLMs)生成且嵌入水印的文本段落与人类撰写的内容。其解决方案的关键在于提出一种基于似然的自适应检测框架,通过引入灵活的加权公式和逆变换采样方法,显著降低了对提示词(prompt)精确估计的依赖性,从而在混合文本中实现高精度、鲁棒的水印区域分割。

链接: https://arxiv.org/abs/2511.06645
作者: Xingchi Li,Xiaochi Liu,Guanxun Li
机构: Texas A&M University (德克萨斯A&M大学); Beijing Normal University at Zhuhai (北京师范大学珠海分校)
类目: Machine Learning (stat.ML); Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: 13 pages, 3 figures, accepted for publication in STAT, October 28, 2025

点击查看摘要

Abstract:The rapid adoption of large language models (LLMs), such as GPT-4 and Claude 3.5, underscores the need to distinguish LLM-generated text from human-written content to mitigate the spread of misinformation and misuse in education. One promising approach to address this issue is the watermark technique, which embeds subtle statistical signals into LLM-generated text to enable reliable identification. In this paper, we first generalize the likelihood-based LLM detection method of a previous study by introducing a flexible weighted formulation, and further adapt this approach to the inverse transform sampling method. Moving beyond watermark detection, we extend this adaptive detection strategy to tackle the more challenging problem of segmenting a given text into watermarked and non-watermarked substrings. In contrast to the approach in a previous study, which relies on accurate estimation of next-token probabilities that are highly sensitive to prompt estimation, our proposed framework removes the need for precise prompt estimation. Extensive numerical experiments demonstrate that the proposed methodology is both effective and robust in accurately segmenting texts containing a mixture of watermarked and non-watermarked content.
zh

[NLP-142] On the Analogy between Human Brain and LLM s: Spotting Key Neurons in Grammar Perception

【速读】: 该论文试图解决的问题是:如何揭示大型语言模型(Large Language Models, LLMs)在处理语言时是否具备类似人类大脑中语法类别(如名词、动词等)的神经表征机制。解决方案的关键在于,利用Llama 3模型识别出与不同词性标签(part-of-speech tags)预测最相关的神经元,并通过这些关键神经元的激活模式训练一个分类器,在新数据上实现对词性的可靠预测。这一发现表明,LLMs中存在一个专门捕捉词性概念的子空间,其结构和功能特征与神经科学中基于脑损伤研究观察到的人类大脑神经分工模式相似。

链接: https://arxiv.org/abs/2511.06519
作者: Sanaz Saki Norouzi,Mohammad Masjedi,Pascal Hitzler
机构: 未知
类目: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Artificial Neural Networks, the building blocks of AI, were inspired by the human brain’s network of neurons. Over the years, these networks have evolved to replicate the complex capabilities of the brain, allowing them to handle tasks such as image and language processing. In the realm of Large Language Models, there has been a keen interest in making the language learning process more akin to that of humans. While neuroscientific research has shown that different grammatical categories are processed by different neurons in the brain, we show that LLMs operate in a similar way. Utilizing Llama 3, we identify the most important neurons associated with the prediction of words belonging to different part-of-speech tags. Using the achieved knowledge, we train a classifier on a dataset, which shows that the activation patterns of these key neurons can reliably predict part-of-speech tags on fresh data. The results suggest the presence of a subspace in LLMs focused on capturing part-of-speech tag concepts, resembling patterns observed in lesion studies of the brain in neuroscience.
zh

[NLP-143] Approximating the Mathematical Structure of Psychodynamics

【速读】: 该论文旨在解决心理学与精神病学领域中复杂认知现象难以进行精确量化研究的问题,尤其是在人工智能安全(AI safety)背景下对人类认知过程建模的挑战。其解决方案的关键在于引入过程理论(process theory)的图示化框架,将人类心理动力学(psychodynamics)形式化为数学上严谨且跨学科可理解的表达方式,从而实现对心理治疗、神经技术、AI对齐(AI alignment)、自主协商中的个体代理表示以及类人AI系统开发等多场景下认知机制的统一建模与分析。

链接: https://arxiv.org/abs/2511.05580
作者: Bryce-Allen Bagley,Navin Khoshnan
机构: 未知
类目: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
备注:

点击查看摘要

Abstract:The complexity of human cognition has meant that psychology makes more use of theory and conceptual models than perhaps any other biomedical field. To enable precise quantitative study of the full breadth of phenomena in psychological and psychiatric medicine as well as cognitive aspects of AI safety, there is a need for a mathematical formulation which is both mathematically precise and equally accessible to experts from numerous fields. In this paper we formalize human psychodynamics via the diagrammatic framework of process theory, describe its key properties, and explain the links between a diagrammatic representation and central concepts in analysis of cognitive processes in contexts such as psychotherapy, neurotechnology, AI alignment, AI agent representation of individuals in autonomous negotiations, developing human-like AI systems, and other aspects of AI safety.
zh

[NLP-144] he Role of High-Performance GPU Resources in Large Language Model Based Radiology Imaging Diagnosis

【速读】: 该论文旨在解决大型语言模型(Large-language models, LLMs)在放射学临床部署中的两大核心挑战:一是确保高诊断准确性,二是实现低推理延迟,从而满足实际医疗场景对效率与性能的双重需求。解决方案的关键在于利用高性能图形处理单元(Graphics Processing Units, GPUs)提供的强大计算能力和内存带宽,通过现代GPU架构(如NVIDIA A100/H100、AMD Instinct MI250X/MI300)支持LLM在医学影像数据上的高效运行,并结合混合精度训练、量化、压缩及多GPU扩展等优化策略,显著降低推理时间并提升吞吐量,同时兼顾隐私保护、部署可行性与能效管理。

链接: https://arxiv.org/abs/2509.16328
作者: Jyun-Ping Kao
机构: National Taiwan University (国立台湾大学)
类目: Tissues and Organs (q-bio.TO); Computation and Language (cs.CL); Image and Video Processing (eess.IV); Medical Physics (physics.med-ph)
备注:

点击查看摘要

Abstract:Large-language models (LLMs) are rapidly being applied to radiology, enabling automated image interpretation and report generation tasks. Their deployment in clinical practice requires both high diagnostic accuracy and low inference latency, which in turn demands powerful hardware. High-performance graphical processing units (GPUs) provide the necessary compute and memory throughput to run large LLMs on imaging data. We review modern GPU architectures (e.g. NVIDIA A100/H100, AMD Instinct MI250X/MI300) and key performance metrics of floating-point throughput, memory bandwidth, VRAM capacity. We show how these hardware capabilities affect radiology tasks: for example, generating reports or detecting findings on CheXpert and MIMIC-CXR images is computationally intensive and benefits from GPU parallelism and tensor-core acceleration. Empirical studies indicate that using appropriate GPU resources can reduce inference time and improve throughput. We discuss practical challenges including privacy, deployment, cost, power and optimization strategies: mixed-precision, quantization, compression, and multi-GPU scaling. Finally, we anticipate that next-generation features (8-bit tensor cores, enhanced interconnect) will further enable on-premise and federated radiology AI. Advancing GPU infrastructure is essential for safe, efficient LLM-based radiology diagnostics.
zh

计算机视觉

[CV-0] Lightning Grasp: High Performance Procedural Grasp Synthesis with Contact Fields

链接: https://arxiv.org/abs/2511.07418
作者: Zhao-Heng Yin,Pieter Abbeel
机构: UC Berkeley EECS (加州大学伯克利分校电子工程与计算机科学系)
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Graphics (cs.GR)
备注: Code: this https URL

点击查看摘要

[CV-1] Robot Learning from a Physical World Model

链接: https://arxiv.org/abs/2511.07416
作者: Jiageng Mao,Sicheng He,Hao-Ning Wu,Yang You,Shuyang Sun,Zhicheng Wang,Yanan Bao,Huizhong Chen,Leonidas Guibas,Vitor Guizilini,Howard Zhou,Yue Wang
机构: Google DeepMind(谷歌深度智脑); USC(南加州大学); Stanford(斯坦福大学); Toyota Research Institute(丰田研究院)
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
备注: Project page: this https URL

点击查看摘要

[CV-2] winOR: Photorealistic Digital Twins of Dynamic Operating Rooms for Embodied AI Research

链接: https://arxiv.org/abs/2511.07412
作者: Han Zhang,Yiqing Shen,Roger D. Soberanis-Mukul,Ankita Ghosh,Hao Ding,Lalithkumar Seenivasan,Jose L. Porras,Zhekai Mao,Chenjia Li,Wenjie Xiao,Lonny Yarmus,Angela Christine Argento,Masaru Ishii,Mathias Unberath
机构: Johns Hopkins University (约翰霍普金斯大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
备注:

点击查看摘要

[CV-3] DIMO: Diverse 3D Motion Generation for Arbitrary Objects ICCV2025

链接: https://arxiv.org/abs/2511.07409
作者: Linzhan Mou,Jiahui Lei,Chen Wang,Lingjie Liu,Kostas Daniilidis
机构: University of Pennsylvania (宾夕法尼亚大学); Archimedes, Athena RC (Archimedes, Athena RC)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Published in ICCV 2025, project page this https URL

点击查看摘要

[CV-4] StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation

链接: https://arxiv.org/abs/2511.07399
作者: Tianrui Feng,Zhi Li,Shuo Yang,Haocheng Xi,Muyang Li,Xiuyu Li,Lvmin Zhang,Keting Yang,Kelly Peng,Song Han,Maneesh Agrawala,Kurt Keutzer,Akio Kodaira,Chenfeng Xu
机构: University of California, Berkeley (加州大学伯克利分校); Google (谷歌); Stanford University (斯坦福大学); Massachusetts Institute of Technology (麻省理工学院); OpenAI; Meta; Stability.AI; Anthropic; Character.ai; Claude; NVIDIA (英伟达)
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注: Project Page: this http URL

点击查看摘要

[CV-5] Real-Time LiDAR Super-Resolution via Frequency-Aware Multi-Scale Fusion

链接: https://arxiv.org/abs/2511.07377
作者: June Moh Goo,Zichao Zeng,Jan Boehm
机构: University College London (伦敦大学学院); Ordnance Survey (英国国家测绘局)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
备注:

点击查看摘要

[CV-6] Inference-Time Scaling of Diffusion Models for Infrared Data Generation

链接: https://arxiv.org/abs/2511.07362
作者: Kai A. Horstmann,Maxim Clouser,Kia Khezeli
机构: Cornell University (康奈尔大学); YRIKKA, Inc.
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: Peer-reviewed workshop paper

点击查看摘要

[CV-7] Preparation of Fractal-Inspired Computational Architectures for Advanced Large Language Model Analysis

链接: https://arxiv.org/abs/2511.07329
作者: Yash Mittal,Dmitry Ignatov,Radu Timofte
机构: 未知
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-8] Garbage Vulnerable Point Monitoring using IoT and Computer Vision

链接: https://arxiv.org/abs/2511.07325
作者: R. Kumar,A. Lall,S. Chaudhari,M. Kale,A. Vattem
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注:

点击查看摘要

[CV-9] YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting

链接: https://arxiv.org/abs/2511.07321
作者: Botao Ye,Boqi Chen,Haofei Xu,Daniel Barath,Marc Pollefeys
机构: ETH Zurich (苏黎世联邦理工学院); ETH AI Center (苏黎世联邦理工学院人工智能中心); Microsoft (微软)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-10] Beyond Boundaries: Leverag ing Vision Foundation Models for Source-Free Object Detection AAAI2026

链接: https://arxiv.org/abs/2511.07301
作者: Huizai Yao,Sicheng Zhao,Pengteng Li,Yi Cui,Shuo Lu,Weiyu Guo,Yunfan Lu,Yijie Xu,Hui Xiong
机构: The University of Science and Technology Hong Kong (香港科技大学); Tsinghua University (清华大学); Peking University (北京大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: Accepted to AAAI 2026. Extended version with full Appendix

点击查看摘要

[CV-11] VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models

链接: https://arxiv.org/abs/2511.07299
作者: Ying Cheng,Yu-Ho Lin,Min-Hung Chen,Fu-En Yang,Shang-Hong Lai
机构: National Tsing Hua University (国立清华大学); NVIDIA
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-12] LMM-IQA: Image Quality Assessment for Low-Dose CT Imaging

链接: https://arxiv.org/abs/2511.07298
作者: Kagan Celik,Mehmet Ozan Unal,Metin Ertas,Isa Yildirim
机构: Istanbul Technical University (伊斯坦布尔技术大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[CV-13] Verifying rich robustness properties for neural networks

链接: https://arxiv.org/abs/2511.07293
作者: Mohammad Afzal,S. Akshay,Ashutosh Gupta
机构: 未知
类目: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-14] PlanT 2.0: Exposing Biases and Structural Flaws in Closed-Loop Driving

链接: https://arxiv.org/abs/2511.07292
作者: Simon Gerstenecker,Andreas Geiger,Katrin Renz
机构: University of Tübingen (图宾根大学); Tübingen AI Center (图宾根人工智能中心)
类目: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-15] Glioma C6: A Novel Dataset for Training and Benchmarking Cell Segmentation

链接: https://arxiv.org/abs/2511.07286
作者: Roman Malashin,Svetlana Pashkevich,Daniil Ilyukhin,Arseniy Volkov,Valeria Yachnaya,Andrey Denisov,Maria Mikhalkova
机构: Pavlov Institute of Physiology, Russian academy of science (巴甫洛夫生理研究所,俄罗斯科学院); Saint-Petersburg State University of Aerospace Instrumentation, Russia (圣彼得堡航空航天仪器大学,俄罗斯); Institute of Physiology, NAS of Belarus (白俄罗斯科学院生理研究所)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[CV-16] Segmentation of Ischemic Stroke Lesions using Transfer Learning on Multi-sequence MRI

链接: https://arxiv.org/abs/2511.07281
作者: R. P. Chowdhury,T. Rahman
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Ischemic Stroke, Segmentation, Transfer Learning, Magnetic Resonance Imaging, Deep Learning, Res-UNet

点击查看摘要

[CV-17] StreamKV: Streaming Video Question-Answering with Segment-based KV Cache Retrieval and Compression

链接: https://arxiv.org/abs/2511.07278
作者: Yilong Chen,Xiang Bai,Zhibin Wang,Chengyu Bai,Yuhan Dai,Ming Lu,Shanghang Zhang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-18] MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLM s

链接: https://arxiv.org/abs/2511.07250
作者: Tianhao Peng,Haochen Wang,Yuanxing Zhang,Zekun Wang,Zili Wang,Ge Zhang,Jian Yang,Shihao Li,Yanghai Wang,Xintao Wang,Houyi Li,Wei Ji,Pengfei Wan,Wenhao Huang,Zhaoxiang Zhang,Jiaheng Liu
机构: Nanjing University (南京大学); CASIA (中国科学院自动化研究所); Kuaishou Technology (快手科技); M-A-P
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[CV-19] 4DSTR: Advancing Generative 4D Gaussians with Spatial-Temporal Rectification for High-Quality and Consistent 4D Generation AAAI

链接: https://arxiv.org/abs/2511.07241
作者: Mengmeng Liu,Jiuming Liu,Yunpeng Zhang,Jiangtao Li,Michael Ying Yang,Francesco Nex,Hao Cheng
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by AAAI this http URL first two authors contributed equally

点击查看摘要

[CV-20] Leverag ing Text-Driven Semantic Variation for Robust OOD Segmentation IROS

链接: https://arxiv.org/abs/2511.07238
作者: Seungheon Song,Jaekoo Lee
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: 8 pages, 5 figure references, 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) submission

点击查看摘要

[CV-21] Noise pattern: identity-anchored Tikhonov regularization for robust structural anomaly detection

链接: https://arxiv.org/abs/2511.07233
作者: Alexander Bauer,Klaus-Robert Müller
机构: TU Berlin (柏林工业大学); BIFOLD (柏林智能计算中心); Korea University (韩国科学技术院); MPI for Informatics (德国马普研究所信息学所)
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注:

点击查看摘要

[CV-22] Mapping Reduced Accessibility to WASH Facilities in Rohingya Refugee Camps with Sub-Meter Imagery

【速读】:该论文旨在解决难民营中供水、环境卫生与个人卫生(WASH)服务可及性难以量化的问题,尤其是在人口高度密集的罗兴亚人难民营中。其核心挑战在于如何准确识别临时安置点并评估关键设施(如水井、厕所和淋浴间)的分布与使用压力。解决方案的关键在于构建一个基于遥感数据的半监督分割框架,利用亚米级卫星影像实现对难民住所的高精度检测(F1-score达76.4%),进而动态追踪多时段内WASH设施的人均负担变化——结果显示从2022年到2025年,人均服务人数由25上升至29.4,且女性和女孩因缺乏性别隔离设施而面临更严重的可及性下降。这一方法为在资源受限环境中实施需求响应型资源配置提供了科学依据,凸显了高分辨率遥感与机器学习在复杂人道主义场景下识别不平等、优化公平分配的重要价值。

链接: https://arxiv.org/abs/2511.07231
作者: Kyeongjin Ahn,YongHun Suh,Sungwon Han,Jeasurk Yang,Hannes Taubenböck,Meeyoung Cha
机构: Max Planck Institute for Security and Privacy (MPI-SP); Korea Advanced Institute of Science and Technology (KAIST); Meta; German Aerospace Center (DLR); Earth Observation Center (EOC); Würzburg University
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 23 pages, 13 figures, 2 tables

点击查看摘要

Abstract:Access to Water, Sanitation, and Hygiene (WASH) services remains a major public health concern in refugee camps. This study introduces a remote sensing-driven framework to quantify WASH accessibility-specifically to water pumps, latrines, and bathing cubicles-in the Rohingya camps of Cox’s Bazar, one of the world’s most densely populated displacement settings. Detecting refugee shelters in such emergent camps presents substantial challenges, primarily due to their dense spatial configuration and irregular geometric patterns. Using sub-meter satellite images, we develop a semi-supervised segmentation framework that achieves an F1-score of 76.4% in detecting individual refugee shelters. Applying the framework across multi-year data reveals declining WASH accessibility, driven by rapid refugee population growth and reduced facility availability, rising from 25 people per facility in 2022 to 29.4 in 2025. Gender-disaggregated analysis further shows that women and girls experience reduced accessibility, in scenarios with inadequate safety-related segregation in WASH facilities. These findings suggest the importance of demand-responsive allocation strategies that can identify areas with under-served populations-such as women and girls-and ensure that limited infrastructure serves the greatest number of people in settings with fixed or shrinking budgets. We also discuss the value of high-resolution remote sensing and machine learning to detect inequality and inform equitable resource planning in complex humanitarian environments.
zh

[CV-23] Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images

链接: https://arxiv.org/abs/2511.07222
作者: JiaKui Hu,Shanshan Zhao,Qing-Guo Chen,Xuerui Qiu,Jialun Liu,Zhao Xu,Weihua Luo,Kaifu Zhang,Yanye Lu
机构: Peking University (北京大学); Alibaba International Digital Commerce Group (阿里巴巴国际数字商业集团); CASIA (中国科学院自动化研究所); TeleAI (TeleAI)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Under review

点击查看摘要

[CV-24] Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization AAAI-2026 AAAI’26

链接: https://arxiv.org/abs/2511.07210
作者: Binyan Xu,Fan Yang,Di Tang,Xilin Dai,Kehuan Zhang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
备注: 19 pages, 22 figures, 15 tables. To appear in AAAI '26 (Oral). This paper extends the AAAI-2026 version by including the Appendix

点击查看摘要

[CV-25] Geometric implicit neural representations for signed distance functions

链接: https://arxiv.org/abs/2511.07206
作者: Luiz Schirmer,Tiago Novello,Vinícius da Silva,Guilherme Schardong,Daniel Perazzo,Hélio Lopes,Nuno Gonçalves,Luiz Velho
机构: Universidade Federal de Santa Maria (圣玛丽亚联邦大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG); Graphics (cs.GR)
备注:

点击查看摘要

[CV-26] Automated Estimation of Anatomical Risk Metrics for Endoscopic Sinus Surgery Using Deep Learning

链接: https://arxiv.org/abs/2511.07199
作者: Konrad Reuter,Lennart Thaysen,Bilkay Doruk,Sarah Latus,Brigitte Holst,Benjamin Becker,Dennis Eggert,Christian Betz,Anna-Sophie Hoffmann,Alexander Schlaefer
机构: Hamburg University of Technology (汉堡工业大学); University Medical Center Hamburg-Eppendorf (汉堡-埃彭多夫大学医学中心)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted to SPIE Medical Imaging conference 2026

点击查看摘要

[CV-27] LiteUpdate: A Lightweight Framework for Updating AI-Generated Image Detectors

链接: https://arxiv.org/abs/2511.07192
作者: Jiajie Lu,Zhenkan Fu,Na Zhao,Long Xing,Kejiang Chen,Weiming Zhang,Nenghai Yu
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
备注:

点击查看摘要

[CV-28] Federated Learning for Video Violence Detection: Complementary Roles of Lightweight CNNs and Vision-Language Models for Energy-Efficient Use ICTAI2025

【速读】:该论文旨在解决视频监控场景中隐私保护与计算效率之间的矛盾,特别是针对大规模视觉语言模型(VLM)在联邦学习框架下部署时带来的高能耗和可持续性挑战。其核心问题是如何在保障用户隐私的前提下,实现高效、低功耗且准确的暴力行为检测。解决方案的关键在于提出并比较三种策略:基于预训练VLM的零样本推理、LoRA微调的LLaVA-NeXT-Video-7B模型以及个性化联邦学习的65.8M参数3D卷积神经网络(3D CNN)。实验表明,3D CNN在保持超过90%二分类准确率的同时,能耗仅为联邦LoRA方案的一半(240 Wh vs. 570 Wh),且校准性能更优(ROC AUC 92.59%),而VLM则提供更强的多模态推理能力;通过层次化类别分组策略进一步提升VLM在多类暴力行为识别中的准确率(UCF-Crime数据集上从65.31%提升至81%)。研究揭示了混合部署范式的价值:以轻量级CNN处理常规检测任务,仅在复杂情境下调用VLM进行深度语义分析。

链接: https://arxiv.org/abs/2511.07171
作者: Sébastien Thuau,Siba Haidar,Rachid Chelouah
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 5 pages, 3 figures, ICTAI 2025

点击查看摘要

Abstract:Deep learning-based video surveillance increasingly demands privacy-preserving architectures with low computational and environmental overhead. Federated learning preserves privacy but deploying large vision-language models (VLMs) introduces major energy and sustainability challenges. We compare three strategies for federated violence detection under realistic non-IID splits on the RWF-2000 and RLVS datasets: zero-shot inference with pretrained VLMs, LoRA-based fine-tuning of LLaVA-NeXT-Video-7B, and personalized federated learning of a 65.8M-parameter 3D CNN. All methods exceed 90% accuracy in binary violence detection. The 3D CNN achieves superior calibration (ROC AUC 92.59%) at roughly half the energy cost (240 Wh vs. 570 Wh) of federated LoRA, while VLMs provide richer multimodal reasoning. Hierarchical category grouping (based on semantic similarity and class exclusion) boosts VLM multiclass accuracy from 65.31% to 81% on the UCF-Crime dataset. To our knowledge, this is the first comparative simulation study of LoRA-tuned VLMs and personalized CNNs for federated violence detection, with explicit energy and CO2e quantification. Our results inform hybrid deployment strategies that default to efficient CNNs for routine inference and selectively engage VLMs for complex contextual reasoning.
zh

[CV-29] ProcGen3D: Learning Neural Procedural Graph Representations for Image-to-3D Reconstruction

链接: https://arxiv.org/abs/2511.07142
作者: Xinyi Zhang,Daoyi Gao,Naiqi Li,Angela Dai
机构: Technical University of Munich (慕尼黑工业大学); Tsinghua University (清华大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Project Page: this https URL

点击查看摘要

[CV-30] MPJudge: Towards Perceptual Assessment of Music-Induced Paintings

链接: https://arxiv.org/abs/2511.07137
作者: Shiqi Jiang,Tianyi Liang,Changbo Wang,Chenhui Li
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-31] Sparse4DGS: 4D Gaussian Splatting for Sparse-Frame Dynamic Scene Reconstruction AAAI2026

链接: https://arxiv.org/abs/2511.07122
作者: Changyue Shi,Chuxiao Yang,Xinyuan Hu,Minghao Chen,Wenwen Pan,Yan Yang,Jiajun Ding,Zhou Yu,Jun Yu
机构: Hangzhou Dianzi University (杭州电子科技大学); Zhejiang University (浙江大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: AAAI 2026

点击查看摘要

[CV-32] HENet: Hybrid Encoding and Multi-task Learning for 3D Perception and End-to-end Autonomous Driving

链接: https://arxiv.org/abs/2511.07106
作者: Zhongyu Xia,Zhiwei Lin,Yongtao Wang,Ming-Hsuan Yang
机构: Peking University (北京大学); University of California, Merced (加州大学默塞德分校)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Preliminary version, 19 pages

点击查看摘要

[CV-33] GEWDiff: Geometric Enhanced Wavelet-based Diffusion Model for Hyperspectral Image Super-resolution AAAI2026

链接: https://arxiv.org/abs/2511.07103
作者: Sirui Wang,Jiang He,Natàlia Blasco Andreo,Xiao Xiang Zhu
机构: 1. Wuhan University (武汉大学); 2. University of Barcelona (巴塞罗那大学); 3. German Aerospace Center (德国航空航天中心)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: This manuscript has been accepted for publication in AAAI 2026

点击查看摘要

[CV-34] How Bias Binds: Measuring Hidden Associations for Bias Control in Text-to-Image Compositions AAAI2026 AAAI

链接: https://arxiv.org/abs/2511.07091
作者: Jeng-Lin Li,Ming-Ching Chang,Wei-Chao Chen
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: Accepted for publication at the Alignment Track of The 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026)

点击查看摘要

[CV-35] Achieving Effective Virtual Reality Interactions via Acoustic Gesture Recognition based on Large Language Models ICASSP2026

链接: https://arxiv.org/abs/2511.07085
作者: Xijie Zhang,Fengliang He,Hong-Ning Dai
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
备注: 5 pages, 4 figures, 1 table, under review at ICASSP 2026

点击查看摘要

[CV-36] Pandar128 dataset for lane line detection

链接: https://arxiv.org/abs/2511.07084
作者: Filip Beránek,Václav Diviš,Ivan Gruber
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[CV-37] LeCoT: revisiting network architecture for two-view correspondence pruning

链接: https://arxiv.org/abs/2511.07078
作者: Luanyuan Dai,Xiaoyu Du,Jinhui Tang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Just accepted at SCIENCE CHINA Information Sciences

点击查看摘要

[CV-38] ClusterMine: Robust Label-Free Visual Out-Of-Distribution Detection via Concept Mining from Text Corpora WACV WACV2026

链接: https://arxiv.org/abs/2511.07068
作者: Nikolas Adaloglou,Diana Petrusheva,Mohamed Asker,Felix Michels,Markus Kollmann
机构: Heinrich Heine University of Düsseldorf (海因里希·海涅杜塞尔多夫大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注: Accepted in WACV 2026. Code in this https URL 9 Tables, 11 Figures

点击查看摘要

[CV-39] RaLD: Generating High-Resolution 3D Radar Point Clouds with Latent Diffusion

链接: https://arxiv.org/abs/2511.07067
作者: Ruijie Zhang,Bixin Zeng,Shengpeng Wang,Fuhui Zhou,Wei Wang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-40] Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation

链接: https://arxiv.org/abs/2511.07051
作者: Yuxuan Zhou,Tao Yu,Wen Huang,Yuheng Zhang,Tao Dai,Shu-Tao Xia
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
备注:

点击查看摘要

[CV-41] From Pretrain to Pain: Adversarial Vulnerability of Video Foundation Models Without Task Knowledge AAAI2026

链接: https://arxiv.org/abs/2511.07049
作者: Hui Lu,Yi Yu,Song Xia,Yiming Yang,Deepu Rajan,Boon Poh Ng,Alex Kot,Xudong Jiang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
备注: AAAI 2026 (Oral presentation)

点击查看摘要

[CV-42] 3D-ANC: Adaptive Neural Collapse for Robust 3D Point Cloud Recognition AAAI2026

链接: https://arxiv.org/abs/2511.07040
作者: Yuanmin Huang,Wenxuan Li,Mi Zhang,Xiaohan Zhang,Xiaoyu You,Min Yang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
备注: AAAI 2026

点击查看摘要

[CV-43] Certified L2-Norm Robustness of 3D Point Cloud Recognition in the Frequency Domain AAAI26

链接: https://arxiv.org/abs/2511.07029
作者: Liang Zhou,Qiming Wang,Tianze Chen
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by AAAI26

点击查看摘要

[CV-44] Performance Decay in Deepfake Detection: The Limitations of Training on Outdated Data

链接: https://arxiv.org/abs/2511.07009
作者: Jack Richings,Margaux Leblanc,Ian Groves,Victoria Nockles
机构: Defence AI Research (DARe), The Alan Turing Institute (艾伦图灵研究所)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-45] rueCity: Real and Simulated Urban Data for Cross-Domain 3D Scene Understanding

链接: https://arxiv.org/abs/2511.07007
作者: Duc Nguyen,Yan-Ling Lai,Qilin Zhang,Prabin Gyawali,Benedikt Schwab,Olaf Wysocki,Thomas H. Kolbe
机构: Technical University of Munich (慕尼黑工业大学); CV4DT, University of Cambridge (剑桥大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: The paper accepted for 3DV 2026 (International Conference on 3D Vision 2026)

点击查看摘要

[CV-46] Exploring the “Great Unseen” in Medieval Manuscripts: Instance-Level Labeling of Legacy Image Collections with Zero-Shot Models

链接: https://arxiv.org/abs/2511.07004
作者: Christofer Meinecke,Estelle Guéville,David Joseph Wrisley
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
备注:

点击查看摘要

[CV-47] Oh That Looks Familiar: A Novel Similarity Measure for Spreadsheet Template Discovery

链接: https://arxiv.org/abs/2511.06973
作者: Ananad Krishnakumar,Vengadesh Ravikumaran
机构: Ekimetrics(埃基Metrics)
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
备注: 5 pages, 2 figures, Accepted for EuroIPS: AI for Tabular Data Workshop (2025)

点击查看摘要

[CV-48] Learning from the Right Patches: A Two-Stage Wavelet-Driven Masked Autoencoder for Histopathology Representation Learning

链接: https://arxiv.org/abs/2511.06958
作者: Raneen Younis,Louay Hamdi,Lukas Chavez,Zahra Ahmadi
机构: PLRI Medical Informatics Institute (PLRI 医学信息学研究所); Hannover Medical School (汉诺威医学院); Leibniz University Hannover (汉诺威莱布尼茨大学); Sanford Burnham Prebys Medical Discovery Institute (桑福德伯纳姆-普雷比斯医学发现研究所); University of California San Diego (加州大学圣地亚哥分校)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-49] GFix: Perceptually Enhanced Gaussian Splatting Video Compression

链接: https://arxiv.org/abs/2511.06953
作者: Siyue Teng,Ge Gao,Duolikun Danier,Yuxuan Jiang,Fan Zhang,Thomas Davis,Zoe Liu,David Bull
机构: University of Bristol (布里斯托大学); University of Edinburgh (爱丁堡大学); Visionular Inc. (视觉公司)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-50] PADM: A Physics-aware Diffusion Model for Attenuation Correction WACV

链接: https://arxiv.org/abs/2511.06948
作者: Trung Kien Pham,Hoang Minh Vu,Anh Duc Chu,Dac Thai Nguyen,Trung Thanh Nguyen,Thao Nguyen Truong,Mai Hong Son,Thanh Trung Nguyen,Phi Le Nguyen
机构: AI4LIFE, Hanoi University of Science and Technology (河内科技大学), Vietnam; Nagoya Univeristy (名古屋大学), Japan; National Institute of Advanced Industrial Science and Technology (日本产业技术综合研究所), Japan; 108 Military Central Hospital (108军区中央医院), Vietnam
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026

点击查看摘要

[CV-51] FoCLIP: A Feature-Space Misalignment Framework for CLIP-Based Image Manipulation and Detection

链接: https://arxiv.org/abs/2511.06947
作者: Yulin Chen,Zeyuan Wang,Tianyuan Yu,Yingmei Wei,Liang Bai
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: 15 page, 9 figures, published to PRCV

点击查看摘要

[CV-52] From Attribution to Action: Jointly ALIGNing Predictions and Explanations AAAI2026

链接: https://arxiv.org/abs/2511.06944
作者: Dongsheng Hong,Chao Chen,Yanhui Chen,Shanshan Lin,Zhihao Chen,Xiangwen Liao
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: Accepted in AAAI 2026

点击查看摘要

[CV-53] PlantTraitNet: An Uncertainty-Aware Multimodal Framework for Global-Scale Plant Trait Inference from Citizen Science Data AAAI AAAI-26

链接: https://arxiv.org/abs/2511.06943
作者: Ayushi Sharma,Johanna Trost,Daniel Lusk,Johannes Dollinger,Julian Schrader,Christian Rossi,Javier Lopatin,Etienne Laliberté,Simon Haberstroh,Jana Eichel,Daniel Mederer,Jose Miguel Cerda-Paredes,Shyam S. Phartyal,Lisa-Maricia Schwarz,Anja Linstädter,Maria Conceição Caldeira,Teja Kattenborn
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: Preprint version of the paper accepted at the 40th AAAI Conference on Artificial Intelligence (AAAI-26), organized by the Association for the Advancement of Artificial Intelligence

点击查看摘要

[CV-54] DTTNet: Improving Video Shadow Detection via Dark-Aware Guidance and Tokenized Temporal Modeling

链接: https://arxiv.org/abs/2511.06925
作者: Zhicheng Li,Kunyang Sun,Rui Yao,Hancheng Zhu,Fuyuan Hu,Jiaqi Zhao,Zhiwen Shao,Yong Zhou
机构: China University of Mining and Technology (中国矿业大学); Shanghai Jiao Tong University (上海交通大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-55] Mono3DVG-EnSD: Enhanced Spatial-aware and Dimension-decoupled Text Encoding for Monocular 3D Visual Grounding

链接: https://arxiv.org/abs/2511.06908
作者: Yuzhen Li,Min Liu,Zhaoyang Li,Yuan Bian,Xueping Wang,Erbo Zhai,Yaonan Wang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
备注: 10 pages

点击查看摘要

[CV-56] Classification of Microplastic Particles in Water using Polarized Light Scattering and Machine Learning Methods

链接: https://arxiv.org/abs/2511.06901
作者: Leonard Saur,Marc von Pawlowski,Ulrich Gengenbach,Ingo Sieber,Hossein Shirali,Lorenz Wührl,Rainer Kiko,Christian Pylatiuk
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 20 pages, 6 figures

点击查看摘要

[CV-57] Adaptive Morph-Patch Transformer for Arotic Vessel Segmentation AAAI2026 AAAI

链接: https://arxiv.org/abs/2511.06897
作者: Zhenxi Zhang,Fuchen Zheng,Adnan Iltaf,Yifei Han,Zhenyu Cheng,Yue Du,Bin Li,Tianyong Liu,Shoujun Zhou
机构: SIAT; Shenzhen Institutes of Advanced Technology (深圳先进技术研究院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: This is the preprint version of a paper accepted by AAAI 2026. The final version will appear in the AAAI Proceedings

点击查看摘要

[CV-58] A Two-Stage System for Layout-Controlled Image Generation using Large Language Models and Diffusion Models

链接: https://arxiv.org/abs/2511.06888
作者: Jan-Hendrik Koch,Jonas Krumme,Konrad Gadzicki
机构: University of Bremen (不来梅大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 12 pages, 5 figures

点击查看摘要

[CV-59] Generating an Image From 1000 Words: Enhancing Text-to-Image With Structured Captions

链接: https://arxiv.org/abs/2511.06876
作者: Eyal Gutflaish,Eliran Kachlon,Hezi Zisman,Tal Hacham,Nimrod Sarid,Alexander Visheratin,Saar Huberman,Gal Davidi,Guy Bukchin,Kfir Goldberg,Ron Mokady
机构: BRIA AI
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-60] VAEVQ: Enhancing Discrete Visual Tokenization through Variational Modeling

链接: https://arxiv.org/abs/2511.06863
作者: Sicheng Yang,Xing Hu,Qiang Wu,Dawei Yang
机构: Houmo AI(候摩AI)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-61] Ambiguity-aware Truncated Flow Matching for Ambiguous Medical Image Segmentation AAAI-26

链接: https://arxiv.org/abs/2511.06857
作者: Fanding Li(1),Xiangyu Li(1),Xianghe Su(1),Xingyu Qiu(1),Suyu Dong(2),Wei Wang(3),Kuanquan Wang(1),Gongning Luo(1),Shuo Li(4 and 5) ((1) Faculty of Computing, Harbin Institute of Technology, Harbin, China, (2) College of Computer and Control Engineering, Northeast Forestry University, Harbin, China, (3) Faculty of Computing, Harbin Institute of Technology, Shenzhen, China, (4) Department of Computer and Data Science, Case Western Reserve University, Cleveland, Ohio 44106, United States, (5) Department of Biomedical Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States)
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 13 pages, 10 figures, extended version of AAAI-26 paper

点击查看摘要

[CV-62] Distillation Dynamics: Towards Understanding Feature-Based Distillation in Vision Transformers AAAI2026

链接: https://arxiv.org/abs/2511.06848
作者: Huiyuan Tian,Bonan Xu Shijian Li
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted to AAAI 2026. Submitted version

点击查看摘要

[CV-63] Gaussian-Augmented Physics Simulation and System Identification with Complex Colliders NEURIPS2025

链接: https://arxiv.org/abs/2511.06846
作者: Federico Vasile,Ri-Zhao Qiu,Lorenzo Natale,Xiaolong Wang
机构: Istituto Italiano di Tecnologia (意大利技术研究院); UC San Diego (加州大学圣地亚哥分校)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted to NeurIPS 2025. Project website: this https URL

点击查看摘要

[CV-64] Aerial Image Stitching Using IMU Data from a UAV

链接: https://arxiv.org/abs/2511.06841
作者: Selim Ahmet Iz,Mustafa Unel
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Systems and Control (eess.SY); Dynamical Systems (math.DS)
备注:

点击查看摘要

[CV-65] PanoNav: Mapless Zero-Shot Object Navigation with Panoramic Scene Parsing and Dynamic Memory AAAI2026

链接: https://arxiv.org/abs/2511.06840
作者: Qunchao Jin,Yilin Wu,Changhao Chen
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
备注: Accepted as a poster in AAAI 2026

点击查看摘要

[CV-66] Vision-Based System Identification of a Quadrotor

链接: https://arxiv.org/abs/2511.06839
作者: Selim Ahmet Iz,Mustafa Unel
机构: 未知
类目: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY); Dynamical Systems (math.DS)
备注:

点击查看摘要

[CV-67] NeuroBridge: Bio-Inspired Self-Supervised EEG-to-Image Decoding via Cognitive Priors and Bidirectional Semantic Alignment AAAI2026

链接: https://arxiv.org/abs/2511.06836
作者: Wenjiang Zhang,Sifeng Wang,Yuwei Su,Xinyu Li,Chen Zhang,Suyu Zhong
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: AAAI 2026

点击查看摘要

[CV-68] ConsistTalk: Intensity Controllable Temporally Consistent Talking Head Generation with Diffusion Noise Search AAAI26

链接: https://arxiv.org/abs/2511.06833
作者: Zhenjie Liu,Jianzhang Lu,Renjie Lu,Cong Liang,Shangfei Wang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: AAAI26 poster

点击查看摘要

[CV-69] MUGSQA: Novel Multi-Uncertainty-Based Gaussian Splatting Quality Assessment Method Dataset and Benchmarks

链接: https://arxiv.org/abs/2511.06830
作者: Tianang Chen,Jian Jin,Shilv Cai,Zhuangzi Li,Weisi Lin
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-70] Integrating Reweighted Least Squares with Plug-and-Play Diffusion Priors for Noisy Image Restoration

链接: https://arxiv.org/abs/2511.06823
作者: Ji Li,Chao Wang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 12 pages

点击查看摘要

[CV-71] S-TSL: Image-Label Supervised Surgical Video Stereo Matching via Time-Switchable Teacher-Student Learning

链接: https://arxiv.org/abs/2511.06817
作者: Rui Wang,Ying Zhou,Hao Wang,Wenwei Zhang,Qiang Li,Zhiwei Wang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: 8 pages, 4 figures, accepted by BiBM2025

点击查看摘要

[CV-72] ConeGS: Error-Guided Densification Using Pixel Cones for Improved Reconstruction with Fewer Primitives

链接: https://arxiv.org/abs/2511.06810
作者: Bartłomiej Baranowski,Stefano Esposito,Patricia Gschoßmann,Anpei Chen,Andreas Geiger
机构: University of Tübingen (图宾根大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-73] Robust and High-Fidelity 3D Gaussian Splatting: Fusing Pose Priors and Geometry Constraints for Texture-Deficient Outdoor Scenes IROS2025

链接: https://arxiv.org/abs/2511.06765
作者: Meijun Guo,Yongliang Shi,Caiyun Liu,Yixiao Feng,Ming Ma,Tinghai Yan,Weining Lu,Bin Liang
机构: Beijing Institute of Technology (北京理工大学); Beiing National Research Center for Information Science and Technology (北京信息科学与技术国家研究中心); Qiyuan Lab (启源实验室); Peking University (北京大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
备注: 7 pages, 3 figures. Accepted by IROS 2025

点击查看摘要

[CV-74] CAST-LUT: Tokenizer-Guided HSV Look-Up Tables for Purple Flare Removal

链接: https://arxiv.org/abs/2511.06764
作者: Pu Wang,Shuning Sun,Jialang Lu,Chen Wu,Zhihua Zhang,Youshan Zhang,Chenggang Shan,Dianjie Lu,Guijuan Zhang,Zhuoran Zheng
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-75] SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation

链接: https://arxiv.org/abs/2511.06754
作者: Taisei Hanyu,Nhat Chung,Huy Le,Toan Nguyen,Yuki Ikebe,Anthony Gunderman,Duy Nguyen Ho Minh,Khoa Vo,Tung Kieu,Kashu Yamazaki,Chase Rainwater,Anh Nguyen,Ngan Le
机构: University of Arkansas, USA; FPT Software AI Center, Vietnam; University of Stuttgart, Germany; Aalborg University, Denmark; Carnegie Mellon University, USA; University of Liverpool, UK; German Research Center for Artificial Intelligence, Germany; Max Planck Research School for Intelligent Systems, Germany
类目: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
备注: under review

点击查看摘要

[CV-76] Med-SORA: Symptom to Organ Reasoning in Abdomen CT Images

链接: https://arxiv.org/abs/2511.06752
作者: You-Kyoung Na,Yeong-Jun Cho
机构: Chonnam National University (全南国立大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 9 pages

点击查看摘要

[CV-77] Semi-distributed Cross-modal Air-Ground Relative Localization IROS2025

链接: https://arxiv.org/abs/2511.06749
作者: Weining Lu,Deer Bin,Lian Ma,Ming Ma,Zhihao Ma,Xiangyang Chen,Longfei Wang,Yixiao Feng,Zhouxian Jiang,Yongliang Shi,Bin Liang
机构: Beijng National Research Center for Information Science and Technology (北京信息科学与技术国家研究中心); Qiyuan Lab (启源实验室); JiangHuai Advanced Technology Center (江淮先进技术中心)
类目: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
备注: 7 pages, 3 figures. Accepted by IROS 2025

点击查看摘要

[CV-78] Image Restoration via Primal Dual Hybrid Gradient and Flow Generative Model AAAI26

链接: https://arxiv.org/abs/2511.06748
作者: Ji Li,Chao Wang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 13 pages; AAAI26 version with appendix

点击查看摘要

[CV-79] PointCubeNet: 3D Part-level Reasoning with 3x3x3 Point Cloud Blocks

链接: https://arxiv.org/abs/2511.06744
作者: Da-Yeong Kim,Yeong-Jun Cho
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-80] Otter: Mitigating Background Distractions of Wide-Angle Few-Shot Action Recognition with Enhanced RWKV AAAI2026

链接: https://arxiv.org/abs/2511.06741
作者: Wenbo Huang,Jinghui Zhang,Zhenghao Chen,Guang Li,Lei Zhang,Yang Cao,Fang Dong,Takahiro Ogawa,Miki Haseyama
机构: 1. Tsinghua University (清华大学); 2. Alibaba Group (阿里巴巴集团); 3. Shanghai Jiao Tong University (上海交通大学); 4. University of Tokyo (东京大学); 5. Microsoft (微软)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by AAAI 2026 Oral

点击查看摘要

[CV-81] SinSEMI: A One-Shot Image Generation Model and Data-Efficient Evaluation Framework for Semiconductor Inspection Equipment

链接: https://arxiv.org/abs/2511.06740
作者: ChunLiang Wu,Xiaochun Li
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-82] Rethinking Rainy 3D Scene Reconstruction via Perspective Transforming and Brightness Tuning AAAI2026

链接: https://arxiv.org/abs/2511.06734
作者: Qianfeng Yang,Xiang Chen,Pengpeng Li,Qiyuan Guan,Guiyue Jin,Jiyu Jin
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by AAAI 2026 (Oral)

点击查看摘要

[CV-83] Argus: Quality-Aware High-Throughput Text-to-Image Inference Serving System

链接: https://arxiv.org/abs/2511.06724
作者: Shubham Agarwal,Subrata Mitra,Saud Iqbal
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
备注: Accepted at Middleware 2025

点击查看摘要

[CV-84] AvatarTex: High-Fidelity Facial Texture Reconstruction from Single-Image Stylized Avatars

链接: https://arxiv.org/abs/2511.06721
作者: Yuda Qiu,Zitong Xiao,Yiwei Zuo,Zisheng Ye,Weikai Chen,Xiaoguang Han
机构: SSE, CUHKSZ; FNii, CUHKSZ
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 3DV 2026 Accepted

点击查看摘要

[CV-85] Relative Energy Learning for LiDAR Out-of-Distribution Detection

链接: https://arxiv.org/abs/2511.06720
作者: Zizhao Li,Zhengkang Xiang,Jiayang Ao,Joseph West,Kourosh Khoshelham
机构: University of Melbourne(墨尔本大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-86] MRT: Learning Compact Representations with Mixed RWKV-Transformer for Extreme Image Compression

链接: https://arxiv.org/abs/2511.06717
作者: Han Liu,Hengyu Man,Xingtao Wang,Wenrui Li,Debin Zhao
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-87] MirrorMamba: Towards Scalable and Robust Mirror Detection in Videos

链接: https://arxiv.org/abs/2511.06716
作者: Rui Song,Jiaying Lin,Rynson W.H. Lau
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[CV-88] K-Stain: Keypoint-Driven Correspondence for HE-to-IHC Virtual Staining

链接: https://arxiv.org/abs/2511.06709
作者: Sicheng Yang,Zhaohu Xing,Haipeng Zhou,Lei Zhu
机构: The Hong Kong University of Science and Technology (Guangzhou) (香港科技大学(广州)); The Hong Kong University of Science and Technology (香港科技大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-89] SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection

链接: https://arxiv.org/abs/2511.06702
作者: Yifan Wang,Yian Zhao,Fanqi Pu,Xiaochen Yang,Yang Tang,Xi Chen,Wenming Yang
机构: Tsinghua Shenzhen International Graduate School, Tsinghua University (清华大学深圳国际研究生院); School of Electronic and Computer Engineering, Peking University (北京大学电子与计算机工程学院); School of Mathematics and Statistics, University of Glasgow (格拉斯哥大学数学与统计学院); Basic Algorithm Center, PCG, Tencent (腾讯PCG基础算法中心)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-90] AnoStyler: Text-Driven Localized Anomaly Generation via Lightweight Style Transfer AAAI2026

链接: https://arxiv.org/abs/2511.06687
作者: Yulim So,Seokho Kang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted to AAAI 2026

点击查看摘要

[CV-91] Flexible Concept Bottleneck Model AAAI2026

链接: https://arxiv.org/abs/2511.06678
作者: Xingbo Du,Qiantong Dou,Lei Fan,Rui Zhang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注: To appear in AAAI 2026

点击查看摘要

[CV-92] REOcc: Camera-Radar Fusion with Radar Feature Enrichment for 3D Occupancy Prediction IROS2025

链接: https://arxiv.org/abs/2511.06666
作者: Chaehee Song,Sanmin Kim,Hyeonjun Jeong,Juyeb Shin,Joonhee Lim,Dongsuk Kum
机构: Korea Advanced Institute of Science and Technology (KAIST); Kookmin University
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: IROS 2025

点击查看摘要

[CV-93] Sim4Seg: Boosting Multimodal Multi-disease Medical Diagnosis Segmentation with Region-Aware Vision-Language Similarity Masks AAAI2026

链接: https://arxiv.org/abs/2511.06665
作者: Lingran Song,Yucheng Zhou,Jianbing Shen
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: AAAI 2026

点击查看摘要

[CV-94] Active Learning for Animal Re-Identification with Ambiguity-Aware Sampling AAAI

链接: https://arxiv.org/abs/2511.06658
作者: Depanshu Sani,Mehar Khurana,Saket Anand
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: In Proceedings of AAAI Conference on Artificial Intelligence 2026

点击查看摘要

[CV-95] NOVO: Bridging LLaVA and SAM with Visual-only Prompts for Reasoning Segmentation

链接: https://arxiv.org/abs/2511.06651
作者: Kyung-Yoon Yoon,Yeong-Jun Cho
机构: Chonnam National University (全南国立大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-96] FreqGRL: Suppressing Low-Frequency Bias and Mining High-Frequency Knowledge for Cross-Domain Few-Shot Learning

链接: https://arxiv.org/abs/2511.06648
作者: Siqi Hui,Sanping Zhou,Ye deng,Wenli Huang,Jinjun Wang
机构: Unknown
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-97] UniADC: A Unified Framework for Anomaly Detection and Classification

链接: https://arxiv.org/abs/2511.06644
作者: Ximiao Zhang,Min Xu,Zheng Zhang,Junlin Hu,Xiuzhuang Zhou
机构: Beijing University of Posts and Telecommunications (北京邮电大学); Capital Normal University (首都师范大学); Beihang University (北京航空航天大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-98] DIAL-GS: Dynamic Instance Aware Reconstruction for Label-free Street Scenes with 4D Gaussian Splatting

链接: https://arxiv.org/abs/2511.06632
作者: Chenpeng Su,Wenhua Wu,Chensheng Peng,Tianchen Deng,Zhe Liu,Hesheng Wang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-99] Explainable Cross-Disease Reasoning for Cardiovascular Risk Assessment from LDCT

链接: https://arxiv.org/abs/2511.06625
作者: Yifei Zhang,Jiashuo Zhang,Xiaofeng Yang,Liang Zhao
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

[CV-100] On Accurate and Robust Estimation of 3D and 2D Circular Center: Method and Application to Camera-Lidar Calibration

链接: https://arxiv.org/abs/2511.06611
作者: Jiajun Jiang,Xiao Hu,Wancheng Liu,Wei Jiang
机构: The Hong Kong University of Science and Technology (Guangzhou) (香港科技大学(广州)); International Digital Economy Academy (国际数字经济发展研究院); Horizon-Continental Technology Corporation ( horizon-continental 技术公司); Beijing Jiaotong University (北京交通大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
备注:

点击查看摘要

[CV-101] Spatial-Frequency Enhanced Mamba for Multi-Modal Image Fusion

链接: https://arxiv.org/abs/2511.06593
作者: Hui Sun,Long Lv,Pingping Zhang,Tongdan Tang,Feng Tian,Weibing Sun,Huchuan Lu
机构: Dalian University of Technology (大连理工大学); Affiliated Zhongshan Hospital of Dalian University of Technology (大连理工大学附属中山医院); Central Hospital of Dalian University of Technology (大连理工大学中心医院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: This work is accepted by IEEE Transactions on Image Processing. More modifications may be performed

点击查看摘要

[CV-102] Video Dataset for Surgical Phase Keypoint and Instrument Recognition in Laparoscopic Surgery (PhaKIR)

链接: https://arxiv.org/abs/2511.06549
作者: Tobias Rueckert,Raphaela Maerkl,David Rauber,Leonard Klausmann,Max Gutbrod,Daniel Rueckert,Hubertus Feussner,Dirk Wilhelm,Christoph Palm
机构: Regensburg Medical Image Computing (ReMIC), OTH Regensburg (奥格斯堡技术与科学大学); AKTORmed Robotic Surgery (AKTORmed机器人手术); Regensburg Center of Biomedical Engineering (RCBE), OTH Regensburg and Regensburg University (雷根斯堡大学); Regensburg Center of Health Sciences and Technology (RCHST), OTH Regensburg (奥格斯堡技术与科学大学); Chair for AI in Healthcare and Medicine, Technical University of Munich (TUM) and TUM University Hospital (慕尼黑工业大学及慕尼黑工业大学医院); Biomedical Image Analysis Group, Department of Computing, Imperial College London (帝国理工学院); Research Group MITI, TUM University Hospital, School of Medicine and Health, Technical University of Munich (慕尼黑工业大学); Department of Surgery, TUM University Hospital, School of Medicine and Health, Technical University of Munich (慕尼黑工业大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 9 pages, 5 figures, 4 tables

点击查看摘要

[CV-103] SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports

链接: https://arxiv.org/abs/2511.06499
作者: Haotian Xia,Haonan Ge,Junbo Zou,Hyun Woo Choi,Xuebin Zhang,Danny Suradja,Botao Rui,Ethan Tran,Wendy Jin,Zhen Ye,Xiyang Lin,Christopher Lai,Shengjie Zhang,Junwen Miao,Shichao Chen,Rhys Tracy,Vicente Ordonez,Weining Shen,Hanjie Chen
机构: Rice University (莱斯大学); University of California, Irvine (加州大学欧文分校); Georgia Institute of Technology (佐治亚理工学院); Johns Hopkins University (约翰霍普金斯大学); University of California, Santa Barbara (加州大学圣塔芭芭拉分校)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-104] A Low-Rank Method for Vision Language Model Hallucination Mitigation in Autonomous Driving

链接: https://arxiv.org/abs/2511.06496
作者: Keke Long,Jiacheng Guo,Tianyun Zhang,Hongkai Yu,Xiaopeng Li
机构: 未知
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-105] Zooming into Comics: Region-Aware RL Improves Fine-Grained Comic Understanding in Vision-Language Models

链接: https://arxiv.org/abs/2511.06490
作者: Yule Chen,Yufan Ren,Sabine Süsstrunk
机构: EPFL (瑞士联邦理工学院); Chalmers University of Technology (查尔姆斯理工大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[CV-106] NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models

链接: https://arxiv.org/abs/2511.06475
作者: Kyuho Lee,Euntae Kim,Jinwoo Choi,Buru Chang
机构: Korea University (韩国大学); Kyung Hee University (庆熙大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 18 pages, 9 figures. Preprint

点击查看摘要

[CV-107] Inpaint360GS: Efficient Object-Aware 3D Inpainting via Gaussian Splatting for 360° Scenes WACV2026

链接: https://arxiv.org/abs/2511.06457
作者: Shaoxiang Wang,Shihong Zhang,Christen Millerdurai,Rüdiger Westermann,Didier Stricker,Alain Pagani
机构: German Research Center for Artificial Intelligence (德国人工智能研究中心); RPTU (莱布尼茨大学); Technical University of Munich (慕尼黑工业大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: WACV 2026, project page: this https URL

点击查看摘要

[CV-108] EIDSeg: A Pixel-Level Semantic Segmentation Dataset for Post-Earthquake Damage Assessment from Social Media Images AAAI

链接: https://arxiv.org/abs/2511.06456
作者: Huili Huang,Chengeng Liu,Danrong Zhang,Shail Patel,Anastasiya Masalava,Sagar Sadak,Parisa Babolhavaeji,WeiHong Low,Max Mahdi Roozbahani,J. David Frost
机构: Georgia Institute of Technology (佐治亚理工学院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Camera-Ready for AAAI-AISI26

点击查看摘要

[CV-109] Countering Multi-modal Representation Collapse through Rank-targeted Fusion WACV

链接: https://arxiv.org/abs/2511.06450
作者: Seulgi Kim,Kiran Kokilepersaud,Mohit Prabhushankar,Ghassan AlRegib
机构: Georgia Institute of Technology (佐治亚理工学院)
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注: Accepted in 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

点击查看摘要

[CV-110] Diagnose Like A REAL Pathologist: An Uncertainty-Focused Approach for Trustworthy Multi-Resolution Multiple Instance Learning WACV

链接: https://arxiv.org/abs/2511.06433
作者: Sungrae Hong,Sol Lee,Jisu Shin,Mun Yong Yi
机构: Korea Advanced Institute of Science and Technology (韩国科学技术院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026

点击查看摘要

[CV-111] DiffusionUavLoc: Visually Prompted Diffusion for Cross-View UAV Localization

链接: https://arxiv.org/abs/2511.06422
作者: Tao Liu,Kan Ren,Qian Chen
机构: Nanjing University of Science and Technology (南京理工大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-112] VDNeRF: Vision-only Dynamic Neural Radiance Field for Urban Scenes

链接: https://arxiv.org/abs/2511.06408
作者: Zhengyu Zou,Jingfeng Li,Hao Li,Xiaolei Hou,Jinwen Hu,Jingkun Chen,Lechao Cheng,Dingwen Zhang
机构: Northwestern Polytechnical University (西北工业大学); University of Oxford (牛津大学); Hefei University Of Technology (合肥工业大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-113] On Modality Incomplete Infrared-Visible Object Detection: An Architecture Compatibility Perspective

链接: https://arxiv.org/abs/2511.06406
作者: Shuo Yang,Yinghui Xing,Shizhou Zhang,Zhilong Niu
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[CV-114] InfoAffect: A Dataset for Affective Analysis of Infographics

链接: https://arxiv.org/abs/2511.06404
作者: Zihang Fu,Yunchao Wang,Chenyu Huang,Guodao Sun,Ronghua Liang
机构: Zhejiang University of Technology (浙江工业大学); Zhejiang University of Science and Technology (浙江科技学院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-115] ArtReg: Visuo-Tactile based Pose Tracking and Manipulation of Unseen Articulated Objects

链接: https://arxiv.org/abs/2511.06378
作者: Prajval Kumar Murali,Mohsen Kaboli
机构: 未知
类目: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
备注: Under review

点击查看摘要

[CV-116] V-Shuffle: Zero-Shot Style Transfer via Value Shuffle

链接: https://arxiv.org/abs/2511.06365
作者: Haojun Tang,Qiwei Lin,Tongda Xu,Lida Huang,Yan Wang
机构: Tsinghua University (清华大学); Dalian University of Technology (大连理工大学); Beijing Institute of Radio Measurement (北京无线电计量测试研究所)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-117] AesTest: Measuring Aesthetic Intelligence from Perception to Production

链接: https://arxiv.org/abs/2511.06360
作者: Guolong Wang,Heng Huang,Zhiqiang Zhang,Wentian Li,Feilong Ma,Xin Jin
机构: University of International Business and Economics (对外经济贸易大学); University of Science and Technology of China (中国科学技术大学); Huawei Technologies Co., Ltd (华为技术有限公司); Beijing Electronic Science and Technology Institute (北京电子科技学院); Beijing Institute for General Artificial Intelligence (通用人工智能研究院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 10 pages, 9 figures

点击查看摘要

[CV-118] GazeVLM: A Vision-Language Model for Multi-Task Gaze Understanding

链接: https://arxiv.org/abs/2511.06348
作者: Athul M. Mathew,Haithem Hermassi,Thariq Khalid,Arshad Ali Khan,Riad Souissi
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[CV-119] BuildingWorld: A Structured 3D Building Dataset for Urban Foundation Models

链接: https://arxiv.org/abs/2511.06337
作者: Shangfeng Huang,Ruisheng Wang,Xin Wang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-120] Label-Efficient 3D Forest Mapping: Self-Supervised and Transfer Learning for Individual Structural and Species Analysis

链接: https://arxiv.org/abs/2511.06331
作者: Aldino Rizaldy,Fabian Ewald Fassnacht,Ahmed Jamal Afifi,Hua Jiang,Richard Gloaguen,Pedram Ghamisi
机构: Helmholtz-Zentrum Dresden-Rossendorf (HZDR); Helmholtz Institute Freiberg for Resource Technology (HIF); Remote Sensing and Geoinformatics, Freie Universitaet Berlin
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-121] Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection AAAI2026

链接: https://arxiv.org/abs/2511.06328
作者: Dingkang Yang,Mingcheng Li,Xuecheng Wu,Zhaoyu Chen,Kaixun Jiang,Keliang Liu,Peng Zhai,Lihua Zhang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by AAAI 2026

点击查看摘要

[CV-122] CINEMAE: Leverag ing Frozen Masked Autoencoders for Cross-Generator AI Image Detection

链接: https://arxiv.org/abs/2511.06325
作者: Minsuk Jang,Hyeonseo Jeong,Minseok Son,Changick Kim
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
备注:

点击查看摘要

[CV-123] Seq2Seq Models Reconstruct Visual Jigsaw Puzzles without Seeing Them

链接: https://arxiv.org/abs/2511.06315
作者: Gur Elkn,Ofir Itzhak Shahar,Ohad Ben-Shahar
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-124] Adaptive 3D Reconstruction via Diffusion Priors and Forward Curvature-Matching Likelihood Updates

链接: https://arxiv.org/abs/2511.06310
作者: Seunghyeok Shin,Dabin Kim,Hongki Lim
机构: Inha University (仁荷大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-125] Physics-Informed Deformable Gaussian Splatting: Towards Unified Constitutive Laws for Time-Evolving Material Field AAAI-26

链接: https://arxiv.org/abs/2511.06299
作者: Haoqin Hong,Ding Fan,Fubin Dou,Zhi-Li Zhou,Haoran Sun,Congcong Zhu,Jingrun Chen
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by AAAI-26

点击查看摘要

[CV-126] SFFR: Spatial-Frequency Feature Reconstruction for Multispectral Aerial Object Detection

链接: https://arxiv.org/abs/2511.06298
作者: Xin Zuo,Yuchen Qu,Haibo Zhan,Jifeng Shen,Wankou Yang
机构: Jiangsu University of Science and Technology (江苏科技大学); Jiangsu University (江苏大学); Southeast University (东南大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 11 pages,8 figures, accepted by IEEE TGRS

点击查看摘要

[CV-127] Learning-Based Vision Systems for Semi-Autonomous Forklift Operation in Industrial Warehouse Environments

链接: https://arxiv.org/abs/2511.06295
作者: Vamshika Sutar,Mahek Maheshwari,Archak Mittal
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-128] nyChemVL: Advancing Chemical Vision-Language Models via Efficient Visual Token Reduction and Complex Reaction Tasks AAAI2026

【速读】:该论文旨在解决当前视觉语言模型(Vision Language Models, VLMs)在化学领域应用中的两大瓶颈问题:一是直接使用标准VLM处理包含非信息背景的化学图像导致计算效率低下;二是现有方法局限于分子级任务,限制了化学推理能力的发展。解决方案的关键在于提出一种高效且强大的化学专用VLM——TinyChemVL,其核心创新包括:通过视觉token缩减策略显著降低计算开销,同时引入反应级任务(reaction-level tasks)以增强模型的化学推理能力。此外,作者还构建了ChemRxn-V基准数据集用于评估基于视觉的反应识别与预测任务。实验表明,TinyChemVL仅用4B参数即可在分子和反应任务上超越现有模型,且推理和训练速度更快,同时仅需1/16的视觉token即可优于ChemVLM。

链接: https://arxiv.org/abs/2511.06283
作者: Xuanle Zhao,Shuxin Zeng,Yinyuan Cai,Xiang Cheng,Duzhen Zhang,Xiuyi Chen,Bo Xu
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by AAAI 2026, Preprint Version

点击查看摘要

Abstract:While Vision Language Models (VLMs) have demonstrated remarkable capabilities in general visual understanding, their application in the chemical domain has been limited, with previous works predominantly focusing on text and thus overlooking critical visual information, such as molecular structures. Current approaches that directly adopt standard VLMs for chemical tasks suffer from two primary issues: (i) computational inefficiency of processing entire chemical images with non-informative backgrounds. (ii) a narrow scope on molecular-level tasks that restricts progress in chemical reasoning. In this work, we propose \textbfTinyChemVL, an efficient and powerful chemical VLM that leverages visual token reduction and reaction-level tasks to improve model efficiency and reasoning capacity. Also, we propose \textbfChemRxn-V, a reaction-level benchmark for assessing vision-based reaction recognition and prediction tasks. Directly predicting reaction products from molecular images poses a non-trivial challenge, as it requires models to integrate both recognition and reasoning capacities. Our results demonstrate that with only 4B parameters, TinyChemVL achieves superior performance on both molecular and reaction tasks while demonstrating faster inference and training speeds compared to existing models. Notably, TinyChemVL outperforms ChemVLM while utilizing only 1/16th of the visual tokens. This work builds efficient yet powerful VLMs for chemical domains by co-designing model architecture and task complexity.
zh

[CV-129] From ACR O-RADS 2022 to Explainable Deep Learning: Comparative Performance of Expert Radiologists Convolutional Neural Networks Vision Transformers and Fusion Models in Ovarian Masses

链接: https://arxiv.org/abs/2511.06282
作者: Ali Abbasian Ardakani,Afshin Mohammadi,Alisa Mohebbi,Anushya Vijayananthan,Sook Sam Leong,Lim Yi Ting,Mohd Kamil Bin Mohamad Fabell,U Rajendra Acharya,Sepideh Hatamikia
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 18 pages, 4 figures

点击查看摘要

[CV-130] VideoSSR: Video Self-Supervised Reinforcement Learning

链接: https://arxiv.org/abs/2511.06281
作者: Zefeng He,Xiaoye Qu,Yafu Li,Siyuan Huang,Daizong Liu,Yu Cheng
机构: Shanghai Artificial Intelligence Laboratory (上海人工智能实验室); Nanjing Univerisity (南京大学); The Chinese University of Hong Kong (香港中文大学); Shanghai Jiao Tong University (上海交通大学); Wuhan University (武汉大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-131] LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation ICCV2025

链接: https://arxiv.org/abs/2511.06272
作者: Zijie Wang,Weiming Zhang,Wei Zhang,Xiao Tan,Hongxing Liu,Yaowei Wang,Guanbin Li
机构: Sun Yat-sen University (中山大学); Shenzhen Loop Area Institute (深圳环区研究院); Baidu Inc. (百度公司); Harbin Institute of Technology, Shenzhen (哈尔滨工业大学(深圳)); Pengcheng Laboratory (鹏城实验室); Guangdong Key Laboratory of Big Data Analysis and Processing (广东省大数据分析与处理重点实验室)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: Accepted by ICCV 2025

点击查看摘要

[CV-132] RelightMaster: Precise Video Relighting with Multi-plane Light Images

链接: https://arxiv.org/abs/2511.06271
作者: Weikang Bian,Xiaoyu Shi,Zhaoyang Huang,Jianhong Bai,Qinghe Wang,Xintao Wang,Pengfei Wan,Kun Gai,Hongsheng Li
机构: Multimedia Laboratory, The Chinese University of Hong Kong (香港中文大学多媒体实验室); Kling Team, Kuaishou Technology (快手科技Kling团队); CPII under InnoHK (InnoHK计划下的CPII); Zhejiang University (浙江大学); Dalian University of Technology (大连理工大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Project Page: this https URL

点击查看摘要

[CV-133] LLM -Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval

链接: https://arxiv.org/abs/2511.06268
作者: Jian Zhang,Junyi Guo,Junyi Yuan,Huanda Lu,Yanlin Zhou,Fangyu Wu,Qiufeng Wang,Dongming Lu
机构: Xi’an Jiaotong-Liverpool University (西安交通大学利物浦大学); NingboTech University (宁波工程学院); Dunhuang Academy (敦煌研究院); Zhejiang University (浙江大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
备注:

点击查看摘要

[CV-134] A Mixture-of-Experts Framework with Log-Logistic Components for Survival Analysis on Histopathology Images

链接: https://arxiv.org/abs/2511.06266
作者: Ardhendu Sekhar,Vasu Soni,Keshav Aske,Shivam Madnoorkar,Pranav Jeevan,Amit Sethi
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-135] CAMP-HiVe: Cyclic Pair Merging based Efficient DNN Pruning with Hessian-Vector Approximation for Resource-Constrained Systems

链接: https://arxiv.org/abs/2511.06265
作者: Mohammad Helal Uddin,Sai Krishna Ghanta,Liam Seymour,Sabur Baidya
机构: University of Louisville (路易斯维尔大学); University of Georgia (佐治亚大学)
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-136] Robust Nearest Neighbour Retrieval Using Targeted Manifold Manipulation

链接: https://arxiv.org/abs/2511.06261
作者: B. Ghosh,H. Harikumar,S. Rana
机构: Applied Artificial Intelligence Institute, Deakin University, Australia(澳大利亚迪肯大学应用人工智能研究所); The University of Manchester, Manchester, England(英格兰曼彻斯特大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-137] VLDrive: Vision-Augmented Lightweight MLLM s for Efficient Language-grounded Autonomous Driving ICCV2025

【速读】:该论文旨在解决基于大语言模型(Large Language Models, LLMs)的视觉-语言驱动系统在自动驾驶场景中面临的两大核心问题:一是由于视觉表征能力不足导致频繁碰撞和障碍物规避失败,影响驾驶鲁棒性;二是LLM参数量庞大,难以部署到资源受限的车载环境中。解决方案的关键在于提出VLDrive架构,其核心创新包括:(1) 通过循环一致性动态视觉剪枝(cycle-consistent dynamic visual pruning)实现紧凑的视觉token表示,提升计算效率;(2) 引入记忆增强特征聚合机制优化视觉-语言联合表征学习;(3) 设计距离解耦指令注意力机制(distance-decoupled instruction attention),强化长距离视觉token的语义对齐与推理能力。实验表明,VLDrive在CARLA模拟器中实现了显著的驾驶性能提升(短距、中距、长距分别提升15.4%、16.8%、7.6%),同时将模型参数减少81%(从7B降至1.3B)。

链接: https://arxiv.org/abs/2511.06256
作者: Ruifei Zhang,Wei Zhang,Xiao Tan,Sibei Yang,Xiang Wan,Xiaonan Luo,Guanbin Li
机构: The Chinese University of Hong Kong, Shenzhen; Shenzhen Research Institute of Big Data; Sun Yat-sen University; Baidu Inc.; Guilin University of Electronic Technology; Guangdong Key Laboratory of Big Data Analysis and Processing
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by ICCV2025

点击查看摘要

Abstract:Recent advancements in language-grounded autonomous driving have been significantly promoted by the sophisticated cognition and reasoning capabilities of large language models (LLMs). However, current LLM-based approaches encounter critical challenges: (1) Failure analysis reveals that frequent collisions and obstructions, stemming from limitations in visual representations, remain primary obstacles to robust driving performance. (2) The substantial parameters of LLMs pose considerable deployment hurdles. To address these limitations, we introduce VLDrive, a novel approach featuring a lightweight MLLM architecture with enhanced vision components. VLDrive achieves compact visual tokens through innovative strategies, including cycle-consistent dynamic visual pruning and memory-enhanced feature aggregation. Furthermore, we propose a distance-decoupled instruction attention mechanism to improve joint visual-linguistic feature learning, particularly for long-range visual tokens. Extensive experiments conducted in the CARLA simulator demonstrate VLDrive`s effectiveness. Notably, VLDrive achieves state-of-the-art driving performance while reducing parameters by 81% (from 7B to 1.3B), yielding substantial driving score improvements of 15.4%, 16.8%, and 7.6% at tiny, short, and long distances, respectively, in closed-loop evaluations. Code is available at this https URL.
zh

[CV-138] AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving ICCV2025

链接: https://arxiv.org/abs/2511.06253
作者: Ruifei Zhang,Junlin Xie,Wei Zhang,Weikai Chen,Xiao Tan,Xiang Wan,Guanbin Li
机构: The Chinese University of Hong Kong, Shenzhen (香港中文大学(深圳)); Shenzhen Research Institute of Big Data (深圳市大数据研究院); Sun Yat-sen University (中山大学); Baidu Inc. (百度公司); Guangdong Key Laboratory of Big Data Analysis and Processing (广东省大数据分析与处理重点实验室)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by ICCV2025

点击查看摘要

[CV-139] st-Time Iterative Error Correction for Efficient Diffusion Models

链接: https://arxiv.org/abs/2511.06250
作者: Yunshan Zhong,Yanwei Qi,Yuxin Zhang
机构: Hainan University (海南大学); Xiamen University (厦门大学)
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-140] Gait Recognition via Collaborating Discriminative and Generative Diffusion Models

链接: https://arxiv.org/abs/2511.06245
作者: Haijun Xiong,Bin Feng,Bang Wang,Xinggang Wang,Wenyu Liu
机构: Huazhong University of Science & Technology (华中科技大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 14 pages, 4figures

点击查看摘要

[CV-141] Physics-Informed Image Restoration via Progressive PDE Integration

链接: https://arxiv.org/abs/2511.06244
作者: Shamika Likhite,Santiago López-Tapia,Aggelos K. Katsaggelos
机构: Northwestern University (西北大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-142] mporal-Guided Visual Foundation Models for Event-Based Vision

链接: https://arxiv.org/abs/2511.06238
作者: Ruihao Xia,Junhong Cai,Luziwei Leng,Liuyi Wang,Chengju Liu,Ran Cheng,Yang Tang,Pan Zhou
机构: East China University of Science and Technology (华东理工大学); Huawei Technologies Company Ltd. (华为技术有限公司); Southern University of Science and Technology (南方科技大学); Tongji University (同济大学); The Hong Kong Polytechnic University (香港理工大学); Singapore Management University (新加坡管理大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-143] MoRA: Missing Modality Low-Rank Adaptation for Visual Recognition

链接: https://arxiv.org/abs/2511.06225
作者: Shu Zhao,Nilesh Ahuja,Tan Yu,Tianyi Shen,Vijaykrishnan Narayanan
机构: The Pennsylvania State University (宾夕法尼亚州立大学); Intel (英特尔); NVIDIA (英伟达)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-144] Scene-Aware Urban Design: A Human-AI Recommendation Framework Using Co-Occurrence Embeddings and Vision-Language Models NEURIPS2025

链接: https://arxiv.org/abs/2511.06201
作者: Rodrigo Gallardo,Oz Fishman,Alexander Htet Kyaw
机构: Massachusetts Institute of Technology (麻省理工学院)
类目: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
备注: Accepted to NEURIPS 2025 Creative AI Track

点击查看摘要

[CV-145] NURBGen: High-Fidelity Text-to-CAD Generation through LLM -Driven NURBS Modeling AAAI2026

链接: https://arxiv.org/abs/2511.06194
作者: Muhammad Usama,Mohammad Sadil Khan,Didier Stricker,Muhammad Zeshan Afzal
机构: 1. University of Siegen (锡根大学); 2. Fraunhofer Institute for Computer Graphics Research (弗劳恩霍夫计算机图形研究所); 3. Center for Advanced Security Research Darmstadt (达姆施塔特高级安全研究中心)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted in AAAI 2026

点击查看摘要

[CV-146] MambaOVSR: Multiscale Fusion with Global Motion Modeling for Chinese Opera Video Super-Resolution

链接: https://arxiv.org/abs/2511.06172
作者: Hua Chang,Xin Xu,Wei Liu,Wei Wang,Xin Yuan,Kui Jiang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[CV-147] Real-Time Bundle Adjustment for Ultra-High-Resolution UAV Imagery Using Adaptive Patch-Based Feature Tracking

链接: https://arxiv.org/abs/2511.06152
作者: Selim Ahmet Iz,Francesco Nex,Norman Kerle,Henry Meissner,Ralf Berger
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
备注:

点击查看摘要

[CV-148] Latent Refinement via Flow Matching for Training-free Linear Inverse Problem Solving

链接: https://arxiv.org/abs/2511.06138
作者: Hossein Askari,Yadan Luo,Hongfu Sun,Fred Roosta
机构: The University of Queensland (昆士兰大学); ARC Training Centre for Information Resilience (CIRES) (信息韧性培训中心)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 37 pages, 16 figures,

点击查看摘要

[CV-149] DiLO: Disentangled Latent Optimization for Learning Shape and Deformation in Grouped Deforming 3D Objects

【速读】:该论文旨在解决如何在无监督条件下将分组变形的3D物体参数化为形状(shape)和形变(deformation)因子的问题,从而实现对复杂形变结构的有效解耦表示。其解决方案的关键在于提出一种基于潜在空间优化(disentangled latent optimization)的方法,联合优化生成网络与形状及形变因子,并引入特定正则化技术以确保解耦性;此外,在第二阶段训练两个顺序不变的PoinNet编码器网络,实现高效且可推广的解耦编码推理,显著提升了下游任务如无监督形变迁移、形变分类和可解释性分析的效果。

链接: https://arxiv.org/abs/2511.06115
作者: Mostofa Rafid Uddin,Jana Armouti,Umong Sain,Md Asib Rahman,Xingjian Li,Min Xu
机构: Carnegie Mellon University (卡内基梅隆大学); Bangladesh University of Engineering and Technology (孟加拉国工程技术大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:In this work, we propose a disentangled latent optimization-based method for parameterizing grouped deforming 3D objects into shape and deformation factors in an unsupervised manner. Our approach involves the joint optimization of a generator network along with the shape and deformation factors, supported by specific regularization techniques. For efficient amortized inference of disentangled shape and deformation codes, we train two order-invariant PoinNet-based encoder networks in the second stage of our method. We demonstrate several significant downstream applications of our method, including unsupervised deformation transfer, deformation classification, and explainability analysis. Extensive experiments conducted on 3D human, animal, and facial expression datasets demonstrate that our simple approach is highly effective in these downstream tasks, comparable or superior to existing methods with much higher complexity.
zh

[CV-150] Hybrid CNN-ViT Framework for Motion-Blurred Scene Text Restoration

【速读】:该论文旨在解决场景文本图像中运动模糊(motion blur)导致的可读性下降问题,该问题严重影响自动驾驶、文档数字化和视觉信息检索等计算机视觉任务的可靠性。传统去模糊方法在处理空间变化的模糊以及建模长距离依赖关系方面表现不足。解决方案的关键在于提出一种结合卷积神经网络(CNN)与视觉Transformer(ViT)的混合深度学习框架:CNN编码器-解码器结构用于保留文本的局部结构细节,而Transformer模块通过自注意力机制增强全局上下文感知能力,从而有效恢复模糊文本的清晰度。

链接: https://arxiv.org/abs/2511.06087
作者: Umar Rashid(1),Muhammad Arslan Arshad(1),Ghulam Ahmad(1),Muhammad Zeeshan Anjum(1),Rizwan Khan(1),Muhammad Akmal(2) ((1) University of Engineering amp; Technology, New Campus, Lahore, Pakistan, (2) Sheffield Hallam University, Sheffield, UK)
机构: University of Engineering & Technology, New Campus, Lahore, Pakistan; Sheffield Hallam University, Sheffield S1 1WB, UK
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Motion blur in scene text images severely impairs readability and hinders the reliability of computer vision tasks, including autonomous driving, document digitization, and visual information retrieval. Conventional deblurring approaches are often inadequate in handling spatially varying blur and typically fall short in modeling the long-range dependencies necessary for restoring textual clarity. To overcome these limitations, we introduce a hybrid deep learning framework that combines convolutional neural networks (CNNs) with vision transformers (ViTs), thereby leveraging both local feature extraction and global contextual reasoning. The architecture employs a CNN-based encoder-decoder to preserve structural details, while a transformer module enhances global awareness through self-attention. Training is conducted on a curated dataset derived from TextOCR, where sharp scene-text samples are paired with synthetically blurred versions generated using realistic motion-blur kernels of multiple sizes and orientations. Model optimization is guided by a composite loss that incorporates mean absolute error (MAE), squared error (MSE), perceptual similarity, and structural similarity (SSIM). Quantitative eval- uations show that the proposed method attains 32.20 dB in PSNR and 0.934 in SSIM, while remaining lightweight with 2.83 million parameters and an average inference time of 61 ms. These results highlight the effectiveness and computational efficiency of the CNN-ViT hybrid design, establishing its practicality for real-world motion-blurred scene-text restoration.
zh

[CV-151] An Artificial Intelligence-based Assistant for the Visually Impaired

链接: https://arxiv.org/abs/2511.06080
作者: Luis Marquez-Carpintero,Francisco Gomez-Donoso,Zuria Bauer,Bessie Dominguez-Dager,Alvaro Belmonte-Baeza,Mónica Pina-Navarro,Francisco Morillas-Espejo,Felix Escalona,Miguel Cazorla
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
备注:

点击查看摘要

[CV-152] LoopExpose: An Unsupervised Framework for Arbitrary-Length Exposure Correction

链接: https://arxiv.org/abs/2511.06066
作者: Ao Li,Chen Chen,Zhenyu Wang,Tao Huang,Fangfang Wu,Weisheng Dong
机构: Xidian University (西安电子科技大学); Dalian University of Technology (大连理工大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-153] Identity Card Presentation Attack Detection: A Systematic Review

【速读】:该论文旨在解决身份文档的生成式 AI (Generative AI) 攻击检测(Presentation Attack Detection, PAD)中存在的数据稀缺与模型泛化能力不足的问题。当前基于深度学习(Deep Learning, DL)的PAD方法受限于私有数据集上的良好表现难以在公开数据集上复现,且合成数据在实际攻击场景中的有效性存疑,导致模型易过拟合于生成伪影而非真实攻击特征。其解决方案的关键在于识别并量化两个核心“差距”:一是“现实差距”(Reality Gap),即模型在大规模私有数据与有限公共数据间的性能差异;二是“合成效用差距”(Synthetic Utility Gap),即合成数据无法有效反映真实伪造攻击的 forensic 特征。通过系统性文献综述(SLR)厘清技术演进路径,并提出一个可操作的研究路线图,以推动开发具备全球泛化能力、鲁棒性强且安全可靠的下一代PAD系统。

链接: https://arxiv.org/abs/2511.06056
作者: Esteban M. Ruiz,Juan E. Tapia,Reinel T. Soto,Christoph Busch
机构: Hochschule Darmstadt (达姆施塔特应用技术大学); Universidad Autónoma de Manizales (曼萨莱斯自治大学); Universidad de Caldas (卡爾達斯大學)
类目: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Remote identity verification is essential for modern digital security; however, it remains highly vulnerable to sophisticated Presentation Attacks (PAs) that utilise forged or manipulated identity documents. Although Deep Learning (DL) has driven advances in Presentation Attack Detection (PAD), the field is fundamentally limited by a lack of data and the poor generalisation of models across various document types and new attack methods. This article presents a systematic literature review (SLR) conducted in accordance with the PRISMA methodology, aiming to analyse and synthesise the current state of AI-based PAD for identity documents from 2020 to 2025 comprehensively. Our analysis reveals a significant methodological evolution: a transition from standard Convolutional Neural Networks (CNNs) to specialised forensic micro-artefact analysis, and more recently, the adoption of large-scale Foundation Models (FMs), marking a substantial shift in the field. We identify a central paradox that hinders progress: a critical “Reality Gap” exists between models validated on extensive, private datasets and those assessed using limited public datasets, which typically consist of mock-ups or synthetic data. This gap limits the reproducibility of research results. Additionally, we highlight a “Synthetic Utility Gap,” where synthetic data generation the primary academic response to data scarcity often fails to predict forensic utility. This can lead to model overfitting to generation artefacts instead of the actual attack. This review consolidates our findings, identifies critical research gaps, and provides a definitive reference framework that outlines a prescriptive roadmap for future research aimed at developing secure, robust, and globally generalizable PAD systems. Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV) Cite as: arXiv:2511.06056 [cs.CR] (or arXiv:2511.06056v1 [cs.CR] for this version) https://doi.org/10.48550/arXiv.2511.06056 Focus to learn more arXiv-issued DOI via DataCite (pending registration) Submission history From: Juan Tapia Dr. [view email] [v1] Sat, 8 Nov 2025 15:55:37 UTC (2,723 KB)
zh

[CV-154] Neodrag on: Mobile Video Generation using Diffusion Transformer

链接: https://arxiv.org/abs/2511.06055
作者: Animesh Karnewar,Denis Korzhenkov,Ioannis Lelekas,Adil Karjauv,Noor Fathima,Hanwen Xiong,Vancheeswaran Vaidyanathan,Will Zeng,Rafael Esteves,Tushar Singhal,Fatih Porikli,Mohsen Ghafoorian,Amirhossein Habibian
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-155] StreamSTGS: Streaming Spatial and Temporal Gaussian Grids for Real-Time Free-Viewpoint Video WWW AAAI2026

链接: https://arxiv.org/abs/2511.06046
作者: Zhihui Ke,Yuyang Liu,Xiaobo Zhou,Tie Qiu
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by AAAI 2026. Code will be released at this https URL

点击查看摘要

[CV-156] S2ML: Spatio-Spectral Mutual Learning for Depth Completion

链接: https://arxiv.org/abs/2511.06033
作者: Zihui Zhao,Yifei Zhang,Zheng Wang,Yang Li,Kui Jiang,Zihan Geng,Chia-Wen Lin
机构: Tsinghua Shenzhen International Graduate School, Tsinghua University (清华大学深圳国际研究生院); Wuhan University (武汉大学); Harbin Institute of Technology (哈尔滨工业大学); National Tsing Hua University (国立清华大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[CV-157] owards Implicit Aggregation: Robust Image Representation for Place Recognition in the Transformer Era NEURIPS2025

【速读】:该论文旨在解决视觉场景识别(Visual Place Recognition, VPR)中传统方法依赖显式聚合模块(aggregator)的问题,即现有主流方法(如NetVLAD)通常采用“骨干网络+显式聚合器”的范式,先提取图像块特征(patch features),再通过额外的聚合层生成全局描述符。然而,在基于Transformer的模型中,作者提出无需设计专门的聚合模块,仅通过骨干网络即可生成鲁棒的全局描述符。其解决方案的关键在于引入可学习的聚合标记(aggregation tokens),这些标记在特定Transformer块之前被插入到图像块标记序列中,并借助Transformer固有的自注意力机制实现隐式聚合——所有标记在多头自注意力下进行全局交互,从而将有用信息从图像块标记隐式地汇聚到聚合标记中。最终,仅取最后一层输出中的聚合标记并拼接作为全局表示。该方法显著简化了架构设计,在多个VPR数据集上性能优于现有最先进方法,且效率更高。

链接: https://arxiv.org/abs/2511.06024
作者: Feng Lu,Tong Jin,Canming Ye,Yunpeng Liu,Xiangyuan Lan,Chun Yuan
机构: Tsinghua Shenzhen International Graduate School, Tsinghua University (清华大学深圳国际研究生院); Pengcheng Laboratory (鹏城实验室); Shenyang Institute of Automation, Chinese Academy of Sciences (中国科学院沈阳自动化研究所); University of Chinese Academy of Sciences (中国科学院大学); Pazhou Laboratory (Huangpu) (琶洲实验室(黄埔))
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by NeurIPS 2025

点击查看摘要

Abstract:Visual place recognition (VPR) is typically regarded as a specific image retrieval task, whose core lies in representing images as global descriptors. Over the past decade, dominant VPR methods (e.g., NetVLAD) have followed a paradigm that first extracts the patch features/tokens of the input image using a backbone, and then aggregates these patch features into a global descriptor via an aggregator. This backbone-plus-aggregator paradigm has achieved overwhelming dominance in the CNN era and remains widely used in transformer-based models. In this paper, however, we argue that a dedicated aggregator is not necessary in the transformer era, that is, we can obtain robust global descriptors only with the backbone. Specifically, we introduce some learnable aggregation tokens, which are prepended to the patch tokens before a particular transformer block. All these tokens will be jointly processed and interact globally via the intrinsic self-attention mechanism, implicitly aggregating useful information within the patch tokens to the aggregation tokens. Finally, we only take these aggregation tokens from the last output tokens and concatenate them as the global representation. Although implicit aggregation can provide robust global descriptors in an extremely simple manner, where and how to insert additional tokens, as well as the initialization of tokens, remains an open issue worthy of further exploration. To this end, we also propose the optimal token insertion strategy and token initialization method derived from empirical studies. Experimental results show that our method outperforms state-of-the-art methods on several VPR datasets with higher efficiency and ranks 1st on the MSLS challenge leaderboard. The code is available at this https URL.
zh

[CV-158] MiVID: Multi-Strategic Self-Supervision for Video Frame Interpolation using Diffusion Model

【速读】:该论文旨在解决视频帧插值(Video Frame Interpolation, VFI)中因遮挡、域偏移和运动模糊导致的性能瓶颈问题,尤其在缺乏密集真值标注的情况下如何实现高质量的时序帧生成。其解决方案的关键在于提出了一种轻量级、自监督的扩散模型MiVID,通过结合3D U-Net主干与基于Transformer的时序注意力机制,并采用混合掩码训练策略模拟遮挡和运动不确定性,从而无需显式运动估计即可学习鲁棒的时空表征;同时利用余弦渐进掩码和自适应损失调度,使模型在仅使用CPU训练且无高帧率监督的情况下,仅用50个epoch即可达到优于多个监督方法的效果,验证了自监督扩散先验在时序一致帧合成中的强大能力。

链接: https://arxiv.org/abs/2511.06019
作者: Priyansh Srivastava,Romit Chatterjee,Abir Sen,Aradhana Behura,Ratnakar Dash
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Video Frame Interpolation (VFI) remains a cornerstone in video enhancement, enabling temporal upscaling for tasks like slow-motion rendering, frame rate conversion, and video restoration. While classical methods rely on optical flow and learning-based models assume access to dense ground-truth, both struggle with occlusions, domain shifts, and ambiguous motion. This article introduces MiVID, a lightweight, self-supervised, diffusion-based framework for video interpolation. Our model eliminates the need for explicit motion estimation by combining a 3D U-Net backbone with transformer-style temporal attention, trained under a hybrid masking regime that simulates occlusions and motion uncertainty. The use of cosine-based progressive masking and adaptive loss scheduling allows our network to learn robust spatiotemporal representations without any high-frame-rate supervision. Our framework is evaluated on UCF101-7 and DAVIS-7 datasets. MiVID is trained entirely on CPU using the datasets and 9-frame video segments, making it a low-resource yet highly effective pipeline. Despite these constraints, our model achieves optimal results at just 50 epochs, competitive with several supervised this http URL work demonstrates the power of self-supervised diffusion priors for temporally coherent frame synthesis and provides a scalable path toward accessible and generalizable VFI systems.
zh

[CV-159] One-Shot Knowledge Transfer for Scalable Person Re-Identification ICCV2025

链接: https://arxiv.org/abs/2511.06016
作者: Longhua Li,Lei Qi,Xin Geng
机构: School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: Accepted by ICCV 2025

点击查看摘要

[CV-160] Distributed Deep Learning for Medical Image Denoising with Data Obfuscation

链接: https://arxiv.org/abs/2511.06006
作者: Sulaimon Oyeniyi Adebayo,Ayaz H. Khan
机构: King Fahd University of Petroleum and Minerals (国王法赫德石油矿产大学); SDAIA-KFUPM Joint Research Center for Artificial Intelligence (沙特数据与人工智能局-国王法赫德石油矿产大学联合人工智能研究中心)
类目: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
备注:

点击查看摘要

[CV-161] How Reasoning Influences Intersectional Biases in Vision Language Models

链接: https://arxiv.org/abs/2511.06005
作者: Adit Desai,Sudipta Roy,Mohna Chakraborty
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

[CV-162] MALeR: Improving Compositional Fidelity in Layout-Guided Generation SIGGRAPH

链接: https://arxiv.org/abs/2511.06002
作者: Shivank Saxena,Dhruv Srivastava,Makarand Tapaswi
机构: CVIT, IIIT Hyderabad (计算机视觉与图像技术中心,印度国际信息技术研究所); Adobe Research (Adobe 研究院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: ACM TOG Dec 2025, Siggraph Asia, Project page: this https URL

点击查看摘要

[CV-163] Exploring Category-level Articulated Object Pose Tracking on SE(3) Manifolds

【速读】:该论文旨在解决关节式物体(articulated objects)在多帧场景下的位姿跟踪(pose tracking)问题,这在机器人操作、具身智能和增强现实等领域具有重要意义。相较于刚性物体,关节式物体因存在运动学约束而更难进行准确跟踪。其解决方案的关键在于提出了一种基于点对特征(Point Pair Features, PPF)的位姿跟踪框架——PPF-Tracker:首先在SE(3)李群空间中对点云进行准规范化的预处理,利用PPF在SE(3)变换下的不变性来预测位姿投票参数;随后引入关节轴的语义信息以统一施加运动学约束于物体各部件,从而实现鲁棒且精确的多帧位姿估计。

链接: https://arxiv.org/abs/2511.05996
作者: Xianhui Meng,Yukang Huo,Li Zhang,Liu Liu,Haonan Jiang,Yan Zhong,Pingrui Zhang,Cewu Lu,Jun Liu
机构: 1. University of Science and Technology of China (中国科学技术大学); 2. Tsinghua University (清华大学); 3. Peking University (北京大学); 4. Chinese Academy of Sciences (中国科学院); 5. National University of Singapore (新加坡国立大学); 6. University of California, Berkeley (加州大学伯克利分校); 7. University of Oxford (牛津大学); 8. Stanford University (斯坦福大学); 9. Microsoft Research (微软研究院)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Articulated objects are prevalent in daily life and robotic manipulation tasks. However, compared to rigid objects, pose tracking for articulated objects remains an underexplored problem due to their inherent kinematic constraints. To address these challenges, this work proposes a novel point-pair-based pose tracking framework, termed \textbfPPF-Tracker. The proposed framework first performs quasi-canonicalization of point clouds in the SE(3) Lie group space, and then models articulated objects using Point Pair Features (PPF) to predict pose voting parameters by leveraging the invariance properties of SE(3). Finally, semantic information of joint axes is incorporated to impose unified kinematic constraints across all parts of the articulated object. PPF-Tracker is systematically evaluated on both synthetic datasets and real-world scenarios, demonstrating strong generalization across diverse and challenging environments. Experimental results highlight the effectiveness and robustness of PPF-Tracker in multi-frame pose tracking of articulated objects. We believe this work can foster advances in robotics, embodied intelligence, and augmented reality. Codes are available at this https URL.
zh

[CV-164] A Dual-Mode ViT-Conditioned Diffusion Framework with an Adaptive Conditioning Bridge for Breast Cancer Segmentation

【速读】:该论文旨在解决乳腺超声图像中病灶分割精度不足的问题,主要挑战包括低对比度、斑点噪声以及边界不清晰等因素导致的分割结果不准确和解剖结构不一致。解决方案的关键在于提出一种灵活且条件化的去噪扩散模型(Denoising Diffusion Model),其核心创新包括:1)自适应条件桥接模块(Adaptive Conditioning Bridge, ACB),实现语义特征的多尺度高效融合;2)拓扑一致性去噪损失(Topological Denoising Consistency, TDC),通过惩罚去噪过程中的结构不一致性来正则化训练;3)双头架构设计,利用去噪目标作为强大正则器,使轻量级辅助头能够在小数据集上实现快速准确推理,同时保留噪声预测头以提升主任务性能。该方法在多个公开乳腺超声数据集上达到了新的最先进水平,验证了其在准确性与解剖合理性上的优势。

链接: https://arxiv.org/abs/2511.05989
作者: Prateek Singh,Moumita Dholey,P.K. Vinod
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 5 pages, 2 figures, 3 tables, submitted to ISBI 2026

点击查看摘要

Abstract:In breast ultrasound images, precise lesion segmentation is essential for early diagnosis; however, low contrast, speckle noise, and unclear boundaries make this difficult. Even though deep learning models have demonstrated potential, standard convolutional architectures frequently fall short in capturing enough global context, resulting in segmentations that are anatomically inconsistent. To overcome these drawbacks, we suggest a flexible, conditional Denoising Diffusion Model that combines an enhanced UNet-based generative decoder with a Vision Transformer (ViT) encoder for global feature extraction. We introduce three primary innovations: 1) an Adaptive Conditioning Bridge (ACB) for efficient, multi-scale fusion of semantic features; 2) a novel Topological Denoising Consistency (TDC) loss component that regularizes training by penalizing structural inconsistencies during denoising; and 3) a dual-head architecture that leverages the denoising objective as a powerful regularizer, enabling a lightweight auxiliary head to perform rapid and accurate inference on smaller datasets and a noise prediction head. Our framework establishes a new state-of-the-art on public breast ultrasound datasets, achieving Dice scores of 0.96 on BUSI, 0.90 on BrEaST and 0.97 on BUS-UCLM. Comprehensive ablation studies empirically validate that the model components are critical for achieving these results and for producing segmentations that are not only accurate but also anatomically plausible.
zh

[CV-165] Runtime Safety Monitoring of Deep Neural Networks for Perception: A Survey

【速读】:该论文旨在解决深度神经网络(Deep Neural Networks, DNNs)在安全关键应用(如自动驾驶和机器人感知系统)中面临的运行时安全性问题,包括泛化误差、分布外(out-of-distribution, OOD)输入以及对抗攻击等可能导致危险故障的隐患。其解决方案的关键在于提出一种无需修改DNN结构或参数的运行时安全监控方法,该方法在推理阶段与DNN并行运行,通过监测输入数据、内部表征和输出结果三个维度来实时识别潜在的安全风险,并对不同监控策略进行系统性分类、分析与对比,从而为提升DNN系统的鲁棒性和可信赖性提供理论支撑与实践路径。

链接: https://arxiv.org/abs/2511.05982
作者: Albert Schotschneider,Svetlana Pavlitska,J. Marius Zöllner
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
备注: 6 pages, 1 figure, 2 tables, accepted at IEEE SMC 2025 in Vienna, presented on 8th October 2025

点击查看摘要

Abstract:Deep neural networks (DNNs) are widely used in perception systems for safety-critical applications, such as autonomous driving and robotics. However, DNNs remain vulnerable to various safety concerns, including generalization errors, out-of-distribution (OOD) inputs, and adversarial attacks, which can lead to hazardous failures. This survey provides a comprehensive overview of runtime safety monitoring approaches, which operate in parallel to DNNs during inference to detect these safety concerns without modifying the DNN itself. We categorize existing methods into three main groups: Monitoring inputs, internal representations, and outputs. We analyze the state-of-the-art for each category, identify strengths and limitations, and map methods to the safety concerns they address. In addition, we highlight open challenges and future research directions.
zh

[CV-166] DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities AAAI AAAI-26

链接: https://arxiv.org/abs/2511.05968
作者: Nagur Shareef Shaik,Teja Krishna Cherukuri,Adnan Masood,Dong Hye Ye
机构: 1: Korea University of Technology and Education (韩国技术教育大学); 2: Samsung Electronics (三星电子)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: Accepted for Oral Presentation at the 40th AAAI Conference on Artificial Intelligence (AAAI-26), Main Technical Track

点击查看摘要

[CV-167] Adapted Foundation Models for Breast MRI Triaging in Contrast-Enhanced and Non-Contrast Enhanced Protocols

【速读】:该论文旨在解决乳腺MRI影像诊断中人工判读耗时且效率低的问题,提出利用基于DINOv2的Medical Slice Transformer (MST)模型实现对显著病变(BI-RADS = 4)的自动筛查与排除。其解决方案的关键在于构建一个可泛化的深度学习框架,通过融合不同序列(如T1加权早期减影图像和T2加权图像或扩散加权成像b=1500 s/mm²),在保持97.5%高敏感度的前提下提升特异性(达19%),从而有效识别无需进一步评估的阴性病例,显著降低放射科医生的工作负担。

链接: https://arxiv.org/abs/2511.05967
作者: Tri-Thien Nguyen,Lorenz A. Kapsner,Tobias Hepp,Shirin Heidarikahkesh,Hannes Schreiter,Luise Brock,Dominika Skwierawska,Dominique Hadler,Julian Hossbach,Evelyn Wenkel,Sabine Ohlmeyer,Frederik B. Laun,Andrzej Liebert,Andreas Maier,Michael Uder,Sebastian Bickelhaupt
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: 23 pages, 6 figures, 4 tables. Originally submitted to Radiology (RAD-25-2541); under consideration for transfer to Radiology: Artificial Intelligence (RSNA Portfolio Journal)

点击查看摘要

Abstract:Background: Magnetic resonance imaging (MRI) has high sensitivity for breast cancer detection, but interpretation is time-consuming. Artificial intelligence may aid in pre-screening. Purpose: To evaluate the DINOv2-based Medical Slice Transformer (MST) for ruling out significant findings (Breast Imaging Reporting and Data System [BI-RADS] =4) in contrast-enhanced and non-contrast-enhanced abbreviated breast MRI. Materials and Methods: This institutional review board approved retrospective study included 1,847 single-breast MRI examinations (377 BI-RADS =4) from an in-house dataset and 924 from an external validation dataset (Duke). Four abbreviated protocols were tested: T1-weighted early subtraction (T1sub), diffusion-weighted imaging with b=1500 s/mm2 (DWI1500), DWI1500+T2-weighted (T2w), and T1sub+T2w. Performance was assessed at 90%, 95%, and 97.5% sensitivity using five-fold cross-validation and area under the receiver operating characteristic curve (AUC) analysis. AUC differences were compared with the DeLong test. False negatives were characterized, and attention maps of true positives were rated in the external dataset. Results: A total of 1,448 female patients (mean age, 49 +/- 12 years) were included. T1sub+T2w achieved an AUC of 0.77 +/- 0.04; DWI1500+T2w, 0.74 +/- 0.04 (p=0.15). At 97.5% sensitivity, T1sub+T2w had the highest specificity (19% +/- 7%), followed by DWI1500+T2w (17% +/- 11%). Missed lesions had a mean diameter 10 mm at 95% and 97.5% thresholds for both T1sub and DWI1500, predominantly non-mass enhancements. External validation yielded an AUC of 0.77, with 88% of attention maps rated good or moderate. Conclusion: At 97.5% sensitivity, the MST framework correctly triaged cases without BI-RADS =4, achieving 19% specificity for contrast-enhanced and 17% for non-contrast-enhanced MRI. Further research is warranted before clinical implementation.
zh

[CV-168] Commonality in Few: Few-Shot Multimodal Anomaly Detection via Hypergraph-Enhanced Memory AAAI2026

链接: https://arxiv.org/abs/2511.05966
作者: Yuxuan Lin,Hanjing Yan,Xuan Tong,Yang Chang,Huanzhen Wang,Ziheng Zhou,Shuyong Gao,Yan Wang,Wenqiang Zhang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by AAAI 2026

点击查看摘要

[CV-169] Adaptive Agent Selection and Interaction Network for Image-to-point cloud Registration AAAI2026

【速读】:该论文旨在解决图像到点云配准中检测-free方法在复杂场景下因噪声干扰导致相似性计算失真、错误对应关系频发,以及跨模态信息选择不充分限制配准鲁棒性和精度的问题。其解决方案的关键在于提出一种包含两个核心模块的新型跨模态配准框架:一是迭代代理选择(Iterative Agents Selection, IAS)模块,通过相位图增强结构特征感知,并引入强化学习机制高效筛选可靠代理;二是可靠代理交互(Reliable Agents Interaction, RAI)模块,利用所选代理引导跨模态交互,有效抑制误匹配并提升整体鲁棒性。

链接: https://arxiv.org/abs/2511.05965
作者: Zhixin Cheng,Xiaotian Yin,Jiacheng Deng,Bohao Liao,Yujia Chen,Xu Zhou,Baoqun Yin,Tianzhu Zhang
机构: 1. University of Science and Technology of China (中国科学技术大学); 2. Institute of Artificial Intelligence, University of Science and Technology of China (中国科学技术大学人工智能研究院); 3. Alibaba Group (阿里巴巴集团); 4. Tsinghua University (清华大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: Accepted by AAAI2026

点击查看摘要

Abstract:Typical detection-free methods for image-to-point cloud registration leverage transformer-based architectures to aggregate cross-modal features and establish correspondences. However, they often struggle under challenging conditions, where noise disrupts similarity computation and leads to incorrect correspondences. Moreover, without dedicated designs, it remains difficult to effectively select informative and correlated representations across modalities, thereby limiting the robustness and accuracy of registration. To address these challenges, we propose a novel cross-modal registration framework composed of two key modules: the Iterative Agents Selection (IAS) module and the Reliable Agents Interaction (RAI) module. IAS enhances structural feature awareness with phase maps and employs reinforcement learning principles to efficiently select reliable agents. RAI then leverages these selected agents to guide cross-modal interactions, effectively reducing mismatches and improving overall robustness. Extensive experiments on the RGB-D Scenes v2 and 7-Scenes benchmarks demonstrate that our method consistently achieves state-of-the-art performance.
zh

[CV-170] CSGaze: Context-aware Social Gaze Prediction

【速读】:该论文旨在解决社交注视(social gaze)模式预测中的准确性与可解释性问题,尤其是在多人对话场景下如何有效融合上下文线索、视觉场景信息和面部特征以提升预测性能。其解决方案的关键在于提出一种名为CSGaze的上下文感知多模态方法,该方法通过整合面部信息和场景信息作为互补输入,并引入以主说话者为中心的细粒度注意力机制,从而更精准地建模社交注视动态。实验表明,该方法在GP-Static、UCO-LAEO和AVA-LAEO等多个基准数据集上达到先进水平,同时借助生成的注意力得分提供了初步的模型可解释性,验证了其在开放集场景下的泛化能力。

链接: https://arxiv.org/abs/2511.05955
作者: Surbhi Madan,Shreya Ghosh,Ramanathan Subramanian,Abhinav Dhall,Tom Gedeon
机构: IIT Ropar; National Institute of Informatics Japan; The University of Queensland Australia; University of Canberra Australia; Monash University Australia; Curtin University Australia
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:A person’s gaze offers valuable insights into their focus of attention, level of social engagement, and confidence. In this work, we investigate how contextual cues combined with visual scene and facial information can be effectively utilized to predict and interpret social gaze patterns during conversational interactions. We introduce CSGaze, a context aware multimodal approach that leverages facial, scene information as complementary inputs to enhance social gaze pattern prediction from multi-person images. The model also incorporates a fine-grained attention mechanism centered on the principal speaker, which helps in better modeling social gaze dynamics. Experimental results show that CSGaze performs competitively with state-of-the-art methods on GP-Static, UCO-LAEO and AVA-LAEO. Our findings highlight the role of contextual cues in improving social gaze prediction. Additionally, we provide initial explainability through generated attention scores, offering insights into the model’s decision-making process. We also demonstrate our model’s generalizability by testing our model on open set datasets that demonstrating its robustness across diverse scenarios.
zh

[CV-171] Pinching Visuo-haptic Display: Investigating Cross-Modal Effects of Visual Textures on Electrostatic Cloth Tactile Sensations

【速读】:该论文旨在解决视觉纹理呈现如何影响用户在交互静电织物显示设备时的触觉感知问题,特别是探讨多模态感知中视觉粗糙度与触觉摩擦力之间的跨模态效应。解决方案的关键在于提出了一种视觉-触觉(visuo-haptic)系统,允许用户通过夹捏和摩擦虚拟织物来体验由静电激励调控的真实摩擦感,并通过用户实验验证了视觉粗糙度能显著增强感知摩擦力,即使在相同的静电刺激条件下也是如此。这一发现为虚拟材料界面中的触觉反馈设计提供了重要依据。

链接: https://arxiv.org/abs/2511.05952
作者: Takekazu Kitagishi,Chun-Wei Ooi,Yuichi Hiroi,Jun Rekimoto
机构: The University of Tokyo(东京大学); ZOZO Research(ZOZO 研究所); Cluster Metaverse Lab(集群元宇宙实验室); Sony CSL Kyoto(索尼计算机科学实验室)
类目: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
备注: 10 pages, 8 figures, 3 tables. Presented at ACM International Conference on Multimodal Interaction (ICMI) 2025

点击查看摘要

Abstract:This paper investigates how visual texture presentation influences tactile perception when interacting with electrostatic cloth displays. We propose a visuo-haptic system that allows users to pinch and rub virtual fabrics while feeling realistic frictional sensations modulated by electrostatic actuation. Through a user study, we examined the cross-modal effects between visual roughness and perceived tactile friction. The results demonstrate that visually rough textures amplify the perceived frictional force, even under identical electrostatic stimuli. These findings contribute to the understanding of multimodal texture perception and provide design insights for haptic feedback in virtual material interfaces.
zh

[CV-172] U(PM)2:Unsupervised polygon matching with pre-trained models for challenging stereo images

【速读】:该论文旨在解决**多边形匹配(polygon matching)**在计算机视觉、摄影测量与遥感领域中面临的四大挑战:视差不连续性(disparity discontinuity)、尺度变化(scale variation)、训练需求以及泛化能力不足。其解决方案的关键在于提出一种名为U(PM)²的低代价无监督多边形匹配方法,该方法通过融合预训练模型与手工设计特征构建端到端流程:首先利用预训练的Segment Anything Model(SAM)生成掩码并转化为多边形及图形结构;其次基于双向金字塔策略和预训练LoFTR模型实现全局匹配以应对视角变化与尺度差异;最后引入局部联合几何约束与多特征匹配策略结合匈牙利算法进一步优化局部视差不连续性和拓扑不一致性问题。整体方案无需任何训练即可实现高精度、强泛化能力和高效性能。

链接: https://arxiv.org/abs/2511.05949
作者: Chang Li,Xingtao Peng
机构: Central China Normal University (华中师范大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Stereo image matching is a fundamental task in computer vision, photogrammetry and remote sensing, but there is an almost unexplored field, i.e., polygon matching, which faces the following challenges: disparity discontinuity, scale variation, training requirement, and generalization. To address the above-mentioned issues, this paper proposes a novel U(PM) ^2 : low-cost unsupervised polygon matching with pre-trained models by uniting automatically learned and handcrafted features, of which pipeline is as follows: firstly, the detector leverages the pre-trained segment anything model to obtain masks; then, the vectorizer converts the masks to polygons and graphic structure; secondly, the global matcher addresses challenges from global viewpoint changes and scale variation based on bidirectional-pyramid strategy with pre-trained LoFTR; finally, the local matcher further overcomes local disparity discontinuity and topology inconsistency of polygon matching by local-joint geometry and multi-feature matching strategy with Hungarian algorithm. We benchmark our U(PM) ^2 on the ScanNet and SceneFlow datasets using our proposed new metric, which achieved state-of-the-art accuracy at a competitive speed and satisfactory generalization performance at low cost without any training requirement.
zh

[CV-173] Reperio-rPPG: Relational Temporal Graph Neural Networks for Periodicity Learning in Remote Physiological Measurement

【速读】:该论文旨在解决远程光电容积脉搏波描记术(remote photoplethysmography, rPPG)在实际应用中对生理信号内在周期性特征建模不足的问题,这一缺陷限制了模型在复杂运动和光照条件下的细粒度时序动态捕捉能力。解决方案的关键在于提出Reperio-rPPG框架,通过将关系卷积网络(Relational Convolutional Networks)与图Transformer(Graph Transformer)相结合,有效建模生理信号的周期结构;同时引入定制化的CutMix数据增强策略以提升模型在多样化场景下的泛化性能。

链接: https://arxiv.org/abs/2511.05946
作者: Ba-Thinh Nguyen,Thach-Ha Ngoc Pham,Hoang-Long Duc Nguyen,Thi-Duyen Ngo,Thanh-Ha Le
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Remote photoplethysmography (rPPG) is an emerging contactless physiological sensing technique that leverages subtle color variations in facial videos to estimate vital signs such as heart rate and respiratory rate. This non-invasive method has gained traction across diverse domains, including telemedicine, affective computing, driver fatigue detection, and health monitoring, owing to its scalability and convenience. Despite significant progress in remote physiological signal measurement, a crucial characteristic - the intrinsic periodicity - has often been underexplored or insufficiently modeled in previous approaches, limiting their ability to capture fine-grained temporal dynamics under real-world conditions. To bridge this gap, we propose Reperio-rPPG, a novel framework that strategically integrates Relational Convolutional Networks with a Graph Transformer to effectively capture the periodic structure inherent in physiological signals. Additionally, recognizing the limited diversity of existing rPPG datasets, we further introduce a tailored CutMix augmentation to enhance the model’s generalizability. Extensive experiments conducted on three widely used benchmark datasets - PURE, UBFC-rPPG, and MMPD - demonstrate that Reperio-rPPG not only achieves state-of-the-art performance but also exhibits remarkable robustness under various motion (e.g., stationary, rotation, talking, walking) and illumination conditions (e.g., nature, low LED, high LED). The code is publicly available at this https URL.
zh

[CV-174] Polymap: generating high definition map based on rasterized polygons

【速读】:该论文旨在解决现有基于检测的高精地图在线构建方法在自动标注系统中泛化能力不足的问题。其核心挑战在于检测方法对复杂场景和多样数据分布的适应性较差,限制了其在实际自动驾驶环境中的可靠应用。解决方案的关键在于将道路要素重新建模为栅格化的多边形,并设计了一个基于实例分割的简洁框架:首先利用端到端的分割型Transformer模型生成实例掩码,随后通过Potrace算法进行后处理,从而获得矢量化的地图元素。该方法显著提升了地图构建的泛化性能,在NuScenes数据集上得到了定量验证。

链接: https://arxiv.org/abs/2511.05944
作者: Shiyu Gao,Hao Jiang
机构: Institute of Computing Technology, Chinese Academy of Science (中国科学院计算技术研究所); Qcraft Inc.
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:The perception of high-definition maps is an integral component of environmental perception in autonomous driving systems. Existing research have often focused on online construction of high-definition maps. For instance, the Maptr[9] series employ a detection-based method to output vectorized map instances parallelly in an end-to-end manner. However, despite their capability for real-time construction, detection-based methods are observed to lack robust generalizability[19], which hampers their applicability in auto-labeling systems. Therefore, aiming to improve the generalizability, we reinterpret road elements as rasterized polygons and design a concise framework based on instance segmentation. Initially, a segmentation-based transformer is employed to deliver instance masks in an end-to-end manner; succeeding this step, a Potrace-based[17] post-processing module is used to ultimately yield vectorized map elements. Quantitative results attained on the Nuscene[1] dataset substantiate the effectiveness and generaliz-ability of our method.
zh

[CV-175] Global Multiple Extraction Network for Low-Resolution Facial Expression Recognition

链接: https://arxiv.org/abs/2511.05938
作者: Jingyi Shi
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 12 pages

点击查看摘要

[CV-176] Interaction-Centric Knowledge Infusion and Transfer for Open-Vocabulary Scene Graph Generation NEURIPS2025

【速读】:该论文旨在解决开放词汇场景图生成(Open-vocabulary Scene Graph Generation, OVSGG)中因缺乏显式交互建模而导致的两个关键问题:在知识注入阶段,模型难以区分同类别中相互作用与非相互作用实例,从而产生噪声伪监督信号;在知识迁移阶段,查询匹配模糊,导致关系推理不准确。解决方案的核心在于提出一种以交互为中心(interaction-centric)的端到端框架ACC(Interaction-Centric Consistent framework),其关键创新包括:1)采用双向交互提示(bidirectional interaction prompt)增强伪监督信号的鲁棒性,提升模型对交互关系的理解;2)引入交互引导的查询选择机制,优先匹配潜在交互对象以减少干扰,并结合一致性知识蒸馏策略,在保留通用知识的同时强化关系前景与背景的分离,从而显著提升OVSGG的准确性与泛化能力。

链接: https://arxiv.org/abs/2511.05935
作者: Lin Li,Chuhan Zhang,Dong Zhang,Chong Sun,Chen Li,Long Chen
机构: HKUST(香港科技大学); AI Chip Center for Emerging Smart Systems (人工智能芯片中心); Tencent(腾讯)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by NeurIPS 2025

点击查看摘要

Abstract:Open-vocabulary scene graph generation (OVSGG) extends traditional SGG by recognizing novel objects and relationships beyond predefined categories, leveraging the knowledge from pre-trained large-scale models. Existing OVSGG methods always adopt a two-stage pipeline: 1) \textitInfusing knowledge into large-scale models via pre-training on large datasets; 2) \textitTransferring knowledge from pre-trained models with fully annotated scene graphs during supervised fine-tuning. However, due to a lack of explicit interaction modeling, these methods struggle to distinguish between interacting and non-interacting instances of the same object category. This limitation induces critical issues in both stages of OVSGG: it generates noisy pseudo-supervision from mismatched objects during knowledge infusion, and causes ambiguous query matching during knowledge transfer. To this end, in this paper, we propose an inter\textbfACtion-\textbfCentric end-to-end OVSGG framework (\textbfACC) in an interaction-driven paradigm to minimize these mismatches. For \textitinteraction-centric knowledge infusion, ACC employs a bidirectional interaction prompt for robust pseudo-supervision generation to enhance the model’s interaction knowledge. For \textitinteraction-centric knowledge transfer, ACC first adopts interaction-guided query selection that prioritizes pairing interacting objects to reduce interference from non-interacting ones. Then, it integrates interaction-consistent knowledge distillation to bolster robustness by pushing relational foreground away from the background while retaining general knowledge. Extensive experimental results on three benchmarks show that ACC achieves state-of-the-art performance, demonstrating the potential of interaction-centric paradigms for real-world applications.
zh

[CV-177] AD-DAE: Unsupervised Modeling of Longitudinal Alzheimers Disease Progression with Diffusion Auto-Encoder

【速读】:该论文旨在解决现有生成式建模方法在纵向疾病进展建模中对受试者特异性纵向图像监督依赖过强、且潜在空间可控性不足的问题。解决方案的关键在于提出一种可条件化的扩散自编码器(conditionable Diffusion Auto-encoder)框架,其通过显式的图像-扩散自编码机制构建一个紧凑的潜在空间,有效捕捉高层语义信息并实现与疾病进展相关因素的解耦;进一步地,通过将潜在空间中的移动限制在特定子空间内,隔离出与个体身份保持无关的进展相关成分,并利用进展属性隐式引导这些移动,从而在无监督条件下实现从基线图像生成随访图像的能力。

链接: https://arxiv.org/abs/2511.05934
作者: Ayantika Das,Arunima Sarkar,Keerthi Ram,Mohanasankar Sivaprakasam
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Under Review

点击查看摘要

Abstract:Generative modeling frameworks have emerged as an effective approach to capture high-dimensional image distributions from large datasets without requiring domain-specific knowledge, a capability essential for longitudinal disease progression modeling. Recent generative modeling approaches have attempted to capture progression by mapping images into a latent representational space and then controlling and guiding the representations to generate follow-up images from a baseline image. However, existing approaches impose constraints on distribution learning, leading to latent spaces with limited controllability to generate follow-up images without explicit supervision from subject-specific longitudinal images. In order to enable controlled movements in the latent representational space and generate progression images from a baseline image in an unsupervised manner, we introduce a conditionable Diffusion Auto-encoder framework. The explicit encoding mechanism of image-diffusion auto-encoders forms a compact latent space capturing high-level semantics, providing means to disentangle information relevant for progression. Our approach leverages this latent space to condition and apply controlled shifts to baseline representations for generating follow-up. Controllability is induced by restricting these shifts to a subspace, thereby isolating progression-related factors from subject identity-preserving components. The shifts are implicitly guided by correlating with progression attributes, without requiring subject-specific longitudinal supervision. We validate the generations through image quality metrics, volumetric progression analysis, and downstream classification in Alzheimer’s disease datasets from two different sources and disease categories. This demonstrates the effectiveness of our approach for Alzheimer’s progression modeling and longitudinal image generation.
zh

[CV-178] CoMA: Complementary Masking and Hierarchical Dynamic Multi-Window Self-Attention in a Unified Pre-training Framework

链接: https://arxiv.org/abs/2511.05929
作者: Jiaxuan Li,Qing Xu,Xiangjian He,Ziyu Liu,Chang Xing,Zhen Chen,Daokun Zhang,Rong Qu,Chang Wen Chen
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: 9 pages, 5 figures

点击查看摘要

[CV-179] Causal Tracing of Object Representations in Large Vision Language Models: Mechanistic Interpretability and Hallucination Mitigation AAAI2026

链接: https://arxiv.org/abs/2511.05923
作者: Qiming Li,Zekai Ye,Xiaocheng Feng,Weihong Zhong,Weitao Ma,Xiachong Feng
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: AAAI2026 Oral

点击查看摘要

[CV-180] GABFusion: Rethinking Feature Fusion for Low-Bit Quantization of Multi-Task Networks

【速读】:该论文旨在解决量化感知训练(Quantization-Aware Training, QAT)在多任务神经网络架构中性能显著下降的问题,主要归因于任务间特征差异(task-specific feature discrepancies)和梯度冲突(gradient conflicts)。解决方案的关键在于提出两种核心机制:一是梯度感知的平衡特征融合方法(Gradient-Aware Balanced Feature Fusion, GABFusion),通过动态调整梯度幅度并以量化友好的方式融合任务特定特征,缓解梯度冲突;二是面向量化模型的特征级蒸馏策略——注意力分布对齐(Attention Distribution Alignment, ADA),用于优化任务间特征一致性。二者共同作用,在不修改原始网络结构的前提下,显著提升多种QAT方法在不同网络架构与位宽下的泛化能力,并具备梯度偏差减少的理论保障。

链接: https://arxiv.org/abs/2511.05898
作者: Zhaoyang Wang,Dong Wang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: 9 pages,6 figures

点击查看摘要

Abstract:Despite the effectiveness of quantization-aware training (QAT) in compressing deep neural networks, its performance on multi-task architectures often degrades significantly due to task-specific feature discrepancies and gradient conflicts. To address these challenges, we propose Gradient-Aware Balanced Feature Fusion (GABFusion), which dynamically balances gradient magnitudes and fuses task-specific features in a quantization-friendly manner. We further introduce Attention Distribution Alignment (ADA), a feature-level distillation strategy tailored for quantized models. Our method demonstrates strong generalization across network architectures and QAT algorithms, with theoretical guarantees on gradient bias reduction. Extensive experiments demonstrate that our strategy consistently enhances a variety of QAT methods across different network architectures and bit-widths. On PASCAL VOC and COCO datasets, the proposed approach achieves average mAP improvements of approximately 3.3% and 1.6%, respectively. When applied to YOLOv5 under 4-bit quantization, our method narrows the accuracy gap with the full-precision model to only 1.7% on VOC, showcasing its effectiveness in preserving performance under low-bit constraints. Notably, the proposed framework is modular, easy to integrate, and compatible with any existing QAT technique-enhancing the performance of quantized models without requiring modifications to the original network architecture.
zh

[CV-181] Open-World 3D Scene Graph Generation for Retrieval-Augmented Reasoning AAAI2026

链接: https://arxiv.org/abs/2511.05894
作者: Fei Yu,Quan Deng,Shengeng Tang,Yuehua Li,Lechao Cheng
机构: Huazhong University of Science and Technology (华中科技大学); Hefei University of Technology (合肥工业大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by AAAI 2026

点击查看摘要

[CV-182] Hybrid second-order gradient histogram based global low-rank sparse regression for robust face recognition

【速读】:该论文旨在解决复杂遮挡和光照变化条件下人脸识别性能下降的问题。其核心解决方案在于提出一种基于混合二阶梯度直方图(Hybrid Second-Order Gradient Histogram, H2H)的全局低秩稀疏回归模型(H2H-GLRSR),关键创新点包括:设计了一种能更有效刻画人脸图像局部结构特征的新颖特征描述子H2H,并将其与基于核范数正则化的矩阵回归(Sparse Regularized Nuclear Norm based Matrix Regression, SR_NMR)相结合;同时在残差矩阵上引入全局低秩约束,以更好地捕捉结构化噪声中的全局相关性,从而提升模型在复杂场景下的鲁棒性与识别准确率。

链接: https://arxiv.org/abs/2511.05893
作者: Hongxia Li,Ying Ji,Yongxin Dong,Yuehua Feng
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
备注:

点击查看摘要

Abstract:Low-rank sparse regression models have been widely applied in the field of face recognition. To further address the challenges caused by complex occlusions and illumination variations, this paper proposes a Hybrid Second-Order Gradient Histogram based Global Low-Rank Sparse Regression (H2H-GLRSR) model. Specifically, a novel feature descriptor called the Hybrid Second-Order Gradient Histogram (H2H) is first designed to more effectively characterize the local structural features of facial images. Then, this descriptor is integrated with the Sparse Regularized Nuclear Norm based Matrix Regression (SR _ NMR). Moreover, a global low-rank constraint is imposed on the residual matrix, enabling the model to better capture the global correlations inherent in structured noise. Experimental results demonstrate that the proposed method significantly outperforms existing regression-based classification approaches under challenging scenarios involving occlusions, illumination changes, and unconstrained environments.
zh

[CV-183] owards Frequency-Adaptive Learning for SAR Despeckling

【速读】:该论文旨在解决合成孔径雷达(SAR)图像中斑点噪声(speckle noise)对高精度应用的限制问题,特别是现有深度学习方法因采用统一网络处理全图而忽视不同空间物理特征区域的差异性噪声统计特性,导致伪影、边缘模糊和纹理失真。解决方案的关键在于提出一种基于分而治之架构的频率自适应异构去斑模型(SAR-FAH),通过小波分解将图像划分为不同频带子区域,并针对各频带设计专用子网络:低频部分利用神经微分方程(neural ordinary differential equations)建模为连续动态系统以保障结构保真与平滑性,高频部分则引入带可变形卷积的增强型U-Net实现边缘与纹理增强的同时抑制噪声,从而在去噪与细节保持之间取得更优平衡。

链接: https://arxiv.org/abs/2511.05890
作者: Ziqing Ma,Chang Yang,Zhichang Guo,Yao Li
机构: Harbin Institute of Technology (哈尔滨工业大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 13 pages, 14 figures,9 tables

点击查看摘要

Abstract:Synthetic Aperture Radar (SAR) images are inherently corrupted by speckle noise, limiting their utility in high-precision applications. While deep learning methods have shown promise in SAR despeckling, most methods employ a single unified network to process the entire image, failing to account for the distinct speckle statistics associated with different spatial physical characteristics. It often leads to artifacts, blurred edges, and texture distortion. To address these issues, we propose SAR-FAH, a frequency-adaptive heterogeneous despeckling model based on a divide-and-conquer architecture. First, wavelet decomposition is used to separate the image into frequency sub-bands carrying different intrinsic characteristics. Inspired by their differing noise characteristics, we design specialized sub-networks for different frequency components. The tailored approach leverages statistical variations across frequencies, improving edge and texture preservation while suppressing noise. Specifically, for the low-frequency part, denoising is formulated as a continuous dynamic system via neural ordinary differential equations, ensuring structural fidelity and sufficient smoothness that prevents artifacts. For high-frequency sub-bands rich in edges and textures, we introduce an enhanced U-Net with deformable convolutions for noise suppression and enhanced features. Extensive experiments on synthetic and real SAR images validate the superior performance of the proposed model in noise removal and structural preservation.
zh

[CV-184] MoEGCL: Mixture of Ego-Graphs Contrastive Representation Learning for Multi-View Clustering AAAI’2026

链接: https://arxiv.org/abs/2511.05876
作者: Jian Zhu,Xin Zou,Jun Sun,Cheng Luo,Lei Liu,Lingfang Zeng,Ning Zhang,Bian Wu,Chang Tang,Lirong Dai
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注: AAAI’2026 oral paper

点击查看摘要

[CV-185] owards a Humanized Social-Media Ecosystem: AI-Augmented HCI Design Patterns for Safety Agency Well-Being

【速读】:该论文旨在解决当前社交平台以用户参与度为核心导向的算法设计所带来的负面影响,如加剧压力、传播虚假信息及削弱用户对内容和体验的控制力。其核心问题在于平台算法往往“作用于”用户而非“与用户协作”,导致用户体验恶化且缺乏透明性与自主权。解决方案的关键是提出Human-Layer AI(HL-AI)——一种部署在浏览器层的、由用户拥有的可解释中间件,能够在不依赖平台合作的前提下,为用户提供即时、可控的干预能力。HL-AI通过五个代表性功能框架(Context-Aware Post Rewriter、Post Integrity Meter、Granular Feed Curator、Micro-Withdrawal Agent 和 Recovery Mode)实现对内容流的精细化调节,并基于统一的数学模型平衡用户效用、自主权成本与风险阈值,从而赋予用户在使用社交平台时更主动的安全感、选择权与心理韧性。

链接: https://arxiv.org/abs/2511.05875
作者: Mohd Ruhul Ameen,Akif Islam
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
备注: 6 pages, 5 tables, 7 figures, and 2 algorithm tables. Accepted at International Conference on Signal Processing, Information, Communication and Systems (SPICSCON 2025)

点击查看摘要

Abstract:Social platforms connect billions of people, yet their engagement-first algorithms often work on users rather than with them, amplifying stress, misinformation, and a loss of control. We propose Human-Layer AI (HL-AI)–user-owned, explainable intermediaries that sit in the browser between platform logic and the interface. HL-AI gives people practical, moment-to-moment control without requiring platform cooperation. We contribute a working Chrome/Edge prototype implementing five representative pattern frameworks–Context-Aware Post Rewriter, Post Integrity Meter, Granular Feed Curator, Micro-Withdrawal Agent, and Recovery Mode–alongside a unifying mathematical formulation balancing user utility, autonomy costs, and risk thresholds. Evaluation spans technical accuracy, usability, and behavioral outcomes. The result is a suite of humane controls that help users rewrite before harm, read with integrity cues, tune feeds with intention, pause compulsive loops, and seek shelter during harassment, all while preserving agency through explanations and override options. This prototype offers a practical path to retrofit today’s feeds with safety, agency, and well-being, inviting rigorous cross-cultural user evaluation.
zh

[CV-186] Light-Field Dataset for Disparity Based Depth Estimation

【速读】:该论文旨在解决光场(Light Field, LF)深度估计中因角分辨率与空间分辨率权衡关系带来的挑战,尤其是焦平面位置对视差(disparity)影响的不确定性,以及现有光场图像数据集在真实性和多样性上的不足。其解决方案的关键在于构建一个公开可用的高质量光场图像数据集,包含285张由Lytro Illum LF相机拍摄的真实光场图像和13张具有相似视差特性的合成图像,并进一步通过机械滑轨系统与Blender软件生成真实与合成相结合的立体光场数据子集,从而为新型基于视差的光场深度估计算法的设计、开发、实现与测试提供可靠的数据支持。

链接: https://arxiv.org/abs/2511.05866
作者: Suresh Nehra,Aupendu Kar,Jayanta Mukhopadhyay,Prabir Kumar Biswas
机构: Indian Institute of Technology Kharagpur (印度理工学院克哈拉格普尔分校); Dolby Laboratories, Inc (杜比实验室公司)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: This paper has been accepted to ACM ICVGIP 2025

点击查看摘要

Abstract:A Light Field (LF) camera consists of an additional two-dimensional array of micro-lenses placed between the main lens and sensor, compared to a conventional camera. The sensor pixels under each micro-lens receive light from a sub-aperture of the main lens. This enables the image sensor to capture both spatial information and the angular resolution of a scene point. This additional angular information is used to estimate the depth of a 3-D scene. The continuum of virtual viewpoints in light field data enables efficient depth estimation using Epipolar Line Images (EPIs) with robust occlusion handling. However, the trade-off between angular information and spatial information is very critical and depends on the focal position of the camera. To design, develop, implement, and test novel disparity-based light field depth estimation algorithms, the availability of suitable light field image datasets is essential. In this paper, a publicly available light field image dataset is introduced and thoroughly described. We have also demonstrated the effect of focal position on the disparity of a 3-D point as well as the shortcomings of the currently available light field dataset. The proposed dataset contains 285 light field images captured using a Lytro Illum LF camera and 13 synthetic LF images. The proposed dataset also comprises a synthetic dataset with similar disparity characteristics to those of a real light field camera. A real and synthetic stereo light field dataset is also created by using a mechanical gantry system and Blender. The dataset is available at this https URL.
zh

[CV-187] CGCE: Classifier-Guided Concept Erasure in Generative Models

【速读】:该论文旨在解决生成式 AI(Generative AI)模型在内容安全方面的问题,即如何实现对模型中特定不良概念的鲁棒擦除,同时避免因擦除操作导致模型对正常、无害概念的生成质量下降。现有方法在面对对抗性攻击时易被绕过,且难以兼顾安全性与生成性能之间的平衡。其解决方案的关键在于提出 Classifier-Guided Concept Erasure (CGCE),一种无需修改原始模型权重的轻量级插件式框架:通过一个运行在文本嵌入空间上的轻量分类器,在推理阶段检测并修正包含不良概念的提示词,从而在不损害模型原有能力的前提下有效阻止有害内容生成。该方法具有高度可扩展性,支持多概念擦除,并在多种文本到图像(T2I)和文本到视频(T2V)模型上验证了其卓越的安全性和生成保真度。

链接: https://arxiv.org/abs/2511.05865
作者: Viet Nguyen,Vishal M. Patel
机构: Johns Hopkins University (约翰霍普金斯大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
备注: 24 pages, 15 figures

点击查看摘要

Abstract:Recent advancements in large-scale generative models have enabled the creation of high-quality images and videos, but have also raised significant safety concerns regarding the generation of unsafe content. To mitigate this, concept erasure methods have been developed to remove undesirable concepts from pre-trained models. However, existing methods remain vulnerable to adversarial attacks that can regenerate the erased content. Moreover, achieving robust erasure often degrades the model’s generative quality for safe, unrelated concepts, creating a difficult trade-off between safety and performance. To address this challenge, we introduce Classifier-Guided Concept Erasure (CGCE), an efficient plug-and-play framework that provides robust concept erasure for diverse generative models without altering their original weights. CGCE uses a lightweight classifier operating on text embeddings to first detect and then refine prompts containing undesired concepts. This approach is highly scalable, allowing for multi-concept erasure by aggregating guidance from several classifiers. By modifying only unsafe embeddings at inference time, our method prevents harmful content generation while preserving the model’s original quality on benign prompts. Extensive experiments show that CGCE achieves state-of-the-art robustness against a wide range of red-teaming attacks. Our approach also maintains high generative utility, demonstrating a superior balance between safety and performance. We showcase the versatility of CGCE through its successful application to various modern T2I and T2V models, establishing it as a practical and effective solution for safe generative AI.
zh

[CV-188] Point Cloud Segmentation of Integrated Circuits Package Substrates Surface Defects Using Causal Inference: Dataset Construction and Methodology

【速读】:该论文旨在解决陶瓷基板(Ceramic Package Substrate, CPS)表面缺陷检测中因结构复杂、缺陷微小及缺乏公开点云数据集而导致的3D分割精度不足问题。解决方案的关键在于构建了目前工业领域点分辨率和标注精度最高的点云数据集CPS3D-Seg,包含1300个样本、20类产品且具备逐点级标注,并提出基于因果推理的新型3D分割网络CINet,其通过结构精修(Structural Refine, SR)与质量评估(Quality Assessment, QA)模块量化点云中的潜在混杂因素,从而显著提升mIoU和准确率指标。

链接: https://arxiv.org/abs/2511.05853
作者: Bingyang Guo,Qiang Zuo,Ruiyun Yu
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:The effective segmentation of 3D data is crucial for a wide range of industrial applications, especially for detecting subtle defects in the field of integrated circuits (IC). Ceramic package substrates (CPS), as an important electronic material, are essential in IC packaging owing to their superior physical and chemical properties. However, the complex structure and minor defects of CPS, along with the absence of a publically available dataset, significantly hinder the development of CPS surface defect detection. In this study, we construct a high-quality point cloud dataset for 3D segmentation of surface defects in CPS, i.e., CPS3D-Seg, which has the best point resolution and precision compared to existing 3D industrial datasets. CPS3D-Seg consists of 1300 point cloud samples under 20 product categories, and each sample provides accurate point-level annotations. Meanwhile, we conduct a comprehensive benchmark based on SOTA point cloud segmentation algorithms to validate the effectiveness of CPS3D-Seg. Additionally, we propose a novel 3D segmentation method based on causal inference (CINet), which quantifies potential confounders in point clouds through Structural Refine (SR) and Quality Assessment (QA) Modules. Extensive experiments demonstrate that CINet significantly outperforms existing algorithms in both mIoU and accuracy.
zh

[CV-189] Enhancing Diffusion Model Guidance through Calibration and Regularization NEURIPS2025

【速读】:该论文旨在解决Classifier-guided diffusion models在早期去噪步骤中因分类器预测过于自信而导致指导梯度消失的问题,从而影响条件图像生成的质量。解决方案的关键在于两个互补的改进:其一,提出基于Smooth Expected Calibration Error (Smooth ECE)的可微校准目标,通过最小量微调提升分类器校准性能,显著改善Frechet Inception Distance (FID);其二,设计无需重新训练分类器的增强采样引导方法,包括带批级重加权的倾斜采样、自适应熵正则化采样以保持多样性,以及基于f-divergence的新颖采样策略,在强化类别一致性引导的同时保障模式覆盖。实验表明,所提方法在ImageNet 128x128数据集上使用ResNet-101分类器即可实现FID=2.13,优于现有方法且无需重训练扩散模型。

链接: https://arxiv.org/abs/2511.05844
作者: Seyed Alireza Javid,Amirhossein Bagheri,Nuria González-Prelcic
机构: UC San Diego (加州大学圣地亚哥分校); Politecnico di Milano (米兰理工大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
备注: Accepted from NeurIPS 2025 Workshop on Structured Probabilistic Inference Generative Modeling. Code available at this https URL

点击查看摘要

Abstract:Classifier-guided diffusion models have emerged as a powerful approach for conditional image generation, but they suffer from overconfident predictions during early denoising steps, causing the guidance gradient to vanish. This paper introduces two complementary contributions to address this issue. First, we propose a differentiable calibration objective based on the Smooth Expected Calibration Error (Smooth ECE), which improves classifier calibration with minimal fine-tuning and yields measurable improvements in Frechet Inception Distance (FID). Second, we develop enhanced sampling guidance methods that operate on off-the-shelf classifiers without requiring retraining. These include tilted sampling with batch-level reweighting, adaptive entropy-regularized sampling to preserve diversity, and a novel f-divergence-based sampling strategy that strengthens class-consistent guidance while maintaining mode coverage. Experiments on ImageNet 128x128 demonstrate that our divergence-regularized guidance achieves an FID of 2.13 using a ResNet-101 classifier, improving upon existing classifier-guided diffusion methods while requiring no diffusion model retraining. The results show that principled calibration and divergence-aware sampling provide practical and effective improvements for classifier-guided diffusion.
zh

[CV-190] Understanding Cross Task Generalization in Handwriting-Based Alzheimers Screening via Vision Language Adaptation

【速读】:该论文旨在解决当前基于手写特征的阿尔茨海默病(Alzheimer’s disease, AD)早期检测研究中存在的两大问题:一是现有方法多依赖在线轨迹和手工特征,未系统评估任务类型对诊断性能及跨任务泛化能力的影响;二是尽管大规模视觉语言模型在自然图像和医学影像中展现出强大的零样本或少样本异常检测能力,但其在手写特征疾病识别中的应用仍处于探索阶段。解决方案的关键在于提出一种轻量级跨层融合适配器框架(Cross-Layer Fusion Adapter, CLFA),该框架复用CLIP模型的视觉编码器,并在其内部嵌入多层级融合适配器,逐步对齐表征以捕捉与手写特定医学线索相关的特征,从而实现无需提示(prompt-free)且高效的零样本推理,同时支持跨任务泛化分析,揭示不同书写任务和笔画模式对手写特征在AD早期识别中的贡献。

链接: https://arxiv.org/abs/2511.05841
作者: Changqing Gong,Huafeng Qin,Mounim A. El-Yacoubi
机构: Telecom SudParis (电信巴黎高等矿业学院); Institut Polytechnique de Paris (巴黎综合理工学院); School of Computer Science and Information Engineering (计算机科学与信息工程学院); Chongqing Technology and Business University (重庆工商大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Alzheimer’s disease is a prevalent neurodegenerative disorder for which early detection is critical. Handwriting-often disrupted in prodromal AD-provides a non-invasive and cost-effective window into subtle motor and cognitive decline. Existing handwriting-based AD studies, mostly relying on online trajectories and hand-crafted features, have not systematically examined how task type influences diagnostic performance and cross-task generalization. Meanwhile, large-scale vision language models have demonstrated remarkable zero or few-shot anomaly detection in natural images and strong adaptability across medical modalities such as chest X-ray and brain MRI. However, handwriting-based disease detection remains largely unexplored within this paradigm. To close this gap, we introduce a lightweight Cross-Layer Fusion Adapter framework that repurposes CLIP for handwriting-based AD screening. CLFA implants multi-level fusion adapters within the visual encoder to progressively align representations toward handwriting-specific medical cues, enabling prompt-free and efficient zero-shot inference. Using this framework, we systematically investigate cross-task generalization-training on a specific handwriting task and evaluating on unseen ones-to reveal which task types and writing patterns most effectively discriminate AD. Extensive analyses further highlight characteristic stroke patterns and task-level factors that contribute to early AD identification, offering both diagnostic insights and a benchmark for handwriting-based cognitive assessment.
zh

[CV-191] YrPPG: Uncomplicated and Enhanced Learning Capability rPPG for Remote Heart Rate Estimation

链接: https://arxiv.org/abs/2511.05833
作者: Taixi Chen,Yiu-ming Cheung
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: The 6th International Workshop on AI for Social Good in the Connected World (AI4SG)@ IEEE WI-IAT 2025

点击查看摘要

[CV-192] Hilbert-Guided Block-Sparse Local Attention

链接: https://arxiv.org/abs/2511.05832
作者: Yunge Li,Lanyu Xu
机构: Oakland University (奥克兰大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[CV-193] LRANet: Low-Rank Approximation Network for Accurate and Efficient Text Spotting

【速读】:该论文旨在解决任意形状文本(arbitrary-shaped text)的端到端文本检测与识别(end-to-end text spotting)中检测精度与效率难以兼顾的问题。其核心瓶颈在于缺乏可靠且高效的文本检测方法。解决方案的关键在于提出两个创新模块:一是基于低秩近似的参数化文本形状表示方法,通过从标注文本边界中直接学习低秩子空间,并利用ℓ₁-范数优化恢复机制,实现对文本形状的紧凑且鲁棒的表示;二是三重分配检测头(triple assignment detection head),其中深度稀疏分支用于稳定训练,超轻量稀疏分支用于加速推理,同时密集分支提供丰富的并行监督信号。这两个模块共同构建了LRANet++框架,显著提升了任意形状文本的检测精度与推理效率。

链接: https://arxiv.org/abs/2511.05818
作者: Yuchen Su,Zhineng Chen,Yongkun Du,Zuxuan Wu,Hongtao Xie,Yu-Gang Jiang
机构: Fudan University (复旦大学); University of Science and Technology of China (中国科学技术大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:End-to-end text spotting aims to jointly optimize text detection and recognition within a unified framework. Despite significant progress, designing an accurate and efficient end-to-end text spotter for arbitrary-shaped text remains largely unsolved. We identify the primary bottleneck as the lack of a reliable and efficient text detection method. To address this, we propose a novel parameterized text shape method based on low-rank approximation for precise detection and a triple assignment detection head to enable fast inference. Specifically, unlike other shape representation methods that employ data-irrelevant parameterization, our data-driven approach derives a low-rank subspace directly from labeled text boundaries. To ensure this process is robust against the inherent annotation noise in this data, we utilize a specialized recovery method based on an \ell_1 -norm formulation, which accurately reconstructs the text shape with only a few key orthogonal vectors. By exploiting the inherent shape correlation among different text contours, our method achieves consistency and compactness in shape representation. Next, the triple assignment scheme introduces a novel architecture where a deep sparse branch (for stabilized training) is used to guide the learning of an ultra-lightweight sparse branch (for accelerated inference), while a dense branch provides rich parallel supervision. Building upon these advancements, we integrate the enhanced detection module with a lightweight recognition branch to form an end-to-end text spotting framework, termed LRANet++, capable of accurately and efficiently spotting arbitrary-shaped text. Extensive experiments on several challenging benchmarks demonstrate the superiority of LRANet++ compared to state-of-the-art methods. Code will be available at: this https URL
zh

[CV-194] MACMD: Multi-dilated Contextual Attention and Channel Mixer Decoding for Medical Image Segmentation

【速读】:该论文旨在解决医学图像分割中因解剖结构差异导致的挑战,特别是现有基于Transformer的编码器-解码器架构在处理浅层特征信息丢失以及编码器与解码器之间局部细节和全局上下文融合效率低下的问题。解决方案的关键在于提出一种基于MACMD(Multi-scale Attention and Channel Mixing Decoder)的新型解码器设计,其通过跳接连接实现编码器与解码器间的通道混合,并结合多尺度空洞卷积、注意力驱动调制和跨通道混合模块,有效增强自注意力机制的同时保留局部上下文信息,从而在保持计算效率的前提下显著提升分割精度。

链接: https://arxiv.org/abs/2511.05803
作者: Lalit Maurya,Honghai Liu,Reyer Zwiggelaar
机构: University of Portsmouth (朴茨茅斯大学); Aberystwyth University (阿伯里斯特威斯大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Medical image segmentation faces challenges due to variations in anatomical structures. While convolutional neural networks (CNNs) effectively capture local features, they struggle with modeling long-range dependencies. Transformers mitigate this issue with self-attention mechanisms but lack the ability to preserve local contextual information. State-of-the-art models primarily follow an encoder-decoder architecture, achieving notable success. However, two key limitations remain: (1) Shallow layers, which are closer to the input, capture fine-grained details but suffer from information loss as data propagates through deeper layers. (2) Inefficient integration of local details and global context between the encoder and decoder stages. To address these challenges, we propose the MACMD-based decoder, which enhances attention mechanisms and facilitates channel mixing between encoder and decoder stages via skip connections. This design leverages hierarchical dilated convolutions, attention-driven modulation, and a cross channel-mixing module to capture long-range dependencies while preserving local contextual details, essential for precise medical image segmentation. We evaluated our approach using multiple transformer encoders on both binary and multi-organ segmentation tasks. The results demonstrate that our method outperforms state-of-the-art approaches in terms of Dice score and computational efficiency, highlighting its effectiveness in achieving accurate and robust segmentation performance. The code available at this https URL
zh

[CV-195] Position-Prior-Guided Network for System Matrix Super-Resolution in Magnetic Particle Imaging

【速读】:该论文旨在解决磁颗粒成像(Magnetic Particle Imaging, MPI)中系统矩阵(System Matrix, SM)标定过程耗时长且对系统参数变化敏感的问题。现有基于深度学习的超分辨率(Super-Resolution, SR)方法虽能加速标定,但未能充分利用SM所具有的物理先验信息,如位置对称性。论文的关键解决方案是将位置先验(positional priors)融入现有的SM标定框架,通过理论证明与2D/3D实验验证了该策略可显著提升标定效率与精度,从而实现更鲁棒和高效的MPI重建。

链接: https://arxiv.org/abs/2511.05795
作者: Xuqing Geng,Lei Su,Zhongwei Bian,Zewen Sun,Jiaxuan Wen,Jie Tian,Yang Du
机构: CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences(中国科学院自动化研究所分子影像重点实验室); School of Artificial Intelligence, University of Chinese Academy of Sciences(中国科学院大学人工智能学院); School of Engineering Medicine and the School of Biological Science and Medical Engineering, Beihang University(北京航空航天大学医学工程与生物科学与医学工程学院); Key Laboratory of Big Data-Based Precision Medicine (Beihang University), Ministry of Industry and Information Technology of China(工业和信息化部大数据精准医疗(北京航空航天大学)重点实验室)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: accepted as oral presentation at EMBC 2025

点击查看摘要

Abstract:Magnetic Particle Imaging (MPI) is a novel medical imaging modality. One of the established methods for MPI reconstruction is based on the System Matrix (SM). However, the calibration of the SM is often time-consuming and requires repeated measurements whenever the system parameters change. Current methodologies utilize deep learning-based super-resolution (SR) techniques to expedite SM calibration; nevertheless, these strategies do not fully exploit physical prior knowledge associated with the SM, such as symmetric positional priors. Consequently, we integrated positional priors into existing frameworks for SM calibration. Underpinned by theoretical justification, we empirically validated the efficacy of incorporating positional priors through experiments involving both 2D and 3D SM SR methods.
zh

[CV-196] CSA-UDA: Text-Driven Cross-Semantic Alignment for Unsupervised Domain Adaptation in Medical Image Segmentation

【速读】:该论文旨在解决医学图像分割中无监督域适应(Unsupervised Domain Adaptation, UDA)面临的挑战,特别是跨成像模态(如CT与MRI)之间存在的显著域偏移问题。其核心解决方案是提出TCSA-UDA框架,关键在于引入文本驱动的跨语义对齐机制:一方面通过视觉-语言协方差余弦损失(vision-language covariance cosine loss),直接将图像编码器特征与类间文本语义关系对齐,从而学习具有语义一致性且模态不变的特征表示;另一方面设计原型对齐模块(prototype alignment module),利用高层语义原型对齐不同域间的类别级像素特征分布,缓解残余类别差异并增强跨模态一致性。该方法有效提升了模型在跨模态医学图像分割任务中的鲁棒性与性能。

链接: https://arxiv.org/abs/2511.05782
作者: Lalit Maurya,Honghai Liu,Reyer Zwiggelaar
机构: University of Portsmouth (朴茨茅斯大学); Aberystwyth University (阿伯里斯特威斯大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Unsupervised domain adaptation for medical image segmentation remains a significant challenge due to substantial domain shifts across imaging modalities, such as CT and MRI. While recent vision-language representation learning methods have shown promise, their potential in UDA segmentation tasks remains underexplored. To address this gap, we propose TCSA-UDA, a Text-driven Cross-Semantic Alignment framework that leverages domain-invariant textual class descriptions to guide visual representation learning. Our approach introduces a vision-language covariance cosine loss to directly align image encoder features with inter-class textual semantic relations, encouraging semantically meaningful and modality-invariant feature representations. Additionally, we incorporate a prototype alignment module that aligns class-wise pixel-level feature distributions across domains using high-level semantic prototypes. This mitigates residual category-level discrepancies and enhances cross-modal consistency. Extensive experiments on challenging cross-modality cardiac, abdominal, and brain tumor segmentation benchmarks demonstrate that our TCSA-UDA framework significantly reduces domain shift and consistently outperforms state-of-the-art UDA methods, establishing a new paradigm for integrating language-driven semantics into domain-adaptive medical image analysis.
zh

[CV-197] MARAuders Map: Motion-Aware Real-time Activity Recognition with Layout-Based Trajectories

【速读】:该论文旨在解决智能家居环境中基于环境传感器的人类活动识别(Human Activity Recognition, HAR)难题,特别是针对实时推理、空间感知推理和上下文敏感的时间建模需求。现有方法通常依赖预分割的单一活动数据,忽视环境物理布局,导致在连续真实场景中的鲁棒性不足。解决方案的关键在于提出MARAuder’s Map框架,通过将原始传感器激活映射到物理平面图生成具有轨迹感知能力的图像序列,从而捕捉人类移动的空间流动;并采用混合深度学习模型联合建模空间结构与时间依赖关系,同时引入可学习的时间嵌入模块以编码小时、星期等上下文信息,并设计基于注意力机制的编码器聚焦于每个观察窗口内的关键片段,有效应对跨活动转换和时间模糊性问题。

链接: https://arxiv.org/abs/2511.05773
作者: Zishuai Liu,Weihang You,Jin Lu,Fei Dou
机构: University of Georgia (佐治亚大学)
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Ambient sensor-based human activity recognition (HAR) in smart homes remains challenging due to the need for real-time inference, spatially grounded reasoning, and context-aware temporal modeling. Existing approaches often rely on pre-segmented, within-activity data and overlook the physical layout of the environment, limiting their robustness in continuous, real-world deployments. In this paper, we propose MARAuder’s Map, a novel framework for real-time activity recognition from raw, unsegmented sensor streams. Our method projects sensor activations onto the physical floorplan to generate trajectory-aware, image-like sequences that capture the spatial flow of human movement. These representations are processed by a hybrid deep learning model that jointly captures spatial structure and temporal dependencies. To enhance temporal awareness, we introduce a learnable time embedding module that encodes contextual cues such as hour-of-day and day-of-week. Additionally, an attention-based encoder selectively focuses on informative segments within each observation window, enabling accurate recognition even under cross-activity transitions and temporal ambiguity. Extensive experiments on multiple real-world smart home datasets demonstrate that our method outperforms strong baselines, offering a practical solution for real-time HAR in ambient sensor environments.
zh

[CV-198] Sign language recognition from skeletal data using graph and recurrent neural networks

链接: https://arxiv.org/abs/2511.05772
作者: B. Mederos,J. Mejía,A. Medina-Reyes,Y. Espinosa-Almeyda,J. D. Díaz-Roman,I. Rodríguez-Mederos,M. Mejía-Carreon,F. Gonzalez-Lopez
机构: Universidad Autónoma de Ciudad Juárez (UACJ); Instituto Iberoamericano San Patricio
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 15 pages, 2 figures

点击查看摘要

[CV-199] A Second-Order Attention Mechanism For Prostate Cancer Segmentation and Detection in Bi-Parametric MRI

链接: https://arxiv.org/abs/2511.05760
作者: Mateo Ortiz,Juan Olmos,Fabio Martínez
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted at the 28th Iberoamerican Congress on Pattern Recognition (CIARP 2025). To appear in Lecture Notes in Computer Science (LNCS), Springer

点击查看摘要

[CV-200] owards Better Ultrasound Video Segmentation Foundation Model: An Empirical study on SAM2 Finetuning from Data Perspective

【速读】:该论文旨在解决生成式 AI(Generative AI)模型 SAM2 在超声(Ultrasound, US)视频分割任务中性能下降的问题,其核心挑战源于医学影像域与通用数据集之间的强跨域差异、运动伪影以及标注数据稀缺。解决方案的关键在于采用数据驱动的视角,系统性地分析训练集规模、视频时长和增强策略对模型适应性能的影响,并提出六种针对超声影像特性的专用增强方法。实验表明,相较于模型架构或初始化方式,数据规模和时间上下文对适配效果更具决定性作用;同时,多任务联合训练在模态对齐与任务专业化之间提供了高效的平衡。

链接: https://arxiv.org/abs/2511.05731
作者: Xing Yao,Ahana Gangopadhyay,Hsi-Ming Chang,Ravi Soni
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Ultrasound (US) video segmentation remains a challenging problem due to strong inter- and intra-dataset variability, motion artifacts, and limited annotated data. Although foundation models such as Segment Anything Model 2 (SAM2) demonstrate strong zero-shot and prompt-guided segmentation capabilities, their performance deteriorates substantially when transferred to medical imaging domains. Current adaptation studies mainly emphasize architectural modifications, while the influence of data characteristics and training regimes has not been systematically examined. In this study, we present a comprehensive, data-centric investigation of SAM2 adaptation for ultrasound video segmentation. We analyze how training-set size, video duration, and augmentation schemes affect adaptation performance under three paradigms: task-specific fine-tuning, intermediate adaptation, and multi-task joint training, across five SAM2 variants and multiple prompting modes. We further design six ultrasound-specific augmentations, assessing their effect relative to generic strategies. Experiments on three representative ultrasound datasets reveal that data scale and temporal context play a more decisive role than model architecture or initialization. Moreover, joint training offers an efficient compromise between modality alignment and task specialization. This work aims to provide empirical insights for developing efficient, data-aware adaptation pipelines for SAM2 in ultrasound video analysis.
zh

[CV-201] Pedicle Screw Pairing and Registration for Screw Pose Estimation from Dual C-arm Images Using CAD Models

【速读】:该论文旨在解决脊柱手术中椎弓根螺钉(pedicle screw)在前后位(anteroposterior, AP)和侧位(lateral, LAT)C臂X线图像中的配准与定位问题,这是实现精准椎管减压和固定的关键挑战,尤其在LAT视图中更难准确建立螺钉对应关系。解决方案的关键在于通过比较不同螺钉组合(screw combination)来识别正确配对,并结合基于螺钉CAD三维模型的2D-3D对齐方法,实现高精度的螺钉姿态估计(pose estimation)。实验表明,正确的螺钉组合在未进行注册前即优于错误配对,注册后进一步显著降低投影误差,从而提升术中螺钉位置反馈的可靠性。

链接: https://arxiv.org/abs/2511.05702
作者: Yehyun Suh,Lin Li,Aric Plumley,Chaochao Zhou,Daniel Moyer,Kongbin Kang
机构: AIX Research; Alphatec Spine; Vanderbilt University (范德比尔特大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Accurate matching of pedicle screws in both anteroposterior (AP) and lateral (LAT) images is critical for successful spinal decompression and stabilization during surgery. However, establishing screw correspondence, especially in LAT views, remains a significant clinical challenge. This paper introduces a method to address pedicle screw correspondence and pose estimation from dual C-arm images. By comparing screw combinations, the approach demonstrates consistent accuracy in both pairing and registration tasks. The method also employs 2D-3D alignment with screw CAD 3D models to accurately pair and estimate screw pose from dual views. Our results show that the correct screw combination consistently outperforms incorrect pairings across all test cases, even prior to registration. After registration, the correct combination further enhances alignment between projections and images, significantly reducing projection error. This approach shows promise for improving surgical outcomes in spinal procedures by providing reliable feedback on screw positioning.
zh

[CV-202] VMDT: Decoding the Trustworthiness of Video Foundation Models NEURIPS2025

【速读】:该论文旨在解决视频模态基础模型(foundation models)在安全性、公平性、隐私保护等关键维度上缺乏系统性评估标准的问题。当前,尽管生成式 AI (Generative AI) 在文本和图像领域已建立较为成熟的可信度基准,但视频模态仍处于空白状态。为此,作者提出 VMDT(Video-Modal Decoding Trust),这是首个统一平台,用于评估文本到视频(T2V)和视频到文本(V2T)模型在五个核心可信维度上的表现:安全(safety)、幻觉(hallucination)、公平性(fairness)、隐私(privacy)以及对抗鲁棒性(adversarial robustness)。其关键创新在于构建了一个结构化、可扩展的评测框架,并通过大规模实验揭示了现有模型在安全性不足、不公平性突出、隐私风险随规模上升等方面的显著缺陷,从而为未来开发更可靠、可控的视频基础模型提供了量化依据与改进方向。

链接: https://arxiv.org/abs/2511.05682
作者: Yujin Potter,Zhun Wang,Nicholas Crispino,Kyle Montgomery,Alexander Xiong,Ethan Y. Chang,Francesco Pinto,Yuqi Chen,Rahul Gupta,Morteza Ziyadi,Christos Christodoulopoulos,Bo Li,Chenguang Wang,Dawn Song
机构: University of California, Berkeley (加州大学伯克利分校); University of California, Santa Cruz (加州大学圣克鲁兹分校); University of Illinois at Urbana-Champaign (伊利诺伊大学厄巴纳-香槟分校); University of Chicago (芝加哥大学); Amazon (亚马逊); Information Commissioner’s Office (信息专员办公室)
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注: NeurIPS 2025 Datasets Benchmarks

点击查看摘要

Abstract:As foundation models become more sophisticated, ensuring their trustworthiness becomes increasingly critical; yet, unlike text and image, the video modality still lacks comprehensive trustworthiness benchmarks. We introduce VMDT (Video-Modal DecodingTrust), the first unified platform for evaluating text-to-video (T2V) and video-to-text (V2T) models across five key trustworthiness dimensions: safety, hallucination, fairness, privacy, and adversarial robustness. Through our extensive evaluation of 7 T2V models and 19 V2T models using VMDT, we uncover several significant insights. For instance, all open-source T2V models evaluated fail to recognize harmful queries and often generate harmful videos, while exhibiting higher levels of unfairness compared to image modality models. In V2T models, unfairness and privacy risks rise with scale, whereas hallucination and adversarial robustness improve – though overall performance remains low. Uniquely, safety shows no correlation with model size, implying that factors other than scale govern current safety levels. Our findings highlight the urgent need for developing more robust and trustworthy video foundation models, and VMDT provides a systematic framework for measuring and tracking progress toward this goal. The code is available at this https URL.
zh

[CV-203] Culture in Action: Evaluating Text-to-Image Models through Social Activities

【速读】:该论文旨在解决文本到图像(Text-to-Image, T2I)扩散模型在跨文化表征中的偏见问题,即当前模型虽能生成高保真图像,但往往继承网络数据中的文化偏见,难以忠实呈现欠代表地区(如全球南方国家)的社会与日常活动。其解决方案的关键在于提出CULTIVate基准,涵盖16个国家、576个提示词和超过19,000张图像,聚焦于文化活动维度(如问候、用餐、游戏、传统舞蹈和节日庆典),并构建了一个基于可解释描述符的多维评估框架,涵盖背景、服饰、物品与互动等文化特征。同时,论文引入四项量化指标——文化一致性、幻觉程度、夸张元素与多样性,系统性地衡量T2I模型的文化忠实度,并通过人类评估验证了这些指标与人工判断的高度相关性。

链接: https://arxiv.org/abs/2511.05681
作者: Sina Malakouti,Boqing Gong,Adriana Kovashka
机构: University of Pittsburgh (匹兹堡大学); Boston University (波士顿大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Text-to-image (T2I) diffusion models achieve impressive photorealism by training on large-scale web data, but models inherit cultural biases and fail to depict underrepresented regions faithfully. Existing cultural benchmarks focus mainly on object-centric categories (e.g., food, attire, and architecture), overlooking the social and daily activities that more clearly reflect cultural norms. Few metrics exist for measuring cultural faithfulness. We introduce CULTIVate, a benchmark for evaluating T2I models on cross-cultural activities (e.g., greetings, dining, games, traditional dances, and cultural celebrations). CULTIVate spans 16 countries with 576 prompts and more than 19,000 images, and provides an explainable descriptor-based evaluation framework across multiple cultural dimensions, including background, attire, objects, and interactions. We propose four metrics to measure cultural alignment, hallucination, exaggerated elements, and diversity. Our findings reveal systematic disparities: models perform better for global north countries than for the global south, with distinct failure modes across T2I systems. Human studies confirm that our metrics correlate more strongly with human judgments than existing text-image metrics.
zh

[CV-204] Lite VLA: Efficient Vision-Language-Action Control on CPU-Bound Edge Robots

【速读】:该论文旨在解决在无GPS环境下的自主机器人如何实现本地化、资源高效的实时场景理解与推理问题。传统方法通常将感知与移动分离,难以满足动态环境中对计算效率和响应速度的严苛要求。解决方案的关键在于提出一种集成紧凑型视觉-语言模型(Vision-Language Model, VLM)的框架,使移动机器人能够在仅依赖机载硬件的前提下,同时完成运动控制与上下文感知推理,从而实现端边协同下的并发决策与行动,且无需云端连接,显著提升了系统的自主性、可靠性和可扩展性。

链接: https://arxiv.org/abs/2511.05642
作者: Justin Williams,Kishor Datta Gupta,Roy George,Mrinmoy Sarkar
机构: 未知
类目: Robotics (cs.RO); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)
备注:

点击查看摘要

Abstract:The deployment of artificial intelligence models at the edge is increasingly critical for autonomous robots operating in GPS-denied environments where local, resource-efficient reasoning is essential. This work demonstrates the feasibility of deploying small Vision-Language Models (VLMs) on mobile robots to achieve real-time scene understanding and reasoning under strict computational constraints. Unlike prior approaches that separate perception from mobility, the proposed framework enables simultaneous movement and reasoning in dynamic environments using only on-board hardware. The system integrates a compact VLM with multimodal perception to perform contextual interpretation directly on embedded hardware, eliminating reliance on cloud connectivity. Experimental validation highlights the balance between computational efficiency, task accuracy, and system responsiveness. Implementation on a mobile robot confirms one of the first successful deployments of small VLMs for concurrent reasoning and mobility at the edge. This work establishes a foundation for scalable, assured autonomy in applications such as service robotics, disaster response, and defense operations.
zh

[CV-205] Registration-Free Monitoring of Unstructured Point Cloud Data via Intrinsic Geometrical Properties

链接: https://arxiv.org/abs/2511.05623
作者: Mariafrancesca Patalano,Giovanna Capizzi,Kamran Paynabar
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
备注:

点击查看摘要

[CV-206] Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition NEURIPS2025

【速读】:该论文旨在解决当前动作识别模型依赖RGB视频数据时,仅学习到表面的统计相关性而难以捕捉复杂场景中人类动作与物理空间之间本质交互动态的问题。其解决方案的关键在于将两种互补的表征融合:一是V-JEPA 2提供的基于上下文的预测性世界动态表征,二是CoMotion提供的显式且对遮挡鲁棒的人体姿态信息,从而实现动作识别在物理空间中的具身化建模。

链接: https://arxiv.org/abs/2511.05622
作者: Nicholas Babey,Tiffany Gu,Yiheng Li,Cristian Meo,Kevin Zhu
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
备注: Accepted at NeurIPS 2025 SpaVLE, for code see this https URL , 9 pages, 1 figure

点击查看摘要

Abstract:For embodied agents to effectively understand and interact within the world around them, they require a nuanced comprehension of human actions grounded in physical space. Current action recognition models, often relying on RGB video, learn superficial correlations between patterns and action labels, so they struggle to capture underlying physical interaction dynamics and human poses in complex scenes. We propose a model architecture that grounds action recognition in physical space by fusing two powerful, complementary representations: V-JEPA 2’s contextual, predictive world dynamics and CoMotion’s explicit, occlusion-tolerant human pose data. Our model is validated on both the InHARD and UCF-19-Y-OCC benchmarks for general action recognition and high-occlusion action recognition, respectively. Our model outperforms three other baselines, especially within complex, occlusive scenes. Our findings emphasize a need for action recognition to be supported by spatial understanding instead of statistical pattern recognition.
zh

[CV-207] Convolutional Fully-Connected Capsule Network (CFC-CapsNet): A Novel and Fast Capsule Network

【速读】:该论文旨在解决传统胶囊网络(Capsule Network, CapsNet)在复杂数据集和实际应用中性能不佳、训练与推理速度慢以及参数量过大的问题。其关键解决方案是提出一种新型的卷积全连接胶囊网络(Convolutional Fully-Connected Capsule Network, CFC-CapsNet),通过引入一种新的CFC层来替代传统方式构建胶囊,从而生成更少但更具表达能力的胶囊,显著提升了模型准确率,同时减少了参数数量,并加快了训练与推理速度。

链接: https://arxiv.org/abs/2511.05617
作者: Pouya Shiri,Amirali Baniasadi
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:A Capsule Network (CapsNet) is a relatively new classifier and one of the possible successors of Convolutional Neural Networks (CNNs). CapsNet maintains the spatial hierarchies between the features and outperforms CNNs at classifying images including overlapping categories. Even though CapsNet works well on small-scale datasets such as MNIST, it fails to achieve a similar level of performance on more complicated datasets and real applications. In addition, CapsNet is slow compared to CNNs when performing the same task and relies on a higher number of parameters. In this work, we introduce Convolutional Fully-Connected Capsule Network (CFC-CapsNet) to address the shortcomings of CapsNet by creating capsules using a different method. We introduce a new layer (CFC layer) as an alternative solution to creating capsules. CFC-CapsNet produces fewer, yet more powerful capsules resulting in higher network accuracy. Our experiments show that CFC-CapsNet achieves competitive accuracy, faster training and inference and uses less number of parameters on the CIFAR-10, SVHN and Fashion-MNIST datasets compared to conventional CapsNet.
zh

[CV-208] Personalized Image Editing in Text-to-Image Diffusion Models via Collaborative Direct Preference Optimization NEURIPS’25

【速读】:该论文旨在解决生成式 AI(Generative AI)中图像生成与编辑模型缺乏个性化适配的问题,即现有文本到图像(Text-to-Image, T2I)扩散模型虽能生成高质量图像,但无法根据用户个体的审美偏好进行精准调整。其解决方案的关键在于提出一种名为协同直接偏好优化(Collaborative Direct Preference Optimization, C-DPO)的新方法:通过构建动态偏好图(preference graph),将每位用户建模为节点,并利用轻量级图神经网络学习用户嵌入(embedding),从而在共享视觉偏好相似用户之间实现信息协同;同时,将这些个性化嵌入集成至改进的直接偏好优化(Direct Preference Optimization, DPO)目标函数中,联合优化个体对齐性与邻域一致性,显著提升图像编辑结果与用户特定偏好的匹配度。

链接: https://arxiv.org/abs/2511.05616
作者: Connor Dunlop,Matthew Zheng,Kavana Venkatesh,Pinar Yanardag
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: Published at NeurIPS’25 Main Conference

点击查看摘要

Abstract:Text-to-image (T2I) diffusion models have made remarkable strides in generating and editing high-fidelity images from text. Yet, these models remain fundamentally generic, failing to adapt to the nuanced aesthetic preferences of individual users. In this work, we present the first framework for personalized image editing in diffusion models, introducing Collaborative Direct Preference Optimization (C-DPO), a novel method that aligns image edits with user-specific preferences while leveraging collaborative signals from like-minded individuals. Our approach encodes each user as a node in a dynamic preference graph and learns embeddings via a lightweight graph neural network, enabling information sharing across users with overlapping visual tastes. We enhance a diffusion model’s editing capabilities by integrating these personalized embeddings into a novel DPO objective, which jointly optimizes for individual alignment and neighborhood coherence. Comprehensive experiments, including user studies and quantitative benchmarks, demonstrate that our method consistently outperforms baselines in generating edits that are aligned with user preferences.
zh

[CV-209] Pose-Aware Multi-Level Motion Parsing for Action Quality Assessment

【速读】:该论文旨在解决高水准体育动作质量评估(Action Quality Assessment, AQA)中因细微空间-时间姿态变化难以捕捉而导致的评分准确性不足问题。其核心挑战在于如何有效建模动作单元(Action-Unit)级别的局部与全局姿态特征,并融合多尺度运动信息及非身体相关条件(如跳水中的水花)以提升评分精度。解决方案的关键在于提出一种多层级运动解析框架:首先通过**动作单元解析器(Action-Unit Parser)实现精准的动作分割与局部-全局姿态表示;其次利用运动解析器(Motion Parser)进行时空特征学习,捕获每个动作单元内的姿态演化与外观细节;进一步引入条件解析器(Condition Parser)以灵活处理除人体姿态外的辅助判分因素(如水花);最后通过权重调整评分模块(Weight-Adjust Scoring Module)**适配不同动作类型的多样性与动作单元的多尺度特性,从而在大规模跳水数据集上实现了AQA任务的最优性能。

链接: https://arxiv.org/abs/2511.05611
作者: Shuaikang Zhu,Yang Yang,Chen Sun
机构: Xi’an Jiaotong University (西安交通大学); Shanghai University of Sport (上海体育学院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Human pose serves as a cornerstone of action quality assessment (AQA), where subtle spatial-temporal variations in pose often distinguish excellence from mediocrity. In high-level competitions, these nuanced differences become decisive factors in scoring. In this paper, we propose a novel multi-level motion parsing framework for AQA based on enhanced spatial-temporal pose features. On the first level, the Action-Unit Parser is designed with the help of pose extraction to achieve precise action segmentation and comprehensive local-global pose representations. On the second level, Motion Parser is used by spatial-temporal feature learning to capture pose changes and appearance details for each action-unit. Meanwhile, some special conditions other than body-related will impact action scoring, like water splash in diving. In this work, we design an additional Condition Parser to offer users more flexibility in their choices. Finally, Weight-Adjust Scoring Module is introduced to better accommodate the diverse requirements of various action types and the multi-scale nature of action-units. Extensive evaluations on large-scale diving sports datasets demonstrate that our multi-level motion parsing framework achieves state-of-the-art performance in both action segmentation and action scoring tasks.
zh

[CV-210] Walking the Schrödinger Bridge: A Direct Trajectory for Text-to-3D Generation NEURIPS2025

【速读】:该论文旨在解决基于优化的文本到3D生成方法中因依赖Score Distillation Sampling (SDS) 技术而引入的伪影问题,如过度饱和和过度平滑等,从而影响生成3D资产的质量与保真度。其解决方案的关键在于将生成过程建模为从当前渲染分布到目标文本条件分布之间的最优传输轨迹学习问题,并首次理论证明SDS是Schrödinger Bridge框架的一个简化实例;在此基础上提出Trajectory-Centric Distillation (TraCe) 框架,通过显式构建从当前渲染到文本引导去噪目标的扩散桥,并训练LoRA适配模型以捕捉该轨迹上的得分动力学,从而在较低的Classifier-free Guidance (CFG) 值下实现高质量、高保真的3D生成。

链接: https://arxiv.org/abs/2511.05609
作者: Ziying Li,Xuequan Lu,Xinkui Zhao,Guanjie Cheng,Shuiguang Deng,Jianwei Yin
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: NeurIPS 2025; this https URL

点击查看摘要

Abstract:Recent advancements in optimization-based text-to-3D generation heavily rely on distilling knowledge from pre-trained text-to-image diffusion models using techniques like Score Distillation Sampling (SDS), which often introduce artifacts such as over-saturation and over-smoothing into the generated 3D assets. In this paper, we address this essential problem by formulating the generation process as learning an optimal, direct transport trajectory between the distribution of the current rendering and the desired target distribution, thereby enabling high-quality generation with smaller Classifier-free Guidance (CFG) values. At first, we theoretically establish SDS as a simplified instance of the Schrödinger Bridge framework. We prove that SDS employs the reverse process of an Schrödinger Bridge, which, under specific conditions (e.g., a Gaussian noise as one end), collapses to SDS’s score function of the pre-trained diffusion model. Based upon this, we introduce Trajectory-Centric Distillation (TraCe), a novel text-to-3D generation framework, which reformulates the mathematically trackable framework of Schrödinger Bridge to explicitly construct a diffusion bridge from the current rendering to its text-conditioned, denoised target, and trains a LoRA-adapted model on this trajectory’s score dynamics for robust 3D optimization. Comprehensive experiments demonstrate that TraCe consistently achieves superior quality and fidelity to state-of-the-art techniques.
zh

[CV-211] In-process 3D Deviation Mapping and Defect Monitoring (3D-DM2) in High Production-rate Robotic Additive Manufacturing

【速读】:该论文旨在解决高沉积速率机器人增材制造(High Deposition Rate Robotic AM, HDRRAM)过程中形状精度难以维持的问题,尤其针对当前开环系统中因工艺不稳定性导致的形变偏差无法实时检测与补偿的挑战。解决方案的关键在于构建一个实时监测系统,通过采集并重建正在生长的零件形态,将其与近净成形参考模型进行直接比对,从而在制造过程中实时识别形状偏差;进一步对偏差区域进行分割和追踪,为及时干预与补偿提供依据,最终实现零件质量的一致性控制。

链接: https://arxiv.org/abs/2511.05604
作者: Subash Gautam,Alejandro Vargas-Uscategui,Peter King,Hans Lohr,Alireza Bab-Hadiashar,Ivan Cole,Ehsan Asadi
机构: CSIRO(澳大利亚联邦科学与工业研究组织); RMIT University (皇家墨尔本理工大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
备注:

点击查看摘要

Abstract:Additive manufacturing (AM) is an emerging digital manufacturing technology to produce complex and freeform objects through a layer-wise deposition. High deposition rate robotic AM (HDRRAM) processes, such as cold spray additive manufacturing (CSAM), offer significantly increased build speeds by delivering large volumes of material per unit time. However, maintaining shape accuracy remains a critical challenge, particularly due to process instabilities in current open-loop systems. Detecting these deviations as they occur is essential to prevent error propagation, ensure part quality, and minimize post-processing requirements. This study presents a real-time monitoring system to acquire and reconstruct the growing part and directly compares it with a near-net reference model to detect the shape deviation during the manufacturing process. The early identification of shape inconsistencies, followed by segmenting and tracking each deviation region, paves the way for timely intervention and compensation to achieve consistent part quality.
zh

[CV-212] Google-MedGemma Based Abnormality Detection in Musculoskeletal radiographs

【速读】:该论文旨在解决骨骼肌肉系统X光片中异常区域的自动检测问题,传统方法如卷积神经网络(Convolutional Neural Networks, CNNs)和自编码器(Autoencoder)在特征提取与泛化能力上存在局限。解决方案的关键在于引入基于MedGemma基础模型(Foundation Model)的框架,其核心创新包括:采用SigLIP衍生的视觉编码器(Vision Encoder),该编码器在多种医学影像模态上预训练,能够生成高维语义嵌入(Embedding);随后通过轻量级多层感知机(Multilayer Perceptron, MLP)实现二分类任务。该方法不仅显著优于传统模型,在MURA数据集上的性能指标表现更优,还借助MedGemma强大的迁移学习能力提升模型泛化性,并支持模块化训练策略(如选择性解冻编码器块),从而实现高效领域适应,为临床放射图像分诊提供可扩展、高精度的自动化异常检测方案。

链接: https://arxiv.org/abs/2511.05600
作者: Soumyajit Maity,Pranjal Kamboj,Sneha Maity,Rajat Singh,Sankhadeep Chatterjee
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: Proceedings of ICICT 2026, London, Springer (Forthcoming, February 2026; Accepted for Publication)

点击查看摘要

Abstract:This paper proposes a MedGemma-based framework for automatic abnormality detection in musculoskeletal radiographs. Departing from conventional autoencoder and neural network pipelines, the proposed method leverages the MedGemma foundation model, incorporating a SigLIP-derived vision encoder pretrained on diverse medical imaging modalities. Preprocessed X-ray images are encoded into high-dimensional embeddings using the MedGemma vision backbone, which are subsequently passed through a lightweight multilayer perceptron for binary classification. Experimental assessment reveals that the MedGemma-driven classifier exhibits strong performance, exceeding conventional convolutional and autoencoder-based metrics. Additionally, the model leverages MedGemma’s transfer learning capabilities, enhancing generalization and optimizing feature engineering. The integration of a modern medical foundation model not only enhances representation learning but also facilitates modular training strategies such as selective encoder block unfreezing for efficient domain adaptation. The findings suggest that MedGemma-powered classification systems can advance clinical radiograph triage by providing scalable and accurate abnormality detection, with potential for broader applications in automated medical image analysis. Keywords: Google MedGemma, MURA, Medical Image, Classification. Comments: Proceedings of ICICT 2026, London, Springer (Forthcoming, February 2026; Accepted for Publication) Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI) Reportnumber: ICICT-2026-217 Cite as: arXiv:2511.05600 [cs.CV] (or arXiv:2511.05600v1 [cs.CV] for this version) https://doi.org/10.48550/arXiv.2511.05600 Focus to learn more arXiv-issued DOI via DataCite
zh

[CV-213] Beyond Softmax: Dual-Branch Sigmoid Architecture for Accurate Class Activation Maps BMVC2025

【速读】:该论文旨在解决基于类激活映射(Class Activation Mapping, CAM)及其扩展方法在可视化深度网络预测依据时存在的两个根本性偏差问题:一是由最终softmax分类器引起的加性logit偏移,导致重要性评分被任意扭曲;二是符号坍缩(sign collapse),使得兴奋性特征与抑制性特征无法区分。解决方案的关键在于提出一种简单且架构无关的双分支sigmoid头结构,通过将原模型的分类头克隆为并行分支,并以每个类别独立的sigmoid输出替代softmax,同时冻结原始softmax头仅微调sigmoid分支,从而实现定位(localization)与分类(classification)的解耦。该设计保留了特征贡献的幅度和符号信息,显著提升了解释保真度(explanation fidelity),并在不牺牲分类准确率的前提下实现了更精确的目标定位。

链接: https://arxiv.org/abs/2511.05590
作者: Yoojin Oh,Junhyug Noh
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注: Accepted at BMVC 2025

点击查看摘要

Abstract:Class Activation Mapping (CAM) and its extensions have become indispensable tools for visualizing the evidence behind deep network predictions. However, by relying on a final softmax classifier, these methods suffer from two fundamental distortions: additive logit shifts that arbitrarily bias importance scores, and sign collapse that conflates excitatory and inhibitory features. We propose a simple, architecture-agnostic dual-branch sigmoid head that decouples localization from classification. Given any pretrained model, we clone its classification head into a parallel branch ending in per-class sigmoid outputs, freeze the original softmax head, and fine-tune only the sigmoid branch with class-balanced binary supervision. At inference, softmax retains recognition accuracy, while class evidence maps are generated from the sigmoid branch – preserving both magnitude and sign of feature contributions. Our method integrates seamlessly with most CAM variants and incurs negligible overhead. Extensive evaluations on fine-grained tasks (CUB-200-2011, Stanford Cars) and WSOL benchmarks (ImageNet-1K, OpenImages30K) show improved explanation fidelity and consistent Top-1 Localization gains – without any drop in classification accuracy. Code is available at this https URL.
zh

[CV-214] DiffSwap: 3D Latent-Controlled Diffusion for Identity-Preserving Face Swapping

【速读】:该论文旨在解决当前基于扩散模型(diffusion-based)的人脸交换方法在复杂姿态和表情下仍存在细节伪影(fine-grained artifacts)及身份保留能力不足的问题。现有方法未能有效利用3D人脸结构信息,导致身份与姿态、表情等外观属性难以解耦。其解决方案的关键在于提出DiffSwap++,一种引入3D人脸潜在特征(3D facial latent features)的扩散人脸交换框架,在训练过程中利用3D感知表示引导生成过程,从而提升几何一致性并增强身份与外观属性的解耦能力;同时设计了一种条件去噪架构,以身份嵌入(identity embeddings)和面部关键点(facial landmarks)联合控制扩散过程,实现高保真且身份忠实的人脸交换。

链接: https://arxiv.org/abs/2511.05575
作者: Weston Bondurant,Arkaprava Sinha,Hieu Le,Srijan Das,Stephanie Schuckers
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Diffusion-based approaches have recently achieved strong results in face swapping, offering improved visual quality over traditional GAN-based methods. However, even state-of-the-art models often suffer from fine-grained artifacts and poor identity preservation, particularly under challenging poses and expressions. A key limitation of existing approaches is their failure to meaningfully leverage 3D facial structure, which is crucial for disentangling identity from pose and expression. In this work, we propose DiffSwap++, a novel diffusion-based face-swapping pipeline that incorporates 3D facial latent features during training. By guiding the generation process with 3D-aware representations, our method enhances geometric consistency and improves the disentanglement of facial identity from appearance attributes. We further design a diffusion architecture that conditions the denoising process on both identity embeddings and facial landmarks, enabling high-fidelity and identity-preserving face swaps. Extensive experiments on CelebA, FFHQ, and CelebV-Text demonstrate that DiffSwap++ outperforms prior methods in preserving source identity while maintaining target pose and expression. Additionally, we introduce a biometric-style evaluation and conduct a user study to further validate the realism and effectiveness of our approach. Code will be made publicly available at this https URL
zh

[CV-215] Elements of Active Continuous Learning and Uncertainty Self-Awareness: a Narrow Implementation for Face and Facial Expression Recognition

【速读】:该论文旨在解决当前窄域机器学习算法缺乏自我评估与修正能力的问题,这是实现人工通用智能(Artificial General Intelligence, AGI)的关键障碍之一。其解决方案的核心在于构建一种模拟自知觉机制的监督式人工神经网络(supervising artificial neural network, ANN),该网络通过监测底层卷积神经网络(convolutional neural network, CNN)集成模型的激活模式,识别高不确定性状态以判断预测的可信度。该自知觉ANN具备记忆区域用于存储历史性能信息,并在训练过程中调整可学习参数以优化整体表现;当判定为低可信度时,系统自动进入主动学习模式,使模型具备“代理权”——即在高不确定性和困惑条件下主动请求人类干预,从而提升系统的鲁棒性与可解释性。

链接: https://arxiv.org/abs/2511.05574
作者: Stanislav Selitskiy
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Reflection on one’s thought process and making corrections to it if there exists dissatisfaction in its performance is, perhaps, one of the essential traits of intelligence. However, such high-level abstract concepts mandatory for Artificial General Intelligence can be modelled even at the low level of narrow Machine Learning algorithms. Here, we present the self-awareness mechanism emulation in the form of a supervising artificial neural network (ANN) observing patterns in activations of another underlying ANN in a search for indications of the high uncertainty of the underlying ANN and, therefore, the trustworthiness of its predictions. The underlying ANN is a convolutional neural network (CNN) ensemble employed for face recognition and facial expression tasks. The self-awareness ANN has a memory region where its past performance information is stored, and its learnable parameters are adjusted during the training to optimize the performance. The trustworthiness verdict triggers the active learning mode, giving elements of agency to the machine learning algorithm that asks for human help in high uncertainty and confusion conditions.
zh

[CV-216] Video Text Preservation with Synthetic Text-Rich Videos

【速读】:该论文旨在解决文本到视频(Text-To-Video, T2V)生成模型在视频中生成可读且连贯文字时存在的显著缺陷,尤其是对短语或单词的渲染错误问题。现有方法要么计算成本高,要么不适用于视频生成场景。解决方案的关键在于采用轻量级的合成监督策略:首先利用文本到图像(Text-To-Image, T2I)扩散模型生成富含文本的图像,再通过无文本依赖的图像到视频(Image-To-Video, I2V)模型将这些图像动画化为短视频,从而构建合成的视频-提示对;随后使用这些数据对预训练的T2V模型Wan2.1进行微调,无需修改模型架构即可提升短文本的可读性与长文本的时间一致性,验证了精心设计的合成数据和弱监督机制在提高T2V生成中文本保真度方面的有效性。

链接: https://arxiv.org/abs/2511.05573
作者: Ziyang Liu,Kevin Valencia,Justin Cui
机构: University of California, Los Angeles (加州大学洛杉矶分校)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:While Text-To-Video (T2V) models have advanced rapidly, they continue to struggle with generating legible and coherent text within videos. In particular, existing models often fail to render correctly even short phrases or words and previous attempts to address this problem are computationally expensive and not suitable for video generation. In this work, we investigate a lightweight approach to improve T2V diffusion models using synthetic supervision. We first generate text-rich images using a text-to-image (T2I) diffusion model, then animate them into short videos using a text-agnostic image-to-video (I2v) model. These synthetic video-prompt pairs are used to fine-tune Wan2.1, a pre-trained T2V model, without any architectural changes. Our results show improvement in short-text legibility and temporal consistency with emerging structural priors for longer text. These findings suggest that curated synthetic data and weak supervision offer a practical path toward improving textual fidelity in T2V generation.
zh

[CV-217] C3-Diff: Super-resolving Spatial Transcriptomics via Cross-modal Cross-content Contrastive Diffusion Modelling

【速读】:该论文旨在解决空间转录组学(Spatial Transcriptomics, ST)技术中因分辨率低而导致的空间基因表达解析不足问题,以及如何有效建模组织病理图像(histology images)与基因表达数据之间的跨模态交互关系以实现ST增强。其解决方案的关键在于提出一种基于交叉模态、交叉内容对比扩散框架(C3-Diff),通过改进传统对比学习范式以提取ST图谱与组织图像中共有的模态不变性和内容不变性特征,并引入基于噪声的特征超球面信息增强策略提升低测序敏感区域的表征能力,同时设计动态跨模态插补训练策略缓解ST数据稀缺问题,从而显著提升空间基因表达图谱的质量与下游任务性能。

链接: https://arxiv.org/abs/2511.05571
作者: Xiaofei Wang,Stephen Price,Chao Li
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:The rapid advancement of spatial transcriptomics (ST), i.e., spatial gene expressions, has made it possible to measure gene expression within original tissue, enabling us to discover molecular mechanisms. However, current ST platforms frequently suffer from low resolution, limiting the in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with gene expressions of profiled tissue spots. However, it remains a challenge to model the interactions between histology images and gene expressions for effective ST enhancement. This study presents a cross-modal cross-content contrastive diffusion framework, called C3-Diff, for ST enhancement with histology images as guidance. In C3-Diff, we firstly analyze the deficiency of traditional contrastive learning paradigm, which is then refined to extract both modal-invariant and content-invariant features of ST maps and histology images. Further, to overcome the problem of low sequencing sensitivity in ST maps, we perform nosing-based information augmentation on the surface of feature unit hypersphere. Finally, we propose a dynamic cross-modal imputation-based training strategy to mitigate ST data scarcity. We tested C3-Diff by benchmarking its performance on four public datasets, where it achieves significant improvements over competing methods. Moreover, we evaluate C3-Diff on downstream tasks of cell type localization, gene expression correlation and single-cell-level gene expression prediction, promoting AI-enhanced biotechnology for biomedical research and clinical applications. Codes are available at this https URL.
zh

[CV-218] Do Street View Imagery and Public Participation GIS align: Comparative Analysis of Urban Attractiveness

【速读】:该论文旨在解决街景影像(Street View Imagery, SVI)与公众参与地理信息系统(Public Participation GIS, PPGIS)在反映城市空间感知一致性方面的可比性问题,即如何评估SVI能否有效替代或补充PPGIS所捕捉的居民主观体验。其解决方案的关键在于:利用参与者对SVI的评分和语义图像分割技术训练机器学习模型,以预测视觉特征驱动的感知吸引力,并将其与PPGIS识别出的“吸引人”或“不吸引人”的地点进行对比分析,同时引入严格与宽松两种阈值标准量化一致性程度。研究发现,尽管SVI在部分场景下能反映人类感知(如宽松阈值下吸引力匹配率达67%),但因缺乏对非视觉因素(如噪声、交通、人群密度及环境压力等)的表征,导致其无法完全替代PPGIS所体现的多维体验,从而强调了将两者整合用于更全面理解城市感知的重要性。

链接: https://arxiv.org/abs/2511.05570
作者: Milad Malekzadeh,Elias Willberg,Jussi Torkko,Silviya Korpilo,Kamyar Hasanzadeh,Olle Järv,Tuuli Toivonen
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:As digital tools increasingly shape spatial planning practices, understanding how different data sources reflect human experiences of urban environments is essential. Street View Imagery (SVI) and Public Participation GIS (PPGIS) represent two prominent approaches for capturing place-based perceptions that can support urban planning decisions, yet their comparability remains underexplored. This study investigates the alignment between SVI-based perceived attractiveness and residents’ reported experiences gathered via a city-wide PPGIS survey in Helsinki, Finland. Using participant-rated SVI data and semantic image segmentation, we trained a machine learning model to predict perceived attractiveness based on visual features. We compared these predictions to PPGIS-identified locations marked as attractive or unattractive, calculating agreement using two sets of strict and moderate criteria. Our findings reveal only partial alignment between the two datasets. While agreement (with a moderate threshold) reached 67% for attractive and 77% for unattractive places, agreement (with a strict threshold) dropped to 27% and 29%, respectively. By analysing a range of contextual variables, including noise, traffic, population presence, and land use, we found that non-visual cues significantly contributed to mismatches. The model failed to account for experiential dimensions such as activity levels and environmental stressors that shape perceptions but are not visible in images. These results suggest that while SVI offers a scalable and visual proxy for urban perception, it cannot fully substitute the experiential richness captured through PPGIS. We argue that both methods are valuable but serve different purposes; therefore, a more integrated approach is needed to holistically capture how people perceive urban environments.
zh

[CV-219] Adaptive Sample-Level Framework Motivated by Distributionally Robust Optimization with Variance-Based Radius Assignment for Enhanced Neural Network Generalization Under Distribution Shift

【速读】:该论文旨在解决深度神经网络在分布偏移(distribution shift)和少数子群体(minority subpopulations)场景下,基于经验风险最小化(Empirical Risk Minimization, ERM)训练时可靠性下降的问题。传统分布鲁棒优化(Distributionally Robust Optimization, DRO)方法依赖单一全局鲁棒预算,易导致模型过于保守或鲁棒性分配不合理。其解决方案的关键在于提出一种基于方差驱动的自适应样本级DRO(Var-DRO)框架:通过在线损失方差自动识别高风险样本,并为每个样本分配个性化的鲁棒预算;同时采用双侧KL散度形式的约束来限制每个样本的对抗权重与经验权重之比,从而将内层最大化问题转化为可在凸多面体上高效求解的线性规划问题(水填法),并引入预热阶段和线性增长的全局预算上限以稳定训练过程。该方法无需群体标签、实现简单、理论严谨且计算高效。

链接: https://arxiv.org/abs/2511.05568
作者: Aheer Sravon,Devdyuti Mazumder,Md. Ibrahim
机构: 未知
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
备注: Conference

点击查看摘要

Abstract:Distribution shifts and minority subpopulations frequently undermine the reliability of deep neural networks trained using Empirical Risk Minimization (ERM). Distributionally Robust Optimization (DRO) addresses this by optimizing for the worst-case risk within a neighborhood of the training distribution. However, conventional methods depend on a single, global robustness budget, which can lead to overly conservative models or a misallocation of robustness. We propose a variance-driven, adaptive, sample-level DRO (Var-DRO) framework that automatically identifies high-risk training samples and assigns a personalized robustness budget to each based on its online loss variance. Our formulation employs two-sided, KL-divergence-style bounds to constrain the ratio between adversarial and empirical weights for every sample. This results in a linear inner maximization problem over a convex polytope, which admits an efficient water-filling solution. To stabilize training, we introduce a warmup phase and a linear ramp schedule for the global cap on per-sample budgets, complemented by label smoothing for numerical robustness. Evaluated on CIFAR-10-C (corruptions), our method achieves the highest overall mean accuracy compared to ERM and KL-DRO. On Waterbirds, Var-DRO improves overall performance while matching or surpassing KL-DRO. On the original CIFAR-10 dataset, Var-DRO remains competitive, exhibiting the modest trade-off anticipated when prioritizing robustness. The proposed framework is unsupervised (requiring no group labels), straightforward to implement, theoretically sound, and computationally efficient.
zh

[CV-220] Automatic Extraction of Road Networks by using Teacher-Student Adaptive Structural Deep Belief Network and Its Application to Landslide Disaster

【速读】:该论文旨在解决复杂道路网络特征识别难题,尤其是在自然灾害(如山体滑坡)导致道路损毁后快速获取可用交通路径的需求。针对传统模型在道路地图自动识别中表现不足的问题,作者提出了一种基于教师-学生集成学习的自适应深度置信网络(Adaptive DBN)方法,其核心创新在于通过受限玻尔兹曼机(RBM)中的神经元生成-湮灭算法和深度信念网络(DBN)中的层生成算法,在训练过程中动态优化网络结构以适应输入数据特征,从而显著提升模型的表示能力与检测精度。实验表明,该方法在七个主要城市测试集上的平均检测准确率从40.0%提升至89.0%,且模型轻量化后可部署于嵌入式边缘设备,实现灾后快速推理。

链接: https://arxiv.org/abs/2511.05567
作者: Shin Kamada,Takumi Ichimura
机构: Hiroshima City University (广岛市立大学); Prefectural University of Hiroshima (广岛县立大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:An adaptive structural learning method of Restricted Boltzmann Machine (RBM) and Deep Belief Network (DBN) has been developed as one of prominent deep learning models. The neuron generation-annihilation algorithm in RBM and layer generation algorithm in DBN make an optimal network structure for given input during the learning. In this paper, our model is applied to an automatic recognition method of road network system, called RoadTracer. RoadTracer can generate a road map on the ground surface from aerial photograph data. A novel method of RoadTracer using the Teacher-Student based ensemble learning model of Adaptive DBN is proposed, since the road maps contain many complicated features so that a model with high representation power to detect should be required. The experimental results showed the detection accuracy of the proposed model was improved from 40.0% to 89.0% on average in the seven major cities among the test dataset. In addition, we challenged to apply our method to the detection of available roads when landslide by natural disaster is occurred, in order to rapidly obtain a way of transportation. For fast inference, a small size of the trained model was implemented on a small embedded edge device as lightweight deep learning. We reported the detection results for the satellite image before and after the rainfall disaster in Japan.
zh

[CV-221] Efficient Online Continual Learning in Sensor-Based Human Activity Recognition

【速读】:该论文旨在解决传感器驱动的人体活动识别(Human Activity Recognition, HAR)中在线持续学习(Online Continual Learning, OCL)的两个核心挑战:一是现有OCL方法计算开销大且依赖大量标注样本;二是预训练模型(Pre-trained Model-based, PTM-based)在HAR场景下的应用受限于数据异构性和标注稀缺性。其解决方案的关键在于提出PTRN-HAR,首次成功将PTM-based OCL应用于传感器驱动的HAR任务:首先使用对比损失(contrastive loss)在有限数据上预训练特征提取器,并在流式学习阶段冻结该提取器以降低资源消耗;其次用关系模块网络(relation module network)替代传统密集分类层,从而显著提升数据效率并维持高性能,实验表明其在三个公开数据集上优于当前最优方法。

链接: https://arxiv.org/abs/2511.05566
作者: Yao Zhang,Souza Leite Clayton,Yu Xiao
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: 13 pages

点击查看摘要

Abstract:Machine learning models for sensor-based human activity recognition (HAR) are expected to adapt post-deployment to recognize new activities and different ways of performing existing ones. To address this need, Online Continual Learning (OCL) mechanisms have been proposed, allowing models to update their knowledge incrementally as new data become available while preserving previously acquired information. However, existing OCL approaches for sensor-based HAR are computationally intensive and require extensive labeled samples to represent new changes. Recently, pre-trained model-based (PTM-based) OCL approaches have shown significant improvements in performance and efficiency for computer vision applications. These methods achieve strong generalization capabilities by pre-training complex models on large datasets, followed by fine-tuning on downstream tasks for continual learning. However, applying PTM-based OCL approaches to sensor-based HAR poses significant challenges due to the inherent heterogeneity of HAR datasets and the scarcity of labeled data in post-deployment scenarios. This paper introduces PTRN-HAR, the first successful application of PTM-based OCL to sensor-based HAR. Unlike prior PTM-based OCL approaches, PTRN-HAR pre-trains the feature extractor using contrastive loss with a limited amount of data. This extractor is then frozen during the streaming stage. Furthermore, it replaces the conventional dense classification layer with a relation module network. Our design not only significantly reduces the resource consumption required for model training while maintaining high performance, but also improves data efficiency by reducing the amount of labeled data needed for effective continual learning, as demonstrated through experiments on three public datasets, outperforming the state-of-the-art. The code can be found here: this https URL
zh

[CV-222] In-Context Adaptation of VLMs for Few-Shot Cell Detection in Optical Microscopy

【速读】:该论文旨在解决生成式 AI 在生物医学显微图像中少样本目标检测(Few-Shot Object Detection, FSOD)能力不足的问题,尤其是在缺乏大规模标注数据的场景下。其关键解决方案是引入 Micro-OD 基准测试集,该数据集包含 252 张经专家标注的显微图像,涵盖 11 种细胞类型,并系统评估了八种前沿视觉语言模型(Vision-Language Models, VLMs)在少样本条件下的表现;同时提出一种混合 FSOD 流水线,结合检测头与基于 VLM 的少样本分类器,显著提升模型性能。研究发现,尽管零样本检测效果较差(因领域差距),但少量示例即可带来稳定改进,且推理令牌(reasoning tokens)的存在对端到端定位更有效,而简化版本更适合预定位区域的分类任务。

链接: https://arxiv.org/abs/2511.05565
作者: Shreyan Ganguly,Angona Biswas,Jaydeep Rade,Md Hasibul Hasan Hasib,Nabila Masud,Nitish Singla,Abhipsa Dash,Ushashi Bhattacharjee,Aditya Balu,Anwesha Sarkar,Adarsh Krishnamurthy,Soumik Sarkar
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Foundation vision-language models (VLMs) excel on natural images, but their utility for biomedical microscopy remains underexplored. In this paper, we investigate how in-context learning enables state-of-the-art VLMs to perform few-shot object detection when large annotated datasets are unavailable, as is often the case with microscopic images. We introduce the Micro-OD benchmark, a curated collection of 252 images specifically curated for in-context learning, with bounding-box annotations spanning 11 cell types across four sources, including two in-lab expert-annotated sets. We systematically evaluate eight VLMs under few-shot conditions and compare variants with and without implicit test-time reasoning tokens. We further implement a hybrid Few-Shot Object Detection (FSOD) pipeline that combines a detection head with a VLM-based few-shot classifier, which enhances the few-shot performance of recent VLMs on our benchmark. Across datasets, we observe that zero-shot performance is weak due to the domain gap; however, few-shot support consistently improves detection, with marginal gains achieved after six shots. We observe that models with reasoning tokens are more effective for end-to-end localization, whereas simpler variants are more suitable for classifying pre-localized crops. Our results highlight in-context adaptation as a practical path for microscopy, and our benchmark provides a reproducible testbed for advancing open-vocabulary detection in biomedical imaging.
zh

[CV-223] M2S2L: Mamba-based Multi-Scale Spatial-temporal Learning for Video Anomaly Detection

【速读】:该论文旨在解决视频异常检测(Video Anomaly Detection, VAD)中检测精度与计算效率难以平衡的问题,尤其是在复杂视频场景下,传统方法往往因缺乏全面的时空建模能力或计算开销过大而难以满足实时监控需求。解决方案的关键在于提出一种基于Mamba架构的多尺度时空学习框架(M2S2L),其核心创新包括:1)分层空间编码器在多粒度上捕捉视觉特征;2)多时间尺度编码器建模运动动态;3)引入特征分解机制,实现外观与运动重建的任务特异性优化,从而提升行为建模的精细度和异常评估的质量感知能力。实验表明,该方法在多个基准数据集上实现了高检测性能(如UCSD Ped2达98.5%帧级AUC),同时保持了20.1G FLOPs和45 FPS的高效推理速度,具备实际部署潜力。

链接: https://arxiv.org/abs/2511.05564
作者: Yang Liu,Boan Chen,Xiaoguang Zhu,Jing Liu,Peng Sun,Wei Zhou
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: IEEE VCIP 2025

点击查看摘要

Abstract:Video anomaly detection (VAD) is an essential task in the image processing community with prospects in video surveillance, which faces fundamental challenges in balancing detection accuracy with computational efficiency. As video content becomes increasingly complex with diverse behavioral patterns and contextual scenarios, traditional VAD approaches struggle to provide robust assessment for modern surveillance systems. Existing methods either lack comprehensive spatial-temporal modeling or require excessive computational resources for real-time applications. In this regard, we present a Mamba-based multi-scale spatial-temporal learning (M2S2L) framework in this paper. The proposed method employs hierarchical spatial encoders operating at multiple granularities and multi-temporal encoders capturing motion dynamics across different time scales. We also introduce a feature decomposition mechanism to enable task-specific optimization for appearance and motion reconstruction, facilitating more nuanced behavioral modeling and quality-aware anomaly assessment. Experiments on three benchmark datasets demonstrate that M2S2L framework achieves 98.5%, 92.1%, and 77.9% frame-level AUCs on UCSD Ped2, CUHK Avenue, and ShanghaiTech respectively, while maintaining efficiency with 20.1G FLOPs and 45 FPS inference speed, making it suitable for practical surveillance deployment.
zh

[CV-224] FilletRec: A Lightweight Graph Neural Network with Intrinsic Features for Automated Fillet Recognition

【速读】:该论文旨在解决CAD模型中圆角特征(fillet features)的自动化识别与简化问题,这是CAE(计算机辅助工程)分析中的关键步骤,但传统基于规则的方法鲁棒性差,而现有深度学习模型因通用设计和训练数据不足,在复杂圆角识别上存在准确率低、泛化能力弱的问题。解决方案的关键在于提出一个端到端的数据驱动框架,核心创新是构建了一个大规模多样化基准数据集,并设计了一种轻量级图神经网络FilletRec,其通过引入姿态不变的内在几何特征(如曲率)来学习更本质的几何模式,从而实现对复杂拓扑结构的高精度识别,同时参数量仅为基线模型的0.2%–5.4%,显著提升效率与泛化性能。

链接: https://arxiv.org/abs/2511.05561
作者: Jiali Gao,Taoran Liu,Hongfei Ye,Jianjun Chen
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Automated recognition and simplification of fillet features in CAD models is critical for CAE analysis, yet it remains an open challenge. Traditional rule-based methods lack robustness, while existing deep learning models suffer from poor generalization and low accuracy on complex fillets due to their generic design and inadequate training data. To address these issues, this paper proposes an end-to-end, data-driven framework specifically for fillet features. We first construct and release a large-scale, diverse benchmark dataset for fillet recognition to address the inadequacy of existing data. Based on it, we propose FilletRec, a lightweight graph neural network. The core innovation of this network is its use of pose-invariant intrinsic geometric features, such as curvature, enabling it to learn more fundamental geometric patterns and thereby achieve high-precision recognition of complex geometric topologies. Experiments show that FilletRec surpasses state-of-the-art methods in both accuracy and generalization, while using only 0.2%-5.4% of the parameters of baseline models, demonstrating high model efficiency. Finally, the framework completes the automated workflow from recognition to simplification by integrating an effective geometric simplification algorithm.
zh

[CV-225] Compressing Multi-Task Model for Autonomous Driving via Pruning and Knowledge Distillation

【速读】:该论文旨在解决多任务自动驾驶感知系统(包括目标检测、可行驶区域分割和车道线分割)在车载设备上部署时因模型参数量大、复杂度高而导致的难题。其核心解决方案是提出一种融合任务感知安全剪枝(task-aware safe pruning)与特征级知识蒸馏(feature-level knowledge distillation)的多任务模型压缩框架:前者通过结合基于泰勒展开的通道重要性评估与梯度冲突惩罚机制,保留关键通道并移除冗余及冲突通道;后者设计了一种不依赖任务头的蒸馏方法,将教师模型中的中间骨干特征和编码器特征作为指导信息传递给学生模型,从而有效缓解剪枝带来的性能下降。实验表明,该方案可在参数减少32.7%的情况下保持分割精度几乎不变,并仅小幅降低检测指标(Recall下降1.2%,mAP50下降1.8%),同时满足实时推理需求(32.7 FPS)。

链接: https://arxiv.org/abs/2511.05557
作者: Jiayuan Wang,Q. M. Jonathan Wu,Ning Zhang,Katsuya Suto,Lei Zhong
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Autonomous driving systems rely on panoptic perception to jointly handle object detection, drivable area segmentation, and lane line segmentation. Although multi-task learning is an effective way to integrate these tasks, its increasing model parameters and complexity make deployment on on-board devices difficult. To address this challenge, we propose a multi-task model compression framework that combines task-aware safe pruning with feature-level knowledge distillation. Our safe pruning strategy integrates Taylor-based channel importance with gradient conflict penalty to keep important channels while removing redundant and conflicting channels. To mitigate performance degradation after pruning, we further design a task head-agnostic distillation method that transfers intermediate backbone and encoder features from a teacher to a student model as guidance. Experiments on the BDD100K dataset demonstrate that our compressed model achieves a 32.7% reduction in parameters while segmentation performance shows negligible accuracy loss and only a minor decrease in detection (-1.2% for Recall and -1.8% for mAP50) compared to the teacher. The compressed model still runs at 32.7 FPS in real-time. These results show that combining pruning and knowledge distillation provides an effective compression solution for multi-task panoptic perception.
zh

[CV-226] MCFCN: Multi-View Clustering via a Fusion-Consensus Graph Convolutional Network

【速读】:该论文旨在解决多视图聚类(Multi-view Clustering, MVC)中现有方法在学习共识表示时忽视数据固有拓扑结构、依赖易受噪声干扰的图结构输入,以及跨视图一致性不足和难以处理特征空间中难区分样本等问题。其解决方案的关键在于提出一种融合共识图卷积网络(Multi-View Clustering via a Fusion-Consensus Graph Convolutional Network, MCFCN),通过端到端方式学习多视图数据的共识图,并结合视图特征融合模型与统一图结构适配器(Unified Graph Structure Adapter, UGA)来提取有效共识表示;同时设计相似性矩阵对齐损失(Similarity Matrix Alignment Loss, SMAL)和特征表示对齐损失(Feature Representation Alignment Loss, FRAL),在共识引导下优化视图特定图结构,保持跨视图拓扑一致性,增强类内边构建,从而提升聚类性能。

链接: https://arxiv.org/abs/2511.05554
作者: Chenping Pei,Fadi Dornaika,Jingjun Bi
机构: University of the Basque Country (巴斯克大学); IKERBASQUE (巴斯克基金会); North China University of Water Resources and Electric Power (华北水利水电大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:Existing Multi-view Clustering (MVC) methods based on subspace learning focus on consensus representation learning while neglecting the inherent topological structure of data. Despite the integration of Graph Neural Networks (GNNs) into MVC, their input graph structures remain susceptible to noise interference. Methods based on Multi-view Graph Refinement (MGRC) also have limitations such as insufficient consideration of cross-view consistency, difficulty in handling hard-to-distinguish samples in the feature space, and disjointed optimization processes caused by graph construction algorithms. To address these issues, a Multi-View Clustering method via a Fusion-Consensus Graph Convolutional Network (MCFCN) is proposed. The network learns the consensus graph of multi-view data in an end-to-end manner and learns effective consensus representations through a view feature fusion model and a Unified Graph Structure Adapter (UGA). It designs Similarity Matrix Alignment Loss (SMAL) and Feature Representation Alignment Loss (FRAL). With the guidance of consensus, it optimizes view-specific graphs, preserves cross-view topological consistency, promotes the construction of intra-class edges, and realizes effective consensus representation learning with the help of GCN to improve clustering performance. MCFCN demonstrates state-of-the-art performance on eight multi-view benchmark datasets, and its effectiveness is verified by extensive qualitative and quantitative implementations. The code will be provided at this https URL.
zh

[CV-227] EVLP:Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning

【速读】:该论文旨在解决复杂具身长时程操作任务中多模态规划不一致的问题,即当前方法缺乏统一的生成框架来协同整合文本逻辑推理与视觉空间想象,导致任务分解与执行效率低下。其解决方案的关键在于提出EVLP(Embodied Vision-Language Planner),一个统一的多模态生成框架,通过三个核心创新实现:1)统一的多模态生成架构,融合语义信息与空间特征以增强视觉感知,并直接学习离散图像的联合分布以支持一步视觉合成;2)动态感知预训练策略,采用双向动态对齐机制(逆动力学与前向动力学任务)强化统一特征空间内的多模态关联;3)强化监督微调机制,在统一生成空间中引入强化损失,对齐文本动作与生成图像之间的空间逻辑,从而赋予模型具备空间感知能力的多模态规划能力。

链接: https://arxiv.org/abs/2511.05553
作者: Xinyan Cai,Shiguang Wu,Dafeng Chi,Yuzheng Zhuang,Xingyue Quan,Jianye Hao,Qiang Guan
机构: Institute of Automation, Chinese Academy of Sciences (CASIA); Huawei Noah’s Ark Lab
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:In complex embodied long-horizon manipulation tasks, effective task decomposition and execution require synergistic integration of textual logical reasoning and visual-spatial imagination to ensure efficient and accurate operation. Current methods fail to adopt a unified generation framework for multimodal planning, lead to inconsistent in multimodal planning. To address this challenge, we present \textbfEVLP (Embodied Vision-Language Planner), an innovative multimodal unified generation framework that jointly models linguistic reasoning and visual generation. Our approach achieves multimodal planning for long-horizon tasks through a novel training pipeline incorporating dynamic pretraining and reinforced alignment. Our core innovations consist of three key components: \textbf1) Unified Multimodal Generation Framework: For understanding, We integrate semantic information with spatial features to provide comprehensive visual perception. For generation, we directly learn the joint distribution of discrete images for one-step visual synthesis, enabling coordinated language-visual modeling through learnable cross-modal attention mechanisms. \textbf2) Dynamic Perception Pretraining: We propose a bidirectional dynamic alignment strategy employing inverse dynamics tasks and forward dynamics tasks, effectively strengthening multimodal correlations within a unified feature space. \textbf3) Reinforced Supervised Fine-Tuning: While conducting instruction-based fine-tuning in the unified generation space, we construct a reinforce loss to align the spatial logic between textual actions and generated images, enabling the model to acquire spatio-awared multimodal planning capabilities.
zh

[CV-228] In-Context-Learning-Assisted Quality Assessment Vision-Language Models for Metal Additive Manufacturing

【速读】:该论文旨在解决增材制造(Additive Manufacturing, AM)中基于视觉的质量评估问题,传统方法依赖于专用机器学习模型和大量标注数据,而数据采集与模型训练成本高、耗时长。解决方案的关键在于利用视觉语言模型(Vision-Language Models, VLMs)的推理能力,并引入上下文学习(In-Context Learning, ICL)机制,通过少量示范样本向VLM注入特定应用知识,从而在无需大规模训练数据的情况下实现高质量分类。实验表明,ICL辅助的VLM可在仅使用极少样本的前提下达到与传统模型相当的准确率,同时生成人类可理解的推理过程,提升决策透明度;为此,作者还提出了“知识相关性”和“推理有效性”两项指标以量化评估其解释质量。

链接: https://arxiv.org/abs/2511.05551
作者: Qiaojie Zheng,Jiucai Zhang,Xiaoli Zhang
机构: Colorado School of Mines (科罗拉多矿业学院)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 8 pages, 8 figures

点击查看摘要

Abstract:Vision-based quality assessment in additive manufacturing often requires dedicated machine learning models and application-specific datasets. However, data collection and model training can be expensive and time-consuming. In this paper, we leverage vision-language models’ (VLMs’) reasoning capabilities to assess the quality of printed parts and introduce in-context learning (ICL) to provide VLMs with necessary application-specific knowledge and demonstration samples. This method eliminates the requirement for large application-specific datasets for training models. We explored different sampling strategies for ICL to search for the optimal configuration that makes use of limited samples. We evaluated these strategies on two VLMs, Gemini-2.5-flash and Gemma3:27b, with quality assessment tasks in wire-laser direct energy deposition processes. The results show that ICL-assisted VLMs can reach quality classification accuracies similar to those of traditional machine learning models while requiring only a minimal number of samples. In addition, unlike traditional classification models that lack transparency, VLMs can generate human-interpretable rationales to enhance trust. Since there are no metrics to evaluate their interpretability in manufacturing applications, we propose two metrics, knowledge relevance and rationale validity, to evaluate the quality of VLMs’ supporting rationales. Our results show that ICL-assisted VLMs can address application-specific tasks with limited data, achieving relatively high accuracy while also providing valid supporting rationales for improved decision transparency.
zh

[CV-229] Automated Invoice Data Extraction: Using LLM and OCR

【速读】:该论文旨在解决传统光学字符识别(OCR)系统在处理发票文档时面临的挑战,包括布局多样性、手写文本以及低质量扫描图像等问题,这些问题通常源于强模板依赖性导致的灵活性不足。解决方案的关键在于构建一个融合光学字符识别(OCR)、深度学习模型(如卷积神经网络CNN和Transformer)、大语言模型(LLMs)及图分析技术的综合性人工智能(AI)平台,通过增强布局理解、语义解析与实体关系建模能力,实现高精度、高一致性的文档信息提取,从而显著优于以往依赖固定规则或单一模型的方法。

链接: https://arxiv.org/abs/2511.05547
作者: Advait Thakur,Khushi Khanchandani,Akshita Shetty,Chaitravi Reddy,Ritisa Behera
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: 10 pages, 3 figures

点击查看摘要

Abstract:Conventional Optical Character Recognition (OCR) systems are challenged by variant invoice layouts, handwritten text, and low- quality scans, which are often caused by strong template dependencies that restrict their flexibility across different document structures and layouts. Newer solutions utilize advanced deep learning models such as Convolutional Neural Networks (CNN) as well as Transformers, and domain-specific models for better layout analysis and accuracy across various sections over varied document types. Large Language Models (LLMs) have revolutionized extraction pipelines at their core with sophisticated entity recognition and semantic comprehension to support complex contextual relationship mapping without direct programming specification. Visual Named Entity Recognition (NER) capabilities permit extraction from invoice images with greater contextual sensitivity and much higher accuracy rates than older approaches. Existing industry best practices utilize hybrid architectures that blend OCR technology and LLM for maximum scalability and minimal human intervention. This work introduces a holistic Artificial Intelligence (AI) platform combining OCR, deep learning, LLMs, and graph analytics to achieve unprecedented extraction quality and consistency.
zh

[CV-230] oken Is All You Need: Cognitive Planning through Sparse Intent Alignment

【速读】:该论文旨在解决端到端自动驾驶(End-to-End Autonomous Driving, E2EAD)中长期依赖于详尽场景建模的假设问题,即传统方法通常需要复杂的未来场景生成或受限于马尔可夫假设的视觉-语言-动作(Vision-Language-Action, VLA)系统。其解决方案的关键在于提出一种基于稀疏语义令牌(semantically rich tokens)的最小表示策略,无需显式预测未来场景即可实现高效规划:实验表明,在nuPlan基准上仅使用感知引导的BEV(Bird’s-Eye-View)表示,即使不进行未来预测也能达到0.548 m ADE(平均位移误差),优于此前在nuScenes上约0.75 m的性能;进一步地,通过条件化轨迹解码于预测的未来令牌,AED可提升至0.479 m,较当前状态基线改善12.6%。此外,研究发现显式重建损失在可靠感知输入下不仅无益反而可能损害性能,并观察到时间模糊性(temporal fuzziness)现象——模型自适应关注任务相关语义而非固定时间戳,体现出在不确定性下的认知优势。这一“token is all you need”范式标志着从世界重建转向语义理解的认知转变,为基于想象而非反应的规划系统奠定基础。

链接: https://arxiv.org/abs/2511.05540
作者: Shiyao Sang
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
备注: 6 pages, 2 figures. Preprint exploring a new cognitive paradigm for autonomous planning

点击查看摘要

Abstract:We challenge the long-standing assumption that exhaustive scene modeling is required for high-performance end-to-end autonomous driving (E2EAD). Unlike world-model approaches that rely on computationally intensive future scene generation or vision-language-action (VLA) systems constrained by Markov assumptions, we show that a minimal set of semantically rich tokens is sufficient for effective planning. Experiments on the nuPlan benchmark (720 scenarios, over 11,000 samples) using perception-informed BEV representations yield three key findings: (1) even without future prediction, our sparse representation achieves 0.548 m ADE, comparable to or surpassing prior methods reporting around 0.75 m on nuScenes; (2) conditioning trajectory decoding on predicted future tokens reduces ADE to 0.479 m, a 12.6% improvement over current-state baselines; and (3) explicit reconstruction loss offers no benefit and may degrade performance under reliable perception inputs. Notably, we observe the emergence of temporal fuzziness, where the model adaptively attends to task-relevant semantics rather than aligning rigidly to fixed timestamps, providing a cognitive advantage for planning under uncertainty. Our “token is all you need” principle marks a paradigm shift from reconstructing the world to understanding it, laying a foundation for cognitively inspired systems that plan through imagination rather than reaction.
zh

[CV-231] Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2

【速读】:该论文旨在解决视觉Transformer(Vision Transformer, ViT)模型在跨域应用中,尤其是医学影像领域,因低信息量的patch token被不合理利用而导致注意力机制和特征图可解释性下降的问题。解决方案的关键在于提出一种基于对比学习的随机化MLP(Randomized-MLP, RMLP)正则化方法,通过在微调DINOv2时引入RMLP,促使模型生成更具语义一致性的表示,从而在保持或提升下游任务性能的同时显著增强注意力图的可解释性。

链接: https://arxiv.org/abs/2511.05509
作者: Joel Valdivia Ortega,Lorenz Lamm,Franziska Eckardt,Benedikt Schworm,Marion Jasnin,Tingying Peng
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Vision Transformers (ViTs), such as DINOv2, achieve strong performance across domains but often repurpose low-informative patch tokens in ways that reduce the interpretability of attention and feature maps. This challenge is especially evident in medical imaging, where domain shifts can degrade both performance and transparency. In this paper, we introduce Randomized-MLP (RMLP) regularization, a contrastive learning-based method that encourages more semantically aligned representations. We use RMLPs when fine-tuning DINOv2 to both medical and natural image modalities, showing that it improves or maintains downstream performance while producing more interpretable attention maps. We also provide a mathematical analysis of RMLPs, offering insights into its role in enhancing ViT-based models and advancing our understanding of contrastive learning.
zh

[CV-232] CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video

【速读】:该论文旨在解决用户生成内容(User-Generated Content, UGC)在视频平台(如YouTube和TikTok)上传播时,无参考(No-Reference, NR)视频质量评估(VQA)面临的挑战,尤其是由于非专业拍摄和后续转码导致的复杂失真类型难以建模的问题。现有NR-VQA模型虽能预测平均意见分(MOS),但因缺乏细粒度的感知失真标注,其对压缩内容主观质量的建模能力受限。解决方案的关键在于提出CAMP-VQA框架,利用大视觉语言模型(Vision-Language Model, VLM)的语义理解能力,设计一种质量感知提示机制(quality-aware prompting),将视频元数据(如分辨率、帧率、比特率)与基于帧间差异提取的关键片段相结合,引导BLIP-2预训练模型生成细粒度的质量描述文本;进而构建统一架构,融合语义对齐、时序特征和空间特征三维度的多模态特征,并回归得到视频质量分数,从而在无需昂贵人工细粒度标注的情况下显著提升评估精度(SRCC: 0.928, PLCC: 0.938)。

链接: https://arxiv.org/abs/2511.07290
作者: Xinyi Wang,Angeliki Katsenou,Junxiao Shen,David Bull
机构: 未知
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
备注: 14 pages, 6 figures

点击查看摘要

Abstract:The prevalence of user-generated content (UGC) on platforms such as YouTube and TikTok has rendered no-reference (NR) perceptual video quality assessment (VQA) vital for optimizing video delivery. Nonetheless, the characteristics of non-professional acquisition and the subsequent transcoding of UGC video on sharing platforms present significant challenges for NR-VQA. Although NR-VQA models attempt to infer mean opinion scores (MOS), their modeling of subjective scores for compressed content remains limited due to the absence of fine-grained perceptual annotations of artifact types. To address these challenges, we propose CAMP-VQA, a novel NR-VQA framework that exploits the semantic understanding capabilities of large vision-language models. Our approach introduces a quality-aware prompting mechanism that integrates video metadata (e.g., resolution, frame rate, bitrate) with key fragments extracted from inter-frame variations to guide the BLIP-2 pretraining approach in generating fine-grained quality captions. A unified architecture has been designed to model perceptual quality across three dimensions: semantic alignment, temporal characteristics, and spatial characteristics. These multimodal features are extracted and fused, then regressed to video quality scores. Extensive experiments on a wide variety of UGC datasets demonstrate that our model consistently outperforms existing NR-VQA methods, achieving improved accuracy without the need for costly manual fine-grained annotations. Our method achieves the best performance in terms of average rank and linear correlation (SRCC: 0.928, PLCC: 0.938) compared to state-of-the-art methods. The source code and trained models, along with a user-friendly demo, are available at: this https URL.
zh

[CV-233] Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models

【速读】:该论文旨在解决当前基于大语言模型(Large Language Models, LLMs)的多模态语音识别方法中存在的两个核心问题:一是各任务(听觉语音识别 Auditory Speech Recognition, ASR;视觉语音识别 Visual Speech Recognition, VSR;视听语音识别 Audio-Visual Speech Recognition, AVSR)独立建模导致计算与部署资源消耗高,且难以挖掘跨任务协同潜力;二是固定速率的 token 压缩机制限制了在准确率与效率之间灵活权衡的能力。解决方案的关键在于提出 Omni-AVSR,一个统一的音频-视觉 LLM 框架,其核心创新包括:1)采用马特罗什卡(Matryoshka)表示学习范式实现多粒度训练,显著降低训练资源开销;2)引入三种基于 LoRA(Low-Rank Adaptation)的适配策略,在共享参数与任务特异性专精之间取得平衡,从而支持弹性推理(elastic inference),在保持性能的同时大幅减少训练和部署成本,并在噪声环境下仍具鲁棒性。

链接: https://arxiv.org/abs/2511.07253
作者: Umberto Cappellazzo,Xubo Liu,Pingchuan Ma,Stavros Petridis,Maja Pantic
机构: 未知
类目: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
备注: Project website: this https URL

点击查看摘要

Abstract:Large language models (LLMs) have recently achieved impressive results in speech recognition across multiple modalities, including Auditory Speech Recognition (ASR), Visual Speech Recognition (VSR), and Audio-Visual Speech Recognition (AVSR). Despite this progress, current LLM-based approaches typically address each task independently, training separate models that raise computational and deployment resource use while missing potential cross-task synergies. They also rely on fixed-rate token compression, which restricts flexibility in balancing accuracy with efficiency. These limitations highlight the need for a unified framework that can support ASR, VSR, and AVSR while enabling elastic inference. To this end, we present Omni-AVSR, a unified audio-visual LLM that combines efficient multi-granularity training with parameter-efficient adaptation. Specifically, we adapt the matryoshka representation learning paradigm to efficiently train across multiple audio and visual granularities, reducing its inherent training resource use. Furthermore, we explore three LoRA-based strategies for adapting the backbone LLM, balancing shared and task-specific specialization. Experiments on LRS2 and LRS3 show that Omni-AVSR achieves comparable or superior accuracy to state-of-the-art baselines while training a single model at substantially lower training and deployment resource use. The model also remains robust under acoustic noise, and we analyze its scaling behavior as LLM size increases, providing insights into the trade-off between performance and efficiency.
zh

[CV-234] ask-Adaptive Low-Dose CT Reconstruction

【速读】:该论文旨在解决深度学习驱动的低剂量计算机断层扫描(low-dose computed tomography, LD-CT)重建方法在标准图像质量指标(如峰值信噪比和结构相似性指数)上表现优异,但无法有效保留临床诊断所需关键解剖细节的问题。这一局限性严重阻碍了其在临床实践中的应用。解决方案的关键在于提出一种任务自适应(task-adaptive)重建框架,通过将一个冻结的预训练任务网络(如肝脏及肿瘤分割网络)作为正则化项嵌入重建损失函数中,从而引导重建过程以提升诊断相关特征的保真度。该方法区别于现有联合训练策略,避免了重建与任务网络之间的优化冲突,能够在不改变原重建模型结构的前提下,仅通过修改损失函数实现对特定诊断任务的优化,显著提升了任务性能(如Dice分数达0.707,接近全剂量扫描的0.874),并优于传统重建方法和联合训练方案。

链接: https://arxiv.org/abs/2511.07094
作者: Necati Sefercioglu,Mehmet Ozan Unal,Metin Ertas,Isa Yildirim
机构: Istanbul Technical University (伊斯坦布尔技术大学)
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Deep learning-based low-dose computed tomography reconstruction methods already achieve high performance on standard image quality metrics like peak signal-to-noise ratio and structural similarity index measure. Yet, they frequently fail to preserve the critical anatomical details needed for diagnostic tasks. This fundamental limitation hinders their clinical applicability despite their high metric scores. We propose a novel task-adaptive reconstruction framework that addresses this gap by incorporating a frozen pre-trained task network as a regularization term in the reconstruction loss function. Unlike existing joint-training approaches that simultaneously optimize both reconstruction and task networks, and risk diverging from satisfactory reconstructions, our method leverages a pre-trained task model to guide reconstruction training while still maintaining diagnostic quality. We validate our framework on a liver and liver tumor segmentation task. Our task-adaptive models achieve Dice scores up to 0.707, approaching the performance of full-dose scans (0.874), and substantially outperforming joint-training approaches (0.331) and traditional reconstruction methods (0.626). Critically, our framework can be integrated into any existing deep learning-based reconstruction model through simple loss function modification, enabling widespread adoption for task-adaptive optimization in clinical practice. Our codes are available at: this https URL
zh

[CV-235] auFlow: Dynamic Causal Constraint for Complexity-Adaptive Lightweight Segmentation

【速读】:该论文旨在解决轻量化医学图像分割模型在边缘设备部署中面临的两大挑战:一是如何高效处理病灶边界与背景区域之间的显著对比差异;二是如何在追求极致轻量化(如参数量仅0.5M)时避免准确率急剧下降的问题。解决方案的关键在于提出TauFlow模型,其核心是一种受大脑机制启发的动态特征响应策略,具体包含两个创新模块:一是卷积长时恒定细胞(Convolutional Long-Time Constant Cell, ConvLTC),通过动态调节特征更新速率实现对低频背景的“缓慢”处理和对高频边界的“快速”响应;二是STDP自组织模块(STDP Self-Organizing Module),有效缓解编码器与解码器之间特征冲突,将冲突率从约35%-40%降低至8%-10%。

链接: https://arxiv.org/abs/2511.07057
作者: Zidong Chen,Fadratul Hafinaz Hassan
机构: 未知
类目: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
备注: 42 pages and 9 figures

点击查看摘要

Abstract:Deploying lightweight medical image segmentation models on edge devices presents two major challenges: 1) efficiently handling the stark contrast between lesion boundaries and background regions, and 2) the sharp drop in accuracy that occurs when pursuing extremely lightweight designs (e.g., 0.5M parameters). To address these problems, this paper proposes TauFlow, a novel lightweight segmentation model. The core of TauFlow is a dynamic feature response strategy inspired by brain-like mechanisms. This is achieved through two key innovations: the Convolutional Long-Time Constant Cell (ConvLTC), which dynamically regulates the feature update rate to “slowly” process low-frequency backgrounds and “quickly” respond to high-frequency boundaries; and the STDP Self-Organizing Module, which significantly mitigates feature conflicts between the encoder and decoder, reducing the conflict rate from approximately 35%-40% to 8%-10%.
zh

[CV-236] RRTS Dataset: A Benchmark Colonoscopy Dataset from Resource-Limited Settings for Computer-Aided Diagnosis Research

【速读】:该论文旨在解决当前结直肠癌筛查中因现有公开数据集(如CVC-ClinicDB和Kvasir-SEG)样本量小、图像选择受控或缺乏真实世界伪影而导致的模型泛化能力不足问题,尤其是在资源受限临床环境下的适用性局限。其解决方案的关键在于构建一个名为BUET Polyp Dataset (BPD) 的新型结肠镜图像数据集,该数据集在常规临床条件下采集自Olympus 170和Pentax i-Scan系列内窥镜,包含1,288张带专家标注二值掩膜的含息肉图像及1,657张无息肉图像,覆盖运动模糊、镜面反光、粪便伪影、出血和低光照等多种真实场景挑战;同时提供基于VGG16、ResNet50、InceptionV3的分类基准与基于UNet架构结合不同骨干网络(VGG16、ResNet34、InceptionV4)的分割基准,验证了该数据集能更真实地反映临床复杂性,从而推动生成式AI(Generative AI)在医学图像分析中的鲁棒性发展。

链接: https://arxiv.org/abs/2511.06769
作者: Ridoy Chandra Shil,Ragib Abid,Tasnia Binte Mamun,Samiul Based Shuvo,Masfique Ahmed Bhuiyan,Jahid Ferdous
机构: Bangladesh University of Engineering and Technology (BUET)(孟加拉国工程技术大学); Dhaka Medical College(达卡医学院)
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Background and Objective: Colorectal cancer prevention relies on early detection of polyps during colonoscopy. Existing public datasets, such as CVC-ClinicDB and Kvasir-SEG, provide valuable benchmarks but are limited by small sample sizes, curated image selection, or lack of real-world artifacts. There remains a need for datasets that capture the complexity of clinical practice, particularly in resource-constrained settings. Methods: We introduce a dataset, BUET Polyp Dataset (BPD), of colonoscopy images collected using Olympus 170 and Pen- tax i-Scan series endoscopes under routine clinical conditions. The dataset contains images with corresponding expert-annotated binary masks, reflecting diverse challenges such as motion blur, specular highlights, stool artifacts, blood, and low-light frames. Annotations were manually reviewed by clinical experts to ensure quality. To demonstrate baseline performance, we provide bench- mark results for classification using VGG16, ResNet50, and InceptionV3, and for segmentation using UNet variants with VGG16, ResNet34, and InceptionV4 backbones. Results: The dataset comprises 1,288 images with polyps from 164 patients with corresponding ground-truth masks and 1,657 polyp-free images from 31 patients. Benchmarking experiments achieved up to 90.8% accuracy for binary classification (VGG16) and a maximum Dice score of 0.64 with InceptionV4-UNet for segmentation. Performance was lower compared to curated datasets, reflecting the real-world difficulty of images with artifacts and variable quality.
zh

[CV-237] Hierarchical Spatial-Frequency Aggregation for Spectral Deconvolution Imaging

【速读】:该论文旨在解决光谱解卷积成像(Spectral Deconvolution Imaging, SDI)中因点扩散函数(Point Spread Function, PSF)工程导致的复合卷积-积分运算所引发的系数矩阵场景依赖性问题,这一特性阻碍了成像先验的有效利用并增加了重建精度的挑战。解决方案的关键在于提出一种分层空间-光谱聚合展开框架(Hierarchical Spatial-Spectral Aggregation Unfolding Framework, HSFAUF),通过将子问题分解并在频域中投影,将非线性过程转化为线性映射以实现高效求解;进一步地,引入空间-频率聚合变压器(Spatial-Frequency Aggregation Transformer, SFAT),显式融合空间与光谱域的信息,在迭代优化过程中整合多维先验,最终构建出基于Transformer的深度展开方法——分层空间-频率聚合展开变压器(HSFAUT),从而在不同SDI系统上实现高保真、低计算与内存开销的重建性能。

链接: https://arxiv.org/abs/2511.06751
作者: Tao Lv,Daoming Zhou,Chenglong Huang,Chongde Zi,Linsen Chen,Xun Cao
机构: 未知
类目: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
备注: Under Review at TPAMI

点击查看摘要

Abstract:Computational spectral imaging (CSI) achieves real-time hyperspectral imaging through co-designed optics and algorithms, but typical CSI methods suffer from a bulky footprint and limited fidelity. Therefore, Spectral Deconvolution imaging (SDI) methods based on PSF engineering have been proposed to achieve high-fidelity compact CSI design recently. However, the composite convolution-integration operations of SDI render the normal-equation coefficient matrix scene-dependent, which hampers the efficient exploitation of imaging priors and poses challenges for accurate reconstruction. To tackle the inherent data-dependent operators in SDI, we introduce a Hierarchical Spatial-Spectral Aggregation Unfolding Framework (HSFAUF). By decomposing subproblems and projecting them into the frequency domain, HSFAUF transforms nonlinear processes into linear mappings, thereby enabling efficient solutions. Furthermore, to integrate spatial-spectral priors during iterative refinement, we propose a Spatial-Frequency Aggregation Transformer (SFAT), which explicitly aggregates information across spatial and frequency domains. By integrating SFAT into HSFAUF, we develop a Transformer-based deep unfolding method, \textbfHierarchical \textbfSpatial-\textbfFrequency \textbfAggregation \textbfUnfolding \textbfTransformer (HSFAUT), to solve the inverse problem of SDI. Systematic simulated and real experiments show that HSFAUT surpasses SOTA methods with cheaper memory and computational costs, while exhibiting optimal performance on different SDI systems.
zh

[CV-238] Non-Negative Stiefel Approximating Flow: Orthogonalish Matrix Optimization for Interpretable Embeddings

【速读】:该论文旨在解决高维数据场景下(如神经影像、基因组学和文本分析)可解释表示学习的核心挑战,即如何在模型灵活性与可解释性之间取得平衡。传统方法往往难以同时实现稳定、稀疏且具有生物学或领域意义的特征表示。其解决方案的关键在于提出非负Stiefel近似流(NSA-Flow),该框架通过连续调节重建保真度与列间去相关性之间的权衡,以单一可调权重控制结构化稀疏性;其优化过程在Stiefel流形附近平滑流动,结合近端更新保证非负约束并采用自适应梯度控制,从而生成既稀疏又稳定的可解释表示。相较于经典正则化方法,NSA-Flow提供了直观的几何机制,在全局结构层面操控稀疏性的同时简化潜在特征,显著提升模型在真实生物医学数据中的可解释性和泛化能力。

链接: https://arxiv.org/abs/2511.06425
作者: Brian B. Avants,Nicholas J. Tustison,James R Stone(Department of Radiology and Medical Imaging University of Virginia, Charlottesville, VA)
机构: 未知
类目: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Methodology (stat.ME)
备注:

点击查看摘要

Abstract:Interpretable representation learning is a central challenge in modern machine learning, particularly in high-dimensional settings such as neuroimaging, genomics, and text analysis. Current methods often struggle to balance the competing demands of interpretability and model flexibility, limiting their effectiveness in extracting meaningful insights from complex data. We introduce Non-negative Stiefel Approximating Flow (NSA-Flow), a general-purpose matrix estimation framework that unifies ideas from sparse matrix factorization, orthogonalization, and constrained manifold learning. NSA-Flow enforces structured sparsity through a continuous balance between reconstruction fidelity and column-wise decorrelation, parameterized by a single tunable weight. The method operates as a smooth flow near the Stiefel manifold with proximal updates for non-negativity and adaptive gradient control, yielding representations that are simultaneously sparse, stable, and interpretable. Unlike classical regularization schemes, NSA-Flow provides an intuitive geometric mechanism for manipulating sparsity at the level of global structure while simplifying latent features. We demonstrate that the NSA-Flow objective can be optimized smoothly and integrates seamlessly with existing pipelines for dimensionality reduction while improving interpretability and generalization in both simulated and real biomedical data. Empirical validation on the Golub leukemia dataset and in Alzheimer’s disease demonstrate that the NSA-Flow constraints can maintain or improve performance over related methods with little additional methodological effort. NSA-Flow offers a scalable, general-purpose tool for interpretable ML, applicable across data science domains.
zh

[CV-239] urbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression

【速读】:该论文旨在解决基于扩散模型(diffusion-based)的零样本图像压缩方法在实际应用中效率低下、计算开销大的问题。现有方法虽在压缩性能上接近前沿水平,但其逐步去噪过程导致推理速度缓慢,难以满足实时或资源受限场景的需求。解决方案的关键在于提出 Turbo-DDCM 框架,通过在每个去噪步骤中高效组合大量噪声向量(noise vectors),显著减少所需的去噪操作次数,从而大幅提升压缩速度;同时引入改进的编码协议,并设计两种灵活变体——优先级感知变体(priority-aware variant)和失真可控变体(distortion-controlled variant),分别支持区域优先压缩与目标峰值信噪比(PSNR)控制,使方法兼具高效性、实用性与灵活性。

链接: https://arxiv.org/abs/2511.06424
作者: Amit Vaisman,Guy Ohayon,Hila Manor,Michael Elad,Tomer Michaeli
机构: Technion – Israel Institute of Technology (以色列理工学院); Flatiron Institute, Simons Foundation (西蒙斯基金会)
类目: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP); Machine Learning (stat.ML)
备注: Code is available at this https URL

点击查看摘要

Abstract:While zero-shot diffusion-based compression methods have seen significant progress in recent years, they remain notoriously slow and computationally demanding. This paper presents an efficient zero-shot diffusion-based compression method that runs substantially faster than existing methods, while maintaining performance that is on par with the state-of-the-art techniques. Our method builds upon the recently proposed Denoising Diffusion Codebook Models (DDCMs) compression scheme. Specifically, DDCM compresses an image by sequentially choosing the diffusion noise vectors from reproducible random codebooks, guiding the denoiser’s output to reconstruct the target image. We modify this framework with Turbo-DDCM, which efficiently combines a large number of noise vectors at each denoising step, thereby significantly reducing the number of required denoising operations. This modification is also coupled with an improved encoding protocol. Furthermore, we introduce two flexible variants of Turbo-DDCM, a priority-aware variant that prioritizes user-specified regions and a distortion-controlled variant that compresses an image based on a target PSNR rather than a target BPP. Comprehensive experiments position Turbo-DDCM as a compelling, practical, and flexible image compression scheme.
zh

[CV-240] Cross-Modal Fine-Tuning of 3D Convolutional Foundation Models for ADHD Classification with Low-Rank Adaptation

【速读】:该论文旨在解决儿童注意缺陷多动障碍(ADHD)早期诊断中因神经影像数据异质性高和症状与其他疾病重叠而导致的困难问题。解决方案的关键在于提出一种参数高效的迁移学习方法,通过在3D卷积基础上引入低秩适配(Low-Rank Adaptation, LoRA),将预训练于CT图像的大规模3D卷积基础模型高效迁移到MRI数据上的ADHD分类任务中;该方法通过将3D卷积核分解为2D低秩更新,显著减少可训练参数(仅需164万参数,较全量微调减少113倍),同时实现优于现有方法的性能(最高准确率达71.9%,AUC达0.716),成为首个成功实现跨模态(CT到MRI)基础模型迁移的神经影像学应用,为ADHD分类建立了新基准并大幅提升效率。

链接: https://arxiv.org/abs/2511.06163
作者: Jyun-Ping Kao,Shinyeong Rho,Shahar Lazarev,Hyun-Hae Cho,Fangxu Xing,Taehoon Shin,C.-C. Jay Kuo,Jonghye Woo
机构: 未知
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Medical Physics (physics.med-ph)
备注:

点击查看摘要

Abstract:Early diagnosis of attention-deficit/hyperactivity disorder (ADHD) in children plays a crucial role in improving outcomes in education and mental health. Diagnosing ADHD using neuroimaging data, however, remains challenging due to heterogeneous presentations and overlapping symptoms with other conditions. To address this, we propose a novel parameter-efficient transfer learning approach that adapts a large-scale 3D convolutional foundation model, pre-trained on CT images, to an MRI-based ADHD classification task. Our method introduces Low-Rank Adaptation (LoRA) in 3D by factorizing 3D convolutional kernels into 2D low-rank updates, dramatically reducing trainable parameters while achieving superior performance. In a five-fold cross-validated evaluation on a public diffusion MRI database, our 3D LoRA fine-tuning strategy achieved state-of-the-art results, with one model variant reaching 71.9% accuracy and another attaining an AUC of 0.716. Both variants use only 1.64 million trainable parameters (over 113x fewer than a fully fine-tuned foundation model). Our results represent one of the first successful cross-modal (CT-to-MRI) adaptations of a foundation model in neuroimaging, establishing a new benchmark for ADHD classification while greatly improving efficiency.
zh

[CV-241] EndoIR: Degradation-Agnostic All-in-One Endoscopic Image Restoration via Noise-Aware Routing Diffusion

【速读】:该论文旨在解决内窥镜图像中多种退化类型(如低光照、烟雾和出血)共存且相互干扰的问题,这类退化会掩盖关键临床信息,而现有修复方法通常针对特定任务且需预先知晓退化类型,限制了其在真实临床场景中的鲁棒性。解决方案的关键在于提出一种统一的、退化无关的基于扩散模型的框架EndoIR,其核心创新包括:1)Dual-Domain Prompter提取联合空域-频域特征,并通过自适应嵌入编码共享与任务特异性线索作为去噪条件;2)Dual-Stream Diffusion架构分离处理干净与退化输入,结合Rectified Fusion Block以结构化方式融合特征,避免传统拼接导致的特征混淆;3)Noise-Aware Routing Block动态筛选噪声相关特征,提升去噪效率。该方案在SegSTRONG-C和CEC数据集上实现多退化场景下的SOTA性能,且参数量少于强基线,下游分割实验验证了其临床实用性。

链接: https://arxiv.org/abs/2511.05873
作者: Tong Chen,Xinyu Ma,Long Bai,Wenyang Wang,Sun Yue,Luping Zhou
机构: 未知
类目: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
备注:

点击查看摘要

Abstract:Endoscopic images often suffer from diverse and co-occurring degradations such as low lighting, smoke, and bleeding, which obscure critical clinical details. Existing restoration methods are typically task-specific and often require prior knowledge of the degradation type, limiting their robustness in real-world clinical use. We propose EndoIR, an all-in-one, degradation-agnostic diffusion-based framework that restores multiple degradation types using a single model. EndoIR introduces a Dual-Domain Prompter that extracts joint spatial-frequency features, coupled with an adaptive embedding that encodes both shared and task-specific cues as conditioning for denoising. To mitigate feature confusion in conventional concatenation-based conditioning, we design a Dual-Stream Diffusion architecture that processes clean and degraded inputs separately, with a Rectified Fusion Block integrating them in a structured, degradation-aware manner. Furthermore, Noise-Aware Routing Block improves efficiency by dynamically selecting only noise-relevant features during denoising. Experiments on SegSTRONG-C and CEC datasets demonstrate that EndoIR achieves state-of-the-art performance across multiple degradation scenarios while using fewer parameters than strong baselines, and downstream segmentation experiments confirm its clinical utility.
zh

[CV-242] HarmoQ: Harmonized Post-Training Quantization for High-Fidelity Image

【速读】:该论文旨在解决超分辨率模型后训练量化(post-training quantization)中权重(weight)与激活(activation)量化独立处理所导致的性能损失问题,特别是忽视二者间关键耦合关系带来的结构失真和像素级精度下降。其解决方案的关键在于提出HarmoQ框架,通过三个协同步骤实现统一量化:结构残差校准(structural residual calibration)主动补偿激活量化引起的细节丢失,谐波尺度优化(harmonized scale optimization)基于闭式解解析平衡量化难度,自适应边界精修(adaptive boundary refinement)在优化过程中迭代维持平衡。该方法首次系统分析了超分辨率任务中权重-激活耦合机制,并实现了高效高质量图像恢复。

链接: https://arxiv.org/abs/2511.05868
作者: Hongjun Wang,Jiyuan Chen,Xuan Song,Yinqiang Zheng
机构: The University of Tokyo (东京大学); The Hong Kong Polytechnic University (香港理工大学); Jilin University (吉林大学)
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Post-training quantization offers an efficient pathway to deploy super-resolution models, yet existing methods treat weight and activation quantization independently, missing their critical interplay. Through controlled experiments on SwinIR, we uncover a striking asymmetry: weight quantization primarily degrades structural similarity, while activation quantization disproportionately affects pixel-level accuracy. This stems from their distinct roles–weights encode learned restoration priors for textures and edges, whereas activations carry input-specific intensity information. Building on this insight, we propose HarmoQ, a unified framework that harmonizes quantization across components through three synergistic steps: structural residual calibration proactively adjusts weights to compensate for activation-induced detail loss, harmonized scale optimization analytically balances quantization difficulty via closed-form solutions, and adaptive boundary refinement iteratively maintains this balance during optimization. Experiments show HarmoQ achieves substantial gains under aggressive compression, outperforming prior art by 0.46 dB on Set5 at 2-bit while delivering 3.2x speedup and 4x memory reduction on A100 GPUs. This work provides the first systematic analysis of weight-activation coupling in super-resolution quantization and establishes a principled solution for efficient high-quality image restoration.
zh

[CV-243] raining-Free Adaptive Quantization for Variable Rate Image Coding for Machines

【速读】:该论文旨在解决图像编码用于机器(Image Coding for Machines, ICM)中现有可变比特率学习图像压缩(Learned Image Compression, LIC)方法存在的局限性,即大多数LIC框架采用固定比特率且需为每个目标比特率单独训练,导致部署复杂性和计算开销高,且可变比特率控制在ICM场景下尚未得到充分探索。解决方案的关键在于提出一种无需训练的自适应量化步长控制机制,通过利用超先验网络(hyperprior network)预测的通道级熵依赖关系和空间尺度参数,实现对语义重要区域的精细保留与非关键区域的粗粒度量化,从而以单一参数连续调节比特率,显著提升压缩效率——实验表明相较非自适应可变比特率方法最高可获得11.07%的BD-rate节省。

链接: https://arxiv.org/abs/2511.05836
作者: Yui Tatsumi,Ziyue Zeng,Hiroshi Watanabe
机构: 未知
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Image Coding for Machines (ICM) has become increasingly important with the rapid integration of computer vision into real-world applications. However, most ICM frameworks utilize learned image compression (LIC) models that operate at a fixed rate and require separate training for each target bitrate, which may limit their practical applications. Existing variable rate LIC approaches mitigate this limitation but typically depend on training, increasing computational cost and deployment complexity. Moreover, variable rate control has not been thoroughly explored for ICM. To address these challenges, we propose a training-free, adaptive quantization step size control scheme that enables flexible bitrate adjustment. By leveraging both channel-wise entropy dependencies and spatial scale parameters predicted by the hyperprior network, the proposed method preserves semantically important regions while coarsely quantizing less critical areas. The bitrate can be continuously controlled through a single parameter. Experimental results demonstrate the effectiveness of our proposed method, achieving up to 11.07% BD-rate savings over the non-adaptive variable rate method.
zh

[CV-244] ConnectomeBench: Can LLM s Proofread the Connectome? NEURIPS2025

【速读】:该论文旨在解决连接组学(connectomics)中神经连接图谱数据人工校对效率低下的问题,即当前依赖大量人力对成像与机器学习分割所得的数据进行校对。其解决方案的关键在于构建一个名为ConnectomeBench的多模态基准测试平台,用于系统评估大型语言模型(LLM)在三个核心校对任务中的能力:片段类型识别、分裂错误修正和合并错误检测。通过使用来自小鼠视觉皮层和果蝇全脑的专家标注数据,研究发现当前主流大模型在片段识别和分裂错误修正上已显著优于随机水平(准确率52–85%),但在合并错误识别上仍表现不佳,表明AI代理具备辅助甚至替代人类校对的潜力,但仍需进一步优化。

链接: https://arxiv.org/abs/2511.05542
作者: Jeff Brown,Andrew Kirjner Annika Vivekananthan,Ed Boyden
机构: 未知
类目: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注: To appear in NeurIPS 2025 Datasets and Benchmarks Track

点击查看摘要

Abstract:Connectomics - the mapping of neural connections in an organism’s brain - currently requires extraordinary human effort to proofread the data collected from imaging and machine-learning assisted segmentation. With the growing excitement around using AI agents to automate important scientific tasks, we explore whether current AI systems can perform multiple tasks necessary for data proofreading. We introduce ConnectomeBench, a multimodal benchmark evaluating large language model (LLM) capabilities in three critical proofreading tasks: segment type identification, split error correction, and merge error detection. Using expert annotated data from two large open-source datasets - a cubic millimeter of mouse visual cortex and the complete Drosophila brain - we evaluate proprietary multimodal LLMs including Claude 3.7/4 Sonnet, o4-mini, GPT-4.1, GPT-4o, as well as open source models like InternVL-3 and NVLM. Our results demonstrate that current models achieve surprisingly high performance in segment identification (52-82% balanced accuracy vs. 20-25% chance) and binary/multiple choice split error correction (75-85% accuracy vs. 50% chance) while generally struggling on merge error identification tasks. Overall, while the best models still lag behind expert performance, they demonstrate promising capabilities that could eventually enable them to augment and potentially replace human proofreading in connectomics. Project page: this https URL and Dataset this https URL
zh

[CV-245] Selective Diabetic Retinopathy Screening with Accuracy-Weighted Deep Ensembles and Entropy-Guided Abstention

【速读】:该论文旨在解决糖尿病视网膜病变(Diabetic Retinopathy, DR)早期诊断中因传统方法成本高、资源密集且缺乏可解释性而导致的漏诊率高(约25%)的问题。其核心解决方案是提出一种集成学习框架,结合不确定性量化机制以提升模型在临床部署中的可靠性与透明度:通过融合七种卷积神经网络(CNN)架构(ResNet-50、DenseNet-121、MobileNetV3 和 EfficientNet 系列),采用基于准确率加权的多数投票策略进行输出融合,并引入概率加权熵(probability-weighted entropy)指标对预测不确定性进行量化,从而筛选出低置信度样本进行人工复核或剔除。实验表明,在35,000张EyePACS图像上训练后,未经过滤时模型准确率达93.70%,经不确定性过滤后最大准确率提升至99.44%,验证了该方法在不牺牲性能的前提下显著增强诊断可信度和可扩展性。

链接: https://arxiv.org/abs/2511.05529
作者: Jophy Lin
机构: 未知
类目: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Diabetic retinopathy (DR), a microvascular complication of diabetes and a leading cause of preventable blindness, is projected to affect more than 130 million individuals worldwide by 2030. Early identification is essential to reduce irreversible vision loss, yet current diagnostic workflows rely on methods such as fundus photography and expert review, which remain costly and resource-intensive. This, combined with DR’s asymptomatic nature, results in its underdiagnosis rate of approximately 25 percent. Although convolutional neural networks (CNNs) have demonstrated strong performance in medical imaging tasks, limited interpretability and the absence of uncertainty quantification restrict clinical reliability. Therefore, in this study, a deep ensemble learning framework integrated with uncertainty estimation is introduced to improve robustness, transparency, and scalability in DR detection. The ensemble incorporates seven CNN architectures-ResNet-50, DenseNet-121, MobileNetV3 (Small and Large), and EfficientNet (B0, B2, B3)- whose outputs are fused through an accuracy-weighted majority voting strategy. A probability-weighted entropy metric quantifies prediction uncertainty, enabling low-confidence samples to be excluded or flagged for additional review. Training and validation on 35,000 EyePACS retinal fundus images produced an unfiltered accuracy of 93.70 percent (F1 = 0.9376). Uncertainty-filtering later was conducted to remove unconfident samples, resulting in maximum-accuracy of 99.44 percent (F1 = 0.9932). The framework shows that uncertainty-aware, accuracy-weighted ensembling improves reliability without hindering performance. With confidence-calibrated outputs and a tunable accuracy-coverage trade-off, it offers a generalizable paradigm for deploying trustworthy AI diagnostics in high-risk care.
zh

[CV-246] sMRI-based Brain Age Estimation in MCI using Persistent Homology

【速读】:该论文旨在解决如何通过结构化脑部影像特征有效区分正常衰老与病理性衰老的问题,从而为认知功能衰退的早期检测和监测提供潜在生物标志物。其解决方案的关键在于引入持久同调(persistent homology)中的贝蒂数曲线(Betti curves)作为特征提取方法,尤其聚焦于一维(连通分量)和二维(一维空洞)拓扑特征,这些特征能够敏感地捕捉与年龄相关的结构性脑变化,并结合临床特征按其与预测脑龄及实际年龄的相关性进行分类,从而实现对健康与病理衰老的有效区分。

链接: https://arxiv.org/abs/2511.05520
作者: Debanjali Bhattacharya,Neelam Sinha
机构: 未知
类目: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
备注:

点击查看摘要

Abstract:In this study, we propose the use of persistent homology- specifically Betti curves for brain age prediction and for distinguishing between healthy and pathological aging. The proposed framework is applied to 100 structural MRI scans from the publicly available ADNI dataset. Our results indicate that Betti curve features, particularly those from dimension-1 (connected components) and dimension-2 (1D holes), effectively capture structural brain alterations associated with aging. Furthermore, clinical features are grouped into three categories based on their correlation, or lack thereof, with (i) predicted brain age and (ii) chronological age. The findings demonstrate that this approach successfully differentiates normal from pathological aging and provides a novel framework for understanding how structural brain changes relate to cognitive impairment. The proposed method serves as a foundation for developing potential biomarkers for early detection and monitoring of cognitive decline.
zh

人工智能

[AI-0] Using Vision Language Models as Closed-Loop Symbolic Planners for Robotic Applications: A Control-Theoretic Perspective

链接: https://arxiv.org/abs/2511.07410
作者: Hao Wang,Sathwik Karnik,Bea Lim,Somil Bansal
机构: 未知
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-1] LoReTTA: A Low Resource Framework To Poison Continuous Time Dynamic Graphs AAAI2026

链接: https://arxiv.org/abs/2511.07379
作者: Himanshu Pal,Venkata Sai Pranav Bachina,Ankit Gangwal,Charu Sharma
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Accepted at AAAI 2026

点击查看摘要

[AI-2] ransformers Provably Learn Chain-of-Thought Reasoning with Length Generalization NEURIPS2025

【速读】:该论文旨在解决生成式 AI 模型在面对更复杂、更长链式推理(Chain-of-Thought, CoT)任务时的外推能力问题,即模型能否将已学习的推理模式推广到更难或更长的问题上。其核心解决方案在于通过理论分析揭示了变压器(transformer)模型在梯度下降优化下学习合成状态追踪任务时,其注意力机制如何受问题代数结构调控,并由此决定推理长度的泛化能力。关键突破在于证明了注意力集中机制(attention concentration)能够连接注意力层的鲁棒性与长上下文推理任务的结构特性,从而实现对 NC1\mathsf{NC}^1-complete 问题的可证明学习,显著超越此前局限于 TC0\mathsf{TC}^0 类问题的限制,且提出了递归自训练方案以扩展有限推理长度的模型能力。

链接: https://arxiv.org/abs/2511.07378
作者: Yu Huang,Zixin Wen,Aarti Singh,Yuejie Chi,Yuxin Chen
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
备注: This is the full version of a paper published at NeurIPS 2025

点击查看摘要

Abstract:The ability to reason lies at the core of artificial intelligence (AI), and challenging problems usually call for deeper and longer reasoning to tackle. A crucial question about AI reasoning is whether models can extrapolate learned reasoning patterns to solve harder tasks with longer chain-of-thought (CoT). In this work, we present a theoretical analysis of transformers learning on synthetic state-tracking tasks with gradient descent. We mathematically prove how the algebraic structure of state-tracking problems governs the degree of extrapolation of the learned CoT. Specifically, our theory characterizes the length generalization of transformers through the mechanism of attention concentration, linking the retrieval robustness of the attention layer to the state-tracking task structure of long-context reasoning. Moreover, for transformers with limited reasoning length, we prove that a recursive self-training scheme can progressively extend the range of solvable problem lengths. To our knowledge, we provide the first optimization guarantee that constant-depth transformers provably learn \mathsfNC^1 -complete problems with CoT, significantly going beyond prior art confined in \mathsfTC^0 , unless the widely held conjecture \mathsfTC^0 \neq \mathsfNC^1 fails. Finally, we present a broad set of experiments supporting our theoretical results, confirming the length generalization behaviors and the mechanism of attention concentration.
zh

[AI-3] Consistency Is Not Always Correct: Towards Understanding the Role of Exploration in Post-Training Reasoning

链接: https://arxiv.org/abs/2511.07368
作者: Dake Bu,Wei Huang,Andi Han,Atsushi Nitanda,Bo Xue,Qingfu Zhang,Hau-San Wong,Taiji Suzuki
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-4] NT: Improving Chunkwise Training for Test-Time Memorization

链接: https://arxiv.org/abs/2511.07343
作者: Zeman Li,Ali Behrouz,Yuan Deng,Peilin Zhong,Praneeth Kacham,Mahdi Karami,Meisam Razaviyayn,Vahab Mirrokni
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-5] DeepPersona: A Generative Engine for Scaling Deep Synthetic Personas NEURIPS2025

链接: https://arxiv.org/abs/2511.07338
作者: Zhen Wang,Yufan Zhou,Zhongyan Luo,Lyumanshan Ye,Adam Wood,Man Yao,Luoshang Pan
机构: 未知
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 12 pages, 5 figures, accepted at LAW 2025 Workshop (NeurIPS 2025)

点击查看摘要

[AI-6] Grounding Computer Use Agents on Human Demonstrations

链接: https://arxiv.org/abs/2511.07332
作者: Aarash Feizi,Shravan Nayak,Xiangru Jian,Kevin Qinghong Lin,Kaixin Li,Rabiul Awal,Xing Han Lù,Johan Obando-Ceron,Juan A. Rodriguez,Nicolas Chapados,David Vazquez,Adriana Romero-Soriano,Reihaneh Rabbany,Perouz Taslakian,Christopher Pal,Spandana Gella,Sai Rajeswar
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-7] Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search

链接: https://arxiv.org/abs/2511.07312
作者: Samuel Sokota,Eugene Vinitsky,Hengyuan Hu,J. Zico Kolter,Gabriele Farina
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-8] Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models AAAI2026

【速读】:该论文旨在解决推荐系统中隐式反馈(implicit feedback)因误点击(misclicks)和位置偏差(position bias)等因素导致的噪声问题,尤其关注噪声样本与困难样本(hard samples)在数据模式上高度相似所引发的“硬-噪混淆”(hard-noisy confusion)问题,这可能导致关键的困难样本被错误过滤,从而损害用户偏好建模效果。解决方案的关键在于提出LLMHNI框架,其核心创新包括:利用大语言模型(Large Language Models, LLMs)生成两个辅助的用户-物品相关性信号——一是基于LLM编码嵌入的语义相关性用于负采样以选择困难负样本并过滤噪声假负样本;二是通过LLM推断的逻辑相关性构建交互图,并结合跨图对比对齐策略进行去噪;同时引入图对比学习机制,通过随机边删除视图对齐表示以抑制LLM幻觉带来的不可靠交互边,从而实现更精准的噪声识别与分离。

链接: https://arxiv.org/abs/2511.07295
作者: Tianrui Song,Wen-Shuo Chao,Hao Liu
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
备注: Accepted by AAAI2026

点击查看摘要

Abstract:Implicit feedback, employed in training recommender systems, unavoidably confronts noise due to factors such as misclicks and position bias. Previous studies have attempted to identify noisy samples through their diverged data patterns, such as higher loss values, and mitigate their influence through sample dropping or reweighting. However, we observed that noisy samples and hard samples display similar patterns, leading to hard-noisy confusion issue. Such confusion is problematic as hard samples are vital for modeling user preferences. To solve this problem, we propose LLMHNI framework, leveraging two auxiliary user-item relevance signals generated by Large Language Models (LLMs) to differentiate hard and noisy samples. LLMHNI obtains user-item semantic relevance from LLM-encoded embeddings, which is used in negative sampling to select hard negatives while filtering out noisy false negatives. An objective alignment strategy is proposed to project LLM-encoded embeddings, originally for general language tasks, into a representation space optimized for user-item relevance modeling. LLMHNI also exploits LLM-inferred logical relevance within user-item interactions to identify hard and noisy samples. These LLM-inferred interactions are integrated into the interaction graph and guide denoising with cross-graph contrastive alignment. To eliminate the impact of unreliable interactions induced by LLM hallucination, we propose a graph contrastive learning strategy that aligns representations from randomly edge-dropped views to suppress unreliable edges. Empirical results demonstrate that LLMHNI significantly improves denoising and recommendation performance.
zh

[AI-9] Enabling Off-Policy Imitation Learning with Deep Actor Critic Stabilization

链接: https://arxiv.org/abs/2511.07288
作者: Sayambhu Sen,Shalabh Bhatnagar
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: 14 pages and 4 images

点击查看摘要

[AI-10] Designing Beyond Language: Sociotechnical Barriers in AI Health Technologies for Limited English Proficiency

链接: https://arxiv.org/abs/2511.07277
作者: Michelle Huang,Violeta J. Rodriguez,Koustuv Saha,Tal August
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
备注:

点击查看摘要

[AI-11] Beyond Detection: Exploring Evidence-based Multi-Agent Debate for Misinformation Intervention and Persuasion AAAI2026

链接: https://arxiv.org/abs/2511.07267
作者: Chen Han,Yijia Ma,Jin Tan,Wenzhen Zheng,Xijin Tang
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: This paper has been accepted to AAAI 2026

点击查看摘要

[AI-12] Agent icSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning

链接: https://arxiv.org/abs/2511.07262
作者: Qile Jiang,George Karniadakis
机构: 未知
类目: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
备注:

点击查看摘要

[AI-13] PADiff: Predictive and Adaptive Diffusion Policies for Ad Hoc Teamwork AAAI2026 AAAI

链接: https://arxiv.org/abs/2511.07260
作者: Hohei Chan,Xinzhi Zhang,Antao Xiang,Weinan Zhang,Mengchen Zhao
机构: 未知
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: Accepted by the 40th AAAI conference on Artificial Intelligence (AAAI 2026)

点击查看摘要

[AI-14] LLM ServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure

链接: https://arxiv.org/abs/2511.07229
作者: Jaehong Cho,Hyunmin Choi,Jongse Park
机构: 未知
类目: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
备注: 4 pages, 3 figures

点击查看摘要

[AI-15] NoteEx: Interactive Visual Context Manipulation for LLM -Assisted Exploratory Data Analysis in Computational Notebooks

链接: https://arxiv.org/abs/2511.07223
作者: Mohammad Hasan Payandeh,Lin-Ping Yuan,Jian Zhao
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-16] SMiLE: Provably Enforcing Global Relational Properties in Neural Networks

链接: https://arxiv.org/abs/2511.07208
作者: Matteo Francobaldi,Michele Lombardi,Andrea Lodi
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
备注:

点击查看摘要

[AI-17] wenty-Five Years of MIR Research: Achievements Practices Evaluations and Future Challenges

链接: https://arxiv.org/abs/2511.07205
作者: Geoffroy Peeters,Zafar Rafii,Magdalena Fuentes,Zhiyao Duan,Emmanouil Benetos,Juhan Nam,Yuki Mitsufuji
机构: 未知
类目: ound (cs.SD); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-18] Evaluating Online Moderation Via LLM -Powered Counterfactual Simulations AAAI

链接: https://arxiv.org/abs/2511.07204
作者: Giacomo Fidone,Lucia Passaro,Riccardo Guidotti
机构: 未知
类目: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Multiagent Systems (cs.MA)
备注: Accepted for publication at AAAI Conference on Artificial Intelligence 2026

点击查看摘要

[AI-19] Resilient by Design - Active Inference for Distributed Continuum Intelligence

【速读】:该论文旨在解决分布式计算连续体(Distributed Computing Continuum, DCC)中因设备异构性与复杂性导致的可靠性与全局一致性难题,尤其是在需要实时、自适应协调的AI驱动工作负载场景下。其解决方案的关键在于提出一种概率主动推理韧性代理(Probabilistic Active Inference Resilience Agent, PAIR-Agent),通过构建因果故障图(causal fault graph)、利用马尔可夫毯(Markov blankets)和自由能原理(free-energy principle)在不确定性下识别故障,并基于主动推理(active inference)实现自主修复,从而在多种故障条件下维持服务连续性和系统稳定性。

链接: https://arxiv.org/abs/2511.07202
作者: Praveen Kumar Donta,Alfreds Lapkovskis,Enzo Mingozzi,Schahram Dustdar
机构: 未知
类目: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Networking and Internet Architecture (cs.NI)
备注:

点击查看摘要

Abstract:Failures are the norm in highly complex and heterogeneous devices spanning the distributed computing continuum (DCC), from resource-constrained IoT and edge nodes to high-performance computing systems. Ensuring reliability and global consistency across these layers remains a major challenge, especially for AI-driven workloads requiring real-time, adaptive coordination. This paper introduces a Probabilistic Active Inference Resilience Agent (PAIR-Agent) to achieve resilience in DCC systems. PAIR-Agent performs three core operations: (i) constructing a causal fault graph from device logs, (ii) identifying faults while managing certainties and uncertainties using Markov blankets and the free-energy principle, and (iii) autonomously healing issues through active inference. Through continuous monitoring and adaptive reconfiguration, the agent maintains service continuity and stability under diverse failure conditions. Theoretical validations confirm the reliability and effectiveness of the proposed framework.
zh

[AI-20] Fuzzy Label: From Concept to Its Application in Label Learning

链接: https://arxiv.org/abs/2511.07165
作者: Chenxi Luoa,Zhuangzhuang Zhaoa,Zhaohong Denga,Te Zhangb
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-21] Conditional Diffusion as Latent Constraints for Controllable Symbolic Music Generation

链接: https://arxiv.org/abs/2511.07156
作者: Matteo Pettenó,Alessandro Ilic Mezza,Alberto Bernardini
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
备注:

点击查看摘要

[AI-22] Saliency Map-Guided Knowledge Discovery for Subclass Identification with LLM -Based Symbolic Approximations

链接: https://arxiv.org/abs/2511.07126
作者: Tim Bohne,Anne-Kathrin Patricia Windler,Martin Atzmueller
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-23] On the Joint Minimization of Regularization Loss Functions in Deep Variational Bayesian Methods for Attribute-Controlled Symbolic Music Generation

链接: https://arxiv.org/abs/2511.07118
作者: Matteo Pettenó,Alessandro Ilic Mezza,Alberto Bernardini
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
备注: IEEE Catalog No.: CFP2540S-ART ISBN: 978-9-46-459362-4

点击查看摘要

[AI-24] wo Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture

【速读】:该论文旨在解决将大语言模型(Large Language Models, LLMs)应用于市场做市(Market Making, MM)时面临的推理速度慢以及知识蒸馏(Knowledge Distillation)方法尚未针对该任务进行深入研究的问题。其解决方案的关键在于提出一种名为协作式市场做市(Cooperative Market Making, CMM)的新框架,该框架通过将LLM的特征解耦为层(layer)、任务(task)和数据(data)三个正交维度,使多个学生模型分别学习不同维度的简化特征,从而实现高效的知识蒸馏;同时,CMM引入Hájek-MoE机制,在核函数生成的公共特征空间中动态整合各学生模型的输出,以提升整体性能。

链接: https://arxiv.org/abs/2511.07110
作者: Tianhao Fu,Xinxin Xu,Weichen Xu,Jue Chen,Ruilong Ren,Bowen Deng,Xinyu Zhao,Jian Cao,Xixin Cao
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Market making (MM) through Reinforcement Learning (RL) has attracted significant attention in financial trading. With the development of Large Language Models (LLMs), more and more attempts are being made to apply LLMs to financial areas. A simple, direct application of LLM as an agent shows significant performance. Such methods are hindered by their slow inference speed, while most of the current research has not studied LLM distillation for this specific task. To address this, we first propose the normalized fluorescent probe to study the mechanism of the LLM’s feature. Based on the observation found by our investigation, we propose Cooperative Market Making (CMM), a novel framework that decouples LLM features across three orthogonal dimensions: layer, task, and data. Various student models collaboratively learn simple LLM features along with different dimensions, with each model responsible for a distinct feature to achieve knowledge distillation. Furthermore, CMM introduces an Hájek-MoE to integrate the output of the student models by investigating the contribution of different models in a kernel function-generated common feature space. Extensive experimental results on four real-world market datasets demonstrate the superiority of CMM over the current distillation method and RL-based market-making strategies.
zh

[AI-25] A Theoretical Analysis of Detecting Large Model-Generated Time Series AAAI-2026

链接: https://arxiv.org/abs/2511.07104
作者: Junji Hou,Junzhou Zhao,Shuo Zhang,Pinghui Wang
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: 23 pages,12 figures, to be published in AAAI-2026 main track

点击查看摘要

[AI-26] E2E-VGuard: Adversarial Prevention for Production LLM -based End-To-End Speech Synthesis NEURIPS2025

链接: https://arxiv.org/abs/2511.07099
作者: Zhisheng Zhang,Derui Wang,Yifan Mi,Zhiyong Wu,Jie Gao,Yuxin Cao,Kai Ye,Minhui Xue,Jie Hao
机构: 未知
类目: ound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
备注: Accepted to NeurIPS 2025

点击查看摘要

[AI-27] Boosting Fine-Grained Urban Flow Inference via Lightweight Architecture and Focalized Optimization AAAI’26

链接: https://arxiv.org/abs/2511.07098
作者: Yuanshao Zhu,Xiangyu Zhao,Zijian Zhang,Xuetao Wei,James Jianqiao Yu
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: Accepted as a regular paper by AAAI’26

点击查看摘要

[AI-28] Agent ic AI Sustainability Assessment for Supply Chain Document Insights

链接: https://arxiv.org/abs/2511.07097
作者: Diego Gosmar,Anna Chiara Pallotta,Giovanni Zenezini
机构: 未知
类目: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
备注: 17 pages, 4 figures

点击查看摘要

[AI-29] Data Complexity of Querying Description Logic Knowledge Bases under Cost-Based Semantics AAAI2026

链接: https://arxiv.org/abs/2511.07095
作者: Meghyn Bienvenu,Quentin Manière
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: Long version of paper to appear in AAAI 2026

点击查看摘要

[AI-30] Green AI: A systematic review and meta-analysis of its definitions lifecycle models hardware and measurement attempts

链接: https://arxiv.org/abs/2511.07090
作者: Marcel Rojahn,Marcus Grum
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-31] LLM Driven Processes to Foster Explainable AI

链接: https://arxiv.org/abs/2511.07086
作者: Marcel Pehlke,Marc Jansen
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-32] Increasing AI Explainability by LLM Driven Standard Processes

链接: https://arxiv.org/abs/2511.07083
作者: Marc Jansen,Marcel Pehlke
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-33] RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services

链接: https://arxiv.org/abs/2511.07070
作者: Fei Zhao,Chonggang Lu,Haofu Qian,Fangcheng Shi,Zijie Meng,Jianzhao Huang,Xu Tang,Zheyong Xie,Zheyu Ye,Zhe Xu,Yao Hu,Shaosheng Cao
机构: 未知
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

[AI-34] Improving Region Representation Learning from Urban Imagery with Noisy Long-Caption Supervision AAAI-26

链接: https://arxiv.org/abs/2511.07062
作者: Yimei Zhang,Guojiang Shen,Kaili Ning,Tongwei Ren,Xuebo Qiu,Mengmeng Wang,Xiangjie Kong
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: Accepted as a full paper by AAAI-26

点击查看摘要

[AI-35] Do LLM s Feel? Teaching Emotion Recognition with Prompts Retrieval and Curriculum Learning AAAI2026

链接: https://arxiv.org/abs/2511.07061
作者: Xinran Li,Xiujuan Xu,Jiaqi Qiao,Yu Liu
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: Accepted at AAAI 2026

点击查看摘要

[AI-36] Learning Quantized Continuous Controllers for Integer Hardware

链接: https://arxiv.org/abs/2511.07046
作者: Fabian Kresse,Christoph H. Lampert
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: 17 pages, 6 figures

点击查看摘要

[AI-37] Benchmarking LLM s for Fine-Grained Code Review with Enriched Context in Practice

链接: https://arxiv.org/abs/2511.07017
作者: Ruida Hu,Xinchen Wang,Xin-Cheng Wen,Zhao Zhang,Bo Jiang,Pengfei Gao,Chao Peng,Cuiyun Gao
机构: 未知
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-38] Diffolio: A Diffusion Model for Multivariate Probabilistic Financial Time-Series Forecasting and Portfolio Construction

链接: https://arxiv.org/abs/2511.07014
作者: So-Yoon Cho,Jin-Young Kim,Kayoung Ban,Hyeng Keun Koo,Hyun-Gyoon Kim
机构: 未知
类目: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Econometrics (econ.EM); Portfolio Management (q-fin.PM)
备注:

点击查看摘要

[AI-39] S2Drug: Bridging Protein Sequence and 3D Structure in Contrastive Representation Learning for Virtual Screening AAAI2026

链接: https://arxiv.org/abs/2511.07006
作者: Bowei He,Bowen Gao,Yankai Chen,Yanyan Lan,Chen Ma,Philip S. Yu,Ya-Qin Zhang,Wei-Ying Ma
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Accepted by AAAI 2026 Main Technical Track

点击查看摘要

[AI-40] Hybrid Autoencoders for Tabular Data: Leverag ing Model-Based Augmentation in Low-Label Settings NEURIPS2025

链接: https://arxiv.org/abs/2511.06961
作者: Erel Naor,Ofir Lindenbaum
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: accepted to neurips 2025, main text is 10 pages

点击查看摘要

[AI-41] Learning to Focus: Prioritizing Informative Histories with Structured Attention Mechanisms in Partially Observable Reinforcement Learning NEURIPS2025

链接: https://arxiv.org/abs/2511.06946
作者: Daniel De Dios Allegue,Jinke He,Frans A. Oliehoek
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Accepted to Embodied World Models for Decision Making (EWM) Workshop at NeurIPS 2025

点击查看摘要

[AI-42] Fine-Tuning Diffusion-Based Recommender Systems via Reinforcement Learning with Reward Function Optimization

链接: https://arxiv.org/abs/2511.06937
作者: Yu Hou,Hua Li,Ha Young Kim,Won-Yong Shin
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Social and Information Networks (cs.SI)
备注: 14 pages, 12 figures, 9 tables

点击查看摘要

[AI-43] Proceedings of the 2025 XCSP3 Competition

链接: https://arxiv.org/abs/2511.06918
作者: Gilles Audemard,Christophe Lecoutre,Emmanuel Lonca
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: 110 pages

点击查看摘要

[AI-44] Sampling and Loss Weights in Multi-Domain Training

链接: https://arxiv.org/abs/2511.06913
作者: Mahdi Salmani,Pratik Worah,Meisam Razaviyayn,Vahab Mirrokni
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-45] Counterfactual Explanation for Multivariate Time Series Forecasting with Exogenous Variables

链接: https://arxiv.org/abs/2511.06906
作者: Keita Kinjo
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: 27pages,9figures,9tables

点击查看摘要

[AI-46] A Hybrid Autoencoder-Transformer Model for Robust Day-Ahead Electricity Price Forecasting under Extreme Conditions

【速读】:该论文旨在解决电力系统中日前电价预测(Day-ahead Electricity Price Forecasting, DAEPF)在极端天气条件和市场异常情况下的准确性与鲁棒性问题。现有方法难以有效应对这些复杂场景,导致预测误差增大。其解决方案的关键在于提出一种融合蒸馏注意力Transformer(Distilled Attention Transformer, DAT)与自编码器自回归模型(Autoencoder Self-regression Model, ASM)的混合深度学习框架:DAT通过自注意力机制动态加权历史数据中的关键片段,精准捕捉长期趋势与短期波动;ASM则利用无监督学习识别并隔离由极端天气或人为节日等因素引发的异常模式,从而提升模型对扰动的适应能力。实验证明该框架在预测精度、鲁棒性和计算效率上均优于当前最优方法。

链接: https://arxiv.org/abs/2511.06898
作者: Boyan Tang,Xuanhao Ren,Peng Xiao,Shunbo Lei,Xiaorong Sun,Jianghua Wu
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Published in 2025 IEEE 1st International Symposium on the Application of Artificial Intelligence in Electrical Engineering (AAIEE) this https URL

点击查看摘要

Abstract:Accurate day-ahead electricity price forecasting (DAEPF) is critical for the efficient operation of power systems, but extreme condition and market anomalies pose significant challenges to existing forecasting methods. To overcome these challenges, this paper proposes a novel hybrid deep learning framework that integrates a Distilled Attention Transformer (DAT) model and an Autoencoder Self-regression Model (ASM). The DAT leverages a self-attention mechanism to dynamically assign higher weights to critical segments of historical data, effectively capturing both long-term trends and short-term fluctuations. Concurrently, the ASM employs unsupervised learning to detect and isolate anomalous patterns induced by extreme conditions, such as heavy rain, heat waves, or human festivals. Experiments on datasets sampled from California and Shandong Province demonstrate that our framework significantly outperforms state-of-the-art methods in prediction accuracy, robustness, and computational efficiency. Our framework thus holds promise for enhancing grid resilience and optimizing market operations in future power systems.
zh

[AI-47] On The Presence of Double-Descent in Deep Reinforcement Learning

链接: https://arxiv.org/abs/2511.06895
作者: Viktor Veselý,Aleksandar Todorov,Matthia Sabatelli
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
备注:

点击查看摘要

[AI-48] COGNOS: Universal Enhancement for Time Series Anomaly Detection via Constrained Gaussian-Noise Optimization and Smoothing

链接: https://arxiv.org/abs/2511.06894
作者: Wenlong Shang,Peng Chang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-49] DeepBooTS: Dual-Stream Residual Boosting for Drift-Resilient Time-Series Forecasting AAAI-26

链接: https://arxiv.org/abs/2511.06893
作者: Daojun Liang,Jing Chen,Xiao Wang,Yinglong Wang,Suo Li
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: 28 pages,17 pages, Published in AAAI-26

点击查看摘要

[AI-50] uckA: Hierarchical Compact Tensor Experts for Efficient Fine-Tuning

链接: https://arxiv.org/abs/2511.06859
作者: Qifeng Lei,Zhiyong Yang,Qianqian Xu,Cong Hua,Peisong Wen,Qingming Huang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-51] Differentiated Directional Intervention A Framework for Evading LLM Safety Alignment AAAI-26

链接: https://arxiv.org/abs/2511.06852
作者: Peng Zhang,peijie sun
机构: 未知
类目: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
备注: AAAI-26-AIA

点击查看摘要

[AI-52] DeepRWCap: Neural-Guided Random-Walk Capacitance Solver for IC Design AAAI-26

链接: https://arxiv.org/abs/2511.06831
作者: Hector R. Rodriguez,Jiechen Huang,Wenjian Yu
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Accepted to AAAI-26

点击查看摘要

[AI-53] Controllable Flow Matching for Online Reinforcement Learning AAAI2026 AAAI

链接: https://arxiv.org/abs/2511.06816
作者: Bin Wang,Boxiang Tao,Haifeng Jing,Hongbo Dou,Zijian Wang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: 9 pages, The Fortieth AAAI Conference on Artificial Intelligence(AAAI2026)

点击查看摘要

[AI-54] MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning

链接: https://arxiv.org/abs/2511.06805
作者: Jinhao Chen,Zhen Yang,Jianxin Shi,Tianyu Wo,Jie Tang
机构: 未知
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 19 pages, 11 figures

点击查看摘要

[AI-55] AgentS UMO: An Agent ic Framework for Interactive Simulation Scenario Generation in SUMO via Large Language Models

链接: https://arxiv.org/abs/2511.06804
作者: Minwoo Jeong,Jeeyun Chang,Yoonjin Yoon
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
备注: Submitted to Transportation Research Part C (under review)

点击查看摘要

[AI-56] Learning to Fast Unrank in Collaborative Filtering Recommendation

链接: https://arxiv.org/abs/2511.06803
作者: Junpeng Zhao,Lin Li,Ming Li,Amran Bhuiyan,Jimmy Huang
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

[AI-57] Recursive Dynamics in Fast-Weights Homeostatic Reentry Networks: Toward Reflective Intelligence

链接: https://arxiv.org/abs/2511.06798
作者: B. G. Chae
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
备注: 17 pages, 6 figures

点击查看摘要

[AI-58] Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models AAAI2026

链接: https://arxiv.org/abs/2511.06793
作者: Kunhao Li,Wenhao Li,Di Wu,Lei Yang,Jun Bai,Ju Jia,Jason Xue
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Accepted at AAAI 2026 as a Conference Paper (Oral Presentation)

点击查看摘要

[AI-59] Robust Causal Discovery under Imperfect Structural Constraints

链接: https://arxiv.org/abs/2511.06790
作者: Zidong Wang,Xi Lin,Chuchao He,Xiaoguang Gao
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
备注:

点击查看摘要

[AI-60] Resource Efficient Sleep Staging via Multi-Level Masking and Prompt Learning AAAI2026

链接: https://arxiv.org/abs/2511.06785
作者: Lejun Ai,Yulong Li,Haodong Yi,Jixuan Xie,Yue Wang,Jia Liu,Min Chen,Rui Wang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: 16 pages, 4 figures, to be published in AAAI 2026

点击查看摘要

[AI-61] On the Mechanisms of Collaborative Learning in VAE Recommenders

【速读】:该论文旨在解决生成式推荐系统中基于变分自编码器(Variational Autoencoders, VAE)的协同过滤(Collaborative Filtering, CF)机制下“协作”如何形成及其几何结构优化的问题。现有方法常采用二值输入掩码(binary input masking)提升性能,但其理论基础尚不清晰。作者通过分析发现,VAE-based CF中的协作由潜在空间中的邻近性(latent proximity)决定,并推导出一个潜在共享半径(latent sharing radius),量化了当用户间潜在 Wasserstein 距离增大时,梯度更新对其他用户的影响衰减规律。关键解决方案包括:(1) 引入 β-KL 正则化以收紧信息瓶颈,促进后验分布重叠,但需避免过强导致表征坍缩;(2) 利用输入掩码引发的随机几何收缩与扩张,使远距离但相关用户进入同一潜在邻域,但可能引入邻域漂移;(3) 提出锚定正则化(anchor regularizer),将用户后验与物品嵌入对齐,在保持用户身份的同时增强跨物品信号共享,从而实现全局一致性。该方案在 Netflix、MovieLens-20M 和 Million Song 数据集上验证有效,并成功部署于亚马逊流媒体平台。

链接: https://arxiv.org/abs/2511.06781
作者: Tung-Long Vuong,Julien Monteil,Hien Dang,Volodymyr Vaskovych,Trung Le,Vu Nguyen
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Variational Autoencoders (VAEs) are a powerful alternative to matrix factorization for recommendation. A common technique in VAE-based collaborative filtering (CF) consists in applying binary input masking to user interaction vectors, which improves performance but remains underexplored theoretically. In this work, we analyze how collaboration arises in VAE-based CF and show it is governed by latent proximity: we derive a latent sharing radius that informs when an SGD update on one user strictly reduces the loss on another user, with influence decaying as the latent Wasserstein distance increases. We further study the induced geometry: with clean inputs, VAE-based CF primarily exploits \emphlocal collaboration between input-similar users and under-utilizes global collaboration between far-but-related users. We compare two mechanisms that encourage \emphglobal mixing and characterize their trade-offs: (1) \beta -KL regularization directly tightens the information bottleneck, promoting posterior overlap but risking representational collapse if too large; (2) input masking induces stochastic geometric contractions and expansions, which can bring distant users onto the same latent neighborhood but also introduce neighborhood drift. To preserve user identity while enabling global consistency, we propose an anchor regularizer that aligns user posteriors with item embeddings, stabilizing users under masking and facilitating signal sharing across related items. Our analyses are validated on the Netflix, MovieLens-20M, and Million Song datasets. We also successfully deployed our proposed algorithm on an Amazon streaming platform following a successful online experiment.
zh

[AI-62] OntoTune: Ontology-Driven Learning for Query Optimization with Convolutional Models

链接: https://arxiv.org/abs/2511.06780
作者: Songhui Yue,Yang Shao,Sean Hayes
机构: 未知
类目: Databases (cs.DB); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

[AI-63] Pedagogical Reflections on the Holistic Cognitive Development (HCD) Framework and AI-Augmented Learning in Creative Computing

链接: https://arxiv.org/abs/2511.06779
作者: BHojan Anand
机构: 未知
类目: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
备注: Short Abstract

点击查看摘要

[AI-64] Data Trajectory Alignment for LLM Domain Adaptation: A Two-Phase Synthesis Framework for Telecommunications Mathematics

链接: https://arxiv.org/abs/2511.06776
作者: Zhicheng Zhou,Jing Li,Suming Qiu,Junjie Huang,Linyuan Qiu,Zhijie Sun
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-65] QUARK: Quantization-Enabled Circuit Sharing for Transformer Acceleration by Exploiting Common Patterns in Nonlinear Operations

链接: https://arxiv.org/abs/2511.06767
作者: Zhixiong Zhao,Haomin Li,Fangxin Liu,Yuncheng Lu,Zongwu Wang,Tao Yang,Li Jiang,Haibing Guan
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: ICCAD 2025

点击查看摘要

[AI-66] SRNN: Spatiotemporal Relational Neural Network for Intuitive Physics Understanding

链接: https://arxiv.org/abs/2511.06761
作者: Fei Yang
机构: 未知
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

[AI-67] Implicit Federated In-context Learning For Task-Specific LLM Fine-Tuning

链接: https://arxiv.org/abs/2511.06757
作者: Dongcheng Li,Junhan Chen,Aoxiang Zhou,Chunpei Li,Youquan Xian,Peng Liu,Xianxian Li
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-68] Physically-Grounded Goal Imagination: Physics-Informed Variational Autoencoder for Self-Supervised Reinforcement Learning

链接: https://arxiv.org/abs/2511.06745
作者: Lan Thi Ha Nguyen,Kien Ton Manh,Anh Do Duc,Nam Pham Hai
机构: 未知
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-69] Rank-1 LoRAs Encode Interpretable Reasoning Signals NEURIPS2025

链接: https://arxiv.org/abs/2511.06739
作者: Jake Ward,Paul Riechers,Adam Shai
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Mechanistic Interpretability Workshop

点击查看摘要

[AI-70] S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning

链接: https://arxiv.org/abs/2511.06727
作者: Jiangwen Dong,Zehui Lin,Wanyu Lin,Mingjin Zhang
机构: 未知
类目: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-71] Sensor Calibration Model Balancing Accuracy Real-time and Efficiency

链接: https://arxiv.org/abs/2511.06715
作者: Jinyong Yun,Hyungjin Kim,Seokho Ahn,Euijong Lee,Young-Duk Seo
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-72] Structural Enforcement of Statistical Rigor in AI-Driven Discovery: A Functional Architecture

链接: https://arxiv.org/abs/2511.06701
作者: Karen Sargsyan
机构: 未知
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-73] Magnitude-Modulated Equivariant Adapter for Parameter-Efficient Fine-Tuning of Equivariant Graph Neural Networks

链接: https://arxiv.org/abs/2511.06696
作者: Dian Jin,Yancheng Yuan,Xiaoming Tao
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-74] ML-EcoLyzer: Quantifying the Environmental Cost of Machine Learning Inference Across Frameworks and Hardware

链接: https://arxiv.org/abs/2511.06694
作者: Jose Marie Antonio Minoza,Rex Gregor Laylo,Christian F Villarin,Sebastian C. Ibanez
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-75] Rapidly Learning Soft Robot Control via Implicit Time-Stepping

链接: https://arxiv.org/abs/2511.06667
作者: Andrew Choi,Dezhong Tong
机构: 未知
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI)
备注: Code: this https URL

点击查看摘要

[AI-76] CaberNet: Causal Representation Learning for Cross-Domain HVAC Energy Prediction

链接: https://arxiv.org/abs/2511.06634
作者: Kaiyuan Zhai,Jiacheng Cui,Zhehao Zhang,Junyu Xue,Yang Deng,Kui Wu,Guoming Tang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Accepted at ACM e-Energy 2026

点击查看摘要

[AI-77] Spilling the Beans: Teaching LLM s to Self-Report Their Hidden Objectives

链接: https://arxiv.org/abs/2511.06626
作者: Chloe Li,Mary Phuong,Daniel Tan
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-78] How Do VLAs Effectively Inherit from VLMs?

【速读】:该论文试图解决的问题是:视觉-语言-动作(Vision-Language-Action, VLA)模型如何有效继承大规模视觉-语言模型(Vision-Language Models, VLMs)的先验知识以实现可泛化的具身控制。其解决方案的关键在于设计了一个诊断性基准任务——GrinningFace,该任务通过机器人将物体放置到对应语义的emoji图案上,利用emoji在互联网规模数据中广泛存在但标准机器人数据集中几乎缺失这一特性,作为VLM先验知识向具身控制迁移的干净代理指标。实验表明,成功完成该任务可有效验证VLM先验知识的保留与迁移,从而为开发真正具备泛化能力的具身AI系统提供了关键方法论指导和实践依据。

链接: https://arxiv.org/abs/2511.06619
作者: Chuheng Zhang,Rushuai Yang,Xiaoyu Chen,Kaixin Wang,Li Zhao,Yi Chen,Jiang Bian
机构: 未知
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Vision-language-action (VLA) models hold the promise to attain generalizable embodied control. To achieve this, a pervasive paradigm is to leverage the rich vision-semantic priors of large vision-language models (VLMs). However, the fundamental question persists: How do VLAs effectively inherit the prior knowledge from VLMs? To address this critical question, we introduce a diagnostic benchmark, GrinningFace, an emoji tabletop manipulation task where the robot arm is asked to place objects onto printed emojis corresponding to language instructions. This task design is particularly revealing – knowledge associated with emojis is ubiquitous in Internet-scale datasets used for VLM pre-training, yet emojis themselves are largely absent from standard robotics datasets. Consequently, they provide a clean proxy: successful task completion indicates effective transfer of VLM priors to embodied control. We implement this diagnostic task in both simulated environment and a real robot, and compare various promising techniques for knowledge transfer. Specifically, we investigate the effects of parameter-efficient fine-tuning, VLM freezing, co-training, predicting discretized actions, and predicting latent actions. Through systematic evaluation, our work not only demonstrates the critical importance of preserving VLM priors for the generalization of VLA but also establishes guidelines for future research in developing truly generalizable embodied AI systems.
zh

[AI-79] Beyond Fixed Depth: Adaptive Graph Neural Networks for Node Classification Under Varying Homophily AAAI2026

链接: https://arxiv.org/abs/2511.06608
作者: Asela Hevapathige,Asiri Wijesinghe,Ahad N. Zehmakan
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Accepted to AAAI 2026

点击查看摘要

[AI-80] CoFineLLM : Conformal Finetuning of LLM s for Language-Instructed Robot Planning

链接: https://arxiv.org/abs/2511.06575
作者: Jun Wang,Yevgeniy Vorobeychik,Yiannis Kantaros
机构: 未知
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

[AI-81] SteganoSNN: SNN-Based Audio-in-Image Steganography with Encryption

链接: https://arxiv.org/abs/2511.06573
作者: Biswajit Kumar Sahoo,Pedro Machado,Isibor Kennedy Ihianle,Andreas Oikonomou,Srinivas Boppu
机构: 未知
类目: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-82] Breaking the Dyadic Barrier: Rethinking Fairness in Link Prediction Beyond Demographic Parity AAAI-26

链接: https://arxiv.org/abs/2511.06568
作者: João Mattos,Debolina Halder Lina,Arlei Silva
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
备注: 12 pages, 5 figures. Accepted at AAAI-26 as an Oral

点击查看摘要

[AI-83] LLM For Loop Invariant Generation and Fixing: How Far Are We?

链接: https://arxiv.org/abs/2511.06552
作者: Mostafijur Rahman Akhond,Saikat Chakraborty,Gias Uddin
机构: 未知
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI)
备注: This work has been submitted to the IEEE for possible publication

点击查看摘要

[AI-84] riShGAN: Enhancing Sparsity and Robustness in Multivariate Time Series Counterfactuals Explanation

链接: https://arxiv.org/abs/2511.06529
作者: Hongnan Ma,Yiwei Shi,Guanxiong Sun,Mengyue Yang,Weiru Liu
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-85] FractalBench: Diagnosing Visual-Mathematical Reasoning Through Recursive Program Synthesis NEURIPS2025

链接: https://arxiv.org/abs/2511.06522
作者: Jan Ondras(1),Marek Šuppa(2) ((1) MIT, (2) Comenius University, Cisco)
机构: 未知
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: Accepted to The 5th Workshop on Mathematical Reasoning and AI at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025); 25 pages, 14 figures, 8 tables; Code available at this https URL

点击查看摘要

[AI-86] Route Experts by Sequence not by Token

链接: https://arxiv.org/abs/2511.06494
作者: Tiansheng Wen,Yifei Wang,Aosong Feng,Long Ma,Xinyang Liu,Yifan Wang,Lixuan Guo,Bo Chen,Stefanie Jegelka,Chenyu You
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
备注:

点击查看摘要

[AI-87] Explainable AI For Early Detection Of Sepsis

链接: https://arxiv.org/abs/2511.06492
作者: Atharva Thakur,Shruti Dhumal
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-88] GHOST: Solving the Traveling Salesman Problem on Graphs of Convex Sets AAAI-2026

链接: https://arxiv.org/abs/2511.06471
作者: Jingtao Tang,Hang Ma
机构: 未知
类目: Artificial Intelligence (cs.AI); Robotics (cs.RO)
备注: Accepted to AAAI-2026

点击查看摘要

[AI-89] Brain-Inspired Planning for Better Generalization in Reinforcement Learning DATE

链接: https://arxiv.org/abs/2511.06470
作者: Mingde “Harry” Zhao
机构: 未知
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: McGill PhD Thesis (updated on 20251109 for typos and margin adjustments)

点击查看摘要

[AI-90] EchoMark: Perceptual Acoustic Environment Transfer with Watermark-Embedded Room Impulse Response

链接: https://arxiv.org/abs/2511.06458
作者: Chenpei Huang,Lingfeng Yao,Kyu In Lee,Lan Emily Zhang,Xun Chen,Miao Pan
机构: 未知
类目: ound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
备注:

点击查看摘要

[AI-91] A Multi-Agent System for Semantic Mapping of Relational Data to Knowledge Graphs WWW

链接: https://arxiv.org/abs/2511.06455
作者: Milena Trajanoska,Riste Stojanov,Dimitar Trajanov
机构: 未知
类目: Databases (cs.DB); Artificial Intelligence (cs.AI)
备注: The 1st GOBLIN Workshop on Knowledge Graph Technologies this https URL

点击查看摘要

[AI-92] FLEX: Continuous Agent Evolution via Forward Learning from Experience

链接: https://arxiv.org/abs/2511.06449
作者: Zhicheng Cai,Xinyuan Guo,Yu Pei,JiangTao Feng,Jiangjie Chen,Ya-Qin Zhang,Wei-Ying Ma,Mingxuan Wang,Hao Zhou
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-93] Personality over Precision: Exploring the Influence of Human-Likeness on ChatGPT Use for Search SIGIR’25

链接: https://arxiv.org/abs/2511.06447
作者: Mert Yazan,Frederik Bungaran Ishak Situmeang,Suzan Verberne
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
备注: Accepted at NIP-IR@SIGIR’25

点击查看摘要

[AI-94] Walking the Tightrope of LLM s for Software Development: A Practitioners Perspective

【速读】:该论文试图解决的问题是:如何在软件开发中平衡大型语言模型(Large Language Models, LLMs)带来的正向影响与潜在负面影响,尤其是在个体开发者、团队、组织和社会层面所面临的复杂权衡。解决方案的关键在于通过三轮数据收集与分析的定性研究方法,结合社会技术扎根理论(Socio-Technical Grounded Theory, STGT),系统识别LLMs对软件开发的实际影响(包括提升开发流程连续性、优化开发者心智模型及促进创业精神等优势,以及对人格特质和职业声誉的潜在损害),并提炼出可操作的最佳实践,从而为软件团队领导者和IT管理者提供基于实证的决策依据,以在特定情境下合理评估和采纳LLMs。

链接: https://arxiv.org/abs/2511.06428
作者: Samuel Ferino,Rashina Hoda,John Grundy,Christoph Treude
机构: 未知
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Background: Large Language Models emerged with the potential of provoking a revolution in software development (e.g., automating processes, workforce transformation). Although studies have started to investigate the perceived impact of LLMs for software development, there is a need for empirical studies to comprehend how to balance forward and backward effects of using LLMs. Objective: We investigated how LLMs impact software development and how to manage the impact from a software developer’s perspective. Method: We conducted 22 interviews with software practitioners across 3 rounds of data collection and analysis, between October (2024) and September (2025). We employed socio-technical grounded theory (STGT) for data analysis to rigorously analyse interview participants’ responses. Results: We identified the benefits (e.g., maintain software development flow, improve developers’ mental model, and foster entrepreneurship) and disadvantages (e.g., negative impact on developers’ personality and damage to developers’ reputation) of using LLMs at individual, team, organisation, and society levels; as well as best practices on how to adopt LLMs. Conclusion: Critically, we present the trade-offs that software practitioners, teams, and organisations face in working with LLMs. Our findings are particularly useful for software team leaders and IT managers to assess the viability of LLMs within their specific context.
zh

[AI-95] AUTO-Explorer: Automated Data Collection for GUI Agent

链接: https://arxiv.org/abs/2511.06417
作者: Xiangwu Guo,Difei Gao,Mike Zheng Shou
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-96] SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization

链接: https://arxiv.org/abs/2511.06411
作者: Zhi Zheng,Wee Sun Lee
机构: 未知
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

[AI-97] Efficient LLM Safety Evaluation through Multi-Agent Debate

链接: https://arxiv.org/abs/2511.06396
作者: Dachuan Lin,Guobin Shen,Zihao Yang,Tianrong Liu,Dongcheng Zhao,Yi Zeng
机构: 未知
类目: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
备注: 9 pages of main text, 14 pages total, 4 figures

点击查看摘要

[AI-98] Ghost in the Transformer: Tracing LLM Lineage with SVD-Fingerprint AAAI2026

链接: https://arxiv.org/abs/2511.06390
作者: Suqing Wang,Ziyang Ma,Xinyi Li,Zuchao Li
机构: 未知
类目: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
备注: Accepted at AAAI 2026 (Oral)

点击查看摘要

[AI-99] HyMoERec: Hybrid Mixture-of-Experts for Sequential Recommendation AAAI2026

链接: https://arxiv.org/abs/2511.06388
作者: Kunrong Li,Zhu Sun,Kwan Hui Lim
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
备注: AAAI 2026 Student Abstract

点击查看摘要

[AI-100] What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language Models

链接: https://arxiv.org/abs/2511.06380
作者: Chen He,Xun Jiang,Lei Wang,Hao Yang,Chong Peng,Peng Yan,Fumin Shen,Xing Xu
机构: 未知
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

[AI-101] Privacy-Preserving Federated Learning for Fair and Efficient Urban Traffic Optimization

链接: https://arxiv.org/abs/2511.06363
作者: Rathin Chandra Shit,Sharmila Subudhi
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
备注: Under review at IEEE journal

点击查看摘要

[AI-102] Understanding Student Interaction with AI-Powered Next-Step Hints: Strategies and Challenges

链接: https://arxiv.org/abs/2511.06362
作者: Anastasiia Birillo,Aleksei Rostovskii,Yaroslav Golubev,Hieke Keuning
机构: 未知
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
备注: Accepted to SIGCSE’26. 7 pages, 3 figures

点击查看摘要

[AI-103] A Graph-Theoretical Perspective on Law Design for Multiagent Systems AAAI AAAI-26

链接: https://arxiv.org/abs/2511.06361
作者: Qi Shi,Pavel Naumov
机构: 未知
类目: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
备注: The 40th AAAI Conference on Artificial Intelligence (AAAI-26)

点击查看摘要

[AI-104] Reaction Prediction via Interaction Modeling of Symmetric Difference Shingle Sets

链接: https://arxiv.org/abs/2511.06356
作者: Runhan Shi,Letian Chen,Gufeng Yu,Yang Yang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)
备注:

点击查看摘要

[AI-105] PRAG MA: A Profiling-Reason ed Multi-Agent Framework for Automatic Kernel Optimization

链接: https://arxiv.org/abs/2511.06345
作者: Kelun Lei,Hailong Yang,Huaitao Zhang,Xin You,Kaige Zhang,Zhongzhi Luan,Yi Liu,Depei Qian
机构: 未知
类目: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-106] ALIGN: A Vision-Language Framework for High-Accuracy Accident Location Inference through Geo-Spatial Neural Reasoning

链接: https://arxiv.org/abs/2511.06316
作者: MD Thamed Bin Zaman Chowdhury,Moazzem Hossain
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-107] Precision-Scalable Microscaling Datapaths with Optimized Reduction Tree for Efficient NPU Integration

链接: https://arxiv.org/abs/2511.06313
作者: Stef Cuyckens,Xiaoling Yi,Robin Geens,Joren Dumoulin,Martin Wiesner,Chao Fang,Marian Verhelst
机构: 未知
类目: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP)
备注: To appear in the 31st Asia and South Pacific Design Automation Conference (ASP-DAC 2026, Invited Paper)

点击查看摘要

[AI-108] he Station: An Open-World Environment for AI-Driven Discovery

链接: https://arxiv.org/abs/2511.06309
作者: Stephen Chung,Wenyu Du
机构: 未知
类目: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
备注: 54 pages

点击查看摘要

[AI-109] Kaggle Chronicles: 15 Years of Competitions Community and Data Science Innovation

链接: https://arxiv.org/abs/2511.06304
作者: Kevin Bönisch,Leandro Losaria
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); General Literature (cs.GL); Machine Learning (stat.ML)
备注:

点击查看摘要

[AI-110] Secu-Table: a Comprehensive security table dataset for evaluating semantic table interpretation systems

【速读】:该论文旨在解决安全领域中语义表解释(Semantic Tables Interpretation, STI)系统评估缺乏公开可用基准数据集的问题,尤其针对基于大语言模型(Large Language Models, LLMs)的STI方法。其解决方案的关键在于构建并公开发布Secu-Table数据集,该数据集包含超过1500张表格和15000余个实体,来源于CVE(Common Vulnerabilities and Exposures)与CWE(Common Weakness Enumeration)的安全数据,并通过Wikidata及SEPSES CSKG(SEmantic Processing of Security Event Streams CyberSecurity Knowledge Graph)进行标注。该数据集作为SemTab挑战赛的一部分,用于评估开源LLMs(如Falcon-3-7B-Instruct和Mistral-7B-Instruct)与闭源模型(如GPT-4o mini)在表到知识图谱匹配任务中的性能,为安全领域的STI研究提供了可复现、高质量的基准资源。

链接: https://arxiv.org/abs/2511.06301
作者: Azanzi Jiomekong,Jean Bikim,Patricia Negoue,Joyce Chin
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: Submitted to Nature Scientific Data

点击查看摘要

Abstract:Evaluating semantic tables interpretation (STI) systems, (particularly, those based on Large Language Models- LLMs) especially in domain-specific contexts such as the security domain, depends heavily on the dataset. However, in the security domain, tabular datasets for state-of-the-art are not publicly available. In this paper, we introduce Secu-Table dataset, composed of more than 1500 tables with more than 15k entities constructed using security data extracted from Common Vulnerabilities and Exposures (CVE) and Common Weakness Enumeration (CWE) data sources and annotated using Wikidata and the SEmantic Processing of Security Event Streams CyberSecurity Knowledge Graph (SEPSES CSKG). Along with the dataset, all the code is publicly released. This dataset is made available to the research community in the context of the SemTab challenge on Tabular to Knowledge Graph Matching. This challenge aims to evaluate the performance of several STI based on open source LLMs. Preliminary evaluation, serving as baseline, was conducted using Falcon3-7b-instruct and Mistral-7B-Instruct, two open source LLMs and GPT-4o mini one closed source LLM.
zh

[AI-111] Decomate: Leverag ing Generative Models for Co-Creative SVG Animation NEURIPS2025

链接: https://arxiv.org/abs/2511.06297
作者: Jihyeon Park,Jiyoon Myung,Seone Shin,Jungki Son,Joohyung Han
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
备注: Accepted at the 1st Workshop on Generative and Protective AI for Content Creation (NeurIPS 2025)

点击查看摘要

[AI-112] ransolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention

链接: https://arxiv.org/abs/2511.06294
作者: Wenjie Hu,Sidun Liu,Peng Qiao,Zhenglun Sun,Yong Dou
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-113] Synthetic Data-Driven Prompt Tuning for Financial QA over Tables and Documents

链接: https://arxiv.org/abs/2511.06292
作者: Yaoning Yu,Kaimin Chang,Ye Yu,Kai Wei,Haojing Luo,Haohan Wang
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-114] Exploiting Inter-Session Information with Frequency-enhanced Dual-Path Networks for Sequential Recommendation

链接: https://arxiv.org/abs/2511.06285
作者: Peng He,Yanglei Gan,Tingting Dai,Run Lin,Xuexin Li,Yao Liu,Qiao Liu
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-115] COTN: A Chaotic Oscillatory Transformer Network for Complex Volatile Systems under Extreme Conditions

链接: https://arxiv.org/abs/2511.06273
作者: Boyan Tang,Yilong Zeng,Xuanhao Ren,Peng Xiao,Yuhan Zhao,Raymond Lee,Jianghua Wu
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Submitted to IEEE Transactions on Neural Networks and Learning Systems

点击查看摘要

[AI-116] GAIA: A General Agency Interaction Architecture for LLM -Human B2B Negotiation Screening

链接: https://arxiv.org/abs/2511.06262
作者: Siming Zhao,Qi Li
机构: 未知
类目: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
备注:

点击查看摘要

[AI-117] LLM -Guided Reinforcement Learning with Representative Agents for Traffic Modeling

链接: https://arxiv.org/abs/2511.06260
作者: Hanlin Sun,Jiayang Li
机构: 未知
类目: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
备注:

点击查看摘要

[AI-118] Breaking the Modality Barrier: Generative Modeling for Accurate Molecule Retrieval from Mass Spectra AAAI2026

【速读】:该论文旨在解决从串联质谱(tandem mass spectra)中检索分子结构时存在的两大问题:一是传统质谱库匹配方法因谱库覆盖范围有限导致的检索精度不足;二是现有跨模态表示学习框架常因模态错位(modality misalignment)而导致检索准确率和泛化能力较差。其解决方案的关键在于提出一种基于生成式语言模型的检索框架(Generative Language Model-based Retrieval, GLMR),通过两阶段策略缓解跨模态错位:第一阶段利用对比学习识别候选分子作为上下文先验;第二阶段将候选分子与输入质谱联合引导生成模型输出精炼的分子结构,并基于分子相似性重新排序候选列表,从而显著提升检索性能与泛化能力。

链接: https://arxiv.org/abs/2511.06259
作者: Yiwen Zhang,Keyan Ding,Yihang Wu,Xiang Zhuang,Yi Yang,Qiang Zhang,Huajun Chen
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Accepted by AAAI 2026

点击查看摘要

Abstract:Retrieving molecular structures from tandem mass spectra is a crucial step in rapid compound identification. Existing retrieval methods, such as traditional mass spectral library matching, suffer from limited spectral library coverage, while recent cross-modal representation learning frameworks often encounter modality misalignment, resulting in suboptimal retrieval accuracy and generalization. To address these limitations, we propose GLMR, a Generative Language Model-based Retrieval framework that mitigates the cross-modal misalignment through a two-stage process. In the pre-retrieval stage, a contrastive learning-based model identifies top candidate molecules as contextual priors for the input mass spectrum. In the generative retrieval stage, these candidate molecules are integrated with the input mass spectrum to guide a generative model in producing refined molecular structures, which are then used to re-rank the candidates based on molecular similarity. Experiments on both MassSpecGym and the proposed MassRET-20k dataset demonstrate that GLMR significantly outperforms existing methods, achieving over 40% improvement in top-1 accuracy and exhibiting strong generalizability.
zh

[AI-119] MrCoM: A Meta-Regularized World-Model Generalizing Across Multi-Scenarios

链接: https://arxiv.org/abs/2511.06252
作者: Xuantang Xiong,Ni Mu,Runpeng Xie,Senhao Yang,Yaqing Wang,Lexiang Wang,Yao Luan,Siyuan Li,Shuang Xu,Yiqin Yang,Bo Xu
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-120] WebVIA: A Web-based Vision-Language Agent ic Framework for Interactive and Verifiable UI-to-Code Generation

【速读】:该论文旨在解决当前用户界面(User Interface, UI)开发中从设计原型生成静态代码的局限性,即现有基于视觉-语言模型(Vision-Language Models, VLMs)的UI-to-Code方法仅能生成无交互能力的HTML/CSS/JavaScript代码,无法满足现代Web应用对动态交互的需求。其解决方案的关键在于提出首个面向交互式UI生成与验证的代理框架WebVIA,包含三个核心组件:1)探索代理(exploration agent),用于捕获多状态UI截图以实现对交互行为的建模;2)UI2Code模型(WebVIA-UI2Code),通过微调生成可执行且具备交互能力的前端代码;3)验证模块(validation module),确保生成代码的功能正确性和交互一致性。实验表明,该框架在UI探索稳定性和交互代码生成质量上显著优于基线模型,实现了从静态布局到完整交互系统的跨越。

链接: https://arxiv.org/abs/2511.06251
作者: Mingde Xu,Zhen Yang,Wenyi Hong,Lihang Pan,Xinyue Fan,Yan Wang,Xiaotao Gu,Bin Xu,Jie Tang
机构: 未知
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI)
备注: 36 pages, 30 figures

点击查看摘要

Abstract:User interface (UI) development requires translating design mockups into functional code, a process that remains repetitive and labor-intensive. While recent Vision-Language Models (VLMs) automate UI-to-Code generation, they generate only static HTML/CSS/JavaScript layouts lacking interactivity. To address this, we propose WebVIA, the first agentic framework for interactive UI-to-Code generation and validation. The framework comprises three components: 1) an exploration agent to capture multi-state UI screenshots; 2) a UI2Code model that generates executable interactive code; 3) a validation module that verifies the interactivity. Experiments demonstrate that WebVIA-Agent achieves more stable and accurate UI exploration than general-purpose agents (e.g., Gemini-2.5-Pro). In addition, our fine-tuned WebVIA-UI2Code models exhibit substantial improvements in generating executable and interactive HTML/CSS/JavaScript code, outperforming their base counterparts across both interactive and static UI2Code benchmarks. Our code and models are available at \hrefthis https URL\textttthis https URL.
zh

[AI-121] Constraint-Informed Active Learning for End-to-End ACOPF Optimization Proxies

链接: https://arxiv.org/abs/2511.06248
作者: Miao Li,Michael Klamkin,Pascal Van Hentenryck,Wenting Li,Russell Bent
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: 8 PAGES

点击查看摘要

[AI-122] Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation AAAI2026

链接: https://arxiv.org/abs/2511.06240
作者: Tzu-Jung Lin,Jia-Fong Yeh,Hung-Ting Su,Chung-Yi Lin,Yi-Ting Chen,Winston H. Hsu
机构: 未知
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI)
备注: Accepted to AAAI 2026

点击查看摘要

[AI-123] Scaling Laws and In-Context Learning: A Unified Theoretical Framework

链接: https://arxiv.org/abs/2511.06232
作者: Sushant Mehta,Ishan Gupta
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-124] Assertion-Aware Test Code Summarization with Large Language Models

【速读】:该论文旨在解决单元测试代码缺乏简洁且能准确传达测试意图的摘要问题,尤其是在自动生成或文档缺失的代码库中。其核心挑战在于,与普通代码不同,测试代码通过断言(assertion)验证预期行为而非实现功能,因此传统代码摘要方法效果有限。解决方案的关键在于设计针对测试代码特性的提示工程策略,特别是利用断言语义(assertion semantics)作为输入特征,相比完整的方法体(MUT)上下文可提升摘要质量平均0.10分(相对提升2.3%),同时减少输入token数量,从而显著优化大语言模型(LLM)生成测试摘要的效果。

链接: https://arxiv.org/abs/2511.06227
作者: Anamul Haque Mollah,Ahmed Aljohani,Hyunsook Do
机构: 未知
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
备注: Accepted for publication at 2nd ACM International Conference on AI-powered Software (AIware 2025)

点击查看摘要

Abstract:Unit tests often lack concise summaries that convey test intent, especially in auto-generated or poorly documented codebases. Large Language Models (LLMs) offer a promising solution, but their effectiveness depends heavily on how they are prompted. Unlike generic code summarization, test-code summarization poses distinct challenges because test methods validate expected behavior through assertions rather than im- plementing functionality. This paper presents a new benchmark of 91 real-world Java test cases paired with developer-written summaries and conducts a controlled ablation study to investigate how test code-related components-such as the method under test (MUT), assertion messages, and assertion semantics-affect the performance of LLM-generated test summaries. We evaluate four code LLMs (Codex, Codestral, DeepSeek, and Qwen-Coder) across seven prompt configurations using n-gram metrics (BLEU, ROUGE-L, METEOR), semantic similarity (BERTScore), and LLM-based evaluation. Results show that prompting with as- sertion semantics improves summary quality by an average of 0.10 points (2.3%) over full MUT context (4.45 vs. 4.35) while requiring fewer input tokens. Codex and Qwen-Coder achieve the highest alignment with human-written summaries, while DeepSeek underperforms despite high lexical overlap. The replication package is publicly available at this https URL. 5281/zenodo.17067550
zh

[AI-125] ROAR: Robust Accident Recognition and Anticipation for Autonomous Driving

链接: https://arxiv.org/abs/2511.06226
作者: Xingcheng Liu,Yanchen Guan,Haicheng Liao,Zhengbing He,Zhenning Li
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: Published to Accident Analysis and Prevention

点击查看摘要

[AI-126] RAG -targeted Adversarial Attack on LLM -based Threat Detection and Mitigation Framework

【速读】:该论文旨在解决生成式 AI(Generative AI)在物联网(IoT)安全场景中应用时所引入的新攻击面问题,特别是针对基于大语言模型(LLM)的网络入侵检测系统(NIDS)因提示注入(prompt injection)和数据污染(data poisoning)等漏洞导致的安全风险。其解决方案的关键在于通过构造一个针对检索增强生成(Retrieval-Augmented Generation, RAG)知识库的定向数据污染攻击,利用语义不变但词级扰动的方式破坏模型对网络流量特征与攻击行为之间关联的理解能力,从而评估此类框架在对抗性环境下的鲁棒性,并揭示小规模扰动如何显著削弱推荐缓解措施的针对性和实用性,尤其在资源受限设备上的适用性下降。

链接: https://arxiv.org/abs/2511.06212
作者: Seif Ikbarieh,Kshitiz Aryal,Maanak Gupta
机构: 未知
类目: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:The rapid expansion of the Internet of Things (IoT) is reshaping communication and operational practices across industries, but it also broadens the attack surface and increases susceptibility to security breaches. Artificial Intelligence has become a valuable solution in securing IoT networks, with Large Language Models (LLMs) enabling automated attack behavior analysis and mitigation suggestion in Network Intrusion Detection Systems (NIDS). Despite advancements, the use of LLMs in such systems further expands the attack surface, putting entire networks at risk by introducing vulnerabilities such as prompt injection and data poisoning. In this work, we attack an LLM-based IoT attack analysis and mitigation framework to test its adversarial robustness. We construct an attack description dataset and use it in a targeted data poisoning attack that applies word-level, meaning-preserving perturbations to corrupt the Retrieval-Augmented Generation (RAG) knowledge base of the framework. We then compare pre-attack and post-attack mitigation responses from the target model, ChatGPT-5 Thinking, to measure the impact of the attack on model performance, using an established evaluation rubric designed for human experts and judge LLMs. Our results show that small perturbations degrade LLM performance by weakening the linkage between observed network traffic features and attack behavior, and by reducing the specificity and practicality of recommended mitigations for resource-constrained devices.
zh

[AI-127] Resilience Inference for Supply Chains with Hypergraph Neural Network

链接: https://arxiv.org/abs/2511.06208
作者: Zetian Shen,Hongjun Wang,Jiyuan Chen,Xuan Song
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-128] AI as intermediary in modern-day ritual: An immersive interactive production of the roller disco musical Xanadu at UCLA

链接: https://arxiv.org/abs/2511.06195
作者: Mira Winick,Naisha Agarwal,Chiheb Boussema,Ingrid Lee,Camilo Vargas,Jeff Burke
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-129] Dataforge: A Data Agent Platform for Autonomous Data Engineering

链接: https://arxiv.org/abs/2511.06185
作者: Xinyuan Wang,Yanjie Fu
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-130] MemoriesDB: A Temporal-Semantic-Relational Database for Long-Term Agent Memory / Modeling Experience as a Graph of Temporal-Semantic Surfaces

链接: https://arxiv.org/abs/2511.06179
作者: Joel Ward(“val”)
机构: 未知
类目: Databases (cs.DB); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
备注:

点击查看摘要

[AI-131] CSP4SDG: Constraint and Information-Theory Based Role Identification in Social Deduction Games with LLM -Enhanced Inference

链接: https://arxiv.org/abs/2511.06175
作者: Kaijie Xu,Fandi Meng,Clark Verbrugge,Simon Lucas
机构: 未知
类目: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
备注:

点击查看摘要

[AI-132] LUT-LLM : Efficient Large Language Model Inference with Memory-based Computations on FPGAs

链接: https://arxiv.org/abs/2511.06174
作者: Zifan He,Shengyu Ye,Rui Ma,Yang Wang,Jason Cong
机构: 未知
类目: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-133] Chasing Consistency: Quantifying and Optimizing Human-Model Alignment in Chain-of-Thought Reasoning

链接: https://arxiv.org/abs/2511.06168
作者: Boxuan Wang,Zhuoyun Li,Xinmiao Huang,Xiaowei Huang,Yi Dong
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: 13 pages, 3 figures

点击查看摘要

[AI-134] LLM Attention Transplant for Transfer Learning of Tabular Data Across Disparate Domains

【速读】:该论文旨在解决跨域表格数据(tabular data)迁移学习中因特征空间异质性导致的挑战,尤其是传统深度学习方法在表格知识迁移上的局限性。其核心问题在于如何有效利用大语言模型(LLM)的能力来提升表格数据的迁移性能,同时避免对文本提示(text prompts)和上下文学习(in-context learning)的依赖。解决方案的关键在于提出一种轻量级迁移学习框架——LATTLE(LLM-Attention Transplant for Transfer Learning),通过在源域表格数据上微调LLM,并将其中选择性的Key和Value投影权重移植到专为表格数据设计的门控特征标记Transformer(gFTT)中,从而构建具备跨域注意力机制的gFTT模型;该模型随后在目标域表格数据上进行微调,无需共享特征、提示工程或大规模预训练模型即可实现高效迁移学习。

链接: https://arxiv.org/abs/2511.06161
作者: Ibna Kowsar,Kazi F. Akhter,Manar D. Samad
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Transfer learning of tabular data is non-trivial due to heterogeneity in the feature space across disparate domains. The limited success of traditional deep learning in tabular knowledge transfer can be advanced by leveraging large language models (LLMs). However, the efficacy of LLMs often stagnates for mixed data types structured in tables due to the limitations of text prompts and in-context learning. We propose a lightweight transfer learning framework that fine-tunes an LLM using source tabular data and transplants the LLM’s selective key and value projection weights into a gated feature tokenized transformer (gFTT) built for tabular data. The gFTT model with cross-domain attention is fine-tuned using target tabular data for transfer learning, eliminating the need for shared features, LLM prompt engineering, and large-scale pretrained models. Our experiments using ten pairs of source-target data sets and 12 baselines demonstrate the superiority of the proposed LLM-attention transplant for transfer learning (LATTLE) method over traditional ML models, state-of-the-art deep tabular architectures, and transfer learning models trained on thousands to billions of tabular samples. The proposed attention transfer demonstrates an effective solution to learning relationships between data tables using an LLM in a low-resource learning environment. The source code for the proposed method is publicly available.
zh

[AI-135] Models Got Talent: Identifying High Performing Wearable Human Activity Recognition Models Without Training

【速读】:该论文旨在解决神经架构搜索(Neural Architecture Search, NAS)计算成本高昂的问题,提出了一种基于零成本代理指标(Zero Cost Proxies, ZCPs)的替代方案。其核心解决方案在于利用ZCPs在仅需单次前向/反向传播即可评估候选网络架构性能的能力,从而显著降低搜索过程中的训练开销。实验表明,ZCPs能够在六种基准传感器数据集上有效识别出接近全量训练性能(误差小于5%)的高表现架构,且对数据噪声具有鲁棒性,验证了其在实际应用中的可行性与高效性。

链接: https://arxiv.org/abs/2511.06157
作者: Richard Goldman,Varun Komperla,Thomas Ploetz,Harish Haresamudram
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:A promising alternative to the computationally expensive Neural Architecture Search (NAS) involves the development of \textitZero Cost Proxies (ZCPs), which correlate well to trained performance, but can be computed through a single forward/backward pass on a randomly sampled batch of data. In this paper, we investigate the effectiveness of ZCPs for HAR on six benchmark datasets, and demonstrate that they discover network architectures that obtain within 5% of performance attained by full scale training involving 1500 randomly sampled architectures. This results in substantial computational savings as high performing architectures can be discovered with minimal training. Our experiments not only introduce ZCPs to sensor-based HAR, but also demonstrate that they are robust to data noise, further showcasing their suitability for practical scenarios.
zh

[AI-136] MALinZero: Efficient Low-Dimensional Search for Mastering Complex Multi-Agent Planning

【速读】:该论文旨在解决多智能体规划中蒙特卡洛树搜索(Monte Carlo Tree Search, MCTS)因联合动作空间呈指数级增长而导致的探索与利用效率低下问题。其关键解决方案是提出MALinZero方法,通过将联合动作回报投影到低维表示空间,并基于上下文线性Bandit问题建模,在该空间中设计线性置信上界(Linear Upper Confidence Bound for Trees, LinUCT),从而实现更高效的多智能体探索与利用策略;同时,结合子模目标函数的最大化,提出一个(11e)(1-\frac{1}{e})-近似算法用于联合动作选择,显著提升了多智能体强化学习在矩阵博弈、SMAC及SMACv2等基准任务上的性能和收敛速度。

链接: https://arxiv.org/abs/2511.06142
作者: Sizhe Tang,Jiayu Chen,Tian Lan
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Monte Carlo Tree Search (MCTS), which leverages Upper Confidence Bound for Trees (UCTs) to balance exploration and exploitation through randomized sampling, is instrumental to solving complex planning problems. However, for multi-agent planning, MCTS is confronted with a large combinatorial action space that often grows exponentially with the number of agents. As a result, the branching factor of MCTS during tree expansion also increases exponentially, making it very difficult to efficiently explore and exploit during tree search. To this end, we propose MALinZero, a new approach to leverage low-dimensional representational structures on joint-action returns and enable efficient MCTS in complex multi-agent planning. Our solution can be viewed as projecting the joint-action returns into the low-dimensional space representable using a contextual linear bandit problem formulation. We solve the contextual linear bandit problem with convex and \mu -smooth loss functions – in order to place more importance on better joint actions and mitigate potential representational limitations – and derive a linear Upper Confidence Bound applied to trees (LinUCT) to enable novel multi-agent exploration and exploitation in the low-dimensional space. We analyze the regret of MALinZero for low-dimensional reward functions and propose an (1-\tfrac1e) -approximation algorithm for the joint action selection by maximizing a sub-modular objective. MALinZero demonstrates state-of-the-art performance on multi-agent benchmarks such as matrix games, SMAC, and SMACv2, outperforming both model-based and model-free multi-agent reinforcement learning baselines with faster learning speed and better performance.
zh

[AI-137] When Object-Centric World Models Meet Policy Learning: From Pixels to Policies and Where It Breaks

链接: https://arxiv.org/abs/2511.06136
作者: Stefano Ferraro,Akihiro Nakano,Masahiro Suzuki,Yutaka Matsuo
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-138] Maestro: Learning to Collaborate via Conditional Listwise Policy Optimization for Multi-Agent LLM s

链接: https://arxiv.org/abs/2511.06134
作者: Wei Yang,Jiacheng Pang,Shixuan Li,Paul Bogdan,Stephen Tu,Jesse Thomason
机构: 未知
类目: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
备注:

点击查看摘要

[AI-139] SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?

链接: https://arxiv.org/abs/2511.06090
作者: Jeffrey Jian Ma,Milad Hashemi,Amir Yazdanbakhsh,Kevin Swersky,Ofir Press,Enhui Li,Vijay Janapa Reddi,Parthasarathy Ranganathan
机构: 未知
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI); Performance (cs.PF)
备注: Data, code, and leaderboard are available at this https URL

点击查看摘要

[AI-140] A Privacy-Preserving Federated Learning Method with Homomorphic Encryption in Omics Data

【速读】:该论文旨在解决联邦学习(Federated Learning, FL)中隐私保护与预测准确性之间的权衡问题,尤其是在处理高敏感性的组学(Omics)数据时。传统基于差分隐私(Differential Privacy, DP)的方法虽能提供强隐私保障,但因注入噪声导致模型预测性能下降;而同态加密(Homomorphic Encryption, HE)虽可避免噪声引入、提升准确性,却可能显著增加计算开销。论文提出了一种隐私保护机器学习(Privacy-Preserving Machine Learning, PPML)混合方法(PPML-Hybrid),其关键在于根据客户端的计算能力动态选择HE或DP机制:计算资源充足的客户端采用HE以贡献无噪声梯度更新,资源受限的客户端则采用DP以降低计算负担;同时,高算力客户端还可依据隐私需求灵活切换HE与DP。实验表明,该方案在保持与纯HE相当预测精度的同时大幅减少计算时间,并优于同等或更严格隐私预算下的纯DP方法。

链接: https://arxiv.org/abs/2511.06064
作者: Yusaku Negoya,Feifei Cui,Zilong Zhang,Miao Pan,Tomoaki Ohtsuki,Aohan Li
机构: 未知
类目: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
备注: 6 pages, 4 figures

点击查看摘要

Abstract:Omics data is widely employed in medical research to identify disease mechanisms and contains highly sensitive personal information. Federated Learning (FL) with Differential Privacy (DP) can ensure the protection of omics data privacy against malicious user attacks. However, FL with the DP method faces an inherent trade-off: stronger privacy protection degrades predictive accuracy due to injected noise. On the other hand, Homomorphic Encryption (HE) allows computations on encrypted data and enables aggregation of encrypted gradients without DP-induced noise can increase the predictive accuracy. However, it may increase the computation cost. To improve the predictive accuracy while considering the computational ability of heterogeneous clients, we propose a Privacy-Preserving Machine Learning (PPML)-Hybrid method by introducing HE. In the proposed PPML-Hybrid method, clients distributed select either HE or DP based on their computational resources, so that HE clients contribute noise-free updates while DP clients reduce computational overhead. Meanwhile, clients with high computational resources clients can flexibly adopt HE or DP according to their privacy needs. Performance evaluation on omics datasets show that our proposed method achieves comparable predictive accuracy while significantly reducing computation time relative to HE-only. Additionally, it outperforms DP-only methods under equivalent or stricter privacy budgets.
zh

[AI-141] How Particle-System Random Batch Methods Enhance Graph Transformer: Memory Efficiency and Parallel Computing Strategy

链接: https://arxiv.org/abs/2511.06044
作者: Hanwen Liu,Yixuan Ma,Shi Jin,Yuguang Wang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistics Theory (math.ST)
备注:

点击查看摘要

[AI-142] Advancing Ocean State Estimation with efficient and scalable AI

链接: https://arxiv.org/abs/2511.06041
作者: Yanfei Xiang,Yuan Gao,Hao Wu,Quan Zhang,Ruiqi Shu,Xiao Zhou,Xi Wu,Xiaomeng Huang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: 29 papes, 10 Figures

点击查看摘要

[AI-143] ITPP: Learning Disentangled Event Dynamics in Marked Temporal Point Processes AAAI’26

链接: https://arxiv.org/abs/2511.06032
作者: Wang-Tao Zhou,Zhao Kang,Ke Yan,Ling Tian
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Accepted to AAAI’26 Poster

点击查看摘要

[AI-144] MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference

【速读】:该论文旨在解决大语言模型(Large Language Models, LLMs)在长上下文场景下因键值缓存(Key-Value Cache, KV Cache)的内存密集特性导致的GPU利用率低下问题。其核心解决方案是提出混合共享KV注意力机制(Mixture of Shared KV Attention, MoSKA),关键在于通过区分每个请求中独有的序列与大量复用的共享序列,将共享数据上的注意力计算从一系列内存受限的GEMV(General Matrix-Vector multiplication)操作转化为单次计算密集型的GEMM(General Matrix-Matrix multiplication)操作,从而提升硬件利用效率;该机制结合了受MoE(Mixture of Experts)启发的稀疏注意力策略和针对独特与共享数据定制的解耦基础设施,实现了高达538.7倍的吞吐量提升。

链接: https://arxiv.org/abs/2511.06010
作者: Myunghyun Rhee,Sookyung Choi,Euiseok Kim,Joonseop Sim,Youngpyo Joo,Hoshik Kim
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
备注: 4 pages, 5 figures, accepted for publication at IEEE Computer Architecture Letters (IEEE CAL), 2025

点击查看摘要

Abstract:The escalating context length in Large Language Models (LLMs) creates a severe performance bottleneck around the Key-Value (KV) cache, whose memory-bound nature leads to significant GPU under-utilization. This paper introduces Mixture of Shared KV Attention (MoSKA), an architecture that addresses this challenge by exploiting the heterogeneity of context data. It differentiates between per-request unique and massively reused shared sequences. The core of MoSKA is a novel Shared KV Attention mechanism that transforms the attention on shared data from a series of memory-bound GEMV operations into a single, compute-bound GEMM by batching concurrent requests. This is supported by an MoE-inspired sparse attention strategy that prunes the search space and a tailored Disaggregated Infrastructure that specializes hardware for unique and shared data. This comprehensive approach demonstrates a throughput increase of up to 538.7x over baselines in workloads with high context sharing, offering a clear architectural path toward scalable LLM inference.
zh

[AI-145] Ontology Learning and Knowledge Graph Construction: A Comparison of Approaches and Their Impact on RAG Performance

【速读】:该论文旨在解决检索增强生成(Retrieval-Augmented Generation, RAG)系统中知识表示方式对性能影响的关键问题,尤其是如何通过优化知识图谱(Knowledge Graph, KG)构建策略来提升RAG的效果。其解决方案的关键在于采用基于本体(ontology)引导的KG构建方法,将文本片段(chunk)信息融入KG结构中,并对比了从关系数据库与文本语料库中提取本体的不同路径;研究发现,基于关系数据库构建的本体引导KG在性能上可媲美最先进的框架,且显著优于纯向量检索基线,同时具备两大优势:一是仅需一次性的本体学习过程,大幅降低大语言模型(Large Language Model, LLM)调用成本;二是避免了文本驱动方法中常见的本体合并复杂性问题。

链接: https://arxiv.org/abs/2511.05991
作者: Tiago da Cruz,Bernardo Tavares,Francisco Belo
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
备注: 12 pages, 8 Figures

点击查看摘要

Abstract:Retrieval-Augmented Generation (RAG) systems combine Large Language Models (LLMs) with external knowledge, and their performance depends heavily on how that knowledge is represented. This study investigates how different Knowledge Graph (KG) construction strategies influence RAG performance. We compare a variety of approaches: standard vector-based RAG, GraphRAG, and retrieval over KGs built from ontologies derived either from relational databases or textual corpora. Results show that ontology-guided KGs incorporating chunk information achieve competitive performance with state-of-the-art frameworks, substantially outperforming vector retrieval baselines. Moreover, the findings reveal that ontology-guided KGs built from relational databases perform competitively to ones built with ontologies extracted from text, with the benefit of offering a dual advantage: they require a one-time-only ontology learning process, substantially reducing LLM usage costs; and avoid the complexity of ontology merging inherent to text-based approaches.
zh

[AI-146] Kunlun Anomaly Troubleshooter: Enabling Kernel-Level Anomaly Detection and Causal Reasoning for Large Model Distributed Inference

【速读】:该论文旨在解决大规模模型分布式推理(Large Model Distributed Inference, LMDI)中异常诊断效率低、准确性差的问题,尤其是在推理性能下降或延迟抖动等复杂异常场景下,传统依赖专家手动排查的方式耗时且效果有限。其解决方案的关键在于提出Kunlun Anomaly Troubleshooter (KAT)框架,通过两项核心创新实现:一是利用GPU工作节点间的同步性和一致性,基于函数调用追踪数据在纳秒级分辨率下精准定位核级别异常及其关联硬件组件;二是将检测结果整合进领域自适应的大语言模型(domain-adapted LLM),实现对复杂异常症状的系统性因果推理与自然语言解释,从而显著缩小诊断范围并提升故障排查效率与成功率。

链接: https://arxiv.org/abs/2511.05978
作者: Yuyang Liu,Jingjing Cai,Jiayi Ren,Peng Zhou,Danyang Zhang,Yin Du,Shijian Li
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
备注: Preprint version, under submission

点击查看摘要

Abstract:Anomaly troubleshooting for large model distributed inference (LMDI) remains a critical challenge. Resolving anomalies such as inference performance degradation or latency jitter in distributed system demands significant manual efforts from domain experts, resulting in extremely time-consuming diagnosis processes with relatively low accuracy. In this paper, we introduce Kunlun Anomaly Troubleshooter (KAT), the first anomaly troubleshooting framework tailored for LMDI. KAT addresses this problem through two core innovations. First, KAT exploits the synchronicity and consistency of GPU workers, innovatively leverages function trace data to precisely detect kernel-level anomalies and associated hardware components at nanosecond resolution. Second, KAT integrates these detection results into a domain-adapted LLM, delivering systematic causal reasoning and natural language interpretation of complex anomaly symptoms. Evaluations conducted in Alibaba Cloud Service production environment indicate that KAT achieves over 0.884 precision and 0.936 recall in anomaly detection, providing detail anomaly insights that significantly narrow down the diagnostic scope and improve both the efficiency and success rate of troubleshooting.
zh

[AI-147] An Epistemic Perspective on Agent Awareness AAAI AAAI-26

【速读】:该论文旨在解决现有文献中关于代理意识(agent awareness)研究的局限性,即传统方法未能将意识视为一种知识形式,从而忽略了其内在结构。解决方案的关键在于区分“关于事物的意识”(de re)与“关于命题的意识”(de dicto)这两种知识形式,并引入两种模态算子来刻画它们;通过2D语义学版本的形式化定义,构建了一个逻辑系统,该系统能够精确描述这两种新提出的模态与标准“事实知识”模态之间的交互关系,且具备逻辑上的可靠性和完备性。

链接: https://arxiv.org/abs/2511.05977
作者: Pavel Naumov,Alexandra Pavlova
机构: 未知
类目: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Multiagent Systems (cs.MA)
备注: Fortieth AAAI Conference on Artificial Intelligence (AAAI-26)

点击查看摘要

Abstract:The paper proposes to treat agent awareness as a form of knowledge, breaking the tradition in the existing literature on awareness. It distinguishes the de re and de dicto forms of such knowledge. The work introduces two modalities capturing these forms and formally specifies their meaning using a version of 2D-semantics. The main technical result is a sound and complete logical system describing the interplay between the two proposed modalities and the standard “knowledge of the fact” modality.
zh

[AI-148] Klear-Agent Forge: Forging Agent ic Intelligence through Posttraining Scaling

【速读】:该论文旨在解决开源社区中缺乏高性能代理模型(agentic model)的后训练细节,从而限制了其发展的问题。解决方案的关键在于提出了一套完整且完全开源的训练流程——Klear-Qwen3-AgentForge,该流程以Qwen3-8B为基础模型,通过设计有效的监督微调(SFT)结合合成数据,再辅以多轮强化学习(multi-turn reinforcement learning, RL),显著提升了模型在外部工具和环境交互中的多任务适应能力。实验表明,该方法在工具使用与代码生成等代理基准测试中达到同类规模模型中的最先进性能,并优于更大参数量的模型。

链接: https://arxiv.org/abs/2511.05951
作者: Qi Wang,Hongzhi Zhang,Jia Fu,Kai Fu,Yahui Liu,Tinghai Zhang,Chenxi Sun,Gangwei Jiang,Jingyi Tang,Xingguang Ji,Yang Yue,Jingyuan Zhang,Fuzheng Zhang,Kun Gai,Guorui Zhou
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: 20 pages, 7 figures

点击查看摘要

Abstract:Despite the proliferation of powerful agentic models, the lack of critical post-training details hinders the development of strong counterparts in the open-source community. In this study, we present a comprehensive and fully open-source pipeline for training a high-performance agentic model for interacting with external tools and environments, named Klear-Qwen3-AgentForge, starting from the Qwen3-8B base model. We design effective supervised fine-tuning (SFT) with synthetic data followed by multi-turn reinforcement learning (RL) to unlock the potential for multiple diverse agentic tasks. We perform exclusive experiments on various agentic benchmarks in both tool use and coding domains. Klear-Qwen3-AgentForge-8B achieves state-of-the-art performance among LLMs of similar size and remains competitive with significantly larger models.
zh

[AI-149] 10 Open Challenges Steering the Future of Vision-Language-Action Models AAAI2026

【速读】:该论文旨在解决当前具身人工智能(Embodied AI)中视觉-语言-动作(Vision-Language-Action, VLA)模型发展路径不清晰、关键瓶颈尚不明确的问题。其解决方案的关键在于系统性梳理并归纳了VLA模型发展的十大核心里程碑——包括多模态融合、推理能力、数据构建、评估体系、跨机器人动作泛化、效率优化、全身协调、安全性、智能体设计以及人机协作,并进一步指出通过空间理解建模、世界动态建模、后训练策略和数据合成等新兴趋势,可有效推动这些里程碑的实现,从而加速VLA模型向更广泛实际应用的落地。

链接: https://arxiv.org/abs/2511.05936
作者: Soujanya Poria,Navonil Majumder,Chia-Yu Hung,Amir Ali Bagherzadeh,Chuan Li,Kenneth Kwok,Ziwei Wang,Cheston Tan,Jiajun Wu,David Hsu
机构: 未知
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI)
备注: AAAI 2026 (Senior Track)

点击查看摘要

Abstract:Due to their ability of follow natural language instructions, vision-language-action (VLA) models are increasingly prevalent in the embodied AI arena, following the widespread success of their precursors – LLMs and VLMs. In this paper, we discuss 10 principal milestones in the ongoing development of VLA models – multimodality, reasoning, data, evaluation, cross-robot action generalization, efficiency, whole-body coordination, safety, agents, and coordination with humans. Furthermore, we discuss the emerging trends of using spatial understanding, modeling world dynamics, post training, and data synthesis – all aiming to reach these milestones. Through these discussions, we hope to bring attention to the research avenues that may accelerate the development of VLA models into wider acceptability.
zh

[AI-150] he Future of AI in the GCC Post-NPM Landscape: A Comparative Analysis of Kuwait and the UAE

【速读】:该论文试图解决的问题是:在海湾合作委员会(GCC)国家中,如何将人工智能(AI)战略愿景转化为后新公共管理(post-NPM)治理成果,尤其是在缺乏对非西方民主国家经验比较研究的情况下。其解决方案的关键在于运用奥斯特罗姆的制度分析与发展(IAD)框架,通过对比阿联酋(UAE)与科威特两个具有相似财政资源但制度路径迥异的案例,识别出影响AI公共价值实现的核心制度机制。研究发现,垂直规则一致性(vertical rule coherence)而非财富水平才是决定AI公共价值产出的关键因素,具体表现为:阿联酋凭借集中化权威、可执行的制裁机制、创新导向叙事和灵活再投资规则,成功将AI试点扩展为数百项服务并实现资金循环利用;而科威特因决策分散、象征性制裁、保守话语及预算停滞,导致AI项目长期停留在试点阶段。这一发现深化了制度理论,并警示仅靠效率指标无法保障社会目标的实现,除非有可执行的制度保障。

链接: https://arxiv.org/abs/2511.05932
作者: Mohammad Rashed Albous,Bedour Alboloushi,Arnaud Lacheret
机构: 未知
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Theoretical Economics (econ.TH)
备注:

点击查看摘要

Abstract:Comparative evidence on how Gulf Cooperation Council (GCC) states turn artificial intelligence (AI) ambitions into post–New Public Management (post-NPM) outcomes is scarce because most studies examine Western democracies. We analyze constitutional, collective-choice, and operational rules shaping AI uptake in two contrasting GCC members, the United Arab Emirates (UAE) and Kuwait, and whether they foster citizen centricity, collaborative governance, and public value creation. Anchored in Ostrom’s Institutional Analysis and Development framework, the study combines a most similar/most different systems design with multiple sources: 62 public documents from 2018–2025, embedded UAE cases (Smart Dubai and MBZUAI), and 39 interviews with officials conducted Aug 2024–May 2025. Dual coding and process tracing connect rule configurations to AI performance. Cross-case analysis identifies four reinforcing mechanisms behind divergent trajectories. In the UAE, concentrated authority, credible sanctions, pro-innovation narratives, and flexible reinvestment rules scale pilots into hundreds of services and sizable recycled savings. In Kuwait, dispersed veto points, exhortative sanctions, cautious discourse, and lapsed AI budgets confine initiatives to pilot mode despite equivalent fiscal resources. The findings refine institutional theory by showing that vertical rule coherence, not wealth, determines AI’s public-value yield, and temper post-NPM optimism by revealing that efficiency metrics serve societal goals only when backed by enforceable safeguards. To curb ethics washing and test transferability beyond the GCC, future work should track rule diffusion over time, develop blended legitimacy–efficiency scorecards, and examine how narrative framing shapes citizen consent for data sharing.
zh

[AI-151] Self-Abstraction from Grounded Experience for Plan-Guided Policy Refinement

链接: https://arxiv.org/abs/2511.05931
作者: Hiroaki Hayashi,Bo Pang,Wenting Zhao,Ye Liu,Akash Gokul,Srijan Bansal,Caiming Xiong,Semih Yavuz,Yingbo Zhou
机构: 未知
类目: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
备注:

点击查看摘要

[AI-152] Artificial intelligence and the Gulf Cooperation Council workforce adapting to the future of work

链接: https://arxiv.org/abs/2511.05927
作者: Mohammad Rashed Albous,Melodena Stephens,Odeh Rashed Al-Jayyousi
机构: 未知
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); General Economics (econ.GN)
备注:

点击查看摘要

[AI-153] A Remarkably Efficient Paradigm to Multimodal Large Language Models for Sequential Recommendation

链接: https://arxiv.org/abs/2511.05885
作者: Qiyong Zhong,Jiajie Su,Ming Yang,Yunshan Ma,Xiaolin Zheng,Chaochao Chen
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-154] Unveiling Modality Bias: Automated Sample-Specific Analysis for Multimodal Misinformation Benchmarks

【速读】:该论文旨在解决多模态虚假信息检测基准中存在模态偏置(modality bias)的问题,即现有数据集倾向于依赖某一特定模态(如文本或图像)进行预测,导致检测模型可能仅基于单一模态做出判断,从而削弱其在真实场景下的泛化能力。解决方案的关键在于提出三种不同粒度的自动化偏置量化方法:粗粒度的模态收益评估、中粒度的信息流分析以及细粒度的因果关系解析,以实现对样本层面模态偏置的系统识别与测量。实验表明,融合多种视角可提升自动化分析的可靠性,且该方法能有效区分模态平衡样本与偏置样本,为未来多模态虚假信息检测研究提供了新的方向。

链接: https://arxiv.org/abs/2511.05883
作者: Hehai Lin,Hui Liu,Shilei Cao,Jing Li,Haoliang Li,Wenya Wang
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Numerous multimodal misinformation benchmarks exhibit bias toward specific modalities, allowing detectors to make predictions based solely on one modality. While previous research has quantified bias at the dataset level or manually identified spurious correlations between modalities and labels, these approaches lack meaningful insights at the sample level and struggle to scale to the vast amount of online information. In this paper, we investigate the design for automated recognition of modality bias at the sample level. Specifically, we propose three bias quantification methods based on theories/views of different levels of granularity: 1) a coarse-grained evaluation of modality benefit; 2) a medium-grained quantification of information flow; and 3) a fine-grained causality analysis. To verify the effectiveness, we conduct a human evaluation on two popular benchmarks. Experimental results reveal three interesting findings that provide potential direction toward future research: 1)~Ensembling multiple views is crucial for reliable automated analysis; 2)~Automated analysis is prone to detector-induced fluctuations; and 3)~Different views produce a higher agreement on modality-balanced samples but diverge on biased ones.
zh

[AI-155] Physics-Informed Neural Networks for Real-Time Gas Crossover Prediction in PEM Electrolyzers: First Application with Multi-Membrane Validation

【速读】:该论文旨在解决质子交换膜(PEM)水电解制氢过程中氢气渗透(hydrogen crossover)带来的安全风险与效率损失问题,其关键在于提出并应用物理信息神经网络(Physics-Informed Neural Networks, PINNs)进行实时预测。解决方案的核心是将质量守恒、菲克扩散定律(Fick’s diffusion law)和亨利溶解定律(Henry’s solubility law)嵌入到一个参数量仅为17,793的紧凑神经网络架构中,从而在保持高精度(R² = 99.84%,RMSE = 0.0348%)的同时实现亚毫秒级推理速度,显著优于纯数据驱动模型,并具备跨工况外推能力(如压力超出训练范围2.5倍时仍保持R² > 86%),支持从桌面CPU到边缘设备(如树莓派4)的硬件无关部署,为大规模绿氢设施提供实时安全监控的新范式。

链接: https://arxiv.org/abs/2511.05879
作者: Yong-Woon Kim,Chulung Kang,Yung-Cheol Byun
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Green hydrogen production via polymer electrolyte membrane (PEM) water electrolysis is pivotal for energy transition, yet hydrogen crossover through membranes threatens safety and economic viability-approaching explosive limits (4 mol% H _2 in O _2 ) while reducing Faradaic efficiency by 2.5%. Current physics-based models require extensive calibration and computational resources that preclude real-time implementation, while purely data-driven approaches fail to extrapolate beyond training conditions-critical for dynamic electrolyzer operation. Here we present the first application of physics-informed neural networks (PINNs) for hydrogen crossover prediction, integrating mass conservation, Fick’s diffusion law, and Henry’s solubility law within a compact architecture (17,793 parameters). Validated across six membranes under industrially relevant conditions (0.05-5.0 A/cm ^2 , 1-200 bar, 25-85°C), our PINN achieves exceptional accuracy (R ^2 = 99.84%, RMSE = 0.0348%) with sub-millisecond inference times suitable for real-time control. Remarkably, the model maintains R ^2 86% when predicting crossover at pressures 2.5x beyond training range-substantially outperforming pure neural networks (R ^2 = 43.4%). The hardware-agnostic deployment, from desktop CPUs to edge devices (Raspberry Pi 4), enables distributed safety monitoring essential for gigawatt-scale installations. By bridging physical rigor and computational efficiency, this work establishes a new paradigm for real-time electrolyzer monitoring, accelerating deployment of safe, efficient green hydrogen infrastructure crucial for net-zero emissions targets.
zh

[AI-156] An Empirical Study of Reasoning Steps in Thinking Code LLM s

链接: https://arxiv.org/abs/2511.05874
作者: Haoran Xue,Gias Uddin,Song Wang
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-157] Adaptation and Fine-tuning with TabPFN for Travelling Salesman Problem

链接: https://arxiv.org/abs/2511.05872
作者: Nguyen Gia Hien Vu,Yifan Tang,Rey Lim,Yifan Yang,Hang Ma,Ke Wang,G. Gary Wang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Combinatorics (math.CO)
备注:

点击查看摘要

[AI-158] EMOD: A Unified EEG Emotion Representation Framework Leverag ing V-A Guided Contrastive Learning

链接: https://arxiv.org/abs/2511.05863
作者: Yuning Chen,Sha Zhao,Shijian Li,Gang Pan
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-159] Predicting the Future by Retrieving the Past AAAI2026

【速读】:该论文旨在解决当前深度学习模型(如MLP、Transformer和TCN)在单变量时间序列预测中对全局历史信息利用不足的问题。这些模型虽然在训练过程中隐式地将历史信息压缩到参数中,但在推理阶段仅依赖局部滑动窗口内的上下文,无法动态访问全局历史模式,导致预测精度受限。解决方案的关键在于提出Predicting the Future by Retrieving the Past (PFRP),其核心创新是构建一个Global Memory Bank (GMB)用于存储和管理全局历史模式,并通过检索机制提取相似历史片段以生成全局预测,再与本地预测模型输出自适应融合,从而显著提升预测准确性和可解释性。

链接: https://arxiv.org/abs/2511.05859
作者: Dazhao Du,Tao Han,Song Guo
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Accepted by AAAI 2026

点击查看摘要

Abstract:Deep learning models such as MLP, Transformer, and TCN have achieved remarkable success in univariate time series forecasting, typically relying on sliding window samples from historical data for training. However, while these models implicitly compress historical information into their parameters during training, they are unable to explicitly and dynamically access this global knowledge during inference, relying only on the local context within the lookback window. This results in an underutilization of rich patterns from the global history. To bridge this gap, we propose Predicting the Future by Retrieving the Past (PFRP), a novel approach that explicitly integrates global historical data to enhance forecasting accuracy. Specifically, we construct a Global Memory Bank (GMB) to effectively store and manage global historical patterns. A retrieval mechanism is then employed to extract similar patterns from the GMB, enabling the generation of global predictions. By adaptively combining these global predictions with the outputs of any local prediction model, PFRP produces more accurate and interpretable forecasts. Extensive experiments conducted on seven real-world datasets demonstrate that PFRP significantly enhances the average performance of advanced univariate forecasting models by 8.4%. Codes can be found in this https URL.
zh

[AI-160] Can a Small Model Learn to Look Before It Leaps? Dynamic Learning and Proactive Correction for Hallucination Detection

链接: https://arxiv.org/abs/2511.05854
作者: Zepeng Bao,Shen Zhou,Qiankun Pi,Jianhao Chen,Mayi Xu,Ming Zhong,Yuanyuan Zhu,Tieyun Qian
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-161] Retrieval Quality at Context Limit

【速读】:该论文旨在解决大语言模型(Large Language Models, LLMs)在长上下文场景下存在“中间遗忘”(Lost in the Middle, LITM)现象的问题,即模型对位于长文本中间位置的事实性信息检索准确率显著下降。研究表明,Gemini 2.5 Flash 模型在面对“大海捞针”式问答任务时,无论目标信息处于文档的何种位置(包括接近输入上下文极限的情况),均能保持高准确率,表明其在长距离信息检索方面具有显著改进,关键在于该模型有效缓解甚至消除了传统 LLM 中存在的 LITM 效应。

链接: https://arxiv.org/abs/2511.05850
作者: Max McKinnon
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
备注: 3 pages, 0 figures

点击查看摘要

Abstract:The ability of large language models (LLMs) to recall and retrieve information from long contexts is critical for many real-world applications. Prior work (Liu et al., 2023) reported that LLMs suffer significant drops in retrieval accuracy for facts placed in the middle of large contexts, an effect known as “Lost in the Middle” (LITM). We find the model Gemini 2.5 Flash can answer needle-in-a-haystack questions with great accuracy regardless of document position including when the document is nearly at the input context limit. Our results suggest that the “Lost in the Middle” effect is not present for simple factoid Q\A in Gemini 2.5 Flash, indicating substantial improvements in long-context retrieval.
zh

[AI-162] EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph

【速读】:该论文旨在解决符号回归(Symbolic Regression)中因表达式搜索空间呈指数级增长而导致的计算效率低下问题。其核心挑战在于,大量语法不同但语义等价的表达式被传统算法视为独立候选解,造成冗余探索与学习缓慢。解决方案的关键在于引入统一框架EGG-SR,通过将等价图(equality graphs, e-graphs)集成到多种符号回归算法(如蒙特卡洛树搜索MCTS、深度强化学习DRL和大语言模型LLM)中,利用E-GG模块紧凑表示语义等价表达式集合,从而实现:在EGG-MCTS中剪枝冗余子树探索,在EGG-DRL中聚合等价类奖励以降低梯度方差,在EGG-LLM中增强反馈提示信息。理论分析表明,在合理假设下嵌入e-graph可收紧MCTS的累积遗憾边界并减少DRL梯度估计方差;实验验证了该方法在多个基准任务上显著优于现有最优方法,能发现更精确的物理规律表达式。

链接: https://arxiv.org/abs/2511.05849
作者: Nan Jiang,Ziyi Wang,Yexiang Xue
机构: 未知
类目: ymbolic Computation (cs.SC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:Symbolic regression seeks to uncover physical laws from experimental data by searching for closed-form expressions, which is an important task in AI-driven scientific discovery. Yet the exponential growth of the search space of expression renders the task computationally challenging. A promising yet underexplored direction for reducing the effective search space and accelerating training lies in symbolic equivalence: many expressions, although syntactically different, define the same function – for example, \log(x_1^2x_2^3) , \log(x_1^2)+\log(x_2^3) , and 2\log(x_1)+3\log(x_2) . Existing algorithms treat such variants as distinct outputs, leading to redundant exploration and slow learning. We introduce EGG-SR, a unified framework that integrates equality graphs (e-graphs) into diverse symbolic regression algorithms, including Monte Carlo Tree Search (MCTS), deep reinforcement learning (DRL), and large language models (LLMs). EGG-SR compactly represents equivalent expressions through the proposed EGG module, enabling more efficient learning by: (1) pruning redundant subtree exploration in EGG-MCTS, (2) aggregating rewards across equivalence classes in EGG-DRL, and (3) enriching feedback prompts in EGG-LLM. Under mild assumptions, we show that embedding e-graphs tightens the regret bound of MCTS and reduces the variance of the DRL gradient estimator. Empirically, EGG-SR consistently enhances multiple baselines across challenging benchmarks, discovering equations with lower normalized mean squared error than state-of-the-art methods. Code implementation is available at: this https URL.
zh

[AI-163] Policy Gradient-Based EMT-in-the-Loop Learning to Mitigate Sub-Synchronous Control Interactions

【速读】:该论文旨在解决由电网配置特定条件下控制参数(control gains)调谐不当所引发的次同步振荡(sub-synchronous oscillations, SSO)问题,尤其是次同步控制相互作用(sub-synchronous control interactions, SSCI)带来的稳定性风险。解决方案的关键在于构建一个基于强化学习(reinforcement learning, RL)的闭环自适应调参框架,采用受马尔可夫决策过程(Markov decision process, MDP)启发的深度策略梯度方法,并引入针对SSCI特征的信号处理模块(如下采样、带通滤波和基于振荡能量的奖励函数设计),从而实现对控制增益的在线动态调整,有效抑制由控制交互引起的次同步振荡。

链接: https://arxiv.org/abs/2511.05822
作者: Sayak Mukherjee,Ramij R. Hossain,Kaustav Chatterjee,Sameer Nekkalapu,Marcelo Elizondo
机构: 未知
类目: ystems and Control (eess.SY); Artificial Intelligence (cs.AI)
备注: 10 pages, 7 figures

点击查看摘要

Abstract:This paper explores the development of learning-based tunable control gains using EMT-in-the-loop simulation framework (e.g., PSCAD interfaced with Python-based learning modules) to address critical sub-synchronous oscillations. Since sub-synchronous control interactions (SSCI) arise from the mis-tuning of control gains under specific grid configurations, effective mitigation strategies require adaptive re-tuning of these gains. Such adaptiveness can be achieved by employing a closed-loop, learning-based framework that considers the grid conditions responsible for such sub-synchronous oscillations. This paper addresses this need by adopting methodologies inspired by Markov decision process (MDP) based reinforcement learning (RL), with a particular emphasis on simpler deep policy gradient methods with additional SSCI-specific signal processing modules such as down-sampling, bandpass filtering, and oscillation energy dependent reward computations. Our experimentation in a real-world event setting demonstrates that the deep policy gradient based trained policy can adaptively compute gain settings in response to varying grid conditions and optimally suppress control interaction-induced oscillations.
zh

[AI-164] WAR-Re: Web API Recommendation with Semantic Reasoning

【速读】:该论文旨在解决Web API推荐中的两个关键问题:一是现有方法采用固定的Top-N推荐策略,难以适应不同mashup对API数量需求的差异;二是推荐结果缺乏可解释性,即仅输出排序列表而无推荐依据,导致用户无法理解推荐逻辑。解决方案的关键在于提出WAR-Re模型,该模型基于大语言模型(Large Language Model, LLM)构建,通过引入特殊起始和终止标记(start and stop tokens)以动态调整推荐API的数量,并采用两阶段训练策略——监督微调与基于组相对策略优化(Group Relative Policy Optimization, GRPO)的强化学习,从而同时提升推荐准确性与生成语义推理理由的能力。实验表明,WAR-Re在ProgrammableWeb数据集上相较最先进基线模型推荐准确率最高提升21.59%,并能持续生成高质量的推荐解释。

链接: https://arxiv.org/abs/2511.05820
作者: Zishuo Xu,Dezhong Yao,Yao Wan
机构: 未知
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:With the development of cloud computing, the number of Web APIs has increased dramatically, further intensifying the demand for efficient Web API recommendation. Despite the demonstrated success of previous Web API recommendation solutions, two critical challenges persist: 1) a fixed top-N recommendation that cannot accommodate the varying API cardinality requirements of different mashups, and 2) these methods output only ranked API lists without accompanying reasons, depriving users of understanding the recommendation. To address these challenges, we propose WAR-Re, an LLM-based model for Web API recommendation with semantic reasoning for justification. WAR-Re leverages special start and stop tokens to handle the first challenge and uses two-stage training: supervised fine-tuning and reinforcement learning via Group Relative Policy Optimization (GRPO) to enhance the model’s ability in both tasks. Comprehensive experimental evaluations on the ProgrammableWeb dataset demonstrate that WAR-Re achieves a gain of up to 21.59% over the state-of-the-art baseline model in recommendation accuracy, while consistently producing high-quality semantic reasons for recommendations.
zh

[AI-165] In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading

【速读】:该论文旨在解决Mixture of Experts (MoE)模型在部署时面临的高内存占用问题,尤其是在GPU显存受限的边缘设备上难以高效运行的挑战。其核心解决方案在于深入研究MoE的专家激活模式与缓存行为,并提出基于LFU(Least Frequently Used)策略的缓存优化方法,显著优于传统LRU(Least Recently Used)缓存算法;同时引入推测性专家预取(speculative expert pre-fetching)机制,通过详细轨迹分析验证了其在提升系统性能方面的巨大潜力。此外,论文还系统揭示了门控网络和专家模块的行为特征,为未来MoE模型的可解释性研究及低损耗剪枝技术开发提供了重要依据。

链接: https://arxiv.org/abs/2511.05814
作者: Shuning Lin,Yifan He,Yitong Chen
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:In today’s landscape, Mixture of Experts (MoE) is a crucial architecture that has been used by many of the most advanced models. One of the major challenges of MoE models is that they usually require much more memory than their dense counterparts due to their unique architecture, and hence are harder to deploy in environments with limited GPU memory, such as edge devices. MoE offloading is a promising technique proposed to overcome this challenge, especially if it is enhanced with caching and pre-fetching, but prior work stopped at suboptimal caching algorithm and offered limited insights. In this work, we study MoE offloading in depth and make the following contributions: 1. We analyze the expert activation and LRU caching behavior in detail and provide traces. 2. We propose LFU caching optimization based on our analysis and obtain strong improvements from LRU. 3. We implement and experiment speculative expert pre-fetching, providing detailed trace showing its huge potential . 4. In addition, our study extensively covers the behavior of the MoE architecture itself, offering information on the characteristic of the gating network and experts. This can inspire future work on the interpretation of MoE models and the development of pruning techniques for MoE architecture with minimal performance loss.
zh

[AI-166] MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling

【速读】:该论文旨在解决使用FP8(Float8)格式训练大语言模型时面临的数值稳定性与效率瓶颈问题。FP8虽能显著提升训练效率,但其较低的数值精度易导致训练不稳定,现有框架依赖混合粒度量化(如对激活采用分组量化、对权重采用张量/块量化)和动态缩放策略来维持性能,然而这些方法引入了额外的反量化开销和低效的在线量化操作,削弱了FP8的优势。解决方案的关键在于提出MOSS框架,通过两项创新实现高效且稳定的FP8训练:一是两级微缩放策略(two-level microscaling),结合高精度全局缩放与紧凑的2的幂次局部缩放,在保证敏感激活精度的同时降低反量化成本;二是线性层权重的自动缩放机制,通过预测和调整缩放因子避免昂贵的最大值归约运算,从而消除对实时缩放的依赖。

链接: https://arxiv.org/abs/2511.05811
作者: Yu Zhang,Hui-Ling Zhen,Mingxuan Yuan,Bei Yu
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Training large language models with FP8 formats offers significant efficiency gains. However, the reduced numerical precision of FP8 poses challenges for stable and accurate training. Current frameworks preserve training performance using mixed-granularity quantization, i.e., applying per-group quantization for activations and per-tensor/block quantization for weights. While effective, per-group quantization requires scaling along the inner dimension of matrix multiplication, introducing additional dequantization overhead. Moreover, these frameworks often rely on just-in-time scaling to dynamically adjust scaling factors based on the current data distribution. However, this online quantization is inefficient for FP8 training, as it involves multiple memory reads and writes that negate the performance benefits of FP8. To overcome these limitations, we propose MOSS, a novel FP8 training framework that ensures both efficiency and numerical stability. MOSS introduces two key innovations: (1) a two-level microscaling strategy for quantizing sensitive activations, which balances precision and dequantization cost by combining a high-precision global scale with compact, power-of-two local scales; and (2) automatic scaling for weights in linear layers, which eliminates the need for costly max-reduction operations by predicting and adjusting scaling factors during training. Leveraging these techniques, MOSS enables efficient FP8 training of a 7B parameter model, achieving performance comparable to the BF16 baseline while achieving up to 34% higher training throughput.
zh

[AI-167] Measuring Model Performance in the Presence of an Intervention AAAI2026

链接: https://arxiv.org/abs/2511.05805
作者: Winston Chen,Michael W. Sjoding,Jenna Wiens
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: AAAI 2026

点击查看摘要

[AI-168] Beyond the Lower Bound: Bridging Regret Minimization and Best Arm Identification in Lexicographic Bandits AAAI2026

【速读】:该论文旨在解决具有层级偏好(hierarchical preferences)的多目标决策问题中,如何在兼顾最优臂识别(best arm identification)与遗憾最小化(regret minimization)双重目标下的样本效率优化问题。其关键解决方案在于提出两种基于淘汰机制(elimination-based)的算法:第一种按目标优先级逐层淘汰次优臂,实现与单目标最优算法相当的样本复杂度和遗憾界;第二种则在每轮中同时利用所有目标的奖励信息,有效挖掘跨目标依赖关系,显著优于单目标 bandit 问题的已知下界,从而揭示了多目标设置中跨目标信息共享的优势。

链接: https://arxiv.org/abs/2511.05802
作者: Bo Xue,Yuanyu Wan,Zhichao Lu,Qingfu Zhang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Accepted by AAAI 2026

点击查看摘要

Abstract:In multi-objective decision-making with hierarchical preferences, lexicographic bandits provide a natural framework for optimizing multiple objectives in a prioritized order. In this setting, a learner repeatedly selects arms and observes reward vectors, aiming to maximize the reward for the highest-priority objective, then the next, and so on. While previous studies have primarily focused on regret minimization, this work bridges the gap between \textitregret minimization and \textitbest arm identification under lexicographic preferences. We propose two elimination-based algorithms to address this joint objective. The first algorithm eliminates suboptimal arms sequentially, layer by layer, in accordance with the objective priorities, and achieves sample complexity and regret bounds comparable to those of the best single-objective algorithms. The second algorithm simultaneously leverages reward information from all objectives in each round, effectively exploiting cross-objective dependencies. Remarkably, it outperforms the known lower bound for the single-objective bandit problem, highlighting the benefit of cross-objective information sharing in the multi-objective setting. Empirical results further validate their superior performance over baselines.
zh

[AI-169] When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins

【速读】:该论文旨在解决第三方聊天机器人插件在实际部署中对提示注入攻击(prompt injection attacks)的防护不足问题,尤其是在面向非专业网站开发者的场景下,这些插件广泛用于构建客户服务平台,但其安全机制尚未被充分研究。解决方案的关键在于通过大规模实证分析揭示两类核心漏洞:一是8个插件未能保护对话历史记录的完整性,使攻击者可伪造包含虚假系统消息的上下文,从而将恶意指令成功率提升3至8倍;二是15个插件在引入网站内容时未区分可信与不可信数据源,导致间接提示注入风险,尤其在约13%的电商网站中已存在第三方内容暴露问题。研究通过可控实验验证了上述漏洞,并指出当前多数插件采用的安全实践削弱了大语言模型(LLM)内置的防御能力。

链接: https://arxiv.org/abs/2511.05797
作者: Yigitcan Kaya,Anton Landerer,Stijn Pletinckx,Michelle Zimmermann,Christopher Kruegel,Giovanni Vigna
机构: 未知
类目: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
备注: At IEEE SP 2026

点击查看摘要

Abstract:Prompt injection attacks pose a critical threat to large language models (LLMs), with prior work focusing on cutting-edge LLM applications like personal copilots. In contrast, simpler LLM applications, such as customer service chatbots, are widespread on the web, yet their security posture and exposure to such attacks remain poorly understood. These applications often rely on third-party chatbot plugins that act as intermediaries to commercial LLM APIs, offering non-expert website builders intuitive ways to customize chatbot behaviors. To bridge this gap, we present the first large-scale study of 17 third-party chatbot plugins used by over 10,000 public websites, uncovering previously unknown prompt injection risks in practice. First, 8 of these plugins (used by 8,000 websites) fail to enforce the integrity of the conversation history transmitted in network requests between the website visitor and the chatbot. This oversight amplifies the impact of direct prompt injection attacks by allowing adversaries to forge conversation histories (including fake system messages), boosting their ability to elicit unintended behavior (e.g., code generation) by 3 to 8x. Second, 15 plugins offer tools, such as web-scraping, to enrich the chatbot’s context with website-specific content. However, these tools do not distinguish the website’s trusted content (e.g., product descriptions) from untrusted, third-party content (e.g., customer reviews), introducing a risk of indirect prompt injection. Notably, we found that ~13% of e-commerce websites have already exposed their chatbots to third-party content. We systematically evaluate both vulnerabilities through controlled experiments grounded in real-world observations, focusing on factors such as system prompt design and the underlying LLM. Our findings show that many plugins adopt insecure practices that undermine the built-in LLM safeguards.
zh

[AI-170] VLAD-Grasp: Zero-shot Grasp Detection via Vision-Language Models

【速读】:该论文旨在解决机器人抓取(robotic grasping)任务中对大规模专家标注数据依赖性强以及缺乏零样本泛化能力的问题。传统方法通常需要针对新物体重新训练模型,限制了其在实际场景中的应用灵活性。解决方案的关键在于提出VLAD-Grasp——一种基于视觉语言模型(Vision-Language model, VLM)的零样本抓取检测方法:首先利用大模型生成一个目标图像(goal image),其中用直杆“刺入”物体以表示抗对称抓取(antipodal grasp);随后通过预测深度和分割信息将该目标图像升维至3D空间;最后借助主成分分析(PCA)与无对应关系优化算法对齐生成与观测到的点云,从而恢复可执行的抓取位姿。整个流程无需训练且不依赖特定抓取数据集,实现了对未见过的真实物体的零样本泛化性能。

链接: https://arxiv.org/abs/2511.05791
作者: Manav Kulshrestha,S. Talha Bukhari,Damon Conover,Aniket Bera
机构: 未知
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 8 pages, 4 figures, under review

点击查看摘要

Abstract:Robotic grasping is a fundamental capability for autonomous manipulation; however, most existing methods rely on large-scale expert annotations and necessitate retraining to handle new objects. We present VLAD-Grasp, a Vision-Language model Assisted zero-shot approach for Detecting grasps. From a single RGB-D image, our method (1) prompts a large vision-language model to generate a goal image where a straight rod “impales” the object, representing an antipodal grasp, (2) predicts depth and segmentation to lift this generated image into 3D, and (3) aligns generated and observed object point clouds via principal component analysis and correspondence-free optimization to recover an executable grasp pose. Unlike prior work, our approach is training-free and does not rely on curated grasp datasets. Despite this, VLAD-Grasp achieves performance that is competitive with or superior to that of state-of-the-art supervised models on the Cornell and Jacquard datasets. We further demonstrate zero-shot generalization to novel real-world objects on a Franka Research 3 robot, highlighting vision-language foundation models as powerful priors for robotic manipulation.
zh

[AI-171] SymLight: Exploring Interpretable and Deployable Symbolic Policies for Traffic Signal Control

【速读】:该论文旨在解决深度强化学习(Deep Reinforcement Learning)在交通信号控制(Traffic Signal Control, TSC)中因神经策略模型参数过多、缺乏透明性而导致可解释性差和难以部署于资源受限边缘设备的问题。解决方案的关键在于提出SymLight框架,其基于蒙特卡洛树搜索(Monte Carlo Tree Search, MCTS)来搜索具有内在可解释性和可部署性的符号优先级函数(symbolic priority function)。该优先级函数以交通特征为输入,输出各信号相位的优先级以指导相位切换;通过设计简洁而表达能力强的优先级函数表示形式,有效缓解MCTS中动作空间的组合爆炸问题,并引入概率结构回放策略,利用先前发现的高质量优先级函数的结构模式引导探索过程,从而在真实数据集上实现优于基线方法的性能,同时保证策略的可解释性与部署可行性。

链接: https://arxiv.org/abs/2511.05790
作者: Xiao-Cheng Liao,Yi Mei,Mengjie Zhang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Deep Reinforcement Learning have achieved significant success in automatically devising effective traffic signal control (TSC) policies. Neural policies, however, tend to be over-parameterized and non-transparent, hindering their interpretability and deployability on resource-limited edge devices. This work presents SymLight, a priority function search framework based on Monte Carlo Tree Search (MCTS) for discovering inherently interpretable and deployable symbolic priority functions to serve as the TSC policies. The priority function, in particular, accepts traffic features as input and then outputs a priority for each traffic signal phase, which subsequently directs the phase transition. For effective search, we propose a concise yet expressive priority function representation. This helps mitigate the combinatorial explosion of the action space in MCTS. Additionally, a probabilistic structural rollout strategy is introduced to leverage structural patterns from previously discovered high-quality priority functions, guiding the rollout process. Our experiments on real-world datasets demonstrate SymLight’s superior performance across a range of baselines. A key advantage is SymLight’s ability to produce interpretable and deployable TSC policies while maintaining excellent performance.
zh

[AI-172] Lived Experience in Dialogue: Co-designing Personalization in Large Language Models to Support Youth Mental Well-being

链接: https://arxiv.org/abs/2511.05769
作者: Kathleen W. Guan,Sarthak Giri,Mohammed Amara,Bernard J. Jansen,Enrico Liscio,Milena Esherick,Mohammed Al Owayyed,Ausrine Ratkute,Gayane Sedrakyan,Mark de Reuver,Joao Fernando Ferreira Goncalves,Caroline A. Figueroa
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-173] CoT-X: An Adaptive Framework for Cross-Model Chain-of-Thought Transfer and Optimization KDD2025

【速读】:该论文旨在解决链式思维(Chain-of-Thought, CoT)推理在大语言模型(Large Language Models, LLMs)中带来的显著推理开销问题,从而限制其在资源受限场景下的部署。解决方案的关键在于提出一种自适应的推理摘要框架,通过语义分割结合重要性评分对推理路径进行压缩,引入预算感知的动态压缩策略,并辅以连贯性重建机制,在保留关键推理步骤的同时大幅降低token消耗。该方法在医疗考试题库上验证了在相同token预算下比简单截断提升最高达40%的准确率,并展现出跨模型规模与架构的强迁移能力。

链接: https://arxiv.org/abs/2511.05747
作者: Ziqian Bi,Kaijie Chen,Tianyang Wang,Junfeng Hao,Xinyuan Song
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: TKDD 2025

点击查看摘要

Abstract:Chain-of-Thought (CoT) reasoning enhances the problem-solving ability of large language models (LLMs) but leads to substantial inference overhead, limiting deployment in resource-constrained settings. This paper investigates efficient CoT transfer across models of different scales and architectures through an adaptive reasoning summarization framework. The proposed method compresses reasoning traces via semantic segmentation with importance scoring, budget-aware dynamic compression, and coherence reconstruction, preserving critical reasoning steps while significantly reducing token usage. Experiments on 7,501 medical examination questions across 10 specialties show up to 40% higher accuracy than truncation under the same token budgets. Evaluations on 64 model pairs from eight LLMs (1.5B-32B parameters, including DeepSeek-R1 and Qwen3) confirm strong cross-model transferability. Furthermore, a Gaussian Process-based Bayesian optimization module reduces evaluation cost by 84% and reveals a power-law relationship between model size and cross-domain robustness. These results demonstrate that reasoning summarization provides a practical path toward efficient CoT transfer, enabling advanced reasoning under tight computational constraints. Code will be released upon publication.
zh

[AI-174] Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder

【速读】:该论文旨在解决稀疏自编码器(Sparse Autoencoders, SAEs)在大语言模型(Large Language Models, LLMs)解释性分析中面临的可扩展性难题:高维度隐藏层虽能提升解释性,但导致训练与推理成本过高。现有基于混合专家(Mixture of Experts, MoE)的方法试图通过分组专家网络降低计算开销,但其关键瓶颈在于专家之间缺乏特征专业化,常出现冗余或重复学习的现象。本文提出两个核心创新:一是多专家激活机制(Multiple Expert Activation),通过同时调用语义加权的专家子集来促进特征分工;二是特征缩放机制(Feature Scaling),利用自适应高频缩放增强特征多样性。实验表明,该方案相较现有MoE-SAE方法实现了24%更低的重构误差和99%的特征冗余减少,有效弥合了LLM分析中解释性与效率之间的鸿沟。

链接: https://arxiv.org/abs/2511.05745
作者: Zhen Xu,Zhen Tan,Song Wang,Kaidi Xu,Tianlong Chen
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting large language models (LLMs) by decomposing token activations into combinations of human-understandable features. While SAEs provide crucial insights into LLM explanations, their practical adoption faces a fundamental challenge: better interpretability demands that SAEs’ hidden layers have high dimensionality to satisfy sparsity constraints, resulting in prohibitive training and inference costs. Recent Mixture of Experts (MoE) approaches attempt to address this by partitioning SAEs into narrower expert networks with gated activation, thereby reducing computation. In a well-designed MoE, each expert should focus on learning a distinct set of features. However, we identify a \textitcritical limitation in MoE-SAE: Experts often fail to specialize, which means they frequently learn overlapping or identical features. To deal with it, we propose two key innovations: (1) Multiple Expert Activation that simultaneously engages semantically weighted expert subsets to encourage specialization, and (2) Feature Scaling that enhances diversity through adaptive high-frequency scaling. Experiments demonstrate a 24% lower reconstruction error and a 99% reduction in feature redundancy compared to existing MoE-SAE methods. This work bridges the interpretability-efficiency gap in LLM analysis, allowing transparent model inspection without compromising computational feasibility.
zh

[AI-175] Compressing Chemistry Reveals Functional Groups

【速读】:该论文旨在解决传统化学功能基团(functional group)在化学解释中的效用缺乏系统性评估的问题,尤其是其在分子数据压缩与生物活性预测中的表现。解决方案的关键在于引入基于最小消息长度(Minimum Message Length, MML)原则的无监督学习算法,通过搜索能够压缩约三百万个生物相关分子数据的子结构,发现既包含已知功能基团又包含具有更特异性功能的新颖大尺度模式;同时,在24个特定生物活性预测数据集上提取数据集特异的功能基团,并构建指纹特征用于回归建模,结果表明这些指纹显著优于MACCS和Morgan指纹等传统表示方法。

链接: https://arxiv.org/abs/2511.05728
作者: Ruben Sharma,Ross D. King
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
备注:

点击查看摘要

Abstract:We introduce the first formal large-scale assessment of the utility of traditional chemical functional groups as used in chemical explanations. Our assessment employs a fundamental principle from computational learning theory: a good explanation of data should also compress the data. We introduce an unsupervised learning algorithm based on the Minimum Message Length (MML) principle that searches for substructures that compress around three million biologically relevant molecules. We demonstrate that the discovered substructures contain most human-curated functional groups as well as novel larger patterns with more specific functions. We also run our algorithm on 24 specific bioactivity prediction datasets to discover dataset-specific functional groups. Fingerprints constructed from dataset-specific functional groups are shown to significantly outperform other fingerprint representations, including the MACCS and Morgan fingerprint, when training ridge regression models on bioactivity regression tasks.
zh

[AI-176] AdvisingWise: Supporting Academic Advising in Higher Educations Through a Human-in-the-Loop Multi-Agent Framework

【速读】:该论文旨在解决高校中学术顾问(academic advising)因师生比过高而导致支持响应不及时的问题,尤其是在学业高峰期。解决方案的关键在于提出并实现了一个名为AdvisingWise的多智能体系统(multi-agent system),该系统通过自动化信息检索与回复起草等重复性任务,在保持人类顾问最终审核的前提下,提升 advising 效率与个性化水平。其核心创新在于结合权威机构资源与自适应提示机制(adaptive prompting),确保生成内容既准确又贴合学生个体背景,从而实现人机协同(human-AI synergy)下的高质量学术指导服务。

链接: https://arxiv.org/abs/2511.05706
作者: Wendan Jiang,Shiyuan Wang,Hiba Eltigani,Rukhshan Haroon,Abdullah Bin Faisal,Fahad Dogar
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
备注: 18 pages, 6 figures

点击查看摘要

Abstract:Academic advising is critical to student success in higher education, yet high student-to-advisor ratios limit advisors’ capacity to provide timely support, particularly during peak periods. Recent advances in Large Language Models (LLMs) present opportunities to enhance the advising process. We present AdvisingWise, a multi-agent system that automates time-consuming tasks, such as information retrieval and response drafting, while preserving human oversight. AdvisingWise leverages authoritative institutional resources and adaptively prompts students about their academic backgrounds to generate reliable, personalized responses. All system responses undergo human advisor validation before delivery to students. We evaluate AdvisingWise through a mixed-methods approach: (1) expert evaluation on responses of 20 sample queries, (2) LLM-as-a-judge evaluation of the information retrieval strategy, and (3) a user study with 8 academic advisors to assess the system’s practical utility. Our evaluation shows that AdvisingWise produces accurate, personalized responses. Advisors reported increasingly positive perceptions after using AdvisingWise, as their initial concerns about reliability and personalization diminished. We conclude by discussing the implications of human-AI synergy on the practice of academic advising.
zh

[AI-177] SSTODE: Ocean-Atmosphere Physics-Informed Neural ODEs for Sea Surface Temperature Prediction AAAI

【速读】:该论文旨在解决当前数据驱动的海表温度(Sea Surface Temperature, SST)预测模型中存在的可解释性差和物理过程建模不足的问题,尤其是现有物理信息神经网络在复杂海洋-大气动力学模拟中难以准确刻画海水运动(如沿岸上升流)及外部热通量等驱动因素的影响。其解决方案的关键在于提出一种基于物理约束的神经微分方程框架——SSTODE,该方法首先从流体输运原理出发构建常微分方程(Ordinary Differential Equations, ODEs),显式融合平流与扩散项以描述海洋时空动态;其次通过变分优化恢复隐式速度场来精确刻画SST的时间演化机制,并进一步引入受能量收支方程启发的“能量交换积分器”(Energy Exchanges Integrator, EEI)模块,量化外部强迫因子(如湍流热通量)对SST变化的贡献,从而实现高精度且具物理一致性的SST预测。

链接: https://arxiv.org/abs/2511.05629
作者: Zheng Jiang,Wei Wang,Gaowei Zhang,Yi Wang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)
备注: To be published in the Proceedings of AAAI-AISI 2026

点击查看摘要

Abstract:Sea Surface Temperature (SST) is crucial for understanding upper-ocean thermal dynamics and ocean-atmosphere interactions, which have profound economic and social impacts. While data-driven models show promise in SST prediction, their black-box nature often limits interpretability and overlooks key physical processes. Recently, physics-informed neural networks have been gaining momentum but struggle with complex ocean-atmosphere dynamics due to 1) inadequate characterization of seawater movement (e.g., coastal upwelling) and 2) insufficient integration of external SST drivers (e.g., turbulent heat fluxes). To address these challenges, we propose SSTODE, a physics-informed Neural Ordinary Differential Equations (Neural ODEs) framework for SST prediction. First, we derive ODEs from fluid transport principles, incorporating both advection and diffusion to model ocean spatiotemporal dynamics. Through variational optimization, we recover a latent velocity field that explicitly governs the temporal dynamics of SST. Building upon ODE, we introduce an Energy Exchanges Integrator (EEI)-inspired by ocean heat budget equations-to account for external forcing factors. Thus, the variations in the components of these factors provide deeper insights into SST dynamics. Extensive experiments demonstrate that SSTODE achieves state-of-the-art performances in global and regional SST forecasting benchmarks. Furthermore, SSTODE visually reveals the impact of advection dynamics, thermal diffusion patterns, and diurnal heating-cooling cycles on SST evolution. These findings demonstrate the model’s interpretability and physical consistency.
zh

[AI-178] Unveiling the Training Dynamics of ReLU Networks through a Linear Lens

链接: https://arxiv.org/abs/2511.05628
作者: Longqing Ye
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[AI-179] Assessing the Reliability of Large Language Models in the Bengali Legal Context: A Comparative Evaluation Using LLM -as-Judge and Legal Experts

【速读】:该论文试图解决在孟加拉国获取法律帮助困难的问题,包括高昂的费用、复杂的法律语言、律师短缺以及数百万未解决的法院案件。其解决方案的关键在于评估生成式 AI(Generative AI)模型在提供法律咨询方面的潜力与风险,通过构建一个包含事实准确性、法律适当性、完整性与清晰度的双重评价框架,结合专家评审与自动化指标(如 BLEU 分数),系统性地检验主流大语言模型(如 GPT-4.1 Mini、Gemini 2.0 Flash、Llama 3 70B 和 DeepSeek R1)在真实法律问题上的响应质量。研究发现,尽管AI能生成结构良好且看似专业的回答,但常存在伪造判例引用、错误程序指引等严重误导信息,凸显了在部署前必须建立严格的专家验证机制和全面的安全保障措施。

链接: https://arxiv.org/abs/2511.05627
作者: Sabik Aftahee,A.F.M. Farhad,Arpita Mallik,Ratnajit Dhar,Jawadul Karim,Nahiyan Bin Noor,Ishmam Ahmed Solaiman
机构: 未知
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Accessing legal help in Bangladesh is hard. People face high fees, complex legal language, a shortage of lawyers, and millions of unresolved court cases. Generative AI models like OpenAI GPT-4.1 Mini, Gemini 2.0 Flash, Meta Llama 3 70B, and DeepSeek R1 could potentially democratize legal assistance by providing quick and affordable legal advice. In this study, we collected 250 authentic legal questions from the Facebook group “Know Your Rights,” where verified legal experts regularly provide authoritative answers. These questions were subsequently submitted to four four advanced AI models and responses were generated using a consistent, standardized prompt. A comprehensive dual evaluation framework was employed, in which a state-of-the-art LLM model served as a judge, assessing each AI-generated response across four critical dimensions: factual accuracy, legal appropriateness, completeness, and clarity. Following this, the same set of questions was evaluated by three licensed Bangladeshi legal professionals according to the same criteria. In addition, automated evaluation metrics, including BLEU scores, were applied to assess response similarity. Our findings reveal a complex landscape where AI models frequently generate high-quality, well-structured legal responses but also produce dangerous misinformation, including fabricated case citations, incorrect legal procedures, and potentially harmful advice. These results underscore the critical need for rigorous expert validation and comprehensive safeguards before AI systems can be safely deployed for legal consultation in Bangladesh.
zh

[AI-180] LLM s as Packagers of HPC Software

【速读】:该论文旨在解决高性能量子计算(High Performance Computing, HPC)软件生态中依赖管理的难题,即如何高效、准确地生成可维护的Spack包配置文件(Spack recipes),以支持科学应用对数百个外部依赖项的复杂构建需求。传统方法依赖人工编写和维护这些recipe,成本高昂且难以扩展。论文提出的解决方案核心在于设计并实现了一个名为SpackIt的端到端框架,其关键创新包括:基于代码仓库分析的上下文增强、相关示例检索机制以及通过诊断反馈进行迭代优化的结构化流程。实验证明,该方案将零样本场景下的安装成功率从20%提升至80%以上,显著优于单纯依赖大语言模型(Large Language Models, LLMs)的直接生成方式,验证了检索增强与反馈驱动在可靠包合成中的重要价值。

链接: https://arxiv.org/abs/2511.05626
作者: Caetano Melone,Daniel Nichols,Konstantinos Parasyris,Todd Gamblin,Harshitha Menon
机构: 未知
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
备注:

点击查看摘要

Abstract:High performance computing (HPC) software ecosystems are inherently heterogeneous, comprising scientific applications that depend on hundreds of external packages, each with distinct build systems, options, and dependency constraints. Tools such as Spack automate dependency resolution and environment management, but their effectiveness relies on manually written build recipes. As these ecosystems grow, maintaining existing specifications and creating new ones becomes increasingly labor-intensive. While large language models (LLMs) have shown promise in code generation, automatically producing correct and maintainable Spack recipes remains a significant challenge. We present a systematic analysis of how LLMs and context-augmentation methods can assist in the generation of Spack recipes. To this end, we introduce SpackIt, an end-to-end framework that combines repository analysis, retrieval of relevant examples, and iterative refinement through diagnostic feedback. We apply SpackIt to a representative subset of 308 open-source HPC packages to assess its effectiveness and limitations. Our results show that SpackIt increases installation success from 20% in a zero-shot setting to over 80% in its best configuration, demonstrating the value of retrieval and structured feedback for reliable package synthesis.
zh

[AI-181] Report from Workshop on Dialogue alongside Artificial Intelligence

【速读】:该论文旨在解决人工智能(AI)在教育对话(educational dialogue)中的应用边界与潜在风险问题,即如何在不削弱人类学习主体性与社会互动的基础上,合理利用AI促进深度学习和批判性思维。其解决方案的关键在于通过国际专家研讨明确三个核心问题:AI何时真正有助于教育、何种条件下能提升对话式教学效果,以及AI-人类协作是否可能替代或超越教育工作者的角色及其伦理后果。这为未来AI赋能教育提供了基于实证与政策反思的理论框架与实践指引。

链接: https://arxiv.org/abs/2511.05625
作者: Thomas J McKenna(Boston University),Ingvill Rasmussen(University of Oslo),Sten Ludvigsen(University of Oslo),Avivit Arvatz(The Hebrew University of Jerusalem),Christa Asterhan(The Hebrew University of Jerusalem),Gaowei Chen(The University of Hong Kong),Julie Cohen(University of Virginia),Michele Flammia(Independent Scholar),Dongkeun Han(University of Cambridge),Emma Hayward(University of Cambridge),Heather Hill(Harvard University),Yifat Kolikant(The Hebrew University of Jerusalem),Helen Lehndorf(Freie Universität Berlin),Kexin Li(The University of Hong Kong),Lindsay Clare Matsumura(University of Pittsburgh),Henrik Tjønn(University of Oslo),Pengjin Wang(The University of Hong Kong),Rupert Wegerif(University of Cambridge)
机构: 未知
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
备注: Report from the Workshop on Dialogue alongside Artificial Intelligence (2025)

点击查看摘要

Abstract:Educational dialogue -the collaborative exchange of ideas through talk- is widely recognized as a catalyst for deeper learning and critical thinking in and across contexts. At the same time, artificial intelligence (AI) has rapidly emerged as a powerful force in education, with the potential to address major challenges, personalize learning, and innovate teaching practices. However, these advances come with significant risks: rapid AI development can undermine human agency, exacerbate inequities, and outpace our capacity to guide its use with sound policy. Human learning presupposes cognitive efforts and social interaction (dialogues). In response to this evolving landscape, an international workshop titled “Educational Dialogue: Moving Thinking Forward” convened 19 leading researchers from 11 countries in Cambridge (September 1-3, 2025) to examine the intersection of AI and educational dialogue. This AI-focused strand of the workshop centered on three critical questions: (1) When is AI truly useful in education, and when might it merely replace human effort at the expense of learning? (2) Under what conditions can AI use lead to better dialogic teaching and learning? (3) Does the AI-human partnership risk outpacing and displacing human educational work, and what are the implications? These questions framed two days of presentations and structured dialogue among participants.
zh

[AI-182] Frequency Matters: When Time Series Foundation Models Fail Under Spectral Shift NEURIPS2025

【速读】:该论文旨在解决时间序列基础模型(Time Series Foundation Models, TSFMs)在工业场景中泛化能力不足的问题,特别是其在实际应用中表现不如领域适配的基线模型。研究表明,造成这一现象的关键因素是频谱偏移(spectral shift),即下游任务中的主导频率成分与预训练阶段所学习的频率分布不一致。解决方案的核心在于提升TSFM对频率特性的感知能力,通过设计受控的合成实验验证了频谱不匹配会导致系统性性能下降,从而提出应建立更注重频谱多样性的预训练和评估协议,以增强模型在真实工业环境中的鲁棒性。

链接: https://arxiv.org/abs/2511.05619
作者: Tianze Wang,Sofiane Ennadir,John Pertoft,Gabriela Zarzar Gandler,Lele Cao,Zineb Senane,Styliani Katsarou,Sahar Asadi,Axel Karlsson,Oleg Smirnov
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: Accepted and presented at NeurIPS 2025 Workshop on Recent Advances in Time Series Foundation Models (BERT2S)

点击查看摘要

Abstract:Time series foundation models (TSFMs) have shown strong results on public benchmarks, prompting comparisons to a “BERT moment” for time series. Their effectiveness in industrial settings, however, remains uncertain. We examine why TSFMs often struggle to generalize and highlight spectral shift (a mismatch between the dominant frequency components in downstream tasks and those represented during pretraining) as a key factor. We present evidence from an industrial-scale player engagement prediction task in mobile gaming, where TSFMs underperform domain-adapted baselines. To isolate the mechanism, we design controlled synthetic experiments contrasting signals with seen versus unseen frequency bands, observing systematic degradation under spectral mismatch. These findings position frequency awareness as critical for robust TSFM deployment and motivate new pretraining and evaluation protocols that explicitly account for spectral diversity.
zh

[AI-183] wa-hls4ml: A Benchmark and Surrogate Models for hls4ml Resource and Latency Estimation

【速读】:该论文旨在解决在硬件加速器设计中,随着生成式 AI (Generative AI) 和机器学习(ML)模型复杂度提升,传统设计流程中硬件综合(hardware synthesis)环节逐渐成为限制快速迭代的关键瓶颈问题。解决方案的关键在于构建一个名为 wa-hls4ml 的基准测试平台,其包含超过68万条全连接和卷积神经网络的资源与延迟数据集,所有模型均通过 hls4ml 工具链在 Xilinx FPGA 上合成。在此基础上,研究提出基于图神经网络(GNN)和 Transformer 架构的代理模型(surrogate models),用于高效预测 ML 加速器的资源占用与延迟性能,实验表明这些模型能够在合成测试集上以几百分比的误差准确估计第75百分位的资源使用情况,从而显著加速设计空间探索与优化过程。

链接: https://arxiv.org/abs/2511.05615
作者: Benjamin Hawks,Jason Weitz,Dmitri Demler,Karla Tame-Narvaez,Dennis Plotnikov,Mohammad Mehdi Rahimifar,Hamza Ezzaoui Rahali,Audrey C. Therrien,Donovan Sproule,Elham E Khoda,Keegan A. Smith,Russell Marroquin,Giuseppe Di Guglielmo,Nhan Tran,Javier Duarte,Vladimir Loncar
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Instrumentation and Detectors (physics.ins-det)
备注: 30 pages, 18 figures

点击查看摘要

Abstract:As machine learning (ML) is increasingly implemented in hardware to address real-time challenges in scientific applications, the development of advanced toolchains has significantly reduced the time required to iterate on various designs. These advancements have solved major obstacles, but also exposed new challenges. For example, processes that were not previously considered bottlenecks, such as hardware synthesis, are becoming limiting factors in the rapid iteration of designs. To mitigate these emerging constraints, multiple efforts have been undertaken to develop an ML-based surrogate model that estimates resource usage of ML accelerator architectures. We introduce wa-hls4ml, a benchmark for ML accelerator resource and latency estimation, and its corresponding initial dataset of over 680,000 fully connected and convolutional neural networks, all synthesized using hls4ml and targeting Xilinx FPGAs. The benchmark evaluates the performance of resource and latency predictors against several common ML model architectures, primarily originating from scientific domains, as exemplar models, and the average performance across a subset of the dataset. Additionally, we introduce GNN- and transformer-based surrogate models that predict latency and resources for ML accelerators. We present the architecture and performance of the models and find that the models generally predict latency and resources for the 75% percentile within several percent of the synthesized resources on the synthetic test dataset.
zh

[AI-184] An MLCommons Scientific Benchmarks Ontology

【速读】:该论文旨在解决科学机器学习(Scientific Machine Learning, SML)领域中基准测试(benchmarking)碎片化和缺乏标准化的问题,这导致了AI在关键科学应用场景中的创新路径不清晰、影响力难以量化。其解决方案的关键在于构建一个由社区驱动的统一本体(ontology),扩展MLCommons生态系统,覆盖物理、化学、材料科学、生物学、气候科学等多个学科,将分散的基准测试整合为一套涵盖科学级、应用级和系统级的分类体系,并通过开放提交流程与六类评级标准确保新基准的质量与可比性,从而提供一个标准化、可扩展且支持跨领域复现的科学机器学习基准框架。

链接: https://arxiv.org/abs/2511.05614
作者: Ben Hawks,Gregor von Laszewski,Matthew D. Sinclair,Marco Colombo,Shivaram Venkataraman,Rutwik Jain,Yiwei Jiang,Nhan Tran,Geoffrey Fox
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Performance (cs.PF); Computational Physics (physics.comp-ph)
备注: 16 Pages, 3 Figures

点击查看摘要

Abstract:Scientific machine learning research spans diverse domains and data modalities, yet existing benchmark efforts remain siloed and lack standardization. This makes novel and transformative applications of machine learning to critical scientific use-cases more fragmented and less clear in pathways to impact. This paper introduces an ontology for scientific benchmarking developed through a unified, community-driven effort that extends the MLCommons ecosystem to cover physics, chemistry, materials science, biology, climate science, and more. Building on prior initiatives such as XAI-BENCH, FastML Science Benchmarks, PDEBench, and the SciMLBench framework, our effort consolidates a large set of disparate benchmarks and frameworks into a single taxonomy of scientific, application, and system-level benchmarks. New benchmarks can be added through an open submission workflow coordinated by the MLCommons Science Working Group and evaluated against a six-category rating rubric that promotes and identifies high-quality benchmarks, enabling stakeholders to select benchmarks that meet their specific needs. The architecture is extensible, supporting future scientific and AI/ML motifs, and we discuss methods for identifying emerging computing patterns for unique scientific workloads. The MLCommons Science Benchmarks Ontology provides a standardized, scalable foundation for reproducible, cross-domain benchmarking in scientific machine learning. A companion webpage for this work has also been developed as the effort evolves: this https URL
zh

[AI-185] Who Evaluates AIs Social Impacts? Mapping Coverag e and Gaps in First and Third Party Evaluations

【速读】:该论文试图解决当前人工智能(AI)系统中社会影响评估(social impact assessment)的不均衡与不足问题,特别是针对偏见、公平性、隐私、环境成本及劳动实践等方面的评估在第一方(first-party)和第三方(third-party)报告之间存在显著差距。研究发现,第一方发布报告普遍稀疏且趋于表面化,尤其在环境影响和偏见方面呈下降趋势;而第三方评估虽覆盖更广、更具严谨性,但无法获取开发者独有的数据来源、内容审核劳动等关键信息。解决方案的关键在于构建一个由政策驱动的多方协作体系:一方面强化模型开发者的透明度义务,推动其主动披露核心社会影响指标;另一方面通过建立共享基础设施来整合并标准化第三方评估结果,从而形成对AI社会影响的全面、可比、可持续的评估机制。

链接: https://arxiv.org/abs/2511.05613
作者: Anka Reuel,Avijit Ghosh,Jenny Chim,Andrew Tran,Yanan Long,Jennifer Mickel,Usman Gohar,Srishti Yadav,Pawan Sasanka Ammanamanchi,Mowafak Allaham,Hossein A. Rahmani,Mubashara Akhtar,Felix Friedrich,Robert Scholz,Michael Alexander Riegler,Jan Batzner,Eliya Habba,Arushi Saxena,Anastassia Kornilova,Kevin Wei,Prajna Soni,Yohan Mathew,Kevin Klyman,Jeba Sania,Subramanyam Sahoo,Olivia Beyer Bruvik,Pouya Sadeghi,Sujata Goswami,Angelina Wang,Yacine Jernite,Zeerak Talat,Stella Biderman,Mykel Kochenderfer,Sanmi Koyejo,Irene Solaiman
机构: 未知
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:Foundation models are increasingly central to high-stakes AI systems, and governance frameworks now depend on evaluations to assess their risks and capabilities. Although general capability evaluations are widespread, social impact assessments covering bias, fairness, privacy, environmental costs, and labor practices remain uneven across the AI ecosystem. To characterize this landscape, we conduct the first comprehensive analysis of both first-party and third-party social impact evaluation reporting across a wide range of model developers. Our study examines 186 first-party release reports and 183 post-release evaluation sources, and complements this quantitative analysis with interviews of model developers. We find a clear division of evaluation labor: first-party reporting is sparse, often superficial, and has declined over time in key areas such as environmental impact and bias, while third-party evaluators including academic researchers, nonprofits, and independent organizations provide broader and more rigorous coverage of bias, harmful content, and performance disparities. However, this complementarity has limits. Only model developers can authoritatively report on data provenance, content moderation labor, financial costs, and training infrastructure, yet interviews reveal that these disclosures are often deprioritized unless tied to product adoption or regulatory compliance. Our findings indicate that current evaluation practices leave major gaps in assessing AI’s societal impacts, highlighting the urgent need for policies that promote developer transparency, strengthen independent evaluation ecosystems, and create shared infrastructure to aggregate and compare third-party evaluations in a consistent and accessible way.
zh

[AI-186] Conformal Prediction-Driven Adaptive Sampling for Digital Twins of Water Distribution Networks

【速读】:该论文旨在解决数字孪生(Digital Twin, DT)在供水管网(Water Distribution Networks, WDNs)中状态估计时,因传感器部署受限而难以实现精准监测的问题。传统均匀采样策略忽视了不同节点间不确定性差异,导致资源浪费。其解决方案的关键在于提出一种自适应框架,融合长短期记忆网络(LSTM)预测与保形预测(Conformal Prediction, CP)技术,通过边际CP方法量化每个节点的不确定性,并据此动态优化传感器部署位置,从而在保证高覆盖率的同时显著降低需求误差(实验显示在40%传感器覆盖率下误差降低33–34%),且仅需5–10%额外计算开销即可维持89.4–90.2%的经验覆盖水平。

链接: https://arxiv.org/abs/2511.05610
作者: Mohammadhossein Homaei,Oscar Mogollon Gutierrez,Ruben Molano,Andres Caro,Mar Avila
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
备注: 6 Pages, 7 tables, 1 Figure

点击查看摘要

Abstract:Digital Twins (DTs) for Water Distribution Networks (WDNs) require accurate state estimation with limited sensors. Uniform sampling often wastes resources across nodes with different uncertainty. We propose an adaptive framework combining LSTM forecasting and Conformal Prediction (CP) to estimate node-wise uncertainty and focus sensing on the most uncertain points. Marginal CP is used for its low computational cost, suitable for real-time DTs. Experiments on Hanoi, Net3, and CTOWN show 33-34% lower demand error than uniform sampling at 40% coverage and maintain 89.4-90.2% empirical coverage with only 5-10% extra computation.
zh

[AI-187] From Prompts to Power: Measuring the Energy Footprint of LLM Inference

【速读】:该论文旨在解决生成式 AI(Generative AI)在推理阶段(inference)的能源消耗问题,尤其是随着大语言模型(Large Language Models, LLMs)规模扩大,其推理能耗已显著超过训练阶段,并成为生命周期总能耗的主要部分。现有研究对推理能效缺乏系统性分析,限制了优化和可持续部署的可能。解决方案的关键在于通过大规模实测(超过32,500次测量)构建涵盖21种GPU配置与155种模型架构的数据集,利用vLLM推理引擎实现提示级别(prompt-level)的能量计量,并基于此建立一个可泛化的预测模型,能够准确估算未见过的模型架构与硬件组合下的推理能耗,最终以浏览器扩展形式落地,提升用户对生成式AI环境影响的认知。

链接: https://arxiv.org/abs/2511.05597
作者: Francisco Caravaca,Ángel Cuevas,Rubén Cuevas
机构: 未知
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:The rapid expansion of Large Language Models (LLMs) has introduced unprecedented energy demands, extending beyond training to large-scale inference workloads that often dominate total lifecycle consumption. Deploying these models requires energy-intensive GPU infrastructure, and in some cases has even prompted plans to power data centers with nuclear energy. Despite this growing relevance, systematic analyses of inference energy consumption remain limited. In this work, we present a large-scale measurement-based study comprising over 32,500 measurements across 21 GPU configurations and 155 model architectures, from small open-source models to frontier systems. Using the vLLM inference engine, we quantify energy usage at the prompt level and identify how architectural and operational factors shape energy demand. Building on these insights, we develop a predictive model that accurately estimates inference energy consumption across unseen architectures and hardware, and implement it as a browser extension to raise awareness of the environmental impact of generative AI.
zh

[AI-188] FlowNet: Modeling Dynamic Spatio-Temporal Systems via Flow Propagation

【速读】:该论文旨在解决现有方法在建模复杂动态时空系统时无法准确捕捉流驱动的相互依赖关系和情境敏感的交互动态的问题。传统图结构或注意力机制的方法多依赖于相似性驱动的连接假设,忽略了支配系统演化的非对称流交换过程。其解决方案的关键在于提出一种受物理启发的“时空流”(Spatio-Temporal Flow)范式,通过可量化的流转移显式建模动态节点耦合,并基于守恒定律约束状态传播;在此基础上设计的FlowNet架构利用流令牌(flow tokens)作为信息载体,通过流分配模块(Flow Allocation Modules)模拟源到目标的传递过程,确保状态重分布符合守恒律,同时借助自适应空间掩码模块(Adaptive Spatial Masking)动态调整交互半径以抑制噪声并实现上下文感知传播,从而显著提升模型的准确性、可扩展性和物理可解释性。

链接: https://arxiv.org/abs/2511.05595
作者: Yutong Feng,Xu Liu,Yutong Xia,Yuxuan Liang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Accurately modeling complex dynamic spatio-temporal systems requires capturing flow-mediated interdependencies and context-sensitive interaction dynamics. Existing methods, predominantly graph-based or attention-driven, rely on similarity-driven connectivity assumptions, neglecting asymmetric flow exchanges that govern system evolution. We propose Spatio-Temporal Flow, a physics-inspired paradigm that explicitly models dynamic node couplings through quantifiable flow transfers governed by conservation principles. Building on this, we design FlowNet, a novel architecture leveraging flow tokens as information carriers to simulate source-to-destination transfers via Flow Allocation Modules, ensuring state redistribution aligns with conservation laws. FlowNet dynamically adjusts the interaction radius through an Adaptive Spatial Masking module, suppressing irrelevant noise while enabling context-aware propagation. A cascaded architecture enhances scalability and nonlinear representation capacity. Experiments demonstrate that FlowNet significantly outperforms existing state-of-the-art approaches on seven metrics in the modeling of three real-world systems, validating its efficiency and physical interpretability. We establish a principled methodology for modeling complex systems through spatio-temporal flow interactions.
zh

[AI-189] CoPRIS: Efficient and Stable Reinforcement Learning via Concurrency-Controlled Partial Rollout with Importance Sampling

【速读】:该论文旨在解决大规模语言模型(Large Language Models, LLMs)在强化学习(Reinforcement Learning, RL)后训练过程中因完全同步机制导致的效率低下问题。现有RL系统需等待整个批次的轨迹(trajectory)完成才能进行下一步训练,长轨迹会显著拖慢整体流程并造成GPU资源闲置。解决方案的关键在于提出一种并发控制的部分轨迹滚动(Concurrency-Controlled Partial Rollout with Importance Sampling, CoPRIS)方法:通过固定数量的并发滚动任务、提前终止已收集足够样本的轨迹,并复用未完成轨迹以提升资源利用率;同时引入跨阶段重要性采样校正(Cross-stage Importance Sampling Correction),将前一策略的对数概率与当前策略重新计算的概率拼接用于重要性采样校正,从而缓解离策略轨迹带来的偏差。实验表明,CoPRIS在数学推理基准测试中实现最高1.94倍的训练加速,且性能相当或更优。

链接: https://arxiv.org/abs/2511.05589
作者: Zekai Qu,Yinxu Pan,Ao Sun,Chaojun Xiao,Xu Han
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: 13 pages, 4 figures

点击查看摘要

Abstract:Reinforcement learning (RL) post-training has become a trending paradigm for enhancing the capabilities of large language models (LLMs). Most existing RL systems for LLMs operate in a fully synchronous manner, where training must wait for the rollout of an entire batch to complete. This design leads to severe inefficiencies, as extremely long trajectories can stall the entire rollout process and leave many GPUs idle. To address this issue, we propose Concurrency- Controlled Partial Rollout with Importance Sampling (CoPRIS), which mitigates long-tail inefficiencies by maintaining a fixed number of concurrent rollouts, early-terminating once sufficient samples are collected, and reusing unfinished trajectories in subsequent rollouts. To mitigate the impact of off-policy trajectories, we introduce Cross-stage Importance Sampling Correction, which concatenates buffered log probabilities from the previous policy with those recomputed under the current policy for importance sampling correction. Experiments on challenging mathematical reasoning benchmarks show that CoPRIS achieves up to 1.94x faster training while maintaining comparable or superior performance to synchronous RL systems. The code of CoPRIS is available at this https URL.
zh

[AI-190] Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models

【速读】:该论文旨在解决掩码扩散语言模型(Masked Diffusion Models, MDMs)在推理过程中因未mask token的顺序选择不当而导致性能受限的问题。现有启发式方法(如基于置信度的采样)仅局部优化,无法利用额外的测试时计算资源,且早期解码错误会引发连锁反应。解决方案的关键在于提出Lookahead Unmasking (LookUM),其核心是将采样过程重新建模为在所有可能的unmasking顺序中进行路径选择,无需外部奖励模型;该框架包含两个组件:(i) 路径生成器从多个unmasking集合池中采样生成候选路径,(ii) 验证器通过计算路径不确定性并执行重要性采样来筛选最优路径。实验证明,错误的unmasking会显著增加序列级不确定性,而LookUM有效利用这一特性规避高风险轨迹,在六个基准任务上均实现稳定提升,且仅需2–3条路径即可达到最优性能,体现出高效且通用的路径选择机制。

链接: https://arxiv.org/abs/2511.05563
作者: Sanghyun Lee,Seungryong Kim,Jongho Park,Dongmin Park
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Masked Diffusion Models (MDMs) as language models generate by iteratively unmasking tokens, yet their performance crucially depends on the inference time order of unmasking. Prevailing heuristics, such as confidence based sampling, are myopic: they optimize locally, fail to leverage extra test-time compute, and let early decoding mistakes cascade. We propose Lookahead Unmasking (LookUM), which addresses these concerns by reformulating sampling as path selection over all possible unmasking orders without the need for an external reward model. Our framework couples (i) a path generator that proposes paths by sampling from pools of unmasking sets with (ii) a verifier that computes the uncertainty of the proposed paths and performs importance sampling to subsequently select the final paths. Empirically, erroneous unmasking measurably inflates sequence level uncertainty, and our method exploits this to avoid error-prone trajectories. We validate our framework across six benchmarks, such as mathematics, planning, and coding, and demonstrate consistent performance improvements. LookUM requires only two to three paths to achieve peak performance, demonstrating remarkably efficient path selection. The consistent improvements on both LLaDA and post-trained LLaDA 1.5 are particularly striking: base LLaDA with LookUM rivals the performance of RL-tuned LLaDA 1.5, while LookUM further enhances LLaDA 1.5 itself showing that uncertainty based verification provides orthogonal benefits to reinforcement learning and underscoring the versatility of our framework. Code will be publicly released.
zh

[AI-191] Effective Test-Time Scaling of Discrete Diffusion through Iterative Refinement

【速读】:该论文旨在解决离散扩散模型(discrete diffusion models)在测试阶段通过奖励引导生成时,缺乏有效scaling策略的问题。现有方法通常假设当前状态已与奖励分布对齐,仅指导后续转移过程,难以应对初始状态偏离最优分布的情况。其解决方案的关键在于提出一种名为Iterative Reward-Guided Refinement (IterRef)的新方法,该方法基于多重尝试马尔可夫链蒙特卡洛(Multiple-Try Metropolis, MTM)框架,通过奖励引导的加噪-去噪转换,在测试阶段对每个中间状态进行原位精炼(in situ refinement),逐步将状态推向最优的奖励对齐分布,从而显著提升生成质量,尤其在低计算预算下表现优于现有最先进基线。

链接: https://arxiv.org/abs/2511.05562
作者: Sanghyun Lee,Sunwoo Kim,Seungryong Kim,Jongho Park,Dongmin Park
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Test-time scaling through reward-guided generation remains largely unexplored for discrete diffusion models despite its potential as a promising alternative. In this work, we introduce Iterative Reward-Guided Refinement (IterRef), a novel test-time scaling method tailored to discrete diffusion that leverages reward- guided noising-denoising transitions to progressively refine misaligned intermediate states. We formalize this process within a Multiple-Try Metropolis (MTM) framework, proving convergence to the reward-aligned distribution. Unlike prior methods that assume the current state is already aligned with the reward distribution and only guide the subsequent transition, our approach explicitly refines each state in situ, progressively steering it toward the optimal intermediate distribution. Across both text and image domains, we evaluate IterRef on diverse discrete diffusion models and observe consistent improvements in reward-guided generation quality. In particular, IterRef achieves striking gains under low compute budgets, far surpassing prior state-of-the-art baselines.
zh

[AI-192] Diversified Flow Matching with Translation Identifiability

【速读】:该论文旨在解决无配对域翻译中因内容错位(content misalignment)导致的翻译不可识别性问题,其核心挑战在于如何在不依赖配对数据的情况下,构建一个统一的翻译函数(translation function),将多样化的源分布映射到目标分布。传统方法如生成对抗网络(GAN)虽能实现此目标,但存在训练不稳定且无法提供传输轨迹信息的局限,而这类轨迹对单细胞演化分析、机器人路径规划等应用至关重要。为此,论文提出了一种基于常微分方程(ODE)的新型框架——多样化流匹配(Diversified Flow Matching, DFM),其关键创新在于:通过设计一种双层优化训练损失、引入非线性插值器(nonlinear interpolant)以及结构重参数化方法,克服了流匹配(Flow Matching, FM)仅学习速度场而非直接建模翻译函数的本质限制,从而首次实现了保证翻译可识别性的ODE基方法。

链接: https://arxiv.org/abs/2511.05558
作者: Sagar Shrestha,Xiao Fu
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Diversified distribution matching (DDM) finds a unified translation function mapping a diverse collection of conditional source distributions to their target counterparts. DDM was proposed to resolve content misalignment issues in unpaired domain translation, achieving translation identifiability. However, DDM has only been implemented using GANs due to its constraints on the translation function. GANs are often unstable to train and do not provide the transport trajectory information – yet such trajectories are useful in applications such as single-cell evolution analysis and robot route planning. This work introduces diversified flow matching (DFM), an ODE-based framework for DDM. Adapting flow matching (FM) to enforce a unified translation function as in DDM is challenging, as FM learns the translation function’s velocity rather than the translation function itself. A custom bilevel optimization-based training loss, a nonlinear interpolant, and a structural reformulation are proposed to address these challenges, offering a tangible implementation. To our knowledge, DFM is the first ODE-based approach guaranteeing translation identifiability. Experiments on synthetic and real-world datasets validate the proposed method.
zh

[AI-193] Deep one-gate per layer networks with skip connections are universal classifiers

【速读】:该论文旨在解决如何将一个具有两层隐藏层的多层感知机(Multilayer Perceptron, MLP)有效转化为深度神经网络的问题,以提升模型的表达能力和训练效率。其解决方案的关键在于引入单门控层(one-gate layers)和跳跃连接(skip connections),从而在保持原有分类性能的基础上,增强网络的梯度流动性和深层结构的可训练性,实现从浅层MLP到深度神经网络的平滑过渡。

链接: https://arxiv.org/abs/2511.05552
作者: Raul Rojas
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: 5 pages, 6 figures

点击查看摘要

Abstract:This paper shows how a multilayer perceptron with two hidden layers, which has been designed to classify two classes of data points, can easily be transformed into a deep neural network with one-gate layers and skip connections.
zh

[AI-194] AGRAG : Advanced Graph-based Retrieval-Augmented Generation for LLM s

【速读】:该论文旨在解决图结构增强生成(Graph-based Retrieval-Augmented Generation, Graph-based RAG)中存在的三个关键问题:1)因大语言模型(Large Language Models, LLMs)幻觉导致的图构建不准确;2)由于缺乏显式推理路径,LLM无法有效解释为何选择特定文本块,从而削弱了推理能力;3)因LLM推理不足导致回答不完整,使得性能在某些任务上落后于朴素RAG(NaiveRAG)。解决方案的核心在于提出AGRAG框架:首先采用基于统计的方法替代LLM实体抽取以避免错误传播;其次将图推理过程建模为最小成本最大影响力(Minimum Cost Maximum Influence, MCMI)子图生成问题,通过引入节点影响力得分与边成本权衡,生成更全面的推理路径;该MCMI子图可作为显式推理依据引导LLM聚焦查询相关部分,减少噪声干扰,并支持包含环路等复杂结构,显著提升推理能力和答案完整性。

链接: https://arxiv.org/abs/2511.05549
作者: Yubo Wang,Haoyang Li,Fei Teng,Lei Chen
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
备注:

点击查看摘要

Abstract:Graph-based retrieval-augmented generation (Graph-based RAG) has demonstrated significant potential in enhancing Large Language Models (LLMs) with structured knowledge. However, existing methods face three critical challenges: Inaccurate Graph Construction, caused by LLM hallucination; Poor Reasoning Ability, caused by failing to generate explicit reasons telling LLM why certain chunks were selected; and Inadequate Answering, which only partially answers the query due to the inadequate LLM reasoning, making their performance lag behind NaiveRAG on certain tasks. To address these issues, we propose AGRAG, an advanced graph-based retrieval-augmented generation framework. When constructing the graph, AGRAG substitutes the widely used LLM entity extraction method with a statistics-based method, avoiding hallucination and error propagation. When retrieval, AGRAG formulates the graph reasoning procedure as the Minimum Cost Maximum Influence (MCMI) subgraph generation problem, where we try to include more nodes with high influence score, but with less involving edge cost, to make the generated reasoning paths more comprehensive. We prove this problem to be NP-hard, and propose a greedy algorithm to solve it. The MCMI subgraph generated can serve as explicit reasoning paths to tell LLM why certain chunks were retrieved, thereby making the LLM better focus on the query-related part contents of the chunks, reducing the impact of noise, and improving AGRAG’s reasoning ability. Furthermore, compared with the simple tree-structured reasoning paths, our MCMI subgraph can allow more complex graph structures, such as cycles, and improve the comprehensiveness of the generated reasoning paths.
zh

[AI-195] SMAGDi: Socratic Multi Agent Interaction Graph Distillation for Efficient High Accuracy Reasoning NEURIPS2025

【速读】:该论文旨在解决多智能体系统(Multi-agent Systems, MAS)在推理准确性上虽优于单模型,但因依赖多轮智能体间辩论而导致计算成本过高的问题。其解决方案的关键在于提出一种名为SMAGDi的蒸馏框架,通过将五智体Llama基础MAS中的辩论动态转化为有向交互图(directed interaction graphs),其中节点编码带正确性标签的中间推理步骤,边则捕捉推理连续性和跨智能体影响;学生模型在此基础上采用复合目标函数进行训练,融合语言建模、基于图的监督、对比推理和嵌入对齐,从而在保持推理结构的同时压缩模型规模——实验表明,该方法可将40B参数的多智能体系统压缩至6B参数的学生模型,同时保留88%的原始准确率,显著优于MAGDi、标准知识蒸馏(Knowledge Distillation, KD)及微调基线方法。

链接: https://arxiv.org/abs/2511.05528
作者: Aayush Aluru,Myra Malik,Samarth Patankar,Spencer Kim,Kevin Zhu,Sean O’Brien,Vasu Sharma
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: Multi-Turn Interactions in Large Language Models (MTI-LLM) Workshop at NeurIPS 2025

点击查看摘要

Abstract:Multi-agent systems (MAS) often achieve higher reasoning accuracy than single models, but their reliance on repeated debates across agents makes them computationally expensive. We introduce SMAGDi, a distillation framework that transfers the debate dynamics of a five-agent Llama-based MAS into a compact Socratic decomposer-solver student. SMAGDi represents debate traces as directed interaction graphs, where nodes encode intermediate reasoning steps with correctness labels and edges capture continuity and cross-agent influence. The student is trained with a composite objective combining language modeling, graph-based supervision, contrastive reasoning, and embedding alignment to preserve both fluency and structured reasoning. On StrategyQA and MMLU, SMAGDi compresses a 40B multi-agent system into a 6B student while retaining 88% of its accuracy, substantially outperforming prior distillation methods such as MAGDi, standard KD, and fine-tuned baselines. These results highlight that explicitly modeling interaction graphs and Socratic decomposition enable small models to inherit the accuracy benefits of multi-agent debate while remaining efficient enough for real-world deployment.
zh

[AI-196] Evidence-Bound Autonomous Research (EviBound): A Governance Framework for Eliminating False Claims

【速读】:该论文旨在解决大语言模型(Large Language Model, LLM)驱动的自主研究代理(autonomous research agents)中存在的虚假声明问题,即任务标记为“完成”但实际缺失关键产出物、指标矛盾或执行失败的情况。解决方案的核心是提出EviBound框架,通过双层治理门控机制实现证据约束的执行:预执行审批门(Approval Gate)在代码运行前验证接受标准模式,主动识别结构违规;后执行验证门(Verification Gate)则通过MLflow API查询产出物并递归路径检查,同时可选地验证指定指标。只有当任务具备可查询的运行ID、必要产出物及FINISHED状态时,声明方可传播,并辅以有限次数(通常1–2次)的受控重试机制避免无限循环。实验表明,该方法在8项基准任务中实现0%虚假声明率,显著优于仅依赖提示词(100%幻觉)或仅事后验证(25%幻觉)的基线方案。

链接: https://arxiv.org/abs/2511.05524
作者: Ruiying Chen
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: 27 pages, 11 figures, 5 tables. Reproducibility package with MLflow artifacts and Google Colab notebooks available upon publication

点击查看摘要

Abstract:LLM-based autonomous research agents report false claims: tasks marked “complete” despite missing artifacts, contradictory metrics, or failed executions. EviBound is an evidence-bound execution framework that eliminates false claims through dual governance gates requiring machine-checkable evidence. Two complementary gates enforce evidence requirements. The pre-execution Approval Gate validates acceptance criteria schemas before code runs, catching structural violations proactively. The post-execution Verification Gate validates artifacts via MLflow API queries (with recursive path checking) and optionally validates metrics when specified by acceptance criteria. Claims propagate only when backed by a queryable run ID, required artifacts, and FINISHED status. Bounded, confidence-gated retries (typically 1-2 attempts) recover from transient failures without unbounded loops. The framework was evaluated on 8 benchmark tasks spanning infrastructure validation, ML capabilities, and governance stress tests. Baseline A (Prompt-Level Only) yields 100% hallucination (8/8 claimed, 0/8 verified). Baseline B (Verification-Only) reduces hallucination to 25% (2/8 fail verification). EviBound (Dual Gates) achieves 0% hallucination: 7/8 tasks verified and 1 task correctly blocked at the approval gate, all with only approximately 8.3% execution overhead. This package includes execution trajectories, MLflow run IDs for all verified tasks, and a 4-step verification protocol. Research integrity is an architectural property, achieved through governance gates rather than emergent from model scale. Comments: 27 pages, 11 figures, 5 tables. Reproducibility package with MLflow artifacts and Google Colab notebooks available upon publication Subjects: Artificial Intelligence (cs.AI) Cite as: arXiv:2511.05524 [cs.AI] (or arXiv:2511.05524v1 [cs.AI] for this version) https://doi.org/10.48550/arXiv.2511.05524 Focus to learn more arXiv-issued DOI via DataCite Submission history From: Ruiying Chen [view email] [v1] Tue, 28 Oct 2025 17:47:13 UTC (1,811 KB)
zh

[AI-197] From Failure Modes to Reliability Awareness in Generative and Agent ic AI System

【速读】:该论文旨在解决生成式 AI (Generative AI) 和代理型 AI (Agentic AI) 系统中可靠性风险难以系统识别与管理的问题,尤其关注故障在多层架构中的传播机制及其对组织应对能力的影响。其解决方案的关键在于提出一个11层故障堆栈(failure stack)框架与意识映射(awareness mapping)方法的结合:前者用于结构化识别从硬件到自适应学习等各层级的脆弱性,后者则量化个体和组织对AI全栈可靠性风险的认知水平,并将其作为AI治理的战略输入,最终通过与以可靠性为中心的资产管理(Dependability-Centred Asset Management, DCAM)整合,实现对关键任务领域中可信且可持续AI部署的路径指引。

链接: https://arxiv.org/abs/2511.05511
作者: Janet(Jing)Lin,Liangwei Zhang
机构: 未知
类目: ystems and Control (eess.SY); Artificial Intelligence (cs.AI)
备注: 24pages

点击查看摘要

Abstract:This chapter bridges technical analysis and organizational preparedness by tracing the path from layered failure modes to reliability awareness in generative and agentic AI systems. We first introduce an 11-layer failure stack, a structured framework for identifying vulnerabilities ranging from hardware and power foundations to adaptive learning and agentic reasoning. Building on this, the chapter demonstrates how failures rarely occur in isolation but propagate across layers, creating cascading effects with systemic consequences. To complement this diagnostic lens, we develop the concept of awareness mapping: a maturity-oriented framework that quantifies how well individuals and organizations recognize reliability risks across the AI stack. Awareness is treated not only as a diagnostic score but also as a strategic input for AI governance, guiding improvement and resilience planning. By linking layered failures to awareness levels and further integrating this into Dependability-Centred Asset Management (DCAM), the chapter positions awareness mapping as both a measurement tool and a roadmap for trustworthy and sustainable AI deployment across mission-critical domains.
zh

[AI-198] Production-Grade Local LLM Inference on Apple Silicon: A Comparative Study of MLX MLC-LLM Ollama llama.cpp and PyTorch MPS

【速读】:该论文旨在系统性地评估五种面向Apple Silicon的本地大语言模型(Large Language Model, LLM)推理运行时(MLX、MLC-LLM、this http URL、Ollama 和 PyTorch MPS)在实际部署中的性能表现与设计权衡,以解决当前苹果生态下LLM本地化部署缺乏可比性基准和实践指导的问题。解决方案的关键在于通过统一实验环境(Mac Studio搭载M2 Ultra芯片及192 GB统一内存)和标准化评测指标(包括首次token时间延迟、稳态吞吐量、长上下文行为、量化支持、流式传输效率等),对各框架进行全面量化对比,从而揭示其在交互式任务和长文本处理场景下的适用性差异,并为开发者提供基于实证的选型建议。研究发现,尽管Apple Silicon方案在绝对性能上仍落后于NVIDIA GPU系统(如vLLM),但已逐步成熟为具备隐私保障、无需远程数据传输的生产级本地推理解决方案。

链接: https://arxiv.org/abs/2511.05502
作者: Varun Rajesh,Om Jodhpurkar,Pooja Anbuselvan,Mantinder Singh,Ashok Jallepali,Shantanu Godbole,Pradeep Kumar Sharma,Hritvik Shrivastava
机构: 未知
类目: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:We present a systematic, empirical evaluation of five local large language model (LLM) runtimes on Apple Silicon: MLX, MLC-LLM, this http URL, Ollama, and PyTorch MPS. Experiments were conducted on a Mac Studio equipped with an M2 Ultra processor and 192 GB of unified memory. Using the Qwen-2.5 model family across prompts ranging from a few hundred to 100,000 tokens, we measure time-to-first-token (TTFT), steady-state throughput, latency percentiles, long-context behavior (key-value and prompt caching), quantization support, streaming performance, batching and concurrency behavior, and deployment complexity. Under our settings, MLX achieves the highest sustained generation throughput, while MLC-LLM delivers consistently lower TTFT for moderate prompt sizes and offers stronger out-of-the-box inference features. this http URL is highly efficient for lightweight single-stream use, Ollama emphasizes developer ergonomics but lags in throughput and TTFT, and PyTorch MPS remains limited by memory constraints on large models and long contexts. All frameworks execute fully on-device with no telemetry, ensuring strong privacy guarantees. We release scripts, logs, and plots to reproduce all results. Our analysis clarifies the design trade-offs in Apple-centric LLM deployments and provides evidence-based recommendations for interactive and long-context processing. Although Apple Silicon inference frameworks still trail NVIDIA GPU-based systems such as vLLM in absolute performance, they are rapidly maturing into viable, production-grade solutions for private, on-device LLM inference. Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI) Cite as: arXiv:2511.05502 [cs.AR] (or arXiv:2511.05502v1 [cs.AR] for this version) https://doi.org/10.48550/arXiv.2511.05502 Focus to learn more arXiv-issued DOI via DataCite
zh

[AI-199] owards Ecologically Valid LLM Benchmarks: Understanding and Designing Domain-Centered Evaluations for Journalism Practitioners

【速读】:该论文旨在解决当前生成式 AI(Generative AI)基准测试在构念效度(construct validity)和生态效度(ecological validity)方面存在的问题,即现有基准测试是否真正衡量了模型声称的能力,以及评估结果是否能代表模型在实际应用中的表现。解决方案的关键在于采用以人为中心(human-centered)的方法,聚焦特定领域——新闻业,通过与23名新闻从业者的工作坊互动,提炼出该领域特有的挑战,并据此设计了一个面向新闻实践的领域导向型大语言模型(LLM)基准测试。这一方法不仅回应了基准测试有效性批评,还为开发更贴合具体应用场景的评测体系提供了可复用的设计指导。

链接: https://arxiv.org/abs/2511.05501
作者: Charlotte Li,Nick Hagar,Sachita Nishal,Jeremy Gilbert,Nick Diakopoulos
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
备注: 14 pages, 2 figures

点击查看摘要

Abstract:Benchmarks play a significant role in how researchers and the public understand generative AI systems. However, the widespread use of benchmark scores to communicate about model capabilities has led to criticisms of validity, especially whether benchmarks test what they claim to test (i.e. construct validity) and whether benchmark evaluations are representative of how models are used in the wild (i.e. ecological validity). In this work we explore how to create an LLM benchmark that addresses these issues by taking a human-centered approach. We focus on designing a domain-oriented benchmark for journalism practitioners, drawing on insights from a workshop of 23 journalism professionals. Our workshop findings surface specific challenges that inform benchmark design opportunities, which we instantiate in a case study that addresses underlying criticisms and specific domain concerns. Through our findings and design case study, this work provides design guidance for developing benchmarks that are better tuned to specific domains.
zh

[AI-200] Weightless Neural Networks for Continuously Trainable Personalized Recommendation Systems

【速读】:该论文旨在解决传统推荐系统在适应实时用户反馈时效率低下且缺乏推荐逻辑透明度的问题。传统推荐系统依赖于大规模分布式系统,并基于聚合用户数据进行预训练,导致新增数据时需耗费大量训练周期,难以及时响应用户变化。其解决方案的关键在于采用基于单用户数据训练的小型个人模型,利用无权重神经网络(Weightless Neural Networks, WNNs)实现持续学习——WNNs将神经网络视为状态机而非具有预训练权重的系统,从而避免了复杂的反向传播过程,提升了模型对个体用户行为的动态适应能力与可解释性。实验表明,该方法在MovieLens子集上达到了与标准协同过滤相当的准确率,同时具备更强的用户可调性和主观准确性潜力。

链接: https://arxiv.org/abs/2511.05499
作者: Rafayel Latif,Satwik Behera,Ali Al-Ebrahim
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:Given that conventional recommenders, while deeply effective, rely on large distributed systems pre-trained on aggregate user data, incorporating new data necessitates large training cycles, making them slow to adapt to real-time user feedback and often lacking transparency in recommendation rationale. We explore the performance of smaller personal models trained on per-user data using weightless neural networks (WNNs), an alternative to neural backpropagation that enable continuous learning by using neural networks as a state machine rather than a system with pretrained weights. We contrast our approach against a classic weighted system, also on a per-user level, and standard collaborative filtering, achieving competitive levels of accuracy on a subset of the MovieLens dataset. We close with a discussion of how weightless systems can be developed to augment centralized systems to achieve higher subjective accuracy through recommenders more directly tunable by end-users.
zh

[AI-201] Biomedical Hypothesis Explainability with Graph-Based Context Retrieval

【速读】:该论文旨在解决生物医学假设生成系统中解释性不足的问题,即如何使大型语言模型(Large Language Models, LLMs)生成的假设具备可解释性,并能基于真实世界科研约束提供可信的证据路径。解决方案的关键在于构建一个基于语义图谱的检索机制与受限数据训练策略相结合的框架——Hypothesis Generation Context Retriever(HGCR),并通过检索增强生成(Retrieval-Augmented Generation, RAG)将LLM与已发表科学文献中的上下文证据相融合;此外,引入一种新颖的反馈循环机制,迭代识别并修正LLM生成解释中的错误部分,从而持续优化证据路径和支撑背景,提升整体系统的可解释性和准确性。

链接: https://arxiv.org/abs/2511.05498
作者: Ilya Tyagin,Saeideh Valipour,Aliaksandra Sikirzhytskaya,Michael Shtutman,Ilya Safro
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
备注: 30 pages, 10 figures,

点击查看摘要

Abstract:We introduce an explainability method for biomedical hypothesis generation systems, built on top of the novel Hypothesis Generation Context Retriever framework. Our approach combines semantic graph-based retrieval and relevant data-restrictive training to simulate real-world discovery constraints. Integrated with large language models (LLMs) via retrieval-augmented generation, the system explains hypotheses with contextual evidence using published scientific literature. We also propose a novel feedback loop approach, which iteratively identifies and corrects flawed parts of LLM-generated explanations, refining both the evidence paths and supporting context. We demonstrate the performance of our method with multiple large language models and evaluate the explanation and context retrieval quality through both expert-curated assessment and large-scale automated analysis. Our code is available at: this https URL.
zh

[AI-202] DOCUEVAL: An LLM -based AI Engineering Tool for Building Customisable Document Evaluation Workflows

【速读】:该论文旨在解决基础模型(如大语言模型,LLMs)在实际评估流程中面临的可定制性、准确性与可扩展性挑战。其解决方案的关键在于提出DOCUEVAL——一个用于构建可定制文档评估工作流的AI工程工具,该工具支持高级文档处理、灵活的工作流设计,并允许用户定义基于理论的评审角色、指定评估标准、实验不同推理策略及选择评估风格;同时通过全面的日志记录、来源追溯和配置管理,确保评估过程的可追溯性与结果的系统性对比,从而有效应对评估者是否“足够好”以部署以及如何实证比较不同评估策略等核心软件工程问题。

链接: https://arxiv.org/abs/2511.05496
作者: Hao Zhang,Qinghua Lu,Liming Zhu
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Foundation models, such as large language models (LLMs), have the potential to streamline evaluation workflows and improve their performance. However, practical adoption faces challenges, such as customisability, accuracy, and scalability. In this paper, we present DOCUEVAL, an AI engineering tool for building customisable DOCUment EVALuation workflows. DOCUEVAL supports advanced document processing and customisable workflow design which allow users to define theory-grounded reviewer roles, specify evaluation criteria, experiment with different reasoning strategies and choose the assessment style. To ensure traceability, DOCUEVAL provides comprehensive logging of every run, along with source attribution and configuration management, allowing systematic comparison of results across alternative setups. By integrating these capabilities, DOCUEVAL directly addresses core software engineering challenges, including how to determine whether evaluators are “good enough” for deployment and how to empirically compare different evaluation strategies. We demonstrate the usefulness of DOCUEVAL through a real-world academic peer review case, showing how DOCUEVAL enables both the engineering of evaluators and scalable, reliable document evaluation.
zh

[AI-203] IMDMR: An Intelligent Multi-Dimensional Memory Retrieval System for Enhanced Conversational AI

【速读】:该论文旨在解决当前对话式人工智能(Conversational AI)系统在长时间交互中难以维持连贯且上下文相关的记忆问题,从而限制了个性化和情境相关响应的能力。其解决方案的关键在于提出一种名为IMDMR(Intelligent Multi-Dimensional Memory Retrieval)的新颖多维检索架构,该架构通过六维记忆维度——语义(semantic)、实体(entity)、类别(category)、意图(intent)、上下文(context)和时间(temporal)——实现全面的记忆检索能力,并结合智能查询处理、动态策略选择、跨记忆实体解析与高级记忆融合技术,显著提升了系统性能,在多项指标上优于现有基线方法(如LangChain RAG、LlamaIndex、MemGPT等),其中整体性能提升达3.8倍(0.792 vs. 0.207)。

链接: https://arxiv.org/abs/2511.05495
作者: Tejas Pawar,Sarika Patil,Om Tilekar,Rushikesh Janwade,Vaibhav Helambe
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
备注: 28 pages, 8 figures, submitted to arXiv for open access publication

点击查看摘要

Abstract:Conversational AI systems often struggle with maintaining coherent, contextual memory across extended interactions, limiting their ability to provide personalized and contextually relevant responses. This paper presents IMDMR (Intelligent Multi-Dimensional Memory Retrieval), a novel system that addresses these limitations through a multi-dimensional search architecture. Unlike existing memory systems that rely on single-dimensional approaches, IMDMR leverages six distinct memory dimensions-semantic, entity, category, intent, context, and temporal-to provide comprehensive memory retrieval capabilities. Our system incorporates intelligent query processing with dynamic strategy selection, cross-memory entity resolution, and advanced memory integration techniques. Through comprehensive evaluation against five baseline systems including LangChain RAG, LlamaIndex, MemGPT, and spaCy + RAG, IMDMR achieves a 3.8x improvement in overall performance (0.792 vs 0.207 for the best baseline). We present both simulated (0.314) and production (0.792) implementations, demonstrating the importance of real technology integration while maintaining superiority over all baseline systems. Ablation studies demonstrate the effectiveness of multi-dimensional search, with the full system outperforming individual dimension approaches by 23.3%. Query-type analysis reveals superior performance across all categories, particularly for preferences/interests (0.630) and goals/aspirations (0.630) queries. Comprehensive visualizations and statistical analysis confirm the significance of these improvements with p 0.001 across all metrics. The results establish IMDMR as a significant advancement in conversational AI memory systems, providing a robust foundation for enhanced user interactions and personalized experiences.
zh

[AI-204] Customized Retrieval-Augmented Generation with LLM for Debiasing Recommendation Unlearning ICDM2025

【速读】:该论文旨在解决现代推荐系统在遵守隐私法规(如“被遗忘权”)时面临的挑战:如何在移除特定用户数据的同时,不破坏其他用户的推荐效果。传统去学习(unlearning)方法通过局部模型更新实现这一目标,但会引入传播偏差(propagation bias),即因删除某用户数据而影响行为相似用户的推荐准确性;而完全重训练虽可消除偏差,却因计算成本过高难以应用于大规模系统。解决方案的关键在于提出CRAGRU框架,该框架基于检索增强生成(Retrieval-Augmented Generation, RAG)架构,将去学习过程解耦为独立的检索与生成阶段:检索阶段采用三种定制策略精准隔离目标用户的数据影响,减少对无关用户的干扰;生成阶段则利用大语言模型(LLM)结合用户画像嵌入提示词,重建个性化推荐结果,无需重新训练整个基础模型。实验证明,CRAGRU能有效实现用户特定去学习,在显著缓解传播偏差的同时保持与原模型相当的推荐性能。

链接: https://arxiv.org/abs/2511.05494
作者: Haichao Zhang,Chong Zhang,Peiyu Hu,Shi Qiu,Jia Wang
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
备注: 10 pages, 4 figures. Accepted ICDM 2025 (IEEE International Conference on Data Mining)

点击查看摘要

Abstract:Modern recommender systems face a critical challenge in complying with privacy regulations like the ‘right to be forgotten’: removing a user’s data without disrupting recommendations for others. Traditional unlearning methods address this by partial model updates, but introduce propagation bias–where unlearning one user’s data distorts recommendations for behaviorally similar users, degrading system accuracy. While retraining eliminates bias, it is computationally prohibitive for large-scale systems. To address this challenge, we propose CRAGRU, a novel framework leveraging Retrieval-Augmented Generation (RAG) for efficient, user-specific unlearning that mitigates bias while preserving recommendation quality. CRAGRU decouples unlearning into distinct retrieval and generation stages. In retrieval, we employ three tailored strategies designed to precisely isolate the target user’s data influence, minimizing collateral impact on unrelated users and enhancing unlearning efficiency. Subsequently, the generation stage utilizes an LLM, augmented with user profiles integrated into prompts, to reconstruct accurate and personalized recommendations without needing to retrain the entire base model. Experiments on three public datasets demonstrate that CRAGRU effectively unlearns targeted user data, significantly mitigating unlearning bias by preventing adverse impacts on non-target users, while maintaining recommendation performance comparable to fully trained original models. Our work highlights the promise of RAG-based architectures for building robust and privacy-preserving recommender systems. The source code is available at: this https URL.
zh

[AI-205] Machine-Learning Accelerated Calculations of Reduced Density Matrices

【速读】:该论文旨在解决强关联体系中n-粒子约化密度矩阵(n-particle reduced density matrices, n-RDMs)计算效率低的问题,尤其是在系统尺寸较大时。其核心挑战在于传统方法难以高效处理大尺度系统的n-RDMs,限制了对关联物态的深入研究。解决方案的关键在于利用神经网络(Neural Network, NN)架构对平滑且可插值的n-RDM进行加速计算与预测:首先基于n-RDM在布里渊区(Brillouin zone, BZ)上通常为光滑函数的物理特性(尤其适用于能隙体系),设计两种NN模型——一种是自注意力网络用于将随机RDM映射为物理上合理的RDM,另一种是正弦表示网络(SIREN)直接从动量空间坐标映射到RDM值;实验表明,训练于小尺寸网格(如6×6)的SIREN可高精度预测大尺寸(如18×18)的配对关联函数,并显著减少大系统哈特里-福克(Hartree-Fock, HF)迭代求解所需的迭代次数(最多降低92.78%),从而为强关联物态研究提供了一种高效、可推广的新范式。

链接: https://arxiv.org/abs/2511.07367
作者: Awwab A. Azam,Lexu Zhao,Jiabin Yu
机构: 未知
类目: rongly Correlated Electrons (cond-mat.str-el); Artificial Intelligence (cs.AI)
备注: 10+32 pages, 6+4 figures, 1+6 tables

点击查看摘要

Abstract: n -particle reduced density matrices ( n -RDMs) play a central role in understanding correlated phases of matter. Yet the calculation of n -RDMs is often computationally inefficient for strongly-correlated states, particularly when the system sizes are large. In this work, we propose to use neural network (NN) architectures to accelerate the calculation of, and even predict, the n -RDMs for large-size systems. The underlying intuition is that n -RDMs are often smooth functions over the Brillouin zone (BZ) (certainly true for gapped states) and are thus interpolable, allowing NNs trained on small-size n -RDMs to predict large-size ones. Building on this intuition, we devise two NNs: (i) a self-attention NN that maps random RDMs to physical ones, and (ii) a Sinusoidal Representation Network (SIREN) that directly maps momentum-space coordinates to RDM values. We test the NNs in three 2D models: the pair-pair correlation functions of the Richardson model of superconductivity, the translationally-invariant 1-RDM in a four-band model with short-range repulsion, and the translation-breaking 1-RDM in the half-filled Hubbard model. We find that a SIREN trained on a 6\times 6 momentum mesh can predict the 18\times 18 pair-pair correlation function with a relative accuracy of 0.839 . The NNs trained on 6\times 6 \sim 8\times 8 meshes can provide high-quality initial guesses for 50\times 50 translation-invariant Hartree-Fock (HF) and 30\times 30 fully translation-breaking-allowed HF, reducing the number of iterations required for convergence by up to 91.63% and 92.78% , respectively, compared to random initializations. Our results illustrate the potential of using NN-based methods for interpolable n -RDMs, which might open a new avenue for future research on strongly correlated phases.
zh

[AI-206] Sample-efficient quantum error mitigation via classical learning surrogates

【速读】:该论文旨在解决近中期量子处理器因固有噪声导致计算保真度下降的问题,特别是针对量子误差缓解(Quantum Error Mitigation, QEM)技术中测量开销过大的挑战,尤其是在处理由经典输入参数化的量子电路族时。其解决方案的关键在于提出了一种基于代理模型的零噪声外推法(Surrogate-enabled Zero-Noise Extrapolation, S-ZNE),该方法利用经典学习代理模型在经典侧完成整个量子电路族的误差缓解,从而将原本随电路数量线性增长的测量开销降低至恒定水平,显著提升了可扩展性。理论分析与数值实验表明,S-ZNE在多数实际场景下可达到与传统ZNE相当的精度,为其他QEM协议的扩展提供了通用框架。

链接: https://arxiv.org/abs/2511.07092
作者: Wei-You Liao,Ge Yan,Yujin Song,Tian-Ci Tian,Wei-Ming Zhu,De-Tao Jiang,Yuxuan Du,He-Liang Huang
机构: 未知
类目: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 26 pages, 8 figures

点击查看摘要

Abstract:The pursuit of practical quantum utility on near-term quantum processors is critically challenged by their inherent noise. Quantum error mitigation (QEM) techniques are leading solutions to improve computation fidelity with relatively low qubit-overhead, while full-scale quantum error correction remains a distant goal. However, QEM techniques incur substantial measurement overheads, especially when applied to families of quantum circuits parameterized by classical inputs. Focusing on zero-noise extrapolation (ZNE), a widely adopted QEM technique, here we devise the surrogate-enabled ZNE (S-ZNE), which leverages classical learning surrogates to perform ZNE entirely on the classical side. Unlike conventional ZNE, whose measurement cost scales linearly with the number of circuits, S-ZNE requires only constant measurement overhead for an entire family of quantum circuits, offering superior scalability. Theoretical analysis indicates that S-ZNE achieves accuracy comparable to conventional ZNE in many practical scenarios, and numerical experiments on up to 100-qubit ground-state energy and quantum metrology tasks confirm its effectiveness. Our approach provides a template that can be effectively extended to other quantum error mitigation protocols, opening a promising path toward scalable error mitigation.
zh

[AI-207] Deep learning EPI-TIRF cross-modality enables background subtraction and axial super-resolution for widefield fluorescence microscopy

【速读】:该论文旨在解决宽场荧光显微成像中因轴向分辨率低而导致的离焦背景干扰问题,尤其是在密集标记的生物样本中。解决方案的关键在于提出了一种基于深度学习的跨模态网络ET2dNet,其采用物理信息引导的混合架构,结合监督学习与已注册的宽场-全内反射荧光(EPI-TIRF)图像对,并通过点扩散函数(point spread function, PSF)卷积实现自监督物理建模,从而在不依赖硬件改造的情况下,从单张宽场图像中实现接近TIRF的背景抑制和轴向超分辨效果。该框架具备优异的泛化能力,可适应不同物镜并支持少样本迁移,同时为三维重构进一步发展出ET3dNet,有效去除体数据中的离焦信号,显著提升轴向超分辨成像的可及性与实用性。

链接: https://arxiv.org/abs/2511.06853
作者: Qiushi Li,Celi Lou,Yanfang Cheng,Bilang Gong,Xinlin Chen,Hao Chen,Baowan Li,Jieli Wang,Yulin Wang,Sipeng Yang,Yunqing Tang,Luru Dai
机构: 未知
类目: Optics (physics.optics); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:The resolving ability of wide-field fluorescence microscopy is fundamentally limited by out-of-focus background owing to its low axial resolution, particularly for densely labeled biological samples. To address this, we developed ET2dNet, a deep learning-based EPI-TIRF cross-modality network that achieves TIRF-comparable background subtraction and axial super-resolution from a single wide-field image without requiring hardware modifications. The model employs a physics-informed hybrid architecture, synergizing supervised learning with registered EPI-TIRF image pairs and self-supervised physical modeling via convolution with the point spread function. This framework ensures exceptional generalization across microscope objectives, enabling few-shot adaptation to new imaging setups. Rigorous validation on cellular and tissue samples confirms ET2dNet’s superiority in background suppression and axial resolution enhancement, while maintaining compatibility with deconvolution techniques for lateral resolution improvement. Furthermore, by extending this paradigm through knowledge distillation, we developed ET3dNet, a dedicated three-dimensional reconstruction network that produces artifact-reduced volumetric results. ET3dNet effectively removes out-of-focus background signals even when the input image stack lacks the source of background. This framework makes axial super-resolution imaging more accessible by providing an easy-to-deploy algorithm that avoids additional hardware costs and complexity, showing great potential for live cell studies and clinical histopathology.
zh

[AI-208] Diagnosing and Breaking Amplitude Suppression in Seismic Phase Picking Through Adversarial Shape Learning

【速读】:该论文试图解决深度学习在地震相位拾取(seismic phase picking)中一个长期存在的悖论:尽管生成式 AI(Generative AI)模型能够高精度预测 P 波,但对 S 波的振幅预测始终低于检测阈值,表现为持续的振幅抑制现象。研究通过分析训练历史和损失函数的几何结构,识别出三个相互作用的因素:S 波初至时刻具有较高的时间不确定性;卷积神经网络(CNN)倾向于关注高振幅边界而非微弱初至点;逐点二分类交叉熵(Binary Cross-Entropy, BCE)损失缺乏横向校正力,仅提供垂直梯度,导致振幅被抑制而时间间隔未收敛。解决方案的关键在于提出“先形状后对齐”(shape-then-align)策略——即在时间对齐前先构建稳定的几何模板以约束预测形态。作者采用条件生成对抗网络(conditional GAN)框架,在传统 BCE 训练基础上引入判别器模块,强制施加形状约束,从而在 10,000 步训练后实现有效 S 相位检测率提升 64%。该方法无需先验假设即可自动发现目标几何特征,为需要精确对齐细微结构与主导结构的分割任务提供了通用解决方案。

链接: https://arxiv.org/abs/2511.06731
作者: Chun-Ming Huang,Li-Heng Chang,I-Hsin Chang,An-Sheng Lee,Hao Kuo-Chen
机构: 未知
类目: Geophysics (physics.geo-ph); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Deep learning has revolutionized seismic phase picking, yet a paradox persists: high signal-to-noise S-wave predictions consistently fail to cross detection thresholds, oscillating at suppressed amplitudes. We identify this previously unexplained phenomenon as amplitude suppression, which we diagnose through analyzing training histories and loss landscapes. Three interacting factors emerge: S-wave onsets exhibit high temporal uncertainty relative to high-amplitude boundaries; CNN’s bias toward sharp amplitude changes anchors predictions to these boundaries rather than subtle onsets; and point-wise Binary Cross-Entropy (BCE) loss lacks lateral corrective forces, providing only vertical gradients that suppress amplitude while temporal gaps persist. This geometric trap points to a shape-then-align solution where stable geometric templates must precede temporal alignment. We implement this through a conditional GAN framework by augmenting conventional BCE training with a discriminator that enforces shape constraints. Training for 10,000 steps, this achieves a 64% increase in effective S-phase detections. Our framework autonomously discovers target geometry without a priori assumptions, offering a generalizable solution for segmentation tasks requiring precise alignment of subtle features near dominant structures.
zh

[AI-209] SPUR: A Plug-and-Play Framework for Integrating Spatial Audio Understanding and Reasoning into Large Audio-Language Models

【速读】:该论文旨在解决当前大型音频语言模型(Large Audio-Language Models, LALMs)普遍依赖单声道输入、缺乏空间感知能力的问题,即无法有效捕捉声音的方向、仰角和距离等空间线索。解决方案的关键在于提出一种轻量级、可插拔的框架SPUR,其核心包括:(i) 采用一阶全向声场(First-Order Ambisonics, FOA)编码器将(W, X, Y, Z)四通道信号映射为旋转不变、以听者为中心的空间特征,并通过多模态适配器集成到目标LALMs中;(ii) 构建SPUR-Set数据集,结合开源FOA录音与受控模拟场景,聚焦相对方向、仰角、距离及重叠关系,用于监督空间推理训练。该方法在不显著改变原模型架构的前提下,显著提升了模型的空间问答(Spatial QA)能力和多说话人定位性能,同时保持了通用音频理解能力。

链接: https://arxiv.org/abs/2511.06606
作者: S Sakshi,Vaibhavi Lokegaonkar,Neil Zhang,Ramani Duraiswami,Sreyan Ghosh,Dinesh Manocha,Lie Lu
机构: 未知
类目: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
备注: Project: this https URL

点击查看摘要

Abstract:Spatial perception is central to auditory intelligence, enabling accurate understanding of real-world acoustic scenes and advancing human-level perception of the world around us. While recent large audio-language models (LALMs) show strong reasoning over complex audios, most operate on monaural inputs and lack the ability to capture spatial cues such as direction, elevation, and distance. We introduce SPUR, a lightweight, plug-in approach that equips LALMs with spatial perception through minimal architectural changes. SPUR consists of: (i) a First-Order Ambisonics (FOA) encoder that maps (W, X, Y, Z) channels to rotation-aware, listener-centric spatial features, integrated into target LALMs via a multimodal adapter; and (ii) SPUR-Set, a spatial QA dataset combining open-source FOA recordings with controlled simulations, emphasizing relative direction, elevation, distance, and overlap for supervised spatial reasoning. Fine-tuning our model on the SPUR-Set consistently improves spatial QA and multi-speaker attribution while preserving general audio understanding. SPUR provides a simple recipe that transforms monaural LALMs into spatially aware models. Extensive ablations validate the effectiveness of our approach.
zh

[AI-210] A PDE Perspective on Generative Diffusion Models

【速读】:该论文旨在解决生成式 AI 中基于得分的扩散模型(score-based diffusion models)的数学基础不完善问题,特别是其背后的随机微分方程(SDE)和偏微分方程(PDE)动力学在稳定性与一致性方面的理论缺口。解决方案的关键在于构建一个严格的偏微分方程框架,利用 Li–Yau 微分不等式对热流的分析,证明了得分驱动的 Fokker–Planck 动力学的适定性(well-posedness),并推导出精确的 LpL^p-稳定性估计;进一步通过熵稳定方法表明,在反向时间演化中,扩散轨迹在紧支集数据分布下以 t\sqrt{t} 的速率收敛至数据流形,从而为扩散路径在精确得分引导下返回数据流形并保持模仿保真度提供了理论保障。这一框架还为模型设计提供了可遵循的原则,如得分函数构造、损失函数定义及停止时间选择,实现了生成能力与模仿保真度之间定量权衡的统一数学刻画。

链接: https://arxiv.org/abs/2511.05940
作者: Kang Liu,Enrique Zuazua
机构: 未知
类目: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Analysis of PDEs (math.AP)
备注: 30 pages, 4 figures

点击查看摘要

Abstract:Score-based diffusion models have emerged as a powerful class of generative methods, achieving state-of-the-art performance across diverse domains. Despite their empirical success, the mathematical foundations of those models remain only partially understood, particularly regarding the stability and consistency of the underlying stochastic and partial differential equations governing their dynamics. In this work, we develop a rigorous partial differential equation (PDE) framework for score-based diffusion processes. Building on the Li–Yau differential inequality for the heat flow, we prove well-posedness and derive sharp L^p -stability estimates for the associated score-based Fokker–Planck dynamics, providing a mathematically consistent description of their temporal evolution. Through entropy stability methods, we further show that the reverse-time dynamics of diffusion models concentrate on the data manifold for compactly supported data distributions and a broad class of initialization schemes, with a concentration rate of order \sqrtt as t \to 0 . These results yield a theoretical guarantee that, under exact score guidance, diffusion trajectories return to the data manifold while preserving imitation fidelity. Our findings also provide practical insights for designing diffusion models, including principled criteria for score-function construction, loss formulation, and stopping-time selection. Altogether, this framework provides a quantitative understanding of the trade-off between generative capacity and imitation fidelity, bridging rigorous analysis and model design within a unified mathematical perspective. Comments: 30 pages, 4 figures Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Analysis of PDEs (math.AP) MSC classes: 34D05, 35B35, 35Q68, 35Q84, 68T99 Cite as: arXiv:2511.05940 [math.OC] (or arXiv:2511.05940v1 [math.OC] for this version) https://doi.org/10.48550/arXiv.2511.05940 Focus to learn more arXiv-issued DOI via DataCite (pending registration)
zh

[AI-211] IoT-based Fresh Produce Supply Chain Under Uncertainty: An Adaptive Optimization Framework

【速读】:该论文旨在解决果蔬物流系统中因高易腐性、供应波动、严格的质量安全标准及环境敏感性所导致的复杂配送难题,尤其是温度变化对产品货架期的影响。解决方案的关键在于提出一种自适应优化模型,该模型通过整合物联网(IoT)传感器数据,动态考虑运输过程中的延迟、行驶时间和温变因素,并引入温度反馈机制以实时缓解温度偏差,从而显著延长果蔬货架期——实验结果显示相较传统鲁棒优化、分布鲁棒优化和随机规划方法,货架期提升超过18%。

链接: https://arxiv.org/abs/2511.05920
作者: Chirag Seth,Mehrdad Pirnia,James H Bookbinder
机构: 未知
类目: Optimization and Control (math.OC); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Fruits and vegetables form a vital component of the global economy; however, their distribution poses complex logistical challenges due to high perishability, supply fluctuations, strict quality and safety standards, and environmental sensitivity. In this paper, we propose an adaptive optimization model that accounts for delays, travel time, and associated temperature changes impacting produce shelf life, and compare it against traditional approaches such as Robust Optimization, Distributionally Robust Optimization, and Stochastic Programming. Additionally, we conduct a series of computational experiments using Internet of Things (IoT) sensor data to evaluate the performance of our proposed model. Our study demonstrates that the proposed adaptive model achieves a higher shelf life, extending it by over 18% compared to traditional optimization models, by dynamically mitigating temperature deviations through a temperature feedback mechanism. The promising results demonstrate the potential of this approach to improve both the freshness and efficiency of logistics systems an aspect often neglected in previous works.
zh

[AI-212] BrainCSD: A Hierarchical Consistency-Driven MoE Foundation Model for Unified Connectome Synthesis and Multitask Brain Trait Prediction

【速读】:该论文旨在解决功能连接(Functional Connectivity, FC)与结构连接(Structural Connectivity, SC)作为脑部多模态生物标志物在临床应用中面临的三大挑战:获取成本高、预处理复杂以及模态缺失频繁。现有基础模型要么仅处理单一模态,要么缺乏显式的跨模态与跨尺度一致性机制。其解决方案的关键在于提出 BrainCSD,一个分层混合专家(Mixture-of-Experts, MoE)基础模型,通过三个神经解剖学引导的组件实现 FC/SC 的联合合成与下游任务支持:(1) 区域特异性 MoE 实现经典网络(如默认模式网络 DMN、额顶网络 FPN)区域激活与全局图谱之间的对比一致性对齐;(2) 编码-激活 MoE 建模 fMRI/dMRI 中的时间动态性和梯度依赖性;(3) 网络感知精修 MoE 在个体和群体层面强制施加结构先验与对称性约束。该设计显著提升了在完整和缺失模态场景下的性能表现,达到当前最优水平。

链接: https://arxiv.org/abs/2511.05630
作者: Xiongri Shen,Jiaqi Wang,Yi Zhong,Zhenxi Song,Leilei Zhao,Liling Li,Yichen Wei,Lingyan Liang,Shuqiang Wang,Baiying Lei,Demao Deng,Zhiguo Zhang
机构: 未知
类目: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Functional and structural connectivity (FC/SC) are key multimodal biomarkers for brain analysis, yet their clinical utility is hindered by costly acquisition, complex preprocessing, and frequent missing modalities. Existing foundation models either process single modalities or lack explicit mechanisms for cross-modal and cross-scale consistency. We propose BrainCSD, a hierarchical mixture-of-experts (MoE) foundation model that jointly synthesizes FC/SC biomarkers and supports downstream decoding tasks (diagnosis and prediction). BrainCSD features three neuroanatomically grounded components: (1) a ROI-specific MoE that aligns regional activations from canonical networks (e.g., DMN, FPN) with a global atlas via contrastive consistency; (2) a Encoding-Activation MOE that models dynamic cross-time/gradient dependencies in fMRI/dMRI; and (3) a network-aware refinement MoE that enforces structural priors and symmetry at individual and population levels. Evaluated on the datasets under complete and missing-modality settings, BrainCSD achieves SOTA results: 95.6% accuracy for MCI vs. CN classification without FC, low synthesis error (FC RMSE: 0.038; SC RMSE: 0.006), brain age prediction (MAE: 4.04 years), and MMSE score estimation (MAE: 1.72 points). Code is available in \hrefthis https URLBrainCSD
zh

[AI-213] AI-Enhanced High-Density NIRS Patch for Real-Time Brain Layer Oxygenation Monitoring in Neurological Emergencies

【速读】:该论文旨在解决近红外光谱(NIRS)在脑部层析氧合信息提取中的局限性,即传统NIRS因光子散射导致无法准确获取特定脑层的氧合数据,从而限制了其在神经监护中的临床应用。解决方案的关键在于构建一种基于人工智能(AI)驱动的高密度NIRS系统,该系统通过融合高密度NIRS反射数据与基于MRI生成的合成数据训练的神经网络,实现了对大脑皮层氧合水平的实时、精准监测。该方法显著提升了在不同解剖变异下的准确性,在模拟和仿生phantom实验中均表现出优于传统方法的性能,并在临床验证中成功区分健康人群与缺血性卒中患者,显示出其在急诊和床旁场景中作为高精度诊断工具的巨大潜力。

链接: https://arxiv.org/abs/2511.05612
作者: Minsu Ji,Jihoon Kang,Seongkwon Yu,Jaemyoung Kim,Bumjun Koh,Jimin Lee,Guil Jeong,Jongkwan choi,Chang-Ho Yun,Hyeonmin Bae
机构: 未知
类目: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Photon scattering has traditionally limited the ability of near-infrared spectroscopy (NIRS) to extract accurate, layer-specific information from the brain. This limitation restricts its clinical utility for precise neurological monitoring. To address this, we introduce an AI-driven, high-density NIRS system optimized to provide real-time, layer-specific oxygenation data from the brain cortex, specifically targeting acute neuro-emergencies. Our system integrates high-density NIRS reflectance data with a neural network trained on MRI-based synthetic datasets. This approach achieves robust cortical oxygenation accuracy across diverse anatomical variations. In simulations, our AI-assisted NIRS demonstrated a strong correlation (R2=0.913) with actual cortical oxygenation, markedly outperforming conventional methods (R2=0.469). Furthermore, biomimetic phantom experiments confirmed its superior anatomical reliability (R2=0.986) compared to standard commercial devices (R2=0.823). In clinical validation with healthy subjects and ischemic stroke patients, the system distinguished between the two groups with an AUC of 0.943. This highlights its potential as an accessible, high-accuracy diagnostic tool for emergency and point-of-care settings. These results underscore the system’s capability to advance neuro-monitoring precision through AI, enabling timely, data-driven decisions in critical care environments.
zh

[AI-214] Gravity-Awareness: Deep Learning Models and LLM Simulation of Human Awareness in Altered Gravity

【速读】:该论文旨在解决人类在不同重力环境下(如微重力、部分重力及超重)如何通过神经系统和生理系统进行适应性调整的问题,特别是如何定量建模大脑对重力变化的神经电活动响应及其与自主神经系统、运动行为等生理指标的耦合机制。其解决方案的关键在于构建一个双组件计算框架:一是采用轻量级多层感知机(MLP)预测与g载荷相关的脑电图(EEG)频段变化,表征皮层状态;二是利用一组独立高斯过程(GPs)建模心率变异性(HRV)、皮肤电活动(EDA)和运动行为等更广泛的生理状态;二者均基于抛物线飞行文献数据训练而成,并进一步结合大语言模型(LLM)生成主观体验模拟,从而实现从客观生理数据到认知感知的跨模态整合,为预测人类在非地球重力环境下的表现提供新范式。

链接: https://arxiv.org/abs/2511.05536
作者: Bakytzhan Alibekov,Alina Gutoreva,Elisa Raffaella-Ferre
机构: 未知
类目: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP)
备注: 64 pages, 8 figures, 2 datasets, 1 protocol

点击查看摘要

Abstract:Earth’s gravity has fundamentally shaped human development by guiding the brain’s integration of vestibular, visual, and proprioceptive inputs into an internal model of gravity: a dynamic neural representation enabling prediction and interpretation of gravitational forces. This work presents a dual computational framework to quantitatively model these adaptations. The first component is a lightweight Multi-Layer Perceptron (MLP) that predicts g-load-dependent changes in key electroencephalographic (EEG) frequency bands, representing the brain’s cortical state. The second component utilizes a suite of independent Gaussian Processes (GPs) to model the body’s broader physiological state, including Heart Rate Variability (HRV), Electrodermal Activity (EDA), and motor behavior. Both models were trained on data derived from a comprehensive review of parabolic flight literature, using published findings as anchor points to construct robust, continuous functions. To complement this quantitative analysis, we simulated subjective human experience under different gravitational loads, ranging from microgravity (0g) and partial gravity (Moon 0.17g, Mars 0.38g) to hypergravity associated with spacecraft launch and re-entry (1.8g), using a large language model (Claude 3.5 Sonnet). The model was prompted with physiological parameters to generate introspective narratives of alertness and self-awareness, which closely aligned with the quantitative findings from both the EEG and physiological models. This combined framework integrates quantitative physiological modeling with generative cognitive simulation, offering a novel approach to understanding and predicting human performance in altered gravity
zh

[AI-215] he Evolution of Probabilistic Price Forecasting Techniques: A Review of the Day-Ahead Intra-Day and Balancing Markets

【速读】:该论文旨在解决传统电力价格预测方法在应对可再生能源高渗透率带来的市场波动性和不确定性时的不足,特别是点预测方法无法量化风险的问题。其解决方案的关键在于系统梳理和评述概率预测(probabilistic forecasting)方法的发展脉络,涵盖贝叶斯与分布基础方法、分位数回归以及最新的合规预测(conformal prediction)技术,并强调有效性导向的方法以提升不确定性估计的可靠性。同时,研究扩展至日前市场之外的日内市场(Intra-Day Market)与平衡市场(Balancing Market),关注更高时间粒度和实时运行约束下的预测挑战,为研究人员和从业者提供一套全面且前沿的工具与评估框架,以应对现代电力市场的复杂性。

链接: https://arxiv.org/abs/2511.05523
作者: Ciaran O’Connor,Mohamed Bahloul,Steven Prestwich,Andrea Visentin
机构: 未知
类目: atistical Finance (q-fin.ST); Artificial Intelligence (cs.AI); Applications (stat.AP)
备注:

点击查看摘要

Abstract:Electricity price forecasting has become a critical tool for decision-making in energy markets, particularly as the increasing penetration of renewable energy introduces greater volatility and uncertainty. Historically, research in this field has been dominated by point forecasting methods, which provide single-value predictions but fail to quantify uncertainty. However, as power markets evolve due to renewable integration, smart grids, and regulatory changes, the need for probabilistic forecasting has become more pronounced, offering a more comprehensive approach to risk assessment and market participation. This paper presents a review of probabilistic forecasting methods, tracing their evolution from Bayesian and distribution based approaches, through quantile regression techniques, to recent developments in conformal prediction. Particular emphasis is placed on advancements in probabilistic forecasting, including validity-focused methods which address key limitations in uncertainty estimation. Additionally, this review extends beyond the Day-Ahead Market to include the Intra-Day and Balancing Markets, where forecasting challenges are intensified by higher temporal granularity and real-time operational constraints. We examine state of the art methodologies, key evaluation metrics, and ongoing challenges, such as forecast validity, model selection, and the absence of standardised benchmarks, providing researchers and practitioners with a comprehensive and timely resource for navigating the complexities of modern electricity markets.
zh

[AI-216] AIRMap - AI-Generated Radio Maps for Wireless Digital Twins

【速读】:该论文旨在解决无线网络仿真与数字孪生应用中对高精度、低延迟信道建模的需求,传统方法如射线追踪(ray tracing)计算复杂度高且难以适应动态环境。其解决方案的关键在于提出AIRMap——一种基于深度学习的超快速无线电地图估计框架,采用单输入U-Net自动编码器结构,仅需地形和建筑物高度的二维仰角图作为输入,即可在4毫秒内完成一次推理(NVIDIA L40S显卡),相比GPU加速的射线追踪方法快逾7000倍;同时结合轻量级迁移学习校准(仅需20%实测数据),将中位误差降至约10%,显著优于传统仿真器(误差超50%),并在Colosseum模拟器和Sionna SYS平台中验证了其在频谱效率和块错误率上的近零误差表现,证明了其在无线数字孪生中实现可扩展、高精度、实时无线电地图估计的潜力。

链接: https://arxiv.org/abs/2511.05522
作者: Ali Saeizadeh,Miead Tehrani-Moayyed,Davide Villa,J. Gordon Beattie Jr.,Pedram Johari,Stefano Basagni,Tommaso Melodia
机构: 未知
类目: ignal Processing (eess.SP); Artificial Intelligence (cs.AI)
备注: 13 pages, 17 figures, This paper has been submitted to the IEEE Transactions for possible publication

点击查看摘要

Abstract:Accurate, low-latency channel modeling is essential for real-time wireless network simulation and digital-twin applications. Traditional modeling methods like ray tracing are however computationally demanding and unsuited to model dynamic conditions. In this paper, we propose AIRMap, a deep-learning framework for ultra-fast radio-map estimation, along with an automated pipeline for creating the largest radio-map dataset to date. AIRMap uses a single-input U-Net autoencoder that processes only a 2D elevation map of terrain and building heights. Trained and evaluated on 60,000 Boston-area samples, spanning coverage areas from 500 m to 3 km per side, AIRMap predicts path gain with under 5 dB RMSE in 4 ms per inference on an NVIDIA L40S -over 7000x faster than GPU-accelerated ray tracing based radio maps. A lightweight transfer learning calibration using just 20% of field measurements reduces the median error to approximately 10%, significantly outperforming traditional simulators, which exceed 50% error. Integration into the Colosseum emulator and the Sionna SYS platform demonstrate near-zero error in spectral efficiency and block-error rate compared to measurement-based channels. These findings validate AIRMap’s potential for scalable, accurate, and real-time radio map estimation in wireless digital twins.
zh

[AI-217] EMPO: Temporal Multi-scale Autoregressive Generation of Protein Conformational Ensembles

【速读】:该论文旨在解决蛋白质动态行为建模中生成时空一致且物理合理的蛋白质集合轨迹这一挑战。现有方法通常仅生成静态构象集合,或将动态采样视为独立过程,难以捕捉蛋白质运动的因果依赖关系。解决方案的关键在于提出一种分层自回归框架,将蛋白质动力学建模为马尔可夫过程:低分辨率模型捕获驱动主要构象转变的慢速集体运动,高分辨率模型则在低分辨率状态条件下生成局部细微波动,从而保留多尺度运动的因果结构,实现高效且物理准确的动态轨迹生成。

链接: https://arxiv.org/abs/2511.05510
作者: Yaoyao Xu,Di Wang,Zihan Zhou,Tianshu Yu,Mingchen Chen
机构: 未知
类目: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Understanding the dynamic behavior of proteins is critical to elucidating their functional mechanisms, yet generating realistic, temporally coherent trajectories of protein ensembles remains a significant challenge. In this work, we introduce a novel hierarchical autoregressive framework for modeling protein dynamics that leverages the intrinsic multi-scale organization of molecular motions. Unlike existing methods that focus on generating static conformational ensembles or treat dynamic sampling as an independent process, our approach characterizes protein dynamics as a Markovian process. The framework employs a two-scale architecture: a low-resolution model captures slow, collective motions driving major conformational transitions, while a high-resolution model generates detailed local fluctuations conditioned on these large-scale movements. This hierarchical design ensures that the causal dependencies inherent in protein dynamics are preserved, enabling the generation of temporally coherent and physically realistic trajectories. By bridging high-level biophysical principles with state-of-the-art generative modeling, our approach provides an efficient framework for simulating protein dynamics that balances computational efficiency with physical accuracy.
zh

[AI-218] Personalized Chain-of-Thought Summarization of Financial News for Investor Decision Support ICDM

【速读】:该论文旨在解决金融顾问和投资者在面对海量金融新闻时所遭遇的信息过载问题,其中无关内容和噪声掩盖了关键市场信号,从而阻碍及时的投资决策。解决方案的关键在于提出一种新颖的思维链(Chain-of-Thought, CoT)摘要框架,该框架能将金融新闻浓缩为简洁、事件驱动的个性化摘要,通过整合用户指定关键词确保仅突出最相关的情境信息,进而为语言模型生成以投资者为中心的叙事提供中间层支持,弥合原始新闻与可操作洞察之间的鸿沟。

链接: https://arxiv.org/abs/2511.05508
作者: Tianyi Zhang,Mu Chen
机构: 未知
类目: General Finance (q-fin.GN); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
备注: ICDM SENTIRE 2025

点击查看摘要

Abstract:Financial advisors and investors struggle with information overload from financial news, where irrelevant content and noise obscure key market signals and hinder timely investment decisions. To address this, we propose a novel Chain-of-Thought (CoT) summarization framework that condenses financial news into concise, event-driven summaries. The framework integrates user-specified keywords to generate personalized outputs, ensuring that only the most relevant contexts are highlighted. These personalized summaries provide an intermediate layer that supports language models in producing investor-focused narratives, bridging the gap between raw news and actionable insights.
zh

[AI-219] Rewiring Human Brain Networks via Lightweight Dynamic Connectivity Framework: An EEG-Based Stress Validation

【速读】:该论文旨在解决传统静态功能连接(functional connectivity)方法在捕捉脑区间动态、因果性信息流方面的局限性,从而提升基于脑电图(EEG)的应激状态分类精度。其关键解决方案是提出一种轻量级的动态脑连接框架,基于时变定向传递函数(Time Varying Directed Transfer Function, TV DTF),通过提取不同频段(尤其是α和β频段)的动态方向性信息流特征,并结合多种机器学习(ML)分类器进行验证。结果表明,α-TV-DTF特征在多类和双类应激分类中均表现出显著优于传统绝对功率与相位锁定特征的性能,且特征重要性分析揭示了前额叶-顶叶和前额叶-枕叶之间的主导长程信息调控作用,凸显了前额叶在应激状态下的调节功能,验证了TV-DTF作为高效、可解释的动态脑网络分析工具的潜力。

链接: https://arxiv.org/abs/2511.05505
作者: Sayantan Acharya,Abbas Khosravi,Douglas Creighton,Roohallah Alizadehsani,U. Rajendra Acharya
机构: 未知
类目: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)
备注: 21 pages, 21 figures, 6 tables, 50 references,

点击查看摘要

Abstract:In recent years, Electroencephalographic analysis has gained prominence in stress research when combined with AI and Machine Learning models for validation. In this study, a lightweight dynamic brain connectivity framework based on Time Varying Directed Transfer Function is proposed, where TV DTF features were validated through ML based stress classification. TV DTF estimates the directional information flow between brain regions across distinct EEG frequency bands, thereby capturing temporal and causal influences that are often overlooked by static functional connectivity measures. EEG recordings from the 32 channel SAM 40 dataset were employed, focusing on mental arithmetic task trials. The dynamic EEG-based TV-DTF features were validated through ML classifiers such as Support Vector Machine, Random Forest, Gradient Boosting, Adaptive Boosting, and Extreme Gradient Boosting. Experimental results show that alpha-TV-DTF provided the strongest discriminative power, with SVM achieving 89.73% accuracy in 3-class classification and with XGBoost achieving 93.69% accuracy in 2 class classification. Relative to absolute power and phase locking based functional connectivity features, alpha TV DTF and beta TV DTF achieved higher performance across the ML models, highlighting the advantages of dynamic over static measures. Feature importance analysis further highlighted dominant long-range frontal parietal and frontal occipital informational influences, emphasizing the regulatory role of frontal regions under stress. These findings validate the lightweight TV-DTF as a robust framework, revealing spatiotemporal brain dynamics and directional influences across different stress levels.
zh

机器学习

[LG-0] Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLM s

链接: https://arxiv.org/abs/2511.07419
作者: Zhongyang Li,Ziyue Li,Tianyi Zhou
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-1] Entangled Schrödinger Bridge Matching

链接: https://arxiv.org/abs/2511.07406
作者: Sophia Tang,Yinuo Zhang,Pranam Chatterjee
类目: Machine Learning (cs.LG); Biomolecules (q-bio.BM)
*备注:

点击查看摘要

[LG-2] C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning

链接: https://arxiv.org/abs/2511.07396
作者: Antonios Valkanas,Soumyasundar Pal,Pavel Rumiantsev,Yingxue Zhang,Mark Coates
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-3] A Diffusion Model to Shrink Proteins While Maintaining Their Function

链接: https://arxiv.org/abs/2511.07390
作者: Ethan Baron,Alan N. Amin,Ruben Weitzman,Debora Marks,Andrew Gordon Wilson
类目: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
*备注: Code available at this https URL

点击查看摘要

[LG-4] Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training

链接: https://arxiv.org/abs/2511.07372
作者: Dake Bu,Wei Huang,Andi Han,Atsushi Nitanda,Hau-San Wong,Qingfu Zhang,Taiji Suzuki
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-5] UAV-Assisted Resilience in 6G and Beyond Network Energy Saving: A Multi-Agent DRL Approach

链接: https://arxiv.org/abs/2511.07366
作者: Dao Lan Vy Dinh,Anh Nguyen Thi Mai,Hung Tran,Giang Quynh Le Vu,Tu Dac Ho,Zhenni Pan,Vo Nhan Van,Symeon Chatzinotas,Dinh-Hieu Tran
类目: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
*备注: 6 pages, 5 figures, 1 table

点击查看摘要

[LG-6] Private Sketches for Linear Regression

链接: https://arxiv.org/abs/2511.07365
作者: Shrutimoy Das,Debanuj Nayak,Anirban Dasgupta
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注: 13 pages

点击查看摘要

[LG-7] Q-RAG : Long Context Multi-step Retrieval via Value-based Embedder Training

链接: https://arxiv.org/abs/2511.07328
作者: Artyom Sorokin,Nazar Buzun,Alexander Anokhin,Oleg Inozemcev,Egor Vedernikov,Petr Anokhin,Mikhail Burtsev,Trushkov Alexey,Yin Wenshuai,Evgeny Burnaev
类目: Machine Learning (cs.LG); Information Retrieval (cs.IR)
*备注: 16 pages, 3 figures, 2 tables

点击查看摘要

[LG-8] Can Training Dynamics of Scale-Invariant Neural Networks Be Explained by the Thermodynamics of an Ideal Gas?

链接: https://arxiv.org/abs/2511.07308
作者: Ildus Sadrtdinov,Ekaterina Lobacheva,Ivan Klimov,Mikhail I. Katsnelson,Dmitry Vetrov
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-9] MG-HGNN: A Heterogeneous GNN Framework for Indoor Wi-Fi Fingerprint-Based Localization

链接: https://arxiv.org/abs/2511.07282
作者: Yibu Wang,Zhaoxin Zhang,Ning Li,Xinlong Zhao,Dong Zhao,Tianzi Zhao
类目: Machine Learning (cs.LG)
*备注: 16 pages, 11 figures, 11 tables

点击查看摘要

[LG-10] RobustA: Robust Anomaly Detection in Multimodal Data

链接: https://arxiv.org/abs/2511.07276
作者: Salem AlMarri,Muhammad Irzam Liaqat,Muhammad Zaigham Zaheer,Shah Nawaz,Karthik Nandakumar,Markus Schedl
类目: Machine Learning (cs.LG)
*备注: Submitted to IEEE Transactions on Image Processing

点击查看摘要

Abstract:In recent years, multimodal anomaly detection methods have demonstrated remarkable performance improvements over video-only models. However, real-world multimodal data is often corrupted due to unforeseen environmental distortions. In this paper, we present the first-of-its-kind work that comprehensively investigates the adverse effects of corrupted modalities on multimodal anomaly detection task. To streamline this work, we propose RobustA, a carefully curated evaluation dataset to systematically observe the impacts of audio and visual corruptions on the overall effectiveness of anomaly detection systems. Furthermore, we propose a multimodal anomaly detection method, which shows notable resilience against corrupted modalities. The proposed method learns a shared representation space for different modalities and employs a dynamic weighting scheme during inference based on the estimated level of corruption. Our work represents a significant step forward in enabling the real-world application of multimodal anomaly detection, addressing situations where the likely events of modality corruptions occur. The proposed evaluation dataset with corrupted modalities and respective extracted features will be made publicly available.

[LG-11] Multi-modal Dynamic Proxy Learning for Personalized Multiple Clustering AAAI2026

链接: https://arxiv.org/abs/2511.07274
作者: Jinfeng Xu,Zheyu Chen,Shuo Yang,Jinze Li,Ziyue Peng,Zewei Liu,Hewei Wang,Jiayi Zhang,Edith C. H. Ngai
类目: Machine Learning (cs.LG)
*备注: Accepted by AAAI 2026

点击查看摘要

Abstract:Multiple clustering aims to discover diverse latent structures from different perspectives, yet existing methods generate exhaustive clusterings without discerning user interest, necessitating laborious manual screening. Current multi-modal solutions suffer from static semantic rigidity: predefined candidate words fail to adapt to dataset-specific concepts, and fixed fusion strategies ignore evolving feature interactions. To overcome these limitations, we propose Multi-DProxy, a novel multi-modal dynamic proxy learning framework that leverages cross-modal alignment through learnable textual proxies. Multi-DProxy introduces 1) gated cross-modal fusion that synthesizes discriminative joint representations by adaptively modeling feature interactions. 2) dual-constraint proxy optimization where user interest constraints enforce semantic consistency with domain concepts while concept constraints employ hard example mining to enhance cluster discrimination. 3) dynamic candidate management that refines textual proxies through iterative clustering feedback. Therefore, Multi-DProxy not only effectively captures a user’s interest through proxies but also enables the identification of relevant clusterings with greater precision. Extensive experiments demonstrate state-of-the-art performance with significant improvements over existing methods across a broad set of multi-clustering benchmarks.

[LG-12] Understanding the role of depth in the neural tangent kernel for overparameterized neural networks

链接: https://arxiv.org/abs/2511.07272
作者: William St-Arnaud,Margarida Carvalho,Golnoosh Farnadi
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:

点击查看摘要

[LG-13] A Fully Polynomial-Time Algorithm for Robustly Learning Halfspaces over the Hypercube

链接: https://arxiv.org/abs/2511.07244
作者: Gautam Chandrasekaran,Adam R. Klivans,Konstantinos Stavropoulos,Arsen Vasilyan
类目: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
*备注: 52 pages, 1 figure

点击查看摘要

[LG-14] Does TabPFN Understand Causal Structures?

链接: https://arxiv.org/abs/2511.07236
作者: Omar Swelam,Lennart Purucker,Jake Robertson,Hanne Raum,Joschka Boedecker,Frank Hutter
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-15] Deep Neural Operator Learning for Probabilistic Models

链接: https://arxiv.org/abs/2511.07235
作者: Erhan Bayraktar,Qi Feng,Zecheng Zhang,Zhaoyu Zhang
类目: Machine Learning (cs.LG); Computational Finance (q-fin.CP)
*备注: 36 pages, 1 figure

点击查看摘要

[LG-16] DETECT: Data-Driven Evaluation of Treatments Enabled by Classification Transformers ICDM2025

链接: https://arxiv.org/abs/2511.07213
作者: Yuanheng Mao,Lillian Yang,Stephen Yang,Ethan Shao,Zihan Li
类目: Machine Learning (cs.LG)
*备注: 5 pages, 4 figures, 2 tables, accepted for presentation by IEEE ICDM 2025 UGHS Symposium and publication with proceedings forthcoming

点击查看摘要

Abstract:Chronic pain is a global health challenge affecting millions of individuals, making it essential for physicians to have reliable and objective methods to measure the functional impact of clinical treatments. Traditionally used methods, like the numeric rating scale, while personalized and easy to use, are subjective due to their self-reported nature. Thus, this paper proposes DETECT (Data-Driven Evaluation of Treatments Enabled by Classification Transformers), a data-driven framework that assesses treatment success by comparing patient activities of daily life before and after treatment. We use DETECT on public benchmark datasets and simulated patient data from smartphone sensors. Our results demonstrate that DETECT is objective yet lightweight, making it a significant and novel contribution to clinical decision-making. By using DETECT, independently or together with other self-reported metrics, physicians can improve their understanding of their treatment impacts, ultimately leading to more personalized and responsive patient care.

[LG-17] Synergy over Discrepancy: A Partition-Based Approach to Multi-Domain LLM Fine-Tuning NEURIPS2025

链接: https://arxiv.org/abs/2511.07198
作者: Hua Ye(1 and 2),Siyuan Chen(3),Haoliang Zhang(4),Weihao Luo(5),Yanbin Li(6),Xuan Zhang(2 and 7) ((1) Nanjing University, (2) Airon Technology CO., LTD, (3) University of Bristol, (4) The University of Oklahoma, (5) Donghua University, (6) Beijing University of Posts and Telecommunications, (7) Carnegie Mellon University)
类目: Machine Learning (cs.LG)
*备注: 20 pages, 5 figures, 21 tables. Accepted at NeurIPS 2025. Corresponding author: Xuan Zhang (xuanzhang2199@gmail.com)

点击查看摘要

[LG-18] On Stealing Graph Neural Network Models

链接: https://arxiv.org/abs/2511.07170
作者: Marcin Podhajski,Jan Dubiński,Franziska Boenisch,Adam Dziedzic,Agnieszka Pręgowska,Tomasz P. Michalak
类目: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
*备注:

点击查看摘要

[LG-19] Combining digital data streams and epidemic networks for real time outbreak detection

链接: https://arxiv.org/abs/2511.07163
作者: Ruiqi Lyu,Alistair Turcan,Bryan Wilder
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-20] LLM scape NEURIPS2025

链接: https://arxiv.org/abs/2511.07161
作者: Gottfried Haider,Jie Zhang
类目: Machine Learning (cs.LG)
*备注: Accepted to NeurIPS 2025, Creative AI Track

点击查看摘要

[LG-21] Guiding Generative Models to Uncover Diverse and Novel Crystals via Reinforcement Learning

链接: https://arxiv.org/abs/2511.07158
作者: Hyunsoo Park,Aron Walsh
类目: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
*备注:

点击查看摘要

[LG-22] Dynamics-Decoupled Trajectory Alignment for Sim-to-Real Transfer in Reinforcement Learning for Autonomous Driving

链接: https://arxiv.org/abs/2511.07155
作者: Thomas Steinecker,Alexander Bienemann,Denis Trescher,Thorsten Luettel,Mirko Maehlisch
类目: Robotics (cs.RO); Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-23] rading Vector Data in Vector Databases ICDE2026

链接: https://arxiv.org/abs/2511.07139
作者: Jin Cheng,Xiangxiang Dai,Ningning Ding,John C.S. Lui,Jianwei Huang
类目: Databases (cs.DB); Machine Learning (cs.LG)
*备注: Accepted by ICDE 2026

点击查看摘要

Abstract:Vector data trading is essential for cross-domain learning with vector databases, yet it remains largely unexplored. We study this problem under online learning, where sellers face uncertain retrieval costs and buyers provide stochastic feedback to posted prices. Three main challenges arise: (1) heterogeneous and partial feedback in configuration learning, (2) variable and complex feedback in pricing learning, and (3) inherent coupling between configuration and pricing decisions. We propose a hierarchical bandit framework that jointly optimizes retrieval configurations and pricing. Stage I employs contextual clustering with confidence-based exploration to learn effective configurations with logarithmic regret. Stage II adopts interval-based price selection with local Taylor approximation to estimate buyer responses and achieve sublinear regret. We establish theoretical guarantees with polynomial time complexity and validate the framework on four real-world datasets, demonstrating consistent improvements in cumulative reward and regret reduction compared with existing methods. Comments: Accepted by ICDE 2026 Subjects: Databases (cs.DB); Machine Learning (cs.LG) Cite as: arXiv:2511.07139 [cs.DB] (or arXiv:2511.07139v1 [cs.DB] for this version) https://doi.org/10.48550/arXiv.2511.07139 Focus to learn more arXiv-issued DOI via DataCite (pending registration)

[LG-24] REACT-LLM : A Benchmark for Evaluating LLM Integration with Causal Features in Clinical Prognostic Tasks

链接: https://arxiv.org/abs/2511.07127
作者: Linna Wang,Zhixuan You,Qihui Zhang,Jiunan Wen,Ji Shi,Yimin Chen,Yusen Wang,Fanqi Ding,Ziliang Feng,Li Lu
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-25] A Provably-Correct and Robust Convex Model for Smooth Separable NMF

链接: https://arxiv.org/abs/2511.07109
作者: Junjun Pan,Valentin Leplat,Michael Ng,Nicolas Gillis
类目: Numerical Analysis (math.NA); Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC); Machine Learning (stat.ML)
*备注: 30 pages, 10 figures, code available from this https URL

点击查看摘要

Abstract:Nonnegative matrix factorization (NMF) is a linear dimensionality reduction technique for nonnegative data, with applications such as hyperspectral unmixing and topic modeling. NMF is a difficult problem in general (NP-hard), and its solutions are typically not unique. To address these two issues, additional constraints or assumptions are often used. In particular, separability assumes that the basis vectors in the NMF are equal to some columns of the input matrix. In that case, the problem is referred to as separable NMF (SNMF) and can be solved in polynomial-time with robustness guarantees, while identifying a unique solution. However, in real-world scenarios, due to noise or variability, multiple data points may lie near the basis vectors, which SNMF does not leverage. In this work, we rely on the smooth separability assumption, which assumes that each basis vector is close to multiple data points. We explore the properties of the corresponding problem, referred to as smooth SNMF (SSNMF), and examine how it relates to SNMF and orthogonal NMF. We then propose a convex model for SSNMF and show that it provably recovers the sought-after factors, even in the presence of noise. We finally adapt an existing fast gradient method to solve this convex model for SSNMF, and show that it compares favorably with state-of-the-art methods on both synthetic and hyperspectral datasets.

[LG-26] Direct Molecular Polarizability Prediction with SO(3) Equivariant Local Frame GNNs

链接: https://arxiv.org/abs/2511.07087
作者: Jean Philip Filling,Felix Post,Michael Wand,Denis Andrienko
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-27] Breaking Privacy in Federated Clustering: Perfect Input Reconstruction via Temporal Correlations

链接: https://arxiv.org/abs/2511.07073
作者: Guang Yang,Lixia Luo,Qiongxiu Li
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-28] Fair Bayesian Data Selection via Generalized Discrepancy Measures

链接: https://arxiv.org/abs/2511.07032
作者: Yixuan Zhang,Jiabin Luo,Zhenggang Wang,Feng Zhou,Quyu Kong
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:

点击查看摘要

[LG-29] Correcting False Alarms from Unseen: Adapting Graph Anomaly Detectors at Test Time AAAI2026

链接: https://arxiv.org/abs/2511.07023
作者: Junjun Pan,Yixin Liu,Chuan Zhou,Fei Xiong,Alan Wee-Chung Liew,Shirui Pan
类目: Machine Learning (cs.LG)
*备注: 9 pages, 5 figures, accepted by AAAI 2026

点击查看摘要

[LG-30] CoLM: Collaborative Large Models via A Client-Server Paradigm

链接: https://arxiv.org/abs/2511.06991
作者: Siqi Huang,Sida Huang,Hongyuan Zhang
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-31] HCFSLN: Adaptive Hyperbolic Few-Shot Learning for Multimodal Anxiety Detection

链接: https://arxiv.org/abs/2511.06988
作者: Aditya Sneh,Nilesh Kumar Sahu,Anushka Sanjay Shelke,Arya Adyasha,Haroon R. Lone
类目: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
*备注:

点击查看摘要

[LG-32] Breaking the Gradient Barrier: Unveiling Large Language Models for Strategic Classification NEURIPS2025

链接: https://arxiv.org/abs/2511.06979
作者: Xinpeng Lv,Yunxin Mao,Haoxuan Li,Ke Liang,Jinxuan Yang,Wanrong Huang,Haoang Chi,Huan Chen,Long Lan,Yuanlong Chen,Wenjing Yang,Haotian Wang
类目: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
*备注: Accepted by NeurIPS 2025

点击查看摘要

[LG-33] Fast Bayesian Updates via Harmonic Representations

链接: https://arxiv.org/abs/2511.06978
作者: Di Zhang
类目: Machine Learning (cs.LG); Information Theory (cs.IT); Numerical Analysis (math.NA); Statistics Theory (math.ST)
*备注: 13 pages

点击查看摘要

[LG-34] Rethinking Crystal Symmetry Prediction: A Decoupled Perspective

链接: https://arxiv.org/abs/2511.06976
作者: Liheng Yu,Zhe Zhao,Xucong Wang,Di Wu,Pengkun Wang
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-35] A Closer Look at Knowledge Distillation in Spiking Neural Network Training AAAI2026

链接: https://arxiv.org/abs/2511.06902
作者: Xu Liu,Na Xia,Jinxing Zhou,Jingyuan Xu,Dan Guo
类目: Machine Learning (cs.LG)
*备注: Accepted by AAAI 2026

点击查看摘要

[LG-36] Contact Wasserstein Geodesics for Non-Conservative Schrodinger Bridges

链接: https://arxiv.org/abs/2511.06856
作者: Andrea Testa,Soren Hauberg,Tamim Asfour,Leonel Rozo
类目: Machine Learning (cs.LG); Differential Geometry (math.DG)
*备注: 38 pages, 18 figures

点击查看摘要

[LG-37] Beyond Observations: Reconstruction Error-Guided Irregularly Sampled Time Series Representation Learning AAAI2026

链接: https://arxiv.org/abs/2511.06854
作者: Jiexi Liu,Meng Cao,Songcan Chen
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注: Accepted by AAAI 2026

点击查看摘要

[LG-38] MI-to-Mid Distilled Compression (M2M-DC): An Hybrid-Information-Guided-Block Pruning with Progressive Inner Slicing Approach to Model Compression

链接: https://arxiv.org/abs/2511.06842
作者: Lionel Levine,Sajjad Ghiasvand,Haniyeh Ehsani Oskouie,Majid Sarrafzadeh
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-39] P3-LLM : An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats

链接: https://arxiv.org/abs/2511.06838
作者: Yuzong Chen,Chao Fang,Xilai Dai,Yuheng Wu,Thierry Tambe,Marian Verhelst,Mohamed S. Abdelfattah
类目: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
*备注: Preprint. Under review

点击查看摘要

[LG-40] Minimum Width of Deep Narrow Networks for Universal Approximation

链接: https://arxiv.org/abs/2511.06837
作者: Xiao-Song Yang,Qi Zhou,Xuan Zhou
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-41] Neural-Initialized Newton: Accelerating Nonlinear Finite Elements via Operator Learning

链接: https://arxiv.org/abs/2511.06802
作者: Kianoosh Taghikhani,Yusuke Yamazaki,Jerry Paul Varghese,Markus Apel,Reza Najian Asl,Shahed Rezaei
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-42] FedNET: Federated Learning for Proactive Traffic Management and Network Capacity Planning

链接: https://arxiv.org/abs/2511.06797
作者: Saroj Kumar Panda,Basabdatta Palit,Sadananda Behera
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-43] Beyond Uniform Deletion: A Data Value-Weighted Framework for Certified Machine Unlearning

链接: https://arxiv.org/abs/2511.06794
作者: Lisong He,Yi Yang,Xiangyu Chang
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:

点击查看摘要

[LG-44] Coupling Agent -based Modeling and Life Cycle Assessment to Analyze Trade-offs in Resilient Energy Transitions NEURIPS

链接: https://arxiv.org/abs/2511.06791
作者: Beichen Zhang,Mohammed T. Zaki,Hanna Breunig,Newsha K. Ajami
类目: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
*备注: 4 pages (+4 pages in appendix), 3 figures (+ 2 figures in appendix), 8 tables in appendix, NeurIPS Workshop on Tackling Climate Change with Machine Learning, 2025

点击查看摘要

[LG-45] Rethinking Parameter Sharing as Graph Coloring for Structured Compression

链接: https://arxiv.org/abs/2511.06786
作者: Boyang Zhang,Daning Cheng,Yunquan Zhang
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-46] HEDN: A Hard-Easy Dual Network with Task Difficulty Assessment for EEG Emotion Recognition

链接: https://arxiv.org/abs/2511.06782
作者: Qiang Wang,Liying Yang
类目: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-47] Dual Mamba for Node-Specific Representation Learning: Tackling Over-Smoothing with Selective State Space Modeling

链接: https://arxiv.org/abs/2511.06756
作者: Xin He,Yili Wang,Yiwei Dai,Xin Wang
类目: Machine Learning (cs.LG)
*备注: 11 pages, 4 figures

点击查看摘要

Abstract:Over-smoothing remains a fundamental challenge in deep Graph Neural Networks (GNNs), where repeated message passing causes node representations to become indistinguishable. While existing solutions, such as residual connections and skip layers, alleviate this issue to some extent, they fail to explicitly model how node representations evolve in a node-specific and progressive manner across layers. Moreover, these methods do not take global information into account, which is also crucial for mitigating the over-smoothing problem. To address the aforementioned issues, in this work, we propose a Dual Mamba-enhanced Graph Convolutional Network (DMbaGCN), which is a novel framework that integrates Mamba into GNNs to address over-smoothing from both local and global perspectives. DMbaGCN consists of two modules: the Local State-Evolution Mamba (LSEMba) for local neighborhood aggregation and utilizing Mamba’s selective state space modeling to capture node-specific representation dynamics across layers, and the Global Context-Aware Mamba (GCAMba) that leverages Mamba’s global attention capabilities to incorporate global context for each node. By combining these components, DMbaGCN enhances node discriminability in deep GNNs, thereby mitigating over-smoothing. Extensive experiments on multiple benchmarks demonstrate the effectiveness and efficiency of our method.

[LG-48] Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation ECAI2025

链接: https://arxiv.org/abs/2511.06723
作者: Evelyn Chee,Wynne Hsu,Mong Li Lee
类目: Machine Learning (cs.LG)
*备注: Accepted to ECAI 2025

点击查看摘要

[LG-49] MobileLLM -Pro Technical Report

链接: https://arxiv.org/abs/2511.06719
作者: Patrick Huber,Ernie Chang,Wei Wen,Igor Fedorov,Tarek Elgamal,Hanxian Huang,Naveen Suda,Chinnadhurai Sankar,Vish Vogeti,Yanghan Wang,Alex Gladkov,Kai Sheng Tai,Abdelrahman Elogeel,Tarek Hefny,Vikas Chandra,Ahmed Aly,Anuj Kumar,Raghuraman Krishnamoorthi,Adithya Sagar
类目: Machine Learning (cs.LG)
*备注: 17 pages

点击查看摘要

[LG-50] he Wisdom of the Crowd: High-Fidelity Classification of Cyber-Attacks and Faults in Power Systems Using Ensemble and Machine Learning

链接: https://arxiv.org/abs/2511.06714
作者: Emad Abukhousa,Syed Sohail Feroz Syed Afroz,Fahad Alsaeed,Abdulaziz Qwbaiban,Saman Zonouz,A.P. Sakis Meliopoulos
类目: ystems and Control (eess.SY); Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-51] Peeling Context from Cause for Multimodal Molecular Property Prediction

链接: https://arxiv.org/abs/2511.06692
作者: Tao Li,Kaiyuan Hou,Tuan Vinh,Carl Yang,Monika Raj
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-52] Mitigating Modality Imbalance in Multi-modal Learning via Multi-objective Optimization

链接: https://arxiv.org/abs/2511.06686
作者: Heshan Fernando,Parikshit Ram,Yi Zhou,Soham Dan,Horst Samulowitz,Nathalie Baracaldo,Tianyi Chen
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-53] An Adaptive Machine Learning Triage Framework for Predicting Alzheimers Disease Progression ALT ML4H

链接: https://arxiv.org/abs/2511.06681
作者: Richard Hou,Shengpu Tang,Wei Jin
类目: Machine Learning (cs.LG)
*备注: Findings paper presented at Machine Learning for Health (ML4H) symposium 2025, December 1-2, 2025, San Diego, CA, USA, 9 pages. Shengpu Tang and Wei Jin contributed equally as senior authors

点击查看摘要

[LG-54] When Evidence Contradicts: Toward Safer Retrieval-Augmented Generation in Healthcare

链接: https://arxiv.org/abs/2511.06668
作者: Saeedeh Javadi,Sara Mirabi,Manan Gangar,Bahadorreza Ofoghi
类目: Information Retrieval (cs.IR); Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-55] GNN-Enabled Robust Hybrid Beamforming with Score-Based CSI Generation and Denoising

链接: https://arxiv.org/abs/2511.06663
作者: Yuhang Li,Yang Lu,Bo Ai,Zhiguo Ding,Dusit Niyato,Arumugam Nallanathan
类目: ystems and Control (eess.SY); Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-56] Dual-Pathway Fusion of EHRs and Knowledge Graphs for Predicting Unseen Drug-Drug Interactions ML4H2025

链接: https://arxiv.org/abs/2511.06662
作者: Franklin Lee,Tengfei Ma
类目: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
*备注: ML4H 2025 Findings

点击查看摘要

[LG-57] Improving Asset Allocation in a Fast Moving Consumer Goods B2B Company: An Interpretable Machine Learning Framework for Commercial Cooler Assignment Based on Multi-Tier Growth Targets

链接: https://arxiv.org/abs/2511.06642
作者: Renato Castro,Rodrigo Paredes,Douglas Kahn
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-58] Neyman-Pearson Classification under Both Null and Alternative Distributions Shift

链接: https://arxiv.org/abs/2511.06641
作者: Mohammadreza M. Kalan,Yuyang Deng,Eitan J. Neugut,Samory Kpotufe
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:

点击查看摘要

[LG-59] Dual-branch Spatial-Temporal Self-supervised Representation for Enhanced Road Network Learning

链接: https://arxiv.org/abs/2511.06633
作者: Qinghong Guo,Yu Wang,Ji Cao,Tongya Zheng,Junshu Dai,Bingde Hu,Shunyu Liu,Canghong Jin
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-60] Non-Rival Data as Rival Products: An Encapsulation-Forging Approach for Data Synthesis

链接: https://arxiv.org/abs/2511.06610
作者: Kaidong Wang,Jiale Li,Shao-Bo Lin,Yao Wang
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-61] A Weak Penalty Neural ODE for Learning Chaotic Dynamics from Noisy Time Series

链接: https://arxiv.org/abs/2511.06609
作者: Xuyang Li,John Harlim,Romit Maulik
类目: Machine Learning (cs.LG); Dynamical Systems (math.DS)
*备注:

点击查看摘要

[LG-62] Explainable Probabilistic Machine Learning for Predicting Drilling Fluid Loss of Circulation in Marun Oil Field

链接: https://arxiv.org/abs/2511.06607
作者: Seshu Kumar Damarla,Xiuli Zhu
类目: Machine Learning (cs.LG)
*备注: 5 pages, 3 tables, 4 figrues

点击查看摘要

[LG-63] Adaptive Initial Residual Connections for GNNs with Theoretical Guarantees AAAI-2026 AAAI

链接: https://arxiv.org/abs/2511.06598
作者: Mohammad Shirzadi,Ali Safarpoor Dehkordi,Ahad N. Zehmakan
类目: Machine Learning (cs.LG)
*备注: This is the full version of the paper accepted to the 40th Annual AAAI Conference on Artificial Intelligence (AAAI-2026)

点击查看摘要

[LG-64] Optimistic Online-to-Batch Conversions for Accelerated Convergence and Universality NEURIPS2025

链接: https://arxiv.org/abs/2511.06597
作者: Yu-Hu Yan,Peng Zhao,Zhi-Hua Zhou
类目: Machine Learning (cs.LG); Optimization and Control (math.OC)
*备注: NeurIPS 2025

点击查看摘要

[LG-65] Practical Policy Distillation for Reinforcement Learning in Radio Access Networks

链接: https://arxiv.org/abs/2511.06563
作者: Sara Khosravi,Burak Demirel,Linghui Zhou,Javier Rasines,Pablo Soldati
类目: Machine Learning (cs.LG)
*备注: This paper is accepted for publication in IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, 2025

点击查看摘要

[LG-66] Bayesian Uncertainty Quantification with Anchored Ensembles for Robust EV Power Consumption Prediction

链接: https://arxiv.org/abs/2511.06538
作者: Ghazal Farhani,Taufiq Rahman,Kieran Humphries
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-67] Efficient Approximation of Volterra Series for High-Dimensional Systems

链接: https://arxiv.org/abs/2511.06527
作者: Navin Khoshnan,Claudia K Petritsch,Bryce-Allen Bagley
类目: Machine Learning (cs.LG); Systems and Control (eess.SY)
*备注:

点击查看摘要

[LG-68] EASE: Practical and Efficient Safety Alignment for Small Language Models AAAI2026

链接: https://arxiv.org/abs/2511.06512
作者: Haonan Shi,Guoli Wang,Tu Ouyang,An Wang
类目: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
*备注: Accepted to AAAI 2026

点击查看摘要

[LG-69] Probably Approximately Global Robustness Certification ICML2025

链接: https://arxiv.org/abs/2511.06495
作者: Peter Blohm,Patrick Indri,Thomas Gärtner,Sagar Malhotra
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注: ICML 2025

点击查看摘要

[LG-70] Learning Time-Varying Graph Signals via Koopman

链接: https://arxiv.org/abs/2511.06493
作者: Sivaram Krishnan,Jinho Choi,Jihong Park
类目: Machine Learning (cs.LG); Signal Processing (eess.SP)
*备注:

点击查看摘要

[LG-71] DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning

链接: https://arxiv.org/abs/2511.06477
作者: Nikolay Yudin,Ekaterina Grishina,Andrey Veprikov,Alexandr Beznosikov,Maxim Rakhuba
类目: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
*备注:

点击查看摘要

[LG-72] Error Estimate and Convergence Analysis for Data Valuation

链接: https://arxiv.org/abs/2511.06463
作者: Zhangyong Liang,Huanhuan Gao,Ji Zhang
类目: Machine Learning (cs.LG)
*备注: 7 pages, 1 figure

点击查看摘要

[LG-73] Reconstruction and Secrecy under Approximate Distance Queries NEURIPS2025

链接: https://arxiv.org/abs/2511.06461
作者: Shay Moran,Elizaveta Nesterova
类目: Machine Learning (cs.LG); Information Theory (cs.IT); Metric Geometry (math.MG)
*备注: 39 pages. Conference version: NeurIPS 2025 (Spotlight). Extended appendix included

点击查看摘要

[LG-74] MULTIBENCH: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains

链接: https://arxiv.org/abs/2511.06452
作者: Leyan Xue,Zongbo Han,Kecheng Xue,Xiaohong Liu,Guangyu Wang,Changqing Zhang
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Although multimodal fusion has made significant progress, its advancement is severely hindered by the lack of adequate evaluation benchmarks. Current fusion methods are typically evaluated on a small selection of public datasets, a limited scope that inadequately represents the complexity and diversity of real-world scenarios, potentially leading to biased evaluations. This issue presents a twofold challenge. On one hand, models may overfit to the biases of specific datasets, hindering their generalization to broader practical applications. On the other hand, the absence of a unified evaluation standard makes fair and objective comparisons between different fusion methods difficult. Consequently, a truly universal and high-performance fusion model has yet to emerge. To address these challenges, we have developed a large-scale, domain-adaptive benchmark for multimodal evaluation. This benchmark integrates over 30 datasets, encompassing 15 modalities and 20 predictive tasks across key application domains. To complement this, we have also developed an open-source, unified, and automated evaluation pipeline that includes standardized implementations of state-of-the-art models and diverse fusion paradigms. Leveraging this platform, we have conducted large-scale experiments, successfully establishing new performance baselines across multiple tasks. This work provides the academic community with a crucial platform for rigorous and reproducible assessment of multimodal models, aiming to propel the field of multimodal artificial intelligence to new heights.

[LG-75] A Risk-Neutral Neural Operator for Arbitrag e-Free SPX-VIX Term Structures

链接: https://arxiv.org/abs/2511.06451
作者: Jian’an Zhang
类目: Machine Learning (cs.LG); Computational Finance (q-fin.CP)
*备注: 46 pages, 9 figures, includes appendices; v11 draft aligned with final outline

点击查看摘要

[LG-76] How Wide and How Deep? Mitigating Over-Squashing of GNNs via Channel Capacity Constrained Estimation AAAI AAAI-26

链接: https://arxiv.org/abs/2511.06443
作者: Zinuo You,Jin Zheng,John Cartlidge
类目: Machine Learning (cs.LG)
*备注: 29 pages, 11 figures. Author manuscript accepted for the 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26), January 2026

点击查看摘要

[LG-77] Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding NIPS2025

链接: https://arxiv.org/abs/2511.06376
作者: Qian Ma,Ruoxiang Xu,Yongqiang Cai
类目: Machine Learning (cs.LG)
*备注: Accepted as NIPS 2025 poster

点击查看摘要

[LG-78] Adaptive Regularization for Large-Scale Sparse Feature Embedding Models

链接: https://arxiv.org/abs/2511.06374
作者: Mang Li,Wei Lyu
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:

点击查看摘要

[LG-79] Scalable Verification of Neural Control Barrier Functions Using Linear Bound Propagation

链接: https://arxiv.org/abs/2511.06341
作者: Nikolaus Vertovec,Frederik Baymler Mathiesen,Thom Badings,Luca Laurenti,Alessandro Abate
类目: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
*备注:

点击查看摘要

[LG-80] DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

链接: https://arxiv.org/abs/2511.06307
作者: Speed Zhu,Jianwei Cai,Guang Chen,Lulu Wu,Saiyong Yang,Wiggin Zhou
类目: Machine Learning (cs.LG)
*备注: 15 pages, 8 figures

点击查看摘要

[LG-81] Setting varepsilon is not the Issue in Differential Privacy NEURIPS

链接: https://arxiv.org/abs/2511.06305
作者: Edwige Cyffers
类目: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
*备注: Accepted to NeurIPS Position Paper track

点击查看摘要

[LG-82] 3dSAGER: Geospatial Entity Resolution over 3D Objects (Technical Report)

链接: https://arxiv.org/abs/2511.06300
作者: Bar Genossar,Sagi Dalyot,Roee Shraga,Avigdor Gal
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-83] Achieving Fairness Without Harm via Selective Demographic Experts AAAI26

链接: https://arxiv.org/abs/2511.06293
作者: Xuwei Tan,Yuanlong Wang,Thai-Hoang Pham,Ping Zhang,Xueru Zhang
类目: Machine Learning (cs.LG)
*备注: AAAI26; Extended version

点击查看摘要

[LG-84] LLM 3-DTI: A Large Language Model and Multi-modal data co-powered framework for Drug-Target Interaction prediction

链接: https://arxiv.org/abs/2511.06269
作者: Yuhao Zhang,Qinghong Guo,Qixian Chen,Liuwei Zhang,Hongyan Cui,Xiyi Chen
类目: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
*备注:

点击查看摘要

[LG-85] Synheart Emotion: Privacy-Preserving On-Device Emotion Recognition from Biosignals

链接: https://arxiv.org/abs/2511.06231
作者: Henok Ademtew,Israel Goytom
类目: Machine Learning (cs.LG)
*备注: Preprint submitted to the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT)

点击查看摘要

[LG-86] Deep Reinforcement Learning for Dynamic Origin-Destination Matrix Estimation in Microscopic Traffic Simulations Considering Credit Assignment

链接: https://arxiv.org/abs/2511.06229
作者: Donggyu Min,Seongjin Choi,Dong-Kyu Kim
类目: Machine Learning (cs.LG)
*备注: 11 pages, 10 figures, 3 tables

点击查看摘要

[LG-87] Adaptive Multi-view Graph Contrastive Learning via Fractional-order Neural Diffusion Networks

链接: https://arxiv.org/abs/2511.06216
作者: Yanan Zhao,Feng Ji,Jingyang Dai,Jiaze Ma,Keyue Jiang,Kai Zhao,Wee Peng Tay
类目: Machine Learning (cs.LG)
*备注: Submitted to TPAMI

点击查看摘要

[LG-88] me Matters: A Novel Real-Time Long- and Short-term User Interest Model for Click-Through Rate Prediction

链接: https://arxiv.org/abs/2511.06213
作者: Xian-Jin Gui
类目: Information Retrieval (cs.IR); Machine Learning (cs.LG)
*备注: This work was doned when the first author interned at Alibaba Group

点击查看摘要

[LG-89] Sparse Linear Regression is Easy on Random Supports

链接: https://arxiv.org/abs/2511.06211
作者: Gautam Chandrasekaran,Raghu Meka,Konstantinos Stavropoulos
类目: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST); Machine Learning (stat.ML)
*备注:

点击查看摘要

[LG-90] Local K-Similarity Constraint for Federated Learning with Label Noise

链接: https://arxiv.org/abs/2511.06169
作者: Sanskar Amgain,Prashant Shrestha,Bidur Khanal,Alina Devkota,Yash Raj Shrestha,Seungryul Baek,Prashnna Gyawali,Binod Bhattarai
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Federated learning on clients with noisy labels is a challenging problem, as such clients can infiltrate the global model, impacting the overall generalizability of the system. Existing methods proposed to handle noisy clients assume that a sufficient number of clients with clean labels are available, which can be leveraged to learn a robust global model while dampening the impact of noisy clients. This assumption fails when a high number of heterogeneous clients contain noisy labels, making the existing approaches ineffective. In such scenarios, it is important to locally regularize the clients before communication with the global model, to ensure the global model isn’t corrupted by noisy clients. While pre-trained self-supervised models can be effective for local regularization, existing centralized approaches relying on pretrained initialization are impractical in a federated setting due to the potentially large size of these models, which increases communication costs. In that line, we propose a regularization objective for client models that decouples the pre-trained and classification models by enforcing similarity between close data points within the client. We leverage the representation space of a self-supervised pretrained model to evaluate the closeness among examples. This regularization, when applied with the standard objective function for the downstream task in standard noisy federated settings, significantly improves performance, outperforming existing state-of-the-art federated methods in multiple computer vision and medical image classification benchmarks. Unlike other techniques that rely on self-supervised pretrained initialization, our method does not require the pretrained model and classifier backbone to share the same architecture, making it architecture-agnostic.

[LG-91] Learning Gaussian DAG Models without Condition Number Bounds

链接: https://arxiv.org/abs/2511.06164
作者: Constantinos Daskalakis,Vardis Kandiros,Rui Yao
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-92] Enhancing Robustness of Graph Neural Networks through p-Laplacian AAAI AAAI-26

链接: https://arxiv.org/abs/2511.06143
作者: Anuj Kumar Sirohi,Subhanu Halder,Kabir Kumar,Sandeep Kumar
类目: Machine Learning (cs.LG)
*备注: Accepted at 5th Workshop on Graphs and more Complex Structures For Learning and Reasoning (GCLR), The 40th AAAI Conference on Artificial Intelligence (AAAI-26)

点击查看摘要

[LG-93] On the Convergence and Stability of Distributed Sub-model Training

链接: https://arxiv.org/abs/2511.06132
作者: Yuyang Deng,Fuli Qiao,Mehrdad Mahdavi
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:As learning models continue to grow in size, enabling on-device local training of these models has emerged as a critical challenge in federated learning. A popular solution is sub-model training, where the server only distributes randomly sampled sub-models to the edge clients, and clients only update these small models. However, those random sampling of sub-models may not give satisfying convergence performance. In this paper, observing the success of SGD with shuffling, we propose a distributed shuffled sub-model training, where the full model is partitioned into several sub-models in advance, and the server shuffles those sub-models, sends each of them to clients at each round, and by the end of local updating period, clients send back the updated sub-models, and server averages them. We establish the convergence rate of this algorithm. We also study the generalization of distributed sub-model training via stability analysis, and find that the sub-model training can improve the generalization via amplifying the stability of training process. The extensive experiments also validate our theoretical findings.

[LG-94] A Deep Learning Model for Predicting Transformation Legality

链接: https://arxiv.org/abs/2511.06120
作者: Avani Tiwari,Yacine Hakimi,Riyadh Baghdadi
类目: Programming Languages (cs.PL); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Compilers must check the legality of code transformations to guarantee the correctness of applying a sequence of code transformations to a given code. While such a legality check needs to be precisely computed in general, we can use an approximate legality prediction model in certain cases, such as training a reinforcement learning (RL) agent for schedule prediction. In this paper, we propose an approximate method for legality checks. We propose a novel DL model for predicting the legality of transformations. The model takes the code representation and a list of transformations as input and predicts whether applying those transformations to the code is legal. We implement and evaluate the proposed model, demonstrating its effectiveness. Our evaluation shows an F1 score of 0.91 on a test set of randomly generated programs. To further evaluate the model in a practical scenario, we used the model to replace the legality check used during the training of an RL agent designed for automatic code optimization. We demonstrate that such a replacement enables the agent to train on twice as many steps, resulting in faster training and reducing resource usage by approximately 80% for CPU and 35% for RAM. The agent trained using this approach maintains comparable performance, with only a 4% reduction on benchmarks from the Polybench suite compared to the traditional method.

[LG-95] Guardian-regularized Safe Offline Reinforcement Learning for Smart Weaning of Mechanical Circulatory Devices

链接: https://arxiv.org/abs/2511.06111
作者: Aysin Tumay,Sophia Sun,Sonia Fereidooni,Aaron Dumas,Elise Jortberg,Rose Yu
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-96] Approximating Shapley Explanations in Reinforcement Learning NEURIPS2025

链接: https://arxiv.org/abs/2511.06094
作者: Daniel Beechey,Özgür Şimşek
类目: Machine Learning (cs.LG)
*备注: Camera-ready version. Published at the Conference on Neural Information Processing Systems (NeurIPS 2025)

点击查看摘要

[LG-97] Event-driven physics-informed operator learning for reliability analysis

链接: https://arxiv.org/abs/2511.06083
作者: Shailesh Garg,Souvik Chakraborty
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Reliability analysis of engineering systems under uncertainty poses significant computational challenges, particularly for problems involving high-dimensional stochastic inputs, nonlinear system responses, and multiphysics couplings. Traditional surrogate modeling approaches often incur high energy consumption, which severely limits their scalability and deployability in resource-constrained environments. We introduce NeuroPOL, \textitthe first neuroscience-inspired physics-informed operator learning framework for reliability analysis. NeuroPOL incorporates Variable Spiking Neurons into a physics-informed operator architecture, replacing continuous activations with event-driven spiking dynamics. This innovation promotes sparse communication, significantly reduces computational load, and enables an energy-efficient surrogate model. The proposed framework lowers both computational and power demands, supporting real-time reliability assessment and deployment on edge devices and digital twins. By embedding governing physical laws into operator learning, NeuroPOL builds physics-consistent surrogates capable of accurate uncertainty propagation and efficient failure probability estimation, even for high-dimensional problems. We evaluate NeuroPOL on five canonical benchmarks, the Burgers equation, Nagumo equation, two-dimensional Poisson equation, two-dimensional Darcy equation, and incompressible Navier-Stokes equation with energy coupling. Results show that NeuroPOL achieves reliability measures comparable to standard physics-informed operators, while introducing significant communication sparsity, enabling scalable, distributed, and energy-efficient deployment.

[LG-98] Make It Long Keep It Fast: End-to-End 10k-Sequence Modeling at Billion Scale on Douyin

链接: https://arxiv.org/abs/2511.06077
作者: Lin Guan,Jia-Qi Yang,Zhishan Zhao,Beichuan Zhang,Bo Sun,Xuanyuan Luo,Jinan Ni,Xiaowen Li,Yuhang Qi,Zhifang Fan,Hangyu Wang,Qiwei Chen,Yi Cheng,Feng Zhang,Xiao Yang
类目: Machine Learning (cs.LG); Information Retrieval (cs.IR)
*备注:

点击查看摘要

[LG-99] CatBack: Universal Backdoor Attacks on Tabular Data via Categorical Encoding

链接: https://arxiv.org/abs/2511.06072
作者: Behrad Tajalli,Stefanos Koffas,Stjepan Picek
类目: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
*备注:

点击查看摘要

[LG-100] Function Based Isolation Forest (FuBIF): A Unifying Framework for Interpretable Isolation-Based Anomaly Detection

链接: https://arxiv.org/abs/2511.06054
作者: Alessio Arcudi,Alessandro Ferreri,Francesco Borsatti,Gian Antonio Susto
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:

点击查看摘要

[LG-101] Physics-Informed Design of Input Convex Neural Networks for Consistency Optimal Transport Flow Matching

链接: https://arxiv.org/abs/2511.06042
作者: Fanghui Song,Zhongjian Wang,Jiebao Sun
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-102] Lethe: Layer- and Time-Adaptive KV Cache Pruning for Reasoning -Intensive LLM Serving AAAI26

链接: https://arxiv.org/abs/2511.06029
作者: Hui Zeng,Daming Zhao,Pengfei Yang,Wenxuan Hou,Tianyang Zheng,Hui Li,Weiye Ji,Jidong Zhai
类目: Machine Learning (cs.LG)
*备注: aaai26 camera-ready version, 12 pages

点击查看摘要

[LG-103] Learning solutions of parameterized stiff ODEs using Gaussian processes

链接: https://arxiv.org/abs/2511.05990
作者: Idoia Cortes Garcia,P. Förster,W. Schilders,S. Schöps
类目: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
*备注: 21 pages, 10 figures

点击查看摘要

Abstract:Stiff ordinary differential equations (ODEs) play an important role in many scientific and engineering applications. Often, the dependence of the solution of the ODE on additional parameters is of interest, e.g.\ when dealing with uncertainty quantification or design optimization. Directly studying this dependence can quickly become too computationally expensive, such that cheaper surrogate models approximating the solution are of interest. One popular class of surrogate models are Gaussian processes (GPs). They perform well when approximating stationary functions, functions which have a similar level of variation along any given parameter direction, however solutions to stiff ODEs are often characterized by a mixture of regions of rapid and slow variation along the time axis and when dealing with such nonstationary functions, GP performance frequently degrades drastically. We therefore aim to reparameterize stiff ODE solutions based on the available data, to make them appear more stationary and hence recover good GP performance. This approach comes with minimal computational overhead and requires no internal changes to the GP implementation, as it can be seen as a separate preprocessing step. We illustrate the achieved benefits using multiple examples.

[LG-104] Bespoke Co-processor for Energy-Efficient Health Monitoring on RISC-V-based Flexible Wearables DATE2026

链接: https://arxiv.org/abs/2511.05985
作者: Theofanis Vergos,Polykarpos Vergos,Mehdi B. Tahoori,Georgios Zervakis
类目: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
*备注: Accepted for publication at IEEE Design, Automation Test in Europe (DATE 2026)

点击查看摘要

Abstract:Flexible electronics offer unique advantages for conformable, lightweight, and disposable healthcare wearables. However, their limited gate count, large feature sizes, and high static power consumption make on-body machine learning classification highly challenging. While existing bendable RISC-V systems provide compact solutions, they lack the energy efficiency required. We present a mechanically flexible RISC-V that integrates a bespoke multiply-accumulate co-processor with fixed coefficients to maximize energy efficiency and minimize latency. Our approach formulates a constrained programming problem to jointly determine co-processor constants and optimally map Multi-Layer Perceptron (MLP) inference operations, enabling compact, model-specific hardware by leveraging the low fabrication and non-recurring engineering costs of flexible technologies. Post-layout results demonstrate near-real-time performance across several healthcare datasets, with our circuits operating within the power budget of existing flexible batteries and occupying only 2.42 mm^2, offering a promising path toward accessible, sustainable, and conformable healthcare wearables. Our microprocessors achieve an average 2.35x speedup and 2.15x lower energy consumption compared to the state of the art.

[LG-105] Are Time-Indexed Foundation Models the Future of Time Series Imputation?

链接: https://arxiv.org/abs/2511.05980
作者: Etienne Le Naour,Tahar Nabil,Adrien Petralia,Ghislain Agoua
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Foundation models for time series imputation remain largely unexplored. Recently, two such models, TabPFN-TS and MoTM, have emerged. These models share a common philosophy that places them within the family of time-indexed foundation models. This paper presents the first large-scale empirical study of these models for zero-shot imputation, which enables missing value recovery without retraining across a wide range of scenarios. We conduct extensive univariate experiments across 33 out-of-domain datasets (approximately 1.3M imputation windows) and evaluate their ability to integrate covariates at inference time to improve accuracy without fine-tuning. Our results demonstrate that time-indexed foundation models are a powerful and practical step toward achieving general-purpose, zero-shot imputation for real-world time series.

[LG-106] Explainable Deep Learning-based Classification of Wolff-Parkinson-White Electrocardiographic Signals

链接: https://arxiv.org/abs/2511.05973
作者: Alice Ragonesi,Stefania Fresca,Karli Gillette,Stefan Kurath-Koller,Gernot Plank,Elena Zappon
类目: Machine Learning (cs.LG); Numerical Analysis (math.NA); Tissues and Organs (q-bio.TO)
*备注: 27 pages, 9 figures, 4 tables

点击查看摘要

[LG-107] Next-Latent Prediction Transformers Learn Compact World Models MICRO

链接: https://arxiv.org/abs/2511.05963
作者: Jayden Teoh,Manan Tomar,Kwangjun Ahn,Edward S. Hu,Pratyusha Sharma,Riashat Islam,Alex Lamb,John Langford
类目: Machine Learning (cs.LG)
*备注: Preprint by Microsoft Research

点击查看摘要

Abstract:Transformers replace recurrence with a memory that grows with sequence length and self-attention that enables ad-hoc look ups over past tokens. Consequently, they lack an inherent incentive to compress history into compact latent states with consistent transition rules. This often leads to learning solutions that generalize poorly. We introduce Next-Latent Prediction (NextLat), which extends standard next-token training with self-supervised predictions in the latent space. Specifically, NextLat trains a transformer to learn latent representations that are predictive of its next latent state given the next output token. Theoretically, we show that these latents provably converge to belief states, compressed information of the history necessary to predict the future. This simple auxiliary objective also injects a recurrent inductive bias into transformers, while leaving their architecture, parallel training, and inference unchanged. NextLat effectively encourages the transformer to form compact internal world models with its own belief states and transition dynamics – a crucial property absent in standard next-token prediction transformers. Empirically, across benchmarks targeting core sequence modeling competencies – world modeling, reasoning, planning, and language modeling – NextLat demonstrates significant gains over standard next-token training in downstream accuracy, representation compression, and lookahead planning. NextLat stands as a simple and efficient paradigm for shaping transformer representations toward stronger generalization.

[LG-108] Deep Survival Analysis of Longitudinal EHR Data for Joint Prediction of Hospitalization and Death in COPD Patients

链接: https://arxiv.org/abs/2511.05960
作者: Enrico Manzini,Thomas Gonzalez Saito,Joan Escudero,Ana Génova,Cristina Caso,Tomas Perez-Porcuna,Alexandre Perera-Lluna
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-109] From Kernels to Attention: A Transformer Framework for Density and Score Estimation

链接: https://arxiv.org/abs/2511.05924
作者: Vasily Ilin,Peter Sushko
类目: Machine Learning (cs.LG)
*备注: 14 pages, 14 figures

点击查看摘要

Abstract:We introduce a unified attention-based framework for joint score and density estimation. Framing the problem as a sequence-to-sequence task, we develop a permutation- and affine-equivariant transformer that estimates both the probability density f(x) and its score \nabla_x \log f(x) directly from i.i.d. samples. Unlike traditional score-matching methods that require training a separate model for each distribution, our approach learns a single distribution-agnostic operator that generalizes across densities and sample sizes. The architecture employs cross-attention to connect observed samples with arbitrary query points, enabling generalization beyond the training data, while built-in symmetry constraints ensure equivariance to permutation and affine transformations. Analytically, we show that the attention weights can recover classical kernel density estimation (KDE), and verify it empirically, establishing a principled link between classical KDE and the transformer architecture. Empirically, the model achieves substantially lower error and better scaling than KDE and score-debiased KDE (SD-KDE), while exhibiting better runtime scaling. Together, these results establish transformers as general-purpose, data-adaptive operators for nonparametric density and score estimation.

[LG-110] FusionLog: Cross-System Log-based Anomaly Detection via Fusion of General and Proprietary Knowledge

链接: https://arxiv.org/abs/2511.05878
作者: Xinlong Zhao,Tong Jia,Minghua He,Xixuan Yang,Ying Li
类目: Machine Learning (cs.LG); Software Engineering (cs.SE)
*备注: 11 pages, 4 figures, and 2 tables

点击查看摘要

[LG-111] CADM: Cluster-customized Adaptive Distance Metric for Categorical Data Clustering

链接: https://arxiv.org/abs/2511.05826
作者: Taixi Chen,Yiu-ming Cheung,Yiqun Zhang
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注: 5 pages

点击查看摘要

Abstract:An appropriate distance metric is crucial for categorical data clustering, as the distance between categorical data cannot be directly calculated. However, the distances between attribute values usually vary in different clusters induced by their different distributions, which has not been taken into account, thus leading to unreasonable distance measurement. Therefore, we propose a cluster-customized distance metric for categorical data clustering, which can competitively update distances based on different distributions of attributes in each cluster. In addition, we extend the proposed distance metric to the mixed data that contains both numerical and categorical attributes. Experiments demonstrate the efficacy of the proposed method, i.e., achieving an average ranking of around first in fourteen datasets. The source code is available at this https URL

[LG-112] AiEDA: An Open-Source AI-Aided Design Library for Design-to-Vector

链接: https://arxiv.org/abs/2511.05823
作者: Yihang Qiu,Zengrong Huang,Simin Tao,Hongda Zhang,Weiguo Li,Xinhua Lai,Rui Wang,Weiqiang Wang,Xingquan Li
类目: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
*备注: 18 pages, 29 figures, accepted by TCAD 2025

点击查看摘要

Abstract:Recent research has demonstrated that artificial intelligence (AI) can assist electronic design automation (EDA) in improving both the quality and efficiency of chip design. But current AI for EDA (AI-EDA) infrastructures remain fragmented, lacking comprehensive solutions for the entire data pipeline from design execution to AI integration. Key challenges include fragmented flow engines that generate raw data, heterogeneous file formats for data exchange, non-standardized data extraction methods, and poorly organized data storage. This work introduces a unified open-source library for EDA (AiEDA) that addresses these issues. AiEDA integrates multiple design-to-vector data representation techniques that transform diverse chip design data into universal multi-level vector representations, establishing an AI-aided design (AAD) paradigm optimized for AI-EDA workflows. AiEDA provides complete physical design flows with programmatic data extraction and standardized Python interfaces bridging EDA datasets and AI frameworks. Leveraging the AiEDA library, we generate iDATA, a 600GB dataset of structured data derived from 50 real chip designs (28nm), and validate its effectiveness through seven representative AAD tasks spanning prediction, generation, optimization and analysis. The code is publicly available at this https URL, while the full iDATA dataset is being prepared for public release, providing a foundation for future AI-EDA research.

[LG-113] Catching Contamination Before Generation: Spectral Kill Switches for Agents

链接: https://arxiv.org/abs/2511.05804
作者: Valentin Noël
类目: Machine Learning (cs.LG); Signal Processing (eess.SP); Systems and Control (eess.SY); Machine Learning (stat.ML)
*备注: Preprint under review (2025). 9 pages, 2 figures. Code and scripts: to be released

点击查看摘要

Abstract:Agentic language models compose multi step reasoning chains, yet intermediate steps can be corrupted by inconsistent context, retrieval errors, or adversarial inputs, which makes post hoc evaluation too late because errors propagate before detection. We introduce a diagnostic that requires no additional training and uses only the forward pass to emit a binary accept or reject signal during agent execution. The method analyzes token graphs induced by attention and computes two spectral statistics in early layers, namely the high frequency energy ratio and spectral entropy. We formalize these signals, establish invariances, and provide finite sample estimators with uncertainty quantification. Under a two regime mixture assumption with a monotone likelihood ratio property, we show that a single threshold on the high frequency energy ratio is optimal in the Bayes sense for detecting context inconsistency. Empirically, the high frequency energy ratio exhibits robust bimodality during context verification across multiple model families, which enables gating decisions with overhead below one millisecond on our hardware and configurations. We demonstrate integration into retrieval augmented agent pipelines and discuss deployment as an inline safety monitor. The approach detects contamination while the model is still processing the text, before errors commit to the reasoning chain.

[LG-114] An Efficient Gradient-Aware Error-Bounded Lossy Compressor for Federated Learning

链接: https://arxiv.org/abs/2511.05770
作者: Zhijing Ye,Sheng Di,Jiamin Wang,Zhiqing Zhong,Zhaorui Zhang,Xiaodong Yu
类目: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
*备注: Preprint version

点击查看摘要

Abstract:Federated learning (FL) enables collaborative model training without exposing clients’ private data, but its deployment is often constrained by the communication cost of transmitting gradients between clients and the central server, especially under system heterogeneity where low-bandwidth clients bottleneck overall performance. Lossy compression of gradient data can mitigate this overhead, and error-bounded lossy compression (EBLC) is particularly appealing for its fine-grained utility-compression tradeoff. However, existing EBLC methods (e.g., SZ), originally designed for smooth scientific data with strong spatial locality, rely on generic predictors such as Lorenzo and interpolation for entropy reduction to improve compression ratio. Gradient tensors, in contrast, exhibit low smoothness and weak spatial correlation, rendering these predictors ineffective and leading to poor compression ratios. To address this limitation, we propose an EBLC framework tailored for FL gradient data to achieve high compression ratios while preserving model accuracy. The core of it is an innovative prediction mechanism that exploits temporal correlations across FL training rounds and structural regularities within convolutional kernels to reduce residual entropy. The predictor is compatible with standard quantizers and entropy coders and comprises (1) a cross-round magnitude predictor based on a normalized exponential moving average, and (2) a sign predictor that leverages gradient oscillation and kernel-level sign consistency. Experiments show that this new EBLC yields up to 1.53x higher compression ratios than SZ3 with lower accuracy loss. Integrated into a real-world FL framework, APPFL, it reduces end-to-end communication time by 76.1%-96.2% under various constrained-bandwidth scenarios, demonstrating strong scalability for real-world FL deployments.

[LG-115] Primal-Only Actor Critic Algorithm for Robust Constrained Averag e Cost MDPs

链接: https://arxiv.org/abs/2511.05758
作者: Anirudh Satheesh,Sooraj Sathish,Swetha Ganesh,Keenan Powell,Vaneet Aggarwal
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

[LG-116] Zero-Shot Function Encoder-Based Differentiable Predictive Control

链接: https://arxiv.org/abs/2511.05757
作者: Hassan Iqbal,Xingjian Li,Tyler Ingebrand,Adam Thorpe,Krishna Kumar,Ufuk Topcu,Ján Drgoňa
类目: ystems and Control (eess.SY); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:We introduce a differentiable framework for zero-shot adaptive control over parametric families of nonlinear dynamical systems. Our approach integrates a function encoder-based neural ODE (FE-NODE) for modeling system dynamics with a differentiable predictive control (DPC) for offline self-supervised learning of explicit control policies. The FE-NODE captures nonlinear behaviors in state transitions and enables zero-shot adaptation to new systems without retraining, while the DPC efficiently learns control policies across system parameterizations, thus eliminating costly online optimization common in classical model predictive control. We demonstrate the efficiency, accuracy, and online adaptability of the proposed method across a range of nonlinear systems with varying parametric scenarios, highlighting its potential as a general-purpose tool for fast zero-shot adaptive control.

[LG-117] Near-Exponential Savings for Mean Estimation with Active Learning NEURIPS2025

链接: https://arxiv.org/abs/2511.05736
作者: Julian M. Morimoto,Jacob Goldin,Daniel E. Ho
类目: Machine Learning (cs.LG)
*备注: Accepted to the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

点击查看摘要

[LG-118] QiVC-Net: Quantum-Inspired Variational Convolutional Network with Application to Biosignal Classification

链接: https://arxiv.org/abs/2511.05730
作者: Amin Golnari,Jamileh Yousefi,Reza Moheimani,Saeid Sanei
类目: Machine Learning (cs.LG); Signal Processing (eess.SP)
*备注:

点击查看摘要

Abstract:This work introduces the quantum-inspired variational convolution (QiVC) framework, a novel learning paradigm that integrates principles of probabilistic inference, variational optimization, and quantum-inspired transformations within convolutional architectures. The central innovation of QiVC lies in its quantum-inspired rotated ensemble (QiRE) mechanism. QiRE performs differentiable low-dimensional subspace rotations of convolutional weights, analogously to quantum state evolution. This approach enables structured uncertainty modeling while preserving the intrinsic geometry of the parameter space, resulting in more expressive, stable, and uncertainty-aware representations. To demonstrate its practical potential, the concept is instantiated in a QiVC-based convolutional network (QiVC-Net) and evaluated in the context of biosignal classification, focusing on phonocardiogram (PCG) recordings, a challenging domain characterized by high noise, inter-subject variability, and often imbalanced data. The proposed QiVC-Net integrates an architecture in which the QiVC layer does not introduce additional parameters, instead performing an ensemble rotation of the convolutional weights through a structured mechanism ensuring robustness without added highly computational burden. Experiments on two benchmark datasets, PhysioNet CinC 2016 and PhysioNet CirCor DigiScope 2022, show that QiVC-Net achieves state-of-the-art performance, reaching accuracies of 97.84% and 97.89%, respectively. These findings highlight the versatility of the QiVC framework and its promise for advancing uncertainty-aware modeling in real-world biomedical signal analysis. The implementation of the QiVConv layer is openly available in GitHub.

[LG-119] GastroDL-Fusion: A Dual-Modal Deep Learning Framework Integrating Protein-Ligand Complexes and Gene Sequences for Gastrointestinal Disease Drug Discovery

链接: https://arxiv.org/abs/2511.05726
作者: Ziyang Gao,Annie Cheung,Yihao Ou
类目: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
*备注:

点击查看摘要

Abstract:Accurate prediction of protein-ligand binding affinity plays a pivotal role in accelerating the discovery of novel drugs and vaccines, particularly for gastrointestinal (GI) diseases such as gastric ulcers, Crohn’s disease, and ulcerative colitis. Traditional computational models often rely on structural information alone and thus fail to capture the genetic determinants that influence disease mechanisms and therapeutic responses. To address this gap, we propose GastroDL-Fusion, a dual-modal deep learning framework that integrates protein-ligand complex data with disease-associated gene sequence information for drug and vaccine development. In our approach, protein-ligand complexes are represented as molecular graphs and modeled using a Graph Isomorphism Network (GIN), while gene sequences are encoded into biologically meaningful embeddings via a pre-trained Transformer (ProtBERT/ESM). These complementary modalities are fused through a multi-layer perceptron to enable robust cross-modal interaction learning. We evaluate the model on benchmark datasets of GI disease-related targets, demonstrating that GastroDL-Fusion significantly improves predictive performance over conventional methods. Specifically, the model achieves a mean absolute error (MAE) of 1.12 and a root mean square error (RMSE) of 1.75, outperforming CNN, BiLSTM, GIN, and Transformer-only baselines. These results confirm that incorporating both structural and genetic features yields more accurate predictions of binding affinities, providing a reliable computational tool for accelerating the design of targeted therapies and vaccines in the context of gastrointestinal diseases.

[LG-120] Distributionally Robust Multimodal Machine Learning

链接: https://arxiv.org/abs/2511.05716
作者: Peilin Yang,Yu Ma
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:We consider the problem of distributionally robust multimodal machine learning. Existing approaches often rely on merging modalities on the feature level (early fusion) or heuristic uncertainty modeling, which downplays modality-aware ef- fects and provide limited insights. We propose a novel distributionally robust optimization (DRO) framework that aims to study both the theoretical and practical insights of multimodal machine learning. We first justify this setup and show the significance of this problem through complexity analysis. We then establish both generalization upper bounds and minimax lower bounds which provide perfor- mance guarantees. These results are further extended in settings where we consider encoder-specific error propogations. Empirically, we demonstrate that our approach improves robustness in both simulation settings and real-world datasets. Together, these findings provide a principled foundation for employing multimodal machine learning models in high-stakes applications where uncertainty is unavoidable.

[LG-121] AI-assisted workflow enables rapid high-fidelity breast cancer clinical trial eligibility prescreening

链接: https://arxiv.org/abs/2511.05696
作者: Jacob T. Rosenthal,Emma Hahesy,Sulov Chalise,Menglei Zhu,Mert R. Sabuncu,Lior Z. Braunstein,Anyi Li
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Clinical trials play an important role in cancer care and research, yet participation rates remain low. We developed MSK-MATCH (Memorial Sloan Kettering Multi-Agent Trial Coordination Hub), an AI system for automated eligibility screening from clinical text. MSK-MATCH integrates a large language model with a curated oncology trial knowledge base and retrieval-augmented architecture providing explanations for all AI predictions grounded in source text. In a retrospective dataset of 88,518 clinical documents from 731 patients across six breast cancer trials, MSK-MATCH automatically resolved 61.9% of cases and triaged 38.1% for human review. This AI-assisted workflow achieved 98.6% accuracy, 98.4% sensitivity, and 98.7% specificity for patient-level eligibility classification, matching or exceeding performance of the human-only and AI-only comparisons. For the triaged cases requiring manual review, prepopulating eligibility screens with AI-generated explanations reduced screening time from 20 minutes to 43 seconds at an average cost of 0.96 per patient-trial pair.

[LG-122] Distributionally Robust Self Paced Curriculum Reinforcement Learning

链接: https://arxiv.org/abs/2511.05694
作者: Anirudh Satheesh,Keenan Powell,Vaneet Aggarwal
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:A central challenge in reinforcement learning is that policies trained in controlled environments often fail under distribution shifts at deployment into real-world environments. Distributionally Robust Reinforcement Learning (DRRL) addresses this by optimizing for worst-case performance within an uncertainty set defined by a robustness budget \epsilon . However, fixing \epsilon results in a tradeoff between performance and robustness: small values yield high nominal performance but weak robustness, while large values can result in instability and overly conservative policies. We propose Distributionally Robust Self-Paced Curriculum Reinforcement Learning (DR-SPCRL), a method that overcomes this limitation by treating \epsilon as a continuous curriculum. DR-SPCRL adaptively schedules the robustness budget according to the agent’s progress, enabling a balance between nominal and robust performance. Empirical results across multiple environments demonstrate that DR-SPCRL not only stabilizes training but also achieves a superior robustness-performance trade-off, yielding an average 11.8% increase in episodic return under varying perturbations compared to fixed or heuristic scheduling strategies, and achieving approximately 1.9 \times the performance of the corresponding nominal RL algorithms.

[LG-123] KLASS: KL-Guided Fast Inference in Masked Diffusion Models NEURIPS2025

链接: https://arxiv.org/abs/2511.05664
作者: Seo Hyun Kim,Sunwoo Hong,Hojung Jung,Youngrok Park,Se-Young Yun
类目: Machine Learning (cs.LG)
*备注: NeurIPS 2025 Spotlight. Code: this https URL

点击查看摘要

Abstract:Masked diffusion models have demonstrated competitive results on various tasks including language generation. However, due to its iterative refinement process, the inference is often bottlenecked by slow and static sampling speed. To overcome this problem, we introduce `KL-Adaptive Stability Sampling’ (KLASS), a fast yet effective sampling method that exploits token-level KL divergence to identify stable, high-confidence predictions. By unmasking multiple tokens in each iteration without any additional model training, our approach speeds up generation significantly while maintaining sample quality. On reasoning benchmarks, KLASS achieves up to 2.78\times wall-clock speedups while improving performance over standard greedy decoding, attaining state-of-the-art results among diffusion-based samplers. We further validate KLASS across diverse domains, including text, image, and molecular generation, showing its effectiveness as a broadly applicable sampler across different models.

[LG-124] Blind Inverse Game Theory: Jointly Decoding Rewards and Rationality in Entropy-Regularized Competitive Games

链接: https://arxiv.org/abs/2511.05640
作者: Hamza Virk,Sandro Amaglobeli,Zuhayr Syed
类目: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
*备注:

点击查看摘要

Abstract:Inverse Game Theory (IGT) methods based on the entropy-regularized Quantal Response Equilibrium (QRE) offer a tractable approach for competitive settings, but critically assume the agents’ rationality parameter (temperature \tau ) is known a priori. When \tau is unknown, a fundamental scale ambiguity emerges that couples \tau with the reward parameters ( \theta ), making them statistically unidentifiable. We introduce Blind-IGT, the first statistical framework to jointly recover both \theta and \tau from observed behavior. We analyze this bilinear inverse problem and establish necessary and sufficient conditions for unique identification by introducing a normalization constraint that resolves the scale ambiguity. We propose an efficient Normalized Least Squares (NLS) estimator and prove it achieves the optimal \mathcalO(N^-1/2) convergence rate for joint parameter recovery. When strong identifiability conditions fail, we provide partial identification guarantees through confidence set construction. We extend our framework to Markov games and demonstrate optimal convergence rates with strong empirical performance even when transition dynamics are unknown.

[LG-125] Physics-Guided Machine Learning for Uncertainty Quantification in Turbulence Models NEURIPS2025

链接: https://arxiv.org/abs/2511.05633
作者: Minghan Chu,Weicheng Qian
类目: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
*备注: Accepted to NeurIPS 2025 Workshop on Machine Learning and the Physical Sciences (ML4PS), non-archival

点击查看摘要

Abstract:Predicting the evolution of turbulent flows is central across science and engineering. Most studies rely on simulations with turbulence models, whose empirical simplifications introduce epistemic uncertainty. The Eigenspace Perturbation Method (EPM) is a widely used physics-based approach to quantify model-form uncertainty, but being purely physics-based it can overpredict uncertainty bounds. We propose a convolutional neural network (CNN)-based modulation of EPM perturbation magnitudes to improve calibration while preserving physical consistency. Across canonical cases, the hybrid ML-EPM framework yields substantially tighter, better-calibrated uncertainty estimates than baseline EPM alone.

[LG-126] Fooling Algorithms in Non-Stationary Bandits using Belief Inertia

链接: https://arxiv.org/abs/2511.05620
作者: Gal Mendelson,Eyal Tadmor
类目: Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
*备注:

点击查看摘要

Abstract:We study the problem of worst case regret in piecewise stationary multi armed bandits. While the minimax theory for stationary bandits is well established, understanding analogous limits in time-varying settings is challenging. Existing lower bounds rely on what we refer to as infrequent sampling arguments, where long intervals without exploration allow adversarial reward changes that induce large regret. In this paper, we introduce a fundamentally different approach based on a belief inertia argument. Our analysis captures how an algorithm’s empirical beliefs, encoded through historical reward averages, create momentum that resists new evidence after a change. We show how this inertia can be exploited to construct adversarial instances that mislead classical algorithms such as Explore Then Commit, epsilon greedy, and UCB, causing them to suffer regret that grows linearly with T and with a substantial constant factor, regardless of how their parameters are tuned, even with a single change point. We extend the analysis to algorithms that periodically restart to handle non stationarity and prove that, even then, the worst case regret remains linear in T. Our results indicate that utilizing belief inertia can be a powerful method for deriving sharp lower bounds in non stationary bandits. Subjects: Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML) Cite as: arXiv:2511.05620 [cs.LG] (or arXiv:2511.05620v1 [cs.LG] for this version) https://doi.org/10.48550/arXiv.2511.05620 Focus to learn more arXiv-issued DOI via DataCite

[LG-127] FiCABU: A Fisher-Based Context-Adaptive Machine Unlearning Processor for Edge AI DATE2026

链接: https://arxiv.org/abs/2511.05605
作者: Eun-Su Cho,Jongin Choi,Jeongmin Jin,Jae-Jin Lee,Woojoo Lee
类目: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
*备注: 8 pages, 6 figures, 4 tables, DATE 2026 accepted paper

点击查看摘要

Abstract:Machine unlearning, driven by privacy regulations and the “right to be forgotten”, is increasingly needed at the edge, yet server-centric or retraining-heavy methods are impractical under tight computation and energy budgets. We present FiCABU (Fisher-based Context-Adaptive Balanced Unlearning), a software-hardware co-design that brings unlearning to edge AI processors. FiCABU combines (i) Context-Adaptive Unlearning, which begins edits from back-end layers and halts once the target forgetting is reached, with (ii) Balanced Dampening, which scales dampening strength by depth to preserve retain accuracy. These methods are realized in a full RTL design of a RISC-V edge AI processor that integrates two lightweight IPs for Fisher estimation and dampening into a GEMM-centric streaming pipeline, validated on an FPGA prototype and synthesized in 45 nm for power analysis. Across CIFAR-20 and PinsFaceRecognition with ResNet-18 and ViT, FiCABU achieves random-guess forget accuracy while matching the retraining-free Selective Synaptic Dampening (SSD) baseline on retain accuracy, reducing computation by up to 87.52 percent (ResNet-18) and 71.03 percent (ViT). On the INT8 hardware prototype, FiCABU further improves retain preservation and reduces energy to 6.48 percent (CIFAR-20) and 0.13 percent (PinsFaceRecognition) of the SSD baseline. In sum, FiCABU demonstrates that back-end-first, depth-aware unlearning can be made both practical and efficient for resource-constrained edge AI devices.

[LG-128] AutoHood3D: A Multi-Modal Benchmark for Automotive Hood Design and Fluid-Structure Interaction

链接: https://arxiv.org/abs/2511.05596
作者: Vansh Sharma,Harish Jai Ganesh,Maryam Akram,Wanjiao Liu,Venkat Raman
类目: Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn)
*备注:

点击查看摘要

Abstract:This study presents a new high-fidelity multi-modal dataset containing 16000+ geometric variants of automotive hoods useful for machine learning (ML) applications such as engineering component design and process optimization, and multiphysics system surrogates. The dataset is centered on a practical multiphysics problem-hood deformation from fluid entrapment and inertial loading during rotary-dip painting. Each hood is numerically modeled with a coupled Large-Eddy Simulation (LES)-Finite Element Analysis (FEA), using 1.2M cells in total to ensure spatial and temporal accuracy. The dataset provides time-resolved physical fields, along with STL meshes and structured natural language prompts for text-to-geometry synthesis. Existing datasets are either confined to 2D cases, exhibit limited geometric variations, or lack the multi-modal annotations and data structures - shortcomings we address with AutoHood3D. We validate our numerical methodology, establish quantitative baselines across five neural architectures, and demonstrate systematic surrogate errors in displacement and force predictions. These findings motivate the design of novel approaches and multiphysics loss functions that enforce fluid-solid coupling during model training. By providing fully reproducible workflows, AutoHood3D enables physics-aware ML development, accelerates generative-design iteration, and facilitates the creation of new FSI benchmarks. Dataset and code URLs in Appendix.

[LG-129] Optimizing Predictive Maintenance in Intelligent Manufacturing: An Integrated FNO-DAE-GNN-PPO MDP Framework

链接: https://arxiv.org/abs/2511.05594
作者: Shiqing Qiu
类目: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
*备注:

点击查看摘要

Abstract:In the era of smart manufacturing, predictive maintenance (PdM) plays a pivotal role in improving equipment reliability and reducing operating costs. In this paper, we propose a novel Markov Decision Process (MDP) framework that integrates advanced soft computing techniques - Fourier Neural Operator (FNO), Denoising Autoencoder (DAE), Graph Neural Network (GNN), and Proximal Policy Optimisation (PPO) - to address the multidimensional challenges of predictive maintenance in complex manufacturing systems. Specifically, the proposed framework innovatively combines the powerful frequency-domain representation capability of FNOs to capture high-dimensional temporal patterns; DAEs to achieve robust, noise-resistant latent state embedding from complex non-Gaussian sensor data; and GNNs to accurately represent inter-device dependencies for coordinated system-wide maintenance decisions. Furthermore, by exploiting PPO, the framework ensures stable and efficient optimisation of long-term maintenance strategies to effectively handle uncertainty and non-stationary dynamics. Experimental validation demonstrates that the approach significantly outperforms multiple deep learning baseline models with up to 13% cost reduction, as well as strong convergence and inter-module synergy. The framework has considerable industrial potential to effectively reduce downtime and operating expenses through data-driven strategies.

[LG-130] Gradient Projection onto Historical Descent Directions for Communication-Efficient Federated Learning

链接: https://arxiv.org/abs/2511.05593
作者: Arnaud Descours(UCBL),Léonard Deroose,Jan Ramon
类目: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Statistics Theory (math.ST)
*备注:

点击查看摘要

[LG-131] GRAVER: Generative Graph Vocabularies for Robust Graph Foundation Models Fine-tuning NEURIPS2025

链接: https://arxiv.org/abs/2511.05592
作者: Haonan Yuan,Qingyun Sun,Junhua Shi,Xingcheng Fu,Bryan Hooi,Jianxin Li,Philip S. Yu
类目: Machine Learning (cs.LG)
*备注: Accepted by the NeurIPS 2025

点击查看摘要

Abstract:Inspired by the remarkable success of foundation models in language and vision, Graph Foundation Models (GFMs) hold significant promise for broad applicability across diverse graph tasks and domains. However, existing GFMs struggle with unstable few-shot fine-tuning, where both performance and adaptation efficiency exhibit significant fluctuations caused by the randomness in the support sample selection and structural discrepancies between the pre-trained and target graphs. How to fine-tune GFMs robustly and efficiently to enable trustworthy knowledge transfer across domains and tasks is the major challenge. In this paper, we propose GRAVER, a novel Generative gRAph VocabulariEs for Robust GFM fine-tuning framework that tackles the aforementioned instability via generative augmentations. Specifically, to identify transferable units, we analyze and extract key class-specific subgraph patterns by ego-graph disentanglement and validate their transferability both theoretically and empirically. To enable effective pre-training across diverse domains, we leverage a universal task template based on ego-graph similarity and construct graph vocabularies via graphon-based generative experts. To facilitate robust and efficient prompt fine-tuning, we grave the support samples with in-context vocabularies, where the lightweight MoE-CoE network attentively routes knowledge from source domains. Extensive experiments demonstrate the superiority of GRAVER over effectiveness, robustness, and efficiency on downstream few-shot node and graph classification tasks compared with 15 state-of-the-art baselines.

[LG-132] FedSparQ: Adaptive Sparse Quantization with Error Feedback for Robust Efficient Federated Learning

链接: https://arxiv.org/abs/2511.05591
作者: Chaimaa Medjadji,Sadi Alawadi,Feras M. Awaysheh,Guilain Leduc,Sylvain Kubler,Yves Le Traon
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Federated Learning (FL) enables collaborative model training across decentralized clients while preserving data privacy by keeping raw data local. However, FL suffers from significant communication overhead due to the frequent exchange of high-dimensional model updates over constrained networks. In this paper, we present FedSparQ, a lightweight compression framework that dynamically sparsifies the gradient of each client through an adaptive threshold, applies half-precision quanti- zation to retained entries and integrates residuals from error feedback to prevent loss of information. FedSparQ requires no manual tuning of sparsity rates or quantization schedules, adapts seamlessly to both homogeneous and heterogeneous data distributions, and is agnostic to model architecture. Through extensive empirical evaluation on vision benchmarks under independent and identically distributed (IID) and non-IID data, we show that FedSparQ substantially reduces communication overhead (reducing by 90% of bytes sent compared to FedAvg) while preserving or improving model accuracy (improving by 6% compared to FedAvg non-compressed solution or to state-of-the- art compression models) and enhancing convergence robustness (by 50%, compared to the other baselines). Our approach provides a practical, easy-to-deploy solution for bandwidth- constrained federated deployments and lays the groundwork for future extensions in adaptive precision and privacy-preserving protocols.

[LG-133] Prompting Neural-Guided Equation Discovery Based on Residuals

链接: https://arxiv.org/abs/2511.05586
作者: Jannis Brugger,Viktor Pfanschilling,David Richter,Mira Mezini,Stefan Kramer
类目: Machine Learning (cs.LG)
*备注: 16 pages, 7 figures, Discovery Science 2025

点击查看摘要

Abstract:Neural-guided equation discovery systems use a data set as prompt and predict an equation that describes the data set without extensive search. However, if the equation does not meet the user’s expectations, there are few options for getting other equation suggestions without intensive work with the system. To fill this gap, we propose Residuals for Equation Discovery (RED), a post-processing method that improves a given equation in a targeted manner, based on its residuals. By parsing the initial equation to a syntax tree, we can use node-based calculation rules to compute the residual for each subequation of the initial equation. It is then possible to use this residual as new target variable in the original data set and generate a new prompt. If, with the new prompt, the equation discovery system suggests a subequation better than the old subequation on a validation set, we replace the latter by the former. RED is usable with any equation discovery system, is fast to calculate, and is easy to extend for new mathematical operations. In experiments on 53 equations from the Feynman benchmark, we show that it not only helps to improve all tested neural-guided systems, but also all tested classical genetic programming systems.

[LG-134] Depth-induced NTK: Bridging Over-parameterized Neural Networks and Deep Neural Kernels

链接: https://arxiv.org/abs/2511.05585
作者: Yong-Ming Tian,Shuang Liang,Shao-Qun Zhang,Feng-Lei Fan
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:While deep learning has achieved remarkable success across a wide range of applications, its theoretical understanding of representation learning remains limited. Deep neural kernels provide a principled framework to interpret over-parameterized neural networks by mapping hierarchical feature transformations into kernel spaces, thereby combining the expressive power of deep architectures with the analytical tractability of kernel methods. Recent advances, particularly neural tangent kernels (NTKs) derived by gradient inner products, have established connections between infinitely wide neural networks and nonparametric Bayesian inference. However, the existing NTK paradigm has been predominantly confined to the infinite-width regime, while overlooking the representational role of network depth. To address this gap, we propose a depth-induced NTK kernel based on a shortcut-related architecture, which converges to a Gaussian process as the network depth approaches infinity. We theoretically analyze the training invariance and spectrum properties of the proposed kernel, which stabilizes the kernel dynamics and mitigates degeneration. Experimental results further underscore the effectiveness of our proposed method. Our findings significantly extend the existing landscape of the neural kernel theory and provide an in-depth understanding of deep learning and the scaling law.

[LG-135] Distillation-Accelerated Uncertainty Modeling for Multi-Objective RTA Interception

链接: https://arxiv.org/abs/2511.05582
作者: Gaoxiang Zhao,Ruina Qiu,Pengpeng Zhao,Rongjin Wang,Zhangang Lin,Xiaoqiang Wang
类目: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
*备注:

点击查看摘要

Abstract:Real-Time Auction (RTA) Interception aims to filter out invalid or irrelevant traffic to enhance the integrity and reliability of downstream data. However, two key challenges remain: (i) the need for accurate estimation of traffic quality together with sufficiently high confidence in the model’s predictions, typically addressed through uncertainty modeling, and (ii) the efficiency bottlenecks that such uncertainty modeling introduces in real-time applications due to repeated inference. To address these challenges, we propose DAUM, a joint modeling framework that integrates multi-objective learning with uncertainty modeling, yielding both traffic quality predictions and reliable confidence estimates. Building on DAUM, we further apply knowledge distillation to reduce the computational overhead of uncertainty modeling, while largely preserving predictive accuracy and retaining the benefits of uncertainty estimation. Experiments on the JD advertisement dataset demonstrate that DAUM consistently improves predictive performance, with the distilled model delivering a tenfold increase in inference speed.

[LG-136] Data-driven jet fuel demand forecasting: A case study of Copenhagen Airport

链接: https://arxiv.org/abs/2511.05569
作者: Alessandro Contini,Davide Cacciarelli,Murat Kulahci
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Accurate forecasting of jet fuel demand is crucial for optimizing supply chain operations in the aviation market. Fuel distributors specifically require precise estimates to avoid inventory shortages or excesses. However, there is a lack of studies that analyze the jet fuel demand forecasting problem using machine learning models. Instead, many industry practitioners rely on deterministic or expertise-based models. In this research, we evaluate the performance of data-driven approaches using a substantial amount of data obtained from a major aviation fuel distributor in the Danish market. Our analysis compares the predictive capabilities of traditional time series models, Prophet, LSTM sequence-to-sequence neural networks, and hybrid models. A key challenge in developing these models is the required forecasting horizon, as fuel demand needs to be predicted for the next 30 days to optimize sourcing strategies. To ensure the reliability of the data-driven approaches and provide valuable insights to practitioners, we analyze three different datasets. The primary objective of this study is to present a comprehensive case study on jet fuel demand forecasting, demonstrating the advantages of employing data-driven models and highlighting the impact of incorporating additional variables in the predictive models.

[LG-137] Daily Forecasting for Annual Time Series Datasets Using Similarity-Based Machine Learning Methods: A Case Study in the Energy Market

链接: https://arxiv.org/abs/2511.05556
作者: Mahdi Goldani
类目: Machine Learning (cs.LG); General Economics (econ.GN)
*备注:

点击查看摘要

Abstract:The policy environment of countries changes rapidly, influencing macro-level indicators such as the Energy Security Index. However, this index is only reported annually, limiting its responsiveness to short-term fluctuations. To address this gap, the present study introduces a daily proxy for the Energy Security Index and applies it to forecast energy security at a daily this http URL study employs a two stage approach first, a suitable daily proxy for the annual Energy Security Index is identified by applying six time series similarity measures to key energy related variables. Second, the selected proxy is modeled using the XGBoost algorithm to generate 15 day ahead forecasts, enabling high frequency monitoring of energy security this http URL the result of proxy choosing, Volume Brent consistently emerged as the most suitable proxy across the majority of methods. The model demonstrated strong performance, with an R squared of 0.981 on the training set and 0.945 on the test set, and acceptable error metrics . The 15 day forecast of Brent volume indicates short term fluctuations, with a peak around day 4, a decline until day 8, a rise near day 10, and a downward trend toward day 15, accompanied by prediction this http URL integrating time series similarity measures with machine learning based forecasting, this study provides a novel framework for converting low frequency macroeconomic indicators into high frequency, actionable signals. The approach enables real time monitoring of the Energy Security Index, offering policymakers and analysts a scalable and practical tool to respond more rapidly to fast changing policy and market conditions, especially in data scarce environments.

[LG-138] EEG Seizure Detection with a Sparse Hyperdimensional Computing Accelerator MICRO

链接: https://arxiv.org/abs/2511.05503
作者: Stef Cuyckens,Ryan Antonio,Chao Fang,Marian Verhelst
类目: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
*备注: To appear at the 20th International Conference on PhD Research in Microelectronics and Electronics (PRIME 2025)

点击查看摘要

Abstract:Implantable devices for reliable intracranial electroencephalography (iEEG) require efficient, accurate, and real-time detection of seizures. Dense hyperdimensional computing (HDC) proves to be efficient over neural networks; however, it still consumes considerable switching power for an ultra-low energy application. Sparse HDC, on the other hand, has the potential of further reducing the energy consumption, yet at the expense of having to support more complex operations and introducing an extra hyperparameter, the maximum hypervector density. To improve the energy and area efficiency of the sparse HDC operations, this work introduces the compressed item memory (CompIM) and simplifies the spatial bundling. We also analyze how a proper hyperparameter choice improves the detection delay compared to dense HDC. Ultimately, our optimizations achieve a 1.73x more energy- and 2.20x more area-efficient hardware design than the naive sparse implementation. We are also 7.50x more energy- and 3.24x more area-efficient than the dense HDC implementation. This work highlights the hardware advantages of sparse HDC, demonstrating its potential to enable smaller brain implants with a substantially extended battery life compared to the current state-of-the-art.

[LG-139] Socially Aware Music Recommendation: A Multi-Modal Graph Neural Networks for Collaborative Music Consumption and Community-Based Engagement

链接: https://arxiv.org/abs/2511.05497
作者: Kajwan Ziaoddini
类目: Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM)
*备注:

点击查看摘要

Abstract:This study presents a novel Multi-Modal Graph Neural Network (MM-GNN) framework for socially aware music recommendation, designed to enhance personalization and foster community-based engagement. The proposed model introduces a fusion-free deep mutual learning strategy that aligns modality-specific representations from lyrics, audio, and visual data while maintaining robustness against missing modalities. A heterogeneous graph structure is constructed to capture both user-song interactions and user-user social relationships, enabling the integration of individual preferences with social influence. Furthermore, emotion-aware embeddings derived from acoustic and textual signals contribute to emotionally aligned recommendations. Experimental evaluations on benchmark datasets demonstrate that MM-GNN significantly outperforms existing state-of-the-art methods across various performance metrics. Ablation studies further validate the critical impact of each model component, confirming the effectiveness of the framework in delivering accurate and socially contextualized music recommendations.

[LG-140] Solving bilevel optimization via sequential minimax optimization

链接: https://arxiv.org/abs/2511.07398
作者: Zhaosong Lu,Sanyou Mei
类目: Optimization and Control (math.OC); Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
*备注: Accepted by Mathematics of Operations Research

点击查看摘要

Abstract:In this paper we propose a sequential minimax optimization (SMO) method for solving a class of constrained bilevel optimization problems in which the lower-level part is a possibly nonsmooth convex optimization problem, while the upper-level part is a possibly nonconvex optimization problem. Specifically, SMO applies a first-order method to solve a sequence of minimax subproblems, which are obtained by employing a hybrid of modified augmented Lagrangian and penalty schemes on the bilevel optimization problems. Under suitable assumptions, we establish an operation complexity of O(\varepsilon^-7\log\varepsilon^-1) and O(\varepsilon^-6\log\varepsilon^-1) , measured in terms of fundamental operations, for SMO in finding an \varepsilon -KKT solution of the bilevel optimization problems with merely convex and strongly convex lower-level objective functions, respectively. The latter result improves the previous best-known operation complexity by a factor of \varepsilon^-1 . Preliminary numerical results demonstrate significantly superior computational performance compared to the recently developed first-order penalty method.

[LG-141] Walsh-Hadamard Neural Operators for Solving PDEs with Discontinuous Coefficients

链接: https://arxiv.org/abs/2511.07347
作者: Giorrgio M. Cavallazzi,Miguel Perex Cuadrado,Alfredo Pinelli
类目: Computational Physics (physics.comp-ph); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Neural operators have emerged as powerful tools for learning solution operators of partial differential equations (PDEs). However, standard spectral methods based on Fourier transforms struggle with problems involving discontinuous coefficients due to the Gibbs phenomenon and poor representation of sharp interfaces. We introduce the Walsh-Hadamard Neural Operator (WHNO), which leverages Walsh-Hadamard transforms-a spectral basis of rectangular wave functions naturally suited for piecewise constant fields-combined with learnable spectral weights that transform low-sequency Walsh coefficients to capture global dependencies efficiently. We validate WHNO on three problems: steady-state Darcy flow (preliminary validation), heat conduction with discontinuous thermal conductivity, and the 2D Burgers equation with discontinuous initial conditions. In controlled comparisons with Fourier Neural Operators (FNO) under identical conditions, WHNO demonstrates superior accuracy with better preservation of sharp solution features at material interfaces. Critically, we discover that weighted ensemble combinations of WHNO and FNO achieve substantial improvements over either model alone: for both heat conduction and Burgers equation, optimal ensembles reduce mean squared error by 35-40 percent and maximum error by up to 25 percent compared to individual models. This demonstrates that Walsh-Hadamard and Fourier representations capture complementary aspects of discontinuous PDE solutions, with WHNO excelling at sharp interfaces while FNO captures smooth features effectively.

[LG-142] De-Individualizing fMRI Signals via Mahalanobis Whitening and Bures Geometry

链接: https://arxiv.org/abs/2511.07313
作者: Aaron Jacobson,Tingting Dan,Martin Styner,Guorong Wu,Shahar Kovalsky,Caroline Moosmueller
类目: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
*备注: 34 pages, 7 figures

点击查看摘要

Abstract:Functional connectivity has been widely investigated to understand brain disease in clinical studies and imaging-based neuroscience, and analyzing changes in functional connectivity has proven to be valuable for understanding and computationally evaluating the effects on brain function caused by diseases or experimental stimuli. By using Mahalanobis data whitening prior to the use of dimensionality reduction algorithms, we are able to distill meaningful information from fMRI signals about subjects and the experimental stimuli used to prompt them. Furthermore, we offer an interpretation of Mahalanobis whitening as a two-stage de-individualization of data which is motivated by similarity as captured by the Bures distance, which is connected to quantum mechanics. These methods have potential to aid discoveries about the mechanisms that link brain function with cognition and behavior and may improve the accuracy and consistency of Alzheimer’s diagnosis, especially in the preclinical stage of disease progression.

[LG-143] he Value of Personalized Recommendations: Evidence from Netflix

链接: https://arxiv.org/abs/2511.07280
作者: Kevin Zielnicki,Guy Aridor,Aurélien Bibaut,Allen Tran,Winston Chou,Nathan Kallus
类目: General Economics (econ.GN); Information Retrieval (cs.IR); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Personalized recommendation systems shape much of user choice online, yet their targeted nature makes separating out the value of recommendation and the underlying goods challenging. We build a discrete choice model that embeds recommendation-induced utility, low-rank heterogeneity, and flexible state dependence and apply the model to viewership data at Netflix. We exploit idiosyncratic variation introduced by the recommendation algorithm to identify and separately value these components as well as to recover model-free diversion ratios that we can use to validate our structural model. We use the model to evaluate counterfactuals that quantify the incremental engagement generated by personalized recommendations. First, we show that replacing the current recommender system with a matrix factorization or popularity-based algorithm would lead to 4% and 12% reduction in engagement, respectively, and decreased consumption diversity. Second, most of the consumption increase from recommendations comes from effective targeting, not mechanical exposure, with the largest gains for mid-popularity goods (as opposed to broadly appealing or very niche goods).

[LG-144] High-Dimensional Asymptotics of Differentially Private PCA

链接: https://arxiv.org/abs/2511.07270
作者: Youngjoo Yun,Rishabh Dudeja
类目: atistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
*备注:

点击查看摘要

Abstract:In differential privacy, statistics of a sensitive dataset are privatized by introducing random noise. Most privacy analyses provide privacy bounds specifying a noise level sufficient to achieve a target privacy guarantee. Sometimes, these bounds are pessimistic and suggest adding excessive noise, which overwhelms the meaningful signal. It remains unclear if such high noise levels are truly necessary or a limitation of the proof techniques. This paper explores whether we can obtain sharp privacy characterizations that identify the smallest noise level required to achieve a target privacy level for a given mechanism. We study this problem in the context of differentially private principal component analysis, where the goal is to privatize the leading principal components (PCs) of a dataset with n samples and p features. We analyze the exponential mechanism for this problem in a model-free setting and provide sharp utility and privacy characterizations in the high-dimensional limit ( p\rightarrow\infty ). Our privacy result shows that, in high dimensions, detecting the presence of a target individual in the dataset using the privatized PCs is exactly as hard as distinguishing two Gaussians with slightly different means, where the mean difference depends on certain spectral properties of the dataset. Our privacy analysis combines the hypothesis-testing formulation of privacy guarantees proposed by Dong, Roth, and Su (2022) with classical contiguity arguments due to Le Cam to obtain sharp high-dimensional privacy characterizations.

[LG-145] Simulation-based Methods for Optimal Sampling Design in Systems Biology

链接: https://arxiv.org/abs/2511.07197
作者: Tuan Minh Ha,Binh Thanh Nguyen,Lam Si Tung Ho
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:In many areas of systems biology, including virology, pharmacokinetics, and population biology, dynamical systems are commonly used to describe biological processes. These systems can be characterized by estimating their parameters from sampled data. The key problem is how to optimally select sampling points to achieve accurate parameter estimation. Classical approaches often rely on Fisher information matrix-based criteria such as A-, D-, and E-optimality, which require an initial parameter estimate and may yield suboptimal results when the estimate is inaccurate. This study proposes two simulation-based methods for optimal sampling design that do not depend on initial parameter estimates. The first method, E-optimal-ranking (EOR), employs the E-optimal criterion, while the second utilizes a Long Short-Term Memory (LSTM) neural network. Simulation studies based on the Lotka-Volterra and three-compartment models demonstrate that the proposed methods outperform both random selection and classical E-optimal design.

[LG-146] Anatomy-Aware Lymphoma Lesion Detection in Whole-Body PET/CT

链接: https://arxiv.org/abs/2511.07047
作者: Simone Bendazzoli,Antonios Tzortzakakis,Andreas Abrahamsson,Björn Engelbrekt Wahlin,Örjan Smedby,Maria Holstensson,Rodrigo Moreno
类目: Image and Video Processing (eess.IV); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Early cancer detection is crucial for improving patient outcomes, and 18F FDG PET/CT imaging plays a vital role by combining metabolic and anatomical information. Accurate lesion detection remains challenging due to the need to identify multiple lesions of varying sizes. In this study, we investigate the effect of adding anatomy prior information to deep learning-based lesion detection models. In particular, we add organ segmentation masks from the TotalSegmentator tool as auxiliary inputs to provide anatomical context to nnDetection, which is the state-of-the-art for lesion detection, and Swin Transformer. The latter is trained in two stages that combine self-supervised pre-training and supervised fine-tuning. The method is tested in the AutoPET and Karolinska lymphoma datasets. The results indicate that the inclusion of anatomical priors substantially improves the detection performance within the nnDetection framework, while it has almost no impact on the performance of the vision transformer. Moreover, we observe that Swin Transformer does not offer clear advantages over conventional convolutional neural network (CNN) encoders used in nnDetection. These findings highlight the critical role of the anatomical context in cancer lesion detection, especially in CNN-based models.

[LG-147] Dimensionality reduction and width of deep neural networks based on topological degree theory

链接: https://arxiv.org/abs/2511.06821
作者: Xiao-Song Yang
类目: General Topology (math.GN); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:In this paper we present a mathematical framework on linking of embeddings of compact topological spaces into Euclidean spaces and separability of linked embeddings under a specific class of dimension reduction maps. As applications of the established theory, we provide some fascinating insights into classification and approximation problems in deep learning theory in the setting of deep neural networks.

[LG-148] Convergence of Actor-Critic Learning for Mean Field Games and Mean Field Control in Continuous Spaces

链接: https://arxiv.org/abs/2511.06812
作者: Jean-Pierre Fouque,Mathieu Laurière,Mengrui Zhang
类目: Optimization and Control (math.OC); Machine Learning (cs.LG); Probability (math.PR)
*备注:

点击查看摘要

Abstract:We establish the convergence of the deep actor-critic reinforcement learning algorithm presented in [Angiuli et al., 2023a] in the setting of continuous state and action spaces with an infinite discrete-time horizon. This algorithm provides solutions to Mean Field Game (MFG) or Mean Field Control (MFC) problems depending on the ratio between two learning rates: one for the value function and the other for the mean field term. In the MFC case, to rigorously identify the limit, we introduce a discretization of the state and action spaces, following the approach used in the finite-space case in [Angiuli et al., 2023b]. The convergence proofs rely on a generalization of the two-timescale framework introduced in [Borkar, 1997]. We further extend our convergence results to Mean Field Control Games, which involve locally cooperative and globally competitive populations. Finally, we present numerical experiments for linear-quadratic problems in one and two dimensions, for which explicit solutions are available.

[LG-149] Bilevel Learning via Inexact Stochastic Gradient Descent

链接: https://arxiv.org/abs/2511.06774
作者: Mohammad Sadegh Salehi,Subhadip Mukherjee,Lindon Roberts,Matthias J. Ehrhardt
类目: Optimization and Control (math.OC); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Bilevel optimization is a central tool in machine learning for high-dimensional hyperparameter tuning. Its applications are vast; for instance, in imaging it can be used for learning data-adaptive regularizers and optimizing forward operators in variational regularization. These problems are large in many ways: a lot of data is usually available to train a large number of parameters, calling for stochastic gradient-based algorithms. However, exact gradients with respect to parameters (so-called hypergradients) are not available, and their precision is usually linearly related to computational cost. Hence, algorithms must solve the problem efficiently without unnecessary precision. The design of such methods is still not fully understood, especially regarding how accuracy requirements and step size schedules affect theoretical guarantees and practical performance. Existing approaches introduce stochasticity at both the upper level (e.g., in sampling or mini-batch estimates) and the lower level (e.g., in solving the inner problem) to improve generalization, but they typically fix the number of lower-level iterations, which conflicts with asymptotic convergence assumptions. In this work, we advance the theory of inexact stochastic bilevel optimization. We prove convergence and establish rates under decaying accuracy and step size schedules, showing that with optimal configurations convergence occurs at an \mathcalO(k^-1/4) rate in expectation. Experiments on image denoising and inpainting with convex ridge regularizers and input-convex networks confirm our analysis: decreasing step sizes improve stability, accuracy scheduling is more critical than step size strategy, and adaptive preconditioning (e.g., Adam) further boosts performance. These results bridge theory and practice, providing convergence guarantees and practical guidance for large-scale imaging problems.

[LG-150] Lassoed Forests: Random Forests with Adaptive Lasso Post-selection

链接: https://arxiv.org/abs/2511.06698
作者: Jing Shang,James Bannon,Benjamin Haibe-Kains,Robert Tibshirani
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Random forests are a statistical learning technique that use bootstrap aggregation to average high-variance and low-bias trees. Improvements to random forests, such as applying Lasso regression to the tree predictions, have been proposed in order to reduce model bias. However, these changes can sometimes degrade performance (e.g., an increase in mean squared error). In this paper, we show in theory that the relative performance of these two methods, standard and Lasso-weighted random forests, depends on the signal-to-noise ratio. We further propose a unified framework to combine random forests and Lasso selection by applying adaptive weighting and show mathematically that it can strictly outperform the other two methods. We compare the three methods through simulation, including bias-variance decomposition, error estimates evaluation, and variable importance analysis. We also show the versatility of our method by applications to a variety of real-world datasets.

[LG-151] Adam symmetry theorem: characterization of the convergence of the stochastic Adam optimizer

链接: https://arxiv.org/abs/2511.06675
作者: Steffen Dereich,Thang Do,Arnulf Jentzen,Philippe von Wurstemberger
类目: Optimization and Control (math.OC); Machine Learning (cs.LG)
*备注: 66 pages

点击查看摘要

Abstract:Beside the standard stochastic gradient descent (SGD) method, the Adam optimizer due to Kingma Ba (2014) is currently probably the best-known optimization method for the training of deep neural networks in artificial intelligence (AI) systems. Despite the popularity and the success of Adam it remains an \emphopen research problem to provide a rigorous convergence analysis for Adam even for the class of strongly convex SOPs. In one of the main results of this work we establish convergence rates for Adam in terms of the number of gradient steps (convergence rate \nicefrac12 w.r.t. the size of the learning rate), the size of the mini-batches (convergence rate 1 w.r.t. the size of the mini-batches), and the size of the second moment parameter of Adam (convergence rate 1 w.r.t. the distance of the second moment parameter to 1) for the class of strongly convex SOPs. In a further main result of this work, which we refer to as \emphAdam symmetry theorem, we illustrate the optimality of the established convergence rates by proving for a special class of simple quadratic strongly convex SOPs that Adam converges as the number of gradient steps increases to infinity to the solution of the SOP (the unique minimizer of the strongly convex objective function) if and \emphonly if the random variables in the SOP (the data in the SOP) are \emphsymmetrically distributed. In particular, in the standard case where the random variables in the SOP are not symmetrically distributed we \emphdisprove that Adam converges to the minimizer of the SOP as the number of Adam steps increases to infinity. We also complement the conclusions of our convergence analysis and the Adam symmetry theorem by several numerical simulations that indicate the sharpness of the established convergence rates and that illustrate the practical appearance of the phenomena revealed in the \emphAdam symmetry theorem.

[LG-152] Learning Biomolecular Motion: The Physics-Informed Machine Learning Paradigm

链接: https://arxiv.org/abs/2511.06585
作者: Aaryesh Deshpande
类目: Biomolecules (q-bio.BM); Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)
*备注: 31 pages, 4 figures, 3 tables. Review article

点击查看摘要

Abstract:The convergence of statistical learning and molecular physics is transforming our approach to modeling biomolecular systems. Physics-informed machine learning (PIML) offers a systematic framework that integrates data-driven inference with physical constraints, resulting in models that are accurate, mechanistic, generalizable, and able to extrapolate beyond observed domains. This review surveys recent advances in physics-informed neural networks and operator learning, differentiable molecular simulation, and hybrid physics-ML potentials, with emphasis on long-timescale kinetics, rare events, and free-energy estimation. We frame these approaches as solutions to the “biomolecular closure problem”, recovering unresolved interactions beyond classical force fields while preserving thermodynamic consistency and mechanistic interpretability. We examine theoretical foundations, tools and frameworks, computational trade-offs, and unresolved issues, including model expressiveness and stability. We outline prospective research avenues at the intersection of machine learning, statistical physics, and computational chemistry, contending that future advancements will depend on mechanistic inductive biases, and integrated differentiable physical learning frameworks for biomolecular simulation and discovery.

[LG-153] Bridging Theory and Practice: A Stochastic Learning-Optimization Model for Resilient Automotive Supply Chains

链接: https://arxiv.org/abs/2511.06479
作者: Muhammad Shahnawaz,Adeel Safder
类目: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
*备注: 14 pages, 4 figures

点击查看摘要

Abstract:Supply chain disruptions and volatile demand pose significant challenges to the UK automotive industry, which relies heavily on Just-In-Time (JIT) manufacturing. While qualitative studies highlight the potential of integrating Artificial Intelligence (AI) with traditional optimization, a formal, quantitative demonstration of this synergy is lacking. This paper introduces a novel stochastic learning-optimization framework that integrates Bayesian inference with inventory optimization for supply chain management (SCM). We model a two-echelon inventory system subject to stochastic demand and supply disruptions, comparing a traditional static optimization policy against an adaptive policy where Bayesian learning continuously updates parameter estimates to inform stochastic optimization. Our simulations over 365 periods across three operational scenarios demonstrate that the integrated approach achieves 7.4% cost reduction in stable environments and 5.7% improvement during supply disruptions, while revealing important limitations during sudden demand shocks due to the inherent conservatism of Bayesian updating. This work provides mathematical validation for practitioner observations and establishes a formal framework for understanding AI-driven supply chain resilience, while identifying critical boundary conditions for successful implementation.

[LG-154] Fast Riemannian-manifold Hamiltonian Monte Carlo for hierarchical Gaussian-process models

链接: https://arxiv.org/abs/2511.06407
作者: Takashi Hayakawa,Satoshi Asai
类目: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
*备注:

点击查看摘要

[LG-155] Learning the Inverse Ryu–Takayanagi Formula with Transformers

链接: https://arxiv.org/abs/2511.06387
作者: Sejin Kim
类目: High Energy Physics - Theory (hep-th); Machine Learning (cs.LG)
*备注: 15 pages, 6 figures

点击查看摘要

Abstract:We study the inverse problem of holographic entanglement entropy in AdS _3 using a data-driven generative model. Training data consist of randomly generated geometries and their holographic entanglement entropies using the Ryu–Takayanagi formula. After training, the Transformer reconstructs the blackening function within our metric ansatz from previously unseen inputs. The Transformer achieves accurate reconstructions on smooth black hole geometries and extrapolates to horizonless backgrounds. We describe the architecture and data generation process, and we quantify accuracy on both f(z) and the reconstructed S(\ell) . Code and evaluation scripts are available at the provided repository.

[LG-156] Functional Adjoint Sampler: Scalable Sampling on Infinite Dimensional Spaces

链接: https://arxiv.org/abs/2511.06239
作者: Byoungwoo Park,Juho Lee,Guan-Horng Liu
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Learning-based methods for sampling from the Gibbs distribution in finite-dimensional spaces have progressed quickly, yet theory and algorithmic design for infinite-dimensional function spaces remain limited. This gap persists despite their strong potential for sampling the paths of conditional diffusion processes, enabling efficient simulation of trajectories of diffusion processes that respect rare events or boundary constraints. In this work, we present the adjoint sampler for infinite-dimensional function spaces, a stochastic optimal control-based diffusion sampler that operates in function space and targets Gibbs-type distributions on infinite-dimensional Hilbert spaces. Our Functional Adjoint Sampler (FAS) generalizes Adjoint Sampling (Havens et al., 2025) to Hilbert spaces based on a SOC theory called stochastic maximum principle, yielding a simple and scalable matching-type objective for a functional representation. We show that FAS achieves superior transition path sampling performance across synthetic potential and real molecular systems, including Alanine Dipeptide and Chignolin.

[LG-157] Sparsity via Hyperpriors: A Theoretical and Algorithmic Study under Empirical Bayes Framework

链接: https://arxiv.org/abs/2511.06235
作者: Zhitao Li,Yiqiu Dong,Xueying Zeng
类目: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA)
*备注:

点击查看摘要

[LG-158] Forecasting Thermospheric Density with Transformers for Multi-Satellite Orbit Management

链接: https://arxiv.org/abs/2511.06105
作者: Cedric Bös,Alessandro Bortotto,Mohamed Khalil Ben-Larbi
类目: pace Physics (physics.space-ph); Machine Learning (cs.LG)
*备注: 6 pages, 3 figures, conference

点击查看摘要

Abstract:Accurate thermospheric density prediction is crucial for reliable satellite operations in Low Earth Orbits, especially at high solar and geomagnetic activity. Physics-based models such as TIE-GCM offer high fidelity but are computationally expensive, while empirical models like NRLMSIS are efficient yet lack predictive power. This work presents a transformer-based model that forecasts densities up to three days ahead and is intended as a drop-in replacement for an empirical baseline. Unlike recent approaches, it avoids spatial reduction and complex input pipelines, operating directly on a compact input set. Validated on real-world data, the model improves key prediction metrics and shows potential to support mission planning.

[LG-159] he Algorithmic Phase Transition in Symmetric Correlated Spiked Wigner Model

链接: https://arxiv.org/abs/2511.06040
作者: Zhangsong Li
类目: atistics Theory (math.ST); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
*备注: 47 pages

点击查看摘要

Abstract:We study the computational task of detecting and estimating correlated signals in a pair of spiked Wigner matrices. Our model consists of observations X = \tfrac\lambda\sqrtn xx^\top + W , \quad Y = \tfrac\mu\sqrtn yy^\top + Z ,. where x,y \in \mathbb R^n are signal vectors with norm |x|,|y| \approx\sqrtn and correlation \langle x,y \rangle \approx \rho|x||y| , while W,Z are independent Gaussian noise matrices. We propose an efficient algorithm that succeeds whenever F(\lambda,\mu,\rho)1 , where F(\lambda,\mu,\rho)=\max\Big\ \lambda,\mu, \frac \lambda^2 \rho^2 1-\lambda^2+\lambda^2 \rho^2 + \frac \mu^2 \rho^2 1-\mu^2+\mu^2 \rho^2 \Big\ ,. Our result shows that an algorithm can leverage the correlation between the spikes to detect and estimate the signals even in regimes where efficiently recovering either x from X alone or y from Y alone is believed to be computationally infeasible. We complement our algorithmic result with evidence for a matching computational lower bound. In particular, we prove that when F(\lambda,\mu,\rho)1 , all algorithms based on \em low-degree polynomials fails to distinguish (X,Y) with two independent Wigner matrices. This low-degree analysis strongly suggests that F(\lambda,\mu,\rho)=1 is the precise computation threshold for this problem. Comments: 47 pages Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML) MSC classes: 68Q87, 68Q17 Cite as: arXiv:2511.06040 [math.ST] (or arXiv:2511.06040v1 [math.ST] for this version) https://doi.org/10.48550/arXiv.2511.06040 Focus to learn more arXiv-issued DOI via DataCite (pending registration) Submission history From: Zhangsong Li [view email] [v1] Sat, 8 Nov 2025 15:23:44 UTC (44 KB) Full-text links: Access Paper: View a PDF of the paper titled The Algorithmic Phase Transition in Symmetric Correlated Spiked Wigner Model, by Zhangsong LiView PDFHTML (experimental)TeX Source view license Current browse context: math.ST prev | next new | recent | 2025-11 Change to browse by: cs cs.LG math math.PR stat stat.ML stat.TH References Citations NASA ADSGoogle Scholar Semantic Scholar export BibTeX citation Loading… BibTeX formatted citation loading… Data provided by: Bookmark checked=“checked”> Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Code, Data and Media Associated with this Article alphaXiv Toggle alphaXiv (What is alphaXiv?) Links to Code Toggle CatalyzeX Code Finder for Papers (What is CatalyzeX?) DagsHub Toggle DagsHub (What is DagsHub?) GotitPub Toggle Gotit.pub (What is GotitPub?) Huggingface Toggle Hugging Face (What is Huggingface?) Links to Code Toggle Papers with Code (What is Papers with Code?) ScienceCast Toggle ScienceCast (What is ScienceCast?) Demos Demos Replicate Toggle Replicate (What is Replicate?) Spaces Toggle Hugging Face Spaces (What is Spaces?) Spaces Toggle TXYZ.AI (What is TXYZ.AI?) Related Papers Recommenders and Search Tools Link to Influence Flower Influence Flower (What are Influence Flowers?) Core recommender toggle CORE Recommender (What is CORE?) Author Venue Institution Topic About arXivLabs arXivLabs: experimental projects with community collaborators arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website. Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them. Have an idea for a project that will add value for arXiv’s community? Learn more about arXivLabs. Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?) mathjaxToggle(); About Help contact arXivClick here to contact arXiv Contact subscribe to arXiv mailingsClick here to subscribe Subscribe Copyright Privacy Policy Web Accessibility Assistance arXiv Operational Status

[LG-160] Benchmarking of Clustering Validity Measures Revisited

链接: https://arxiv.org/abs/2511.05983
作者: Connor Simpson,Ricardo J. G. B. Campello,Elizabeth Stojanovski
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注: 48 pages, 17 tables, 17 figures

点击查看摘要

Abstract:Validation plays a crucial role in the clustering process. Many different internal validity indexes exist for the purpose of determining the best clustering solution(s) from a given collection of candidates, e.g., as produced by different algorithms or different algorithm hyper-parameters. In this study, we present a comprehensive benchmark study of 26 internal validity indexes, which includes highly popular classic indexes as well as more recently developed ones. We adopted an enhanced revision of the methodology presented in Vendramin et al. (2010), developed here to address several shortcomings of this previous work. This overall new approach consists of three complementary custom-tailored evaluation sub-methodologies, each of which has been designed to assess specific aspects of an index’s behaviour while preventing potential biases of the other sub-methodologies. Each sub-methodology features two complementary measures of performance, alongside mechanisms that allow for an in-depth investigation of more complex behaviours of the internal validity indexes under study. Additionally, a new collection of 16177 datasets has been produced, paired with eight widely-used clustering algorithms, for a wider applicability scope and representation of more diverse clustering scenarios.

[LG-161] Beyond Resolution: Multi - Scale Weather and Climate Data for Alpine Renewable Energy in the Digital Twin Era - First Evaluations and Recommendations

链接: https://arxiv.org/abs/2511.05584
作者: Irene Schicker,Marianne Bügelmayer-Blaschek,Annemarie Lexer,Katharina Baier,Kristofer Hasel,Paolo Gazzaneo
类目: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG)
*备注: perspectives paper submitted to iScience, 25 pages, 2 tables, 3 figures

点击查看摘要

Abstract:When Austrian hydropower production plummeted by 44% in early 2025 due to reduced snowpack, it exposed a critical vulnerability: standard meteorological and climatological datasets systematically fail in mountain regions that hold untapped renewable potential. This perspectives paper evaluates emerging solutions to the Alpine energy-climate data gap, analyzing datasets from global reanalyses (ERA5, 31 km) to kilometre-scale Digital Twins (Climate DT, Extremes DT, 4.4 km), regional reanalyses (ARA, 2.5 km), and next-generation AI weather prediction models (AIFS, 31 km). The multi-resolution assessment reveals that no single dataset excels universally: coarse reanalyses provide essential climatologies but miss valley-scale processes, while Digital Twins resolve Alpine dynamics yet remain computationally demanding. Effective energy planning therefore requires strategic dataset combinations validated against energy-relevant indices such as population-weighted extremes, wind-gust return periods, and Alpine-adjusted storm thresholds. A key frontier is sub-hourly (10-15 min) temporal resolution to match grid-operation needs. Six evidence-based recommendations outline pathways for bridging spatial and temporal scales. As renewable deployment expands globally into complex terrain, the Alpine region offers transferable perspectives for tackling identical forecasting and climate analysis challenges in mountainous regions worldwide.

[LG-162] Bridging Accuracy and Explainability in EEG-based Graph Attention Network for Depression Detection

链接: https://arxiv.org/abs/2511.05537
作者: Soujanya Hazra,Sanjay Ghosh
类目: ignal Processing (eess.SP); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)
*备注: 13 pages, 3 tables, and 7 fugures

点击查看摘要

Abstract:Depression is a major cause of global mental illness and significantly influences suicide rates. Timely and accurate diagnosis is essential for effective intervention. Electroencephalography (EEG) provides a non-invasive and accessible method for examining cerebral activity and identifying disease-associated patterns. We propose a novel graph-based deep learning framework, named Edge-gated, axis-mixed Pooling Attention Network (ExPANet), for differentiating major depressive disorder (MDD) patients from healthy controls (HC). EEG recordings undergo preprocessing to eliminate artifacts and are segmented into short periods of activity. We extract 14 features from each segment, which include time, frequency, fractal, and complexity domains. Electrodes are represented as nodes, whereas edges are determined by the phase-locking value (PLV) to represent functional connectivity. The generated brain graphs are examined utilizing an adapted graph attention network. This architecture acquires both localized electrode characteristics and comprehensive functional connectivity patterns. The proposed framework attains superior performance relative to current EEG-based approaches across two different datasets. A fundamental advantage of our methodology is its explainability. We evaluated the significance of features, channels, and edges, in addition to intrinsic attention weights. These studies highlight features, cerebral areas, and connectivity associations that are especially relevant to MDD, many of which correspond with clinical data. Our findings demonstrate a reliable and transparent method for EEG-based screening of MDD, using deep learning with clinically relevant results.

信息检索

[IR-0] Wavelet Enhanced Adaptive Frequency Filter for Sequential Recommendation

链接: https://arxiv.org/abs/2511.07028
作者: Huayang Xu,Huanhuan Yuan,Guanfeng Liu,Junhua Fang,Lei Zhao,Pengpeng Zhao
类目: Information Retrieval (cs.IR)
*备注:

点击查看摘要

[IR-1] CGLE: Class-label Graph Link Estimator for Link Prediction ICDM2025

链接: https://arxiv.org/abs/2511.06982
作者: Ankit Mazumder,Srikanta Bedathur
类目: ocial and Information Networks (cs.SI); Information Retrieval (cs.IR)
*备注: Paper accepted at the IEEE International Conference on Data Mining (ICDM 2025)

点击查看摘要

[IR-2] Have We Really Understood Collaborative Information? An Empirical Investigation WSDM2026

链接: https://arxiv.org/abs/2511.06905
作者: Xiaokun Zhang,Zhaochun Ren,Bowei He,Ziqiang Cui,Chen Ma
类目: Information Retrieval (cs.IR)
*备注: This work has been accepted by WSDM 2026

点击查看摘要

[IR-3] Accessibility Gaps in U.S. Government Dashboards for Blind and Low-Vision Residents

链接: https://arxiv.org/abs/2511.06688
作者: Chadani Acharya
类目: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Digital Libraries (cs.DL); Information Retrieval (cs.IR)
*备注: Preprint. Accessibility audit of six U.S. public dashboard ecosystems; 1 figure, 2 tables

点击查看摘要

[IR-4] Can LLM Annotations Replace User Clicks for Learning to Rank?

链接: https://arxiv.org/abs/2511.06635
作者: Lulu Yu,Keping Bi,Jiafeng Guo,Shihao Liu,Shuaiqiang Wang,Dawei Yin,Xueqi Cheng
类目: Information Retrieval (cs.IR)
*备注: 12 pages, 7 figures

点击查看摘要

[IR-5] OOL4POI: A Tool-Augmented LLM Framework for Next POI Recommendation AAAI2026

链接: https://arxiv.org/abs/2511.06405
作者: Dongsheng Wang,Shen Gao,Chengrui Huang,Yuxi Huang,Ruixiang Feng,Shuo Shang
类目: Information Retrieval (cs.IR)
*备注: Accepted by AAAI2026

点击查看摘要

[IR-6] LLaDA-Rec: Discrete Diffusion for Parallel Semantic ID Generation in Generative Recommendation

链接: https://arxiv.org/abs/2511.06254
作者: Teng Shi,Chenglei Shen,Weijie Yu,Shen Nie,Chongxuan Li,Xiao Zhang,Ming He,Yan Han,Jun Xu
类目: Information Retrieval (cs.IR)
*备注:

点击查看摘要

Abstract:Generative recommendation represents each item as a semantic ID, i.e., a sequence of discrete tokens, and generates the next item through autoregressive decoding. While effective, existing autoregressive models face two intrinsic limitations: (1) unidirectional constraints, where causal attention restricts each token to attend only to its predecessors, hindering global semantic modeling; and (2) error accumulation, where the fixed left-to-right generation order causes prediction errors in early tokens to propagate to the predictions of subsequent token. To address these issues, we propose LLaDA-Rec, a discrete diffusion framework that reformulates recommendation as parallel semantic ID generation. By combining bidirectional attention with the adaptive generation order, the approach models inter-item and intra-item dependencies more effectively and alleviates error accumulation. Specifically, our approach comprises three key designs: (1) a parallel tokenization scheme that produces semantic IDs for bidirectional modeling, addressing the mismatch between residual quantization and bidirectional architectures; (2) two masking mechanisms at the user-history and next-item levels to capture both inter-item sequential dependencies and intra-item semantic relationships; and (3) an adapted beam search strategy for adaptive-order discrete diffusion decoding, resolving the incompatibility of standard beam search with diffusion-based generation. Experiments on three real-world datasets show that LLaDA-Rec consistently outperforms both ID-based and state-of-the-art generative recommenders, establishing discrete diffusion as a new paradigm for generative recommendation.

[IR-7] User Hesitation and Negative Transfer in Multi-Behavior Recommendation

链接: https://arxiv.org/abs/2511.05808
作者: Cheng Li,Yong Xu,Suhua Tang,Wenqiang Lin,Xin He,Jinde Cao
类目: Information Retrieval (cs.IR)
*备注:

点击查看摘要

Abstract:Multi-behavior recommendation aims to integrate users’ interactions across various behavior types (e.g., view, favorite, add-to-cart, purchase) to more comprehensively characterize user preferences. However, existing methods lack in-depth modeling when dealing with interactions that generate only auxiliary behaviors without triggering the target behavior. In fact, these weak signals contain rich latent information and can be categorized into two types: (1) positive weak signals-items that have not triggered the target behavior but exhibit frequent auxiliary interactions, reflecting users’ hesitation tendencies toward these items; and (2) negative weak signals-auxiliary behaviors that result from misoperations or interaction noise, which deviate from true preferences and may cause negative transfer effects. To more effectively identify and utilize these weak signals, we propose a recommendation framework focused on weak signal learning, termed HNT. Specifically, HNT models weak signal features from two dimensions: positive and negative effects. By learning the characteristics of auxiliary behaviors that lead to target behaviors, HNT identifies similar auxiliary behaviors that did not trigger the target behavior and constructs a hesitation set of related items as weak positive samples to enhance preference modeling, thereby capturing users’ latent hesitation intentions. Meanwhile, during auxiliary feature fusion, HNT incorporates latent negative transfer effect modeling to distinguish and suppress interference caused by negative representations through item similarity learning. Experiments on three real-world datasets demonstrate that HNT improves HR@10 and NDCG@10 by 12.57% and 14.37%, respectively, compared to the best baseline methods.

[IR-8] SARCH: Multimodal Search for Archaeological Archives

链接: https://arxiv.org/abs/2511.05667
作者: Nivedita Sinha,Bharati Khanijo,Sanskar Singh,Priyansh Mahant,Ashutosh Roy,Saubhagya Singh Bhadouria,Arpan Jain,Maya Ramanath
类目: Information Retrieval (cs.IR)
*备注:

点击查看摘要

Abstract:In this paper, we describe a multi-modal search system designed to search old archaeological books and reports. This corpus is digitally available as scanned PDFs, but varies widely in the quality of scans. Our pipeline, designed for multi-modal archaeological documents, extracts and indexes text, images (classified into maps, photos, layouts, and others), and tables. We evaluated different retrieval strategies, including keyword-based search, embedding- based models, and a hybrid approach that selects optimal results from both modalities. We report and analyze our preliminary results and discuss future work in this exciting vertical.

[IR-9] GreyShot: Zeroshot and Privacy-preserving Recommender System by GM(11) Model

链接: https://arxiv.org/abs/2511.05493
作者: Hao Wang
类目: Information Retrieval (cs.IR)
*备注:

点击查看摘要

Abstract:Every recommendation engineer needs to face the cold start problem when building his system. During the past decades, most scientists adopted transfer learning and meta learning to solve the problem. Although notable exceptions such as ZeroMat etc. have been invented in recent years, cold-start problem remains a challenging problem for many researchers. In this paper, we build a zeroshot and privacy-preserving recommender system algorithm GreyShot using GM(1,1) model by taking advantage of the Poisson-Pareto property of the online rating data. Our approach relies on no input data and is effective in generating both accurate and fair results. In conclusion, zeroshot problem of recommender systems could be effectively solved by grey system methods such as GM(1,1).

附件下载

点击下载今日全部论文列表