This post lists the latest papers retrieved from arXiv.org on 2025-11-06. It is updated automatically and organized into five areas: NLP, CV, ML, AI, and IR. If you would like to receive the daily digest by email, please leave your email address in the comments.
Note: paper data is fetched from arXiv.org and updated automatically around 12:00 noon each day.
Reminder: to receive the daily paper digest by email, leave your email address in the comments.
Table of Contents
Overview (2025-11-06)
430 papers were updated today, including:
- 55 in Natural Language Processing (Computation and Language, cs.CL)
- 115 in Artificial Intelligence (cs.AI)
- 58 in Computer Vision (Computer Vision and Pattern Recognition, cs.CV)
- 127 in Machine Learning (cs.LG)
Natural Language Processing
[NLP-0] Grounded Misunderstandings in Asymmetric Dialogue: A Perspectivist Annotation Scheme for MapTask
【Quick Read】: This paper addresses "apparent grounding" in collaborative dialogue: participants may believe they have reached mutual understanding while actually grounding the same referring expression differently (referential misalignment). The key of the solution is a perspectivist annotation scheme that separately labels the speaker's and the addressee's grounded interpretation of each referring expression, making it possible to trace precisely how understanding emerges, diverges, and is repaired over time. Using a scheme-constrained LLM annotation pipeline, the authors build 13k high-quality annotations on the HCRC MapTask corpus and find that, once lexical variants are unified, full misunderstandings are rare, but multiplicity discrepancies systematically induce divergences, revealing how seemingly solid common ground can mask reference-level misalignment. The framework provides both a resource and an analytic lens for studying grounded misunderstanding and for evaluating the capacity of (V)LLMs to model perspective-dependent grounding in collaborative dialogue.
Link: https://arxiv.org/abs/2511.03718
Authors: Nan Li, Albert Gatt, Massimo Poesio
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 11 pages, 3 figures, 5 tables; under review
Abstract:Collaborative dialogue relies on participants incrementally establishing common ground, yet in asymmetric settings they may believe they agree while referring to different entities. We introduce a perspectivist annotation scheme for the HCRC MapTask corpus (Anderson et al., 1991) that separately captures speaker and addressee grounded interpretations for each reference expression, enabling us to trace how understanding emerges, diverges, and repairs over time. Using a scheme-constrained LLM annotation pipeline, we obtain 13k annotated reference expressions with reliability estimates and analyze the resulting understanding states. The results show that full misunderstandings are rare once lexical variants are unified, but multiplicity discrepancies systematically induce divergences, revealing how apparent grounding can mask referential misalignment. Our framework provides both a resource and an analytic lens for studying grounded misunderstanding and for evaluating (V)LLMs’ capacity to model perspective-dependent grounding in collaborative dialogue.
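The speaker-vs-addressee annotation idea above can be made concrete with a minimal sketch. This is an illustrative simplification, not the paper's actual scheme: the record fields and the two-way "grounded / misaligned" label are assumptions, whereas the real label set is richer.

```python
from dataclasses import dataclass

@dataclass
class ReferenceAnnotation:
    expression: str          # the referring expression as uttered
    speaker_interp: str      # landmark the speaker has in mind
    addressee_interp: str    # landmark the addressee grounds it to

def understanding_state(ann: ReferenceAnnotation) -> str:
    """Classify the grounding outcome for one reference expression."""
    if ann.speaker_interp == ann.addressee_interp:
        return "grounded"                  # shared interpretation
    return "referential_misalignment"      # apparent agreement, different referents

# Apparent grounding masking misalignment: both parties accept the
# expression "the lake", but ground it to different map landmarks.
ok = ReferenceAnnotation("the old mill", "mill_1", "mill_1")
bad = ReferenceAnnotation("the lake", "lake_west", "lake_east")
print(understanding_state(ok), understanding_state(bad))
```

Tracking such per-turn states over a dialogue is what lets the scheme localize where understanding diverges and where it is repaired.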
[NLP-1] Do Androids Dream of Unseen Puppeteers? Probing for a Conspiracy Mindset in Large Language Models
【Quick Read】: This paper investigates whether Large Language Models (LLMs) exhibit conspiratorial tendencies, whether they display sociodemographic biases in this domain, and how easily they can be conditioned into adopting conspiratorial views. The core issue is the social fidelity of LLMs at the psychological level: whether they reproduce complex higher-order constructs such as a conspiracy mindset, which bears on misinformation spread and public trust. The key of the solution is administering validated psychometric surveys to multiple LLMs under different prompting and conditioning strategies and systematically observing their response patterns. Results show that LLMs partially endorse conspiracy beliefs, that conditioning on sociodemographic attributes produces uneven effects exposing latent demographic biases, and that targeted prompts can easily shift models toward conspiratorial positions, revealing the vulnerability and risks of deploying LLMs in sensitive contexts.
Link: https://arxiv.org/abs/2511.03699
Authors: Francesco Corso, Francesco Pierri, Gianmarco De Francisci Morales
Affiliations: Politecnico di Milano; CENTAI
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Comments:
Abstract:In this paper, we investigate whether Large Language Models (LLMs) exhibit conspiratorial tendencies, whether they display sociodemographic biases in this domain, and how easily they can be conditioned into adopting conspiratorial perspectives. Conspiracy beliefs play a central role in the spread of misinformation and in shaping distrust toward institutions, making them a critical testbed for evaluating the social fidelity of LLMs. LLMs are increasingly used as proxies for studying human behavior, yet little is known about whether they reproduce higher-order psychological constructs such as a conspiratorial mindset. To bridge this research gap, we administer validated psychometric surveys measuring conspiracy mindset to multiple models under different prompting and conditioning strategies. Our findings reveal that LLMs show partial agreement with elements of conspiracy belief, and conditioning with socio-demographic attributes produces uneven effects, exposing latent demographic biases. Moreover, targeted prompts can easily shift model responses toward conspiratorial directions, underscoring both the susceptibility of LLMs to manipulation and the potential risks of their deployment in sensitive contexts. These results highlight the importance of critically evaluating the psychological dimensions embedded in LLMs, both to advance computational social science and to inform possible mitigation strategies against harmful uses.
[NLP-2] ChiMDQA: Towards Comprehensive Chinese Document QA with Fine-grained Evaluation ICANN2025
【Quick Read】: This paper addresses the scarcity of high-quality annotated datasets for Chinese Document Question Answering (CDQA), needed to support multi-domain text understanding and intelligent QA systems in downstream business scenarios. The key of the solution is ChiMDQA, a multi-document Chinese QA dataset spanning six domains (academic, education, finance, law, medical treatment, and news) with 6,068 rigorously screened, human-annotated question-answer pairs further classified into ten fine-grained categories. Systematic document screening and question-design methodology ensure both diversity and quality, providing a reliable benchmark for NLP tasks such as document comprehension, knowledge extraction, and intelligent QA.
Link: https://arxiv.org/abs/2511.03656
Authors: Jing Gao, Shutiao Luo, Yumeng Liu, Yuanming Li, Hongji Zeng
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 13 pages, 6 tables, 4 figures, accepted by ICANN 2025
Abstract:With the rapid advancement of natural language processing (NLP) technologies, the demand for high-quality Chinese document question-answering datasets is steadily growing. To address this issue, we present the Chinese Multi-Document Question Answering Dataset(ChiMDQA), specifically designed for downstream business scenarios across prevalent domains including academic, education, finance, law, medical treatment, and news. ChiMDQA encompasses long-form documents from six distinct fields, consisting of 6,068 rigorously curated, high-quality question-answer (QA) pairs further classified into ten fine-grained categories. Through meticulous document screening and a systematic question-design methodology, the dataset guarantees both diversity and high quality, rendering it applicable to various NLP tasks such as document comprehension, knowledge extraction, and intelligent QA systems. Additionally, this paper offers a comprehensive overview of the dataset’s design objectives, construction methodologies, and fine-grained evaluation system, supplying a substantial foundation for future research and practical applications in Chinese QA. The code and data are available at: this https URL.
[NLP-3] Watermarking Large Language Models in Europe: Interpreting the AI Act in Light of Technology
【Quick Read】: This paper addresses the difficulty of turning the EU AI Act's normative requirements for marking the outputs of general-purpose LLMs, namely that marking be "sufficiently reliable, interoperable, effective and robust", into operational, measurable evaluations. The key of the solution is threefold: first, a taxonomy of watermarking methods organized by the stage of the LLM lifecycle at which they apply (before, during, or after training, and during next-token distribution or sampling); second, a mapping of the EU criteria onto state-of-the-art empirical evaluations of watermark robustness, detectability, and model quality, plus three normative dimensions to theorize "interoperability"; third, a systematic comparison of current watermarking methods against the operationalized European criteria, showing that no approach yet satisfies all four standards and motivating research on watermarks embedded in the low-level architecture of LLMs.
Link: https://arxiv.org/abs/2511.03641
Authors: Thomas Souverain
Affiliations: CEA Paris-Saclay
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Comments: 17 pages, 2 tables and 2 figures
Abstract:To foster trustworthy Artificial Intelligence (AI) within the European Union, the AI Act requires providers to mark and detect the outputs of their general-purpose models. The Article 50 and Recital 133 call for marking methods that are ‘‘sufficiently reliable, interoperable, effective and robust’’. Yet, the rapidly evolving and heterogeneous landscape of watermarks for Large Language Models (LLMs) makes it difficult to determine how these four standards can be translated into concrete and measurable evaluations. Our paper addresses this challenge, anchoring the normativity of European requirements in the multiplicity of watermarking techniques. Introducing clear and distinct concepts on LLM watermarking, our contribution is threefold. (1) Watermarking Categorisation: We propose an accessible taxonomy of watermarking methods according to the stage of the LLM lifecycle at which they are applied - before, during, or after training, and during next-token distribution or sampling. (2) Watermarking Evaluation: We interpret the EU AI Act’s requirements by mapping each criterion with state-of-the-art evaluations on robustness and detectability of the watermark, and of quality of the LLM. Since interoperability remains largely untheorised in LLM watermarking research, we propose three normative dimensions to frame its assessment. (3) Watermarking Comparison: We compare current watermarking methods for LLMs against the operationalised European criteria and show that no approach yet satisfies all four standards. Encouraged by emerging empirical tests, we recommend further research into watermarking directly embedded within the low-level architecture of LLMs.
[NLP-4] Towards Transparent Stance Detection: A Zero-Shot Approach Using Implicit and Explicit Interpretability AAAI
【Quick Read】: This paper targets shortcomings of Zero-Shot Stance Detection (ZSSD): limited generalizability, inconsistent coherence between text and target, and LLM-based methods that over-rely on explicit reasoning, produce coarse explanations, and do not explicitly model the reasoning process. The key of the solution is IRIS, a novel interpretable ZSSD framework that models both implicit rationales (sequences within the text that implicitly capture the relevance of different stances, yielding inherent interpretability without annotated rationales, by casting stance detection as an information-retrieval ranking task) and explicit rationales (communicative features that decode the emotional and cognitive dimensions of stance, supporting an interpretable account of the author's attitude). This design markedly improves generalization in low-resource settings: experiments on VAST, EZ-STANCE, P-Stance, and RFD show strong performance even with only 10% of the training data.
Link: https://arxiv.org/abs/2511.03635
Authors: Apoorva Upadhyaya, Wolfgang Nejdl, Marco Fisichella
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: Accepted at the AAAI Conference on Web and Social Media (ICWSM 2026)
Abstract:Zero-Shot Stance Detection (ZSSD) identifies the attitude of the post toward unseen targets. Existing research using contrastive, meta-learning, or data augmentation suffers from generalizability issues or lack of coherence between text and target. Recent works leveraging large language models (LLMs) for ZSSD focus either on improving unseen target-specific knowledge or generating explanations for stance analysis. However, most of these works are limited by their over-reliance on explicit reasoning, provide coarse explanations that lack nuance, and do not explicitly model the reasoning process, making it difficult to interpret the model’s predictions. To address these issues, in our study, we develop a novel interpretable ZSSD framework, IRIS. We provide an interpretable understanding of the attitude of the input towards the target implicitly based on sequences within the text (implicit rationales) and explicitly based on linguistic measures (explicit rationales). IRIS considers stance detection as an information retrieval ranking task, understanding the relevance of implicit rationales for different stances to guide the model towards correct predictions without requiring the ground-truth of rationales, thus providing inherent interpretability. In addition, explicit rationales based on communicative features help decode the emotional and cognitive dimensions of stance, offering an interpretable understanding of the author’s attitude towards the given target. Extensive experiments on the benchmark datasets of VAST, EZ-STANCE, P-Stance, and RFD using 50%, 30%, and even 10% training data prove the generalizability of our model, benefiting from the proposed architecture and interpretable design.
[NLP-5] A systematic review of relation extraction task since the emergence of Transformers WWW
【Quick Read】: This paper systematically surveys relation extraction (RE) research since the rise of Transformer models, addressing the field's fragmented methods, scattered resources, and unclear trends. The key of the solution is an automated framework that collects and annotates 34 surveys, 64 datasets, and 104 models published between 2019 and 2024, consolidating results across methodological advances, benchmark resources, and the integration of semantic web technologies, and identifying current trends, limitations, and open challenges as a comprehensive reference for researchers and practitioners.
Link: https://arxiv.org/abs/2511.03610
Authors: Ringwald Celian, Gandon Fabien, Faron Catherine, Michel Franck, Abi Akl Hanna
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments: Submitted to ACM Computing Surveys. The resulting annotated Zotero bibliography: this https URL. SciLEx software: this https URL
Abstract:This article presents a systematic review of relation extraction (RE) research since the advent of Transformer-based models. Using an automated framework to collect and annotate publications, we analyze 34 surveys, 64 datasets, and 104 models published between 2019 and 2024. The review highlights methodological advances, benchmark resources, and the integration of semantic web technologies. By consolidating results across multiple dimensions, the study identifies current trends, limitations, and open challenges, offering researchers and practitioners a comprehensive reference for understanding the evolution and future directions of RE.
[NLP-6] Step-Audio-EditX Technical Report
【Quick Read】: This paper tackles limited expressivity and difficult iterative control in audio editing, especially fine-grained control over emotion, speaking style, and paralinguistics, where prior work typically relies on representation-level disentanglement. The key of the solution is Step-Audio-EditX, an open-source LLM-based audio model trained with large-margin learning on purely large-scale synthetic data, which achieves strong zero-shot text-to-speech (TTS) and multi-dimensional audio editing without embedding-based priors or auxiliary modules, marking a fundamental pivot from the representation-disentanglement paradigm to a control-oriented one.
Link: https://arxiv.org/abs/2511.03601
Authors: Chao Yan, Boyong Wu, Peng Yang, Pengfei Tan, Guoqiang Hu, Yuxin Zhang, Xiangyu (Tony) Zhang, Fei Tian, Xuerui Yang, Xiangyu Zhang, Daxin Jiang, Gang Yu
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Comments:
Abstract: We present Step-Audio-EditX, the first open-source LLM-based audio model excelling at expressive and iterative audio editing, encompassing emotion, speaking style, and paralinguistics, alongside robust zero-shot text-to-speech (TTS). The core innovation lies in leveraging only large-margin synthetic data, which circumvents the need for embedding-based priors or auxiliary modules. This large-margin learning approach enables both iterative control and high expressivity across voices, and represents a fundamental pivot from the conventional focus on representation-level disentanglement. Evaluation results demonstrate that Step-Audio-EditX surpasses both MiniMax-2.6-hd and Doubao-Seed-TTS-2.0 in emotion editing and other fine-grained control tasks.
[NLP-7] ASVRI-Legal: Fine-Tuning LLM s with Retrieval Augmented Generation for Enhanced Legal Regulation
【Quick Read】: This paper addresses the efficiency and accuracy challenges policymakers face when understanding, analyzing, and drafting legal regulations, especially in a rapidly evolving legal landscape. The key of the solution is combining fine-tuning with Retrieval-Augmented Generation (RAG): the LLM is first fine-tuned on a supervised, legal-domain dataset to build deep understanding of legal texts, and then augmented with RAG so it can retrieve and incorporate up-to-date legal knowledge from external sources, improving both the practicality and timeliness of legal interpretation and regulation drafting. This two-pronged strategy significantly enhances the effectiveness of legal research and regulation development.
Link: https://arxiv.org/abs/2511.03563
Authors: One Octadion, Bondan Sapta Prakoso, Nanang Yudi Setiawan, Novanto Yudistira
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments: 11 pages (including references), 2 figures, 4 tables, published in Atlantis Press (open access under CC BY-NC 4.0 license)
Abstract:In this study, we explore the fine-tuning of Large Language Models (LLMs) to better support policymakers in their crucial work of understanding, analyzing, and crafting legal regulations. To equip the model with a deep understanding of legal texts, we curated a supervised dataset tailored to the specific needs of the legal domain. Additionally, we integrated the Retrieval-Augmented Generation (RAG) method, enabling the LLM to access and incorporate up-to-date legal knowledge from external sources. This combination of fine-tuning and RAG-based augmentation results in a tool that not only processes legal information but actively assists policymakers in interpreting regulations and drafting new ones that align with current needs. The results demonstrate that this approach can significantly enhance the effectiveness of legal research and regulation development, offering a valuable resource in the ever-evolving field of law.
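The retrieve-then-generate step described above can be sketched minimally. This is a toy illustration, not the paper's pipeline: the corpus, the token-overlap retriever, and the prompt template are all invented for the example.

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by token overlap with the query and keep the top k."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble the retrieval-augmented prompt passed to the fine-tuned LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, corpus))
    return f"Context:\n{context}\nQuestion: {query}\nAnswer:"

corpus = [
    "Regulation 12 covers data retention periods for public agencies",
    "Regulation 7 defines licensing requirements for food vendors",
    "Guidance note on drafting municipal ordinances",
]
prompt = build_prompt("What are the data retention rules?", corpus)
print(prompt)
```

A production system would replace the overlap scorer with a dense retriever over an up-to-date legal corpus, which is the role RAG plays in the paper.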
[NLP-8] AILA–First Experiments with Localist Language Models
【Quick Read】: This paper addresses the lack of controllability over representations in standard Transformer language models, i.e., the difficulty of trading interpretability against performance. The key of the solution is an architectural framework with controllable locality: a tunable locality dial parameter (λ) enables continuous interpolation between highly interpretable localist encodings and efficient distributed representations, without retraining. Experiments show that localist configurations achieve markedly lower attention entropy and higher pointer fidelity while maintaining strong predictive performance, offering mathematically precise, tunable control for applications that require both transparency and capability.
Link: https://arxiv.org/abs/2511.03559
Authors: Joachim Diederich
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:
Abstract:This paper presents the first empirical demonstration of controllable locality in transformer language models, a novel architectural framework that enables continuous control over the degree of representation localization through a tunable locality dial parameter. Unlike traditional language models that rely exclusively on distributed representations, our approach allows dynamic interpolation between highly interpretable localist encodings and efficient distributed representations without requiring model retraining. We conducted experiments on the WikiText corpus using a two-layer transformer architecture, systematically varying the locality parameter \lambda across the full spectrum from 1.0 (fully localist) to 0.0 (fully distributed). Our results demonstrate that localist configurations achieve dramatically lower attention entropy, with \lambda = 1.0 yielding 5.36 bits compared to 7.18 bits at \lambda = 0.0, while maintaining substantially higher pointer fidelity scores reflecting stronger alignment with rule-specified targets. Prediction experiments reveal that intermediate locality values optimize the tradeoff between interpretability and performance, with \lambda = 0.6 achieving test perplexity of 4.65 and accuracy of 84.7%. These findings establish that localist language models provide a practical framework for applications in regulated domains requiring both transparency and capability, offering precise mathematical control over the interpretability-performance spectrum through explicit penalty thresholds and information-theoretic design principles.
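Attention entropy, the metric the abstract reports (5.36 bits localist vs. 7.18 bits distributed), is straightforward to compute. The two example distributions below are made up to illustrate the peaked-vs-uniform contrast; they are not the paper's data.

```python
import math

def entropy_bits(p: list[float]) -> float:
    """Shannon entropy of a probability distribution, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

n = 146
localist = [0.9] + [0.1 / (n - 1)] * (n - 1)   # sharply peaked attention
distributed = [1.0 / n] * n                     # near-uniform attention

print(round(entropy_bits(localist), 2), round(entropy_bits(distributed), 2))
```

Uniform attention over n positions gives log2(n) bits (about 7.19 for n = 146), so lower entropy directly reflects more localized, more interpretable attention.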
[NLP-9] MultiZebraLogic: A Multilingual Logical Reasoning Benchmark LREC2026
【Quick Read】: This paper addresses the lack of high-quality, multilingual benchmarks of suitable difficulty for fairly comparing the logical reasoning of LLMs, especially across languages. The key of the solution is MultiZebraLogic, a diverse dataset of zebra puzzles in nine Germanic languages, covering multiple themes, puzzle sizes (2x3 and 4x5), 14 clue types, and 8 red herring types (uninformative clues). Empirically, 4x5 puzzles are suitably challenging for the o3-mini reasoning model, and adding 5 red herrings lowers its puzzle-level accuracy by about 15%; neither language (English vs. Danish) nor theme (generic houses vs. the country-specific smoerrebroed theme) significantly affects scores, supporting the dataset's cross-lingual consistency and comparability.
Link: https://arxiv.org/abs/2511.03553
Authors: Sofie Helene Bruun, Dan Saattrup Smart
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Submitted to LREC 2026
Abstract:Measuring the full abilities of large language models (LLMs) requires benchmarks representing multiple tasks. We aim to create large, high-quality datasets for comparison of logical reasoning skills across several languages and of suitable difficulty for LLMs of various reasoning ability. We explore multiple ways of increasing difficulty. We generate zebra puzzles in multiple languages, themes, sizes and including 14 different clue types and 8 red herring types (uninformative clues). We find puzzle sizes 2x3 and 4x5 are sufficiently challenging for GPT-4o mini (a non-reasoning model) and o3-mini (a reasoning model), respectively. Including 5 red herrings decreases o3-mini puzzle-level accuracy on 4x5 puzzles by 15 \pm 7 %. Scores of o3-mini on 4x5 puzzles are not significantly affected by use of English vs. Danish or the common houses theme vs. the country-specific smoerrebroed theme. We find no correlation between difficulty and the selected clue types. Datasets of 128+1024 puzzles are published as MultiZebraLogic in each of nine Germanic languages for sizes 2x3 and 4x5. We publish code for puzzle generation, designed for adaptablity into more languages and themes.
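To make the puzzle format concrete, here is a toy 2-attribute, 3-house zebra puzzle checked by brute force. The names, drinks, and clues are invented for the example; the dataset's generator supports far more clue types, sizes, and languages.

```python
from itertools import permutations

names = ["Alice", "Bob", "Carol"]
drinks = ["tea", "coffee", "milk"]

def solve():
    """Enumerate all assignments consistent with the three clues."""
    for people in permutations(names):
        for drink in permutations(drinks):
            # Clue 1: Alice lives in the first house.
            if people[0] != "Alice":
                continue
            # Clue 2: the coffee drinker lives immediately right of Alice.
            if drink[people.index("Alice") + 1] != "coffee":
                continue
            # Clue 3: Bob drinks milk.
            if drink[people.index("Bob")] != "milk":
                continue
            yield people, drink

solutions = list(solve())
print(solutions)
```

A red herring in the dataset's sense would be an extra clue (e.g. "Carol likes flowers") that eliminates no assignments but lengthens the reasoning chain.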
[NLP-10] Bearing Syntactic Fruit with Stack-Augmented Neural Networks
【Quick Read】: The problem this paper tackles: mainstream neural architectures lack the preference for hierarchical syntactic rules that human children exhibit, failing to generalize in human-like fashion without explicit syntactic supervision, massive pre-training, or training long past convergence. The key of the solution is stack-augmented neural networks: augmenting base architectures (transformer, simple RNN, LSTM) with two styles of stack, the superposition stack of Joulin & Mikolov (2015) and the nondeterministic generalization proposed by DuSell & Chiang (2023), yields human-like generalization without any of those requirements, with transformers plus nondeterministic stacks performing best on a classical question-formation task.
Link: https://arxiv.org/abs/2511.03547
Authors: Brian DuSell, Ryan Cotterell
Affiliations: ETH Zürich
Subjects: Computation and Language (cs.CL)
Comments: 15 pages, 5 figures
Abstract: Any finite set of training data is consistent with an infinite number of hypothetical algorithms that could have generated it. Studies have shown that when human children learn language, they consistently favor hypotheses based on hierarchical syntactic rules without ever encountering disambiguating examples. A recent line of work has inquired as to whether common neural network architectures share this bias, finding that they do so only under special conditions: when syntactically supervised, when pre-trained on massive corpora, or when trained long past convergence. In this paper, we demonstrate, for the first time, neural network architectures that are able to generalize in human-like fashion without any of the aforementioned requirements: stack-augmented neural networks. We test three base architectures (transformer, simple RNN, LSTM) augmented with two styles of stack: the superposition stack of Joulin & Mikolov (2015) and a nondeterministic generalization of it proposed by DuSell & Chiang (2023). We find that transformers with nondeterministic stacks generalize best out of these architectures on a classical question formation task. We also propose a modification to the stack RNN architecture that improves hierarchical generalization. These results suggest that stack-augmented neural networks may be more accurate models of human language acquisition than standard architectures, serving as useful objects of psycholinguistic study. Our code is publicly available.
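One superposition-stack update in the style of Joulin & Mikolov (2015) can be sketched with scalar stack cells: the new stack is a convex combination of the pushed, popped, and unchanged stacks, weighted by action probabilities. In the real model those weights come from the network; here they are fixed by hand for illustration.

```python
def stack_update(stack, push_val, a_push, a_pop, a_noop):
    """One differentiable stack step. `stack` lists cell values, top first."""
    pushed = [push_val] + stack[:-1]   # shift down, insert new value on top
    popped = stack[1:] + [0.0]         # shift up, pad the bottom with zero
    return [a_push * p + a_pop * q + a_noop * s
            for p, q, s in zip(pushed, popped, stack)]

stack = [0.0, 0.0, 0.0]
stack = stack_update(stack, 1.0, 1.0, 0.0, 0.0)   # hard push -> [1, 0, 0]
stack = stack_update(stack, 2.0, 1.0, 0.0, 0.0)   # hard push -> [2, 1, 0]
soft = stack_update(stack, 3.0, 0.5, 0.5, 0.0)    # uncertain push/pop mixes both
print(stack, soft)
```

Because the update is a smooth function of the action weights, gradients flow through push/pop decisions, which is what lets these stacks be trained end-to-end inside an RNN or transformer.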
[NLP-11] SOLVE-Med: Specialized Orchestration for Leading Vertical Experts across Medical Specialties
【Quick Read】: This paper addresses the deployment challenges of medical question-answering systems, including hallucinations, bias, heavy computational demands, privacy concerns, and the need for specialized expertise across domains. The key of the solution is SOLVE-Med, a multi-agent architecture in which a Router Agent dynamically selects the domain-specialized small language model best suited to a given medical question, and an Orchestrator Agent synthesizes the experts' outputs into the final answer. Using ten specialty-specific models of 1B parameters each, fine-tuned on different medical specialties, the system outperforms standalone models of up to 14B parameters on Italian medical forum data (ROUGE-1: 0.301, BERTScore F1: 0.697) while enabling local deployment, balancing accuracy and practicality.
Link: https://arxiv.org/abs/2511.03542
Authors: Roberta Di Marino, Giovanni Dioguardi, Antonio Romano, Giuseppe Riccio, Mariano Barone, Marco Postiglione, Flora Amato, Vincenzo Moscato
Affiliations: University of Naples Federico II, DIETI, Naples, Italy; Northwestern University, Department of Computer Science, Evanston, IL, United States
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:
Abstract:Medical question answering systems face deployment challenges including hallucinations, bias, computational demands, privacy concerns, and the need for specialized expertise across diverse domains. Here, we present SOLVE-Med, a multi-agent architecture combining domain-specialized small language models for complex medical queries. The system employs a Router Agent for dynamic specialist selection, ten specialized models (1B parameters each) fine-tuned on specific medical domains, and an Orchestrator Agent that synthesizes responses. Evaluated on Italian medical forum data across ten specialties, SOLVE-Med achieves superior performance with ROUGE-1 of 0.301 and BERTScore F1 of 0.697, outperforming standalone models up to 14B parameters while enabling local deployment. Our code is publicly available on GitHub: this https URL.
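The Router Agent's role can be illustrated with a toy specialist selector. SOLVE-Med's router is itself a learned agent; the keyword lookup and specialty names below are stand-ins, not the paper's method.

```python
# Hypothetical specialty -> trigger-keyword table (illustration only).
SPECIALISTS = {
    "cardiology": ["heart", "chest pain", "arrhythmia"],
    "dermatology": ["rash", "skin", "eczema"],
    "neurology": ["headache", "seizure", "numbness"],
}

def route(query: str) -> str:
    """Pick the specialist whose keywords best match the query."""
    q = query.lower()
    return max(SPECIALISTS, key=lambda s: sum(k in q for k in SPECIALISTS[s]))

print(route("I get chest pain when climbing stairs"))
```

In the full architecture the selected specialist (a 1B-parameter fine-tuned model) answers, and the Orchestrator Agent merges responses when several specialists are relevant.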
[NLP-12] One Battle After Another: Probing LLM s Limits on Multi-Turn Instruction Following with a Benchmark Evolving Framework
【Quick Read】: This paper addresses the limitations of existing multi-turn instruction-following benchmarks, which fix the number of dialogue turns, saturate easily, and fail to reflect the real user experience. The key of the solution is an extensible framework (EvolIF) that decouples linguistic surface forms from user-intent simulation through a three-layer mechanism tracking constraints, instructions, and topic states, thereby simulating real User-LLM interaction and terminating a conversation only when the model exhausts the simulated user's patience. The framework supports dynamic benchmark construction and metrics that quantify interaction quality, substantially improving the assessment of multi-turn instruction following.
Link: https://arxiv.org/abs/2511.03508
Authors: Qi Jia, Kaiwei Zhang, Xiujie Song, Ye Shen, Xiangyang Zhu, Guangtao Zhai
Affiliations: Shanghai Artificial Intelligence Laboratory; Shanghai Jiao Tong University
Subjects: Computation and Language (cs.CL)
Comments:
Abstract:Understanding how well large language models can follow users’ instructions throughout a dialogue spanning multiple topics is of great importance for data-intensive conversational applications. Existing benchmarks are often limited to a fixed number of turns, making them susceptible to saturation and failing to account for the user’s interactive experience. In this work, we propose an extensible framework for assessing multi-turn instruction-following ability. At its core, our framework decouples linguistic surface forms from user intent simulation through a three-layer mechanism that tracks constraints, instructions, and topics. This framework mimics User-LLM interaction by enabling the dynamic construction of benchmarks with state changes and tracebacks, terminating a conversation only when the model exhausts a simulated user’s patience. We define a suite of metrics capturing the quality of the interaction process. Using this framework, we construct EvolIF, an evolving instruction-following benchmark incorporating nine distinct constraint types. Our results indicate that GPT-5 exhibits superior instruction-following performance. It sustains an average of 18.54 conversational turns and demonstrates 70.31% robustness, outperforming Gemini-2.5-Pro by a significant margin of 11.41%, while other models lag far behind. All of the data and code will be made publicly available online.
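The patience-terminated interaction loop can be rendered as a toy simulation. This is a deliberately simplified sketch: the constraint check (substring match), the patience budget, and the responses are invented, while the real framework tracks constraints, instructions, and topics in far richer state.

```python
def run_dialogue(responses, constraints, patience=3):
    """Consume model responses turn by turn; each turn that violates a
    constraint costs one unit of the simulated user's patience.
    Returns (turns survived, termination reason)."""
    remaining = patience
    for turn, resp in enumerate(responses, start=1):
        if any(c not in resp for c in constraints):   # toy violation check
            remaining -= 1
        if remaining == 0:
            return turn, "patience_exhausted"
    return len(responses), "user_satisfied"

constraints = ["in French", "under 50 words"]
responses = ["ok in French under 50 words", "ignored", "ignored", "ignored"]
print(run_dialogue(responses, constraints))
```

Because the conversation length is decided by the model's behavior rather than fixed in advance, the benchmark cannot saturate the way fixed-turn suites do.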
[NLP-13] HaluMem: Evaluating Hallucinations in Memory Systems of Agents
【Quick Read】: This paper addresses the memory hallucinations that current AI memory systems frequently exhibit during storage and retrieval, including fabrication, errors, conflicts, and omissions; existing end-to-end QA evaluations cannot localize the operational stage at which hallucinations arise. The key of the solution is HaluMem, the first operation-level hallucination benchmark for memory systems, which defines three core tasks (memory extraction, memory updating, and memory question answering) to comprehensively reveal hallucination behavior at each operational stage. Two user-centric multi-turn human-AI interaction datasets (HaluMem-Medium and HaluMem-Long, with contexts exceeding 1M tokens) support evaluating hallucination propagation across context scales and task complexities. Empirically, hallucinations arise mainly during extraction and updating and then propagate to question answering, so future work should focus on interpretable and constrained memory operation mechanisms that systematically suppress hallucinations and improve memory reliability.
Link: https://arxiv.org/abs/2511.03506
Authors: Ding Chen, Simin Niu, Kehang Li, Peng Liu, Xiangping Zheng, Bo Tang, Xinchi Li, Feiyu Xiong, Zhiyu Li
Affiliations: Harbin Engineering University; MemTensor
Subjects: Computation and Language (cs.CL)
Comments:
Abstract:Memory systems are key components that enable AI systems such as LLMs and AI agents to achieve long-term learning and sustained interaction. However, during memory storage and retrieval, these systems frequently exhibit memory hallucinations, including fabrication, errors, conflicts, and omissions. Existing evaluations of memory hallucinations are primarily end-to-end question answering, which makes it difficult to localize the operational stage within the memory system where hallucinations arise. To address this, we introduce the Hallucination in Memory Benchmark (HaluMem), the first operation level hallucination evaluation benchmark tailored to memory systems. HaluMem defines three evaluation tasks (memory extraction, memory updating, and memory question answering) to comprehensively reveal hallucination behaviors across different operational stages of interaction. To support evaluation, we construct user-centric, multi-turn human-AI interaction datasets, HaluMem-Medium and HaluMem-Long. Both include about 15k memory points and 3.5k multi-type questions. The average dialogue length per user reaches 1.5k and 2.6k turns, with context lengths exceeding 1M tokens, enabling evaluation of hallucinations across different context scales and task complexities. Empirical studies based on HaluMem show that existing memory systems tend to generate and accumulate hallucinations during the extraction and updating stages, which subsequently propagate errors to the question answering stage. Future research should focus on developing interpretable and constrained memory operation mechanisms that systematically suppress hallucinations and improve memory reliability.
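Operation-level scoring of the extraction stage can be sketched with set operations: compare the memory points a system extracted against a gold set. The three-way labeling below is a simplification for illustration; HaluMem's actual taxonomy (fabrication, errors, conflicts, omissions) is richer.

```python
def score_extraction(extracted: set[str], gold: set[str]) -> dict[str, set[str]]:
    """Partition extracted vs. gold memory points into outcome buckets."""
    return {
        "correct": extracted & gold,     # faithfully extracted memories
        "fabricated": extracted - gold,  # hallucinated memory points
        "omitted": gold - extracted,     # memories the system missed
    }

gold = {"user lives in Oslo", "user is vegetarian", "user has two cats"}
extracted = {"user lives in Oslo", "user has three cats"}
print(score_extraction(extracted, gold))
```

Scoring each stage separately is what lets the benchmark show that errors born in extraction and updating later surface as wrong answers in the QA stage.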
[NLP-14] BanglaSTEM: A Parallel Corpus for Technical Domain Bangla-English Translation
【Quick Read】: This paper addresses LLMs' poor performance on technical questions asked in Bangla: existing Bangla-English translation systems mishandle specialized STEM terminology, distorting the problem and leading to wrong answers. The key of the solution is BanglaSTEM, a high-quality parallel corpus of 5,000 carefully selected Bangla-English sentence pairs spanning computer science, mathematics, physics, chemistry, and biology, chosen by human evaluators from over 12,000 model-generated translations for correct preservation of technical terms, together with a T5-based translation model trained on it. Experiments on code generation and math problem solving show significantly improved translation accuracy for technical content, making it easier for Bangla speakers to use English-focused LLMs effectively.
Link: https://arxiv.org/abs/2511.03498
Authors: Kazi Reyazul Hasan, Mubasshira Musarrat, A. B. M. Alim Al Islam, Muhammad Abdullah Adnan
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments:
Abstract:Large language models work well for technical problem solving in English but perform poorly when the same questions are asked in Bangla. A simple solution would be to translate Bangla questions into English first and then use these models. However, existing Bangla-English translation systems struggle with technical terms. They often mistranslate specialized vocabulary, which changes the meaning of the problem and leads to wrong answers. We present BanglaSTEM, a dataset of 5,000 carefully selected Bangla-English sentence pairs from STEM fields including computer science, mathematics, physics, chemistry, and biology. We generated over 12,000 translations using language models and then used human evaluators to select the highest quality pairs that preserve technical terminology correctly. We train a T5-based translation model on BanglaSTEM and test it on two tasks: generating code and solving math problems. Our results show significant improvements in translation accuracy for technical content, making it easier for Bangla speakers to use English-focused language models effectively. Both the BanglaSTEM dataset and the trained translation model are publicly released at this https URL.
[NLP-15] Kastor: Fine-tuned Small Language Models for Shape-based Active Relation Extraction ESWC2025
【Quick Read】: This paper addresses the scarcity of training data and the weak generalization of small language models (SLMs) for building and refining domain-specific knowledge bases (KBs). The key of the solution is Kastor, which reformulates traditional single SHACL shape validation into evaluating all property combinations derived from the shape and selecting the optimal combination for each training example, significantly improving generalization and performance; an iterative learning process then refines noisy knowledge bases, yielding robust models capable of uncovering new, relevant facts for continuous KB completion and enrichment.
Link: https://arxiv.org/abs/2511.03466
Authors: Ringwald Celian, Gandon Fabien, Faron Catherine, Michel Franck, Abi Akl Hanna
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments: Accepted at ESWC 2025
Abstract: RDF pattern-based extraction is a compelling approach for fine-tuning small language models (SLMs) by focusing a relation extraction task on a specified SHACL shape. This technique enables the development of efficient models trained on limited text and RDF data. In this article, we introduce Kastor, a framework that advances this approach to meet the demands for completing and refining knowledge bases in specialized domains. Kastor reformulates the traditional validation task, shifting from single SHACL shape validation to evaluating all possible combinations of properties derived from the shape. By selecting the optimal combination for each training example, the framework significantly enhances model generalization and performance. Additionally, Kastor employs an iterative learning process to refine noisy knowledge bases, enabling the creation of robust models capable of uncovering new, relevant facts.
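The combination-enumeration step the abstract describes is easy to sketch: instead of validating one full shape, enumerate every property subset derived from it. The property names below are invented for the example, not taken from the paper.

```python
from itertools import combinations

def property_combinations(shape_properties, min_size=1):
    """Yield every non-empty subset of a shape's properties, smallest first."""
    for r in range(min_size, len(shape_properties) + 1):
        yield from combinations(shape_properties, r)

# Hypothetical SHACL shape with three properties (illustration only).
shape = ["dbo:birthDate", "dbo:birthPlace", "dbo:occupation"]
combos = list(property_combinations(shape))
print(len(combos), combos[0], combos[-1])
```

Kastor would then score each training example against these candidate combinations and keep the best-matching one, rather than forcing every example through the full shape.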
[NLP-16] CareMedEval dataset: Evaluating Critical Appraisal and Reasoning in the Biomedical Field LREC2026
【Quick Read】: This paper addresses the limited ability of current LLMs to perform critical appraisal of biomedical literature, particularly reasoning and judgment grounded in scientific papers. The key of the solution is CareMedEval, a novel dataset derived from authentic exams taken by French medical students, containing 534 questions based on 37 scientific articles and explicitly focused on critical reasoning about study methodology, limitations, and statistical analysis. Benchmarking generalist and biomedical-specialized LLMs under various context conditions exposes clear shortcomings on complex reasoning, providing a challenging benchmark and directions for more reliable automated support for critical appraisal.
Link: https://arxiv.org/abs/2511.03441
Authors: Doria Bonzi, Alexandre Guiggi, Frédéric Béchet, Carlos Ramisch, Benoit Favre
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Preprint submitted to LREC 2026 (under review). To access the dataset, see this https URL
Abstract:Critical appraisal of scientific literature is an essential skill in the biomedical field. While large language models (LLMs) can offer promising support in this task, their reliability remains limited, particularly for critical reasoning in specialized domains. We introduce CareMedEval, an original dataset designed to evaluate LLMs on biomedical critical appraisal and reasoning tasks. Derived from authentic exams taken by French medical students, the dataset contains 534 questions based on 37 scientific articles. Unlike existing benchmarks, CareMedEval explicitly evaluates critical reading and reasoning grounded in scientific papers. Benchmarking state-of-the-art generalist and biomedical-specialized LLMs under various context conditions reveals the difficulty of the task: open and commercial models fail to exceed an Exact Match Rate of 0.5 even though generating intermediate reasoning tokens considerably improves the results. Yet, models remain challenged especially on questions about study limitations and statistical analysis. CareMedEval provides a challenging benchmark for grounded reasoning, exposing current LLM limitations and paving the way for future development of automated support for critical appraisal.
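The Exact Match Rate the abstract reports (models fail to exceed 0.5) is a simple metric for multiple-answer questions: a prediction counts only if it matches the gold answer set exactly. The predictions and answer keys below are made up for the example.

```python
def exact_match_rate(preds, golds):
    """Fraction of questions whose predicted answer set equals the gold set."""
    matches = sum(set(p) == set(g) for p, g in zip(preds, golds))
    return matches / len(golds)

preds = [["A", "C"], ["B"], ["A"]]
golds = [["C", "A"], ["B", "D"], ["A"]]
print(exact_match_rate(preds, golds))
```

Its all-or-nothing nature is what makes it a harsh score on exam questions with several correct propositions: one missed or extra proposition zeroes out the whole question.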
zh
[NLP-17] Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG EMNLP2025
【速读】: 该论文旨在解决大语言模型(Large Language Models, LLMs)在问答(Question-Answering, QA)系统中因输入错误而导致的响应偏差问题,具体表现为对用户意图的误判(misinterpretation)和对原始问题结构的过度修正(over-correction)。其解决方案的关键在于提出QuestionRAG框架:一方面通过引入外部知识(如搜索结果、相关实体)增强输入语义表示以缓解误判;另一方面采用强化学习(Reinforcement Learning, RL)对齐模型目标,使其专注于精确修正而非简单改写,从而显著优于传统的监督微调(Supervised Fine-Tuning, SFT)方法,在指令遵循能力和泛化性能上均有提升。
链接: https://arxiv.org/abs/2511.03410
作者: Longpeng Qiu,Ting Li,Shuai Mao,Nan Yang,Xiaohui Yan
机构: University of Chinese Academy of Sciences (中国科学院大学); Huawei Technologies Co., Ltd. (华为技术有限公司)
类目: Computation and Language (cs.CL)
备注: EMNLP2025 Industry Track
Abstract:Input errors in question-answering (QA) systems often lead to incorrect responses. Large language models (LLMs) struggle with this task, frequently failing to interpret user intent (misinterpretation) or unnecessarily altering the original question’s structure (over-correction). We propose QuestionRAG, a framework that tackles these problems. To address misinterpretation, it enriches the input with external knowledge (e.g., search results, related entities). To prevent over-correction, it uses reinforcement learning (RL) to align the model’s objective with precise correction, not just paraphrasing. Our results demonstrate that knowledge augmentation is critical for understanding faulty questions. Furthermore, RL-based alignment proves significantly more effective than traditional supervised fine-tuning (SFT), boosting the model’s ability to follow instructions and generalize. By integrating these two strategies, QuestionRAG unlocks the full potential of LLMs for the question correction task.
zh
[NLP-18] Efficient Reasoning via Thought-Training and Thought-Free Inference
【速读】: 该论文旨在解决当前基于大语言模型(Large Language Models, LLMs)的推理方法在推理效率与质量之间的权衡问题。现有方法多采用显式思维链(Chain-of-Thought, CoT) prompting,虽能提升推理准确性,但依赖于冗长的逐步推理输出,导致推理过程效率低下。其核心问题在于:如何在不牺牲推理质量的前提下实现高效推理,即在推理过程中隐式完成复杂逻辑运算,而无需生成显式的中间步骤。解决方案的关键在于提出3TF(Thought-Training and Thought-Free inference)框架——首先训练一个兼具推理与非推理模式的混合模型,并在标注有思维链(CoT)的数据上进一步优化,使模型内化结构化推理能力;随后通过强制在推理阶段使用“无思考”模式输出简洁结果,从而实现隐式推理(implicit reasoning)与短输出(thought-free inference)的统一。实验表明,该方法能在无需显式推理步骤的情况下显著提升推理性能,证明高质量推理可被学习并隐式执行。
链接: https://arxiv.org/abs/2511.03408
作者: Canhui Wu,Qiong Cao,Chao Xue,Wei Xi,Xiaodong He
机构: Xi’an Jiaotong University (西安交通大学); JD Future Academy (京东未来研究院)
类目: Computation and Language (cs.CL)
备注: 11 pages, 4 figures
Abstract:Recent advances in large language models (LLMs) have leveraged explicit Chain-of-Thought (CoT) prompting to improve reasoning accuracy. However, most existing methods primarily compress verbose reasoning outputs. These Long-to-Short transformations aim to improve efficiency, but still rely on explicit reasoning during inference. In this work, we introduce 3TF (Thought-Training and Thought-Free inference), a framework for efficient reasoning that takes a Short-to-Long perspective. We first train a hybrid model that can operate in both reasoning and non-reasoning modes, and then further train it on CoT-annotated data to internalize structured reasoning, while enforcing concise, thought-free outputs at inference time using the no-reasoning mode. Unlike compression-based approaches, 3TF improves the reasoning quality of non-reasoning outputs, enabling models to perform rich internal reasoning implicitly while keeping external outputs short. Empirically, 3TF-trained models obtain large improvements on reasoning benchmarks under thought-free inference, demonstrating that high-quality reasoning can be learned and executed implicitly without explicit step-by-step generation.
zh
[NLP-19] Overcoming the Generalization Limits of SLM Finetuning for Shape-Based Extraction of Datatype and Object Properties
【速读】: 该论文旨在解决小语言模型(Small Language Models, SLMs)在从RDF图中提取关系时,对稀有属性(即长尾分布的属性)识别性能显著下降的问题,尤其是在同时处理数据类型属性(datatype properties)和对象属性(object properties)的情况下。解决方案的关键在于构建一个训练集,其中每种目标属性的出现次数均超过预设阈值,从而有效缓解属性分布不均衡带来的影响,使模型在各类属性上均能实现稳定且均衡的性能表现。
链接: https://arxiv.org/abs/2511.03407
作者: Célian Ringwald,Fabien Gandon,Catherine Faron,Franck Michel,Hanna Abi Akl
机构: Univ. Côte d’Azur (蔚蓝海岸大学); Inria (法国国家信息与自动化研究院); CNRS (法国国家科学研究中心); I3S (智能系统研究所)
类目: Computation and Language (cs.CL)
备注: Accepted at KCAP 2025
Abstract:Small language models (SLMs) have shown promises for relation extraction (RE) when extracting RDF triples guided by SHACL shapes focused on common datatype properties. This paper investigates how SLMs handle both datatype and object properties for a complete RDF graph extraction. We show that the key bottleneck is related to long-tail distribution of rare properties. To solve this issue, we evaluate several strategies: stratified sampling, weighted loss, dataset scaling, and template-based synthetic data augmentation. We show that the best strategy to perform equally well over unbalanced target properties is to build a training set where the number of occurrences of each property exceeds a given threshold. To enable reproducibility, we publicly released our datasets, experimental results and code. Our findings offer practical guidance for training shape-aware SLMs and highlight promising directions for future work in semantic RE.
zh
[NLP-20] Segmentation Beyond Defaults: Asymmetrical Byte Pair Encoding for Optimal Machine Translation Performance
【速读】: 该论文旨在解决现有机器翻译(Machine Translation, MT)研究中普遍采用对称字节对编码(symmetric Byte Pair Encoding, BPE)所带来的性能瓶颈问题,即固定相同的合并操作次数(Number of Merge Operations, NMO)用于源语言和目标语言的分词模型训练,无法在不同语言对和数据规模下实现最优翻译效果。其解决方案的关键在于提出异构BPE(asymmetric BPE),即为源语言和目标语言分别配置不同的NMO值——具体而言,源语言采用较高NMO(4K–32K),目标语言采用较低NMO(0.5K–2K),从而显著提升低资源场景下的MT性能,在多个语言对上实现了统计学意义上的显著改进(p < 0.05)。
链接: https://arxiv.org/abs/2511.03383
作者: Saumitra Yadav,Manish Shrivastava
机构: Language Technologies Research Center, KCIS, International Institute of Information Technology Hyderabad, India (印度海得拉巴国际信息科技学院语言技术研究中心)
类目: Computation and Language (cs.CL)
备注: Accepted at WAT 2025
Abstract:Existing Machine Translation (MT) research often suggests a single, fixed set of hyperparameters for word segmentation models, symmetric Byte Pair Encoding (BPE), which applies the same number of merge operations (NMO) to train tokenizers for both source and target languages. However, we demonstrate that this uniform approach doesn’t guarantee optimal MT performance across different language pairs and data sizes. This work investigates BPE segmentation recipes across various data volumes and language pairs to evaluate MT system performance. We find that utilizing asymmetric BPE, where the source and target languages have different NMOs, significantly improves results over the symmetric approach, especially in low-resource settings (50K, 100K, and 500K sentence pairs). Specifically, asymmetric BPE yields statistically significant (p < 0.05) average gains of 5.32, 4.46, and 0.7 CHRF++ on English-Hindi in low-resource setups. We validated this trend across six additional language pairs (English and Telugu, Shona, Norwegian, Kyrgyz, Hausa, and Inuktitut), observing statistically significant improvement in 10 out of 12 systems compared to symmetric BPE. Our findings indicate a high NMO for the source (4K to 32K) and a low NMO for the target (0.5K to 2K) provide optimal results, particularly benefiting low-resource MT.
zh
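上述非对称BPE的核心思想——源端与目标端分别使用不同的合并操作次数(NMO)训练各自的分词器——可以用一个纯Python的极简BPE训练过程示意。以下代码为假设性玩具示例(语料与合并次数均为演示规模,论文中实际NMO为源端4K–32K、目标端0.5K–2K),并非论文实现:

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """从词列表中学习 num_merges 条BPE合并规则(极简实现)。"""
    # 每个词表示为字符元组,并按出现频次计数
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # 统计相邻符号对的加权频次
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # 对词表中的每个词应用本次合并
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

# 非对称BPE:源语言端NMO较高,目标语言端NMO较低(数值仅为示意)
src_corpus = ["lower", "lowest", "newer", "newest"] * 10
tgt_corpus = ["niedrig", "niedriger", "neuer"] * 10
src_merges = train_bpe(src_corpus, num_merges=5)   # 源端:高NMO
tgt_merges = train_bpe(tgt_corpus, num_merges=2)   # 目标端:低NMO
print(len(src_merges), len(tgt_merges))  # → 5 2
```

实际系统中通常直接在SentencePiece或HuggingFace tokenizers中为源、目标两侧分别设置不同的合并次数/词表规模,思路与上面的玩具示例一致。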
[NLP-21] Beyond Citations: Measuring Idea-level Knowledge Diffusion from Research to Journalism and Policy-making
【速读】: 该论文试图解决社会科学研究知识在不同领域(如新闻业和政策制定)中扩散程度难以量化的问题(即知识扩散的测量难题)。传统方法主要依赖直接引用,但忽略了隐含的知识传播路径。其解决方案的关键在于采用一种新颖的基于文本的方法,通过识别和追踪特定社会科学研究思想(如媒体效果理论)在研究、新闻与政策三大领域中的提及频率、语境特征及其随时间的变化,实现对知识扩散的细粒度刻画。该方法利用嵌入回归(embedding regression)比较各领域间概念语义距离,揭示了知识角色转换(从理论到新闻解读再到政策应用)及跨域语义趋同现象,从而超越了传统引文指标的局限性。
链接: https://arxiv.org/abs/2511.03378
作者: Yangliu Fan,Kilian Buehling,Volker Stocker
机构: Weizenbaum Institute (魏岑鲍姆研究所); Freie Universität Berlin (柏林自由大学); Technische Universität Berlin (柏林工业大学)
类目: Social and Information Networks (cs.SI); Computation and Language (cs.CL)
备注:
Abstract:Despite the importance of social science knowledge for various stakeholders, measuring its diffusion into different domains remains a challenge. This study uses a novel text-based approach to measure the idea-level diffusion of social science knowledge from the research domain to the journalism and policy-making domains. By doing so, we expand the detection of knowledge diffusion beyond the measurements of direct references. Our study focuses on media effects theories as key research ideas in the field of communication science. Using 72,703 documents (2000-2019) from three domains (i.e., research, journalism, and policy-making) that mention these ideas, we count the mentions of these ideas in each domain, estimate their domain-specific contexts, and track and compare differences across domains and over time. Overall, we find that diffusion patterns and dynamics vary considerably between ideas, with some ideas diffusing between other domains, while others do not. Based on the embedding regression approach, we compare contextualized meanings across domains and find that the distances between research and policy are typically larger than between research and journalism. We also find that ideas largely shift roles across domains - from being the theories themselves in research to sense-making in news to applied, administrative use in policy. Over time, we observe semantic convergence mainly for ideas that are practically oriented. Our results characterize the cross-domain diffusion patterns and dynamics of social science knowledge at the idea level, and we discuss the implications for measuring knowledge diffusion beyond citations.
zh
[NLP-22] LFC-DA: Logical Formula-Controlled Data Augmentation for Enhanced Logical Reasoning
【速读】: 该论文旨在解决复杂逻辑数据增强中依赖人工标注成本高,以及直接使用大语言模型(Large Language Models, LLMs)生成样本时存在不可解释性和逻辑同质性的问题。其解决方案的关键在于提出一种符号逻辑控制的数据增强框架LFC-DA:首先将自然语言逻辑文本映射为命题表达式,构建紧凑的规则库,并通过有界状态空间搜索系统性地发现有效公式,再将其重新转化为自然语言问题,从而在保证逻辑严谨性的前提下实现多样化的高质量数据生成。
链接: https://arxiv.org/abs/2511.03372
作者: Shenghao Li
机构: 未知
类目: Computation and Language (cs.CL)
备注: 10 pages, 6 figures
Abstract:For complex logical data augmentation, heavy reliance on human annotation is costly, whereas direct generation with large language models yields uninterpretable and logically homogeneous examples. To address this, we present LFC-DA, a symbolic-logic-controlled pipeline: logical text is first mapped to propositional expressions, a compact rule library is compiled, and a bounded state-space search systematically discovers valid formulas that are then verbalized back into natural-language questions, ensuring both diversity and logical rigor under propositional logic. Experiments on ReClor and LogiQA show significant improvements in the logical-reasoning accuracy of pretrained models, confirming the effectiveness of LFC-DA for LLM-guided logical data augmentation.
zh
[NLP-23] EQ-Negotiator: Dynamic Emotional Personas Empower Small Language Models for Edge-Deployable Credit Negotiation
【速读】: 该论文旨在解决大语言模型(Large Language Models, LLMs)在自动化谈判中虽性能优异但存在计算成本高和数据隐私要求严苛的问题,使其难以应用于移动助手、具身AI代理或私密客户交互等对隐私敏感的边缘场景。针对小语言模型(Small Language Models, SLMs)因情感推理能力不足导致在情绪化复杂角色扮演(如信贷谈判)中表现显著落后于LLMs的瓶颈,论文提出EQ-Negotiator框架,其核心创新在于引入一种融合博弈论与隐马尔可夫模型(Hidden Markov Model, HMM)的在线推理系统,无需预训练即可实时学习并追踪债务人的情绪状态,从而赋予SLMs对抗操纵、缓和冲突并遵守伦理规范的战略智能。实验表明,配备EQ-Negotiator的7B参数模型在多种信贷谈判场景下,包括欺诈、威胁和装受害者等对抗策略,其债务回收率和谈判效率优于参数量超过其10倍的基线LLMs,证明了战略情感智能(strategic emotional intelligence)是自动化谈判成功的关键因素,而非单纯模型规模。
链接: https://arxiv.org/abs/2511.03370
作者: Yunbo Long,Yuhan Liu,Alexandra Brintrup
机构: University of Cambridge (剑桥大学); University of Toronto (多伦多大学); The Alan Turing Institute (艾伦图灵研究所)
类目: Computation and Language (cs.CL)
备注:
Abstract:The deployment of large language models (LLMs) in automated negotiation has set a high performance benchmark, but their computational cost and data privacy requirements render them unsuitable for many privacy-sensitive, on-device applications such as mobile assistants, embodied AI agents or private client interactions. While small language models (SLMs) offer a practical alternative, they suffer from a significant performance gap compared to LLMs in playing emotionally charged complex personas, especially for credit negotiation. This paper introduces EQ-Negotiator, a novel framework that bridges this capability gap using emotional personas. Its core is a reasoning system that integrates game theory with a Hidden Markov Model (HMM) to learn and track debtor emotional states online, without pre-training. This allows EQ-Negotiator to equip SLMs with the strategic intelligence to counter manipulation while de-escalating conflict and upholding ethical standards. Through extensive agent-to-agent simulations across diverse credit negotiation scenarios, including adversarial debtor strategies like cheating, threatening, and playing the victim, we show that a 7B parameter language model with EQ-Negotiator achieves better debt recovery and negotiation efficiency than baseline LLMs more than 10 times its size. This work advances persona modeling from descriptive character profiles to dynamic emotional architectures that operate within privacy constraints. Besides, this paper establishes that strategic emotional intelligence, not raw model scale, is the critical factor for success in automated negotiation, paving the way for effective, ethical, and privacy-preserving AI negotiators that can operate on the edge.
zh
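EQ-Negotiator用隐马尔可夫模型在线追踪债务人的情绪状态。这一追踪在形式上就是HMM的前向滤波:每轮对话观测到一个情感信号后,先按转移矩阵传播信念,再乘以发射概率并归一化。以下为该更新步骤的极简示意(状态划分、转移与发射概率均为假设数值,非论文参数):

```python
def hmm_filter(transition, emission, observations, init):
    """逐轮更新情绪状态的后验分布(HMM前向滤波,每步归一化)。"""
    belief = list(init)
    for obs in observations:
        # 预测步:按转移矩阵传播上一轮的信念
        predicted = [
            sum(belief[i] * transition[i][j] for i in range(len(belief)))
            for j in range(len(belief))
        ]
        # 更新步:乘以当前观测(如话语情感分类结果)的发射概率并归一化
        unnorm = [predicted[j] * emission[j][obs] for j in range(len(belief))]
        z = sum(unnorm)
        belief = [u / z for u in unnorm]
    return belief

# 两个假设的情绪状态:0=合作(cooperative),1=对抗(hostile);
# 观测为每轮话语的情感标签:0=友好用语,1=威胁性用语
T = [[0.8, 0.2], [0.3, 0.7]]   # 状态转移概率(假设值)
E = [[0.9, 0.1], [0.2, 0.8]]   # 发射概率(假设值)
belief = hmm_filter(T, E, observations=[1, 1, 0], init=[0.5, 0.5])
print([round(b, 3) for b in belief])
```

连续两轮威胁性观测会将信念推向"对抗"状态,随后的友好观测再将其拉回,这正是摘要中"在线学习并追踪情绪状态"所需的最小机制。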
[NLP-24] Silenced Biases: The Dark Side LLM s Learned to Refuse
【速读】: 该论文旨在解决当前安全对齐的大语言模型(Large Language Models, LLMs)在公平性评估中存在的盲区问题,即现有方法常将模型拒绝回答(refusal responses)误判为公平性的正面指标,从而掩盖了模型内部潜在的偏见。这种“被压制的偏见”(silenced biases)隐藏在模型的潜在空间中,由对齐训练过程所掩盖,导致传统基于问答(QA)的评估框架无法准确揭示真实不公平性。解决方案的关键在于提出Silenced Bias Benchmark (SBB),通过激活操控(activation steering)技术降低模型在标准QA任务中的拒绝率,从而暴露其隐含的不公平偏好;SBB具有良好的可扩展性,支持新增人口群体和主题,为未来开发真正公平的模型与工具提供了更可靠的评估框架。
链接: https://arxiv.org/abs/2511.03369
作者: Rom Himelstein,Amit LeVi,Brit Youngmann,Yaniv Nemcovsky,Avi Mendelson
机构: 未知
类目: Computation and Language (cs.CL); Machine Learning (stat.ML)
备注:
Abstract:Safety-aligned large language models (LLMs) are becoming increasingly widespread, especially in sensitive applications where fairness is essential and biased outputs can cause significant harm. However, evaluating the fairness of models is a complex challenge, and approaches that do so typically utilize standard question-answer (QA) styled schemes. Such methods often overlook deeper issues by interpreting the model’s refusal responses as positive fairness measurements, which creates a false sense of fairness. In this work, we introduce the concept of silenced biases, which are unfair preferences encoded within models’ latent space and are effectively concealed by safety-alignment. Previous approaches that considered similar indirect biases often relied on prompt manipulation or handcrafted implicit queries, which present limited scalability and risk contaminating the evaluation process with additional biases. We propose the Silenced Bias Benchmark (SBB), which aims to uncover these biases by employing activation steering to reduce model refusals during QA. SBB supports easy expansion to new demographic groups and subjects, presenting a fairness evaluation framework that encourages the future development of fair models and tools beyond the masking effects of alignment training. We demonstrate our approach over multiple LLMs, where our findings expose an alarming distinction between models’ direct responses and their underlying fairness issues.
zh
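SBB通过激活操控(activation steering)降低模型的拒绝率。这类方法的一种常见形式是"方向消融":从隐藏状态中减去其在预先估计的"拒绝方向"上的投影。以下为该操作的极简示意(方向向量为假设输入,实际需从模型激活中估计,且论文采用的具体操控方式可能不同):

```python
def steer_away(hidden, direction, alpha=1.0):
    """从隐藏状态向量中去除其在给定方向上的分量(alpha=1时结果与方向正交)。"""
    norm_sq = sum(d * d for d in direction)
    proj = sum(h * d for h, d in zip(hidden, direction)) / norm_sq
    return [h - alpha * proj * d for h, d in zip(hidden, direction)]

h = [1.0, 2.0, 3.0]   # 某一层的隐藏状态(假设值)
v = [0.0, 1.0, 0.0]   # 预先估计的"拒绝方向"(假设值)
steered = steer_away(h, v)
print(steered)        # → [1.0, 0.0, 3.0]
```

实践中该操作会作用于Transformer某些层的残差流激活上,alpha 控制消融强度,从而在问答过程中抑制拒绝行为、暴露潜在偏好。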
[NLP-25] Generative Artificial Intelligence in Bioinformatics: A Systematic Review of Models Applications and Methodological Advances
【速读】: 该论文旨在系统性地识别和评估生成式人工智能(Generative AI)在生物信息学各子领域中的方法论进步、预测性能与专业化程度,以明确其在高级建模、数据密集型发现及整合生物学分析中的潜力。解决方案的关键在于基于系统综述与元分析的方法论框架,提出六个研究问题(RQs),通过结构化分析揭示:1)GenAI在序列分析、分子设计和整合数据建模等多领域的广泛应用及其超越传统方法的性能优势;2)专用模型架构因针对性预训练和上下文感知策略而优于通用模型;3)在分子分析与数据整合中显著提升准确性并降低误差;4)结构建模、功能预测与合成数据生成等方面的验证性改进;5)当前限制因素如可扩展性不足与数据偏差对泛化能力的影响,并提出未来应聚焦于稳健评估与生物基础建模;6)多种高质量生物医学数据资源(如UniProtKB、CELLxGENE、PubMedQA)为GenAI模型训练与泛化提供了坚实支撑。
链接: https://arxiv.org/abs/2511.03354
作者: Riasad Alvi,Sayeem Been Zaman,Wasimul Karim,Arefin Ittesafun Abian,Mohaimenul Azam Khan Raiaan,Saddam Mukta,Md Rafi Ur Rashid,Md Rafiqul Islam,Yakub Sebastian,Sami Azam
机构: United International University (联合国际大学); Monash University (蒙纳士大学); Lappeenranta-Lahti University of Technology (拉彭兰塔-拉赫蒂工业大学); Pennsylvania State University (宾夕法尼亚州立大学); Charles Darwin University (查尔斯达尔文大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:
Abstract:Generative artificial intelligence (GenAI) has become a transformative approach in bioinformatics that often enables advancements in genomics, proteomics, transcriptomics, structural biology, and drug discovery. To systematically identify and evaluate these growing developments, this review proposed six research questions (RQs), according to the preferred reporting items for systematic reviews and meta-analysis methods. The objective is to evaluate impactful GenAI strategies in methodological advancement, predictive performance, and specialization, and to identify promising approaches for advanced modeling, data-intensive discovery, and integrative biological analysis. RQ1 highlights diverse applications across multiple bioinformatics subfields (sequence analysis, molecular design, and integrative data modeling), which demonstrate superior performance over traditional methods through pattern recognition and output generation. RQ2 reveals that adapted specialized model architectures outperformed general-purpose models, an advantage attributed to targeted pretraining and context-aware strategies. RQ3 identifies significant benefits in the bioinformatics domains, focusing on molecular analysis and data integration, which improves accuracy and reduces errors in complex analysis. RQ4 indicates improvements in structural modeling, functional prediction, and synthetic data generation, validated by established benchmarks. RQ5 suggests the main constraints, such as the lack of scalability and biases in data that impact generalizability, and proposes future directions focused on robust evaluation and biologically grounded modeling. RQ6 examines that molecular datasets (such as UniProtKB and ProteinNet12), cellular datasets (such as CELLxGENE and GTEx) and textual resources (such as PubMedQA and OMIM) broadly support the training and generalization of GenAI models.
zh
[NLP-26] Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks
【速读】: 该论文旨在解决当前多模态大语言模型(Multimodal Large Language Models, MLLMs)在临床任务中表现受限的问题,特别是其推理能力是否能通过“思考模式”(thinking mode)的激活显著提升性能与可靠性。解决方案的关键在于系统性评估两种领先的双状态MLLMs——Seed1.5-VL和Gemini-2.5-Flash——在四个视觉医学任务上的表现,对比其在标准非思考模式与主动思考模式下的差异。研究发现,尽管引入了显式的内部推理机制,多数任务中性能提升仍不显著,尤其在开放问答和医学图像解读等复杂场景下表现欠佳,表明当前模型仍需依赖领域特定的医疗数据和更先进的知识融合方法以实现可靠应用。
链接: https://arxiv.org/abs/2511.03328
作者: Jindong Hong,Tianjie Chen,Lingjie Luo,Chuanyang Zheng,Ting Xu,Haibao Yu,Jianing Qiu,Qianzhong Chen,Suning Huang,Yan Xu,Yong Gui,Yijun He,Jiankai Sun
机构: Bytedance(字节跳动); Peking University (北京大学); The Chinese University of Hong Kong (香港中文大学); The University of Hong Kong (香港大学); Mohamed bin Zayed University of Artificial Intelligence (穆罕默德·本·扎耶德人工智能大学); Stanford University (斯坦福大学); University of Michigan (密歇根大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注:
Abstract:A recent advancement in Multimodal Large Language Models (MLLMs) research is the emergence of “reasoning MLLMs” that offer explicit control over their internal thinking processes (normally referred as the “thinking mode”) alongside the standard “non-thinking mode”. This capability allows these models to engage in a step-by-step process of internal deliberation before generating a final response. With the rapid transition to and adoption of these “dual-state” MLLMs, this work rigorously evaluated how the enhanced reasoning processes of these MLLMs impact model performance and reliability in clinical tasks. This paper evaluates the active “thinking mode” capabilities of two leading MLLMs, Seed1.5-VL and Gemini-2.5-Flash, for medical applications. We assessed their performance on four visual medical tasks using VQA-RAD and ROCOv2 datasets. Our findings reveal that the improvement from activating the thinking mode remains marginal compared to the standard non-thinking mode for the majority of the tasks. Their performance on complex medical tasks such as open-ended VQA and medical image interpretation remains suboptimal, highlighting the need for domain-specific medical data and more advanced methods for medical knowledge integration.
zh
[NLP-27] How to Evaluate Speech Translation with Source-Aware Neural MT Metrics
【速读】: 该论文旨在解决语音到文本翻译(Speech-to-Text Translation, ST)系统自动评估中依赖参考译文导致的信息损失问题,即传统基于参考译文的评估方法无法利用源端输入(音频)中的有用信息。其核心解决方案是引入源感知(source-aware)评价指标,并针对真实场景中缺乏源端文本转录的情况,提出两种生成文本代理的方法:一是使用自动语音识别(ASR)生成的转录文本,二是通过参考译文反向翻译(back-translation)获得的伪源文本;同时设计了一种新颖的两步跨语言重新分段算法,以缓解合成源与参考译文之间的对齐偏差问题。实验表明,在词错误率低于20%时,ASR转录比反向翻译更可靠,而后者在计算成本更低的前提下仍具有效性,且所提重分段算法显著提升了源感知机器翻译(MT)指标在ST场景下的鲁棒性与适用性。
链接: https://arxiv.org/abs/2511.03295
作者: Mauro Cettolo,Marco Gaido,Matteo Negri,Sara Papi,Luisa Bentivogli
机构: Fondazione Bruno Kessler(布鲁诺·凯斯勒基金会)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:
Abstract:Automatic evaluation of speech-to-text translation (ST) systems is typically performed by comparing translation hypotheses with one or more reference translations. While effective to some extent, this approach inherits the limitation of reference-based evaluation that ignores valuable information from the source input. In machine translation (MT), recent progress has shown that neural metrics incorporating the source text achieve stronger correlation with human judgments. Extending this idea to ST, however, is not trivial because the source is audio rather than text, and reliable transcripts or alignments between source and references are often unavailable. In this work, we conduct the first systematic study of source-aware metrics for ST, with a particular focus on real-world operating conditions where source transcripts are not available. We explore two complementary strategies for generating textual proxies of the input audio, automatic speech recognition (ASR) transcripts, and back-translations of the reference translation, and introduce a novel two-step cross-lingual re-segmentation algorithm to address the alignment mismatch between synthetic sources and reference translations. Our experiments, carried out on two ST benchmarks covering 79 language pairs and six ST systems with diverse architectures and performance levels, show that ASR transcripts constitute a more reliable synthetic source than back-translations when word error rate is below 20%, while back-translations always represent a computationally cheaper but still effective alternative. Furthermore, our cross-lingual re-segmentation algorithm enables robust use of source-aware MT metrics in ST evaluation, paving the way toward more accurate and principled evaluation methodologies for speech translation.
zh
[NLP-28] Let the Bees Find the Weak Spots: A Path Planning Perspective on Multi-Turn Jailbreak Attacks against LLM s
【速读】: 该论文旨在解决大型语言模型(Large Language Models, LLMs)在红队测试(red teaming evaluation)中面临的两个核心问题:一是现有多轮越狱攻击(multi-turn jailbreaks)方法缺乏对成功对话轨迹的系统性探索,二是攻击过程存在较高的查询开销(query overhead)。为应对上述挑战,论文提出了一种基于动态加权图拓扑结构的理论模型,将多轮攻击过程建模为路径规划问题,并进一步设计了ABC算法——一种增强型人工蜂群算法(Artificial Bee Colony algorithm),其通过雇佣蜂、观察蜂和侦察蜂的协同搜索机制,显著提升了最优攻击路径的搜索效率。该方案在三个开源和两个专有语言模型上验证有效,平均仅需26次查询即可实现超过90%的攻击成功率(最高达98%),大幅降低了红队测试的资源消耗。
链接: https://arxiv.org/abs/2511.03271
作者: Yize Liu,Yunyun Hou,Aina Sui
机构: 未知
类目: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
备注:
Abstract:Large Language Models (LLMs) have been widely deployed across various applications, yet their potential security and ethical risks have raised increasing concerns. Existing research employs red teaming evaluations, utilizing multi-turn jailbreaks to identify potential vulnerabilities in LLMs. However, these approaches often lack exploration of successful dialogue trajectories within the attack space, and they tend to overlook the considerable overhead associated with the attack process. To address these limitations, this paper first introduces a theoretical model based on dynamically weighted graph topology, abstracting the multi-turn attack process as a path planning problem. Based on this framework, we propose ABC, an enhanced Artificial Bee Colony algorithm for multi-turn jailbreaks, featuring a collaborative search mechanism with employed, onlooker, and scout bees. This algorithm significantly improves the efficiency of optimal attack path search while substantially reducing the average number of queries required. Empirical evaluations on three open-source and two proprietary language models demonstrate the effectiveness of our approach, achieving attack success rates above 90% across the board, with a peak of 98% on GPT-3.5-Turbo, and outperforming existing baselines. Furthermore, it achieves comparable success with only 26 queries on average, significantly reducing red teaming overhead and highlighting its superior efficiency.
zh
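将多轮越狱过程抽象为加权图上的路径规划后,人工蜂群(ABC)类算法的搜索骨架可以示意如下。图结构、变异方式与参数均为假设的简化版本,仅演示雇佣蜂/观察蜂/侦察蜂的分工,并非论文的ABC实现:

```python
import random

def path_cost(graph, path):
    """路径总代价(边权之和)。"""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

def random_path(graph, start, goal, max_steps=20):
    """随机游走生成一条start到goal的路径;未到达则返回None。"""
    path = [start]
    while path[-1] != goal and len(path) < max_steps:
        path.append(random.choice(list(graph[path[-1]])))
    return path if path[-1] == goal else None

def abc_search(graph, start, goal, n_sources=8, iters=60, limit=5, seed=0):
    random.seed(seed)

    def sample():
        p = None
        while p is None:
            p = random_path(graph, start, goal)
        return p

    def mutate(p):
        # 邻域解:在路径中随机选一个切点,重新生成其后缀
        cut = random.randrange(len(p))
        tail = random_path(graph, p[cut], goal)
        return p[:cut] + tail if tail else p

    sources = [sample() for _ in range(n_sources)]   # 蜜源 = 候选攻击路径
    stale = [0] * n_sources
    for _ in range(iters):
        for i in range(n_sources):                   # 雇佣蜂:逐蜜源局部搜索
            cand = mutate(sources[i])
            if path_cost(graph, cand) < path_cost(graph, sources[i]):
                sources[i], stale[i] = cand, 0
            else:
                stale[i] += 1
        # 观察蜂:按适应度(成本倒数)加权选择蜜源,再做一次邻域搜索
        weights = [1.0 / path_cost(graph, p) for p in sources]
        i = random.choices(range(n_sources), weights=weights)[0]
        cand = mutate(sources[i])
        if path_cost(graph, cand) < path_cost(graph, sources[i]):
            sources[i], stale[i] = cand, 0
        for i in range(n_sources):                   # 侦察蜂:重置停滞过久的蜜源
            if stale[i] > limit:
                sources[i], stale[i] = sample(), 0
    return min(sources, key=lambda p: path_cost(graph, p))

# 假设的加权图:节点为对话状态,边权近似"查询开销"
G = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}, "D": {}}
best = abc_search(G, "A", "D")
print(best, path_cost(G, best))
```

在论文设定中,边权是动态的且需通过实际查询目标模型来评估,这正是其强调降低平均查询次数的原因;上面的静态小图仅用于演示三类蜂的协同搜索结构。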
[NLP-29] SCALE: Upscaled Continual Learning of Large Language Models
【速读】: 该论文旨在解决大语言模型在持续预训练(continual pre-training)过程中面临的灾难性遗忘问题,即模型在学习新知识时会严重损害原有任务的性能。传统方法主要依赖于增加参数量(scaling parameters),但研究表明,单纯扩大模型规模已难以有效提升稳定性与适应性的平衡。解决方案的关键在于提出一种名为SCALE的宽度扩展架构,其核心思想是在保持预训练参数冻结的前提下,通过轻量级结构扩展插入线性模块,从而在不破坏原始残差和注意力拓扑结构的基础上增强模型容量。SCALE遵循两个原则:一是“持久保留”(Persistent Preservation),通过保留导向的初始化和权重冻结确保基线模型行为不变;二是“协同适应”(Collaborative Adaptation),选择性地训练部分扩展组件以最小干扰获取新知识。实验表明,SCALE在合成生物数据集和韩语语料上的持续预训练中均显著缓解了遗忘现象,并实现了更优的稳定-可塑性权衡。
链接: https://arxiv.org/abs/2511.03270
作者: Jin-woo Lee,Junhwa Choi,Bongkyu Hwang,Jinho Choo,Bogun Kim,JeongSeon Yi,Joonseok Lee,DongYoung Jung,Jaeseon Park,Kyoungwon Park,Suk-hoon Jung
机构: Samsung(三星)
类目: Computation and Language (cs.CL)
备注:
Abstract:We revisit continual pre-training for large language models and argue that progress now depends more on scaling the right structure than on scaling parameters alone. We introduce SCALE, a width upscaling architecture that inserts lightweight expansion into linear modules while freezing all pre-trained parameters. This preserves the residual and attention topologies and increases capacity without perturbing the base model’s original functionality. SCALE is guided by two principles: Persistent Preservation, which maintains the base model’s behavior via preservation-oriented initialization and freezing of the pre-trained weights, and Collaborative Adaptation, which selectively trains a subset of expansion components to acquire new knowledge with minimal interference. We instantiate these ideas as SCALE-Preserve (preservation-first), SCALE-Adapt (adaptation-first), and SCALE-Route, an optional routing extension that performs token-level routing between preservation and adaptation heads. On a controlled synthetic biography benchmark, SCALE mitigates the severe forgetting observed with depth expansion while still acquiring new knowledge. In continual pre-training on a Korean corpus, SCALE variants achieve less forgetting on English evaluations and competitive gains on Korean benchmarks, with these variants offering the best overall stability-plasticity trade-off. Accompanying analysis clarifies when preservation provably holds and why the interplay between preservation and adaptation stabilizes optimization compared to standard continual learning setups.
zh
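SCALE的"持久保留"依赖两点:冻结预训练权重,以及对扩展部分做保留导向的初始化。用一个不依赖深度学习框架的线性层即可验证其中最直观的版本——零初始化的扩展权重保证原有输出维度在扩展前后完全一致(以下矩阵与维度均为假设的玩具设置,论文的具体初始化方案可能更复杂):

```python
def linear(W, x):
    """朴素矩阵-向量乘:y_i = sum_j W[i][j] * x[j]。"""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def scale_expand(W, extra_out):
    """宽度扩展:冻结原权重W,追加extra_out行零初始化的可训练扩展权重。
    零初始化保证扩展前后原有输出维度的结果完全一致(保留优先原则)。"""
    W_new = [[0.0] * len(W[0]) for _ in range(extra_out)]
    return W, W_new   # W保持冻结,仅训练W_new

# 示意:一个2x3的"预训练"线性层扩展2个输出维度
W = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
x = [1.0, 2.0, 3.0]
W_frozen, W_new = scale_expand(W, extra_out=2)
y = linear(W_frozen, x) + linear(W_new, x)   # 拼接原输出与扩展输出
print(y)   # 原有两维数值不变,新增两维初始为0
```

后续持续预训练只更新 W_new(即"协同适应"部分),因此基座模型的行为在训练起点被严格保留,遗忘只可能来自扩展分量的学习过程本身。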
[NLP-30] Comparing the Performance of LLM s in RAG -based Question-Answering: A Case Study in Computer Science Literature
【速读】: 该论文旨在解决如何通过检索增强生成(Retrieval Augmented Generation, RAG)技术提升不同大型语言模型(Large Language Models, LLMs)在计算机科学文献领域问答任务中的性能表现问题,特别是针对幻觉减少与答案准确性优化。其解决方案的关键在于系统性比较四种开源LLMs(Mistral-7b-instruct、LLaMa2-7b-chat、Falcon-7b-instruct和Orca-mini-v3-7b)与OpenAI的GPT-3.5在RAG支持下的问答能力,采用准确率、精确度、人类专家排序、Google Gemini排序及余弦相似度等多维指标进行评估,结果表明:Mistral-7b-instruct在RAG加持下表现最优,且开源模型在良好基础设施支撑下可媲美甚至超越商业模型。
链接: https://arxiv.org/abs/2511.03261
作者: Ranul Dayarathne,Uvini Ranaweera,Upeksha Ganegoda
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: 18 pages, 4 figures, 5 tables, presented at the 5th International Conference on Artificial Intelligence in Education Technology
Abstract:Retrieval Augmented Generation (RAG) is emerging as a powerful technique to enhance the capabilities of Generative AI models by reducing hallucination. Thus, the increasing prominence of RAG alongside Large Language Models (LLMs) has sparked interest in comparing the performance of different LLMs in question-answering (QA) in diverse domains. This study compares the performance of four open-source LLMs, Mistral-7b-instruct, LLaMa2-7b-chat, Falcon-7b-instruct and Orca-mini-v3-7b, and OpenAI’s trending GPT-3.5 over QA tasks within the computer science literature leveraging RAG support. Evaluation metrics employed in the study include accuracy and precision for binary questions and ranking by a human expert, ranking by Google’s AI model Gemini, alongside cosine similarity for long-answer questions. GPT-3.5, when paired with RAG, effectively answers binary and long-answer questions, reaffirming its status as an advanced LLM. Regarding open-source LLMs, Mistral AI’s Mistral-7b-instruct paired with RAG surpasses the rest in answering both binary and long-answer questions. However, among the open-source LLMs, Orca-mini-v3-7b reports the shortest average latency in generating responses, whereas LLaMa2-7b-chat by Meta reports the highest average latency. This research underscores the fact that open-source LLMs, too, can go hand in hand with proprietary models like GPT-3.5 with better infrastructure.
zh
[NLP-31] IndicSuperTokenizer: An Optimized Tokenizer for Indic Multilingual LLM s
【速读】: 该论文旨在解决多语言大语言模型(Multilingual Large Language Models, LLMs)中词元化(tokenization)效率与语言学合理性不足的问题,尤其针对印度语系(Indic)语言的复杂书写系统和形态变化。其解决方案的关键在于提出IndicSuperTokenizer,该方法融合了子词(subword)与多词(multi-word)词元化策略,并引入语言特定的预分词(language-specific pre-tokenization),从而生成更符合语言学规律的词元,显著提升词元化“肥沃度”(fertility score)。实验表明,该方案在英语、22种印度语言及代码数据上平均肥沃度较LLaMA4提升39.5%,推理吞吐量提升44%,同时保持英文和印地语基准测试性能相当。
链接: https://arxiv.org/abs/2511.03237
作者: Souvik Rana,Arul Menezes,Ashish Kulkarni,Chandra Khatri,Shubham Agarwal
机构: Krutrim AI(克鲁特林人工智能)
类目: Computation and Language (cs.CL)
备注:
Abstract:Tokenizers play a crucial role in determining the performance, training efficiency, and the inference cost of Large Language Models (LLMs). Designing effective tokenizers for multilingual LLMs is particularly challenging due to diverse scripts and rich morphological variation. While subword methods such as Byte Pair Encoding (BPE) are widely adopted, their effectiveness in multilingual settings remains underexplored. We present IndicSuperTokenizer, a tokenizer for Indic multilingual LLMs, that combines both subword and multi-word tokenization, along with language-specific pre-tokenization, leading to more linguistically aligned tokens and achieving a new state-of-the-art in fertility score. Evaluated across English, 22 Indian languages and code data, our tokenizer improves the average fertility score by 39.5% over LLaMA4 and by 18% over Sutra (the current best). This translates to 44% improvement in inference throughput over LLaMA4 while maintaining comparable performance on English and Indic benchmarks. We also present detailed ablations across tokenizer training data size, vocabulary size, merging techniques, and pre-tokenization strategies, demonstrating the robustness of our design choices.
zh
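摘要中反复出现的"fertility score"(肥沃度)通常定义为平均每个词被切分出的token数,数值越低代表切分越紧凑、推理吞吐越高。下面用一个假设的定长切分器演示该指标的计算方式(切分器与词表均为演示用途,并非IndicSuperTokenizer本身):

```python
def fertility(tokenize, words):
    """fertility = 平均每个词被切分成的token数,越低通常表示切分越高效。"""
    return sum(len(tokenize(w)) for w in words) / len(words)

# 假设的简单切分器,仅作演示:按固定长度4切块
chunk4 = lambda w: [w[i:i + 4] for i in range(0, len(w), 4)]
words = ["tokenization", "नमस्ते", "भारत"]
print(round(fertility(chunk4, words), 2))  # → 2.0
```

对比两个分词器时,在同一语料上分别计算fertility即可量化"39.5%改进"这类指标;印度语系文字的组合字符(如元音符号)正是导致通用分词器fertility偏高的典型原因之一。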
[NLP-32] Beyond Ranked Lists: The SARAL Framework for Cross-Lingual Document Set Retrieval
【速读】: 该论文旨在解决跨语言信息检索(Cross-lingual Information Retrieval, CLIR)中的核心挑战,即如何在目标语言中高效准确地检索与源语言查询相关的文档集合,而不仅限于生成排序的文档列表。解决方案的关键在于提出一种新型方法,专注于构建查询相关的文档集(query-relevant document set),而非仅输出排名列表,从而提升检索结果的相关性和实用性。在MATERIAL项目第三阶段评估中,该方法在三种不同语言(波斯语、哈萨克语和格鲁吉亚语)的六种评估条件下,有五种表现优于其他团队,验证了其有效性。
链接: https://arxiv.org/abs/2511.03228
作者: Shantanu Agarwal,Joel Barry,Elizabeth Boschee,Scott Miller
机构: Information Sciences Institute (信息科学研究所); University of Southern California (南加州大学)
类目: Computation and Language (cs.CL); Information Retrieval (cs.IR)
备注:
Abstract:Machine Translation for English Retrieval of Information in Any Language (MATERIAL) is an IARPA initiative targeted to advance the state of cross-lingual information retrieval (CLIR). This report provides a detailed description of Information Sciences Institute's (ISI's) Summarization and domain-Adaptive Retrieval Across Language's (SARAL's) effort for MATERIAL. Specifically, we outline our team's novel approach to handle CLIR with emphasis in developing an approach amenable to retrieve a query-relevant document *set*, and not just a ranked document-list. In MATERIAL's Phase-3 evaluations, SARAL exceeded the performance of other teams in five out of six evaluation conditions spanning three different languages (Farsi, Kazakh, and Georgian).
zh
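从排序列表到文档集合,最直接的转换方式之一是按相关性分数设阈值截断。下面仅作一般性示意(并非 SARAL 实际采用的集合构造方法,示例分数为虚构):

```python
def retrieve_set(scored_docs, threshold):
    """scored_docs: (doc_id, score) 列表;返回得分达到阈值的文档集合。"""
    return {doc_id for doc_id, score in scored_docs if score >= threshold}

ranked = [("d1", 0.92), ("d2", 0.75), ("d3", 0.40), ("d4", 0.10)]
relevant = retrieve_set(ranked, threshold=0.5)
```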
[NLP-33] Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification EMNLP
【速读】: 该论文旨在解决大型语言模型(Large Language Models, LLMs)在生成文本时缺乏可靠事实依据的问题,同时克服基于知识图谱(Knowledge Graph, KG)的核查工具在覆盖范围或响应延迟上的局限性。其解决方案的关键在于构建一个三阶段的混合式事实核查流水线:首先通过知识图谱(如DBpedia)进行快速单跳检索以获取结构化证据;其次利用LLM基于任务特定提示词进行分类判断,输出具有内部规则逻辑的推理结果;最后在知识图谱覆盖不足时调用网络搜索代理作为补充,从而实现高精度、可解释且具备鲁棒性的事实核查能力。该方法在FEVER基准的支持/反驳子集上达到0.93的F1分数,且能有效识别原标注为“信息不足”(Not Enough Information, NEI)的陈述中隐藏的有效证据。
链接: https://arxiv.org/abs/2511.03217
作者: Shaghayegh Kolli,Richard Rosenbaum,Timo Cavelius,Lasse Strothe,Andrii Lata,Jana Diesner
机构: Technical University Munich (慕尼黑工业大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Information Retrieval (cs.IR)
备注: Paper has been accepted at 9th wiNLP workshop at EMNLP
Abstract:Large language models (LLMs) excel in generating fluent utterances but can lack reliable grounding in verified information. At the same time, knowledge-graph-based fact-checkers deliver precise and interpretable evidence, yet suffer from limited coverage or latency. By integrating LLMs with knowledge graphs and real-time search agents, we introduce a hybrid fact-checking approach that leverages the individual strengths of each component. Our system comprises three autonomous steps: 1) a Knowledge Graph (KG) Retrieval for rapid one-hop lookups in DBpedia, 2) an LM-based classification guided by a task-specific labeling prompt, producing outputs with internal rule-based logic, and 3) a Web Search Agent invoked only when KG coverage is insufficient. Our pipeline achieves an F1 score of 0.93 on the FEVER benchmark on the Supported/Refuted split without task-specific fine-tuning. To address Not Enough Information (NEI) cases, we conduct a targeted reannotation study showing that our approach frequently uncovers valid evidence for claims originally labeled as NEI, as confirmed by both expert annotators and LLM reviewers. With this paper, we present a modular, open-source fact-checking pipeline with fallback strategies and generalization across datasets.
zh
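摘要描述的三阶段回退流程(KG 一跳检索 → LLM 分类 → 仅在 KG 覆盖不足时调用网络搜索)可以用如下骨架示意,其中 `toy_kg`、`toy_llm`、`toy_search` 均为假设的玩具组件:

```python
def kg_lookup(claim, kg):
    """一跳检索:返回 claim 中出现的实体所关联的事实。"""
    return [fact for entity, fact in kg if entity in claim]

def check_claim(claim, kg, classify_with_llm, web_search):
    evidence = kg_lookup(claim, kg)
    if not evidence:                      # KG 覆盖不足时才调用搜索代理
        evidence = web_search(claim)
    return classify_with_llm(claim, evidence)

toy_kg = [("Paris", "Paris is the capital of France")]
toy_llm = lambda claim, evidence: "Supported" if evidence else "NEI"
toy_search = lambda claim: []             # 搜索代理占位,这里恒返回空

label = check_claim("Paris is in France", toy_kg, toy_llm, toy_search)
```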
[NLP-34] LGM: Enhancing Large Language Models with Conceptual Meta-Relations and Iterative Retrieval
【速读】: 该论文旨在解决大语言模型(Large Language Models, LLMs)在处理用户指令时,因术语模糊或概念错位而导致理解偏差的问题。其解决方案的关键在于提出语言图模型(Language Graph Model, LGM),通过从自然语言中提取元关系(meta-relations),包括继承(inheritance)、别名(alias)和组合(composition),并引入反射机制对这些关系进行验证;同时,利用概念迭代检索算法动态地将相关关系及其描述注入LLM,从而增强其概念理解能力与响应准确性。该方法不依赖扩展上下文窗口,可支持任意长度文本的无截断处理,显著优于传统检索增强生成(Retrieval-Augmented Generation, RAG)基线。
链接: https://arxiv.org/abs/2511.03214
作者: Wenchang Lei,Ping Zou,Yue Wang,Feng Sun,Lei Zhao
机构: Philisense(菲力森)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: 30 pages, 5 figures
Abstract:Large language models (LLMs) exhibit strong semantic understanding, yet struggle when user instructions involve ambiguous or conceptually misaligned terms. We propose the Language Graph Model (LGM) to enhance conceptual clarity by extracting meta-relations-inheritance, alias, and composition-from natural language. The model further employs a reflection mechanism to validate these meta-relations. Leveraging a Concept Iterative Retrieval Algorithm, these relations and related descriptions are dynamically supplied to the LLM, improving its ability to interpret concepts and generate accurate responses. Unlike conventional Retrieval-Augmented Generation (RAG) approaches that rely on extended context windows, our method enables large language models to process texts of any length without the need for truncation. Experiments on standard benchmarks demonstrate that the LGM consistently outperforms existing RAG baselines.
zh
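LGM 的核心是围绕继承、别名、组合三类元关系做概念的迭代检索。下面用一次广度优先扩展作简化示意(并非论文中 Concept Iterative Retrieval Algorithm 的原实现,三元组数据为虚构):

```python
from collections import deque

def retrieve_concepts(seed, relations, max_hops=2):
    """在 (head, relation, tail) 三元组上做广度优先扩展。"""
    seen, frontier = {seed}, deque([(seed, 0)])
    while frontier:
        concept, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for head, rel, tail in relations:
            if head == concept and tail not in seen:
                seen.add(tail)
                frontier.append((tail, depth + 1))
    return seen

triples = [
    ("sedan", "inheritance", "car"),        # 继承
    ("car", "alias", "automobile"),         # 别名
    ("car", "composition", "engine"),       # 组合
]
related = retrieve_concepts("sedan", triples)
```

检索到的相关概念及其描述即可动态注入 LLM 的上下文。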
[NLP-35] BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture
【速读】: 该论文旨在解决多语言大语言模型(Multilingual Large Language Models, LLMs)在南亚地区,尤其是孟加拉语(Bengali)语境下伦理对齐不足的问题。现有伦理评测基准大多以英语为主、受西方伦理框架主导,忽视了本地文化细微差别,导致模型在实际部署中可能缺乏道德合理性与文化适配性。解决方案的关键在于构建首个针对孟加拉语及其社会文化背景的大规模伦理评测基准——BengaliMoralBench,涵盖五大道德领域共50个文化相关子主题,并通过母语者共识标注三种伦理视角(美德伦理、常识伦理和正义伦理),从而实现对主流多语言LLMs(如Llama、Gemma、Qwen和DeepSeek)的零样本系统评估,揭示其在文化根基、常识推理和道德公平方面的普遍缺陷,为低资源多语言环境中负责任的AI本地化提供可衡量、可改进的基础。
链接: https://arxiv.org/abs/2511.03180
作者: Shahriyar Zaman Ridoy,Azmine Toushik Wasi,Koushik Ahamed Tonmoy
机构: 未知
类目: Computation and Language (cs.CL)
备注: This manuscript is a preprint currently under review
Abstract:As multilingual Large Language Models (LLMs) gain traction across South Asia, their alignment with local ethical norms, particularly for Bengali, which is spoken by over 285 million people and ranked 6th globally, remains underexplored. Existing ethics benchmarks are largely English-centric and shaped by Western frameworks, overlooking cultural nuances critical for real-world deployment. To address this, we introduce BengaliMoralBench, the first large-scale ethics benchmark for the Bengali language and socio-cultural contexts. It covers five moral domains, Daily Activities, Habits, Parenting, Family Relationships, and Religious Activities, subdivided into 50 culturally relevant subtopics. Each scenario is annotated via native-speaker consensus using three ethical lenses: Virtue, Commonsense, and Justice ethics. We conduct systematic zero-shot evaluation of prominent multilingual LLMs, including Llama, Gemma, Qwen, and DeepSeek, using a unified prompting protocol and standard metrics. Performance varies widely (50-91% accuracy), with qualitative analysis revealing consistent weaknesses in cultural grounding, commonsense reasoning, and moral fairness. BengaliMoralBench provides a foundation for responsible localization, enabling culturally aligned evaluation and supporting the deployment of ethically robust AI in diverse, low-resource multilingual settings such as Bangladesh.
zh
[NLP-36] Measuring Aleatoric and Epistemic Uncertainty in LLMs: Empirical Evaluation on ID and OOD QA Tasks KDD'24
【速读】: 该论文旨在解决大语言模型(Large Language Models, LLMs)输出可信度保障问题,核心挑战在于如何有效估计生成结果中的不确定性,尤其是区分数据固有噪声(aleatoric uncertainty)与模型知识不足导致的不确定性(epistemic uncertainty)。解决方案的关键在于系统性地评估十二种不同的不确定性估计(Uncertainty Estimation, UE)方法在问答任务中对分布内(in-distribution, ID)和分布外(out-of-distribution, OOD)样本的表现,并结合LLMScore等生成质量指标进行多维验证。研究发现,基于信息论的方法在ID场景下表现优异,因其与模型对数据的理解高度一致;而密度-based方法和P(True)指标在OOD场景中更可靠,能更好捕捉epistemic uncertainty;此外,语义一致性方法在不同数据集和评估指标下均具鲁棒性,表明其作为通用不确定性代理的有效性。
链接: https://arxiv.org/abs/2511.03166
作者: Kevin Wang,Subre Abdoul Moktar,Jia Li,Kangshuo Li,Feng Chen
机构: The University of Texas at Dallas (德克萨斯大学达拉斯分校)
类目: Computation and Language (cs.CL)
备注: Accepted by UDM-KDD’24
Abstract:Large Language Models (LLMs) have become increasingly pervasive, finding applications across many industries and disciplines. Ensuring the trustworthiness of LLM outputs is paramount, where Uncertainty Estimation (UE) plays a key role. In this work, a comprehensive empirical study is conducted to examine the robustness and effectiveness of diverse UE measures regarding aleatoric and epistemic uncertainty in LLMs. It involves twelve different UE methods and four generation quality metrics including LLMScore from LLM criticizers to evaluate the uncertainty of LLM-generated answers in Question-Answering (QA) tasks on both in-distribution (ID) and out-of-distribution (OOD) datasets. Our analysis reveals that information-based methods, which leverage token and sequence probabilities, perform exceptionally well in ID settings due to their alignment with the model’s understanding of the data. Conversely, density-based methods and the P(True) metric exhibit superior performance in OOD contexts, highlighting their effectiveness in capturing the model’s epistemic uncertainty. Semantic consistency methods, which assess variability in generated answers, show reliable performance across different datasets and generation metrics. These methods generally perform well but may not be optimal for every situation.
zh
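摘要比较的几类不确定性度量中,信息型方法依赖词元概率,一致性方法依赖多次采样答案之间的分歧程度。以下为这两类度量的玩具示意(并非论文所用的十二种方法的具体实现,数值为虚构):

```python
import math

def mean_neg_log_prob(token_probs):
    """信息型不确定性:生成词元的平均负对数概率(surprisal 均值)。"""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def disagreement(sampled_answers):
    """一致性型不确定性:1 减去众数答案的出现频率。"""
    top = max(set(sampled_answers), key=sampled_answers.count)
    return 1 - sampled_answers.count(top) / len(sampled_answers)

confident = mean_neg_log_prob([0.9, 0.95, 0.9])   # 高置信生成
uncertain = mean_neg_log_prob([0.4, 0.5, 0.3])    # 低置信生成
spread = disagreement(["Paris", "Paris", "Lyon", "Paris"])
```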
[NLP-37] Who Sees the Risk? Stakeholder Conflicts and Explanatory Policies in LLM-based Risk Assessment
【速读】: 该论文旨在解决人工智能(Artificial Intelligence, AI)系统在实际部署中,不同利益相关方(stakeholders)对风险认知存在差异的问题,从而影响AI系统的负责任治理与决策。其解决方案的关键在于构建一个基于大语言模型(Large Language Models, LLMs)的“利益相关方 grounded 风险评估框架”,利用LLMs作为裁判预测并解释风险,并结合Risk Atlas Nexus与GloVe解释方法生成可解释、面向特定利益相关方的风险政策,揭示各方在相同风险上的共识与分歧。该方法通过交互式可视化工具进一步阐明冲突产生的机制,提升了风险评估过程的透明度和人本导向的AI治理能力。
链接: https://arxiv.org/abs/2511.03152
作者: Srishti Yadav,Jasmina Gajcin,Erik Miehling,Elizabeth Daly
机构: IBM Research(IBM 研究院)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:
Abstract:Understanding how different stakeholders perceive risks in AI systems is essential for their responsible deployment. This paper presents a framework for stakeholder-grounded risk assessment by using LLMs, acting as judges to predict and explain risks. Using the Risk Atlas Nexus and GloVE explanation method, our framework generates stakeholder-specific, interpretable policies that show how different stakeholders agree or disagree about the same risks. We demonstrate our method on three real-world AI use cases: medical AI, autonomous vehicles, and fraud detection. We further propose an interactive visualization that reveals how and why conflicts emerge across stakeholder perspectives, enhancing transparency in conflict reasoning. Our results show that stakeholder perspectives significantly influence risk perception and conflict patterns. Our work emphasizes the importance of these stakeholder-aware explanations needed to make LLM-based evaluations more transparent, interpretable, and aligned with human-centered AI governance goals.
zh
[NLP-38] MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity
【速读】: 该论文旨在解决当前多模态大语言模型(Multimodal Large Language Models, MLLMs)在评估其认知能力时存在的不足,特别是现有基准测试要么过度侧重文本推理,要么未能系统性地捕捉以视觉为核心的认知行为,导致对MLLMs的认知能力评估不充分。解决方案的关键在于提出一个名为MME-CC(Multi-Modal Evaluation benchmark of Cognitive Capacity)的视觉基础评测基准,该基准将11个代表性推理任务归纳为三大类视觉信息处理能力:空间推理、几何推理和基于知识的推理,并提供细粒度分析以全面评估MLLMs在这些维度上的认知表现。通过该基准,研究团队对16个主流MLLMs进行了系统实验,揭示了当前闭源模型整体领先但空间与几何推理仍较弱等关键发现,为未来模型设计与评估提供了新的方向。
链接: https://arxiv.org/abs/2511.03146
作者: Kaiyuan Zhang,Chenghao Yang,Zhoufutu Wen,Sihang Yuan,Qiuyue Wang,Chaoyi Huang,Guosheng Zhu,He Wang,Huawenyu Lu,Jianing Wen,Jianpeng Jiao,Lishu Luo,Longxiang Liu,Sijin Wu,Xiaolei Zhu,Xuanliang Zhang,Ge Zhang,Yi Lin,Guang Shi,Chaoyou Fu,Wenhao Huang
机构: 未知
类目: Computation and Language (cs.CL)
备注:
Abstract:As reasoning models scale rapidly, the essential role of multimodality in human cognition has come into sharp relief, driving a growing need to probe vision-centric cognitive behaviors. Yet, existing multimodal benchmarks either overemphasize textual reasoning or fall short of systematically capturing vision-centric cognitive behaviors, leaving the cognitive capacity of MLLMs insufficiently assessed. To address this limitation, we introduce MME-CC (Multi-Modal Evaluation benchmark of Cognitive Capacity), a vision-grounded benchmark that organizes 11 representative reasoning tasks into three fundamental categories of visual information: spatial, geometric, and knowledge-based reasoning, and provides fine-grained analyses of MLLMs’ cognitive capacity across these dimensions. Based on MME-CC, we conduct extensive experiments over 16 representative MLLMs. Our study reveals that closed-source models currently lead overall (e.g., 42.66 for Gemini-2.5-Pro vs. 30.45 for GLM-4.5V), while spatial and geometric reasoning remain broadly weak (less than or equal to 30%). We further identify common error patterns, including orientation mistakes, fragile cross-view identity persistence, and poor adherence to counterfactual instructions, and observe that Chain-of-Thought typically follows a three-stage process (extract - reason - verify) with heavy reliance on visual extraction. We hope this work catalyzes a shift toward treating the cognitive capacity of MLLMs as central to both evaluation and model design.
zh
[NLP-39] From Measurement to Expertise: Empathetic Expert Adapters for Context-Based Empathy in Conversational AI Agents
【速读】: 该论文旨在解决当前对话式人工智能(Conversational AI)中情感共鸣(Empathy)普遍缺乏情境特异性的问题,即现有大语言模型(Large Language Models, LLMs)生成的情感回应往往泛化、不贴合具体任务和用户语境,导致用户感知到的情感支持与预期存在显著差距。解决方案的关键在于提出一种新型框架,通过构建合成多轮对话生成管道并基于上下文引导响应以匹配预定义的情感模式,同时训练针对特定任务的“情感专家适配器”(Empathetic Expert Adapters),使其能够根据识别出的任务类型动态调整情感强度。实证结果表明,该方法将感知与期望情感之间的差距减少了72.66%,且在长对话中保持情感一致性优于传统系统提示策略。
链接: https://arxiv.org/abs/2511.03143
作者: Erfan Shayegani,Jina Suh,Andy Wilson,Nagu Rangan,Javier Hernandez
机构: Microsoft Research (微软研究院); University of California, Riverside (加州大学河滨分校)
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
备注:
Abstract:Empathy is a critical factor in fostering positive user experiences in conversational AI. While models can display empathy, it is often generic rather than tailored to specific tasks and contexts. In this work, we introduce a novel framework for developing and evaluating context-specific empathetic large language models (LLMs). We first analyze a real-world conversational dataset consisting of 672 multi-turn conversations across 8 tasks, revealing significant differences in terms of expected and experienced empathy before and after the conversations, respectively. To help minimize this gap, we develop a synthetic multi-turn conversational generation pipeline and steer responses toward our defined empathy patterns based on the context that more closely matches users’ expectations. We then train empathetic expert adapters for context-specific empathy that specialize in varying empathy levels based on the recognized task. Our empirical results demonstrate a significant gap reduction of 72.66% between perceived and desired empathy with scores increasing by an average factor of 2.43 as measured by our metrics and reward models. Additionally, our trained empathetic expert adapters demonstrate superior effectiveness in preserving empathy patterns throughout conversation turns, outperforming system prompts, which tend to dramatically diminish in impact as conversations lengthen.
zh
[NLP-40] From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation EMNLP2025
【速读】: 该论文旨在解决大语言模型(Large Language Models, LLMs)在敏感任务中对对抗输入缺乏鲁棒性的问题,即如何系统性地生成能够欺骗LLM的动态且自适应的对抗样本。解决方案的关键在于提出两种创新的攻击框架——静态探测器(Static Deceptor, StaDec)和动态探测器(Dynamic Deceptor, DyDec),它们通过深入理解LLM的行为机制来生成语义相似但具有误导性的自然文本对抗样本,并借助由LLM驱动的自动化流水线实现无需外部启发式规则的攻击生成过程,从而具备跨模型强迁移性并能随LLM演进而持续进化。
链接: https://arxiv.org/abs/2511.03128
作者: Najrin Sultana,Md Rafi Ur Rashid,Kang Gu,Shagufta Mehnaz
机构: The Pennsylvania State University (宾夕法尼亚州立大学); Dartmouth College (达特茅斯学院)
类目: Machine Learning (cs.LG); Computation and Language (cs.CL)
备注: Findings of the Association for Computational Linguistics: EMNLP 2025 (camera-ready)
Abstract:LLMs can provide substantial zero-shot performance on diverse tasks using a simple task prompt, eliminating the need for training or fine-tuning. However, when applying these models to sensitive tasks, it is crucial to thoroughly assess their robustness against adversarial inputs. In this work, we introduce Static Deceptor (StaDec) and Dynamic Deceptor (DyDec), two innovative attack frameworks designed to systematically generate dynamic and adaptive adversarial examples by leveraging the understanding of the LLMs. We produce subtle and natural-looking adversarial inputs that preserve semantic similarity to the original text while effectively deceiving the target LLM. By utilizing an automated, LLM-driven pipeline, we eliminate the dependence on external heuristics. Our attacks evolve with the advancements in LLMs and demonstrate strong transferability across models unknown to the attacker. Overall, this work provides a systematic approach for the self-assessment of an LLM’s robustness. We release our code and data at this https URL.
zh
[NLP-41] Control Barrier Function for Aligning Large Language Models
【速读】: 该论文旨在解决大语言模型(Large Language Models, LLMs)在文本生成过程中难以保证符合用户期望的安全性与对齐性问题。解决方案的关键在于引入基于控制屏障函数(Control Barrier Function, CBF)的安全过滤机制,该机制作为附加模块嵌入到基础LLM的token预测输出中,无需微调原始模型即可干预生成内容,从而实现对齐目标;同时,若存在评估模型用于衡量期望对齐程度,可直接用于CBF安全滤波器的设计,提升生成文本的可控性和安全性。
链接: https://arxiv.org/abs/2511.03121
作者: Yuya Miyaoka,Masaki Inoue
机构: Keio University (庆应义塾大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
备注:
Abstract:This paper proposes a control-based framework for aligning large language models (LLMs) by leveraging a control barrier function (CBF) to ensure user-desirable text generation. The presented framework applies the CBF safety filter to the predicted token generated from the baseline LLM, to intervene in the generated text. The safety filter includes two significant advantages: this safety filter is an add-on type, allowing it to be used for alignment purposes without fine-tuning the baseline LLM, and if there is an evaluation model regarding the desired alignment, it can be directly applied to the filter design. The overall text-generation system is implemented with open-source language models, aiming to generate positive text.
zh
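作为附加模块的安全滤波器,其直观行为可以用"对候选词元分布做掩蔽并重归一化"来示意:凡是会使安全评分跌破下限的词元被剔除。下面的 `toy_score` 是假设的评估模型,整段代码只是 CBF 思路的粗略类比,并非论文中控制屏障函数的实际构造:

```python
def cbf_filter(prefix, probs, safety_score, floor):
    """probs: 基座 LLM 给出的候选词元概率;剔除不安全词元后重归一化。"""
    safe = {t: p for t, p in probs.items()
            if safety_score(prefix + [t]) >= floor}
    total = sum(safe.values())
    return {t: p / total for t, p in safe.items()}

# 假设的评估模型:出现 "awful" 即判为不安全(得分 0)。
toy_score = lambda seq: 0.0 if "awful" in seq else 1.0

filtered = cbf_filter(["you", "are"],
                      {"great": 0.5, "awful": 0.3, "kind": 0.2},
                      toy_score, floor=0.5)
```

这种过滤无需微调基座模型,与摘要强调的"add-on"特性一致。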
[NLP-42] CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic
【速读】: 该论文旨在解决阿拉伯语人群中精神健康障碍早期检测的难题,尤其是在资源匮乏且文化 stigma 限制心理议题讨论的背景下。现有研究多集中于英语语料,而阿拉伯语相关数据稀缺,严重制约了该领域的发展。解决方案的关键在于构建首个大规模、自动标注的阿拉伯语 Reddit 社交媒体文本数据集 CARMA,涵盖六种精神健康状况(如焦虑症、自闭症和抑郁症)及对照组,其规模与多样性均优于现有资源。通过词法和语义层面的定性与定量分析,该研究揭示了特定精神健康状态的语言特征,并基于此开展从浅层分类器到大语言模型的分类实验,验证了 CARMA 在推动阿拉伯语等低资源语言中精神健康检测方面的潜力。
链接: https://arxiv.org/abs/2511.03102
作者: Saad Mankarious,Ayah Zirikly
机构: George Washington University (乔治华盛顿大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:
Abstract:Mental health disorders affect millions worldwide, yet early detection remains a major challenge, particularly for Arabic-speaking populations where resources are limited and mental health discourse is often discouraged due to cultural stigma. While substantial research has focused on English-language mental health detection, Arabic remains significantly underexplored, partly due to the scarcity of annotated datasets. We present CARMA, the first automatically annotated large-scale dataset of Arabic Reddit posts. The dataset encompasses six mental health conditions, such as Anxiety, Autism, and Depression, and a control group. CARMA surpasses existing resources in both scale and diversity. We conduct qualitative and quantitative analyses of lexical and semantic differences between users, providing insights into the linguistic markers of specific mental health conditions. To demonstrate the dataset’s potential for further mental health analysis, we perform classification experiments using a range of models, from shallow classifiers to large language models. Our results highlight the promise of advancing mental health detection in underrepresented languages such as Arabic.
zh
[NLP-43] A Computational Approach to Analyzing Disrupted Language in Schizophrenia: Integrating Surprisal and Coherence Measures ICASSP2026
【速读】: 该论文旨在解决如何通过计算语言学指标来客观刻画精神分裂症患者在自发语言产生中的语言紊乱问题,进而为症状严重程度和诊断提供量化依据。其解决方案的关键在于利用计算模型分别计算**预期意外度(surprisal)和语义连贯性(semantic coherence)**这两个语言特征,并比较精神分裂症患者与健康对照组之间的差异,同时分析这些语言指标随症状严重程度变化的规律,从而揭示语言异常与认知障碍之间的关联机制。
链接: https://arxiv.org/abs/2511.03089
作者: Gowtham Premananth,Carol Espy-Wilson
机构: 未知
类目: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
备注: Submitted to ICASSP 2026
Abstract:Language disruptions are one of the well-known effects of schizophrenia symptoms. They are often manifested as disorganized speech and impaired discourse coherence. These abnormalities in spontaneous language production reflect underlying cognitive disturbances and have the potential to serve as objective markers for symptom severity and diagnosis of schizophrenia. This study focuses on how these language disruptions can be characterized in terms of two computational linguistic measures: surprisal and semantic coherence. By computing surprisal and semantic coherence of language using computational models, this study investigates how they differ between subjects with schizophrenia and healthy controls. Furthermore, this study provides further insight into how language disruptions in terms of these linguistic measures change with varying degrees of schizophrenia symptom severity.
zh
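两项语言学度量的定义都很直接:surprisal 是给定上下文下词概率的负对数,语义连贯性常用相邻话语嵌入的余弦相似度近似。以下为玩具示意(概率与嵌入向量均为虚构数值,实际研究中由计算模型给出):

```python
import math

def surprisal(prob):
    """surprisal = -log2 P(word | context),单位为比特。"""
    return -math.log2(prob)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

low, high = surprisal(0.5), surprisal(0.01)    # 可预测词 vs 意外词
coherence = cosine([0.9, 0.1], [0.8, 0.2])     # 相邻话语嵌入方向相近则连贯
```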
[NLP-44] PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech EMNLP2025
【速读】: 该论文旨在解决文本到语音(Text-to-Speech, TTS)系统中文本归一化(Text Normalization, TN)任务的瓶颈问题,即传统TN系统依赖大量人工规则、工程成本高、难以扩展且在低资源语言上表现受限。解决方案的关键在于提出PolyNorm,一种基于大语言模型(Large Language Models, LLMs)的提示驱动(prompt-based)方法,通过减少对人工规则的依赖,实现更广泛的语言覆盖与最小化的人工干预。此外,作者还设计了一个语言无关的自动数据清洗与评估流水线,以支持多语言场景下的可扩展实验,实验证明其在八种语言上均显著降低了词错误率(Word Error Rate, WER)。
链接: https://arxiv.org/abs/2511.03080
作者: Michel Wong,Ali Alshehri,Sophia Kao,Haotian He
机构: Apple(苹果)
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: 9 pages including appendix. EMNLP 2025 Industry Track
Abstract:Text Normalization (TN) is a key preprocessing step in Text-to-Speech (TTS) systems, converting written forms into their canonical spoken equivalents. Traditional TN systems can exhibit high accuracy, but involve substantial engineering effort, are difficult to scale, and pose challenges to language coverage, particularly in low-resource settings. We propose PolyNorm, a prompt-based approach to TN using Large Language Models (LLMs), aiming to reduce the reliance on manually crafted rules and enable broader linguistic applicability with minimal human intervention. Additionally, we present a language-agnostic pipeline for automatic data curation and evaluation, designed to facilitate scalable experimentation across diverse languages. Experiments across eight languages show consistent reductions in the word error rate (WER) compared to a production-grade-based system. To support further research, we release PolyNorm-Benchmark, a multilingual data set covering a diverse range of text normalization phenomena.
zh
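PolyNorm 以词错误率(WER)衡量归一化输出与参考口语形式的差异,即词级编辑距离除以参考长度。下面是标准 WER 的动态规划实现示意(示例中的归一化结果为虚构):

```python
def wer(reference, hypothesis):
    """词级编辑距离 / 参考长度。"""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # 删除
                          d[i][j - 1] + 1,        # 插入
                          d[i - 1][j - 1] + cost) # 替换
    return d[len(r)][len(h)] / len(r)

# 例如 "$5" 的参考口语形式为 "five dollars",而系统输出少了复数。
score = wer("five dollars", "five dollar")
```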
[NLP-45] The Curved Spacetime of Transformer Architectures
【速读】: 该论文试图解决的问题是:如何从几何角度理解基于Transformer的语言模型中token表示的动态演化机制,特别是注意力机制如何在高维特征空间中诱导非平凡的曲率结构并影响嵌入轨迹的形状。其解决方案的关键在于构建一个类比广义相对论(General Relativity)的几何框架——其中查询与键(query and key)定义了表示空间中的有效度量张量,注意力操作等价于离散联络(discrete connection),实现值向量沿token序列的平行传输;堆叠层构成离散时间切片,使token嵌入在弯曲流形上演化,而反向传播则对应最小作用量原理,引导参数空间中的损失最小化路径。这一框架预测嵌入轨迹并非直线,而是受嵌入空间曲率调控的弯曲路径,并通过可视化、统计模拟和受控上下文编辑实验验证了该预测,从而揭示了注意力机制对token表示几何结构的深刻塑造作用。
链接: https://arxiv.org/abs/2511.03060
作者: Riccardo Di Sipio,Jairo Diaz-Rodriguez,Luis Serrano
机构: Dayforce; York University (约克大学)
类目: Machine Learning (cs.LG); Computation and Language (cs.CL); Differential Geometry (math.DG)
备注:
Abstract:We present a geometric framework for understanding Transformer-based language models, drawing an explicit analogy to General Relativity. Queries and keys induce an effective metric on representation space, and attention acts as a discrete connection that implements parallel transport of value vectors across tokens. Stacked layers provide discrete time-slices through which token representations evolve on this curved manifold, while backpropagation plays the role of a least-action principle that shapes loss-minimizing trajectories in parameter space. If this analogy is correct, token embeddings should not traverse straight paths in feature space; instead, their layer-wise steps should bend and reorient as interactions mediated by embedding space curvature. To test this prediction, we design experiments that expose both the presence and the consequences of curvature: (i) we visualize a curvature landscape for a full paragraph, revealing how local turning angles vary across tokens and layers; (ii) we show through simulations that excess counts of sharp/flat angles and longer length-to-chord ratios are not explainable by dimensionality or chance; and (iii) inspired by Einstein’s eclipse experiment, we probe deflection under controlled context edits, demonstrating measurable, meaning-consistent bends in embedding trajectories that confirm attention-induced curvature.
zh
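论文预测词元嵌入的逐层轨迹会发生弯折,可用相邻两步位移之间的转角量化:共线则转角为零,转角越大说明轨迹越"弯"。下面是该几何量的最小计算示意(坐标为虚构示例):

```python
import math

def turning_angle(p0, p1, p2):
    """相邻两步位移 p0->p1 与 p1->p2 之间的夹角(弧度)。"""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p1, p2)]
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

straight = turning_angle([0, 0], [1, 0], [2, 0])   # 共线:转角为 0
bent = turning_angle([0, 0], [1, 0], [1, 1])       # 直角转弯:pi/2
```

对真实模型,p0、p1、p2 对应同一词元在连续三层的高维嵌入。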
[NLP-46] Reading Between the Lines: The One-Sided Conversation Problem
【速读】: 该论文试图解决的是“单边对话问题”(one-sided conversation problem, 1SC),即在仅能记录对话中一方发言的场景下(如远程医疗、呼叫中心和智能眼镜),如何推断并学习另一方未被记录的对话内容。其解决方案的关键在于:(1) 利用未来一回合的对话信息及话语长度提示来提升缺失说话者话语的重建精度;(2) 采用占位符提示(placeholder prompting)策略以减少生成中的幻觉现象;(3) 发现无需重建缺失话语即可生成高质量摘要,从而为隐私敏感场景下的对话理解提供高效路径。实验表明,大模型通过提示即可实现良好重建效果,而小模型则需微调才能达到可接受性能。
链接: https://arxiv.org/abs/2511.03056
作者: Victoria Ebert,Rishabh Singh,Tuochao Chen,Noah A. Smith,Shyamnath Gollakota
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 8 pages, 6 figures, 4 tables
Abstract:Conversational AI is constrained in many real-world settings where only one side of a dialogue can be recorded, such as telemedicine, call centers, and smart glasses. We formalize this as the one-sided conversation problem (1SC): inferring and learning from one side of a conversation. We study two tasks: (1) reconstructing the missing speaker’s turns for real-time use cases, and (2) generating summaries from one-sided transcripts. Evaluating prompting and finetuned models on MultiWOZ, DailyDialog, and Candor with both human A/B testing and LLM-as-a-judge metrics, we find that access to one future turn and information about utterance length improves reconstruction, placeholder prompting helps to mitigate hallucination, and while large models generate promising reconstructions with prompting, smaller models require finetuning. Further, high-quality summaries can be generated without reconstructing missing turns. We present 1SC as a novel challenge and report promising results that mark a step toward privacy-aware conversational AI.
zh
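摘要指出:提供一条未来话语和话语长度提示有助于重建,占位符提示可抑制幻觉。下面把这几种信号拼装成一个重建提示词的示意(提示词措辞为假设,并非论文原文):

```python
def reconstruction_prompt(heard, future_turn=None, length_hint=None):
    lines = ["Only speaker A was recorded. Fill in speaker B's missing turn."]
    for turn in heard:
        lines.append(f"A: {turn}")
    lines.append("B: [MISSING TURN]")      # 占位符,抑制模型自由发挥
    if future_turn is not None:            # 提供一条未来话语作为约束
        lines.append(f"A (next): {future_turn}")
    if length_hint is not None:            # 话语长度提示
        lines.append(f"Hint: B's turn is about {length_hint} words long.")
    return "\n".join(lines)

prompt = reconstruction_prompt(["How are you?"],
                               future_turn="Glad to hear it!",
                               length_hint=4)
```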
[NLP-47] ROBoto2: An Interactive System and Dataset for LLM -assisted Clinical Trial Risk of Bias Assessment EMNLP2025
【速读】: 该论文旨在解决临床试验偏倚风险(Risk of Bias, ROB)评估中传统人工标注流程效率低、耗时长的问题,特别是针对ROB v2(ROB2)评估的劳动密集型特性。解决方案的关键在于开发了一个开源的基于网页的平台ROBOTO2,其核心创新包括:通过PDF解析实现文档结构化处理、结合检索增强生成(Retrieval-Augmented Generation, RAG)技术提升大语言模型(Large Language Model, LLM)对信号问题的回答准确性,并引入人机协同(human-in-the-loop)机制允许用户实时反馈与修正系统建议,从而在保持评估质量的同时显著提升效率。
链接: https://arxiv.org/abs/2511.03048
作者: Anthony Hevia,Sanjana Chintalapati,Veronica Ka Wai Lai,Thanh Tam Nguyen,Wai-Tat Wong,Terry Klassen,Lucy Lu Wang
机构: University of Washington (华盛顿大学); The Hospital for Sick Children (儿童医院); University of Bologna (博洛尼亚大学); The Chinese University of Hong Kong (香港中文大学); University of Saskatchewan (萨斯喀彻温大学)
类目: Computation and Language (cs.CL)
备注: EMNLP 2025 System Demonstration
Abstract:We present ROBOTO2, an open-source, web-based platform for large language model (LLM)-assisted risk of bias (ROB) assessment of clinical trials. ROBOTO2 streamlines the traditionally labor-intensive ROB v2 (ROB2) annotation process via an interactive interface that combines PDF parsing, retrieval-augmented LLM prompting, and human-in-the-loop review. Users can upload clinical trial reports, receive preliminary answers and supporting evidence for ROB2 signaling questions, and provide real-time feedback or corrections to system suggestions. ROBOTO2 is publicly available at this https URL, with code and data released to foster reproducibility and adoption. We construct and release a dataset of 521 pediatric clinical trial reports (8954 signaling questions with 1202 evidence passages), annotated using both manually and LLM-assisted methods, serving as a benchmark and enabling future research. Using this dataset, we benchmark ROB2 performance for 4 LLMs and provide an analysis into current model capabilities and ongoing challenges in automating this critical aspect of systematic review.
zh
[NLP-48] Data-Efficient Adaptation and a Novel Evaluation Method for Aspect-based Sentiment Analysis
【速读】: 该论文旨在解决Aspect-based Sentiment Analysis (ABSA)在低资源领域(如教育和医疗)中研究匮乏、模型适应性差以及传统评估方法过于严格的问题。其关键解决方案包括:首先提出一种新的评估方法——柔性文本相似度匹配与最优二分图配对(FTS-OBP),能够容忍边界变化并提供细粒度诊断;其次,首次针对小规模仅解码器(decoder-only)生成式语言模型(SLMs;7B参数)开展ABSA研究,通过免数据(data-free,上下文学习与权重合并)和轻量数据微调策略,结合多任务微调机制,在仅需200–1,000样本的情况下实现性能超越商用大模型的效果;最后,公开首个教育评论ABSA数据集,推动低资源场景下的持续研究。
链接: https://arxiv.org/abs/2511.03034
作者: Yan Cathy Hua,Paul Denny,Jörg Wicker,Katerina Taškova
机构: University of Auckland (奥克兰大学)
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注:
Abstract:Aspect-based Sentiment Analysis (ABSA) is a fine-grained opinion mining approach that identifies and classifies opinions associated with specific entities (aspects) or their categories within a sentence. Despite its rapid growth and broad potential, ABSA research and resources remain concentrated in commercial domains, leaving analytical needs unmet in high-demand yet low-resource areas such as education and healthcare. Domain adaptation challenges and most existing methods’ reliance on resource-intensive in-training knowledge injection further hinder progress in these areas. Moreover, traditional evaluation methods based on exact matches are overly rigid for ABSA tasks, penalising any boundary variations which may misrepresent the performance of generative models. This work addresses these gaps through three contributions: 1) We propose a novel evaluation method, Flexible Text Similarity Matching and Optimal Bipartite Pairing (FTS-OBP), which accommodates realistic extraction boundary variations while maintaining strong correlation with traditional metrics and offering fine-grained diagnostics. 2) We present the first ABSA study of small decoder-only generative language models (SLMs; 7B parameters), examining resource lower bounds via a case study in education review ABSA. We systematically explore data-free (in-context learning and weight merging) and data-light fine-tuning methods, and propose a multitask fine-tuning strategy that significantly enhances SLM performance, enabling 1.5-3.8 B models to surpass proprietary large models and approach benchmark results with only 200-1,000 examples on a single GPU. 3) We release the first public set of education review ABSA resources to support future research in low-resource domains.
zh
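FTS-OBP 的思路是先用柔性文本相似度给每个预测片段与金标片段打分,再求总相似度最大的一对一配对。下面借用 `difflib` 的序列相似度加暴力枚举作示意(真实方法的相似度函数与配对算法在细节上不同,示例数据为虚构):

```python
from difflib import SequenceMatcher
from itertools import permutations

def similarity(a, b):
    """柔性文本相似度(这里借用 difflib 的序列相似度)。"""
    return SequenceMatcher(None, a, b).ratio()

def best_pairing(preds, golds):
    """暴力枚举一对一配对,返回 (总相似度, 配对方案);要求 len(preds) <= len(golds)。"""
    best = (-1.0, ())
    for perm in permutations(range(len(golds)), len(preds)):
        total = sum(similarity(preds[i], golds[j]) for i, j in enumerate(perm))
        best = max(best, (total, perm))
    return best

score, pairing = best_pairing(["the lecturer", "course pace"],
                              ["lecturer", "pace of the course"])
```

严格匹配会把 "the lecturer" 对 "lecturer" 判为全错,而柔性配对保留了这种边界变化下的部分得分。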
[NLP-49] Targeted Error Correction in Knowledge Distillation: Small Language Models Surpass GPT
【速读】: 该论文旨在解决开源小规模语言模型(LLM)在客户服务摘要任务中性能显著落后于大型专有模型(如GPT-3.5)的问题。解决方案的关键在于提出一个分析-修订-微调(Analyze-Revise-Finetune, ARF)流水线:首先分析教师模型(GPT-3.5)生成摘要中的常见错误并分类,随后利用一个紧凑的编辑模型(Llama 3.1 70B)针对性地修正这些错误,生成高质量的训练数据;最后,将一个较小的学生模型(Llama 3.1 8B)在此类精修数据上进行微调,从而实现超越原始教师模型的摘要性能。该方法在提升模型效果的同时兼顾成本效益和数据隐私,具有良好的泛化能力。
链接: https://arxiv.org/abs/2511.03005
作者: Hee-Jin Lee,Zhen Guo,Luchao Jin,Morteza Moazami Goudarzi
机构: eBay Inc. (eBay公司)
类目: Computation and Language (cs.CL)
备注:
Abstract:We introduce an Analyze-Revise-Finetune (ARF) pipeline that enables smaller open-source language models (LLMs) to surpass substantially larger proprietary models in customer service summarization tasks. The pipeline first analyzes and categorizes common errors in summaries produced by a teacher model (GPT-3.5), then performs a targeted revision using a compact editor model (Llama 3.1 70B) to generate high-quality, refined training data. Fine-tuning a smaller student model (Llama 3.1 8B) on this refined data resulted in superior summarization performance compared to GPT-3.5. The ARF pipeline improves cost efficiency and data privacy while maintaining competitive accuracy, illustrating a generalizable framework for enhancing open-source LLMs across diverse downstream applications.
zh
[NLP-50] LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation
【速读】: 该论文旨在解决当前基于大语言模型(Large Language Models, LLMs)生成的3D场景普遍存在空间布局和物体属性不真实的问题,其根源在于指令粒度不足、缺乏对现实环境细节的刻画。为提升生成场景的真实性与指令一致性,作者提出两个核心贡献:一是构建LEGO-Bench基准数据集,包含复杂布局与真实世界属性的细粒度指令;二是设计LEGO-Eval评估框架,通过多样化工具显式锚定场景组件,实现更精准的场景-指令对齐评估。解决方案的关键在于引入细粒度指令驱动的生成目标和基于显式成分锚定的评估机制,从而有效克服现有方法(如CLIPScore和视觉-语言模型)因对3D场景理解浅层化而导致的误判问题。
链接: https://arxiv.org/abs/2511.03001
作者: Gyeom Hwangbo,Hyungjoo Chae,Minseok Kang,Hyeonjong Ju,Soohyun Oh,Jinyoung Yeo
机构: Yonsei University (延世大学); Georgia Institute of Technology (佐治亚理工学院)
类目: Computation and Language (cs.CL)
备注: Work in Progress
Abstract:Despite recent progress in using Large Language Models (LLMs) for automatically generating 3D scenes, generated scenes often lack realistic spatial layouts and object attributes found in real-world environments. As this problem stems from insufficiently detailed, coarse-grained instructions, advancing 3D scene synthesis guided by more detailed, fine-grained instructions that reflect real-world environments becomes crucial. Without such realistic scenes, training embodied agents in unrealistic environments can lead them to learn priors that diverge significantly from real-world physics and semantics, degrading their performance when deployed. Thus, verifying the alignment between the fine-grained instruction and the generated scene is essential for effective learning. However, current evaluation methods, such as CLIPScore and vision-language models (VLMs), often fail to reliably assess such alignment. This shortcoming arises primarily from their shallow understanding of 3D scenes, which often leads to improperly grounded scene components. To address this, we introduce LEGO-Eval, an evaluation framework equipped with diverse tools designed to explicitly ground scene components, enabling more accurate alignment assessments. We also present LEGO-Bench, a benchmark of detailed instructions that specify complex layouts and attributes of real-world environments. Experiments demonstrate that LEGO-Eval outperforms VLM-as-a-judge by 0.41 F1 score in assessing scene-instruction alignment. Benchmarking with LEGO-Bench reveals significant limitations in current generation methods. Across all evaluated approaches, success rates reached at most 10% in generating scenes that fully align with fine-grained instructions.
zh
[NLP-51] Automatic Machine Translation Detection Using a Surrogate Multilingual Translation Model
【速读】: 该论文旨在解决现代机器翻译(Machine Translation, MT)系统训练数据中存在大量机器生成翻译文本的问题,这类合成数据会显著降低翻译质量。其解决方案的关键在于利用一个替代的多语言MT模型的内部表示(internal representations)来直接区分人类翻译与机器翻译句子,从而实现更有效的训练数据过滤。该方法在非英语语对上表现尤为突出,相比现有最先进技术至少提升了5个百分点的准确率。
链接: https://arxiv.org/abs/2511.02958
作者: Cristian García-Romero,Miquel Esplà-Gomis,Felipe Sánchez-Martínez
机构: Universitat d’Alacant (阿利坎特大学); Dep. de Llenguatges i Sistemes Informàtics (计算机语言与系统系); Institut Universitari d’Investigació Informàtica (信息研究院)
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: Pre-MIT Press publication version
Abstract:Modern machine translation (MT) systems depend on large parallel corpora, often collected from the Internet. However, recent evidence indicates that (i) a substantial portion of these texts are machine-generated translations, and (ii) an overreliance on such synthetic content in training data can significantly degrade translation quality. As a result, filtering out non-human translations is becoming an essential pre-processing step in building high-quality MT systems. In this work, we propose a novel approach that directly exploits the internal representations of a surrogate multilingual MT model to distinguish between human and machine-translated sentences. Experimental results show that our method outperforms current state-of-the-art techniques, particularly for non-English language pairs, achieving gains of at least 5 percentage points of accuracy.
zh
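该论文的核心思路是"利用替代多语言MT模型的内部表示来区分人工/机器翻译"。下面给出一个极简示意:用随机向量代替真实的模型隐藏状态,用最近质心分类器代替论文中的检测器,仅说明"内部表示可线性分离"这一直觉,并非论文实现:

```python
import numpy as np

def centroid_classifier(train_feats, train_labels, test_feats):
    """Nearest-centroid classifier over sentence-level feature vectors."""
    centroids = np.stack([train_feats[train_labels == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(test_feats[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
human = rng.normal(0.0, 1.0, size=(100, 16))     # stand-in: human-translation features
machine = rng.normal(1.5, 1.0, size=(100, 16))   # stand-in: machine-translation features
X = np.vstack([human, machine])
y = np.array([0] * 100 + [1] * 100)
pred = centroid_classifier(X, y, X)
acc = float((pred == y).mean())
```

在真实场景中,特征向量应取自替代MT模型编码器的池化隐藏状态,分类器也可换成任何判别模型。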
[NLP-52] Zero-shot data citation function classification using transformer-based large language models (LLMs)
【速读】: 该论文旨在解决如何高效、自动化地识别和分类科学文献中对特定基因组数据集的使用方式(data use case)这一问题。传统方法依赖人工标注或构建训练数据集进行监督学习,成本高且难以扩展。其解决方案的关键在于利用预训练的基于Transformer的大语言模型(LLM),特别是开源模型Llama 3.1-405B,在无需预先定义类别的情况下执行零样本(zero-shot)数据引用分类任务,从而实现对出版物中数据使用场景的结构化标签生成。该方法避免了昂贵的人工标注过程,并通过引入新的评估框架验证了模型在无监督条件下的有效性,尽管仍面临数据可用性、提示过拟合及计算资源等挑战。
链接: https://arxiv.org/abs/2511.02936
作者: Neil Byers,Ali Zaidi,Valerie Skye,Chris Beecroft,Kjiersten Fagnan
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:
Abstract:Efforts have increased in recent years to identify associations between specific datasets and the scientific literature that incorporates them. Knowing that a given publication cites a given dataset, the next logical step is to explore how or why that data was used. Advances in recent years with pretrained, transformer-based large language models (LLMs) offer potential means for scaling the description of data use cases in the published literature. This avoids expensive manual labeling and the development of training datasets for classical machine-learning (ML) systems. In this work we apply an open-source LLM, Llama 3.1-405B, to generate structured data use case labels for publications known to incorporate specific genomic datasets. We also introduce a novel evaluation framework for determining the efficacy of our methods. Our results demonstrate that the stock model can achieve an F1 score of .674 on a zero-shot data citation classification task with no previously defined categories. While promising, our results are qualified by barriers related to data availability, prompt overfitting, computational infrastructure, and the expense required to conduct responsible performance evaluation.
zh
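论文以 F1(.674)衡量零样本数据引用分类效果。下面给出一个按出版物聚合的 micro-F1 计算示意,标签名称均为假设示例,仅用于说明指标口径:

```python
def micro_f1(gold_sets, pred_sets):
    """Micro-averaged F1 between gold and predicted label sets per publication."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, pred_sets):
        tp += len(gold & pred)          # labels both sides agree on
        fp += len(pred - gold)          # predicted but not in gold
        fn += len(gold - pred)          # gold but missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical data-use-case labels for three publications.
gold = [{"reuse"}, {"benchmark", "reuse"}, {"method-dev"}]
pred = [{"reuse"}, {"benchmark"}, {"reuse"}]
score = micro_f1(gold, pred)
```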
[NLP-53] Cache Mechanism for Agent RAG Systems
【速读】: 该论文旨在解决大型语言模型(Large Language Model, LLM)代理在使用检索增强生成(Retrieval-Augmented Generation, RAG)技术时的代理级缓存管理问题,即如何动态构建、维护和更新一个紧凑且高度相关的知识语料库以满足每个代理的个性化需求。解决方案的关键在于提出ARC(Agent RAG Cache Mechanism),一种无需标注的缓存机制,其通过融合历史查询分布模式与嵌入空间中缓存项的内在几何结构,自动维护高相关性缓存,从而在仅占用原始语料库0.015%存储空间的情况下,实现最高79.8%的答案命中率(has-answer rate),并将平均检索延迟降低80%。
链接: https://arxiv.org/abs/2511.02919
作者: Shuhang Lin,Zhencan Peng,Lingyao Li,Xiao Lin,Xi Zhu,Yongfeng Zhang
机构: Rutgers University (罗格斯大学); University of South Florida (南佛罗里达大学); University of Illinois Urbana–Champaign (伊利诺伊大学厄巴纳-香槟分校)
类目: Computation and Language (cs.CL)
备注:
Abstract:Recent advances in Large Language Model (LLM)-based agents have been propelled by Retrieval-Augmented Generation (RAG), which grants the models access to vast external knowledge bases. Despite RAG’s success in improving agent performance, agent-level cache management, particularly constructing, maintaining, and updating a compact, relevant corpus dynamically tailored to each agent’s need, remains underexplored. Therefore, we introduce ARC (Agent RAG Cache Mechanism), a novel, annotation-free caching framework that dynamically manages small, high-value corpora for each agent. By synthesizing historical query distribution patterns with the intrinsic geometry of cached items in the embedding space, ARC automatically maintains a high-relevance cache. With comprehensive experiments on three retrieval datasets, our experimental results demonstrate that ARC reduces storage requirements to 0.015% of the original corpus while offering up to 79.8% has-answer rate and reducing average retrieval latency by 80%. Our results demonstrate that ARC can drastically enhance efficiency and effectiveness in RAG-powered LLM agents.
zh
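ARC"融合历史查询分布与缓存项嵌入几何结构"的打分思路,可以用下面的玩具示意粗略理解:用"与历史查询质心的余弦相似度"加"命中频率先验"给缓存项打分,淘汰低分项。打分公式与权重均为假设,并非论文的实际算法:

```python
import numpy as np

def cache_scores(item_embs, query_embs, hit_counts):
    """Score cached items by (i) similarity to the historical query centroid
    and (ii) a normalized hit-frequency prior; both are loose stand-ins for
    ARC's query-distribution and embedding-geometry signals."""
    q_centroid = query_embs.mean(axis=0)
    item_n = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    c_n = q_centroid / np.linalg.norm(q_centroid)
    sim = item_n @ c_n                          # cosine similarity to query centroid
    freq = hit_counts / max(hit_counts.sum(), 1)
    return 0.5 * sim + 0.5 * freq               # illustrative equal weighting

def evict(item_embs, query_embs, hit_counts, keep):
    order = np.argsort(-cache_scores(item_embs, query_embs, hit_counts))
    return order[:keep]                          # indices of items to retain

# Toy cache: item 0 aligns with past queries and is hit often, item 2 does not.
items = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
queries = np.array([[1.0, 0.1], [0.9, -0.1]])
hits = np.array([5.0, 1.0, 0.0])
kept = evict(items, queries, hits, keep=2)
```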
[NLP-54] LiveTradeBench: Seeking Real-World Alpha with Large Language Models
【速读】: 该论文旨在解决当前大语言模型(Large Language Models, LLMs)在静态基准测试中表现优异,但缺乏对真实动态环境中决策能力评估的问题,尤其是面对不确定性时的持续适应与风险权衡能力。其解决方案的关键在于提出 LiveTradeBench——一个基于实时市场数据流、多资产组合管理抽象和跨市场结构差异的活体交易环境,通过引入实时价格与新闻数据流、扩展至多资产配置的控制范式,并在不同波动性、流动性及信息流动特征的市场(如美股与Polymarket预测市场)中进行评估,从而更真实地检验LLM代理在序列决策中的鲁棒性与适应性。
链接: https://arxiv.org/abs/2511.03628
作者: Haofei Yu,Fenghai Li,Jiaxuan You
机构: University of Illinois, Urbana-Champaign (伊利诺伊大学厄巴纳-香槟分校)
类目: Trading and Market Microstructure (q-fin.TR); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL)
备注: 16 pages
Abstract:Large language models (LLMs) achieve strong performance across benchmarks–from knowledge quizzes and math reasoning to web-agent tasks–but these tests occur in static settings, lacking real dynamics and uncertainty. Consequently, they evaluate isolated reasoning or problem-solving rather than decision-making under uncertainty. To address this, we introduce LiveTradeBench, a live trading environment for evaluating LLM agents in realistic and evolving markets. LiveTradeBench follows three design principles: (i) Live data streaming of market prices and news, eliminating dependence on offline backtesting and preventing information leakage while capturing real-time uncertainty; (ii) a portfolio-management abstraction that extends control from single-asset actions to multi-asset allocation, integrating risk management and cross-asset reasoning; and (iii) multi-market evaluation across structurally distinct environments–U.S. stocks and Polymarket prediction markets–differing in volatility, liquidity, and information flow. At each step, an agent observes prices, news, and its portfolio, then outputs percentage allocations that balance risk and return. Using LiveTradeBench, we run 50-day live evaluations of 21 LLMs across families. Results show that (1) high LMArena scores do not imply superior trading outcomes; (2) models display distinct portfolio styles reflecting risk appetite and reasoning dynamics; and (3) some LLMs effectively leverage live signals to adapt decisions. These findings expose a gap between static evaluation and real-world competence, motivating benchmarks that test sequential decision making and consistency under live uncertainty.
zh
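LiveTradeBench 中代理每步输出"按百分比的资产配置"。下面是一个极简的配置归一化与组合收益计算示意(裁剪负值后归一化,属于最朴素的处理方式,资产与数值均为假设):

```python
def normalize_allocation(raw):
    """Turn raw model scores into non-negative percentage allocations summing to 1."""
    w = [max(x, 0.0) for x in raw]      # no short positions in this sketch
    s = sum(w)
    if s == 0:
        return [1.0 / len(w)] * len(w)  # fall back to a uniform portfolio
    return [x / s for x in w]

def portfolio_return(weights, asset_returns):
    """One-step portfolio return as the allocation-weighted sum of asset returns."""
    return sum(w * r for w, r in zip(weights, asset_returns))

# Hypothetical scores for [stock A, stock B, a Polymarket contract].
w = normalize_allocation([0.5, -0.2, 0.3])
r = portfolio_return(w, [0.02, -0.05, 0.04])
```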
计算机视觉
[CV-0] Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition NEURIPS2025
【速读】:该论文旨在解决视频动作识别模型解释性不足的问题,尤其是现有基于显著性图的方法会产生运动与空间上下文混杂的解释,而基于语言的方法虽具结构性却难以有效描述运动细节。其解决方案的关键在于提出一种基于概念解耦的可解释框架 DANCE(Disentangled Action aNd Context concept-based Explainable),通过预先定义三类可解释概念——运动动力学(motion dynamics)、物体(objects)和场景(scenes)——将运动动力学概念定义为人体姿态序列,并利用大语言模型自动提取后两类概念,结合前置式(ante-hoc)概念瓶颈机制强制模型依赖这些解耦概念进行预测,从而实现对动作识别决策过程的清晰、结构化解释。
链接: https://arxiv.org/abs/2511.03725
作者: Jongseo Lee,Wooil Lee,Gyeong-Moon Park,Seong Tae Kim,Jinwoo Choi
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: NeurIPS 2025 Spotlight paper. Project page: this https URL
Abstract:Effective explanations of video action recognition models should disentangle how movements unfold over time from the surrounding spatial context. However, existing methods based on saliency produce entangled explanations, making it unclear whether predictions rely on motion or spatial context. Language-based approaches offer structure but often fail to explain motions due to their tacit nature – intuitively understood but difficult to verbalize. To address these challenges, we propose Disentangled Action aNd Context concept-based Explainable (DANCE) video action recognition, a framework that predicts actions through disentangled concept types: motion dynamics, objects, and scenes. We define motion dynamics concepts as human pose sequences. We employ a large language model to automatically extract object and scene concepts. Built on an ante-hoc concept bottleneck design, DANCE enforces prediction through these concepts. Experiments on four datasets – KTH, Penn Action, HAA500, and UCF-101 – demonstrate that DANCE significantly improves explanation clarity with competitive performance. We validate the superior interpretability of DANCE through a user study. Experimental results also show that DANCE is beneficial for model debugging, editing, and failure analysis.
zh
[CV-1] Part-Aware Bottom-Up Group Reasoning for Fine-Grained Social Interaction Detection NEURIPS2025
【速读】:该论文旨在解决现有社会交互检测方法忽视细微线索(如面部表情、注视方向和手势)以及未能显式建模个体间交互关系的问题,从而导致局部社交信号捕捉不足且群体配置推断模糊。其解决方案的关键在于提出一种基于部件感知的自底向上群体推理框架,通过利用人体部位特征及其相互关系来推断社交群体及其交互模式;具体而言,模型首先基于部件感知线索增强个体特征,再通过融合空间关系与微妙社交线索的相似性推理机制关联个体,从而实现更精准的群体配置推断。
链接: https://arxiv.org/abs/2511.03666
作者: Dongkeun Kim,Minsu Cho,Suha Kwak
机构: Pohang University of Science and Technology (POSTECH)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted to NeurIPS 2025
Abstract:Social interactions often emerge from subtle, fine-grained cues such as facial expressions, gaze, and gestures. However, existing methods for social interaction detection overlook such nuanced cues and primarily rely on holistic representations of individuals. Moreover, they directly detect social groups without explicitly modeling the underlying interactions between individuals. These drawbacks limit their ability to capture localized social signals and introduce ambiguity when group configurations should be inferred from social interactions grounded in nuanced cues. In this work, we propose a part-aware bottom-up group reasoning framework for fine-grained social interaction detection. The proposed method infers social groups and their interactions using body part features and their interpersonal relations. Our model first detects individuals and enhances their features using part-aware cues, and then infers group configuration by associating individuals via similarity-based reasoning, which considers not only spatial relations but also subtle social cues that signal interactions, leading to more accurate group inference. Experiments on the NVI dataset demonstrate that our method outperforms prior methods, achieving the new state of the art.
zh
[CV-2] A Lightweight 3D-CNN for Event-Based Human Action Recognition with Privacy-Preserving Potential
【速读】:该论文旨在解决传统基于帧的摄像头在人体活动识别(HAR)中因捕获可识别个人身份信息而导致的隐私保护问题。为此,作者提出了一种轻量级三维卷积神经网络(3DCNN),利用事件相机(event camera)采集的事件流数据进行建模,该数据仅记录像素强度变化,具备天然的隐私保护特性。解决方案的关键在于:1)设计一个紧凑高效的3DCNN架构,能够同时捕捉空间与时间动态特征,适用于边缘部署;2)引入焦点损失(focal loss)结合类别重加权和针对性的数据增强策略,有效缓解类别不平衡问题并提升模型泛化能力。实验表明,该方法在丰田智能家居与ETRI数据集上实现了94.17%的整体准确率和0.9415的F1分数,优于C3D、ResNet3D等基准模型。
链接: https://arxiv.org/abs/2511.03665
作者: Mehdi Sefidgar Dilmaghani,Francis Fowley,Peter Corcoran
机构: University of Galway (戈尔韦大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:This paper presents a lightweight three-dimensional convolutional neural network (3DCNN) for human activity recognition (HAR) using event-based vision data. Privacy preservation is a key challenge in human monitoring systems, as conventional frame-based cameras capture identifiable personal information. In contrast, event cameras record only changes in pixel intensity, providing an inherently privacy-preserving sensing modality. The proposed network effectively models both spatial and temporal dynamics while maintaining a compact design suitable for edge deployment. To address class imbalance and enhance generalization, focal loss with class reweighting and targeted data augmentation strategies are employed. The model is trained and evaluated on a composite dataset derived from the Toyota Smart Home and ETRI datasets. Experimental results demonstrate an F1-score of 0.9415 and an overall accuracy of 94.17%, outperforming benchmark 3D-CNN architectures such as C3D, ResNet3D, and MC3_18 by up to 3%. These results highlight the potential of event-based deep learning for developing accurate, efficient, and privacy-aware human action recognition systems suitable for real-world edge applications.
zh
[CV-3] Flying Robotics Art: ROS-based Drone Draws the Record-Breaking Mural
【速读】:该论文旨在解决在户外恶劣环境(如风力和强光)下实现高精度、自主化大型壁画绘制的问题,同时确保系统的运行可靠性与艺术表现的一致性。其解决方案的关键在于构建一个融合红外(Infrared, IR)运动捕捉相机与激光雷达(LiDAR)技术的鲁棒导航系统,实现针对大规模艺术应用定制的精准定位跟踪;并采用一种独特的控制架构,在路径切向与法向分别实施差异化调节,从而实现精确轨迹跟踪与稳定线条绘制;此外,还开发了用于复杂曲线绘制与区域填充的路径规划与优化算法,以及专为应对无人机螺旋桨湍流设计的喷漆机构,有效保护关键部件免受油漆损害,保障长期稳定运行。
链接: https://arxiv.org/abs/2511.03651
作者: Andrei A. Korigodskii,Oleg D. Kalachev,Artem E. Vasiunik,Matvei V. Urvantsev,Georgii E. Bondar
机构: 未知
类目: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)
备注:
Abstract:This paper presents the innovative design and successful deployment of a pioneering autonomous unmanned aerial system developed for executing the world’s largest mural painted by a drone. Addressing the dual challenges of maintaining artistic precision and operational reliability under adverse outdoor conditions such as wind and direct sunlight, our work introduces a robust system capable of navigating and painting outdoors with unprecedented accuracy. Key to our approach is a novel navigation system that combines an infrared (IR) motion capture camera and LiDAR technology, enabling precise location tracking tailored specifically for largescale artistic applications. We employ a unique control architecture that uses different regulation in tangential and normal directions relative to the planned path, enabling precise trajectory tracking and stable line rendering. We also present algorithms for trajectory planning and path optimization, allowing for complex curve drawing and area filling. The system includes a custom-designed paint spraying mechanism, specifically engineered to function effectively amidst the turbulent airflow generated by the drone’s propellers, which also protects the drone’s critical components from paint-related damage, ensuring longevity and consistent performance. Experimental results demonstrate the system’s robustness and precision in varied conditions, showcasing its potential for autonomous large-scale art creation and expanding the functional applications of robotics in creative fields.
zh
[CV-4] Signal Intensity-weighted coordinate channels improve learning stability and generalisation in 1D and 2D CNNs in localisation tasks on biomedical signals
【速读】:该论文旨在解决生物医学数据中定位任务的挑战,即模型需从具有复杂强度分布的信号中学习有意义的空间或时间关系。传统方法如CoordConv层通过在卷积输入中添加坐标通道来赋予网络学习绝对位置的能力,但其未考虑信号强度与位置之间的潜在耦合。本文的关键解决方案是提出一种信号强度加权的坐标表示方法(signal intensity-weighted coordinate representation),该方法用局部信号强度对坐标通道进行缩放,从而在输入表示中直接嵌入强度-位置耦合关系,引入了一种简单且模态无关的归纳偏置(inductive bias)。实验表明,该方法在心电图(ECG)时间预测和细胞图像核中心坐标回归两个任务上均实现了更快的收敛速度和更强的泛化性能。
链接: https://arxiv.org/abs/2511.03645
作者: Vittal L. Rao
机构: Indian Institute of Technology Madras (印度理工学院马德拉斯分校)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:Localisation tasks in biomedical data often require models to learn meaningful spatial or temporal relationships from signals with complex intensity distributions. A common strategy, exemplified by CoordConv layers, is to append coordinate channels to convolutional inputs, enabling networks to learn absolute positions. In this work, we propose a signal intensity-weighted coordinate representation that replaces the pure coordinate channels with channels scaled by local signal intensity. This modification embeds an intensity-position coupling directly in the input representation, introducing a simple and modality-agnostic inductive bias. We evaluate the approach on two distinct localisation problems: (i) predicting the time of morphological transition in 20-second, two-lead ECG signals, and (ii) regressing the coordinates of nuclear centres in cytological images from the SiPaKMeD dataset. In both cases, the proposed representation yields faster convergence and higher generalisation performance relative to conventional coordinate-channel approaches, demonstrating its effectiveness across both one-dimensional and two-dimensional biomedical signals.
zh
[CV-5] Human Mesh Modeling for Anny Body
【速读】:该论文旨在解决现有参数化人体模型(parametric body models)依赖昂贵的3D扫描数据、形状空间受专有性限制且人群代表性不足的问题。其解决方案的关键在于提出Anny——一个完全可微、无需3D扫描的人体模型,基于MakeHuman社区的人体测量学知识构建,通过语义明确的表型参数(如性别、年龄、身高、体重)控制混合形状(blendshapes),实现跨年龄(从婴儿到老年人)、体型和比例的连续可解释形变空间;该模型经WHO人口统计数据校准,确保生成的人体形态具有现实性和人口统计学代表性,同时支持高精度扫描拟合、可控合成数据生成及Human Mesh Recovery(HMR)任务,验证了其在性能上可媲美基于扫描数据训练的模型,且具备更强的可解释性与开放性。
链接: https://arxiv.org/abs/2511.03589
作者: Romain Brégier,Guénolé Fiche,Laura Bravo-Sánchez,Thomas Lucas,Matthieu Armando,Philippe Weinzaepfel,Grégory Rogez,Fabien Baradel
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: We release our model and code at this https URL
Abstract:Parametric body models are central to many human-centric tasks, yet existing models often rely on costly 3D scans and learned shape spaces that are proprietary and demographically narrow. We introduce Anny, a simple, fully differentiable, and scan-free human body model grounded in anthropometric knowledge from the MakeHuman community. Anny defines a continuous, interpretable shape space, where phenotype parameters (e.g. gender, age, height, weight) control blendshapes spanning a wide range of human forms – across ages (from infants to elders), body types, and proportions. Calibrated using WHO population statistics, it provides realistic and demographically grounded human shape variation within a single unified model. Thanks to its openness and semantic control, Anny serves as a versatile foundation for 3D human modeling – supporting millimeter-accurate scan fitting, controlled synthetic data generation, and Human Mesh Recovery (HMR). We further introduce Anny-One, a collection of 800k photorealistic humans generated with Anny, showing that despite its simplicity, HMR models trained with Anny can match the performance of those trained with scan-based body models, while remaining interpretable and broadly representative. The Anny body model and its code are released under the Apache 2.0 license, making Anny an accessible foundation for human-centric 3D modeling.
zh
[CV-6] OneOcc: Semantic Occupancy Prediction for Legged Robots with a Single Panoramic Camera
【速读】:该论文旨在解决腿式/人形机器人在存在步态引入的体部抖动(gait-introduced body jitter)情况下,实现鲁棒的3D语义占据(semantic occupancy)感知问题。现有大多数语义场景补全(Semantic Scene Completion, SSC)系统针对轮式平台设计,依赖前向传感器,难以适应全身360°连续感知需求。其关键解决方案包括:(i) 双投影融合(Dual-Projection fusion, DP-ER),利用环形全景图及其等距圆柱展开图,保持360°连续性和网格对齐;(ii) 双网格体素化(Bi-Grid Voxelization, BGV),在笛卡尔与柱坐标空间中协同推理,降低离散化偏差并锐化自由/占据边界;(iii) 轻量级解码器结合分层AMoE-3D模块,实现动态多尺度特征融合,提升远距离与遮挡区域的推理能力;(iv) 插件式步态位移补偿(Gait Displacement Compensation, GDC),通过学习特征级运动校正无需额外传感器即可消除抖动影响。上述模块共同构成OneOcc框架,在两个新发布的全景占据基准QuadOcc和Human360Occ上均取得SOTA性能,且模型轻量化,具备部署于腿式/人形机器人的可行性。
链接: https://arxiv.org/abs/2511.03571
作者: Hao Shi,Ze Wang,Shangwei Guo,Mengfei Duan,Song Wang,Teng Chen,Kailun Yang,Lin Wang,Kaiwei Wang
机构: ZJU(浙江大学); NTU(南洋理工大学); MirrorMe Technology(镜像科技); HNU(湖南大学); Horizon Robotics(地平线机器人)
类目: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
备注: Datasets and code will be publicly available at this https URL
Abstract:Robust 3D semantic occupancy is crucial for legged/humanoid robots, yet most semantic scene completion (SSC) systems target wheeled platforms with forward-facing sensors. We present OneOcc, a vision-only panoramic SSC framework designed for gait-introduced body jitter and 360° continuity. OneOcc combines: (i) Dual-Projection fusion (DP-ER) to exploit the annular panorama and its equirectangular unfolding, preserving 360° continuity and grid alignment; (ii) Bi-Grid Voxelization (BGV) to reason in Cartesian and cylindrical-polar spaces, reducing discretization bias and sharpening free/occupied boundaries; (iii) a lightweight decoder with Hierarchical AMoE-3D for dynamic multi-scale fusion and better long-range/occlusion reasoning; and (iv) plug-and-play Gait Displacement Compensation (GDC) learning feature-level motion correction without extra sensors. We also release two panoramic occupancy benchmarks: QuadOcc (real quadruped, first-person 360°) and Human360Occ (H3O) (CARLA human-ego 360° with RGB, Depth, semantic occupancy; standardized within-/cross-city splits). OneOcc sets new state-of-the-art (SOTA): on QuadOcc it beats strong vision baselines and popular LiDAR ones; on H3O it gains +3.83 mIoU (within-city) and +8.08 (cross-city). Modules are lightweight, enabling deployable full-surround perception for legged/humanoid robots. Datasets and code will be publicly available at this https URL.
zh
[CV-7] Generalizing Shape-from-Template to Topological Changes
【速读】:该论文旨在解决形状恢复(Shape-from-Template, SfT)方法在面对形变伴随拓扑变化(topological changes)时失效的问题。传统SfT方法假设物体形变过程中拓扑结构保持不变,但在实际应用中,如表面撕裂或切割等事件会导致拓扑结构改变,从而破坏重建精度。解决方案的关键在于提出一种原理性扩展的SfT框架:以经典SfT结果为初始解,通过迭代地对模板的空间域进行分割(partitioning),最小化一个联合编码物理合理性与重投影一致性的能量函数,从而自适应调整模板以适应拓扑变化。该方法首次建立了能够感知拓扑变化的通用SfT框架,并在合成与真实数据上验证了其优于基线方法的鲁棒性和准确性。
链接: https://arxiv.org/abs/2511.03459
作者: Kevin Manogue,Tomasz M Schang,Dilara Kuş,Jonas Müller,Stefan Zachow,Agniva Sengupta
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted for publication at Smart Tools and Applications in Graphics (STAG), Genoa, Italy (2025)
Abstract:Reconstructing the surfaces of deformable objects from correspondences between a 3D template and a 2D image is well studied under Shape-from-Template (SfT) methods; however, existing approaches break down when topological changes accompany the deformation. We propose a principled extension of SfT that enables reconstruction in the presence of such changes. Our approach is initialized with a classical SfT solution and iteratively adapts the template by partitioning its spatial domain so as to minimize an energy functional that jointly encodes physical plausibility and reprojection consistency. We demonstrate that the method robustly captures a wide range of practically relevant topological events including tears and cuts on bounded 2D surfaces, thereby establishing the first general framework for topological-change-aware SfT. Experiments on both synthetic and real data confirm that our approach consistently outperforms baseline methods.
zh
[CV-8] Robust Alignment of the Human Embryo in 3D Ultrasound using PCA and an Ensemble of Heuristic Atlas-based and Learning-based Classifiers Evaluated on the Rotterdam Periconceptional Cohort
【速读】:该论文旨在解决三维(3D)超声图像中胚胎定位标准化的问题,以提升产前生长监测的准确性与一致性。当前临床实践中,不同扫描间的胚胎方向差异导致标准切面识别困难、解剖标志可视化不佳及跨时间点比较受限。为应对这一挑战,作者提出一种基于主成分分析(Principal Component Analysis, PCA)的自动化对齐方法:首先利用PCA从胚胎分割掩膜中提取其主轴方向,生成四个候选姿态;随后通过三种策略——基于皮尔逊相关系数的形状启发式规则、基于归一化互相关(Normalized Cross-Correlation)的图像配准至图谱、以及随机森林分类器——筛选出最符合标准方位的候选姿态。关键创新在于结合多策略投票机制,在98.5%的样本中实现高精度对齐,显著提升了早期妊娠期胚胎图像的一致性与可扩展性分析能力。
链接: https://arxiv.org/abs/2511.03416
作者: Nikolai Herrmann,Marcella C. Zijta,Stefan Klein,Régine P.M. Steegers-Theunissen,Rene M.H. Wijnen,Bernadette S. de Bakker,Melek Rousian,Wietske A.P. Bastiaansen
机构: Erasmus MC (伊拉斯谟医学中心)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Submitted version of paper accepted at International Workshop on Preterm, Perinatal and Paediatric Image Analysis 2025
Abstract:Standardized alignment of the embryo in three-dimensional (3D) ultrasound images aids prenatal growth monitoring by facilitating standard plane detection, improving visualization of landmarks and accentuating differences between different scans. In this work, we propose an automated method for standardizing this alignment. Given a segmentation mask of the embryo, Principal Component Analysis (PCA) is applied to the mask extracting the embryo’s principal axes, from which four candidate orientations are derived. The candidate in standard orientation is selected using one of three strategies: a heuristic based on Pearson’s correlation assessing shape, image matching to an atlas through normalized cross-correlation, and a Random Forest classifier. We tested our method on 2166 images longitudinally acquired 3D ultrasound scans from 1043 pregnancies from the Rotterdam Periconceptional Cohort, ranging from 7+0 to 12+6 weeks of gestational age. In 99.0% of images, PCA correctly extracted the principal axes of the embryo. The correct candidate was selected by the Pearson Heuristic, Atlas-based and Random Forest in 97.4%, 95.8%, and 98.4% of images, respectively. A Majority Vote of these selection methods resulted in an accuracy of 98.5%. The high accuracy of this pipeline enables consistent embryonic alignment in the first trimester, enabling scalable analysis in both clinical and research settings. The code is publicly available at: this https URL.
zh
[CV-9] Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models
【速读】:该论文旨在解决当前基于提示学习(prompt learning)的零样本视觉分类方法在面对完全未见类别时泛化能力不足的问题,尤其是现有方法如CoCoOp未能有效聚焦于语义上有意义的视觉特征,而仅依赖文本层面的提示优化,忽略了图像级数据增强对提升模型鲁棒性的潜力。其解决方案的关键在于提出AAPL(Adding Attributes to Prompt Learning),通过引入对抗性标记嵌入(adversarial token embeddings)来解耦由图像增强带来的表面视觉变化与类别相关语义表示之间的关联,从而引导学习到的提示专注于与目标类别高度相关的判别性视觉特征,显著提升了在少样本、零样本、跨数据集及领域泛化等场景下的性能表现。
链接: https://arxiv.org/abs/2511.03367
作者: Gahyeon Kim,Sohee Kim,Seokju Lee
机构: KENTECH(韩国能源工科大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: Accepted in Pattern Recognition
Abstract:Recent advances in large-scale vision and language models have led to significant progress in zero-shot learning tasks. Methods such as CoOp and CoCoOp have shown that replacing handcrafted prompts with learnable vectors, known as prompt learning, can result in improved performance. However, these models often struggle to generalize to entirely unseen categories. While traditional zero-shot learning techniques benefit from various data augmentation strategies, prompt learning has primarily focused on text-based modifications, leaving the potential of image-based augmentation largely unexplored. In this work, we explore how image-level augmentations, particularly those that introduce attribute-specific variations, can support and enhance prompt learning. Our analysis examines the interaction between these augmentations and soft prompt frameworks, revealing their potential to improve generalization. We also identify a limitation in existing methods, such as CoCoOp, which do not provide explicit guidance for learning prompts that focus on semantically meaningful visual features. To address this, we propose Adding Attributes to Prompt Learning, AAPL, a novel method that introduces adversarial token embeddings to decouple superficial visual variations introduced by augmentation from class-relevant semantic representations. This decoupling enables the learned prompts to concentrate on visually discriminative features that align with the target categories. We conduct comprehensive experiments on eleven benchmark datasets, and AAPL consistently outperforms existing methods across few-shot, zero-shot, cross-dataset, and domain generalization settings. Our source code is publicly available at: this https URL
zh
[CV-10] UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions
【速读】:该论文旨在解决现有开源音视频生成方法在跨模态建模方面的不足,具体表现为唇形同步质量差和语义一致性弱的问题。其解决方案的关键在于提出一个统一的音视频联合生成框架 UniAVGen,该框架采用双分支联合合成架构,通过两个并行的扩散 Transformer (Diffusion Transformer, DiT) 构建一致的跨模态潜在空间;核心创新是引入不对称跨模态交互机制(Asymmetric Cross-Modal Interaction),实现双向、时序对齐的交叉注意力,从而保障时空同步与语义一致性;同时结合人脸感知调制模块(Face-Aware Modulation)动态强化关键区域交互,并引入模态感知无分类器引导(Modality-Aware Classifier-Free Guidance)以增强生成过程中跨模态相关信号,显著提升生成保真度与任务泛化能力。
链接: https://arxiv.org/abs/2511.03334
作者: Guozhen Zhang,Zixiang Zhou,Teng Hu,Ziqiao Peng,Youliang Zhang,Yi Chen,Yuan Zhou,Qinglin Lu,Limin Wang
机构: Nanjing University (南京大学); Tencent Hunyuan (腾讯混元); Shanghai Jiao Tong University (上海交通大学); Renmin University of China (中国人民大学); Tsinghua University (清华大学); Shanghai AI Lab (上海人工智能实验室)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:Due to the lack of effective cross-modal modeling, existing open-source audio-video generation methods often exhibit compromised lip synchronization and insufficient semantic consistency. To mitigate these drawbacks, we propose UniAVGen, a unified framework for joint audio and video generation. UniAVGen is anchored in a dual-branch joint synthesis architecture, incorporating two parallel Diffusion Transformers (DiTs) to build a cohesive cross-modal latent space. At its heart lies an Asymmetric Cross-Modal Interaction mechanism, which enables bidirectional, temporally aligned cross-attention, thus ensuring precise spatiotemporal synchronization and semantic consistency. Furthermore, this cross-modal interaction is augmented by a Face-Aware Modulation module, which dynamically prioritizes salient regions in the interaction process. To enhance generative fidelity during inference, we additionally introduce Modality-Aware Classifier-Free Guidance, a novel strategy that explicitly amplifies cross-modal correlation signals. Notably, UniAVGen’s robust joint synthesis design enables seamless unification of pivotal audio-video tasks within a single model, such as joint audio-video generation and continuation, video-to-audio dubbing, and audio-driven video synthesis. Comprehensive experiments validate that, with far fewer training samples (1.3M vs. 30.1M), UniAVGen delivers overall advantages in audio-video synchronization, timbre consistency, and emotion consistency.
zh
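论文未公开模态感知无分类器引导(Modality-Aware Classifier-Free Guidance)的具体公式,下面给出一个基于常规 CFG 的示意性推广,仅用于说明“放大跨模态相关信号”的思路;函数名与引导系数均为本文假设,并非论文原始实现:

```python
import numpy as np

def modality_aware_cfg(eps_uncond, eps_single, eps_joint, w_single=5.0, w_cross=2.0):
    """示意性的模态感知 classifier-free guidance 组合(系数为假设值)。

    eps_uncond: 无条件噪声预测
    eps_single: 仅含单模态条件的噪声预测
    eps_joint:  含跨模态条件的噪声预测
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_single = np.asarray(eps_single, dtype=float)
    eps_joint = np.asarray(eps_joint, dtype=float)
    # 常规 CFG 项 + 额外放大跨模态相关信号的差分项
    return (eps_uncond
            + w_single * (eps_single - eps_uncond)
            + w_cross * (eps_joint - eps_single))

if __name__ == "__main__":
    e_u = np.zeros(4)
    e_s = np.ones(4) * 0.1
    e_j = np.ones(4) * 0.2
    out = modality_aware_cfg(e_u, e_s, e_j, w_single=2.0, w_cross=1.5)
    print(out)  # 各维均为 2.0*0.1 + 1.5*0.1 = 0.35
```

当 `w_cross=0` 时该组合退化为标准 CFG,差分项 `eps_joint - eps_single` 即摘要所说被显式放大的跨模态相关信号。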
[CV-11] Multi-Object Tracking Retrieval with LLaVA-Video: A Training-Free Solution to MOT25-StAG Challenge
【速读】:该论文旨在解决多目标时空动作定位(MOT25-Spatiotemporal Action Grounding, MOT25-StAG)问题,即在复杂真实场景的视频数据中,准确地定位并跟踪与自由形式语言查询匹配的多个对象。解决方案的关键在于将任务建模为视频检索问题,并提出一种两阶段、零样本(zero-shot)方法:第一阶段利用当前最优(SOTA)的追踪模型 FastTracker 进行目标检测与跟踪;第二阶段借助多模态大语言模型 LLaVA-Video 完成跨模态语义对齐,从而对任意自由形式语言描述的目标实现精准的时空定位与持续跟踪。
链接: https://arxiv.org/abs/2511.03332
作者: Yi Yang,Yiming Xu,Timo Kaiser,Hao Cheng,Bodo Rosenhahn,Michael Ying Yang
机构: Leibniz University Hannover (汉诺威莱布尼茨大学); University of Twente (特温特大学); University of Bath (巴斯大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:In this report, we present our solution to the MOT25-Spatiotemporal Action Grounding (MOT25-StAG) Challenge. The aim of this challenge is to accurately localize and track multiple objects that match specific and free-form language queries, using video data of complex real-world scenes as input. We model the underlying task as a video retrieval problem and present a two-stage, zero-shot approach, combining the advantages of the SOTA tracking model FastTracker and Multi-modal Large Language Model LLaVA-Video. On the MOT25-StAG test set, our method achieves m-HIoU and HOTA scores of 20.68 and 10.73 respectively, which won second place in the challenge.
zh
[CV-12] SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding
【速读】:该论文旨在解决当前外科视频问答(Video Question Answering, VideoQA)模型在手术场景中对动态过程理解不足的问题,现有方法多依赖静态图像特征且缺乏时间标注,难以捕捉手术过程中关键的时序信息(如运动轨迹和器械-组织交互)。其解决方案的核心在于提出SurgViVQA模型,该模型采用掩码视频-文本编码器(Masked Video–Text Encoder)融合视频与问题特征,有效捕获时间线索,并由微调的大语言模型(LLM)生成连贯答案,从而实现从静态图像到动态手术场景的视觉推理扩展。
链接: https://arxiv.org/abs/2511.03325
作者: Mauro Orazio Drago,Luca Carlini,Pelinsu Celebi Balyemez,Dennis Pierantozzi,Chiara Lena,Cesare Hassan,Danail Stoyanov,Elena De Momi,Sophia Bano,Mobarak I. Hoque
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:Video Question Answering (VideoQA) in the surgical domain aims to enhance intraoperative understanding by enabling AI models to reason over temporally coherent events rather than isolated frames. Current approaches are limited to static image features, and available datasets often lack temporal annotations, ignoring the dynamics critical for accurate procedural interpretation. We propose SurgViVQA, a surgical VideoQA model that extends visual reasoning from static images to dynamic surgical scenes. It uses a Masked Video–Text Encoder to fuse video and question features, capturing temporal cues such as motion and tool–tissue interactions, which a fine-tuned large language model (LLM) then decodes into coherent answers. To evaluate its performance, we curated REAL-Colon-VQA, a colonoscopic video dataset that includes motion-related questions and diagnostic attributes, as well as out-of-template questions with rephrased or semantically altered formulations to assess model robustness. Experimental validation on REAL-Colon-VQA and the public EndoVis18-VQA dataset shows that SurgViVQA outperforms existing image-based VQA benchmark models, particularly in keyword accuracy, improving over PitVQA by +11% on REAL-Colon-VQA and +9% on EndoVis18-VQA. A perturbation study on the questions further confirms improved generalizability and robustness to variations in question phrasing. SurgViVQA and the REAL-Colon-VQA dataset provide a framework for temporally-aware understanding in surgical VideoQA, enabling AI models to interpret dynamic procedural contexts more effectively. Code and dataset available at this https URL.
zh
[CV-13] Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models
【速读】:该论文旨在解决扩散模型(Diffusion Models)在基于人类偏好对齐(preference alignment)过程中存在的关键问题:标准的扩散型直接偏好优化(Diffusion-DPO)方法在扩大偏好边际(preference margin)时,并不能保证生成质量提升,反而可能导致未被偏好的输出(loser branch)重建误差显著增加,进而反向损害偏好输出(winner branch)的质量。其解决方案的核心在于提出 Diffusion-SDPO,一种受保护的更新规则,通过自适应地根据 loser 梯度与 winner 梯度的一致性来缩放 loser 的梯度,从而在优化过程中保障 preferred output 的重建误差不增加。理论分析给出了一个一阶闭式缩放系数,确保每一步优化中优选输出的误差非递增,且该方法具有模型无关性、兼容现有 DPO 框架,并仅引入可忽略的计算开销。
链接: https://arxiv.org/abs/2511.03317
作者: Minghao Fu,Guo-Hua Wang,Tianyu Cui,Qing-Guo Chen,Zhao Xu,Weihua Luo,Kaifu Zhang
机构: Nanjing University (南京大学); Alibaba International Digital Commerce Group (阿里巴巴国际数字商业集团)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: The code is publicly available at this https URL
Abstract:Text-to-image diffusion models deliver high-quality images, yet aligning them with human preferences remains challenging. We revisit diffusion-based Direct Preference Optimization (DPO) for these models and identify a critical pathology: enlarging the preference margin does not necessarily improve generation quality. In particular, the standard Diffusion-DPO objective can increase the reconstruction error of both winner and loser branches. Consequently, degradation of the less-preferred outputs can become sufficiently severe that the preferred branch is also adversely affected even as the margin grows. To address this, we introduce Diffusion-SDPO, a safeguarded update rule that preserves the winner by adaptively scaling the loser gradient according to its alignment with the winner gradient. A first-order analysis yields a closed-form scaling coefficient that guarantees the error of the preferred output is non-increasing at each optimization step. Our method is simple, model-agnostic, broadly compatible with existing DPO-style alignment frameworks and adds only marginal computational overhead. Across standard text-to-image benchmarks, Diffusion-SDPO delivers consistent gains over preference-learning baselines on automated preference, aesthetic, and prompt alignment metrics. Code is publicly available at this https URL.
zh
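摘要中的“受保护更新规则”可以用一阶分析直观说明:当 loser 梯度与 winner 梯度冲突时,对 loser 梯度按闭式系数缩放,保证 winner 的误差一阶不增。以下为该思路的示意性实现(numpy),系数的具体形式为本文假设,并非论文原式:

```python
import numpy as np

def safeguarded_scale(g_w, g_l, eps=1e-12):
    """一阶近似下的 loser 梯度缩放系数(示意性推导,非论文原式)。

    要求总更新方向 -(g_w + s*g_l) 不增加 winner 的误差,
    即满足 g_w·(g_w + s*g_l) >= 0。
    """
    dot = float(np.dot(g_w, g_l))
    if dot >= 0.0:  # 两梯度不冲突,无需缩放
        return 1.0
    # 冲突时取保证 winner 误差一阶不增的最大系数,并截断到 [0, 1]
    return min(1.0, float(np.dot(g_w, g_w)) / max(-dot, eps))

if __name__ == "__main__":
    g_w = np.array([1.0, 0.0])
    g_l = np.array([-2.0, 0.0])        # 与 g_w 完全冲突
    s = safeguarded_scale(g_w, g_l)
    update = g_w + s * g_l
    assert np.dot(g_w, update) >= 0.0  # winner 误差一阶不增
    print(s)                           # 0.5
```

这正对应摘要中“根据 loser 梯度与 winner 梯度的一致性自适应缩放”的描述:一致时不干预,冲突时压制 loser 分量。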
[CV-14] Unified Long Video Inpainting and Outpainting via Overlapping High-Order Co-Denoising
【速读】:该论文旨在解决长视频生成与编辑中两个核心难题:一是如何实现高保真度的长视频生成,二是如何在视频修复(inpainting)和扩展(outpainting)任务中实现高可控性。现有方法通常受限于固定长度片段或产生拼接伪影,难以满足实际应用对长时间一致性与无缝编辑的需求。解决方案的关键在于提出一种统一框架,通过LoRA(Low-Rank Adaptation)高效微调预训练视频扩散模型(如Wan 2.1),实现掩码区域的高质量视频合成;同时引入重叠-融合的时间协同去噪策略(overlap-and-blend temporal co-denoising)与高阶求解器(high-order solvers),显著提升长序列中的时序一致性与视觉连续性,从而在不引入明显接缝或漂移的前提下,支持任意长度的空间编辑视频生成。
链接: https://arxiv.org/abs/2511.03272
作者: Shuangquan Lyu,Steven Mao,Yue Ma
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:Generating long videos remains a fundamental challenge, and achieving high controllability in video inpainting and outpainting is particularly demanding. To address both of these challenges simultaneously and achieve controllable video inpainting and outpainting for long video clips, we introduce a novel and unified approach for long video inpainting and outpainting that extends text-to-video diffusion models to generate arbitrarily long, spatially edited videos with high fidelity. Our method leverages LoRA to efficiently fine-tune a large pre-trained video diffusion model like Alibaba’s Wan 2.1 for masked region video synthesis, and employs an overlap-and-blend temporal co-denoising strategy with high-order solvers to maintain consistency across long sequences. In contrast to prior work that struggles with fixed-length clips or exhibits stitching artifacts, our system enables arbitrarily long video generation and editing without noticeable seams or drift. We validate our approach on challenging inpainting/outpainting tasks including editing or adding objects over hundreds of frames and demonstrate superior performance to baseline methods like Wan 2.1 model and VACE in terms of quality (PSNR/SSIM), and perceptual realism (LPIPS). Our method enables practical long-range video editing with minimal overhead, achieved a balance between parameter efficient and superior performance.
zh
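摘要中的“重叠-融合时间协同去噪”可以用一个极简示意说明:将长序列切成重叠窗口,各窗口去噪结果按斜坡权重加权平均,使接缝处平滑过渡。以下实现将每帧简化为标量,权重形状为本文假设:

```python
import numpy as np

def blend_windows(windows, starts, total_len, win_len):
    """重叠窗口去噪结果的斜坡加权融合(示意,帧为标量表示)。"""
    out = np.zeros(total_len)
    acc = np.zeros(total_len)
    # 三角(斜坡)权重:窗口中心权重高,边缘权重低
    ramp = np.minimum(np.arange(1, win_len + 1),
                      np.arange(win_len, 0, -1)).astype(float)
    for w, s in zip(windows, starts):
        out[s:s + win_len] += ramp * w
        acc[s:s + win_len] += ramp
    return out / acc  # 每帧按权重归一化

if __name__ == "__main__":
    win_len = 4
    starts = [0, 2, 4]  # 步长 2,相邻窗口重叠 2 帧
    windows = [np.ones(win_len) * 3.0 for _ in starts]
    merged = blend_windows(windows, starts, total_len=8, win_len=win_len)
    print(np.allclose(merged, 3.0))  # 窗口内容一致时融合结果无缝还原 -> True
```

由于重叠区由两个窗口的结果加权插值,相邻窗口的去噪轨迹被软性对齐,这就是摘要所说“无明显接缝或漂移”的直观来源。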
[CV-15] IEC3D-AD: A 3D Dataset of Industrial Equipment Components for Unsupervised Point Cloud Anomaly Detection
【速读】:该论文旨在解决工业制造中3D异常检测(3D anomaly detection, 3D-AD)因现有数据集(如Real3D-AD和MVTec 3D-AD)无法充分反映真实工业场景下的复杂性和细微缺陷而导致的精度不足问题,尤其针对轴承、环和螺栓等工业设备部件(Industrial Equipment Components, IEC)。解决方案的关键在于构建一个高保真度的点云异常检测数据集IEC3D-AD,其直接从实际生产线采集,具有更高分辨率和更细粒度的缺陷标注;同时提出一种新型3D-AD范式GMANet,受生成式2D异常检测启发,通过几何形态学分析生成合成点云样本,并基于空间差异优化策略缩小正常与异常点级特征之间的间隔、增大二者的重叠,从而提升检测性能。
链接: https://arxiv.org/abs/2511.03267
作者: Bingyang Guo,Hongjie Li,Ruiyun Yu,Hanzhe Liang,Jinbao Wang
机构: Northeastern University (东北大学); Shenzhen University (深圳大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:3D anomaly detection (3D-AD) plays a critical role in industrial manufacturing, particularly in ensuring the reliability and safety of core equipment components. Although existing 3D datasets like Real3D-AD and MVTec 3D-AD offer broad application support, they fall short in capturing the complexities and subtle defects found in real industrial environments. This limitation hampers precise anomaly detection research, especially for industrial equipment components (IEC) such as bearings, rings, and bolts. To address this challenge, we have developed a point cloud anomaly detection dataset (IEC3D-AD) specific to real industrial scenarios. This dataset is directly collected from actual production lines, ensuring high fidelity and relevance. Compared to existing datasets, IEC3D-AD features significantly improved point cloud resolution and defect annotation granularity, facilitating more demanding anomaly detection tasks. Furthermore, inspired by generative 2D-AD methods, we introduce a novel 3D-AD paradigm (GMANet) on IEC3D-AD. This paradigm generates synthetic point cloud samples based on geometric morphological analysis, then reduces the margin and increases the overlap between normal and abnormal point-level features through spatial discrepancy optimization. Extensive experiments demonstrate the effectiveness of our method on both IEC3D-AD and other datasets.
zh
[CV-16] Enhancing Medical Image Segmentation via Heat Conduction Equation
【速读】:该论文旨在解决现有深度学习模型在医疗图像分割任务中难以同时实现高效全局上下文建模与长程依赖推理的问题,尤其是在计算资源受限的实际场景下。其解决方案的关键在于提出一种将U-Mamba与热传导方程相结合的新型混合架构:利用基于状态空间模型(State-Space Model, SSM)的Mamba模块实现高效的长程推理,并在瓶颈层引入热传导算子(Heat Conduction Operator, HCO),通过模拟频域下的热扩散过程增强语义抽象能力,从而在保持计算效率的同时提升分割性能。
链接: https://arxiv.org/abs/2511.03260
作者: Rong Wu,Yim-Sang Yu
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:Medical image segmentation has been significantly advanced by deep learning architectures, notably U-Net variants. However, existing models struggle to achieve efficient global context modeling and long-range dependency reasoning under practical computational budgets simultaneously. In this work, we propose a novel hybrid architecture utilizing U-Mamba with Heat Conduction Equation. Our model combines Mamba-based state-space modules for efficient long-range reasoning with Heat Conduction Operators (HCOs) in the bottleneck layers, simulating frequency-domain thermal diffusion for enhanced semantic abstraction. Experimental results on multimodal abdominal CT and MRI datasets demonstrate that the proposed model consistently outperforms strong baselines, validating its effectiveness and generalizability. It suggest that blending state-space dynamics with heat-based global diffusion offers a scalable and interpretable solution for medical segmentation tasks.
zh
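热传导算子(HCO)“模拟频域下的热扩散”的思路可用一维信号上的谱方法示意:将信号变换到频域,按热核 exp(-κk²t) 衰减高频分量后逆变换。以下为本文给出的简化一维实现,与论文中作用于特征图的版本仅为类比:

```python
import numpy as np

def heat_conduction_operator(x, kappa=1.0, t=0.1):
    """一维热传导方程的谱域解(示意):频率 k 的分量按 exp(-kappa*k^2*t) 衰减。"""
    n = x.shape[-1]
    freqs = 2 * np.pi * np.fft.fftfreq(n)   # 各模式的角频率 k
    decay = np.exp(-kappa * freqs**2 * t)   # 热核在频域中的形式
    return np.fft.ifft(np.fft.fft(x) * decay).real

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=256)
    y = heat_conduction_operator(x, kappa=1.0, t=1.0)
    print(y.std() < x.std())                # 扩散平滑后方差降低 -> True
    print(abs(y.mean() - x.mean()) < 1e-9)  # 零频(均值)分量不衰减 -> True
```

高频分量衰减最快而零频分量保持不变,这解释了为何此类算子适合放在瓶颈层做语义抽象:它保留全局结构、抑制局部噪声。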
[CV-17] Decoupled Entropy Minimization NEURIPS2025
【速读】:该论文旨在解决经典熵最小化(Entropy Minimization, EM)方法在实际应用中因耦合式设计导致的两大局限性问题:一是“奖励坍塌”(reward collapse),即高置信度样本对学习过程的贡献被抑制;二是“易类偏差”(easy-class bias),即模型输出分布与标签分布之间出现错位。为克服这些问题,作者提出自适应解耦熵最小化(Adaptive Decoupled Entropy Minimization, AdaDEM),其核心创新在于将EM分解为两个独立作用的模块——通过归一化簇聚合驱动因子(Cluster Aggregation Driving Factor, CADF)的奖励来缓解奖励坍塌,并引入边际熵校准器(Marginal Entropy Calibrator, MEC)替代梯度缓解校准器(Gradient Mitigation Calibrator, GMC),从而更有效地平衡类别间分布并提升模型在噪声和动态环境下的鲁棒性。
链接: https://arxiv.org/abs/2511.03256
作者: Jing Ma,Hanlin Li,Xiang Xiang
机构: Huazhong University of Science and Technology (华中科技大学); Peng Cheng Lab (鹏城实验室)
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
备注: To appear at NeurIPS 2025 (main conference), San Diego, CA, USA. Codes available at this https URL
Abstract:Entropy Minimization (EM) is beneficial to reducing class overlap, bridging domain gap, and restricting uncertainty for various tasks in machine learning, yet its potential is limited. To study the internal mechanism of EM, we reformulate and decouple the classical EM into two parts with opposite effects: cluster aggregation driving factor (CADF) rewards dominant classes and prompts a peaked output distribution, while gradient mitigation calibrator (GMC) penalizes high-confidence classes based on predicted probabilities. Furthermore, we reveal the limitations of classical EM caused by its coupled formulation: 1) reward collapse impedes the contribution of high-certainty samples in the learning process, and 2) easy-class bias induces misalignment between output distribution and label distribution. To address these issues, we propose Adaptive Decoupled Entropy Minimization (AdaDEM), which normalizes the reward brought from CADF and employs a marginal entropy calibrator (MEC) to replace GMC. AdaDEM outperforms DEM*, an upper-bound variant of classical EM, and achieves superior performance across various imperfectly supervised learning tasks in noisy and dynamic environments.
zh
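经典熵最小化与解耦后目标的差异可以在损失层面直观演示。下面的示意代码(非论文原式)用“批内平均分布的熵”近似边际熵校准项,以缓解 easy-class 偏置;具体形式为本文假设:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p, axis=-1, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=axis)

def classical_em_loss(logits):
    """经典熵最小化:最小化每个样本预测分布的熵。"""
    return entropy(softmax(logits)).mean()

def adadem_style_loss(logits, lam=1.0):
    """示意性的解耦目标(非论文原式):在最小化条件熵的同时,
    用边际熵项(批内平均分布的熵,取负号即最大化)抑制 easy-class 偏置。"""
    p = softmax(logits)
    return entropy(p).mean() - lam * entropy(p.mean(axis=0))

if __name__ == "__main__":
    logits = np.array([[2.0, 0.0, 0.0], [2.1, 0.1, -0.1]])
    print(adadem_style_loss(logits) < classical_em_loss(logits))  # True:减去了正的边际熵项
```

两个目标对“锐化每个样本的预测”方向一致,但边际熵项会惩罚所有样本都坍缩到同一个易分类别的解,对应摘要中输出分布与标签分布错位的问题。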
[CV-18] Generative deep learning for foundational video translation in ultrasound
【速读】:该论文旨在解决医学超声图像数据中因不同模态(如灰度和彩色多普勒血流成像,CFD)分布不均而导致的训练数据不平衡问题,进而提升深度学习模型在医学图像分类与分割任务中的性能。其解决方案的关键在于提出了一种生成式图像翻译方法,用于将CFD图像转换为对应的灰度图像,该方法融合了像素级损失、对抗损失和感知损失,并采用两个独立网络分别负责解剖结构重建与去噪,从而生成高质量且逼真的合成超声视频。实验表明,合成视频在结构相似性(SSIM)指标上接近真实视频(平均0.91±0.04),并在深度学习分类、分割任务及临床专家盲评中表现优异,验证了其真实性和泛化能力。
链接: https://arxiv.org/abs/2511.03255
作者: Nikolina Tomic,Roshni Bhatnagar,Sarthak Jain,Connor Lau,Tien-Yu Liu,Laura Gambini,Rima Arnaout
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:
Abstract:Deep learning (DL) has the potential to revolutionize image acquisition and interpretation across medicine, however, attention to data imbalance and missingness is required. Ultrasound data presents a particular challenge because in addition to different views and structures, it includes several sub-modalities-such as greyscale and color flow doppler (CFD)-that are often imbalanced in clinical studies. Image translation can help balance datasets but is challenging for ultrasound sub-modalities to date. Here, we present a generative method for ultrasound CFD-greyscale video translation, trained on 54,975 videos and tested on 8,368. The method developed leveraged pixel-wise, adversarial, and perceptual loses and utilized two networks: one for reconstructing anatomic structures and one for denoising to achieve realistic ultrasound imaging. Average pairwise SSIM between synthetic videos and ground truth was 0.91+/-0.04. Synthetic videos performed indistinguishably from real ones in DL classification and segmentation tasks and when evaluated by blinded clinical experts: F1 score was 0.9 for real and 0.89 for synthetic videos; Dice score between real and synthetic segmentation was 0.97. Overall clinician accuracy in distinguishing real vs synthetic videos was 54+/-6% (42-61%), indicating realistic synthetic videos. Although trained only on heart videos, the model worked well on ultrasound spanning several clinical domains (average SSIM 0.91+/-0.05), demonstrating foundational abilities. Together, these data expand the utility of retrospectively collected imaging and augment the dataset design toolbox for medical imaging.
zh
[CV-19] Decoupled Multi-Predictor Optimization for Inference-Efficient Model Tuning ICCV2025
【速读】:该论文旨在解决多阶段预测器(multi-stage predictors)中早期阶段难以同时提供低层次基础特征与高层次判别特征的问题,这一矛盾限制了模型在推理效率与性能之间的平衡。解决方案的关键在于提出解耦式多预测器优化方法(Decoupled Multi-Predictor Optimization, DMPO),其核心创新包括:1)在架构层面引入轻量级旁路模块(bypass module),实现浅层特征的功能分解,使早期阶段专注于提取低层次代表性特征;2)设计基于高阶统计量的预测器(high-order statistics-based predictor),增强早期阶段的判别能力;3)提出分阶段损失权重分配策略,通过两阶段优化机制——初始阶段侧重于深度阶段的判别能力学习,后期阶段逐步将判别能力向早期阶段迁移,从而在结构和训练策略上有效解耦早期阶段的代表性能力与判别能力。
链接: https://arxiv.org/abs/2511.03245
作者: Liwei Luo,Shuaitengyuan Li,Dongwei Ren,Qilong Wang,Pengfei Zhu,Qinghua Hu
机构: Tianjin University (天津大学); Low-Altitude Intelligence Lab, Xiong’an National Innovation Center Technology Co., Ltd (雄安国家创新中心科技有限公司)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Accepted by ICCV2025
Abstract:Recently, remarkable progress has been made in large-scale pre-trained model tuning, and inference efficiency is becoming more crucial for practical deployment. Early exiting in conjunction with multi-stage predictors, when cooperated with a parameter-efficient fine-tuning strategy, offers a straightforward way to achieve an inference-efficient model. However, a key challenge remains unresolved: How can early stages provide low-level fundamental features to deep stages while simultaneously supplying high-level discriminative features to early-stage predictors? To address this problem, we propose a Decoupled Multi-Predictor Optimization (DMPO) method to effectively decouple the low-level representative ability and high-level discriminative ability in early stages. First, in terms of architecture, we introduce a lightweight bypass module into multi-stage predictors for functional decomposition of shallow features from early stages, while a high-order statistics-based predictor is developed for early stages to effectively enhance their discriminative ability. To reasonably train our multi-predictor architecture, a decoupled optimization is proposed to allocate two-phase loss weights for multi-stage predictors during model tuning, where the initial training phase enables the model to prioritize the acquisition of discriminative ability of deep stages via emphasizing representative ability of early stages, and the latter training phase drives discriminative ability towards earlier stages as much as possible. As such, our DMPO can effectively decouple representative and discriminative abilities in early stages in terms of architecture design and model optimization. Experiments across various datasets and pre-trained backbones demonstrate that DMPO clearly outperforms its counterparts when reducing computational cost.
zh
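DMPO 的“两阶段损失权重分配”可以用一个线性插值的权重调度示意:前期偏重深层预测器,后期将权重逐步移向早期预测器。以下调度函数的具体形状为本文假设,仅用于说明思路:

```python
import numpy as np

def stage_weights(step, total_steps, num_stages=4):
    """两阶段损失权重的示意调度(非论文原式):
    训练前期偏重深层预测器,后期将权重逐步移向早期预测器。"""
    t = min(step / total_steps, 1.0)
    deep_first = np.linspace(0.1, 1.0, num_stages)   # 深层权重高
    early_first = np.linspace(1.0, 0.1, num_stages)  # 早期权重高
    w = (1 - t) * deep_first + t * early_first
    return w / w.sum()  # 归一化为凸组合

if __name__ == "__main__":
    w0 = stage_weights(0, 100)
    w1 = stage_weights(100, 100)
    print(w0[-1] > w0[0])  # 前期:最深层预测器权重最大 -> True
    print(w1[0] > w1[-1])  # 后期:最早期预测器权重最大 -> True
```

总损失即 `sum(w[i] * loss_stage[i])`,权重随训练推进平滑迁移,对应摘要中“将判别能力逐步驱动到更早阶段”的优化策略。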
[CV-20] A Feedback-Control Framework for Efficient Dataset Collection from In-Vehicle Data Streams
【速读】:该论文旨在解决当前数据驱动型AI系统中因数据采集方式低效而导致的冗余样本积累、存储成本高及模型泛化能力受限的问题。现有数据收集多采用开环模式,缺乏对已收集数据分布的实时反馈,难以保证数据质量与多样性。其解决方案的关键在于提出一种闭环控制范式——反馈引导的数据采集(Feedback-driven Data Collection, FCDC),该方法通过在线概率模型持续近似数据分布状态,并基于似然值和马氏距离等反馈信号自适应调节样本保留策略,从而在探索(exploration)与利用(exploitation)之间动态平衡,有效维持数据集多样性并抑制冗余积累。实验表明,FCDC可在真实数据流上实现25.9%的数据分布均衡性提升和39.8%的存储空间节省。
链接: https://arxiv.org/abs/2511.03239
作者: Philipp Reis,Philipp Rigoll,Christian Steinhauser,Jacob Langner,Eric Sax
机构: 未知
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:Modern AI systems are increasingly constrained not by model capacity but by the quality and diversity of their data. Despite growing emphasis on data-centric AI, most datasets are still gathered in an open-loop manner which accumulates redundant samples without feedback from the current coverage. This results in inefficient storage, costly labeling, and limited generalization. To address this, this paper introduces FCDC, a paradigm that formulates data collection as a closed-loop control problem. FCDC continuously approximates the state of the collected data distribution using an online probabilistic model and adaptively regulates sample retention based on feedback signals such as likelihood and Mahalanobis distance. Through this feedback mechanism, the system dynamically balances exploration and exploitation, maintains dataset diversity, and prevents redundancy from accumulating over time. Besides showcasing the controllability of FCDC on a synthetic dataset, experiments on a real data stream show that FCDC produces datasets that are 25.9% more balanced while reducing data storage by 39.8%. These results demonstrate that data collection itself can be actively controlled, transforming collection from a passive pipeline stage into a self-regulating, feedback-driven process at the core of data-centric AI.
zh
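FCDC 的闭环采集逻辑可以用一个极简示意还原:用在线高斯模型近似已收集数据的分布,新样本的马氏距离低于阈值即判为冗余并丢弃,否则保留并更新模型。阈值、协方差估计方式等均为本文假设:

```python
import numpy as np

class FeedbackCollector:
    """FCDC 思路的极简示意:以在线高斯模型近似已收集数据的分布,
    马氏距离过小(冗余)则拒收,否则保留并更新分布估计。"""

    def __init__(self, dim, tau=2.0, eps=1e-6):
        self.n, self.tau, self.eps = 0, tau, eps
        self.mean = np.zeros(dim)
        self.cov = np.eye(dim)
        self.kept = []

    def mahalanobis(self, x):
        d = x - self.mean
        cov = self.cov + self.eps * np.eye(len(d))
        return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

    def offer(self, x):
        x = np.asarray(x, dtype=float)
        if self.n >= 2 and self.mahalanobis(x) < self.tau:
            return False  # 反馈信号:与已有分布过近,判为冗余而拒收
        self.kept.append(x)
        self.n += 1
        self.mean += (x - self.mean) / self.n  # 在线更新均值
        if self.n >= 2:
            # 由全部已保留样本重估协方差(示意做法)
            self.cov = np.cov(np.stack(self.kept), rowvar=False)
        return True

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    col = FeedbackCollector(dim=2, tau=2.5)
    redundant = rng.normal(0.0, 0.1, size=(200, 2))  # 高度冗余的样本流
    kept = sum(col.offer(x) for x in redundant)
    print(kept < 200)  # 大部分冗余样本被闭环反馈拒收 -> True
```

与开环采集“来者皆收”不同,保留与否取决于当前分布状态的反馈,这正是摘要所说把采集从被动管道环节变为自调节过程。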
[CV-21] ransformer-Progressive Mamba Network for Lightweight Image Super-Resolution
【速读】:该论文旨在解决现有基于Mamba的超分辨率(Super-Resolution, SR)方法在多尺度特征建模中缺乏细粒度过渡的问题,从而限制了特征表示效率。其解决方案的关键在于提出T-PMambaSR框架,通过引入基于窗口的自注意力机制与渐进式Mamba(Progressive Mamba)相结合,实现不同感受野尺度间的交互,构建一种具有线性复杂度的细粒度建模范式,以逐步增强特征表达能力;同时设计自适应高频细节恢复模块(Adaptive High-Frequency Refinement Module, AHFRM),有效恢复Transformer和Mamba处理过程中丢失的高频信息,从而在保持低计算成本的同时提升重建质量。
链接: https://arxiv.org/abs/2511.03232
作者: Sichen Guo,Wenjie Li,Yuanyang Liu,Guangwei Gao,Jian Yang,Chia-Wen Lin
机构: Bell Honors School, Nanjing University of Posts and Telecommunications (南京邮电大学贝尔英才学院); Pattern Recognition and Intelligent System Laboratory, School of Artificial Intelligence, Beijing University of Posts and Telecommunications (北京邮电大学人工智能学院模式识别与智能系统实验室); PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology (南京理工大学计算机科学与工程学院PCA实验室); Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University (国立清华大学电机工程系及通讯工程研究所)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 12 pages, 10 figures, 7 tables
Abstract:Recently, Mamba-based super-resolution (SR) methods have demonstrated the ability to capture global receptive fields with linear complexity, addressing the quadratic computational cost of Transformer-based SR approaches. However, existing Mamba-based methods lack fine-grained transitions across different modeling scales, which limits the efficiency of feature representation. In this paper, we propose T-PMambaSR, a lightweight SR framework that integrates window-based self-attention with Progressive Mamba. By enabling interactions among receptive fields of different scales, our method establishes a fine-grained modeling paradigm that progressively enhances feature representation with linear complexity. Furthermore, we introduce an Adaptive High-Frequency Refinement Module (AHFRM) to recover high-frequency details lost during Transformer and Mamba processing. Extensive experiments demonstrate that T-PMambaSR progressively enhances the model’s receptive field and expressiveness, yielding better performance than recent Transformer- or Mamba-based methods while incurring lower computational cost. Our codes will be released after acceptance.
zh
[CV-22] Diffusion-Guided Mask-Consistent Paired Mixing for Endoscopic Image Segmentation
【速读】:该论文旨在解决密集预测任务中数据增强方法的局限性问题:传统样本混合(sample mixing)虽能提升鲁棒性,但因掩码(mask)对齐不一致导致软标签歧义;而扩散生成合成(diffusion synthesis)虽增加多样性,却常忽略掩码条件带来的结构优势,并引入合成与真实数据之间的域偏移(domain shift)。其解决方案的关键在于提出一种成对的、扩散引导的范式(paired, diffusion-guided paradigm),即为每张真实图像生成一个在相同掩码约束下对应的合成图像,二者构成可控输入用于Mask-Consistent Paired Mixing (MCPMix),该方法仅混合图像外观而始终使用原始硬掩码作为监督信号,从而在共享几何结构下平滑地连接合成与真实图像,既扩展多样性又保持像素级语义一致性。此外,通过Real-Anchored Learnable Annealing (RLA) 动态调整混合强度和混合样本损失权重,在训练过程中逐步将优化锚定回真实数据,缓解分布偏差,最终实现更鲁棒且泛化能力强的内镜分割性能。
链接: https://arxiv.org/abs/2511.03219
作者: Pengyu Jie,Wanquan Liu,Rui He,Yihui Wen,Deyu Meng,Chenqiang Gao
机构: Sun Yat-sen University (中山大学); Xi’an Jiaotong University (西安交通大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:Augmentation for dense prediction typically relies on either sample mixing or generative synthesis. Mixing improves robustness but misaligned masks yield soft label ambiguity. Diffusion synthesis increases apparent diversity but, when trained as common samples, overlooks the structural benefit of mask conditioning and introduces synthetic-real domain shift. We propose a paired, diffusion-guided paradigm that fuses the strengths of both. For each real image, a synthetic counterpart is generated under the same mask and the pair is used as a controllable input for Mask-Consistent Paired Mixing (MCPMix), which mixes only image appearance while supervision always uses the original hard mask. This produces a continuous family of intermediate samples that smoothly bridges synthetic and real appearances under shared geometry, enlarging diversity without compromising pixel-level semantics. To keep learning aligned with real data, Real-Anchored Learnable Annealing (RLA) adaptively adjusts the mixing strength and the loss weight of mixed samples over training, gradually re-anchoring optimization to real data and mitigating distributional bias. Across Kvasir-SEG, PICCOLO, CVC-ClinicDB, a private NPC-LES cohort, and ISIC 2017, the approach achieves state-of-the-art segmentation performance and consistent gains over baselines. The results show that combining label-preserving mixing with diffusion-driven diversity, together with adaptive re-anchoring, yields robust and generalizable endoscopic segmentation.
zh
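MCPMix 的核心约束是“只混合图像外观,监督始终使用原始硬掩码”,可以用几行代码说明(λ 的采样策略为本文假设,此类方法常从 Beta 分布采样):

```python
import numpy as np

def mcpmix(real_img, syn_img, hard_mask, lam=0.5):
    """Mask-Consistent Paired Mixing 的示意:真实图与同掩码生成的合成图
    只做外观插值,标签保持为原始硬掩码(lam 采样策略为假设)。"""
    mixed = lam * real_img + (1.0 - lam) * syn_img  # 共享几何下的外观插值
    return mixed, hard_mask                          # 监督信号不被混合

if __name__ == "__main__":
    real = np.zeros((4, 4))
    syn = np.ones((4, 4))
    mask = np.zeros((4, 4), dtype=int)
    mask[1:3, 1:3] = 1
    mixed, label = mcpmix(real, syn, mask, lam=0.25)
    print(float(mixed[0, 0]))     # 0.25*0 + 0.75*1 = 0.75
    print((label == mask).all())  # 硬掩码原样保留 -> True
```

由于两张图在同一掩码条件下生成、几何完全一致,插值只改变外观而不产生常规 Mixup 中的软标签歧义;遍历 λ∈[0,1] 即得到摘要所说连接合成与真实外观的连续样本族。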
[CV-23] MvBody: Multi-View-Based Hybrid Transformer Using Optical 3D Body Scan for Explainable Cesarean Section Prediction
【速读】:该论文旨在解决在医疗资源有限或居家环境中难以准确预测剖宫产(Cesarean Section, CS)风险的问题,尤其针对现有模型多依赖分娩期间医院内获取的参数、不适用于非院内场景的局限性。其解决方案的关键在于提出一种基于多视角Transformer架构的新型网络MvBody,仅利用孕31至38周期间采集的自报医疗数据与3D光学体形扫描信息进行CS风险预测,并引入度量学习损失以提升小样本环境下的训练效率和模型泛化能力。实验表明,该方法在独立测试集上达到84.62%的准确率和0.724的AUC-ROC值,显著优于传统机器学习及先进3D分析方法,且通过集成梯度(Integrated Gradients)算法提供可解释性,明确了预孕期体重、产妇年龄、产科史、既往剖宫产史以及头部和肩部区域体形特征为关键预测因素。
链接: https://arxiv.org/abs/2511.03212
作者: Ruting Cheng,Boyuan Feng,Yijiang Zheng,Chuhui Qiu,Aizierjiang Aiersilan,Joaquin A. Calderon,Wentao Zhao,Qing Pan,James K. Hahn
机构: George Washington University (乔治·华盛顿大学); Waseda University (早稻田大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 19 pages, 4 figures
Abstract:Accurately assessing the risk of cesarean section (CS) delivery is critical, especially in settings with limited medical resources, where access to healthcare is often restricted. Early and reliable risk prediction allows better-informed prenatal care decisions and can improve maternal and neonatal outcomes. However, most existing predictive models are tailored for in-hospital use during labor and rely on parameters that are often unavailable in resource-limited or home-based settings. In this study, we conduct a pilot investigation to examine the feasibility of using 3D body shape for CS risk assessment for future applications with more affordable general devices. We propose a novel multi-view-based Transformer network, MvBody, which predicts CS risk using only self-reported medical data and 3D optical body scans obtained between the 31st and 38th weeks of gestation. To enhance training efficiency and model generalizability in data-scarce environments, we incorporate a metric learning loss into the network. Compared to widely used machine learning models and the latest advanced 3D analysis methods, our method demonstrates superior performance, achieving an accuracy of 84.62% and an Area Under the Receiver Operating Characteristic Curve (AUC-ROC) of 0.724 on the independent test set. To improve transparency and trust in the model’s predictions, we apply the Integrated Gradients algorithm to provide theoretically grounded explanations of the model’s decision-making process. Our results indicate that pre-pregnancy weight, maternal age, obstetric history, previous CS history, and body shape, particularly around the head and shoulders, are key contributors to CS risk prediction.
zh
[CV-24] QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
【速读】:该论文旨在解决多图像场景下多模态大语言模型(Multimodal Large Language Models, MLLMs)面临的两个核心问题:一是跨不同图像的细粒度感知能力不足,二是难以有效推理并融合多个视觉输入的信息。现有提示方法在单图或受限场景中表现尚可,但在通用且复杂的多图像推理任务中存在明显短板。解决方案的关键在于提出一种零样本提示方法——问题引导的链式描述(Question-Guided Chain-of-Captions, QG-CoC),该方法通过将问题引导的逐图描述与逻辑链式推理相结合,实现对任意数量图像的细粒度感知与信息整合,从而显著提升模型在复杂多图像任务中的性能表现。
链接: https://arxiv.org/abs/2511.03206
作者: Kuei-Chun Kao,Hsu Tzu-Yin,Yunqi Hong,Ruochen Wang,Cho-Jui Hsieh
机构: University of California, Los Angeles (加州大学洛杉矶分校)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 16 pages
Abstract:Recently, Multimodal Large Language Models (MLLMs) encounter two key issues in multi-image contexts: (1) a lack of fine-grained perception across disparate images, and (2) a diminished capability to effectively reason over and synthesize information from multiple visual inputs. However, while various prompting methods aim to describe visual content, many existing studies focus primarily on single-image settings or specific, constrained scenarios. This leaves a critical gap in understanding and addressing how MLLMs tackle more general and complex multi-image reasoning tasks. Thus, we first extensively investigate how current prompting methods perceive fine-grained visual details and process visual information when dealing with multiple images. Our findings reveal that existing prompting methods fall short in attending to needed clues and seamlessly integrating perception and reasoning. Inspired by the findings, we propose a new zero-shot prompting method, Question-Guided Chain-of-Captions (QG-CoC), a generalized prompting approach that effectively handles problems with an arbitrary number of images. We evaluate our method on various open-source and closed-source MLLMs for multi-image and single-image benchmarks. Experimental results indicate that QG-CoC demonstrates competitive performance across tasks and exhibits robust improvements in the challenging scenarios where existing prompting methods fail.
zh
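QG-CoC 作为零样本提示方法,其流程可以用提示词组装的极简示意还原:先以问题引导逐图生成描述,再在所有描述之上做链式推理。具体措辞为本文假设,并非论文模板原文:

```python
def qg_coc_prompt(question, captions):
    """QG-CoC 两阶段提示的极简示意(措辞为假设):
    第一阶段已按问题逐图生成 caption,此处组装第二阶段的联合推理提示。"""
    lines = [f"Question: {question}", "Per-image, question-guided captions:"]
    for i, cap in enumerate(captions, 1):
        lines.append(f"Image {i} caption: {cap}")
    lines.append("Now reason over all captions step by step and answer the question.")
    return "\n".join(lines)

if __name__ == "__main__":
    p = qg_coc_prompt("Which image shows a red car?",
                      ["a blue truck on a road", "a red car parked outside"])
    print("Image 2 caption" in p)  # True:支持任意数量图像
```

与单图提示方法不同,caption 阶段由问题引导(只描述与问题相关的细节),推理阶段再跨图综合,这对应摘要中“细粒度感知”与“多图信息融合”两步的解耦。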
[CV-25] A Probabilistic U-Net Approach to Downscaling Climate Simulations NEURIPS2025
【速读】:该论文旨在解决气候模型因计算成本高而导致空间分辨率较粗,难以满足气候影响研究对细尺度信息需求的问题。其解决方案的关键在于采用一种概率U-Net架构,该架构在确定性U-Net主干网络基础上引入变分潜在空间(variational latent space),以捕捉由数据本身引起的随机不确定性(aleatoric uncertainty),并通过四种训练目标函数进行优化,其中WMSE-MS-SSIM在极端值模拟中表现优异,而afCRPS则更擅长刻画多尺度空间变异性。
链接: https://arxiv.org/abs/2511.03197
作者: Maryam Alipourhajiagha,Pierre-Louis Lemaire,Youssef Diouane,Julie Carreau
机构: Polytechnique Montréal (蒙特利尔理工学院)
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Atmospheric and Oceanic Physics (physics.ao-ph)
备注: NeurIPS 2025 AI4Science
Abstract:Climate models are limited by heavy computational costs, often producing outputs at coarse spatial resolutions, while many climate change impact studies require finer scales. Statistical downscaling bridges this gap, and we adapt the probabilistic U-Net for this task, combining a deterministic U-Net backbone with a variational latent space to capture aleatoric uncertainty. We evaluate four training objectives, afCRPS and WMSE-MS-SSIM with three settings, for downscaling precipitation and temperature from 16× coarser resolution. Our main finding is that WMSE-MS-SSIM performs well for extremes under certain settings, whereas afCRPS better captures spatial variability across scales.
zh
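摘要中作为训练目标之一的 CRPS(连续分级概率评分)可以由集合样本经验估计;afCRPS 的具体定义摘要未给出,下面仅演示标准的基于样本的 CRPS 估计公式 CRPS ≈ E|X - y| - 0.5·E|X - X'|:

```python
import numpy as np

def crps_ensemble(samples, y):
    """基于样本的 CRPS 经验估计(标准公式,并非论文中的 afCRPS 原式):
    CRPS ≈ E|X - y| - 0.5 * E|X - X'|,X、X' 为集合中独立抽取的成员。"""
    samples = np.asarray(samples, dtype=float)
    term1 = np.abs(samples - y).mean()
    term2 = np.abs(samples[:, None] - samples[None, :]).mean()
    return term1 - 0.5 * term2

if __name__ == "__main__":
    sharp = np.array([1.0, 1.0, 1.0, 1.0])  # 集中且准确的预测集合
    wide = np.array([0.0, 2.0, -1.0, 3.0])  # 分散的预测集合
    y = 1.0
    print(crps_ensemble(sharp, y) < crps_ensemble(wide, y))  # True:越准越锐,CRPS 越低
```

CRPS 同时奖励准确性(第一项)与合理的集合离散度(第二项),因此适合训练带变分潜在空间、可采样多个降尺度结果的概率 U-Net。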
[CV-26] PETWB-REP: A Multi-Cancer Whole-Body FDG PET/CT and Radiology Report Dataset for Medical Imaging Research
【速读】:该论文旨在解决当前公开可用的大规模医学影像数据集在整合功能与解剖成像(如18F-氟代葡萄糖(FDG)正电子发射断层扫描/计算机断层扫描,PET/CT)以及详细临床报告方面仍存在显著不足的问题。其解决方案的关键在于构建并发布一个名为PETWB-REP的结构化、去标识化的多模态数据集,该数据集包含490例不同恶性肿瘤患者的配对PET和CT图像、放射科报告文本及结构化临床元数据,从而为医学影像分析、放射组学、人工智能建模和多模态学习提供高质量的数据支持。
链接: https://arxiv.org/abs/2511.03194
作者: Le Xue,Gang Feng,Wenbo Zhang,Yichi Zhang,Lanlan Li,Shuqi Wang,Liling Peng,Sisi Peng,Xin Gao
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:Publicly available, large-scale medical imaging datasets are crucial for developing and validating artificial intelligence models and conducting retrospective clinical research. However, datasets that combine functional and anatomical imaging with detailed clinical reports across multiple cancer types remain scarce. Here, we present PETWB-REP, a curated dataset comprising whole-body 18F-Fluorodeoxyglucose (FDG) Positron Emission Tomography/Computed Tomography (PET/CT) scans and corresponding radiology reports from 490 patients diagnosed with various malignancies. The dataset primarily includes common cancers such as lung cancer, liver cancer, breast cancer, prostate cancer, and ovarian cancer. This dataset includes paired PET and CT images, de-identified textual reports, and structured clinical metadata. It is designed to support research in medical imaging, radiomics, artificial intelligence, and multi-modal learning.
zh
[CV-27] SurgAnt-ViVQA: Learning to Anticipate Surgical Events through GRU-Driven Temporal Cross-Attention
【速读】:该论文旨在解决内镜经鼻垂体手术中实时辅助决策的难题,即如何从有限视野和快速变化的手术流程中预测未来步骤、所需器械及剩余时间,以实现前瞻性(proactive)的视觉问答(VQA)推理。传统VQA系统多基于静态帧进行视觉-语言对齐,难以支持手术阶段的预测与规划。其关键解决方案是提出首个面向手术前瞻推理的数据集PitVQA-Anticipation(含33.5小时视频与734,769个问答对),并设计SurgAnt-ViVQA模型——该模型通过门控时序交叉注意力模块(GRU Gated Temporal Cross-Attention)实现帧间动态建模与细粒度视觉-语言融合:双向GRU编码帧间时序信息,自适应门控机制在token级别将视觉上下文注入语言流,结合参数高效微调使大语言模型适配手术领域。实验表明,该方法显著优于图像与视频基线模型,在时间预测精度和任务泛化能力上取得突破,验证了时序建模与门控融合对构建未来感知型手术辅助系统的重要性。
链接: https://arxiv.org/abs/2511.03178
作者: Shreyas C. Dhake,Jiayuan Huang,Runlong He,Danyal Z. Khan,Evangelos B. Mazomenos,Sophia Bano,Hani J. Marcus,Danail Stoyanov,Matthew J. Clarkson,Mobarak I. Hoque
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 12 pages
Abstract:Anticipating forthcoming surgical events is vital for real-time assistance in endonasal transsphenoidal pituitary surgery, where visibility is limited and workflow changes rapidly. Most visual question answering (VQA) systems reason on isolated frames with static vision-language alignment, providing little support for forecasting next steps or instrument needs. Existing surgical VQA datasets likewise center on the current scene rather than the near future. We introduce PitVQA-Anticipation, the first VQA dataset designed for forward-looking surgical reasoning. It comprises 33.5 hours of operative video and 734,769 question-answer pairs built from temporally grouped clips and expert annotations across four tasks: predicting the future phase, next step, upcoming instrument, and remaining duration. We further propose SurgAnt-ViVQA, a video-language model that adapts a large language model using a GRU Gated Temporal Cross-Attention module. A bidirectional GRU encodes frame-to-frame dynamics, while an adaptive gate injects visual context into the language stream at the token level. Parameter-efficient fine-tuning customizes the language backbone to the surgical domain. Evaluated on the PitVQA-Anticipation and EndoVis datasets, SurgAnt-ViVQA surpasses strong image- and video-based baselines. Ablations show that temporal recurrence and gated fusion drive most of the gains. A frame budget study indicates a trade-off: 8 frames maximize fluency, whereas 32 frames slightly reduce BLEU but improve numeric time estimation. By pairing a temporally aware encoder with fine-grained gated cross-attention, SurgAnt-ViVQA advances surgical VQA from retrospective description to proactive anticipation. PitVQA-Anticipation offers a comprehensive benchmark for this setting and highlights the importance of targeted temporal modeling for reliable, future-aware surgical assistance.
zh
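The abstract describes an adaptive gate that injects visual context into the language stream at the token level. A minimal, hypothetical sketch of that gating step (the actual module also involves a bidirectional GRU and full cross-attention, omitted here):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_inject(text_tok, vis_ctx, w_gate, b_gate):
    """Residual-style injection of a visual context vector into one language
    token: fused = text + g * visual, with gate g learned from both features."""
    feats = text_tok + vis_ctx  # concatenation [text; visual]
    g = sigmoid(sum(w * f for w, f in zip(w_gate, feats)) + b_gate)
    return [t + g * v for t, v in zip(text_tok, vis_ctx)], g

# A strongly negative gate bias closes the gate: the token passes through
# almost unchanged, so the model can ignore irrelevant visual context.
fused, g = gated_inject([1.0, 0.0], [0.5, 0.5], [0.0] * 4, -20.0)
```

The weight vector `w_gate` and bias `b_gate` are illustrative placeholders for learned parameters.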
[CV-28] Subsampled Randomized Fourier GaLore for Adapting Foundation Models in Depth-Driven Liver Landmark Segmentation
【速读】:该论文旨在解决腹腔镜肝手术中因2D视频流导致的深度感知受限和解剖标志定位困难的问题,核心挑战在于如何有效融合RGB图像与深度信息,并高效适配大规模视觉模型至手术场景。解决方案的关键在于提出一种基于深度引导的肝脏标志分割框架,通过集成语义与几何线索实现精确分割:首先利用Segment Anything Model V2(SAM2)编码器提取RGB特征,Depth Anything V2(DA2)编码器提取深度感知特征;其次引入SRFT-GaLore方法——一种基于子采样随机傅里叶变换(Subsampled Randomized Fourier Transform, SRFT)的低秩梯度投影策略,替代传统SVD以高效微调高维注意力层;最后设计交叉注意力融合模块整合多模态特征。实验表明,该框架在公开L3D数据集上Dice相似系数提升4.85%,平均对称表面距离降低11.78点,并在新构建的腹腔镜肝脏手术数据集(LLSD)上展现出优异跨数据集泛化能力,验证了其在实时、深度受限的外科环境中具备可扩展性和高精度。
链接: https://arxiv.org/abs/2511.03163
作者: Yun-Chen Lin,Jiayuan Huang,Hanyuan Zhang,Sergi Kavtaradze,Matthew J. Clarkson,Mobarak I. Hoque
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 12 pages
Abstract:Accurate detection and delineation of anatomical structures in medical imaging are critical for computer-assisted interventions, particularly in laparoscopic liver surgery where 2D video streams limit depth perception and complicate landmark localization. While recent works have leveraged monocular depth cues for enhanced landmark detection, challenges remain in fusing RGB and depth features and in efficiently adapting large-scale vision models to surgical domains. We propose a depth-guided liver landmark segmentation framework integrating semantic and geometric cues via vision foundation encoders. We employ the Segment Anything Model V2 (SAM2) encoder to extract RGB features and the Depth Anything V2 (DA2) encoder to extract depth-aware features. To efficiently adapt SAM2, we introduce SRFT-GaLore, a novel low-rank gradient projection method that replaces the computationally expensive SVD with a Subsampled Randomized Fourier Transform (SRFT). This enables efficient fine-tuning of high-dimensional attention layers without sacrificing representational power. A cross-attention fusion module further integrates RGB and depth cues. To assess cross-dataset generalization, we also construct a new Laparoscopic Liver Surgical Dataset (LLSD) as an external validation benchmark. On the public L3D dataset, our method achieves a 4.85% improvement in Dice Similarity Coefficient and a 11.78-point reduction in Average Symmetric Surface Distance compared to D2GPLand. To further assess generalization capability, we evaluate our model on the LLSD dataset. Our model maintains competitive performance and significantly outperforms SAM-based baselines, demonstrating strong cross-dataset robustness and adaptability to unseen surgical environments. These results demonstrate that our SRFT-GaLore-enhanced dual-encoder framework enables scalable and precise segmentation under real-time, depth-constrained surgical settings.
zh
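The core idea behind GaLore-style methods such as SRFT-GaLore is to keep optimizer state in a low-rank subspace of each gradient matrix. A toy sketch of the projection round-trip, using a plain Gaussian random projection as a stand-in for the SRFT-derived basis (an assumption for illustration, not the paper's transform):

```python
import random

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def project_gradient(G, r, seed=0):
    """Project gradient G (m x n) into an r-dimensional subspace and back.
    The optimizer state would live in the small r x n representation."""
    rng = random.Random(seed)
    m = len(G)
    P = [[rng.gauss(0.0, 1.0 / r ** 0.5) for _ in range(r)] for _ in range(m)]
    Pt = [list(row) for row in zip(*P)]
    low_rank = matmul(Pt, G)       # r x n: what the optimizer sees
    return matmul(P, low_rank)     # m x n: update mapped back to full size

G = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
G_hat = project_gradient(G, r=2)
print(len(G_hat), len(G_hat[0]))  # 3 2
```

Replacing the dense Gaussian sketch with an SRFT (random signs, FFT, row subsampling) gives the same shape bookkeeping at lower cost, which is the paper's efficiency claim.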
[CV-29] Finetuning-Free Personalization of Text to Image Generation via Hypernetworks
【速读】:该论文旨在解决文本到图像扩散模型个性化过程中依赖高计算成本的微调(fine-tuning)方法所带来的效率瓶颈问题,尤其是传统方法如DreamBooth在推理阶段仍需针对每个主体进行优化,导致部署复杂且延迟较高。解决方案的关键在于提出一种无需微调的个性化框架,利用超网络(Hypernetwork)直接从主体图像预测LoRA(Low-Rank Adaptation)适配权重,从而实现端到端训练,并通过简单的输出正则化稳定训练过程,确保个性化效果与提示词对齐性;此外,引入混合模型无分类器指导(Hybrid-Model Classifier-Free Guidance, HM-CFG)以增强推理时的组合泛化能力,使基座扩散模型的结构先验与个性化模型的主题保真度协同作用,显著提升性能并降低部署开销。
链接: https://arxiv.org/abs/2511.03156
作者: Sagar Shrestha,Gopal Sharma,Luowei Zhou,Suren Kumar
机构: Samsung AI Center Mountain View (三星人工智能中心山景城); Work done during an internship at Samsung AI Center Mountain View. (实习期间工作地点为三星人工智能中心山景城)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:Personalizing text-to-image diffusion models has traditionally relied on subject-specific fine-tuning approaches such as DreamBooth (Ruiz et al., 2023), which are computationally expensive and slow at inference. Recent adapter- and encoder-based methods attempt to reduce this overhead but still depend on additional fine-tuning or large backbone models for satisfactory results. In this work, we revisit an orthogonal direction: fine-tuning-free personalization via Hypernetworks that predict LoRA-adapted weights directly from subject images. Prior hypernetwork-based approaches, however, suffer from costly data generation or unstable attempts to mimic base model optimization trajectories. We address these limitations with an end-to-end training objective, stabilized by a simple output regularization, yielding reliable and effective hypernetworks. Our method removes the need for per-subject optimization at test time while preserving both subject fidelity and prompt alignment. To further enhance compositional generalization at inference time, we introduce Hybrid-Model Classifier-Free Guidance (HM-CFG), which combines the compositional strengths of the base diffusion model with the subject fidelity of personalized models during sampling. Extensive experiments on CelebA-HQ, AFHQ-v2, and DreamBench demonstrate that our approach achieves strong personalization performance and highlights the promise of hypernetworks as a scalable and effective direction for open-category personalization.
zh
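Standard classifier-free guidance extrapolates from an unconditional prediction toward a conditional one; HM-CFG, per the abstract, mixes guidance from the base and personalized models during sampling. The abstract does not give the exact weighting, so the combination rule below is an assumption for illustration only:

```python
def hybrid_cfg(eps_uncond, eps_base, eps_pers, s_base, s_pers):
    """Hypothetical hybrid CFG: add guidance directions from both the base
    model (compositional strength) and the personalized model (subject
    fidelity) to the unconditional noise prediction."""
    return [u + s_base * (b - u) + s_pers * (p - u)
            for u, b, p in zip(eps_uncond, eps_base, eps_pers)]
```

With `s_pers = 0` this reduces to ordinary CFG on the base model, and with `s_base = 0` to CFG on the personalized model, which is the trade-off the method exposes.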
[CV-30] st Time Adaptation Using Adaptive Quantile Recalibration
【速读】:该论文旨在解决深度学习模型在真实场景中因测试分布与训练域差异而导致泛化能力下降的问题,尤其是传统领域自适应方法依赖目标域先验知识或需重新训练模型,难以适用于动态或资源受限环境。其解决方案的关键在于提出一种无需重训练的测试时自适应方法——自适应分位数校准(Adaptive Quantile Recalibration, AQR),通过通道级分位数对齐来重塑预激活分布,从而捕获激活分布的整体形状,并兼容多种归一化层(如BatchNorm、GroupNorm和LayerNorm)。AQR利用训练时计算的源域统计量,在不依赖目标域标签的情况下实现稳定且精确的分布对齐,尤其针对小批量下尾部分布估计不稳定的问题引入鲁棒的尾部校准策略,显著提升了跨数据集和架构的适应性能。
链接: https://arxiv.org/abs/2511.03148
作者: Paria Mehrbod,Pedro Vianna,Geraldin Nanfack,Guy Wolf,Eugene Belilovsky
机构: Concordia University (康考迪亚大学); Mila – Quebec AI Institute (魁北克人工智能研究所); Université de Montréal (蒙特利尔大学)
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:Domain adaptation is a key strategy for enhancing the generalizability of deep learning models in real-world scenarios, where test distributions often diverge significantly from the training domain. However, conventional approaches typically rely on prior knowledge of the target domain or require model retraining, limiting their practicality in dynamic or resource-constrained environments. Recent test-time adaptation methods based on batch normalization statistic updates allow for unsupervised adaptation, but they often fail to capture complex activation distributions and are constrained to specific normalization layers. We propose Adaptive Quantile Recalibration (AQR), a test-time adaptation technique that modifies pre-activation distributions by aligning quantiles on a channel-wise basis. AQR captures the full shape of activation distributions and generalizes across architectures employing BatchNorm, GroupNorm, or LayerNorm. To address the challenge of estimating distribution tails under varying batch sizes, AQR incorporates a robust tail calibration strategy that improves stability and precision. Our method leverages source-domain statistics computed at training time, enabling unsupervised adaptation without retraining models. Experiments on CIFAR-10-C, CIFAR-100-C, and ImageNet-C across multiple architectures demonstrate that AQR achieves robust adaptation across diverse settings, outperforming existing test-time adaptation baselines. These results highlight AQR’s potential for deployment in real-world scenarios with dynamic and unpredictable data distributions.
zh
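AQR, as described above, aligns pre-activation distributions channel-wise by quantile matching against source-domain statistics. A nearest-rank sketch of that matching for a single channel (the paper's robust tail calibration is omitted):

```python
def quantile_recalibrate(target, source):
    """Map each target-domain activation to the source-domain value at the
    same empirical quantile (nearest-rank, one channel)."""
    src = sorted(source)
    tgt = sorted(target)
    n = len(tgt)
    out = []
    for x in target:
        rank = tgt.index(x)                      # empirical rank in the batch
        q = rank / (n - 1) if n > 1 else 0.5     # quantile level of x
        out.append(src[round(q * (len(src) - 1))])  # source value at that level
    return out

# A shifted target batch is pulled back onto the source distribution.
print(quantile_recalibrate([11.0, 12.0, 13.0], [1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0]
```

Because the mapping uses the full set of quantiles rather than just mean and variance, it can correct skew and heavy tails that BatchNorm-statistic updates miss.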
[CV-31] Scheduling the Off-Diagonal Weingarten Loss of Neural SDFs for CAD Models
【速读】:该论文旨在解决神经隐式表示中用于CAD重建的符号距离函数(SDF)在优化过程中因缺乏有效曲率正则化而导致的伪影(spurious warp)和结构保真度下降的问题。现有方法如FlatCAD虽引入了非对角Weingarten(Off-Diagonal Weingarten, ODW)损失作为高效的二阶先验以近似完整Hessian正则化,但其采用固定权重策略在训练全程施加相同强度的约束,导致早期优化稳定但后期细节恢复受限。解决方案的关键在于设计时间可变的ODW损失调度策略——初始赋予较高权重以稳定优化过程,随后逐步衰减以释放对细粒度结构的约束,从而实现从粗到精的渐进式重建。实验表明,该策略显著优于固定权重方案,在ABC CAD数据集上将Chamfer Distance提升最高达35%。
链接: https://arxiv.org/abs/2511.03147
作者: Haotian Yin,Przemyslaw Musialski
机构: 未知
类目: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注: Lecture Notes in Computer Science (LNCS), 20th International Symposium on Visual Computing 2025, 12 pages, 4 figures, preprint
Abstract:Neural signed distance functions (SDFs) have become a powerful representation for geometric reconstruction from point clouds, yet they often require both gradient- and curvature-based regularization to suppress spurious warp and preserve structural fidelity. FlatCAD introduced the Off-Diagonal Weingarten (ODW) loss as an efficient second-order prior for CAD surfaces, approximating full-Hessian regularization at roughly half the computational cost. However, FlatCAD applies a fixed ODW weight throughout training, which is suboptimal: strong regularization stabilizes early optimization but suppresses detail recovery in later stages. We present scheduling strategies for the ODW loss that assign a high initial weight to stabilize optimization and progressively decay it to permit fine-scale refinement. We investigate constant, linear, quintic, and step interpolation schedules, as well as an increasing warm-up variant. Experiments on the ABC CAD dataset demonstrate that time-varying schedules consistently outperform fixed weights. Our method achieves up to a 35% improvement in Chamfer Distance over the FlatCAD baseline, establishing scheduling as a simple yet effective extension of curvature regularization for robust CAD reconstruction.
zh
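The scheduling idea above is simple to state in code: start with a high ODW weight for stability and decay it to allow fine-scale refinement. A sketch of the four schedule families named in the abstract (the quintic curve here is a plain `s**5` ease-in; the paper's exact interpolation polynomial may differ):

```python
def odw_weight(t, T, w0, w1, schedule="linear"):
    """Time-varying ODW loss weight: decay from w0 (early, stabilizing)
    to w1 (late, detail-friendly) over T training steps."""
    s = t / T
    if schedule == "constant":
        return w0
    if schedule == "linear":
        return w0 + (w1 - w0) * s
    if schedule == "quintic":
        return w0 + (w1 - w0) * s ** 5   # stays near w0 longer, then drops
    if schedule == "step":
        return w0 if s < 0.5 else w1
    raise ValueError(schedule)
```

For example, `odw_weight(t, 10_000, 1.0, 0.05, "quintic")` keeps strong curvature regularization through most of training before releasing it.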
[CV-32] Deploying Rapid Damage Assessments from sUAS Imagery for Disaster Response
【速读】:该论文旨在解决在联邦宣布的灾害(如飓风Debby和Helene)中,无人飞行系统(sUAS)采集的海量影像数据超出人工处理能力的问题,从而导致灾后响应延迟。传统方式下,sUAS团队每日生成47–369GB影像数据,远超专家可及时分析的范围。解决方案的关键在于开发并部署首个用于sUAS影像建筑损毁评估的AI/ML系统,其核心包括:基于包含21,716个建筑损毁标注的全球最大后灾sUAS影像数据集进行模型训练,并通过91名灾害应急人员的实际操作验证模型性能;最终在飓风响应中实现了对415栋建筑的自动评估,仅耗时约18分钟,显著提升了灾情评估效率。
链接: https://arxiv.org/abs/2511.03132
作者: Thomas Manzini,Priyankari Perali,Robin R. Murphy
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
备注: 6 pages, 4 figures, 1 table. Accepted - In Press, IAAI’26
Abstract:This paper presents the first AI/ML system for automating building damage assessment in uncrewed aerial systems (sUAS) imagery to be deployed operationally during federally declared disasters (Hurricanes Debby and Helene). In response to major disasters, sUAS teams are dispatched to collect imagery of the affected areas to assess damage; however, at recent disasters, teams collectively delivered between 47GB and 369GB of imagery per day, representing more imagery than can reasonably be transmitted or interpreted by subject matter experts in the disaster scene, thus delaying response efforts. To alleviate this data avalanche encountered in practice, computer vision and machine learning techniques are necessary. While prior work has been deployed to automatically assess damage in satellite imagery, there is no current state of practice for sUAS-based damage assessment systems, as all known work has been confined to academic settings. This work establishes the state of practice via the development and deployment of models for building damage assessment with sUAS imagery. The model development involved training on the largest known dataset of post-disaster sUAS aerial imagery, containing 21,716 building damage labels, and the operational training of 91 disaster practitioners. The best performing model was deployed during the responses to Hurricanes Debby and Helene, where it assessed a combined 415 buildings in approximately 18 minutes. This work contributes documentation of the actual use of AI/ML for damage assessment during a disaster and lessons learned to the benefit of the AI/ML research and user communities.
zh
[CV-33] Accelerating Physical Property Reasoning for Augmented Visual Cognition
【速读】:该论文旨在解决视觉引导的物理属性推理(vision-guided physical property reasoning)在实际应用中存在高延迟的问题,从而阻碍了增强视觉认知(augmented visual cognition)的实时性与实用性。其核心解决方案在于通过算法与系统级优化相结合的方式显著降低推理管道的端到端延迟:包括快速几何三维重建、高效的语义特征融合以及并行视图编码等关键技术。这些优化使系统在保持甚至超越现有最先进方法(SOTA)在物体级物理属性估计(如质量)精度的同时,将处理时间从原本的10–20分钟缩短至6秒以内,并在材料分割和体素级推断方面表现更优,从而实现了在真实复杂环境(如IKEA门店)中基于智能眼镜的鲁棒物理属性估算。
链接: https://arxiv.org/abs/2511.03126
作者: Hongbo Lan,Zhenlin An,Haoyu Li,Vaibhav Singh,Longfei Shangguan
机构: University of Pittsburgh (匹兹堡大学); University of Georgia (佐治亚大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
备注:
Abstract:This paper introduces \sysname, a system that accelerates vision-guided physical property reasoning to enable augmented visual cognition. \sysname minimizes the run-time latency of this reasoning pipeline through a combination of both algorithmic and systematic optimizations, including rapid geometric 3D reconstruction, efficient semantic feature fusion, and parallel view encoding. Through these simple yet effective optimizations, \sysname reduces the end-to-end latency of this reasoning pipeline from 10–20 minutes to less than 6 seconds. A head-to-head comparison on the ABO dataset shows that \sysname achieves this 62.9×–287.2× speedup while not only reaching on-par (and sometimes slightly better) object-level physical property estimation accuracy (e.g., mass), but also demonstrating superior performance in material segmentation and voxel-level inference than two SOTA baselines. We further combine gaze-tracking with \sysname to localize the object of interest in cluttered, real-world environments, streamlining the physical property reasoning on smart glasses. The case study with Meta Aria Glasses conducted at an IKEA furniture store demonstrates that \sysname achieves consistently high performance compared to controlled captures, providing robust property estimations even with fewer views in real-world scenarios.
zh
[CV-34] Image-Intrinsic Priors for Integrated Circuit Defect Detection and Novel Class Discovery via Self-Supervised Learning
【速读】:该论文旨在解决集成电路(Integrated Circuit, IC)制造过程中缺陷检测与未知类别缺陷发现的难题,尤其针对传统监督方法依赖大量人工标注、难以应对罕见或新兴缺陷类型,以及基于聚类的无监督方法因缺乏先验信息而导致性能不稳定的问题。解决方案的关键在于提出一种无需支持集(support set free)的框架IC DefectNCD,其核心创新包括:1)通过自监督的正常信息引导机制聚合代表性正常特征,并利用重建残差粗略定位缺陷区域;2)设计自适应二值化策略以稳定提取聚焦于缺陷核心区域的子图像,缓解不同缺陷间显著性差异带来的干扰;3)引入软掩码引导的注意力机制,在师生模型中注入空间缺陷先验,提升对缺陷区域的敏感性并抑制背景噪声,从而实现对已知和未见缺陷的有效识别与分类。
链接: https://arxiv.org/abs/2511.03120
作者: Botong Zhao,Xubin Wang,Shujing Lyu,Yue Lu
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注:
Abstract:Integrated circuit manufacturing is highly complex, comprising hundreds of process steps. Defects can arise at any stage, causing yield loss and ultimately degrading product reliability. Supervised methods require extensive human annotation and struggle with emergent categories and rare, data scarce defects. Clustering-based unsupervised methods often exhibit unstable performance due to missing priors. We propose IC DefectNCD, a support set free framework that leverages Image Intrinsic Priors in IC SEM images for defect detection and novel class discovery. We first develop Self Normal Information Guided IC Defect Detection, aggregating representative normal features via a learnable normal information extractor and using reconstruction residuals to coarsely localize defect regions. To handle saliency variations across defects, we introduce an adaptive binarization strategy that produces stable subimages focused on core defective areas. Finally, we design Self Defect Information Guided IC Defect Classification, which incorporates a soft mask guided attention mechanism to inject spatial defect priors into the teacher student model. This enhances sensitivity to defective regions, suppresses background interference, and enables recognition and classification of unseen defects. We validate the approach on a real world dataset spanning three key fabrication stages and covering 15 defect types. Experiments demonstrate robust performance on both defect detection and unseen defect classification.
zh
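The coarse defect localization described above thresholds reconstruction residuals. A minimal 1D sketch with a mean-plus-k-sigma threshold standing in for the paper's adaptive binarization (the constant `k` is an illustrative assumption):

```python
def localize_defects(image, reconstruction, k=2.0):
    """Residual-based coarse localization: flag pixels whose reconstruction
    error exceeds an adaptive threshold (mean + k * std of the residuals)."""
    residuals = [abs(a - b) for a, b in zip(image, reconstruction)]
    n = len(residuals)
    mean = sum(residuals) / n
    std = (sum((r - mean) ** 2 for r in residuals) / n) ** 0.5
    thr = mean + k * std
    return [1 if r > thr else 0 for r in residuals]

# A normal-appearance reconstruction fails to explain the defective pixel,
# so its residual stands out against the adaptive threshold.
mask = localize_defects([0.1] * 9 + [0.9], [0.1] * 10)
print(mask)  # only the last pixel is flagged
```

Because the threshold is derived from the residual statistics of each image, it adapts to the saliency variations across defect types that the abstract highlights.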
[CV-35] DentalSplat: Dental Occlusion Novel View Synthesis from Sparse Intra-Oral Photographs
【速读】:该论文旨在解决正畸治疗中基于稀疏视角图像(仅含前视图和双侧颊面视图共三张)进行高质量三维重建的难题,尤其在远程医疗场景下因输入视图稀疏且缺乏相机位姿信息而导致重建质量下降的问题。解决方案的关键在于提出DentalSplat框架:首先利用先验引导的密集立体重建模型初始化点云,随后采用尺度自适应剪枝策略提升3D高斯泼溅(3D Gaussian Splatting, 3DGS)的训练效率与重建精度;在极端稀疏情况下,进一步引入光流作为几何约束并结合梯度正则化,显著增强渲染保真度,从而实现对牙合关系的有效可视化。
链接: https://arxiv.org/abs/2511.03099
作者: Yiyi Miao,Taoyu Wu,Tong Chen,Sihao Li,Ji Jiang,Youpeng Yang,Angelos Stefanidis,Limin Yu,Jionglong Su
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:In orthodontic treatment, particularly within telemedicine contexts, observing patients’ dental occlusion from multiple viewpoints facilitates timely clinical decision-making. Recent advances in 3D Gaussian Splatting (3DGS) have shown strong potential in 3D reconstruction and novel view synthesis. However, conventional 3DGS pipelines typically rely on densely captured multi-view inputs and precisely initialized camera poses, limiting their practicality. Orthodontic cases, in contrast, often comprise only three sparse images, specifically, the anterior view and bilateral buccal views, rendering the reconstruction task especially challenging. The extreme sparsity of input views severely degrades reconstruction quality, while the absence of camera pose information further complicates the process. To overcome these limitations, we propose DentalSplat, an effective framework for 3D reconstruction from sparse orthodontic imagery. Our method leverages a prior-guided dense stereo reconstruction model to initialize the point cloud, followed by a scale-adaptive pruning strategy to improve the training efficiency and reconstruction quality of 3DGS. In scenarios with extremely sparse viewpoints, we further incorporate optical flow as a geometric constraint, coupled with gradient regularization, to enhance rendering fidelity. We validate our approach on a large-scale dataset comprising 950 clinical cases and an additional video-based test set of 195 cases designed to simulate real-world remote orthodontic imaging conditions. Experimental results demonstrate that our method effectively handles sparse input scenarios and achieves superior novel view synthesis quality for dental occlusion visualization, outperforming state-of-the-art techniques.
zh
[CV-36] ISC-Perception: A Hybrid Computer Vision Dataset for Object Detection in Novel Steel Assembly
【速读】:该论文旨在解决建筑机器人在钢框架装配中感知能力不足的问题,特别是针对Intermeshed Steel Connection (ISC)组件检测缺乏专用图像数据集的挑战。现有方法受限于真实施工现场图像采集的高成本、安全风险和隐私问题,导致训练数据稀缺且标注效率低下。解决方案的关键在于构建首个专为ISC组件检测设计的混合数据集ISC-Perception,其融合了程序化生成的CAD图像、游戏引擎渲染的逼真场景以及少量人工精选的真实照片,实现了合成数据的全自动标注;同时通过系统性的人工时间投入分析表明,该方法相较纯人工标注可节省81.7%的时间(30.5小时 vs 166.7小时),显著提升数据生产效率。实验验证显示,基于该数据集训练的目标检测模型在mAP@0.50上达到0.756,并在1200帧基准测试中实现0.943/0.823的mAP@0.50/mAP@[0.50:0.95]性能,优于仅使用合成或仅使用逼真图像训练的模型,从而有效填补了施工机器人感知领域的数据空白。
链接: https://arxiv.org/abs/2511.03098
作者: Miftahur Rahman,Samuel Adebayo,Dorian A. Acevedo-Mejia,David Hester,Daniel McPolin,Karen Rafferty,Debra F. Laefer
机构: Queen’s University Belfast (贝尔法斯特女王大学); University of Texas at San Antonio (圣安东尼奥德克萨斯大学); New York University (纽约大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
备注:
Abstract:The Intermeshed Steel Connection (ISC) system, when paired with robotic manipulators, can accelerate steel-frame assembly and improve worker safety by eliminating manual assembly. Dependable perception is one of the initial stages for ISC-aware robots. However, this is hampered by the absence of a dedicated image corpus, as collecting photographs on active construction sites is logistically difficult and raises safety and privacy concerns. In response, we introduce ISC-Perception, the first hybrid dataset expressly designed for ISC component detection. It blends procedurally rendered CAD images, game-engine photorealistic scenes, and a limited, curated set of real photographs, enabling fully automatic labelling of the synthetic portion. We explicitly account for all human effort to produce the dataset, including simulation engine and scene setup, asset preparation, post-processing scripts and quality checks; our total human time to generate a 10,000-image dataset was 30.5 h versus 166.7 h for manual labelling at 60 s per image (-81.7%). A manual pilot on a representative image with five instances of ISC members took 60 s (maximum 80 s), anchoring the manual baseline. Detectors trained on ISC-Perception achieved a mean Average Precision at IoU 0.50 of 0.756, substantially surpassing models trained on synthetic-only or photorealistic-only data. On a 1,200-frame bench test, we report mAP@0.50/mAP@[0.50:0.95] of 0.943/0.823. By bridging the data gap for construction-robotics perception, ISC-Perception facilitates rapid development of custom object detectors and is freely available for research and industrial use upon request.
zh
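The labour figures quoted in the abstract are internally consistent; a quick check, assuming 60 s per image over the full 10,000-image set:

```python
# Annotation-effort comparison reported for ISC-Perception.
manual_per_image_s = 60
n_images = 10_000
manual_hours = manual_per_image_s * n_images / 3600   # manual-labelling total
hybrid_hours = 30.5                                    # reported human effort
saving = 1 - hybrid_hours / manual_hours
print(f"{manual_hours:.1f} h manual vs {hybrid_hours} h hybrid -> {saving:.1%} saved")
# 166.7 h manual vs 30.5 h hybrid -> 81.7% saved
```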
[CV-37] A Plug-and-Play Framework for Volumetric Light-Sheet Image Reconstruction
【速读】:该论文旨在解决传统光学成像在捕捉心脏跳动过程中动态细胞结构时面临的时空分辨率权衡问题(trade-off between spatial and temporal resolution),尤其是在低光强环境下难以实现高效、高清晰度成像的挑战。其解决方案的关键在于提出了一种融合压缩感知(Compressive Sensing, CS)与光片显微镜(Light-Sheet Microscopy, LSM)的高性能计算成像框架,通过数字微镜器件(Digital Micromirror Device, DMD)实现随机二进制掩码编码的压缩采样,并采用基于交替方向乘子法(ADMM)求解的“即插即用”(Plug-and-Play, PnP)框架,灵活集成多种先进去噪算子(如Tikhonov、总变差TV和BM3D),同时引入时间正则化以保持相邻z切片间的结构连续性,从而在高压缩比下仍能重建出清晰且低噪声的细胞结构图像,验证了该方法在高速、低光生物成像场景中的有效性与鲁棒性。
链接: https://arxiv.org/abs/2511.03093
作者: Yi Gong,Xinyuan Zhang,Jichen Chai,Yichen Ding,Yifei Lou
机构: The University of North Carolina at Chapel Hill (北卡罗来纳大学教堂山分校); The University of Texas at Dallas (德克萨斯大学达拉斯分校)
类目: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)
备注:
Abstract:Cardiac contraction is a rapid, coordinated process that unfolds across three-dimensional tissue on millisecond timescales. Traditional optical imaging is often inadequate for capturing dynamic cellular structure in the beating heart because of a fundamental trade-off between spatial and temporal resolution. To overcome these limitations, we propose a high-performance computational imaging framework that integrates Compressive Sensing (CS) with Light-Sheet Microscopy (LSM) for efficient, low-phototoxic cardiac imaging. The system performs compressed acquisition of fluorescence signals via random binary mask coding using a Digital Micromirror Device (DMD). We propose a Plug-and-Play (PnP) framework, solved using the alternating direction method of multipliers (ADMM), which flexibly incorporates advanced denoisers, including Tikhonov, Total Variation (TV), and BM3D. To preserve structural continuity in dynamic imaging, we further introduce temporal regularization enforcing smoothness between adjacent z-slices. Experimental results on zebrafish heart imaging under high compression ratios demonstrate that the proposed method successfully reconstructs cellular structures with excellent denoising performance and image clarity, validating the effectiveness and robustness of our algorithm in real-world high-speed, low-light biological imaging scenarios.
zh
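The PnP-ADMM loop sketched below alternates a data-fidelity update, a pluggable denoising step (Tikhonov, TV, or BM3D in the paper), and a scaled dual update. This is a toy 1D version under stated assumptions: a few gradient iterations stand in for the exact x-proximal step, and the operators/denoiser are caller-supplied:

```python
def pnp_admm(y, forward, adjoint, denoise, rho=1.0, iters=20):
    """Plug-and-Play ADMM sketch for y = A x + noise.
    x-step: gradient descent on ||Ax - y||^2 + rho||x - (z - u)||^2;
    z-step: an off-the-shelf denoiser; u-step: scaled dual ascent."""
    n = len(adjoint(y))
    x, z, u = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(iters):
        for _ in range(5):  # inexact x-update via a few gradient steps
            r = [a - b for a, b in zip(forward(x), y)]
            g = adjoint(r)
            x = [xi - 0.1 * (gi + rho * (xi - zi + ui))
                 for xi, gi, zi, ui in zip(x, g, z, u)]
        z = denoise([xi + ui for xi, ui in zip(x, u)])
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return x

# Toy usage: identity forward operator and identity "denoiser" recover y.
identity = lambda v: list(v)
x = pnp_admm([1.0] * 8, identity, identity, identity)  # x[i] close to 1.0
```

Swapping `denoise` lets the same loop host any of the regularizers the paper compares, which is exactly the flexibility the PnP framework is for.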
[CV-38] From Propagation to Prediction: Point-level Uncertainty Evaluation of MLS Point Clouds under Limited Ground Truth
【速读】:该论文旨在解决移动激光扫描(Mobile Laser Scanning, MLS)点云在高精度应用中不确定性评估依赖于昂贵且难以获取的地面真值(Ground Truth, GT)这一长期难题。其解决方案的关键在于提出一种基于学习的框架,通过最优邻域估计与几何特征提取相结合的方式,利用XGBoost模型实现点级不确定性的预测,该模型在精度上可媲美随机森林(Random Forest),但效率提升约3倍,从而证明几何特征可用于量化由点到点距离(Closest Point to Closest Point, C2C)定义的不确定性,为不确定性评估研究提供了新的学习范式。
链接: https://arxiv.org/abs/2511.03053
作者: Ziyang Xu,Olaf Wysocki,Christoph Holst
机构: Technical University of Munich (慕尼黑工业大学); University of Cambridge (剑桥大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注:
Abstract:Evaluating uncertainty is critical for reliable use of Mobile Laser Scanning (MLS) point clouds in many high-precision applications such as Scan-to-BIM, deformation analysis, and 3D modeling. However, obtaining the ground truth (GT) for evaluation is often costly and infeasible in many real-world applications. To reduce this long-standing reliance on GT in uncertainty evaluation research, this study presents a learning-based framework for MLS point clouds that integrates optimal neighborhood estimation with geometric feature extraction. Experiments on a real-world dataset show that the proposed framework is feasible and the XGBoost model delivers fully comparable accuracy to Random Forest while achieving substantially higher efficiency (about 3 times faster), providing initial evidence that geometric features can be used to predict point-level uncertainty quantified by the C2C distance. In summary, this study shows that MLS point clouds’ uncertainty is learnable, offering a novel learning-based viewpoint towards uncertainty evaluation research.
zh
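The uncertainty label this framework learns to predict is the cloud-to-cloud (C2C) distance. A brute-force sketch of that quantity (real pipelines would use a k-d tree rather than the O(n·m) scan below):

```python
def c2c_distances(cloud, reference):
    """C2C distance: for each point in `cloud`, the Euclidean distance to
    its nearest neighbour in `reference` (brute force, for illustration)."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [min(d2(p, q) for q in reference) ** 0.5 for p in cloud]

# Points coinciding with the reference score 0; offsets score their distance.
print(c2c_distances([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]))
# [0.0, 1.0]
```

Per the abstract, the learned model regresses this per-point value from local geometric features, removing the need for a ground-truth reference cloud at prediction time.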
[CV-39] Data-Efficient Realized Volatility Forecasting with Vision Transformers NEURIPS
【速读】:该论文旨在解决金融时间序列预测中复杂非线性关系建模的问题,特别是针对期权数据的预测任务尚未被充分探索的现状。其解决方案的关键在于将通常用于图像识别的视觉Transformer(Vision Transformer, ViT)架构迁移至期权隐含波动率(Implied Volatility, IV)表面的预测任务中,利用ViT对非线性特征和季节性模式的学习能力,从单日IV表面(含日期信息增强)中预测未来30天的实际波动率(Realized Volatility)。实验表明,ViT能够有效捕捉IV表面中的复杂结构,为构建高性能期权数据预测模型提供了可行路径。
链接: https://arxiv.org/abs/2511.03046
作者: Emi Soroka,Artem Arzyn
机构: Stanford University (斯坦福大学)
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
备注: NeurIPS Generative AI in Finance
Abstract:Recent work in financial machine learning has shown the virtue of complexity: the phenomenon by which deep learning methods capable of learning highly nonlinear relationships outperform simpler approaches in financial forecasting. While transformer architectures like Informer have shown promise for financial time series forecasting, the application of transformer models for options data remains largely unexplored. We conduct preliminary studies towards the development of a transformer model for options data by training the Vision Transformer (ViT) architecture, typically used in modern image recognition and classification systems, to predict the realized volatility of an asset over the next 30 days from its implied volatility surface (augmented with date information) for a single day. We show that the ViT can learn seasonal patterns and nonlinear features from the IV surface, suggesting a promising direction for model development.
zh
[CV-40] SLIP: Structural-aware Language-Image Pretraining for Vision-Language Alignment
【速读】:该论文旨在解决当前视觉-语言预训练(Vision-Language Pretraining, VLP)方法在训练过程中将图像-文本对视为孤立样本,从而忽视了数据中固有的结构关系(如电商产品共购买图或社交推荐网络中的邻接关系)的问题。其解决方案的关键在于提出结构感知的语言-图像预训练(Structure-aware Language-Image Pretraining, SLIP),通过引入结构对比损失(structural contrastive loss)来同时对齐跨模态表示并建模结构图中相邻实体之间的关系,从而利用结构化监督信号提升跨模态对齐效果。
链接: https://arxiv.org/abs/2511.03019
作者: Wenbo Lu
机构: New York University Shanghai (上海纽约大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: Capstone Paper
Abstract:Vision-Language Pretraining (VLP) has achieved remarkable success across various downstream tasks, but such gains are largely driven by scaling up on training data. Yet, literature methods treat image-text pairs as isolated training examples; this neglects the rich relational structure naturally present in many domains, such as e-commerce product co-purchase graphs and social recommendation networks. Inspired by neuroscientific evidence that human encodes knowledge as relationship cognitive maps, we introduce Structure-aware Language-Image Pretraining (SLIP). SLIP integrates a structural contrastive loss to align modalities while also modeling relationships between neighboring entities in a structured graph. To support this paradigm, we construct a large-scale Amazon Product Co-purchase Multimodal Graph Dataset, enabling structured cross-modality supervision at scale. Experiment results show that SLIP consistently outperforms CLIP on cross-modal retrieval and classification tasks in both zero-shot and few-shot settings, showing the value of relational supervision for cross-modal alignment.
zh
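SLIP's structural contrastive loss treats graph neighbours, not just the paired caption, as positives. An InfoNCE-style sketch of that idea (the exact formulation in the paper may differ; `neighbors` maps an item index to its co-purchase neighbours):

```python
import math

def structural_contrastive_loss(img_emb, txt_emb, neighbors, tau=0.1):
    """Contrastive loss where, besides the paired caption i, graph
    neighbours of item i also count as positives for image i."""
    def sim(u, v):
        return sum(a * b for a, b in zip(u, v))
    n = len(img_emb)
    loss = 0.0
    for i in range(n):
        logits = [sim(img_emb[i], txt_emb[j]) / tau for j in range(n)]
        log_z = math.log(sum(math.exp(l) for l in logits))
        positives = [i] + neighbors.get(i, [])
        loss -= sum(logits[j] - log_z for j in positives) / len(positives)
    return loss / n

# Aligned image/text embeddings incur a much lower loss than shuffled ones.
img = [[1.0, 0.0], [0.0, 1.0]]
print(structural_contrastive_loss(img, [[1.0, 0.0], [0.0, 1.0]], {}))
print(structural_contrastive_loss(img, [[0.0, 1.0], [1.0, 0.0]], {}))
```

With an empty `neighbors` map this degenerates to a CLIP-style objective, which is the baseline SLIP is compared against.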
[CV-41] A Foundation Model for Brain MRI with Dynamic Modality Integration
【速读】:该论文旨在解决多模态脑部磁共振成像(MRI)中因不同序列组合缺失或未见而导致模型泛化能力受限的问题。传统方法通常需要为每种模态单独训练模型,难以适应实际临床中模态不完整的情况。其解决方案的关键在于设计了一个基础模型(foundation model),采用共享编码器结合可学习模态嵌入(learnable modality embeddings)、条件层归一化(conditional layer normalization)以及考虑缺失模态的掩码自编码目标(masked autoencoding objective),并引入方差-协方差正则项以稳定特征学习并提升表示多样性。该架构使模型能够灵活处理任意模态组合,无需为每个模态单独建模,且具备在部分序列缺失时进行模态补全和特征提取的能力。
链接: https://arxiv.org/abs/2511.03014
作者: Minh Sao Khue Luu,Bair N. Tuchinov
机构: The Artificial Intelligence Research Center of Novosibirsk State University (新西伯利亚国立大学人工智能研究中心)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: Preliminary work; results ongoing
Abstract:We present a foundation model for brain MRI that can work with different combinations of imaging sequences. The model uses one encoder with learnable modality embeddings, conditional layer normalization, and a masked autoencoding objective that accounts for missing modalities. A variance-covariance regularizer is applied to stabilize feature learning and improve representation diversity. This design removes the need for separate models for each modality and allows the network to adapt when some sequences are missing or unseen. It is trained on about 60,000 multi-center MRIs using self-supervised reconstruction and modality imputation to learn flexible representations. A learnable modality embedding guides feature extraction so the encoder can adjust to different inputs. We describe our planned evaluation on brain tumor and multiple sclerosis segmentation, as well as lesion classification, under various modality settings. Preliminary results indicate that the approach is feasible, and further experiments are planned to study its performance in more detail. All code and pretrained models are available at this https URL
zh
[CV-42] Learning with less: label-efficient land cover classification at very high spatial resolution using self-supervised deep learning
【速读】:该论文旨在解决高分辨率(1米)土地覆盖制图中因缺乏大规模代表性标注数据而导致模型难以广泛应用的问题。其解决方案的关键在于采用自监督深度学习方法,利用大量未标注的彩色红外航空影像(377,921个256×256像素的1米分辨率图像块)通过“Bootstrap Your Own Latent”(BYOL)预训练策略对ResNet-101卷积编码器进行预训练,随后将学习到的特征权重迁移至多种语义分割架构(如FCN、U-Net、Attention U-Net等),并在仅使用1,000个标注图像块的小样本训练集上进行微调,从而实现高效且准确的 statewide 1米级8类土地覆盖分类,整体准确率达87.14%,宏平均F1得分为75.58%。
链接: https://arxiv.org/abs/2511.03004
作者: Dakota Hester,Vitor S. Martins,Lucas B. Ferreira,Thainara M. A. Lima
机构: Mississippi State University (密西西比州立大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 25 pages, 11 figures. Submitted in Science of Remote Sensing
Abstract:Deep learning semantic segmentation methods have shown promising performance for very high 1-m resolution land cover classification, but the challenge of collecting large volumes of representative training data creates a significant barrier to widespread adoption of such models for meter-scale land cover mapping over large areas. In this study, we present a novel label-efficient approach for statewide 1-m land cover classification using only 1,000 annotated reference image patches with self-supervised deep learning. We use the “Bootstrap Your Own Latent” pre-training strategy with a large amount of unlabeled color-infrared aerial images (377,921 256x256 1-m pixel patches) to pre-train a ResNet-101 convolutional encoder. The learned encoder weights were subsequently transferred into multiple deep semantic segmentation architectures (FCN, U-Net, Attention U-Net, DeepLabV3+, UPerNet, PAN), which were then fine-tuned using very small training dataset sizes with cross-validation (250, 500, 750 patches). Among the fine-tuned models, we obtained the 87.14% overall accuracy and 75.58% macro F1 score using an ensemble of the best performing U-Net models for comprehensive 1-m, 8-class land cover mapping, covering more than 123 billion pixels over the state of Mississippi, USA. Detailed qualitative and quantitative analysis revealed accurate mapping of open water and forested areas, while highlighting challenges in accurate delineation between cropland, herbaceous, and barren land cover types. These results show that self-supervised learning is an effective strategy for reducing the need for large volumes of manually annotated data, directly addressing a major limitation to high spatial resolution land cover mapping at scale.
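The "Bootstrap Your Own Latent" strategy used for pre-training rests on two ingredients that can be sketched independently of any deep learning framework: an exponential moving average (EMA) that makes the target network trail the online encoder, and a regression loss between L2-normalized predictions and targets. A minimal numpy sketch follows; the decay rate and array shapes are illustrative.

```python
import numpy as np

def ema_update(target_params, online_params, tau=0.996):
    """BYOL-style EMA of target-network weights: the target slowly
    tracks the online encoder and provides its regression targets.

    Both arguments are lists of numpy arrays; tau is the decay rate.
    """
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]

def byol_loss(pred, target):
    """BYOL regression loss: MSE between L2-normalized vectors,
    equal to 2 - 2 * cosine similarity."""
    p = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    t = target / np.linalg.norm(target, axis=-1, keepdims=True)
    return float((2.0 - 2.0 * (p * t).sum(axis=-1)).mean())
```

In the study's pipeline, the ResNet-101 encoder pre-trained this way on 377,921 unlabeled patches is what gets transferred into the segmentation architectures before fine-tuning.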
zh
[CV-43] SCALE-VLP: Soft-Weighted Contrastive Volumetric Vision-Language Pre-training with Spatial-Knowledge Semantics
【速读】:该论文旨在解决当前视觉语言模型(Vision-Language Models, VLMs)在处理体积数据(如CT影像)时存在的两大问题:一是多数方法仅限于二维数据,忽略体积数据中连续且结构化的空间依赖关系;二是传统对比学习框架通常采用二元监督(正负样本对),无法充分利用临床语义信息(如放射学本体)。解决方案的关键在于提出SCALE-VLP,一个软加权对比视觉语言预训练框架,其核心创新包括:(i) 引入体积空间语义以保留解剖结构的一致性,(ii) 融入领域感知的知识增强语义(如放射学本体)来引导跨模态对齐。该方法在有限监督下生成结构一致且语义锚定的表示,显著提升了跨任务迁移能力(检索、报告生成和分类)与跨域泛化性能,无需额外微调即可在多个下游任务中实现显著优于现有方法的表现。
链接: https://arxiv.org/abs/2511.02996
作者: Ailar Mahdizadeh,Puria Azadi Moghadam,Xiangteng He,Shahriar Mirabbasi,Panos Nasiopoulos,Leonid Sigal
机构: University of British Columbia (不列颠哥伦比亚大学); Vector Institute for AI (AI研究所)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:Vision-language models (VLMs) have demonstrated strong cross-modal capabilities, yet most work remains limited to 2D data and assumes binary supervision (i.e., positive vs. negative pairs), overlooking the continuous and structured dependencies present in volumetric data such as CT. Existing approaches often treat volumetric scans as independent 2D slices, compromising spatial coherence and underutilizing rich clinical semantics. We propose SCALE-VLP, a soft-weighted contrastive vision-language pre-training framework that integrates (i) volumetric spatial semantics to preserve anatomical structure and (ii) domain-aware, knowledge-infused semantics (e.g., radiological ontologies) to guide alignment. This yields structurally consistent and semantically grounded representations under limited supervision, demonstrating strong cross-task transferability (retrieval, report generation, and classification), and cross-domain generalizability with consistent gains without further fine-tuning. In particular, compared to the previous state of the art, SCALE-VLP achieves up to 4.3x higher top-1 CT-report retrieval, improves abnormality classification by 10 points, and reaches ROUGE-L 0.44 and BERT-F1 0.89 for report generation. Further, in zero-shot evaluation on an out-of-domain external dataset, we observe consistent gains, indicating the cross-task and cross-domain generalization ability of SCALE-VLP.
zh
[CV-44] Comprehensive Assessment of LiDAR Evaluation Metrics: A Comparative Study Using Simulated and Real Data
【速读】:该论文旨在解决自动驾驶系统(Autonomous Driving Systems, ADS)在部署前安全性验证难题,尤其是传统物理测试因成本高和安全风险难以实现全面验证的问题。为应对这一挑战,研究提出通过虚拟测试环境(Virtual Testing Environment, VTE)生成的传感器数据与真实世界数据进行对比,以评估VTE的真实性。其解决方案的关键在于:首先系统性地筛选并验证适用于比较真实与仿真LiDAR扫描的评价指标,发现密度感知倒角距离(Density Aware Chamfer Distance, DCD)在不同噪声、密度、畸变等条件下表现最优;其次,基于真实LiDAR数据构建可控环境下的VTE,并利用相同位姿生成仿真LiDAR数据,最终通过几何相似性和模型感知一致性(如语义分割mIoU)综合评估,结果表明DCD与感知方法相关性最强,且仿真与真实扫描在几何上差异较小(DCD=0.63),但感知输出存在显著差异(mIoU=21%),凸显了DCD作为核心评估指标的有效性。
链接: https://arxiv.org/abs/2511.02994
作者: Syed Mostaquim Ali,Taufiq Rahman,Ghazal Farhani,Mohamed H. Zaki,Benoit Anctil,Dominique Charlebois
机构: National Research Council Canada (加拿大国家研究委员会); Western University (西安大略大学); Transport Canada (加拿大交通部)
类目: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
备注:
Abstract:For developing safe Autonomous Driving Systems (ADS), rigorous testing is required before they are deemed safe for road deployments. Since comprehensive conventional physical testing is impractical due to cost and safety concerns, Virtual Testing Environments (VTE) can be adopted as an alternative. Comparing VTE-generated sensor outputs against their real-world analogues can be a strong indication that the VTE accurately represents reality. Correspondingly, this work explores a comprehensive experimental approach to finding evaluation metrics suitable for comparing real-world and simulated LiDAR scans. The metrics were tested in terms of sensitivity and accuracy with different noise, density, distortion, sensor orientation, and channel settings. From comparing the metrics, we found that Density Aware Chamfer Distance (DCD) works best across all cases. In the second step of the research, a Virtual Testing Environment was generated using real LiDAR scan data. The data was collected in a controlled environment with only static objects using an instrumented vehicle equipped with LiDAR, IMU and cameras. Simulated LiDAR scans were generated from the VTEs using the same pose as real LiDAR scans. The simulated and real LiDAR scans were compared in terms of model perception and geometric similarity. Actual and simulated LiDAR scans have a similar semantic segmentation output with a mIoU of 21% with corrected intensity and an average density aware chamfer distance (DCD) of 0.63. This indicates a slight difference in the geometric properties of simulated and real LiDAR scans and a significant difference between model outputs. During the comparison, density-aware chamfer distance was found to be the most correlated among the metrics with perception methods.
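Density Aware Chamfer Distance, the best-performing metric here, follows Wu et al. (2021): each point's match is discounted by how many points share the same nearest neighbor, so dense clusters cannot dominate the score. Below is a simplified numpy sketch; the brute-force nearest-neighbor search and the `alpha` temperature default are illustrative simplifications.

```python
import numpy as np

def density_aware_chamfer(s1, s2, alpha=1.0):
    """Simplified sketch of Density Aware Chamfer Distance (DCD).

    s1, s2 : (n, 3) and (m, 3) point clouds; returns a value in [0, 1).
    """
    def one_side(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # (n, m)
        nn = d2.argmin(axis=1)                                # NN of each a-point in b
        # n_hat counts how many points of `a` share the same nearest
        # neighbor in `b`; matches into dense regions are down-weighted.
        n_hat = np.bincount(nn, minlength=len(b))[nn]
        return (1.0 - np.exp(-alpha * d2[np.arange(len(a)), nn]) / n_hat).mean()
    return 0.5 * (one_side(s1, s2) + one_side(s2, s1))
```

Identical clouds score exactly 0, and the bounded exponential keeps single outliers from blowing up the metric, which plain Chamfer distance does not guarantee.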
zh
[CV-45] Hybrid Convolution and Vision Transformer NAS Search Space for TinyML Image Classification (ECML PKDD 2024)
【速读】:该论文旨在解决当前混合卷积神经网络(CNN)与视觉Transformer(ViT)架构在TinyML部署中因参数量大、计算成本高而难以适用的问题。其解决方案的关键在于提出了一种新的神经架构搜索(NAS)搜索空间,该空间包含可学习局部信息的CNN块、捕捉全局依赖关系的ViT块,以及一种新型可搜索池化层(Pooling block),用于高效地降低特征图尺寸。实验表明,在CIFAR10数据集上,该搜索空间能在严格模型大小约束下生成兼具高准确率和快速推理速度的混合CNN-ViT架构,优于基于ResNet的tinyML模型。
链接: https://arxiv.org/abs/2511.02992
作者: Mikhael Djajapermana,Moritz Reiber,Daniel Mueller-Gritschneder,Ulf Schlichtmann
机构: 未知
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
备注: Presented at ITEM workshop co-located with ECML PKDD 2024, Vilnius LT
Abstract:Hybrids of Convolutional Neural Network (CNN) and Vision Transformer (ViT) have outperformed pure CNN or ViT architecture. However, since these architectures require large parameters and incur large computational costs, they are unsuitable for tinyML deployment. This paper introduces a new hybrid CNN-ViT search space for Neural Architecture Search (NAS) to find efficient hybrid architectures for image classification. The search space covers hybrid CNN and ViT blocks to learn local and global information, as well as the novel Pooling block of searchable pooling layers for efficient feature map reduction. Experimental results on the CIFAR10 dataset show that our proposed search space can produce hybrid CNN-ViT architectures with superior accuracy and inference speed to ResNet-based tinyML models under tight model size constraints.
zh
[CV-46] EvtSlowTV - A Large and Diverse Dataset for Event-Based Depth Estimation
【速读】:该论文旨在解决事件相机(Event Camera)在深度估计任务中因小规模标注数据集导致模型泛化能力不足的问题。现有方法受限于有限的标注数据,难以适应真实世界复杂多变的环境和运动场景。解决方案的关键在于构建一个大规模、自然场景下的事件数据集 EvtSlowTV,该数据集源自公开 YouTube 视频,包含超过 130 亿个事件,覆盖多种环境条件与运动模式(如季节性徒步、飞行、风景驾驶和水下探索),其规模比现有数据集大一个数量级。通过该数据集,作者提出了一种自监督学习框架,充分利用事件流的高动态范围(HDR)特性,在无需帧级标注的情况下保留事件数据的异步本质,从而显著提升模型在复杂场景中的泛化性能。
链接: https://arxiv.org/abs/2511.02953
作者: Sadiq Layi Macaulay,Nimet Kaygusuz,Simon Hadfield
机构: University of Surrey (萨里大学)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
备注:
Abstract:Event cameras, with their high dynamic range (HDR) and low latency, offer a promising alternative for robust depth estimation in challenging environments. However, many event-based depth estimation approaches are constrained by small-scale annotated datasets, limiting their generalizability to real-world scenarios. To bridge this gap, we introduce EvtSlowTV, a large-scale event camera dataset curated from publicly available YouTube footage, which contains more than 13B events across various environmental conditions and motions, including seasonal hiking, flying, scenic driving, and underwater exploration. EvtSlowTV is an order of magnitude larger than existing event datasets, providing an unconstrained, naturalistic setting for event-based depth learning. This work shows the suitability of EvtSlowTV for a self-supervised learning framework to capitalise on the HDR potential of raw event streams. We further demonstrate that training with EvtSlowTV enhances the model’s ability to generalise to complex scenes and motions. Our approach removes the need for frame-based annotations and preserves the asynchronous nature of event data.
zh
[CV-47] ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology
【速读】:该论文旨在解决生态学领域中多模态数据(如图像、文本、传感器数据等)之间缺乏统一表示与跨模态生成能力的问题,从而支持任意模态到任意模态的生成任务。其核心挑战在于如何在嵌入空间中实现模态缺失推理与不确定性建模,以提升跨模态检索和下游任务的泛化性能。解决方案的关键是提出ProM3E——一种基于嵌入空间掩码模态重建的概率化多模态嵌入模型(probabilistic masked multimodal embedding model),通过学习从少量上下文模态推断缺失模态的能力,并利用其概率特性分析不同模态融合对特定下游任务的可行性,最终实现模态无关的生成式表示学习与高效的跨模态检索。
链接: https://arxiv.org/abs/2511.02946
作者: Srikumar Sastry,Subash Khanal,Aayush Dhakal,Jiayu Lin,Dan Cher,Phoenix Jarosz,Nathan Jacobs
机构: Washington University in St. Louis (圣路易斯华盛顿大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 21 pages, 16 figures
Abstract:We introduce ProM3E, a probabilistic masked multimodal embedding model for any-to-any generation of multimodal representations for ecology. ProM3E is based on masked modality reconstruction in the embedding space, learning to infer missing modalities given a few context modalities. By design, our model supports modality inversion in the embedding space. The probabilistic nature of our model allows us to analyse the feasibility of fusing various modalities for given downstream tasks, essentially learning what to fuse. Using these features of our model, we propose a novel cross-modal retrieval approach that mixes inter-modal and intra-modal similarities to achieve superior performance across all retrieval tasks. We further leverage the hidden representation from our model to perform linear probing tasks and demonstrate the superior representation learning capability of our model. All our code, datasets and model will be released at this https URL.
zh
[CV-48] Generative Hints
【速读】:该论文旨在解决数据增强(Data Augmentation)在视觉任务中难以充分学习和建模输入空间中已知不变性(Invariance)的问题,例如空间不变性等。传统数据增强仅通过对训练数据进行变换来尝试学习这些性质,但其效果受限于训练样本的局部变化,无法覆盖整个输入空间。解决方案的关键在于提出“生成式提示”(Generative Hints)方法:利用一个在训练集上训练好的生成模型来近似输入分布并生成未标记的虚拟样本(Virtual Examples),并通过这些虚拟样本在分类目标之外引入额外的“提示”(Hint)目标函数,从而以半监督方式引导模型在整个输入空间中学习并强制实现已知的不变性。该方法不依赖于额外标注数据,却能显著提升模型性能,在多个数据集和架构上均优于标准数据增强策略。
链接: https://arxiv.org/abs/2511.02933
作者: Andy Dimnaku,Abdullah Yusuf Kavranoğlu,Yaser Abu-Mostafa
机构: Stanford University (斯坦福大学); California Institute of Technology (加州理工学院)
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
备注: 13 pages, 9 figures
Abstract:Data augmentation is widely used in vision to introduce variation and mitigate overfitting, through enabling models to learn invariant properties, such as spatial invariance. However, these properties are not fully captured by data augmentation alone, since it attempts to learn the property on transformations of the training data only. We propose generative hints, a training methodology that directly enforces known invariances in the entire input space. Our approach leverages a generative model trained on the training set to approximate the input distribution and generate unlabeled images, which we refer to as virtual examples. These virtual examples are used to enforce functional properties known as hints. In generative hints, although the training dataset is fully labeled, the model is trained in a semi-supervised manner on both the classification and hint objectives, using the unlabeled virtual examples to guide the model in learning the desired hint. Across datasets, architectures, and loss functions, generative hints consistently outperform standard data augmentation when learning the same property. On popular fine-grained visual classification benchmarks, we achieved up to 1.78% top-1 accuracy improvement (0.63% on average) over fine-tuned models with data augmentation and an average performance boost of 1.286% on the CheXpert X-ray dataset.
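The semi-supervised objective described above, a supervised classification loss on labeled data plus a hint penalty on generator-produced virtual examples, can be sketched as follows. All names, and the squared-difference form of the invariance hint, are illustrative assumptions; the paper's hint objectives may differ.

```python
import numpy as np

def hint_objective(model, x_lab, y_lab, x_virtual, transform, lam=1.0):
    """Illustrative generative-hints objective.

    model     : callable returning (n, k) class probabilities.
    x_virtual : unlabeled virtual examples from a generative model.
    transform : the invariance to enforce (e.g. a spatial shift).
    """
    probs = model(x_lab)
    # Supervised cross-entropy on the fully labeled training set.
    ce = -np.log(probs[np.arange(len(y_lab)), y_lab] + 1e-12).mean()
    # Hint term: the model should give the same output on a virtual
    # example and its transformed copy, over the whole input space.
    hint = ((model(x_virtual) - model(transform(x_virtual))) ** 2).mean()
    return float(ce + lam * hint)
```

A model that is already invariant to `transform` pays no hint penalty, so the term only steers models that violate the known property.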
zh
[CV-49] Cropland Mapping using Geospatial Embeddings
【速读】:该论文旨在解决当前土地覆盖制图中效率低、更新不及时的问题,以支持对土地利用变化及其气候影响的精准评估。其解决方案的关键在于利用地理空间嵌入(geospatial embeddings)技术,通过预训练模型(如Presto和AlphaEarth)提取遥感数据中的高维特征表示,从而简化传统复杂的分类流程,并实现高精度的耕地分类结果,提升土地利用变化监测的时效性与准确性。
链接: https://arxiv.org/abs/2511.02923
作者: Ivan Zvonkov,Gabriel Tseng,Inbal Becker-Reshef,Hannah Kerner
机构: University of Maryland, College Park (马里兰大学学院市分校); Mila – Quebec AI Institute (魁北克人工智能研究所); McGill University (麦吉尔大学); Arizona State University (亚利桑那州立大学)
类目: Computer Vision and Pattern Recognition (cs.CV)
备注: 8 pages, 11 figures
Abstract:Accurate and up-to-date land cover maps are essential for understanding land use change, a key driver of climate change. Geospatial embeddings offer a more efficient and accessible way to map landscape features, yet their use in real-world mapping applications remains underexplored. In this work, we evaluated the utility of geospatial embeddings for cropland mapping in Togo. We produced cropland maps using embeddings from Presto and AlphaEarth. Our findings show that geospatial embeddings can simplify workflows, achieve high-accuracy cropland classification and ultimately support better assessments of land use change and its climate impacts.
zh
[CV-50] OmniVLA: Unifying Multi-Sensor Perception for Physically-Grounded Multimodal VLA
【速读】:该论文旨在解决现有视觉-语言-动作(Vision-Language-Action, VLA)模型因仅依赖RGB相机而感知能力受限、进而影响物理操作任务中泛化性能的问题。其核心解决方案是提出OmniVLA,一个融合多模态传感信息的VLA模型,关键创新在于引入“传感器掩码图像”(sensor-masked image)这一统一表示:通过将红外相机、毫米波雷达和麦克风阵列等新型传感器的空间对齐且物理意义明确的掩码叠加到RGB图像上,实现跨模态感知的图像原生整合。该设计保持了与RGB统计特性一致的数据格式,简化了训练流程,同时支持轻量级的每传感器投影器,从而显著提升模型在复杂现实任务中的成功率(平均达84%),并优于纯RGB输入和原始传感器输入的基线模型。
链接: https://arxiv.org/abs/2511.01210
作者: Heyu Guo,Shanmu Wang,Ruichun Ma,Shiqi Jiang,Yasaman Ghasempour,Omid Abari,Baining Guo,Lili Qi
机构: Princeton University (普林斯顿大学); University of California, Los Angeles (加州大学洛杉矶分校); Microsoft Research Asia (微软亚洲研究院)
类目: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
备注:
Abstract:Vision-language-action (VLA) models have shown strong generalization for action prediction through large-scale vision-language pretraining. However, most existing models rely solely on RGB cameras, limiting their perception and, consequently, manipulation capabilities. We present OmniVLA, an omni-modality VLA model that integrates novel sensing modalities for physically-grounded spatial intelligence beyond RGB perception. The core of our approach is the sensor-masked image, a unified representation that overlays spatially grounded and physically meaningful masks onto the RGB images, derived from sensors including an infrared camera, a mmWave radar, and a microphone array. This image-native unification keeps sensor input close to RGB statistics to facilitate training, provides a uniform interface across sensor hardware, and enables data-efficient learning with lightweight per-sensor projectors. Built on this, we present a multisensory vision-language-action model architecture and train the model based on an RGB-pretrained VLA backbone. We evaluate OmniVLA on challenging real-world tasks where sensor-modality perception is needed to guide the manipulation. OmniVLA achieves an average task success rate of 84%, significantly outperforms both RGB-only and raw-sensor-input baseline models by 59% and 28% respectively, meanwhile showing higher learning efficiency and stronger generalization capability.
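The sensor-masked image is described as overlaying spatially grounded, physically meaningful masks onto the RGB frame. Below is a toy numpy sketch of such an overlay; the alpha-blending scheme, the color choice, and the boolean mask input are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def sensor_masked_image(rgb, mask, color=(255, 0, 0), beta=0.5):
    """Blend a sensor activation map (e.g. thresholded mmWave or
    thermal response) onto an RGB frame. Keeping the result close to
    RGB statistics is the motivation the abstract gives.

    rgb  : (h, w, 3) uint8 image.
    mask : (h, w) boolean sensor activation map, spatially aligned.
    """
    out = rgb.astype(np.float32)
    overlay = np.array(color, dtype=np.float32)
    # Alpha-blend only where the sensor fired; the rest stays RGB.
    out[mask] = (1.0 - beta) * out[mask] + beta * overlay
    return out.astype(np.uint8)
```

Because the output is still an ordinary image, a single RGB-pretrained backbone plus lightweight per-sensor projectors can consume every modality through one interface.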
zh
[CV-51] Seeing What You Say: Expressive Image Generation from Speech
【速读】:该论文旨在解决传统语音到图像生成模型中因依赖额外的语音转文本(Speech-to-Text, STT)系统而导致的情感与语调信息丢失问题,从而限制了生成图像的表现力。其核心挑战在于如何在不进行显式文本转换的前提下,直接从语音中提取并利用语言和副语言(paralinguistic)信息以生成符合描述语义且富有情感表达的图像。解决方案的关键是提出 VoxStudio,一个统一且端到端的语音到图像模型,其核心为语音信息瓶颈(Speech Information Bottleneck, SIB)模块,该模块将原始语音压缩为紧凑的语义标记(semantic tokens),同时保留语调和情感细微差别;通过直接操作这些标记,VoxStudio 实现了无需 STT 的高效图像生成,显著提升了对语音中隐含情感信息的建模能力。
链接: https://arxiv.org/abs/2511.03423
作者: Jiyoung Lee,Song Park,Sanghyuk Chun,Soo-Whan Chung
机构: Ewha Womans University (梨花女子大学); Princeton University (普林斯顿大学); NAVER CLOUD (NAVER云)
类目: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
备注: In progress
Abstract:This paper proposes VoxStudio, the first unified and end-to-end speech-to-image model that generates expressive images directly from spoken descriptions by jointly aligning linguistic and paralinguistic information. At its core is a speech information bottleneck (SIB) module, which compresses raw speech into compact semantic tokens, preserving prosody and emotional nuance. By operating directly on these tokens, VoxStudio eliminates the need for an additional speech-to-text system, which often ignores the hidden details beyond text, e.g., tone or emotion. We also release VoxEmoset, a large-scale paired emotional speech-image dataset built via an advanced TTS engine to affordably generate richly expressive utterances. Comprehensive experiments on the SpokenCOCO, Flickr8kAudio, and VoxEmoset benchmarks demonstrate the feasibility of our method and highlight key challenges, including emotional consistency and linguistic ambiguity, paving the way for future research.
zh
[CV-52] Morpho-Genomic Deep Learning for Ovarian Cancer Subtype and Gene Mutation Prediction from Histopathology
【速读】:该论文旨在解决卵巢癌(ovarian cancer)因诊断延迟和亚型异质性高而导致的预后不佳问题,特别是当前诊断方法难以揭示支持精准肿瘤学(precision oncology)所需的基因组变异。其解决方案的关键在于提出了一种新型混合深度学习流水线,通过融合定量核形态测量(quantitative nuclear morphometry)与深度卷积图像特征,直接从苏木精-伊红(Hematoxylin and Eosin, HE)组织病理图像中实现卵巢癌亚型分类及基因突变推断。该模型结合ResNet-50卷积神经网络(Convolutional Neural Network, CNN)编码器与视觉Transformer(Vision Transformer, ViT),有效捕捉局部形态纹理与全局组织背景信息,在约45,000个图像块上实现了84.2%的亚型分类准确率(Macro AUC=0.87±0.03),并能以中等到高精度推断TP53、BRCA1和ARID1A等关键基因突变(AUC分别为0.82±0.02、0.76±0.04和0.73±0.05),证明了可量化的组织学表型可编码可测量的基因组信号,为低成本、精准的卵巢癌分诊与诊断提供了新路径。
链接: https://arxiv.org/abs/2511.03365
作者: Gabriela Fernandes
机构: State University of New York (SUNY) at Buffalo (纽约州立大学水牛城分校)
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
备注:
Abstract:Ovarian cancer remains one of the most lethal gynecological malignancies, largely due to late diagnosis and extensive heterogeneity across subtypes. Current diagnostic methods are limited in their ability to reveal underlying genomic variations essential for precision oncology. This study introduces a novel hybrid deep learning pipeline that integrates quantitative nuclear morphometry with deep convolutional image features to perform ovarian cancer subtype classification and gene mutation inference directly from Hematoxylin and Eosin (HE) histopathological images. Using ~45,000 image patches sourced from The Cancer Genome Atlas (TCGA) and public datasets, a fusion model combining a ResNet-50 Convolutional Neural Network (CNN) encoder and a Vision Transformer (ViT) was developed. This model successfully captured both local morphological texture and global tissue context. The pipeline achieved a robust overall subtype classification accuracy of 84.2% (Macro AUC of 0.87 ± 0.03). Crucially, the model demonstrated the capacity for gene mutation inference with moderate-to-high accuracy: AUC_TP53 = 0.82 ± 0.02, AUC_BRCA1 = 0.76 ± 0.04, and AUC_ARID1A = 0.73 ± 0.05. Feature importance analysis established direct quantitative links, revealing that nuclear solidity and eccentricity were the dominant predictors for TP53 mutation. These findings validate that quantifiable histological phenotypes encode measurable genomic signals, paving the way for cost-effective, precision histopathology in ovarian cancer triage and diagnosis.
zh
[CV-53] Domain-Adaptive Transformer for Data-Efficient Glioma Segmentation in Sub-Saharan MRI (NeurIPS 2025)
【速读】:该论文旨在解决撒哈拉以南非洲地区胶质瘤(Glioma)分割在临床实践中面临的挑战,即受限的磁共振成像(MRI)基础设施和异构的扫描协议导致严重的域偏移(domain shift),从而影响分割精度。其解决方案的关键在于提出SegFormer3D-plus,一种基于放射组学(radiomics)引导的Transformer架构,通过四个核心机制实现鲁棒性分割:(1) 使用直方图匹配进行跨设备强度归一化;(2) 基于主成分分析(PCA)降维的k-means聚类进行域感知分层采样;(3) 双路径编码器融合频域感知特征提取与空间-通道注意力机制;(4) 采用复合Dice-交叉熵损失函数优化边界定位。该方法在BraTS 2023预训练基础上,针对BraTS-Africa数据微调,显著提升了在异构非洲临床扫描中的肿瘤亚区划分与边界准确性,验证了放射组学引导的域自适应策略在资源有限环境下的有效性。
链接: https://arxiv.org/abs/2511.02928
作者: Ilerioluwakiiye Abolade,Aniekan Udo,Augustine Ojo,Abdulbasit Oyetunji,Hammed Ajigbotosho,Aondana Iorumbur,Confidence Raymond,Maruf Adewole
机构: 未知
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
备注: 4 pages, 2 figures. Accepted as an abstract at the Women in Machine Learning (WiML) Workshop at NeurIPS 2025
Abstract:Glioma segmentation is critical for diagnosis and treatment planning, yet remains challenging in Sub-Saharan Africa due to limited MRI infrastructure and heterogeneous acquisition protocols that induce severe domain shift. We propose SegFormer3D-plus, a radiomics-guided transformer architecture designed for robust segmentation under domain variability. Our method combines: (1) histogram matching for intensity harmonization across scanners, (2) radiomic feature extraction with PCA-reduced k-means for domain-aware stratified sampling, (3) a dual-pathway encoder with frequency-aware feature extraction and spatial-channel attention, and (4) composite Dice-Cross-Entropy loss for boundary refinement. Pretrained on BraTS 2023 and fine-tuned on BraTS-Africa data, SegFormer3D-plus demonstrates improved tumor subregion delineation and boundary localization across heterogeneous African clinical scans, highlighting the value of radiomics-guided domain adaptation for resource-limited settings.
zh
[CV-54] Optimizing the nnU-Net model for brain tumor (Glioma) segmentation Using a BraTS Sub-Saharan Africa (SSA) dataset
【速读】:该论文旨在解决医学图像分割(Medical Image Segmentation)中因数据质量与增强策略不当导致模型泛化能力不足的问题,尤其是在资源匮乏地区(如撒哈拉以南非洲)的临床应用中。其关键解决方案在于:采用高质量原始数据结合nnU-Net框架内建的鲁棒在线增强机制,而非依赖大量离线增强的数据扩充;实验证明,这种策略能有效保留真实解剖变异性和强度分布特性,从而在仅60例多模态MRI病例上实现Dice分数达0.84的全肿瘤分割性能,优于使用360例离线增强数据训练的模型,凸显了数据真实性与合理增强方法对构建可泛化医疗影像分割模型的重要性。
链接: https://arxiv.org/abs/2511.02893
作者: Chukwuemeka Arua Kalu,Adaobi Chiazor Emegoakor,Fortune Okafor,Augustine Okoh Uchenna,Chijioke Kelvin Ukpai,Godsent Erere Onyeugbo
机构: 未知
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
备注: 10 pages, 4 figures
Abstract:Medical image segmentation is a critical achievement in modern medical science, developed over decades of research. It allows for the exact delineation of anatomical and pathological features in two- or three-dimensional pictures by utilizing notions like pixel intensity, texture, and anatomical context. With the advent of automated segmentation, physicians and radiologists may now concentrate on diagnosis and treatment planning while intelligent computers perform routine image processing tasks. This study used the BraTS Sub-Saharan Africa dataset, a selected subset of the BraTS dataset that included 60 multimodal MRI cases from patients with glioma. Surprisingly, the nnU-Net model trained on the initial 60 instances performed better than the network trained on an offline-augmented dataset of 360 cases. Hypothetically, the offline augmentations introduced artificial anatomical variances or intensity distributions, reducing generalization. In contrast, the original dataset, when paired with nnU-Net's robust online augmentation procedures, maintained realistic variability and produced better results. The study achieved a Dice score of 0.84 for whole tumor segmentation. These findings highlight the significance of data quality and proper augmentation approaches in constructing accurate, generalizable medical image segmentation models, particularly for under-represented locations.
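The reported Dice score of 0.84 is the standard Dice coefficient between predicted and reference masks, which can be computed as:

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient for binary segmentation masks:
    2|P ∩ T| / (|P| + |T|), with eps guarding the empty-mask case."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))
```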
zh
[CV-55] NEF-NET: Adapting Electrocardio panorama in the wild
【速读】:该论文旨在解决传统多导联心电图(multi-lead electrocardiogram, ECG)系统在诊断特定心脏疾病(如Brugada综合征)时因视角固定而无法捕捉关键电生理模式的问题,同时应对真实场景中长期ECG建模、设备特异性信号伪影干扰及电极放置偏差等挑战。其解决方案的关键在于提出NEF-NET+框架,通过一种新型模型架构实现直接视点变换(direct view transformation),并集成离线预训练、设备校准调优和患者特异性在线校准三阶段流程,从而支持任意长度信号合成、跨设备泛化以及对操作误差的补偿能力。
链接: https://arxiv.org/abs/2511.02880
作者: Zehui Zhan,Yaojun Hu,Jiajing Zhan,Wanchen Lian,Wanqing Wu,Jintai Chen
机构: Hong Kong University of Science and Technology (Guangzhou) (香港科技大学(广州)); Sun Yat-sen University (中山大学); Zhejiang University (浙江大学); University of Hong Kong (香港大学)
类目: ignal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
备注:
Abstract:Conventional multi-lead electrocardiogram (ECG) systems capture cardiac signals from a fixed set of anatomical viewpoints defined by lead placement. However, certain cardiac conditions (e.g., Brugada syndrome) require additional, non-standard viewpoints to reveal diagnostically critical patterns that may be absent in standard leads. To systematically overcome this limitation, Nef-Net was recently introduced to reconstruct a continuous electrocardiac field, enabling virtual observation of ECG signals from arbitrary views (termed Electrocardio Panorama). Despite its promise, Nef-Net operates under idealized assumptions and faces in-the-wild challenges, such as long-duration ECG modeling, robustness to device-specific signal artifacts, and suboptimal lead placement calibration. This paper presents NEF-NET+, an enhanced framework for realistic panoramic ECG synthesis that supports arbitrary-length signal synthesis from any desired view, generalizes across ECG devices, and compensates for operator-induced deviations in electrode placement. These capabilities are enabled by a newly designed model architecture that performs direct view transformation, incorporating a workflow comprising offline pretraining, device calibration tuning steps as well as an on-the-fly calibration step for patient-specific adaptation. To rigorously evaluate panoramic ECG synthesis, we construct a new Electrocardio Panorama benchmark, called Panobench, comprising 5367 recordings with 48 views per subject, capturing the full spatial variability of cardiac electrical activity. Experimental results show that NEF-NET+ delivers substantial improvements over Nef-Net, yielding an increase of around 6 dB in PSNR in a real-world setting. The code and Panobench will be released in a subsequent publication.
zh
[CV-56] Benchmarking ResNet for Short-Term Hypoglycemia Classification with DiaData
【速读】:该论文旨在解决Type 1 Diabetes (T1D) 患者医疗数据中存在的质量问题,包括异常值、噪声数据、小样本量以及缺失值导致的信息丢失,这些问题限制了个体化治疗和血糖预测模型的可靠性。解决方案的关键在于对DiaData数据集进行系统性清洗与优化:首先利用四分位距(Interquartile Range, IQR)识别并替换异常值为缺失值;其次针对不同长度的数据间隙采用差异化的插补策略——小于等于25分钟的间隙使用线性插值,大于等于30分钟且小于120分钟的间隙则采用Stineman插值法以获得更真实的葡萄糖估计;最后通过高质量数据训练ResNet模型实现低血糖事件的提前分类预测,实验表明,增加数据量可提升性能7%,而数据质量优化带来额外2–3%的性能增益。
链接: https://arxiv.org/abs/2511.02849
作者: Beyza Cinar,Maria Maleshkova
机构: Helmut Schmidt University (赫尔穆特·施密特大学), Germany (德国)
类目: ignal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
备注: 11 pages, 5 Tables, 4 Figures, BHI 2025 conference (JBHI special issue)
Abstract:Individualized therapy is driven forward by medical data analysis, which provides insight into the patient's context. In particular, for Type 1 Diabetes (T1D), which is an autoimmune disease, relationships between demographics, sensor data, and context can be analyzed. However, outliers, noisy data, and small data volumes cannot provide a reliable analysis. Hence, the research domain requires large volumes of high-quality data. Moreover, missing values can lead to information loss. To address this limitation, this study improves the data quality of DiaData, an integration of 15 separate datasets containing glucose values from 2510 subjects with T1D. Notably, we make the following contributions: 1) Outliers are identified with the interquartile range (IQR) approach and treated by replacing them with missing values. 2) Small gaps (≤ 25 min) are imputed with linear interpolation and larger gaps (≥ 30 and < 120 min) with Stineman interpolation. Based on a visual comparison, Stineman interpolation provides more realistic glucose estimates than linear interpolation for larger gaps. 3) After data cleaning, the correlation between glucose and heart rate is analyzed, yielding a moderate relation between 15 and 60 minutes before hypoglycemia (≤ 70 mg/dL). 4) Finally, a benchmark for hypoglycemia classification is provided with a state-of-the-art ResNet model. The model is trained with the Maindatabase and Subdatabase II of DiaData to classify hypoglycemia onset up to 2 hours in advance. Training with more data improves performance by 7% while using quality-refined data yields a 2-3% gain compared to raw data.
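The IQR outlier treatment and short-gap interpolation described in steps 1) and 2) can be sketched in numpy. The 5-sample gap limit assumes a 5-minute CGM sampling rate (so ≤ 25 min), which is an assumption; Stineman interpolation for the longer gaps is omitted since it needs an extra dependency (e.g. scipy or pandas).

```python
import numpy as np

def clean_glucose(series, max_linear_gap=5):
    """IQR-based outliers become NaN, then NaN runs no longer than
    max_linear_gap samples are linearly interpolated. Longer interior
    gaps would use Stineman interpolation (not shown)."""
    x = np.asarray(series, dtype=float)
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    # Tukey's 1.5*IQR fences: flagged values are replaced, not dropped.
    x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)] = np.nan
    isnan = np.isnan(x)
    i = 0
    while i < len(x):
        if isnan[i]:
            j = i
            while j < len(x) and isnan[j]:
                j += 1
            # Only interior gaps short enough for linear interpolation.
            if 0 < i and j < len(x) and (j - i) <= max_linear_gap:
                x[i:j] = np.interp(np.arange(i, j), [i - 1, j], [x[i - 1], x[j]])
            i = j
        else:
            i += 1
    return x
```

Replacing outliers with NaN before imputation (rather than clipping them) keeps the two cleaning steps independent, mirroring the two-stage pipeline in the abstract.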
zh
人工智能
[AI-0] Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning
【速读】:该论文旨在解决多玩家不完美信息博弈中AI代理难以实现顶尖人类水平表现的问题,尤其是在传统扑克类游戏中因多数对局快速收敛至两人对抗而削弱了多玩家动态建模能力的局限。其解决方案的关键在于开发出名为Solly的新型AI代理,采用无模型(model-free)、基于策略梯度与价值函数结合的深度强化学习算法(actor-critic),通过自对弈训练实现了在简化版Liar’s Poker中的精英级表现,不仅在胜率和期望收益上超越人类顶尖玩家,还展现出更强的随机化策略能力和抗exploitation特性,显著优于当时主流大语言模型(LLMs)。
链接: https://arxiv.org/abs/2511.03724
作者: Richard Dewey,Janos Botyanszki,Ciamac C. Moallemi,Andrew T. Zheng
机构: 未知
类目: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
备注:
Abstract:AI researchers have long focused on poker-like games as a testbed for environments characterized by multi-player dynamics, imperfect information, and reasoning under uncertainty. While recent breakthroughs have matched elite human play at no-limit Texas hold’em, the multi-player dynamics are subdued: most hands converge quickly with only two players engaged through multiple rounds of bidding. In this paper, we present Solly, the first AI agent to achieve elite human play in reduced-format Liar’s Poker, a game characterized by extensive multi-player engagement. We trained Solly using self-play with a model-free, actor-critic, deep reinforcement learning algorithm. Solly played at an elite human level as measured by win rate (won over 50% of hands) and equity (money won) in heads-up and multi-player Liar’s Poker. Solly also outperformed large language models (LLMs), including those with reasoning abilities, on the same metrics. Solly developed novel bidding strategies, randomized play effectively, and was not easily exploitable by world-class human players.
[AI-1] AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing
【Quick Read】: This paper targets two long-standing problems in analog/mixed-signal circuit design: the traditional handcrafted flow is slow and error-prone, and existing generative AI and reinforcement learning approaches are simulation-hungry and opaque, limiting engineers' trust and adoption. The key innovations of the proposed agentic AI framework (AnaFlow) are: 1) a multi-agent workflow in which specialized Large Language Model (LLM)-based agents collaborate to interpret the circuit topology, parse design goals, and iteratively refine design parameters with human-interpretable reasoning; 2) an adaptive simulation strategy that markedly improves sample efficiency by avoiding unnecessary simulations; and 3) learning from the optimization history to avoid repeating past mistakes and accelerate convergence. The framework fully automates circuit sizing, is more efficient and transparent than pure Bayesian optimization or reinforcement learning, and offers an explainable new paradigm for analog EDA.
Link: https://arxiv.org/abs/2511.03697
Authors: Mohsen Ahmadzadeh, Kaichang Chen, Georges Gielen
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Comments: Accepted by the 2025 International Conference on Computer-Aided Design (ICCAD 2025); presented in Munich, October 2025
Abstract:Analog/mixed-signal circuits are key for interfacing electronics with the physical world. Their design, however, remains a largely handcrafted process, resulting in long and error-prone design cycles. While the recent rise of AI-based reinforcement learning and generative AI has created new techniques to automate this task, the need for many time-consuming simulations is a critical bottleneck hindering the overall efficiency. Furthermore, the lack of explainability of the resulting design solutions hampers widespread adoption of the tools. To address these issues, a novel agentic AI framework for sample-efficient and explainable analog circuit sizing is presented. It employs a multi-agent workflow where specialized Large Language Model (LLM)-based agents collaborate to interpret the circuit topology, to understand the design goals, and to iteratively refine the circuit’s design parameters towards the target goals with human-interpretable reasoning. The adaptive simulation strategy creates an intelligent control that yields a high sample efficiency. The AnaFlow framework is demonstrated for two circuits of varying complexity and is able to complete the sizing task fully automatically, differently from pure Bayesian optimization and reinforcement learning approaches. The system learns from its optimization history to avoid past mistakes and to accelerate convergence. The inherent explainability makes this a powerful tool for analog design space exploration and a new paradigm in analog EDA, where AI agents serve as transparent design assistants.
[AI-2] The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents
【Quick Read】: This paper tackles the complexity of building production-grade software engineering agents, specifically implementation flexibility, reliable and secure execution, and usable interfaces for human interaction. The key is the OpenHands Software Agent SDK, a complete architectural redesign of the agent components of the popular open-source OpenHands framework. Its simple yet extensible interface lets a basic agent be implemented in only a few lines of code, while providing seamless local-to-remote execution portability, built-in sandboxing, lifecycle control, model-agnostic multi-LLM routing, and integrated security analysis, and supporting diverse interfaces such as VS Code, command-line tools, and APIs. Strong results on the SWE-Bench Verified and GAIA benchmarks show it provides a solid foundation for agent prototyping, custom applications, and reliable deployment at scale.
Link: https://arxiv.org/abs/2511.03690
Authors: Xingyao Wang, Simon Rosenberg, Juan Michelini, Calvin Smith, Hoang Tran, Engel Nyst, Rohit Malhotra, Xuhui Zhou, Valerie Chen, Robert Brennan, Graham Neubig
Affiliation: Unknown
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments:
Abstract:Agents are now used widely in the process of software development, but building production-ready software engineering agents is a complex task. Deploying software agents effectively requires flexibility in implementation and experimentation, reliable and secure execution, and interfaces for users to interact with agents. In this paper, we present the OpenHands Software Agent SDK, a toolkit for implementing software development agents that satisfy these desiderata. This toolkit is a complete architectural redesign of the agent components of the popular OpenHands framework for software development agents, which has 64k+ GitHub stars. To achieve flexibility, we design a simple interface for implementing agents that requires only a few lines of code in the default case, but is easily extensible to more complex, full-featured agents with features such as custom tools, memory management, and more. For security and reliability, it delivers seamless local-to-remote execution portability, integrated REST/WebSocket services. For interaction with human users, it can connect directly to a variety of interfaces, such as visual workspaces (VS Code, VNC, browser), command-line interfaces, and APIs. Compared with existing SDKs from OpenAI, Claude, and Google, OpenHands uniquely integrates native sandboxed execution, lifecycle control, model-agnostic multi-LLM routing, and built-in security analysis. Empirical results on SWE-Bench Verified and GAIA benchmarks demonstrate strong performance. Put together, these elements allow the OpenHands Software Agent SDK to provide a practical foundation for prototyping, unlocking new classes of custom applications, and reliably deploying agents at scale.
[AI-3] Structured Matrix Scaling for Multi-Class Calibration
【Quick Read】: This paper addresses inaccurate probability estimates from classifiers, i.e., how post-hoc recalibration can make predicted probabilities more reliable. The core difficulty is that parametric, logistic-regression-based calibrators (temperature, vector, and matrix scaling) introduce rapidly growing parameter counts in the multi-class setting and overfit when calibration data are limited. The key to the solution is combining structured regularization, robust preprocessing, and efficient optimization to manage the resulting bias-variance tradeoff, yielding substantial gains over existing logistic-based calibration techniques.
Link: https://arxiv.org/abs/2511.03685
Authors: Eugène Berta, David Holzmüller, Michael I. Jordan, Francis Bach
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
Abstract:Post-hoc recalibration methods are widely used to ensure that classifiers provide faithful probability estimates. We argue that parametric recalibration functions based on logistic regression can be motivated from a simple theoretical setting for both binary and multiclass classification. This insight motivates the use of more expressive calibration methods beyond standard temperature scaling. For multi-class calibration however, a key challenge lies in the increasing number of parameters introduced by more complex models, often coupled with limited calibration data, which can lead to overfitting. Through extensive experiments, we demonstrate that the resulting bias-variance tradeoff can be effectively managed by structured regularization, robust preprocessing and efficient optimization. The resulting methods lead to substantial gains over existing logistic-based calibration techniques. We provide efficient and easy-to-use open-source implementations of our methods, making them an attractive alternative to common temperature, vector, and matrix scaling implementations.
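As context for the logistic-family calibrators discussed above, here is a minimal sketch of matrix scaling with an L2 penalty pulling the weight matrix toward the identity; the specific regularizer, optimizer, and toy data are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_matrix_scaling(logits, labels, lam=0.1, lr=0.05, steps=500):
    """Fit W, b minimizing mean NLL of softmax(logits @ W.T + b),
    with L2 regularization toward the identity (a hypothetical choice)."""
    n, k = logits.shape
    W, b = np.eye(k), np.zeros(k)
    onehot = np.eye(k)[labels]
    for _ in range(steps):
        p = softmax(logits @ W.T + b)
        g = (p - onehot) / n                     # dNLL/d(scaled logits)
        W -= lr * (g.T @ logits + lam * (W - np.eye(k)))
        b -= lr * g.sum(axis=0)
    return W, b

# Toy overconfident classifier: correct-class logits inflated.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
logits = rng.normal(0.0, 1.0, size=(200, 3))
logits[np.arange(200), labels] += 3.0
W, b = fit_matrix_scaling(logits, labels)
probs = softmax(logits @ W.T + b)
```

Temperature scaling corresponds to constraining W to a scalar multiple of the identity and b to zero; the paper's point is that richer maps like this W need regularization of exactly this kind to avoid multi-class overfitting.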
[AI-4] Whisper Leak: a side-channel attack on Large Language Models
【Quick Read】: This paper addresses the privacy risk of deploying large language models (LLMs) in sensitive settings, specifically side-channel attacks that exploit metadata of encrypted traffic such as packet sizes and timing patterns. The key contribution is Whisper Leak, an attack that infers the topic of a user's prompt from streaming responses without breaking TLS: across 28 popular LLM services it achieves near-perfect topic classification (often 98% AUPRC) and remains precise even under extreme class imbalance (a 10,000:1 noise-to-target ratio), reaching 100% precision on sensitive topics such as "money laundering". These findings show that metadata leakage is a widespread and serious weakness of current LLM deployments; the evaluated mitigations (random padding, token batching, packet injection) reduce but do not eliminate the attack, underscoring the need for providers to address it.
Link: https://arxiv.org/abs/2511.03675
Authors: Geoff McDonald, Jonathan Bar Or
Affiliation: Unknown
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments: 14 pages, 7 figures
Abstract:Large Language Models (LLMs) are increasingly deployed in sensitive domains including healthcare, legal services, and confidential communications, where privacy is paramount. This paper introduces Whisper Leak, a side-channel attack that infers user prompt topics from encrypted LLM traffic by analyzing packet size and timing patterns in streaming responses. Despite TLS encryption protecting content, these metadata patterns leak sufficient information to enable topic classification. We demonstrate the attack across 28 popular LLMs from major providers, achieving near-perfect classification (often 98% AUPRC) and high precision even at extreme class imbalance (10,000:1 noise-to-target ratio). For many models, we achieve 100% precision in identifying sensitive topics like “money laundering” while recovering 5-20% of target conversations. This industry-wide vulnerability poses significant risks for users under network surveillance by ISPs, governments, or local adversaries. We evaluate three mitigation strategies - random padding, token batching, and packet injection - finding that while each reduces attack effectiveness, none provides complete protection. Through responsible disclosure, we have collaborated with providers to implement initial countermeasures. Our findings underscore the need for LLM providers to address metadata leakage as AI systems handle increasingly sensitive information.
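The attack's core idea, classifying topics from ciphertext sizes alone, can be illustrated with a deliberately simplified simulation; the size distributions and nearest-centroid classifier below are hypothetical stand-ins, not the paper's actual feature set or models:

```python
import numpy as np

rng = np.random.default_rng(42)

def stream_sizes(topic, n=60):
    """Simulate per-chunk ciphertext sizes (bytes) of a streaming response.
    Assumption for illustration: the sensitive topic yields longer chunks."""
    base = 18 if topic == "sensitive" else 12
    return rng.poisson(base, size=n)

def features(sizes):
    # Side-channel features: metadata only, never plaintext.
    return np.array([sizes.mean(), sizes.std()])

# "Train" a nearest-centroid classifier on observed streams.
train = [(features(stream_sizes(t)), t) for t in ["sensitive", "benign"] * 50]
centroids = {t: np.mean([f for f, lbl in train if lbl == t], axis=0)
             for t in ["sensitive", "benign"]}

def classify(sizes):
    f = features(sizes)
    return min(centroids, key=lambda t: np.linalg.norm(f - centroids[t]))

hits = sum(classify(stream_sizes("sensitive")) == "sensitive" for _ in range(100))
```

Even this crude classifier separates the two simulated topics almost perfectly, which is the intuition behind why TLS alone does not hide prompt topics and why padding and batching only blur, rather than remove, such distributional differences.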
[AI-5] DQN Performance with Epsilon Greedy Policies and Prioritized Experience Replay
【Quick Read】: This paper studies the training efficiency and convergence stability of Deep Q-Networks (DQN) in finite environments, focusing on the interaction between epsilon-greedy exploration and experience replay. The key is a systematic analysis of how different epsilon decay schedules affect learning efficiency and reward optimization, combined with prioritized experience replay (PER) to improve sample efficiency, yielding faster convergence and higher returns. Experiments comparing uniform, no-replay, and prioritized strategies show that well-designed epsilon schedules together with PER markedly improve DQN's robustness and performance in resource-constrained settings.
Link: https://arxiv.org/abs/2511.03670
Authors: Daniel Perkins, Oscar J. Escobar, Luke Green
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 10 pages, 8 figures
Abstract:We present a detailed study of Deep Q-Networks in finite environments, emphasizing the impact of epsilon-greedy exploration schedules and prioritized experience replay. Through systematic experimentation, we evaluate how variations in epsilon decay schedules affect learning efficiency, convergence behavior, and reward optimization. We investigate how prioritized experience replay leads to faster convergence and higher returns and show empirical results comparing uniform, no replay, and prioritized strategies across multiple simulations. Our findings illuminate the trade-offs and interactions between exploration strategies and memory management in DQN training, offering practical recommendations for robust reinforcement learning in resource-constrained settings.
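The two ingredients studied above can be sketched in their textbook form (generic epsilon schedules and proportional prioritized replay in the style of Schaul et al.; the constants are illustrative, not the authors' settings):

```python
import numpy as np

def epsilon(step, eps_start=1.0, eps_end=0.05, decay=1000, kind="exp"):
    """Epsilon-greedy schedules: exponential or linear decay."""
    if kind == "exp":
        return eps_end + (eps_start - eps_end) * np.exp(-step / decay)
    frac = min(step / decay, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def per_probabilities(td_errors, alpha=0.6, eps=1e-2):
    """Proportional prioritization: P(i) ∝ (|delta_i| + eps)^alpha."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()

def importance_weights(probs, beta=0.4):
    """Importance-sampling weights correcting the replay bias, max-normalized."""
    w = (len(probs) * probs) ** (-beta)
    return w / w.max()

td = np.array([0.1, 2.0, 0.5, 0.0])   # toy TD errors for four transitions
probs = per_probabilities(td)
w = importance_weights(probs)
```

Transitions with large TD error are replayed more often, while the importance weights down-weight their gradient contribution; alpha and beta trade off how aggressively prioritization departs from uniform replay.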
[AI-6] Visualization Biases MLLMs Decision Making in Network Data Tasks IEEE-VIS2025
【Quick Read】: This paper examines how visualizations influence the judgments of multimodal large language models (MLLMs) when deciding whether a network contains a bridge. It finds that adding a visualization raises the model's confidence relative to a structured text-only input, but standard visualization techniques introduce a strong bias, pushing the model to accept or reject the presence of a bridge regardless of whether one actually exists. The key takeaway is that visualizations can effectively sway an MLLM's judgment without lowering its self-reported confidence, so practitioners must be careful about letting users include visualizations in generative AI applications, to avoid undesired hallucinations.
Link: https://arxiv.org/abs/2511.03617
Authors: Timo Brand, Henry Förster, Stephen G. Kobourov, Jacob Miller
Affiliation: Unknown
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI)
Comments: This manuscript was presented at VIS x GenAI, a workshop co-located with IEEE VIS 2025
Abstract:We evaluate how visualizations can influence the judgment of MLLMs about the presence or absence of bridges in a network. We show that the inclusion of visualization improves confidence over a structured text-based input that could theoretically be helpful for answering the question. On the other hand, we observe that standard visualization techniques create a strong bias towards accepting or refuting the presence of a bridge – independently of whether or not a bridge actually exists in the network. While our results indicate that the inclusion of visualization techniques can effectively influence the MLLM’s judgment without compromising its self-reported confidence, they also imply that practitioners must be careful of allowing users to include visualizations in generative AI applications so as to avoid undesired hallucinations.
[AI-7] PerfDojo: Automated ML Library Generation for Heterogeneous Architectures
【Quick Read】: This paper addresses the difficulty of achieving optimal machine learning performance across heterogeneous hardware (CPUs, GPUs, accelerators), where differing instruction sets, specialized kernels for various data types and model features (e.g., sparsity, quantization), and architecture-specific optimizations make tuning complex. Manual optimization is costly, while existing automatic approaches rely on hardware-specific heuristics and uninterpretable intermediate representations, limiting performance portability. The key is PerfLLM, an automatic optimization method built on Large Language Models (LLMs) and Reinforcement Learning (RL); its core component, PerfDojo, frames optimization as an RL game over a human-readable, mathematically inspired code representation whose transformations guarantee semantic validity. This enables effective optimization without prior hardware knowledge, supports both human analysis and RL-agent training, and delivers significant performance gains across diverse CPU (x86, Arm, RISC-V) and GPU architectures.
Link: https://arxiv.org/abs/2511.03586
Authors: Andrei Ivanov, Siyuan Shen, Gioele Gottardo, Marcin Chrapek, Afif Boudaoud, Timo Schneider, Luca Benini, Torsten Hoefler
Affiliation: Unknown
Subjects: Performance (cs.PF); Artificial Intelligence (cs.AI)
Comments:
Abstract:The increasing complexity of machine learning models and the proliferation of diverse hardware architectures (CPUs, GPUs, accelerators) make achieving optimal performance a significant challenge. Heterogeneity in instruction sets, specialized kernel requirements for different data types and model features (e.g., sparsity, quantization), and architecture-specific optimizations complicate performance tuning. Manual optimization is resource-intensive, while existing automatic approaches often rely on complex hardware-specific heuristics and uninterpretable intermediate representations, hindering performance portability. We introduce PerfLLM, a novel automatic optimization methodology leveraging Large Language Models (LLMs) and Reinforcement Learning (RL). Central to this is PerfDojo, an environment framing optimization as an RL game using a human-readable, mathematically-inspired code representation that guarantees semantic validity through transformations. This allows effective optimization without prior hardware knowledge, facilitating both human analysis and RL agent training. We demonstrate PerfLLM’s ability to achieve significant performance gains across diverse CPU (x86, Arm, RISC-V) and GPU architectures.
[AI-8] Learning Under Laws: A Constraint-Projected Neural PDE Solver that Eliminates Hallucinations
【Quick Read】: This paper addresses neural PDE solvers that violate the physical laws they model, such as creating mass from nowhere, drifting shocks, or breaking entropy and positivity. The key is Constraint-Projected Learning (CPL), which keeps every update physically admissible by projecting network outputs onto the intersection of constraint sets defined by conservation, Rankine-Hugoniot balance, entropy, and positivity. The projection is differentiable, adds only about 10% computational overhead, and is fully compatible with backpropagation; combined with total-variation damping (TVD) to suppress small oscillations and a rollout curriculum that enforces long-horizon consistency, it eliminates both hard and soft violations on Burgers and Euler systems without loss of accuracy.
Link: https://arxiv.org/abs/2511.03578
Authors: Mainak Singha
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 25 pages, 2 figures. This work introduces Constraint-Projected Learning (CPL), a framework for neural PDE solvers that enforces physical conservation laws during training to eliminate hallucinated, non-physical solutions. Feedback is welcome. Not under review elsewhere
Abstract:Neural networks can approximate solutions to partial differential equations, but they often break the very laws they are meant to model-creating mass from nowhere, drifting shocks, or violating conservation and entropy. We address this by training within the laws of physics rather than beside them. Our framework, called Constraint-Projected Learning (CPL), keeps every update physically admissible by projecting network outputs onto the intersection of constraint sets defined by conservation, Rankine-Hugoniot balance, entropy, and positivity. The projection is differentiable and adds only about 10% computational overhead, making it fully compatible with back-propagation. We further stabilize training with total-variation damping (TVD) to suppress small oscillations and a rollout curriculum that enforces consistency over long prediction horizons. Together, these mechanisms eliminate both hard and soft violations: conservation holds at machine precision, total-variation growth vanishes, and entropy and error remain bounded. On Burgers and Euler systems, CPL produces stable, physically lawful solutions without loss of accuracy. Instead of hoping neural solvers will respect physics, CPL makes that behavior an intrinsic property of the learning process.
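The projection idea can be shown in miniature with alternating projections onto a mass-conservation set and the positivity cone; this is a toy stand-in for CPL's differentiable projection onto the full constraint intersection (conservation, Rankine-Hugoniot, entropy, positivity), and the values are invented:

```python
import numpy as np

def project_mass(u, total):
    """Euclidean projection onto {u : sum(u) == total}: a uniform shift."""
    return u + (total - u.sum()) / u.size

def project_positive(u):
    """Projection onto the positivity cone {u : u >= 0}."""
    return np.clip(u, 0.0, None)

def constraint_projection(u, total, iters=50):
    """Alternating projections toward the intersection of both sets."""
    for _ in range(iters):
        u = project_positive(project_mass(u, total))
    return u

raw = np.array([-0.2, 0.5, 1.1, 0.3])   # hypothetical network output
proj = constraint_projection(raw, total=2.0)
```

After a few dozen alternations the output satisfies both constraints to high precision; because each projection is a differentiable (almost everywhere) map, gradients can flow through the composite operator during training, which is what makes "training within the laws" possible.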
[AI-9] Multi-User Personalisation in Human-Robot Interaction: Using Quantitative Bipolar Argumentation Frameworks for Preferences Conflict Resolution
【Quick Read】: This paper addresses the neglect of multi-user conflicts in HRI personalization: existing approaches adapt to a single user, overlooking scenarios where multiple stakeholders hold conflicting preferences. The key is MUP-QBAF, a multi-user personalization framework based on Quantitative Bipolar Argumentation Frameworks (QBAFs) that explicitly models and resolves preference conflicts. Unlike traditional argumentation frameworks that assume static inputs, MUP-QBAF combines users' arguments with the robot's dynamic observations of the environment, so argument strengths are recalculated iteratively as new information arrives, enabling adaptive decisions in changing contexts. A realistic case study, in which an assistive robot mediates between the conflicting preferences of a caregiver and a care recipient during a frailty assessment, validates the approach and shows that it handles multi-user conflicts transparently, structurally, and context-sensitively, offering a principled alternative to data-driven methods for real-world multi-user HRI.
Link: https://arxiv.org/abs/2511.03576
Authors: Aniol Civit, Antonio Andriella, Carles Sierra, Guillem Alenyà
Affiliation: Unknown
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: Preprint submitted to a journal
Abstract:While personalisation in Human-Robot Interaction (HRI) has advanced significantly, most existing approaches focus on single-user adaptation, overlooking scenarios involving multiple stakeholders with potentially conflicting preferences. To address this, we propose the Multi-User Preferences Quantitative Bipolar Argumentation Framework (MUP-QBAF), a novel multi-user personalisation framework based on Quantitative Bipolar Argumentation Frameworks (QBAFs) that explicitly models and resolves multi-user preference conflicts. Unlike prior work in Argumentation Frameworks, which typically assumes static inputs, our approach is tailored to robotics: it incorporates both users’ arguments and the robot’s dynamic observations of the environment, allowing the system to adapt over time and respond to changing contexts. Preferences, both positive and negative, are represented as arguments whose strength is recalculated iteratively based on new information. The framework’s properties and capabilities are presented and validated through a realistic case study, where an assistive robot mediates between the conflicting preferences of a caregiver and a care recipient during a frailty assessment task. This evaluation further includes a sensitivity analysis of argument base scores, demonstrating how preference outcomes can be shaped by user input and contextual observations. By offering a transparent, structured, and context-sensitive approach to resolving competing user preferences, this work advances the field of multi-user HRI. It provides a principled alternative to data-driven methods, enabling robots to navigate conflicts in real-world environments.
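The iterative strength recalculation described above can be sketched with DF-QuAD-style gradual semantics, a common choice for QBAFs; whether MUP-QBAF uses exactly this combination function is an assumption, and the arguments below are hypothetical names echoing the paper's case study:

```python
def aggregate(scores):
    """DF-QuAD aggregation of attacker/supporter strengths: 1 - prod(1 - s)."""
    out = 1.0
    for s in scores:
        out *= (1.0 - s)
    return 1.0 - out

def df_quad_strength(base, att, sup):
    """Move the base score toward 0 or 1 by the attack/support imbalance."""
    va, vs = aggregate(att), aggregate(sup)
    if va >= vs:
        return base - base * abs(vs - va)
    return base + (1 - base) * abs(vs - va)

def evaluate(qbaf, iters=20):
    """Iteratively recompute strengths until an (approximate) fixpoint.
    qbaf: {arg: (base_score, [attackers], [supporters])} (illustrative schema)."""
    s = {a: spec[0] for a, spec in qbaf.items()}
    for _ in range(iters):
        s = {a: df_quad_strength(spec[0],
                                 [s[x] for x in spec[1]],
                                 [s[x] for x in spec[2]])
             for a, spec in qbaf.items()}
    return s

# A caregiver preference attacked by the care recipient's preference and
# supported by a robot observation (hypothetical base scores).
qbaf = {
    "caregiver":   (0.6, ["recipient"], ["observation"]),
    "recipient":   (0.5, [], []),
    "observation": (0.7, [], []),
}
strengths = evaluate(qbaf)
```

Because the robot's observations enter as ordinary arguments, new sensor evidence simply changes base scores and re-runs the same fixpoint computation, which is how the framework adapts over time.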
[AI-10] Imitation Learning in the Deep Learning Era: A Novel Taxonomy and Recent Advances
【Quick Read】: This survey systematically reviews recent advances in imitation learning (IL), addressing the lack of a unified taxonomy amid diverse methods and unresolved challenges such as generalization, covariate shift, and demonstration quality. The key contribution is a novel taxonomy, distinct from existing categorizations, that better reflects the current structure and trends of IL research; the survey also critically examines the strengths, limitations, and evaluation practices of representative works and outlines key challenges and open directions, pointing the way toward more scalable and practical IL in complex settings.
Link: https://arxiv.org/abs/2511.03565
Authors: Iason Chrysomallis, Georgios Chalkiadakis
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:
Abstract:Imitation learning (IL) enables agents to acquire skills by observing and replicating the behavior of one or multiple experts. In recent years, advances in deep learning have significantly expanded the capabilities and scalability of imitation learning across a range of domains, where expert data can range from full state-action trajectories to partial observations or unlabeled sequences. Alongside this growth, novel approaches have emerged, with new methodologies being developed to address longstanding challenges such as generalization, covariate shift, and demonstration quality. In this survey, we review the latest advances in imitation learning research, highlighting recent trends, methodological innovations, and practical applications. We propose a novel taxonomy that is distinct from existing categorizations to better reflect the current state of the IL research stratum and its trends. Throughout the survey, we critically examine the strengths, limitations, and evaluation practices of representative works, and we outline key challenges and open directions for future research.
[AI-11] Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding
【Quick Read】: This paper addresses the lack of software-engineering grounding in code explanations generated by Large Language Models (LLMs), which can make them inaccurate or divorced from real usage. The key is to exploit natural-language artifacts on GitHub, such as pull request descriptions, issue discussions, and commit messages, to strengthen the LLM's understanding of a code fragment's purpose. The system has three components: one extracts and structures the relevant GitHub context, another uses that context to generate high-level explanations of the code's purpose, and a third validates the explanations. This markedly improves the quality and usefulness of code explanations, and a small user study with developers indicates the generated insights are often helpful, non-trivial, and free from hallucinations.
Link: https://arxiv.org/abs/2511.03549
Authors: Ziv Nevo, Orna Raz, Karen Yorav
Affiliation: Unknown
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments: 7 pages, 6 figures, to be published in AISM 2025, see this https URL
Abstract:Understanding the purpose of source code is a critical task in software maintenance, onboarding, and modernization. While large language models (LLMs) have shown promise in generating code explanations, they often lack grounding in the broader software engineering context. We propose a novel approach that leverages natural language artifacts from GitHub – such as pull request descriptions, issue descriptions and discussions, and commit messages – to enhance LLM-based code understanding. Our system consists of three components: one that extracts and structures relevant GitHub context, another that uses this context to generate high-level explanations of the code’s purpose, and a third that validates the explanation. We implemented this as a standalone tool, as well as a server within the Model Context Protocol (MCP), enabling integration with other AI-assisted development tools. Our main use case is that of enhancing a standard LLM-based code explanation with code insights that our system generates. To evaluate explanations’ quality, we conducted a small scale user study, with developers of several open projects, as well as developers of proprietary projects. Our user study indicates that when insights are generated they often are helpful and non trivial, and are free from hallucinations.
[AI-12] Explaining Decisions in ML Models: a Parameterized Complexity Analysis (Part I) KR
【Quick Read】: This paper addresses the lack of a systematic theoretical analysis of the parameterized complexity of explanation problems in explainable AI (XAI). Rather than treating ML models as black boxes, it focuses on models with transparent internal mechanisms and formally analyzes two principal classes of explanation problems, abductive and contrastive, in both their local and global variants, revealing the complexity of generating explanations for decision trees, decision sets, decision lists, Boolean circuits, and ensembles thereof. The key contribution is a unified parameterized-complexity framework that provides a theoretical foundation for XAI and supports further research on transparency and accountability in AI systems.
Link: https://arxiv.org/abs/2511.03545
Authors: Sebastian Ordyniak, Giacomo Paesani, Mateusz Rychlicki, Stefan Szeider
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI)
Comments: Part I of a greatly enhanced version of this https URL, whose full version is available on arXiv under this https URL
Abstract:This paper presents a comprehensive theoretical investigation into the parameterized complexity of explanation problems in various machine learning (ML) models. Contrary to the prevalent black-box perception, our study focuses on models with transparent internal mechanisms. We address two principal types of explanation problems: abductive and contrastive, both in their local and global variants. Our analysis encompasses diverse ML models, including Decision Trees, Decision Sets, Decision Lists, Boolean Circuits, and ensembles thereof, each offering unique explanatory challenges. This research fills a significant gap in explainable AI (XAI) by providing a foundational understanding of the complexities of generating explanations for these models. This work provides insights vital for further research in the domain of XAI, contributing to the broader discourse on the necessity of transparency and accountability in AI systems.
[AI-13] Efficient Neural Networks with Discrete Cosine Transform Activations
【Quick Read】: This paper addresses the difficulty of balancing expressiveness, parameter efficiency, and interpretability in neural networks: standard multilayer perceptrons (MLPs) are expressive but often carry redundant parameters and are hard to interpret or compress. The key is the Expressive Neural Network (ENN), whose adaptive activation functions are parameterized with the Discrete Cosine Transform (DCT), yielding a structured, decorrelated, and functionally meaningful representation. This improves interpretability, since each neuron's functional role maps directly to specific DCT coefficients, and enables efficient pruning: thanks to the orthogonality and boundedness of the DCT basis, up to 40% of the activation coefficients can be safely removed with negligible or no loss in performance. The design integrates signal-processing theory into neural architecture design, striking a principled balance between expressiveness, compactness, and interpretability.
Link: https://arxiv.org/abs/2511.03531
Authors: Marc Martinez-Gost, Sara Pepe, Ana Pérez-Neira, Miguel Ángel Lagunas
Affiliation: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Paper submitted to WSEAS Signal Processing Journal
Abstract:In this paper, we extend our previous work on the Expressive Neural Network (ENN), a multilayer perceptron with adaptive activation functions parametrized using the Discrete Cosine Transform (DCT). Building upon previous work that demonstrated the strong expressiveness of ENNs with compact architectures, we now emphasize their efficiency, interpretability and pruning capabilities. The DCT-based parameterization provides a structured and decorrelated representation that reveals the functional role of each neuron and allows direct identification of redundant components. Leveraging this property, we propose an efficient pruning strategy that removes unnecessary DCT coefficients with negligible or no loss in performance. Experimental results across classification and implicit neural representation tasks confirm that ENNs achieve state-of-the-art accuracy while maintaining a low number of parameters. Furthermore, up to 40% of the activation coefficients can be safely pruned, thanks to the orthogonality and bounded nature of the DCT basis. Overall, these findings demonstrate that the ENN framework offers a principled integration of signal processing concepts into neural network design, achieving a balanced trade-off between expressiveness, compactness, and interpretability.
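A DCT-parameterized activation and the coefficient-pruning idea can be sketched as follows; the basis scaling, keep ratio, and coefficient values are illustrative assumptions, not the ENN's exact parameterization:

```python
import numpy as np

def dct_activation(x, coeffs, L=np.pi):
    """Activation phi(x) = sum_k c_k * cos(k * pi * x / (2L)), a simplified
    DCT-style expansion over a bounded cosine basis."""
    k = np.arange(len(coeffs))
    return np.cos(np.outer(x, k) * np.pi / (2 * L)) @ coeffs

def prune(coeffs, keep_ratio=0.6):
    """Zero the smallest-magnitude DCT coefficients (structured pruning)."""
    n_keep = int(np.ceil(keep_ratio * len(coeffs)))
    idx = np.argsort(np.abs(coeffs))[::-1][:n_keep]
    out = np.zeros_like(coeffs)
    out[idx] = coeffs[idx]
    return out

coeffs = np.array([0.9, 0.4, 0.01, 0.3, 0.005])   # toy activation coefficients
pruned = prune(coeffs, keep_ratio=0.6)
x = np.linspace(-1.0, 1.0, 50)
err = np.max(np.abs(dct_activation(x, coeffs) - dct_activation(x, pruned)))
```

Because each cosine basis function is bounded by 1, dropping a coefficient of magnitude c changes the activation by at most c everywhere, which is why low-contribution coefficients can be removed with a tightly bounded effect.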
[AI-14] A Theoretical Framework for Environmental Similarity and Vessel Mobility as Coupled Predictors of Marine Invasive Species Pathways
【Quick Read】: This paper addresses the coverage gap in assessing the risk of marine species invasions driven by global shipping, caused by incomplete ballast-water records and traffic data. The key is to fuse climate-based environmental similarity across ports with vessel mobility networks built from Automatic Identification System (AIS) data: clustering and metric learning identify climate analogues between ports and support estimating species survival likelihood along shipping routes, while a temporal link-prediction model forecasts how traffic patterns may shift under changing environmental conditions. The resulting fusion yields exposure estimates at the port and voyage levels, providing a scientific basis for targeted monitoring, routing adjustments, and management interventions.
Link: https://arxiv.org/abs/2511.03499
Authors: Gabriel Spadon, Vaishnav Vaidheeswaran, Claudio DiBacco
Affiliation: Unknown
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI)
Comments: Abstract submitted to the 46th Canadian Conference on Remote Sensing
Abstract:Marine invasive species spread through global shipping and generate substantial ecological and economic impacts. Traditional risk assessments require detailed records of ballast water and traffic patterns, which are often incomplete, limiting global coverage. This work advances a theoretical framework that quantifies invasion risk by combining environmental similarity across ports with observed and forecasted maritime mobility. Climate-based feature representations characterize each port’s marine conditions, while mobility networks derived from Automatic Identification System data capture vessel flows and potential transfer pathways. Clustering and metric learning reveal climate analogues and enable the estimation of species survival likelihood along shipping routes. A temporal link prediction model captures how traffic patterns may change under shifting environmental conditions. The resulting fusion of environmental similarity and predicted mobility provides exposure estimates at the port and voyage levels, supporting targeted monitoring, routing adjustments, and management interventions.
[AI-15] ROSBag MCP Server: Analyzing Robot Data with LLMs for Agentic Embodied AI Applications
【Quick Read】: This paper addresses the scarcity of research at the intersection of agentic AI and embodied AI, in particular how to analyze and process robot data efficiently through natural-language interaction. The key is a Model Context Protocol (MCP) server for parsing ROS and ROS 2 bag files, which, combined with LLMs and VLMs, enables natural-language-driven analysis, visualization, and processing of mobile-robot data such as trajectories, laser scans, transforms, and time series. The system also provides a lightweight UI for benchmarking the tool-calling capabilities of different LLM/VLM models; experiments with eight state-of-the-art models, both proprietary and open-source, reveal a wide divide in tool-calling capability, with Kimi K2 and Claude Sonnet 4 clearly superior, and show that factors such as the tool description schema, the number of arguments, and the number of available tools significantly affect success rates.
Link: https://arxiv.org/abs/2511.03497
Authors: Lei Fu, Sahar Salimpour, Leonardo Militano, Harry Edelman, Jorge Peña Queralta, Giovanni Toffetti
Affiliation: Unknown
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Comments:
Abstract:Agentic AI systems and Physical or Embodied AI systems have been two key research verticals at the forefront of Artificial Intelligence and Robotics, with Model Context Protocol (MCP) increasingly becoming a key component and enabler of agentic applications. However, the literature at the intersection of these verticals, i.e., Agentic Embodied AI, remains scarce. This paper introduces an MCP server for analyzing ROS and ROS 2 bags, allowing for analyzing, visualizing and processing robot data with natural language through LLMs and VLMs. We describe specific tooling built with robotics domain knowledge, with our initial release focused on mobile robotics and supporting natively the analysis of trajectories, laser scan data, transforms, or time series data. This is in addition to providing an interface to standard ROS 2 CLI tools (“ros2 bag list” or “ros2 bag info”), as well as the ability to filter bags with a subset of topics or trimmed in time. Coupled with the MCP server, we provide a lightweight UI that allows the benchmarking of the tooling with different LLMs, both proprietary (Anthropic, OpenAI) and open-source (through Groq). Our experimental results include the analysis of tool calling capabilities of eight different state-of-the-art LLM/VLM models, both proprietary and open-source, large and small. Our experiments indicate that there is a large divide in tool calling capabilities, with Kimi K2 and Claude Sonnet 4 demonstrating clearly superior performance. We also conclude that there are multiple factors affecting the success rates, from the tool description schema to the number of arguments, as well as the number of tools available to the models. The code is available with a permissive license at this https URL.
[AI-16] Development of the Bioinspired Tendon-Driven DexHand 021 with Proprioceptive Compliance Control
【Quick Read】: This paper addresses the challenge of designing and controlling a highly dexterous robotic hand that replicates the human hand's multifunctional capabilities (motion, sensing, and coordinated manipulation) while satisfying engineering constraints on complexity, size-to-weight ratio, durability, and force-sensing performance. The key is a proprioceptive force-sensing-based admittance control method on a cable-driven five-finger hand with 12 active and 7 passive degrees of freedom (19 DoFs in total) at a lightweight 1 kg, achieving precise force control and stable manipulation. Experiments show superior performance: joint torques in multi-object grasping drop by 31.19% compared with PID control, force-estimation error stays below 0.2 N, and fingertip repeatability is under 0.001 m, improving the hand's adaptability and robustness in industrial scenarios.
Link: https://arxiv.org/abs/2511.03481
Authors: Jianbo Yuan, Haohua Zhu, Jing Dai, Sheng Yi
Affiliation: Unknown
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments: 8 pages, 18 figures; accepted by IEEE RA-L
Abstract:The human hand plays a vital role in daily life and industrial applications, yet replicating its multifunctional capabilities-including motion, sensing, and coordinated manipulation-with robotic systems remains a formidable challenge. Developing a dexterous robotic hand requires balancing human-like agility with engineering constraints such as complexity, size-to-weight ratio, durability, and force-sensing performance. This letter presents Dex-Hand 021, a high-performance, cable-driven five-finger robotic hand with 12 active and 7 passive degrees of freedom (DoFs), achieving 19 DoFs dexterity in a lightweight 1 kg design. We propose a proprioceptive force-sensing-based admittance control method to enhance manipulation. Experimental results demonstrate its superior performance: a single-finger load capacity exceeding 10 N, fingertip repeatability under 0.001 m, and force estimation errors below 0.2 N. Compared to PID control, joint torques in multi-object grasping are reduced by 31.19%, significantly improves force-sensing capability while preventing overload during collisions. The hand excels in both power and precision grasps, successfully executing 33 GRASP taxonomy motions and complex manipulation tasks. This work advances the design of lightweight, industrial-grade dexterous hands and enhances proprioceptive control, contributing to robotic manipulation and intelligent manufacturing.
[AI-17] Towards Scalable Web Accessibility Audit with MLLMs as Copilots AAAI2026
【Quick Read】: This paper addresses the resource-intensive, hard-to-scale nature of current web accessibility auditing, in particular the heavy manual effort WCAG-EM requires in practice despite its structured methodology for site-wide conformance evaluation. The key is the AAA auditing framework, built on a human-AI partnership with two core components: GRASP (Graph-based Representational Sampling for Accessibility Planning), a graph-based multimodal sampling method that learns embeddings of visual, textual, and relational cues to ensure representative page coverage; and MaC, a multimodal-LLM-based copilot that supports auditors with cross-modal reasoning and intelligent assistance on high-effort tasks. Together they enable scalable, end-to-end accessibility auditing, empowering human auditors with AI assistance for real-world impact.
Link: https://arxiv.org/abs/2511.03471
Authors: Ming Gu, Ziwei Wang, Sicen Lai, Zirui Gao, Sheng Zhou, Jiajun Bu
Affiliation: Unknown
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Comments: 15 pages. Accepted by AAAI 2026 AISI
Abstract:Ensuring web accessibility is crucial for advancing social welfare, justice, and equality in digital spaces, yet the vast majority of website user interfaces remain non-compliant, due in part to the resource-intensive and unscalable nature of current auditing practices. While WCAG-EM offers a structured methodology for site-wise conformance evaluation, it involves great human efforts and lacks practical support for execution at scale. In this work, we present an auditing framework, AAA, which operationalizes WCAG-EM through a human-AI partnership model. AAA is anchored by two key innovations: GRASP, a graph-based multimodal sampling method that ensures representative page coverage via learned embeddings of visual, textual, and relational cues; and MaC, a multimodal large language model-based copilot that supports auditors through cross-modal reasoning and intelligent assistance in high-effort tasks. Together, these components enable scalable, end-to-end web accessibility auditing, empowering human auditors with AI-enhanced assistance for real-world impact. We further contribute four novel datasets designed for benchmarking core stages of the audit pipeline. Extensive experiments demonstrate the effectiveness of our methods, providing insights that small-scale language models can serve as capable experts when fine-tuned.
zh
[AI-18] Inter-Agent Trust Models: A Comparative Study of Brief Claim Proof Stake Reputation and Constraint in Agentic Web Protocol Design-A2A AP2 ERC-8004 and Beyond AAAI2026
【速读】:该论文旨在解决多智能体系统中信任机制设计的不足问题,特别是在以大语言模型(LLM)为核心的自主代理(Agent)协作场景下,传统依赖声誉或声明式身份的信任模型易受提示注入、幻觉、对齐偏差等LLM特有脆弱性的影响,导致协议安全性与鲁棒性不足。其解决方案的关键在于提出一种分层的混合信任模型:以“证明”(Proof)和“质押”(Stake)作为默认的强信任锚点,用于保护高影响操作;辅以“简要验证”(Brief)实现身份识别与发现,“声誉”(Reputation)提供灵活性和社会信号,从而在保障安全性的前提下支持可扩展、可互操作的代理经济体系。
链接: https://arxiv.org/abs/2511.03434
作者: Botao ‘Amber’ Hu,Helena Rong
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Networking and Internet Architecture (cs.NI); Social and Information Networks (cs.SI)
备注: Submitted to AAAI 2026 Workshop on Trust and Control in Agentic AI (TrustAgent)
Abstract:As the “agentic web” takes shape-billions of AI agents (often LLM-powered) autonomously transacting and collaborating-trust shifts from human oversight to protocol design. In 2025, several inter-agent protocols crystallized this shift, including Google’s Agent-to-Agent (A2A), Agent Payments Protocol (AP2), and Ethereum’s ERC-8004 “Trustless Agents,” yet their underlying trust assumptions remain under-examined. This paper presents a comparative study of trust models in inter-agent protocol design: Brief (self- or third-party verifiable claims), Claim (self-proclaimed capabilities and identity, e.g. AgentCard), Proof (cryptographic verification, including zero-knowledge proofs and trusted execution environment attestations), Stake (bonded collateral with slashing and insurance), Reputation (crowd feedback and graph-based trust signals), and Constraint (sandboxing and capability bounding). For each, we analyze assumptions, attack surfaces, and design trade-offs, with particular emphasis on LLM-specific fragilities-prompt injection, sycophancy/nudge-susceptibility, hallucination, deception, and misalignment-that render purely reputational or claim-only approaches brittle. Our findings indicate no single mechanism suffices. We argue for trustless-by-default architectures anchored in Proof and Stake to gate high-impact actions, augmented by Brief for identity and discovery and Reputation overlays for flexibility and social signals. We comparatively evaluate A2A, AP2, ERC-8004 and related historical variations in academic research under metrics spanning security, privacy, latency/cost, and social robustness (Sybil/collusion/whitewashing resistance). We conclude with hybrid trust model recommendations that mitigate reputation gaming and misinformed LLM behavior, and we distill actionable design guidelines for safer, interoperable, and scalable agent economies.
zh
[AI-19] Light over Heavy: Automated Performance Requirements Quantification with Linguistic Inducement ICSE2026
【速读】:该论文旨在解决性能需求(performance requirements)量化过程中依赖人工标注所导致的高成本与不准确性问题。现有方法多采用大型语言模型(Large Language Models, LLMs)进行需求分析,但其在处理短且模式性强的性能需求时效率较低。解决方案的关键在于提出LQPR——一种基于新理论框架的自动化方法,将量化任务建模为分类问题,并设计了一种轻量级的语言学诱导匹配机制,以高效捕捉性能需求中的结构化特征,从而在保持高精度的同时显著降低计算开销。实验表明,LQPR在多个数据集上优于九种先进学习方法,在75%以上的场景中表现最优,且资源消耗仅为后者的二十分之一。
链接: https://arxiv.org/abs/2511.03421
作者: Shihai Wang,Tao Chen
机构: 未知
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
备注: accepted by ICSE 2026
Abstract:Elicited performance requirements need to be quantified for compliance in different engineering tasks, e.g., configuration tuning and performance testing. Much existing work has relied on manual quantification, which is expensive and error-prone due to the imprecision. In this paper, we present LQPR, a highly efficient automatic approach for performance requirements quantification. LQPR relies on a new theoretical framework that converts quantification into a classification problem. Despite the prevalent applications of Large Language Models (LLMs) for requirement analytics, LQPR takes a different perspective to address the classification: we observed that performance requirements can exhibit strong patterns and are often short/concise, therefore we design a lightweight linguistically induced matching mechanism. We compare LQPR against nine state-of-the-art learning-based approaches over diverse datasets, demonstrating that it is ranked as the sole best for 75% or more cases with two orders of magnitude less cost. Our work proves that, at least for performance requirement quantification, specialized methods can be more suitable than the general LLM-driven approaches.
zh
[AI-20] Adaptable Hindsight Experience Replay for Search-Based Learning
【速读】:该论文旨在解决AlphaZero类蒙特卡洛树搜索(Monte Carlo Tree Search, MCTS)系统在稀疏奖励环境下的训练效率问题,尤其是在早期阶段由于网络无法提供有效指导而导致的探索困难。其解决方案的关键在于引入可调节的 hindsight experience replay(HER),即Adaptable HER,该框架将HER机制与AlphaZero相结合,通过灵活调整重标注的目标(relabeling goals)、策略目标(policy targets)以及轨迹选择策略(trajectory selection),从而从失败的搜索路径中提取有效的监督信号,显著提升网络在早期训练阶段的学习能力,并在方程发现等任务上优于纯监督学习或强化学习方法。
链接: https://arxiv.org/abs/2511.03405
作者: Alexandros Vazaios,Jannis Brugger,Cedric Derstroff,Kristian Kersting,Mira Mezini
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注: 8 pages, 2 figures, Presented at the 9th International Workshop on Interactive Adaptive Learning
Abstract:AlphaZero-like Monte Carlo Tree Search systems, originally introduced for two-player games, dynamically balance exploration and exploitation using neural network guidance. This combination makes them also suitable for classical search problems. However, the original method of training the network with simulation results is limited in sparse reward settings, especially in the early stages, where the network cannot yet give guidance. Hindsight Experience Replay (HER) addresses this issue by relabeling unsuccessful trajectories from the search tree as supervised learning signals. We introduce Adaptable HER, a flexible framework that integrates HER with AlphaZero, allowing easy adjustments to HER properties such as relabeled goals, policy targets, and trajectory selection. Our experiments, including equation discovery, show that the possibility of modifying HER is beneficial and surpasses the performance of pure supervised or reinforcement learning.
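作为补充,下面用纯 Python 勾勒 HER"事后重标注"的基本思想:把一条未能到达原目标的轨迹,以其实际到达的状态作为新目标重新标注成监督样本。函数名与折扣因子 0.99 均为示意性假设,并非论文的官方实现:

```python
def relabel_trajectory(trajectory, achieved_state, gamma=0.99):
    """Hindsight relabeling: treat the state actually reached as the goal,
    turning a failed trajectory into positive supervised samples."""
    relabeled = []
    for t, (state, action) in enumerate(trajectory):
        steps_to_go = len(trajectory) - t  # remaining steps to the hindsight goal
        relabeled.append({
            "state": state,
            "action": action,
            "goal": achieved_state,         # hindsight goal replaces the original one
            "value": gamma ** steps_to_go,  # discounted value target
        })
    return relabeled

# A failed 3-step trajectory that never reached the original goal:
traj = [("s0", "a0"), ("s1", "a1"), ("s2", "a2")]
samples = relabel_trajectory(traj, achieved_state="s3")
```

这样,搜索树中"失败"的分支也能为策略与价值网络提供训练信号,缓解稀疏奖励下早期无指导的问题。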
zh
[AI-21] Discourse-Aware Scientific Paper Recommendation via QA-Style Summarization and Multi-Level Contrastive Learning
【速读】:该论文旨在解决开放获取(Open-Access, OA)科学文献快速增长背景下,如何在隐私保护和用户交互数据受限的前提下,实现更精准、可解释的学术论文推荐问题。现有内容驱动的推荐方法通常将论文视为无结构文本,忽略了其论述结构(discourse organization),导致语义完整性与可解释性不足。解决方案的关键在于提出OMRC-MR框架,其核心包括:1)基于QA风格的OMRC(Objective, Method, Result, Conclusion)摘要模块,将原始论文转化为结构化且语义连贯的表示;2)多层级对比学习机制,在元数据、段落和文档层面对齐语义表征;3)结构感知重排序阶段,通过上下文相似度校准进一步提升检索精度。该方法显著优于当前主流基线模型,在Precision@10和Recall@10上分别提升最高达7.2%和3.8%,并验证了结构化摘要带来的更高一致性和事实完整性。
链接: https://arxiv.org/abs/2511.03330
作者: Shenghua Wang,Zhen Yin
机构: 未知
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
备注:
Abstract:The rapid growth of open-access (OA) publications has intensified the challenge of identifying relevant scientific papers. Due to privacy constraints and limited access to user interaction data, recent efforts have shifted toward content-based recommendation, which relies solely on textual information. However, existing models typically treat papers as unstructured text, neglecting their discourse organization and thereby limiting semantic completeness and interpretability. To address these limitations, we propose OMRC-MR, a hierarchical framework that integrates QA-style OMRC (Objective, Method, Result, Conclusion) summarization, multi-level contrastive learning, and structure-aware re-ranking for scholarly recommendation. The QA-style summarization module converts raw papers into structured and discourse-consistent representations, while multi-level contrastive objectives align semantic representations across metadata, section, and document levels. The final re-ranking stage further refines retrieval precision through contextual similarity calibration. Experiments on DBLP, S2ORC, and the newly constructed Sci-OMRC dataset demonstrate that OMRC-MR consistently surpasses state-of-the-art baselines, achieving up to 7.2% and 3.8% improvements in Precision@10 and Recall@10, respectively. Additional evaluations confirm that QA-style summarization produces more coherent and factually complete representations. Overall, OMRC-MR provides a unified and interpretable content-based paradigm for scientific paper recommendation, advancing trustworthy and privacy-aware scholarly information retrieval.
zh
[AI-22] Extending Fair Null-Space Projections for Continuous Attributes to Kernel Methods
【速读】:该论文致力于解决连续受保护属性(continuous protected attributes)下的公平性问题,即在回归任务中如何对模型输出进行公平性约束与优化,这在现有文献中研究较少。其关键解决方案是将传统的迭代零空间投影(iterative null-space projection)方法推广至核方法(kernel methods),从而实现对核嵌入(kernel embeddings)的公平性处理,且不依赖于具体模型或公平性评分指标。该方法可直接应用于支持向量回归(SVR)等模型,在多个数据集上表现出竞争力或优于现有主流方法。
链接: https://arxiv.org/abs/2511.03304
作者: Felix Störck,Fabian Hinder,Barbara Hammer
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:
Abstract:With the on-going integration of machine learning systems into the everyday social life of millions, the notion of fairness becomes an ever increasing priority in their development. Fairness notions commonly rely on protected attributes to assess potential biases. Here, the majority of literature focuses on discrete setups regarding both target and protected attributes. The literature on continuous attributes especially in conjunction with regression – we refer to this as "continuous fairness" – is scarce. A common strategy is iterative null-space projection which as of now has only been explored for linear models or embeddings such as obtained by a non-linear encoder. We improve on this by generalizing to kernel methods, significantly extending the scope. This yields a model and fairness-score agnostic method for kernel embeddings applicable to continuous protected attributes. We demonstrate that our novel approach in conjunction with Support Vector Regression (SVR) provides competitive or improved performance across multiple datasets in comparison to other contemporary methods.
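作为直观说明,下面用纯 Python 演示零空间投影的单步操作:从嵌入向量中剔除与受保护属性相关的方向分量。论文的贡献在于把这一思想推广到核特征空间,此处仅展示线性情形,变量取值均为示例假设:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(x, w):
    """Null-space projection: remove from x its component along direction w."""
    coef = dot(x, w) / dot(w, w)
    return [xi - coef * wi for xi, wi in zip(x, w)]

# w: a direction correlated with the continuous protected attribute
# (e.g. fitted by least squares); x: an embedding vector.
w = [1.0, 0.0]
x = [3.0, 2.0]
x_fair = project_out(x, w)
```

投影后 x_fair 与 w 正交(内积为 0),即嵌入中沿受保护方向的信息被移除;迭代版本会重新估计方向并重复此步骤。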
zh
[AI-23] When Generative Artificial Intelligence meets Extended Reality: A Systematic Review
【速读】:该论文旨在解决当前生成式 AI (Generative AI) 与扩展现实 (Extended Reality, XR) 融合应用中缺乏系统性综述的问题,尤其关注其在2023至2025年间的研究进展与技术实现路径。解决方案的关键在于通过 PRISMA 系统文献筛选与分析方法,对最终纳入的26篇相关文献进行结构化归纳,提炼出生成式 AI 在 XR 中的主要应用场景及核心技术实现方式,从而揭示当前研究趋势、识别关键研究空白,并为未来基于生成式 AI 的 XR 技术发展提供方向性指导。
链接: https://arxiv.org/abs/2511.03282
作者: Xinyu Ning,Yan Zhuo,Xian Wang,Chan-In Devin Sio,Lik-Hang Lee
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:
Abstract:With the continuous advancement of technology, the application of generative artificial intelligence (AI) in various fields is gradually demonstrating great potential, particularly when combined with Extended Reality (XR), creating unprecedented possibilities. This survey article systematically reviews the applications of generative AI in XR, covering as much relevant literature as possible from 2023 to 2025. The application areas of generative AI in XR and its key technology implementations are summarised through PRISMA screening and analysis of the final 26 articles. The survey highlights existing articles from the last three years related to how XR utilises generative AI, providing insights into current trends and research gaps. We also explore potential opportunities for future research to further empower XR through generative AI, providing guidance and information for future generative XR research.
zh
[AI-24] GMoPE:A Prompt-Expert Mixture Framework for Graph Foundation Models
【速读】:该论文旨在解决图神经网络(Graph Neural Networks, GNNs)在跨域和跨任务场景下泛化能力不足的问题,具体表现为负迁移(negative transfer)、可扩展性差以及适应成本高等挑战。其解决方案的关键在于提出GMoPE(Graph Mixture of Prompt-Experts)框架,通过将专家混合(Mixture-of-Experts, MoE)架构与基于提示(prompt-based)的学习相结合,利用专家特异的提示向量(prompt vectors)和结构感知的MoE路由机制,使每个专家专注于不同的子领域并动态参与预测;同时引入软正交约束以增强提示向量间的多样性,防止专家坍缩,提升专家利用均衡性;此外采用仅微调提示参数的策略,在显著降低时空复杂度的同时实现接近全参数微调的性能表现。
链接: https://arxiv.org/abs/2511.03251
作者: Zhibin Wang,Zhixing Zhang,Shuqi Wang,Xuanting Xie,Zhao Kang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
备注:
Abstract:Graph Neural Networks (GNNs) have demonstrated impressive performance on task-specific benchmarks, yet their ability to generalize across diverse domains and tasks remains limited. Existing approaches often struggle with negative transfer, scalability issues, and high adaptation costs. To address these challenges, we propose GMoPE (Graph Mixture of Prompt-Experts), a novel framework that seamlessly integrates the Mixture-of-Experts (MoE) architecture with prompt-based learning for graphs. GMoPE leverages expert-specific prompt vectors and structure-aware MoE routing to enable each expert to specialize in distinct subdomains and dynamically contribute to predictions. To promote diversity and prevent expert collapse, we introduce a soft orthogonality constraint across prompt vectors, encouraging expert specialization and facilitating a more balanced expert utilization. Additionally, we adopt a prompt-only fine-tuning strategy that significantly reduces spatiotemporal complexity during transfer. We validate GMoPE through extensive experiments under various pretraining strategies and multiple downstream tasks. Results show that GMoPE consistently outperforms state-of-the-art baselines and achieves performance comparable to full parameter fine-tuning-while requiring only a fraction of the adaptation overhead. Our work provides a principled and scalable framework for advancing generalizable and efficient graph foundation models.
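文中对提示向量施加的"软正交约束"可以用如下最小示意理解:对各专家的提示向量两两计算内积平方并求和作为惩罚项,惩罚越小,专家间越接近正交、越不易坍缩。以下为纯 Python 草图,非论文官方实现:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_orthogonality_penalty(prompts):
    """Sum of squared pairwise inner products between expert prompt vectors;
    minimizing it pushes the prompts toward mutual orthogonality."""
    penalty = 0.0
    for i in range(len(prompts)):
        for j in range(i + 1, len(prompts)):
            penalty += dot(prompts[i], prompts[j]) ** 2
    return penalty

# Three expert prompts: the first two orthogonal, the third overlapping both.
prompts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss = soft_orthogonality_penalty(prompts)
```

实际训练时该惩罚项会以一定权重加到任务损失上,与结构感知的 MoE 路由共同促进专家分工。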
zh
[AI-25] From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
【速读】:该论文旨在解决如何利用大型语言模型(Large Language Models, LLMs)从极少量定量心理测量数据中建模人类心理特质之间的相关结构问题。其解决方案的关键在于LLMs通过一种系统性的两阶段推理过程:首先将原始的大五人格量表(Big Five Personality Scale)得分转化为自然语言的人格摘要,实现信息的选择与压缩,类似于生成充分统计量;其次基于这些摘要进行目标心理量表响应的推理生成。研究发现,这种压缩后的摘要并非冗余表达,而是捕捉了特质间协同作用的二阶模式,显著提升了预测一致性,表明LLMs能够通过抽象和推理机制,在零样本条件下高精度模拟个体心理特征间的复杂关联。
链接: https://arxiv.org/abs/2511.03235
作者: Yi-Fei Liu,Yi-Long Lu,Di He,Hang Zhang
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:
Abstract:Psychological constructs within individuals are widely believed to be interconnected. We investigated whether and how Large Language Models (LLMs) can model the correlational structure of human psychological traits from minimal quantitative inputs. We prompted various LLMs with Big Five Personality Scale responses from 816 human individuals to role-play their responses on nine other psychological scales. LLMs demonstrated remarkable accuracy in capturing human psychological structure, with the inter-scale correlation patterns from LLM-generated responses strongly aligning with those from human data (R^2 0.89). This zero-shot performance substantially exceeded predictions based on semantic similarity and approached the accuracy of machine learning algorithms trained directly on the dataset. Analysis of reasoning traces revealed that LLMs use a systematic two-stage process: First, they transform raw Big Five responses into natural language personality summaries through information selection and compression, analogous to generating sufficient statistics. Second, they generate target scale responses based on reasoning from these summaries. For information selection, LLMs identify the same key personality factors as trained algorithms, though they fail to differentiate item importance within factors. The resulting compressed summaries are not merely redundant representations but capture synergistic information–adding them to original scores enhances prediction alignment, suggesting they encode emergent, second-order patterns of trait interplay. Our findings demonstrate that LLMs can precisely predict individual participants’ psychological traits from minimal data through a process of abstraction and reasoning, offering both a powerful tool for psychological simulation and valuable insights into their emergent reasoning capabilities.
zh
[AI-26] Node-Based Editing for Multimodal Generation of Text Audio Image and Video NEURIPS2025
【速读】:该论文旨在解决多模态内容生成中叙事结构控制与迭代创作效率低下的问题,尤其在生成文本、图像、音频和视频等异构内容时缺乏灵活的编辑机制。其解决方案的关键在于提出一种基于节点(node-based)的故事叙述系统,将故事建模为可扩展、可编辑的图结构,每个节点支持多种模态输入,并通过任务选择代理(task selection agent)协调专门化的生成任务(如故事生成、节点结构推理、格式化布局和上下文生成),从而实现对叙事结构的精细控制和多模态内容的迭代生成。
链接: https://arxiv.org/abs/2511.03227
作者: Alexander Htet Kyaw,Lenin Ravindranath Sivalingam
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
备注: Accepted to NeurIPS 2025, Conference on Neural Information Processing Systems, Workshop on Generative and Protective AI for Content Creation
Abstract:We present a node-based storytelling system for multimodal content generation. The system represents stories as graphs of nodes that can be expanded, edited, and iteratively refined through direct user edits and natural-language prompts. Each node can integrate text, images, audio, and video, allowing creators to compose multimodal narratives. A task selection agent routes between specialized generative tasks that handle story generation, node structure reasoning, node diagram formatting, and context generation. The interface supports targeted editing of individual nodes, automatic branching for parallel storylines, and node-based iterative refinement. Our results demonstrate that node-based editing supports control over narrative structure and iterative generation of text, images, audio, and video. We report quantitative outcomes on automatic story outline generation and qualitative observations of editing workflows. Finally, we discuss current limitations such as scalability to longer narratives and consistency across multiple nodes, and outline future work toward human-in-the-loop and user-centered creative AI tools.
zh
[AI-27] Retrofitters pragmatists and activists: Public interest litigation for accountable automated decision-making
【速读】:该论文旨在解决澳大利亚在人工智能(Artificial Intelligence, AI)与自动化决策(Automated Decision-Making, ADM)治理中因地缘政治阻力导致监管乏力的问题,探索如何通过现有法律体系实现问责机制。其核心解决方案在于推动公共利益诉讼(Public Interest Litigation, PIL)作为法律执行的关键工具,强调将既有法律进行“法律重构”(legal retrofitting),即适配旧法以应对ADM的新情境,并系统梳理了行之有效的诉讼策略与战术。论文进一步指出,若要使此类诉讼真正发挥作用,亟需建立相应的制度性保障安排,否则问责机制将难以持续运行。
链接: https://arxiv.org/abs/2511.03211
作者: Henry Fraser,Zahra Stardust
机构: 未知
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
备注:
Abstract:This paper examines the role of public interest litigation in promoting accountability for AI and automated decision-making (ADM) in Australia. Since ADM regulation faces geopolitical headwinds, effective governance will have to rely at least in part on the enforcement of existing laws. Drawing on interviews with Australian public interest litigators, technology policy activists, and technology law scholars, the paper positions public interest litigation as part of a larger ecosystem for transparency, accountability and justice with respect to ADM. It builds on one participant’s characterisation of litigation about ADM as an exercise in legal retrofitting: adapting old laws to new circumstances. The paper’s primary contribution is to aggregate, organise and present original insights on pragmatic strategies and tactics for effective public interest litigation about ADM. Naturally, it also contends with the limits of these strategies, and of the legal system. Where limits are, however, capable of being overcome, the paper presents findings on urgent needs: the enabling institutional arrangements without which effective litigation and accountability will falter. The paper is relevant to law and technology scholars; individuals and groups harmed by ADM; public interest litigators and technology lawyers; civil society and advocacy organisations; and policymakers.
zh
[AI-28] A Quantized VAE-MLP Botnet Detection Model: A Systematic Evaluation of Quantization-Aware Training and Post-Training Quantization Strategies
【速读】:该论文旨在解决物联网(IoT)设备上针对基于物联网僵尸网络的攻击检测模型计算资源消耗高、难以部署的问题。其核心解决方案是提出一种基于变分自编码器(VAE)与多层感知机(MLP)结合的轻量化检测框架,并通过量化技术(Quantization-Aware Training, QAT 和 Post-Training Quantization, PTQ)对模型进行压缩。关键在于利用预训练VAE提取高维数据的8维潜在向量作为MLP分类器的输入,从而降低模型复杂度;在此基础上,系统评估两种量化策略在检测准确率、存储效率和推理延迟方面的权衡,发现PTQ在保持接近原始精度的同时显著提升推理速度(6倍加速)并大幅减小模型尺寸(21倍压缩),展现出面向边缘设备部署的实际可行性。
链接: https://arxiv.org/abs/2511.03201
作者: Hassan Wasswa,Hussein Abbass,Timothy Lynar
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:
Abstract:In an effort to counter the increasing IoT botnet-based attacks, state-of-the-art deep learning methods have been proposed and have achieved impressive detection accuracy. However, their computational intensity restricts deployment on resource-constrained IoT devices, creating a critical need for lightweight detection models. A common solution to this challenge is model compression via quantization. This study proposes a VAE-MLP model framework where an MLP-based classifier is trained on 8-dimensional latent vectors derived from the high-dimensional train data using the encoder component of a pretrained variational autoencoder (VAE). Two widely used quantization strategies–Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ)–are then systematically evaluated in terms of their impact on detection performance, storage efficiency, and inference latency using two benchmark IoT botnet datasets–N-BaIoT and CICIoT2022. The results revealed that, with respect to detection accuracy, the QAT strategy experienced a more noticeable decline, whereas PTQ incurred only a marginal reduction compared to the original unquantized model. Furthermore, PTQ yielded a 6x speedup and 21x reduction in size, while QAT achieved a 3x speedup and 24x compression, demonstrating the practicality of quantization for device-level IoT botnet detection.
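训练后量化(PTQ)的核心步骤可用一个对称 int8 量化的小例子说明:用单一缩放因子将浮点权重映射到 [-127, 127] 的整数区间,再反量化观察舍入误差。以下为与具体深度学习框架无关的示意实现,权重取值为假设:

```python
def quantize_int8(weights):
    """Symmetric post-training quantization: one scale per tensor,
    floats mapped to the int8 range [-127, 127], then dequantized back."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    deq = [qi * scale for qi in q]  # dequantized values expose the rounding error
    return q, deq, scale

weights = [0.5, -1.27, 0.003, 1.0]
q, deq, scale = quantize_int8(weights)
```

QAT 则在训练过程中模拟这一量化-反量化操作,使网络权重适应离散化带来的扰动;PTQ 如上所示只在训练完成后做一次转换,因此部署成本更低。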
zh
[AI-29] Efficient Linear Attention for Multivariate Time Series Modeling via Entropy Equality
【速读】:该论文旨在解决注意力机制在处理长序列时因二次计算复杂度而导致的可扩展性受限问题,尤其在时空时间序列建模中的应用瓶颈。其解决方案的关键在于提出一种基于熵等价的线性注意力机制:通过理论证明熵作为概率单纯形上的严格凹函数,表明具有相似熵值和对齐概率排序的分布具有结构相似性;进而设计出仅具线性复杂度的熵近似算法,实现高效计算并构建基于熵相等性的注意力机制。实验表明,该方法在保持甚至提升预测性能的同时,显著降低内存占用与计算时间。
链接: https://arxiv.org/abs/2511.03190
作者: Mingtao Zhang,Guoli Yang,Zhanxing Zhu,Mengzhu Wang,Xiaoying Bai
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:
Abstract:Attention mechanisms have been extensively employed in various applications, including time series modeling, owing to their capacity to capture intricate dependencies; however, their utility is often constrained by quadratic computational complexity, which impedes scalability for long sequences. In this work, we propose a novel linear attention mechanism designed to overcome these limitations. Our approach is grounded in a theoretical demonstration that entropy, as a strictly concave function on the probability simplex, implies that distributions with aligned probability rankings and similar entropy values exhibit structural resemblance. Building on this insight, we develop an efficient approximation algorithm that computes the entropy of dot-product-derived distributions with only linear complexity, enabling the implementation of a linear attention mechanism based on entropy equality. Through rigorous analysis, we reveal that the effectiveness of attention in spatio-temporal time series modeling may not primarily stem from the non-linearity of softmax but rather from the attainment of a moderate and well-balanced weight distribution. Extensive experiments on four spatio-temporal datasets validate our method, demonstrating competitive or superior forecasting performance while achieving substantial reductions in both memory usage and computational time.
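文中"熵相等"的判据可以通过如下纯 Python 示例直观理解:对两组键排序一致的注意力得分计算 softmax 分布的香农熵,熵值相近即提示二者结构相似。注意这只是概念演示,并非论文提出的线性复杂度熵近似算法:

```python
import math

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy of a distribution on the probability simplex."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Two attention-score vectors with the same ranking of keys and similar sharpness:
p = softmax([2.0, 1.0, 0.0])
q = softmax([2.1, 1.05, 0.0])
gap = abs(entropy(p) - entropy(q))
```

由于熵在概率单纯形上严格凹,排序一致且熵值接近的分布在结构上相近,这正是论文用线性复杂度熵近似替代二次复杂度 softmax 注意力的出发点。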
zh
[AI-30] Adobe Summit Concierge Evaluation with Human in the Loop VLDB2025
【速读】:该论文旨在解决在企业场景中部署生成式 AI 助手时面临的冷启动问题,包括数据稀疏性(data sparsity)、质量保障(quality assurance)以及快速上线(rapid deployment)等现实约束。其解决方案的关键在于采用“人在环路”(human-in-the-loop)的开发流程,融合提示工程(prompt engineering)、检索增强(retrieval grounding)与轻量级人工验证,从而实现敏捷、反馈驱动的可扩展且可靠的 AI 助手系统构建。
链接: https://arxiv.org/abs/2511.03186
作者: Yiru Chen,Sally Fang,Sai Sree Harsha,Dan Luo,Vaishnavi Muppala,Fei Wu,Shun Jiang,Kun Qian,Yunyao Li
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: Accepted by 6th Workshop on Data Science with Human in the Loop @ VLDB 2025
Abstract:Generative AI assistants offer significant potential to enhance productivity, streamline information access, and improve user experience in enterprise contexts. In this work, we present Summit Concierge, a domain-specific AI assistant developed for Adobe Summit. The assistant handles a wide range of event-related queries and operates under real-world constraints such as data sparsity, quality assurance, and rapid deployment. To address these challenges, we adopt a human-in-the-loop development workflow that combines prompt engineering, retrieval grounding, and lightweight human validation. We describe the system architecture, development process, and real-world deployment outcomes. Our experience shows that agile, feedback-driven development enables scalable and reliable AI assistants, even in cold-start scenarios.
zh
[AI-31] Toward Autonomous Engineering Design: A Knowledge-Guided Multi-Agent Framework
【速读】:该论文旨在解决工程设计过程中因多领域专业知识协同复杂、迭代效率低下而导致的资源浪费与质量不稳定问题。其解决方案的关键在于构建一个基于多智能体(multi-agent)的AI框架,通过三个专业化知识驱动型智能体——图谱建构者(Graph Ontologist)、设计工程师(Design Engineer)和系统工程师(Systems Engineer)——形成结构化的设计与评审闭环。其中,图谱建构者利用大语言模型(Large Language Model, LLM)从文献中提取并构建空气动力学领域的知识图谱,设计工程师基于此知识图谱和计算工具生成候选设计方案,系统工程师则结合自身知识图谱对方案进行定性和定量评估,并反馈至设计环节,直至人类管理者确认最终设计。该框架实现了知识驱动的协同优化,显著提升了设计过程的效率、一致性与质量。
链接: https://arxiv.org/abs/2511.03179
作者: Varun Kumar,George Em Karniadakis
机构: 未知
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
备注:
Abstract:The engineering design process often demands expertise from multiple domains, leading to complex collaborations and iterative refinements. Traditional methods can be resource-intensive and prone to inefficiencies. To address this, we formalize the engineering design process through a multi-agent AI framework that integrates structured design and review loops. The framework introduces specialized knowledge-driven agents that collaborate to generate and refine design candidates. As an exemplar, we demonstrate its application to the aerodynamic optimization of 4-digit NACA airfoils. The framework consists of three key AI agents: a Graph Ontologist, a Design Engineer, and a Systems Engineer. The Graph Ontologist employs a Large Language Model (LLM) to construct two domain-specific knowledge graphs from airfoil design literature. The Systems Engineer, informed by a human manager, formulates technical requirements that guide design generation and evaluation. The Design Engineer leverages the design knowledge graph and computational tools to propose candidate airfoils meeting these requirements. The Systems Engineer reviews the candidates and provides both qualitative and quantitative feedback using its own knowledge graph, forming an iterative feedback loop until a design is validated by the manager. The final design is then optimized to maximize performance metrics such as the lift-to-drag ratio. Overall, this work demonstrates how collaborative AI agents equipped with structured knowledge representations can enhance efficiency, consistency, and quality in the engineering design process.
zh
[AI-32] GraphCliff: Short-Long Range Gating for Subtle Differences but Critical Changes
【速读】:该论文旨在解决分子结构与生物活性之间的非连续性问题,特别是由“活动悬崖”(activity cliffs)引发的挑战——即结构相似但活性差异显著的化合物对,这违背了定量构效关系(QSAR)模型中假设的平滑映射。现有研究表明,传统机器学习模型在处理此类问题时优于图神经网络(Graph Neural Networks, GNNs),原因在于GNN生成的嵌入空间未能有效区分结构相近但功能不同的分子。为克服这一局限,作者提出新型模型GraphCliff,其关键创新在于通过门控机制融合短程与长程信息,从而在保留分子图拓扑结构表达能力的同时,增强模型对活动悬崖的识别能力。实验表明,GraphCliff不仅在活动悬崖化合物上表现更优,在非悬崖化合物上也保持稳定提升,且层间节点嵌入分析显示其有效缓解了过平滑问题并提升了判别力。
链接: https://arxiv.org/abs/2511.03170
作者: Hajung Kim,Jueon Park,Junseok Choe,Sheunheun Baek,Hyeon Hwang,Jaewoo Kang
机构: 未知
类目: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI)
备注:
Abstract:Quantitative structure-activity relationship assumes a smooth relationship between molecular structure and biological activity. However, activity cliffs defined as pairs of structurally similar compounds with large potency differences break this continuity. Recent benchmarks targeting activity cliffs have revealed that classical machine learning models with extended connectivity fingerprints outperform graph neural networks. Our analysis shows that graph embeddings fail to adequately separate structurally similar molecules in the embedding space, making it difficult to distinguish between structurally similar but functionally different molecules. Despite this limitation, molecular graph structures are inherently expressive and attractive, as they preserve molecular topology. To preserve the structural representation of molecules as graphs, we propose a new model, GraphCliff, which integrates short- and long-range information through a gating mechanism. Experimental results demonstrate that GraphCliff consistently improves performance on both non-cliff and cliff compounds. Furthermore, layer-wise node embedding analyses reveal reduced over-smoothing and enhanced discriminative power relative to strong baseline graph models.
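GraphCliff 融合短程与长程信息的门控机制,可简化为如下标量门控示意:用 sigmoid 门在两路表示之间做凸组合。以下代码仅为概念草图,门控的粒度与参数化方式和论文实现未必一致:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(h_short, h_long, gate_logit):
    """Convex combination of short-range and long-range node representations,
    controlled by a learned scalar gate."""
    g = sigmoid(gate_logit)
    return [g * s + (1.0 - g) * l for s, l in zip(h_short, h_long)]

# gate_logit = 0 gives g = 0.5, i.e. an equal blend of the two views.
h = gated_fusion([1.0, 0.0], [0.0, 1.0], gate_logit=0.0)
```

训练中门控 logit 由数据驱动学习,使模型能按需偏向局部子结构差异(活性悬崖处关键)或全局拓扑信息。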
zh
[AI-33] Uncovering Bugs in Formal Explainers: A Case Study with PyXAI
【速读】:该论文旨在解决形式化可解释人工智能(Formal Explainable Artificial Intelligence, XAI)在实际实现中缺乏有效验证的问题,即尽管形式化XAI方法具备理论上的严谨性保障,但其具体实现是否正确仍存在不确定性。解决方案的关键在于提出一种全新的验证方法论,用于系统评估形式化解释器的正确性;通过该方法论对公开的形式化解释器PyXAI进行测试,发现其在多数数据集上会产生错误的解释,从而证实了所提验证方法论的必要性和实用性。
链接: https://arxiv.org/abs/2511.03169
作者: Xuanxiang Huang,Yacine Izza,Alexey Ignatiev,Joao Marques-Silva
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:
Abstract:Formal explainable artificial intelligence (XAI) offers unique theoretical guarantees of rigor when compared to other non-formal methods of explainability. However, little attention has been given to the validation of practical implementations of formal explainers. This paper develops a novel methodology for validating formal explainers and reports on the assessment of the publicly available formal explainer PyXAI. The paper documents the existence of incorrect explanations computed by PyXAI on most of the datasets analyzed in the experiments, thereby confirming the importance of the proposed novel methodology for the validation of formal explainers.
zh
[AI-34] RefAgent : A Multi-agent LLM -based Framework for Automatic Software Refactoring
【速读】:该论文旨在解决传统大型语言模型(Large Language Models, LLMs)在软件重构(software refactoring)任务中依赖静态指令、缺乏动态适应能力的问题。现有方法难以在复杂多变的开发环境中自主决策并执行端到端的重构流程。为此,作者提出了一种基于多智能体架构的LLM框架RefAgent,其关键在于引入多个专业化智能体(包括规划、执行、测试与自省迭代优化模块),通过工具调用(tool-calling)和自我反思(self-reflection)机制实现对重构任务的动态响应与闭环优化。这一设计显著提升了重构质量与成功率,验证了多智能体架构在自动化软件重构中的有效性。
链接: https://arxiv.org/abs/2511.03153
作者: Khouloud Oueslati,Maxime Lamothe,Foutse Khomh
机构: 未知
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI)
备注:
Abstract:Large Language Models (LLMs) have substantially influenced various software engineering tasks. Indeed, in the case of software refactoring, traditional LLMs have shown the ability to reduce development time and enhance code quality. However, these LLMs often rely on static, detailed instructions for specific tasks. In contrast, LLM-based agents can dynamically adapt to evolving contexts and autonomously make decisions by interacting with software tools and executing workflows. In this paper, we explore the potential of LLM-based agents in supporting refactoring activities. Specifically, we introduce RefAgent, a multi-agent LLM-based framework for end-to-end software refactoring. RefAgent consists of specialized agents responsible for planning, executing, testing, and iteratively refining refactorings using self-reflection and tool-calling capabilities. We evaluate RefAgent on eight open-source Java projects, comparing its effectiveness against a single-agent approach, a search-based refactoring tool, and historical developer refactorings. Our assessment focuses on: (1) the impact of generated refactorings on software quality, (2) the ability to identify refactoring opportunities, and (3) the contribution of each LLM agent through an ablation study. Our results show that RefAgent achieves a median unit test pass rate of 90%, reduces code smells by a median of 52.5%, and improves key quality attributes (e.g., reusability) by a median of 8.6%. Additionally, it closely aligns with developer refactorings and the search-based tool in identifying refactoring opportunities, attaining a median F1-score of 79.15% and 72.7%, respectively. Compared to single-agent approaches, RefAgent improves the median unit test pass rate by 64.7% and the median compilation success rate by 40.1%. These findings highlight the promise of multi-agent architectures in advancing automated software refactoring.
zh
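上文 RefAgent 的"规划—执行—测试—自我反思"多智能体循环可用如下极简示意代码勾勒。纯属示意:各 agent 均为桩函数,并非论文实现;"测试通过"在此简化为代码坏味道消失。

```python
# Hypothetical sketch of RefAgent's plan -> execute -> test -> reflect loop.
# All agent functions below are illustrative stubs, not the paper's implementation.

def plan_refactorings(code):
    # Planner agent: propose refactoring steps (stubbed as a static plan).
    return ["extract_method"] if "duplicate" in code else []

def apply_refactoring(code, step):
    # Executor agent: apply one refactoring (stubbed as a text rewrite).
    return code.replace("duplicate", "helper()")

def run_tests(code):
    # Tester agent: here "tests pass" simply means the smell is gone.
    return "duplicate" not in code

def refagent_loop(code, max_rounds=3):
    for _ in range(max_rounds):
        steps = plan_refactorings(code)
        if not steps:
            break                          # nothing left to refactor
        for step in steps:
            candidate = apply_refactoring(code, step)
            if run_tests(candidate):       # keep only verified refactorings
                code = candidate
            # else: a self-reflection step would revise the plan here
    return code

refactored = refagent_loop("duplicate block A; duplicate block B")
```

实际系统中,各桩函数由带工具调用能力的 LLM agent 替代,测试环节对应摘要中 90% 的单元测试通过率约束。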
[AI-35] Forecast2Anomaly (F2A): Adapting Multivariate Time Series Foundation Models for Anomaly Prediction
【速读】:该论文旨在解决多变量时间序列中异常预测(anomaly prediction)的通用性与适应性问题,即现有方法难以在动态、复杂系统中泛化到随时间演变的异常模式。解决方案的关键在于提出 Forecast2Anomaly (F2A) 框架,其核心创新包括:(1) 设计联合预测-异常损失函数(joint forecast-anomaly loss),通过微调预训练时间序列基础模型(Time Series Foundation Models, TSFMs)使其在异常时刻仍能准确预测未来信号;(2) 引入检索增强生成(Retrieval-Augmented Generation, RAG)模块,动态检索历史相关时序片段并基于其条件进行预测,从而在推理阶段自适应分布漂移,无需模型更新即可跟踪演化中的异常模式。该方法实现了从鲁棒的零样本时间序列预测到零样本异常预测的跨越。
链接: https://arxiv.org/abs/2511.03149
作者: Atif Hassan,Tarun Kumar,Ashish Mishra,Sergey Serebryakov,Satish Kumar Mopur,Phanidhar Koganti,Murthy Chelankuri,Ramanagopal Vogety,Suparna Bhattacharya,Martin Foltin
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:
Abstract:Forecasting anomalies (anomaly prediction) in multivariate time series from different real-world, dynamic, and complex systems is vital for preempting critical failures, leading to a substantial minimization in operational costs and human labor. Yet, existing methods are limited to specific systems while failing to generalize to evolving anomaly patterns over time. In contrast, pretrained Time Series Foundation Models (TSFMs) have recently demonstrated strong generalization and zero-shot forecasting capabilities. However, their potential remains untapped for anomaly prediction, a task fundamentally different from forecasting normal behavior. Thus, we present Forecast2Anomaly (F2A), a novel framework that empowers TSFMs with anomaly prediction abilities through two key innovations. First, we propose a joint forecast-anomaly loss that fine-tunes TSFMs to accurately forecast future signals even at anomalous time points. Second, we introduce a Retrieval-Augmented Generation (RAG) module that retrieves historically relevant horizons and conditions predictions on them. This component dynamically adapts to distributional shifts at inference time, enabling F2A to track evolving anomalies without requiring model updates. By combining targeted fine-tuning with dynamic retrieval, F2A bridges the gap between robust TSFM zero-shot forecasting and zero-shot anomaly prediction. Extensive experiments across 16 diverse datasets and multiple TSFM backbones show that F2A consistently outperforms state-of-the-art methods, offering a scalable, zero-shot anomaly prediction solution for real-world applications.
zh
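F2A 的"联合预测-异常损失"核心思想是让模型在异常时刻也保持预测精度。下面给出一个示意性实现(加权形式与超参 `lam` 为本文假设,并非论文原式):

```python
import numpy as np

def joint_forecast_anomaly_loss(y_pred, y_true, anomaly_mask, lam=2.0):
    """Illustrative joint loss (one reading of the abstract, not F2A's exact form):
    squared forecast error, up-weighted at anomalous time points so the
    fine-tuned TSFM stays accurate even when the signal deviates."""
    err = (y_pred - y_true) ** 2
    weights = np.where(anomaly_mask, lam, 1.0)   # anomalous steps count more
    return float(np.mean(weights * err))

y_true = np.array([1.0, 1.0, 5.0, 1.0])          # spike at t=2 is the anomaly
y_pred = np.array([1.0, 1.0, 1.0, 1.0])          # forecaster misses the spike
mask = np.array([False, False, True, False])
loss = joint_forecast_anomaly_loss(y_pred, y_true, mask)
```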
[AI-36] A Proprietary Model-Based Safety Response Framework for AI Agents
【速读】:该论文旨在解决大型语言模型(Large Language Models, LLMs)在实际部署中面临的安全性问题,这些问题严重制约了其在关键领域的可信应用。解决方案的关键在于提出一个系统性的安全响应框架,从输入和输出两个层面实现对LLM的防护:在输入层面,采用监督微调(Supervised Fine-Tuning)构建四分类安全判别模型(Safe、Unsafe、Conditionally Safe、Focused Attention),实现细粒度风险识别与差异化处理,达到99.3%的风险召回率;在输出层面,融合检索增强生成(Retrieval-Augmented Generation, RAG)与专用微调的解释模型,确保所有回复基于实时可信知识库,杜绝信息捏造并支持结果溯源。实验表明,该框架在公共安全评测基准上显著优于基线模型TinyR1-Safety-8B,并在高风险测试集上实现100%安全得分,验证了其在复杂风险场景下的卓越保护能力。
链接: https://arxiv.org/abs/2511.03138
作者: Qi Li,Jianjun Xu,Pingtao Wei,Jiu Li,Peiqiang Zhao,Jiwei Shi,Xuan Zhang,Yanhui Yang,Xiaodong Hui,Peng Xu,Wenqin Shao
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:
Abstract:With the widespread application of Large Language Models (LLMs), their associated security issues have become increasingly prominent, severely constraining their trustworthy deployment in critical domains. This paper proposes a novel safety response framework designed to systematically safeguard LLMs at both the input and output levels. At the input level, the framework employs a supervised fine-tuning-based safety classification model. Through a fine-grained four-tier taxonomy (Safe, Unsafe, Conditionally Safe, Focused Attention), it performs precise risk identification and differentiated handling of user queries, significantly enhancing risk coverage and business scenario adaptability, and achieving a risk recall rate of 99.3%. At the output level, the framework integrates Retrieval-Augmented Generation (RAG) with a specifically fine-tuned interpretation model, ensuring all responses are grounded in a real-time, trustworthy knowledge base. This approach eliminates information fabrication and enables result traceability. Experimental results demonstrate that our proposed safety control model achieves a significantly higher safety score on public safety evaluation benchmarks compared to the baseline model, TinyR1-Safety-8B. Furthermore, on our proprietary high-risk test set, the framework’s components attained a perfect 100% safety score, validating their exceptional protective capabilities in complex risk scenarios. This research provides an effective engineering pathway for building high-security, high-trust LLM applications.
zh
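输入侧四分类(Safe / Unsafe / Conditionally Safe / Focused Attention)的差异化处理流程可示意如下。关键词分类器仅为占位,代替论文中监督微调的判别模型;处理动作映射亦为假设:

```python
# Illustrative routing over the four-tier taxonomy from the abstract.
# The keyword classifier is a stand-in for the paper's fine-tuned model.

TIERS = ("Safe", "Unsafe", "Conditionally Safe", "Focused Attention")

def classify(query):
    if "weapon" in query:
        return "Unsafe"
    if "medication" in query:
        return "Conditionally Safe"    # needs a grounded, caveated answer
    if "self-harm" in query:
        return "Focused Attention"
    return "Safe"

def route(query):
    tier = classify(query)
    actions = {
        "Safe": "answer",
        "Unsafe": "refuse",
        "Conditionally Safe": "answer_with_rag",   # ground in knowledge base
        "Focused Attention": "escalate",
    }
    return tier, actions[tier]

tier, action = route("what medication dosage is typical?")
```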
[AI-37] Using Multi-modal Large Language Model to Boost Fireworks Algorithms Ability in Settling Challenging Optimization Tasks
【速读】:该论文旨在解决复杂优化问题中传统零阶或一阶方法因效率低、梯度信息不准确及优化信息利用不足而难以应对非凸性、高维度和黑箱特性等问题。其解决方案的关键在于提出一种基于多模态大语言模型(Multi-modal Large Language Model, MLLM)增强的烟花算法(Fireworks Algorithm, FWA)框架,引入“关键部分”(Critical Part, CP)概念以扩展FWA在高维复杂任务中的适用性,并通过MLLM的多模态能力挖掘和利用优化过程中的信息,从而显著提升算法性能。实验表明,该框架生成的FWA在旅行商问题(Traveling Salesman Problem, TSP)和电子设计自动化问题(Electronic Design Automation, EDA)上达到了或超越了当前最优(SOTA)结果。
链接: https://arxiv.org/abs/2511.03137
作者: Shipeng Cen,Ying Tan
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:
Abstract:As optimization problems grow increasingly complex and diverse, advancements in optimization techniques and paradigm innovations hold significant importance. The challenges posed by optimization problems are primarily manifested in their non-convexity, high-dimensionality, black-box nature, and other unfavorable characteristics. Traditional zero-order or first-order methods, which are often characterized by low efficiency, inaccurate gradient information, and insufficient utilization of optimization information, are ill-equipped to address these challenges effectively. In recent years, the rapid development of large language models (LLM) has led to substantial improvements in their language understanding and code generation capabilities. Consequently, the design of optimization algorithms leveraging large language models has garnered increasing attention from researchers. In this study, we choose the fireworks algorithm (FWA) as the basic optimizer and propose a novel approach to assist the design of the FWA by incorporating a multi-modal large language model (MLLM). To put it simply, we propose the concept of Critical Part (CP), which extends FWA to complex high-dimensional tasks, and further utilizes the information in the optimization process with the help of the multi-modal characteristics of large language models. We focus on two specific tasks: the traveling salesman problem (TSP) and the electronic design automation problem (EDA). The experimental results show that FWAs generated under our new framework have achieved or surpassed SOTA results on many problem instances.
zh
[AI-38] An Augmentation Overlap Theory of Contrastive Learning
【速读】:该论文旨在解决自监督对比学习(self-supervised contrastive learning)中机制不明确的问题,尤其是其性能与下游任务表现之间关系的理论解释不足。解决方案的关键在于提出了一种基于“增强重叠”(augmentation overlap)的新理论框架,该框架放宽了传统研究中对条件独立性(conditional independence)的强假设,转而关注在激烈数据增强下同类样本支持集趋于重叠的现象。作者指出,这种重叠使得简单对齐正样本(同一样本的不同增强视图)即可促使对比学习将同类样本聚类在一起,从而实现了对下游性能的渐近紧致边界估计,并进一步基于此提出了无需额外模块即可有效评估表示质量的无监督指标。
链接: https://arxiv.org/abs/2511.03114
作者: Qi Zhang,Yifei Wang,Yisen Wang
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:
Abstract:Recently, self-supervised contrastive learning has achieved great success on various tasks. However, its underlying working mechanism is yet unclear. In this paper, we first provide the tightest bounds based on the widely adopted assumption of conditional independence. Further, we relax the conditional independence assumption to a more practical assumption of augmentation overlap and derive the asymptotically closed bounds for the downstream performance. Our proposed augmentation overlap theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations, thus simply aligning the positive samples (augmented views of the same sample) could make contrastive learning cluster intra-class samples together. Moreover, from the newly derived augmentation overlap perspective, we develop an unsupervised metric for the representation evaluation of contrastive learning, which aligns well with the downstream performance almost without relying on additional modules. Code is available at this https URL.
zh
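"增强重叠"这一直觉可用如下数值玩具示例说明(数据、噪声强度与阈值均为假设,与论文实验无关):在激烈增强下,两个同类样本的增强视图支撑集开始重叠,此时仅对齐正样本对即可将类内样本聚到一起。

```python
import numpy as np

# Toy numeric illustration of augmentation overlap: under aggressive
# augmentation, views of two same-class samples start to share support.

rng = np.random.default_rng(0)

def augment(x, strength):
    # Crude stand-in for data augmentation: additive Gaussian noise.
    return x + rng.normal(0.0, strength, size=x.shape)

a, b = np.array([0.0, 0.0]), np.array([1.0, 0.0])  # two intra-class samples

def close_rate(strength, n=500, radius=0.5):
    # Fraction of augmented view pairs that land within `radius` of each other.
    d = [np.linalg.norm(augment(a, strength) - augment(b, strength)) for _ in range(n)]
    return float(np.mean(np.array(d) < radius))

weak_rate = close_rate(0.05)    # mild augmentation: supports stay separated
overlap_rate = close_rate(1.0)  # aggressive augmentation: supports overlap
```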
[AI-39] FP-AbDiff: Improving Score-based Antibody Design by Capturing Nonequilibrium Dynamics through the Underlying Fokker-Planck Equation
【速读】:该论文旨在解决生成式抗体设计中两大核心挑战:一是缺乏动力学一致性,导致生成结构在物理上不成立;二是由于数据稀缺和结构偏差导致模型泛化能力差。解决方案的关键在于提出FP-AbDiff,这是首个在完整生成轨迹上强制执行Fokker-Planck方程(Fokker-Planck Equation, FPE)物理规律的抗体生成模型。通过在CDR几何空间(R³ × SO(3))上最小化新颖的FPE残差损失,使局部学习的去噪得分(denoising scores)整合为全局一致的概率流,从而实现物理信息正则化与深度生物先验的协同融合,嵌入于最先进的SE(3)-等变扩散框架中。此方法显著提升了生成抗体的几何精度和氨基酸恢复率,在多个基准测试中达到新的最先进水平。
链接: https://arxiv.org/abs/2511.03113
作者: Jiameng Chen,Yida Xiong,Kun Li,Hongzhi Zhang,Xiantao Cai,Wenbin Hu,Jia Wu
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
备注: 9 pages, 3 figures
Abstract:Computational antibody design holds immense promise for therapeutic discovery, yet existing generative models are fundamentally limited by two core challenges: (i) a lack of dynamical consistency, which yields physically implausible structures, and (ii) poor generalization due to data scarcity and structural bias. We introduce FP-AbDiff, the first antibody generator to enforce Fokker-Planck Equation (FPE) physics along the entire generative trajectory. Our method minimizes a novel FPE residual loss over the mixed manifold of CDR geometries (R³ × SO(3)), compelling locally-learned denoising scores to assemble into a globally coherent probability flow. This physics-informed regularizer is synergistically integrated with deep biological priors within a state-of-the-art SE(3)-equivariant diffusion framework. Rigorous evaluation on the RAbD benchmark confirms that FP-AbDiff establishes a new state-of-the-art. In de novo CDR-H3 design, it achieves a mean Root Mean Square Deviation of 0.99 Å when superposing on the variable region, a 25% improvement over the previous state-of-the-art model, AbX, and the highest reported Contact Amino Acid Recovery of 39.91%. This superiority is underscored in the more challenging six-CDR co-design task, where our model delivers consistently superior geometric precision, cutting the average full-chain Root Mean Square Deviation by ~15%, and crucially, achieves the highest full-chain Amino Acid Recovery on the functionally dominant CDR-H3 loop (45.67%). By aligning generative dynamics with physical laws, FP-AbDiff enhances robustness and generalizability, establishing a principled approach for physically faithful and functionally viable antibody design.
zh
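摘要中"FPE 残差损失"的一种直观解读(示意性写法,论文在 R³ × SO(3) 混合流形上的具体形式摘要未给出):扩散过程的边际密度 p_t 必须满足 Fokker–Planck 方程,残差损失沿生成轨迹惩罚模型密度估计对该方程的违背程度。

```latex
% Forward SDE:  dx = f(x,t)\,dt + g(t)\,dW_t .
% Its marginal density p_t must satisfy the Fokker--Planck equation:
\partial_t p_t(x)
  = -\,\nabla\!\cdot\!\big(f(x,t)\,p_t(x)\big)
    + \tfrac{1}{2}\,g(t)^2\,\Delta p_t(x).
% A physics-informed residual loss then penalizes, along the trajectory,
% the violation of this equation by the model's density estimate \hat p_t:
\mathcal{L}_{\mathrm{FPE}}
  = \mathbb{E}_{t,x}\Big[\big(
      \partial_t \hat p_t(x)
      + \nabla\!\cdot\!\big(f(x,t)\,\hat p_t(x)\big)
      - \tfrac{1}{2}\,g(t)^2\,\Delta \hat p_t(x)
    \big)^2\Big].
```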
[AI-40] miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward
【速读】:该论文旨在解决当前自动形式化(autoformalization)与定理证明模型在数学奥林匹克竞赛场景下性能显著低于各自独立任务表现的问题,其核心挑战在于形式化与非形式化陈述之间的语义偏差导致的系统性误差。解决方案的关键在于对miniF2F基准数据集进行全面审查与修正,识别并修复其中超过一半问题中存在的形式与非形式陈述间的不一致、错误及简化,从而构建出miniF2F-v2——一个具有完全验证的形式化与非形式化陈述及证明的新基准。在此基础上,完整定理证明流水线在新基准上的准确率从原始版本的约40%提升至70%,揭示了当前autoformalization模型与定理证明器之间仍存在显著对齐不足,并强调高质量基准对于推动形式推理领域进展和精准诊断模型失败/成功模式的重要性。
链接: https://arxiv.org/abs/2511.03108
作者: Azim Ospanov,Farzan Farnia,Roozbeh Yousefzadeh
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:
Abstract:We perform a thorough analysis of the formal and informal statements in the miniF2F benchmark from the perspective of an AI system that is tasked to participate in a math Olympiad consisting of the problems in miniF2F. In such setting, the model has to read and comprehend the problems in natural language, formalize them in Lean language, then proceed with proving the problems, and it will get credit for each problem if the formal proof corresponds to the original informal statement presented to the model. Our evaluation results reveal that the best accuracy of such pipeline can be about 36% using the SoTA models in the literature, considerably lower than the individual SoTA accuracies, 97% and 69% reported in the autoformalization and theorem proving literature. Analyzing the failure modes, we trace back a considerable portion of this drop to discrepancies between the formal and informal statements for more than half of the problems in miniF2F. We proceed with correcting all the errors, discrepancies and simplifications in formal and informal statements, and present the miniF2F-v2 with fully verified formal and informal statements and proofs. Evaluating the full theorem proving pipeline on miniF2F-v2 leads to the best accuracy of 70%, a significant improvement from the 40% on the original miniF2F, yet indicating considerable misalignment between the autoformalization models and theorem provers. Our deep analysis suggests that a higher quality benchmark can help the community better evaluate progress in the field of formal reasoning and also better diagnose the failure and success modes of autoformalization and theorem proving models. Our dataset is available at this https URL.
zh
[AI-41] Large language models require a new form of oversight: capability-based monitoring
【速读】:该论文旨在解决当前医疗领域中大型语言模型(Large Language Models, LLMs)监测方法的局限性问题。现有监控方法沿袭传统机器学习(Machine Learning, ML)范式,基于任务导向的设计,假设因数据集漂移会导致性能下降,但这一假设不适用于LLMs,因其并非为特定人群或任务训练,而是具备通用能力的系统。论文提出以“能力为基础的监测”(capability-based monitoring)作为新的组织原则,其关键在于将监控焦点从单一下游任务转向模型内部共享的核心能力(如摘要、推理、翻译、安全防护等),从而实现跨任务检测系统性弱点、长尾错误和新兴行为,提升监测的全面性与可扩展性,为未来通用人工智能(Generalist AI)在医疗场景中的安全、自适应及协作式监控提供基础。
链接: https://arxiv.org/abs/2511.03106
作者: Katherine C. Kellogg,Bingyang Ye,Yifan Hu,Guergana K. Savova,Byron Wallace,Danielle S. Bitterman
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注: Under review
Abstract:The rapid adoption of large language models (LLMs) in healthcare has been accompanied by scrutiny of their oversight. Existing monitoring approaches, inherited from traditional machine learning (ML), are task-based and founded on assumed performance degradation arising from dataset drift. In contrast, with LLMs, inevitable model degradation due to changes in populations compared to the training dataset cannot be assumed, because LLMs were not trained for any specific task in any given population. We therefore propose a new organizing principle guiding generalist LLM monitoring that is scalable and grounded in how these models are developed and used in practice: capability-based monitoring. Capability-based monitoring is motivated by the fact that LLMs are generalist systems whose overlapping internal capabilities are reused across numerous downstream tasks. Instead of evaluating each downstream task independently, this approach organizes monitoring around shared model capabilities, such as summarization, reasoning, translation, or safety guardrails, in order to enable cross-task detection of systemic weaknesses, long-tail errors, and emergent behaviors that task-based monitoring may miss. We describe considerations for developers, organizational leaders, and professional societies for implementing a capability-based monitoring approach. Ultimately, capability-based monitoring will provide a scalable foundation for safe, adaptive, and collaborative monitoring of LLMs and future generalist artificial intelligence models in healthcare.
zh
[AI-42] Adaptive Detection of Software Aging under Workload Shift ALT
【速读】:该论文旨在解决长期运行系统中软件老化(software aging)导致的性能渐进式退化及故障风险增加的问题,尤其在动态工作负载环境下,传统静态检测模型难以适应突发、渐进或周期性的工作负载变化。解决方案的关键在于提出一种基于机器学习的自适应检测方法,引入概念漂移检测技术中的自适应检测器——Drift Detection Method (DDM) 和 Adaptive Windowing (ADWIN),以实时识别并响应工作负载波动带来的分布偏移;实验表明,采用 ADWIN 的自适应模型在多种工作负载转换场景下均保持高检测精度(F1-Score > 0.93),显著优于静态模型。
链接: https://arxiv.org/abs/2511.03103
作者: Rafael José Moura,Maria Gizele Nascimento,Fumio Machida,Ermeson Andrade
机构: 未知
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD)
Abstract:Software aging is a phenomenon that affects long-running systems, leading to progressive performance degradation and increasing the risk of failures. To mitigate this problem, this work proposes an adaptive approach based on machine learning for software aging detection in environments subject to dynamic workload conditions. We evaluate and compare a static model with adaptive models that incorporate adaptive detectors, specifically the Drift Detection Method (DDM) and Adaptive Windowing (ADWIN), originally developed for concept drift scenarios and applied in this work to handle workload shifts. Experiments with simulated sudden, gradual, and recurring workload transitions show that static models suffer a notable performance drop when applied to unseen workload profiles, whereas the adaptive model with ADWIN maintains high accuracy, achieving an F1-Score above 0.93 in all analyzed scenarios.
zh
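摘要中的 DDM(Drift Detection Method)可按经典的 Gama 式阈值规则从零实现,示意如下。阈值、最小样本数与误差流均为演示用假设,并非论文的实验设置:

```python
# From-scratch sketch of the Drift Detection Method (DDM): track the running
# error rate p and its std s; warn at p_min + 2*s_min, signal drift at
# p_min + 3*s_min. Thresholds and the error stream are illustrative.

class DDM:
    def __init__(self, warn_level=2.0, drift_level=3.0, min_samples=30):
        self.warn_level, self.drift_level = warn_level, drift_level
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0                        # running error rate
        self.p_min, self.s_min = float("inf"), float("inf")

    def update(self, error):
        """error: 1 if the model misclassified this sample, else 0."""
        self.n += 1
        self.p += (error - self.p) / self.n          # incremental mean
        s = (self.p * (1 - self.p) / self.n) ** 0.5
        if self.n < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:     # track the best regime
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                             # drift confirmed: adapt
            return "drift"
        if self.p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"

ddm = DDM()
# ~5% error rate before a workload shift, then ~50% after it.
stream = [1 if i % 20 == 0 else 0 for i in range(200)] + [i % 2 for i in range(200)]
states = [ddm.update(e) for e in stream]
```

漂移被确认后重置检测器,对应文中"检测到工作负载变化后触发模型适配"的流程。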
[AI-43] Scaling Multi-Agent Environment Co-Design with Diffusion Models
【速读】:该论文旨在解决当前代理-环境共设计(agent-environment co-design)方法在高维环境设计空间中难以扩展以及在联合优化过程中因目标动态变化而导致样本效率低下的问题。其解决方案的关键在于提出Diffusion Co-Design(DiCoDe)框架,该框架包含两项核心创新:一是引入投影通用引导(Projected Universal Guidance, PUG),通过采样技术在满足硬约束(如障碍物间空间分离)的前提下探索奖励最大化环境分布;二是设计批评者蒸馏机制(critic distillation mechanism),将强化学习批评者(critic)的知识迁移至扩散模型,从而利用密集且实时更新的学习信号使扩散模型适应不断演化的代理策略。这两项改进显著提升了环境与策略对的质量和协同优化效率,在仓库自动化、多智能体路径规划和风电场优化等复杂基准任务上均超越现有最先进方法。
链接: https://arxiv.org/abs/2511.03100
作者: Hao Xiang Li,Michael Amir,Amanda Prorok
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
备注:
Abstract:The agent-environment co-design paradigm jointly optimises agent policies and environment configurations in search of improved system performance. With application domains ranging from warehouse logistics to windfarm management, co-design promises to fundamentally change how we deploy multi-agent systems. However, current co-design methods struggle to scale. They collapse under high-dimensional environment design spaces and suffer from sample inefficiency when addressing moving targets inherent to joint optimisation. We address these challenges by developing Diffusion Co-Design (DiCoDe), a scalable and sample-efficient co-design framework pushing co-design towards practically relevant settings. DiCoDe incorporates two core innovations. First, we introduce Projected Universal Guidance (PUG), a sampling technique that enables DiCoDe to explore a distribution of reward-maximising environments while satisfying hard constraints such as spatial separation between obstacles. Second, we devise a critic distillation mechanism to share knowledge from the reinforcement learning critic, ensuring that the guided diffusion model adapts to evolving agent policies using a dense and up-to-date learning signal. Together, these improvements lead to superior environment-policy pairs when validated on challenging multi-agent environment co-design benchmarks including warehouse automation, multi-agent pathfinding and wind farm optimisation. Our method consistently exceeds the state-of-the-art, achieving, for example, 39% higher rewards in the warehouse setting with 66% fewer simulation samples. This sets a new standard in agent-environment co-design, and is a stepping stone towards reaping the rewards of co-design in real world domains.
zh
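PUG 中"硬约束投影"(如障碍物最小间距)的思想可用如下玩具投影步骤示意:每次引导更新后,把障碍物位置投影回可行集。成对迭代推开的做法与数值均为本文假设,并非论文的投影算子:

```python
import numpy as np

# Toy projection onto a minimum pairwise-separation constraint set:
# repeatedly push any pair of points closer than `min_dist` apart equally.

def project_min_separation(xs, min_dist=1.0, iters=50):
    xs = np.asarray(xs, float).copy()
    for _ in range(iters):
        for i in range(len(xs)):
            for j in range(i + 1, len(xs)):
                d = xs[j] - xs[i]
                dist = np.linalg.norm(d)
                if dist < min_dist:                  # violated: separate the pair
                    shift = (min_dist - dist) / 2 * d / (dist + 1e-9)
                    xs[i] -= shift
                    xs[j] += shift
    return xs

# Two obstacles start too close (0.2 apart); the third is already feasible.
xs = project_min_separation([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0]])
gaps = [np.linalg.norm(xs[i] - xs[j]) for i in range(3) for j in range(i + 1, 3)]
```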
[AI-44] Sparse self-organizing ensembles of local kernels detect rare statistical anomalies
【速读】:该论文旨在解决当前异常检测(Anomaly Detection, AD)方法在面对由现代人工智能生成的高维数据表示时,因统计特性控制不足而导致的检测性能下降问题。具体而言,弱信号或罕见异常可能隐藏于正常数据的表观规律性中,造成漏检。为此,作者提出一套基于结构化先验的设计原则:稀疏性(sparsity)以保证模型简洁性、局部性(locality)以保持几何敏感度、竞争性(competition)以优化模型容量分配。解决方案的关键在于引入SparKer——一种基于半监督Neyman–Pearson框架训练的稀疏高斯核集成模型,其通过自组织局部核机制动态划分表示空间,在统计失衡区域实现对似然比的局部建模,从而有效识别高维空间中的显著异常点。该方法在科学发现、开放世界新奇检测、入侵检测及生成模型验证等任务中展现出卓越的可解释性、效率与可扩展性。
链接: https://arxiv.org/abs/2511.03095
作者: Gaia Grosso,Sai Sumedh R. Hindupur,Thomas Fel,Samuel Bright-Thonney,Philip Harris,Demba Ba
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:
Abstract:Modern artificial intelligence has revolutionized our ability to extract rich and versatile data representations across scientific disciplines. Yet, the statistical properties of these representations remain poorly controlled, causing misspecified anomaly detection (AD) methods to falter. Weak or rare signals can remain hidden within the apparent regularity of normal data, creating a gap in our ability to detect and interpret anomalies. We examine this gap and identify a set of structural desiderata for detection methods operating under minimal prior information: sparsity, to enforce parsimony; locality, to preserve geometric sensitivity; and competition, to promote efficient allocation of model capacity. These principles define a class of self-organizing local kernels that adaptively partition the representation space around regions of statistical imbalance. As an instantiation of these principles, we introduce SparKer, a sparse ensemble of Gaussian kernels trained within a semi-supervised Neyman–Pearson framework to locally model the likelihood ratio between a sample that may contain anomalies and a nominal, anomaly-free reference. We provide theoretical insights into the mechanisms that drive detection and self-organization in the proposed model, and demonstrate the effectiveness of this approach on realistic high-dimensional problems of scientific discovery, open-world novelty detection, intrusion detection, and generative-model validation. Our applications span both the natural- and computer-science domains. We demonstrate that ensembles containing only a handful of kernels can identify statistically significant anomalous locations within representation spaces of thousands of dimensions, underscoring both the interpretability, efficiency and scalability of the proposed approach.
zh
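SparKer 用局部高斯核对"待检样本 vs 参考样本"的似然比进行局部建模。下面的玩具示例用单个核在候选位置比较两者的核加权计数,粗略对应一次局部密度比检验(数据分布、核宽与信号位置均为假设):

```python
import numpy as np

# Toy sketch of the local-kernel idea behind SparKer: place one Gaussian
# kernel at a candidate location and compare sample vs reference density
# inside it. The real model trains a sparse ensemble of such kernels.

rng = np.random.default_rng(0)

def kernel_weight(points, center, width=0.5):
    # Sum of Gaussian kernel responses: a soft local event count.
    d2 = np.sum((points - center) ** 2, axis=1)
    return float(np.sum(np.exp(-d2 / (2 * width**2))))

reference = rng.normal(0, 1, size=(2000, 2))          # anomaly-free sample
signal = rng.normal([2.5, 2.5], 0.2, size=(60, 2))    # rare local excess
sample = np.vstack([rng.normal(0, 1, size=(2000, 2)), signal])

center = np.array([2.5, 2.5])
ratio = kernel_weight(sample, center) / kernel_weight(reference, center)
```

`ratio` 远大于 1 表明该核所在区域存在统计失衡,即候选异常位置。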
[AI-45] SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators
【速读】:该论文旨在解决大规模语言模型(Large Language Models, LLMs)在推理阶段因KV缓存(Key-Value Cache)占用大量片上内存而导致的资源瓶颈问题,尤其是在采用静态图(static graphs)和连续批处理(continuous batching)框架(如vLLM或SGLang)时,难以集成现有KV缓存压缩技术的问题。解决方案的关键在于提出SnapStream,一种可扩展部署的KV缓存压缩方法,其通过稀疏化KV注意力机制实现高效压缩,在保持模型准确性的前提下显著提升片上内存利用率;实验表明,SnapStream在DeepSeek-671B模型上以128k上下文长度运行时,相较基线提升4倍片上内存使用效率,并在LongBench-v2、AIME24和LiveCodeBench等基准测试中引入极小的精度损失,是首个在具备静态图与连续批处理特性的生产级推理系统中成功部署稀疏KV注意力技术的工作。
链接: https://arxiv.org/abs/2511.03092
作者: Jonathan Li,Nasim Farahini,Evgenii Iuliugin,Magnus Vesterlund,Christian Haggstrom,Guangtao Wang,Shubhangi Upasani,Ayush Sachdeva,Rui Li,Faline Fu,Chen Wu,Ayesha Siddiqua,John Long,Tuowen Zhao,Matheen Musaddiq,Hakan Zeffer,Yun Du,Mingran Wang,Qinghua Li,Bo Li,Urmish Thakker,Raghu Prabhakar
机构: 未知
类目: Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
备注:
Abstract:The proliferation of 100B+ parameter Large Language Models (LLMs) with 100k+ context length support has resulted in increasing demands for on-chip memory to support large KV caches. Techniques such as StreamingLLM and SnapKV demonstrate how to control KV cache size while maintaining model accuracy. Yet, these techniques are not commonly used within industrial deployments using frameworks like vLLM or SGLang. The reason is twofold: on one hand, the static graphs and continuous batching methodology employed by these frameworks make it difficult to admit modifications to the standard multi-head attention algorithm, while on the other hand, the accuracy implications of such techniques on modern instruction-following and reasoning models are not well understood, obfuscating the need for implementing these techniques. In this paper, we explore these accuracy implications on Llama-3.1-8B-Instruct and DeepSeek-R1, and develop SnapStream, a KV cache compression method that can be deployed at scale. We demonstrate the efficacy of SnapStream in a 16-way tensor-parallel deployment of DeepSeek-671B on SambaNova SN40L accelerators running at 128k context length and up to 1832 tokens per second in a real production setting. SnapStream enables 4× improved on-chip memory usage and introduces minimal accuracy degradation on LongBench-v2, AIME24 and LiveCodeBench. To the best of our knowledge, this is the first implementation of sparse KV attention techniques deployed in a production inference system with static graphs and continuous batching.
zh
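SnapStream 的具体压缩策略摘要未展开;下面示意其所基于的 StreamingLLM 式"注意力汇聚点(sink)+ 近期窗口"KV 驱逐策略,规模为玩具数值:

```python
from collections import deque

# Sketch of a StreamingLLM-style KV cache policy (attention sinks plus a
# recent sliding window), one ingredient behind methods like SnapStream.

class SinkWindowKVCache:
    def __init__(self, n_sink=4, window=8):
        self.n_sink = n_sink
        self.sinks = []                       # first tokens, never evicted
        self.recent = deque(maxlen=window)    # sliding window auto-evicts

    def append(self, kv):
        if len(self.sinks) < self.n_sink:
            self.sinks.append(kv)
        else:
            self.recent.append(kv)            # oldest window entry drops out

    def keys(self):
        return self.sinks + list(self.recent)

cache = SinkWindowKVCache(n_sink=4, window=8)
for t in range(100):                          # stream 100 tokens through
    cache.append(t)

kept = cache.keys()                           # 12 entries regardless of length
```

无论序列多长,缓存大小恒定,这正是片上内存占用可控的来源。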
[AI-46] Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge
【速读】:该论文旨在解决大语言模型(Large Language Models, LLMs)是否能够内化并准确表征现实世界概率分布这一关键问题,即检验LLMs是否具备对真实世界人口统计特征的隐式知识。其解决方案的关键在于构建首个直接测试该假设的基准(benchmark),用于评估LLMs在经济学、健康、教育和社会行为等多个领域中对现实世界经验分布的理解能力。实验结果表明,LLMs整体表现不佳,未自然内化现实世界的统计规律;结合Pearl因果层次结构(Causal Hierarchy, PCH),进一步揭示其缺乏观测分布(PCH第1层)的知识,从而导致干预性(第2层)和反事实(第3层)推理能力受限。
链接: https://arxiv.org/abs/2511.03070
作者: Drago Plecko,Patrik Okanovic,Torsten Hoefler,Elias Bareinboim
机构: 未知
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
备注:
Abstract:Artificial intelligence (AI) systems hold great promise for advancing various scientific disciplines, and are increasingly used in real-world applications. Despite their remarkable progress, further capabilities are expected in order to achieve more general types of intelligence. A critical distinction in this context is between factual knowledge, which can be evaluated against true or false answers (e.g., “what is the capital of England?”), and probabilistic knowledge, reflecting probabilistic properties of the real world (e.g., “what is the sex of a computer science graduate in the US?”). In this paper, our goal is to build a benchmark for understanding the capabilities of LLMs in terms of knowledge of probability distributions describing the real world. Given that LLMs are trained on vast amounts of text, it may be plausible that they internalize aspects of these distributions. Indeed, LLMs are touted as powerful universal approximators of real-world distributions. At the same time, classical results in statistics, known as curse of dimensionality, highlight fundamental challenges in learning distributions in high dimensions, challenging the notion of universal distributional learning. In this work, we develop the first benchmark to directly test this hypothesis, evaluating whether LLMs have access to empirical distributions describing real-world populations across domains such as economics, health, education, and social behavior. Our results demonstrate that LLMs perform poorly overall, and do not seem to internalize real-world statistics naturally. When interpreted in the context of Pearl’s Causal Hierarchy (PCH), our benchmark demonstrates that language models do not contain knowledge on observational distributions (Layer 1 of PCH), and thus the Causal Hierarchy Theorem implies that interventional (Layer 2) and counterfactual (Layer 3) knowledge of these models is also limited.
zh
[AI-47] No-Human in the Loop: Agentic Evaluation at Scale for Recommendation NEURIPS2025
【速读】:该论文旨在解决大语言模型(Large Language Models, LLMs)作为评估者在构建可扩展、可信的评估流水线中的可靠性与可比性问题。其解决方案的关键在于提出ScalingEval——一个大规模基准测试框架,通过多智能体协同机制,利用共识驱动的评估协议聚合模式审计和问题代码,以可扩展的多数投票方式生成真实标签(ground-truth labels),从而实现无需人工标注即可对LLM评估者进行可复现的比较。该方法显著提升了评估过程的自动化水平与客观性,为LLM作为裁判的应用提供了系统性、可量化且具备实际指导意义的基准依据。
链接: https://arxiv.org/abs/2511.03051
作者: Tao Zhang,Kehui Yao,Luyi Ma,Jiao Chen,Reza Yousefi Maragheh,Kai Zhao,Jianpeng Xu,Evren Korpeoglu,Sushant Kumar,Kannan Achan
机构: 未知
类目: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
备注: 4 page, NeurIPS 2025 Workshop: Evaluating the Evolving LLM Lifecycle
Abstract:Evaluating large language models (LLMs) as judges is increasingly critical for building scalable and trustworthy evaluation pipelines. We present ScalingEval, a large-scale benchmarking study that systematically compares 36 LLMs, including GPT, Gemini, Claude, and Llama, across multiple product categories using a consensus-driven evaluation protocol. Our multi-agent framework aggregates pattern audits and issue codes into ground-truth labels via scalable majority voting, enabling reproducible comparison of LLM evaluators without human annotation. Applied to large-scale complementary-item recommendation, the benchmark reports four key findings: (i) Anthropic Claude 3.5 Sonnet achieves the highest decision confidence; (ii) Gemini 1.5 Pro offers the best overall performance across categories; (iii) GPT-4o provides the most favorable latency-accuracy-cost tradeoff; and (iv) GPT-OSS 20B leads among open-source models. Category-level analysis shows strong consensus in structured domains (Electronics, Sports) but persistent disagreement in lifestyle categories (Clothing, Food). These results establish ScalingEval as a reproducible benchmark and evaluation protocol for LLMs as judges, with actionable guidance on scaling, reliability, and model family tradeoffs.
zh
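摘要中"可扩展多数投票生成真实标签"的共识聚合步骤可示意如下(评审者名称与标签均为虚构):

```python
from collections import Counter

# Minimal sketch of the consensus step: aggregate labels from several LLM
# judges into a ground-truth label by majority vote, with a vote share as
# a crude confidence. Judge names and labels are made up for illustration.

def majority_label(votes):
    (label, count), = Counter(votes).most_common(1)
    return label, count / len(votes)

votes = {
    "judge_a": "complementary",
    "judge_b": "complementary",
    "judge_c": "not_complementary",
}
label, confidence = majority_label(list(votes.values()))
```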
[AI-48] PublicAgent : Multi-Agent Design Principles From an LLM -Based Open Data Analysis Framework
【速读】:该论文旨在解决开放数据仓库在实际应用中因非专家用户缺乏数据集发现、模式映射和统计分析能力而导致的可访问性问题。现有大型语言模型(Large Language Models, LLMs)虽在单项任务上表现良好,但在端到端分析流程中存在注意力分散、专业化推理冲突及错误传播等根本局限。其解决方案的关键在于提出PublicAgent多智能体框架,通过将复杂分析任务分解为意图澄清、数据发现、分析与报告四个专业化智能体模块,实现各环节专注处理、阶段验证与错误隔离。该架构不仅保持了每个智能体上下文中的注意力聚焦,还通过模块化设计显著提升了整体流程的鲁棒性和可解释性,从而为非专家用户提供基于自然语言接口的可靠公共数据分析能力。
链接: https://arxiv.org/abs/2511.03023
作者: Sina Montazeri,Yunhe Feng,Kewei Sha
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:
Abstract:Open data repositories hold potential for evidence-based decision-making, yet are inaccessible to non-experts lacking expertise in dataset discovery, schema mapping, and statistical analysis. Large language models show promise for individual tasks, but end-to-end analytical workflows expose fundamental limitations: attention dilutes across growing contexts, specialized reasoning patterns interfere, and errors propagate undetected. We present PublicAgent, a multi-agent framework that addresses these limitations through decomposition into specialized agents for intent clarification, dataset discovery, analysis, and reporting. This architecture maintains focused attention within agent contexts and enables validation at each stage. Evaluation across five models and 50 queries derives five design principles for multi-agent LLM systems. First, specialization provides value independent of model strength–even the strongest model shows 97.5% agent win rates, with benefits orthogonal to model scale. Second, agents divide into universal (discovery, analysis) and conditional (report, intent) categories. Universal agents show consistent effectiveness (std dev 12.4%) while conditional agents vary by model (std dev 20.5%). Third, agents mitigate distinct failure modes–removing discovery or analysis causes catastrophic failures (243-280 instances), while removing report or intent causes quality degradation. Fourth, architectural benefits persist across task complexity with stable win rates (86-92% analysis, 84-94% discovery), indicating workflow management value rather than reasoning enhancement. Fifth, wide variance in agent effectiveness across models (42-96% for analysis) requires model-aware architecture design. These principles guide when and why specialization is necessary for complex analytical workflows while enabling broader access to public data through natural language interfaces.
zh
[AI-49] Adaptive-Sensorless Monitoring of Shipping Containers
【速读】:该论文旨在解决传感器缺失(sensorless)模型在集装箱内部温湿度监测中因未融合实时遥测数据及无法校正系统性偏差而导致预测结果与实际值偏差较大、用户难以信任的问题。其核心解决方案是提出一种通用的“残差校正方法”(residual correction method),通过引入少量实时遥测数据对传感器缺失模型进行后处理校正,从而形成“自适应传感器缺失”(adaptive-sensorless)监测框架,显著提升预测精度并增强模型实用性。
链接: https://arxiv.org/abs/2511.03022
作者: Lingqing Shen,Chi Heem Wong,Misaki Mito,Arnab Chakrabarti
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
备注: Published in 2025 IEEE Big Data
Abstract:Monitoring the internal temperature and humidity of shipping containers is essential to preventing quality degradation during cargo transportation. Sensorless monitoring – machine learning models that predict the internal conditions of the containers using exogenous factors – shows promise as an alternative to monitoring using sensors. However, it does not incorporate telemetry information or correct for systematic errors, causing the predictions to differ significantly from the live data and confusing the users. In this paper, we introduce the residual correction method, a general framework for correcting for systematic biases in sensorless models after observing live telemetry data. We call this class of models "adaptive-sensorless" monitoring. We train and evaluate adaptive-sensorless models on the 3.48 million data points – the largest dataset of container sensor readings ever used in academic research – and show that they produce consistent improvements over the baseline sensorless models. When evaluated on the holdout set of the simulated data, they achieve average mean absolute errors (MAEs) of 2.24–2.31 °C (vs 2.43 °C by sensorless) for temperature and 5.72–7.09% for relative humidity (vs 7.99% by sensorless) and average root mean-squared errors (RMSEs) of 3.19–3.26 °C for temperature (vs 3.38 °C by sensorless) and 7.70–9.12% for relative humidity (vs 10.0% by sensorless). Adaptive-sensorless models enable more accurate cargo monitoring, early risk detection, and less dependence on full connectivity in global shipping.
zh
[AI-50] Evaluating Control Protocols for Untrusted AI Agents
【速读】:该论文旨在解决AI代理(AI agent)在部署过程中因不可信行为引发的安全风险问题,核心挑战在于如何设计有效的AI控制协议(AI control protocol),以确保其在面对当前攻击策略时具备高安全性,并且对自适应攻击者保持鲁棒性。解决方案的关键在于采用分阶段评估方法:首先测试蓝队协议(如可信模型调用、重采样和关键动作延迟决策)对默认攻击的防御效果,发现“关键动作延迟决策”和“为归责目的的重采样”能将安全率从50%提升至96%;随后通过红队迭代攻击策略,揭示仅依赖外部观测的重采样机制易被破解(安全率降至17%),而“关键动作延迟决策”因其不暴露协议内部逻辑,展现出极强的鲁棒性,证明了限制攻击者对控制协议内部信息访问权限是保障安全的核心要素。
链接: https://arxiv.org/abs/2511.02997
作者: Jon Kutasov,Chloe Loughridge,Yuqi Sun,Henry Sleight,Buck Shlegeris,Tyler Tracy,Joe Benton
机构: 未知
类目: Artificial Intelligence (cs.AI)
备注:
Abstract:As AI systems become more capable and widely deployed as agents, ensuring their safe operation becomes critical. AI control offers one approach to mitigating the risk from untrusted AI agents by monitoring their actions and intervening or auditing when necessary. Evaluating the safety of these protocols requires understanding both their effectiveness against current attacks and their robustness to adaptive adversaries. In this work, we systematically evaluate a range of control protocols in SHADE-Arena, a dataset of diverse agentic environments. First, we evaluate blue team protocols, including deferral to trusted models, resampling, and deferring on critical actions, against a default attack policy. We find that resampling for incrimination and deferring on critical actions perform best, increasing safety from 50% to 96%. We then iterate on red team strategies against these protocols and find that attack policies with additional affordances, such as knowledge of when resampling occurs or the ability to simulate monitors, can substantially improve attack success rates against our resampling strategy, decreasing safety to 17%. However, deferring on critical actions is highly robust to even our strongest red team strategies, demonstrating the importance of denying attack policies access to protocol internals.
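"关键动作延迟决策"协议的控制流可以用几行 Python 勾勒:不可信代理自由执行普通动作,而关键动作一律交由可信模型裁决。其中的动作名与各回调函数均为虚构示例,仅示意协议结构,并非论文中 SHADE-Arena 的实现。

```python
def run_with_control(actions, is_critical, trusted_decide, untrusted_execute):
    """"关键动作延迟决策"协议示意:逐个动作检查,
    关键动作交给可信模型裁决,其余由不可信代理直接执行。"""
    log = []
    for action in actions:
        if is_critical(action):
            log.append(("trusted", trusted_decide(action)))
        else:
            log.append(("untrusted", untrusted_execute(action)))
    return log

log = run_with_control(
    actions=["read_file", "send_email", "delete_db"],
    is_critical=lambda a: a in {"send_email", "delete_db"},
    trusted_decide=lambda a: f"blocked:{a}",
    untrusted_execute=lambda a: f"done:{a}",
)
print(log)
```

这种结构的关键在于:攻击者即便完全了解自己的动作序列,也无法绕过对关键动作的可信裁决,这与摘要中"拒绝攻击策略访问协议内部信息"的结论一致。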
zh
[AI-51] Systematizing LLM Persona Design: A Four-Quadrant Technical Taxonomy for AI Companion Applications NEURIPS2025
【速读】:该论文旨在解决生成式 AI(Generative AI)在人工智能陪伴应用领域中因目标多样性、模态差异和技术栈碎片化所导致的缺乏统一框架的问题。其解决方案的关键在于提出一个四象限技术分类法(Four-Quadrant Technical Taxonomy),该框架沿“虚拟 vs. 身体化”和“情感陪伴 vs. 功能增强”两个核心维度划分AI伴侣应用场景,系统梳理了从虚拟偶像、功能性虚拟助手到具身智能机器人等不同类别中的技术挑战与关键使能技术,从而为研究人员提供清晰的导航路径,并为政策制定者识别各类场景下的独特风险奠定基础。
链接: https://arxiv.org/abs/2511.02979
作者: Esther Sun,Zichu Wu
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
备注: Submitted to Neurips 2025 workshop: LLM Persona Workshop
Abstract:The design and application of LLM-based personas in AI companionship is a rapidly expanding but fragmented field, spanning from virtual emotional companions and game NPCs to embodied functional robots. This diversity in objectives, modality, and technical stacks creates an urgent need for a unified framework. To address this gap, this paper systematizes the field by proposing a Four-Quadrant Technical Taxonomy for AI companion applications. The framework is structured along two critical axes: Virtual vs. Embodied and Emotional Companionship vs. Functional Augmentation. Quadrant I (Virtual Companionship) explores virtual idols, romantic companions, and story characters, introducing a four-layer technical framework to analyze their challenges in maintaining long-term emotional consistency. Quadrant II (Functional Virtual Assistants) analyzes AI applications in work, gaming, and mental health, highlighting the shift from “feeling” to “thinking and acting” and pinpointing key technologies like enterprise RAG and on-device inference. Quadrants III & IV (Embodied Intelligence) shift from the virtual to the physical world, analyzing home robots and vertical-domain assistants, revealing core challenges in symbol grounding, data privacy, and ethical liability. This taxonomy provides not only a systematic map for researchers and developers to navigate the complex persona design space but also a basis for policymakers to identify and address the unique risks inherent in different application scenarios.
zh
[AI-52] Value of Information-Enhanced Exploration in Bootstrapped DQN
【速读】:该论文旨在解决深度强化学习中高维状态空间与稀疏奖励环境下探索效率低下的问题,传统基于随机局部策略噪声的探索方法(如ε-greedy和Boltzmann探索)难以有效平衡探索与利用。其解决方案的关键在于将信息价值(Value of Information, VOI)引入经典的Bootstrapped DQN算法框架,通过估计学习信息的价值来量化不同网络头之间的意见差异,并引导智能体向最具潜力的区域进行探索,从而提升探索效率并更好地利用由随机网络初始化带来的内在不确定性,且无需引入额外超参数。
链接: https://arxiv.org/abs/2511.02969
作者: Stergios Plataniotis,Charilaos Akasiadis,Georgios Chalkiadakis
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:
Abstract:Efficient exploration in deep reinforcement learning remains a fundamental challenge, especially in environments characterized by high-dimensional states and sparse rewards. Traditional exploration strategies that rely on random local policy noise, such as \epsilon -greedy and Boltzmann exploration methods, often struggle to efficiently balance exploration and exploitation. In this paper, we integrate the notion of (expected) value of information (EVOI) within the well-known Bootstrapped DQN algorithmic framework, to enhance the algorithm’s deep exploration ability. Specifically, we develop two novel algorithms that incorporate the expected gain from learning the value of information into Bootstrapped DQN. Our methods use value of information estimates to measure the discrepancies of opinions among distinct network heads, and drive exploration towards areas with the most potential. We evaluate our algorithms with respect to performance and their ability to exploit inherent uncertainty arising from random network initialization. Our experiments in complex, sparse-reward Atari games demonstrate increased performance, all the while making better use of uncertainty, and, importantly, without introducing extra hyperparameters.
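以多个 bootstrap 头的 Q 值分歧引导探索的思路,可以用一个高度简化的示意来说明:此处以各头 Q 值的标准差近似"信息价值",并非论文中 EVOI 的精确定义,仅示意其机制。

```python
import numpy as np

def select_action(q_heads, beta=1.0):
    """q_heads: (头数, 动作数) 的 Q 值矩阵。
    以各头之间的标准差作为该动作"信息价值"的简化代理,
    选择"均值 + 分歧加成"最大的动作。"""
    q_mean = q_heads.mean(axis=0)
    voi_proxy = q_heads.std(axis=0)  # 头间分歧越大,可学的信息越多
    return int(np.argmax(q_mean + beta * voi_proxy))

# 3 个头、2 个动作:动作 1 均值略低但头间分歧大,被优先探索
q_heads = np.array([[1.0, 0.9], [1.0, 1.5], [1.0, 0.3]])
print(select_action(q_heads))  # 输出 1
```

当 beta=0 时退化为普通的均值贪婪选择(选动作 0);分歧加成正是利用了随机网络初始化带来的不确定性,且不引入额外需调的环境相关超参数。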
zh
[AI-53] Power Constrained Nonstationary Bandits with Habituation and Recovery Dynamics
【速读】:该论文旨在解决在动态非平稳环境中(如行为健康干预)如何平衡个体个性化决策与群体层面效应估计的问题,尤其针对微随机试验(MRT)中因过度 exploitation 导致探索不足、进而影响对群体治疗效应的可靠检测这一挑战。其解决方案的关键在于提出 ROGUE-TS 算法——一种专为 Reducing or Gaining Unknown Efficacy (ROGUE) 框架设计的 Thompson Sampling 方法,并引入概率截断(probability clipping)机制以量化调节探索强度与 regret 之间的权衡,从而在保持子线性遗憾的同时确保足够的统计功效,实现个体化推荐与群体有效性之间的协同优化。
链接: https://arxiv.org/abs/2511.02944
作者: Fengxu Li,Stephanie M. Carpenter,Matthew P. Buman,Yonatan Mintz
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
备注:
Abstract:A common challenge for decision makers is selecting actions whose rewards are unknown and evolve over time based on prior policies. For instance, repeated use may reduce an action’s effectiveness (habituation), while inactivity may restore it (recovery). These nonstationarities are captured by the Reducing or Gaining Unknown Efficacy (ROGUE) bandit framework, which models real-world settings such as behavioral health interventions. While existing algorithms can compute sublinear regret policies to optimize these settings, they may not provide sufficient exploration due to overemphasis on exploitation, limiting the ability to estimate population-level effects. This is a challenge of particular interest in micro-randomized trials (MRTs) that aid researchers in developing just-in-time adaptive interventions that have population-level effects while still providing personalized recommendations to individuals. In this paper, we first develop ROGUE-TS, a Thompson Sampling algorithm tailored to the ROGUE framework, and provide theoretical guarantees of sublinear regret. We then introduce a probability clipping procedure to balance personalization and population-level learning, with quantified trade-off that balances regret and minimum exploration probability. Validation on two MRT datasets concerning physical activity promotion and bipolar disorder treatment shows that our methods both achieve lower regret than existing approaches and maintain high statistical power through the clipping procedure without significantly increasing regret. This enables reliable detection of treatment effects while accounting for individual behavioral dynamics. For researchers designing MRTs, our framework offers practical guidance on balancing personalization with statistical validity.
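概率截断的机制可以用 Beta-Bernoulli Thompson Sampling 简单示意:先用蒙特卡洛估计各臂的选择概率,再截断到 [pi_min, 1-pi_min] 并归一化,保证每个臂都保留最低探索概率(即统计功效所需的探索下限)。以下为假设性示例,未实现 ROGUE 框架的习惯化/恢复动态。

```python
import numpy as np

rng = np.random.default_rng(1)

def clipped_thompson_probs(successes, failures, pi_min=0.1, n_mc=2000):
    """Beta-Bernoulli TS 的截断选择概率:蒙特卡洛估计各臂被选概率,
    截断到 [pi_min, 1 - pi_min] 后重新归一化。"""
    k = len(successes)
    draws = rng.beta(successes[:, None] + 1.0,
                     failures[:, None] + 1.0, size=(k, n_mc))
    probs = np.bincount(draws.argmax(axis=0), minlength=k) / n_mc
    clipped = np.clip(probs, pi_min, 1.0 - pi_min)
    return clipped / clipped.sum()

# 臂 0 明显更优,但臂 1 仍保有不低于截断下限的探索概率
p = clipped_thompson_probs(np.array([50.0, 2.0]), np.array([5.0, 20.0]))
print(p)
```

截断下限 pi_min 即为摘要中"遗憾与最低探索概率之间可量化权衡"的旋钮:pi_min 越大,群体效应估计越可靠,个体化遗憾越高。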
zh
[AI-54] Performance Evaluation of Bitstring Representations in a Linear Genetic Programming Framework
【速读】:该论文旨在解决不同位串(bitstring)表示方法在计算性能上的差异问题,特别是在线性遗传编程(Linear Genetic Programming)系统中进行位串拼接操作时的效率优化。其解决方案的关键在于通过实证比较三种位串实现方式——标准库的std::bitset、Boost库的boost::dynamic_bitset以及自定义的直接实现——在多个平台(macOS、Linux 和 Windows MSYS2)上的性能表现,从而揭示编译器优化与系统架构对性能的影响,并为特定平台和应用场景选择最优的位串实现提供实践指导。
链接: https://arxiv.org/abs/2511.02897
作者: Clyde Meli,Vitezslav Nezval,Zuzana Kominkova Oplatkova,Victor Buttigieg,Anthony Spiteri Staines
机构: 未知
类目: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Performance (cs.PF)
备注:
Abstract:Different bitstring representations can yield varying computational performance. This work compares three bitstring implementations in C++: std::bitset, boost::dynamic_bitset, and a custom direct implementation. Their performance is benchmarked in the context of concatenation within a Linear Genetic Programming system. Benchmarks were conducted on three platforms (macOS, Linux, and Windows MSYS2) to assess platform specific performance variations. The results show that the custom direct implementation delivers the fastest performance on Linux and Windows, while std::bitset performs best on macOS. Although consistently slower, boost::dynamic_bitset remains a viable and flexible option. These findings highlight the influence of compiler optimisations and system architecture on performance, providing practical guidance for selecting the optimal method based on platform and application requirements.
zh
[AI-55] A Criminology of Machines
【速读】:该论文试图解决的问题是:随着生成式AI代理(Generative AI Agents)在社会中日益普及并形成自主交互行为,传统犯罪学理论与实践如何应对由多智能体AI系统引发的新型犯罪风险与社会控制挑战。解决方案的关键在于推动犯罪学研究范式的转变——即不再将AI仅视为工具,而应将其视作具有计算、社会和法律维度的行动者(Actor),并基于行动者网络理论(Actor-Network Theory)和机器社会学视角,构建一套用于识别和分析AI间交互可能导致异常、违法或犯罪行为的双层分类框架,进而提出四个亟需理论与实证探讨的核心问题,以引导犯罪学界主动参与AI安全治理与政策制定。
链接: https://arxiv.org/abs/2511.02895
作者: Gian Maria Campedelli
机构: 未知
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Physics and Society (physics.soc-ph)
备注:
Abstract:While the possibility of reaching human-like Artificial Intelligence (AI) remains controversial, the likelihood that the future will be characterized by a society with a growing presence of autonomous machines is high. Autonomous AI agents are already deployed and active across several industries and digital environments and alongside human-human and human-machine interactions, machine-machine interactions are poised to become increasingly prevalent. Given these developments, I argue that criminology must begin to address the implications of this transition for crime and social control. Drawing on Actor-Network Theory and Woolgar’s decades-old call for a sociology of machines – frameworks that acquire renewed relevance with the rise of generative AI agents – I contend that criminologists should move beyond conceiving AI solely as a tool. Instead, AI agents should be recognized as entities with agency encompassing computational, social, and legal dimensions. Building on the literature on AI safety, I thus examine the risks associated with the rise of multi-agent AI systems, proposing a dual taxonomy to characterize the channels through which interactions among AI agents may generate deviant, unlawful, or criminal outcomes. I then advance and discuss four key questions that warrant theoretical and empirical attention: (1) Can we assume that machines will simply mimic humans? (2) Will crime theories developed for humans suffice to explain deviant or criminal behaviors emerging from interactions between autonomous AI agents? (3) What types of criminal behaviors will be affected first? (4) How might this unprecedented societal shift impact policing? These questions underscore the urgent need for criminologists to theoretically and empirically engage with the implications of multi-agent AI systems for the study of crime and play a more active role in debates on AI safety and governance.
zh
[AI-56] Predicting Weekly Fishing Concentration Zones through Deep Learning Integration of Heterogeneous Environmental Spatial Datasets
【速读】:该论文旨在解决北印度洋(包括阿拉伯海和孟加拉湾)沿海渔民难以准确识别高产渔场的问题,从而提升渔业效率并促进可持续发展。解决方案的关键在于构建一个基于人工智能(AI)的预测框架,利用海表温度(Sea Surface Temperature, SST)和叶绿素浓度等海洋学参数,精准识别潜在渔场区域(Potential Fishing Zones, PFZs),从而帮助渔民减少搜寻时间、降低燃油消耗,并优化资源利用效率。
链接: https://arxiv.org/abs/2511.02887
作者: Chaitanya Rele,Aditya Rathod,Kaustubh Natu,Saurabh Kulkarni,Ajay Koli,Swapnali Makdey
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:
Abstract:The North Indian Ocean, including the Arabian Sea and the Bay of Bengal, represents a vital source of livelihood for coastal communities, yet fishermen often face uncertainty in locating productive fishing grounds. To address this challenge, we present an AI-assisted framework for predicting Potential Fishing Zones (PFZs) using oceanographic parameters such as sea surface temperature and chlorophyll concentration. The approach is designed to enhance the accuracy of PFZ identification and provide region-specific insights for sustainable fishing practices. Preliminary results indicate that the framework can support fishermen by reducing search time, lowering fuel consumption, and promoting efficient resource utilization.
zh
[AI-57] Test-time Adaptation of Tiny Recursive Models
【速读】:该论文旨在解决在计算资源受限的条件下,如何高效提升小型递归模型(Tiny Recursive Models, TRM)在ARC AGI II任务上的表现问题。其关键解决方案在于:先在大规模公共ARC任务数据集上对小型递归模型进行预训练(1,280个任务,70万余次优化器更新,在4xH100上耗时48小时),使其在公共评估集上达到约10%的得分;随后在竞赛允许的计算预算内,仅用12,500次梯度更新对模型进行全参数微调(full fine-tuning),即可在半私有评估任务上取得6.67%的得分。相比之下,此前领先的开源方法虽得约7.8%,但其训练算力远超比赛限制。该策略证明了预训练-微调范式在资源受限场景下提升小型模型性能的有效性。
链接: https://arxiv.org/abs/2511.02886
作者: Ronan Killian McGovern
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:
Abstract:Prior to the close of the 2025 ARC Prize competition, the leading open source approach - known as TRM, or Tiny Recursive Models - involved training a 7M parameter recursive neural network on augmented variants of ARC tasks. That approach scored approximately 7.8% on the public ARC AGI II evaluation set, but required a level of compute far in excess of what is allowed during the competition. This paper shows that, by starting from a tiny recursive model that has been pre-trained on public ARC tasks, one can efficiently fine-tune on competition tasks within the allowed compute limits. Specifically, a model was pre-trained on 1,280 public tasks for 700k+ optimizer steps over 48 hours on 4xH100 SXM GPUs to obtain a ~10% score on the public evaluation set. That model was then post-trained in just 12,500 gradient steps during the competition to reach a score of 6.67% on semi-private evaluation tasks. Notably, such post-training performance is achieved by full-fine tuning of the tiny model, not LoRA fine-tuning or fine-tuning of task embeddings alone.
zh
[AI-58] AgentSLA: Towards a Service Level Agreement for AI Agents
【速读】:该论文旨在解决AI代理(AI Agents)在智能软件系统中缺乏明确服务质量(Quality of Service, QoS)规范与服务等级协议(Service Level Agreements, SLAs)定义的问题,这一问题阻碍了对AI组件质量保障(Quality Assurance, QA)的有效实施。解决方案的关键在于:首先,基于ISO/IEC 25010标准构建了一个针对AI代理的质量模型,以系统化地刻画AI代理的多维质量属性;其次,提出了一种领域特定语言(Domain Specific Language, DSL),用于形式化定义AI代理所提供的服务的SLA条款,从而实现对AI代理服务质量的可度量、可验证和可管理。
链接: https://arxiv.org/abs/2511.02885
作者: Gwendal Jouneaux,Jordi Cabot
机构: 未知
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
备注:
Abstract:AI components are increasingly becoming a key element of all types of software systems to enhance their functionality. These AI components are often implemented as AI Agents, offering more autonomy than a plain integration of Large Language Models (LLMs), moving from a Model-as-a-Service paradigm to an Agent-as-a-Service one, bringing new challenges to the development of smart software systems. Indeed, while support for the design, implementation, and deployment of those agents exist, the specification of Quality of Service (QoS) and definition of Service Level Agreements (SLAs) aspects for those agents, important to ensure the quality of the resulting systems, remains an open challenge. Part of this is due to the difficulty to clearly define quality in the context of AI components, resulting in a lack of consensus on how to best approach Quality Assurance (QA) for these types of systems. To address this challenge, this paper proposes both a quality model for AI agents based on the ISO/IEC 25010 standard, and a domain specific language to support the definition of SLAs for the services provided by these AI agents.
zh
[AI-59] Stochastic Deep Graph Clustering for Practical Group Formation
【速读】:该论文旨在解决群体推荐系统(Group Recommender Systems, GRSs)在动态现实场景中面临的群组形成问题,现有方法多依赖静态或预定义群组,难以适应实际应用中的实时性和灵活性需求。其解决方案的关键在于提出DeepForm框架,该框架通过轻量级图卷积网络(GCN)捕捉高阶用户结构信息,结合随机聚类学习实现无需重新训练的自适应群组重构,并利用对比学习在动态环境中优化群组质量,从而同时满足高阶信息融合、实时群组生成和群组数量动态调整三大核心要求。
链接: https://arxiv.org/abs/2511.02879
作者: Junhyung Park,Hyungjin Kim,Seokho Ahn,Young-Duk Seo
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
备注:
Abstract:While prior work on group recommender systems (GRSs) has primarily focused on improving recommendation accuracy, most approaches assume static or predefined groups, making them unsuitable for dynamic, real-world scenarios. We reframe group formation as a core challenge in GRSs and propose DeepForm (Stochastic Deep Graph Clustering for Practical Group Formation), a framework designed to meet three key operational requirements: (1) the incorporation of high-order user information, (2) real-time group formation, and (3) dynamic adjustment of the number of groups. DeepForm employs a lightweight GCN architecture that effectively captures high-order structural signals. Stochastic cluster learning enables adaptive group reconfiguration without retraining, while contrastive learning refines groups under dynamic conditions. Experiments on multiple datasets demonstrate that DeepForm achieves superior group formation quality, efficiency, and recommendation accuracy compared with various baselines.
zh
[AI-60] A Novel Reservoir Computing Framework for Chaotic Time Series Prediction Using Time Delay Embedding and Random Fourier Features
【速读】:该论文旨在解决混沌时间序列预测中模型对系统吸引子内在几何结构建模能力不足且计算效率低的问题。传统储备池计算(Reservoir Computing, RC)依赖高维循环连接,难以有效捕捉延迟坐标间的非线性关系,同时需手动调参如谱半径和泄漏率,限制了其泛化性能。解决方案的关键在于提出一种融合时间延迟嵌入与随机傅里叶特征(Random Fourier Feature, RFF)映射的新型RC框架——RFF-RC,通过显式近似非线性核变换来揭示重构相空间中的潜在动力学关联,从而在不使用传统递归架构的情况下增强储层的动力学表示能力,并减少对人工超参数的依赖,实现更高效、鲁棒且可解释的混沌系统建模。
链接: https://arxiv.org/abs/2511.02877
作者: S. K. Laha
机构: 未知
类目: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:
Abstract:Forecasting chaotic time series requires models that can capture the intrinsic geometry of the underlying attractor while remaining computationally efficient. We introduce a novel reservoir computing (RC) framework that integrates time-delay embedding with Random Fourier Feature (RFF) mappings to construct a dynamical reservoir without the need for traditional recurrent architectures. Unlike standard RC, which relies on high-dimensional recurrent connectivity, the proposed RFF-RC explicitly approximates nonlinear kernel transformations that uncover latent dynamical relations in the reconstructed phase space. This hybrid formulation offers two key advantages: (i) it provides a principled way to approximate complex nonlinear interactions among delayed coordinates, thereby enriching the effective dynamical representation of the reservoir, and (ii) it reduces reliance on manual reservoir hyperparameters such as spectral radius and leaking rate. We evaluate the framework on canonical chaotic systems-the Mackey-Glass equation, the Lorenz system, and the Kuramoto-Sivashinsky equation. This novel formulation demonstrates that RFF-RC not only achieves superior prediction accuracy but also yields robust attractor reconstructions and long-horizon forecasts. These results show that the combination of delay embedding and RFF-based reservoirs reveals new dynamical structure by embedding the system in an enriched feature space, providing a computationally efficient and interpretable approach to modeling chaotic dynamics.
zh
[AI-61] Academics and Generative AI: Empirical and Epistemic Indicators of Policy-Practice Voids
【速读】:该论文旨在解决生成式 AI (Generative AI) 在学术界扩散过程中,政策制定与实践应用之间出现的脱节问题,从而催生对可审计对齐指标的需求。其解决方案的关键在于构建一个嵌入结构化解释框架的十项间接测量工具,通过提取学者群体中的经验性(empirical)与认识论(epistemic)信号,提炼出三个经过过滤的指标:(1)AI整合评估能力(代理指标),基于AI技能、感知教学效益和检测信心三重筛选条件,衡量愿意在考试中完全允许AI使用的比例;(2)行业级必要性(代理指标),针对高使用强度但仍高度认可AI贡献的用户,统计认为AI能挑战传统学科的占比;(3)本体论立场(ontological stance),在识别出将AI视为与以往工具本质不同的个体中,结合行为变化和元认知测试,划分物质性与非物质性认知取向,以映射采购主张与证据类别的一致性。
链接: https://arxiv.org/abs/2511.02875
作者: R. Yamamoto Ravenor
机构: 未知
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
备注: 14 pages, 2 tables, 1 figure
Abstract:As generative AI diffuses through academia, policy-practice divergence becomes consequential, creating demand for auditable indicators of alignment. This study prototypes a ten-item, indirect-elicitation instrument embedded in a structured interpretive framework to surface voids between institutional rules and practitioner AI use. The framework extracts empirical and epistemic signals from academics, yielding three filtered indicators of such voids: (1) AI-integrated assessment capacity (proxy) - within a three-signal screen (AI skill, perceived teaching benefit, detection confidence), the share who would fully allow AI in exams; (2) sector-level necessity (proxy) - among high output control users who still credit AI with high contribution, the proportion who judge AI capable of challenging established disciplines; and (3) ontological stance - among respondents who judge AI different in kind from prior tools, report practice change, and pass a metacognition gate, the split between material and immaterial views as an ontological map aligning procurement claims with evidence classes.
zh
[AI-62] FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels
【速读】:该论文旨在解决当前大型语言模型(Large Language Models, LLMs)在形式化数学推理能力上的局限性,特别是其在应对现代数学研究级问题时表现不足的问题。现有基准如IMO竞赛题主要反映的是特定类型的解题能力,无法体现数学研究所需的深度、广度与抽象性。为此,作者提出FATE(Formal Algebra Theorem Evaluation)基准系列,聚焦于抽象代数与交换代数领域,包含两个子集FATE-H和FATE-X,分别涵盖从本科练习到博士资格考试难度以上的问题,并首次实现了对Mathlib库覆盖范围的超越。关键解决方案在于构建了一个系统性的、具有挑战性的形式化代数推理评估体系,通过两阶段评测揭示了模型在自然语言推理与形式化转换之间的显著性能差距,并识别出形式化过程中的典型错误类型,从而为推进研究级形式数学推理提供了可量化、可比较的基准与改进方向。
链接: https://arxiv.org/abs/2511.02872
作者: Jiedong Jiang,Wanyi He,Yuefeng Wang,Guoxiong Gao,Yongle Hu,Jingting Wang,Nailing Guan,Peihao Wu,Chunbo Dai,Liang Xiao,Bin Dong
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
备注:
Abstract:Recent advances in large language models (LLMs) have demonstrated impressive capabilities in formal theorem proving, particularly on contest-based mathematical benchmarks like the IMO. However, these contests do not reflect the depth, breadth, and abstraction of modern mathematical research. To bridge this gap, we introduce FATE (Formal Algebra Theorem Evaluation), a new benchmark series in formal algebra designed to chart a course toward advanced mathematical reasoning. We present two new components, FATE-H and FATE-X, each with 100 problems in abstract and commutative algebra. The FATE series spans a difficulty spectrum from undergraduate exercises to problems exceeding PhD qualifying exams. Notably, FATE-X is the first formal benchmark to surpass both PhD-level exam difficulty and the coverage of the Mathlib library. Our evaluations of state-of-the-art LLM provers on this new benchmark reveal a stark performance gap compared to contest math: the best model achieves only 3% (pass@64) accuracy on FATE-H and 0% on FATE-X. Our two-stage evaluation reveals that models’ natural-language reasoning is notably more accurate than their ability to formalize this reasoning. We systematically classify the common errors that arise during this formalization process. Furthermore, a comparative study shows that a specialized prover can exhibit less effective reflection than general-purpose models, reducing its accuracy at the natural-language stage. We believe FATE provides a robust and challenging benchmark that establishes essential checkpoints on the path toward research-level formal mathematical reasoning.
zh
[AI-63] Analysis of AdvFusion: Adapter-based Multilingual Learning for Code Large Language Models
【速读】:该论文旨在解决多编程语言知识迁移中如何有效提升代码大语言模型(Code-LLMs)在不同软件工程任务上的性能问题,尤其关注参数高效微调(Parameter Efficient Fine-Tuning, PEFT)方法在跨语言学习中的表现差异。其解决方案的关键在于提出并扩展了AdvFusion方法——一种基于PEFT的新型架构,通过在适配目标任务前先从其他编程语言中学习通用特征,从而增强模型对目标任务的适应能力。实验表明,AdvFusion在代码生成任务上优于AdapterFusion,但在提交消息生成和代码翻译任务中表现较差,且不同Code-LLM与任务组合展现出显著异质性,揭示了PEFT方法选择需结合具体任务和模型规模进行优化。
链接: https://arxiv.org/abs/2511.02869
作者: Amirreza Esmaeili,Fahd Seddik,Yongyi Ji,Fatemeh Fard,Fuxiang Chen
机构: 未知
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
备注:
Abstract:Programming languages can benefit from one another by utilizing a language model for software engineering tasks. Full fine-tuning and Parameter Efficient Fine-Tuning (PEFT) of Code Language Models (Code-LMs) has been explored for multilingual knowledge transfer. AdapterFusion is a PEFT architecture that aims to enhance task performance by leveraging information from multiple programming languages, but primarily focuses on the target programming language. In our previous work, we proposed AdvFusion, a novel PEFT-based approach that effectively learns from other programming languages before adapting to the target task. Though previous experiments showed that AdvFusion outperformed AdapterFusion and LoRA, it was applied on pre-trained Code-LMs and was limited to only two tasks, code summarization and method name prediction. In this study, we expanded our work and investigated AdvFusion on Code Large Language Models (Code-LLMs), considering three new tasks: code generation, code translation, and commit message generation. We observed that different Code-LLMs/tasks exhibit different characteristics. In code generation, AdvFusion outperformed AdapterFusion but not other PEFT methods (LoRA, Compacter, and TaskAdapter). In commit message generation, AdapterFusion performed better than AdvFusion, and contrary to code generation, we found that the other PEFT methods do not have better performance. In code translation, AdvFusion performed worse than AdapterFusion overall, with the performance gap marginally widening as the model size increases. However, consistent with code generation, other PEFT methods showed better performance.
zh
[AI-64] Proof-of-Spiking-Neurons(PoSN): Neuromorphic Consensus for Next-Generation Blockchains
【速读】:该论文旨在解决区块链系统中长期存在的可扩展性(scalability)、延迟(latency)和能源效率低下等问题,尤其是传统共识协议如工作量证明(Proof-of-Work, PoW)和权益证明(Proof-of-Stake, PoS)在资源消耗过高或存在中心化风险方面的局限性。其解决方案的关键在于提出一种受脉冲神经网络(spiking neural networks)启发的新型共识机制——脉冲神经元共识(Proof-of-Spiking-Neurons, PoSN),通过将交易编码为脉冲序列(spike trains),利用神经元间的竞争性放电动态选举领导者,并借助神经同步实现区块确认,从而在保持高吞吐量的同时显著降低能耗,适用于物联网(IoT)、边缘计算及大规模分布式系统场景。
链接: https://arxiv.org/abs/2511.02868
作者: M.Z. Haider,M.U Ghouri,Tayyaba Noreen,M. Salman
机构: 未知
类目: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
备注:
Abstract:Blockchain systems face persistent challenges of scalability, latency, and energy inefficiency. Existing consensus protocols such as Proof-of-Work (PoW) and Proof-of-Stake (PoS) either consume excessive resources or risk centralization. This paper proposes \textitProof-of-Spiking-Neurons (PoSN), a neuromorphic consensus protocol inspired by spiking neural networks. PoSN encodes transactions as spike trains, elects leaders through competitive firing dynamics, and finalizes blocks via neural synchronization, enabling parallel and event-driven consensus with minimal energy overhead. A hybrid system architecture is implemented on neuromorphic platforms, supported by simulation frameworks such as Nengo and PyNN. Experimental results show significant gains in energy efficiency, throughput, and convergence compared to PoB and PoR. PoSN establishes a foundation for sustainable, adaptive blockchains suitable for IoT, edge, and large-scale distributed systems.
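"竞争性放电选主"的核心机制可以用一个漏积分-放电(LIF)竞赛来概念性示意:每个验证者对应一个神经元,输入电流越强越先越过阈值放电,率先放电者获得出块权。以下纯为假设性玩具示例,与论文的具体协议细节无关。

```python
import numpy as np

rng = np.random.default_rng(42)

def elect_leader(drives, threshold=1.0, dt=0.1, leak=0.2, max_steps=1000):
    """LIF 竞赛:膜电位按 v' = -leak*v + drive + 噪声 演化,
    最先越过阈值的神经元(验证者)当选领导者。"""
    v = np.zeros(len(drives))
    for _ in range(max_steps):
        noise = rng.normal(0.0, 0.05, size=len(drives))
        v += dt * (-leak * v + drives + noise)
        fired = np.flatnonzero(v >= threshold)
        if fired.size:
            return int(fired[v[fired].argmax()])
    return int(v.argmax())

# 验证者 2 的输入电流最强,几乎总是率先放电
print(elect_leader(np.array([0.3, 0.5, 1.2])))  # 输出 2
```

事件驱动的竞赛只在放电时刻产生计算,这与摘要中"并行、事件驱动、低能耗"的共识目标相呼应。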
zh
[AI-65] LM-Fix: Lightweight Bit-Flip Detection and Rapid Recovery Framework for Language Models FAST
【速读】:该论文旨在解决大型语言模型(Large Language Models, LLMs)在实际部署中因硬件故障(如位翻转错误)导致的可靠性问题,现有完整性保障方法往往计算开销大或恢复速度慢,难以适配现代LLM的高效运行需求。其解决方案的关键在于提出LM-Fix框架,通过短时测试向量遍历(test-vector pass)结合哈希引导的校验机制实现轻量级故障检测(在TVL=200时可识别超过94%的单比特翻转,近100%的多比特翻转),并采用局部修复策略而非全量重载,使恢复速度比传统方式快100倍以上,同时仅引入约1%至7.7%的运行时开销,从而实现了高效率、低延迟的故障检测与恢复。
链接: https://arxiv.org/abs/2511.02866
作者: Ahmad Tahmasivand,Noureldin Zahran,Saba Al-Sayouri,Mohammed Fouda,Khaled N. Khasawneh
机构: 未知
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Cryptography and Security (cs.CR)
备注: Accepted at IEEE ICCD 2025. Code: this https URL . Detects over 94 percent single-bit flips (near 100 percent multi-bit) with about 1 to 7.7 percent overhead; recovery is over 100x faster than a full reload. Keywords: LLMs, bit-flip, fault injection, reliability, security, Rowhammer, SDC, Jailbreaking, Attack, Defense, GPU DRAM faults
Abstract:This paper presents LM-Fix, a lightweight detection and rapid recovery framework for faults in large language models (LLMs). Existing integrity approaches are often heavy or slow for modern LLMs. LM-Fix runs a short test-vector pass and uses hash-guided checks to detect bit-flip faults, then repairs them locally without a full reload. Across multiple models, it detects over 94% of single-bit flips at TVL=200 and nearly 100% of multi-bit flips with approximately 1% to 7.7% runtime overhead; recovery is more than 100x faster than reloading. These results show a practical, low-overhead solution for keeping LLMs reliable in production.
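"哈希引导的校验 + 局部修复"的思路可以用权重分块哈希简单示意:为每个权重块预存黄金摘要,检测到摘要不一致时仅从参考副本重载受损块,而非整模型重载。以下为假设性简化示例,省略了论文中的测试向量遍历环节。

```python
import hashlib
import numpy as np

def chunk_hashes(weights, chunk=4):
    """对权重向量按固定大小分块,计算各块的 SHA-256 摘要。"""
    return [hashlib.sha256(weights[i : i + chunk].tobytes()).hexdigest()
            for i in range(0, len(weights), chunk)]

def detect_and_repair(weights, golden, reference, chunk=4):
    """与黄金摘要逐块比对,只重载被破坏的块(局部修复)。"""
    repaired = []
    for idx, (h, g) in enumerate(zip(chunk_hashes(weights, chunk), golden)):
        if h != g:
            s = idx * chunk
            weights[s : s + chunk] = reference[s : s + chunk]
            repaired.append(idx)
    return repaired

w = np.arange(16, dtype=np.float32)
ref = w.copy()
golden = chunk_hashes(w)
w[5] = np.float32(1e9)                    # 模拟一次位翻转破坏单个权重
print(detect_and_repair(w, golden, ref))  # 只重载第 1 块,输出 [1]
print(np.array_equal(w, ref))             # 修复后与参考一致,输出 True
```

由于每次只需重载损坏块,恢复开销与损坏范围成正比,这解释了相对整模型重载的百倍加速从何而来。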
zh
[AI-66] Mathematical exploration and discovery at scale
【速读】:该论文旨在解决数学领域中复杂问题的自动化发现与求解难题,尤其是针对长期未解的开放性问题(open problems)缺乏高效探索手段的问题。其解决方案的关键在于提出AlphaEvolve——一个通用的进化编码代理(evolutionary coding agent),通过将大语言模型(LLM)的生成能力与自动评估机制相结合,在迭代式进化框架中实现算法解的提出、测试与优化。该方法能够自主探索庞大的搜索空间,不仅在多数情况下复现已知最优解,还发现了若干改进解,并能从有限输入推广至通用公式,展现出超越传统人工推理的潜力。
链接: https://arxiv.org/abs/2511.02864
作者: Bogdan Georgiev,Javier Gómez-Serrano,Terence Tao,Adam Zsolt Wagner
机构: 未知
类目: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Classical Analysis and ODEs (math.CA); Combinatorics (math.CO); Metric Geometry (math.MG)
备注: 80 pages, 35 figures
Abstract:AlphaEvolve is a generic evolutionary coding agent that combines the generative capabilities of LLMs with automated evaluation in an iterative evolutionary framework that proposes, tests, and refines algorithmic solutions to challenging scientific and practical problems. In this paper we showcase AlphaEvolve as a tool for autonomously discovering novel mathematical constructions and advancing our understanding of long-standing open problems. To demonstrate its breadth, we considered a list of 67 problems spanning mathematical analysis, combinatorics, geometry, and number theory. The system rediscovered the best known solutions in most of the cases and discovered improved solutions in several. In some instances, AlphaEvolve is also able to generalize results for a finite number of input values into a formula valid for all input values. Furthermore, we are able to combine this methodology with Deep Think and AlphaProof in a broader framework where the additional proof-assistants and reasoning systems provide automated proof generation and further mathematical insights. These results demonstrate that large language model-guided evolutionary search can autonomously discover mathematical constructions that complement human intuition, at times matching or even improving the best known results, highlighting the potential for significant new ways of interaction between mathematicians and AI systems. We present AlphaEvolve as a powerful new tool for mathematical discovery, capable of exploring vast search spaces to solve complex optimization problems at scale, often with significantly reduced requirements on preparation and computation time. 
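其"提出-测试-精炼"的进化搜索骨架可以抽象为如下通用循环。在实际系统中,变异由 LLM 改写候选程序、评分由自动评测器给出;此处以 OneMax 玩具问题和简单函数代替,仅示意循环结构。

```python
import random

random.seed(0)

def evolve(init, mutate, score, generations=200, pop_size=8):
    """极简进化循环:保留得分最高的一半个体,
    变异其随机成员补齐种群,循环若干代后返回最优解。"""
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]
    return max(population, key=score)

# 玩具问题 OneMax:让比特串中 1 的个数最大化
n = 20
best = evolve(
    init=lambda: [random.randint(0, 1) for _ in range(n)],
    mutate=lambda s: [b ^ (random.random() < 0.1) for b in s],
    score=sum,
)
print(sum(best))
```

精英保留(parents 直接进入下一代)保证最优得分单调不降;把 `mutate` 换成 LLM 对候选程序的改写、把 `score` 换成自动评测,就得到这类进化编码代理的基本形态。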
Comments: 80 pages, 35 figures Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Classical Analysis and ODEs (math.CA); Combinatorics (math.CO); Metric Geometry (math.MG) Cite as: arXiv:2511.02864 [cs.NE] (or arXiv:2511.02864v1 [cs.NE] for this version) https://doi.org/10.48550/arXiv.2511.02864 Focus to learn more arXiv-issued DOI via DataCite (pending registration)
zh
[AI-67] SELF-REDRAFT: Eliciting Intrinsic Exploration-Exploitation Balance in Test-Time Scaling for Code Generation
【速读】:该论文旨在解决在无测试用例反馈的场景下,大型语言模型(Large Language Models, LLMs)进行代码生成时如何有效平衡探索(exploration)与利用(exploitation)的问题。现有方法通常依赖于贪婪式利用(如迭代优化)或随机探索(如基于样本投票或重排序机制),但二者之间的权衡尚未被充分研究。论文提出SELF-REDRAFT框架,其关键在于在Self-Refine机制的基础上加以改进,鼓励模型对存在根本性缺陷的解决方案主动生成全新草稿(即“自我重写”),从而增强内在探索能力。实验表明,SELF-REDRAFT在相同最大迭代次数下收敛时性能优于Self-Refine,但仍受限于两个核心问题:生成指导性反馈的能力不足与判别能力脆弱,提示未来研究应聚焦于提升反馈质量和决策鲁棒性。
链接: https://arxiv.org/abs/2511.02854
作者: Yixiang Chen,Tianshi Zheng,Shijue Huang,Zhitao He,Yi R. Fung
机构: 未知
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
备注: 15 pages, 8 figures,2 tables
Abstract:Test-time scaling without interpreter feedback is essential for real-world code generation scenarios where test cases are not readily available. While existing paradigms often rely on either greedy exploitation (i.e., iterative refinement) or stochastic exploration (i.e., relying on sample-based voting or reranking mechanisms), the balance between these two dimensions remains underexplored. To investigate the LLM’s intrinsic ability to balance exploitation and exploration, we introduce SELF-REDRAFT, a framework built upon Self-Refine that encourages the model to propose new drafts for solutions that are fundamentally flawed. Our results show that SELF-REDRAFT consistently achieves better performance than Self-Refine when converged under the same maximum number of iterations. Still, we observe that significant room for improvement remains, largely due to two core aspects of current self-redraft capabilities: constrained capacity for generating instructive feedback and fragile discriminative judgment. We also find that balancing strategies vary notably across different LLMs, reflecting distinct, model-specific behaviors. Overall, our study establishes a baseline for intrinsic exploration-exploitation balancing in test-time scaling and identifies feedback and discrimination as key areas with potential for future advances.
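SELF-REDRAFT 的控制逻辑可以概括为:判定草稿"可接受 / 可修复 / 根本性缺陷",前者终止、中者细化(利用)、后者重写(探索)。下面是一个假设性的最小控制器示意(`generate`、`refine`、`judge` 均为玩具桩函数,非论文实现):

```python
import itertools

def self_redraft(generate, refine, judge, max_iters=5):
    """草稿可修复时细化;被判为根本性缺陷时重新起草(即相对 Self-Refine 新增的探索步)。"""
    draft = generate()
    history = []
    for _ in range(max_iters):
        verdict = judge(draft)          # 'accept' | 'fixable' | 'flawed'
        history.append((draft, verdict))
        if verdict == 'accept':
            break
        draft = refine(draft) if verdict == 'fixable' else generate()
    return draft, history

# 玩具任务:目标值 >= 10 即通过;小于 3 视为"根本性缺陷"
fresh = itertools.count(5)              # 每次重写从一个新的起点开始
final, history = self_redraft(
    generate=lambda: next(fresh),
    refine=lambda d: d + 4,
    judge=lambda d: 'accept' if d >= 10 else ('fixable' if d >= 3 else 'flawed'),
)
```

此例中草稿 5 被两次细化为 9、13 后通过;论文指出真实瓶颈正是 `judge` 与反馈质量。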
zh
[AI-68] Digital Transformation Chatbot (DTchatbot): Integrating Large Language Model-based Chatbot in Acquiring Digital Transformation Needs
【速读】:该论文试图解决传统数字转型需求调研方法(如专家访谈)中存在的调度冲突、资源限制和一致性差等问题,旨在通过引入基于大语言模型(Large Language Model, LLM)的聊天机器人来高效、一致地获取组织的数字化转型需求。解决方案的关键在于将工作流驱动的指令机制与LLM的规划与推理能力相结合,使聊天机器人能够扮演虚拟专家角色并执行结构化访谈,从而实现自动化、可扩展的需求采集过程。
链接: https://arxiv.org/abs/2511.02842
作者: Jiawei Zheng,Gokcen Yilmaz,Ji Han,Saeema Ahmed-Kristensen
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
备注: Accepted by the International Conference on Human-Computer Interaction
Abstract:Many organisations pursue digital transformation to enhance operational efficiency, reduce manual efforts, and optimise processes by automation and digital tools. To achieve this, a comprehensive understanding of their unique needs is required. However, traditional methods, such as expert interviews, while effective, face several challenges, including scheduling conflicts, resource constraints, inconsistency, etc. To tackle these issues, we investigate the use of a Large Language Model (LLM)-powered chatbot to acquire organisations’ digital transformation needs. Specifically, the chatbot integrates workflow-based instruction with LLM’s planning and reasoning capabilities, enabling it to function as a virtual expert and conduct interviews. We detail the chatbot’s features and its implementation. Our preliminary evaluation indicates that the chatbot performs as designed, effectively following predefined workflows and supporting user interactions with areas for improvement. We conclude by discussing the implications of employing chatbots to elicit user information, emphasizing their potential and limitations.

zh
[AI-69] Evaluating Generative AI as an Educational Tool for Radiology Resident Report Drafting
【速读】:该论文旨在解决放射科住院医师在临床实践中缺乏及时、个性化反馈的问题,这一问题限制了其影像分析与报告能力的提升。当前临床工作负荷增加,导致主治医师难以提供充分指导。研究提出了一种符合HIPAA合规要求的GPT-4o系统,作为自动化反馈工具,在真实临床环境中对住院医师撰写的乳腺影像报告进行错误识别与反馈生成。解决方案的关键在于:利用大语言模型(LLM)对5000份住院医师与主治医师报告配对数据进行训练和提示工程设计,精准识别三类常见错误(关键发现遗漏或添加、技术描述使用不当、最终评估与发现不一致),并通过读者研究验证其反馈的准确性与教育价值,结果显示GPT-4o在多数情况下能与主治医师达成高度一致(准确率90.5%–90.4%),且其反馈被评价为“有帮助”的比例高达83.0%–92.0%,表明其具备作为可扩展辅助教学工具的潜力。
链接: https://arxiv.org/abs/2511.02839
作者: Antonio Verdone,Aidan Cardall,Fardeen Siddiqui,Motaz Nashawaty,Danielle Rigau,Youngjoon Kwon,Mira Yousef,Shalin Patel,Alex Kieturakis,Eric Kim,Laura Heacock,Beatriu Reig,Yiqiu Shen
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
备注:
Abstract:Objective: Radiology residents require timely, personalized feedback to develop accurate image analysis and reporting skills. Increasing clinical workload often limits attendings’ ability to provide guidance. This study evaluates a HIPAA-compliant GPT-4o system that delivers automated feedback on breast imaging reports drafted by residents in real clinical settings. Methods: We analyzed 5,000 resident-attending report pairs from routine practice at a multi-site U.S. health system. GPT-4o was prompted with clinical instructions to identify common errors and provide feedback. A reader study using 100 report pairs was conducted. Four attending radiologists and four residents independently reviewed each pair, determined whether predefined error types were present, and rated GPT-4o’s feedback as helpful or not. Agreement between GPT and readers was assessed using percent match. Inter-reader reliability was measured with Krippendorff’s alpha. Educational value was measured as the proportion of cases rated helpful. Results: Three common error types were identified: (1) omission or addition of key findings, (2) incorrect use or omission of technical descriptors, and (3) final assessment inconsistent with findings. GPT-4o showed strong agreement with attending consensus: 90.5%, 78.3%, and 90.4% across error types. Inter-reader reliability showed moderate variability (α = 0.767, 0.595, 0.567), and replacing a human reader with GPT-4o did not significantly affect agreement (Δ = -0.004 to 0.002). GPT’s feedback was rated helpful in most cases: 89.8%, 83.0%, and 92.0%. Discussion: ChatGPT-4o can reliably identify key educational errors. It may serve as a scalable tool to support radiology education.
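该研究的两项核心指标(percent match 与"有帮助"比例)计算方式很直接。下面用假设性的玩具标签(非论文数据)示意:

```python
def percent_match(pred, ref):
    """模型的错误判定与参考判定一致的病例占比。"""
    assert len(pred) == len(ref)
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)

# 假设性标签:每个报告对上,1 = 存在该类错误,0 = 不存在
gpt       = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
attending = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
agreement = percent_match(gpt, attending)               # 9/10 一致

helpful_votes = [True, True, False, True, True]
helpful_rate = sum(helpful_votes) / len(helpful_votes)  # "有帮助"比例
```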
zh
[AI-70] An extended reality-based framework for user risk training in urban built environment
【速读】:该论文旨在解决城市风险加剧背景下,尤其是由气候变化引发的洪水灾害中,如何提升各类利益相关者(如市民、地方政府和应急响应人员)的风险意识与应对能力的问题。其解决方案的关键在于构建一个基于扩展现实(XR)技术的增强型风险培训框架,通过沉浸式模拟真实紧急场景,促进用户的主动参与和对潜在灾害(特别是洪水)的深入理解;同时,该框架强调利益相关方在开发过程中的深度参与,确保培训模块针对不同用户群体进行定制化设计,并采用迭代方法结合用户反馈与性能数据持续优化训练效果,从而显著提升城市风险培训的有效性与适应性。
链接: https://arxiv.org/abs/2511.02837
作者: Sotirios Konstantakos,Sotirios Asparagkathos,Moatasim Mahmoud,Stamatia Rizou,Enrico Quagliarini,Gabriele Bernardini
机构: 未知
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
备注:
Abstract:In the context of increasing urban risks, particularly from climate change-induced flooding, this paper presents an extended Reality (XR)-based framework to improve user risk training within urban built environments. The framework is designed to improve risk awareness and preparedness among various stakeholders, including citizens, local authorities, and emergency responders. Using immersive XR technologies, the training experience simulates real-world emergency scenarios, contributing to active participation and a deeper understanding of potential hazards and especially for floods. The framework highlights the importance of stakeholder participation in its development, ensuring that training modules are customized to address the specific needs of different user groups. The iterative approach of the framework supports ongoing refinement through user feedback and performance data, thus improving the overall effectiveness of risk training initiatives. This work outlines the methodological phases involved in the framework’s implementation, including i) user flow mapping, ii) scenario selection, and iii) performance evaluation, with a focus on the pilot application in Senigallia, Italy. The findings underscore the potential of XR technologies to transform urban risk training, promoting a culture of preparedness and resilience against urban hazards.
zh
[AI-71] Explaining Human Choice Probabilities with Simple Vector Representations
【速读】:该论文旨在解决人类在随机环境中追求奖励时,为何倾向于采用概率匹配(probability matching)策略——即个体选择频率与目标事件发生频率一致——即使这种策略并非最优的问题。研究通过“藏匿与寻找”任务设计,明确区分了追求(seeking)和回避(hiding)两种行为模式,并发现仅需两种基础策略:概率匹配/反匹配(antimatching)与最大化/最小化,即可充分解释参与者在不同选项数量和对手概率分布下的行为模式。其解决方案的关键在于将选择频率直方图建模为向量空间中的向量,并提出回避行为可通过概率匹配的向量反射实现;同时表明,仅需对历史结果相对频率的记忆能力,即可通过简单算术操作构造出所有四种策略,从而用混合策略模型准确拟合人类决策行为。
链接: https://arxiv.org/abs/2511.03643
作者: Peter DiBerardino,Britt Anderson
机构: 未知
类目: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)
备注:
Abstract:When people pursue rewards in stochastic environments, they often match their choice frequencies to the observed target frequencies, even when this policy is demonstrably sub-optimal. We used a ``hide and seek’’ task to evaluate this behavior under conditions where pursuit (seeking) could be toggled to avoidance (hiding), while leaving the probability distribution fixed, or varying complexity by changing the number of possible choices. We developed a model for participant choice built from choice frequency histograms treated as vectors. We posited the existence of a probability antimatching strategy for avoidance (hiding) rounds, and formalized this as a vector reflection of probability matching. We found that only two basis policies: matching/antimatching and maximizing/minimizing were sufficient to account for participant choices across a range of room numbers and opponent probability distributions. This schema requires only that people have the ability to remember the relative frequency of the different outcomes. With this knowledge simple operations can construct the maximizing and minimizing policies as well as matching and antimatching strategies. A mixture of these two policies captures human choice patterns in a stochastic environment.
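论文将选择频率直方图视为向量,并把"反匹配"形式化为概率匹配的向量反射。下面给出一种可行的数值示意(假设:以关于均匀分布的反射近似该反射构造,并非论文的精确定义):

```python
import numpy as np

def antimatch(p):
    """"向量反射"式反匹配的一种可行形式化(假设):
    将目标直方图关于均匀分布反射,再截断并归一化。"""
    u = np.full_like(p, 1.0 / len(p))
    q = np.clip(2 * u - p, 0.0, None)
    return q / q.sum()

def mixture(p, w):
    """概率匹配与最大化(对最优选项取 one-hot)两种基策略的混合。"""
    maximise = np.eye(len(p))[np.argmax(p)]
    return w * p + (1 - w) * maximise

p = np.array([0.5, 0.3, 0.2])   # 对手的藏匿频率
q = antimatch(p)                # 概率质量移向最不可能的选项
m = mixture(p, w=0.6)           # 60% 匹配 + 40% 最大化
```

反匹配将选择集中到对手最少使用的选项上,与"回避"行为的直觉一致。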
zh
[AI-72] Computational Imaging Meets LLM s: Zero-Shot IDH Mutation Prediction in Brain Gliomas
【速读】:该论文旨在解决脑胶质瘤中IDH突变状态的非侵入性、零样本预测问题,传统方法依赖于有创组织活检或需大量标注数据进行模型训练。其解决方案的关键在于将大语言模型(Large Language Models, LLMs)与计算图像分析相结合:通过多模态MRI扫描和肿瘤分割图提取可解释的语义特征(visual attributes)与定量特征,并以标准化JSON格式输入至GPT-4o和GPT-5进行无微调推理。实验表明,该框架在六个公开数据集(N=1427)上表现出高准确率和均衡分类性能,且无需人工标注;其中体积特征为最重要预测因子,辅以亚型特异性影像标志物及临床信息,验证了LLM推理能力与计算图像分析融合在精准无创肿瘤基因分型中的潜力。
链接: https://arxiv.org/abs/2511.03376
作者: Syed Muqeem Mahmood,Hassan Mohy-ud-Din
机构: 未知
类目: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
备注: 5 pages, 1 figure, 3 tables
Abstract:We present a framework that combines Large Language Models with computational image analytics for non-invasive, zero-shot prediction of IDH mutation status in brain gliomas. For each subject, coregistered multi-parametric MRI scans and multi-class tumor segmentation maps were processed to extract interpretable semantic (visual) attributes and quantitative features, serialized in a standardized JSON file, and used to query GPT 4o and GPT 5 without fine-tuning. We evaluated this framework on six publicly available datasets (N = 1427) and results showcased high accuracy and balanced classification performance across heterogeneous cohorts, even in the absence of manual annotations. GPT 5 outperformed GPT 4o in context-driven phenotype interpretation. Volumetric features emerged as the most important predictors, supplemented by subtype-specific imaging markers and clinical information. Our results demonstrate the potential of integrating LLM-based reasoning with computational image analytics for precise, non-invasive tumor genotyping, advancing diagnostic strategies in neuro-oncology. The code is available at this https URL.
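该框架将语义属性与定量特征序列化为标准化 JSON 后提交给 LLM。下面是一个假设性的载荷构造示意(字段名纯属举例,并非作者的实际 schema):

```python
import json

# 假设性的单个受试者特征载荷,模仿论文"语义属性 + 定量特征"的组织方式
subject = {
    "subject_id": "glioma_001",
    "semantic_attributes": {
        "enhancement_pattern": "rim-enhancing",
        "necrosis_present": True,
    },
    "quantitative_features": {
        "tumor_volume_cm3": 34.7,
        "edema_volume_cm3": 51.2,
    },
}
payload = json.dumps(subject, indent=2, sort_keys=True)
prompt = (
    "Given the following MRI-derived features, predict IDH mutation status "
    "(mutant or wildtype):\n" + payload
)
restored = json.loads(payload)   # 标准化格式保证可无损往返解析
```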
zh
[AI-73] Open Source State-Of-the-Art Solution for Romanian Speech Recognition
【速读】:该论文旨在解决罗马尼亚语自动语音识别(Automatic Speech Recognition, ASR)系统在准确性和实用性方面的瓶颈问题,尤其是在弱监督数据条件下实现高性能的语音转写。其解决方案的关键在于首次将 NVIDIA 的 FastConformer 架构应用于罗马尼亚语 ASR,并基于超过 2600 小时的弱监督语料进行训练;同时采用融合连接时序分类(Connectionist Temporal Classification, CTC)与词元持续时间转换器(Token-Duration Transducer, TDT)的混合解码器结构,结合多种解码策略(如贪婪搜索、ALSD 和带六元语法语言模型的 CTC 束搜索),显著提升了跨读诵、自发及领域特定语音场景下的识别准确率,相对词错误率(Word Error Rate, WER)降低最多达 27%,且保持了较低延迟,具备良好的实际部署效率。
链接: https://arxiv.org/abs/2511.03361
作者: Gabriel Pirlogeanu,Alexandru-Lucian Georgescu,Horia Cucu
机构: 未知
类目: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
备注: 13th Conference on Speech Technology and Human-Computer Dialogue (SpeD 2025), Cluj-Napoca, Romania
Abstract:In this work, we present a new state-of-the-art Romanian Automatic Speech Recognition (ASR) system based on NVIDIA’s FastConformer architecture–explored here for the first time in the context of Romanian. We train our model on a large corpus of, mostly, weakly supervised transcriptions, totaling over 2,600 hours of speech. Leveraging a hybrid decoder with both Connectionist Temporal Classification (CTC) and Token-Duration Transducer (TDT) branches, we evaluate a range of decoding strategies including greedy, ALSD, and CTC beam search with a 6-gram token-level language model. Our system achieves state-of-the-art performance across all Romanian evaluation benchmarks, including read, spontaneous, and domain-specific speech, with up to 27% relative WER reduction compared to previous best-performing systems. In addition to improved transcription accuracy, our approach demonstrates practical decoding efficiency, making it suitable for both research and deployment in low-latency ASR applications.
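论文报告的是相对 WER(词错误率)降低。WER 的标准定义是词级编辑距离除以参考词数,下面是一个自包含的计算示意(罗马尼亚语例句为虚构演示数据):

```python
def wer(reference, hypothesis):
    """基于词级 Levenshtein 距离的词错误率。"""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # 删除
                          d[i][j - 1] + 1,          # 插入
                          d[i - 1][j - 1] + cost)   # 替换
    return d[-1][-1] / len(ref)

ref = "ana are mere si pere"
baseline = wer(ref, "ana avea mere si para")   # 2 处替换 / 5 词
improved = wer(ref, "ana are mere si para")    # 1 处替换 / 5 词
relative_reduction = (baseline - improved) / baseline
```

此例中相对 WER 降低 50%;论文中的"up to 27% relative WER reduction"即按同样口径计算。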
zh
[AI-74] Optimizing Earth-Moon Transfer and Cislunar Navigation: Integrating Low-Energy Trajectories AI Techniques and GNSS-R Technologies
【速读】:该论文旨在解决深空探索中日益增长的月球轨道(cislunar)活动对高效轨道设计与可靠导航感知能力的需求问题,尤其针对传统地月转移轨道存在发射窗口受限、推进剂消耗高,以及地球基全球导航卫星系统(GNSS)在地球同步轨道以外几乎无覆盖的问题。其解决方案的关键在于综合评估四类主要转移策略的性能指标(如速度增量需求、飞行时长和燃料效率),并提出融合人工智能(AI)与机器学习(ML)技术的新方法:利用卷积神经网络(CNN)实现自动撞击坑识别与数字地形建模,通过深度强化学习(DRL)优化着陆阶段的自适应轨迹调整以降低风险与决策延迟;同时,结合GNSS反射测量(GNSS-Reflectometry, GNSS-R)与先进定位、导航与授时(PNT)架构,扩展导航能力至当前极限之外,支持自主交会对接、拉格朗日点轨道保持及卫星编队协同作业,从而构建可持续的月球轨道空间探索框架。
链接: https://arxiv.org/abs/2511.03173
作者: Arsalan Muhammad,Wasiu Akande Ahmed,Omada Friday Ojonugwa,Paul Puspendu Biswas
机构: 未知
类目: Earth and Planetary Astrophysics (astro-ph.EP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
备注:
Abstract:The rapid growth of cislunar activities, including lunar landings, the Lunar Gateway, and in-space refueling stations, requires advances in cost-efficient trajectory design and reliable integration of navigation and remote sensing. Traditional Earth-Moon transfers suffer from rigid launch windows and high propellant demands, while Earth-based GNSS systems provide little to no coverage beyond geostationary orbit. This limits autonomy and environmental awareness in cislunar space. This review compares four major transfer strategies by evaluating velocity requirements, flight durations, and fuel efficiency, and by identifying their suitability for both crewed and robotic missions. The emerging role of artificial intelligence and machine learning is highlighted: convolutional neural networks support automated crater recognition and digital terrain model generation, while deep reinforcement learning enables adaptive trajectory refinement during descent and landing to reduce risk and decision latency. The study also examines how GNSS-Reflectometry and advanced Positioning, Navigation, and Timing architectures can extend navigation capabilities beyond current limits. GNSS-R can act as a bistatic radar for mapping lunar ice, soil properties, and surface topography, while PNT systems support autonomous rendezvous, Lagrange point station-keeping, and coordinated satellite swarm operations. Combining these developments establishes a scalable framework for sustainable cislunar exploration and long-term human and robotic presence.
zh
[AI-75] Optimal Boundary Control of Diffusion on Graphs via Linear Programming
【速读】:该论文旨在解决几何网络上稳态扩散与通量优化问题,其核心目标是在保持几何保真度的前提下,通过线性规划(Linear Programming, LP)框架对网络中的扩散过程进行建模与优化。解决方案的关键在于:将状态变量定义在加权有向图上,并通过边长缩放导纳以维持几何一致性;边界电势作为控制输入驱动内部通量,遵循线性网络拉普拉斯算子关系;同时,在所有边界边上施加基于梯度界直接推导出的符号约束和通量上限约束,从而构建一个可行域为多面体的有限维LP问题。该方法保证了问题的有界性和可解性,且在缺乏负衰退方向(如存在有限盒约束、通量上限或符号限制时自动满足)条件下,可获得全局最小解。
链接: https://arxiv.org/abs/2511.03129
作者: Harbir Antil,Rainald Löhner,Felipe Pérez
机构: 未知
类目: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)
备注:
Abstract:We propose a linear programming (LP) framework for steady-state diffusion and flux optimization on geometric networks. The state variable satisfies a discrete diffusion law on a weighted, oriented graph, where conductances are scaled by edge lengths to preserve geometric fidelity. Boundary potentials act as controls that drive interior fluxes according to a linear network Laplacian. The optimization problem enforces physically meaningful sign and flux-cap constraints at all boundary edges, derived directly from a gradient bound. This yields a finite-dimensional LP whose feasible set is polyhedral, and whose boundedness and solvability follow from simple geometric or algebraic conditions on the network data. We prove that under the absence of negative recession directions–automatically satisfied in the presence of finite box bounds, flux caps, or sign restrictions–the LP admits a global minimizer. Several sufficient conditions guaranteeing boundedness of the feasible region are identified, covering both full-rank and rank-deficient flux maps. The analysis connects classical results such as the Minkowski–Weyl decomposition, Hoffman’s bound, and the fundamental theorem of linear programming with modern network-based diffusion modeling. Two large-scale examples illustrate the framework: (i) a typical large stadium in a major modern city, which forms a single connected component with relatively uniform corridor widths, and (ii) a complex street network emanating from a large, historical city center, which forms a multi-component system.
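框架的要点是:消去内部电势后,边界通量是边界电势的线性函数,于是带通量上限与箱式约束的优化问题就是一个有限维 LP。下面用一个三节点玩具图(数值为虚构,电导 = 1/边长)示意,求解器用 `scipy.optimize.linprog`:

```python
from scipy.optimize import linprog

# 两个边界节点经由一个内部节点相连;电导按边长缩放(玩具实例)
c1, c2 = 1.0 / 2.0, 1.0 / 4.0          # conductance = 1 / edge length
k = c1 * c2 / (c1 + c2)                # b1 -> b2 的有效电导
# 决策变量:边界电势 (u1, u2),箱式约束 [0, 1]
# 由 b1 流入内部的通量为 k*(u1 - u2);在通量上限下将其最大化
flux_cap = 0.1
res = linprog(
    c=[-k, k],                          # 最大化 k*(u1 - u2) == 最小化其相反数
    A_ub=[[k, -k]],                     # 通量上限:k*(u1 - u2) <= flux_cap
    b_ub=[flux_cap],
    bounds=[(0, 1), (0, 1)],
)
u1, u2 = res.x
flux = k * (u1 - u2)
```

由于无约束最优通量 k ≈ 0.167 超过上限,最优解在通量上限处取到,体现了"可行域为多面体、解在边界上"的 LP 结构。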
zh
[AI-76] EGMOF: Efficient Generation of Metal-Organic Frameworks Using a Hybrid Diffusion-Transformer Architecture
【速读】:该论文旨在解决材料逆向设计中面临的两大挑战:一是化学空间庞大导致的探索难度,二是属性标注数据稀缺限制了模型性能。现有生成式 AI 方法通常依赖大规模数据集,并且在目标属性变更时需重新训练,难以实现高效、灵活的材料设计。其解决方案的关键在于提出一种模块化、描述符驱动的混合框架 EGMOF(Efficient Generation of MOFs),将逆向设计分解为两个步骤:首先通过一维扩散模型 Prop2Desc 将目标属性映射为化学意义明确的描述符;随后利用 Transformer 模型 Desc2MOF 从这些描述符生成金属有机框架材料(MOF)结构。这种分步解耦的设计使得模型仅需少量数据即可保持高生成有效性与命中率,且无需针对每种新属性重新训练,从而实现了数据高效、通用性强的 MOF 逆向设计。
链接: https://arxiv.org/abs/2511.03122
作者: Seunghee Han,Yeonghun Kang,Taeun Bae,Varinia Bernales,Alan Aspuru-Guzik,Jihan Kim
机构: 未知
类目: Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:
Abstract:Designing materials with targeted properties remains challenging due to the vastness of chemical space and the scarcity of property-labeled data. While recent advances in generative models offer a promising way for inverse design, most approaches require large datasets and must be retrained for every new target property. Here, we introduce the EGMOF (Efficient Generation of MOFs), a hybrid diffusion-transformer framework that overcomes these limitations through a modular, descriptor-mediated workflow. EGMOF decomposes inverse design into two steps: (1) a one-dimensional diffusion model (Prop2Desc) that maps desired properties to chemically meaningful descriptors followed by (2) a transformer model (Desc2MOF) that generates structures from these descriptors. This modular hybrid design enables minimal retraining and maintains high accuracy even under small-data conditions. On a hydrogen uptake dataset, EGMOF achieved over 95% validity and 84% hit rate, representing significant improvements of up to 57% in validity and 14% in hit rate compared to existing methods, while remaining effective with only 1,000 training samples. Moreover, our model successfully performed conditional generation across 29 diverse property datasets, including CoREMOF, QMOF, and text-mined experimental datasets, whereas previous models have not. This work presents a data-efficient, generalizable approach to the inverse design of diverse MOFs and highlights the potential of modular inverse design workflows for broader materials discovery.
zh
[AI-77] From Narrow to Wide: Autoencoding Transformers for Ultrasound Bandwidth Recovery
【速读】:该论文旨在解决低成本窄带(narrow-band)超声探头在脉冲回波(pulse-echo ultrasound)成像中因频带宽度受限导致的脉冲展宽和高频细节丢失问题,从而限制了图像分辨率。其解决方案的关键在于提出一种基于数据驱动的映射方法,利用改进的Tiny Vision Transformer(ViT)自编码器模型,从窄带射频(RF)信号的谱图(spectrogram)中学习生成宽带谱图;该方法通过课程加权损失函数在仿真数据上训练,能够在异质散斑囊肿体模上将图像域均方误差(MSE)降低90%,峰值信噪比(PSNR)提升6.7 dB,结构相似性指数(SSIM)达到0.965,且在完全未见过的高分辨率体模中展现出强泛化能力,同时保持帧率和相位信息不变,证明仅通过软件升级即可使现有窄带探头实现接近宽带性能,具有在资源受限场景中推广高分辨率超声的潜力。
链接: https://arxiv.org/abs/2511.02938
作者: Sepideh KhakzadGharamaleki,Hassan Rivaz,Brandon Helfield
机构: 未知
类目: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)
备注:
Abstract:Conventional pulse-echo ultrasound suffers when low-cost probes deliver only narrow fractional bandwidths, elongating pulses and erasing high-frequency detail. We address this limitation by learning a data-driven mapping from band-limited to broadband spectrogram of radio-frequency (RF) lines. To this end, a variation of Tiny Vision Transform (ViT) auto-encoder is trained on simulation data using a curriculum-weighted loss. On heterogeneous speckle-cyst phantoms, the network reduces image-domain MSE by 90 percent, boosts PSNR by 6.7 dB, and raises SSIM to 0.965 compared with the narrow-band input. It also sharpens point-target rows in a completely unseen resolution phantom, demonstrating strong out-of-distribution generalisation without sacrificing frame rate or phase information. These results indicate that a purely software upgrade can endow installed narrow-band probes with broadband-like performance, potentially widening access to high-resolution ultrasound in resource-constrained settings.
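论文以图像域 MSE 与 PSNR 衡量重建增益。PSNR 的定义与"相对窄带输入提升 6.7 dB"的口径如下(示意用随机信号与加噪模拟,非论文数据):

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """取值范围 [0, peak] 的信号的峰值信噪比(dB)。"""
    mse = np.mean((reference - estimate) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
clean = rng.random((32, 32))                                  # 模拟宽带参考
noisy = np.clip(clean + rng.normal(0, 0.05, clean.shape), 0, 1)    # 模拟窄带输入
denoised = np.clip(clean + rng.normal(0, 0.02, clean.shape), 0, 1) # 模拟网络输出
gain_db = psnr(clean, denoised) - psnr(clean, noisy)          # 重建增益(dB)
```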
zh
[AI-78] NABench: Large-Scale Benchmarks of Nucleotide Foundation Models for Fitness Prediction
【速读】:该论文旨在解决当前核酸(DNA/RNA)功能适应度(fitness)预测模型在评估时缺乏统一、大规模且高质量基准数据集的问题,从而导致不同方法之间难以公平比较。其解决方案的关键在于构建NABench——一个涵盖162个高通量实验数据集、包含260万条突变序列的系统性基准平台,覆盖多样化的核酸家族,并采用标准化划分和丰富元数据,支持零样本、少样本、迁移学习和监督学习等多种场景下的统一评估,从而为核酸建模提供可复现的性能基线和科学依据。
链接: https://arxiv.org/abs/2511.02888
作者: Zhongmin Li,Runze Ma,Jiahao Tan,Chengzi Tan,Shuangjia Zheng
机构: 未知
类目: Genomics (q-bio.GN); Artificial Intelligence (cs.AI)
备注:
Abstract:Nucleotide sequence variation can induce significant shifts in functional fitness. Recent nucleotide foundation models promise to predict such fitness effects directly from sequence, yet heterogeneous datasets and inconsistent preprocessing make it difficult to compare methods fairly across DNA and RNA families. Here we introduce NABench, a large-scale, systematic benchmark for nucleic acid fitness prediction. NABench aggregates 162 high-throughput assays and curates 2.6 million mutated sequences spanning diverse DNA and RNA families, with standardized splits and rich metadata. We show that NABench surpasses prior nucleotide fitness benchmarks in scale, diversity, and data quality. Under a unified evaluation suite, we rigorously assess 29 representative foundation models across zero-shot, few-shot prediction, transfer learning, and supervised settings. The results quantify performance heterogeneity across tasks and nucleic-acid types, demonstrating clear strengths and failure modes for different modeling choices and establishing strong, reproducible baselines. We release NABench to advance nucleic acid modeling, supporting downstream applications in RNA/DNA design, synthetic biology, and biochemistry. Our code is available at this https URL.
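适应度预测基准通常以模型打分与实验适应度之间的 Spearman 秩相关作为零样本指标。下面是不依赖 SciPy 的最小实现(假设无并列值;示例数据为虚构的六个突变体):

```python
import numpy as np

def spearman(x, y):
    """Spearman 秩相关(假设无并列值),零样本适应度预测的常用指标。"""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))

# 虚构数据:六个变体的实验适应度 vs. 模型打分(打分与适应度单调一致)
fitness = np.array([0.1, 0.4, 0.35, 0.8, 0.55, 0.9])
scores  = np.array([-2.0, -1.1, -1.3, 0.4, -0.2, 1.0])
rho = spearman(fitness, scores)
```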
zh
[AI-79] Digitizing Spermatogenesis Lineage at Nanoscale Resolution In Tissue-Level Electron Microscopy
【速读】:该论文旨在解决在组织水平上对细胞内细胞器及其相互作用进行跨尺度定量分析的难题,尤其是在复杂生理和病理背景下精准解析细胞器空间分布与动态变化的需求。其解决方案的关键在于开发了一种基于轻量化Mask2Former框架的通用分割工具DeepOrganelle,该工具能够实现不同细胞类型中细胞器的自动分割与提取、统计定量分析,并可视化和量化细胞器形态及相互作用在组织尺度上的空间分布特征。通过该方法,研究团队系统揭示了生精上皮周期中膜接触位点(MCSs)的动态变化规律,特别是线粒体与内质网(Mito-ER)接触的时空梯度模式,为理解生殖细胞分化过程中能量代谢调控机制提供了新的实验证据。
链接: https://arxiv.org/abs/2511.02860
作者: Li Xiao,Liqing Liu,Hongjun Wu,Jiayi Zhong,Yan Zhang,Junjie Hu,Sun Fei,Ge Yang,Tao Xu
机构: 未知
类目: Biological Physics (physics.bio-ph); Artificial Intelligence (cs.AI)
备注: 19 pages,4 figures
Abstract:Recent advances in 2D large-scale and 3D volume electron microscopy have stimulated the rapid development of nanoscale functional analysis at the tissue and organ levels. Digitizing the cell by mapping the intricate organellar networks into its physiological and pathological textures will revolutionarize the contents of cell atlases. To meet the requirements of characterizing intracellular organelles and their interactions within defined cellular cohorts at tissue level, we have developed DeepOrganelle. It adopts a lightweighted Mask2Former frameworks as a universal segmentor and is capable of segmenting and extracting organelles within different cell types, performing statistical quantitative analysis, as well as visualizing and quantifying the spatial distribution of organelle morphologies and interactions across different cell types at tissue scales. Using DeepOrganelle, we systemically perform cross-scale quantification of membrane contact sites(MCSs) dynamics across the progression of the seminiferous epithelial cycle, covering 12 distinct developmental stages and 24 statuses of germ cells. DeepOrganelle uncovers the spatiotemporal gradient of the germ cell differentiation atlas according to different types of organelles and their interactions. Noticeably, it discovers a waved pattern of mitochondria(Mito)-endoplasmic reticulum(ER) contact with a significant increase specifically at Stage X pachytene preceding the transition to diplotene, which aligns well with a newly reported experiment that mitochondrial metabolic proteins like PDHA2 are essential for this transition by maintaining ATP supply for double-strand break(DSB) repair. DeepOrganelle also observes a dynamic restructuring of the blood-testis barrier and stage-specific reorganization of organelle topography in Sertoli cells from preleptotene to leptotene phases of prophase I.
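膜接触位点(MCSs)的定量通常可归结为:统计一种细胞器分割掩膜在另一种掩膜一定距离内的像素数。下面是一个仅用 NumPy 的简化示意(以 4-邻域膨胀近似距离阈值;与 DeepOrganelle 的实际实现无关):

```python
import numpy as np

def dilate(mask):
    """基于平移的 4-邻域二值膨胀(无需 SciPy)。"""
    out = mask.copy()
    for axis in (0, 1):
        for shift in (1, -1):
            out |= np.roll(mask, shift, axis=axis)
    return out

def contact_area(mask_a, mask_b, max_dist=1):
    """mask_b 中落在 mask_a 周围 max_dist 像素内的像素数:
    膜接触位点(MCS)定量的一个简化替身。"""
    grown = mask_a
    for _ in range(max_dist):
        grown = dilate(grown)
    return int(np.logical_and(grown, mask_b).sum())

mito = np.zeros((8, 8), dtype=bool); mito[2:4, 2:4] = True   # 线粒体掩膜
er   = np.zeros((8, 8), dtype=bool); er[2:4, 4:6] = True     # 相邻的 ER 掩膜
mcs = contact_area(mito, er)
```

注意 `np.roll` 会环绕边界,故仅适用于对象位于图像内部的情形;真实流水线应使用带边界处理的形态学膨胀。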
zh
[AI-80] Consciousness-ECG Transformer for Conscious State Estimation System with Real-Time Monitoring
【速读】:该论文旨在解决传统意识状态估计方法依赖脑电图(electroencephalography, EEG)所面临的噪声敏感性强和环境限制严格的问题,从而在睡眠分期和麻醉管理等临床场景中实现更可靠、非侵入式的意识状态监测。其解决方案的关键在于提出了一种基于心电图(electrocardiography, ECG)信号的意识状态估计模型——consciousness-ECG transformer,该模型采用解耦查询注意力机制(decoupled query attention),有效捕捉心率变异性(heart rate variability, HRV)特征以区分清醒与无意识状态,实现了高精度的实时监测性能,在睡眠分期和麻醉深度评估任务中分别达到0.877和0.880的准确率以及0.786和0.895的AUC值,展现出优于基线模型的鲁棒性和实用性。
链接: https://arxiv.org/abs/2511.02853
作者: Young-Seok Kweon,Gi-Hwan Shin,Ji-Yong Kim,Bokyeong Ryu,Seong-Whan Lee
机构: 未知
类目: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 30 pages, 8 figures
Abstract:Conscious state estimation is important in various medical settings, including sleep staging and anesthesia management, to ensure patient safety and optimize health outcomes. Traditional methods predominantly utilize electroencephalography (EEG), which faces challenges such as high sensitivity to noise and the requirement for controlled environments. In this study, we propose the consciousness-ECG transformer that leverages electrocardiography (ECG) signals for non-invasive and reliable conscious state estimation. Our approach employs a transformer with decoupled query attention to effectively capture heart rate variability features that distinguish between conscious and unconscious states. We implemented the conscious state estimation system with real-time monitoring and validated our system on datasets involving sleep staging and anesthesia level monitoring during surgeries. Experimental results demonstrate that our model outperforms baseline models, achieving accuracies of 0.877 on sleep staging and 0.880 on anesthesia level monitoring. Moreover, our model achieves the highest area under curve values of 0.786 and 0.895 on sleep staging and anesthesia level monitoring, respectively. The proposed system offers a practical and robust alternative to EEG-based methods, particularly suited for dynamic clinical environments. Our results highlight the potential of ECG-based consciousness monitoring to enhance patient safety and advance our understanding of conscious states.
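模型的输入是从 ECG 提取的心率变异性(HRV)特征。经典的时域 HRV 特征计算如下(RR 间期示例数据为虚构):

```python
import numpy as np

def hrv_features(rr_ms):
    """由 RR 间期(毫秒)计算经典时域 HRV 特征。"""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)
    return {
        "mean_hr_bpm": 60000.0 / rr.mean(),
        "sdnn": rr.std(ddof=1),                      # 总体变异性
        "rmssd": np.sqrt(np.mean(diff ** 2)),        # 逐搏变异性
        "pnn50": np.mean(np.abs(diff) > 50) * 100,   # 相邻差 > 50 ms 的百分比
    }

rr = [812, 790, 830, 805, 870, 795, 815]   # 虚构的 7 个 RR 间期
feats = hrv_features(rr)
```

此类特征即为区分清醒与无意识状态的输入表示;transformer 的解耦查询注意力在此之上学习状态判别。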
zh
[AI-81] Approaching Low-Cost Cardiac Intelligence with Semi-Supervised Knowledge Distillation
【速读】:该论文旨在解决低成本心脏智能(Low-cost cardiac intelligence, LCCI)系统在诊断性能上显著落后于高成本心脏智能(High-cost cardiac intelligence, HCCI)系统的问题,尤其是在依赖可穿戴设备(如单导联心电图,1-lead ECG)进行日常心脏监测时。解决方案的关键在于提出 LiteHeart,一个半监督知识蒸馏框架:其核心创新包括两个模块——区域感知蒸馏模块(region-aware distillation module),模拟心脏病专家对诊断相关心电图区域的关注机制;以及跨层互信息模块(cross-layer mutual information module),用于对齐 LCCI 与 HCCI 系统的决策过程。此外,采用半监督训练策略进一步提升模型在有限标注数据下的鲁棒性。实验表明,LiteHeart 在五个涵盖超过38种心血管疾病的数据库上显著缩小了 LCCI 与 HCCI 的性能差距,宏 F1 分数提升达 4.27% 至 7.10%,验证了其在提升低成本心脏智能系统诊断能力方面的有效性。
链接: https://arxiv.org/abs/2511.02851
作者: Rushuang Zhou,Yuan-Ting Zhang,M.Jamal Deen,Yining Dong
机构: 未知
类目: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:
Abstract:Deploying advanced cardiac artificial intelligence for daily cardiac monitoring is hindered by its reliance on extensive medical data and high computational resources. Low-cost cardiac intelligence (LCCI) offers a promising alternative by using wearable device data, such as 1-lead electrocardiogram (ECG), but it suffers from a significant diagnostic performance gap compared to high-cost cardiac intelligence (HCCI). To bridge this gap, we propose LiteHeart, a semi-supervised knowledge distillation framework. LiteHeart introduces a region-aware distillation module to mimic how cardiologists focus on diagnostically relevant ECG regions and a cross-layer mutual information module to align the decision processes of LCCI and HCCI systems. Using a semi-supervised training strategy, LiteHeart further improves model robustness under limited supervision. Evaluated on five datasets covering over 38 cardiovascular diseases, LiteHeart substantially reduces the performance gap between LCCI and HCCI, outperforming existing methods by 4.27% to 7.10% in macro F1 score. These results demonstrate that LiteHeart significantly enhances the diagnostic capabilities of low-cost cardiac intelligence systems, paving the way for scalable, affordable, and accurate daily cardiac healthcare using wearable technologies.
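知识蒸馏的基本形式是让学生(LCCI)的软化输出逼近教师(HCCI)。下面给出通用的 Hinton 式软目标损失示意(仅为通用替身,LiteHeart 的区域感知与跨层互信息项未在此复现):

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)   # 数值稳定
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Hinton 式软目标损失:T^2 * KL(teacher || student)。"""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T ** 2 * np.sum(p * (np.log(p) - np.log(q))))

teacher = [4.0, 1.0, -1.0]                            # 教师(HCCI)logits
loss_far  = distill_loss([0.0, 0.0, 0.0], teacher)    # 学生未对齐教师
loss_near = distill_loss([3.9, 1.1, -0.9], teacher)   # 学生已接近教师
```

学生越接近教师的决策分布,损失越小;区域感知项可理解为在此损失上按诊断相关 ECG 区段加权。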
zh
[AI-82] EEGReXferNet: A Lightweight Gen-AI Framework for EEG Subspace Reconstruction via Cross-Subject Transfer Learning and Channel-Aware Embedding NEURIPS2025
【速读】:该论文旨在解决脑电图(EEG)信号因多种伪影导致信噪比(SNR)低的问题,传统去伪影方法常需人工干预或在滤波/重构过程中抑制关键神经特征。为克服现有生成式 AI(Generative AI)模型如变分自编码器(VAE)和生成对抗网络(GAN)在时间-频谱-空间敏感性整合不足且计算复杂度高的局限,本文提出轻量级生成式 AI 框架 EEGReXferNet,其核心在于通过跨被试迁移学习实现 EEG 子空间重建:采用模块化架构融合邻近通道的体积传导特性、带通卷积编码与滑动窗口动态潜在特征提取,并引入基于参考的缩放机制以保障时序连续性及跨被试泛化能力,从而显著提升时空频分辨率(平均功率谱密度相关系数=0.95;频谱 RV 系数=0.85),减少约45%参数量以缓解过拟合,同时保持实时处理效率,适用于神经生理学与脑机接口(BCI)场景下的可靠预处理。
链接: https://arxiv.org/abs/2511.02848
作者: Shantanu Sarkar,Piotr Nabrzyski,Saurabh Prasad,Jose Luis Contreras-Vidal
机构: 未知
类目: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: Accepted for presentation at the NeurIPS 2025 Workshop on Foundation Models for the Brain and Body
Abstract:Electroencephalography (EEG) is a widely used non-invasive technique for monitoring brain activity, but low signal-to-noise ratios (SNR) due to various artifacts often compromise its utility. Conventional artifact removal methods require manual intervention or risk suppressing critical neural features during filtering/reconstruction. Recent advances in generative models, including Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), have shown promise for EEG reconstruction; however, these approaches often lack integrated temporal-spectral-spatial sensitivity and are computationally intensive, limiting their suitability for real-time applications like brain-computer interfaces (BCIs). To overcome these challenges, we introduce EEGReXferNet, a lightweight Gen-AI framework for EEG subspace reconstruction via cross-subject transfer learning - developed using Keras TensorFlow (v2.15.1). EEGReXferNet employs a modular architecture that leverages volume conduction across neighboring channels, band-specific convolution encoding, and dynamic latent feature extraction through sliding windows. By integrating reference-based scaling, the framework ensures continuity across successive windows and generalizes effectively across subjects. This design improves spatial-temporal-spectral resolution (mean PSD correlation = 0.95; mean spectrogram RV-Coefficient = 0.85), reduces total weights by ~45% to mitigate overfitting, and maintains computational efficiency for robust, real-time EEG preprocessing in neurophysiological and BCI applications.
zh
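EEGReXferNet 通过滑动窗口提取动态潜在特征,并用基于参考的缩放保证相邻窗口间的连续性。下面是滑动窗口切分加参考缩放的极简示意(缩放方式为演示假设,非论文原始设计):

```python
# 假设性示意:EEG 序列滑动窗口切分 + 以首窗峰值为参考的统一缩放
def sliding_windows(signal, win, stride):
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, stride)]

def reference_scale(windows):
    """以首个窗口的峰值幅度为参考,对所有窗口统一缩放,避免窗口间尺度跳变"""
    ref = max(abs(v) for v in windows[0]) or 1.0
    return [[v / ref for v in w] for w in windows]

wins = sliding_windows([0, 2, 4, 2, 0, -2, -4, -2], win=4, stride=2)
# [[0, 2, 4, 2], [4, 2, 0, -2], [0, -2, -4, -2]]
```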
[AI-83] Spatio-Temporal Attention Network for Epileptic Seizure Prediction
【速读】:该论文旨在解决癫痫患者发作前状态(preictal state)的精准预测问题,以实现早期干预和临床应用。传统方法依赖于人工特征工程或假设固定的发作前期时长,限制了模型的泛化能力和个体适应性。其解决方案的关键在于提出一种基于时空注意力网络(Spatio-Temporal Attention Network, STAN)的深度学习框架,该框架能够自动学习脑电图(EEG)信号中复杂的时空相关性,并通过对抗判别器区分发作前与发作间期的注意力模式,从而实现患者特异性建模。实验表明,该方法在CHB-MIT和MSSM数据集上分别达到96.6%敏感度(FDR=0.011/h)和94.2%敏感度(FDR=0.063/h),且能在发作前至少15分钟可靠检测到前驱状态,最长可达45分钟,为临床干预提供了充足时间窗口。
链接: https://arxiv.org/abs/2511.02846
作者: Zan Li,Kyongmin Yeo,Wesley Gifford,Lara Marcuse,Madeline Fields,Bülent Yener
机构: 未知
类目: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:
Abstract:In this study, we present a deep learning framework that learns complex spatio-temporal correlation structures of EEG signals through a Spatio-Temporal Attention Network (STAN) for accurate predictions of onset of seizures for Epilepsy patients. Unlike existing methods, which rely on feature engineering and/or assume fixed preictal durations, our approach simultaneously models spatio-temporal correlations through STAN and employs an adversarial discriminator to distinguish preictal from interictal attention patterns, enabling patient-specific learning. Evaluation on CHB-MIT and MSSM datasets demonstrates 96.6% sensitivity with 0.011/h false detection rate on CHB-MIT, and 94.2% sensitivity with 0.063/h FDR on MSSM, significantly outperforming state-of-the-art methods. The framework reliably detects preictal states at least 15 minutes before an onset, with patient-specific windows extending to 45 minutes, providing sufficient intervention time for clinical applications.
zh
[AI-84] AI-Enhanced Wi-Fi Sensing Through Single Transceiver Pair
【速读】:该论文旨在解决下一代Wi-Fi感知系统在硬件资源受限条件下如何实现高精度感知的问题,尤其是在天线数量和带宽受限的情况下,传统雷达理论所设定的分辨率限制难以突破。其解决方案的关键在于揭示并利用人工智能(AI)在Wi-Fi感知中的两大核心优势:一是先验信息(prior information),使AI能够基于模糊输入生成合理的细节;二是时序相关性(temporal correlation),可有效降低感知误差的上限。作者通过构建仅使用单对收发器的AI驱动Wi-Fi感知系统,并在人体姿态估计与室内定位任务中验证了上述理论,实验证明了这两项机制对性能提升的决定性作用。
链接: https://arxiv.org/abs/2511.02845
作者: Yuxuan Liu,Chiya Zhang,Yifeng Yuan,Chunlong He,Weizheng Zhang,Gaojie Chen
机构: 未知
类目: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Instrumentation and Detectors (physics.ins-det)
备注: 12 pages, 11 figures
Abstract:The advancement of next-generation Wi-Fi technology heavily relies on sensing capabilities, which play a pivotal role in enabling sophisticated applications. In response to the growing demand for large-scale deployments, contemporary Wi-Fi sensing systems strive to achieve high-precision perception while maintaining minimal bandwidth consumption and antenna count requirements. Remarkably, various AI-driven perception technologies have demonstrated the ability to surpass the traditional resolution limitations imposed by radar theory. However, the theoretical underpinnings of this phenomenon have not been thoroughly investigated in existing research. In this study, we found that under hardware-constrained conditions, the performance gains brought by AI to Wi-Fi sensing systems primarily originate from two aspects: prior information and temporal correlation. Prior information enables the AI to generate plausible details based on vague input, while temporal correlation helps reduce the upper bound of sensing error. We developed an AI-based Wi-Fi sensing system using a single transceiver pair and designed experiments focusing on human pose estimation and indoor localization to validate the theoretical claims. The results confirm the performance gains contributed by temporal correlation and prior information.
zh
机器学习
[LG-0] Shrinking the Variance: Shrinkage Baselines for Reinforcement Learning with Verifiable Rewards
链接: https://arxiv.org/abs/2511.03710
作者: Guanning Zeng,Zhaoyi Zhou,Daman Arora,Andrea Zanette
类目: Machine Learning (cs.LG)
*备注: Preprint. Under Review
Abstract:Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for post-training large reasoning models (LRMs) using policy-gradient methods such as GRPO. To stabilize training, these methods typically center trajectory rewards by subtracting the empirical mean for each prompt. Statistically, this centering acts as a control variate (or baseline), reducing the variance of the policy-gradient estimator. Typically, the mean reward is estimated using per-prompt empirical averages for each prompt in a batch. Drawing inspiration from Stein’s paradox, we propose using shrinkage estimators that combine per-prompt and across-prompt means to improve the overall per-prompt mean estimation accuracy – particularly in the low-generation regime typical of RLVR. Theoretically, we construct a shrinkage-based baseline that provably yields lower-variance policy-gradient estimators across algorithms. Our proposed baseline serves as a drop-in replacement for existing per-prompt mean baselines, requiring no additional hyper-parameters or computation. Empirically, shrinkage baselines consistently outperform standard empirical-mean baselines, leading to lower-variance gradient updates and improved training stability.
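收缩估计的基本思路可用如下极简草图示意(假设性实现:用固定收缩系数 lam 将各 prompt 的经验均值向跨 prompt 全局均值收缩;论文中的系数由理论推导给出,此处仅作演示):

```python
# 假设性示意:收缩基线(shrinkage baseline),低生成数下向全局均值"借力"
def shrinkage_baselines(rewards_per_prompt, lam=0.5):
    """rewards_per_prompt: {prompt_id: [r1, r2, ...]},每个 prompt 仅少量生成"""
    all_rewards = [r for rs in rewards_per_prompt.values() for r in rs]
    global_mean = sum(all_rewards) / len(all_rewards)
    baselines = {}
    for pid, rs in rewards_per_prompt.items():
        local_mean = sum(rs) / len(rs)
        # 收缩:组合局部与全局均值,降低每 prompt 基线估计的方差
        baselines[pid] = lam * local_mean + (1 - lam) * global_mean
    return baselines

b = shrinkage_baselines({"p1": [1.0, 0.0], "p2": [1.0, 1.0]}, lam=0.5)
# p1 局部均值 0.5、全局均值 0.75,收缩后基线为 0.625
```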
[LG-1] Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL
链接: https://arxiv.org/abs/2511.03695
作者: Lipeng Zu,Hansong Zhou,Xiaonan Zhang
类目: Machine Learning (cs.LG)
*备注:
Abstract:Offline reinforcement learning (RL) enables training from fixed data without online interaction, but policies learned offline often struggle when deployed in dynamic environments due to distributional shift and unreliable value estimates on unseen state-action pairs. We introduce Behavior-Adaptive Q-Learning (BAQ), a framework designed to enable a smooth and reliable transition from offline to online RL. The key idea is to leverage an implicit behavioral model derived from offline data to provide a behavior-consistency signal during online fine-tuning. BAQ incorporates a dual-objective loss that (i) aligns the online policy toward the offline behavior when uncertainty is high, and (ii) gradually relaxes this constraint as more confident online experience is accumulated. This adaptive mechanism reduces error propagation from out-of-distribution estimates, stabilizes early online updates, and accelerates adaptation to new scenarios. Across standard benchmarks, BAQ consistently outperforms prior offline-to-online RL approaches, achieving faster recovery, improved robustness, and higher overall performance. Our results demonstrate that implicit behavior adaptation is a principled and practical solution for reliable real-world policy deployment.
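BAQ 的双目标损失可用如下标量草图示意(假设性写法:不确定性高时加大行为一致性惩罚,随在线经验积累该约束逐步放松;权重函数的具体形式与论文可能不同):

```python
# 假设性示意:BAQ 风格双目标损失 = TD 误差项 + 随不确定性加权的行为一致性项
def baq_loss(td_error, policy_action_value, behavior_action_value, uncertainty):
    """uncertainty ∈ [0, 1]:对当前状态-动作价值估计的不确定程度(抽象为标量)"""
    td_loss = td_error ** 2
    # 行为一致性:在线策略的动作价值偏离离线行为模型越远,惩罚越大
    consistency = (policy_action_value - behavior_action_value) ** 2
    # 不确定性高 → 更信任离线行为;不确定性低 → 约束放松
    return td_loss + uncertainty * consistency
```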
[LG-2] SHIELD: Securing Healthcare IoT with Efficient Machine Learning Techniques for Anomaly Detection
链接: https://arxiv.org/abs/2511.03661
作者: Mahek Desai,Apoorva Rumale,Marjan Asadinia
类目: Machine Learning (cs.LG)
*备注:
Abstract:The integration of IoT devices in healthcare introduces significant security and reliability challenges, increasing susceptibility to cyber threats and operational anomalies. This study proposes a machine learning-driven framework for (1) detecting malicious cyberattacks and (2) identifying faulty device anomalies, leveraging a dataset of 200,000 records. Eight machine learning models are evaluated across three learning approaches: supervised learning (XGBoost, K-Nearest Neighbors (K- NN)), semi-supervised learning (Generative Adversarial Networks (GAN), Variational Autoencoders (VAE)), and unsupervised learning (One-Class Support Vector Machine (SVM), Isolation Forest, Graph Neural Networks (GNN), and Long Short-Term Memory (LSTM) Autoencoders). The comprehensive evaluation was conducted across multiple metrics like F1-score, precision, recall, accuracy, ROC-AUC, computational efficiency. XGBoost achieved 99% accuracy with minimal computational overhead (0.04s) for anomaly detection, while Isolation Forest balanced precision and recall effectively. LSTM Autoencoders underperformed with lower accuracy and higher latency. For attack detection, KNN achieved near-perfect precision, recall, and F1-score with the lowest computational cost (0.05s), followed by VAE at 97% accuracy. GAN showed the highest computational cost with lowest accuracy and ROC-AUC. These findings enhance IoT-enabled healthcare security through effective anomaly detection strategies. By improving early detection of cyber threats and device failures, this framework has the potential to prevent data breaches, minimize system downtime, and ensure the continuous and safe operation of medical devices, ultimately safeguarding patient health and trust in IoT-driven healthcare solutions.
[LG-3] Efficient Testing Implies Structured Symmetry
链接: https://arxiv.org/abs/2511.03653
作者: Cynthia Dwork,Pranay Tankala
类目: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
*备注:
Abstract:Given a small random sample of n -bit strings labeled by an unknown Boolean function, which properties of this function can be tested computationally efficiently? We show an equivalence between properties that are efficiently testable from few samples and properties with structured symmetry, which depend only on the function’s average values on parts of a low-complexity partition of the domain. Without the efficiency constraint, a similar characterization in terms of unstructured symmetry was obtained by Blais and Yoshida (2019). Our main technical tool is supersimulation, which builds on methods from the algorithmic fairness literature to approximate arbitrarily complex functions by small-circuit simulators that fool significantly larger distinguishers. We extend the characterization along other axes as well. We show that allowing parts to overlap exponentially reduces their required number, broadening the scope of the construction from properties testable with O(\log n) samples to properties testable with O(n) samples. For larger sample sizes, we show that any efficient tester is essentially checking for indistinguishability from a bounded collection of small circuits, in the spirit of a characterization of testable graph properties. Finally, we show that our results for Boolean function testing generalize to high-entropy distribution testing on arbitrary domains.
[LG-4] nanoTabPFN: A Lightweight and Educational Reimplementation of TabPFN
链接: https://arxiv.org/abs/2511.03634
作者: Alexander Pfefferle,Johannes Hog,Lennart Purucker,Frank Hutter
类目: Machine Learning (cs.LG)
*备注:
Abstract:Tabular foundation models such as TabPFN have revolutionized predictive machine learning for tabular data. At the same time, the driving factors of this revolution are hard to understand. Existing open-source tabular foundation models are implemented in complicated pipelines boasting over 10,000 lines of code, lack architecture documentation or code quality. In short, the implementations are hard to understand, not beginner-friendly, and complicated to adapt for new experiments. We introduce nanoTabPFN, a simplified and lightweight implementation of the TabPFN v2 architecture and a corresponding training loop that uses pre-generated training data. nanoTabPFN makes tabular foundation models more accessible to students and researchers alike. For example, restricted to a small data setting it achieves a performance comparable to traditional machine learning baselines within one minute of pre-training on a single GPU (160,000x faster than TabPFN v2 pretraining). This eliminated requirement of large computational resources makes pre-training tabular foundation models accessible for educational purposes. Our code is available at this https URL.
[LG-5] Neural Beamforming with Doppler-Aware Sparse Attention for High Mobility Environments
链接: https://arxiv.org/abs/2511.03632
作者: Cemil Vahapoglu,Timothy J. O’Shea,Wan Liu,Sennur Ulukus
类目: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
*备注:
Abstract:Beamforming has significance for enhancing spectral efficiency and mitigating interference in multi-antenna wireless systems, facilitating spatial multiplexing and diversity in dense and high mobility scenarios. Traditional beamforming techniques such as zero-forcing beamforming (ZFBF) and minimum mean square error (MMSE) beamforming experience performance deterioration under adverse channel conditions. Deep learning-based beamforming offers an alternative with nonlinear mappings from channel state information (CSI) to beamforming weights by improving robustness against dynamic channel environments. Transformer-based models are particularly effective due to their ability to model long-range dependencies across time and frequency. However, their quadratic attention complexity limits scalability in large OFDM grids. Recent studies address this issue through sparse attention mechanisms that reduce complexity while maintaining expressiveness, yet often employ patterns that disregard channel dynamics, as they are not specifically designed for wireless communication scenarios. In this work, we propose a Doppler-aware Sparse Neural Network Beamforming (Doppler-aware Sparse NNBF) model that incorporates a channel-adaptive sparse attention mechanism in a multi-user single-input multiple-output (MU-SIMO) setting. The proposed sparsity structure is configurable along 2D time-frequency axes based on channel dynamics and is theoretically proven to ensure full connectivity within p hops, where p is the number of attention heads. Simulation results under urban macro (UMa) channel conditions show that Doppler-aware Sparse NNBF significantly outperforms both a fixed-pattern baseline, referred to as Standard Sparse NNBF, and conventional beamforming techniques ZFBF and MMSE beamforming in high mobility scenarios, while maintaining structured sparsity with a controlled number of attended keys per query.
[LG-6] Financial Management System for SMEs: Real-World Deployment of Accounts Receivable and Cash Flow Prediction
链接: https://arxiv.org/abs/2511.03631
作者: Bartłomiej Małkus,Szymon Bobek,Grzegorz J. Nalepa
类目: Machine Learning (cs.LG)
*备注: 11 pages, 1 figure
Abstract:Small and Medium Enterprises (SMEs), particularly freelancers and early-stage businesses, face unique financial management challenges due to limited resources, small customer bases, and constrained data availability. This paper presents the development and deployment of an integrated financial prediction system that combines accounts receivable prediction and cash flow forecasting specifically designed for SME operational constraints. Our system addresses the gap between enterprise-focused financial tools and the practical needs of freelancers and small businesses. The solution integrates two key components: a binary classification model for predicting invoice payment delays, and a multi-module cash flow forecasting model that handles incomplete and limited historical data. A prototype system has been implemented and deployed as a web application with integration into Cluee’s platform, a startup providing financial management tools for freelancers, demonstrating practical feasibility for real-world SME financial management.
[LG-7] CLAX: Fast and Flexible Neural Click Models in JAX
链接: https://arxiv.org/abs/2511.03620
作者: Philipp Hager,Onno Zoeter,Maarten de Rijke
类目: Information Retrieval (cs.IR); Machine Learning (cs.LG); Software Engineering (cs.SE)
*备注:
Abstract:CLAX is a JAX-based library that implements classic click models using modern gradient-based optimization. While neural click models have emerged over the past decade, complex click models based on probabilistic graphical models (PGMs) have not systematically adopted gradient-based optimization, preventing practitioners from leveraging modern deep learning frameworks while preserving the interpretability of classic models. CLAX addresses this gap by replacing EM-based optimization with direct gradient-based optimization in a numerically stable manner. The framework’s modular design enables the integration of any component, from embeddings and deep networks to custom modules, into classic click models for end-to-end optimization. We demonstrate CLAX’s efficiency by running experiments on the full Baidu-ULTR dataset comprising over a billion user sessions in \approx 2 hours on a single GPU, orders of magnitude faster than traditional EM approaches. CLAX implements ten classic click models, serving both industry practitioners seeking to understand user behavior and improve ranking performance at scale and researchers developing new click models. CLAX is available at: this https URL
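CLAX 的核心思想是用梯度法(而非 EM)直接优化经典点击模型。下面用纯 Python 给出位置基模型(PBM)的一个极简草图:P(click | d, k) = attractiveness(d) × examination(k),并用数值梯度做下降(演示假设:CLAX 本身基于 JAX 的解析梯度,此处仅示意"经典 PGM 点击模型 + 梯度优化"这一组合):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nll(params, clicks):
    """clicks: (doc, rank, clicked) 三元组列表;params 为 logit,经 sigmoid 约束到 (0,1)"""
    total = 0.0
    for d, k, c in clicks:
        p = sigmoid(params[("a", d)]) * sigmoid(params[("g", k)])
        total -= math.log(p if c else 1.0 - p)
    return total

def grad_step(params, clicks, lr=0.1, eps=1e-5):
    # 数值梯度(中心差分),仅为演示;实际框架用自动微分
    grads = {}
    for key in params:
        params[key] += eps
        hi = nll(params, clicks)
        params[key] -= 2 * eps
        lo = nll(params, clicks)
        params[key] += eps
        grads[key] = (hi - lo) / (2 * eps)
    return {k: v - lr * grads[k] for k, v in params.items()}

clicks = [("d1", 1, 1), ("d1", 1, 1), ("d1", 2, 0)]
init = {("a", "d1"): 0.0, ("g", 1): 0.0, ("g", 2): 0.0}
params = dict(init)
for _ in range(200):
    params = grad_step(params, clicks)
# 负对数似然随迭代下降
```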
[LG-8] Towards Formalizing Reinforcement Learning Theory
链接: https://arxiv.org/abs/2511.03618
作者: Shangtong Zhang
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:
Abstract:In this paper, we formalize the almost sure convergence of Q -learning and linear temporal difference (TD) learning with Markovian samples using the Lean 4 theorem prover based on the Mathlib library. Q -learning and linear TD are among the earliest and most influential reinforcement learning (RL) algorithms. The investigation of their convergence properties is not only a major research topic during the early development of the RL field but also receives increasing attention nowadays. This paper formally verifies their almost sure convergence in a unified framework based on the Robbins-Siegmund theorem. The framework developed in this work can be easily extended to convergence rates and other modes of convergence. This work thus makes an important step towards fully formalizing convergent RL results. The code is available at this https URL.
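下面用 Python 给出被形式化对象的一个数值示意(与 Lean 证明本身无关):在满足 Robbins-Monro 条件的步长 α_n = 1/n 下,确定性两状态链上的表格型 Q-learning 收敛到最优值 Q*(s1)=1、Q*(s0)=0.9(示例 MDP 为演示假设):

```python
# 数值示意:步长 α_n = 1/n 满足 sum α = ∞、sum α² < ∞(Robbins-Monro 条件)
GAMMA = 0.9
Q = {"s0": 0.0, "s1": 0.0}
visits = {"s0": 0, "s1": 0}

def step(s):
    # 确定性单动作链:s0 --r=0--> s1 --r=1--> 终止
    return (0.0, "s1") if s == "s0" else (1.0, None)

for _ in range(200):
    for s in ("s0", "s1"):
        r, s_next = step(s)
        visits[s] += 1
        alpha = 1.0 / visits[s]
        target = r + (GAMMA * Q[s_next] if s_next else 0.0)
        Q[s] += alpha * (target - Q[s])
```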
[LG-9] Going Beyond Expert Performance via Deep Implicit Imitation Reinforcement Learning
链接: https://arxiv.org/abs/2511.03616
作者: Iason Chrysomallis,Georgios Chalkiadakis
类目: Machine Learning (cs.LG)
*备注:
Abstract:Imitation learning traditionally requires complete state-action demonstrations from optimal or near-optimal experts. These requirements severely limit practical applicability, as many real-world scenarios provide only state observations without corresponding actions and expert performance is often suboptimal. In this paper we introduce a deep implicit imitation reinforcement learning framework that addresses both limitations by combining deep reinforcement learning with implicit imitation learning from observation-only datasets. Our main algorithm, Deep Implicit Imitation Q-Network (DIIQN), employs an action inference mechanism that reconstructs expert actions through online exploration and integrates a dynamic confidence mechanism that adaptively balances expert-guided and self-directed learning. This enables the agent to leverage expert guidance for accelerated training while maintaining capacity to surpass suboptimal expert performance. We further extend our framework with a Heterogeneous Actions DIIQN (HA-DIIQN) algorithm to tackle scenarios where expert and agent possess different action sets, a challenge previously unaddressed in the implicit imitation learning literature. HA-DIIQN introduces an infeasibility detection mechanism and a bridging procedure identifying alternative pathways connecting agent capabilities to expert guidance when direct action replication is impossible. Our experimental results demonstrate that DIIQN achieves up to 130% higher episodic returns compared to standard DQN, while consistently outperforming existing implicit imitation methods that cannot exceed expert performance. In heterogeneous action settings, HA-DIIQN learns up to 64% faster than baselines, leveraging expert datasets unusable by conventional approaches. Extensive parameter sensitivity analysis reveals the framework’s robustness across varying dataset sizes and hyperparameter configurations.
[LG-10] Tensor-Efficient High-Dimensional Q-learning
链接: https://arxiv.org/abs/2511.03595
作者: Junyi Wu,Dan Li
类目: Machine Learning (cs.LG); Systems and Control (eess.SY)
*备注:
Abstract:High-dimensional reinforcement learning faces challenges with complex calculations and low sample efficiency in large state-action spaces. Q-learning algorithms struggle particularly with the curse of dimensionality, where the number of state-action pairs grows exponentially with problem size. While neural network-based approaches like Deep Q-Networks have shown success, recent tensor-based methods using low-rank decomposition offer more parameter-efficient alternatives. Building upon existing tensor-based methods, we propose Tensor-Efficient Q-Learning (TEQL), which enhances low-rank tensor decomposition via improved block coordinate descent on discretized state-action spaces, incorporating novel exploration and regularization mechanisms. The key innovation is an exploration strategy that combines approximation error with visit count-based upper confidence bound to prioritize actions with high uncertainty, avoiding wasteful random exploration. Additionally, we incorporate a frequency-based penalty term in the objective function to encourage exploration of less-visited state-action pairs and reduce overfitting to frequently visited regions. Empirical results on classic control tasks demonstrate that TEQL outperforms conventional matrix-based methods and deep RL approaches in both sample efficiency and total rewards, making it suitable for resource-constrained applications, such as space and healthcare where sampling costs are high.
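TEQL 的探索策略把近似误差与基于访问计数的置信上界结合,可示意如下(假设性写法,系数与具体形式以论文为准):

```python
import math

# 假设性示意:TEQL 风格探索得分 = |近似误差| + 访问计数 UCB 项
def exploration_score(approx_error, total_steps, visit_count, c=1.0):
    """visit_count 为该 (s,a) 的访问次数:访问越少,UCB 项越大,越优先被探索"""
    ucb = c * math.sqrt(math.log(total_steps + 1) / (visit_count + 1))
    return abs(approx_error) + ucb
```

同等近似误差下,访问次数少的状态-动作对得分更高,从而避免盲目的随机探索。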
[LG-11] TabGemma: Text-Based Tabular ICL via LLM using Continued Pretraining and Retrieval
链接: https://arxiv.org/abs/2511.03570
作者: Günther Schindler,Maximilian Schambach,Michael Medek,Sam Thelin
类目: Machine Learning (cs.LG)
*备注:
Abstract:We study LLMs for tabular prediction with mixed text, numeric, and categorical fields. We introduce TabGemma, a schema-agnostic in-context learner that treats rows as sequences and tackles two practical hurdles when adapting pretrained LLMs for tabular predictions: unstable numeric tokenization and limited context size. We propose to canonicalize numbers via signed scientific notation and continue pretraining of a 12B Gemma 3 model with a target imputation objective using a large-scale real world dataset. For inference, we use a compact n-gram-based retrieval to select informative exemplars that fit within a 128k-token window. On semantically rich benchmarks, TabGemma establishes a new state of the art on classification across low- and high-data regimes and improves monotonically with more context rows. For regression, it is competitive at small sample sizes but trails conventional approaches as data grows. Our results show that LLMs can be effective tabular in-context learners on highly semantic tasks when paired with dedicated numeric handling and context retrieval, while motivating further advances in numeric modeling and long-context scaling.
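"带符号科学计数法规范化数值"可用如下草图示意(具体位数与符号约定为演示假设,非论文原始规范):

```python
# 假设性示意:把数值统一成带符号科学计数法字符串,缓解 LLM 数字分词不稳定
def canon_number(x, digits=3):
    if x == 0:
        return f"+{0:.{digits}f}e+00"
    sign = "+" if x >= 0 else "-"
    return sign + f"{abs(x):.{digits}e}"   # 例如 '1.230e-03'

canon_number(0.00123)   # '+1.230e-03'
canon_number(-45600)    # '-4.560e+04'
```

统一格式后,所有数值都落在同一组稳定的 token 模式上,便于模型学习数值规律。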
[LG-12] Flat Minima and Generalization: Insights from Stochastic Convex Optimization
链接: https://arxiv.org/abs/2511.03548
作者: Matan Schliserman,Shira Vansover-Hager,Tomer Koren
类目: Machine Learning (cs.LG)
*备注:
Abstract:Understanding the generalization behavior of learning algorithms is a central goal of learning theory. A recently emerging explanation is that learning algorithms are successful in practice because they converge to flat minima, which have been consistently associated with improved generalization performance. In this work, we study the link between flat minima and generalization in the canonical setting of stochastic convex optimization with a non-negative, \beta -smooth objective. Our first finding is that, even in this fundamental and well-studied setting, flat empirical minima may incur trivial \Omega(1) population risk while sharp minima generalizes optimally. Then, we show that this poor generalization behavior extends to two natural "sharpness-aware" algorithms originally proposed by Foret et al. (2021), designed to bias optimization toward flat solutions: Sharpness-Aware Gradient Descent (SA-GD) and Sharpness-Aware Minimization (SAM). For SA-GD, which performs gradient steps on the maximal loss in a predefined neighborhood, we prove that while it successfully converges to a flat minimum at a fast rate, the population risk of the solution can still be as large as \Omega(1) , indicating that even flat minima found algorithmically using a sharpness-aware gradient method might generalize poorly. For SAM, a computationally efficient approximation of SA-GD based on normalized ascent steps, we show that although it minimizes the empirical loss, it may converge to a sharp minimum and also incur population risk \Omega(1) . Finally, we establish population risk upper bounds for both SA-GD and SAM using algorithmic stability techniques.
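文中提到的 SAM"归一化上升步"可用一维二次损失示意:先在 w + ρ·g/‖g‖ 处(最坏方向的扰动点)取梯度,再用该梯度做实际下降(极简草图,按 Foret et al. (2021) 的标准 SAM 更新写出,超参数为演示取值):

```python
# 示意:SAM 更新 = 归一化上升步找扰动点 + 在扰动点取梯度做下降
def sam_step(w, grad_fn, rho=0.05, lr=0.1):
    g = grad_fn(w)
    norm = abs(g) or 1.0
    w_adv = w + rho * g / norm       # 向最坏方向的归一化上升步
    return w - lr * grad_fn(w_adv)   # 用扰动点处的梯度更新原参数

# 损失 L(w) = w²(梯度 2w),从 w=1 出发做一步:扰动点 1.05,更新后 w = 0.79
w1 = sam_step(1.0, lambda w: 2 * w)
```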
[LG-13] Byzantine-Robust Federated Learning with Learnable Aggregation Weights
链接: https://arxiv.org/abs/2511.03529
作者: Javad Parsa,Amir Hossein Daghestani,André M. H. Teixeira,Mikael Johansson
类目: Machine Learning (cs.LG)
*备注:
Abstract:Federated Learning (FL) enables clients to collaboratively train a global model without sharing their private data. However, the presence of malicious (Byzantine) clients poses significant challenges to the robustness of FL, particularly when data distributions across clients are heterogeneous. In this paper, we propose a novel Byzantine-robust FL optimization problem that incorporates adaptive weighting into the aggregation process. Unlike conventional approaches, our formulation treats aggregation weights as learnable parameters, jointly optimizing them alongside the global model parameters. To solve this optimization problem, we develop an alternating minimization algorithm with strong convergence guarantees under adversarial attack. We analyze the Byzantine resilience of the proposed objective. We evaluate the performance of our algorithm against state-of-the-art Byzantine-robust FL approaches across various datasets and attack scenarios. Experimental results demonstrate that our method consistently outperforms existing approaches, particularly in settings with highly heterogeneous data and a large proportion of malicious clients.
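"把聚合权重作为可学习参数"的含义可用如下草图示意(假设性写法:用 softmax 将可学习 logit 归一化为聚合权重;论文中权重与全局模型联合交替优化,此处只演示加权聚合这一步):

```python
import math

# 假设性示意:softmax 归一化的可学习聚合权重 + 客户端更新加权平均
def aggregate(client_updates, logits):
    """client_updates: 各客户端的参数更新向量;logits: 可学习的聚合权重参数"""
    exps = [math.exp(l) for l in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(client_updates[0])
    return [sum(w * u[i] for w, u in zip(weights, client_updates)) for i in range(dim)]

# 两个诚实客户端与一个拜占庭客户端;学到的低 logit 压制异常更新的影响
agg = aggregate([[1.0], [1.0], [100.0]], logits=[0.0, 0.0, -10.0])
```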
[LG-14] Learning Without Critics? Revisiting GRPO in Classical Reinforcement Learning Environments
链接: https://arxiv.org/abs/2511.03527
作者: Bryan L. M. de Oliveira,Felipe V. Frujeri,Marcos P. C. M. Queiroz,Luana G. B. Martins,Telma W. de L. Soares,Luckeciano C. Melo
类目: Machine Learning (cs.LG)
*备注:
Abstract:Group Relative Policy Optimization (GRPO) has emerged as a scalable alternative to Proximal Policy Optimization (PPO) by eliminating the learned critic and instead estimating advantages through group-relative comparisons of trajectories. This simplification raises fundamental questions about the necessity of learned baselines in policy-gradient methods. We present the first systematic study of GRPO in classical single-task reinforcement learning environments, spanning discrete and continuous control tasks. Through controlled ablations isolating baselines, discounting, and group sampling, we reveal three key findings: (1) learned critics remain essential for long-horizon tasks: all critic-free baselines underperform PPO except in short-horizon environments like CartPole where episodic returns can be effective; (2) GRPO benefits from high discount factors (gamma = 0.99) except in HalfCheetah, where lack of early termination favors moderate discounting (gamma = 0.9); (3) smaller group sizes outperform larger ones, suggesting limitations in batch-based grouping strategies that mix unrelated episodes. These results reveal both the limitations of critic-free methods in classical control and the specific conditions where they remain viable alternatives to learned value functions.
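上文指出 GRPO 以组内相对比较替代学习型 critic 来估计优势,其核心计算可示意如下(极简草图;不少实现还会除以组内回报的标准差,此处从略):

```python
# 示意:GRPO 的组相对优势 = 轨迹回报 - 同组(同一 prompt)平均回报,无需 critic
def group_relative_advantages(returns):
    mean = sum(returns) / len(returns)
    return [r - mean for r in returns]

adv = group_relative_advantages([2.0, 0.0, 1.0])  # 组均值 1.0
```

组内均值即充当基线(控制变量),这正是文中消融所对比的"无 critic 基线"。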
[LG-15] Why Less is More (Sometimes): A Theory of Data Curation
链接: https://arxiv.org/abs/2511.03492
作者: Elvis Dohmatob,Mohammad Pezeshki,Reyhane Askari-Hemmat
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:
Abstract:This paper introduces a theoretical framework to resolve a central paradox in modern machine learning: When is it better to use less data? This question has become critical as classical scaling laws suggesting "more is more" (Sun et al., 2025) are challenged by methods like LIMO ("less is more") and s1 (Ye et al., 2025; Muenighoff et al., 2025), which achieve superior performance with small, aggressively curated datasets. Here, we study data curation strategies where an imperfect oracle selects the training examples according to their difficulty and correctness. Our results provide exact scaling law curves for test error under both label-agnostic and label-aware curation rules, revealing when and why keeping only a subset of data can improve generalization. In contrast to classical scaling laws, we show that under certain conditions, small curated datasets can outperform full datasets, and we provide analytical conditions for this by deriving precise phase transition curves tied to data size and quality. We validate these theoretical claims with empirical results on ImageNet, confirming our predictions about when curation improves accuracy and can even mitigate model collapse. Furthermore, our framework provides a principled explanation for the contradictory curation strategies recently observed in LLM mathematical reasoning.
[LG-16] NAP: Attention-Based Late Fusion for Automatic Sleep Staging
链接: https://arxiv.org/abs/2511.03488
作者: Alvise Dei Rossi,Julia van der Meer,Markus H. Schmidt,Claudio L.A. Bassetti,Luigi Fiorillo,Francesca Faraci
类目: Machine Learning (cs.LG)
*备注:
Abstract:Polysomnography signals are highly heterogeneous, varying in modality composition (e.g., EEG, EOG, ECG), channel availability (e.g., frontal, occipital EEG), and acquisition protocols across datasets and clinical sites. Most existing models that process polysomnography data rely on a fixed subset of modalities or channels and therefore neglect to fully exploit its inherently multimodal nature. We address this limitation by introducing NAP (Neural Aggregator of Predictions), an attention-based model which learns to combine multiple prediction streams using a tri-axial attention mechanism that captures temporal, spatial, and predictor-level dependencies. NAP is trained to adapt to different input dimensions. By aggregating outputs from frozen, pretrained single-channel models, NAP consistently outperforms individual predictors and simple ensembles, achieving state-of-the-art zero-shot generalization across multiple datasets. While demonstrated in the context of automated sleep staging from polysomnography, the proposed approach could be extended to other multimodal physiological applications.
[LG-17] System Identification of a Moored ASV with Recessed Moon Pool via Deterministic and Bayesian Hankel-DMDc
链接: https://arxiv.org/abs/2511.03482
作者: Giorgio Palma,Ivan Santic,Andrea Serani,Lorenzo Minno,Matteo Diez
类目: Systems and Control (eess.SY); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
*备注: 26 pages, 11 figures, 2 tables, 1 box
Abstract:This study addresses the system identification of a small autonomous surface vehicle (ASV) under moored conditions using Hankel dynamic mode decomposition with control (HDMDc) and its Bayesian extension (BHDMDc). Experiments were carried out on a Codevintec CK-14e ASV in the towing tank of CNR-INM, under both irregular and regular head-sea wave conditions. The ASV under investigation features a recessed moon pool, which induces nonlinear responses due to sloshing, thereby increasing the modelling challenge. Data-driven reduced-order models were built from measurements of vessel motions and mooring loads. The HDMDc framework provided accurate deterministic predictions of vessel dynamics, while the Bayesian formulation enabled uncertainty-aware characterization of the model response by accounting for variability in hyperparameter selection. Validation against experimental data demonstrated that both HDMDc and BHDMDc can predict the vessel’s response to unseen regular and irregular wave excitations. In conclusion, the study shows that HDMDc-based ROMs are a viable data-driven alternative for system identification, demonstrating for the first time their generalization capability for a sea condition different from the training set, achieving high accuracy in reproducing vessel dynamics.
[LG-18] RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse
链接: https://arxiv.org/abs/2511.03475
作者: Yinsicheng Jiang,Yeqi Huang,Liang Cheng,Cheng Deng,Xuan Sun,Luo Mai
类目: Machine Learning (cs.LG)
*备注:
Abstract:Retrieval-augmented generation (RAG) enhances large language models (LLMs) with retrieved context but often suffers from downgraded prefill performance as modern applications demand longer and more complex inputs. Existing caching techniques either preserve accuracy with low cache reuse or improve reuse at the cost of degraded reasoning quality. We present RAGBoost, an efficient RAG system that achieves high cache reuse without sacrificing accuracy through accuracy-preserving context reuse. RAGBoost detects overlapping retrieved items across concurrent sessions and multi-turn interactions, using efficient context indexing, ordering, and de-duplication to maximize reuse, while lightweight contextual hints maintain reasoning fidelity. It integrates seamlessly with existing LLM inference engines and improves their prefill performance by 1.5-3X over state-of-the-art methods, while preserving or even enhancing reasoning accuracy across diverse RAG and agentic AI workloads. Our code is released at: this https URL.
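The core reuse idea, detecting overlap in retrieved items and putting them in a canonical order so sessions share a cacheable prefix, can be illustrated with a toy sketch. RAGBoost's actual indexing and de-duplication are more involved; the canonical-sort strategy and the function names here are my own illustration, not the paper's API.

```python
def canonicalize_context(doc_ids):
    """Illustrative context canonicalization: de-duplicate retrieved items
    and place them in a canonical (sorted) order so that sessions that
    retrieve overlapping sets end up sharing a common prefix, whose
    KV-cache can then be reused across requests."""
    seen, ordered = set(), []
    for d in sorted(doc_ids):
        if d not in seen:
            seen.add(d)
            ordered.append(d)
    return ordered

def shared_prefix_len(a, b):
    """Length of the common prefix between two canonicalized contexts,
    i.e., how much prefill work could be reused between them."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n
```

With a canonical ordering, two sessions that both retrieved `doc1` share at least that prefix regardless of the order the retriever returned it in.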
[LG-19] Reinforcement Learning Using Known Invariances
链接: https://arxiv.org/abs/2511.03473
作者: Alexandru Cioba,Aya Kayal,Laura Toni,Sattar Vakili,Alberto Bernacchia
类目: Machine Learning (cs.LG)
*备注:
Abstract:In many real-world reinforcement learning (RL) problems, the environment exhibits inherent symmetries that can be exploited to improve learning efficiency. This paper develops a theoretical and algorithmic framework for incorporating known group symmetries into kernel-based RL. We propose a symmetry-aware variant of optimistic least-squares value iteration (LSVI), which leverages invariant kernels to encode invariance in both rewards and transition dynamics. Our analysis establishes new bounds on the maximum information gain and covering numbers for invariant RKHSs, explicitly quantifying the sample efficiency gains from symmetry. Empirical results on a customized Frozen Lake environment and a 2D placement design problem confirm the theoretical improvements, demonstrating that symmetry-aware RL achieves significantly better performance than their standard kernel counterparts. These findings highlight the value of structural priors in designing more sample-efficient reinforcement learning algorithms.
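A standard way to build the invariant kernels the abstract refers to is to average a base kernel over the actions of a known finite symmetry group; the sketch below shows this construction (the RBF base kernel and the sign-flip group are my illustrative choices, not necessarily the paper's).

```python
import math

def rbf(x, y, gamma=1.0):
    """Base RBF kernel on tuples of floats."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def invariant_kernel(x, y, group, base=rbf):
    """Group-averaged kernel: averaging the base kernel over all group
    actions applied to one argument yields a kernel invariant under the
    group, encoding the known symmetry directly in the RKHS."""
    return sum(base(g(x), y) for g in group) / len(group)

# Example symmetry: sign flip of the state, G = {identity, negation}.
G = [lambda s: s, lambda s: tuple(-v for v in s)]
```

By construction, evaluating the kernel at a state or at its symmetric image gives identical values, so value estimates automatically respect the symmetry.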
[LG-20] POEMS: Product of Experts for Interpretable Multi-omic Integration using Sparse Decoding
链接: https://arxiv.org/abs/2511.03464
作者: Mihriban Kocak Balik,Pekka Marttinen,Negar Safinianaini
类目: Machine Learning (cs.LG)
*备注:
Abstract:Integrating different molecular layers, i.e., multiomics data, is crucial for unraveling the complexity of diseases; yet, most deep generative models either prioritize predictive performance at the expense of interpretability or enforce interpretability by linearizing the decoder, thereby weakening the network’s nonlinear expressiveness. To overcome this tradeoff, we introduce POEMS: Product Of Experts for Interpretable Multiomics Integration using Sparse Decoding, an unsupervised probabilistic framework that preserves predictive performance while providing interpretability. POEMS provides interpretability without linearizing any part of the network by 1) mapping features to latent factors using sparse connections, which directly translates to biomarker discovery, 2) allowing for cross-omic associations through a shared latent space using product of experts model, and 3) reporting contributions of each omic by a gating network that adaptively computes their influence in the representation learning. Additionally, we present an efficient sparse decoder. In a cancer subtyping case study, POEMS achieves competitive clustering and classification performance while offering our novel set of interpretations, demonstrating that biomarker based insight and predictive accuracy can coexist in multiomics representation learning.
[LG-21] SyMuPe: Affective and Controllable Symbolic Music Performance
链接: https://arxiv.org/abs/2511.03425
作者: Ilya Borovik,Dmitrii Gavrilev,Vladimir Viro
类目: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM)
*备注: ACM Multimedia 2025. Extended version with supplementary material
Abstract:Emotions are fundamental to the creation and perception of music performances. However, achieving human-like expression and emotion through machine learning models for performance rendering remains a challenging task. In this work, we present SyMuPe, a novel framework for developing and training affective and controllable symbolic piano performance models. Our flagship model, PianoFlow, uses conditional flow matching trained to solve diverse multi-mask performance inpainting tasks. By design, it supports both unconditional generation and infilling of music performance features. For training, we use a curated, cleaned dataset of 2,968 hours of aligned musical scores and expressive MIDI performances. For text and emotion control, we integrate a piano performance emotion classifier and tune PianoFlow with the emotion-weighted Flan-T5 text embeddings provided as conditional inputs. Objective and subjective evaluations against transformer-based baselines and existing models show that PianoFlow not only outperforms other approaches, but also achieves performance quality comparable to that of human-recorded and transcribed MIDI samples. For emotion control, we present and analyze samples generated under different text conditioning scenarios. The developed model can be integrated into interactive applications, contributing to the creation of more accessible and engaging music performance systems.
[LG-22] TripleWin: Fixed-Point Equilibrium Pricing for Data-Model Coupled Markets
链接: https://arxiv.org/abs/2511.03368
作者: Hongrun Ren,Yun Xiong,Lei You,Yingying Wang,Haixu Xiong,Yangyong Zhu
类目: Machine Learning (cs.LG)
*备注:
Abstract:The rise of the machine learning (ML) model economy has intertwined markets for training datasets and pre-trained models. However, most pricing approaches still separate data and model transactions or rely on broker-centric pipelines that favor one side. Recent studies of data markets with externalities capture buyer interactions but do not yield a simultaneous and symmetric mechanism across data sellers, model producers, and model buyers. We propose a unified data-model coupled market that treats dataset and model trading as a single system. A supply-side mapping transforms dataset payments into buyer-visible model quotations, while a demand-side mapping propagates buyer prices back to datasets through Shapley-based allocation. Together, they form a closed loop that links four interactions: supply-demand propagation in both directions and mutual coupling among buyers and among sellers. We prove that the joint operator is a standard interference function (SIF), guaranteeing existence, uniqueness, and global convergence of equilibrium prices. Experiments demonstrate efficient convergence and improved fairness compared with broker-centric and one-sided baselines. The code is available on this https URL.
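The Shapley-based allocation in the demand-side mapping can be illustrated with the textbook exact computation, averaging marginal contributions over all orderings. This brute-force version only works for toy numbers of sellers and is my own sketch of the general mechanism, not the paper's implementation.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley allocation: each player's share is its marginal
    contribution to the coalition value, averaged over all join orders.
    `value` maps a frozenset of players to a coalition worth."""
    totals = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = []
        for p in order:
            before = value(frozenset(coalition))
            coalition.append(p)
            totals[p] += value(frozenset(coalition)) - before
    return {p: t / len(perms) for p, t in totals.items()}

# Hypothetical additive game: each dataset contributes independently,
# so the Shapley allocation recovers each seller's standalone value.
toy_value = lambda coalition: sum({"a": 1.0, "b": 2.0}[p] for p in coalition)
alloc = shapley_values(["a", "b"], toy_value)
```

In the paper's setting such an allocation is what propagates buyer-visible model prices back to the individual datasets they were trained on.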
[LG-23] A Modular Data-Free Pipeline for Multi-Label Intention Recognition in Transportation Agentic AI Applications
链接: https://arxiv.org/abs/2511.03363
作者: Xiaocai Zhang,Hur Lim,Ke Wang,Zhe Xiao,Jing Wang,Kelvin Lee,Xiuju Fu,Zheng Qin
类目: Machine Learning (cs.LG)
*备注: Present in the Transportation Research Board (TRB) Annual Meeting 2026
Abstract:In this study, a modular, data-free pipeline for multi-label intention recognition is proposed for agentic AI applications in transportation. Unlike traditional intent recognition systems that depend on large, annotated corpora and often struggle with fine-grained, multi-label discrimination, our approach eliminates the need for costly data collection while enhancing the accuracy of multi-label intention understanding. Specifically, the overall pipeline, named DMTC, consists of three steps: 1) using prompt engineering to guide large language models (LLMs) to generate diverse synthetic queries in different transport scenarios; 2) encoding each textual query with a Sentence-T5 model to obtain compact semantic embeddings; 3) training a lightweight classifier using a novel online focal-contrastive (OFC) loss that emphasizes hard samples and maximizes inter-class separability. The applicability of the proposed pipeline is demonstrated in an agentic AI application in the maritime transportation context. Extensive experiments show that DMTC achieves a Hamming loss of 5.35% and an AUC of 95.92%, outperforming state-of-the-art multi-label classifiers and recent end-to-end SOTA LLM-based baselines. Further analysis reveals that Sentence-T5 embeddings improve subset accuracy by at least 3.29% over alternative encoders, and integrating the OFC loss yields an additional 0.98% gain compared to standard contrastive objectives. In conclusion, our system seamlessly routes user queries to task-specific modules (e.g., ETA information, traffic risk evaluation, and other typical scenarios in the transportation domain), laying the groundwork for fully autonomous, intention-aware agents without costly manual labelling.
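The "emphasize hard samples" ingredient of the OFC loss follows the focal-loss pattern of down-weighting confident predictions. The sketch below shows only that scaling term in isolation (the contrastive part is omitted), as my own illustration rather than the paper's exact objective.

```python
def focal_weight(p_correct, gamma=2.0):
    """Focal-style modulation: the loss scale (1 - p)^gamma shrinks as
    the model's probability on the correct label grows, so easy samples
    contribute little and hard samples dominate the gradient."""
    return (1.0 - p_correct) ** gamma
```

With gamma = 2, an example the model already gets right with p = 0.9 is weighted 81x less than one it gets right with only p = 0.1.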
[LG-24] SORTeD Rashomon Sets of Sparse Decision Trees: Anytime Enumeration
链接: https://arxiv.org/abs/2511.03344
作者: Elif Arslan,Jacobus G. M. van der Linden,Serge Hoogendoorn,Marco Rinaldi,Emir Demirović
类目: Machine Learning (cs.LG)
*备注: 32 pages, 10 figures, to be published in the proceedings of The Thirty-Ninth Annual Conference on Neural Information Processing Systems
Abstract:Sparse decision tree learning provides accurate and interpretable predictive models that are ideal for high-stakes applications by finding the single most accurate tree within a (soft) size limit. Rather than relying on a single “best” tree, Rashomon sets (trees with similar performance but varying structures) can be used to enhance variable importance analysis, enrich explanations, and enable users to choose simpler trees or those that satisfy stakeholder preferences (e.g., fairness) without hard-coding such criteria into the objective function. However, because finding the optimal tree is NP-hard, enumerating the Rashomon set is inherently challenging. Therefore, we introduce SORTD, a novel framework that improves scalability and enumerates trees in the Rashomon set in order of the objective value, thus offering anytime behavior. Our experiments show that SORTD reduces runtime by up to two orders of magnitude compared with the state of the art. Moreover, SORTD can compute Rashomon sets for any separable and totally ordered objective and supports post-evaluating the set using other separable (and partially ordered) objectives. Together, these advances make exploring Rashomon sets more practical in real-world applications.
[LG-25] Graph Neural AI with Temporal Dynamics for Comprehensive Anomaly Detection in Microservices
链接: https://arxiv.org/abs/2511.03285
作者: Qingyuan Zhang,Ning Lyu,Le Liu,Yuxi Wang,Ziyu Cheng,Cancan Hua
类目: Machine Learning (cs.LG)
*备注:
Abstract:This study addresses the problem of anomaly detection and root cause tracing in microservice architectures and proposes a unified framework that combines graph neural networks with temporal modeling. The microservice call chain is abstracted as a directed graph, where multidimensional features of nodes and edges are used to construct a service topology representation, and graph convolution is applied to aggregate features across nodes and model dependencies, capturing complex structural relationships among services. On this basis, gated recurrent units are introduced to model the temporal evolution of call chains, and multi-layer stacking and concatenation operations are used to jointly obtain structural and temporal representations, improving the ability to identify anomaly patterns. Furthermore, anomaly scoring functions at both the node and path levels are defined to achieve unified modeling from local anomaly detection to global call chain tracing, which enables the identification of abnormal service nodes and the reconstruction of potential anomaly propagation paths. Sensitivity experiments are then designed from multiple dimensions, including hyperparameters, environmental disturbances, and data distribution, to evaluate the framework, and results show that it outperforms baseline methods in key metrics such as AUC, ACC, Recall, and F1-Score, maintaining high accuracy and stability under dynamic topologies and complex environments. This research not only provides a new technical path for anomaly detection in microservices but also lays a methodological foundation for intelligent operations in distributed systems.
[LG-26] A Probabilistic Approach to Pose Synchronization for Multi-Reference Alignment with Applications to MIMO Wireless Communication Systems NEURIPS
链接: https://arxiv.org/abs/2511.03280
作者: Rob Romijnders,Gabriele Cesa,Christos Louizos,Kumar Pratik,Arash Behboodi
类目: Machine Learning (cs.LG); Applications (stat.AP)
*备注: To appear in NeurIPS workshop: AI and ML for Next-Generation Wireless Communications (AI4NextG)
Abstract:From molecular imaging to wireless communications, the ability to align and reconstruct signals from multiple misaligned observations is crucial for system performance. We study the problem of multi-reference alignment (MRA), which arises in many real-world problems, such as cryo-EM, computer vision, and, in particular, wireless communication systems. Using a probabilistic approach to model MRA, we find a new algorithm that uses relative poses as nuisance variables to marginalize out – thereby removing the global symmetries of the problem and allowing for more direct solutions and improved convergence. The decentralization of this approach enables significant computational savings by avoiding the cubic scaling of centralized methods through cycle consistency. Both proposed algorithms achieve lower reconstruction error across experimental settings.
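The discrete alignment problem underlying MRA can be illustrated with a brute-force search over cyclic shifts. The paper treats such relative poses as nuisance variables to marginalize out probabilistically; this toy only shows the hard-assignment version of the subproblem.

```python
def best_cyclic_shift(signal, reference):
    """Toy multi-reference alignment step: find the cyclic shift of
    `signal` that maximizes its correlation with `reference`.
    Brute force over all n shifts, O(n^2) for illustration only."""
    n = len(signal)
    def corr(shift):
        return sum(signal[(i + shift) % n] * reference[i] for i in range(n))
    return max(range(n), key=corr)
```

The global symmetry the abstract mentions is visible here: only relative shifts are identifiable, which is why marginalizing relative poses removes the symmetry rather than trying to estimate an absolute one.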
[LG-27] Multi-Objective Adaptive Rate Limiting in Microservices Using Deep Reinforcement Learning
链接: https://arxiv.org/abs/2511.03279
作者: Ning Lyu,Yuxi Wang,Ziyu Cheng,Qingyuan Zhang,Feng Chen
类目: Machine Learning (cs.LG)
*备注:
Abstract:As cloud computing and microservice architectures become increasingly prevalent, API rate limiting has emerged as a critical mechanism for ensuring system stability and service quality. Traditional rate limiting algorithms, such as token bucket and sliding window, while widely adopted, struggle to adapt to dynamic traffic patterns and varying system loads. This paper proposes an adaptive rate limiting strategy based on deep reinforcement learning that dynamically balances system throughput and service latency. We design a hybrid architecture combining Deep Q-Network (DQN) and Asynchronous Advantage Actor-Critic (A3C) algorithms, modeling the rate limiting decision process as a Markov Decision Process. The system continuously monitors microservice states and learns optimal rate limiting policies through environmental interaction. Extensive experiments conducted in a Kubernetes cluster environment demonstrate that our approach achieves 23.7% throughput improvement and 31.4% P99 latency reduction compared to traditional fixed-threshold strategies under high-load scenarios. Results from a 90-day production deployment handling 500 million daily requests validate the practical effectiveness of the proposed method, with 82% reduction in service degradation incidents and 68% decrease in manual interventions.
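Modeling rate limiting as an MDP means states summarize observed load, actions adjust the limit, and a temporal-difference update improves the policy. The sketch below shows one such update in tabular form for clarity; the paper uses deep networks (DQN/A3C), and the state and action names here are hypothetical.

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One Q-learning temporal-difference update, the core of the DQN
    side of the hybrid design (tabular here, not a deep network).
    Reward could e.g. trade off throughput against P99 latency."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# Hypothetical toy setup: two load levels, actions raise/lower the limit.
Q = {"low_load": {"raise": 0.0, "lower": 0.0},
     "high_load": {"raise": 0.0, "lower": 0.0}}
q_update(Q, "high_load", "lower", reward=1.0, next_state="low_load")
```

Repeating such updates from live traffic is what lets the limiter adapt to dynamic load patterns instead of relying on a fixed token-bucket threshold.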
[LG-28] Diffusion Language Models are Super Data Learners
链接: https://arxiv.org/abs/2511.03276
作者: Jinjie Ni,Qian Liu,Longxu Dou,Chao Du,Zili Wang,Hang Yan,Tianyu Pang,Michael Qizhe Shieh
类目: Machine Learning (cs.LG)
*备注:
Abstract:Under strictly controlled pre-training settings, we observe a Crossover: when unique data is limited, diffusion language models (DLMs) consistently surpass autoregressive (AR) models by training for more epochs. The crossover shifts later with more or higher-quality data, earlier with larger models, and persists across dense and sparse architectures. We attribute the gains to three compounding factors: (1) any-order modeling, (2) super-dense compute from iterative bidirectional denoising, and (3) built-in Monte Carlo augmentation; input or parameter noise improves AR under data constraint but cannot close the gap. At scale, a 1.7B DLM trained with a ~1.5T-token compute budget on 10B unique Python tokens overtakes an AR coder trained with strictly matched settings. In addition, a 1B-parameter DLM achieves 56% accuracy on HellaSwag and 33% on MMLU using only 1B tokens, without any special tricks, just by repeating standard pre-training data. We also show that rising validation cross-entropy does not imply degraded downstream performance in this regime.
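The "built-in Monte Carlo augmentation" point can be made concrete: a diffusion LM trains on freshly masked views of the same sequence, so repeated epochs over fixed data still present new prediction problems. The sketch below is my own schematic of that masking step, not the paper's training code.

```python
import random

def sample_masked_view(tokens, mask_rate, mask_token="<mask>", rng=None):
    """Draw one corrupted view of a sequence: each position is masked
    independently, and the denoiser must reconstruct the masked targets.
    Different draws of the same sequence yield different objectives,
    unlike an AR model's fixed left-to-right factorization."""
    rng = rng or random.Random(0)
    corrupted, targets = [], {}
    for i, t in enumerate(tokens):
        if rng.random() < mask_rate:
            corrupted.append(mask_token)
            targets[i] = t  # positions the denoiser must reconstruct
        else:
            corrupted.append(t)
    return corrupted, targets

tokens = list("abcdef")
view1, targets1 = sample_masked_view(tokens, 0.5, rng=random.Random(1))
```

Each epoch thus resamples which positions are hidden, which is one intuition for why DLMs extract more signal per unique token under data constraint.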
[LG-29] Death by a Thousand Prompts: Open Model Vulnerability Analysis
链接: https://arxiv.org/abs/2511.03247
作者: Amy Chang,Nicholas Conley,Harish Santhanalakshmi Ganesan,Adam Swanda
类目: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
*备注:
Abstract:Open-weight models provide researchers and developers with accessible foundations for diverse downstream applications. We tested the safety and security postures of eight open-weight large language models (LLMs) to identify vulnerabilities that may impact subsequent fine-tuning and deployment. Using automated adversarial testing, we measured each model’s resilience against single-turn and multi-turn prompt injection and jailbreak attacks. Our findings reveal pervasive vulnerabilities across all tested models, with multi-turn attacks achieving success rates between 25.86% and 92.78% – representing a 2\times to 10\times increase over single-turn baselines. These results underscore a systemic inability of current open-weight models to maintain safety guardrails across extended interactions. We assess that alignment strategies and lab priorities significantly influence resilience: capability-focused models such as Llama 3.3 and Qwen 3 demonstrate higher multi-turn susceptibility, whereas safety-oriented designs such as Google Gemma 3 exhibit more balanced performance. The analysis concludes that open-weight models, while crucial for innovation, pose tangible operational and ethical risks when deployed without layered security controls. These findings are intended to inform practitioners and developers of the potential risks and the value of professional AI security solutions to mitigate exposure. Addressing multi-turn vulnerabilities is essential to ensure the safe, reliable, and responsible deployment of open-weight LLMs in enterprise and public domains. We recommend adopting a security-first design philosophy and layered protections to ensure resilient deployments of open-weight models.
[LG-30] Climate Adaptation with Reinforcement Learning: Economic vs. Quality of Life Adaptation Pathways
链接: https://arxiv.org/abs/2511.03243
作者: Miguel Costa,Arthur Vandervoort,Martin Drews,Karyn Morrissey,Francisco C. Pereira
类目: Machine Learning (cs.LG)
*备注: Accepted for presentation at AI for Climate and Conservation Workshop at EurIPS 2025
Abstract:Climate change will cause an increase in the frequency and severity of flood events, prompting the need for cohesive adaptation policymaking. Designing effective adaptation policies, however, depends on managing the uncertainty of long-term climate impacts. Meanwhile, such policies can feature important normative choices that are not always made explicit. We propose that Reinforcement Learning (RL) can be a useful tool to both identify adaptation pathways under uncertain conditions while it also allows for the explicit modelling (and consequent comparison) of different adaptation priorities (e.g. economic vs. wellbeing). We use an Integrated Assessment Model (IAM) to link together a rainfall and flood model, and compute the impacts of flooding in terms of quality of life (QoL), transportation, and infrastructure damage. Our results show that models prioritising QoL over economic impacts results in more adaptation spending as well as a more even distribution of spending over the study area, highlighting the extent to which such normative assumptions can alter adaptation policy. Our framework is publicly available: this https URL.
[LG-31] A unified physics-informed generative operator framework for general inverse problems
链接: https://arxiv.org/abs/2511.03241
作者: Gang Bao,Yaohua Zang
类目: Machine Learning (cs.LG)
*备注:
Abstract:Solving inverse problems governed by partial differential equations (PDEs) is central to science and engineering, yet remains challenging when measurements are sparse, noisy, or when the underlying coefficients are high-dimensional or discontinuous. Existing deep learning approaches either require extensive labeled datasets or are limited to specific measurement types, often leading to failure in such regimes and restricting their practical applicability. Here, a novel generative neural operator framework, IGNO, is introduced to overcome these limitations. IGNO unifies the solution of inverse problems from both point measurements and operator-valued data without labeled training pairs. This framework encodes high-dimensional, potentially discontinuous coefficient fields into a low-dimensional latent space, which drives neural operator decoders to reconstruct both coefficients and PDE solutions. Training relies purely on physics constraints through PDE residuals, while inversion proceeds via efficient gradient-based optimization in latent space, accelerated by an a priori normalizing flow model. Across a diverse set of challenging inverse problems, including recovery of discontinuous coefficients from solution-based measurements and the EIT problem with operator-based measurements, IGNO consistently achieves accurate, stable, and scalable inversion even under severe noise. It consistently outperforms the state-of-the-art method under varying noise levels and demonstrates strong generalization to out-of-distribution targets. These results establish IGNO as a unified and powerful framework for tackling challenging inverse problems across computational science domains.
[LG-32] Incorporating Quality of Life in Climate Adaptation Planning via Reinforcement Learning
链接: https://arxiv.org/abs/2511.03238
作者: Miguel Costa,Arthur Vandervoort,Martin Drews,Karyn Morrissey,Francisco C. Pereira
类目: Machine Learning (cs.LG)
*备注: Accepted for presentation at AI in Science (AIS) 2025
Abstract:Urban flooding is expected to increase in frequency and severity as a consequence of climate change, causing wide-ranging impacts that include a decrease in urban Quality of Life (QoL). Meanwhile, policymakers must devise adaptation strategies that can cope with the uncertain nature of climate change and the complex and dynamic nature of urban flooding. Reinforcement Learning (RL) holds significant promise in tackling such complex, dynamic, and uncertain problems. Because of this, we use RL to identify which climate adaptation pathways lead to a higher QoL in the long term. We do this using an Integrated Assessment Model (IAM) which combines a rainfall projection model, a flood model, a transport accessibility model, and a quality of life index. Our preliminary results suggest that this approach can be used to learn optimal adaptation measures and it outperforms other realistic and real-world planning strategies. Our framework is publicly available: this https URL.
[LG-33] Cross-Modal Alignment via Variational Copula Modelling
链接: https://arxiv.org/abs/2511.03196
作者: Feng Wu,Tsai Hor Chan,Fuying Wang,Guosheng Yin,Lequan Yu
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:
Abstract:Various data modalities are common in real-world applications (e.g., electronic health records, medical images and clinical notes in healthcare). It is essential to develop multimodal learning methods to aggregate various information from multiple modalities. The main challenge is how to appropriately align and fuse the representations of different modalities into a joint distribution. Existing methods mainly rely on concatenation or the Kronecker product, oversimplifying the interaction structure between modalities and indicating a need to model more complex interactions. Additionally, the joint distribution of latent representations with higher-order interactions is underexplored. Copula is a powerful statistical structure for modelling the interactions among variables, as it naturally bridges the joint distribution and marginal distributions of multiple variables. We propose a novel copula-driven multimodal learning framework, which focuses on learning the joint distribution of various modalities to capture the complex interactions among them. The key idea is to interpret the copula model as a tool to align the marginal distributions of the modalities efficiently. By assuming a Gaussian mixture distribution for each modality and a copula model on the joint distribution, our model can generate accurate representations for missing modalities. Extensive experiments on public MIMIC datasets demonstrate the superior performance of our model over other competitors. The code is available at this https URL.
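The copula mechanism the paper builds on separates dependence from marginals: draw correlated Gaussians, then push each through the standard normal CDF to get uniforms whose coupling strength (rho) is independent of whatever marginal each modality uses. The sketch below shows a Gaussian copula sampler under that textbook construction; it is an illustration of the statistical tool, not the paper's model.

```python
import math
import random

def phi(z):
    """Standard normal CDF via erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_copula_sample(rho, rng):
    """One draw from a bivariate Gaussian copula: correlated normals
    mapped through their CDFs give dependent uniforms (u1, u2); each
    uniform can then be mapped through any modality's own marginal
    (e.g., a Gaussian mixture) without changing the dependence."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return phi(z1), phi(z2)

rng = random.Random(0)
samples = [gaussian_copula_sample(0.9, rng) for _ in range(2000)]
```

With rho close to 1 the two uniforms move together, which is how cross-modal associations can be encoded in a shared latent space while each modality keeps its own marginal distribution.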
[LG-34] Periodic Skill Discovery NEURIPS2025
链接: https://arxiv.org/abs/2511.03187
作者: Jonghae Park,Daesol Cho,Jusuk Lee,Dongseok Shim,Inkyu Jang,H. Jin Kim
类目: Machine Learning (cs.LG); Robotics (cs.RO)
*备注: NeurIPS 2025
Abstract:Unsupervised skill discovery in reinforcement learning (RL) aims to learn diverse behaviors without relying on external rewards. However, current methods often overlook the periodic nature of learned skills, focusing instead on increasing the mutual dependence between states and skills or maximizing the distance traveled in latent space. Considering that many robotic tasks – particularly those involving locomotion – require periodic behaviors across varying timescales, the ability to discover diverse periodic skills is essential. Motivated by this, we propose Periodic Skill Discovery (PSD), a framework that discovers periodic behaviors in an unsupervised manner. The key idea of PSD is to train an encoder that maps states to a circular latent space, thereby naturally encoding periodicity in the latent representation. By capturing temporal distance, PSD can effectively learn skills with diverse periods in complex robotic tasks, even with pixel-based observations. We further show that these learned skills achieve high performance on downstream tasks such as hurdling. Moreover, integrating PSD with an existing skill discovery method offers more diverse behaviors, thus broadening the agent’s repertoire. Our code and demos are available at this https URL
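Encoding states on a circle makes periodicity a property of the representation itself: phases that differ by a full period map to the same point. The sketch below shows that embedding and its natural angular distance; it illustrates the geometric idea only, not PSD's learned encoder.

```python
import math

def circular_embed(phase):
    """Map a scalar phase to a point on the unit circle, the kind of
    latent geometry PSD uses to encode periodic behavior."""
    return (math.cos(phase), math.sin(phase))

def circular_distance(p, q):
    """Angular distance between two circle points: phases equal modulo
    one period are zero distance apart, so a periodic gait returns to
    the same latent point every cycle."""
    a = math.atan2(p[1], p[0]) - math.atan2(q[1], q[0])
    return abs(math.atan2(math.sin(a), math.cos(a)))
```

Skills with different periods then trace the circle at different angular speeds, which is what lets a temporal-distance objective separate them.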
[LG-35] Understanding Robustness of Model Editing in Code LLMs: An Empirical Study
链接: https://arxiv.org/abs/2511.03182
作者: Vinaik Chhetri,A.B Siddique,Umar Farooq
类目: Software Engineering (cs.SE); Machine Learning (cs.LG)
*备注: 26 pages, 2 figures, 15 tables
Abstract:Large language models (LLMs) are increasingly used in software development. However, while LLMs remain static after pretraining, programming languages and APIs continue to evolve, leading to the generation of deprecated or incompatible code that undermines reliability. Retraining LLMs from scratch to reflect such changes is computationally expensive, making model editing a promising lightweight alternative that updates only a small subset of parameters. Despite its potential, it remains unclear whether model editing yields genuine syntactic and semantic adaptations or merely superficial fixes. In this work, we present a systematic study of five state-of-the-art model editing methods: Constrained Fine-Tuning (FT), GRACE, MEMIT, PMET, and ROME. We apply these methods to three leading open-source code LLMs, CodeLlama, CodeQwen1.5, and DeepSeek-Coder, under controlled API deprecation scenarios. Our evaluation covers both instant and sequential editing settings, using three disjoint evaluation sets designed to assess reliability, generalization, and specificity. We measure model correctness at three levels: successful compilation, partial test case pass, and full test pass. Our findings show that instant edits consistently degrade model performance, with syntactic validity dropping by up to 86 percentage points and functional correctness declining by 45 points even in the best-performing setting. Sequential edits further amplify this degradation, and in some cases, model performance collapses entirely. Across all models, most passing generations relied on workarounds rather than correctly adopting the intended changes, while faulty adoptions that result in test failures or compilation errors were significantly more frequent. Correct adoptions, where the model correctly integrates the intended change, occurred in only about 6% of cases.
[LG-36] Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control
链接: https://arxiv.org/abs/2511.03181
作者: Rewida Ali,Cristian C. Beltran-Hernandez,Weiwei Wan,Kensuke Harada
类目: Robotics (cs.RO); Machine Learning (cs.LG)
*备注:
Abstract:Human-robot cooperation is essential in environments such as warehouses and retail stores, where workers frequently handle deformable objects like paper, bags, and fabrics. Coordinating robotic actions with human assistance remains difficult due to the unpredictable dynamics of deformable materials and the need for adaptive force control. To explore this challenge, we focus on the task of gift wrapping, which exemplifies a long-horizon manipulation problem involving precise folding, controlled creasing, and secure fixation of paper. Success is achieved when the robot completes the sequence to produce a neatly wrapped package with clean folds and no tears. We propose a learning-based framework that integrates a high-level task planner powered by a large language model (LLM) with a low-level hybrid imitation learning (IL) and reinforcement learning (RL) policy. At its core is a Sub-task Aware Robotic Transformer (START) that learns a unified policy from human demonstrations. The key novelty lies in capturing long-range temporal dependencies across the full wrapping sequence within a single model. Unlike vanilla Action Chunking with Transformer (ACT), typically applied to short tasks, our method introduces sub-task IDs that provide explicit temporal grounding. This enables robust performance across the entire wrapping process and supports flexible execution, as the policy learns sub-goals rather than merely replicating motion sequences. Our framework achieves a 97% success rate on real-world wrapping tasks. We show that the unified transformer-based policy reduces the need for specialized models, allows controlled human supervision, and effectively bridges high-level intent with the fine-grained force control required for deformable object manipulation. 
[LG-37] UnCLe: Towards Scalable Dynamic Causal Discovery in Non-linear Temporal Systems NEURIPS2025
链接: https://arxiv.org/abs/2511.03168
作者: Tingzhu Bi,Yicheng Pan,Xinrui Jiang,Huize Sun,Meng Ma,Ping Wang
类目: Machine Learning (cs.LG)
*备注: 12 pages main content, 18 pages appendix, NeurIPS 2025. Code: this https URL
Abstract:Uncovering cause-effect relationships from observational time series is fundamental to understanding complex systems. While many methods infer static causal graphs, real-world systems often exhibit dynamic causality-where relationships evolve over time. Accurately capturing these temporal dynamics requires time-resolved causal graphs. We propose UnCLe, a novel deep learning method for scalable dynamic causal discovery. UnCLe employs a pair of Uncoupler and Recoupler networks to disentangle input time series into semantic representations and learns inter-variable dependencies via auto-regressive Dependency Matrices. It estimates dynamic causal influences by analyzing datapoint-wise prediction errors induced by temporal perturbations. Extensive experiments demonstrate that UnCLe not only outperforms state-of-the-art baselines on static causal discovery benchmarks but, more importantly, exhibits a unique capability to accurately capture and represent evolving temporal causality in both synthetic and real-world dynamic systems (e.g., human motion). UnCLe offers a promising approach for revealing the underlying, time-varying mechanisms of complex phenomena.
[LG-38] Towards Scalable Backpropagation-Free Gradient Estimation
链接: https://arxiv.org/abs/2511.03110
作者: Daniel Wang,Evan Markou,Dylan Campbell
类目: Machine Learning (cs.LG)
*备注: 12 pages, 2 figures, Accepted to AJCAI 2025
Abstract:While backpropagation (reverse-mode automatic differentiation) has been extraordinarily successful in deep learning, it requires two passes (forward and backward) through the neural network and the storage of intermediate activations. Existing gradient estimation methods that instead use forward-mode automatic differentiation struggle to scale beyond small networks due to the high variance of the estimates. Efforts to mitigate this have so far introduced significant bias to the estimates, reducing their utility. We introduce a gradient estimation approach that reduces both bias and variance by manipulating upstream Jacobian matrices when computing guess directions. It shows promising results and has the potential to scale to larger networks, indeed performing better as the network width is increased. Our understanding of this method is facilitated by analyses of bias and variance, and their connection to the low-dimensional structure of neural network gradients.
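The forward-mode estimators this abstract builds on are commonly illustrated with the classic forward-gradient baseline: sample a random direction v, compute the directional derivative (∇f·v) with one Jacobian-vector product, and use (∇f·v)v as an unbiased gradient estimate. Below is a minimal numpy sketch on a toy quadratic; this is the standard baseline, not the paper's bias/variance-reduced method, and a central-difference JVP stands in for true forward-mode AD.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # toy quadratic f(x) = ||x||^2 / 2, whose true gradient is x itself
    return 0.5 * np.dot(x, x)

def jvp(f, x, v, eps=1e-6):
    # central-difference Jacobian-vector product: directional derivative f'(x)·v
    # (stands in for true forward-mode AD; exact for quadratics)
    return (f(x + eps * v) - f(x - eps * v)) / (2 * eps)

def forward_gradient(f, x, n_samples=1):
    # unbiased estimate: E[(grad·v) v] = grad when v ~ N(0, I)
    est = np.zeros_like(x)
    for _ in range(n_samples):
        v = rng.standard_normal(x.shape)
        est += jvp(f, x, v) * v
    return est / n_samples

x = np.array([1.0, -2.0, 3.0])
g_hat = forward_gradient(f, x, n_samples=5000)
```

Averaging many such guesses recovers the true gradient in expectation; the high per-sample variance is exactly what the paper's Jacobian-manipulation technique aims to reduce so the approach can scale to larger networks.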
[LG-39] An Efficient Classification Model for Cyber Text
链接: https://arxiv.org/abs/2511.03107
作者: Md Sakhawat Hossen,Md. Zashid Iqbal Borshon,A. S. M. Badrudduza
类目: Machine Learning (cs.LG); Information Theory (cs.IT)
*备注:
Abstract:The rise of deep learning methodology and practice in recent years has brought about a severe consequence of increasing carbon footprint due to the insatiable demand for computational resources and power. The field of text analytics also experienced a massive transformation in this trend of monopolizing methodology. In this paper, the original TF-IDF algorithm has been modified, and Clement Term Frequency-Inverse Document Frequency (CTF-IDF) has been proposed for data preprocessing. This paper primarily discusses the effectiveness of classical machine learning techniques in text analytics with CTF-IDF and a faster IRLBA algorithm for dimensionality reduction. The introduction of both of these techniques in the conventional text analytics pipeline ensures a more efficient, faster, and less computationally intensive application when compared with deep learning methodology regarding carbon footprint, with minor compromise in accuracy. The experimental results also exhibit a manifold reduction in time complexity and improvement of model accuracy for the classical machine learning methods discussed further in this paper.
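The pipeline the abstract describes (TF-IDF-style features, a truncated SVD for dimensionality reduction, then classical classifiers) can be sketched as follows. The exact CTF-IDF weighting and the IRLBA solver are not given in the abstract, so standard TF-IDF and numpy's exact SVD stand in for them here; IRLBA computes the same leading singular factors iteratively at lower cost.

```python
import numpy as np

# Illustrative sketch only: the paper's CTF-IDF weighting and IRLBA solver are
# not specified in the abstract, so standard TF-IDF and an exact SVD stand in.
docs = [
    "free prize click now",
    "verify account password now",
    "meeting agenda attached",
    "quarterly report attached",
]

vocab = sorted({w for d in docs for w in d.split()})
tf = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)
idf = np.log(len(docs) / (tf > 0).sum(axis=0))  # inverse document frequency
tfidf = tf * idf

# Rank-2 truncated SVD as the dimensionality-reduction step
# (IRLBA would compute only these leading factors, iteratively).
U, s, Vt = np.linalg.svd(tfidf, full_matrices=False)
reduced = U[:, :2] * s[:2]
```

The low-dimensional `reduced` matrix would then be fed to a classical classifier (e.g., logistic regression or an SVM), which is where the efficiency gains over deep models come from.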
[LG-40] Online Learning to Rank under Corruption: A Robust Cascading Bandits Approach
链接: https://arxiv.org/abs/2511.03074
作者: Fatemeh Ghaffari,Siddarth Sitaraman,Xutong Liu,Xuchuang Wang,Mohammad Hajiesmaili
类目: Machine Learning (cs.LG)
*备注:
Abstract:Online learning to rank (OLTR) studies how to recommend a short ranked list of items from a large pool and improves future rankings based on user clicks. This setting is commonly modeled as cascading bandits, where the objective is to maximize the likelihood that the user clicks on at least one of the presented items across as many timesteps as possible. However, such systems are vulnerable to click fraud and other manipulations (i.e., corruption), where bots or paid click farms inject corrupted feedback that misleads the learning process and degrades user experience. In this paper, we propose MSUCB, a robust algorithm that incorporates a novel mean-of-medians estimator, which to our knowledge is applied to the bandits-with-corruption setting for the first time. This estimator behaves like a standard mean in the absence of corruption, so no cost is paid for robustness. Under corruption, the median step filters out outliers and corrupted samples, keeping the estimate close to its true value. Updating this estimate at every round further accelerates empirical convergence in experiments. Hence, MSUCB achieves optimal logarithmic regret in the absence of corruption and degrades gracefully under corruptions, with regret increasing only by an additive term tied to the total corruption. Comprehensive and extensive experiments on real-world datasets further demonstrate that our approach consistently outperforms prior methods while maintaining strong robustness. In particular, it achieves a 97.35% and a 91.60% regret improvement over two state-of-the-art methods.
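The abstract's mean-of-medians estimator is not spelled out in detail, but a natural reading (hypothetical here, the paper's exact grouping may differ) partitions the samples into groups, takes each group's median to filter corrupted feedback, and averages the group medians:

```python
import numpy as np

def mean_of_medians(samples, n_groups=5):
    # Hypothetical reading of the abstract's "mean-of-medians" estimator:
    # partition samples into groups, take each group's median (suppressing
    # outliers/corruptions), then average the group medians.
    samples = np.asarray(samples, dtype=float)
    groups = np.array_split(samples, n_groups)
    return float(np.mean([np.median(g) for g in groups]))

rng = np.random.default_rng(1)
clean = rng.normal(loc=0.5, scale=0.1, size=95)  # honest click feedback
corrupted = np.full(5, 100.0)                    # injected click-fraud feedback
data = rng.permutation(np.concatenate([clean, corrupted]))

robust = mean_of_medians(data)  # stays near the true click rate of 0.5
naive = float(np.mean(data))    # dragged far upward by the 5 corrupted samples
```

On clean data the group medians concentrate around the mean, matching the abstract's claim that no cost is paid for robustness; under corruption the median step discards the injected values that the plain mean absorbs.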
[LG-41] Homomorphism distortion: A metric to distinguish them all and in the latent space bind them
链接: https://arxiv.org/abs/2511.03068
作者: Martin Carrasco,Olga Zaghen,Erik Bekkers,Bastian Rieck
类目: Machine Learning (cs.LG)
*备注:
Abstract:For far too long, expressivity of graph neural networks has been measured only in terms of combinatorial properties. In this work we stray away from this tradition and provide a principled way to measure similarity between vertex-attributed graphs. We denote this measure as the graph homomorphism distortion. We show it can completely characterize graphs and thus is also a complete graph embedding. However, somewhere along the road, we run into the graph canonization problem. To circumvent this obstacle, we devise a way to efficiently compute this measure via sampling, which in expectation ensures completeness. Additionally, we also discovered that we can obtain a metric from this measure. We validate our claims empirically and find that the graph homomorphism distortion: (1) fully distinguishes the BREC dataset with up to 4-WL non-distinguishable graphs, and (2) outperforms previous homomorphism-inspired methods on the ZINC-12k dataset. These theoretical results (and their empirical validation) pave the way for future characterization of graphs, extending the graph-theoretic tradition to new frontiers.
[LG-42] Unsupervised Evaluation of Multi-Turn Objective-Driven Interactions ICLR2026
链接: https://arxiv.org/abs/2511.03047
作者: Emi Soroka,Tanmay Chopra,Krish Desai,Sanjay Lall
类目: Machine Learning (cs.LG)
*备注: Under review at ICLR 2026
Abstract:Large language models (LLMs) have seen increasing popularity in enterprise applications where AI agents and humans engage in objective-driven interactions. However, these systems are difficult to evaluate: data may be complex and unlabeled; human annotation is often impractical at scale; custom metrics can monitor for specific errors, but not previously-undetected ones; and LLM judges can produce unreliable results. We introduce the first set of unsupervised metrics for objective-driven interactions, leveraging statistical properties of unlabeled interaction data and using fine-tuned LLMs to adapt to distributional shifts. We develop metrics for labeling user goals, measuring goal completion, and quantifying LLM uncertainty without grounding evaluations in human-generated ideal responses. Our approach is validated on open-domain and task-specific interaction data.
[LG-43] Leveraging Discrete Function Decomposability for Scientific Design
链接: https://arxiv.org/abs/2511.03032
作者: James C. Bowden,Sergey Levine,Jennifer Listgarten
类目: Machine Learning (cs.LG)
*备注:
Abstract:In the era of AI-driven science and engineering, we often want to design discrete objects in silico according to user-specified properties. For example, we may wish to design a protein to bind its target, arrange components within a circuit to minimize latency, or find materials with certain properties. Given a property predictive model, in silico design typically involves training a generative model over the design space (e.g., protein sequence space) to concentrate on designs with the desired properties. Distributional optimization – which can be formalized as an estimation of distribution algorithm or as reinforcement learning policy optimization – finds the generative model that maximizes an objective function in expectation. Optimizing a distribution over discrete-valued designs is in general challenging because of the combinatorial nature of the design space. However, many property predictors in scientific applications are decomposable in the sense that they can be factorized over design variables in a way that could in principle enable more effective optimization. For example, amino acids at a catalytic site of a protein may only loosely interact with amino acids of the rest of the protein to achieve maximal catalytic activity. Current distributional optimization algorithms are unable to make use of such decomposability structure. Herein, we propose and demonstrate use of a new distributional optimization algorithm, Decomposition-Aware Distributional Optimization (DADO), that can leverage any decomposability defined by a junction tree on the design variables, to make optimization more efficient. At its core, DADO employs a soft-factorized “search distribution” – a learned generative model – for efficient navigation of the search space, invoking graph message-passing to coordinate optimization across linked factors.
[LG-44] Exploratory Analysis of Cyberattack Patterns on E-Commerce Platforms Using Statistical Methods
链接: https://arxiv.org/abs/2511.03020
作者: Fatimo Adenike Adeniya(York St John University, London Campus, London, United Kingdom)
类目: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
*备注: 32 pages, 9 figures, 6 tables; MSc Research Dissertation, York St John University, London Campus
Abstract:Cyberattacks on e-commerce platforms have grown in sophistication, threatening consumer trust and operational continuity. This research presents a hybrid analytical framework that integrates statistical modelling and machine learning for detecting and forecasting cyberattack patterns in the e-commerce domain. Using the Verizon Community Data Breach (VCDB) dataset, the study applies Auto ARIMA for temporal forecasting and significance testing, including a Mann-Whitney U test (U = 2579981.5, p = 0.0121), which confirmed that holiday shopping events experienced significantly more severe cyberattacks than non-holiday periods. ANOVA was also used to examine seasonal variation in threat severity, while ensemble machine learning models (XGBoost, LightGBM, and CatBoost) were employed for predictive classification. Results reveal recurrent attack spikes during high-risk periods such as Black Friday and holiday seasons, with breaches involving Personally Identifiable Information (PII) exhibiting elevated threat indicators. Among the models, CatBoost achieved the highest performance (accuracy = 85.29%, F1 score = 0.2254, ROC AUC = 0.8247). The framework uniquely combines seasonal forecasting with interpretable ensemble learning, enabling temporal risk anticipation and breach-type classification. Ethical considerations, including responsible use of sensitive data and bias assessment, were incorporated. Despite class imbalance and reliance on historical data, the study provides insights for proactive cybersecurity resource allocation and outlines directions for future real-time threat detection research.
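The Mann-Whitney U test used for the holiday vs. non-holiday severity comparison can be reproduced on synthetic data as follows. The severity scores below are made up for illustration (the real analysis uses the VCDB dataset), and `scipy.stats.mannwhitneyu` packages the same computation in one call; here it is written out with numpy and the usual normal approximation.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(42)
# Synthetic severity scores standing in for the VCDB breach records:
# holiday-period attacks drawn with a slightly higher location.
holiday = rng.normal(loc=5.5, scale=1.0, size=200)
non_holiday = rng.normal(loc=5.0, scale=1.0, size=200)

n1, n2 = len(holiday), len(non_holiday)
combined = np.concatenate([holiday, non_holiday])
ranks = combined.argsort().argsort() + 1        # ranks 1..n1+n2 (no ties: continuous data)
u_stat = ranks[:n1].sum() - n1 * (n1 + 1) / 2   # Mann-Whitney U for the holiday sample

# Normal approximation for the one-sided p-value
# (alternative: holiday severities are stochastically greater).
mu = n1 * n2 / 2
sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u_stat - mu) / sigma
p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))
```

A small p-value here, as in the paper's reported U = 2579981.5, p = 0.0121, rejects the hypothesis that the two periods share the same severity distribution.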
[LG-45] Discrete Bayesian Sample Inference for Graph Generation
链接: https://arxiv.org/abs/2511.03015
作者: Ole Petersen,Marcel Kollovieh,Marten Lienen,Stephan Günnemann
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:
Abstract:Generating graph-structured data is crucial in applications such as molecular generation, knowledge graphs, and network analysis. However, their discrete, unordered nature makes them difficult for traditional generative models, leading to the rise of discrete diffusion and flow matching models. In this work, we introduce GraphBSI, a novel one-shot graph generative model based on Bayesian Sample Inference (BSI). Instead of evolving samples directly, GraphBSI iteratively refines a belief over graphs in the continuous space of distribution parameters, naturally handling discrete structures. Further, we state BSI as a stochastic differential equation (SDE) and derive a noise-controlled family of SDEs that preserves the marginal distributions via an approximation of the score function. Our theoretical analysis further reveals the connection to Bayesian Flow Networks and Diffusion models. Finally, in our empirical evaluation, we demonstrate state-of-the-art performance on molecular and synthetic graph generation, outperforming existing one-shot graph generative models on the standard benchmarks Moses and GuacaMol.
[LG-46] Heterogeneous Metamaterials Design via Multiscale Neural Implicit Representation
链接: https://arxiv.org/abs/2511.03012
作者: Hongrui Chen,Liwei Wang,Levent Burak Kara
类目: Machine Learning (cs.LG)
*备注:
Abstract:Metamaterials are engineered materials composed of specially designed unit cells that exhibit extraordinary properties beyond those of natural materials. Complex engineering tasks often require heterogeneous unit cells to accommodate spatially varying property requirements. However, designing heterogeneous metamaterials poses significant challenges due to the enormous design space and strict compatibility requirements between neighboring cells. Traditional concurrent multiscale design methods require solving an expensive optimization problem for each unit cell and often suffer from discontinuities at cell boundaries. On the other hand, data-driven approaches that assemble structures from a fixed library of microstructures are limited by the dataset and require additional post-processing to ensure seamless connections. In this work, we propose a neural network-based metamaterial design framework that learns a continuous two-scale representation of the structure, thereby jointly addressing these challenges. Central to our framework is a multiscale neural representation in which the neural network takes both global (macroscale) and local (microscale) coordinates as inputs, outputting an implicit field that represents multiscale structures with compatible unit cell geometries across the domain, without the need for a predefined dataset. We use a compatibility loss term during training to enforce connectivity between adjacent unit cells. Once trained, the network can produce metamaterial designs at arbitrarily high resolution, hence enabling infinite upsampling for fabrication or simulation. We demonstrate the effectiveness of the proposed approach on mechanical metamaterial design, negative Poisson’s ratio, and mechanical cloaking problems with potential applications in robotics, bioengineering, and aerospace.
[LG-47] Inference-Time Personalized Alignment with a Few User Preference Queries NEURIPS’25
链接: https://arxiv.org/abs/2511.02966
作者: Victor-Alexandru Pădurean,Parameswaran Kamalaruban,Nachiket Kotalwar,Alkis Gotovos,Adish Singla
类目: Machine Learning (cs.LG)
*备注: NeurIPS’25 paper
Abstract:We study the problem of aligning a generative model’s response with a user’s preferences. Recent works have proposed several different formulations for personalized alignment; however, they either require a large amount of user preference queries or require that the preference be explicitly specified as a text input. In this paper, we propose a novel inference-time personalized alignment method, UserAlign, that elicits the user’s preferences with a few queries as pairwise response comparisons. In particular, UserAlign builds on the theoretical framework of best-arm identification in logistic bandits and selects a personalized response from a fixed pool of the model’s generated responses. The key idea is to consider the user’s feedback consistent and noise-free, and incorporate it into the theoretical framework to identify the best response quickly. Experimental results across several tasks, involving personalized text and image generation, showcase the effectiveness of UserAlign in achieving personalized alignment.
[LG-48] Digital Twin-Driven Pavement Health Monitoring and Maintenance Optimization Using Graph Neural Networks
链接: https://arxiv.org/abs/2511.02957
作者: Mohsin Mahmud Topu,Mahfuz Ahmed Anik,Azmine Toushik Wasi,Md Manjurul Ahsan
类目: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY)
*备注:
Abstract:Pavement infrastructure monitoring is challenged by complex spatial dependencies, changing environmental conditions, and non-linear deterioration across road networks. Traditional Pavement Management Systems (PMS) remain largely reactive, lacking real-time intelligence for failure prevention and optimal maintenance planning. To address this, we propose a unified Digital Twin (DT) and Graph Neural Network (GNN) framework for scalable, data-driven pavement health monitoring and predictive maintenance. Pavement segments and spatial relations are modeled as graph nodes and edges, while real-time UAV, sensor, and LiDAR data stream into the DT. The inductive GNN learns deterioration patterns from graph-structured inputs to forecast distress and enable proactive interventions. Trained on a real-world-inspired dataset with segment attributes and dynamic connectivity, our model achieves an R2 of 0.3798, outperforming baseline regressors and effectively capturing non-linear degradation. We also develop an interactive dashboard and reinforcement learning module for simulation, visualization, and adaptive maintenance planning. This DT-GNN integration enhances forecasting precision and establishes a closed feedback loop for continuous improvement, positioning the approach as a foundation for proactive, intelligent, and sustainable pavement management, with future extensions toward real-world deployment, multi-agent coordination, and smart-city integration.
[LG-49] Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models
链接: https://arxiv.org/abs/2511.02894
作者: W.K.M Mithsara,Ning Yang,Ahmed Imteaj,Hussein Zangoti,Abdur R. Shahid
类目: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
*备注:
Abstract:The widespread integration of wearable sensing devices in Internet of Things (IoT) ecosystems, particularly in healthcare, smart homes, and industrial applications, has required robust human activity recognition (HAR) techniques to improve functionality and user experience. Although machine learning models have advanced HAR, they are increasingly susceptible to data poisoning attacks that compromise the data integrity and reliability of these systems. Conventional approaches to defending against such attacks often require extensive task-specific training with large, labeled datasets, which limits adaptability in dynamic IoT environments. This work proposes a novel framework that uses large language models (LLMs) to perform poisoning detection and sanitization in HAR systems, utilizing zero-shot, one-shot, and few-shot learning paradigms. Our approach incorporates \textitrole play prompting, whereby the LLM assumes the role of expert to contextualize and evaluate sensor anomalies, and \textitthink step-by-step reasoning, guiding the LLM to infer poisoning indicators in the raw sensor data and plausible clean alternatives. These strategies minimize reliance on curation of extensive datasets and enable robust, adaptable defense mechanisms in real-time. We perform an extensive evaluation of the framework, quantifying detection accuracy, sanitization quality, latency, and communication cost, thus demonstrating the practicality and effectiveness of LLMs in improving the security and reliability of wearable IoT systems.
[LG-50] Supersimulators
链接: https://arxiv.org/abs/2509.17994
作者: Cynthia Dwork,Pranay Tankala
类目: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
*备注:
Abstract:We prove that every randomized Boolean function admits a supersimulator: a randomized polynomial-size circuit whose output on random inputs cannot be efficiently distinguished from reality with constant advantage, even by polynomially larger distinguishers. Our result builds on the landmark complexity-theoretic regularity lemma of Trevisan, Tulsiani and Vadhan (2009), which, in contrast, provides a simulator that fools smaller distinguishers. We circumvent lower bounds for the simulator size by letting the distinguisher size bound vary with the target function, while remaining below an absolute upper bound independent of the target function. This dependence on the target function arises naturally from our use of an iteration technique originating in the graph regularity literature. The simulators provided by the regularity lemma and recent refinements thereof, known as multiaccurate and multicalibrated predictors, respectively, as per Hebert-Johnson et al. (2018), have previously been shown to have myriad applications in complexity theory, cryptography, learning theory, and beyond. We first show that a recent multicalibration-based characterization of the computational indistinguishability of product distributions actually requires only (calibrated) multiaccuracy. We then show that supersimulators yield an even tighter result in this application domain, closing a complexity gap present in prior versions of the characterization.
[LG-51] The Adaptivity Barrier in Batched Nonparametric Bandits: Sharp Characterization of the Price of Unknown Margin
链接: https://arxiv.org/abs/2511.03708
作者: Rong Jiang,Cong Ma
类目: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:
Abstract:We study batched nonparametric contextual bandits under a margin condition when the margin parameter \alpha is unknown. To capture the statistical price of this ignorance, we introduce the regret inflation criterion, defined as the ratio between the regret of an adaptive algorithm and that of an oracle knowing \alpha. We show that the optimal regret inflation grows polynomially with the horizon T, with exponent precisely given by the value of a convex optimization problem involving the dimension, smoothness, and batch budget. Moreover, the minimizers of this optimization problem directly prescribe the batch allocation and exploration strategy of a rate-optimal algorithm. Building on this principle, we develop RoBIN (RObust batched algorithm with adaptive BINning), which achieves the optimal regret inflation up to logarithmic factors. These results reveal a new adaptivity barrier: under batching, adaptation to an unknown margin parameter inevitably incurs a polynomial penalty, sharply characterized by a variational problem. Remarkably, this barrier vanishes when the number of batches exceeds \log\log T; with only a doubly logarithmic number of updates, one can recover the oracle regret rate up to polylogarithmic factors.
[LG-52] Colorectal Cancer Histopathological Grading using Multi-Scale Federated Learning
链接: https://arxiv.org/abs/2511.03693
作者: Md Ahasanul Arafath,Abhijit Kumar Ghosh,Md Rony Ahmed,Sabrin Afroz,Minhazul Hosen,Md Hasan Moon,Md Tanzim Reza,Md Ashad Alam
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注: 15 pages and 7 figures
Abstract:Colorectal cancer (CRC) grading is a critical prognostic factor but remains hampered by inter-observer variability and the privacy constraints of multi-institutional data sharing. While deep learning offers a path to automation, centralized training models conflict with data governance regulations and neglect the diagnostic importance of multi-scale analysis. In this work, we propose a scalable, privacy-preserving federated learning (FL) framework for CRC histopathological grading that integrates multi-scale feature learning within a distributed training paradigm. Our approach employs a dual-stream ResNetRS50 backbone to concurrently capture fine-grained nuclear detail and broader tissue-level context. This architecture is integrated into a robust FL system stabilized using FedProx to mitigate client drift across heterogeneous data distributions from multiple hospitals. Extensive evaluation on the CRC-HGD dataset demonstrates that our framework achieves an overall accuracy of 83.5%, outperforming a comparable centralized model (81.6%). Crucially, the system excels in identifying the most aggressive Grade III tumors with a high recall of 87.5%, a key clinical priority to prevent dangerous false negatives. Performance further improves with higher magnification, reaching 88.0% accuracy at 40x. These results validate that our federated multi-scale approach not only preserves patient privacy but also enhances model performance and generalization. The proposed modular pipeline, with built-in preprocessing, checkpointing, and error handling, establishes a foundational step toward deployable, privacy-aware clinical AI for digital pathology.
[LG-53] Quantifying Weighted Morphological Content of Large-Scale Structures via Simulation-Based Inference
链接: https://arxiv.org/abs/2511.03636
作者: M. H. Jalali Kanafi,S. M. S. Movahed
类目: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
*备注: 19 pages, 9 figures and 3 tables. Comments are welcome
Abstract:In this work, we perform a simulation-based forecasting analysis to compare the constraining power of two higher-order summary statistics of the large-scale structure (LSS), the Minkowski Functionals (MFs) and the Conditional Moments of Derivative (CMD), with a particular focus on their sensitivity to nonlinear and anisotropic features in redshift space. Our analysis relies on halo catalogs from the Big Sobol Sequence (BSQ) simulations at redshift z=0.5, employing a likelihood-free inference framework implemented via neural posterior estimation. At the fiducial cosmology of the Quijote simulations (\Omega_m=0.3175, \sigma_8=0.834), and for the smoothing scale R=15 h^{-1} Mpc, we find that the CMD yields tighter forecasts for (\Omega_m, \sigma_8) than the zeroth- to third-order MFs components, improving the constraint precision by \sim(44%, 52%), \sim(30%, 45%), \sim(27%, 17%), and \sim(26%, 17%), respectively. A joint configuration combining the MFs and CMD further enhances the precision by approximately \sim27% compared to the standard MFs alone, highlighting the complementary anisotropy-sensitive information captured by the CMD in contrast to the scalar morphological content encapsulated by the MFs. We further extend the forecasting analysis to a continuous range of cosmological parameter values and multiple smoothing scales. Our results show that, although the absolute forecast uncertainty for each component of summary statistics depends on the underlying parameter values and the adopted smoothing scale, the relative constraining power among the summary statistics remains nearly constant throughout.
[LG-54] Vector-valued self-normalized concentration inequalities beyond sub-Gaussianity
链接: https://arxiv.org/abs/2511.03606
作者: Diego Martinez-Taboada,Tomas Gonzalez,Aaditya Ramdas
类目: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
*备注:
Abstract:The study of self-normalized processes plays a crucial role in a wide range of applications, from sequential decision-making to econometrics. While the behavior of self-normalized concentration has been widely investigated for scalar-valued processes, vector-valued processes remain comparatively underexplored, especially outside of the sub-Gaussian framework. In this contribution, we provide concentration bounds for self-normalized processes with light tails beyond sub-Gaussianity (such as Bennett or Bernstein bounds). We illustrate the relevance of our results in the context of online linear regression, with applications in (kernelized) linear bandits.
[LG-55] The Structure of Cross-Validation Error: Stability, Covariance and Minimax Limits
链接: https://arxiv.org/abs/2511.03554
作者: Ido Nachum,Rüdiger Urbanke,Thomas Weinberger
类目: Statistics Theory (math.ST); Machine Learning (cs.LG)
*备注: 59 pages
Abstract:Despite ongoing theoretical research on cross-validation (CV), many theoretical questions about CV remain widely open. This motivates our investigation into how properties of algorithm-distribution pairs can affect the choice for the number of folds in $k$-fold cross-validation. Our results consist of a novel decomposition of the mean-squared error of cross-validation for risk estimation, which explicitly captures the correlations of error estimates across overlapping folds and includes a novel algorithmic stability notion, squared loss stability, that is considerably weaker than the typically required hypothesis stability in other comparable works. Furthermore, we prove: 1. For every learning algorithm that minimizes empirical error, a minimax lower bound on the mean-squared error of $k$-fold CV estimating the population risk $L_{\mathcal{D}}$: $$\min_{k \mid n}\; \max_{\mathcal{D}}\; \mathbb{E}\!\left[\big(\widehat{L}_{\mathrm{CV}}^{(k)} - L_{\mathcal{D}}\big)^2\right] = \Omega\!\big(\sqrt{k}/n\big),$$ where $n$ is the sample size and $k$ the number of folds. This shows that even under idealized conditions, for large values of $k$, CV cannot attain the optimum of order $1/n$ achievable by a validation set of size $n$, reflecting an inherent penalty caused by dependence between folds. 2. Complementing this, we exhibit learning rules for which $$\max_{\mathcal{D}}\; \mathbb{E}\!\left[\big(\widehat{L}_{\mathrm{CV}}^{(k)} - L_{\mathcal{D}}\big)^2\right] = \Omega(k/n),$$ matching (up to constants) the accuracy of a hold-out estimator of a single fold of size $n/k$. Together these results delineate the fundamental trade-off in resampling-based risk estimation: CV cannot fully exploit all $n$ samples for unbiased risk evaluation, and its minimax performance is pinned between the $k/n$ and $\sqrt{k}/n$ regimes.
Submitted: [v1] Wed, 5 Nov 2025 15:35:46 UTC (60 KB)
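As a quick, hypothetical illustration of the quantities involved (a toy simulation, not the paper's analysis), the $k$-fold CV risk estimate for a simple learner, the empirical mean under squared loss, can be computed directly; the overlap of training sets across folds is what induces the correlations the paper's decomposition captures:

```python
import numpy as np

rng = np.random.default_rng(0)

def kfold_cv_mse(x, k):
    """k-fold CV estimate of the squared-error risk of the empirical mean."""
    folds = np.array_split(x, k)
    errs = []
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        mu_hat = train.mean()                            # learner: empirical mean
        errs.append(np.mean((folds[i] - mu_hat) ** 2))   # held-out squared loss
    return float(np.mean(errs))

n = 120
est2 = kfold_cv_mse(rng.normal(size=n), 2)    # 2-fold estimate
est10 = kfold_cv_mse(rng.normal(size=n), 10)  # 10-fold estimate

# Monte-Carlo spread of the CV estimate for two fold counts; training-set
# overlap across folds couples the per-fold error estimates.
spread = {k: float(np.std([kfold_cv_mse(rng.normal(size=n), k) for _ in range(200)]))
          for k in (2, 10)}
```

For standard normal data the true risk of the fitted mean is close to 1, and both estimates land near it; the Monte-Carlo spread shows how the estimator's variability depends on the fold count.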
[LG-56] A Support-Set Algorithm for Optimization Problems with Nonnegative and Orthogonal Constraints
链接: https://arxiv.org/abs/2511.03443
作者: Lei Wang,Xin Liu,Xiaojun Chen
类目: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:
Abstract:In this paper, we investigate optimization problems with nonnegative and orthogonal constraints, where any feasible matrix of size $n \times p$ exhibits a sparsity pattern such that each row accommodates at most one nonzero entry. Our analysis demonstrates that, by fixing the support set, the global solution of the minimization subproblem for the proximal linearization of the objective function can be computed in closed form with at most $n$ nonzero entries. Exploiting this structural property offers a powerful avenue for dramatically enhancing computational efficiency. Guided by this insight, we propose a support-set algorithm preserving strictly the feasibility of iterates. A central ingredient is a strategically devised update scheme for support sets that adjusts the placement of nonzero entries. We establish the global convergence of the support-set algorithm to a first-order stationary point, and show that its iteration complexity required to reach an $\epsilon$-approximate first-order stationary point is $O(\epsilon^{-2})$. Numerical results are strongly in favor of our algorithm in real-world applications, including nonnegative PCA, clustering, and community detection.
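The feasibility structure described above (nonnegativity plus orthonormal columns forces at most one nonzero per row) can be verified numerically; the assignment vector below is a hypothetical support set for illustration, not the paper's update scheme:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3

# Support set: each row is assigned to at most one column, every column nonempty.
assign = np.array([0, 0, 1, 1, 1, 2, 2, 0])
vals = rng.uniform(0.5, 2.0, size=n)        # arbitrary positive magnitudes

X = np.zeros((n, p))
X[np.arange(n), assign] = vals
X /= np.linalg.norm(X, axis=0)              # normalize columns -> orthonormal
```

Because the columns have disjoint supports, normalizing them yields a nonnegative matrix with orthonormal columns, exactly the feasible set of the problem.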
[LG-57] Influence of Data Dimensionality Reduction Methods on the Effectiveness of Quantum Machine Learning Models
链接: https://arxiv.org/abs/2511.03320
作者: Aakash Ravindra Shinde,Jukka K. Nurminen
类目: Quantum Physics (quant-ph); Machine Learning (cs.LG)
*备注: 12 pages, IEEE International Conference on Quantum Computing Engineering (QCE25)
Abstract:Data dimensionality reduction techniques are often utilized in the implementation of Quantum Machine Learning models to address two significant issues: the constraints of NISQ quantum devices, which are characterized by noise and a limited number of qubits, and the challenge of simulating a large number of qubits on classical devices. This reliance also raises concerns over scalability, as dimensionality reduction methods are slow to adapt to large datasets. In this article, we analyze how data reduction methods affect different QML models. We conduct this experiment over several generated datasets, quantum machine learning algorithms, quantum data encoding methods, and data reduction methods. All these models were evaluated on performance metrics such as accuracy, precision, recall, and F1 score. Our findings have led us to conclude that the usage of data dimensionality reduction methods results in skewed performance metric values, which leads to wrongly estimating the actual performance of quantum machine learning models. Several factors, alongside data dimensionality reduction methods, worsen this problem, such as characteristics of the datasets, classical-to-quantum information embedding methods, percentage of feature reduction, classical components associated with quantum models, and structure of quantum machine learning models. We consistently observed accuracy differences of 14% to 48% between models that used data reduction and those that did not. Apart from this, our observations have shown that some data reduction methods tend to perform better for some specific data embedding methodologies and ansatz constructions.
[LG-58] Topography, climate, land cover and biodiversity: Explaining endemic richness and management implications on a Mediterranean island
链接: https://arxiv.org/abs/2511.03242
作者: Aristides Moustakas,Ioannis N Vogiatzakis
类目: Populations and Evolution (q-bio.PE); Machine Learning (cs.LG); Other Statistics (stat.OT)
*备注:
Abstract:Island endemism is shaped by complex interactions among environmental, ecological, and evolutionary factors, yet the relative contributions of topography, climate, and land cover remain incompletely quantified. We investigated the drivers of endemic plant richness across Crete, a Mediterranean biodiversity hotspot, using spatially explicit data on species distributions, topographic complexity, climatic variability, land cover, and soil characteristics. Artificial Neural Network models, a machine learning tool, were employed to assess the relative importance of these predictors and to identify hotspots of endemism. We found that total species richness, elevation range, and climatic variability were the strongest predictors of endemic richness, reflecting the role of biodiversity, topographic heterogeneity, and climatic gradients in generating diverse habitats and micro-refugia that promote speciation and buffer extinction risk. Endemic hotspots only partially overlapped with areas of high total species richness, indicating that total species richness was the best surrogate among the predictors examined, yet an imperfect one. These environmentally heterogeneous areas also provide critical ecosystem services, including soil stabilization, pollination, and cultural value, which are increasingly threatened by tourism, renewable energy development, land-use change, and climate impacts. Our findings underscore the importance of prioritizing mountainous and climatically variable regions in conservation planning, integrating ecosystem service considerations, and accounting for within-island spatial heterogeneity. By explicitly linking the environmental drivers of endemism to both biodiversity patterns and ecosystem function, this study provides a framework for evidence-based conservation planning in Crete and other Mediterranean islands with similar geological and biogeographic contexts.
[LG-59] RKUM: An R Package for Robust Kernel Unsupervised Methods
链接: https://arxiv.org/abs/2511.03216
作者: Md Ashad Alam
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注: 26 pages, 2 figures
Abstract:RKUM is an R package developed for implementing robust kernel-based unsupervised methods. It provides functions for estimating the robust kernel covariance operator (CO) and the robust kernel cross-covariance operator (CCO) using generalized loss functions instead of the conventional quadratic loss. These operators form the foundation of robust kernel learning and enable reliable analysis under contaminated or noisy data conditions. The package includes implementations of robust kernel canonical correlation analysis (Kernel CCA), as well as the influence function (IF) for both standard and multiple kernel CCA frameworks. The influence function quantifies sensitivity and helps detect influential or outlying observations across two-view and multi-view datasets. Experiments using synthesized two-view and multi-view data demonstrate that the IF of the standard kernel CCA effectively identifies outliers, while the robust kernel methods implemented in RKUM exhibit reduced sensitivity to contamination. Overall, RKUM provides an efficient and extensible platform for robust kernel-based analysis in high-dimensional data applications.
[LG-60] Provable Separations between Memorization and Generalization in Diffusion Models
链接: https://arxiv.org/abs/2511.03202
作者: Zeqi Ye,Qijie Zhu,Molei Tao,Minshuo Chen
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注: 51 pages, 4 figures
Abstract:Diffusion models have achieved remarkable success across diverse domains, but they remain vulnerable to memorization – reproducing training data rather than generating novel outputs. This not only limits their creative potential but also raises concerns about privacy and safety. While empirical studies have explored mitigation strategies, theoretical understanding of memorization remains limited. We address this gap through developing a dual-separation result via two complementary perspectives: statistical estimation and network approximation. From the estimation side, we show that the ground-truth score function does not minimize the empirical denoising loss, creating a separation that drives memorization. From the approximation side, we prove that implementing the empirical score function requires network size to scale with sample size, spelling a separation compared to the more compact network representation of the ground-truth score function. Guided by these insights, we develop a pruning-based method that reduces memorization while maintaining generation quality in diffusion transformers.
[LG-61] Statistical Properties of Rectified Flow
链接: https://arxiv.org/abs/2511.03193
作者: Gonzalo Mena,Arun Kumar Kuchibhotla,Larry Wasserman
类目: Statistics Theory (math.ST); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
*备注: 159 pages, 7 figures
Abstract:Rectified flow (Liu et al., 2022; Liu, 2022; Wu et al., 2023) is a method for defining a transport map between two distributions, and enjoys popularity in machine learning, although theoretical results supporting the validity of these methods are scant. The rectified flow can be regarded as an approximation to optimal transport, but in contrast to other transport methods that require optimization over a function space, computing the rectified flow only requires standard statistical tools such as regression or density estimation. Because of this, one can leverage standard data analysis tools for regression and density estimation to develop empirical versions of transport maps. We study some structural properties of the rectified flow, including existence, uniqueness, and regularity, as well as the related statistical properties, such as rates of convergence and central limit theorems, for some selected estimators. To do so, we analyze separately the bounded and unbounded cases as each presents unique challenges. In both cases, we are able to establish convergence at faster rates than the ones for the usual nonparametric regression and density estimation.
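A minimal one-dimensional sketch of the rectified-flow recipe described above, using Nadaraya-Watson regression for the velocity field and Euler integration (illustrative only: the Gaussian endpoints, bandwidth, and sample sizes are assumptions, not the estimators analyzed in the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 2000
x0 = rng.normal(0.0, 1.0, m)       # source samples
x1 = rng.normal(4.0, 1.0, m)       # target samples (independent coupling)

def velocity(x, t, h=0.3):
    """Nadaraya-Watson regression estimate of E[X1 - X0 | X_t = x],
    where X_t = (1 - t) X0 + t X1 is the linear interpolant."""
    xt = (1 - t) * x0 + t * x1
    w = np.exp(-0.5 * ((xt - x) / h) ** 2)
    return np.sum(w * (x1 - x0)) / (np.sum(w) + 1e-12)

# Euler integration of dz/dt = v(z, t) pushes source points toward the target law
steps = 40
z = rng.normal(0.0, 1.0, 100)
for s in range(steps):
    t = s / steps
    z = z + np.array([velocity(zi, t) for zi in z]) / steps
```

After integration the transported points approximately follow the target distribution N(4, 1), illustrating how the transport map is built from plain regression rather than an optimization over a function space.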
[LG-62] Modeling Headway in Heterogeneous and Mixed Traffic Flow: A Statistical Distribution Based on a General Exponential Function
链接: https://arxiv.org/abs/2511.03154
作者: Natchaphon Leungbootnak,Zihao Li,Zihang Wei,Dominique Lord,Yunlong Zhang
类目: Applications (stat.AP); Machine Learning (cs.LG)
*备注:
Abstract:The ability of existing headway distributions to accurately reflect the diverse behaviors and characteristics in heterogeneous traffic (different types of vehicles) and mixed traffic (human-driven vehicles with autonomous vehicles) is limited, leading to unsatisfactory goodness of fit. To address these issues, we modified the exponential function to obtain a novel headway distribution. Rather than employing Euler’s number (e) as the base of the exponential function, we utilized a real number base to provide greater flexibility in modeling the observed headway. However, the proposed function is not a probability density function; we therefore normalize it to obtain the probability density and derive the closed-form equation. In this study, we conducted a comprehensive experiment with five open datasets: highD, exiD, NGSIM, Waymo, and Lyft to evaluate the performance of the proposed distribution and compared its performance with six existing distributions under mixed and heterogeneous traffic flow. The results revealed that the proposed distribution not only captures the fundamental characteristics of headway distribution but also provides physically meaningful parameters that describe the distribution shape of observed headways. Under heterogeneous flow on highways (i.e., uninterrupted traffic flow), the proposed distribution outperforms other candidate distributions. Under urban road conditions (i.e., interrupted traffic flow), including heterogeneous and mixed traffic, the proposed distribution still achieves decent results.
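The abstract does not spell out the full functional form, but the core idea, an exponential with a real base $b$ rather than $e$, can be sketched: for a hypothetical density proportional to $b^{-\lambda h}$ on $[0, \infty)$, the normalizer is available in closed form, since the integral equals $1/(\lambda \ln b)$:

```python
import numpy as np

b, lam = 1.7, 0.9                  # hypothetical base b > 1 and rate

def pdf(h):
    # Normalized density: integral of b**(-lam*h) over [0, inf) is 1/(lam*ln b)
    return lam * np.log(b) * b ** (-lam * h)

# Numerical check of total mass and mean on a fine grid
h = np.linspace(0.0, 60.0, 200001)
dh = h[1] - h[0]
mass = float(pdf(h).sum() * dh)            # should be ~1
mean = float((h * pdf(h)).sum() * dh)      # closed form: 1/(lam * ln b)
```

Since $b^{-\lambda h} = e^{-\lambda h \ln b}$, this particular form is a reparameterized exponential; the flexibility claimed in the paper presumably comes from a richer parameterization than this minimal sketch.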
[LG-63] Provable Accelerated Bayesian Optimization with Knowledge Transfer
链接: https://arxiv.org/abs/2511.03125
作者: Haitao Lin,Boxin Zhao,Mladen Kolar,Chong Liu
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注:
Abstract:We study how Bayesian optimization (BO) can be accelerated on a target task with historical knowledge transferred from related source tasks. Existing works on BO with knowledge transfer either do not have theoretical guarantees or achieve the same regret as BO in the non-transfer setting, $\tilde{\mathcal{O}}(\sqrt{T \gamma_f})$, where $T$ is the number of evaluations of the target function and $\gamma_f$ denotes its information gain. In this paper, we propose the DeltaBO algorithm, in which a novel uncertainty-quantification approach is built on the difference function $\delta$ between the source and target functions, which are allowed to belong to different reproducing kernel Hilbert spaces (RKHSs). Under mild assumptions, we prove that the regret of DeltaBO is of order $\tilde{\mathcal{O}}(\sqrt{T (T/N + \gamma_\delta)})$, where $N$ denotes the number of evaluations from source tasks and typically $N \gg T$. In many applications, source and target tasks are similar, which implies that $\gamma_\delta$ can be much smaller than $\gamma_f$. Empirical studies on both real-world hyperparameter tuning tasks and synthetic functions show that DeltaBO outperforms other baseline methods and support our theoretical claims.
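The difference-function idea can be sketched with plain ridge regression standing in for the paper's GP/RKHS machinery (the feature map, functions, and sample sizes below are assumptions for illustration): fit a surrogate on abundant source data, then model only the residual $\delta$ from the few target evaluations:

```python
import numpy as np

rng = np.random.default_rng(3)

def features(x):                           # simple polynomial feature map
    return np.stack([np.ones_like(x), x, x**2, x**3], axis=1)

f_src = lambda x: np.sin(3 * x)            # hypothetical source objective
f_tgt = lambda x: np.sin(3 * x) + 0.3 * x  # target = source + small difference

# Abundant source evaluations (N) and scarce target evaluations (T), N >> T
xs = rng.uniform(-1, 1, 400); ys = f_src(xs) + 0.05 * rng.normal(size=400)
xt = rng.uniform(-1, 1, 12);  yt = f_tgt(xt) + 0.05 * rng.normal(size=12)

def ridge(X, y, reg=1e-3):
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ y)

w_src = ridge(features(xs), ys)            # surrogate of the source function
resid = yt - features(xt) @ w_src          # target residuals estimate delta
w_delta = ridge(features(xt), resid)       # model only the difference function

xq = np.linspace(-1, 1, 101)
pred = features(xq) @ (w_src + w_delta)    # transferred target prediction
err = float(np.mean((pred - f_tgt(xq)) ** 2))
```

When the tasks are similar, the difference function is simpler than the target itself, so the few target points suffice to correct the source surrogate; this mirrors, in a crude way, why $\gamma_\delta$ being small helps DeltaBO.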
[LG-64] Quantifying Articulatory Coordination as a Biomarker for Schizophrenia ICASSP2026
链接: https://arxiv.org/abs/2511.03084
作者: Gowtham Premananth,Carol Espy-Wilson
类目: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Signal Processing (eess.SP)
*备注: Submitted to ICASSP 2026
Abstract:Advances in artificial intelligence (AI) and deep learning have improved diagnostic capabilities in healthcare, yet limited interpretability continues to hinder clinical adoption. Schizophrenia, a complex disorder with diverse symptoms including disorganized speech and social withdrawal, demands tools that capture symptom severity and provide clinically meaningful insights beyond binary diagnosis. Here, we present an interpretable framework that leverages articulatory speech features through eigenspectra difference plots and a weighted sum with exponential decay (WSED) to quantify vocal tract coordination. Eigenspectra plots effectively distinguished complex from simpler coordination patterns, and WSED scores reliably separated these groups, with ambiguity confined to a narrow range near zero. Importantly, WSED scores correlated not only with overall BPRS severity but also with the balance between positive and negative symptoms, reflecting more complex coordination in subjects with pronounced positive symptoms and the opposite trend for stronger negative symptoms. This approach offers a transparent, severity-sensitive biomarker for schizophrenia, advancing the potential for clinically interpretable speech-based assessment tools.
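The abstract does not give the exact WSED formula; one hypothetical reading, a weighted sum over an eigenspectra difference with exponentially decaying weights so that leading eigenvalues dominate, can be sketched as follows (the spectra and decay rate are invented for illustration):

```python
import numpy as np

def wsed(eig_diff, decay=0.5):
    """Weighted sum of an eigenspectrum difference with exponentially
    decaying weights: low-order (largest) eigenvalues dominate the score."""
    i = np.arange(len(eig_diff))
    return float(np.sum(np.exp(-decay * i) * eig_diff))

# Hypothetical descending eigenspectra of vocal-tract coordination matrices
spec_complex = np.array([0.40, 0.25, 0.15, 0.10, 0.10])  # flatter spectrum
spec_simple  = np.array([0.70, 0.15, 0.08, 0.05, 0.02])  # concentrated spectrum

score = wsed(spec_complex - spec_simple)
```

Under this reading, a flatter (more complex) coordination spectrum drives the score negative relative to a simpler one, giving a scalar that can be thresholded near zero, consistent with the narrow ambiguity band the abstract describes.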
[LG-65] Min-Max Optimization Is Strictly Easier Than Variational Inequalities
链接: https://arxiv.org/abs/2511.03052
作者: Henry Shugart,Jason M. Altschuler
类目: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
*备注:
Abstract:Classically, a mainstream approach for solving a convex-concave min-max problem is to instead solve the variational inequality problem arising from its first-order optimality conditions. Is it possible to solve min-max problems faster by bypassing this reduction? This paper initiates this investigation. We show that the answer is yes in the textbook setting of unconstrained quadratic objectives: the optimal convergence rate for first-order algorithms is strictly better for min-max problems than for the corresponding variational inequalities. The key reason that min-max algorithms can be faster is that they can exploit the asymmetry of the min and max variables–a property that is lost in the reduction to variational inequalities. Central to our analyses are sharp characterizations of optimal convergence rates in terms of extremal polynomials which we compute using Green’s functions and conformal mappings.
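A classic textbook illustration of why exploiting min-max structure matters (this is standard material, not the paper's extremal-polynomial analysis): on the bilinear saddle $f(x, y) = xy$, simultaneous gradient descent-ascent spirals away from the saddle point, while the extragradient lookahead contracts toward it:

```python
import numpy as np

eta, steps = 0.1, 200
x, y = 1.0, 1.0            # saddle problem: min_x max_y f(x, y) = x * y

# Simultaneous gradient descent-ascent (GDA): spirals outward on bilinear games
for _ in range(steps):
    x, y = x - eta * y, y + eta * x
gda_norm = float(np.hypot(x, y))

# Extragradient: a half-step lookahead restores convergence to the saddle (0, 0)
x, y = 1.0, 1.0
for _ in range(steps):
    xh, yh = x - eta * y, y + eta * x          # lookahead point
    x, y = x - eta * yh, y + eta * xh          # update with lookahead gradients
eg_norm = float(np.hypot(x, y))
```

Per step, GDA multiplies the distance to the saddle by $\sqrt{1+\eta^2} > 1$, while extragradient multiplies it by $\sqrt{(1-\eta^2)^2 + \eta^2} < 1$, so the two trajectories diverge and converge respectively.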
[LG-66] Precise asymptotic analysis of Sobolev training for random feature models
链接: https://arxiv.org/abs/2511.03050
作者: Katharine E Fisher,Matthew TC Li,Youssef Marzouk,Timo Schorlepp
类目: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)
*备注: 23(+49) pages, 7(+16) figures main text(+appendix)
Abstract:Gradient information is widely useful and available in applications, and is therefore natural to include in the training of neural networks. Yet little is known theoretically about the impact of Sobolev training – regression with both function and gradient data – on the generalization error of highly overparameterized predictive models in high dimensions. In this paper, we obtain a precise characterization of this training modality for random feature (RF) models in the limit where the number of trainable parameters, input dimensions, and training data tend proportionally to infinity. Our model for Sobolev training reflects practical implementations by sketching gradient data onto finite dimensional subspaces. By combining the replica method from statistical physics with linearizations in operator-valued free probability theory, we derive a closed-form description for the generalization errors of the trained RF models. For target functions described by single-index models, we demonstrate that supplementing function data with additional gradient data does not universally improve predictive performance. Rather, the degree of overparameterization should inform the choice of training method. More broadly, our results identify settings where models perform optimally by interpolating noisy function and gradient data.
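Sobolev training of a random-feature model reduces to an ordinary least-squares problem once gradient observations are stacked as extra rows of the design matrix; a small finite-size sketch under assumed Gaussian frequencies and a single-index target (illustrative only, not the paper's asymptotic setup):

```python
import numpy as np

rng = np.random.default_rng(5)
d, p, n = 3, 200, 80                      # input dim, features, samples

W = rng.normal(size=(p, d))               # random feature frequencies
b = rng.uniform(0, 2 * np.pi, p)

def phi(X):                               # features: cos(W x + b)
    return np.cos(X @ W.T + b)

def dphi(X):                              # feature gradients w.r.t. inputs
    S = -np.sin(X @ W.T + b)              # (n, p)
    return S[:, :, None] * W[None]        # (n, p, d)

target = lambda X: np.sin(X[:, 0])        # single-index target
target_grad = lambda X: np.stack(
    [np.cos(X[:, 0]), np.zeros(len(X)), np.zeros(len(X))], axis=1)

X = rng.normal(size=(n, d))
A_fun = phi(X)                                         # n x p function rows
A_grad = dphi(X).transpose(0, 2, 1).reshape(n * d, p)  # n*d x p gradient rows
A = np.vstack([A_fun, A_grad])                         # Sobolev design matrix
y = np.concatenate([target(X), target_grad(X).reshape(-1)])

w = np.linalg.lstsq(A, y, rcond=None)[0]               # Sobolev least squares

Xtest = rng.normal(size=(500, d))
test_err = float(np.mean((phi(Xtest) @ w - target(Xtest)) ** 2))
```

Each training point contributes one function row and $d$ gradient rows, so gradient data changes the effective sample-to-parameter ratio; the paper's analysis characterizes exactly when that extra data helps or hurts.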
[LG-67] Unifying Information-Theoretic and Pair-Counting Clustering Similarity
链接: https://arxiv.org/abs/2511.03000
作者: Alexander J. Gates
类目: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
*备注: 28 pages, 2 figures
Abstract:Comparing clusterings is central to evaluating unsupervised models, yet the many existing similarity measures can produce widely divergent, sometimes contradictory, evaluations. Clustering similarity measures are typically organized into two principal families, pair-counting and information-theoretic, reflecting whether they quantify agreement through element pairs or aggregate information across full cluster contingency tables. Prior work has uncovered parallels between these families and applied empirical normalization or chance-correction schemes, but their deeper analytical connection remains only partially understood. Here, we develop an analytical framework that unifies these families through two complementary perspectives. First, both families are expressed as weighted expansions of observed versus expected co-occurrences, with pair-counting arising as a quadratic, low-order approximation and information-theoretic measures as higher-order, frequency-weighted extensions. Second, we generalize pair-counting to $k$-tuple agreement and show that information-theoretic measures can be viewed as systematically accumulating higher-order co-assignment structure beyond the pairwise level. We illustrate the approaches analytically for the Rand index and Mutual Information, and show how other indices in each family emerge as natural extensions. Together, these views clarify when and why the two regimes diverge, relating their sensitivities directly to weighting and approximation order, and provide a principled basis for selecting, interpreting, and extending clustering similarity measures across applications.
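The two families can be made concrete on a four-element toy example: the Rand index aggregates pairwise co-assignment agreements, while mutual information weights the full contingency (joint label) distribution:

```python
import numpy as np
from itertools import combinations
from math import log, comb

U = [0, 0, 1, 1]    # first clustering
V = [0, 0, 1, 2]    # second clustering: one cluster of U is split
n = len(U)

# Pair-counting: fraction of element pairs on which the clusterings agree
# (together in both, or separated in both)
agree = sum((U[i] == U[j]) == (V[i] == V[j]) for i, j in combinations(range(n), 2))
rand_index = agree / comb(n, 2)

# Information-theoretic: mutual information of the joint label distribution (nats)
joint = {}
for u, v in zip(U, V):
    joint[(u, v)] = joint.get((u, v), 0) + 1 / n
pu = {u: sum(p for (a, _), p in joint.items() if a == u) for u in set(U)}
pv = {v: sum(p for (_, b), p in joint.items() if b == v) for v in set(V)}
mi = sum(p * log(p / (pu[u] * pv[v])) for (u, v), p in joint.items())
```

Here five of the six pairs agree, giving a Rand index of 5/6, while the mutual information equals $\ln 2$, the full entropy of $U$; the pair statistic only "sees" second-order co-assignments, whereas the information measure accumulates the whole contingency table.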
[LG-68] Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models
链接: https://arxiv.org/abs/2511.02986
作者: Giovanni Palla,Sudarshan Babu,Payam Dibaeinia,James D. Pearce,Donghui Li,Aly A. Khan,Theofanis Karaletsos,Jakub M. Tomczak
类目: Machine Learning (stat.ML); Machine Learning (cs.LG); Genomics (q-bio.GN)
*备注: Github: this https URL
Abstract:Computational modeling of single-cell gene expression is crucial for understanding cellular processes, but generating realistic expression profiles remains a major challenge. This difficulty arises from the count nature of gene expression data and complex latent dependencies among genes. Existing generative models often impose artificial gene orderings or rely on shallow neural network architectures. We introduce a scalable latent diffusion model for single-cell gene expression data, which we refer to as scLDM, that respects the fundamental exchangeability property of the data. Our VAE uses fixed-size latent variables leveraging a unified Multi-head Cross-Attention Block (MCAB) architecture, which serves dual roles: permutation-invariant pooling in the encoder and permutation-equivariant unpooling in the decoder. We enhance this framework by replacing the Gaussian prior with a latent diffusion model using Diffusion Transformers and linear interpolants, enabling high-quality generation with multi-conditional classifier-free guidance. We show its superior performance in a variety of experiments for both observational and perturbational single-cell data, as well as downstream tasks like cell-level classification.
[LG-69] ECGXtract: Deep Learning-based ECG Feature Extraction for Automated CVD Diagnosis
链接: https://arxiv.org/abs/2511.02850
作者: Youssif Abuzied,Hassan AbdEltawab,Abdelrhman Gaber,Tamer ElBatt
类目: Signal Processing (eess.SP); Machine Learning (cs.LG)
*备注:
Abstract:This paper presents ECGXtract, a deep learning-based approach for interpretable ECG feature extraction, addressing the limitations of traditional signal processing and black-box machine learning methods. In particular, we develop convolutional neural network models capable of extracting both temporal and morphological features with strong correlations to a clinically validated ground truth. Initially, each model is trained to extract a single feature, ensuring precise and interpretable outputs. A series of experiments is then carried out to evaluate the proposed method across multiple setups, including global versus lead-specific features, different sampling frequencies, and comparisons with other approaches such as ECGdeli. Our findings show that ECGXtract achieves robust performance across most features with a mean correlation score of 0.80 with the ground truth for global features, with lead II consistently providing the best results. For lead-specific features, ECGXtract achieves a mean correlation score of 0.822. Moreover, ECGXtract outperforms the state-of-the-art open-source ECGdeli, achieving a higher correlation score with the ground truth in 90% of the features. Furthermore, we explore the feasibility of extracting multiple features simultaneously with a single model. Semantic grouping proves effective for global features, while large-scale grouping and lead-specific multi-output models show notable performance drops. These results highlight the potential of structured grouping strategies to balance computational efficiency against model accuracy, paving the way for more scalable and clinically interpretable ECG feature extraction systems in limited-resource settings.
[LG-70] Association-sensory spatiotemporal hierarchy and functional gradient-regularised recurrent neural network with implications for schizophrenia
链接: https://arxiv.org/abs/2511.02722
作者: Subati Abulikemu,Puria Radmard,Michail Mamalakis,John Suckling
类目: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG)
*备注: 34 pages, 9 figures
Abstract:The human neocortex is functionally organised at its highest level along a continuous sensory-to-association (AS) hierarchy. This study characterises the AS hierarchy of patients with schizophrenia in a comparison with controls. Using a large fMRI dataset (N=355), we extracted individual AS gradients via spectral analysis of brain connectivity, quantified hierarchical specialisation by gradient spread, and related this spread with connectivity geometry. We found that schizophrenia compresses the AS hierarchy indicating reduced functional differentiation. By modelling neural timescale with the Ornstein-Uhlenbeck process, we observed that the most specialised, locally cohesive regions at the gradient extremes exhibit dynamics with a longer time constant, an effect that is attenuated in schizophrenia. To study computation, we used the gradients to regularise subject-specific recurrent neural networks (RNNs) trained on working memory tasks. Networks endowed with greater gradient spread learned more efficiently, plateaued at lower task loss, and maintained stronger alignment to the prescribed AS hierarchical geometry. Fixed point linearisation showed that high-range networks settled into more stable neural states during memory delay, evidenced by lower energy and smaller maximal Jacobian eigenvalues. This gradient-regularised RNN framework therefore links large-scale cortical architecture with fixed point stability, providing a mechanistic account of how gradient de-differentiation could destabilise neural computations in schizophrenia, convergently supported by empirical timescale flattening and model-based evidence of less stable fixed points.
信息检索
[IR-0] A Semantic Encoding of Object Centric Event Data
链接: https://arxiv.org/abs/2511.03351
作者: Saba Latif,Fajar J. Ekaputra,Maxim Vidgof,Sabrina Kirrane,Claudio Di Ciccio
类目: Information Retrieval (cs.IR)
*备注: 12 pages, 3 figures, Wil60
Abstract:The Object-Centric Event Data (OCED) is a novel meta-model aimed at providing a common ground for process data records centered around events and objects. One of its objectives is to foster interoperability and process information exchange. In this context, the integration of data from different providers, the combination of multiple processes, and the enhancement of knowledge inference are novel challenges. Semantic Web technologies can enable the creation of a machine-readable OCED description enriched through ontology-based relationships and entity categorization. In this paper, we introduce an approach built upon Semantic Web technologies for the realization of semantic-enhanced OCED, with the aim to strengthen process data reasoning, interconnect information sources, and boost expressiveness.
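A semantic encoding of an object-centric event is, at its core, a set of subject-predicate-object triples; a minimal plain-Python sketch follows (the namespace, event, and object identifiers are invented for illustration, and a real implementation would use an RDF library and an OCED-aligned ontology):

```python
# Hypothetical RDF-style encoding of one object-centric event as triples
EX = "http://example.org/"
triples = {
    (EX + "e1", EX + "type", EX + "Event"),
    (EX + "e1", EX + "activity", "CreateOrder"),
    (EX + "e1", EX + "timestamp", "2025-11-05T10:00:00"),
    (EX + "e1", EX + "involves", EX + "order42"),
    (EX + "order42", EX + "type", EX + "Order"),
}

def objects_of(subject, predicate):
    """Simple triple-pattern query: all objects for a (subject, predicate) pair."""
    return {o for s, p, o in triples if s == subject and p == predicate}
```

Because events and objects are both first-class resources, linking a second event to `order42` requires only adding another `involves` triple, which is what makes ontology-based integration across providers natural in this representation.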
[IR-1] Two thousand years of the oracle problem. Insights from Ancient Delphi on the future of blockchain oracles
链接: https://arxiv.org/abs/2511.03319
作者: Giulio Caldarelli,Massimiliano Ornaghi
类目: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Information Retrieval (cs.IR); Information Theory (cs.IT)
*备注: Not peer reviewed
Abstract:The oracle problem refers to the inability of an agent to know if the information coming from an oracle is authentic and unbiased. In ancient times, philosophers and historians debated on how to evaluate, increase, and secure the reliability of oracle predictions, particularly those from Delphi, which pertained to matters of state. Today, we refer to data carriers for automatic machines as oracles, but establishing a secure channel between these oracles and the real world still represents a challenge. Despite numerous efforts, this problem remains mostly unsolved, and the recent advent of blockchain oracles has added a layer of complexity because of the decentralization of blockchains. This paper conceptually connects Delphic and modern blockchain oracles, developing a comparative framework. Leveraging blockchain oracle taxonomy, lexical analysis is also performed on 167 Delphic queries to shed light on the relationship between oracle answer quality and question type. The presented framework aims first at revealing commonalities between classical and computational oracles and then at enriching the oracle analysis within each field. This study contributes to the computer science literature by proposing strategies to improve the reliability of blockchain oracles based on insights from Delphi and to classical literature by introducing a framework that can also be applied to interpret and classify other ancient oracular mechanisms.
[IR-2] KScaNN: Scalable Approximate Nearest Neighbor Search on Kunpeng
链接: https://arxiv.org/abs/2511.03298
作者: Oleg Senkevich,Siyang Xu,Tianyi Jiang,Alexander Radionov,Jan Tabaszewski,Dmitriy Malyshev,Zijian Li,Daihao Xue,Licheng Yu,Weidi Zeng,Meiling Wang,Xin Yao,Siyu Huang,Gleb Neshchetkin,Qiuling Pan,Yaoyao Fu
类目: Information Retrieval (cs.IR)
*备注:
Abstract:Approximate Nearest Neighbor Search (ANNS) is a cornerstone algorithm for information retrieval, recommendation systems, and machine learning applications. While x86-based architectures have historically dominated this domain, the increasing adoption of ARM-based servers in industry presents a critical need for ANNS solutions optimized for ARM architectures. A naive port of existing x86 ANNS algorithms to ARM platforms results in a substantial performance deficit, failing to leverage the unique capabilities of the underlying hardware. To address this challenge, we introduce KScaNN, a novel ANNS algorithm co-designed for the Kunpeng 920 ARM architecture. KScaNN embodies a holistic approach that synergizes sophisticated, data-aware algorithmic refinements with carefully designed hardware-specific optimizations. Its core contributions include: 1) novel algorithmic techniques, including a hybrid intra-cluster search strategy and an improved PQ residual calculation method, which optimize the search process at a higher level; 2) an ML-driven adaptive search module that provides adaptive, per-query tuning of search parameters, eliminating the inefficiencies of static configurations; and 3) highly optimized SIMD kernels for ARM that maximize hardware utilization for the critical distance computation workloads. The experimental results demonstrate that KScaNN not only closes the performance gap but establishes a new standard, achieving up to a 1.63x speedup over the fastest x86-based solution. This work provides a definitive blueprint for achieving leadership-class performance for vector search on modern ARM architectures.
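The intra-cluster search and PQ-residual ideas in the abstract build on the standard inverted-file (IVF) ANNS layout: vectors are grouped around coarse centroids, and a query scans only the few closest clusters. The NumPy sketch below shows that general cluster-then-search idea only; it is not KScaNN's algorithm and omits PQ residuals, adaptive parameter tuning, and SIMD kernels.

```python
import numpy as np

# Minimal IVF-style ANN search sketch (cluster-then-search). Illustrative
# of the general technique behind systems like KScaNN, not the paper's method.

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 16)).astype(np.float32)

# Coarse quantizer: use k random data points as centroids, assign each
# vector to its nearest centroid, and build one inverted list per centroid.
k = 8
centroids = data[rng.choice(len(data), k, replace=False)]
assign = np.argmin(((data[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
buckets = {c: np.where(assign == c)[0] for c in range(k)}

def search(query, nprobe=2, topk=5):
    """Scan only the nprobe closest clusters instead of the full dataset."""
    d2c = ((centroids - query) ** 2).sum(-1)
    cand = np.concatenate([buckets[c] for c in np.argsort(d2c)[:nprobe]])
    dists = ((data[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(dists)[:topk]]

ids = search(data[0])
print(ids[0])  # the query vector itself is its own nearest neighbor
```

The `nprobe` parameter is the kind of search knob that KScaNN's ML-driven module tunes per query instead of fixing statically: probing more clusters raises recall at the cost of more distance computations.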
[IR-3] Russian Contribution to Coronary Artery Disease Research: A Scientometric Mapping of Publications
链接: https://arxiv.org/abs/2511.03215
作者: Muneer Ahmad,M Sadik Batcha
类目: Digital Libraries (cs.DL); Information Retrieval (cs.IR)
*备注: 18 pages, 3 figures, Research Article
Abstract:The present study highlights the research output generated in Russia in coronary artery disease (CAD) research during the period 1990-2019, examining the distribution of research output, the top journals for publication, the most prolific authors, authorship patterns, and citation patterns. The study is based on secondary data extracted from the Science Citation Index (SCI), an integral component of the Web of Science, and applies descriptive and inferential statistical techniques. Russian scholars published 5058 articles on coronary artery disease during 1990-2019, mostly in Russian journals. The contributions took the form of research articles, meeting abstracts, and reviews, with the number of editorial materials and article/proceedings-paper documents dropping consistently over time. Co-authorship was the norm in coronary artery disease research, with a steady increase in the number of multi-author documents in recent years.
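A standard way to summarize the authorship pattern such scientometric studies report is the Collaborative Coefficient of Ajiferuke et al., which weights each paper by the reciprocal of its author count. The sketch below computes it from a made-up authorship distribution; the counts are illustrative, not the study's data.

```python
# Collaborative Coefficient (CC), a common scientometric co-authorship
# measure: CC = 1 - (sum over j of f_j / j) / N, where f_j is the number of
# papers with j authors and N is the total paper count. CC is 0 when every
# paper is single-authored and approaches 1 as collaboration grows.

def collaborative_coefficient(freq_by_authors: dict) -> float:
    """freq_by_authors maps author count j -> number of papers with j authors."""
    n_papers = sum(freq_by_authors.values())
    return 1 - sum(f / j for j, f in freq_by_authors.items()) / n_papers

# Hypothetical distribution: 100 single-, 300 two-, 600 three-author papers.
cc = collaborative_coefficient({1: 100, 2: 300, 3: 600})
print(round(cc, 3))  # 0.55
```

Tracking CC year by year over 1990-2019 would quantify the "steady increase in multi-author documents" the abstract describes as a single trend line.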
[IR-4] A Study on Library Resources with Services Satisfaction based on Library Users Affiliated Colleges to Solapur University
链接: https://arxiv.org/abs/2511.03209
作者: Patel Adam Burhansab,M Sadik Batcha,Muneer Ahmad
类目: Digital Libraries (cs.DL); Information Retrieval (cs.IR)
*备注: 8 pages, 1 figure, Research Article
Abstract:The main aim of this study was to assess and evaluate user satisfaction with library resources and services among library users of the 26 colleges affiliated to Solapur University, Maharashtra. The study measures users' satisfaction with the different library resources and services offered by the college libraries and finds that a large majority of respondents were pleased with the facilities and services provided. Data were collected from the colleges and analyzed to determine the level of user satisfaction with library resources and services.
[IR-5] Generative Sequential Recommendation via Hierarchical Behavior Modeling
链接: https://arxiv.org/abs/2511.03155
作者: Zhefan Wang,Guokai Yan,Jinbei Yu,Siyu Gu,Jingyan Chen,Peng Jiang,Zhiqiang Guo,Min Zhang
类目: Information Retrieval (cs.IR)
*备注:
Abstract:Recommender systems in multi-behavior domains, such as advertising and e-commerce, aim to guide users toward high-value but inherently sparse conversions. Leveraging auxiliary behaviors (e.g., clicks, likes, shares) is therefore essential. Recent progress on generative recommendation has brought new possibilities for multi-behavior sequential recommendation. However, existing generative approaches face two significant challenges: 1) Inadequate Sequence Modeling: they struggle to capture the complex, cross-level dependencies within user behavior sequences; and 2) Lack of Suitable Datasets: publicly available multi-behavior recommendation datasets are almost exclusively derived from e-commerce platforms, limiting the validation of feasibility in other domains, while also lacking sufficient side information for semantic ID generation. To address these issues, we propose a novel generative framework, GAMER (Generative Augmentation and Multi-lEvel behavior modeling for Recommendation), built upon a decoder-only backbone. GAMER introduces a cross-level interaction layer to capture hierarchical dependencies among behaviors and a sequential augmentation strategy that enhances robustness in training. To further advance this direction, we collect and release ShortVideoAD, a large-scale multi-behavior dataset from a mainstream short-video platform, which differs fundamentally from existing e-commerce datasets and provides pretrained semantic IDs for research on generative methods. Extensive experiments show that GAMER consistently outperforms both discriminative and generative baselines across multiple metrics.
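The core modeling idea, that each event in a user's history carries both an item and a behavior level (click vs. like vs. conversion), can be sketched with joint embeddings. The toy below is a simplified stand-in for hierarchical behavior modeling: it mean-pools (item + level) embeddings and scores candidate items against the pooled state. All names and sizes are illustrative; GAMER itself uses a decoder-only backbone with a cross-level interaction layer, not this pooling.

```python
import numpy as np

# Toy multi-behavior next-item scoring: every history event embeds both its
# item and its behavior level, so a conversion contributes differently from
# a click. Hypothetical sizes; not GAMER's architecture.

rng = np.random.default_rng(0)
n_items, n_levels, dim = 50, 3, 8          # levels: 0=click, 1=like, 2=conversion
item_emb = rng.normal(size=(n_items, dim))
level_emb = rng.normal(size=(n_levels, dim))

def encode(history):
    """history: list of (item_id, behavior_level); mean-pool joint embeddings."""
    vecs = [item_emb[i] + level_emb[b] for i, b in history]
    return np.mean(vecs, axis=0)

def score_next(history):
    """Score every candidate item against the pooled history state."""
    state = encode(history)
    return item_emb @ state  # higher score = more likely next item

# Clicks on items 3 and 7, then a conversion on item 7.
scores = score_next([(3, 0), (7, 0), (7, 2)])
top = int(np.argmax(scores))
```

A decoder-only model replaces the mean-pooling with causal self-attention over the event sequence, which is what lets it learn the cross-level dependencies (e.g., which click patterns precede conversions) that simple pooling cannot express.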


