arXiv Daily Papers | 2025-02-21

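This post lists the latest papers fetched from Arxiv.org on 2025-02-21. It is updated automatically and organized into five broad areas: NLP, CV, ML, AI, and IR. If you would like to receive it by email on a schedule, please leave your email address in the comments.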

Note: Paper data is fetched from Arxiv.org every day and updated automatically at around 12:00.

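[Quick Read]: This paper addresses two problems large language models (LLMs) face with long sequences: the quadratic computational complexity of attention in the prefilling stage and the large memory footprint of the KV cache in the decoding stage. The key solution is LServe, a system that accelerates long-sequence LLM serving through hybrid sparse attention. LServe unifies different hardware-friendly, structured sparsity patterns and skips computation on less important tokens block-wise in both the prefilling and decoding stages. It also shows that only a constant number of KV pages is needed to preserve long-context capabilities, and it designs a hierarchical KV page selection policy based on query-centric similarity. Together, these optimizations accelerate LLM prefilling by up to 2.9x and decoding by 1.3-2.1x over vLLM while maintaining long-context accuracy.

Link: https://arxiv.org/abs/2502.14866
Authors: Shang Yang, Junxian Guo, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang, Yujun Lin, Zhijian Liu, Yao Lu, Song Han
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Performance (cs.PF)
Comments: Accepted by MLSys 2025. Code available at: this https URL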

Abstract:Large language models (LLMs) have shown remarkable potential in processing long sequences, yet efficiently serving these long-context models remains challenging due to the quadratic computational complexity of attention in the prefilling stage and the large memory footprint of the KV cache in the decoding stage. To address these issues, we introduce LServe, an efficient system that accelerates long-sequence LLM serving via hybrid sparse attention. This method unifies different hardware-friendly, structured sparsity patterns for both prefilling and decoding attention into a single framework, where computations on less important tokens are skipped block-wise. LServe demonstrates the compatibility of static and dynamic sparsity in long-context LLM attention. This design enables multiplicative speedups by combining these optimizations. Specifically, we convert half of the attention heads to nearly free streaming heads in both the prefilling and decoding stages. Additionally, we find that only a constant number of KV pages is required to preserve long-context capabilities, irrespective of context length. We then design a hierarchical KV page selection policy that dynamically prunes KV pages based on query-centric similarity. On average, LServe accelerates LLM prefilling by up to 2.9x and decoding by 1.3-2.1x over vLLM, maintaining long-context accuracy. Code is released at this https URL.
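
To make "a constant number of KV pages, chosen by query-centric similarity" concrete, here is a minimal NumPy sketch of query-aware page selection. It assumes a score in the style of Quest: an upper bound on q·k computed from each page's per-channel key minimum and maximum. The function name `select_kv_pages` and this scoring rule are illustrative assumptions, not LServe's released kernels, and the sketch omits the hierarchical, multi-level structure of LServe's policy.

```python
import numpy as np


def select_kv_pages(query: np.ndarray, keys: np.ndarray, page_size: int,
                    num_pages_to_keep: int) -> np.ndarray:
    """Return sorted indices of the KV pages to attend to (hypothetical helper).

    query: (head_dim,) current decoding query for one head
    keys:  (seq_len, head_dim) cached keys for that head
    """
    seq_len = keys.shape[0]
    num_pages = (seq_len + page_size - 1) // page_size
    scores = np.empty(num_pages)
    for p in range(num_pages):
        page = keys[p * page_size:(p + 1) * page_size]
        k_min, k_max = page.min(axis=0), page.max(axis=0)
        # q_i * k_i is linear in k_i, so its maximum over the page is at k_min or k_max.
        # Summing over channels gives an upper bound on q.k for every key in the page.
        scores[p] = np.maximum(query * k_min, query * k_max).sum()
    keep = min(num_pages_to_keep, num_pages)
    # Keep a fixed number of pages, independent of seq_len.
    top = np.argsort(scores)[::-1][:keep]
    return np.sort(top)
```

Because `num_pages_to_keep` is fixed, the sparse attention cost per decoding step stays constant as the context grows. Only the cheap page scoring scales with length.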

[NLP-1] Interpretable Text Embeddings and Text Similarity Explanation: A Primer

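[Quick Read]: This paper addresses the interpretability challenges of text embeddings and text embedding models, which underpin many AI and natural language processing systems. It focuses in particular on explaining the similarity scores these models produce, which matters for applications that require transparency. Its key contribution is a structured overview of methods specialized in explaining these similarity scores. The paper examines the core ideas and techniques of each method and evaluates their potential for improving the interpretability of text embeddings and explaining predicted similarities.

Link: https://arxiv.org/abs/2502.14862
Authors: Juri Opitz, Lucas Möller, Andrianos Michail, Simon Clematide
Affiliations: University of Zurich; IMS at University of Stuttgart
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Comments: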

Abstract:Text embeddings and text embedding models are a backbone of many AI and NLP systems, particularly those involving search. However, interpretability challenges persist, especially in explaining obtained similarity scores, which is crucial for applications requiring transparency. In this paper, we give a structured overview of interpretability methods specializing in explaining those similarity scores, an emerging research area. We study the methods’ individual ideas and techniques, evaluating their potential for improving interpretability of text embeddings and explaining predicted similarities.

[NLP-2] Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning

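[Quick Read]: This paper addresses the inability of large language models (LLMs) to ask effective questions under uncertainty, which makes their decisions unreliable in domains that require proactive information gathering. The key to the solution is the ALFA framework. ALFA decomposes the notion of a "good" question into a set of theory-grounded attributes (e.g., clarity, relevance) and synthesizes attribute-specific question variations. It then aligns the model through preference-based optimization, so that the model explicitly learns to ask better questions along these fine-grained attributes.

Link: https://arxiv.org/abs/2502.14860
Authors: Shuyue Stella Li, Jimin Mun, Faeze Brahman, Jonathan S. Ilgen, Yulia Tsvetkov, Maarten Sap
Affiliations: University of Washington; Carnegie Mellon University; Allen Institute for AI
Categories: Computation and Language (cs.CL)
Comments: 22 pages, 8 figures, 8 tables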

Abstract:Large language models (LLMs) often fail to ask effective questions under uncertainty, making them unreliable in domains where proactive information-gathering is essential for decision-making. We present ALFA, a framework that improves LLM question-asking by (i) decomposing the notion of a “good” question into a set of theory-grounded attributes (e.g., clarity, relevance), (ii) controllably synthesizing attribute-specific question variations, and (iii) aligning models via preference-based optimization to explicitly learn to ask better questions along these fine-grained attributes. Focusing on clinical reasoning as a case study, we introduce the MediQ-AskDocs dataset, composed of 17k real-world clinical interactions augmented with 80k attribute-specific preference pairs of follow-up questions, as well as a novel expert-annotated interactive healthcare QA task to evaluate question-asking abilities. Models aligned with ALFA reduce diagnostic errors by 56.6% on MediQ-AskDocs compared to SOTA instruction-tuned LLMs, with a question-level win-rate of 64.4% and strong generalizability. Our findings suggest that explicitly guiding question-asking with structured, fine-grained attributes offers a scalable path to improve LLMs, especially in expert application domains.

[NLP-3] FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling

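[Quick Read]: This paper addresses the limited efficiency gains that existing speculative sampling methods deliver for large language models (LLMs), especially models with large vocabularies. The key solution is FR-Spec, a frequency-ranked speculative sampling framework that optimizes draft candidate selection by compressing the vocabulary space. By restricting the draft search to a frequency-prioritized token subset, it cuts the computational overhead of the language modeling head (LM Head) by 75% while keeping the final output distribution unchanged.

Link: https://arxiv.org/abs/2502.14856
Authors: Weilin Zhao, Tengyu Pan, Xu Han, Yudi Zhang, Ao Sun, Yuxiang Huang, Kaihuo Zhang, Weilun Zhao, Yuxuan Li, Jianyong Wang, Zhiyuan Liu, Maosong Sun
Affiliations: Tsinghua University; Harbin Institute of Technology; Beijing University of Posts and Telecommunications; OpenAI
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: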

Abstract:Speculative sampling has emerged as an important technique for accelerating the auto-regressive generation process of large language models (LLMs) by utilizing a draft-then-verify mechanism to produce multiple tokens per forward pass. While state-of-the-art speculative sampling methods use only a single layer and a language modeling (LM) head as the draft model to achieve impressive layer compression, their efficiency gains are substantially reduced for large-vocabulary LLMs, such as Llama-3-8B with a vocabulary of 128k tokens. To address this, we present FR-Spec, a frequency-ranked speculative sampling framework that optimizes draft candidate selection through vocabulary space compression. By constraining the draft search to a frequency-prioritized token subset, our method reduces LM Head computation overhead by 75% while ensuring the equivalence of the final output distribution. Experiments across multiple datasets demonstrate an average of 1.12× speedup over the state-of-the-art speculative sampling method EAGLE-2.
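
The core trick can be sketched as follows. The draft model computes logits only for the LM head rows that belong to a high-frequency token subset, then maps the winning row back to its full-vocabulary ID. This minimal NumPy sketch assumes greedy drafting and token counts taken from some corpus. `build_frequency_subset` and `draft_next_token` are hypothetical names. Keeping 25% of the vocabulary corresponds to the 75% reduction in LM head compute mentioned above. The target model still verifies drafts over the full vocabulary, which is why the final output distribution is unchanged.

```python
import numpy as np


def build_frequency_subset(token_counts: np.ndarray, keep_fraction: float) -> np.ndarray:
    """IDs of the most frequent tokens (hypothetical helper)."""
    k = max(1, int(len(token_counts) * keep_fraction))
    return np.argsort(token_counts)[::-1][:k]


def draft_next_token(hidden: np.ndarray, lm_head: np.ndarray, subset_ids: np.ndarray) -> int:
    """Greedy draft restricted to the frequent-token subset.

    hidden:  (dim,) last hidden state of the draft model
    lm_head: (vocab, dim) output embedding matrix
    """
    sub_logits = lm_head[subset_ids] @ hidden  # (k,) logits instead of (vocab,)
    return int(subset_ids[np.argmax(sub_logits)])
```

There is a trade-off. If the full-vocabulary argmax falls outside the subset, the draft proposes a different token and the verifier rejects it. Drafting becomes cheaper at the cost of occasionally lower acceptance.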

[NLP-4] Prompt-to-Leaderboard

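[Quick Read]: This paper addresses a weakness in large language model (LLM) evaluation: aggregated metrics such as accuracy or human preference average away user- and prompt-specific variations in performance. The key solution is Prompt-to-Leaderboard (P2L). P2L trains an LLM that takes a natural language prompt as input and outputs a vector of Bradley-Terry coefficients. These coefficients are then used to predict human preference votes, producing a leaderboard specific to that prompt. This enables unsupervised task-specific evaluation, optimal routing of queries to models, personalization, and automated evaluation of model strengths and weaknesses.

Link: https://arxiv.org/abs/2502.14855
Authors: Evan Frick, Connor Chen, Joseph Tennyson, Tianle Li, Wei-Lin Chiang, Anastasios N. Angelopoulos, Ion Stoica
Affiliations: University of California, Berkeley
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments: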

Abstract:Large language model (LLM) evaluations typically rely on aggregated metrics like accuracy or human preference, averaging across users and prompts. This averaging obscures user- and prompt-specific variations in model performance. To address this, we propose Prompt-to-Leaderboard (P2L), a method that produces leaderboards specific to a prompt. The core idea is to train an LLM taking natural language prompts as input to output a vector of Bradley-Terry coefficients which are then used to predict the human preference vote. The resulting prompt-dependent leaderboards allow for unsupervised task-specific evaluation, optimal routing of queries to models, personalization, and automated evaluation of model strengths and weaknesses. Data from Chatbot Arena suggest that P2L better captures the nuanced landscape of language model performance than the averaged leaderboard. Furthermore, our findings suggest that P2L’s ability to produce prompt-specific evaluations follows a power law scaling similar to that observed in LLMs themselves. In January 2025, the router we trained based on this methodology achieved the #1 spot in the Chatbot Arena leaderboard. Our code is available at this GitHub link: this https URL.
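
Under the Bradley-Terry model, suppose P2L has produced a coefficient vector θ for a prompt. The predicted probability that model i is preferred over model j is then σ(θ_i − θ_j), and the prompt-specific leaderboard is the list of models sorted by θ. The sketch below is minimal: the coefficient values and model names are made up for illustration, whereas in P2L they would come from the trained LLM.

```python
import math


def bt_win_prob(theta: list[float], i: int, j: int) -> float:
    """Bradley-Terry probability that model i is preferred over model j."""
    return 1.0 / (1.0 + math.exp(-(theta[i] - theta[j])))


def leaderboard(theta: list[float], names: list[str]) -> list[str]:
    """Models ranked by their prompt-specific coefficient, best first."""
    order = sorted(range(len(theta)), key=lambda k: theta[k], reverse=True)
    return [names[k] for k in order]


# Hypothetical coefficients for a single prompt
theta = [1.0, 0.0, 2.0]
names = ["model-a", "model-b", "model-c"]
print(leaderboard(theta, names))            # ['model-c', 'model-a', 'model-b']
print(round(bt_win_prob(theta, 2, 1), 3))   # 0.881
```

The simplest form of "optimal routing" sends the query to the first entry of this per-prompt list.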

[NLP-5] CLIPPER: Compression enables long-context synthetic data generation

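[Quick Read]: This paper addresses the problem of generating high-quality synthetic data for complex long-context reasoning tasks. The key solution is CLIPPER, a compression-based approach. CLIPPER first compresses a book into chapter outlines and book summaries, then uses these intermediate representations to generate complex narrative claims and their corresponding chains of thought. Compared with generating claims directly from the raw text, this produces claims that are more valid, grounded, and complex, and it substantially improves model performance on narrative claim verification.

Link: https://arxiv.org/abs/2502.14854
Authors: Chau Minh Pham, Yapei Chang, Mohit Iyyer
Affiliations: University of Maryland, College Park; University of Massachusetts Amherst
Categories: Computation and Language (cs.CL)
Comments: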

Abstract:LLM developers are increasingly reliant on synthetic data, but generating high-quality data for complex long-context reasoning tasks remains challenging. We introduce CLIPPER, a compression-based approach for generating synthetic data tailored to narrative claim verification - a task that requires reasoning over a book to verify a given claim. Instead of generating claims directly from the raw text of the book, which results in artifact-riddled claims, CLIPPER first compresses the book into chapter outlines and book summaries and then uses these intermediate representations to generate complex claims and corresponding chain-of-thoughts. Compared to naive approaches, CLIPPER produces claims that are more valid, grounded, and complex. Using CLIPPER, we construct a dataset of 19K synthetic book claims paired with their source texts and chain-of-thought reasoning, and use it to fine-tune three open-weight models. Our best model achieves breakthrough results on narrative claim verification (from 28% to 76% accuracy on our test set) and sets a new state-of-the-art for sub-10B models on the NoCha leaderboard. Further analysis shows that our models generate more detailed and grounded chain-of-thought reasoning while also improving performance on other narrative understanding tasks (e.g., NarrativeQA).

[NLP-6] GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks

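[Quick Read]: This paper addresses two shortcomings of existing tool-making frameworks: they construct reliable toolsets inefficiently, and they are limited to single-task settings. The key solution is GATE (Graph-based Adaptive Tool Evolution), a framework that dynamically constructs and evolves a hierarchical graph of reusable tools across multiple scenarios. This makes multi-task tool generation and refinement more efficient.

Link: https://arxiv.org/abs/2502.14848
Authors: Jianwen Luo, Yiming Huang, Jinxiang Meng, Fangyu Lei, Shizhu He, Xiao Liu, Shanshan Jiang, Bin Dong, Jun Zhao, Kang Liu
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: 8 pages of main text, 38 pages of appendices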

Abstract:Large Language Models (LLMs) have shown great promise in tool-making, yet existing frameworks often struggle to efficiently construct reliable toolsets and are limited to single-task settings. To address these challenges, we propose GATE (Graph-based Adaptive Tool Evolution), an adaptive framework that dynamically constructs and evolves a hierarchical graph of reusable tools across multiple scenarios. We evaluate GATE on open-ended tasks (Minecraft), agent-based tasks (TextCraft, DABench), and code generation tasks (MATH, Date, TabMWP). Our results show that GATE achieves up to 4.3x faster milestone completion in Minecraft compared to the previous SOTA, and provides an average improvement of 9.23% over existing tool-making methods in code generation tasks and 10.03% in agent tasks. GATE demonstrates the power of adaptive evolution, balancing tool quantity, complexity, and functionality while maintaining high efficiency. Code and data are available at this https URL.

[NLP-7] Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation

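[Quick Read]: This paper addresses the difficulty vision-language models (VLMs) have with text-rich images such as charts and documents, which stems mainly from a scarcity of diverse, text-rich vision-language data. To address this, it proposes CoSyn, a framework whose key idea is to use the coding capabilities of text-only large language models (LLMs) to automatically generate synthetic multimodal data. In this way, CoSyn produces high-quality instruction-tuning data. The authors used it to build a large-scale dataset of 400K images and 2.7M rows of vision-language instruction-tuning data.

Link: https://arxiv.org/abs/2502.14846
Authors: Yue Yang, Ajay Patel, Matt Deitke, Tanmay Gupta, Luca Weihs, Andrew Head, Mark Yatskar, Chris Callison-Burch, Ranjay Krishna, Aniruddha Kembhavi, Christopher Clark
Affiliations: University of Pennsylvania; Allen Institute for Artificial Intelligence
Categories: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Comments: 20 pages, 19 figures, 9 tables, website: this https URL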

Abstract:Reasoning about images with rich text, such as charts and documents, is a critical application of vision-language models (VLMs). However, VLMs often struggle in these domains due to the scarcity of diverse text-rich vision-language data. To address this challenge, we present CoSyn, a framework that leverages the coding capabilities of text-only large language models (LLMs) to automatically create synthetic text-rich multimodal data. Given input text describing a target domain (e.g., “nutrition fact labels”), CoSyn prompts an LLM to generate code (Python, HTML, LaTeX, etc.) for rendering synthetic images. With the underlying code as textual representations of the synthetic images, CoSyn can generate high-quality instruction-tuning data, again relying on a text-only LLM. Using CoSyn, we constructed a dataset comprising 400K images and 2.7M rows of vision-language instruction-tuning data. Comprehensive experiments on seven benchmarks demonstrate that models trained on our synthetic data achieve state-of-the-art performance among competitive open-source models, including Llama 3.2, and surpass proprietary models such as GPT-4V and Gemini 1.5 Flash. Furthermore, CoSyn can produce synthetic pointing data, enabling VLMs to ground information within input images, showcasing its potential for developing multimodal agents capable of acting in real-world environments.

[NLP-8] Revealing and Mitigating Over-Attention in Knowledge Editing

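[Quick Read]: This paper addresses Specificity Failure in knowledge editing for large language models (LLMs). Methods that precisely edit model knowledge by modifying a small fraction of parameters can degrade other pre-existing knowledge. The key solution is Selective Attention Drift Restriction (SADR). SADR adds a regularization term during knowledge editing that restricts changes in the attention weight distribution. This prevents excessive focus on the edited entity and thereby mitigates the Attention Drift phenomenon.

Link: https://arxiv.org/abs/2502.14838
Authors: Pinzheng Wang, Zecheng Tang, Keyan Zhou, Juntao Li, Qiaoming Zhu, Min Zhang
Affiliations: Soochow University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: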

Abstract:Large Language Models have demonstrated superior performance across a wide range of tasks, but they still exhibit undesirable errors due to incorrect knowledge learned from the training data. To avoid this, knowledge editing methods emerged to precisely edit the specific model knowledge via efficiently modifying a very small percentage of parameters. However, those methods can lead to the problem of Specificity Failure, where the existing knowledge and capabilities are severely degraded due to editing. Our preliminary analysis indicates that Specificity Failure primarily stems from the model’s attention heads assigning excessive attention scores to entities related to the edited knowledge, thereby unduly focusing on specific snippets within the context, which we denote as the Attention Drift phenomenon. To mitigate this Attention Drift issue, we introduce a simple yet effective method, Selective Attention Drift Restriction (SADR), which introduces an additional regularization term during the knowledge editing process to restrict changes in the attention weight distribution, thereby preventing undue focus on the edited entity. Experiments on five frequently used strong LLMs demonstrate the effectiveness of our method, where SADR can significantly mitigate Specificity Failure in the predominant knowledge editing tasks.
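
The general shape of such an objective is an editing loss plus a penalty on how far the edited model's attention distributions move from the original model's. The sketch below uses KL divergence averaged over attention rows as that penalty, with a hypothetical weight `lam`. This is an illustrative assumption. The "selective" part of SADR, which decides where the restriction applies, is not modeled, so this should not be read as the paper's exact formula. NumPy is used for readability. In practice the penalty would be computed on framework tensors so that gradients flow.

```python
import numpy as np


def attention_drift_penalty(attn_before: np.ndarray, attn_after: np.ndarray,
                            eps: float = 1e-12) -> float:
    """Mean KL(before || after) over attention rows.

    attn_*: (..., num_queries, num_keys); each row is a probability distribution.
    """
    p = attn_before + eps
    q = attn_after + eps
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))


def regularized_edit_loss(edit_loss: float, attn_before: np.ndarray,
                          attn_after: np.ndarray, lam: float = 0.1) -> float:
    """Editing objective plus an attention-drift penalty (lam is a hypothetical weight)."""
    return edit_loss + lam * attention_drift_penalty(attn_before, attn_after)
```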

[NLP-9] Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs

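[Quick Read]: This paper addresses how to move well-trained large language models (LLMs) from standard Multi-Head Attention (MHA) to Multi-Head Latent Attention (MLA) quickly and efficiently, without retraining from scratch. The key is MHA2MLA, a data-efficient fine-tuning method with two components. For partial-RoPE, it removes RoPE from the query and key dimensions that contribute little to the attention scores. For low-rank approximation, it introduces joint singular value decomposition (SVD) approximations based on the pre-trained key and value parameters. These strategies let MHA2MLA recover performance using only a small fraction (0.3% to 0.6%) of the data. Inference cost drops substantially, and the method integrates seamlessly with compression techniques such as KV cache quantization.

Link: https://arxiv.org/abs/2502.14837
Authors: Tao Ji, Bin Guo, Yuanbin Wu, Qipeng Guo, Lixing Shen, Zhan Chen, Xipeng Qiu, Qi Zhang, Tao Gui
Affiliations: Fudan University; East China Normal University; Hikvision Inc; Shanghai AI Lab
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 16 pages, 8 figures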

Abstract:Multi-head Latent Attention (MLA) is an innovative architecture proposed by DeepSeek, designed to ensure efficient and economical inference by significantly compressing the Key-Value (KV) cache into a latent vector. Compared to MLA, standard LLMs employing Multi-Head Attention (MHA) and its variants such as Grouped-Query Attention (GQA) exhibit significant cost disadvantages. Enabling well-trained LLMs (e.g., Llama) to rapidly adapt to MLA without pre-training from scratch is both meaningful and challenging. This paper proposes the first data-efficient fine-tuning method for transitioning from MHA to MLA (MHA2MLA), which includes two key components: for partial-RoPE, we remove RoPE from dimensions of queries and keys that contribute less to the attention scores; for low-rank approximation, we introduce joint SVD approximations based on the pre-trained parameters of keys and values. These carefully designed strategies enable MHA2MLA to recover performance using only a small fraction (0.3% to 0.6%) of the data, significantly reducing inference costs while seamlessly integrating with compression techniques such as KV cache quantization. For example, the KV cache size of Llama2-7B is reduced by 92.19%, with only a 0.5% drop in LongBench performance.
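
The low-rank half of MHA2MLA can be illustrated with plain linear algebra. Stack the key and value projection matrices side by side and take a truncated SVD. This yields a shared down-projection, whose per-token output is what gets cached, plus separate up-projections for keys and values. The NumPy sketch below shows only this joint-SVD idea for a single head group under that assumption. It ignores the partial-RoPE step and is not the paper's code. `joint_kv_svd` is a hypothetical name.

```python
import numpy as np


def joint_kv_svd(w_k: np.ndarray, w_v: np.ndarray, rank: int):
    """Factor [W_k | W_v] ~= W_down @ [W_up_k | W_up_v] via truncated SVD.

    w_k, w_v: (d_model, d_kv) key/value projections
    Returns W_down (d_model, rank), W_up_k (rank, d_kv), W_up_v (rank, d_kv).
    """
    d_kv = w_k.shape[1]
    w_kv = np.concatenate([w_k, w_v], axis=1)           # (d_model, 2*d_kv)
    u, s, vt = np.linalg.svd(w_kv, full_matrices=False)
    w_down = u[:, :rank] * s[:rank]                     # absorb singular values
    w_up = vt[:rank]                                    # (rank, 2*d_kv)
    return w_down, w_up[:, :d_kv], w_up[:, d_kv:]


# Per token, cache the latent x @ W_down (rank floats) instead of K and V (2*d_kv floats).
rng = np.random.default_rng(0)
w_k, w_v = rng.standard_normal((64, 16)), rng.standard_normal((64, 16))
w_down, up_k, up_v = joint_kv_svd(w_k, w_v, rank=8)
x = rng.standard_normal(64)
latent = x @ w_down
k_approx, v_approx = latent @ up_k, latent @ up_v
```

With `rank=8` and `d_kv=16`, the per-token cache for this toy head shrinks from 32 numbers to 8 (a 75% reduction). Real savings depend on the chosen rank and the model configuration.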

[NLP-10] LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models

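[Quick Read]: This paper addresses the difficulty existing Large Vision-Language Models (LVLMs) have in generating coherent outputs longer than 1,000 words. The main cause is the absence of long output examples during supervised fine-tuning (SFT). The key solution combines the LongWriter-V-22k dataset with Iterative Direct Preference Optimization (IterDPO). LongWriter-V-22k contains 22,158 examples, each with multiple input images, an instruction, and a corresponding output ranging from 0 to 10,000 words. IterDPO splits long outputs into segments and applies iterative corrections to form preference pairs with the original outputs, effectively reducing the cost of collecting human feedback.

Link: https://arxiv.org/abs/2502.14834
Authors: Shangqing Tu, Yucheng Wang, Daniel Zhang-Li, Yushi Bai, Jifan Yu, Yuhao Wu, Lei Hou, Huiqin Liu, Zhiyuan Liu, Bin Xu, Juanzi Li
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: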

Abstract:Existing Large Vision-Language Models (LVLMs) can process inputs with context lengths up to 128k visual and text tokens, yet they struggle to generate coherent outputs beyond 1,000 words. We find that the primary limitation is the absence of long output examples during supervised fine-tuning (SFT). To tackle this issue, we introduce LongWriter-V-22k, an SFT dataset comprising 22,158 examples, each with multiple input images, an instruction, and corresponding outputs ranging from 0 to 10,000 words. Moreover, to achieve long outputs that maintain high fidelity to the input images, we apply Direct Preference Optimization (DPO) to the SFT model. Given the high cost of collecting human feedback for lengthy outputs (e.g., 3,000 words), we propose IterDPO, which breaks long outputs into segments and uses iterative corrections to form preference pairs with the original outputs. Additionally, we develop MMLongBench-Write, a benchmark featuring six tasks to evaluate the long-generation capabilities of VLMs. Our 7B parameter model, trained with LongWriter-V-22k and IterDPO, achieves impressive performance on this benchmark, outperforming larger proprietary models like GPT-4o. Code and data: this https URL
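
Here is a minimal sketch of how segment-level preference pairs could be assembled once a long output has been split and each segment has (possibly) been corrected. It assumes the prompt for each pair is the original instruction followed by the already-corrected preceding segments. `build_iterdpo_pairs` is a hypothetical helper, and the correction step itself, whether human or model, is out of scope.

```python
def build_iterdpo_pairs(prompt: str, segments: list[str],
                        corrected: list[str]) -> list[dict]:
    """One DPO pair per segment that the correction step changed (sketch)."""
    pairs = []
    prefix = ""
    for original, fixed in zip(segments, corrected):
        if fixed != original:
            pairs.append({
                "prompt": prompt + prefix,  # instruction + corrected history (assumption)
                "chosen": fixed,
                "rejected": original,
            })
        prefix += fixed  # later segments are conditioned on the corrected text
    return pairs


pairs = build_iterdpo_pairs("Describe the images. ",
                            ["A cat. ", "It is blue. "],
                            ["A cat. ", "It is orange. "])
print(len(pairs))  # 1
```

Pairing at the segment level keeps each comparison short. That is what makes feedback cheaper than judging a 3,000-word output as a whole.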

[NLP-11] Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs

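[Quick Read]: This paper addresses weak cross-lingual transfer in large language models (LLMs) fine-tuned for multilingual, task-specific applications. The key finding is that an LLM's middle layers show the strongest potential for cross-lingual alignment. Building on this, the authors propose a middle-layer alignment objective integrated into task-specific training. Experiments on several tasks show consistent improvements in cross-lingual transfer, especially for lower-resource languages, and the method generalizes to languages unseen during alignment. Moreover, separately trained alignment modules can be merged with existing task-specific modules, improving cross-lingual capabilities without full re-training.

Link: https://arxiv.org/abs/2502.14830
Authors: Danni Liu, Jan Niehues
Affiliations: Karlsruhe Institute of Technology
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: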

Abstract:While large language models demonstrate remarkable capabilities at task-specific applications through fine-tuning, extending these benefits across diverse languages is essential for broad accessibility. However, effective cross-lingual transfer is hindered by LLM performance gaps across languages and the scarcity of fine-tuning data in many languages. Through analysis of LLM internal representations from over 1,000 language pairs, we discover that middle layers exhibit the strongest potential for cross-lingual alignment. Building on this finding, we propose a middle-layer alignment objective integrated into task-specific training. Our experiments on slot filling, machine translation, and structured text generation show consistent improvements in cross-lingual transfer, especially to lower-resource languages. The method is robust to the choice of alignment languages and generalizes to languages unseen during alignment. Furthermore, we show that separately trained alignment modules can be merged with existing task-specific modules, improving cross-lingual capabilities without full re-training. Our code is publicly available (this https URL).
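
One way to picture a middle-layer alignment objective is as follows. Mean-pool the hidden states of a sentence and of its translation at the middle layer, then penalize their dissimilarity. During training, this term would be added to the task loss with some weight. The cosine-distance loss, the mean pooling, and the layer choice `len(layers) // 2` are all illustrative assumptions. The paper's actual objective may differ, for example by using a contrastive loss over a batch. NumPy stands in for a training framework.

```python
import numpy as np


def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token states, ignoring padding. hidden: (seq, dim); mask: (seq,)."""
    m = mask[:, None].astype(float)
    return (hidden * m).sum(axis=0) / m.sum()


def middle_layer_alignment_loss(layers_src, mask_src, layers_tgt, mask_tgt) -> float:
    """1 - cosine similarity of pooled middle-layer states of a parallel pair.

    layers_*: list of (seq, dim) arrays, one per transformer layer.
    """
    mid = len(layers_src) // 2  # "middle" layer (assumption)
    a = mean_pool(layers_src[mid], mask_src)
    b = mean_pool(layers_tgt[mid], mask_tgt)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return 1.0 - cos
```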

[NLP-12] Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps

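[Quick Read]: This paper investigates whether the reasoning steps that language models (LMs) generate in chain-of-thought (CoT) reasoning are faithful to the models' parametric beliefs. The key contribution is a framework for measuring the parametric faithfulness of generated reasoning. Within it, the authors propose Faithfulness by Unlearning Reasoning steps (FUR), an instance that erases the information contained in reasoning steps from the model parameters. Experiments show that unlearning key steps with FUR frequently changes the model's prediction, which indicates when a CoT is parametrically faithful. Further analysis shows that CoTs generated after unlearning support different answers. It also shows that the steps FUR identifies as important do not align well with human notions of plausibility, underscoring the need for specialized alignment.

Link: https://arxiv.org/abs/2502.14829
Authors: Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasović, Yonatan Belinkov
Affiliations: Technion - Israel Institute of Technology; University of Utah
Categories: Computation and Language (cs.CL)
Comments: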

Abstract:When prompted to think step-by-step, language models (LMs) produce a chain of thought (CoT), a sequence of reasoning steps that the model supposedly used to produce its prediction. However, despite much work on CoT prompting, it is unclear if CoT reasoning is faithful to the models’ parametric beliefs. We introduce a framework for measuring parametric faithfulness of generated reasoning, and propose Faithfulness by Unlearning Reasoning steps (FUR), an instance of this framework. FUR erases information contained in reasoning steps from model parameters. We perform experiments unlearning CoTs of four LMs prompted on four multi-choice question answering (MCQA) datasets. Our experiments show that FUR is frequently able to change the underlying models’ prediction by unlearning key steps, indicating when a CoT is parametrically faithful. Further analysis shows that CoTs generated by models post-unlearning support different answers, hinting at a deeper effect of unlearning. Importantly, CoT steps identified as important by FUR do not align well with human notions of plausibility, emphasizing the need for specialized alignment.

[NLP-13] eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables NAACL2025

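[Quick Read]: This paper addresses the limited use of large language models (LLMs) in e-commerce, which stems mainly from a lack of domain-specific datasets. To fill this gap, it introduces eC-Tab2Text, a new dataset that captures the intricacies of e-commerce, including detailed product attributes and user-specific queries. The key to the solution is fine-tuning models on eC-Tab2Text so that LLMs can generate high-quality, attribute-specific product reviews from structured tabular data. This substantially improves their ability to produce contextually accurate reviews.

Link: https://arxiv.org/abs/2502.14820
Authors: Luis Antonio Gutiérrez Guanilo, Mir Tafseer Nayeem, Cristian López, Davood Rafiei
Affiliations: University of Engineering and Technology (UTEC); University of Alberta
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB); Human-Computer Interaction (cs.HC)
Comments: NAACL 2025 (Industry Track)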

Abstract:Large Language Models (LLMs) have demonstrated exceptional versatility across diverse domains, yet their application in e-commerce remains underexplored due to a lack of domain-specific datasets. To address this gap, we introduce eC-Tab2Text, a novel dataset designed to capture the intricacies of e-commerce, including detailed product attributes and user-specific queries. Leveraging eC-Tab2Text, we focus on text generation from product tables, enabling LLMs to produce high-quality, attribute-specific product reviews from structured tabular data. Fine-tuned models were rigorously evaluated using standard Table2Text metrics, alongside correctness, faithfulness, and fluency assessments. Our results demonstrate substantial improvements in generating contextually accurate reviews, highlighting the transformative potential of tailored datasets and fine-tuning methodologies in optimizing e-commerce workflows. This work highlights the potential of LLMs in e-commerce workflows and the essential role of domain-specific datasets in tailoring them to industry-specific challenges.

[NLP-14] Optimizing Model Selection for Compound AI Systems

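[Quick Read]: This paper addresses how to choose which large language model (LLM) to use for each module when optimizing a compound AI system. The key is LLMSelector, an efficient framework built on two empirical insights. First, end-to-end performance is often monotonic in how well each module performs, with all other modules held fixed. Second, per-module performance can be estimated accurately by an LLM. Based on these insights, LLMSelector iteratively selects one module and allocates to it the model with the highest estimated module-wise performance, repeating until no further gain is possible. The method applies to any compound system with a bounded number of modules. Its number of API calls grows linearly with the number of modules, and it achieves high-quality model allocation.

Link: https://arxiv.org/abs/2502.14815
Authors: Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Matei Zaharia, James Zou, Ion Stoica
Affiliations: Microsoft Research; Stanford University; Princeton University; University of California, Berkeley
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Comments: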

Abstract:Compound AI systems that combine multiple LLM calls, such as self-refine and multi-agent-debate, achieve strong performance on many AI tasks. We address a core question in optimizing compound systems: for each LLM call or module in the system, how should one decide which LLM to use? We show that these LLM choices have a large effect on quality, but the search space is exponential. We propose LLMSelector, an efficient framework for model selection in compound systems, which leverages two key empirical insights: (i) end-to-end performance is often monotonic in how well each module performs, with all other modules held fixed, and (ii) per-module performance can be estimated accurately by an LLM. Building upon these insights, LLMSelector iteratively selects one module and allocates to it the model with the highest module-wise performance, as estimated by an LLM, until no further gain is possible. LLMSelector is applicable to any compound system with a bounded number of modules, and its number of API calls scales linearly with the number of modules, achieving high-quality model allocation both empirically and theoretically. Experiments with popular compound systems such as multi-agent debate and self-refine using LLMs such as GPT-4o, Claude 3.5 Sonnet and Gemini 1.5 show that LLMSelector confers 5%-70% accuracy gains compared to using the same LLM for all modules.
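
The allocation loop can be sketched as coordinate ascent. Start with one model for every module, then repeatedly give each module the candidate with the best estimated module-wise score, stopping when a full pass changes nothing. In the paper the per-module estimate comes from an LLM. Here `estimate` is a hypothetical callback backed by a made-up score table. The sketch is an interpretation of the description above, not the authors' code.

```python
from typing import Callable


def llm_selector(modules: list[str], models: list[str],
                 estimate: Callable[[str, str, dict], float],
                 max_rounds: int = 10) -> dict[str, str]:
    """Greedy per-module model allocation (sketch)."""
    alloc = {m: models[0] for m in modules}
    for _ in range(max_rounds):
        improved = False
        for m in modules:
            # estimate(module, candidate, current_allocation) -> module-wise score
            best = max(models, key=lambda c: estimate(m, c, alloc))
            if estimate(m, best, alloc) > estimate(m, alloc[m], alloc):
                alloc[m] = best
                improved = True
        if not improved:  # no module can be improved any further: stop
            break
    return alloc


scores = {("generator", "model-x"): 0.9, ("generator", "model-y"): 0.7,
          ("critic", "model-x"): 0.6, ("critic", "model-y"): 0.8}
result = llm_selector(["generator", "critic"], ["model-x", "model-y"],
                      lambda m, c, alloc: scores[(m, c)])
print(result)  # {'generator': 'model-x', 'critic': 'model-y'}
```

Each pass calls the estimator at most a fixed number of times per module (once per candidate model, plus one comparison). The cost of a pass therefore grows linearly with the number of modules, which is consistent with the scaling described above.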

[NLP-15] From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

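[Quick Read]: This paper addresses two limitations of retrieval-augmented generation (RAG). Vector-retrieval-based RAG cannot adequately mimic the dynamic and interconnected nature of human long-term memory. Meanwhile, recent RAG variants that add structures such as knowledge graphs to close this gap perform considerably worse than standard RAG on basic factual memory tasks. The key solution is HippoRAG 2. This framework builds on the Personalized PageRank algorithm used in HippoRAG and enhances it with deeper passage integration and more effective online use of a large language model (LLM). It achieves a 7% improvement over the state-of-the-art embedding model on associative memory tasks, while also performing better on factual knowledge and sense-making memory.

Link: https://arxiv.org/abs/2502.14802
Authors: Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, Yu Su
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Code and data to be released at: this https URL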

Abstract:Our ability to continuously acquire, organize, and leverage knowledge is a key feature of human intelligence that AI systems must approximate to unlock their full potential. Given the challenges in continual learning with large language models (LLMs), retrieval-augmented generation (RAG) has become the dominant way to introduce new information. However, its reliance on vector retrieval hinders its ability to mimic the dynamic and interconnected nature of human long-term memory. Recent RAG approaches augment vector embeddings with various structures like knowledge graphs to address some of these gaps, namely sense-making and associativity. However, their performance on more basic factual memory tasks drops considerably below standard RAG. We address this unintended deterioration and propose HippoRAG 2, a framework that outperforms standard RAG comprehensively on factual, sense-making, and associative memory tasks. HippoRAG 2 builds upon the Personalized PageRank algorithm used in HippoRAG and enhances it with deeper passage integration and more effective online use of an LLM. This combination pushes this RAG system closer to the effectiveness of human long-term memory, achieving a 7% improvement in associative memory tasks over the state-of-the-art embedding model while also exhibiting superior factual knowledge and sense-making memory capabilities. This work paves the way for non-parametric continual learning for LLMs. Our code and data will be released at this https URL.
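
Personalized PageRank itself is a standard algorithm. It scores nodes by a random walk that, with probability 1 − α, jumps back to a set of seed nodes; in a RAG setting these would be nodes matched to the query. Below is a generic power-iteration sketch in NumPy. How HippoRAG 2 builds its graph, chooses seeds, and weights passages is not modeled. The damping value 0.85 is a conventional default, not a value taken from the paper.

```python
import numpy as np


def personalized_pagerank(adj: np.ndarray, seeds: list[int], alpha: float = 0.85,
                          iters: int = 100, tol: float = 1e-10) -> np.ndarray:
    """Stationary scores of a walk that restarts at `seeds` with prob. 1 - alpha.

    adj: (n, n) non-negative edge weights, adj[i, j] = weight of edge i -> j.
    """
    n = adj.shape[0]
    restart = np.zeros(n)
    restart[seeds] = 1.0
    restart /= restart.sum()
    out_deg = adj.sum(axis=1)
    # Row-normalize; rows of dangling nodes (no out-edges) stay all zero.
    trans = np.divide(adj, out_deg[:, None], out=np.zeros(adj.shape),
                      where=out_deg[:, None] > 0)
    r = restart.copy()
    for _ in range(iters):
        dangling_mass = r[out_deg == 0].sum()  # walkers stuck at dangling nodes restart
        r_new = alpha * (r @ trans + dangling_mass * restart) + (1 - alpha) * restart
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r
```

Sending the dangling mass back to the seeds keeps the scores summing to 1. It also keeps the ranking focused on the neighborhood of the query.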

[NLP-16] Rapid Word Learning Through Meta In-Context Learning

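[Quick Read]: This paper addresses the limited ability of current language models to learn new words from only a few examples. The key solution is Meta-training for IN-context learNing Of Words (Minnow). A special placeholder token represents the new word, and the model is trained to generate new usage examples of that word from a few in-context examples. This training is repeated over many new words to develop a general word-learning ability. Language models trained from scratch with Minnow on human-scale child-directed language achieve strong few-shot word learning. Fine-tuning pre-trained large language models (LLMs) with Minnow further improves their ability to discriminate between new words, identify the syntactic categories of new words, and generate reasonable new usages and definitions for them.

Link: https://arxiv.org/abs/2502.14791
Authors: Wentao Wang, Guangyuan Jiang, Tal Linzen, Brenden M. Lake
Affiliations: New York University; Peking University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: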

Abstract:Humans can quickly learn a new word from a few illustrative examples, and then systematically and flexibly use it in novel contexts. Yet the abilities of current language models for few-shot word learning, and methods for improving these abilities, are underexplored. In this study, we introduce a novel method, Meta-training for IN-context learNing Of Words (Minnow). This method trains language models to generate new examples of a word’s usage given a few in-context examples, using a special placeholder token to represent the new word. This training is repeated on many new words to develop a general word-learning ability. We find that training models from scratch with Minnow on human-scale child-directed language enables strong few-shot word learning, comparable to a large language model (LLM) pre-trained on orders of magnitude more data. Furthermore, through discriminative and generative evaluations, we demonstrate that finetuning pre-trained LLMs with Minnow improves their ability to discriminate between new words, identify syntactic categories of new words, and generate reasonable new usages and definitions for new words, based on one or a few in-context examples. These findings highlight the data efficiency of Minnow and its potential to improve language model performance in word learning tasks.
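
A single meta-training episode, as described above, can be sketched as a data-formatting step. Replace the new word with a placeholder token in several usage examples, show k of them as context, and use the next one as the generation target. The placeholder and separator strings below are hypothetical, not the tokens used in the paper. Plain substring replacement is a simplification.

```python
def make_minnow_episode(word: str, usages: list[str], k: int,
                        placeholder: str = "<new>",
                        sep: str = " <sep> ") -> tuple[str, str]:
    """Build (context, target) for one word-learning episode (sketch).

    Requires len(usages) > k: k examples form the context, the next is the target.
    """
    if len(usages) <= k:
        raise ValueError("need at least k + 1 usage examples")
    masked = [u.replace(word, placeholder) for u in usages]
    context = sep.join(masked[:k]) + sep
    return context, masked[k]


context, target = make_minnow_episode(
    "dax", ["I saw a dax.", "The dax ran away.", "A dax is red."], k=2)
print(target)  # A <new> is red.
```

Training repeats this over many different words, so the model learns to infer a word's use from context instead of memorizing any single word.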

[NLP-17] ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting

【速读】：该论文旨在解决多模态交互中的视觉隐私泄露以及实时设备端可用性限制的问题。其关键解决方案是提出了一种名为Visual Instruction Rewriting的方法，通过将多模态指令转换为纯文本命令，实现了轻量级设备端指令重写视觉语言模型（VLM）与现有对话式AI系统的无缝集成，从而增强视觉数据的隐私保护。

Link: https://arxiv.org/abs/2502.14780
Authors: Abhijit Mishra, Richard Noh, Hsiang Fu, Mingda Li, Minji Kim
Affiliations: School of Information, University of Texas at Austin; Department of Statistics and Data Science, Yale University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: 12 pages, 7 figures, 3 tables


Abstract:Efficient and privacy-preserving multimodal interaction is essential as AR, VR, and modern smartphones with powerful cameras become primary interfaces for human-computer communication. Existing powerful large vision-language models (VLMs) enabling multimodal interaction often rely on cloud-based processing, raising significant concerns about (1) visual privacy by transmitting sensitive vision data to servers, and (2) their limited real-time, on-device usability. This paper explores Visual Instruction Rewriting, a novel approach that transforms multimodal instructions into text-only commands, allowing seamless integration of lightweight on-device instruction rewriter VLMs (250M parameters) with existing conversational AI systems, enhancing vision data privacy. To achieve this, we present a dataset of over 39,000 examples across 14 domains and develop a compact VLM, pretrained on image captioning datasets and fine-tuned for instruction rewriting. Experimental results, evaluated through NLG metrics such as BLEU, METEOR, and ROUGE, along with semantic parsing analysis, demonstrate that even a quantized version of the model (500MB storage footprint) can achieve effective instruction rewriting, thus enabling privacy-focused, multimodal AI applications.

[NLP-18] Harnessing PDF Data for Improving Japanese Large Multimodal Models

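[Quick Read]: This paper addresses the limited effectiveness of large multimodal models (LMMs) in Japanese, which stems mainly from a lack of high-quality training data. Current Japanese LMMs usually rely on translated English datasets, so they struggle to capture Japan-specific cultural knowledge. The key idea is to use Japanese PDF data as a training resource through a fully automated pipeline. Pretrained models extract image-text pairs via layout analysis, optical character recognition (OCR), and vision-language pairing, with no manual annotation. Instruction data is then built from the extracted pairs to enrich the training set. With this data, Japanese LMMs improve by 3.9% to 13.8% on Heron-Bench, which supports the value of PDF data as a multimodal resource.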

Link: https://arxiv.org/abs/2502.14778
Authors: Jeonghun Baek, Akiko Aizawa, Kiyoharu Aizawa
Affiliations: The University of Tokyo; National Institute of Informatics
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: 15 pages, 8 figures


Abstract:Large Multimodal Models (LMMs) have demonstrated strong performance in English, but their effectiveness in Japanese remains limited due to the lack of high-quality training data. Current Japanese LMMs often rely on translated English datasets, restricting their ability to capture Japan-specific cultural knowledge. To address this, we explore the potential of Japanese PDF data as a training resource, an area that remains largely underutilized. We introduce a fully automated pipeline that leverages pretrained models to extract image-text pairs from PDFs through layout analysis, OCR, and vision-language pairing, removing the need for manual annotation. Additionally, we construct instruction data from extracted image-text pairs to enrich the training data. To evaluate the effectiveness of PDF-derived data, we train Japanese LMMs and assess their performance on the Japanese LMM Benchmark. Our results demonstrate substantial improvements, with performance gains ranging from 3.9% to 13.8% on Heron-Bench. Further analysis highlights the impact of PDF-derived data on various factors, such as model size and language models, reinforcing its value as a multimodal resource for Japanese LMMs. We plan to make the source code and data publicly available upon acceptance.
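In its simplest form, the vision-language pairing step might reduce to a geometric heuristic applied after layout analysis and OCR. This sketch assumes those earlier stages have already produced labeled bounding boxes. The `Block` type, the "nearest text block directly below the figure" rule, and the `max_gap` threshold are all illustrative assumptions. They are not the paper's pairing method, which relies on pretrained models.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str        # "figure" or "text", as labeled by a layout-analysis model
    x0: float
    y0: float        # top edge (y grows downward, as in rendered page images)
    x1: float
    y1: float        # bottom edge
    text: str = ""   # OCR output for text blocks


def horizontal_overlap(a: Block, b: Block) -> float:
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def pair_figures_with_text(blocks: list[Block], max_gap: float = 50.0) -> list[tuple[Block, str]]:
    """Pair each figure with the closest non-empty text block just below it (caption heuristic)."""
    figures = [b for b in blocks if b.kind == "figure"]
    texts = [b for b in blocks if b.kind == "text" and b.text.strip()]
    pairs = []
    for fig in figures:
        candidates = [
            t for t in texts
            if t.y0 >= fig.y1                     # starts below the figure
            and t.y0 - fig.y1 <= max_gap          # and is not too far away
            and horizontal_overlap(fig, t) > 0    # and sits in roughly the same column
        ]
        if candidates:
            best = min(candidates, key=lambda t: t.y0 - fig.y1)
            pairs.append((fig, best.text))
    return pairs


page = [
    Block("figure", 50, 100, 300, 300),
    Block("text", 50, 310, 300, 330, "Figure 1: Tokyo at night"),
    Block("text", 50, 600, 300, 700, "A body paragraph far below the figure."),
]
pairs = pair_figures_with_text(page)
```

A rule like this requires no annotation, but it misses captions placed above or beside figures. That limitation is one reason a learned pairing model is attractive.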

[NLP-19] SurveyX: Academic Survey Automation via Large Language Models

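[Quick Read]: This paper targets three limitations of existing automated survey-generation systems: finite context windows, a lack of in-depth content discussion, and the absence of systematic evaluation frameworks. Its solution splits survey generation into a Preparation phase and a Generation phase. It also introduces online reference retrieval, a pre-processing method called AttributeTree, and a re-polishing process, which together improve the efficiency and quality of the generated surveys. In experiments, SurveyX outperforms existing automated survey-generation systems in both content quality and citation quality, approaching the level of human experts.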

Link: https://arxiv.org/abs/2502.14776
Authors: Xun Liang, Jiawei Yang, Yezhaohui Wang, Chen Tang, Zifan Zheng, Simin Niu, Shichao Song, Hanyu Wang, Bo Tang, Feiyu Xiong, Keming Mao, Zhiyu li
Affiliations: Renmin University of China; Northeastern University; Institute for Advanced Algorithms Research; The University of Sydney
Categories: Computation and Language (cs.CL)
Comments: 15 pages, 16 figures


Abstract:Large Language Models (LLMs) have demonstrated exceptional comprehension capabilities and a vast knowledge base, suggesting that LLMs can serve as efficient tools for automated survey generation. However, recent research related to automated survey generation remains constrained by some critical limitations like finite context window, lack of in-depth content discussion, and absence of systematic evaluation frameworks. Inspired by human writing processes, we propose SurveyX, an efficient and organized system for automated survey generation that decomposes the survey composing process into two phases: the Preparation and Generation phases. By innovatively introducing online reference retrieval, a pre-processing method called AttributeTree, and a re-polishing process, SurveyX significantly enhances the efficacy of survey composition. Experimental evaluation results show that SurveyX outperforms existing automated survey generation systems in content quality (0.259 improvement) and citation quality (1.76 enhancement), approaching human expert performance across multiple evaluation dimensions. Examples of surveys generated by SurveyX are available on this http URL

[NLP-20] Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

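[Quick Read]: This paper explores the potential of rule-based reinforcement learning (RL) for large reasoning models, using synthetic logic puzzles as training data to analyze reasoning dynamics. It makes three key contributions. The first is a system prompt that emphasizes the thinking and answering process. The second is a strict format reward function that penalizes outputs that take shortcuts. The third is a simple training recipe that converges stably. With these techniques, a 7B-parameter model acquires complex reasoning skills such as reflection, verification, and summarization. After training on only 5,000 logic problems, it also generalizes to the challenging math benchmarks AIME and AMC.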

Link: https://arxiv.org/abs/2502.14768
Authors: Tian Xie, Zitian Gao, Qingnan Ren, Haoming Luo, Yuqian Hong, Bryan Dai, Joey Zhou, Kai Qiu, Zhirong Wu, Chong Luo
Affiliations: Microsoft Research Asia; Ubiquant; Independent
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:


Abstract:Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in large reasoning models. To analyze reasoning dynamics, we use synthetic logic puzzles as training data due to their controllable complexity and straightforward answer verification. We make some key technical contributions that lead to effective and stable RL training: a system prompt that emphasizes the thinking and answering process, a stringent format reward function that penalizes outputs for taking shortcuts, and a straightforward training recipe that achieves stable convergence. Our 7B model develops advanced reasoning skills-such as reflection, verification, and summarization-that are absent from the logic corpus. Remarkably, after training on just 5K logic problems, it demonstrates generalization abilities to the challenging math benchmarks AIME and AMC.
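A strict format reward of the kind described can be written as a pure function over the model's output string. The `<think>…</think><answer>…</answer>` layout and the +1/-1 reward values are assumptions made for this illustration. The paper's exact tags, scores, and checks may differ.

```python
import re

# Hypothetical required layout: one <think> block, then one <answer> block, nothing else.
FORMAT_RE = re.compile(
    r"^\s*<think>(?P<think>.+?)</think>\s*<answer>(?P<answer>.+?)</answer>\s*$",
    flags=re.DOTALL,
)
TAGS = ("<think>", "</think>", "<answer>", "</answer>")


def format_reward(output: str) -> float:
    """Return +1.0 for a well-formed output and -1.0 otherwise (assumed reward values)."""
    # Each tag must appear exactly once; this rejects repeated or nested blocks.
    if any(output.count(tag) != 1 for tag in TAGS):
        return -1.0
    match = FORMAT_RE.match(output)
    if match is None:  # wrong order or stray text outside the two blocks
        return -1.0
    # Penalize a "shortcut": skipping the reasoning or leaving the answer blank.
    if not match.group("think").strip() or not match.group("answer").strip():
        return -1.0
    return 1.0


print(format_reward("<think>If A is a knight, B lies.</think><answer>A: knight</answer>"))
```

Keeping the check strict matters because a lenient parser can be exploited: the policy learns to emit a bare answer and still collect reward.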

[NLP-21] Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis

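[Quick Read]: Scientific discoveries are increasingly fragmented within and across fields, which makes related work hard to evaluate. This is especially true when comparing work from different research communities in terms of significance, novelty, incremental findings, and equivalent ideas. The paper proposes Tree-of-Debate (ToD), a framework that turns scientific papers into large language model (LLM) personas that debate their respective novelties. ToD dynamically builds a debate tree that emphasizes structured, critical reasoning rather than outcomes alone. This enables fine-grained analysis of the independent novelty arguments within scholarly articles.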

Link: https://arxiv.org/abs/2502.14767
Authors: Priyanka Kargupta, Ishika Agarwal, Tal August, Jiawei Han
Affiliations: Department of Computer Science, University of Illinois at Urbana-Champaign
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Code available at: this https URL


Abstract:With the exponential growth of research facilitated by modern technology and improved accessibility, scientific discoveries have become increasingly fragmented within and across fields. This makes it challenging to assess the significance, novelty, incremental findings, and equivalent ideas between related works, particularly those from different research communities. Large language models (LLMs) have recently demonstrated strong quantitative and qualitative reasoning abilities, and multi-agent LLM debates have shown promise in handling complex reasoning tasks by exploring diverse perspectives and reasoning paths. Inspired by this, we introduce Tree-of-Debate (ToD), a framework which converts scientific papers into LLM personas that debate their respective novelties. To emphasize structured, critical reasoning rather than focusing solely on outcomes, ToD dynamically constructs a debate tree, enabling fine-grained analysis of independent novelty arguments within scholarly articles. Through experiments on scientific literature across various domains, evaluated by expert researchers, we demonstrate that ToD generates informative arguments, effectively contrasts papers, and supports researchers in their literature review.

[NLP-22] Step-by-Step Fact Verification System for Medical Claims with Explainable Reasoning NAACL2025

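[Quick Read]: This paper addresses fact verification (FV) for domain-specific, real-world claims. Traditional approaches target encyclopedic claims and rely on short evidence snippets and encoder-only inference models. The paper applies an iterative approach that exploits the multi-turn nature of large language models (LLMs) and treats FV as a step-by-step problem. Questions asking for additional context are generated and answered until there is enough information to reach a decision, which makes the verification process more rational and explainable.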

Link: https://arxiv.org/abs/2502.14765
Authors: Juraj Vladika, Ivana Hacajová, Florian Matthes
Affiliations: Technical University of Munich; School of Computation, Information and Technology; Department of Computer Science
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Accepted to NAACL 2025 (Main)


Abstract:Fact verification (FV) aims to assess the veracity of a claim based on relevant evidence. The traditional approach for automated FV includes a three-part pipeline relying on short evidence snippets and encoder-only inference models. More recent approaches leverage the multi-turn nature of LLMs to address FV as a step-by-step problem where questions inquiring additional context are generated and answered until there is enough information to make a decision. This iterative method makes the verification process rational and explainable. While these methods have been tested for encyclopedic claims, exploration on domain-specific and realistic claims is missing. In this work, we apply an iterative FV system on three medical fact-checking datasets and evaluate it with multiple settings, including different LLMs, external web search, and structured reasoning using logic predicates. We demonstrate improvements in the final performance over traditional approaches and the high potential of step-by-step FV systems for domain-specific claims.
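The iterative loop described above can be sketched with the model calls abstracted as plain callables. `ask`, `answer`, and `decide` are hypothetical stand-ins for LLM prompts, with web search optionally behind `answer`. The `max_steps` cap and the convention that `ask` returns `None` once the evidence suffices are also assumptions for illustration, not the authors' system.

```python
from typing import Callable, Optional

Evidence = list[tuple[str, str]]  # (question, answer) pairs gathered so far


def verify_claim(
    claim: str,
    ask: Callable[[str, Evidence], Optional[str]],  # next question, or None if enough info
    answer: Callable[[str], str],                    # answers one question (LLM or search)
    decide: Callable[[str, Evidence], str],          # final verdict from claim + evidence
    max_steps: int = 5,
) -> tuple[str, Evidence]:
    """Step-by-step fact verification: ask follow-up questions until ready to decide."""
    evidence: Evidence = []
    for _ in range(max_steps):
        question = ask(claim, evidence)
        if question is None:  # the model judges the evidence sufficient
            break
        evidence.append((question, answer(question)))
    # The (question, answer) trace doubles as a human-readable explanation.
    return decide(claim, evidence), evidence


# Toy stubs standing in for LLM calls (hypothetical, for illustration only).
def toy_ask(claim: str, evidence: Evidence) -> Optional[str]:
    questions = ["What does the cited study report about the claimed effect?"]
    return questions[len(evidence)] if len(evidence) < len(questions) else None


def toy_answer(question: str) -> str:
    return "The retrieved evidence contradicts the claim."


def toy_decide(claim: str, evidence: Evidence) -> str:
    return "REFUTED" if any("contradicts" in a for _, a in evidence) else "NOT ENOUGH INFO"


verdict, trace = verify_claim("Drug X cures condition Y.", toy_ask, toy_answer, toy_decide)
print(verdict)  # REFUTED
```

Passing the steps in as functions keeps the control flow separate from any particular LLM, search backend, or structured-reasoning prompt. That separation mirrors the kind of settings the paper varies.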

[NLP-23] On the Influence of Context Size and Model Choice in Retrieval-Augmented Generation Systems NAACL2025

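[Quick Read]: This paper studies how component choices affect the performance of retrieval-augmented generation (RAG) systems. It focuses on the ideal amount of provided context, the choice of base large language model (LLM), and the effectiveness of different retrieval methods. Final question-answering performance improves steadily with up to 15 provided snippets, then stagnates or declines. General-purpose LLMs also perform differently in the biomedical domain than in the encyclopedic domain, and open-domain evidence retrieval over large corpora proves challenging.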

Link: https://arxiv.org/abs/2502.14759
Authors: Juraj Vladika, Florian Matthes
Affiliations: Technical University of Munich; School of Computation, Information and Technology; Department of Computer Science; Garching, Germany
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Accepted to Findings of NAACL 2025


Abstract:Retrieval-augmented generation (RAG) has emerged as an approach to augment large language models (LLMs) by reducing their reliance on static knowledge and improving answer factuality. RAG retrieves relevant context snippets and generates an answer based on them. Despite its increasing industrial adoption, systematic exploration of RAG components is lacking, particularly regarding the ideal size of provided context, and the choice of base LLM and retrieval method. To help guide development of robust RAG systems, we evaluate various context sizes, BM25 and semantic search as retrievers, and eight base LLMs. Moving away from the usual RAG evaluation with short answers, we explore the more challenging long-form question answering in two domains, where a good answer has to utilize the entire context. Our findings indicate that final QA performance improves steadily with up to 15 snippets but stagnates or declines beyond that. Finally, we show that different general-purpose LLMs excel in the biomedical domain than the encyclopedic one, and that open-domain evidence retrieval in large corpora is challenging.
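The "number of snippets" knob studied here can be made concrete with a minimal BM25 retriever, where `k` is the context size. The retriever is textbook Okapi BM25 with a Lucene-style non-negative IDF, the common defaults `k1=1.5` and `b=0.75`, and naive whitespace tokenization. It is not the paper's retrieval setup, and `top_k_snippets` is a hypothetical helper name.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query (Lucene-style non-negative IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1.0 - b + b * dl / avgdl)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


def top_k_snippets(query: str, docs: list[str], k: int) -> list[str]:
    """Return the k highest-scoring snippets; k is the context-size knob."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:k]]


docs = ["aspirin reduces fever", "the cat sat", "aspirin and fever in children", "weather today"]
print(top_k_snippets("aspirin fever", docs, k=2))
```

Sweeping `k` (for example 1, 5, 10, 15, 20) and scoring the generated answers is the kind of experiment behind the finding that gains level off around 15 snippets.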

[NLP-24] TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators

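[Quick Read]: This paper addresses the difficulty of generating efficient GPU kernels in the Triton language, in particular the failure of existing large language models (LLMs) to produce performance-optimized Triton code. Its key contribution is TritonBench, the first comprehensive benchmark for Triton operator generation. TritonBench evaluates both functional correctness and efficiency through two channels. One is a set of 184 real-world operators collected from GitHub, and the other is a collection of operators aligned with PyTorch interfaces. Current state-of-the-art code LLMs fall short at generating efficient Triton operators, which exposes a significant gap in high-performance code generation.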

Link: https://arxiv.org/abs/2502.14752
Authors: Jianling Li, Shangzhan Li, Zhenye Gao, Qi Shi, Yuxuan Li, Zefan Wang, Jiacheng Huang, Haojie Wang, Jianrong Wang, Xu Han, Zhiyuan Liu, Maosong Sun
Affiliations: Tianjin University; Harbin Institute of Technology; The Hong Kong University of Science and Technology (Guangzhou); Tsinghua University
Categories: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments:


Abstract:Triton, a high-level Python-like language designed for building efficient GPU kernels, is widely adopted in deep learning frameworks due to its portability, flexibility, and accessibility. However, programming and parallel optimization still require considerable trial and error from Triton developers. Despite advances in large language models (LLMs) for conventional code generation, these models struggle to generate accurate, performance-optimized Triton code, as they lack awareness of its specifications and the complexities of GPU programming. More critically, there is an urgent need for systematic evaluations tailored to Triton. In this work, we introduce TritonBench, the first comprehensive benchmark for Triton operator generation. TritonBench features two evaluation channels: a curated set of 184 real-world operators from GitHub and a collection of operators aligned with PyTorch interfaces. Unlike conventional code benchmarks prioritizing functional correctness, TritonBench also profiles efficiency performance on widely deployed GPUs aligned with industry applications. Our study reveals that current state-of-the-art code LLMs struggle to generate efficient Triton operators, highlighting a significant gap in high-performance code generation. TritonBench will be available at this https URL.
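TritonBench measures two things: functional correctness against a reference, and runtime efficiency. A small framework-agnostic harness can illustrate both. This sketch is a deliberate simplification. It compares plain-Python callables on lists within a tolerance and times them with `time.perf_counter`, whereas the real benchmark profiles Triton kernels on GPUs. `evaluate_operator` and its report fields are hypothetical.

```python
import math
import time
from typing import Callable, Sequence


def allclose(a: Sequence[float], b: Sequence[float], rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rtol, abs_tol=atol) for x, y in zip(a, b)
    )


def best_time(fn: Callable, *args, repeats: int = 5) -> float:
    """Best-of-N wall-clock time in seconds (a crude stand-in for GPU profiling)."""
    best = math.inf
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def evaluate_operator(candidate: Callable, reference: Callable, inputs: list[tuple]) -> dict:
    """Check correctness on every input first; only correct operators get a speedup."""
    correct = all(allclose(candidate(*args), reference(*args)) for args in inputs)
    if not correct:
        return {"correct": False, "speedup": None}
    ref_t = sum(best_time(reference, *args) for args in inputs)
    cand_t = sum(best_time(candidate, *args) for args in inputs)
    return {"correct": True, "speedup": ref_t / cand_t if cand_t > 0 else math.inf}


def ref_softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def buggy_softmax(xs):  # forgets to normalize
    return [math.exp(x) for x in xs]


inputs = [([1.0, 2.0, 3.0],), ([0.0, 0.0],)]
print(evaluate_operator(ref_softmax, ref_softmax, inputs)["correct"])  # True
print(evaluate_operator(buggy_softmax, ref_softmax, inputs))  # {'correct': False, 'speedup': None}
```

Gating the speed measurement on correctness reflects the point that a fast but wrong kernel should not count as a success.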

[NLP-25] Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of LLMs

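[Quick Read]: This paper examines how well large language models (LLMs) help users understand large document collections, compared with traditional topic models such as Latent Dirichlet Allocation (LDA). For data exploration, LLM-based methods produce more readable topics and achieve higher average win probabilities. On domain-specific datasets, however, they tend to produce overly generic topics that do not help users learn much about the documents. Adding human supervision mitigates hallucination and over-genericity, but it requires more human effort. The main takeaway is that on large, domain-specific corpora, context-length limits leave LLMs prone to weak descriptions and hallucination. Human guidance is therefore important when using LLMs for exploration.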

Link: https://arxiv.org/abs/2502.14748
Authors: Zongxia Li, Lorena Calvo-Bartolomé, Alexander Hoyle, Paiheng Xu, Alden Dima, Juan Francisco Fung, Jordan Boyd-Graber
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: 21 Pages. LLM for Data Exploration and content analysis


Abstract:A common use of NLP is to facilitate the understanding of large document collections, with a shift from using traditional topic models to Large Language Models. Yet the effectiveness of using LLM for large corpus understanding in real-world applications remains under-explored. This study measures the knowledge users acquire with unsupervised, supervised LLM-based exploratory approaches or traditional topic models on two datasets. While LLM-based methods generate more human-readable topics and show higher average win probabilities than traditional models for data exploration, they produce overly generic topics for domain-specific datasets that do not easily allow users to learn much about the documents. Adding human supervision to the LLM generation process improves data exploration by mitigating hallucination and over-genericity but requires greater human effort. In contrast, traditional models like Latent Dirichlet Allocation (LDA) remain effective for exploration but are less user-friendly. We show that LLMs struggle to describe the haystack of large corpora without human help, particularly domain-specific data, and face scaling and hallucination limitations due to context length constraints. Dataset available at https://huggingface.co/datasets/zli12321/Bills.

[NLP-26] HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States

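[Quick Read]: Integrating additional modalities makes large vision-language models (LVLMs) more vulnerable than language-only models to safety risks such as jailbreak attacks. The paper's key finding is that LVLMs show distinct internal activation patterns when processing unsafe prompts during inference. Building on this, it proposes HiddenDetect, a tuning-free framework that uses the model's internal activations to detect and mitigate adversarial inputs. The framework substantially improves jailbreak detection and offers an efficient, scalable way to make LVLMs more robust to multimodal threats.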

Link: https://arxiv.org/abs/2502.14744
Authors: Yilei Jiang, Xinyan Gao, Tianshuo Peng, Yingshui Tan, Xiaoyong Zhu, Bo Zheng, Xiangyu Yue
Affiliations: MMLab, The Chinese University of Hong Kong; Future Lab, Alibaba Group
Categories: Computation and Language (cs.CL)
Comments:


Abstract:The integration of additional modalities increases the susceptibility of large vision-language models (LVLMs) to safety risks, such as jailbreak attacks, compared to their language-only counterparts. While existing research primarily focuses on post-hoc alignment techniques, the underlying safety mechanisms within LVLMs remain largely unexplored. In this work, we investigate whether LVLMs inherently encode safety-relevant signals within their internal activations during inference. Our findings reveal that LVLMs exhibit distinct activation patterns when processing unsafe prompts, which can be leveraged to detect and mitigate adversarial inputs without requiring extensive fine-tuning. Building on this insight, we introduce HiddenDetect, a novel tuning-free framework that harnesses internal model activations to enhance safety. Experimental results show that HiddenDetect surpasses state-of-the-art methods in detecting jailbreak attacks against LVLMs. By utilizing intrinsic safety-aware patterns, our method provides an efficient and scalable solution for strengthening LVLM robustness against multimodal threats. Our code will be released publicly at this https URL.
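The general idea of reading a safety signal from hidden states without any fine-tuning can be sketched as a cosine-similarity probe. Every specific here is an illustrative assumption, not HiddenDetect's actual procedure: the "refusal direction" vector, averaging over a chosen set of layers, and the threshold value. In practice the hidden states would come from the model's forward pass; here they are plain lists of floats.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def safety_score(hidden_states: list[Sequence[float]], refusal_dir: Sequence[float], layers: list[int]) -> float:
    """Mean cosine similarity between selected layers' states and a (hypothetical) refusal direction."""
    sims = [cosine(hidden_states[i], refusal_dir) for i in layers]
    return sum(sims) / len(sims)


def is_unsafe(hidden_states, refusal_dir, layers, threshold: float = 0.3) -> bool:
    """Flag the prompt when activations lean toward the refusal direction (assumed threshold)."""
    return safety_score(hidden_states, refusal_dir, layers) > threshold


# Toy 3-dimensional "activations" for two layers (illustrative numbers only).
refusal = [1.0, 0.0, 0.0]
unsafe_states = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
safe_states = [[0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]
print(is_unsafe(unsafe_states, refusal, layers=[0, 1]), is_unsafe(safe_states, refusal, layers=[0, 1]))
```

The only cost of a probe like this is a dot product per selected layer, which fits the efficiency and scalability claims of tuning-free detection.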

[NLP-27] SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

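[Quick Read]: Human knowledge spans more than 200 specialized disciplines, and the abilities of large language models (LLMs) in many of them remain poorly evaluated, especially in light industry, agriculture, and service-oriented fields. The paper's key contribution is SuperGPQA, a benchmark covering 285 disciplines. It uses a novel Human-LLM collaborative filtering mechanism that iteratively refines questions based on LLM responses and expert feedback. Trivial or ambiguous questions are removed, so that LLMs can be assessed comprehensively across knowledge domains.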

Link: https://arxiv.org/abs/2502.14739
Authors: M-A-P Team, Xinrun Du, Yifan Yao, Kaijing Ma, Bingli Wang, Tianyu Zheng, Kang Zhu, Minghao Liu, Yiming Liang, Xiaolong Jin, Zhenlin Wei, Chujie Zheng, Kaixing Deng, Shuyue Guo, Shian Jia, Sichao Jiang, Yiyan Liao, Rui Li, Qinrui Li, Sirun Li, Yizhi Li, Yunwen Li, Dehua Ma, Yuansheng Ni, Haoran Que, Qiyao Wang, Zhoufutu Wen, Siwei Wu, Tianshun Xing, Ming Xu, Zhenzhu Yang, Zekun Moore Wang, Junting Zhou, Yuelin Bai, Xingyuan Bu, Chenglin Cai, Liang Chen, Yifan Chen, Chengtuo Cheng, Tianhao Cheng, Keyi Ding, Siming Huang, Yun Huang, Yaoru Li, Yizhe Li, Zhaoqun Li, Tianhao Liang, Chengdong Lin, Hongquan Lin, Yinghao Ma, Zhongyuan Peng, Zifan Peng, Qige Qi, Shi Qiu, Xingwei Qu, Yizhou Tan, Zili Wang, Chenqing Wang, Hao Wang, Yiya Wang, Yubo Wang, Jiajun Xu, Kexin Yang, Ruibin Yuan, Yuanhao Yue, Tianyang Zhan, Chun Zhang, Jingyang Zhang, Xiyue Zhang, Xingjian Zhang, Yue Zhang, Yongchi Zhao, Xiangyu Zheng, Chenghua Zhong, Yang Gao, Zhoujun Li, Dayiheng Liu, Qian Liu, Tianyu Liu, Shiwen Ni, Junran Peng, Yujia Qin, Wenbo Su, Guoyin Wang, Shi Wang, Jian Yang, Min Yang, Meng Cao, Xiang Yue, Zhaoxiang Zhang, Wangchunshu Zhou, Jiaheng Liu, Qunshu Lin, Wenhao Huang, Ge Zhang
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:


Abstract:Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields-particularly in light industry, agriculture, and service-oriented disciplines-remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.

[NLP-28] Sentence Smith: Formally Controllable Text Transformation and its Application to Evaluation of Text Embedding Models

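[Quick Read]: This paper addresses the limitations of text embedding models in handling semantic changes, and the opacity of linguistic phenomena in current benchmarks. Its key solution is the Sentence Smith framework, which enables controlled semantic manipulation in four steps. It parses a sentence into a semantic graph, applies human-designed semantic manipulation rules, generates text from the manipulated graph, and finally filters the output to ensure the transformation is valid. Because different kinds of semantic shift can be cleanly isolated this way, the framework gives deeper insight into the strengths and weaknesses of widely used text embedding models.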

Link: https://arxiv.org/abs/2502.14734
Authors: Hongji Li, Andrianos Michail, Reto Gubelmann, Simon Clematide, Juri Opitz
Affiliations: University of Zurich
Categories: Computation and Language (cs.CL)
Comments:


Abstract:We propose the Sentence Smith framework that enables controlled and specified manipulation of text meaning. It consists of three main steps: 1. Parsing a sentence into a semantic graph, 2. Applying human-designed semantic manipulation rules, and 3. Generating text from the manipulated graph. A final filtering step (4.) ensures the validity of the applied transformation. To demonstrate the utility of Sentence Smith in an application study, we use it to generate hard negative pairs that challenge text embedding models. Since the controllable generation makes it possible to clearly isolate different types of semantic shifts, we can gain deeper insights into the specific strengths and weaknesses of widely used text embedding models, also addressing an issue in current benchmarking where linguistic phenomena remain opaque. Human validation confirms that the generations produced by Sentence Smith are highly accurate.

[NLP-29] Entity Framing and Role Portrayal in the News ACL

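[Quick Read]: This paper builds a new multilingual hierarchical corpus annotated for entity framing and role portrayal in news articles. It covers 22 fine-grained roles under three main categories: protagonist, antagonist, and innocent. The corpus rests on a distinctive taxonomy inspired by storytelling elements, with a detailed definition for each role so that nuanced portrayals of entities can be captured. It contains 1,378 recent news articles in five languages (Bulgarian, English, Hindi, European Portuguese, and Russian). The articles cover two domains of global importance: the Ukraine-Russia War and Climate Change. The dataset is a valuable resource for research on role portrayal, with broader implications for news analysis.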

Link: https://arxiv.org/abs/2502.14718
Authors: Tarek Mahmoud, Zhuohan Xie, Dimitar Dimitrov, Nikolaos Nikolaidis, Purificação Silvano, Roman Yangarber, Shivam Sharma, Elisa Sartori, Nicolas Stefanovitch, Giovanni Da San Martino, Jakub Piskorski, Preslav Nakov
Affiliations: MBZUAI; Sofia University “St. Kliment Ohridski”; Athens University of Economics and Business; University of Porto; University of Helsinki; Indian Institute of Technology Delhi; University of Padova; European Commission Joint Research Centre; Institute of Computer Science, Polish Academy of Science
Categories: Computation and Language (cs.CL)
Comments: 23 pages, 12 figures. Submitted to ACL Rolling Review (ARR)


Abstract:We introduce a novel multilingual hierarchical corpus annotated for entity framing and role portrayal in news articles. The dataset uses a unique taxonomy inspired by storytelling elements, comprising 22 fine-grained roles, or archetypes, nested within three main categories: protagonist, antagonist, and innocent. Each archetype is carefully defined, capturing nuanced portrayals of entities such as guardian, martyr, and underdog for protagonists; tyrant, deceiver, and bigot for antagonists; and victim, scapegoat, and exploited for innocents. The dataset includes 1,378 recent news articles in five languages (Bulgarian, English, Hindi, European Portuguese, and Russian) focusing on two critical domains of global significance: the Ukraine-Russia War and Climate Change. Over 5,800 entity mentions have been annotated with role labels. This dataset serves as a valuable resource for research into role portrayal and has broader implications for news analysis. We describe the characteristics of the dataset and the annotation process, and we report evaluation results on fine-tuned state-of-the-art multilingual transformers and hierarchical zero-shot learning using LLMs at the level of a document, a paragraph, and a sentence.

[NLP-30] From Knowledge Generation to Knowledge Verification: Examining the BioMedical Generative Capabilities of ChatGPT

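[Quick Read]: This paper addresses the authenticity of biomedical knowledge generated by large language models (LLMs). It proposes a computational approach that systematically evaluates the factual accuracy of such knowledge. The approach has two processes: generating disease-centric associations, and verifying them against the semantic knowledge in biomedical ontologies. A series of prompt-engineering processes has ChatGPT generate links between diseases, drugs, symptoms, and genes, which form the basis for the assessment. Identification accuracy is high for disease terms (88%-97%), drug names (90%-91%), and genetic information (88%-98%). It is notably lower for symptom terms (49%-61%).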

Link: https://arxiv.org/abs/2502.14714
Authors: Ahmed Abdeen Hamed, Byung Suk Lee
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Comments: 26 pages, 6 figures, In Review with a Cell Press Journal


Abstract:The generative capabilities of LLM models present opportunities in accelerating tasks and concerns with the authenticity of the knowledge it produces. To address the concerns, we present a computational approach that systematically evaluates the factual accuracy of biomedical knowledge that an LLM model has been prompted to generate. Our approach encompasses two processes: the generation of disease-centric associations and the verification of them using the semantic knowledge of the biomedical ontologies. Using ChatGPT as the select LLM model, we designed a set of prompt-engineering processes to generate linkages between diseases, drugs, symptoms, and genes to establish grounds for assessments. Experimental results demonstrate high accuracy in identifying disease terms (88%-97%), drug names (90%-91%), and genetic information (88%-98%). The symptom term identification accuracy was notably lower (49%-61%), as verified against the DOID, ChEBI, SYMPTOM, and GO ontologies accordingly. The verification of associations reveals literature coverage rates of (89%-91%) among disease-drug and disease-gene associations. The low identification accuracy for symptom terms also contributed to the verification of symptom-related associations (49%-62%).

[NLP-31] Data-Efficient Pretraining with Group-Level Data Influence Modeling

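[Quick Read]: This paper argues that data-efficient pretraining should select data at the group level, treating a set of data points as a whole rather than as independent contributors. It proposes Group-Level Data Influence Modeling (Group-MATES), which collects and optimizes group-level data utility to improve pretraining. The method captures complex interactions between data points and uses influence-aware clustering for efficient inference. It yields significant gains across a range of downstream tasks.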

Link: https://arxiv.org/abs/2502.14709
Authors: Zichun Yu, Fei Peng, Jie Lei, Arnold Overwijk, Wen-tau Yih, Chenyan Xiong
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments:


Abstract:Data-efficient pretraining has shown tremendous potential to elevate scaling laws. This paper argues that effective pretraining data should be curated at the group level, treating a set of data points as a whole rather than as independent contributors. To achieve that, we propose Group-Level Data Influence Modeling (Group-MATES), a novel data-efficient pretraining method that captures and optimizes group-level data utility. Specifically, Group-MATES collects oracle group-level influences by locally probing the pretraining model with data sets. It then fine-tunes a relational data influence model to approximate oracles as relationship-weighted aggregations of individual influences. The fine-tuned model selects the data subset by maximizing its group-level influence prediction, with influence-aware clustering to enable efficient inference. Experiments on the DCLM benchmark demonstrate that Group-MATES achieves a 10% relative core score improvement on 22 downstream tasks over DCLM-Baseline and 5% over individual-influence-based methods, establishing a new state-of-the-art. Further analyses highlight the effectiveness of relational data influence models in capturing intricate interactions between data points.

[NLP-32] I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search

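[Quick Read]: Existing large language model (LLM)-based agents for automated machine learning generate code that is low in diversity and suboptimal in quality, and this paper sets out to fix that. Its key solution is Introspective Monte Carlo Tree Search (I-MCTS). The search iteratively expands tree nodes through an introspective process that carefully analyzes the solutions and results of parent and sibling nodes, so nodes are continuously refined. I-MCTS also integrates an LLM-based value model that evaluates each node's solution directly. A hybrid reward mechanism then transitions smoothly from LLM-estimated scores to actual performance scores, so that high-quality nodes are traversed earlier. Together these changes yield a 6% absolute improvement over strong open-source AutoML agents.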

Link: https://arxiv.org/abs/2502.14693
Authors: Zujie Liang, Feng Wei, Wujiang Xu, Lin Chen, Yuxi Qian, Xinhui Wu
Affiliations: MYbank, Ant Group; Rutgers University
Categories: Computation and Language (cs.CL)
Comments:


Abstract:Recent advancements in large language models (LLMs) have shown remarkable potential in automating machine learning tasks. However, existing LLM-based agents often struggle with low-diversity and suboptimal code generation. While recent work has introduced Monte Carlo Tree Search (MCTS) to address these issues, limitations persist in the quality and diversity of thoughts generated, as well as in the scalar value feedback mechanisms used for node selection. In this study, we introduce Introspective Monte Carlo Tree Search (I-MCTS), a novel approach that iteratively expands tree nodes through an introspective process that meticulously analyzes solutions and results from parent and sibling nodes. This facilitates a continuous refinement of the node in the search tree, thereby enhancing the overall decision-making process. Furthermore, we integrate a Large Language Model (LLM)-based value model to facilitate direct evaluation of each node’s solution prior to conducting comprehensive computational rollouts. A hybrid rewarding mechanism is implemented to seamlessly transition the Q-value from LLM-estimated scores to actual performance scores. This allows higher-quality nodes to be traversed earlier. Applied to the various ML tasks, our approach demonstrates a 6% absolute improvement in performance compared to the strong open-source AutoML agents, showcasing its effectiveness in enhancing agentic AutoML systems.
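One way to picture the hybrid rewarding mechanism is as a node value with two sources. It starts from the LLM value model's estimate and shifts toward measured performance as real rollouts accumulate, and a standard UCT rule uses that value for selection. The linear schedule `min(visits / warmup, 1)`, the `warmup` parameter, the exploration constant, and the `Node` fields are assumptions for this sketch. The paper's exact transition may differ.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    llm_score: float                  # value-model estimate, available before any rollout
    visits: int = 0
    total_perf: float = 0.0           # sum of measured scores from real rollouts
    children: list["Node"] = field(default_factory=list)

    def q_value(self, warmup: int = 3) -> float:
        """Blend the LLM estimate into the empirical mean as visits grow (assumed linear schedule)."""
        if self.visits == 0:
            return self.llm_score
        mean_perf = self.total_perf / self.visits
        w = min(self.visits / warmup, 1.0)  # 0 -> trust the LLM, 1 -> trust measurements
        return (1.0 - w) * self.llm_score + w * mean_perf

    def record(self, perf: float) -> None:
        self.visits += 1
        self.total_perf += perf


def select_child(parent: Node, c: float = 1.4) -> Optional[Node]:
    """UCT selection on the hybrid Q-value; unvisited children are ranked by their LLM score."""
    if not parent.children:
        return None

    def uct(child: Node) -> float:
        explore = c * math.sqrt(math.log(parent.visits + 1) / (child.visits + 1))
        return child.q_value() + explore

    return max(parent.children, key=uct)


root = Node(llm_score=0.0, visits=2, children=[Node(0.9), Node(0.1)])
chosen = select_child(root)  # the child the value model rates higher is expanded first
```

Because unvisited nodes already carry an LLM estimate, promising branches get explored before any expensive rollout. As rollouts accumulate, measured scores take over from the estimate.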

[NLP-33] Bridging the Gap: Transforming Natural Language Questions into SQL Queries via Abstract Query Pattern and Contextual Schema Markup

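[Quick Read]: This paper addresses the performance gap of large language models (LLMs) on complex Text-to-SQL tasks, specifically the structural mapping gap and the lexical mapping gap. It proposes PAS-SQL, an efficient LLM-based SQL generation pipeline that narrows both gaps with an Abstract Query Pattern (AQP) and Contextual Schema Markup (CSM). AQP removes database-related information from the question to obtain its structural pattern, which is used to find structurally similar demonstrations. CSM links database-related text spans in the question to specific tables or columns in the database, easing the lexical mapping gap. PAS-SQL sets a new state of the art on Spider with 87.9% execution accuracy. On BIRD it achieves leading results with 64.67% execution accuracy.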

Link: https://arxiv.org/abs/2502.14682
Authors: Yonghui Kong, Hongbing Hu, Dan Zhang, Siyuan Chai, Fan Zhang, Wei Wang
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:


Abstract:Large language models have demonstrated excellent performance in many tasks, including Text-to-SQL, due to their powerful in-context learning capabilities. They are becoming the mainstream approach for Text-to-SQL. However, these methods still have a significant gap compared to human performance, especially on complex questions. As the complexity of questions increases, the gap between questions and SQLs increases. We identify two important gaps: the structural mapping gap and the lexical mapping gap. To tackle these two gaps, we propose PAS-SQL, an efficient SQL generation pipeline based on LLMs, which alleviates gaps through Abstract Query Pattern (AQP) and Contextual Schema Markup (CSM). AQP aims to obtain the structural pattern of the question by removing database-related information, which enables us to find structurally similar demonstrations. CSM aims to associate database-related text span in the question with specific tables or columns in the database, which alleviates the lexical mapping gap. Experimental results on the Spider and BIRD datasets demonstrate the effectiveness of our proposed method. Specifically, PAS-SQL + GPT-4o sets a new state-of-the-art on the Spider benchmark with an execution accuracy of 87.9%, and achieves leading results on the BIRD dataset with an execution accuracy of 64.67%.
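The Abstract Query Pattern idea has two steps. First, strip database-specific mentions so only the question's structure remains. Then retrieve demonstrations whose patterns look alike. Simple string masking is enough to sketch this. The placeholder tokens, the caller-supplied table/column/value lists, and `difflib.SequenceMatcher` as the similarity measure are all assumptions for illustration. PAS-SQL's own pipeline is LLM-based and likely differs.

```python
import re
from difflib import SequenceMatcher


def abstract_pattern(question: str, tables: list[str], columns: list[str], values: list[str]) -> str:
    """Mask database-related spans so only the question's structure remains."""
    pattern = question.lower()
    spans = ([(v, "[VALUE]") for v in values]
             + [(c, "[COLUMN]") for c in columns]
             + [(t, "[TABLE]") for t in tables])
    # Mask longer spans first so multi-word names are replaced before their sub-words.
    for span, tag in sorted(spans, key=lambda s: len(s[0]), reverse=True):
        pattern = re.sub(rf"\b{re.escape(span.lower())}\b", tag, pattern)
    return re.sub(r"\d+(\.\d+)?", "[NUM]", pattern)  # numeric literals are masked too


def most_similar_demos(pattern: str, demos: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Rank (pattern, sql) demonstrations by string similarity to the query's pattern."""
    def sim(demo: tuple[str, str]) -> float:
        return SequenceMatcher(None, pattern, demo[0]).ratio()
    return sorted(demos, key=sim, reverse=True)[:k]


demos = [
    ("show the [COLUMN] of each [TABLE] from [VALUE].", "SELECT name FROM singer WHERE country = 'France'"),
    ("how many [TABLE] are older than [NUM]?", "SELECT COUNT(*) FROM employees WHERE age > 30"),
]
p = abstract_pattern("Show the title of each album from 1999.", tables=["album"], columns=["title"], values=[])
best = most_similar_demos(p, demos, k=1)[0]
```

Masking lets a question about albums retrieve a demonstration about singers, because the two share the same query shape. The lexical linking that CSM performs is the complementary step.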

[NLP-34] How to Get Your LLM to Generate Challenging Problems for Evaluation

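[Quick Read]: This paper addresses the problem that traditional human annotation has become impractical for evaluating large language models (LLMs) because of its complexity and cost. The key solution is CHASE, a framework that uses LLMs to generate challenging problems automatically, without human involvement. CHASE builds hard problems bottom-up from simpler components and decomposes the generation process into independently verifiable sub-tasks, thereby ensuring high quality and correctness.

Link: https://arxiv.org/abs/2502.14678
Authors: Arkil Patel,Siva Reddy,Dzmitry Bahdanau
Affiliations: Mila; McGill University; ServiceNow Research; Canada CIFAR AI Chair
Categories: Computation and Language (cs.CL)
Comments: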

Abstract:The pace of evolution of Large Language Models (LLMs) necessitates new approaches for rigorous and comprehensive evaluation. Traditional human annotation is increasingly impracticable due to the complexities and costs involved in generating high-quality, challenging problems. In this work, we introduce CHASE, a unified framework to synthetically generate challenging problems using LLMs without human involvement. For a given task, our approach builds a hard problem in a bottom-up manner from simpler components. Moreover, our framework decomposes the generation process into independently verifiable sub-tasks, thereby ensuring a high level of quality and correctness. We implement CHASE to create evaluation benchmarks across three diverse domains: (1) document-based question answering, (2) repository-level code completion, and (3) math reasoning. The performance of state-of-the-art LLMs on these synthetic benchmarks lies in the range of 40-60% accuracy, thereby demonstrating the effectiveness of our framework at generating challenging problems. We publicly release our benchmarks and code.

[NLP-35] Data-Constrained Synthesis of Training Data for De-Identification

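[Quick Read]: This paper addresses the lack of widely available datasets in sensitive domains such as the clinical domain, caused by privacy risks. The key solution is to domain-adapt large language models (LLMs) to generate synthetic clinical text that is machine-annotated with tags for personally identifiable information, and then to use these synthetic corpora to train synthetic named entity recognition (NER) models. The results show that training NER models on synthetic corpora incurs only a small drop in predictive performance. The limits of this process are examined in a systematic ablation study. A key finding is that smaller datasets can suffice for domain-adapting LLMs for data synthesis, while the effectiveness of the process depends almost entirely on the performance of the machine-annotating NER models trained on the original data.

Link: https://arxiv.org/abs/2502.14677
Authors: Thomas Vakili,Aron Henriksson,Hercules Dalianis
Affiliations: Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Under review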

Abstract:Many sensitive domains – such as the clinical domain – lack widely available datasets due to privacy risks. The increasing generative capabilities of large language models (LLMs) have made synthetic datasets a viable path forward. In this study, we domain-adapt LLMs to the clinical domain and generate synthetic clinical texts that are machine-annotated with tags for personally identifiable information using capable encoder-based NER models. The synthetic corpora are then used to train synthetic NER models. The results show that training NER models using synthetic corpora incurs only a small drop in predictive performance. The limits of this process are investigated in a systematic ablation study – using both Swedish and Spanish data. Our analysis shows that smaller datasets can be sufficient for domain-adapting LLMs for data synthesis. Instead, the effectiveness of this process is almost entirely contingent on the performance of the machine-annotating NER models trained using the original data.

[NLP-36] Explanations of Deep Language Models Explain Language Representations in the Brain

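[Quick Read]: This paper aims to connect large language models (LLMs) more deeply with the brain's language-processing mechanisms. The key is to use explainable AI (XAI) methods, in particular attribution methods, to quantify how preceding words contribute to an LLM's next-word predictions, and then to use these explanations to predict functional magnetic resonance imaging (fMRI) recordings, which yields robust predictions of brain activity across the language network. The approach not only surpasses traditional internal representations in early language areas but also shows hierarchical alignment: early layers correspond to the initial stages of language processing in the brain, while later layers correspond to more advanced stages. In addition, layers with more influence on the LLM's next-word prediction (higher attribution scores) show stronger alignment with neural activity.

Link: https://arxiv.org/abs/2502.14671
Authors: Maryam Rahimi(1),Yadollah Yaghoobzadeh(2 and 4),Mohammad Reza Daliri(1 and 3) ((1) Biomedical Engineering Department, School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran, (2) Electrical and Computer Engineering Department, University of Tehran, Tehran, Iran, (3) School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran, (4) Tehran Institute for Advanced Studies, Khatam University, Tehran, Iran)
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
Comments: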

Abstract:Recent advances in artificial intelligence have given rise to large language models (LLMs) that not only achieve human-like performance but also share computational principles with the brain’s language processing mechanisms. While previous research has primarily focused on aligning LLMs’ internal representations with neural activity, we introduce a novel approach that leverages explainable AI (XAI) methods to forge deeper connections between the two domains. Using attribution methods, we quantified how preceding words contribute to an LLM’s next-word predictions and employed these explanations to predict fMRI recordings from participants listening to the same narratives. Our findings demonstrate that attribution methods robustly predict brain activity across the language network, surpassing traditional internal representations in early language areas. This alignment is hierarchical: early-layer explanations correspond to the initial stages of language processing in the brain, while later layers align with more advanced stages. Moreover, the layers more influential on LLM next-word prediction (those with higher attribution scores) exhibited stronger alignment with neural activity. This work establishes a bidirectional bridge between AI and neuroscience. First, we demonstrate that attribution methods offer a powerful lens for investigating the neural mechanisms of language comprehension, revealing how meaning emerges from preceding context. Second, we propose using brain alignment as a metric to evaluate the validity of attribution methods, providing a framework for assessing their biological plausibility.

[NLP-37] AlphaMaze: Enhancing Large Language Models Spatial Intelligence via GRPO

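[Quick Read]: This paper addresses the shortcomings of large language models (LLMs) on tasks that require genuine visual spatial reasoning. The key solution is a novel two-stage training framework: first, supervised fine-tuning (SFT) on a curated dataset of maze representations teaches the model to predict step-by-step movement commands; second, Group Relative Policy Optimization (GRPO), combined with a carefully designed reward function, refines the model's sequential decision-making and encourages emergent chain-of-thought behavior. Experimental results show that the method significantly improves the model's performance on maze navigation.

Link: https://arxiv.org/abs/2502.14669
Authors: Alan Dao(Gia Tuan Dao),Dinh Bach Vu
Affiliations: Menlo Research
Categories: Computation and Language (cs.CL)
Comments: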

Abstract:Large Language Models (LLMs) have demonstrated impressive capabilities in language processing, yet they often struggle with tasks requiring genuine visual spatial reasoning. In this paper, we introduce a novel two-stage training framework designed to equip standard LLMs with visual reasoning abilities for maze navigation. First, we leverage Supervised Fine Tuning (SFT) on a curated dataset of tokenized maze representations to teach the model to predict step-by-step movement commands. Next, we apply Group Relative Policy Optimization (GRPO), a technique used in DeepSeek-R1, with a carefully crafted reward function to refine the model’s sequential decision-making and encourage emergent chain-of-thought behaviors. Experimental results on synthetically generated mazes show that while a baseline model fails to navigate the maze, the SFT-trained model achieves 86% accuracy, and further GRPO fine-tuning boosts accuracy to 93%. Qualitative analyses reveal that GRPO fosters more robust and self-corrective reasoning, highlighting the potential of our approach to bridge the gap between language models and visual spatial tasks. These findings offer promising implications for applications in robotics, autonomous navigation, and other domains that require integrated visual and sequential reasoning.
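
The GRPO step mentioned above replaces a learned value baseline with a group-relative one: several responses are sampled for the same prompt, and each response's advantage is its reward standardized by the group's mean and standard deviation. The minimal sketch below computes only those advantages; the rewards are made-up values, the clipped policy-gradient update and KL penalty are omitted, and whether the population or sample standard deviation is used varies between implementations (population is assumed here).

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group (an assumption)
    # eps keeps the division finite when every response gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for four maze-solving attempts on the same prompt
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Successful attempts get positive advantages and failed ones negative, so the policy is pushed toward whatever distinguished the better samples within the same group.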

[NLP-38] InstructAgent: Building User Controllable Recommender via LLM Agent WWW2025

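[Quick Read]: This paper addresses the vulnerable position of users under platform control in traditional recommender systems. Platform-oriented recommendation algorithms may neglect users' true interests and overlook individual preferences during optimization, which leaves users without control over the recommender system and exposes them to potential manipulation by the platform, echo chamber effects, and poor personalization for less active users. To overcome these limitations, the paper proposes a new user-agent-platform paradigm in which an agent acts as a protective shield between the user and the recommender system, enabling indirect exposure. The key step is constructing four recommendation datasets with user instructions for each record, so that the agent mechanism can protect user interests and mitigate the issues above.

Link: https://arxiv.org/abs/2502.14662
Authors: Wujiang Xu,Yunxiao Shi,Zujie Liang,Xuying Ning,Kai Mei,Kun Wang,Xi Zhu,Min Xu,Yongfeng Zhang
Affiliations: Rutgers University; University of Technology Sydney; Ant Group; University of Illinois Urbana-Champaign; Nanyang Technological University; Min Xu (affiliation not provided)
Categories: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Comments: WWW2025@HCRS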

Abstract:Traditional recommender systems usually take the user-platform paradigm, where users are directly exposed under the control of the platform’s recommendation algorithms. However, the defect of recommendation algorithms may put users in very vulnerable positions under this paradigm. First, many sophisticated models are often designed with commercial objectives in mind, focusing on the platform’s benefits, which may hinder their ability to protect and capture users’ true interests. Second, these models are typically optimized using data from all users, which may overlook individual users’ preferences. Due to these shortcomings, users may experience several disadvantages under the traditional user-platform direct exposure paradigm, such as lack of control over the recommender system, potential manipulation by the platform, echo chamber effects, or lack of personalization for less active users due to the dominance of active users during collaborative learning. Therefore, there is an urgent need to develop a new paradigm to protect user interests and alleviate these issues. Recently, some researchers have introduced LLM agents to simulate user behaviors, but these approaches primarily aim to optimize platform-side performance, leaving core issues in recommender systems unresolved. To address these limitations, we propose a new user-agent-platform paradigm, where an agent serves as the protective shield between user and recommender system that enables indirect exposure. To this end, we first construct four recommendation datasets, denoted as \dataset , along with user instructions for each record.

[NLP-39] Edit Once Update Everywhere: A Simple Framework for Cross-Lingual Knowledge Synchronization in LLMs

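[Quick Read]: This paper addresses the inability of existing knowledge-editing methods to achieve true cross-lingual knowledge synchronization in multilingual settings. The key is Cross-Lingual Knowledge Democracy Edit (X-KDE), which has two stages: (i) Cross-lingual Edition Instruction Tuning (XE-IT), which fine-tunes the model to modify the target knowledge while preserving unrelated information; and (ii) Target-language Preference Optimization (TL-PO), which applies advanced optimization techniques to ensure consistency across languages and promote the transfer of updates.

Link: https://arxiv.org/abs/2502.14645
Authors: Yuchen Wu,Liang Ding,Li Shen,Dacheng Tao
Affiliations: Shanghai Jiao Tong University; The University of Sydney; Sun Yat-sen University; Nanyang Technological University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: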

Abstract:Knowledge editing allows for efficient adaptation of large language models (LLMs) to new information or corrections without requiring full retraining. However, prior methods typically focus on either single-language editing or basic multilingual editing, failing to achieve true cross-linguistic knowledge synchronization. To address this, we present a simple and practical state-of-the-art (SOTA) recipe Cross-Lingual Knowledge Democracy Edit (X-KDE), designed to propagate knowledge from a dominant language to other languages effectively. Our X-KDE comprises two stages: (i) Cross-lingual Edition Instruction Tuning (XE-IT), which fine-tunes the model on a curated parallel dataset to modify in-scope knowledge while preserving unrelated information, and (ii) Target-language Preference Optimization (TL-PO), which applies advanced optimization techniques to ensure consistency across languages, fostering the transfer of updates. Additionally, we contribute a high-quality, cross-lingual dataset, specifically designed to enhance knowledge transfer across languages. Extensive experiments on the Bi-ZsRE and MzsRE benchmarks show that X-KDE significantly enhances cross-lingual performance, achieving an average improvement of +8.19%, while maintaining high accuracy in monolingual settings.

[NLP-40] LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning

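[Quick Read]: This paper addresses the difficulty large language models (LLMs) have with long-context understanding, mainly due to their limited context windows. It proposes Long Input Fine-Tuning (LIFT), a new framework that dynamically adapts model parameters to the long input, improving the long-context performance of arbitrary short-context LLMs. The key to LIFT is storing and absorbing the long input into the model parameters rather than endlessly enlarging the context window. In addition, to improve LIFT's performance while preserving the original in-context learning (ICL) capability, the paper introduces Gated Memory, a specialized attention adapter that automatically balances long-input memorization and ICL.

Link: https://arxiv.org/abs/2502.14644
Authors: Yansheng Mao,Yufei Xu,Jiaqi Li,Fanxu Meng,Haotong Yang,Zilong Zheng,Xiyuan Wang,Muhan Zhang
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: arXiv admin note: text overlap with arXiv:2412.13626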

Abstract:Long context understanding remains challenging for large language models due to their limited context windows. This paper presents Long Input Fine-Tuning (LIFT), a novel framework for long-context modeling that can improve the long-context performance of arbitrary (short-context) LLMs by dynamically adapting model parameters based on the long input. Importantly, LIFT, rather than endlessly extending the context window size to accommodate increasingly longer inputs in context, chooses to store and absorb the long input in parameter. By fine-tuning the long input into model parameters, LIFT allows short-context LLMs to answer questions even when the required information is not provided in the context during inference. Furthermore, to enhance LIFT performance while maintaining the original in-context learning (ICL) capabilities, we introduce Gated Memory, a specialized attention adapter that automatically balances long input memorization and ICL. We provide a comprehensive analysis of the strengths and limitations of LIFT on long context understanding, offering valuable directions for future research.

[NLP-41] Length-Controlled Margin-Based Preference Optimization without Reference Model

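[Quick Read]: This paper addresses the length bias, memory inefficiency, and probability degradation that Direct Preference Optimization (DPO) suffers from in offline preference-based reinforcement learning. The key solution is Length-Controlled Margin-Based Preference Optimization (LMPO), which introduces a uniform reference model as an upper bound on the DPO loss to approximate the original optimization objective more accurately, and adopts an average log-probability optimization strategy to reduce the discrepancy between training and inference. LMPO's core innovation is its length-controlled margin-based loss function, which, within the Bradley-Terry framework, regulates response length while widening the margin between preferred and rejected outputs, thereby mitigating probability degradation for both accepted and rejected responses.

Link: https://arxiv.org/abs/2502.14643
Authors: Gengxu Li,Tingyu Xia,Yi Chang,Yuan Wu
Affiliations: School of Artificial Intelligence, Jilin University; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China; International Center of Future Science, Jilin University
Categories: Computation and Language (cs.CL)
Comments: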

Abstract:Direct Preference Optimization (DPO) is a widely adopted offline algorithm for preference-based reinforcement learning from human feedback (RLHF), designed to improve training simplicity and stability by redefining reward functions. However, DPO is hindered by several limitations, including length bias, memory inefficiency, and probability degradation. To address these challenges, we propose Length-Controlled Margin-Based Preference Optimization (LMPO), a more efficient and robust alternative. LMPO introduces a uniform reference model as an upper bound for the DPO loss, enabling a more accurate approximation of the original optimization objective. Additionally, an average log-probability optimization strategy is employed to minimize discrepancies between training and inference phases. A key innovation of LMPO lies in its Length-Controlled Margin-Based loss function, integrated within the Bradley-Terry framework. This loss function regulates response length while simultaneously widening the margin between preferred and rejected outputs. By doing so, it mitigates probability degradation for both accepted and discarded responses, addressing a significant limitation of existing methods. We evaluate LMPO against state-of-the-art preference optimization techniques on two open-ended large language models, Mistral and LLaMA3, across six conditional benchmarks. Our experimental results demonstrate that LMPO effectively controls response length, reduces probability degradation, and outperforms existing approaches. The code is available at this https URL.
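
For reference, the standard DPO objective that LMPO starts from scores a preference pair by how much more the policy raises the log-probability of the chosen response than of the rejected one, each measured relative to a reference model. The sketch below computes that per-pair loss from sequence log-probabilities; it is the published DPO loss, not LMPO's length-controlled margin loss (whose exact form the abstract does not give), and beta and the numbers are arbitrary illustrative values.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities."""
    # Implicit rewards are beta-scaled log-ratios against the reference model
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)

print(round(dpo_loss(-12.0, -12.0, -12.0, -12.0), 4))  # 0.6931, i.e. log(2): no preference yet
print(dpo_loss(-10.0, -14.0, -12.0, -12.0) < math.log(2))  # True: chosen response now favoured
```

Because the loss depends only on log-probability differences, longer responses (with more negative summed log-probabilities) can shift the margin, which is one source of the length bias that LMPO's length control and average log-probability strategy aim to address.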

[NLP-42] How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation

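[Quick Read]: This paper addresses the weakness of large language models (LLMs) in simulating continuous human behavior, an aspect that current evaluations overlook because they focus mainly on dialogue simulation. The key solution is the BehaviorChain benchmark, a collection of behavior chains comprising 15,846 distinct behaviors across 1,001 unique personas, used to evaluate LLMs' ability to simulate human behavior. By integrating persona metadata into LLMs and having them iteratively infer appropriate behaviors in the dynamic scenarios provided by BehaviorChain, the paper shows that even state-of-the-art models struggle to simulate continuous human behavior accurately.

Link: https://arxiv.org/abs/2502.14642
Authors: Rui Li,Heming Xia,Xinfeng Yuan,Qingxiu Dong,Lei Sha,Wenjie Li,Zhifang Sui
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: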

Abstract:Recently, LLMs have garnered increasing attention across academic disciplines for their potential as human digital twins, virtual proxies designed to replicate individuals and autonomously perform tasks such as decision-making, problem-solving, and reasoning on their behalf. However, current evaluations of LLMs primarily emphasize dialogue simulation while overlooking human behavior simulation, which is crucial for digital twins. To address this gap, we introduce BehaviorChain, the first benchmark for evaluating LLMs’ ability to simulate continuous human behavior. BehaviorChain comprises diverse, high-quality, persona-based behavior chains, totaling 15,846 distinct behaviors across 1,001 unique personas, each with detailed history and profile metadata. For evaluation, we integrate persona metadata into LLMs and employ them to iteratively infer contextually appropriate behaviors within dynamic scenarios provided by BehaviorChain. Comprehensive evaluation results demonstrated that even state-of-the-art models struggle with accurately simulating continuous human behavior.

[NLP-43] NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization

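[Quick Read]: This paper addresses the complex reasoning that image geo-localization requires, in particular reasoning that spans visual, geographical, and cultural contexts. The key contributions are the NaviClues dataset and the Navig framework. NaviClues is derived from GeoGuessr, a popular geography game, and provides language examples of expert reasoning. Navig integrates global and fine-grained image information and, by reasoning with language, reduces the average distance error by 14% compared with previous state-of-the-art models while requiring fewer than 1,000 training samples.

Link: https://arxiv.org/abs/2502.14638
Authors: Zheyuan Zhang,Runze Li,Tasnim Kabir,Jordan Boyd-Graber
Affiliations: Tsinghua University; Nanjing University; University of Maryland
Categories: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Comments: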

Abstract:Image geo-localization is the task of predicting the specific location of an image and requires complex reasoning across visual, geographical, and cultural contexts. While prior Vision Language Models (VLMs) have the best accuracy at this task, there is a dearth of high-quality datasets and models for analytical reasoning. We first create NaviClues, a high-quality dataset derived from GeoGuessr, a popular geography game, to supply examples of expert reasoning from language. Using this dataset, we present Navig, a comprehensive image geo-localization framework integrating global and fine-grained image information. By reasoning with language, Navig reduces the average distance error by 14% compared to previous state-of-the-art models while requiring fewer than 1000 training samples. Our dataset and code are available at this https URL.
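
An "average distance error" in geo-localization is usually the mean great-circle distance between predicted and true coordinates. Assuming that convention (the abstract does not state Navig's exact formula), the sketch below computes it with the haversine formula; the 6371 km mean Earth radius and the sample coordinates are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def mean_distance_error(predictions, targets):
    """Average haversine error over paired (lat, lon) predictions and ground truths."""
    errors = [haversine_km(*p, *t) for p, t in zip(predictions, targets)]
    return sum(errors) / len(errors)

# Hypothetical predictions: one exact hit, one off by 1 degree of longitude at the equator
preds = [(48.8566, 2.3522), (0.0, 0.0)]
truth = [(48.8566, 2.3522), (0.0, 1.0)]
print(round(mean_distance_error(preds, truth), 1))  # 55.6
```

Under this metric, a 14% reduction means the average of these per-image kilometre errors shrinks by that fraction; benchmarks often also report accuracy within distance thresholds (street, city, country).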

[NLP-44] PEARL: Towards Permutation-Resilient LLMs ICLR2025

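[Quick Read]: This paper addresses the prediction instability of large language models (LLMs) in in-context learning (ICL) caused by sensitivity to the order of demonstrations. The paper shows that this vulnerability can be exploited to design a natural attack: simply permuting the demonstration sequence achieves a success rate of nearly 80% on LLaMA-3, and the attack is difficult for model providers to detect. Existing mitigation methods rely mainly on post-processing and do not strengthen the model's inherent robustness to input permutations, raising concerns about the safety and reliability of LLMs.

To address this, the paper proposes Permutation-resilient learning (PEARL), a new framework based on distributionally robust optimization (DRO) that optimizes model performance under the worst-case input permutation. The key to PEARL is a Permutation-Proposal Network (P-Net), which treats generating the most challenging permutation as an optimal transport problem solved with an entropy-constrained Sinkhorn algorithm. Through minimax optimization, the P-Net and the LLM are optimized against each other iteratively, progressively improving the LLM's robustness. Experiments on synthetic pre-training and real-world instruction-tuning tasks show that PEARL effectively mitigates permutation attacks and improves performance; although trained with fewer demonstrations and shorter contexts, it achieves performance gains of up to 40% when scaled to many-shot and long-context scenarios, demonstrating its efficiency and generalization.

Link: https://arxiv.org/abs/2502.14628
Authors: Liang Chen,Li Shen,Yang Deng,Xiaoyan Zhao,Bin Liang,Kam-Fai Wong
Affiliations: The Chinese University of Hong Kong; Shenzhen Campus of Sun Yat-sen University; SMU (Singapore Management University)
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments: ICLR 2025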

Abstract:The in-context learning (ICL) capability of large language models (LLMs) enables them to perform challenging tasks using provided demonstrations. However, ICL is highly sensitive to the ordering of demonstrations, leading to instability in predictions. This paper shows that this vulnerability can be exploited to design a natural attack - difficult for model providers to detect - that achieves nearly 80% success rate on LLaMA-3 by simply permuting the demonstrations. Existing mitigation methods primarily rely on post-processing and fail to enhance the model’s inherent robustness to input permutations, raising concerns about safety and reliability of LLMs. To address this issue, we propose Permutation-resilient learning (PEARL), a novel framework based on distributionally robust optimization (DRO), which optimizes model performance against the worst-case input permutation. Specifically, PEARL consists of a permutation-proposal network (P-Net) and the LLM. The P-Net generates the most challenging permutations by treating it as an optimal transport problem, which is solved using an entropy-constrained Sinkhorn algorithm. Through minimax optimization, the P-Net and the LLM iteratively optimize against each other, progressively improving the LLM’s robustness. Experiments on synthetic pre-training and real-world instruction tuning tasks demonstrate that PEARL effectively mitigates permutation attacks and enhances performance. Notably, despite being trained on fewer shots and shorter contexts, PEARL achieves performance gains of up to 40% when scaled to many-shot and long-context scenarios, highlighting its efficiency and generalization capabilities.

[NLP-45] Multi-Record Web Page Information Extraction From News Websites

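[Quick Read]: This paper addresses the problem of extracting information from web pages that contain many records, a task of growing importance in the era of massive web data. Most existing research and datasets focus on detail pages, while multi-record "list pages" remain relatively understudied despite their widespread presence and practical significance. To fill this gap, the paper creates a large-scale, open-access dataset designed specifically for list pages. It is the first such dataset in Russian, containing 13,120 news list pages and significantly exceeding existing datasets in both scale and complexity. The key methodological contribution is a multi-stage information extraction approach, together with an exploration of how to apply MarkupLM to the specific challenges of multi-record web pages; experiments validate the effectiveness of these methods.

Link: https://arxiv.org/abs/2502.14625
Authors: Alexander Kustenkov,Maksim Varlamov,Alexander Yatskov
Affiliations: Ivannikov Institute for System Programming of the Russian Academy of Sciences, Moscow, Russia; Lomonosov Moscow State University, Moscow, Russia
Categories: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Comments: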

Abstract:In this paper, we focused on the problem of extracting information from web pages containing many records, a task of growing importance in the era of massive web data. Recently, the development of neural network methods has improved the quality of information extraction from web pages. Nevertheless, most of the research and datasets are aimed at studying detailed pages. This has left multi-record “list pages” relatively understudied, despite their widespread presence and practical significance. To address this gap, we created a large-scale, open-access dataset specifically designed for list pages. This is the first dataset for this task in the Russian language. Our dataset contains 13,120 web pages with news lists, significantly exceeding existing datasets in both scale and complexity. Our dataset contains attributes of various types, including optional and multi-valued, providing a realistic representation of real-world list pages. These features make our dataset a valuable resource for studying information extraction from pages containing many records. Furthermore, we proposed our own multi-stage information extraction methods. In this work, we explore and demonstrate several strategies for applying MarkupLM to the specific challenges of multi-record web pages. Our experiments validate the advantages of our methods. By releasing our dataset to the public, we aim to advance the field of information extraction from multi-record pages.

[NLP-46] Exploring RWKV for Sentence Embeddings: Layer-wise Analysis and Baseline Comparison for Semantic Similarity

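[Quick Read]: This paper evaluates how well RWKV, a novel language model architecture known for its linear attention mechanism, generates sentence embeddings in a zero-shot setting. The study performs a layer-wise analysis of the semantic similarity captured by embeddings from different hidden layers, evaluated with Spearman correlation on the Microsoft Research Paraphrase Corpus (MRPC) and compared against a GloVe-based baseline. The comparison shows that RWKV's zero-shot sentence embeddings fall short of the GloVe baseline, and that realizing the benefits of its linear scaling will require further investigation and possibly task-specific fine-tuning to match or exceed simpler baselines.

Link: https://arxiv.org/abs/2502.14620
Authors: Xinghan Pan
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 17 pages, 3 tables, preprint on arXiv, includes detailed analysis of RWKV for semantic similarity tasks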

Abstract:This paper investigates the efficacy of RWKV, a novel language model architecture known for its linear attention mechanism, for generating sentence embeddings in a zero-shot setting. I conduct a layer-wise analysis to evaluate the semantic similarity captured by embeddings from different hidden layers of a pre-trained RWKV model. The performance is assessed on the Microsoft Research Paraphrase Corpus (MRPC) dataset using Spearman correlation and compared against a GloVe-based baseline. My results indicate that while RWKV embeddings capture some semantic relatedness, they underperform compared to the GloVe baseline in terms of Spearman correlation. I also analyze the inference time and GPU memory usage, highlighting the computational trade-offs associated with RWKV embeddings. The findings suggest that while RWKV offers potential advantages in terms of linear scaling, its zero-shot sentence embedding quality for semantic similarity tasks requires further investigation and potential task-specific fine-tuning to match or exceed simpler baselines.
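
The evaluation described above reduces to scoring each sentence pair by the cosine similarity of its two embeddings and computing Spearman's rank correlation between those scores and the gold labels. The plain-Python sketch below implements both (average ranks for ties, then Pearson correlation on the ranks); the toy vectors and labels are hypothetical stand-ins for RWKV or GloVe embeddings and MRPC labels.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (undefined if x or y is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical 2-d "sentence embeddings" for three pairs and binary paraphrase labels
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0]), ([1.0, 1.0], [1.0, 0.0])]
labels = [1, 0, 1]
sims = [cosine(u, v) for u, v in pairs]
rho = spearman(sims, labels)
print(round(rho, 4))  # 0.866
```

Because MRPC labels are binary, many gold values tie, which is why tie-aware ranking matters for this kind of comparison.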

[NLP-47] Reward Models Identify Consistency Not Causality

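[Quick Read]: This paper examines the limitations of existing reward models (RMs) in evaluating the reasoning quality of large language models (LLMs). It finds that current RMs focus mainly on structural consistency rather than causal correctness, and rely heavily on complete reasoning trajectories rather than on explicit comprehension of the problem. The key conclusion is that the field needs to shift toward causality-aware reward models that go beyond consistency-driven evaluation, so that the quality and logical validity of reasoning can be assessed more effectively.

Link: https://arxiv.org/abs/2502.14619
Authors: Yuhui Xu,Hanze Dong,Lei Wang,Caiming Xiong,Junnan Li
Affiliations: Salesforce AI Research
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: 16 pages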

Abstract:Reward models (RMs) play a crucial role in aligning large language models (LLMs) with human preferences and enhancing reasoning quality. Traditionally, RMs are trained to rank candidate outputs based on their correctness and coherence. However, in this work, we present several surprising findings that challenge common assumptions about RM behavior. Our analysis reveals that state-of-the-art reward models prioritize structural consistency over causal correctness. Specifically, removing the problem statement has minimal impact on reward scores, whereas altering numerical values or disrupting the reasoning flow significantly affects RM outputs. Furthermore, RMs exhibit a strong dependence on complete reasoning trajectories: truncated or incomplete steps lead to significant variations in reward assignments, indicating that RMs primarily rely on learned reasoning patterns rather than explicit problem comprehension. These findings hold across multiple architectures, datasets, and tasks, leading to three key insights: (1) RMs primarily assess coherence rather than true reasoning quality; (2) The role of explicit problem comprehension in reward assignment is overstated; (3) Current RMs may be more effective at ranking responses than verifying logical validity. Our results suggest a fundamental limitation in existing reward modeling approaches, emphasizing the need for a shift toward causality-aware reward models that go beyond consistency-driven evaluation.

[NLP-48] FIND: Fine-grained Information Density Guided Adaptive Retrieval-Augmented Generation for Disease Diagnosis

Link: https://arxiv.org/abs/2502.14614
Authors: Mingyi Jia,Junwen Duan,Yan Song,Jianxin Wang
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:

[NLP-49] Behavioral Analysis of Information Salience in Large Language Models

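[Quick Read]: This paper investigates the exact notion of importance (salience) that large language models (LLMs) internalize for text summarization, which remains unclear. To bridge this gap, it introduces an explainable framework that systematically derives and studies information salience in LLMs through their summarization behavior. The key is using length-controlled summarization as a behavioral probe into the content selection process and tracing the answerability of Questions Under Discussion, which yields a proxy for how models prioritize information. Experiments show that although models exhibit highly consistent behavior and salience patterns, this notion of salience cannot be accessed through introspection and correlates only weakly with human perceptions of information salience.

Link: https://arxiv.org/abs/2502.14613
Authors: Jan Trienes,Jörg Schlötterer,Junyi Jessy Li,Christin Seifert
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: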

Abstract:Large Language Models (LLMs) excel at text summarization, a task that requires models to select content based on its importance. However, the exact notion of salience that LLMs have internalized remains unclear. To bridge this gap, we introduce an explainable framework to systematically derive and investigate information salience in LLMs through their summarization behavior. Using length-controlled summarization as a behavioral probe into the content selection process, and tracing the answerability of Questions Under Discussion throughout, we derive a proxy for how models prioritize information. Our experiments on 13 models across four datasets reveal that LLMs have a nuanced, hierarchical notion of salience, generally consistent across model families and sizes. While models show highly consistent behavior and hence salience patterns, this notion of salience cannot be accessed through introspection, and only weakly correlates with human perceptions of information salience.

[NLP-50] A Statistical Case Against Empirical Human-AI Alignment

Link: https://arxiv.org/abs/2502.14581
Authors: Julian Rodemann,Esteban Garces Arias,Christoph Luther,Christoph Jansen,Thomas Augustin
Affiliations: Unknown
Categories: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Other Statistics (stat.OT)
Comments: 24 pages, 2 figures, 5 tables

[NLP-51] ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification

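[Quick Read]: This paper addresses the limited ability of large language models (LLMs) to assess and correct their own generations. The key solution is Refine via Intrinsic Self-Verification (ReVISE), a framework that lets LLMs correct their outputs through self-verification, implemented efficiently with a structured curriculum. Specifically, ReVISE tackles the two challenging tasks of self-verification and reasoning correction sequentially through curriculum learning, and uses the collected failed and successful reasoning paths to construct preference pairs for efficient training. At inference time, the method exhibits natural test-time scaling, further enhanced by a confidence-aware decoding mechanism.

Link: https://arxiv.org/abs/2502.14565
Authors: Hyunseok Lee,Seunghyuk Oh,Jaehyung Kim,Jinwoo Shin,Jihoon Tack
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments: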

Abstract:Self-awareness, i.e., the ability to assess and correct one’s own generation, is a fundamental aspect of human intelligence, making its replication in large language models (LLMs) an important yet challenging task. Previous works tackle this by employing extensive reinforcement learning or rather relying on large external verifiers. In this work, we propose Refine via Intrinsic Self-Verification (ReVISE), an efficient and effective framework that enables LLMs to self-correct their outputs through self-verification. The core idea of ReVISE is to enable LLMs to verify their reasoning processes and continually rethink reasoning trajectories based on its verification. We introduce a structured curriculum based upon online preference learning to implement this efficiently. Specifically, as ReVISE involves two challenging tasks (i.e., self-verification and reasoning correction), we tackle each task sequentially using curriculum learning, collecting both failed and successful reasoning paths to construct preference pairs for efficient training. During inference, our approach enjoys natural test-time scaling by integrating self-verification and correction capabilities, further enhanced by our proposed confidence-aware decoding mechanism. Our experiments on various reasoning tasks demonstrate that ReVISE achieves efficient self-correction and significantly improves reasoning performance.

[NLP-52] Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs

Link: https://arxiv.org/abs/2502.14561
Authors: Paris Koloveas,Serafeim Chatzopoulos,Thanasis Vergoulis,Christos Tryfonopoulos
Affiliations: IMSI, ATHENA RC; Athens, GR; University of the Peloponnese / Tripolis, GR
Categories: Computation and Language (cs.CL); Digital Libraries (cs.DL)
Comments:

[NLP-53] Less is More: Improving LLM Alignment via Preference Data Selection

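Quick Read: This paper addresses the parameter shrinkage that noisy data causes in Direct Preference Optimization (DPO). The key solution is a new margin-maximization principle for curating DPO training data, together with a dual-margin guided approach that estimates selection margins accurately by considering both external reward margins and implicit DPO reward margins. This strategy substantially improves model performance while sharply reducing computational cost.

Link: https://arxiv.org/abs/2502.14560
Authors: Xun Deng,Han Zhong,Rui Ai,Fuli Feng,Zheng Wang,Xiangnan He
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: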

Abstract:Direct Preference Optimization (DPO) has emerged as a promising approach for aligning large language models with human preferences. While prior work mainly extends DPO from the aspect of the objective function, we instead improve DPO from the largely overlooked but critical aspect of data selection. Specifically, we address the issue of parameter shrinkage caused by noisy data by proposing a novel margin-maximization principle for dataset curation in DPO training. To accurately estimate margins for data selection, we propose a dual-margin guided approach that considers both external reward margins and implicit DPO reward margins. Extensive experiments demonstrate that our method reduces computational cost dramatically while improving performance. Remarkably, by using just 10% of the Ultrafeedback dataset, our approach achieves 3% to 8% improvements across various Llama and Mistral series models on the AlpacaEval 2.0 benchmark. Furthermore, our approach seamlessly extends to iterative DPO, yielding a roughly 3% improvement with 25% online data, while further reducing training time. These results highlight the potential of data selection strategies for advancing preference optimization.
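The two margins named above can be sketched as follows. The implicit margin uses the standard DPO implicit reward, β·(log π_θ − log π_ref), taken as the difference between the chosen and rejected responses. How the paper fuses it with the external reward margin is not given here, so the sum of z-scores below is an assumption, as are the function names; the 10% default keep fraction mirrors the Ultrafeedback experiment in the abstract.

```python
import numpy as np


def dpo_implicit_margin(logp_pol_chosen, logp_ref_chosen,
                        logp_pol_rejected, logp_ref_rejected, beta=0.1):
    """beta * [(log pi - log pi_ref)(chosen) - (log pi - log pi_ref)(rejected)]."""
    return beta * ((logp_pol_chosen - logp_ref_chosen)
                   - (logp_pol_rejected - logp_ref_rejected))


def zscore(x):
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)


def select_by_dual_margin(external_margin, implicit_margin, keep_frac=0.10):
    """Indices of the pairs with the largest combined margin (hypothetical fusion: sum of z-scores)."""
    score = zscore(external_margin) + zscore(implicit_margin)
    k = max(1, int(round(keep_frac * len(score))))
    return np.argsort(-score)[:k]


ext = np.array([0.1, 3.0, 0.2, 0.3])   # reward-model score(chosen) - score(rejected)
imp = np.array([0.0, 1.0, 0.1, 0.05])  # implicit DPO margins, e.g. from dpo_implicit_margin
print(select_by_dual_margin(ext, imp, keep_frac=0.25))  # [1]
```

Normalizing each margin before combining keeps one signal from dominating simply because it lives on a larger scale.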

[NLP-54] Multiscale Byte Language Models – A Hierarchical Architecture for Causal Million-Length Sequence Modeling

Link: https://arxiv.org/abs/2502.14553
Authors: Eric Egli,Matteo Manica,Jannis Born
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: Under Review

[NLP-55] LLM-based User Profile Management for Recommender System ACL2025

Link: https://arxiv.org/abs/2502.14541
Authors: Seunghwan Bang,Hwanjun Song
Affiliations: Ulsan National Institute of Science and Technology; Korea Advanced Institute of Science and Technology
Categories: Computation and Language (cs.CL)
Comments: Submitted to ACL 2025

[NLP-56] LoRA-GGPO: Mitigating Double Descent in LoRA Fine-Tuning via Gradient-Guided Perturbation Optimization

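Quick Read: This paper addresses the high resource cost of fully fine-tuning large language models (LLMs), focusing on the "double descent" phenomenon that appears when fine-tuning with Low-Rank Adaptation (LoRA). The key is a new method, LoRA-GGPO (Gradient-Guided Perturbation Optimization), which uses gradient and weight norms to generate targeted perturbations and optimizes the sharpness of the loss landscape, steering the model toward flatter minima. This mitigates double descent and improves generalization.

Link: https://arxiv.org/abs/2502.14538
Authors: Yupeng Chang,Chenlu Guo,Yi Chang,Yuan Wu
Affiliations: School of Artificial Intelligence, Jilin University; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China; International Center of Future Science, Jilin University
Categories: Computation and Language (cs.CL)
Comments: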

Abstract:Large Language Models (LLMs) have achieved remarkable success in natural language processing, but their full fine-tuning remains resource-intensive. Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), have emerged as a practical solution by approximating parameter updates with low-rank matrices. However, LoRA often exhibits a “double descent” phenomenon during fine-tuning, where model performance degrades due to overfitting and limited expressiveness caused by low-rank constraints. To address this issue, we propose LoRA-GGPO (Gradient-Guided Perturbation Optimization), a novel method that leverages gradient and weight norms to generate targeted perturbations. By optimizing the sharpness of the loss landscape, LoRA-GGPO guides the model toward flatter minima, mitigating the double descent problem and improving generalization. Extensive experiments on natural language understanding (NLU) and generation (NLG) tasks demonstrate that LoRA-GGPO outperforms LoRA and its state-of-the-art variants. Furthermore, extended experiments specifically designed to analyze the double descent phenomenon confirm that LoRA-GGPO effectively alleviates this issue, producing more robust and generalizable models. Our work provides a robust and efficient solution for fine-tuning LLMs, with broad applicability in real-world scenarios. The code is available at this https URL.
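To illustrate what "perturb along the gradient, then update with the gradient at the perturbed point" means, here is a generic sharpness-aware step in the spirit of SAM, applied to a toy quadratic loss rather than LoRA matrices. Scaling the perturbation radius by the weight norm is a hypothetical nod to the abstract's "gradient and weight norms"; this is not LoRA-GGPO's exact rule, and `perturbed_step`, `rho`, and `lr` are illustrative names.

```python
import numpy as np


def grad(w, H):
    """Gradient of the toy loss L(w) = 0.5 * w^T H w."""
    return H @ w


def perturbed_step(w, H, lr=0.1, rho=0.05):
    g = grad(w, H)
    # Hypothetical perturbation: step along the normalized gradient,
    # with a radius scaled by rho and the current weight norm.
    eps = rho * np.linalg.norm(w) * g / (np.linalg.norm(g) + 1e-12)
    # Evaluate the gradient at the perturbed point, then update the ORIGINAL weights.
    return w - lr * grad(w + eps, H)


H = np.diag([10.0, 1.0])  # one sharp and one flat direction
w = np.array([1.0, 1.0])
for _ in range(50):
    w = perturbed_step(w, H)
print(w, 0.5 * w @ H @ w)
```

Because the update uses the gradient at a nearby worst-case-ish point, directions where the loss rises steeply are penalized more, which is the intuition behind seeking flatter minima.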

[NLP-57] CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models

Link: https://arxiv.org/abs/2502.14529
Authors: Zhenhong Zhou,Zherui Li,Jie Zhang,Yuanhe Zhang,Kun Wang,Yang Liu,Qing Guo
Affiliations: CFAR and IHPC, A*STAR; BUPT (Beijing University of Posts and Telecommunications); Nanyang Technological University
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-58] Generative adversarial networks vs large language models: a comparative study on synthetic tabular data generation

Quick Read: This paper addresses zero-shot generation of synthetic tabular data. The key is using the large language model (LLM) GPT-4o with plain-language prompting to generate high-fidelity tabular data without task-specific fine-tuning or pre-training on real-world data (RWD).

Link: https://arxiv.org/abs/2502.14523
Authors: Austin A. Barr,Robert Rozman,Eddie Guo
Affiliations: Cumming School of Medicine, University of Calgary; Calgary, AB, Canada; Independent Researcher; Toronto, ON, Canada
Categories: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments: 12 pages, 7 figures, 5 tables

Abstract:We propose a new framework for zero-shot generation of synthetic tabular data. Using the large language model (LLM) GPT-4o and plain-language prompting, we demonstrate the ability to generate high-fidelity tabular data without task-specific fine-tuning or access to real-world data (RWD) for pre-training. To benchmark GPT-4o, we compared the fidelity and privacy of LLM-generated synthetic data against data generated with the conditional tabular generative adversarial network (CTGAN), across three open-access datasets: Iris, Fish Measurements, and Real Estate Valuation. Despite the zero-shot approach, GPT-4o outperformed CTGAN in preserving means, 95% confidence intervals, bivariate correlations, and data privacy of RWD, even at amplified sample sizes. Notably, correlations between parameters were consistently preserved with appropriate direction and strength. However, refinement is necessary to better retain distributional characteristics. These findings highlight the potential of LLMs in tabular data synthesis, offering an accessible alternative to generative adversarial networks and variational autoencoders.

[NLP-59] MultiSlav: Using Cross-Lingual Knowledge Transfer to Combat the Curse of Multilinguality

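Quick Read: This paper examines whether multilingual Neural Machine Translation (NMT) leads to the "curse of multilinguality" or instead enables cross-lingual knowledge transfer within a language family. The key is exploring several approaches for extending the data regime available to NMT, showing that low-resource languages benefit even in zero-shot translation. The paper provides state-of-the-art open-source NMT models for translating between selected Slavic languages and releases them on the HuggingFace Hub.

Link: https://arxiv.org/abs/2502.14509
Authors: Artur Kot,Mikołaj Koszowski,Wojciech Chojnowski,Mieszko Rutkowski,Artur Nowakowski,Kamil Guttmann,Mikołaj Pokrywka
Affiliations: Allegro.com; Laniqo.com
Categories: Computation and Language (cs.CL)
Comments: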

Abstract:Does multilingual Neural Machine Translation (NMT) lead to The Curse of the Multilinguality or provides the Cross-lingual Knowledge Transfer within a language family? In this study, we explore multiple approaches for extending the available data-regime in NMT and we prove cross-lingual benefits even in 0-shot translation regime for low-resource languages. With this paper, we provide state-of-the-art open-source NMT models for translating between selected Slavic languages. We released our models on the HuggingFace Hub (this https URL) under the CC BY 4.0 license. Slavic language family comprises morphologically rich Central and Eastern European languages. Although counting hundreds of millions of native speakers, Slavic Neural Machine Translation is under-studied in our opinion. Recently, most NMT research focuses either on: high-resource languages like English, Spanish, and German - in WMT23 General Translation Task 7 out of 8 task directions are from or to English; massively multilingual models covering multiple language groups; or evaluation techniques.

[NLP-60] Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases

Link: https://arxiv.org/abs/2502.14507
Authors: Rena Gao,Xuetong Wu,Tatsuki Kuribayashi,Mingrui Ye,Siya Qi,Carsten Roever,Yuanxing Liu,Zheng Yuan,Jey Han Lau
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments:

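[NLP-61] How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Quick Read: This paper studies how large language models (LLMs) can incorporate new facts with Low-Rank Adaptation (LoRA) without losing previously learned knowledge. The authors fine-tune Llama-3.1-8B-instruct with LoRA on varying amounts of new knowledge and find that training data mixing known and new facts gives the best balance between integrating new knowledge and preserving the model's general capabilities. Even so, performance on external question-answering benchmarks declines after fine-tuning; when the training data is biased toward certain entities, the model regresses to a few over-represented answers; and the model becomes more confident, refusing to answer in only a few cases.

Link: https://arxiv.org/abs/2502.14502
Authors: Sergey Pletenev,Maria Marina,Daniil Moskovskiy,Vasily Konovalov,Pavel Braslavski,Alexander Panchenko,Mikhail Salnikov
Affiliations: AIRI; Skoltech; Moscow Institute of Physics and Technology; Nazarbayev University
Categories: Computation and Language (cs.CL)
Comments: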

Abstract:The performance of Large Language Models (LLMs) on many tasks is greatly limited by the knowledge learned during pre-training and stored in the model’s parameters. Low-rank adaptation (LoRA) is a popular and efficient training technique for updating or domain-specific adaptation of LLMs. In this study, we investigate how new facts can be incorporated into the LLM using LoRA without compromising the previously learned knowledge. We fine-tuned Llama-3.1-8B-instruct using LoRA with varying amounts of new knowledge. Our experiments have shown that the best results are obtained when the training data contains a mixture of known and new facts. However, this approach is still potentially harmful because the model’s performance on external question-answering benchmarks declines after such fine-tuning. When the training data is biased towards certain entities, the model tends to regress to few overrepresented answers. In addition, we found that the model becomes more confident and refuses to provide an answer in only few cases. These findings highlight the potential pitfalls of LoRA-based LLM updates and underscore the importance of training data composition and tuning parameters to balance new knowledge integration and general model capabilities.

[NLP-62] Towards a Perspectivist Turn in Argument Quality Assessment NAACL2025

Link: https://arxiv.org/abs/2502.14501
Authors: Julia Romberg,Maximilian Maurer,Henning Wachsmuth,Gabriella Lapesa
Affiliations: GESIS - Leibniz Institute for the Social Sciences; Leibniz University Hannover; Heinrich-Heine University Düsseldorf
Categories: Computation and Language (cs.CL)
Comments: Accepted to NAACL 2025

[NLP-63] MLGym: A New Framework and Benchmark for Advancing AI Research Agents

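Quick Read: This paper addresses how to evaluate and develop the abilities of large language models (LLMs) on AI research tasks. The key is a new framework, Meta MLGym, and benchmark, MLGym-Bench, the first Gym environment for machine learning (ML) tasks. It lets researchers evaluate and develop LLM agents on 13 diverse, open-ended AI research tasks drawn from computer vision, natural language processing, reinforcement learning, and game theory. Solving these tasks requires practical skills covering the whole research process, from generating new ideas to iterating on improvements. With the framework, researchers can easily add new tasks, integrate and evaluate models or agents, generate synthetic data at scale, and develop new learning algorithms for training agents.

Link: https://arxiv.org/abs/2502.14499
Authors: Deepak Nathani,Lovish Madaan,Nicholas Roberts,Nikolay Bashlykov,Ajay Menon,Vincent Moens,Amar Budhiraja,Despoina Magka,Vladislav Vorotilov,Gaurav Chaurasia,Dieuwke Hupkes,Ricardo Silveira Cabral,Tatiana Shavrina,Jakob Foerster,Yoram Bachrach,William Yang Wang,Roberta Raileanu
Affiliations: Unknown
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 35 pages, 12 figures, 10 tables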

Abstract:We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement learning (RL) algorithms for training such agents. MLGym-bench consists of 13 diverse and open-ended AI research tasks from diverse domains such as computer vision, natural language processing, reinforcement learning, and game theory. Solving these tasks requires real-world AI research skills such as generating new ideas and hypotheses, creating and processing data, implementing ML methods, training models, running experiments, analyzing the results, and iterating through this process to improve on a given task. We evaluate a number of frontier large language models (LLMs) on our benchmarks such as Claude-3.5-Sonnet, Llama-3.1 405B, GPT-4o, o1-preview, and Gemini-1.5 Pro. Our MLGym framework makes it easy to add new tasks, integrate and evaluate models or agents, generate synthetic data at scale, as well as develop new learning algorithms for training agents on AI research tasks. We find that current frontier models can improve on the given baselines, usually by finding better hyperparameters, but do not generate novel hypotheses, algorithms, architectures, or substantial improvements. We open-source our framework and benchmark to facilitate future research in advancing the AI research capabilities of LLM agents.

[NLP-64] Stories that (are) Move(d by) Markets: A Causal Exploration of Market Shocks and Semantic Shifts across Different Partisan Groups

Link: https://arxiv.org/abs/2502.14497
Authors: Felix Drinkall,Stefan Zohren,Michael McMahon,Janet B. Pierrehumbert
Affiliations: Department of Engineering Science, University of Oxford; Department of Economics, University of Oxford; Faculty of Linguistics, University of Oxford
Categories: Computation and Language (cs.CL); Computational Engineering, Finance, and Science (cs.CE); General Economics (econ.GN)
Comments:

[NLP-65] Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization

Link: https://arxiv.org/abs/2502.14496
Authors: Zhitao He,Zijun Liu,Peng Li,May Fung,Ming Yan,Ji Zhang,Fei Huang,Yang Liu
Affiliations: Institute for AI Industry Research (AIR), Tsinghua University; Department of Computer Science and Technology, Tsinghua University; Hong Kong University of Science and Technology; Alibaba Group
Categories: Computation and Language (cs.CL)
Comments: 24 pages, under review

[NLP-66] StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following

Link: https://arxiv.org/abs/2502.14494
Authors: Jinnan Li,Jinzhe Li,Yue Wang,Yi Chang,Yuan Wu
Affiliations: School of Artificial Intelligence, Jilin University; College of Computer Science and Technology, Jilin University; School of Information and Library Science, University of North Carolina at Chapel Hill; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China; International Center of Future Science, Jilin University
Categories: Computation and Language (cs.CL)
Comments: 18 pages, 8 figures, 8 tables

[NLP-67] How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation

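Quick Read: This paper addresses the safety of large vision-language models (LVLMs) against harmful prompts, specifically jailbreak attacks that bypass the built-in safety mechanisms of generative models. The key is reframing the standard generation task as a binary classification problem to measure how readily a model refuses harmful versus benign queries. On this basis, the paper identifies two key defense mechanisms: safety shift, which raises refusal rates across all queries, and harmfulness discrimination, which improves the model's ability to tell harmful inputs from benign ones. Building on these mechanisms, it proposes two ensemble defense strategies, inter-mechanism ensembles and intra-mechanism ensembles, to balance safety and helpfulness. Experiments show that these strategies effectively improve model safety or optimize the trade-off between safety and helpfulness.

Link: https://arxiv.org/abs/2502.14486
Authors: Zhuohang Long,Siyuan Wang,Shujun Liu,Yuhang Lai,Xuanjing Huang,Zhongyu Wei
Affiliations: Fudan University; University of Southern California
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments: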

Abstract:Jailbreak attacks, where harmful prompts bypass generative models’ built-in safety, raise serious concerns about model vulnerability. While many defense methods have been proposed, the trade-offs between safety and helpfulness, and their application to Large Vision-Language Models (LVLMs), are not well understood. This paper systematically examines jailbreak defenses by reframing the standard generation task as a binary classification problem to assess model refusal tendencies for both harmful and benign queries. We identify two key defense mechanisms: safety shift, which increases refusal rates across all queries, and harmfulness discrimination, which improves the model’s ability to distinguish between harmful and benign inputs. Using these mechanisms, we develop two ensemble defense strategies-inter-mechanism ensembles and intra-mechanism ensembles-to balance safety and helpfulness. Experiments on the MM-SafetyBench and MOSSBench datasets with LLaVA-1.5 models show that these strategies effectively improve model safety or optimize the trade-off between safety and helpfulness.
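To make the two mechanisms concrete, here is a minimal sketch with hypothetical per-query refusal probabilities. A uniform "safety shift" raises the overall refusal rate but leaves the ranking of harmful above benign queries, and hence an AUROC-style discrimination score, unchanged. The metric choices (a 0.5 threshold, AUROC via pairwise comparisons) are assumptions for illustration, not necessarily the paper's exact measures.

```python
import numpy as np


def refusal_rate(p_refuse, threshold=0.5):
    """Fraction of queries whose refusal probability exceeds the threshold."""
    return float(np.mean(np.asarray(p_refuse) > threshold))


def discrimination_auroc(p_harmful, p_benign):
    """Chance that a random harmful query scores higher than a random benign one (ties count 0.5)."""
    h = np.asarray(p_harmful)[:, None]
    b = np.asarray(p_benign)[None, :]
    return float(np.mean((h > b) + 0.5 * (h == b)))


harmful = np.array([0.6, 0.7, 0.4])
benign = np.array([0.2, 0.3, 0.1])
shift = 0.25  # "safety shift": raise every refusal probability by the same amount

before = refusal_rate(np.concatenate([harmful, benign]))
after = refusal_rate(np.concatenate([harmful + shift, benign + shift]))
print(before, after)  # the overall refusal rate goes up...
print(discrimination_auroc(harmful, benign),
      discrimination_auroc(harmful + shift, benign + shift))  # ...discrimination does not
```

Separating the two numbers shows why a defense can look "safer" (more refusals) while being no better at telling harmful from benign inputs, which is the helpfulness cost the ensembles try to balance.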

[NLP-68] NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models

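Quick Read: This paper addresses the slow convergence of Low-Rank Adaptation (LoRA) in parameter-efficient fine-tuning (PEFT) and the expensive SVD-based initialization used by some LoRA variants such as PiSSA. The key solution draws on the Nyström method: StructuredLoRA (SLoRA) inserts a small intermediate matrix between the low-rank matrices A and B, and NyströmLoRA (NLoRA) initializes SLoRA with a Nyström-based scheme to improve effectiveness and efficiency. IntermediateTune (IntTune) goes further and fine-tunes only NLoRA's intermediate matrix to boost the efficiency of large language models (LLMs). The methods prove effective and efficient on multiple natural language generation (NLG) and natural language understanding (NLU) tasks.

Link: https://arxiv.org/abs/2502.14482
Authors: Chenlu Guo,Yuan Wu,Yi Chang
Affiliations: School of Artificial Intelligence, Jilin University; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China; International Center of Future Science, Jilin University
Categories: Computation and Language (cs.CL)
Comments: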

Abstract:Parameter-efficient fine-tuning (PEFT) is essential for adapting large language models (LLMs), with low-rank adaptation (LoRA) being the most popular approach. However, LoRA suffers from slow convergence, and some recent LoRA variants, such as PiSSA, primarily rely on Singular Value Decomposition (SVD) for initialization, leading to expensive computation. To mitigate these problems, we use the Nyström method, which follows a three-matrix manipulation. We first introduce StructuredLoRA (SLoRA), which investigates adding a small intermediate matrix between the low-rank matrices A and B. Secondly, we propose NyströmLoRA (NLoRA), which leverages Nyström-based initialization for SLoRA to improve its effectiveness and efficiency. Finally, we propose IntermediateTune (IntTune), which explores fine-tuning exclusively on the intermediate matrix of NLoRA to further boost LLM efficiency. We evaluate our methods on five natural language generation (NLG) tasks and eight natural language understanding (NLU) tasks. On GSM8K, SLoRA and NLoRA achieve accuracies of 56.48% and 57.70%, surpassing LoRA by 33.52% and 36.41%, with only 3.67 million additional trainable parameters. IntTune improves average NLG performance over LoRA by 7.45% while using only 1.25% of its parameters. These results demonstrate the efficiency and effectiveness of our approach in enhancing model performance with minimal parameter overhead.
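The Nyström method mentioned above approximates a matrix from a few of its columns and rows plus the pseudo-inverse of their intersection, which naturally yields three factors. A minimal NumPy sketch, assuming uniformly random row and column indices; mapping the factors to SLoRA's B, intermediate matrix, and A is a hypothetical reading, and NLoRA's exact initialization recipe may differ.

```python
import numpy as np


def nystrom_three_factors(W, rank, seed=0):
    """Nyström (CUR-style) approximation W ≈ C @ U_pinv @ R.

    C: sampled columns (d_out x rank), R: sampled rows (rank x d_in),
    U_pinv: pseudo-inverse of the rank x rank intersection of those rows and columns.
    """
    rng = np.random.default_rng(seed)
    rows = rng.choice(W.shape[0], size=rank, replace=False)
    cols = rng.choice(W.shape[1], size=rank, replace=False)
    C = W[:, cols]
    R = W[rows, :]
    U_pinv = np.linalg.pinv(W[np.ix_(rows, cols)])
    return C, U_pinv, R


# Hypothetical use: B <- C, intermediate matrix <- U_pinv, A <- R
rng = np.random.default_rng(1)
W = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32))  # an exactly rank-4 matrix
B, M, A = nystrom_three_factors(W, rank=4)
print(B.shape, M.shape, A.shape)      # (64, 4) (4, 4) (4, 32)
print(np.abs(B @ M @ A - W).max())    # near zero for a rank-4 W
```

Unlike an SVD, this only needs a small r×r pseudo-inverse, which is the cost advantage the summary alludes to.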

[NLP-69] Unshackling Context Length: An Efficient Selective Attention Approach through Query-Key Compression

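Quick Read: This paper addresses the inefficiency of large language models (LLMs) on long-context sequences. The key solution is Efficient Selective Attention (ESA), which extends context length by selecting the most critical tokens at the token level for computing attention. ESA lowers the cost of token selection by compressing query and key vectors into lower-dimensional representations.

Link: https://arxiv.org/abs/2502.14477
Authors: Haoyu Wang,Tong Teng,Tianyu Guo,An Xiao,Duyu Tang,Hanting Chen,Yunhe Wang
Affiliations: Huawei Noah’s Ark Lab; Huawei CBG (Huawei Consumer Business Group)
Categories: Computation and Language (cs.CL)
Comments: 14 pages, 2 figures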

Abstract:Handling long-context sequences efficiently remains a significant challenge in large language models (LLMs). Existing methods for token selection in sequence extrapolation either employ a permanent eviction strategy or select tokens by chunk, which may lead to the loss of critical information. We propose Efficient Selective Attention (ESA), a novel approach that extends context length by efficiently selecting the most critical tokens at the token level to compute attention. ESA reduces the computational complexity of token selection by compressing query and key vectors into lower-dimensional representations. We evaluate ESA on long sequence benchmarks with maximum lengths up to 256k using open-source LLMs with context lengths of 8k and 32k. ESA outperforms other selective attention methods, especially in tasks requiring the retrieval of multiple pieces of information, achieving comparable performance to full-attention extrapolation methods across various tasks, with superior results in certain tasks.
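A minimal sketch of token-level selective attention for a single query: tokens are ranked with scores computed in a compressed, lower-dimensional space, and exact attention is then computed only over the top-k. The random projection `P` stands in for ESA's compression and is an assumption (the paper's compression may be learned); with `top_k` equal to the sequence length, the result reduces to full attention.

```python
import numpy as np


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def selective_attention(q, K, V, P, top_k):
    """Single-query attention restricted to the top_k tokens ranked in a compressed space.

    q: (d,), K: (n, d), V: (n, d_v), P: (d, d_low) projection used only for ranking.
    """
    approx = (K @ P) @ (q @ P)                 # cheap scores in d_low dimensions
    idx = np.argsort(-approx)[:top_k]          # token-level selection
    scores = K[idx] @ q / np.sqrt(q.shape[0])  # exact scaled dot products on kept tokens
    return softmax(scores) @ V[idx]


rng = np.random.default_rng(0)
n, d, d_low = 128, 64, 8
q, K, V = rng.normal(size=d), rng.normal(size=(n, d)), rng.normal(size=(n, d))
P = rng.normal(size=(d, d_low)) / np.sqrt(d_low)  # hypothetical random compression
out = selective_attention(q, K, V, P, top_k=16)
print(out.shape)  # (64,)
```

Ranking costs O(n·d_low) instead of O(n·d), and the exact softmax then touches only `top_k` tokens; a production kernel would also cache the compressed keys.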

[NLP-70] Argument-Based Comparative Question Answering Evaluation Benchmark

Link: https://arxiv.org/abs/2502.14476
Authors: Irina Nikishina,Saba Anwar,Nikolay Dolgov,Maria Manina,Daria Ignatenko,Viktor Moskvoretskii,Artem Shelmanov,Tim Baldwin,Chris Biemann
Affiliations: University of Hamburg; HSE University; MBZUAI; Skoltech
Categories: Computation and Language (cs.CL)
Comments: 8 pages, 7 tables, 13 figures; 18 pages with appendix

[NLP-71] Enhancing Smart Environments with Context-Aware Chatbots using Large Language Models

Quick Read: This paper addresses the limited interaction experience offered by static chatbots in smart environments. The key is combining large language models (LLMs) with users' real-time location data (obtained from UWB tags) and real-time human activity recognition (HAR) in sensor-equipped smart homes to build a comprehensive picture of the user's context. This contextual information drives a chatbot that generates personalized interactions and recommendations, adapting dynamically to the user's current situation.

Link: https://arxiv.org/abs/2502.14469
Authors: Aurora Polo-Rodríguez,Laura Fiorini,Erika Rovini,Filippo Cavallo,Javier Medina-Quero
Affiliations: Department of Computer Science, Automatics and Robotics, University of Granada, Spain; Department of Industrial Engineering, University of Florence, Italy
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Comments: 11 pages, 3 figures

Abstract:This work presents a novel architecture for context-aware interactions within smart environments, leveraging Large Language Models (LLMs) to enhance user experiences. Our system integrates user location data obtained through UWB tags and sensor-equipped smart homes with real-time human activity recognition (HAR) to provide a comprehensive understanding of user context. This contextual information is then fed to an LLM-powered chatbot, enabling it to generate personalised interactions and recommendations based on the user’s current activity and environment. This approach moves beyond traditional static chatbot interactions by dynamically adapting to the user’s real-time situation. A case study conducted from a real-world dataset demonstrates the feasibility and effectiveness of our proposed architecture, showcasing its potential to create more intuitive and helpful interactions within smart homes. The results highlight the significant benefits of integrating LLM with real-time activity and location data to deliver personalised and contextually relevant user experiences.

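[NLP-72] Optimal word order for non-causal text generation with Large Language Models: the Spanish case

Quick Read: This paper addresses the optimal text generation order for natural language generation (NLG) with non-causal (non-unidirectional) language models, using Spanish as the case study. It proposes a new Viterbi algorithm-based methodology for maximum likelihood word order estimation. The key is comparing the probability of the most likely word order under a non-causal language model with the probability of generating the same phrases with causal (unidirectional) NLG, which reveals that causal NLG prefers English-like subject-verb-object (SVO) structures. The paper also examines, with Spearman's rank correlation, how the optimal generation order relates to the causal left-to-right order: the ideal order predicted by the maximum likelihood estimator is not closely related to the causal order and may be influenced by the syntactic structure of the target sentence.

Link: https://arxiv.org/abs/2502.14451
Authors: Andrea Busto-Castiñeira,Silvia García-Méndez,Francisco de Arriba-Pérez,Francisco J. González-Castaño
Affiliations: unknown.uvigo.es
Categories: Computation and Language (cs.CL)
Comments: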

Abstract:Natural Language Generation (NLG) popularity has increased owing to the progress in Large Language Models (LLMs), with zero-shot inference capabilities. However, most neural systems utilize decoder-only causal (unidirectional) transformer models, which are effective for English but may reduce the richness of languages with less strict word order, subject omission, or different relative clause attachment preferences. This is the first work that analytically addresses optimal text generation order for non-causal language models. We present a novel Viterbi algorithm-based methodology for maximum likelihood word order estimation. We analyze the non-causal most-likelihood order probability for NLG in Spanish and, then, the probability of generating the same phrases with Spanish causal NLG. This comparative analysis reveals that causal NLG prefers English-like SVO structures. We also analyze the relationship between optimal generation order and causal left-to-right generation order using Spearman’s rank correlation. Our results demonstrate that the ideal order predicted by the maximum likelihood estimator is not closely related to the causal order and may be influenced by the syntactic structure of the target sentence.
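The comparison with the causal left-to-right order relies on Spearman's rank correlation. A minimal sketch for two orderings of the same tokens without ties, using ρ = 1 − 6Σd²/(n(n²−1)); each list gives token positions in the order they are generated, and the example orders are made up.

```python
def spearman_no_ties(order_a, order_b):
    """Spearman's rho between two generation orders of the same tokens (no ties, n >= 2)."""
    n = len(order_a)
    rank_a = {tok: r for r, tok in enumerate(order_a)}  # generation step of each token
    rank_b = {tok: r for r, tok in enumerate(order_b)}
    d2 = sum((rank_a[t] - rank_b[t]) ** 2 for t in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


causal = [0, 1, 2, 3, 4]   # left-to-right generation
optimal = [2, 0, 1, 4, 3]  # hypothetical maximum-likelihood order
print(spearman_no_ties(optimal, causal))  # 0.6
```

A value near 1 would mean the optimal order is essentially left-to-right; the paper's finding that the two are not closely related corresponds to values well below 1.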

[NLP-73] PredictaBoard: Benchmarking LLM Score Predictability

Quick Read: This paper addresses the unpredictable errors of large language models (LLMs), which succeed inconsistently even on basic common-sense reasoning tasks and are therefore hard to deploy safely. To tackle this, it proposes PredictaBoard, a novel collaborative benchmarking framework that evaluates how well score predictors (called assessors) anticipate LLM errors on specific task instances (i.e., prompts). PredictaBoard evaluates LLM-assessor pairs by the rejection rate required at different error tolerances. The key point is that predictability must be evaluated alongside performance, which encourages better assessors and more predictable LLMs and thus safer AI systems.

Link: https://arxiv.org/abs/2502.14445
Authors: Lorenzo Pacchiardi,Konstantinos Voudouris,Ben Slater,Fernando Martínez-Plumed,José Hernández-Orallo,Lexin Zhou,Wout Schellaert
Affiliations: Leverhulme Centre for the Future of Intelligence, University of Cambridge; Institute for Human-Centered AI, Helmholtz Zentrum Munich, Germany; VRAIN, Universitat Politècnica de València, Spain
Categories: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments:

Abstract:Despite possessing impressive skills, Large Language Models (LLMs) often fail unpredictably, demonstrating inconsistent success in even basic common sense reasoning tasks. This unpredictability poses a significant challenge to ensuring their safe deployment, as identifying and operating within a reliable “safe zone” is essential for mitigating risks. To address this, we present PredictaBoard, a novel collaborative benchmarking framework designed to evaluate the ability of score predictors (referred to as assessors) to anticipate LLM errors on specific task instances (i.e., prompts) from existing datasets. PredictaBoard evaluates pairs of LLMs and assessors by considering the rejection rate at different tolerance errors. As such, PredictaBoard stimulates research into developing better assessors and making LLMs more predictable, not only with a higher average performance. We conduct illustrative experiments using baseline assessors and state-of-the-art LLMs. PredictaBoard highlights the critical need to evaluate predictability alongside performance, paving the way for safer AI systems where errors are not only minimised but also anticipated and effectively mitigated. Code for our benchmark can be found at this https URL
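One way to read "rejection rate at a given error tolerance" is sketched below: keep prompts in order of the assessor's predicted success, and report the smallest fraction that must be rejected so the LLM's error rate on the kept prompts stays within the tolerance. This definition, the function name, and the toy data are assumptions for illustration; PredictaBoard's exact metric may differ.

```python
def min_rejection_rate(pred_success, correct, tolerance):
    """Smallest fraction of prompts to reject so the error rate on kept prompts is <= tolerance.

    Prompts are kept in order of the assessor's predicted probability of success.
    """
    order = sorted(range(len(pred_success)), key=lambda i: pred_success[i], reverse=True)
    best_kept, errors = 0, 0
    for kept, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        if errors / kept <= tolerance:
            best_kept = kept
    return 1 - best_kept / len(order)


pred = [0.9, 0.8, 0.7, 0.4]          # assessor's predicted success probabilities
correct = [True, True, False, False]  # whether the LLM actually answered correctly
for tol in (0.0, 0.25, 0.5):
    print(tol, min_rejection_rate(pred, correct, tol))
```

A better assessor ranks the LLM's failures last, so fewer prompts need to be rejected for the same tolerance; this is how the pair (LLM, assessor) gets a single score.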

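[NLP-74] An Enhancement of Jiang Z. et al.'s Compression-Based Classification Algorithm Applied to News Article Categorization

Quick Read: This paper addresses the limited ability of an existing compression-based classification algorithm to detect semantic similarity between text documents. The key improvements are unigram extraction and an optimized concatenation strategy. Compressing the extracted unigrams instead of whole documents mitigates the sliding-window limitation inherent to gzip, improving compression efficiency and similarity detection. Concatenating the union of unigrams instead of the raw documents reduces redundancy and makes the Normalized Compression Distance (NCD) calculation more accurate. These changes markedly improve classification accuracy, especially on datasets with high label diversity and complex text structure.

Link: https://arxiv.org/abs/2502.14444
Authors: Sean Lester C. Benavides,Cid Antonio F. Masapol,Jonathan C. Morano,Dan Michael A. Cortez
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: 11 pages, 5 figures, 1 table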

Abstract:This study enhances Jiang et al.'s compression-based classification algorithm by addressing its limitations in detecting semantic similarities between text documents. The proposed improvements focus on unigram extraction and optimized concatenation, eliminating reliance on entire document compression. By compressing extracted unigrams, the algorithm mitigates sliding window limitations inherent to gzip, improving compression efficiency and similarity detection. The optimized concatenation strategy replaces direct concatenation with the union of unigrams, reducing redundancy and enhancing the accuracy of Normalized Compression Distance (NCD) calculations. Experimental results across datasets of varying sizes and complexities demonstrate an average accuracy improvement of 5.73%, with gains of up to 11% on datasets containing longer documents. Notably, these improvements are more pronounced in datasets with high-label diversity and complex text structures. The methodology achieves these results while maintaining computational efficiency, making it suitable for resource-constrained environments. This study provides a robust, scalable solution for text classification, emphasizing lightweight preprocessing techniques to achieve efficient compression, which in turn enables more accurate classification.
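A minimal sketch of the two ideas using Python's standard `gzip` module. Documents are reduced to sorted unique unigrams before compression, and the "concatenation" in NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)) is replaced by the union of the two unigram sets. Whitespace tokenization, lowercasing, sorting, and the k-NN voting helper are assumptions for illustration, not necessarily the authors' exact choices.

```python
import gzip
from collections import Counter


def clen(text):
    """Compressed length in bytes."""
    return len(gzip.compress(text.encode("utf-8")))


def unigrams(text):
    """Sorted unique lowercase tokens (whitespace tokenization is an assumption)."""
    return sorted(set(text.lower().split()))


def ncd(x, y):
    ux, uy = unigrams(x), unigrams(y)
    cx, cy = clen(" ".join(ux)), clen(" ".join(uy))
    cxy = clen(" ".join(sorted(set(ux) | set(uy))))  # union of unigrams instead of x + y
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(doc, train, k=3):
    """train: list of (text, label). Majority vote among the k nearest documents by NCD."""
    nearest = sorted(train, key=lambda item: ncd(doc, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


train = [
    ("stocks fell as markets reacted to interest rates", "business"),
    ("the central bank raised interest rates again", "business"),
    ("the striker scored twice in the final match", "sport"),
    ("the team won the league after a late goal", "sport"),
]
print(knn_classify("bank rates and stock markets", train, k=1))
```

Because the union drops repeated words, the compressed "pair" stays short even for long documents, which keeps it within gzip's 32 KB window.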

[NLP-75] Natural Language Generation

Link: https://arxiv.org/abs/2502.14437
Authors: Ehud Reiter
Affiliations: Unknown
Categories: Computation and Language (cs.CL)
Comments: This is a preprint of the following work: Ehud Reiter, Natural Language Generation, 2024, Springer reproduced with permission of Springer Nature Switzerland AG. The final authenticated version is available online at: this http URL

[NLP-76] Early-Exit and Instant Confidence Translation Quality Estimation

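Quick Read: This paper addresses the high cost of quality estimation in large-scale machine translation and the lack of an inexpensive uncertainty estimation method for it. The key solution is Instant Confidence COMET, an uncertainty-aware quality estimation model that matches previous approaches at a fraction of their cost. It is extended to Early-Exit COMET, which computes quality scores and associated confidences already at early model layers, so computation can exit early and evaluation costs drop. The paper also applies the model to machine translation reranking, combining Early-Exit COMET with an upper confidence bound bandit algorithm to find the best candidate without running the full evaluation model on every candidate. In both settings, the methods cut the required compute by 50% with very little loss in performance.

Link: https://arxiv.org/abs/2502.14429
Authors: Vilém Zouhar,Maike Züfle,Beni Egressy,Julius Cheng,Jan Niehues
Affiliations: ETH Zurich; Karlsruhe Institute of Technology; Heidelberg Institute for Theoretical Studies; University of Cambridge
Categories: Computation and Language (cs.CL)
Comments: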

Abstract:Quality estimation is omnipresent in machine translation, for both evaluation and generation. Unfortunately, quality estimation models are often opaque and computationally expensive, making them impractical to be part of large-scale pipelines. In this work, we tackle two connected challenges: (1) reducing the cost of quality estimation at scale, and (2) developing an inexpensive uncertainty estimation method for quality estimation. To address the latter, we introduce Instant Confidence COMET, an uncertainty-aware quality estimation model that matches the performance of previous approaches at a fraction of their costs. We extend this to Early-Exit COMET, a quality estimation model that can compute quality scores and associated confidences already at early model layers, allowing us to early-exit computations and reduce evaluation costs. We also apply our model to machine translation reranking. We combine Early-Exit COMET with an upper confidence bound bandit algorithm to find the best candidate from a large pool without having to run the full evaluation model on all candidates. In both cases (evaluation and reranking) our methods reduce the required compute by 50% with very little degradation in performance.
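The reranking idea can be sketched as upper-confidence-bound pruning. Each candidate gets a cheap early-exit estimate (mean and standard deviation), candidates are fully scored in order of their upper bound, and the search stops once no remaining upper bound can beat the best full score. This is a simplified stand-in for the bandit algorithm in the paper; the function name, the `z` multiplier, and the toy scores are hypothetical.

```python
def ucb_rerank(early, full_score, z=2.0):
    """early: list of (mean, std) early-exit estimates; full_score(i): expensive full evaluation."""
    ucb = [mean + z * std for mean, std in early]
    order = sorted(range(len(early)), key=lambda i: ucb[i], reverse=True)
    best_idx, best_score, calls = None, float("-inf"), 0
    for i in order:
        # Upper bounds only decrease along `order`, so no later candidate can win either.
        if best_idx is not None and ucb[i] <= best_score:
            break
        score = full_score(i)
        calls += 1
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score, calls


early = [(0.50, 0.05), (0.90, 0.05), (0.20, 0.05), (0.85, 0.05)]
true_scores = [0.52, 0.88, 0.21, 0.86]
print(ucb_rerank(early, lambda i: true_scores[i]))  # (1, 0.88, 2)
```

Only two of the four candidates receive a full evaluation here. The pruning is exact only if every true score lies below its upper bound, which is why calibrated early-exit confidences matter.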

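[NLP-77] Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models

Quick Read: This paper addresses uncertainty quantification (UQ) for text generation with large language models (LLMs). The dominant UQ methods are information-based and consistency-based; density-based methods, although very effective for text classification with encoder-based models, have not worked well with generative LLMs. The key is adapting Mahalanobis Distance (MD), a well-established UQ technique for classification, to text generation and introducing a new supervised UQ method. The method extracts token embeddings from multiple LLM layers, computes an MD score for each token, and trains a linear regression on these features to produce robust uncertainty scores. Extensive experiments on eleven datasets show that it substantially outperforms existing UQ methods, giving accurate and computationally efficient uncertainty scores for both sequence-level selective generation and claim-level fact-checking.

Link: https://arxiv.org/abs/2502.14427
Authors: Artem Vazhentsev,Lyudmila Rvanova,Ivan Lazichny,Alexander Panchenko,Maxim Panov,Timothy Baldwin,Artem Shelmanov
Affiliations: Skoltech; AIRI; MBZUAI; The University of Melbourne; Laboratory for Analysis and Controllable Text Generation Technologies RAS
Categories: Computation and Language (cs.CL)
Comments: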

Abstract:Uncertainty quantification (UQ) is a prominent approach for eliciting truthful answers from large language models (LLMs). To date, information-based and consistency-based UQ have been the dominant UQ methods for text generation via LLMs. Density-based methods, despite being very effective for UQ in text classification with encoder-based models, have not been very successful with generative LLMs. In this work, we adapt Mahalanobis Distance (MD) - a well-established UQ technique in classification tasks - for text generation and introduce a new supervised UQ method. Our method extracts token embeddings from multiple layers of LLMs, computes MD scores for each token, and uses linear regression trained on these features to provide robust uncertainty scores. Through extensive experiments on eleven datasets, we demonstrate that our approach substantially improves over existing UQ methods, providing accurate and computationally efficient uncertainty scores for both sequence-level selective generation and claim-level fact-checking tasks. Our method also exhibits strong generalization to out-of-domain data, making it suitable for a wide range of LLM-based applications.
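下面用纯 Python 示意马氏距离不确定性分数的计算。先在训练数据的令牌嵌入上拟合均值与加岭正则的协方差，再计算新令牌嵌入到该分布的平方马氏距离；距离越大，表示该令牌的表示越“陌生”。这只覆盖单层、无监督的部分：论文还会从多层提取 MD 分数并训练线性回归，此处未实现。函数名与 `ridge` 参数均为假设。

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    d = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(d)] for i, row in enumerate(m)]
    for col in range(d):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def fit_gaussian(embeddings: List[Vector], ridge: float = 1e-3) -> Tuple[Vector, Matrix]:
    """Mean and inverse ridge-regularized covariance of training token embeddings."""
    n, d = len(embeddings), len(embeddings[0])
    mean = [sum(e[j] for e in embeddings) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for e in embeddings:
        diff = [e[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b] / n
    for a in range(d):
        cov[a][a] += ridge  # keeps the covariance matrix invertible
    return mean, invert(cov)


def mahalanobis_sq(x: Vector, mean: Vector, inv_cov: Matrix) -> float:
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    d = len(diff)
    return sum(diff[a] * inv_cov[a][b] * diff[b] for a in range(d) for b in range(d))
```

实际使用时通常借助线性代数库求逆；这里手写 Gauss-Jordan 消元只是为了让示例不依赖第三方包。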

[NLP-78] A Survey on Data Contamination for Large Language Models

链接: https://arxiv.org/abs/2502.14425
作者: Yuxing Cheng,Yi Chang,Yuan Wu
机构: College of Software, Jilin University (软件学院，吉林大学); School of Artificial Intelligence, Jilin University (人工智能学院，吉林大学); Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China (知识驱动人机智能教育部工程研究中心，中国); International Center of Future Science, Jilin University (未来科学国际中心，吉林大学)
类目: Computation and Language (cs.CL)
备注:

[NLP-79] Unstructured Evidence Attribution for Long Context Query Focused Summarization

【速读】：该论文旨在解决大型语言模型（LLMs）在长上下文查询聚焦摘要生成过程中，难以有效提取和适当地引用非结构化证据的问题。论文的关键解决方案在于创建了一个名为Summaries with Unstructured Evidence Text (SUnsET)的数据集，该数据集通过一种新颖的领域无关管道生成，可作为监督信号来调整LLMs以更好地完成这一任务。研究表明，使用SUnsET训练的LLMs能够生成更加相关且事实一致的证据，并能从更广泛的上下文中提取证据，从而生成更为相关和一致的摘要。

链接: https://arxiv.org/abs/2502.14409
作者: Dustin Wright,Zain Muhammad Mujahid,Lu Wang,Isabelle Augenstein,David Jurgens
机构: Department of Computer Science, University of Copenhagen; Department of Computer Science and Engineering, University of Michigan; School of Information, University of Michigan
类目: Computation and Language (cs.CL); Information Retrieval (cs.IR)
备注: 24 pages; 21 figures; 5 tables

Abstract:Large language models (LLMs) are capable of generating coherent summaries from very long contexts given a user query. Extracting and properly citing evidence spans could help improve the transparency and reliability of these summaries. At the same time, LLMs suffer from positional biases in terms of which information they understand and attend to, which could affect evidence citation. Whereas previous work has focused on evidence citation with predefined levels of granularity (e.g. sentence, paragraph, document, etc.), we propose the task of long-context query focused summarization with unstructured evidence citation. We show how existing systems struggle to generate and properly cite unstructured evidence from their context, and that evidence tends to be “lost-in-the-middle”. To help mitigate this, we create the Summaries with Unstructured Evidence Text dataset (SUnsET), a synthetic dataset generated using a novel domain-agnostic pipeline which can be used as supervision to adapt LLMs to this task. We demonstrate across 5 LLMs of different sizes and 4 datasets with varying document types and lengths that LLMs adapted with SUnsET data generate more relevant and factually consistent evidence than their base models, extract evidence from more diverse locations in their context, and can generate more relevant and consistent summaries.

[NLP-80] A Macro- and Micro-Hierarchical Transfer Learning Framework for Cross-Domain Fake News Detection

链接: https://arxiv.org/abs/2502.14403
作者: Xuankai Yang,Yan Wang,Xiuzhen Zhang,Shoujin Wang,Huaxiong Wang,Kwok Yan Lam
机构: School of Computing, Macquarie University (计算学院, 麦考瑞大学); School of Computing Technologies, RMIT University (计算技术学院, 皇家墨尔本理工大学); Data Science Institute, University of Technology Sydney (数据科学研究所, 悉尼科技大学); School of Physical and Mathematical Sciences, Nanyang Technological University (物理与数学科学学院, 南洋理工大学); College of Computing and Data Science, Nanyang Technological University (计算与数据科学学院, 南洋理工大学)
类目: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: 11 pages, 8 figures

[NLP-81] Enhancing Portuguese Variety Identification with Cross-Domain Approaches AAAI2025

【速读】：该论文旨在解决自然语言处理领域中生成式模型在不同语言变体间应用的局限性问题，特别是在葡萄牙语中，由于网络上巴西葡萄牙语语料库的主导地位，导致这些模型存在语言偏见，限制了其在巴西以外的应用。为了解决这一问题并促进欧洲葡萄牙语资源的开发，研究团队开发了一种跨领域语言变体标识器（LVI），用于区分欧洲葡萄牙语和巴西葡萄牙语。该解决方案的关键在于构建了一个名为PtBrVarId的跨领域LVI数据集，并研究了基于Transformer的语言变体标识分类器在跨领域场景中的有效性。

链接: https://arxiv.org/abs/2502.14394
作者: Hugo Sousa,Rúben Almeida,Purificação Silvano,Inês Cantante,Ricardo Campos,Alípio Jorge
机构: 未知
类目: Computation and Language (cs.CL)
备注: AAAI 2025

Abstract:Recent advances in natural language processing have raised expectations for generative models to produce coherent text across diverse language varieties. In the particular case of the Portuguese language, the predominance of Brazilian Portuguese corpora online introduces linguistic biases in these models, limiting their applicability outside of Brazil. To address this gap and promote the creation of European Portuguese resources, we developed a cross-domain language variety identifier (LVI) to discriminate between European and Brazilian Portuguese. Motivated by the findings of our literature review, we compiled the PtBrVarId corpus, a cross-domain LVI dataset, and study the effectiveness of transformer-based LVI classifiers for cross-domain scenarios. Although this research focuses on two Portuguese varieties, our contribution can be extended to other varieties and languages. We open source the code, corpus, and models to foster further research in this task.

[NLP-82] Leveraging Small LLMs for Argument Mining in Education: Argument Component Identification, Classification, and Assessment

链接: https://arxiv.org/abs/2502.14389
作者: Lucile Favero,Juan Antonio Pérez-Ortiz,Tanja Käser,Nuria Oliver
机构: ELLIS Alicante(ELLIS阿利坎特); Universitat d’Alacant(阿利坎特大学); École Polytechnique Fédérale de Lausanne, EPFL(洛桑联邦理工学院)
类目: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
备注:

[NLP-83] Tradutor: Building a Variety Specific Translation Model AAAI2025

链接: https://arxiv.org/abs/2502.14385
作者: Hugo Sousa,Satya Almasian,Ricardo Campos,Alípio Jorge
机构: 未知
类目: Computation and Language (cs.CL)
备注: AAAI 2025

[NLP-84] Rumor Detection by Multi-task Suffix Learning based on Time-series Dual Sentiments

链接: https://arxiv.org/abs/2502.14383
作者: Zhiwei Liu,Kailai Yang,Eduard Hovy,Sophia Ananiadou
机构: The University of Manchester(曼彻斯特大学); The University of Melbourne(墨尔本大学); Carnegie Mellon University(卡内基梅隆大学)
类目: Computation and Language (cs.CL)
备注: work in progress

[NLP-85] Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations

链接: https://arxiv.org/abs/2502.14380
作者: Mariko Kato,Hakaze Cho,Yoshihiro Sakai,Naoya Inoue
机构: Japan Advanced Institute of Science and Technology (北陆先端科学技术大学院大学); RIKEN (理化学研究所)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 8 pages, 10 figures

[NLP-86] A Similarity Paradigm Through Textual Regularization Without Forgetting

链接: https://arxiv.org/abs/2502.14376
作者: Fangming Cui,Jan Fong,Rongfei Zeng,Xinmei Tian,Jun Yu
机构: 未知
类目: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
备注:

[NLP-87] Entropy-UID: A Method for Optimizing Information Density ACL2025

链接: https://arxiv.org/abs/2502.14366
作者: Xinpeng Shou
机构: 未知
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: 5 pages, 1 figure, submitted to ACL 2025

[NLP-88] Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests

链接: https://arxiv.org/abs/2502.14359
作者: Filippo Momentè,Alessandro Suglia,Mario Giulianelli,Ambra Ferrari,Alexander Koller,Oliver Lemon,David Schlangen,Raquel Fernández,Raffaella Bernardi
机构: University of Trento(特伦托大学); Heriot-Watt University(赫瑞瓦特大学); ETH Zürich(瑞士苏黎世联邦理工学院); Saarland University(萨尔大学); University of Potsdam(波茨坦大学); University of Amsterdam(阿姆斯特丹大学)
类目: Computation and Language (cs.CL)
备注:

[NLP-89] Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning

链接: https://arxiv.org/abs/2502.14356
作者: Huimin Xu,Xin Mao,Feng-Lin Li,Xiaobao Wu,Wang Chen,Wei Zhang,Anh Tuan Luu
机构: Nanyang Technological University (南洋理工大学); Shopee Pte. Ltd (Shopee有限公司); SEA Group (SEA集团)
类目: Computation and Language (cs.CL)
备注:

[NLP-90] Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment

【速读】：该论文旨在解决多目标对齐（Multi-Objective Alignment, MOA）过程中，基于直接偏好优化（Direct Preference Optimization, DPO）的方法所面临的广泛存在的偏好冲突问题。这些偏好冲突导致不同的目标倾向于不同的响应，从而引发优化方向的冲突，阻碍了帕累托前沿（Pareto Front）上的优化进程。为了解决这一问题，论文的关键方案是提出构建帕累托最优响应以解决偏好冲突，并设计了一个自我改进的DPO框架，使大型语言模型（LLMs）能够自我生成和选择帕累托最优响应，实现自我监督的偏好对齐。实验结果表明，该框架在两个数据集上实现了优于各种基线方法的帕累托前沿。

链接: https://arxiv.org/abs/2502.14354
作者: Moxin Li,Yuantao Zhang,Wenjie Wang,Wentao Shi,Zhuo Liu,Fuli Feng,Tat-Seng Chua
机构: National University of Singapore(新加坡国立大学); University of Science and Technology of China(中国科学技术大学)
类目: Machine Learning (cs.LG); Computation and Language (cs.CL)
备注: Under review

Abstract:Multi-Objective Alignment (MOA) aims to align LLMs’ responses with multiple human preference objectives, with Direct Preference Optimization (DPO) emerging as a prominent approach. However, we find that DPO-based MOA approaches suffer from widespread preference conflicts in the data, where different objectives favor different responses. This results in conflicting optimization directions, hindering the optimization on the Pareto Front. To address this, we propose to construct Pareto-optimal responses to resolve preference conflicts. To efficiently obtain and utilize such responses, we propose a self-improving DPO framework that enables LLMs to self-generate and select Pareto-optimal responses for self-supervised preference alignment. Extensive experiments on two datasets demonstrate the superior Pareto Front achieved by our framework compared to various baselines. Code is available at this https URL.
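该论文的两个核心概念是“偏好冲突”（不同目标偏好不同回复）和“帕累托最优回复”。下面的示意假设每个候选回复已有各目标上的数值得分（例如有用性、无害性；得分与函数名均为假设），演示如何判断一对回复之间是否存在偏好冲突，以及如何筛出不被任何其他回复支配的帕累托最优集合。论文中由 LLM 自生成、自选择回复的流程未在此体现。

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def has_conflict(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if some objective prefers `a` while another objective prefers `b`."""
    return any(x > y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the ids of responses that no other response dominates."""
    return [
        rid
        for rid, s in scores.items()
        if not any(dominates(other, s) for oid, other in scores.items() if oid != rid)
    ]
```

例如得分为 `a=(0.9, 0.2)`、`b=(0.3, 0.8)` 的两条回复存在冲突：两个目标分别偏好不同回复，若直接用于 DPO 会给出相反的优化方向；而一条在两个目标上都不差的回复（如 `d=(0.6, 0.6)`）同样位于帕累托前沿上。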

[NLP-91] SR-LLM: Rethinking the Structured Representation in Large Language Model

链接: https://arxiv.org/abs/2502.14352
作者: Jiahuan Zhang,Tianheng Wang,Hanqing Wu,Ziyi Huang,Yulong Wu,Dongbai Chen,Linfeng Song,Yue Zhang,Guozheng Rao,Kaicheng Yu
机构: Westlake University (西湖大学); KMind Technology Co., Ltd. (科脉科技有限公司); Tianjin University (天津大学); Beijing Jiaotong University, Weihai (北京交通大学威海校区); University of Toronto (多伦多大学); Tencent AI Lab (腾讯人工智能实验室)
类目: Computation and Language (cs.CL)
备注:

[NLP-92] Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective ICLR2025

链接: https://arxiv.org/abs/2502.14340
作者: Ruichen Shao,Bei Li,Gangao Liu,Yang Chen,Xiang Zhou,Jingang Wang,Xunliang Cai,Peng Li
机构: 未知
类目: Computation and Language (cs.CL)
备注: Accepted by ICLR 2025

[NLP-93] English Please: Evaluating Machine Translation for Multilingual Bug Reports

【速读】：该论文旨在评估机器翻译（Machine Translation, MT）在软件缺陷报告（bug reports）翻译中的性能，特别关注DeepL、AWS Translate和ChatGPT三种系统的准确性与有效性。研究通过多个自动评估指标，包括BLEU、BERTScore、COMET、METEOR和ROUGE，对这些系统进行综合分析。关键解决方案在于采用多维度的评估方法，发现DeepL在大多数自动评估指标中表现最优，而AWS Translate在METEOR指标中表现较为突出，ChatGPT则在关键指标上表现较弱。研究表明领域适应性对于技术文本翻译的重要性，并为未来优化特定工程领域的机器翻译提供了指导。

链接: https://arxiv.org/abs/2502.14338
作者: Avinash Patil,Aryan Jadon
机构: Juniper Networks Inc. (Juniper Networks公司)
类目: Computation and Language (cs.CL); Software Engineering (cs.SE)
备注: 8 Pages, 4 Figures, 3 Tables

Abstract:Accurate translation of bug reports is critical for efficient collaboration in global software development. In this study, we conduct the first comprehensive evaluation of machine translation (MT) performance on bug reports, analyzing the capabilities of DeepL, AWS Translate, and ChatGPT using data from the Visual Studio Code GitHub repository, specifically focusing on reports labeled with the english-please tag. To thoroughly assess the accuracy and effectiveness of each system, we employ multiple machine translation metrics, including BLEU, BERTScore, COMET, METEOR, and ROUGE. Our findings indicate that DeepL consistently outperforms the other systems across most automatic metrics, demonstrating strong lexical and semantic alignment. AWS Translate performs competitively, particularly in METEOR, while ChatGPT lags in key metrics. This study underscores the importance of domain adaptation for translating technical texts and offers guidance for integrating automated translation into bug-triaging workflows. Moreover, our results establish a foundation for future research to refine machine translation solutions for specialized engineering contexts. The code and dataset for this paper are available at GitHub: this https URL.

[NLP-94] Information Types in Product Reviews

链接: https://arxiv.org/abs/2502.14335
作者: Ori Shapira,Yuval Piniter
机构: OriginAI; Ben-Gurion University of the Negev (本-古里安大学)
类目: Computation and Language (cs.CL)
备注:

[NLP-95] A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics

链接: https://arxiv.org/abs/2502.14333
作者: Ting-Ruen Wei,Haowei Liu,Xuyang Wu,Yi Fang
机构: Santa Clara University (圣克拉拉大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

[NLP-96] Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems

【速读】：该论文旨在探讨大型语言模型（LLMs）在多智能体系统（MAS）中的应用，特别是通过自然语言交互实现协作或竞争以完成单个智能体难以处理的任务。论文的关键在于从通信角度审视LLM驱动的多智能体系统的架构设计和通信目标，并深入分析内部机制如通信策略、范式、对象及内容。通过这些通信元素的互动，论文展示了如何促进集体智能与灵活协作。同时，论文讨论了可扩展性、安全性及多模态集成等主要挑战，并提出了未来研究方向。

链接: https://arxiv.org/abs/2502.14321
作者: Bingyu Yan,Xiaoming Zhang,Litian Zhang,Lian Zhang,Ziyi Zhou,Dezhuang Miao,Chaozhuo Li
机构: Beihang University(北京航空航天大学); Beijing University of Posts and Telecommunications(北京邮电大学)
类目: Multiagent Systems (cs.MA); Computation and Language (cs.CL)
备注:

Abstract:Large Language Models (LLMs) have recently demonstrated remarkable capabilities in reasoning, planning, and decision-making. Building upon these strengths, researchers have begun incorporating LLMs into multi-agent systems (MAS), where agents collaborate or compete through natural language interactions to tackle tasks beyond the scope of single-agent setups. In this survey, we present a communication-centric perspective on LLM-based multi-agent systems, examining key system-level features such as architecture design and communication goals, as well as internal mechanisms like communication strategies, paradigms, objects and content. We illustrate how these communication elements interplay to enable collective intelligence and flexible collaboration. Furthermore, we discuss prominent challenges, including scalability, security, and multimodal integration, and propose directions for future work to advance research in this emerging domain. Ultimately, this survey serves as a catalyst for further innovation, fostering more robust, scalable, and intelligent multi-agent systems across diverse application domains.

[NLP-97] Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models

【速读】：该论文旨在挑战关于大型语言模型（LLMs）在广泛的语言、知识和推理基准测试中展示出迅速提升的一般认知能力的观点。论文的关键在于通过理论和实证考量指出，现有的基准测试范式及其局限性使得基准性能不适合作为衡量LLMs在认知任务中的可泛化能力的指标。此外，论文提出使用对抗性刺激和可解释性技术评估LLMs的能力，结果显示LLMs在许多语言和推理任务中缺乏稳健的竞争力，并且通常未能学习到促进可泛化推断的表征。因此，论文得出结论：不应将基准性能作为衡量LLMs一般认知能力的可靠指标。

链接: https://arxiv.org/abs/2502.14318
作者: James Fodor
机构: The Centre for Brain, Mind and Markets (大脑、心灵与市场中心); The University of Melbourne (墨尔本大学), Australia
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 10 pages

Abstract:Large language models (LLMs) regularly demonstrate new and impressive performance on a wide range of language, knowledge, and reasoning benchmarks. Such rapid progress has led many commentators to argue that LLM general cognitive capabilities have likewise rapidly improved, with the implication that such models are becoming progressively more capable on various real-world tasks. Here I summarise theoretical and empirical considerations to challenge this narrative. I argue that inherent limitations with the benchmarking paradigm, along with specific limitations of existing benchmarks, render benchmark performance highly unsuitable as a metric for generalisable competence over cognitive tasks. I also contend that alternative methods for assessing LLM capabilities, including adversarial stimuli and interpretability techniques, have shown that LLMs do not have robust competence in many language and reasoning tasks, and often fail to learn representations which facilitate generalisable inferences. I conclude that benchmark performance should not be used as a reliable indicator of general LLM cognitive capabilities.

[NLP-98] ParallelComp: Parallel Long-Context Compressor for Length Extrapolation

【速读】：该论文旨在解决大型语言模型（LLMs）在处理长上下文时面临的挑战，特别是在长度外推方面的难题。现有方法要么依赖于昂贵的微调，要么受制于注意力汇聚（attention sink）现象导致性能严重下降。论文的关键解决方案是提出了一种名为ParallelComp的新型无训练方法，通过引入注意力校准策略和块驱逐策略，使LLMs能够从4K扩展到128K的上下文长度，同时保持高吞吐量和困惑度，并与Flash Attention无缝集成。此外，通过并行KV缓存驱逐技术进一步提升了效率，在预填充阶段实现了23.50倍的加速。

链接: https://arxiv.org/abs/2502.14317
作者: Jing Xiong,Jianghan Shen,Chuanyang Zheng,Zhongwei Wan,Chenyang Zhao,Chiwun Yang,Fanghua Ye,Hongxia Yang,Lingpeng Kong,Ngai Wong
机构: 未知
类目: Computation and Language (cs.CL)
备注: We will release the code soon

Abstract:Efficiently handling long contexts is crucial for large language models (LLMs). While rotary position embeddings (RoPEs) enhance length generalization, effective length extrapolation remains challenging and often requires costly fine-tuning. In contrast, recent training-free approaches suffer from the attention sink phenomenon, leading to severe performance degradation. In this paper, we introduce ParallelComp, a novel training-free method for long-context extrapolation that extends LLMs’ context length from 4K to 128K while maintaining high throughput and preserving perplexity, and integrates seamlessly with Flash Attention. Our analysis offers new insights into attention biases in parallel attention mechanisms and provides practical solutions to tackle these challenges. To mitigate the attention sink issue, we propose an attention calibration strategy that reduces biases, ensuring more stable long-range attention. Additionally, we introduce a chunk eviction strategy to efficiently manage ultra-long contexts on a single A100 80GB GPU. To further enhance efficiency, we propose a parallel KV cache eviction technique, which improves chunk throughput by 1.76x, thereby achieving a 23.50x acceleration in the prefilling stage with negligible performance loss due to attention calibration. Furthermore, ParallelComp achieves 91.17% of GPT-4’s performance on long-context tasks using an 8B model trained on 8K-length context, outperforming powerful closed-source models such as Claude-2 and Kimi-Chat.

[NLP-99] Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension

链接: https://arxiv.org/abs/2502.14315
作者: Amir Hossein Yari,Fajri Koto
机构: Sharif University of Technology; MBZUAI (穆罕默德·本·扎耶德人工智能大学)
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

[NLP-100] The Impact and Feasibility of Self-Confidence Shaping for AI-Assisted Decision-Making

【速读】：该论文旨在解决在高风险领域（如金融和医疗）中人类如何适当地依赖人工智能（AI）进行决策的问题。论文的关键解决方案是通过实施一种自信心塑造干预措施来校准自信心至目标水平，从而改善人与AI团队的性能。实验结果表明，这种干预可以减少过度和不足的AI依赖，进而将人与AI团队的性能提升近50%。

链接: https://arxiv.org/abs/2502.14311
作者: Takehiro Takayanagi,Ryuji Hashimoto,Chung-Chi Chen,Kiyoshi Izumi
机构: The University of Tokyo; National Institute of Advanced Industrial Science and Technology
类目: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL); Computers and Society (cs.CY)
备注:

Abstract:In AI-assisted decision-making, it is crucial but challenging for humans to appropriately rely on AI, especially in high-stakes domains such as finance and healthcare. This paper addresses this problem from a human-centered perspective by presenting an intervention for self-confidence shaping, designed to calibrate self-confidence at a targeted level. We first demonstrate the impact of self-confidence shaping by quantifying the upper-bound improvement in human-AI team performance. Our behavioral experiments with 121 participants show that self-confidence shaping can improve human-AI team performance by nearly 50% by mitigating both over- and under-reliance on AI. We then introduce a self-confidence prediction task to identify when our intervention is needed. Our results show that simple machine-learning models achieve 67% accuracy in predicting self-confidence. We further illustrate the feasibility of such interventions. The observed relationship between sentiment and self-confidence suggests that modifying sentiment could be a viable strategy for shaping self-confidence. Finally, we outline future research directions to support the deployment of self-confidence shaping in a real-world scenario for effective human-AI collaboration.

[NLP-101] MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models

链接: https://arxiv.org/abs/2502.14302
作者: Shrey Pandit,Jiawei Xu,Junyuan Hong,Zhangyang Wang,Tianlong Chen,Kaidi Xu,Ying Ding
机构: University of Texas at Austin(德克萨斯大学奥斯汀分校); UNC Chapel Hill(北卡罗来纳大学教堂山分校); Drexel University(德雷塞尔大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: Code and dataset are available at this https URL

点击查看摘要

[NLP-102] SEA-HELM: Southeast Asian Holistic Evaluation of Language Models

链接: https://arxiv.org/abs/2502.14301
作者: Yosephine Susanto,Adithya Venkatadri Hulagadri,Jann Railey Montalan,Jian Gang Ngui,Xian Bin Yong,Weiqi Leong,Hamsawardhini Rengarajan,Peerat Limkonchotiwat,Yifan Mai,William Chandra Tjhi
机构: AI Singapore(新加坡人工智能计划); National University of Singapore(新加坡国立大学); Center for Research on Foundation Models (CRFM), Stanford University(基础模型研究中心(CRFM), 斯坦福大学)
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

[NLP-103] Drift: Decoding-time Personalized Alignments with Implicit User Preferences

【速读】：该论文旨在解决大型语言模型（LLMs）个性化对齐的问题，即如何在解码时根据个体用户的隐含偏好个性化调整LLMs。解决方案的关键在于引入了一个名为Drift的新框架，该框架通过少量示例（约50-100个）进行高效的偏好建模，从而在无需训练的情况下个性化LLMs，显著优于传统的基于人类反馈的强化学习（Reinforcement Learning from Human Feedback, RLHF）。

链接: https://arxiv.org/abs/2502.14289
作者: Minbeom Kim,Kang-il Lee,Seongho Joo,Hwaran Lee,Minbeom Kim
机构: Seoul National University(首尔国立大学); Sogang University(西江大学); NAVER AI Lab(NAVER AI 实验室)
类目: Computation and Language (cs.CL)
备注: 19 pages, 6 figures

Abstract:Personalized alignments for individual users have been a long-standing goal in large language models (LLMs). We introduce Drift, a novel framework that personalizes LLMs at decoding time with implicit user preferences. Traditional Reinforcement Learning from Human Feedback (RLHF) requires thousands of annotated examples and expensive gradient updates. In contrast, Drift personalizes LLMs in a training-free manner, using only a few dozen examples to steer a frozen model through efficient preference modeling. Our approach models user preferences as a composition of predefined, interpretable attributes and aligns them at decoding time to enable personalized generation. Experiments on both a synthetic persona dataset (Perspective) and a real human-annotated dataset (PRISM) demonstrate that Drift significantly outperforms RLHF baselines while using only 50-100 examples. Our results and analysis show that Drift is both computationally efficient and interpretable.
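摘要把用户偏好建模为若干预定义、可解释属性的组合，并在解码时对齐。下面是解码时引导的一种常见做法的最小示意，并非 Drift 的原始公式：假设每个属性对每个候选下一个令牌给出一个分数（`attribute_scores`，假设的输入），用户权重（`user_weights`，设想由几十个示例估计得到）对这些分数加权后叠加到冻结模型的 logits 上，再做 softmax。整个过程不对模型参数做梯度更新。

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def steer_next_token(
    base_logits: List[float],
    attribute_scores: Dict[str, List[float]],
    user_weights: Dict[str, float],
) -> List[float]:
    """Shift a frozen model's next-token logits by a weighted sum of attribute scores.

    attribute_scores[name][tok]: hypothetical per-token score for attribute `name`.
    user_weights[name]: hypothetical user-specific weight for that attribute.
    """
    steered = list(base_logits)
    for name, weight in user_weights.items():
        scores = attribute_scores[name]
        assert len(scores) == len(steered), "one score per vocabulary entry"
        for tok, score in enumerate(scores):
            steered[tok] += weight * score
    return softmax(steered)
```

权重为 0 时分布退回冻结模型本身；这种线性叠加也让每个属性对生成的影响可以单独查看，与摘要强调的可解释性方向一致。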

[NLP-104] Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach

【速读】：该论文旨在解决通过有限数量的样本图像窃取提示模板的安全漏洞问题。论文的关键解决方案是提出了一种名为EvoStealer的新颖提示窃取方法，该方法利用差分进化算法，无需对模型进行微调。EvoStealer首先基于预定义模式，使用多模态大型语言模型（Multimodal Large Language Models, MLLMs）初始化种群，然后通过MLLMs迭代生成增强后代，并在进化过程中识别后代之间的共同特征以推导出泛化的模板。在开源模型（如InternVL2-26B）和闭源模型（如GPT-4o和GPT-4o-mini）上的评估显示，该方法窃取的模板显著优于基线方法，平均提升超过10%。

链接: https://arxiv.org/abs/2502.14285
作者: Yurong Wu,Fangwen Mu,Qiuhong Zhang,Jinjing Zhao,Xinrun Xu,Lingrui Mei,Yang Wu,Lin Shi,Junjie Wang,Zhiming Ding,Yiwei Wang
机构: Institute of Software, Chinese Academy of Sciences(中国科学院软件研究所); University of Chinese Academy of Sciences(中国科学院大学); University of California at Merced(加州大学默塞德分校); The University of Sydney(悉尼大学)
类目: Computation and Language (cs.CL)
备注: 14 pages, 8 figures, 4 tables

Abstract:Prompt trading has emerged as a significant intellectual property concern in recent years, where vendors entice users by showcasing sample images before selling prompt templates that can generate similar images. This work investigates a critical security vulnerability: attackers can steal prompt templates using only a limited number of sample images. To investigate this threat, we introduce Prism, a prompt-stealing benchmark consisting of 50 templates and 450 images, organized into Easy and Hard difficulty levels. To identify the vulnerability of VLMs to prompt stealing, we propose EvoStealer, a novel template stealing method that operates without model fine-tuning by leveraging differential evolution algorithms. The system first initializes population sets using multimodal large language models (MLLMs) based on predefined patterns, then iteratively generates enhanced offspring through MLLMs. During evolution, EvoStealer identifies common features across offspring to derive generalized templates. Our comprehensive evaluation conducted across open-source (INTERNVL2-26B) and closed-source models (GPT-4o and GPT-4o-mini) demonstrates that EvoStealer’s stolen templates can reproduce images highly similar to originals and effectively generalize to other subjects, significantly outperforming baseline methods with an average improvement of over 10%. Moreover, our cost analysis reveals that EvoStealer achieves template stealing with negligible computational expenses. Our code and dataset are available at this https URL.
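EvoStealer 建立在差分进化之上，但其“个体”是提示模板，变异与交叉由 MLLM 完成。为说明差分进化本身的循环（变异、交叉、选择），下面给出经典数值版 DE/rand/1/bin 的最小实现。它作用于实数向量而非提示文本，参数 `f`、`cr` 取常见默认值，仅作示意，并非 EvoStealer 的实现。

```python
import random
from typing import Callable, List, Tuple


def differential_evolution(
    fitness: Callable[[List[float]], float],
    dim: int,
    pop_size: int = 20,
    generations: int = 200,
    f: float = 0.8,
    cr: float = 0.9,
    bounds: Tuple[float, float] = (-5.0, 5.0),
    seed: int = 0,
) -> List[float]:
    """Minimize `fitness` with the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: base vector plus a scaled difference of two others, all distinct from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][k] + f * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Binomial crossover; j_rand guarantees at least one gene comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = [
                mutant[k] if (rng.random() < cr or k == j_rand) else pop[i][k]
                for k in range(dim)
            ]
            # Greedy selection: the trial replaces its parent only if it is no worse.
            trial_score = fitness(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best]
```

在 EvoStealer 的设定中，可以把“差分变异 + 交叉”理解为由 MLLM 对若干模板做组合改写，把 `fitness` 理解为生成图像与样本图像的相似度；这只是类比，具体做法以论文为准。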

[NLP-105] EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts

【速读】：该论文旨在解决大型语言模型（LLMs）在处理长上下文时效率低下的问题。关键解决方案在于引入EpMAN方法：在情景记忆模块中处理长上下文，并通过情景注意力整体性地关注语义相关的上下文片段；随后在训练和生成过程中，用情景注意力的输出对解码器在已存储的上下文键值缓存（KV cache）上的自注意力重新加权。实验结果表明，使用EpMAN训练的LLM解码器在从16k到256k tokens的多个具有挑战性的单跳长上下文回忆和问答基准测试中，表现比基线解码器和常用检索增强生成框架更强且更稳健。

链接: https://arxiv.org/abs/2502.14280
作者: Subhajit Chaudhury,Payel Das,Sarathkrishna Swaminathan,Georgios Kollias,Elliot Nelson,Khushbu Pahwa,Tejaswini Pedapati,Igor Melnyk,Matthew Riemer
机构: IBM Research
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

Abstract:Recent advances in Large Language Models (LLMs) have yielded impressive successes on many language tasks. However, efficient processing of long contexts using LLMs remains a significant challenge. We introduce EpMAN, a method for processing long contexts in an episodic memory module while holistically attending to semantically relevant context chunks. The output of episodic attention is then used to reweight the decoder’s self-attention to the stored KV cache of the context during training and generation. When an LLM decoder is trained using EpMAN, its performance on multiple challenging single-hop long-context recall and question-answering benchmarks is found to be stronger and more robust across the range from 16k to 256k tokens than both baseline decoders trained with self-attention and popular retrieval-augmented generation frameworks.
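摘要描述的流程是：先对上下文片段（chunk）做情景注意力，再用其输出对解码器在 KV 缓存上的自注意力重新加权。下面是这一思路的假设性示意：令牌级注意力乘以其所属片段的片段级注意力权重，再重新归一化。片段打分方式（`chunk_scores`）与乘法组合公式都是本文的假设，未必与 EpMAN 一致。

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def reweighted_attention(
    token_logits: List[float],
    token_chunk: List[int],
    chunk_scores: List[float],
) -> List[float]:
    """Scale token-level attention by chunk-level (episodic) attention, then renormalize.

    token_logits: raw attention logits of one query over the cached context tokens.
    token_chunk:  index of the chunk that each cached token belongs to.
    chunk_scores: hypothetical relevance logits of each chunk for the query.
    """
    token_attn = softmax(token_logits)
    chunk_attn = softmax(chunk_scores)
    weighted = [a * chunk_attn[c] for a, c in zip(token_attn, token_chunk)]
    total = sum(weighted)
    return [w / total for w in weighted]
```

这种乘法组合等价于把片段的对数权重加到对应令牌的注意力 logits 上后再做 softmax，因此与语义无关的片段会整体被压低，而不必逐个令牌判断。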

[NLP-106] STeCa: Step-level Trajectory Calibration for LLM Agent Learning

链接: https://arxiv.org/abs/2502.14276
作者: Hanlin Wang,Jian Wang,Chak Tou Leong,Wenjie Li
机构: 未知
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

[NLP-107] Fact or Guesswork? Evaluating Large Language Models Medical Knowledge with Structured One-Hop Judgment

【速读】：该论文旨在解决大型语言模型（LLMs）在直接回忆和应用基础医学事实方面的能力不足问题。现有医学问答基准主要评估复杂的推理或多跳推断，难以区分LLMs的推理能力和固有的医学知识。论文的关键解决方案是引入了一个名为“医学知识判断”(Medical Knowledge Judgment, MKJ)的数据集，该数据集专门用于衡量LLMs的一阶事实性医学知识。MKJ基于统一医学语言系统（Unified Medical Language System, UMLS）构建，并将知识评估构架为一个二元判断任务。为了进一步提高LLMs在医学决策中的事实准确性并减少不确定性，研究还探索了检索增强生成（retrieval-augmented generation），展示了其在改善事实准确性方面的有效性。

链接: https://arxiv.org/abs/2502.14275
作者: Jiaxi Li,Yiwei Wang,Kai Zhang,Yujun Cai,Bryan Hooi,Nanyun Peng,Kai-Wei Chang,Jin Lu
机构: University of Georgia(乔治亚大学); University of California, Merced(加州大学默塞德分校); Lehigh University(里海大学); The University of Queensland(昆士兰大学); National University of Singapore(新加坡国立大学); University of California, Los Angeles(加州大学洛杉矶分校)
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: 15 pages, 11 figures

Abstract:Large language models (LLMs) have been widely adopted in various downstream task domains. However, their ability to directly recall and apply factual medical knowledge remains under-explored. Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs’ inherent medical knowledge from their reasoning capabilities. Given the high-stakes nature of medical applications, where incorrect information can have critical consequences, it is essential to evaluate how well LLMs encode, retain, and recall fundamental medical facts. To bridge this gap, we introduce the Medical Knowledge Judgment (MKJ), a dataset specifically designed to measure LLMs’ one-hop factual medical knowledge. MKJ is constructed from the Unified Medical Language System (UMLS), a large-scale repository of standardized biomedical vocabularies and knowledge graphs. We frame knowledge assessment as a binary judgment task, requiring LLMs to verify the correctness of medical statements extracted from reliable and structured knowledge sources. Our experiments reveal that LLMs struggle with factual medical knowledge retention, exhibiting significant performance variance across different semantic categories, particularly for rare medical conditions. Furthermore, LLMs show poor calibration, often being overconfident in incorrect answers. To mitigate these issues, we explore retrieval-augmented generation, demonstrating its effectiveness in improving factual accuracy and reducing uncertainty in medical decision-making.
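摘要指出 LLM 校准不佳，常对错误答案过度自信。衡量这一点的常用指标之一是期望校准误差（ECE）。下面是等宽分箱 ECE 的最小示意，假设输入为模型在二元判断任务上的置信度以及是否答对；摘要并未说明论文具体采用哪种校准指标，此处仅作说明。

```python
from typing import List


def expected_calibration_error(
    confidences: List[float], correct: List[bool], n_bins: int = 10
) -> float:
    """Sample-weighted average |accuracy - mean confidence| over equal-width bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are [lo, hi); a confidence of exactly 1.0 goes into the last bin.
        idx = [
            i
            for i, c in enumerate(confidences)
            if lo <= c < hi or (b == n_bins - 1 and c == 1.0)
        ]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece
```

例如模型对 4 个判断都给出 0.95 的置信度、但只答对一半时，ECE 为 0.45，对应摘要所说的“对错误答案过度自信”。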

[NLP-108] Capturing Nuanced Preferences: Preference-Aligned Distillation for Small Language Models

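[Quick Read]: This paper addresses the problem of aligning small language models (SLMs) with human values. Existing methods extract preference knowledge by comparing pairwise responses from a teacher large language model (LLM). However, they ignore how much the responses differ, which makes it hard for student SLMs to capture nuanced preferences. The key contribution is the Preference-Aligned Distillation (PAD) framework. PAD models the teacher's preference knowledge as a probability distribution over all potential preferences, which provides finer-grained supervision. It consists of three steps: (1) sampling diverse responses at a high temperature; (2) computing rewards for both teacher and student to construct their intrinsic preferences; and (3) training the student's intrinsic preference distribution to align with the teacher's. Experiments show that PAD significantly outperforms existing methods on four mainstream alignment benchmarks, and on MT-Bench the student model even surpasses its teacher.

Link: https://arxiv.org/abs/2502.14272
Authors: Yanggan Gu,Junzhuo Li,Sirui Huang,Xin Zou,Zhenghua Li,Xuming Hu
Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Soochow University; University of Technology Sydney; The Hong Kong University of Science and Technology
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Under review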

Abstract:Aligning small language models (SLMs) with human values typically involves distilling preference knowledge from large language models (LLMs). However, existing distillation methods model preference knowledge in teacher LLMs by comparing pairwise responses, overlooking the extent of difference between responses. This limitation hinders student SLMs from capturing the nuanced preferences for multiple responses. In this paper, we propose a Preference-Aligned Distillation (PAD) framework, which models the teacher’s preference knowledge as a probability distribution over all potential preferences, thereby providing more nuanced supervisory signals. Our insight in developing PAD is rooted in the demonstration that language models can serve as reward functions, reflecting their intrinsic preferences. Based on this, PAD comprises three key steps: (1) sampling diverse responses at a high temperature; (2) computing rewards for both teacher and student to construct their intrinsic preference; and (3) training the student’s intrinsic preference distribution to align with the teacher’s. Experiments on four mainstream alignment benchmarks demonstrate that PAD consistently and significantly outperforms existing approaches, achieving over 20% improvement on AlpacaEval 2 and Arena-Hard, indicating superior alignment with human preferences. Notably, on MT-Bench, using the Gemma model family, the student trained by PAD surpasses its teacher, further validating the effectiveness of PAD.
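The three PAD steps can be illustrated with a toy sketch. The abstract does not confirm any of the following choices, which are assumptions made for illustration:

- A response's reward is its mean per-token log-probability under a model.
- The preference distribution is a softmax over the rewards of the sampled responses.
- Alignment is measured with KL(teacher || student).

The log-probabilities below are made-up numbers standing in for real model outputs.

```python
import math
from typing import Sequence


def sequence_reward(token_logprobs: Sequence[float]) -> float:
    """Assumed reward: mean per-token log-probability of one response."""
    return sum(token_logprobs) / len(token_logprobs)


def preference_distribution(rewards: Sequence[float], beta: float = 1.0) -> list[float]:
    """Softmax over rewards: a probability distribution over the sampled responses."""
    scaled = [beta * r for r in rewards]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); training would minimize this with respect to the student's q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical per-token log-probs for three responses sampled at high temperature.
teacher_logprobs = [[-0.2, -0.4], [-1.0, -1.2], [-2.0, -2.2]]
student_logprobs = [[-0.9, -1.1], [-0.9, -1.1], [-0.9, -1.1]]

p_teacher = preference_distribution([sequence_reward(r) for r in teacher_logprobs])
q_student = preference_distribution([sequence_reward(r) for r in student_logprobs])
print([round(x, 3) for x in q_student])  # uniform: the student has no preference yet
print(round(kl_divergence(p_teacher, q_student), 4))
```

Using a full distribution rather than a single "A beats B" label means the loss also carries information about how strongly the teacher prefers one response over another.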

[NLP-109] PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant

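[Quick Read]: This paper addresses the challenges researchers face in efficiently browsing and understanding scientific literature. The proposed solution is PaperHelper, a paper-reading assistant built on the Retrieval-Augmented Generation (RAG) framework. PaperHelper applies advanced techniques such as RAFT and RAG Fusion to improve the performance, accuracy, and reliability of LLM-based literature review. It also reduces the hallucinations that are common in large language models (LLMs). Experiments show that PaperHelper, built on a fine-tuned GPT-4 API, achieves an F1 score of 60.04 with a latency of only 5.8 seconds, a 7% F1 improvement over a basic RAG model.

Link: https://arxiv.org/abs/2502.14271
Authors: Congrui Yin,Evan Wei,Zhongxing Zhang,Zaifu Zhan
Affiliations: Department of Computer Science and Engineering, University of Minnesota, Twin Cities; Department of Electrical and Computer Engineering, University of Minnesota, Twin Cities
Subjects: Computation and Language (cs.CL)
Comments: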

Abstract:In the paper, we introduce a paper reading assistant, PaperHelper, a potent tool designed to enhance the capabilities of researchers in efficiently browsing and understanding scientific literature. Utilizing the Retrieval-Augmented Generation (RAG) framework, PaperHelper effectively minimizes hallucinations commonly encountered in large language models (LLMs), optimizing the extraction of accurate, high-quality knowledge. The implementation of advanced technologies such as RAFT and RAG Fusion significantly boosts the performance, accuracy, and reliability of the LLMs-based literature review process. Additionally, PaperHelper features a user-friendly interface that facilitates the batch downloading of documents and uses the Mermaid format to illustrate structural relationships between documents. Experimental results demonstrate that PaperHelper, based on a fine-tuned GPT-4 API, achieves an F1 Score of 60.04, with a latency of only 5.8 seconds, outperforming the basic RAG model by 7% in F1 Score.
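RAG Fusion typically issues several rewritten versions of a query and merges the ranked retrieval results with Reciprocal Rank Fusion (RRF). RRF scores each document as the sum of 1/(k + rank) over all the lists it appears in. The sketch below shows only that fusion step. The constant k = 60 is the value conventionally used with RRF, and the document IDs are hypothetical. The abstract does not say how PaperHelper implements fusion.

```python
from collections import defaultdict
from typing import Sequence


def reciprocal_rank_fusion(
    ranked_lists: Sequence[Sequence[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs (best first) into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Higher fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical retrieval results for three rewrites of one user question.
results = [
    ["paper_a", "paper_b", "paper_c"],
    ["paper_b", "paper_a", "paper_d"],
    ["paper_b", "paper_c", "paper_a"],
]
for doc_id, score in reciprocal_rank_fusion(results):
    print(doc_id, round(score, 4))
```

RRF uses only ranks, not raw similarity scores. That makes it easy to combine results from different retrievers or query rewrites whose scores are not on a common scale.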

[NLP-110] MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels

Link: https://arxiv.org/abs/2502.14268
Authors: Xiaoou Liu,Zhen Lin,Longchao Da,Chacha Chen,Shubhendu Trivedi,Hua Wei
Affiliations: Arizona State University; University of Chicago
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-111] Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information

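[Quick Read]: This paper investigates how language models handle facts that change over time, rather than only how they retrieve static facts. Using circuit analysis, the authors discover and verify "Temporal Heads": specific attention heads that are primarily responsible for processing time-related knowledge. Disabling these heads impairs the model's ability to recall time-specific knowledge. It does not affect the model's general performance, its handling of time-invariant knowledge, or its question answering. Temporal Heads are activated not only by numeric conditions but also by textual expressions of time, which suggests that they encode a temporal dimension beyond simple numeric representation. Temporal knowledge can also be edited by adjusting the values of these heads, which extends the potential applications of the work.

Link: https://arxiv.org/abs/2502.14258
Authors: Yein Park,Chanwoong Yoon,Jungwoo Park,Minbyul Jeong,Jaewoo Kang
Affiliations: Korea University; Upstage AI; AIGEN Sciences
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: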

Abstract:While the ability of language models to elicit facts has been widely investigated, how they handle temporally changing facts remains underexplored. Through circuit analysis, we discover Temporal Heads, specific attention heads primarily responsible for processing temporal knowledge. We confirm that these heads are present across multiple models, though their specific locations may vary, and their responses differ depending on the type of knowledge and its corresponding years. Disabling these heads degrades the model’s ability to recall time-specific knowledge while maintaining its general capabilities, without compromising time-invariant and question-answering performance. Moreover, the heads are activated not only by numeric conditions (“In 2004”) but also by textual aliases (“In the year …”), indicating that they encode a temporal dimension beyond simple numerical representation. Furthermore, we expand the potential of our findings by demonstrating how temporal knowledge can be edited by adjusting the values of these heads.
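One common way to disable an attention head is zero-ablation: the head's output is zeroed before the heads are concatenated and projected. The sketch below illustrates that idea on plain Python lists. The function name, the toy per-head vectors, and the choice of zero-ablation (rather than, for example, mean-ablation) are assumptions, and the paper's own setup may differ. A `scale` other than 0 loosely mimics the value adjustment that the abstract describes for editing.

```python
from typing import AbstractSet, Sequence


def combine_heads(
    head_outputs: Sequence[Sequence[float]],
    disabled: AbstractSet[int] = frozenset(),
    scale: float = 0.0,
) -> list[float]:
    """Concatenate per-head outputs, multiplying the disabled heads by `scale`.

    scale=0.0 is zero-ablation; other values rescale (edit) those heads instead.
    In a real transformer the concatenation would then pass through the output
    projection W_O; that step is omitted here.
    """
    combined: list[float] = []
    for h, out in enumerate(head_outputs):
        factor = scale if h in disabled else 1.0
        combined.extend(factor * x for x in out)
    return combined


# Toy layer with 3 heads of dimension 2; suppose head 1 is a "temporal head".
heads = [[0.5, -0.1], [1.2, 0.8], [0.0, 0.3]]
print(combine_heads(heads))                # all heads active
print(combine_heads(heads, disabled={1}))  # head 1 zero-ablated
```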

[NLP-112] Effects of Prompt Length on Domain-specific Tasks for Large Language Models

Link: https://arxiv.org/abs/2502.14255
Authors: Qibang Liu,Wenzhe Wang,Jeffrey Willard
Affiliations: Georgia Institute of Technology; Nanjing University of Finance and Economics; Boston University
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
Comments:

[NLP-113] Mitigating Lost-in-Retrieval Problems in Retrieval Augmented Multi-Hop Question Answering

Link: https://arxiv.org/abs/2502.14245
Authors: Rongzhi Zhu,Xiangyu Liu,Zequn Sun,Yiwei Wang,Wei Hu
Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University, China; University of California, Merced, USA; National Institute of Healthcare Data Science, Nanjing University, China
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-114] Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization

Link: https://arxiv.org/abs/2502.14211
Authors: Yupeng Chang,Yi Chang,Yuan Wu
Affiliations: School of Artificial Intelligence, Jilin University; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, Jilin University; International Center of Future Science, Jilin University
Subjects: Computation and Language (cs.CL)
Comments: 17 pages

[NLP-115] On-the-fly Preference Alignment via Principle-Guided Decoding ICLR2025

Link: https://arxiv.org/abs/2502.14204
Authors: Mingye Zhu,Yi Liu,Lei Zhang,Junbo Guo,Zhendong Mao
Affiliations: University of Science and Technology of China; State Key Laboratory of Communication Content Cognition, People’s Daily Online
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: Accepted to ICLR 2025

[NLP-116] Do LLMs Consider Security? An Empirical Study on Responses to Programming Questions

Link: https://arxiv.org/abs/2502.14202
Authors: Amirali Sajadi,Binh Le,Anh Nguyen,Kostadin Damevski,Preetha Chatterjee
Affiliations: Unknown
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments:

[NLP-117] NLP-AKG: Few-Shot Construction of NLP Academic Knowledge Graph Based on LLM

Link: https://arxiv.org/abs/2502.14192
Authors: Jiayin Lan,Jiaqi Li,Baoxin Wang,Ming Liu,Dayong Wu,Shijin Wang,Bing Qin
Affiliations: Harbin Institute of Technology; Joint Laboratory of HIT and iFLYTEK; University of Science and Technology of China
Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL)
Comments:

[NLP-118] QUAD-LLM-MLTC: Large Language Models Ensemble Learning for Healthcare Text Multi-Label Classification

Link: https://arxiv.org/abs/2502.14189
Authors: Hajar Sakai,Sarah S. Lam
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-119] Federated Fine-Tuning of Large Language Models: Kahneman-Tversky vs. Direct Preference Optimization

Link: https://arxiv.org/abs/2502.14187
Authors: Fernando Spadea,Oshani Seneviratne
Affiliations: Rensselaer Polytechnic Institute
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments:

[NLP-120] On the logical skills of large language models: evaluations using arbitrarily complex first-order logic problems

Link: https://arxiv.org/abs/2502.14180
Authors: Shokhrukh Ibragimov,Arnulf Jentzen,Benno Kuckuck
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments: 67 pages, 24 figures

[NLP-121] Enhancing Conversational Agents with Theory of Mind: Aligning Beliefs, Desires, and Intentions for Human-Like Interaction

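[Quick Read]: This paper addresses the limited ability of large language models (LLMs) to emulate the way humans communicate using Theory of Mind (ToM). The key idea is to explicitly manipulate ToM-related components, such as beliefs, desires, and intentions, to improve the consistency and quality of responses. Experiments on two LLaMA 3 variants show that incorporating ToM-informed alignment significantly improves response quality, with win rates of 67% and 63%, respectively. These results highlight the potential of ToM-driven strategies for improving the alignment of LLM-based conversational agents.

Link: https://arxiv.org/abs/2502.14171
Authors: Mohammadmahdi Jafari,Devin Yuncheng Hua,Hao Xue,Flora Salim
Affiliations: UNSW Sydney
Subjects: Computation and Language (cs.CL)
Comments: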

Abstract:Natural language interaction with agentic Artificial Intelligence (AI), driven by Large Language Models (LLMs), is expected to remain a dominant paradigm in the near future. While humans instinctively align their communication with mental states, an ability known as Theory of Mind (ToM), current LLM-powered systems exhibit significant limitations in this regard. This study examines the extent to which open-source language models (LLaMA) can capture and preserve ToM-related information and how effectively it contributes to consistent ToM reasoning in generated responses. We further investigate whether explicit manipulation of ToM-related components, such as beliefs, desires, and intentions, can enhance response alignment. Experiments on two LLaMA 3 variants demonstrate that incorporating ToM-informed alignment improves response quality, achieving win rates of 67 and 63 percent for the 3B and 8B models, respectively. These findings highlight the potential of ToM-driven strategies to improve alignment in LLM-based conversational agents.

[NLP-122] Giving AI Personalities Leads to More Human-Like Reasoning

Link: https://arxiv.org/abs/2502.14155
Authors: Animesh Nighojkar,Bekhzodbek Moydinboyev,My Duong,John Licato
Affiliations: University of South Florida
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Comments:

[NLP-123] LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems INTERSPEECH2025

Link: https://arxiv.org/abs/2502.14145
Authors: Hao Zhang,Weiwei Li,Rilin Chen,Vinay Kothapally,Meng Yu,Dong Yu
Affiliations: Tencent AI Lab
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Comments: In submission to INTERSPEECH 2025

[NLP-124] UM_FHS at TREC 2024 PLABA: Exploration of Fine-tuning and AI agent approach for plain language adaptations of biomedical text

Link: https://arxiv.org/abs/2502.14144
Authors: Primoz Kocbek,Leon Kopitar,Zhihong Zhang,Emirhan Aydin,Maxim Topaz,Gregor Stiglic
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments: 10 pages, 2 figures, to be published in the 33rd Text REtrieval Conference (TREC 2024) proceedings

[NLP-125] Self-Regularization with Latent Space Explanations for Controllable LLM-based Classification

Link: https://arxiv.org/abs/2502.14133
Authors: Xuansheng Wu,Wenhao Yu,Xiaoming Zhai,Ninghao Liu
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments: Pre-print, 15 pages, 4 figures

[NLP-126] Can Community Notes Replace Professional Fact-Checkers?

Link: https://arxiv.org/abs/2502.14132
Authors: Nadav Borenstein,Greta Warren,Desmond Elliott,Isabelle Augenstein
Affiliations: University of Copenhagen
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments:

[NLP-127] Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above

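[Quick Read]: This paper addresses the limitations of multiple-choice question answering (MCQA) for evaluating large language models (LLMs). It argues that MCQA struggles to test generation and subjectivity, does not match real LLM use cases, and cannot fully test knowledge. The paper advocates more generative evaluation formats, in which LLMs construct and explain their answers. These formats better capture user needs and test knowledge more thoroughly while remaining easy to score. The paper also proposes fixes for leakage, unanswerability, shortcuts, and saturation in MCQA datasets:

- rubrics to guide question writing;
- scoring methods that discourage guessing;
- Item Response Theory to build harder questions.

Through these improvements, the paper aims to make MCQA a more valid and accurate tool for evaluating LLMs.

Link: https://arxiv.org/abs/2502.14127
Authors: Nishant Balepur,Rachel Rudinger,Jordan Lee Boyd-Graber
Affiliations: University of Maryland
Subjects: Computation and Language (cs.CL)
Comments: In-progress preprint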

Abstract:Multiple choice question answering (MCQA) is popular for LLM evaluation due to its simplicity and human-like testing, but we argue for its reform. We first reveal flaws in MCQA’s format, as it struggles to: 1) test generation/subjectivity; 2) match LLM use cases; and 3) fully test knowledge. We instead advocate for generative formats based on human testing, where LLMs construct and explain answers, better capturing user needs and knowledge while remaining easy to score. We then show that even when MCQA is a useful format, its datasets suffer from leakage, unanswerability, shortcuts, and saturation. For each issue, we give fixes from education, like rubrics to guide MCQ writing, scoring methods to bridle guessing, and Item Response Theory to build harder MCQs. Lastly, we discuss LLM errors in MCQA (robustness, biases, and unfaithful explanations), showing how our prior solutions better measure or address these issues. While we do not need to desert MCQA, we encourage more efforts in refining the task based on educational testing, advancing evaluations.
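Two of the educational-testing tools the paper borrows have standard closed forms:

- **Two-parameter logistic (2PL) IRT model:** the probability that a test-taker with ability θ answers an item correctly depends on the item's discrimination a and difficulty b.
- **Classic formula scoring:** W wrong answers on k-option questions cost W/(k-1) points, so blind guessing has zero expected gain.

The sketch below uses these textbook forms. The abstract does not give the paper's exact scoring choices, and the item parameters are made-up values.

```python
import math


def irt_2pl(theta: float, a: float, b: float) -> float:
    """P(correct) under the 2PL IRT model: 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def formula_score(num_right: int, num_wrong: int, num_options: int) -> float:
    """Correction for guessing: R - W / (k - 1); omitted questions count as 0."""
    return num_right - num_wrong / (num_options - 1)


# A model of ability 0.0 facing an easy item (b = -1) and a hard item (b = 2).
print(round(irt_2pl(0.0, a=1.5, b=-1.0), 3))
print(round(irt_2pl(0.0, a=1.5, b=2.0), 3))

# Pure guessing on 100 four-option MCQs: about 25 right and 75 wrong.
print(formula_score(25, 75, 4))  # 0.0
```

Items with a high b that strong models still miss are the "harder MCQs" that IRT helps identify.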

[NLP-128] Benchmarking LLMs for Political Science: A United Nations Perspective

Link: https://arxiv.org/abs/2502.14122
Authors: Yueqing Liang,Liangwei Yang,Chen Wang,Congying Xia,Rui Meng,Xiongxiao Xu,Haoran Wang,Ali Payani,Kai Shu
Affiliations: Illinois Institute of Technology; Salesforce; University of Illinois at Chicago; Meta; Cisco; Emory University
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Emerging Technologies (cs.ET)
Comments:

[NLP-129] Meaning Beyond Truth Conditions: Evaluating Discourse Level Understanding via Anaphora Accessibility

Link: https://arxiv.org/abs/2502.14119
Authors: Xiaomeng Zhu,Zhenghao Zhou,Simon Charlow,Robert Frank
Affiliations: Yale University
Subjects: Computation and Language (cs.CL)
Comments:

[NLP-130] Towards Context-Robust LLMs: A Gated Representation Fine-tuning Approach

Link: https://arxiv.org/abs/2502.14100
Authors: Shenglai Zeng,Pengfei He,Kai Guo,Tianqi Zheng,Hanqing Lu,Yue Xing,Hui Liu
Affiliations: Michigan State University; Amazon.com
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Comments:

[NLP-131] Retrieving Versus Understanding Extractive Evidence in Few-Shot Learning AAAI2025

Link: https://arxiv.org/abs/2502.14095
Authors: Karl Elbakian,Samuel Carton
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments: 9 pages, 8 figures, Accepted to AAAI 2025 Main Conference (AI Alignment Track)

[NLP-132] Navigating Semantic Relations: Challenges for Language Models in Abstract Common-Sense Reasoning

Link: https://arxiv.org/abs/2502.14086
Authors: Cole Gawin,Yidan Sun,Mayank Kejriwal
Affiliations: University of Southern California, Los Angeles, California, USA
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Comments: 5 pages, 3 figures, ACM Web Conference 2025

[NLP-133] Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral

Link: https://arxiv.org/abs/2502.14083
Authors: Shivani Kumar,David Jurgens
Affiliations: Unknown
Subjects: Computation and Language (cs.CL)
Comments: 21 pages, 10 figures, 8 tables

[NLP-134] Investigating Non-Transitivity in LLM-as-a-Judge

Link: https://arxiv.org/abs/2502.14074
Authors: Yi Xu,Laura Ruis,Tim Rocktäschel,Robert Kirk
Affiliations: Unknown
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: 8 pages, 6 figures, 2 tables (30 pages, 11 figures, 8 tables including references and appendices)

[NLP-135] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression

Link: https://arxiv.org/abs/2502.14051
Authors: Payman Behnam,Yaosheng Fu,Ritchie Zhao,Po-An Tsai,Zhiding Yu,Alexey Tumanov
Affiliations: Unknown
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments:

[NLP-136] Diversity-driven Data Selection for Language Model Tuning through Sparse Autoencoder

Link: https://arxiv.org/abs/2502.14050
Authors: Xianjun Yang,Shaoliang Nie,Lijuan Liu,Suchin Gururangan,Ujjwal Karn,Rui Hou,Madian Khabsa,Yuning Mao
Affiliations: Meta
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[NLP-137] Semantic Decomposition and Selective Context Filtering – Text Processing Techniques for Context-Aware NLP-Based Systems

Link: https://arxiv.org/abs/2502.14048
Authors: Karl John Villardar
Affiliations: Cebu Institute of Technology
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Comments:

[NLP-138] DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation

Link: https://arxiv.org/abs/2502.14037
Authors: Giorgio Franceschelli,Mirco Musolesi
Affiliations: University of Bologna; University College London
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[NLP-139] Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems

Link: https://arxiv.org/abs/2502.14019
Authors: Myra Cheng,Su Lin Blodgett,Alicia DeVrio,Lisa Egede,Alexandra Olteanu
Affiliations: Stanford University; Microsoft Research; Carnegie Mellon University
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Comments:

[NLP-140] Which Attention Heads Matter for In-Context Learning?

Link: https://arxiv.org/abs/2502.14010
Authors: Kayo Yin,Jacob Steinhardt
Affiliations: Unknown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Comments:

[NLP-141] MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures

Link: https://arxiv.org/abs/2502.14008
Authors: Jiayu Qin,Jianchao Tan,Kefeng Zhang,Xunliang Cai,Wei Wang
Affiliations: Nanjing University; Meituan
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

Computer Vision

[CV-0] Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts

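[Quick Read]: This paper addresses two problems: analyzing historical and cultural artifacts is complex and time-consuming, and there is no standardized benchmark for evaluating and improving large multimodal models on this task. The key contribution is the TimeTravel benchmark, which contains 10,250 expert-verified samples covering 266 distinct cultures across 10 major historical regions. TimeTravel provides a structured dataset and a robust evaluation framework for assessing how well AI models classify, interpret, and understand historical artifacts. It is intended to promote the use of AI in historical research and cultural heritage preservation.

Link: https://arxiv.org/abs/2502.14865
Authors: Sara Ghaboura,Ketan More,Ritesh Thawkar,Wafa Alghallabi,Omkar Thawakar,Fahad Shahbaz Khan,Hisham Cholakkal,Salman Khan,Rao Muhammad Anwer
Affiliations: Mohamed bin Zayed University of AI; Linköping University; Australian National University; Aalto University
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: 4 pages, 6 figures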

Abstract:Understanding historical and cultural artifacts demands human expertise and advanced computational techniques, yet the process remains complex and time-intensive. While large multimodal models offer promising support, their evaluation and improvement require a standardized benchmark. To address this, we introduce TimeTravel, a benchmark of 10,250 expert-verified samples spanning 266 distinct cultures across 10 major historical regions. Designed for AI-driven analysis of manuscripts, artworks, inscriptions, and archaeological discoveries, TimeTravel provides a structured dataset and robust evaluation framework to assess AI models’ capabilities in classification, interpretation, and historical comprehension. By integrating AI with historical research, TimeTravel fosters AI-powered tools for historians, archaeologists, researchers, and cultural tourists to extract valuable insights while ensuring technology contributes meaningfully to historical discovery and cultural heritage preservation. We evaluate contemporary AI models on TimeTravel, highlighting their strengths and identifying areas for improvement. Our goal is to establish AI as a reliable partner in preserving cultural heritage, ensuring that technological advancements contribute meaningfully to historical discovery. Our code is available at: this https URL.

[CV-1] Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework

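[Quick Read]: This paper addresses a gap in existing benchmarks, which focus mainly on simple image-text interactions and overlook complex visual formats such as charts. It introduces a new task, chart-based multimodal retrieval-augmented generation (Chart-based MRAG). To semi-automatically generate high-quality evaluation samples for this task, it proposes the CHARt-based document question-answering GEneration (CHARGE) framework. CHARGE produces evaluation data through structured keypoint extraction, cross-modal verification, and keypoint-based generation. Combined with expert validation, it yields Chart-MRAG Bench, a comprehensive benchmark of 4,738 question-answer pairs drawn from real-world documents across eight domains.

Link: https://arxiv.org/abs/2502.14864
Authors: Yuming Yang,Jiang Zhong,Li Jin,Jingwang Huang,Jingpeng Gao,Qing Liu,Yang Bai,Jingyuan Zhang,Rui Jiang,Kaiwen Wei
Affiliations: College of Computer Science, Chongqing University; Aerospace Information Research Institute, Chinese Academy of Sciences; Kuaishou Technology, Beijing, China
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: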

Abstract:Multimodal Retrieval-Augmented Generation (MRAG) enhances reasoning capabilities by integrating external knowledge. However, existing benchmarks primarily focus on simple image-text interactions, overlooking complex visual formats like charts that are prevalent in real-world applications. In this work, we introduce a novel task, Chart-based MRAG, to address this limitation. To semi-automatically generate high-quality evaluation samples, we propose CHARt-based document question-answering GEneration (CHARGE), a framework that produces evaluation data through structured keypoint extraction, crossmodal verification, and keypoint-based generation. By combining CHARGE with expert validation, we construct Chart-MRAG Bench, a comprehensive benchmark for chart-based MRAG evaluation, featuring 4,738 question-answering pairs across 8 domains from real-world documents. Our evaluation reveals three critical limitations in current approaches: (1) unified multimodal embedding retrieval methods struggle in chart-based scenarios, (2) even with ground-truth retrieval, state-of-the-art MLLMs achieve only 58.19% Correctness and 73.87% Coverage scores, and (3) MLLMs demonstrate consistent text-over-visual modality bias during Chart-based MRAG reasoning. The CHARGE and Chart-MRAG Bench are released at this https URL.

[CV-2] Dynamic Concepts Personalization from Single Videos

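[Quick Read]: This paper addresses the problem of extending the personalization of text-to-video models to dynamic concepts. The key contribution is Set-and-Sequence, a new framework for personalizing generative video models based on Diffusion Transformers (DiTs). It imposes a spatio-temporal weight space within an architecture that does not explicitly separate spatial and temporal features. The method works in two stages:

1. Low-Rank Adaptation (LoRA) layers are fine-tuned on an unordered set of frames from the video. This learns an identity LoRA basis that captures appearance free of temporal interference.
2. With the identity LoRAs frozen, Motion Residuals are added to their coefficients and fine-tuned on the full video sequence to capture motion dynamics.

This embeds dynamic concepts into the video model's output domain, enabling unprecedented editability and compositionality and setting a new benchmark for personalizing dynamic concepts.

Link: https://arxiv.org/abs/2502.14844
Authors: Rameen Abdal,Or Patashnik,Ivan Skorokhodov,Willi Menapace,Aliaksandr Siarohin,Sergey Tulyakov,Daniel Cohen-Or,Kfir Aberman
Affiliations: Snap Research
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: Webpage: this https URL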

Abstract:Personalizing generative text-to-image models has seen remarkable progress, but extending this personalization to text-to-video models presents unique challenges. Unlike static concepts, personalizing text-to-video models has the potential to capture dynamic concepts, i.e., entities defined not only by their appearance but also by their motion. In this paper, we introduce Set-and-Sequence, a novel framework for personalizing Diffusion Transformers (DiTs)-based generative video models with dynamic concepts. Our approach imposes a spatio-temporal weight space within an architecture that does not explicitly separate spatial and temporal features. This is achieved in two key stages. First, we fine-tune Low-Rank Adaptation (LoRA) layers using an unordered set of frames from the video to learn an identity LoRA basis that represents the appearance, free from temporal interference. In the second stage, with the identity LoRAs frozen, we augment their coefficients with Motion Residuals and fine-tune them on the full video sequence, capturing motion dynamics. Our Set-and-Sequence framework results in a spatio-temporal weight space that effectively embeds dynamic concepts into the video model’s output domain, enabling unprecedented editability and compositionality while setting a new benchmark for personalizing dynamic concepts.
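One way to read the two-stage scheme is through a LoRA update ΔW = B·diag(c)·A. In this reading, each column of B paired with the matching row of A is a rank-one basis term, and c holds their coefficients. The sketch below encodes that reading:

- Stage 1 learns B, A, and c from unordered frames.
- Stage 2 freezes them and learns only a residual r added to the coefficients, giving ΔW = B·diag(c + r)·A.

This factorization and every name in it are illustrative assumptions. The abstract does not specify how Motion Residuals are parameterized, and the toy values stand in for trained weights.

```python
from typing import Optional, Sequence

Matrix = list[list[float]]


def lora_delta(
    B: Matrix,
    A: Matrix,
    coeffs: Sequence[float],
    residual: Optional[Sequence[float]] = None,
) -> Matrix:
    """Return B @ diag(coeffs + residual) @ A for B (d x r) and A (r x k)."""
    rank = len(coeffs)
    if residual is None:
        residual = [0.0] * rank  # stage 1: identity basis only, no motion term
    scale = [c + m for c, m in zip(coeffs, residual)]
    d, k = len(B), len(A[0])
    return [
        [sum(B[i][j] * scale[j] * A[j][col] for j in range(rank)) for col in range(k)]
        for i in range(d)
    ]


# Toy rank-2 identity basis "learned" in stage 1 (frozen afterwards).
B_id = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # d=3, r=2
A_id = [[1.0, 0.0], [0.0, 1.0]]              # r=2, k=2
c_id = [0.5, 0.25]

appearance_only = lora_delta(B_id, A_id, c_id)
# Stage 2: only this residual on the coefficients would be trained.
motion_residual = [0.25, -0.25]
with_motion = lora_delta(B_id, A_id, c_id, motion_residual)
print(appearance_only)
print(with_motion)
```

Because the stage-2 residual only reweights the frozen basis, the motion update stays inside the subspace learned from appearance. That is one plausible way to keep identity and motion from interfering.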

[CV-3] Improving the Diffusability of Autoencoders

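[Quick Read]: This paper addresses the excessive high-frequency components in the latent spaces of modern autoencoders, which are especially pronounced in autoencoders with a large bottleneck channel size. These components interfere with the coarse-to-fine nature of diffusion synthesis and therefore hurt generation quality. The paper proposes a simple regularization strategy called scale equivariance, which aligns frequencies between the latent and RGB spaces by enforcing scale equivariance in the decoder. The method requires only minor code changes and at most 20K autoencoder fine-tuning steps. It still significantly improves generation quality, reducing FID by 19% for ImageNet-1K 256x256 image generation and FVD by at least 44% for Kinetics-700 17x256x256 video generation.

Link: https://arxiv.org/abs/2502.14831
Authors: Ivan Skorokhodov,Sharath Girish,Benran Hu,Willi Menapace,Yanyu Li,Rameen Abdal,Sergey Tulyakov,Aliaksandr Siarohin
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 26 pages, 22 figures, 9 tables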

Abstract:Latent diffusion models have emerged as the leading approach for generating high-quality images and videos, utilizing compressed latent representations to reduce the computational burden of the diffusion process. While recent advancements have primarily focused on scaling diffusion backbones and improving autoencoder reconstruction quality, the interaction between these components has received comparatively less attention. In this work, we perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces, which are especially pronounced in the autoencoders with a large bottleneck channel size. We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders the generation quality. To mitigate the issue, we propose scale equivariance: a simple regularization strategy that aligns latent and RGB spaces across frequencies by enforcing scale equivariance in the decoder. It requires minimal code changes and only up to 20K autoencoder fine-tuning steps, yet significantly improves generation quality, reducing FID by 19% for image generation on ImageNet-1K 256x256 and FVD by at least 44% for video generation on Kinetics-700 17x256x256.
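The regularizer described above can be sketched as follows: downsample the latent, decode it, and penalize its distance to an equally downsampled RGB image, so that the decoder must behave consistently across scales. The following are assumptions chosen to keep the example self-contained in plain Python, not details confirmed by the abstract:

- 2x average pooling as the downsampling operator;
- mean-squared error as the distance;
- a nearest-neighbour toy "decoder".

In practice this term would be added to the usual autoencoder losses during fine-tuning.

```python
from typing import Callable

Image = list[list[float]]


def avg_pool2(img: Image) -> Image:
    """2x downsampling by averaging non-overlapping 2x2 blocks (even sizes assumed)."""
    return [
        [
            (img[2 * i][2 * j] + img[2 * i][2 * j + 1]
             + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(len(img[0]) // 2)
        ]
        for i in range(len(img) // 2)
    ]


def upsample_nearest2(img: Image) -> Image:
    """Toy stand-in for a decoder with a 2x spatial upsampling factor."""
    return [[img[i // 2][j // 2] for j in range(2 * len(img[0]))] for i in range(2 * len(img))]


def mse(a: Image, b: Image) -> float:
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def scale_equivariance_loss(
    latent: Image, image: Image, decoder: Callable[[Image], Image]
) -> float:
    """MSE between decoder(down(latent)) and down(image): decoding a downsampled
    latent should match the downsampled RGB target."""
    return mse(decoder(avg_pool2(latent)), avg_pool2(image))


# Latent that is constant on 2x2 blocks: the loss is zero for the toy decoder.
z_smooth = [[1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 2.0, 2.0],
            [3.0, 3.0, 4.0, 4.0], [3.0, 3.0, 4.0, 4.0]]
print(scale_equivariance_loss(z_smooth, upsample_nearest2(z_smooth), upsample_nearest2))

# Latent with high-frequency content: downsampling loses it, so the loss is positive.
z_hf = [[1.0, 0.0], [0.0, 1.0]]
print(scale_equivariance_loss(z_hf, upsample_nearest2(z_hf), upsample_nearest2))
```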

[CV-4] Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison

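[Quick Read]: This paper examines the challenges facing visual question answering (VQA) models: dataset bias, limited model complexity, gaps in commonsense reasoning, rigid evaluation methods, and poor generalization to real-world scenarios. It presents a comprehensive comparison of five advanced VQA models, each of which takes a distinct approach to these problems: ABC-CNN, KICNLE, Masked Vision and Language Modeling, BLIP-2, and OFA. The comparative analysis identifies each model's strengths and limitations in handling the complexity of multimodal reasoning, with the aim of advancing the VQA field.

Link: https://arxiv.org/abs/2502.14827
Authors: Aiswarya Baby,Tintu Thankom Koshy
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
Comments: 8 pages, No figures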

Abstract:Visual Question Answering (VQA) has emerged as a pivotal task in the intersection of computer vision and natural language processing, requiring models to understand and reason about visual content in response to natural language questions. Analyzing VQA datasets is essential for developing robust models that can handle the complexities of multimodal reasoning. Several approaches have been developed to examine these datasets, each offering distinct perspectives on question diversity, answer distribution, and visual-textual correlations. Despite significant progress, existing VQA models face challenges related to dataset bias, limited model complexity, commonsense reasoning gaps, rigid evaluation methods, and generalization to real-world scenarios. This paper presents a comprehensive comparative study of five advanced VQA models: ABC-CNN, KICNLE, Masked Vision and Language Modeling, BLIP-2, and OFA, each employing distinct methodologies to address these challenges.

[CV-5] AVD2: Accident Video Diffusion for Accident Video Description ICRA2025

[Quick Read]: This paper addresses the limited understanding of traffic accident scenarios in autonomous driving, a problem made worse by the scarcity of training data specific to accident scenarios. It proposes AVD2 (Accident Video Diffusion for Accident Video Description), a new framework that improves accident scene understanding by generating accident videos aligned with detailed natural-language descriptions and reasoning. The key contribution is the resulting EMM-AU (Enhanced Multi-Modal Accident Video Understanding) dataset, which substantially improves performance in accident analysis and prevention.

Link: https://arxiv.org/abs/2502.14801
Authors: Cheng Li,Keyuan Zhou,Tong Liu,Yu Wang,Mingqiao Zhuang,Huan-ang Gao,Bu Jin,Hao Zhao
Affiliations: Institute for AI Industry Research (AIR), Tsinghua University; Academy of Interdisciplinary Studies, the Hong Kong University of Science and Technology; College of Communication Engineering, Jilin University; School of Cyber Science and Engineering, Nanjing University of Science and Technology; School of Automation, Beijing Institute of Technology; College of Foreign Language and Literature, Fudan University; Beijing Academy of Artificial Intelligence (BAAI); Lightwheel AI
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments: ICRA 2025, Project Page: this https URL

Abstract:Traffic accidents present complex challenges for autonomous driving, often featuring unpredictable scenarios that hinder accurate system interpretation and responses. Nonetheless, prevailing methodologies fall short in elucidating the causes of accidents and proposing preventive measures due to the paucity of training data specific to accident scenarios. In this work, we introduce AVD2 (Accident Video Diffusion for Accident Video Description), a novel framework that enhances accident scene understanding by generating accident videos that align with detailed natural language descriptions and reasoning, resulting in the contributed EMM-AU (Enhanced Multi-Modal Accident Video Understanding) dataset. Empirical results reveal that the integration of the EMM-AU dataset establishes state-of-the-art performance across both automated metrics and human evaluations, markedly advancing the domains of accident analysis and prevention. Project resources are available at this https URL

[CV-6] A Survey on Text-Driven 360-Degree Panorama Generation

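[Quick Read]: This survey covers the generation of 360-degree panoramic images directly from text descriptions. The key enabler is recent progress in text-to-image diffusion models. These models greatly simplify the traditionally complex content-creation process and have driven the rapid development of this emerging field.

Link: https://arxiv.org/abs/2502.14799
Authors: Hai Wang,Xiaoyu Xiang,Weihao Xia,Jing-Hao Xue
Affiliations: University College London; Meta Reality Labs
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: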

Abstract:The advent of text-driven 360-degree panorama generation, enabling the synthesis of 360-degree panoramic images directly from textual descriptions, marks a transformative advancement in immersive visual content creation. This innovation significantly simplifies the traditionally complex process of producing such content. Recent progress in text-to-image diffusion models has accelerated the rapid development in this emerging field. This survey presents a comprehensive review of text-driven 360-degree panorama generation, offering an in-depth analysis of state-of-the-art algorithms and their expanding applications in 360-degree 3D scene generation. Furthermore, we critically examine current limitations and propose promising directions for future research. A curated project page with relevant resources and research papers is available at this https URL.

[CV-7] Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration

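[Quick Read]: The paper targets the limitations of current humanoid robot control frameworks, which rely mainly on reactive mechanisms and, because of data scarcity, lack autonomous interaction capabilities. Its solution is Humanoid-VLA, a framework that integrates language understanding, egocentric scene perception, and motion control to achieve universal humanoid control. The key components are: pre-aligning non-egocentric human motion datasets with textual descriptions to learn universal motion patterns and action semantics; efficient video-conditioned fine-tuning to bring in egocentric visual context; and a self-supervised data augmentation strategy that converts raw motion sequences into informative question-answer pairs, enabling effective use of large-scale unlabeled video data.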

Link: https://arxiv.org/abs/2502.14795
Authors: Pengxiang Ding,Jianfei Ma,Xinyang Tong,Binghong Zou,Xinxin Luo,Yiguo Fan,Ting Wang,Hongchao Lu,Panzhong Mo,Jinxin Liu,Yuefan Wang,Huaicheng Zhou,Wenshuo Feng,Jiacheng Liu,Siteng Huang,Donglin Wang
Affiliations: Unknown
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:This paper addresses the limitations of current humanoid robot control frameworks, which primarily rely on reactive mechanisms and lack autonomous interaction capabilities due to data scarcity. We propose Humanoid-VLA, a novel framework that integrates language understanding, egocentric scene perception, and motion control, enabling universal humanoid control. Humanoid-VLA begins with language-motion pre-alignment using non-egocentric human motion datasets paired with textual descriptions, allowing the model to learn universal motion patterns and action semantics. We then incorporate egocentric visual context through a parameter-efficient video-conditioned fine-tuning, enabling context-aware motion generation. Furthermore, we introduce a self-supervised data augmentation strategy that automatically generates pseudoannotations directly derived from motion data. This process converts raw motion sequences into informative question-answer pairs, facilitating the effective use of large-scale unlabeled video data. Extensive experiments show that Humanoid-VLA, built upon whole-body control architectures, achieves object interaction and environment exploration tasks with enhanced contextual awareness, demonstrating a more human-like capacity for adaptive and intelligent engagement.

[CV-8] RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird's Eye View Segmentation WACV2025

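[Quick Read]: The paper tackles bird's-eye-view (BEV) semantic segmentation with self-supervised learning in the zero-shot setting. Its key contribution is RendBEV, which uses differentiable volumetric rendering to obtain supervision from semantic perspective views computed by a 2D semantic segmentation model, enabling self-supervised training of BEV semantic segmentation networks. The method is competitive in the zero-shot setting, significantly improves performance when labeled data is scarce, and sets a new state of the art when fine-tuned on all available labels.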

Link: https://arxiv.org/abs/2502.14792
Authors: Henrique Piñeiro Monteagudo,Leonardo Taccari,Aurel Pjetri,Francesco Sambo,Samuele Salti
Affiliations: Verizon Connect; University of Bologna; University of Florence
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted at WACV 2025

Abstract:Bird’s Eye View (BEV) semantic maps have recently garnered a lot of attention as a useful representation of the environment to tackle assisted and autonomous driving tasks. However, most of the existing work focuses on the fully supervised setting, training networks on large annotated datasets. In this work, we present RendBEV, a new method for the self-supervised training of BEV semantic segmentation networks, leveraging differentiable volumetric rendering to receive supervision from semantic perspective views computed by a 2D semantic segmentation model. Our method enables zero-shot BEV semantic segmentation, and already delivers competitive results in this challenging setting. When used as pretraining to then fine-tune on labeled BEV ground-truth, our method significantly boosts performance in low-annotation regimes, and sets a new state of the art when fine-tuning on all available labels.
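
RendBEV's supervision comes from differentiable volumetric rendering. As a rough illustration only, the snippet below applies the standard NeRF-style compositing rule (weights w_i = T_i * alpha_i, where T_i is the transmittance up to sample i) to per-sample class probabilities along a single camera ray. How RendBEV derives densities and class probabilities from its BEV predictions is not described in the abstract, and the function name is hypothetical.

```python
import math


def render_semantics_along_ray(densities, deltas, class_probs):
    """Alpha-composite per-sample class probabilities along one camera ray.

    densities:   non-negative density sigma_i at each of the N samples
    deltas:      spacing between consecutive samples (length N)
    class_probs: N probability vectors over the semantic classes
    Returns (rendered class distribution, accumulated opacity).
    """
    num_classes = len(class_probs[0])
    rendered = [0.0] * num_classes
    transmittance = 1.0  # T_i: chance the ray reaches sample i unoccluded
    for sigma, delta, probs in zip(densities, deltas, class_probs):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # w_i = T_i * alpha_i
        for c in range(num_classes):
            rendered[c] += weight * probs[c]
        transmittance *= 1.0 - alpha
    return rendered, 1.0 - transmittance


# Classes: [road, car]; empty space, then a dense "car" sample, then road behind it.
dist, opacity = render_semantics_along_ray(
    densities=[0.0, 50.0, 1.0],
    deltas=[1.0, 1.0, 1.0],
    class_probs=[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
)
```

Because every weight is differentiable with respect to the densities, a loss between such a rendered distribution and a 2D segmentation model's prediction for the same pixel can be back-propagated into whatever produced the densities; this is presumably the general mechanism the abstract refers to.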

[CV-9] Structurally Disentangled Feature Fields Distillation for 3D Understanding and Editing

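[Quick Read]: The paper addresses the limitations of capturing 3D features with a single feature field, in particular the common assumption that features are view-independent. Its key idea is to use multiple disentangled feature fields that capture different structural components of 3D features, both view-dependent and view-independent, learned from 2D feature supervision alone. Each component can then be controlled independently, enabling semantic and structural understanding and editing.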

Link: https://arxiv.org/abs/2502.14789
Authors: Yoel Levy,David Shavin,Itai Lang,Sagie Benaim
Affiliations: The Hebrew University of Jerusalem; University of Chicago
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Recent work has demonstrated the ability to leverage or distill pre-trained 2D features obtained using large pre-trained 2D models into 3D features, enabling impressive 3D editing and understanding capabilities using only 2D supervision. Although impressive, models assume that 3D features are captured using a single feature field and often make a simplifying assumption that features are view-independent. In this work, we propose instead to capture 3D features using multiple disentangled feature fields that capture different structural components of 3D features involving view-dependent and view-independent components, which can be learned from 2D feature supervision only. Subsequently, each element can be controlled in isolation, enabling semantic and structural understanding and editing capabilities. For instance, using a user click, one can segment 3D features corresponding to a given object and then segment, edit, or remove their view-dependent (reflective) properties. We evaluate our approach on the task of 3D segmentation and demonstrate a set of novel understanding and editing tasks.

[CV-10] SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

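[Quick Read]: The paper aims to improve multilingual vision-language encoders on core tasks such as zero-shot classification, image-text retrieval, and visual representation extraction. The key is a unified training recipe that combines captioning-based pretraining, self-supervised losses (e.g. self-distillation and masked prediction), and online data curation; with it, SigLIP 2 outperforms the original SigLIP at every model scale and improves markedly on localization and dense prediction. A more diverse data mixture with de-biasing techniques also strengthens multilingual understanding and fairness.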

Link: https://arxiv.org/abs/2502.14786
Authors: Michael Tschannen,Alexey Gritsenko,Xiao Wang,Muhammad Ferjad Naeem,Ibrahim Alabdulmohsin,Nikhil Parthasarathy,Talfan Evans,Lucas Beyer,Ye Xia,Basil Mustafa,Olivier Hénaff,Jeremiah Harmsen,Andreas Steiner,Xiaohua Zhai
Affiliations: Google DeepMind
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Model checkpoints are available at this https URL

Abstract:We introduce SigLIP 2, a family of new multilingual vision-language encoders that build on the success of the original SigLIP. In this second iteration, we extend the original image-text training objective with several prior, independently developed techniques into a unified recipe – this includes captioning-based pretraining, self-supervised losses (self-distillation, masked prediction) and online data curation. With these changes, SigLIP 2 models outperform their SigLIP counterparts at all model scales in core capabilities, including zero-shot classification, image-text retrieval, and transfer performance when extracting visual representations for Vision-Language Models (VLMs). Furthermore, the new training recipe leads to significant improvements on localization and dense prediction tasks. We also train variants which support multiple resolutions and preserve the input’s native aspect ratio. Finally, we train on a more diverse data-mixture that includes de-biasing techniques, leading to much better multilingual understanding and improved fairness. To allow users to trade off inference cost with performance, we release model checkpoints at four sizes: ViT-B (86M), L (303M), So400m (400M), and g (1B).
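
SigLIP 2 keeps the original SigLIP image-text objective and layers the other techniques on top. For reference, a plain-Python sketch of the original pairwise sigmoid loss follows: every image-text pair in the batch is scored independently as a binary match or non-match, with no softmax over the batch. This is an illustration for tiny inputs, not the released implementation. The fixed t and b values are assumptions chosen for illustration (SigLIP learns both). The captioning, self-distillation and masked-prediction losses added in SigLIP 2 are not shown.

```python
import math


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def siglip_sigmoid_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss of the original SigLIP, averaged over the batch.

    img_emb, txt_emb: L2-normalised vectors; row i of each forms a matching pair.
    t, b: temperature and bias (learned in SigLIP; fixed here for illustration).
    """
    n = len(img_emb)
    total = 0.0
    for i in range(n):
        for j in range(n):
            logit = t * sum(a * c for a, c in zip(img_emb[i], txt_emb[j])) + b
            z = 1.0 if i == j else -1.0  # +1 for the matching pair, -1 otherwise
            total += _softplus(-z * logit)  # equals -log(sigmoid(z * logit))
    return total / n


images = [[1.0, 0.0], [0.0, 1.0]]
matched = siglip_sigmoid_loss(images, [[1.0, 0.0], [0.0, 1.0]])
shuffled = siglip_sigmoid_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

Correctly paired embeddings give a much lower loss than shuffled ones. Because each pair is an independent binary term, the loss needs no batch-wide normalisation, which is the property usually cited for the sigmoid formulation.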

[CV-11] DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models

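[Quick Read]: The paper addresses the limited flexibility and condition misunderstandings of existing ControlNet models, which rely on global conditions, in multi-condition image generation. Its solution is DC (Decouple)-ControlNet, which decouples the control conditions and turns global control into a hierarchical system that integrates distinct elements, contents, and layouts, so that conditions can be mixed more flexibly. To this end it introduces an Intra-Element Controller and an Inter-Element Controller. The former handles different types of control signals within a single element, precisely describing an object's content and layout; the latter handles interactions and occlusions among multiple elements according to user-defined relationships. Together they substantially improve flexibility and precision under multi-condition control.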

Link: https://arxiv.org/abs/2502.14779
Authors: Hongji Yang,Wencheng Han,Yucheng Zhou,Jianbing Shen
Affiliations: SKL-IOTSC, CIS, University of Macau
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:In this paper, we introduce DC (Decouple)-ControlNet, a highly flexible and precisely controllable framework for multi-condition image generation. The core idea behind DC-ControlNet is to decouple control conditions, transforming global control into a hierarchical system that integrates distinct elements, contents, and layouts. This enables users to mix these individual conditions with greater flexibility, leading to more efficient and accurate image generation control. Previous ControlNet-based models rely solely on global conditions, which affect the entire image and lack the ability of element- or region-specific control. This limitation reduces flexibility and can cause condition misunderstandings in multi-conditional image generation. To address these challenges, we propose both Intra-Element and Inter-Element Controllers in DC-ControlNet. The Intra-Element Controller handles different types of control signals within individual elements, accurately describing the content and layout characteristics of the object. For interactions between elements, we introduce the Inter-Element Controller, which accurately handles multi-element interactions and occlusion based on user-defined relationships. Extensive evaluations show that DC-ControlNet significantly outperforms existing ControlNet models and Layout-to-Image generative models in terms of control flexibility and precision in multi-condition control.

[CV-12] Sculpting [CLS] Features for Pre-Trained Model-Based Class-Incremental Learning

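[Quick Read]: The paper addresses catastrophic forgetting in class-incremental learning, where a model tends to lose old knowledge while learning new classes. Its key proposal is an efficient fine-tuning module called "Learn and Calibrate" (LuCA), combined with the "Token-level Sparse Calibration and Adaptation" (TOSCA) strategy. By preserving the generalization ability of the pre-trained model and applying sparse calibration and adaptation only to the last token, the approach strikes a balance between stability and plasticity.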

Link: https://arxiv.org/abs/2502.14762
Authors: Murat Onur Yildirim,Elif Ceren Gok Yildirim,Joaquin Vanschoren
Affiliations: TU Eindhoven
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Class-incremental learning requires models to continually acquire knowledge of new classes without forgetting old ones. Although pre-trained models have demonstrated strong performance in class-incremental learning, they remain susceptible to catastrophic forgetting when learning new concepts. Excessive plasticity in the models breaks generalizability and causes forgetting, while strong stability results in insufficient adaptation to new classes. This necessitates effective adaptation with minimal modifications to preserve the general knowledge of pre-trained models. To address this challenge, we first introduce a new parameter-efficient fine-tuning module ‘Learn and Calibrate’, or LuCA, designed to acquire knowledge through an adapter-calibrator couple, enabling effective adaptation with well-refined feature representations. Second, for each learning session, we deploy a sparse LuCA module on top of the last token just before the classifier, which we refer to as ‘Token-level Sparse Calibration and Adaptation’, or TOSCA. This strategic design improves the orthogonality between the modules and significantly reduces both training and inference complexity. By leaving the generalization capabilities of the pre-trained models intact and adapting exclusively via the last token, our approach achieves a harmonious balance between stability and plasticity. Extensive experiments demonstrate TOSCA’s state-of-the-art performance while introducing ~8 times fewer parameters compared to prior methods.

[CV-13] YOLOv12: A Breakdown of the Key Architectural Features

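[Quick Read]: The paper analyzes the architecture of YOLOv12 and how it improves the performance and efficiency of single-stage real-time object detection. The key elements are an optimized backbone (R-ELAN), 7x7 separable convolutions, and FlashAttention-based area attention, which improve feature extraction, efficiency, and detection robustness. These changes give YOLOv12 faster inference while maintaining high accuracy, making it a strong candidate for autonomous driving, security surveillance, and real-time analytics.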

Link: https://arxiv.org/abs/2502.14740
Authors: Mujadded Al Rabbani Alif,Muhammad Hussain
Affiliations: Department of Computer Science, Huddersfield University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

Abstract:This paper presents an architectural analysis of YOLOv12, a significant advancement in single-stage, real-time object detection building upon the strengths of its predecessors while introducing key improvements. The model incorporates an optimised backbone (R-ELAN), 7x7 separable convolutions, and FlashAttention-driven area-based attention, improving feature extraction, efficiency, and detection robustness. With multiple model variants, similar to its predecessors, YOLOv12 offers scalable solutions for both latency-sensitive and high-accuracy applications. Experimental results manifest consistent gains in mean average precision (mAP) and inference speed, making YOLOv12 a compelling choice for applications in autonomous systems, security, and real-time analytics. By achieving an optimal balance between computational efficiency and performance, YOLOv12 sets a new benchmark for real-time computer vision, facilitating deployment across diverse hardware platforms, from edge devices to high-performance clusters.
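
The abstract credits 7x7 separable convolutions for part of YOLOv12's efficiency. Assuming "separable" means the usual depthwise-plus-pointwise factorisation (the abstract does not say which variant), the weight-count arithmetic for a hypothetical 256-channel layer can be sketched as follows. Biases are ignored and the channel count is illustrative only.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k):
    """Depthwise k x k conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv mixes channels
    return depthwise + pointwise


standard = conv_params(256, 256, 7)
separable = separable_conv_params(256, 256, 7)
print(f"{standard:,} vs {separable:,} ({standard / separable:.1f}x fewer)")
# prints: 3,211,264 vs 78,080 (41.1x fewer)
```

The saving grows with kernel size, which is why a large 7x7 receptive field stays affordable in the factorised form.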

[CV-14] Multi-dataset synergistic in supervised learning to pre-label structural components in point clouds from shell construction scenes

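[Quick Read]: The paper addresses the heavy annotation effort that hampers computer vision and machine learning research in the construction industry. Its key approach is transfer learning with pre-trained transformer architectures, achieving effective segmentation of building components with minimal new data. After establishing a baseline through supervised training and a custom validation dataset, and evaluating cross-domain inference, the paper shows that a small amount of fine-tuning lets pre-trained models perform well at automatically labeling new data and at segmenting frequently recurring objects.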

Link: https://arxiv.org/abs/2502.14721
Authors: Lukas Rauch,Thomas Braml
Affiliations: University of the Bundeswehr Munich
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 18 pages, 8 figures, 7 tables

Abstract:The significant effort required to annotate data for new training datasets hinders computer vision research and machine learning in the construction industry. This work explores adapting standard datasets and the latest transformer model architectures for point cloud semantic segmentation in the context of shell construction sites. Unlike common approaches focused on object segmentation of building interiors and furniture, this study addressed the challenges of segmenting complex structural components in Architecture, Engineering, and Construction (AEC). We establish a baseline through supervised training and a custom validation dataset, evaluate the cross-domain inference with large-scale indoor datasets, and utilize transfer learning to maximize segmentation performance with minimal new data. The findings indicate that with minimal fine-tuning, pre-trained transformer architectures offer an effective strategy for building component segmentation. Our results are promising for automating the annotation of new, previously unseen data when creating larger training resources and for the segmentation of frequently recurring objects.

[CV-15] CDGS: Confidence-Aware Depth Regularization for 3D Gaussian Splatting

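[Quick Read]: The paper addresses the limited geometric accuracy of 3D Gaussian Splatting (3DGS) in 3D reconstruction, which stems from the lack of explicit geometric constraints during optimization and leads to poorly preserved geometric detail. The key solution is CDGS, a confidence-aware depth regularization method that uses multi-cue confidence maps from monocular depth estimation together with sparse Structure-from-Motion depth to adaptively adjust depth supervision during optimization. This improves geometric detail preservation in 3DGS as well as both novel view synthesis (NVS) quality and geometric accuracy.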

Link: https://arxiv.org/abs/2502.14684
Authors: Qilin Zhang,Olaf Wysocki,Steffen Urban,Boris Jutzi
Affiliations: Technical University of Munich (TUM)
Categories: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:3D Gaussian Splatting (3DGS) has shown significant advantages in novel view synthesis (NVS), particularly in achieving high rendering speeds and high-quality results. However, its geometric accuracy in 3D reconstruction remains limited due to the lack of explicit geometric constraints during optimization. This paper introduces CDGS, a confidence-aware depth regularization approach developed to enhance 3DGS. We leverage multi-cue confidence maps of monocular depth estimation and sparse Structure-from-Motion depth to adaptively adjust depth supervision during the optimization process. Our method demonstrates improved geometric detail preservation in early training stages and achieves competitive performance in both NVS quality and geometric accuracy. Experiments on the publicly available Tanks and Temples benchmark dataset show that our method achieves more stable convergence behavior and more accurate geometric reconstruction results, with improvements of up to 2.31 dB in PSNR for NVS and consistently lower geometric errors in M3C2 distance metrics. Notably, our method reaches comparable F-scores to the original 3DGS with only 50% of the training iterations. We expect this work will facilitate the development of efficient and accurate 3D reconstruction systems for real-world applications such as digital twin creation, heritage preservation, or forestry applications.
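
A minimal sketch of confidence-aware depth supervision, under stated assumptions. Monocular depth is only defined up to scale and shift, so it is first aligned to the sparse SfM depths with a least-squares fit. The rendered depth is then compared with the aligned prior through a confidence-weighted L1 loss. Both the alignment and the L1 form are common choices assumed here. How CDGS builds its multi-cue confidence maps and schedules the weighting during optimisation is not reproduced, and the function names are hypothetical.

```python
def fit_scale_shift(mono, sfm):
    """Least-squares (s, t) minimising sum((s * mono_i + t - sfm_i)^2) over sparse SfM pixels.

    Assumes at least two distinct mono values (otherwise the variance is zero).
    """
    n = len(mono)
    mean_m = sum(mono) / n
    mean_s = sum(sfm) / n
    var_m = sum((m - mean_m) ** 2 for m in mono)
    cov = sum((m - mean_m) * (d - mean_s) for m, d in zip(mono, sfm))
    s = cov / var_m
    t = mean_s - s * mean_m
    return s, t


def confidence_weighted_depth_loss(rendered, prior, confidence, eps=1e-8):
    """Confidence-weighted L1 between rendered depth and the aligned depth prior."""
    num = sum(c * abs(r - p) for r, p, c in zip(rendered, prior, confidence))
    return num / (sum(confidence) + eps)


s, t = fit_scale_shift([1.0, 2.0, 3.0], [2.5, 4.5, 6.5])   # s = 2.0, t = 0.5
prior = [s * d + t for d in [1.0, 2.0]]                     # [2.5, 4.5]
loss = confidence_weighted_depth_loss([2.5, 5.0], prior, [1.0, 1.0])  # about 0.25
```

Setting a pixel's confidence to zero removes it from the supervision entirely, which is one simple way an unreliable monocular estimate could be prevented from pulling the geometry off course.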

[CV-16] BP-SGCN: Behavioral Pseudo-Label Informed Sparse Graph Convolution Network for Pedestrian and Heterogeneous Trajectory Prediction

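[Quick Read]: The paper addresses the limited accuracy of trajectory prediction for heterogeneous traffic agents such as pedestrians, cyclists, and vehicles. Existing methods either rely on the relatively consistent behavior of pedestrians and cannot handle heterogeneous real-world scenes, or require extra class labels to distinguish agent types; such labels are costly to annotate and do not generalize to differing behaviors within the same class. The key contribution is behavioral pseudo-labels, which capture the behavior distributions of pedestrians and heterogeneous agents from motion features alone and significantly improve prediction accuracy. To implement this, the paper proposes the Behavioral Pseudo-Label Informed Sparse Graph Convolution Network (BP-SGCN), which learns these pseudo-labels and uses them to guide the trajectory predictor. It also proposes a cascaded training scheme that first learns the pseudo-labels in an unsupervised manner and then fine-tunes end to end to further improve accuracy. Experiments show that the pseudo-labels effectively model distinct behavior clusters and markedly improve trajectory prediction.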

Link: https://arxiv.org/abs/2502.14676
Authors: Ruochen Li,Stamos Katsigiannis,Tae-Kyun Kim,Hubert P. H. Shum
Affiliations: Durham University; KAIST
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

Abstract:Trajectory prediction allows better decision-making in applications of autonomous vehicles or surveillance by predicting the short-term future movement of traffic agents. It is classified into pedestrian or heterogeneous trajectory prediction. The former exploits the relatively consistent behavior of pedestrians, but is limited in real-world scenarios with heterogeneous traffic agents such as cyclists and vehicles. The latter typically relies on extra class label information to distinguish the heterogeneous agents, but such labels are costly to annotate and cannot be generalized to represent different behaviors within the same class of agents. In this work, we introduce the behavioral pseudo-labels that effectively capture the behavior distributions of pedestrians and heterogeneous agents solely based on their motion features, significantly improving the accuracy of trajectory prediction. To implement the framework, we propose the Behavioral Pseudo-Label Informed Sparse Graph Convolution Network (BP-SGCN) that learns pseudo-labels and informs a trajectory predictor. For optimization, we propose a cascaded training scheme, in which we first learn the pseudo-labels in an unsupervised manner, and then perform end-to-end fine-tuning on the labels in the direction of increasing the trajectory prediction accuracy. Experiments show that our pseudo-labels effectively model different behavior clusters and improve trajectory prediction. Our proposed BP-SGCN outperforms existing methods using both pedestrian (ETH/UCY, pedestrian-only SDD) and heterogeneous agent datasets (SDD, Argoverse 1).
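
BP-SGCN learns its pseudo-labels with a network trained in an unsupervised first stage. As a simplified and hypothetical stand-in for that idea, the sketch below derives behaviour labels from motion alone by clustering two hand-crafted features (mean speed and mean acceleration magnitude) with plain k-means. It is not the paper's method; it only shows that no class labels are needed to separate, say, walking-like from cycling-like motion.

```python
import math


def motion_features(traj):
    """(mean speed, mean acceleration magnitude) of a trajectory [(x, y), ...] at unit time steps."""
    vel = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(traj, traj[1:])]
    acc = [(vx1 - vx0, vy1 - vy0) for (vx0, vy0), (vx1, vy1) in zip(vel, vel[1:])]
    speed = sum(math.hypot(vx, vy) for vx, vy in vel) / len(vel)
    accel = sum(math.hypot(ax, ay) for ax, ay in acc) / len(acc) if acc else 0.0
    return (speed, accel)


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points serve as initial centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid in Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


trajectories = [
    [(0, 0), (1, 0), (2, 0), (3, 0)],          # slow, walking-like
    [(0, 0), (5, 0), (10, 0), (15, 0)],        # fast, cycling-like
    [(0, 0), (0, 1.1), (0, 2.2), (0, 3.3)],    # slow
    [(0, 0), (4.8, 0), (9.6, 0), (14.4, 0)],   # fast
]
pseudo_labels = kmeans([motion_features(t) for t in trajectories], k=2)
```

In BP-SGCN the labels are subsequently fine-tuned end to end toward better trajectory prediction, which a fixed clustering like this cannot do.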

[CV-17] MAGO-SP: Detection and Correction of Water-Fat Swaps in Magnitude-Only VIBE MRI

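[Quick Read]: The paper addresses water-fat swaps caused by ambiguity in signal reconstruction in non-contrast-enhanced VIBE (Volume Interpolated Breath-Hold Examination) MRI, which limits automated PDFF (proton density fat fraction) analysis of large-scale clinical data and population studies. The solution is a three-step automated pipeline. First, a segmentation network is trained to classify volumes as "fat-like" or "water-like", using synthetic water-fat swaps created by merging fat and water volumes with Perlin noise. Second, a denoising diffusion image-to-image network predicts water volumes that serve as signal priors for correction. Finally, this prior is integrated into a physics-constrained model to recover accurate water and fat signals. The method achieves a 1% error rate in water-fat swap detection for 6-point VIBE.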

Link: https://arxiv.org/abs/2502.14659
Authors: Robert Graf,Hendrik Möller,Sophie Starck,Matan Atad,Philipp Braun,Jonathan Stelter,Annette Peters,Lilian Krist,Stefan N. Willich,Henry Völzke,Robin Bülow,Klaus Berger,Tobias Pischon,Thoralf Niendorf,Johannes Paetzold,Dimitrios Karampinos,Daniel Rueckert,Jan Kirschke
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:Volume Interpolated Breath-Hold Examination (VIBE) MRI generates images suitable for water and fat signal composition estimation. While the two-point VIBE provides water-fat-separated images, the six-point VIBE allows estimation of the effective transversal relaxation rate R2* and the proton density fat fraction (PDFF), which are imaging markers for health and disease. Ambiguity during signal reconstruction can lead to water-fat swaps. This shortcoming challenges the application of VIBE-MRI for automated PDFF analyses of large-scale clinical data and of population studies. This study develops an automated pipeline to detect and correct water-fat swaps in non-contrast-enhanced VIBE images. Our three-step pipeline begins with training a segmentation network to classify volumes as “fat-like” or “water-like,” using synthetic water-fat swaps generated by merging fat and water volumes with Perlin noise. Next, a denoising diffusion image-to-image network predicts water volumes as signal priors for correction. Finally, we integrate this prior into a physics-constrained model to recover accurate water and fat signals. Our approach achieves a 1% error rate in water-fat swap detection for a 6-point VIBE. Notably, swaps disproportionately affect individuals in the Underweight and Class 3 Obesity BMI categories. Our correction algorithm ensures accurate solution selection in chemical phase MRIs, enabling reliable PDFF estimation. This forms a solid technical foundation for automated large-scale population imaging analysis.
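
The abstract relies on PDFF without defining it. The sketch below uses the common definition PDFF = F / (W + F) on water and fat signal magnitudes; the paper's pipeline may apply corrections (for example for noise bias) that are not shown here. It illustrates why a swap is so damaging for automated analysis: exchanging W and F turns a true fat fraction p into 1 - p.

```python
def pdff(water, fat):
    """Proton density fat fraction F / (W + F), as a value in [0, 1]."""
    return fat / (water + fat)


def pdff_if_swapped(water, fat):
    """PDFF computed after the water and fat signals have been swapped."""
    return pdff(fat, water)


w, f = 80.0, 20.0
print(pdff(w, f), pdff_if_swapped(w, f))  # prints: 0.2 0.8
```

A voxel that is truly 20% fat therefore reads as 80% fat after a swap, so even a small fraction of undetected swaps can badly distort population-level PDFF statistics.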

[CV-18] Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion ICRA

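[Quick Read]: The paper addresses the challenges of transparent object perception, specifically accurate segmentation and depth estimation of transparent objects from a monocular image. Its key contribution is a monocular framework that is the first to perform well on both segmentation and depth estimation using only a single image, effectively fusing multi-scale information between the two tasks and progressively refining the initial features with an iterative strategy to obtain sharper results. Experiments on two challenging datasets show that the model substantially outperforms existing monocular, stereo, and multi-view methods.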

Link: https://arxiv.org/abs/2502.14616
Authors: Jiangyuan Liu,Hongxuan Ma,Yuxin Guo,Yuhao Zhao,Chi Zhang,Wei Sui,Wei Zou
Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences; State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation of Chinese Academy of Sciences; School of Information Science and Technology, Shijiazhuang Tiedao University; D-Robotics
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by ICRA (2025). The code is accessible through: this https URL

Abstract:Transparent object perception is indispensable for numerous robotic tasks. However, accurately segmenting and estimating the depth of transparent objects remain challenging due to complex optical properties. Existing methods primarily delve into only one task using extra inputs or specialized sensors, neglecting the valuable interactions among tasks and the subsequent refinement process, leading to suboptimal and blurry predictions. To address these issues, we propose a monocular framework, which is the first to excel in both segmentation and depth estimation of transparent objects, with only a single-image input. Specifically, we devise a novel semantic and geometric fusion module, effectively integrating the multi-scale information between tasks. In addition, drawing inspiration from human perception of objects, we further incorporate an iterative strategy, which progressively refines initial features for clearer results. Experiments on two challenging synthetic and real-world datasets demonstrate that our model surpasses state-of-the-art monocular, stereo, and multi-view methods by a large margin of about 38.8%-46.2% with only a single RGB input. Codes and models are publicly available at this https URL.

[CV-19] Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining ICLR2025

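[Quick Read]: The paper addresses inaccurate predictions of self-supervised monocular depth estimation (SSMDE) on reflective surfaces. It proposes a new training strategy that uses triplet mining, guided by the camera geometry between viewpoints, to pinpoint reflective regions at the pixel level. The key is a reflection-aware triplet mining loss that specifically penalizes inappropriate photometric error minimization in the localized reflective regions while preserving depth accuracy in non-reflective areas. The paper also proposes a reflection-aware knowledge distillation method that lets a student model selectively learn pixel-level knowledge from reflective and non-reflective regions, yielding robust depth estimation across all regions.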

Link: https://arxiv.org/abs/2502.14573
Authors: Wonhyeok Choi,Kyumin Hwang,Wei Peng,Minwoo Choi,Sunghoon Im
Affiliations: Daegu Gyeongbuk Institute of Science and Technology; Stanford University
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: Accepted at ICLR 2025

Abstract:Self-supervised monocular depth estimation (SSMDE) aims to predict the dense depth map of a monocular image, by learning depth from RGB image sequences, eliminating the need for ground-truth depth labels. Although this approach simplifies data acquisition compared to supervised methods, it struggles with reflective surfaces, as they violate the assumptions of Lambertian reflectance, leading to inaccurate training on such surfaces. To tackle this problem, we propose a novel training strategy for an SSMDE by leveraging triplet mining to pinpoint reflective regions at the pixel level, guided by the camera geometry between different viewpoints. The proposed reflection-aware triplet mining loss specifically penalizes the inappropriate photometric error minimization on the localized reflective regions while preserving depth accuracy in non-reflective areas. We also incorporate a reflection-aware knowledge distillation method that enables a student model to selectively learn the pixel-level knowledge from reflective and non-reflective regions. This results in robust depth estimation across areas. Evaluation results on multiple datasets demonstrate that our method effectively enhances depth quality on reflective surfaces and outperforms state-of-the-art SSMDE baselines.
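
For readers unfamiliar with triplet objectives, the generic triplet margin loss is sketched below. It is zero once the anchor-negative distance exceeds the anchor-positive distance by at least a margin. This is only the textbook form. The paper's reflection-aware variant, how it mines pixel-level triplets from multi-view camera geometry, and the margin value used here are not taken from the paper.

```python
def triplet_margin_loss(d_anchor_pos, d_anchor_neg, margin=0.1):
    """Hinge on distances: zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, d_anchor_pos - d_anchor_neg + margin)


def mean_triplet_loss(triplets, margin=0.1):
    """Average over (d_ap, d_an) pairs, e.g. one pair per mined pixel."""
    return sum(triplet_margin_loss(ap, an, margin) for ap, an in triplets) / len(triplets)


# First triplet is already separated by more than the margin; the second is not.
loss = mean_triplet_loss([(0.2, 0.5), (0.4, 0.3)])  # (0.0 + 0.2) / 2, about 0.1
```

Only violating triplets contribute gradient, which is what makes the mining step (choosing which triplets to evaluate) matter so much in practice.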

[CV-20] Learning Temporal 3D Semantic Scene Completion via Optical Flow Guidance

Link: https://arxiv.org/abs/2502.14520
Authors: Meng Wang,Fan Wu,Ruihui Li,Yunchuan Qin,Zhuo Tang,Kenli Li
Affiliations: College of Computer Science and Electronic Engineering, Hunan University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-21] A Mobile Robotic Approach to Autonomous Surface Scanning in Legal Medicine

Link: https://arxiv.org/abs/2502.14514
Authors: Sarah Grube,Sarah Latus,Martin Fischer,Vidas Raudonis,Axel Heinemann,Benjamin Ondruschka,Alexander Schlaefer
Affiliations: Unknown
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)
Comments: Submitted and accepted for presentation at CARS 2025. This preprint has not undergone peer review or post-submission revisions. The final version of this work will appear in the official CARS 2025 proceedings.

[CV-22] PLPHP: Per-Layer Per-Head Vision Token Pruning for Efficient Large Vision-Language Models

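[Quick Read]: The paper addresses the inference inefficiency of large vision-language models (LVLMs) caused by processing large numbers of visual tokens. Its solution, Per-Layer Per-Head Vision Token Pruning (PLPHP), is a fine-grained pruning method with two levels: layer-level retention rate allocation and head-level vision token pruning. Retention rates are adjusted dynamically per layer, and pruning is applied per attention head, so each layer and head can independently retain the key context according to how strongly it attends to visual information. This speeds up decoding and shrinks the key-value cache (KV cache).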

Link: https://arxiv.org/abs/2502.14504
Authors: Yu Meng,Kaiyuan Li,Chenran Huang,Chen Gao,Xinlei Chen,Yong Li,Xiaoping Zhang
Affiliations: Shenzhen International Graduate School, Tsinghua University; Tsinghua University; Tongji University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 12 pages, 8 figures

Abstract:Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a range of multimodal tasks. However, their inference efficiency is constrained by the large number of visual tokens processed during decoding. To address this challenge, we propose Per-Layer Per-Head Vision Token Pruning (PLPHP), a two-level fine-grained pruning method including Layer-Level Retention Rate Allocation and Head-Level Vision Token Pruning. Motivated by the Vision Token Re-attention phenomenon across decoder layers, we dynamically adjust token retention rates layer by layer. Layers that exhibit stronger attention to visual information preserve more vision tokens, while layers with lower vision attention are aggressively pruned. Furthermore, PLPHP applies pruning at the attention head level, enabling different heads within the same layer to independently retain critical context. Experiments on multiple benchmarks demonstrate that PLPHP delivers an 18% faster decoding speed and reduces the Key-Value Cache (KV Cache) size by over 50%, all at the cost of 0.46% average performance drop, while also achieving notable performance improvements in multi-image tasks. These results highlight the effectiveness of fine-grained token pruning and contribute to advancing the efficiency and scalability of LVLMs. Our source code will be made publicly available.
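
The two-level pruning idea can be sketched as follows, assuming each head's attention scores over the vision tokens are already available. The step function that maps a layer's share of attention on vision tokens to a retention rate, and all of its thresholds and rates, are hypothetical placeholders. The abstract only states that layers attending more strongly to visual information keep more tokens.

```python
def layer_retention_rate(vision_attn_ratio, low=0.1, high=0.3,
                         rate_low=0.2, rate_mid=0.5, rate_high=0.9):
    """Map a layer's share of attention on vision tokens to a retention rate.

    All thresholds and rates are hypothetical placeholders.
    """
    if vision_attn_ratio >= high:
        return rate_high
    if vision_attn_ratio <= low:
        return rate_low
    return rate_mid


def prune_per_head(head_scores, retention_rate):
    """For each head, keep the indices of its top-scoring vision tokens.

    head_scores: one list per head of attention scores over the vision tokens.
    Returns, per head, the sorted indices of the retained tokens.
    """
    kept = []
    for scores in head_scores:
        k = max(1, round(retention_rate * len(scores)))
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        kept.append(sorted(top))
    return kept


heads = [[0.1, 0.4, 0.2, 0.3, 0.0],   # head 0 scores over 5 vision tokens
         [0.5, 0.1, 0.1, 0.2, 0.1]]   # head 1 scores over the same tokens
kept = prune_per_head(heads, 0.4)      # keep 2 of 5 tokens per head
```

Because each head keeps its own top-k set, different heads in the same layer can retain different tokens, which is the per-head behaviour the abstract describes.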

[CV-23] LXLv2: Enhanced LiDAR Excluded Lean 3D Object Detection with Fusion of 4D Radar and Camera

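[Quick Read]: The paper addresses two shortcomings of its predecessor LXL: inaccurate and inconsistent depth prediction, and reduced robustness caused by concatenation-based fusion. The key solution is a one-to-many depth supervision strategy based on radar points, in which radar cross section (RCS) values adjust the supervision area to achieve object-level depth consistency. It also introduces a channel- and spatial-attention-based fusion module (CSAFusion) to improve feature adaptiveness. Together, these changes improve detection accuracy, inference speed, and robustness.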

Link: https://arxiv.org/abs/2502.14503
Authors: Weiyi Xiong,Zean Zou,Qiuchi Zhao,Fengchun He,Bing Zhu
Affiliations: School of Automation Science and Electrical Engineering, Beihang University, Beijing, P.R. China; Continental Autonomous Mobility (Shanghai) Co., Ltd, Shanghai, P.R. China
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: Accepted by IEEE Robotics and Automation Letters

Abstract:As the previous state-of-the-art 4D radar-camera fusion-based 3D object detection method, LXL utilizes the predicted image depth distribution maps and radar 3D occupancy grids to assist the sampling-based image view transformation. However, the depth prediction lacks accuracy and consistency, and the concatenation-based fusion in LXL impedes the model robustness. In this work, we propose LXLv2, where modifications are made to overcome the limitations and improve the performance. Specifically, considering the position error in radar measurements, we devise a one-to-many depth supervision strategy via radar points, where the radar cross section (RCS) value is further exploited to adjust the supervision area for object-level depth consistency. Additionally, a channel and spatial attention-based fusion module named CSAFusion is introduced to improve feature adaptiveness. Experimental results on the View-of-Delft and TJ4DRadSet datasets show that the proposed LXLv2 can outperform LXL in detection accuracy, inference speed and robustness, demonstrating the effectiveness of the model.
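
A hypothetical way to realise "one-to-many" depth supervision with an RCS-dependent area: each projected radar point assigns its depth to a square pixel window whose radius grows with RCS, on the intuition that stronger reflectors tend to belong to larger objects. The radius rule, the parameters and the "nearer depth wins" tie-break are all assumptions; LXLv2's actual supervision-area formula is not given in the abstract.

```python
def one_to_many_depth_targets(radar_points, height, width,
                              base_radius=1, rcs_scale=0.1, max_radius=4):
    """Spread each projected radar depth over a square window whose size grows with RCS.

    radar_points: list of (u, v, depth, rcs), with (u, v) integer pixel coordinates.
    Returns {(u, v): depth}; where windows overlap, the nearer depth wins.
    base_radius, rcs_scale and max_radius are hypothetical parameters.
    """
    targets = {}
    for u, v, depth, rcs in radar_points:
        radius = min(max_radius, base_radius + int(rcs_scale * max(rcs, 0.0)))
        for dv in range(-radius, radius + 1):
            for du in range(-radius, radius + 1):
                uu, vv = u + du, v + dv
                if 0 <= uu < width and 0 <= vv < height:
                    if (uu, vv) not in targets or depth < targets[(uu, vv)]:
                        targets[(uu, vv)] = depth
    return targets


weak = one_to_many_depth_targets([(5, 5, 20.0, 0.0)], height=10, width=10)     # 3x3 window
strong = one_to_many_depth_targets([(5, 5, 20.0, 20.0)], height=10, width=10)  # 7x7 window
```

Spreading one measurement over several pixels also tolerates the radar's position error, which is the motivation the abstract gives for moving away from one-to-one supervision.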

[CV-24] Nearshore Underwater Target Detection Meets UAV-borne Hyperspectral Remote Sensing: A Novel Hybrid-level Contrastive Learning Framework and Benchmark Dataset

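[Quick Read]: The paper addresses spectral distortions in nearshore environments that hamper underwater target detection (UTD) and degrade the accuracy of traditional hyperspectral underwater target detection (HUTD) methods based on bathymetric models. Its solution is the Hyperspectral Underwater Contrastive Learning Network (HUCLNet), which combines contrastive learning with a self-paced learning strategy for robust HUTD in nearshore regions. HUCLNet uses contrastive learning to extract discriminative features from distorted hyperspectral data and uses the self-paced strategy to prioritize the most informative samples, improving detection accuracy.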

Link: https://arxiv.org/abs/2502.14495
Authors: Jiahao Qi,Chuanhong Zhou,Xingyue Liu,Chen Chen,Dehui Zhu,Kangcheng Bin,Ping Zhong
Affiliations: National Key Laboratory of Science and Technology on Automatic Target Recognition, National University of Defense Technology, Changsha 410073, China
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 18 pages, 13 figures

Abstract:UAV-borne hyperspectral remote sensing has emerged as a promising approach for underwater target detection (UTD). However, its effectiveness is hindered by spectral distortions in nearshore environments, which compromise the accuracy of traditional hyperspectral UTD (HUTD) methods that rely on bathymetric models. These distortions lead to significant uncertainty in target and background spectra, challenging the detection process. To address this, we propose the Hyperspectral Underwater Contrastive Learning Network (HUCLNet), a novel framework that integrates contrastive learning with a self-paced learning paradigm for robust HUTD in nearshore regions. HUCLNet extracts discriminative features from distorted hyperspectral data through contrastive learning, while the self-paced learning strategy selectively prioritizes the most informative samples. Additionally, a reliability-guided clustering strategy enhances the robustness of learned features. To evaluate the method's effectiveness, we construct a novel nearshore HUTD benchmark dataset, ATR2-HUTD, covering three diverse scenarios with varying water types and turbidity, and target types. Extensive experiments demonstrate that HUCLNet significantly outperforms state-of-the-art methods. The dataset and code will be publicly available at: this https URL
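
HUCLNet's self-paced component is not detailed in the abstract. For reference, the classic hard self-paced weighting scheme is sketched below: a sample is included only if its current loss falls under a threshold that grows over training, so easier samples enter first. Which criterion HUCLNet uses to decide that a sample is "most informative" is not specified, so treat this as background rather than the paper's method; the threshold schedule is an assumption.

```python
def self_paced_weights(losses, lam):
    """Classic hard self-paced weighting: include a sample only if its loss is below lam."""
    return [1.0 if loss < lam else 0.0 for loss in losses]


def self_paced_schedule(losses, lam0=0.5, growth=1.5, rounds=3):
    """Weights for a growing threshold lam0, lam0 * growth, ...

    Losses are held fixed here for illustration; in real training they are
    recomputed after each model update.
    """
    lam = lam0
    schedule = []
    for _ in range(rounds):
        schedule.append(self_paced_weights(losses, lam))
        lam *= growth
    return schedule


schedule = self_paced_schedule([0.2, 0.6, 1.0, 3.0])
# thresholds 0.5, 0.75, 1.125 admit one, two, then three samples
```

The hardest sample (loss 3.0) stays excluded throughout, which is how such schemes keep outliers, for example heavily distorted spectra, from dominating early training.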

[CV-25] CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond

Link: https://arxiv.org/abs/2502.14493
Authors: Yukai Shi,Cidan Shi,Zhipeng Weng,Yin Tian,Xiaoyu Xian,Liang Lin
Affiliations: School of Information Engineering, Guangdong University of Technology; CRRC Academy Co., Ltd.; School of Computer Science, Sun Yat-sen University
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: IEEE T-CSVT. We mainly discuss the out-of-distribution challenges in infrared and visible image fusion

[CV-26] Temporal Misalignment and Probabilistic Neurons

Link: https://arxiv.org/abs/2502.14487
Authors: Velibor Bojković,Xiaofeng Wu,Bin Gu
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-27] Integrating Extra Modality Helps Segmentor Find Camouflaged Objects Well

Link: https://arxiv.org/abs/2502.14471
Authors: Chengyu Fang,Chunming He,Longxiang Tang,Yuelin Zhang,Chenyang Zhu,Yuqi Shen,Chubin Chen,Guoxia Xu,Xiu Li
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 12 pages, 5 figures, 6 tables

[CV-28] Single-image Reflectance and Transmittance Estimation from Any Flatbed Scanner

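[Quick Read]: This paper tackles the difficulty of high-resolution material capture with flatbed scanners. Existing methods depend on very specific conditions, such as uniform diffuse illumination, that only some high-end devices provide, which limits their scalability and cost-effectiveness. The key idea is a method inspired by intrinsic image decomposition that accurately removes both shading and specularity, so that captures can be made with any flatbed scanner. The method also extends single-image material reflectance capture by estimating opacity and transmittance, which are important components of full material appearance (SVBSDF). This improves the capture of arbitrary materials with a flatbed scanner at very high resolution and accuracy.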

Link: https://arxiv.org/abs/2502.14462
Authors: Carlos Rodriguez-Pardo,David Pascual-Hernandez,Javier Rodriguez-Vazquez,Jorge Lopez-Moreno,Elena Garces
Affiliations: Politecnico di Milano; Euro-Mediterranean Center on Climate Change (CMCC); RFF-CMCC European Institute on Economics and the Environment (EIEE); Universidad Rey Juan Carlos; Arquimea Research Center; Adobe Research
Categories: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: Accepted to Computers & Graphics

Abstract:Flatbed scanners have emerged as promising devices for high-resolution, single-image material capture. However, existing approaches assume very specific conditions, such as uniform diffuse illumination, which are only available in certain high-end devices, hindering their scalability and cost. In contrast, in this work, we introduce a method inspired by intrinsic image decomposition, which accurately removes both shading and specularity, effectively allowing captures with any flatbed scanner. Further, we extend previous work on single-image material reflectance capture with the estimation of opacity and transmittance, critical components of full material appearance (SVBSDF), improving the results for any material captured with a flatbed scanner, at a very high resolution and accuracy

[CV-29] Exploiting Deblurring Networks for Radiance Fields

Link: https://arxiv.org/abs/2502.14454
Authors: Haeyun Choi,Heemin Yang,Janghyeok Han,Sunghyun Cho
Affiliations: POSTECH; GSAI; CSE
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-30] Stochastic Resonance Improves the Detection of Low Contrast Images in Deep Learning Models

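[Quick Read]: This paper studies stochastic resonance in image classification with rate-based neural networks. A simple LSTM recurrent neural network is trained for digit recognition and classification. At test time, image contrast is reduced until the model can no longer detect the presence of the stimulus. The key finding is that adding controlled noise partially recovers classification performance, which indicates that stochastic resonance occurs in rate-based recurrent neural networks.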

Link: https://arxiv.org/abs/2502.14442
Authors: Siegfried Ludwig
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: MSc Course Project

Abstract:Stochastic resonance describes the utility of noise in improving the detectability of weak signals in certain types of systems. It has been observed widely in natural and engineered settings, but its utility in image classification with rate-based neural networks has not been studied extensively. In this analysis a simple LSTM recurrent neural network is trained for digit recognition and classification. During the test phase, image contrast is reduced to a point where the model fails to recognize the presence of a stimulus. Controlled noise is added to partially recover classification performance. The results indicate the presence of stochastic resonance in rate-based recurrent neural networks.
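
The mechanism can be shown in a deliberately simplified setting that uses a hard threshold detector instead of an LSTM. A stimulus too weak to cross the threshold on its own is invisible without noise, becomes partially detectable with moderate noise, and is washed out again by heavy noise. This toy sketch is an assumption for illustration: the square-wave stimulus, the threshold of 1.0, and the Gaussian noise levels are all chosen for the example, and the sketch does not reproduce the paper's experiment.

```python
import random
from typing import List


def threshold_detector(signal: List[float], noise_std: float, threshold: float,
                       rng: random.Random) -> List[int]:
    """Emit 1 whenever the signal plus Gaussian noise exceeds the threshold, else 0."""
    return [1 if s + rng.gauss(0.0, noise_std) > threshold else 0 for s in signal]


def correlation(a: List[float], b: List[int]) -> float:
    """Pearson correlation between stimulus and detector output (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def resonance_curve(noise_levels: List[float], seed: int = 0) -> List[float]:
    """Stimulus/output correlation for each noise level, using a sub-threshold square wave."""
    # Weak "low-contrast" stimulus: peaks at 0.5, below the detector threshold of 1.0.
    signal = [0.5 if (i // 10) % 2 == 0 else 0.0 for i in range(4000)]
    return [correlation(signal, threshold_detector(signal, std, 1.0, random.Random(seed)))
            for std in noise_levels]


if __name__ == "__main__":
    for std, corr in zip([0.0, 0.4, 5.0], resonance_curve([0.0, 0.4, 5.0])):
        print(f"noise std={std}: correlation={corr:.3f}")
```

The correlation is zero without noise, peaks at an intermediate noise level, and falls again when the noise dominates. This rise-then-fall shape is the signature of stochastic resonance.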

[CV-31] Daily Land Surface Temperature Reconstruction in Landsat Cross-Track Areas Using Deep Ensemble Learning With Uncertainty Quantification

Link: https://arxiv.org/abs/2502.14433
Authors: Shengjie Liu,Siqin Wang,Lu Zhang
Affiliations: University of Southern California; Keck School of Medicine, University of Southern California
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-32] ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model

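[Quick Read]: This paper addresses why large language models cannot replicate the unified cognitive ability of humans. It focuses on two key challenges in vision-language-action (VLA) models: spurious forgetting and task interference. The key solution is ChatVLA, a framework that uses Phased Alignment Training to integrate multimodal data incrementally and a Mixture-of-Experts architecture to minimize task interference.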

Link: https://arxiv.org/abs/2502.14420
Authors: Zhongyi Zhou,Yichen Zhu,Minjie Zhu,Junjie Wen,Ning Liu,Zhiyuan Xu,Weibin Meng,Ran Cheng,Yaxin Peng,Chaomin Shen,Feifei Feng
Affiliations: Midea Group; East China Normal University; Shanghai University; Beijing Innovation Center of Humanoid Robotics; Tsinghua University
Categories: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments:

Abstract:Humans possess a unified cognitive ability to perceive, comprehend, and interact with the physical world. Why can’t large language models replicate this holistic understanding? Through a systematic analysis of existing training paradigms in vision-language-action models (VLA), we identify two key challenges: spurious forgetting, where robot training overwrites crucial visual-text alignments, and task interference, where competing control and understanding tasks degrade performance when trained jointly. To overcome these limitations, we propose ChatVLA, a novel framework featuring Phased Alignment Training, which incrementally integrates multimodal data after initial control mastery, and a Mixture-of-Experts architecture to minimize task interference. ChatVLA demonstrates competitive performance on visual question-answering datasets and significantly surpasses state-of-the-art vision-language-action (VLA) methods on multimodal understanding benchmarks. Notably, it achieves a six times higher performance on MMMU and scores 47.2% on MMStar with a more parameter-efficient design than ECoT. Furthermore, ChatVLA demonstrates superior performance on 25 real-world robot manipulation tasks compared to existing VLA methods like OpenVLA. Our findings highlight the potential of our unified framework for achieving both robust multimodal understanding and effective robot control.
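
The summary does not say how ChatVLA's Mixture-of-Experts routes its inputs. For readers unfamiliar with the idea, a generic top-k gated MoE layer works roughly as follows: a gate scores each expert, only the top-k experts run, and their outputs are mixed with softmax weights. In this sketch the gate logits are passed in directly, whereas a real model computes them from the input with a learned router. The two toy experts are hypothetical stand-ins, not ChatVLA's components.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(x: Sequence[float], gate_logits: Sequence[float],
              experts: List[Expert], k: int = 1) -> List[float]:
    """Route x to the k highest-scoring experts and mix their outputs with renormalized gate weights."""
    chosen = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy experts standing in for a "control" branch and an "understanding" branch.
experts: List[Expert] = [
    lambda v: [2.0 * t for t in v],
    lambda v: [-t for t in v],
]

if __name__ == "__main__":
    print(top_k_moe([1.0, 2.0], [2.0, 0.5], experts, k=1))  # only expert 0 runs
    print(top_k_moe([1.0, 2.0], [0.0, 0.0], experts, k=2))  # equal mix of both experts
```

Because each input activates only its selected experts, parameters serving one task are not updated by gradients from inputs routed elsewhere. This is the intuition behind using MoE to reduce task interference.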

[CV-33] Evaluating Precise Geolocation Inference Capabilities of Vision Language Models AAAI2025

Link: https://arxiv.org/abs/2502.14412
Authors: Neel Jay,Hieu Minh Nguyen,Trung Dung Hoang,Jacob Haimes
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Comments: AAAI 2025 Workshop DATASAFE

[CV-34] PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data

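[Quick Read]: This paper addresses the challenges of photo doodling. Decorative elements must blend seamlessly into a photograph, the background must be preserved without distortion, and the artist's unique style must be captured efficiently. The key is a two-stage training strategy. First, a general-purpose image editing model, OmniEditor, is trained on large-scale data. It is then fine-tuned with EditLoRA on a small, artist-curated set of before-and-after image pairs to capture specific editing styles and techniques. A positional encoding reuse mechanism is also introduced to improve the consistency of the generated results.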

Link: https://arxiv.org/abs/2502.14397
Authors: Shijie Huang,Yiren Song,Yuxuan Zhang,Hailong Guo,Xueyin Wang,Mike Zheng Shou,Jiaming Liu
Affiliations: National University of Singapore; Shanghai Jiao Tong University; Beijing University of Posts and Telecommunications; Byte Dance; Tiamat
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:We introduce PhotoDoodle, a novel image editing framework designed to facilitate photo doodling by enabling artists to overlay decorative elements onto photographs. Photo doodling is challenging because the inserted elements must appear seamlessly integrated with the background, requiring realistic blending, perspective alignment, and contextual coherence. Additionally, the background must be preserved without distortion, and the artist’s unique style must be captured efficiently from limited training data. These requirements are not addressed by previous methods that primarily focus on global style transfer or regional inpainting. The proposed method, PhotoDoodle, employs a two-stage training strategy. Initially, we train a general-purpose image editing model, OmniEditor, using large-scale data. Subsequently, we fine-tune this model with EditLoRA using a small, artist-curated dataset of before-and-after image pairs to capture distinct editing styles and techniques. To enhance consistency in the generated results, we introduce a positional encoding reuse mechanism. Additionally, we release a PhotoDoodle dataset featuring six high-quality styles. Extensive experiments demonstrate the advanced performance and robustness of our method in customized image editing, opening new possibilities for artistic creation.
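
EditLoRA builds on low-rank adaptation (LoRA), but its exact configuration (rank, which layers are adapted) is not given here. As background, standard LoRA keeps the pretrained weight frozen and learns only a low-rank correction. This is why a small artist-curated dataset can suffice for fine-tuning. The following is a minimal pure-Python sketch of the LoRA forward pass. It assumes the usual `alpha / r` scaling and zero initialization of `B`. It is generic LoRA, not PhotoDoodle's code.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """y = W x + (alpha / r) * B (A x); W is frozen, only A (r x d_in) and B (d_out x r) are trained."""
    r = len(a)  # LoRA rank = number of rows of A
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / r
    return [y0 + scale * dy for y0, dy in zip(base, delta)]


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in the low-rank update, versus d_in * d_out for full fine-tuning."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # frozen pretrained weight (identity for clarity)
    a = [[1.0, 1.0]]              # rank-1 down-projection
    print(lora_forward(w, a, [[0.0], [0.0]], [1.0, 2.0], alpha=1.0))  # B = 0: unchanged output
    print(lora_forward(w, a, [[1.0], [0.0]], [1.0, 2.0], alpha=1.0))  # after B is learned
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen model exactly. With `d_in = d_out = 4096` and `r = 8`, the update has 65,536 trainable parameters instead of about 16.8 million.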

[CV-35] RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers

Link: https://arxiv.org/abs/2502.14377
Authors: Ke Cao,Jing Wang,Ao Ma,Jiasong Feng,Zhanjie Zhang,Xuanhua He,Shanyuan Liu,Bo Cheng,Dawei Leng,Yuhui Yin,Jie Zhang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 15 pages, 9 figures

[CV-36] CrossVTON: Mimicking the Logic Reasoning on Cross-category Virtual Try-on guided by Tri-zone Priors

Link: https://arxiv.org/abs/2502.14373
Authors: Donghao Luo,Yujie Liang,Xu Peng,Xiaobin Hu,Boyuan Jiang,Chengming Xu,Taisong Jin,Chengjie Wang,Yanwei Fu
Affiliations: Tencent; Fudan University; Xiamen University
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-37] PPO-MI: Efficient Black-Box Model Inversion via Proximal Policy Optimization ICML2025

Link: https://arxiv.org/abs/2502.14370
Authors: Xinpeng Shou
Affiliations: Unknown
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Comments: 6 pages, submitting to ICML 2025

[CV-38] Weed Detection using Convolutional Neural Network

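[Quick Read]: This paper addresses weed detection in agricultural land. The key is the use of convolutional neural networks (CNNs), in particular the Conv2d and dilated Conv2d layer types. Pre-trained models extract features from the input images and are then fine-tuned for weed detection. Experiments on a large dataset of 15,336 segments show a detection accuracy of 94%. The work is relevant to reducing the use of toxic herbicides and improving the efficiency of weed management in agriculture.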

Link: https://arxiv.org/abs/2502.14360
Authors: Santosh Kumar Tripathi,Shivendra Pratap Singh,Devansh Sharma,Harshavardhan U Patekar
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:In this paper we use convolutional neural networks (CNNs) for weed detection in agricultural land. We specifically investigate the application of two CNN layer types, Conv2d and dilated Conv2d, for weed detection in crop fields. The suggested method extracts features from the input photos using pre-trained models, which are subsequently adjusted for weed detection. The findings of the experiment, which used a sizable dataset of 15336 segments (3249 of soil, 7376 of soybean, 3520 of grass, and 1191 of broadleaf weeds), show that the suggested approach can accurately and successfully detect weeds at an accuracy of 94%. This study has significant ramifications for lowering the usage of toxic herbicides and increasing the effectiveness of weed management in agriculture.
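
The abstract compares standard and dilated Conv2d layers. The difference is easiest to see in one dimension. A dilated kernel samples its input with gaps of size `dilation`, which enlarges the receptive field without adding weights. Conv2d applies the same idea along both spatial axes. This pure-Python sketch (stride 1, no padding) is illustrative only and is not the paper's network.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int = 1) -> List[float]:
    """'Valid' 1D dilated cross-correlation (what deep learning Conv layers compute), stride 1, no padding."""
    span = (len(kernel) - 1) * dilation + 1  # input positions covered by one output
    return [sum(kernel[j] * x[i + j * dilation] for j in range(len(kernel)))
            for i in range(len(x) - span + 1)]


def effective_kernel_size(k: int, dilation: int) -> int:
    """Extent of a k-tap kernel with the given dilation: k + (k - 1) * (dilation - 1)."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7]
    print(dilated_conv1d(signal, [1, 1, 1], dilation=1))  # ordinary convolution
    print(dilated_conv1d(signal, [1, 1, 1], dilation=2))  # same 3 weights, wider receptive field
```

With `dilation=2`, the three weights cover five input positions. Stacking such layers lets a network see larger field context around each plant at no extra parameter cost.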

[CV-39] Triply Laplacian Scale Mixture Modeling for Seismic Data Noise Suppression

Link: https://arxiv.org/abs/2502.14355
Authors: Sirui Pan(1),Zhiyuan Zha(1),Shigang Wang(1),Yue Li(1),Zipei Fan(2),Gang Yan(3),Binh T. Nguyen(4),Bihan Wen(5),Ce Zhu(6) ((1) College of Communication Engineering, Jilin University, (2) School of Artificial Intelligence, Jilin University, (3) College of Computer Science and Technology, Jilin University, (4) Department of Computer Science, Faculty of Mathematics and Computer Science, University of Science, Vietnam National University, (5) School of Electrical and Electronic Engineering, Nanyang Technological University, (6) Glasgow College, University of Electronic Science and Technology of China)
Affiliations: College of Communication Engineering, Jilin University, Changchun 130012, China; School of Artificial Intelligence, Jilin University, Changchun 130012, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Computer Science, Faculty of Mathematics and Computer Science, University of Science, Vietnam National University Ho Chi Minh City, Ho Chi Minh City 700000, Vietnam; School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798; Glasgow College, University of Electronic Science and Technology of China, Chengdu 611731, China
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-40] SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images

Link: https://arxiv.org/abs/2502.14351
Authors: Yichi Zhang,Le Xue,Wenbo Zhang,Lanlan Li,Yuchen Liu,Chen Jiang,Yuan Cheng,Yuan Qi
Affiliations: Fudan University; Shanghai Academy of Artificial Intelligence for Science
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-41] Towards Accurate Binary Spiking Neural Networks: Learning with Adaptive Gradient Modulation Mechanism AAAI

Link: https://arxiv.org/abs/2502.14344
Authors: Yu Liang,Wenjie Wei,Ammar Belatreche,Honglin Cao,Zijian Zhou,Shuai Wang,Malu Zhang,Yang Yang
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 9 pages, 8 figures, AAAI conference

[CV-42] A Collaborative Jade Recognition System for Mobile Devices Based on Lightweight and Large Models

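[Quick Read]: This paper addresses the limited computing resources, real-time requirements, and accuracy issues that mobile devices face when running a jade recognition system. The key is a jade recognition system based on size model collaboration. The system uses a size model based on multi-scale image processing to extract key visual information such as a jade's dimensions, shape, and surface texture. It combines deep learning with traditional computer vision algorithms in a collaborative multi-model classification framework. This framework selects and adjusts models according to the characteristics of each jade, which yields high recognition accuracy across environments while keeping processing fast and computational cost low.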

Link: https://arxiv.org/abs/2502.14332
Authors: Zhenyu Wang,Wenjia Li,Pengyu Zhu
Affiliations: North China Electric Power University, Beijing, China
Categories: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Comments:

Abstract:With the widespread adoption and development of mobile devices, vision-based recognition applications have become a hot topic in research. Jade, as an important cultural heritage and artistic item, has significant applications in fields such as jewelry identification and cultural relic preservation. However, existing jade recognition systems still face challenges in mobile implementation, such as limited computing resources, real-time requirements, and accuracy issues. To address these challenges, this paper proposes a jade recognition system based on size model collaboration, aiming to achieve efficient and accurate jade identification using mobile devices such as this http URL, we design a size model based on multi-scale image processing, extracting key visual information by analyzing jade’s dimensions, shapes, and surface textures. Then, a collaborative multi-model classification framework is built by combining deep learning and traditional computer vision algorithms. This framework can effectively select and adjust models based on different jade characteristics, providing high accuracy results across various environments and this http URL results show that the proposed system can provide high recognition accuracy and fast processing time on mobile devices, while consuming relatively low computational resources. The system not only holds great application potential but also provides new ideas and technical support for the intelligent development of jade identification.
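
The abstract does not specify how the framework chooses between models. One common way to pair a lightweight on-device model with a larger model is a confidence cascade, and the sketch below assumes that pattern. The 0.8 threshold and the toy stand-in models are hypothetical. The sketch should not be read as the paper's selection rule.

```python
from typing import Callable, Tuple

Prediction = Tuple[str, float]  # (label, confidence)
Model = Callable[[object], Prediction]


def cascade_predict(image: object, light: Model, heavy: Model,
                    threshold: float = 0.8) -> Tuple[str, float, str]:
    """Try the cheap on-device model first; defer to the larger model only when its confidence is low."""
    label, conf = light(image)
    if conf >= threshold:
        return label, conf, "light"
    label, conf = heavy(image)
    return label, conf, "heavy"


# Hypothetical stand-ins for a lightweight and a large model.
def light_model(image: object) -> Prediction:
    # Confident on "easy" inputs, unsure otherwise.
    return ("jadeite", 0.95) if image == "easy" else ("jadeite", 0.55)


def heavy_model(image: object) -> Prediction:
    return ("nephrite", 0.9)


if __name__ == "__main__":
    for sample in ("easy", "hard"):
        print(sample, cascade_predict(sample, light_model, heavy_model))
```

The expensive model runs only on inputs the light model is unsure about. This is one way to keep average latency and compute low on a phone while still handling difficult samples.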

[CV-43] Textured 3D Regenerative Morphing with 3D Diffusion Prior

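[Quick Read]: This paper addresses how to generate smooth and plausible morphing sequences between textured 3D objects, covering changes in both shape and texture. Traditional methods rely on explicit point-to-point correspondences and on determining smooth deformation trajectories. This restricts them to shape-only morphing on untextured, topologically aligned datasets, and it leads to labor-intensive preprocessing and poor generalization. To overcome these problems, the paper proposes 3D regenerative morphing with a 3D diffusion prior. The key is a 3D diffusion model in which source and target information is interpolated at three levels (initial noise, model parameters, and condition features). An attention fusion strategy is then explored to produce smoother morphing sequences. Two further strategies, Token Reordering and Low-Frequency Enhancement, improve the plausibility of the semantic interpolation and of the generated 3D surfaces.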

Link: https://arxiv.org/abs/2502.14316
Authors: Songlin Yang,Yushi Lan,Honghua Chen,Xingang Pan
Affiliations: S-Lab, Nanyang Technological University
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

Abstract:Textured 3D morphing creates smooth and plausible interpolation sequences between two 3D objects, focusing on transitions in both shape and texture. This is important for creative applications like visual effects in filmmaking. Previous methods rely on establishing point-to-point correspondences and determining smooth deformation trajectories, which inherently restrict them to shape-only morphing on untextured, topologically aligned datasets. This restriction leads to labor-intensive preprocessing and poor generalization. To overcome these challenges, we propose a method for 3D regenerative morphing using a 3D diffusion prior. Unlike previous methods that depend on explicit correspondences and deformations, our method eliminates the additional need for obtaining correspondence and uses the 3D diffusion prior to generate morphing. Specifically, we introduce a 3D diffusion model and interpolate the source and target information at three levels: initial noise, model parameters, and condition features. We then explore an Attention Fusion strategy to generate more smooth morphing sequences. To further improve the plausibility of semantic interpolation and the generated 3D surfaces, we propose two strategies: (a) Token Reordering, where we match approximate tokens based on semantic analysis to guide implicit correspondences in the denoising process of the diffusion model, and (b) Low-Frequency Enhancement, where we enhance low-frequency signals in the tokens to improve the quality of generated surfaces. Experimental results show that our method achieves superior smoothness and plausibility in 3D morphing across diverse cross-category object pairs, offering a novel regenerative method for 3D morphing with textured representations.
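
The method interpolates between source and target at the noise, parameter, and condition levels, but the abstract does not say which interpolation rule it uses at each level. This sketch assumes a common convention in diffusion work. It uses spherical interpolation (slerp) for Gaussian noise, because slerp keeps the interpolated noise at a typical norm. It uses plain linear interpolation for weights and features.

```python
import math
from typing import List

Vector = List[float]


def lerp(a: Vector, b: Vector, t: float) -> Vector:
    """Linear interpolation, e.g. for model parameters or condition features."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a: Vector, b: Vector, t: float) -> Vector:
    """Spherical linear interpolation, a common choice for Gaussian diffusion noise."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    theta = math.acos(cos_theta)
    if theta < 1e-6:  # nearly parallel vectors: fall back to lerp
        return lerp(a, b, t)
    s = math.sin(theta)
    w_a = math.sin((1 - t) * theta) / s
    w_b = math.sin(t * theta) / s
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def norm(v: Vector) -> float:
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    src_noise, tgt_noise = [1.0, 0.0], [0.0, 1.0]
    print(norm(slerp(src_noise, tgt_noise, 0.5)), norm(lerp(src_noise, tgt_noise, 0.5)))
```

Halfway between two orthogonal unit vectors, slerp stays on the unit sphere. Lerp, by contrast, shrinks the norm to about 0.707, which can push the starting noise away from the distribution the denoiser was trained on.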

[CV-44] ODVerse33: Is the New YOLO Version Always Better? A Multi Domain benchmark from YOLO v5 to v11

Link: https://arxiv.org/abs/2502.14314
Authors: Tianyou Jiang,Yang Zhong
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 18 pages, 4 figures, 7 tables

[CV-45] PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

Link: https://arxiv.org/abs/2502.14282
Authors: Haowei Liu,Xi Zhang,Haiyang Xu,Yuyang Wanyan,Junyang Wang,Ming Yan,Ji Zhang,Chunfeng Yuan,Changsheng Xu,Weiming Hu,Fei Huang
Affiliations: MAIS, Institute of Automation, Chinese Academy of Sciences, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, China; Alibaba Group; Beijing Jiaotong University; School of Information Science and Technology, ShanghaiTech University, China
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 14 pages, 7 figures

[CV-46] OrchardDepth: Precise Metric Depth Estimation of Orchard Scene from Monocular Camera Images

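[Quick Read]: This paper addresses metric depth estimation from a monocular camera in orchard/vineyard environments. Existing research focuses mainly on urban settings to improve autonomous driving and overlooks the needs of agriculture. The paper proposes OrchardDepth to fill this gap, together with a new retraining method that improves training by monitoring the consistency regularization between dense depth maps and sparse points. The method reduces the RMSE of depth estimation in the orchard environment from 1.5337 to 0.6738, which demonstrates its effectiveness.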

Link: https://arxiv.org/abs/2502.14279
Authors: Zhichao Zheng,Henry Williams,Bruce A MacDonald
Affiliations: The University of Auckland
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 10 pages, 5 figures, Australasian Conference on Robotics and Automation, ACRA, 2024

Abstract:Monocular depth estimation is a rudimentary task in robotic perception. Recently, with the development of more accurate and robust neural network models and different types of datasets, monocular depth estimation has significantly improved performance and efficiency. However, most of the research in this area focuses on very concentrated domains. In particular, most of the benchmarks in outdoor scenarios belong to urban environments for the improvement of autonomous driving devices, and these benchmarks have a massive disparity with the orchard/vineyard environment, which is hardly helpful for research in the primary industry. Therefore, we propose OrchardDepth, which fills the gap in the estimation of the metric depth of the monocular camera in the orchard/vineyard environment. In addition, we present a new retraining method to improve the training result by monitoring the consistent regularization between dense depth maps and sparse points. Our method improves the RMSE of depth estimation in the orchard environment from 1.5337 to 0.6738, proving our method’s validation.
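
Both the reported metric and the dense-versus-sparse consistency idea come down to comparing a dense depth prediction with depth values at sparse valid points. The following is a minimal sketch. It assumes that sparse ground truth is stored as a map with `None` at pixels that have no measurement. The paper's exact regularizer is not specified here.

```python
import math
from typing import List, Optional

DepthMap = List[List[float]]
SparseDepth = List[List[Optional[float]]]  # None where no sparse (e.g. LiDAR) point exists


def sparse_rmse(pred: DepthMap, sparse_gt: SparseDepth) -> float:
    """RMSE between a dense predicted depth map and sparse ground truth, over valid pixels only."""
    sq_errors = [(p - g) ** 2
                 for pred_row, gt_row in zip(pred, sparse_gt)
                 for p, g in zip(pred_row, gt_row)
                 if g is not None]
    if not sq_errors:
        raise ValueError("no valid sparse points")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def relative_reduction(before: float, after: float) -> float:
    """Fractional error reduction between two RMSE values."""
    return 1.0 - after / before


if __name__ == "__main__":
    pred = [[1.0, 2.0], [3.0, 4.0]]
    gt = [[1.0, None], [None, 6.0]]
    print(sparse_rmse(pred, gt))
    print(f"{relative_reduction(1.5337, 0.6738):.1%}")
```

By this measure, the reported drop from 1.5337 to 0.6738 corresponds to roughly a 56% reduction in RMSE.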

[CV-47] LLM-EvRep: Learning an LLM-Compatible Event Representation Using a Self-Supervised Framework WWW

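[Quick Read]: This paper addresses the dependence of existing event-based visual recognition methods on extensive training, which limits their adaptability and efficiency when processing event-driven visual content. The key solution is LLM-EvGen, an event representation generator that produces LLM-compatible event representations (LLM-EvRep) to improve the performance of large language models (LLMs) on event recognition tasks. The generator is trained with a self-supervised framework that keeps the generated representations semantically consistent and structurally faithful.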

Link: https://arxiv.org/abs/2502.14273
Authors: Zongyou Yu,Qiang Qu,Qian Zhang,Nan Zhang,Xiaoming Chen
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Comments: 6 pages, 2 figures, Companion Proceedings of the ACM Web Conference 2025 (WWW Companion '25)

Abstract:Recent advancements in event-based recognition have demonstrated significant promise, yet most existing approaches rely on extensive training, limiting their adaptability for efficient processing of event-driven visual content. Meanwhile, large language models (LLMs) have exhibited remarkable zero-shot capabilities across diverse domains, but their application to event-based visual recognition remains largely unexplored. To bridge this gap, we propose LLM-EvGen, an event representation generator that produces LLM-compatible event representations (LLM-EvRep), thereby enhancing the performance of LLMs on event recognition tasks. The generator is trained using a self-supervised framework, aligning the generated representations with semantic consistency and structural fidelity. Comprehensive experiments were conducted on three datasets: N-ImageNet, N-Caltech101, and N-MNIST. The results demonstrate that our method, LLM-EvRep, outperforms the event-to-video method, E2VID, by 15.93%, 0.82%, and 50.21%, respectively, in recognition tasks when evaluated using GPT-4o.
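
Event cameras output asynchronous events (pixel position, timestamp, polarity) rather than frames. A frame-based model, including an LLM with a vision encoder, therefore needs the events converted into an image-like representation first. LLM-EvGen learns this conversion with self-supervision, and that learned generator is not shown here. The hand-crafted two-channel count image below is only a common baseline representation. It makes the input and output of such a converter concrete.

```python
from typing import List, Tuple

Event = Tuple[int, int, float, int]  # (x, y, timestamp, polarity in {-1, +1})


def event_histogram(events: List[Event], width: int, height: int) -> List[List[List[int]]]:
    """Count events per pixel into two channels: index 0 for positive, index 1 for negative polarity."""
    hist = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, _t, polarity in events:
        channel = 0 if polarity > 0 else 1
        hist[channel][y][x] += 1
    return hist


if __name__ == "__main__":
    stream = [(0, 0, 0.1, 1), (0, 0, 0.2, 1), (1, 1, 0.3, -1)]
    print(event_histogram(stream, width=2, height=2))
```

A representation like this discards timing information within the window. That loss is one motivation for learning a representation that is tailored to what the downstream LLM can read.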

[CV-48] Money Recognition for the Visually Impaired: A Case Study on Sri Lankan Banknotes

Link: https://arxiv.org/abs/2502.14267
Authors: Akshaan Bandara
Affiliations: Informatics Institute of Technology
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-49] Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation

Link: https://arxiv.org/abs/2502.14247
Authors: Jiayu Yang,Taizhang Shang,Weixuan Sun,Xibin Song,Ziang Chen,Senbo Wang,Shenzhou Chen,Weizhe Liu,Hongdong Li,Pan Ji
Affiliations: Tencent XR Vision Labs; The Australian National University
Categories: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments: Tencent XR 3D Gen

[CV-50] OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving

Link: https://arxiv.org/abs/2502.14235
Authors: Yedong Shen,Xinran Zhang,Yifan Duan,Shiqi Zhang,Heng Li,Yilong Wu,Jianmin Ji,Yanyong Zhang
Affiliations: School of Computer Science and Technology, University of Science and Technology of China; School of Artificial Intelligence and Data Science, University of Science and Technology of China
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

[CV-51] Designing Parameter and Compute Efficient Diffusion Transformers using Distillation

Link: https://arxiv.org/abs/2502.14226
Authors: Vignesh Sundaresha
Affiliations: University of Illinois Urbana Champaign
Categories: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Comments: 4 pages

[CV-52] H3DE-Net: Efficient and Accurate 3D Landmark Detection in Medical Imaging

Link: https://arxiv.org/abs/2502.14221
Authors: Zhen Huang,Ronghao Xu,Xiaoqian Zhou,Yangbo Wei,Suhua Wang,Xiaoxin Sun,Han Li,Qingsong Yao
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-53] Asymmetric Co-Training for Source-Free Few-Shot Domain Adaptation

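[Quick Read]: This paper addresses the practical limitations of source-free unsupervised domain adaptation (SFUDA). These limitations appear especially when the target data violate the assumption of a closed-set label distribution identical to the source, or when not enough unlabeled target data is available. The paper therefore proposes an asymmetric co-training (ACT) method for the source-free few-shot domain adaptation (SFFSDA) setting. The key is weak-strong augmentation to increase data diversity, followed by a two-step optimization. The first step optimizes a label-smoothing cross-entropy loss, the entropy of the class-conditional distribution, and a reverse-entropy loss, which strengthens the model's discriminative ability and reduces overfitting. The second step reduces redundancy in the output space by minimizing classifier determinacy disparity. Experiments show that ACT outperforms existing SFUDA methods and transfer learning techniques.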

Link: https://arxiv.org/abs/2502.14214
Authors: Gengxu Li,Yuan Wu
Affiliations: School of Artificial Intelligence, Jilin University
Categories: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Comments: 13 pages

Abstract:Source-free unsupervised domain adaptation (SFUDA) has gained significant attention as an alternative to traditional unsupervised domain adaptation (UDA), which relies on the constant availability of labeled source data. However, SFUDA approaches come with inherent limitations that are frequently overlooked. These challenges include performance degradation when the unlabeled target data fails to meet critical assumptions, such as having a closed-set label distribution identical to that of the source domain, or when sufficient unlabeled target data is unavailable-a common situation in real-world applications. To address these issues, we propose an asymmetric co-training (ACT) method specifically designed for the SFFSDA scenario. SFFSDA presents a more practical alternative to SFUDA, as gathering a few labeled target instances is more feasible than acquiring large volumes of unlabeled target data in many real-world contexts. Our ACT method begins by employing a weak-strong augmentation to enhance data diversity. Then we use a two-step optimization process to train the target model. In the first step, we optimize the label smoothing cross-entropy loss, the entropy of the class-conditional distribution, and the reverse-entropy loss to bolster the model’s discriminative ability while mitigating overfitting. The second step focuses on reducing redundancy in the output space by minimizing classifier determinacy disparity. Extensive experiments across four benchmarks demonstrate the superiority of our ACT approach, which outperforms state-of-the-art SFUDA methods and transfer learning techniques. Our findings suggest that adapting a source pre-trained model using only a small amount of labeled target data offers a practical and dependable solution. The code is available at this https URL.
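
Two of the loss terms named above have compact standard forms. The sketch assumes the usual label-smoothing definition. For classifier determinacy disparity, it assumes the definition used in bi-classifier domain adaptation work: the probability mass that two classifiers assign to different classes, that is, the off-diagonal sum of the outer product of their predictions. ACT's exact formulation and weighting are not reproduced here, and neither are its remaining terms (class-conditional entropy, reverse entropy).

```python
import math
from typing import List


def label_smoothing_ce(probs: List[float], target: int, eps: float = 0.1) -> float:
    """Cross-entropy against a smoothed target: 1 - eps + eps/K on the true class, eps/K elsewhere.

    probs must be strictly positive (as produced by a softmax).
    """
    k = len(probs)
    return -sum(((1.0 - eps) * (1.0 if c == target else 0.0) + eps / k) * math.log(p)
                for c, p in enumerate(probs))


def determinacy_disparity(p1: List[float], p2: List[float]) -> float:
    """Off-diagonal mass of the outer product p1 p2^T, i.e. 1 - sum_c p1[c] * p2[c]."""
    return 1.0 - sum(a * b for a, b in zip(p1, p2))


if __name__ == "__main__":
    print(label_smoothing_ce([0.7, 0.2, 0.1], target=0))
    print(determinacy_disparity([0.5, 0.5], [0.5, 0.5]))
```

The smoothed target penalizes over-confident predictions, which is how it helps against overfitting when only a few labeled target samples exist. The disparity is zero only when both classifiers put all their mass on the same class, so minimizing it pushes them toward confident, agreeing predictions.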

[CV-54] Spatial and Frequency Domain Adaptive Fusion Network for Image Deblurring

Link: https://arxiv.org/abs/2502.14209
Authors: Hu Gao,Depeng Dang
Affiliations: School of Artificial Intelligence, Beijing Normal University, Beijing, China
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-55] Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition

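[Quick Read]: This paper addresses the need for mobile robots to understand natural language so that they can identify locations accurately and carry out tasks such as package delivery. Traditional visual place recognition (VPR) methods rely only on single-view visual information and cannot interpret human language descriptions. To overcome this, the paper proposes Text4VPR, a multi-view (360° views of the surroundings) text-vision registration approach for place recognition. It is the first method to match a database of images using textual descriptions alone. The key is to extract global textual embeddings with a frozen T5 language model and to aggregate visual descriptors by assigning local tokens to their clusters with the Sinkhorn algorithm with a temperature coefficient. At inference time, Text4VPR uses Cascaded Cross-Attention Cosine Alignment (CCCA) to resolve the internal mismatch between text and image groups, which enables precise place matching based on the descriptions of text-image groups.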

Link: https://arxiv.org/abs/2502.14195
Authors: Tianyi Shang,Zhenyu Li,Pengjie Xu,Jinwei Qiao,Gang Chen,Zihan Ruan,Weijun Hu
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV)
Comments: 8 pages, 4 figures, conference

Abstract:Mobile robots necessitate advanced natural language understanding capabilities to accurately identify locations and perform tasks such as package delivery. However, traditional visual place recognition (VPR) methods rely solely on single-view visual information and cannot interpret human language descriptions. To overcome this challenge, we bridge text and vision by proposing a multiview (360° views of the surroundings) text-vision registration approach called Text4VPR for place recognition task, which is the first method that exclusively utilizes textual descriptions to match a database of images. Text4VPR employs the frozen T5 language model to extract global textual embeddings. Additionally, it utilizes the Sinkhorn algorithm with temperature coefficient to assign local tokens to their respective clusters, thereby aggregating visual descriptors from images. During the training stage, Text4VPR emphasizes the alignment between individual text-image pairs for precise textual description. In the inference stage, Text4VPR uses the Cascaded Cross-Attention Cosine Alignment (CCCA) to address the internal mismatch between text and image groups. Subsequently, Text4VPR performs precisely place match based on the descriptions of text-image groups. On Street360Loc, the first text to image VPR dataset we created, Text4VPR builds a robust baseline, achieving a leading top-1 accuracy of 57% and a leading top-10 accuracy of 92% within a 5-meter radius on the test set, which indicates that localization from textual descriptions to images is not only feasible but also holds significant potential for further advancement, as shown in Figure 1.
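
The Sinkhorn step can be sketched as plain Sinkhorn-Knopp normalization. A temperature `tau` sharpens the token-to-cluster scores, and alternating row and column normalizations turn them into a soft assignment in which every cluster receives an equal share of the tokens. Text4VPR's actual variant may differ, for example in its marginals or through an extra "dustbin" cluster. The uniform marginals and the iteration count used here are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(scores: Matrix, tau: float = 0.1, n_iters: int = 100) -> Matrix:
    """Soft-assign n tokens to m clusters so each token sends 1/n of the mass and each cluster receives 1/m."""
    n, m = len(scores), len(scores[0])
    peak = max(max(row) for row in scores)
    # Temperature-scaled kernel; subtracting the peak avoids overflow in exp.
    p = [[math.exp((s - peak) / tau) for s in row] for row in scores]
    for _ in range(n_iters):
        for i in range(n):  # row normalization: each token distributes 1/n
            row_sum = sum(p[i])
            p[i] = [v / (row_sum * n) for v in p[i]]
        for j in range(m):  # column normalization: each cluster receives 1/m
            col_sum = sum(p[i][j] for i in range(n))
            for i in range(n):
                p[i][j] /= col_sum * m
    return p


if __name__ == "__main__":
    token_cluster_scores = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.1, 0.2, 0.7], [0.8, 0.3, 0.2]]
    for row in sinkhorn(token_cluster_scores, tau=0.5):
        print([round(v, 3) for v in row])
```

Lower `tau` makes each token's assignment closer to one-hot. The balancing constraint stops many tokens from collapsing onto the same cluster, which keeps the aggregated descriptor informative.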

[CV-56] Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models KR

Link: https://arxiv.org/abs/2502.14191
Authors: Michihiro Yasunaga,Luke Zettlemoyer,Marjan Ghazvininejad
Affiliations: Unknown
Categories: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: Dataset available at this https URL

[CV-57] Stereo Image Coding for Machines with Joint Visual Feature Compression

Link: https://arxiv.org/abs/2502.14190
Authors: Dengchao Jin,Jianjun Lei,Bo Peng,Zhaoqing Pan,Nam Ling,Qingming Huang
Affiliations: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA 95053, USA; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408, China
Categories: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Comments:

[CV-58] Bayesian SegNet for Semantic Segmentation with Improved Interpretation of Microstructural Evolution During Irradiation of Materials

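[Quick Read]: This paper studies the relationship between the microstructural evolution of irradiated lithium aluminate (LiAlO2) pellets and tritium diffusion, retention, and release, with the goal of improving predictions of tritium-producing burnable absorber rod performance. The key is to train deep convolutional neural networks (DCNNs) to segment images into defect, grain, and boundary classes. Qualitative microstructural information is then computed from the segmented images so that irradiated and unirradiated pellets can be compared. The model's sensitivity is further improved by incorporating metadata and by using uncertainty quantification. Overall, the DCNN models perform well on both irradiated and unirradiated images, which indicates that they are a viable alternative to expert-labeled images.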

Link: https://arxiv.org/abs/2502.14184
Authors: Marjolein Oostrom,Alex Hagen,Nicole LaHaye,Karl Pazdernik
Affiliations: Pacific Northwest National Laboratory, Richland, WA, USA; North Carolina State University, Raleigh, NC, USA
Categories: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments:

Abstract:Understanding the relationship between the evolution of microstructures of irradiated LiAlO2 pellets and tritium diffusion, retention and release could improve predictions of tritium-producing burnable absorber rod performance. Given expert-labeled segmented images of irradiated and unirradiated pellets, we trained Deep Convolutional Neural Networks to segment images into defect, grain, and boundary classes. Qualitative microstructural information was calculated from these segmented images to facilitate the comparison of unirradiated and irradiated pellets. We tested modifications to improve the sensitivity of the model, including incorporating meta-data into the model and utilizing uncertainty quantification. The predicted segmentation was similar to the expert-labeled segmentation for most methods of microstructural qualification, including pixel proportion, defect area, and defect density. Overall, the high performance metrics for the best models for both irradiated and unirradiated images shows that utilizing neural network models is a viable alternative to expert-labeled images.
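
The title refers to Bayesian SegNet, which estimates uncertainty with Monte Carlo dropout. Dropout stays active at test time, several stochastic passes are averaged, and the entropy of the averaged class probabilities serves as a per-pixel uncertainty. The sketch below assumes that setup and a hypothetical label mapping (0 = grain, 1 = boundary, 2 = defect). It also computes one of the microstructural metrics mentioned in the abstract, pixel proportion.

```python
import math
from typing import List


def mean_probs(samples: List[List[float]]) -> List[float]:
    """Average per-class softmax outputs from T stochastic (dropout-enabled) passes for one pixel."""
    t = len(samples)
    return [sum(s[c] for s in samples) / t for c in range(len(samples[0]))]


def predictive_entropy(p: List[float]) -> float:
    """Entropy of the averaged prediction; higher means the passes disagree or are unsure."""
    return -sum(pc * math.log(pc) for pc in p if pc > 0)


def class_proportion(mask: List[List[int]], cls: int) -> float:
    """Fraction of pixels labeled cls in a segmentation mask (e.g. the defect pixel proportion)."""
    flat = [v for row in mask for v in row]
    return flat.count(cls) / len(flat)


if __name__ == "__main__":
    consistent = [[0.9, 0.05, 0.05]] * 3
    conflicting = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
    print(predictive_entropy(mean_probs(consistent)), predictive_entropy(mean_probs(conflicting)))
    print(class_proportion([[0, 1], [1, 2]], cls=2))
```

Pixels where the stochastic passes disagree, such as ambiguous grain boundaries, receive high entropy. These are the locations where comparing the predictions against expert labels is most informative.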

[CV-59] NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis ICASSP2025

Link: https://arxiv.org/abs/2502.14178
Authors: Xiaoxing Liu,Zhilei Liu,Chongke Bi
Affiliations: College of Intelligence and Computing, Tianjin University
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Comments: Accepted by ICASSP 2025

[CV-60] Deep learning based infrared small object segmentation: Challenges and future directions

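TL;DR: This survey addresses the challenges of detecting and classifying small objects in infrared images, in particular extremely low signal-to-noise ratios, small and blurred targets, and limited labeled/unlabeled training data. Its core contribution is a systematic analysis and evaluation of deep learning methods that identifies unsolved problems in existing techniques and points to future research directions. It also gives a structured review of existing infrared perception methods from the perspective of these challenges, explains the motivations behind the different approaches, and proposes promising future directions based on recent advances.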

Link: https://arxiv.org/abs/2502.14168
Authors: Zhengeng Yang,Hongshan Yu,Jianjun Zhang,Qiang Tang,Ajmal Mian
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments: This is a submitted version of a paper accepted by Information Fusion. If you want a better reading experience, please refer to the final published version of Information Fusion

Abstract:Infrared sensing is a core method for supporting unmanned systems, such as autonomous vehicles and drones. Recently, infrared sensors have been widely deployed on mobile and stationary platforms for detection and classification of objects from long distances and in wide field of views. Given its success in the vision image analysis domain, deep learning has also been applied for object recognition in infrared images. However, techniques that have proven successful in visible light perception face new challenges in the infrared domain. These challenges include extremely low signal-to-noise ratios in infrared images, very small and blurred objects of interest, and limited availability of labeled/unlabeled training data due to the specialized nature of infrared sensors. Numerous methods have been proposed in the literature for the detection and classification of small objects in infrared images achieving varied levels of success. There is a need for a survey paper that critically analyzes existing techniques in this domain, identifies unsolved challenges and provides future research directions. This paper fills the gap and offers a concise and insightful review of deep learning-based methods. It also identifies the challenges faced by existing infrared object segmentation methods and provides a structured review of existing infrared perception methods from the perspective of these challenges and highlights the motivations behind the various approaches. Finally, this review suggests promising future directions based on recent advancements within this domain.

[CV-61] Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration

Link: https://arxiv.org/abs/2502.14156
Authors: Katie Z Luo,Minh-Quan Dao,Zhenzhen Liu,Mark Campbell,Wei-Lun Chao,Kilian Q. Weinberger,Ezio Malis,Vincent Fremont,Bharath Hariharan,Mao Shan,Stewart Worrall,Julie Stephany Berrio Perez
Affiliations: Cornell University; Inria; University of Sydney; The Ohio State University; École Centrale de Nantes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-62] PitVQA: Vector Matrix-Low-Rank Adaptation for Open-Ended Visual Question Answering in Pituitary Surgery

Link: https://arxiv.org/abs/2502.14149
Authors: Runlong He,Danyal Z. Khan,Evangelos B. Mazomenos,Hani J. Marcus,Danail Stoyanov,Matthew J. Clarkson,Mobarakol Islam
Affiliations: UCL Hawkes Institute; Department of Medical Physics & Biomedical Engineering, University College London, UK; Department of Computer Science, University College London, UK; National Hospital for Neurology and Neurosurgery, UK
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: 9 pages

[CV-63] Token Adaptation via Side Graph Convolution for Temporally and Spatially Efficient Fine-tuning of 3D Point Cloud Transformers

Link: https://arxiv.org/abs/2502.14142
Authors: Takahiko Furuya
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments: Currently under review

[CV-64] ModSkill: Physical Character Skill Modularization

Link: https://arxiv.org/abs/2502.14140
Authors: Yiming Huang,Zhiyang Dou,Lingjie Liu
Affiliations: University of Pennsylvania
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Robotics (cs.RO)
Comments:

[CV-65] GlossGau: Efficient Inverse Rendering for Glossy Surface with Anisotropic Spherical Gaussian

Link: https://arxiv.org/abs/2502.14129
Authors: Bang Du,Runfa Blark Li,Chen Du,Truong Nguyen
Affiliations: University of California San Diego
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-66] Modular Prompt Learning Improves Vision-Language Models

Link: https://arxiv.org/abs/2502.14125
Authors: Zhenhan Huang,Tejaswini Pedapati,Pin-Yu Chen,Jianxi Gao
Affiliations: Rensselaer Polytechnic Institute; IBM Research
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing

[CV-67] Object-centric Binding in Contrastive Language-Image Pretraining

Link: https://arxiv.org/abs/2502.14113
Authors: Rim Assouel,Pietro Astolfi,Florian Bordes,Michal Drozdzal,Adriana Romero-Soriano
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

[CV-68] Point Cloud Geometry Scalable Coding Using a Resolution and Quality-conditioned Latents Probability Estimator

Link: https://arxiv.org/abs/2502.14099
Authors: Daniele Mari,André F. R. Guarda,Nuno M. M. Rodrigues,Simone Milani,Fernando Pereira
Affiliations: University of Padova, Department of Information Engineering, Padova, Italy, 35131; Instituto de Telecomunicações, Lisbon, Portugal, 1049-001; Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal, 1049-001; ESTG, Politécnico de Leiria, Leiria, Portugal, 2411-901
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments: Submitted to IEEE and currently under review

[CV-69] Hybrid Visual Servoing of Tendon-driven Continuum Robots

Link: https://arxiv.org/abs/2502.14092
Authors: Rana Danesh,Farrokh Janabi-Sharifi,Farhad Aghili
Affiliations: Unknown
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)
Comments:

[CV-70] Regression in EO: Are VLMs Up to the Challenge?

Link: https://arxiv.org/abs/2502.14088
Authors: Xizhe Xue,Xiao Xiang Zhu
Affiliations: Technical University of Munich; Munich Center for Machine Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-71] DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models AAAI2025

Link: https://arxiv.org/abs/2502.14070
Authors: Daewon Chae,June Suk Choi,Jinkyu Kim,Kimin Lee
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments: AAAI 2025

[CV-72] A Racing Dataset and Baseline Model for Track Detection in Autonomous Racing

Link: https://arxiv.org/abs/2502.14068
Authors: Shreya Ghosh,Yi-Huan Chen,Ching-Hsiang Huang,Abu Shafin Mohammad Mahdee Jameel,Chien Chou Ho,Aly El Gamal,Samuel Labi
Affiliations: School of Electrical and Computer Engineering, Purdue University
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Comments: Currently Under Review

[CV-73] Triad: Vision Foundation Model for 3D Magnetic Resonance Imaging

Link: https://arxiv.org/abs/2502.14064
Authors: Shansong Wang,Mojtaba Safari,Qiang Li,Chih-Wei Chang,Richard LJ Qiu,Justin Roper,David S. Yu,Xiaofeng Yang
Affiliations: Department of Radiation Oncology, Winship Cancer Institute, Emory University School of Medicine
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Comments:

[CV-74] PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection

Link: https://arxiv.org/abs/2502.14063
Authors: Rui Zhao,Zeyu Zhang,Yi Xu,Yi Yao,Yan Huang,Wenxin Zhang,Zirui Song,Xiuying Chen,Yang Zhao
Affiliations: JD.com; The Australian National University; La Trobe University; Central South University; NavInfo Co., Ltd.; University of Technology Sydney; University of Chinese Academy of Science; Mohamed bin Zayed University of Artificial Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-75] EfficientPose 6D: Scalable and Efficient 6D Object Pose Estimation

Link: https://arxiv.org/abs/2502.14061
Authors: Zixuan Fang,Thomas Pöllabauer,Tristan Wirth,Sarah Berkei,Volker Knauthe,Arjan Kuijper
Affiliations: Unknown
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[CV-76] Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data ICLR2025

Link: https://arxiv.org/abs/2502.14044
Authors: Yucheng Shi,Quanzheng Li,Jin Sun,Xiang Li,Ninghao Liu
Affiliations: School of Computing, University of Georgia; Department of Radiology, Massachusetts General Hospital and Harvard Medical School
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: Accepted by ICLR 2025. Code: this https URL

[CV-77] Dynamic Activation with Knowledge Distillation for Energy-Efficient Spiking NN Ensembles

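TL;DR: This paper addresses the high energy consumption of high-performance artificial neural networks (ANNs), which makes them unsuitable for energy-constrained applications. The key solution is a new system called the Spiking Neural Ensemble (SNE), which combines knowledge distillation with ensemble learning: a foundation AI model acts as a teacher network that guides an ensemble of small student spiking neural networks (SNNs), greatly reducing energy consumption while keeping accuracy high. The core innovation of SNE is the dynamic activation of a subset of the student SNNs, together with knowledge distillation enhanced by an informed partitioning (disentanglement) of the teacher's feature space, which transfers the teacher's knowledge effectively. On CIFAR-10 this reduces computational requirements by up to 20x with only a 2% drop in accuracy. SNE is also more robust than its ANN teacher under noisy conditions.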

Link: https://arxiv.org/abs/2502.14023
Authors: Orestis Konstantaropoulos,Theodoris Mallios,Maria Papadopouli
Affiliations: Department of Computer Science, University of Crete, Heraklion, Greece; Institute of Computer Science, Foundation for Research & Technology-Hellas, Heraklion, Greece; Archimedes, Athena Research Center, Athens, Greece
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Comments:

Abstract:While foundation AI models excel at tasks like classification and decision-making, their high energy consumption makes them unsuitable for energy-constrained applications. Inspired by the brain’s efficiency, spiking neural networks (SNNs) have emerged as a viable alternative due to their event-driven nature and compatibility with neuromorphic chips. This work introduces a novel system that combines knowledge distillation and ensemble learning to bridge the performance gap between artificial neural networks (ANNs) and SNNs. A foundation AI model acts as a teacher network, guiding smaller student SNNs organized into an ensemble, called Spiking Neural Ensemble (SNE). SNE enables the disentanglement of the teacher’s knowledge, allowing each student to specialize in predicting a distinct aspect of it, while processing the same input. The core innovation of SNE is the adaptive activation of a subset of SNN models of an ensemble, leveraging knowledge-distillation, enhanced with an informed-partitioning (disentanglement) of the teacher’s feature space. By dynamically activating only a subset of these student SNNs, the system balances accuracy and energy efficiency, achieving substantial energy savings with minimal accuracy loss. Moreover, SNE is significantly more efficient than the teacher network, reducing computational requirements by up to 20x with only a 2% drop in accuracy on the CIFAR-10 dataset. This disentanglement procedure achieves an accuracy improvement of up to 2.4% on the CIFAR-10 dataset compared to other partitioning schemes. Finally, we comparatively analyze SNE performance under noisy conditions, demonstrating enhanced robustness compared to its ANN teacher. In summary, SNE offers a promising new direction for energy-constrained applications.
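
SNE builds on knowledge distillation from a teacher to students. As a point of reference, the standard soft-target distillation loss (Hinton et al., 2015), which scales the KL divergence between temperature-softened teacher and student distributions by T², can be sketched as below. This is a generic, logit-level illustration with hypothetical names (`distillation_loss`, `temperature`); it does not model SNE's partitioning of the teacher's feature space across students, its spiking dynamics, or its dynamic activation of student subsets.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 4.0) -> float:
    """Soft-target loss T^2 * KL(teacher_T || student_T) for one example."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl
```

In practice this term is usually added to an ordinary cross-entropy loss on the hard labels; how SNE weights its objectives is not stated in the abstract.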

[CV-78] FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis

Link: https://arxiv.org/abs/2502.14807
Authors: Fadillah Maani,Numan Saeed,Tausifa Saleem,Zaid Farooq,Hussain Alasmawi,Werner Diehl,Ameera Mohammad,Gareth Waring,Saudabi Valappi,Leanne Bricker,Mohammad Yaqub
Affiliations: Unknown
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-79] MedVAE: Efficient Automated Interpretation of Medical Images with Large-Scale Generalizable Autoencoders

Link: https://arxiv.org/abs/2502.14753
Authors: Maya Varma,Ashwin Kumar,Rogier van der Sluijs,Sophie Ostmeier,Louis Blankemeier,Pierre Chambon,Christian Bluethgen,Jip Prince,Curtis Langlotz,Akshay Chaudhari
Affiliations: Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA; UMC Utrecht, Utrecht, Netherlands
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-80] Vision Foundation Models in Medical Image Analysis: Advances and Challenges

Link: https://arxiv.org/abs/2502.14584
Authors: Pengchen Liang,Bin Pu,Haishan Huang,Yiwei Li,Hualiang Wang,Weibo Ma,Qing Chang
Affiliations: Unknown
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Comments: 17 pages, 1 figure

[CV-81] Role of the Pretraining and the Adaptation data sizes for low-resource real-time MRI video segmentation ICASSP2025

Link: https://arxiv.org/abs/2502.14418
Authors: Masoud Thajudeen Tholan,Vinayaka Hegde,Chetan Sharma,Prasanta Kumar Ghosh
Affiliations: Indian Institute of Science
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
Comments: Accepted to ICASSP 2025

[CV-82] MedFuncta: Modality-Agnostic Representations Based on Efficient Neural Fields

Link: https://arxiv.org/abs/2502.14401
Authors: Paul Friedrich,Florentin Bieder,Phlippe C. Cattin
Affiliations: Unknown
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Comments: Code and Dataset: this https URL

[CV-83] Topology-Aware Wavelet Mamba for Airway Structure Segmentation in Postoperative Recurrent Nasopharyngeal Carcinoma CT Scans

Link: https://arxiv.org/abs/2502.14363
Authors: Haishan Huang,Pengchen Liang,Naier Lin,Luxi Wang,Bin Pu,Jianguo Chen,Qing Chang,Xia Shen,Guo Ran
Affiliations: School of Software Engineering, Sun Yat-sen University, Zhuhai, Guangdong Province, China; School of Microelectronics, Shanghai University, Shanghai, China; Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China; Shanghai Key Laboratory of Gastric Neoplasms, Department of Surgery, Shanghai Institute of Digestive Surgery, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Department of Radiology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China; Department of Anesthesiology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Comments: 20 pages, 11 figures, 6 tables

[CV-84] EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement

Link: https://arxiv.org/abs/2502.14260
Authors: Wenhui Zhu,Xuanzhao Dong,Xin Li,Yujian Xiong,Xiwen Chen,Peijie Qiu,Vamsi Krishna Vasa,Zhangsihao Yang,Yi Su,Oana Dumitrascu,Yalin Wang
Affiliations: Arizona State University; Clemson University; Washington University in St. Louis; Banner Alzheimer’s Institute; Mayo Clinic
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Comments:

[CV-85] MambaLiteSR: Image Super-Resolution with Low-Rank Mamba using Knowledge Distillation

Link: https://arxiv.org/abs/2502.14090
Authors: Romina Aalishah,Mozhgan Navardi,Tinoosh Mohsenin
Affiliations: Johns Hopkins University
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Comments: Special Session: Generative AI on Edge, 26th International Symposium on Quality Electronic Design (ISQED’25)

[CV-86] Segmentation-free integration of nuclei morphology and spatial transcriptomics for retinal images

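TL;DR: This paper addresses the cell-segmentation bottleneck in integrating nuclear morphology features with spatial transcriptomics data. In some tissue regions, structural complexity and densely packed cells make it hard to develop a universal cell-segmentation method. The key solution is SEFI (SEgmentation-Free Integration), which uses self-supervised learning to extract morphological features from fluorescent nuclear staining images and uses them to improve the clustering of gene expression data without any cell segmentation.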

Link: https://arxiv.org/abs/2502.13974
Authors: Eduard Chelebian,Pratiti Dasgupta,Zainalabedin Samadi,Carolina Wählby,Amjad Askary
Affiliations: Dept. of Information Technology and SciLifeLab, Uppsala University; Dept. of Molecular, Cell and Developmental Biology, University of California, Los Angeles
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Comments:

Abstract:This study introduces SEFI (SEgmentation-Free Integration), a novel method for integrating morphological features of cell nuclei with spatial transcriptomics data. Cell segmentation poses a significant challenge in the analysis of spatial transcriptomics data, as tissue-specific structural complexities and densely packed cells in certain regions make it difficult to develop a universal approach. SEFI addresses this by utilizing self-supervised learning to extract morphological features from fluorescent nuclear staining images, enhancing the clustering of gene expression data without requiring segmentation. We demonstrate SEFI on spatially resolved gene expression profiles of the developing retina, acquired using multiplexed single molecule Fluorescence In Situ Hybridization (smFISH). SEFI is publicly available at this https URL.

Artificial Intelligence

[AI-0] Ray-Tracing for Conditionally Activated Neural Networks

Link: https://arxiv.org/abs/2502.14788
Authors: Claudio Gallicchio,Giuseppe Nuti
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Submitted to a workshop

Abstract:In this paper, we introduce a novel architecture for conditionally activated neural networks combining a hierarchical construction of multiple Mixture of Experts (MoEs) layers with a sampling mechanism that progressively converges to an optimized configuration of expert activation. This methodology enables the dynamic unfolding of the network’s architecture, facilitating efficient path-specific training. Experimental results demonstrate that this approach achieves competitive accuracy compared to conventional baselines while significantly reducing the parameter count required for inference. Notably, this parameter reduction correlates with the complexity of the input patterns, a property naturally emerging from the network’s operational dynamics without necessitating explicit auxiliary penalty functions.
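
Conditional activation in Mixture-of-Experts layers is commonly implemented with a router that evaluates only a few experts per input. The sketch below shows a generic top-k softmax router in plain Python to illustrate that idea. It is not the hierarchical, sampling-based mechanism proposed in the paper; the function name `top_k_moe` and the toy scalar experts are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

def top_k_moe(x: float,
              gate_logits: Sequence[float],
              experts: List[Callable[[float], float]],
              k: int = 2) -> Tuple[float, List[int]]:
    """Mix the outputs of the k experts with the highest gate scores.

    Only the selected experts are called, which is where conditional
    computation saves work. Returns (output, indices of active experts).
    """
    # Indices of the k largest gate logits (stable sort: ties keep index order).
    active = sorted(range(len(experts)), key=lambda i: gate_logits[i],
                    reverse=True)[:k]
    # Softmax over the selected logits only, shifted by the max for stability.
    top = max(gate_logits[i] for i in active)
    weights = [math.exp(gate_logits[i] - top) for i in active]
    total = sum(weights)
    output = sum(w / total * experts[i](x) for w, i in zip(weights, active))
    return output, active
```

In a trained network the gate logits come from a learned function of the input, so the set of active experts, and therefore the number of parameters used at inference, varies per input.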

[AI-1] Real-Time Device Reach Forecasting Using HLL and MinHash Data Sketches

Link: https://arxiv.org/abs/2502.14785
Authors: Chandrashekar Muniyappa,Kendall Willets,Sriraman Krishnamoorthy
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

Abstract:Predicting the right number of TVs (Device Reach) in real time based on user-specified targeting attributes is imperative for running a multi-million-dollar ads business. The traditional approach of SQL queries to join billions of records across multiple targeting dimensions is extremely slow. As a workaround, many applications have an offline process that crunches these numbers and presents the results after many hours. In our case, the solution was an offline process that took 24 hours to onboard a customer, resulting in a potential loss of business. To solve this problem, we have built a new real-time prediction system using MinHash and HyperLogLog (HLL) data sketches to compute the device reach at runtime when a user makes a request. However, existing MinHash implementations do not solve the complex problem of multilevel aggregation and intersection. This work shows how we solved this problem. In addition, we improved the MinHash algorithm to run 4 times faster using Single Instruction Multiple Data (SIMD) vectorized operations, achieving high speed and accuracy with constant space when processing billions of records. Finally, we show experimentally that the results are as accurate as the traditional offline prediction system, with an acceptable error rate of 5%.
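
The two sketches named in the abstract combine naturally: HyperLogLog estimates the size of a union (merging two HLL sketches is a register-wise max), and MinHash estimates the Jaccard similarity J(A, B), so an intersection can be approximated as J(A, B) × |A ∪ B|. The following is a minimal single-machine Python sketch of that textbook combination, assuming string device IDs and a BLAKE2b base hash. It is not the authors' multilevel-aggregation system and does not reproduce their SIMD optimization; all names are hypothetical.

```python
import hashlib
import math
import random

MERSENNE_61 = (1 << 61) - 1

def hash64(item: str) -> int:
    """Deterministic 64-bit hash of a string identifier."""
    return int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")

class MinHash:
    """MinHash signature using universal hashes (a*x + b) mod (2^61 - 1)."""

    def __init__(self, num_perm: int = 256, seed: int = 1):
        rng = random.Random(seed)  # same seed => comparable signatures
        self.params = [(rng.randrange(1, MERSENNE_61), rng.randrange(MERSENNE_61))
                       for _ in range(num_perm)]
        self.mins = [MERSENNE_61] * num_perm

    def update(self, item: str) -> None:
        x = hash64(item) % MERSENNE_61
        for i, (a, b) in enumerate(self.params):
            v = (a * x + b) % MERSENNE_61
            if v < self.mins[i]:
                self.mins[i] = v

    def jaccard(self, other: "MinHash") -> float:
        # The fraction of slots whose minima agree estimates |A ∩ B| / |A ∪ B|.
        agree = sum(1 for u, v in zip(self.mins, other.mins) if u == v)
        return agree / len(self.mins)

class HyperLogLog:
    """HyperLogLog with 2^p registers and linear counting for small sets."""

    def __init__(self, p: int = 12):
        self.p, self.m = p, 1 << p
        self.registers = [0] * self.m

    def update(self, item: str) -> None:
        x = hash64(item)
        w = 64 - self.p
        idx = x >> w                      # top p bits select a register
        rest = x & ((1 << w) - 1)         # remaining w bits
        rank = w - rest.bit_length() + 1  # leading zeros in `rest`, plus one
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Register-wise max yields the sketch of the union of both sets.
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        est = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * m and zeros:
            est = m * math.log(m / zeros)  # linear counting for small cardinalities
        return est

def estimate_intersection(mh_a: MinHash, mh_b: MinHash,
                          hll_a: HyperLogLog, hll_b: HyperLogLog) -> float:
    """|A ∩ B| ≈ Jaccard(A, B) * |A ∪ B|."""
    return mh_a.jaccard(mh_b) * hll_a.merge(hll_b).count()
```

Both sketches use constant memory per audience segment regardless of how many devices it contains. The standard error of the Jaccard estimate is roughly sqrt(J(1 − J)/num_perm), so accuracy is traded against signature size, and the per-element loop over hash functions is the part a vectorized implementation would target.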

[AI-2] Making Universal Policies Universal

Link: https://arxiv.org/abs/2502.14777
Authors: Niklas Höpner,David Kuric,Herke van Hoof
Subjects: Artificial Intelligence (cs.AI)
Comments:

Abstract:The development of a generalist agent capable of solving a wide range of sequential decision-making tasks remains a significant challenge. We address this problem in a cross-agent setup where agents share the same observation space but differ in their action spaces. Our approach builds on the universal policy framework, which decouples policy learning into two stages: a diffusion-based planner that generates observation sequences and an inverse dynamics model that assigns actions to these plans. We propose a method for training the planner on a joint dataset composed of trajectories from all agents. This method offers the benefit of positive transfer by pooling data from different agents, while the primary challenge lies in adapting shared plans to each agent’s unique constraints. We evaluate our approach on the BabyAI environment, covering tasks of varying complexity, and demonstrate positive transfer across agents. Additionally, we examine the planner’s generalisation ability to unseen agents and compare our method to traditional imitation learning approaches. By training on a pooled dataset from multiple agents, our universal policy achieves an improvement of up to 42.20% in task completion accuracy compared to a policy trained on a dataset from a single agent.

[AI-3] EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations

Link: https://arxiv.org/abs/2502.14760
Authors: Haotian Zhai,Connor Lawless,Ellen Vitercik,Liu Leqi
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)
Comments:

Abstract:A fundamental problem in combinatorial optimization is identifying equivalent formulations, which can lead to more efficient solution strategies and deeper insights into a problem’s computational complexity. The need to automatically identify equivalence between problem formulations has grown as optimization copilots–systems that generate problem formulations from natural language descriptions–have proliferated. However, existing approaches to checking formulation equivalence lack grounding, relying on simple heuristics which are insufficient for rigorous validation. Inspired by Karp reductions, in this work we introduce quasi-Karp equivalence, a formal criterion for determining when two optimization formulations are equivalent based on the existence of a mapping between their decision variables. We propose EquivaMap, a framework that leverages large language models to automatically discover such mappings, enabling scalable and reliable equivalence verification. To evaluate our approach, we construct the first open-source dataset of equivalent optimization formulations, generated by applying transformations such as adding slack variables or valid inequalities to existing formulations. Empirically, EquivaMap significantly outperforms existing methods, achieving substantial improvements in correctly identifying formulation equivalence.
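
Quasi-Karp equivalence rests on a mapping between the decision variables of two formulations. As a toy illustration only (not the paper's formal criterion or its pipeline), the sketch below brute-forces a tiny integer program and a slack-variable reformulation of it, then checks whether a candidate mapping sends every optimal solution of the first formulation to an optimal solution of the second. It checks one direction only. In EquivaMap the mapping would be proposed by an LLM; here it is hand-written, and every function name is hypothetical.

```python
from itertools import product

# Formulation A: maximize x1 + 2*x2  s.t.  x1 + x2 <= 3,  x1, x2 in {0, ..., 3}
def feasible_a(x):
    return x[0] + x[1] <= 3

def objective_a(x):
    return x[0] + 2 * x[1]

# Formulation B: same model with an explicit slack s:  x1 + x2 + s == 3,  s >= 0
def feasible_b(y):
    x1, x2, s = y
    return s >= 0 and x1 + x2 + s == 3

def objective_b(y):
    return y[0] + 2 * y[1]

def mapping_a_to_b(x):
    """Candidate variable mapping: keep x1, x2 and set the slack to 3 - x1 - x2."""
    return (x[0], x[1], 3 - x[0] - x[1])

def optimum(domain, feasible, objective):
    """Best objective value and all optimal points, by exhaustive enumeration."""
    best = max(objective(v) for v in domain if feasible(v))
    return best, [v for v in domain if feasible(v) and objective(v) == best]

def check_mapping(mapping) -> bool:
    """True if the mapping sends every optimum of A to an optimum of B."""
    domain_a = list(product(range(4), repeat=2))
    domain_b = list(product(range(4), range(4), range(-3, 4)))
    _, optima_a = optimum(domain_a, feasible_a, objective_a)
    best_b, _ = optimum(domain_b, feasible_b, objective_b)
    return all(feasible_b(mapping(x)) and objective_b(mapping(x)) == best_b
               for x in optima_a)
```

Here `check_mapping(mapping_a_to_b)` holds, while a mapping that swaps x1 and x2 fails because it sends the optimum (0, 3) to a feasible but suboptimal point of B. Exhaustive enumeration only works for toy domains, which is why a scalable checker is needed for realistic formulations.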

[AI-4] Multi-Agent Coordination across Diverse Applications: A Survey

Link: https://arxiv.org/abs/2502.14743
Authors: Lijun Sun,Yijun Yang,Qiqi Duan,Yuhui Shi,Chao Lyu,Yu-Cheng Chang,Chin-Teng Lin,Yang Shen
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
Comments: 23 pages, 4 figures, 2 tables

Abstract:Multi-agent coordination studies the underlying mechanism enabling the trending spread of diverse multi-agent systems (MAS) and has received increasing attention, driven by the expansion of emerging applications and rapid AI advances. This survey outlines the current state of coordination research across applications through a unified understanding that answers four fundamental coordination questions: (1) what is coordination; (2) why coordination; (3) who to coordinate with; and (4) how to coordinate. Our purpose is to explore existing ideas and expertise in coordination and their connections across diverse applications, while identifying and highlighting emerging and promising research directions. First, general coordination problems that are essential to varied applications are identified and analyzed. Second, a number of MAS applications are surveyed, ranging from widely studied domains, e.g., search and rescue, warehouse automation and logistics, and transportation systems, to emerging fields including humanoid and anthropomorphic robots, satellite systems, and large language models (LLMs). Finally, open challenges about the scalability, heterogeneity, and learning mechanisms of MAS are analyzed and discussed. In particular, we identify the hybridization of hierarchical and decentralized coordination, human-MAS coordination, and LLM-based MAS as promising future directions.

[AI-5] EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration WWW2025

Link: https://arxiv.org/abs/2502.14735
Authors: Minjie Hong,Yan Xia,Zehan Wang,Jieming Zhu,Ye Wang,Sihang Cai,Xiaoda Yang,Quanyu Dai,Zhenhua Dong,Zhimeng Zhang,Zhou Zhao
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Comments: 9 pages, 6 figures, accepted by WWW 2025

Abstract:Large language models (LLMs) are increasingly leveraged as foundational backbones in the development of advanced recommender systems, offering enhanced capabilities through their extensive knowledge and reasoning. Existing llm-based recommender systems (RSs) often face challenges due to the significant differences between the linguistic semantics of pre-trained LLMs and the collaborative semantics essential for RSs. These systems use pre-trained linguistic semantics but learn collaborative semantics from scratch via the llm-Backbone. However, LLMs are not designed for recommendations, leading to inefficient collaborative learning, weak result correlations, and poor integration of traditional RS features. To address these challenges, we propose EAGER-LLM, a decoder-only llm-based generative recommendation framework that integrates endogenous and exogenous behavioral and semantic information in a non-intrusive manner. Specifically, we propose 1)dual-source knowledge-rich item indices that integrates indexing sequences for exogenous signals, enabling efficient link-wide processing; 2)non-invasive multiscale alignment reconstruction tasks guide the model toward a deeper understanding of both collaborative and semantic signals; 3)an annealing adapter designed to finely balance the model’s recommendation performance with its comprehension capabilities. We demonstrate EAGER-LLM’s effectiveness through rigorous testing on three public benchmarks.

[AI-6] WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models

Link: https://arxiv.org/abs/2502.14727
Authors: Yifu Chen,Shengpeng Ji,Haoxiao Wang,Ziqing Wang,Siyu Chen,Jinzheng He,Jin Xu,Zhou Zhao
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Comments:

Abstract:Retrieval Augmented Generation (RAG) has gained widespread adoption owing to its capacity to empower large language models (LLMs) to integrate external knowledge. However, existing RAG frameworks are primarily designed for text-based LLMs and rely on Automatic Speech Recognition to process speech input, which discards crucial audio information, risks transcription errors, and increases computational overhead. Therefore, we introduce WavRAG, the first retrieval augmented generation framework with native, end-to-end audio support. WavRAG offers two key features: 1) Bypassing ASR, WavRAG directly processes raw audio for both embedding and retrieval. 2) WavRAG integrates audio and text into a unified knowledge representation. Specifically, we propose the WavRetriever to facilitate the retrieval from a text-audio hybrid knowledge base, and further enhance the in-context capabilities of spoken dialogue models through the integration of chain-of-thought reasoning. In comparison to state-of-the-art ASR-Text RAG pipelines, WavRAG achieves comparable retrieval performance while delivering a 10x acceleration. Furthermore, WavRAG’s unique text-audio hybrid retrieval capability extends the boundaries of RAG to the audio modality.

[AI-7] Ranking Joint Policies in Dynamic Games using Evolutionary Dynamics

Link: https://arxiv.org/abs/2502.14724
Authors: Natalia Koliou,George Vouros
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

Abstract:Game-theoretic solution concepts, such as the Nash equilibrium, have been key to finding stable joint actions in multi-player games. However, it has been shown that the dynamics of agents’ interactions, even in simple two-player games with few strategies, are incapable of reaching Nash equilibria, exhibiting complex and unpredictable behavior. Instead, evolutionary approaches can describe the long-term persistence of strategies and filter out transient ones, accounting for the long-term dynamics of agents’ interactions. Our goal is to identify agents’ joint strategies that result in stable behavior, being resistant to changes, while also accounting for agents’ payoffs, in dynamic games. Towards this goal, and building on previous results, this paper proposes transforming dynamic games into their empirical forms by considering agents’ strategies instead of agents’ actions, and applying the evolutionary methodology α-Rank to evaluate and rank strategy profiles according to their long-term dynamics. This methodology not only allows us to identify joint strategies that are strong through agents’ long-term interactions, but also provides a descriptive, transparent framework regarding the high ranking of these strategies. Experiments report on agents that aim to collaboratively solve a stochastic version of the graph coloring problem. We consider different styles of play as strategies to define the empirical game, and train policies realizing these strategies, using the DQN algorithm. Then we run simulations to generate the payoff matrix required by α-Rank to rank joint strategies.
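
α-Rank ranks strategy profiles by the stationary distribution of a Markov chain in which one player at a time switches strategy, and the switch is accepted with a fixation probability that grows with the payoff gain. The sketch below implements the multi-population form of α-Rank from its original formulation (Omidshafiei et al., 2019) on a fixed payoff table. The paper instead fills the table by simulating DQN-trained policies on graph coloring, which is not reproduced here; the Prisoner's Dilemma payoffs and the values α = 5 and population size m = 20 are illustrative assumptions.

```python
import math
from itertools import product

def fixation_prob(delta: float, alpha: float, m: int) -> float:
    """Probability that one mutant with payoff gain `delta` takes over a
    population of size m under selection intensity alpha."""
    if delta == 0:
        return 1.0 / m
    if delta > 0:
        # (1 - e^{-alpha*delta}) / (1 - e^{-alpha*m*delta})
        return math.expm1(-alpha * delta) / math.expm1(-alpha * m * delta)
    # delta < 0: algebraically identical form that avoids overflow in exp()
    a = -alpha * delta
    return math.exp(-(m - 1) * a) * math.expm1(-a) / math.expm1(-m * a)

def alpha_rank(payoffs, num_strategies, alpha=5.0, m=20, iters=2000):
    """Stationary mass of each joint strategy profile.

    payoffs[k][profile] is player k's payoff for `profile`, a tuple holding
    one strategy index per player.
    """
    profiles = list(product(*(range(n) for n in num_strategies)))
    index = {s: i for i, s in enumerate(profiles)}
    eta = 1.0 / sum(n - 1 for n in num_strategies)
    size = len(profiles)
    C = [[0.0] * size for _ in range(size)]
    for s in profiles:
        i = index[s]
        for k, n_k in enumerate(num_strategies):
            for tau in range(n_k):
                if tau == s[k]:
                    continue
                t = s[:k] + (tau,) + s[k + 1:]  # player k deviates to tau
                delta = payoffs[k][t] - payoffs[k][s]
                C[i][index[t]] = eta * fixation_prob(delta, alpha, m)
        C[i][i] = 1.0 - sum(C[i])  # probability of staying in profile s
    pi = [1.0 / size] * size
    for _ in range(iters):  # power iteration: pi <- pi C
        pi = [sum(pi[i] * C[i][j] for i in range(size)) for j in range(size)]
    return dict(zip(profiles, pi))
```

On the Prisoner's Dilemma (strategy 0 = cooperate, 1 = defect), almost all stationary mass ends up on mutual defection, since every unilateral switch toward defection is favored and the reverse switches are strongly suppressed.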

[AI-8] Building reliable sim driving agents by scaling self-play

Link: https://arxiv.org/abs/2502.14706
Authors: Daphne Cornelisse,Aarav Pandya,Kevin Joseph,Joseph Suárez,Eugene Vinitsky
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)
Comments: First version

Abstract:Simulation agents are essential for designing and testing systems that interact with humans, such as autonomous vehicles (AVs). These agents serve various purposes, from benchmarking AV performance to stress-testing the system’s limits, but all use cases share a key requirement: reliability. A simulation agent should behave as intended by the designer, minimizing unintended actions like collisions that can compromise the signal-to-noise ratio of analyses. As a foundation for reliable sim agents, we propose scaling self-play to thousands of scenarios on the Waymo Open Motion Dataset under semi-realistic limits on human perception and control. Training from scratch on a single GPU, our agents nearly solve the full training set within a day. They generalize effectively to unseen test scenes, achieving a 99.8% goal completion rate with less than 0.8% combined collision and off-road incidents across 10,000 held-out scenarios. Beyond in-distribution generalization, our agents show partial robustness to out-of-distribution scenes and can be fine-tuned in minutes to reach near-perfect performance in those cases. Demonstrations of agent behaviors can be found at this link. We open-source both the pre-trained agents and the complete code base.

[AI-9] Not All Data are Good Labels: On the Self-supervised Labeling for Time Series Forecasting

Link: https://arxiv.org/abs/2502.14704
Authors: Yuxuan Yang,Dalin Zhang,Yuxuan Liang,Hua Lu,Huan Li,Gang Chen
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

Abstract:Time Series Forecasting (TSF) is a crucial task in various domains, yet existing TSF models rely heavily on high-quality data and insufficiently exploit all available data. This paper explores a novel self-supervised approach to re-label time series datasets by inherently constructing candidate datasets. During the optimization of a simple reconstruction network, intermediates are used as pseudo labels in a self-supervised paradigm, improving generalization for any predictor. We introduce the Self-Correction with Adaptive Mask (SCAM), which discards overfitted components and selectively replaces them with pseudo labels generated from reconstructions. Additionally, we incorporate Spectral Norm Regularization (SNR) to further suppress overfitting from a loss landscape perspective. Our experiments on eleven real-world datasets demonstrate that SCAM consistently improves the performance of various backbone models. This work offers a new perspective on constructing datasets and enhancing the generalization of TSF models through self-supervised learning.

[AI-10] General Uncertainty Estimation with Delta Variances

Link: https://arxiv.org/abs/2502.14698
Authors: Simon Schmitt,John Shawe-Taylor,Hado van Hasselt
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Applications (stat.AP); Machine Learning (stat.ML)
Comments:

Abstract:Decision makers may suffer from uncertainty induced by limited data. This may be mitigated by accounting for epistemic uncertainty, which is, however, challenging to estimate efficiently for large neural networks. To this end, we investigate Delta Variances, a family of algorithms for epistemic uncertainty quantification that is computationally efficient and convenient to implement. It can be applied to neural networks and to more general functions composed of neural networks. As an example, we consider a weather simulator with a neural-network-based step function inside; here, Delta Variances empirically obtain competitive results at the cost of a single gradient computation. The approach is convenient as it requires no changes to the neural network architecture or training procedure. We discuss multiple ways to derive Delta Variances theoretically, noting that special cases recover popular techniques, and present a unified perspective on multiple related methods. Finally, we observe that this general perspective gives rise to a natural extension and empirically show its benefit.
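The delta method behind this family can be illustrated in a few lines. For a function f(θ) of parameters whose estimate has (approximate) covariance Σ, the variance of f(θ) is approximated by ∇f(θ)ᵀ Σ ∇f(θ), so a single gradient evaluation suffices. The sketch below is a generic delta-method estimate, not the authors' implementation. It assumes Σ is given; in practice it would come from, for example, a curvature-based estimate. A central finite difference stands in for one autodiff gradient pass only to keep the example dependency-free.

```python
from typing import Callable, List, Sequence

def numerical_grad(f: Callable[[List[float]], float], theta: Sequence[float],
                   h: float = 1e-5) -> List[float]:
    """Central finite-difference gradient (stand-in for one backward pass)."""
    grad = []
    for k in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[k] += h
        minus[k] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

def delta_variance(f: Callable[[List[float]], float], theta: Sequence[float],
                   sigma: Sequence[Sequence[float]]) -> float:
    """Delta-method variance g^T Sigma g with g = grad f(theta).
    sigma is an assumed covariance matrix of the parameter estimate."""
    g = numerical_grad(f, theta)
    n = len(g)
    return sum(g[a] * sigma[a][b] * g[b] for a in range(n) for b in range(n))
```

For f(θ) = θ₀θ₁ at θ = (2, 3) with Σ = diag(0.1, 0.2), the gradient is (3, 2), so the estimate is 9·0.1 + 4·0.2 = 1.7.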

[AI-11] seqKAN: Sequence processing with Kolmogorov-Arnold Networks

Link: https://arxiv.org/abs/2502.14681
Authors: Tatiana Boura,Stasinos Konstantopoulos
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

Abstract:Kolmogorov-Arnold Networks (KANs) have been recently proposed as a machine learning framework that is more interpretable and controllable than the multi-layer perceptron. Various network architectures have been proposed within the KAN framework targeting different tasks and application domains, including sequence processing. This paper proposes seqKAN, a new KAN architecture for sequence processing. Although multiple sequence processing KAN architectures have already been proposed, we argue that seqKAN is more faithful to the core concept of the KAN framework. Furthermore, we empirically demonstrate that it achieves better results. The empirical evaluation is performed on data generated from a complex physics problem, on both an interpolation and an extrapolation task. Using this dataset, we compared seqKAN against a prior KAN network for time-series prediction, recurrent deep networks, and symbolic regression. seqKAN substantially outperforms all architectures, particularly on the extrapolation dataset, while also being the most transparent.
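For readers new to KANs, a short illustration helps. A KAN layer replaces the weight matrix and fixed activations of an MLP layer with a learnable univariate function on every input-output edge, and each output sums its incoming edge functions. The sketch below shows that forward pass using piecewise-linear edge functions on a fixed grid, which is a degree-1 spline; KAN implementations typically use higher-order B-splines. This is a generic KAN layer, not seqKAN, and the grid and coefficient values are made up for illustration.

```python
import bisect
from typing import List

def piecewise_linear(x: float, grid: List[float], values: List[float]) -> float:
    """Univariate function given by its values at sorted grid points:
    linear interpolation inside the grid, clamped outside it."""
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    k = bisect.bisect_right(grid, x) - 1
    t = (x - grid[k]) / (grid[k + 1] - grid[k])
    return (1 - t) * values[k] + t * values[k + 1]

def kan_layer(x: List[float], grid: List[float],
              coeffs: List[List[List[float]]]) -> List[float]:
    """y_j = sum_i phi_ij(x_i); coeffs[i][j] holds the learnable grid values of phi_ij."""
    n_out = len(coeffs[0])
    return [sum(piecewise_linear(x[i], grid, coeffs[i][j]) for i in range(len(x)))
            for j in range(n_out)]
```

With `grid = [-1.0, 0.0, 1.0]`, an edge function resembling |x| on input 0 (`[1.0, 0.0, 1.0]`) and the identity on input 1 (`[-1.0, 0.0, 1.0]`), the input `[-0.5, 0.25]` maps to `[0.75]`. Training would adjust the grid values rather than a weight matrix.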

[AI-12] ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation

Link: https://arxiv.org/abs/2502.14637
Authors: Angxiao Yue,Zichong Wang,Hongteng Xu
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

Abstract:Protein backbone generation plays a central role in de novo protein design and is significant for many biological and medical applications. Although diffusion and flow-based generative models provide potential solutions to this challenging task, they often generate proteins with poor designability and suffer from computational inefficiency. In this study, we propose a novel rectified quaternion flow (ReQFlow) matching method for fast and high-quality protein backbone generation. In particular, our method generates a local translation and a 3D rotation from random noise for each residue in a protein chain; each 3D rotation is represented as a unit quaternion, and its flow is constructed by spherical linear interpolation (SLERP) in an exponential format. We train the model by quaternion flow (QFlow) matching with guaranteed numerical stability and rectify the QFlow model to accelerate its inference and improve the designability of generated protein backbones, leading to the proposed ReQFlow model. Experiments show that ReQFlow achieves state-of-the-art performance in protein backbone generation while requiring far fewer sampling steps and significantly less inference time (e.g., being 37x faster than RFDiffusion and 62x faster than Genie2 when generating a backbone of length 300), demonstrating its effectiveness and efficiency. The code is available at this https URL.
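The exponential form of SLERP for unit quaternions is q(t) = q0 ⊗ exp(t · log(q0⁻¹ ⊗ q1)). The Python sketch below implements only this interpolation, not ReQFlow's training or rectification. Quaternions are stored as `(w, x, y, z)` tuples, and the helper names are assumptions made for illustration.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z)

def qmul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def qconj(q: Quat) -> Quat:
    """Conjugate, which equals the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def qlog(q: Quat) -> Tuple[float, float, float]:
    """Log of a unit quaternion, returned as its vector part (half-angle times axis)."""
    w, x, y, z = q
    theta = math.acos(max(-1.0, min(1.0, w)))
    s = math.sin(theta)
    if s < 1e-12:
        return (0.0, 0.0, 0.0)
    return (theta * x / s, theta * y / s, theta * z / s)

def qexp(v: Tuple[float, float, float]) -> Quat:
    """Exp of the pure quaternion (0, v)."""
    angle = math.sqrt(sum(c * c for c in v))
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    s = math.sin(angle) / angle
    return (math.cos(angle), v[0] * s, v[1] * s, v[2] * s)

def slerp(q0: Quat, q1: Quat, t: float) -> Quat:
    """Geodesic interpolation q0 * exp(t * log(q0^{-1} * q1)) for t in [0, 1]."""
    if sum(a * b for a, b in zip(q0, q1)) < 0:
        q1 = tuple(-c for c in q1)  # q and -q encode the same rotation; take the short arc
    v = qlog(qmul(qconj(q0), q1))
    return qmul(q0, qexp(tuple(t * c for c in v)))
```

Interpolating halfway from the identity to a 90° rotation about z gives the 45° rotation `(cos(π/8), 0, 0, sin(π/8))`, and every intermediate result stays on the unit sphere.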

[AI-13] ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors

Link: https://arxiv.org/abs/2502.14627
Authors: Yuguo Yin,Yuxin Xie,Wenyuan Yang,Dongchao Yang,Jinghan Ru,Xianwei Zhuang,Liming Liang,Yuexian Zou
Categories: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Comments:

Abstract:Multilingual audio-text retrieval (ML-ATR) is a challenging task that aims to retrieve audio clips or multilingual texts from databases. However, existing ML-ATR schemes suffer from inconsistent instance similarity matching across languages. We theoretically analyze this inconsistency in terms of both the multilingual modal alignment direction error and the weight error, and derive a theoretical upper bound on the weight error to quantify the inconsistency. Based on the analysis of this upper bound, we find that the inconsistency stems from the data distribution error caused by randomly sampling languages. We propose a consistent ML-ATR scheme using 1-to-k contrastive learning and audio-English co-anchor contrastive learning, aiming to mitigate the negative impact of data distribution error on recall and consistency in ML-ATR. Experimental results on the translated AudioCaps and Clotho datasets show that our scheme achieves state-of-the-art performance on recall and consistency metrics for eight mainstream languages, including English. Our code will be available at this https URL.

[AI-14] A Theory for Conditional Generative Modeling on Multiple Data Sources

Link: https://arxiv.org/abs/2502.14583
Authors: Rongzhen Wang,Yan Zhang,Chenyu Zheng,Chongxuan Li,Guoqiang Wu
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: 35 pages

Abstract:The success of large generative models has driven a paradigm shift, leveraging massive multi-source data to enhance model capabilities. However, the interaction among these sources remains theoretically underexplored. This paper takes the first step toward a rigorous analysis of multi-source training in conditional generative modeling, where each condition represents a distinct data source. Specifically, we establish a general distribution estimation error bound in average total variation distance for conditional maximum likelihood estimation based on the bracketing number. Our result shows that when source distributions share certain similarities and the model is expressive enough, multi-source training guarantees a sharper bound than single-source training. We further instantiate the general theory on conditional Gaussian estimation and deep generative models including autoregressive and flexible energy-based models, by characterizing their bracketing numbers. The results highlight that the number of sources and similarity among source distributions improve the advantage of multi-source training. Simulations and real-world experiments validate our theory. Code is available at: this https URL.
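The error metric in the bound, average total variation distance, is easy to compute for discrete conditional distributions. For each condition (data source), take half the L1 distance between the true and estimated distributions, then average over conditions. The sketch below is a simplification for illustration. It assumes finite outcome spaces and uniform weighting over sources, whereas the paper's bounds are stated for general model classes and its averaging may weight sources differently. The condition names and probabilities are hypothetical.

```python
from typing import Dict, List

def total_variation(p: List[float], q: List[float]) -> float:
    """TV distance between two distributions on the same finite outcome space."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def average_tv(true_conds: Dict[str, List[float]],
               est_conds: Dict[str, List[float]]) -> float:
    """Uniform average over conditions c of TV(p(.|c), p_hat(.|c))."""
    return sum(total_variation(true_conds[c], est_conds[c])
               for c in true_conds) / len(true_conds)
```

For two sources with true distributions `[0.5, 0.5]` and `[1.0, 0.0]` and estimates `[0.7, 0.3]` and `[0.9, 0.1]`, the per-source distances are 0.2 and 0.1, giving an average of 0.15.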

[AI-15] Factor Graph-based Interpretable Neural Networks

Link: https://arxiv.org/abs/2502.14572
Authors: Yicong Li,Kuanjiu Zhou,Shuo Yu,Qiang Zhang,Renqiang Luo,Xiaodong Li,Feng Xia
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: The Thirteenth International Conference on Learning Representations

[AI-16] Plan-over-Graph: Towards Parallelable LLM Agent Schedule

Link: https://arxiv.org/abs/2502.14563
Authors: Shiqi Zhang,Xinbei Ma,Zouying Cao,Zhuosheng Zhang,Hai Zhao
Categories: Artificial Intelligence (cs.AI)
Comments:

Abstract:Large Language Models (LLMs) have demonstrated exceptional reasoning abilities for task planning. However, planning parallel schedules remains under-explored. This paper introduces a novel paradigm, plan-over-graph, in which the model first decomposes a real-life textual task into executable subtasks and constructs an abstract task graph. The model then takes this task graph as input and generates a plan for parallel execution. To enhance planning on complex, scalable graphs, we design an automated and controllable pipeline to generate synthetic graphs and propose a two-stage training scheme. Experimental results show that our plan-over-graph method significantly improves task performance on both API-based LLMs and trainable open-source LLMs. By normalizing complex tasks as graphs, our method naturally supports parallel execution, demonstrating global efficiency. The code and data are available at this https URL.
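A small critical-path computation shows why a task graph enables parallel execution. Independent subtasks can start together, so the total time is the longest dependency chain rather than the sum of all durations. The sketch below is a generic earliest-start scheduler over a DAG using Kahn's topological sort. It is not the paper's planner or training pipeline, and the task names and durations are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

def parallel_schedule(durations: Dict[str, float],
                      deps: Dict[str, List[str]]) -> Tuple[Dict[str, float], float]:
    """Earliest start time of every subtask when all ready subtasks run in parallel.
    deps[t] lists the subtasks that must finish before t starts.
    Returns (start_times, makespan)."""
    indegree = {t: len(deps.get(t, [])) for t in durations}
    children: Dict[str, List[str]] = {t: [] for t in durations}
    for t, parents in deps.items():
        for p in parents:
            children[p].append(t)
    start = {t: 0.0 for t in durations}
    ready = deque(t for t, d in indegree.items() if d == 0)
    done = 0
    while ready:
        t = ready.popleft()
        done += 1
        finish = start[t] + durations[t]
        for c in children[t]:
            start[c] = max(start[c], finish)  # wait for the slowest prerequisite
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if done != len(durations):
        raise ValueError("dependency graph contains a cycle")
    makespan = max(start[t] + durations[t] for t in durations)
    return start, makespan
```

Consider three subtasks: `book_flight` (2), `book_hotel` (3), and `plan_itinerary` (1), where `plan_itinerary` depends on both bookings. The schedule runs the two bookings side by side and finishes at time 4, compared with 6 if the subtasks ran one after another.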

[AI-17] FUIA: Model Inversion Attack against Federated Unlearning

Link: https://arxiv.org/abs/2502.14558
Authors: Lei Zhou,Youwen Zhu
Categories: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Comments: Initial manuscript

Abstract:With the introduction of regulations related to the “right to be forgotten”, federated learning (FL) is facing new privacy compliance challenges. To address these challenges, researchers have proposed federated unlearning (FU). However, existing FU research has primarily focused on improving the efficiency of unlearning, with less attention paid to the potential privacy vulnerabilities inherent in these methods. To address this gap, we draw inspiration from gradient inversion attacks in FL and propose the federated unlearning inversion attack (FUIA). The FUIA is specifically designed for the three types of FU (sample unlearning, client unlearning, and class unlearning), aiming to provide a comprehensive analysis of the privacy leakage risks associated with FU. In FUIA, the server acts as an honest-but-curious attacker, recording and exploiting the model differences before and after unlearning to expose the features and labels of forgotten data. FUIA significantly leaks the privacy of forgotten data and can target all types of FU. This attack contradicts the goal of FU to eliminate specific data influence, instead exploiting its vulnerabilities to recover forgotten data and expose its privacy flaws. Extensive experimental results show that FUIA can effectively reveal the private information of forgotten data. To mitigate this privacy leakage, we also explore two potential defense methods, although these come at the cost of reduced unlearning effectiveness and the usability of the unlearned model.

[AI-18] Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks

Link: https://arxiv.org/abs/2502.14546
Authors: Maya Bechler-Speicher,Ben Finkelshtein,Fabrizio Frasca,Luis Müller,Jan Tönshoff,Antoine Siraudin,Viktor Zaverkin,Michael M. Bronstein,Mathias Niepert,Bryan Perozzi,Mikhail Galkin,Christopher Morris
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Comments:

Abstract:While machine learning on graphs has demonstrated promise in drug design and molecular property prediction, significant benchmarking challenges hinder its further progress and relevance. Current benchmarking practices often lack focus on transformative, real-world applications, favoring narrow domains like two-dimensional molecular graphs over broader, impactful areas such as combinatorial optimization, relational databases, or chip design. Additionally, many benchmark datasets poorly represent the underlying data, leading to inadequate abstractions and misaligned use cases. Fragmented evaluations and an excessive focus on accuracy further exacerbate these issues, incentivizing overfitting rather than fostering generalizable insights. These limitations have prevented the development of truly useful graph foundation models. This position paper calls for a paradigm shift toward more meaningful benchmarks, rigorous evaluation protocols, and stronger collaboration with domain experts to drive impactful and reliable advances in graph learning research, unlocking the potential of graph learning.

[AI-19] Small Graph Is All You Need: DeepStateGNN for Scalable Traffic Forecasting

Link: https://arxiv.org/abs/2502.14525
Authors: Yannick Wölker,Arash Hajisafi,Cyrus Shahabi,Matthias Renz
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Yannick Wölker and Arash Hajisafi contributed equally to this work

Abstract:We propose a novel Graph Neural Network (GNN) model, named DeepStateGNN, for analyzing traffic data, demonstrating its efficacy in two critical tasks: forecasting and reconstruction. Unlike typical GNN methods that treat each traffic sensor as an individual graph node, DeepStateGNN clusters sensors into higher-level graph nodes, dubbed Deep State Nodes, based on various similarity criteria, resulting in a fixed number of nodes in a Deep State graph. The term “Deep State” nodes is a play on words, referencing hidden networks of power that, like these nodes, secretly govern traffic independently of visible sensors. These Deep State Nodes are defined by several similarity factors, including spatial proximity (e.g., sensors located nearby in the road network), functional similarity (e.g., sensors on similar types of freeways), and behavioral similarity under specific conditions (e.g., traffic behavior during rain). This clustering approach allows for dynamic and adaptive node grouping, as sensors can belong to multiple clusters and clusters may evolve over time. Our experimental results show that DeepStateGNN offers superior scalability and faster training, while also delivering more accurate results than competitors. It effectively handles large-scale sensor networks, outperforming other methods in both traffic forecasting and reconstruction accuracy.
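The key structural idea is pooling many sensors into a fixed set of higher-level nodes, where one sensor may belong to several groups. It can be sketched as a membership-based aggregation. The code below is an illustrative assumption, not DeepStateGNN itself. The group names, the hand-written memberships, and the plain mean are hypothetical stand-ins for the similarity-based clustering and learned layers described in the abstract.

```python
from typing import Dict, List

def aggregate_to_state_nodes(readings: Dict[str, float],
                             membership: Dict[str, List[str]]) -> Dict[str, float]:
    """Average the readings of the sensors assigned to each higher-level node.
    A sensor may appear in several groups, and the resulting graph size depends
    only on len(membership), not on the number of sensors."""
    features = {}
    for node, sensors in membership.items():
        present = [readings[s] for s in sensors if s in readings]
        features[node] = sum(present) / len(present) if present else 0.0
    return features
```

Consider three sensors reading 60, 40, and 20 (for example, speeds), grouped into `downtown` (s1, s2), `freeway` (s2, s3), and `rain_sensitive` (s1, s3). This yields node features 50, 30, and 40. Adding more sensors would change only the memberships, not the number of graph nodes a downstream GNN processes.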

[AI-20] Statistical Scenario Modelling and Lookalike Distributions for Multi-Variate AI Risk

Link: https://arxiv.org/abs/2502.14491
Authors: Elija Perrier
Categories: Artificial Intelligence (cs.AI)
Comments: Under review

Abstract:Evaluating AI safety requires statistically rigorous methods and risk metrics for understanding how the use of AI affects aggregated risk. However, much AI safety literature focuses upon risks arising from AI models in isolation, lacking consideration of how modular use of AI affects the risk distribution of workflow components or overall risk metrics. There is also a lack of statistical grounding that would allow risk models to be sensitised to the presence or absence of AI in order to estimate AI's causal contribution. This is in part due to the dearth of AI impact data upon which to fit distributions. In this work, we address these gaps in two ways. First, we demonstrate how scenario modelling (grounded in established statistical techniques such as Markov chains, copulas and Monte Carlo simulation) can be used to model AI risk holistically. Second, we show how lookalike distributions from phenomena analogous to AI can be used to estimate AI impacts in the absence of directly observable data. We demonstrate the utility of our methods for benchmarking cumulative AI risk via risk analysis of logistic scenario simulations.
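The Monte Carlo side of scenario modelling can be illustrated with a toy example. The sketch below simulates a workflow as a Markov chain with an absorbing failure state and estimates the probability of failure. It compares a baseline with a variant in which a hypothetical AI component changes one step's error rate, and checks the estimate against the closed form for this simple chain. All step probabilities are made-up assumptions. The paper's copula-based dependence modelling and lookalike distributions are not reproduced here.

```python
import random
from typing import List

def simulate_failure_rate(step_fail_probs: List[float], n_runs: int = 100_000,
                          seed: int = 0) -> float:
    """Monte Carlo estimate of P(workflow fails). From step k the chain either
    moves on to step k + 1 or is absorbed in 'failed' with probability
    step_fail_probs[k]."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_runs):
        for p in step_fail_probs:
            if rng.random() < p:
                failures += 1
                break
    return failures / n_runs

def exact_failure_rate(step_fail_probs: List[float]) -> float:
    """Closed form for this chain: 1 - prod(1 - p_k)."""
    survive = 1.0
    for p in step_fail_probs:
        survive *= 1.0 - p
    return 1.0 - survive
```

For a three-step baseline with failure probabilities `[0.02, 0.05, 0.01]`, the exact failure rate is 1 - 0.98·0.95·0.99 = 0.07831, and the simulation lands close to it. Raising the middle step to 0.08, for instance to represent a less reliable AI-assisted step, increases overall risk accordingly. Richer scenarios (branching, dependent failures) would change only the simulator, not the estimation logic.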

[AI-21] Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing

Link: https://arxiv.org/abs/2502.14458
Authors: Aviv Bick,Tobias Katsch,Nimit Sohoni,Arjun Desai,Albert Gu
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-22] Watch Less Feel More: Sim-to-Real RL for Generalizable Articulated Object Manipulation via Motion Adaptation and Impedance Control

Link: https://arxiv.org/abs/2502.14457
Authors: Tan-Dzung Do,Nandiraju Gireesh,Jilong Wang,He Wang
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

Abstract:Articulated object manipulation poses a unique challenge compared to rigid object manipulation, as the object itself represents a dynamic environment. In this work, we present a novel RL-based pipeline equipped with variable impedance control and motion adaptation leveraging observation history for generalizable articulated object manipulation, focusing on smooth and dexterous motion during zero-shot sim-to-real transfer. To mitigate the sim-to-real gap, our pipeline reduces reliance on vision: rather than feeding visual features (RGB-D/point cloud) directly to the policy, it first extracts useful low-dimensional data with off-the-shelf modules. Additionally, we further narrow the sim-to-real gap by inferring object motion and its intrinsic properties from observation history and by using impedance control both in simulation and in the real world. Furthermore, we develop a carefully designed training setup with extensive randomization and a specialized reward system (task-aware and motion-aware) that enables multi-staged, end-to-end manipulation without heuristic motion planning. To the best of our knowledge, our policy is the first to report an 84% success rate in the real world, via extensive experiments with various unseen objects.

[AI-23] Narrative-Driven Travel Planning: Geoculturally-Grounded Script Generation with Evolutionary Itinerary Optimization

Link: https://arxiv.org/abs/2502.14456
Authors: Ran Ding,Ziyu Zhang,Ying Zhu,Ziqian Kong,Peilan Xu
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-24] An Efficient Ground-aerial Transportation System for Pest Control Enabled by AI-based Autonomous Nano-UAVs

Link: https://arxiv.org/abs/2502.14455
Authors: Luca Crupi,Luca Butera,Alberto Ferrante,Alessandro Giusti,Daniele Palossi
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments:

[AI-25] HPS: Hard Preference Sampling for Human Preference Alignment

Link: https://arxiv.org/abs/2502.14400
Authors: Xiandong Zou,Wanyu Lin,Yuchen Li,Pan Zhou
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-26] S*: Test Time Scaling for Code Generation

Link: https://arxiv.org/abs/2502.14382
Authors: Dacheng Li,Shiyi Cao,Chengkun Cao,Xiuyu Li,Shangyin Tan,Kurt Keutzer,Jiarong Xing,Joseph E. Gonzalez,Ion Stoica
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-27] Is Q-learning an Ill-posed Problem?

Link: https://arxiv.org/abs/2502.14365
Authors: Philipp Wissmann,Daniel Hein,Steffen Udluft,Thomas Runkler
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Accepted at ESANN 2025

[AI-28] Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning

Link: https://arxiv.org/abs/2502.14361
Authors: Jiachen Zhu,Congmin Zheng,Jianghao Lin,Kounianhua Du,Ying Wen,Yong Yu,Jun Wang,Weinan Zhang
Categories: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Comments:

[AI-29] FlowAgent: Achieving Compliance and Flexibility for Workflow Agents

Link: https://arxiv.org/abs/2502.14345
Authors: Yuchen Shi,Siqi Cai,Zihan Xu,Yuei Qin,Gang Li,Hang Shao,Jiawei Chen,Deqing Yang,Ke Li,Xing Sun
Categories: Artificial Intelligence (cs.AI)
Comments: 8 pages

Abstract:The integration of workflows with large language models (LLMs) enables LLM-based agents to execute predefined procedures, enhancing automation in real-world applications. Traditional rule-based methods tend to limit the inherent flexibility of LLMs, as their predefined execution paths restrict the models’ action space, particularly when unexpected out-of-workflow (OOW) queries are encountered. Conversely, prompt-based methods allow LLMs to fully control the flow, which can weaken the enforcement of procedural compliance. To address these challenges, we introduce FlowAgent, a novel agent framework designed to maintain both compliance and flexibility. We propose the Procedure Description Language (PDL), which combines the adaptability of natural language with the precision of code to formulate workflows. Building on PDL, we develop a comprehensive framework that empowers LLMs to manage OOW queries effectively, while keeping the execution path under the supervision of a set of controllers. Additionally, we present a new evaluation methodology to rigorously assess an LLM agent’s ability to handle OOW scenarios, going beyond the routine flow compliance tested in existing benchmarks. Experiments on three datasets demonstrate that FlowAgent not only adheres to workflows but also effectively manages OOW queries, highlighting its dual strengths in compliance and flexibility. The code is available at this https URL.

[AI-30] An Evaluation of Sakana's AI Scientist for Autonomous Research: Wishful Thinking or an Emerging Reality Towards Artificial General Research Intelligence (AGRI)?

Link: https://arxiv.org/abs/2502.14297
Authors: Joeran Beel,Min-Yen Kan,Moritz Baumgart
Categories: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 16 pages

[AI-31] Graph Anomaly Detection via Adaptive Test-time Representation Learning across Out-of-Distribution Domains

Link: https://arxiv.org/abs/2502.14293
Authors: Delaram Pirhayati,Arlei Silva
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Comments:

[AI-32] Correcting Noisy Multilabel Predictions: Modeling Label Noise through Latent Space Shifts

Link: https://arxiv.org/abs/2502.14281
Authors: Weipeng Huang,Qin Li,Yang Xiao,Cheng Qiao,Tie Cai,Junwei Liao,Neil J. Hurley,Guangyuan Piao
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

Abstract:Noise in data appears to be inevitable in most real-world machine learning applications and can cause severe overfitting. Not only can data features contain noise, but labels are also prone to noise due to human input. In this paper, rather than noisy label learning in multiclass classification, we focus on the less explored area of noisy label learning for multilabel classification. Specifically, we investigate the post-correction of predictions generated by classifiers learned with noisy labels, for two reasons. First, this approach can work directly with trained models, saving computational resources. Second, it can be applied on top of other noisy label correction techniques to achieve further improvements. To handle this problem, we turn to deep generative approaches, which are capable of uncertainty estimation. Our model posits that label noise arises from a stochastic shift in the latent variable, providing a more robust and beneficial means for noisy learning. We develop both unsupervised and semi-supervised learning methods for our model. An extensive empirical study presents solid evidence that our approach consistently improves the independent models and performs better than a number of existing methods across various noisy label settings. Moreover, a comprehensive empirical analysis of the proposed method, including sensitivity analysis and an ablation study, among other elements, is carried out to validate its robustness.

[AI-33] SPRIG: Stackelberg Perception-Reinforcement Learning with Internal Game Dynamics AAAI2025

Link: https://arxiv.org/abs/2502.14264
Authors: Fernando Martinez-Lopez,Juntao Chen,Yingdong Lu
Categories: Artificial Intelligence (cs.AI)
Comments: To appear in: AAAI 2025 Workshop on Planning and Reinforcement Learning (PRL) - Bridging the Gap Between AI Planning and Reinforcement Learning

Abstract:Deep reinforcement learning agents often face challenges to effectively coordinate perception and decision-making components, particularly in environments with high-dimensional sensory inputs where feature relevance varies. This work introduces SPRIG (Stackelberg Perception-Reinforcement learning with Internal Game dynamics), a framework that models the internal perception-policy interaction within a single agent as a cooperative Stackelberg game. In SPRIG, the perception module acts as a leader, strategically processing raw sensory states, while the policy module follows, making decisions based on extracted features. SPRIG provides theoretical guarantees through a modified Bellman operator while preserving the benefits of modern policy optimization. Experimental results on the Atari BeamRider environment demonstrate SPRIG’s effectiveness, achieving around 30% higher returns than standard PPO through its game-theoretical balance of feature extraction and decision-making.
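A small finite example shows the leader-follower structure of a Stackelberg game. The follower best-responds to whatever the leader commits to, and the leader picks the commitment that is best for it given that response. The sketch below is a generic pure-strategy Stackelberg solver on payoff tables, included only to illustrate the game concept. SPRIG itself couples neural perception and policy modules trained with a modified Bellman operator, which this sketch does not model. The payoff numbers are hypothetical.

```python
from typing import List, Tuple

def stackelberg(leader_payoff: List[List[float]],
                follower_payoff: List[List[float]]) -> Tuple[int, int]:
    """Pure-strategy Stackelberg outcome of a finite two-player game.
    payoff[a][b] is the payoff when the leader plays a and the follower plays b.
    Ties in the follower's best response are broken in the leader's favour."""
    best = None
    for a, row in enumerate(follower_payoff):
        top = max(row)
        responses = [b for b, v in enumerate(row) if v == top]
        b = max(responses, key=lambda r: leader_payoff[a][r])  # follower's reply to a
        if best is None or leader_payoff[a][b] > leader_payoff[best[0]][best[1]]:
            best = (a, b)
    return best
```

With `leader_payoff = [[2, 4], [1, 3]]` and `follower_payoff = [[1, 0], [0, 1]]`, leading with action 1 induces follower action 1 and earns the leader 3. Moving simultaneously, the leader would play its dominant action 0 and earn only 2. This gap is the value of committing first.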

[AI-34] Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation

Link: https://arxiv.org/abs/2502.14254
Authors: Lingfeng Zhang,Yuecheng Liu,Zhanguang Zhang,Matin Aghaei,Yaochen Hu,Hongjian Gu,Mohammad Ali Alomrani,David Gamaliel Arcos Bravo,Raika Karimi,Atia Hamidizadeh,Haoping Xu,Guowei Huang,Zhanpeng Zhang,Tongtong Cao,Weichao Qiu,Xingyue Quan,Jianye Hao,Yuzheng Zhuang,Yingxue Zhang
Categories: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Comments:

Abstract:Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have made them powerful tools in embodied navigation, enabling agents to leverage commonsense and spatial reasoning for efficient exploration in unfamiliar environments. Existing LLM-based approaches convert global memory, such as semantic or topological maps, into language descriptions to guide navigation. While this improves efficiency and reduces redundant exploration, the loss of geometric information in language-based representations hinders spatial reasoning, especially in intricate environments. To address this, VLM-based approaches directly process ego-centric visual inputs to select optimal directions for exploration. However, relying solely on a first-person perspective makes navigation a partially observed decision-making problem, leading to suboptimal decisions in complex environments. In this paper, we present a novel vision-language model (VLM)-based navigation framework that addresses these challenges by adaptively retrieving task-relevant cues from a global memory module and integrating them with the agent’s egocentric observations. By dynamically aligning global contextual information with local perception, our approach enhances spatial reasoning and decision-making in long-horizon tasks. Experimental results demonstrate that the proposed method surpasses previous state-of-the-art approaches in object navigation tasks, providing a more effective and scalable solution for embodied navigation.

[AI-35] SleepGMUformer: A gated multimodal temporal neural network for sleep staging

Link: https://arxiv.org/abs/2502.14227
Authors: Chenjun Zhao,Xuesen Niu,Xinglin Yu,Long Chen,Na Lv,Huiyu Zhou,Aite Zhao
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-36] Enhancing Pavement Sensor Data Acquisition for AI-Driven Transportation Research

Link: https://arxiv.org/abs/2502.14222
Authors: Manish Kumar Krishne Gowda,Andrew Balmos,Shin Boonam,James V. Krogmeier
Categories: Databases (cs.DB); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Comments: This paper was accepted for presentation at the 104th TRB Annual Meeting, held on January 5-9, 2025, in Washington, D.C., and was presented during the poster session on January 8, 2025

[AI-37] Investigating the Impact of LLM Personality on Cognitive Bias Manifestation in Automated Decision-Making Tasks

Link: https://arxiv.org/abs/2502.14219
Authors: Jiangen He,Jiqun Liu
Categories: Artificial Intelligence (cs.AI)
Comments:

[AI-38] Rethinking Spiking Neural Networks from an Ensemble Learning Perspective ICLR2025

Link: https://arxiv.org/abs/2502.14218
Authors: Yongqi Ding,Lin Zuo,Mengmeng Jing,Pei He,Hanpu Deng
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Published as a conference paper at ICLR 2025

[AI-39] Towards Secure Program Partitioning for Smart Contracts with LLM's In-Context Learning

Link: https://arxiv.org/abs/2502.14215
Authors: Ye Liu,Yuqing Niu,Chengyan Ma,Ruidong Han,Wei Ma,Yi Li,Debin Gao,David Lo
Categories: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Comments:

Abstract:Smart contracts are highly susceptible to manipulation attacks due to the leakage of sensitive information. Addressing manipulation vulnerabilities is particularly challenging because they stem from inherent data confidentiality issues rather than straightforward implementation bugs. To tackle this by preventing sensitive information leakage, we present PartitionGPT, the first LLM-driven approach that combines static analysis with the in-context learning capabilities of large language models (LLMs) to partition smart contracts into privileged and normal codebases, guided by a few annotated sensitive data variables. We evaluated PartitionGPT on 18 annotated smart contracts containing 99 sensitive functions. The results demonstrate that PartitionGPT successfully generates compilable and verified partitions for 78% of the sensitive functions while reducing code by approximately 30% compared to a function-level partitioning approach. Furthermore, we evaluated PartitionGPT on nine real-world manipulation attacks that led to a total loss of 25 million dollars; PartitionGPT effectively prevents eight of them, highlighting its potential for broad applicability and the necessity of secure program partitioning during smart contract development to diminish manipulation vulnerabilities.

[AI-40] Accurate Forgetting for Heterogeneous Federated Continual Learning ICLR2024

Link: https://arxiv.org/abs/2502.14205
Authors: Abudukelimu Wuerkaixi,Sen Cui,Jingfeng Zhang,Kunda Yan,Bo Han,Gang Niu,Lei Fang,Changshui Zhang,Masashi Sugiyama
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: published in ICLR 2024

[AI-41] Causal Mean Field Multi-Agent Reinforcement Learning

Link: https://arxiv.org/abs/2502.14200
Authors: Hao Ma,Zhiqiang Pu,Yi Pan,Boyin Liu,Junlong Gao,Zhenyu Guo
Categories: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Comments:

[AI-42] Adaptive Sparsified Graph Learning Framework for Vessel Behavior Anomalies AAAI

Link: https://arxiv.org/abs/2502.14197
Authors: Jeehong Kim,Minchan Kim,Jaeseong Ju,Youngseok Hwang,Wonhee Lee,Hyunwoo Park
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Anomaly Detection in Scientific Domains AAAI Workshop

[AI-43] Type 1 Diabetes Management using GLIMMER: Glucose Level Indicator Model with Modified Error Rate

Link: https://arxiv.org/abs/2502.14183
Authors: Saman Khamesian,Asiful Arefeen,Adela Grando,Bithika Thompson,Hassan Ghasemzadeh
Categories: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

Abstract:Managing Type 1 Diabetes (T1D) demands constant vigilance as individuals strive to regulate their blood glucose levels to avert the dangers of dysglycemia (hyperglycemia or hypoglycemia). Despite the advent of sophisticated technologies such as automated insulin delivery (AID) systems, achieving optimal glycemic control remains a formidable task. AID systems integrate continuous subcutaneous insulin infusion (CSII) and continuous glucose monitors (CGM) data, offering promise in reducing variability and increasing glucose time-in-range. However, these systems often fail to prevent dysglycemia, partly due to limitations in prediction algorithms that lack the precision to avert abnormal glucose events. This gap highlights the need for proactive behavioral adjustments. We address this need with GLIMMER, Glucose Level Indicator Model with Modified Error Rate, a machine learning approach for forecasting blood glucose levels. GLIMMER categorizes glucose values into normal and abnormal ranges and devises a novel custom loss function to prioritize accuracy in dysglycemic events where patient safety is critical. To evaluate the potential of GLIMMER for T1D management, we both use a publicly available dataset and collect new data involving 25 patients with T1D. In predicting next-hour glucose values, GLIMMER achieved a root mean square error (RMSE) of 23.97 (+/-3.77) and a mean absolute error (MAE) of 15.83 (+/-2.09) mg/dL. These results reflect a 23% improvement in RMSE and a 31% improvement in MAE compared to the best-reported error rates.

[AI-44] A modal logic translation of the AGM axioms for belief revision

Link: https://arxiv.org/abs/2502.14176
Authors: Giacomo Bonanno
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
Comments: 19 pages, 3 figures

[AI-45] Efficient Inverse Multiagent Learning

Link: https://arxiv.org/abs/2502.14160
Authors: Denizalp Goktas,Amy Greenwald,Sadie Zhao,Alec Koppel,Sumitra Ganesh
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Theoretical Economics (econ.TH)
Comments: Paper was submitted to the International Conference on Learning Representations (2024) under the title of “Generative Adversarial Inverse Multiagent Learning”, and renamed for the camera-ready submission as “Efficient Inverse Multiagent Learning”

[AI-46] Multi-Agent Risks from Advanced AI

Link: https://arxiv.org/abs/2502.14143
Authors: Lewis Hammond,Alan Chan,Jesse Clifton,Jason Hoelscher-Obermaier,Akbir Khan,Euan McLean,Chandler Smith,Wolfram Barfuss,Jakob Foerster,Tomáš Gavenčiak,TheAnh Han,Edward Hughes,Vojtěch Kovařík,Jan Kulveit,Joel Z. Leibo,Caspar Oesterheld,Christian Schroeder de Witt,Nisarg Shah,Michael Wellman,Paolo Bova,Theodor Cimpeanu,Carson Ezell,Quentin Feuillade-Montixi,Matija Franklin,Esben Kran,Igor Krawczuk,Max Lamparth,Niklas Lauffer,Alexander Meinke,Sumeet Motwani,Anka Reuel,Vincent Conitzer,Michael Dennis,Iason Gabriel,Adam Gleave,Gillian Hadfield,Nika Haghtalab,Atoosa Kasirzadeh,Sébastien Krier,Kate Larson,Joel Lehman,David C. Parkes,Georgios Piliouras,Iyad Rahwan
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
Comments: Cooperative AI Foundation, Technical Report #1

[AI-47] Gradients can train reward models: An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model

Link: https://arxiv.org/abs/2502.14131
Authors: Enoch H. Kang,Hema Yoganarasimhan,Lalit Jain
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Econometrics (econ.EM)
Comments:

[AI-48] Zero loss guarantees and explicit minimizers for generic overparametrized Deep Learning networks

Link: https://arxiv.org/abs/2502.14114
Authors: Thomas Chen,Andrew G. Moore
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Analysis of PDEs (math.AP); Optimization and Control (math.OC); Machine Learning (stat.ML)
Comments: AMS Latex, 9 pages

[AI-49] Explainable Distributed Constraint Optimization Problems

Link: https://arxiv.org/abs/2502.14102
Authors: Ben Rachmut,Stylianos Loukas Vasileiou,Nimrod Meir Weinstein,Roie Zivan,William Yeoh
Subjects: Artificial Intelligence (cs.AI)
Comments:

[AI-50] Personalized Education with Generative AI and Digital Twins: VR RAG and Zero-Shot Sentiment Analysis for Industry 4.0 Workforce Development

Link: https://arxiv.org/abs/2502.14080
Authors: Yu-Zheng Lin,Karan Petal,Ahmed H Alhamadah,Sujan Ghimire,Matthew William Redondo,David Rafael Vidal Corona,Jesus Pacheco,Soheil Salehi,Pratik Satam
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Comments:

[AI-51] Towards a Learning Theory of Representation Alignment

Link: https://arxiv.org/abs/2502.14047
Authors: Francesco Insulla,Shuo Huang,Lorenzo Rosasco
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Comments:

[AI-52] Position: There are no Champions in Long-Term Time Series Forecasting

Link: https://arxiv.org/abs/2502.14045
Authors: Lorenzo Brigato,Rafael Morand,Knut Strømmen,Maria Panagiotou,Markus Schmidt,Stavroula Mougiakakou
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Pre-print

Abstract:Recent advances in long-term time series forecasting have introduced numerous complex prediction models that consistently outperform previously published architectures. However, this rapid progression raises concerns regarding inconsistent benchmarking and reporting practices, which may undermine the reliability of these comparisons. Our position emphasizes the need to shift focus away from pursuing ever-more complex models and towards enhancing benchmarking practices through rigorous and standardized evaluation methods. To support our claim, we first perform a broad, thorough, and reproducible evaluation of the top-performing models on the most popular benchmark by training 3,500+ networks over 14 datasets. Then, through a comprehensive analysis, we find that slight changes to experimental setups or current evaluation metrics drastically shift the common belief that newly published results are advancing the state of the art. Our findings suggest the need for rigorous and standardized evaluation methods that enable more substantiated claims, including reproducible hyperparameter setups and statistical testing.
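The position argues for statistical testing before claiming that a new forecaster beats existing ones. One simple option, sketched here under the assumption that each model has one error score per dataset (the choice of test is ours, not the paper's), is an exact paired sign-flip permutation test:

```python
import itertools

def paired_sign_flip_pvalue(errors_a, errors_b):
    """Exact two-sided sign-flip permutation test on paired per-dataset errors.

    Under H0 the differences errors_a[i] - errors_b[i] are symmetric around zero,
    so every assignment of signs is equally likely. All 2^n patterns are enumerated.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(sum(diffs))
    extreme, patterns = 0, 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        patterns += 1
        # Small tolerance so that the observed pattern always counts despite rounding.
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / patterns

# Hypothetical MSEs of two models on four datasets.
print(paired_sign_flip_pvalue([0.40, 0.35, 0.52, 0.61], [0.38, 0.36, 0.50, 0.60]))
```

With 14 datasets the loop visits 2^14 = 16,384 sign patterns, which is cheap; with many more pairs one would sample sign patterns at random instead of enumerating them.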

[AI-53] Asking for Help Enables Safety Guarantees Without Sacrificing Effectiveness

Link: https://arxiv.org/abs/2502.14043
Authors: Benjamin Plaut,Juan Liévano-Karim,Stuart Russell
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments:

[AI-54] Appeal prediction for AI up-scaled Images

Link: https://arxiv.org/abs/2502.14013
Authors: Steve Göring,Rasmus Merten,Alexander Raake
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Comments:

Abstract:DNN- or AI-based up-scaling algorithms are gaining in popularity due to the improvements in machine learning. Various up-scaling models using CNNs, GANs or mixed approaches have been published. The majority of models are evaluated using PSNR and SSIM or only a few example images. However, a performance evaluation with a wide range of real-world images and subjective evaluation is missing, which we tackle in the following paper. For this reason, we describe our developed dataset, which uses 136 base images and five different up-scaling methods, namely Real-ESRGAN, BSRGAN, waifu2x, KXNet, and Lanczos. Overall the dataset consists of 1496 annotated images. The labeling of our dataset focused on image appeal and has been performed using crowd-sourcing employing our open-source tool AVRate Voyager. We evaluate the appeal of the different methods, and the results indicate that Real-ESRGAN and BSRGAN are the best. Furthermore, we train a DNN to detect which up-scaling method has been used, the trained models have a good overall performance in our evaluation. In addition to this, we evaluate state-of-the-art image appeal and quality models, here none of the models showed a high prediction performance, therefore we also trained two own approaches. The first uses transfer learning and has the best performance, and the second model uses signal-based features and a random forest model with good overall performance. We share the data and implementation to allow further research in the context of open science.
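Since the abstract notes that most up-scaling models are scored with PSNR and SSIM, here is a minimal pure-Python sketch of PSNR for 8-bit images (SSIM is omitted; the images are assumed to be equally sized 2-D lists of grayscale values):

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    squared_errors = [(r - t) ** 2
                      for row_r, row_t in zip(reference, test)
                      for r, t in zip(row_r, row_t)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

ref = [[0, 0], [0, 0]]
out = [[1, 1], [1, 1]]
print(round(psnr(ref, out), 2))  # → 48.13
```

Because PSNR is a pure pixel-fidelity measure, it says little about perceived appeal, which is the gap the crowd-sourced ratings above are meant to fill.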

[AI-55] DFDT: Dynamic Fast Decision Tree for IoT Data Stream Mining on Edge Devices

Link: https://arxiv.org/abs/2502.14011
Authors: Afonso Lourenço,João Rodrigo,João Gama,Goreti Marreiros
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Comments:

[AI-56] Rectified Lagrangian for Out-of-Distribution Detection in Modern Hopfield Networks AAAI2025

Link: https://arxiv.org/abs/2502.14003
Authors: Ryo Moriai,Nakamasa Inoue,Masayuki Tanaka,Rei Kawakami,Satoshi Ikehata,Ikuro Sato
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Comments: Accepted to AAAI 2025

[AI-57] Human-Artificial Interaction in the Age of Agentic AI: A System-Theoretical Approach

Link: https://arxiv.org/abs/2502.14000
Authors: Uwe M. Borghoff,Paolo Bottoni,Remo Pareschi
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Comments: 27 pages, 10 figures

[AI-58] Generative Detail Enhancement for Physically Based Materials

Link: https://arxiv.org/abs/2502.13994
Authors: Saeed Hadadan,Benedikt Bitterli,Tizian Zeltner,Jan Novák,Fabrice Rousselle,Jacob Munkberg,Jon Hasselgren,Bartlomiej Wronski,Matthias Zwicker
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI)
Comments:

Abstract:We present a tool for enhancing the detail of physically based materials using an off-the-shelf diffusion model and inverse rendering. Our goal is to enhance the visual fidelity of materials with detail that is often tedious to author, by adding signs of wear, aging, weathering, etc. As these appearance details are often rooted in real-world processes, we leverage a generative image model trained on a large dataset of natural images with corresponding visuals in context. Starting with a given geometry, UV mapping, and basic appearance, we render multiple views of the object. We use these views, together with an appearance-defining text prompt, to condition a diffusion model. The details it generates are then backpropagated from the enhanced images to the material parameters via inverse differentiable rendering. For inverse rendering to be successful, the generated appearance has to be consistent across all the images. We propose two priors to address the multi-view consistency of the diffusion model. First, we ensure that the initial noise that seeds the diffusion process is itself consistent across views by integrating it from a view-independent UV space. Second, we enforce geometric consistency by biasing the attention mechanism via a projective constraint so that pixels attend strongly to their corresponding pixel locations in other views. Our approach does not require any training or finetuning of the diffusion model, is agnostic of the material model used, and the enhanced material properties, i.e., 2D PBR textures, can be further edited by artists.

[AI-59] MILE: Model-based Intervention Learning ICRA

Link: https://arxiv.org/abs/2502.13519
Authors: Yigit Korkmaz,Erdem Bıyık
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: International Conference on Robotics and Automation (ICRA)

[AI-60] Human Misperception of Generative-AI Alignment: A Laboratory Experiment

Link: https://arxiv.org/abs/2502.14708
Authors: Kevin He,Ran Shorrer,Mengjia Xia
Subjects: Theoretical Economics (econ.TH); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Comments:

[AI-61] Distribution Matching for Self-Supervised Transfer Learning

Link: https://arxiv.org/abs/2502.14424
Authors: Yuling Jiao,Wensen Ma,Defeng Sun,Hansheng Wang,Yang Wang
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
Comments:

[AI-62] Reliable Explainability of Deep Learning Spatial-Spectral Classifiers for Improved Semantic Segmentation in Autonomous Driving

Link: https://arxiv.org/abs/2502.14416
Authors: Jon Gutiérrez-Zaballa,Koldo Basterretxea,Javier Echanobe
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[AI-63] Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning

Link: https://arxiv.org/abs/2502.14372
Authors: Austin Yubo He,Zi-Wen Liu
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Comments: 18 pages, 14 figures, 4 tables

[AI-64] Purest Quantum State Identification

Link: https://arxiv.org/abs/2502.14334
Authors: Yingqi Yu,Honglin Chen,Jun Wu,Wei Xie,Xiangyang Li
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI)
Comments:

[AI-65] Weighted Low-rank Approximation via Stochastic Gradient Descent on Manifolds

Link: https://arxiv.org/abs/2502.14174
Authors: Conglong Xu,Peiqi Yang,Hao Wu
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Comments:

[AI-66] Multi-Objective Bayesian Optimization for Networked Black-Box Systems: A Path to Greener Profits and Smarter Designs

Link: https://arxiv.org/abs/2502.14121
Authors: Akshay Kudva,Wei-Ting Tang,Joel A. Paulson
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

Abstract:Designing modern industrial systems requires balancing several competing objectives, such as profitability, resilience, and sustainability, while accounting for complex interactions between technological, economic, and environmental factors. Multi-objective optimization (MOO) methods are commonly used to navigate these tradeoffs, but selecting the appropriate algorithm to tackle these problems is often unclear, particularly when system representations vary from fully equation-based (white-box) to entirely data-driven (black-box) models. While grey-box MOO methods attempt to bridge this gap, they typically impose rigid assumptions on system structure, requiring models to conform to the underlying structural assumptions of the solver rather than the solver adapting to the natural representation of the system of interest. In this chapter, we introduce a unifying approach to grey-box MOO by leveraging network representations, which provide a general and flexible framework for modeling interconnected systems as a series of function nodes that share various inputs and outputs. Specifically, we propose MOBONS, a novel Bayesian optimization-inspired algorithm that can efficiently optimize general function networks, including those with cyclic dependencies, enabling the modeling of feedback loops, recycle streams, and multi-scale simulations - features that existing methods fail to capture. Furthermore, MOBONS incorporates constraints, supports parallel evaluations, and preserves the sample efficiency of Bayesian optimization while leveraging network structure for improved scalability. We demonstrate the effectiveness of MOBONS through two case studies, including one related to sustainable process design. By enabling efficient MOO under general graph representations, MOBONS has the potential to significantly enhance the design of more profitable, resilient, and sustainable engineering systems.
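Multi-objective methods such as MOBONS return a set of trade-off designs rather than a single optimum. As a minimal illustration of that kind of output (not of MOBONS itself), the sketch below keeps only the Pareto-optimal, i.e. non-dominated, points, assuming every objective is to be minimized:

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives are assumed to be minimized (e.g. cost, emissions).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (cost, emissions) pairs for four candidate designs.
designs = [(1, 5), (2, 3), (3, 4), (4, 1)]
print(pareto_front(designs))  # → [(1, 5), (2, 3), (4, 1)]
```

This brute-force filter is quadratic in the number of points, which is fine for inspecting the results of a sample-efficient optimizer that only evaluates a modest number of designs.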

[AI-67] Towards a perturbation-based explanation for medical AI as differentiable programs

Link: https://arxiv.org/abs/2502.14001
Authors: Takeshi Abe,Yoshiyuki Asai
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments: 7 pages, 1 figure

[AI-68] A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior

Link: https://arxiv.org/abs/2502.13998
Authors: Hengyue Liang,Taihui Li,Ju Sun
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)
Comments:

Abstract:Image watermarks have been considered a promising technique to help detect AI-generated content, which can be used to protect copyright or prevent fake image abuse. In this work, we present a black-box method for removing invisible image watermarks, without the need of any dataset of watermarked images or any knowledge about the watermark system. Our approach is simple to implement: given a single watermarked image, we regress it by deep image prior (DIP). We show that from the intermediate steps of DIP one can reliably find an evasion image that can remove invisible watermarks while preserving high image quality. Due to its unique working mechanism and practical effectiveness, we advocate including DIP as a baseline evasion method for benchmarking the robustness of watermarking systems. Finally, by showing the limited ability of DIP and other existing black-box methods in evading training-based visible watermarks, we discuss the positive implications on the practical use of training-based visible watermarks to prevent misinformation abuse.

[AI-69] Learning to Discover Regulatory Elements for Gene Expression Prediction

Link: https://arxiv.org/abs/2502.13991
Authors: Xingyu Su,Haiyang Yu,Degui Zhi,Shuiwang Ji
Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI)
Comments:

[AI-70] Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders

Link: https://arxiv.org/abs/2502.13983
Authors: Seungbae Kim,Daeun Lee,Brielle Stark,Jinyoung Han
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
Comments:

[AI-71] Utilizing Effective Dynamic Graph Learning to Shield Financial Stability from Risk Propagation

Link: https://arxiv.org/abs/2502.13979
Authors: Guanyuan Yu,Qing Li,Yu Zhao,Jun Wang,YiJun Chen,Shaolei Chen
Subjects: Risk Management (q-fin.RM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[AI-72] IncepFormerNet: A multi-scale multi-head attention network for SSVEP classification

Link: https://arxiv.org/abs/2502.13972
Authors: Yan Huang,Yongru Chen,Lei Cao,Yongnian Cao,Xuechun Yang,Yilin Dong,Tianyu Liu
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Comments:

[AI-73] Bridging Simulation and Reality: A 3D Clustering-Based Deep Learning Model for UAV-Based RF Source Localization

Link: https://arxiv.org/abs/2502.13969
Authors: Saad Masrur,Ismail Guvenc
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)
Comments: This paper has been submitted to IEEE ICC 2025

Machine Learning

[LG-0] Generating π-Functional Molecules Using STGG with Active Learning

Link: https://arxiv.org/abs/2502.14842
Authors: Alexia Jolicoeur-Martineau,Yan Zhang,Boris Knyazev,Aristide Baratin,Cheng-Hao Liu
Subjects: Machine Learning (cs.LG)
Comments: Code: this https URL

Abstract:Generating novel molecules with out-of-distribution properties is a major challenge in molecular discovery. While supervised learning methods generate high-quality molecules similar to those in a dataset, they struggle to generalize to out-of-distribution properties. Reinforcement learning can explore new chemical spaces but often conducts ‘reward-hacking’ and generates non-synthesizable molecules. In this work, we address this problem by integrating a state-of-the-art supervised learning method, STGG+, in an active learning loop. Our approach iteratively generates, evaluates, and fine-tunes STGG+ to continuously expand its knowledge. We denote this approach STGG+AL. We apply STGG+AL to the design of organic π-functional materials, specifically two challenging tasks: 1) generating highly absorptive molecules characterized by high oscillator strength and 2) designing absorptive molecules with reasonable oscillator strength in the near-infrared (NIR) range. The generated molecules are validated and rationalized in-silico with time-dependent density functional theory. Our results demonstrate that our method is highly effective in generating novel molecules with high oscillator strength, contrary to existing methods such as reinforcement learning (RL) methods. We open-source our active-learning code along with our Conjugated-xTB dataset containing 2.9 million π-conjugated molecules and the function for approximating the oscillator strength and absorption wavelength (based on sTDA-xTB).

[LG-1] Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning

Link: https://arxiv.org/abs/2502.14840
Authors: Arun Sharma,Majid Farhadloo,Mingzhou Yang,Ruolei Zeng,Subhankar Ghosh,Shashi Shekhar
Subjects: Machine Learning (cs.LG)
Comments:

Abstract:Given inputs of diverse soil characteristics and climate data gathered from various regions, we aimed to build a model to predict accurate land emissions. The problem is important since accurate quantification of the carbon cycle in agroecosystems is crucial for mitigating climate change and ensuring sustainable food production. Predicting accurate land emissions is challenging since calibrating the heterogeneous nature of soil properties, moisture, and environmental conditions is hard at decision-relevant scales. Traditional approaches do not adequately estimate land emissions due to location-independent parameters failing to leverage the spatial heterogeneity and also require large datasets. To overcome these limitations, we proposed Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning (SDSA-KGML), which leverages location-dependent parameters that account for significant spatial heterogeneity in soil moisture from multiple sites within the same region. Experimental results demonstrate that SDSA-KGML models achieve higher local accuracy for the specified states in the Midwest Region.

[LG-2] Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide

Link: https://arxiv.org/abs/2502.14833
Authors: Xingyu Zhao
Subjects: Machine Learning (cs.LG)
Comments: This is a preprint of the following chapter: X. Zhao, Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide, published in the book Adversarial Example Detection and Mitigation Using Machine Learning, edited by Ehsan Nowroozi, Rahim Taheri, Lucas Cordeiro, 2025, Springer Nature. The final authenticated version will be available online soon

Abstract:Deep learning (DL) has demonstrated significant potential across various safety-critical applications, yet ensuring its robustness remains a key challenge. While adversarial robustness has been extensively studied in worst-case scenarios, probabilistic robustness (PR) offers a more practical perspective by quantifying the likelihood of failures under stochastic perturbations. This paper provides a concise yet comprehensive overview of PR, covering its formal definitions, evaluation and enhancement methods. We introduce a reformulated “min-max” optimisation framework for adversarial training specifically designed to improve PR. Furthermore, we explore the integration of PR verification evidence into system-level safety assurance, addressing challenges in translating DL model-level robustness to system-level claims. Finally, we highlight open research questions, including benchmarking PR evaluation methods, extending PR to generative AI tasks, and developing rigorous methodologies and case studies for system-level integration.
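Probabilistic robustness asks how likely a prediction is to survive random perturbations, rather than whether it survives the worst one. A minimal Monte Carlo sketch of that quantity, assuming isotropic Gaussian noise and a hypothetical two-feature linear classifier standing in for a deep network:

```python
import random

def predict(x, w=(1.0, -1.0), b=0.0):
    """Hypothetical toy classifier standing in for a DNN: 1 if w.x + b > 0 else 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

def estimate_pr(x, sigma, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P[predict(x + noise) == predict(x)], noise ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    label = predict(x)
    agree = sum(
        predict([xi + rng.gauss(0.0, sigma) for xi in x]) == label
        for _ in range(n_samples)
    )
    return agree / n_samples

print(estimate_pr([2.0, 0.0], sigma=0.1))  # far from the decision boundary: close to 1.0
print(estimate_pr([0.0, 0.0], sigma=1.0))  # on the decision boundary: close to 0.5
```

Before such an estimate is used as assurance evidence it would normally be paired with a confidence interval (for example a Clopper-Pearson bound on the failure rate), since it is only a sample proportion.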

[LG-3] Fundamental Limitations in Defending LLM Finetuning APIs

Link: https://arxiv.org/abs/2502.14828
Authors: Xander Davies,Eric Winsor,Tomek Korbak,Alexandra Souly,Robert Kirk,Christian Schroeder de Witt,Yarin Gal
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Comments:

Abstract:LLM developers have imposed technical interventions to prevent fine-tuning misuse attacks, attacks where adversaries evade safeguards by fine-tuning the model using a public API. Previous work has established several successful attacks against specific fine-tuning API defences. In this work, we show that defences of fine-tuning APIs that seek to detect individual harmful training or inference samples (‘pointwise’ detection) are fundamentally limited in their ability to prevent fine-tuning attacks. We construct ‘pointwise-undetectable’ attacks that repurpose entropy in benign model outputs (e.g. semantic or syntactic variations) to covertly transmit dangerous knowledge. Our attacks are composed solely of unsuspicious benign samples that can be collected from the model before fine-tuning, meaning training and inference samples are all individually benign and low-perplexity. We test our attacks against the OpenAI fine-tuning API, finding they succeed in eliciting answers to harmful multiple-choice questions, and that they evade an enhanced monitoring system we design that successfully detects other fine-tuning attacks. We encourage the community to develop defences that tackle the fundamental limitations we uncover in pointwise fine-tuning API defences.

[LG-4] Meshless Shape Optimization using Neural Networks and Partial Differential Equations on Graphs

Link: https://arxiv.org/abs/2502.14821
Authors: Eloi Martinet,Leon Bungert
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Optimization and Control (math.OC)
Comments: 13 pages, 5 figures, accepted at SSVM 2025

Abstract:Shape optimization involves the minimization of a cost function defined over a set of shapes, often governed by a partial differential equation (PDE). In the absence of closed-form solutions, one relies on numerical methods to approximate the solution. The level set method – when coupled with the finite element method – is one of the most versatile numerical shape optimization approaches but still suffers from the limitations of most mesh-based methods. In this work, we present a fully meshless level set framework that leverages neural networks to parameterize the level set function and employs the graph Laplacian to approximate the underlying PDE. Our approach enables precise computations of geometric quantities such as surface normals and curvature, and allows tackling optimization problems within the class of convex shapes.
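To make the graph-Laplacian ingredient concrete, here is a minimal sketch of the unnormalized Laplacian L = D - W on a point cloud, assuming a Gaussian kernel truncated at radius eps. The paper's actual kernel, and the eps-dependent rescaling needed for L to approximate the continuum operator, are not given in the abstract and are omitted. Up to sign and scaling, L plays the role of the negative Laplacian, and it annihilates constant functions:

```python
import math

def graph_laplacian(points, eps):
    """Unnormalized graph Laplacian L = D - W of an eps-neighborhood graph.

    For i != j, W[i][j] = exp(-|x_i - x_j|^2 / eps^2) if |x_i - x_j| <= eps, else 0.
    The Gaussian kernel is an illustrative assumption.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            if d2 <= eps ** 2:
                W[i][j] = math.exp(-d2 / eps ** 2)
    degree = [sum(row) for row in W]
    return [[(degree[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def matvec(L, u):
    """Return L @ u for a list-of-lists matrix L and a vector u."""
    return [sum(l * x for l, x in zip(row, u)) for row in L]

# Four collinear sample points; with eps = 0.6 only neighbors are connected.
pts = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0)]
L = graph_laplacian(pts, eps=0.6)
print(matvec(L, [1.0, 1.0, 1.0, 1.0]))  # constants lie in the kernel (zeros up to rounding)
```

Because only the sample points enter, no mesh is needed, which is the property the meshless framework relies on.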

[LG-5] Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models

Link: https://arxiv.org/abs/2502.14819
Authors: Vlad Sobal,Wancong Zhang,Kyunghyun Cho,Randall Balestriero,Tim G. J. Rudner,Yann LeCun
Subjects: Machine Learning (cs.LG)
Comments: Project web page: this https URL

Abstract:A long-standing goal in AI is to build agents that can solve a variety of tasks across different environments, including previously unseen ones. Two dominant approaches tackle this challenge: (i) reinforcement learning (RL), which learns policies through trial and error, and (ii) optimal control, which plans actions using a learned or known dynamics model. However, their relative strengths and weaknesses remain underexplored in the setting where agents must learn from offline trajectories without reward annotations. In this work, we systematically analyze the performance of different RL and control-based methods under datasets of varying quality. On the RL side, we consider goal-conditioned and zero-shot approaches. On the control side, we train a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and use it for planning. We study how dataset properties (such as data diversity, trajectory quality, and environment variability) affect the performance of these approaches. Our results show that model-free RL excels when abundant, high-quality data is available, while model-based planning excels in generalization to novel environment layouts, trajectory stitching, and data-efficiency. Notably, planning with a latent dynamics model emerges as a promising approach for zero-shot generalization from suboptimal data.

[LG-6] Dynamic Low-Rank Sparse Adaptation for Large Language Models ICLR2025

Link: https://arxiv.org/abs/2502.14816
Authors: Weizhong Huang,Yuxin Zhang,Xiawu Zheng,Yang Liu,Jing Lin,Yiwu Yao,Rongrong Ji
Subjects: Machine Learning (cs.LG)
Comments: Accepted to ICLR 2025

Abstract:Despite the efficacy of network sparsity in alleviating the deployment strain of Large Language Models (LLMs), it endures significant performance degradation. Applying Low-Rank Adaptation (LoRA) to fine-tune the sparse LLMs offers an intuitive approach to counter this predicament, while it holds shortcomings include: 1) The inability to integrate LoRA weights into sparse LLMs post-training, and 2) Insufficient performance recovery at high sparsity ratios. In this paper, we introduce dynamic Low-rank Sparse Adaptation (LoSA), a novel method that seamlessly integrates low-rank adaptation into LLM sparsity within a unified framework, thereby enhancing the performance of sparse LLMs without increasing the inference latency. In particular, LoSA dynamically sparsifies the LoRA outcomes based on the corresponding sparse weights during fine-tuning, thus guaranteeing that the LoRA module can be integrated into the sparse LLMs post-training. Besides, LoSA leverages Representation Mutual Information (RMI) as an indicator to determine the importance of layers, thereby efficiently determining the layer-wise sparsity rates during fine-tuning. Predicated on this, LoSA adjusts the rank of the LoRA module based on the variability in layer-wise reconstruction errors, allocating an appropriate fine-tuning for each layer to reduce the output discrepancies between dense and sparse LLMs. Extensive experiments tell that LoSA can efficiently boost the efficacy of sparse LLMs within a few hours, without introducing any additional inferential burden. For example, LoSA reduced the perplexity of sparse LLaMA-2-7B by 68.73 and increased zero-shot accuracy by 16.32%, achieving a 2.60× speedup on CPU and 2.23× speedup on GPU, requiring only 45 minutes of fine-tuning on a single NVIDIA A100 80GB GPU. Code is available at this https URL.
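The step that keeps LoSA's LoRA weights mergeable is restricting the low-rank update to the sparsity pattern of the pruned weight. A minimal sketch of just that masking step, with hypothetical function names and list-of-lists matrices (the RMI-based layer sparsity and the rank allocation are not shown, and in LoSA the mask is applied during fine-tuning as well, not only at merge time):

```python
def matmul(X, Y):
    """Plain list-of-lists matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def merge_masked_lora(W, B, A, scale=1.0):
    """Merge a LoRA update into a sparse weight while keeping its zero pattern.

    The dense update scale * (B @ A) is applied only where W is nonzero, i.e. it is
    multiplied elementwise by W's sparsity mask before being added.
    """
    delta = matmul(B, A)
    return [[w + scale * d if w != 0.0 else 0.0 for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 2.0]]  # pruned weight; zeros are pruned entries
B = [[1.0], [1.0]]            # rank-1 LoRA factors (hypothetical values)
A = [[1.0, 1.0]]
print(merge_masked_lora(W, B, A))  # → [[2.0, 0.0], [0.0, 3.0]]
```

Without the mask, B @ A is dense and merging it would refill every pruned entry, which is exactly shortcoming 1) listed in the abstract.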

[LG-7] PREM: Privately Answering Statistical Queries with Relative Error

Link: https://arxiv.org/abs/2502.14809
Authors: Badih Ghazi,Cristóbal Guzmán,Pritish Kamath,Alexander Knop,Ravi Kumar,Pasin Manurangsi,Sushant Sachdeva
Subjects: Machine Learning (cs.LG)
Comments:

Abstract:We introduce $\mathsf{PREM}$ (Private Relative Error Multiplicative weight update), a new framework for generating synthetic data that achieves a relative error guarantee for statistical queries under $(\varepsilon, \delta)$ differential privacy (DP). Namely, for a domain $\mathcal{X}$, a family $\mathcal{F}$ of queries $f : \mathcal{X} \to \{0, 1\}$, and $\zeta > 0$, our framework yields a mechanism that on input dataset $D \in \mathcal{X}^n$ outputs a synthetic dataset $\widehat{D} \in \mathcal{X}^n$ such that all statistical queries in $\mathcal{F}$ on $D$, namely $\sum_{x \in D} f(x)$ for $f \in \mathcal{F}$, are within a $1 \pm \zeta$ multiplicative factor of the corresponding value on $\widehat{D}$ up to an additive error that is polynomial in $\log |\mathcal{F}|$, $\log |\mathcal{X}|$, $\log n$, $\log(1/\delta)$, $1/\varepsilon$, and $1/\zeta$. In contrast, any $(\varepsilon, \delta)$-DP mechanism is known to require worst-case additive error that is polynomial in at least one of $n$, $|\mathcal{F}|$, or $|\mathcal{X}|$. We complement our algorithm with nearly matching lower bounds.

[LG-8] An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces

Link: https://arxiv.org/abs/2502.14790
Authors: Alexander Terenin,Jeffrey Negrea
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Statistics Theory (math.ST); Machine Learning (stat.ML)
Comments:

Abstract:We develop an analysis of Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner’s prior is defined over the space of an adversary’s future actions, rather than the space of experts. We show regret decomposes into regret the learner expected a priori, plus a prior-robustness-type term we call excess regret. In the classical finite-expert setting, this recovers optimal rates. As an initial step towards practical online learning in settings with a potentially-uncountably-infinite number of experts, we show that Thompson sampling with a certain Gaussian process prior widely-used in the Bayesian optimization literature has a $\mathcal{O}(\beta\sqrt{T}\log(1+\lambda))$ rate against a $\beta$-bounded $\lambda$-Lipschitz adversary.

[LG-9] A Neural Operator-Based Emulator for Regional Shallow Water Dynamics

Link: https://arxiv.org/abs/2502.14782
Authors: Peter Rivera-Casillas,Sourav Dutta,Shukai Cai,Mark Loveland,Kamaljyoti Nath,Khemraj Shukla,Corey Trahan,Jonghyun Lee,Matthew Farthing,Clint Dawson
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Geophysics (physics.geo-ph)
Comments:

Abstract:Coastal regions are particularly vulnerable to the impacts of rising sea levels and extreme weather events. Accurate real-time forecasting of hydrodynamic processes in these areas is essential for infrastructure planning and climate adaptation. In this study, we present the Multiple-Input Temporal Operator Network (MITONet), a novel autoregressive neural emulator that employs dimensionality reduction to efficiently approximate high-dimensional numerical solvers for complex, nonlinear problems that are governed by time-dependent, parameterized partial differential equations. Although MITONet is applicable to a wide range of problems, we showcase its capabilities by forecasting regional tide-driven dynamics described by the two-dimensional shallow-water equations, while incorporating initial conditions, boundary conditions, and a varying domain parameter. We demonstrate MITONet’s performance in a real-world application, highlighting its ability to make accurate predictions by extrapolating both in time and parametric space.

[LG-10] Sparse Activations as Conformal Predictors

Link: https://arxiv.org/abs/2502.14773
Authors: Margarida M. Campos,João Calém,Sophia Sklaviadis,Mário A.T. Figueiredo,André F.T. Martins
Categories: Machine Learning (cs.LG)

Abstract:Conformal prediction is a distribution-free framework for uncertainty quantification that replaces point predictions with sets, offering marginal coverage guarantees (i.e., ensuring that the prediction sets contain the true label with a specified probability, in expectation). In this paper, we uncover a novel connection between conformal prediction and sparse softmax-like transformations, such as sparsemax and $\gamma$-entmax (with $\gamma > 1$), which may assign nonzero probability only to a subset of labels. We introduce new non-conformity scores for classification that make the calibration process correspond to the widely used temperature scaling method. At test time, applying these sparse transformations with the calibrated temperature leads to a support set (i.e., the set of labels with nonzero probability) that automatically inherits the coverage guarantees of conformal prediction. Through experiments on computer vision and text classification benchmarks, we demonstrate that the proposed method achieves competitive results in terms of coverage, efficiency, and adaptiveness compared to standard non-conformity scores based on softmax.
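A sketch of the connection for sparsemax only, under our own derivation rather than the paper's exact definitions. From the sparsemax support condition, label $y$ receives nonzero probability in $\mathrm{sparsemax}(z/T)$ exactly when $s(z, y) = \sum_j \max(z_j - z_y, 0) < T$. Using $s$ as a non-conformity score, the split-conformal quantile of calibration scores therefore plays the role of a temperature, and the sparsemax support at that temperature is the prediction set. The names `score`, `calibrate_temperature` and `prediction_set` are hypothetical; `np.quantile(..., method=...)` needs NumPy 1.22 or later.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (Martins & Astudillo, 2016)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    in_support = 1 + k * z_sorted > cumsum      # holds for a prefix of k values
    k_z = k[in_support][-1]
    tau = (cumsum[k_z - 1] - 1) / k_z
    return np.maximum(z - tau, 0.0)

def score(z, y):
    """Hypothetical score: y is in the support of sparsemax(z / T) iff score < T."""
    z = np.asarray(z, dtype=float)
    return float(np.sum(np.maximum(z - z[y], 0.0)))

def calibrate_temperature(cal_logits, cal_labels, alpha=0.1):
    """Split-conformal quantile of calibration scores, used as the temperature."""
    scores = np.array([score(z, y) for z, y in zip(cal_logits, cal_labels)])
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = float(np.quantile(scores, level, method="higher"))
    return max(q, 1e-12)   # guard: a zero temperature would divide by zero

def prediction_set(z, temperature):
    """Labels with nonzero sparsemax probability at the given temperature."""
    probs = sparsemax(np.asarray(z, dtype=float) / temperature)
    return set(np.flatnonzero(probs).tolist())
```

Because the set is read off the support of a single transformation, no per-label thresholding is needed at test time; the coverage argument rests entirely on how the temperature was calibrated.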

[LG-11] Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination

Link: https://arxiv.org/abs/2502.14772
Authors: Ilias Diakonikolas,Giannis Iakovidis,Daniel M. Kane,Thanasis Pittas
Categories: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

Abstract:We study the algorithmic problem of robust mean estimation of an identity covariance Gaussian in the presence of mean-shift contamination. In this contamination model, we are given a set of points in $\mathbb{R}^d$ generated i.i.d. via the following process. For a parameter $\alpha < 1/2$, the $i$-th sample $x_i$ is obtained as follows: with probability $1-\alpha$, $x_i$ is drawn from $\mathcal{N}(\mu, I)$, where $\mu \in \mathbb{R}^d$ is the target mean; and with probability $\alpha$, $x_i$ is drawn from $\mathcal{N}(z_i, I)$, where $z_i$ is unknown and potentially arbitrary. Prior work characterized the information-theoretic limits of this task. Specifically, it was shown that, in contrast to Huber contamination, in the presence of mean-shift contamination consistent estimation is possible. On the other hand, all known robust estimators in the mean-shift model have running times exponential in the dimension. Here we give the first computationally efficient algorithm for high-dimensional robust mean estimation with mean-shift contamination that can tolerate a constant fraction of outliers. In particular, our algorithm has near-optimal sample complexity, runs in sample-polynomial time, and approximates the target mean to any desired accuracy. Conceptually, our result contributes to a growing body of work that studies inference with respect to natural noise models lying in between fully adversarial and random settings.
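The data-generating process above can be simulated directly. This sketch only samples from the mean-shift model and compares two naive baselines; it does not implement the paper's estimator. The coordinate-wise median is used purely as an illustration: it resists the far-away shifted points much better than the empirical mean but is still biased, which is why consistent estimation needs a more careful algorithm. The function name `sample_mean_shift` and the choice of a single shift vector for all outliers are assumptions.

```python
import numpy as np

def sample_mean_shift(n, mu, alpha, outlier_means, rng):
    """Draw n points from the mean-shift contamination model.

    With probability 1 - alpha a point is N(mu, I); otherwise it is N(z_i, I),
    where z_i is row i of `outlier_means` (arbitrary, shape (n, d)).
    """
    mu = np.asarray(mu, dtype=float)
    is_outlier = rng.random(n) < alpha
    centers = np.where(is_outlier[:, None], outlier_means, mu)
    return centers + rng.standard_normal((n, mu.size)), is_outlier

rng = np.random.default_rng(0)
d, n = 5, 20_000
mu = np.zeros(d)
# Hypothetical adversary: every contaminated point is centred at (50, ..., 50).
x, is_outlier = sample_mean_shift(n, mu, 0.2, np.full((n, d), 50.0), rng)
mean_err = np.linalg.norm(x.mean(axis=0) - mu)          # pulled towards the shift
median_err = np.linalg.norm(np.median(x, axis=0) - mu)  # smaller, but not zero
```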

[LG-12] Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective

Link: https://arxiv.org/abs/2502.14770
Authors: Weizhong Huang,Yuxin Zhang,Xiawu Zheng,Fei Chao,Rongrong Ji
Categories: Machine Learning (cs.LG)

Abstract:In this paper, we address the challenge of determining the layer-wise sparsity rates of large language models (LLMs) through a theoretical perspective. Specifically, we identify a critical issue of “reconstruction error explosion” in existing LLM sparsification methods. This refers to the cumulative effect of reconstruction errors throughout the sparsification process, where errors from earlier layers propagate and amplify in subsequent layers. As a result, the overall reconstruction error increases significantly, leading to a substantial degradation in model performance. Through theoretical analysis, we derive a simple yet effective approach to layer-wise sparsity allocation that mitigates this issue. Our method uses a monotonically increasing arithmetic progression, reducing the process of determining sparsity rates for multiple layers to the determination of a single common difference hyperparameter. Remarkably, this allows for the optimal layer-wise sparsity rates to be identified with just a few trials. Both our theoretical analysis and experimental results demonstrate that this sparsity allocation scheme is near optimal. Extensive experiments show that our method significantly improves the performance of sparse LLMs across various architectures, outperforming existing layer-wise sparsity methods. Furthermore, it enhances the performance of various compression techniques and is applicable to vision and multimodal models. Notably, our method achieves a reduction of 52.10 in perplexity for the 70% sparse LLaMA2-7B model obtained via Wanda, improves average zero-shot accuracy by 10.50%, and delivers speedups of 2.63× and 2.23× on CPU and GPU, respectively.
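A minimal sketch of an arithmetic-progression sparsity schedule. The abstract does not say how the first term is fixed; here we assume the progression is centred so that its mean equals a global target sparsity, which leaves the common difference `delta` as the only hyperparameter. The function name and the range check are hypothetical.

```python
import numpy as np

def layerwise_sparsity(num_layers, target, delta):
    """Monotonically increasing per-layer sparsity rates s_l = target + delta * (l - (L-1)/2).

    Assumption: centring the progression keeps the average sparsity equal to `target`.
    """
    offsets = np.arange(num_layers) - (num_layers - 1) / 2
    rates = target + delta * offsets
    if rates.min() < 0 or rates.max() >= 1:
        raise ValueError("delta too large for this target sparsity")
    return rates

# Usage: 32 layers, 70% average sparsity, earlier layers pruned less than later ones.
rates = layerwise_sparsity(32, target=0.7, delta=0.005)
```

Searching over `delta` alone (for example a handful of values on a grid) is what turns the per-layer allocation into a one-dimensional tuning problem.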

[LG-13] SQL4NN: Validation and expressive querying of models as data

Link: https://arxiv.org/abs/2502.14745
Authors: Mark Gerarts,Juno Steegmans,Jan Van den Bussche
Categories: Databases (cs.DB); Machine Learning (cs.LG)

Abstract:We consider machine learning models, learned from data, to be an important, intensional, kind of data in themselves. As such, various analysis tasks on models can be thought of as queries over this intensional data, often combined with extensional data such as data for training or validation. We demonstrate that relational database systems and SQL can actually be well suited for many such tasks.
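To make "models as data" concrete, here is a small illustration with Python's built-in `sqlite3`: one dense ReLU layer is stored as relational tables and its forward pass is expressed as a join plus aggregate query. The schema (`edge`, `bias`, `act`) is a hypothetical one chosen for this sketch, not the schema used by SQL4NN.

```python
import sqlite3
import numpy as np

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per weighted edge, per bias, per input activation.
conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER, w REAL)")
conn.execute("CREATE TABLE bias (node INTEGER, b REAL)")
conn.execute("CREATE TABLE act (node INTEGER, val REAL)")

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 3 inputs -> 4 outputs
b = rng.normal(size=4)
x = rng.normal(size=3)
conn.executemany("INSERT INTO edge VALUES (?, ?, ?)",
                 [(i, j, float(W[i, j])) for i in range(3) for j in range(4)])
conn.executemany("INSERT INTO bias VALUES (?, ?)", [(j, float(b[j])) for j in range(4)])
conn.executemany("INSERT INTO act VALUES (?, ?)", [(i, float(x[i])) for i in range(3)])

# Forward pass of one ReLU layer: join edges with input activations and biases,
# sum the weighted inputs per output node, then apply max(0, .) (scalar MAX).
rows = conn.execute("""
    SELECT e.dst, MAX(0.0, SUM(e.w * a.val) + bi.b) AS out
    FROM edge AS e
    JOIN act AS a ON a.node = e.src
    JOIN bias AS bi ON bi.node = e.dst
    GROUP BY e.dst, bi.b
    ORDER BY e.dst
""").fetchall()
sql_out = np.array([r[1] for r in rows])
numpy_out = np.maximum(0.0, x @ W + b)   # reference computation
```

The same tables can be combined with ordinary data tables (for example a validation set), which is the kind of mixed intensional/extensional query the abstract describes.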

[LG-14] Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse

Link: https://arxiv.org/abs/2502.14741
Authors: Michael Doherty,Alejandra Beghelli
Categories: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Systems and Control (eess.SY)

Abstract:Many works have investigated reinforcement learning (RL) for routing and spectrum assignment on flex-grid networks but only one work to date has examined RL for fixed-grid with flex-rate transponders, despite production systems using this paradigm. Flex-rate transponders allow existing lightpaths to accommodate new services, a task we term routing and wavelength assignment with lightpath reuse (RWA-LR). We re-examine this problem and present a thorough benchmarking of heuristic algorithms for RWA-LR, which are shown to have 6% increased throughput when candidate paths are ordered by number of hops, rather than total length. We train an RL agent for RWA-LR with graph attention networks for the policy and value functions to exploit the graph-structured data. We provide details of our methodology and open source all of our code for reproduction. We outperform the previous state-of-the-art RL approach by 2.5% (17.4 Tbps mean additional throughput) and the best heuristic by 1.2% (8.5 Tbps mean additional throughput). This marginal gain highlights the difficulty in learning effective RL policies on long horizon resource allocation tasks.

[LG-15] Disentangled Latent Spaces for Reduced Order Models using Deterministic Autoencoders

Link: https://arxiv.org/abs/2502.14679
Authors: Henning Schwarz,Pyei Phyo Lin,Jens-Peter M. Zemke,Thomas Rung
Categories: Machine Learning (cs.LG)

Abstract:Data-driven reduced-order models based on autoencoders generally lack interpretability compared to classical methods such as the proper orthogonal decomposition. More interpretability can be gained by disentangling the latent variables and analyzing the resulting modes. For this purpose, probabilistic $\beta$-variational autoencoders ($\beta$-VAEs) are frequently used in computational fluid dynamics and other simulation sciences. Using a benchmark periodic flow dataset, we show that competitive results can be achieved using non-probabilistic autoencoder approaches that either promote orthogonality or penalize correlation between latent variables. Compared to probabilistic autoencoders, these approaches offer more robustness with respect to the choice of hyperparameters entering the loss function. We further demonstrate the ability of a non-probabilistic approach to identify a reduced number of active latent variables by introducing a correlation penalty, a function also known from the use of $\beta$-VAE. The investigated probabilistic and non-probabilistic autoencoder models are finally used for the dimensionality reduction of aircraft ditching loads, which serves as an industrial application in this work.
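One common way to "penalize correlation between latent variables" is to add the squared off-diagonal entries of the latent correlation matrix to the loss. The sketch below computes that term with NumPy for a batch of latent codes; the exact penalty used in the paper may differ, and in training it would be written in a differentiable framework rather than NumPy. The function name is hypothetical.

```python
import numpy as np

def correlation_penalty(z):
    """Sum of squared off-diagonal Pearson correlations between latent variables.

    z: array of shape (batch, latent_dim). Near 0 for uncorrelated latents;
    undefined (NaN) if a latent variable is constant over the batch.
    """
    corr = np.corrcoef(np.asarray(z, dtype=float), rowvar=False)
    off_diag = corr - np.diag(np.diag(corr))
    return float(np.sum(off_diag ** 2))

rng = np.random.default_rng(0)
independent = rng.normal(size=(5000, 3))
redundant = independent.copy()
redundant[:, 1] = redundant[:, 0]   # latent 1 duplicates latent 0
```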

[LG-16] Beyond the Surface: Uncovering Implicit Locations with LLMs for Personalized Local News KDD

Link: https://arxiv.org/abs/2502.14660
Authors: Gali Katz,Hai Sitton,Guy Gonen,Yohay Kaplan
Categories: Machine Learning (cs.LG)
Comments: 10 pages, 2 figures, submitted to KDD

Abstract:News recommendation systems personalize homepage content to boost engagement, but factors like content type, editorial stance, and geographic focus impact recommendations. Local newspapers balance coverage across regions, yet identifying local articles is challenging due to implicit location cues like slang or landmarks. Traditional methods, such as Named Entity Recognition (NER) and Knowledge Graphs, infer locations, but Large Language Models (LLMs) offer new possibilities while raising concerns about accuracy and explainability. This paper explores LLMs for local article classification in Taboola’s “Homepage For You” system, comparing them to traditional techniques. Key findings: (1) Knowledge Graphs enhance NER models’ ability to detect implicit locations, (2) LLMs outperform traditional methods, and (3) LLMs can effectively identify local content without requiring Knowledge Graph integration. Offline evaluations showed LLMs excel at implicit location classification, while online A/B tests showed a significant increase in local views. A scalable pipeline integrating LLM-based location classification boosted local article distribution by 27%, preserving newspapers’ brand identity and enhancing homepage personalization.

[LG-17] Variance Reduction Methods Do Not Need to Compute Full Gradients: Improved Efficiency through Shuffling

Link: https://arxiv.org/abs/2502.14648
Authors: Daniil Medyakov,Gleb Molodtsov,Savelii Chezhegov,Alexey Rebrikov,Aleksandr Beznosikov
Categories: Machine Learning (cs.LG); Optimization and Control (math.OC)
Comments: 30 pages, 6 figures, 1 table

Abstract:In today’s world, machine learning is hard to imagine without large training datasets and models. This has led to the use of stochastic methods for training, such as stochastic gradient descent (SGD). SGD provides weak theoretical guarantees of convergence, but there are modifications, such as Stochastic Variance Reduced Gradient (SVRG) and StochAstic Recursive grAdient algoritHm (SARAH), that can reduce the variance. These methods require occasional computation of the full gradient, which can be time-consuming. In this paper, we explore variants of variance reduction algorithms that eliminate the need for full gradient computations. To make our approach memory-efficient and avoid full gradient computations, we use two key techniques: the shuffling heuristic and the idea behind the SAG/SAGA methods. As a result, we improve existing estimates for variance reduction algorithms without full gradient computations. Additionally, for non-convex objective functions our estimate matches that of classic shuffling methods, while for strongly convex ones it is an improvement. We conduct a comprehensive theoretical analysis and provide extensive experimental results to validate the efficiency and practicality of our methods for large-scale machine learning problems.
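A minimal sketch that combines the two ingredients named above: a SAGA-style gradient table and random reshuffling of the sample order each epoch, on a least-squares problem. The table starts at zero, so a full gradient is never computed. This is a representative combination chosen for illustration, not necessarily the paper's algorithm or step-size rule; the function name and the step size $1/(3L_{\max})$ are assumptions.

```python
import numpy as np

def shuffled_saga(A, b, lr, epochs, rng):
    """SAGA updates for f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2, visiting
    samples in a fresh random permutation each epoch (no full-gradient pass)."""
    n, d = A.shape
    x = np.zeros(d)
    table = np.zeros((n, d))   # last stored gradient of each component f_i
    avg = np.zeros(d)          # running mean of the table
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = A[i] * (A[i] @ x - b[i])        # gradient of component f_i
            x -= lr * (g - table[i] + avg)       # variance-reduced step
            avg += (g - table[i]) / n            # keep the mean consistent
            table[i] = g
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))
x_star = rng.normal(size=5)
b = A @ x_star                                   # consistent system, minimiser x_star
L_max = np.max(np.sum(A ** 2, axis=1))           # largest component smoothness constant
x_hat = shuffled_saga(A, b, lr=1 / (3 * L_max), epochs=100, rng=rng)
```

The memory cost is the $n \times d$ table, which is the usual price of SAGA-type methods; for linear models it can be reduced to one scalar per sample.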

[LG-18] CER: Confidence Enhanced Reasoning in LLMs

Link: https://arxiv.org/abs/2502.14634
Authors: Ali Razghandi,Seyed Mohammad Hadi Hosseini,Mahdieh Soleymani Baghshah
Categories: Machine Learning (cs.LG)

Abstract:Ensuring the reliability of Large Language Models (LLMs) in complex reasoning tasks remains a formidable challenge, particularly in scenarios that demand precise mathematical calculations and knowledge-intensive open-domain generation. In this work, we introduce an uncertainty-aware framework designed to enhance the accuracy of LLM responses by systematically incorporating model confidence at critical decision points. We propose an approach that encourages multi-step reasoning in LLMs and quantifies the confidence of intermediate answers, such as numerical results in mathematical reasoning and proper nouns in open-domain generation. Then, the overall confidence of each reasoning chain is evaluated based on the confidence of these critical intermediate steps. Finally, we aggregate the answers of the generated response paths in a way that reflects the reliability of each generated content (as opposed to self-consistency, in which each generated chain contributes equally to majority voting). We conducted extensive experiments on five datasets (three mathematical and two open-domain) using four LLMs. The results consistently validate the effectiveness of our novel confidence aggregation method, leading to accuracy improvements of up to 7.4% and 5.8% over baseline approaches in math and open-domain generation tasks, respectively. Code is publicly available at this https URL Aquasar11/CER.
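The contrast between self-consistency and confidence-weighted aggregation can be shown in a few lines. In this sketch, a chain's confidence is assumed to be the product of the confidences of its critical intermediate answers; the abstract does not specify the combination rule, so `chain_confidence` is a hypothetical choice, as are the example chains.

```python
from collections import defaultdict
from math import prod

def chain_confidence(step_confidences):
    """Hypothetical rule: product of the confidences of critical intermediate answers."""
    return prod(step_confidences)

def confidence_weighted_vote(chains):
    """chains: list of (final_answer, [confidence of each critical intermediate answer])."""
    totals = defaultdict(float)
    for answer, confs in chains:
        totals[answer] += chain_confidence(confs)
    return max(totals, key=totals.get)

def majority_vote(chains):
    """Self-consistency baseline: every chain counts equally."""
    counts = defaultdict(int)
    for answer, _ in chains:
        counts[answer] += 1
    return max(counts, key=counts.get)

# Two low-confidence chains agree on "42"; one high-confidence chain says "17".
chains = [("42", [0.3, 0.4]), ("42", [0.2, 0.5]), ("17", [0.9, 0.95])]
```

Here majority voting returns "42" (2 votes vs 1), while the weighted vote returns "17" (0.855 vs 0.22).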

[LG-19] Synergistic Fusion of Multi-Source Knowledge via Evidence Theory for High-Entropy Alloy Discovery

Link: https://arxiv.org/abs/2502.14631
Authors: Minh-Quyet Ha,Dinh-Khiet Le,Duc-Anh Dao,Tien-Sinh Vu,Duong-Nguyen Nguyen,Viet-Cuong Nguyen,Hiori Kino,Van-Nam Huynh,Hieu-Chi Dam
Categories: Machine Learning (cs.LG)
Comments: 13 pages, 7 figures

Abstract:Discovering novel high-entropy alloys (HEAs) with desirable properties is challenging due to the vast compositional space and complex phase formation mechanisms. Efficient exploration of this space requires a strategic approach that integrates heterogeneous knowledge sources. Here, we propose a framework that systematically combines knowledge extracted from computational material datasets with domain knowledge distilled from scientific literature using large language models (LLMs). A central feature of this approach is the explicit consideration of element substitutability, identifying chemically similar elements that can be interchanged to potentially stabilize desired HEAs. Dempster-Shafer theory, a mathematical framework for reasoning under uncertainty, is employed to model and combine substitutabilities based on aggregated evidence from multiple sources. The framework predicts the phase stability of candidate HEA compositions and is systematically evaluated on both quaternary alloy systems, demonstrating superior performance compared to baseline machine learning models and methods reliant on single-source evidence in cross-validation experiments. By leveraging multi-source knowledge, the framework retains robust predictive power even when key elements are absent from the training data, underscoring its potential for knowledge transfer and extrapolation. Furthermore, the enhanced interpretability of the methodology offers insights into the fundamental factors governing HEA formation. Overall, this work provides a promising strategy for accelerating HEA discovery by integrating computational and textual knowledge sources, enabling efficient exploration of vast compositional spaces with improved generalization and interpretability.
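Dempster's rule of combination, the core operation of Dempster-Shafer theory mentioned above, fuses two mass functions by multiplying the masses of intersecting focal sets and renormalizing away the conflicting mass. The sketch below implements the standard rule; the two "sources" and their masses are made-up examples of substitutability evidence, not data from the paper.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions.

    m1, m2: dicts mapping frozenset (focal element) -> mass, each summing to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb          # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Hypothetical evidence on which element can substitute for Ni.
m_dataset = {frozenset({"Co"}): 0.6, frozenset({"Co", "Fe"}): 0.4}
m_literature = {frozenset({"Fe"}): 0.5, frozenset({"Co", "Fe"}): 0.5}
fused = dempster_combine(m_dataset, m_literature)
```

The conflicting mass here is 0.6 × 0.5 = 0.3, so the remaining products are divided by 0.7: {Co} gets 0.3/0.7 ≈ 0.429, while {Fe} and {Co, Fe} each get 0.2/0.7 ≈ 0.286.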

[LG-20] Noisy Test-Time Adaptation in Vision-Language Models ICLR2025

Link: https://arxiv.org/abs/2502.14604
Authors: Chentao Cao,Zhun Zhong,Zhanke Zhou,Tongliang Liu,Yang Liu,Kun Zhang,Bo Han
Categories: Machine Learning (cs.LG)
Comments: ICLR 2025

Abstract:Test-time adaptation (TTA) aims to address distribution shifts between source and target data by relying solely on target data during testing. In open-world scenarios, models often encounter noisy samples, i.e., samples outside the in-distribution (ID) label space. Leveraging the zero-shot capability of pre-trained vision-language models (VLMs), this paper introduces Zero-Shot Noisy TTA (ZS-NTTA), focusing on adapting the model to target data with noisy samples during test-time in a zero-shot manner. We find existing TTA methods underperform under ZS-NTTA, often lagging behind even the frozen model. We conduct comprehensive experiments to analyze this phenomenon, revealing that the negative impact of unfiltered noisy data outweighs the benefits of clean data during model updating. Also, adapting a classifier for ID classification and noise detection hampers both sub-tasks. Built on this, we propose a framework that decouples the classifier and detector, focusing on developing an individual detector while keeping the classifier frozen. Technically, we introduce the Adaptive Noise Detector (AdaND), which utilizes the frozen model’s outputs as pseudo-labels to train a noise detector. To handle clean data streams, we further inject Gaussian noise during adaptation, preventing the detector from misclassifying clean samples as noisy. Beyond ZS-NTTA, AdaND can also improve the zero-shot out-of-distribution (ZS-OOD) detection ability of VLMs. Experiments show that AdaND outperforms existing methods in both ZS-NTTA and ZS-OOD detection. On ImageNet, AdaND achieves a notable improvement of 8.32% in harmonic mean accuracy ($\text{Acc}_\text{H}$) for ZS-NTTA and 9.40% in FPR95 for ZS-OOD detection, compared to SOTA methods. Importantly, AdaND is computationally efficient, with a cost comparable to the model-frozen method. The code is publicly available at: this https URL.

[LG-21] Multi-Class Imbalanced Learning with Support Vector Machines via Differential Evolution

Link: https://arxiv.org/abs/2502.14597
Authors: Zhong-Liang Zhang,Jie Yang,Jian-Ming Ru,Xiao-Xi Zhao,Xing-Gang Luo
Categories: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Abstract:The support vector machine (SVM) is a powerful machine learning algorithm for classification tasks. However, the classical SVM was developed for binary problems under the assumption of balanced datasets, and multi-class imbalanced classification problems are considerably more complex. In this paper, we propose an improved SVM via Differential Evolution (i-SVM-DE) method to deal with them. An improved SVM (i-SVM) model is proposed to handle the data imbalance by combining a cost-sensitive technique with a separation-margin modification in the constraints, which yields a parameter optimization problem. Using the one-versus-one (OVO) scheme, a multi-class problem is decomposed into a number of binary subproblems, and a large optimization problem is formed by concatenating the parameters of the binary subproblems. To find the optimal model effectively and learn the support vectors for each class simultaneously, an improved differential evolution (DE) algorithm is applied to solve this large optimization problem. Instead of using a validation set, we propose fitness functions to evaluate the learned model and obtain the optimal parameters during the DE search. A series of experiments is carried out to verify the benefits of our proposed method. The results indicate that i-SVM-DE is statistically superior to the other baseline methods.

[LG-22] Moshi Moshi? A Model Selection Hijacking Adversarial Attack

Link: https://arxiv.org/abs/2502.14586
Authors: Riccardo Petrucci,Luca Pajola,Francesco Marchiori,Luca Pasa,Mauro Conti
Categories: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Abstract:Model selection is a fundamental task in Machine Learning (ML), focusing on selecting the most suitable model from a pool of candidates by evaluating their performance on specific metrics. This process ensures optimal performance, computational efficiency, and adaptability to diverse tasks and environments. Despite its critical role, its security from the perspective of adversarial ML remains unexplored. This risk is heightened in the Machine-Learning-as-a-Service model, where users delegate the training phase and the model selection process to third-party providers, supplying data and training strategies. Therefore, attacks on model selection could harm both the user and the provider, undermining model performance and driving up operational costs. In this work, we present MOSHI (MOdel Selection HIjacking adversarial attack), the first adversarial attack specifically targeting model selection. Our novel approach manipulates model selection data to favor the adversary, even without prior knowledge of the system. Utilizing a framework based on Variational Auto Encoders, we provide evidence that an attacker can induce inefficiencies in ML deployment. We test our attack on diverse computer vision and speech recognition benchmark tasks and different settings, obtaining an average attack success rate of 75.42%. In particular, our attack causes an average 88.30% decrease in generalization capabilities, an 83.33% increase in latency, and an increase of up to 105.85% in energy consumption. These results highlight the significant vulnerabilities in model selection processes and their potential impact on real-world applications.

[LG-23] Predicting Filter Medium Performances in Chamber Filter Presses with Digital Twins Using Neural Network Technologies

Link: https://arxiv.org/abs/2502.14571
Authors: Dennis Teutscher,Tyll Weber-Carstanjen,Stephan Simonis,Mathias J. Krause
Categories: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)

Abstract:Efficient solid-liquid separation is crucial in industries like mining, but traditional chamber filter presses depend heavily on manual monitoring, leading to inefficiencies, downtime, and resource wastage. This paper introduces a machine learning-powered digital twin framework to improve operational flexibility and predictive control. A key challenge addressed is the degradation of the filter medium due to repeated cycles and clogging, which reduces filtration efficiency. To solve this, a neural network-based predictive model was developed to forecast operational parameters, such as pressure and flow rates, under various conditions. This predictive capability allows for optimized filtration cycles, reduced downtime, and improved process efficiency. Additionally, the model predicts the filter medium's lifespan, aiding in maintenance planning and resource sustainability. The digital twin framework enables seamless data exchange between filter press sensors and the predictive model, ensuring continuous updates to the training data and enhancing accuracy over time. Two neural network architectures, feedforward and recurrent, were evaluated. The recurrent neural network outperformed the feedforward model, demonstrating superior generalization. It achieved a relative $L^2$-norm error of 5% for pressure and 9.3% for flow rate prediction on partially known data. For completely unknown data, the relative errors were 18.4% and 15.4%, respectively. Qualitative analysis showed strong alignment between predicted and measured data, with deviations within a confidence band of 8.2% for pressure and 4.8% for flow rate predictions. This work contributes an accurate predictive model, a new approach to predicting filter medium cycle impacts, and a real-time interface for model updates, ensuring adaptability to changing operational conditions.
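For reference, the relative $L^2$-norm error quoted above is normally defined as $\lVert \hat{y} - y \rVert_2 / \lVert y \rVert_2$, so an error of 5% corresponds to a value of 0.05. A short helper, assuming that standard definition (the paper may normalize per time series or per cycle):

```python
import numpy as np

def relative_l2_error(pred, true):
    """Relative L^2-norm error ||pred - true||_2 / ||true||_2."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.linalg.norm(pred - true) / np.linalg.norm(true))

# Usage: ||(0, 0.5)|| / ||(3, 4)|| = 0.5 / 5 = 0.1, i.e. a 10% relative error.
err = relative_l2_error([3.0, 4.5], [3.0, 4.0])
```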

[LG-24] An Entropic Metric for Measuring Calibration of Machine Learning Models

Link: https://arxiv.org/abs/2502.14545
Authors: Daniel James Sumler,Lee Devlin,Simon Maskell,Richard O. Lane
Categories: Machine Learning (cs.LG)

[LG-25] Preordering: A hybrid of correlation clustering and partial ordering

Link: https://arxiv.org/abs/2502.14536
Authors: Jannik Irmai,Maximilian Moeller,Bjoern Andres
Categories: Machine Learning (cs.LG)
Comments: Source code: this https URL

Abstract:We discuss the preordering problem, a joint relaxation of the correlation clustering problem and the partial ordering problem. We show that preordering remains NP-hard even for values in $\{-1, 0, 1\}$. We introduce a linear-time 4-approximation algorithm and a local search technique. For an integer linear program formulation, we establish a class of non-canonical facets of the associated preorder polytope. By solving a non-canonical linear program relaxation, we obtain non-trivial upper bounds on the objective value. We provide implementations of the algorithms we define, apply these to published social networks and compare the output and efficiency qualitatively and quantitatively.
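To make the problem statement concrete: a preorder is a reflexive, transitive relation, and the preordering problem asks for the preorder that maximizes the total value of the ordered pairs it contains. The exhaustive search below is exponential and only meant for tiny instances; it is not the paper's 4-approximation or local search. All function names and the example values are hypothetical.

```python
from itertools import product

def is_transitive(rel):
    """rel: set of pairs (i, j) with i != j; reflexive pairs (i, i) are implicit."""
    return all((i, k) in rel
               for (i, j) in rel for (j2, k) in rel
               if j == j2 and i != k)

def preorders(n):
    """Yield every preorder on {0, ..., n-1} as a frozenset of pairs i != j."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    for mask in product((False, True), repeat=len(pairs)):
        rel = {p for p, keep in zip(pairs, mask) if keep}
        if is_transitive(rel):
            yield frozenset(rel)

def best_preorder(values):
    """Maximise sum(values[i][j]) over related pairs i != j by exhaustive search."""
    n = len(values)
    return max(preorders(n), key=lambda rel: sum(values[i][j] for i, j in rel))

# 0 -> 1 and 1 -> 2 are rewarded; transitivity then forces 0 -> 2 (cost 0.5).
values = [[0, 1, -0.5],
          [-1, 0, 1],
          [-1, -1, 0]]
best = best_preorder(values)
```

The enumeration also reproduces the known counts of preorders on labelled sets (4 on two elements, 29 on three), which is a handy sanity check before trying heuristics on larger graphs.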

[LG-26] Inter-turbine Modelling of Wind-Farm Power using Multi-task Learning

Link: https://arxiv.org/abs/2502.14527
Authors: Simon M. Brealy,Lawrence A. Bull,Pauline Beltrando,Anders Sommer,Nikolaos Dervilis,Keith Worden
Categories: Machine Learning (cs.LG)
Comments: Preprint submitted to Mechanical Systems and Signal Processing. A shortened version of this article has been submitted to the Wind Energy Science Conference 2025

[LG-27] Investigating the Generalizability of ECG Noise Detection Across Diverse Data Sources and Noise Types

Link: https://arxiv.org/abs/2502.14522
Authors: Sharmad Kalpande,Nilesh Kumar Sahu,Haroon Lone
Categories: Machine Learning (cs.LG)

Abstract:Electrocardiograms (ECGs) are essential for monitoring cardiac health, allowing clinicians to analyze heart rate variability (HRV), detect abnormal rhythms, and diagnose cardiovascular diseases. However, ECG signals, especially those from wearable devices, are often affected by noise artifacts caused by motion, muscle activity, or device-related interference. These artifacts distort R-peaks and the characteristic QRS complex, making HRV analysis unreliable and increasing the risk of misdiagnosis. Despite this, the few existing studies on ECG noise detection have primarily focused on a single dataset, limiting the understanding of how well noise detection models generalize across different datasets. In this paper, we investigate the generalizability of noise detection in ECG using a novel HRV-based approach through cross-dataset experiments on four datasets. Our results show that machine learning achieves an average accuracy of over 90% and an AUPRC of more than 0.9. These findings suggest that regardless of the ECG data source or the type of noise, the proposed method maintains high accuracy even on unseen datasets, demonstrating that such generalization is feasible.

[LG-28] Port-Hamiltonian Neural Networks with Output Error Noise Models

Link: https://arxiv.org/abs/2502.14432
Authors: Sarvin Moradi,Gerben I. Beintema,Nick Jaensson,Roland Tóth,Maarten Schoukens
Categories: Machine Learning (cs.LG)
Comments: Preprint submitted to Automatica

[LG-29] Cardiac Evidence Backtracking for Eating Behavior Monitoring using Collocative Electrocardiogram Imagining

Link: https://arxiv.org/abs/2502.14430
Authors: Xu-Lu Zhang,Zhen-Qun Yang,Dong-Mei Jiang,Ga Liao,Qing Li,Ramesh Jain,Xiao-Yong Wei
Categories: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)

[LG-30] Towards Efficient Automatic Self-Pruning of Large Language Models

Link: https://arxiv.org/abs/2502.14413
Authors: Weizhong Huang,Yuxin Zhang,Xiawu Zheng,Fei Chao,Rongrong Ji
Categories: Machine Learning (cs.LG)

[LG-31] dtaianomaly: A Python library for time series anomaly detection

Link: https://arxiv.org/abs/2502.14381
Authors: Louis Carpentier,Nick Seeuws,Wannes Meert,Mathias Verbeke
Categories: Machine Learning (cs.LG); Databases (cs.DB)

[LG-32] Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leiblier Maillard Sampling

Link: https://arxiv.org/abs/2502.14379
Authors: Hao Qin,Kwang-Sung Jun,Chicheng Zhang
Categories: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Comments: 12 pages of the main body, 2 figures, 43 pages in total

Abstract:We study the problem of Multi-Armed Bandits (MAB) with reward distributions belonging to a One-Parameter Exponential Distribution (OPED) family. In the literature, several criteria have been proposed to evaluate the performance of such algorithms, including Asymptotic Optimality (A.O.), Minimax Optimality (M.O.), Sub-UCB, and variance-adaptive worst-case regret bound. Thompson Sampling (TS)-based and Upper Confidence Bound (UCB)-based algorithms have been employed to achieve some of these criteria. However, none of these algorithms simultaneously satisfies all the aforementioned criteria. In this paper, we design an algorithm, Exponential Kullback-Leibler Maillard Sampling (abbrev. \expklms), that can achieve multiple optimality criteria simultaneously, including A.O., M.O. with a logarithmic factor, Sub-UCB, and variance-adaptive worst-case regret bound.
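For intuition, here is the sampling rule of the earlier KL Maillard Sampling (KL-MS) family as we understand it, specialised to Bernoulli rewards: arm $a$ is drawn with probability proportional to $\exp(-N_a \cdot \mathrm{kl}(\hat\mu_a, \hat\mu_{\max}))$, where $N_a$ is its pull count. This is not the exponential-family algorithm proposed in the paper, which would replace the Bernoulli KL with the KL of the relevant one-parameter family; the function names are hypothetical.

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """kl(p, q) between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def maillard_probs(means, counts):
    """Sampling probabilities p_a proportional to exp(-N_a * kl(mu_hat_a, mu_hat_max))."""
    means = np.asarray(means, dtype=float)
    counts = np.asarray(counts, dtype=float)
    logits = -counts * bernoulli_kl(means, means.max())
    weights = np.exp(logits - logits.max())   # subtract max for numerical stability
    return weights / weights.sum()

# Arms 1 and 2 have the same empirical mean, but arm 2 has been pulled far more,
# so the evidence that it is suboptimal is stronger and it is sampled less.
probs = maillard_probs([0.5, 0.4, 0.4], [10, 10, 100])
```

Unlike Thompson sampling, the arm distribution is available in closed form, which is one reason Maillard-style rules are attractive when the sampling probabilities themselves are needed (for example, for off-policy evaluation).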

[LG-33] VFL-RPS: Relevant Participant Selection in Vertical Federated Learning

Link: https://arxiv.org/abs/2502.14375
Authors: Afsana Khan,Marijn ten Thij,Guangzhi Tang,Anna Wilbik
Categories: Machine Learning (cs.LG)

Abstract:Federated Learning (FL) allows collaboration between different parties while ensuring that the data across these parties is not shared. However, not every collaboration is helpful in terms of the resulting model performance. Therefore, selecting the right participants for a collaboration is an important challenge. As it currently stands, most of the efforts in participant selection in the literature have focused on Horizontal Federated Learning (HFL), which assumes that all features are the same across all participants, disregarding the possibility of different features across participants, which is captured in Vertical Federated Learning (VFL). To close this gap in the literature, we propose a novel method, VFL-RPS, for participant selection in VFL as a pre-training step. We have tested our method on several datasets performing both regression and classification tasks, showing that it achieves results comparable to using all data while selecting only a few participants. In addition, we show that our method outperforms existing methods for participant selection in VFL.

[LG-34] Optimize Cardinality Estimation Model Pretraining by Simplifying the Training Datasets

Link: https://arxiv.org/abs/2502.14350
Authors: Boyang Fang
Categories: Databases (cs.DB); Machine Learning (cs.LG)

Abstract:Cardinality estimation is a key aspect of query optimization research, and its performance has significantly improved with the integration of machine learning. To overcome the “cold start” problem or the lack of model transferability in learned cardinality estimators, some pre-training cardinality estimation models have been proposed that learn across multiple datasets and corresponding workloads. These models typically train on a dataset created by uniformly sampling from many datasets, but this approach may not be optimal. By applying the Group Distributionally Robust Optimization (Group DRO) algorithm to training datasets, we find that some specific training datasets contribute more significantly to model performance than others. Based on this observation, we conduct extensive experiments to delve deeper into pre-training cardinality estimators. Our results show how the performance of these models can be influenced by the datasets and corresponding workloads. Finally, we introduce a simplified training dataset, which has been reduced to a fraction of the size of existing pre-training datasets. Extensive experimental results demonstrate that the pre-trained cardinality estimator based on this simplified dataset can still achieve performance comparable to existing models in zero-shot setups.
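Group DRO keeps a weight per group (here, per training dataset) and shifts weight towards the groups with the highest loss using an exponentiated-gradient step, $q_g \leftarrow q_g \exp(\eta \ell_g)$ followed by renormalization. Tracking these weights is one way to see which datasets "contribute more" as described above. The sketch shows only the weight update with made-up per-dataset losses; how the paper reads contributions off the weights is not specified here.

```python
import numpy as np

def group_dro_update(q, group_losses, step_size):
    """One exponentiated-gradient step on Group DRO's group weights."""
    q = np.asarray(q, dtype=float) * np.exp(step_size * np.asarray(group_losses, dtype=float))
    return q / q.sum()

# Hypothetical per-dataset losses for three pre-training datasets.
q = np.full(3, 1 / 3)
for _ in range(10):
    q = group_dro_update(q, [0.2, 1.0, 0.5], step_size=0.5)
```

In a full Group DRO loop, the model parameters are then updated on the $q$-weighted loss, so persistently hard datasets receive more of the training signal.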

[LG-35] On Theoretical Limits of Learning with Label Differential Privacy

链接: https://arxiv.org/abs/2502.14309
作者: Puning Zhao,Chuan Ma,Li Shen,Shaowei Wang,Rongfei Fan
类目: Machine Learning (cs.LG); Information Theory (cs.IT)
备注:

Abstract:Label differential privacy (DP) is designed for learning problems involving private labels and public features. While various methods have been proposed for learning under label DP, the theoretical limits remain largely unexplored. In this paper, we investigate the fundamental limits of learning with label DP in both local and central models for both classification and regression tasks, characterized by minimax convergence rates. We establish lower bounds by converting each task into a multiple hypothesis testing problem and bounding the test error. Additionally, we develop algorithms that yield matching upper bounds. Our results demonstrate that under label local DP (LDP), the risk has a significantly faster convergence rate than that under full LDP, i.e. protecting both features and labels, indicating the advantages of relaxing the DP definition to focus solely on labels. In contrast, under the label central DP (CDP), the risk is only reduced by a constant factor compared to full DP, indicating that the relaxation of CDP only has limited benefits on the performance.
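Under label local DP only the labels are randomized; features stay public. The textbook mechanism for a discrete label is k-ary randomized response. The sketch below privatizes labels this way and then debiases the observed label frequencies. It illustrates the label-LDP setting only and is not the algorithm from the paper. The class count, ε and sample are hypothetical.

```python
import math
import random


def randomized_response(label, num_classes, epsilon, rng):
    """k-ary randomized response: an epsilon-LDP release of a single class label.

    Keeps the true label with probability e^eps / (e^eps + k - 1); otherwise
    reports one of the other k - 1 labels uniformly at random.
    """
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    if rng.random() < keep_prob:
        return label
    other = rng.randrange(num_classes - 1)
    # Map {0, ..., k-2} onto the k - 1 labels different from `label`.
    return other if other < label else other + 1


def debiased_label_frequencies(noisy_labels, num_classes, epsilon):
    """Unbiased estimate of the true label distribution from privatized labels."""
    denom = math.exp(epsilon) + num_classes - 1
    p = math.exp(epsilon) / denom  # P(report j | true label is j)
    q = 1.0 / denom                # P(report j | true label is not j)
    n = len(noisy_labels)
    counts = [0] * num_classes
    for y in noisy_labels:
        counts[y] += 1
    # E[freq_j] = pi_j * p + (1 - pi_j) * q, solved for pi_j.
    return [(c / n - q) / (p - q) for c in counts]


rng = random.Random(0)
true_labels = [rng.choice([0, 0, 1, 2]) for _ in range(200_000)]  # true mix 0.5 / 0.25 / 0.25
noisy = [randomized_response(y, 3, 1.0, rng) for y in true_labels]
print([round(f, 3) for f in debiased_label_frequencies(noisy, 3, 1.0)])
```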

[LG-36] μRL: Discovering Transient Execution Vulnerabilities Using Reinforcement Learning

链接: https://arxiv.org/abs/2502.14307
作者: M. Caner Tol,Kemal Derya,Berk Sunar
类目: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR); Machine Learning (cs.LG)
备注:

Abstract:We propose using reinforcement learning to address the challenges of discovering microarchitectural vulnerabilities, such as Spectre and Meltdown, which exploit subtle interactions in modern processors. Traditional methods like random fuzzing fail to efficiently explore the vast instruction space and often miss vulnerabilities that manifest under specific conditions. To overcome this, we introduce an intelligent, feedback-driven approach using RL. Our RL agents interact with the processor, learning from real-time feedback to prioritize instruction sequences more likely to reveal vulnerabilities, significantly improving the efficiency of the discovery process. We also demonstrate that RL systems adapt effectively to various microarchitectures, providing a scalable solution across processor generations. By automating the exploration process, we reduce the need for human intervention, enabling continuous learning that uncovers hidden vulnerabilities. Additionally, our approach detects subtle signals, such as timing anomalies or unusual cache behavior, that may indicate microarchitectural weaknesses. This proposal advances hardware security testing by introducing a more efficient, adaptive, and systematic framework for protecting modern processors. When unleashed on Intel Skylake-X and Raptor Lake microarchitectures, our RL agent was indeed able to generate instruction sequences that cause significant observable byte leakages through transient execution without generating any μcode assists, faults or interrupts. The newly identified leaky sequences stem from a variety of Intel instructions, e.g. including SERIALIZE, VERR/VERW, CLMUL, MMX-x87 transitions, LSL+RDSCP and LAR. These initial results give credence to the proposed approach.

[LG-37] Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications

链接: https://arxiv.org/abs/2502.14305
作者: Kayhan Behdin,Yun Dai,Ata Fatahibaarzi,Aman Gupta,Qingquan Song,Shao Tang,Hejian Sang,Gregory Dexter,Sirou Zhu,Siyu Zhu,Tejas Dharamsi,Maziar Sanjabi,Vignesh Kothapalli,Hamed Firooz,Zhoutong Fu,Yihan Cao,Pin-Lun Hsu,Fedor Borisyuk,Zhipeng Wang,Rahul Mazumder,Natesh Pillai,Luke Simon
类目: Information Retrieval (cs.IR); Machine Learning (cs.LG)
备注:

[LG-38] Generalization Certificates for Adversarially Robust Bayesian Linear Regression

链接: https://arxiv.org/abs/2502.14298
作者: Mahalakshmi Sabanayagam,Russell Tsuchida,Cheng Soon Ong,Debarghya Ghoshdastidar
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
备注: Under review

Abstract:Adversarial robustness of machine learning models is critical to ensuring reliable performance under data perturbations. Recent progress has been on point estimators, and this paper considers distributional predictors. First, using the link between exponential families and Bregman divergences, we formulate an adversarial Bregman divergence loss as an adversarial negative log-likelihood. Using the geometric properties of Bregman divergences, we compute the adversarial perturbation for such models in closed-form. Second, under such losses, we introduce adversarially robust posteriors, by exploiting the optimization-centric view of generalized Bayesian inference. Third, we derive the first rigorous generalization certificates in the context of an adversarial extension of Bayesian linear regression by leveraging the PAC-Bayesian framework. Finally, experiments on real and synthetic datasets demonstrate the superior robustness of the derived adversarially robust posterior over Bayes posterior, and also validate our theoretical guarantees.
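The simplest instance of a closed-form adversarial perturbation is linear regression with squared loss and an ℓ2-bounded input perturbation of radius ε. There the worst case is known exactly: max over ‖δ‖ ≤ ε of (y − w·(x + δ))² equals (|y − w·x| + ε‖w‖)². With unit noise variance, the Gaussian negative log-likelihood is half this loss plus a constant, so the same δ is worst-case for it. The sketch below covers only this Gaussian special case. It is not the paper's general Bregman-divergence construction, and the toy numbers are hypothetical.

```python
import math


def adversarial_perturbation(w, x, y, eps):
    """Worst-case l2-bounded input perturbation for a linear model with squared loss.

    Returns delta with ||delta||_2 <= eps maximising (y - w.(x + delta))^2.
    """
    residual = y - sum(wi * xi for wi, xi in zip(w, x))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0.0:
        return [0.0] * len(x)  # the prediction does not depend on x
    sign = 1.0 if residual >= 0 else -1.0
    # Move the prediction away from y, along -sign * w / ||w||.
    return [-sign * eps * wi / norm_w for wi in w]


def adversarial_squared_loss(w, x, y, eps):
    """Closed form of max_{||delta|| <= eps} (y - w.(x + delta))^2."""
    residual = y - sum(wi * xi for wi, xi in zip(w, x))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (abs(residual) + eps * norm_w) ** 2


w, x, y = [3.0, 4.0], [1.0, 1.0], 5.0  # hypothetical toy example
delta = adversarial_perturbation(w, x, y, eps=0.5)
print(delta, adversarial_squared_loss(w, x, y, eps=0.5))
```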

[LG-39] Predicting Fetal Birthweight from High Dimensional Data using Advanced Machine Learning

链接: https://arxiv.org/abs/2502.14270
作者: Nachiket Kapure,Harsh Joshi,Rajeshwari Mistri,Parul Kumari,Manasi Mali,Seema Purohit,Neha Sharma,Mrityunjoy Panday,Chittaranjan S. Yajnik
类目: Machine Learning (cs.LG)
备注:

Abstract:Birth weight serves as a fundamental indicator of neonatal health, closely linked to both early medical interventions and long-term developmental risks. Traditional predictive models, often constrained by limited feature selection and incomplete datasets, struggle to achieve overlooking complex maternal and fetal interactions in diverse clinical settings. This research explores machine learning to address these limitations, utilizing a structured methodology that integrates advanced imputation strategies, supervised feature selection techniques, and predictive modeling. Given the constraints of the dataset, the research strengthens the role of data preprocessing in improving the model performance. Among the various methodologies explored, tree-based feature selection methods demonstrated superior capability in identifying the most relevant predictors, while ensemble-based regression models proved highly effective in capturing non-linear relationships and complex maternal-fetal interactions within the data. Beyond model performance, the study highlights the clinical significance of key physiological determinants, offering insights into maternal and fetal health factors that influence birth weight, offering insights that extend over statistical modeling. By bridging computational intelligence with perinatal research, this work underscores the transformative role of machine learning in enhancing predictive accuracy, refining risk assessment and informing data-driven decision-making in maternal and neonatal care. Keywords: Birth weight prediction, maternal-fetal health, MICE, BART, Gradient Boosting, neonatal outcomes, Clinipredictive.

[LG-40] LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records

链接: https://arxiv.org/abs/2502.14259
作者: Sujeong Im,Jungwoo Oh,Edward Choi
类目: Machine Learning (cs.LG)
备注: 11 pages for main text, 4 pages for appendix

[LG-41] Real-Time Sampling-based Online Planning for Drone Interception ICRA2025

链接: https://arxiv.org/abs/2502.14231
作者: Gilhyun Ryou,Lukas Lao Beyer,Sertac Karaman
类目: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
备注: Accepted at ICRA 2025. Supplementary video: this https URL

[LG-42] A Non-Asymptotic Theory of Seminorm Lyapunov Stability: From Deterministic to Stochastic Iterative Algorithms

链接: https://arxiv.org/abs/2502.14208
作者: Zaiwei Chen,Sheng Zhang,Zhe Zhang,Shaan Ul Haque,Siva Theja Maguluri
类目: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
备注:

Abstract:We study the problem of solving fixed-point equations for seminorm-contractive operators and establish foundational results on the non-asymptotic behavior of iterative algorithms in both deterministic and stochastic settings. Specifically, in the deterministic setting, we prove a fixed-point theorem for seminorm-contractive operators, showing that iterates converge geometrically to the kernel of the seminorm. In the stochastic setting, we analyze the corresponding stochastic approximation (SA) algorithm under seminorm-contractive operators and Markovian noise, providing a finite-sample analysis for various stepsize choices. A benchmark for equation solving is linear systems of equations, where the convergence behavior of fixed-point iteration is closely tied to the stability of linear dynamical systems. In this special case, our results provide a complete characterization of system stability with respect to a seminorm, linking it to the solution of a Lyapunov equation in terms of positive semi-definite matrices. In the stochastic setting, we establish a finite-sample analysis for linear Markovian SA without requiring the Hurwitzness assumption. Our theoretical results offer a unified framework for deriving finite-sample bounds for various reinforcement learning algorithms in the average reward setting, including TD(λ) for policy evaluation (which is a special case of solving a Poisson equation) and Q-learning for control.
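As a deterministic illustration of the setting the abstract mentions (average reward, Poisson equation), consider relative value iteration for a Markov reward process. It is a classic fixed-point method. Its operator T(h) = r + P h is a contraction in the span seminorm sp(v) = max(v) − min(v) whenever every pair of rows of P overlaps, and the kernel of that seminorm is the set of constant vectors. The sketch iterates T and subtracts a constant so the iterates stay bounded. It is a textbook example rather than the paper's stochastic-approximation algorithm, and `P` and `r` are hypothetical.

```python
def relative_value_iteration(P, r, ref_state=0, tol=1e-12, max_iter=10_000):
    """Solve the average-reward Poisson equation h + g * 1 = r + P h.

    Returns the average reward g and relative values h with h[ref_state] = 0.
    """
    n = len(r)
    h = [0.0] * n
    g = 0.0
    for _ in range(max_iter):
        # One application of T(h) = r + P h.
        Th = [r[i] + sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
        # Remove a constant vector (an element of the seminorm's kernel).
        g = Th[ref_state]
        new_h = [v - g for v in Th]
        if max(abs(a - b) for a, b in zip(new_h, h)) < tol:
            return g, new_h
        h = new_h
    return g, h


def span(v):
    """Span seminorm sp(v) = max(v) - min(v); it vanishes on constant vectors."""
    return max(v) - min(v)


P = [[0.5, 0.5, 0.0],
     [0.2, 0.5, 0.3],
     [0.3, 0.3, 0.4]]  # hypothetical 3-state Markov chain
r = [1.0, 0.0, 2.0]    # hypothetical per-state rewards
g, h = relative_value_iteration(P, r)
print(g, h)
```

For this chain every pair of rows shares at least 0.6 of its probability mass. As a result, sp(P v) ≤ 0.4 · sp(v), and the span error shrinks geometrically.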

[LG-43] Multi-Faceted Studies on Data Poisoning can Advance LLM Development

链接: https://arxiv.org/abs/2502.14182
作者: Pengfei He,Yue Xing,Han Xu,Zhen Xiang,Jiliang Tang
类目: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
备注:

Abstract:The lifecycle of large language models (LLMs) is far more complex than that of traditional machine learning models, involving multiple training stages, diverse data sources, and varied inference methods. While prior research on data poisoning attacks has primarily focused on the safety vulnerabilities of LLMs, these attacks face significant challenges in practice. Secure data collection, rigorous data cleaning, and the multistage nature of LLM training make it difficult to inject poisoned data or reliably influence LLM behavior as intended. Given these challenges, this position paper proposes rethinking the role of data poisoning and argue that multi-faceted studies on data poisoning can advance LLM development. From a threat perspective, practical strategies for data poisoning attacks can help evaluate and address real safety risks to LLMs. From a trustworthiness perspective, data poisoning can be leveraged to build more robust LLMs by uncovering and mitigating hidden biases, harmful outputs, and hallucinations. Moreover, from a mechanism perspective, data poisoning can provide valuable insights into LLMs, particularly the interplay between data and model behavior, driving a deeper understanding of their underlying mechanisms.

[LG-44] InstaSHAP: Interpretable Additive Models Explain Shapley Values Instantly

链接: https://arxiv.org/abs/2502.14177
作者: James Enouen,Yan Liu
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
备注:

[LG-45] Blockchain-based Framework for Scalable and Incentivized Federated Learning

链接: https://arxiv.org/abs/2502.14170
作者: Bijun Wu,Oshani Seneviratne
类目: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
备注:

[LG-46] Dual-level Mixup for Graph Few-shot Learning with Fewer Tasks WWW25

链接: https://arxiv.org/abs/2502.14158
作者: Yonghao Liu,Mengyu Li,Fausto Giunchiglia,Lan Huang,Ximing Li,Xiaoyue Feng,Renchu Guan
类目: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
备注: WWW25

[LG-47] Learning the P2D Model for Lithium-Ion Batteries with SOH Detection

链接: https://arxiv.org/abs/2502.14147
作者: Maricela Best McKay,Bhushan Gopaluni,Brian Wetton
类目: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
备注: 18 pages, 5 figures

[LG-48] Efficient and Optimal Policy Gradient Algorithm for Corrupted Multi-armed Bandits

链接: https://arxiv.org/abs/2502.14146
作者: Jiayuan Liu,Siwei Wang,Zhixuan Fang
类目: Machine Learning (cs.LG)
备注:

[LG-49] Cluster Analysis and Concept Drift Detection in Malware

链接: https://arxiv.org/abs/2502.14135
作者: Aniket Mishra,Mark Stamp
类目: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
备注:

[LG-50] Understanding SGD with Exponential Moving Average: A Case Study in Linear Regression

链接: https://arxiv.org/abs/2502.14123
作者: Xuheng Li,Quanquan Gu
类目: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
备注: 34 pages, 4 figures

[LG-51] A Supervised Machine-Learning Approach For Turboshaft Engine Dynamic Modeling Under Real Flight Conditions

链接: https://arxiv.org/abs/2502.14120
作者: Damiano Paniccia,Francesco Aldo Tucci,Joel Guerrero,Luigi Capone,Nicoletta Sanguini,Tommaso Benacchio,Luigi Bottasso
类目: Machine Learning (cs.LG); Systems and Control (eess.SY)
备注: 26 pages, 14 figures, submitted to the Aeronautical Journal

[LG-52] Chasing the Timber Trail: Machine Learning to Reveal Harvest Location Misrepresentation

链接: https://arxiv.org/abs/2502.14115
作者: Shailik Sarkar(1),Raquib Bin Yousuf(1),Linhan Wang(1),Brian Mayer(1),Thomas Mortier(2),Victor Deklerck(2),Jakub Truszkowski(2),John C. Simeone(3),Marigold Norman(2),Jade Saunders(2),Chang-Tien Lu(1),Naren Ramakrishnan(1) ((1) Virginia Tech, (2) World Forest ID, (3) Simeone Consulting LLC)
类目: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Computers and Society (cs.CY)
备注: 9 pages, 5 figures

[LG-53] Aligned Multi Objective Optimization

链接: https://arxiv.org/abs/2502.14096
作者: Yonathan Efroni,Ben Kertzu,Daniel Jiang,Jalaj Bhandari,Zheqing (Bill) Zhu,Karen Ullrich
类目: Machine Learning (cs.LG); Optimization and Control (math.OC)
备注:

[LG-54] CND-IDS: Continual Novelty Detection for Intrusion Detection Systems

链接: https://arxiv.org/abs/2502.14094
作者: Sean Fuhrman,Onat Gungor,Tajana Rosing
类目: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
备注: Accepted by the 62nd Design Automation Conference (DAC 2025)

[LG-55] Learning from End User Data with Shuffled Differential Privacy over Kernel Densities ICLR2025

链接: https://arxiv.org/abs/2502.14087
作者: Tal Wagner
类目: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS)
备注: ICLR 2025

[LG-56] Towards Vector Optimization on Low-Dimensional Vector Symbolic Architecture

链接: https://arxiv.org/abs/2502.14075
作者: Shijin Duan,Yejia Liu,Gaowen Liu,Ramana Rao Kompella,Shaolei Ren,Xiaolin Xu
类目: Machine Learning (cs.LG)
备注: 10 pages, 2 figures. Accepted in CPAL 2025

[LG-57] I Want 'Em All (At Once) – Ultrametric Cluster Hierarchies

链接: https://arxiv.org/abs/2502.14018
作者: Andrew Draganov,Pascal Weber,Rasmus Skibdahl Melanchton Jørgensen,Anna Beer,Claudia Plant,Ira Assent
类目: Machine Learning (cs.LG)
备注:

[LG-58] Smaller But Better: Unifying Layout Generation with Smaller Large Language Models

链接: https://arxiv.org/abs/2502.14005
作者: Peirong Zhang,Jiaxin Zhang,Jiahuan Cao,Hongliang Li,Lianwen Jin
类目: Machine Learning (cs.LG)
备注:

[LG-59] Inter3D: A Benchmark and Strong Baseline for Human-Interactive 3D Object Reconstruction

链接: https://arxiv.org/abs/2502.14004
作者: Gan Chen,Ying He,Mulin Yu,F. Richard Yu,Gang Xu,Fei Ma,Ming Li,Guang Zhou
类目: Graphics (cs.GR); Machine Learning (cs.LG)
备注:

[LG-60] Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis

链接: https://arxiv.org/abs/2502.13996
作者: Yicheng Lang,Kehan Guo,Yue Huang,Yujun Zhou,Haomin Zhuang,Tianyu Yang,Yao Su,Xiangliang Zhang
类目: Machine Learning (cs.LG)
备注:

Abstract:Due to the widespread use of LLMs and the rising critical ethical and safety concerns, LLM unlearning methods have been developed to remove harmful knowledge and undesirable capabilities. In this context, evaluations are mostly based on single-value metrics such as QA accuracy. However, these metrics often fail to capture the nuanced retention of harmful knowledge components, making it difficult to assess the true effectiveness of unlearning. To address this issue, we propose UNCD (UNlearning evaluation via Cognitive Diagnosis), a novel framework that leverages Cognitive Diagnosis Modeling for fine-grained evaluation of LLM unlearning. Our dedicated benchmark, UNCD-Cyber, provides a detailed assessment of the removal of dangerous capabilities. Moreover, we introduce UNCD-Agent, which refines unlearning by diagnosing knowledge remnants and generating targeted unlearning data. Extensive experiments across eight unlearning methods and two base models demonstrate that UNCD not only enhances evaluation but also effectively facilitates the removal of harmful LLM abilities.

[LG-61] Multi-Objective Causal Bayesian Optimization

链接: https://arxiv.org/abs/2502.14755
作者: Shriya Bhatija,Paul-David Zuercher,Jakob Thumm,Thomas Bohné
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
备注: 17 Pages, 12 Figures

[LG-62] Beyond Performance Scores: Directed Functional Connectivity as a Brain-Based Biomarker for Motor Skill Learning and Retention

链接: https://arxiv.org/abs/2502.14731
作者: Anil Kamat,Rahul Rahul,Lora Cavuoto,Harry Burke,Matthew Hackett,Jack Norfleet,Steven Schwaitzberg,Suvranu De
类目: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG)
备注:

[LG-63] Internal Incoherency Scores for Constraint-based Causal Discovery Algorithms

链接: https://arxiv.org/abs/2502.14719
作者: Sofia Faltenbacher,Jonas Wahl,Rebecca Herman,Jakob Runge
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
备注: under review

[LG-64] TRUSWorthy: Toward Clinically Applicable Deep Learning for Confident Detection of Prostate Cancer in Micro-Ultrasound

链接: https://arxiv.org/abs/2502.14707
作者: Mohamed Harmanani,Paul F.R. Wilson,Minh Nguyen Nhat To,Mahdi Gilany,Amoon Jamzad,Fahimeh Fooladgar,Brian Wodlinger,Purang Abolmaesumi,Parvin Mousavi
类目: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Tissues and Organs (q-bio.TO)
备注: accepted to IJCARS. This preprint has not undergone post-submission improvements or corrections. To access the Version of Record of this article, see the journal reference below

[LG-65] Confidence Estimation via Sequential Likelihood Mixing

链接: https://arxiv.org/abs/2502.14689
作者: Johannes Kirschner,Andreas Krause,Michele Meziu,Mojmir Mutny
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
备注:

[LG-66] Generalization Error of f-Divergence Stabilized Algorithms via Duality

链接: https://arxiv.org/abs/2502.14544
作者: Francisco Daunas,Iñaki Esnaola,Samir M. Perlaza,Gholamali Aminian
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
备注: This is new work for ISIT2025. arXiv admin note: text overlap with arXiv:2402.00501

[LG-67] Provable Quantum Algorithm Advantage for Gaussian Process Quadrature

链接: https://arxiv.org/abs/2502.14467
作者: Cristian A. Galvis-Florez,Ahmad Farooq,Simo Särkkä
类目: Computation (stat.CO); Machine Learning (cs.LG); Quantum Physics (quant-ph)
备注: 21 pages, 6 figures

[LG-68] Towards efficient quantum algorithms for diffusion probability models

链接: https://arxiv.org/abs/2502.14252
作者: Yunfei Wang,Ruoxi Jiang,Yingda Fan,Xiaowei Jia,Jens Eisert,Junyu Liu,Jin-Peng Liu
类目: Quantum Physics (quant-ph); Machine Learning (cs.LG)
备注: 6+20 pages, 2 figures

[LG-69] OBELiX: A Curated Dataset of Crystal Structures and Experimentally Measured Ionic Conductivities for Lithium Solid-State Electrolytes

链接: https://arxiv.org/abs/2502.14234
作者: Félix Therrien,Jamal Abou Haibeh,Divya Sharma,Rhiannon Hendley,Alex Hernández-García,Sun Sun,Alain Tchagang,Jiang Su,Samuel Huberman,Yoshua Bengio,Hongyu Guo,Homin Shin
类目: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
备注: 8 pages, 3 figures and 2 tables

[LG-70] Sample Complexity of Linear Quadratic Regulator Without Initial Stability

链接: https://arxiv.org/abs/2502.14210
作者: Amirreza Neshaei Moghaddam,Alex Olshevsky,Bahman Gharesifard
类目: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)
备注:

[LG-71] Finite Sample Analysis of Distributional TD Learning with Linear Function Approximation

链接: https://arxiv.org/abs/2502.14172
作者: Yang Peng,Kaicheng Jin,Liangyu Zhang,Zhihua Zhang
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
备注: 57 pages

[LG-72] Prediction-Powered Adaptive Shrinkage Estimation

链接: https://arxiv.org/abs/2502.14166
作者: Sida Li,Nikolaos Ignatiadis
类目: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
备注:

[LG-73] Conformal Prediction under Lévy-Prokhorov Distribution Shifts: Robustness to Local and Global Perturbations

链接: https://arxiv.org/abs/2502.14105
作者: Liviu Aolaritei,Michael I. Jordan,Youssef Marzouk,Zheyu Oliver Wang,Julie Zhu
类目: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)
备注:

Abstract:Conformal prediction provides a powerful framework for constructing prediction intervals with finite-sample guarantees, yet its robustness under distribution shifts remains a significant challenge. This paper addresses this limitation by modeling distribution shifts using Lévy-Prokhorov (LP) ambiguity sets, which capture both local and global perturbations. We provide a self-contained overview of LP ambiguity sets and their connections to popular metrics such as Wasserstein and Total Variation. We show that the link between conformal prediction and LP ambiguity sets is a natural one: by propagating the LP ambiguity set through the scoring function, we reduce complex high-dimensional distribution shifts to manageable one-dimensional distribution shifts, enabling exact quantification of worst-case quantiles and coverage. Building on this analysis, we construct robust conformal prediction intervals that remain valid under distribution shifts, explicitly linking LP parameters to interval width and confidence levels. Experimental results on real-world datasets demonstrate the effectiveness of the proposed approach.
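To make the "propagate the ambiguity set to a one-dimensional score" idea concrete, assume the two-parameter LP condition Q(A) ≤ P(A^ε) + ρ for every set A. Here P is the calibration score distribution, Q the test score distribution, and A^ε the ε-enlargement of A. Taking A = {s > t} gives Q(S ≤ t) ≥ P(S ≤ t − ε) − ρ. So thresholding at the (1 − α + ρ)-quantile of the calibration scores plus ε keeps coverage at 1 − α. The sketch below applies this to split conformal prediction with absolute-residual scores. The score choice, the usual (n + 1) finite-sample correction and the toy numbers are assumptions, and the paper's exact construction may differ.

```python
import math


def robust_conformal_threshold(cal_scores, alpha, eps, rho):
    """Split-conformal score threshold, inflated for an LP-type distribution shift.

    eps: local perturbation size (scores may move by up to eps).
    rho: global mass that may move arbitrarily.
    With eps = rho = 0 this is the standard split-conformal quantile.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha + rho))
    if k > n:
        return math.inf  # too little calibration data: the interval is unbounded
    return sorted(cal_scores)[k - 1] + eps


def prediction_interval(y_hat, threshold):
    """Prediction interval for absolute-residual scores S = |y - y_hat|."""
    return y_hat - threshold, y_hat + threshold


# Hypothetical calibration residuals |y - y_hat| from a held-out set.
scores = [0.1, 0.4, 0.2, 0.8, 0.3, 0.5, 0.05, 0.6, 0.25, 0.35]
print(robust_conformal_threshold(scores, alpha=0.2, eps=0.0, rho=0.0))
print(robust_conformal_threshold(scores, alpha=0.2, eps=0.05, rho=0.05))
```

The two parameters act separately: ρ raises the quantile level, while ε widens the interval additively. This matches the abstract's point that the LP parameters map explicitly to interval width and confidence level.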

[LG-74] Population Dynamics Control with Partial Observations

链接: https://arxiv.org/abs/2502.14079
作者: Zhou Lu,Y. Jennifer Sun,Zhiyu Zhang
类目: Optimization and Control (math.OC); Machine Learning (cs.LG)
备注:

Abstract:We study the problem of controlling population dynamics, a class of linear dynamical systems evolving on the probability simplex, from the perspective of online non-stochastic control. While Golowich et al. 2024 analyzed the fully observable setting, we focus on the more realistic, partially observable case, where only a low-dimensional representation of the state is accessible. In classical non-stochastic control, inputs are set as linear combinations of past disturbances. However, under partial observations, disturbances cannot be directly computed. To address this, Simchowitz et al. 2020 proposed to construct oblivious signals, which are counterfactual observations with zero control, as a substitute. This raises several challenges in our setting: (1) how to construct oblivious signals under simplex constraints, where zero control is infeasible; (2) how to design a sufficiently expressive convex controller parameterization tailored to these signals; and (3) how to enforce the simplex constraint on control when projections may break the convexity of cost functions. Our main contribution is a new controller that achieves the optimal Õ(√T) regret with respect to a natural class of mixing linear dynamic controllers. To tackle these challenges, we construct signals based on hypothetical observations under a constant control adapted to the simplex domain, and introduce a new controller parameterization that approximates general control policies linear in non-oblivious observations. Furthermore, we employ a novel convex extension surrogate loss, inspired by Lattimore 2024, to bypass the projection-induced convexity issue.

[LG-75] New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition

链接: https://arxiv.org/abs/2502.14060
作者: El Mehdi Saad,Weicheng Lee,Francesco Orabona
类目: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
备注:

[LG-76] Benchmarking Self-Supervised Methods for Accelerated MRI Reconstruction

链接: https://arxiv.org/abs/2502.14009
作者: Andrew Wang,Mike Davies
类目: Image and Video Processing (eess.IV); Machine Learning (cs.LG)
备注: Preprint: Work in Progress

[LG-77] Remote Sensing Semantic Segmentation Quality Assessment based on Vision Language Model

链接: https://arxiv.org/abs/2502.13990
作者: Huiying Shi,Zhihong Tan,Zhihan Zhang,Hongchen Wei,Yaosi Hu,Yingxue Zhang,Zhenzhong Chen
类目: Image and Video Processing (eess.IV); Machine Learning (cs.LG)
备注: 16 pages, 6 figures

[LG-78] Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics

链接: https://arxiv.org/abs/2502.13982
作者: Kabir Kumar
类目: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
备注:

Abstract:Natural Language Processing (NLP) and Voice Recognition agents are rapidly evolving healthcare by enabling efficient, accessible, and professional patient support while automating grunt work. This report serves as my self project wherein models finetuned on medical call recordings are analysed through a two-stage system: Automatic Speech Recognition (ASR) for speech transcription and a Large Language Model (LLM) for context-aware, professional responses. ASR, finetuned on phone call recordings provides generalised transcription of diverse patient speech over call, while the LLM matches transcribed text to medical diagnosis. A novel audio preprocessing strategy, is deployed to provide invariance to incoming recording/call data, laden with sufficient augmentation with noise/clipping to make the pipeline robust to the type of microphone and ambient conditions the patient might have while calling/recording.

[LG-79] Regularização, aprendizagem profunda e interdisciplinaridade em problemas inversos mal-postos (Regularization, deep learning, and interdisciplinarity in ill-posed inverse problems)

链接: https://arxiv.org/abs/2502.13976
作者: Roberto Gutierrez Beraldo,Ricardo Suyama
类目: Image and Video Processing (eess.IV); Machine Learning (cs.LG)
备注: 200 pages, in Portuguese language, 54 figures

信息检索

[IR-0] A Survey of Model Architectures in Information Retrieval

链接: https://arxiv.org/abs/2502.14822
作者: Zhichao Xu,Fengran Mo,Zhiqi Huang,Crystina Zhang,Puxuan Yu,Bei Wang,Jimmy Lin,Vivek Srikumar
类目: Information Retrieval (cs.IR)
备注:

Abstract:This survey examines the evolution of model architectures in information retrieval (IR), focusing on two key aspects: backbone models for feature extraction and end-to-end system architectures for relevance estimation. The review intentionally separates architectural considerations from training methodologies to provide a focused analysis of structural innovations in IR systems. We trace the development from traditional term-based methods to modern neural approaches, particularly highlighting the impact of transformer-based models and subsequent large language models (LLMs). We conclude by discussing emerging challenges and future directions, including architectural optimizations for performance and scalability, handling of multimodal, multilingual data, and adaptation to novel application domains beyond traditional search paradigms.

[IR-1] A Multi-Agent Perspective on Modern Information Retrieval

链接: https://arxiv.org/abs/2502.14796
作者: Haya Nachimovsky,Moshe Tennenholtz,Oren Kurland
类目: Information Retrieval (cs.IR)
备注:

Abstract:The rise of large language models (LLMs) has introduced a new era in information retrieval (IR), where queries and documents that were once assumed to be generated exclusively by humans can now also be created by automated agents. These agents can formulate queries, generate documents, and perform ranking. This shift challenges some long-standing IR paradigms and calls for a reassessment of both theoretical frameworks and practical methodologies. We advocate for a multi-agent perspective to better capture the complex interactions between query agents, document agents, and ranker agents. Through empirical exploration of various multi-agent retrieval settings, we reveal the significant impact of these interactions on system performance. Our findings underscore the need to revisit classical IR paradigms and develop new frameworks for more effective modeling and evaluation of modern retrieval systems.

[IR-2] Less is More: On the Importance of Data Quality for Unit Test Generation

链接: https://arxiv.org/abs/2502.14212
作者: Junwei Zhang,Xing Hu,Shan Gao,Xin Xia,David Lo,Shanping Li
类目: Software Engineering (cs.SE); Information Retrieval (cs.IR)
备注:

[IR-3] Collaborative Retrieval for Large Language Model-based Conversational Recommender Systems WWW’2025

链接: https://arxiv.org/abs/2502.14137
作者: Yaochen Zhu,Chao Wan,Harald Steck,Dawen Liang,Yesu Feng,Nathan Kallus,Jundong Li
类目: Information Retrieval (cs.IR)
备注: Accepted by WWW’2025
