This post lists the latest papers fetched from arXiv.org on 2024-08-23. It is updated automatically and organized into five areas: NLP, CV, ML, AI, and IR. If you would like to receive the daily list by email, please leave your email address in the comments.

Note: the daily paper data is fetched from arXiv.org and updated automatically at around 10:30 each morning.

Tip: if you would like to receive the daily paper data by email, please leave your email address in the comments; emails are likewise sent automatically at around 10:30 each day.

Contents

Overview (2024-08-23)

423 papers were updated today, including:

  • Natural Language Processing: 78 papers (Computation and Language, cs.CL)
  • Artificial Intelligence: 151 papers (cs.AI)
  • Computer Vision: 88 papers (Computer Vision and Pattern Recognition, cs.CV)
  • Machine Learning: 130 papers (cs.LG)

Natural Language Processing

[NLP-0] Controllable Text Generation for Large Language Models: A Survey

Link: https://arxiv.org/abs/2408.12599
Authors: Xun Liang,Hanyu Wang,Yezhaohui Wang,Shichao Song,Jiawei Yang,Simin Niu,Jie Hu,Dan Liu,Shunyu Yao,Feiyu Xiong,Zhiyu Li
Keywords: Natural Language Processing, Large Language Models, Language Processing, Large Language, Natural Language
Subjects: Computation and Language (cs.CL)
Comments: 52 pages, 11 figures, 7 tables, 11 equations

Abstract:In Natural Language Processing (NLP), Large Language Models (LLMs) have demonstrated high text generation quality. However, in real-world applications, LLMs must meet increasingly complex requirements. Beyond avoiding misleading or inappropriate content, LLMs are also expected to cater to specific user needs, such as imitating particular writing styles or generating text with poetic richness. These varied demands have driven the development of Controllable Text Generation (CTG) techniques, which ensure that outputs adhere to predefined control conditions–such as safety, sentiment, thematic consistency, and linguistic style–while maintaining high standards of helpfulness, fluency, and diversity. This paper systematically reviews the latest advancements in CTG for LLMs, offering a comprehensive definition of its core concepts and clarifying the requirements for control conditions and text quality. We categorize CTG tasks into two primary types: content control and attribute control. The key methods are discussed, including model retraining, fine-tuning, reinforcement learning, prompt engineering, latent space manipulation, and decoding-time intervention. We analyze each method’s characteristics, advantages, and limitations, providing nuanced insights for achieving generation control. Additionally, we review CTG evaluation methods, summarize its applications across domains, and address key challenges in current research, including reduced fluency and practicality. We also propose several appeals, such as placing greater emphasis on real-world applications in future research. This paper aims to offer valuable guidance to researchers and developers in the field. Our reference list and Chinese version are open-sourced at this https URL. 
Comments: 52 pages, 11 figures, 7 tables, 11 equations. Subjects: Computation and Language (cs.CL). ACM classes: A.2; I.2.7. Cite as: arXiv:2408.12599 [cs.CL]. DOI: https://doi.org/10.48550/arXiv.2408.12599
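Of the CTG method families the survey covers, decoding-time intervention is the easiest to illustrate: the generator's next-token logits are nudged toward tokens associated with the desired control attribute before sampling. The sketch below is a generic illustration over a toy vocabulary, not a method taken from the survey; the boost value and the attribute token set are assumptions.

```python
import math

def weighted_decode_step(logits, boost_ids, alpha=2.0):
    """Decoding-time intervention sketch: add a bonus `alpha` to the logits of
    tokens tied to the desired control attribute, then renormalize via softmax."""
    adjusted = [l + (alpha if i in boost_ids else 0.0) for i, l in enumerate(logits)]
    z = sum(math.exp(a) for a in adjusted)
    return [math.exp(a) / z for a in adjusted]

# Toy 4-token vocabulary; token 2 is assumed to carry the target attribute
# (e.g. positive sentiment), so its probability rises after the intervention.
probs = weighted_decode_step([1.0, 1.0, 1.0, 1.0], boost_ids={2}, alpha=2.0)
```

Because the base model's parameters are untouched, this kind of intervention composes with any of the other method families (fine-tuning, prompt engineering) the survey discusses.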

[NLP-1] RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment

Link: https://arxiv.org/abs/2408.12579
Authors: Xiaohan Wang,Xiaoyan Yang,Yuqi Zhu,Yue Shen,Jian Wang,Peng Wei,Lei Liang,Jinjie Gu,Huajun Chen,Ningyu Zhang
Keywords: Large Language Models, Large Language, Language Models, Med-Gemini achieve performance, achieve performance competitively
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Comments: Ongoing work

Abstract:Large Language Models (LLMs) like GPT-4, MedPaLM-2, and Med-Gemini achieve performance competitively with human experts across various medical benchmarks. However, they still face challenges in making professional diagnoses akin to physicians, particularly in efficiently gathering patient information and reasoning the final diagnosis. To this end, we introduce the RuleAlign framework, designed to align LLMs with specific diagnostic rules. We develop a medical dialogue dataset comprising rule-based communications between patients and physicians and design an alignment learning approach through preference learning. Experimental results demonstrate the effectiveness of the proposed approach. We hope that our work can serve as an inspiration for exploring the potential of LLMs as AI physicians.

[NLP-2] MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

Link: https://arxiv.org/abs/2408.12574
Authors: Haojun Shi,Suyu Ye,Xinyu Fang,Chuanyang Jin,Layla Isik,Yen-Ling Kuo,Tianmin Shu
Keywords: Understanding people social, Theory of Mind, Understanding people, complex real-world scenarios, intricate mental reasoning
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Comments: Project website: this https URL Code: this https URL

Abstract:Understanding people’s social interactions in complex real-world scenarios often relies on intricate mental reasoning. To truly understand how and why people interact with one another, we must infer the underlying mental states that give rise to the social interactions, i.e., Theory of Mind reasoning in multi-agent interactions. Additionally, social interactions are often multi-modal – we can watch people’s actions, hear their conversations, and/or read about their past behaviors. For AI systems to successfully and safely interact with people in real-world environments, they also need to understand people’s mental states as well as their inferences about each other’s mental states based on multi-modal information about their interactions. For this, we introduce MuMA-ToM, a Multi-modal Multi-Agent Theory of Mind benchmark. MuMA-ToM is the first multi-modal Theory of Mind benchmark that evaluates mental reasoning in embodied multi-agent interactions. In MuMA-ToM, we provide video and text descriptions of people’s multi-modal behavior in realistic household environments. Based on the context, we then ask questions about people’s goals, beliefs, and beliefs about others’ goals. We validated MuMA-ToM in a human experiment and provided a human baseline. We also proposed a novel multi-modal, multi-agent ToM model, LIMP (Language model-based Inverse Multi-agent Planning). Our experimental results show that LIMP significantly outperforms state-of-the-art methods, including large multi-modal models (e.g., GPT-4o, Gemini-1.5 Pro) and a recent multi-modal ToM model, BIP-ALM.

[NLP-3] Jamba-1.5: Hybrid Transformer-Mamba Models at Scale WWW

Link: https://arxiv.org/abs/2408.12570
Authors: Jamba Team:Barak Lenz,Alan Arazi,Amir Bergman,Avshalom Manevich,Barak Peleg,Ben Aviram,Chen Almagor,Clara Fridman,Dan Padnos,Daniel Gissin,Daniel Jannai,Dor Muhlgay,Dor Zimberg,Edden M Gerber,Elad Dolev,Eran Krakovsky,Erez Safahi,Erez Schwartz,Gal Cohen,Gal Shachaf,Haim Rozenblum,Hofit Bata,Ido Blass,Inbal Magar,Itay Dalmedigos,Jhonathan Osin,Julie Fadlon,Maria Rozman,Matan Danos,Michael Gokhman,Mor Zusman,Naama Gidron,Nir Ratner,Noam Gat,Noam Rozen,Oded Fried,Ohad Leshno,Omer Antverg,Omri Abend,Opher Lieber,Or Dagan,Orit Cohavi,Raz Alon,Ro’i Belson,Roi Cohen,Rom Gilad,Roman Glozman,Shahar Lev,Shaked Meirom,Tal Delbari,Tal Ness,Tomer Asida,Tom Ben Gal,Tom Braude,Uriya Pumerantz,Yehoshua Cohen,Yonatan Belinkov,Yuval Globerson,Yuval Peleg Levy,Yoav Shoham
Keywords: instruction-tuned large language, large language models, language models based, instruction-tuned large, large language
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Comments: Webpage: this https URL

Abstract:We present Jamba-1.5, new instruction-tuned large language models based on our Jamba architecture. Jamba is a hybrid Transformer-Mamba mixture of experts architecture, providing high throughput and low memory usage across context lengths, while retaining the same or better quality as Transformer models. We release two model sizes: Jamba-1.5-Large, with 94B active parameters, and Jamba-1.5-Mini, with 12B active parameters. Both models are fine-tuned for a variety of conversational and instruction-following capabilties, and have an effective context length of 256K tokens, the largest amongst open-weight models. To support cost-effective inference, we introduce ExpertsInt8, a novel quantization technique that allows fitting Jamba-1.5-Large on a machine with 8 80GB GPUs when processing 256K-token contexts without loss of quality. When evaluated on a battery of academic and chatbot benchmarks, Jamba-1.5 models achieve excellent results while providing high throughput and outperforming other open-weight models on long-context benchmarks. The model weights for both sizes are publicly available under the Jamba Open Model License and we release ExpertsInt8 as open source.
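The abstract does not spell out how ExpertsInt8 works, but the memory saving of any int8 weight scheme comes from the same round trip: store one signed byte per weight plus a per-tensor scale factor, instead of 2-4 bytes per weight. A minimal, generic sketch (not the actual ExpertsInt8 algorithm):

```python
def int8_quantize(weights):
    """Per-tensor symmetric int8 quantization: map floats into [-127, 127]
    using a single scale derived from the largest absolute weight."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero tensors
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def int8_dequantize(q, scale):
    """Recover approximate float weights at inference time."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.01, 1.27]
q, s = int8_quantize(w)
w_hat = int8_dequantize(q, s)
```

The reconstruction error is bounded by half a quantization step, which is why schemes like this can cut memory roughly in half versus fp16 with little quality loss.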

[NLP-4] Towards Evaluating and Building Versatile Large Language Models for Medicine

Link: https://arxiv.org/abs/2408.12547
Authors: Chaoyi Wu,Pengcheng Qiu,Jinxin Liu,Hongfei Gu,Na Li,Ya Zhang,Yanfeng Wang,Weidi Xie
Keywords: comprehensive benchmark designed, designed to evaluate, evaluate the performance, performance of large, comprehensive benchmark
Subjects: Computation and Language (cs.CL)

Abstract:In this study, we present MedS-Bench, a comprehensive benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts. Unlike existing benchmarks that focus on multiple-choice question answering, MedS-Bench spans 11 high-level clinical tasks, including clinical report summarization, treatment recommendations, diagnosis, named entity recognition, and medical concept explanation, among others. We evaluated six leading LLMs, e.g., MEDITRON, Mistral, InternLM 2, Llama 3, GPT-4, and Claude-3.5 using few-shot prompting, and found that even the most sophisticated models struggle with these complex tasks. To address these limitations, we developed MedS-Ins, a large-scale instruction tuning dataset for medicine. MedS-Ins comprises 58 medically oriented language corpora, totaling 13.5 million samples across 122 tasks. To demonstrate the dataset’s utility, we conducted a proof-of-concept experiment by performing instruction tuning on a lightweight, open-source medical language model. The resulting model, MMedIns-Llama 3, significantly outperformed existing models across nearly all clinical tasks. To promote further advancements in the application of LLMs to clinical challenges, we have made the MedS-Ins dataset fully accessible and invite the research community to contribute to its expansion.Additionally, we have launched a dynamic leaderboard for MedS-Bench, which we plan to regularly update the test set to track progress and enhance the adaptation of general LLMs to the medical domain. Leaderboard: this https URL. Github: this https URL.

[NLP-5] The Russian-focused embedders exploration: ruMTEB benchmark and Russian embedding model design

Link: https://arxiv.org/abs/2408.12503
Authors: Artem Snegirev,Maria Tikhonova,Anna Maksimova,Alena Fenogenova,Alexander Abramov
Keywords: Natural Language Processing, Language Processing, Natural Language, role in Natural, creating text embeddings
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Abstract:Embedding models play a crucial role in Natural Language Processing (NLP) by creating text embeddings used in various tasks such as information retrieval and assessing semantic text similarity. This paper focuses on research related to embedding models in the Russian language. It introduces a new Russian-focused embedding model called ru-en-RoSBERTa and the ruMTEB benchmark, the Russian version extending the Massive Text Embedding Benchmark (MTEB). Our benchmark includes seven categories of tasks, such as semantic textual similarity, text classification, reranking, and retrieval. The research also assesses a representative set of Russian and multilingual models on the proposed benchmark. The findings indicate that the new model achieves results that are on par with state-of-the-art models in Russian. We release the model ru-en-RoSBERTa, and the ruMTEB framework comes with open-source code, integration into the original framework and a public leaderboard.

[NLP-6] GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models

Link: https://arxiv.org/abs/2408.12494
Authors: Kunsheng Tang,Wenbo Zhou,Jie Zhang,Aishan Liu,Gelei Deng,Shuai Li,Peigui Qi,Weiming Zhang,Tianwei Zhang,Nenghai Yu
Keywords: Large language models, exhibited remarkable capabilities, magnify societal biases, natural language generation, Large language
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Abstract:Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but they have also been observed to magnify societal biases, particularly those related to gender. In response to this issue, several benchmarks have been proposed to assess gender bias in LLMs. However, these benchmarks often lack practical flexibility or inadvertently introduce biases. To address these shortcomings, we introduce GenderCARE, a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics for quantifying and mitigating gender bias in LLMs. To begin, we establish pioneering criteria for gender equality benchmarks, spanning dimensions such as inclusivity, diversity, explainability, objectivity, robustness, and realisticity. Guided by these criteria, we construct GenderPair, a novel pair-based benchmark designed to assess gender bias in LLMs comprehensively. Our benchmark provides standardized and realistic evaluations, including previously overlooked gender groups such as transgender and non-binary individuals. Furthermore, we develop effective debiasing techniques that incorporate counterfactual data augmentation and specialized fine-tuning strategies to reduce gender bias in LLMs without compromising their overall performance. Extensive experiments demonstrate a significant reduction in various gender bias benchmarks, with reductions peaking at over 90% and averaging above 35% across 17 different LLMs. Importantly, these reductions come with minimal variability in mainstream language tasks, remaining below 2%. By offering a realistic assessment and tailored reduction of gender biases, we hope that our GenderCARE can represent a significant step towards achieving fairness and equity in LLMs. More details are available at this https URL.
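One debiasing ingredient named in the abstract, counterfactual data augmentation, can be sketched as swapping gendered terms to produce a mirrored copy of each training example. The word list below is a tiny illustrative sample, not the paper's lexicon, and case handling is simplified:

```python
# Counterfactual data augmentation sketch: each example yields a mirrored
# example, so the fine-tuning data no longer correlates gender with context.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "man": "woman", "woman": "man"}

def gender_counterfactual(sentence):
    # Swap known gendered tokens, pass everything else through unchanged.
    return " ".join(SWAPS.get(t.lower(), t) for t in sentence.split())

augmented = gender_counterfactual("she finished her shift")
```

A real implementation would also handle casing, names, and ambiguous words ("her" as possessive vs. object), which is why curated lexicons are typically used.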

[NLP-7] Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese

Link: https://arxiv.org/abs/2408.12480
Authors: Khang T. Doan,Bao G. Huynh,Dung T. Hoang,Thuc D. Pham,Nhat H. Pham,Quan T.M. Nguyen,Bang Q. Vo,Suong N. Hoang
Keywords: multimodal large language, Vietnamese language tasks, multimodal large, MLLM, Vietnamese language
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Comments: arXiv admin note: text overlap with arXiv:2404.16821 by other authors

Abstract:In this report, we introduce Vintern-1B, a reliable 1-billion-parameters multimodal large language model (MLLM) for Vietnamese language tasks. By integrating the Qwen2-0.5B-Instruct language model with the InternViT-300M-448px visual model, Vintern-1B is optimized for a range of applications, including optical character recognition (OCR), document extraction, and general question-answering in Vietnamese context. The model is fine-tuned on an extensive dataset of over 3 million image-question-answer pairs, achieving robust performance and reliable results across multiple Vietnamese language benchmarks like OpenViVQA and ViTextVQA. Vintern-1B is small enough to fit into various on-device applications easily. Additionally, we have open-sourced several Vietnamese vision question answering (VQA) datasets for text and diagrams, created with Gemini 1.5 Flash. Our models are available at: this https URL.

[NLP-8] Enhancing Multi-hop Reasoning through Knowledge Erasure in Large Language Model Editing

Link: https://arxiv.org/abs/2408.12456
Authors: Mengqi Zhang,Bowen Fang,Qiang Liu,Pengjie Ren,Shu Wu,Zhumin Chen,Liang Wang
Keywords: internal knowledge inaccuracies, face challenges, outdated information, challenges with internal, inaccuracies and outdated
Subjects: Computation and Language (cs.CL)

Abstract:Large language models (LLMs) face challenges with internal knowledge inaccuracies and outdated information. Knowledge editing has emerged as a pivotal approach to mitigate these issues. Although current knowledge editing techniques exhibit promising performance in single-hop reasoning tasks, they show limitations when applied to multi-hop reasoning. Drawing on cognitive neuroscience and the operational mechanisms of LLMs, we hypothesize that the residual single-hop knowledge after editing causes edited models to revert to their original answers when processing multi-hop questions, thereby undermining their performance in multihop reasoning tasks. To validate this hypothesis, we conduct a series of experiments that empirically confirm our assumptions. Building on the validated hypothesis, we propose a novel knowledge editing method that incorporates a Knowledge Erasure mechanism for Large language model Editing (KELE). Specifically, we design an erasure function for residual knowledge and an injection function for new knowledge. Through joint optimization, we derive the optimal recall vector, which is subsequently utilized within a rank-one editing framework to update the parameters of targeted model layers. Extensive experiments on GPT-J and GPT-2 XL demonstrate that KELE substantially enhances the multi-hop reasoning capability of edited LLMs.
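The "rank-one editing framework" mentioned in the abstract has the same shape as other rank-one model edits: given a key vector k and a desired value v*, update the weight matrix W so that W'k = v* while directions orthogonal to k are untouched. The sketch below illustrates that update rule under this assumption; how KELE derives its optimal recall vector is not reproduced here.

```python
def rank_one_edit(W, k, v_star):
    """Rank-one model edit sketch: W' = W + (v* - W k) k^T / (k^T k),
    so that W' @ k == v_star and W' agrees with W orthogonal to k."""
    Wk = [sum(W[i][j] * k[j] for j in range(len(k))) for i in range(len(W))]
    kk = sum(x * x for x in k)
    resid = [(v_star[i] - Wk[i]) / kk for i in range(len(W))]
    return [[W[i][j] + resid[i] * k[j] for j in range(len(k))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # toy 2x2 layer
k = [1.0, 0.0]                 # key direction encoding the edited fact
v_star = [0.0, 2.0]            # desired new value for that key
W_new = rank_one_edit(W, k, v_star)
```

Because the update has rank one, every input orthogonal to k (here [0, 1]) still maps exactly as before, which is what makes such edits surgical.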

[NLP-9] Positional Description for Numerical Normalization INTERSPEECH2024

Link: https://arxiv.org/abs/2408.12430
Authors: Deepanshu Gupta,Javier Latorre
Keywords: Positional Description Scheme, Description Scheme, Positional Description, present a Positional, digit sequences
Subjects: Computation and Language (cs.CL)
Comments: Published at Interspeech 2024

Abstract:We present a Positional Description Scheme (PDS) tailored for digit sequences, integrating placeholder value information for each digit. Given the structural limitations of subword tokenization algorithms, language models encounter critical Text Normalization (TN) challenges when handling numerical tasks. Our schema addresses this challenge through straightforward pre-processing, preserving the model architecture while significantly simplifying number normalization, rendering the problem tractable. This simplifies the task and facilitates more compact production-ready models capable of learning from smaller datasets. Furthermore, our investigations reveal that PDS enhances the arithmetic processing capabilities of language models, resulting in a relative accuracy improvement of 23% to 51% on complex arithmetic tasks. We demonstrate that PDS effectively mitigates fatal numerical normalization errors in neural models, requiring only a modest amount of training data without rule-based Finite State Transducers (FST). We demonstrate that PDS is essential for both the Text-To-Speech and Speech Recognition text processing, enabling effective TN under production constraints.
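The abstract does not list the scheme's exact token inventory, but the core idea of a positional description can be sketched as annotating each digit with its place value before tokenization, so a subword tokenizer never has to infer magnitude from digit order alone. The token names below are illustrative assumptions:

```python
# Positional-description sketch: "321" -> ["3_hundreds", "2_tens", "1_units"],
# making each digit's placeholder value explicit in the token stream.
PLACES = ["units", "tens", "hundreds", "thousands", "ten-thousands"]

def positional_description(number: str):
    n = len(number)
    return [f"{d}_{PLACES[n - 1 - i]}" for i, d in enumerate(number)]

tokens = positional_description("321")
```

This is pure pre-processing: the model architecture is unchanged, which matches the paper's claim that the scheme keeps the problem tractable without retraining from scratch.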

[NLP-10] A Comparative Analysis of Faithfulness Metrics and Humans in Citation Evaluation SIGIR2024

Link: https://arxiv.org/abs/2408.12398
Authors: Weijia Zhang,Mohammad Aliannejadi,Jiahuan Pei,Yifei Yuan,Jia-Hong Huang,Evangelos Kanoulas
Keywords: Large language models, Large language, language models, unsupported or unverifiable, support
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Comments: Accepted by the First Workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval@SIGIR2024), non-archival. arXiv admin note: substantial text overlap with arXiv:2406.15264

Abstract:Large language models (LLMs) often generate content with unsupported or unverifiable content, known as “hallucinations.” To address this, retrieval-augmented LLMs are employed to include citations in their content, grounding the content in verifiable sources. Despite such developments, manually assessing how well a citation supports the associated statement remains a major challenge. Previous studies tackle this challenge by leveraging faithfulness metrics to estimate citation support automatically. However, they limit this citation support estimation to a binary classification scenario, neglecting fine-grained citation support in practical scenarios. To investigate the effectiveness of faithfulness metrics in fine-grained scenarios, we propose a comparative evaluation framework that assesses the metric effectiveness in distinguishing citations between three-category support levels: full, partial, and no support. Our framework employs correlation analysis, classification evaluation, and retrieval evaluation to measure the alignment between metric scores and human judgments comprehensively. Our results indicate no single metric consistently excels across all evaluations, highlighting the complexity of accurately evaluating fine-grained support levels. Particularly, we find that the best-performing metrics struggle to distinguish partial support from full or no support. Based on these findings, we provide practical recommendations for developing more effective metrics.
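The paper's correlation analysis can be reproduced in miniature: rank-correlate a metric's scores against three-level human support labels (no/partial/full). Below is a self-contained Spearman implementation with tie-aware average ranks; the labels and scores are made-up examples, not data from the paper.

```python
def ranks(xs):
    """Average ranks (1-based), assigning tied values their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                      # extend the block of tied values
        avg_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            r[order[t]] = avg_rank
        i = j + 1
    return r

def spearman(a, b):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

# Hypothetical data: 0 = no support, 1 = partial, 2 = full; metric in [0, 1].
human = [2, 2, 1, 1, 0, 0]
metric = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
rho = spearman(human, metric)
```

Note that even a perfectly monotone metric scores rho < 1 here because the human labels contain ties, one concrete reason fine-grained agreement is harder to measure than binary agreement.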

[NLP-11] CLEANANERCorp: Identifying and Correcting Incorrect Labels in the ANERcorp Dataset LREC COLING2024

Link: https://arxiv.org/abs/2408.12362
Authors: Mashael Al-Duwais,Hend Al-Khalifa,Abdulmalik Al-Salman
Keywords: Named Entity Recognition, Entity Recognition, machine learning datasets, Named Entity, common issue
Subjects: Computation and Language (cs.CL)
Comments: Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024

Abstract:Label errors are a common issue in machine learning datasets, particularly for tasks such as Named Entity Recognition. Such label errors might hurt model training, affect evaluation results, and lead to an inaccurate assessment of model performance. In this study, we dived deep into one of the widely adopted Arabic NER benchmark datasets (ANERcorp) and found a significant number of annotation errors, missing labels, and inconsistencies. Therefore, in this study, we conducted empirical research to understand these errors, correct them and propose a cleaner version of the dataset named CLEANANERCorp. CLEANANERCorp will serve the research community as a more accurate and consistent benchmark.

[NLP-12] Fine-tuning Smaller Language Models for Question Answering over Financial Documents

Link: https://arxiv.org/abs/2408.12337
Authors: Karmvir Singh Phogat,Sai Akhil Puranam,Sridhar Dasaratha,Chetan Harsha,Shashishekar Ramakrishna
Keywords: Recent research, acquire substantial reasoning, substantial reasoning abilities, reasoning exemplars crafted, significantly larger teacher
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)

Abstract:Recent research has shown that smaller language models can acquire substantial reasoning abilities when fine-tuned with reasoning exemplars crafted by a significantly larger teacher model. We explore this paradigm for the financial domain, focusing on the challenge of answering questions that require multi-hop numerical reasoning over financial texts. We assess the performance of several smaller models that have been fine-tuned to generate programs that encode the required financial reasoning and calculations. Our findings demonstrate that these fine-tuned smaller models approach the performance of the teacher model. To provide a granular analysis of model performance, we propose an approach to investigate the specific student model capabilities that are enhanced by fine-tuning. Our empirical analysis indicates that fine-tuning refines the student models' ability to express and apply the required financial concepts along with adapting the entity extraction for the specific data format. In addition, we hypothesize and demonstrate that comparable financial reasoning capability can be induced using relatively smaller datasets. Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY). Cite as: arXiv:2408.12337 [cs.CL]. DOI: https://doi.org/10.48550/arXiv.2408.12337

[NLP-13] Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models

Link: https://arxiv.org/abs/2408.12326
Authors: Meiyun Wang,Masahiro Suzuki,Hiroki Sakaji,Kiyoshi Izumi
Keywords: Large Language Models, Large Language, demonstrated exceptional capabilities, Language Models, Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computers and Society (cs.CY)

Abstract:Large Language Models (LLMs) have demonstrated exceptional capabilities across various machine learning (ML) tasks. Given the high costs of creating annotated datasets for supervised learning, LLMs offer a valuable alternative by enabling effective few-shot in-context learning. However, these models can produce hallucinations, particularly in domains with incomplete knowledge. Additionally, current methods for knowledge distillation using LLMs often struggle to enhance the effectiveness of both teacher and student models. To address these challenges, we introduce DualChecker, an innovative framework designed to mitigate hallucinations and improve the performance of both teacher and student models during knowledge distillation. DualChecker employs ContextAligner to ensure that the context provided by teacher models aligns with human labeling standards. It also features a dynamic checker system that enhances model interaction: one component re-prompts teacher models with more detailed content when they show low confidence, and another identifies borderline cases from student models to refine the teaching templates. This interactive process promotes continuous improvement and effective knowledge transfer between the models. We evaluate DualChecker using a green innovation textual dataset that includes binary, multiclass, and token classification tasks. The experimental results show that DualChecker significantly outperforms existing state-of-the-art methods, achieving up to a 17% improvement in F1 score for teacher models and 10% for student models. Notably, student models fine-tuned with LLM predictions perform comparably to those fine-tuned with actual data, even in a challenging domain. We make all datasets, models, and code from this research publicly available.
摘要:大型语言模型(LLM)在各种机器学习(ML)任务中表现出了卓越的能力。考虑到为监督学习创建标注数据集的高成本,LLM通过支持有效的少样本上下文学习提供了一种有价值的替代方案。然而,这些模型会产生幻觉,特别是在知识不完整的领域。此外,目前使用LLM进行知识蒸馏的方法往往难以同时提高教师和学生模型的有效性。为了应对这些挑战,我们引入了DualChecker,这是一个创新的框架,旨在减轻幻觉并提高教师和学生模型在知识蒸馏过程中的表现。DualChecker使用ContextAligner来确保教师模型提供的上下文与人类标注标准保持一致。它还具有增强模型交互的动态检查系统:一个组件在教师模型表现出较低的置信度时重新提示教师模型给出更详细的内容,另一个组件从学生模型中识别边界案例以改进教学模板。这种交互过程促进了模型之间的持续改进和有效的知识迁移。我们使用包括二分类、多分类和词元(token)分类任务的绿色创新文本数据集来评估DualChecker。实验结果表明,DualChecker的性能明显优于现有的最先进方法,教师模型的F1分数最高提高了17%,学生模型最高提高了10%。值得注意的是,用LLM预测微调的学生模型与那些用实际数据微调的学生模型的表现相当,即使在一个具有挑战性的领域也是如此。我们公开这项研究的所有数据集、模型和代码。
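下面用一段极简的Python骨架示意摘要中"动态检查器"的低置信度重提示流程(阈值、重试次数与模拟教师函数均为假设,并非论文实现):

```python
def dual_check(prompt, call_teacher, threshold=0.7, max_retries=2):
    # 动态检查器思路的简化骨架:教师置信度低于阈值时,
    # 追加更详细的要求后重新询问;threshold、max_retries 为假设超参数
    detail = ""
    label, conf = call_teacher(prompt + detail)
    for _ in range(max_retries):
        if conf >= threshold:
            break
        detail += "\n请给出更详细的推理依据。"
        label, conf = call_teacher(prompt + detail)
    return label, conf

# 模拟教师:提示中"详细"出现得越多,返回的置信度越高(纯属演示)
def fake_teacher(p):
    return "positive", 0.5 + 0.2 * p.count("详细")

label, conf = dual_check("判断该专利摘要是否属于绿色创新", fake_teacher)
```

真实场景中 `call_teacher` 对应一次教师LLM调用,置信度可由输出token概率估计得到。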

[NLP-14] Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators
[NLP-14] 通过解码时间幻觉和真实比较器改进大型语言模型中的事实

链接: https://arxiv.org/abs/2408.12325
作者: Dingkang Yang,Dongling Xiao,Jinjie Wei,Mingcheng Li,Zhaoyu Chen,Ke Li,Lihua Zhang
关键词-EN: Large Language Models, Large Language, contradict verifiable facts, unfaithful hallucination content, Language Models
关键词-ZH: 大型语言模型,大型语言,矛盾可验证的事实,不忠实的幻觉内容,语言模型
类目: Computation and Language (cs.CL)
备注: Hallucination Mitigation in LLMs

点击查看摘要

Abstract:Despite their remarkable capabilities, Large Language Models (LLMs) are prone to generate responses that contradict verifiable facts, i.e., unfaithful hallucination content. Existing efforts generally focus on optimizing model parameters or editing semantic representations, which compromise the internal factual knowledge of target LLMs. In addition, hallucinations typically exhibit multifaceted patterns in downstream tasks, limiting the model’s holistic performance across tasks. In this paper, we propose a Comparator-driven Decoding-Time (CDT) framework to alleviate the response hallucination. Firstly, we construct hallucinatory and truthful comparators with multi-task fine-tuning samples. In this case, we present an instruction prototype-guided mixture of experts strategy to enhance the ability of the corresponding comparators to capture different hallucination or truthfulness patterns in distinct task instructions. CDT constrains next-token predictions to factuality-robust distributions by contrasting the logit differences between the target LLMs and these comparators. Systematic experiments on multiple downstream tasks show that our framework can significantly improve the model performance and response factuality.
摘要:尽管大型语言模型具有显著的能力,但它们容易产生与可验证事实相矛盾的响应,即不忠实的幻觉内容。现有的工作一般集中在优化模型参数或编辑语义表示上,这会损害目标LLM的内部事实知识。此外,幻觉通常在下游任务中表现出多方面的模式,限制了模型在不同任务中的整体表现。在本文中,我们提出了一个比较器驱动的解码时间(CDT)框架来缓解响应幻觉。首先,我们用多任务微调样本构造幻觉比较器和真实比较器。在此基础上,我们提出了一种指令原型引导的专家混合策略,以增强相应的比较器捕获不同任务指令中不同幻觉或真实性模式的能力。CDT通过对比目标LLM和这些比较器之间的logit差异,将下一词元(token)预测约束到对事实性稳健的分布上。在多个下游任务上的系统实验表明,该框架可以显著提高模型性能和响应的事实性。
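摘要中"对比目标LLM与比较器之间的logit差异"的解码思路,可以用如下玩具代码示意(logit的组合方式与强度超参alpha均为假设,并非论文原始公式):

```python
import math

def cdt_next_token_logits(target, truthful, hallucinatory, alpha=1.0):
    # 在目标模型logits上叠加"真实比较器 - 幻觉比较器"的logit差,
    # 从而抬高真实比较器偏好的词元、压低幻觉比较器偏好的词元
    return [t + alpha * (p - q) for t, p, q in zip(target, truthful, hallucinatory)]

def softmax(logits):
    m = max(logits)          # 减去最大值保证数值稳定
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# 玩具示例:词表大小为3,真实比较器偏好token 0,幻觉比较器偏好token 2
target = [2.0, 1.0, 1.8]
truthful = [1.5, 0.5, 0.2]
hallucinatory = [0.2, 0.5, 1.5]
probs = softmax(cdt_next_token_logits(target, truthful, hallucinatory))
```

经过对比修正后,原本与幻觉词元接近的概率质量被推向更可信的词元。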

[NLP-15] MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
[NLP-15] MaVEn:一种有效的多模式大型语言模型多粒度混合视觉编码框架

链接: https://arxiv.org/abs/2408.12321
作者: Chaoya Jiang,Jia Hongrui,Haiyang Xu,Wei Ye,Mengfan Dong,Ming Yan,Ji Zhang,Fei Huang,Shikun Zhang
关键词-EN: Multimodal Large Language, Large Language Models, Multi-granularity Visual Encoding, Encoding framework designed, Multimodal Large
关键词-ZH: 多模式大型语言,大型语言模型,多粒度视觉编码,编码框架设计,多模式大型
类目: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
备注:

点击查看摘要

Abstract:This paper presents MaVEn, an innovative Multi-granularity Visual Encoding framework designed to enhance the capabilities of Multimodal Large Language Models (MLLMs) in multi-image reasoning. Current MLLMs primarily focus on single-image visual understanding, limiting their ability to interpret and integrate information across multiple images. MaVEn addresses this limitation by combining discrete visual symbol sequences, which abstract coarse-grained semantic concepts, with traditional continuous representation sequences that model fine-grained features. This dual approach bridges the semantic gap between visual and textual data, thereby improving the model’s ability to process and interpret information from multiple images effectively. Additionally, we design a dynamic reduction mechanism for long-sequence continuous features to enhance multi-image processing efficiency. Experimental results demonstrate that MaVEn significantly enhances MLLMs’ understanding in complex multi-image scenarios, while also improving performance in single-image contexts.
摘要:本文提出了一种新的多粒度视觉编码框架MaVEn,旨在增强多模态大语言模型(MLLM)的多图像推理能力。目前的MLLM主要关注单幅图像的视觉理解,限制了它们解释和整合多幅图像信息的能力。MaVEn通过将抽象粗粒度语义概念的离散视觉符号序列与建模细粒度特征的传统连续表示序列相结合来解决这一限制。这种双重方法弥合了视觉和文本数据之间的语义鸿沟,从而提高了模型有效处理和解释多幅图像信息的能力。此外,针对长序列连续特征,我们设计了一种动态缩减机制,提高了多图像处理的效率。实验结果表明,MaVEn显著提高了MLLM在复杂多图像场景下的理解能力,同时也提高了其在单图像环境下的性能。

[NLP-16] Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations
[NLP-16] 大型语言模型是自学推理者:通过定制的问题解决演示增强LLM应用

链接: https://arxiv.org/abs/2408.12315
作者: Kai Tzu-iunn Ong,Taeyoon Kwon,Jinyoung Yeo
关键词-EN: Guiding large language, large language models, improving LLM applications, Guiding large, large language
关键词-ZH: 指导大型语言、大型语言模型、改进LLM应用、指导大型语言
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注: preprint / under review

点击查看摘要

Abstract:Guiding large language models with a selected set of human-authored demonstrations is a common practice for improving LLM applications. However, human effort can be costly, especially in specialized domains (e.g., clinical diagnosis), and does not guarantee optimal performance due to the potential discrepancy of target skills between selected demonstrations and real test instances. Motivated by these, this paper explores the automatic creation of customized demonstrations, whose target skills align with the given target instance. We present SELF-TAUGHT, a problem-solving framework, which facilitates demonstrations that are “tailored” to the target problem and “filtered” for better quality (i.e., correctness) in a zero-shot manner. In 15 tasks of multiple-choice questions of diverse domains and the diagnosis of Alzheimer’s disease (AD) with real-world patients, SELF-TAUGHT achieves superior performance to strong baselines (e.g., Few-shot CoT, Plan-and-Solve, Auto-CoT). We conduct comprehensive analyses on SELF-TAUGHT, including its generalizability to existing prompting methods and different LLMs, the quality of its intermediate generation, and more.
摘要:使用一组精选的人工编写的演示来指导大型语言模型是改进LLM应用的常见做法。然而,人工编写演示的成本可能很高,特别是在专门领域(例如临床诊断),而且由于所选演示与真实测试实例之间目标技能的潜在差异,并不能保证最佳性能。受此启发,本文探索了定制演示的自动创建,其目标技能与给定的目标实例保持一致。我们提出了SELF-TAUGHT,一个问题求解框架,它以零样本的方式生成针对目标问题"量身定制"并经过"过滤"以保证质量(即正确性)的演示。在涵盖不同领域的15个多项选择题任务以及对真实患者的阿尔茨海默病(AD)诊断中,SELF-TAUGHT取得了优于强基线(例如Few-shot CoT、Plan-and-Solve、Auto-CoT)的表现。我们对SELF-TAUGHT进行了全面分析,包括它对现有提示方法和不同LLM的泛化能力、其中间生成内容的质量等。

[NLP-17] Toward the Evaluation of Large Language Models Considering Score Variance across Instruction Templates
[NLP-17] 考虑指令模板间分数差异的大型语言模型评估

链接: https://arxiv.org/abs/2408.12263
作者: Yusuke Sakai,Adam Nohejl,Jiangnan Hang,Hidetaka Kamigaito,Taro Watanabe
关键词-EN: natural language understanding, large language models, NLU performance, language understanding, language models
关键词-ZH: 自然语言理解、大型语言模型、NLU性能、语言理解、语言模型
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 19 pages, 7 figures

点击查看摘要

Abstract:The natural language understanding (NLU) performance of large language models (LLMs) has been evaluated across various tasks and datasets. The existing evaluation methods, however, do not take into account the variance in scores due to differences in prompts, which leads to unfair evaluation and comparison of NLU performance. Moreover, evaluation designed for specific prompts is inappropriate for instruction tuning, which aims to perform well with any prompt. It is therefore necessary to find a way to measure NLU performance in a fair manner, considering score variance between different instruction templates. In this study, we provide English and Japanese cross-lingual datasets for evaluating the NLU performance of LLMs, which include multiple instruction templates for fair evaluation of each task, along with regular expressions to constrain the output format. Furthermore, we propose the Sharpe score as an evaluation metric that takes into account the variance in scores between templates. Comprehensive analysis of English and Japanese LLMs reveals that the high variance among templates has a significant impact on the fair evaluation of LLMs.
摘要:大型语言模型(LLM)的自然语言理解(NLU)性能已经在不同的任务和数据集上进行了评估。然而,现有的评价方法没有考虑到由于提示的不同而导致的分数差异,这导致了对NLU性能的不公平评价和比较。此外,针对特定提示设计的评估不适合于指令微调,因为指令微调的目标是在任何提示下都能表现良好。因此,考虑到不同指令模板之间的分数差异,有必要找到一种公平衡量NLU性能的方式。在这项研究中,我们提供了英语和日语的跨语言数据集来评估LLM的NLU性能,其中包括用于公平评估每个任务的多个指令模板,以及约束输出格式的正则表达式。此外,我们提出了Sharpe分数作为一种评估度量,它考虑了模板之间分数的差异。对英语和日语LLM的综合分析表明,模板之间的高度差异对LLM的公平评价有很大的影响。
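摘要借用了金融领域Sharpe比率的思想,对同一任务在不同指令模板下的得分做"均值/标准差"式的综合:得分高且模板间方差小的模型得到更高的分数。下面是一个示意性实现(eps平滑项等细节为笔者假设,并非论文代码):

```python
import statistics

def sharpe_score(scores, eps=1e-8):
    # scores: 同一模型在不同指令模板下的得分序列
    # 均值衡量平均性能,总体标准差惩罚对模板措辞的敏感性
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    return mu / (sigma + eps)

stable = [0.80, 0.81, 0.79, 0.80]    # 模板间稳定
unstable = [0.95, 0.60, 0.85, 0.80]  # 平均分相同,但模板间方差大
```

两组分数的均值都是0.80,但稳定的一组会得到远高于不稳定一组的Sharpe分数。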

[NLP-18] A Language-agnostic Model of Child Language Acquisition
[NLP-18] 儿童语言习得的语言无关模型

链接: https://arxiv.org/abs/2408.12254
作者: Louis Mahon,Omri Abend,Uri Berger,Katherine Demuth,Mark Johnson,Mark Steedman
关键词-EN: recent semantic bootstrapping, semantic bootstrapping child-language, bootstrapping child-language acquisition, child-language acquisition model, designed for English
关键词-ZH: 最近的语义引导,语义引导儿童语言,引导儿童语言习得,儿童语言习得模型,为英语设计
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:This work reimplements a recent semantic bootstrapping child-language acquisition model, which was originally designed for English, and trains it to learn a new language: Hebrew. The model learns from pairs of utterances and logical forms as meaning representations, and acquires both syntax and word meanings simultaneously. The results show that the model mostly transfers to Hebrew, but that a number of factors, including the richer morphology in Hebrew, makes the learning slower and less robust. This suggests that a clear direction for future work is to enable the model to leverage the similarities between different word forms.
摘要:这项工作重新实现了最近的语义引导儿童语言习得模型,该模型最初是为英语设计的,并训练其学习一种新语言:希伯来语。该模型从成对的话语和逻辑形式中学习作为意义表示,并同时获取语法和单词意义。结果表明,该模型主要转移到希伯来语,但包括希伯来语更丰富的形态学在内的许多因素使得学习速度更慢、更不稳健。这表明未来工作的明确方向是使该模型能够利用不同字词之间的相似性。

[NLP-19] LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction
[NLP-19] LLM不是生物医学信息提取的零镜头推理器

链接: https://arxiv.org/abs/2408.12249
作者: Aishik Nagar,Viktor Schlegel,Thanh-Tung Nguyen,Hao Li,Yuping Wu,Kuluhan Binici,Stefan Winkler
关键词-EN: Large Language Models, Large Language, Language Models, Named Entity Recognition, document summarisation
关键词-ZH: 大型语言模型、大型语言、语言模型、命名实体识别、文档摘要
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 11 pages

点击查看摘要

Abstract:Large Language Models (LLMs) are increasingly adopted for applications in healthcare, reaching the performance of domain experts on tasks such as question answering and document summarisation. Despite their success on these tasks, it is unclear how well LLMs perform on tasks that are traditionally pursued in the biomedical domain, such as structured information extraction. To breach this gap, in this paper, we systematically benchmark LLM performance in Medical Classification and Named Entity Recognition (NER) tasks. We aim to disentangle the contribution of different factors to the performance, particularly the impact of LLMs’ task knowledge and reasoning capabilities, their (parametric) domain knowledge, and addition of external knowledge. To this end we evaluate various open LLMs – including BioMistral and Llama-2 models – on a diverse set of biomedical datasets, using standard prompting, Chain-of-Thought (CoT) and Self-Consistency based reasoning as well as Retrieval-Augmented Generation (RAG) with PubMed and Wikipedia corpora. Counter-intuitively, our results reveal that standard prompting consistently outperforms more complex techniques across both tasks, laying bare the limitations in the current application of CoT, self-consistency and RAG in the biomedical domain. Our findings suggest that advanced prompting methods developed for knowledge- or reasoning-intensive tasks, such as CoT or RAG, are not easily portable to biomedical tasks where precise structured outputs are required. This highlights the need for more effective integration of external knowledge and reasoning mechanisms in LLMs to enhance their performance in real-world biomedical applications.
摘要:大型语言模型(LLM)越来越多地被应用于医疗保健领域,在回答问题和文档摘要等任务上达到了领域专家的性能。尽管它们在这些任务上取得了成功,但目前尚不清楚LLM在生物医学领域传统任务上的表现如何,例如结构化信息提取。为了弥合这一差距,在本文中,我们系统地对LLM在医学分类和命名实体识别(NER)任务中的性能进行了基准测试。我们的目标是理清不同因素对性能的贡献,特别是LLM的任务知识和推理能力、它们的(参数化)领域知识以及外部知识的添加对性能的影响。为此,我们使用标准提示、思维链(CoT)和基于自我一致性的推理以及带有PubMed和维基百科语料库的检索增强生成(RAG),在一组不同的生物医学数据集上评估了各种开放的LLM(包括BioMistral和Llama-2模型)。与直觉相反的是,我们的结果显示,在这两类任务中,标准提示始终优于更复杂的技术,暴露了CoT、自我一致性和RAG在生物医学领域当前应用的局限性。我们的发现表明,为知识密集型或推理密集型任务开发的高级提示方法,如CoT或RAG,不容易移植到需要精确结构化输出的生物医学任务中。这突显了需要在LLM中更有效地整合外部知识和推理机制,以提高它们在现实世界生物医学应用中的性能。

[NLP-20] EvalYaks: Instruction Tuning Datasets and LoRA Fine-tuned Models for Automated Scoring of CEFR B2 Speaking Assessment Transcripts
[NLP-20] EvalYaks:用于CEFR B2口语评估成绩单自动评分的指令调整数据集和LoRA微调模型

链接: https://arxiv.org/abs/2408.12226
作者: Nicy Scaria,Silvester John Joseph Kennedy,Thomas Latinovich,Deepak Subramani
关键词-EN: Relying on human, creates scalability challenges, English speaking assessments, CEFR, CEFR speaking assessments
关键词-ZH: 依赖人类,带来可扩展性挑战,英语评估,CEFR,CEFR口语评估
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Relying on human experts to evaluate CEFR speaking assessments in an e-learning environment creates scalability challenges, as it limits how quickly and widely assessments can be conducted. We aim to automate the evaluation of CEFR B2 English speaking assessments in e-learning environments from conversation transcripts. First, we evaluate the capability of leading open source and commercial Large Language Models (LLMs) to score a candidate’s performance across various criteria in the CEFR B2 speaking exam in both global and India-specific contexts. Next, we create a new expert-validated, CEFR-aligned synthetic conversational dataset with transcripts that are rated at different assessment scores. In addition, new instruction-tuned datasets are developed from the English Vocabulary Profile (up to CEFR B2 level) and the CEFR-SP WikiAuto datasets. Finally, using these new datasets, we perform parameter efficient instruction tuning of Mistral Instruct 7B v0.2 to develop a family of models called EvalYaks. Four models in this family are for assessing the four sections of the CEFR B2 speaking exam, one for identifying the CEFR level of vocabulary and generating level-specific vocabulary, and another for detecting the CEFR level of text and generating level-specific text. EvalYaks achieved an average acceptable accuracy of 96%, a degree of variation of 0.35 levels, and performed 3 times better than the next best model. This demonstrates that a 7B parameter LLM instruction tuned with high-quality CEFR-aligned assessment data can effectively evaluate and score CEFR B2 English speaking assessments, offering a promising solution for scalable, automated language proficiency evaluation.
摘要:依赖人类专家在电子学习环境中评估CEFR口语评估带来了可扩展性挑战,因为它限制了评估的速度和范围。我们的目标是在电子学习环境中从对话记录中自动评估CEFR B2英语口语评估。首先,我们评估领先的开源和商业大型语言模型(LLM)在全球和印度特定环境下,在CEFR B2口语考试中按各种标准为考生打分的能力。接下来,我们创建一个新的经专家验证、与CEFR对齐的合成会话数据集,其中包含按不同评估分数评级的成绩单。此外,还根据《英语词汇概况》(最高可达CEFR B2级)和CEFR-SP WikiAuto数据集开发了新的指令微调数据集。最后,使用这些新的数据集,我们对Mistral Instruct 7B v0.2进行参数高效的指令微调,开发出一系列称为EvalYaks的模型。这一系列中的四个模型用于评估CEFR B2口语考试的四个部分,一个用于识别词汇的CEFR级别并生成特定级别的词汇,另一个用于检测文本的CEFR级别并生成特定级别的文本。EvalYaks达到了96%的平均可接受准确率,变异程度为0.35个等级,比次好的模型好3倍。这表明,使用高质量的CEFR对齐评估数据进行指令微调的7B参数LLM可以有效地对CEFR B2英语口语评估进行评估和评分,为可扩展的自动化语言水平评估提供了一个有前景的解决方案。

[NLP-21] Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment EMNLP24
[NLP-21] 大型语言模型作为下一代密集检索的基础:全面的经验评估

链接: https://arxiv.org/abs/2408.12194
作者: Kun Luo,Minghao Qin,Zheng Liu,Shitao Xiao,Jun Zhao,Kang Liu
关键词-EN: Pretrained language models, Pretrained language, serve as crucial, retrieval, Pretrained
关键词-ZH: 预训练的语言模型,预训练的语言,作为至关重要的,检索,预训练
类目: Computation and Language (cs.CL)
备注: Submitted to EMNLP24

点击查看摘要

Abstract:Pretrained language models like BERT and T5 serve as crucial backbone encoders for dense retrieval. However, these models often exhibit limited generalization capabilities and face challenges in improving in-domain accuracy. Recent research has explored using large language models (LLMs) as retrievers, achieving SOTA performance across various tasks. Despite these advancements, the specific benefits of LLMs over traditional retrievers and the impact of different LLM configurations, such as parameter sizes, pretraining duration, and alignment processes on retrieval tasks remain unclear. In this work, we conduct a comprehensive empirical study on a wide range of retrieval tasks, including in-domain accuracy, data efficiency, zero-shot generalization, lengthy retrieval, instruction-based retrieval, and multi-task learning. We evaluate over 15 different backbone LLMs and non-LLMs. Our findings reveal that larger models and extensive pretraining consistently enhance in-domain accuracy and data efficiency. Additionally, larger models demonstrate significant potential in zero-shot generalization, lengthy retrieval, instruction-based retrieval, and multi-task learning. These results underscore the advantages of LLMs as versatile and effective backbone encoders in dense retrieval, providing valuable insights for future research and development in this field.
摘要:BERT和T5等预训练语言模型是密集检索的关键骨干编码器。然而,这些模型往往表现出有限的泛化能力,并在提高域内精度方面面临挑战。最近的研究探索了使用大型语言模型(LLM)作为检索器,在各种任务中实现SOTA性能。尽管有这些进步,但与传统检索器相比,LLM的具体优势,以及不同的LLM配置(如参数规模、预训练时长和对齐过程)对检索任务的影响仍不清楚。在这项工作中,我们对广泛的检索任务进行了全面的实证研究,包括域内准确率、数据效率、零样本泛化、长文本检索、基于指令的检索和多任务学习。我们评估了超过15种不同的骨干LLM和非LLM模型。我们的发现表明,更大的模型和更充分的预训练能持续提升域内准确率和数据效率。此外,较大的模型在零样本泛化、长文本检索、基于指令的检索和多任务学习方面显示出巨大的潜力。这些结果突显了LLM在密集检索中作为通用且有效的骨干编码器的优势,为该领域未来的研究和开发提供了有价值的见解。
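密集检索的基本打分方式(查询与文档经编码器映射到同一向量空间后按余弦相似度排序)可以用纯Python示意如下,其中向量为手工构造的玩具数据,代替LLM骨干编码器的实际输出:

```python
import math

def cosine(u, v):
    # 余弦相似度:点积除以两个向量的模长之积
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, doc_vecs, top_k=2):
    # 按与查询向量的余弦相似度对文档降序排序,返回top_k个文档下标
    ranked = sorted(enumerate(doc_vecs),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [i for i, _ in ranked[:top_k]]

# 玩具向量:doc 0 与查询方向最接近,doc 2 次之
query = [0.9, 0.1, 0.0]
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
top = retrieve(query, docs)
```

论文比较的正是产生这些向量的骨干编码器(BERT/T5 与各种LLM)的质量,检索打分部分本身是通用的。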

[NLP-22] Reasoning Factual Knowledge in Structured Data with Large Language Models
[NLP-22] 用大型语言模型推理结构化数据中的事实知识

链接: https://arxiv.org/abs/2408.12188
作者: Sirui Huang,Yanggan Gu,Xuming Hu,Zhonghao Li,Qing Li,Guandong Xu
关键词-EN: Large language models, natural language processing, Large language, made remarkable progress, language processing tasks
关键词-ZH: 大型语言模型、自然语言处理、大型语言、取得显着进展、语言处理任务
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Large language models (LLMs) have made remarkable progress in various natural language processing tasks as a benefit of their capability to comprehend and reason with factual knowledge. However, a significant amount of factual knowledge is stored in structured data, which possesses unique characteristics that differ from the unstructured texts used for pretraining. This difference can introduce imperceptible inference parameter deviations, posing challenges for LLMs in effectively utilizing and reasoning with structured data to accurately infer factual knowledge. To this end, we propose a benchmark named StructFact, to evaluate the structural reasoning capabilities of LLMs in inferring factual knowledge. StructFact comprises 8,340 factual questions encompassing various tasks, domains, timelines, and regions. This benchmark allows us to investigate the capability of LLMs across five factual tasks derived from the unique characteristics of structural facts. Extensive experiments on a set of LLMs with different training strategies reveal the limitations of current LLMs in inferring factual knowledge from structured data. We present this benchmark as a compass to navigate the strengths and weaknesses of LLMs in reasoning with structured data for knowledge-sensitive tasks, and to encourage advancements in related real-world applications. Please find our code at this https URL.
摘要:大语言模型在各种自然语言处理任务中取得了显著的进展,这得益于它们对事实知识的理解和推理能力。然而,大量的事实知识存储在结构化数据中,这些数据具有不同于用于预训练的非结构化文本的独特特征。这种差异可能会带来难以察觉的推理参数偏差,给LLMS在有效利用和推理结构化数据以准确推断事实知识方面带来挑战。为此,我们提出了一个名为StructFact的基准测试,来评估LLM在推理事实知识方面的结构推理能力。StructFact包含8,340个事实问题,涵盖各种任务、域、时间线和区域。这一基准使我们能够调查LLMS跨五个事实任务的能力,这些任务源于结构事实的独特特征。在一组具有不同训练策略的LLMS上的大量实验揭示了现有LLMS在从结构化数据中推理事实知识方面的局限性。我们将这一基准作为一个指南针,以导航LLMS在使用知识敏感任务的结构化数据进行推理方面的优势和劣势,并鼓励相关现实世界应用程序的进步。请在此HTTPS URL中找到我们的代码。

[NLP-23] Revisiting the Phenomenon of Syntactic Complexity Convergence on German Dialogue Data
[NLP-23] 重新审视德国对话数据的语法复杂性趋同现象

链接: https://arxiv.org/abs/2408.12177
作者: Yu Wang,Hendrik Buschmeier
关键词-EN: syntactic complexity convergence, English dialogue, syntactic complexity, complexity convergence, mutual understanding
关键词-ZH: 语法复杂性趋同,英语对话,语法复杂性,复杂性趋同,相互理解
类目: Computation and Language (cs.CL)
备注: Accepted to KONVENS 2024

点击查看摘要

Abstract:We revisit the phenomenon of syntactic complexity convergence in conversational interaction, originally found for English dialogue, which has theoretical implication for dialogical concepts such as mutual understanding. We use a modified metric to quantify syntactic complexity based on dependency parsing. The results show that syntactic complexity convergence can be statistically confirmed in one of three selected German datasets that were analysed. Given that the dataset which shows such convergence is much larger than the other two selected datasets, the empirical results indicate a certain degree of linguistic generality of syntactic complexity convergence in conversational interaction. We also found a different type of syntactic complexity convergence in one of the datasets while further investigation is still necessary.
摘要:我们重新审视了最初在英语对话中发现的会话互动中的语法复杂性趋同现象,它对相互理解等对话概念具有理论意义。我们使用修改后的指标来基于依赖解析来量化语法复杂性。结果表明,在分析的三个选定的德国数据集之一中,可以通过统计证实语法复杂性收敛。鉴于表现出这种收敛的数据集比其他两个选择的数据集大得多,经验结果表明对话互动中语法复杂性收敛具有一定程度的语言通用性。我们还在其中一个数据集中发现了不同类型的语法复杂性收敛,但仍有必要进一步研究。

[NLP-24] FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation
[NLP-24] 第一:通过有效的值得信赖的蒸馏来教授可靠的大型语言模型

链接: https://arxiv.org/abs/2408.12168
作者: KaShun Shum,Minrui Xu,Jianshu Zhang,Zixin Chen,Shizhe Diao,Hanze Dong,Jipeng Zhang,Muhammad Omer Raza
关键词-EN: truth correctness likelihood, ground truth correctness, Large language models, Large language, daily lives
关键词-ZH: 真相正确性可能性、基本真相正确性、大型语言模型、大型语言、日常生活
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Large language models (LLMs) have become increasingly prevalent in our daily lives, leading to an expectation for LLMs to be trustworthy: both accurate and well-calibrated (the prediction confidence should align with its ground truth correctness likelihood). Nowadays, fine-tuning has become the most popular method for adapting a model to practical usage by significantly increasing accuracy on downstream tasks. Despite the great accuracy it achieves, we found fine-tuning is still far away from satisfactory trustworthiness due to “tuning-induced mis-calibration”. In this paper, we delve deeply into why and how mis-calibration exists in fine-tuned models, and how distillation can alleviate the issue. Then we further propose a brand new method named Efficient Trustworthy Distillation (FIRST), which utilizes a small portion of teacher’s knowledge to obtain a reliable language model in a cost-efficient way. Specifically, we identify the “concentrated knowledge” phenomenon during distillation, which can significantly reduce the computational burden. Then we apply a “trustworthy maximization” process to optimize the utilization of this small portion of concentrated knowledge before transferring it to the student. Experimental results demonstrate the effectiveness of our method, where better accuracy (+2.3%) and less mis-calibration (-10%) are achieved on average across both in-domain and out-of-domain scenarios, indicating better trustworthiness.
摘要:大型语言模型在我们的日常生活中变得越来越普遍,这导致了对大型语言模型值得信赖的期望:既准确又经过良好校准(预测置信度应与其真实正确的可能性保持一致)。如今,微调已经成为使模型适应实际使用的最流行的方法,因为它能显著提高下游任务的精度。尽管微调实现了很高的精确度,但我们发现,由于"微调导致的校准失准",微调仍远未达到令人满意的可信度。在本文中,我们深入研究了微调模型中存在校准失准的原因和方式,以及蒸馏如何缓解这个问题。然后,我们进一步提出了一种全新的方法——高效可信蒸馏(FIRST),该方法利用教师的一小部分知识以低成本的方式获得可靠的语言模型。具体地说,我们识别了蒸馏过程中的"知识集中"现象,这可以显著降低计算负担。然后,在将这一小部分集中知识传授给学生之前,我们应用"可信最大化"过程来优化其利用。实验结果证明了该方法的有效性,在域内和域外场景中平均获得了更高的准确率(+2.3%)和更少的校准失准(-10%),表明了更好的可信性。
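摘要中反复出现的"校准"通常用期望校准误差(ECE)来度量:把预测按置信度分桶,ECE等于各桶样本占比与"桶内平均置信度减桶内准确率"绝对值的加权和。下面是该通用指标的一个简化实现,仅用于说明概念,并非论文的评测代码:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    # confidences: 每个预测的置信度(0~1);correct: 对应预测是否正确
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf=1.0 落入最后一桶
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += len(b) / total * abs(avg_conf - acc)
    return ece
```

例如置信度全为0.8且恰有80%预测正确时,ECE为0(校准完美);置信度全为0.9却只有50%正确时,ECE为0.4(过度自信)。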

[NLP-25] Preference-Guided Reflective Sampling for Aligning Language Models
[NLP-25] 用于对齐语言模型的偏好引导反射采样

链接: https://arxiv.org/abs/2408.12163
作者: Hai Ye,Hwee Tou Ng
关键词-EN: Large language models, Large language, human feedback, reinforcement learning, RLHF
关键词-ZH: 大型语言模型、大型语言、人类反馈、强化学习、RL HF
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Large language models (LLMs) are aligned with human preferences by reinforcement learning from human feedback (RLHF). Effective data sampling is crucial for RLHF, as it determines the efficiency of model training, ensuring that models learn from the informative samples. To achieve better data generation, we propose a new sampling method called Preference-Guided Reflective Sampling (PRS). PRS frames the response generation as an optimization process to the explicitly specified user preference described in natural language. It employs a tree-based generation framework to enable an efficient sampling process, which guides the direction of generation through preference and better explores the sampling space with adaptive self-refinement. Notably, PRS can align LLMs to diverse preferences. We study preference-controlled text generation for instruction following and keyword-focused document summarization. Our findings indicate that PRS, across different LLM policies, generates training data with much higher rewards than strong baselines. PRS also excels in post-RL training.
摘要:通过人类反馈强化学习(RLHF),大语言模型(LLM)与人的偏好保持一致。有效的数据采样对RLHF至关重要,因为它决定了模型训练的效率,确保模型从信息丰富的样本中学习。为了实现更好的数据生成,我们提出了一种新的采样方法,称为偏好引导反射采样(PRS)。PRS将响应生成视为针对以自然语言明确描述的用户偏好的优化过程。它使用基于树的生成框架来实现高效的采样过程,通过偏好来指导生成方向,并通过自适应自我改进来更好地探索采样空间。值得注意的是,PRS可以使LLM对齐到多样的偏好。我们研究了用于指令遵循的偏好控制文本生成和以关键词为中心的文档摘要。我们的发现表明,在不同的LLM策略下,PRS生成的训练数据比强基线产生的回报要高得多。PRS在RL后训练方面也表现出色。
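PRS的"基于树的生成+偏好打分筛选"思路可以抽象为如下极简骨架(生成器与奖励函数均为玩具模拟,树宽、深度为假设超参,并非论文的实际算法细节):

```python
import random

def prs_sample(generate, reward, width=4, depth=3, seed=0):
    # 偏好引导采样的简化骨架:每轮从当前最优回复扩展 width 个候选,
    # 用 reward(模拟偏好打分)挑选最优并进入下一轮,共迭代 depth 轮
    rng = random.Random(seed)
    best = generate(None, rng)
    for _ in range(depth):
        candidates = [generate(best, rng) for _ in range(width)]
        best = max(candidates + [best], key=reward)  # 保留旧最优,保证单调不降
    return best

# 玩具环境:把"回复"抽象为一个数值,偏好越接近1.0越好,
# generate 在父节点附近扰动,模拟"在已有回复上自我改进"
def toy_generate(parent, rng):
    base = 0.0 if parent is None else parent
    return base + rng.uniform(-0.2, 0.4)

toy_reward = lambda x: -abs(x - 1.0)
result = prs_sample(toy_generate, toy_reward)
```

由于每轮的 `max` 都把旧最优计入比较,候选的偏好得分随迭代单调不降,这对应摘要中"通过偏好指导生成方向"的核心思想。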

[NLP-26] Search-Based LLMs for Code Optimization ICSE’25
[NLP-26] 基于搜索的LLM用于代码优化

链接: https://arxiv.org/abs/2408.12159
作者: Shuzheng Gao,Cuiyun Gao,Wenchao Gu,Michael Lyu
关键词-EN: optimization, methods, optimization methods, written by developers, code
关键词-ZH: 优化,方法,优化方法,开发人员编写,代码
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注: Accepted by 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE’25)

点击查看摘要

Abstract:The code written by developers usually suffers from efficiency problems and contain various performance bugs. These inefficiencies necessitate the research of automated refactoring methods for code optimization. Early research in code optimization employs rule-based methods and focuses on specific inefficiency issues, which are labor-intensive and suffer from the low coverage issue. Recent work regards the task as a sequence generation problem, and resorts to deep learning (DL) techniques such as large language models (LLMs). These methods typically prompt LLMs to directly generate optimized code. Although these methods show state-of-the-art performance, such one-step generation paradigm is hard to achieve an optimal solution. First, complex optimization methods such as combinatorial ones are hard to be captured by LLMs. Second, the one-step generation paradigm poses challenge in precisely infusing the knowledge required for effective code optimization within LLMs, resulting in under-optimized code. To address these problems, we propose to model this task from the search perspective, and propose a search-based LLMs framework named SBLLM that enables iterative refinement and discovery of improved optimization methods. SBLLM synergistically integrate LLMs with evolutionary search and consists of three key components: 1) an execution-based representative sample selection part that evaluates the fitness of each existing optimized code and prioritizes promising ones to pilot the generation of improved code; 2) an adaptive optimization pattern retrieval part that infuses targeted optimization patterns into the model for guiding LLMs towards rectifying and progressively enhancing their optimization methods; and 3) a genetic operator-inspired chain-of-thought prompting part that aids LLMs in combining different optimization methods and generating improved optimization methods.
摘要:开发人员编写的代码通常存在效率问题,并且存在各种性能缺陷。这些低效问题使得研究用于代码优化的自动重构方法成为必要。早期的代码优化研究使用基于规则的方法,并专注于特定的低效问题,这些方法是劳动密集型的,并且存在覆盖率低的问题。最近的工作将该任务视为一个序列生成问题,并借助大型语言模型(LLM)等深度学习(DL)技术。这些方法通常提示LLM直接生成优化后的代码。虽然这些方法表现出最先进的性能,但这种一步生成范式很难获得最优解。首先,组合优化等复杂优化方法很难被LLM捕获。其次,一步生成范式难以向LLM精确注入有效代码优化所需的知识,导致生成的代码欠优化。为了解决这些问题,我们建议从搜索的角度对该任务进行建模,并提出了一个基于搜索的LLM框架SBLLM,该框架支持迭代改进和发现更优的优化方法。SBLLM将LLM与进化搜索有机地结合在一起,由三个关键部分组成:1)基于执行的代表性样本选择部分,它评估每个现有优化代码的适应度,并对有希望的代码进行优先排序,以引导改进代码的生成;2)自适应优化模式检索部分,将目标优化模式注入到模型中,指导LLM纠正并逐步增强其优化方法;以及3)受遗传算子启发的思维链提示部分,帮助LLM组合不同的优化方法并生成改进的优化方法。

[NLP-27] Implicit Sentiment Analysis Based on Chain of Thought Prompting
[NLP-27] 基于思维链推测的内隐情绪分析

链接: https://arxiv.org/abs/2408.12157
作者: Zhihua Duan,Jialin Wang
关键词-EN: Implicit Sentiment Analysis, crucial research area, natural language processing, Sentiment Analysis, Implicit Sentiment
关键词-ZH: 内隐情感分析,关键研究领域,自然语言处理,情感分析,内隐情感
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Implicit Sentiment Analysis (ISA) is a crucial research area in natural language processing. Inspired by the idea of large language model Chain of Thought (CoT), this paper introduces a Sentiment Analysis of Thinking (SAoT) framework. The framework first analyzes the implicit aspects and opinions in the text using common sense and thinking chain capabilities. Then, it reflects on the process of implicit sentiment analysis and finally deduces the polarity of sentiment. The model is evaluated on the SemEval 2014 dataset, consisting of 1120 restaurant reviews and 638 laptop reviews. The experimental results demonstrate that the utilization of the ERNIE-Bot-4+SAoT model yields a notable performance improvement. Specifically, on the restaurant dataset, the F1 score reaches 75.27, accompanied by an ISA score of 66.29. Similarly, on the computer dataset, the F1 score achieves 76.50, while the ISA score amounts to 73.46. Comparatively, the ERNIE-Bot-4+SAoT model surpasses the BERTAsp + SCAPt baseline by an average margin of 47.99%.
摘要:隐式情感分析(ISA)是自然语言处理中的一个重要研究领域。受大语言模型思维链(CoT)思想的启发,本文提出了一种思维情感分析(SAoT)框架。该框架首先利用常识和思维链能力分析文本中的隐含方面和观点。然后,反思隐式情感分析的过程,最后推断出情感极性。该模型在SemEval 2014数据集上进行了评估,该数据集包括1120条餐厅评论和638条笔记本电脑评论。实验结果表明,ERNIE-Bot-4+SAoT模型的性能得到了显著提升。具体来说,在餐厅数据集上,F1得分达到75.27,同时ISA得分为66.29。同样,在笔记本电脑数据集上,F1得分达到76.50,而ISA得分达到73.46。相比之下,ERNIE-Bot-4+SAoT模型平均超出BERTAsp+SCAPt基线47.99%。
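摘要中"先找隐含方面、再析隐含观点、最后定极性"的三步思路,可以落成一个思维链提示模板。以下模板措辞为笔者假设,仅用于示意,并非论文原文:

```python
def saot_prompt(sentence):
    # 按"方面 -> 观点 -> 极性"三步构造思维链提示,
    # 引导模型先显式推理再给出隐式情感的极性
    return (
        f"句子:{sentence}\n"
        "第一步:找出句子隐含谈论的方面(aspect)。\n"
        "第二步:分析作者对该方面的隐含观点。\n"
        "第三步:据此判断情感极性(积极/消极/中性),只输出极性。"
    )

p = saot_prompt("等了四十分钟才上菜。")
```

这类模板配合LLM使用时,句中并未出现显式情感词,但模型可经由"上菜速度(方面)→ 等待过久(观点)→ 消极(极性)"的链条得到答案。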

[NLP-28] A Tighter Complexity Analysis of SparseGPT
[NLP-28] SparseGPT的更严格复杂性分析

链接: https://arxiv.org/abs/2408.12151
作者: Xiaoyu Li,Yingyu Liang,Zhenmei Shi,Zhao Song
关键词-EN: Alistarh ICML, omega, matrix multiplication, Zhou ICML, exponent of matrix
关键词-ZH: Alistarh ICML,欧米茄,矩阵乘,Zhou ICML,矩阵指数
类目: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:In this work, we improved the analysis of the running time of SparseGPT [Frantar, Alistarh ICML 2023] from O(d^3) to O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1,1,a)-a}) for any a \in [0, 1], where \omega is the exponent of matrix multiplication. In particular, for the current \omega \approx 2.371 [Alman, Duan, Williams, Xu, Xu, Zhou 2024], our running times boil down to O(d^{2.53}). This running time is due to the analysis of the lazy update behavior in iterative maintenance problems, such as [Deng, Song, Weinstein 2022, Brand, Song, Zhou ICML 2024].
摘要:在这项工作中,我们将SparseGPT [Frantar, Alistarh ICML 2023] 运行时间的分析从 O(d^3) 改进为 O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1,1,a)-a})(对任意 a \in [0, 1] 成立),其中 \omega 是矩阵乘法的指数。特别地,对于当前的 \omega \approx 2.371 [Alman, Duan, Williams, Xu, Xu, Zhou 2024],我们的运行时间可归结为 O(d^{2.53})。这一运行时间来自对迭代维护问题中惰性更新行为的分析,例如 [Deng, Song, Weinstein 2022, Brand, Song, Zhou ICML 2024]。

[NLP-29] MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents
[NLP-29] MDD-5k:通过神经符号LLM代理合成的精神障碍新诊断对话数据集

链接: https://arxiv.org/abs/2408.12142
作者: Congchi Yin,Feng Li,Shu Zhang,Zike Wang,Jun Shao,Piji Li,Jianhua Chen,Xun Jiang
关键词-EN: disorders primarily relies, mental disorders, Chinese mental disorders, mental disorders primarily, primarily relies
关键词-ZH: 疾病主要依赖,精神障碍,中国精神障碍,精神障碍主要,主要依赖
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:The clinical diagnosis of most mental disorders primarily relies on the conversations between psychiatrist and patient. The creation of such diagnostic conversation datasets is promising to boost the AI mental healthcare community. However, directly collecting the conversations in real diagnosis scenarios is near impossible due to stringent privacy and ethical considerations. To address this issue, we seek to synthesize diagnostic conversation by exploiting anonymous patient cases that are easier to access. Specifically, we design a neuro-symbolic multi-agent framework for synthesizing the diagnostic conversation of mental disorders with large language models. It takes patient case as input and is capable of generating multiple diverse conversations with one single patient case. The framework basically involves the interaction between a doctor agent and a patient agent, and achieves text generation under symbolic control via a dynamic diagnosis tree from a tool agent. By applying the proposed framework, we develop the largest Chinese mental disorders diagnosis dataset MDD-5k, which is built upon 1000 cleaned real patient cases by cooperating with a pioneering psychiatric hospital, and contains 5000 high-quality long conversations with diagnosis results as labels. To the best of our knowledge, it’s also the first labelled Chinese mental disorders diagnosis dataset. Human evaluation demonstrates the proposed MDD-5k dataset successfully simulates human-like diagnostic process of mental disorders. The dataset and code will become publicly accessible in this https URL.
摘要:大多数精神障碍的临床诊断主要依靠精神科医生与患者之间的对话。构建此类诊断对话数据集有望促进AI精神医疗社区的发展。然而,出于严格的隐私和伦理考虑,在真实诊断场景中直接收集对话几乎是不可能的。为了解决这个问题,我们寻求利用更容易获取的匿名患者病例来合成诊断对话。具体地说,我们设计了一个神经符号多智能体框架,利用大型语言模型合成精神障碍的诊断对话。它以患者病例为输入,能够从单个病例生成多段不同的对话。该框架主要涉及医生智能体和患者智能体之间的交互,并通过工具智能体提供的动态诊断树实现符号控制下的文本生成。通过应用所提出的框架,我们与一家先驱性精神病专科医院合作,基于1000份经清洗的真实患者病例,构建了规模最大的中文精神障碍诊断数据集MDD-5k,其中包含5000段以诊断结果为标签的高质量长对话。据我们所知,这也是第一个带标签的中文精神障碍诊断数据集。人工评估表明,所提出的MDD-5k数据集成功地模拟了类似人类的精神障碍诊断过程。数据集和代码将在此https URL公开。
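下面用一段极简的 Python 草图示意上述"医生智能体-患者智能体交互 + 诊断树符号控制"的对话合成流程(仅为思路演示:诊断树、智能体回复均为假设的占位实现,并非论文官方代码):

```python
# 神经符号多智能体对话合成的极简示意(占位实现,非 MDD-5k 官方代码)
# 诊断树以字典表示:主题 -> 该主题下医生应依次提出的问题

DIAGNOSIS_TREE = {                      # 假设的简化诊断树
    "情绪": ["最近两周心情如何?", "是否持续情绪低落?"],
    "睡眠": ["睡眠质量怎么样?"],
}

def doctor_agent(question):
    """医生智能体:此处直接转述诊断树中的问题(占位)。"""
    return question

def patient_agent(question, case):
    """患者智能体:基于病例(case)生成回答;未覆盖的问题给默认回答(占位)。"""
    return case.get(question, "没有特别的感觉。")

def synthesize_conversation(case):
    """按诊断树遍历,生成符号控制下的一段医患对话。"""
    dialogue = []
    for topic, questions in DIAGNOSIS_TREE.items():
        for q in questions:
            dialogue.append(("医生", doctor_agent(q)))
            dialogue.append(("患者", patient_agent(q, case)))
    return dialogue

case = {"最近两周心情如何?": "一直很低落,提不起兴趣。"}
conv = synthesize_conversation(case)    # 共 3 个问题 -> 6 个话轮
```

同一份病例配合不同的诊断树路径或采样策略,即可得到摘要中所说的"单个病例生成多段不同对话"。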

[NLP-30] RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
[NLP-30] RoVRM:通过辅助文本偏好数据优化的稳健视觉奖励模型

链接: https://arxiv.org/abs/2408.12109
作者: Chenglong Wang,Yang Gan,Yifu Huo,Yongyu Mu,Murun Yang,Qiaozhi He,Tong Xiao,Chunliang Zhang,Tongran Liu,Quan Du,Di Yang,Jingbo Zhu
关键词-EN: generating misleading content, Large vision-language models, proper visual context, visual reward model, Large vision-language
关键词-ZH: 生成误导性内容、大型视觉语言模型、适当的视觉上下文、视觉奖励模型、大型视觉语言
类目: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face the difficulty arising from the scarcity of visual preference data, which is required to train a visual reward model (VRM). In this work, we continue the line of research. We present a Robust Visual Reward Model (RoVRM) which improves human-preference alignment for LVLMs. RoVRM leverages auxiliary textual preference data through a three-phase progressive training and optimal transport-based preference data selection to effectively mitigate the scarcity of visual preference data. We experiment with RoVRM on the commonly used vision-language tasks based on the LLaVA-1.5-7B and -13B models. Experimental results demonstrate that RoVRM consistently outperforms traditional VRMs. Furthermore, our three-phase progressive training and preference data selection approaches can yield consistent performance gains over ranking-based alignment techniques, such as direct preference optimization.
摘要:大型视觉语言模型(LVLM)往往不符合人类的偏好,导致在缺乏适当视觉上下文的情况下生成误导性内容(也称为幻觉)等问题。解决这一问题的一个有希望的方案是使用人类偏好对齐技术,例如best-of-n采样和强化学习。然而,这些技术面临视觉偏好数据稀缺带来的困难,而训练视觉奖励模型(VRM)需要这类数据。在这项工作中,我们延续了这一研究路线,提出了一种稳健的视觉奖励模型(RoVRM),以改进LVLM的人类偏好对齐。RoVRM通过三阶段渐进训练和基于最优传输(optimal transport)的偏好数据选择来利用辅助文本偏好数据,有效缓解了视觉偏好数据的稀缺性。我们在基于LLaVA-1.5-7B和-13B模型的常用视觉语言任务上对RoVRM进行了实验。实验结果表明,RoVRM的性能始终优于传统的VRM。此外,我们的三阶段渐进式训练和偏好数据选择方法相比基于排序的对齐技术(如直接偏好优化)能够带来一致的性能提升。

[NLP-31] Extraction of Research Objectives Machine Learning Model Names and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis
[NLP-31] 从学术论文中提取研究目标、机器学习模型名称和数据集名称,并使用LLM与网络分析研究其相互关系

链接: https://arxiv.org/abs/2408.12097
作者: S. Nishio,H. Nonaka,N. Tsuchiya,A. Migita,Y. Banno,T. Hayashi,H. Sakaji,T. Sakumoto,K. Watabe
关键词-EN: Machine learning, machine learning models, learning, Machine, learning models
关键词-ZH: 机器学习,机器学习模型,学习,机器,学习模型
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注: 10 pages, 8 figures

点击查看摘要

Abstract:Machine learning is widely utilized across various industries. Identifying the appropriate machine learning models and datasets for specific tasks is crucial for the effective industrial application of machine learning. However, this requires expertise in both machine learning and the relevant domain, leading to a high learning cost. Therefore, research focused on extracting combinations of tasks, machine learning models, and datasets from academic papers is critically important, as it can facilitate the automatic recommendation of suitable methods. Conventional information extraction methods from academic papers have been limited to identifying machine learning models and other entities as named entities. To address this issue, this study proposes a methodology extracting tasks, machine learning methods, and dataset names from scientific papers and analyzing the relationships between these information by using LLM, embedding model, and network clustering. The proposed method’s expression extraction performance, when using Llama3, achieves an F-score exceeding 0.8 across various categories, confirming its practical utility. Benchmarking results on financial domain papers have demonstrated the effectiveness of this method, providing insights into the use of the latest datasets, including those related to ESG (Environmental, Social, and Governance) data.
摘要:机器学习在各个行业中得到了广泛的应用。为特定任务确定合适的机器学习模型和数据集对于机器学习的有效工业应用至关重要。然而,这需要机器学习和相关领域的专业知识,导致学习成本很高。因此,从学术论文中提取任务、机器学习模型和数据集组合的研究至关重要,因为它可以促进合适方法的自动推荐。传统的学术论文信息提取方法仅限于将机器学习模型等实体识别为命名实体。为了解决这一问题,本研究提出了一种从科学论文中提取任务、机器学习方法和数据集名称的方法,并使用LLM、嵌入模型和网络聚类来分析这些信息之间的关系。在使用Llama3时,所提方法的表达式提取性能在各个类别上的F值均超过0.8,证明了该方法的实用性。在金融领域论文上的基准测试结果证明了这种方法的有效性,为使用最新数据集(包括与ESG(环境、社会和治理)数据相关的数据集)提供了见解。
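摘要中"抽取(任务、模型、数据集)组合并分析其关系网络"的思路,可用一个共现网络的极简示意来说明(其中的三元组为虚构的占位数据;论文实际使用 LLM 与嵌入模型完成抽取,此处仅演示关系网络的构建):

```python
from collections import Counter
from itertools import combinations

# 假设已从若干论文中抽取出 (任务, 模型, 数据集) 三元组(虚构示例)
extracted = [
    ("sentiment analysis", "BERT", "SST-2"),
    ("sentiment analysis", "RoBERTa", "SST-2"),
    ("question answering", "BERT", "SQuAD"),
]

def cooccurrence_edges(triples):
    """统计实体两两共现次数,作为关系网络的加权边(边按排序后的实体对计数)。"""
    edges = Counter()
    for triple in triples:
        for a, b in combinations(sorted(triple), 2):
            edges[(a, b)] += 1
    return edges

edges = cooccurrence_edges(extracted)
# 权重高的边(如某数据集与某任务频繁共现)即网络分析/聚类的输入
```

在真实流水线中,这张加权图会进一步交给社区发现或网络聚类算法,以识别"任务-模型-数据集"的典型组合。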

[NLP-32] uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization
[NLP-32] uMedSum:推进医学抽象总结的统一框架

链接: https://arxiv.org/abs/2408.12095
作者: Aishik Nagar,Yutong Liu,Andy T. Liu,Viktor Schlegel,Vijay Prakash Dwivedi,Arun-Kumar Kaliya-Perumal,Guna Pratheep Kalanchiam,Yili Tang,Robby T. Tan
关键词-EN: faces the challenge, challenge of balancing, abstractive summarization faces, medical summarization, summarization
关键词-ZH: 面临挑战,平衡的挑战,抽象总结面临,医学总结,总结
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 12 pages

点击查看摘要

Abstract:Medical abstractive summarization faces the challenge of balancing faithfulness and informativeness. Current methods often sacrifice key information for faithfulness or introduce confabulations when prioritizing informativeness. While recent advancements in techniques like in-context learning (ICL) and fine-tuning have improved medical summarization, they often overlook crucial aspects such as faithfulness and informativeness without considering advanced methods like model reasoning and self-improvement. Moreover, the field lacks a unified benchmark, hindering systematic evaluation due to varied metrics and datasets. This paper addresses these gaps by presenting a comprehensive benchmark of six advanced abstractive summarization methods across three diverse datasets using five standardized metrics. Building on these findings, we propose uMedSum, a modular hybrid summarization framework that introduces novel approaches for sequential confabulation removal followed by key missing information addition, ensuring both faithfulness and informativeness. Our work improves upon previous GPT-4-based state-of-the-art (SOTA) medical summarization methods, significantly outperforming them in both quantitative metrics and qualitative domain expert evaluations. Notably, we achieve an average relative performance improvement of 11.8% in reference-free metrics over the previous SOTA. Doctors prefer uMedSum’s summaries 6 times more than previous SOTA in difficult cases where there are chances of confabulations or missing information. These results highlight uMedSum’s effectiveness and generalizability across various datasets and metrics, marking a significant advancement in medical summarization.
摘要:医学摘要生成面临着如何平衡忠实性和信息性的挑战。目前的方法往往为了忠实性而牺牲关键信息,或者在优先考虑信息性时引入虚构内容。虽然上下文学习(ICL)和微调等技术的最新进展改善了医学摘要,但它们往往忽略了忠实性和信息性等关键方面,也没有考虑模型推理和自我改进等高级方法。此外,该领域缺乏统一的基准,由于指标和数据集各不相同,阻碍了系统评估。本文通过使用五个标准化度量,在三个不同数据集上对六种高级摘要方法进行综合基准测试来填补这些空白。基于这些发现,我们提出了uMedSum,这是一个模块化的混合摘要框架,它引入了先顺序去除虚构内容、再补充关键缺失信息的新方法,兼顾忠实性和信息性。我们的工作改进了以前基于GPT-4的最先进(SOTA)医学摘要方法,在定量度量和领域专家定性评估方面都显著优于它们。值得注意的是,与以前的SOTA相比,我们在无参考指标上实现了11.8%的平均相对性能提升。在可能出现虚构或信息缺失的疑难病例中,医生偏好uMedSum摘要的频率是之前SOTA的6倍。这些结果突出了uMedSum在各种数据集和指标上的有效性和泛化能力,标志着医学摘要领域的重大进步。
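摘要中"先顺序去除虚构内容、再补充关键缺失信息"的两段式后处理,可用如下简化草图示意(用实词是否出现在源文中来近似判断"有无依据",这只是演示用的假设性简化,并非 uMedSum 的实际实现):

```python
def remove_confabulations(summary_sentences, source):
    """第一步:剔除源文中找不到依据的摘要句(以实词子串匹配近似判断,演示用)。"""
    def supported(sentence):
        words = sentence.lower().replace(".", "").split()
        return all(w in source for w in words if len(w) > 3)
    return [s for s in summary_sentences if supported(s)]

def add_missing_facts(summary_sentences, key_facts, source):
    """第二步:把源文包含、但摘要遗漏的关键信息补回摘要。"""
    present = " ".join(summary_sentences).lower()
    extra = [f for f in key_facts if f.lower() in source and f.lower() not in present]
    return summary_sentences + extra

source = "patient reports chest pain and shortness of breath for two days"
draft = ["Patient has chest pain.", "Patient has a broken leg."]   # 第二句是虚构内容
step1 = remove_confabulations(draft, source)
final = add_missing_facts(step1, ["shortness of breath"], source)
```

实际框架中这两步分别由模型推理完成,此处仅用字符串匹配展示"先保真、后补全"的流水线顺序。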

[NLP-33] High-Quality Data Augmentation for Low-Resource NMT: Combining a Translation Memory a GAN Generator and Filtering
[NLP-33] 低资源NMT的高质量数据增强:将翻译存储器与GAN生成器结合起来并过滤

链接: https://arxiv.org/abs/2408.12079
作者: Hengjie Liu,Ruibo Hou,Yves Lepage
关键词-EN: language translation tasks, Back translation, low-resource language translation, Neural Machine Translation, extending a dataset
关键词-ZH: 语言翻译任务、反向翻译、低资源语言翻译、神经机器翻译、扩展数据集
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Back translation, as a technique for extending a dataset, is widely used by researchers in low-resource language translation tasks. It typically translates from the target to the source language to ensure high-quality translation results. This paper proposes a novel way of utilizing a monolingual corpus on the source side to assist Neural Machine Translation (NMT) in low-resource settings. We realize this concept by employing a Generative Adversarial Network (GAN), which augments the training data for the discriminator while mitigating the interference of low-quality synthetic monolingual translations with the generator. Additionally, this paper integrates Translation Memory (TM) with NMT, increasing the amount of data available to the generator. Moreover, we propose a novel procedure to filter the synthetic sentence pairs during the augmentation process, ensuring the high quality of the data.
摘要:反向翻译作为一种扩展数据集的技术,被研究人员广泛用于低资源语言翻译任务。它通常从目标语言翻译到源语言,以确保高质量的翻译结果。本文提出了一种新颖的方法,在低资源环境中利用源语言侧的单语语料库来辅助神经机器翻译(NMT)。我们通过使用生成对抗网络(GAN)来实现这一概念,该网络在增强判别器训练数据的同时,减轻了低质量合成单语翻译对生成器的干扰。此外,本文还将翻译记忆(TM)与NMT集成,增加了生成器可用的数据量。我们还提出了一种新颖的流程,在数据增强过程中过滤合成句子对,确保数据的高质量。
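摘要最后提到的"在增强过程中过滤合成句子对",其最基础的形式之一是按长度比等启发式规则筛除异常样本;下面给出一个示意(阈值与句对均为假设,并非论文实际采用的过滤标准):

```python
def filter_synthetic_pairs(pairs, min_ratio=0.5, max_ratio=2.0):
    """过滤合成平行句对:剔除目标句为空或源/目标长度比例异常的样本(启发式示意)。"""
    kept = []
    for src, tgt in pairs:
        src_len, tgt_len = len(src.split()), len(tgt.split())
        if tgt_len == 0:                     # 空目标句直接剔除
            continue
        ratio = src_len / tgt_len
        if min_ratio <= ratio <= max_ratio:  # 长度比落在合理区间才保留
            kept.append((src, tgt))
    return kept

pairs = [
    ("the cat sat on the mat", "le chat est assis sur le tapis"),  # 正常句对
    ("hello world", ""),                                           # 空目标句
    ("a", "une phrase beaucoup trop longue pour une si petite source"),  # 比例异常
]
good = filter_synthetic_pairs(pairs)
```

实际工作中还会结合翻译模型打分等更强的过滤信号,长度比只是最便宜的第一道筛子。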

[NLP-34] ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM
[NLP-34] ConflictBank:评估LLM中知识冲突影响的基准

链接: https://arxiv.org/abs/2408.12076
作者: Zhaochen Su,Jun Zhang,Xiaoye Qu,Tong Zhu,Yanshu Li,Jiashuo Sun,Juntao Li,Min Zhang,Yu Cheng
关键词-EN: Large language models, achieved impressive advancements, Large language, source of hallucinations, rarely been studied
关键词-ZH: 大型语言模型,取得了令人印象深刻的进步,大型语言,幻觉的来源,很少被研究
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: Under Review

点击查看摘要

Abstract:Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. Only a few research explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge. However, a thorough assessment of knowledge conflict in LLMs is still missing. Motivated by this research gap, we present ConflictBank, the first comprehensive benchmark developed to systematically evaluate knowledge conflicts from three aspects: (i) conflicts encountered in retrieved knowledge, (ii) conflicts within the models’ encoded knowledge, and (iii) the interplay between these conflict forms. Our investigation delves into four model families and twelve LLM instances, meticulously analyzing conflicts stemming from misinformation, temporal discrepancies, and semantic divergences. Based on our proposed novel construction framework, we create 7,453,853 claim-evidence pairs and 553,117 QA pairs. We present numerous findings on model scale, conflict causes, and conflict types. We hope our ConflictBank benchmark will help the community better understand model behavior in conflicts and develop more reliable LLMs.
摘要:大型语言模型(LLM)在众多学科中取得了令人印象深刻的进展,但作为幻觉主要来源之一的知识冲突这一关键问题却鲜有人研究。只有少数研究探讨了LLM的固有知识与检索到的上下文知识之间的冲突。然而,对LLM中知识冲突的全面评估仍然缺乏。受这一研究空白的启发,我们提出了ConflictBank,这是第一个系统评估知识冲突的综合基准,它从三个方面评估知识冲突:(i)检索知识中遇到的冲突,(ii)模型编码知识内部的冲突,以及(iii)这些冲突形式之间的相互作用。我们的调查深入研究了四个模型家族和12个LLM实例,细致地分析了由错误信息、时间差异和语义分歧引起的冲突。基于我们提出的新构建框架,我们创建了7,453,853个声明-证据对和553,117个问答对。我们给出了大量关于模型规模、冲突原因和冲突类型的研究结果。我们希望ConflictBank基准能帮助社区更好地理解模型在冲突中的行为,并开发更可靠的LLM。

[NLP-35] Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs
[NLP-35] 使用RAG与LLM少样本上下文学习进行证据支持的事实核查

链接: https://arxiv.org/abs/2408.12060
作者: Ronit Singhal,Pransh Patwa,Parth Patwa,Aman Chadha,Amitava Das
关键词-EN: implementing fact-checking mechanisms, social media, widespread dissemination, dissemination of misinformation, misinformation on social
关键词-ZH: 实施事实核查机制、社交媒体、广泛传播、传播错误信息、社会上的错误信息
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Given the widespread dissemination of misinformation on social media, implementing fact-checking mechanisms for online claims is essential. Manually verifying every claim is highly challenging, underscoring the need for an automated fact-checking system. This paper presents our system designed to address this issue. We utilize the Averitec dataset to assess the veracity of claims. In addition to veracity prediction, our system provides supporting evidence, which is extracted from the dataset. We develop a Retrieve and Generate (RAG) pipeline to extract relevant evidence sentences from a knowledge base, which are then inputted along with the claim into a large language model (LLM) for classification. We also evaluate the few-shot In-Context Learning (ICL) capabilities of multiple LLMs. Our system achieves an ‘Averitec’ score of 0.33, which is a 22% absolute improvement over the baseline. All code will be made available on this https URL.
摘要:鉴于社交媒体上错误信息的广泛传播,对在线声明实施事实核查机制至关重要。手动验证每一项声明都极具挑战性,这凸显了对自动事实核查系统的需求。本文介绍了我们为解决这一问题设计的系统。我们利用Averitec数据集来评估声明的真实性。除了真实性预测外,我们的系统还提供从数据集中提取的支持证据。我们开发了一个检索并生成(RAG)管道,从知识库中提取相关证据句子,然后将其与声明一起输入到大型语言模型(LLM)中进行分类。我们还评估了多个LLM的少样本上下文学习(ICL)能力。我们的系统的"Averitec"评分为0.33,比基线绝对提高了22%。所有代码都将在此https URL上提供。
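摘要中"检索证据句 + LLM 分类"流水线的检索部分,可用词袋余弦相似度做一个极简示意(知识库与声明均为虚构示例,实际系统中的 LLM 分类步骤在此省略):

```python
import math
from collections import Counter

def cosine(a, b):
    """两段文本词袋向量的余弦相似度。"""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(claim, knowledge_base, k=2):
    """从知识库中取出与声明最相似的 k 条证据句(即 RAG 的检索步)。"""
    ranked = sorted(knowledge_base, key=lambda s: cosine(claim, s), reverse=True)
    return ranked[:k]

kb = [
    "The Eiffel Tower is located in Paris.",
    "Mount Everest is the highest mountain on Earth.",
    "Paris is the capital of France.",
]
evidence = retrieve("The Eiffel Tower is in Paris", kb)
# evidence 随后会与声明一起拼入提示词,交给 LLM 做真实性分类
```

实际系统会用稠密向量检索替代词袋相似度,但"先检索证据、再连同声明送入分类器"的结构是一致的。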

[NLP-36] Aligning (Medical) LLMs for (Counterfactual) Fairness
[NLP-36] 调整(医疗)LLM以实现(反事实)公平

链接: https://arxiv.org/abs/2408.12055
作者: Raphael Poulain,Hamed Fayyaz,Rahmatollah Beheshti
关键词-EN: Large Language Models, Large Language, clinical decision support, decision support applications, Language Models
关键词-ZH: 大型语言模型、大型语言、临床决策支持、决策支持应用程序、语言模型
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
备注: arXiv admin note: substantial text overlap with arXiv:2404.15149

点击查看摘要

Abstract:Large Language Models (LLMs) have emerged as promising solutions for a variety of medical and clinical decision support applications. However, LLMs are often subject to different types of biases, which can lead to unfair treatment of individuals, worsening health disparities, and reducing trust in AI-augmented medical tools. Aiming to address this important issue, in this study, we present a new model alignment approach for aligning LLMs using a preference optimization method within a knowledge distillation framework. Prior to presenting our proposed method, we first use an evaluation framework to conduct a comprehensive (largest to our knowledge) empirical evaluation to reveal the type and nature of existing biases in LLMs used for medical applications. We then offer a bias mitigation technique to reduce the unfair patterns in LLM outputs across different subgroups identified by the protected attributes. We show that our mitigation method is effective in significantly reducing observed biased patterns. Our code is publicly available at \urlthis https URL.
摘要:大型语言模型(LLM)已经成为各种医疗和临床决策支持应用的有前景的解决方案。然而,LLM经常受到不同类型偏见的影响,这可能导致对个体的不公平对待、加剧健康差距,并降低对AI增强医疗工具的信任。为了解决这一重要问题,在本研究中,我们提出了一种新的模型对齐方法,在知识蒸馏框架内使用偏好优化方法来对齐LLM。在介绍我们提出的方法之前,我们首先使用一个评估框架进行全面的(据我们所知规模最大的)实证评估,以揭示用于医疗应用的LLM中存在的偏见的类型和性质。然后,我们提出了一种偏见缓解技术,以减少LLM输出中针对由受保护属性划分的不同子群体的不公平模式。我们表明,我们的缓解方法能够显著减少观察到的偏见模式。我们的代码在此https URL上公开提供。
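摘要提到的"偏好优化"可以用广为人知的 DPO(直接偏好优化)损失做一个数值示意(仅演示该类损失的计算方式,未必与论文采用的具体对齐方法一致;数值均为假设):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO 损失:-log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)))。
    logp_w / logp_l:策略模型对"被偏好/被拒绝"回答的对数概率;
    ref_logp_*:参考模型的对应对数概率;beta 控制偏离参考模型的强度。"""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# 被偏好回答相对参考模型提升越多、被拒绝回答下降越多,损失越小
loss_good = dpo_loss(-1.0, -3.0, -2.0, -2.0)   # 策略朝偏好方向移动
loss_bad = dpo_loss(-3.0, -1.0, -2.0, -2.0)    # 策略朝相反方向移动
```

训练时在偏好数据集上最小化该损失,即可把人类(或教师模型)的偏好信号蒸馏进被对齐的模型。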

[NLP-37] Reasoning and Tools for Human-Level Forecasting
[NLP-37] 人类层面预测的推理和工具

链接: https://arxiv.org/abs/2408.12036
作者: Elvis Hsieh,Preston Fu,Jonathan Chen
关键词-EN: largely successful due, memorize large amounts, training data, Language models, trained on web-scale
关键词-ZH: 由于记忆大量、训练数据、语言模型、在网络规模上训练,基本上成功
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
备注:

点击查看摘要

Abstract:Language models (LMs) trained on web-scale datasets are largely successful due to their ability to memorize large amounts of training data, even if only present in a few examples. These capabilities are often desirable in evaluation on tasks such as question answering but raise questions about whether these models can exhibit genuine reasoning or succeed only at mimicking patterns from the training data. This distinction is particularly salient in forecasting tasks, where the answer is not present in the training data, and the model must reason to make logical deductions. We present Reasoning and Tools for Forecasting (RTF), a framework of reasoning-and-acting (ReAct) agents that can dynamically retrieve updated information and run numerical simulation with equipped tools. We evaluate our model with questions from competitive forecasting platforms and demonstrate that our method is competitive with and can outperform human predictions. This suggests that LMs, with the right tools, can indeed think and adapt like humans, offering valuable insights for real-world decision-making.
摘要:在网络规模的数据集上训练的语言模型(LMS)由于具有记忆大量训练数据的能力而在很大程度上是成功的,即使只出现在几个例子中。这些能力在评估问题回答等任务时往往是可取的,但也引发了这样的问题:这些模型是否能够展示真正的推理,或者只成功地模仿训练数据中的模式。这一区别在预测任务中尤其明显,在预测任务中,答案不存在于训练数据中,模型必须推理以做出逻辑推断。我们提出了推理和预测工具(RTF),这是一个推理和行动(REACT)代理的框架,可以动态地检索更新的信息并使用配备的工具进行数值模拟。我们使用竞争性预测平台上的问题对我们的模型进行了评估,并证明了我们的方法与人类的预测相竞争,并且可以超越人类的预测。这表明,有了正确的工具,LMS确实可以像人类一样思考和适应,为现实世界的决策提供有价值的见解。
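摘要中"推理-行动(ReAct)智能体 + 工具调用"的循环骨架可示意如下(LLM 推理步骤与工具均为占位实现,工具名与返回值均为假设):

```python
def llm_reason(question, observations):
    """占位的推理步骤:已有观测时给出最终预测,否则请求调用工具。"""
    if observations:
        # 简单地把各信息源给出的概率估计取平均,作为最终预测
        return {"action": "finish", "answer": sum(observations) / len(observations)}
    return {"action": "search_news", "input": question}

TOOLS = {"search_news": lambda q: [0.6, 0.7]}   # 假设工具返回两条概率估计

def react_forecast(question, max_steps=3):
    """ReAct 循环:推理 -> 行动(调用工具)-> 观测,直至给出预测。"""
    observations = []
    for _ in range(max_steps):
        step = llm_reason(question, observations)
        if step["action"] == "finish":
            return step["answer"]
        observations.extend(TOOLS[step["action"]](step["input"]))
    return None

prob = react_forecast("Will event X happen this year?")
```

真实系统中 `llm_reason` 由 LLM 提示实现,工具可以是新闻检索或数值模拟,但"检索更新信息再推理"的控制流与此一致。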

[NLP-38] Let Community Rules Be Reflected in Online Content Moderation
[NLP-38] 让社区规则体现在在线内容审核中

链接: https://arxiv.org/abs/2408.12035
作者: Wangjiaxuan Xin,Kanlun Wang,Zhe Fu,Lina Zhou
关键词-EN: social media platforms, Content moderation, media platforms, Content, widely used strategy
关键词-ZH: 社交媒体平台、内容审核、媒体平台、内容、广泛使用的策略
类目: ocial and Information Networks (cs.SI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
备注: 10 pages, 3 figures

点击查看摘要

Abstract:Content moderation is a widely used strategy to prevent the dissemination of irregular information on social media platforms. Despite extensive research on developing automated models to support decision-making in content moderation, there remains a notable scarcity of studies that integrate the rules of online communities into content moderation. This study addresses this gap by proposing a community rule-based content moderation framework that directly integrates community rules into the moderation of user-generated content. Our experiment results with datasets collected from two domains demonstrate the superior performance of models based on the framework to baseline models across all evaluation metrics. In particular, incorporating community rules substantially enhances model performance in content moderation. The findings of this research have significant research and practical implications for improving the effectiveness and generalizability of content moderation models in online communities.
摘要:内容审核是一种广泛使用的策略,用于防止非常规信息在社交媒体平台上的传播。尽管在开发自动化模型以支持内容审核决策方面进行了广泛的研究,但将在线社区的规则整合到内容审核中的研究仍然明显不足。这项研究通过提出一个基于社区规则的内容审核框架来解决这一差距,该框架直接将社区规则整合到用户生成的内容的审核中。我们在两个领域收集的数据集上的实验结果表明,基于该框架的模型在所有评估指标上都具有优于基线模型的性能。特别是,纳入社区规则大大提高了模型在内容审核方面的性能。本研究的发现对于提高在线社区内容审核模型的有效性和推广能力具有重要的研究和实践意义。
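"将社区规则直接融入内容审核"最直观的形式,是把规则作为显式判据参与判定;下面是一个基于关键词规则的极简示意(规则与帖子均为虚构,论文的框架是将规则整合进模型,此处仅展示思路):

```python
COMMUNITY_RULES = {                       # 假设的社区规则:规则名 -> 触发关键词
    "no_spam": ["buy now", "click here"],
    "no_personal_info": ["phone number", "home address"],
}

def moderate(post):
    """返回帖子违反的社区规则列表;列表为空表示通过审核。"""
    text = post.lower()
    violated = [rule for rule, keywords in COMMUNITY_RULES.items()
                if any(kw in text for kw in keywords)]
    return violated

flags = moderate("Click here to buy now!!!")
clean = moderate("I enjoyed this discussion, thanks all.")
```

基于学习的框架会把规则文本作为模型输入的一部分,而不是硬编码的关键词,但"判定结果可归因到具体规则"这一点是共同的。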

[NLP-39] Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition – And Ways to Overcome Them
[NLP-39] 利用自然语言监督进行基于传感器的人类活动识别的局限性–以及克服这些局限性的方法

链接: https://arxiv.org/abs/2408.12023
作者: Harish Haresamudram,Apoorva Beedu,Mashfiqui Rabbi,Sankalita Saha,Irfan Essa,Thomas Ploetz
关键词-EN: Cross-modal contrastive pre-training, Cross-modal contrastive, demonstrated astonishing performance, vision and audio, natural language supervision
关键词-ZH: 跨模式对比预训练,跨模式对比,展示了令人惊叹的表现、视觉和音频、自然语言监督
类目: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
备注:

点击查看摘要

Abstract:Cross-modal contrastive pre-training between natural language and other modalities, e.g., vision and audio, has demonstrated astonishing performance and effectiveness across a diverse variety of tasks and domains. In this paper, we investigate whether such natural language supervision can be used for wearable sensor based Human Activity Recognition (HAR), and discover that-surprisingly-it performs substantially worse than standard end-to-end training and self-supervision. We identify the primary causes for this as: sensor heterogeneity and the lack of rich, diverse text descriptions of activities. To mitigate their impact, we also develop strategies and assess their effectiveness through an extensive experimental evaluation. These strategies lead to significant increases in activity recognition, bringing performance closer to supervised and self-supervised training, while also enabling the recognition of unseen activities and cross modal retrieval of videos. Overall, our work paves the way for better sensor-language learning, ultimately leading to the development of foundational models for HAR using wearables.
摘要:自然语言与其他形式(如视觉和听觉)之间的跨通道对比预训练在各种任务和领域中表现出惊人的性能和有效性。在本文中,我们调查了这种自然语言监督是否可以用于基于可穿戴传感器的人类活动识别(HAR),并发现令人惊讶的是,它的表现远远不如标准的端到端训练和自我监督。我们认为造成这种情况的主要原因是:传感器的异构性和缺乏对活动的丰富、多样化的文本描述。为了减轻它们的影响,我们还制定了战略,并通过广泛的实验评估来评估其有效性。这些策略显著增加了活动识别,使性能更接近于监督和自我监督的培训,同时还能够识别看不见的活动和视频的跨模式检索。总体而言,我们的工作为更好地学习传感器语言铺平了道路,最终导致了使用可穿戴设备的HAR基础模型的开发。

[NLP-40] Understanding Epistemic Language with a Bayesian Theory of Mind
[NLP-40] 用Bayesian心理理论理解认识语言

链接: https://arxiv.org/abs/2408.12022
作者: Lance Ying,Tan Zhi-Xuan,Lionel Wong,Vikash Mansinghka,Joshua B. Tenenbaum
关键词-EN: directly observed, people understand, understand and evaluate, Abstract, Bayesian
关键词-ZH: 直接观察,人们理解、理解和评价,抽象,Bayesian
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: 21 pages

点击查看摘要

Abstract:How do people understand and evaluate claims about others’ beliefs, even though these beliefs cannot be directly observed? In this paper, we introduce a cognitive model of epistemic language interpretation, grounded in Bayesian inferences about other agents’ goals, beliefs, and intentions: a language-augmented Bayesian theory-of-mind (LaBToM). By translating natural language into an epistemic ``language-of-thought’', then evaluating these translations against the inferences produced by inverting a probabilistic generative model of rational action and perception, LaBToM captures graded plausibility judgments about epistemic claims. We validate our model in an experiment where participants watch an agent navigate a maze to find keys hidden in boxes needed to reach their goal, then rate sentences about the agent’s beliefs. In contrast with multimodal LLMs (GPT-4o, Gemini Pro) and ablated models, our model correlates highly with human judgments for a wide range of expressions, including modal language, uncertainty expressions, knowledge claims, likelihood comparisons, and attributions of false belief.
摘要:即使他人的信念无法被直接观察,人们又是如何理解和评价关于这些信念的主张的?在本文中,我们介绍了一个认知性语言解释的认知模型,它基于对其他主体的目标、信念和意图的贝叶斯推理:语言增强的贝叶斯心智理论(LaBToM)。通过将自然语言翻译成认知性的"思维语言",再将这些翻译与通过反演理性行动与感知的概率生成模型得到的推理相对照进行评估,LaBToM能够对认知性主张作出分级的合理性判断。我们在一个实验中验证了我们的模型:参与者观看一个智能体在迷宫中寻找藏在盒子里、达成目标所需的钥匙,然后对描述该智能体信念的句子进行评分。与多模态LLM(GPT-4o、Gemini Pro)和消融模型相比,我们的模型在包括情态语言、不确定性表达、知识主张、可能性比较和错误信念归因在内的广泛表达上与人类判断高度相关。

[NLP-41] RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization
[NLP-41] RAG优化的西藏旅游LLM:提高准确性和个性化

链接: https://arxiv.org/abs/2408.12003
作者: Jinhu Qi,Shuai Yan,Yibo Zhang,Wentao Zhang,Rong Jin,Yuwei Hu,Ke Wang
关键词-EN: modern social economy, meet people spiritual, bringing development opportunities, social economy, modern social
关键词-ZH: 现代社会经济,满足人们精神,带来发展机遇,社会经济,现代社会
类目: Computation and Language (cs.CL)
备注: Accepted by AIPR 2024

点击查看摘要

Abstract:With the development of the modern social economy, tourism has become an important way to meet people’s spiritual needs, bringing development opportunities to the tourism industry. However, existing large language models (LLMs) face challenges in personalized recommendation capabilities and the generation of content that can sometimes produce hallucinations. This study proposes an optimization scheme for Tibet tourism LLMs based on retrieval-augmented generation (RAG) technology. By constructing a database of tourist viewpoints and processing the data using vectorization techniques, we have significantly improved retrieval accuracy. The application of RAG technology effectively addresses the hallucination problem in content generation. The optimized model shows significant improvements in fluency, accuracy, and relevance of content generation. This research demonstrates the potential of RAG technology in the standardization of cultural tourism information and data analysis, providing theoretical and technical support for the development of intelligent cultural tourism service systems.
摘要:随着现代社会经济的发展,旅游已成为满足人们精神需求的重要途径,给旅游业带来了发展机遇。然而,现有的大型语言模型(LLM)在个性化推荐能力和内容生成方面面临挑战,这些内容有时会产生幻觉。提出了一种基于检索-增强生成(RAG)技术的西藏旅游LLMS优化方案。通过构建旅游视点数据库,并利用矢量化技术对数据进行处理,显著提高了检索准确率。RAG技术的应用有效地解决了内容生成中的幻觉问题。优化后的模型在内容生成的流畅性、准确性和相关性方面都有显著提高。本研究展示了RAG技术在文化旅游信息标准化和数据分析方面的潜力,为智能文化旅游服务系统的开发提供了理论和技术支持。

[NLP-42] Large Language Models for Page Stream Segmentation
[NLP-42] 用于页面流分割的大型语言模型

链接: https://arxiv.org/abs/2408.11981
作者: Hunter Heidenreich,Ratish Dalvi,Rohith Mukku,Nikhil Verma,Neven Pičuljan
关键词-EN: Page Stream Segmentation, Page Stream, Stream Segmentation, Optical Character Recognition, essential prerequisite
关键词-ZH: 页面流分割,页面流,流分割,光学字符识别,必备先决条件
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Page Stream Segmentation (PSS) is an essential prerequisite for automated document processing at scale. However, research progress has been limited by the absence of realistic public benchmarks. This paper works towards addressing this gap by introducing TABME++, an enhanced benchmark featuring commercial Optical Character Recognition (OCR) annotations. We evaluate the performance of large language models (LLMs) on PSS, focusing on decoder-based models fine-tuned with parameter-efficient methods. Our results show that decoder-based LLMs outperform smaller multimodal encoders. Through a review of existing PSS research and datasets, we identify key challenges and advancements in the field. Our findings highlight the key importance of robust OCR, providing valuable insights for the development of more effective document processing systems.
摘要:页面流分割(PSS)是大规模自动化文档处理的必要前提。然而,由于缺乏现实的公共基准,研究进展受到限制。本文通过引入TABME++来解决这一差距,TABME++是一种带有商业光学字符识别(OCR)注释的增强型基准。我们评估了大型语言模型(LLM)在PSS上的性能,重点关注使用参数高效方法微调的基于解码器的模型。我们的结果表明,基于解码器的LLM优于较小的多模态编码器。通过对现有PSS研究和数据集的梳理,我们确定了该领域的关键挑战和进展。我们的研究结果强调了稳健OCR的关键重要性,为开发更有效的文档处理系统提供了宝贵的见解。
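页面流分割的核心是判断相邻两页之间是否存在文档边界。下面用相邻页词汇的 Jaccard 相似度阈值做一个极简示意(阈值与页面文本均为假设;实际系统依赖 OCR 特征与 LLM,而非这种简单规则):

```python
def jaccard(a, b):
    """两页文本词集合的 Jaccard 相似度。"""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def segment_pages(pages, threshold=0.2):
    """相邻页相似度低于阈值即视为开启新文档;返回每页所属的文档编号。"""
    doc_ids = [0]
    for prev, cur in zip(pages, pages[1:]):
        new_doc = jaccard(prev, cur) < threshold
        doc_ids.append(doc_ids[-1] + (1 if new_doc else 0))
    return doc_ids

pages = [
    "invoice number 1001 total amount due",     # 发票第 1 页
    "invoice number 1001 payment terms amount", # 发票第 2 页(与上页词汇重叠高)
    "resume education experience skills",       # 新文档:简历
]
ids = segment_pages(pages)
```

基于 LLM 的方法本质上也是对每个页面边界做"延续/新文档"的二分类,只是用学习到的表示替代了这里的词汇重叠启发式。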

[NLP-43] Characterizing Online Toxicity During the 2022 Mpox Outbreak: A Computational Analysis of Topical and Network Dynamics
[NLP-43] 2022年Mpox疫情期间的在线毒性特征:话题和网络动态的计算分析

链接: https://arxiv.org/abs/2408.11962
作者: Lizhou Fan,Lingyao Li,Libby Hemphill
关键词-EN: Mpox outbreak, hate speech, pressing social concern, encompassing behaviors, digital age
关键词-ZH: Mpox爆发、仇恨言论、紧迫的社会担忧、所涵盖的行为、数字时代
类目: ocial and Information Networks (cs.SI); Computation and Language (cs.CL)
备注: 36 pages, 8 figure, and 12 tables

点击查看摘要

Abstract:Background: Online toxicity, encompassing behaviors such as harassment, bullying, hate speech, and the dissemination of misinformation, has become a pressing social concern in the digital age. The 2022 Mpox outbreak, initially termed “Monkeypox” but subsequently renamed to mitigate associated stigmas and societal concerns, serves as a poignant backdrop to this issue. Objective: In this research, we undertake a comprehensive analysis of the toxic online discourse surrounding the 2022 Mpox outbreak. Our objective is to dissect its origins, characterize its nature and content, trace its dissemination patterns, and assess its broader societal implications, with the goal of providing insights that can inform strategies to mitigate such toxicity in future crises. Methods: We collected more than 1.6 million unique tweets and analyzed them from five dimensions, including context, extent, content, speaker, and intent. Utilizing BERT-based topic modeling and social network community clustering, we delineated the toxic dynamics on Twitter. Results: We identified five high-level topic categories in the toxic online discourse on Twitter, including disease (46.6%), health policy and healthcare (19.3%), homophobia (23.9%), politics (6.0%), and racism (4.1%). Through the toxicity diffusion networks of mentions, retweets, and the top users, we found that retweets of toxic content were widespread, while influential users rarely engaged with or countered this toxicity through retweets. Conclusions: By tracking topical dynamics, we can track the changing popularity of toxic content online, providing a better understanding of societal challenges. Network dynamics spotlight key social media influencers and their intents, indicating that addressing these central figures in toxic discourse can enhance crisis communication and inform policy-making.
摘要:背景:网络毒性,包括骚扰、欺凌、仇恨言论和传播错误信息等行为,已成为数字时代迫切需要关注的社会问题。2022年的MPOX疫情最初被称为“猴痘”,但后来为了减轻相关的耻辱和社会关切而更名,成为这一问题的一个令人心酸的背景。目的:在这项研究中,我们对围绕2022年Mpox爆发的有毒网络话语进行了全面的分析。我们的目标是剖析其起源,确定其性质和内容,追踪其传播模式,并评估其更广泛的社会影响,目的是提供见解,为在未来的危机中减轻这种毒性的战略提供参考。方法:我们收集了160多万条独特的推文,并从背景、范围、内容、说话人和意图等五个维度进行了分析。利用基于BERT的主题建模和社交网络社区聚类,我们描绘了Twitter上的有毒动态。结果:我们在Twitter上的有毒在线话语中确定了五个高级别话题类别,包括疾病(46.6%)、卫生政策和医疗保健(19.3%)、同性恋恐惧症(23.9%)、政治(6.0%)和种族主义(4.1%)。通过提及、转发和排名靠前的用户的毒性扩散网络,我们发现有毒内容的转发很普遍,而有影响力的用户很少通过转发来从事或对抗这种毒性。结论:通过跟踪话题动态,我们可以跟踪在线有毒内容不断变化的受欢迎程度,从而更好地了解社会挑战。网络动态突出了关键的社交媒体影响者及其意图,表明用有毒话语处理这些核心人物可以加强危机沟通,并为政策制定提供信息。

[NLP-44] Decoding SEC Actions: Enforcement Trends through Analyzing Blockchain litigation using LLM-based Thematic Factor Mapping
[NLP-44] 解码SEC行动:通过使用基于LLM的主题因素映射分析区块链诉讼来分析执法趋势

链接: https://arxiv.org/abs/2408.11961
作者: Junliang Luo,Xihan Xiong,William Knottenbelt,Xue Liu
关键词-EN: potential regulatory actions, persons or enterprises, blockchain entities, driving SEC actions, regulatory authorities
关键词-ZH: 潜在的监管行动、个人或企业、区块链实体、推动SEC行动、监管当局
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:The proliferation of blockchain entities (persons or enterprises) exposes them to potential regulatory actions (e.g., being litigated) by regulatory authorities. Regulatory frameworks for crypto assets are actively being developed and refined, increasing the likelihood of such actions. The lack of systematic analysis of the factors driving litigation against blockchain entities leaves companies in need of clarity to navigate compliance risks. This absence of insight also deprives investors of the information for informed decision-making. This study focuses on U.S. litigation against blockchain entities, particularly by the U.S. Securities and Exchange Commission (SEC) given its influence on global crypto regulation. Utilizing frontier pretrained language models and large language models, we systematically map all SEC complaints against blockchain companies from 2012 to 2024 to thematic factors conceptualized by our study to delineate the factors driving SEC actions. We quantify the thematic factors and assess their influence on specific legal Acts cited within the complaints on an annual basis, allowing us to discern the regulatory emphasis, patterns and conduct trend analysis.
摘要:区块链实体(个人或企业)的激增使其面临监管机构的潜在监管行动(例如被提起诉讼)。针对加密资产的监管框架正在被积极制定和完善,增加了此类行动的可能性。由于缺乏对推动针对区块链实体诉讼的因素的系统分析,公司在应对合规风险时缺乏清晰的指引。这种洞察的缺失也使投资者无法获得做出明智决策所需的信息。鉴于美国证券交易委员会(SEC)对全球加密监管的影响力,本研究聚焦于美国针对区块链实体的诉讼,尤其是SEC提起的诉讼。利用前沿的预训练语言模型和大型语言模型,我们系统地将2012至2024年SEC对区块链公司的所有起诉书映射到本研究概念化的主题因素上,以刻画推动SEC行动的因素。我们按年度量化这些主题因素,并评估它们对起诉书中所引用的具体法案的影响,从而辨别监管重点与模式并进行趋势分析。
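摘要中“按年度量化主题因素并评估其趋势”的统计步骤,可以用如下极简示意还原(主题因素名称与数据均为假设的示例,并非论文的真实标注):

```python
from collections import Counter, defaultdict

def factor_trends(complaints):
    """complaints: [(year, [factor, ...]), ...],
    其中 factor 列表假设由 LLM 为每份 SEC 起诉书标注。
    返回每年各主题因素的出现次数, 供趋势分析使用。"""
    trends = defaultdict(Counter)
    for year, factors in complaints:
        trends[year].update(factors)
    # 按年份排序, 便于观察逐年变化
    return {year: dict(counts) for year, counts in sorted(trends.items())}
```

例如,若 2013 年有两份起诉书被标注为 "fraud",聚合结果会显示该因素当年出现 2 次,逐年对比即可得到趋势。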

[NLP-45] The State of Commercial Automatic French Legal Speech Recognition Systems and their Impact on Court Reporters et al
[NLP-45] 商业法语自动法律语音识别系统的现状及其对法庭记录员等人的影响

链接: https://arxiv.org/abs/2408.11940
作者: Nicolad Garneau,Olivier Bolduc
关键词-EN: Quebec and Canadian, official court reporter, Canadian courts, Automatic Speech Recognition, critical task
关键词-ZH: 魁北克和加拿大,官方法庭记者,加拿大法院,自动语音识别,关键任务
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:In Quebec and Canadian courts, the transcription of court proceedings is a critical task for appeal purposes and must be certified by an official court reporter. The limited availability of qualified reporters and the high costs associated with manual transcription underscore the need for more efficient solutions. This paper examines the potential of Automatic Speech Recognition (ASR) systems to assist court reporters in transcribing legal proceedings. We benchmark three ASR models, including commercial and open-source options, on their ability to recognize French legal speech using a curated dataset. Our study evaluates the performance of these systems using the Word Error Rate (WER) metric and introduces the Sonnex Distance to account for phonetic accuracy. We also explore the broader implications of ASR adoption on court reporters, copyists, the legal system, and litigants, identifying both positive and negative impacts. The findings suggest that while current ASR systems show promise, they require further refinement to meet the specific needs of the legal domain.
摘要:在魁北克和加拿大法院,庭审记录的转写是上诉所需的一项关键工作,必须由正式的法庭书记员认证。合格法庭书记员的稀缺以及人工转写的高昂成本,突显了对更高效解决方案的需求。本文考察了自动语音识别(ASR)系统在协助法庭书记员转写法律程序方面的潜力。我们使用精心整理的数据集,对包括商业和开源选项在内的三种ASR模型识别法语法律语音的能力进行了基准测试。我们的研究使用词错误率(WER)指标评估这些系统的性能,并引入Sonnex距离来衡量音位层面的准确度。我们还探讨了采用ASR对法庭书记员、抄录员、法律系统和诉讼当事人的更广泛影响,指出了其中的积极和消极方面。研究结果表明,虽然目前的ASR系统显示出了潜力,但仍需进一步完善以满足法律领域的特定需求。
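摘要中用词错误率(WER)评估ASR系统。作为参考,下面是WER通用定义的一个最小实现示意:按空格分词后计算词级编辑距离,再除以参考文本词数(仅为标准定义的示意,与论文的官方实现无关):

```python
def wer(reference: str, hypothesis: str) -> float:
    """词错误率 = (替换 + 删除 + 插入) / 参考词数, 即词级 Levenshtein 距离归一化。"""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: ref 前 i 个词与 hyp 前 j 个词之间的编辑距离
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # 删除
                          d[i][j - 1] + 1,          # 插入
                          d[i - 1][j - 1] + cost)   # 替换 / 匹配
    return d[-1][-1] / max(len(ref), 1)
```

例如,漏掉两个词的转写 `wer("the court is now in session", "the court now session")` 得到 2/6 ≈ 0.33。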

[NLP-46] Defining Boundaries: The Impact of Domain Specification on Cross-Language and Cross-Domain Transfer in Machine Translation
[NLP-46] 定义边界:领域规范对机器翻译中跨语言和跨领域转移的影响

链接: https://arxiv.org/abs/2408.11926
作者: Lia Shahnazaryan,Meriem Beloucif
关键词-EN: extensive parallel corpora, parallel corpora limits, corpora limits progress, Recent advancements, neural machine translation
关键词-ZH: 广泛的平行库、平行库限制、库限制进步、最新进展、神经机器翻译
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Recent advancements in neural machine translation (NMT) have revolutionized the field, yet the dependency on extensive parallel corpora limits progress for low-resource languages. Cross-lingual transfer learning offers a promising solution by utilizing data from high-resource languages but often struggles with in-domain NMT. In this paper, we investigate three pivotal aspects: enhancing the domain-specific quality of NMT by fine-tuning domain-relevant data from different language pairs, identifying which domains are transferable in zero-shot scenarios, and assessing the impact of language-specific versus domain-specific factors on adaptation effectiveness. Using English as the source language and Spanish for fine-tuning, we evaluate multiple target languages including Portuguese, Italian, French, Czech, Polish, and Greek. Our findings reveal significant improvements in domain-specific translation quality, especially in specialized fields such as medical, legal, and IT, underscoring the importance of well-defined domain data and transparency of the experiment setup in in-domain transfer learning.
摘要:神经机器翻译(NMT)的最新进展给该领域带来了革命性的变化,但对大规模平行语料库的依赖限制了低资源语言的进步。跨语言迁移学习通过利用高资源语言的数据提供了一个有前途的解决方案,但在领域内NMT上往往表现不佳。在本文中,我们研究了三个关键方面:通过微调来自不同语言对的领域相关数据来提高NMT的特定领域质量;识别哪些领域在零样本场景下是可迁移的;以及评估语言特定因素与领域特定因素对适应效果的影响。我们使用英语作为源语言、西班牙语用于微调,并评估了包括葡萄牙语、意大利语、法语、捷克语、波兰语和希腊语在内的多种目标语言。我们的研究结果显示,特定领域的翻译质量有了显著提高,特别是在医疗、法律和IT等专业领域,这突显了明确定义的领域数据以及实验设置透明度在领域内迁移学习中的重要性。

[NLP-47] Ancient Wisdom Modern Tools: Exploring Retrieval-Augmented LLMs for Ancient Indian Philosophy ACL ACL2024
[NLP-47] 古代智慧现代工具:探索古代印度哲学的检索增强法学硕士

链接: https://arxiv.org/abs/2408.11903
作者: Priyanka Mandikal
关键词-EN: revolutionized the landscape, landscape of information, knowledge dissemination, RAG model, RAG
关键词-ZH: 彻底改变了景观、信息景观、知识传播、RAG模型、RAG
类目: Computation and Language (cs.CL); Computers and Society (cs.CY); Information Retrieval (cs.IR)
备注: Best paper at the Workshop on Machine Learning for Ancient Languages @ ACL 2024. Proceedings of the 1st Machine Learning for Ancient Languages Workshop, 2024.ml4al-1.23, Association for Computational Linguistics (ACL) 2024. Dataset, code, and evaluation is available at: this https URL

点击查看摘要

Abstract:LLMs have revolutionized the landscape of information retrieval and knowledge dissemination. However, their application in specialized areas is often hindered by factual inaccuracies and hallucinations, especially in long-tail knowledge distributions. We explore the potential of retrieval-augmented generation (RAG) models for long-form question answering (LFQA) in a specialized knowledge domain. We present VedantaNY-10M, a dataset curated from extensive public discourses on the ancient Indian philosophy of Advaita Vedanta. We develop and benchmark a RAG model against a standard, non-RAG LLM, focusing on transcription, retrieval, and generation performance. Human evaluations by computational linguists and domain experts show that the RAG model significantly outperforms the standard model in producing factual and comprehensive responses having fewer hallucinations. In addition, a keyword-based hybrid retriever that emphasizes unique low-frequency terms further improves results. Our study provides insights into effectively integrating modern large language models with ancient knowledge systems. Project page with dataset and code: this https URL
摘要:LLM彻底改变了信息检索和知识传播的格局。然而,它们在专业领域的应用常常受到事实错误和幻觉的阻碍,尤其是在长尾知识分布上。我们探索了检索增强生成(RAG)模型在专业知识领域中用于长文本问答(LFQA)的潜力。我们提出了VedantaNY-10M,这是一个从关于古印度不二论吠檀多(Advaita Vedanta)哲学的大量公开论述中整理而来的数据集。我们开发了一个RAG模型,并以标准的非RAG LLM为基准进行评测,重点关注转写、检索和生成性能。计算语言学家和领域专家的人工评估表明,RAG模型在生成符合事实、内容全面且幻觉更少的回答方面显著优于标准模型。此外,强调独特低频词的基于关键词的混合检索器进一步改善了结果。我们的研究为有效地将现代大型语言模型与古代知识体系相结合提供了见解。包含数据集和代码的项目页面:此https URL
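摘要提到“强调独特低频词的基于关键词的混合检索器”。“强调低频词”最常见的实现思路是IDF加权;下面给出这一思路的示意(语料与打分方式均为本文的假设,并非论文原实现):

```python
import math

def idf_table(corpus):
    """逆文档频率: 出现文档数越少的词权重越高。"""
    n = len(corpus)
    df = {}
    for doc in corpus:
        for term in set(doc.lower().split()):
            df[term] = df.get(term, 0) + 1
    return {term: math.log((n + 1) / (count + 1)) + 1 for term, count in df.items()}

def keyword_retrieve(query, corpus, k=1):
    """按查询与文档重叠词的 IDF 之和打分, 独特的低频词贡献更大。"""
    idf = idf_table(corpus)
    q_terms = set(query.lower().split())
    def score(doc):
        return sum(idf.get(t, 0.0) for t in q_terms & set(doc.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]
```

在这种打分下,只在单篇文献中出现的专名(如 "brahman")会比高频虚词得到高得多的权重,有助于在专业语料中命中正确段落。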

[NLP-48] Beyond Labels: Aligning Large Language Models with Human-like Reasoning ICPR2024
[NLP-48] 超越标签:将大型语言模型与类人推理保持一致

链接: https://arxiv.org/abs/2408.11879
作者: Muhammad Rafsan Kabir,Rafeed Mohammad Sultan,Ihsanul Haque Asif,Jawad Ibn Ahad,Fuad Rahman,Mohammad Ruhul Amin,Nabeel Mohammed,Shafin Rahman
关键词-EN: produce morally correct, Aligning large language, reasoning approach ensures, LLMs produce morally, large language models
关键词-ZH: 产生道德上正确的、调整大型语言、推理方法确保的、LLM产生道德上的、大型语言模型
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: Accepted in ICPR 2024

点击查看摘要

Abstract:Aligning large language models (LLMs) with a human reasoning approach ensures that LLMs produce morally correct and human-like decisions. Ethical concerns are raised because current models are prone to generating false positives and providing malicious responses. To contribute to this issue, we have curated an ethics dataset named Dataset for Aligning Reasons (DFAR), designed to aid in aligning language models to generate human-like reasons. The dataset comprises statements with ethical-unethical labels and their corresponding reasons. In this study, we employed a unique and novel fine-tuning approach that utilizes ethics labels and their corresponding reasons (L+R), in contrast to the existing fine-tuning approach that only uses labels (L). The original pre-trained versions, the existing fine-tuned versions, and our proposed fine-tuned versions of LLMs were then evaluated on an ethical-unethical classification task and a reason-generation task. Our proposed fine-tuning strategy notably outperforms the others in both tasks, achieving significantly higher accuracy scores in the classification task and lower misalignment rates in the reason-generation task. The increase in classification accuracies and decrease in misalignment rates indicate that the L+R fine-tuned models align more with human ethics. Hence, this study illustrates that injecting reasons has substantially improved the alignment of LLMs, resulting in more human-like responses. We have made the DFAR dataset and corresponding codes publicly available at this https URL.
摘要:将大型语言模型(LLM)与人类推理方式对齐,可确保LLM做出道德上正确且类似人类的决策。由于当前模型容易产生误报并给出恶意回应,这引发了伦理担忧。为此,我们整理了一个名为Dataset for Aligning Reasons(DFAR)的伦理数据集,旨在帮助对齐语言模型以生成类似人类的理由。该数据集包含带有道德/不道德标签的陈述及其相应理由。在本研究中,我们采用了一种独特而新颖的微调方法,同时利用伦理标签及其相应理由(L+R),而现有微调方法只使用标签(L)。随后,我们在道德/不道德分类任务和理由生成任务上评估了原始预训练版本、现有微调版本以及我们提出的微调版本的LLM。我们提出的微调策略在两个任务上都显著优于其他方法,在分类任务中获得了显著更高的准确率,在理由生成任务中获得了更低的未对齐率。分类准确率的提高和未对齐率的降低表明,L+R微调模型更符合人类伦理。因此,这项研究说明注入理由显著改善了LLM的对齐,使其回应更接近人类。我们已在此https URL上公开了DFAR数据集和相应代码。

[NLP-49] Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
[NLP-49] Open-FinLLM:面向金融应用的开放多模式大型语言模型

链接: https://arxiv.org/abs/2408.11878
作者: Qianqian Xie,Dong Li,Mengxi Xiao,Zihao Jiang,Ruoyu Xiang,Xiao Zhang,Zhengyu Chen,Yueru He,Weiguang Han,Yuzhe Yang,Shunian Chen,Yifei Zhang,Lihang Shen,Daniel Kim,Zhiwei Liu,Zheheng Luo,Yangyang Yu,Yupeng Cao,Zhiyang Deng,Zhiyuan Yao,Haohang Li,Duanyu Feng,Yongfu Dai,VijayaSai Somasundaram,Peng Lu,Yilun Zhao,Yitao Long,Guojun Xiong,Kaleb Smith,Honghai Yu,Yanzhao Lai,Min Peng,Jianyun Nie,Jordan W. Suchow,Xiao-Yang Liu,Benyou Wang,Alejandro Lopez-Lira,Jimin Huang,Sophia Ananiadou
关键词-EN: Large language models, involving multi-modal inputs, Large language, lack sufficient financial, tasks involving multi-modal
关键词-ZH: 大型语言模型,涉及多模式输入,大型语言,缺乏足够的财务,涉及多模式的任务
类目: Computation and Language (cs.CL); Computational Engineering, Finance, and Science (cs.CE); Computational Finance (q-fin.CP)
备注: 33 pages, 13 figures

点击查看摘要

Abstract:Large language models (LLMs) have advanced financial applications, yet they often lack sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables and time series data. To address these limitations, we introduce Open-FinLLMs, a series of Financial LLMs. We begin with FinLLaMA, pre-trained on a 52 billion token financial corpus, incorporating text, tables, and time-series data to embed comprehensive financial knowledge. FinLLaMA is then instruction fine-tuned with 573K financial instructions, resulting in FinLLaMA-instruct, which enhances task performance. Finally, we present FinLLaVA, a multimodal LLM trained with 1.43M image-text instructions to handle complex financial data types. Extensive evaluations demonstrate FinLLaMA’s superior performance over LLaMA3-8B, LLaMA3.1-8B, and BloombergGPT in both zero-shot and few-shot settings across 19 and 4 datasets, respectively. FinLLaMA-instruct outperforms GPT-4 and other Financial LLMs on 15 datasets. FinLLaVA excels in understanding tables and charts across 4 multimodal tasks. Additionally, FinLLaMA achieves impressive Sharpe Ratios in trading simulations, highlighting its robust financial application capabilities. We will continually maintain and improve our models and benchmarks to support ongoing innovation in academia and industry.
摘要:大型语言模型具有先进的金融应用,但它们往往缺乏足够的金融知识,并且难以处理表格和时间序列数据等涉及多模态输入的任务。为了解决这些限制,我们引入了一系列金融LLM:Open-FinLLMs。我们从FinLLaMA开始,它在一个包含520亿token的金融语料库上进行了预训练,结合了文本、表格和时间序列数据,以嵌入全面的金融知识。随后用573K条金融指令对FinLLaMA进行指令微调,得到FinLLaMA-instruct,从而提升任务性能。最后,我们给出了FinLLaVA,这是一个用143万条图文指令训练的多模态LLM,用于处理复杂的金融数据类型。广泛的评估表明,FinLLaMA分别在19个和4个数据集的零样本和少样本设置下优于LLaMA3-8B、LLaMA3.1-8B和BloombergGPT。FinLLaMA-instruct在15个数据集上的表现优于GPT-4和其他金融LLM。FinLLaVA在4个多模态任务中的表格和图表理解方面表现出色。此外,FinLLaMA在交易模拟中达到了令人印象深刻的夏普比率,突显了其强大的金融应用能力。我们将持续维护和改进我们的模型和基准,以支持学术界和业界的持续创新。
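摘要末尾提到FinLLaMA在交易模拟中取得了可观的夏普比率。作为背景,年化夏普比率的通常算法如下(这里假设日频收益、每年252个交易日;具体计算口径与论文无关):

```python
import numpy as np

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """年化夏普比率: 超额收益均值 / 收益标准差, 再乘以 sqrt(每年期数)。"""
    excess = np.asarray(returns, dtype=float) - risk_free
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1))
```

比率越高,说明策略在单位波动下获得的超额收益越多;对完全对称的正负收益序列,夏普比率恰好互为相反数。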

[NLP-50] Hierarchical Retrieval-Augmented Generation Model with Rethink for Multi-hop Question Answering
[NLP-50] 用于多跳问题解答的分层检索增强生成模型

链接: https://arxiv.org/abs/2408.11875
作者: Xiaoming Zhang,Ming Wang,Xiaocui Yang,Daling Wang,Shi Feng,Yifei Zhang
关键词-EN: Multi-hop Question Answering, resolve intricate questions, Multi-hop Question, Question Answering, necessitates complex reasoning
关键词-ZH: 多跳问题解答,解决复杂的问题,多跳问题,问题解答,需要复杂的推理
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
备注: undereview

点击查看摘要

Abstract:Multi-hop Question Answering (QA) necessitates complex reasoning by integrating multiple pieces of information to resolve intricate questions. However, existing QA systems encounter challenges such as outdated information, context window length limitations, and an accuracy-quantity trade-off. To address these issues, we propose a novel framework, the Hierarchical Retrieval-Augmented Generation Model with Rethink (HiRAG), comprising five key modules: Decomposer, Definer, Retriever, Filter, and Summarizer. We introduce a new hierarchical retrieval strategy that incorporates both sparse retrieval at the document level and dense retrieval at the chunk level, effectively integrating their strengths. Additionally, we propose a single-candidate retrieval method to mitigate the limitations of multi-candidate retrieval. We also construct two new corpora, Indexed Wikicorpus and Profile Wikicorpus, to address the issues of outdated and insufficient knowledge. Our experimental results on four datasets demonstrate that HiRAG outperforms state-of-the-art models across most metrics, and our Indexed Wikicorpus is effective. The code for HiRAG is available at this https URL.
摘要:多跳问答(QA)需要通过整合多条信息进行复杂推理,以解决错综复杂的问题。然而,现有的QA系统面临信息过时、上下文窗口长度限制以及精度与数量的权衡等挑战。为了解决这些问题,我们提出了一种新的框架:带反思的层次化检索增强生成模型(HiRAG),它包括分解器、定义器、检索器、过滤器和汇总器五个关键模块。我们提出了一种新的层次化检索策略,将文档级的稀疏检索和块级的密集检索结合起来,有效融合了两者的优势。此外,我们还提出了一种单候选检索方法来缓解多候选检索的局限性。我们还构建了两个新的语料库,Indexed Wikicorpus和Profile Wikicorpus,以解决知识过时和不足的问题。我们在四个数据集上的实验结果表明,HiRAG在大多数指标上都优于最先进的模型,并且我们的Indexed Wikicorpus是有效的。HiRAG的代码可在此https URL上获得。
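HiRAG把文档级稀疏检索和块级密集检索串联起来。这一两阶段结构可以用下面的极简示意表达(稀疏打分用词项重叠、密集打分用词袋余弦,均为占位假设;实际系统分别对应BM25类方法与句向量模型):

```python
import math
from collections import Counter

def sparse_score(query, text):
    """第一阶段: 文档级稀疏打分, 这里简化为查询词与文档词的重叠数。"""
    return len(set(query.lower().split()) & set(text.lower().split()))

def dense_score(query, chunk):
    """第二阶段: 块级密集打分, 这里用词袋余弦充当稠密向量相似度。"""
    q, c = Counter(query.lower().split()), Counter(chunk.lower().split())
    dot = sum(q[t] * c[t] for t in q)
    nq = math.sqrt(sum(v * v for v in q.values()))
    nc = math.sqrt(sum(v * v for v in c.values()))
    return dot / (nq * nc) if nq and nc else 0.0

def hierarchical_retrieve(query, docs, top_docs=1, top_chunks=1):
    """先用稀疏打分筛选文档, 再在入选文档的块上做密集打分。"""
    kept = sorted(docs, key=lambda d: sparse_score(query, d["text"]), reverse=True)[:top_docs]
    chunks = [c for d in kept for c in d["chunks"]]
    return sorted(chunks, key=lambda c: dense_score(query, c), reverse=True)[:top_chunks]
```

第一阶段快速缩小候选文档范围,第二阶段才在小得多的块集合上做更昂贵的细粒度打分,这正是层次化检索降低开销的方式。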

[NLP-51] MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models
[NLP-51] MegaFake:由大型语言模型生成的理论驱动的假新闻数据集

链接: https://arxiv.org/abs/2408.11871
作者: Lionel Z. Wang,Yiming Ma,Renfei Gao,Beichen Guo,Zhuoran Li,Han Zhu,Wenqi Fan,Zexin Lu,Ka Chung Ng
关键词-EN: large language models, revolutionized online content, generate high-quality fake, online content creation, language models
关键词-ZH: 大型语言模型、彻底改变在线内容、生成高质量的虚假、在线内容创建、语言模型
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:The advent of large language models (LLMs) has revolutionized online content creation, making it much easier to generate high-quality fake news. This misuse threatens the integrity of our digital environment and ethical standards. Therefore, understanding the motivations and mechanisms behind LLM-generated fake news is crucial. In this study, we analyze the creation of fake news from a social psychology perspective and develop a comprehensive LLM-based theoretical framework, LLM-Fake Theory. We introduce a novel pipeline that automates the generation of fake news using LLMs, thereby eliminating the need for manual annotation. Utilizing this pipeline, we create a theoretically informed Machine-generated Fake news dataset, MegaFake, derived from the GossipCop dataset. We conduct comprehensive analyses to evaluate our MegaFake dataset. We believe that our dataset and insights will provide valuable contributions to future research focused on the detection and governance of fake news in the era of LLMs.
摘要:大型语言模型(LLM)的出现彻底改变了在线内容的创作,使得生成高质量的假新闻变得更加容易。这种滥用威胁到我们数字环境和道德标准的完整性。因此,了解LLM生成假新闻背后的动机和机制至关重要。在这项研究中,我们从社会心理学的角度分析了假新闻的产生,并发展了一个基于LLM的全面理论框架:LLM-Fake理论。我们引入了一种新的流程,使用LLM自动生成假新闻,从而消除了人工标注的需要。利用这一流程,我们基于GossipCop数据集创建了一个具有理论依据的机器生成假新闻数据集MegaFake。我们进行了全面的分析来评估MegaFake数据集。我们相信,我们的数据集和见解将为未来专注于LLM时代假新闻检测与治理的研究提供有价值的贡献。

[NLP-52] Enhance Lifelong Model Editing with Continuous Data-Adapter Association
[NLP-52] 通过连续数据适配器关联增强终身模型编辑

链接: https://arxiv.org/abs/2408.11869
作者: Jiaang Li,Quan Wang,Zhongnan Wang,Yongdong Zhang,Zhendong Mao
关键词-EN: Large language models, avoid factual errors, efficiently update specific, Large language, require model editing
关键词-ZH: 大型语言模型,避免事实错误,有效更新特定,大型语言,需要模型编辑
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: Preprint. Under Review

点击查看摘要

Abstract:Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors. Most model editing methods are solely designed for single-time use and lead to a significant forgetting effect after sequential edits over time, referred to as lifelong editing. Current approaches manage sequential edits by freezing original parameters and allocating new adapters for each knowledge modification. However, these methods lack robustness to minor input variations. To address this challenge, we propose ELDER, Enhancing Lifelong moDel Editing with mixtuRe of Low-Rank Adapter (LoRA). ELDER is an adaptive approach that integrates multiple LoRAs through a router network. It learns to create a continuous and smooth association between data and adapters, thereby enhancing robustness and generalization to semantically equivalent inputs. Additionally, we introduce a novel loss to help learn associations between adapter allocations and edit semantics. A deferral mechanism is also proposed to retain the original LLM capabilities post-edit. Extensive experiments on GPT-2 XL and LLaMA2-7B demonstrate that ELDER effectively edits models in the lifelong setting and exhibits strong scalability, while retaining LLM’s general abilities on downstream tasks.
摘要:大型语言模型(LLM)需要模型编辑来高效更新其中的特定知识并避免事实错误。大多数模型编辑方法只为单次使用而设计,随着时间推移的连续编辑会导致显著的遗忘效应,这种场景称为终身编辑。当前的方法通过冻结原始参数并为每次知识修改分配新的适配器来管理连续编辑。然而,这些方法对微小的输入变化缺乏鲁棒性。为了应对这一挑战,我们提出了ELDER(Enhancing Lifelong moDel Editing with mixtuRe of Low-Rank Adapter),即利用低秩适配器(LoRA)混合来增强终身模型编辑。ELDER是一种通过路由网络集成多个LoRA的自适应方法。它学习在数据和适配器之间建立连续而平滑的关联,从而增强对语义等价输入的鲁棒性和泛化能力。此外,我们引入了一种新的损失来帮助学习适配器分配与编辑语义之间的关联。我们还提出了一种延迟机制,以在编辑后保留LLM原有的能力。在GPT-2 XL和LLaMA2-7B上的大量实验表明,ELDER能在终身设置下有效地编辑模型并表现出很强的可扩展性,同时保留了LLM在下游任务上的通用能力。
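ELDER的核心机制(冻结原始权重,由路由器把输入连续地分配给多个LoRA适配器)可以写成如下示意;其中线性路由加softmax门控是本文为说明而作的简化假设,并非论文原实现:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_adapters = 8, 2, 3

W = rng.normal(size=(d, d))              # 冻结的原始权重
A = rng.normal(size=(n_adapters, r, d))  # 每个适配器的低秩因子 A_i
B = np.zeros((n_adapters, d, r))         # B_i 初始化为零: 编辑前不改变模型行为
R = rng.normal(size=(d, n_adapters))     # 路由器参数(简化为线性打分)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x):
    """输出 = x (W + sum_i g_i B_i A_i)^T, g 为数据到适配器的连续关联权重。"""
    g = softmax(x @ R)
    delta = sum(g[i] * (B[i] @ A[i]) for i in range(n_adapters))
    return x @ (W + delta).T, g
```

由于门控权重是连续的softmax而非硬性选择,语义相近的输入会得到相近的适配器组合,这正是摘要中“连续而平滑的数据-适配器关联”的直观含义。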

[NLP-53] Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores
[NLP-53] 通过对小数据集进行对比微调并改进嵌入,并具有专家增强的分数

链接: https://arxiv.org/abs/2408.11868
作者: Jun Lu,David Li,Bill Ding,Yu Kang
关键词-EN: small datasets augmented, presents an approach, approach to improve, contrastive fine-tuning, fine-tuning on small
关键词-ZH: 小型数据集增强,提出了一种改进的方法,对比微调,小型微调
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:This paper presents an approach to improve text embedding models through contrastive fine-tuning on small datasets augmented with expert scores. It focuses on enhancing semantic textual similarity tasks and addressing text retrieval problems. The proposed method uses soft labels derived from expert-augmented scores to fine-tune embedding models, preserving their versatility and ensuring retrieval capability is improved. The paper evaluates the method using a Q&A dataset from an online shopping website and eight expert models. Results show improved performance over a benchmark model across multiple metrics on various retrieval tasks from the massive text embedding benchmark (MTEB). The method is cost-effective and practical for real-world applications, especially when labeled data is scarce.
摘要:本文提出了一种通过在用专家评分增强的小型数据集上进行对比微调来改进文本嵌入模型的方法。它专注于增强语义文本相似度任务并解决文本检索问题。所提方法使用由专家增强评分得到的软标签来微调嵌入模型,既保留其通用性又确保检索能力得到提升。本文使用来自某在线购物网站的问答数据集和八个专家模型对该方法进行了评估。结果显示,在大规模文本嵌入基准(MTEB)的各种检索任务上,该方法相对基准模型在多个指标上均有性能提升。该方法具有成本效益,适用于现实应用,尤其是在标注数据稀缺时。
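“用专家评分得到的软标签做对比微调”可以按“余弦相似度经sigmoid映射为概率,再对软标签求交叉熵”来理解;下面是这一思路的示意实现(缩放系数等细节均为本文假设,并非论文原公式):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def soft_label_loss(emb_a, emb_b, expert_score, scale=5.0):
    """expert_score ∈ [0,1]: 由专家增强评分归一化得到的软标签。
    嵌入相似而软标签高时损失小, 相似而软标签低时损失大。"""
    p = 1.0 / (1.0 + math.exp(-scale * cosine(emb_a, emb_b)))  # 相似度 -> 概率
    eps = 1e-9
    return -(expert_score * math.log(p + eps)
             + (1 - expert_score) * math.log(1 - p + eps))
```

与 0/1 硬标签相比,软标签允许专家表达“部分相关”,微调时梯度会把嵌入相似度推向专家评分对应的水平,而不是强行推到两个极端。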

[NLP-54] Crossing New Frontiers: Knowledge-Augmented Large Language Model Prompting for Zero-Shot Text-Based De Novo Molecule Design NEURIPS NEURIPS-2023
[NLP-54] 跨越新领域:知识增强的大型语言模型为基于零镜头文本的De Novo分子设计提供支持

链接: https://arxiv.org/abs/2408.11866
作者: Sakhinana Sagar Srinivas,Venkataramana Runkana
关键词-EN: innovative material development, efficient chemical processes, leverages computational methods, optimize molecular properties, fast-tracking new drug
关键词-ZH: 创新材料开发、高效的化学过程、利用计算方法、优化分子性质、快速追踪新药
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Biomolecules (q-bio.BM)
备注: Paper was accepted at R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Foundation Models, NeurIPS-2023. Please find the links: this https URL and this https URL

点击查看摘要

Abstract:Molecule design is a multifaceted approach that leverages computational methods and experiments to optimize molecular properties, fast-tracking new drug discoveries, innovative material development, and more efficient chemical processes. Recently, text-based molecule design has emerged, inspired by next-generation AI tasks analogous to foundational vision-language models. Our study explores the use of knowledge-augmented prompting of large language models (LLMs) for the zero-shot text-conditional de novo molecular generation task. Our approach uses task-specific instructions and a few demonstrations to address distributional shift challenges when constructing augmented prompts for querying LLMs to generate molecules consistent with technical descriptions. Our framework proves effective, outperforming state-of-the-art (SOTA) baseline models on benchmark datasets.
摘要:分子设计是一种多方面的方法,利用计算方法和实验来优化分子性质,加速新药发现、创新材料开发和更高效的化学过程。最近,受类似于基础视觉语言模型的下一代人工智能任务的启发,出现了基于文本的分子设计。我们的研究探索了对大型语言模型(LLM)使用知识增强提示来完成零样本文本条件的从头分子生成任务。我们的方法使用特定于任务的指令和少量演示,来应对在构建增强提示以查询LLM生成符合技术描述的分子时遇到的分布偏移挑战。实验证明我们的框架是有效的,在基准数据集上优于最先进(SOTA)的基线模型。

[NLP-55] How Susceptible are LLMs to Influence in Prompts?
[NLP-55] LLM对预算的影响力有多大?

链接: https://arxiv.org/abs/2408.11865
作者: Sotiris Anagnostidis,Jannis Bulian
关键词-EN: Large Language Models, Large Language, including additional context, Language Models, highly sensitive
关键词-ZH: 大型语言模型,大型语言,包括额外的上下文,语言模型,高度敏感
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:Large Language Models (LLMs) are highly sensitive to prompts, including additional context provided therein. As LLMs grow in capability, understanding their prompt-sensitivity becomes increasingly crucial for ensuring reliable and robust performance, particularly since evaluating these models becomes more challenging. In this work, we investigate how current models (Llama, Mixtral, Falcon) respond when presented with additional input from another model, mimicking a scenario where a more capable model – or a system with access to more external information – provides supplementary information to the target model. Across a diverse spectrum of question-answering tasks, we study how an LLM’s response to multiple-choice questions changes when the prompt includes a prediction and explanation from another model. Specifically, we explore the influence of the presence of an explanation, the stated authoritativeness of the source, and the stated confidence of the supplementary input. Our findings reveal that models are strongly influenced, and when explanations are provided they are swayed irrespective of the quality of the explanation. The models are more likely to be swayed if the input is presented as being authoritative or confident, but the effect is small in size. This study underscores the significant prompt-sensitivity of LLMs and highlights the potential risks of incorporating outputs from external sources without thorough scrutiny and further validation. As LLMs continue to advance, understanding and mitigating such sensitivities will be crucial for their reliable and trustworthy deployment.
摘要:大型语言模型(LLM)对提示高度敏感,包括其中提供的附加上下文。随着LLM能力的提高,理解它们的提示敏感性对确保可靠和稳健的性能变得越来越重要,尤其是在评估这些模型变得更具挑战性的情况下。在这项工作中,我们研究了当前模型(Llama、Mixtral、Falcon)在收到来自另一个模型的额外输入时的反应,模拟了一个能力更强的模型(或能够访问更多外部信息的系统)为目标模型提供补充信息的场景。在多样的问答任务中,我们研究了当提示包含来自另一个模型的预测和解释时,LLM对多项选择题的回答如何变化。具体来说,我们探讨了解释的存在、来源所声明的权威性以及补充输入所声明的置信度的影响。我们的发现表明,模型受到了强烈影响,并且当提供解释时,无论解释质量如何,模型都会被动摇。如果输入以权威或自信的方式呈现,模型更容易被动摇,但这种效应的幅度较小。这项研究强调了LLM的显著提示敏感性,并强调了在没有彻底审查和进一步验证的情况下纳入外部来源输出的潜在风险。随着LLM的不断发展,理解并缓解这种敏感性对其可靠可信的部署至关重要。

[NLP-56] Unraveling Text Generation in LLMs: A Stochastic Differential Equation Approach
[NLP-56] 解开LLM中的文本生成:随机微分方程方法

链接: https://arxiv.org/abs/2408.11863
作者: Yukun Zhang
关键词-EN: Stochastic Differential Equations, Differential Equations, Large Language Models, Stochastic Differential, Large Language
关键词-ZH: 随机微分方程,微分方程,大型语言模型,随机微分,大型语言
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:This paper explores the application of Stochastic Differential Equations (SDE) to interpret the text generation process of Large Language Models (LLMs) such as GPT-4. Text generation in LLMs is modeled as a stochastic process where each step depends on previously generated content and model parameters, sampling the next word from a vocabulary distribution. We represent this generation process using SDE to capture both deterministic trends and stochastic perturbations. The drift term describes the deterministic trends in the generation process, while the diffusion term captures the stochastic variations. We fit these functions using neural networks and validate the model on real-world text corpora. Through numerical simulations and comprehensive analyses, including drift and diffusion analysis, stochastic process property evaluation, and phase space exploration, we provide deep insights into the dynamics of text generation. This approach not only enhances the understanding of the inner workings of LLMs but also offers a novel mathematical perspective on language generation, which is crucial for diagnosing, optimizing, and controlling the quality of generated text.
摘要:本文探讨了随机微分方程(SDE)在解释GPT-4等大型语言模型(LLM)文本生成过程中的应用。LLM中的文本生成被建模为一个随机过程,其中每一步都取决于先前生成的内容和模型参数,并从词汇分布中采样下一个单词。我们使用SDE来表示这一生成过程,以捕捉确定性趋势和随机扰动。漂移项描述了生成过程中的确定性趋势,而扩散项则刻画了随机变化。我们使用神经网络对这些函数进行拟合,并在真实文本语料库上对模型进行了验证。通过数值模拟和综合分析,包括漂移和扩散分析、随机过程性质评估和相空间探索,我们对文本生成的动力学提供了深入的见解。这一方法不仅加深了对LLM内部工作原理的理解,而且为语言生成提供了一个新的数学视角,这对于诊断、优化和控制生成文本的质量至关重要。
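论文把生成过程建模为带漂移项与扩散项的SDE dX_t = f(X_t,t)dt + g(X_t,t)dW_t。这类方程的数值模拟通常用Euler-Maruyama离散化;下面以一维Ornstein-Uhlenbeck过程为例示意(漂移、扩散函数为任意示例,与论文拟合得到的神经网络无关):

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, dt=0.01, steps=1000, seed=0):
    """模拟 dX_t = drift(X,t) dt + diffusion(X,t) dW_t 的一条轨迹。"""
    rng = np.random.default_rng(seed)
    x = np.empty(steps + 1)
    x[0] = x0
    for n in range(steps):
        t = n * dt
        dw = rng.normal(0.0, np.sqrt(dt))  # 布朗运动增量 ~ N(0, dt)
        x[n + 1] = x[n] + drift(x[n], t) * dt + diffusion(x[n], t) * dw
    return x

# 示例: 均值回归漂移 + 常数扩散 (Ornstein-Uhlenbeck 过程)
path = euler_maruyama(lambda x, t: -2.0 * x, lambda x, t: 0.3, x0=1.0)
```

漂移项把轨迹拉回均值(对应生成过程中的确定性趋势),扩散项注入噪声(对应采样的随机扰动),与摘要中两项的分工一致。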

[NLP-57] Sentiment analysis of preservice teachers reflections using a large language model
[NLP-57] 使用大语言模型对职前教师反思的情感分析

链接: https://arxiv.org/abs/2408.11862
作者: Yunsoo Park,Younkyung Hong
关键词-EN: preservice teachers’ reflections, emotion and tone, analyzed using sentiment, teachers’ reflections, Gemini
关键词-ZH: 职前教师的反思、情感和语气,用情感分析,教师的反思,Gemini
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: 5 pages, 2 tables, WAIE 2024 (2024 6th International Workshop on Artificial Intelligence and Education)

点击查看摘要

Abstract:In this study, the emotion and tone of preservice teachers’ reflections were analyzed using sentiment analysis with LLMs: GPT-4, Gemini, and BERT. We compared the results to understand how each tool categorizes and describes individual reflections and multiple reflections as a whole. This study aims to explore ways to bridge the gaps between qualitative, quantitative, and computational analyses of reflective practices in teacher education. This study finds that to effectively integrate LLM analysis into teacher education, developing an analysis method and result format that are both comprehensive and relevant for preservice teachers and teacher educators is crucial.
摘要:在这项研究中,我们使用LLM(GPT-4、Gemini和BERT)的情感分析来分析职前教师反思中的情感和语气。我们比较了结果,以了解每个工具如何对单条反思以及整体多条反思进行分类和描述。本研究旨在探索弥合教师教育中反思实践的定性、定量和计算分析之间差距的方法。本研究发现,为了将LLM分析有效地整合到教师教育中,开发一种对职前教师和教师教育者都全面且相关的分析方法与结果格式至关重要。

[NLP-58] Speaking the Same Language: Leveraging LLMs in Standardizing Clinical Data for AI
[NLP-58] 说同一种语言:利用LLM标准化人工智能临床数据

链接: https://arxiv.org/abs/2408.11861
作者: Arindam Sett,Somaye Hashemifar,Mrunal Yadav,Yogesh Pandit,Mohsen Hejrati
关键词-EN: Artificial Intelligence, garnered considerable attention, implementation of Artificial, cost reduction, considerable attention
关键词-ZH: 人工智能,获得相当大的关注,人工的实施,降低成本,相当大的关注
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: 11 pages, 2 figures, 4 tables

点击查看摘要

Abstract:The implementation of Artificial Intelligence (AI) in the healthcare industry has garnered considerable attention, attributable to its prospective enhancement of clinical outcomes, expansion of access to superior healthcare, cost reduction, and elevation of patient satisfaction. Nevertheless, the primary hurdle that persists is related to the quality of accessible multi-modal healthcare data in conjunction with the evolution of AI methodologies. This study delves into the adoption of large language models to address specific challenges, specifically, the standardization of healthcare data. We advocate the use of these models to identify and map clinical data schemas to established data standard attributes, such as the Fast Healthcare Interoperability Resources. Our results illustrate that employing large language models significantly diminishes the necessity for manual data curation and elevates the efficacy of the data standardization process. Consequently, the proposed methodology has the propensity to expedite the integration of AI in healthcare, ameliorate the quality of patient care, whilst minimizing the time and financial resources necessary for the preparation of data for AI.
摘要:人工智能(AI)在医疗保健行业的实施已经引起了相当大的关注,这归因于它可以预期地改善临床结果,扩大获得优质医疗保健的机会,降低成本,提高患者满意度。然而,持续存在的主要障碍与可访问的多模式医疗数据的质量以及人工智能方法的演变有关。这项研究深入探讨采用大型语言模型来解决特定的挑战,特别是医疗数据的标准化。我们提倡使用这些模型来识别临床数据模式并将其映射到已建立的数据标准属性,例如快速医疗互操作性资源。我们的结果表明,使用大型语言模型显著减少了手动数据管理的必要性,并提高了数据标准化过程的效率。因此,拟议的方法倾向于加快人工智能在医疗保健中的整合,改善患者护理的质量,同时将准备人工智能数据所需的时间和财政资源降至最低。
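把本地临床数据模式映射到FHIR等标准属性,可以粗略示意如下;这里用字符串相似度充当LLM语义匹配的退化替身,候选属性列表也只是举例(实际属性名以FHIR规范为准):

```python
import difflib

# 示例候选: 真实系统应从 FHIR 规范加载完整属性列表
FHIR_ATTRS = [
    "Patient.birthDate",
    "Patient.gender",
    "Observation.valueQuantity",
    "MedicationRequest.dosageInstruction",
]

def map_to_fhir(column_name, candidates=FHIR_ATTRS):
    """把本地字段名映射到最接近的 FHIR 属性(LLM 语义匹配的占位实现)。"""
    lowered = {c.lower(): c for c in candidates}
    key = column_name.lower().replace("_", "")
    best = difflib.get_close_matches(key, list(lowered), n=1, cutoff=0.0)
    return lowered[best[0]] if best else None
```

字符串相似度只能处理拼写接近的字段名;论文所述方法的价值正是在于LLM还能匹配拼写完全不同但语义相同的字段(例如 "dob" 与 "Patient.birthDate")。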

[NLP-59] Risks and NLP Design: A Case Study on Procedural Document QA
[NLP-59] 风险与NLP设计:程序性文档质量保证的案例研究

链接: https://arxiv.org/abs/2408.11860
作者: Nikita Haduong(1),Alice Gao(1),Noah A. Smith(1 and 2) ((1) Paul G. Allen School of Computer Science amp; Engineering, University of Washington, (2) Allen Institute for Artificial Intelligence)
关键词-EN: potential negative impacts, NLP applications, research community, abstract level, potential negative
关键词-ZH: 潜在的负面影响、NLP应用、研究界、抽象水平、潜在的负面影响
类目: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
备注:

点击查看摘要

Abstract:As NLP systems are increasingly deployed at scale, concerns about their potential negative impacts have attracted the attention of the research community, yet discussions of risk have mostly been at an abstract level and focused on generic AI or NLP applications. We argue that clearer assessments of risks and harms to users–and concrete strategies to mitigate them–will be possible when we specialize the analysis to more concrete applications and their plausible users. As an illustration, this paper is grounded in cooking recipe procedural document question answering (ProcDocQA), where there are well-defined risks to users such as injuries or allergic reactions. Our case study shows that an existing language model, applied in “zero-shot” mode, quantitatively answers real-world questions about recipes as well or better than the humans who have answered the questions on the web. Using a novel questionnaire informed by theoretical work on AI risk, we conduct a risk-oriented error analysis that could then inform the design of a future system to be deployed with lower risk of harm and better performance.
摘要:随着NLP系统的大规模部署,其潜在负面影响引起了研究界的关注,但关于风险的讨论大多停留在抽象层面,且集中于泛泛的AI或NLP应用。我们认为,只有将分析聚焦到更具体的应用及其可能的用户群体,才能对用户面临的风险与危害做出更清晰的评估,并制定具体的缓解策略。作为示例,本文立足于烹饪食谱程序性文档问答(ProcDocQA),其中存在对用户而言定义明确的风险,例如受伤或过敏反应。我们的案例研究表明,一个以“零样本”模式应用的现有语言模型,在定量上回答现实世界食谱问题的水平与在网上回答这些问题的人相当甚至更好。借助一份基于AI风险理论工作设计的新问卷,我们进行了面向风险的错误分析,其结论可用于指导未来系统的设计,使其在部署时伤害风险更低、性能更好。

[NLP-60] Convexity-based Pruning of Speech Representation Models
[NLP-60] 基于凸度的语音表示模型修剪

链接: https://arxiv.org/abs/2408.11858
作者: Teresa Dorszewski,Lenka Tětková,Lars Kai Hansen
关键词-EN: Speech representation models, Speech representation, shown great promise, representation models based, keyword spotting
关键词-ZH: 语音表示模型,语音表示,表现出巨大的前景,基于表示模型,关键词发现
类目: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
备注:

点击查看摘要

Abstract:Speech representation models based on the transformer architecture and trained by self-supervised learning have shown great promise for solving tasks such as speech and speaker recognition, keyword spotting, emotion detection, and more. Typically, it is found that larger models lead to better performance. However, the significant computational effort involved in such large transformer systems is a challenge for embedded and real-world applications. Recent work has shown that there is significant redundancy in the transformer models for NLP and massive layer pruning is feasible (Sajjad et al., 2023). Here, we investigate layer pruning in audio models. We base the pruning decision on a convexity criterion. Convexity of classification regions has recently been proposed as an indicator of subsequent fine-tuning performance in a range of application domains, including NLP and audio. In empirical investigations, we find a massive reduction in the computational effort with no loss of performance or even improvements in certain cases.
摘要:基于Transformer架构、通过自监督学习训练的语音表示模型,在语音识别、说话人识别、关键词检出、情感检测等任务上显示出巨大潜力。通常,人们发现更大的模型会带来更好的性能。然而,这类大型Transformer系统所需的大量计算开销,对嵌入式和现实世界应用是一个挑战。最近的工作表明,面向NLP的Transformer模型中存在显著冗余,大规模层剪枝是可行的(Sajjad等人,2023)。在这里,我们研究音频模型中的层剪枝,并基于凸性准则做出剪枝决策。分类区域的凸性最近被提出作为包括NLP和音频在内的一系列应用领域中后续微调性能的指标。在实证研究中,我们发现计算量大幅减少,而性能没有损失,在某些情况下甚至有所提升。
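
为直观说明这种“按逐层评分决定剪枝深度”的思路,下面给出一个极简的Python示意:其中用最近类中心分类的准确率作为凸性准则的粗略替代(原文的凸性度量更严格),层数、类间距与阈值均为演示用的假设值,并非原文实现。

```python
import numpy as np

def layer_score(features, labels):
    """凸性评分的简化替代:用最近类中心分类的准确率
    粗略衡量该层表示中各类区域的可分性(原文使用更严格的凸性准则)。"""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    pred = classes[np.argmin(dists, axis=1)]
    return (pred == labels).mean()

def prune_layers(per_layer_feats, labels, tol=0.01):
    """保留到评分首次接近峰值的层为止,其余更深的层剪掉(示意性策略)。"""
    scores = [layer_score(f, labels) for f in per_layer_feats]
    best = max(scores)
    keep = next(i for i, s in enumerate(scores) if s >= best - tol) + 1
    return keep, scores

# 玩具数据:假设一个4层模型,较深层的表示逐渐更可分,最后一层不再提升
rng = np.random.default_rng(0)
labels = np.array([0] * 50 + [1] * 50)
sep = [0.5, 1.5, 3.0, 3.0]   # 每层的类间距(假设值)
feats = [np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(s, 1, (50, 8))])
         for s in sep]
keep, scores = prune_layers(feats, labels)
print(keep, [round(x, 2) for x in scores])
```

评分在某一层达到峰值后即可截断其后的层,这正是“无性能损失的大幅计算削减”的直观来源。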

[NLP-61] Hermes 3 Technical Report
[NLP-61] 爱马仕3技术报告

链接: https://arxiv.org/abs/2408.11857
作者: Ryan Teknium,Jeffrey Quesnelle,Chen Guang
关键词-EN: large language models, people interact, interact with large, large language, tuned models
关键词-ZH: 大型语言模型,人们互动,与大型语言、调整模型互动
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Instruct (or “chat”) tuned models have become the primary way in which most people interact with large language models. As opposed to “base” or “foundation” models, instruct-tuned models are optimized to respond to imperative statements. We present Hermes 3, a neutrally-aligned generalist instruct and tool use model with strong reasoning and creative abilities. Its largest version, Hermes 3 405B, achieves state of the art performance among open weight models on several public benchmarks.
摘要:指令(或“聊天”)调优模型已成为大多数人与大型语言模型交互的主要方式。与“基座”(base)或“基础”(foundation)模型不同,指令调优模型经过优化以响应指令式陈述。我们提出Hermes 3,这是一个中立对齐的通才型指令与工具使用模型,具有强大的推理和创造能力。其最大版本Hermes 3 405B在多个公开基准测试中取得了开放权重模型中的最先进性能。

[NLP-62] Dynamic Adaptive Optimization for Effective Sentiment Analysis Fine-Tuning on Large Language Models
[NLP-62] 面向大型语言模型情感分析微调的动态自适应优化

链接: https://arxiv.org/abs/2408.11856
作者: Hongcheng Ding,Xuanze Zhao,Shamsul Nahar Abdullah,Deshinta Arrova Dewi,Zixiao Jiang
关键词-EN: Sentiment analysis plays, Sentiment analysis, plays a crucial, crucial role, business intelligence
关键词-ZH: 情绪分析发挥着至关重要的作用,商业智能
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注:

点击查看摘要

Abstract:Sentiment analysis plays a crucial role in various domains, such as business intelligence and financial forecasting. Large language models (LLMs) have become a popular paradigm for sentiment analysis, leveraging multi-task learning to address specific tasks concurrently. However, LLMs with fine-tuning for sentiment analysis often underperforms due to the inherent challenges in managing diverse task complexities. Moreover, constant-weight approaches in multi-task learning struggle to adapt to variations in data characteristics, further complicating model effectiveness. To address these issues, we propose a novel multi-task learning framework with a dynamic adaptive optimization (DAO) module. This module is designed as a plug-and-play component that can be seamlessly integrated into existing models, providing an effective and flexible solution for multi-task learning. The key component of the DAO module is dynamic adaptive loss, which dynamically adjusts the weights assigned to different tasks based on their relative importance and data characteristics during training. Sentiment analyses on a standard and customized financial text dataset demonstrate that the proposed framework achieves superior performance. Specifically, this work improves the Mean Squared Error (MSE) and Accuracy (ACC) by 15.58% and 1.24% respectively, compared with previous work.
摘要:情感分析在商业智能、金融预测等多个领域发挥着至关重要的作用。大型语言模型(LLM)已成为情感分析的流行范式,利用多任务学习同时处理多个特定任务。然而,针对情感分析微调的LLM往往表现不佳,因为在管理不同任务复杂度方面存在固有挑战。此外,多任务学习中的恒定权重方法难以适应数据特征的变化,进一步限制了模型的有效性。为了解决这些问题,我们提出了一种带有动态自适应优化(DAO)模块的新型多任务学习框架。该模块设计为即插即用组件,可以无缝集成到现有模型中,为多任务学习提供有效且灵活的解决方案。DAO模块的关键部分是动态自适应损失,它在训练过程中根据任务的相对重要性和数据特征,动态调整分配给不同任务的权重。在标准和定制的金融文本数据集上的情感分析实验表明,该框架取得了优越的性能。具体而言,与先前工作相比,本工作将均方误差(MSE)降低了15.58%,准确率(ACC)提高了1.24%。
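
下面用一个极简的Python示意来说明“动态调整任务权重”的基本思路:根据各任务近期损失的相对变化分配权重,下降慢的任务获得更大权重(此处的具体公式、temperature参数与玩具数据均为假设,原文的动态自适应损失定义可能不同)。

```python
import numpy as np

def dynamic_weights(loss_history, temperature=1.0):
    """根据各任务近期损失的相对下降速度动态分配权重
    (示意:下降慢的任务给更大权重;原文的具体公式可能不同)。"""
    loss_history = np.asarray(loss_history)                 # 形状 (T, num_tasks)
    ratios = loss_history[-1] / (loss_history[-2] + 1e-8)   # 近似下降速度
    exp = np.exp(ratios / temperature)
    return exp / exp.sum()

def dao_loss(task_losses, weights):
    """加权多任务总损失。"""
    return float(np.dot(weights, task_losses))

history = [[1.0, 1.0], [0.5, 0.9]]   # 任务1收敛快,任务2收敛慢(玩具数据)
w = dynamic_weights(history)
total = dao_loss(history[-1], w)
print(w, total)
```

相比恒定权重,这类规则会在训练中把优化压力持续转向进展较慢的任务。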

[NLP-63] FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
[NLP-63] FactorLLM:通过大型语言模型的专家混合对知识进行分解

链接: https://arxiv.org/abs/2408.11855
作者: Zhongyu Zhao,Menghang Dong,Rongyu Zhang,Wenzhao Zheng,Yunpeng Zhang,Huanrui Yang,Dalong Du,Kurt Keutzer,Shanghang Zhang
关键词-EN: Large Language Models, Large Language, storing diverse linguistic, Recent research, Feed-Forward Networks
关键词-ZH: 大型语言模型、大型语言、存储不同语言、最近的研究、前向网络
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:Recent research has demonstrated that Feed-Forward Networks (FFNs) in Large Language Models (LLMs) play a pivotal role in storing diverse linguistic and factual knowledge. Conventional methods frequently face challenges due to knowledge confusion stemming from their monolithic and redundant architectures, which calls for more efficient solutions with minimal computational overhead, particularly for LLMs. In this paper, we explore the FFN computation paradigm in LLMs and introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications, while maintaining the same level of performance. Furthermore, we embed a router from the Mixture-of-Experts (MoE), combined with our devised Prior-Approximate (PA) loss term that facilitates the dynamic activation of experts and knowledge adaptation, thereby accelerating computational processes and enhancing performance using minimal training data and fine-tuning steps. FactorLLM thus enables efficient knowledge factorization and activates select groups of experts specifically tailored to designated tasks, emulating the interactive functional segmentation of the human brain. Extensive experiments across various benchmarks demonstrate the effectiveness of our proposed FactorLLM which achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed. Code: this https URL.
摘要:最近的研究表明,大型语言模型(LLM)中的前馈网络(FFN)在存储多样的语言与事实知识方面起着关键作用。传统方法因其单体且冗余的架构导致知识混淆而经常面临挑战,这就需要计算开销最小的更高效解决方案,对LLM尤其如此。在本文中,我们探索了LLM中的FFN计算范式,并提出FactorLLM:一种无需任何进一步修改即可将训练好的稠密FFN分解为稀疏子网络、同时保持同等性能水平的新方法。此外,我们嵌入了来自专家混合(MoE)的路由器,并结合我们设计的先验近似(PA)损失项,促进专家的动态激活和知识适应,从而以最少的训练数据和微调步骤加速计算过程并提升性能。FactorLLM由此实现高效的知识分解,并激活专为指定任务定制的专家组,模拟人脑的交互式功能分区。在多个基准上的大量实验证明了FactorLLM的有效性:它取得了与源模型相当的性能,保有高达85%的模型性能,同时推理速度提升超过30%。代码:此https URL。
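
下面是一个极简的Python示意,演示“将稠密FFN划分为专家子网络、由路由器按top-k激活”的一般思路(按隐藏单元均匀切分、随机初始化的路由器等均为演示性假设,并非FactorLLM的实际分解方法或PA损失实现)。

```python
import numpy as np

def split_ffn(W1, W2, num_experts):
    """将稠密FFN的隐藏单元划分为若干专家子网络(示意性分解)。
    W1: (d, h), W2: (h, d)。"""
    h = W1.shape[1]
    idx = np.array_split(np.arange(h), num_experts)
    return [(W1[:, i], W2[i, :]) for i in idx]

def moe_forward(x, experts, router_W, top_k=1):
    """路由器为每个输入选 top-k 个专家,只计算被选中的子网络。"""
    logits = x @ router_W                       # (num_experts,)
    chosen = np.argsort(logits)[-top_k:]
    gates = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()
    out = np.zeros_like(x)
    for g, e in zip(gates, chosen):
        W1e, W2e = experts[e]
        out += g * (np.maximum(x @ W1e, 0) @ W2e)   # ReLU FFN 子网络
    return out

rng = np.random.default_rng(0)
d, h, E = 4, 8, 4
W1, W2 = rng.normal(size=(d, h)), rng.normal(size=(h, d))
experts = split_ffn(W1, W2, E)
router_W = rng.normal(size=(d, E))
x = rng.normal(size=d)
y = moe_forward(x, experts, router_W, top_k=2)
print(y.shape)
```

每次前向只激活 top-k 个子网络,这就是此类分解能减少推理计算量的直观来源。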

[NLP-64] When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications?
[NLP-64] 当原始数据占优时:大型语言模型嵌入对医疗机器学习应用中的数值数据表示有效吗?

链接: https://arxiv.org/abs/2408.11854
作者: Yanjun Gao,Skatje Myers,Shan Chen,Dmitriy Dligach,Timothy A Miller,Danielle Bitterman,Matthew Churpek,Majid Afshar
关键词-EN: Large Language Models, Language Models, Large Language, bringing significant progress, introduction of Large
关键词-ZH: 大型语言模型,语言模型,大型语言,带来重大进展,大型语言的引入
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注: Under review

点击查看摘要

Abstract:The introduction of Large Language Models (LLMs) has advanced data representation and analysis, bringing significant progress in their use for medical questions and answering. Despite these advancements, integrating tabular data, especially numerical data pivotal in clinical contexts, into LLM paradigms has not been thoroughly explored. In this study, we examine the effectiveness of vector representations from last hidden states of LLMs for medical diagnostics and prognostics using electronic health record (EHR) data. We compare the performance of these embeddings with that of raw numerical EHR data when used as feature inputs to traditional machine learning (ML) algorithms that excel at tabular data learning, such as eXtreme Gradient Boosting. We focus on instruction-tuned LLMs in a zero-shot setting to represent abnormal physiological data and evaluating their utilities as feature extractors to enhance ML classifiers for predicting diagnoses, length of stay, and mortality. Furthermore, we examine prompt engineering techniques on zero-shot and few-shot LLM embeddings to measure their impact comprehensively. Although findings suggest the raw data features still prevails in medical ML tasks, zero-shot LLM embeddings demonstrate competitive results, suggesting a promising avenue for future research in medical applications.
摘要:大型语言模型(LLM)的引入推动了数据表示与分析的发展,使其在医学问答中的应用取得了显著进步。尽管有这些进展,将表格数据(特别是在临床场景中至关重要的数值数据)整合到LLM范式中的研究尚不充分。在本研究中,我们利用电子健康记录(EHR)数据,检验了LLM最后隐藏状态的向量表示在医疗诊断和预后任务中的有效性。我们将这些嵌入与原始数值EHR数据进行比较:二者均作为特征输入到擅长表格数据学习的传统机器学习(ML)算法(如eXtreme Gradient Boosting)中。我们重点在零样本设置下用指令调优的LLM表示异常生理数据,并评估其作为特征提取器、用于增强预测诊断结果、住院时长和死亡率的ML分类器的效用。此外,我们还考察了零样本和少样本LLM嵌入上的提示工程技术,以全面衡量其影响。尽管结果表明原始数据特征在医学ML任务中仍占优势,但零样本LLM嵌入展现出有竞争力的结果,为医学应用的未来研究提供了一条有前景的途径。
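
下面的Python玩具示例示意了这类比较的基本设置:同一批标签下,分别用“原始数值特征”与“模拟的嵌入特征”训练同一个简单分类器并比较准确率(此处用最近类中心分类器替代原文的XGBoost,嵌入用随机投影加噪声模拟,均为假设性简化)。

```python
import numpy as np

def centroid_classifier_acc(X, y):
    """最近类中心分类器,作为表格学习器(原文为XGBoost等)的轻量替代。"""
    cents = np.stack([X[y == c].mean(0) for c in (0, 1)])
    pred = np.argmin(((X[:, None] - cents[None]) ** 2).sum(-1), axis=1)
    return (pred == y).mean()

rng = np.random.default_rng(1)
n = 200
y = rng.integers(0, 2, n)
# 原始数值特征:假设两类在若干生理指标上有清晰差异
raw = rng.normal(loc=y[:, None] * 2.0, scale=1.0, size=(n, 5))
# 模拟的"LLM嵌入":对原始信号做随机投影并叠加较大噪声(信息被稀释)
proj = rng.normal(size=(5, 32))
emb = raw @ proj + rng.normal(scale=8.0, size=(n, 32))
acc_raw = centroid_classifier_acc(raw, y)
acc_emb = centroid_classifier_acc(emb, y)
print(acc_raw, acc_emb)
```

这正对应原文的结论:当信号本身是清晰的数值特征时,经过嵌入转换往往并不占优。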

[NLP-65] PyMarian: Fast Neural Machine Translation and Evaluation in Python
[NLP-65] PyMarian:Python中的快速神经机器翻译和评估

链接: https://arxiv.org/abs/2408.11853
作者: Thamme Gowda,Roman Grundkiewicz,Elijah Rippeth,Matt Post,Marcin Junczys-Dowmunt
关键词-EN: deep learning language, technical support, hard to beat, deep learning, choice these days
关键词-ZH: 深度学习语言、技术支持、难以击败、深度学习、如今的选择
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:The deep learning language of choice these days is Python; measured by factors such as available libraries and technical support, it is hard to beat. At the same time, software written in lower-level programming languages like C++ retain advantages in speed. We describe a Python interface to Marian NMT, a C++-based training and inference toolkit for sequence-to-sequence models, focusing on machine translation. This interface enables models trained with Marian to be connected to the rich, wide range of tools available in Python. A highlight of the interface is the ability to compute state-of-the-art COMET metrics from Python but using Marian's inference engine, with a speedup factor of up to 7.8x over the existing implementations. We also briefly spotlight a number of other integrations, including Jupyter notebooks, connection with prebuilt models, and a web app interface provided with the package. PyMarian is available in PyPI via pip install pymarian.
摘要:如今深度学习的首选语言是Python;从可用库和技术支持等因素衡量,它很难被超越。与此同时,用C++等较低层编程语言编写的软件在速度上仍保有优势。我们介绍了Marian NMT的Python接口。Marian NMT是一个基于C++的序列到序列模型训练与推理工具包,侧重于机器翻译。该接口使得用Marian训练的模型能够接入Python中丰富多样的工具。其一大亮点是能够在Python中计算最先进的COMET指标,但使用的是Marian的推理引擎,相比现有实现加速高达7.8倍。我们还简要介绍了许多其他集成,包括Jupyter笔记本、与预构建模型的连接,以及软件包自带的Web应用界面。PyMarian已发布在PyPI上,可通过 pip install pymarian 安装。

[NLP-66] Fast Training Dataset Attribution via In-Context Learning
[NLP-66] 通过上下文学习快速训练数据集归因

链接: https://arxiv.org/abs/2408.11852
作者: Milad Fotouhi,Mohammad Taha Bahadori,Oluwaseyi Feyisetan,Payman Arabshahi,David Heckerman
关键词-EN: instruction-tuned large language, large language models, prompt engineering, engineering to estimate, instruction-tuned large
关键词-ZH: 描述调整的大型语言、大型语言模型、提示工程、工程估计、描述调整的大型
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
备注:

点击查看摘要

Abstract:We investigate the use of in-context learning and prompt engineering to estimate the contributions of training data in the outputs of instruction-tuned large language models (LLMs). We propose two novel approaches: (1) a similarity-based approach that measures the difference between LLM outputs with and without provided context, and (2) a mixture distribution model approach that frames the problem of identifying contribution scores as a matrix factorization task. Our empirical comparison demonstrates that the mixture model approach is more robust to retrieval noise in in-context learning, providing a more reliable estimation of data contributions.
摘要:我们研究了使用上下文学习和提示工程来估计训练数据对指令调优大型语言模型(LLM)输出的贡献。我们提出两种新方法:(1)基于相似性的方法,衡量提供与不提供上下文时LLM输出之间的差异;(2)混合分布模型方法,将识别贡献分数的问题表述为矩阵分解任务。我们的经验比较表明,混合模型方法对上下文学习中的检索噪声更稳健,可以提供更可靠的数据贡献估计。
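
作为第二种方法(矩阵分解)的直观示意,下面的Python片段对一个玩具的“模型输出×候选训练文档”相似度矩阵做最简单的乘法更新NMF分解(矩阵规模、秩与更新公式均为演示性假设,并非原文的混合分布模型)。

```python
import numpy as np

def nmf(S, k, iters=500, seed=0):
    """最简单的乘法更新NMF:S ≈ W @ H(示意;原文的混合分布模型更复杂)。"""
    rng = np.random.default_rng(seed)
    n, m = S.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ S) / (W.T @ W @ H + 1e-9)
        W *= (S @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# 玩具场景:5条模型输出与4篇候选训练文档之间的相似度矩阵(构造为秩2)
rng = np.random.default_rng(1)
true_W = rng.random((5, 2))
true_H = rng.random((2, 4))
S = true_W @ true_H
W, H = nmf(S, k=2)
err = np.linalg.norm(S - W @ H) / np.linalg.norm(S)
print(round(err, 4))
```

分解得到的低秩因子可视为把逐对相似度归结为少数潜在“来源成分”,再由此读出各文档的贡献分数。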

[NLP-67] SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming
[NLP-67] SAGE-RT:用于安全评估与红队测试的合成对齐数据生成

链接: https://arxiv.org/abs/2408.11851
作者: Anurakt Kumar,Divyanshu Kumar,Jatan Loya,Nitin Aravind Birur,Tanay Baswa,Sahil Agarwal,Prashanth Harshangi
关键词-EN: Red Teaming, Evaluation and Red, Safety Evaluation, data Generation, introduce Synthetic Alignment
关键词-ZH: 红色团队化、评估和红色、安全评估、数据生成、引入合成对齐
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
备注:

点击查看摘要

Abstract:We introduce Synthetic Alignment data Generation for Safety Evaluation and Red Teaming (SAGE-RT or SAGE) a novel pipeline for generating synthetic alignment and red-teaming data. Existing methods fall short in creating nuanced and diverse datasets, providing necessary control over the data generation and validation processes, or require large amount of manually generated seed data. SAGE addresses these limitations by using a detailed taxonomy to produce safety-alignment and red-teaming data across a wide range of topics. We generated 51,000 diverse and in-depth prompt-response pairs, encompassing over 1,500 topics of harmfulness and covering variations of the most frequent types of jailbreaking prompts faced by large language models (LLMs). We show that the red-teaming data generated through SAGE jailbreaks state-of-the-art LLMs in more than 27 out of 32 sub-categories, and in more than 58 out of 279 leaf-categories (sub-sub categories). The attack success rate for GPT-4o, GPT-3.5-turbo is 100% over the sub-categories of harmfulness. Our approach avoids the pitfalls of synthetic safety-training data generation such as mode collapse and lack of nuance in the generation pipeline by ensuring a detailed coverage of harmful topics using iterative expansion of the topics and conditioning the outputs on the generated raw-text. This method can be used to generate red-teaming and alignment data for LLM Safety completely synthetically to make LLMs safer or for red-teaming the models over a diverse range of topics.
摘要:我们介绍面向安全评估与红队测试的合成对齐数据生成方法(SAGE-RT,简称SAGE),一种用于生成合成对齐与红队数据的新型流水线。现有方法难以创建细致而多样的数据集,无法对数据生成与验证过程提供必要控制,或需要大量人工生成的种子数据。SAGE通过使用详细的分类体系,在广泛主题上生成安全对齐与红队数据,从而解决了这些限制。我们生成了51,000个多样且深入的提示-响应对,涵盖1,500多个危害性主题,并覆盖大型语言模型(LLM)所面临的最常见越狱提示类型的各种变体。我们的结果显示,通过SAGE生成的红队数据在32个子类别中的27个以上、279个叶子类别(子子类别)中的58个以上成功越狱了最先进的LLM。对于GPT-4o和GPT-3.5-turbo,在各危害性子类别上的攻击成功率为100%。我们的方法通过对主题的迭代扩展确保对有害主题的详细覆盖,并以生成的原始文本为条件约束输出,从而避免了合成安全训练数据生成中的陷阱,例如模式坍缩和生成流水线中细微差别的缺失。该方法可用于完全合成地生成LLM安全所需的红队与对齐数据,使LLM更安全,或在多样化的主题范围上对模型进行红队测试。

[NLP-68] Parallel Speculative Decoding with Adaptive Draft Length
[NLP-68] 具有自适应草案长度的并行推测解码

链接: https://arxiv.org/abs/2408.11850
作者: Tianyu Liu,Yun Li,Qitan Lv,Kai Liu,Jianchen Zhu,Winston Hu
关键词-EN: LLM inference acceleration, shown great power, original target model, target model verifies, extra draft model
关键词-ZH: LLM推理加速,显示强大的力量,原始目标模型,目标模型验证,额外草稿模型
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Speculative decoding (SD), where an extra draft model is employed to provide multiple draft tokens first and then the original target model verifies these tokens in parallel, has shown great power for LLM inference acceleration. However, existing SD methods suffer from the mutual waiting problem, i.e., the target model gets stuck when the draft model is guessing tokens, and vice versa. This problem is directly incurred by the asynchronous execution of the draft model and the target model, and is exacerbated due to the fixed draft length in speculative decoding. To address these challenges, we propose a conceptually simple, flexible, and general framework to boost speculative decoding, namely Parallel spEculative decoding with Adaptive dRaft Length (PEARL). Specifically, PEARL proposes pre-verify to verify the first draft token in advance during the drafting phase, and post-verify to generate more draft tokens during the verification phase. PEARL parallels the drafting phase and the verification phase via applying the two strategies, and achieves adaptive draft length for different scenarios, which effectively alleviates the mutual waiting problem. Moreover, we theoretically demonstrate that the mean accepted tokens of PEARL is more than existing draft-then-verify works. Experiments on various text generation benchmarks demonstrate the effectiveness of PEARL, leading to a superior speedup performance up to 3.79x and 1.52x, compared to auto-regressive decoding and vanilla speculative decoding, respectively.
摘要:推测解码(SD)先用一个额外的草稿模型提供多个草稿令牌,再由原始目标模型并行验证这些令牌,已显示出强大的LLM推理加速能力。然而,现有SD方法存在相互等待问题:当草稿模型在猜测令牌时目标模型被卡住,反之亦然。这一问题由草稿模型与目标模型的异步执行直接导致,并因推测解码中固定的草稿长度而加剧。为应对这些挑战,我们提出了一个概念简单、灵活且通用的推测解码增强框架,即具有自适应草稿长度的并行推测解码(PEARL)。具体而言,PEARL提出“预验证”(pre-verify),在起草阶段提前验证第一个草稿令牌;以及“后验证”(post-verify),在验证阶段生成更多草稿令牌。PEARL通过这两种策略将起草阶段与验证阶段并行化,并针对不同场景实现自适应草稿长度,有效缓解了相互等待问题。此外,我们从理论上证明了PEARL的平均接受令牌数多于现有的“起草-验证”(draft-then-verify)方法。在多个文本生成基准上的实验证明了我们方法的有效性:与自回归解码和普通推测解码相比,分别获得了高达3.79倍和1.52倍的加速。
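
下面用一个纯Python玩具示例示意最基础的“起草-验证”循环,并附带一个PEARL式的自适应草稿长度规则(全部命中则加长、被拒则缩短;该规则与玩具“模型”均为演示性假设,真实的PEARL通过预验证/后验证将两阶段并行化,此处为串行简化)。

```python
def speculative_decode(target_next, draft_next, prompt, max_len=10, gamma=3):
    """最基础的"起草-验证"循环:草稿模型先猜 gamma 个token,
    目标模型逐个核对;gamma 随命中情况自适应增减(示意)。"""
    seq = list(prompt)
    while len(seq) < max_len:
        drafts, s = [], list(seq)
        for _ in range(gamma):                  # 起草阶段
            t = draft_next(s)
            drafts.append(t)
            s.append(t)
        accepted = 0
        for t in drafts:                        # 验证阶段
            if target_next(seq) == t:
                seq.append(t)
                accepted += 1
            else:
                seq.append(target_next(seq))    # 被拒时用目标模型纠正
                break
        # 自适应草稿长度:全部命中则加长,出现拒绝则缩短
        gamma = gamma + 1 if accepted == len(drafts) else max(1, gamma - 1)
    return seq[:max_len]

# 玩具"模型":目标模型输出严格递增序列,草稿模型大多数时候猜对
target = lambda s: s[-1] + 1
draft = lambda s: s[-1] + 1 if s[-1] % 4 else s[-1] + 2   # 在4的倍数处猜错
out = speculative_decode(target, draft, [1], max_len=8)
print(out)
```

注意被拒时总是回退到目标模型的输出,因此最终序列与纯自回归解码完全一致,省下的只是目标模型的调用次数。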

[NLP-69] Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
[NLP-69] Style-Talker:微调音频语言模型和基于风格的文本到语音模型,用于快速口语对话生成

链接: https://arxiv.org/abs/2408.11849
作者: Yinghao Aaron Li,Xilin Jiang,Jordan Darefsky,Ge Zhu,Nima Mesgarani
关键词-EN: large language models, contextually relevant dialogues, text-based chatbots, demonstrating their capability, large language
关键词-ZH: 大型语言模型、上下文相关对话、基于文本的聊天机器人、展示其能力、大型语言
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
备注: CoLM 2024

点击查看摘要

Abstract:The rapid advancement of large language models (LLMs) has significantly propelled the development of text-based chatbots, demonstrating their capability to engage in coherent and contextually relevant dialogues. However, extending these advancements to enable end-to-end speech-to-speech conversation bots remains a formidable challenge, primarily due to the extensive dataset and computational resources required. The conventional approach of cascading automatic speech recognition (ASR), LLM, and text-to-speech (TTS) models in a pipeline, while effective, suffers from unnatural prosody because it lacks direct interactions between the input audio and its transcribed text and the output audio. These systems are also limited by their inherent latency from the ASR process for real-time applications. This paper introduces Style-Talker, an innovative framework that fine-tunes an audio LLM alongside a style-based TTS model for fast spoken dialog generation. Style-Talker takes user input audio and uses transcribed chat history and speech styles to generate both the speaking style and text for the response. Subsequently, the TTS model synthesizes the speech, which is then played back to the user. While the response speech is being played, the input speech undergoes ASR processing to extract the transcription and speaking style, serving as the context for the ensuing dialogue turn. This novel pipeline accelerates the traditional cascade ASR-LLM-TTS systems while integrating rich paralinguistic information from input speech. Our experimental results show that Style-Talker significantly outperforms the conventional cascade and speech-to-speech baselines in terms of both dialogue naturalness and coherence while being more than 50% faster.
摘要:大型语言模型的快速发展极大地推动了基于文本的聊天机器人的发展,显示了它们参与连贯和上下文相关对话的能力。然而,扩展这些改进以实现端到端的语音到语音对话机器人仍然是一个艰巨的挑战,主要是因为需要大量的数据集和计算资源。传统的在流水线中级联自动语音识别(ASR)、LLM和文本到语音(TTS)模型的方法虽然有效,但由于缺乏输入音频与其转录文本和输出音频之间的直接交互,因此存在不自然的韵律。这些系统还受到来自实时应用程序的ASR过程的固有延迟的限制。本文介绍了Style-Talker,这是一个创新的框架,它结合基于样式的TTS模型对音频LLM进行微调,以实现快速口语对话生成。Style-Talker获取用户输入的音频,并使用转录的聊天历史和语音风格来生成发言风格和响应文本。随后,TTS模型合成语音,然后将其播放给用户。在播放应答语音时,输入语音经过ASR处理以提取转录和说话风格,作为随后的对话话轮的上下文。这种新的流水线加速了传统的级联ASR-LLM-TTS系统,同时集成了来自输入语音的丰富的副语言信息。我们的实验结果表明,Style-Talker在对话自然度和连贯性方面都明显优于传统的级联和语音到语音基线,并且速度要快50%以上。

[NLP-70] MGH Radiology Llama: A Llama 3 70B Model for Radiology
[NLP-70] MGH Radiology Llama:用于放射学的Llama 3 70B模型

链接: https://arxiv.org/abs/2408.11848
作者: Yucheng Shi,Peng Shu,Zhengliang Liu,Zihao Wu,Quanzheng Li,Xiang Li
关键词-EN: enhance diagnostic accuracy, improve patient care, streamline workflows, recent years, artificial intelligence
关键词-ZH: 提高诊断准确性、改善患者护理、简化工作流程、近年来、人工智能
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: 11 pages, 3 figures, 1 table

点击查看摘要

Abstract:In recent years, the field of radiology has increasingly harnessed the power of artificial intelligence (AI) to enhance diagnostic accuracy, streamline workflows, and improve patient care. Large language models (LLMs) have emerged as particularly promising tools, offering significant potential in assisting radiologists with report generation, clinical decision support, and patient communication. This paper presents an advanced radiology-focused large language model: MGH Radiology Llama. It is developed using the Llama 3 70B model, building upon previous domain-specific models like Radiology-GPT and Radiology-Llama2. Leveraging a unique and comprehensive dataset from Massachusetts General Hospital, comprising over 6.5 million de-identified medical reports across various imaging modalities, the model demonstrates significant improvements in generating accurate and clinically relevant radiology impressions given the corresponding findings. Our evaluation, incorporating both traditional metrics and a GPT-4-based assessment, highlights the enhanced performance of this work over general-purpose LLMs.
摘要:近年来,放射学领域越来越多地利用人工智能(AI)来提高诊断准确性、简化工作流程并改善患者护理。大型语言模型(LLM)已成为特别有前景的工具,在协助放射科医生生成报告、支持临床决策和与患者沟通方面潜力巨大。本文提出了一个先进的、面向放射学的大型语言模型:MGH Radiology Llama。它基于Llama 3 70B模型开发,构建于此前的领域专用模型(如Radiology-GPT和Radiology-Llama2)之上。该模型利用来自麻省总医院的独特而全面的数据集(包含各种成像模态下的650多万份去标识化医疗报告),在给定相应检查所见时生成准确且具有临床相关性的放射学印象方面展现出显著提升。我们的评估结合了传统指标和基于GPT-4的评估,突显了这项工作相对于通用LLM的性能优势。

[NLP-71] Prompto: An open source library for asynchronous querying of LLM endpoints
[NLP-71] prompto:用于异步查询LLM端点的开源库

链接: https://arxiv.org/abs/2408.11847
作者: Ryan Sze-Yin Chan,Federico Nanni,Edwin Brown,Ed Chapman,Angus R. Williams,Jonathan Bright,Evelina Gabasova
关键词-EN: Large Language Model, Large Language, opened exciting avenues, Recent surge, surge in Large
关键词-ZH: 大型语言模型,大型语言,开辟了令人兴奋的道路,最近激增,大型激增
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:Recent surge in Large Language Model (LLM) availability has opened exciting avenues for research. However, efficiently interacting with these models presents a significant hurdle since LLMs often reside on proprietary or self-hosted API endpoints, each requiring custom code for interaction. Conducting comparative studies between different models can therefore be time-consuming and necessitate significant engineering effort, hindering research efficiency and reproducibility. To address these challenges, we present prompto, an open source Python library which facilitates asynchronous querying of LLM endpoints enabling researchers to interact with multiple LLMs concurrently, while maximising efficiency and utilising individual rate limits. Our library empowers researchers and developers to interact with LLMs more effectively and enabling faster experimentation and evaluation. prompto is released with an introductory video (this https URL) under MIT License and is available via GitHub (this https URL).
摘要:最近大型语言模型(LLM)可用性的激增为研究开辟了令人兴奋的道路。然而,与这些模型高效交互存在重大障碍,因为LLM通常驻留在专有或自托管的API端点上,每个端点都需要定制的交互代码。因此,在不同模型之间进行比较研究可能非常耗时,且需要大量工程工作,妨碍研究效率与可复现性。为应对这些挑战,我们推出prompto,一个开源Python库,支持对LLM端点的异步查询,使研究人员能够同时与多个LLM交互,在最大化效率的同时利用各端点各自的速率限制。我们的库使研究人员和开发者能够更有效地与LLM交互,从而更快地进行实验和评估。prompto以MIT许可证发布,附有介绍视频(此https URL),并可通过GitHub获取(此https URL)。

[NLP-72] Density Matrices for Metaphor Understanding
[NLP-72] 隐喻理解的密度矩阵

链接: https://arxiv.org/abs/2408.11846
作者: Jay Owers,Ekaterina Shutova,Martha Lewis
关键词-EN: represent mixed states, mixed states, pure states, lexical ambiguity, represent mixed
关键词-ZH: 代表混合状态,混合状态,纯状态,词汇歧义,代表混合
类目: Computation and Language (cs.CL)
备注: In Proceedings QPL 2024, arXiv:2408.05113

点击查看摘要

Abstract:In physics, density matrices are used to represent mixed states, i.e. probabilistic mixtures of pure states. This concept has previously been used to model lexical ambiguity. In this paper, we consider metaphor as a type of lexical ambiguity, and examine whether metaphorical meaning can be effectively modelled using mixtures of word senses. We find that modelling metaphor is significantly more difficult than other kinds of lexical ambiguity, but that our best-performing density matrix method outperforms simple baselines as well as some neural language models.
摘要:在物理学中,密度矩阵用于表示混合态,即纯态的概率混合。这一概念此前已被用于建模词汇歧义。在本文中,我们将隐喻视为一种词汇歧义,并研究能否用词义的混合来有效建模隐喻意义。我们发现,对隐喻建模比对其他类型的词汇歧义建模要困难得多,但我们表现最好的密度矩阵方法优于简单基线以及一些神经语言模型。
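
下面的Python片段示意如何用词义向量的概率混合构造密度矩阵 ρ = Σᵢ pᵢ|vᵢ⟩⟨vᵢ|,并用纯度 Tr(ρ²) 衡量混合(歧义)程度(玩具词义向量与等概率设定均为演示性假设,并非原文的具体建模)。

```python
import numpy as np

def density_matrix(sense_vectors, probs):
    """混合态密度矩阵 ρ = Σ_i p_i |v_i⟩⟨v_i|,每个纯态是一个归一化词义向量。"""
    rho = np.zeros((sense_vectors.shape[1],) * 2)
    for p, v in zip(probs, sense_vectors):
        v = v / np.linalg.norm(v)
        rho += p * np.outer(v, v)
    return rho

def purity(rho):
    """纯度 Tr(ρ²):纯态为1,混合越强数值越小,可作歧义程度的度量。"""
    return float(np.trace(rho @ rho))

# 玩具例子:"bank" 的两个词义(金融机构/河岸),以正交向量、等概率混合表示
senses = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
rho = density_matrix(senses, [0.5, 0.5])
print(round(purity(rho), 3))
```

两个正交词义等概率混合时纯度为0.5;若只有单一词义(纯态),纯度为1。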

[NLP-73] LLaMA based Punctuation Restoration With Forward Pass Only Decoding
[NLP-73] 基于LLaMA的标点符号恢复,仅前向传递解码

链接: https://arxiv.org/abs/2408.11845
作者: Yutong Pang,Debjyoti Paul,Kevin Jiang,Xuedong Zhang,Xin Lei
关键词-EN: Large Language Model, Language Model Annotation, field of Large, Large Language, Language Model
关键词-ZH: 大型语言模型,语言模型注释,大型领域,大型语言,语言模型
类目: Computation and Language (cs.CL)
备注:

点击查看摘要

Abstract:This paper introduces two advancements in the field of Large Language Model Annotation with a focus on punctuation restoration tasks. Our first contribution is the application of LLaMA for punctuation restoration, which demonstrates superior performance compared to the established benchmark. Despite its impressive quality, LLaMA faces challenges regarding inference speed and hallucinations. To address this, our second contribution presents Forward Pass Only Decoding (FPOD), a novel decoding approach for annotation tasks. This innovative method results in a substantial 19.8x improvement in inference speed, effectively addressing a critical bottleneck and enhancing the practical utility of LLaMA for large-scale data annotation tasks without hallucinations. The combination of these contributions not only solidifies LLaMA as a powerful tool for punctuation restoration but also highlights FPOD as a crucial strategy for overcoming speed constraints.
摘要:本文介绍了大型语言模型标注领域的两项进展,重点是标点符号恢复任务。我们的第一项贡献是将LLaMA应用于标点符号恢复,相比既有基准展现出更优的性能。尽管质量令人印象深刻,LLaMA在推理速度和幻觉方面仍面临挑战。为此,我们的第二项贡献提出了仅前向传递解码(FPOD),一种用于标注任务的新型解码方法。该创新方法使推理速度大幅提升19.8倍,有效解决了一个关键瓶颈,并在无幻觉的前提下增强了LLaMA在大规模数据标注任务中的实用价值。这些贡献的结合不仅巩固了LLaMA作为标点符号恢复强大工具的地位,也凸显了FPOD是克服速度限制的关键策略。

[NLP-74] Editable Fairness: Fine-Grained Bias Mitigation in Language Models
[NLP-74] 可编辑公平性:语言模型中的细粒度偏差缓解

链接: https://arxiv.org/abs/2408.11843
作者: Ruizhe Chen,Yichen Li,Jianfei Yang,Joey Tianyi Zhou,Zuozhu Liu
关键词-EN: deploying large language, Generating fair, accurate predictions plays, large language models, real world
关键词-ZH: 部署大型语言、生成公平、准确的预测游戏、大型语言模型、现实世界
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: arXiv admin note: substantial text overlap with arXiv:2405.09341

点击查看摘要

Abstract:Generating fair and accurate predictions plays a pivotal role in deploying large language models (LLMs) in the real world. However, existing debiasing methods inevitably generate unfair or incorrect predictions as they are designed and evaluated to achieve parity across different social groups but leave aside individual commonsense facts, resulting in modified knowledge that elicits unreasonable or undesired predictions. In this paper, we first establish a new bias mitigation benchmark, BiaScope, which systematically assesses performance by leveraging newly constructed datasets and metrics on knowledge retention and generalization. Then, we propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases. FAST identifies the decisive layer responsible for storing social biases and then calibrates its outputs by integrating a small modular network, considering both bias mitigation and knowledge-preserving demands. Comprehensive experiments demonstrate that FAST surpasses state-of-the-art baselines with superior debiasing performance while not compromising the overall model capability for knowledge retention and downstream predictions. This highlights the potential of fine-grained debiasing strategies to achieve fairness in LLMs. Code will be publicly available.
摘要:生成公平且准确的预测,对于在现实世界中部署大型语言模型(LLM)至关重要。然而,现有去偏方法不可避免地产生不公平或不正确的预测:它们的设计与评估旨在实现不同社会群体之间的均等,却忽略了个体常识性事实,导致修改后的知识引出不合理或不符合预期的预测。在本文中,我们首先建立了一个新的偏差缓解基准BiaScope,利用新构建的数据集以及关于知识保留和泛化的指标来系统地评估性能。然后,我们提出了一种新的去偏方法,即公平戳(Fairness Stamp, FAST),它能对个体社会偏见进行细粒度校准。FAST定位负责存储社会偏见的决定性层,然后通过集成一个小型模块化网络来校准其输出,同时兼顾偏差缓解与知识保留的需求。综合实验表明,FAST以更优的去偏性能超越了最先进的基线,同时不损害模型在知识保留和下游预测方面的整体能力。这凸显了细粒度去偏策略在LLM中实现公平的潜力。代码将公开发布。

[NLP-75] Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants
[NLP-75] ChatGPT可以获得工程学位吗?评估高等教育对人工智能助理的脆弱性

链接: https://arxiv.org/abs/2408.11841
作者: Beatriz Borges,Negar Foroutan,Deniz Bayazit,Anna Sotnikova,Syrielle Montariol,Tanya Nazaretzky,Mohammadreza Banaei,Alireza Sakhaeirad,Philippe Servant,Seyed Parsa Neshaei,Jibril Frej,Angelika Romanou,Gail Weiss,Sepideh Mamooler,Zeming Chen,Simin Fan,Silin Gao,Mete Ismayilzada,Debjit Paul,Alexandre Schöpfer,Andrej Janchevski,Anja Tiede,Clarence Linden,Emanuele Troiani,Francesco Salvi,Freya Behrens,Giacomo Orsi,Giovanni Piccioli,Hadrien Sevel,Louis Coulon,Manuela Pineros-Rodriguez,Marin Bonnassies,Pierre Hellich,Puck van Gerwen,Sankalp Gambhir,Solal Pirelli,Thomas Blanchard,Timothée Callens,Toni Abi Aoun,Yannick Calvino Alonso,Yuri Cho,Alberto Chiappa,Antonio Sclocchi,Étienne Bruno,Florian Hofhammer,Gabriel Pescia,Geovani Rizk,Leello Dadi,Lucas Stoffl,Manoel Horta Ribeiro,Matthieu Bovel,Yueyang Pan,Aleksandra Radenovic,Alexandre Alahi,Alexander Mathis,Anne-Florence Bitbol,Boi Faltings,Cécile Hébert,Devis Tuia,François Maréchal,George Candea,Giuseppe Carleo,Jean-Cédric Chappelier,Nicolas Flammarion,Jean-Marie Fürbringer,Jean-Philippe Pellet,Karl Aberer,Lenka Zdeborová,Marcel Salathé,Martin Jaggi,Martin Rajman,Mathias Payer,Matthieu Wyart,Michael Gastpar,Michele Ceriotti,Ola Svensson,Olivier Lévêque,Paolo Ienne,Rachid Guerraoui,Robert West,Sanidhya Kashyap,Valerio Piazza,Viesturs Simanis,Viktor Kuncak,Volkan Cevher,Philippe Schwaller,Sacha Friedli,Patrick Jermann,Tanja Kaser,Antoine Bosselut
关键词-EN: higher education institutions, students enrolled, higher education, learning outcomes, education institutions
关键词-ZH: 高等教育机构、入学学生、高等教育、学习成果、教育机构
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
备注: 20 pages, 8 figures

点击查看摘要

Abstract:AI assistants are being increasingly used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes. We conceptualize these challenges through the lens of vulnerability, the potential for university assessments and learning outcomes to be impacted by student use of generative AI. We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses. Specifically, we compile a novel dataset of textual assessment questions from 50 courses at EPFL and evaluate whether two AI assistants, GPT-3.5 and GPT-4 can adequately answer these questions. We use eight prompting strategies to produce responses and find that GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. When grouping courses in our dataset by degree program, these systems already pass non-project assessments of large numbers of core courses in various degree programs, posing risks to higher education accreditation that will be amplified as these models improve. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.
摘要:高等教育机构的在校学生越来越多地使用人工智能助手。虽然这些工具为改进教学和教育提供了机会,但它们也对评估和学习成果构成了重大挑战。我们通过"脆弱性"这一视角来概念化这些挑战,即大学评估和学习成果可能因学生使用生成式人工智能而受到影响。我们通过测量人工智能助手在标准大学STEM课程中完成评估题目的程度,来考察这种脆弱性的潜在规模。具体而言,我们从EPFL的50门课程中收集了一个新的文本评估题目数据集,并评估GPT-3.5和GPT-4这两个人工智能助手能否充分回答这些题目。我们使用八种提示策略生成回答,发现GPT-4平均正确回答了65.8%的题目,并且对于85.1%的题目,至少有一种提示策略能给出正确答案。按学位项目对数据集中的课程进行分组后发现,这些系统已经能够通过多个学位项目中大量核心课程的非项目类评估,这给高等教育认证带来了风险,而且随着模型的改进,这种风险还会被放大。我们的结果呼吁在生成式人工智能不断进步的背景下,重新审视高等教育中项目层面的评估设计。
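摘要中的两个指标(平均正确率,以及"至少一种提示策略答对"的题目比例)可按如下方式计算;其中的数据为虚构示例,真实研究使用了50门课程与八种提示策略:

```python
# 假设性示例:按题目聚合多种提示策略的答题结果
def strategy_metrics(results):
    """results: {题目id: [各提示策略是否答对的布尔值]}"""
    n = len(results)
    # 平均正确率:各题"答对策略占比"的平均值
    mean_acc = sum(sum(v) / len(v) for v in results.values()) / n
    # "至少一种策略答对"的题目比例
    any_acc = sum(any(v) for v in results.values()) / n
    return mean_acc, any_acc

demo = {
    "q1": [True, True, False, True],
    "q2": [False, False, True, False],
    "q3": [False, False, False, False],
}
mean_acc, any_acc = strategy_metrics(demo)
```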

[NLP-76] OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs EMNLP2024
[NLP-76] OpenFactCheck:LLM事实评估的统一框架

链接: https://arxiv.org/abs/2408.11832
作者: Hasan Iqbal,Yuxia Wang,Minghan Wang,Georgi Georgiev,Jiahui Geng,Iryna Gurevych,Preslav Nakov
关键词-EN: large language models, real-world applications calls, https URL, language models, large language
关键词-ZH: 大型语言模型、现实世界应用程序调用、https URL、语言模型、大型语言
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
备注: 10 pages, 4 Figures, 3 Tables, Submitted to EMNLP 2024 System Demonstration. arXiv admin note: substantial text overlap with arXiv:2405.05583

点击查看摘要

Abstract:The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucinate. This is difficult as it requires assessing the factuality of free-form open-domain responses. While there has been a lot of research on this topic, different papers use different evaluation benchmarks and measures, which makes them hard to compare and hampers future progress. To mitigate these issues, we developed OpenFactCheck, a unified framework, with three modules: (i) RESPONSEEVAL, which allows users to easily customize an automatic fact-checking system and to assess the factuality of all claims in an input document using that system, (ii) LLMEVAL, which assesses the overall factuality of an LLM, and (iii) CHECKEREVAL, a module to evaluate automatic fact-checking systems. OpenFactCheck is open-sourced (this https URL) and publicly released as a Python library (this https URL) and also as a web service (this https URL). A video describing the system is available at this https URL.
摘要:大型语言模型(LLM)在各类现实应用中的使用日益增多,而LLM经常产生幻觉,这就需要自动化工具来核查其输出的事实准确性。这项任务并不容易,因为它需要评估自由形式、开放领域回答的真实性。尽管这一主题已有大量研究,但不同论文使用不同的评估基准和指标,难以相互比较,也阻碍了后续进展。为缓解这些问题,我们开发了统一框架OpenFactCheck,它包含三个模块:(i) RESPONSEEVAL,允许用户轻松定制自动事实核查系统,并用该系统评估输入文档中所有声明的真实性;(ii) LLMEVAL,评估一个LLM的整体真实性;(iii) CHECKEREVAL,用于评价自动事实核查系统的模块。OpenFactCheck是开源的(this https URL),并以Python库(this https URL)和Web服务(this https URL)的形式公开发布。介绍该系统的视频见 this https URL。
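OpenFactCheck的三模块划分可以粗略示意如下(函数名与接口均为本文作者的假设,仅作说明,实际API以官方Python库为准):

```python
# 假设性示意:三个模块的职责划分,并非 OpenFactCheck 的真实接口
def response_eval(claims, checker):
    """RESPONSEEVAL:用可插拔的事实核查器逐条核查声明"""
    return {c: checker(c) for c in claims}

def llm_eval(responses, checker):
    """LLMEVAL:以"判定为真的声明占比"衡量LLM整体真实性"""
    verdicts = [v for claims in responses
                for v in response_eval(claims, checker).values()]
    return sum(verdicts) / len(verdicts)

def checker_eval(checker, labeled_claims):
    """CHECKEREVAL:用带金标签的声明评估核查器自身的准确率"""
    return sum(checker(c) == y for c, y in labeled_claims) / len(labeled_claims)

# 玩具知识库充当事实核查器
toy_kb = {"water boils at 100C": True, "the moon is made of cheese": False}
toy_checker = lambda claim: toy_kb.get(claim, False)

report = response_eval(list(toy_kb), toy_checker)
score = llm_eval([list(toy_kb)], toy_checker)
acc = checker_eval(toy_checker, list(toy_kb.items()))
```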

[NLP-77] The Mechanics of Conceptual Interpretation in GPT Models: Interpretative Insights
[NLP-77] GPT模型中的概念解释机制:解释性见解

链接: https://arxiv.org/abs/2408.11827
作者: Nura Aljaafari,Danilo S. Carvalho,André Freitas
关键词-EN: large language models, enhancing their accuracy, large language, crucial for enhancing, Locating and editing
关键词-ZH: 大型语言模型,提高其准确性,大型语言,对于增强、定位和编辑至关重要
类目: Computation and Language (cs.CL)
备注: 23 pages, 25 figures

点击查看摘要

Abstract:Locating and editing knowledge in large language models (LLMs) is crucial for enhancing their accuracy, safety, and inference rationale. We introduce “concept editing”, an innovative variation of knowledge editing that uncovers conceptualisation mechanisms within these models. Using the reverse dictionary task, inference tracing, and input abstraction, we analyse the Multi-Layer Perceptron (MLP), Multi-Head Attention (MHA), and hidden state components of transformer models. Our results reveal distinct patterns: MLP layers employ a key-value retrieval mechanism and context-dependent processing, which are highly associated with relative input tokens. MHA layers demonstrate a distributed nature with significant higher-level activations, suggesting sophisticated semantic integration. Hidden states emphasise the importance of the last token and top layers in the inference process. We observe evidence of gradual information building and distributed representation. These observations elucidate how transformer models process semantic information, paving the way for targeted interventions and improved interpretability techniques. Our work highlights the complex, layered nature of semantic processing in LLMs and the challenges of isolating and modifying specific concepts within these models.
摘要:在大型语言模型(LLM)中定位和编辑知识,对于提升其准确性、安全性和推理依据至关重要。我们提出"概念编辑"(concept editing),这是知识编辑的一种创新变体,用于揭示这些模型内部的概念化机制。借助反向词典任务、推理追踪和输入抽象,我们分析了Transformer模型的多层感知机(MLP)、多头注意力(MHA)和隐藏状态组件。结果揭示了不同的模式:MLP层采用键值检索机制和依赖上下文的处理方式,与相应的输入token高度关联;MHA层呈现分布式特性,且高层激活显著,表明其进行了复杂的语义整合;隐藏状态则突出了最后一个token和顶层在推理过程中的重要性。我们还观察到信息逐步构建和分布式表示的证据。这些观察阐明了Transformer模型如何处理语义信息,为有针对性的干预和更好的可解释性技术铺平了道路。我们的工作凸显了LLM中语义处理的复杂分层特性,以及在这些模型中分离和修改特定概念所面临的挑战。
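摘要中"MLP层采用键值检索机制"的观点,可以用如下玩具代码示意:第一层线性变换的行向量充当"键",第二层的行向量充当"值",激活强度决定取回哪些值(纯属示意,并非GPT的真实实现):

```python
# 玩具示例:把两层 MLP 看作键值记忆
def mlp_as_kv_memory(x, keys, values):
    # 键匹配:输入与每个键做点积,经 ReLU 门控得到激活强度
    acts = [max(0.0, sum(xi * ki for xi, ki in zip(x, k))) for k in keys]
    # 输出:按激活强度加权求和各"值"向量
    dim = len(values[0])
    return [sum(a * v[d] for a, v in zip(acts, values)) for d in range(dim)]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 3.0]]
out = mlp_as_kv_memory([1.0, 0.0], keys, values)  # 只命中第一个键
```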

人工智能

[AI-0] ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction

链接: https://arxiv.org/abs/2408.12598
作者: Ziyu Tang,Weicai Ye,Yifan Wang,Di Huang,Hujun Bao,Tong He,Guofeng Zhang
关键词-EN: Neural implicit reconstruction, Neural implicit, recovering dense, implicit reconstruction, reconstruction via volume
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Neural implicit reconstruction via volume rendering has demonstrated its effectiveness in recovering dense 3D surfaces. However, it is non-trivial to simultaneously recover meticulous geometry and preserve smoothness across regions with differing characteristics. To address this issue, previous methods typically employ geometric priors, which are often constrained by the performance of the prior models. In this paper, we propose ND-SDF, which learns a Normal Deflection field to represent the angular deviation between the scene normal and the prior normal. Unlike previous methods that uniformly apply geometric priors on all samples, introducing significant bias in accuracy, our proposed normal deflection field dynamically learns and adapts the utilization of samples based on their specific characteristics, thereby improving both the accuracy and effectiveness of the model. Our method not only obtains smooth weakly textured regions such as walls and floors but also preserves the geometric details of complex structures. In addition, we introduce a novel ray sampling strategy based on the deflection angle to facilitate the unbiased rendering process, which significantly improves the quality and accuracy of intricate surfaces, especially on thin structures. Consistent improvements on various challenging datasets demonstrate the superiority of our method.

[AI-1] Differentiable Logic Programming for Distant Supervision ECAI2024

链接: https://arxiv.org/abs/2408.12591
作者: Akihiro Takemura,Katsumi Inoue
关键词-EN: integrating neural networks, programming in Neural-Symbolic, distant supervision, direct labels, integrating neural
类目: Artificial Intelligence (cs.AI)
*备注: To be published in ECAI 2024

点击查看摘要

Abstract:We introduce a new method for integrating neural networks with logic programming in Neural-Symbolic AI (NeSy), aimed at learning with distant supervision, in which direct labels are unavailable. Unlike prior methods, our approach does not depend on symbolic solvers for reasoning about missing labels. Instead, it evaluates logical implications and constraints in a differentiable manner by embedding both neural network outputs and logic programs into matrices. This method facilitates more efficient learning under distant supervision. We evaluated our approach against existing methods while maintaining a constant volume of training data. The findings indicate that our method not only matches or exceeds the accuracy of other methods across various tasks but also speeds up the learning process. These results highlight the potential of our approach to enhance both accuracy and learning efficiency in NeSy applications.

[AI-2] xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations ECCV24

链接: https://arxiv.org/abs/2408.12590
作者: Can Qin,Congying Xia,Krithika Ramakrishnan,Michael Ryoo,Lifu Tu,Yihao Feng,Manli Shu,Honglu Zhou,Anas Awadalla,Jun Wang,Senthil Purushwalkam,Le Xue,Yingbo Zhou,Huan Wang,Silvio Savarese,Juan Carlos Niebles,Zeyuan Chen,Ran Xu,Caiming Xiong
关键词-EN: producing realistic scenes, textual descriptions, capable of producing, producing realistic, realistic scenes
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注: Accepted by ECCV24 AI4VA

点击查看摘要

Abstract:We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions. Building on recent advancements, such as OpenAI’s Sora, we explore the latent diffusion model (LDM) architecture and introduce a video variational autoencoder (VidVAE). VidVAE compresses video data both spatially and temporally, significantly reducing the length of visual tokens and the computational demands associated with generating long-sequence videos. To further address the computational costs, we propose a divide-and-merge strategy that maintains temporal consistency across video segments. Our Diffusion Transformer (DiT) model incorporates spatial and temporal self-attention layers, enabling robust generalization across different timeframes and aspect ratios. We have devised a data processing pipeline from the very beginning and collected over 13M high-quality video-text pairs. The pipeline includes multiple steps such as clipping, text detection, motion estimation, aesthetics scoring, and dense captioning based on our in-house video-LLM model. Training the VidVAE and DiT models required approximately 40 and 642 H100 days, respectively. Our model supports over 14-second 720p video generation in an end-to-end way and demonstrates competitive performance against state-of-the-art T2V models.

[AI-3] Identifying the Best Arm in the Presence of Global Environment Shifts ECAI2024

链接: https://arxiv.org/abs/2408.12581
作者: Phurinut Srisawad,Juergen Branke,Long Tran-Thanh
关键词-EN: Best-Arm Identification problem, non-stationary stochastic bandits, Best-Arm Identification, Identification problem, stochastic bandits setting
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: Extended version of the paper accepted at the 27th European Conference on Artificial Intelligence (ECAI 2024); Paper ID: M1125

点击查看摘要

Abstract:This paper formulates a new Best-Arm Identification problem in the non-stationary stochastic bandits setting, where the means of all arms are shifted in the same way due to a global influence of the environment. The aim is to identify the unique best arm across environmental change given a fixed total budget. While this setting can be regarded as a special case of Adversarial Bandits or Corrupted Bandits, we demonstrate that existing solutions tailored to those settings do not fully utilise the nature of this global influence, and thus, do not work well in practice (despite their theoretical guarantees). To overcome this issue, in this paper we develop a novel selection policy that is consistent and robust in dealing with global environmental shifts. We then propose an allocation policy, LinLUCB, which exploits information about global shifts across all arms in each environment. Empirical tests depict a significant improvement in our policies against other existing methods.
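下面用一个简化模拟说明"全局环境偏移不改变臂间差异"这一关键性质:每个环境给所有臂的均值加上同一偏移量,因此在环境内部比较各臂并跨环境汇总,即可识别唯一的最优臂(这是一个示意性估计器,并非论文提出的LinLUCB):

```python
import random

# 假设性示意:利用"臂间差异对全局偏移不变"识别最优臂
def estimate_best_arm(samples):
    """samples: 各环境的观测,每个环境为 {臂: [奖励样本]}"""
    arms = list(samples[0].keys())
    score = {a: 0.0 for a in arms}
    for env in samples:
        env_mean = sum(sum(r) / len(r) for r in env.values()) / len(env)
        for a, r in env.items():
            score[a] += sum(r) / len(r) - env_mean  # 减去全局偏移的影响
    return max(score, key=score.get)

rng = random.Random(0)
true_means = {"a": 0.2, "b": 0.5, "c": 0.4}
envs = []
for shift in (0.0, 3.0, -1.5):  # 三个环境各自的全局偏移
    envs.append({arm: [m + shift + rng.gauss(0, 0.1) for _ in range(200)]
                 for arm, m in true_means.items()})
best = estimate_best_arm(envs)
```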

[AI-4] RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment

链接: https://arxiv.org/abs/2408.12579
作者: Xiaohan Wang,Xiaoyan Yang,Yuqi Zhu,Yue Shen,Jian Wang,Peng Wei,Lei Liang,Jinjie Gu,Huajun Chen,Ningyu Zhang
关键词-EN: Large Language Models, Large Language, Language Models, Med-Gemini achieve performance, achieve performance competitively
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Machine Learning (cs.LG)
*备注: Ongoing work

点击查看摘要

Abstract:Large Language Models (LLMs) like GPT-4, MedPaLM-2, and Med-Gemini achieve performance competitively with human experts across various medical benchmarks. However, they still face challenges in making professional diagnoses akin to physicians, particularly in efficiently gathering patient information and reasoning the final diagnosis. To this end, we introduce the RuleAlign framework, designed to align LLMs with specific diagnostic rules. We develop a medical dialogue dataset comprising rule-based communications between patients and physicians and design an alignment learning approach through preference learning. Experimental results demonstrate the effectiveness of the proposed approach. We hope that our work can serve as an inspiration for exploring the potential of LLMs as AI physicians.

[AI-5] A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language

链接: https://arxiv.org/abs/2408.12578
作者: Ekdeep Singh Lubana,Kyogo Kawaguchi,Robert P. Dick,Hidenori Tanaka
关键词-EN: phenomenon often called, compute can lead, emergent capabilities, Increase in data, Increase
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: Preprint

点击查看摘要

Abstract:Increase in data, size, or compute can lead to sudden learning of specific capabilities by a neural network – a phenomenon often called “emergence”. Beyond scientific understanding, establishing the causal factors underlying such emergent capabilities is crucial to enable risk regulation frameworks for AI. In this work, we seek inspiration from study of emergent properties in other fields and propose a phenomenological definition for the concept in the context of neural networks. Our definition implicates the acquisition of specific structures underlying the data-generating process as a cause of sudden performance growth for specific, narrower tasks. We empirically investigate this definition by proposing an experimental system grounded in a context-sensitive formal language and find that Transformers trained to perform tasks on top of strings from this language indeed exhibit emergent capabilities. Specifically, we show that once the language’s underlying grammar and context-sensitivity inducing structures are learned by the model, performance on narrower tasks suddenly begins to improve. We then analogize our network’s learning dynamics with the process of percolation on a bipartite graph, establishing a formal phase transition model that predicts the shift in the point of emergence observed in experiment when changing the data structure. Overall, our experimental and theoretical frameworks yield a step towards better defining, characterizing, and predicting emergence in neural networks.
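摘要中的渗流类比可以用二部图上的简单模拟来体会:当边概率越过阈值时,最大连通分量的节点占比会急剧上升,对应能力的"涌现"(以下参数与实现均为示意,与论文的形式化模型无关):

```python
import random

def largest_component_fraction(n, p, seed=0):
    """在左右各 n 个节点的随机二部图上,返回最大连通分量的节点占比"""
    rng = random.Random(seed)
    adj = {v: [] for v in range(2 * n)}
    for u in range(n):             # 左侧节点
        for v in range(n, 2 * n):  # 右侧节点
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    seen, best = set(), 0
    for s in range(2 * n):         # 逐个连通分量做深度优先遍历
        if s in seen:
            continue
        stack, comp = [s], 0
        seen.add(s)
        while stack:
            x = stack.pop()
            comp += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        best = max(best, comp)
    return best / (2 * n)

low = largest_component_fraction(60, 0.002)  # 阈值以下:图高度碎片化
high = largest_component_fraction(60, 0.08)  # 阈值以上:出现巨型连通分量
```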

[AI-6] Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers ATC

链接: https://arxiv.org/abs/2408.12575
作者: Antonyo Musabini,Ivan Novikov,Sana Soula,Christel Leonet,Lihao Wang,Rachid Benmokhtar,Fabian Burger,Thomas Boulay,Xavier Perrotton
关键词-EN: algorithms primarily focus, error-prone homographic projection, Current parking area, Driver Assistance System, Advanced Driver Assistance
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注: 26th Irish Machine Vision and Image Processing Conference, Data-Driven Autonomy Workshop (matching camera-ready version)

点击查看摘要

Abstract:Current parking area perception algorithms primarily focus on detecting vacant slots within a limited range, relying on error-prone homographic projection for both labeling and inference. However, recent advancements in Advanced Driver Assistance System (ADAS) require interaction with end-users through comprehensive and intelligent Human-Machine Interfaces (HMIs). These interfaces should present a complete perception of the parking area going from distinguishing vacant slots’ entry lines to the orientation of other parked vehicles. This paper introduces Multi-Task Fisheye Cross View Transformers (MT F-CVT), which leverages features from a four-camera fisheye Surround-view Camera System (SVCS) with multihead attentions to create a detailed Bird-Eye View (BEV) grid feature map. Features are processed by both a segmentation decoder and a Polygon-Yolo based object detection decoder for parking slots and vehicles. Trained on data labeled using LiDAR, MT F-CVT positions objects within a 25m x 25m real open-road scenes with an average error of only 20 cm. Our larger model achieves an F-1 score of 0.89. Moreover the smaller model operates at 16 fps on an Nvidia Jetson Orin embedded board, with similar detection results to the larger one. MT F-CVT demonstrates robust generalization capability across different vehicles and camera rig configurations. A demo video from an unseen vehicle and camera rig is available at: this https URL.

[AI-7] MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

链接: https://arxiv.org/abs/2408.12574
作者: Haojun Shi,Suyu Ye,Xinyu Fang,Chuanyang Jin,Layla Isik,Yen-Ling Kuo,Tianmin Shu
关键词-EN: Understanding people social, Theory of Mind, Understanding people, complex real-world scenarios, intricate mental reasoning
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注: Project website: this https URL Code: this https URL

点击查看摘要

Abstract:Understanding people’s social interactions in complex real-world scenarios often relies on intricate mental reasoning. To truly understand how and why people interact with one another, we must infer the underlying mental states that give rise to the social interactions, i.e., Theory of Mind reasoning in multi-agent interactions. Additionally, social interactions are often multi-modal – we can watch people’s actions, hear their conversations, and/or read about their past behaviors. For AI systems to successfully and safely interact with people in real-world environments, they also need to understand people’s mental states as well as their inferences about each other’s mental states based on multi-modal information about their interactions. For this, we introduce MuMA-ToM, a Multi-modal Multi-Agent Theory of Mind benchmark. MuMA-ToM is the first multi-modal Theory of Mind benchmark that evaluates mental reasoning in embodied multi-agent interactions. In MuMA-ToM, we provide video and text descriptions of people’s multi-modal behavior in realistic household environments. Based on the context, we then ask questions about people’s goals, beliefs, and beliefs about others’ goals. We validated MuMA-ToM in a human experiment and provided a human baseline. We also proposed a novel multi-modal, multi-agent ToM model, LIMP (Language model-based Inverse Multi-agent Planning). Our experimental results show that LIMP significantly outperforms state-of-the-art methods, including large multi-modal models (e.g., GPT-4o, Gemini-1.5 Pro) and a recent multi-modal ToM model, BIP-ALM.

[AI-8] Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers ECCV2024

链接: https://arxiv.org/abs/2408.12568
作者: Sayed Mohammad Vakilzadeh Hatefi,Maximilian Dreyer,Reduan Achtibat,Thomas Wiegand,Wojciech Samek,Sebastian Lapuschkin
关键词-EN: Deep Neural Networks, huge computational costs, Deep Neural, complex problems, billions of parameters
类目: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注: Accepted as a workshop paper at ECCV 2024 31 pages (14 pages manuscript, 4 pages references, 13 pages appendix)

点击查看摘要

Abstract:To solve ever more complex problems, Deep Neural Networks are scaled to billions of parameters, leading to huge computational costs. An effective approach to reduce computational requirements and increase efficiency is to prune unnecessary components of these often over-parameterized networks. Previous work has shown that attribution methods from the field of eXplainable AI serve as effective means to extract and prune the least relevant network components in a few-shot fashion. We extend the current state by proposing to explicitly optimize hyperparameters of attribution methods for the task of pruning, and further include transformer-based networks in our analysis. Our approach yields higher model compression rates of large transformer- and convolutional architectures (VGG, ResNet, ViT) compared to previous works, while still attaining high performance on ImageNet classification tasks. Here, our experiments indicate that transformers have a higher degree of over-parameterization compared to convolutional neural networks. Code is available at this https URL.
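基于归因的剪枝的核心步骤(为每个组件打归因分、丢弃最不相关的部分)可示意如下;分数为虚构,真实方法(如LRP类归因)会通过少量前向/反向传播计算这些分数:

```python
# 假设性示例:按归因分数保留最相关的网络组件
def prune_by_attribution(relevance, keep_ratio):
    """relevance: {组件: 归因分数};保留分数最高的 keep_ratio 比例"""
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return set(ranked[:k])

scores = {"head_0": 0.9, "head_1": 0.1, "head_2": 0.5, "head_3": 0.05}
kept = prune_by_attribution(scores, keep_ratio=0.5)
```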

[AI-9] ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation

链接: https://arxiv.org/abs/2408.12561
作者: Lujia Zhong,Shuo Huang,Yonggang Shi
关键词-EN: made remarkable strides, probabilistic diffusion models, large language models, remarkable strides, generative modeling
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: Under review

点击查看摘要

Abstract:Recently, deep learning has made remarkable strides, especially with generative modeling, such as large language models and probabilistic diffusion models. However, training these models often involves significant computational resources, requiring billions of petaFLOPs. This high resource consumption results in substantial energy usage and a large carbon footprint, raising critical environmental concerns. Back-propagation (BP) is a major source of computational expense during training deep learning models. To advance research on energy-efficient training and allow for sparse learning on any machine and device, we propose a general, energy-efficient convolution module that can be seamlessly integrated into any deep learning architecture. Specifically, we introduce channel-wise sparsity with additional gradient selection schedulers during backward based on the assumption that BP is often dense and inefficient, which can lead to over-fitting and high computational consumption. Our experiments demonstrate that our approach reduces 40% computations while potentially improving model performance, validated on image classification and generation tasks. This reduction can lead to significant energy savings and a lower carbon footprint during the research and development phases of large-scale AI systems. Additionally, our method mitigates over-fitting in a manner distinct from Dropout, allowing it to be combined with Dropout to further enhance model performance and reduce computational resource usage. Extensive experiments validate that our method generalizes to a variety of datasets and tasks and is compatible with a wide range of deep learning architectures and modules. Code is publicly available at this https URL.
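摘要中"反向传播时按通道选择梯度"的稀疏化思想可粗略示意如下(通道名与数值均为假设,仅说明"保留大梯度、跳过小梯度"的机制):

```python
# 假设性示例:仅保留梯度幅值最大的若干通道,其余置零以省去计算
def sparsify_channel_grads(grads, keep):
    """grads: {通道: 梯度值};保留幅值最大的 keep 个通道"""
    top = set(sorted(grads, key=lambda c: abs(grads[c]), reverse=True)[:keep])
    return {c: (g if c in top else 0.0) for c, g in grads.items()}

g = {"c0": 0.8, "c1": -0.05, "c2": 0.3, "c3": 0.01}
sparse = sparsify_channel_grads(g, keep=2)
```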

[AI-10] Data Quality Antipatterns for Software Analytics

链接: https://arxiv.org/abs/2408.12560
作者: Aaditya Bhatia,Dayi Lin,Gopi Krishnan Rajbahadur,Bram Adams,Ahmed E. Hassan
关键词-EN: data quality antipatterns, Data quality, software defect prediction, quality antipatterns
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Background: Data quality is vital in software analytics, particularly for machine learning (ML) applications like software defect prediction (SDP). Despite the widespread use of ML in software engineering, the effect of data quality antipatterns on these models remains underexplored. Objective: This study develops a taxonomy of ML-specific data quality antipatterns and assesses their impact on software analytics models’ performance and interpretation. Methods: We identified eight types and 14 sub-types of ML-specific data quality antipatterns through a literature review. We conducted experiments to determine the prevalence of these antipatterns in SDP data (RQ1), assess how cleaning order affects model performance (RQ2), evaluate the impact of antipattern removal on performance (RQ3), and examine the consistency of interpretation from models built with different antipatterns (RQ4). Results: In our SDP case study, we identified nine antipatterns. Over 90% of these overlapped at both row and column levels, complicating cleaning prioritization and risking excessive data removal. The order of cleaning significantly impacts ML model performance, with neural networks being more resilient to cleaning order changes than simpler models like logistic regression. Antipatterns such as Tailed Distributions and Class Overlap show a statistically significant correlation with performance metrics when other antipatterns are cleaned. Models built with different antipatterns showed moderate consistency in interpretation results. Conclusion: The cleaning order of different antipatterns impacts ML model performance. Five antipatterns have a statistically significant correlation with model performance when others are cleaned. Additionally, model interpretation is moderately affected by different data quality antipatterns. 

[AI-11] Modeling Time-Variant Responses of Optical Compressors with Selective State Space Models

链接: https://arxiv.org/abs/2408.12549
作者: Riccardo Simionato
关键词-EN: Selective State Space, State Space models, State Space block, Selective State, State Space
类目: ound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
*备注:

点击查看摘要

Abstract:This paper presents a method for modeling optical dynamic range compressors using deep neural networks with Selective State Space models. The proposed approach surpasses previous methods based on recurrent layers by employing a Selective State Space block to encode the input audio. It features a refined technique integrating Feature-wise Linear Modulation and Gated Linear Units to adjust the network dynamically, conditioning the compression’s attack and release phases according to external parameters. The proposed architecture is well-suited for low-latency and real-time applications, crucial in live audio processing. The method has been validated on the analog optical compressors TubeTech CL 1B and Teletronix LA-2A, which possess distinct characteristics. Evaluation is performed using quantitative metrics and subjective listening tests, comparing the proposed method with other state-of-the-art models. Results show that our black-box modeling methods outperform all others, achieving accurate emulation of the compression process for both seen and unseen settings during training. We further show a correlation between this accuracy and the sampling density of the control parameters in the dataset and identify settings with fast attack and slow release as the most challenging to emulate.
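摘要提到的FiLM(特征级线性调制)与门控线性单元(GLU)的组合可用如下玩具代码示意;在真实系统中,gamma 与 beta 由外部控制参数(如压缩器的 attack/release 设置)经条件网络推导而来,此处直接手工给定:

```python
import math

# 玩具示例:FiLM 调制后接 GLU 门控(并非论文的网络结构)
def film_glu(features, gamma, beta):
    # FiLM:对每个特征做逐元素的缩放与平移
    modulated = [g * f + b for f, g, b in zip(features, gamma, beta)]
    # GLU:前一半经 sigmoid 后作为门,逐元素乘以后一半
    half = len(modulated) // 2
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    return [sigmoid(modulated[i]) * modulated[half + i] for i in range(half)]

out = film_glu([0.0, 1.0], gamma=[1.0, 1.0], beta=[0.0, 0.0])
```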

[AI-12] PCGRL: Scaling Control and Generalization in Reinforcement Learning Level Generators

链接: https://arxiv.org/abs/2408.12525
作者: Sam Earle,Zehua Jiang,Julian Togelius
关键词-EN: Procedural Content Generation, Procedural Content, computable metrics acting, Content Generation, Generation via Reinforcement
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: 8 pages, 7 figures, 6 tables. Published at IEEE Conference on Games, 2024

点击查看摘要

Abstract:Procedural Content Generation via Reinforcement Learning (PCGRL) has been introduced as a means by which controllable designer agents can be trained based only on a set of computable metrics acting as a proxy for the level’s quality and key characteristics. While PCGRL offers a unique set of affordances for game designers, it is constrained by the compute-intensive process of training RL agents, and has so far been limited to generating relatively small levels. To address this issue of scale, we implement several PCGRL environments in Jax so that all aspects of learning and simulation happen in parallel on the GPU, resulting in faster environment simulation; removing the CPU-GPU transfer of information bottleneck during RL training; and ultimately resulting in significantly improved training speed. We replicate several key results from prior works in this new framework, letting models train for much longer than previously studied, and evaluating their behavior after 1 billion timesteps. Aiming for greater control for human designers, we introduce randomized level sizes and frozen “pinpoints” of pivotal game tiles as further ways of countering overfitting. To test the generalization ability of learned generators, we evaluate models on large, out-of-distribution map sizes, and find that partial observation sizes learn more robust design strategies.

[AI-13] Advanced atom-level representations for protein flexibility prediction utilizing graph neural networks

链接: https://arxiv.org/abs/2408.12519
作者: Sina Sarparast,Aldo Zaimi,Maximilian Ebert,Michael-Rock Goldsmith
关键词-EN: Protein dynamics play, Protein dynamics, Protein, play a crucial, crucial role
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Protein dynamics play a crucial role in many biological processes and drug interactions. However, measuring, and simulating protein dynamics is challenging and time-consuming. While machine learning holds promise in deciphering the determinants of protein dynamics from structural information, most existing methods for protein representation learning operate at the residue level, ignoring the finer details of atomic interactions. In this work, we propose for the first time to use graph neural networks (GNNs) to learn protein representations at the atomic level and predict B-factors from protein 3D structures. The B-factor reflects the atomic displacement of atoms in proteins, and can serve as a surrogate for protein flexibility. We compared different GNN architectures to assess their performance. The Meta-GNN model achieves a correlation coefficient of 0.71 on a large and diverse test set of over 4k proteins (17M atoms) from the Protein Data Bank (PDB), outperforming previous methods by a large margin. Our work demonstrates the potential of representations learned by GNNs for protein flexibility prediction and other related tasks.
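Atom-level message passing, the core operation behind such a GNN, can be sketched as follows. This is a generic toy layer, not the paper's Meta-GNN; the fragment, feature sizes, and weight shapes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def mp_layer(h, edges, W_self, W_nbr):
    """One atom-level message-passing layer: each atom averages its bonded
    neighbours' features, then self and neighbour messages are mixed."""
    n = h.shape[0]
    agg = np.zeros_like(h)
    deg = np.zeros(n)
    for i, j in edges:               # undirected bonds
        agg[i] += h[j]; agg[j] += h[i]
        deg[i] += 1;    deg[j] += 1
    agg /= np.maximum(deg, 1.0)[:, None]
    return np.maximum(h @ W_self + agg @ W_nbr, 0.0)   # ReLU

# Hypothetical 4-atom fragment: 3 bonds, 5 input features per atom.
h = rng.standard_normal((4, 5))
edges = [(0, 1), (1, 2), (2, 3)]
W_self, W_nbr = rng.standard_normal((5, 8)), rng.standard_normal((5, 8))
w_out = rng.standard_normal(8)
b_factors = mp_layer(h, edges, W_self, W_nbr) @ w_out  # one scalar per atom
print(b_factors.shape)  # (4,)
```

The readout produces one scalar per atom, matching the per-atom B-factor regression target described in the abstract.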

[AI-14] The Russian-focused embedders exploration: ruMTEB benchmark and Russian embedding model design

链接: https://arxiv.org/abs/2408.12503
作者: Artem Snegirev,Maria Tikhonova,Anna Maksimova,Alena Fenogenova,Alexander Abramov
关键词-EN: Natural Language Processing, Language Processing, Natural Language, role in Natural, creating text embeddings
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Embedding models play a crucial role in Natural Language Processing (NLP) by creating text embeddings used in various tasks such as information retrieval and assessing semantic text similarity. This paper focuses on research related to embedding models in the Russian language. It introduces a new Russian-focused embedding model called ru-en-RoSBERTa and the ruMTEB benchmark, the Russian version extending the Massive Text Embedding Benchmark (MTEB). Our benchmark includes seven categories of tasks, such as semantic textual similarity, text classification, reranking, and retrieval. The research also assesses a representative set of Russian and multilingual models on the proposed benchmark. The findings indicate that the new model achieves results that are on par with state-of-the-art models in Russian. We release the model ru-en-RoSBERTa, and the ruMTEB framework comes with open-source code, integration into the original framework and a public leaderboard.
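The semantic-textual-similarity use case reduces to comparing pooled sentence vectors by cosine similarity. A self-contained toy sketch (the random per-word vectors stand in for real transformer token embeddings such as ru-en-RoSBERTa's, which this snippet does not load):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
vocab = {}

def embed(sentence):
    """Toy mean-pooled sentence embedding over stand-in word vectors."""
    vecs = []
    for w in sentence.lower().split():
        if w not in vocab:
            vocab[w] = rng.standard_normal(16)
        vecs.append(vocab[w])
    return np.mean(vecs, axis=0)

a = embed("the cat sat on the mat")
b = embed("the cat sat on the mat")
c = embed("stock markets fell sharply today")
assert cosine(a, b) > cosine(a, c)  # identical sentences score highest
```

Benchmarks like ruMTEB then score an embedder by how well these similarity rankings agree with human judgments across many sentence pairs.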

[AI-15] MEDCO: Medical Education Copilots Based on A Multi-Agent Framework

链接: https://arxiv.org/abs/2408.12496
作者: Hao Wei,Jianing Qiu,Haibao Yu,Wu Yuan
关键词-EN: Large language models, diverse research domains, Large language, research domains, including medicine
类目: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
*备注:

点击查看摘要

Abstract:Large language models (LLMs) have had a significant impact on diverse research domains, including medicine and healthcare. However, the potential of LLMs as copilots in medical education remains underexplored. Current AI-assisted educational tools are limited by their solitary learning approach and inability to simulate the multi-disciplinary and interactive nature of actual medical training. To address these limitations, we propose MEDCO (Medical EDucation COpilots), a novel multi-agent-based copilot system specially developed to emulate real-world medical training environments. MEDCO incorporates three primary agents: an agentic patient, an expert doctor, and a radiologist, facilitating a multi-modal and interactive learning environment. Our framework emphasizes the learning of proficient question-asking skills, multi-disciplinary collaboration, and peer discussions between students. Our experiments show that simulated virtual students who underwent training with MEDCO not only achieved substantial performance enhancements comparable to those of advanced models, but also demonstrated human-like learning behaviors and improvements, coupled with an increase in the number of learning samples. This work contributes to medical education by introducing a copilot that implements an interactive and collaborative learning approach. It also provides valuable insights into the effectiveness of AI-integrated training paradigms.

[AI-16] GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models

链接: https://arxiv.org/abs/2408.12494
作者: Kunsheng Tang,Wenbo Zhou,Jie Zhang,Aishan Liu,Gelei Deng,Shuai Li,Peigui Qi,Weiming Zhang,Tianwei Zhang,Nenghai Yu
关键词-EN: Large language models, exhibited remarkable capabilities, magnify societal biases, natural language generation, Large language
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but they have also been observed to magnify societal biases, particularly those related to gender. In response to this issue, several benchmarks have been proposed to assess gender bias in LLMs. However, these benchmarks often lack practical flexibility or inadvertently introduce biases. To address these shortcomings, we introduce GenderCARE, a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics for quantifying and mitigating gender bias in LLMs. To begin, we establish pioneering criteria for gender equality benchmarks, spanning dimensions such as inclusivity, diversity, explainability, objectivity, robustness, and realisticity. Guided by these criteria, we construct GenderPair, a novel pair-based benchmark designed to assess gender bias in LLMs comprehensively. Our benchmark provides standardized and realistic evaluations, including previously overlooked gender groups such as transgender and non-binary individuals. Furthermore, we develop effective debiasing techniques that incorporate counterfactual data augmentation and specialized fine-tuning strategies to reduce gender bias in LLMs without compromising their overall performance. Extensive experiments demonstrate a significant reduction in various gender bias benchmarks, with reductions peaking at over 90% and averaging above 35% across 17 different LLMs. Importantly, these reductions come with minimal variability in mainstream language tasks, remaining below 2%. By offering a realistic assessment and tailored reduction of gender biases, we hope that our GenderCARE can represent a significant step towards achieving fairness and equity in LLMs. More details are available at this https URL.
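Counterfactual data augmentation, one of the debiasing ingredients mentioned, amounts to generating a gender-swapped copy of each training example. A deliberately naive sketch (the swap table is a toy; real pipelines must also handle ambiguous forms such as possessive versus objective "her", which this version ignores):

```python
import re

SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
         "his": "her", "man": "woman", "woman": "man"}

def counterfactual(text):
    """Return a copy of `text` with gendered terms swapped,
    preserving leading capitalization."""
    def repl(m):
        w = m.group(0)
        out = SWAPS[w.lower()]
        return out.capitalize() if w[0].isupper() else out
    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

print(counterfactual("He finished his shift."))  # -> "She finished her shift."
```

Training on both the original and the swapped example pushes the model toward treating the two genders symmetrically, which is then combined with the fine-tuning strategies the abstract describes.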

[AI-17] AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines

链接: https://arxiv.org/abs/2408.12491
作者: Douwe J. Spaanderman(1),Matthew Marzetti(2,3),Xinyi Wan(1),Andrew F. Scarsbrook(4,5),Philip Robinson(4),Edwin H.G. Oei(1),Jacob J. Visser(1),Robert Hemke(6),Kirsten van Langevelde(7),David F. Hanff(1),Geert J.L.H. van Leenders(8),Cornelis Verhoef(9),Dirk J. Grünhagen(9),Wiro J. Niessen(1,10),Stefan Klein(1),Martijn P.A. Starmans(1) ((1) Department of Radiology and Nuclear Medicine, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, the Netherlands, (2) Department of Medical Physics, Leeds Teaching Hospitals NHS Trust, UK, (3) Leeds Biomedical Research Centre, University of Leeds, UK, (4) Department of Radiology, Leeds Teaching Hospitals NHS Trust, UK, (5) Leeds Institute of Medical Research, University of Leeds, UK, (6) Department of Radiology and Nuclear Medicine, Amsterdam UMC, Amsterdam, the Netherlands, (7) Department of Radiology, Leiden University Medical Center, Leiden, the Netherlands, (8) Department of Pathology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, the Netherlands, (9) Department of Surgical Oncology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, the Netherlands, (10) Faculty of Medical Sciences, University of Groningen, Groningen, the Netherlands)
关键词-EN: diagnostically challenging lesions, Soft-tissue and bone, variable clinical behaviours, diagnostically challenging, treatment approaches
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: 23 pages, 6 figures, 6 supplementary figures

点击查看摘要

Abstract:Soft-tissue and bone tumours (STBT) are rare, diagnostically challenging lesions with variable clinical behaviours and treatment approaches. This systematic review provides an overview of Artificial Intelligence (AI) methods using radiological imaging for diagnosis and prognosis of these tumours, highlighting challenges in clinical translation, and evaluating study alignment with the Checklist for AI in Medical Imaging (CLAIM) and the FUTURE-AI international consensus guidelines for trustworthy and deployable AI to promote the clinical translation of AI methods. The review covered literature from several bibliographic databases, including papers published before 17/07/2024. Original research in peer-reviewed journals focused on radiology-based AI for diagnosing or prognosing primary STBT was included. Exclusion criteria were animal, cadaveric, or laboratory studies, and non-English papers. Abstracts were screened by two of three independent reviewers for eligibility. Eligible papers were assessed against guidelines by one of three independent reviewers. The search identified 15,015 abstracts, from which 325 articles were included for evaluation. Most studies performed moderately on CLAIM, averaging a score of 28.9 ± 7.5 out of 53, but poorly on FUTURE-AI, averaging 5.1 ± 2.1 out of 30. Imaging-AI tools for STBT remain at the proof-of-concept stage, indicating significant room for improvement. Future efforts by AI developers should focus on design (e.g. define unmet clinical need, intended clinical setting and how AI would be integrated in clinical workflow), development (e.g. build on previous work, explainability), evaluation (e.g. evaluating and addressing biases, evaluating AI against best practices), and data reproducibility and availability (making documented code and data publicly available). Following these recommendations could improve clinical translation of AI methods.

[AI-18] Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation

链接: https://arxiv.org/abs/2408.12483
作者: Shaobo Wang,Yantai Yang,Qilong Wang,Kaixin Li,Linfeng Zhang,Junchi Yan
关键词-EN: small dataset capable, aims to synthesize, synthesize a small, capable of performing, performing comparably
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Dataset Distillation (DD) aims to synthesize a small dataset capable of performing comparably to the original dataset. Despite the success of numerous DD methods, theoretical exploration of this area remains unaddressed. In this paper, we take an initial step towards understanding various matching-based DD methods from the perspective of sample difficulty. We begin by empirically examining sample difficulty, measured by gradient norm, and observe that different matching-based methods roughly correspond to specific difficulty tendencies. We then extend the neural scaling laws of data pruning to DD to theoretically explain these matching-based methods. Our findings suggest that prioritizing the synthesis of easier samples from the original dataset can enhance the quality of distilled datasets, especially in low IPC (image-per-class) settings. Based on our empirical observations and theoretical analysis, we introduce the Sample Difficulty Correction (SDC) approach, designed to predominantly generate easier samples to achieve higher dataset quality. Our SDC can be seamlessly integrated into existing methods as a plugin with minimal code adjustments. Experimental results demonstrate that adding SDC generates higher-quality distilled datasets across 7 distillation methods and 6 datasets.
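The difficulty proxy used in the analysis, the per-sample gradient norm, is easy to demonstrate on logistic regression. A toy sketch (the model, data, and "flip the labels to make samples hard" setup are illustrative, not the paper's experiments):

```python
import numpy as np

def sample_difficulty(X, y, w):
    """Per-sample gradient norm of the logistic loss w.r.t. the weights:
    grad_i = (sigmoid(x_i . w) - y_i) * x_i."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    grads = (p - y)[:, None] * X
    return np.linalg.norm(grads, axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
w = rng.standard_normal(3)
y = (X @ w > 0).astype(float)    # labels consistent with the model: easy
y_hard = 1.0 - y                 # flipped labels: hard samples
easy = sample_difficulty(X, y, w)
hard = sample_difficulty(X, y_hard, w)
assert easy.mean() < hard.mean() # hard samples produce larger gradients
```

Ranking samples by this norm and preferentially synthesizing the easy end is the intuition behind the Sample Difficulty Correction plugin described above.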

[AI-19] Predicting Solar Energy Generation with Machine Learning based on AQI and Weather Features

链接: https://arxiv.org/abs/2408.12476
作者: Arjun Shah,Varun Viswanath,Kashish Gandhi,Dr. Nilesh Madhukar Patil
关键词-EN: Air Quality Index, efficient grid integration, Deep Learning, Quality Index, Deep Learning techniques
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: 10 pages, 11 figures

点击查看摘要

Abstract:This paper addresses the pressing need for an accurate solar energy prediction model, which is crucial for efficient grid integration. We explore the influence of the Air Quality Index and weather features on solar energy generation, employing advanced Machine Learning and Deep Learning techniques. Our methodology uses time series modeling and makes novel use of power transform normalization and zero-inflated modeling. Various Machine Learning algorithms and Conv2D Long Short-Term Memory model based Deep Learning models are applied to these transformations for precise predictions. Results underscore the effectiveness of our approach, demonstrating enhanced prediction accuracy with Air Quality Index and weather features. We achieved a 0.9691 R^2 Score, 0.18 MAE, 0.10 RMSE with Conv2D Long Short-Term Memory model, showcasing the power transform technique’s innovation in enhancing time series forecasting for solar energy generation. Such results help our research contribute valuable insights to the synergy between Air Quality Index, weather features, and Deep Learning techniques for solar energy prediction.
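The power transform normalization mentioned above is typically a Box-Cox/Yeo-Johnson style map that compresses the heavy right tail of solar output while tolerating the many zero readings at night. A minimal sketch for non-negative data (real Yeo-Johnson has an extra branch for negative inputs, and `lam` would be fit by maximum likelihood rather than chosen by hand):

```python
import numpy as np

def yeo_johnson(x, lam):
    """Yeo-Johnson power transform restricted to x >= 0
    (the zero-inflated solar-generation case)."""
    x = np.asarray(x, dtype=float)
    if lam == 0.0:
        return np.log1p(x)
    return ((x + 1.0) ** lam - 1.0) / lam

# Hypothetical generation readings in kWh: zeros at night, spikes at noon.
x = np.array([0.0, 0.0, 0.5, 3.0, 40.0, 250.0])
z = yeo_johnson(x, 0.0)   # lam = 0 reduces to log1p
print(z)
```

After transformation the series is far less skewed, which is what makes it easier for the downstream Conv2D-LSTM forecaster to fit.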

[AI-20] WCEbleedGen: A wireless capsule endoscopy dataset and its benchmarking for automatic bleeding classification detection and segmentation

链接: https://arxiv.org/abs/2408.12466
作者: Palak Handa,Manas Dhir,Amirreza Mahbod,Florian Schwarzhans,Ramona Woitek,Nidhi Goel,Deepak Gunjan
关键词-EN: Wireless Capsule Endoscopy, Capsule Endoscopy, Wireless Capsule, Computer-based analysis, medically annotated WCE
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Computer-based analysis of Wireless Capsule Endoscopy (WCE) is crucial. However, a medically annotated WCE dataset for training and evaluation of automatic classification, detection, and segmentation of bleeding and non-bleeding frames is currently lacking. The present work focused on development of a medically annotated WCE dataset called WCEbleedGen for automatic classification, detection, and segmentation of bleeding and non-bleeding frames. It comprises 2,618 WCE bleeding and non-bleeding frames which were collected from various internet resources and existing WCE datasets. A comprehensive benchmarking and evaluation of the developed dataset was done using nine classification-based, three detection-based, and three segmentation-based deep learning models. The dataset is of high-quality, is class-balanced and contains single and multiple bleeding sites. Overall, our standard benchmark results show that Visual Geometric Group (VGG) 19, You Only Look Once version 8 nano (YOLOv8n), and Link network (Linknet) performed best in automatic classification, detection, and segmentation-based evaluations, respectively. Automatic bleeding diagnosis is crucial for WCE video interpretations. This diverse dataset will aid in developing of real-time, multi-task learning-based innovative solutions for automatic bleeding diagnosis in WCE. The dataset and code are publicly available at this https URL and this https URL.

[AI-21] Relaxed Rotational Equivariance via G-Biases in Vision

链接: https://arxiv.org/abs/2408.12454
作者: Zhiqiang Wu,Licheng Sun,Yingjie Liu,Jian Yang,Hanlin Dong,Shing-Ho J. Lin,Xuan Tang,Jinpeng Mi,Bo Jin,Xian Wei
关键词-EN: Group Equivariant Convolution, Equivariant Convolution, handle rotational symmetry, rotational symmetry, strict rotational symmetry
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Group Equivariant Convolution (GConv) can effectively handle data with rotational symmetry. It assumes uniform and strict rotational symmetry across all features, following the transformations of a specific group. However, real-world data rarely conform to strict rotational symmetry, a phenomenon commonly referred to as Rotational Symmetry-Breaking in the system or dataset, which GConv cannot adapt to effectively. Motivated by this, we propose a simple but highly effective method that utilizes a set of learnable biases, called G-Biases, under the group order to break the strict group constraints and achieve Relaxed Rotational Equivariant Convolution (RREConv). We conduct extensive experiments to validate Relaxed Rotational Equivariance on the rotational symmetry groups C_n (e.g. the C_2, C_4, and C_6 groups). Further experiments demonstrate that our proposed RREConv-based methods achieve excellent performance compared to existing GConv-based methods in classification and detection tasks on natural image datasets.
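The idea can be illustrated on the C_4 group with a single shared filter: a strict group convolution correlates the input with the filter's four 90-degree rotations, and the relaxation adds one learnable bias per rotation so the orientation channels are no longer forced to transform identically. A toy numpy sketch (single 3x3 patch, no spatial sliding; all names and shapes are assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def c4_responses(image, filt, g_biases):
    """Correlate one patch with the 4 rotations of a shared filter;
    each rotation k gets its own bias g_biases[k] (the relaxation)."""
    out = []
    for k in range(4):                 # C4: rotations by k * 90 degrees
        f = np.rot90(filt, k)
        out.append((image * f).sum() + g_biases[k])
    return np.array(out)

img = rng.standard_normal((3, 3))
filt = rng.standard_normal((3, 3))
g = rng.standard_normal(4)                       # learnable G-biases
strict = c4_responses(img, filt, np.zeros(4))    # strict GConv: no biases
relaxed = c4_responses(img, filt, g)
print(strict, relaxed)
```

With zero biases the four channels are exact rotated copies of each other's responses; the per-rotation biases let the network deviate from that constraint by a learned amount.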

[AI-22] A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures

链接: https://arxiv.org/abs/2408.12443
作者: Tahmina Khanam,Hamid Laga,Mohammed Bennamoun,Guanjin Wang,Ferdous Sohel,Farid Boussaid,Guan Wang,Anuj Srivastava
关键词-EN: Velocity Function Trees, Square Root Velocity, Root Velocity Function, comprehensive approach, modeling and analyzing
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
*备注:

点击查看摘要

Abstract:We propose the first comprehensive approach for modeling and analyzing the spatiotemporal shape variability in tree-like 4D objects, i.e., 3D objects whose shapes bend, stretch, and change in their branching structure over time as they deform, grow, and interact with their environment. Our key contribution is the representation of tree-like 3D shapes using Square Root Velocity Function Trees (SRVFT). By solving the spatial registration in the SRVFT space, which is equipped with an L2 metric, 4D tree-shaped structures become time-parameterized trajectories in this space. This reduces the problem of modeling and analyzing 4D tree-like shapes to that of modeling and analyzing elastic trajectories in the SRVFT space, where elasticity refers to time warping. In this paper, we propose a novel mathematical representation of the shape space of such trajectories, a Riemannian metric on that space, and computational tools for fast and accurate spatiotemporal registration and geodesics computation between 4D tree-shaped structures. Leveraging these building blocks, we develop a full framework for modelling the spatiotemporal variability using statistical models and generating novel 4D tree-like structures from a set of exemplars. We demonstrate and validate the proposed framework using real 4D plant data.
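The Square Root Velocity Function at the heart of the representation is q(t) = f'(t) / sqrt(|f'(t)|), under which the elastic metric on curves becomes the ordinary L2 metric. A discrete sketch on one sampled 3D branch (the helix curve is an invented example, not the paper's plant data):

```python
import numpy as np

def srvf(curve, dt=1.0):
    """Square Root Velocity Function of a sampled curve:
    q = f' / sqrt(|f'|), computed with finite differences."""
    vel = np.gradient(curve, dt, axis=0)                 # f'(t), shape (T, d)
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    return vel / np.sqrt(np.maximum(speed, 1e-12))

t = np.linspace(0.0, 1.0, 100)
branch = np.stack([np.cos(t), np.sin(t), t], axis=1)     # one 3D branch
q = srvf(branch, dt=t[1] - t[0])
norm_q = np.sqrt(np.sum(q ** 2) * (t[1] - t[0]))         # L2 norm of q
print(q.shape)
```

Because |q|^2 = |f'|, the squared L2 norm of q equals the curve's length; here the branch has constant speed sqrt(2), so `norm_q` is approximately 2^(1/4) ≈ 1.19. An SRVF-Tree applies this map branch-wise and adds a matching of the branching structure on top.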

[AI-23] Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification

链接: https://arxiv.org/abs/2408.12426
作者: Sudi Murindanyi,Joyce Nakatumba-Nabende,Rahman Sanya,Rose Nakibuule,Andrew Katumba
关键词-EN: Artificial Intelligence, popularity of Artificial, Intelligence in recent, increasing popularity, recent years
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:The increasing popularity of Artificial Intelligence in recent years has led to a surge in interest in image classification, especially in the agricultural sector. With the help of Computer Vision, Machine Learning, and Deep Learning, the sector has undergone a significant transformation, leading to the development of new techniques for crop classification in the field. Despite the extensive research on various image classification techniques, most have limitations such as low accuracy, limited use of data, and a lack of reporting model size and prediction. The most significant limitation of all is the need for model explainability. This research evaluates four different approaches for crop classification, namely traditional ML with handcrafted feature extraction methods like SIFT, ORB, and Color Histogram; Custom Designed CNN and established DL architecture like AlexNet; transfer learning on five models pre-trained using ImageNet such as EfficientNetV2, ResNet152V2, Xception, Inception-ResNetV2, MobileNetV3; and cutting-edge foundation models like YOLOv8 and DINOv2, a self-supervised Vision Transformer Model. All models performed well, but Xception outperformed all of them in terms of generalization, achieving 98% accuracy on the test data, with a model size of 80.03 MB and a prediction time of 0.0633 seconds. A key aspect of this research was the application of Explainable AI to provide the explainability of all the models. This journal presents the explainability of Xception model with LIME, SHAP, and GradCAM, ensuring transparency and trustworthiness in the models’ predictions. This study highlights the importance of selecting the right model according to task-specific needs. It also underscores the important role of explainability in deploying AI in agriculture, providing insightful information to help enhance AI-driven crop management strategies.

[AI-24] Multi-Knowledge Fusion Network for Time Series Representation Learning ICLR

链接: https://arxiv.org/abs/2408.12423
作者: Sagar Srinivas Sakhinana,Shivam Gupta,Krishna Sai Sudhir Aripirala,Venkataramana Runkana
关键词-EN: making informed decisions, sensor networks characterized, high-dimensional multivariate time, forecasting MTS data, MTS data
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: Paper accepted at ML4IoT Workshop, International Conference on Learning Representations(ICLR) 2023

点击查看摘要

Abstract:Forecasting the behaviour of complex dynamical systems such as interconnected sensor networks characterized by high-dimensional multivariate time series(MTS) is of paramount importance for making informed decisions and planning for the future in a broad spectrum of applications. Graph forecasting networks(GFNs) are well-suited for forecasting MTS data that exhibit spatio-temporal dependencies. However, most prior works of GFN-based methods on MTS forecasting rely on domain-expertise to model the nonlinear dynamics of the system, but neglect the potential to leverage the inherent relational-structural dependencies among time series variables underlying MTS data. On the other hand, contemporary works attempt to infer the relational structure of the complex dependencies between the variables and simultaneously learn the nonlinear dynamics of the interconnected system but neglect the possibility of incorporating domain-specific prior knowledge to improve forecast accuracy. To this end, we propose a hybrid architecture that combines explicit prior knowledge with implicit knowledge of the relational structure within the MTS data. It jointly learns intra-series temporal dependencies and inter-series spatial dependencies by encoding time-conditioned structural spatio-temporal inductive biases to provide more accurate and reliable forecasts. It also models the time-varying uncertainty of the multi-horizon forecasts to support decision-making by providing estimates of prediction uncertainty. The proposed architecture has shown promising results on multiple benchmark datasets and outperforms state-of-the-art forecasting methods by a significant margin. We report and discuss the ablation studies to validate our forecasting architecture.

[AI-25] Dataset | Mindset = Explainable AI | Interpretable AI

链接: https://arxiv.org/abs/2408.12420
作者: Caesar Wu,Rajkumar Buyya,Yuan Fang Li,Pascal Bouvry
关键词-EN: Artificial Intelligence, underpin machine learning, IAI, machine learning, XAI
类目: Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:We often use “explainable AI (XAI)” and “interpretable AI (IAI)” interchangeably when we apply various XAI tools for a given dataset to explain the reasons that underpin machine learning (ML) outputs. However, these notions can sometimes be confusing because interpretation often has a subjective connotation, while explanations lean towards objective facts. We argue that XAI is a subset of IAI. The concept of IAI is beyond the sphere of a dataset. It includes the domain of a mindset. At the core of this ambiguity is the duality of reasons, in which we can reason either outwards or inwards. When directed outwards, we want the reasons to make sense through the laws of nature. When turned inwards, we want the reasons to be happy, guided by the laws of the heart. While XAI and IAI share reason as the common notion for the goal of transparency, clarity, fairness, reliability, and accountability in the context of ethical AI and trustworthy AI (TAI), their differences lie in that XAI emphasizes the post-hoc analysis of a dataset, and IAI requires a priori mindset of abstraction. This hypothesis can be proved by empirical experiments based on an open dataset and harnessed by High-Performance Computing (HPC). The demarcation of XAI and IAI is indispensable because it would be impossible to determine regulatory policies for many AI applications, especially in healthcare, human resources, banking, and finance. We aim to clarify these notions and lay the foundation of XAI, IAI, EAI, and TAI for many practitioners and policymakers in future AI applications and research.

[AI-26] 4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Motion Alignment

链接: https://arxiv.org/abs/2408.12419
作者: Kaihui Cheng,Ce Liu,Qingkun Su,Jun Wang,Liwei Zhang,Yining Tang,Yao Yao,Siyu Zhu,Yuan Qi
关键词-EN: advancing biological research, facilitating pharmaceutical development, protein structures, dynamic protein structures, Protein structure prediction
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Protein structure prediction is pivotal for understanding the structure-function relationship of proteins, advancing biological research, and facilitating pharmaceutical development and experimental design. While deep learning methods and the expanded availability of experimental 3D protein structures have accelerated structure prediction, the dynamic nature of protein structures has received limited attention. This study introduces an innovative 4D diffusion model incorporating molecular dynamics (MD) simulation data to learn dynamic protein structures. Our approach is distinguished by the following components: (1) a unified diffusion model capable of generating dynamic protein structures, including both the backbone and side chains, utilizing atomic grouping and side-chain dihedral angle predictions; (2) a reference network that enhances structural consistency by integrating the latent embeddings of the initial 3D protein structures; and (3) a motion alignment module aimed at improving temporal structural coherence across multiple time steps. To our knowledge, this is the first diffusion-based model aimed at predicting protein trajectories across multiple time steps simultaneously. Validation on benchmark datasets demonstrates that our model exhibits high accuracy in predicting dynamic 3D structures of proteins containing up to 256 amino acids over 32 time steps, effectively capturing both local flexibility in stable states and significant conformational changes.

[AI-27] CODE: Confident Ordinary Differential Editing

链接: https://arxiv.org/abs/2408.12418
作者: Bastien van Delft,Tommaso Martorella,Alexandre Alahi
关键词-EN: facilitates seamless editing, Confident Ordinary Differential, Ordinary Differential Equation, Ordinary Differential Editing, Ordinary Differential
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Conditioning image generation facilitates seamless editing and the creation of photorealistic images. However, conditioning on noisy or Out-of-Distribution (OoD) images poses significant challenges, particularly in balancing fidelity to the input and realism of the output. We introduce Confident Ordinary Differential Editing (CODE), a novel approach for image synthesis that effectively handles OoD guidance images. Utilizing a diffusion model as a generative prior, CODE enhances images through score-based updates along the probability-flow Ordinary Differential Equation (ODE) trajectory. This method requires no task-specific training, no handcrafted modules, and no assumptions regarding the corruptions affecting the conditioning image. Our method is compatible with any diffusion model. Positioned at the intersection of conditional image generation and blind image restoration, CODE operates in a fully blind manner, relying solely on a pre-trained generative model. Our method introduces an alternative approach to blind restoration: instead of targeting a specific ground truth image based on assumptions about the underlying corruption, CODE aims to increase the likelihood of the input image while maintaining fidelity. This results in the most probable in-distribution image around the input. Our contributions are twofold. First, CODE introduces a novel editing method based on ODE, providing enhanced control, realism, and fidelity compared to its SDE-based counterpart. Second, we introduce a confidence interval-based clipping method, which improves CODE’s effectiveness by allowing it to disregard certain pixels or information, thus enhancing the restoration process in a blind manner. Experimental results demonstrate CODE’s effectiveness over existing methods, particularly in scenarios involving severe degradation or OoD inputs.
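The core mechanic, score-based updates that raise the likelihood of an out-of-distribution input while staying near it, can be shown with a toy analytic prior. Here the "score network" is just the exact score of a standard normal, so this is a conceptual stand-in for CODE's diffusion prior and ODE solver, not the method itself:

```python
import numpy as np

def score(x):
    """Score (gradient of the log-density) of a standard normal prior --
    a toy stand-in for a learned diffusion score network."""
    return -x

def edit(x, steps=50, eta=0.1):
    """Iterative score-based updates: each step nudges the input toward
    higher likelihood under the prior."""
    for _ in range(steps):
        x = x + eta * score(x)
    return x

x_ood = np.array([4.0, -3.0, 0.5])   # "corrupted" out-of-distribution input
x_edit = edit(x_ood)
log_lik = lambda x: -0.5 * np.sum(x ** 2)
assert log_lik(x_edit) > log_lik(x_ood)  # more probable after editing
```

The edited point moves toward the high-density region but keeps the input's overall direction, mirroring CODE's goal of producing the most probable in-distribution image around the input rather than a specific ground-truth target.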

[AI-28] Multi-Source Knowledge-Based Hybrid Neural Framework for Time Series Representation Learning IJCAI-23

链接: https://arxiv.org/abs/2408.12409
作者: Sagar Srinivas Sakhinana,Krishna Sai Sudhir Aripirala,Shivam Gupta,Venkataramana Runkana
关键词-EN: complex dynamical systems, interconnected sensor networks, MTS data, Accurately predicting, multivariate time series
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: Paper is accepted at Knowledge-Based Compositional Generalization Workshop, International Joint Conferences on Artificial Intelligence(IJCAI-23)

点击查看摘要

Abstract:Accurately predicting the behavior of complex dynamical systems, characterized by high-dimensional multivariate time series(MTS) in interconnected sensor networks, is crucial for informed decision-making in various applications to minimize risk. While graph forecasting networks(GFNs) are ideal for forecasting MTS data that exhibit spatio-temporal dependencies, prior works rely solely on the domain-specific knowledge of time-series variables inter-relationships to model the nonlinear dynamics, neglecting inherent relational structural dependencies among the variables within the MTS data. In contrast, contemporary works infer relational structures from MTS data but neglect domain-specific knowledge. The proposed hybrid architecture addresses these limitations by combining both domain-specific knowledge and implicit knowledge of the relational structure underlying the MTS data using Knowledge-Based Compositional Generalization. The hybrid architecture shows promising results on multiple benchmark datasets, outperforming state-of-the-art forecasting methods. Additionally, the architecture models the time varying uncertainty of multi-horizon forecasts.

[AI-29] Multi-Style Facial Sketch Synthesis through Masked Generative Modeling

Link: https://arxiv.org/abs/2408.12400
Authors: Bowen Sun,Guo Lu,Shibao Zheng
Keywords-EN: generating sketch portraits, holds profound implications, encompassing cross-modal face, cross-modal face recognition, facial sketch synthesis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*Comments:

Click to view abstract

Abstract:The facial sketch synthesis (FSS) model, capable of generating sketch portraits from given facial photographs, holds profound implications across multiple domains, encompassing cross-modal face recognition, entertainment, art, media, among others. However, the production of high-quality sketches remains a formidable task, primarily due to the challenges and flaws associated with three key factors: (1) the scarcity of artist-drawn data, (2) the constraints imposed by limited style types, and (3) the deficiencies of processing input information in existing models. To address these difficulties, we propose a lightweight end-to-end synthesis model that efficiently converts images to corresponding multi-stylized sketches, obviating the necessity for any supplementary inputs (e.g., 3D geometry). In this study, we overcome the issue of data insufficiency by incorporating semi-supervised learning into the training process. Additionally, we employ a feature extraction module and style embeddings to proficiently steer the generative transformer during the iterative prediction of masked image tokens, thus achieving a continuous stylized output that retains facial features accurately in sketches. The extensive experiments demonstrate that our method consistently outperforms previous algorithms across multiple benchmarks, exhibiting a discernible disparity.

[AI-30] Cell-ontology guided transcriptome foundation model DATE

Link: https://arxiv.org/abs/2408.12373
Authors: Xinyu Yuan,Zhihao Zhan,Zuobai Zhang,Manqi Zhou,Jianan Zhao,Boyu Han,Yue Li,Jian Tang
Keywords-EN: Transcriptome foundation models, hold great promises, TFMs hold great, dictate diverse cell, diverse cell functions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*Comments: All anonymous reviewers’ constructive suggestions are appreciated. The next version will be updated soon

Click to view abstract

Abstract:Transcriptome foundation models (TFMs) hold great promise of deciphering the transcriptomic language that dictates diverse cell functions by self-supervised learning on large-scale single-cell gene expression data, and ultimately unraveling the complex mechanisms of human diseases. However, current TFMs treat cells as independent samples and ignore the taxonomic relationships between cell types, which are available in cell ontology graphs. We argue that effectively leveraging this ontology information during TFM pre-training can improve learning biologically meaningful gene co-expression patterns while preserving the TFM as a general-purpose foundation model for downstream zero-shot and fine-tuning tasks. To this end, we present the single-cell, Cell-ontology guided TFM scCello. We introduce a cell-type coherence loss and an ontology alignment loss, which are minimized along with the masked gene expression prediction loss during pre-training. These novel loss components guide scCello to learn the cell-type-specific representation and the structural relation between cell types from the cell ontology graph, respectively. We pre-trained scCello on 22 million cells from the CellxGene database, leveraging their cell-type labels mapped to the cell ontology graph from the Open Biological and Biomedical Ontology Foundry. Our TFM demonstrates competitive generalization and transferability performance over the existing TFMs on biologically important tasks including identifying novel cell types of unseen cells, prediction of cell-type-specific marker genes, and cancer drug responses.
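The pre-training objective described above (masked gene expression prediction plus the two ontology-driven losses) amounts to a weighted sum. The sketch below shows that combination; the weights `w_coherence` and `w_alignment` are hypothetical and not taken from the paper.

```python
def sccello_total_loss(l_masked, l_coherence, l_alignment,
                       w_coherence=1.0, w_alignment=1.0):
    """Combine the three pre-training objectives named in the abstract:
    masked gene expression prediction, cell-type coherence, and ontology
    alignment. The weighting scheme is an illustrative assumption."""
    return l_masked + w_coherence * l_coherence + w_alignment * l_alignment

# Toy per-batch loss values, purely for illustration.
total = sccello_total_loss(l_masked=0.8, l_coherence=0.3, l_alignment=0.2)
```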

[AI-31] RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering

Link: https://arxiv.org/abs/2408.12369
Authors: Pratyush Kumar,Kuber Vijaykumar Bellad,Bharat Vadlamudi,Aman Chadha
Keywords-EN: Large Language Models, executable database queries, plain English, advancements in Large, translating user questions
Subjects: Artificial Intelligence (cs.AI)
*Comments: 13 pages, 4 figures

Click to view abstract

Abstract:With advancements in Large Language Models (LLMs), a major use case that has emerged is querying databases in plain English, translating user questions into executable database queries, which has improved significantly. However, real-world datasets often feature a vast array of attributes and complex values, complicating the LLM's task of accurately identifying relevant columns or values from natural language queries. Traditional methods cannot fully relay the dataset's size and complexity to the LLM. To address these challenges, we propose a novel framework that leverages Full-Text Search (FTS) on the input table. This approach not only enables precise detection of specific values and columns but also narrows the search space for language models, thereby enhancing query accuracy. Additionally, it supports a custom auto-complete feature that suggests queries based on the data in the table. This integration significantly refines the interaction between the user and complex datasets, offering a sophisticated solution to the limitations faced by current table querying capabilities. This work is accompanied by an application for both Mac and Windows platforms, which readers can try out themselves on their own data.
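The idea of narrowing the LLM's search space with Full-Text Search can be sketched with SQLite's built-in FTS5 engine. This is a stand-in: the paper does not specify its FTS backend, and the toy column/value schema here is hypothetical.

```python
import sqlite3

# Hypothetical mini index of (column name, cell value) pairs from an input table.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts5(col_name, cell_value)")
rows = [("city", "San Francisco"), ("city", "New York"), ("dept", "Sales")]
con.executemany("INSERT INTO docs VALUES (?, ?)", rows)

def narrow_search_space(user_question):
    """Use full-text search to find which columns/values the question
    actually mentions, so the prompt sent to the LLM only needs to
    include the relevant slice of the schema."""
    return con.execute(
        "SELECT col_name, cell_value FROM docs WHERE docs MATCH ?",
        (user_question,),
    ).fetchall()

matches = narrow_search_space("Francisco")
```

Only the matching column and value would then be passed to the LLM, instead of the whole table description.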

[AI-32] SAM-SP: Self-Prompting Makes SAM Great Again

Link: https://arxiv.org/abs/2408.12364
Authors: Chunpeng Zhou,Kangjie Ning,Qianqian Shen,Sheng Zhou,Zhi Yu,Haishuai Wang
Keywords-EN: Visual Foundation Model, recently introduced Segment, Visual Foundation, demonstrated impressive capabilities, diverse natural image
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
*Comments: Under Review

Click to view abstract

Abstract:The recently introduced Segment Anything Model (SAM), a Visual Foundation Model (VFM), has demonstrated impressive capabilities in zero-shot segmentation tasks across diverse natural image datasets. Despite its success, SAM encounters noticeable performance degradation when applied to specific domains, such as medical images. Current efforts to address this issue have involved fine-tuning strategies intended to bolster the generalizability of the vanilla SAM. However, these approaches still predominantly necessitate the utilization of domain-specific expert-level prompts during the evaluation phase, which severely constrains the model's practicality. To overcome this limitation, we introduce a novel self-prompting based fine-tuning approach, called SAM-SP, tailored for extending the vanilla SAM model. Specifically, SAM-SP leverages the output from the previous iteration of the model itself as a prompt to guide the subsequent iteration of the model. This self-prompting module endeavors to learn how to generate useful prompts autonomously and alleviates the dependence on expert prompts during the evaluation phase, significantly broadening SAM's applicability. Additionally, we integrate a self-distillation module to enhance the self-prompting process further. Extensive experiments across various domain-specific datasets validate the effectiveness of the proposed SAM-SP. Our SAM-SP not only alleviates the reliance on expert prompts but also exhibits superior segmentation performance compared to state-of-the-art task-specific segmentation approaches, the vanilla SAM, and SAM-based approaches.
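The self-prompting loop, where each iteration's output mask becomes the next iteration's prompt, can be sketched with a stub segmenter. The toy model and its blending rule are illustrative assumptions, not SAM's real interface.

```python
def toy_segmenter(image, prompt_mask):
    """Stand-in for SAM: refine the prompt mask by blending it toward the
    image's bright region. Purely illustrative of a promptable model."""
    return [0.5 * p + 0.5 * (1.0 if px > 0.5 else 0.0)
            for p, px in zip(prompt_mask, image)]

def self_prompting_inference(image, iterations=3):
    """SAM-SP style loop: the model's previous output becomes the prompt
    for the next iteration, removing the need for expert-provided prompts."""
    mask = [0.5] * len(image)          # uninformative initial prompt
    for _ in range(iterations):
        mask = toy_segmenter(image, mask)
    return mask

mask = self_prompting_inference([0.9, 0.1, 0.8, 0.2])
```

Each pass sharpens the mask toward the foreground without any external prompt.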

[AI-33] Class-balanced Open-set Semi-supervised Object Detection for Medical Images

Link: https://arxiv.org/abs/2408.12355
Authors: Zhanyun Lu,Renshu Gu,Huimin Cheng,Siyu Pang,Mingyu Xu,Peifang Xu,Yaqi Wang,Yuichiro Kinoshita,Juan Ye,Gangyong Jia,Qing Wu
Keywords-EN: Semi-Supervised Object Detection, Object Detection, utilize unlabeled data, open-set semi-supervised object, Semi-Supervised Object
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*Comments:

Click to view abstract

Abstract:Medical image datasets in the real world are often unlabeled and imbalanced, and Semi-Supervised Object Detection (SSOD) can utilize unlabeled data to improve an object detector. However, existing approaches predominantly assumed that the unlabeled data and test data do not contain out-of-distribution (OOD) classes. The few open-set semi-supervised object detection methods have two weaknesses: first, the class imbalance is not considered; second, the OOD instances are distinguished and simply discarded during pseudo-labeling. In this paper, we consider the open-set semi-supervised object detection problem which leverages unlabeled data that contain OOD classes to improve object detection for medical images. Our study incorporates two key innovations: Category Control Embed (CCE) and out-of-distribution Detection Fusion Classifier (OODFC). CCE is designed to tackle dataset imbalance by constructing a Foreground information Library, while OODFC tackles open-set challenges by integrating the "unknown" information into basic pseudo-labels. Our method outperforms the state-of-the-art SSOD performance, achieving a 4.25 mAP improvement on the public Parasite dataset.

[AI-34] Fine-tuning Smaller Language Models for Question Answering over Financial Documents

Link: https://arxiv.org/abs/2408.12337
Authors: Karmvir Singh Phogat,Sai Akhil Puranam,Sridhar Dasaratha,Chetan Harsha,Shashishekar Ramakrishna
Keywords-EN: Recent research, acquire substantial reasoning, substantial reasoning abilities, reasoning exemplars crafted, significantly larger teacher
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
*Comments:

Click to view abstract

Abstract:Recent research has shown that smaller language models can acquire substantial reasoning abilities when fine-tuned with reasoning exemplars crafted by a significantly larger teacher model. We explore this paradigm for the financial domain, focusing on the challenge of answering questions that require multi-hop numerical reasoning over financial texts. We assess the performance of several smaller models that have been fine-tuned to generate programs that encode the required financial reasoning and calculations. Our findings demonstrate that these fine-tuned smaller models approach the performance of the teacher model. To provide a granular analysis of model performance, we propose an approach to investigate the specific student model capabilities that are enhanced by fine-tuning. Our empirical analysis indicates that fine-tuning refines the student model's ability to express and apply the required financial concepts along with adapting the entity extraction for the specific data format. In addition, we hypothesize and demonstrate that comparable financial reasoning capability can be induced using relatively smaller datasets.

[AI-35] Enhanced Expressivity in Graph Neural Networks with Lanczos-Based Linear Constraints

Link: https://arxiv.org/abs/2408.12334
Authors: Niloofar Azizi,Nils Kriege,Horst Bischof
Keywords-EN: Graph Neural Networks, Message Passing GNNs, Neural Networks, Message Passing, commonly used Message
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*Comments:

Click to view abstract

Abstract:Graph Neural Networks (GNNs) excel in handling graph-structured data but often underperform in link prediction tasks compared to classical methods, mainly due to the limitations of the commonly used Message Passing GNNs (MPNNs). Notably, their ability to distinguish non-isomorphic graphs is limited by the 1-dimensional Weisfeiler-Lehman test. Our study presents a novel method to enhance the expressivity of GNNs by embedding induced subgraphs into the graph Laplacian matrix's eigenbasis. We introduce a Learnable Lanczos algorithm with Linear Constraints (LLwLC), proposing two novel subgraph extraction strategies: encoding vertex-deleted subgraphs and applying Neumann eigenvalue constraints. For the former, we conjecture that LLwLC establishes a universal approximator, offering efficient time complexity. The latter focuses on link representations enabling differentiation between k-regular graphs and node automorphism, a vital aspect for link prediction tasks. Our approach results in an extremely lightweight architecture, reducing the need for extensive training datasets. Empirically, our method improves performance in challenging link prediction tasks across benchmark datasets, establishing its practical utility and supporting our theoretical findings. Notably, LLwLC achieves 20x and 10x speedups while requiring only 5% and 10% of the data from the PubMed and OGBL-Vessel datasets, respectively, compared to the state-of-the-art.
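The Lanczos routine at the heart of the method tridiagonalizes a symmetric matrix so that the eigenvalues of the small tridiagonal matrix (Ritz values) approximate the spectrum. Below is a textbook sketch on a path-graph Laplacian, without the paper's learnable linear constraints, which are its actual contribution.

```python
import numpy as np

def lanczos(A, k, seed=0):
    """Basic Lanczos iteration: build an orthonormal Krylov basis and the
    tridiagonal matrix T whose eigenvalues (Ritz values) approximate A's
    extreme eigenvalues. Classic routine only; LLwLC adds learnable
    linear constraints on top of this."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k - 1)
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        alpha[j] = q @ w
        w -= alpha[j] * q
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return np.linalg.eigvalsh(T)

# Laplacian of a path graph on 5 nodes, a typical graph-learning matrix.
L = np.diag([1.0, 2.0, 2.0, 2.0, 1.0])
for i in range(4):
    L[i, i + 1] = L[i + 1, i] = -1.0
ritz = lanczos(L, k=5)
```

With k equal to the matrix size, the Ritz values recover the full spectrum; in practice k is kept small so only the extreme eigenpairs are approximated cheaply.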

[AI-36] Graph Retrieval Augmented Trustworthiness Reasoning

Link: https://arxiv.org/abs/2408.12333
Authors: Ying Zhu,Shengchang Li,Ziqian Kong,Peilan Xu
Keywords-EN: identify potential allies, allies and adversaries, decision-making processes, Graph Retrieval Augmented, identify potential
Subjects: Artificial Intelligence (cs.AI)
*Comments:

Click to view abstract

Abstract:Trustworthiness reasoning is crucial in multiplayer games with incomplete information, enabling agents to identify potential allies and adversaries, thereby enhancing reasoning and decision-making processes. Traditional approaches relying on pre-trained models necessitate extensive domain-specific data and considerable reward feedback, with their lack of real-time adaptability hindering their effectiveness in dynamic environments. In this paper, we introduce the Graph Retrieval Augmented Reasoning (GRATR) framework, leveraging the Retrieval-Augmented Generation (RAG) technique to bolster trustworthiness reasoning in agents. GRATR constructs a dynamic trustworthiness graph, updating it in real-time with evidential information, and retrieves relevant trust data to augment the reasoning capabilities of Large Language Models (LLMs). We validate our approach through experiments on the multiplayer game “Werewolf,” comparing GRATR against baseline LLM and LLM enhanced with Native RAG and Rerank RAG. Our results demonstrate that GRATR surpasses the baseline methods by over 30% in winning rate, with superior reasoning performance. Moreover, GRATR effectively mitigates LLM hallucinations, such as identity and objective amnesia, and crucially, it renders the reasoning process more transparent and traceable through the use of the trustworthiness graph.
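The dynamic trustworthiness graph can be sketched as a dictionary of trust-scored edges that is updated as evidence arrives and queried at retrieval time. The exponential-moving-average update rule and the player names here are illustrative assumptions, not GRATR's actual formulation.

```python
class TrustGraph:
    """Minimal dynamic trustworthiness graph in the spirit of GRATR:
    edges carry trust scores updated with evidential information, and
    the strongest beliefs are retrieved to augment an LLM's context."""

    def __init__(self):
        self.trust = {}   # (observer, target) -> score in [-1, 1]

    def observe(self, observer, target, evidence_weight):
        key = (observer, target)
        prev = self.trust.get(key, 0.0)
        # Exponential moving average of evidence (hypothetical rule).
        self.trust[key] = 0.7 * prev + 0.3 * evidence_weight

    def retrieve(self, observer, top_k=2):
        """Return the observer's strongest positive/negative beliefs,
        ready to be serialized into a RAG prompt."""
        edges = [(t, s) for (o, t), s in self.trust.items() if o == observer]
        return sorted(edges, key=lambda e: abs(e[1]), reverse=True)[:top_k]

g = TrustGraph()
g.observe("P1", "P2", +1.0)   # P2 defended P1
g.observe("P1", "P3", -1.0)   # P3 accused P1
g.observe("P1", "P3", -1.0)   # repeated evidence strengthens distrust
facts = g.retrieve("P1")
```

Because the graph is updated per event, the retrieved trust facts stay current without any retraining, which is the real-time adaptability the abstract emphasizes.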

[AI-37] Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models

Link: https://arxiv.org/abs/2408.12326
Authors: Meiyun Wang,Masahiro Suzuki,Hiroki Sakaji,Kiyoshi Izumi
Keywords-EN: Large Language Models, Large Language, demonstrated exceptional capabilities, Language Models, Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computers and Society (cs.CY)
*Comments:

Click to view abstract

Abstract:Large Language Models (LLMs) have demonstrated exceptional capabilities across various machine learning (ML) tasks. Given the high costs of creating annotated datasets for supervised learning, LLMs offer a valuable alternative by enabling effective few-shot in-context learning. However, these models can produce hallucinations, particularly in domains with incomplete knowledge. Additionally, current methods for knowledge distillation using LLMs often struggle to enhance the effectiveness of both teacher and student models. To address these challenges, we introduce DualChecker, an innovative framework designed to mitigate hallucinations and improve the performance of both teacher and student models during knowledge distillation. DualChecker employs ContextAligner to ensure that the context provided by teacher models aligns with human labeling standards. It also features a dynamic checker system that enhances model interaction: one component re-prompts teacher models with more detailed content when they show low confidence, and another identifies borderline cases from student models to refine the teaching templates. This interactive process promotes continuous improvement and effective knowledge transfer between the models. We evaluate DualChecker using a green innovation textual dataset that includes binary, multiclass, and token classification tasks. The experimental results show that DualChecker significantly outperforms existing state-of-the-art methods, achieving up to a 17% improvement in F1 score for teacher models and 10% for student models. Notably, student models fine-tuned with LLM predictions perform comparably to those fine-tuned with actual data, even in a challenging domain. We make all datasets, models, and code from this research publicly available.
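The re-prompting component of the checker can be sketched as follows; the stub teacher, confidence threshold, and prompt contents are all hypothetical stand-ins for real LLM calls.

```python
def dual_checker_label(teacher, prompt, detail_prompt, threshold=0.7):
    """DualChecker-style interaction sketch: if the teacher's confidence
    is low, re-prompt it with more detailed content before accepting
    the label. Threshold and re-prompt strategy are illustrative."""
    label, confidence = teacher(prompt)
    if confidence < threshold:
        label, confidence = teacher(detail_prompt)
    return label, confidence

def stub_teacher(prompt):
    # Pretend that prompts enriched with a definition yield confident answers.
    if "definition:" in prompt:
        return "green-innovation", 0.9
    return "unknown", 0.4

label, conf = dual_checker_label(
    stub_teacher,
    prompt="Classify: recyclable battery design",
    detail_prompt=("definition: green innovation covers eco-friendly tech. "
                   "Classify: recyclable battery design"),
)
```

The symmetric direction (mining borderline student cases to refine teaching templates) would follow the same pattern with the roles swapped.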

[AI-38] PolyRouter: A Multi-LLM Querying System

Link: https://arxiv.org/abs/2408.12320
Authors: Dimitris Stripelis,Zijian Hu,Jipeng Zhang,Zhaozhuo Xu,Alay Shah,Han Jin,Yuhang Yao,Salman Avestimehr,Chaoyang He
Keywords-EN: Large Language Models, Large Language, possessing domain-specific expertise, growth of Large, domain-specific expertise
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*Comments: 14 pages, 7 figures, 2 tables

Click to view abstract

Abstract:With the rapid growth of Large Language Models (LLMs) across various domains, numerous new LLMs have emerged, each possessing domain-specific expertise. This proliferation has highlighted the need for quick, high-quality, and cost-effective LLM query response methods. Yet, no single LLM exists to efficiently balance this trilemma. Some models are powerful but extremely costly, while others are fast and inexpensive but qualitatively inferior. To address this challenge, we present PolyRouter, a non-monolithic LLM querying system that seamlessly integrates various LLM experts into a single query interface and dynamically routes incoming queries to the highest-performing expert based on the query's requirements. Through extensive experiments, we demonstrate that when compared to standalone expert models, PolyRouter improves query efficiency by up to 40%, and leads to significant cost reductions of up to 30%, while maintaining or enhancing model performance by up to 10%.
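A query router of this kind can be sketched as a cost/quality lookup over an expert registry. The registry values and the routing rule below are illustrative assumptions, not PolyRouter's actual policy.

```python
# Hypothetical expert registry: cost per query, quality score, domain tags.
EXPERTS = {
    "code-llm":    {"cost": 1.0, "quality": 0.90, "tags": {"code", "debugging"}},
    "medical-llm": {"cost": 1.2, "quality": 0.92, "tags": {"medical"}},
    "cheap-llm":   {"cost": 0.1, "quality": 0.60, "tags": set()},
}

def route(query_tags, min_quality=0.8):
    """Send the query to the cheapest expert whose domain tags match and
    whose quality clears the bar; fall back to the cheap generalist.
    This balances the cost/quality/speed trilemma in a toy way."""
    candidates = [
        (spec["cost"], name) for name, spec in EXPERTS.items()
        if spec["quality"] >= min_quality and (query_tags & spec["tags"])
    ]
    return min(candidates)[1] if candidates else "cheap-llm"

chosen = route({"medical"})
fallback = route({"poetry"})
```

A production router would score queries with a learned classifier rather than hand-set tags, but the interface (one entry point, many experts) is the same.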

[AI-39] Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations

Link: https://arxiv.org/abs/2408.12315
Authors: Kai Tzu-iunn Ong,Taeyoon Kwon,Jinyoung Yeo
Keywords-EN: Guiding large language, large language models, improving LLM applications, Guiding large, large language
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
*Comments: preprint / under review

Click to view abstract

Abstract:Guiding large language models with a selected set of human-authored demonstrations is a common practice for improving LLM applications. However, human effort can be costly, especially in specialized domains (e.g., clinical diagnosis), and does not guarantee optimal performance due to the potential discrepancy of target skills between selected demonstrations and real test instances. Motivated by these, this paper explores the automatic creation of customized demonstrations, whose target skills align with the given target instance. We present SELF-TAUGHT, a problem-solving framework, which facilitates demonstrations that are “tailored” to the target problem and “filtered” for better quality (i.e., correctness) in a zero-shot manner. In 15 tasks of multiple-choice questions of diverse domains and the diagnosis of Alzheimer’s disease (AD) with real-world patients, SELF-TAUGHT achieves superior performance to strong baselines (e.g., Few-shot CoT, Plan-and-Solve, Auto-CoT). We conduct comprehensive analyses on SELF-TAUGHT, including its generalizability to existing prompting methods and different LLMs, the quality of its intermediate generation, and more.
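The tailor-then-filter recipe can be sketched with stand-in `generate` and `verify` callables; in the paper these would be zero-shot LLM calls, and the arithmetic toy filter here is purely illustrative.

```python
def self_taught_demos(generate, verify, target_question, n_candidates=4):
    """SELF-TAUGHT sketch: create demonstrations tailored to the target
    question, then keep only those that pass a correctness filter."""
    demos = [generate(target_question, i) for i in range(n_candidates)]
    return [d for d in demos if verify(d)]

# Toy stand-ins: a "demonstration" is a (question, answer) pair, and the
# filter checks the answer by actually evaluating the arithmetic.
gen = lambda q, i: (f"{i} + {i}", i + i if i % 2 == 0 else i + i + 1)
check = lambda d: eval(d[0]) == d[1]
kept = self_taught_demos(gen, check, target_question="What is 3 + 3?")
```

Only the verified demonstrations survive to be placed in the final prompt, which is the "filtered for better quality" step the abstract describes.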

[AI-40] Deep Learning with CNNs: A Compact Holistic Tutorial with Focus on Supervised Regression (Preprint)

Link: https://arxiv.org/abs/2408.12308
Authors: Yansel Gonzalez Tejeda,Helmut A. Mayer
Keywords-EN: Convolutional Neural Networks, Deep Learning, Learning, address Deep Learning, Neural Networks
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*Comments:

Click to view abstract

Abstract:In this tutorial, we present a compact and holistic discussion of Deep Learning with a focus on Convolutional Neural Networks (CNNs) and supervised regression. While there are numerous books and articles on the individual topics we cover, comprehensive and detailed tutorials that address Deep Learning from a foundational yet rigorous and accessible perspective are rare. Most resources on CNNs are either too advanced, focusing on cutting-edge architectures, or too narrow, addressing only specific applications like image classification. This tutorial not only summarizes the most relevant concepts but also provides an in-depth exploration of each, offering a complete yet agile set of ideas. Moreover, we highlight the powerful synergy between learning theory, statistics, and machine learning, which together underpin the Deep Learning and CNN frameworks. We aim for this tutorial to serve as an optimal resource for students, professors, and anyone interested in understanding the foundations of Deep Learning. Upon acceptance, we will provide an accompanying repository at this https URL. Keywords: Tutorial, Deep Learning, Convolutional Neural Networks, Machine Learning.
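The core operation such a tutorial builds on, the 2D convolution (implemented, as in most frameworks, as cross-correlation), can be written in a few lines of plain Python:

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the basic CNN building block
    (no padding or stride, plain lists for clarity)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# A vertical-edge detector applied to a step image.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
k = [[-1, 1],
     [-1, 1]]
feature_map = conv2d(img, k)
```

The feature map responds only where the image's intensity jumps, which is exactly what a learned CNN filter does, except that the kernel weights come from training.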

[AI-41] Are Large Language Models More Successful Than Humans in the Medical Specialization Examination (TUS)?

Link: https://arxiv.org/abs/2408.12305
Authors: Yesim Aygul,Muge Olucoglu,Adil Alpkocak
Keywords-EN: natural language processing, artificial intelligence, medical, Term Medical Specialization, Medical Specialization Examination
Subjects: Artificial Intelligence (cs.AI)
*Comments: 9 pages, in Turkish, 8 figures

Click to view abstract

Abstract:The potential of artificial intelligence in medical education and assessment has been made evident by recent developments in natural language processing and artificial intelligence. Medical questions can now be successfully answered by artificial intelligence algorithms, which can assist medical practitioners. This study evaluates the performance of three different artificial intelligence models in answering Turkish medical questions from the 2021 1st Term Medical Specialization Examination (MSE). The MSE consists of a total of 240 questions across clinical (CMST) and basic (BMST) medical sciences. In the CMST, Gemini correctly answered 82 questions, ChatGPT-4 answered 105 questions, and ChatGPT-4o answered 117 questions. In the BMST, Gemini and ChatGPT-4 each answered 93 questions correctly and ChatGPT-4o answered 107 questions correctly according to the answer key. ChatGPT-4o outperformed the top-scoring human candidate, whose scores were 113 on the CMST and 106 on the BMST. This study highlights the potential of artificial intelligence in medical education and assessment, demonstrating that advanced models can achieve high accuracy and contextual understanding and indicating their potential role in medical education and evaluation.

[AI-42] OPTDTALS: Approximate Logic Synthesis via Optimal Decision Trees Approach

Link: https://arxiv.org/abs/2408.12304
Authors: Hao Hu,Shaowei Cai
Keywords-EN: Explainable Artificial Intelligence, motivates promising studies, computing optimal Interpretable, Interpretable Machine Learning, interest in Explainable
Subjects: Artificial Intelligence (cs.AI)
*Comments:

Click to view abstract

Abstract:The growing interest in Explainable Artificial Intelligence (XAI) motivates promising studies of computing optimal Interpretable Machine Learning models, especially decision trees. Such models generally provide optimality in compact size or empirical accuracy. Recent works focus on improving efficiency due to the natural scalability issue. The application of such models to practical problems is quite limited. As an emerging problem in circuit design, Approximate Logic Synthesis (ALS) aims to reduce circuit complexity by sacrificing correctness. Recently, multiple heuristic machine learning methods have been applied in ALS, which learns approximated circuits from samples of input-output pairs. In this paper, we propose a new ALS methodology realizing the approximation via learning optimal decision trees in empirical accuracy. Compared to previous heuristic ALS methods, the guarantee of optimality achieves a more controllable trade-off between circuit complexity and accuracy. Experimental results show clear improvements in our methodology in the quality of approximated designs (circuit complexity and accuracy) compared to the state-of-the-art approaches.
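The complexity-versus-accuracy trade-off can be illustrated with an extreme toy: approximating a small circuit by a single input literal (a depth-1 "tree" without negations) and measuring empirical accuracy on the truth table. OPTDTALS searches for optimal decision trees instead of this crude rule, but the trade-off being navigated is the same.

```python
from itertools import product

def truth_table(f, n):
    """Enumerate all input-output pairs of an n-input boolean function."""
    return [(bits, f(*bits)) for bits in product([0, 1], repeat=n)]

def best_single_literal(table):
    """Approximate a circuit by the single input variable that agrees
    with it most often: the simplest possible approximation, trading
    almost all circuit complexity for some loss of accuracy."""
    n = len(table[0][0])
    best = max(range(n),
               key=lambda i: sum(bits[i] == out for bits, out in table))
    acc = sum(bits[best] == out for bits, out in table) / len(table)
    return best, acc

# Target circuit: f(a, b, c) = (a AND b) OR c
table = truth_table(lambda a, b, c: (a & b) | c, 3)
var, acc = best_single_literal(table)
```

Here the input `c` alone reproduces the circuit on 7 of 8 input patterns, so a one-wire "circuit" buys an 87.5% empirical accuracy; a deeper optimal tree would recover the remaining case at the cost of more gates.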

[AI-43] AT-SNN: Adaptive Tokens for Vision Transformer on Spiking Neural Network

Link: https://arxiv.org/abs/2408.12293
Authors: Donghwa Kang,Youngmoon Lee,Eun-Kyu Lee,Brent Kang,Jinkyu Lee,Hyeongboo Baek
Keywords-EN: spiking neural networks, reducing power consumption, neural networks, convolutional neural networks, orthogonally developed
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
*Comments: 8 pages

Click to view abstract

Abstract:In the training and inference of spiking neural networks (SNNs), direct training and lightweight computation methods have been orthogonally developed, aimed at reducing power consumption. However, only a limited number of approaches have applied these two mechanisms simultaneously and failed to fully leverage the advantages of SNN-based vision transformers (ViTs) since they were originally designed for convolutional neural networks (CNNs). In this paper, we propose AT-SNN, designed to dynamically adjust the number of tokens processed during inference in SNN-based ViTs with direct training, wherein power consumption is proportional to the number of tokens. We first demonstrate the applicability of adaptive computation time (ACT), previously limited to RNNs and ViTs, to SNN-based ViTs, enhancing it to selectively discard less informative spatial tokens. Also, we propose a new token-merge mechanism that relies on the similarity of tokens, which further reduces the number of tokens while enhancing accuracy. We implement AT-SNN on Spikformer and show the effectiveness of AT-SNN in achieving high energy efficiency and accuracy compared to state-of-the-art approaches on the image classification tasks CIFAR-10, CIFAR-100, and TinyImageNet. For example, our approach uses up to 42.4% fewer tokens than the existing best-performing method on CIFAR-100, while preserving higher accuracy.
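The similarity-based token merge can be sketched as greedy averaging of token vectors whose cosine similarity exceeds a threshold; the greedy pairing and the threshold value are illustrative choices, not the paper's exact mechanism.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def merge_similar_tokens(tokens, threshold=0.95):
    """Greedy token merging in the spirit of AT-SNN: average together
    tokens whose cosine similarity exceeds the threshold, so later
    layers process fewer of them (and spend less energy)."""
    merged, used = [], [False] * len(tokens)
    for i, t in enumerate(tokens):
        if used[i]:
            continue
        group = [t]
        for j in range(i + 1, len(tokens)):
            if not used[j] and cosine(t, tokens[j]) > threshold:
                used[j] = True
                group.append(tokens[j])
        merged.append([sum(c) / len(group) for c in zip(*group)])
    return merged

tokens = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
reduced = merge_similar_tokens(tokens)
```

Two nearly identical tokens collapse into their average while the dissimilar one survives, cutting the token count (and, in an SNN, the spike count) for subsequent layers.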

[AI-44] Towards Deconfounded Image-Text Matching with Causal Inference ACM-MM

Link: https://arxiv.org/abs/2408.12292
Authors: Wenhui Li,Xinqi Su,Dan Song,Lanjun Wang,Kun Zhang,An-An Liu
Keywords-EN: shown remarkable performance, image-text matching, Prior image-text matching, Structural Causal Models, image-text matching model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*Comments: ACM MM

Click to view abstract

Abstract:Prior image-text matching methods have shown remarkable performance on many benchmark datasets, but most of them overlook the bias in the dataset, which exists intra-modally and inter-modally, and tend to learn the spurious correlations that severely degrade the generalization ability of the model. Furthermore, these methods often incorporate biased external knowledge from large-scale datasets as prior knowledge into the image-text matching model, which inevitably forces the model to further learn biased associations. To address the above limitations, this paper first utilizes Structural Causal Models (SCMs) to illustrate how intra- and inter-modal confounders damage image-text matching. Then, we employ backdoor adjustment to propose an innovative Deconfounded Causal Inference Network (DCIN) for the image-text matching task. DCIN (1) decomposes the intra- and inter-modal confounders and incorporates them into the encoding stage of visual and textual features, effectively eliminating the spurious correlations during image-text matching, and (2) uses causal inference to mitigate biases of external knowledge. Consequently, the model can learn causality instead of spurious correlations caused by dataset bias. Extensive experiments on two well-known benchmark datasets, i.e., Flickr30K and MSCOCO, demonstrate the superiority of our proposed method.
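The backdoor adjustment the paper applies can be written, for a confounder Z affecting both the input X and the match label Y, as:

```latex
P(Y \mid \mathrm{do}(X)) = \sum_{z} P(Y \mid X, Z = z)\, P(Z = z)
```

Conditioning on and then marginalizing over the confounder blocks the spurious backdoor path, so the model estimates the causal effect of the image-text pair on the match decision rather than the dataset-induced correlation.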

[AI-45] Developing vocal system impaired patient-aimed voice quality assessment approach using ASR representation-included multiple features INTERSPEECH2024

Link: https://arxiv.org/abs/2408.12279
Authors: Shaoxiang Dang,Tetsuya Matsumoto,Yoshinori Takeuchi,Takashi Tsuboi,Yasuhiro Tanaka,Daisuke Nakatsubo,Satoshi Maesawa,Ryuta Saito,Masahisa Katsuno,Hiroaki Kudo
Keywords-EN: samples loom large, imbalanced clinical data, clinical data samples, data samples loom, clinical speech processing
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
*Comments: Accepted by Interspeech 2024

Click to view abstract

Abstract:The potential of deep learning in clinical speech processing is immense, yet the hurdles of limited and imbalanced clinical data samples loom large. This article addresses these challenges by showcasing the utilization of automatic speech recognition and self-supervised learning representations, pre-trained on extensive datasets of normal speech. This innovative approach aims to estimate the voice quality of patients with impaired vocal systems. Experiments involve evaluations on the PVQD dataset, covering various causes of vocal system damage in English, and a Japanese dataset focusing on patients with Parkinson's disease before and after undergoing subthalamic nucleus deep brain stimulation (STN-DBS) surgery. The results on PVQD reveal a notable correlation (0.8 on PCC) and an extraordinary accuracy (0.5 on MSE) in predicting Grade, Breathy, and Asthenic indicators. Meanwhile, progress has been achieved in predicting the voice quality of patients in the context of STN-DBS.

[AI-46] Variance reduction of diffusion models gradients with Taylor approximation-based control variate ICML

Link: https://arxiv.org/abs/2408.12270
Authors: Paul Jeha,Will Grathwohl,Michael Riis Andersen,Carl Henrik Ek,Jes Frellsen
Keywords-EN: denoising score matching, Score-based models, trained with denoising, score matching, high dimensional data
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*Comments: 14 pages, ICML Structured Probabilistic Inference Generative Modeling 2024

Click to view abstract

Abstract:Score-based models, trained with denoising score matching, are remarkably effective in generating high dimensional data. However, the high variance of their training objective hinders optimisation. We attempt to reduce it with a control variate, derived via a k-th order Taylor expansion on the training objective and its gradient. We prove an equivalence between the two, demonstrate empirically the effectiveness of our approach in a low-dimensional problem setting, and study its effect on larger problems.
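The control-variate idea can be reproduced in a toy 1-D setting: estimating E[f(Z)] for f = exp with Z standard normal, subtracting a second-order Taylor surrogate g(z) = 1 + z + z²/2 whose exact mean under N(0,1) is 1.5. The choice of f and the sample size are illustrative, not the paper's diffusion objective.

```python
import math
import random

def estimate(f, n, use_cv, seed=0):
    """Monte-Carlo estimate of E[f(Z)], Z ~ N(0,1), optionally subtracting
    the zero-mean control variate g(z) - E[g] built from a 2nd-order
    Taylor expansion of exp: g(z) = 1 + z + z^2/2, E[g] = 1.5."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        val = f(z)
        if use_cv:
            val -= (1 + z + z * z / 2) - 1.5   # subtract g(z) - E[g]
        samples.append(val)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var

plain_mean, plain_var = estimate(math.exp, 2000, use_cv=False)
cv_mean, cv_var = estimate(math.exp, 2000, use_cv=True)
```

Because the surrogate absorbs the low-order fluctuations of f, the corrected estimator has the same expectation (e^{1/2} ≈ 1.6487) but a markedly smaller variance.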

[AI-47] Toward the Evaluation of Large Language Models Considering Score Variance across Instruction Templates

链接: https://arxiv.org/abs/2408.12263
作者: Yusuke Sakai,Adam Nohejl,Jiangnan Hang,Hidetaka Kamigaito,Taro Watanabe
关键词-EN: natural language understanding, large language models, NLU performance, language understanding, language models
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: 19 pages, 7 figures

点击查看摘要

Abstract:The natural language understanding (NLU) performance of large language models (LLMs) has been evaluated across various tasks and datasets. The existing evaluation methods, however, do not take into account the variance in scores due to differences in prompts, which leads to unfair evaluation and comparison of NLU performance. Moreover, evaluation designed for specific prompts is inappropriate for instruction tuning, which aims to perform well with any prompt. It is therefore necessary to find a way to measure NLU performance in a fair manner, considering score variance between different instruction templates. In this study, we provide English and Japanese cross-lingual datasets for evaluating the NLU performance of LLMs, which include multiple instruction templates for fair evaluation of each task, along with regular expressions to constrain the output format. Furthermore, we propose the Sharpe score as an evaluation metric that takes into account the variance in scores between templates. Comprehensive analysis of English and Japanese LLMs reveals that the high variance among templates has a significant impact on the fair evaluation of LLMs.
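A variance-penalized aggregate in the spirit of the Sharpe score can be sketched as follows (the exact formula is an assumption; the paper's definition may differ): reward a high mean score across instruction templates while penalizing the spread between them.

```python
import statistics

def sharpe_score(template_scores, eps=1e-8):
    """Sharpe-style aggregate of per-template accuracies (a sketch):
    high mean performance is rewarded, high variance across instruction
    templates is penalized."""
    mean = statistics.mean(template_scores)
    std = statistics.pstdev(template_scores)
    return mean / (std + eps)

stable  = [0.80, 0.81, 0.79]   # consistent across instruction templates
brittle = [0.95, 0.60, 0.85]   # similar mean, high template sensitivity
assert sharpe_score(stable) > sharpe_score(brittle)
```

Two models with the same average accuracy are thus separated by how robust they are to the choice of prompt template.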

[AI-48] Can You Trust Your Metric? Automatic Concatenation-Based Tests for Metric Validity

链接: https://arxiv.org/abs/2408.12259
作者: Ora Nova Fandina,Leshem Choshen,Eitan Farchi,George Kour,Yotam Perlitz,Orna Raz
关键词-EN: Large Language Model, Large Language, Language Model, unsafe responses generated, filter unsafe responses
类目: Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Consider a scenario where a harmfulness detection metric is employed by a system to filter unsafe responses generated by a Large Language Model. When analyzing individual harmful and unethical prompt-response pairs, the metric correctly classifies each pair as highly unsafe, assigning the highest score. However, when these same prompts and responses are concatenated, the metric’s decision flips, assigning the lowest possible score, thereby misclassifying the content as safe and allowing it to bypass the filter. In this study, we discovered that several harmfulness LLM-based metrics, including GPT-based, exhibit this decision-flipping phenomenon. Additionally, we found that even an advanced metric like GPT-4o is highly sensitive to input order. Specifically, it tends to classify responses as safe if the safe content appears first, regardless of any harmful content that follows, and vice versa. This work introduces automatic concatenation-based tests to assess the fundamental properties a valid metric should satisfy. We applied these tests in a model safety scenario to assess the reliability of harmfulness detection metrics, uncovering a number of inconsistencies.
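The decision-flipping property can be turned into an automatic check along these lines (a sketch with a hypothetical metric interface, not the paper's test suite): if every piece is individually flagged as unsafe, their concatenation should not come back safe.

```python
def concatenation_consistency(metric, prompts_responses, threshold=0.5):
    """Return False if the metric's verdict flips under concatenation.

    `metric` maps a text to a harmfulness score in [0, 1]; the interface
    and threshold are assumptions for illustration.
    """
    individually_unsafe = all(metric(t) > threshold for t in prompts_responses)
    concatenated = " ".join(prompts_responses)
    flips = individually_unsafe and metric(concatenated) <= threshold
    return not flips  # True means the metric passed this test

# A deliberately broken toy metric: flags short texts, waves long ones through.
toy_metric = lambda text: 0.9 if len(text) < 20 else 0.1
assert not concatenation_consistency(toy_metric, ["hurt them", "do harm now"])
```

Order-sensitivity checks (safe content first vs. last) follow the same pattern, with permuted concatenations instead of a single join.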

[AI-49] A Language-agnostic Model of Child Language Acquisition

链接: https://arxiv.org/abs/2408.12254
作者: Louis Mahon,Omri Abend,Uri Berger,Katherine Demuth,Mark Johnson,Mark Steedman
关键词-EN: recent semantic bootstrapping, semantic bootstrapping child-language, bootstrapping child-language acquisition, child-language acquisition model, designed for English
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:This work reimplements a recent semantic bootstrapping child-language acquisition model, which was originally designed for English, and trains it to learn a new language: Hebrew. The model learns from pairs of utterances and logical forms as meaning representations, and acquires both syntax and word meanings simultaneously. The results show that the model mostly transfers to Hebrew, but that a number of factors, including the richer morphology in Hebrew, makes the learning slower and less robust. This suggests that a clear direction for future work is to enable the model to leverage the similarities between different word forms.

[AI-50] Can Artificial Intelligence Embody Moral Values?

链接: https://arxiv.org/abs/2408.12250
作者: Torben Swoboda,Lode Lauwaert
关键词-EN: neutrality thesis holds, neutrality thesis, neutrality, artificial, thesis holds
类目: Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:The neutrality thesis holds that technology cannot be laden with values. This long-standing view has faced critiques, but much of the argumentation against neutrality has focused on traditional, non-smart technologies like bridges and razors. In contrast, AI is a smart technology increasingly used in high-stakes domains like healthcare, finance, and policing, where its decisions can cause moral harm. In this paper, we argue that artificial intelligence, particularly artificial agents that autonomously make decisions to pursue their goals, challenge the neutrality thesis. Our central claim is that the computational models underlying artificial agents can integrate representations of moral values such as fairness, honesty and avoiding harm. We provide a conceptual framework discussing the neutrality thesis, values, and AI. Moreover, we examine two approaches to designing computational models of morality, artificial conscience and ethical prompting, and present empirical evidence from text-based game environments that artificial agents with such models exhibit more ethical behavior compared to agents without these models. The findings support that AI can embody moral values, which contradicts the claim that all technologies are necessarily value-neutral.

[AI-51] LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction

链接: https://arxiv.org/abs/2408.12249
作者: Aishik Nagar,Viktor Schlegel,Thanh-Tung Nguyen,Hao Li,Yuping Wu,Kuluhan Binici,Stefan Winkler
关键词-EN: Large Language Models, Large Language, Language Models, Named Entity Recognition, document summarisation
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: 11 pages

点击查看摘要

Abstract:Large Language Models (LLMs) are increasingly adopted for applications in healthcare, reaching the performance of domain experts on tasks such as question answering and document summarisation. Despite their success on these tasks, it is unclear how well LLMs perform on tasks that are traditionally pursued in the biomedical domain, such as structured information extraction. To bridge this gap, in this paper, we systematically benchmark LLM performance in Medical Classification and Named Entity Recognition (NER) tasks. We aim to disentangle the contribution of different factors to the performance, particularly the impact of LLMs’ task knowledge and reasoning capabilities, their (parametric) domain knowledge, and addition of external knowledge. To this end we evaluate various open LLMs – including BioMistral and Llama-2 models – on a diverse set of biomedical datasets, using standard prompting, Chain-of-Thought (CoT) and Self-Consistency based reasoning as well as Retrieval-Augmented Generation (RAG) with PubMed and Wikipedia corpora. Counter-intuitively, our results reveal that standard prompting consistently outperforms more complex techniques across both tasks, laying bare the limitations in the current application of CoT, self-consistency and RAG in the biomedical domain. Our findings suggest that advanced prompting methods developed for knowledge- or reasoning-intensive tasks, such as CoT or RAG, are not easily portable to biomedical tasks where precise structured outputs are required. This highlights the need for more effective integration of external knowledge and reasoning mechanisms in LLMs to enhance their performance in real-world biomedical applications.

[AI-52] Efficient Multivariate Time Series Anomaly Detection Through Transfer Learning for Large-Scale Web services

链接: https://arxiv.org/abs/2408.12247
作者: Shenglin Zhang,Pengtian Zhu,Minghua Ma,Jiagang Wang,Yongqian Sun,Dongwen Li,Jingyu Wang,Qianying Guo,Xiaolei Hua,Lin Zhu,Dan Pei
关键词-EN: Large language models, Large language, specialized domains due, language models, excel at general
类目: Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Large language models (LLMs) excel at general question-answering (QA) but often fall short in specialized domains due to a lack of domain-specific knowledge. Commercial companies face the dual challenges of privacy protection and resource constraints when fine-tuning LLMs. This paper proposes a novel framework, Self-Evolution, designed to address these issues by leveraging lightweight open-source LLMs through multiple iterative fine-tuning rounds. To enhance the efficiency of iterative fine-tuning, Self-Evolution employs a strategy that filters and reinforces the knowledge with higher value during the iterative process. We employed Self-Evolution on Qwen1.5-7B-Chat using 4,000 documents containing rich domain knowledge from China Mobile, achieving a performance score 174% higher on domain-specific question-answering evaluations than Qwen1.5-7B-Chat and even 22% higher than Qwen1.5-72B-Chat. Self-Evolution has been deployed in China Mobile’s daily operation and maintenance for 117 days, and it improves the efficiency of locating alarms, fixing problems, and finding related reports, with an average efficiency improvement of over 18.6%. In addition, we release the Self-Evolution framework code at this https URL.

[AI-53] Weight Scope Alignment: A Frustratingly Easy Method for Model Merging

链接: https://arxiv.org/abs/2408.12237
作者: Yichu Xu,Xin-Chun Li,Le Gan,De-Chuan Zhan
关键词-EN: weight scope, efficiency and robustness, fundamental procedure, weight, scope
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Merging models becomes a fundamental procedure in some applications that consider model efficiency and robustness. The training randomness or Non-I.I.D. data poses a huge challenge for averaging-based model fusion. Previous research efforts focus on element-wise regularization or neural permutations to enhance model averaging while overlooking weight scope variations among models, which can significantly affect merging effectiveness. In this paper, we reveal variations in weight scope under different training conditions, shedding light on its influence on model merging. Fortunately, the parameters in each layer basically follow the Gaussian distribution, which inspires a novel and simple regularization approach named Weight Scope Alignment (WSA). It contains two key components: 1) leveraging a target weight scope to guide the model training process for ensuring weight scope matching in the subsequent model merging. 2) fusing the weight scope of two or more models into a unified one for multi-stage model fusion. We extend the WSA regularization to two different scenarios, including Mode Connectivity and Federated Learning. Abundant experimental studies validate the effectiveness of our approach.
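A WSA-style regularizer can be sketched as below (the concrete penalty form is an assumption; the paper only states that per-layer weights roughly follow a Gaussian and that their scopes should match before merging):

```python
import numpy as np

def weight_scope_penalty(weights, target_mu, target_sigma):
    """Sketch of a weight-scope regularizer: treat a layer's weights as
    roughly Gaussian and penalize deviation of their mean/std from a
    shared target scope, so that later averaging-based merging combines
    layers with matching scopes."""
    mu, sigma = weights.mean(), weights.std()
    return (mu - target_mu) ** 2 + (sigma - target_sigma) ** 2

rng = np.random.default_rng(0)
aligned = rng.normal(0.0, 0.02, size=4096)   # matches the target scope
drifted = rng.normal(0.1, 0.08, size=4096)   # trained under other conditions
assert weight_scope_penalty(aligned, 0.0, 0.02) < weight_scope_penalty(drifted, 0.0, 0.02)
```

In training, such a penalty would be summed over layers and added to the task loss; for multi-stage fusion, the target scope itself would be fused across models.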

[AI-54] MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient

链接: https://arxiv.org/abs/2408.12236
作者: Yanzeng Li,Cheng Zeng,Jinchao Zhang,Jie Zhou,Lei Zou
关键词-EN: practice clinical skills, education relies heavily, including medical image, medical image analysis, Medical
类目: Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Medical education relies heavily on Simulated Patients (SPs) to provide a safe environment for students to practice clinical skills, including medical image analysis. However, the high cost of recruiting qualified SPs and the lack of diverse medical imaging datasets have presented significant challenges. To address these issues, this paper introduces MedDiT, a novel knowledge-controlled conversational framework that can dynamically generate plausible medical images aligned with simulated patient symptoms, enabling diverse diagnostic skill training. Specifically, MedDiT integrates various patient Knowledge Graphs (KGs), which describe the attributes and symptoms of patients, to dynamically prompt Large Language Models’ (LLMs) behavior and control the patient characteristics, mitigating hallucination during medical conversation. Additionally, a well-tuned Diffusion Transformer (DiT) model is incorporated to generate medical images according to the specified patient attributes in the KG. In this paper, we present the capabilities of MedDiT through a practical demonstration, showcasing its ability to act in diverse simulated patient cases and generate the corresponding medical images. This can provide an abundant and interactive learning experience for students, advancing medical education by offering an immersive simulation platform for future healthcare professionals. The work sheds light on the feasibility of incorporating advanced technologies like LLM, KG, and DiT in education applications, highlighting their potential to address the challenges faced in simulated patient-based medical education.
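The KG-controlled prompting step can be sketched as follows (function names and the prompt wording are hypothetical; MedDiT's actual prompt construction is not specified in the abstract): patient attributes from the knowledge graph are rendered into a system prompt that pins down the simulated patient's behavior and curbs hallucination.

```python
def kg_to_prompt(patient_kg):
    """Render a patient knowledge graph (attribute -> value) into a
    system prompt constraining the simulated-patient LLM; a minimal
    sketch of knowledge-controlled prompting."""
    facts = "; ".join(f"{k}: {v}" for k, v in patient_kg.items())
    return ("You are a simulated patient. Stay consistent with these "
            f"facts and do not invent symptoms. Facts: {facts}")

prompt = kg_to_prompt({"age": 54, "chief complaint": "chest pain"})
assert "chest pain" in prompt
```

The same attributes would then condition the DiT model so that the generated medical images match the described case.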

[AI-55] EvalYaks: Instruction Tuning Datasets and LoRA Fine-tuned Models for Automated Scoring of CEFR B2 Speaking Assessment Transcripts

链接: https://arxiv.org/abs/2408.12226
作者: Nicy Scaria,Silvester John Joseph Kennedy,Thomas Latinovich,Deepak Subramani
关键词-EN: Relying on human, creates scalability challenges, English speaking assessments, CEFR, CEFR speaking assessments
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Relying on human experts to evaluate CEFR speaking assessments in an e-learning environment creates scalability challenges, as it limits how quickly and widely assessments can be conducted. We aim to automate the evaluation of CEFR B2 English speaking assessments in e-learning environments from conversation transcripts. First, we evaluate the capability of leading open source and commercial Large Language Models (LLMs) to score a candidate’s performance across various criteria in the CEFR B2 speaking exam in both global and India-specific contexts. Next, we create a new expert-validated, CEFR-aligned synthetic conversational dataset with transcripts that are rated at different assessment scores. In addition, new instruction-tuned datasets are developed from the English Vocabulary Profile (up to CEFR B2 level) and the CEFR-SP WikiAuto datasets. Finally, using these new datasets, we perform parameter efficient instruction tuning of Mistral Instruct 7B v0.2 to develop a family of models called EvalYaks. Four models in this family are for assessing the four sections of the CEFR B2 speaking exam, one for identifying the CEFR level of vocabulary and generating level-specific vocabulary, and another for detecting the CEFR level of text and generating level-specific text. EvalYaks achieved an average acceptable accuracy of 96%, a degree of variation of 0.35 levels, and performed 3 times better than the next best model. This demonstrates that a 7B parameter LLM instruction tuned with high-quality CEFR-aligned assessment data can effectively evaluate and score CEFR B2 English speaking assessments, offering a promising solution for scalable, automated language proficiency evaluation.

[AI-56] UNCO: Towards Unifying Neural Combinatorial Optimization through Large Language Model

链接: https://arxiv.org/abs/2408.12214
作者: Xia Jiang,Yaoxin Wu,Yuan Wang,Yingqian Zhang
关键词-EN: considerable research attention, attracted considerable research, applying neural networks, address combinatorial optimization, combinatorial optimization problems
类目: Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Recently, applying neural networks to address combinatorial optimization problems (COPs) has attracted considerable research attention. The prevailing methods always train deep models independently on specific problems, lacking a unified framework for concurrently tackling various COPs. To this end, we propose a unified neural combinatorial optimization (UNCO) framework to solve different types of COPs by a single model. Specifically, we use natural language to formulate text-attributed instances for different COPs and encode them in the same embedding space by the large language model (LLM). The obtained embeddings are further advanced by an encoder-decoder model without any problem-specific modules, thereby facilitating a unified process of solution construction. We further adopt the conflict gradients erasing reinforcement learning (CGERL) algorithm to train the UNCO model, delivering better performance across different COPs than vanilla multi-objective learning. Experiments show that the UNCO model can solve multiple COPs after a single-session training, and achieves satisfactory performance that is comparable to several traditional or learning-based baselines. Instead of pursuing the best performance for each COP, we explore the synergy between tasks and few-shot generalization based on LLM to inspire future work.

[AI-57] Relational decomposition for program synthesis

链接: https://arxiv.org/abs/2408.12212
作者: Céline Hocquette,Andrew Cropper
关键词-EN: relational synthesis sub-tasks, decomposes complex functional, complex functional tasks, simpler relational synthesis, synthesis sub-tasks
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:We introduce a novel approach to program synthesis that decomposes complex functional tasks into simpler relational synthesis sub-tasks. We demonstrate the effectiveness of our approach using an off-the-shelf inductive logic programming (ILP) system on three challenging datasets. Our results show that (i) a relational representation can outperform a functional one, and (ii) an off-the-shelf ILP system with a relational encoding can outperform domain-specific approaches.
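The functional-vs-relational distinction can be illustrated on a toy task (an illustration of the idea, not the paper's ILP encoding): instead of learning a function `f(xs) -> max(xs)`, express the same task as a relation over (input, candidate) pairs, built from simpler sub-relations an ILP system could learn separately.

```python
# Sub-relations, each simpler to learn than the full functional task.
def member(x, xs):
    return x in xs

def geq_all(x, xs):
    return all(x >= y for y in xs)

def max_rel(xs, y):
    """The relation holds iff y is the maximum of xs."""
    return member(y, xs) and geq_all(y, xs)

assert max_rel([3, 1, 4, 1], 4)
assert not max_rel([3, 1, 4, 1], 3)
```

The relational form checks candidates rather than constructing outputs, which is what allows an off-the-shelf ILP system to compose it from the sub-relations.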

[AI-58] Two-level deep domain decomposition method

链接: https://arxiv.org/abs/2408.12198
作者: Victorita Dolean,Serge Gratton,Alexander Heinlein,Valentin Mercier
关键词-EN: Domain Decomposition Method, Deep Domain Decomposition, Domain Decomposition, two-level Deep Domain, Decomposition Method
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: Preprint proceeding format

点击查看摘要

Abstract:This study presents a two-level Deep Domain Decomposition Method (Deep-DDM) augmented with a coarse-level network for solving boundary value problems using physics-informed neural networks (PINNs). The addition of the coarse level network improves scalability and convergence rates compared to the single level method. Tested on a Poisson equation with Dirichlet boundary conditions, the two-level deep DDM demonstrates superior performance, maintaining efficient convergence regardless of the number of subdomains. This advance provides a more scalable and effective approach to solving complex partial differential equations with machine learning.

[AI-59] Reasoning Factual Knowledge in Structured Data with Large Language Models

链接: https://arxiv.org/abs/2408.12188
作者: Sirui Huang,Yanggan Gu,Xuming Hu,Zhonghao Li,Qing Li,Guandong Xu
关键词-EN: Large language models, natural language processing, Large language, made remarkable progress, language processing tasks
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Large language models (LLMs) have made remarkable progress in various natural language processing tasks as a benefit of their capability to comprehend and reason with factual knowledge. However, a significant amount of factual knowledge is stored in structured data, which possesses unique characteristics that differ from the unstructured texts used for pretraining. This difference can introduce imperceptible inference parameter deviations, posing challenges for LLMs in effectively utilizing and reasoning with structured data to accurately infer factual knowledge. To this end, we propose a benchmark named StructFact, to evaluate the structural reasoning capabilities of LLMs in inferring factual knowledge. StructFact comprises 8,340 factual questions encompassing various tasks, domains, timelines, and regions. This benchmark allows us to investigate the capability of LLMs across five factual tasks derived from the unique characteristics of structural facts. Extensive experiments on a set of LLMs with different training strategies reveal the limitations of current LLMs in inferring factual knowledge from structured data. We present this benchmark as a compass to navigate the strengths and weaknesses of LLMs in reasoning with structured data for knowledge-sensitive tasks, and to encourage advancements in related real-world applications. Please find our code at this https URL.

[AI-60] A Safe and Efficient Self-evolving Algorithm for Decision-making and Control of Autonomous Driving Systems

链接: https://arxiv.org/abs/2408.12187
作者: Shuo Yang,Liwen Wang,Yanjun Huang,Hong Chen
关键词-EN: real-world environment, reinforcement learning, ability are expected, expected to cope, cope with unknown
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Autonomous vehicles with a self-evolving ability are expected to cope with unknown scenarios in the real-world environment. Taking advantage of the trial-and-error mechanism, reinforcement learning is able to self-evolve by learning the optimal policy, and it is particularly well suited to solving decision-making problems. However, reinforcement learning suffers from safety issues and low learning efficiency, especially in the continuous action space. Therefore, the motivation of this paper is to address the above problems by proposing a hybrid Mechanism-Experience-Learning augmented approach. Specifically, to realize efficient self-evolution, a driving tendency by analogy with human driving experience is proposed to reduce the search space of the autonomous driving problem, while a constrained optimization problem based on a mechanistic model is designed to ensure safety during the self-evolving process. Experimental results show that the proposed method is capable of generating safe and reasonable actions in various complex scenarios, improving the performance of the autonomous driving system. Compared to conventional reinforcement learning, the safety and efficiency of the proposed algorithm are greatly improved. The training process is collision-free, and the training time is equivalent to less than 10 minutes in the real world.

[AI-61] Rank and Align: Towards Effective Source-free Graph Domain Adaptation IJCAI2024

链接: https://arxiv.org/abs/2408.12185
作者: Junyu Luo,Zhiping Xiao,Yifan Wang,Xiao Luo,Jingyang Yuan,Wei Ju,Langechuan Liu,Ming Zhang
关键词-EN: achieved impressive performance, Graph neural networks, graph domain adaptation, neural networks, achieved impressive
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
*备注: Published in IJCAI2024

点击查看摘要

Abstract:Graph neural networks (GNNs) have achieved impressive performance in graph domain adaptation. However, extensive source graphs could be unavailable in real-world scenarios due to privacy and storage concerns. To this end, we investigate an underexplored yet practical problem of source-free graph domain adaptation, which transfers knowledge from source models instead of source graphs to a target domain. To solve this problem, we introduce a novel GNN-based approach called Rank and Align (RNA), which ranks graph similarities with spectral seriation for robust semantics learning, and aligns inharmonic graphs with harmonic graphs which are close to the source domain for subgraph extraction. In particular, to overcome label scarcity, we employ the spectral seriation algorithm to infer the robust pairwise rankings, which can guide semantic learning using a similarity learning objective. To depict distribution shifts, we utilize spectral clustering and the silhouette coefficient to detect harmonic graphs, which the source model can easily classify. To reduce potential domain discrepancy, we extract domain-invariant subgraphs from inharmonic graphs by an adversarial edge sampling process, which guides the invariant learning of GNNs. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our proposed RNA.

[AI-62] Randomness control and reproducibility study of random forest algorithm in R and Python

链接: https://arxiv.org/abs/2408.12184
作者: Louisa Camadini,Yanis Bouzid,Maeva Merlet,Léopold Carron
关键词-EN: crucial to guarantee consumer, guarantee consumer protection, cosmetic products, compliance with regulatory, skin irritation
类目: Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:When it comes to the safety of cosmetic products, compliance with regulatory standards is crucial to guarantee consumer protection against the risks of skin irritation. Toxicologists must therefore be fully conversant with all risks. This applies not only to their day-to-day work, but also to all the algorithms they integrate into their routines. Recognizing this, ensuring the reproducibility of algorithms becomes one of the most crucial aspects to address. However, how can we prove the robustness of an algorithm such as the random forest, that relies heavily on randomness? In this report, we will discuss the strategy of integrating random forest into ocular tolerance assessment for toxicologists. We will compare four packages: randomForest and Ranger (R packages), adapted in Python via the SKRanger package, and the widely used Scikit-Learn with the RandomForestClassifier() function. Our goal is to investigate the parameters and sources of randomness affecting the outcomes of Random Forest models. By setting comparable parameters and using the same Pseudo-Random Number Generator (PRNG), we expect to reproduce results consistently across the various available implementations of the random forest algorithm. Nevertheless, this exploration will unveil hidden layers of randomness and guide our understanding of the critical parameters necessary to ensure reproducibility across all four implementations of the random forest algorithm.
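The core of the reproducibility question can be sketched without any ML library: the main source of randomness in a random forest is the per-tree bootstrap sampling (plus per-split feature subsampling), and a seeded PRNG makes those draws identical across runs.

```python
import random

def bootstrap_indices(n, n_trees, seed):
    """Per-tree bootstrap draws, the dominant source of randomness in a
    random forest; a seeded PRNG makes them identical across runs."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(n_trees)]

run1 = bootstrap_indices(100, 10, seed=2024)
run2 = bootstrap_indices(100, 10, seed=2024)
run3 = bootstrap_indices(100, 10, seed=7)
assert run1 == run2      # same seed: identical forest structure
assert run1 != run3      # different seed: different bootstrap samples
```

Across R's randomForest/Ranger and Scikit-Learn the PRNG algorithms and draw orders differ, which is why matching the seed alone does not guarantee identical forests between implementations, only within one.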

[AI-63] Search-Based LLMs for Code Optimization ICSE’25

链接: https://arxiv.org/abs/2408.12159
作者: Shuzheng Gao,Cuiyun Gao,Wenchao Gu,Michael Lyu
关键词-EN: optimization, methods, optimization methods, written by developers, code
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
*备注: Accepted by 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE’25)

点击查看摘要

Abstract:The code written by developers usually suffers from efficiency problems and contains various performance bugs. These inefficiencies necessitate the research of automated refactoring methods for code optimization. Early research in code optimization employs rule-based methods and focuses on specific inefficiency issues, which are labor-intensive and suffer from the low-coverage issue. Recent work regards the task as a sequence generation problem, and resorts to deep learning (DL) techniques such as large language models (LLMs). These methods typically prompt LLMs to directly generate optimized code. Although these methods show state-of-the-art performance, such a one-step generation paradigm struggles to achieve an optimal solution. First, complex optimization methods such as combinatorial ones are hard for LLMs to capture. Second, the one-step generation paradigm poses a challenge in precisely infusing the knowledge required for effective code optimization within LLMs, resulting in under-optimized code. To address these problems, we propose to model this task from the search perspective, and propose a search-based LLM framework named SBLLM that enables iterative refinement and discovery of improved optimization methods. SBLLM synergistically integrates LLMs with evolutionary search and consists of three key components: 1) an execution-based representative sample selection part that evaluates the fitness of each existing optimized code and prioritizes promising ones to pilot the generation of improved code; 2) an adaptive optimization pattern retrieval part that infuses targeted optimization patterns into the model for guiding LLMs towards rectifying and progressively enhancing their optimization methods; and 3) a genetic operator-inspired chain-of-thought prompting part that aids LLMs in combining different optimization methods and generating improved optimization methods.

[AI-64] Implicit Sentiment Analysis Based on Chain of Thought Prompting

链接: https://arxiv.org/abs/2408.12157
作者: Zhihua Duan,Jialin Wang
关键词-EN: Implicit Sentiment Analysis, crucial research area, natural language processing, Sentiment Analysis, Implicit Sentiment
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Implicit Sentiment Analysis (ISA) is a crucial research area in natural language processing. Inspired by the idea of large language model Chain of Thought (CoT), this paper introduces a Sentiment Analysis of Thinking (SAoT) framework. The framework first analyzes the implicit aspects and opinions in the text using common sense and thinking chain capabilities. Then, it reflects on the process of implicit sentiment analysis and finally deduces the polarity of sentiment. The model is evaluated on the SemEval 2014 dataset, consisting of 1120 restaurant reviews and 638 laptop reviews. The experimental results demonstrate that the utilization of the ERNIE-Bot-4+SAoT model yields a notable performance improvement. Specifically, on the restaurant dataset, the F1 score reaches 75.27, accompanied by an ISA score of 66.29. Similarly, on the computer dataset, the F1 score achieves 76.50, while the ISA score amounts to 73.46. Comparatively, the ERNIE-Bot-4+SAoT model surpasses the BERTAsp + SCAPt baseline by an average margin of 47.99%.

[AI-65] A Tighter Complexity Analysis of SparseGPT

链接: https://arxiv.org/abs/2408.12151
作者: Xiaoyu Li,Yingyu Liang,Zhenmei Shi,Zhao Song
关键词-EN: Alistarh ICML, omega, matrix multiplication, Zhou ICML, exponent of matrix
类目: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:In this work, we improved the analysis of the running time of SparseGPT [Frantar, Alistarh ICML 2023] from O(d^3) to O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1,1,a)-a}) for any a \in [0, 1], where \omega is the exponent of matrix multiplication. In particular, for the current \omega \approx 2.371 [Alman, Duan, Williams, Xu, Xu, Zhou 2024], our running times boil down to O(d^{2.53}). This running time is due to the analysis of the lazy update behavior in iterative maintenance problems, such as [Deng, Song, Weinstein 2022, Brand, Song, Zhou ICML 2024].

[AI-66] Multi-tool Integration Application for Math Reasoning Using Large Language Model

链接: https://arxiv.org/abs/2408.12148
作者: Zhihua Duan,Jialin Wang
关键词-EN: important research direction, Mathematical reasoning, tool, Mathematical, artificial intelligence
类目: Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Mathematical reasoning is an important research direction in the field of artificial intelligence. This article proposes a novel multi-tool application framework for mathematical reasoning, aiming to achieve more comprehensive and accurate mathematical reasoning by utilizing the collaborative effect of large language models (LLMs) and multiple external tools. First, a Math Tool performs basic mathematical calculations during the inference process through interaction with the LLM. Second, a Code Tool generates code fragments that comply with syntax rules and executes them, providing support for complex mathematical problems. Then, through the iterative reasoning of the CoT Tool, the logical coherence and accuracy of mathematical reasoning are enhanced. Finally, a self-consistency tool selects the final answer based on different parameters, improving the consistency and reliability of reasoning. Through the synergistic effect of these tools, the framework achieves significant performance improvement on mathematical reasoning tasks. We conducted experiments on the NumGLUE Task 4 test set, which includes 220 mathematical reasoning fill-in-the-blank questions. The experimental results show that, based on the Math Tool, Code Tool, and CoT Tool, our method achieved an accuracy of 89.09 on Task 4; compared with the GPT3+FewShot baseline, Few Shot+ERNIE-4.0+self consistency improved by 49.09%, and compared with the fine-tuning baseline, Few Shot+ERNIE-4.0+self consistency improved by 52.29%.
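The self-consistency step can be sketched as a majority vote over multiple sampled reasoning chains (a generic sketch of the technique, not this paper's implementation):

```python
from collections import Counter

def self_consistency(answers):
    """Pick the final answer by majority vote over answers produced by
    several independently sampled reasoning chains, and report the
    agreement ratio as a crude confidence signal."""
    tally = Counter(a.strip() for a in answers)
    answer, votes = tally.most_common(1)[0]
    return answer, votes / len(answers)

# Four sampled chains for one fill-in-the-blank question:
answer, agreement = self_consistency(["42", "42", "41", "42 "])
assert answer == "42" and agreement == 0.75
```

Varying the sampling parameters (temperature, number of chains) and then voting is what "selects the final answer based on different parameters" amounts to.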

[AI-67] MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents

链接: https://arxiv.org/abs/2408.12142
作者: Congchi Yin,Feng Li,Shu Zhang,Zike Wang,Jun Shao,Piji Li,Jianhua Chen,Xun Jiang
关键词-EN: disorders primarily relies, mental disorders, Chinese mental disorders, mental disorders primarily, primarily relies
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:The clinical diagnosis of most mental disorders primarily relies on the conversations between psychiatrist and patient. The creation of such diagnostic conversation datasets is promising to boost the AI mental healthcare community. However, directly collecting the conversations in real diagnosis scenarios is near impossible due to stringent privacy and ethical considerations. To address this issue, we seek to synthesize diagnostic conversation by exploiting anonymous patient cases that are easier to access. Specifically, we design a neuro-symbolic multi-agent framework for synthesizing the diagnostic conversation of mental disorders with large language models. It takes patient case as input and is capable of generating multiple diverse conversations with one single patient case. The framework basically involves the interaction between a doctor agent and a patient agent, and achieves text generation under symbolic control via a dynamic diagnosis tree from a tool agent. By applying the proposed framework, we develop the largest Chinese mental disorders diagnosis dataset MDD-5k, which is built upon 1000 cleaned real patient cases by cooperating with a pioneering psychiatric hospital, and contains 5000 high-quality long conversations with diagnosis results as labels. To the best of our knowledge, it’s also the first labelled Chinese mental disorders diagnosis dataset. Human evaluation demonstrates the proposed MDD-5k dataset successfully simulates human-like diagnostic process of mental disorders. The dataset and code will become publicly accessible in this https URL.

[AI-68] DRExplainer: Quantifiable Interpretability in Drug Response Prediction with Directed Graph Convolutional Network

链接: https://arxiv.org/abs/2408.12139
作者: Haoyuan Shi,Tao Xu,Xiaodi Li,Qian Gao,Junfeng Xia,Zhenyu Yue
关键词-EN: directed bipartite network, personalized medicine, pivotal for personalized, cancer cell line, directed bipartite
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Predicting the response of a cancer cell line to a therapeutic drug is pivotal for personalized medicine. Despite numerous deep learning methods that have been developed for drug response prediction, integrating diverse information about biological entities and predicting the directional response remain major challenges. Here, we propose a novel interpretable predictive model, DRExplainer, which leverages a directed graph convolutional network to enhance the prediction in a directed bipartite network framework. DRExplainer constructs a directed bipartite network integrating multi-omics profiles of cell lines, the chemical structure of drugs and known drug response to achieve directed prediction. Then, DRExplainer identifies the most relevant subgraph to each prediction in this directed bipartite network by learning a mask, facilitating critical medical decision-making. Additionally, we introduce a quantifiable method for model interpretability that leverages a ground truth benchmark dataset curated from biological features. In computational experiments, DRExplainer outperforms state-of-the-art predictive methods and another graph-based explanation method under the same experimental setting. Finally, the case studies further validate the interpretability and the effectiveness of DRExplainer in predictive novel drug response. Our code is available at: this https URL.

[AI-69] Self-supervised Learning for Geospatial AI: A Survey

链接: https://arxiv.org/abs/2408.12133
作者: Yile Chen,Weiming Huang,Kaiqi Zhao,Yue Jiang,Gao Cong
关键词-EN: geospatial artificial intelligence, geospatial data, artificial intelligence, SSL techniques, SSL
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:The proliferation of geospatial data in urban and territorial environments has significantly facilitated the development of geospatial artificial intelligence (GeoAI) across various urban applications. Given the vast yet inherently sparse labeled nature of geospatial data, there is a critical need for techniques that can effectively leverage such data without heavy reliance on labeled datasets. This requirement aligns with the principles of self-supervised learning (SSL), which has attracted increasing attention for its adoption in geospatial data. This paper conducts a comprehensive and up-to-date survey of SSL techniques applied to or developed for three primary data (geometric) types prevalent in geospatial vector data: points, polylines, and polygons. We systematically categorize various SSL techniques into predictive and contrastive methods, discussing their application with respect to each data type in enhancing generalization across various downstream tasks. Furthermore, we review the emerging trends of SSL for GeoAI, and several task-specific SSL techniques. Finally, we discuss several key challenges in the current research and outline promising directions for future investigation. By presenting a structured analysis of relevant studies, this paper aims to inspire continued advancements in the integration of SSL with GeoAI, encouraging innovative methods for harnessing the power of geospatial data.

[AI-70] S-EPOA: Overcoming the Indivisibility of Annotations with Skill-Driven Preference-Based Reinforcement Learning AAAI2025

链接: https://arxiv.org/abs/2408.12130
作者: Ni Mu,Yao Luan,Yiqin Yang,Qing-shan Jia
关键词-EN: direct reward signal, intricate reward engineering, Preference-based reinforcement learning, utilizing human preferences, Preference-based reinforcement
类目: Artificial Intelligence (cs.AI)
*备注: Submitted to AAAI 2025

点击查看摘要

Abstract:Preference-based reinforcement learning (PbRL) stands out by utilizing human preferences as a direct reward signal, eliminating the need for intricate reward engineering. However, despite its potential, traditional PbRL methods are often constrained by the indivisibility of annotations, which impedes the learning process. In this paper, we introduce a groundbreaking approach, Skill-Enhanced Preference Optimization Algorithm (S-EPOA), which addresses the annotation indivisibility issue by integrating skill mechanisms into the preference learning framework. Specifically, we first conduct the unsupervised pretraining to learn useful skills. Then, we propose a novel query selection mechanism to balance the information gain and discriminability over the learned skill space. Experimental results on a range of tasks, including robotic manipulation and locomotion, demonstrate that S-EPOA significantly outperforms conventional PbRL methods in terms of both robustness and learning efficiency. The results highlight the effectiveness of skill-driven learning in overcoming the challenges posed by annotation indivisibility.
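A minimal sketch of the kind of query selection the abstract describes, assuming a simple linear trade-off between information gain and discriminability (the weighting and the scores are hypothetical, not the paper's actual criterion):

```python
import numpy as np

def select_query(info_gain, discriminability, lam=0.5):
    """Pick the preference query that best balances how informative it is
    with how easily an annotator can distinguish the two segments."""
    scores = np.asarray(info_gain) + lam * np.asarray(discriminability)
    return int(np.argmax(scores))

gain    = [0.9, 0.2, 0.6]   # hypothetical per-query information gain
discrim = [0.1, 0.9, 0.8]   # hypothetical discriminability in skill space
print(select_query(gain, discrim))   # index 2: good on both axes
```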

[AI-71] Deep Analysis of Time Series Data for Smart Grid Startup Strategies: A Transformer-LSTM-PSO Model Approach

链接: https://arxiv.org/abs/2408.12129
作者: Zecheng Zhang
关键词-EN: holds strategic importance, Grid startup, holds strategic, integral component, strategic importance
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
*备注: 46 pages

点击查看摘要

Abstract:Grid startup, an integral component of the power system, holds strategic importance for ensuring the reliability and efficiency of the electrical grid. However, current methodologies for in-depth analysis and precise prediction of grid startup scenarios are inadequate. To address these challenges, we propose a novel method based on the Transformer-LSTM-PSO model. This model uniquely combines the Transformer’s self-attention mechanism, LSTM’s temporal modeling capabilities, and the parameter tuning features of the particle swarm optimization algorithm. It is designed to more effectively capture the complex temporal relationships in grid startup schemes. Our experiments demonstrate significant improvements, with our model achieving lower RMSE and MAE values across multiple datasets compared to existing benchmarks, particularly in the NYISO Electric Market dataset where the RMSE was reduced by approximately 15% and the MAE by 20% compared to conventional models. Our main contribution is the development of a Transformer-LSTM-PSO model that significantly enhances the accuracy and efficiency of smart grid startup predictions. The application of the Transformer-LSTM-PSO model represents a significant advancement in smart grid predictive analytics, concurrently fostering the development of more reliable and intelligent grid management systems.
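The PSO component of the hybrid can be illustrated in isolation: below, a standard particle swarm minimizes a stand-in quadratic "validation loss" over two hyperparameters. The real objective would be the Transformer-LSTM's validation RMSE; everything here is a toy.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(p):
    # Pretend validation loss as a function of two hyperparameters.
    return (p[0] - 0.3) ** 2 + (p[1] - 0.7) ** 2

def pso(n_particles=10, iters=50, w=0.5, c1=1.5, c2=1.5):
    pos = rng.uniform(0, 1, (n_particles, 2))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([loss(p) for p in pos])
    g = pbest[pbest_val.argmin()].copy()          # global best
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Inertia + cognitive pull (personal best) + social pull (global best).
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, 0, 1)
        vals = np.array([loss(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g

best = pso()
print(best)   # should land near (0.3, 0.7), the loss minimum
```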

[AI-72] Diffusion-Based Visual Art Creation: A Survey and New Perspectives

链接: https://arxiv.org/abs/2408.12128
作者: Bingyuan Wang,Qifeng Chen,Zeyu Wang
关键词-EN: underlying domain knowledge, visual art creation, visual art, diffusion-based visual art, domain knowledge
类目: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
*备注: 35 pages, 9 figures

点击查看摘要

Abstract:The integration of generative AI in visual art has revolutionized not only how visual content is created but also how AI interacts with and reflects the underlying domain knowledge. This survey explores the emerging realm of diffusion-based visual art creation, examining its development from both artistic and technical perspectives. We structure the survey into three phases, data feature and framework identification, detailed analyses using a structured coding process, and open-ended prospective outlooks. Our findings reveal how artistic requirements are transformed into technical challenges and highlight the design and application of diffusion-based methods within visual art creation. We also provide insights into future directions from technical and synergistic perspectives, suggesting that the confluence of generative AI and art has shifted the creative paradigm and opened up new possibilities. By summarizing the development and trends of this emerging interdisciplinary area, we aim to shed light on the mechanisms through which AI systems emulate and, possibly, enhance human capacities in artistic perception and creativity.

[AI-73] AutoTest: Evolutionary Code Solution Selection with Test Cases

链接: https://arxiv.org/abs/2408.12125
作者: Zhihua Duan,Jialin Wang
关键词-EN: multiple candidate solutions, correct code solution, code generation techniques, code solution, selecting the correct
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:With the development of code generation techniques, selecting the correct code solution from multiple candidate solutions has become a crucial task. This study proposes AutoTest, a novel technique that combines automated test case generation with code solution execution to optimize the selection process using an evolutionary genetic algorithm. Firstly, AutoTest utilizes large pre-trained language models such as codegen-16B, code-davinci-002, and incoder-6B to provide code solutions and their corresponding test cases. Then, by executing the code solutions and evaluating their performance on the test cases, a consensus set is formed. Fine-grained ranking is achieved through the selection, mutation, and crossover mechanisms based on the evolutionary genetic algorithm, with the adjustment of alpha and beta parameters. Finally, the best code solution is chosen. AutoTest demonstrates significant performance improvements on the HumanEval benchmark test. The HumanEval dataset consists of 164 programming problems, and AutoTest achieves approximately a 10% improvement over the baseline method in terms of pass@1 score.
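The consensus-set idea can be sketched directly: candidates that produce identical outputs on the generated test inputs fall into one set, and larger sets rank higher. The GA-based fine-grained ranking is omitted, and the "candidates" here are toy lambdas rather than LLM-generated code:

```python
from collections import defaultdict

# Toy candidate solutions (stand-ins for codegen-16B / code-davinci-002 output).
candidates = [lambda x: x * 2, lambda x: x + x, lambda x: x ** 2]
test_inputs = [1, 2, 3]          # pretend auto-generated test inputs

def signature(fn):
    """Behavioral fingerprint: outputs on all test inputs."""
    return tuple(fn(x) for x in test_inputs)

groups = defaultdict(list)
for i, fn in enumerate(candidates):
    groups[signature(fn)].append(i)   # same outputs -> same consensus set

# Rank consensus sets by size; the biggest set wins here.
best_sig, members = max(groups.items(), key=lambda kv: len(kv[1]))
print(members)   # [0, 1]: x*2 and x+x agree on every test input
```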

[AI-74] Emotion-Agent: Unsupervised Deep Reinforcement Learning with Distribution-Prototype Reward for Continuous Emotional EEG Analysis AAAI2025

链接: https://arxiv.org/abs/2408.12121
作者: Zhihao Zhou,Qile Liu,Jiyuan Wang,Zhen Liang
关键词-EN: affective brain-computer interface, continuous EEG signals, EEG signals, continuous EEG, collected EEG signals
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
*备注: 11 pages, 4 figures, 4 tables, submitted to AAAI 2025

点击查看摘要

Abstract:Continuous electroencephalography (EEG) signals are widely used in affective brain-computer interface (aBCI) applications. However, not all continuously collected EEG signals are relevant or meaningful to the task at hand (e.g., wondering thoughts). On the other hand, manually labeling the relevant parts is nearly impossible due to varying engagement patterns across different tasks and individuals. Therefore, effectively and efficiently identifying the important parts from continuous EEG recordings is crucial for downstream BCI tasks, as it directly impacts the accuracy and reliability of the results. In this paper, we propose a novel unsupervised deep reinforcement learning framework, called Emotion-Agent, to automatically identify relevant and informative emotional moments from continuous EEG signals. Specifically, Emotion-Agent involves unsupervised deep reinforcement learning combined with a heuristic algorithm. We first use the heuristic algorithm to perform an initial global search and form prototype representations of the EEG signals, which facilitates the efficient exploration of the signal space and identify potential regions of interest. Then, we design distribution-prototype reward functions to estimate the interactions between samples and prototypes, ensuring that the identified parts are both relevant and representative of the underlying emotional states. Emotion-Agent is trained using Proximal Policy Optimization (PPO) to achieve stable and efficient convergence. Our experiments compare the performance with and without Emotion-Agent. The results demonstrate that selecting relevant and informative emotional parts before inputting them into downstream tasks enhances the accuracy and reliability of aBCI applications.

[AI-75] Understanding Data Reconstruction Leakage in Federated Learning from a Theoretical Perspective

链接: https://arxiv.org/abs/2408.12119
作者: Zifan Wang,Binghui Zhang,Meng Pang,Yuan Hong,Binghui Wang
关键词-EN: emerging collaborative learning, collaborative learning paradigm, Federated learning, protect data privacy, collaborative learning
类目: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Federated learning (FL) is an emerging collaborative learning paradigm that aims to protect data privacy. Unfortunately, recent works show FL algorithms are vulnerable to serious data reconstruction attacks. However, existing works lack a theoretical foundation on the extent to which the devices’ data can be reconstructed, and the effectiveness of these attacks cannot be compared fairly due to their unstable performance. To address this deficiency, we propose a theoretical framework to understand data reconstruction attacks to FL. Our framework involves bounding the data reconstruction error, and an attack’s error bound reflects its inherent attack effectiveness. Under the framework, we can theoretically compare the effectiveness of existing attacks. For instance, our results on multiple datasets validate that the iDLG attack inherently outperforms the DLG attack.
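One concrete piece of the iDLG-vs-DLG comparison is easy to reproduce: under softmax cross-entropy, the last-layer bias gradient equals probs − one_hot(label), so its single negative entry reveals the ground-truth label. A toy numpy sketch (no actual federated setup):

```python
import numpy as np

rng = np.random.default_rng(1)

logits = rng.normal(size=5)          # pretend last-layer pre-activations
true_label = 3
probs = np.exp(logits) / np.exp(logits).sum()   # softmax
one_hot = np.eye(5)[true_label]
bias_grad = probs - one_hot          # what an attacker observes in FL updates

# All entries are positive except the true class, whose entry is p - 1 < 0.
inferred = int(np.argmin(bias_grad))
print(inferred)                      # 3: recovers the ground-truth label
```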

[AI-76] Geolocation Representation from Large Language Models are Generic Enhancers for Spatio-Temporal Learning

链接: https://arxiv.org/abs/2408.12116
作者: Junlin He,Tong Nie,Wei Ma
关键词-EN: natural language processing, universal representation models, geospatial domain, computer vision, processing and computer
类目: Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:In the geospatial domain, universal representation models are significantly less prevalent than their extensive use in natural language processing and computer vision. This discrepancy arises primarily from the high costs associated with the input of existing representation models, which often require street views and mobility data. To address this, we develop a novel, training-free method that leverages large language models (LLMs) and auxiliary map data from OpenStreetMap to derive geolocation representations (LLMGeovec). LLMGeovec can represent the geographic semantics of city, country, and global scales, which acts as a generic enhancer for spatio-temporal learning. Specifically, by direct feature concatenation, we introduce a simple yet effective paradigm for enhancing multiple spatio-temporal tasks including geographic prediction (GP), long-term time series forecasting (LTSF), and graph-based spatio-temporal forecasting (GSTF). LLMGeovec can seamlessly integrate into a wide spectrum of spatio-temporal learning models, providing immediate enhancements. Experimental results demonstrate that LLMGeovec achieves global coverage and significantly boosts the performance of leading GP, LTSF, and GSTF models.
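The "direct feature concatenation" paradigm is simple enough to show literally; the shapes and array names below are illustrative, not from the paper:

```python
import numpy as np

n_nodes, d_task, d_geo = 4, 16, 8
task_features = np.random.randn(n_nodes, d_task)   # e.g. traffic-sensor features
geo_embedding = np.random.randn(n_nodes, d_geo)    # frozen LLM-derived geolocation vectors

# The enhancement is literally: append the geolocation embedding to whatever
# features the downstream spatio-temporal model already consumes.
enhanced = np.concatenate([task_features, geo_embedding], axis=1)
print(enhanced.shape)   # (4, 24): fed unchanged into the downstream model
```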

[AI-77] Risk Analysis in Customer Relationship Management via Quantile Region Convolutional Neural Network-Long Short-Term Memory and Cross-Attention Mechanism

链接: https://arxiv.org/abs/2408.12113
作者: Yaowen Huang,Jun Der Leu,Baoli Lu,Yan Zhou
关键词-EN: customer relationship management, affect customer satisfaction, CRM risk analysis, retention rates, customer satisfaction
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
*备注: 44 pages

点击查看摘要

Abstract:Risk analysis is an important business decision support task in customer relationship management (CRM), involving the identification of potential risks or challenges that may affect customer satisfaction, retention rates, and overall business performance. To enhance risk analysis in CRM, this paper combines the advantages of quantile region convolutional neural network-long short-term memory (QRCNN-LSTM) and cross-attention mechanisms for modeling. The QRCNN-LSTM model combines sequence modeling with deep learning architectures commonly used in natural language processing tasks, enabling the capture of both local and global dependencies in sequence data. The cross-attention mechanism enhances interactions between different input data parts, allowing the model to focus on specific areas or features relevant to CRM risk analysis. By applying QRCNN-LSTM and cross-attention mechanisms to CRM risk analysis, empirical evidence demonstrates that this approach can effectively identify potential risks and provide data-driven support for business decisions.
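The cross-attention mechanism the abstract relies on is standard scaled dot-product attention between two streams; a minimal numpy sketch (projection matrices omitted, shapes illustrative):

```python
import numpy as np

def cross_attention(Q, K, V):
    """One input stream (queries) attends over another (keys/values)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n_q, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # e.g. QRCNN-side sequence states
K = rng.normal(size=(5, 4))   # e.g. LSTM-side sequence states
V = rng.normal(size=(5, 4))
out = cross_attention(Q, K, V)
print(out.shape)   # (3, 4)
```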

[AI-78] Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards

链接: https://arxiv.org/abs/2408.12112
作者: Shresth Verma,Niclas Boehmer,Lingkai Kong,Milind Tambe
关键词-EN: Reinforcement Learning, design reward functions, preferences in Reinforcement, Learning, based on human
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
*备注:

点击查看摘要

Abstract:LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In applications such as public health, this approach empowers grassroots health workers to tailor automated allocation decisions to community needs. In the presence of multiple agents, altering the reward function based on human preferences can impact subpopulations very differently, leading to complex tradeoffs and a multi-objective resource allocation problem. We are the first to present a principled method termed Social Choice Language Model for dealing with these tradeoffs for LLM-designed rewards for multiagent planners in general and restless bandits in particular. The novel part of our model is a transparent and configurable selection component, called an adjudicator, external to the LLM that controls complex tradeoffs via a user-selected social welfare function. Our experiments demonstrate that our model reliably selects more effective, aligned, and balanced reward functions compared to purely LLM-based approaches.
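The adjudicator idea can be sketched as a selection rule external to the LLM: candidate reward functions (represented here by made-up per-subpopulation utility vectors an LLM might induce) are scored by a user-selected social welfare function:

```python
candidates = {
    "reward_A": [0.9, 0.2, 0.4],   # hypothetical utilities for 3 subpopulations
    "reward_B": [0.5, 0.5, 0.5],
    "reward_C": [0.7, 0.3, 0.6],
}

welfare = {
    "utilitarian": lambda u: sum(u),   # maximize total welfare
    "egalitarian": lambda u: min(u),   # maximize the worst-off subpopulation
}

def adjudicate(rule):
    """The adjudicator: transparent, configurable, outside the LLM."""
    return max(candidates, key=lambda name: welfare[rule](candidates[name]))

print(adjudicate("utilitarian"))  # "reward_C": highest total utility
print(adjudicate("egalitarian"))  # "reward_B": most balanced across groups
```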

[AI-79] Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis

链接: https://arxiv.org/abs/2408.12097
作者: S. Nishio,H. Nonaka,N. Tsuchiya,A. Migita,Y. Banno,T. Hayashi,H. Sakaji,T. Sakumoto,K. Watabe
关键词-EN: Machine learning, machine learning models, learning, Machine, learning models
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
*备注: 10 pages, 8 figures

点击查看摘要

Abstract:Machine learning is widely utilized across various industries. Identifying the appropriate machine learning models and datasets for specific tasks is crucial for the effective industrial application of machine learning. However, this requires expertise in both machine learning and the relevant domain, leading to a high learning cost. Therefore, research focused on extracting combinations of tasks, machine learning models, and datasets from academic papers is critically important, as it can facilitate the automatic recommendation of suitable methods. Conventional information extraction methods from academic papers have been limited to identifying machine learning models and other entities as named entities. To address this issue, this study proposes a methodology for extracting tasks, machine learning methods, and dataset names from scientific papers and analyzing the relationships among them using an LLM, an embedding model, and network clustering. The proposed method’s expression extraction performance, when using Llama3, achieves an F-score exceeding 0.8 across various categories, confirming its practical utility. Benchmarking results on financial domain papers have demonstrated the effectiveness of this method, providing insights into the use of the latest datasets, including those related to ESG (Environmental, Social, and Governance) data.

[AI-80] uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization

链接: https://arxiv.org/abs/2408.12095
作者: Aishik Nagar,Yutong Liu,Andy T. Liu,Viktor Schlegel,Vijay Prakash Dwivedi,Arun-Kumar Kaliya-Perumal,Guna Pratheep Kalanchiam,Yili Tang,Robby T. Tan
关键词-EN: faces the challenge, challenge of balancing, abstractive summarization faces, medical summarization, summarization
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: 12 pages

点击查看摘要

Abstract:Medical abstractive summarization faces the challenge of balancing faithfulness and informativeness. Current methods often sacrifice key information for faithfulness or introduce confabulations when prioritizing informativeness. While recent advancements in techniques like in-context learning (ICL) and fine-tuning have improved medical summarization, they often overlook crucial aspects such as faithfulness and informativeness without considering advanced methods like model reasoning and self-improvement. Moreover, the field lacks a unified benchmark, hindering systematic evaluation due to varied metrics and datasets. This paper addresses these gaps by presenting a comprehensive benchmark of six advanced abstractive summarization methods across three diverse datasets using five standardized metrics. Building on these findings, we propose uMedSum, a modular hybrid summarization framework that introduces novel approaches for sequential confabulation removal followed by key missing information addition, ensuring both faithfulness and informativeness. Our work improves upon previous GPT-4-based state-of-the-art (SOTA) medical summarization methods, significantly outperforming them in both quantitative metrics and qualitative domain expert evaluations. Notably, we achieve an average relative performance improvement of 11.8% in reference-free metrics over the previous SOTA. Doctors prefer uMedSum’s summaries 6 times more than previous SOTA in difficult cases where there are chances of confabulations or missing information. These results highlight uMedSum’s effectiveness and generalizability across various datasets and metrics, marking a significant advancement in medical summarization.

[AI-81] Unlocking Attributes’ Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy ECCV2024

链接: https://arxiv.org/abs/2408.12086
作者: Hong Zhang,Yixuan Lyu,Qian Yu,Hanyang Liu,Huimin Ma,Ding Yuan,Yifan Yang
关键词-EN: Camouflaged Object Segmentation, remain poorly understood, effective camouflage remain, camouflage remain poorly, Object Segmentation
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注: Accepted by ECCV 2024

点击查看摘要

Abstract:In the domain of Camouflaged Object Segmentation (COS), despite continuous improvements in segmentation performance, the underlying mechanisms of effective camouflage remain poorly understood, akin to a black box. To address this gap, we present the first comprehensive study to examine the impact of camouflage attributes on the effectiveness of camouflage patterns, offering a quantitative framework for the evaluation of camouflage designs. To support this analysis, we have compiled the first dataset comprising descriptions of camouflaged objects and their attribute contributions, termed COD-Text And X-attributions (COD-TAX). Moreover, drawing inspiration from the hierarchical process by which humans process information: from high-level textual descriptions of overarching scenarios, through mid-level summaries of local areas, to low-level pixel data for detailed analysis. We have developed a robust framework that combines textual and visual information for the task of COS, named Attribution CUe Modeling with Eye-fixation Network (ACUMEN). ACUMEN demonstrates superior performance, outperforming nine leading methods across three widely-used datasets. We conclude by highlighting key insights derived from the attributes identified in our study. Code: this https URL.

[AI-82] High-Quality Data Augmentation for Low-Resource NMT: Combining a Translation Memory, a GAN Generator, and Filtering

链接: https://arxiv.org/abs/2408.12079
作者: Hengjie Liu,Ruibo Hou,Yves Lepage
关键词-EN: language translation tasks, Back translation, low-resource language translation, Neural Machine Translation, extending a dataset
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Back translation, as a technique for extending a dataset, is widely used by researchers in low-resource language translation tasks. It typically translates from the target to the source language to ensure high-quality translation results. This paper proposes a novel way of utilizing a monolingual corpus on the source side to assist Neural Machine Translation (NMT) in low-resource settings. We realize this concept by employing a Generative Adversarial Network (GAN), which augments the training data for the discriminator while mitigating the interference of low-quality synthetic monolingual translations with the generator. Additionally, this paper integrates Translation Memory (TM) with NMT, increasing the amount of data available to the generator. Moreover, we propose a novel procedure to filter the synthetic sentence pairs during the augmentation process, ensuring the high quality of the data.

[AI-83] ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM

链接: https://arxiv.org/abs/2408.12076
作者: Zhaochen Su,Jun Zhang,Xiaoye Qu,Tong Zhu,Yanshu Li,Jiashuo Sun,Juntao Li,Min Zhang,Yu Cheng
关键词-EN: Large language models, achieved impressive advancements, Large language, source of hallucinations, rarely been studied
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注: Under Review

点击查看摘要

Abstract:Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. Only a few research explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge. However, a thorough assessment of knowledge conflict in LLMs is still missing. Motivated by this research gap, we present ConflictBank, the first comprehensive benchmark developed to systematically evaluate knowledge conflicts from three aspects: (i) conflicts encountered in retrieved knowledge, (ii) conflicts within the models’ encoded knowledge, and (iii) the interplay between these conflict forms. Our investigation delves into four model families and twelve LLM instances, meticulously analyzing conflicts stemming from misinformation, temporal discrepancies, and semantic divergences. Based on our proposed novel construction framework, we create 7,453,853 claim-evidence pairs and 553,117 QA pairs. We present numerous findings on model scale, conflict causes, and conflict types. We hope our ConflictBank benchmark will help the community better understand model behavior in conflicts and develop more reliable LLMs.

[AI-84] Transformers As Approximations of Solomonoff Induction

链接: https://arxiv.org/abs/2408.12065
作者: Nathan Young,Michael Witbrock
关键词-EN: computable probability distribution, representing a Bayesian, Bayesian mixture, Solomonoff Induction, approximate Solomonoff Induction
类目: Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Solomonoff Induction is an optimal-in-the-limit unbounded algorithm for sequence prediction, representing a Bayesian mixture of every computable probability distribution and performing close to optimally in predicting any computable sequence. Being an optimal form of computational sequence prediction, it seems plausible that it may be used as a model against which other methods of sequence prediction might be compared. We put forth and explore the hypothesis that Transformer models - the basis of Large Language Models - approximate Solomonoff Induction better than any other extant sequence prediction method. We explore evidence for and against this hypothesis, give alternate hypotheses that take this evidence into account, and outline next steps for modelling Transformers and other kinds of AI in this way.
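A tiny Bayesian-mixture predictor conveys the flavor of what Transformers are hypothesized to approximate: a mixture over "programs" with a 2^-complexity prior, updated by Bayes on each observed bit. The real construction mixes over all computable semimeasures; the three hypotheses and their complexities below are made up for illustration.

```python
hypotheses = {
    # name: (hypothetical complexity K, P(next bit = 1 | history))
    "all_zeros": (1, lambda hist: 0.01),
    "all_ones":  (1, lambda hist: 0.99),
    "alternate": (2, lambda hist: 0.99 if (not hist or hist[-1] == 0) else 0.01),
}

def mixture_predict(history):
    """Posterior-weighted prediction: prior 2^-K times likelihood of history."""
    posts = {}
    for name, (K, p1) in hypotheses.items():
        w = 2.0 ** (-K)
        for i, bit in enumerate(history):
            q = p1(history[:i])
            w *= q if bit == 1 else 1.0 - q
        posts[name] = w
    z = sum(posts.values())
    return sum(posts[n] / z * hypotheses[n][1](history) for n in hypotheses)

print(round(mixture_predict([1, 1, 1, 1]), 2))   # close to 0.99: "all_ones" dominates
```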

[AI-85] Enhancing Sampling Protocol for Robust Point Cloud Classification

链接: https://arxiv.org/abs/2408.12062
作者: Chongshou Li,Pin Tang,Xinke Li,Tianrui Li
关键词-EN: Farthest Point Sampling, Established sampling protocols, Fixed Sample Size, point cloud, Established sampling
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Established sampling protocols for 3D point cloud learning, such as Farthest Point Sampling (FPS) and Fixed Sample Size (FSS), have long been recognized and utilized. However, real-world data often suffer from corruptions such as sensor noise, which violates the benignness assumption of point cloud in current protocols. Consequently, they are notably vulnerable to noise, posing significant safety risks in critical applications like autonomous driving. To address these issues, we propose an enhanced point cloud sampling protocol, PointDR, which comprises two components: 1) Downsampling for key point identification and 2) Resampling for flexible sample size. Furthermore, differentiated strategies are implemented for training and inference processes. Particularly, an isolation-rated weight considering local density is designed for the downsampling method, assisting it in performing random key points selection in the training phase and bypassing noise in the inference phase. A local-geometry-preserved upsampling is incorporated into resampling, facilitating it to maintain a stochastic sample size in the training stage and complete insufficient data in the inference. It is crucial to note that the proposed protocol is free of model architecture altering and extra learning, thus minimal efforts are demanded for its replacement of the existing one. Despite the simplicity, it substantially improves the robustness of point cloud learning, showcased by outperforming the state-of-the-art methods on multiple benchmarks of corrupted point cloud classification. The code will be available upon the paper’s acceptance.
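A simplified sketch of the density-aware downsampling idea (my own reduction of the "isolation-rated weight", not the paper's exact formula): weight each point by inverse mean k-NN distance, so isolated points, which are likely noise, are rarely kept.

```python
import numpy as np

rng = np.random.default_rng(0)
cloud = rng.normal(size=(100, 3))                   # benign points
cloud = np.vstack([cloud, [[10.0, 10.0, 10.0]]])    # one far-away noise point

# Naive all-pairs distances; mean distance to the 5 nearest neighbors
# (column 0 is the point itself at distance 0, hence 1:6).
d = np.linalg.norm(cloud[:, None] - cloud[None, :], axis=-1)
knn = np.sort(d, axis=1)[:, 1:6].mean(axis=1)

weights = 1.0 / (knn + 1e-8)        # dense regions get large weight
weights /= weights.sum()

# Weighted random key-point selection, as in the training phase.
keep = rng.choice(len(cloud), size=32, replace=False, p=weights)
print(int(np.argmin(weights)))      # 100: the noise point gets the smallest weight
```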

[AI-86] Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs

链接: https://arxiv.org/abs/2408.12060
作者: Ronit Singhal,Pransh Patwa,Parth Patwa,Aman Chadha,Amitava Das
关键词-EN: implementing fact-checking mechanisms, social media, widespread dissemination, dissemination of misinformation, misinformation on social
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Given the widespread dissemination of misinformation on social media, implementing fact-checking mechanisms for online claims is essential. Manually verifying every claim is highly challenging, underscoring the need for an automated fact-checking system. This paper presents our system designed to address this issue. We utilize the Averitec dataset to assess the veracity of claims. In addition to veracity prediction, our system provides supporting evidence, which is extracted from the dataset. We develop a Retrieve and Generate (RAG) pipeline to extract relevant evidence sentences from a knowledge base, which are then inputted along with the claim into a large language model (LLM) for classification. We also evaluate the few-shot In-Context Learning (ICL) capabilities of multiple LLMs. Our system achieves an ‘Averitec’ score of 0.33, which is a 22% absolute improvement over the baseline. All code will be made available at this https URL.
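摘要中的 RAG 管线先从知识库检索与声明相关的证据句,再连同声明一起交给 LLM 分类。下面用简单的词重叠打分模拟检索这一步(仅为示意,论文实际使用的检索器未在摘要中给出):

```python
def retrieve_evidence(claim, knowledge_base, top_k=3):
    """Rank knowledge-base sentences by token overlap with the claim,
    a toy stand-in for the retriever in a real RAG pipeline."""
    claim_tokens = set(claim.lower().split())

    def score(sentence):
        tokens = set(sentence.lower().split())
        return len(claim_tokens & tokens) / (len(tokens) or 1)

    return sorted(knowledge_base, key=score, reverse=True)[:top_k]
```

检索出的 top-k 证据句随后会被拼入提示词,由 LLM 给出真伪分类。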

[AI-87] Enhancing LLM-Based Automated Program Repair with Design Rationales

链接: https://arxiv.org/abs/2408.12056
作者: Jiuang Zhao,Donghao Yang,Li Zhang,Xiaoli Lian,Zitian Yang
关键词-EN: Automatic Program Repair, Automatic Program, Program Repair, autonomously rectify issues, feature development
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Automatic Program Repair (APR) endeavors to autonomously rectify issues within specific projects, which generally encompasses three categories of tasks: bug resolution, new feature development, and feature enhancement. Despite extensive research proposing various methodologies, their efficacy in addressing real issues remains unsatisfactory. It’s worth noting that engineers typically have design rationales (DR), i.e., planned solutions and a set of underlying reasons, before they start patching code. In open-source projects, these DRs are frequently captured in issue logs through project management tools like Jira. This raises a compelling question: How can we leverage DR scattered across the issue logs to efficiently enhance APR? To investigate this premise, we introduce DRCodePilot, an approach designed to augment GPT-4-Turbo’s APR capabilities by incorporating DR into the prompt instruction. Furthermore, given GPT-4’s constraints in fully grasping the broader project context and occasional shortcomings in generating precise identifiers, we have devised a feedback-based self-reflective framework, in which we prompt GPT-4 to reconsider and refine its outputs by referencing a provided patch and suggested identifiers. We have established a benchmark comprising 938 issue-patch pairs sourced from two open-source repositories hosted on GitHub and Jira. Our experimental results are impressive: DRCodePilot achieves a full-match ratio that is a remarkable 4.7x higher than when GPT-4 is utilized directly. Additionally, the CodeBLEU scores also exhibit promising enhancements. Moreover, our findings reveal that the standalone application of DR can yield a promising increase in the full-match ratio across CodeLlama, GPT-3.5, and GPT-4 within our benchmark suite. We believe that our DRCodePilot initiative heralds a novel human-in-the-loop avenue for advancing the field of APR.

[AI-88] Reasoning and Tools for Human-Level Forecasting

链接: https://arxiv.org/abs/2408.12036
作者: Elvis Hsieh,Preston Fu,Jonathan Chen
关键词-EN: largely successful due, memorize large amounts, training data, Language models, trained on web-scale
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
*备注:

点击查看摘要

Abstract:Language models (LMs) trained on web-scale datasets are largely successful due to their ability to memorize large amounts of training data, even if only present in a few examples. These capabilities are often desirable in evaluation on tasks such as question answering but raise questions about whether these models can exhibit genuine reasoning or succeed only at mimicking patterns from the training data. This distinction is particularly salient in forecasting tasks, where the answer is not present in the training data, and the model must reason to make logical deductions. We present Reasoning and Tools for Forecasting (RTF), a framework of reasoning-and-acting (ReAct) agents that can dynamically retrieve updated information and run numerical simulation with equipped tools. We evaluate our model with questions from competitive forecasting platforms and demonstrate that our method is competitive with and can outperform human predictions. This suggests that LMs, with the right tools, can indeed think and adapt like humans, offering valuable insights for real-world decision-making.

[AI-89] A Constraint Programming Approach to Fair High School Course Scheduling

链接: https://arxiv.org/abs/2408.12032
作者: Mitsuka Kiyohara,Masakazu Ishihata
关键词-EN: Issues of inequity, school scheduling problem, high school scheduling, previously exist, scheduling problem
类目: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Issues of inequity in U.S. high schools’ course scheduling did not previously exist. However, in recent years, with the increase in student population and course variety, students perceive that the course scheduling method is unfair. Current integer programming (IP) methods to the high school scheduling problem (HSSP) fall short in addressing these fairness concerns. The purpose of this research is to develop a solution methodology that generates feasible and fair course schedules using student preferences. Utilizing principles of fairness, which have been well studied in market design, we define the fair high school scheduling problem (FHSSP), a novel extension to the HSSP, and devise a corresponding algorithm based on integer programming to solve the FHSSP. We test our approach on a real course request dataset from a high school in California, USA. Results show that our algorithm can generate schedules that are both feasible and fair. In this paper, we demonstrate that our IP algorithm not only solves the HSSP and FHSSP in the United States but has the potential to be applied to various real-world scheduling problems. Additionally, we show the feasibility of integrating human emotions into mathematical modeling.
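论文用整数规划求解公平排课问题;下面给出一个最大化最小学生效用(即平等主义公平目标)的暴力搜索玩具示例,仅用于说明目标函数的含义,并非论文的 IP 算法:

```python
from itertools import product

def fairest_assignment(utilities, capacity):
    """Brute-force search for the assignment maximizing the minimum
    student utility. utilities[s][c] is student s's preference for
    course c; capacity[c] bounds enrollment in course c."""
    n_students, n_courses = len(utilities), len(utilities[0])
    best, best_min = None, -1
    for assign in product(range(n_courses), repeat=n_students):
        counts = [assign.count(c) for c in range(n_courses)]
        if any(counts[c] > capacity[c] for c in range(n_courses)):
            continue  # infeasible: a course is over capacity
        worst = min(utilities[s][assign[s]] for s in range(n_students))
        if worst > best_min:
            best, best_min = assign, worst
    return best, best_min
```

真实规模的 FHSSP 当然无法穷举,这正是论文诉诸整数规划求解器的原因。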

[AI-90] Federated Diabetes Prediction in Canadian Adults Using Real-world Cross-Province Primary Care Data

链接: https://arxiv.org/abs/2408.12029
作者: Guojun Tang,Jason E. Black,Tyler S. Williamson,Steve H. Drew
关键词-EN: Electronic Health Records, Integrating Electronic Health, Integrating Electronic, Health Records, Electronic Health
类目: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI)
*备注: 10 pages

点击查看摘要

Abstract:Integrating Electronic Health Records (EHR) and the application of machine learning present opportunities for enhancing the accuracy and accessibility of data-driven diabetes prediction. In particular, developing data-driven machine learning models can provide early identification of patients with high risk for diabetes, potentially leading to more effective therapeutic strategies and reduced healthcare costs. However, regulation restrictions create barriers to developing centralized predictive models. This paper addresses the challenges by introducing a federated learning approach, which amalgamates predictive models without centralized data storage and processing, thus avoiding privacy issues. This marks the first application of federated learning to predict diabetes using real clinical datasets in Canada extracted from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN) without cross-province patient data sharing. We address class-imbalance issues through downsampling techniques and compare federated learning performance against province-based and centralized models. Experimental results show that the federated MLP model presents a similar or higher performance compared to the model trained with the centralized approach. However, the federated logistic regression model showed inferior performance compared to its centralized peer.
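联邦学习的核心是在不共享原始病历的前提下聚合各省的模型参数。下面是标准 FedAvg 风格的按样本量加权平均示意(论文的具体聚合方式未在摘要中说明,此处仅作常见做法的草图):

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Aggregate client model parameters weighted by local dataset size
    (FedAvg-style), so no raw patient records leave a province."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))
```

每一轮训练中,各省在本地数据上更新模型后只上传参数,服务器用上式聚合出全局模型再下发。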

[AI-91] Exploring Large Language Models for Feature Selection: A Data-centric Perspective

链接: https://arxiv.org/abs/2408.12025
作者: Dawei Li,Zhen Tan,Huan Liu
关键词-EN: Large Language Models, Language Models, Large Language, zero-shot learning capabilities, feature selection methods
类目: Artificial Intelligence (cs.AI)
*备注: Preprint, under review

点击查看摘要

Abstract:The rapid advancement of Large Language Models (LLMs) has significantly influenced various domains, leveraging their exceptional few-shot and zero-shot learning capabilities. In this work, we aim to explore and understand the LLMs-based feature selection methods from a data-centric perspective. We begin by categorizing existing feature selection methods with LLMs into two groups: data-driven feature selection which requires sample values to do statistical inference and text-based feature selection which utilizes prior knowledge of LLMs to make semantic associations using descriptive context. We conduct extensive experiments in both classification and regression tasks with LLMs in various sizes (e.g., GPT-4, ChatGPT and LLaMA-2). Our findings emphasize the effectiveness and robustness of text-based feature selection methods and showcase their potential using a real-world medical application. We also discuss the challenges and future opportunities in employing LLMs for feature selection, offering insights for further research and development in this emerging field.

[AI-92] Understanding Epistemic Language with a Bayesian Theory of Mind

链接: https://arxiv.org/abs/2408.12022
作者: Lance Ying,Tan Zhi-Xuan,Lionel Wong,Vikash Mansinghka,Joshua B. Tenenbaum
关键词-EN: directly observed, people understand, understand and evaluate, Abstract, Bayesian
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注: 21 pages

点击查看摘要

Abstract:How do people understand and evaluate claims about others’ beliefs, even though these beliefs cannot be directly observed? In this paper, we introduce a cognitive model of epistemic language interpretation, grounded in Bayesian inferences about other agents’ goals, beliefs, and intentions: a language-augmented Bayesian theory-of-mind (LaBToM). By translating natural language into an epistemic “language-of-thought”, then evaluating these translations against the inferences produced by inverting a probabilistic generative model of rational action and perception, LaBToM captures graded plausibility judgments about epistemic claims. We validate our model in an experiment where participants watch an agent navigate a maze to find keys hidden in boxes needed to reach their goal, then rate sentences about the agent’s beliefs. In contrast with multimodal LLMs (GPT-4o, Gemini Pro) and ablated models, our model correlates highly with human judgments for a wide range of expressions, including modal language, uncertainty expressions, knowledge claims, likelihood comparisons, and attributions of false belief.

[AI-93] Does It Look Sequential? An Analysis of Datasets for Evaluation of Sequential Recommendations

链接: https://arxiv.org/abs/2408.12008
作者: Anton Klenitskiy,Anna Volodkevich,Anton Pembek,Alexey Vasilev
关键词-EN: Sequential recommender systems, Sequential, important and demanded, demanded area, recommender systems
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Sequential recommender systems are an important and demanded area of research. Such systems aim to use the order of interactions in a user’s history to predict future interactions. The premise is that the order of interactions and sequential patterns play an essential role. Therefore, it is crucial to use datasets that exhibit a sequential structure to evaluate sequential recommenders properly. We apply several methods based on the random shuffling of the user’s sequence of interactions to assess the strength of sequential structure across 15 datasets, frequently used for sequential recommender systems evaluation in recent research papers presented at top-tier conferences. As shuffling explicitly breaks sequential dependencies inherent in datasets, we estimate the strength of sequential patterns by comparing metrics for shuffled and original versions of the dataset. Our findings show that several popular datasets have a rather weak sequential structure.
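摘要中评估序列结构强度的思路,即比较打乱用户交互顺序前后某个序列敏感指标的差异,可以示意如下(`metric` 为任意序列敏感指标,下面的实现为假设性草图,并非论文的具体方法):

```python
import random

def sequential_strength(sessions, metric, n_shuffles=5, seed=0):
    """Compare a sequence-sensitive metric on original vs. shuffled user
    histories. A large drop after shuffling indicates strong sequential
    structure in the dataset."""
    rng = random.Random(seed)
    original = metric(sessions)
    shuffled_scores = []
    for _ in range(n_shuffles):
        # rng.sample returns a random permutation of each session
        shuffled = [rng.sample(s, len(s)) for s in sessions]
        shuffled_scores.append(metric(shuffled))
    return original - sum(shuffled_scores) / n_shuffles
```

差值接近零说明该指标在打乱后几乎不变,即数据集的序列结构很弱,这正是论文对若干流行数据集得出的结论。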

[AI-94] QuaCK-TSF: Quantum-Classical Kernelized Time Series Forecasting

链接: https://arxiv.org/abs/2408.12007
作者: Abdallah Aaraba,Soumaya Cherkaoui,Ola Ahmad,Jean-Frédéric Laprade,Olivier Nahman-Lévesque,Alexis Vieloszynski,Shengrui Wang
关键词-EN: probabilistic time series, time series, complex endeavor, endeavor that extends, extends beyond predicting
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
*备注: 12 pages, 15 figures, to be published in IEEE Quantum Week 2024’s conference proceeding

点击查看摘要

Abstract:Forecasting in probabilistic time series is a complex endeavor that extends beyond predicting future values to also quantifying the uncertainty inherent in these predictions. Gaussian process regression stands out as a Bayesian machine learning technique adept at addressing this multifaceted challenge. This paper introduces a novel approach that blends the robustness of this Bayesian technique with the nuanced insights provided by the kernel perspective on quantum models, aimed at advancing quantum kernelized probabilistic forecasting. We incorporate a quantum feature map inspired by Ising interactions and demonstrate its effectiveness in capturing the temporal dependencies critical for precise forecasting. The optimization of our model’s hyperparameters circumvents the need for computationally intensive gradient descent by employing gradient-free Bayesian optimization. Comparative benchmarks against established classical kernel models are provided, affirming that our quantum-enhanced approach achieves competitive performance.

[AI-95] SimBench: A Rule-Based Multi-Turn Interaction Benchmark for Evaluating an LLM’s Ability to Generate Digital Twins

链接: https://arxiv.org/abs/2408.11987
作者: Jingquan Wang,Harry Zhang,Huzaifa Mustafa Unjhawala,Peter Negrut,Shu Wang,Khailanii Slaton,Radu Serban,Jin-Long Wu,Dan Negrut
关键词-EN: large language models, student large language, language models, virtual testing, designed to evaluate
类目: Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:We introduce SimBench, a benchmark designed to evaluate the proficiency of student large language models (S-LLMs) in generating digital twins (DTs) that can be used in simulators for virtual testing. Given a collection of S-LLMs, this benchmark enables the ranking of the S-LLMs based on their ability to produce high-quality DTs. We demonstrate this by comparing over 20 open- and closed-source S-LLMs. Using multi-turn interactions, SimBench employs a rule-based judge LLM (J-LLM) that leverages both predefined rules and human-in-the-loop guidance to assign scores for the DTs generated by the S-LLM, thus providing a consistent and expert-inspired evaluation protocol. The J-LLM is specific to a simulator, and herein the proposed benchmarking approach is demonstrated in conjunction with the Chrono multi-physics simulator. Chrono provided the backdrop used to assess an S-LLM in relation to the latter’s ability to create digital twins for multibody dynamics, finite element analysis, vehicle dynamics, robotic dynamics, and sensor simulations. The proposed benchmarking principle is broadly applicable and enables the assessment of an S-LLM’s ability to generate digital twins for other simulation packages. All code and data are available at this https URL.

[AI-96] Chemical Reaction Neural Networks for Fitting Accelerated Rate Calorimetry Data

链接: https://arxiv.org/abs/2408.11984
作者: Saakaar Bhatnagar,Andrew Comerford,Zelu Xu,Davide Berti Polato,Araz Banaeizadeh,Alessandro Ferraris
关键词-EN: Accelerated Rate Calorimetry, lithium-ion batteries rapidly, batteries rapidly increases, thermal runaway, mitigate thermal runaway
类目: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:As the demand for lithium-ion batteries rapidly increases there is a need to design these cells in a safe manner to mitigate thermal runaway. Thermal runaway in batteries leads to an uncontrollable temperature rise and potentially fires, which is a major safety concern. Typically, when modelling the chemical kinetics of thermal runaway, calorimetry data (e.g., Accelerated Rate Calorimetry (ARC)) is needed to determine the temperature-driven decomposition kinetics. Conventional methods of fitting Arrhenius Ordinary Differential Equation (ODE) thermal runaway models to Accelerated Rate Calorimetry (ARC) data make several assumptions that reduce the fidelity and generalizability of the obtained model. In this paper, Chemical Reaction Neural Networks (CRNNs) are trained to fit the kinetic parameters of N-equation Arrhenius ODEs to ARC data obtained from a Molicel 21700 P45B. The models are found to be better approximations of the experimental data. The flexibility of the method is demonstrated by experimenting with two-equation and four-equation models. Thermal runaway simulations are conducted in 3D using the obtained kinetic parameters, showing the applicability of the obtained thermal runaway models to large-scale simulations.
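摘要中的 N 方程 Arrhenius ODE 以温度驱动的反应速率为基本单元,其标准形式 k(T) = A·exp(-Ea/(R·T)) 可示意如下(CRNN 的拟合细节与具体参数均略去):

```python
import numpy as np

def arrhenius_rate(A, Ea, T):
    """Temperature-driven reaction rate k(T) = A * exp(-Ea / (R * T)),
    the building block of the N-equation thermal-runaway ODE models.
    A: pre-exponential factor; Ea: activation energy in J/mol;
    T: temperature(s) in Kelvin."""
    R = 8.314  # universal gas constant, J/(mol K)
    return A * np.exp(-Ea / (R * T))
```

CRNN 所做的正是从 ARC 数据中学习每个反应方程的 A 与 Ea 等动力学参数。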

[AI-97] Only Strict Saddles in the Energy Landscape of Predictive Coding Networks?

链接: https://arxiv.org/abs/2408.11979
作者: Francesco Innocenti,El Mehdi Achour,Ryan Singh,Christopher L. Buckley
关键词-EN: Predictive coding, performs iterative inference, energy-based learning algorithm, weight updates, algorithm that performs
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
*备注: 26 pages, 12 figures

点击查看摘要

Abstract:Predictive coding (PC) is an energy-based learning algorithm that performs iterative inference over network activities before weight updates. Recent work suggests that PC can converge in fewer learning steps than backpropagation thanks to its inference procedure. However, these advantages are not always observed, and the impact of PC inference on learning is theoretically not well understood. Here, we study the geometry of the PC energy landscape at the (inference) equilibrium of the network activities. For deep linear networks, we first show that the equilibrated energy is simply a rescaled mean squared error loss with a weight-dependent rescaling. We then prove that many highly degenerate (non-strict) saddles of the loss including the origin become much easier to escape (strict) in the equilibrated energy. Our theory is validated by experiments on both linear and non-linear networks. Based on these results, we conjecture that all the saddles of the equilibrated energy are strict. Overall, this work suggests that PC inference makes the loss landscape more benign and robust to vanishing gradients, while also highlighting the challenge of speeding up PC inference on large-scale models.

[AI-98] Sentiment and Emotion-aware Multi-criteria Fuzzy Group Decision Making System SDM2024

链接: https://arxiv.org/abs/2408.11976
作者: Adilet Yerkin,Pakizar Shamoi,Elnara Kadyrgali
关键词-EN: today world, holiday destination, restaurant or deciding, GDM, GDM systems
类目: Artificial Intelligence (cs.AI)
*备注: Submitted to FSDM 2024 - The 10th International Conference on Fuzzy Systems and Data Mining

点击查看摘要

Abstract:In today’s world, making decisions as a group is common, whether choosing a restaurant or deciding on a holiday destination. Group decision-making (GDM) systems play a crucial role by facilitating consensus among participants with diverse preferences. Discussions are one of the main tools people use to make decisions. When people discuss alternatives, they use natural language to express their opinions. Traditional GDM systems generally require participants to provide explicit opinion values to the system. However, in real-life scenarios, participants often express their opinions through some text (e.g., in comments, social media, messengers, etc.). This paper introduces a sentiment and emotion-aware multi-criteria fuzzy GDM system designed to enhance consensus-reaching effectiveness in group settings. This system incorporates natural language processing to analyze sentiments and emotions expressed in textual data, enabling an understanding of participant opinions besides the explicit numerical preference inputs. Once all the experts have provided their preferences for the alternatives, the individual preferences are aggregated into a single collective preference matrix. This matrix represents the collective expert opinion regarding the other options. Then, sentiments, emotions, and preference scores are inputted into a fuzzy inference system to get the overall score. The proposed system was used for a small decision-making process - choosing the hotel for a vacation by a group of friends. Our findings demonstrate that integrating sentiment and emotion analysis into GDM systems allows everyone’s feelings and opinions to be considered during discussions and significantly improves consensus among participants.
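摘要描述的流程是先把各专家的偏好矩阵聚合为集体偏好矩阵,再与情感、情绪得分一起输入模糊推理系统。下面用简单平均和加权求和近似这一流程(权重为假设值,且以清晰的加权和代替真实的模糊推理,仅作示意):

```python
import numpy as np

def collective_preference(expert_matrices):
    """Average individual pairwise-preference matrices into one
    collective preference matrix."""
    return np.mean(expert_matrices, axis=0)

def overall_score(preference, sentiment, emotion, weights=(0.6, 0.25, 0.15)):
    """Crisp stand-in for the fuzzy inference step: a weighted blend of
    preference, sentiment, and emotion scores, each assumed in [0, 1]."""
    w_p, w_s, w_e = weights
    return w_p * preference + w_s * sentiment + w_e * emotion
```

真实系统中,加权求和处应换成基于隶属函数与模糊规则库的推理。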

[AI-99] Real-Time Incremental Explanations for Object Detectors

链接: https://arxiv.org/abs/2408.11963
作者: Santiago Calderón-Peña,Hana Chockler,David A. Kelly
关键词-EN: Existing black box, Existing black, black box explainability, box explainability tools, object detectors rely
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Existing black box explainability tools for object detectors rely on multiple calls to the model, which prevents them from computing explanations in real time. In this paper we introduce IncX, an algorithm for real-time incremental approximations of explanations, based on linear transformations of saliency maps. We implement IncX on top of D-RISE, a state-of-the-art black-box explainability tool for object detectors. We show that IncX’s explanations are comparable in quality to those of D-RISE, with insertion curves being within 8%, and are computed two orders of magnitude faster than D-RISE’s explanations.

[AI-100] Advances in Preference-based Reinforcement Learning: A Review

链接: https://arxiv.org/abs/2408.11943
作者: Youssef Abdelkareem,Shady Shehata,Fakhri Karray
关键词-EN: engineered reward functions, accurately engineered reward, Preference-based reinforcement learning, Reinforcement Learning, algorithms suffer
类目: Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Reinforcement Learning (RL) algorithms suffer from the dependency on accurately engineered reward functions to properly guide the learning agents to do the required tasks. Preference-based reinforcement learning (PbRL) addresses that by utilizing human preferences as feedback from the experts instead of numeric rewards. Due to its promising advantage over traditional RL, PbRL has gained more focus in recent years with many significant advances. In this survey, we present a unified PbRL framework to include the newly emerging approaches that improve the scalability and efficiency of PbRL. In addition, we give a detailed overview of the theoretical guarantees and benchmarking work done in the field, while presenting its recent applications in complex real-world tasks. Lastly, we go over the limitations of the current approaches and the proposed future research directions.

[AI-101] Matmul or No Matmul in the Era of 1-bit LLMs

链接: https://arxiv.org/abs/2408.11939
作者: Jinendra Malekar,Mohammed E. Elbtity,Ramtin Zand Co
关键词-EN: attracted considerable attention, large language models, large language, attracted considerable, LLMs
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: 13 pages, 12 figures

点击查看摘要

Abstract:The advent of 1-bit large language models (LLMs) has attracted considerable attention and opened up new research opportunities. However, 1-bit LLMs only improve a fraction of the model by applying extreme quantization to the projection layers while leaving attention heads unchanged. Therefore, to avoid fundamentally wrong choices of goals in future research, it is crucial to understand the actual improvements in computation and memory usage that 1-bit LLMs can deliver. In this work, we present an adaptation of Amdahl’s Law tailored for the 1-bit LLM context, which illustrates how partial improvements in 1-bit LLMs impact overall model performance. Through extensive experiments, we uncover key nuances across different model architectures and hardware configurations, offering a roadmap for future research in the era of 1-bit LLMs.
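摘要的核心观察正是 Amdahl 定律:当只有一部分计算(投影层)被极端量化加速时,整体加速比被未加速部分(注意力头)限制。经典 Amdahl 公式可写成:

```python
def amdahl_speedup(fraction_improved, local_speedup):
    """Overall speedup when only a fraction f of the workload is sped up
    by a factor s:  S = 1 / ((1 - f) + f / s)."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)
```

例如,若投影层占一半计算量,哪怕把它加速到近乎免费,整体也至多快约 2 倍;论文针对 1-bit LLM 的改编即在此框架上展开(具体形式见原文)。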

[AI-102] Estimating Contribution Quality in Online Deliberations Using a Large Language Model

链接: https://arxiv.org/abs/2408.11936
作者: Lodewijk Gelauff,Mohak Goyal,Bhargav Dindukurthi,Ashish Goel,Alice Siu
关键词-EN: participants exchanging knowledge, involves participants exchanging, Deliberation involves participants, Stanford Online Deliberation, Online Deliberation Platform
类目: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
*备注:

点击查看摘要

Abstract:Deliberation involves participants exchanging knowledge, arguments, and perspectives and has been shown to be effective at addressing polarization. The Stanford Online Deliberation Platform facilitates large-scale deliberations. It enables video-based online discussions on a structured agenda for small groups without requiring human moderators. This paper’s data comes from various deliberation events, including one conducted in collaboration with Meta in 32 countries, and another with 38 post-secondary institutions in the US. Estimating the quality of contributions in a conversation is crucial for assessing feature and intervention impacts. Traditionally, this is done by human annotators, which is time-consuming and costly. We use a large language model (LLM) alongside eight human annotators to rate contributions based on justification, novelty, expansion of the conversation, and potential for further expansion, with scores ranging from 1 to 5. Annotators also provide brief justifications for their ratings. Using the average rating from other human annotators as the ground truth, we find the model outperforms individual human annotators. While pairs of human annotators outperform the model in rating justification and groups of three outperform it on all four metrics, the model remains competitive. We illustrate the usefulness of the automated quality rating by assessing the effect of nudges on the quality of deliberation. We first observe that individual nudges after prolonged inactivity are highly effective, increasing the likelihood of the individual requesting to speak in the next 30 seconds by 65%. Using our automated quality estimation, we show that the quality ratings for statements prompted by nudging are similar to those made without nudging, signifying that nudging leads to more ideas being generated in the conversation without losing overall quality. 

[AI-103] Explainable Anomaly Detection: Counterfactual driven What-If Analysis

链接: https://arxiv.org/abs/2408.11935
作者: Logan Cummins,Alexander Sommers,Sudip Mittal,Shahram Rahimi,Maria Seale,Joseph Jaboure,Thomas Arnold
关键词-EN: anomaly detection alerts, predictive maintenance, life prediction, anomaly detection, exists three main
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
*备注: 8 pages, 6 figures, 3 tables

点击查看摘要

Abstract:There exist three main areas of study inside the field of predictive maintenance: anomaly detection, fault diagnosis, and remaining useful life prediction. Notably, anomaly detection alerts the stakeholder that an anomaly is occurring. This raises two fundamental questions: what is causing the fault and how can we fix it? Within the field of explainable artificial intelligence, counterfactual explanations can give that information in the form of what changes to make to put the data point into the opposing class, in this case “healthy”. The suggestions are not always actionable, which may raise the interest in asking “what if we do this instead?” In this work, we provide a proof of concept for utilizing counterfactual explanations as what-if analysis. We perform this on the PRONOSTIA dataset with a temporal convolutional network as the anomaly detector. Our method presents the counterfactuals in the form of a what-if analysis for this base problem to inspire future work for more complex systems and scenarios.
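反事实解释的目标是找到使模型预测翻转为"健康"类的最小特征改动。下面是一个玩具级的单特征扰动广度优先搜索(示意性方法,并非论文所用的反事实生成算法):

```python
from collections import deque

def counterfactual_search(x, predict, deltas, max_pops=200):
    """Breadth-first search over single-feature perturbations until the
    classifier's prediction flips to 'healthy'. BFS order means the first
    hit is a minimal-step counterfactual ("what-if" suggestion)."""
    start = tuple(x)
    seen = {start}
    queue = deque([start])
    while queue and max_pops > 0:
        max_pops -= 1
        cur = queue.popleft()
        if predict(list(cur)) == "healthy":
            return list(cur)
        for i in range(len(cur)):
            for d in deltas:
                nxt = list(cur)
                nxt[i] = round(nxt[i] + d, 6)  # round to avoid float-drift duplicates
                t = tuple(nxt)
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return None  # no counterfactual found within the search budget
```

返回点与原始点的逐特征差值即可读作一条 what-if 建议:"若把该特征降低 0.5,检测器将判定为健康"。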

[AI-104] An Open Knowledge Graph-Based Approach for Mapping Concepts and Requirements between the EU AI Act and International Standards

链接: https://arxiv.org/abs/2408.11925
作者: Julio Hernandez,Delaram Golpayegani,Dave Lewis
关键词-EN: Act, regulatory compliance, confusing and multipolar, multipolar landscape, navigate in pursuing
类目: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
*备注: This work was presented at the 9th International Symposium on Language Knowledge Engineering (LKE 2024) Dublin, Ireland, 4 - 6 June, 2024

点击查看摘要

Abstract:The many initiatives on trustworthy AI result in a confusing and multipolar landscape that organizations operating within the fluid and complex international value chains must navigate in pursuing trustworthy AI. The EU’s AI Act will now shift the focus of such organizations toward conformance with the technical requirements for regulatory compliance, for which the Act relies on Harmonized Standards. Though a high-level mapping to the Act’s requirements will be part of such harmonization, determining the degree to which standards conformity delivers regulatory compliance with the AI Act remains a complex challenge. Variance and gaps in the definitions of concepts and how they are used in requirements between the Act and harmonized standards may impact the consistency of compliance claims across organizations, sectors, and applications. This may present regulatory uncertainty, especially for SMEs and public sector bodies relying on standards conformance rather than proprietary equivalents for developing and deploying compliant high-risk AI systems. To address this challenge, this paper offers a simple and repeatable mechanism for mapping the terms and requirements relevant to normative statements in regulations and standards, e.g., AI Act and ISO management system standards, texts into open knowledge graphs. This representation is used to assess the adequacy of standards conformance to regulatory compliance and thereby provide a basis for identifying areas where further technical consensus development in trustworthy AI value chains is required to achieve regulatory compliance.

[AI-105] Neural Symbolic Logical Rule Learner for Interpretable Learning

链接: https://arxiv.org/abs/2408.11918
作者: Bowen Wei,Ziwei Zhu
关键词-EN: Normal Form, Rule-based neural networks, Conjunctive Normal Form, Disjunctive Normal Form, Normal Form Constraint
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: 19 pages, 62 figures

点击查看摘要

Abstract:Rule-based neural networks stand out for enabling interpretable classification by learning logical rules for both prediction and interpretation. However, existing models often lack flexibility due to the fixed model structure. Addressing this, we introduce the Normal Form Rule Learner (NFRL) algorithm, leveraging a selective discrete neural network that treats weight parameters as hard selectors to learn rules in both Conjunctive Normal Form (CNF) and Disjunctive Normal Form (DNF) for enhanced accuracy and interpretability. Instead of adopting a deep, complex structure, the NFRL incorporates two specialized Normal Form Layers (NFLs) with adaptable AND/OR neurons, a Negation Layer for input negations, and a Normal Form Constraint (NFC) to streamline neuron connections. We also show the novel network architecture can be optimized using adaptive gradient update together with the Straight-Through Estimator to overcome the gradient vanishing challenge. Through extensive experiments on 11 datasets, NFRL demonstrates superior classification performance, quality of learned rules, efficiency, and interpretability compared to 12 state-of-the-art alternatives. Code and data are available at https://anonymous.4open.science/r/NFRL-27B4/.
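As a rough sketch of the abstract's CNF/DNF idea (not the authors' code), hard 0/1 selector weights can pick which literals, including negations from a negation layer, feed each AND/OR neuron:

```python
# Hypothetical sketch of the "normal form layers" described in the NFRL
# abstract: binary selector weights choose which (possibly negated) inputs
# feed each AND/OR neuron. Shapes and names are illustrative only.

def negation_layer(x):
    # Concatenate inputs with their negations, as the Negation Layer does.
    return x + [1 - v for v in x]

def and_neuron(literals, selectors):
    # AND over the selected literals (hard 0/1 selectors).
    return int(all(v for v, s in zip(literals, selectors) if s))

def or_neuron(literals, selectors):
    # OR over the selected literals.
    return int(any(v for v, s in zip(literals, selectors) if s))

def dnf_forward(x, and_selectors, or_selector):
    """DNF: an OR over several AND clauses built from literals."""
    literals = negation_layer(x)
    clauses = [and_neuron(literals, sel) for sel in and_selectors]
    return or_neuron(clauses, or_selector)

# Toy learned rule: (x0 AND NOT x1) OR (x1 AND x2) over 3 binary features.
and_sel = [
    [1, 0, 0, 0, 1, 0],  # selects x0 and NOT x1
    [0, 1, 1, 0, 0, 0],  # selects x1 and x2
]
or_sel = [1, 1]
print(dnf_forward([1, 0, 0], and_sel, or_sel))  # first clause fires -> 1
print(dnf_forward([0, 1, 0], and_sel, or_sel))  # neither clause fires -> 0
```

In the real model such selectors would be learned with a straight-through estimator; here they are fixed by hand purely to show the forward pass.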

[AI-106] Why am I Still Seeing This: Measuring the Effectiveness Of Ad Controls and Explanations in AI-Mediated Ad Targeting Systems AAAI

链接: https://arxiv.org/abs/2408.11910
作者: Jane Castleman,Aleksandra Korolova
关键词-EN: data privacy policies, targeting explanations Meta, targeting, targeting explanations, Meta
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
*备注: Accepted to the 7th AAAI Conference on AI, Ethics, and Society (AIES, 2024)

点击查看摘要

Abstract:Recently, Meta has shifted towards AI-mediated ad targeting mechanisms that do not require advertisers to provide detailed targeting criteria, likely driven by excitement over AI capabilities as well as new data privacy policies and targeting changes agreed upon in civil rights settlements. At the same time, Meta has touted their ad preference controls as an effective mechanism for users to control the ads they see. Furthermore, Meta markets their targeting explanations as a transparency tool that allows users to understand why they saw certain ads and inform actions to control future ads. Our study evaluates the effectiveness of Meta’s “See less” ad control and the actionability of ad targeting explanations following the shift to AI-mediated targeting. We conduct a large-scale study, randomly assigning participants to mark “See less” to Body Weight Control or Parenting topics, and collecting the ads and targeting explanations Meta shows to participants before and after the intervention. We find that utilizing the “See less” ad control for the topics we study does not significantly reduce the number of ads shown by Meta on these topics, and that the control is less effective for some users whose demographics are correlated with the topic. Furthermore, we find that the majority of ad targeting explanations for local ads made no reference to location-specific targeting criteria, and did not inform users why ads related to the topics they marked to “See less” of continued to be delivered. We hypothesize that the poor effectiveness of controls and lack of actionability in explanations are the result of the shift to AI-mediated targeting, for which explainability and transparency tools have not yet been developed. Our work thus provides evidence for the need of new methods for transparency and user control, suitable and reflective of increasingly complex AI-mediated ad delivery systems. 

[AI-107] Beyond Labels: Aligning Large Language Models with Human-like Reasoning ICPR2024

链接: https://arxiv.org/abs/2408.11879
作者: Muhammad Rafsan Kabir,Rafeed Mohammad Sultan,Ihsanul Haque Asif,Jawad Ibn Ahad,Fuad Rahman,Mohammad Ruhul Amin,Nabeel Mohammed,Shafin Rahman
关键词-EN: produce morally correct, Aligning large language, reasoning approach ensures, LLMs produce morally, large language models
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Accepted in ICPR 2024

点击查看摘要

Abstract:Aligning large language models (LLMs) with a human reasoning approach ensures that LLMs produce morally correct and human-like decisions. Ethical concerns are raised because current models are prone to generating false positives and providing malicious responses. To address this issue, we have curated an ethics dataset named Dataset for Aligning Reasons (DFAR), designed to aid in aligning language models to generate human-like reasons. The dataset comprises statements with ethical-unethical labels and their corresponding reasons. In this study, we employed a unique and novel fine-tuning approach that utilizes ethics labels and their corresponding reasons (L+R), in contrast to the existing fine-tuning approach that only uses labels (L). The original pre-trained versions, the existing fine-tuned versions, and our proposed fine-tuned versions of LLMs were then evaluated on an ethical-unethical classification task and a reason-generation task. Our proposed fine-tuning strategy notably outperforms the others in both tasks, achieving significantly higher accuracy scores in the classification task and lower misalignment rates in the reason-generation task. The increase in classification accuracies and decrease in misalignment rates indicate that the L+R fine-tuned models align more with human ethics. Hence, this study illustrates that injecting reasons has substantially improved the alignment of LLMs, resulting in more human-like responses. We have made the DFAR dataset and corresponding codes publicly available at this https URL.

[AI-108] Hierarchical Retrieval-Augmented Generation Model with Rethink for Multi-hop Question Answering

链接: https://arxiv.org/abs/2408.11875
作者: Xiaoming Zhang,Ming Wang,Xiaocui Yang,Daling Wang,Shi Feng,Yifei Zhang
关键词-EN: Multi-hop Question Answering, resolve intricate questions, Multi-hop Question, Question Answering, necessitates complex reasoning
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
*备注: under review

点击查看摘要

Abstract:Multi-hop Question Answering (QA) necessitates complex reasoning by integrating multiple pieces of information to resolve intricate questions. However, existing QA systems encounter challenges such as outdated information, context window length limitations, and an accuracy-quantity trade-off. To address these issues, we propose a novel framework, the Hierarchical Retrieval-Augmented Generation Model with Rethink (HiRAG), comprising five key modules: Decomposer, Definer, Retriever, Filter, and Summarizer. We introduce a new hierarchical retrieval strategy that incorporates both sparse retrieval at the document level and dense retrieval at the chunk level, effectively integrating their strengths. Additionally, we propose a single-candidate retrieval method to mitigate the limitations of multi-candidate retrieval. We also construct two new corpora, Indexed Wikicorpus and Profile Wikicorpus, to address the issues of outdated and insufficient knowledge. Our experimental results on four datasets demonstrate that HiRAG outperforms state-of-the-art models across most metrics, and our Indexed Wikicorpus is effective. The code for HiRAG is available at this https URL.
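The two-stage idea (sparse retrieval over whole documents, then dense retrieval over chunks of the survivors) can be sketched with toy scoring functions. The term-overlap and bag-of-words cosine below are stand-ins for BM25 and embedding similarity, not the paper's implementation:

```python
# Illustrative two-stage retrieval in the spirit of HiRAG's hierarchical
# strategy: a sparse term-overlap pass over whole documents, then a dense
# pass (toy bag-of-words cosine standing in for embeddings) over chunks of
# the surviving documents.
from collections import Counter
import math

def tokens(text):
    return [t.strip(".,").lower() for t in text.split()]

def sparse_score(query, doc):
    # Stand-in for BM25: count of shared terms.
    return len(set(tokens(query)) & set(tokens(doc)))

def dense_score(query, chunk):
    # Stand-in for embedding cosine similarity: bag-of-words cosine.
    q, c = Counter(tokens(query)), Counter(tokens(chunk))
    dot = sum(q[t] * c[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in c.values()))
    return dot / norm if norm else 0.0

def hierarchical_retrieve(query, docs, top_docs=1, top_chunks=1):
    ranked = sorted(docs, key=lambda d: sparse_score(query, d), reverse=True)[:top_docs]
    chunks = [ch for d in ranked for ch in d.split(". ")]
    return sorted(chunks, key=lambda ch: dense_score(query, ch), reverse=True)[:top_chunks]

docs = [
    "Paris is the capital of France. The Eiffel Tower is in Paris.",
    "Tokyo is the capital of Japan. Sushi is popular in Tokyo.",
]
print(hierarchical_retrieve("capital of France", docs))
```

The sparse pass narrows the corpus cheaply; the dense pass then ranks only the chunks of the shortlisted documents, which is the efficiency argument for the hierarchy.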

[AI-109] MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models

链接: https://arxiv.org/abs/2408.11871
作者: Lionel Z. Wang,Yiming Ma,Renfei Gao,Beichen Guo,Zhuoran Li,Han Zhu,Wenqi Fan,Zexin Lu,Ka Chung Ng
关键词-EN: large language models, revolutionized online content, generate high-quality fake, online content creation, language models
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:The advent of large language models (LLMs) has revolutionized online content creation, making it much easier to generate high-quality fake news. This misuse threatens the integrity of our digital environment and ethical standards. Therefore, understanding the motivations and mechanisms behind LLM-generated fake news is crucial. In this study, we analyze the creation of fake news from a social psychology perspective and develop a comprehensive LLM-based theoretical framework, LLM-Fake Theory. We introduce a novel pipeline that automates the generation of fake news using LLMs, thereby eliminating the need for manual annotation. Utilizing this pipeline, we create a theoretically informed Machine-generated Fake news dataset, MegaFake, derived from the GossipCop dataset. We conduct comprehensive analyses to evaluate our MegaFake dataset. We believe that our dataset and insights will provide valuable contributions to future research focused on the detection and governance of fake news in the era of LLMs.

[AI-110] Enhance Lifelong Model Editing with Continuous Data-Adapter Association

链接: https://arxiv.org/abs/2408.11869
作者: Jiaang Li,Quan Wang,Zhongnan Wang,Yongdong Zhang,Zhendong Mao
关键词-EN: Large language models, avoid factual errors, efficiently update specific, Large language, require model editing
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Preprint. Under Review

点击查看摘要

Abstract:Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors. Most model editing methods are solely designed for single-time use and lead to a significant forgetting effect after sequential edits over time, referred to as lifelong editing. Current approaches manage sequential edits by freezing original parameters and allocating new adapters for each knowledge modification. However, these methods lack robustness to minor input variations. To address this challenge, we propose ELDER (Enhancing Lifelong moDel Editing with mixtuRe of Low-Rank Adapters, LoRA). ELDER is an adaptive approach that integrates multiple LoRAs through a router network. It learns to create a continuous and smooth association between data and adapters, thereby enhancing robustness and generalization to semantically equivalent inputs. Additionally, we introduce a novel loss to help learn associations between adapter allocations and edit semantics. A deferral mechanism is also proposed to retain the original LLM capabilities post-edit. Extensive experiments on GPT-2 XL and LLaMA2-7B demonstrate that ELDER effectively edits models in the lifelong setting and exhibits strong scalability, while retaining LLM’s general abilities on downstream tasks.
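A minimal numeric sketch of routing among several low-rank adapters over a frozen weight matrix (the dimensions, router, and example matrices are invented; the real router is a learned, data-dependent network):

```python
# Toy mixture-of-LoRA forward pass: a softmax router gates each adapter's
# low-rank update B(Ax) on top of a frozen base projection. Illustrative
# only; not the ELDER implementation.
import math

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def lora_delta(A, B, x):
    # Low-rank update: delta = B @ (A @ x), with A: r x d and B: d x r.
    return matvec(B, matvec(A, x))

def mixture_forward(W, adapters, router_logits, x):
    gates = softmax(router_logits)  # data-dependent in the real model
    out = matvec(W, x)              # frozen base projection
    for g, (A, B) in zip(gates, adapters):
        out = [o + g * d for o, d in zip(out, lora_delta(A, B, x))]
    return out

frozen = [[1.0, 0.0], [0.0, 1.0]]
A1, B1 = [[0.5, 0.0], [0.0, 0.5]], [[1.0, 0.0], [0.0, 1.0]]
A2, B2 = [[0.0, 1.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]
print(mixture_forward(frozen, [(A1, B1), (A2, B2)], [0.0, 0.0], [2.0, 4.0]))  # [4.5, 6.0]
```

Because the gates are a smooth function of the router logits, nearby inputs receive nearby adapter mixtures, which is the intuition behind the paper's robustness claim for semantically equivalent inputs.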

[AI-111] Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores

链接: https://arxiv.org/abs/2408.11868
作者: Jun Lu,David Li,Bill Ding,Yu Kang
关键词-EN: small datasets augmented, presents an approach, approach to improve, contrastive fine-tuning, fine-tuning on small
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:This paper presents an approach to improve text embedding models through contrastive fine-tuning on small datasets augmented with expert scores. It focuses on enhancing semantic textual similarity tasks and addressing text retrieval problems. The proposed method uses soft labels derived from expert-augmented scores to fine-tune embedding models, preserving their versatility and ensuring retrieval capability is improved. The paper evaluates the method using a Q&A dataset from an online shopping website and eight expert models. Results show improved performance over a benchmark model across multiple metrics on various retrieval tasks from the massive text embedding benchmark (MTEB). The method is cost-effective and practical for real-world applications, especially when labeled data is scarce.
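One plausible form of the soft-label objective (a sketch of the idea, not the paper's code): expert-augmented scores are softmax-normalized into a target distribution over candidates, and the loss is the cross-entropy between that target and the model's similarity distribution.

```python
# Soft-label fine-tuning objective sketch: cross-entropy between a target
# distribution derived from expert scores and the model's similarity
# distribution over the same candidates. Illustrative only.
import math

def softmax(z, temp=1.0):
    m = max(z)
    e = [math.exp((v - m) / temp) for v in z]
    s = sum(e)
    return [v / s for v in e]

def soft_label_loss(model_sims, expert_scores, temp=1.0):
    p = softmax(expert_scores, temp)  # soft targets from expert scores
    q = softmax(model_sims, temp)     # model's predicted distribution
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# The loss is lower when model similarities reproduce the expert ranking.
aligned = soft_label_loss([2.0, 1.0, 0.0], [2.0, 1.0, 0.0])
reversed_ = soft_label_loss([0.0, 1.0, 2.0], [2.0, 1.0, 0.0])
assert aligned < reversed_
```

Unlike hard positive/negative labels, the soft targets preserve graded relevance information from the experts, which is what lets a small dataset carry more signal.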

[AI-112] Crossing New Frontiers: Knowledge-Augmented Large Language Model Prompting for Zero-Shot Text-Based De Novo Molecule Design NEURIPS NEURIPS-2023

链接: https://arxiv.org/abs/2408.11866
作者: Sakhinana Sagar Srinivas,Venkataramana Runkana
关键词-EN: innovative material development, efficient chemical processes, leverages computational methods, optimize molecular properties, fast-tracking new drug
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Biomolecules (q-bio.BM)
*备注: Paper was accepted at R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Foundation Models, NeurIPS-2023. Please find the links: this https URL and this https URL

点击查看摘要

Abstract:Molecule design is a multifaceted approach that leverages computational methods and experiments to optimize molecular properties, fast-tracking new drug discoveries, innovative material development, and more efficient chemical processes. Recently, text-based molecule design has emerged, inspired by next-generation AI tasks analogous to foundational vision-language models. Our study explores the use of knowledge-augmented prompting of large language models (LLMs) for the zero-shot text-conditional de novo molecular generation task. Our approach uses task-specific instructions and a few demonstrations to address distributional shift challenges when constructing augmented prompts for querying LLMs to generate molecules consistent with technical descriptions. Our framework proves effective, outperforming state-of-the-art (SOTA) baseline models on benchmark datasets.
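A knowledge-augmented prompt of the kind the abstract describes might be assembled as below; the template, instruction wording, and demonstration pair are invented for illustration and are not the authors' prompt:

```python
# Sketch of assembling a knowledge-augmented prompt for text-conditional
# molecule generation: task instruction plus a few retrieved demonstrations
# prepended to the query. Template and demonstrations are invented.

def build_prompt(instruction, demonstrations, query):
    demo_text = "\n\n".join(
        f"Description: {d}\nSMILES: {s}" for d, s in demonstrations
    )
    return f"{instruction}\n\n{demo_text}\n\nDescription: {query}\nSMILES:"

prompt = build_prompt(
    "Generate a SMILES string for the described molecule.",
    [("Ethanol, a simple alcohol.", "CCO")],
    "Acetic acid, a carboxylic acid.",
)
assert prompt.endswith("SMILES:")
```

The demonstrations anchor the LLM's output format and vocabulary, which is the mechanism the abstract credits for handling distributional shift in zero-shot generation.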

[AI-113] How Susceptible are LLMs to Influence in Prompts?

链接: https://arxiv.org/abs/2408.11865
作者: Sotiris Anagnostidis,Jannis Bulian
关键词-EN: Large Language Models, Large Language, including additional context, Language Models, highly sensitive
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Large Language Models (LLMs) are highly sensitive to prompts, including additional context provided therein. As LLMs grow in capability, understanding their prompt-sensitivity becomes increasingly crucial for ensuring reliable and robust performance, particularly since evaluating these models becomes more challenging. In this work, we investigate how current models (Llama, Mixtral, Falcon) respond when presented with additional input from another model, mimicking a scenario where a more capable model – or a system with access to more external information – provides supplementary information to the target model. Across a diverse spectrum of question-answering tasks, we study how an LLM’s response to multiple-choice questions changes when the prompt includes a prediction and explanation from another model. Specifically, we explore the influence of the presence of an explanation, the stated authoritativeness of the source, and the stated confidence of the supplementary input. Our findings reveal that models are strongly influenced, and when explanations are provided they are swayed irrespective of the quality of the explanation. The models are more likely to be swayed if the input is presented as being authoritative or confident, but the effect is small in size. This study underscores the significant prompt-sensitivity of LLMs and highlights the potential risks of incorporating outputs from external sources without thorough scrutiny and further validation. As LLMs continue to advance, understanding and mitigating such sensitivities will be crucial for their reliable and trustworthy deployment.

[AI-114] Unraveling Text Generation in LLMs: A Stochastic Differential Equation Approach

链接: https://arxiv.org/abs/2408.11863
作者: Yukun Zhang
关键词-EN: Stochastic Differential Equations, Differential Equations, Large Language Models, Stochastic Differential, Large Language
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
*备注:

点击查看摘要

Abstract:This paper explores the application of Stochastic Differential Equations (SDE) to interpret the text generation process of Large Language Models (LLMs) such as GPT-4. Text generation in LLMs is modeled as a stochastic process where each step depends on previously generated content and model parameters, sampling the next word from a vocabulary distribution. We represent this generation process using SDE to capture both deterministic trends and stochastic perturbations. The drift term describes the deterministic trends in the generation process, while the diffusion term captures the stochastic variations. We fit these functions using neural networks and validate the model on real-world text corpora. Through numerical simulations and comprehensive analyses, including drift and diffusion analysis, stochastic process property evaluation, and phase space exploration, we provide deep insights into the dynamics of text generation. This approach not only enhances the understanding of the inner workings of LLMs but also offers a novel mathematical perspective on language generation, which is crucial for diagnosing, optimizing, and controlling the quality of generated text.
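Numerically, an SDE dX = mu(X, t) dt + sigma(X, t) dW is typically simulated with the Euler-Maruyama scheme. In the paper the drift and diffusion terms are fitted neural networks; in the sketch below, toy mean-reverting functions stand in for them:

```python
# Euler-Maruyama simulation of a generic SDE. The toy drift/diffusion below
# (Ornstein-Uhlenbeck-style) stand in for the fitted neural networks the
# abstract describes.
import math
import random

random.seed(42)

def euler_maruyama(mu, sigma, x0, t_end, n_steps):
    dt = t_end / n_steps
    x, path = x0, [x0]
    for i in range(n_steps):
        t = i * dt
        dw = random.gauss(0.0, math.sqrt(dt))  # Brownian increment
        x = x + mu(x, t) * dt + sigma(x, t) * dw
        path.append(x)
    return path

# Mean-reverting drift pulls the state toward 0; constant diffusion adds noise.
path = euler_maruyama(mu=lambda x, t: -0.5 * x,
                      sigma=lambda x, t: 0.1,
                      x0=1.0, t_end=10.0, n_steps=1000)
print(len(path))  # 1001
```

The deterministic drift term dominates the long-run trend of the path, while the diffusion term produces the run-to-run variation, mirroring the abstract's decomposition of text generation into deterministic trends and stochastic perturbations.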

[AI-115] Sentiment analysis of preservice teachers' reflections using a large language model

链接: https://arxiv.org/abs/2408.11862
作者: Yunsoo Park,Younkyung Hong
关键词-EN: preservice teachers’ reflections, emotion and tone, analyzed using sentiment, teachers’ reflections, Gemini
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注: 5 pages, 2 tables, WAIE 2024 (2024 6th International Workshop on Artificial Intelligence and Education)

点击查看摘要

Abstract:In this study, the emotion and tone of preservice teachers’ reflections were analyzed using sentiment analysis with LLMs: GPT-4, Gemini, and BERT. We compared the results to understand how each tool categorizes and describes individual reflections and multiple reflections as a whole. This study aims to explore ways to bridge the gaps between qualitative, quantitative, and computational analyses of reflective practices in teacher education. This study finds that to effectively integrate LLM analysis into teacher education, developing an analysis method and result format that are both comprehensive and relevant for preservice teachers and teacher educators is crucial.

[AI-116] Speaking the Same Language: Leveraging LLMs in Standardizing Clinical Data for AI

链接: https://arxiv.org/abs/2408.11861
作者: Arindam Sett,Somaye Hashemifar,Mrunal Yadav,Yogesh Pandit,Mohsen Hejrati
关键词-EN: Artificial Intelligence, garnered considerable attention, implementation of Artificial, cost reduction, considerable attention
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: 11 pages, 2 figures, 4 tables

点击查看摘要

Abstract:The implementation of Artificial Intelligence (AI) in the healthcare industry has garnered considerable attention, attributable to its prospective enhancement of clinical outcomes, expansion of access to superior healthcare, cost reduction, and elevation of patient satisfaction. Nevertheless, the primary hurdle that persists is related to the quality of accessible multi-modal healthcare data in conjunction with the evolution of AI methodologies. This study delves into the adoption of large language models to address specific challenges, specifically, the standardization of healthcare data. We advocate the use of these models to identify and map clinical data schemas to established data standard attributes, such as the Fast Healthcare Interoperability Resources. Our results illustrate that employing large language models significantly diminishes the necessity for manual data curation and elevates the efficacy of the data standardization process. Consequently, the proposed methodology has the propensity to expedite the integration of AI in healthcare, ameliorate the quality of patient care, whilst minimizing the time and financial resources necessary for the preparation of data for AI.

[AI-117] Dynamic Adaptive Optimization for Effective Sentiment Analysis Fine-Tuning on Large Language Models

链接: https://arxiv.org/abs/2408.11856
作者: Hongcheng Ding,Xuanze Zhao,Shamsul Nahar Abdullah,Deshinta Arrova Dewi,Zixiao Jiang
关键词-EN: Sentiment analysis plays, Sentiment analysis, plays a crucial, crucial role, business intelligence
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Sentiment analysis plays a crucial role in various domains, such as business intelligence and financial forecasting. Large language models (LLMs) have become a popular paradigm for sentiment analysis, leveraging multi-task learning to address specific tasks concurrently. However, LLMs fine-tuned for sentiment analysis often underperform due to the inherent challenges in managing diverse task complexities. Moreover, constant-weight approaches in multi-task learning struggle to adapt to variations in data characteristics, further complicating model effectiveness. To address these issues, we propose a novel multi-task learning framework with a dynamic adaptive optimization (DAO) module. This module is designed as a plug-and-play component that can be seamlessly integrated into existing models, providing an effective and flexible solution for multi-task learning. The key component of the DAO module is dynamic adaptive loss, which dynamically adjusts the weights assigned to different tasks based on their relative importance and data characteristics during training. Sentiment analyses on a standard and customized financial text dataset demonstrate that the proposed framework achieves superior performance. Specifically, this work improves the Mean Squared Error (MSE) and Accuracy (ACC) by 15.58% and 1.24% respectively, compared with previous work.
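One plausible rendering of such dynamic task weighting (a guess at the general idea, not the paper's formula): weights are renormalized each training step from the tasks' current losses, so harder tasks receive more weight.

```python
# Illustrative dynamic task weighting: per-task weights are recomputed each
# step as a softmax over current losses. A stand-in for the DAO module's
# dynamic adaptive loss, not its actual formula.
import math

def dynamic_weights(task_losses, temperature=1.0):
    exps = [math.exp(l / temperature) for l in task_losses]
    s = sum(exps)
    return [e / s for e in exps]

def combined_loss(task_losses, weights):
    return sum(w * l for w, l in zip(weights, task_losses))

losses = [0.8, 0.2]          # e.g., regression head vs. classification head
w = dynamic_weights(losses)
assert w[0] > w[1]           # the harder task receives the larger weight
assert abs(sum(w) - 1.0) < 1e-9
print(combined_loss(losses, w))
```

Recomputing the weights each step is what distinguishes this from the constant-weight multi-task baselines the abstract criticizes.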

[AI-118] FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models

链接: https://arxiv.org/abs/2408.11855
作者: Zhongyu Zhao,Menghang Dong,Rongyu Zhang,Wenzhao Zheng,Yunpeng Zhang,Huanrui Yang,Dalong Du,Kurt Keutzer,Shanghang Zhang
关键词-EN: Large Language Models, Large Language, storing diverse linguistic, Recent research, Feed-Forward Networks
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Recent research has demonstrated that Feed-Forward Networks (FFNs) in Large Language Models (LLMs) play a pivotal role in storing diverse linguistic and factual knowledge. Conventional methods frequently face challenges due to knowledge confusion stemming from their monolithic and redundant architectures, which calls for more efficient solutions with minimal computational overhead, particularly for LLMs. In this paper, we explore the FFN computation paradigm in LLMs and introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications, while maintaining the same level of performance. Furthermore, we embed a router from the Mixture-of-Experts (MoE), combined with our devised Prior-Approximate (PA) loss term that facilitates the dynamic activation of experts and knowledge adaptation, thereby accelerating computational processes and enhancing performance using minimal training data and fine-tuning steps. FactorLLM thus enables efficient knowledge factorization and activates select groups of experts specifically tailored to designated tasks, emulating the interactive functional segmentation of the human brain. Extensive experiments across various benchmarks demonstrate the effectiveness of our proposed FactorLLM which achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed. Code: this https URL.

[AI-119] When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications?

链接: https://arxiv.org/abs/2408.11854
作者: Yanjun Gao,Skatje Myers,Shan Chen,Dmitriy Dligach,Timothy A Miller,Danielle Bitterman,Matthew Churpek,Majid Afshar
关键词-EN: Large Language Models, Language Models, Large Language, bringing significant progress, introduction of Large
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Under review

点击查看摘要

Abstract:The introduction of Large Language Models (LLMs) has advanced data representation and analysis, bringing significant progress in their use for medical questions and answering. Despite these advancements, integrating tabular data, especially numerical data pivotal in clinical contexts, into LLM paradigms has not been thoroughly explored. In this study, we examine the effectiveness of vector representations from last hidden states of LLMs for medical diagnostics and prognostics using electronic health record (EHR) data. We compare the performance of these embeddings with that of raw numerical EHR data when used as feature inputs to traditional machine learning (ML) algorithms that excel at tabular data learning, such as eXtreme Gradient Boosting. We focus on instruction-tuned LLMs in a zero-shot setting to represent abnormal physiological data and evaluate their utility as feature extractors to enhance ML classifiers for predicting diagnoses, length of stay, and mortality. Furthermore, we examine prompt engineering techniques on zero-shot and few-shot LLM embeddings to measure their impact comprehensively. Although findings suggest the raw data features still prevail in medical ML tasks, zero-shot LLM embeddings demonstrate competitive results, suggesting a promising avenue for future research in medical applications.

[AI-120] Fast Training Dataset Attribution via In-Context Learning

链接: https://arxiv.org/abs/2408.11852
作者: Milad Fotouhi,Mohammad Taha Bahadori,Oluwaseyi Feyisetan,Payman Arabshahi,David Heckerman
关键词-EN: instruction-tuned large language, large language models, prompt engineering, engineering to estimate, instruction-tuned large
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:We investigate the use of in-context learning and prompt engineering to estimate the contributions of training data in the outputs of instruction-tuned large language models (LLMs). We propose two novel approaches: (1) a similarity-based approach that measures the difference between LLM outputs with and without provided context, and (2) a mixture distribution model approach that frames the problem of identifying contribution scores as a matrix factorization task. Our empirical comparison demonstrates that the mixture model approach is more robust to retrieval noise in in-context learning, providing a more reliable estimation of data contributions.
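The similarity-based approach can be rendered as a toy scoring loop: each candidate training document is scored by how much closer the model's with-context output is to it than the without-context output. The Jaccard similarity and string "outputs" below are stand-ins for illustration, not the paper's measures:

```python
# Toy sketch of similarity-based training data attribution: score each
# candidate by the shift in output similarity when context is provided.
# Jaccard over word sets stands in for whatever similarity is actually used.

def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def contribution_scores(output_with_ctx, output_without_ctx, candidates):
    return {
        name: jaccard(output_with_ctx, doc) - jaccard(output_without_ctx, doc)
        for name, doc in candidates.items()
    }

candidates = {
    "a": "paris is the capital",
    "b": "tokyo is the capital",
}
scores = contribution_scores(
    "the capital is paris",   # output when context was provided
    "the capital is a city",  # output without context
    candidates,
)
assert scores["a"] > scores["b"]  # document "a" contributed more
```

The paper's second, mixture-model approach instead frames these scores as a matrix factorization problem, which the authors report is more robust to retrieval noise.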

[AI-121] SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming

链接: https://arxiv.org/abs/2408.11851
作者: Anurakt Kumar,Divyanshu Kumar,Jatan Loya,Nitin Aravind Birur,Tanay Baswa,Sahil Agarwal,Prashanth Harshangi
关键词-EN: Red Teaming, Evaluation and Red, Safety Evaluation, data Generation, introduce Synthetic Alignment
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
*备注:

点击查看摘要

Abstract:We introduce Synthetic Alignment data Generation for Safety Evaluation and Red Teaming (SAGE-RT, or SAGE), a novel pipeline for generating synthetic alignment and red-teaming data. Existing methods fall short in creating nuanced and diverse datasets, providing necessary control over the data generation and validation processes, or require a large amount of manually generated seed data. SAGE addresses these limitations by using a detailed taxonomy to produce safety-alignment and red-teaming data across a wide range of topics. We generated 51,000 diverse and in-depth prompt-response pairs, encompassing over 1,500 topics of harmfulness and covering variations of the most frequent types of jailbreaking prompts faced by large language models (LLMs). We show that the red-teaming data generated through SAGE jailbreaks state-of-the-art LLMs in more than 27 out of 32 sub-categories, and in more than 58 out of 279 leaf-categories (sub-sub categories). The attack success rate for GPT-4o, GPT-3.5-turbo is 100% over the sub-categories of harmfulness. Our approach avoids the pitfalls of synthetic safety-training data generation such as mode collapse and lack of nuance in the generation pipeline by ensuring a detailed coverage of harmful topics using iterative expansion of the topics and conditioning the outputs on the generated raw-text. This method can be used to generate red-teaming and alignment data for LLM Safety completely synthetically to make LLMs safer or for red-teaming the models over a diverse range of topics.

[AI-122] Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation

链接: https://arxiv.org/abs/2408.11849
作者: Yinghao Aaron Li,Xilin Jiang,Jordan Darefsky,Ge Zhu,Nima Mesgarani
关键词-EN: large language models, contextually relevant dialogues, text-based chatbots, demonstrating their capability, large language
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
*备注: CoLM 2024

点击查看摘要

Abstract:The rapid advancement of large language models (LLMs) has significantly propelled the development of text-based chatbots, demonstrating their capability to engage in coherent and contextually relevant dialogues. However, extending these advancements to enable end-to-end speech-to-speech conversation bots remains a formidable challenge, primarily due to the extensive dataset and computational resources required. The conventional approach of cascading automatic speech recognition (ASR), LLM, and text-to-speech (TTS) models in a pipeline, while effective, suffers from unnatural prosody because it lacks direct interactions between the input audio and its transcribed text and the output audio. These systems are also limited by their inherent latency from the ASR process for real-time applications. This paper introduces Style-Talker, an innovative framework that fine-tunes an audio LLM alongside a style-based TTS model for fast spoken dialog generation. Style-Talker takes user input audio and uses transcribed chat history and speech styles to generate both the speaking style and text for the response. Subsequently, the TTS model synthesizes the speech, which is then played back to the user. While the response speech is being played, the input speech undergoes ASR processing to extract the transcription and speaking style, serving as the context for the ensuing dialogue turn. This novel pipeline accelerates the traditional cascade ASR-LLM-TTS systems while integrating rich paralinguistic information from input speech. Our experimental results show that Style-Talker significantly outperforms the conventional cascade and speech-to-speech baselines in terms of both dialogue naturalness and coherence while being more than 50% faster.

[AI-123] MGH Radiology Llama: A Llama 3 70B Model for Radiology

链接: https://arxiv.org/abs/2408.11848
作者: Yucheng Shi,Peng Shu,Zhengliang Liu,Zihao Wu,Quanzheng Li,Xiang Li
关键词-EN: enhance diagnostic accuracy, improve patient care, streamline workflows, recent years, artificial intelligence
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注: 11 pages, 3 figures, 1 table

点击查看摘要

Abstract:In recent years, the field of radiology has increasingly harnessed the power of artificial intelligence (AI) to enhance diagnostic accuracy, streamline workflows, and improve patient care. Large language models (LLMs) have emerged as particularly promising tools, offering significant potential in assisting radiologists with report generation, clinical decision support, and patient communication. This paper presents an advanced radiology-focused large language model: MGH Radiology Llama. It is developed using the Llama 3 70B model, building upon previous domain-specific models like Radiology-GPT and Radiology-Llama2. Leveraging a unique and comprehensive dataset from Massachusetts General Hospital, comprising over 6.5 million de-identified medical reports across various imaging modalities, the model demonstrates significant improvements in generating accurate and clinically relevant radiology impressions given the corresponding findings. Our evaluation, incorporating both traditional metrics and a GPT-4-based assessment, highlights the enhanced performance of this work over general-purpose LLMs.

[AI-124] Editable Fairness: Fine-Grained Bias Mitigation in Language Models

链接: https://arxiv.org/abs/2408.11843
作者: Ruizhe Chen,Yichen Li,Jianfei Yang,Joey Tianyi Zhou,Zuozhu Liu
关键词-EN: deploying large language, Generating fair, accurate predictions plays, large language models, real world
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注: arXiv admin note: substantial text overlap with arXiv:2405.09341

点击查看摘要

Abstract:Generating fair and accurate predictions plays a pivotal role in deploying large language models (LLMs) in the real world. However, existing debiasing methods inevitably generate unfair or incorrect predictions as they are designed and evaluated to achieve parity across different social groups but leave aside individual commonsense facts, resulting in modified knowledge that elicits unreasonable or undesired predictions. In this paper, we first establish a new bias mitigation benchmark, BiaScope, which systematically assesses performance by leveraging newly constructed datasets and metrics on knowledge retention and generalization. Then, we propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases. FAST identifies the decisive layer responsible for storing social biases and then calibrates its outputs by integrating a small modular network, considering both bias mitigation and knowledge-preserving demands. Comprehensive experiments demonstrate that FAST surpasses state-of-the-art baselines with superior debiasing performance while not compromising the overall model capability for knowledge retention and downstream predictions. This highlights the potential of fine-grained debiasing strategies to achieve fairness in LLMs. Code will be publicly available.
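上文 FAST 的核心思路——冻结原模型、在选定层之后挂接一个零初始化的小型校准模块,使去偏模型初始时与原模型完全一致——可以用如下纯 Python 草图示意。层结构、尺寸与"训练"方式均为假设示例,并非论文实现:

```python
# Minimal pure-Python sketch of the "small modular network" idea above:
# the base layer stays frozen; a zero-initialized calibration module adds
# a correction, so only that module would be trained for debiasing.
# Layer, sizes, and the final "update" are illustrative only.

def linear(weights, bias, x):
    """y = W x + b for a dense layer stored as nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

# Frozen base layer (2 -> 2), standing in for the bias-storing layer.
W_base = [[1.0, 0.0], [0.0, 1.0]]
b_base = [0.0, 0.0]

# Small calibration module, zero-initialized: contributes nothing at first.
W_stamp = [[0.0, 0.0], [0.0, 0.0]]
b_stamp = [0.0, 0.0]

def fast_forward(x):
    h = linear(W_base, b_base, x)        # frozen knowledge path
    delta = linear(W_stamp, b_stamp, h)  # trainable bias correction
    return [hi + di for hi, di in zip(h, delta)]

x = [0.5, -1.0]
assert fast_forward(x) == linear(W_base, b_base, x)  # identity at init

# "Debiasing" would then train only W_stamp / b_stamp, e.g. nudging one logit:
b_stamp[0] = -0.1
print(fast_forward(x))  # [0.4, -1.0]
```

零初始化保证了"知识保持"这一约束:未经校准训练时,模型行为与原模型逐位一致。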

[AI-125] Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants

链接: https://arxiv.org/abs/2408.11841
作者: Beatriz Borges,Negar Foroutan,Deniz Bayazit,Anna Sotnikova,Syrielle Montariol,Tanya Nazaretzky,Mohammadreza Banaei,Alireza Sakhaeirad,Philippe Servant,Seyed Parsa Neshaei,Jibril Frej,Angelika Romanou,Gail Weiss,Sepideh Mamooler,Zeming Chen,Simin Fan,Silin Gao,Mete Ismayilzada,Debjit Paul,Alexandre Schöpfer,Andrej Janchevski,Anja Tiede,Clarence Linden,Emanuele Troiani,Francesco Salvi,Freya Behrens,Giacomo Orsi,Giovanni Piccioli,Hadrien Sevel,Louis Coulon,Manuela Pineros-Rodriguez,Marin Bonnassies,Pierre Hellich,Puck van Gerwen,Sankalp Gambhir,Solal Pirelli,Thomas Blanchard,Timothée Callens,Toni Abi Aoun,Yannick Calvino Alonso,Yuri Cho,Alberto Chiappa,Antonio Sclocchi,Étienne Bruno,Florian Hofhammer,Gabriel Pescia,Geovani Rizk,Leello Dadi,Lucas Stoffl,Manoel Horta Ribeiro,Matthieu Bovel,Yueyang Pan,Aleksandra Radenovic,Alexandre Alahi,Alexander Mathis,Anne-Florence Bitbol,Boi Faltings,Cécile Hébert,Devis Tuia,François Maréchal,George Candea,Giuseppe Carleo,Jean-Cédric Chappelier,Nicolas Flammarion,Jean-Marie Fürbringer,Jean-Philippe Pellet,Karl Aberer,Lenka Zdeborová,Marcel Salathé,Martin Jaggi,Martin Rajman,Mathias Payer,Matthieu Wyart,Michael Gastpar,Michele Ceriotti,Ola Svensson,Olivier Lévêque,Paolo Ienne,Rachid Guerraoui,Robert West,Sanidhya Kashyap,Valerio Piazza,Viesturs Simanis,Viktor Kuncak,Volkan Cevher,Philippe Schwaller,Sacha Friedli,Patrick Jermann,Tanja Kaser,Antoine Bosselut
关键词-EN: higher education institutions, students enrolled, higher education, learning outcomes, education institutions
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
*备注: 20 pages, 8 figures

点击查看摘要

Abstract:AI assistants are being increasingly used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes. We conceptualize these challenges through the lens of vulnerability, the potential for university assessments and learning outcomes to be impacted by student use of generative AI. We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses. Specifically, we compile a novel dataset of textual assessment questions from 50 courses at EPFL and evaluate whether two AI assistants, GPT-3.5 and GPT-4 can adequately answer these questions. We use eight prompting strategies to produce responses and find that GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. When grouping courses in our dataset by degree program, these systems already pass non-project assessments of large numbers of core courses in various degree programs, posing risks to higher education accreditation that will be amplified as these models improve. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.

[AI-126] Joint PET-MRI Reconstruction with Diffusion Stochastic Differential Model

链接: https://arxiv.org/abs/2408.11840
作者: Taofeng Xie,Zhuoxu Cui,Congcong Liu,Chen Luo,Huayu Wang,Yuanzhi Zhang,Xuemei Wang,Yihang Zhou,Qiyu Jin,Guoqing Chen,Dong Liang,Haifeng Wang
关键词-EN: PET, MRI, PET suffers, joint, Abstract
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注: Accepted as ISMRM 2024 Digital poster 6575. 04-09 May 2024 Singapore

点击查看摘要

Abstract:PET suffers from a low signal-to-noise ratio, while the k-space data acquisition process for MRI in PET-MRI systems is time-consuming. We aim to accelerate MRI and improve PET image quality. This paper proposes a novel joint reconstruction model based on diffusion stochastic differential equations that learns the joint probability distribution of PET and MRI. Comparison results underscore the qualitative and quantitative improvements our model brings to PET and MRI reconstruction, surpassing current state-of-the-art methodologies. Joint PET-MRI reconstruction remains a challenge for PET-MRI systems, and the relationship between the two modalities extends beyond shared edges; in this study, PET is generated from MRI by learning their joint probability distribution as that relationship.

[AI-127] Adaptive Friction in Deep Learning: Enhancing Optimizers with Sigmoid and Tanh Function

链接: https://arxiv.org/abs/2408.11839
作者: Hongye Zheng,Bingxing Wang,Minheng Xiao,Honglin Qin,Zhizhong Wu,Lianghao Tan
关键词-EN: adaptive friction coefficients, deep neural networks, neural networks, oscillation issues, pivotal in guiding
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Adaptive optimizers are pivotal in guiding the weight updates of deep neural networks, yet they often face challenges such as poor generalization and oscillation issues. To counter these, we introduce sigSignGrad and tanhSignGrad, two novel optimizers that integrate adaptive friction coefficients based on the Sigmoid and Tanh functions, respectively. These algorithms leverage short-term gradient information, a feature overlooked in traditional Adam variants like diffGrad and AngularGrad, to enhance parameter updates and convergence. Our theoretical analysis demonstrates the wide-ranging adjustment capability of the friction coefficient S, which aligns with targeted parameter update strategies and outperforms existing methods in both optimization trajectory smoothness and convergence rate. Extensive experiments on CIFAR-10, CIFAR-100, and Mini-ImageNet datasets using ResNet50 and ViT architectures confirm the superior performance of our proposed optimizers, showcasing improved accuracy and reduced training time. The innovative approach of integrating adaptive friction coefficients as plug-ins into existing optimizers, exemplified by the sigSignAdamW and sigSignAdamP variants, presents a promising strategy for boosting the optimization performance of established algorithms. The findings of this study contribute to the advancement of optimizer design in deep learning.
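作为示意,下面给出一个带自适应"摩擦系数"的 Adam 变体草图。论文中系数 S 的精确定义此处并未给出;草图借用 diffGrad 风格的"短期梯度变化的 Sigmoid"作为占位,仅用于说明"摩擦系数缩放一阶动量更新"这一思路:

```python
# Illustrative Adam-style update with an adaptive friction coefficient.
# The sigmoid-of-gradient-change used here is a diffGrad-style PLACEHOLDER,
# not the actual coefficient S defined in the sigSignGrad paper.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adaptive_friction_step(w, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One update of a scalar parameter w given gradient g."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g        # first moment
    state["v"] = b2 * state["v"] + (1 - b2) * g * g    # second moment
    m_hat = state["m"] / (1 - b1 ** state["t"])        # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    s = sigmoid(abs(g - state["g_prev"]))              # friction in (0.5, 1)
    state["g_prev"] = g
    return w - lr * s * m_hat / (math.sqrt(v_hat) + eps)

# Minimize f(w) = w^2 (gradient 2w), starting from w = 1.0.
state = {"t": 0, "m": 0.0, "v": 0.0, "g_prev": 0.0}
w = 1.0
for _ in range(2000):
    w = adaptive_friction_step(w, 2 * w, state, lr=0.05)
print(w)
```

摩擦系数 s 始终落在 (0.5, 1) 区间内,起到按短期梯度信息缩放步长的作用;这正是"作为插件集成进既有优化器"这一设计的最小形态。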

[AI-128] MicroXercise: A Micro-Level Comparative and Explainable System for Remote Physical Therapy

链接: https://arxiv.org/abs/2408.11837
作者: Hanchen David Wang,Nibraas Khan,Anna Chen,Nilanjan Sarkar,Pamela Wisniewski,Meiyi Ma
关键词-EN: Recent global estimates, global estimates suggest, Recent global, billion individuals, rehabilitation services
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
*备注: Accepted by IEEE/ACM CHASE 2024

点击查看摘要

Abstract:Recent global estimates suggest that as many as 2.41 billion individuals have health conditions that would benefit from rehabilitation services. Home-based Physical Therapy (PT) faces significant challenges in providing interactive feedback and meaningful observation for therapists and patients. To fill this gap, we present MicroXercise, which integrates micro-motion analysis with wearable sensors, providing therapists and patients with a comprehensive feedback interface, including video, text, and scores. Crucially, it employs multi-dimensional Dynamic Time Warping (DTW) and attribution-based explainable methods to analyze the existing deep learning neural networks in monitoring exercises, focusing on a high granularity of exercise. This synergistic approach is pivotal, providing output matching the input size to precisely highlight critical subtleties and movements in PT, thus transforming complex AI analysis into clear, actionable feedback. By highlighting these micro-motions in different metrics, such as stability and range of motion, MicroXercise significantly enhances the understanding and relevance of feedback for end-users. Comparative performance metrics underscore its effectiveness over traditional methods, such as a 39% and 42% improvement in Feature Mutual Information (FMI) and Continuity. MicroXercise is a step ahead in home-based physical therapy, providing a technologically advanced and intuitively helpful solution to enhance patient care and outcomes.
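上文系统依赖多维动态时间规整(DTW)来对齐康复训练的运动序列,下面是教科书式的 O(n·m) 多维 DTW 草图(并非论文代码),用于说明"允许时间伸缩地比较患者动作与参考动作"的基本机制:

```python
# Textbook dynamic time warping (DTW) over multi-dimensional frames,
# a toy stand-in for the multi-dimensional DTW used by MicroXercise.
import math

def dtw_distance(a, b):
    """DTW cost between two sequences of equal-dimension frames."""
    def dist(x, y):
        return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

ref  = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]               # reference motion
perf = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]   # patient, slower start
print(dtw_distance(ref, perf))  # 0.0 — identical up to timing
```

DTW 对时间轴伸缩不敏感,因此"起步稍慢但轨迹相同"的动作得到零代价,这正是逐帧欧氏距离做不到的。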

[AI-129] SCREENER: A general framework for task-specific experiment design in quantitative MRI

链接: https://arxiv.org/abs/2408.11834
作者: Tianshu Zheng,Zican Wang,Timothy Bray,Daniel C. Alexander,Dan Wu,Hui Zhang
关键词-EN: magnetic resonance imaging, Quantitative magnetic resonance, resonance imaging, treatment monitoring, magnetic resonance
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Quantitative magnetic resonance imaging (qMRI) is increasingly investigated for use in a variety of clinical tasks from diagnosis, through staging, to treatment monitoring. However, experiment design in qMRI, the identification of the optimal acquisition protocols, has been focused on obtaining the most precise parameter estimations, with no regard for the specific requirements of downstream tasks. Here we propose SCREENER: A general framework for task-specific experiment design in quantitative MRI. SCREENER incorporates a task-specific objective and seeks the optimal protocol with a deep-reinforcement-learning (DRL) based optimization strategy. To illustrate this framework, we employ a task of classifying the inflammation status of bone marrow using diffusion MRI data with intravoxel incoherent motion (IVIM) modelling. Results demonstrate SCREENER outperforms previous ad hoc and optimized protocols under clinical signal-to-noise ratio (SNR) conditions, achieving significant improvement, both in binary classification tasks, e.g. from 67% to 89%, and in a multi-class classification task, from 46% to 59%. Additionally, we show this improvement is robust to the SNR. Lastly, we demonstrate the advantage of DRL-based optimization strategy, enabling zero-shot discovery of near-optimal protocols for a range of SNRs not used in training. In conclusion, SCREENER has the potential to enable wider uptake of qMRI in the clinic.
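上文示例任务使用的 IVIM 模型,其标准双指数信号形式为 S(b) = S0·(f·exp(-b·D*) + (1-f)·exp(-b·D));"实验设计"即为下游任务挑选一组最优的 b 值。下面的草图只实现这一通用教科书公式,参数取值为假设示例,并非论文协议:

```python
# Standard intravoxel incoherent motion (IVIM) signal model used in the
# SCREENER illustration above. Parameter values are illustrative only.
import math

def ivim_signal(b, s0=1.0, f=0.1, d=1e-3, d_star=1e-2):
    """S(b) = S0 * ( f * exp(-b * D*) + (1 - f) * exp(-b * D) ).
    b in s/mm^2; d, d_star in mm^2/s; f is the perfusion fraction."""
    return s0 * (f * math.exp(-b * d_star) + (1 - f) * math.exp(-b * d))

# An acquisition "protocol" is just a set of b-values; experiment design
# searches for the set that best serves the downstream task.
protocol = [0, 50, 200, 800]
print([round(ivim_signal(b), 4) for b in protocol])  # [1.0, 0.9168, 0.7504, 0.4044]
```

信号随 b 值单调衰减,低 b 段由灌注项(D*)主导、高 b 段由扩散项(D)主导,这也是不同协议对不同任务敏感度不同的原因。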

[AI-130] OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs EMNLP2024

链接: https://arxiv.org/abs/2408.11832
作者: Hasan Iqbal,Yuxia Wang,Minghan Wang,Georgi Georgiev,Jiahui Geng,Iryna Gurevych,Preslav Nakov
关键词-EN: large language models, real-world applications calls, https URL, language models, large language
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
*备注: 10 pages, 4 Figures, 3 Tables, Submitted to EMNLP 2024 System Demonstration. arXiv admin note: substantial text overlap with arXiv:2405.05583

点击查看摘要

Abstract:The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucinate. This is difficult as it requires assessing the factuality of free-form open-domain responses. While there has been a lot of research on this topic, different papers use different evaluation benchmarks and measures, which makes them hard to compare and hampers future progress. To mitigate these issues, we developed OpenFactCheck, a unified framework, with three modules: (i) RESPONSEEVAL, which allows users to easily customize an automatic fact-checking system and to assess the factuality of all claims in an input document using that system, (ii) LLMEVAL, which assesses the overall factuality of an LLM, and (iii) CHECKEREVAL, a module to evaluate automatic fact-checking systems. OpenFactCheck is open-sourced (this https URL) and publicly released as a Python library (this https URL) and also as a web service (this https URL). A video describing the system is available at this https URL.

[AI-131] Generative Organizational Behavior Simulation using Large Language Model based Autonomous Agents : A Holacracy Perspective

链接: https://arxiv.org/abs/2408.11826
作者: Chen Zhu,Yihang Cheng,Jingshuai Zhang,Yusheng Qiu,Sitao Xia,Hengshu Zhu
关键词-EN: Model-based Autonomous Agents, Large Language Model-based, Language Model-based Autonomous, Autonomous Agents, Large Language
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:In this paper, we present the technical details and periodic findings of our project, CareerAgent, which aims to build a generative simulation framework for a Holacracy organization using Large Language Model-based Autonomous Agents. Specifically, the simulation framework includes three phases: construction, execution, and evaluation, and it incorporates basic characteristics of individuals, organizations, tasks, and meetings. Through our simulation, we obtained several interesting findings. At the organizational level, an increase in the average values of management competence and functional competence can reduce overall members’ stress levels, but it negatively impacts deeper organizational performance measures such as average task completion. At the individual level, both competences can improve members’ work performance. From the analysis of social networks, we found that highly competent members selectively participate in certain tasks and take on more responsibilities. Over time, small sub-communities form around these highly competent members within the holacracy. These findings contribute theoretically to the study of organizational science and provide practical insights for managers to understand the organization dynamics.

[AI-132] Strategic AI adoption in SMEs: A Prescriptive Framework

链接: https://arxiv.org/abs/2408.11825
作者: Atif Hussain,Rana Rizwan
关键词-EN: Artificial Intelligence, including small, medium enterprises, increasingly acknowledged, vital component
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Artificial Intelligence (AI) is increasingly acknowledged as a vital component for the advancement and competitiveness of modern organizations, including small and medium enterprises (SMEs). However, the adoption of AI technologies in SMEs faces significant barriers, primarily related to cost, lack of technical skills, and employee acceptance. This study proposes a comprehensive, phased framework designed to facilitate the effective adoption of AI in SMEs by systematically addressing these barriers. The framework begins with raising awareness and securing commitment from leadership, followed by the adoption of low-cost, general-purpose AI tools to build technical competence and foster a positive attitude towards AI. As familiarity with AI technologies increases, the framework advocates for the integration of task-specific AI tools to enhance efficiency and productivity. Subsequently, it guides organizations towards the in-house development of generative AI tools, providing greater customization and control. Finally, the framework addresses the development of discriminative AI models to meet highly specific and precision-oriented tasks. By providing a structured and incremental approach, this framework ensures that SMEs can navigate the complexities of AI integration effectively, driving innovation, efficiency, and competitive advantage. This study contributes to the field by offering a practical, prescriptive framework tailored to the unique needs of SMEs, facilitating the successful adoption of AI technologies and positioning these organizations for sustained growth in a competitive landscape.

[AI-133] AppAgent v2: Advanced Agent for Flexible Mobile Interactions

链接: https://arxiv.org/abs/2408.11824
作者: Yanda Li,Chi Zhang,Wanqi Yang,Bin Fu,Pei Cheng,Xin Chen,Ling Chen,Yunchao Wei
关键词-EN: Large Language Models, Multimodal Large Language, Language Models, Large Language, increasingly impacting software
类目: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:With the advancement of Multimodal Large Language Models (MLLM), LLM-driven visual agents are increasingly impacting software interfaces, particularly those with graphical user interfaces. This work introduces a novel LLM-based multimodal agent framework for mobile devices. This framework, capable of navigating mobile devices, emulates human-like interactions. Our agent constructs a flexible action space that enhances adaptability across various applications including parser, text and vision descriptions. The agent operates through two main phases: exploration and deployment. During the exploration phase, functionalities of user interface elements are documented either through agent-driven or manual explorations into a customized structured knowledge base. In the deployment phase, RAG technology enables efficient retrieval and update from this knowledge base, thereby empowering the agent to perform tasks effectively and accurately. This includes performing complex, multi-step operations across various applications, thereby demonstrating the framework’s adaptability and precision in handling customized task workflows. Our experimental results across various benchmarks demonstrate the framework’s superior performance, confirming its effectiveness in real-world scenarios. Our code will be open source soon.
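上文"探索阶段把界面元素功能写入结构化知识库、部署阶段以 RAG 方式检索"的两阶段流程,可用下面这个刻意简化的玩具草图示意。知识库结构与打分方式均为假设,真实系统使用向量检索与多模态描述:

```python
# Toy sketch of the two-phase flow described above: "exploration" documents
# UI element functionality into a knowledge base; "deployment" retrieves the
# best-matching entry (RAG-style) to guide an action. Deliberately naive.

knowledge_base = {}

def explore(element_id, description):
    """Exploration phase: document what a UI element does."""
    knowledge_base[element_id] = description

def retrieve(task, k=1):
    """Deployment phase: return the k best-matching documented elements,
    scored by naive word overlap (a real agent would use embeddings)."""
    words = set(task.lower().split())
    scored = sorted(
        knowledge_base.items(),
        key=lambda kv: len(words & set(kv[1].lower().split())),
        reverse=True,
    )
    return [eid for eid, _ in scored[:k]]

explore("btn_send", "send the composed message to the contact")
explore("btn_attach", "attach a photo or file to the message")
print(retrieve("send message to Alice"))  # ['btn_send']
```

部署阶段对知识库的检索与更新是解耦的:新应用只需补充探索记录,检索逻辑不变,这对应原文所说的适配性。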

[AI-134] Mamba-Spike: Enhancing the Mamba Architecture with a Spiking Front-End for Efficient Temporal Data Processing

链接: https://arxiv.org/abs/2408.11823
作者: Jiahao Qin,Feng Liu
关键词-EN: artificial intelligence systems, gained significant attention, biological neural networks, Mamba backbone, recent years
类目: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
*备注: 12 pages, 5 figures, accepted by CGI 2024

点击查看摘要

Abstract:The field of neuromorphic computing has gained significant attention in recent years, aiming to bridge the gap between the efficiency of biological neural networks and the performance of artificial intelligence systems. This paper introduces Mamba-Spike, a novel neuromorphic architecture that integrates a spiking front-end with the Mamba backbone to achieve efficient and robust temporal data processing. The proposed approach leverages the event-driven nature of spiking neural networks (SNNs) to capture and process asynchronous, time-varying inputs, while harnessing the power of the Mamba backbone’s selective state spaces and linear-time sequence modeling capabilities to model complex temporal dependencies effectively. The spiking front-end of Mamba-Spike employs biologically inspired neuron models, along with adaptive threshold and synaptic dynamics. These components enable efficient spatiotemporal feature extraction and encoding of the input data. The Mamba backbone, on the other hand, utilizes a hierarchical structure with gated recurrent units and attention mechanisms to capture long-term dependencies and selectively process relevant information. To evaluate the efficacy of the proposed architecture, a comprehensive empirical study is conducted on both neuromorphic datasets, including DVS Gesture and TIDIGITS, and standard datasets, such as Sequential MNIST and CIFAR10-DVS. The results demonstrate that Mamba-Spike consistently outperforms state-of-the-art baselines, achieving higher accuracy, lower latency, and improved energy efficiency. Moreover, the model exhibits robustness to various input perturbations and noise levels, highlighting its potential for real-world applications. The code will be available at this https URL.
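上文提到的"带自适应阈值的脉冲前端",可用一个教科书式的漏积分发放(LIF)神经元草图示意:每次发放后阈值抬高、随后缓慢回落,从而抑制连续发放。常数与动力学均为通用示例,并非 Mamba-Spike 的实现:

```python
# Textbook leaky integrate-and-fire (LIF) neuron with a simple adaptive
# threshold, a stand-in for the spiking front-end components named above.

def lif_adaptive(inputs, tau=0.9, theta0=1.0, theta_inc=0.2, theta_decay=0.95):
    """Return the spike train produced by a current input sequence."""
    v, theta = 0.0, theta0
    spikes = []
    for i in inputs:
        v = tau * v + i                     # leaky membrane integration
        if v >= theta:
            spikes.append(1)
            v = 0.0                         # reset membrane after a spike
            theta += theta_inc              # adaptation: harder to re-fire
        else:
            spikes.append(0)
        theta = theta0 + (theta - theta0) * theta_decay  # threshold relaxes
    return spikes

print(lif_adaptive([0.6] * 10))  # [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
```

恒定输入下发放间隔被自适应阈值拉长,这种事件驱动的稀疏输出正是脉冲前端节能的来源。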

[AI-135] State-of-the-art in Robot Learning for Multi-Robot Collaboration: A Comprehensive Survey

链接: https://arxiv.org/abs/2408.11822
作者: Bin Wu,C Steve Suh
关键词-EN: daily human life, continuous breakthroughs, breakthroughs in core, dawn of large-scale, robotic systems
类目: Robotics (cs.RO); Artificial Intelligence (cs.AI)
*备注: Multi-robot, Cooperation, robot learning

点击查看摘要

Abstract:With the continuous breakthroughs in core technology, the dawn of large-scale integration of robotic systems into daily human life is on the horizon. Multi-robot systems (MRS) built on this foundation are undergoing drastic evolution. The fusion of artificial intelligence technology with robot hardware is seeing broad application possibilities for MRS. This article surveys the recent state-of-the-art of robot learning in the context of Multi-Robot Cooperation (MRC). Commonly adopted robot learning methods (or frameworks) that are inspired by humans and animals are reviewed, and their advantages and disadvantages are discussed along with the associated technical challenges. The potential trends of robot learning and MRS integration exploiting the merging of these methods with real-world applications are also discussed at length. Specifically, statistical methods are used to quantitatively corroborate the ideas elaborated in the article.

[AI-136] Responsible AI Question Bank: A Comprehensive Tool for AI Risk Assessment

链接: https://arxiv.org/abs/2408.11820
作者: Sung Une Lee,Harsha Perera,Yue Liu,Boming Xia,Qinghua Lu,Liming Zhu
关键词-EN: RAI Question Bank, Artificial Intelligence, Question Bank, RAI Question, growth of Artificial
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
*备注: 30 pages, 6 tables, 14 figures

点击查看摘要

Abstract:The rapid growth of Artificial Intelligence (AI) has underscored the urgent need for responsible AI practices. Despite increasing interest, a comprehensive AI risk assessment toolkit remains lacking. This study introduces our Responsible AI (RAI) Question Bank, a comprehensive framework and tool designed to support diverse AI initiatives. By integrating AI ethics principles such as fairness, transparency, and accountability into a structured question format, the RAI Question Bank aids in identifying potential risks, aligning with emerging regulations like the EU AI Act, and enhancing overall AI governance. A key benefit of the RAI Question Bank is its systematic approach to linking lower-level risk questions to higher-level ones and related themes, preventing siloed assessments and ensuring a cohesive evaluation process. Case studies illustrate the practical application of the RAI Question Bank in assessing AI projects, from evaluating risk factors to informing decision-making processes. The study also demonstrates how the RAI Question Bank can be used to ensure compliance with standards, mitigate risks, and promote the development of trustworthy AI systems. This work advances RAI by providing organizations with a valuable tool to navigate the complexities of ethical AI development and deployment while ensuring comprehensive risk management.

[AI-137] Is ChatGPT a Good Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations

链接: https://arxiv.org/abs/2408.05128
作者: Jasmine Latendresse,SayedHassan Khatoonabadi,Ahmad Abdellatif,Emad Shihab
关键词-EN: Large Language Models, play a critical, critical role, Large Language, Language Models
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Submitted

点击查看摘要

Abstract:Software libraries play a critical role in the functionality, efficiency, and maintainability of software systems. As developers increasingly rely on Large Language Models (LLMs) to streamline their coding processes, the effectiveness of these models in recommending appropriate libraries becomes crucial yet remains largely unexplored. In this paper, we assess the effectiveness of ChatGPT as a software librarian and identify areas for improvement. We conducted an empirical study using GPT-3.5 Turbo to generate Python code for 10,000 Stack Overflow questions. Our findings show that ChatGPT uses third-party libraries nearly 10% more often than human developers, favoring widely adopted and well-established options. However, 14.2% of the recommended libraries had restrictive copyleft licenses, which were not explicitly communicated by ChatGPT. Additionally, 6.5% of the libraries did not work out of the box, leading to potential developer confusion and wasted time. While ChatGPT can be an effective software librarian, it should be improved by providing more explicit information on maintainability metrics and licensing. We recommend that developers implement rigorous dependency management practices and double-check library licenses before integrating LLM-generated code into their projects.

[AI-138] On the Variability of AI-based Software Systems Due to Environment Configurations

链接: https://arxiv.org/abs/2408.02825
作者: Musfiqur Rahman,SayedHassan Khatoonabadi,Ahmad Abdellatif,Haya Samaana,Emad Shihab
关键词-EN: include Artificial Intelligence, Artificial Intelligence, systems include Artificial, include Artificial, software systems include
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Submitted to the Information and Software Technology journal for review

点击查看摘要

Abstract:[Context] Nowadays, many software systems include Artificial Intelligence (AI) components and changes in the development environment have been known to induce variability in an AI-based system. [Objective] However, how an environment configuration impacts the variability of these systems is yet to be explored. Understanding and quantifying the degree of variability due to such configurations can help practitioners decide the best environment configuration for the most stable AI products. [Method] To achieve this goal, we performed experiments with eight different combinations of three key environment variables (operating system, Python version, and CPU architecture) on 30 open-source AI-based systems using the Travis CI platform. We evaluate variability using three metrics: the output of an AI component like an ML model (performance), the time required to build and run a system (processing time), and the cost associated with building and running a system (expense). [Results] Our results indicate that variability exists in all three metrics; however, it is observed more frequently with respect to processing time and expense than performance. For example, between Linux and MacOS, variabilities are observed in 23%, 96.67%, and 100% of the studied projects in performance, processing time, and expense, respectively. [Conclusion] Our findings underscore the importance of identifying the optimal combination of configuration settings to mitigate performance drops and reduce retraining time and cost before deploying an AI-based system.

[AI-139] Predicting the First Response Latency of Maintainers and Contributors in Pull Requests

链接: https://arxiv.org/abs/2311.07786
作者: SayedHassan Khatoonabadi,Ahmad Abdellatif,Diego Elias Costa,Emad Shihab
关键词-EN: Pull Request, maintainers, response, response latency, faster first responses
类目: oftware Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Manuscript accepted for publication in IEEE Transactions on Software Engineering (TSE)

点击查看摘要

Abstract:The success of a Pull Request (PR) depends on the responsiveness of the maintainers and the contributor during the review process. Being aware of the expected waiting times can lead to better interactions and managed expectations for both the maintainers and the contributor. In this paper, we propose a machine-learning approach to predict the first response latency of the maintainers following the submission of a PR, and the first response latency of the contributor after receiving the first response from the maintainers. We curate a dataset of 20 large and popular open-source projects on GitHub and extract 21 features to characterize projects, contributors, PRs, and review processes. Using these features, we then evaluate seven types of classifiers to identify the best-performing models. We also conduct permutation feature importance and SHAP analyses to understand the importance and the impact of different features on the predicted response latencies. We find that our CatBoost models are the most effective for predicting the first response latencies of both maintainers and contributors. We also observe that PRs submitted earlier in the week, containing an average number of commits, and with concise descriptions are more likely to receive faster first responses from the maintainers. Similarly, PRs with a lower first response latency from maintainers, that received the first response of maintainers earlier in the week, and containing an average number of commits tend to receive faster first responses from the contributors. Additionally, contributors with a higher acceptance rate and a history of timely responses in the project are likely to both obtain and provide faster first responses. Moreover, we show the effectiveness of our approach in a cross-project setting.

[AI-140] Understanding the Helpfulness of Stale Bot for Pull-based Development: An Empirical Study of 20 Large Open-Source Projects

链接: https://arxiv.org/abs/2305.18150
作者: SayedHassan Khatoonabadi,Diego Elias Costa,Suhaib Mujahid,Emad Shihab
关键词-EN: Pull Requests, Stale bot, Stale, PRs, making it difficult
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Manuscript submitted to ACM Transactions on Software Engineering and Methodology

点击查看摘要

Abstract:Pull Requests (PRs) that are neither progressed nor resolved clutter the list of PRs, making it difficult for the maintainers to manage and prioritize unresolved PRs. To automatically track, follow up, and close such inactive PRs, Stale bot was introduced by GitHub. Despite its increasing adoption, there are ongoing debates on whether using Stale bot alleviates or exacerbates the problem of inactive PRs. To better understand if and how Stale bot helps projects in their pull-based development workflow, we perform an empirical study of 20 large and popular open-source projects. We find that Stale bot can help deal with a backlog of unresolved PRs as the projects closed more PRs within the first few months of adoption. Moreover, Stale bot can help improve the efficiency of the PR review process as the projects reviewed PRs that ended up merged and resolved PRs that ended up closed faster after the adoption. However, Stale bot can also negatively affect the contributors as the projects experienced a considerable decrease in their number of active contributors after the adoption. Therefore, relying solely on Stale bot to deal with inactive PRs may lead to decreased community engagement and an increased probability of contributor abandonment.

[AI-141] On Wasted Contributions: Understanding the Dynamics of Contributor-Abandoned Pull Requests

链接: https://arxiv.org/abs/2110.15447
作者: SayedHassan Khatoonabadi,Diego Elias Costa,Rabe Abdalkareem,Emad Shihab
关键词-EN: enabled numerous volunteers, Pull-based development, fewer barriers, development has enabled, enabled numerous
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Manuscript accepted for publication in ACM Transactions on Software Engineering and Methodology (TOSEM)

点击查看摘要

Abstract:Pull-based development has enabled numerous volunteers to contribute to open-source projects with fewer barriers. Nevertheless, a considerable amount of pull requests (PRs) with valid contributions are abandoned by their contributors, wasting the effort and time put in by both the contributors and maintainers. To better understand the underlying dynamics of contributor-abandoned PRs, we conduct a mixed-methods study using both quantitative and qualitative methods. We curate a dataset consisting of 265,325 PRs including 4,450 abandoned ones from ten popular and mature GitHub projects and measure 16 features characterizing PRs, contributors, review processes, and projects. Using statistical and machine learning techniques, we find that complex PRs, novice contributors, and lengthy reviews have a higher probability of abandonment and the rate of PR abandonment fluctuates alongside the projects’ maturity or workload. To identify why contributors abandon their PRs, we also manually examine a random sample of 354 abandoned PRs. We observe that the most frequent abandonment reasons are related to the obstacles faced by contributors, followed by the hurdles imposed by maintainers during the review process. Finally, we survey the top core maintainers of the studied projects to understand their perspectives on dealing with PR abandonment and on our findings.

[AI-142] GAP2WSS: A Genetic Algorithm based on the Pareto Principle for Web Service Selection

链接: https://arxiv.org/abs/2109.10430
作者: SayedHassan Khatoonabadi,Shahriar Lotfi,Ayaz Isazadeh
关键词-EN: candidate Web services, Web services, candidate Web, Web service selection, Web
类目: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
*备注:

点击查看摘要

Abstract:Despite all the progress in Web service selection, the need for approaches with better optimality and performance remains. This paper presents a genetic algorithm, called GAP2WSS, that adopts the Pareto principle for selecting a Web service for each task of a composite Web service from a pool of candidate Web services. In contrast to the existing approaches, all global QoS constraints, interservice constraints, and transactional constraints are considered simultaneously. At first, all candidate Web services are scored and ranked per task using the proposed mechanism. Then, the top 20 percent of the candidate Web services of each task are retained as the candidates of the corresponding task to reduce the problem search space. Finally, the Web service selection problem is solved by focusing only on these 20 percent of candidates of each task using a genetic algorithm. Empirical studies demonstrate that this approach achieves higher efficiency and efficacy compared with considering all the candidate Web services when solving the problem.
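摘要描述的流程(按任务给候选服务打分、保留前 20% 以缩小搜索空间、再用遗传算法求解)可以用如下纯 Python 草图示意(打分方式、适应度函数与 GA 超参数均为假设,并非 GAP2WSS 的原始实现):

```python
import random

def prune_top20(candidates):
    """Keep the top 20% of candidates per task by score (Pareto-style pruning)."""
    pruned = []
    for cands in candidates:                  # cands: list of (service_id, score)
        ranked = sorted(cands, key=lambda c: c[1], reverse=True)
        k = max(1, int(len(ranked) * 0.2))
        pruned.append(ranked[:k])
    return pruned

def ga_select(pruned, fitness, pop=20, gens=30, seed=0):
    """Tiny GA: a chromosome picks one pruned candidate index per task."""
    rnd = random.Random(seed)
    sizes = [len(c) for c in pruned]
    population = [[rnd.randrange(s) for s in sizes] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop // 2]      # elitist selection
        children = []
        while len(children) < pop - len(parents):
            a, b = rnd.sample(parents, 2)
            cut = rnd.randrange(1, len(sizes))   # one-point crossover (>= 2 tasks)
            child = a[:cut] + b[cut:]
            if rnd.random() < 0.2:               # mutation
                i = rnd.randrange(len(sizes))
                child[i] = rnd.randrange(sizes[i])
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Toy instance: 3 tasks, 10 candidate services each, score = QoS utility
rnd = random.Random(42)
candidates = [[(f"s{t}_{i}", rnd.random()) for i in range(10)] for t in range(3)]
pruned = prune_top20(candidates)
fitness = lambda chrom: sum(pruned[t][g][1] for t, g in enumerate(chrom))
best = ga_select(pruned, fitness)
```

剪枝后每个任务只剩 2 个候选,GA 的搜索空间从 10^3 缩小到 2^3。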

[AI-143] Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge MICCAI2024

链接: https://arxiv.org/abs/2408.12534
作者: Jun Ma,Yao Zhang,Song Gu,Cheng Ge,Ershuai Wang,Qin Zhou,Ziyan Huang,Pengju Lyu,Jian He,Bo Wang
关键词-EN: abdomen Computed Tomography, Computed Tomography, precise cancer diagnosis, abdomen Computed, diagnosis and treatment
类目: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
*备注: MICCAI 2024 FLARE Challenge Summary

点击查看摘要

Abstract:Organ and cancer segmentation in abdomen Computed Tomography (CT) scans is the prerequisite for precise cancer diagnosis and treatment. Most existing benchmarks and algorithms are tailored to specific cancer types, limiting their ability to provide comprehensive cancer analysis. This work presents the first international competition on abdominal organ and pan-cancer segmentation by providing a large-scale and diverse dataset, including 4650 CT scans with various cancer types from over 40 medical centers. The winning team established a new state-of-the-art with a deep learning-based cascaded framework, achieving average Dice Similarity Coefficient scores of 92.3% for organs and 64.9% for lesions on the hidden multi-national testing set. The dataset and code of top teams are publicly available, offering a benchmark platform to drive further innovations this https URL.
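摘要以 Dice 相似系数(DSC)作为分割精度指标,其定义为 2|A∩B|/(|A|+|B|)。下面是一个 numpy 计算示意(与竞赛官方评测代码无关):

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Dice Similarity Coefficient: 2|A∩B| / (|A|+|B|), in [0, 1]."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))

# Toy 2D masks: prediction covers 3 of the 4 ground-truth pixels
gt = np.zeros((4, 4), dtype=np.uint8)
gt[1:3, 1:3] = 1                      # 4 ground-truth pixels
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:2] = 1                    # 2 pixels
pred[1, 2] = 1                        # +1 pixel, all inside gt
print(round(dice(pred, gt), 3))       # 2*3 / (3+4) ≈ 0.857
```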

[AI-144] Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures

链接: https://arxiv.org/abs/2408.12413
作者: Ce Liu,Jun Wang,Zhiqiang Cai,Yingxu Wang,Huizhen Kuang,Kaihui Cheng,Liwei Zhang,Qingkun Su,Yining Tang,Fenglei Cao,Limei Han,Siyu Zhu,Yuan Qi
关键词-EN: protein structure collection, static protein structure, Protein Data Bank, physical properties, vital characteristics
类目: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Despite significant progress in static protein structure collection and prediction, the dynamic behavior of proteins, one of their most vital characteristics, has been largely overlooked in prior research. This oversight can be attributed to the limited availability, diversity, and heterogeneity of dynamic protein datasets. To address this gap, we propose to enhance existing prestigious static 3D protein structural databases, such as the Protein Data Bank (PDB), by integrating dynamic data and additional physical properties. Specifically, we introduce a large-scale dataset, Dynamic PDB, encompassing approximately 12.6K proteins, each subjected to all-atom molecular dynamics (MD) simulations lasting 1 microsecond to capture conformational changes. Furthermore, we provide a comprehensive suite of physical properties, including atomic velocities and forces, potential and kinetic energies of proteins, and the temperature of the simulation environment, recorded at 1 picosecond intervals throughout the simulations. For benchmarking purposes, we evaluate state-of-the-art methods on the proposed dataset for the task of trajectory prediction. To demonstrate the value of integrating richer physical properties in the study of protein dynamics and related model design, we base our approach on the SE(3) diffusion model and incorporate these physical properties into the trajectory prediction process. Preliminary results indicate that this straightforward extension of the SE(3) model yields improved accuracy, as measured by MAE and RMSD, when the proposed physical properties are taken into consideration.
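摘要用 MAE 与 RMSD 评估轨迹预测精度。两个指标的 numpy 示意如下(此处 RMSD 不做结构叠合对齐,仅为定义层面的演示):

```python
import numpy as np

def mae(pred, ref):
    """Mean absolute error over all atom coordinates."""
    return float(np.mean(np.abs(pred - ref)))

def rmsd(pred, ref):
    """Root-mean-square deviation between two conformations of shape
    (n_atoms, 3). No superposition/alignment is performed here."""
    return float(np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=-1))))

ref = np.zeros((5, 3))
pred = ref + np.array([1.0, 0.0, 0.0])   # every atom shifted by 1 Å along x
print(mae(pred, ref), rmsd(pred, ref))   # 0.333..., 1.0
```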

[AI-145] DeepHQ: Learned Hierarchical Quantizer for Progressive Deep Image Coding

链接: https://arxiv.org/abs/2408.12150
作者: Jooyoung Lee,Se Yoon Jeong,Munchurl Kim
关键词-EN: Unlike fixed, variable-rate image coding, providing high compression, variable-rate image, increasing the versatility
类目: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Unlike fixed- or variable-rate image coding, progressive image coding (PIC) aims to compress various qualities of images into a single bitstream, increasing the versatility of bitstream utilization and providing high compression efficiency compared to simulcast compression. Research on neural network (NN)-based PIC is in its early stages, mainly focusing on applying varying quantization step sizes to the transformed latent representations in a hierarchical manner. These approaches are designed to compress only the progressively added information as the quality improves, considering that a wider quantization interval for lower-quality compression includes multiple narrower sub-intervals for higher-quality compression. However, the existing methods are based on handcrafted quantization hierarchies, resulting in sub-optimal compression efficiency. In this paper, we propose an NN-based progressive coding method that first utilizes quantization step sizes learned for each quantization layer. We also incorporate selective compression, with which only the essential representation components are compressed for each quantization layer. We demonstrate that our method achieves significantly higher coding efficiency than the existing approaches with decreased decoding time and reduced model size.
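摘要的核心思想是:低质量压缩使用的较宽量化区间包含多个用于高质量压缩的较窄子区间,每层只编码渐进增加的信息。下面用纯 Python 给出该嵌套量化思想的一个极简示意(步长逐层减半为假设;论文中各层步长是学习得到的):

```python
def progressive_quantize(x, base_step=1.0, layers=3):
    """Hierarchically quantize x: each layer halves the step size and
    encodes only the residual left by the coarser layer, so decoding a
    prefix of the codes yields a progressively better reconstruction."""
    recon, residual, codes = 0.0, x, []
    step = base_step
    for _ in range(layers):
        q = round(residual / step)   # index within the current interval
        codes.append(q)
        recon += q * step
        residual = x - recon
        step /= 2                    # narrower sub-interval for the next layer
    return codes, recon

codes, recon = progressive_quantize(0.8)
# codes = [1, 0, -1]: coarse code 1, then refinements; recon = 0.75
```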

[AI-146] Exploring the Feasibility of Automated Data Standardization using Large Language Models for Seamless Positioning

链接: https://arxiv.org/abs/2408.12080
作者: Max J. L. Lee,Ju Lin,Li-Ta Hsu
关键词-EN: Large Language Models, leveraging Large Language, standardization leveraging Large, Language Models, Large Language
类目: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
*备注: Accepted at IPIN 2024. To be published in IEEE Xplore

点击查看摘要

Abstract:We propose a feasibility study for real-time automated data standardization leveraging Large Language Models (LLMs) to enhance seamless positioning systems in IoT environments. By integrating and standardizing heterogeneous sensor data from smartphones, IoT devices, and dedicated systems such as Ultra-Wideband (UWB), our study ensures data compatibility and improves positioning accuracy using the Extended Kalman Filter (EKF). The core components include the Intelligent Data Standardization Module (IDSM), which employs a fine-tuned LLM to convert varied sensor data into a standardized format, and the Transformation Rule Generation Module (TRGM), which automates the creation of transformation rules and scripts for ongoing data standardization. Evaluated in real-time environments, our study demonstrates adaptability and scalability, enhancing operational efficiency and accuracy in seamless navigation. This study underscores the potential of advanced LLMs in overcoming sensor data integration complexities, paving the way for more scalable and precise IoT navigation solutions.
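摘要提到用扩展卡尔曼滤波(EKF)融合标准化后的异构传感器数据。下面给出一个一维线性卡尔曼滤波的最小示意(并非论文的 EKF 实现;状态模型与各传感器噪声方差 R 的取值均为假设),说明不同精度的量测如何在同一数据流中融合:

```python
def kalman_1d(measurements, q=1e-4, x0=0.0, p0=1.0):
    """Minimal 1-D Kalman filter with a random-walk state model.
    measurements: list of (z, R) pairs, so readings from sensors with
    different noise variances R can be fused in one stream."""
    x, p = x0, p0
    estimates = []
    for z, R in measurements:
        p = p + q                 # predict: state drifts as a random walk
        k = p / (p + R)           # Kalman gain: trust precise sensors more
        x = x + k * (z - x)       # update with measurement z
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Fuse a noisy smartphone reading (R=4.0) with a precise UWB fix (R=0.01)
est = kalman_1d([(9.0, 4.0), (10.0, 0.01)])
# the final estimate should land close to the precise UWB measurement
```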

[AI-147] Distributed Noncoherent Joint Transmission Based on Multi-Agent Reinforcement Learning for Dense Small Cell MISO Systems

链接: https://arxiv.org/abs/2408.12067
作者: Shaozhuang Bai,Zhenzhen Gao,Xuewen Liao
关键词-EN: small cell base, dense small cell, multi-antenna small cell, cell base stations, shared frequency band
类目: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
*备注:

点击查看摘要

Abstract:We consider a dense small cell (DSC) network where multi-antenna small cell base stations (SBSs) transmit data to single-antenna users over a shared frequency band. To enhance capacity, a state-of-the-art technique known as noncoherent joint transmission (JT) is applied, enabling users to receive data from multiple coordinated SBSs. However, the sum rate maximization problem with noncoherent JT is inherently nonconvex and NP-hard. While existing optimization-based noncoherent JT algorithms can provide near-optimal performance, they require global channel state information (CSI) and multiple iterations, which makes them difficult to implement in DSC networks. To overcome these challenges, we first prove that the optimal beamforming structure is the same for both the power minimization problem and the sum rate maximization problem, and then mathematically derive the optimal beamforming structure for both problems by solving the power minimization problem. The optimal beamforming structure effectively reduces the number of optimization variables. By exploiting the optimal beamforming structure, we propose a deep deterministic policy gradient-based distributed noncoherent JT scheme to maximize the system sum rate. In the proposed scheme, each SBS utilizes global information for training and uses local CSI to determine beamforming vectors. Simulation results demonstrate that the proposed scheme achieves comparable performance with considerably lower computational complexity and information overhead compared to centralized iterative optimization-based techniques, making it more attractive for practical deployment.

[AI-148] A Deconfounding Approach to Climate Model Bias Correction

链接: https://arxiv.org/abs/2408.12063
作者: Wentao Gao,Jiuyong Li,Debo Cheng,Lin Liu,Jixue Liu,Thuc Duy Le,Xiaojing Du,Xiongren Chen,Yanchang Zhao,Yun Chen
关键词-EN: Global Climate Models, Earth systems, simulating the Earth, predicting future climate, Global Climate
类目: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
*备注:

点击查看摘要

Abstract:Global Climate Models (GCMs) are crucial for predicting future climate changes by simulating the Earth systems. However, GCM outputs exhibit systematic biases due to model uncertainties, parameterization simplifications, and inadequate representation of complex climate phenomena. Traditional bias correction methods, which rely on historical observation data and statistical techniques, often neglect unobserved confounders, leading to biased results. This paper proposes a novel bias correction approach that utilizes both GCM and observational data to learn a factor model capturing multi-cause latent confounders. Inspired by recent advances in causality-based time series deconfounding, our method first constructs a factor model to learn latent confounders from historical data and then applies them to enhance the bias correction process using advanced time series forecasting models. The experimental results demonstrate significant improvements in the accuracy of precipitation outputs. By addressing unobserved confounders, our approach offers a robust and theoretically grounded solution for climate model bias correction.

[AI-149] From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis

链接: https://arxiv.org/abs/2408.11876
作者: Guy Lutsker,Gal Sapir,Anastasia Godneva,Smadar Shilo,Jerry R Greenfield,Dorit Samocha-Bonet,Shie Mannor,Eli Meirom,Gal Chechik,Hagai Rossman,Eran Segal
关键词-EN: self-supervised learning enabled, offer great potential, Recent advances, CGM, self-supervised learning
类目: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Recent advances in self-supervised learning have enabled novel medical AI models, known as foundation models (FMs), that offer great potential for characterizing health from diverse biomedical data. Continuous glucose monitoring (CGM) provides rich, temporal data on glycemic patterns, but its full potential for predicting broader health outcomes remains underutilized. Here, we present GluFormer, a generative foundation model for biomedical temporal data based on a transformer architecture, trained on over 10 million CGM measurements from 10,812 non-diabetic individuals. We tokenized the CGM training data and trained GluFormer using next token prediction in a generative, autoregressive manner. We demonstrate that GluFormer generalizes effectively to 15 different external datasets, including 4936 individuals across 5 different geographical regions, 6 different CGM devices, and several metabolic disorders, including normoglycemic, prediabetic, and diabetic populations, as well as those with gestational diabetes and obesity. GluFormer produces embeddings which outperform traditional CGM analysis tools, and achieves high Pearson correlations in predicting clinical parameters such as HbA1c, liver-related parameters, blood lipids, and sleep-related indices. Notably, GluFormer can also predict onset of future health outcomes even 4 years in advance. We also show that CGM embeddings from pre-intervention periods in Randomized Clinical Trials (RCTs) outperform other methods in predicting primary and secondary outcomes. When integrating dietary data into GluFormer, we show that the enhanced model can accurately generate CGM data based only on dietary intake data, simulate outcomes of dietary interventions, and predict individual responses to specific foods. Overall, we show that GluFormer accurately predicts health outcomes which generalize across different populations and metabolic conditions.
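GluFormer 将 CGM 读数离散化为 token,并以自回归的 next-token prediction 方式训练。下面示意"均匀分箱 tokenize + 构造训练样本对"这一流程(血糖范围与分箱数均为假设,并非论文设置):

```python
def tokenize_cgm(readings, lo=40, hi=400, n_bins=100):
    """Map glucose readings (mg/dL) to discrete token ids by uniform binning.
    The range [lo, hi) and bin count are illustrative assumptions."""
    width = (hi - lo) / n_bins
    tokens = []
    for g in readings:
        g = min(max(g, lo), hi - 1e-9)        # clip into the valid range
        tokens.append(int((g - lo) // width))
    return tokens

def next_token_pairs(tokens):
    """Autoregressive training pairs: predict token t+1 from tokens <= t."""
    return [(tokens[: i + 1], tokens[i + 1]) for i in range(len(tokens) - 1)]

toks = tokenize_cgm([80, 120, 180, 250])
pairs = next_token_pairs(toks)
# each pair is (prefix of the glucose token sequence, next token to predict)
```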

[AI-150] Online Electric Vehicle Charging Detection Based on Memory-based Transformer using Smart Meter Data

链接: https://arxiv.org/abs/2408.11828
作者: Ammar Mansoor Kamoona,Hui Song,Mahdi Jalili,Hao Wang,Reza Razzaghi,Xinghuo Yu
关键词-EN: Electric Vehicles, poses unique challenges, Distribution Network Operators, popularity of Electric, electricity Distribution Network
类目: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:The growing popularity of Electric Vehicles (EVs) poses unique challenges for grid operators and infrastructure, requiring effective management of these vehicles’ integration into the grid. Accurately identifying EV charging is essential for electricity Distribution Network Operators (DNOs) to better plan and manage the distribution grid. EV charging identification using smart meter readings obtained from behind-the-meter devices is a challenging task that enables effectively managing the integration of EVs into the existing power grid. Different from the existing supervised models that require addressing the imbalance problem caused by EVs and non-EVs data, we propose a novel unsupervised memory-based transformer (M-TR) that can run in real-time (online) to detect EV charging from a streaming smart meter. It dynamically leverages coarse-scale historical information using an M-TR encoder from an extended global temporal window, in conjunction with an M-TR decoder that concentrates on a limited time frame, a local window, aiming to capture the fine-scale characteristics of the smart meter data. The M-TR is based on an anomaly detection technique that does not require any prior knowledge about EV charging profiles and only requires the real power consumption data of non-EV users. In addition, the proposed model leverages the power of transfer learning. The M-TR is compared with different state-of-the-art methods and performs better than other unsupervised learning models. The model can run with an excellent execution time of 1.2 sec. for 1-minute smart meter recordings.

计算机视觉

[CV-0] DreamCinema: Cinematic Transfer with Free Camera and 3D Character

链接: https://arxiv.org/abs/2408.12601
作者: Weiliang Chen,Fangfu Liu,Diankun Wu,Haowen Sun,Haixu Song,Yueqi Duan
关键词-EN: digital media, flourishing era, era of digital, personal filmmaker, transfer empowers filmmakers
类目: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM)
*备注: Project page: this https URL

点击查看摘要

Abstract:We are living in a flourishing era of digital media, where everyone has the potential to become a personal filmmaker. Current research on cinematic transfer empowers filmmakers to reproduce and manipulate the visual elements (e.g., cinematography and character behaviors) from classic shots. However, characters in the reimagined films still rely on manual crafting, which involves significant technical complexity and high costs, making it unattainable for ordinary users. Furthermore, their estimated cinematography lacks smoothness due to inadequate capturing of inter-frame motion and modeling of physical trajectories. Fortunately, the remarkable success of 2D and 3D AIGC has opened up the possibility of efficiently generating characters tailored to users’ needs, diversifying cinematography. In this paper, we propose DreamCinema, a novel cinematic transfer framework that pioneers generative AI into the film production paradigm, aiming at facilitating user-friendly film creation. Specifically, we first extract cinematic elements (i.e., human and camera pose) and optimize the camera trajectory. Then, we apply a character generator to efficiently create 3D high-quality characters with a human structure prior. Finally, we develop a structure-guided motion transfer strategy to incorporate generated characters into film creation and transfer it via 3D graphics engines smoothly. Extensive experiments demonstrate the effectiveness of our method for creating high-quality films with free camera and 3D characters.

[CV-1] ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction

链接: https://arxiv.org/abs/2408.12598
作者: Ziyu Tang,Weicai Ye,Yifan Wang,Di Huang,Hujun Bao,Tong He,Guofeng Zhang
关键词-EN: Neural implicit reconstruction, Neural implicit, recovering dense, implicit reconstruction, reconstruction via volume
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Neural implicit reconstruction via volume rendering has demonstrated its effectiveness in recovering dense 3D surfaces. However, it is non-trivial to simultaneously recover meticulous geometry and preserve smoothness across regions with differing characteristics. To address this issue, previous methods typically employ geometric priors, which are often constrained by the performance of the prior models. In this paper, we propose ND-SDF, which learns a Normal Deflection field to represent the angular deviation between the scene normal and the prior normal. Unlike previous methods that uniformly apply geometric priors on all samples, introducing significant bias in accuracy, our proposed normal deflection field dynamically learns and adapts the utilization of samples based on their specific characteristics, thereby improving both the accuracy and effectiveness of the model. Our method not only obtains smooth weakly textured regions such as walls and floors but also preserves the geometric details of complex structures. In addition, we introduce a novel ray sampling strategy based on the deflection angle to facilitate the unbiased rendering process, which significantly improves the quality and accuracy of intricate surfaces, especially on thin structures. Consistent improvements on various challenging datasets demonstrate the superiority of our method.
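ND-SDF 学习的是场景法向与先验法向之间的角度偏差。对单位法向量,该角度可如下计算(仅为定义层面的示意,与论文实现无关):

```python
import math

def angular_deviation(n1, n2):
    """Angle (radians) between two 3D unit normals; the dot product is
    clipped to [-1, 1] to guard against floating-point drift."""
    dot = sum(a * b for a, b in zip(n1, n2))
    return math.acos(max(-1.0, min(1.0, dot)))

up = (0.0, 0.0, 1.0)
tilted = (0.0, math.sin(0.1), math.cos(0.1))   # prior normal tilted by 0.1 rad
print(angular_deviation(up, tilted))           # ≈ 0.1
```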

[CV-2] Automating Deformable Gasket Assembly

链接: https://arxiv.org/abs/2408.12593
作者: Simeon Adebola,Tara Sadjadpour,Karim El-Refai,Will Panitch,Zehan Ma,Roy Lin,Tianshuang Qiu,Shreya Ganti,Charlotte Le,Jaimyn Drake,Ken Goldberg
关键词-EN: Gasket Assembly, Gasket, deformable gasket, Assembly, narrow channel
类目: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
*备注: Content without Appendix accepted for IEEE CASE 2024

点击查看摘要

Abstract:In Gasket Assembly, a deformable gasket must be aligned and pressed into a narrow channel. This task is common for sealing surfaces in the manufacturing of automobiles, appliances, electronics, and other products. Gasket Assembly is a long-horizon, high-precision task and the gasket must align with the channel and be fully pressed in to achieve a secure fit. To compare approaches, we present 4 methods for Gasket Assembly: one policy from deep imitation learning and three procedural algorithms. We evaluate these methods with 100 physical trials. Results suggest that the Binary+ algorithm succeeds in 10/10 on the straight channel whereas the learned policy based on 250 human teleoperated demonstrations succeeds in 8/10 trials and is significantly slower. Code, CAD models, videos, and data can be found at this https URL

[CV-3] xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations ECCV24

链接: https://arxiv.org/abs/2408.12590
作者: Can Qin,Congying Xia,Krithika Ramakrishnan,Michael Ryoo,Lifu Tu,Yihao Feng,Manli Shu,Honglu Zhou,Anas Awadalla,Jun Wang,Senthil Purushwalkam,Le Xue,Yingbo Zhou,Huan Wang,Silvio Savarese,Juan Carlos Niebles,Zeyuan Chen,Ran Xu,Caiming Xiong
关键词-EN: producing realistic scenes, textual descriptions, capable of producing, producing realistic, realistic scenes
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注: Accepted by ECCV24 AI4VA

点击查看摘要

Abstract:We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions. Building on recent advancements, such as OpenAI’s Sora, we explore the latent diffusion model (LDM) architecture and introduce a video variational autoencoder (VidVAE). VidVAE compresses video data both spatially and temporally, significantly reducing the length of visual tokens and the computational demands associated with generating long-sequence videos. To further address the computational costs, we propose a divide-and-merge strategy that maintains temporal consistency across video segments. Our Diffusion Transformer (DiT) model incorporates spatial and temporal self-attention layers, enabling robust generalization across different timeframes and aspect ratios. We have devised a data processing pipeline from the very beginning and collected over 13M high-quality video-text pairs. The pipeline includes multiple steps such as clipping, text detection, motion estimation, aesthetics scoring, and dense captioning based on our in-house video-LLM model. Training the VidVAE and DiT models required approximately 40 and 642 H100 days, respectively. Our model supports over 14-second 720p video generation in an end-to-end way and demonstrates competitive performance against state-of-the-art T2V models.

[CV-4] Real-Time Video Generation with Pyramid Attention Broadcast

链接: https://arxiv.org/abs/2408.12588
作者: Xuanlei Zhao,Xiaolong Jin,Kai Wang,Yang You
关键词-EN: present Pyramid Attention, high quality, quality and training-free, training-free approach, approach for DiT-based
类目: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
*备注:

点击查看摘要

Abstract:We present Pyramid Attention Broadcast (PAB), a real-time, high quality and training-free approach for DiT-based video generation. Our method is founded on the observation that attention difference in the diffusion process exhibits a U-shaped pattern, indicating significant redundancy. We mitigate this by broadcasting attention outputs to subsequent steps in a pyramid style. It applies different broadcast strategies to each attention based on their variance for best efficiency. We further introduce broadcast sequence parallel for more efficient distributed inference. PAB demonstrates superior results across three models compared to baselines, achieving real-time generation for up to 720p videos. We anticipate that our simple yet effective method will serve as a robust baseline and facilitate future research and application for video generation.
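PAB 的核心是将注意力输出"广播"复用到后续扩散步:每隔若干步才重新计算一次注意力,其余步直接复用缓存。下面是该思想的极简示意(固定广播间隔为假设;论文按各注意力的差异方差选择不同的广播策略):

```python
def run_with_broadcast(attn_fn, inputs, interval=3):
    """Recompute attention only every `interval` steps and broadcast
    (reuse) the cached output for the steps in between."""
    outputs, cache, computed = [], None, 0
    for step, x in enumerate(inputs):
        if step % interval == 0:      # refresh the cache at broadcast anchors
            cache = attn_fn(x)
            computed += 1
        outputs.append(cache)         # intermediate steps reuse the cache
    return outputs, computed

attn_fn = lambda x: x * 2             # stand-in for a real attention layer
outputs, computed = run_with_broadcast(attn_fn, list(range(7)), interval=3)
# attention is computed at steps 0, 3, 6 only -> 3 calls instead of 7
```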

[CV-5] Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers ATC

链接: https://arxiv.org/abs/2408.12575
作者: Antonyo Musabini,Ivan Novikov,Sana Soula,Christel Leonet,Lihao Wang,Rachid Benmokhtar,Fabian Burger,Thomas Boulay,Xavier Perrotton
关键词-EN: algorithms primarily focus, error-prone homographic projection, Current parking area, Driver Assistance System, Advanced Driver Assistance
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注: 26th Irish Machine Vision and Image Processing Conference, Data-Driven Autonomy Workshop (matching camera-ready version)

点击查看摘要

Abstract:Current parking area perception algorithms primarily focus on detecting vacant slots within a limited range, relying on error-prone homographic projection for both labeling and inference. However, recent advancements in Advanced Driver Assistance System (ADAS) require interaction with end-users through comprehensive and intelligent Human-Machine Interfaces (HMIs). These interfaces should present a complete perception of the parking area, from distinguishing vacant slots’ entry lines to the orientation of other parked vehicles. This paper introduces Multi-Task Fisheye Cross View Transformers (MT F-CVT), which leverages features from a four-camera fisheye Surround-view Camera System (SVCS) with multi-head attention to create a detailed Bird-Eye View (BEV) grid feature map. Features are processed by both a segmentation decoder and a Polygon-Yolo based object detection decoder for parking slots and vehicles. Trained on data labeled using LiDAR, MT F-CVT positions objects within 25m x 25m real open-road scenes with an average error of only 20 cm. Our larger model achieves an F-1 score of 0.89. Moreover, the smaller model operates at 16 fps on an Nvidia Jetson Orin embedded board, with detection results similar to the larger one. MT F-CVT demonstrates robust generalization capability across different vehicles and camera rig configurations. A demo video from an unseen vehicle and camera rig is available at: this https URL.

[CV-6] MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

链接: https://arxiv.org/abs/2408.12574
作者: Haojun Shi,Suyu Ye,Xinyu Fang,Chuanyang Jin,Layla Isik,Yen-Ling Kuo,Tianmin Shu
关键词-EN: Understanding people social, Theory of Mind, Understanding people, complex real-world scenarios, intricate mental reasoning
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注: Project website: this https URL Code: this https URL

点击查看摘要

Abstract:Understanding people’s social interactions in complex real-world scenarios often relies on intricate mental reasoning. To truly understand how and why people interact with one another, we must infer the underlying mental states that give rise to the social interactions, i.e., Theory of Mind reasoning in multi-agent interactions. Additionally, social interactions are often multi-modal – we can watch people’s actions, hear their conversations, and/or read about their past behaviors. For AI systems to successfully and safely interact with people in real-world environments, they also need to understand people’s mental states as well as their inferences about each other’s mental states based on multi-modal information about their interactions. For this, we introduce MuMA-ToM, a Multi-modal Multi-Agent Theory of Mind benchmark. MuMA-ToM is the first multi-modal Theory of Mind benchmark that evaluates mental reasoning in embodied multi-agent interactions. In MuMA-ToM, we provide video and text descriptions of people’s multi-modal behavior in realistic household environments. Based on the context, we then ask questions about people’s goals, beliefs, and beliefs about others’ goals. We validated MuMA-ToM in a human experiment and provided a human baseline. We also proposed a novel multi-modal, multi-agent ToM model, LIMP (Language model-based Inverse Multi-agent Planning). Our experimental results show that LIMP significantly outperforms state-of-the-art methods, including large multi-modal models (e.g., GPT-4o, Gemini-1.5 Pro) and a recent multi-modal ToM model, BIP-ALM.

[CV-7] Sapiens: Foundation for Human Vision Models ECCV2024

链接: https://arxiv.org/abs/2408.12569
作者: Rawal Khirodkar,Timur Bagautdinov,Julieta Martinez,Su Zhaoen,Austin James,Peter Selednik,Stuart Anderson,Shunsuke Saito
关键词-EN: surface normal prediction, fundamental human-centric vision, body-part segmentation, pose estimation, depth estimation
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: ECCV 2024 (Oral)

点击查看摘要

Abstract:We present Sapiens, a family of models for four fundamental human-centric vision tasks - 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Our models natively support 1K high-resolution inference and are extremely easy to adapt for individual tasks by simply fine-tuning models pretrained on over 300 million in-the-wild human images. We observe that, given the same computational budget, self-supervised pretraining on a curated dataset of human images significantly boosts the performance for a diverse set of human-centric tasks. The resulting models exhibit remarkable generalization to in-the-wild data, even when labeled data is scarce or entirely synthetic. Our simple model design also brings scalability - model performance across tasks improves as we scale the number of parameters from 0.3 to 2 billion. Sapiens consistently surpasses existing baselines across various human-centric benchmarks. We achieve significant improvements over the prior state-of-the-art on Humans-5K (pose) by 7.6 mAP, Humans-2K (part-seg) by 17.1 mIoU, Hi4D (depth) by 22.4% relative RMSE, and THuman2 (normal) by 53.5% relative angular error.

[CV-8] Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers ECCV2024

链接: https://arxiv.org/abs/2408.12568
作者: Sayed Mohammad Vakilzadeh Hatefi,Maximilian Dreyer,Reduan Achtibat,Thomas Wiegand,Wojciech Samek,Sebastian Lapuschkin
关键词-EN: Deep Neural Networks, huge computational costs, Deep Neural, complex problems, billions of parameters
类目: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注: Accepted as a workshop paper at ECCV 2024 31 pages (14 pages manuscript, 4 pages references, 13 pages appendix)

点击查看摘要

Abstract:To solve ever more complex problems, Deep Neural Networks are scaled to billions of parameters, leading to huge computational costs. An effective approach to reduce computational requirements and increase efficiency is to prune unnecessary components of these often over-parameterized networks. Previous work has shown that attribution methods from the field of eXplainable AI serve as effective means to extract and prune the least relevant network components in a few-shot fashion. We extend the current state by proposing to explicitly optimize hyperparameters of attribution methods for the task of pruning, and further include transformer-based networks in our analysis. Our approach yields higher model compression rates of large transformer- and convolutional architectures (VGG, ResNet, ViT) compared to previous works, while still attaining high performance on ImageNet classification tasks. Code is available at this https URL.

[CV-9] Comparing YOLOv5 Variants for Vehicle Detection: A Performance Analysis

链接: https://arxiv.org/abs/2408.12550
作者: Athulya Sundaresan Geetha
关键词-EN: Vehicle detection, important task, management of traffic, traffic and automatic, automatic vehicles
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Vehicle detection is an important task in the management of traffic and autonomous vehicles. This study provides a comparative analysis of five YOLOv5 variants, YOLOv5n6s, YOLOv5s6s, YOLOv5m6s, YOLOv5l6s, and YOLOv5x6s, for vehicle detection in various environments. The research focuses on evaluating the effectiveness of these models in detecting different types of vehicles, such as Car, Bus, Truck, Bicycle, and Motorcycle, under varying conditions including lighting, occlusion, and weather. Performance metrics such as precision, recall, F1-score, and mean Average Precision are utilized to assess the accuracy and reliability of each model. YOLOv5n6s demonstrated a strong balance between precision and recall, particularly in detecting Cars. YOLOv5s6s and YOLOv5m6s showed improvements in recall, enhancing their ability to detect all relevant objects. YOLOv5l6s, with its larger capacity, provided robust performance, especially in detecting Cars, but was less reliable at identifying Motorcycles and Bicycles. YOLOv5x6s was effective in recognizing Buses and Cars but faced challenges with the Motorcycle class.
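The per-class metrics named in the abstract reduce to simple counts; a minimal sketch of how precision, recall, and F1 relate (function name and example numbers are ours, not from the paper):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute detection metrics from true positive, false positive,
    and false negative counts for one class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical class "Car": 80 correct detections, 10 false alarms, 20 misses.
p, r, f = precision_recall_f1(80, 10, 20)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.889 0.8 0.842
```

A model like YOLOv5l6s that detects most Cars but misses Motorcycles would show high recall for the former class and low recall for the latter, which is exactly what these per-class numbers expose.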

[CV-10] Deep Learning Improvements for Sparse Spatial Field Reconstruction

链接: https://arxiv.org/abs/2408.12531
作者: Robert Sunderhaft,Logan Frank,Jim Davis
关键词-EN: Earth Sciences, Sciences and Fluid, spatial field, global spatial field, Accurately reconstructing
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Accurately reconstructing a global spatial field from sparse data has been a longstanding problem in several domains, such as Earth Sciences and Fluid Dynamics. Historically, scientists have approached this problem by employing complex physics models to reconstruct the spatial fields. However, these methods are often computationally intensive. With the increase in popularity of machine learning (ML), several researchers have applied ML to the spatial field reconstruction task and observed improvements in computational efficiency. One such method in arXiv:2101.00554 utilizes a sparse mask of sensor locations and a Voronoi tessellation with sensor measurements as inputs to a convolutional neural network for reconstructing the global spatial field. In this work, we propose multiple adjustments to the aforementioned approach and show improvements on geoscience and fluid dynamics simulation datasets. We identify and discuss scenarios that benefit the most using the proposed ML-based spatial field reconstruction approach.
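The Voronoi-tessellation input encoding from arXiv:2101.00554 that this work builds on can be sketched in a few lines: each grid cell takes the measurement of its nearest sensor, and a binary mask marks sensor locations. This brute-force toy version is our own illustration, not the paper's code:

```python
def voronoi_fill(grid_shape, sensors):
    """Discrete Voronoi tessellation: assign each grid cell the measurement
    of its nearest sensor, and build a binary mask of sensor locations.
    sensors: dict mapping (row, col) -> measurement."""
    rows, cols = grid_shape
    field = [[0.0] * cols for _ in range(rows)]
    mask = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nearest = min(sensors, key=lambda s: (s[0] - r) ** 2 + (s[1] - c) ** 2)
            field[r][c] = sensors[nearest]
    for (r, c) in sensors:
        mask[r][c] = 1.0
    return field, mask  # the two input channels fed to the reconstruction CNN

field, mask = voronoi_fill((4, 4), {(0, 0): 1.0, (3, 3): 5.0})
print(field[0][0], field[3][3], mask[0][0])  # 1.0 5.0 1.0
```

In practice the field and mask are stacked as CNN input channels; the network then learns to refine this piecewise-constant initial guess into a smooth global field.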

[CV-11] Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

链接: https://arxiv.org/abs/2408.12528
作者: Jinheng Xie,Weijia Mao,Zechen Bai,David Junhao Zhang,Weihao Wang,Kevin Qinghong Lin,Yuchao Gu,Zhijie Chen,Zhenheng Yang,Mike Zheng Shou
关键词-EN: Show-o unifies autoregressive, unifies multimodal understanding, Show-o unifies, Show-o, unifies multimodal
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: Technical Report

点击查看摘要

Abstract:We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. Unlike fully autoregressive models, Show-o unifies autoregressive and (discrete) diffusion modeling to adaptively handle inputs and outputs of various and mixed modalities. The unified model flexibly supports a wide range of vision-language tasks including visual question-answering, text-to-image generation, text-guided inpainting/extrapolation, and mixed-modality generation. Across various benchmarks, it demonstrates comparable or superior performance to existing individual models with an equivalent or larger number of parameters tailored for understanding or generation. This significantly highlights its potential as a next-generation foundation model. Code and models are released at this https URL.

[CV-12] UMAD: University of Macau Anomaly Detection Benchmark Dataset IROS

链接: https://arxiv.org/abs/2408.12527
作者: Dong Li,Lineng Chen,Cheng-Zhong Xu,Hui Kong
关键词-EN: Anomaly detection, detection, identifying anomalous regions, Anomaly, reference
类目: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
*备注: Accepted by the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024, project code at this https URL

点击查看摘要

Abstract:Anomaly detection is critical in surveillance systems and patrol robots, identifying anomalous regions in images for early warning. Depending on whether reference data are utilized, anomaly detection can be categorized into anomaly detection with reference (ADr) and anomaly detection without reference. Currently, anomaly detection without reference, which is closely related to out-of-distribution (OoD) object detection, struggles with learning anomalous patterns due to the difficulty of collecting sufficiently large and diverse anomaly datasets, given the inherent rarity and novelty of anomalies. Alternatively, anomaly detection with reference employs the scheme of change detection to identify anomalies by comparing semantic changes between a reference image and a query one. However, there are very few ADr works due to the scarcity of public datasets in this domain. In this paper, we aim to address this gap by introducing the UMAD Benchmark Dataset. To the best of our knowledge, this is the first benchmark dataset designed specifically for anomaly detection with reference in robotic patrolling scenarios, e.g., where an autonomous robot is employed to detect anomalous objects by comparing reference and query video sequences. The reference sequences can be taken by the robot along a specified route when there are no anomalous objects in the scene. The query sequences are captured online by the robot when it is patrolling the same scene following the same route. Our benchmark dataset is elaborated such that each query image can find a corresponding reference based on accurate robot localization along the same route in the prebuilt 3D map, with which the reference and query images can be geometrically aligned using adaptive warping. Besides the proposed benchmark dataset, we evaluate the baseline models of ADr on this dataset.

[CV-13] Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets

链接: https://arxiv.org/abs/2408.12489
作者: Wolfgang Boettcher,Lukas Hoyer,Ozan Unal,Jan Eric Lenssen,Bernt Schiele
关键词-EN: segmentation, semantic segmentation, training data generation, scribble, semantic
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: under review

点击查看摘要

Abstract:In this work, we introduce Scribbles for All, a label and training data generation algorithm for semantic segmentation trained on scribble labels. Training or fine-tuning semantic segmentation models with weak supervision has recently become an important topic and has seen significant advances in model quality. In this setting, scribbles are a promising label type for achieving high-quality segmentation results while requiring a much lower annotation effort than the usual pixel-wise dense semantic segmentation annotations. The main limitation of scribbles as a source of weak supervision is the lack of challenging datasets for scribble segmentation, which hinders the development of novel methods and conclusive evaluations. To overcome this limitation, Scribbles for All provides scribble labels for several popular segmentation datasets and provides an algorithm to automatically generate scribble labels for any dataset with dense annotations, paving the way for new insights and model advancements in the field of weakly supervised segmentation. In addition to providing the datasets and the algorithm, we evaluate state-of-the-art segmentation models on our datasets and show that models trained with our synthetic labels perform competitively with respect to models trained on manual labels. Thus, our datasets enable state-of-the-art research into methods for scribble-labeled semantic segmentation. The datasets, scribble generation algorithm, and baselines are publicly available at this https URL

[CV-14] Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation

链接: https://arxiv.org/abs/2408.12483
作者: Shaobo Wang,Yantai Yang,Qilong Wang,Kaixin Li,Linfeng Zhang,Junchi Yan
关键词-EN: small dataset capable, aims to synthesize, synthesize a small, capable of performing, performing comparably
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Dataset Distillation (DD) aims to synthesize a small dataset capable of performing comparably to the original dataset. Despite the success of numerous DD methods, theoretical exploration of this area remains unaddressed. In this paper, we take an initial step towards understanding various matching-based DD methods from the perspective of sample difficulty. We begin by empirically examining sample difficulty, measured by gradient norm, and observe that different matching-based methods roughly correspond to specific difficulty tendencies. We then extend the neural scaling laws of data pruning to DD to theoretically explain these matching-based methods. Our findings suggest that prioritizing the synthesis of easier samples from the original dataset can enhance the quality of distilled datasets, especially in low IPC (image-per-class) settings. Based on our empirical observations and theoretical analysis, we introduce the Sample Difficulty Correction (SDC) approach, designed to predominantly generate easier samples to achieve higher dataset quality. Our SDC can be seamlessly integrated into existing methods as a plugin with minimal code adjustments. Experimental results demonstrate that adding SDC generates higher-quality distilled datasets across 7 distillation methods and 6 datasets.
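The Sample Difficulty Correction idea above (prioritize easy samples, with gradient norm as the difficulty proxy) can be illustrated with a toy selection rule. This sketch is our own simplification, not the authors' implementation:

```python
def sample_difficulty_correction(grad_norms, keep_ratio=0.5):
    """Rank samples by gradient norm (a proxy for difficulty, per the paper)
    and keep the easiest fraction for distillation. Returns indices in
    ascending-difficulty order; the exact selection rule is our own toy choice."""
    order = sorted(range(len(grad_norms)), key=lambda i: grad_norms[i])
    k = max(1, int(len(grad_norms) * keep_ratio))
    return order[:k]

# Samples 1 and 3 have the smallest gradient norms, so they are kept.
print(sample_difficulty_correction([2.0, 0.1, 5.0, 0.3], keep_ratio=0.5))  # [1, 3]
```

In a real DD pipeline the gradient norms would come from backpropagating the training loss per sample, and the kept subset would drive the matching objective, especially in the low-IPC regime the abstract highlights.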

[CV-15] Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition

链接: https://arxiv.org/abs/2408.12475
作者: Bozheng Li,Mushui Liu,Gaoang Wang,Yunlong Yu
关键词-EN: few-shot action recognition, Temporal Sequence-Aware Model, sequential perceiver adapter, sequential temporal dynamics, capture temporal information
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: 9 pages, 6 figures

点击查看摘要

Abstract:In this paper, we propose a novel Temporal Sequence-Aware Model (TSAM) for few-shot action recognition (FSAR), which incorporates a sequential perceiver adapter into the pre-training framework, to integrate both the spatial information and the sequential temporal dynamics into the feature embeddings. Different from the existing fine-tuning approaches that capture temporal information by exploring the relationships among all the frames, our perceiver-based adapter recurrently captures the sequential dynamics alongside the timeline, which could perceive the order change. To obtain the discriminative representations for each class, we extend a textual corpus for each class derived from the large language models (LLMs) and enrich the visual prototypes by integrating the contextual semantic information. Besides, we introduce an unbalanced optimal transport strategy for feature matching that mitigates the impact of class-unrelated features, thereby facilitating more effective decision-making. Experimental results on five FSAR datasets demonstrate that our method sets a new benchmark, beating the second-best competitors by large margins.

[CV-16] Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning

链接: https://arxiv.org/abs/2408.12469
作者: Mushui Liu,Fangtai Wu,Bozheng Li,Ziqian Lu,Yunlong Yu,Xi Li
关键词-EN: limited visual data, aims to recognize, recognize new concepts, Visual Pattern Extraction, Semantic-guided Visual Pattern
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: 9 pages, 7 figures

点击查看摘要

Abstract:Few-shot learning (FSL) aims to recognize new concepts using a limited number of visual samples. Existing approaches attempt to incorporate semantic information into the limited visual data for category understanding. However, these methods often enrich class-level feature representations with abstract category names, failing to capture the nuanced features essential for effective generalization. To address this issue, we propose a novel framework for FSL, which incorporates both the abstract class semantics and the concrete class entities extracted from Large Language Models (LLMs), to enhance the representation of the class prototypes. Specifically, our framework composes a Semantic-guided Visual Pattern Extraction (SVPE) module and a Prototype-Calibration (PC) module, where the SVPE meticulously extracts semantic-aware visual patterns across diverse scales, while the PC module seamlessly integrates these patterns to refine the visual prototype, enhancing its representativeness. Extensive experiments on four few-shot classification benchmarks and the BSCD-FSL cross-domain benchmarks showcase remarkable advancements over the current state-of-the-art methods. Notably, for the challenging one-shot setting, our approach, utilizing the ResNet-12 backbone, achieves an impressive average improvement of 1.95% over the second-best competitor.

[CV-17] WCEbleedGen: A wireless capsule endoscopy dataset and its benchmarking for automatic bleeding classification, detection, and segmentation

链接: https://arxiv.org/abs/2408.12466
作者: Palak Handa,Manas Dhir,Amirreza Mahbod,Florian Schwarzhans,Ramona Woitek,Nidhi Goel,Deepak Gunjan
关键词-EN: Wireless Capsule Endoscopy, Capsule Endoscopy, Wireless Capsule, Computer-based analysis, medically annotated WCE
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Computer-based analysis of Wireless Capsule Endoscopy (WCE) is crucial. However, a medically annotated WCE dataset for training and evaluation of automatic classification, detection, and segmentation of bleeding and non-bleeding frames is currently lacking. The present work focused on the development of a medically annotated WCE dataset called WCEbleedGen for automatic classification, detection, and segmentation of bleeding and non-bleeding frames. It comprises 2,618 WCE bleeding and non-bleeding frames which were collected from various internet resources and existing WCE datasets. A comprehensive benchmarking and evaluation of the developed dataset was done using nine classification-based, three detection-based, and three segmentation-based deep learning models. The dataset is of high quality, is class-balanced, and contains single and multiple bleeding sites. Overall, our standard benchmark results show that Visual Geometry Group (VGG) 19, You Only Look Once version 8 nano (YOLOv8n), and Link network (Linknet) performed best in automatic classification, detection, and segmentation-based evaluations, respectively. Automatic bleeding diagnosis is crucial for WCE video interpretation. This diverse dataset will aid in the development of real-time, multi-task learning-based innovative solutions for automatic bleeding diagnosis in WCE. The dataset and code are publicly available at this https URL and this https URL.

[CV-18] Smartphone-based Eye Tracking System using Edge Intelligence and Model Optimisation

链接: https://arxiv.org/abs/2408.12463
作者: Nishan Gunawardena,Gough Yumu Lui,Jeewani Anupama Ginige,Bahman Javadi
关键词-EN: Recurrent Neural Networks, Convolutional Neural Networks, video-type visual stimuli, Gated Recurrent Unit, Long Short Term
类目: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Performance (cs.PF)
*备注:

点击查看摘要

Abstract:A significant limitation of current smartphone-based eye-tracking algorithms is their low accuracy when applied to video-type visual stimuli, as they are typically trained on static images. Also, the increasing demand for real-time interactive applications like games, VR, and AR on smartphones requires overcoming the limitations posed by resource constraints such as limited computational power, battery life, and network bandwidth. Therefore, we developed two new smartphone eye-tracking techniques for video-type visuals by combining Convolutional Neural Networks (CNN) with two different Recurrent Neural Networks (RNN), namely Long Short Term Memory (LSTM) and Gated Recurrent Unit (GRU). Our CNN+LSTM and CNN+GRU models achieved an average Root Mean Square Error of 0.955cm and 1.091cm, respectively. To address the computational constraints of smartphones, we developed an edge intelligence architecture to enhance the performance of smartphone-based eye tracking. We applied various optimisation methods like quantisation and pruning to deep learning models for better energy, CPU, and memory usage on edge devices, focusing on real-time processing. Using model quantisation, the model inference time in the CNN+LSTM and CNN+GRU models was reduced by 21.72% and 19.50%, respectively, on edge devices.
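The model quantisation mentioned above can be illustrated with a minimal symmetric int8 scheme: weights are stored as 8-bit integers plus one float scale, roughly quartering memory versus float32. This is our own toy version; an actual eye-tracking deployment would use a deep learning framework's quantisation toolkit:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantisation: map floats into [-127, 127]
    using a single scale derived from the largest magnitude."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard against all-zero weights
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.02]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print(q)  # [50, -127, 2]
print(max(abs(a - b) for a, b in zip(w, w_hat)) <= s)  # True: error bounded by the scale
```

The same trade-off drives the reported inference-time reductions: int8 arithmetic is cheaper on edge CPUs, at the cost of a quantisation error bounded by the scale.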

[CV-19] Finding Closure: A Closer Look at the Gestalt Law of Closure in Convolutional Neural Networks

链接: https://arxiv.org/abs/2408.12460
作者: Yuyan Zhang,Derya Soydaner,Lisa Koßmann,Fatemeh Behrad,Johan Wagemans
关键词-EN: neural networks, Closure, missing or fragmented, neural, inherent ability
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:The human brain has an inherent ability to fill in gaps to perceive figures as complete wholes, even when parts are missing or fragmented. This phenomenon is known as Closure in psychology, one of the Gestalt laws of perceptual organization, explaining how the human brain interprets visual stimuli. Given the importance of Closure for human object recognition, we investigate whether neural networks rely on a similar mechanism. Exploring this crucial human visual skill in neural networks has the potential to highlight their comparability to humans. Recent studies have examined the Closure effect in neural networks. However, they typically focus on a limited selection of Convolutional Neural Networks (CNNs) and have not reached a consensus on their capability to perform Closure. To address these gaps, we present a systematic framework for investigating the Closure principle in neural networks. We introduce well-curated datasets designed to test for Closure effects, including both modal and amodal completion. We then conduct experiments on various CNNs employing different measurements. Our comprehensive analysis reveals that VGG16 and DenseNet-121 exhibit the Closure effect, while other CNNs show variable results. We interpret these findings by blending insights from psychology and neural network research, offering a unique perspective that enhances transparency in understanding neural networks. Our code and dataset will be made available on GitHub.

[CV-20] Relaxed Rotational Equivariance via G-Biases in Vision

链接: https://arxiv.org/abs/2408.12454
作者: Zhiqiang Wu,Licheng Sun,Yingjie Liu,Jian Yang,Hanlin Dong,Shing-Ho J. Lin,Xuan Tang,Jinpeng Mi,Bo Jin,Xian Wei
关键词-EN: Group Equivariant Convolution, Equivariant Convolution, handle rotational symmetry, rotational symmetry, strict rotational symmetry
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Group Equivariant Convolution (GConv) can effectively handle rotational symmetry data. It assumes uniform and strict rotational symmetry across all features, as the transformations under a specific group. However, real-world data rarely conforms to strict rotational symmetry, a phenomenon commonly referred to as Rotational Symmetry-Breaking in the system or dataset, making GConv unable to adapt effectively to it. Motivated by this, we propose a simple but highly effective method to address this problem, which utilizes a set of learnable biases under the group order, called the G-Biases, to break strict group constraints and achieve Relaxed Rotational Equivariant Convolution (RREConv). We conduct extensive experiments to validate Relaxed Rotational Equivariance on the rotational symmetry groups C_n (e.g., the C_2, C_4, and C_6 groups). Further experiments demonstrate that our proposed RREConv-based methods achieve excellent performance compared to existing GConv-based methods in classification and detection tasks on natural image datasets.
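The relaxation idea can be seen in a toy C_4 example: a strict group convolution correlates an input patch with four rotated copies of one shared kernel, and the relaxed variant adds a distinct learnable bias per rotation, breaking the strict weight sharing. This construction is our own reading of the abstract, not the authors' code:

```python
def rot90(k):
    """Rotate a square kernel 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*k)][::-1]

def c4_responses(kernel, patch, g_biases):
    """Correlate one image patch with the four C_4 rotations of a kernel.
    Strict GConv corresponds to g_biases = [0, 0, 0, 0]; the relaxed variant
    learns a separate bias per group element."""
    responses = []
    k = kernel
    for b in g_biases:
        resp = sum(k[i][j] * patch[i][j]
                   for i in range(len(k)) for j in range(len(k)))
        responses.append(resp + b)
        k = rot90(k)
    return responses

kernel = [[1, 0], [0, 0]]
patch = [[2, 3], [4, 5]]
print(c4_responses(kernel, patch, [0, 0, 0, 0]))    # strict: [2, 4, 5, 3]
print(c4_responses(kernel, patch, [0.1, 0, 0, 0]))  # relaxed: first response shifted by its bias
```

With zero biases the four responses are a permutation of each other under input rotation (equivariance); the per-rotation biases let the model deviate from that constraint where the data's symmetry is broken.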

[CV-21] The 2nd Solution for LSVOS Challenge RVOS Track: Spatial-temporal Refinement for Consistent Semantic Segmentation

链接: https://arxiv.org/abs/2408.12447
作者: Tuyen Tran
关键词-EN: Referring Video Object, Video Object Segmentation, challenging task due, Video Object, Referring Video
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Referring Video Object Segmentation (RVOS) is a challenging task due to its requirement for temporal understanding. Due to the obstacle of computational complexity, many state-of-the-art models are trained on short time intervals. During testing, while these models can effectively process information over short time steps, they struggle to maintain consistent perception over prolonged time sequences, leading to inconsistencies in the resulting semantic segmentation masks. To address this challenge, we take a step further in this work by leveraging the tracking capabilities of the newly introduced Segment Anything Model version 2 (SAM-v2) to enhance the temporal consistency of the referring object segmentation model. Our method achieved a score of 60.40 J&F on the test set of the MeViS dataset, placing 2nd in the final ranking of the RVOS Track at the ECCV 2024 LSVOS Challenge.

[CV-22] A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures

链接: https://arxiv.org/abs/2408.12443
作者: Tahmina Khanam,Hamid Laga,Mohammed Bennamoun,Guanjin Wang,Ferdous Sohel,Farid Boussaid,Guan Wang,Anuj Srivastava
关键词-EN: Velocity Function Trees, Square Root Velocity, Root Velocity Function, comprehensive approach, modeling and analyzing
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
*备注:

点击查看摘要

Abstract:We propose the first comprehensive approach for modeling and analyzing the spatiotemporal shape variability in tree-like 4D objects, i.e., 3D objects whose shapes bend, stretch, and change in their branching structure over time as they deform, grow, and interact with their environment. Our key contribution is the representation of tree-like 3D shapes using Square Root Velocity Function Trees (SRVFT). By solving the spatial registration in the SRVFT space, which is equipped with an L2 metric, 4D tree-shaped structures become time-parameterized trajectories in this space. This reduces the problem of modeling and analyzing 4D tree-like shapes to that of modeling and analyzing elastic trajectories in the SRVFT space, where elasticity refers to time warping. In this paper, we propose a novel mathematical representation of the shape space of such trajectories, a Riemannian metric on that space, and computational tools for fast and accurate spatiotemporal registration and geodesics computation between 4D tree-shaped structures. Leveraging these building blocks, we develop a full framework for modelling the spatiotemporal variability using statistical models and generating novel 4D tree-like structures from a set of exemplars. We demonstrate and validate the proposed framework using real 4D plant data.
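The Square Root Velocity Function underlying SRVFT maps a curve f to q = f' / sqrt(||f'||), which turns elastic shape comparison into L2 distances. A toy discretised version for sampled curves (our own sketch, not the paper's framework, using forward differences):

```python
import math

def srvf(points, dt=1.0):
    """Square Root Velocity Function of a discretised curve f:
    q = f' / sqrt(||f'||), with the derivative approximated by
    forward differences between consecutive sample points."""
    q = []
    for a, b in zip(points, points[1:]):
        v = [(y - x) / dt for x, y in zip(a, b)]  # velocity between samples
        speed = math.sqrt(sum(c * c for c in v))
        if speed > 0:
            q.append([c / math.sqrt(speed) for c in v])
        else:
            q.append([0.0] * len(v))  # stationary segment maps to zero
    return q

# A straight line traversed at constant speed 2: q is constant with norm sqrt(2).
curve = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
print(srvf(curve))
```

The key property exploited above is that the L2 distance between two SRVFs is invariant to reparameterization after alignment, which is what makes registration and geodesic computation between tree-shaped structures tractable in that space.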

[CV-23] Adapting MIMO video restoration networks to low latency constraints

链接: https://arxiv.org/abs/2408.12439
作者: Valéry Dewil,Zhe Zheng,Arnaud Barral,Lara Raad,Nao Nicolas,Ioannis Cassagne,Jean-michel Morel,Gabriele Facciolo,Bruno Galerne,Pablo Arias
关键词-EN: evaluation produces multiple, produces multiple output, multiple output frames, multiple output, multiple input
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: See the project web page to download the associated videos

点击查看摘要

Abstract:MIMO (multiple input, multiple output) approaches are a recent trend in neural network architectures for video restoration problems, where each network evaluation produces multiple output frames. The video is split into non-overlapping stacks of frames that are processed independently, resulting in a very appealing trade-off between output quality and computational cost. In this work we focus on the low-latency setting by limiting the number of available future frames. We find that MIMO architectures suffer from problems that have received little attention so far, namely (1) the performance drops significantly due to the reduced temporal receptive field, particularly for frames at the borders of the stack, (2) there are strong temporal discontinuities at stack transitions which induce a step-wise motion artifact. We propose two simple solutions to alleviate these problems: recurrence across MIMO stacks to boost the output quality by implicitly increasing the temporal receptive field, and overlapping of the output stacks to smooth the temporal discontinuity at stack transitions. These modifications can be applied to any MIMO architecture. We test them on three state-of-the-art video denoising networks with different computational cost. The proposed contributions result in a new state-of-the-art for low-latency networks, both in terms of reconstruction error and temporal consistency. As an additional contribution, we introduce a new benchmark consisting of drone footage that highlights temporal consistency issues that are not apparent in the standard benchmarks.
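The proposed stack overlap can be sketched as a pure indexing scheme (our own illustration, assuming overlap < stack size; the paper applies this to network outputs, blending the frames shared between consecutive stacks):

```python
def overlapping_stacks(num_frames, stack_size, overlap):
    """Split frame indices into MIMO stacks that share `overlap` frames
    at each transition, so stack boundaries can be blended instead of
    producing a temporal discontinuity. Assumes overlap < stack_size."""
    step = stack_size - overlap
    stacks = []
    start = 0
    while start + stack_size <= num_frames:
        stacks.append(list(range(start, start + stack_size)))
        start += step
    return stacks

# 10 frames, stacks of 4 overlapping by 1: consecutive stacks share one frame.
print(overlapping_stacks(10, 4, 1))  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With overlap = 0 this reduces to the standard non-overlapping MIMO splitting; averaging the duplicate outputs at shared indices is what smooths the step-wise motion artifact described above.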

[CV-24] Robotic Eye-in-hand Visual Servo Axially Aligning Nasopharyngeal Swabs with the Nasal Cavity

链接: https://arxiv.org/abs/2408.12437
作者: Peter Q. Lee,John S. Zelek,Katja Mombaur
关键词-EN: swab test, respiratory illnesses, method for collecting, collecting cultures, cultures to diagnose
类目: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
*备注: 12 pages, 13 figures

点击查看摘要

Abstract:The nasopharyngeal (NP) swab test is a method for collecting cultures to diagnose different types of respiratory illnesses, including COVID-19. Delegating this task to robots would be beneficial in terms of reducing infection risks and bolstering the healthcare system, but a critical component of the NP swab test is having the swab aligned properly with the nasal cavity so that it does not cause excessive discomfort or injury by traveling down the wrong passage. Existing research towards robotic NP swabbing typically assumes the patient’s head is held within a fixture. This simplifies the alignment problem, but is also dissimilar to clinical scenarios where patients are typically free-standing. Consequently, our work creates a vision-guided pipeline to allow an instrumented robot arm to properly position and orient NP swabs with respect to the nostrils of free-standing patients. The first component of the pipeline is a precomputed joint lookup table to allow the arm to meet the patient’s arbitrary position in the designated workspace, while avoiding joint limits. Our pipeline leverages semantic face models from computer vision to estimate the Euclidean pose of the face with respect to a monocular RGB-D camera placed on the end-effector. These estimates are passed into an unscented Kalman filter on manifolds state estimator and a pose-based visual servo control loop to move the swab to the designated pose in front of the nostril. Our pipeline was validated with human trials, featuring a cohort of 25 participants. The system is effective, reaching the nostril for 84% of participants, and our statistical analysis did not find significant demographic biases within the cohort.

[CV-25] FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing

链接: https://arxiv.org/abs/2408.12429
作者: Jue Wang,Yuxiang Lin,Tianshuo Yuan,Zhi-Qi Cheng,Xiaolong Wang,Jiao GH,Wei Chen,Xiaojiang Peng
关键词-EN: Combining Vision Large, Vision Large Language, Combining Vision, Vision Large, Large Language Models
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: 15 pages, 14 figures

点击查看摘要

Abstract:Combining Vision Large Language Models (VLLMs) with diffusion models offers a powerful method for executing image editing tasks based on human language instructions. However, language instructions alone often fall short in accurately conveying user requirements, particularly when users want to add or replace elements in specific areas of an image. Luckily, masks can effectively indicate the exact locations or elements to be edited, but they require users to precisely draw the shapes at the desired locations, which is highly user-unfriendly. To address this, we propose FlexEdit, an end-to-end image editing method that leverages both free-shape masks and language instructions for Flexible Editing. Our approach employs a VLLM to comprehend the image content, mask, and user instructions. Additionally, we introduce the Mask Enhance Adapter (MEA) that fuses the embeddings of the VLLM with the image data, ensuring a seamless integration of mask information and model output embeddings. Furthermore, we construct FSMI-Edit, a benchmark specifically tailored for free-shape masks, covering 8 types of free-shape mask. Extensive experiments show that our method achieves state-of-the-art (SOTA) performance in LLM-based image editing, and our simple prompting technique stands out in its effectiveness. The code and data can be found at this https URL.

[CV-26] Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification

链接: https://arxiv.org/abs/2408.12426
作者: Sudi Murindanyi,Joyce Nakatumba-Nabende,Rahman Sanya,Rose Nakibuule,Andrew Katumba
关键词-EN: Artificial Intelligence, popularity of Artificial, Intelligence in recent, increasing popularity, recent years
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:The increasing popularity of Artificial Intelligence in recent years has led to a surge in interest in image classification, especially in the agricultural sector. With the help of Computer Vision, Machine Learning, and Deep Learning, the sector has undergone a significant transformation, leading to the development of new techniques for crop classification in the field. Despite the extensive research on various image classification techniques, most have limitations such as low accuracy, limited use of data, and a lack of reporting of model size and prediction time. The most significant limitation of all is the need for model explainability. This research evaluates four different approaches for crop classification, namely traditional ML with handcrafted feature extraction methods like SIFT, ORB, and Color Histogram; custom-designed CNNs and established DL architectures like AlexNet; transfer learning on five models pre-trained using ImageNet, namely EfficientNetV2, ResNet152V2, Xception, Inception-ResNetV2, and MobileNetV3; and cutting-edge foundation models like YOLOv8 and DINOv2, a self-supervised Vision Transformer model. All models performed well, but Xception outperformed all of them in terms of generalization, achieving 98% accuracy on the test data, with a model size of 80.03 MB and a prediction time of 0.0633 seconds. A key aspect of this research was the application of Explainable AI to provide the explainability of all the models. This paper presents the explainability of the Xception model with LIME, SHAP, and GradCAM, ensuring transparency and trustworthiness in the models' predictions. This study highlights the importance of selecting the right model according to task-specific needs. It also underscores the important role of explainability in deploying AI in agriculture, providing insightful information to help enhance AI-driven crop management strategies.

[CV-27] CODE: Confident Ordinary Differential Editing

链接: https://arxiv.org/abs/2408.12418
作者: Bastien van Delft,Tommaso Martorella,Alexandre Alahi
关键词-EN: facilitates seamless editing, Confident Ordinary Differential, Ordinary Differential Equation, Ordinary Differential Editing, Ordinary Differential
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Conditioning image generation facilitates seamless editing and the creation of photorealistic images. However, conditioning on noisy or Out-of-Distribution (OoD) images poses significant challenges, particularly in balancing fidelity to the input and realism of the output. We introduce Confident Ordinary Differential Editing (CODE), a novel approach for image synthesis that effectively handles OoD guidance images. Utilizing a diffusion model as a generative prior, CODE enhances images through score-based updates along the probability-flow Ordinary Differential Equation (ODE) trajectory. This method requires no task-specific training, no handcrafted modules, and no assumptions regarding the corruptions affecting the conditioning image. Our method is compatible with any diffusion model. Positioned at the intersection of conditional image generation and blind image restoration, CODE operates in a fully blind manner, relying solely on a pre-trained generative model. Our method introduces an alternative approach to blind restoration: instead of targeting a specific ground truth image based on assumptions about the underlying corruption, CODE aims to increase the likelihood of the input image while maintaining fidelity. This results in the most probable in-distribution image around the input. Our contributions are twofold. First, CODE introduces a novel editing method based on ODE, providing enhanced control, realism, and fidelity compared to its SDE-based counterpart. Second, we introduce a confidence interval-based clipping method, which improves CODE’s effectiveness by allowing it to disregard certain pixels or information, thus enhancing the restoration process in a blind manner. Experimental results demonstrate CODE’s effectiveness over existing methods, particularly in scenarios involving severe degradation or OoD inputs.
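The confidence interval-based clipping idea can be illustrated in isolation: values of the guidance signal that fall outside a k-sigma band around the prior's statistics are clipped rather than trusted, so corrupted pixels cannot dominate the update. A minimal one-dimensional sketch (the function name and numbers are ours, not from the paper):

```python
import numpy as np

def confidence_clip(signal, mean, std, k=2.0):
    """Keep each value inside a k-sigma confidence interval; values outside
    (e.g. from severely corrupted pixels) are clipped to the interval bounds."""
    return np.clip(signal, mean - k * std, mean + k * std)

noisy = np.array([0.1, 0.2, 9.0, -7.5, 0.3])       # two out-of-distribution spikes
clipped = confidence_clip(noisy, mean=0.0, std=0.5, k=2.0)
print(clipped.tolist())  # → [0.1, 0.2, 1.0, -1.0, 0.3]
```

In-distribution values pass through untouched, while the spikes are pulled back to the plausible range before being used as guidance.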

[CV-28] Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes ECCV2024

链接: https://arxiv.org/abs/2408.12406
作者: Sota Kato,Hinako Mitsuoka,Kazuhiro Hotta
关键词-EN: fine-tuning foundation models, input image size, SAM, lot of recent, recent research
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: Accepted by ECCV2024 Workshop “Computational Aspects of Deep Learning (CADL)”

点击查看摘要

Abstract:There has been a lot of recent research on improving the efficiency of fine-tuning foundation models. In this paper, we propose a novel efficient fine-tuning method that allows the input image size of Segment Anything Model (SAM) to be variable. SAM is a powerful foundational model for image segmentation trained on huge datasets, but it requires fine-tuning to recognize arbitrary classes. The input image size of SAM is fixed at 1024 x 1024, resulting in substantial computational demands during training. Furthermore, the fixed input image size may result in the loss of image information, e.g. due to fixed aspect ratios. To address this problem, we propose Generalized SAM (GSAM). Different from the previous methods, GSAM is the first to apply random cropping during training with SAM, thereby significantly reducing the computational cost of training. Experiments on datasets of various types and various pixel counts have shown that GSAM can train more efficiently than SAM and other fine-tuning methods for SAM, achieving comparable or higher accuracy.
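The core trick, random cropping during training so the input size need not be fixed at 1024 x 1024, can be sketched with plain NumPy (the helper name and sizes are illustrative; GSAM applies this inside SAM's training loop, which is not shown here):

```python
import numpy as np

def random_crop(image, crop_h, crop_w, rng):
    """Random spatial crop, so training batches can use smaller, variable
    input sizes instead of a fixed-size padded canvas."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - crop_h + 1))   # upper bound is exclusive
    left = int(rng.integers(0, w - crop_w + 1))
    return image[top:top + crop_h, left:left + crop_w]

rng = np.random.default_rng(0)
img = np.zeros((512, 768, 3), dtype=np.uint8)    # stand-in for a training image
patch = random_crop(img, 256, 384, rng)
print(patch.shape)  # → (256, 384, 3)
```

Because each batch can use its own crop size, the aspect ratio of the original image can be preserved instead of being forced into a square canvas.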

[CV-29] Multi-Style Facial Sketch Synthesis through Masked Generative Modeling

链接: https://arxiv.org/abs/2408.12400
作者: Bowen Sun,Guo Lu,Shibao Zheng
关键词-EN: generating sketch portraits, holds profound implications, encompassing cross-modal face, cross-modal face recognition, facial sketch synthesis
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:The facial sketch synthesis (FSS) model, capable of generating sketch portraits from given facial photographs, holds profound implications across multiple domains, encompassing cross-modal face recognition, entertainment, art, and media, among others. However, the production of high-quality sketches remains a formidable task, primarily due to the challenges and flaws associated with three key factors: (1) the scarcity of artist-drawn data, (2) the constraints imposed by limited style types, and (3) the deficiencies of processing input information in existing models. To address these difficulties, we propose a lightweight end-to-end synthesis model that efficiently converts images to corresponding multi-stylized sketches, obviating the necessity for any supplementary inputs (e.g., 3D geometry). In this study, we overcome the issue of data insufficiency by incorporating semi-supervised learning into the training process. Additionally, we employ a feature extraction module and style embeddings to proficiently steer the generative transformer during the iterative prediction of masked image tokens, thus achieving a continuous stylized output that retains facial features accurately in sketches. The extensive experiments demonstrate that our method consistently outperforms previous algorithms across multiple benchmarks by a discernible margin.

[CV-30] Cross-Domain Foundation Model Adaptation: Pioneering Computer Vision Models for Geophysical Data Analysis

链接: https://arxiv.org/abs/2408.12396
作者: Zhixiang Guo,Xinming Wu,Luming Liang,Hanlin Sheng,Nuo Chen,Zhengfa Bi
关键词-EN: adapting foundation models, explore adapting foundation, foundation models, FMs, computer vision
类目: Computer Vision and Pattern Recognition (cs.CV); Geophysics (physics.geo-ph)
*备注:

点击查看摘要

Abstract:We explore adapting foundation models (FMs) from the computer vision domain to geoscience. FMs, large neural networks trained on massive datasets, excel in diverse tasks with remarkable adaptability and generality. However, geoscience faces challenges like lacking curated training datasets and high computational costs for developing specialized FMs. This study considers adapting FMs from computer vision to geoscience, analyzing their scale, adaptability, and generality for geoscientific data analysis. We introduce a workflow that leverages existing computer vision FMs, fine-tuning them for geoscientific tasks, reducing development costs while enhancing accuracy. Through experiments, we demonstrate this workflow’s effectiveness in broad applications to process and interpret geoscientific data of lunar images, seismic data, DAS arrays and so on. Our findings introduce advanced ML techniques to geoscience, proving the feasibility and advantages of cross-domain FMs adaptation, driving further advancements in geoscientific data analysis and offering valuable insights for FMs applications in other scientific domains.

[CV-31] Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors ECCV

链接: https://arxiv.org/abs/2408.12387
作者: Fahad Shamshad,Muzammal Naseer,Karthik Nandakumar
关键词-EN: Deep learning-based face, Deep learning-based, significant privacy risks, pose significant privacy, learning-based face recognition
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注: Proceedings of ECCV Workshop on Explainable AI for Biometrics, 2024

点击查看摘要

Abstract:Deep learning-based face recognition (FR) systems pose significant privacy risks by tracking users without their consent. While adversarial attacks can protect privacy, they often produce visible artifacts compromising user experience. To mitigate this issue, recent facial privacy protection approaches advocate embedding adversarial noise into the natural looking makeup styles. However, these methods require training on large-scale makeup datasets that are not always readily available. In addition, these approaches also suffer from dataset bias. For instance, training on makeup data that predominantly contains female faces could compromise protection efficacy for male faces. To handle these issues, we propose a test-time optimization approach that solely optimizes an untrained neural network to transfer makeup style from a reference to a source image in an adversarial manner. We introduce two key modules: a correspondence module that aligns regions between reference and source images in latent space, and a decoder with conditional makeup layers. The untrained decoder, optimized via carefully designed structural and makeup consistency losses, generates a protected image that resembles the source but incorporates adversarial makeup to deceive FR models. As our approach does not rely on training with makeup face datasets, it avoids potential male/female dataset biases while providing effective protection. We further extend the proposed approach to videos by leveraging on temporal correlations. Experiments on benchmark datasets demonstrate superior performance in face verification and identification tasks and effectiveness against commercial FR systems. Our code and models will be available at this https URL

[CV-32] Sampling Strategies based on Wisdom of Crowds for Amazon Deforestation Detection

链接: https://arxiv.org/abs/2408.12381
作者: Hugo Resende,Eduardo B. Neto,Fabio A. M. Cappabianco,Alvaro L. Fazenda,Fabio A. Faria
关键词-EN: Conserving tropical forests, highly relevant socially, Conserving tropical, Machine Learning models, global ecosystem
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注: 6 pages, 5 figus, paper accepted at the SIBGRAPI 2024

点击查看摘要

Abstract:Conserving tropical forests is highly relevant socially and ecologically because of their critical role in the global ecosystem. However, the ongoing deforestation and degradation affect millions of hectares each year, necessitating government or private initiatives to ensure effective forest monitoring. In April 2019, a project based on Citizen Science and Machine Learning models called ForestEyes (FE) was launched with the aim of providing supplementary data to assist experts from government and non-profit organizations in their deforestation monitoring efforts. Recent research has shown that labels provided by FE project volunteers/citizen scientists help tailor machine learning models. In this sense, we adopt the FE project to create different sampling strategies based on the wisdom of crowds to select the most suitable samples from the training set to train an SVM classifier and obtain better classification results in deforestation detection tasks. Our experiments show that our user entropy-increasing strategy achieved the best classification results in the deforestation detection task when compared with random sampling strategies, as well as reducing the convergence time of the SVM technique.
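The user-entropy sampling strategy can be sketched as follows: each segment's volunteer votes get a Shannon entropy score, and the highest-entropy (most contested) samples are selected for training. Function names and the toy votes below are ours, not from the paper:

```python
import math

def vote_entropy(votes):
    """Shannon entropy of crowd votes for one sample (e.g. forest vs deforested)."""
    total = len(votes)
    entropy = 0.0
    for label in set(votes):
        p = votes.count(label) / total
        entropy -= p * math.log2(p)
    return entropy

def select_high_entropy(samples, k):
    """Pick the k samples whose volunteer votes disagree the most."""
    ranked = sorted(samples, key=lambda s: vote_entropy(s["votes"]), reverse=True)
    return [s["id"] for s in ranked[:k]]

samples = [
    {"id": "seg-1", "votes": ["forest"] * 9 + ["deforested"]},      # near-consensus
    {"id": "seg-2", "votes": ["forest"] * 5 + ["deforested"] * 5},  # maximal disagreement
    {"id": "seg-3", "votes": ["forest"] * 7 + ["deforested"] * 3},
]
print(select_high_entropy(samples, 2))  # → ['seg-2', 'seg-3']
```

High-entropy segments are exactly the ones where the crowd is split, so prioritizing them focuses the SVM's training budget on the hardest examples.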

[CV-33] UMERegRobust – Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration ECCV2024

链接: https://arxiv.org/abs/2408.12380
作者: Yuval Haitman,Amit Efraim,Joseph M. Francos
关键词-EN: Universal Manifold Embedding, Manifold Embedding, Universal Manifold, sampled point clouds, differently sampled point
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: ECCV 2024

点击查看摘要

Abstract:In this paper, we adopt the Universal Manifold Embedding (UME) framework for the estimation of rigid transformations and extend it, so that it can accommodate scenarios involving partial overlap and differently sampled point clouds. UME is a methodology designed for mapping observations of the same object, related by rigid transformations, into a single low-dimensional linear subspace. This process yields a transformation-invariant representation of the observations, with its matrix form representation being covariant (i.e. equivariant) with the transformation. We extend the UME framework by introducing a UME-compatible feature extraction method augmented with a unique UME contrastive loss and a sampling equalizer. These components are integrated into a comprehensive and robust registration pipeline, named UMERegRobust. We propose the RotKITTI registration benchmark, specifically tailored to evaluate registration methods for scenarios involving large rotations. UMERegRobust achieves better-than-state-of-the-art performance on the KITTI benchmark, especially when strict precision of (1°, 10cm) is considered (with an average gain of +9%), and notably outperforms SOTA methods on the RotKITTI benchmark (a +45% gain compared to the most recent SOTA method). Our code is available at this https URL.

[CV-34] Robust Principal Component Analysis via Discriminant Sample Weight Learning

链接: https://arxiv.org/abs/2408.12366
作者: Yingzhuo Deng,Ke Hu,Bo Li,Yao Zhang
关键词-EN: Principal component analysis, PCA projection matrix, projection matrix, classical feature extraction, Principal component
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Principal component analysis (PCA) is a classical feature extraction method, but it may be adversely affected by outliers, resulting in inaccurate learning of the projection matrix. This paper proposes a robust method to estimate both the data mean and the PCA projection matrix by learning discriminant sample weights from data containing outliers. Each sample in the dataset is assigned a weight, and the proposed algorithm iteratively learns the weights, the mean, and the projection matrix, respectively. Specifically, when the mean and the projection matrix are available, via fine-grained analysis of outliers, a weight for each sample is learned hierarchically so that outliers have small weights while normal samples have large weights. With the learned weights available, a weighted optimization problem is solved to estimate both the data mean and the projection matrix. Because the learned weights discriminate outliers from normal samples, the adverse influence of outliers is mitigated due to the corresponding small weights. Experiments on toy data, UCI dataset, and face dataset demonstrate the effectiveness of the proposed method in estimating the mean and the projection matrix from the data containing outliers.
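A much-simplified sketch of the alternating idea (weights, mean, and projection estimated in turn so that outliers receive small weights) can be written in a few lines of NumPy. This toy uses a simple inverse-distance weight rather than the paper's hierarchical, fine-grained weight learning:

```python
import numpy as np

def robust_pca(X, n_components=1, n_iter=20):
    """Toy alternating scheme: samples far from the weighted mean get small
    weights, then the projection comes from a weighted covariance."""
    w = np.ones(X.shape[0])
    for _ in range(n_iter):
        mean = (w[:, None] * X).sum(axis=0) / w.sum()
        d2 = ((X - mean) ** 2).sum(axis=1)
        w = 1.0 / (1.0 + d2)                  # likely outliers -> small weight
    Xc = X - mean
    cov = (w[:, None] * Xc).T @ Xc / w.sum()
    _, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    return mean, eigvecs[:, -n_components:]   # top eigenvectors = projection

rng = np.random.default_rng(0)
inliers = rng.normal(size=(50, 2)) * np.array([3.0, 0.3])   # spread along x
outliers = np.array([[0.0, 20.0], [1.0, 25.0]])             # far off the y-axis
mean, P = robust_pca(np.vstack([inliers, outliers]))
print(np.abs(P.ravel()))  # dominant direction stays near the x-axis
```

Plain PCA on the same data would be pulled toward the two outliers; the downweighting keeps the estimated mean near the origin and the principal direction along the true axis of variation.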

[CV-35] SAM-SP: Self-Prompting Makes SAM Great Again

链接: https://arxiv.org/abs/2408.12364
作者: Chunpeng Zhou,Kangjie Ning,Qianqian Shen,Sheng Zhou,Zhi Yu,Haishuai Wang
关键词-EN: Visual Foundation Model, recently introduced Segment, Visual Foundation, demonstrated impressive capabilities, diverse natural image
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
*备注: Under Review

点击查看摘要

Abstract:The recently introduced Segment Anything Model (SAM), a Visual Foundation Model (VFM), has demonstrated impressive capabilities in zero-shot segmentation tasks across diverse natural image datasets. Despite its success, SAM encounters noticeable performance degradation when applied to specific domains, such as medical images. Current efforts to address this issue have involved fine-tuning strategies, intended to bolster the generalizability of the vanilla SAM. However, these approaches still predominantly necessitate the utilization of domain-specific expert-level prompts during the evaluation phase, which severely constrains the model's practicality. To overcome this limitation, we introduce a novel self-prompting based fine-tuning approach, called SAM-SP, tailored for extending the vanilla SAM model. Specifically, SAM-SP leverages the output from the previous iteration of the model itself as prompts to guide subsequent iterations of the model. This self-prompting module endeavors to learn how to generate useful prompts autonomously and alleviates the dependence on expert prompts during the evaluation phase, significantly broadening SAM's applicability. Additionally, we integrate a self-distillation module to enhance the self-prompting process further. Extensive experiments across various domain-specific datasets validate the effectiveness of the proposed SAM-SP. Our SAM-SP not only alleviates the reliance on expert prompts but also exhibits superior segmentation performance compared to the state-of-the-art task-specific segmentation approaches, the vanilla SAM, and SAM-based approaches.

[CV-36] Class-balanced Open-set Semi-supervised Object Detection for Medical Images

链接: https://arxiv.org/abs/2408.12355
作者: Zhanyun Lu,Renshu Gu,Huimin Cheng,Siyu Pang,Mingyu Xu,Peifang Xu,Yaqi Wang,Yuichiro Kinoshita,Juan Ye,Gangyong Jia,Qing Wu
关键词-EN: Semi-Supervised Object Detection, Object Detection, utilize unlabeled data, open-set semi-supervised object, Semi-Supervised Object
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Medical image datasets in the real world are often unlabeled and imbalanced, and Semi-Supervised Object Detection (SSOD) can utilize unlabeled data to improve an object detector. However, existing approaches predominantly assumed that the unlabeled data and test data do not contain out-of-distribution (OOD) classes. The few open-set semi-supervised object detection methods have two weaknesses: first, the class imbalance is not considered; second, the OOD instances are distinguished and simply discarded during pseudo-labeling. In this paper, we consider the open-set semi-supervised object detection problem which leverages unlabeled data that contain OOD classes to improve object detection for medical images. Our study incorporates two key innovations: Category Control Embed (CCE) and out-of-distribution Detection Fusion Classifier (OODFC). CCE is designed to tackle dataset imbalance by constructing a Foreground information Library, while OODFC tackles open-set challenges by integrating the "unknown" information into basic pseudo-labels. Our method surpasses state-of-the-art SSOD performance, achieving a 4.25 mAP improvement on the public Parasite dataset.

[CV-37] GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections

链接: https://arxiv.org/abs/2408.12352
作者: Shiyue Zhang,Zheng Chong,Xujie Zhang,Hanhui Li,Yuhao Cheng,Yiqiang Yan,Xiaodan Liang
关键词-EN: bring revolutionary innovation, models bring revolutionary, fields of arts, bring revolutionary, revolutionary innovation
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:General text-to-image models bring revolutionary innovation to the fields of arts, design, and media. However, when applied to garment generation, even the state-of-the-art text-to-image models suffer from fine-grained semantic misalignment, particularly concerning the quantity, position, and interrelations of garment components. Addressing this, we propose GarmentAligner, a text-to-garment diffusion model trained with retrieval-augmented multi-level corrections. To achieve semantic alignment at the component level, we introduce an automatic component extraction pipeline to obtain spatial and quantitative information of garment components from corresponding images and captions. Subsequently, to exploit component relationships within the garment images, we construct retrieval subsets for each garment by retrieval augmentation based on component-level similarity ranking and conduct contrastive learning to enhance the model perception of components from positive and negative samples. To further enhance the alignment of components across semantic, spatial, and quantitative granularities, we propose the utilization of multi-level correction losses that leverage detailed component information. The experimental findings demonstrate that GarmentAligner achieves superior fidelity and fine-grained semantic alignment when compared to existing competitors.

[CV-38] VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding

链接: https://arxiv.org/abs/2408.12340
作者: Yujie Liang,Xiaobin Hu,Boyuan Jiang,Donghao Luo,Kai WU,Wenhui Han,Taisong Jin,Chengjie Wang
关键词-EN: clothing regions occluded, image virtual try-on, diffusion-based image virtual, made considerable progress, try-on performance
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Although diffusion-based image virtual try-on has made considerable progress, emerging approaches still struggle to effectively address the issue of hand occlusion (i.e., clothing regions occluded by the hand part), leading to a notable degradation of the try-on performance. To tackle this issue, widely existing in real-world scenarios, we propose VTON-HandFit, leveraging the power of hand priors to reconstruct the appearance and structure for hand occlusion cases. Firstly, we tailor a Handpose Aggregation Net that uses a ControlNet-based structure to explicitly and adaptively encode the global hand and pose priors. Besides, to fully exploit the hand-related structure and appearance information, we propose a Hand-feature Disentanglement Embedding module to disentangle the hand priors into hand structure-parametric and visual-appearance features, and customize a masked cross attention for further decoupled feature embedding. Lastly, we customize a hand-canny constraint loss to better learn the structure edge knowledge from the hand template of the model image. VTON-HandFit outperforms the baselines in qualitative and quantitative evaluations on the public dataset and our self-collected hand-occlusion Handfit-3K dataset, particularly for the arbitrary hand pose occlusion cases in real-world scenarios. Code and dataset will be made publicly available.

[CV-39] Multimodal Foundational Models for Unsupervised 3D General Obstacle Detection

链接: https://arxiv.org/abs/2408.12322
作者: Tamás Matuszka,Péter Hajas,Dávid Szeghy
关键词-EN: Current autonomous driving, autonomous driving perception, Current autonomous, perception models primarily, models primarily rely
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Current autonomous driving perception models primarily rely on supervised learning with predefined categories. However, these models struggle to detect general obstacles not included in the fixed category set due to their variability and numerous edge cases. To address this issue, we propose a combination of multimodal foundational model-based obstacle segmentation with traditional unsupervised computational geometry-based outlier detection. Our approach operates offline, allowing us to leverage non-causality, and utilizes training-free methods. This enables the detection of general obstacles in 3D without the need for expensive retraining. To overcome the limitations of publicly available obstacle detection datasets, we collected and annotated our dataset, which includes various obstacles even in distant regions.

[CV-40] MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model

链接: https://arxiv.org/abs/2408.12321
作者: Chaoya Jiang,Jia Hongrui,Haiyang Xu,Wei Ye,Mengfan Dong,Ming Yan,Ji Zhang,Fei Huang,Shikun Zhang
关键词-EN: Multimodal Large Language, Large Language Models, Multi-granularity Visual Encoding, Encoding framework designed, Multimodal Large
类目: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
*备注:

点击查看摘要

Abstract:This paper presents MaVEn, an innovative Multi-granularity Visual Encoding framework designed to enhance the capabilities of Multimodal Large Language Models (MLLMs) in multi-image reasoning. Current MLLMs primarily focus on single-image visual understanding, limiting their ability to interpret and integrate information across multiple images. MaVEn addresses this limitation by combining discrete visual symbol sequences, which abstract coarse-grained semantic concepts, with traditional continuous representation sequences that model fine-grained features. This dual approach bridges the semantic gap between visual and textual data, thereby improving the model's ability to process and interpret information from multiple images effectively. Additionally, we design a dynamic reduction mechanism for long-sequence continuous features to enhance multi-image processing efficiency. Experimental results demonstrate that MaVEn significantly enhances MLLMs' understanding in complex multi-image scenarios, while also improving performance in single-image contexts.
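The hybrid of discrete symbols and continuous features can be illustrated with a toy vector-quantization step: each patch feature keeps its continuous vector (fine-grained) and additionally gets the index of its nearest codebook entry as a coarse discrete symbol. All names and the tiny codebook below are ours, not from the paper:

```python
import numpy as np

def hybrid_encode(patch_feats, codebook):
    """Pair each continuous patch feature with the index of its nearest
    codebook vector, giving a coarse discrete symbol per patch."""
    # Squared Euclidean distance from every feature to every codebook entry.
    d2 = ((patch_feats[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    symbols = d2.argmin(axis=1)
    return symbols, patch_feats

codebook = np.array([[1.0, 0.0], [0.0, 1.0]])            # two toy "concepts"
feats = np.array([[0.9, 0.1], [0.2, 0.8], [1.1, -0.1]])  # three patch features
symbols, cont = hybrid_encode(feats, codebook)
print(symbols.tolist())  # → [0, 1, 0]
```

The discrete sequence is short and abstract (suitable for cross-image reasoning), while the continuous sequence retains the detail needed for fine-grained understanding.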

[CV-41] Adapt CLIP as Aggregation Instructor for Image Dehazing

链接: https://arxiv.org/abs/2408.12317
作者: Xiaozhe Zhang,Fengying Xie,Haidong Ding,Linpeng Pan,Zhenwei Shi
关键词-EN: rich semantic prior, semantic prior encapsulated, dehazing methods suffer, downstream tasks, suffer from limited
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: 12 pages, 6 figures

点击查看摘要

Abstract:Most dehazing methods suffer from limited receptive field and do not explore the rich semantic prior encapsulated in vision-language models, which have proven effective in downstream tasks. In this paper, we introduce CLIPHaze, a pioneering hybrid framework that synergizes the efficient global modeling of Mamba with the prior knowledge and zero-shot capabilities of CLIP to address both issues simultaneously. Specifically, our method employs parallel state space model and window-based self-attention to obtain global contextual dependency and local fine-grained perception, respectively. To seamlessly aggregate information from both paths, we introduce CLIP-instructed Aggregation Module (CAM). For non-homogeneous and homogeneous haze, CAM leverages zero-shot estimated haze density map and high-quality image embedding without degradation information to explicitly and implicitly determine the optimal neural operation range for each pixel, thereby adaptively fusing two paths with different receptive fields. Extensive experiments on various benchmarks demonstrate that CLIPHaze achieves state-of-the-art (SOTA) performance, particularly in non-homogeneous haze. Code will be made publicly available after acceptance.

[CV-42] Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement

链接: https://arxiv.org/abs/2408.12316
作者: Lingyu Zhu,Wenhan Yang,Baoliang Chen,Hanwei Zhu,Zhangkai Ni,Qi Mao,Shiqi Wang
关键词-EN: Obtaining pairs, raises technical issues, low-light video enhancement, enhancing low-light videos, raises technical
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Obtaining pairs of low/normal-light videos with motion is more challenging than obtaining still images, which raises technical issues and makes unpaired learning a critical technical route. This paper makes endeavors in the direction of learning for low-light video enhancement without using paired ground truth. Compared to low-light image enhancement, enhancing low-light videos is more difficult due to the intertwined effects of noise, exposure, and contrast in the spatial domain, jointly with the need for temporal coherence. To address the above challenge, we propose the Unrolled Decomposed Unpaired Network (UDU-Net) for enhancing low-light videos by unrolling the optimization functions into a deep network to decompose the signal into spatial and temporal-related factors, which are updated iteratively. Firstly, we formulate low-light video enhancement as a Maximum A Posteriori estimation (MAP) problem with carefully designed spatial and temporal visual regularization. Then, via unrolling the problem, the optimization of the spatial and temporal constraints can be decomposed into different steps and updated in a stage-wise manner. From the spatial perspective, the designed Intra subnet leverages unpaired prior information from expert photographic retouching to adjust the statistical distribution. Additionally, we introduce a novel mechanism that integrates human perception feedback to guide network optimization, suppressing over/under-exposure conditions. Meanwhile, to address the issue from the temporal perspective, the designed Inter subnet fully exploits temporal cues in progressive optimization, which helps achieve improved temporal consistency in enhancement results. Consequently, the proposed method achieves superior performance to state-of-the-art methods in video illumination, noise suppression, and temporal consistency across outdoor and indoor scenes.

[CV-43] MakeupAttack: Feature Space Black-box Backdoor Attack on Face Recognition via Makeup Transfer

链接: https://arxiv.org/abs/2408.12312
作者: Ming Sun,Lihua Jing,Zixuan Zhu,Rui Wang
关键词-EN: deep neural networks, face recognition, neural networks, pose a significant, significant threat
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Backdoor attacks pose a significant threat to the training process of deep neural networks (DNNs). As a widely-used DNN-based application in real-world scenarios, face recognition systems, once implanted with a backdoor, may cause serious consequences. Backdoor research on face recognition is still in its early stages, and the existing backdoor triggers are relatively simple and visible. Furthermore, due to the perceptibility, diversity, and similarity of facial datasets, many state-of-the-art backdoor attacks lose effectiveness on face recognition tasks. In this work, we propose a novel feature space backdoor attack against face recognition via makeup transfer, dubbed MakeupAttack. In contrast to many feature space attacks that demand full access to target models, our method only requires model queries, adhering to black-box attack principles. In our attack, we design an iterative training paradigm to learn the subtle features of the proposed makeup-style trigger. Additionally, MakeupAttack promotes trigger diversity using the adaptive selection method, dispersing the feature distribution of malicious samples to bypass existing defense methods. Extensive experiments were conducted on two widely-used facial datasets targeting multiple models. The results demonstrate that our proposed attack method can bypass existing state-of-the-art defenses while maintaining effectiveness, robustness, naturalness, and stealthiness, without compromising model performance.

[CV-44] AT-SNN: Adaptive Tokens for Vision Transformer on Spiking Neural Network

链接: https://arxiv.org/abs/2408.12293
作者: Donghwa Kang,Youngmoon Lee,Eun-Kyu Lee,Brent Kang,Jinkyu Lee,Hyeongboo Baek
关键词-EN: spiking neural networks, reducing power consumption, neural networks, convolutional neural networks, orthogonally developed
类目: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
*备注: 8 pages

点击查看摘要

Abstract:In the training and inference of spiking neural networks (SNNs), direct training and lightweight computation methods have been orthogonally developed, aimed at reducing power consumption. However, only a limited number of approaches have applied these two mechanisms simultaneously and failed to fully leverage the advantages of SNN-based vision transformers (ViTs) since they were originally designed for convolutional neural networks (CNNs). In this paper, we propose AT-SNN, designed to dynamically adjust the number of tokens processed during inference in SNN-based ViTs with direct training, wherein power consumption is proportional to the number of tokens. We first demonstrate the applicability of adaptive computation time (ACT), previously limited to RNNs and ViTs, to SNN-based ViTs, enhancing it to discard less informative spatial tokens selectively. Also, we propose a new token-merge mechanism that relies on the similarity of tokens, which further reduces the number of tokens while enhancing accuracy. We implement AT-SNN on Spikformer and show the effectiveness of AT-SNN in achieving high energy efficiency and accuracy compared to state-of-the-art approaches on the image classification tasks CIFAR-10, CIFAR-100, and TinyImageNet. For example, our approach uses up to 42.4% fewer tokens than the existing best-performing method on CIFAR-100, while conserving higher accuracy.
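The similarity-based token merge can be sketched as a greedy loop: repeatedly find the most cosine-similar token pair and average them until a target budget is reached, since the token count drives the SNN's energy cost. This is a simplified stand-in for AT-SNN's mechanism (which operates on spike features inside the ViT); the token values and budget below are invented for illustration.

```python
import math

def cos_sim(a, b):
    # Cosine similarity between two feature vectors.
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def merge_tokens(tokens, keep):
    # Greedily merge the most similar pair (averaging their features)
    # until only `keep` tokens remain.
    tokens = [t[:] for t in tokens]
    while len(tokens) > keep:
        best, bi, bj = -2.0, 0, 1
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens)):
                s = cos_sim(tokens[i], tokens[j])
                if s > best:
                    best, bi, bj = s, i, j
        merged = [(x + y) / 2 for x, y in zip(tokens[bi], tokens[bj])]
        tokens = [t for k, t in enumerate(tokens) if k not in (bi, bj)] + [merged]
    return tokens

# Two near-duplicate pairs of tokens collapse to two representatives.
toks = [[1, 0], [0.99, 0.1], [0, 1], [0.1, 0.95]]
out = merge_tokens(toks, 2)
```

The greedy pairwise scan is O(n^2) per merge, which is fine for a sketch; real implementations typically use bipartite matching to merge many pairs per layer in one pass.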

[CV-45] Towards Deconfounded Image-Text Matching with Causal Inference ACM-MM

链接: https://arxiv.org/abs/2408.12292
作者: Wenhui Li,Xinqi Su,Dan Song,Lanjun Wang,Kun Zhang,An-An Liu
关键词-EN: shown remarkable performance, image-text matching, Prior image-text matching, Structural Causal Models, image-text matching model
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注: ACM MM

点击查看摘要

Abstract:Prior image-text matching methods have shown remarkable performance on many benchmark datasets, but most of them overlook the bias in the dataset, which exists in both intra-modal and inter-modal forms, and tend to learn spurious correlations that severely degrade the generalization ability of the model. Furthermore, these methods often incorporate biased external knowledge from large-scale datasets as prior knowledge into the image-text matching model, which inevitably forces the model to further learn biased associations. To address the above limitations, this paper first utilizes Structural Causal Models (SCMs) to illustrate how intra- and inter-modal confounders damage image-text matching. Then, we employ backdoor adjustment to propose an innovative Deconfounded Causal Inference Network (DCIN) for the image-text matching task. DCIN (1) decomposes the intra- and inter-modal confounders and incorporates them into the encoding stage of visual and textual features, effectively eliminating the spurious correlations during image-text matching, and (2) uses causal inference to mitigate biases of external knowledge. Consequently, the model can learn causality instead of spurious correlations caused by dataset bias. Extensive experiments on two well-known benchmark datasets, i.e., Flickr30K and MSCOCO, demonstrate the superiority of our proposed method.
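Backdoor adjustment, which DCIN builds on, replaces the observational P(Y|X) with the interventional P(Y|do(X)) = Σ_z P(Y|X, Z=z) P(Z=z), severing the confounder's influence on X. A minimal numeric sketch with a single binary confounder (all probabilities are invented for illustration, not from the paper):

```python
def backdoor_adjust(p_y_given_xz, p_z):
    # P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z)
    xs = {x for (x, _) in p_y_given_xz}
    return {x: sum(p_y_given_xz[(x, z)] * p_z[z] for z in p_z) for x in xs}

def observational(p_y_given_xz, p_z_given_x):
    # P(Y=1 | X=x) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z | X=x)
    xs = {x for (x, _) in p_y_given_xz}
    return {x: sum(p_y_given_xz[(x, z)] * p_z_given_x[(x, z)] for z in (0, 1))
            for x in xs}

p_z = {0: 0.5, 1: 0.5}                       # confounder prior (e.g. dataset bias)
p_y_given_xz = {(0, 0): 0.1, (0, 1): 0.5,    # P(match | X, Z)
                (1, 0): 0.2, (1, 1): 0.6}
p_z_given_x = {(0, 0): 0.9, (0, 1): 0.1,     # confounder correlates with X
               (1, 0): 0.1, (1, 1): 0.9}

causal = backdoor_adjust(p_y_given_xz, p_z)
naive = observational(p_y_given_xz, p_z_given_x)
```

Here the observational gap naive[1] - naive[0] is inflated well beyond the causal gap causal[1] - causal[0], which is exactly the spurious correlation deconfounding removes.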

[CV-46] Subsurface Scattering for 3D Gaussian Splatting

链接: https://arxiv.org/abs/2408.12282
作者: Jan-Niklas Dihlmann,Arjun Majumdar,Andreas Engelhardt,Raphael Braun,Hendrik P.A. Lensch
关键词-EN: significant challenge due, complex light transport, light transport beneath, Gaussian Splatting introduced, present a significant
类目: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
*备注: Project page: this https URL

点击查看摘要

Abstract:3D reconstruction and relighting of objects made from scattering materials present a significant challenge due to the complex light transport beneath the surface. 3D Gaussian Splatting introduced high-quality novel view synthesis at real-time speeds. While 3D Gaussians efficiently approximate an object’s surface, they fail to capture the volumetric properties of subsurface scattering. We propose a framework for optimizing an object’s shape together with the radiance transfer field given multi-view OLAT (one light at a time) data. Our method decomposes the scene into an explicit surface represented as 3D Gaussians, with a spatially varying BRDF, and an implicit volumetric representation of the scattering component. A learned incident light field accounts for shadowing. We optimize all parameters jointly via ray-traced differentiable rendering. Our approach enables material editing, relighting and novel view synthesis at interactive rates. We show successful application on synthetic data and introduce a newly acquired multi-view multi-light dataset of objects in a light-stage setup. Compared to previous work we achieve comparable or better results at a fraction of optimization and rendering time while enabling detailed control over material attributes. Project page this https URL

[CV-47] Epsilon: Exploring Comprehensive Visual-Semantic Projection for Multi-Label Zero-Shot Learning

链接: https://arxiv.org/abs/2408.12253
作者: Ziming Liu,Jingcai Guo,Song Guo,Xiaocheng Lu
关键词-EN: recognize multiple unseen, multiple unseen classes, multi-label scenario, auxiliary knowledge, investigates a challenging
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: arXiv admin note: substantial text overlap with arXiv:2309.00923

点击查看摘要

Abstract:This paper investigates a challenging problem of zero-shot learning in the multi-label scenario (MLZSL), wherein the model is trained to recognize multiple unseen classes within a sample (e.g., an image) based on seen classes and auxiliary knowledge, e.g., semantic information. Existing methods usually resort to analyzing the relationship of various seen classes residing in a sample from the dimension of spatial or semantic characteristics and transferring the learned model to unseen ones. However, they neglect the integrity of local and global features. Although the use of the attention structure will accurately locate local features, especially objects, it will significantly lose its integrity, and the relationship between classes will also be affected. Rough processing of global features will also directly affect comprehensiveness. This neglect will make the model lose its grasp of the main components of the image. Relying only on the local existence of seen classes during the inference stage introduces unavoidable bias. In this paper, we propose a novel and comprehensive visual-semantic framework for MLZSL, dubbed Epsilon, to fully make use of such properties and enable a more accurate and robust visual-semantic projection. In terms of spatial information, we achieve effective refinement by group aggregating image features into several semantic prompts. It can aggregate semantic information rather than class information, preserving the correlation between semantics. In terms of global semantics, we use global forward propagation to collect as much information as possible to ensure that semantics are not omitted. Experiments on large-scale MLZSL benchmark datasets NUS-Wide and Open-Images-v4 demonstrate that the proposed Epsilon outperforms other state-of-the-art methods with large margins.

[CV-48] PRG: Prompt-Based Distillation Without Annotation via Proxy Relational Graph

链接: https://arxiv.org/abs/2408.12248
作者: Yijin Xu,Jialun Liu,Hualiang Wei,Wenhui Li
关键词-EN: Large Foundation Models, Large Foundation, manually annotated data, require manually annotated, student model
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:In this paper, we propose a new distillation method for extracting knowledge from Large Foundation Models (LFM) into lightweight models, introducing a novel supervision mode that does not require manually annotated data. While LFMs exhibit exceptional zero-shot classification abilities across datasets, relying solely on LFM-generated embeddings for distillation poses two main challenges: LFM’s task-irrelevant knowledge and the high density of features. The transfer of task-irrelevant knowledge could compromise the student model’s discriminative capabilities, and the high density of features within target domains obstructs the extraction of discriminative knowledge essential for the task. To address this issue, we introduce the Proxy Relational Graph (PRG) method. We initially extract task-relevant knowledge from LFMs by calculating a weighted average of logits obtained through text prompt embeddings. Then we construct sample-class proxy graphs for LFM and student models, respectively, to model the correlation between samples and class proxies. Then, we achieve the distillation of selective knowledge by aligning the relational graphs produced by both the LFM and the student model. Specifically, the distillation from LFM to the student model is achieved through two types of alignment: 1) aligning the sample nodes produced by the student model with those produced by the LFM, and 2) aligning the edge relationships in the student model’s graph with those in the LFM’s graph. Our experimental results validate the effectiveness of PRG, demonstrating its ability to leverage the extensive knowledge base of LFMs while skillfully circumventing their inherent limitations in focused learning scenarios. Notably, in our annotation-free framework, PRG achieves an accuracy of 76.23% (T: 77.9%) on CIFAR-100 and 72.44% (T: 75.3%) on the ImageNet-1K.
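The first PRG step, "a weighted average of logits obtained through text prompt embeddings", can be sketched as a CLIP-style zero-shot classifier: per-template logits are dot products between an image embedding and each class's prompt embedding, then averaged across templates with weights. All embeddings and weights below are made up for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def prompt_weighted_logits(image_emb, prompt_embs, weights):
    # prompt_embs[t][c]: text embedding of prompt template t for class c.
    # Weighting per-template logits is one way to keep task-relevant
    # knowledge while damping irrelevant prompt wording.
    n_cls = len(prompt_embs[0])
    return [sum(w * dot(image_emb, prompt_embs[t][c])
                for t, w in enumerate(weights))
            for c in range(n_cls)]

image_emb = [0.9, 0.1]
prompt_embs = [                       # two templates x two classes
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.95, 0.05], [0.05, 0.95]],
]
logits = prompt_weighted_logits(image_emb, prompt_embs, weights=[0.6, 0.4])
probs = softmax(logits)
```

The resulting soft distribution is what a student model would then be aligned against via the sample-class proxy graphs.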

[CV-49] OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion

链接: https://arxiv.org/abs/2408.12246
作者: Guoting Wei,Xia Yuan,Yu Liu,Zhenhao Shang,Kelu Yao,Chao Li,Qingsen Yan,Chunxia Zhao,Haokui Zhang,Rong Xiao
关键词-EN: wide application requirements, Aerial object detection, text-guided Fusion Decoder, Aerial object, application requirements
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Aerial object detection has been a hot topic for many years due to its wide application requirements. However, most existing approaches can only handle predefined categories, which limits their applicability for the open scenarios in real-world. In this paper, we extend aerial object detection to open scenarios by exploiting the relationship between image and text, and propose OVA-DETR, a high-efficiency open-vocabulary detector for aerial images. Specifically, based on the idea of image-text alignment, we propose region-text contrastive loss to replace the category regression loss in the traditional detection framework, which breaks the category limitation. Then, we propose Bidirectional Vision-Language Fusion (Bi-VLF), which includes a dual-attention fusion encoder and a multi-level text-guided Fusion Decoder. The dual-attention fusion encoder enhances the feature extraction process in the encoder part. The multi-level text-guided Fusion Decoder is designed to improve the detection ability for small objects, which frequently appear in aerial object detection scenarios. Experimental results on three widely used benchmark datasets show that our proposed method significantly improves the mAP and recall, while enjoying faster inference speed. For instance, in zero shot detection experiments on DIOR, the proposed OVA-DETR outperforms DescReg and YOLO-World by 37.4% and 33.1%, respectively, while achieving 87 FPS inference speed, which is 7.9x faster than DescReg and 3x faster than YOLO-world. The code is available at this https URL.

[CV-50] Scalable Autoregressive Image Generation with Mamba

链接: https://arxiv.org/abs/2408.12245
作者: Haopeng Li,Jinyue Yang,Kexin Wang,Xuerui Qiu,Yuhong Chou,Xin Li,Guoqi Li
关键词-EN: Mamba architecture, Mamba, commonly utilized Transformers, AiM employs Mamba, AiM
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: 9 pages, 8 figures

点击查看摘要

Abstract:We introduce AiM, an autoregressive (AR) image generative model based on Mamba architecture. AiM employs Mamba, a novel state-space model characterized by its exceptional performance for long-sequence modeling with linear time complexity, to supplant the commonly utilized Transformers in AR image generation models, aiming to achieve both superior generation quality and enhanced inference speed. Unlike existing methods that adapt Mamba to handle two-dimensional signals via multi-directional scan, AiM directly utilizes the next-token prediction paradigm for autoregressive image generation. This approach circumvents the need for extensive modifications to enable Mamba to learn 2D spatial representations. By implementing straightforward yet strategically targeted modifications for visual generative tasks, we preserve Mamba’s core structure, fully exploiting its efficient long-sequence modeling capabilities and scalability. We provide AiM models in various scales, with parameter counts ranging from 148M to 1.3B. On the ImageNet1K 256*256 benchmark, our best AiM model achieves a FID of 2.21, surpassing all existing AR models of comparable parameter counts and demonstrating significant competitiveness against diffusion models, with 2 to 10 times faster inference speed. Code is available at this https URL
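The "next-token prediction paradigm" AiM uses can be sketched as a plain raster-order sampling loop over a flattened token grid: no 2-D multi-directional scan is needed, each token is drawn conditioned on the prefix. The `toy_model` below is a hypothetical stand-in for the Mamba backbone, not AiM itself.

```python
import random

def sample_image_tokens(model, h, w, seed=0):
    # Raster-order next-token prediction: token i is sampled from
    # P(token_i | token_0 .. token_{i-1}).
    rng = random.Random(seed)
    seq = []
    for _ in range(h * w):
        probs = model(seq)
        r, acc = rng.random(), 0.0
        for tok, p in enumerate(probs):
            acc += p
            if r <= acc:
                seq.append(tok)
                break
        else:  # guard against float round-off
            seq.append(len(probs) - 1)
    # Reshape the 1-D token sequence back into an h x w grid.
    return [seq[i * w:(i + 1) * w] for i in range(h)]

def toy_model(prefix, vocab=4):
    # Stand-in sequence model: repeats the previous token with prob 0.9,
    # producing locally smooth "images".
    if not prefix:
        return [1.0 / vocab] * vocab
    probs = [0.1 / (vocab - 1)] * vocab
    probs[prefix[-1]] = 0.9
    return probs

grid = sample_image_tokens(toy_model, h=4, w=4)
```

In the real model the sampled token ids would then be decoded to pixels by a VQ decoder; the linear-time state-space backbone is what makes long raster sequences cheap.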

[CV-51] BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking

链接: https://arxiv.org/abs/2408.12232
作者: Hanzheng Wang,Wei Li,Xiang-Gen Xia,Qian Du
关键词-EN: existing HOT datasets, HOT datasets, exhibited potential, HOT, spectral
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Hyperspectral object tracking (HOT) has exhibited potential in various applications, particularly in scenes where objects are camouflaged. Existing trackers can effectively retrieve objects via band regrouping because of the bias in existing HOT datasets, where most objects tend to have distinguishing visual appearances rather than spectral characteristics. This bias allows the tracker to directly use the visual features obtained from the false-color images generated by hyperspectral images without the need to extract spectral features. To tackle this bias, we find that the tracker should focus on the spectral information when object appearance is unreliable. Thus, we provide a new task called hyperspectral camouflaged object tracking (HCOT) and meticulously construct a large-scale HCOT dataset, termed BihoT, which consists of 41,912 hyperspectral images covering 49 video sequences. The dataset covers various artificial camouflage scenes where objects have similar appearances, diverse spectrums, and frequent occlusion, making it a very challenging dataset for HCOT. Besides, a simple but effective baseline model, named spectral prompt-based distractor-aware network (SPDAN), is proposed, comprising a spectral embedding network (SEN), a spectral prompt-based backbone network (SPBN), and a distractor-aware module (DAM). Specifically, the SEN extracts spectral-spatial features via 3-D and 2-D convolutions. Then, the SPBN fine-tunes powerful RGB trackers with spectral prompts and alleviates the insufficiency of training samples. Moreover, the DAM utilizes a novel statistic to capture the distractor caused by occlusion from objects and background. Extensive experiments demonstrate that our proposed SPDAN achieves state-of-the-art performance on the proposed BihoT and other HOT datasets.

[CV-52] Computer-Aided Fall Recognition Using a Three-Stream Spatial-Temporal GCN Model with Adaptive Feature Aggregation

链接: https://arxiv.org/abs/2408.12211
作者: Jungpil Shin,Abu Saleh Musa Miah,Rei Egawa,Koki Hirooka,Md. Al Mehedi Hasan,Yoichi Tomioka,Yong Seok Hwang
关键词-EN: fall detection, fall detection system, paramount in modern, lead to severe, severe injuries
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:The prevention of falls is paramount in modern healthcare, particularly for the elderly, as falls can lead to severe injuries or even fatalities. Additionally, the growing incidence of falls among the elderly, coupled with the urgent need to prevent suicide attempts resulting from medication overdose, underscores the critical importance of accurate and efficient fall detection methods. In this scenario, a computer-aided fall detection system is indispensable for saving elderly people’s lives worldwide. Many researchers have been working to develop fall detection systems. However, the existing fall detection systems often struggle with issues such as unsatisfactory performance accuracy, limited robustness, high computational complexity, and sensitivity to environmental factors due to a lack of effective features. In response to these challenges, this paper proposes a novel three-stream spatial-temporal feature-based fall detection system. Our system incorporates joint skeleton-based spatial and temporal Graph Convolutional Network (GCN) features, joint motion-based spatial and temporal GCN features, and residual connections-based features. Each stream employs adaptive graph-based feature aggregation and consecutive separable convolutional neural networks (Sep-TCN), significantly reducing computational complexity and model parameters compared to prior systems. Experimental results across multiple datasets demonstrate the superior effectiveness and efficiency of our proposed system, with accuracies of 99.51%, 99.15%, 99.79% and 99.85% achieved on the ImViA, UR-Fall, Fall-UP and FU-Kinect datasets, respectively. The remarkable performance of our system highlights its superiority, efficiency, and generalizability in real-world fall detection scenarios, offering significant advancements in healthcare and societal well-being.
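A single skeleton graph-convolution step of the kind these GCN streams stack is H' = D^{-1}(A + I) H W: each joint averages its own features with those of its bone-connected neighbours, then applies a linear map. The 3-joint chain and identity weight below are illustrative, not the paper's skeleton layout.

```python
def graph_conv(feats, edges, weight):
    # One layer of H' = D^{-1} (A + I) H W over skeleton joints.
    n, d_in, d_out = len(feats), len(feats[0]), len(weight[0])
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A + I
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    out = []
    for i in range(n):
        deg = sum(adj[i])
        # Mean-aggregate the neighbourhood (including self), then project.
        agg = [sum(adj[i][k] * feats[k][d] for k in range(n)) / deg
               for d in range(d_in)]
        out.append([sum(agg[d] * weight[d][e] for d in range(d_in))
                    for e in range(d_out)])
    return out

# A 3-joint chain (say hip-knee-ankle) with 2-D features, identity weight.
joints = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bones = [(0, 1), (1, 2)]
eye = [[1.0, 0.0], [0.0, 1.0]]
h1 = graph_conv(joints, bones, eye)
```

A temporal GCN stream applies the same idea along the time axis (edges connect the same joint in adjacent frames), and the motion stream feeds frame-to-frame joint differences instead of raw coordinates.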

[CV-53] Transientangelo: Few-Viewpoint Surface Reconstruction Using Single-Photon Lidar

链接: https://arxiv.org/abs/2408.12191
作者: Weihan Luo,Anagh Malik,David B. Lindell
关键词-EN: lidar, backscattered light, raw measurements, light, lidar system
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:We consider the problem of few-viewpoint 3D surface reconstruction using raw measurements from a lidar system. Lidar captures 3D scene geometry by emitting pulses of light to a target and recording the speed-of-light time delay of the reflected light. However, conventional lidar systems do not output the raw, captured waveforms of backscattered light; instead, they pre-process these data into a 3D point cloud. Since this procedure typically does not accurately model the noise statistics of the system, exploit spatial priors, or incorporate information about downstream tasks, it ultimately discards useful information that is encoded in raw measurements of backscattered light. Here, we propose to leverage raw measurements captured with a single-photon lidar system from multiple viewpoints to optimize a neural surface representation of a scene. The measurements consist of time-resolved photon count histograms, or transients, which capture information about backscattered light at picosecond time scales. Additionally, we develop new regularization strategies that improve robustness to photon noise, enabling accurate surface reconstruction with as few as 10 photons per pixel. Our method outperforms other techniques for few-viewpoint 3D reconstruction based on depth maps, point clouds, or conventional lidar as demonstrated in simulation and with captured data.
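The "transients" the method consumes are per-time-bin photon-count histograms: the return pulse lands in the bin matching the round-trip time 2·depth/c, and counts are Poisson-distributed over signal plus dark counts. A toy simulation and the naive max-bin depth estimate (the bin width, flux, and dark-count rate are invented; the paper optimizes a neural surface against such data rather than taking an argmax):

```python
import random

C = 3.0e8  # speed of light (m/s)

def poisson(lam, rng):
    # Knuth's algorithm; fine for the small rates used here.
    L, k, p = pow(2.718281828459045, -lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_transient(depth_m, n_bins=64, bin_s=1e-10, flux=20.0, seed=0):
    # Per-time-bin photon counts: `flux` expected photons in the bin at
    # the round-trip time 2*depth/c, a small dark-count rate elsewhere.
    rng = random.Random(seed)
    peak = int(round(2 * depth_m / C / bin_s))
    return [poisson(flux if b == peak else 0.05, rng) for b in range(n_bins)]

def depth_from_transient(hist, bin_s=1e-10):
    # Max-count bin -> round-trip time -> depth (a crude matched filter).
    peak = max(range(len(hist)), key=lambda b: hist[b])
    return peak * bin_s * C / 2.0

hist = simulate_transient(0.45)
est = depth_from_transient(hist)
```

With 100 ps bins the depth resolution of this naive estimator is 1.5 cm per bin, which is why the paper's probabilistic treatment of the raw counts matters at low photon budgets.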

[CV-54] Rebalancing Multi-Label Class-Incremental Learning

链接: https://arxiv.org/abs/2408.12161
作者: Kaile Du,Yifan Zhou,Fan Lyu,Yuyang Li,Junzhou Xie,Yixi Shen,Fuyuan Hu,Guangcan Liu
关键词-EN: real-world multi-label applications, Multi-label class-incremental learning, retaining previously learned, learned knowledge continuously, previously learned knowledge
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Multi-label class-incremental learning (MLCIL) is essential for real-world multi-label applications, allowing models to learn new labels while retaining previously learned knowledge continuously. However, recent MLCIL approaches can only achieve suboptimal performance due to the oversight of the positive-negative imbalance problem, which manifests at both the label and loss levels because of the task-level partial label issue. The imbalance at the label level arises from the substantial absence of negative labels, while the imbalance at the loss level stems from the asymmetric contributions of the positive and negative loss parts to the optimization. To address the issue above, we propose a Rebalance framework for both the Loss and Label levels (RebLL), which integrates two key modules: asymmetric knowledge distillation (AKD) and online relabeling (OR). AKD is proposed to rebalance at the loss level by emphasizing the negative label learning in classification loss and down-weighting the contribution of overconfident predictions in distillation loss. OR is designed for label rebalance, which restores the original class distribution in memory by online relabeling the missing classes. Our comprehensive experiments on the PASCAL VOC and MS-COCO datasets demonstrate that this rebalancing strategy significantly improves performance, achieving new state-of-the-art results even with a vanilla CNN backbone.
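One plausible instantiation of the loss-level rebalancing idea (emphasize scarce negative labels, down-weight overconfident predictions) is an asymmetric, focal-scaled binary cross-entropy. The weighting scheme and hyperparameters below are illustrative, not RebLL's exact AKD loss.

```python
import math

def rebalanced_bce(p, y, w_neg=2.0, gamma_neg=2.0, eps=1e-12):
    # Positives: plain -log p. Negatives: up-weighted by w_neg (negative
    # labels are scarce in MLCIL) but focal-scaled by p**gamma_neg, so a
    # confidently-correct negative (small p) contributes almost nothing
    # while a hard negative (large p) dominates the gradient.
    if y == 1:
        return -math.log(p + eps)
    return w_neg * (p ** gamma_neg) * -math.log(1.0 - p + eps)

hard_neg = rebalanced_bce(0.9, 0)   # model wrongly confident it's positive
easy_neg = rebalanced_bce(0.1, 0)   # model correctly confident it's negative
good_pos = rebalanced_bce(0.9, 1)
```

Summed over labels, this shifts the optimization budget toward the under-represented negative side, which is the imbalance the abstract attributes to task-level partial labels.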

[CV-55] TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model

链接: https://arxiv.org/abs/2408.12141
作者: Yuhao Wang,Chao Hao,Yawen Cui,Xinqi Su,Weicheng Xie,Tao Tan,Zitong Yu
关键词-EN: attracted wide attention, large language models, radiology report generation, large language, multi-modal large language
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:The vision-language modeling capability of multi-modal large language models has attracted wide attention from the community. However, in the medical domain, radiology report generation using vision-language models still faces significant challenges due to the imbalanced data distribution caused by numerous negated descriptions in radiology reports and issues such as rough alignment between radiology reports and radiography. In this paper, we propose a truthful radiology report generation framework, namely TRRG, based on stage-wise training for cross-modal disease clue injection into large language models. During the pre-training stage, contrastive learning is employed to enhance the ability of the visual encoder to perceive fine-grained disease details. In the fine-tuning stage, the clue injection module we proposed significantly enhances the disease-oriented perception capability of the large language model by effectively incorporating robust zero-shot disease perception. Finally, through the cross-modal clue interaction module, our model effectively achieves the multi-granular interaction of visual embeddings and an arbitrary number of disease clue embeddings. This significantly enhances the report generation capability and clinical effectiveness of multi-modal large language models in the field of radiology report generation. Experimental results demonstrate that our proposed pre-training and fine-tuning framework achieves state-of-the-art performance in radiology report generation on datasets such as IU-Xray and MIMIC-CXR. Further analysis indicates that our proposed method can effectively enhance the model to perceive diseases and improve its clinical effectiveness.

[CV-56] Diffusion-Based Visual Art Creation: A Survey and New Perspectives

链接: https://arxiv.org/abs/2408.12128
作者: Bingyuan Wang,Qifeng Chen,Zeyu Wang
关键词-EN: underlying domain knowledge, visual art creation, visual art, diffusion-based visual art, domain knowledge
类目: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
*备注: 35 pages, 9 figures

点击查看摘要

Abstract:The integration of generative AI in visual art has revolutionized not only how visual content is created but also how AI interacts with and reflects the underlying domain knowledge. This survey explores the emerging realm of diffusion-based visual art creation, examining its development from both artistic and technical perspectives. We structure the survey into three phases, data feature and framework identification, detailed analyses using a structured coding process, and open-ended prospective outlooks. Our findings reveal how artistic requirements are transformed into technical challenges and highlight the design and application of diffusion-based methods within visual art creation. We also provide insights into future directions from technical and synergistic perspectives, suggesting that the confluence of generative AI and art has shifted the creative paradigm and opened up new possibilities. By summarizing the development and trends of this emerging interdisciplinary area, we aim to shed light on the mechanisms through which AI systems emulate and possibly, enhance human capacities in artistic perception and creativity.

[CV-57] SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models

链接: https://arxiv.org/abs/2408.12114
作者: Youngjoon Yu,Sangyun Chung,Byung-Kwan Lee,Yong Man Ro
关键词-EN: text-aligned vision inputs, Large-scale Vision-Language Models, vision inputs, multi-vision, Large-scale Vision-Language
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: Codes and data are available at this https URL

点击查看摘要

Abstract:Large-scale Vision-Language Models (LVLMs) have significantly advanced with text-aligned vision inputs. They have made remarkable progress in computer vision tasks by aligning text modality with vision inputs. There are also endeavors to incorporate multi-vision sensors beyond RGB, including thermal, depth, and medical X-ray images. However, we observe that current LVLMs view images taken from multi-vision sensors as if they were in the same RGB domain without considering the physical characteristics of multi-vision sensors. They fail to convey the fundamental multi-vision sensor information from the dataset and the corresponding contextual knowledge properly. Consequently, alignment between the information from the actual physical environment and the text is not achieved correctly, making it difficult to answer complex sensor-related questions that consider the physical environment. In this paper, we aim to establish a multi-vision Sensor Perception And Reasoning benchmarK called SPARK that can reduce the fundamental multi-vision sensor information gap between images and multi-vision sensors. We generated 6,248 vision-language test samples automatically to investigate multi-vision sensory perception and multi-vision sensory reasoning on physical sensor knowledge proficiency across different formats, covering different types of sensor-related questions. We utilized these samples to assess ten leading LVLMs. The results showed that most models displayed deficiencies in multi-vision sensory reasoning to varying extents. Codes and data are available at this https URL

[CV-58] ZipGait: Bridging Skeleton and Silhouette with Diffusion Model for Advancing Gait Recognition

链接: https://arxiv.org/abs/2408.12111
作者: Fanxu Min,Qing Cai,Shaoxiang Guo,Yang Yu,Hao Fan,Junyu Dong
关键词-EN: research predominantly focuses, Current gait recognition, recognition research predominantly, extracting appearance features, Current gait
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Current gait recognition research predominantly focuses on extracting appearance features effectively, but the performance is severely compromised by the vulnerability of silhouettes under unconstrained scenes. Consequently, numerous studies have explored how to harness information from various models, particularly by sufficiently utilizing the intrinsic information of skeleton sequences. While these model-based methods have achieved significant performance, there is still a huge gap compared to appearance-based methods, which implies the potential value of bridging silhouettes and skeletons. In this work, we make the first attempt to reconstruct dense body shapes from discrete skeleton distributions via the diffusion model, demonstrating a new approach that connects cross-modal features rather than focusing solely on intrinsic features to improve model-based methods. To realize this idea, we propose a novel gait diffusion model named DiffGait, which has been designed with four specific adaptations suitable for gait recognition. Furthermore, to effectively utilize the reconstructed silhouettes and skeletons, we introduce Perception Gait Integration (PGI) to integrate different gait features through a two-stage process. Incorporating those modifications leads to an efficient model-based gait recognition framework called ZipGait. Through extensive experiments on four public benchmarks, ZipGait demonstrates superior performance, outperforming the state-of-the-art methods by a large margin under both cross-domain and intra-domain settings, while achieving significant plug-and-play performance improvements.

[CV-59] RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data

链接: https://arxiv.org/abs/2408.12109
作者: Chenglong Wang,Yang Gan,Yifu Huo,Yongyu Mu,Murun Yang,Qiaozhi He,Tong Xiao,Chunliang Zhang,Tongran Liu,Quan Du,Di Yang,Jingbo Zhu
关键词-EN: generating misleading content, Large vision-language models, proper visual context, visual reward model, Large vision-language
类目: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
*备注:

点击查看摘要

Abstract:Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face the difficulty arising from the scarcity of visual preference data, which is required to train a visual reward model (VRM). In this work, we continue the line of research. We present a Robust Visual Reward Model (RoVRM) which improves human-preference alignment for LVLMs. RoVRM leverages auxiliary textual preference data through a three-phase progressive training and optimal transport-based preference data selection to effectively mitigate the scarcity of visual preference data. We experiment with RoVRM on the commonly used vision-language tasks based on the LLaVA-1.5-7B and -13B models. Experimental results demonstrate that RoVRM consistently outperforms traditional VRMs. Furthermore, our three-phase progressive training and preference data selection approaches can yield consistent performance gains over ranking-based alignment techniques, such as direct preference optimization.
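The best-of-n sampling the abstract mentions is simple to sketch: draw n candidate responses and keep the one the reward model scores highest. The canned "generator" and keyword-based "reward" below are toy stand-ins for an LVLM and a trained VRM such as RoVRM.

```python
def best_of_n(prompt, generate, reward, n=4):
    # Sample n candidates and return the one the reward model prefers.
    candidates = [generate(prompt, i) for i in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins: canned answers and a reward that penalizes an obvious
# hallucination marker while mildly preferring detail.
CANNED = [
    "The image shows a red bus.",
    "The image shows a red bus parked near a purple dragon.",  # hallucinated
    "A bus.",
    "The image shows a red bus on a city street.",
]

def toy_generate(prompt, i):
    return CANNED[i]

def toy_reward(text):
    score = len(text.split()) * 0.1
    if "dragon" in text:
        score -= 10.0  # hallucination penalty
    return score

best = best_of_n("describe the image", toy_generate, toy_reward)
```

Whether best-of-n actually reduces hallucination then rests entirely on the reward model's quality, which is why the paper invests in making the VRM robust with auxiliary textual preference data.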

[CV-60] Integrating Audio Visual and Semantic Information for Enhanced Multimodal Speaker Diarization

链接: https://arxiv.org/abs/2408.12102
作者: Luyao Cheng,Hui Wang,Siqi Zheng,Yafeng Chen,Rongjie Huang,Qinglin Zhang,Qian Chen,Xihao Li
关键词-EN: transcribed speech content, homogenous partitions based, transcribed speech, human speech, plays a crucial
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
*备注:

点击查看摘要

Abstract:Speaker diarization, the process of segmenting an audio stream or transcribed speech content into homogenous partitions based on speaker identity, plays a crucial role in the interpretation and analysis of human speech. Most existing speaker diarization systems rely exclusively on unimodal acoustic information, making the task particularly challenging due to the innate ambiguities of audio signals. Recent studies have made tremendous efforts towards audio-visual or audio-semantic modeling to enhance performance. However, even the incorporation of up to two modalities often falls short in addressing the complexities of spontaneous and unstructured conversations. To exploit more meaningful dialogue patterns, we propose a novel multimodal approach that jointly utilizes audio, visual, and semantic cues to enhance speaker diarization. Our method elegantly formulates the multimodal modeling as a constrained optimization problem. First, we build insights into the visual connections among active speakers and the semantic interactions within spoken content, thereby establishing abundant pairwise constraints. Then we introduce a joint pairwise constraint propagation algorithm to cluster speakers based on these visual and semantic constraints. This integration effectively leverages the complementary strengths of different modalities, refining the affinity estimation between individual speaker embeddings. Extensive experiments conducted on multiple multimodal datasets demonstrate that our approach consistently outperforms state-of-the-art speaker diarization methods.

[CV-61] A Unified Plug-and-Play Algorithm with Projected Landweber Operator for Split Convex Feasibility Problems

链接: https://arxiv.org/abs/2408.12100
作者: Shuchang Zhang,Hongxia Wang
关键词-EN: inverse imaging problems, recent years, performance in inverse, replacing proximal operators, inverse imaging
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:In recent years, Plug-and-Play (PnP) methods have achieved state-of-the-art performance in inverse imaging problems by replacing proximal operators with denoisers. Based on the proximal gradient method, some theoretical results of PnP have appeared, where an appropriate step size is crucial for convergence analysis. However, in practical applications, applying PnP methods with theoretically guaranteed step sizes is difficult, and these algorithms are limited to Gaussian noise. In this paper, from the perspective of split convex feasibility problems (SCFP), an adaptive PnP algorithm with Projected Landweber Operator (PnP-PLO) is proposed to address these issues. Numerical experiments on image deblurring, super-resolution, and compressed sensing MRI illustrate that PnP-PLO with theoretical guarantees outperforms state-of-the-art methods such as RED and RED-PRO.
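For context, the Plug-and-Play idea this abstract builds on replaces the proximal operator in a proximal gradient iteration with a denoiser. The sketch below is the generic PnP-ISTA scheme with a soft-thresholding "denoiser" on a toy sparse-recovery problem, not the proposed PnP-PLO algorithm; all parameter values are illustrative.

```python
import numpy as np

def pnp_proximal_gradient(A, y, denoiser, step, iters=300):
    """Generic PnP iteration: gradient step on the data term, then a denoiser
    in place of the proximal operator of the regularizer."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)       # gradient of 0.5 * ||Ax - y||^2
        x = denoiser(x - step * grad)  # denoiser plays the role of the prox
    return x

def soft_threshold(v, lam=0.01):
    """Toy 'denoiser' (soft shrinkage); real PnP plugs in a learned one."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

rng = np.random.default_rng(0)
x_true = np.zeros(50)
x_true[[5, 20, 35]] = [1.0, -2.0, 1.5]           # sparse ground truth
A = rng.standard_normal((80, 50)) / np.sqrt(80)  # measurement operator
y = A @ x_true + 0.01 * rng.standard_normal(80)
step = 1.0 / np.linalg.norm(A, 2) ** 2           # 1/L step size
x_hat = pnp_proximal_gradient(A, y, soft_threshold, step)
```

The paper's contribution is precisely to remove the dependence on such theoretically chosen step sizes via the projected Landweber operator.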

[CV-62] Query-Efficient Video Adversarial Attack with Stylized Logo

链接: https://arxiv.org/abs/2408.12099
作者: Duoxun Tang,Yuxin Cao,Xi Xiao,Derui Wang,Sheng Wen,Tianqing Zhu
关键词-EN: Deep Neural Networks, Neural Networks, classification systems based, Deep Neural, verifying video content
类目: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
*备注:

点击查看摘要

Abstract:Video classification systems based on Deep Neural Networks (DNNs) have demonstrated excellent performance in accurately verifying video content. However, recent studies have shown that DNNs are highly vulnerable to adversarial examples. Therefore, a deep understanding of adversarial attacks enables better responses to emergency situations. In order to improve attack performance, many style-transfer-based attacks and patch-based attacks have been proposed. However, the global perturbation of the former brings unnatural global color, while the latter is difficult to use for targeted attacks due to the limited perturbation space. Moreover, compared to the plethora of methods targeting image classifiers, video adversarial attacks remain relatively underexplored. Therefore, to generate adversarial examples with a low budget and higher verisimilitude, we propose a novel black-box video attack framework, called Stylized Logo Attack (SLA). SLA is conducted through three steps. The first step involves building a style reference set for logos, which not only makes the generated examples more natural, but also carries more target class features in targeted attacks. Then, reinforcement learning (RL) is employed to determine the style reference and position parameters of the logo within the video, which ensures that the stylized logo is placed in the video with optimal attributes. Finally, perturbation optimization is designed to improve the fooling rate in a step-by-step manner. Extensive experimental results indicate that SLA achieves better performance than state-of-the-art methods and maintains good deception effects when facing various defense methods.

[CV-63] LLM-enhanced Scene Graph Learning for Household Rearrangement SIGGRAPH

链接: https://arxiv.org/abs/2408.12093
作者: Wenhao Li,Zhiyuan Yu,Qijin She,Zhinan Yu,Yuqing Lan,Chenyang Zhu,Ruizhen Hu,Kai Xu
关键词-EN: task involves spotting, involves spotting misplaced, involves spotting, scene, AEG
类目: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
*备注: SIGGRAPH ASIA 2024

点击查看摘要

Abstract:The household rearrangement task involves spotting misplaced objects in a scene and accommodating them in proper places. It depends both on common-sense knowledge on the objective side and human user preference on the subjective side. To achieve this task, we propose to mine object functionality with user preference alignment directly from the scene itself, without relying on human intervention. To do so, we work with a scene graph representation and propose LLM-enhanced scene graph learning, which transforms the input scene graph into an affordance-enhanced graph (AEG) with information-enhanced nodes and newly discovered edges (relations). In AEG, the nodes corresponding to receptacle objects are augmented with context-induced affordance, which encodes what kind of carriable objects can be placed on them. New edges are discovered from newly identified non-local relations. With AEG, we perform task planning for scene rearrangement by detecting misplaced carriables and determining a proper placement for each of them. We test our method by implementing a tidying robot in a simulator and perform evaluation on a new benchmark we build. Extensive evaluations demonstrate that our method achieves state-of-the-art performance on misplacement detection and the subsequent rearrangement planning.

[CV-64] Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy ECCV2024

链接: https://arxiv.org/abs/2408.12086
作者: Hong Zhang,Yixuan Lyu,Qian Yu,Hanyang Liu,Huimin Ma,Ding Yuan,Yifan Yang
关键词-EN: Camouflaged Object Segmentation, remain poorly understood, effective camouflage remain, camouflage remain poorly, Object Segmentation
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注: Accepted by ECCV 2024

点击查看摘要

Abstract:In the domain of Camouflaged Object Segmentation (COS), despite continuous improvements in segmentation performance, the underlying mechanisms of effective camouflage remain poorly understood, akin to a black box. To address this gap, we present the first comprehensive study to examine the impact of camouflage attributes on the effectiveness of camouflage patterns, offering a quantitative framework for the evaluation of camouflage designs. To support this analysis, we have compiled the first dataset comprising descriptions of camouflaged objects and their attribute contributions, termed COD-Text And X-attributions (COD-TAX). Moreover, drawing inspiration from the hierarchical process by which humans process information (from high-level textual descriptions of overarching scenarios, through mid-level summaries of local areas, to low-level pixel data for detailed analysis), we have developed a robust framework that combines textual and visual information for the task of COS, named Attribution CUe Modeling with Eye-fixation Network (ACUMEN). ACUMEN demonstrates superior performance, outperforming nine leading methods across three widely-used datasets. We conclude by highlighting key insights derived from the attributes identified in our study. Code: this https URL.

[CV-65] Vision-Based Detection of Uncooperative Targets and Components on Small Satellites

链接: https://arxiv.org/abs/2408.12084
作者: Hannah Grauer,Elena-Sorina Lupu,Connor Lee,Soon-Jo Chung,Darren Rowen,Benjamen Bycroft,Phaedrus Leeds,John Brader
关键词-EN: situational awareness techniques, inactive satellites pose, space situational awareness, awareness techniques, debris and inactive
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注: Small Satellite 2024 Conference, 13 pages, 8 figures, 6 tables

点击查看摘要

Abstract:Space debris and inactive satellites pose a threat to the safety and integrity of operational spacecraft and motivate the need for space situational awareness techniques. These uncooperative targets create a challenging tracking and detection problem due to a lack of prior knowledge of their features, trajectories, or even existence. Recent advancements in computer vision models can be used to improve upon existing methods for tracking such uncooperative targets to make them more robust and reliable to the wide-ranging nature of the target. This paper introduces an autonomous detection model designed to identify and monitor these objects using learning and computer vision. The autonomous detection method aims to identify and accurately track the uncooperative targets in varied circumstances, including different camera spectral sensitivities, lighting, and backgrounds. Our method adapts to the relative distance between the observing spacecraft and the target, and different detection strategies are adjusted based on distance. At larger distances, we utilize You Only Look Once (YOLOv8), a multitask Convolutional Neural Network (CNN), for zero-shot and domain-specific single-shot real-time detection of the target. At shorter distances, we use knowledge distillation to combine visual foundation models with a lightweight fast segmentation CNN (Fast-SCNN) to segment the spacecraft components with low storage requirements and fast inference times, and to enable weight updates from Earth and possible onboard training. Lastly, we test our method on a custom dataset simulating the unique conditions encountered in space, as well as a publicly-available dataset.
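The distance-gated switching described above can be sketched as a simple dispatch. The 100 m threshold and the stub callables standing in for the YOLO-style detector and the segmentation CNN are hypothetical placeholders, not values from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class DistanceAdaptiveDetector:
    """Route each frame to a long-range detector or a short-range segmenter."""
    long_range_detector: Callable[[Any], Any]    # e.g. a bounding-box model
    short_range_segmenter: Callable[[Any], Any]  # e.g. a component segmenter
    switch_distance_m: float = 100.0             # illustrative threshold

    def __call__(self, frame, target_distance_m):
        if target_distance_m >= self.switch_distance_m:
            return "detect", self.long_range_detector(frame)
        return "segment", self.short_range_segmenter(frame)

pipeline = DistanceAdaptiveDetector(
    long_range_detector=lambda f: [(10, 10, 40, 40)],  # stub bounding box
    short_range_segmenter=lambda f: "component mask",  # stub segmentation
)
mode_far, _ = pipeline(frame=None, target_distance_m=500.0)
mode_near, _ = pipeline(frame=None, target_distance_m=5.0)
```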

[CV-66] Enhancing Sampling Protocol for Robust Point Cloud Classification

链接: https://arxiv.org/abs/2408.12062
作者: Chongshou Li,Pin Tang,Xinke Li,Tianrui Li
关键词-EN: Farthest Point Sampling, Established sampling protocols, Fixed Sample Size, point cloud, Established sampling
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Established sampling protocols for 3D point cloud learning, such as Farthest Point Sampling (FPS) and Fixed Sample Size (FSS), have long been recognized and utilized. However, real-world data often suffer from corruptions such as sensor noise, which violates the benignness assumption of point clouds in current protocols. Consequently, they are notably vulnerable to noise, posing significant safety risks in critical applications like autonomous driving. To address these issues, we propose an enhanced point cloud sampling protocol, PointDR, which comprises two components: 1) Downsampling for key point identification and 2) Resampling for flexible sample size. Furthermore, differentiated strategies are implemented for training and inference processes. Particularly, an isolation-rated weight considering local density is designed for the downsampling method, assisting it in performing random key point selection in the training phase and bypassing noise in the inference phase. A local-geometry-preserved upsampling is incorporated into resampling, enabling it to maintain a stochastic sample size in the training stage and complete insufficient data in the inference stage. It is crucial to note that the proposed protocol requires no model architecture changes and no extra learning, so minimal effort is needed to replace the existing one. Despite its simplicity, it substantially improves the robustness of point cloud learning, as showcased by outperforming state-of-the-art methods on multiple benchmarks of corrupted point cloud classification. The code will be available upon the paper's acceptance.
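As background, the Farthest Point Sampling protocol that the proposed method revisits can be written in a few lines of numpy. This is the classic FPS baseline, not the proposed PointDR sampler.

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Iteratively select the point farthest from everything chosen so far."""
    n = points.shape[0]
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]  # random starting point
    dist = np.linalg.norm(points - points[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))     # farthest remaining point
        selected.append(nxt)
        # Keep, for every point, its distance to the nearest selected point.
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(selected)

cloud = np.random.default_rng(1).standard_normal((1000, 3))
idx = farthest_point_sampling(cloud, 32)
```

Because FPS greedily maximizes coverage, a single noisy outlier is almost guaranteed to be selected, which is exactly the vulnerability the abstract describes.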

[CV-67] ISETHDR: A Physics-based Synthetic Radiance Dataset for High Dynamic Range Driving Scenes

链接: https://arxiv.org/abs/2408.12048
作者: Zhenyi Liu,Devesh Shah,Brian Wandell
关键词-EN: HDR, software, HDR driving scenes, sensors designed, image systems
类目: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
*备注:

点击查看摘要

Abstract:This paper describes a physics-based end-to-end software simulation for image systems. We use the software to explore sensors designed to enhance performance in high dynamic range (HDR) environments, such as driving through daytime tunnels and under nighttime conditions. We synthesize physically realistic HDR spectral radiance images and use them as the input to digital twins that model the optics and sensors of different systems. This paper makes three main contributions: (a) We create a labeled (instance segmentation and depth), synthetic radiance dataset of HDR driving scenes. (b) We describe the development and validation of the end-to-end simulation framework. (c) We present a comparative analysis of two single-shot sensors designed for HDR. We open-source both the dataset and the software.

[CV-68] FUSELOC: Fusing Global and Local Descriptors to Disambiguate 2D-3D Matching in Visual Localization

链接: https://arxiv.org/abs/2408.12037
作者: Son Tung Nguyen,Alejandro Fontan,Michael Milford,Tobias Fischer
关键词-EN: relevant map regions, visual localization, global descriptors, map regions, Hierarchical methods represent
类目: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
*备注:

点击查看摘要

Abstract:Hierarchical methods represent state-of-the-art visual localization, optimizing search efficiency by using global descriptors to focus on relevant map regions. However, this state-of-the-art performance comes at the cost of substantial memory requirements, as all database images must be stored for feature matching. In contrast, direct 2D-3D matching algorithms require significantly less memory but suffer from lower accuracy due to the larger and more ambiguous search space. We address this ambiguity by fusing local and global descriptors using a weighted average operator within a 2D-3D search framework. This fusion rearranges the local descriptor space such that geographically nearby local descriptors are closer in the feature space according to the global descriptors. Therefore, the number of irrelevant competing descriptors decreases, specifically if they are geographically distant, thereby increasing the likelihood of correctly matching a query descriptor. We consistently improve the accuracy over local-only systems and achieve performance close to hierarchical methods while halving memory requirements. Extensive experiments using various state-of-the-art local and global descriptors across four different datasets demonstrate the effectiveness of our approach. For the first time, our approach enables direct matching algorithms to benefit from global descriptors while maintaining memory efficiency. The code for this paper will be published at this https URL.
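A minimal sketch of the weighted-average fusion idea: both descriptors are L2-normalized, averaged with a weight, and renormalized so that dot products act as cosine similarities. It assumes, for illustration only, that local and global descriptors already share one dimensionality; the weight value is hypothetical, not taken from the paper.

```python
import numpy as np

def fuse_descriptors(local_desc, global_desc, w=0.7):
    """Weighted average of L2-normalized local and global descriptors."""
    def l2n(v):
        return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12)
    return l2n(w * l2n(local_desc) + (1.0 - w) * l2n(global_desc))

rng = np.random.default_rng(0)
local_db = rng.standard_normal((100, 128))   # per-point local descriptors
global_db = rng.standard_normal((100, 128))  # matching global context
fused_db = fuse_descriptors(local_db, global_db)

query = fuse_descriptors(local_db[7], global_db[7])
best = int(np.argmax(fused_db @ query))      # cosine search in fused space
```

Descriptors that agree globally are pulled together in the fused space, so geographically distant but locally similar competitors drop in the ranking.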

[CV-69] Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition – And Ways to Overcome Them

链接: https://arxiv.org/abs/2408.12023
作者: Harish Haresamudram,Apoorva Beedu,Mashfiqui Rabbi,Sankalita Saha,Irfan Essa,Thomas Ploetz
关键词-EN: Cross-modal contrastive pre-training, Cross-modal contrastive, demonstrated astonishing performance, vision and audio, natural language supervision
类目: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Cross-modal contrastive pre-training between natural language and other modalities, e.g., vision and audio, has demonstrated astonishing performance and effectiveness across a diverse variety of tasks and domains. In this paper, we investigate whether such natural language supervision can be used for wearable sensor based Human Activity Recognition (HAR), and discover that, surprisingly, it performs substantially worse than standard end-to-end training and self-supervision. We identify the primary causes as sensor heterogeneity and the lack of rich, diverse text descriptions of activities. To mitigate their impact, we also develop strategies and assess their effectiveness through an extensive experimental evaluation. These strategies lead to significant increases in activity recognition, bringing performance closer to supervised and self-supervised training, while also enabling the recognition of unseen activities and cross-modal retrieval of videos. Overall, our work paves the way for better sensor-language learning, ultimately leading to the development of foundational models for HAR using wearables.
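The cross-modal contrastive pre-training investigated here typically optimizes a symmetric InfoNCE (CLIP-style) objective, where matched sensor/text pairs in a batch are positives and all other pairings are negatives. Below is a minimal numpy sketch, not the authors' training code; the temperature value is illustrative.

```python
import numpy as np

def clip_style_loss(sensor_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of each modality forms the positive pair."""
    def l2n(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    logits = l2n(sensor_emb) @ l2n(text_emb).T / temperature
    labels = np.arange(logits.shape[0])

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_prob = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_prob[labels, labels].mean()  # diagonal = positives

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

rng = np.random.default_rng(0)
sensor = rng.standard_normal((8, 32))
loss_aligned = clip_style_loss(sensor, sensor)         # perfect pairing
loss_shuffled = clip_style_loss(sensor, sensor[::-1])  # scrambled pairing
```

The loss drops when matched pairs are embedded close together, which is exactly what poor text descriptions and heterogeneous sensors undermine in the HAR setting.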

[CV-70] CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion

链接: https://arxiv.org/abs/2408.12009
作者: Yunlong Tang,Gen Zhan,Li Yang,Yiting Liao,Chenliang Xu
关键词-EN: attract human attention, Video saliency prediction, saliency prediction aims, saliency prediction, driven by bottom-up
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Video saliency prediction aims to identify the regions in a video that attract human attention and gaze, driven by bottom-up features from the video and top-down processes like memory and cognition. Among these top-down influences, language plays a crucial role in guiding attention by shaping how visual information is interpreted. Existing methods primarily focus on modeling perceptual information while neglecting the reasoning process facilitated by language, where ranking cues are crucial outcomes of this process and practical guidance for saliency prediction. In this paper, we propose CaRDiff (Caption, Rank, and generate with Diffusion), a framework that imitates the process by integrating a multimodal large language model (MLLM), a grounding module, and a diffusion model, to enhance video saliency prediction. Specifically, we introduce a novel prompting method VSOR-CoT (Video Salient Object Ranking Chain of Thought), which utilizes an MLLM with a grounding module to caption video content and infer salient objects along with their rankings and positions. This process derives ranking maps that can be sufficiently leveraged by the diffusion model to decode the saliency maps for the given video accurately. Extensive experiments show the effectiveness of VSOR-CoT in improving the performance of video saliency prediction. The proposed CaRDiff performs better than state-of-the-art models on the MVS dataset and demonstrates cross-dataset capabilities on the DHF1k dataset through zero-shot evaluation.

[CV-71] Visual Localization in 3D Maps: Comparing Point Cloud Mesh and NeRF Representations

链接: https://arxiv.org/abs/2408.11966
作者: Lintong Zhang,Yifu Tao,Jiarong Lin,Fu Zhang,Maurice Fallon
关键词-EN: cross-modal global visual, visual localization system, lidar sensing, global visual localization, paper introduces
类目: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
*备注:

点击查看摘要

Abstract:This paper introduces and assesses a cross-modal global visual localization system that can localize camera images within a color 3D map representation built using both visual and lidar sensing. We present three different state-of-the-art methods for creating the color 3D maps: point clouds, meshes, and neural radiance fields (NeRF). Our system constructs a database of synthetic RGB and depth image pairs from these representations. This database serves as the basis for global localization. We present an automatic approach that builds this database by synthesizing novel images of the scene and exploiting the 3D structure encoded in the different representations. Next, we present a global localization system that relies on the synthetic image database to accurately estimate the 6 DoF camera poses of monocular query images. Our localization approach relies on different learning-based global descriptors and feature detectors which enable robust image retrieval and matching despite the domain gap between (real) query camera images and the synthetic database images. We assess the system’s performance through extensive real-world experiments in both indoor and outdoor settings, in order to evaluate the effectiveness of each map representation and the benefits against traditional structure-from-motion localization approaches. Our results show that all three map representations can achieve consistent localization success rates of 55% and higher across various environments. NeRF synthesized images show superior performance, localizing query images at an average success rate of 72%. Furthermore, we demonstrate that our synthesized database enables global localization even when the map creation data and the localization sequence are captured when travelling in opposite directions. Our system, operating in real-time on a mobile laptop equipped with a GPU, achieves a processing rate of 1Hz.

[CV-72] Real-Time Incremental Explanations for Object Detectors

链接: https://arxiv.org/abs/2408.11963
作者: Santiago Calderón-Peña,Hana Chockler,David A. Kelly
关键词-EN: Existing black box, Existing black, black box explainability, box explainability tools, object detectors rely
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Existing black box explainability tools for object detectors rely on multiple calls to the model, which prevents them from computing explanations in real time. In this paper we introduce IncX, an algorithm for real-time incremental approximations of explanations, based on linear transformations of saliency maps. We implement IncX on top of D-RISE, a state-of-the-art black-box explainability tool for object detectors. We show that IncX’s explanations are comparable in quality to those of D-RISE, with insertion curves being within 8%, and are computed two orders of magnitude faster than D-RISE’s explanations.

[CV-73] CARLA Drone: Monocular 3D Object Detection from a Different Perspective

链接: https://arxiv.org/abs/2408.11958
作者: Johannes Meier,Luca Scalerandi,Oussema Dhaouadi,Jacques Kaiser,Nikita Araslanov,Daniel Cremers
关键词-EN: CARLA Drone dataset, Drone dataset, camera perspectives, Abstract, camera
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Existing techniques for monocular 3D detection have a serious restriction. They tend to perform well only on a limited set of benchmarks, faring well either on ego-centric car views or on traffic camera views, but rarely on both. To encourage progress, this work advocates for an extended evaluation of 3D detection frameworks across different camera perspectives. We make two key contributions. First, we introduce the CARLA Drone dataset, CDrone. Simulating drone views, it substantially expands the diversity of camera perspectives in existing benchmarks. Despite its synthetic nature, CDrone represents a real-world challenge. To show this, we confirm that previous techniques struggle to perform well both on CDrone and a real-world 3D drone dataset. Second, we develop an effective data augmentation pipeline called GroundMix. Its distinguishing element is the use of the ground for creating 3D-consistent augmentation of a training image. GroundMix significantly boosts the detection accuracy of a lightweight one-stage detector. In our expanded evaluation, we achieve the average precision on par with or substantially higher than the previous state of the art across all tested datasets.

[CV-74] Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound

链接: https://arxiv.org/abs/2408.11915
作者: Junwon Lee,Jaekwon Im,Dabin Kim,Juhan Nam
关键词-EN: enhancing user experience, Foley sound synthesis, multimedia production, enhancing user, temporally and semantically
类目: ound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
*备注:

点击查看摘要

Abstract:Foley sound synthesis is crucial for multimedia production, enhancing user experience by synchronizing audio and video both temporally and semantically. Recent studies on automating this labor-intensive process through video-to-sound generation face significant challenges. Systems lacking explicit temporal features suffer from poor controllability and alignment, while timestamp-based models require costly and subjective human annotation. We propose Video-Foley, a video-to-sound system using Root Mean Square (RMS) as a temporal event condition with semantic timbre prompts (audio or text). RMS, a frame-level intensity envelope feature closely related to audio semantics, ensures high controllability and synchronization. The annotation-free self-supervised learning framework consists of two stages, Video2RMS and RMS2Sound, incorporating novel ideas including RMS discretization and RMS-ControlNet with a pretrained text-to-audio model. Our extensive evaluation shows that Video-Foley achieves state-of-the-art performance in audio-visual alignment and controllability for sound timing, intensity, timbre, and nuance. Code, model weights, and demonstrations are available on the accompanying website. (this https URL)
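The frame-level RMS intensity envelope used as the temporal event condition is a standard audio feature; here is a minimal sketch with illustrative frame and hop sizes.

```python
import numpy as np

def frame_rms(audio, frame_len=1024, hop=512):
    """Root-mean-square energy per frame: a coarse intensity envelope."""
    n_frames = 1 + max(0, len(audio) - frame_len) // hop
    rms = np.empty(n_frames)
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + frame_len]
        rms[i] = np.sqrt(np.mean(frame ** 2))
    return rms

sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t) * np.linspace(0.0, 1.0, sr)  # fade-in tone
envelope = frame_rms(audio)
```

A generator conditioned on this envelope inherits the timing and loudness contour of the source video's events, which is what gives the system its controllability.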

[CV-75] Joint PET-MRI Reconstruction with Diffusion Stochastic Differential Model

链接: https://arxiv.org/abs/2408.11840
作者: Taofeng Xie,Zhuoxu Cui,Congcong Liu,Chen Luo,Huayu Wang,Yuanzhi Zhang,Xuemei Wang,Yihang Zhou,Qiyu Jin,Guoqing Chen,Dong Liang,Haifeng Wang
关键词-EN: PET, MRI, PET suffers, joint, Abstract
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注: Accepted as ISMRM 2024 Digital poster 6575. 04-09 May 2024 Singapore

点击查看摘要

Abstract:PET suffers from a low signal-to-noise ratio, while the k-space data acquisition process of MRI in PET-MRI systems is time-consuming. We aim to accelerate MRI and improve PET image quality. This paper proposes a novel joint reconstruction model based on diffusion stochastic differential equations that learns the joint probability distribution of PET and MRI. The results underscore the qualitative and quantitative improvements our model brings to PET and MRI reconstruction, surpassing current state-of-the-art methodologies. Joint PET-MRI reconstruction is a challenge for PET-MRI systems, and this study focuses on a cross-modal relationship that extends beyond edges: PET is generated from MRI by learning the joint probability distribution as that relationship.

[CV-76] Analysis of Unstructured High-Density Crowded Scenes for Crowd Monitoring

链接: https://arxiv.org/abs/2408.11836
作者: Alexandre Matov
关键词-EN: human crowds, interested in developing, developing an automated, movements in human, undergoing organized motion
类目: Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:We are interested in developing an automated system for detection of organized movements in human crowds. Computer vision algorithms can extract information from videos of crowded scenes and automatically detect and track groups of individuals undergoing organized motion, which represents an anomalous behavior in the context of conflict aversion. Our system can detect organized cohorts against the background of randomly moving objects and we can estimate the number of participants in an organized cohort, the speed and direction of motion in real time, within three to four video frames, which is less than one second from the onset of motion captured on a CCTV. We have performed preliminary analysis in this context in biological cell data containing up to four thousand objects per frame and will extend this numerically to a hundred-fold for public safety applications. We envisage using the existing infrastructure of video cameras for acquiring image datasets on-the-fly and deploying an easy-to-use data-driven software system for parsing of significant events by analyzing image sequences taken inside and outside of sports stadiums or other public venues. Other prospective users are organizers of political rallies, civic and wildlife organizations, security firms, and the military. We will optimize the performance of the software by implementing a classification method able to distinguish between activities posing a threat and those not posing a threat.
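Estimating a cohort's speed and heading from tracked positions across consecutive frames, as described above, reduces to averaging member displacements; a toy sketch with made-up coordinates follows.

```python
import numpy as np

def cohort_motion(prev_pos, curr_pos, dt):
    """Mean speed and heading (degrees) of a tracked group between frames."""
    disp = (curr_pos - prev_pos).mean(axis=0)  # average displacement vector
    speed = float(np.linalg.norm(disp)) / dt
    heading = float(np.degrees(np.arctan2(disp[1], disp[0])))
    return speed, heading

prev_pos = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
curr_pos = prev_pos + np.array([3.0, 4.0])     # every member moves by (3, 4)
speed, heading = cohort_motion(prev_pos, curr_pos, dt=1.0)
```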

[CV-77] SCREENER: A general framework for task-specific experiment design in quantitative MRI

链接: https://arxiv.org/abs/2408.11834
作者: Tianshu Zheng,Zican Wang,Timothy Bray,Daniel C. Alexander,Dan Wu,Hui Zhang
关键词-EN: magnetic resonance imaging, Quantitative magnetic resonance, resonance imaging, treatment monitoring, magnetic resonance
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Quantitative magnetic resonance imaging (qMRI) is increasingly investigated for use in a variety of clinical tasks from diagnosis, through staging, to treatment monitoring. However, experiment design in qMRI, the identification of the optimal acquisition protocols, has been focused on obtaining the most precise parameter estimations, with no regard for the specific requirements of downstream tasks. Here we propose SCREENER: A general framework for task-specific experiment design in quantitative MRI. SCREENER incorporates a task-specific objective and seeks the optimal protocol with a deep-reinforcement-learning (DRL) based optimization strategy. To illustrate this framework, we employ a task of classifying the inflammation status of bone marrow using diffusion MRI data with intravoxel incoherent motion (IVIM) modelling. Results demonstrate SCREENER outperforms previous ad hoc and optimized protocols under clinical signal-to-noise ratio (SNR) conditions, achieving significant improvement, both in binary classification tasks, e.g. from 67% to 89%, and in a multi-class classification task, from 46% to 59%. Additionally, we show this improvement is robust to the SNR. Lastly, we demonstrate the advantage of DRL-based optimization strategy, enabling zero-shot discovery of near-optimal protocols for a range of SNRs not used in training. In conclusion, SCREENER has the potential to enable wider uptake of qMRI in the clinic.

[CV-78] FAKER: Full-body Anonymization with Human Keypoint Extraction for Real-time Video Deidentification

链接: https://arxiv.org/abs/2408.11829
作者: Byunghyun Ban,Hyoseok Lee
关键词-EN: contemporary digital era, digital era, paramount issue, contemporary digital, Abstract
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
*备注:

点击查看摘要

Abstract:In the contemporary digital era, protection of personal information has become a paramount issue. The exponential growth of the media industry has heightened concerns regarding the anonymization of individuals captured in video footage. Traditional methods, such as blurring or pixelation, are commonly employed, while recent advancements have introduced generative adversarial networks (GAN) to redraw faces in videos. In this study, we propose a novel approach that employs a significantly smaller model to achieve real-time full-body anonymization of individuals in videos. Unlike conventional techniques, which often fail to effectively remove personal identification information such as skin color, clothing, accessories, and body shape, our method successfully eradicates all such details. Furthermore, by leveraging pose estimation algorithms, our approach accurately represents information regarding individuals’ positions, movements, and postures. This algorithm can be seamlessly integrated into CCTV or IP camera systems installed in various industrial settings, functioning in real-time and thus facilitating the widespread adoption of full-body anonymization technology.

[CV-79] Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge MICCAI2024

链接: https://arxiv.org/abs/2408.12534
作者: Jun Ma,Yao Zhang,Song Gu,Cheng Ge,Ershuai Wang,Qin Zhou,Ziyan Huang,Pengju Lyu,Jian He,Bo Wang
关键词-EN: abdomen Computed Tomography, Computed Tomography, precise cancer diagnosis, abdomen Computed, diagnosis and treatment
类目: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
*备注: MICCAI 2024 FLARE Challenge Summary

点击查看摘要

Abstract:Organ and cancer segmentation in abdomen Computed Tomography (CT) scans is the prerequisite for precise cancer diagnosis and treatment. Most existing benchmarks and algorithms are tailored to specific cancer types, limiting their ability to provide comprehensive cancer analysis. This work presents the first international competition on abdominal organ and pan-cancer segmentation by providing a large-scale and diverse dataset, including 4650 CT scans with various cancer types from over 40 medical centers. The winning team established a new state-of-the-art with a deep learning-based cascaded framework, achieving average Dice Similarity Coefficient scores of 92.3% for organs and 64.9% for lesions on the hidden multi-national testing set. The dataset and code of top teams are publicly available, offering a benchmark platform to drive further innovations: this https URL.
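The Dice Similarity Coefficient used to score the challenge is simple to compute for binary masks; a minimal sketch:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """DSC = 2|P ∩ G| / (|P| + |G|) for binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.zeros((8, 8), dtype=int)
a[2:6, 2:6] = 1  # 16-pixel square
b = np.zeros((8, 8), dtype=int)
b[3:7, 3:7] = 1  # shifted square with a 9-pixel overlap
```

For the two toy masks above, DSC = 2 * 9 / (16 + 16) = 0.5625; the small eps guards against division by zero when both masks are empty.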

[CV-80] EUIS-Net: A Convolutional Neural Network for Efficient Ultrasound Image Segmentation

链接: https://arxiv.org/abs/2408.12323
作者: Shahzaib Iqbal,Hasnat Ahmed,Muhammad Sharif,Madiha Hena,Tariq M. Khan,Imran Razzak
关键词-EN: images’ inherent noise, Segmenting ultrasound images, ultrasound images’ inherent, significant challenges due, offers significant challenges
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Segmenting ultrasound images is critical for various medical applications, but it offers significant challenges due to ultrasound images’ inherent noise and unpredictability. To address these challenges, we proposed EUIS-Net, a CNN network designed to segment ultrasound images efficiently and precisely. The proposed EUIS-Net utilises four encoder-decoder blocks, resulting in a notable decrease in computational complexity while achieving excellent performance. The proposed EUIS-Net integrates both channel and spatial attention mechanisms into the bottleneck to improve feature representation and collect significant contextual information. In addition, EUIS-Net incorporates a region-aware attention module in skip connections, which enhances the ability to concentrate on the region of the injury. To enable thorough information exchange across various network blocks, skip connection aggregation is employed from the network’s lowermost to the uppermost block. Comprehensive evaluations are conducted on two publicly available ultrasound image segmentation datasets. The proposed EUIS-Net achieved mean IoU and dice scores of 78.12%, 85.42% and 84.73%, 89.01% in the BUSI and DDTI datasets, respectively. The findings of our study showcase the substantial capabilities of EUIS-Net for immediate use in clinical settings and its versatility in various ultrasound imaging tasks.

[CV-81] Whole Slide Image Classification of Salivary Gland Tumours

链接: https://arxiv.org/abs/2408.12275
作者: John Charlton,Ibrahim Alsanie,Syed Ali Khurram
关键词-EN: work shows promising, shows promising results, multiple instance learning, salivary gland tumours, slide images
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
*备注: 5 pages, 2 figures, 28th UK Conference on Medical Image Understanding and Analysis - clinical abstract

点击查看摘要

Abstract:This work shows promising results using multiple instance learning on salivary gland tumours in classifying cancers on whole slide images. Utilising CTransPath as a patch-level feature extractor and CLAM as a feature aggregator, an F1 score of over 0.88 and AUROC of 0.92 are obtained for detecting cancer in whole slide images.
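CLAM aggregates patch-level features into a slide-level prediction via attention-based multiple instance learning. A rough, self-contained sketch of that idea (not the actual CLAM implementation; all names and toy inputs are assumptions):

```python
import math

def attention_mil(patch_feats, w_attn, w_clf):
    """Toy attention-based multiple instance learning: softmax attention
    over patches, attention-weighted pooling, then a linear classifier."""
    # unnormalised attention logit per patch (dot product with attention vector)
    logits = [sum(f * w for f, w in zip(feat, w_attn)) for feat in patch_feats]
    m = max(logits)  # numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    attn = [e / z for e in exps]
    # attention-weighted slide-level representation
    dim = len(patch_feats[0])
    slide = [sum(a * feat[i] for a, feat in zip(attn, patch_feats))
             for i in range(dim)]
    # slide-level score from a linear classifier
    return sum(f * w for f, w in zip(slide, w_clf))
```

In the paper's pipeline, the patch features would come from a pretrained extractor such as CTransPath rather than the tiny vectors used here.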

[CV-82] Through-the-Wall Radar Human Activity Micro-Doppler Signature Representation Method Based on Joint Boulic-Sinusoidal Pendulum Model MICRO

链接: https://arxiv.org/abs/2408.12077
作者: Xiaopeng Yang,Weicheng Gao,Xiaodong Qu,Zeyu Ma,Hao Zhang
关键词-EN: accurately identify indoor, joint Boulic-sinusoidal pendulum, enables the reconstruction, indoor human activities, reconstruction of range
类目: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注: 17 pages, 14 figures, 7 tables, in IEEE Transactions on Microwave Theory and Techniques, 2024

点击查看摘要

Abstract:With the help of micro-Doppler signature, ultra-wideband (UWB) through-the-wall radar (TWR) enables the reconstruction of range and velocity information of limb nodes to accurately identify indoor human activities. However, existing methods are usually trained and validated directly using range-time maps (RTM) and Doppler-time maps (DTM), which have high feature redundancy and poor generalization ability. In order to solve this problem, this paper proposes a human activity micro-Doppler signature representation method based on a joint Boulic-sinusoidal pendulum motion model. In detail, this paper presents a simplified joint Boulic-sinusoidal pendulum human motion model, improved from the Boulic-Thalmann kinematic model, which takes the head, torso, and both hands and feet into consideration. The paper also calculates the minimum number of key points needed to describe the Doppler and micro-Doppler information sufficiently. Both numerical simulations and experiments are conducted to verify its effectiveness. The results demonstrate that the proposed number of key points of the micro-Doppler signature can precisely represent the indoor human limb node motion characteristics and substantially improve the generalization capability of the existing methods for different testers.
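The micro-Doppler signature of a sinusoidally swinging limb node can be sketched from first principles: for a monostatic radar, the instantaneous Doppler shift is f_d(t) = 2·v(t)/λ. A toy illustration (not the paper's model; the parameters are hypothetical):

```python
import math

def micro_doppler(amplitude, freq_hz, wavelength, t):
    """Instantaneous Doppler shift of a limb node in sinusoidal pendulum
    motion x(t) = A*sin(2*pi*f*t), for a monostatic radar:
    v(t) = A*2*pi*f*cos(2*pi*f*t), f_d(t) = 2*v(t)/lambda."""
    velocity = amplitude * 2 * math.pi * freq_hz * math.cos(2 * math.pi * freq_hz * t)
    return 2.0 * velocity / wavelength

# a 0.1 m swing at 1 Hz seen by a radar with 0.1 m wavelength,
# sampled at the moment of peak limb velocity (t = 0)
print(micro_doppler(0.1, 1.0, 0.1, 0.0))  # peak shift = 4*pi ≈ 12.57 Hz
```

Plotting f_d(t) for each modeled key point over time yields the sinusoidal micro-Doppler traces that DTM-style representations capture implicitly.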

[CV-83] Detection of Under-represented Samples Using Dynamic Batch Training for Brain Tumor Segmentation from MR Images

链接: https://arxiv.org/abs/2408.12013
作者: Subin Sahayam,John Michael Sujay Zakkam,Yoga Sri Varshan V,Umarani Jayaraman
关键词-EN: magnetic resonance imaging, automatic brain tumor, brain tumor segmentation, samples, training
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Manually segmenting brain tumors in magnetic resonance (MR) images is difficult, time-consuming, and prone to human error. These challenges can be resolved by developing automatic brain tumor segmentation methods from MR images. Various deep-learning models based on the U-Net have been proposed for the task. These deep-learning models are trained on a dataset of tumor images and then used for segmenting the masks. Mini-batch training is a widely used method in deep learning. However, one of the significant challenges associated with this approach is that if the training dataset has under-represented samples or samples with complex latent representations, the model may not generalize well to these samples. The issue leads to skewed learning of the data, where the model learns to fit towards the majority representations while underestimating the under-represented samples. The proposed dynamic batch training method addresses the challenges posed by under-represented data points, data points with complex latent representations, and imbalances within the class, where some samples may be harder to learn than others. Poor performance on such samples can be identified only after the completion of training, leading to wasted computational resources; likewise, retraining easy samples after each epoch is an inefficient use of computation. To overcome these challenges, the proposed method identifies hard samples and trains them for more iterations than easier samples on the BraTS2020 dataset. Additionally, the samples trained multiple times are identified, providing a way to flag hard samples in the BraTS2020 dataset. The comparison of the proposed training approach with U-Net and other models in the literature highlights the capabilities of the proposed training approach.
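The core scheduling idea, revisiting hard (high-loss) samples more often than easy ones, can be sketched as follows (a simplified illustration, not the authors' implementation; the function name and thresholding rule are assumptions):

```python
def schedule_batches(running_losses, threshold, extra_reps=2):
    """Per-sample repeat counts for the next epoch: samples whose running
    loss exceeds `threshold` (i.e. hard samples) are trained extra times."""
    return [1 + extra_reps if loss > threshold else 1 for loss in running_losses]

# four samples; the 2nd and 4th are "hard" under a 0.5 loss threshold
print(schedule_batches([0.1, 0.9, 0.4, 1.3], threshold=0.5))  # [1, 3, 1, 3]
```

In practice the losses would be tracked per sample during training, and the repeat counts used to build the next epoch's mini-batches.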

[CV-84] MBSS-T1: Model-Based Self-Supervised Motion Correction for Robust Cardiac T1 Mapping

链接: https://arxiv.org/abs/2408.11992
作者: Eyal Hanania,Ilya Volovik,Daphna Link-Sourani,Israel Cohen,Moti Freiman
关键词-EN: quantitative MRI technique, valuable quantitative MRI, diffuse myocardial diseases, diagnosing diffuse myocardial, quantitative MRI
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:T1 mapping is a valuable quantitative MRI technique for diagnosing diffuse myocardial diseases. Traditional methods, relying on breath-hold sequences and echo triggering, face challenges with patient compliance and arrhythmias, limiting their effectiveness. Image registration can enable motion-robust T1 mapping, but inherent intensity differences between time points pose a challenge. We introduce MBSS-T1, a self-supervised model for motion correction in cardiac T1 mapping, constrained by physical and anatomical principles. The physical constraints ensure expected signal decay behavior, while the anatomical constraints maintain realistic deformations. The unique combination of these constraints ensures accurate T1 mapping along the longitudinal relaxation axis. MBSS-T1 outperformed baseline deep-learning-based image registration approaches in a 5-fold experiment on a public dataset of 210 patients (STONE sequence) and an internal dataset of 19 patients (MOLLI sequence). MBSS-T1 excelled in model fitting quality (R2: 0.974 vs. 0.941, 0.946), anatomical alignment (Dice score: 0.921 vs. 0.984, 0.988), and expert visual quality assessment for the presence of visible motion artifacts (4.33 vs. 3.34, 3.62). MBSS-T1 has the potential to enable motion-robust T1 mapping for a broader range of patients, overcoming challenges such as arrhythmias and suboptimal compliance, and allowing for free-breathing T1 mapping without requiring large training datasets.
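T1 mapping fits a relaxation model to images acquired at several inversion times; a common three-parameter form, assumed here purely for illustration (the abstract does not specify the exact fitting model), is S(t) = A − B·exp(−t/T1*), with the Look-Locker correction T1 = T1*·(B/A − 1):

```python
import math

def t1_signal(t, a, b, t1_star):
    """Three-parameter inversion-recovery model used in T1 mapping:
    S(t) = A - B * exp(-t / T1*)."""
    return a - b * math.exp(-t / t1_star)

def t1_from_fit(a, b, t1_star):
    """Apparent-to-true T1 correction for Look-Locker style readouts:
    T1 = T1* * (B / A - 1)."""
    return t1_star * (b / a - 1.0)

# at t = 0 the signal is fully inverted: S(0) = A - B
print(t1_signal(0.0, a=1.0, b=2.0, t1_star=1.0))  # -1.0
```

The physical constraint in MBSS-T1 amounts to requiring that the motion-corrected pixel intensities follow such a decay curve along the relaxation axis.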

[CV-85] AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

链接: https://arxiv.org/abs/2408.11982
作者: Maksim Smirnov,Aleksandr Gushchin,Anastasia Antsiferova,Dmitry Vatolin,Radu Timofte,Ziheng Jia,Zicheng Zhang,Wei Sun,Jiaying Qian,Yuqin Cao,Yinan Sun,Yuxin Zhu,Xiongkuo Min,Guangtao Zhai,Kanjar De,Qing Luo,Ao-Xiang Zhang,Peng Zhang,Haibo Lei,Linyan Jiang,Yaqing Li,Wenhui Meng,Xiaoheng Tan,Haiqiang Wang,Xiaozhong Xu,Shan Liu,Zhenzhong Chen,Zhengxue Cheng,Jiahao Xiao,Jun Xu,Chenlong He,Qi Zheng,Ruoxi Zhu,Min Li,Yibo Fan,Zhengzhong Tu
关键词-EN: Video quality assessment, Compressed Video Quality, quality assessment, Quality Assessment Dataset, Video quality
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
*备注:

点击查看摘要

Abstract:Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dataset of 459 videos, encoded with 14 codecs of various compression standards (AVC/H.264, HEVC/H.265, AV1, and VVC/H.266) and containing a comprehensive collection of compression artifacts. To measure the methods performance, we employed traditional correlation coefficients between their predictions and subjective scores, which were collected via large-scale crowdsourced pairwise human comparisons. For training purposes, participants were provided with the Compressed Video Quality Assessment Dataset (CVQAD), a previously developed dataset of 1022 videos. Up to 30 participating teams registered for the challenge, while we report the results of 6 teams, which submitted valid final solutions and code for reproducing the results. Moreover, we calculated and present the performance of state-of-the-art VQA methods on the developed dataset, providing a comprehensive benchmark for future research. The dataset, results, and online leaderboard are publicly available at this https URL.
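The "traditional correlation coefficients" used to score submissions typically include Spearman's rank correlation between model predictions and subjective scores. A dependency-free sketch (illustrative; the challenge's exact evaluation code may differ, e.g. in tie handling):

```python
def rankdata(x):
    """Ranks (1-based), with the average rank assigned to ties."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(pred, subj):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rp, rs = rankdata(pred), rankdata(subj)
    n = len(pred)
    mp, ms = sum(rp) / n, sum(rs) / n
    cov = sum((a - mp) * (b - ms) for a, b in zip(rp, rs))
    vp = sum((a - mp) ** 2 for a in rp) ** 0.5
    vs = sum((b - ms) ** 2 for b in rs) ** 0.5
    return cov / (vp * vs)

print(spearman([1, 2, 3], [10, 20, 30]))  # 1.0 (perfectly monotonic)
```

A value near 1 means the method ranks the 459 videos almost exactly as human raters do.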

[CV-86] CT-AGRG: Automated Abnormality-Guided Report Generation from 3D Chest CT Volumes

链接: https://arxiv.org/abs/2408.11965
作者: Theo Di Piazza
关键词-EN: time-consuming manual analysis, robust automated analysis, automated analysis techniques, computed tomography, manual analysis
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
*备注: 15 pages, 9 figures, submitted to ISBI 2025

点击查看摘要

Abstract:The rapid increase of computed tomography (CT) scans and their time-consuming manual analysis have created an urgent need for robust automated analysis techniques in clinical settings. These aim to assist radiologists and help them manage their growing workload. Existing methods typically generate entire reports directly from 3D CT images, without explicitly focusing on observed abnormalities. This unguided approach often results in repetitive content or incomplete reports, failing to prioritize anomaly-specific descriptions. We propose a new anomaly-guided report generation model, which first predicts abnormalities and then generates targeted descriptions for each. Evaluation on a public dataset demonstrates significant improvements in report quality and clinical relevance. We extend our work by conducting an ablation study to demonstrate its effectiveness.

[CV-87] Bioimpedance a Diagnostic Tool for Tobacco Induced Oral Lesions: a Mixed Model cross-sectional study

链接: https://arxiv.org/abs/2408.11886
作者: Vaibhav Gupta,Poonam Goel,Usha Agrawal,Neena Chaudhary,Garima Jain,Deepak Gupta
关键词-EN: Electrical impedance spectroscopy, evaluating cervical dysplasia, basal cell carcinoma, Electrical impedance, cervical dysplasia
类目: Quantitative Methods (q-bio.QM); Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Introduction: Electrical impedance spectroscopy (EIS) has recently developed as a novel diagnostic device for screening and evaluating cervical dysplasia, prostate cancer, breast cancer and basal cell carcinoma. The current study aimed to validate and evaluate bioimpedance as a diagnostic tool for tobacco-induced oral lesions. Methodology: The study comprised 50 OSCC and OPMD tissue specimens for the in-vitro study and 320 subjects for the in-vivo study. A bioimpedance device was prepared and calibrated. EIS measurements were done for the habit and control groups and were compared. Results: The impedance value in the control group was significantly higher compared to the OPMD and OSCC groups. Diagnosis based on BIS measurements has a sensitivity of 95.9% and a specificity of 86.7%. Conclusion: The bioimpedance device can help in decision-making for differentiating OPMD and OSCC cases and their management, especially in primary healthcare settings. Keywords: Impedance, Cancer, Diagnosis, Device, Community
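The reported sensitivity (95.9%) and specificity (86.7%) follow directly from confusion-matrix counts. A minimal sketch with hypothetical counts (the actual study counts are not given in the abstract):

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# hypothetical confusion-matrix counts, purely illustrative
sensitivity, specificity = sens_spec(tp=9, fn=1, tn=8, fp=2)
print(sensitivity, specificity)  # 0.9 0.8
```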

机器学习

[LG-0] Non-Homophilic Graph Pre-Training and Prompt Learning

链接: https://arxiv.org/abs/2408.12594
作者: Xingtong Yu,Jie Zhang,Yuan Fang,Renhe Jiang
关键词-EN: modeling complex relationships, ubiquitous for modeling, modeling complex, complex relationships, relationships between objects
类目: Machine Learning (cs.LG)
*备注: Under review

点击查看摘要

Abstract:Graphs are ubiquitous for modeling complex relationships between objects across various fields. Graph neural networks (GNNs) have become a mainstream technique for graph-based applications, but their performance heavily relies on abundant labeled data. To reduce the labeling requirement, pre-training and prompt learning have become a popular alternative. However, most existing prompt methods do not differentiate the homophilic and heterophilic characteristics of real-world graphs. In particular, many real-world graphs are non-homophilic: neither strictly nor uniformly homophilic, they mix homophilic and heterophilic patterns and exhibit varying non-homophilic characteristics across graphs and nodes. In this paper, we propose ProNoG, a novel pre-training and prompt learning framework for such non-homophilic graphs. First, we analyze existing graph pre-training methods, providing theoretical insights into the choice of pre-training tasks. Second, recognizing that each node exhibits unique non-homophilic characteristics, we propose a conditional network to characterize the node-specific patterns in downstream tasks. Finally, we thoroughly evaluate and analyze ProNoG through extensive experiments on ten public datasets.
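Homophily versus heterophily is commonly quantified per node as the fraction of neighbours sharing the node's label. An illustrative pure-Python computation (a standard definition, not code from the paper):

```python
from collections import defaultdict

def node_homophily(edges, labels):
    """Fraction of each node's neighbours that share its label; values near 1
    indicate homophilic nodes, values near 0 heterophilic ones."""
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    return {node: sum(labels[n] == labels[node] for n in ns) / len(ns)
            for node, ns in nbrs.items()}

# triangle graph: nodes 0 and 1 share a label, node 2 differs
print(node_homophily([(0, 1), (1, 2), (0, 2)], [0, 0, 1]))
# {0: 0.5, 1: 0.5, 2: 0.0}
```

A "non-homophilic" graph in the paper's sense is one where these per-node values vary widely instead of clustering near 1.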

[LG-1] Identifying the Best Arm in the Presence of Global Environment Shifts ECAI2024

链接: https://arxiv.org/abs/2408.12581
作者: Phurinut Srisawad,Juergen Branke,Long Tran-Thanh
关键词-EN: Best-Arm Identification problem, non-stationary stochastic bandits, Best-Arm Identification, Identification problem, stochastic bandits setting
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: Extended version of the paper accepted at the 27th European Conference on Artificial Intelligence (ECAI 2024); Paper ID: M1125

点击查看摘要

Abstract:This paper formulates a new Best-Arm Identification problem in the non-stationary stochastic bandits setting, where the means of all arms are shifted in the same way due to a global influence of the environment. The aim is to identify the unique best arm across environmental change given a fixed total budget. While this setting can be regarded as a special case of Adversarial Bandits or Corrupted Bandits, we demonstrate that existing solutions tailored to those settings do not fully utilise the nature of this global influence, and thus, do not work well in practice (despite their theoretical guarantees). To overcome this issue, in this paper we develop a novel selection policy that is consistent and robust in dealing with global environmental shifts. We then propose an allocation policy, LinLUCB, which exploits information about global shifts across all arms in each environment. Empirical tests depict a significant improvement in our policies against other existing methods.
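The intuition behind exploiting a global shift (the environment moves all arm means by the same amount, so subtracting a per-environment average cancels it) can be sketched as follows. This is a toy illustration of the shift-cancellation idea only, not the LinLUCB algorithm itself:

```python
def best_arm_after_shift_correction(mean_rewards):
    """mean_rewards[e][a]: observed mean reward of arm a in environment e.
    Subtracting each environment's across-arm mean cancels a global additive
    shift, so arms remain comparable across environmental changes."""
    n_arms = len(mean_rewards[0])
    corrected = [0.0] * n_arms
    for env in mean_rewards:
        shift = sum(env) / n_arms
        for a in range(n_arms):
            corrected[a] += env[a] - shift
    return max(range(n_arms), key=lambda a: corrected[a])

# arm 2 is best; the second environment adds a global +10 shift that cancels
print(best_arm_after_shift_correction([[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]))  # 2
```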

[LG-2] RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment

链接: https://arxiv.org/abs/2408.12579
作者: Xiaohan Wang,Xiaoyan Yang,Yuqi Zhu,Yue Shen,Jian Wang,Peng Wei,Lei Liang,Jinjie Gu,Huajun Chen,Ningyu Zhang
关键词-EN: Large Language Models, Large Language, Language Models, Med-Gemini achieve performance, achieve performance competitively
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Machine Learning (cs.LG)
*备注: Ongoing work

点击查看摘要

Abstract:Large Language Models (LLMs) like GPT-4, MedPaLM-2, and Med-Gemini achieve performance competitively with human experts across various medical benchmarks. However, they still face challenges in making professional diagnoses akin to physicians, particularly in efficiently gathering patient information and reasoning the final diagnosis. To this end, we introduce the RuleAlign framework, designed to align LLMs with specific diagnostic rules. We develop a medical dialogue dataset comprising rule-based communications between patients and physicians and design an alignment learning approach through preference learning. Experimental results demonstrate the effectiveness of the proposed approach. We hope that our work can serve as an inspiration for exploring the potential of LLMs as AI physicians.

[LG-3] A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language

链接: https://arxiv.org/abs/2408.12578
作者: Ekdeep Singh Lubana,Kyogo Kawaguchi,Robert P. Dick,Hidenori Tanaka
关键词-EN: phenomenon often called, compute can lead, emergent capabilities, Increase in data, Increase
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: Preprint

点击查看摘要

Abstract:Increase in data, size, or compute can lead to sudden learning of specific capabilities by a neural network – a phenomenon often called “emergence”. Beyond scientific understanding, establishing the causal factors underlying such emergent capabilities is crucial to enable risk regulation frameworks for AI. In this work, we seek inspiration from study of emergent properties in other fields and propose a phenomenological definition for the concept in the context of neural networks. Our definition implicates the acquisition of specific structures underlying the data-generating process as a cause of sudden performance growth for specific, narrower tasks. We empirically investigate this definition by proposing an experimental system grounded in a context-sensitive formal language and find that Transformers trained to perform tasks on top of strings from this language indeed exhibit emergent capabilities. Specifically, we show that once the language’s underlying grammar and context-sensitivity inducing structures are learned by the model, performance on narrower tasks suddenly begins to improve. We then analogize our network’s learning dynamics with the process of percolation on a bipartite graph, establishing a formal phase transition model that predicts the shift in the point of emergence observed in experiment when changing the data structure. Overall, our experimental and theoretical frameworks yield a step towards better defining, characterizing, and predicting emergence in neural networks.

[LG-4] MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

链接: https://arxiv.org/abs/2408.12574
作者: Haojun Shi,Suyu Ye,Xinyu Fang,Chuanyang Jin,Layla Isik,Yen-Ling Kuo,Tianmin Shu
关键词-EN: Understanding people social, Theory of Mind, Understanding people, complex real-world scenarios, intricate mental reasoning
类目: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注: Project website: this https URL Code: this https URL

点击查看摘要

Abstract:Understanding people’s social interactions in complex real-world scenarios often relies on intricate mental reasoning. To truly understand how and why people interact with one another, we must infer the underlying mental states that give rise to the social interactions, i.e., Theory of Mind reasoning in multi-agent interactions. Additionally, social interactions are often multi-modal – we can watch people’s actions, hear their conversations, and/or read about their past behaviors. For AI systems to successfully and safely interact with people in real-world environments, they also need to understand people’s mental states as well as their inferences about each other’s mental states based on multi-modal information about their interactions. For this, we introduce MuMA-ToM, a Multi-modal Multi-Agent Theory of Mind benchmark. MuMA-ToM is the first multi-modal Theory of Mind benchmark that evaluates mental reasoning in embodied multi-agent interactions. In MuMA-ToM, we provide video and text descriptions of people’s multi-modal behavior in realistic household environments. Based on the context, we then ask questions about people’s goals, beliefs, and beliefs about others’ goals. We validated MuMA-ToM in a human experiment and provided a human baseline. We also proposed a novel multi-modal, multi-agent ToM model, LIMP (Language model-based Inverse Multi-agent Planning). Our experimental results show that LIMP significantly outperforms state-of-the-art methods, including large multi-modal models (e.g., GPT-4o, Gemini-1.5 Pro) and a recent multi-modal ToM model, BIP-ALM.

[LG-5] Jamba-1.5: Hybrid Transformer-Mamba Models at Scale WWW

链接: https://arxiv.org/abs/2408.12570
作者: Jamba Team:Barak Lenz,Alan Arazi,Amir Bergman,Avshalom Manevich,Barak Peleg,Ben Aviram,Chen Almagor,Clara Fridman,Dan Padnos,Daniel Gissin,Daniel Jannai,Dor Muhlgay,Dor Zimberg,Edden M Gerber,Elad Dolev,Eran Krakovsky,Erez Safahi,Erez Schwartz,Gal Cohen,Gal Shachaf,Haim Rozenblum,Hofit Bata,Ido Blass,Inbal Magar,Itay Dalmedigos,Jhonathan Osin,Julie Fadlon,Maria Rozman,Matan Danos,Michael Gokhman,Mor Zusman,Naama Gidron,Nir Ratner,Noam Gat,Noam Rozen,Oded Fried,Ohad Leshno,Omer Antverg,Omri Abend,Opher Lieber,Or Dagan,Orit Cohavi,Raz Alon,Ro’i Belson,Roi Cohen,Rom Gilad,Roman Glozman,Shahar Lev,Shaked Meirom,Tal Delbari,Tal Ness,Tomer Asida,Tom Ben Gal,Tom Braude,Uriya Pumerantz,Yehoshua Cohen,Yonatan Belinkov,Yuval Globerson,Yuval Peleg Levy,Yoav Shoham
关键词-EN: instruction-tuned large language, large language models, language models based, instruction-tuned large, large language
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
*备注: Webpage: this https URL

点击查看摘要

Abstract:We present Jamba-1.5, new instruction-tuned large language models based on our Jamba architecture. Jamba is a hybrid Transformer-Mamba mixture of experts architecture, providing high throughput and low memory usage across context lengths, while retaining the same or better quality as Transformer models. We release two model sizes: Jamba-1.5-Large, with 94B active parameters, and Jamba-1.5-Mini, with 12B active parameters. Both models are fine-tuned for a variety of conversational and instruction-following capabilities, and have an effective context length of 256K tokens, the largest amongst open-weight models. To support cost-effective inference, we introduce ExpertsInt8, a novel quantization technique that allows fitting Jamba-1.5-Large on a machine with 8 80GB GPUs when processing 256K-token contexts without loss of quality. When evaluated on a battery of academic and chatbot benchmarks, Jamba-1.5 models achieve excellent results while providing high throughput and outperforming other open-weight models on long-context benchmarks. The model weights for both sizes are publicly available under the Jamba Open Model License and we release ExpertsInt8 as open source.
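The abstract does not detail ExpertsInt8, but the general idea behind int8 weight quantization can be sketched generically: store weights as int8 plus a per-tensor scale. The symmetric scheme below is a standard textbook approach assumed for illustration; the actual ExpertsInt8 technique may differ:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: scale = max|w| / 127."""
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0.0:  # all-zero tensor: nothing to quantize
        return [0] * len(weights), 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [x * scale for x in q]

q, scale = quantize_int8([0.0, 0.5, -1.27])
print(q)  # [0, 50, -127]
```

Storing expert weights at 1 byte each instead of 2 (bf16) roughly halves memory, which is what makes fitting a 256K-token context on 8 GPUs plausible.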

[LG-6] Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers ECCV2024

链接: https://arxiv.org/abs/2408.12568
作者: Sayed Mohammad Vakilzadeh Hatefi,Maximilian Dreyer,Reduan Achtibat,Thomas Wiegand,Wojciech Samek,Sebastian Lapuschkin
关键词-EN: Deep Neural Networks, huge computational costs, Deep Neural, complex problems, billions of parameters
类目: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注: Accepted as a workshop paper at ECCV 2024 31 pages (14 pages manuscript, 4 pages references, 13 pages appendix)

点击查看摘要

Abstract:To solve ever more complex problems, Deep Neural Networks are scaled to billions of parameters, leading to huge computational costs. An effective approach to reduce computational requirements and increase efficiency is to prune unnecessary components of these often over-parameterized networks. Previous work has shown that attribution methods from the field of eXplainable AI serve as effective means to extract and prune the least relevant network components in a few-shot fashion. We extend the current state by proposing to explicitly optimize hyperparameters of attribution methods for the task of pruning, and further include transformer-based networks in our analysis. Our approach yields higher model compression rates of large transformer- and convolutional architectures (VGG, ResNet, ViT) compared to previous works, while still attaining high performance on ImageNet classification tasks. Here, our experiments indicate that transformers have a higher degree of over-parameterization compared to convolutional neural networks. Code is available at this https URL.
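Attribution-guided pruning ranks network components by relevance and removes the least relevant. A minimal sketch of just the selection step (illustrative names and scores; not the paper's code, which additionally optimizes the attribution method's hyperparameters):

```python
def prune_by_relevance(relevance, keep_ratio):
    """Return the (sorted) indices of the components with the highest
    attribution relevance; all other components would be pruned."""
    k = max(1, int(len(relevance) * keep_ratio))
    order = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return sorted(order[:k])

# keep the top half of four components by relevance score
print(prune_by_relevance([0.9, 0.1, 0.5, 0.05], keep_ratio=0.5))  # [0, 2]
```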

[LG-7] ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation

链接: https://arxiv.org/abs/2408.12561
作者: Lujia Zhong,Shuo Huang,Yonggang Shi
关键词-EN: made remarkable strides, probabilistic diffusion models, large language models, remarkable strides, generative modeling
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: Under review

点击查看摘要

Abstract:Recently, deep learning has made remarkable strides, especially with generative modeling, such as large language models and probabilistic diffusion models. However, training these models often involves significant computational resources, requiring billions of petaFLOPs. This high resource consumption results in substantial energy usage and a large carbon footprint, raising critical environmental concerns. Back-propagation (BP) is a major source of computational expense during training deep learning models. To advance research on energy-efficient training and allow for sparse learning on any machine and device, we propose a general, energy-efficient convolution module that can be seamlessly integrated into any deep learning architecture. Specifically, we introduce channel-wise sparsity with additional gradient selection schedulers during the backward pass, based on the observation that BP is often dense and inefficient, which can lead to over-fitting and high computational consumption. Our experiments demonstrate that our approach reduces computations by 40% while potentially improving model performance, validated on image classification and generation tasks. This reduction can lead to significant energy savings and a lower carbon footprint during the research and development phases of large-scale AI systems. Additionally, our method mitigates over-fitting in a manner distinct from Dropout, allowing it to be combined with Dropout to further enhance model performance and reduce computational resource usage. Extensive experiments validate that our method generalizes to a variety of datasets and tasks and is compatible with a wide range of deep learning architectures and modules. Code is publicly available at this https URL.
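Channel-wise gradient sparsification, keeping only the strongest channels during the backward pass, can be sketched as follows. This is a simplified illustration without the paper's gradient selection schedulers; all names and the top-k-by-L1-norm rule are assumptions:

```python
def sparsify_channel_grads(grads, keep_fraction):
    """Channel-wise sparse backward sketch: keep gradients only for the
    top channels by L1 norm and zero out the rest, skipping their compute."""
    norms = [sum(abs(g) for g in ch) for ch in grads]
    k = max(1, int(len(grads) * keep_fraction))
    top = set(sorted(range(len(grads)), key=lambda i: norms[i], reverse=True)[:k])
    return [ch if i in top else [0.0] * len(ch) for i, ch in enumerate(grads)]

# four channels of toy gradients; keep the strongest half
print(sparsify_channel_grads(
    [[1.0, 1.0], [0.1, 0.1], [2.0, 2.0], [0.2, 0.2]], keep_fraction=0.5))
# [[1.0, 1.0], [0.0, 0.0], [2.0, 2.0], [0.0, 0.0]]
```

In an actual implementation, the zeroed channels would simply not be computed, which is where the ~40% savings would come from.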

[LG-8] Human-In-The-Loop Machine Learning for Safe and Ethical Autonomous Vehicles: Principles Challenges and Opportunities

链接: https://arxiv.org/abs/2408.12548
作者: Yousef Emami,Kai Li,Luis Almeida,Wei Ni,Zhu Han
关键词-EN: Autonomous Vehicles, Rapid advances, trends in Autonomous, Machine Learning, Learning
类目: Machine Learning (cs.LG)
*备注: 19 pages, 5 figures

点击查看摘要

Abstract:Rapid advances in Machine Learning (ML) have triggered new trends in Autonomous Vehicles (AVs). ML algorithms play a crucial role in interpreting sensor data, predicting potential hazards, and optimizing navigation strategies. However, achieving full autonomy in cluttered and complex situations, such as intricate intersections, diverse sceneries, varied trajectories, and complex missions, is still challenging, and the cost of data labeling remains a significant bottleneck. The adaptability and robustness of humans in complex scenarios motivate the inclusion of humans in the ML process, leveraging their creativity, ethical power, and emotional intelligence to improve ML effectiveness. The scientific community knows this approach as Human-In-The-Loop Machine Learning (HITL-ML). Towards safe and ethical autonomy, we present a review of HITL-ML for AVs, focusing on Curriculum Learning (CL), Human-In-The-Loop Reinforcement Learning (HITL-RL), Active Learning (AL), and ethical principles. In CL, human experts systematically train ML models by starting with simple tasks and gradually progressing to more difficult ones. HITL-RL significantly enhances the RL process by incorporating human input through techniques like reward shaping, action injection, and interactive learning. AL streamlines the annotation process by targeting specific instances that need to be labeled with human oversight, reducing the overall time and cost associated with training. Ethical principles must be embedded in AVs to align their behavior with societal values and norms. In addition, we provide insights and specify future research directions.

[LG-9] Dynamics of Meta-learning Representation in the Teacher-student Scenario

链接: https://arxiv.org/abs/2408.12545
作者: Hui Wang,Cho Tung Yip,Bo Li
关键词-EN: Gradient-based meta-learning algorithms, Gradient-based meta-learning, limited data, gained popularity, shared representation
类目: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn)
*备注:

点击查看摘要

Abstract:Gradient-based meta-learning algorithms have gained popularity for their ability to train models on new tasks using limited data. Empirical observations indicate that such algorithms are able to learn a shared representation across tasks, which is regarded as a key factor in their success. However, the in-depth theoretical understanding of the learning dynamics and the origin of the shared representation remains underdeveloped. In this work, we investigate the meta-learning dynamics of the non-linear two-layer neural networks trained on streaming tasks in the teach-student scenario. Through the lens of statistical physics analysis, we characterize the macroscopic behavior of the meta-training processes, the formation of the shared representation, and the generalization ability of the model on new tasks. The analysis also points to the importance of the choice of certain hyper-parameters of the learning algorithms.

[LG-10] Exploiting Student Parallelism for Low-latency GPU Inference of BERT-like Models in Online Services

链接: https://arxiv.org/abs/2408.12526
作者: Weiyan Wang,Yilun Jin,Yiming Zhang,Victor Junqiu Wei,Han Tian,Li Chen,Kai Chen
关键词-EN: discriminative text mining, BERT-like models, web searching, widely adopted, adopted by discriminative
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Due to high accuracy, BERT-like models have been widely adopted by discriminative text mining and web searching. However, large BERT-like models suffer from inefficient online inference, as they face the following two problems on GPUs. First, they rely on the large model depth to achieve high accuracy, which linearly increases the sequential computation on GPUs. Second, stochastic and dynamic online workloads cause extra costs. In this paper, we present Academus for low-latency online inference of BERT-like models. At the core of Academus is the novel student parallelism, which adopts boosting ensemble and stacking distillation to distill the original deep model into an equivalent group of parallel and shallow student models. This enables Academus to achieve the lower model depth (e.g., two layers) than baselines and consequently the lowest inference latency without affecting the accuracy. For occasional workload bursts, it can temporarily decrease the number of students with minimal accuracy loss to improve throughput. Additionally, it employs specialized system designs for student parallelism to better handle stochastic online workloads. We conduct comprehensive experiments to verify the effectiveness. The results show that Academus outperforms the baselines by 4.1X~1.6X in latency without compromising accuracy, and achieves up to 22.27X higher throughput for workload bursts.
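The boosting-ensemble idea behind student parallelism can be illustrated with the shallowest possible students: each new student is fit to the residual the current ensemble leaves on the deep teacher's outputs, so the parallel group jointly tracks the teacher at a fraction of the depth. A toy sketch with constant students and an assumed shrinkage factor, not Academus's actual distillation pipeline:

```python
# Boosted "student parallelism" in miniature: every student is shallow
# (here just a constant), fit to the residual left by the ensemble so
# far, so the parallel group jointly tracks the deep teacher's outputs.
teacher_outputs = [3.0, 3.2, 2.8, 3.1]  # deep model's predictions
shrinkage = 0.5                          # assumed boosting step size

def fit_student(residuals):
    return sum(residuals) / len(residuals)  # shallowest possible model

students, ensemble = [], [0.0] * len(teacher_outputs)
for _ in range(4):
    residuals = [t - e for t, e in zip(teacher_outputs, ensemble)]
    s = shrinkage * fit_student(residuals)
    students.append(s)
    ensemble = [e + s for e in ensemble]

gap = max(abs(t - e) for t, e in zip(teacher_outputs, ensemble))
print([round(s, 4) for s in students], round(gap, 4))
```

Dropping the last students (as Academus does during workload bursts) removes only the smallest residual corrections, which is why accuracy degrades gracefully.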

[LG-11] PCGRL: Scaling Control and Generalization in Reinforcement Learning Level Generators

链接: https://arxiv.org/abs/2408.12525
作者: Sam Earle,Zehua Jiang,Julian Togelius
关键词-EN: Procedural Content Generation, Procedural Content, computable metrics acting, Content Generation, Generation via Reinforcement
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: 8 pages, 7 figures, 6 tables. Published at IEEE Conference on Games, 2024

点击查看摘要

Abstract:Procedural Content Generation via Reinforcement Learning (PCGRL) has been introduced as a means by which controllable designer agents can be trained based only on a set of computable metrics acting as a proxy for the level’s quality and key characteristics. While PCGRL offers a unique set of affordances for game designers, it is constrained by the compute-intensive process of training RL agents, and has so far been limited to generating relatively small levels. To address this issue of scale, we implement several PCGRL environments in Jax so that all aspects of learning and simulation happen in parallel on the GPU, resulting in faster environment simulation; removing the CPU-GPU information-transfer bottleneck during RL training; and ultimately resulting in significantly improved training speed. We replicate several key results from prior works in this new framework, letting models train for much longer than previously studied, and evaluating their behavior after 1 billion timesteps. Aiming for greater control for human designers, we introduce randomized level sizes and frozen “pinpoints” of pivotal game tiles as further ways of countering overfitting. To test the generalization ability of learned generators, we evaluate models on large, out-of-distribution map sizes, and find that models with partial observation sizes learn more robust design strategies.

[LG-12] Advanced atom-level representations for protein flexibility prediction utilizing graph neural networks

链接: https://arxiv.org/abs/2408.12519
作者: Sina Sarparast,Aldo Zaimi,Maximilian Ebert,Michael-Rock Goldsmith
关键词-EN: Protein dynamics play, Protein dynamics, Protein, play a crucial, crucial role
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Protein dynamics play a crucial role in many biological processes and drug interactions. However, measuring and simulating protein dynamics is challenging and time-consuming. While machine learning holds promise in deciphering the determinants of protein dynamics from structural information, most existing methods for protein representation learning operate at the residue level, ignoring the finer details of atomic interactions. In this work, we propose for the first time to use graph neural networks (GNNs) to learn protein representations at the atomic level and predict B-factors from protein 3D structures. The B-factor reflects the atomic displacement of atoms in proteins, and can serve as a surrogate for protein flexibility. We compared different GNN architectures to assess their performance. The Meta-GNN model achieves a correlation coefficient of 0.71 on a large and diverse test set of over 4k proteins (17M atoms) from the Protein Data Bank (PDB), outperforming previous methods by a large margin. Our work demonstrates the potential of representations learned by GNNs for protein flexibility prediction and other related tasks.
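The atom-level message passing at the heart of this approach can be sketched in plain Python: one round of mean aggregation over an atom graph followed by a linear readout standing in for the B-factor head. The graph, features, and readout weights below are all made up for illustration:

```python
# One round of mean-aggregation message passing over a toy atom graph,
# with a linear readout standing in for the per-atom B-factor head.
edges = {0: [1, 2], 1: [0], 2: [0]}                  # 3-atom "molecule"
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}

def message_pass(feats, adj):
    out = {}
    for node, nbrs in adj.items():
        agg = [sum(feats[n][d] for n in nbrs) / len(nbrs)
               for d in range(len(feats[node]))]
        # blend self features with the aggregated neighbourhood
        out[node] = [0.5 * s + 0.5 * a for s, a in zip(feats[node], agg)]
    return out

w = [0.7, 0.3]  # hypothetical readout weights
h = message_pass(features, edges)
b_factors = {n: sum(wi * hi for wi, hi in zip(w, h[n])) for n in h}
print(b_factors)
```

A real atom-level GNN stacks many such rounds with learned transforms; the point here is only that aggregation runs over atoms rather than residues.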

[LG-13] AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines

链接: https://arxiv.org/abs/2408.12491
作者: Douwe J. Spaanderman(1),Matthew Marzetti(2,3),Xinyi Wan(1),Andrew F. Scarsbrook(4,5),Philip Robinson(4),Edwin H.G. Oei(1),Jacob J. Visser(1),Robert Hemke(6),Kirsten van Langevelde(7),David F. Hanff(1),Geert J.L.H. van Leenders(8),Cornelis Verhoef(9),Dirk J. Grünhagen(9),Wiro J. Niessen(1,10),Stefan Klein(1),Martijn P.A. Starmans(1) ((1) Department of Radiology and Nuclear Medicine, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, the Netherlands, (2) Department of Medical Physics, Leeds Teaching Hospitals NHS Trust, UK, (3) Leeds Biomedical Research Centre, University of Leeds, UK, (4) Department of Radiology, Leeds Teaching Hospitals NHS Trust, UK, (5) Leeds Institute of Medical Research, University of Leeds, UK, (6) Department of Radiology and Nuclear Medicine, Amsterdam UMC, Amsterdam, the Netherlands, (7) Department of Radiology, Leiden University Medical Center, Leiden, the Netherlands, (8) Department of Pathology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, the Netherlands, (9) Department of Surgical Oncology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, the Netherlands, (10) Faculty of Medical Sciences, University of Groningen, Groningen, the Netherlands)
关键词-EN: diagnostically challenging lesions, Soft-tissue and bone, variable clinical behaviours, diagnostically challenging, treatment approaches
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: 23 pages, 6 figures, 6 supplementary figures

点击查看摘要

Abstract:Soft-tissue and bone tumours (STBT) are rare, diagnostically challenging lesions with variable clinical behaviours and treatment approaches. This systematic review provides an overview of Artificial Intelligence (AI) methods using radiological imaging for diagnosis and prognosis of these tumours, highlighting challenges in clinical translation, and evaluating study alignment with the Checklist for AI in Medical Imaging (CLAIM) and the FUTURE-AI international consensus guidelines for trustworthy and deployable AI to promote the clinical translation of AI methods. The review covered literature from several bibliographic databases, including papers published before 17/07/2024. Original research in peer-reviewed journals focused on radiology-based AI for diagnosing or prognosing primary STBT was included. Exclusion criteria were animal, cadaveric, or laboratory studies, and non-English papers. Abstracts were screened by two of three independent reviewers for eligibility. Eligible papers were assessed against guidelines by one of three independent reviewers. The search identified 15,015 abstracts, from which 325 articles were included for evaluation. Most studies performed moderately on CLAIM, averaging a score of 28.9 ± 7.5 out of 53, but poorly on FUTURE-AI, averaging 5.1 ± 2.1 out of 30. Imaging-AI tools for STBT remain at the proof-of-concept stage, indicating significant room for improvement. Future efforts by AI developers should focus on design (e.g. define unmet clinical need, intended clinical setting and how AI would be integrated in clinical workflow), development (e.g. build on previous work, explainability), evaluation (e.g. evaluating and addressing biases, evaluating AI against best practices), and data reproducibility and availability (making documented code and data publicly available). Following these recommendations could improve clinical translation of AI methods.

[LG-14] Self-Learning for Personalized Keyword Spotting on Ultra-Low-Power Audio Sensors

链接: https://arxiv.org/abs/2408.12481
作者: Manuele Rusci,Francesco Paci,Marco Fariselli,Eric Flamand,Tinne Tuytelaars
关键词-EN: personalized Keyword Spotting, ultra-low power smart, Keyword Spotting, power smart audio, smart audio sensors
类目: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
*备注:

点击查看摘要

Abstract:This paper proposes a self-learning framework to incrementally train (fine-tune) a personalized Keyword Spotting (KWS) model after the deployment on ultra-low power smart audio sensors. We address the fundamental problem of the absence of labeled training data by assigning pseudo-labels to the new recorded audio frames based on a similarity score with respect to few user recordings. By experimenting with multiple KWS models with a number of parameters up to 0.5M on two public datasets, we show an accuracy improvement of up to +19.2% and +16.0% vs. the initial models pretrained on a large set of generic keywords. The labeling task is demonstrated on a sensor system composed of a low-power microphone and an energy-efficient Microcontroller (MCU). By efficiently exploiting the heterogeneous processing engines of the MCU, the always-on labeling task runs in real-time with an average power cost of up to 8.2 mW. On the same platform, we estimate an energy cost for on-device training 10x lower than the labeling energy if sampling a new utterance every 5 s or 16.4 s with a DS-CNN-S or a DS-CNN-M model. Our empirical result paves the way to self-adaptive personalized KWS sensors at the extreme edge.
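The pseudo-labeling step described above can be sketched as a nearest-enrolment similarity check: a newly recorded frame's embedding is compared against the few user recordings per keyword and labeled only when the best cosine similarity clears a threshold. The keyword names, embeddings, and threshold below are hypothetical:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# A few enrolment embeddings per keyword (names/values hypothetical).
enrolled = {"lights_on": [[1.0, 0.1], [0.9, 0.2]],
            "lights_off": [[0.1, 1.0]]}

def pseudo_label(frame_emb, enrolled, threshold=0.9):
    """Label a recorded frame with the keyword whose enrolment samples
    are most similar; below the threshold the frame is discarded."""
    best_kw, best_sim = None, -1.0
    for kw, refs in enrolled.items():
        sim = max(cosine(frame_emb, r) for r in refs)
        if sim > best_sim:
            best_kw, best_sim = kw, sim
    return best_kw if best_sim >= threshold else None

print(pseudo_label([0.95, 0.15], enrolled))  # confidently "lights_on"
print(pseudo_label([0.5, 0.5], enrolled))    # too ambiguous: None
```

Discarding low-similarity frames is what keeps on-device fine-tuning from drifting when no ground-truth labels exist.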

[LG-15] Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese

链接: https://arxiv.org/abs/2408.12480
作者: Khang T. Doan,Bao G. Huynh,Dung T. Hoang,Thuc D. Pham,Nhat H. Pham,Quan T.M. Nguyen,Bang Q. Vo,Suong N. Hoang
关键词-EN: multimodal large language, Vietnamese language tasks, multimodal large, MLLM, Vietnamese language
类目: Machine Learning (cs.LG); Computation and Language (cs.CL)
*备注: arXiv admin note: text overlap with arXiv:2404.16821 by other authors

点击查看摘要

Abstract:In this report, we introduce Vintern-1B, a reliable 1-billion-parameters multimodal large language model (MLLM) for Vietnamese language tasks. By integrating the Qwen2-0.5B-Instruct language model with the InternViT-300M-448px visual model, Vintern-1B is optimized for a range of applications, including optical character recognition (OCR), document extraction, and general question-answering in Vietnamese context. The model is fine-tuned on an extensive dataset of over 3 million image-question-answer pairs, achieving robust performance and reliable results across multiple Vietnamese language benchmarks like OpenViVQA and ViTextVQA. Vintern-1B is small enough to fit into various on-device applications easily. Additionally, we have open-sourced several Vietnamese vision question answering (VQA) datasets for text and diagrams, created with Gemini 1.5 Flash. Our models are available at: this https URL.

[LG-16] Predicting Solar Energy Generation with Machine Learning based on AQI and Weather Features

链接: https://arxiv.org/abs/2408.12476
作者: Arjun Shah,Varun Viswanath,Kashish Gandhi,Dr. Nilesh Madhukar Patil
关键词-EN: Air Quality Index, efficient grid integration, Deep Learning, Quality Index, Deep Learning techniques
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: 10 pages, 11 figures

点击查看摘要

Abstract:This paper addresses the pressing need for an accurate solar energy prediction model, which is crucial for efficient grid integration. We explore the influence of the Air Quality Index and weather features on solar energy generation, employing advanced Machine Learning and Deep Learning techniques. Our methodology uses time series modeling and makes novel use of power transform normalization and zero-inflated modeling. Various Machine Learning algorithms and Conv2D Long Short-Term Memory-based Deep Learning models are applied to these transformations for precise predictions. Results underscore the effectiveness of our approach, demonstrating enhanced prediction accuracy with Air Quality Index and weather features. We achieved a 0.9691 R^2 Score, 0.18 MAE, 0.10 RMSE with the Conv2D Long Short-Term Memory model, showcasing the power transform technique’s innovation in enhancing time series forecasting for solar energy generation. These results contribute valuable insights into the synergy between Air Quality Index, weather features, and Deep Learning techniques for solar energy prediction.
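The power-transform normalization for zero-inflated generation data can be illustrated with a `log1p` transform; this is a stand-in, since the paper does not specify here whether it uses Box-Cox, Yeo-Johnson, or another family. Zeros (night hours) map to zero, the heavy right tail is compressed, and the transform inverts exactly when converting predictions back:

```python
import math

# Zero-inflated hourly generation (zeros at night); a log1p power
# transform (a stand-in for the paper's transform) keeps zeros at
# zero while compressing the heavy right tail before model fitting.
generation = [0.0, 0.0, 1.2, 15.0, 80.0, 240.0, 60.0, 0.0]

def power_transform(series):
    return [math.log1p(v) for v in series]

def inverse_transform(series):
    return [math.expm1(v) for v in series]

z = power_transform(generation)
roundtrip = inverse_transform(z)   # exact inverse for final predictions
print([round(v, 3) for v in z])
```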

[LG-17] WCEbleedGen: A wireless capsule endoscopy dataset and its benchmarking for automatic bleeding classification detection and segmentation

链接: https://arxiv.org/abs/2408.12466
作者: Palak Handa,Manas Dhir,Amirreza Mahbod,Florian Schwarzhans,Ramona Woitek,Nidhi Goel,Deepak Gunjan
关键词-EN: Wireless Capsule Endoscopy, Capsule Endoscopy, Wireless Capsule, Computer-based analysis, medically annotated WCE
类目: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Computer-based analysis of Wireless Capsule Endoscopy (WCE) is crucial. However, a medically annotated WCE dataset for training and evaluation of automatic classification, detection, and segmentation of bleeding and non-bleeding frames is currently lacking. The present work focused on the development of a medically annotated WCE dataset called WCEbleedGen for automatic classification, detection, and segmentation of bleeding and non-bleeding frames. It comprises 2,618 WCE bleeding and non-bleeding frames which were collected from various internet resources and existing WCE datasets. A comprehensive benchmarking and evaluation of the developed dataset was done using nine classification-based, three detection-based, and three segmentation-based deep learning models. The dataset is high-quality, class-balanced, and contains single and multiple bleeding sites. Overall, our standard benchmark results show that Visual Geometric Group (VGG) 19, You Only Look Once version 8 nano (YOLOv8n), and Link network (Linknet) performed best in automatic classification, detection, and segmentation-based evaluations, respectively. Automatic bleeding diagnosis is crucial for WCE video interpretations. This diverse dataset will aid in the development of real-time, multi-task learning-based innovative solutions for automatic bleeding diagnosis in WCE. The dataset and code are publicly available at this https URL and this https URL.

[LG-18] Smartphone-based Eye Tracking System using Edge Intelligence and Model Optimisation

链接: https://arxiv.org/abs/2408.12463
作者: Nishan Gunawardena,Gough Yumu Lui,Jeewani Anupama Ginige,Bahman Javadi
关键词-EN: Recurrent Neural Networks, Convolutional Neural Networks, video-type visual stimuli, Gated Recurrent Unit, Long Short Term
类目: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Performance (cs.PF)
*备注:

点击查看摘要

Abstract:A significant limitation of current smartphone-based eye-tracking algorithms is their low accuracy when applied to video-type visual stimuli, as they are typically trained on static images. Also, the increasing demand for real-time interactive applications like games, VR, and AR on smartphones requires overcoming the limitations posed by resource constraints such as limited computational power, battery life, and network bandwidth. Therefore, we developed two new smartphone eye-tracking techniques for video-type visuals by combining Convolutional Neural Networks (CNN) with two different Recurrent Neural Networks (RNN), namely Long Short Term Memory (LSTM) and Gated Recurrent Unit (GRU). Our CNN+LSTM and CNN+GRU models achieved an average Root Mean Square Error of 0.955cm and 1.091cm, respectively. To address the computational constraints of smartphones, we developed an edge intelligence architecture to enhance the performance of smartphone-based eye tracking. We applied various optimisation methods like quantisation and pruning to deep learning models for better energy, CPU, and memory usage on edge devices, focusing on real-time processing. Using model quantisation, the model inference time in the CNN+LSTM and CNN+GRU models was reduced by 21.72% and 19.50%, respectively, on edge devices.
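The model quantisation behind those latency savings can be sketched as affine (asymmetric) 8-bit post-training quantisation of a weight vector: store integers plus one scale and zero-point, dequantise on use. An illustrative sketch only, not the toolchain the authors used:

```python
# Affine (asymmetric) 8-bit post-training quantisation of a weight
# vector: keep integers plus one scale and zero-point, dequantise on
# use. Illustrative only; real deployments use vendor toolchains.
def quantize(weights, n_bits=8):
    qmin, qmax = 0, 2 ** n_bits - 1
    wmin, wmax = min(weights), max(weights)
    scale = (wmax - wmin) / (qmax - qmin) or 1.0  # avoid zero scale
    zero_point = round(qmin - wmin / scale)
    q = [max(qmin, min(qmax, round(w / scale) + zero_point))
         for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

w = [-0.52, -0.1, 0.0, 0.3, 0.49]
q, s, zp = quantize(w)
err = max(abs(a - b) for a, b in zip(w, dequantize(q, s, zp)))
print(q, round(err, 4))  # reconstruction error stays below one scale step
```

Storing 8-bit integers instead of 32-bit floats shrinks memory four-fold, which is the main source of the inference-time and energy improvements the abstract reports.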

[LG-19] Finding Closure: A Closer Look at the Gestalt Law of Closure in Convolutional Neural Networks

链接: https://arxiv.org/abs/2408.12460
作者: Yuyan Zhang,Derya Soydaner,Lisa Koßmann,Fatemeh Behrad,Johan Wagemans
关键词-EN: neural networks, Closure, missing or fragmented, neural, inherent ability
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:The human brain has an inherent ability to fill in gaps to perceive figures as complete wholes, even when parts are missing or fragmented. This phenomenon is known as Closure in psychology, one of the Gestalt laws of perceptual organization, explaining how the human brain interprets visual stimuli. Given the importance of Closure for human object recognition, we investigate whether neural networks rely on a similar mechanism. Exploring this crucial human visual skill in neural networks has the potential to highlight their comparability to humans. Recent studies have examined the Closure effect in neural networks. However, they typically focus on a limited selection of Convolutional Neural Networks (CNNs) and have not reached a consensus on their capability to perform Closure. To address these gaps, we present a systematic framework for investigating the Closure principle in neural networks. We introduce well-curated datasets designed to test for Closure effects, including both modal and amodal completion. We then conduct experiments on various CNNs employing different measurements. Our comprehensive analysis reveals that VGG16 and DenseNet-121 exhibit the Closure effect, while other CNNs show variable results. We interpret these findings by blending insights from psychology and neural network research, offering a unique perspective that enhances transparency in understanding neural networks. Our code and dataset will be made available on GitHub.

[LG-20] Verifiable Homomorphic Linear Combinations in Multi-Instance Time-Lock Puzzles

链接: https://arxiv.org/abs/2408.12444
作者: Aydin Abadi
关键词-EN: securely transmit sensitive, transmit sensitive information, partially Homomorphic TLP, TLP, homomorphic
类目: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
*备注: arXiv admin note: text overlap with arXiv:2406.15070

点击查看摘要

Abstract:Time-Lock Puzzles (TLPs) have been developed to securely transmit sensitive information into the future without relying on a trusted third party. Multi-instance TLP is a scalable variant of TLP that enables a server to efficiently find solutions to different puzzles provided by a client at once. Nevertheless, existing multi-instance TLPs lack support for (verifiable) homomorphic computation. To address this limitation, we introduce the “Multi-Instance partially Homomorphic TLP” (MH-TLP), a multi-instance TLP supporting efficient verifiable homomorphic linear combinations of puzzles belonging to a client. It ensures anyone can verify the correctness of computations and solutions. Building on MH-TLP, we further propose the “Multi-instance Multi-client verifiable partially Homomorphic TLP” (MMH-TLP). It not only supports all the features of MH-TLP but also allows for verifiable homomorphic linear combinations of puzzles from different clients. Our schemes refrain from using asymmetric-key cryptography for verification and, unlike most homomorphic TLPs, do not require a trusted third party. A comprehensive cost analysis demonstrates that our schemes scale linearly with the number of clients and puzzles.
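The sequential-squaring puzzle underlying TLP schemes like these can be shown with toy parameters (real deployments use large RSA moduli and a far larger `T`):

```python
# Toy Rivest-Shamir-Wagner time-lock puzzle: the solver must perform T
# *sequential* squarings mod n, while the creator shortcuts via phi(n).
# Parameters are tiny and purely illustrative.
p, q = 1009, 1013
n, phi = p * q, (p - 1) * (q - 1)
T = 10_000           # the "time" parameter: sequential squarings
secret = 123456      # message to lock (must be < n)

# Creator: compute 2^(2^T) mod n cheaply via phi, then mask the secret.
key = pow(2, pow(2, T, phi), n)
puzzle = (secret + key) % n

# Solver: without phi, must square sequentially T times.
x = 2
for _ in range(T):
    x = (x * x) % n
recovered = (puzzle - x) % n
print(recovered == secret)  # True
```

Because the masks are additive mod n, puzzles built with the same `T` can, roughly speaking, be summed so a single sequential solve reveals a linear combination of the secrets; MH-TLP's contribution is making such homomorphic combinations efficient and publicly verifiable.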

[LG-21] Multi-Knowledge Fusion Network for Time Series Representation Learning ICLR

链接: https://arxiv.org/abs/2408.12423
作者: Sagar Srinivas Sakhinana,Shivam Gupta,Krishna Sai Sudhir Aripirala,Venkataramana Runkana
关键词-EN: making informed decisions, sensor networks characterized, high-dimensional multivariate time, forecasting MTS data, MTS data
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: Paper accepted at ML4IoT Workshop, International Conference on Learning Representations(ICLR) 2023

点击查看摘要

Abstract:Forecasting the behaviour of complex dynamical systems such as interconnected sensor networks characterized by high-dimensional multivariate time series (MTS) is of paramount importance for making informed decisions and planning for the future in a broad spectrum of applications. Graph forecasting networks (GFNs) are well-suited for forecasting MTS data that exhibit spatio-temporal dependencies. However, most prior GFN-based methods for MTS forecasting rely on domain expertise to model the nonlinear dynamics of the system, but neglect the potential to leverage the inherent relational-structural dependencies among time series variables underlying MTS data. On the other hand, contemporary works attempt to infer the relational structure of the complex dependencies between the variables and simultaneously learn the nonlinear dynamics of the interconnected system but neglect the possibility of incorporating domain-specific prior knowledge to improve forecast accuracy. To this end, we propose a hybrid architecture that combines explicit prior knowledge with implicit knowledge of the relational structure within the MTS data. It jointly learns intra-series temporal dependencies and inter-series spatial dependencies by encoding time-conditioned structural spatio-temporal inductive biases to provide more accurate and reliable forecasts. It also models the time-varying uncertainty of the multi-horizon forecasts to support decision-making by providing estimates of prediction uncertainty. The proposed architecture has shown promising results on multiple benchmark datasets and outperforms state-of-the-art forecasting methods by a significant margin. We report and discuss the ablation studies to validate our forecasting architecture.

[LG-22] 4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Motion Alignment

链接: https://arxiv.org/abs/2408.12419
作者: Kaihui Cheng,Ce Liu,Qingkun Su,Jun Wang,Liwei Zhang,Yining Tang,Yao Yao,Siyu Zhu,Yuan Qi
关键词-EN: advancing biological research, facilitating pharmaceutical development, protein structures, dynamic protein structures, Protein structure prediction
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Protein structure prediction is pivotal for understanding the structure-function relationship of proteins, advancing biological research, and facilitating pharmaceutical development and experimental design. While deep learning methods and the expanded availability of experimental 3D protein structures have accelerated structure prediction, the dynamic nature of protein structures has received limited attention. This study introduces an innovative 4D diffusion model incorporating molecular dynamics (MD) simulation data to learn dynamic protein structures. Our approach is distinguished by the following components: (1) a unified diffusion model capable of generating dynamic protein structures, including both the backbone and side chains, utilizing atomic grouping and side-chain dihedral angle predictions; (2) a reference network that enhances structural consistency by integrating the latent embeddings of the initial 3D protein structures; and (3) a motion alignment module aimed at improving temporal structural coherence across multiple time steps. To our knowledge, this is the first diffusion-based model aimed at predicting protein trajectories across multiple time steps simultaneously. Validation on benchmark datasets demonstrates that our model exhibits high accuracy in predicting dynamic 3D structures of proteins containing up to 256 amino acids over 32 time steps, effectively capturing both local flexibility in stable states and significant conformational changes.

[LG-23] Unlearning Trojans in Large Language Models : A Comparison Between Natural Language and Source Code

链接: https://arxiv.org/abs/2408.12416
作者: Mahdi Kazemi,Aftab Hussain,Md Rafiqul Islam Rabin,Mohammad Amin Alipour,Sen Lin
关键词-EN: Fisher Information Matrix, large language models, conventional large language, Information Matrix, elastic weight consolidation
类目: Software Engineering (cs.SE); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:This work investigates the application of Machine Unlearning (MU) for mitigating the impact of trojans embedded in conventional large language models of natural language (Text-LLMs) and large language models of code (Code-LLMs). We propose a novel unlearning approach, LYA, that leverages both gradient ascent and elastic weight consolidation, a Fisher Information Matrix (FIM) based regularization technique, to unlearn trojans from poisoned models. We compare the effectiveness of LYA against conventional techniques like fine-tuning, retraining, and vanilla gradient ascent. The subject models we investigate are BERT and CodeBERT, for sentiment analysis and code defect detection tasks, respectively. Our findings demonstrate that the combination of gradient ascent and FIM-based regularization, as done in LYA, outperforms existing methods in removing the trojan’s influence from the poisoned model, while preserving its original functionality. To the best of our knowledge, this is the first work that compares and contrasts MU of trojans in LLMs, in the NL and coding domains.
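The core of LYA, gradient ascent on the poisoned (forget) data plus a Fisher-weighted (EWC) anchor to the original parameters, can be shown on a one-parameter linear model. All constants here (Fisher value, trigger example, strengths) are invented for illustration:

```python
# One-parameter sketch of LYA-style unlearning: gradient *ascent* on
# the trigger (forget) example, plus an EWC penalty F*(theta-theta0)^2
# that anchors the model near its original (clean-task) behaviour.
def grad_loss(theta, x, y):
    # d/dtheta of 0.5 * (theta * x - y)**2 for a linear model
    return (theta * x - y) * x

theta0 = 1.0               # parameter of the poisoned model
fisher = 4.0               # assumed Fisher information on clean data
lam, lr = 0.5, 0.1         # EWC strength and step size
x_trig, y_trig = 1.0, 2.0  # trigger input and its malicious target

theta = theta0
for _ in range(20):
    ascent = grad_loss(theta, x_trig, y_trig)       # raise trigger loss
    anchor = 2 * lam * fisher * (theta - theta0)    # stay near theta0
    theta += lr * (ascent - anchor)

print(round(theta, 3))  # settles between "forget" and "stay put" forces
```

The two terms pull in opposite directions: ascent increases the loss on the trigger, while the Fisher penalty (large for parameters important to clean behaviour) limits how far the model can drift, which is why the combination removes the trojan without destroying original functionality.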

[LG-24] Multi-Source Knowledge-Based Hybrid Neural Framework for Time Series Representation Learning IJCAI-23

链接: https://arxiv.org/abs/2408.12409
作者: Sagar Srinivas Sakhinana,Krishna Sai Sudhir Aripirala,Shivam Gupta,Venkataramana Runkana
关键词-EN: complex dynamical systems, interconnected sensor networks, MTS data, Accurately predicting, multivariate time series
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: Paper is accepted at Knowledge-Based Compositional Generalization Workshop, International Joint Conferences on Artificial Intelligence(IJCAI-23)

点击查看摘要

Abstract:Accurately predicting the behavior of complex dynamical systems, characterized by high-dimensional multivariate time series (MTS) in interconnected sensor networks, is crucial for informed decision-making in various applications to minimize risk. While graph forecasting networks (GFNs) are ideal for forecasting MTS data that exhibit spatio-temporal dependencies, prior works rely solely on domain-specific knowledge of the inter-relationships among time-series variables to model the nonlinear dynamics, neglecting inherent relational structural dependencies among the variables within the MTS data. In contrast, contemporary works infer relational structures from MTS data but neglect domain-specific knowledge. The proposed hybrid architecture addresses these limitations by combining both domain-specific knowledge and implicit knowledge of the relational structure underlying the MTS data using Knowledge-Based Compositional Generalization. The hybrid architecture shows promising results on multiple benchmark datasets, outperforming state-of-the-art forecasting methods. Additionally, the architecture models the time-varying uncertainty of multi-horizon forecasts.

[LG-25] An Evaluation of Deep Learning Models for Stock Market Trend Prediction

链接: https://arxiv.org/abs/2408.12408
作者: Gonzalo Lopez Gil,Paul Duhamel-Sebline,Andrew McCarren
关键词-EN: reflecting economic health, influencing global dynamics, Time Series Forecasting, Time Series, providing investment opportunities
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:The stock market is a fundamental component of financial systems, reflecting economic health, providing investment opportunities, and influencing global dynamics. Accurate stock market predictions can lead to significant gains and promote better investment decisions. However, predicting stock market trends is challenging due to their non-linear and stochastic nature. This study investigates the efficacy of advanced deep learning models for short-term trend forecasting using daily and hourly closing prices from the S&P 500 index and the Brazilian ETF EWZ. The models explored include Temporal Convolutional Networks (TCN), Neural Basis Expansion Analysis for Time Series Forecasting (N-BEATS), Temporal Fusion Transformers (TFT), Neural Hierarchical Interpolation for Time Series Forecasting (N-HiTS), and Time-series Dense Encoder (TiDE). Furthermore, we introduce the Extended Long Short-Term Memory for Time Series (xLSTM-TS) model, an xLSTM adaptation optimised for time series prediction. Wavelet denoising techniques were applied to smooth the signal and reduce minor fluctuations, providing cleaner data as input for all approaches. Denoising significantly improved performance in predicting stock price direction. Among the models tested, xLSTM-TS consistently outperformed others. For example, it achieved a test accuracy of 72.82% and an F1 score of 73.16% on the EWZ daily dataset. By leveraging advanced deep learning models and effective data preprocessing techniques, this research provides valuable insights into the application of machine learning for market movement forecasting, highlighting both the potential and the challenges involved.
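The wavelet-denoising preprocessing can be illustrated with a single-level Haar transform: split the series into pairwise averages and details, soft-threshold the small detail coefficients (the "minor fluctuations"), and invert. The paper's exact wavelet family and threshold are not specified here, so this is only a stand-in:

```python
# Single-level Haar wavelet denoising: split into pairwise averages and
# details, soft-threshold the small detail coefficients, then invert.
def haar_forward(x):
    approx = [(a + b) / 2 for a, b in zip(x[::2], x[1::2])]
    detail = [(a - b) / 2 for a, b in zip(x[::2], x[1::2])]
    return approx, detail

def soft_threshold(coeffs, t):
    return [max(abs(c) - t, 0.0) * (1 if c >= 0 else -1) for c in coeffs]

def haar_inverse(approx, detail):
    out = []
    for a, d in zip(approx, detail):
        out += [a + d, a - d]
    return out

prices = [100.0, 100.4, 101.0, 100.8, 102.0, 101.6, 103.0, 103.2]
a, d = haar_forward(prices)
denoised = haar_inverse(a, soft_threshold(d, 0.3))  # threshold assumed
print([round(v, 2) for v in denoised])
```

With a zero threshold the transform reconstructs the input exactly; raising the threshold removes progressively larger fluctuations while keeping the trend, which is the cleaner signal the forecasting models are trained on.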

[LG-26] Fredholm Integral Equations Neural Operator (FIE-NO) for Data-Driven Boundary Value Problems

链接: https://arxiv.org/abs/2408.12389
作者: Haoyang Jiang,Yongzhi Qu
关键词-EN: Random Fourier Features, Fredholm Integral Equation, Fredholm Integral, Integral Equation Neural, Random Fourier
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:In this paper, we present a novel Fredholm Integral Equation Neural Operator (FIE-NO) method, an integration of Random Fourier Features and Fredholm Integral Equations (FIE) into the deep learning framework, tailored for solving data-driven Boundary Value Problems (BVPs) with irregular boundaries. Unlike traditional computational approaches that struggle with the computational intensity and complexity of such problems, our method offers a robust, efficient, and accurate solution mechanism, using a physics-inspired design of the learning structure. We demonstrate that the proposed physics-guided operator learning method (FIE-NO) achieves superior performance in addressing BVPs. Notably, our approach can generalize across multiple scenarios, including those with unknown equation forms and intricate boundary shapes, after being trained only on one boundary condition. Experimental validation demonstrates that the FIE-NO method performs well in simulated examples, including the Darcy flow equation and typical partial differential equations such as the Laplace and Helmholtz equations. The proposed method exhibits robust performance across different boundary conditions. Experimental results indicate that FIE-NO achieves higher accuracy and stability compared to other methods when addressing complex boundary value problems with varying numbers of interior points.
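The Random Fourier Features ingredient can be demonstrated on its own: random cosine features whose inner product approximates an RBF kernel, which is one way integral-operator kernels can be represented cheaply. The dimensions, bandwidth, and feature count below are arbitrary:

```python
import math
import random

random.seed(0)

# Random Fourier features: z(x) = sqrt(2/D) * cos(w.x + b) with
# w ~ N(0, 1/sigma^2 I) and b ~ U[0, 2*pi), so that z(x).z(y)
# approximates the RBF kernel exp(-||x - y||^2 / (2 * sigma^2)).
def make_rff(dim, D, sigma=1.0):
    ws = [[random.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
          for _ in range(D)]
    bs = [random.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
    def z(x):
        return [math.sqrt(2.0 / D) *
                math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]
    return z

z = make_rff(dim=2, D=5000)
x, y = [0.3, -0.1], [0.5, 0.2]
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)
print(round(approx, 3), round(exact, 3))  # close for large D
```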

[LG-27] Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors ECCV

链接: https://arxiv.org/abs/2408.12387
作者: Fahad Shamshad,Muzammal Naseer,Karthik Nandakumar
关键词-EN: Deep learning-based face, Deep learning-based, significant privacy risks, pose significant privacy, learning-based face recognition
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注: Proceedings of ECCV Workshop on Explainable AI for Biometrics, 2024

点击查看摘要

Abstract:Deep learning-based face recognition (FR) systems pose significant privacy risks by tracking users without their consent. While adversarial attacks can protect privacy, they often produce visible artifacts compromising user experience. To mitigate this issue, recent facial privacy protection approaches advocate embedding adversarial noise into natural-looking makeup styles. However, these methods require training on large-scale makeup datasets that are not always readily available. In addition, these approaches also suffer from dataset bias. For instance, training on makeup data that predominantly contains female faces could compromise protection efficacy for male faces. To handle these issues, we propose a test-time optimization approach that solely optimizes an untrained neural network to transfer makeup style from a reference to a source image in an adversarial manner. We introduce two key modules: a correspondence module that aligns regions between reference and source images in latent space, and a decoder with conditional makeup layers. The untrained decoder, optimized via carefully designed structural and makeup consistency losses, generates a protected image that resembles the source but incorporates adversarial makeup to deceive FR models. As our approach does not rely on training with makeup face datasets, it avoids potential male/female dataset biases while providing effective protection. We further extend the proposed approach to videos by leveraging temporal correlations. Experiments on benchmark datasets demonstrate superior performance in face verification and identification tasks and effectiveness against commercial FR systems. Our code and models will be available at this https URL

[LG-28] Sharper Bounds for Chebyshev Moment Matching with Applications to Differential Privacy and Beyond

链接: https://arxiv.org/abs/2408.12385
作者: Cameron Musco,Christopher Musco,Lucas Rosenblatt,Apoorv Vikram Singh
关键词-EN: Chebyshev polynomial moments, Chebyshev polynomial, polynomial moments, study the problem, problem of approximately
类目: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:We study the problem of approximately recovering a probability distribution given noisy measurements of its Chebyshev polynomial moments. We sharpen prior work, proving that accurate recovery in the Wasserstein distance is possible with more noise than previously known. As a main application, our result yields a simple “linear query” algorithm for constructing a differentially private synthetic data distribution with Wasserstein-1 error \tilde{O}(1/n) based on a dataset of n points in [-1,1] . This bound is optimal up to log factors and matches a recent breakthrough of Boedihardjo, Strohmer, and Vershynin [Probab. Theory. Rel., 2024], which uses a more complex “superregular random walk” method to beat an O(1/\sqrt{n}) accuracy barrier inherent to earlier approaches. We illustrate a second application of our new moment-based recovery bound in numerical linear algebra: by improving an approach of Braverman, Krishnan, and Musco [STOC 2022], our result yields a faster algorithm for estimating the spectral density of a symmetric matrix up to small error in the Wasserstein distance.
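The Chebyshev moments at the heart of the result can be computed from data with the standard three-term recurrence T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x) (a plain empirical-moment sketch, not the paper's recovery algorithm):

```python
def chebyshev_moments(points, k_max):
    """Empirical Chebyshev moments m_k = (1/n) * sum_i T_k(x_i) for points
    in [-1, 1], using T_{k+1}(x) = 2x*T_k(x) - T_{k-1}(x)."""
    n = len(points)
    moments = []
    for k in range(k_max + 1):
        total = 0.0
        for x in points:
            t_prev, t_cur = 1.0, x  # T_0 = 1, T_1 = x
            if k == 0:
                total += t_prev
                continue
            for _ in range(k - 1):
                t_prev, t_cur = t_cur, 2.0 * x * t_cur - t_prev
            total += t_cur
        moments.append(total / n)
    return moments
```

In the differential-privacy application, noise would be added to each m_k before reconstructing a distribution; here we only show the noiseless "linear query" values.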

[LG-29] Sampling Strategies based on Wisdom of Crowds for Amazon Deforestation Detection

链接: https://arxiv.org/abs/2408.12381
作者: Hugo Resende,Eduardo B. Neto,Fabio A. M. Cappabianco,Alvaro L. Fazenda,Fabio A. Faria
关键词-EN: Conserving tropical forests, highly relevant socially, Conserving tropical, Machine Learning models, global ecosystem
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注: 6 pages, 5 figures, paper accepted at SIBGRAPI 2024

点击查看摘要

Abstract:Conserving tropical forests is highly relevant socially and ecologically because of their critical role in the global ecosystem. However, the ongoing deforestation and degradation affect millions of hectares each year, necessitating government or private initiatives to ensure effective forest monitoring. In April 2019, a project based on Citizen Science and Machine Learning models called ForestEyes (FE) was launched with the aim of providing supplementary data to assist experts from government and non-profit organizations in their deforestation monitoring efforts. Recent research has shown that the labels provided by FE project volunteers/citizen scientists help tailor machine learning models. In this sense, we adopt the FE project to create different sampling strategies based on the wisdom of crowds, selecting the most suitable samples from the training set to learn an SVM technique and obtain better classification results in deforestation detection tasks. In our experiments, we show that our strategy based on increasing user entropy achieved the best classification results in the deforestation detection task when compared with random sampling strategies, while also reducing the convergence time of the SVM technique.
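The wisdom-of-crowds idea of preferring samples whose volunteer votes disagree the most can be sketched with Shannon entropy over the crowd labels (an illustrative reading of the entropy-based strategy; the field names are hypothetical):

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Shannon entropy (in bits) of volunteer votes for one sample."""
    counts = Counter(votes)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_most_uncertain(samples, k):
    """Pick the k samples whose crowd votes disagree the most."""
    return sorted(samples, key=lambda s: vote_entropy(s["votes"]), reverse=True)[:k]
```

Samples with unanimous votes have zero entropy and would be deprioritized; a 50/50 split maximizes entropy and is selected first for training the classifier.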

[LG-30] Cell-ontology guided transcriptome foundation model DATE

链接: https://arxiv.org/abs/2408.12373
作者: Xinyu Yuan,Zhihao Zhan,Zuobai Zhang,Manqi Zhou,Jianan Zhao,Boyu Han,Yue Li,Jian Tang
关键词-EN: Transcriptome foundation models, hold great promises, TFMs hold great, dictate diverse cell, diverse cell functions
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: All anonymous reviewers’ constructive suggestions are appreciated. The next version will be updated soon

点击查看摘要

Abstract:Transcriptome foundation models (TFMs) hold great promise for deciphering the transcriptomic language that dictates diverse cell functions via self-supervised learning on large-scale single-cell gene expression data, and ultimately unraveling the complex mechanisms of human diseases. However, current TFMs treat cells as independent samples and ignore the taxonomic relationships between cell types, which are available in cell ontology graphs. We argue that effectively leveraging this ontology information during TFM pre-training can improve learning biologically meaningful gene co-expression patterns while preserving the TFM as a general-purpose foundation model for downstream zero-shot and fine-tuning tasks. To this end, we present the single-cell, Cell-ontology guided TFM, scCello. We introduce a cell-type coherence loss and an ontology alignment loss, which are minimized along with the masked gene expression prediction loss during pre-training. These novel loss components guide scCello to learn cell-type-specific representations and the structural relations between cell types from the cell ontology graph, respectively. We pre-trained scCello on 22 million cells from the CellxGene database, leveraging their cell-type labels mapped to the cell ontology graph from the Open Biological and Biomedical Ontology Foundry. Our TFM demonstrates competitive generalization and transferability performance over existing TFMs on biologically important tasks, including identifying novel cell types of unseen cells, prediction of cell-type-specific marker genes, and cancer drug responses.

[LG-31] Robust Principal Component Analysis via Discriminant Sample Weight Learning

链接: https://arxiv.org/abs/2408.12366
作者: Yingzhuo Deng,Ke Hu,Bo Li,Yao Zhang
关键词-EN: Principal component analysis, PCA projection matrix, projection matrix, classical feature extraction, Principal component
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Principal component analysis (PCA) is a classical feature extraction method, but it may be adversely affected by outliers, resulting in inaccurate learning of the projection matrix. This paper proposes a robust method to estimate both the data mean and the PCA projection matrix by learning discriminant sample weights from data containing outliers. Each sample in the dataset is assigned a weight, and the proposed algorithm iteratively learns the weights, the mean, and the projection matrix. Specifically, when the mean and the projection matrix are available, via fine-grained analysis of outliers, a weight for each sample is learned hierarchically so that outliers have small weights while normal samples have large weights. With the learned weights available, a weighted optimization problem is solved to estimate both the data mean and the projection matrix. Because the learned weights discriminate outliers from normal samples, the adverse influence of outliers is mitigated by the corresponding small weights. Experiments on toy data, a UCI dataset, and a face dataset demonstrate the effectiveness of the proposed method in estimating the mean and the projection matrix from data containing outliers.
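The alternation between weight learning and mean estimation can be illustrated for the mean alone with a simple iteratively re-weighted scheme (the specific weight function below is a generic robust choice for illustration, not necessarily the paper's):

```python
def robust_mean(data, n_iters=20):
    """Iteratively re-weighted mean: samples far from the current estimate
    get small weights, shrinking the influence of outliers."""
    mu = sum(data) / len(data)
    weights = [1.0] * len(data)
    for _ in range(n_iters):
        # Weight decays with squared distance to the current mean estimate.
        weights = [1.0 / (1.0 + (x - mu) ** 2) for x in data]
        total = sum(weights)
        mu = sum(w * x for w, x in zip(weights, data)) / total
    return mu, weights
```

On data with a gross outlier, the plain mean is pulled far off while the re-weighted mean stays near the inlier cluster, with the outlier receiving a near-zero weight.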

[LG-32] Enhancing Uncertainty Communication in Time Series Predictions: Insights and Recommendations

链接: https://arxiv.org/abs/2408.12365
作者: Apoorva Karagappa,Pawandeep Kaur Betz,Jonas Gilg,Moritz Zeumer,Andreas Gerndt,Bernhard Preim
关键词-EN: time series predictions, time series, time series forecast, series predictions, uncertainty
类目: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:As the world increasingly relies on mathematical models for forecasts in different areas, effective communication of uncertainty in time series predictions is important for informed decision making. This study explores how users estimate probabilistic uncertainty in time series predictions under different variants of line charts depicting uncertainty. It examines the role of individual characteristics and the influence of user-reported metrics on uncertainty estimations. By addressing these aspects, this paper aims to enhance the understanding of uncertainty visualization, improve communication in time series forecast visualizations, and inform the design of prediction data dashboards.

[LG-33] Fine-tuning Smaller Language Models for Question Answering over Financial Documents

链接: https://arxiv.org/abs/2408.12337
作者: Karmvir Singh Phogat,Sai Akhil Puranam,Sridhar Dasaratha,Chetan Harsha,Shashishekar Ramakrishna
关键词-EN: Recent research, acquire substantial reasoning, substantial reasoning abilities, reasoning exemplars crafted, significantly larger teacher
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
*备注:

点击查看摘要

Abstract:Recent research has shown that smaller language models can acquire substantial reasoning abilities when fine-tuned with reasoning exemplars crafted by a significantly larger teacher model. We explore this paradigm for the financial domain, focusing on the challenge of answering questions that require multi-hop numerical reasoning over financial texts. We assess the performance of several smaller models that have been fine-tuned to generate programs that encode the required financial reasoning and calculations. Our findings demonstrate that these fine-tuned smaller models approach the performance of the teacher model. To provide a granular analysis of model performance, we propose an approach to investigate the specific student model capabilities that are enhanced by fine-tuning. Our empirical analysis indicates that fine-tuning refines the student models' ability to express and apply the required financial concepts along with adapting the entity extraction to the specific data format. In addition, we hypothesize and demonstrate that comparable financial reasoning capability can be induced using relatively smaller datasets.
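The kind of "program" such student models are trained to emit can be sketched with a tiny FinQA-style executor (the operation set and argument conventions here are assumptions for illustration, not the paper's exact format):

```python
def execute_program(program, values):
    """Evaluate a FinQA-style reasoning program: a list of (op, arg1, arg2)
    steps. Args are numeric literals, named values extracted from the
    document, or '#k' references to the result of step k."""
    ops = {
        "add": lambda a, b: a + b,
        "subtract": lambda a, b: a - b,
        "multiply": lambda a, b: a * b,
        "divide": lambda a, b: a / b,
    }
    results = []

    def resolve(arg):
        if isinstance(arg, str) and arg.startswith("#"):
            return results[int(arg[1:])]  # reference to an earlier step
        if isinstance(arg, str):
            return values[arg]            # named value from the text/table
        return float(arg)

    for op, a, b in program:
        results.append(ops[op](resolve(a), resolve(b)))
    return results[-1]
```

For example, a growth-rate question becomes a two-step program: subtract last year's revenue from this year's, then divide by last year's.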

[LG-34] Enhanced Expressivity in Graph Neural Networks with Lanczos-Based Linear Constraints

链接: https://arxiv.org/abs/2408.12334
作者: Niloofar Azizi,Nils Kriege,Horst Bischof
关键词-EN: Graph Neural Networks, Message Passing GNNs, Neural Networks, Message Passing, commonly used Message
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Graph Neural Networks (GNNs) excel in handling graph-structured data but often underperform in link prediction tasks compared to classical methods, mainly due to the limitations of the commonly used Message Passing GNNs (MPNNs). Notably, their ability to distinguish non-isomorphic graphs is limited by the 1-dimensional Weisfeiler-Lehman test. Our study presents a novel method to enhance the expressivity of GNNs by embedding induced subgraphs into the graph Laplacian matrix's eigenbasis. We introduce a Learnable Lanczos algorithm with Linear Constraints (LLwLC), proposing two novel subgraph extraction strategies: encoding vertex-deleted subgraphs and applying Neumann eigenvalue constraints. For the former, we conjecture that LLwLC establishes a universal approximator, offering efficient time complexity. The latter focuses on link representations enabling differentiation between k-regular graphs and node automorphism, a vital aspect for link prediction tasks. Our approach results in an extremely lightweight architecture, reducing the need for extensive training datasets. Empirically, our method improves performance in challenging link prediction tasks across benchmark datasets, establishing its practical utility and supporting our theoretical findings. Notably, LLwLC achieves 20x and 10x speedups while requiring only 5% and 10% of the data from the PubMed and OGBL-Vessel datasets, respectively, compared to the state-of-the-art.
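The basic Lanczos iteration the method builds on tridiagonalizes a symmetric operator while constructing an orthonormal basis (a textbook sketch without the paper's learnable linear constraints):

```python
import math

def lanczos(matvec, v0, m):
    """Basic Lanczos iteration: builds an orthonormal basis V and the
    tridiagonal coefficients (alphas on the diagonal, betas off-diagonal)
    of a symmetric operator given only matrix-vector products."""
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    V, alphas, betas = [v], [], []
    v_prev = [0.0] * len(v0)
    beta = 0.0
    for _ in range(m):
        w = matvec(v)
        alpha = sum(a * b for a, b in zip(w, v))
        # Three-term recurrence: orthogonalize against the two previous vectors.
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(sum(x * x for x in w))
        if beta < 1e-12:
            break  # invariant subspace found
        betas.append(beta)
        v_prev, v = v, [x / beta for x in w]
        V.append(v)
    return V, alphas, betas
```

For the 3x3 tridiagonal matrix with 2 on the diagonal and 1 off-diagonal, starting from e1, the iteration reproduces the matrix exactly (alphas all 2, betas all 1).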

[LG-35] PolyRouter: A Multi-LLM Querying System

链接: https://arxiv.org/abs/2408.12320
作者: Dimitris Stripelis,Zijian Hu,Jipeng Zhang,Zhaozhuo Xu,Alay Shah,Han Jin,Yuhang Yao,Salman Avestimehr,Chaoyang He
关键词-EN: Large Language Models, Large Language, possessing domain-specific expertise, growth of Large, domain-specific expertise
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: 14 pages, 7 figures, 2 tables

点击查看摘要

Abstract:With the rapid growth of Large Language Models (LLMs) across various domains, numerous new LLMs have emerged, each possessing domain-specific expertise. This proliferation has highlighted the need for quick, high-quality, and cost-effective LLM query response methods. Yet, no single LLM exists to efficiently balance this trilemma. Some models are powerful but extremely costly, while others are fast and inexpensive but qualitatively inferior. To address this challenge, we present PolyRouter, a non-monolithic LLM querying system that seamlessly integrates various LLM experts into a single query interface and dynamically routes incoming queries to the highest-performing expert based on the query's requirements. Through extensive experiments, we demonstrate that when compared to standalone expert models, PolyRouter improves query efficiency by up to 40%, and leads to significant cost reductions of up to 30%, while maintaining or enhancing model performance by up to 10%.
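A cost/quality-aware routing policy of the kind described can be sketched as follows (the expert metadata and selection rule are illustrative assumptions, not PolyRouter's actual router):

```python
def route_query(query, experts):
    """Route to the cheapest expert whose predicted quality meets the
    query's requirement; fall back to the highest-quality expert when
    no expert qualifies."""
    eligible = [e for e in experts if e["quality"] >= query["min_quality"]]
    if eligible:
        return min(eligible, key=lambda e: e["cost"])["name"]
    return max(experts, key=lambda e: e["quality"])["name"]
```

Easy queries thus land on cheap models and hard ones on expensive models, which is how a router can cut cost while holding quality, the trilemma trade-off the abstract describes.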

[LG-36] Deep Learning with CNNs: A Compact Holistic Tutorial with Focus on Supervised Regression (Preprint)

链接: https://arxiv.org/abs/2408.12308
作者: Yansel Gonzalez Tejeda,Helmut A. Mayer
关键词-EN: Convolutional Neural Networks, Deep Learning, Learning, address Deep Learning, Neural Networks
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:In this tutorial, we present a compact and holistic discussion of Deep Learning with a focus on Convolutional Neural Networks (CNNs) and supervised regression. While there are numerous books and articles on the individual topics we cover, comprehensive and detailed tutorials that address Deep Learning from a foundational yet rigorous and accessible perspective are rare. Most resources on CNNs are either too advanced, focusing on cutting-edge architectures, or too narrow, addressing only specific applications like image classification. This tutorial not only summarizes the most relevant concepts but also provides an in-depth exploration of each, offering a complete yet agile set of ideas. Moreover, we highlight the powerful synergy between learning theory, statistics, and machine learning, which together underpin the Deep Learning and CNN frameworks. We aim for this tutorial to serve as an optimal resource for students, professors, and anyone interested in understanding the foundations of Deep Learning. Upon acceptance we will provide an accompanying repository at this https URL. Keywords: Tutorial, Deep Learning, Convolutional Neural Networks, Machine Learning.
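The core CNN operation such a tutorial builds on, a "valid" 2D convolution (cross-correlation) with stride 1 and no padding, can be written out directly:

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation as used in CNN layers: no padding,
    stride 1, output size (H-kh+1) x (W-kw+1)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # Slide the kernel over the image and accumulate the products.
            out[i][j] = sum(
                image[i + u][j + v] * kernel[u][v]
                for u in range(kh) for v in range(kw)
            )
    return out
```

A 2x2 kernel on a 3x3 image yields a 2x2 feature map; real CNN layers add channels, bias, and a nonlinearity on top of exactly this loop.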

[LG-37] Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning

链接: https://arxiv.org/abs/2408.12307
作者: Yen-Ru Lai,Fu-Chieh Chang,Pei-Yuan Wu
关键词-EN: Offline reinforcement learning, requires large amounts, reinforcement learning, learns policies, fixed dataset
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Offline reinforcement learning (RL) learns policies from a fixed dataset, but often requires large amounts of data. The challenge arises when labeled datasets are expensive, especially when rewards have to be provided by human labelers for large datasets. In contrast, unlabeled data tends to be less expensive. This situation highlights the importance of finding effective ways to use unlabeled data in offline RL, especially when labeled data is limited or expensive to obtain. In this paper, we present an algorithm that utilizes unlabeled data in offline RL with kernel function approximation and provide a theoretical guarantee. We present various eigenvalue decay conditions of \mathcal{H}_k which determine the complexity of the algorithm. In summary, our work provides a promising approach for exploiting the advantages offered by unlabeled data in offline RL, whilst maintaining theoretical assurances.

[LG-38] Tackling Data Heterogeneity in Federated Learning via Loss Decomposition MICCAI2024

链接: https://arxiv.org/abs/2408.12300
作者: Shuang Zeng,Pengxin Guo,Shuai Wang,Jianbo Wang,Yuyin Zhou,Liangqiong Qu
关键词-EN: privacy-preserving machine learning, Federated Learning, large-scale medical datasets, medical datasets remain, datasets remain localized
类目: Machine Learning (cs.LG)
*备注: Accepted at MICCAI 2024

点击查看摘要

Abstract:Federated Learning (FL) is a rising approach towards collaborative and privacy-preserving machine learning where large-scale medical datasets remain localized to each client. However, the issue of data heterogeneity among clients often compels local models to diverge, leading to suboptimal global models. To mitigate the impact of data heterogeneity on FL performance, we start by analyzing how FL training influences FL performance, decomposing the global loss into three terms: local loss, distribution shift loss, and aggregation loss. Remarkably, our loss decomposition reveals that existing local training-based FL methods attempt to reduce the distribution shift loss, while the global aggregation-based FL methods propose better aggregation strategies to reduce the aggregation loss. Nevertheless, a comprehensive joint effort to minimize all three terms is currently limited in the literature, leading to subpar performance when dealing with data heterogeneity challenges. To fill this gap, we propose a novel FL method based on global loss decomposition, called FedLD, to jointly reduce these three loss terms. Our FedLD involves a margin control regularization in local training to reduce the distribution shift loss, and a principal gradient-based server aggregation strategy to reduce the aggregation loss. Notably, under different levels of data heterogeneity, our strategies achieve better and more robust performance on retinal and chest X-ray classification compared to other FL algorithms. Our code is available at this https URL.

[LG-39] Geometrical structures of digital fluctuations in parameter space of neural networks trained with adaptive momentum optimization

链接: https://arxiv.org/abs/2408.12273
作者: Igor V. Netay
关键词-EN: stochastic gradient-based optimization, present results, stochastic gradient-based, gradient-based optimization, adaptive momentum
类目: Machine Learning (cs.LG); Numerical Analysis (math.NA)
*备注:

点击查看摘要

Abstract:We present results of numerical experiments for neural networks trained with stochastic gradient-based optimization with adaptive momentum. This widely applied optimizer has proven convergence guarantees and practical efficiency, but becomes numerically unstable in long training runs. We show that numerical artifacts are observable not only for large-scale models, and that they eventually lead to divergence even in the case of shallow narrow networks. We support this claim with experiments on more than 1600 neural networks trained for 50000 epochs. Local observations show the presence of the same behavior of network parameters in both stable and unstable training segments. Geometrically, the parameters trace double twisted spirals in parameter space, caused by numerical perturbations alternating with subsequent relaxation oscillations in the first- and second-momentum values.
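For reference, the adaptive-momentum (Adam-style) update whose long-run numerical behavior is studied has this standard textbook form (default hyperparameters assumed):

```python
def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter: exponential moving averages
    of the gradient (m, first momentum) and its square (v, second momentum),
    bias-corrected, then a scaled step."""
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad * grad
    m_hat = m / (1.0 - b1 ** t)   # bias correction for the first moment
    v_hat = v / (1.0 - b2 ** t)   # bias correction for the second moment
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v
```

The m and v state variables are exactly the first and second momentum whose perturbation/relaxation interplay the paper links to the spiral trajectories.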

[LG-40] Variance reduction of diffusion models gradients with Taylor approximation-based control variate ICML

链接: https://arxiv.org/abs/2408.12270
作者: Paul Jeha,Will Grathwohl,Michael Riis Andersen,Carl Henrik Ek,Jes Frellsen
关键词-EN: denoising score matching, Score-based models, trained with denoising, score matching, high dimensional data
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: 14 pages, ICML Structured Probabilistic Inference Generative Modeling 2024

点击查看摘要

Abstract:Score-based models, trained with denoising score matching, are remarkably effective in generating high dimensional data. However, the high variance of their training objective hinders optimisation. We attempt to reduce it with a control variate, derived via a k-th order Taylor expansion of the training objective and its gradient. We prove an equivalence between the two, empirically demonstrate the effectiveness of our approach in a low-dimensional problem setting, and study its effect on larger problems.
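The control-variate principle can be demonstrated on a toy integrand: subtracting a correlated function with known mean leaves the estimator unbiased while shrinking its variance. Here the control is a first-order Taylor expansion of the target, mirroring the paper's idea in one dimension (the coefficient c = 1 is a simplification; one would normally fit it):

```python
import math
import random

def mean_and_var(xs):
    """Sample mean and (population) variance of a list of floats."""
    n = len(xs)
    mu = sum(xs) / n
    return mu, sum((x - mu) ** 2 for x in xs) / n

random.seed(0)
xs = [random.uniform(0.0, 1.0) for _ in range(5000)]
f = [math.exp(x) for x in xs]   # target: estimate E[e^X], X ~ U(0, 1)
g = [1.0 + x for x in xs]       # control: first-order Taylor of e^x, E[g] = 1.5
c = 1.0                         # fixed control-variate coefficient for this sketch
adjusted = [fi - c * (gi - 1.5) for fi, gi in zip(f, g)]
```

Both estimators target E[e^X] = e - 1, but the adjusted one has several times lower variance because the Taylor control absorbs most of the fluctuation.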

[LG-41] Accounts of using the Tustin-Net architecture on a rotary inverted pendulum

链接: https://arxiv.org/abs/2408.12266
作者: Stijn van Esch,Fabio Bonassi,Thomas B. Schön
关键词-EN: Tustin neural network, rotary inverse pendulum, neural network architecture, physical rotary inverse, Tustin neural
类目: ystems and Control (eess.SY); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:In this report we investigate the use of the Tustin neural network architecture (Tustin-Net) for the identification of a physical rotary inverted pendulum. This physics-based architecture is of particular interest as it builds on the known relationship between velocities and positions. We discuss the advantages, limitations and performance of Tustin-Nets compared to first-principles grey-box models on a real physical apparatus, showing how, with a standard training procedure, the former can hardly achieve the same accuracy as the latter. To address this limitation, we present a training strategy based on transfer learning that yields Tustin-Nets that are competitive with the first-principles model, without requiring the extensive knowledge of the setup that the latter demands.
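The physics prior Tustin-Nets build on is the trapezoidal (Tustin) relation between positions and velocities, which is exact whenever the velocity varies linearly between samples:

```python
def tustin_integrate(p0, velocities, dt):
    """Trapezoidal (Tustin) update linking positions to velocities:
    p[k+1] = p[k] + dt/2 * (v[k] + v[k+1])."""
    positions = [p0]
    for vk, vk1 in zip(velocities, velocities[1:]):
        positions.append(positions[-1] + 0.5 * dt * (vk + vk1))
    return positions
```

In a Tustin-Net, a learned network predicts the velocities while this fixed integration rule produces the positions, so the model only has to learn the part of the dynamics that physics does not already pin down.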

[LG-42] Toward the Evaluation of Large Language Models Considering Score Variance across Instruction Templates

链接: https://arxiv.org/abs/2408.12263
作者: Yusuke Sakai,Adam Nohejl,Jiangnan Hang,Hidetaka Kamigaito,Taro Watanabe
关键词-EN: natural language understanding, large language models, NLU performance, language understanding, language models
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: 19 pages, 7 figures

点击查看摘要

Abstract:The natural language understanding (NLU) performance of large language models (LLMs) has been evaluated across various tasks and datasets. The existing evaluation methods, however, do not take into account the variance in scores due to differences in prompts, which leads to unfair evaluation and comparison of NLU performance. Moreover, evaluation designed for specific prompts is inappropriate for instruction tuning, which aims to perform well with any prompt. It is therefore necessary to find a way to measure NLU performance in a fair manner, considering score variance between different instruction templates. In this study, we provide English and Japanese cross-lingual datasets for evaluating the NLU performance of LLMs, which include multiple instruction templates for fair evaluation of each task, along with regular expressions to constrain the output format. Furthermore, we propose the Sharpe score as an evaluation metric that takes into account the variance in scores between templates. Comprehensive analysis of English and Japanese LLMs reveals that the high variance among templates has a significant impact on the fair evaluation of LLMs.
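A plausible reading of the proposed Sharpe score is mean accuracy across templates divided by its standard deviation, so that template sensitivity is penalised (the exact formula here is an assumption for illustration; see the paper for the definitive definition):

```python
import math

def sharpe_score(template_scores, eps=1e-8):
    """Mean accuracy across instruction templates divided by its standard
    deviation: models whose scores swing with the template are penalised."""
    n = len(template_scores)
    mu = sum(template_scores) / n
    var = sum((s - mu) ** 2 for s in template_scores) / n
    return mu / (math.sqrt(var) + eps)
```

Two models with the same mean accuracy then rank differently: the one that is stable across templates scores far higher than the one that happens to do well only on some prompts.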

[LG-43] LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction

链接: https://arxiv.org/abs/2408.12249
作者: Aishik Nagar,Viktor Schlegel,Thanh-Tung Nguyen,Hao Li,Yuping Wu,Kuluhan Binici,Stefan Winkler
关键词-EN: Large Language Models, Large Language, Language Models, Named Entity Recognition, document summarisation
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: 11 pages

点击查看摘要

Abstract:Large Language Models (LLMs) are increasingly adopted for applications in healthcare, reaching the performance of domain experts on tasks such as question answering and document summarisation. Despite their success on these tasks, it is unclear how well LLMs perform on tasks that are traditionally pursued in the biomedical domain, such as structured information extraction. To bridge this gap, in this paper, we systematically benchmark LLM performance in Medical Classification and Named Entity Recognition (NER) tasks. We aim to disentangle the contribution of different factors to the performance, particularly the impact of LLMs' task knowledge and reasoning capabilities, their (parametric) domain knowledge, and addition of external knowledge. To this end we evaluate various open LLMs – including BioMistral and Llama-2 models – on a diverse set of biomedical datasets, using standard prompting, Chain-of-Thought (CoT) and Self-Consistency based reasoning as well as Retrieval-Augmented Generation (RAG) with PubMed and Wikipedia corpora. Counter-intuitively, our results reveal that standard prompting consistently outperforms more complex techniques across both tasks, laying bare the limitations in the current application of CoT, self-consistency and RAG in the biomedical domain. Our findings suggest that advanced prompting methods developed for knowledge- or reasoning-intensive tasks, such as CoT or RAG, are not easily portable to biomedical tasks where precise structured outputs are required. This highlights the need for more effective integration of external knowledge and reasoning mechanisms in LLMs to enhance their performance in real-world biomedical applications.

[LG-44] Weight Scope Alignment: A Frustratingly Easy Method for Model Merging

链接: https://arxiv.org/abs/2408.12237
作者: Yichu Xu,Xin-Chun Li,Le Gan,De-Chuan Zhan
关键词-EN: weight scope, efficiency and robustness, fundamental procedure, weight, scope
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Merging models becomes a fundamental procedure in some applications that consider model efficiency and robustness. The training randomness or Non-I.I.D. data poses a huge challenge for averaging-based model fusion. Previous research efforts focus on element-wise regularization or neural permutations to enhance model averaging while overlooking weight scope variations among models, which can significantly affect merging effectiveness. In this paper, we reveal variations in weight scope under different training conditions, shedding light on its influence on model merging. Fortunately, the parameters in each layer basically follow the Gaussian distribution, which inspires a novel and simple regularization approach named Weight Scope Alignment (WSA). It contains two key components: 1) leveraging a target weight scope to guide the model training process for ensuring weight scope matching in the subsequent model merging. 2) fusing the weight scope of two or more models into a unified one for multi-stage model fusion. We extend the WSA regularization to two different scenarios, including Mode Connectivity and Federated Learning. Abundant experimental studies validate the effectiveness of our approach.
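The scope-matching idea can be sketched per layer: standardize a weight vector and rescale it to a target (mean, std) before averaging-based merging (an illustrative reduction of WSA; the authors' full procedure also shapes the training itself):

```python
def align_weight_scope(weights, target_mean, target_std, eps=1e-8):
    """Rescale a layer's weights so their Gaussian 'scope' (mean, std)
    matches a target scope, making subsequent weight averaging between
    models operate on comparable distributions."""
    n = len(weights)
    mu = sum(weights) / n
    std = (sum((w - mu) ** 2 for w in weights) / n) ** 0.5
    # Standardize, then map onto the target scope.
    return [(w - mu) / (std + eps) * target_std + target_mean for w in weights]
```

After alignment, two models' corresponding layers share the same first and second moments, so element-wise averaging no longer mixes mismatched scales.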

[LG-45] Relational decomposition for program synthesis

链接: https://arxiv.org/abs/2408.12212
作者: Céline Hocquette,Andrew Cropper
关键词-EN: relational synthesis sub-tasks, decomposes complex functional, complex functional tasks, simpler relational synthesis, synthesis sub-tasks
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:We introduce a novel approach to program synthesis that decomposes complex functional tasks into simpler relational synthesis sub-tasks. We demonstrate the effectiveness of our approach using an off-the-shelf inductive logic programming (ILP) system on three challenging datasets. Our results show that (i) a relational representation can outperform a functional one, and (ii) an off-the-shelf ILP system with a relational encoding can outperform domain-specific approaches.

[LG-46] Fair Augmentation for Graph Collaborative Filtering

链接: https://arxiv.org/abs/2408.12208
作者: Ludovico Boratto,Francesco Fabbri,Gianni Fenu,Mirko Marras,Giacomo Medda
关键词-EN: learning users’ preferences, users’ preferences, preferences from user-item, graph collaborative filtering, collaborative power
类目: Information Retrieval (cs.IR); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Recent developments in recommendation have harnessed the collaborative power of graph neural networks (GNNs) in learning users’ preferences from user-item networks. Despite emerging regulations addressing fairness of automated systems, unfairness issues in graph collaborative filtering remain underexplored, especially from the consumer’s perspective. Despite numerous contributions on consumer unfairness, only a few of these works have delved into GNNs. A notable gap exists in the formalization of the latest mitigation algorithms, as well as in their effectiveness and reliability on cutting-edge models. This paper serves as a solid response to recent research highlighting unfairness issues in graph collaborative filtering by reproducing one of the latest mitigation methods. The reproduced technique adjusts the system fairness level by learning a fair graph augmentation. Under an experimental setup based on 11 GNNs, 5 non-GNN models, and 5 real-world networks across diverse domains, our investigation reveals that fair graph augmentation is consistently effective on high-utility models and large datasets. Experiments on the transferability of the fair augmented graph open new issues for future recommendation studies. Source code: this https URL.

[LG-47] Two-level deep domain decomposition method

链接: https://arxiv.org/abs/2408.12198
作者: Victorita Dolean,Serge Gratton,Alexander Heinlein,Valentin Mercier
关键词-EN: Domain Decomposition Method, Deep Domain Decomposition, Domain Decomposition, two-level Deep Domain, Decomposition Method
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: Preprint proceeding format

点击查看摘要

Abstract:This study presents a two-level Deep Domain Decomposition Method (Deep-DDM) augmented with a coarse-level network for solving boundary value problems using physics-informed neural networks (PINNs). The addition of the coarse level network improves scalability and convergence rates compared to the single level method. Tested on a Poisson equation with Dirichlet boundary conditions, the two-level deep DDM demonstrates superior performance, maintaining efficient convergence regardless of the number of subdomains. This advance provides a more scalable and effective approach to solving complex partial differential equations with machine learning.
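To make the abstract's setting concrete, here is a minimal pure-Python sketch of the classical alternating Schwarz domain decomposition on a 1-D Poisson problem with two overlapping subdomains — the iteration that Deep-DDM replaces with PINN subdomain solvers (and augments with a coarse level). Grid size, overlap, and function names are illustrative, not from the paper.

```python
def solve_tridiag(n, h, f, left, right):
    """Solve -u'' = f on n interior points of a uniform grid (spacing h)
    with Dirichlet values `left`/`right`, via the Thomas algorithm."""
    a = [-1.0] * n          # sub-diagonal
    b = [2.0] * n           # diagonal
    c = [-1.0] * n          # super-diagonal
    d = [f * h * h] * n     # right-hand side
    d[0] += left            # fold boundary values into the RHS
    d[-1] += right
    for i in range(1, n):   # forward elimination
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    u = [0.0] * n
    u[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        u[i] = (d[i] - c[i] * u[i + 1]) / b[i]
    return u

def schwarz_poisson(n_grid=101, overlap=10, sweeps=50):
    """Alternating Schwarz on [0,1] for -u'' = 1, u(0)=u(1)=0.
    Exact solution: u(x) = x(1-x)/2."""
    h = 1.0 / (n_grid - 1)
    u = [0.0] * n_grid
    mid = n_grid // 2
    lo_end = mid + overlap      # right boundary index of left subdomain
    hi_start = mid - overlap    # left boundary index of right subdomain
    for _ in range(sweeps):
        # left subdomain: unknowns at indices 1..lo_end-1
        u[1:lo_end] = solve_tridiag(lo_end - 1, h, 1.0, u[0], u[lo_end])
        # right subdomain: unknowns at indices hi_start+1..n_grid-2
        u[hi_start + 1:n_grid - 1] = solve_tridiag(
            n_grid - 2 - hi_start, h, 1.0, u[hi_start], u[-1])
    return u, h
```

With a generous overlap the two-subdomain iteration contracts geometrically; the coarse-level network in the paper addresses the degradation of this rate as the number of subdomains grows.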

[LG-48] Rank and Align: Towards Effective Source-free Graph Domain Adaptation IJCAI2024

链接: https://arxiv.org/abs/2408.12185
作者: Junyu Luo,Zhiping Xiao,Yifan Wang,Xiao Luo,Jingyang Yuan,Wei Ju,Langechuan Liu,Ming Zhang
关键词-EN: achieved impressive performance, Graph neural networks, graph domain adaptation, neural networks, achieved impressive
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
*备注: Published in IJCAI2024

点击查看摘要

Abstract:Graph neural networks (GNNs) have achieved impressive performance in graph domain adaptation. However, extensive source graphs could be unavailable in real-world scenarios due to privacy and storage concerns. To this end, we investigate an underexplored yet practical problem of source-free graph domain adaptation, which transfers knowledge from source models instead of source graphs to a target domain. To solve this problem, we introduce a novel GNN-based approach called Rank and Align (RNA), which ranks graph similarities with spectral seriation for robust semantics learning, and aligns inharmonic graphs with harmonic graphs which are close to the source domain for subgraph extraction. In particular, to overcome label scarcity, we employ the spectral seriation algorithm to infer the robust pairwise rankings, which can guide semantic learning using a similarity learning objective. To depict distribution shifts, we utilize spectral clustering and the silhouette coefficient to detect harmonic graphs, which the source model can easily classify. To reduce potential domain discrepancy, we extract domain-invariant subgraphs from inharmonic graphs by an adversarial edge sampling process, which guides the invariant learning of GNNs. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our proposed RNA.
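The abstract relies on the silhouette coefficient to detect "harmonic" graphs. As a reminder of what that score measures, here is a toy pure-Python implementation for 1-D points (the paper applies it to graph embeddings; this scalar version is only illustrative):

```python
def silhouette(points, labels):
    """Mean silhouette coefficient: s_i = (b_i - a_i) / max(a_i, b_i),
    where a_i is the mean intra-cluster distance of point i and
    b_i its mean distance to the nearest other cluster."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        same = [j for j in clusters[lab] if j != i]
        if not same:                 # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = sum(abs(points[i] - points[j]) for j in same) / len(same)
        b = min(
            sum(abs(points[i] - points[j]) for j in idx) / len(idx)
            for other, idx in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Scores close to 1 indicate well-separated, internally tight clusters — the regime in which the paper trusts the source model's classification.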

[LG-49] How disentangled are your classification uncertainties?

链接: https://arxiv.org/abs/2408.12175
作者: Ivo Pascal de Jong,Andreea Ioana Sburlea,Matias Valdenegro-Toro
关键词-EN: Quantification in Machine, Machine Learning, Learning has progressed, Information Theoretic approach, Uncertainty Quantification
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注: 11 pages, 11 figures

点击查看摘要

Abstract:Uncertainty Quantification in Machine Learning has progressed to predicting the source of uncertainty in a prediction: Uncertainty from stochasticity in the data (aleatoric), or uncertainty from limitations of the model (epistemic). Generally, each uncertainty is evaluated in isolation, but this obscures the fact that they are often not truly disentangled. This work proposes a set of experiments to evaluate disentanglement of aleatoric and epistemic uncertainty, and uses these methods to compare two competing formulations for disentanglement (the Information Theoretic approach, and the Gaussian Logits approach). The results suggest that the Information Theoretic approach gives better disentanglement, but that either predicted source of uncertainty is still largely contaminated by the other for both methods. We conclude that with the current methods for disentangling, aleatoric and epistemic uncertainty are not reliably separated, and we provide a clear set of experimental criteria that good uncertainty disentanglement should follow.
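One of the two formulations compared in the abstract, the Information Theoretic approach, decomposes the predictive entropy of an ensemble into an aleatoric term (expected entropy) and an epistemic term (mutual information). A minimal pure-Python sketch of that decomposition for categorical predictions:

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a categorical distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def decompose_uncertainty(member_probs):
    """Information-theoretic split for an ensemble's predictions on one input.

    total     = H[ E_theta p(y|x,theta) ]   (predictive entropy)
    aleatoric = E_theta H[ p(y|x,theta) ]   (expected entropy)
    epistemic = total - aleatoric           (mutual information, >= 0)
    """
    n = len(member_probs)
    k = len(member_probs[0])
    mean_p = [sum(m[j] for m in member_probs) / n for j in range(k)]
    total = entropy(mean_p)
    aleatoric = sum(entropy(m) for m in member_probs) / n
    return total, aleatoric, total - aleatoric
```

Members that disagree confidently yield high epistemic uncertainty, while members that agree on a flat distribution yield purely aleatoric uncertainty; the paper's experiments probe how cleanly this separation holds in practice.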

[LG-50] Recent Advances on Machine Learning for Computational Fluid Dynamics: A Survey

链接: https://arxiv.org/abs/2408.12171
作者: Haixin Wang,Yadi Cao,Zijie Huang,Yuxuan Liu,Peiyan Hu,Xiao Luo,Zezheng Song,Wanjia Zhao,Jilin Liu,Jinan Sun,Shikun Zhang,Long Wei,Yue Wang,Tailin Wu,Zhi-Ming Ma,Yizhou Sun
关键词-EN: Machine Learning, tasks through Machine, ML-assisted Numerical Solutions, Computational Fluid Dynamics, CFD
类目: Machine Learning (cs.LG)
*备注: 22 pages, 6 figures

点击查看摘要

Abstract:This paper explores the recent advancements in enhancing Computational Fluid Dynamics (CFD) tasks through Machine Learning (ML) techniques. We begin by introducing fundamental concepts, traditional methods, and benchmark datasets, then examine the various roles ML plays in improving CFD. We systematically review the literature of the past five years and introduce a novel classification for forward modeling: Data-driven Surrogates, Physics-Informed Surrogates, and ML-assisted Numerical Solutions. Furthermore, we also review the latest ML methods in inverse design and control, offering a novel classification and providing an in-depth discussion. Then we highlight real-world applications of ML for CFD in critical scientific and engineering disciplines, including aerodynamics, combustion, atmosphere and ocean science, biological fluids, plasma, symbolic regression, and reduced order modeling. In addition, we identify key challenges and advocate for future research directions to address these challenges, such as multi-scale representation, physical knowledge encoding, scientific foundation models, and automatic scientific discovery. This review serves as a guide for the rapidly expanding ML for CFD community, aiming to inspire insights for future advancements. We conclude that ML is poised to significantly transform CFD research by enhancing simulation accuracy, reducing computational time, and enabling more complex analyses of fluid dynamics. The paper resources can be viewed at this https URL.

[LG-51] DimeRec: A Unified Framework for Enhanced Sequential Recommendation via Generative Diffusion Models

链接: https://arxiv.org/abs/2408.12153
作者: Wuchao Li,Rui Huang,Haijun Zhao,Chi Liu,Kai Zheng,Qi Liu,Na Mou,Guorui Zhou,Defu Lian,Yang Song,Wentian Bao,Enyun Yu,Wenwu Ou
关键词-EN: user preferences based, Sequential Recommendation, plays a pivotal, pivotal role, role in recommender
类目: Information Retrieval (cs.IR); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Sequential Recommendation (SR) plays a pivotal role in recommender systems by tailoring recommendations to user preferences based on their non-stationary historical interactions. Achieving high-quality performance in SR requires attention to both item representation and diversity. However, designing an SR method that simultaneously optimizes these merits remains a long-standing challenge. In this study, we address this issue by integrating recent generative Diffusion Models (DM) into SR. DM has demonstrated utility in representation learning and diverse image generation. Nevertheless, a straightforward combination of SR and DM leads to sub-optimal performance due to discrepancies in learning objectives (recommendation vs. noise reconstruction) and the respective learning spaces (non-stationary vs. stationary). To overcome this, we propose a novel framework called DimeRec (Diffusion with multi-interest enhanced Recommender). DimeRec synergistically combines a guidance extraction module (GEM) and a generative diffusion aggregation module (DAM). The GEM extracts crucial stationary guidance signals from the user’s non-stationary interaction history, while the DAM employs a generative diffusion process conditioned on GEM’s outputs to reconstruct and generate consistent recommendations. Our numerical experiments demonstrate that DimeRec significantly outperforms established baseline methods across three publicly available datasets. Furthermore, we have successfully deployed DimeRec on a large-scale short video recommendation platform, serving hundreds of millions of users. Live A/B testing confirms that our method improves both users’ time spent and result diversification.

[LG-52] A Tighter Complexity Analysis of SparseGPT

链接: https://arxiv.org/abs/2408.12151
作者: Xiaoyu Li,Yingyu Liang,Zhenmei Shi,Zhao Song
关键词-EN: Alistarh ICML, omega, matrix multiplication, Zhou ICML, exponent of matrix
类目: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:In this work, we improved the analysis of the running time of SparseGPT [Frantar, Alistarh ICML 2023] from $O(d^{3})$ to $O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1,1,a)-a})$ for any $a \in [0, 1]$, where $\omega$ is the exponent of matrix multiplication. In particular, for the current $\omega \approx 2.371$ [Alman, Duan, Williams, Xu, Xu, Zhou 2024], our running times boil down to $O(d^{2.53})$. This running time is due to the analysis of the lazy update behavior in iterative maintenance problems, such as [Deng, Song, Weinstein 2022; Brand, Song, Zhou ICML 2024].

[LG-53] DRExplainer: Quantifiable Interpretability in Drug Response Prediction with Directed Graph Convolutional Network

链接: https://arxiv.org/abs/2408.12139
作者: Haoyuan Shi,Tao Xu,Xiaodi Li,Qian Gao,Junfeng Xia,Zhenyu Yue
关键词-EN: directed bipartite network, personalized medicine, pivotal for personalized, cancer cell line, directed bipartite
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Predicting the response of a cancer cell line to a therapeutic drug is pivotal for personalized medicine. Despite numerous deep learning methods that have been developed for drug response prediction, integrating diverse information about biological entities and predicting the directional response remain major challenges. Here, we propose a novel interpretable predictive model, DRExplainer, which leverages a directed graph convolutional network to enhance the prediction in a directed bipartite network framework. DRExplainer constructs a directed bipartite network integrating multi-omics profiles of cell lines, the chemical structure of drugs and known drug response to achieve directed prediction. Then, DRExplainer identifies the most relevant subgraph to each prediction in this directed bipartite network by learning a mask, facilitating critical medical decision-making. Additionally, we introduce a quantifiable method for model interpretability that leverages a ground truth benchmark dataset curated from biological features. In computational experiments, DRExplainer outperforms state-of-the-art predictive methods and another graph-based explanation method under the same experimental setting. Finally, the case studies further validate the interpretability and the effectiveness of DRExplainer in predicting novel drug responses. Our code is available at: this https URL.

[LG-54] Domain Adaptation for Offline Reinforcement Learning with Limited Samples

链接: https://arxiv.org/abs/2408.12136
作者: Weiqin Chen,Sandipan Mishra,Santiago Paternain
关键词-EN: learns effective policies, Offline reinforcement learning, static target dataset, target dataset, reinforcement learning
类目: Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:

点击查看摘要

Abstract:Offline reinforcement learning (RL) learns effective policies from a static target dataset. Despite state-of-the-art (SOTA) offline RL algorithms being promising, they highly rely on the quality of the target dataset. The performance of SOTA algorithms can degrade in scenarios with limited samples in the target dataset, which is often the case in real-world applications. To address this issue, domain adaptation that leverages auxiliary samples from related source datasets (such as simulators) can be beneficial. In this context, determining the optimal way to trade off the source and target datasets remains a critical challenge in offline RL. To the best of our knowledge, this paper proposes the first framework that theoretically and experimentally explores how the weight assigned to each dataset affects the performance of offline RL. We establish the performance bounds and convergence neighborhood of our framework, both of which depend on the selection of the weight. Furthermore, we identify the existence of an optimal weight for balancing the two datasets. All theoretical guarantees and optimal weight depend on the quality of the source dataset and the size of the target dataset. Our empirical results on the well-known Procgen Benchmark substantiate our theoretical contributions.
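The core object of the abstract is a single weight that trades off the target dataset against an auxiliary source dataset. A minimal, hedged sketch of how such a weight could enter training — as a deterministic batch composition and as a weighted objective (names and the rounding scheme are illustrative, not the paper's formulation):

```python
def mixed_batch(target, source, weight, batch_size):
    """Compose one training batch with a fraction `weight` of samples from the
    scarce target dataset and the remainder from the source, cycling
    deterministically through each dataset.

    weight=1 ignores the source; weight=0 ignores the target."""
    n_target = round(weight * batch_size)
    n_source = batch_size - n_target
    batch = [target[i % len(target)] for i in range(n_target)]
    batch += [source[i % len(source)] for i in range(n_source)]
    return batch

def weighted_objective(loss_target, loss_source, weight):
    """Convex combination of per-dataset losses controlled by the same weight."""
    return weight * loss_target + (1.0 - weight) * loss_source
```

The paper's contribution is a theoretical account of how this weight affects performance bounds; choosing it then depends on source-data quality and target-data size.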

[LG-55] Self-supervised Learning for Geospatial AI: A Survey

链接: https://arxiv.org/abs/2408.12133
作者: Yile Chen,Weiming Huang,Kaiqi Zhao,Yue Jiang,Gao Cong
关键词-EN: geospatial artificial intelligence, geospatial data, artificial intelligence, SSL techniques, SSL
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:The proliferation of geospatial data in urban and territorial environments has significantly facilitated the development of geospatial artificial intelligence (GeoAI) across various urban applications. Given the vast yet inherently sparse labeled nature of geospatial data, there is a critical need for techniques that can effectively leverage such data without heavy reliance on labeled datasets. This requirement aligns with the principles of self-supervised learning (SSL), which has attracted increasing attention for its adoption in geospatial data. This paper conducts a comprehensive and up-to-date survey of SSL techniques applied to or developed for three primary data (geometric) types prevalent in geospatial vector data: points, polylines, and polygons. We systematically categorize various SSL techniques into predictive and contrastive methods, discussing their application with respect to each data type in enhancing generalization across various downstream tasks. Furthermore, we review the emerging trends of SSL for GeoAI, and several task-specific SSL techniques. Finally, we discuss several key challenges in the current research and outline promising directions for future investigation. By presenting a structured analysis of relevant studies, this paper aims to inspire continued advancements in the integration of SSL with GeoAI, encouraging innovative methods to harnessing the power of geospatial data.

[LG-56] Deep Analysis of Time Series Data for Smart Grid Startup Strategies: A Transformer-LSTM-PSO Model Approach

链接: https://arxiv.org/abs/2408.12129
作者: Zecheng Zhang
关键词-EN: holds strategic importance, Grid startup, holds strategic, integral component, strategic importance
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
*备注: 46 pages

点击查看摘要

Abstract:Grid startup, an integral component of the power system, holds strategic importance for ensuring the reliability and efficiency of the electrical grid. However, current methodologies for in-depth analysis and precise prediction of grid startup scenarios are inadequate. To address these challenges, we propose a novel method based on the Transformer-LSTM-PSO model. This model uniquely combines the Transformer’s self-attention mechanism, LSTM’s temporal modeling capabilities, and the parameter tuning features of the particle swarm optimization algorithm. It is designed to more effectively capture the complex temporal relationships in grid startup schemes. Our experiments demonstrate significant improvements, with our model achieving lower RMSE and MAE values across multiple datasets compared to existing benchmarks, particularly in the NYISO Electric Market dataset where the RMSE was reduced by approximately 15% and the MAE by 20% compared to conventional models. Our main contribution is the development of a Transformer-LSTM-PSO model that significantly enhances the accuracy and efficiency of smart grid startup predictions. The application of the Transformer-LSTM-PSO model represents a significant advancement in smart grid predictive analytics, concurrently fostering the development of more reliable and intelligent grid management systems.
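The third component of the hybrid model is particle swarm optimization for parameter tuning. A self-contained sketch of a standard global-best PSO, shown here minimizing a toy objective in place of the model's hyperparameter loss (all constants are conventional defaults, not values from the paper):

```python
import random

def pso_minimize(f, dim=2, n_particles=30, iters=300, bounds=(-5.0, 5.0), seed=0):
    """Minimal global-best particle swarm optimizer over a box."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # per-particle best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5                   # inertia, cognitive, social
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In the paper's setting, `f` would evaluate validation error of the Transformer-LSTM model for a candidate hyperparameter vector; the swarm then searches that space without gradients.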

[LG-57] Recording Brain Activity While Listening to Music Using Wearable EEG Devices Combined with Bidirectional Long Short-Term Memory Networks

链接: https://arxiv.org/abs/2408.12124
作者: Jingyi Wang,Zhiqun Wang,Guiran Liu
关键词-EN: investigating brain function, high-dimensional EEG signals, cognitive processes, EEG signal processing, crucial for investigating
类目: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
*备注: 15 pages

点击查看摘要

Abstract:Electroencephalography (EEG) signals are crucial for investigating brain function and cognitive processes. This study aims to address the challenges of efficiently recording and analyzing high-dimensional EEG signals while listening to music to recognize emotional states. We propose a method combining Bidirectional Long Short-Term Memory (Bi-LSTM) networks with attention mechanisms for EEG signal processing. Using wearable EEG devices, we collected brain activity data from participants listening to music. The data was preprocessed, segmented, and Differential Entropy (DE) features were extracted. We then constructed and trained a Bi-LSTM model to enhance key feature extraction and improve emotion recognition accuracy. Experiments were conducted on the SEED and DEAP datasets. The Bi-LSTM-AttGW model achieved 98.28% accuracy on the SEED dataset and 92.46% on the DEAP dataset in multi-class emotion recognition tasks, significantly outperforming traditional models such as SVM and EEG-Net. This study demonstrates the effectiveness of combining Bi-LSTM with attention mechanisms, providing robust technical support for applications in brain-computer interfaces (BCI) and affective computing. Future work will focus on improving device design, incorporating multimodal data, and further enhancing emotion recognition accuracy, aiming to achieve practical applications in real-world scenarios.
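The Differential Entropy (DE) feature mentioned in the abstract has a simple closed form under the usual Gaussian assumption for a band-passed EEG segment: DE = ½ ln(2πeσ²). A minimal sketch (the paper would apply this per channel and per frequency band):

```python
import math

def differential_entropy(segment):
    """DE of a signal segment under a Gaussian assumption:
    0.5 * ln(2 * pi * e * variance), using the population variance."""
    n = len(segment)
    mean = sum(segment) / n
    var = sum((x - mean) ** 2 for x in segment) / n
    return 0.5 * math.log(2 * math.pi * math.e * var)
```

These per-band DE values form the feature vectors fed to the Bi-LSTM model described above.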

[LG-58] Cross-border Commodity Pricing Strategy Optimization via Mixed Neural Network for Time Series Analysis

链接: https://arxiv.org/abs/2408.12115
作者: Lijuan Wang,Yijia Hu,Yan Zhou
关键词-EN: cross-border commodity pricing, pricing largely determines, commodity pricing, commodity pricing largely, cross-border commodity
类目: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); General Economics (econ.GN)
*备注: 30 pages

点击查看摘要

Abstract:In the context of global trade, cross-border commodity pricing largely determines the competitiveness and market share of businesses. However, existing methodologies often prove inadequate, as they lack the agility and precision required to effectively respond to the dynamic international markets. Time series data is of great significance in commodity pricing and can reveal market dynamics and trends. Therefore, we propose a new method based on the hybrid neural network model CNN-BiGRU-SSA. The goal is to achieve accurate prediction and optimization of cross-border commodity pricing strategies through in-depth analysis and optimization of time series data. Our model undergoes experimental validation across multiple datasets. The results show that our method achieves significant performance advantages on datasets such as UNCTAD, IMF, WITS and China Customs. For example, on the UNCTAD dataset, our model reduces MAE to 4.357 and RMSE to 5.406, and raises R2 to 0.961, significantly better than other models. On the IMF and WITS datasets, our method also achieves similar excellent performance. These experimental results verify the effectiveness and reliability of our model in the field of cross-border commodity pricing. Overall, this study provides an important reference for enterprises to formulate more reasonable and effective cross-border commodity pricing strategies, thereby enhancing market competitiveness and profitability. At the same time, our method also lays a foundation for the application of deep learning in the fields of international trade and economic strategy optimization, which has important theoretical and practical significance.
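For readers checking the reported numbers, the three metrics cited in the abstract have standard definitions. A small pure-Python sketch:

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R^2 — the evaluation metrics quoted in the abstract."""
    n = len(y_true)
    errs = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errs)                    # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)      # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2
```

Note the direction of each metric: lower MAE/RMSE is better, while R² approaches 1 for a good fit.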

[LG-59] Risk Analysis in Customer Relationship Management via Quantile Region Convolutional Neural Network-Long Short-Term Memory and Cross-Attention Mechanism

链接: https://arxiv.org/abs/2408.12113
作者: Yaowen Huang,Jun Der Leu,Baoli Lu,Yan Zhou
关键词-EN: customer relationship management, affect customer satisfaction, CRM risk analysis, retention rates, customer satisfaction
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
*备注: 44 pages

点击查看摘要

Abstract:Risk analysis is an important business decision support task in customer relationship management (CRM), involving the identification of potential risks or challenges that may affect customer satisfaction, retention rates, and overall business performance. To enhance risk analysis in CRM, this paper combines the advantages of quantile region convolutional neural network-long short-term memory (QRCNN-LSTM) and cross-attention mechanisms for modeling. The QRCNN-LSTM model combines sequence modeling with deep learning architectures commonly used in natural language processing tasks, enabling the capture of both local and global dependencies in sequence data. The cross-attention mechanism enhances interactions between different input data parts, allowing the model to focus on specific areas or features relevant to CRM risk analysis. By applying QRCNN-LSTM and cross-attention mechanisms to CRM risk analysis, empirical evidence demonstrates that this approach can effectively identify potential risks and provide data-driven support for business decisions.
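The cross-attention mechanism the abstract credits with linking different input parts is, at its core, scaled dot-product attention where queries come from one sequence and keys/values from another. A self-contained sketch (toy dimensions; not the paper's architecture):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys, values):
    """Each query attends over all keys and returns a convex combination of
    the values: Attention(Q, K, V) = softmax(QK^T / sqrt(d)) V."""
    d = len(keys[0])
    out, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(wj * v[t] for wj, v in zip(w, values))
                    for t in range(len(values[0]))])
    return out, weights
```

In the CRM setting, one sequence might encode transaction history and the other customer-profile features, letting the model focus on the regions most relevant to a risk signal.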

[LG-60] Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards

链接: https://arxiv.org/abs/2408.12112
作者: Shresth Verma,Niclas Boehmer,Lingkai Kong,Milind Tambe
关键词-EN: Reinforcement Learning, design reward functions, preferences in Reinforcement, Learning, based on human
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
*备注:

点击查看摘要

Abstract:LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In applications such as public health, this approach empowers grassroots health workers to tailor automated allocation decisions to community needs. In the presence of multiple agents, altering the reward function based on human preferences can impact subpopulations very differently, leading to complex tradeoffs and a multi-objective resource allocation problem. We are the first to present a principled method termed Social Choice Language Model for dealing with these tradeoffs for LLM-designed rewards for multiagent planners in general and restless bandits in particular. The novel part of our model is a transparent and configurable selection component, called an adjudicator, external to the LLM that controls complex tradeoffs via a user-selected social welfare function. Our experiments demonstrate that our model reliably selects more effective, aligned, and balanced reward functions compared to purely LLM-based approaches.
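The adjudicator the abstract describes is an explicit, configurable selection step outside the LLM: it scores each candidate reward function by a user-selected social welfare function over subpopulation utilities. A minimal sketch of that selection logic (candidate names and the two welfare functions are illustrative):

```python
def adjudicate(candidate_rewards, welfare="utilitarian"):
    """Pick the candidate reward function whose predicted per-subpopulation
    utilities maximize a user-selected social welfare function.

    candidate_rewards: dict mapping candidate name -> list of utilities,
    one entry per subpopulation."""
    welfare_fns = {
        "utilitarian": sum,   # maximize total utility across subpopulations
        "egalitarian": min,   # maximize the utility of the worst-off group
    }
    w = welfare_fns[welfare]
    return max(candidate_rewards, key=lambda name: w(candidate_rewards[name]))
```

Switching the welfare function changes which tradeoff wins, which is exactly the transparency the paper argues for: the tradeoff lives in an auditable component rather than inside the LLM.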

[LG-61] Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation IJCAI

链接: https://arxiv.org/abs/2408.12110
作者: Woo Kyung Kim,Minjong Yoo,Honguk Woo
关键词-EN: Data-driven offline reinforcement, sequential decision-making problems, addressing sequential decision-making, Data-driven offline, imitation learning approaches
类目: Machine Learning (cs.LG)
*备注: 13 pages, 7 figures; Accepted for International Joint Conference on Artificial Intelligence (IJCAI) 2024; Published version

点击查看摘要

Abstract:Data-driven offline reinforcement learning and imitation learning approaches have been gaining popularity in addressing sequential decision-making problems. Yet, these approaches rarely consider learning Pareto-optimal policies from a limited pool of expert datasets. This becomes particularly marked due to practical limitations in obtaining comprehensive datasets for all preferences, where multiple conflicting objectives exist and each expert might hold a unique optimization preference for these objectives. In this paper, we adapt inverse reinforcement learning (IRL) by using reward distance estimates for regularizing the discriminator. This enables progressive generation of a set of policies that accommodate diverse preferences on the multiple objectives, while using only two distinct datasets, each associated with a different expert preference. In doing so, we present a Pareto IRL framework (ParIRL) that establishes a Pareto policy set from these limited datasets. In the framework, the Pareto policy set is then distilled into a single, preference-conditioned diffusion model, thus allowing users to immediately specify which expert’s patterns they prefer. Through experiments, we show that ParIRL outperforms other IRL algorithms for various multi-objective control tasks, achieving the dense approximation of the Pareto frontier. We also demonstrate the applicability of ParIRL with autonomous driving in CARLA.


[LG-62] You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging

链接: https://arxiv.org/abs/2408.12105
作者: Weiyu Chen,James Kwok
关键词-EN: gained increasing popularity, combines multiple models, Model, recent years, combines multiple
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Model merging, which combines multiple models into a single model, has gained increasing popularity in recent years. By efficiently integrating the capabilities of various models without their original training data, this significantly reduces the parameter count and memory usage. However, current methods can only produce one single merged model. This necessitates a performance trade-off due to conflicts among the various models, and the resultant one-size-fits-all model may not align with the preferences of different users who may prioritize certain models over others. To address this issue, we propose preference-aware model merging, and formulate this as a multi-objective optimization problem in which the performance of the merged model on each base model’s task is treated as an objective. In only one merging process, the proposed parameter-efficient structure can generate the whole Pareto set of merged models, each representing the Pareto-optimal model for a given user-specified preference. Merged models can also be selected from the learned Pareto set that are tailored to different user preferences. Experimental results on a number of benchmark datasets demonstrate that the proposed preference-aware Pareto Merging can obtain a diverse set of trade-off models and outperforms state-of-the-art model merging baselines.
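The simplest instance of preference-conditioned merging is a convex combination of base-model parameters, with the preference vector as the weights. A toy sketch of that operation (the paper learns the whole Pareto set with a parameter-efficient structure; this shows only the weighted-merge primitive):

```python
def merge_models(models, preference):
    """Preference-weighted parameter merge: theta = sum_i w_i * theta_i.

    models: list of dicts mapping parameter name -> list of floats.
    preference: non-negative weights, one per model (normalized here)."""
    total = sum(preference)
    ws = [p / total for p in preference]
    merged = {}
    for name in models[0]:
        n = len(models[0][name])
        merged[name] = [sum(w * m[name][j] for w, m in zip(ws, models))
                        for j in range(n)]
    return merged
```

Sweeping the preference vector traces out a family of merged models; the paper's contribution is producing the Pareto-optimal member for any such preference in a single merging process.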

[LG-63] Integrating Audio Visual and Semantic Information for Enhanced Multimodal Speaker Diarization

链接: https://arxiv.org/abs/2408.12102
作者: Luyao Cheng,Hui Wang,Siqi Zheng,Yafeng Chen,Rongjie Huang,Qinglin Zhang,Qian Chen,Xihao Li
关键词-EN: transcribed speech content, homogenous partitions based, transcribed speech, human speech, plays a crucial
类目: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
*备注:

点击查看摘要

Abstract:Speaker diarization, the process of segmenting an audio stream or transcribed speech content into homogeneous partitions based on speaker identity, plays a crucial role in the interpretation and analysis of human speech. Most existing speaker diarization systems rely exclusively on unimodal acoustic information, making the task particularly challenging due to the innate ambiguities of audio signals. Recent studies have made tremendous efforts towards audio-visual or audio-semantic modeling to enhance performance. However, even the incorporation of up to two modalities often falls short in addressing the complexities of spontaneous and unstructured conversations. To exploit more meaningful dialogue patterns, we propose a novel multimodal approach that jointly utilizes audio, visual, and semantic cues to enhance speaker diarization. Our method elegantly formulates the multimodal modeling as a constrained optimization problem. First, we build insights into the visual connections among active speakers and the semantic interactions within spoken content, thereby establishing abundant pairwise constraints. Then we introduce a joint pairwise constraint propagation algorithm to cluster speakers based on these visual and semantic constraints. This integration effectively leverages the complementary strengths of different modalities, refining the affinity estimation between individual speaker embeddings. Extensive experiments conducted on multiple multimodal datasets demonstrate that our approach consistently outperforms state-of-the-art speaker diarization methods.
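The pairwise constraints the abstract derives from visual and semantic cues come in two standard flavors: must-link (same speaker) and cannot-link (different speakers). A toy sketch of constraint handling via union-find — a much simpler stand-in for the paper's joint constraint propagation, shown only to make the constraint types concrete:

```python
def constrained_clusters(n, must_link, cannot_link):
    """Merge items connected by must-link constraints (union-find with path
    halving), then check that no cannot-link pair ended up merged.
    Returns a list of cluster labels, or raises on conflicting constraints."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in must_link:
        parent[find(a)] = find(b)
    for a, b in cannot_link:
        if find(a) == find(b):
            raise ValueError(f"conflicting constraints on ({a}, {b})")
    roots, labels = {}, []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels
```

In the paper, these constraints instead soft-guide the affinity matrix used for speaker clustering rather than being enforced as hard merges.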

[LG-64] Extraction of Research Objectives Machine Learning Model Names and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis

链接: https://arxiv.org/abs/2408.12097
作者: S. Nishio,H. Nonaka,N. Tsuchiya,A. Migita,Y. Banno,T. Hayashi,H. Sakaji,T. Sakumoto,K. Watabe
关键词-EN: Machine learning, machine learning models, learning, Machine, learning models
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
*备注: 10 pages, 8 figures

点击查看摘要

Abstract:Machine learning is widely utilized across various industries. Identifying the appropriate machine learning models and datasets for specific tasks is crucial for the effective industrial application of machine learning. However, this requires expertise in both machine learning and the relevant domain, leading to a high learning cost. Therefore, research focused on extracting combinations of tasks, machine learning models, and datasets from academic papers is critically important, as it can facilitate the automatic recommendation of suitable methods. Conventional information extraction methods from academic papers have been limited to identifying machine learning models and other entities as named entities. To address this issue, this study proposes a methodology for extracting tasks, machine learning methods, and dataset names from scientific papers and analyzing the relationships among them using an LLM, an embedding model, and network clustering. The proposed method’s expression extraction performance, when using Llama3, achieves an F-score exceeding 0.8 across various categories, confirming its practical utility. Benchmarking results on financial domain papers have demonstrated the effectiveness of this method, providing insights into the use of the latest datasets, including those related to ESG (Environmental, Social, and Governance) data.
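The F-score reported in the abstract is the standard set-level measure for extraction tasks. A minimal sketch of how it is computed over extracted mentions (the entity names below are made-up examples, not from the paper):

```python
def extraction_f1(predicted, gold):
    """Set-level precision, recall and F1 for extracted mentions."""
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)                                 # true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

An F-score above 0.8 thus requires both that most extracted names are correct and that most gold names are found.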

[LG-65] uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization

链接: https://arxiv.org/abs/2408.12095
作者: Aishik Nagar,Yutong Liu,Andy T. Liu,Viktor Schlegel,Vijay Prakash Dwivedi,Arun-Kumar Kaliya-Perumal,Guna Pratheep Kalanchiam,Yili Tang,Robby T. Tan
关键词-EN: faces the challenge, challenge of balancing, abstractive summarization faces, medical summarization, summarization
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: 12 pages

点击查看摘要

Abstract:Medical abstractive summarization faces the challenge of balancing faithfulness and informativeness. Current methods often sacrifice key information for faithfulness or introduce confabulations when prioritizing informativeness. While recent advancements in techniques like in-context learning (ICL) and fine-tuning have improved medical summarization, they often overlook crucial aspects such as faithfulness and informativeness without considering advanced methods like model reasoning and self-improvement. Moreover, the field lacks a unified benchmark, hindering systematic evaluation due to varied metrics and datasets. This paper addresses these gaps by presenting a comprehensive benchmark of six advanced abstractive summarization methods across three diverse datasets using five standardized metrics. Building on these findings, we propose uMedSum, a modular hybrid summarization framework that introduces novel approaches for sequential confabulation removal followed by key missing information addition, ensuring both faithfulness and informativeness. Our work improves upon previous GPT-4-based state-of-the-art (SOTA) medical summarization methods, significantly outperforming them in both quantitative metrics and qualitative domain expert evaluations. Notably, we achieve an average relative performance improvement of 11.8% in reference-free metrics over the previous SOTA. Doctors prefer uMedSum’s summaries 6 times more than previous SOTA in difficult cases where there are chances of confabulations or missing information. These results highlight uMedSum’s effectiveness and generalizability across various datasets and metrics, marking a significant advancement in medical summarization.

[LG-66] Unsupervised discovery of the shared and private geometry in multi-view data

链接: https://arxiv.org/abs/2408.12091
作者: Sai Koukuntla,Joshua B. Julian,Jesse C. Kaminsky,Manuel Schottdorf,David W. Tank,Carlos D. Brody,Adam S. Charles
关键词-EN: Modern applications, leverage multiple views, subject of study, applications often leverage, Modern
类目: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
*备注:

点击查看摘要

Abstract:Modern applications often leverage multiple views of a subject of study. Within neuroscience, there is growing interest in large-scale simultaneous recordings across multiple brain regions. Understanding the relationship between views (e.g., the neural activity in each region recorded) can reveal fundamental principles about the characteristics of each representation and about the system. However, existing methods to characterize such relationships either lack the expressivity required to capture complex nonlinearities, describe only sources of variance that are shared between views, or discard geometric information that is crucial to interpreting the data. Here, we develop a nonlinear neural network-based method that, given paired samples of high-dimensional views, disentangles low-dimensional shared and private latent variables underlying these views while preserving intrinsic data geometry. Across multiple simulated and real datasets, we demonstrate that our method outperforms competing methods. Using simulated populations of lateral geniculate nucleus (LGN) and V1 neurons we demonstrate our model’s ability to discover interpretable shared and private structure across different noise conditions. On a dataset of unrotated and corresponding but randomly rotated MNIST digits, we recover private latents for the rotated view that encode rotation angle regardless of digit class, and places the angle representation on a 1-d manifold, while shared latents encode digit class but not rotation angle. Applying our method to simultaneous Neuropixels recordings of hippocampus and prefrontal cortex while mice run on a linear track, we discover a low-dimensional shared latent space that encodes the animal’s position. We propose our approach as a general-purpose method for finding succinct and interpretable descriptions of paired data sets in terms of disentangled shared and private latent variables.

[LG-67] Multi-Task Curriculum Graph Contrastive Learning with Clustering Entropy Guidance

链接: https://arxiv.org/abs/2408.12071
作者: Chusheng Zeng,Bocheng Wang,Jinghui Yuan,Rong Wang,Mulin Chen
关键词-EN: unsupervised deep graph, graph contrastive learning, Recent advances, contrastive learning, deep graph clustering
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Recent advances in unsupervised deep graph clustering have been significantly promoted by contrastive learning. Despite the strides, most graph contrastive learning models face challenges: 1) graph augmentation is used to improve learning diversity, but commonly used random augmentation methods may destroy inherent semantics and cause noise; 2) the fixed positive and negative sample selection strategy is limited in dealing with complex real data, thereby impeding the model's capability to capture fine-grained patterns and relationships. To reduce these problems, we propose the Clustering-guided Curriculum Graph contrastive Learning (CCGL) framework. CCGL uses clustering entropy as the guidance for the subsequent graph augmentation and contrastive learning. Specifically, according to the clustering entropy, the intra-class edges and important features are emphasized in augmentation. Then, a multi-task curriculum learning scheme is proposed, which employs the clustering guidance to shift the focus from the discrimination task to the clustering task. In this way, the sample selection strategy of contrastive learning can be adjusted adaptively from early to late stage, which enhances the model's flexibility for complex data structure. Experimental results demonstrate that CCGL has achieved excellent performance compared to state-of-the-art competitors.
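
The clustering-entropy guidance above can be made concrete with a toy computation. This is a minimal sketch that assumes the entropy in question is the mean Shannon entropy of soft cluster assignments; the paper's exact definition may differ:

```python
import numpy as np

def clustering_entropy(assignments, eps=1e-12):
    """Mean Shannon entropy of soft cluster assignments.

    assignments: (n_samples, n_clusters), rows summing to 1.
    Low entropy = confident clustering; high entropy = ambiguous.
    """
    p = np.clip(assignments, eps, 1.0)
    return float(np.mean(-np.sum(p * np.log(p), axis=1)))

# Confident (one-hot) assignments have near-zero entropy;
# uniform assignments reach the maximum log(k).
confident = np.eye(3)[[0, 1, 2, 0]]          # 4 samples, 3 clusters
ambiguous = np.full((4, 3), 1.0 / 3.0)
print(clustering_entropy(confident))          # ~0
print(clustering_entropy(ambiguous))          # ~log(3) ≈ 1.0986
```

A curriculum scheme like CCGL's could use such a score to decide when the clustering is confident enough to shift weight from the discrimination task to the clustering task.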

[LG-68] Simplified Mamba with Disentangled Dependency Encoding for Long-Term Time Series Forecasting

链接: https://arxiv.org/abs/2408.12068
作者: Zixuan Weng,Jindong Han,Wenzhao Jiang,Hao Liu
关键词-EN: Long-term Time Series, Recently many deep, proposed for Long-term, deep learning models, Long-term Time
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Recently many deep learning models have been proposed for Long-term Time Series Forecasting (LTSF). Based on previous literature, we identify three critical patterns that can improve forecasting accuracy: the order and semantic dependencies in time dimension as well as cross-variate dependency. However, little effort has been made to simultaneously consider order and semantic dependencies when developing forecasting models. Moreover, existing approaches utilize cross-variate dependency by mixing information from different timestamps and variates, which may introduce irrelevant or harmful cross-variate information to the time dimension and largely hinder forecasting performance. To overcome these limitations, we investigate the potential of Mamba for LTSF and discover two key advantages benefiting forecasting: (i) the selection mechanism makes Mamba focus on or ignore specific inputs and learn semantic dependency easily, and (ii) Mamba preserves order dependency by processing sequences recursively. After that, we empirically find that the non-linear activation used in Mamba is unnecessary for semantically sparse time series data. Therefore, we further propose SAMBA, a Simplified Mamba with disentangled dependency encoding. Specifically, we first remove the non-linearities of Mamba to make it more suitable for LTSF. Furthermore, we propose a disentangled dependency encoding strategy to endow Mamba with cross-variate dependency modeling capabilities while reducing the interference between time and variate dimensions. Extensive experimental results on seven real-world datasets demonstrate the effectiveness of SAMBA over state-of-the-art forecasting models.

[LG-69] Aligning (Medical) LLMs for (Counterfactual) Fairness

链接: https://arxiv.org/abs/2408.12055
作者: Raphael Poulain,Hamed Fayyaz,Rahmatollah Beheshti
关键词-EN: Large Language Models, Large Language, clinical decision support, decision support applications, Language Models
类目: Computation and Language (cs.CL); Machine Learning (cs.LG)
*备注: arXiv admin note: substantial text overlap with arXiv:2404.15149

点击查看摘要

Abstract:Large Language Models (LLMs) have emerged as promising solutions for a variety of medical and clinical decision support applications. However, LLMs are often subject to different types of biases, which can lead to unfair treatment of individuals, worsening health disparities, and reducing trust in AI-augmented medical tools. Aiming to address this important issue, in this study, we present a new model alignment approach for aligning LLMs using a preference optimization method within a knowledge distillation framework. Prior to presenting our proposed method, we first use an evaluation framework to conduct a comprehensive (largest to our knowledge) empirical evaluation to reveal the type and nature of existing biases in LLMs used for medical applications. We then offer a bias mitigation technique to reduce the unfair patterns in LLM outputs across different subgroups identified by the protected attributes. We show that our mitigation method is effective in significantly reducing observed biased patterns. Our code is publicly available at this https URL.

[LG-70] Reasoning and Tools for Human-Level Forecasting

链接: https://arxiv.org/abs/2408.12036
作者: Elvis Hsieh,Preston Fu,Jonathan Chen
关键词-EN: largely successful due, memorize large amounts, training data, Language models, trained on web-scale
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
*备注:

点击查看摘要

Abstract:Language models (LMs) trained on web-scale datasets are largely successful due to their ability to memorize large amounts of training data, even if only present in a few examples. These capabilities are often desirable in evaluation on tasks such as question answering but raise questions about whether these models can exhibit genuine reasoning or succeed only at mimicking patterns from the training data. This distinction is particularly salient in forecasting tasks, where the answer is not present in the training data, and the model must reason to make logical deductions. We present Reasoning and Tools for Forecasting (RTF), a framework of reasoning-and-acting (ReAct) agents that can dynamically retrieve updated information and run numerical simulation with equipped tools. We evaluate our model with questions from competitive forecasting platforms and demonstrate that our method is competitive with and can outperform human predictions. This suggests that LMs, with the right tools, can indeed think and adapt like humans, offering valuable insights for real-world decision-making.

[LG-71] Let Community Rules Be Reflected in Online Content Moderation

链接: https://arxiv.org/abs/2408.12035
作者: Wangjiaxuan Xin,Kanlun Wang,Zhe Fu,Lina Zhou
关键词-EN: social media platforms, Content moderation, media platforms, Content, widely used strategy
类目: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
*备注: 10 pages, 3 figures

点击查看摘要

Abstract:Content moderation is a widely used strategy to prevent the dissemination of irregular information on social media platforms. Despite extensive research on developing automated models to support decision-making in content moderation, there remains a notable scarcity of studies that integrate the rules of online communities into content moderation. This study addresses this gap by proposing a community rule-based content moderation framework that directly integrates community rules into the moderation of user-generated content. Our experiment results with datasets collected from two domains demonstrate the superior performance of models based on the framework to baseline models across all evaluation metrics. In particular, incorporating community rules substantially enhances model performance in content moderation. The findings of this research have significant research and practical implications for improving the effectiveness and generalizability of content moderation models in online communities.

[LG-72] Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition – And Ways to Overcome Them

链接: https://arxiv.org/abs/2408.12023
作者: Harish Haresamudram,Apoorva Beedu,Mashfiqui Rabbi,Sankalita Saha,Irfan Essa,Thomas Ploetz
关键词-EN: Cross-modal contrastive pre-training, Cross-modal contrastive, demonstrated astonishing performance, vision and audio, natural language supervision
类目: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
*备注:

点击查看摘要

Abstract:Cross-modal contrastive pre-training between natural language and other modalities, e.g., vision and audio, has demonstrated astonishing performance and effectiveness across a diverse variety of tasks and domains. In this paper, we investigate whether such natural language supervision can be used for wearable sensor based Human Activity Recognition (HAR), and discover that, surprisingly, it performs substantially worse than standard end-to-end training and self-supervision. We identify the primary causes for this as: sensor heterogeneity and the lack of rich, diverse text descriptions of activities. To mitigate their impact, we also develop strategies and assess their effectiveness through an extensive experimental evaluation. These strategies lead to significant increases in activity recognition, bringing performance closer to supervised and self-supervised training, while also enabling the recognition of unseen activities and cross modal retrieval of videos. Overall, our work paves the way for better sensor-language learning, ultimately leading to the development of foundational models for HAR using wearables.

[LG-73] Does It Look Sequential? An Analysis of Datasets for Evaluation of Sequential Recommendations

链接: https://arxiv.org/abs/2408.12008
作者: Anton Klenitskiy,Anna Volodkevich,Anton Pembek,Alexey Vasilev
关键词-EN: Sequential recommender systems, Sequential, important and demanded, demanded area, recommender systems
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Sequential recommender systems are an important and demanded area of research. Such systems aim to use the order of interactions in a user's history to predict future interactions. The premise is that the order of interactions and sequential patterns play an essential role. Therefore, it is crucial to use datasets that exhibit a sequential structure to evaluate sequential recommenders properly. We apply several methods based on the random shuffling of the user's sequence of interactions to assess the strength of sequential structure across 15 datasets, frequently used for sequential recommender systems evaluation in recent research papers presented at top-tier conferences. As shuffling explicitly breaks sequential dependencies inherent in datasets, we estimate the strength of sequential patterns by comparing metrics for shuffled and original versions of the dataset. Our findings show that several popular datasets have a rather weak sequential structure.
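
The shuffling diagnostic described above can be sketched in a few lines. The order-sensitive metric used here (the rate of consecutive repeated items) is a simplified stand-in for the recommendation metrics the paper actually compares:

```python
import numpy as np

def repeat_rate(seq):
    """Fraction of consecutive interaction pairs with the same item."""
    seq = np.asarray(seq)
    return float(np.mean(seq[1:] == seq[:-1]))

def sequential_strength(seq, n_shuffles=200, seed=0):
    """Compare an order-sensitive metric on the original sequence vs.
    randomly shuffled copies; a large gap indicates strong sequential
    structure, a small gap a weak one."""
    rng = np.random.default_rng(seed)
    shuffled = [repeat_rate(rng.permutation(seq)) for _ in range(n_shuffles)]
    return repeat_rate(seq) - float(np.mean(shuffled))

strongly_ordered = [0] * 5 + [1] * 5 + [2] * 5    # item blocks in order
gap = sequential_strength(strongly_ordered)
print(f"sequential-structure gap: {gap:.3f}")      # clearly positive
```

A dataset whose gap is near zero under such a test would, in the paper's terms, have a weak sequential structure.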

[LG-74] QuaCK-TSF: Quantum-Classical Kernelized Time Series Forecasting

链接: https://arxiv.org/abs/2408.12007
作者: Abdallah Aaraba,Soumaya Cherkaoui,Ola Ahmad,Jean-Frédéric Laprade,Olivier Nahman-Lévesque,Alexis Vieloszynski,Shengrui Wang
关键词-EN: probabilistic time series, time series, complex endeavor, endeavor that extends, extends beyond predicting
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
*备注: 12 pages, 15 figures, to be published in IEEE Quantum Week 2024’s conference proceeding

点击查看摘要

Abstract:Forecasting in probabilistic time series is a complex endeavor that extends beyond predicting future values to also quantifying the uncertainty inherent in these predictions. Gaussian process regression stands out as a Bayesian machine learning technique adept at addressing this multifaceted challenge. This paper introduces a novel approach that blends the robustness of this Bayesian technique with the nuanced insights provided by the kernel perspective on quantum models, aimed at advancing quantum kernelized probabilistic forecasting. We incorporate a quantum feature map inspired by Ising interactions and demonstrate its effectiveness in capturing the temporal dependencies critical for precise forecasting. The optimization of our model’s hyperparameters circumvents the need for computationally intensive gradient descent by employing gradient-free Bayesian optimization. Comparative benchmarks against established classical kernel models are provided, affirming that our quantum-enhanced approach achieves competitive performance.
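
As a rough classical analogue of the kernelized forecaster above, here is a minimal Gaussian process regression sketch with an RBF kernel standing in for the Ising-inspired quantum feature-map kernel; it is illustrative only and does not reproduce the paper's Bayesian hyperparameter optimization:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Gram matrix of the RBF kernel (classical stand-in for the
    quantum kernel induced by the paper's feature map)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-3):
    """GP regression posterior mean: K_* (K + sigma^2 I)^{-1} y."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train)
    return K_star @ np.linalg.solve(K, y_train)

x = np.linspace(0.0, 2.0 * np.pi, 30)
y = np.sin(x)
pred = gp_posterior_mean(x, y, x)    # evaluate at the training inputs
print(np.max(np.abs(pred - y)))      # small: the GP fits the smooth signal
```

In the quantum-kernelized setting, only the kernel function changes; the posterior-mean algebra is identical.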

[LG-75] Energy Estimation of Last Mile Electric Vehicle Routes

链接: https://arxiv.org/abs/2408.12006
作者: André Snoeck,Aniruddha Bhargava,Daniel Merchan,Josiah Davis,Julian Pachon
关键词-EN: incorporate electric vehicles, carriers increasingly incorporate, increasingly incorporate electric, Last-mile carriers increasingly, achieve sustainability goals
类目: Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Last-mile carriers increasingly incorporate electric vehicles (EVs) into their delivery fleet to achieve sustainability goals. This goal presents many challenges across multiple planning spaces including but not limited to how to plan EV routes. In this paper, we address the problem of predicting energy consumption of EVs for Last-Mile delivery routes using deep learning. We demonstrate the need to move away from thinking about range and we propose using energy as the basic unit of analysis. We share a range of deep learning solutions, beginning with a Feed Forward Neural Network (NN) and Recurrent Neural Network (RNN) and demonstrate significant accuracy improvements relative to pure physics-based and distance-based approaches. Finally, we present Route Energy Transformer (RET) a decoder-only Transformer model sized according to Chinchilla scaling laws. RET yields a +217 Basis Points (bps) improvement in Mean Absolute Percentage Error (MAPE) relative to the Feed Forward NN and a +105 bps improvement relative to the RNN.
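
The reported gains can be read with a quick sketch of MAPE and its basis-point differences, assuming (as is conventional) that 1 bps means 0.01 percentage point of MAPE:

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)

def improvement_bps(baseline_mape, model_mape):
    """MAPE improvement in basis points (1 bps = 0.01 percentage point)."""
    return 100.0 * (baseline_mape - model_mape)

actual = [100.0, 200.0, 400.0]       # hypothetical route energies
baseline = [110.0, 220.0, 440.0]     # 10% off everywhere -> MAPE 10
model = [105.0, 210.0, 420.0]        # 5% off everywhere  -> MAPE 5
print(improvement_bps(mape(actual, baseline), mape(actual, model)))
```

So the paper's +217 bps over the Feed Forward NN corresponds to a 2.17 percentage-point drop in MAPE.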

[LG-76] CSPI-MT: Calibrated Safe Policy Improvement with Multiple Testing for Threshold Policies

链接: https://arxiv.org/abs/2408.12004
作者: Brian M Cho,Ana-Roxana Pop,Kyra Gan,Sam Corbett-Davies,Israel Nir,Ariel Evnine,Nathan Kallus
关键词-EN: newly proposed policy, modifying existing policies, status quo, ensure with high, high certainty
类目: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
*备注:

点击查看摘要

Abstract:When modifying existing policies in high-risk settings, it is often necessary to ensure with high certainty that the newly proposed policy improves upon a baseline, such as the status quo. In this work, we consider the problem of safe policy improvement, where one only adopts a new policy if it is deemed to be better than the specified baseline with at least pre-specified probability. We focus on threshold policies, a ubiquitous class of policies with applications in economics, healthcare, and digital advertising. Existing methods rely on potentially underpowered safety checks and limit the opportunities for finding safe improvements, so too often they must revert to the baseline to maintain safety. We overcome these issues by leveraging the most powerful safety test in the asymptotic regime and allowing for multiple candidates to be tested for improvement over the baseline. We show that in adversarial settings, our approach controls the rate of adopting a policy worse than the baseline to the pre-specified error level, even in moderate sample sizes. We present CSPI and CSPI-MT, two novel heuristics for selecting cutoff(s) to maximize the policy improvement from baseline. We demonstrate through both synthetic and external datasets that our approaches improve both the detection rates of safe policies and the realized improvement, particularly under stringent safety requirements and low signal-to-noise conditions.
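
The adopt-only-if-safely-better rule can be illustrated with a generic one-sided test on paired outcome differences; this simple z-test sketch is much weaker than the asymptotically most powerful safety test the paper develops:

```python
import math

def adopt_new_policy(new_outcomes, base_outcomes, z_alpha=1.645):
    """Adopt the new policy only if the lower one-sided confidence
    bound on its mean improvement over baseline exceeds zero
    (approximate z-test at level alpha ~= 0.05)."""
    diffs = [n - b for n, b in zip(new_outcomes, base_outcomes)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(var / n)
    return mean - z_alpha * se > 0.0

# Hypothetical paired outcomes: a clear improvement is adopted,
# while identical outcomes keep the status quo.
better = [1.0 + 0.1 * (i % 5) for i in range(50)]
base = [0.5 + 0.1 * ((i + 1) % 5) for i in range(50)]
print(adopt_new_policy(better, base))   # True
print(adopt_new_policy(base, base))     # False
```

CSPI-MT extends this idea by testing multiple candidate thresholds while still controlling the rate of adopting a worse-than-baseline policy.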

[LG-77] Time Series Foundation Models and Deep Learning Architectures for Earthquake Temporal and Spatial Nowcasting

链接: https://arxiv.org/abs/2408.11990
作者: Alireza Jafari,Geoffrey Fox,John B. Rundle,Andrea Donnellan,Lisa Grant Ludwig
关键词-EN: enduring objective aimed, Advancing the capabilities, seismic activities remains, reducing casualties, activities remains
类目: Machine Learning (cs.LG); Geophysics (physics.geo-ph)
*备注: 22 pages, 8 figures, 2 tables

点击查看摘要

Abstract:Advancing the capabilities of earthquake nowcasting, the real-time forecasting of seismic activities remains a crucial and enduring objective aimed at reducing casualties. This multifaceted challenge has recently gained attention within the deep learning domain, facilitated by the availability of extensive, long-term earthquake datasets. Despite significant advancements, existing literature on earthquake nowcasting lacks comprehensive evaluations of pre-trained foundation models and modern deep learning architectures. These architectures, such as transformers or graph neural networks, uniquely focus on different aspects of data, including spatial relationships, temporal patterns, and multi-scale dependencies. This paper addresses the mentioned gap by analyzing different architectures and introducing two innovative approaches called MultiFoundationQuake and GNNCoder. We formulate earthquake nowcasting as a time series forecasting problem for the next 14 days within 0.1-degree spatial bins in Southern California, spanning from 1986 to 2024. The earthquake time series is forecast as a function of the logarithm of the energy released by quakes. Our comprehensive evaluation employs several key performance metrics, notably Nash-Sutcliffe Efficiency and Mean Squared Error, over time in each spatial region. The results demonstrate that our introduced models outperform other custom architectures by effectively capturing temporal-spatial relationships inherent in seismic data. The performance of existing foundation models varies significantly based on the pre-training datasets, emphasizing the need for careful dataset selection. However, we introduce a new general approach termed MultiFoundationPattern that combines a bespoke pattern with foundation model results handled as auxiliary streams. In the earthquake case, the resultant MultiFoundationQuake model achieves the best overall performance.
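
Nash-Sutcliffe Efficiency, one of the paper's headline metrics, can be sketched directly:

```python
import numpy as np

def nash_sutcliffe(observed, predicted):
    """NSE = 1 - SSE / total variance of the observations.
    1 = perfect prediction, 0 = no better than predicting the
    observed mean, negative = worse than the mean."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((observed - predicted) ** 2)
    sst = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - sse / sst)

obs = np.array([1.0, 3.0, 2.0, 5.0, 4.0])        # toy energy series
print(nash_sutcliffe(obs, obs))                   # 1.0 (perfect)
print(nash_sutcliffe(obs, np.full(5, obs.mean())))  # 0.0 (mean baseline)
```

Evaluating NSE per spatial bin over time, as the paper does, highlights regions where a model genuinely beats the climatological mean.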

[LG-78] Only Strict Saddles in the Energy Landscape of Predictive Coding Networks?

链接: https://arxiv.org/abs/2408.11979
作者: Francesco Innocenti,El Mehdi Achour,Ryan Singh,Christopher L. Buckley
关键词-EN: Predictive coding, performs iterative inference, energy-based learning algorithm, weight updates, algorithm that performs
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
*备注: 26 pages, 12 figures

点击查看摘要

Abstract:Predictive coding (PC) is an energy-based learning algorithm that performs iterative inference over network activities before weight updates. Recent work suggests that PC can converge in fewer learning steps than backpropagation thanks to its inference procedure. However, these advantages are not always observed, and the impact of PC inference on learning is theoretically not well understood. Here, we study the geometry of the PC energy landscape at the (inference) equilibrium of the network activities. For deep linear networks, we first show that the equilibrated energy is simply a rescaled mean squared error loss with a weight-dependent rescaling. We then prove that many highly degenerate (non-strict) saddles of the loss including the origin become much easier to escape (strict) in the equilibrated energy. Our theory is validated by experiments on both linear and non-linear networks. Based on these results, we conjecture that all the saddles of the equilibrated energy are strict. Overall, this work suggests that PC inference makes the loss landscape more benign and robust to vanishing gradients, while also highlighting the challenge of speeding up PC inference on large-scale models.

[LG-79] Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization ICML2020

链接: https://arxiv.org/abs/2408.11974
作者: Tianyi Lin,Chi Jin,Michael. I. Jordan
关键词-EN: gradient descent ascent, two-timescale gradient descent, textbf, descent ascent, constraint set
类目: Machine Learning (cs.LG); Optimization and Control (math.OC)
*备注: A preliminary version [ arXiv:1906.00331 ] of this paper, with a subset of the results that are presented here, was presented at ICML 2020; 44 Pages, 10 Figures

点击查看摘要

Abstract:We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems in the form of $\min_{\mathbf{x}} \max_{\mathbf{y} \in Y} f(\mathbf{x}, \mathbf{y})$, where the objective function $f(\mathbf{x}, \mathbf{y})$ is nonconvex in $\mathbf{x}$ and concave in $\mathbf{y}$, and the constraint set $Y \subseteq \mathbb{R}^n$ is convex and bounded. In the convex-concave setting, the single-timescale GDA achieves strong convergence guarantees and has been used for solving application problems arising from operations research and computer science. However, it can fail to converge in more general settings. Our contribution in this paper is to design simple deterministic and stochastic TTGDA algorithms that efficiently find one stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in Y} f(\cdot, \mathbf{y})$. Specifically, we prove theoretical bounds on the complexity of solving both smooth and nonsmooth nonconvex-concave minimax optimization problems. To our knowledge, this is the first systematic analysis of TTGDA for nonconvex minimax optimization, shedding light on its superior performance in training generative adversarial networks (GANs) and in solving other real-world application problems.
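
The two-timescale idea (a much larger ascent step so that y approximately tracks the inner maximizer) can be illustrated on a toy objective. This convex-concave example conveys the mechanics only, not the paper's nonconvex analysis:

```python
def ttgda(x0, y0, eta_x=0.01, eta_y=0.5, steps=2000):
    """Two-timescale GDA on f(x, y) = x*y - y**2 / 2.
    grad_x f = y, grad_y f = x - y. Taking eta_y >> eta_x lets y
    track the inner maximizer y*(x) = x, while x slowly minimizes
    Phi(x) = max_y f(x, y) = x**2 / 2."""
    x, y = x0, y0
    for _ in range(steps):
        x -= eta_x * y          # slow descent step on x
        y += eta_y * (x - y)    # fast ascent step on y
    return x, y

x, y = ttgda(1.0, 0.0)
print(x, y)   # both near the unique stationary point (0, 0)
```

With a single timescale (eta_x = eta_y) such dynamics can cycle instead of converging, which is the failure mode the two-timescale analysis addresses.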

[LG-80] Valuing an Engagement Surface using a Large Scale Dynamic Causal Model KDD2024

链接: https://arxiv.org/abs/2408.11967
作者: Abhimanyu Mukerji,Sushant More,Ashwin Viswanathan Kannan,Lakshmi Ravi,Hua Chen,Naman Kohli,Chris Khawand,Dinesh Mandalapu
关键词-EN: recent rapid growth, AI-powered Engagement Surfaces, Engagement Surfaces, online shopping, retail services
类目: Machine Learning (cs.LG); Econometrics (econ.EM); Applications (stat.AP)
*备注: 10 pages, 5 figures. Accepted at Applied Data Science track of KDD 2024, Barcelona, Spain

点击查看摘要

Abstract:With recent rapid growth in online shopping, AI-powered Engagement Surfaces (ES) have become ubiquitous across retail services. These engagement surfaces perform an increasing range of functions, including recommending new products for purchase, reminding customers of their orders and providing delivery notifications. Understanding the causal effect of engagement surfaces on value driven for customers and businesses remains an open scientific question. In this paper, we develop a dynamic causal model at scale to disentangle value attributable to an ES, and to assess its effectiveness. We demonstrate the application of this model to inform business decision-making by understanding returns on investment in the ES, and identifying product lines and features where the ES adds the most value.

[LG-81] Matmul or No Matmal in the Era of 1-bit LLMs

链接: https://arxiv.org/abs/2408.11939
作者: Jinendra Malekar,Mohammed E. Elbtity,Ramtin Zand
关键词-EN: attracted considerable attention, large language models, large language, attracted considerable, LLMs
类目: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: 13 pages, 12 figures

点击查看摘要

Abstract:The advent of 1-bit large language models (LLMs) has attracted considerable attention and opened up new research opportunities. However, 1-bit LLMs only improve a fraction of models by applying extreme quantization to the projection layers while leaving attention heads unchanged. Therefore, to avoid fundamentally wrong choices of goals in future research, it is crucial to understand the actual improvements in computation and memory usage that 1-bit LLMs can deliver. In this work, we present an adaptation of Amdahl’s Law tailored for the 1-bit LLM context, which illustrates how partial improvements in 1-bit LLMs impact overall model performance. Through extensive experiments, we uncover key nuances across different model architectures and hardware configurations, offering a roadmap for future research in the era of 1-bit LLMs.
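
The classical Amdahl's Law underlying the paper's adaptation can be sketched as follows; the adaptation itself, tailored to 1-bit projection layers, is not reproduced here:

```python
def amdahl_speedup(fraction_improved, local_speedup):
    """Overall speedup when only a fraction p of the workload is sped
    up by a factor s: S = 1 / ((1 - p) + p / s)."""
    return 1.0 / ((1.0 - fraction_improved)
                  + fraction_improved / local_speedup)

# Even a 10x speedup on 50% of the compute gives under 2x overall,
# and with p = 0.5 the ceiling is 2x no matter how large s gets.
print(amdahl_speedup(0.5, 10.0))   # ~1.818
print(amdahl_speedup(0.5, 1e9))    # approaches 2.0
```

This is the core intuition of the abstract: quantizing only the projection layers while attention stays full-precision caps the achievable end-to-end gain.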

[LG-82] Explainable Anomaly Detection: Counterfactual driven What-If Analysis

链接: https://arxiv.org/abs/2408.11935
作者: Logan Cummins,Alexander Sommers,Sudip Mittal,Shahram Rahimi,Maria Seale,Joseph Jaboure,Thomas Arnold
关键词-EN: anomaly detection alerts, predictive maintenance, life prediction, anomaly detection, exists three main
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
*备注: 8 pages, 6 figures, 3 tables

点击查看摘要

Abstract:There exist three main areas of study inside of the field of predictive maintenance: anomaly detection, fault diagnosis, and remaining useful life prediction. Notably, anomaly detection alerts the stakeholder that an anomaly is occurring. This raises two fundamental questions: what is causing the fault and how can we fix it? Inside of the field of explainable artificial intelligence, counterfactual explanations can give that information in the form of what changes to make to put the data point into the opposing class, in this case “healthy”. The suggestions are not always actionable which may raise the interest in asking “what if we do this instead?” In this work, we provide a proof of concept for utilizing counterfactual explanations as what-if analysis. We perform this on the PRONOSTIA dataset with a temporal convolutional network as the anomaly detector. Our method presents the counterfactuals in the form of a what-if analysis for this base problem to inspire future work for more complex systems and scenarios.
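
For intuition, the minimal counterfactual has a closed form when the anomaly score is linear; this hypothetical sketch is far simpler than the paper's temporal-convolutional setting on PRONOSTIA:

```python
import numpy as np

def linear_counterfactual(x, w, b, margin=1e-3):
    """Closest point to x that lands on the opposite side of the
    hyperplane w.x + b = 0: project x onto the boundary, then step
    slightly past it."""
    score = float(w @ x + b)
    step = score / (w @ w) + np.sign(score) * margin / np.linalg.norm(w)
    return x - step * w

w = np.array([1.0, -2.0])             # hypothetical linear detector
b = 0.5
x = np.array([2.0, 0.0])              # score = 2.5 -> "anomalous"
x_cf = linear_counterfactual(x, w, b)
print(float(w @ x_cf + b))            # slightly negative -> "healthy"
```

A what-if analysis then reads off the feature changes x_cf - x as the hypothetical intervention that would return the system to the healthy class.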

[LG-83] Neural Symbolic Logical Rule Learner for Interpretable Learning

链接: https://arxiv.org/abs/2408.11918
作者: Bowen Wei,Ziwei Zhu
关键词-EN: Normal Form, Rule-based neural networks, Conjunctive Normal Form, Disjunctive Normal Form, Normal Form Constraint
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注: 19 pages, 62 figures

点击查看摘要

Abstract:Rule-based neural networks stand out for enabling interpretable classification by learning logical rules for both prediction and interpretation. However, existing models often lack flexibility due to the fixed model structure. Addressing this, we introduce the Normal Form Rule Learner (NFRL) algorithm, leveraging a selective discrete neural network that treats weight parameters as hard selectors, to learn rules in both Conjunctive Normal Form (CNF) and Disjunctive Normal Form (DNF) for enhanced accuracy and interpretability. Instead of adopting a deep, complex structure, the NFRL incorporates two specialized Normal Form Layers (NFLs) with adaptable AND/OR neurons, a Negation Layer for input negations, and a Normal Form Constraint (NFC) to streamline neuron connections. We also show the novel network architecture can be optimized using adaptive gradient update together with Straight-Through Estimator to overcome the gradient vanishing challenge. Through extensive experiments on 11 datasets, NFRL demonstrates superior classification performance, quality of learned rules, efficiency and interpretability compared to 12 state-of-the-art alternatives. Code and data are available at https://anonymous.4open.science/r/NFRL-27B4/.
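
The DNF rules that NFRL learns (an OR over AND clauses of possibly negated inputs) can be evaluated with ordinary logic; this sketch shows rule evaluation only, not the learned hard selectors or the straight-through training described above:

```python
def eval_dnf(x, clauses):
    """Evaluate a DNF rule on a binary input vector.
    Each clause is a list of (index, polarity) literals; a clause
    fires when every literal matches, and the rule fires when any
    clause does (OR of ANDs)."""
    return any(all(x[i] == pol for i, pol in clause) for clause in clauses)

# XOR expressed in DNF: (x0 AND NOT x1) OR (NOT x0 AND x1)
xor = [[(0, 1), (1, 0)], [(0, 0), (1, 1)]]
for a in (0, 1):
    for b in (0, 1):
        print(a, b, eval_dnf([a, b], xor))   # True exactly when a != b
```

NFRL's contribution is learning which literals enter each clause via discrete selectors; once learned, the rule is exactly this kind of transparent logical expression.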

[LG-84] Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound

链接: https://arxiv.org/abs/2408.11915
作者: Junwon Lee,Jaekwon Im,Dabin Kim,Juhan Nam
关键词-EN: enhancing user experience, Foley sound synthesis, multimedia production, enhancing user, temporally and semantically
类目: ound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
*备注:

点击查看摘要

Abstract:Foley sound synthesis is crucial for multimedia production, enhancing user experience by synchronizing audio and video both temporally and semantically. Recent studies on automating this labor-intensive process through video-to-sound generation face significant challenges. Systems lacking explicit temporal features suffer from poor controllability and alignment, while timestamp-based models require costly and subjective human annotation. We propose Video-Foley, a video-to-sound system using Root Mean Square (RMS) as a temporal event condition with semantic timbre prompts (audio or text). RMS, a frame-level intensity envelope feature closely related to audio semantics, ensures high controllability and synchronization. The annotation-free self-supervised learning framework consists of two stages, Video2RMS and RMS2Sound, incorporating novel ideas including RMS discretization and RMS-ControlNet with a pretrained text-to-audio model. Our extensive evaluation shows that Video-Foley achieves state-of-the-art performance in audio-visual alignment and controllability for sound timing, intensity, timbre, and nuance. Code, model weights, and demonstrations are available on the accompanying website. (this https URL)
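
The frame-level RMS envelope that Video-Foley uses as its temporal condition can be computed directly from a waveform; the frame and hop sizes here are illustrative, not the paper's values:

```python
import numpy as np

def rms_envelope(signal, frame_len=1024, hop=256):
    """Frame-level root-mean-square intensity envelope of a waveform."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)   # 1 s, 440 Hz, amplitude 1
env = rms_envelope(tone)
print(env.mean())   # ~0.707 (= 1/sqrt(2)) for a full-scale sine
```

Because RMS tracks perceived intensity frame by frame, conditioning generation on it gives the tight temporal alignment the abstract emphasizes, without timestamp annotation.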

[LG-85] Why am I Still Seeing This: Measuring the Effectiveness Of Ad Controls and Explanations in AI-Mediated Ad Targeting Systems AAAI

链接: https://arxiv.org/abs/2408.11910
作者: Jane Castleman,Aleksandra Korolova
关键词-EN: data privacy policies, targeting explanations Meta, targeting, targeting explanations, Meta
类目: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
*备注: Accepted to the 7th AAAI Conference on AI, Ethics, and Society (AIES, 2024)

点击查看摘要

Abstract:Recently, Meta has shifted towards AI-mediated ad targeting mechanisms that do not require advertisers to provide detailed targeting criteria, likely driven by excitement over AI capabilities as well as new data privacy policies and targeting changes agreed upon in civil rights settlements. At the same time, Meta has touted their ad preference controls as an effective mechanism for users to control the ads they see. Furthermore, Meta markets their targeting explanations as a transparency tool that allows users to understand why they saw certain ads and inform actions to control future ads. Our study evaluates the effectiveness of Meta’s “See less” ad control and the actionability of ad targeting explanations following the shift to AI-mediated targeting. We conduct a large-scale study, randomly assigning participants to mark “See less” to Body Weight Control or Parenting topics, and collecting the ads and targeting explanations Meta shows to participants before and after the intervention. We find that utilizing the “See less” ad control for the topics we study does not significantly reduce the number of ads shown by Meta on these topics, and that the control is less effective for some users whose demographics are correlated with the topic. Furthermore, we find that the majority of ad targeting explanations for local ads made no reference to location-specific targeting criteria, and did not inform users why ads related to the topics they marked to “See less” of continued to be delivered. We hypothesize that the poor effectiveness of controls and lack of actionability in explanations are the result of the shift to AI-mediated targeting, for which explainability and transparency tools have not yet been developed. Our work thus provides evidence for the need of new methods for transparency and user control, suitable and reflective of increasingly complex AI-mediated ad delivery systems. 

[LG-86] Beyond Labels: Aligning Large Language Models with Human-like Reasoning ICPR2024

链接: https://arxiv.org/abs/2408.11879
作者: Muhammad Rafsan Kabir,Rafeed Mohammad Sultan,Ihsanul Haque Asif,Jawad Ibn Ahad,Fuad Rahman,Mohammad Ruhul Amin,Nabeel Mohammed,Shafin Rahman
关键词-EN: produce morally correct, Aligning large language, reasoning approach ensures, LLMs produce morally, large language models
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Accepted in ICPR 2024

点击查看摘要

Abstract:Aligning large language models (LLMs) with a human reasoning approach ensures that LLMs produce morally correct and human-like decisions. Ethical concerns are raised because current models are prone to generating false positives and providing malicious responses. To contribute to this issue, we have curated an ethics dataset named Dataset for Aligning Reasons (DFAR), designed to aid in aligning language models to generate human-like reasons. The dataset comprises statements with ethical-unethical labels and their corresponding reasons. In this study, we employed a unique and novel fine-tuning approach that utilizes ethics labels and their corresponding reasons (L+R), in contrast to the existing fine-tuning approach that only uses labels (L). The original pre-trained versions, the existing fine-tuned versions, and our proposed fine-tuned versions of LLMs were then evaluated on an ethical-unethical classification task and a reason-generation task. Our proposed fine-tuning strategy notably outperforms the others in both tasks, achieving significantly higher accuracy scores in the classification task and lower misalignment rates in the reason-generation task. The increase in classification accuracies and decrease in misalignment rates indicate that the L+R fine-tuned models align more with human ethics. Hence, this study illustrates that injecting reasons has substantially improved the alignment of LLMs, resulting in more human-like responses. We have made the DFAR dataset and corresponding codes publicly available at this https URL.

[LG-87] Enhance Lifelong Model Editing with Continuous Data-Adapter Association

链接: https://arxiv.org/abs/2408.11869
作者: Jiaang Li,Quan Wang,Zhongnan Wang,Yongdong Zhang,Zhendong Mao
关键词-EN: Large language models, avoid factual errors, efficiently update specific, Large language, require model editing
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Preprint. Under Review

点击查看摘要

Abstract:Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors. Most model editing methods are solely designed for single-time use and lead to a significant forgetting effect after sequential edits over time, referred to as lifelong editing. Current approaches manage sequential edits by freezing original parameters and allocating new adapters for each knowledge modification. However, these methods lack robustness to minor input variations. To address this challenge, we propose ELDER (Enhancing Lifelong moDel Editing with mixtuRe of Low-Rank Adapters, LoRA). ELDER is an adaptive approach that integrates multiple LoRAs through a router network. It learns to create a continuous and smooth association between data and adapters, thereby enhancing robustness and generalization to semantically equivalent inputs. Additionally, we introduce a novel loss to help learn associations between adapter allocations and edit semantics. A deferral mechanism is also proposed to retain the original LLM capabilities post-edit. Extensive experiments on GPT-2 XL and LLaMA2-7B demonstrate that ELDER effectively edits models in the lifelong setting and exhibits strong scalability, while retaining LLM’s general abilities on downstream tasks.
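
A minimal sketch of the router idea: continuously blending several adapters' weight deltas by softmax gating on the input, rather than hard-assigning one adapter per edit. The shapes, gating form, and toy adapters are assumptions for illustration; ELDER's real architecture adds a deferral mechanism and a dedicated association loss on top of this.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def blended_delta(hidden, router_weights, lora_deltas):
    """Score the hidden state against each adapter's router vector,
    then blend the adapters' weight deltas with the softmax gates,
    giving a smooth data-adapter association."""
    scores = [sum(h * r for h, r in zip(hidden, row)) for row in router_weights]
    gates = softmax(scores)
    dim = len(lora_deltas[0])
    delta = [sum(g * d[i] for g, d in zip(gates, lora_deltas)) for i in range(dim)]
    return gates, delta

# Two adapters: an input close to adapter 0's router vector draws most,
# but not all, of its update from adapter 0.
router = [[1.0, 0.0], [0.0, 1.0]]
deltas = [[10.0, 0.0], [0.0, 10.0]]
gates, delta = blended_delta([2.0, 0.1], router, deltas)
```

Because the gates are continuous, a semantically equivalent input that lands near, but not exactly on, an edited example still receives most of the same adapter's update, which is the robustness property the abstract highlights.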

[LG-88] Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores

链接: https://arxiv.org/abs/2408.11868
作者: Jun Lu,David Li,Bill Ding,Yu Kang
关键词-EN: small datasets augmented, presents an approach, approach to improve, contrastive fine-tuning, fine-tuning on small
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:This paper presents an approach to improve text embedding models through contrastive fine-tuning on small datasets augmented with expert scores. It focuses on enhancing semantic textual similarity tasks and addressing text retrieval problems. The proposed method uses soft labels derived from expert-augmented scores to fine-tune embedding models, preserving their versatility and ensuring retrieval capability is improved. The paper evaluates the method using a Q&A dataset from an online shopping website and eight expert models. Results show improved performance over a benchmark model across multiple metrics on various retrieval tasks from the massive text embedding benchmark (MTEB). The method is cost-effective and practical for real-world applications, especially when labeled data is scarce.
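
The soft-label objective can be sketched as a cross-entropy between the model's softmax over query-document similarities and a softmax over expert scores, replacing InfoNCE's one-hot positive. The temperature and the score-to-label mapping here are assumptions, not the paper's exact recipe.

```python
import math

def softmax(xs, temp=1.0):
    m = max(xs)
    exps = [math.exp((x - m) / temp) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def soft_contrastive_loss(sims, expert_scores, temp=0.5):
    """Cross-entropy between the model's softmax over query-document
    similarities and a soft target distribution derived from expert
    scores (replacing the one-hot positive of standard InfoNCE)."""
    p = softmax(sims, temp)
    q = softmax(expert_scores, temp)
    return -sum(qi * math.log(pi + 1e-12) for qi, pi in zip(q, p))

# Loss is lowest when the model ranks documents the way the experts do.
aligned = soft_contrastive_loss([2.0, 1.0, 0.0], [2.0, 1.0, 0.0])
misaligned = soft_contrastive_loss([0.0, 1.0, 2.0], [2.0, 1.0, 0.0])
```

Soft targets like these let near-duplicates receive partial credit instead of being pushed apart as hard negatives, which is what preserves versatility on small datasets.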

[LG-89] Crossing New Frontiers: Knowledge-Augmented Large Language Model Prompting for Zero-Shot Text-Based De Novo Molecule Design NEURIPS NEURIPS-2023

链接: https://arxiv.org/abs/2408.11866
作者: Sakhinana Sagar Srinivas,Venkataramana Runkana
关键词-EN: innovative material development, efficient chemical processes, leverages computational methods, optimize molecular properties, fast-tracking new drug
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Biomolecules (q-bio.BM)
*备注: Paper was accepted at R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Foundation Models, NeurIPS-2023. Please find the links: this https URL and this https URL

点击查看摘要

Abstract:Molecule design is a multifaceted approach that leverages computational methods and experiments to optimize molecular properties, fast-tracking new drug discoveries, innovative material development, and more efficient chemical processes. Recently, text-based molecule design has emerged, inspired by next-generation AI tasks analogous to foundational vision-language models. Our study explores the use of knowledge-augmented prompting of large language models (LLMs) for the zero-shot text-conditional de novo molecular generation task. Our approach uses task-specific instructions and a few demonstrations to address distributional shift challenges when constructing augmented prompts for querying LLMs to generate molecules consistent with technical descriptions. Our framework proves effective, outperforming state-of-the-art (SOTA) baseline models on benchmark datasets.

[LG-90] How Susceptible are LLMs to Influence in Prompts?

链接: https://arxiv.org/abs/2408.11865
作者: Sotiris Anagnostidis,Jannis Bulian
关键词-EN: Large Language Models, Large Language, including additional context, Language Models, highly sensitive
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Large Language Models (LLMs) are highly sensitive to prompts, including additional context provided therein. As LLMs grow in capability, understanding their prompt-sensitivity becomes increasingly crucial for ensuring reliable and robust performance, particularly since evaluating these models becomes more challenging. In this work, we investigate how current models (Llama, Mixtral, Falcon) respond when presented with additional input from another model, mimicking a scenario where a more capable model – or a system with access to more external information – provides supplementary information to the target model. Across a diverse spectrum of question-answering tasks, we study how an LLM’s response to multiple-choice questions changes when the prompt includes a prediction and explanation from another model. Specifically, we explore the influence of the presence of an explanation, the stated authoritativeness of the source, and the stated confidence of the supplementary input. Our findings reveal that models are strongly influenced, and when explanations are provided they are swayed irrespective of the quality of the explanation. The models are more likely to be swayed if the input is presented as being authoritative or confident, but the effect is small in size. This study underscores the significant prompt-sensitivity of LLMs and highlights the potential risks of incorporating outputs from external sources without thorough scrutiny and further validation. As LLMs continue to advance, understanding and mitigating such sensitivities will be crucial for their reliable and trustworthy deployment.

[LG-91] Unraveling Text Generation in LLMs: A Stochastic Differential Equation Approach

链接: https://arxiv.org/abs/2408.11863
作者: Yukun Zhang
关键词-EN: Stochastic Differential Equations, Differential Equations, Large Language Models, Stochastic Differential, Large Language
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
*备注:

点击查看摘要

Abstract:This paper explores the application of Stochastic Differential Equations (SDE) to interpret the text generation process of Large Language Models (LLMs) such as GPT-4. Text generation in LLMs is modeled as a stochastic process where each step depends on previously generated content and model parameters, sampling the next word from a vocabulary distribution. We represent this generation process using SDE to capture both deterministic trends and stochastic perturbations. The drift term describes the deterministic trends in the generation process, while the diffusion term captures the stochastic variations. We fit these functions using neural networks and validate the model on real-world text corpora. Through numerical simulations and comprehensive analyses, including drift and diffusion analysis, stochastic process property evaluation, and phase space exploration, we provide deep insights into the dynamics of text generation. This approach not only enhances the understanding of the inner workings of LLMs but also offers a novel mathematical perspective on language generation, which is crucial for diagnosing, optimizing, and controlling the quality of generated text.
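
The drift-plus-diffusion decomposition maps directly onto the Euler-Maruyama scheme. The sketch below simulates a toy Ornstein-Uhlenbeck process with hand-picked coefficients; the paper instead fits the drift and diffusion functions with neural networks.

```python
import math
import random

def euler_maruyama(drift, diffusion, x0, dt, steps, seed=0):
    """Simulate dX_t = drift(X_t) dt + diffusion(X_t) dW_t with the
    Euler-Maruyama scheme: the drift term is the deterministic trend,
    the diffusion term injects Gaussian perturbations."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path

# Toy Ornstein-Uhlenbeck process: the state is pulled toward 0 (drift)
# under small random perturbations (diffusion).
path = euler_maruyama(lambda x: -2.0 * x, lambda x: 0.1,
                      x0=1.0, dt=0.01, steps=500)
```

In the paper's setting the state would be a representation of the generated text so far, and the fitted diffusion term quantifies how much stochastic sampling perturbs each generation step.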

[LG-92] Speaking the Same Language: Leveraging LLMs in Standardizing Clinical Data for AI

链接: https://arxiv.org/abs/2408.11861
作者: Arindam Sett,Somaye Hashemifar,Mrunal Yadav,Yogesh Pandit,Mohsen Hejrati
关键词-EN: Artificial Intelligence, garnered considerable attention, implementation of Artificial, cost reduction, considerable attention
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: 11 pages, 2 figures, 4 tables

点击查看摘要

Abstract:The implementation of Artificial Intelligence (AI) in the healthcare industry has garnered considerable attention, attributable to its prospective enhancement of clinical outcomes, expansion of access to superior healthcare, cost reduction, and elevation of patient satisfaction. Nevertheless, the primary hurdle that persists is related to the quality of accessible multi-modal healthcare data in conjunction with the evolution of AI methodologies. This study delves into the adoption of large language models to address specific challenges, specifically, the standardization of healthcare data. We advocate the use of these models to identify and map clinical data schemas to established data standard attributes, such as the Fast Healthcare Interoperability Resources. Our results illustrate that employing large language models significantly diminishes the necessity for manual data curation and elevates the efficacy of the data standardization process. Consequently, the proposed methodology has the propensity to expedite the integration of AI in healthcare, ameliorate the quality of patient care, whilst minimizing the time and financial resources necessary for the preparation of data for AI.

[LG-93] FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models

链接: https://arxiv.org/abs/2408.11855
作者: Zhongyu Zhao,Menghang Dong,Rongyu Zhang,Wenzhao Zheng,Yunpeng Zhang,Huanrui Yang,Dalong Du,Kurt Keutzer,Shanghang Zhang
关键词-EN: Large Language Models, Large Language, storing diverse linguistic, Recent research, Feed-Forward Networks
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Recent research has demonstrated that Feed-Forward Networks (FFNs) in Large Language Models (LLMs) play a pivotal role in storing diverse linguistic and factual knowledge. Conventional methods frequently face challenges due to knowledge confusion stemming from their monolithic and redundant architectures, which calls for more efficient solutions with minimal computational overhead, particularly for LLMs. In this paper, we explore the FFN computation paradigm in LLMs and introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications, while maintaining the same level of performance. Furthermore, we embed a router from the Mixture-of-Experts (MoE), combined with our devised Prior-Approximate (PA) loss term that facilitates the dynamic activation of experts and knowledge adaptation, thereby accelerating computational processes and enhancing performance using minimal training data and fine-tuning steps. FactorLLM thus enables efficient knowledge factorization and activates select groups of experts specifically tailored to designated tasks, emulating the interactive functional segmentation of the human brain. Extensive experiments across various benchmarks demonstrate the effectiveness of our proposed FactorLLM which achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed. Code: this https URL.

[LG-94] When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications?

链接: https://arxiv.org/abs/2408.11854
作者: Yanjun Gao,Skatje Myers,Shan Chen,Dmitriy Dligach,Timothy A Miller,Danielle Bitterman,Matthew Churpek,Majid Afshar
关键词-EN: Large Language Models, Language Models, Large Language, bringing significant progress, introduction of Large
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Under review

点击查看摘要

Abstract:The introduction of Large Language Models (LLMs) has advanced data representation and analysis, bringing significant progress in their use for medical questions and answering. Despite these advancements, integrating tabular data, especially numerical data pivotal in clinical contexts, into LLM paradigms has not been thoroughly explored. In this study, we examine the effectiveness of vector representations from last hidden states of LLMs for medical diagnostics and prognostics using electronic health record (EHR) data. We compare the performance of these embeddings with that of raw numerical EHR data when used as feature inputs to traditional machine learning (ML) algorithms that excel at tabular data learning, such as eXtreme Gradient Boosting. We focus on instruction-tuned LLMs in a zero-shot setting to represent abnormal physiological data and evaluate their utility as feature extractors to enhance ML classifiers for predicting diagnoses, length of stay, and mortality. Furthermore, we examine prompt engineering techniques on zero-shot and few-shot LLM embeddings to measure their impact comprehensively. Although the findings suggest that raw data features still prevail in medical ML tasks, zero-shot LLM embeddings demonstrate competitive results, suggesting a promising avenue for future research in medical applications.

[LG-95] Fast Training Dataset Attribution via In-Context Learning

链接: https://arxiv.org/abs/2408.11852
作者: Milad Fotouhi,Mohammad Taha Bahadori,Oluwaseyi Feyisetan,Payman Arabshahi,David Heckerman
关键词-EN: instruction-tuned large language, large language models, prompt engineering, engineering to estimate, instruction-tuned large
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:We investigate the use of in-context learning and prompt engineering to estimate the contributions of training data in the outputs of instruction-tuned large language models (LLMs). We propose two novel approaches: (1) a similarity-based approach that measures the difference between LLM outputs with and without provided context, and (2) a mixture distribution model approach that frames the problem of identifying contribution scores as a matrix factorization task. Our empirical comparison demonstrates that the mixture model approach is more robust to retrieval noise in in-context learning, providing a more reliable estimation of data contributions.
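
The mixture-distribution framing can be sketched as fitting nonnegative, normalized weights so that the observed output distribution is a convex combination of candidate source distributions. The projected-gradient solver and toy distributions below are illustrative stand-ins, not the paper's factorization algorithm.

```python
def estimate_contributions(output, sources, steps=2000, lr=0.1):
    """Fit nonnegative weights w, summing to 1, so that
    output ~= sum_k w[k] * sources[k], by projected gradient descent
    on the squared error."""
    k, n = len(sources), len(output)
    w = [1.0 / k] * k
    for _ in range(steps):
        mix = [sum(w[j] * sources[j][i] for j in range(k)) for i in range(n)]
        resid = [mix[i] - output[i] for i in range(n)]
        grad = [sum(resid[i] * sources[j][i] for i in range(n)) for j in range(k)]
        w = [max(0.0, w[j] - lr * grad[j]) for j in range(k)]
        total = sum(w) or 1.0
        w = [v / total for v in w]  # renormalize onto the simplex
    return w

# The observed output is a 70/30 blend of two candidate source
# distributions; the fitted weights recover the contribution scores.
s1 = [0.8, 0.1, 0.1]
s2 = [0.1, 0.1, 0.8]
out = [0.7 * a + 0.3 * b for a, b in zip(s1, s2)]
w = estimate_contributions(out, [s1, s2])
```

Framing attribution this way is what gives the mixture approach its robustness: noise in any single in-context retrieval perturbs the residual a little, rather than flipping a hard assignment.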

[LG-96] Adaptive Friction in Deep Learning: Enhancing Optimizers with Sigmoid and Tanh Function

链接: https://arxiv.org/abs/2408.11839
作者: Hongye Zheng,Bingxing Wang,Minheng Xiao,Honglin Qin,Zhizhong Wu,Lianghao Tan
关键词-EN: adaptive friction coefficients, deep neural networks, neural networks, oscillation issues, pivotal in guiding
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
*备注:

点击查看摘要

Abstract:Adaptive optimizers are pivotal in guiding the weight updates of deep neural networks, yet they often face challenges such as poor generalization and oscillation issues. To counter these, we introduce sigSignGrad and tanhSignGrad, two novel optimizers that integrate adaptive friction coefficients based on the Sigmoid and Tanh functions, respectively. These algorithms leverage short-term gradient information, a feature overlooked in traditional Adam variants like diffGrad and AngularGrad, to enhance parameter updates and convergence. Our theoretical analysis demonstrates the wide-ranging adjustment capability of the friction coefficient S, which aligns with targeted parameter update strategies and outperforms existing methods in both optimization trajectory smoothness and convergence rate. Extensive experiments on CIFAR-10, CIFAR-100, and Mini-ImageNet datasets using ResNet50 and ViT architectures confirm the superior performance of our proposed optimizers, showcasing improved accuracy and reduced training time. The innovative approach of integrating adaptive friction coefficients as plug-ins into existing optimizers, exemplified by the sigSignAdamW and sigSignAdamP variants, presents a promising strategy for boosting the optimization performance of established algorithms. The findings of this study contribute to the advancement of optimizer design in deep learning.
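
The abstract does not spell out the friction formula, so the sketch below guesses a plausible form: an Adam-style step scaled by S = sigmoid(g_t * g_{t-1}), which approaches 1 when consecutive gradients agree and shrinks when they oscillate. Every constant and the coefficient itself are assumptions, not the sigSignGrad definition.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def friction_adam_step(w, grad, prev_grad, m, v, t, lr=0.05,
                       b1=0.9, b2=0.999, eps=1e-8):
    """One Adam-style update scaled by an adaptive friction coefficient
    S = sigmoid(grad * prev_grad): close to 1 when consecutive gradients
    agree in sign, smaller when they oscillate (an assumed form, not the
    paper's exact formula)."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    s = sigmoid(grad * prev_grad)  # adaptive friction coefficient
    w = w - lr * s * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = (w - 3)^2: successive gradients agree, so S stays high
# and the step size is not needlessly damped.
w, m, v, prev = 0.0, 0.0, 0.0, 0.0
for t in range(1, 401):
    g = 2.0 * (w - 3.0)
    w, m, v = friction_adam_step(w, g, prev, m, v, t)
    prev = g
```

The point of the plug-in design is visible here: the friction factor multiplies an otherwise standard Adam step, so the same scaling could wrap AdamW or AdamP without touching their internals.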

[LG-97] MicroXercise: A Micro-Level Comparative and Explainable System for Remote Physical Therapy

链接: https://arxiv.org/abs/2408.11837
作者: Hanchen David Wang,Nibraas Khan,Anna Chen,Nilanjan Sarkar,Pamela Wisniewski,Meiyi Ma
关键词-EN: Recent global estimates, global estimates suggest, Recent global, billion individuals, rehabilitation services
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
*备注: Accepted by IEEE/ACM CHASE 2024

点击查看摘要

Abstract:Recent global estimates suggest that as many as 2.41 billion individuals have health conditions that would benefit from rehabilitation services. Home-based Physical Therapy (PT) faces significant challenges in providing interactive feedback and meaningful observation for therapists and patients. To fill this gap, we present MicroXercise, which integrates micro-motion analysis with wearable sensors, providing therapists and patients with a comprehensive feedback interface, including video, text, and scores. Crucially, it employs multi-dimensional Dynamic Time Warping (DTW) and attribution-based explainable methods to analyze the existing deep learning neural networks in monitoring exercises, focusing on a high granularity of exercise. This synergistic approach is pivotal, providing output matching the input size to precisely highlight critical subtleties and movements in PT, thus transforming complex AI analysis into clear, actionable feedback. By highlighting these micro-motions in different metrics, such as stability and range of motion, MicroXercise significantly enhances the understanding and relevance of feedback for end-users. Comparative performance metrics underscore its effectiveness over traditional methods, such as a 39% and 42% improvement in Feature Mutual Information (FMI) and Continuity. MicroXercise is a step ahead in home-based physical therapy, providing a technologically advanced and intuitively helpful solution to enhance patient care and outcomes.
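
The dynamic-programming core of DTW can be written in a few lines; MicroXercise applies a multi-dimensional variant over wearable-sensor channels, but the 1-D case below shows why it suits exercise comparison.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic Time Warping distance between two 1-D signals via the
    classic O(n*m) dynamic program."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# The same movement performed at half speed still aligns perfectly under
# DTW, whereas point-by-point comparison would penalize the time shift.
reference = [0, 1, 2, 3, 2, 1, 0]
slower = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]
```

A patient repeating a reference exercise more slowly thus scores as a correct movement, and the residual alignment cost isolates genuine micro-motion deviations such as reduced range of motion.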

[LG-98] FAKER: Full-body Anonymization with Human Keypoint Extraction for Real-time Video Deidentification

链接: https://arxiv.org/abs/2408.11829
作者: Byunghyun Ban,Hyoseok Lee
关键词-EN: contemporary digital era, digital era, paramount issue, contemporary digital, Abstract
类目: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
*备注:

点击查看摘要

Abstract:In the contemporary digital era, protection of personal information has become a paramount issue. The exponential growth of the media industry has heightened concerns regarding the anonymization of individuals captured in video footage. Traditional methods, such as blurring or pixelation, are commonly employed, while recent advancements have introduced generative adversarial networks (GAN) to redraw faces in videos. In this study, we propose a novel approach that employs a significantly smaller model to achieve real-time full-body anonymization of individuals in videos. Unlike conventional techniques, which often fail to effectively remove personal identification information such as skin color, clothing, accessories, and body shape, our method successfully eradicates all such details. Furthermore, by leveraging pose estimation algorithms, our approach accurately represents information regarding individuals’ positions, movements, and postures. This algorithm can be seamlessly integrated into CCTV or IP camera systems installed in various industrial settings, functioning in real-time and thus facilitating the widespread adoption of full-body anonymization technology.

[LG-99] Is ChatGPT a Good Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations

链接: https://arxiv.org/abs/2408.05128
作者: Jasmine Latendresse,SayedHassan Khatoonabadi,Ahmad Abdellatif,Emad Shihab
关键词-EN: Large Language Models, play a critical, critical role, Large Language, Language Models
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Submitted

点击查看摘要

Abstract:Software libraries play a critical role in the functionality, efficiency, and maintainability of software systems. As developers increasingly rely on Large Language Models (LLMs) to streamline their coding processes, the effectiveness of these models in recommending appropriate libraries becomes crucial yet remains largely unexplored. In this paper, we assess the effectiveness of ChatGPT as a software librarian and identify areas for improvement. We conducted an empirical study using GPT-3.5 Turbo to generate Python code for 10,000 Stack Overflow questions. Our findings show that ChatGPT uses third-party libraries nearly 10% more often than human developers, favoring widely adopted and well-established options. However, 14.2% of the recommended libraries had restrictive copyleft licenses, which were not explicitly communicated by ChatGPT. Additionally, 6.5% of the libraries did not work out of the box, leading to potential developer confusion and wasted time. While ChatGPT can be an effective software librarian, it should be improved by providing more explicit information on maintainability metrics and licensing. We recommend that developers implement rigorous dependency management practices and double-check library licenses before integrating LLM-generated code into their projects.

[LG-100] On the Variability of AI-based Software Systems Due to Environment Configurations

链接: https://arxiv.org/abs/2408.02825
作者: Musfiqur Rahman,SayedHassan Khatoonabadi,Ahmad Abdellatif,Haya Samaana,Emad Shihab
关键词-EN: include Artificial Intelligence, Artificial Intelligence, systems include Artificial, include Artificial, software systems include
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Submitted to the Information and Software Technology journal for review

点击查看摘要

Abstract:[Context] Nowadays, many software systems include Artificial Intelligence (AI) components and changes in the development environment have been known to induce variability in an AI-based system. [Objective] However, how an environment configuration impacts the variability of these systems is yet to be explored. Understanding and quantifying the degree of variability due to such configurations can help practitioners decide the best environment configuration for the most stable AI products. [Method] To achieve this goal, we performed experiments with eight different combinations of three key environment variables (operating system, Python version, and CPU architecture) on 30 open-source AI-based systems using the Travis CI platform. We evaluate variability using three metrics: the output of an AI component like an ML model (performance), the time required to build and run a system (processing time), and the cost associated with building and running a system (expense). [Results] Our results indicate that variability exists in all three metrics; however, it is observed more frequently with respect to processing time and expense than performance. For example, between Linux and MacOS, variabilities are observed in 23%, 96.67%, and 100% of the studied projects in performance, processing time, and expense, respectively. [Conclusion] Our findings underscore the importance of identifying the optimal combination of configuration settings to mitigate performance drops and reduce retraining time and cost before deploying an AI-based system.

[LG-101] Predicting the First Response Latency of Maintainers and Contributors in Pull Requests

链接: https://arxiv.org/abs/2311.07786
作者: SayedHassan Khatoonabadi,Ahmad Abdellatif,Diego Elias Costa,Emad Shihab
关键词-EN: Pull Request, maintainers, response, response latency, faster first responses
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Manuscript accepted for publication in IEEE Transactions on Software Engineering (TSE)

点击查看摘要

Abstract:The success of a Pull Request (PR) depends on the responsiveness of the maintainers and the contributor during the review process. Being aware of the expected waiting times can lead to better interactions and managed expectations for both the maintainers and the contributor. In this paper, we propose a machine-learning approach to predict the first response latency of the maintainers following the submission of a PR, and the first response latency of the contributor after receiving the first response from the maintainers. We curate a dataset of 20 large and popular open-source projects on GitHub and extract 21 features to characterize projects, contributors, PRs, and review processes. Using these features, we then evaluate seven types of classifiers to identify the best-performing models. We also conduct permutation feature importance and SHAP analyses to understand the importance and the impact of different features on the predicted response latencies. We find that our CatBoost models are the most effective for predicting the first response latencies of both maintainers and contributors. We also observe that PRs submitted earlier in the week, containing an average number of commits, and with concise descriptions are more likely to receive faster first responses from the maintainers. Similarly, PRs with a lower first response latency from maintainers, that received the first response of maintainers earlier in the week, and containing an average number of commits tend to receive faster first responses from the contributors. Additionally, contributors with a higher acceptance rate and a history of timely responses in the project are likely to both obtain and provide faster first responses. Moreover, we show the effectiveness of our approach in a cross-project setting.

[LG-102] Understanding the Helpfulness of Stale Bot for Pull-based Development: An Empirical Study of 20 Large Open-Source Projects

链接: https://arxiv.org/abs/2305.18150
作者: SayedHassan Khatoonabadi,Diego Elias Costa,Suhaib Mujahid,Emad Shihab
关键词-EN: Pull Requests, Stale bot, Stale, PRs, making it difficult
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Manuscript submitted to ACM Transactions on Software Engineering and Methodology

点击查看摘要

Abstract:Pull Requests (PRs) that are neither progressed nor resolved clutter the list of PRs, making it difficult for the maintainers to manage and prioritize unresolved PRs. To automatically track, follow up, and close such inactive PRs, Stale bot was introduced by GitHub. Despite its increasing adoption, there are ongoing debates on whether using Stale bot alleviates or exacerbates the problem of inactive PRs. To better understand if and how Stale bot helps projects in their pull-based development workflow, we perform an empirical study of 20 large and popular open-source projects. We find that Stale bot can help deal with a backlog of unresolved PRs as the projects closed more PRs within the first few months of adoption. Moreover, Stale bot can help improve the efficiency of the PR review process as the projects reviewed PRs that ended up merged and resolved PRs that ended up closed faster after the adoption. However, Stale bot can also negatively affect the contributors as the projects experienced a considerable decrease in their number of active contributors after the adoption. Therefore, relying solely on Stale bot to deal with inactive PRs may lead to decreased community engagement and an increased probability of contributor abandonment.

[LG-103] On Wasted Contributions: Understanding the Dynamics of Contributor-Abandoned Pull Requests

链接: https://arxiv.org/abs/2110.15447
作者: SayedHassan Khatoonabadi,Diego Elias Costa,Rabe Abdalkareem,Emad Shihab
关键词-EN: enabled numerous volunteers, Pull-based development, fewer barriers, development has enabled, enabled numerous
类目: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注: Manuscript accepted for publication in ACM Transactions on Software Engineering and Methodology (TOSEM)

点击查看摘要

Abstract:Pull-based development has enabled numerous volunteers to contribute to open-source projects with fewer barriers. Nevertheless, a considerable amount of pull requests (PRs) with valid contributions are abandoned by their contributors, wasting the effort and time put in by both the contributors and maintainers. To better understand the underlying dynamics of contributor-abandoned PRs, we conduct a mixed-methods study using both quantitative and qualitative methods. We curate a dataset consisting of 265,325 PRs including 4,450 abandoned ones from ten popular and mature GitHub projects and measure 16 features characterizing PRs, contributors, review processes, and projects. Using statistical and machine learning techniques, we find that complex PRs, novice contributors, and lengthy reviews have a higher probability of abandonment and the rate of PR abandonment fluctuates alongside the projects’ maturity or workload. To identify why contributors abandon their PRs, we also manually examine a random sample of 354 abandoned PRs. We observe that the most frequent abandonment reasons are related to the obstacles faced by contributors, followed by the hurdles imposed by maintainers during the review process. Finally, we survey the top core maintainers of the studied projects to understand their perspectives on dealing with PR abandonment and on our findings.

[LG-104] Stochastic Compositional Minimax Optimization with Provable Convergence Guarantees

链接: https://arxiv.org/abs/2408.12505
作者: Yuyang Deng,Fuli Qiao,Mehrdad Mahdavi
关键词-EN: Stochastic compositional minimax, compositional minimax problem, Stochastic compositional, compositional minimax, compositional
类目: Optimization and Control (math.OC); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Stochastic compositional minimax problems are prevalent in machine learning, yet only limited results have been established on the convergence of this class of problems. In this paper, we propose a formal definition of the stochastic compositional minimax problem, which involves optimizing a minimax loss with a compositional structure in either the primal variable, the dual variable, or both. We introduce a simple yet effective algorithm, stochastically Corrected stOchastic gradient Descent Ascent (CODA), which is a descent-ascent type algorithm with compositional correction steps, and establish its convergence rate in the aforementioned three settings. When the compositional structure is in the primal, the objective function typically becomes nonconvex in the primal variable due to function composition. Thus, we consider the nonconvex-strongly-concave and nonconvex-concave settings and show that CODA can efficiently converge to a stationary point. In the case of composition on the dual, the objective function becomes nonconcave in the dual variable, and we demonstrate convergence in the strongly-convex-nonconcave and convex-nonconcave settings. In the case of composition on both variables, the primal and dual variables may lose convexity and concavity, respectively. Therefore, we analyze the convergence in the weakly-convex-weakly-concave setting. We also give a variance-reduced version of the algorithm, CODA+, which achieves the best known rate on nonconvex-strongly-concave and nonconvex-concave compositional minimax problems. This work initiates the theoretical study of the stochastic compositional minimax problem in various settings and may inform modern machine learning scenarios such as domain adaptation or robust model-agnostic meta-learning.
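
A minimal sketch of the descent-ascent-with-correction idea on a toy compositional problem (not the paper's CODA algorithm itself: the objective min_x max_y y·g(x) − y²/2 with inner expectation g(x) = E[x + ξ] = x, the step sizes, and the correction weight are all assumptions). The running estimate `u` tracks the inner function value and is corrected by the primal displacement between iterations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy compositional minimax: min_x max_y  y*g(x) - y^2/2,
# with g(x) = E[x + noise] = x; the saddle point is (0, 0).
eta, beta = 0.05, 0.2
x, y, u = 3.0, -2.0, 0.0   # primal, dual, running estimate of g(x)
x_prev = x

for _ in range(3000):
    g_sample = x + 0.1 * rng.normal()        # stochastic inner-function sample
    # Corrected moving average (the "compositional correction" step);
    # (x - x_prev) equals g(x_t) - g(x_{t-1}) for this identity inner map.
    u = (1 - beta) * (u + (x - x_prev)) + beta * g_sample
    x_prev = x
    # Descent on x, ascent on y, with u standing in for g(x).
    x -= eta * y                             # dF/dx = y * g'(x) = y
    y += eta * (u - y)                       # dF/dy = g(x) - y
print(x, y)
```

Despite the noisy inner samples, the corrected estimate keeps the iterates near the saddle point.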

[LG-105] EX-DRL: Hedging Against Heavy Losses with EXtreme Distributional Reinforcement Learning

链接: https://arxiv.org/abs/2408.12446
作者: Parvin Malekzadeh,Zissis Poulos,Jacky Chen,Zeyu Wang,Konstantinos N. Plataniotis
关键词-EN: Distributional Reinforcement Learning, Distributional Reinforcement, Recent advancements, Reinforcement Learning, advancements in Distributional
类目: Risk Management (q-fin.RM); Machine Learning (cs.LG); Statistical Finance (q-fin.ST)
*备注: 14 pages

点击查看摘要

Abstract:Recent advancements in Distributional Reinforcement Learning (DRL) for modeling loss distributions have shown promise in developing hedging strategies in derivatives markets. A common approach in DRL involves learning the quantiles of loss distributions at specified levels using Quantile Regression (QR). This method is particularly effective in option hedging due to its direct quantile-based risk assessment, such as Value at Risk (VaR) and Conditional Value at Risk (CVaR). However, these risk measures depend on the accurate estimation of extreme quantiles in the loss distribution’s tail, which can be imprecise in QR-based DRL due to the rarity and extremity of tail data, as highlighted in the literature. To address this issue, we propose EXtreme DRL (EX-DRL), which enhances extreme quantile prediction by modeling the tail of the loss distribution with a Generalized Pareto Distribution (GPD). This method introduces supplementary data to mitigate the scarcity of extreme quantile observations, thereby improving estimation accuracy through QR. Comprehensive experiments on gamma hedging options demonstrate that EX-DRL improves existing QR-based models by providing more precise estimates of extreme quantiles, thereby improving the computation and reliability of risk metrics for complex financial risk management.
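
The key ingredient is modeling tail excesses over a threshold with a Generalized Pareto Distribution (GPD). A hedged sketch of that step, using a simple method-of-moments GPD fit on synthetic heavy-tailed losses (the data, the 90th-percentile threshold, and the estimator choice are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic heavy-tailed hedging losses (stand-in data, not the paper's).
losses = rng.pareto(6.0, size=20000) + 1.0

u = np.quantile(losses, 0.90)          # threshold marking the tail
excess = losses[losses > u] - u
zeta = (losses > u).mean()             # fraction of observations in the tail

# Method-of-moments fit of the GPD to the threshold excesses:
# for GPD, m^2/v = 1 - 2*xi and m = sigma/(1 - xi).
m, v = excess.mean(), excess.var()
xi = 0.5 * (1.0 - m * m / v)           # shape parameter
sigma = 0.5 * m * (1.0 + m * m / v)    # scale parameter

def var_gpd(q):
    """GPD-based Value at Risk at confidence level q."""
    return u + sigma / xi * (((1 - q) / zeta) ** (-xi) - 1)

print(xi, sigma, var_gpd(0.99))
```

The GPD-based extreme quantile can then feed VaR/CVaR-style risk measures; here it should land close to the empirical 99th percentile.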

[LG-106] Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement INTERSPEECH2024

链接: https://arxiv.org/abs/2408.12425
作者: Longbiao Cheng,Ashutosh Pandey,Buye Xu,Tobi Delbruck,Shih-Chii Liu
关键词-EN: Gated Recurrent Neural, Recurrent Neural Network, Dynamic Gated Recurrent, resource-constrained hardware platforms, Gated Recurrent Unit
类目: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
*备注: Accepted to Interspeech 2024

点击查看摘要

Abstract:This paper introduces a new Dynamic Gated Recurrent Neural Network (DG-RNN) for compute-efficient speech enhancement models running on resource-constrained hardware platforms. It leverages the slow evolution of RNN hidden states over steps and updates only a selected set of neurons at each step by adding a newly proposed select gate to the RNN model. This select gate allows the computation cost of the conventional RNN to be reduced during network inference. As a realization of the DG-RNN, we further propose the Dynamic Gated Recurrent Unit (D-GRU), which does not require additional parameters. Test results obtained from several state-of-the-art compute-efficient RNN-based speech enhancement architectures using the DNS challenge dataset show that the D-GRU-based model variants maintain speech intelligibility and quality metrics comparable to the baseline GRU-based models, even with an average 50% reduction in GRU computations.
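
One possible reading of the select-gate idea, sketched in numpy (the weights are random, and the "update only the K neurons that would change most" rule is a simplification standing in for the learned select gate):

```python
import numpy as np

rng = np.random.default_rng(0)
H, X_DIM, K = 8, 4, 4   # hidden size, input size, neurons updated per step

# Random GRU weights (illustrative; a trained model would supply these).
Wz, Uz = rng.normal(size=(H, X_DIM)), rng.normal(size=(H, H))
Wr, Ur = rng.normal(size=(H, X_DIM)), rng.normal(size=(H, H))
Wh, Uh = rng.normal(size=(H, X_DIM)), rng.normal(size=(H, H))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dgru_step(h, x):
    # Standard GRU cell.
    z = sigmoid(Wz @ x + Uz @ h)
    r = sigmoid(Wr @ x + Ur @ h)
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))
    h_new = (1 - z) * h + z * h_cand
    # Select gate (simplified): update only the K neurons whose values
    # would change the most; the rest keep their previous value, so
    # their update computation could be skipped at inference time.
    keep = np.argsort(np.abs(h_new - h))[:H - K]
    h_new[keep] = h[keep]
    return h_new

h = np.zeros(H)
for _ in range(10):
    h = dgru_step(h, rng.normal(size=X_DIM))
print(h)
```

With K = H/2, half of the per-step state updates are skipped, mirroring the reported ~50% reduction in GRU computes.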

[LG-107] Distributed quasi-Newton robust estimation under differential privacy

链接: https://arxiv.org/abs/2408.12353
作者: Chuhan Wang,Lixing Zhu,Xuehu Zhu
关键词-EN: distributed quasi-Newton estimation, Byzantine machines, asymptotic relative efficiency, Privacy Protection, computing with Byzantine
类目: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
*备注: 38 pages, 6 figures

点击查看摘要

Abstract:For distributed computing with Byzantine machines under Privacy Protection (PP) constraints, this paper develops a robust PP distributed quasi-Newton estimation method, which requires the node machines to transmit only five vectors to the central processor while achieving high asymptotic relative efficiency. Compared with the gradient descent strategy, which requires more rounds of transmission, and the Newton iteration strategy, which requires the entire Hessian matrix to be transmitted, the novel quasi-Newton iteration has advantages in reducing the privacy budget and transmission cost. Moreover, our PP algorithm does not depend on the boundedness of gradients and second-order derivatives. When gradients and second-order derivatives follow sub-exponential distributions, we offer a mechanism that can ensure PP with a sufficiently high probability. Furthermore, this novel estimator achieves the optimal convergence rate and asymptotic normality. Numerical studies on synthetic and real data sets evaluate the performance of the proposed algorithm.

[LG-108] Neural-ANOVA: Model Decomposition for Interpretable Machine Learning

链接: https://arxiv.org/abs/2408.12319
作者: Steffen Limmer,Steffen Udluft,Clemens Otte
关键词-EN: specific decision output, analysis of variance, decision output, ANOVA decomposition, offers a systematic
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注: 8 pages, 4 figures, 5 tables

点击查看摘要

Abstract:The analysis of variance (ANOVA) decomposition offers a systematic method to understand the interaction effects that contribute to a specific decision output. In this paper we introduce Neural-ANOVA, an approach to decompose neural networks into glassbox models using the ANOVA decomposition. Our approach formulates a learning problem, which enables rapid and closed-form evaluation of integrals over subspaces that appear in the calculation of the ANOVA decomposition. Finally, we conduct numerical experiments to illustrate the advantages of enhanced interpretability and model validation by a decomposition of the learned interaction effects.
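
The ANOVA decomposition underlying the approach can be illustrated on a toy two-input function evaluated on a uniform grid (the analytic function below stands in for a trained network, and Neural-ANOVA's closed-form integrals are replaced by grid averages):

```python
import numpy as np

g = np.linspace(0, 1, 201)
X1, X2 = np.meshgrid(g, g, indexing="ij")
f = X1 + 2 * X2 + X1 * X2           # toy model in place of a neural network

f0 = f.mean()                        # grand mean (constant component)
f1 = f.mean(axis=1) - f0             # main effect of x1 (integrate out x2)
f2 = f.mean(axis=0) - f0             # main effect of x2 (integrate out x1)
f12 = f - f0 - f1[:, None] - f2[None, :]   # pure interaction component

# The components are orthogonal, so the total variance splits exactly
# across main effects and the interaction.
total = f.var()
parts = f1.var() + f2.var() + f12.var()
print(total, parts)
```

For this function the main effect of x1 works out to 1.5·x1 − 0.75, and the variance identity holds to floating-point precision, which is exactly the kind of additive attribution the explainability tools build on.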

[LG-109] Multiple testing for signal-agnostic searches of new physics with machine learning

链接: https://arxiv.org/abs/2408.12296
作者: Gaia Grosso,Marco Letizia
关键词-EN: enhance signal-agnostic searches, multiple testing strategies, leveraging multiple testing, address the question, searches by leveraging
类目: High Energy Physics - Phenomenology (hep-ph); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); Data Analysis, Statistics and Probability (physics.data-an); Methodology (stat.ME)
*备注: 17 pages, 5 tables, 6 figures

点击查看摘要

Abstract:In this work, we address the question of how to enhance signal-agnostic searches by leveraging multiple testing strategies. Specifically, we consider hypothesis tests relying on machine learning, where model selection can introduce a bias towards specific families of new physics signals. We show that it is beneficial to combine different tests, characterised by distinct choices of hyperparameters, and that performances comparable to the best available test are generally achieved while providing a more uniform response to various types of anomalies. Focusing on the New Physics Learning Machine, a methodology to perform a signal-agnostic likelihood-ratio test, we explore a number of approaches to multiple testing, such as combining p-values and aggregating test statistics.
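
One classical way to combine p-values from multiple tests, as mentioned in the abstract, is Fisher's method. A self-contained implementation, using the closed-form chi-square survival function for even degrees of freedom (this is a standard statistical tool, not the paper's specific aggregation scheme):

```python
import math

def fisher_combine(pvals):
    """Fisher's method: combine independent p-values into one global p-value."""
    k = len(pvals)
    x = -2.0 * sum(math.log(p) for p in pvals)
    # Chi-square survival function with 2k degrees of freedom; for even df
    # it has the closed form  P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(k))

print(fisher_combine([0.04, 0.03, 0.60]))
```

With a single p-value the combination is the identity, and two small p-values reinforce each other into a smaller global p-value.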

[LG-110] Demystifying Functional Random Forests: Novel Explainability Tools for Model Transparency in High-Dimensional Spaces

链接: https://arxiv.org/abs/2408.12288
作者: Fabrizio Maturo,Annamaria Porreca
关键词-EN: raised significant challenges, Functional Random Forests, Functional Data Analysis, analysing high-dimensional datasets, advent of big
类目: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)
*备注: 33 pages

点击查看摘要

Abstract:The advent of big data has raised significant challenges in analysing high-dimensional datasets across various domains such as medicine, ecology, and economics. Functional Data Analysis (FDA) has proven to be a robust framework for addressing these challenges, enabling the transformation of high-dimensional data into functional forms that capture intricate temporal and spatial patterns. However, despite advancements in functional classification methods and very high performance demonstrated by combining FDA and ensemble methods, a critical gap persists in the literature concerning the transparency and interpretability of black-box models, e.g. Functional Random Forests (FRF). In response to this need, this paper introduces a novel suite of explainability tools to illuminate the inner mechanisms of FRF. We propose using Functional Partial Dependence Plots (FPDPs), Functional Principal Component (FPC) Probability Heatmaps, various model-specific and model-agnostic FPCs’ importance metrics, and the FPC Internal-External Importance and Explained Variance Bubble Plot. These tools collectively enhance the transparency of FRF models by providing a detailed analysis of how individual FPCs contribute to model predictions. By applying these methods to an ECG dataset, we demonstrate the effectiveness of these tools in revealing critical patterns and improving the explainability of FRF.

[LG-111] Zeroth-Order Stochastic Mirror Descent Algorithms for Minimax Excess Risk Optimization

链接: https://arxiv.org/abs/2408.12209
作者: Zhihao Gu,Zi Xu
关键词-EN: traditional distributionally robust, achieves uniformly low, uniformly low regret, distributionally robust optimization, minimax excess risk
类目: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
*备注:

点击查看摘要

Abstract:The minimax excess risk optimization (MERO) problem is a new variation of the traditional distributionally robust optimization (DRO) problem, which achieves uniformly low regret across all test distributions under suitable conditions. In this paper, we propose a zeroth-order stochastic mirror descent (ZO-SMD) algorithm, available for both smooth and non-smooth MERO, to estimate the minimal risk of each distribution, and finally solve MERO as a (non-)smooth stochastic convex-concave (linear) minimax optimization problem. The proposed algorithm is proved to converge at the optimal rate of \mathcal{O}(1/\sqrt{t}) on the estimate of R_i^* and \mathcal{O}(1/\sqrt{t}) on the optimization error of both smooth and non-smooth MERO. Numerical results show the efficiency of the proposed algorithm.
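
The zeroth-order ingredient can be illustrated with the standard two-point random-direction gradient estimator, which uses only function evaluations (a generic sketch on a toy smooth objective, not the paper's ZO-SMD with mirror maps; the smoothing radius and step size are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def zo_grad(f, x, mu=1e-4):
    """Two-point zeroth-order gradient estimate along a random Gaussian direction."""
    u = rng.normal(size=x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

# Minimise f(x) = ||x - 1||^2 without ever querying its gradient.
f = lambda x: np.sum((x - 1.0) ** 2)
x = np.zeros(3)
for _ in range(3000):
    x -= 0.05 * zo_grad(f, x)
print(x, f(x))
```

In expectation the estimator equals the true gradient, so plain descent on these estimates drives the objective from 3.0 at the start toward its minimum.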

[LG-112] Efficient Learning for Linear Properties of Bounded-Gate Quantum Circuits

链接: https://arxiv.org/abs/2408.12199
作者: Yuxuan Du,Min-Hsiu Hsieh,Dacheng Tao
关键词-EN: complicated large-qubit state, large-qubit state space, state space forbids, modern quantum computers, vast and complicated
类目: Quantum Physics (quant-ph); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:The vast and complicated large-qubit state space forbids us to comprehensively capture the dynamics of modern quantum computers via classical simulations or quantum tomography. However, recent progress in quantum learning theory invokes a crucial question: given a quantum circuit containing d tunable RZ gates and G-d Clifford gates, can a learner perform purely classical inference to efficiently predict its linear properties using new classical inputs, after learning from data obtained by incoherently measuring states generated by the same circuit but with different classical inputs? In this work, we prove that the sample complexity scaling linearly in d is necessary and sufficient to achieve a small prediction error, while the corresponding computational complexity may scale exponentially in d. Building upon these derived complexity bounds, we further harness the concept of classical shadow and truncated trigonometric expansion to devise a kernel-based learning model capable of trading off prediction error and computational complexity, transitioning from exponential to polynomial scaling in many practical settings. Our results advance two crucial realms in quantum computation: the exploration of quantum algorithms with practical utilities and learning-based quantum system certification. We conduct numerical simulations to validate our proposals across diverse scenarios, encompassing quantum information processing protocols, Hamiltonian simulation, and variational quantum algorithms up to 60 qubits.

[LG-113] Empowering Wireless Network Applications with Deep Learning-based Radio Propagation Models

链接: https://arxiv.org/abs/2408.12193
作者: Stefanos Bakirtzis,Cagkan Yapar,Marco Fiore,Jie Zhang,Ian Wassell
关键词-EN: target coverage area, communication ecosystem rely, received signal quality, wireless communication ecosystem, coverage area
类目: Signal Processing (eess.SP); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
*备注: 7 pages, 3 Figures, 1 Table

点击查看摘要

Abstract:The efficient deployment and operation of any wireless communication ecosystem rely on knowledge of the received signal quality over the target coverage area. This knowledge is typically acquired through radio propagation solvers, which however suffer from intrinsic and well-known performance limitations. This article provides a primer on how integrating deep learning and conventional propagation modeling techniques can enhance multiple vital facets of wireless network operation, and yield benefits in terms of efficiency and reliability. By highlighting the pivotal role that the deep learning-based radio propagation models will assume in next-generation wireless networks, we aspire to propel further research in this direction and foster their adoption in additional applications.

[LG-114] Transformers are Minimax Optimal Nonparametric In-Context Learners ICML2024

链接: https://arxiv.org/abs/2408.12186
作者: Juno Kim,Tai Nakamaki,Taiji Suzuki
关键词-EN: large language models, surprisingly effective method, large language, language models, models has proven
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注: 40 pages, 3 figures, ICML 2024 Workshop on Theoretical Foundations of Foundation Models

点击查看摘要

Abstract:In-context learning (ICL) of large language models has proven to be a surprisingly effective method of learning a new task from only a few demonstrative examples. In this paper, we study the efficacy of ICL from the viewpoint of statistical learning theory. We develop approximation and generalization error bounds for a transformer composed of a deep neural network and one linear attention layer, pretrained on nonparametric regression tasks sampled from general function spaces including the Besov space and piecewise \gamma -smooth class. We show that sufficiently trained transformers can achieve – and even improve upon – the minimax optimal estimation risk in context by encoding the most relevant basis representations during pretraining. Our analysis extends to high-dimensional or sequential data and distinguishes the \emphpretraining and \emphin-context generalization gaps. Furthermore, we establish information-theoretic lower bounds for meta-learners w.r.t. both the number of tasks and in-context examples. These findings shed light on the roles of task diversity and representation learning for ICL.
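
The linear attention layer analyzed here differs from softmax attention in that a feature map replaces the exponential kernel, so the n×n attention matrix never has to be formed explicitly. A numpy sketch of the contrast (the feature map and dimensions are illustrative choices, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 6   # head dimension, sequence length

Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

# Softmax attention: materialises an n x n matrix, O(n^2 d).
A = np.exp(Q @ K.T / np.sqrt(d))
soft = (A / A.sum(axis=1, keepdims=True)) @ V

# Linear attention: a positive feature map phi makes the kernel factorise,
# so associativity gives phi(Q) (phi(K)^T V) in O(n d^2).
phi = lambda M: np.maximum(M, 0) + 1e-6        # simple positive feature map
num = phi(Q) @ (phi(K).T @ V)
den = phi(Q) @ phi(K).T.sum(axis=1, keepdims=True)
linear = num / den
print(soft.shape, linear.shape)
```

Both variants produce an (n, d) output; the linear form is what makes one-layer attention amenable to the kind of in-context regression analysis described above.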

[LG-115] DeepHQ: Learned Hierarchical Quantizer for Progressive Deep Image Coding

链接: https://arxiv.org/abs/2408.12150
作者: Jooyoung Lee,Se Yoon Jeong,Munchurl Kim
关键词-EN: Unlike fixed, variable-rate image coding, providing high compression, variable-rate image, increasing the versatility
类目: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Unlike fixed- or variable-rate image coding, progressive image coding (PIC) aims to compress various qualities of images into a single bitstream, increasing the versatility of bitstream utilization and providing high compression efficiency compared to simulcast compression. Research on neural network (NN)-based PIC is in its early stages, mainly focusing on applying varying quantization step sizes to the transformed latent representations in a hierarchical manner. These approaches are designed to compress only the progressively added information as the quality improves, considering that a wider quantization interval for lower-quality compression includes multiple narrower sub-intervals for higher-quality compression. However, the existing methods are based on handcrafted quantization hierarchies, resulting in sub-optimal compression efficiency. In this paper, we propose an NN-based progressive coding method that firstly utilizes learned quantization step sizes via learning for each quantization layer. We also incorporate selective compression with which only the essential representation components are compressed for each quantization layer. We demonstrate that our method achieves significantly higher coding efficiency than the existing approaches with decreased decoding time and reduced model size.
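
The nested-interval intuition, where each coarser quantization interval contains narrower sub-intervals for higher-quality layers, can be sketched with scalar quantizers whose step sizes shrink per layer (the step sizes below are hand-picked, whereas the paper learns them per quantization layer):

```python
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=1000)

# Hypothetical step sizes, one per quality layer (coarse -> fine).
steps = [1.0, 0.25, 0.0625]

recon = np.zeros_like(latent)
errors = []
for q in steps:
    # Each layer encodes only the residual left by the previous layers,
    # i.e. the progressively added information.
    residual = latent - recon
    recon = recon + np.round(residual / q) * q
    errors.append(np.abs(latent - recon).max())
print(errors)
```

The maximum reconstruction error after each layer is bounded by half that layer's step size, so decoding more layers of the single bitstream monotonically improves quality.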

[LG-116] Through-the-Wall Radar Human Activity Micro-Doppler Signature Representation Method Based on Joint Boulic-Sinusoidal Pendulum Model MICRO

链接: https://arxiv.org/abs/2408.12077
作者: Xiaopeng Yang,Weicheng Gao,Xiaodong Qu,Zeyu Ma,Hao Zhang
关键词-EN: accurately identify indoor, joint Boulic-sinusoidal pendulum, enables the reconstruction, indoor human activities, reconstruction of range
类目: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注: 17 pages, 14 figures, 7 tables, in IEEE Transactions on Microwave Theory and Techniques, 2024

点击查看摘要

Abstract:With the help of micro-Doppler signatures, ultra-wideband (UWB) through-the-wall radar (TWR) enables the reconstruction of range and velocity information of limb nodes to accurately identify indoor human activities. However, existing methods are usually trained and validated directly on range-time maps (RTM) and Doppler-time maps (DTM), which have high feature redundancy and poor generalization ability. To solve this problem, this paper proposes a human activity micro-Doppler signature representation method based on a joint Boulic-sinusoidal pendulum motion model. In detail, this paper presents a simplified joint Boulic-sinusoidal pendulum human motion model, improved from the Boulic-Thalmann kinematic model, that takes the head, torso, both hands, and both feet into consideration. The paper also calculates the minimum number of key points needed to sufficiently describe the Doppler and micro-Doppler information. Both numerical simulations and experiments are conducted to verify the effectiveness. The results demonstrate that the proposed number of key points of the micro-Doppler signature can precisely represent indoor human limb node motion characteristics and substantially improve the generalization capability of existing methods across different testers.
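
The sinusoidal-pendulum part of the model can be illustrated by simulating one limb node's radial motion and the micro-Doppler shift it induces (the carrier frequency, standoff distance, swing amplitude, and swing rate below are assumed values, not the paper's):

```python
import numpy as np

fc, c = 3e9, 3e8          # carrier frequency (Hz), speed of light (m/s)
A, f_limb = 0.3, 1.2      # swing amplitude (m) and swing rate (Hz), assumed

t = np.linspace(0, 2, 1000)
# Sinusoidal-pendulum range of a limb node around a 3 m standoff.
r = 3.0 + A * np.sin(2 * np.pi * f_limb * t)
v = np.gradient(r, t)                 # radial velocity of the node
f_doppler = 2 * v * fc / c            # classic two-way Doppler shift
print(f_doppler.max())
```

The peak micro-Doppler shift is 2·A·2π·f_limb·fc/c ≈ 45 Hz for these values; summing such contributions over a few key nodes is what produces the characteristic micro-Doppler signature.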

[LG-117] A Deconfounding Approach to Climate Model Bias Correction

链接: https://arxiv.org/abs/2408.12063
作者: Wentao Gao,Jiuyong Li,Debo Cheng,Lin Liu,Jixue Liu,Thuc Duy Le,Xiaojing Du,Xiongren Chen,Yanchang Zhao,Yun Chen
关键词-EN: Global Climate Models, Earth systems, simulating the Earth, predicting future climate, Global Climate
类目: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
*备注:

点击查看摘要

Abstract:Global Climate Models (GCMs) are crucial for predicting future climate changes by simulating the Earth systems. However, GCM outputs exhibit systematic biases due to model uncertainties, parameterization simplifications, and inadequate representation of complex climate phenomena. Traditional bias correction methods, which rely on historical observation data and statistical techniques, often neglect unobserved confounders, leading to biased results. This paper proposes a novel bias correction approach to utilize both GCM and observational data to learn a factor model that captures multi-cause latent confounders. Inspired by recent advances in causality based time series deconfounding, our method first constructs a factor model to learn latent confounders from historical data and then applies them to enhance the bias correction process using advanced time series forecasting models. The experimental results demonstrate significant improvements in the accuracy of precipitation outputs. By addressing unobserved confounders, our approach offers a robust and theoretically grounded solution for climate model bias correction.
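
A heavily simplified sketch of the two-step idea, extracting a latent factor from historical series and then using it alongside the raw GCM output for correction, on synthetic data (the one-factor SVD model and linear correction below are stand-ins for the paper's factor model and time series forecasting models; all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500

# Synthetic setting: one latent confounder drives both the GCM output and
# the observations; the GCM also carries a constant bias.
confounder = rng.normal(size=T)
gcm = 0.8 * confounder + rng.normal(scale=0.3, size=T) + 1.0   # biased model
obs = 1.2 * confounder + rng.normal(scale=0.3, size=T)

# Step 1: recover a latent factor from the historical series via SVD
# (a one-factor stand-in for the paper's factor model).
Z = np.column_stack([gcm, obs])
factor = np.linalg.svd(Z - Z.mean(axis=0), full_matrices=False)[0][:, 0]

# Step 2: bias-correct the GCM output by regressing observations on
# [1, gcm, factor] instead of on gcm alone.
A = np.column_stack([np.ones(T), gcm, factor])
coef, *_ = np.linalg.lstsq(A, obs, rcond=None)
corrected = A @ coef

raw_err = np.mean((gcm - obs) ** 2)
corr_err = np.mean((corrected - obs) ** 2)
print(raw_err, corr_err)
```

Including the recovered factor removes both the constant bias and part of the confounder-driven discrepancy, so the corrected series tracks the observations much more closely than the raw GCM output.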

[LG-118] MAC protocol classification in the ISM band using machine learning methods

链接: https://arxiv.org/abs/2408.12059
作者: Hanieh Rashidpour,Hossein Bahramgiri
关键词-EN: radio spectrum shortages, wireless channel spectrum, ISM radio band, Cognitive Radio users, growing number
类目: Signal Processing (eess.SP); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
*备注:

点击查看摘要

Abstract:With the emergence of new technologies and a growing number of wireless networks, we face the problem of radio spectrum shortages. As a result, identifying the wireless channel spectrum to exploit the channel's idle state while also boosting network security is a pivotal issue. Detecting and classifying protocols in the MAC sublayer enables Cognitive Radio users to improve spectrum utilization and minimize potential interference. In this paper, we classify the Wi-Fi and Bluetooth protocols, which are the most widely used MAC sublayer protocols in the ISM radio band. With the advent of various wireless technologies, especially in the 2.4 GHz frequency band, the ISM spectrum has become crowded and high-traffic, leading to a shortage of spectrum resources and user interference. Therefore, identifying and classifying protocols is an effective and useful approach. Leveraging machine learning and deep learning techniques, known for their advanced classification capabilities, we apply the Support Vector Machine and K-Nearest Neighbors algorithms to classify protocols into three classes: Wi-Fi, Wi-Fi Beacon, and Bluetooth. To capture the signals, we use the USRP N210 Software Defined Radio device and sample real data in an indoor environment under different conditions of the presence and absence of transmitters and receivers for these two protocols. By assembling this dataset and studying the time and frequency features of the protocols, we extract the frame width and the silence gap between two frames as time features and the PAPR of each frame as a power feature. By comparing the protocol classification output under different conditions, including added Gaussian noise, we find that the nonlinear SVM with an RBF kernel and the KNN algorithm achieve the best performance, with 97.83% and 98.12% classification accuracy, respectively.
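
A toy version of the classification step, using a hand-rolled k-nearest-neighbours classifier on two of the extracted feature types (the feature values and class clusters below are invented for illustration; the paper uses real USRP captures and also an RBF-kernel SVM):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two of the paper's features: frame width (us)
# and PAPR (dB), with one cluster per protocol class.
centers = {0: (200.0, 4.0), 1: (120.0, 5.5), 2: (80.0, 8.0)}  # Wi-Fi, Beacon, BT
X, y = [], []
for label, (width, papr) in centers.items():
    X.append(np.column_stack([rng.normal(width, 5, 100), rng.normal(papr, 0.3, 100)]))
    y.append(np.full(100, label))
X, y = np.vstack(X), np.concatenate(y)

def knn_predict(x, k=5):
    """k-nearest-neighbours majority vote over Euclidean distance."""
    idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    return np.bincount(y[idx]).argmax()

acc = np.mean([knn_predict(x) == label for x, label in zip(X, y)])
print(acc)
```

With well-separated feature clusters the vote is nearly always correct, mirroring the high accuracies reported for KNN in the abstract.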

[LG-119] Detection of Under-represented Samples Using Dynamic Batch Training for Brain Tumor Segmentation from MR Images

链接: https://arxiv.org/abs/2408.12013
作者: Subin Sahayam,John Michael Sujay Zakkam,Yoga Sri Varshan V,Umarani Jayaraman
关键词-EN: magnetic resonance imaging, automatic brain tumor, brain tumor segmentation, samples, training
类目: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Segmenting brain tumors manually in magnetic resonance (MR) images is difficult, time-consuming, and prone to human error. These challenges can be resolved by developing automatic brain tumor segmentation methods from MR images. Various deep-learning models based on the U-Net have been proposed for the task. These models are trained on a dataset of tumor images and then used to segment the masks. Mini-batch training is a widely used method in deep learning. However, a significant challenge associated with this approach is that if the training dataset contains under-represented samples or samples with complex latent representations, the model may not generalize well to them. This issue leads to skewed learning, where the model fits the majority representations while underestimating the under-represented samples. The proposed dynamic batch training method addresses the challenges posed by under-represented data points, data points with complex latent representations, and imbalances within the class, where some samples may be harder to learn than others. Poor performance on such samples can be identified only after the completion of training, leading to wasted computational resources. Likewise, retraining easy samples after each epoch is an inefficient use of computational resources. To overcome these challenges, the proposed method identifies hard samples and trains them for more iterations than easier samples on the BraTS2020 dataset. Additionally, the samples trained multiple times are recorded, providing a way to identify hard samples in the BraTS2020 dataset. The comparison of the proposed training approach with U-Net and other models in the literature highlights its capabilities.
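
The scheduling idea of giving hard samples extra training iterations can be sketched as follows (the loss values and the "above the epoch-mean loss" hardness rule are placeholders for the paper's actual criterion):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-sample losses from one epoch (stand-ins for segmentation losses).
losses = rng.gamma(2.0, 0.5, size=20)

# Dynamic batching (simplified): samples whose loss exceeds the epoch mean
# are flagged as "hard" and scheduled for extra training iterations.
threshold = losses.mean()
extra_iters = np.where(losses > threshold, 2, 0)   # 2 extra passes when hard

# Build the next epoch's sample schedule: every sample at least once,
# hard samples repeated; the repeat log itself identifies hard samples.
schedule = np.repeat(np.arange(len(losses)), 1 + extra_iters)
hard_ids = np.flatnonzero(extra_iters > 0)
print(len(schedule), hard_ids)
```

Every sample still appears at least once per epoch, while the flagged hard samples appear three times, concentrating compute where the model struggles.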

[LG-120] An Asymptotically Optimal Coordinate Descent Algorithm for Learning Bayesian Networks from Gaussian Models

链接: https://arxiv.org/abs/2408.11977
作者: Tong Xu,Armeen Taeb,Simge Küçükyavuz,Ali Shojaie
关键词-EN: linear Gaussian structural, Gaussian structural equation, structural equation model, learning Bayesian networks, Bayesian networks
类目: Machine Learning (stat.ML); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:This paper studies the problem of learning Bayesian networks from continuous observational data, generated according to a linear Gaussian structural equation model. We consider an \ell_0 -penalized maximum likelihood estimator for this problem which is known to have favorable statistical properties but is computationally challenging to solve, especially for medium-sized Bayesian networks. We propose a new coordinate descent algorithm to approximate this estimator and prove several remarkable properties of our procedure: the algorithm converges to a coordinate-wise minimum, and despite the non-convexity of the loss function, as the sample size tends to infinity, the objective value of the coordinate descent solution converges to the optimal objective value of the \ell_0 -penalized maximum likelihood estimator. Finite-sample optimality and statistical consistency guarantees are also established. To the best of our knowledge, our proposal is the first coordinate descent procedure endowed with optimality and statistical guarantees in the context of learning Bayesian networks. Numerical experiments on synthetic and real data demonstrate that our coordinate descent method can obtain near-optimal solutions while being scalable.
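
A minimal coordinate descent for an \ell_0-penalized least-squares objective conveys the flavor of the procedure (a generic sketch under synthetic data and an illustrative penalty, not the authors' algorithm): each coordinate is set to its least-squares value only if doing so lowers the loss by more than the penalty, and is hard-thresholded to zero otherwise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam = 1.0                 # l0 penalty weight (illustrative)
beta = np.zeros(p)
for _ in range(20):                              # sweeps over coordinates
    for j in range(p):
        r = y - X @ beta + X[:, j] * beta[j]     # partial residual
        ols = X[:, j] @ r / (X[:, j] @ X[:, j])  # coordinate-wise LS value
        # Hard-thresholding rule: keep coordinate j only if it lowers
        # 0.5 * ||y - X beta||^2 by more than the penalty lam.
        beta[j] = ols if 0.5 * ols**2 * (X[:, j] @ X[:, j]) > lam else 0.0
print(beta)
```

On this toy problem the sweeps settle on the true support and zero out the spurious coordinates, illustrating the coordinate-wise minimum the paper's theory characterizes.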

[LG-121] DrivAerML: High-Fidelity Computational Fluid Dynamics Dataset for Road-Car External Aerodynamics

链接: https://arxiv.org/abs/2408.11969
作者: Neil Ashton,Charles Mockett,Marian Fuchs,Louis Fliessbach,Hendrik Hetmann,Thilo Knacke,Norbert Schonwald,Vangelis Skaperdas,Grigoris Fotiadis,Astrid Walle,Burkhard Hupertz,Danielle Maddix
关键词-EN: Machine Learning, enabling split-second flow, split-second flow predictions, flow predictions early, enabling split-second
类目: Fluid Dynamics (physics.flu-dyn); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Machine Learning (ML) has the potential to revolutionise the field of automotive aerodynamics, enabling split-second flow predictions early in the design process. However, the lack of open-source training data for realistic road cars, using high-fidelity CFD methods, represents a barrier to their development. To address this, a high-fidelity open-source (CC-BY-SA) public dataset for automotive aerodynamics has been generated, based on 500 parametrically morphed variants of the widely-used DrivAer notchback generic vehicle. Mesh generation and scale-resolving CFD was executed using consistent and validated automatic workflows representative of the industrial state-of-the-art. Geometries and rich aerodynamic data are published in open-source formats. To our knowledge, this is the first large, public-domain dataset for complex automotive configurations generated using high-fidelity CFD.

[LG-122] The Whole Is Bigger Than the Sum of Its Parts: Modeling Individual Annotators to Capture Emotional Variability INTERSPEECH2024

链接: https://arxiv.org/abs/2408.11956
作者: James Tavernor,Yara El-Tawil,Emily Mower Provost
关键词-EN: highly subjective processes, perception are nuanced, subjective processes, expression and perception, highly subjective
类目: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
*备注: Accepted to Interspeech 2024 Conference

点击查看摘要

Abstract:Emotion expression and perception are nuanced, complex, and highly subjective processes. When multiple annotators label emotional data, the resulting labels contain high variability. Most speech emotion recognition tasks address this by averaging annotator labels as ground truth. However, this process omits the nuance of emotion and inter-annotator variability, which are important signals to capture. Previous work has attempted to learn distributions to capture emotion variability, but these methods also lose information about the individual annotators. We address these limitations by learning to predict individual annotators and by introducing a novel method to create distributions from continuous model outputs that permit the learning of emotion distributions during model training. We show that this combined approach can result in emotion distributions that are more accurate than those seen in prior work, in both within- and cross-corpus settings.
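To illustrate the idea of building a distribution from individual annotator outputs rather than averaging them, here is a toy histogramming step; the bin edges (and treating predictions as valence scores in [0, 1]) are illustrative assumptions, not the paper's method.

```python
def annotator_distribution(per_annotator_preds, bins=(0.0, 0.33, 0.66, 1.01)):
    """Turn individual annotator predictions (e.g. valence in [0,1]) into a
    categorical emotion distribution by histogramming, instead of collapsing
    them to a single averaged label. Bin edges are illustrative."""
    counts = [0] * (len(bins) - 1)
    for p in per_annotator_preds:
        for i in range(len(bins) - 1):
            if bins[i] <= p < bins[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]
```

Averaging [0.1, 0.5, 0.9, 0.95] would hide the disagreement; the distribution [0.25, 0.25, 0.5] keeps it.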

[LG-123] Topological Representational Similarity Analysis in Brains and Beyond WWW

链接: https://arxiv.org/abs/2408.11948
作者: Baihan Lin
关键词-EN: crucial topological information, Topological RSA, introduces Topological RSA, Topological Simplicial Analysis, artificial intelligence
类目: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG); Geometric Topology (math.GT)
*备注: Thesis defended by Baihan Lin (bl2681@columbia.edu) in 2023 for PhD in Computational Neuroscience at Columbia University; unifies and extends work from PNAS, WWW, CCN, ISMB, BIBM etc. ( arXiv:2309.11028 , 2203.05488 , 1906.09264 , 2204.14048 , 1810.02923 )

点击查看摘要

Abstract:Understanding how the brain represents and processes information is crucial for advancing neuroscience and artificial intelligence. Representational similarity analysis (RSA) has been instrumental in characterizing neural representations, but traditional RSA relies solely on geometric properties, overlooking crucial topological information. This thesis introduces Topological RSA (tRSA), a novel framework combining geometric and topological properties of neural representations. tRSA applies nonlinear monotonic transforms to representational dissimilarities, emphasizing local topology while retaining intermediate-scale geometry. The resulting geo-topological matrices enable model comparisons robust to noise and individual idiosyncrasies. This thesis introduces several key methodological advances: (1) Topological RSA (tRSA) for identifying computational signatures and testing topological hypotheses; (2) Adaptive Geo-Topological Dependence Measure (AGTDM) for detecting complex multivariate relationships; (3) Procrustes-aligned Multidimensional Scaling (pMDS) for revealing neural computation stages; (4) Temporal Topological Data Analysis (tTDA) for uncovering developmental trajectories; and (5) Single-cell Topological Simplicial Analysis (scTSA) for characterizing cell population complexity. Through analyses of neural recordings, biological data, and neural network simulations, this thesis demonstrates the power and versatility of these methods in understanding brains, computational models, and complex biological systems. They not only offer robust approaches for adjudicating among competing models but also reveal novel theoretical insights into the nature of neural computation. This work lays the foundation for future investigations at the intersection of topology, neuroscience, and time series analysis, paving the way for more nuanced understanding of brain function and dysfunction. 
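One simple member of the family of nonlinear monotonic transforms described above clips representational dissimilarities at lower and upper quantiles, suppressing small (noise-dominated) distances and saturating large ones while keeping intermediate-scale geometry. The quantile thresholds here are illustrative assumptions, not the thesis's exact parameterization.

```python
import numpy as np

def geo_topological_transform(D, lower_q=0.1, upper_q=0.9):
    """Monotonic geo-topological transform sketch: dissimilarities below the
    lower quantile map to 0, above the upper quantile to 1, linear in between."""
    off_diag = D[np.triu_indices_from(D, k=1)]
    l, u = np.quantile(off_diag, [lower_q, upper_q])
    T = np.clip((D - l) / (u - l), 0.0, 1.0)
    np.fill_diagonal(T, 0.0)  # an item is never dissimilar to itself
    return T
```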

[LG-124] A Unified Theory of Quantum Neural Network Loss Landscapes

链接: https://arxiv.org/abs/2408.11901
作者: Eric R. Anschuetz
关键词-EN: Classical neural networks, Classical neural, random initialization famously, neural networks
类目: Quantum Physics (quant-ph); Machine Learning (cs.LG)
*备注: 51 pages, 4 figures

点击查看摘要

Abstract:Classical neural networks with random initialization famously behave as Gaussian processes in the limit of many neurons, with the architecture of the network determining the covariance of the associated process. This limit allows one to completely characterize the training behavior of such networks and show that, generally, classical neural networks train efficiently via gradient descent. No such general understanding exists for quantum neural networks (QNNs), which – outside of certain special cases – are known to not behave as Gaussian processes when randomly initialized. We here prove that instead QNNs and their first two derivatives generally form what we call Wishart processes, where now certain algebraic properties of the network determine the hyperparameters of the process. This Wishart process description allows us to, for the first time: 1. Give necessary and sufficient conditions for a QNN architecture to have a Gaussian process limit. 2. Calculate the full gradient distribution, unifying previously known barren plateau results. 3. Calculate the local minima distribution of algebraically constrained QNNs. The transition from trainability to untrainability in each of these contexts is governed by a single parameter we call the “degrees of freedom” of the network architecture. We thus end by proposing a formal definition for the “trainability” of a given QNN architecture using this experimentally accessible quantity.

[LG-125] ST-USleepNet: A Spatial-Temporal Coupling Prominence Network for Multi-Channel Sleep Staging

链接: https://arxiv.org/abs/2408.11884
作者: Jingying Ma,Qika Lin,Ziyu Jia,Mengling Feng
关键词-EN: assessing sleep quality, Sleep, diagnosing disorders, Sleep staging, critical for assessing
类目: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Sleep staging is critical for assessing sleep quality and diagnosing disorders. Recent advancements in artificial intelligence have driven the development of automated sleep staging models, which still face two significant challenges. 1) Simultaneously extracting prominent temporal and spatial sleep features from multi-channel raw signals, including characteristic sleep waveforms and salient spatial brain networks. 2) Capturing the spatial-temporal coupling patterns essential for accurate sleep staging. To address these challenges, we propose a novel framework named ST-USleepNet, comprising a spatial-temporal graph construction module (ST) and a U-shaped sleep network (USleepNet). The ST module converts raw signals into a spatial-temporal graph to model spatial-temporal couplings. The USleepNet utilizes a U-shaped structure originally designed for image segmentation. Similar to how image segmentation isolates significant targets, when applied to both raw sleep signals and ST module-generated graph data, USleepNet segments these inputs to extract prominent temporal and spatial sleep features simultaneously. Testing on three datasets demonstrates that ST-USleepNet outperforms existing baselines, and model visualizations confirm its efficacy in extracting prominent sleep features and temporal-spatial coupling patterns across various sleep stages. The code is available at: this https URL.
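A spatial-temporal graph of the kind the ST module builds can be sketched as follows: one node per (channel, segment), spatial edges between channels within a segment, and temporal edges between consecutive segments of a channel. The correlation-based spatial weights and unit temporal weights are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def build_st_graph(signals, seg_len):
    """Sketch of a spatial-temporal graph from multi-channel signals.

    signals: array of shape (channels, time). Returns the adjacency matrix
    over C*S nodes, where S = time // seg_len segments."""
    C, T = signals.shape
    S = T // seg_len
    A = np.zeros((C * S, C * S))

    def node(c, s):
        return c * S + s

    for s in range(S):  # spatial edges: channel correlations within a segment
        seg = signals[:, s * seg_len:(s + 1) * seg_len]
        corr = np.corrcoef(seg)
        for i in range(C):
            for j in range(i + 1, C):
                w = abs(corr[i, j])
                A[node(i, s), node(j, s)] = A[node(j, s), node(i, s)] = w
    for c in range(C):  # temporal edges: consecutive segments of a channel
        for s in range(S - 1):
            A[node(c, s), node(c, s + 1)] = A[node(c, s + 1), node(c, s)] = 1.0
    return A
```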

[LG-126] From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis

链接: https://arxiv.org/abs/2408.11876
作者: Guy Lutsker,Gal Sapir,Anastasia Godneva,Smadar Shilo,Jerry R Greenfield,Dorit Samocha-Bonet,Shie Mannor,Eli Meirom,Gal Chechik,Hagai Rossman,Eran Segal
关键词-EN: self-supervised learning enabled, offer great potential, Recent advances, CGM, self-supervised learning
类目: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Recent advances in self-supervised learning enabled novel medical AI models, known as foundation models (FMs) that offer great potential for characterizing health from diverse biomedical data. Continuous glucose monitoring (CGM) provides rich, temporal data on glycemic patterns, but its full potential for predicting broader health outcomes remains underutilized. Here, we present GluFormer, a generative foundation model on biomedical temporal data based on a transformer architecture, and trained on over 10 million CGM measurements from 10,812 non-diabetic individuals. We tokenized the CGM training data and trained GluFormer using next token prediction in a generative, autoregressive manner. We demonstrate that GluFormer generalizes effectively to 15 different external datasets, including 4936 individuals across 5 different geographical regions, 6 different CGM devices, and several metabolic disorders, including normoglycemic, prediabetic, and diabetic populations, as well as those with gestational diabetes and obesity. GluFormer produces embeddings which outperform traditional CGM analysis tools, and achieves high Pearson correlations in predicting clinical parameters such as HbA1c, liver-related parameters, blood lipids, and sleep-related indices. Notably, GluFormer can also predict onset of future health outcomes even 4 years in advance. We also show that CGM embeddings from pre-intervention periods in Randomized Clinical Trials (RCTs) outperform other methods in predicting primary and secondary outcomes. When integrating dietary data into GluFormer, we show that the enhanced model can accurately generate CGM data based only on dietary intake data, simulate outcomes of dietary interventions, and predict individual responses to specific foods. Overall, we show that GluFormer accurately predicts health outcomes which generalize across different populations and metabolic conditions.
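The tokenization step that precedes next-token training can be sketched as uniform binning of glucose readings into a discrete vocabulary; the value range and bin count below are assumptions for illustration, not GluFormer's actual scheme.

```python
def tokenize_cgm(readings, vmin=40, vmax=400, n_bins=100):
    """Quantise CGM readings (mg/dL) into discrete token ids so a transformer
    can be trained autoregressively on them. Range/bins are illustrative."""
    width = (vmax - vmin) / n_bins
    tokens = []
    for r in readings:
        r = min(max(r, vmin), vmax)          # clamp physiologically implausible values
        tokens.append(min(int((r - vmin) / width), n_bins - 1))
    return tokens
```

The resulting integer sequence can then feed a standard next-token-prediction objective.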

[LG-127] Parameter-Efficient Transfer Learning under Federated Learning for Automatic Speech Recognition

链接: https://arxiv.org/abs/2408.11873
作者: Xuan Kan,Yonghui Xiao,Tien-Ju Yang,Nanxin Chen,Rajiv Mathews
关键词-EN: Automatic Speech Recognition, enhancing Automatic Speech, Speech Recognition, Automatic Speech, user data privacy
类目: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:This work explores the challenge of enhancing Automatic Speech Recognition (ASR) model performance across various user-specific domains while preserving user data privacy. We employ federated learning and parameter-efficient domain adaptation methods to solve the (1) massive data requirement of ASR models from user-specific scenarios and (2) the substantial communication cost between servers and clients during federated learning. We demonstrate that when equipped with proper adapters, ASR models under federated tuning can achieve similar performance compared with centralized tuning ones, thus providing a potential direction for future privacy-preserved ASR services. Besides, we investigate the efficiency of different adapters and adapter incorporation strategies under the federated learning setting.
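The communication-saving idea above can be sketched in a few lines: only the small adapter weights travel between clients and the server, while the frozen ASR backbone stays local. Plain unweighted FedAvg is shown for simplicity; the paper's exact aggregation may differ.

```python
def federated_average_adapters(client_adapters):
    """Average only the adapter parameters across clients (FedAvg sketch).

    client_adapters: list of dicts mapping parameter names to values.
    The full ASR model is never communicated, only these small dicts."""
    n = len(client_adapters)
    keys = client_adapters[0].keys()
    return {k: sum(a[k] for a in client_adapters) / n for k in keys}
```

Each round, clients fine-tune their adapters locally, send them up, and receive the averaged dict back.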

[LG-128] Gradient Reduction Convolutional Neural Network Policy for Financial Deep Reinforcement Learning

链接: https://arxiv.org/abs/2408.11859
作者: Sina Montazeri,Haseebullah Jumakhan,Sonia Abrasiabian,Amir Mirzaeinia
关键词-EN: convolutional neural networks, CNN model predictive, prior explorations, explorations of convolutional, convolutional neural
类目: Computational Finance (q-fin.CP); Machine Learning (cs.LG)
*备注: Regular Research Paper (CSCE-ICAI 2024), 8 Pages

点击查看摘要

Abstract:Building on our prior explorations of convolutional neural networks (CNNs) for financial data processing, this paper introduces two significant enhancements to refine our CNN model’s predictive performance and robustness for financial tabular data. Firstly, we integrate a normalization layer at the input stage to ensure consistent feature scaling, addressing the issue of disparate feature magnitudes that can skew the learning process. This modification is hypothesized to aid in stabilizing the training dynamics and improving the model’s generalization across diverse financial datasets. Secondly, we employ a Gradient Reduction Architecture, where earlier layers are wider and subsequent layers are progressively narrower. This enhancement is designed to enable the model to capture more complex and subtle patterns within the data, a crucial factor in accurately predicting financial outcomes. These advancements directly respond to the limitations identified in previous studies, where simpler models struggled with the complexity and variability inherent in financial applications. Initial tests confirm that these changes improve accuracy and model stability, suggesting that deeper and more nuanced network architectures can significantly benefit financial predictive tasks. This paper details the implementation of these enhancements and evaluates their impact on the model’s performance in a controlled experimental setting.
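The two enhancements can be sketched directly: per-feature standardization at the input stage, and a "wide early, progressively narrower" width schedule. The halving factor and depth are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def normalize_features(X):
    """Input-stage normalisation: zero mean, unit variance per feature,
    so disparate feature magnitudes do not skew training."""
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-8
    return (X - mu) / sigma

def gradient_reduction_widths(input_dim, depth=4, factor=2):
    """Hypothetical width schedule for a gradient-reduction architecture:
    each successive layer is `factor` times narrower than the previous one."""
    widths, w = [], input_dim
    for _ in range(depth):
        widths.append(max(w, 1))
        w //= factor
    return widths
```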

[LG-129] Online Electric Vehicle Charging Detection Based on Memory-based Transformer using Smart Meter Data

链接: https://arxiv.org/abs/2408.11828
作者: Ammar Mansoor Kamoona,Hui Song,Mahdi Jalili,Hao Wang,Reza Razzaghi,Xinghuo Yu
关键词-EN: Electric Vehicles, poses unique challenges, Distribution Network Operators, popularity of Electric, electricity Distribution Network
类目: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:The growing popularity of Electric Vehicles (EVs) poses unique challenges for grid operators and infrastructure, which require effectively managing these vehicles’ integration into the grid. Identification of EV charging is essential to electricity Distribution Network Operators (DNOs) for better planning and managing the distribution grid. One critical aspect is the ability to accurately identify the presence of EV charging in the grid. EV charging identification using smart meter readings obtained from behind-the-meter devices is a challenging task that enables effective management of the integration of EVs into the existing power grid. Different from the existing supervised models that require addressing the imbalance problem caused by EVs and non-EVs data, we propose a novel unsupervised memory-based transformer (M-TR) that can run in real-time (online) to detect EV charging from a streaming smart meter. It dynamically leverages coarse-scale historical information using an M-TR encoder from an extended global temporal window, in conjunction with an M-TR decoder that concentrates on a limited time frame, a local window, aiming to capture the fine-scale characteristics of the smart meter data. The M-TR is based on an anomaly detection technique that does not require any prior knowledge about EV charging profiles; it only requires real power consumption data of non-EV users. In addition, the proposed model leverages the power of transfer learning. The M-TR is compared with different state-of-the-art methods and performs better than other unsupervised learning models. The model runs with an excellent execution time of 1.2 sec for 1-minute smart meter recordings.
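The encoder/decoder split over two temporal windows can be sketched as follows; the window lengths and the stride used to coarsen the global history are illustrative assumptions, not the paper's settings.

```python
def dual_windows(stream, t, global_len=60, local_len=5, stride=6):
    """At time t, return the two views a memory-based transformer might use:
    a coarse (subsampled) global history window for the encoder, and a
    fine-grained local window of the most recent readings for the decoder."""
    start = max(0, t - global_len)
    global_window = stream[start:t:stride]            # coarse-scale history
    local_window = stream[max(0, t - local_len):t]    # fine-scale recent data
    return global_window, local_window
```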

信息检索

[IR-0] RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment

链接: https://arxiv.org/abs/2408.12579
作者: Xiaohan Wang,Xiaoyan Yang,Yuqi Zhu,Yue Shen,Jian Wang,Peng Wei,Lei Liang,Jinjie Gu,Huajun Chen,Ningyu Zhang
关键词-EN: Large Language Models, Large Language, Language Models, Med-Gemini achieve performance, achieve performance competitively
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Machine Learning (cs.LG)
*备注: Ongoing work

点击查看摘要

Abstract:Large Language Models (LLMs) like GPT-4, MedPaLM-2, and Med-Gemini achieve performance competitively with human experts across various medical benchmarks. However, they still face challenges in making professional diagnoses akin to physicians, particularly in efficiently gathering patient information and reasoning the final diagnosis. To this end, we introduce the RuleAlign framework, designed to align LLMs with specific diagnostic rules. We develop a medical dialogue dataset comprising rule-based communications between patients and physicians and design an alignment learning approach through preference learning. Experimental results demonstrate the effectiveness of the proposed approach. We hope that our work can serve as an inspiration for exploring the potential of LLMs as AI physicians.

[IR-1] The Importance of Cognitive Biases in the Recommendation Ecosystem

链接: https://arxiv.org/abs/2408.12492
作者: Markus Schedl,Oleg Lesota,Stefan Brandl,Mohammad Lotfi,Gustavo Junior Escobedo Ticona,Shahed Masoudian
关键词-EN: Cognitive biases, studied in psychology, economics for decades, behavioral economics, biases
类目: Information Retrieval (cs.IR)
*备注:

点击查看摘要

Abstract:Cognitive biases have been studied in psychology, sociology, and behavioral economics for decades. Traditionally, they have been considered a negative human trait that leads to inferior decision-making, reinforcement of stereotypes, or can be exploited to manipulate consumers, respectively. We argue that cognitive biases also manifest in different parts of the recommendation ecosystem and at different stages of the recommendation process. More importantly, we contest this traditional detrimental perspective on cognitive biases and claim that certain cognitive biases can be beneficial when accounted for by recommender systems. Concretely, we provide empirical evidence that biases such as feature-positive effect, Ikea effect, and cultural homophily can be observed in various components of the recommendation pipeline, including input data (such as ratings or side information), recommendation algorithm or model (and consequently recommended items), and user interactions with the system. In three small experiments covering recruitment and entertainment domains, we study the pervasiveness of the aforementioned biases. We ultimately advocate for a prejudice-free consideration of cognitive biases to improve user and item models as well as recommendation algorithms.

[IR-2] DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems

链接: https://arxiv.org/abs/2408.12470
作者: Jiaju Chen,Chongming Gao,Shuai Yuan,Shuchang Liu,Qingpeng Cai,Peng Jiang
关键词-EN: Large Language Models, Large Language, substantial performance improvements, integration of Large, Language Models
类目: Information Retrieval (cs.IR)
*备注:

点击查看摘要

Abstract:The integration of Large Language Models (LLMs) into recommender systems has led to substantial performance improvements. However, this often comes at the cost of diminished recommendation diversity, which can negatively impact user satisfaction. To address this issue, controllable recommendation has emerged as a promising approach, allowing users to specify their preferences and receive recommendations that meet their diverse needs. Despite its potential, existing controllable recommender systems frequently rely on simplistic mechanisms, such as a single prompt, to regulate diversity, an approach that falls short of capturing the full complexity of user preferences. In response to these limitations, we propose DLCRec, a novel framework designed to enable fine-grained control over diversity in LLM-based recommendations. Unlike traditional methods, DLCRec adopts a fine-grained task decomposition strategy, breaking down the recommendation process into three sequential sub-tasks: genre prediction, genre filling, and item prediction. These sub-tasks are trained independently and inferred sequentially according to user-defined control numbers, ensuring more precise control over diversity. Furthermore, the scarcity and uneven distribution of diversity-related user behavior data pose significant challenges for fine-tuning. To overcome these obstacles, we introduce two data augmentation techniques that enhance the model’s robustness to noisy and out-of-distribution data. These techniques expose the model to a broader range of patterns, improving its adaptability in generating recommendations with varying levels of diversity. Our extensive empirical evaluation demonstrates that DLCRec not only provides precise control over diversity but also outperforms state-of-the-art baselines across multiple recommendation scenarios.
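The three-stage decomposition can be sketched as a simple pipeline; the `*_model` callables below are hypothetical stand-ins for the fine-tuned LLM calls, and the interfaces are assumptions for illustration only.

```python
def dlcrec_style_pipeline(history, control_k, genre_model, filling_model, item_model):
    """Sketch of DLCRec-style sequential sub-tasks: predict candidate genres,
    fill a genre slate of user-controlled size control_k, then pick one item
    per genre. The three model callables are hypothetical stand-ins."""
    genres = genre_model(history)                      # stage 1: genre prediction
    slate_genres = filling_model(genres, control_k)    # stage 2: genre filling
    return [item_model(history, g) for g in slate_genres]  # stage 3: item prediction
```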

[IR-3] A Comparative Analysis of Faithfulness Metrics and Humans in Citation Evaluation SIGIR2024

链接: https://arxiv.org/abs/2408.12398
作者: Weijia Zhang,Mohammad Aliannejadi,Jiahuan Pei,Yifei Yuan,Jia-Hong Huang,Evangelos Kanoulas
关键词-EN: Large language models, Large language, language models, unsupported or unverifiable, support
类目: Information Retrieval (cs.IR); Computation and Language (cs.CL)
*备注: Accepted by the First Workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval@SIGIR2024), non-archival. arXiv admin note: substantial text overlap with arXiv:2406.15264

点击查看摘要

Abstract:Large language models (LLMs) often generate content with unsupported or unverifiable content, known as “hallucinations.” To address this, retrieval-augmented LLMs are employed to include citations in their content, grounding the content in verifiable sources. Despite such developments, manually assessing how well a citation supports the associated statement remains a major challenge. Previous studies tackle this challenge by leveraging faithfulness metrics to estimate citation support automatically. However, they limit this citation support estimation to a binary classification scenario, neglecting fine-grained citation support in practical scenarios. To investigate the effectiveness of faithfulness metrics in fine-grained scenarios, we propose a comparative evaluation framework that assesses the metric effectiveness in distinguishing citations between three-category support levels: full, partial, and no support. Our framework employs correlation analysis, classification evaluation, and retrieval evaluation to measure the alignment between metric scores and human judgments comprehensively. Our results indicate no single metric consistently excels across all evaluations, highlighting the complexity of accurately evaluating fine-grained support levels. Particularly, we find that the best-performing metrics struggle to distinguish partial support from full or no support. Based on these findings, we provide practical recommendations for developing more effective metrics.
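Moving from binary to three-category support can be pictured as thresholding a continuous faithfulness score; the threshold values below are assumptions for illustration, not anything calibrated in the paper.

```python
def support_level(metric_score, t_partial=0.35, t_full=0.7):
    """Toy mapping of a continuous faithfulness score into the three
    citation-support categories: full, partial, or no support."""
    if metric_score >= t_full:
        return "full"
    if metric_score >= t_partial:
        return "partial"
    return "no"
```

The paper's finding that metrics struggle to separate "partial" from the other two corresponds to scores crowding around such thresholds.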

[IR-4] Dynamic Product Image Generation and Recommendation at Scale for Personalized E-commerce RECSYS’24

链接: https://arxiv.org/abs/2408.12392
作者: Ádám Tibor Czapp,Mátyás Jani,Bálint Domián,Balázs Hidasi
关键词-EN: Coupling latent diffusion, latent diffusion based, contextual bandits enables, eye-catching personalized product, Coupling latent
类目: Information Retrieval (cs.IR)
*备注: Appearing in the Proceedings of the 18th ACM Conference on Recommender Systems (RecSys’24) as an Industry Track paper

点击查看摘要

Abstract:Coupling latent diffusion based image generation with contextual bandits enables the creation of eye-catching personalized product images at scale that was previously either impossible or too expensive. In this paper we showcase how we utilized these technologies to increase user engagement with recommendations in online retargeting campaigns for e-commerce.
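The bandit side of the pipeline can be pictured with a minimal epsilon-greedy step over generated image variants; this toy stand-in ignores context features, which a real contextual bandit would condition on.

```python
import random

def epsilon_greedy_pick(ctr_estimates, epsilon=0.1, rng=None):
    """Pick one generated image variant: exploit the best estimated
    click-through rate, or explore a random variant with prob. epsilon."""
    rng = rng or random.Random(0)
    arms = list(ctr_estimates)
    if rng.random() < epsilon:
        return rng.choice(arms)                        # explore
    return max(arms, key=lambda a: ctr_estimates[a])   # exploit
```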

[IR-5] Fair Augmentation for Graph Collaborative Filtering

链接: https://arxiv.org/abs/2408.12208
作者: Ludovico Boratto,Francesco Fabbri,Gianni Fenu,Mirko Marras,Giacomo Medda
关键词-EN: learning users’ preferences, users’ preferences, preferences from user-item, graph collaborative filtering, collaborative power
类目: Information Retrieval (cs.IR); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Recent developments in recommendation have harnessed the collaborative power of graph neural networks (GNNs) in learning users’ preferences from user-item networks. Despite emerging regulations addressing fairness of automated systems, unfairness issues in graph collaborative filtering remain underexplored, especially from the consumer’s perspective. Despite numerous contributions on consumer unfairness, only a few of these works have delved into GNNs. A notable gap exists in the formalization of the latest mitigation algorithms, as well as in their effectiveness and reliability on cutting-edge models. This paper serves as a solid response to recent research highlighting unfairness issues in graph collaborative filtering by reproducing one of the latest mitigation methods. The reproduced technique adjusts the system fairness level by learning a fair graph augmentation. Under an experimental setup based on 11 GNNs, 5 non-GNN models, and 5 real-world networks across diverse domains, our investigation reveals that fair graph augmentation is consistently effective on high-utility models and large datasets. Experiments on the transferability of the fair augmented graph open new issues for future recommendation studies. Source code: this https URL.

[IR-6] Rank and Align: Towards Effective Source-free Graph Domain Adaptation IJCAI2024

链接: https://arxiv.org/abs/2408.12185
作者: Junyu Luo,Zhiping Xiao,Yifan Wang,Xiao Luo,Jingyang Yuan,Wei Ju,Langechuan Liu,Ming Zhang
关键词-EN: achieved impressive performance, Graph neural networks, graph domain adaptation, neural networks, achieved impressive
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
*备注: Published in IJCAI2024

点击查看摘要

Abstract:Graph neural networks (GNNs) have achieved impressive performance in graph domain adaptation. However, extensive source graphs could be unavailable in real-world scenarios due to privacy and storage concerns. To this end, we investigate an underexplored yet practical problem of source-free graph domain adaptation, which transfers knowledge from source models instead of source graphs to a target domain. To solve this problem, we introduce a novel GNN-based approach called Rank and Align (RNA), which ranks graph similarities with spectral seriation for robust semantics learning, and aligns inharmonic graphs with harmonic graphs which are close to the source domain for subgraph extraction. In particular, to overcome label scarcity, we employ the spectral seriation algorithm to infer the robust pairwise rankings, which can guide semantic learning using a similarity learning objective. To depict distribution shifts, we utilize spectral clustering and the silhouette coefficient to detect harmonic graphs, which the source model can easily classify. To reduce potential domain discrepancy, we extract domain-invariant subgraphs from inharmonic graphs by an adversarial edge sampling process, which guides the invariant learning of GNNs. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our proposed RNA.
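Spectral seriation, which RNA uses to infer robust pairwise rankings, is classically done by ordering items along the Fiedler vector of the similarity matrix's graph Laplacian. The sketch below shows that standard recipe, not the paper's full pipeline.

```python
import numpy as np

def spectral_seriation(S):
    """Order items by the Fiedler vector (eigenvector of the second-smallest
    Laplacian eigenvalue) of a symmetric similarity matrix S."""
    d = S.sum(axis=1)
    L = np.diag(d) - S                 # graph Laplacian
    vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    return np.argsort(fiedler)         # seriation order (up to reversal)
```

For points on a line with an exponentially decaying similarity, this recovers the line order (or its reverse, since the eigenvector's sign is arbitrary).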

[IR-7] Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments

链接: https://arxiv.org/abs/2408.12173
作者: Maciej Besta,Robert Gerstenberger,Patrick Iff,Pournima Sonawane,Juan Gómez Luna,Raghavendra Kanakagiri,Rui Min,Onur Mutlu,Torsten Hoefler,Raja Appuswamy,Aidan O Mahony
关键词-EN: Semantic Web, achieved significant attention, recent years, search engines, attention in recent
类目: Information Retrieval (cs.IR); Performance (cs.PF)
*备注:

点击查看摘要

Abstract:Knowledge graphs (KGs) have achieved significant attention in recent years, particularly in the area of the Semantic Web as well as gaining popularity in other application domains such as data mining and search engines. Simultaneously, there has been enormous progress in the development of different types of heterogeneous hardware, impacting the way KGs are processed. The aim of this paper is to provide a systematic literature review of knowledge graph hardware acceleration. For this, we present a classification of the primary areas in knowledge graph technology that harnesses different hardware units for accelerating certain knowledge graph functionalities. We then extensively describe respective works, focusing on how KG related schemes harness modern hardware accelerators. Based on our review, we identify various research gaps and future exploratory directions that are anticipated to be of significant value both for academics and industry practitioners.

[IR-8] DimeRec: A Unified Framework for Enhanced Sequential Recommendation via Generative Diffusion Models

链接: https://arxiv.org/abs/2408.12153
作者: Wuchao Li,Rui Huang,Haijun Zhao,Chi Liu,Kai Zheng,Qi Liu,Na Mou,Guorui Zhou,Defu Lian,Yang Song,Wentian Bao,Enyun Yu,Wenwu Ou
关键词-EN: user preferences based, Sequential Recommendation, plays a pivotal, pivotal role, role in recommender
类目: Information Retrieval (cs.IR); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Sequential Recommendation (SR) plays a pivotal role in recommender systems by tailoring recommendations to user preferences based on their non-stationary historical interactions. Achieving high-quality performance in SR requires attention to both item representation and diversity. However, designing an SR method that simultaneously optimizes these merits remains a long-standing challenge. In this study, we address this issue by integrating recent generative Diffusion Models (DM) into SR. DM has demonstrated utility in representation learning and diverse image generation. Nevertheless, a straightforward combination of SR and DM leads to sub-optimal performance due to discrepancies in learning objectives (recommendation vs. noise reconstruction) and the respective learning spaces (non-stationary vs. stationary). To overcome this, we propose a novel framework called DimeRec (Diffusion with multi-interest enhanced Recommender). DimeRec synergistically combines a guidance extraction module (GEM) and a generative diffusion aggregation module (DAM). The GEM extracts crucial stationary guidance signals from the user’s non-stationary interaction history, while the DAM employs a generative diffusion process conditioned on GEM’s outputs to reconstruct and generate consistent recommendations. Our numerical experiments demonstrate that DimeRec significantly outperforms established baseline methods across three publicly available datasets. Furthermore, we have successfully deployed DimeRec on a large-scale short video recommendation platform, serving hundreds of millions of users. Live A/B testing confirms that our method improves both users’ time spent and result diversification.
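The forward-noising and guidance-conditioned denoising that DimeRec builds on can be sketched in a few lines. This is a toy stand-in, not the paper's model: the guidance vector, item embedding, and the blending "denoiser" are invented for illustration, and a real DAM would be a learned network rather than a fixed interpolation.

```python
import math

def forward_noise(x0, eps, alpha_bar):
    """q(x_t | x_0): interpolate the clean item embedding toward noise
    according to the cumulative schedule alpha_bar in [0, 1]."""
    return [math.sqrt(alpha_bar) * a + math.sqrt(1 - alpha_bar) * e
            for a, e in zip(x0, eps)]

def denoise_step(xt, guidance, alpha_bar, w=0.5):
    """Toy reverse step: a stand-in denoiser that predicts x0 as a blend
    of the rescaled noisy state and the guidance signal (the role GEM's
    stationary interest vector plays in conditioning the DAM)."""
    return [w * g + (1 - w) * x / math.sqrt(alpha_bar)
            for x, g in zip(xt, guidance)]

# A user's stationary "interest" vector, as a GEM-like module might emit.
guidance = [1.0, 0.0, -1.0]
x0 = [0.9, 0.1, -0.8]        # target item embedding (hypothetical)
eps = [0.3, -0.5, 0.2]       # fixed "Gaussian" noise for reproducibility
xt = forward_noise(x0, eps, alpha_bar=0.5)
x0_hat = denoise_step(xt, guidance, alpha_bar=0.5)
```

With `alpha_bar = 1` the forward step is the identity, and with `w = 1` the toy denoiser returns the guidance exactly; the interesting regime is in between, where the reconstruction is pulled toward the user's stationary interests.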

[IR-9] Behavior Pattern Mining-based Multi-Behavior Recommendation

链接: https://arxiv.org/abs/2408.12152
作者: Haojie Li,Zhiyong Cheng,Xu Yu,Jinhuan Liu,Guanfeng Liu,Junwei Du
关键词-EN: systems enhance effectiveness, leveraging auxiliary behaviors, sparse target behaviors, recommendation systems enhance, Multi-behavior recommendation systems
类目: Information Retrieval (cs.IR)
*备注:

点击查看摘要

Abstract:Multi-behavior recommendation systems enhance effectiveness by leveraging auxiliary behaviors (such as page views and favorites) to address the limitations of traditional models that depend solely on sparse target behaviors like purchases. Existing approaches to multi-behavior recommendations typically follow one of two strategies: some derive initial node representations from individual behavior subgraphs before integrating them for a comprehensive profile, while others interpret multi-behavior data as a heterogeneous graph, applying graph neural networks to achieve a unified node representation. However, these methods do not adequately explore the intricate patterns of behavior among users and items. To bridge this gap, we introduce a novel algorithm called Behavior Pattern mining-based Multi-behavior Recommendation (BPMR). Our method extensively investigates the diverse interaction patterns between users and items, utilizing these patterns as features for making recommendations. We employ a Bayesian approach to streamline the recommendation process, effectively circumventing the challenges posed by graph neural network algorithms, such as the inability to accurately capture user preferences due to over-smoothing. Our experimental evaluation on three real-world datasets demonstrates that BPMR significantly outperforms existing state-of-the-art algorithms, showing an average improvement of 268.29% in Recall@10 and 248.02% in NDCG@10 metrics. The code of our BPMR is openly accessible for use and further research at this https URL.
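The pattern-mining-plus-Bayesian idea can be illustrated on a toy interaction log: count how often each auxiliary-behavior pattern precedes the target behavior, then score with Laplace smoothing. The log and the smoothing constant are hypothetical; BPMR's actual pattern definitions are considerably richer.

```python
from collections import Counter

# Hypothetical interaction log: (user, item, ordered tuple of behaviors).
logs = [
    ("u1", "i1", ("view", "favorite", "purchase")),
    ("u1", "i2", ("view",)),
    ("u2", "i1", ("view", "favorite", "purchase")),
    ("u2", "i3", ("view", "favorite")),
    ("u3", "i2", ("view", "purchase")),
]

# Mine pattern statistics: how often does each auxiliary-behavior
# pattern lead to the target behavior ("purchase")?
pattern_total, pattern_hit = Counter(), Counter()
for _, _, behaviors in logs:
    pattern = tuple(b for b in behaviors if b != "purchase")
    pattern_total[pattern] += 1
    if "purchase" in behaviors:
        pattern_hit[pattern] += 1

def purchase_prob(pattern, alpha=1.0):
    """Laplace-smoothed P(purchase | observed auxiliary pattern)."""
    return (pattern_hit[pattern] + alpha) / (pattern_total[pattern] + 2 * alpha)

p_fav = purchase_prob(("view", "favorite"))   # view+favorite pattern
p_view = purchase_prob(("view",))             # view-only pattern
```

On this log, the view-then-favorite pattern converts to purchase more often than view alone, so items whose current pattern is `("view", "favorite")` would be ranked higher, without any graph neural network (and hence without its over-smoothing issues).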

[IR-10] Reasoning and Tools for Human-Level Forecasting

链接: https://arxiv.org/abs/2408.12036
作者: Elvis Hsieh,Preston Fu,Jonathan Chen
关键词-EN: largely successful due, memorize large amounts, training data, Language models, trained on web-scale
类目: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
*备注:

点击查看摘要

Abstract:Language models (LMs) trained on web-scale datasets are largely successful due to their ability to memorize large amounts of training data, even if only present in a few examples. These capabilities are often desirable in evaluation on tasks such as question answering but raise questions about whether these models can exhibit genuine reasoning or succeed only at mimicking patterns from the training data. This distinction is particularly salient in forecasting tasks, where the answer is not present in the training data, and the model must reason to make logical deductions. We present Reasoning and Tools for Forecasting (RTF), a framework of reasoning-and-acting (ReAct) agents that can dynamically retrieve updated information and run numerical simulation with equipped tools. We evaluate our model with questions from competitive forecasting platforms and demonstrate that our method is competitive with and can outperform human predictions. This suggests that LMs, with the right tools, can indeed think and adapt like humans, offering valuable insights for real-world decision-making.
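The ReAct-style loop that RTF agents run (think, act with a tool, observe, repeat) can be sketched with stub tools and a scripted policy standing in for the language model. The tool names, their outputs, and the policy's decisions are all invented for illustration; a real agent would call live search APIs and numerical simulators.

```python
# Hypothetical tools; a real RTF agent would wrap retrieval and simulation.
tools = {
    "search": lambda q: "Polls show candidate A at 52%.",
    "simulate": lambda q: "10,000 Monte Carlo runs: A wins 61% of the time.",
}

def react_loop(question, policy, max_steps=5):
    """Thought -> Action -> Observation loop; stops on a 'finish' action."""
    trace = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, trace)
        if action == "finish":
            return arg, trace
        observation = tools[action](arg)
        trace.append((thought, action, observation))
    return None, trace

def scripted_policy(question, trace):
    # Stub standing in for an LM: retrieve, then simulate, then answer.
    if not trace:
        return ("Need current data.", "search", question)
    if len(trace) == 1:
        return ("Quantify uncertainty.", "simulate", question)
    return ("Enough evidence.", "finish", "P(A wins) ~ 0.61")

answer, trace = react_loop("Will candidate A win?", scripted_policy)
```

The point of the framing is that the forecast is assembled from freshly retrieved observations and tool outputs rather than memorized training data, which is exactly what the shared loop structure enforces.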

[IR-11] Does It Look Sequential? An Analysis of Datasets for Evaluation of Sequential Recommendations

链接: https://arxiv.org/abs/2408.12008
作者: Anton Klenitskiy,Anna Volodkevich,Anton Pembek,Alexey Vasilev
关键词-EN: Sequential recommender systems, Sequential, important and demanded, demanded area, recommender systems
类目: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
*备注:

点击查看摘要

Abstract:Sequential recommender systems are an important and demanded area of research. Such systems aim to use the order of interactions in a user’s history to predict future interactions. The premise is that the order of interactions and sequential patterns play an essential role. Therefore, it is crucial to use datasets that exhibit a sequential structure to evaluate sequential recommenders properly. We apply several methods based on the random shuffling of the user’s sequence of interactions to assess the strength of sequential structure across 15 datasets, frequently used for sequential recommender systems evaluation in recent research papers presented at top-tier conferences. As shuffling explicitly breaks sequential dependencies inherent in datasets, we estimate the strength of sequential patterns by comparing metrics for shuffled and original versions of the dataset. Our findings show that several popular datasets have a rather weak sequential structure.
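The shuffling protocol can be reproduced in miniature: fit a trivial next-item model, evaluate it on original and per-user-shuffled sequences, and read the gap as sequential-structure strength. The synthetic cyclic data and the "most frequent successor" model below are illustrative stand-ins for the paper's datasets and recommender metrics.

```python
import random
from collections import Counter, defaultdict

random.seed(42)

# Synthetic strongly-sequential data: each user cycles a -> b -> c -> a ...
items = ["a", "b", "c"]
sequences = [[items[(s + t) % 3] for t in range(12)] for s in range(30)]

def next_item_accuracy(seqs):
    """Fit a first-order 'most frequent successor' model on the first half
    of each sequence and test next-item accuracy on the second half."""
    succ = defaultdict(Counter)
    for seq in seqs:
        half = len(seq) // 2
        for cur, nxt in zip(seq[:half], seq[1:half + 1]):
            succ[cur][nxt] += 1
    hits = total = 0
    for seq in seqs:
        half = len(seq) // 2
        for cur, nxt in zip(seq[half:-1], seq[half + 1:]):
            if succ[cur] and succ[cur].most_common(1)[0][0] == nxt:
                hits += 1
            total += 1
    return hits / total

original = next_item_accuracy(sequences)
shuffled = next_item_accuracy(
    [random.sample(seq, len(seq)) for seq in sequences])
strength = original - shuffled  # large gap => strong sequential structure
```

On this deliberately cyclic data the gap is large; the paper's finding is that on several popular benchmark datasets the analogous gap is surprisingly small, i.e. shuffling barely hurts.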

[IR-12] What are the limits of cross-lingual dense passage retrieval for low-resource languages?

链接: https://arxiv.org/abs/2408.11942
作者: Jie Wu,Zhaochun Ren,Suzan Verberne
关键词-EN: Dense Passage Retriever, multi-lingual Dense Passage, multi-lingual Dense, Passage Retriever, Dense Passage
类目: Information Retrieval (cs.IR)
*备注:

点击查看摘要

Abstract:In this paper, we analyze the capabilities of the multi-lingual Dense Passage Retriever (mDPR) for extremely low-resource languages. In the Cross-lingual Open-Retrieval Answer Generation (CORA) pipeline, mDPR achieves success on multilingual open QA benchmarks across 26 languages, of which 9 were unseen during training. These results are promising for Question Answering (QA) for low-resource languages. We focus on two extremely low-resource languages for which mDPR performs poorly: Amharic and Khmer. We collect and curate datasets to train mDPR models using Translation Language Modeling (TLM) and question–passage alignment. We also investigate the effect of our extension on the language distribution in the retrieval results. Our results on the MKQA and AmQA datasets show that language alignment brings improvements to mDPR for the low-resource languages, but the improvements are modest and the results remain low. We conclude that fulfilling CORA’s promise to enable multilingual open QA in extremely low-resource settings is challenging because the model, the data, and the evaluation approach are intertwined. Hence, all three need attention in follow-up work. We release our code for reproducibility and future work: https://anonymous.4open.science/r/Question-Answering-for-Low-Resource-Languages-B13C/
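Dense passage retrieval of the kind mDPR performs reduces to scoring question and passage embeddings from a shared encoder. The sketch below substitutes a normalised bag-of-words encoder for the multilingual transformer, and the passages are invented examples; it shows only the retrieval scoring, not the TLM training or question–passage alignment the paper adds.

```python
import math
from collections import Counter

def embed(text, vocab):
    """Stand-in for a dual encoder: an L2-normalised bag-of-words vector.
    mDPR uses a shared multilingual transformer instead."""
    counts = Counter(text.lower().split())
    vec = [counts[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(question, passages, vocab):
    """Rank passages by inner product with the question embedding."""
    q = embed(question, vocab)
    scored = [(sum(a * b for a, b in zip(q, embed(p, vocab))), p)
              for p in passages]
    return max(scored)[1]

passages = [
    "Amharic is the working language of Ethiopia",
    "Khmer is the official language of Cambodia",
]
vocab = sorted({w for p in passages for w in p.lower().split()})
best = retrieve("what is the official language of cambodia", passages, vocab)
```

For a genuinely low-resource language, the hard part is that the shared encoder has seen little or none of it during pretraining, so the question and passage vectors land in poorly aligned regions, which is the failure mode the paper's alignment training targets.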

[IR-13] Ancient Wisdom Modern Tools: Exploring Retrieval-Augmented LLMs for Ancient Indian Philosophy ACL ACL2024

链接: https://arxiv.org/abs/2408.11903
作者: Priyanka Mandikal
关键词-EN: revolutionized the landscape, landscape of information, knowledge dissemination, RAG model, RAG
类目: Computation and Language (cs.CL); Computers and Society (cs.CY); Information Retrieval (cs.IR)
*备注: Best paper at the Workshop on Machine Learning for Ancient Languages @ ACL 2024. Proceedings of the 1st Machine Learning for Ancient Languages Workshop, 2024.ml4al-1.23, Association for Computational Linguistics (ACL) 2024. Dataset, code, and evaluation is available at: this https URL

点击查看摘要

Abstract:LLMs have revolutionized the landscape of information retrieval and knowledge dissemination. However, their application in specialized areas is often hindered by factual inaccuracies and hallucinations, especially in long-tail knowledge distributions. We explore the potential of retrieval-augmented generation (RAG) models for long-form question answering (LFQA) in a specialized knowledge domain. We present VedantaNY-10M, a dataset curated from extensive public discourses on the ancient Indian philosophy of Advaita Vedanta. We develop and benchmark a RAG model against a standard, non-RAG LLM, focusing on transcription, retrieval, and generation performance. Human evaluations by computational linguists and domain experts show that the RAG model significantly outperforms the standard model in producing factual and comprehensive responses with fewer hallucinations. In addition, a keyword-based hybrid retriever that emphasizes unique low-frequency terms further improves results. Our study provides insights into effectively integrating modern large language models with ancient knowledge systems. Project page with dataset and code: this https URL
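The keyword-based hybrid retriever described above can be sketched as a blend of a dense score with IDF-weighted keyword overlap, so that unique low-frequency terms dominate the keyword component. The corpus, the pretend dense scores, and the 0.5 mixing weight are all hypothetical, not taken from the paper.

```python
import math
from collections import Counter

docs = [
    "the self is identical with brahman in advaita vedanta",
    "the text discusses meditation practice and daily discipline",
    "neti neti is a method of negation used in advaita vedanta",
]

# IDF emphasises unique low-frequency terms (e.g. 'neti') over common ones.
N = len(docs)
df = Counter(w for d in docs for w in set(d.split()))
idf = {w: math.log(N / df[w]) for w in df}

def keyword_score(query, doc):
    """IDF-weighted overlap between query terms and document terms."""
    return sum(idf.get(w, 0.0) for w in set(query.split()) & set(doc.split()))

def hybrid_score(query, doc, dense, alpha=0.5):
    """Blend a (hypothetical) dense-retriever score with the keyword score."""
    return alpha * dense + (1 - alpha) * keyword_score(query, doc)

query = "what does neti neti mean"
dense = [0.42, 0.40, 0.41]  # pretend embedding similarities, nearly tied
ranked = sorted(range(N),
                key=lambda i: hybrid_score(query, docs[i], dense[i]),
                reverse=True)
```

Because the dense scores are nearly tied, the rare term "neti" decides the ranking, which mirrors the paper's observation that low-frequency domain terms are exactly where a dense retriever alone is weakest.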

[IR-14] Hierarchical Retrieval-Augmented Generation Model with Rethink for Multi-hop Question Answering

链接: https://arxiv.org/abs/2408.11875
作者: Xiaoming Zhang,Ming Wang,Xiaocui Yang,Daling Wang,Shi Feng,Yifei Zhang
关键词-EN: Multi-hop Question Answering, resolve intricate questions, Multi-hop Question, Question Answering, necessitates complex reasoning
类目: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
*备注: under review

点击查看摘要

Abstract:Multi-hop Question Answering (QA) necessitates complex reasoning by integrating multiple pieces of information to resolve intricate questions. However, existing QA systems encounter challenges such as outdated information, context window length limitations, and an accuracy-quantity trade-off. To address these issues, we propose a novel framework, the Hierarchical Retrieval-Augmented Generation Model with Rethink (HiRAG), comprising five key modules: Decomposer, Definer, Retriever, Filter, and Summarizer. We introduce a new hierarchical retrieval strategy that incorporates both sparse retrieval at the document level and dense retrieval at the chunk level, effectively integrating their strengths. Additionally, we propose a single-candidate retrieval method to mitigate the limitations of multi-candidate retrieval. We also construct two new corpora, Indexed Wikicorpus and Profile Wikicorpus, to address the issues of outdated and insufficient knowledge. Our experimental results on four datasets demonstrate that HiRAG outperforms state-of-the-art models across most metrics, and our Indexed Wikicorpus is effective. The code for HiRAG is available at this https URL
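HiRAG's hierarchical retrieval (sparse at the document level, then dense at the chunk level) can be sketched as a two-stage lookup. The tiny corpus is invented, and normalised term overlap stands in for both the sparse scorer and the dense chunk encoder; the real system uses proper sparse and dense retrievers.

```python
from collections import Counter

# Hypothetical corpus: documents split into chunks.
corpus = {
    "doc_einstein": ["Einstein was born in Ulm in 1879.",
                     "He developed the theory of relativity."],
    "doc_curie": ["Marie Curie was born in Warsaw.",
                  "She won two Nobel Prizes."],
}

def sparse_doc_score(query, chunks):
    """Stage 1: document-level sparse retrieval via raw term overlap."""
    doc_terms = Counter(w for c in chunks for w in c.lower().split())
    return sum(doc_terms[w] for w in query.lower().split())

def dense_chunk_score(query, chunk):
    """Stage 2 stand-in: a real system would use embedding similarity;
    here, cosine-style normalised term overlap at chunk granularity."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / (len(q) ** 0.5 * len(c) ** 0.5)

def hierarchical_retrieve(query, corpus):
    """Pick the best document sparsely, then the best chunk densely."""
    doc = max(corpus, key=lambda d: sparse_doc_score(query, corpus[d]))
    return max(corpus[doc], key=lambda c: dense_chunk_score(query, c))

chunk = hierarchical_retrieve("where was Einstein born", corpus)
```

The two stages play to different strengths: the cheap sparse pass narrows the candidate set by exact term match, and the finer-grained pass then localises the answer-bearing chunk inside the chosen document.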

附件下载

点击下载今日全部论文列表