This post lists the latest papers fetched daily from the arXiv website, grouped into four broad areas: NLP, CV, ML, and AI. It is updated automatically at around 11:00 a.m. every day.

Note: if you would like to receive the daily paper list by email, please leave your email address in the comments; emails are likewise sent automatically at around 11:00 a.m.

Contents

Overview (2022-05-17)

451 new papers today, including:

  • 61 in natural language processing (NLP: cs.CL)
  • 82 in computer vision (CV: cs.CV)
  • 126 in machine learning (ML: cs.LG)
  • 16 in artificial intelligence (AI: cs.AI)
  • 166 on other topics

Natural Language Processing

NLP-0-Title: FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization

Link: https://arxiv.org/abs/2205.07830
Authors: David Wan, Mohit Bansal
Comments: NAACL 2022 (19 pages)

Abstract: We present FactPEGASUS, an abstractive summarization model that addresses the problem of factuality during pre-training and fine-tuning: (1) We augment the sentence selection strategy of PEGASUS’s (Zhang et al., 2020) pre-training objective to create pseudo-summaries that are both important and factual; (2) We introduce three complementary components for fine-tuning. The corrector removes hallucinations present in the reference summary, the contrastor uses contrastive learning to better differentiate nonfactual summaries from factual ones, and the connector bridges the gap between the pre-training and fine-tuning for better transfer of knowledge. Experiments on three downstream tasks demonstrate that FactPEGASUS substantially improves factuality evaluated by multiple automatic metrics and humans. Our thorough analysis suggests that FactPEGASUS is more factual than using the original pre-training objective in zero-shot and few-shot settings, retains factual behavior more robustly than strong baselines, and does not rely entirely on becoming more extractive to improve factuality. Our code and data are publicly available at: this https URL

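The contrastor component described above can be pictured as a standard contrastive (InfoNCE-style) objective that pulls a document representation toward a factual summary and away from hallucinated negatives. The vectors and temperature below are made up for illustration; this is not FactPEGASUS's actual implementation:

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Contrastive loss: -log of the positive's share of total similarity mass."""
    sims = [dot(anchor, positive)] + [dot(anchor, n) for n in negatives]
    exps = [math.exp(s / temperature) for s in sims]
    return -math.log(exps[0] / sum(exps))

anchor = [0.9, 0.1]          # document representation (made up)
factual = [0.8, 0.2]         # factual summary: similar to the document
hallucinated = [[0.1, 0.9]]  # nonfactual summary: dissimilar
loss_good = info_nce(anchor, factual, hallucinated)   # small loss
loss_bad = info_nce(anchor, hallucinated[0], [factual])  # large loss
```

Minimizing this loss drives factual summaries closer to the document than their nonfactual counterparts.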

NLP-1-Title: Natural Language Specifications in Proof Assistants

Link: https://arxiv.org/abs/2205.07811
Authors: Colin S. Gordon, Sergey Matskevich
Comments:

Abstract: Interactive proof assistants are computer programs carefully constructed to check a human-designed proof of a mathematical claim with high confidence in the implementation. However, this only validates truth of a formal claim, which may have been mistranslated from a claim made in natural language. This is especially problematic when using proof assistants to formally verify the correctness of software with respect to a natural language specification. The translation from informal to formal remains a challenging, time-consuming process that is difficult to audit for correctness. This paper argues that it is possible to build support for natural language specifications within existing proof assistants, in a way that complements the principles used to establish trust and auditability in proof assistants themselves.


NLP-2-Title: Referring Expressions with Rational Speech Act Framework: A Probabilistic Approach

Link: https://arxiv.org/abs/2205.07795
Authors: Hieu Le, Taufiq Daryanto, Fabian Zhafransyah, Derry Wijaya, Elizabeth Coppock, Sang Chin
Comments:

Abstract: This paper focuses on a referring expression generation (REG) task in which the aim is to pick out an object in a complex visual scene. One common theoretical approach to this problem is to model the task as a two-agent cooperative scheme in which a 'speaker' agent would generate the expression that best describes a targeted area and a 'listener' agent would identify the target. Several recent REG systems have used deep learning approaches to represent the speaker/listener agents. The Rational Speech Act framework (RSA), a Bayesian approach to pragmatics that can predict human linguistic behavior quite accurately, has been shown to generate high quality and explainable expressions on toy datasets involving simple visual scenes. Its application to large scale problems, however, remains largely unexplored. This paper applies a combination of the probabilistic RSA framework and deep learning approaches to larger datasets involving complex visual scenes in a multi-step process with the aim of generating better-explained expressions. We carry out experiments on the RefCOCO and RefCOCO+ datasets and compare our approach with other end-to-end deep learning approaches as well as a variation of RSA to highlight our key contribution. Experimental results show that while achieving lower accuracy than SOTA deep learning methods, our approach outperforms a similar RSA approach in human comprehension and has an advantage over end-to-end deep learning under a limited-data scenario. Lastly, we provide a detailed analysis on the expression generation process with concrete examples, thus providing a systematic view on error types and deficiencies in the generation process and identifying possible areas for future improvements.

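The RSA recursion the paper builds on can be sketched on a toy scene: a literal listener normalizes a lexicon of literally true utterances, a pragmatic speaker reasons about that listener, and a pragmatic listener inverts the speaker. The scene and lexicon below are invented for illustration and are far simpler than the RefCOCO settings in the paper:

```python
import numpy as np

# Toy referring-expression scene: 3 utterances x 3 objects.
# lexicon[u, o] = 1 if utterance u is literally true of object o.
utterances = ["glasses", "hat", "glasses and hat"]
objects = ["obj1", "obj2", "obj3"]
lexicon = np.array([
    [1.0, 1.0, 0.0],  # "glasses" is true of obj1 and obj2
    [0.0, 1.0, 1.0],  # "hat" is true of obj2 and obj3
    [0.0, 1.0, 0.0],  # "glasses and hat" is true only of obj2
])

def normalize(m, axis):
    return m / m.sum(axis=axis, keepdims=True)

alpha = 1.0  # speaker rationality
# Literal listener L0: P(object | utterance)
L0 = normalize(lexicon, axis=1)
# Pragmatic speaker S1: P(utterance | object), softmax of listener informativity
S1 = normalize(np.exp(alpha * np.log(L0 + 1e-12)), axis=0)
# Pragmatic listener L1: P(object | utterance)
L1 = normalize(S1, axis=1)
```

Hearing "glasses", the literal listener is undecided between obj1 and obj2, but the pragmatic listener favors obj1, since a speaker meaning obj2 had the more informative "glasses and hat" available. This pragmatic strengthening is what RSA contributes over a purely literal model.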

NLP-3-Title: What company do words keep? Revisiting the distributional semantics of J.R. Firth & Zellig Harris

Link: https://arxiv.org/abs/2205.07750
Authors: Mikael Brunila, Jack LaViolette
Comments: Accepted at NAACL 2022 (main track)

Abstract: The power of word embeddings is attributed to the linguistic theory that similar words will appear in similar contexts. This idea is specifically invoked by noting that “you shall know a word by the company it keeps,” a quote from British linguist J.R. Firth who, along with his American colleague Zellig Harris, is often credited with the invention of “distributional semantics.” While both Firth and Harris are cited in all major NLP textbooks and many foundational papers, the content and differences between their theories is seldom discussed. Engaging in a close reading of their work, we discover two distinct and in many ways divergent theories of meaning. One focuses exclusively on the internal workings of linguistic forms, while the other invites us to consider words in new company - not just with other linguistic elements, but also in a broader cultural and situational context. Contrasting these theories from the perspective of current debates in NLP, we discover in Firth a figure who could guide the field towards a more culturally grounded notion of semantics. We consider how an expanded notion of “context” might be modeled in practice through two different strategies: comparative stratification and syntagmatic extension


NLP-4-Title: Strong Equivalence of TAG and CCG

Link: https://arxiv.org/abs/2205.07743
Authors: Andreas Maletti, Lena Katharina Schiffer (Universität Leipzig)
Comments: 30 pages, 6 figures, revised and extended version of paper appearing in Trans. ACL (2021) 9: 707-720

Abstract: Tree-adjoining grammar (TAG) and combinatory categorial grammar (CCG) are two well-established mildly context-sensitive grammar formalisms that are known to have the same expressive power on strings (i.e., generate the same class of string languages). It is demonstrated that their expressive power on trees also essentially coincides. In fact, CCG without lexicon entries for the empty string and only first-order rules of degree at most 2 are sufficient for its full expressive power.


NLP-5-Title: Persian Abstract Meaning Representation

Link: https://arxiv.org/abs/2205.07712
Authors: Reza Takhshid, Razieh Shojaei, Zahra Azin, Mohammad Bahrani
Comments:

Abstract: Abstract Meaning Representation (AMR) is an annotation framework representing the semantic structure of a sentence as a whole. From the beginning, AMR was not intended to act as an interlingua; however, it has made progress towards the idea of designing a universal meaning representation framework. Accordingly, developing AMR annotation guidelines for different languages, based on language divergences, is of significant importance. In this paper, we elaborate on Persian Abstract Meaning Representation (PAMR) annotation specifications, based on which we annotated the Persian translation of “The Little Prince” as the first gold standard for Persian AMR. Moreover, we describe how some Persian-specific syntactic constructions would result in different AMR annotations.


NLP-6-Title: CQR-SQL: Conversational Question Reformulation Enhanced Context-Dependent Text-to-SQL Parsers

Link: https://arxiv.org/abs/2205.07686
Authors: Dongling Xiao, Linzheng Chai, Qian-Wen Zhang, Zhao Yan, Zhoujun Li, Yunbo Cao
Comments: Work in progress. 11 pages, 6 figures

Abstract: Context-dependent text-to-SQL is the task of translating multi-turn questions into database-related SQL queries. Existing methods typically focus on making full use of history context or previously predicted SQL for current SQL parsing, while neglecting to explicitly comprehend the schema and conversational dependency, such as co-reference, ellipsis and user focus change. In this paper, we propose CQR-SQL, which uses auxiliary Conversational Question Reformulation (CQR) learning to explicitly exploit schema and decouple contextual dependency for SQL parsing. Specifically, we first present a schema enhanced recursive CQR method to produce domain-relevant self-contained questions. Secondly, we train CQR-SQL models to map the semantics of multi-turn questions and auxiliary self-contained questions into the same latent space through a schema grounding consistency task and a tree-structured SQL parsing consistency task, which enhances the abilities of SQL parsing through adequate contextual understanding. At the time of writing, our CQR-SQL achieves new state-of-the-art results on two context-dependent text-to-SQL benchmarks, SParC and CoSQL.


NLP-7-Title: A Fast Attention Network for Joint Intent Detection and Slot Filling on Edge Devices

Link: https://arxiv.org/abs/2205.07646
Authors: Liang Huang, Senjie Liang, Feiyang Ye, Nan Gao
Comments: 9 pages, 4 figures

Abstract: Intent detection and slot filling are two main tasks in natural language understanding and play an essential role in task-oriented dialogue systems. The joint learning of both tasks can improve inference accuracy and is popular in recent works. However, most joint models ignore the inference latency and cannot meet the need to deploy dialogue systems at the edge. In this paper, we propose a Fast Attention Network (FAN) for joint intent detection and slot filling tasks, guaranteeing both accuracy and latency. Specifically, we introduce a clean and parameter-refined attention module to enhance the information exchange between intent and slot, improving semantic accuracy by more than 2%. FAN can be implemented on different encoders and delivers more accurate models at every speed level. Our experiments on the Jetson Nano platform show that FAN inferences fifteen utterances per second with a small accuracy drop, showing its effectiveness and efficiency on edge devices.


NLP-8-Title: A Precis of Language Models are not Models of Language

Link: https://arxiv.org/abs/2205.07634
Authors: Csaba Veres
Comments:

Abstract: Natural Language Processing is one of the leading application areas in the current resurgence of Artificial Intelligence, spearheaded by Artificial Neural Networks. We show that despite their many successes at performing linguistic tasks, Large Neural Language Models are ill-suited as comprehensive models of natural language. The wider implication is that, in spite of the often overbearing optimism about AI, modern neural models do not represent a revolution in our understanding of cognition.


NLP-9-Title: Taming Continuous Posteriors for Latent Variational Dialogue Policies

Link: https://arxiv.org/abs/2205.07633
Authors: Marin Vlastelica, Patrick Ernst, Gyuri Szarvas
Comments:

Abstract: Utilizing amortized variational inference for latent-action reinforcement learning (RL) has been shown to be an effective approach in Task-oriented Dialogue (ToD) systems for optimizing dialogue success. Until now, categorical posteriors have been argued to be one of the main drivers of performance. In this work we revisit Gaussian variational posteriors for latent-action RL and show that they can yield even better performance than categoricals. We achieve this by simplifying the training procedure and propose ways to regularize the latent dialogue policy to retain good response coherence. Using continuous latent representations our model achieves state of the art dialogue success rate on the MultiWOZ benchmark, and also compares well to categorical latent methods in response coherence.

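Training a Gaussian variational posterior typically rests on the standard reparameterization trick, sampling z = mu + sigma * eps so gradients can flow through the posterior parameters, with a KL term to a standard normal as regularizer. A generic one-dimensional sketch (not the paper's latent-action policy):

```python
import math
import random

def sample_gaussian(mu, log_sigma, rng=random.Random(0)):
    """Reparameterized sample: z = mu + sigma * eps, eps ~ N(0, 1),
    so mu and log_sigma stay differentiable in an autodiff framework."""
    eps = rng.gauss(0.0, 1.0)
    return mu + math.exp(log_sigma) * eps

def kl_to_standard_normal(mu, log_sigma):
    """KL(N(mu, sigma^2) || N(0, 1)): the usual variational regularizer."""
    sigma2 = math.exp(2 * log_sigma)
    return 0.5 * (sigma2 + mu * mu - 1.0 - 2 * log_sigma)

z = sample_gaussian(mu=0.5, log_sigma=-1.0)
kl = kl_to_standard_normal(0.5, -1.0)
```

The KL term is zero exactly when the posterior equals the prior (mu = 0, log_sigma = 0) and positive otherwise, which is the pressure the paper must balance against response coherence when regularizing the latent dialogue policy.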

NLP-10-Title: Assessing the Limits of the Distributional Hypothesis in Semantic Spaces: Trait-based Relational Knowledge and the Impact of Co-occurrences

Link: https://arxiv.org/abs/2205.07603
Authors: Mark Anderson, Jose Camacho-Collados
Comments: Due to appear in the proceedings of *SEM 2022: The 11th Joint Conference on Lexical and Computational Semantics

Abstract: The increase in performance in NLP due to the prevalence of distributional models and deep learning has brought with it a reciprocal decrease in interpretability. This has spurred a focus on what neural networks learn about natural language with less of a focus on how. Some work has focused on the data used to develop data-driven models, but typically this line of work aims to highlight issues with the data, e.g. highlighting and offsetting harmful biases. This work contributes to the relatively untrodden path of what is required in data for models to capture meaningful representations of natural language. This entails evaluating how well English and Spanish semantic spaces capture a particular type of relational knowledge, namely the traits associated with concepts (e.g. bananas-yellow), and exploring the role of co-occurrences in this context.

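The distributional hypothesis under scrutiny can be made concrete with a toy co-occurrence space: words sharing contexts (here, the trait word "yellow") end up with similar vectors. The corpus below is invented and far smaller than anything the paper evaluates:

```python
import math
from collections import Counter
from itertools import combinations

corpus = [
    "bananas are yellow fruit",
    "lemons are yellow fruit",
    "limes are green fruit",
]

# Count word co-occurrences, using the whole sentence as the context window.
cooc = Counter()
vocab = set()
for sent in corpus:
    words = sent.split()
    vocab.update(words)
    for a, b in combinations(words, 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1
vocab = sorted(vocab)

def vector(word):
    """Co-occurrence count vector over the vocabulary."""
    return [cooc[(word, other)] for other in vocab]

def cosine(u, v):
    num = sum(x * y for x, y in zip(u, v))
    den = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return num / den

sim_banana_lemon = cosine(vector("bananas"), vector("lemons"))
sim_banana_lime = cosine(vector("bananas"), vector("limes"))
```

Because "bananas" and "lemons" share the trait context "yellow", their vectors are closer than "bananas" and "limes", which is exactly the kind of trait-based relational knowledge whose capture the paper evaluates in English and Spanish spaces.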

NLP-11-Title: Heroes, Villains, and Victims, and GPT-3 – Automated Extraction of Character Roles Without Training Data

Link: https://arxiv.org/abs/2205.07557
Authors: Dominik Stammbach, Maria Antoniak, Elliott Ash
Comments:

Abstract: This paper shows how to use large-scale pre-trained language models to extract character roles from narrative texts without training data. Queried with a zero-shot question-answering prompt, GPT-3 can identify the hero, villain, and victim in diverse domains: newspaper articles, movie plot summaries, and political speeches.

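The zero-shot question-answering setup can be sketched as simple prompt construction; each prompt would then be sent to a completion model such as GPT-3. The template below is a plausible guess, not the authors' exact prompt, and no API call is made here:

```python
def role_prompt(narrative: str, role: str) -> str:
    """Build a zero-shot QA prompt asking a model to name a character role."""
    return (
        f"Passage: {narrative}\n\n"
        f"Question: Who is the {role} in this passage? Answer with a name only.\n"
        "Answer:"
    )

story = "The sheriff chased the bandit who had robbed the townsfolk."
prompts = {role: role_prompt(story, role)
           for role in ("hero", "villain", "victim")}
# Each prompt would be sent to a completion endpoint; the returned
# completion is taken directly as the extracted character role.
```

The appeal of the approach is that the same three templates work across domains (news, plot summaries, speeches) with no labeled training data.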

NLP-12-Title: The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues

Link: https://arxiv.org/abs/2205.07540
Authors: Anaïs Tack, Chris Piech
Comments: to be published in the Proceedings of the 15th International Conference on Educational Data Mining; 8 pages, 5 figures, 3 tables

Abstract: How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports on a first attempt at an AI teacher test. We built a solution around the insight that you can run conversational agents in parallel to human teachers in real-world dialogues, simulate how different agents would respond to a student, and compare these counterpart responses in terms of three abilities: speak like a teacher, understand a student, help a student. Our method builds on the reliability of comparative judgments in education and uses a probabilistic model and Bayesian sampling to infer estimates of pedagogical ability. We find that, even though conversational agents (Blender in particular) perform well on conversational uptake, they are quantifiably worse than real teachers on several pedagogical dimensions, especially with regard to helpfulness (Blender: Δ ability = -0.75; GPT-3: Δ ability = -0.93).


NLP-13-Title: Quantitative Discourse Cohesion Analysis of Scientific Scholarly Texts using Multilayer Networks

Link: https://arxiv.org/abs/2205.07532
Authors: Vasudha Bhatnagar, Swagata Duari, S.K. Gupta
Comments: 26 pages, 8 figures, 4 tables

Abstract: Discourse cohesion facilitates text comprehension and helps the reader form a coherent narrative. In this study, we aim to computationally analyze the discourse cohesion in scientific scholarly texts using multilayer network representation and quantify the writing quality of the document. Exploiting the hierarchical structure of scientific scholarly texts, we design section-level and document-level metrics to assess the extent of lexical cohesion in text. We use a publicly available dataset along with a curated set of contrasting examples to validate the proposed metrics by comparing them against select indices computed using existing cohesion analysis tools. We observe that the proposed metrics correlate as expected with the existing cohesion indices. We also present an analytical framework, CHIAA (CHeck It Again, Author), to provide pointers to the author for potential improvements in the manuscript with the help of the section-level and document-level metrics. The proposed CHIAA framework furnishes a clear and precise prescription to the author for improving writing by localizing regions in text with cohesion gaps. We demonstrate the efficacy of CHIAA framework using succinct examples from cohesion-deficient text excerpts in the experimental dataset.

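A minimal stand-in for a lexical cohesion score, averaging content-word overlap between consecutive sentences, hints at what the section-level metrics measure. This is a deliberate simplification; the paper's metrics are computed over multilayer network representations:

```python
def lexical_overlap(s1: str, s2: str) -> float:
    """Jaccard overlap of content words between two sentences."""
    stop = {"the", "a", "an", "and", "of", "to", "is", "in"}
    w1 = {w.lower().strip(".,") for w in s1.split()} - stop
    w2 = {w.lower().strip(".,") for w in s2.split()} - stop
    if not w1 or not w2:
        return 0.0
    return len(w1 & w2) / len(w1 | w2)

def section_cohesion(sentences) -> float:
    """Average overlap of consecutive sentence pairs: a crude section score."""
    pairs = list(zip(sentences, sentences[1:]))
    return sum(lexical_overlap(a, b) for a, b in pairs) / len(pairs)

cohesive = ["The cell divides.", "Cell division produces two cells."]
disjoint = ["The cell divides.", "Stock prices fell sharply."]
```

A low score localizes a cohesion gap between two adjacent sentences, which is the kind of pointer the CHIAA framework surfaces to the author, albeit with far richer section- and document-level evidence.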

NLP-14-Title: Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt

Link: https://arxiv.org/abs/2205.07523
Authors: Xinyin Ma, Xinchao Wang, Gongfan Fang, Yongliang Shen, Weiming Lu
Comments: Accepted by IJCAI2022

Abstract: Data-free knowledge distillation (DFKD) conducts knowledge distillation via eliminating the dependence of original training data, and has recently achieved impressive results in accelerating pre-trained language models. At the heart of DFKD is to reconstruct a synthetic dataset by inverting the parameters of the uncompressed model. Prior DFKD approaches, however, have largely relied on hand-crafted priors of the target data distribution for the reconstruction, which can be inevitably biased and often incompetent to capture the intrinsic distributions. To address this problem, we propose a prompt-based method, termed as PromptDFD, that allows us to take advantage of learned language priors, which effectively harmonizes the synthetic sentences to be semantically and grammatically correct. Specifically, PromptDFD leverages a pre-trained generative model to provide language priors and introduces a reinforced topic prompter to control data synthesis, making the generated samples thematically relevant and semantically plausible, and thus friendly to downstream tasks. As shown in our experiments, the proposed method substantially improves the synthesis quality and achieves considerable improvements on distillation performance. In some cases, PromptDFD even gives rise to results on par with those from the data-driven knowledge distillation with access to the original training data.


NLP-15-Title: Directed Acyclic Transformer for Non-Autoregressive Machine Translation

Link: https://arxiv.org/abs/2205.07459
Authors: Fei Huang, Hao Zhou, Yang Liu, Hang Li, Minlie Huang
Comments: accepted at ICML2022

Abstract: Non-autoregressive Transformers (NATs) significantly reduce the decoding latency by generating all tokens in parallel. However, such independent predictions prevent NATs from capturing the dependencies between the tokens for generating multiple possible translations. In this paper, we propose Directed Acyclic Transfomer (DA-Transformer), which represents the hidden states in a Directed Acyclic Graph (DAG), where each path of the DAG corresponds to a specific translation. The whole DAG simultaneously captures multiple translations and facilitates fast predictions in a non-autoregressive fashion. Experiments on the raw training data of WMT benchmark show that DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average, which is the first NAT model that achieves competitive results with autoregressive Transformers without relying on knowledge distillation.

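Decoding over a DAG of hidden states can be illustrated with a Viterbi-style dynamic program that scores vertex paths by emission and transition log-probabilities. The graph and probabilities below are invented, and DA-Transformer's training objective sums over all paths rather than taking only the best one, so treat this as a sketch of the structure, not the method:

```python
import math

# Toy DAG: vertex -> successors; vertex 0 is the start, vertex 3 the end.
edges = {0: [1, 2], 1: [3], 2: [3], 3: []}
# Log-probability of the token emitted at each vertex (hypothetical values).
emit_logp = {0: -0.1, 1: -0.5, 2: -1.2, 3: -0.2}
# Transition log-probabilities along DAG edges (hypothetical values).
trans_logp = {(0, 1): math.log(0.7), (0, 2): math.log(0.3),
              (1, 3): 0.0, (2, 3): 0.0}

def best_path(start, end, order=(0, 1, 2, 3)):
    """DP over a topological order: highest-scoring vertex path start -> end.
    Each path through the DAG corresponds to one candidate translation."""
    best = {start: (emit_logp[start], [start])}
    for v in order:
        if v not in best:
            continue
        score, path = best[v]
        for w in edges[v]:
            cand = score + trans_logp[(v, w)] + emit_logp[w]
            if w not in best or cand > best[w][0]:
                best[w] = (cand, path + [w])
    return best[end]

score, path = best_path(0, 3)
```

Here the path through vertex 1 wins; in the model, different paths of the same DAG realize different translations, which is how one parallel decoding pass captures multiple candidates.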

NLP-16-Title: Reasoning about Procedures with Natural Language Processing: A Tutorial

Link: https://arxiv.org/abs/2205.07455
Authors: Li Zhang
Comments:

Abstract: This tutorial provides a comprehensive and in-depth view of the research on procedures, primarily in Natural Language Processing. A procedure is a sequence of steps intended to achieve some goal. Understanding procedures in natural language has a long history, with recent breakthroughs made possible by advances in technology. First, we discuss established approaches to collect procedures, by human annotation or extraction from web resources. Then, we examine different angles from which procedures can be reasoned about, as well as ways to represent them. Finally, we enumerate scenarios where procedural knowledge can be applied to the real world.


NLP-17-Title: Miutsu: NTU's TaskBot for the Alexa Prize

Link: https://arxiv.org/abs/2205.07446
Authors: Yen-Ting Lin, Hui-Chi Kuo, Ze-Song Xu, Ssu Chiu, Chieh-Chi Hung, Yi-Cheng Chen, Chao-Wei Huang, Yun-Nung Chen
Comments:

Abstract: This paper introduces Miutsu, National Taiwan University’s Alexa Prize TaskBot, which is designed to assist users in completing tasks requiring multiple steps and decisions in two different domains – home improvement and cooking. We overview our system design and architectural goals, and detail the proposed core elements, including question answering, task retrieval, social chatting, and various conversational modules. A dialogue flow is proposed to provide a robust and engaging conversation when handling complex tasks. We discuss the faced challenges during the competition and potential future work.


NLP-18-Title: What GPT Knows About Who is Who

Link: https://arxiv.org/abs/2205.07407
Authors: Xiaohan Yang, Eduardo Peynetti, Vasco Meerman, Chris Tanner
Comments: Accepted by ACL 2022 Workshop on Insights from Negative Results in NLP

Abstract: Coreference resolution – which is a crucial task for understanding discourse and language at large – has yet to witness widespread benefits from large language models (LLMs). Moreover, coreference resolution systems largely rely on supervised labels, which are highly expensive and difficult to annotate, thus making it ripe for prompt engineering. In this paper, we introduce a QA-based prompt-engineering method and discern generative, pre-trained LLMs’ abilities and limitations toward the task of coreference resolution. Our experiments show that GPT-2 and GPT-Neo can return valid answers, but that their capabilities to identify coreferent mentions are limited and prompt-sensitive, leading to inconsistent results.


NLP-19-Title: Downstream Transformer Generation of Question-Answer Pairs with Preprocessing and Postprocessing Pipelines

Link: https://arxiv.org/abs/2205.07387
Authors: Cheng Zhang, Hao Zhang, Jie Wang
Comments:

Abstract: We present a system called TP3 to perform a downstream task of transformers on generating question-answer pairs (QAPs) from a given article. TP3 first finetunes pretrained transformers on QAP datasets, then uses a preprocessing pipeline to select appropriate answers, feeds the relevant sentences and the answer to the finetuned transformer to generate candidate QAPs, and finally uses a postprocessing pipeline to filter inadequate QAPs. In particular, using pretrained T5 models as transformers and the SQuAD dataset as the fine-tuning dataset, we show that TP3 generates a satisfactory number of QAPs with high quality on the Gaokao-EN dataset.


NLP-20-Title: SeqZero: Few-shot Compositional Semantic Parsing with Sequential Prompts and Zero-shot Models

Link: https://arxiv.org/abs/2205.07381
Authors: Jingfeng Yang, Haoming Jiang, Qingyu Yin, Danqing Zhang, Bing Yin, Diyi Yang
Comments: 12 pages, Findings of NAACL 2022

Abstract: Recent research showed promising results on combining pretrained language models (LMs) with canonical utterance for few-shot semantic parsing. The canonical utterance is often lengthy and complex due to the compositional structure of formal languages. Learning to generate such canonical utterance requires significant amount of data to reach high performance. Fine-tuning with only few-shot samples, the LMs can easily forget pretrained knowledge, overfit spurious biases, and suffer from compositionally out-of-distribution generalization errors. To tackle these issues, we propose a novel few-shot semantic parsing method – SeqZero. SeqZero decomposes the problem into a sequence of sub-problems, which correspond to the sub-clauses of the formal language. Based on the decomposition, the LMs only need to generate short answers using prompts for predicting sub-clauses. Thus, SeqZero avoids generating a long canonical utterance at once. Moreover, SeqZero employs not only a few-shot model but also a zero-shot model to alleviate the overfitting. In particular, SeqZero brings out the merits from both models via ensemble equipped with our proposed constrained rescaling. SeqZero achieves SOTA performance of BART-based models on GeoQuery and EcommerceQuery, which are two few-shot datasets with compositional data split.


NLP-21-Title: Long-term Control for Dialogue Generation: Methods and Evaluation

Link: https://arxiv.org/abs/2205.07352
Authors: Ramya Ramakrishnan, Hashan Buddhika Narangodage, Mauro Schilman, Kilian Q. Weinberger, Ryan McDonald
Comments:

Abstract: Current approaches for controlling dialogue response generation are primarily focused on high-level attributes like style, sentiment, or topic. In this work, we focus on constrained long-term dialogue generation, which involves more fine-grained control and requires a given set of control words to appear in generated responses. This setting requires a model to not only consider the generation of these control words in the immediate context, but also produce utterances that will encourage the generation of the words at some time in the (possibly distant) future. We define the problem of constrained long-term control for dialogue generation, identify gaps in current methods for evaluation, and propose new metrics that better measure long-term control. We also propose a retrieval-augmented method that improves performance of long-term controlled generation via logit modification techniques. We show through experiments on three task-oriented dialogue datasets that our metrics better assess dialogue control relative to current alternatives and that our method outperforms state-of-the-art constrained generation baselines.

摘要:当前控制对话响应生成的方法主要集中在风格、情感或主题等高层属性上。在这项工作中,我们关注受约束的长期对话生成,它涉及更细粒度的控制,要求给定的一组控制词出现在生成的响应中。这种设定要求模型不仅要考虑在当前上下文中生成这些控制词,还要产生能够促使这些词在(可能较远的)未来某个时刻被生成的话语。我们定义了对话生成中受约束长期控制的问题,指出了现有评估方法的不足,并提出了能更好度量长期控制的新指标。我们还提出了一种检索增强方法,通过logit修改技术提升长期受控生成的性能。在三个面向任务的对话数据集上的实验表明,相比现有替代方案,我们的指标能更好地评估对话控制,并且我们的方法优于最先进的受约束生成基线。
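下面用一个极简的Python草图示意这种基于logit修改的控制思路(注意:这只是便于理解的示意,固定加成`bonus`和简单相加的规则均为演示假设,并非论文中检索增强方法的真实实现):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def boost_control_words(logits, control_ids, bonus=3.0):
    # 给尚未出现的控制词的logit加上固定加成,
    # 促使解码器在之后的某一步生成它们
    boosted = list(logits)
    for i in control_ids:
        boosted[i] += bonus
    return softmax(boosted)

# 玩具词表共4个令牌;令牌2是必须出现的控制词
probs = boost_control_words([1.0, 0.5, 0.2, 0.0], control_ids={2})
```

加成后,控制词的采样概率被显著抬高,但其余令牌的相对分布保持不变。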

NLP-22-标题 Transkimmer: Transformer Learns to Layer-wise Skim

链接: https://arxiv.org/abs/2205.07324
作者: Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo
备注: Published as a conference paper at ACL 2022

点击查看摘要

Abstract: Transformer architecture has become the de-facto model for many machine learning tasks from natural language processing and computer vision. As such, improving its computational efficiency becomes paramount. One of the major computational inefficiency of Transformer-based models is that they spend the identical amount of computation throughout all layers. Prior works have proposed to augment the Transformer model with the capability of skimming tokens to improve its computational efficiency. However, they suffer from not having effectual and end-to-end optimization of the discrete skimming predictor. To address the above limitations, we propose the Transkimmer architecture, which learns to identify hidden state tokens that are not required by each layer. The skimmed tokens are then forwarded directly to the final output, thus reducing the computation of the successive layers. The key idea in Transkimmer is to add a parameterized predictor before each layer that learns to make the skimming decision. We also propose to adopt reparameterization trick and add skim loss for the end-to-end training of Transkimmer. Transkimmer achieves 10.97x average speedup on GLUE benchmark compared with vanilla BERT-base baseline with less than 1% accuracy degradation.

摘要:Transformer架构已成为自然语言处理和计算机视觉中许多机器学习任务的事实标准模型,因此提高其计算效率变得至关重要。基于Transformer的模型的一个主要计算低效之处在于,它们在所有层上花费相同数量的计算。已有工作提出为Transformer模型增加略读(skim)令牌的能力以提高计算效率,但这些方法缺乏对离散略读预测器的有效端到端优化。为了解决上述局限,我们提出了Transkimmer架构,它学习识别每一层不需要的隐藏状态令牌;被略读的令牌直接转发到最终输出,从而减少后续层的计算。Transkimmer的核心思想是在每一层之前加入一个参数化的预测器来学习做出略读决策。我们还提出采用重参数化技巧并加入略读损失(skim loss)来对Transkimmer进行端到端训练。与vanilla BERT-base基线相比,Transkimmer在GLUE基准上取得了平均10.97倍的加速,而精度下降不到1%。
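Transkimmer的逐层略读思想可以用一个简化草图示意:每层之前的预测器为每个令牌打分,低于阈值的令牌跳过该层计算、直接转发到输出(论文中的预测器是可学习的,并用重参数化技巧实现端到端训练;这里的硬阈值判定和标量"隐藏状态"只是演示假设):

```python
def skim_layer(hidden_states, skim_scores, layer_fn, threshold=0.5):
    # 得分低于阈值的令牌跳过本层计算,直接复制到输出
    outputs, processed = [], 0
    for h, score in zip(hidden_states, skim_scores):
        if score >= threshold:       # 保留:执行本层计算
            outputs.append(layer_fn(h))
            processed += 1
        else:                        # 略读:原样转发
            outputs.append(h)
    return outputs, processed

# 玩具示例:隐藏状态用标量表示,"层"只是把数值放大10倍
states = [1.0, 2.0, 3.0, 4.0]
scores = [0.9, 0.2, 0.8, 0.1]        # 每个令牌的略读预测得分
out, n_processed = skim_layer(states, scores, lambda h: h * 10)
```

本例中4个令牌只有2个真正经过该层计算,体现了按令牌节省计算量的思路。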

NLP-23-标题 TiBERT: Tibetan Pre-trained Language Model

链接: https://arxiv.org/abs/2205.07303
作者: Yuan Sun, Sisi Liu, Junjie Deng, Xiaobing Zhao
备注:

点击查看摘要

Abstract: The pre-trained language model is trained on large-scale unlabeled text and can achieve state-of-the-art results in many different downstream tasks. However, the current pre-trained language model is mainly concentrated in the Chinese and English fields. For low resource language such as Tibetan, there is lack of a monolingual pre-trained model. To promote the development of Tibetan natural language processing tasks, this paper collects the large-scale training data from Tibetan websites and constructs a vocabulary that can cover 99.95 % of the words in the corpus by using Sentencepiece. Then, we train the Tibetan monolingual pre-trained language model named TiBERT on the data and vocabulary. Finally, we apply TiBERT to the downstream tasks of text classification and question generation, and compare it with classic models and multilingual pre-trained models, the experimental results show that TiBERT can achieve the best performance. Our model is published in this http URL

摘要:预训练语言模型在大规模无标注文本上训练,能够在许多不同的下游任务中取得最先进的结果。然而,目前的预训练语言模型主要集中在中文和英文领域,对于藏语这类低资源语言,还缺乏单语预训练模型。为了促进藏语自然语言处理任务的发展,本文从藏文网站收集了大规模训练数据,并使用SentencePiece构建了一个能覆盖语料库中99.95%单词的词表。然后,我们在该数据和词表上训练了名为TiBERT的藏语单语预训练语言模型。最后,我们将TiBERT应用于文本分类和问题生成这两个下游任务,并与经典模型和多语言预训练模型进行比较,实验结果表明TiBERT能够取得最佳性能。我们的模型发布在此HTTP URL。

NLP-24-标题 Meta Self-Refinement for Robust Learning with Weak Supervision

链接: https://arxiv.org/abs/2205.07290
作者: Dawei Zhu, Xiaoyu Shen, Michael A. Hedderich, Dietrich Klakow
备注:

点击查看摘要

Abstract: Training deep neural networks (DNNs) with weak supervision has been a hot topic as it can significantly reduce the annotation cost. However, labels from weak supervision can be rather noisy and the high capacity of DNNs makes them easy to overfit the noisy labels. Recent methods leverage self-training techniques to train noise-robust models, where a teacher trained on noisy labels is used to teach a student. However, the teacher from such models might fit a substantial amount of noise and produce wrong pseudo-labels with high confidence, leading to error propagation. In this work, we propose Meta Self-Refinement (MSR), a noise-resistant learning framework, to effectively combat noisy labels from weak supervision sources. Instead of purely relying on a fixed teacher trained on noisy labels, we keep updating the teacher to refine its pseudo-labels. At each training step, it performs a meta gradient descent on the current mini-batch to maximize the student performance on a clean validation set. Extensive experimentation on eight NLP benchmarks demonstrates that MSR is robust against noise in all settings and outperforms the state-of-the-art up to 11.4% in accuracy and 9.26% in F1 score.

摘要:在弱监督下训练深度神经网络(DNNs)一直是一个热门话题,因为它可以显著降低标注成本。然而,弱监督给出的标签可能相当嘈杂,而DNNs的高容量使它们容易过拟合这些噪声标签。最近的方法利用自训练技术来训练抗噪模型:用一个在噪声标签上训练的教师模型来指导学生模型。然而,这类模型中的教师可能拟合了大量噪声,并以高置信度产生错误的伪标签,导致错误传播。在这项工作中,我们提出了元自精炼(Meta Self-Refinement, MSR)这一抗噪学习框架,以有效对抗来自弱监督源的噪声标签。我们不再单纯依赖在噪声标签上训练的固定教师,而是不断更新教师以精炼其伪标签。在每个训练步骤中,它在当前小批量上执行元梯度下降,以最大化学生在干净验证集上的表现。在八个NLP基准上的大量实验表明,MSR在所有设置下都对噪声具有鲁棒性,并且准确率最高超出最先进方法11.4%,F1分数最高超出9.26%。

NLP-25-标题 Domain Adaptation in Multilingual and Multi-Domain Monolingual Settings for Complex Word Identification

链接: https://arxiv.org/abs/2205.07283
作者: George-Eduard Zaharia, Răzvan-Alexandru Smădu, Dumitru-Clementin Cercel, Mihai Dascalu
备注: 9 pages, 1 figure, Accepted at ACL 2022 main conference (Long Paper)

点击查看摘要

Abstract: Complex word identification (CWI) is a cornerstone process towards proper text simplification. CWI is highly dependent on context, whereas its difficulty is augmented by the scarcity of available datasets which vary greatly in terms of domains and languages. As such, it becomes increasingly more difficult to develop a robust model that generalizes across a wide array of input examples. In this paper, we propose a novel training technique for the CWI task based on domain adaptation to improve the target character and context representations. This technique addresses the problem of working with multiple domains, inasmuch as it creates a way of smoothing the differences between the explored datasets. Moreover, we also propose a similar auxiliary task, namely text simplification, that can be used to complement lexical complexity prediction. Our model obtains a boost of up to 2.42% in terms of Pearson Correlation Coefficients in contrast to vanilla training techniques, when considering the CompLex from the Lexical Complexity Prediction 2021 dataset. At the same time, we obtain an increase of 3% in Pearson scores, while considering a cross-lingual setup relying on the Complex Word Identification 2018 dataset. In addition, our model yields state-of-the-art results in terms of Mean Absolute Error.

摘要:复杂词识别(CWI)是实现恰当文本简化的基石。CWI高度依赖上下文,而可用数据集的稀缺进一步加大了其难度,这些数据集在领域和语言方面差异很大。因此,开发一个能在广泛输入样例上泛化的鲁棒模型变得越来越困难。在本文中,我们提出了一种基于领域自适应的CWI任务新型训练技术,以改善目标字符和上下文的表示。该技术解决了处理多个领域的问题,因为它提供了一种平滑所探索数据集之间差异的方法。此外,我们还提出了一个类似的辅助任务,即文本简化,可用于补充词汇复杂度预测。在使用Lexical Complexity Prediction 2021数据集中的CompLex时,与普通训练技术相比,我们的模型在Pearson相关系数上最高提升2.42%;同时,在基于Complex Word Identification 2018数据集的跨语言设置下,我们的Pearson分数提高了3%。此外,我们的模型在平均绝对误差方面取得了最先进的结果。

NLP-26-标题 Classifiers are Better Experts for Controllable Text Generation

链接: https://arxiv.org/abs/2205.07276
作者: Askhat Sitdikov, Nikita Balagansky, Daniil Gavrilov, Alexander Markov
备注:

点击查看摘要

Abstract: This paper proposes a simple method for controllable text generation based on weighting logits produced, namely CAIF sampling. Using an arbitrary third-party text classifier, we adjust a small part of a language model’s logits and guide text generation towards or away from classifier prediction. We show that the proposed method significantly outperforms recent PPLM, GeDi, and DExperts on PPL and sentiment accuracy based on the external classifier of generated texts. At the same time, it is also easier to implement and tune, and has significantly fewer restrictions and requirements.

摘要:本文提出了一种基于对生成的logits加权的可控文本生成简单方法,即CAIF采样。借助任意第三方文本分类器,我们只调整语言模型logits中的一小部分,引导文本生成靠近或远离分类器的预测。我们表明,基于生成文本的外部分类器评估,所提方法在PPL和情感准确率上显著优于近期的PPLM、GeDi和DExperts。同时,该方法也更易于实现和调优,并且限制和要求明显更少。
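CAIF采样对logits加权的核心思路可以用如下草图示意(只调整top-k个logits、权重系数`alpha`等细节均为演示假设,并非论文的精确公式):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def caif_adjust(lm_logits, classifier_probs, alpha=5.0, top_k=2):
    # 只调整语言模型logits中得分最高的top_k项,
    # 按属性分类器给出的概率在对数空间对其加权
    order = sorted(range(len(lm_logits)),
                   key=lambda i: lm_logits[i], reverse=True)
    adjusted = list(lm_logits)
    for i in order[:top_k]:
        adjusted[i] = lm_logits[i] + alpha * math.log(classifier_probs[i])
    return softmax(adjusted)

# 玩具词表共4个令牌;分类器偏好令牌1而非令牌0
lm_logits = [2.0, 1.9, 0.1, 0.0]
classifier_probs = [0.05, 0.9, 0.5, 0.5]
probs = caif_adjust(lm_logits, classifier_probs)
```

可以看到,原本得分最高但被分类器判为"不合意"的令牌0被大幅压低,生成被引导向分类器偏好的令牌1。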

NLP-27-标题 Textual Explanations and Critiques in Recommendation Systems

链接: https://arxiv.org/abs/2205.07268
作者: Diego Antognini
备注: Ph.D. Thesis, Ecole Polytechnique F’ed’erale de Lausanne (EPFL). See this https URL for the original version

点击查看摘要

Abstract: Artificial intelligence and machine learning algorithms have become ubiquitous. Although they offer a wide range of benefits, their adoption in decision-critical fields is limited by their lack of interpretability, particularly with textual data. Moreover, with more data available than ever before, it has become increasingly important to explain automated predictions. Generally, users find it difficult to understand the underlying computational processes and interact with the models, especially when the models fail to generate the outcomes or explanations, or both, correctly. This problem highlights the growing need for users to better understand the models’ inner workings and gain control over their actions. This dissertation focuses on two fundamental challenges of addressing this need. The first involves explanation generation: inferring high-quality explanations from text documents in a scalable and data-driven manner. The second challenge consists in making explanations actionable, and we refer to it as critiquing. This dissertation examines two important applications in natural language processing and recommendation tasks. Overall, we demonstrate that interpretability does not come at the cost of reduced performance in two consequential applications. Our framework is applicable to other fields as well. This dissertation presents an effective means of closing the gap between promise and practice in artificial intelligence.

摘要:人工智能和机器学习算法已变得无处不在。尽管它们带来了广泛的好处,但由于缺乏可解释性(尤其是在文本数据上),它们在决策关键领域的应用受到限制。此外,随着可用数据比以往任何时候都多,解释自动化预测变得越来越重要。通常,用户很难理解底层的计算过程并与模型交互,尤其是当模型未能正确生成结果或解释(或两者皆有)时。这个问题凸显了用户日益增长的需求:更好地理解模型的内部运作并获得对其行为的控制。本论文聚焦于满足这一需求的两个基本挑战。第一个是解释生成:以可扩展、数据驱动的方式从文本文档中推断出高质量的解释。第二个挑战是使解释可付诸行动,我们将其称为批评(critiquing)。本论文研究了自然语言处理和推荐任务中的两个重要应用。总体而言,我们证明了在这两个实际应用中,可解释性并不以降低性能为代价。我们的框架同样适用于其他领域。本论文提出了一种弥合人工智能中承诺与实践之间差距的有效手段。

NLP-28-标题 Topic Modelling on Consumer Financial Protection Bureau Data: An Approach Using BERT Based Embeddings

链接: https://arxiv.org/abs/2205.07259
作者: Vasudeva Raju Sangaraju, Bharath Kumar Bolla, Deepak Kumar Nayak, Jyothsna Kh
备注: Accepted at International Conference for Convergence in Technology, 2022

点击查看摘要

Abstract: Customers’ reviews and comments are important for businesses to understand users’ sentiment about the products and services. However, this data needs to be analyzed to assess the sentiment associated with topics/aspects to provide efficient customer assistance. LDA and LSA fail to capture the semantic relationship and are not specific to any domain. In this study, we evaluate BERTopic, a novel method that generates topics using sentence embeddings on Consumer Financial Protection Bureau (CFPB) data. Our work shows that BERTopic is flexible and yet provides meaningful and diverse topics compared to LDA and LSA. Furthermore, domain-specific pre-trained embeddings (FinBERT) yield even better topics. We evaluated the topics on coherence score (c_v) and UMass.

摘要:客户的评论和意见对于企业了解用户对产品和服务的情感非常重要。然而,需要对这些数据进行分析,以评估与主题/方面相关的情感,从而提供高效的客户服务。LDA和LSA无法捕获语义关系,也不针对任何特定领域。在这项研究中,我们在消费者金融保护局(CFPB)数据上评估了BERTopic,这是一种使用句子嵌入生成主题的新方法。我们的工作表明,与LDA和LSA相比,BERTopic既灵活,又能提供有意义且多样的主题。此外,领域特定的预训练嵌入(FinBERT)能产生更好的主题。我们用一致性得分(c_v)和UMass对主题进行了评估。

NLP-29-标题 Not to Overfit or Underfit? A Study of Domain Generalization in Question Answering

链接: https://arxiv.org/abs/2205.07257
作者: Md Arafat Sultan, Avirup Sil, Radu Florian
备注:

点击查看摘要

Abstract: Machine learning models are prone to overfitting their source (training) distributions, which is commonly believed to be why they falter in novel target domains. Here we examine the contrasting view that multi-source domain generalization (DG) is in fact a problem of mitigating source domain underfitting: models not adequately learning the signal in their multi-domain training data. Experiments on a reading comprehension DG benchmark show that as a model gradually learns its source domains better – using known methods such as knowledge distillation from a larger model – its zero-shot out-of-domain accuracy improves at an even faster rate. Improved source domain learning also demonstrates superior generalization over three popular domain-invariant learning methods that aim to counter overfitting.

摘要:机器学习模型容易过拟合其源(训练)分布,这通常被认为是它们在新目标领域中表现不佳的原因。在这里,我们考察一种相反的观点:多源领域泛化(DG)实际上是一个缓解源领域欠拟合的问题,即模型没有充分学习其多领域训练数据中的信号。在一个阅读理解DG基准上的实验表明,随着模型逐渐更好地学习其源领域(使用诸如从更大模型进行知识蒸馏等已知方法),其零样本域外准确率以更快的速度提升。改进的源领域学习还展现出优于三种旨在对抗过拟合的流行领域不变学习方法的泛化能力。

NLP-30-标题 Discovering Latent Concepts Learned in BERT

链接: https://arxiv.org/abs/2205.07237
作者: Fahim Dalvi, Abdul Rafae Khan, Firoj Alam, Nadir Durrani, Jia Xu, Hassan Sajjad
备注: ICLR 2022

点击查看摘要

Abstract: A large number of studies that analyze deep neural network models and their ability to encode various linguistic and non-linguistic concepts provide an interpretation of the inner mechanics of these models. The scope of the analyses is limited to pre-defined concepts that reinforce the traditional linguistic knowledge and do not reflect on how novel concepts are learned by the model. We address this limitation by discovering and analyzing latent concepts learned in neural network models in an unsupervised fashion and provide interpretations from the model’s perspective. In this work, we study: i) what latent concepts exist in the pre-trained BERT model, ii) how the discovered latent concepts align or diverge from classical linguistic hierarchy and iii) how the latent concepts evolve across layers. Our findings show: i) a model learns novel concepts (e.g. animal categories and demographic groups), which do not strictly adhere to any pre-defined categorization (e.g. POS, semantic tags), ii) several latent concepts are based on multiple properties which may include semantics, syntax, and morphology, iii) the lower layers in the model dominate in learning shallow lexical concepts while the higher layers learn semantic relations and iv) the discovered latent concepts highlight potential biases learned in the model. We also release a novel BERT ConceptNet dataset (BCN) consisting of 174 concept labels and 1M annotated instances.

摘要:大量分析深度神经网络模型及其编码各种语言与非语言概念能力的研究,为这些模型的内部机制提供了解释。但这些分析的范围局限于强化传统语言学知识的预定义概念,没有反映模型如何学习新颖概念。我们通过以无监督的方式发现并分析神经网络模型中学到的潜在概念来解决这一局限,并从模型的角度提供解释。在这项工作中,我们研究:i)预训练BERT模型中存在哪些潜在概念;ii)发现的潜在概念与经典语言学层级如何一致或分歧;iii)潜在概念如何跨层演变。我们的发现表明:i)模型学习了新颖概念(如动物类别和人口统计群体),它们并不严格遵循任何预定义的分类(如词性、语义标签);ii)一些潜在概念基于多种属性,可能包括语义、句法和形态;iii)模型的较低层主导浅层词汇概念的学习,而较高层学习语义关系;iv)发现的潜在概念揭示了模型中学到的潜在偏差。我们还发布了一个新颖的BERT ConceptNet数据集(BCN),包含174个概念标签和100万个标注实例。

NLP-31-标题 Mitigating Toxic Degeneration with Empathetic Data: Exploring the Relationship Between Toxicity and Empathy

链接: https://arxiv.org/abs/2205.07233
作者: Allison Lahnala, Charles Welch, Béla Neuendorf, Lucie Flek
备注: Accepted to NAACL 2022

点击查看摘要

Abstract: Large pre-trained neural language models have supported the effectiveness of many NLP tasks, yet are still prone to generating toxic language hindering the safety of their use. Using empathetic data, we improve over recent work on controllable text generation that aims to reduce the toxicity of generated text. We find we are able to dramatically reduce the size of fine-tuning data to 7.5-30k samples while at the same time making significant improvements over state-of-the-art toxicity mitigation of up to 3.4% absolute reduction (26% relative) from the original work on 2.3m samples, by strategically sampling data based on empathy scores. We observe that the degree of improvement is subject to specific communication components of empathy. In particular, the cognitive components of empathy significantly beat the original dataset in almost all experiments, while emotional empathy was tied to less improvement and even underperforming random samples of the original data. This is a particularly implicative insight for NLP work concerning empathy as until recently the research and resources built for it have exclusively considered empathy as an emotional concept.

摘要:大型预训练神经语言模型支撑了许多NLP任务的有效性,但仍然容易生成有毒语言,妨碍其使用安全。利用共情数据,我们改进了近期旨在降低生成文本毒性的可控文本生成工作。我们发现,通过基于共情分数对数据进行策略性采样,可以将微调数据的规模大幅缩减到7.5k-30k样本,同时相比在230万样本上完成的原始工作,在最先进的毒性缓解上取得显著改进,绝对降低最多3.4%(相对降低26%)。我们观察到,改进程度取决于共情的特定沟通成分。特别是,共情的认知成分在几乎所有实验中都显著优于原始数据集,而情感共情带来的改进较小,甚至不如原始数据的随机样本。这对涉及共情的NLP工作是一个尤其有启发性的见解,因为直到最近,相关研究和资源都只将共情视为一种情感概念。

NLP-32-标题 Adaptive Prompt Learning-based Few-Shot Sentiment Analysis

链接: https://arxiv.org/abs/2205.07220
作者: Pengfei Zhang, Tingting Chai, Yongdong Xu
备注:

点击查看摘要

Abstract: In the field of natural language processing, sentiment analysis via deep learning has an excellent performance by using large labeled datasets. Meanwhile, labeled data are insufficient in many sentiment analysis, and obtaining these data is time-consuming and laborious. Prompt learning devotes to resolving the data deficiency by reformulating downstream tasks with the help of prompt. In this way, the appropriate prompt is very important for the performance of the model. This paper proposes an adaptive prompting(AP) construction strategy using seq2seq-attention structure to acquire the semantic information of the input sequence. Then dynamically construct adaptive prompt which can not only improve the quality of the prompt, but also can effectively generalize to other fields by pre-trained prompt which is constructed by existing public labeled data. The experimental results on FewCLUE datasets demonstrate that the proposed method AP can effectively construct appropriate adaptive prompt regardless of the quality of hand-crafted prompt and outperform the state-of-the-art baselines.

摘要:在自然语言处理领域,借助大型标注数据集,基于深度学习的情感分析表现出色。然而,许多情感分析任务的标注数据不足,而获取这些数据既耗时又费力。提示学习致力于通过借助提示重新表述下游任务来解决数据不足的问题。因此,合适的提示对模型性能非常重要。本文提出了一种使用seq2seq-attention结构获取输入序列语义信息的自适应提示(AP)构建策略,进而动态构建自适应提示:这不仅能提高提示的质量,还能通过由现有公开标注数据构建的预训练提示有效泛化到其他领域。在FewCLUE数据集上的实验结果表明,无论手工提示的质量如何,所提出的AP方法都能有效构建合适的自适应提示,并优于最先进的基线。

NLP-33-标题 Fine-tuning Pre-trained Language Models for Few-shot Intent Detection: Supervised Pre-training and Isotropization

链接: https://arxiv.org/abs/2205.07208
作者: Haode Zhang, Haowen Liang, Yuwei Zhang, Liming Zhan, Xiao-Ming Wu, Xiaolei Lu, Albert Y.S. Lam
备注: NAACL 2022

点击查看摘要

Abstract: It is challenging to train a good intent classifier for a task-oriented dialogue system with only a few annotations. Recent studies have shown that fine-tuning pre-trained language models with a small amount of labeled utterances from public benchmarks in a supervised manner is extremely helpful. However, we find that supervised pre-training yields an anisotropic feature space, which may suppress the expressive power of the semantic representations. Inspired by recent research in isotropization, we propose to improve supervised pre-training by regularizing the feature space towards isotropy. We propose two regularizers based on contrastive learning and correlation matrix respectively, and demonstrate their effectiveness through extensive experiments. Our main finding is that it is promising to regularize supervised pre-training with isotropization to further improve the performance of few-shot intent detection. The source code can be found at this https URL.

摘要:仅凭少量标注为面向任务的对话系统训练一个好的意图分类器是很有挑战性的。最近的研究表明,以监督方式用来自公开基准的少量标注话语微调预训练语言模型非常有帮助。然而,我们发现监督预训练会产生各向异性的特征空间,这可能抑制语义表示的表达能力。受各向同性化(isotropization)最新研究的启发,我们建议通过将特征空间朝各向同性方向正则化来改进监督预训练。我们分别提出了基于对比学习和基于相关矩阵的两个正则化器,并通过大量实验证明了它们的有效性。我们的主要发现是:用各向同性化来正则化监督预训练,有望进一步提高少样本意图检测的性能。源代码见此HTTPS URL。
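基于相关矩阵的各向同性正则化器可以粗略示意如下:惩罚特征各维度相关矩阵与单位矩阵之间的Frobenius距离(这只是按摘要描述写出的示意实现,具体损失形式以论文为准):

```python
import math

def correlation_regularizer(features):
    # 计算特征各维度间的相关矩阵,并累加其与单位矩阵之差的平方;
    # 相关矩阵越接近单位阵,特征空间越各向同性,损失越小
    n, d = len(features), len(features[0])
    means = [sum(f[k] for f in features) / n for k in range(d)]
    centered = [[f[k] - means[k] for k in range(d)] for f in features]
    stds = [math.sqrt(sum(c[k] ** 2 for c in centered) / n) + 1e-8
            for k in range(d)]
    loss = 0.0
    for a in range(d):
        for b in range(d):
            corr = sum(c[a] * c[b] for c in centered) / (n * stds[a] * stds[b])
            target = 1.0 if a == b else 0.0
            loss += (corr - target) ** 2
    return loss

# 各维度互不相关(各向同性)的特征损失接近0,强相关(各向异性)的特征损失大
iso = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
aniso = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]
iso_loss = correlation_regularizer(iso)
aniso_loss = correlation_regularizer(aniso)
```

训练时可将该损失乘以一个权重系数后加到分类损失上,作为朝各向同性方向的正则项。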

NLP-34-标题 Hero-Gang Neural Model For Named Entity Recognition

链接: https://arxiv.org/abs/2205.07177
作者: Jinpeng Hu, Yaling Shen, Yang Liu, Xiang Wan, Tsung-Hui Chang
备注: 11 pages, 4 figures, NAACL 2022

点击查看摘要

Abstract: Named entity recognition (NER) is a fundamental and important task in NLP, aiming at identifying named entities (NEs) from free text. Recently, since the multi-head attention mechanism applied in the Transformer model can effectively capture longer contextual information, Transformer-based models have become the mainstream methods and have achieved significant performance in this task. Unfortunately, although these models can capture effective global context information, they are still limited in the local feature and position information extraction, which is critical in NER. In this paper, to address this limitation, we propose a novel Hero-Gang Neural structure (HGN), including the Hero and Gang module, to leverage both global and local information to promote NER. Specifically, the Hero module is composed of a Transformer-based encoder to maintain the advantage of the self-attention mechanism, and the Gang module utilizes a multi-window recurrent module to extract local features and position information under the guidance of the Hero module. Afterward, the proposed multi-window attention effectively combines global information and multiple local features for predicting entity labels. Experimental results on several benchmark datasets demonstrate the effectiveness of our proposed model.

摘要:命名实体识别(NER)是NLP中一项基础而重要的任务,旨在从自由文本中识别命名实体(NEs)。近来,由于Transformer模型中的多头注意力机制能够有效捕获较长的上下文信息,基于Transformer的模型已成为主流方法,并在该任务上取得了显著性能。不幸的是,尽管这些模型能够捕获有效的全局上下文信息,它们在局部特征和位置信息提取方面仍然受限,而这对NER至关重要。为了解决这一局限,本文提出了一种新颖的Hero-Gang神经结构(HGN),包括Hero模块和Gang模块,以同时利用全局和局部信息来提升NER。具体而言,Hero模块由基于Transformer的编码器组成,以保持自注意力机制的优势;Gang模块则在Hero模块的指导下,利用多窗口循环模块提取局部特征和位置信息。随后,所提出的多窗口注意力有效结合全局信息和多个局部特征来预测实体标签。在多个基准数据集上的实验结果证明了我们所提模型的有效性。

NLP-35-标题 From Cognitive to Computational Modeling: Text-based Risky Decision-Making Guided by Fuzzy Trace Theory

链接: https://arxiv.org/abs/2205.07164
作者: Jaron Mar, Jiamou Liu
备注:

点击查看摘要

Abstract: Understanding, modelling and predicting human risky decision-making is challenging due to intrinsic individual differences and irrationality. Fuzzy trace theory (FTT) is a powerful paradigm that explains human decision-making by incorporating gists, i.e., fuzzy representations of information which capture only its quintessential meaning. Inspired by Broniatowski and Reyna’s FTT cognitive model, we propose a computational framework which combines the effects of the underlying semantics and sentiments on text-based decision-making. In particular, we introduce Category-2-Vector to learn categorical gists and categorical sentiments, and demonstrate how our computational model can be optimised to predict risky decision-making in groups and individuals.

摘要:由于内在的个体差异和非理性,理解、建模和预测人类的风险决策具有挑战性。模糊痕迹理论(FTT)是一种强大的范式,它通过引入要义(gist),即只捕获信息核心含义的模糊表示,来解释人类决策。受Broniatowski和Reyna的FTT认知模型启发,我们提出了一个计算框架,将底层语义和情感对基于文本的决策的影响结合起来。特别地,我们引入Category-2-Vector来学习类别要义和类别情感,并展示了如何优化我们的计算模型以预测群体和个人的风险决策。

NLP-36-标题 Exploring Generalizability of Fine-Tuned Models for Fake News Detection

链接: https://arxiv.org/abs/2205.07154
作者: Abhijit Suprem, Calton Pu
备注:

点击查看摘要

Abstract: The Covid-19 pandemic has caused a dramatic and parallel rise in dangerous misinformation, denoted an `infodemic’ by the CDC and WHO. Misinformation tied to the Covid-19 infodemic changes continuously; this can lead to performance degradation of fine-tuned models due to concept drift. Degredation can be mitigated if models generalize well-enough to capture some cyclical aspects of drifted data. In this paper, we explore generalizability of pre-trained and fine-tuned fake news detectors across 9 fake news datasets. We show that existing models often overfit on their training dataset and have poor performance on unseen data. However, on some subsets of unseen data that overlap with training data, models have higher accuracy. Based on this observation, we also present KMeans-Proxy, a fast and effective method based on K-Means clustering for quickly identifying these overlapping subsets of unseen data. KMeans-Proxy improves generalizability on unseen fake news datasets by 0.1-0.2 f1-points across datasets. We present both our generalizability experiments as well as KMeans-Proxy to further research in tackling the fake news problem.

摘要:COVID-19大流行导致危险错误信息急剧且同步地增长,被CDC和WHO称为"信息疫情(infodemic)"。与COVID-19信息疫情相关的错误信息在持续变化,这可能因概念漂移导致微调模型的性能退化。如果模型的泛化能力足以捕获漂移数据的某些周期性特征,这种退化就可以得到缓解。在本文中,我们在9个假新闻数据集上探究了预训练和微调假新闻检测器的泛化能力。我们表明,现有模型经常过拟合其训练数据集,在未见数据上表现较差;然而,在与训练数据重叠的部分未见数据子集上,模型具有更高的准确率。基于这一观察,我们提出了KMeans-Proxy,一种基于K-Means聚类的快速有效方法,用于快速识别这些与训练数据重叠的未见数据子集。KMeans-Proxy在各数据集上将未见假新闻数据集上的泛化性能提升了0.1-0.2个F1点。我们公开了泛化性实验和KMeans-Proxy,以推动解决假新闻问题的进一步研究。
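KMeans-Proxy识别重叠子集的思路可以用一个纯Python草图示意:先对训练数据做K-Means聚类,再把落在某个训练质心一定半径内的未见样本视为"重叠"样本(其中k值、半径阈值以及确定性初始化均为演示假设,并非论文的完整方法):

```python
def dist2(a, b):
    # 平方欧氏距离
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    # 朴素K-Means,确定性初始化(取前k个点)以便演示
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[j].append(p)
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(m[d] for m in members) / len(members)
                                for d in range(len(members[0]))]
    return centroids

def overlapping_subset(train, unseen, k=2, radius=1.0):
    # 与某个训练质心距离在radius以内的未见样本视为"重叠"子集
    centroids = kmeans(train, k)
    return [p for p in unseen
            if min(dist2(p, c) for c in centroids) <= radius ** 2]

# 玩具二维"嵌入":训练数据形成两簇;一个未见样本靠近训练簇,另一个远离
train = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
unseen = [[0.2, 0.1], [9.0, 9.0]]
near = overlapping_subset(train, unseen)
```

在重叠子集上,微调模型的预测通常更可靠;远离训练分布的样本则提示可能发生了概念漂移。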

NLP-37-标题 The VoicePrivacy 2020 Challenge Evaluation Plan

链接: https://arxiv.org/abs/2205.07123
作者: Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco
备注: arXiv admin note: text overlap with arXiv:2203.12468

点击查看摘要

Abstract: The VoicePrivacy Challenge aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges. In this document, we formulate the voice anonymization task selected for the VoicePrivacy 2020 Challenge and describe the datasets used for system development and evaluation. We also present the attack models and the associated objective and subjective evaluation metrics. We introduce two anonymization baselines and report objective evaluation results.

摘要:VoicePrivacy挑战赛旨在通过聚集一个新的社区来定义感兴趣的任务和评估方法,并通过一系列挑战对解决方案进行基准测试,从而推动语音技术隐私保护工具的发展。在本文档中,我们制定了为VoicePrivacy 2020挑战赛选定的语音匿名化任务,并描述了用于系统开发和评估的数据集。我们还介绍了攻击模型以及相关的客观和主观评估指标。我们给出了两个匿名化基线并报告了客观评估结果。

NLP-38-标题 Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation

链接: https://arxiv.org/abs/2205.07100
作者: Gerard Sant, Gerard I. Gállego, Belen Alastruey, Marta R. Costa-Jussà
备注: NAACL-SRW 2022

点击查看摘要

Abstract: Transformer-based models have been achieving state-of-the-art results in several fields of Natural Language Processing. However, its direct application to speech tasks is not trivial. The nature of this sequences carries problems such as long sequence lengths and redundancy between adjacent tokens. Therefore, we believe that regular self-attention mechanism might not be well suited for it. Different approaches have been proposed to overcome these problems, such as the use of efficient attention mechanisms. However, the use of these methods usually comes with a cost, which is a performance reduction caused by information loss. In this study, we present the Multiformer, a Transformer-based model which allows the use of different attention mechanisms on each head. By doing this, the model is able to bias the self-attention towards the extraction of more diverse token interactions, and the information loss is reduced. Finally, we perform an analysis of the head contributions, and we observe that those architectures where all heads relevance is uniformly distributed obtain better results. Our results show that mixing attention patterns along the different heads and layers outperforms our baseline by up to 0.7 BLEU.

摘要:基于Transformer的模型已经在自然语言处理的多个领域取得了最先进的结果。然而,将其直接应用于语音任务并非易事:语音序列的特性带来了诸如序列长度过长、相邻令牌之间冗余等问题。因此,我们认为常规的自注意力机制可能并不适合它。为克服这些问题,人们提出了不同的方法,例如使用高效注意力机制;但这些方法的使用通常伴随着代价,即信息丢失导致的性能下降。在这项研究中,我们提出了Multiformer,一种允许在每个头上使用不同注意力机制的基于Transformer的模型。这样,模型能够使自注意力偏向于提取更多样化的令牌交互,并减少信息丢失。最后,我们对各头的贡献进行了分析,观察到所有头的重要性均匀分布的架构能获得更好的结果。我们的结果表明,在不同头和层上混合注意力模式比我们的基线最多高出0.7 BLEU。

NLP-39-标题 What do Models Learn From Training on More Than Text? Measuring Visual Commonsense Knowledge

链接: https://arxiv.org/abs/2205.07065
作者: Lovisa Hagström, Richard Johansson
备注: Accepted to the ACL Student Research Workshop 2022

点击查看摘要

Abstract: There are limitations in learning language from text alone. Therefore, recent focus has been on developing multimodal models. However, few benchmarks exist that can measure what language models learn about language from multimodal training. We hypothesize that training on a visual modality should improve on the visual commonsense knowledge in language models. Therefore, we introduce two evaluation tasks for measuring visual commonsense knowledge in language models and use them to evaluate different multimodal models and unimodal baselines. Primarily, we find that the visual commonsense knowledge is not significantly different between the multimodal models and unimodal baseline models trained on visual text data.

摘要:仅从文本中学习语言存在局限性,因此近来的研究重点是开发多模态模型。然而,几乎没有基准能够衡量语言模型从多模态训练中学到了哪些语言知识。我们假设在视觉模态上的训练应当改善语言模型中的视觉常识知识。因此,我们引入了两个用于测量语言模型中视觉常识知识的评估任务,并用它们来评估不同的多模态模型和单模态基线。我们的主要发现是:多模态模型与在视觉文本数据上训练的单模态基线模型之间,视觉常识知识没有显著差异。

NLP-40-标题 Naturalistic Causal Probing for Morpho-Syntax

链接: https://arxiv.org/abs/2205.07043
作者: Afra Amini, Tiago Pimentel, Clara Meister, Ryan Cotterell
备注: To appear in TACL 2022. The arXiv version is a pre-MIT Press publication version

点击查看摘要

Abstract: Probing has become a go-to methodology for interpreting and analyzing deep neural models in natural language processing. Yet recently, there has been much debate around the limitations and weaknesses of probes. In this work, we suggest a naturalistic strategy for input-level intervention on real world data in Spanish, which is a language with gender marking. Using our approach, we isolate morpho-syntactic features from confounders in sentences, e.g. topic, which will then allow us to causally probe pre-trained models. We apply this methodology to analyze causal effects of gender and number on contextualized representations extracted from pre-trained models – BERT, RoBERTa and GPT-2. Our experiments suggest that naturalistic intervention can give us stable estimates of causal effects, which varies across different words in a sentence. We further show the utility of our estimator in investigating gender bias in adjectives, and answering counterfactual questions in masked prediction. Our probing experiments highlights the importance of conducting causal probing in determining if a particular property is encoded in representations.

摘要:探测(probing)已成为解释和分析自然语言处理中深度神经模型的常用方法。然而最近,关于探测的局限性和弱点存在许多争论。在这项工作中,我们提出了一种对西班牙语(一种带性别标记的语言)真实世界数据进行输入级干预的自然主义策略。利用我们的方法,我们将词法句法特征与句子中的混杂因素(如主题)分离开来,从而能够对预训练模型进行因果探测。我们应用这一方法,分析性别和数对从预训练模型(BERT、RoBERTa和GPT-2)中提取的上下文化表示的因果影响。我们的实验表明,自然主义干预可以提供稳定的因果效应估计,其大小因句中不同单词而异。我们进一步展示了我们的估计量在研究形容词中的性别偏见以及回答掩码预测中的反事实问题方面的效用。我们的探测实验强调了在确定某一属性是否被编码进表示时进行因果探测的重要性。

NLP-41-标题 Integration of Text and Graph-based Features for Detecting Mental Health Disorders from Voice

链接: https://arxiv.org/abs/2205.07006
作者: Nasser Ghadiri, Rasoul Samani, Fahime Shahrokh
备注:

点击查看摘要

Abstract: With the availability of voice-enabled devices such as smart phones, mental health disorders could be detected and treated earlier, particularly post-pandemic. The current methods involve extracting features directly from audio signals. In this paper, two methods are used to enrich voice analysis for depression detection: graph transformation of voice signals, and natural language processing of the transcript based on representational learning, fused together to produce final class labels. The results of experiments with the DAIC-WOZ dataset suggest that integration of text-based voice classification and learning from low level and graph-based voice signal features can improve the detection of mental disorders like depression.

摘要:借助智能手机等支持语音的设备,心理健康障碍有望被更早地检测和治疗,尤其是在疫情之后。现有方法主要直接从音频信号中提取特征。在本文中,我们使用两种方法来丰富用于抑郁检测的语音分析:语音信号的图变换,以及基于表示学习的转录文本自然语言处理,二者融合后产生最终的类别标签。在DAIC-WOZ数据集上的实验结果表明,整合基于文本的语音分类以及从低层和基于图的语音信号特征中学习,可以改进对抑郁等精神障碍的检测。

NLP-42-标题 Improving Neural Machine Translation of Indigenous Languages with Multilingual Transfer Learning

链接: https://arxiv.org/abs/2205.06993
作者: Wei-Rui Chen, Muhammad Abdul-Mageed
备注:

点击查看摘要

Abstract: Machine translation (MT) involving Indigenous languages, including those possibly endangered, is challenging due to lack of sufficient parallel data. We describe an approach exploiting bilingual and multilingual pretrained MT models in a transfer learning setting to translate from Spanish to ten South American Indigenous languages. Our models set new SOTA on five out of the ten language pairs we consider, even doubling performance on one of these five pairs. Unlike previous SOTA that perform data augmentation to enlarge the train sets, we retain the low-resource setting to test the effectiveness of our models under such a constraint. In spite of the rarity of linguistic information available about the Indigenous languages, we offer a number of quantitative and qualitative analyses (e.g., as to morphology, tokenization, and orthography) to contextualize our results.

摘要:由于缺乏足够的平行数据,涉及土著语言(包括可能濒危的语言)的机器翻译(MT)颇具挑战。我们描述了一种在迁移学习设定下利用双语和多语言预训练 MT 模型的方法,将西班牙语翻译为十种南美土著语言。在我们考察的十个语言对中,我们的模型在其中五个上创造了新的 SOTA,甚至在这五对之一上将性能翻倍。与以往通过数据增强来扩大训练集的 SOTA 不同,我们保留低资源设定,以在这一约束下检验模型的有效性。尽管关于这些土著语言的语言学资料十分稀少,我们仍提供了若干定量和定性分析(例如关于形态、分词和正字法),以便解释我们的结果。

NLP-43-标题 Review-Based Tip Generation for Music Songs

链接: https://arxiv.org/abs/2205.06985
作者: Jingya Zang, Cuiyun Gao, Yupan Chen, Ruifeng Xu, Lanjun Zhou, Xuan Wang
备注:

点击查看摘要

Abstract: Reviews of songs play an important role in online music service platforms. Prior research shows that users can make quicker and more informed decisions when presented with meaningful song reviews. However, reviews of music songs are generally long in length and most of them are non-informative for users. It is difficult for users to efficiently grasp meaningful messages for making decisions. To solve this problem, one practical strategy is to provide tips, i.e., short, concise, empathetic, and self-contained descriptions about songs. Tips are produced from song reviews and should express non-trivial insight about the songs. To the best of our knowledge, no prior studies have explored the tip generation task in music domain. In this paper, we create a dataset named MTips for the task and propose a framework named GenTMS for automatically generating tips from song reviews. The dataset involves 8,003 Chinese tips/non-tips from 128 songs which are distributed in five different song genres. Experimental results show that GenTMS achieves top-10 precision at 85.56%, outperforming the baseline models by at least 3.34%. Besides, to simulate the practical usage of our proposed framework, we also experiment with previously-unseen songs, during which GenTMS also achieves the best performance with top-10 precision at 78.89% on average. The results demonstrate the effectiveness of the proposed framework in tip generation of the music domain.

摘要:歌曲评论在在线音乐服务平台中起着重要作用。先前研究表明,呈现有意义的歌曲评论能帮助用户做出更快、更明智的决策。然而,音乐歌曲的评论通常篇幅很长,且大多数对用户而言信息量不足,用户难以高效地获取有助于决策的有用信息。为解决这一问题,一种实用策略是提供提示(tip),即关于歌曲的简短、简洁、有共情且自成一体的描述。提示由歌曲评论生成,应表达关于歌曲的非平凡见解。据我们所知,此前没有研究探讨过音乐领域的提示生成任务。在本文中,我们为该任务构建了名为 MTips 的数据集,并提出了名为 GenTMS 的框架,用于从歌曲评论中自动生成提示。该数据集包含来自 128 首歌曲的 8,003 条中文提示/非提示,歌曲分布于五种不同的曲风。实验结果表明,GenTMS 的 top-10 精度达到 85.56%,比基线模型至少高出 3.34%。此外,为了模拟所提框架的实际使用情况,我们还在此前未见过的歌曲上进行了实验,其中 GenTMS 同样取得了最佳性能,top-10 精度平均为 78.89%。结果证明了所提框架在音乐领域提示生成中的有效性。

NLP-44-标题 RASAT Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL

链接: https://arxiv.org/abs/2205.06983
作者: Jiexing Qi, Jingyao Tang, Ziwei He, Xiangpeng Wan, Chenghu Zhou, Xinbing Wang, Quanshi Zhang, Zhouhan Lin
备注: 9 pages, first version

点击查看摘要

Abstract: Relational structures such as schema linking and schema encoding have been validated as a key component to qualitatively translating natural language into SQL queries. However, introducing these structural relations comes with prices: they often result in a specialized model structure, which largely prohibits the use of large pretrained models in text-to-SQL. To address this problem, we propose RASAT: a Transformer seq2seq architecture augmented with relation-aware self-attention that could leverage a variety of relational structures while at the meantime being able to effectively inherit the pretrained parameters from the T5 model. Our model is able to incorporate almost all types of existing relations in the literature, and in addition, we propose to introduce co-reference relations for the multi-turn scenario. Experimental results on three widely used text-to-SQL datasets, covering both single-turn and multi-turn scenarios, have shown that RASAT could achieve competitive results in all three benchmarks, achieving state-of-the-art performance in execution accuracy (80.5% EX on Spider, 53.1% IEX on SParC, and 37.5% IEX on CoSQL).

摘要:诸如模式链接(schema linking)和模式编码(schema encoding)之类的关系结构,已被验证为把自然语言高质量地转换为 SQL 查询的关键组成部分。然而,引入这些结构关系是有代价的:它们往往导致专门化的模型结构,从而在很大程度上妨碍了在 text-to-SQL 中使用大型预训练模型。为了解决这个问题,我们提出了 RASAT:一种加入了关系感知自注意力的 Transformer seq2seq 架构,它既能利用多种关系结构,又能有效地继承 T5 模型的预训练参数。我们的模型几乎能够纳入文献中所有类型的既有关系,此外,我们还提议为多轮场景引入共指关系。在三个广泛使用、同时覆盖单轮和多轮场景的 text-to-SQL 数据集上的实验结果表明,RASAT 在所有三个基准上都取得了有竞争力的结果,并在执行准确率上达到最先进水平(Spider 上 80.5% EX,SParC 上 53.1% IEX,CoSQL 上 37.5% IEX)。
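原文未附代码,下面用纯 Python 给出"关系感知自注意力"打分方式的一个假设性简化示意(沿用 Shaw 等人相对注意力的常见形式,在打分时把 token 对之间的关系嵌入加到键向量上;变量维度与取值均为示例,并非 RASAT 的实际实现):

```python
import math

def softmax(xs):
    # 数值稳定的 softmax
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def relation_aware_attention(q, k, v, rel_k):
    """关系感知自注意力的简化示意。
    q, k, v: n 个 d 维向量;rel_k[i][j]: token i 与 j 之间关系的 d 维嵌入,
    打分时加到键向量 k[j] 上,使注意力分布感知结构关系。"""
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            kj = [k[j][t] + rel_k[i][j][t] for t in range(d)]
            scores.append(sum(q[i][t] * kj[t] for t in range(d)) / math.sqrt(d))
        w = softmax(scores)
        out.append([sum(w[j] * v[j][t] for j in range(n)) for t in range(d)])
    return out
```

当 rel_k 全为零时,它退化为普通的缩放点积注意力;给某个 (i, j) 对一个大的关系嵌入,则会把 i 的注意力推向 j,这正是模式链接等关系发挥作用的方式。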

NLP-45-标题 ACCoRD A Multi-Document Approach to Generating Diverse Descriptions of Scientific Concepts

链接: https://arxiv.org/abs/2205.06982
作者: Sonia K. Murthy, Kyle Lo, Daniel King, Chandra Bhagavatula, Bailey Kuehl, Sophie Johnson, Jonathan Borchardt, Daniel S. Weld, Tom Hope, Doug Downey
备注:

点击查看摘要

Abstract: Systems that can automatically define unfamiliar terms hold the promise of improving the accessibility of scientific texts, especially for readers who may lack prerequisite background knowledge. However, current systems assume a single “best” description per concept, which fails to account for the many potentially useful ways a concept can be described. We present ACCoRD, an end-to-end system tackling the novel task of generating sets of descriptions of scientific concepts. Our system takes advantage of the myriad ways a concept is mentioned across the scientific literature to produce distinct, diverse descriptions of target scientific concepts in terms of different reference concepts. To support research on the task, we release an expert-annotated resource, the ACCoRD corpus, which includes 1,275 labeled contexts and 1,787 hand-authored concept descriptions. We conduct a user study demonstrating that (1) users prefer descriptions produced by our end-to-end system, and (2) users prefer multiple descriptions to a single “best” description.

摘要:能够自动定义陌生术语的系统有望改善科学文本的可及性,尤其是对可能缺乏必要背景知识的读者而言。然而,当前系统为每个概念假定单一的"最佳"描述,这无法涵盖描述一个概念的多种潜在有用方式。我们提出 ACCoRD,一个端到端系统,用于处理为科学概念生成描述集合这一新任务。我们的系统利用概念在科学文献中被提及的多种方式,针对不同的参照概念,为目标科学概念生成各不相同的多样化描述。为支持该任务的研究,我们发布了专家标注资源 ACCoRD 语料库,其中包括 1,275 条带标签的上下文和 1,787 条人工撰写的概念描述。我们进行的用户研究表明:(1)用户更喜欢我们端到端系统生成的描述;(2)相比单一的"最佳"描述,用户更喜欢多条描述。

NLP-46-标题 Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing

链接: https://arxiv.org/abs/2205.06963
作者: Heli Qi, Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura
备注: Submitted to INTERSPEECH 2022

点击查看摘要

Abstract: Consistency regularization has recently been applied to semi-supervised sequence-to-sequence (S2S) automatic speech recognition (ASR). This principle encourages an ASR model to output similar predictions for the same input speech with different perturbations. The existing paradigm of semi-supervised S2S ASR utilizes SpecAugment as data augmentation and requires a static teacher model to produce pseudo transcripts for untranscribed speech. However, this paradigm fails to take full advantage of consistency regularization. First, the masking operations of SpecAugment may damage the linguistic contents of the speech, thus influencing the quality of pseudo labels. Second, S2S ASR requires both input speech and prefix tokens to make the next prediction. The static prefix tokens made by the offline teacher model cannot match dynamic pseudo labels during consistency training. In this work, we propose an improved consistency training paradigm of semi-supervised S2S ASR. We utilize speech chain reconstruction as the weak augmentation to generate high-quality pseudo labels. Moreover, we demonstrate that dynamic pseudo transcripts produced by the student ASR model benefit the consistency training. Experiments on LJSpeech and LibriSpeech corpora show that compared to supervised baselines, our improved paradigm achieves a 12.2% CER improvement in the single-speaker setting and 38.6% in the multi-speaker setting.

摘要:一致性正则化最近被应用于半监督的序列到序列(S2S)自动语音识别(ASR)。该原则鼓励 ASR 模型对施加不同扰动的同一输入语音输出相似的预测。现有的半监督 S2S ASR 范式使用 SpecAugment 作为数据增强,并需要一个静态教师模型为未转录语音生成伪转录文本。然而,这一范式未能充分发挥一致性正则化的作用。首先,SpecAugment 的掩蔽操作可能破坏语音中的语言内容,从而影响伪标签的质量。其次,S2S ASR 需要输入语音和前缀 token 共同完成下一步预测,而离线教师模型生成的静态前缀 token 在一致性训练中无法与动态伪标签匹配。在这项工作中,我们提出了一种改进的半监督 S2S ASR 一致性训练范式。我们利用语音链重建作为弱增强来生成高质量的伪标签。此外,我们证明了由学生 ASR 模型产生的动态伪转录文本有益于一致性训练。在 LJSpeech 和 LibriSpeech 语料库上的实验表明,与有监督基线相比,我们改进的范式在单说话人设定下取得了 12.2% 的 CER 改进,在多说话人设定下取得了 38.6% 的改进。

NLP-47-标题 Auto-Select Reading Passages in English Assessment Tests?

链接: https://arxiv.org/abs/2205.06961
作者: Bruce W. Lee, Jason H. Lee
备注: 5 pages, 4 figures

点击查看摘要

Abstract: We show a method to auto-select reading passages in English assessment tests and share some key insights that can be helpful in related fields. In specifics, we prove that finding a similar passage (to a passage that already appeared in the test) can give a suitable passage for test development. In the process, we create a simple database-tagger-filter algorithm and perform a human evaluation. However, 1. the textual features, that we analyzed, lack coverage, and 2. we fail to find meaningful correlations between each feature and suitability score. Lastly, we describe the future developments to improve automated reading passage selection.

摘要:我们展示了一种在英语评估测试中自动选择阅读段落的方法,并分享了一些对相关领域有帮助的关键见解。具体来说,我们证明:找到一篇与测试中已出现段落相似的段落,可以为试题开发提供合适的素材。在此过程中,我们构建了一个简单的"数据库-标注器-过滤器"(database-tagger-filter)算法,并进行了人工评估。不过,其一,我们所分析的文本特征覆盖面不足;其二,我们未能发现各特征与适用性评分之间有意义的相关性。最后,我们描述了改进自动阅读段落选择的未来工作方向。
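论文没有给出"寻找相似段落"的具体实现,下面是一个假设性的最小示意:用词袋余弦相似度从候选库中挑出与已有考题段落最相似的段落(未做词形还原与停用词处理,函数名均为说明而设):

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """基于词袋向量的余弦相似度(简化示意)。"""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_passage(reference, candidates):
    """从候选段落库中选出与参考段落最相似的一篇。"""
    return max(candidates, key=lambda c: bow_cosine(reference, c))
```

实际系统中,相似度计算通常还会叠加可读性等级、题材标签等过滤条件,这正对应论文中"标注器-过滤器"两步。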

NLP-48-标题 Generating Literal and Implied Subquestions to Fact-check Complex Claims

链接: https://arxiv.org/abs/2205.06938
作者: Jifan Chen, Aniruddh Sriram, Eunsol Choi, Greg Durrett
备注:

点击查看摘要

Abstract: Verifying complex political claims is a challenging task, especially when politicians use various tactics to subtly misrepresent the facts. Automatic fact-checking systems fall short here, and their predictions like “half-true” are not very useful in isolation, since we have no idea which parts of the claim are true and which are not. In this work, we focus on decomposing a complex claim into a comprehensive set of yes-no subquestions whose answers influence the veracity of the claim. We present ClaimDecomp, a dataset of decompositions for over 1000 claims. Given a claim and its verification paragraph written by fact-checkers, our trained annotators write subquestions covering both explicit propositions of the original claim and its implicit facets, such as asking about additional political context that changes our view of the claim’s veracity. We study whether state-of-the-art models can generate such subquestions, showing that these models generate reasonable questions to ask, but predicting the comprehensive set of subquestions from the original claim without evidence remains challenging. We further show that these subquestions can help identify relevant evidence to fact-check the full claim and derive the veracity through their answers, suggesting that they can be useful pieces of a fact-checking pipeline.

摘要:核查复杂的政治主张是一项具有挑战性的任务,尤其是当政客使用各种策略巧妙地歪曲事实时。自动事实核查系统在这里力有不逮,它们给出的诸如"半真半假"之类的预测在孤立情况下用处不大,因为我们无从得知主张的哪些部分为真、哪些不是。在这项工作中,我们专注于把一个复杂主张分解成一组全面的是/否子问题,这些子问题的答案会影响该主张的真实性。我们提出了 ClaimDecomp,一个覆盖 1000 多条主张的分解数据集。给定一条主张及事实核查者撰写的核查段落,我们训练有素的标注者撰写的子问题既涵盖原主张的显式命题,也涵盖其隐含方面,例如询问会改变我们对主张真实性判断的额外政治背景。我们研究了最先进的模型能否生成此类子问题,结果表明这些模型能够生成合理的问题,但在没有证据的情况下从原主张预测出全面的子问题集合仍然具有挑战性。我们进一步表明,这些子问题有助于找到核查完整主张所需的相关证据,并通过其答案推导真实性,这说明它们可以成为事实核查流水线中的有用组件。

NLP-49-标题 A Property Induction Framework for Neural Language Models

链接: https://arxiv.org/abs/2205.06910
作者: Kanishka Misra, Julia Taylor Rayz, Allyson Ettinger
备注: CogSci 2022 camera ready version, with hyperref-compatible citations. Code and Supplemental Material can be found in this https URL

点击查看摘要

Abstract: To what extent can experience from language contribute to our conceptual knowledge? Computational explorations of this question have shed light on the ability of powerful neural language models (LMs) – informed solely through text input – to encode and elicit information about concepts and properties. To extend this line of research, we present a framework that uses neural-network language models (LMs) to perform property induction – a task in which humans generalize novel property knowledge (has sesamoid bones) from one or more concepts (robins) to others (sparrows, canaries). Patterns of property induction observed in humans have shed considerable light on the nature and organization of human conceptual knowledge. Inspired by this insight, we use our framework to explore the property inductions of LMs, and find that they show an inductive preference to generalize novel properties on the basis of category membership, suggesting the presence of a taxonomic bias in their representations.

摘要:语言经验能在多大程度上促成我们的概念知识?针对这一问题的计算探索已经揭示:仅以文本输入训练的强大神经语言模型(LM)能够编码并给出关于概念和属性的信息。为延续这条研究路线,我们提出了一个使用神经网络语言模型(LM)执行属性归纳的框架——在该任务中,人类会把新的属性知识(如"有籽骨")从一个或多个概念(知更鸟)泛化到其他概念(麻雀、金丝雀)。在人类身上观察到的属性归纳模式,已经极大地加深了我们对人类概念知识的性质与组织方式的理解。受此启发,我们使用该框架探索了 LM 的属性归纳,并发现它们表现出基于类别成员资格泛化新属性的归纳偏好,这表明其表示中存在分类学偏差。

NLP-50-标题 Developing a Production System for Purpose of Call Detection in Business Phone Conversations

链接: https://arxiv.org/abs/2205.06904
作者: Elena Khasanova, Pooja Hiranandani, Shayna Gardiner, Cheng Chen, Xue-Yong Fu, Simon Corston-Oliver
备注: NAACL 2022

点击查看摘要

Abstract: For agents at a contact centre receiving calls, the most important piece of information is the reason for a given call. An agent cannot provide support on a call if they do not know why a customer is calling. In this paper we describe our implementation of a commercial system to detect Purpose of Call statements in English business call transcripts in real time. We present a detailed analysis of types of Purpose of Call statements and language patterns related to them, discuss an approach to collect rich training data by bootstrapping from a set of rules to a neural model, and describe a hybrid model which consists of a transformer-based classifier and a set of rules by leveraging insights from the analysis of call transcripts. The model achieved 88.6 F1 on average in various types of business calls when tested on real life data and has low inference time. We reflect on the challenges and design decisions when developing and deploying the system.

摘要:对于联络中心接听电话的客服人员而言,最重要的信息就是来电原因。如果客服不知道客户为什么打来电话,就无法在通话中提供支持。在本文中,我们描述了一个商用系统的实现,该系统可实时检测英语商务通话转录文本中的"来电目的"(Purpose of Call)语句。我们对来电目的语句的类型及其相关语言模式进行了详细分析;讨论了一种通过从规则集到神经模型的自举(bootstrapping)来收集丰富训练数据的方法;并描述了一个混合模型,它借助通话转录文本分析中获得的洞见,由基于 Transformer 的分类器和一组规则组成。在真实数据上测试时,该模型在各类商务通话中平均达到 88.6 F1,且推理时间很短。我们还总结了在开发和部署该系统时遇到的挑战和设计决策。

NLP-51-标题 Bootstrapping Text Anonymization Models with Distant Supervision

链接: https://arxiv.org/abs/2205.06895
作者: Anthi Papadopoulou, Pierre Lison, Lilja Øvrelid, Ildikó Pilán
备注:

点击查看摘要

Abstract: We propose a novel method to bootstrap text anonymization models based on distant supervision. Instead of requiring manually labeled training data, the approach relies on a knowledge graph expressing the background information assumed to be publicly available about various individuals. This knowledge graph is employed to automatically annotate text documents including personal data about a subset of those individuals. More precisely, the method determines which text spans ought to be masked in order to guarantee k -anonymity, assuming an adversary with access to both the text documents and the background information expressed in the knowledge graph. The resulting collection of labeled documents is then used as training data to fine-tune a pre-trained language model for text anonymization. We illustrate this approach using a knowledge graph extracted from Wikidata and short biographical texts from Wikipedia. Evaluation results with a RoBERTa-based model and a manually annotated collection of 553 summaries showcase the potential of the approach, but also unveil a number of issues that may arise if the knowledge graph is noisy or incomplete. The results also illustrate that, contrary to most sequence labeling problems, the text anonymization task may admit several alternative solutions.

摘要:我们提出了一种基于远程监督来自举(bootstrap)文本匿名化模型的新方法。该方法不需要人工标注的训练数据,而是依赖一个知识图,用以表达假定可公开获得的关于若干个体的背景信息。该知识图被用来自动标注包含其中部分个体之个人数据的文本文档。更确切地说,假设对手既能访问文本文档、又掌握知识图中表达的背景信息,该方法会确定哪些文本片段应当被遮蔽,以保证 k-匿名性。随后,所得的带标签文档集合被用作训练数据,以微调一个预训练语言模型来执行文本匿名化。我们使用从 Wikidata 提取的知识图和来自 Wikipedia 的简短传记文本演示了这一方法。基于 RoBERTa 模型以及人工标注的 553 篇摘要集合的评估结果展示了该方法的潜力,但同时也揭示了知识图存在噪声或不完整时可能出现的若干问题。结果还表明,与大多数序列标注问题不同,文本匿名化任务可能存在多个可接受的替代解。
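作为概念示意(并非论文的实际标注算法),下面的纯 Python 草图演示了"确定需要遮蔽哪些片段以保证 k-匿名性"这一核心想法:把知识图简化为"个体→属性"的字典,贪心地遮蔽目标个体的属性,直到至少 k 个个体与剩余未遮蔽的属性组合相匹配:

```python
def spans_to_mask(target, kg, k):
    """贪心示意:返回需要在文本中遮蔽的属性名集合。
    kg: {个体: {属性: 值}};当至少 k 个个体与目标个体
    未遮蔽的属性组合一致时,即满足 k-匿名。"""
    attrs = dict(kg[target])
    masked = set()

    def matches(extra=frozenset()):
        hidden = masked | extra
        return sum(
            all(p.get(a) == v for a, v in attrs.items() if a not in hidden)
            for p in kg.values()
        )

    # 每一步遮蔽"能带来最多匹配个体"的那个属性
    while matches() < k and len(masked) < len(attrs):
        best = max(
            (a for a in attrs if a not in masked),
            key=lambda a: matches({a}),
        )
        masked.add(best)
    return masked
```

真实系统面对的是自由文本片段而非整齐的属性表,还需处理共指、别名与噪声,这里仅用于说明"遮蔽决策"的判定逻辑。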

NLP-52-标题 PathologyBERT – Pre-trained Vs. A New Transformer Language Model for Pathology Domain

链接: https://arxiv.org/abs/2205.06885
作者: Thiago Santos, Amara Tariq, Susmita Das, Kavyasree Vayalpati, Geoffrey H. Smith, Hari Trivedi, Imon Banerjee
备注: submitted to “American Medical Informatics Association (AMIA)” 2022 Annual Symposium

点击查看摘要

Abstract: Pathology text mining is a challenging task given the reporting variability and constant new findings in cancer sub-type definitions. However, successful text mining of a large pathology database can play a critical role to advance ‘big data’ cancer research like similarity-based treatment selection, case identification, prognostication, surveillance, clinical trial screening, risk stratification, and many others. While there is a growing interest in developing language models for more specific clinical domains, no pathology-specific language space exist to support the rapid data-mining development in pathology space. In literature, a few approaches fine-tuned general transformer models on specialized corpora while maintaining the original tokenizer, but in fields requiring specialized terminology, these models often fail to perform adequately. We propose PathologyBERT - a pre-trained masked language model which was trained on 347,173 histopathology specimen reports and publicly released in the Huggingface repository. Our comprehensive experiments demonstrate that pre-training of transformer model on pathology corpora yields performance improvements on Natural Language Understanding (NLU) and Breast Cancer Diagnose Classification when compared to nonspecific language models.

摘要:鉴于报告方式的多样性以及癌症亚型定义中不断出现的新发现,病理学文本挖掘是一项具有挑战性的任务。然而,对大型病理数据库的成功文本挖掘可以对推进"大数据"癌症研究起到关键作用,例如基于相似性的治疗选择、病例识别、预后判断、监测、临床试验筛查、风险分层等。尽管为更专门的临床领域开发语言模型的兴趣日益增长,但目前尚无面向病理学的语言空间来支持该领域快速的数据挖掘开发。文献中有若干方法在保留原分词器的同时,在专业语料上微调通用 Transformer 模型,但在需要专门术语的领域,这些模型往往表现不佳。我们提出了 PathologyBERT——一个在 347,173 份组织病理学标本报告上训练的预训练掩码语言模型,并已在 Huggingface 仓库中公开发布。我们的综合实验表明,与非特定语言模型相比,在病理语料上预训练 Transformer 模型能在自然语言理解(NLU)和乳腺癌诊断分类上带来性能提升。

NLP-53-标题 Near-Negative Distinction Giving a Second Life to Human Evaluation Datasets

链接: https://arxiv.org/abs/2205.06871
作者: Philippe Laban, Chien-Sheng Wu, Wenhao Liu, Caiming Xiong
备注:

点击查看摘要

Abstract: Precisely assessing the progress in natural language generation (NLG) tasks is challenging, and human evaluation to establish preference in a model’s output over another is often necessary. However, human evaluation is usually costly, difficult to reproduce, and non-reusable. In this paper, we propose a new and simple automatic evaluation method for NLG called Near-Negative Distinction (NND) that repurposes prior human annotations into NND tests. In an NND test, an NLG model must place higher likelihood on a high-quality output candidate than on a near-negative candidate with a known error. Model performance is established by the number of NND tests a model passes, as well as the distribution over task-specific errors the model fails on. Through experiments on three NLG tasks (question generation, question answering, and summarization), we show that NND achieves higher correlation with human judgments than standard NLG evaluation metrics. We then illustrate NND evaluation in four practical scenarios, for example performing fine-grain model analysis, or studying model training dynamics. Our findings suggest NND can give a second life to human annotations and provide low-cost NLG evaluation.

摘要:精确评估自然语言生成(NLG)任务的进展十分困难,通常需要借助人工评估来确定对一个模型输出相对于另一个的偏好。然而,人工评估往往成本高、难以复现且不可复用。在本文中,我们为 NLG 提出了一种新的、简单的自动评估方法,称为近负例区分(Near-Negative Distinction,NND),它将以往的人工标注重新用作 NND 测试。在一次 NND 测试中,NLG 模型必须给高质量的输出候选赋予比带有已知错误的近负例候选更高的似然。模型性能由其通过的 NND 测试数量,以及其失败所对应的任务特定错误分布来确定。通过在三个 NLG 任务(问题生成、问答和摘要)上的实验,我们表明 NND 与人类判断的相关性高于标准 NLG 评估指标。随后,我们在四个实际场景中展示了 NND 评估,例如进行细粒度的模型分析,或研究模型训练动态。我们的发现表明,NND 能够赋予人工标注第二次生命,并提供低成本的 NLG 评估。
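NND 的核心判定本身非常简单:模型须给高质量候选比近负例候选更高的似然,通过率即为得分。下面是一个假设性的纯 Python 示意(输入为成对的对数似然,函数名为说明而设;实际评测中对数似然来自被评估的 NLG 模型):

```python
def nnd_pass_rate(tests):
    """NND 测试的简化示意。
    tests: [(ll_good, ll_bad), ...],分别为模型对高质量候选
    与近负例候选给出的对数似然;通过条件为前者更高。"""
    passed = sum(1 for ll_good, ll_bad in tests if ll_good > ll_bad)
    return passed / len(tests)
```

按错误类型对 tests 分组后分别计算通过率,即可得到论文所说的"任务特定错误分布"视角。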

NLP-54-标题 Sentiment Analysis of Covid-related Reddits

链接: https://arxiv.org/abs/2205.06863
作者: Yilin Yang, Tomas Fieg, Marina Sokolova
备注: 10 pages, 1 figure, 5 tables

点击查看摘要

Abstract: This paper focuses on Sentiment Analysis of Covid-19 related messages from the r/Canada and r/Unitedkingdom subreddits of Reddit. We apply manual annotation and three Machine Learning algorithms to analyze sentiments conveyed in those messages. We use VADER and TextBlob to label messages for Machine Learning experiments. Our results show that removal of shortest and longest messages improves VADER and TextBlob agreement on positive sentiments and F-score of sentiment classification by all the three algorithms

摘要:本文关注对来自 Reddit 的 r/Canada 和 r/UnitedKingdom 子版块中新冠相关消息的情感分析。我们应用人工标注和三种机器学习算法来分析这些消息所传达的情绪,并使用 VADER 和 TextBlob 为机器学习实验标注消息。结果表明,去除最短和最长的消息可以提高 VADER 与 TextBlob 在积极情感上的标注一致性,并提高全部三种算法情感分类的 F 分数。
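摘要中的两个步骤——"去除最短和最长的消息"与"衡量两套标签(如 VADER 与 TextBlob)的一致性"——都可以用纯 Python 示意(分位数阈值为示例值,并非论文使用的确切设置):

```python
def trim_by_length(messages, low=0.05, high=0.95):
    """去掉长度低于 low 分位、高于 high 分位的消息(示例阈值)。"""
    lengths = sorted(len(m) for m in messages)
    lo = lengths[int(len(lengths) * low)]
    hi = lengths[min(int(len(lengths) * high), len(lengths) - 1)]
    return [m for m in messages if lo <= len(m) <= hi]

def agreement(labels_a, labels_b):
    """两套情感标签的简单一致率。"""
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)
```

真实流程中标签由 VADER 的复合得分与 TextBlob 的极性得分各自离散化而来;此处仅演示过滤与一致性计算的骨架。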

NLP-55-标题 An Approach for Automatic Construction of an Algorithmic Knowledge Graph from Textual Resources

链接: https://arxiv.org/abs/2205.06854
作者: Jyotima Patel, Biswanath Dutta
备注: 12 pages, 7 figures, 2 tables

点击查看摘要

Abstract: There is enormous growth in various fields of research. This development is accompanied by new problems. To solve these problems efficiently and in an optimized manner, algorithms are created and described by researchers in the scientific literature. Scientific algorithms are vital for understanding and reusing existing work in numerous domains. However, algorithms are generally challenging to find. Also, the comparison among similar algorithms is difficult because of the disconnected documentation. Information about algorithms is mostly present in websites, code comments, and so on. There is an absence of structured metadata to portray algorithms. As a result, sometimes redundant or similar algorithms are published, and the researchers build them from scratch instead of reusing or expanding upon the already existing algorithm. In this paper, we introduce an approach for automatically developing a knowledge graph (KG) for algorithmic problems from unstructured data. Because it captures information more clearly and extensively, an algorithm KG will give additional context and explainability to the algorithm metadata.

摘要:各个研究领域都在飞速发展,随之而来的是新的问题。为了高效且最优地解决这些问题,研究者在科学文献中创建并描述了各种算法。科学算法对于理解和重用众多领域的已有工作至关重要。然而,算法通常难以查找;同时,由于文档彼此割裂,相似算法之间的比较也很困难。关于算法的信息大多散布在网站、代码注释等处,缺乏用于刻画算法的结构化元数据。其结果是,冗余或相似的算法有时被重复发表,研究者也往往从零构建,而不是重用或扩展已有算法。在本文中,我们介绍了一种从非结构化数据中为算法问题自动构建知识图(KG)的方法。由于知识图能更清晰、更全面地捕获信息,算法 KG 将为算法元数据提供额外的上下文和可解释性。

NLP-56-标题 IRB-NLP at SemEval-2022 Task 1 Exploring the Relationship Between Words and Their Semantic Representations

链接: https://arxiv.org/abs/2205.06840
作者: Damir Korenčić, Ivan Grubišić
备注:

点击查看摘要

Abstract: What is the relation between a word and its description, or a word and its embedding? Both descriptions and embeddings are semantic representations of words. But, what information from the original word remains in these representations? Or more importantly, which information about a word do these two representations share? Definition Modeling and Reverse Dictionary are two opposite learning tasks that address these questions. The goal of the Definition Modeling task is to investigate the power of information laying inside a word embedding to express the meaning of the word in a humanly understandable way – as a dictionary definition. Conversely, the Reverse Dictionary task explores the ability to predict word embeddings directly from its definition. In this paper, by tackling these two tasks, we are exploring the relationship between words and their semantic representations. We present our findings based on the descriptive, exploratory, and predictive data analysis conducted on the CODWOE dataset. We give a detailed overview of the systems that we designed for Definition Modeling and Reverse Dictionary tasks, and that achieved top scores on SemEval-2022 CODWOE challenge in several subtasks. We hope that our experimental results concerning the predictive models and the data analyses we provide will prove useful in future explorations of word representations and their relationships.

摘要:一个单词与其描述之间、一个单词与其嵌入之间是什么关系?描述和嵌入都是单词的语义表示。但是,原始单词的哪些信息保留在了这些表示中?更重要的是,这两种表示共享了单词的哪些信息?定义建模(Definition Modeling)和反向词典(Reverse Dictionary)是从两个相反方向回答这些问题的学习任务。定义建模任务的目标是考察词嵌入中蕴含的信息能否以人类可理解的方式(即词典定义)表达单词的含义;反之,反向词典任务探索直接从定义预测词嵌入的能力。在本文中,通过处理这两项任务,我们探索了单词及其语义表示之间的关系。我们基于在 CODWOE 数据集上进行的描述性、探索性和预测性数据分析给出了我们的发现。我们详细介绍了为定义建模和反向词典任务设计的系统,这些系统在 SemEval-2022 CODWOE 挑战的多个子任务中取得了最高分。我们希望我们关于预测模型的实验结果和所提供的数据分析,对未来关于词表示及其关系的探索有所助益。

NLP-57-标题 Deconstructing NLG Evaluation Evaluation Practices Assumptions and Their Implications

链接: https://arxiv.org/abs/2205.06828
作者: Kaitlyn Zhou, Su Lin Blodgett, Adam Trischler, Hal Daumé III, Kaheer Suleman, Alexandra Olteanu
备注: Camera Ready for NAACL 2022 (Main Conference)

点击查看摘要

Abstract: There are many ways to express similar things in text, which makes evaluating natural language generation (NLG) systems difficult. Compounding this difficulty is the need to assess varying quality criteria depending on the deployment setting. While the landscape of NLG evaluation has been well-mapped, practitioners’ goals, assumptions, and constraints – which inform decisions about what, when, and how to evaluate – are often partially or implicitly stated, or not stated at all. Combining a formative semi-structured interview study of NLG practitioners (N=18) with a survey study of a broader sample of practitioners (N=61), we surface goals, community practices, assumptions, and constraints that shape NLG evaluations, examining their implications and how they embody ethical considerations.

摘要:在文本中表达相似内容的方式有很多,这使得评估自然语言生成(NLG)系统变得困难。雪上加霜的是,评估时还需要根据部署场景考察不同的质量标准。尽管 NLG 评估的版图已被清晰地勾勒,但从业者的目标、假设和约束——它们决定了评估什么、何时评估以及如何评估——往往只被部分或隐含地陈述,甚至完全没有陈述。我们将对 NLG 从业者的形成性半结构化访谈研究(N=18)与对更广泛从业者样本的问卷研究(N=61)相结合,呈现了塑造 NLG 评估的目标、社区实践、假设和约束,并考察了它们的影响以及其中体现的伦理考量。

NLP-58-标题 GenerSpeech Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis

链接: https://arxiv.org/abs/2205.07211
作者: Rongjie Huang, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao
备注:

点击查看摘要

Abstract: Style transfer for out-of-domain (OOD) speech synthesis aims to generate speech samples with unseen style (e.g., speaker identity, emotion, and prosody) derived from an acoustic reference, while facing the following challenges: 1) The highly dynamic style features in expressive voice are difficult to model and transfer; and 2) the TTS models should be robust enough to handle diverse OOD conditions that differ from the source data. This paper proposes GenerSpeech, a text-to-speech model towards high-fidelity zero-shot style transfer of OOD custom voice. GenerSpeech decomposes the speech variation into the style-agnostic and style-specific parts by introducing two components: 1) a multi-level style adaptor to efficiently model a large range of style conditions, including global speaker and emotion characteristics, and the local (utterance, phoneme, and word-level) fine-grained prosodic representations; and 2) a generalizable content adaptor with Mix-Style Layer Normalization to eliminate style information in the linguistic content representation and thus improve model generalization. Our evaluations on zero-shot style transfer demonstrate that GenerSpeech surpasses the state-of-the-art models in terms of audio quality and style similarity. The extension studies to adaptive style transfer further show that GenerSpeech performs robustly in the few-shot data setting. Audio samples are available at \url{this https URL}

摘要:面向域外(OOD)语音合成的风格迁移旨在依据声学参考,生成具有未见风格(如说话人身份、情感和韵律)的语音样本,并面临以下挑战:1)表达性语音中高度动态的风格特征难以建模和迁移;2)TTS 模型应足够鲁棒,以处理与源数据不同的各种 OOD 条件。本文提出 GenerSpeech,一种面向 OOD 定制语音的高保真零样本风格迁移的文本到语音模型。GenerSpeech 通过引入两个组件,将语音变化分解为风格无关和风格相关两部分:1)多层级风格适配器,高效地建模大范围的风格条件,包括全局的说话人与情感特征,以及局部(话语级、音素级和词级)的细粒度韵律表示;2)带有混合风格层归一化(Mix-Style Layer Normalization)的可泛化内容适配器,用以消除语言内容表示中的风格信息,从而改善模型泛化。我们对零样本风格迁移的评估表明,GenerSpeech 在音频质量和风格相似度方面超越了最先进的模型。对自适应风格迁移的扩展研究进一步表明,GenerSpeech 在少样本数据设定下表现稳健。音频样本见此 https URL。

NLP-59-标题 Collar-aware Training for Streaming Speaker Change Detection in Broadcast Speech

链接: https://arxiv.org/abs/2205.07086
作者: Joonas Kalda, Tanel Alumäe
备注: Accepted to Speaker Odyssey 2022

点击查看摘要

Abstract: In this paper, we present a novel training method for speaker change detection models. Speaker change detection is often viewed as a binary sequence labelling problem. The main challenges with this approach are the vagueness of annotated change points caused by the silences between speaker turns and imbalanced data due to the majority of frames not including a speaker change. Conventional training methods tackle these by artificially increasing the proportion of positive labels in the training data. Instead, the proposed method uses an objective function which encourages the model to predict a single positive label within a specified collar. This is done by marginalizing over all possible subsequences that have exactly one positive label within the collar. Experiments on English and Estonian datasets show large improvements over the conventional training method. Additionally, the model outputs have peaks concentrated to a single frame, removing the need for post-processing to find the exact predicted change point which is particularly useful for streaming applications.

摘要:在本文中,我们提出了一种用于说话人变化检测模型的新型训练方法。说话人变化检测通常被视为二值序列标注问题。这种方法的主要挑战在于:说话人轮换之间的静音使得标注的变化点位置模糊;以及由于绝大多数帧不包含说话人变化,数据严重不平衡。传统训练方法通过人为提高训练数据中正标签的比例来应对这些问题。与之不同,所提方法使用一个目标函数,鼓励模型在指定的容差窗(collar)内只预测一个正标签。这通过对容差窗内恰好含有一个正标签的所有可能子序列进行边缘化来实现。在英语和爱沙尼亚语数据集上的实验显示,该方法相对传统训练方法有大幅提升。此外,模型输出的峰值集中于单个帧,无需后处理即可找到确切的预测变化点,这对流式应用尤其有用。
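"对容差窗内恰好一个正标签的所有子序列进行边缘化"的目标,可以用如下假设性的纯 Python 草图说明(假设各帧预测相互独立;这只是对边缘化形式的示意,不是论文的实际实现):

```python
import math

def collar_likelihood(probs):
    """容差窗内"恰好一帧为正"的边缘化似然示意。
    probs: 模型对容差窗内每一帧给出的说话人变化概率;
    对每个位置 i,累加 p_i * prod_{j != i} (1 - p_j)。"""
    total = 0.0
    for i in range(len(probs)):
        term = probs[i]
        for j, p in enumerate(probs):
            if j != i:
                term *= 1.0 - p
        total += term
    return total

def collar_loss(probs):
    """负对数似然形式的训练目标。"""
    return -math.log(collar_likelihood(probs))
```

注意该目标在概率集中于单一帧时取得最小损失,这与摘要所述"模型输出峰值集中于单个帧"的性质一致。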

NLP-60-标题 Pretraining Approaches for Spoken Language Recognition TalTech Submission to the OLR 2021 Challenge

链接: https://arxiv.org/abs/2205.07083
作者: Tanel Alumäe, Kunnar Kukk
备注: Accepted to Speaker Odyssey 2022

点击查看摘要

Abstract: This paper investigates different pretraining approaches to spoken language identification. The paper is based on our submission to the Oriental Language Recognition 2021 Challenge. We participated in two tracks of the challenge: constrained and unconstrained language recognition. For the constrained track, we first trained a Conformer-based encoder-decoder model for multilingual automatic speech recognition (ASR), using the provided training data that had transcripts available. The shared encoder of the multilingual ASR model was then finetuned for the language identification task. For the unconstrained task, we relied on both externally available pretrained models as well as external data: the multilingual XLSR-53 wav2vec2.0 model was finetuned on the VoxLingua107 corpus for the language recognition task, and finally finetuned on the provided target language training data, augmented with CommonVoice data. Our primary metric C_{\rm avg} values on the Test set are 0.0079 for the constrained task and 0.0119 for the unconstrained task which resulted in the second place in both rankings. In post-evaluation experiments, we study the amount of target language data needed for training an accurate backend model, the importance of multilingual pretraining data, and compare different models as finetuning starting points.

摘要:本文研究了口语语种识别的不同预训练方法。该论文基于我们对 Oriental Language Recognition 2021 挑战赛的提交。我们参加了挑战赛的两条赛道:受限和不受限的语种识别。对于受限赛道,我们首先利用提供的带转写文本的训练数据,训练了一个基于 Conformer 的编码器-解码器模型用于多语言自动语音识别(ASR),然后将该多语言 ASR 模型的共享编码器针对语种识别任务进行微调。对于不受限任务,我们同时依赖外部可用的预训练模型和外部数据:先在 VoxLingua107 语料库上针对语种识别任务微调多语言 XLSR-53 wav2vec2.0 模型,最后在提供的目标语言训练数据(并用 CommonVoice 数据增强)上继续微调。我们的主要指标 C_{\rm avg} 在测试集上的值为:受限任务 0.0079,不受限任务 0.0119,在两个排行榜上均获得第二名。在赛后实验中,我们研究了训练准确后端模型所需的目标语言数据量、多语言预训练数据的重要性,并比较了作为微调起点的不同模型。

机器学习

ML-0-标题 Loss Landscape Engineering via Data Regulation on PINNs

链接: https://arxiv.org/abs/2205.07843
作者: Vignesh Gopakumar, Stanislas Pamela, Debasmita Samaddar
备注: 13 Pages, 10 Figures. Journal Submission

点击查看摘要

Abstract: Physics-Informed Neural Networks have shown unique utility in parameterising the solution of a well-defined partial differential equation using automatic differentiation and residual losses. Though they provide theoretical guarantees of convergence, in practice the required training regimes tend to be exacting and demanding. Through the course of this paper, we take a deep dive into understanding the loss landscapes associated with a PINN and how that offers some insight as to why PINNs are fundamentally hard to optimise for. We demonstrate how PINNs can be forced to converge better towards the solution, by way of feeding in sparse or coarse data as a regulator. The data regulates and morphs the topology of the loss landscape associated with the PINN to make it easily traversable for the minimiser. Data regulation of PINNs helps ease the optimisation required for convergence by invoking a hybrid unsupervised-supervised training approach, where the labelled data pushes the network towards the vicinity of the solution, and the unlabelled regime fine-tunes it to the solution.

摘要:物理信息神经网络(PINN)利用自动微分和残差损失来参数化良定偏微分方程的解,展现出独特的作用。尽管它们提供了收敛的理论保证,但在实践中,所需的训练方案往往苛刻且严格。在本文中,我们深入理解与 PINN 相关的损失地形,并借此解释为什么 PINN 从根本上难以优化。我们展示了如何通过馈入稀疏或粗粒度数据作为调节器,迫使 PINN 更好地收敛到解。这些数据调节并重塑了与 PINN 相关的损失地形的拓扑结构,使其更容易被最小化器遍历。对 PINN 的数据调节通过引入一种无监督-有监督混合训练方法来降低收敛所需的优化难度:带标签的数据将网络推向解的邻域,而无监督部分则将其微调到解本身。
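上述"无监督残差项 + 稀疏标签数据项"的混合目标可以用如下 Python 草图示意(仅为说明而假设的接口:`residual(model, dmodel, x)` 返回点 x 处的 PDE 残差,`lam` 为数据项权重;并非论文的官方实现):

```python
def regulated_pinn_loss(model, dmodel, residual, collocation, data_pts, lam=1.0):
    """Hybrid PINN objective sketch: PDE-residual term on collocation
    points plus a supervised MSE term on sparse/coarse labelled data."""
    # unsupervised physics term
    phys = sum(residual(model, dmodel, x) ** 2 for x in collocation) / len(collocation)
    # supervised data-regulation term
    data = sum((model(x) - y) ** 2 for x, y in data_pts) / max(len(data_pts), 1)
    return phys + lam * data
```

例如对 ODE u'(x) = 2x,精确解 u(x) = x² 会使两项同时为零,而错误的候选解只有在稀疏数据项的"牵引"下才更容易被排除。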

ML-1-标题 Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate

链接: https://arxiv.org/abs/2205.07833
作者: Yuting Ye, Christine Ho, Ci-Ren Jiang, Wayne Tai Lee, Haiyan Huang
备注: 34 pages, 11 figures, 9 tables

点击查看摘要

Abstract: Hierarchical multi-label classification (HMC) has drawn increasing attention in the past few decades. It is applicable when hierarchical relationships among classes are available and need to be incorporated along with the multi-label classification whereby each object is assigned to one or more classes. There are two key challenges in HMC: i) optimizing the classification accuracy, and meanwhile ii) ensuring the given class hierarchy. To address these challenges, in this article, we introduce a new statistic called the multidimensional local precision rate (mLPR) for each object in each class. We show that classification decisions made by simply sorting objects across classes in descending order of their true mLPRs can, in theory, ensure the class hierarchy and lead to the maximization of CATCH, an objective function we introduce that is related to the area under a hit curve. This approach is the first of its kind that handles both challenges in one objective function without additional constraints, thanks to the desirable statistical properties of CATCH and mLPR. In practice, however, true mLPRs are not available. In response, we introduce HierRank, a new algorithm that maximizes an empirical version of CATCH using estimated mLPRs while respecting the hierarchy. The performance of this approach was evaluated on a synthetic data set and two real data sets; ours was found to be superior to several comparison methods on evaluation criteria based on metrics such as precision, recall, and F_1 score.

摘要:层次多标签分类(HMC)在过去几十年中引起了越来越多的关注。当类别之间存在层次关系,并且需要在多标签分类(即每个对象被分配到一个或多个类别)中纳入这种关系时,HMC 便适用。HMC 有两个关键挑战:i)优化分类精度,同时 ii)保证给定的类别层次结构。为了解决这些挑战,本文为每个类别中的每个对象引入了一个新的统计量,称为多维局部精确率(mLPR)。我们证明,仅按真实 mLPR 的降序对跨类别的对象排序所做出的分类决策,在理论上就可以保证类别层次结构,并使 CATCH(我们引入的一个与命中曲线下面积相关的目标函数)最大化。得益于 CATCH 和 mLPR 良好的统计性质,这种方法是第一个在单一目标函数中同时处理这两个挑战而无需附加约束的方法。然而在实践中,真实 mLPR 是不可得的。为此,我们提出 HierRank,一种在尊重层次结构的同时使用估计的 mLPR 最大化 CATCH 经验版本的新算法。我们在一个合成数据集和两个真实数据集上评估了该方法的性能;在基于精确率、召回率和 F_1 分数等指标的评估标准上,我们的方法优于多种对比方法。
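"按估计 mLPR 降序排序、同时尊重类别层次"的决策过程可以用下面的 Python 草图示意(贪心版本,仅为说明;HierRank 的实际算法与阈值选择请参见原文,`tau` 等参数为本文假设):

```python
def rank_and_assign(scores, parent, tau):
    """Visit (object, class) pairs in descending estimated-mLPR order and
    accept a pair only if its score clears the threshold and the object's
    parent class (if any) was already accepted, so assignments always
    respect the class hierarchy. Illustrative sketch only."""
    assigned = set()
    for (obj, cls), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if s < tau:
            break  # remaining scores are all below threshold
        p = parent.get(cls)
        if p is None or (obj, p) in assigned:
            assigned.add((obj, cls))
    return assigned
```

由于父类通常先于子类被访问(分数更高),层次约束在排序过程中被自然满足;子类得分再高,只要父类未被接受就不会被赋标签。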

ML-2-标题 Expected Frequency Matrices of Elections Computation Geometry and Preference Learning

链接: https://arxiv.org/abs/2205.07831
作者: Niclas Boehmer, Robert Bredereck, Edith Elkind, Piotr Faliszewski, Stanisław Szufa
备注:

点击查看摘要

Abstract: We use the “map of elections” approach of Szufa et al. (AAMAS 2020) to analyze several well-known vote distributions. For each of them, we give an explicit formula or an efficient algorithm for computing its frequency matrix, which captures the probability that a given candidate appears in a given position in a sampled vote. We use these matrices to draw the “skeleton map” of distributions, evaluate its robustness, and analyze its properties. We further use them to identify the nature of several real-world elections.

摘要:我们使用 Szufa 等人(AAMAS 2020)的"选举地图"方法分析了几种著名的投票分布。对于每种分布,我们给出了计算其频率矩阵的显式公式或高效算法;频率矩阵刻画了给定候选人在采样选票中出现在给定位置的概率。我们使用这些矩阵绘制分布的"骨架图",评估其稳健性并分析其性质。我们进一步利用它们来识别若干现实世界选举的性质。
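频率矩阵的经验版本很容易从一组采样选票中计算:矩阵的 [候选人][位置] 项是该候选人出现在该位置的选票比例(下面的草图假设每张选票都是对同一候选人集合的完整排序;仅作示意):

```python
def frequency_matrix(votes):
    """Empirical frequency matrix of sampled votes: entry
    [candidate][position] is the fraction of votes that place the
    candidate at that position. Assumes complete rankings."""
    n = len(votes)
    m = len(votes[0])
    freq = {c: [0.0] * m for c in votes[0]}
    for vote in votes:
        for pos, c in enumerate(vote):
            freq[c][pos] += 1.0 / n
    return freq
```

每一行按定义构成一个概率分布(各位置频率之和为 1),这正是"地图"方法比较分布时所依赖的对象。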

ML-3-标题 Federated Anomaly Detection over Distributed Data Streams

链接: https://arxiv.org/abs/2205.07829
作者: Paula Raissa Silva, João Viangre, João Gama
备注: DSAA’2021 Conference - PhD Track

点击查看摘要

Abstract: Sharing of telecommunication network data, for example, even at high aggregation levels, is nowadays highly restricted due to privacy legislation and regulations and other important ethical concerns. It leads to scattering data across institutions, regions, and states, inhibiting the usage of AI methods that could otherwise take advantage of data at scale. It creates the need to build a platform to control such data, build models or perform calculations. In this work, we propose an approach to building the bridge among anomaly detection, federated learning, and data streams. The overarching goal of the work is to detect anomalies in a federated environment over distributed data streams. This work complements the state-of-the-art by adapting the data stream algorithms in a federated learning setting for anomaly detection and by delivering a robust framework and demonstrating the practical feasibility in a real-world distributed deployment scenario.

摘要:如今,即使在较高的聚合级别上,电信网络数据等数据的共享也因隐私立法、法规以及其他重要的伦理考量而受到高度限制。这导致数据分散在各个机构、地区和国家之间,阻碍了本可大规模利用数据的 AI 方法的使用,也催生了构建一个平台来管控此类数据、构建模型或执行计算的需求。在这项工作中,我们提出了一种在异常检测、联邦学习和数据流之间架起桥梁的方法。工作的总体目标是在联邦环境中对分布式数据流进行异常检测。这项工作通过将数据流算法适配到用于异常检测的联邦学习设置,并提供一个健壮的框架、在真实世界的分布式部署场景中验证其实际可行性,对现有最先进技术形成了补充。

ML-4-标题 GraphHD Efficient graph classification using hyperdimensional computing

链接: https://arxiv.org/abs/2205.07826
作者: Igor Nunes, Mike Heddes, Tony Givargis, Alexandru Nicolau, Alex Veidenbaum
备注:

点击查看摘要

Abstract: Hyperdimensional Computing (HDC) developed by Kanerva is a computational model for machine learning inspired by neuroscience. HDC exploits characteristics of biological neural systems such as high-dimensionality, randomness and a holographic representation of information to achieve a good balance between accuracy, efficiency and robustness. HDC models have already been proven to be useful in different learning applications, especially in resource-limited settings such as the increasingly popular Internet of Things (IoT). One class of learning tasks that is missing from the current body of work on HDC is graph classification. Graphs are among the most important forms of information representation, yet, to this day, HDC algorithms have not been applied to the graph learning problem in a general sense. Moreover, graph learning in IoT and sensor networks, with limited compute capabilities, introduce challenges to the overall design methodology. In this paper, we present GraphHD - a baseline approach for graph classification with HDC. We evaluate GraphHD on real-world graph classification problems. Our results show that when compared to the state-of-the-art Graph Neural Networks (GNNs) the proposed model achieves comparable accuracy, while training and inference times are on average 14.6 \times and 2.0 \times faster, respectively.

摘要:Kanerva 提出的超维计算(HDC)是一种受神经科学启发的机器学习计算模型。HDC 利用生物神经系统的特性,例如高维性、随机性和信息的全息表示,在准确性、效率和鲁棒性之间取得良好的平衡。HDC 模型已被证明在不同的学习应用中很有用,尤其是在资源受限的环境中,例如日益流行的物联网(IoT)。目前 HDC 工作中缺失的一类学习任务是图分类。图是最重要的信息表示形式之一,然而时至今日,HDC 算法尚未在一般意义上被应用于图学习问题。此外,在计算能力有限的物联网和传感器网络中进行图学习,也给整体设计方法带来了挑战。在本文中,我们提出 GraphHD,一种使用 HDC 进行图分类的基线方法,并在真实世界的图分类问题上对其进行评估。结果表明,与最先进的图神经网络(GNN)相比,所提出的模型达到了可比的精度,而训练和推理时间平均分别快 14.6 倍和 2.0 倍。
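GraphHD 的具体图编码请参见原文;下面仅用 Python 草图示意 HDC 的几个基本原语(随机双极超向量、绑定、相似度),例如一条边可以编码为两个端点超向量的绑定,整张图再由边向量捆绑而成(此处的接口与维度均为示意性假设):

```python
import random

def random_hv(dim, seed):
    """Random bipolar hypervector; in high dimension, independently
    drawn vectors are quasi-orthogonal."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(dim)]

def bind(a, b):
    """Elementwise product; associates two hypervectors and is
    self-inverse for bipolar vectors."""
    return [x * y for x, y in zip(a, b)]

def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)
```

自相似度恰为 1,独立随机向量的相似度接近 0,而 bind 的自逆性(再次与同一向量绑定即可恢复原向量)是 HDC 能在单个向量中"打包"结构信息的关键。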

ML-5-标题 CurFi An automated tool to find the best regression analysis model using curve fitting

链接: https://arxiv.org/abs/2205.07804
作者: Ayon Roy, Tausif Al Zubayer, Nafisa Tabassum, Muhammad Nazrul Islam, Md. Abdus Sattar
备注:

点击查看摘要

Abstract: Regression analysis is a well known quantitative research method that primarily explores the relationship between one or more independent variables and a dependent variable. Conducting regression analysis manually on large datasets with multiple independent variables can be tedious. An automated system for regression analysis will be of great help for researchers as well as non-expert users. Thus, the objective of this research is to design and develop an automated curve fitting system. As outcome, a curve fitting system named “CurFi” was developed that uses linear regression models to fit a curve to a dataset and to find out the best fit model. The system facilitates to upload a dataset, split the dataset into training set and test set, select relevant features and label from the dataset; and the system will return the best fit linear regression model after training is completed. The developed tool would be a great resource for the users having limited technical knowledge who will also be able to find the best fit regression model for a dataset using the developed “CurFi” system.

摘要:回归分析是一种众所周知的定量研究方法,主要探索一个或多个自变量与一个因变量之间的关系。在具有多个自变量的大型数据集上手动进行回归分析可能很繁琐。一个自动化的回归分析系统将对研究人员以及非专业用户大有帮助。因此,本研究的目标是设计并开发一个自动化曲线拟合系统。作为成果,我们开发了一个名为"CurFi"的曲线拟合系统,它使用线性回归模型将曲线拟合到数据集并找出最佳拟合模型。该系统支持上传数据集、将数据集拆分为训练集和测试集、从数据集中选择相关特征和标签;训练完成后,系统将返回最佳拟合的线性回归模型。对于技术知识有限的用户而言,该工具将是一个极好的资源,他们同样能够使用所开发的"CurFi"系统为数据集找到最佳拟合的回归模型。
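"在多个候选模型中自动选出最佳拟合"的核心流程可以用下面的 Python 草图示意:对每个候选基函数 f 拟合 y = a·f(x) + b,然后按残差平方和选出最优者(仅为示意,CurFi 的实际候选模型集合与评估方式请参见原文):

```python
def fit_linear(xs, ys):
    """Closed-form least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def best_curve(xs, ys, bases):
    """Fit y = a*f(x) + b for each candidate basis f and return the
    name of the basis with the smallest squared error."""
    best = None
    for name, f in bases.items():
        fx = [f(x) for x in xs]
        a, b = fit_linear(fx, ys)
        sse = sum((a * z + b - y) ** 2 for z, y in zip(fx, ys))
        if best is None or sse < best[1]:
            best = (name, sse)
    return best[0]
```

例如对 y = x² 生成的数据,候选集 {linear: x, quadratic: x²} 中会选出 quadratic。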

ML-6-标题 The Primacy Bias in Deep Reinforcement Learning

链接: https://arxiv.org/abs/2205.07802
作者: Evgenii Nikishin, Max Schwarzer, Pierluca D’Oro, Pierre-Luc Bacon, Aaron Courville
备注: ICML 2022; code at this https URL

点击查看摘要

Abstract: This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally-applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains, consistently improving their performance.

摘要:这项工作指出了深度强化学习(RL)算法的一个常见缺陷:倾向于依赖早期交互,而忽视之后遇到的有用证据。由于在逐步增长的数据集上训练,深度 RL 智能体存在对早期经验过拟合的风险,从而对其余学习过程产生负面影响。受认知科学的启发,我们将这种效应称为首因偏差(primacy bias)。通过一系列实验,我们剖析了加剧这种偏差的深度 RL 算法层面的因素。随后,我们提出了一种简单但普遍适用的机制,通过定期重置智能体的一部分来应对首因偏差。我们将此机制应用于离散动作(Atari 100k)和连续动作(DeepMind Control Suite)两类领域中的算法,均一致地提高了它们的性能。
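"定期重置智能体的一部分"这一机制可以用下面的 Python 草图示意(训练循环与参数都被抽象成可调用对象,仅为说明;论文中实际只重置网络的部分层,并保留回放缓冲区):

```python
def train_with_resets(init_fn, update_fn, steps, reset_every):
    """Sketch of the periodic-reset mechanism: re-initialize (part of)
    the agent's parameters at fixed intervals while training on the
    growing data stream continues, so that early experiences cannot
    dominate forever."""
    params = init_fn()
    for step in range(1, steps + 1):
        params = update_fn(params, step)
        if step % reset_every == 0:
            params = init_fn()  # in practice only e.g. the last layers are reset
    return params
```

关键点在于重置的是参数而非数据:重置后的智能体会在已经积累的(更丰富的)经验上重新学习,从而摆脱对最早期交互的过拟合。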

ML-7-标题 Gradient-based Counterfactual Explanations using Tractable Probabilistic Models

链接: https://arxiv.org/abs/2205.07774
作者: Xiaoting Shao, Kristian Kersting
备注:

点击查看摘要

Abstract: Counterfactual examples are an appealing class of post-hoc explanations for machine learning models. Given input x of class y_1 , its counterfactual is a contrastive example x^\prime of another class y_0 . Current approaches primarily solve this task by a complex optimization: define an objective function based on the loss of the counterfactual outcome y_0 with hard or soft constraints, then optimize this function as a black-box. This “deep learning” approach, however, is rather slow, sometimes tricky, and may result in unrealistic counterfactual examples. In this work, we propose a novel approach to deal with these problems using only two gradient computations based on tractable probabilistic models. First, we compute an unconstrained counterfactual u of x to induce the counterfactual outcome y_0 . Then, we adapt u to higher density regions, resulting in x^{\prime} . Empirical evidence demonstrates the dominant advantages of our approach.

摘要:反事实示例是机器学习模型事后解释中一类颇具吸引力的方法。给定类别 y_1 的输入 x,其反事实是另一类别 y_0 的对比示例 x^\prime。当前方法主要通过复杂的优化来解决此任务:基于反事实结果 y_0 的损失并加上硬约束或软约束定义一个目标函数,然后将其作为黑盒进行优化。然而,这种"深度学习"方法相当缓慢,有时很棘手,并且可能产生不切实际的反事实示例。在这项工作中,我们提出了一种新方法,仅使用基于易处理概率模型的两次梯度计算来处理这些问题。首先,我们计算 x 的无约束反事实 u 以诱导反事实结果 y_0;然后,我们将 u 调整到更高密度的区域,得到 x^{\prime}。经验证据表明了我们方法的显著优势。
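"两次梯度计算"的骨架可以用下面的 Python 草图示意:第一步沿分类器梯度得到无约束反事实 u,第二步沿易处理密度模型的对数密度梯度把 u 拉向高密度区域(梯度函数、学习率均为本文假设的示意接口,并非论文实现):

```python
def two_step_counterfactual(x, grad_logit, grad_logdensity, lr1=1.0, lr2=0.1):
    """Two-gradient counterfactual sketch: (1) move x toward the target
    class y_0 via the classifier's logit gradient, giving the
    unconstrained counterfactual u; (2) move u toward a higher-density
    region under a tractable density model, giving x'."""
    u = [xi + lr1 * g for xi, g in zip(x, grad_logit(x))]
    return [ui + lr2 * g for ui, g in zip(u, grad_logdensity(u))]
```

第二步正是区别于普通对抗扰动之处:高密度区域内的点更接近真实数据流形,因此反事实更"现实"。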

ML-8-标题 Efficient Algorithms for Planning with Participation Constraints

链接: https://arxiv.org/abs/2205.07767
作者: Hanrui Zhang, Yu Cheng, Vincent Conitzer
备注: EC 2022

点击查看摘要

Abstract: We consider the problem of planning with participation constraints introduced in [Zhang et al., 2022]. In this problem, a principal chooses actions in a Markov decision process, resulting in separate utilities for the principal and the agent. However, the agent can and will choose to end the process whenever his expected onward utility becomes negative. The principal seeks to compute and commit to a policy that maximizes her expected utility, under the constraint that the agent should always want to continue participating. We provide the first polynomial-time exact algorithm for this problem for finite-horizon settings, where previously only an additive \varepsilon -approximation algorithm was known. Our approach can also be extended to the (discounted) infinite-horizon case, for which we give an algorithm that runs in time polynomial in the size of the input and \log(1/\varepsilon) , and returns a policy that is optimal up to an additive error of \varepsilon .

摘要:我们考虑 [Zhang et al., 2022] 中引入的带参与约束的规划问题。在这个问题中,委托人在马尔可夫决策过程中选择行动,从而为委托人和代理人产生各自的效用。但是,一旦代理人预期的后续效用变为负值,他可以且将会选择终止该过程。委托人试图在"代理人应始终愿意继续参与"的约束下,计算并承诺一项使其预期效用最大化的策略。我们为有限时域设置下的该问题给出了首个多项式时间精确算法;此前已知的只有一个加性 \varepsilon-近似算法。我们的方法还可以扩展到(带折扣的)无限时域情形:我们给出一种算法,其运行时间是输入规模和 \log(1/\varepsilon) 的多项式,并返回一项最优性误差不超过加性 \varepsilon 的策略。

ML-9-标题 Prioritizing Corners in OoD Detectors via Symbolic String Manipulation

链接: https://arxiv.org/abs/2205.07736
作者: Chih-Hong Cheng, Changshun Wu, Emmanouil Seferis, Saddek Bensalem
备注:

点击查看摘要

Abstract: For safety assurance of deep neural networks (DNNs), out-of-distribution (OoD) monitoring techniques are essential as they filter spurious input that is distant from the training dataset. This paper studies the problem of systematically testing OoD monitors to avoid cases where an input data point is tested as in-distribution by the monitor, but the DNN produces spurious output predictions. We consider the definition of “in-distribution” characterized in the feature space by a union of hyperrectangles learned from the training dataset. Thus the testing is reduced to finding corners in hyperrectangles distant from the available training data in the feature space. Concretely, we encode the abstract location of every data point as a finite-length binary string, and the union of all binary strings is stored compactly using binary decision diagrams (BDDs). We demonstrate how to use BDDs to symbolically extract corners distant from all data points within the training set. Apart from test case generation, we explain how to use the proposed corners to fine-tune the DNN to ensure that it does not predict overly confidently. The result is evaluated over examples such as number and traffic sign recognition.

摘要:为了保证深度神经网络(DNN)的安全性,分布外(OoD)监测技术必不可少,因为它们能过滤掉远离训练数据集的伪输入。本文研究了系统性测试 OoD 监测器的问题,以避免出现这样的情形:某个输入数据点被监测器判定为分布内,而 DNN 却产生了错误的输出预测。我们考虑在特征空间中用从训练数据集学得的超矩形之并集来刻画"分布内"的定义。由此,测试归结为在超矩形中寻找远离特征空间中现有训练数据的角落。具体来说,我们将每个数据点的抽象位置编码为定长二进制字符串,并使用二元决策图(BDD)紧凑地存储所有二进制字符串的并集。我们演示了如何使用 BDD 以符号化方式提取远离训练集中所有数据点的角落。除了测试用例生成之外,我们还解释了如何利用所提出的角落对 DNN 进行微调,以确保其不会做出过度自信的预测。结果在数字识别和交通标志识别等示例上进行了评估。
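"把数据点的抽象位置编码为二进制字符串、再求未被覆盖的单元"这一思路可以用下面的 Python 草图示意。论文用 BDD 紧凑存储已覆盖字符串的集合;此处为了可运行,用普通 Python 集合代替 BDD(这是一个明确的简化假设,阈值划分方式也是示意性的):

```python
from itertools import product

def encode(point, thresholds):
    """Encode a point's abstract location as a finite binary string:
    one bit per feature, set iff the value exceeds that threshold."""
    return ''.join('1' if v > t else '0' for v, t in zip(point, thresholds))

def unseen_corners(points, thresholds):
    """Cells of the thresholded feature space not covered by any
    training point; a plain Python set stands in for the BDD here."""
    seen = {encode(p, thresholds) for p in points}
    all_cells = {''.join(bits) for bits in product('01', repeat=len(thresholds))}
    return sorted(all_cells - seen)
```

返回的字符串即"远离训练数据的角落"的候选,可用于生成测试用例或微调数据。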

ML-10-标题 Generalizing to New Tasks via One-Shot Compositional Subgoals

链接: https://arxiv.org/abs/2205.07716
作者: Xihan Bian, Oscar Mendez, Simon Hadfield
备注: Present at ICRA 2022 “Compositional Robotics: Mathematics and Tools”

点击查看摘要

Abstract: The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research. It is also a cornerstone of a future “General AI”. Any artificially intelligent agent deployed in a real world application, must adapt on the fly to unknown environments. Researchers often rely on reinforcement and imitation learning to provide online adaptation to new tasks, through trial and error learning. However, this can be challenging for complex tasks which require many timesteps or large numbers of subtasks to complete. These “long horizon” tasks suffer from sample inefficiency and can require extremely long training times before the agent can learn to perform the necessary longterm planning. In this work, we introduce CASE which attempts to address these issues by training an Imitation Learning agent using adaptive “near future” subgoals. These subgoals are recalculated at each step using compositional arithmetic in a learned latent representation space. In addition to improving learning efficiency for standard long-term tasks, this approach also makes it possible to perform one-shot generalization to previously unseen tasks, given only a single reference trajectory for the task in a different environment. Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional Imitation Learning approach by 30%.

摘要:在几乎没有监督的情况下泛化到以前未见过的任务的能力,是现代机器学习研究中的一个关键挑战,也是未来"通用 AI"的基石。部署在现实世界应用中的任何人工智能体,都必须即时适应未知环境。研究人员通常依靠强化学习和模仿学习,通过试错学习来提供对新任务的在线适应。然而,对于需要许多时间步或大量子任务才能完成的复杂任务,这可能颇具挑战性。这些"长时程"任务样本效率低下,可能需要极长的训练时间,智能体才能学会执行必要的长期规划。在这项工作中,我们提出 CASE,它试图通过使用自适应的"近未来"子目标来训练模仿学习智能体,从而解决这些问题。这些子目标在每一步都通过在学得的潜在表示空间中进行组合算术来重新计算。除了提高标准长时程任务的学习效率外,这种方法还使得对以前未见过的任务进行一次性(one-shot)泛化成为可能:只需给出该任务在不同环境中的单条参考轨迹。我们的实验表明,所提出的方法始终比此前最先进的组合式模仿学习方法高出 30%。

ML-11-标题 L3-Net Deep Audio Embeddings to Improve COVID-19 Detection from Smartphone Data

链接: https://arxiv.org/abs/2205.07682
作者: Mattia Giovanni Campana, Andrea Rovati, Franca Delmastro, Elena Pagani
备注: accepted for IEEE SMARTCOMP 2022

点击查看摘要

Abstract: Smartphones and wearable devices, along with Artificial Intelligence, can represent a game-changer in the pandemic control, by implementing low-cost and pervasive solutions to recognize the development of new diseases at their early stages and by potentially avoiding the rise of new outbreaks. Some recent works show promise in detecting diagnostic signals of COVID-19 from voice and coughs by using machine learning and hand-crafted acoustic features. In this paper, we decided to investigate the capabilities of the recently proposed deep embedding model L3-Net to automatically extract meaningful features from raw respiratory audio recordings in order to improve the performances of standard machine learning classifiers in discriminating between COVID-19 positive and negative subjects from smartphone data. We evaluated the proposed model on 3 datasets, comparing the obtained results with those of two reference works. Results show that the combination of L3-Net with hand-crafted features overcomes the performance of the other works of 28.57% in terms of AUC in a set of subject-independent experiments. This result paves the way to further investigation on different deep audio embeddings, also for the automatic detection of different diseases.

摘要:智能手机和可穿戴设备,连同人工智能,可以通过实施低成本且无处不在的解决方案,在疾病早期阶段识别其发展,并有可能避免新疫情的爆发,从而改变大流行防控的格局。最近的一些工作表明,利用机器学习和手工设计的声学特征,有望从语音和咳嗽声中检测出 COVID-19 的诊断信号。在本文中,我们研究了最近提出的深度嵌入模型 L3-Net 从原始呼吸音频记录中自动提取有意义特征的能力,以提升标准机器学习分类器利用智能手机数据区分 COVID-19 阳性与阴性受试者的性能。我们在 3 个数据集上评估了所提出的模型,并将所得结果与两项参考工作进行比较。结果表明,在一组与受试者无关的实验中,L3-Net 与手工特征的组合在 AUC 上超出其他工作 28.57%。这一结果为进一步研究不同的深度音频嵌入,以及自动检测其他疾病铺平了道路。

ML-12-标题 Hyperdimensional computing encoding for feature selection on the use case of epileptic seizure detection

链接: https://arxiv.org/abs/2205.07654
作者: Una Pale, Tomas Teijeiro, David Atienza
备注:

点击查看摘要

Abstract: The healthcare landscape is moving from the reactive interventions focused on symptoms treatment to a more proactive prevention, from one-size-fits-all to personalized medicine, and from centralized to distributed paradigms. Wearable IoT devices and novel algorithms for continuous monitoring are essential components of this transition. Hyperdimensional (HD) computing is an emerging ML paradigm inspired by neuroscience research with various aspects interesting for IoT devices and biomedical applications. Here we explore the not yet addressed topic of optimal encoding of spatio-temporal data, such as electroencephalogram (EEG) signals, and all information it entails to the HD vectors. Further, we demonstrate how the HD computing framework can be used to perform feature selection by choosing an adequate encoding. To the best of our knowledge, this is the first approach to performing feature selection using HD computing in the literature. As a result, we believe it can support the ML community to further foster the research in multiple directions related to feature and channel selection, as well as model interpretability.

摘要:医疗保健格局正在从聚焦症状治疗的被动式干预转向更主动的预防,从"一刀切"转向个性化医疗,从集中式范式转向分布式范式。可穿戴物联网设备和用于连续监测的新型算法是这一转变的重要组成部分。超维(HD)计算是一种受神经科学研究启发的新兴机器学习范式,其多方面特性对物联网设备和生物医学应用都颇具吸引力。在此,我们探讨一个尚未被解决的问题:如何将时空数据(例如脑电图 EEG 信号)及其蕴含的所有信息最优地编码为 HD 向量。此外,我们演示了如何通过选择适当的编码,利用 HD 计算框架执行特征选择。据我们所知,这是文献中首个使用 HD 计算进行特征选择的方法。因此,我们相信它能帮助机器学习社区在特征与通道选择以及模型可解释性等多个方向上进一步推进研究。

ML-13-标题 Generalizing to Evolving Domains with Latent Structure-Aware Sequential Autoencoder

链接: https://arxiv.org/abs/2205.07649
作者: Tiexin Qin, Shiqi Wang, Haoliang Li
备注: ICML 2022, code is available at this https URL

点击查看摘要

Abstract: Domain generalization aims to improve the generalization capability of machine learning systems to out-of-distribution (OOD) data. Existing domain generalization techniques embark upon stationary and discrete environments to tackle the generalization issue caused by OOD data. However, many real-world tasks in non-stationary environments (e.g. self-driven car system, sensor measures) involve more complex and continuously evolving domain drift, which raises new challenges for the problem of domain generalization. In this paper, we formulate the aforementioned setting as the problem of evolving domain generalization. Specifically, we propose to introduce a probabilistic framework called Latent Structure-aware Sequential Autoencoder (LSSAE) to tackle the problem of evolving domain generalization via exploring the underlying continuous structure in the latent space of deep neural networks, where we aim to identify two major factors namely covariate shift and concept shift accounting for distribution shift in non-stationary environments. Experimental results on both synthetic and real-world datasets show that LSSAE can lead to superior performances based on the evolving domain generalization setting.

摘要:领域泛化旨在提高机器学习系统对分布外(OOD)数据的泛化能力。现有的领域泛化技术着眼于平稳、离散的环境,以解决由 OOD 数据引起的泛化问题。但是,非平稳环境中的许多现实任务(例如自动驾驶汽车系统、传感器测量)涉及更复杂且持续演化的领域漂移,这给领域泛化问题带来了新的挑战。在本文中,我们将上述设定形式化为演化领域泛化问题。具体而言,我们提出一个称为潜在结构感知序列自编码器(LSSAE)的概率框架,通过探索深度神经网络潜在空间中的底层连续结构来解决演化领域泛化问题;我们旨在识别造成非平稳环境中分布偏移的两个主要因素,即协变量偏移和概念偏移。在合成与真实世界数据集上的实验结果表明,在演化领域泛化设定下,LSSAE 能够带来更优的性能。

ML-14-标题 Attacking and Defending Deep Reinforcement Learning Policies

链接: https://arxiv.org/abs/2205.07626
作者: Chao Wang
备注: nine pages

点击查看摘要

Abstract: Recent studies have shown that deep reinforcement learning (DRL) policies are vulnerable to adversarial attacks, which raise concerns about applications of DRL to safety-critical systems. In this work, we adopt a principled way and study the robustness of DRL policies to adversarial attacks from the perspective of robust optimization. Within the framework of robust optimization, optimal adversarial attacks are given by minimizing the expected return of the policy, and correspondingly a good defense mechanism should be realized by improving the worst-case performance of the policy. Considering that attackers generally have no access to the training environment, we propose a greedy attack algorithm, which tries to minimize the expected return of the policy without interacting with the environment, and a defense algorithm, which performs adversarial training in a max-min form. Experiments on Atari game environments show that our attack algorithm is more effective and leads to worse return of the policy than existing attack algorithms, and our defense algorithm yields policies more robust than existing defense methods to a range of adversarial attacks (including our proposed attack algorithm).

摘要:最近的研究表明,深度强化学习(DRL)策略容易受到对抗攻击,这引发了对 DRL 在安全关键系统中应用的担忧。在这项工作中,我们采用一种有原则的方式,从鲁棒优化的角度研究 DRL 策略对对抗攻击的鲁棒性。在鲁棒优化的框架内,最优对抗攻击通过最小化策略的期望回报给出;相应地,良好的防御机制应通过改善策略的最坏情况性能来实现。考虑到攻击者通常无法访问训练环境,我们提出了一种贪婪攻击算法,它试图在不与环境交互的情况下最小化策略的期望回报,以及一种防御算法,它以 max-min 形式进行对抗训练。在 Atari 游戏环境上的实验表明,我们的攻击算法比现有攻击算法更有效,导致策略的回报更差;而我们的防御算法得到的策略在面对一系列对抗攻击(包括我们提出的攻击算法)时比现有防御方法更鲁棒。
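"不与环境交互的贪婪攻击"的骨架可以用下面的 Python 草图示意:在候选扰动集合中,选出使学得的价值估计最低的那个扰动(价值函数、候选集均为本文假设的示意接口,并非论文的具体算法):

```python
def greedy_attack(state, value_fn, candidates):
    """Greedy attack sketch: among candidate perturbations, pick the one
    whose perturbed state has the lowest estimated value. Only the
    learned value function is queried; the environment is never touched."""
    return min(
        candidates,
        key=lambda d: value_fn([s + di for s, di in zip(state, d)]),
    )
```

对应的 max-min 防御即在训练时假设状态总会被这样的最坏扰动篡改,再最大化扰动下的回报。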

ML-15-标题 Model Agnostic Local Explanations of Reject

链接: https://arxiv.org/abs/2205.07623
作者: André Artelt, Roel Visser, Barbara Hammer
备注: arXiv admin note: text overlap with arXiv:2202.07244

点击查看摘要

Abstract: The application of machine learning based decision making systems in safety critical areas requires reliable high certainty predictions. Reject options are a common way of ensuring a sufficiently high certainty of predictions made by the system. While being able to reject uncertain samples is important, it is also of importance to be able to explain why a particular sample was rejected. However, explaining general reject options is still an open problem. We propose a model agnostic method for locally explaining arbitrary reject options by means of interpretable models and counterfactual explanations.

摘要:基于机器学习的决策系统在安全关键领域中的应用需要可靠的高确定性预测。拒绝选项(reject options)是确保系统预测具有足够高确定性的常见方法。虽然能够拒绝不确定的样本很重要,但能够解释为什么某个特定样本被拒绝同样重要。然而,解释一般的拒绝选项仍是一个开放问题。我们提出了一种与模型无关的方法,借助可解释模型和反事实解释对任意拒绝选项进行局部解释。

ML-16-标题 Rethinking Reinforcement Learning based Logic Synthesis

链接: https://arxiv.org/abs/2205.07614
作者: Chao Wang, Chen Chen, Dong Li, Bin Wang
备注: nine pages; one figure;

点击查看摘要

Abstract: Recently, reinforcement learning has been used to address logic synthesis by formulating the operator sequence optimization problem as a Markov decision process. However, through extensive experiments, we find out that the learned policy makes decisions independent from the circuit features (i.e., states) and yields an operator sequence that is permutation invariant to some extent in terms of operators. Based on these findings, we develop a new RL-based method that can automatically recognize critical operators and generate common operator sequences generalizable to unseen circuits. Our algorithm is verified on both the EPFL benchmark, a private dataset and a circuit at industrial scale. Experimental results demonstrate that it achieves a good balance among delay, area and runtime, and is practical for industrial usage.

摘要:最近,强化学习已被用于解决逻辑综合问题,做法是将算子序列优化问题表述为马尔可夫决策过程。但是,通过大量实验,我们发现学到的策略所做的决策与电路特征(即状态)无关,并且产生的算子序列在算子层面上具有一定程度的置换不变性。基于这些发现,我们开发了一种新的基于 RL 的方法,可以自动识别关键算子,并生成可推广到未见电路的通用算子序列。我们的算法在 EPFL 基准、一个私有数据集和一个工业规模的电路上均得到了验证。实验结果表明,它在延迟、面积和运行时间之间取得了良好的平衡,适合工业使用。

ML-17-标题 Qualitative Differences Between Evolutionary Strategies and Reinforcement Learning Methods for Control of Autonomous Agents

链接: https://arxiv.org/abs/2205.07592
作者: Nicola Milano, Stefano Nolfi
备注:

点击查看摘要

Abstract: In this paper we analyze the qualitative differences between evolutionary strategies and reinforcement learning algorithms by focusing on two popular state-of-the-art algorithms: the OpenAI-ES evolutionary strategy and the Proximal Policy Optimization (PPO) reinforcement learning algorithm – the most similar methods of the two families. We analyze how the methods differ with respect to: (i) general efficacy, (ii) ability to cope with sparse rewards, (iii) propensity/capacity to discover minimal solutions, (iv) dependency on reward shaping, and (v) ability to cope with variations of the environmental conditions. The analysis of the performance and of the behavioral strategies displayed by the agents trained with the two methods on benchmark problems enable us to demonstrate qualitative differences which were not identified in previous studies, to identify the relative weakness of the two methods, and to propose ways to ameliorate some of those weakness. We show that the characteristics of the reward function has a strong impact which vary qualitatively not only for the OpenAI-ES and the PPO but also for alternative reinforcement learning algorithms, thus demonstrating the importance of optimizing the characteristic of the reward function to the algorithm used.

摘要:在本文中,我们聚焦两种流行的最先进算法,即 OpenAI-ES 进化策略和近端策略优化(PPO)强化学习算法(两个算法家族中最相似的方法),分析进化策略与强化学习算法之间的定性差异。我们分析了这些方法在以下方面的差异:(i)总体效能,(ii)应对稀疏奖励的能力,(iii)发现最小解的倾向/能力,(iv)对奖励塑形的依赖性,以及(v)应对环境条件变化的能力。对用这两种方法在基准问题上训练的智能体所展现的性能和行为策略的分析,使我们得以揭示先前研究中未发现的定性差异,识别两种方法各自的相对弱点,并提出改善其中一些弱点的途径。我们表明,奖励函数的特性具有强烈影响,且这种影响不仅对 OpenAI-ES 和 PPO,对其他强化学习算法也呈现定性差异,从而证明了针对所用算法优化奖励函数特性的重要性。

ML-18-标题 Fundamental Laws of Binary Classification

链接: https://arxiv.org/abs/2205.07589
作者: Denise M. Reeves
备注: 180 pages, 21 figures: We present a comprehensive treatise on the binary classification of random vectors. We formulate the direct problem by generalizing a well-posed variant of Bayes’ decision rule. We formulate the inverse problem by generalizing a well-posed variant of the constrained optimization algorithm used by support vector machines to learn nonlinear decision boundaries

点击查看摘要

Abstract: Finding discriminant functions of minimum risk binary classification systems is a novel geometric locus problem – that requires solving a system of fundamental locus equations of binary classification – subject to deep-seated statistical laws. We show that a discriminant function of a minimum risk binary classification system is the solution of a locus equation that represents the geometric locus of the decision boundary of the system, wherein the discriminant function is connected to the decision boundary by an intrinsic eigen-coordinate system in such a manner that the discriminant function is represented by a geometric locus of a novel principal eigenaxis – formed by a dual locus of likelihood components and principal eigenaxis components. We demonstrate that a minimum risk binary classification system acts to jointly minimize its eigenenergy and risk by locating a point of equilibrium wherein critical minimum eigenenergies exhibited by the system are symmetrically concentrated in such a manner that the geometric locus of the novel principal eigenaxis of the system exhibits symmetrical dimensions and densities, such that counteracting and opposing forces and influences of the system are symmetrically balanced with each other – about the geometric center of the locus of the novel principal eigenaxis – whereon the statistical fulcrum of the system is located. Thereby, a minimum risk binary classification system satisfies a state of statistical equilibrium wherein the total allowed eigenenergy and the expected risk exhibited by the system are jointly minimized within the decision space of the system, so that the system exhibits the minimum probability of classification error.

摘要:寻找最小风险二元分类系统的判别函数是一个新颖的几何轨迹问题,需要求解一个二元分类基本轨迹方程组,并服从深层的统计规律。我们证明,最小风险二元分类系统的判别函数是一个轨迹方程的解,该方程表示系统决策边界的几何轨迹;其中判别函数通过一个内在特征坐标系与决策边界相连,使得判别函数由一个新颖主特征轴的几何轨迹表示,该轨迹由似然分量与主特征轴分量构成的对偶轨迹形成。我们证明,最小风险二元分类系统通过定位一个平衡点来共同最小化其特征能量和风险:在该平衡点处,系统表现出的临界最小特征能量对称地集中,使得系统新颖主特征轴的几何轨迹呈现对称的维度与密度,从而系统中相互抵消、相互对立的力与影响围绕该新颖主特征轴轨迹的几何中心彼此对称平衡,系统的统计支点正位于该几何中心。由此,最小风险二元分类系统满足一种统计平衡状态:系统允许的总特征能量与期望风险在系统的决策空间内被共同最小化,从而系统表现出最小的分类错误概率。

ML-19-标题 Chemical transformer compression for accelerating both training and inference of molecular modeling

链接: https://arxiv.org/abs/2205.07582
作者: Yi Yu, Karl Borjesson
备注:

点击查看摘要

Abstract: Transformer models have been developed in molecular science with excellent performance in applications including quantitative structure-activity relationship (QSAR) and virtual screening (VS). Compared with other types of models, however, they are large, which results in a high hardware requirement to abridge time for both training and inference processes. In this work, cross-layer parameter sharing (CLPS), and knowledge distillation (KD) are used to reduce the sizes of transformers in molecular science. Both methods not only have competitive QSAR predictive performance as compared to the original BERT model, but also are more parameter efficient. Furthermore, by integrating CLPS and KD into a two-state chemical network, we introduce a new deep lite chemical transformer model, DeLiCaTe. DeLiCaTe captures general-domains as well as task-specific knowledge, which lead to a 4x faster rate of both training and inference due to a 10- and 3-times reduction of the number of parameters and layers, respectively. Meanwhile, it achieves comparable performance in QSAR and VS modeling. Moreover, we anticipate that the model compression strategy provides a pathway to the creation of effective generative transformer models for organic drug and material design.

摘要:Transformer 模型已被引入分子科学,在定量构效关系(QSAR)和虚拟筛选(VS)等应用中表现出色。然而,与其他类型的模型相比,它们的体量较大,对硬件要求较高,训练和推理过程都耗时较长。在这项工作中,我们使用跨层参数共享(CLPS)和知识蒸馏(KD)来缩减分子科学中 Transformer 的规模。与原始 BERT 模型相比,这两种方法不仅具有可竞争的 QSAR 预测性能,而且参数效率更高。此外,通过将 CLPS 和 KD 整合到一个两阶段化学网络中,我们提出了一种新的轻量级化学 Transformer 模型 DeLiCaTe。DeLiCaTe 既能捕获通用领域知识,又能捕获任务特定知识;由于参数量和层数分别减少了 10 倍和 3 倍,其训练和推理速度都提升了 4 倍,同时在 QSAR 和 VS 建模中取得了可比的性能。此外,我们预计该模型压缩策略为构建面向有机药物和材料设计的有效生成式 Transformer 模型提供了一条途径。
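摘要中提到的"跨层参数共享(CLPS)"可以用一个极简示意来体会:同一组层参数在各深度上复用(ALBERT 式做法)。下面是一个假设性的玩具草图(纯 numpy,层结构为虚构,仅演示参数量的缩减,与论文实现无关):

```python
import numpy as np

def make_block(d_model, rng):
    # 一个玩具"Transformer 层":用单个权重矩阵代表该层的全部参数
    return rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

def forward(x, blocks):
    for W in blocks:
        x = np.maximum(x @ W, 0.0)  # 线性映射 + ReLU,仅作占位
    return x

def n_unique_params(blocks):
    # 按对象身份去重后统计参数量:被共享的层只计一次
    return sum(W.size for W in {id(W): W for W in blocks}.values())

rng = np.random.default_rng(0)
d_model, n_layers = 64, 12

unshared = [make_block(d_model, rng) for _ in range(n_layers)]  # 每层独立参数
shared = [make_block(d_model, rng)] * n_layers                  # 跨层共享同一组参数

ratio = n_unique_params(unshared) // n_unique_params(shared)    # 参数量缩减 12 倍
y = forward(np.ones((2, d_model)), shared)                      # 前向传播形状不变
```

可见共享后网络深度不变、前向计算不变,但独立参数量按层数成倍下降,这正是 CLPS 压缩模型体量的基本思路。
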

ML-20-标题 Autonomous Open-Ended Learning of Tasks with Non-Stationary Interdependencies

链接: https://arxiv.org/abs/2205.07562
作者: Alejandro Romero, Gianluca Baldassarre, Richard J. Duro, Vieri Giuliano Santucci
备注: Submitted and accepted to “The Multi-disciplinary Conference on Reinforcement Learning and Decision Making” RLDM 2022

点击查看摘要

Abstract: Autonomous open-ended learning is a relevant approach in machine learning and robotics, allowing the design of artificial agents able to acquire goals and motor skills without the necessity of user assigned tasks. A crucial issue for this approach is to develop strategies to ensure that agents can maximise their competence on as many tasks as possible in the shortest possible time. Intrinsic motivations have proven to generate a task-agnostic signal to properly allocate the training time amongst goals. While the majority of works in the field of intrinsically motivated open-ended learning focus on scenarios where goals are independent from each other, only few of them studied the autonomous acquisition of interdependent tasks, and even fewer tackled scenarios where goals involve non-stationary interdependencies. Building on previous works, we tackle these crucial issues at the level of decision making (i.e., building strategies to properly select between goals), and we propose a hierarchical architecture that treating sub-tasks selection as a Markov Decision Process is able to properly learn interdependent skills on the basis of intrinsically generated motivations. In particular, we first deepen the analysis of a previous system, showing the importance of incorporating information about the relationships between tasks at a higher level of the architecture (that of goal selection). Then we introduce H-GRAIL, a new system that extends the previous one by adding a new learning layer to store the autonomously acquired sequences of tasks to be able to modify them in case the interdependencies are non-stationary. All systems are tested in a real robotic scenario, with a Baxter robot performing multiple interdependent reaching tasks.

摘要:自主开放式学习是机器学习与机器人学中的一种重要方法,它使人工智能体能够在没有用户指定任务的情况下自主习得目标和运动技能。该方法的一个关键问题是设计策略,确保智能体能在尽可能短的时间内、在尽可能多的任务上最大化自身能力。内在动机已被证明可以产生与任务无关的信号,用于在各目标之间合理分配训练时间。内在动机驱动的开放式学习领域的大多数工作关注目标彼此独立的场景,只有少数研究了相互依赖任务的自主习得,而研究目标间存在非平稳相互依赖关系的工作则更少。在前人工作的基础上,我们在决策层面(即构建在目标之间正确选择的策略)解决这些关键问题,并提出一种层次化架构:将子任务选择视为马尔可夫决策过程,从而能够基于内在生成的动机正确学习相互依赖的技能。具体而言,我们首先深化了对先前系统的分析,说明在架构的更高层(目标选择层)纳入任务间关系信息的重要性。随后我们提出 H-GRAIL,它在先前系统之上增加了一个新的学习层,用于存储自主习得的任务序列,以便在相互依赖关系非平稳时对其进行修改。所有系统均在真实机器人场景中测试,由一台 Baxter 机器人执行多个相互依赖的够取(reaching)任务。

ML-21-标题 Reachability Constrained Reinforcement Learning

链接: https://arxiv.org/abs/2205.07536
作者: Dongjie Yu, Haitong Ma, Shengbo Eben Li, Jianyu Chen
备注: Accepted by ICML 2022

点击查看摘要

Abstract: Constrained Reinforcement Learning (CRL) has gained significant interest recently, since the satisfaction of safety constraints is critical for real world problems. However, existing CRL methods constraining discounted cumulative costs generally lack rigorous definition and guarantee of safety. On the other hand, in the safe control research, safety is defined as persistently satisfying certain state constraints. Such persistent safety is possible only on a subset of the state space, called feasible set, where an optimal largest feasible set exists for a given environment. Recent studies incorporating safe control with CRL using energy-based methods such as control barrier function (CBF), safety index (SI) leverage prior conservative estimation of feasible sets, which harms performance of the learned policy. To deal with this problem, this paper proposes a reachability CRL (RCRL) method by using reachability analysis to characterize the largest feasible sets. We characterize the feasible set by the established self-consistency condition, then a safety value function can be learned and used as constraints in CRL. We also use the multi-time scale stochastic approximation theory to prove that the proposed algorithm converges to a local optimum, where the largest feasible set can be guaranteed. Empirical results on different benchmarks such as safe-control-gym and Safety-Gym validate the learned feasible set, the performance in optimal criteria, and constraint satisfaction of RCRL, compared to state-of-the-art CRL baselines.

摘要:约束强化学习(CRL)近来受到广泛关注,因为满足安全约束对现实世界问题至关重要。然而,现有约束折扣累积成本的 CRL 方法通常缺乏对安全性的严格定义和保证。另一方面,在安全控制研究中,安全被定义为持续满足某些状态约束。这种持续的安全性只在状态空间的一个子集(称为可行集)上才有可能,且对给定环境存在一个最优的最大可行集。近期将安全控制与 CRL 结合的研究采用控制屏障函数(CBF)、安全指数(SI)等基于能量的方法,依赖于对可行集的先验保守估计,这损害了所学策略的性能。为了解决这个问题,本文提出一种可达性约束强化学习(RCRL)方法,利用可达性分析来刻画最大可行集。我们通过所建立的自洽条件来刻画可行集,进而学习一个安全价值函数并将其用作 CRL 中的约束。我们还利用多时间尺度随机逼近理论证明所提算法收敛到一个局部最优点,并在该点保证得到最大可行集。在 safe-control-gym 和 Safety-Gym 等不同基准上的实验结果验证了所学的可行集、最优准则下的性能,以及相比最先进 CRL 基线 RCRL 的约束满足能力。

ML-22-标题 Wasserstein t-SNE

链接: https://arxiv.org/abs/2205.07531
作者: Fynn Bachmann, Philipp Hennig, Dmitry Kobak
备注:

点击查看摘要

Abstract: Scientific datasets often have hierarchical structure: for example, in surveys, individual participants (samples) might be grouped at a higher level (units) such as their geographical region. In these settings, the interest is often in exploring the structure on the unit level rather than on the sample level. Units can be compared based on the distance between their means, however this ignores the within-unit distribution of samples. Here we develop an approach for exploratory analysis of hierarchical datasets using the Wasserstein distance metric that takes into account the shapes of within-unit distributions. We use t-SNE to construct 2D embeddings of the units, based on the matrix of pairwise Wasserstein distances between them. The distance matrix can be efficiently computed by approximating each unit with a Gaussian distribution, but we also provide a scalable method to compute exact Wasserstein distances. We use synthetic data to demonstrate the effectiveness of our Wasserstein t-SNE, and apply it to data from the 2017 German parliamentary election, considering polling stations as samples and voting districts as units. The resulting embedding uncovers meaningful structure in the data.

摘要:科学数据集通常具有层次结构:例如在调查中,个体参与者(样本)可能按更高层级(单元,如其所在地理区域)分组。在这类场景中,人们往往更关心单元层面而非样本层面的结构。可以根据均值之间的距离来比较各单元,但这忽略了样本在单元内部的分布。本文提出一种利用 Wasserstein 距离度量对层次数据集进行探索性分析的方法,该度量考虑了单元内部分布的形状。我们基于单元之间两两 Wasserstein 距离构成的矩阵,用 t-SNE 构建单元的二维嵌入。通过用高斯分布近似每个单元,可以高效地计算距离矩阵;我们还提供了一种可扩展的精确 Wasserstein 距离计算方法。我们用合成数据展示了 Wasserstein t-SNE 的有效性,并将其应用于 2017 年德国议会选举数据,将投票站视为样本、选区视为单元。所得嵌入揭示了数据中有意义的结构。
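摘要中"用高斯分布近似每个单元"这一步有闭式解:两个高斯之间的 2-Wasserstein 距离可直接算出。下面是一个仅依赖 numpy 的示意性草图(函数名为虚构;算出的两两距离矩阵随后可交给 t-SNE,例如 scikit-learn 的 `TSNE(metric="precomputed")`,具体细节以论文为准):

```python
import numpy as np

def psd_sqrtm(a):
    # 对称半正定矩阵的平方根(特征分解实现)
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def gaussian_w2(mu1, cov1, mu2, cov2):
    """两个高斯分布之间 2-Wasserstein 距离的闭式解:
    W2^2 = ||mu1-mu2||^2 + Tr(C1 + C2 - 2 (C1^{1/2} C2 C1^{1/2})^{1/2})"""
    s1 = psd_sqrtm(cov1)
    cross = psd_sqrtm(s1 @ cov2 @ s1)
    d2 = np.sum((np.asarray(mu1) - np.asarray(mu2)) ** 2) \
        + np.trace(cov1 + cov2 - 2.0 * cross)
    return float(np.sqrt(max(d2, 0.0)))

# 一维情形有简单公式 W2 = sqrt((mu1-mu2)^2 + (sigma1-sigma2)^2),可用来核对
d = gaussian_w2([0.0], np.eye(1), [3.0], 4.0 * np.eye(1))  # 应为 sqrt(10)
```

对每个"单元"先估计样本均值和协方差,再对所有单元两两调用上述函数,即得到可供 t-SNE 使用的距离矩阵。
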

ML-23-标题 A model aggregation approach for high-dimensional large-scale optimization

链接: https://arxiv.org/abs/2205.07525
作者: Haowei Wang, Ercong Zhang, Szu Hui Ng, Giulia Pedrielli
备注:

点击查看摘要

Abstract: Bayesian optimization (BO) has been widely used in machine learning and simulation optimization. With the increase in computational resources and storage capacities in these fields, high-dimensional and large-scale problems are becoming increasingly common. In this study, we propose a model aggregation method in the Bayesian optimization (MamBO) algorithm for efficiently solving high-dimensional large-scale optimization problems. MamBO uses a combination of subsampling and subspace embeddings to collectively address high dimensionality and large-scale issues; in addition, a model aggregation method is employed to address the surrogate model uncertainty issue that arises when embedding is applied. This surrogate model uncertainty issue is largely ignored in the embedding literature and practice, and it is exacerbated when the problem is high-dimensional and data are limited. Our proposed model aggregation method reduces these lower-dimensional surrogate model risks and improves the robustness of the BO algorithm. We derive an asymptotic bound for the proposed aggregated surrogate model and prove the convergence of MamBO. Benchmark numerical experiments indicate that our algorithm achieves superior or comparable performance to other commonly used high-dimensional BO algorithms. Moreover, we apply MamBO to a cascade classifier of a machine learning algorithm for face detection, and the results reveal that MamBO finds settings that achieve higher classification accuracy than the benchmark settings and is computationally faster than other high-dimensional BO algorithms.

摘要:贝叶斯优化(BO)已被广泛用于机器学习和仿真优化。随着这些领域计算资源和存储能力的提升,高维、大规模问题变得越来越普遍。在这项研究中,我们提出一种贝叶斯优化中的模型聚合方法(MamBO),用于高效求解高维大规模优化问题。MamBO 结合使用子采样和子空间嵌入来共同应对高维和大规模问题;此外,采用模型聚合方法来解决应用嵌入时出现的代理模型不确定性问题。这一代理模型不确定性问题在嵌入相关的文献和实践中大多被忽视,而当问题维度高且数据有限时,它会进一步加剧。我们提出的模型聚合方法降低了这些低维代理模型的风险,并提高了 BO 算法的鲁棒性。我们为所提出的聚合代理模型推导了渐近界,并证明了 MamBO 的收敛性。基准数值实验表明,我们的算法相比其他常用的高维 BO 算法具有更优或可比的性能。此外,我们将 MamBO 应用于人脸检测机器学习算法的级联分类器,结果表明 MamBO 找到的设置比基准设置取得了更高的分类精度,且计算速度比其他高维 BO 算法更快。

ML-24-标题 A scalable deep learning approach for solving high-dimensional dynamic optimal transport

链接: https://arxiv.org/abs/2205.07521
作者: Wei Wan, Yuejin Zhang, Chenglong Bao, Bin Dong, Zuoqiang Shi
备注:

点击查看摘要

Abstract: The dynamic formulation of optimal transport has attracted growing interests in scientific computing and machine learning, and its computation requires to solve a PDE-constrained optimization problem. The classical Eulerian discretization based approaches suffer from the curse of dimensionality, which arises from the approximation of high-dimensional velocity field. In this work, we propose a deep learning based method to solve the dynamic optimal transport in high dimensional space. Our method contains three main ingredients: a carefully designed representation of the velocity field, the discretization of the PDE constraint along the characteristics, and the computation of high dimensional integral by Monte Carlo method in each time step. Specifically, in the representation of the velocity field, we apply the classical nodal basis function in time and the deep neural networks in space domain with the H1-norm regularization. This technique promotes the regularity of the velocity field in both time and space such that the discretization along the characteristic remains to be stable during the training process. Extensive numerical examples have been conducted to test the proposed method. Compared to other solvers of optimal transport, our method could give more accurate results in high dimensional cases and has very good scalability with respect to dimension. Finally, we extend our method to more complicated cases such as crowd motion problem.

摘要:最优传输的动态形式在科学计算和机器学习中引起了越来越多的关注,其计算需要求解一个受 PDE 约束的优化问题。基于经典欧拉离散化的方法受维数灾难所困,这源于对高维速度场的逼近。在这项工作中,我们提出一种基于深度学习的方法来求解高维空间中的动态最优传输。我们的方法包含三个主要部分:精心设计的速度场表示、沿特征线对 PDE 约束的离散化,以及在每个时间步用蒙特卡洛方法计算高维积分。具体而言,在速度场的表示中,我们在时间上使用经典的节点基函数,在空间域使用带 H1 范数正则化的深度神经网络。该技术提升了速度场在时间和空间上的正则性,使得沿特征线的离散化在训练过程中保持稳定。我们进行了大量数值实验来测试所提方法。与其他最优传输求解器相比,我们的方法在高维情形下能给出更精确的结果,并且关于维数具有很好的可扩展性。最后,我们将方法推广到人群运动等更复杂的问题。

ML-25-标题 KGRGRL: A User's Permission Reasoning Method Based on Knowledge Graph Reward Guidance Reinforcement Learning

链接: https://arxiv.org/abs/2205.07502
作者: Lei Zhang, Yu Pan, Yi Liu, Qibin Zheng, Zhisong Pan
备注: 8 pages, 2 figures

点击查看摘要

Abstract: In general, multiple domain cyberspace security assessments can be implemented by reasoning user’s permissions. However, while existing methods include some information from the physical and social domains, they do not provide a comprehensive representation of cyberspace. Existing reasoning methods are also based on expert-given rules, resulting in inefficiency and a low degree of intelligence. To address this challenge, we create a Knowledge Graph (KG) of multiple domain cyberspace in order to provide a standard semantic description of the multiple domain cyberspace. Following that, we proposed a user’s permissions reasoning method based on reinforcement learning. All permissions in cyberspace are represented as nodes, and an agent is trained to find all permissions that user can have according to user’s initial permissions and cyberspace KG. We set 10 reward setting rules based on the features of cyberspace KG in the reinforcement learning of reward information setting, so that the agent can better locate user’s all permissions and avoid blindly finding user’s permissions. The results of the experiments showed that the proposed method can successfully reason about user’s permissions and increase the intelligence level of the user’s permissions reasoning method. At the same time, the F1 value of the proposed method is 6% greater than that of the Translating Embedding (TransE) method.

摘要:一般而言,多领域网络空间安全评估可以通过推理用户的权限来实现。然而,现有方法虽然包含了来自物理域和社会域的部分信息,却无法全面地表示网络空间;现有的推理方法还依赖专家给定的规则,导致效率低下且智能程度不高。为应对这一挑战,我们构建了多领域网络空间的知识图谱(KG),以提供对多领域网络空间的标准语义描述。在此基础上,我们提出了一种基于强化学习的用户权限推理方法:网络空间中的所有权限均表示为节点,并训练一个智能体根据用户的初始权限和网络空间知识图谱找出用户可以拥有的全部权限。在奖励信息设置的强化学习中,我们根据网络空间知识图谱的特点设定了 10 条奖励规则,使智能体能够更好地定位用户的全部权限,避免盲目搜索。实验结果表明,所提方法能够成功推理用户的权限,并提升了用户权限推理方法的智能水平。同时,所提方法的 F1 值比平移嵌入(TransE)方法高 6%。

ML-26-标题 Multi-scale Attention Flow for Probabilistic Time Series Forecasting

链接: https://arxiv.org/abs/2205.07493
作者: Shibo Feng, Ke Xu, Jiaxiang Wu, Pengcheng Wu, Fan Lin, Peilin Zhao
备注:

点击查看摘要

Abstract: The probability prediction of multivariate time series is a notoriously challenging but practical task. On the one hand, the challenge is how to effectively capture the cross-series correlations between interacting time series, to achieve accurate distribution modeling. On the other hand, we should consider how to capture the contextual information within time series more accurately to model multivariate temporal dynamics of time series. In this work, we proposed a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow(MANF), where we integrate multi-scale attention and relative position information and the multivariate data distribution is represented by the conditioned normalizing flow. Additionally, compared with autoregressive modeling methods, our model avoids the influence of cumulative error and does not increase the time complexity. Extensive experiments demonstrate that our model achieves state-of-the-art performance on many popular multivariate datasets.

摘要:多元时间序列的概率预测是一项极具挑战性而又实用的任务。一方面,挑战在于如何有效捕获相互作用的时间序列之间的跨序列相关性,以实现准确的分布建模;另一方面,还应考虑如何更准确地捕获时间序列内部的上下文信息,以建模时间序列的多元时间动态。在这项工作中,我们提出了一种新颖的非自回归深度学习模型,称为多尺度注意力归一化流(MANF):我们整合了多尺度注意力和相对位置信息,并用条件归一化流来表示多元数据分布。此外,与自回归建模方法相比,我们的模型避免了累积误差的影响,且不会增加时间复杂度。大量实验表明,我们的模型在许多流行的多元数据集上取得了最先进的性能。

ML-27-标题 Robust Testing in High-Dimensional Sparse Models

链接: https://arxiv.org/abs/2205.07488
作者: Anand Jerry George, Clément L. Canonne
备注:

点击查看摘要

Abstract: We consider the problem of robustly testing the norm of a high-dimensional sparse signal vector under two different observation models. In the first model, we are given n i.i.d. samples from the distribution \mathcal{N}\left(\theta,I_d\right) (with unknown \theta ), of which a small fraction has been arbitrarily corrupted. Under the promise that \|\theta\|_0\le s , we want to correctly distinguish whether \|\theta\|_2=0 or \|\theta\|_2>\gamma , for some input parameter \gamma>0 . We show that any algorithm for this task requires n=\Omega\left(s\log\frac{ed}{s}\right) samples, which is tight up to logarithmic factors. We also extend our results to other common notions of sparsity, namely, \|\theta\|_q\le s for any 0 < q < 2 . In the second observation model that we consider, the data is generated according to a sparse linear regression model, where the covariates are i.i.d. Gaussian and the regression coefficient (signal) is known to be s -sparse. Here too we assume that an \epsilon -fraction of the data is arbitrarily corrupted. We show that any algorithm that reliably tests the norm of the regression coefficient requires at least n=\Omega\left(\min(s\log d,{1}/{\gamma^4})\right) samples. Our results show that the complexity of testing in these two settings significantly increases under robustness constraints. This is in line with the recent observations made in robust mean testing and robust covariance testing.

摘要:我们考虑在两种不同的观测模型下稳健地检验高维稀疏信号向量范数的问题。在第一个模型中,我们获得来自分布 \mathcal{N}(\theta, I_d)(\theta 未知)的 n 个独立同分布样本,其中一小部分被任意污染。在 \|\theta\|_0 \le s 的前提下,对于某个输入参数 \gamma>0,我们希望正确区分 \|\theta\|_2=0 与 \|\theta\|_2>\gamma。我们证明,任何完成该任务的算法都需要 n=\Omega(s\log\frac{ed}{s}) 个样本,该下界在对数因子意义下是紧的。我们还将结果推广到其他常见的稀疏性概念,即对任意 0<q<2 的 \|\theta\|_q\le s。在我们考虑的第二个观测模型中,数据由稀疏线性回归模型生成,其中协变量为独立同分布的高斯变量,且已知回归系数(信号)是 s-稀疏的。这里我们同样假设 \epsilon 比例的数据被任意污染。我们证明,任何可靠检验回归系数范数的算法至少需要 n=\Omega(\min(s\log d, 1/\gamma^4)) 个样本。我们的结果表明,在稳健性约束下,这两种设定中检验问题的复杂度都显著增加。这与近期在稳健均值检验和稳健协方差检验中得到的观察结果一致。

ML-28-标题 Learning-Based sensitivity analysis and feedback design for drug delivery of mixed therapy of cancer in the presence of high model uncertainties

链接: https://arxiv.org/abs/2205.07482
作者: Mazen Alamir
备注:

点击查看摘要

Abstract: In this paper, a methodology is proposed that enables to analyze the sensitivity of the outcome of a therapy to unavoidable high dispersion of the patient specific parameters on one hand and to the choice of the parameters that define the drug delivery feedback strategy on the other hand. More precisely, a method is given that enables to extract and rank the most influent parameters that determine the probability of success/failure of a given feedback therapy for a given set of initial conditions over a cloud of realizations of uncertainties. Moreover predictors of the expectations of the amounts of drugs being used can also be derived. This enables to design an efficient stochastic optimization framework that guarantees safe contraction of the tumor while minimizing a weighted sum of the quantities of the different drugs being used. The framework is illustrated and validated using the example of a mixed therapy of cancer involving three combined drugs namely: a chemotherapy drug, an immunology vaccine and an immunotherapy drug. Finally, in this specific case, it is shown that dash-boards can be built in the 2D-space of the most influent state components that summarize the outcomes’ probabilities and the associated drug usage as iso-values curves in the reduced state space.

摘要:本文提出了一种方法,既能分析治疗结果对患者特定参数不可避免的高度离散性的敏感度,又能分析其对定义给药反馈策略的参数选择的敏感度。更准确地说,该方法能够在不确定性实现的样本云上,针对给定的一组初始条件,提取并排序决定某一反馈疗法成功或失败概率的最有影响力的参数。此外,还可以得到所用药量期望值的预测器。这使得可以设计一个高效的随机优化框架,在保证肿瘤安全缩小的同时,最小化各种药物用量的加权和。我们以一种涉及三种联合用药的癌症混合疗法为例对该框架进行了说明和验证,这三种药物分别为:化疗药物、免疫疫苗和免疫治疗药物。最后,在这一具体案例中,我们表明可以在最具影响力的状态分量构成的二维空间中构建仪表板,以降维状态空间中等值曲线的形式汇总各种结果的概率及相应的药物用量。

ML-29-标题 Towards Lossless ANN-SNN Conversion under Ultra-Low Latency with Dual-Phase Optimization

链接: https://arxiv.org/abs/2205.07473
作者: Ziming Wang, Shuang Lian, Yuhao Zhang, Xiaoxin Cui, Rui Yan, Huajin Tang
备注:

点击查看摘要

Abstract: Spiking neural network (SNN) operating with asynchronous discrete events shows higher energy efficiency. A popular approach to implement deep SNNs is ANN-SNN conversion combining both efficient training in ANNs and efficient inference in SNNs. However, the previous works mostly required thousands of time steps to achieve lossless conversion. In this paper, we first identify the underlying cause, i.e., misrepresentation of the negative or overflow residual membrane potential in SNNs. Furthermore, we systematically analyze the conversion error between SNNs and ANNs, and then decompose it into three folds: quantization error, clipping error, and residual membrane potential representation error. With such insights, we propose a dual-phase conversion algorithm to minimize those errors. As a result, our model achieves SOTA in both accuracy and accuracy-delay tradeoff with deep architectures (ResNet and VGG net). Specifically, we report SOTA accuracy within 16 \times speedup compared with the latest results. Meanwhile, lossless conversion is performed with at least 2 \times faster reasoning performance.

摘要:以异步离散事件运行的脉冲神经网络(SNN)具有更高的能效。实现深层 SNN 的一种流行方法是 ANN-SNN 转换,它结合了 ANN 的高效训练和 SNN 的高效推理。然而,以往的工作大多需要数千个时间步才能实现无损转换。在本文中,我们首先找出了其根本原因,即 SNN 中对负的或溢出的残余膜电位的错误表示。此外,我们系统地分析了 SNN 与 ANN 之间的转换误差,并将其分解为三部分:量化误差、截断误差和残余膜电位表示误差。基于这些洞见,我们提出了一种双阶段转换算法来最小化这些误差。结果,我们的模型在深层架构(ResNet 和 VGG 网络)上,在精度以及精度-延迟权衡两方面均达到最先进水平。具体而言,与最新结果相比,我们在 16 倍加速下仍达到 SOTA 精度;同时,无损转换的推理速度至少快 2 倍。
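摘要中的"量化误差"和"截断误差"可以用一个常见的理想化模型来直观体会:IF 神经元在 T 个时间步内的平均发放率相当于对 ReLU 做"先截断、再量化"。下面是一个示意性小例子(简化模型为假设,并非论文的双阶段算法本身):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def snn_rate(x, theta=1.0, T=8):
    # IF 神经元在 T 个时间步内的平均发放率:
    # 对 ReLU 的"截断 + 量化"近似(阈值 theta,分辨率 theta/T)
    return (theta / T) * np.clip(np.floor(x * T / theta), 0.0, T)

x = np.linspace(0.0, 1.0, 1001)                        # 在 [0, theta] 内只剩量化误差
quant_err = np.max(np.abs(snn_rate(x) - relu(x)))      # 不超过 theta/T
clip_err = float(relu(2.0) - snn_rate(np.array(2.0)))  # x > theta 时出现截断误差
```

可以看到:时间步 T 越大,量化误差越小,但推理延迟越高——这正是摘要中"精度-延迟权衡"的来源;而超过阈值的输入无论 T 多大都会被截断。
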

ML-30-标题 q-Munchausen Reinforcement Learning

链接: https://arxiv.org/abs/2205.07467
作者: Lingwei Zhu, Zheng Chen, Eiji Uchibe, Takamitsu Matsubara
备注:

点击查看摘要

Abstract: The recently successful Munchausen Reinforcement Learning (M-RL) features implicit Kullback-Leibler (KL) regularization by augmenting the reward function with logarithm of the current stochastic policy. Though significant improvement has been shown with the Boltzmann softmax policy, when the Tsallis sparsemax policy is considered, the augmentation leads to a flat learning curve for almost every problem considered. We show that it is due to the mismatch between the conventional logarithm and the non-logarithmic (generalized) nature of Tsallis entropy. Drawing inspiration from the Tsallis statistics literature, we propose to correct the mismatch of M-RL with the help of q -logarithm/exponential functions. The proposed formulation leads to implicit Tsallis KL regularization under the maximum Tsallis entropy framework. We show such formulation of M-RL again achieves superior performance on benchmark problems and sheds light on more general M-RL with various entropic indices q .

摘要:最近取得成功的 Munchausen 强化学习(M-RL)通过在奖励函数中加入当前随机策略的对数,实现了隐式的 Kullback-Leibler(KL)正则化。尽管在 Boltzmann softmax 策略下已显示出显著改进,但在采用 Tsallis sparsemax 策略时,这种增强几乎在所有问题上都导致学习曲线停滞。我们证明,这是由于常规对数与 Tsallis 熵的非对数(广义)性质之间的不匹配所致。受 Tsallis 统计文献的启发,我们建议借助 q-对数/指数函数来纠正 M-RL 的这种不匹配。所提出的形式在最大 Tsallis 熵框架下导出隐式的 Tsallis KL 正则化。我们证明这种 M-RL 的表述同样在基准问题上取得了卓越性能,并对具有各种熵指数 q 的更一般的 M-RL 提供了启示。
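摘要中提到的 q-对数/指数是 Tsallis 统计中的标准定义:ln_q(x) = (x^{1-q} - 1)/(1-q),exp_q 为其逆,q → 1 时退化为普通 log/exp。下面给出一个最小实现(仅演示这对函数本身,不涉及论文的 RL 算法):

```python
import numpy as np

def q_log(x, q):
    """Tsallis q-对数;q -> 1 时趋于 log(x)。"""
    if q == 1.0:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_exp(x, q):
    """Tsallis q-指数:在定义域内是 q_log 的逆。"""
    if q == 1.0:
        return np.exp(x)
    base = np.maximum(1.0 + (1.0 - q) * x, 0.0)  # 定义域外截断为 0
    return base ** (1.0 / (1.0 - q))
```

例如 q=2 时 ln_2(x) = 1 - 1/x、exp_2(y) = 1/(1-y);在 M-RL 的语境下,正是用这样的 q-对数替换奖励增广项中的普通对数,来匹配 Tsallis 熵的广义对数结构。
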

ML-31-标题 Gradient Descent Optimizes Infinite-Depth ReLU Implicit Networks with Linear Widths

链接: https://arxiv.org/abs/2205.07463
作者: Tianxiang Gao, Hongyang Gao
备注:

点击查看摘要

Abstract: Implicit deep learning has recently become popular in the machine learning community since these implicit models can achieve competitive performance with state-of-the-art deep networks while using significantly less memory and computational resources. However, our theoretical understanding of when and how first-order methods such as gradient descent (GD) converge on \textit{nonlinear} implicit networks is limited. Although this type of problem has been studied in standard feed-forward networks, the case of implicit models is still intriguing because implicit networks have \textit{infinitely} many layers. The corresponding equilibrium equation probably admits no or multiple solutions during training. This paper studies the convergence of both gradient flow (GF) and gradient descent for nonlinear ReLU activated implicit networks. To deal with the well-posedness problem, we introduce a fixed scalar to scale the weight matrix of the implicit layer and show that there exists a small enough scaling constant, keeping the equilibrium equation well-posed throughout training. As a result, we prove that both GF and GD converge to a global minimum at a linear rate if the width m of the implicit network is \textit{linear} in the sample size N , i.e., m=\Omega(N) .

摘要:隐式深度学习最近在机器学习社区中流行起来,因为这些隐式模型能够以明显更少的内存和计算资源达到与最先进深度网络相当的性能。然而,我们对梯度下降(GD)等一阶方法何时以及如何在非线性隐式网络上收敛的理论理解仍然有限。尽管这类问题已在标准前馈网络中得到研究,隐式模型的情形仍然引人关注,因为隐式网络具有"无限"多层,其对应的平衡方程在训练过程中可能无解或有多个解。本文研究非线性 ReLU 激活隐式网络上梯度流(GF)和梯度下降的收敛性。为处理适定性问题,我们引入一个固定标量来缩放隐式层的权重矩阵,并证明存在一个足够小的缩放常数,使平衡方程在整个训练过程中保持适定。由此,我们证明:若隐式网络的宽度 m 关于样本量 N 是线性的,即 m=\Omega(N),则 GF 和 GD 均以线性速率收敛到全局最小值。

ML-32-标题 A Deep Reinforcement Learning Blind AI in DareFightingICE

链接: https://arxiv.org/abs/2205.07444
作者: Thai Van Nguyen, Xincheng Dai, Ibrahim Khan, Ruck Thawonmas, Hai V. Pham
备注:

点击查看摘要

Abstract: This paper presents a deep reinforcement learning AI that uses sound as the input on the DareFightingICE platform at the DareFightingICE Competition in IEEE CoG 2022. In this work, an AI that only uses sound as the input is called blind AI. While state-of-the-art AIs rely mostly on visual or structured observations provided by their environments, learning to play games from only sound is still new and thus challenging. We propose different approaches to process audio data and use the Proximal Policy Optimization algorithm for our blind AI. We also propose to use our blind AI in evaluation of sound designs submitted to the competition and define three metrics for this task. The experimental results show the effectiveness of not only our blind AI but also the proposed three metrics.

摘要:本文提出了一种以声音为输入的深度强化学习 AI,用于 IEEE CoG 2022 DareFightingICE 竞赛中的 DareFightingICE 平台。在这项工作中,只以声音为输入的 AI 被称为"盲" AI。最先进的 AI 大多依赖环境提供的视觉或结构化观测,而仅凭声音学习玩游戏仍然是新颖且具有挑战性的。我们提出了处理音频数据的不同方法,并为盲 AI 使用近端策略优化(PPO)算法。我们还建议用盲 AI 来评估提交给竞赛的声音设计,并为此任务定义了三个指标。实验结果不仅展示了盲 AI 的有效性,也展示了所提三个指标的有效性。

ML-33-标题 Optimizing the optimizer for data driven deep neural networks and physics informed neural networks

链接: https://arxiv.org/abs/2205.07430
作者: John Taylor, Wenyi Wang, Biswajit Bala, Tomasz Bednarz
备注: 23 page, 12 figures

点击查看摘要

Abstract: We investigate the role of the optimizer in determining the quality of the model fit for neural networks with a small to medium number of parameters. We study the performance of Adam, an algorithm for first-order gradient-based optimization that uses adaptive momentum, the Levenberg and Marquardt (LM) algorithm a second order method, Broyden,Fletcher,Goldfarb and Shanno algorithm (BFGS) a second order method and LBFGS, a low memory version of BFGS. Using these optimizers we fit the function y = sinc(10x) using a neural network with a few parameters. This function has a variable amplitude and a constant frequency. We observe that the higher amplitude components of the function are fitted first and the Adam, BFGS and LBFGS struggle to fit the lower amplitude components of the function. We also solve the Burgers equation using a physics informed neural network(PINN) with the BFGS and LM optimizers. For our example problems with a small to medium number of weights, we find that the LM algorithm is able to rapidly converge to machine precision offering significant benefits over other optimizers. We further investigated the Adam optimizer with a range of models and found that Adam optimiser requires much deeper models with large numbers of hidden units containing up to 26x more parameters, in order to achieve a model fit close that achieved by the LM optimizer. The LM optimizer results illustrate that it may be possible build models with far fewer parameters. We have implemented all our methods in Keras and TensorFlow 2.

摘要:我们研究了优化器在决定中小参数量神经网络模型拟合质量方面的作用。我们比较了以下优化器的性能:Adam(一种使用自适应动量的一阶梯度优化算法)、Levenberg-Marquardt(LM)算法(一种二阶方法)、Broyden-Fletcher-Goldfarb-Shanno(BFGS)算法(一种二阶方法)以及 BFGS 的低内存版本 LBFGS。我们用一个参数较少的神经网络来拟合函数 y = sinc(10x)。该函数幅值可变、频率恒定。我们观察到函数中幅值较大的分量最先被拟合,而 Adam、BFGS 和 LBFGS 难以拟合幅值较小的分量。我们还使用带 BFGS 和 LM 优化器的物理信息神经网络(PINN)求解了 Burgers 方程。对于我们这些权重数量中小规模的示例问题,我们发现 LM 算法能够快速收敛到机器精度,相比其他优化器具有显著优势。我们进一步在一系列模型上研究了 Adam 优化器,发现为了接近 LM 优化器达到的拟合效果,Adam 需要深得多的模型和大量隐藏单元,参数量最多高出 26 倍。LM 优化器的结果表明,用少得多的参数构建模型或许是可能的。我们的所有方法都在 Keras 和 TensorFlow 2 中实现。

ML-34-标题 On the Convergence of the Shapley Value in Parametric Bayesian Learning Games

链接: https://arxiv.org/abs/2205.07428
作者: Lucas Agussurja, Xinyi Xu, Bryan Kian Hsiang Low
备注: To appear in the 39th International Conference on Machine Learning (ICML 2022). Extended version with derivations

点击查看摘要

Abstract: Measuring contributions is a classical problem in cooperative game theory where the Shapley value is the most well-known solution concept. In this paper, we establish the convergence property of the Shapley value in parametric Bayesian learning games where players perform a Bayesian inference using their combined data, and the posterior-prior KL divergence is used as the characteristic function. We show that for any two players, under some regularity conditions, their difference in Shapley value converges in probability to the difference in Shapley value of a limiting game whose characteristic function is proportional to the log-determinant of the joint Fisher information. As an application, we present an online collaborative learning framework that is asymptotically Shapley-fair. Our result enables this to be achieved without any costly computations of posterior-prior KL divergences. Only a consistent estimator of the Fisher information is needed. The framework’s effectiveness is demonstrated with experiments using real-world data.

摘要:度量贡献是合作博弈论中的经典问题,其中 Shapley 值是最著名的解概念。在本文中,我们在参数化贝叶斯学习博弈中建立了 Shapley 值的收敛性:各参与方利用合并的数据进行贝叶斯推断,并以后验-先验 KL 散度作为特征函数。我们证明,在某些正则性条件下,任意两个参与方的 Shapley 值之差依概率收敛到一个极限博弈中 Shapley 值之差,该极限博弈的特征函数与联合 Fisher 信息的对数行列式成正比。作为应用,我们提出一个渐近 Shapley 公平的在线协作学习框架。我们的结果使这一目标无需任何代价高昂的后验-先验 KL 散度计算即可实现,只需要 Fisher 信息的一个一致估计量。基于真实世界数据的实验验证了该框架的有效性。
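作为背景,Shapley 值的定义是"参与方在所有到达顺序下边际贡献的平均"。下面是一个按定义直接枚举的通用小实现(其中的特征函数 v 为虚构的玩具博弈,与论文的 KL 散度特征函数无关,仅演示计算方式):

```python
from itertools import permutations

def shapley_values(players, v):
    """按定义精确计算 Shapley 值:对所有参与方排列平均其边际贡献。"""
    values = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            values[p] += v(frozenset(coalition)) - before  # p 的边际贡献
    n = len(perms)
    return {p: s / n for p, s in values.items()}

# 虚构的 3 人博弈:联盟只要包含参与方 1,价值即为 1,否则为 0
v = lambda S: 1.0 if 1 in S else 0.0
phi = shapley_values([1, 2, 3], v)  # 全部价值应归于参与方 1
```

这种按定义的枚举是 n! 复杂度,只适用于很小的 n;论文的意义之一正是在贝叶斯学习博弈中刻画其极限行为,从而避开昂贵的特征函数求值。
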

ML-35-标题 Exploring the Learning Difficulty of Data: Theory and Measure

链接: https://arxiv.org/abs/2205.07427
作者: Weiyao Zhu, Ou Wu, Fengguang Su, Yingjun Deng
备注:

点击查看摘要

Abstract: As learning difficulty is crucial for machine learning (e.g., difficulty-based weighting learning strategies), previous literature has proposed a number of learning difficulty measures. However, no comprehensive investigation for learning difficulty is available to date, resulting in that nearly all existing measures are heuristically defined without a rigorous theoretical foundation. In addition, there is no formal definition of easy and hard samples even though they are crucial in many studies. This study attempts to conduct a pilot theoretical study for learning difficulty of samples. First, a theoretical definition of learning difficulty is proposed on the basis of the bias-variance trade-off theory on generalization error. Theoretical definitions of easy and hard samples are established on the basis of the proposed definition. A practical measure of learning difficulty is given as well inspired by the formal definition. Second, the properties for learning difficulty-based weighting strategies are explored. Subsequently, several classical weighting methods in machine learning can be well explained on account of explored properties. Third, the proposed measure is evaluated to verify its reasonability and superiority in terms of several main difficulty factors. The comparison in these experiments indicates that the proposed measure significantly outperforms the other measures throughout the experiments.

摘要:由于学习难度对机器学习至关重要(例如基于难度的加权学习策略),已有文献提出了多种学习难度度量。然而,迄今为止尚无对学习难度的系统研究,导致几乎所有现有度量都是启发式定义的,缺乏严格的理论基础。此外,尽管易样本和难样本在许多研究中至关重要,却一直没有正式定义。本研究尝试对样本学习难度进行初步的理论探索。首先,基于泛化误差的偏差-方差权衡理论给出学习难度的理论定义,并据此建立易样本与难样本的理论定义;受该形式化定义启发,进一步给出一种实用的学习难度度量。其次,探讨了基于学习难度的加权策略所应具备的性质,据此可以很好地解释机器学习中的若干经典加权方法。第三,针对若干主要难度因素评估了所提度量的合理性和优越性。实验对比表明,所提度量在各项实验中均显著优于其他度量。

ML-36-标题 Trustworthy Graph Neural Networks: Aspects, Methods and Trends

链接: https://arxiv.org/abs/2205.07424
作者: He Zhang, Bang Wu, Xingliang Yuan, Shirui Pan, Hanghang Tong, Jian Pei
备注: 36 pages, 7 tables, 4 figures

点击查看摘要

Abstract: Graph neural networks (GNNs) have emerged as a series of competent graph learning methods for diverse real-world scenarios, ranging from daily applications like recommendation systems and question answering to cutting-edge technologies such as drug discovery in life sciences and n-body simulation in astrophysics. However, task performance is not the only requirement for GNNs. Performance-oriented GNNs have exhibited potential adverse effects like vulnerability to adversarial attacks, unexplainable discrimination against disadvantaged groups, or excessive resource consumption in edge computing environments. To avoid these unintentional harms, it is necessary to build competent GNNs characterised by trustworthiness. To this end, we propose a comprehensive roadmap to build trustworthy GNNs from the view of the various computing technologies involved. In this survey, we introduce basic concepts and comprehensively summarise existing efforts for trustworthy GNNs from six aspects, including robustness, explainability, privacy, fairness, accountability, and environmental well-being. Additionally, we highlight the intricate cross-aspect relations between the above six aspects of trustworthy GNNs. Finally, we present a thorough overview of trending directions for facilitating the research and industrialisation of trustworthy GNNs.

摘要:图神经网络(GNN)已经成为一类能胜任多种现实场景的图学习方法,其应用范围从推荐系统、问答等日常应用,一直延伸到生命科学中的药物发现、天体物理学中的N体模拟等前沿技术。然而,任务性能并非对GNN的唯一要求。以性能为导向的GNN可能带来潜在的负面影响,例如易受对抗攻击、对弱势群体做出难以解释的歧视,或在边缘计算环境中消耗过多资源。为避免这些非预期的危害,有必要构建以可信性为特征的GNN。为此,我们从所涉及的各类计算技术的角度出发,提出了构建可信GNN的完整路线图。在这篇综述中,我们介绍基本概念,并从鲁棒性、可解释性、隐私、公平性、可问责性和环境友好性六个方面全面总结了可信GNN的现有工作。此外,我们还强调了上述六个方面之间错综复杂的交叉关系。最后,我们全面概述了推动可信GNN研究和产业化的趋势方向。

ML-37-标题 TNN7: A Custom Macro Suite for Implementing Highly Optimized Designs of Neuromorphic TNNs

链接: https://arxiv.org/abs/2205.07410
作者: Harideep Nair, Prabhu Vellaisamy, Santha Bhasuthkar, John Paul Shen
备注: To be published in ISVLSI 2022

点击查看摘要

Abstract: Temporal Neural Networks (TNNs), inspired from the mammalian neocortex, exhibit energy-efficient online sensory processing capabilities. Recent works have proposed a microarchitecture design framework for implementing TNNs and demonstrated competitive performance on vision and time-series applications. Building on them, this work proposes TNN7, a suite of nine highly optimized custom macros developed using a predictive 7nm Process Design Kit (PDK), to enhance the efficiency, modularity and flexibility of the TNN design framework. TNN prototypes for two applications are used for evaluation of TNN7. An unsupervised time-series clustering TNN delivering competitive performance can be implemented within 40 uW power and 0.05 mm^2 area, while a 4-layer TNN that achieves an MNIST error rate of 1% consumes only 18 mW and 24.63 mm^2. On average, the proposed macros reduce power, delay, area, and energy-delay product by 14%, 16%, 28%, and 45%, respectively. Furthermore, employing TNN7 significantly reduces the synthesis runtime of TNN designs (by more than 3x), allowing for highly-scaled TNN implementations to be realized.

摘要:时序神经网络(TNN)受哺乳动物新皮层启发,具有高能效的在线感知处理能力。近期工作提出了实现TNN的微体系结构设计框架,并在视觉和时间序列应用上展示了有竞争力的性能。在此基础上,本文提出TNN7:一套基于预测性7nm工艺设计套件(PDK)开发的九个高度优化的定制宏单元,用以提升TNN设计框架的效率、模块化程度和灵活性。我们用两个应用的TNN原型来评估TNN7:一个性能有竞争力的无监督时间序列聚类TNN可在40 uW功耗和0.05 mm^2面积内实现;一个MNIST错误率为1%的4层TNN仅消耗18 mW和24.63 mm^2。所提出的宏单元平均将功耗、延迟、面积和能量延迟积分别降低14%、16%、28%和45%。此外,采用TNN7可将TNN设计的综合运行时间显著缩短(超过3倍),从而使高度扩展的TNN实现成为可能。

ML-38-标题 Training neural networks using Metropolis Monte Carlo and an adaptive variant

链接: https://arxiv.org/abs/2205.07408
作者: Stephen Whitelam, Viktor Selin, Ian Benlolo, Isaac Tamblyn
备注:

点击查看摘要

Abstract: We examine the zero-temperature Metropolis Monte Carlo algorithm as a tool for training a neural network by minimizing a loss function. We find that, as expected on theoretical grounds and shown empirically by other authors, Metropolis Monte Carlo can train a neural net with an accuracy comparable to that of gradient descent, if not necessarily as quickly. The Metropolis algorithm does not fail automatically when the number of parameters of a neural network is large. It can fail when a neural network’s structure or neuron activations are strongly heterogenous, and we introduce an adaptive Monte Carlo algorithm, aMC, to overcome these limitations. The intrinsic stochasticity of the Monte Carlo method allows aMC to train neural networks in which the gradient is too small to allow training by gradient descent. We suggest that, as for molecular simulation, Monte Carlo methods offer a complement to gradient-based methods for training neural networks, allowing access to a distinct set of network architectures and principles.

摘要:我们研究零温度Metropolis蒙特卡洛算法作为通过最小化损失函数来训练神经网络的工具。我们发现,正如理论上预期并由其他作者的实验所证实的那样,Metropolis蒙特卡洛可以将神经网络训练到与梯度下降相当的精度,尽管未必同样快。当神经网络的参数数量很大时,Metropolis算法并不会自动失效;但当网络结构或神经元激活高度异质时,它可能失效,为此我们引入了一种自适应蒙特卡洛算法aMC来克服这些局限。蒙特卡洛方法固有的随机性使aMC能够训练那些梯度过小而无法用梯度下降训练的神经网络。我们认为,与分子模拟类似,蒙特卡洛方法为基于梯度的神经网络训练方法提供了一种补充,使我们得以探索另一类网络体系结构与原理。
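
作为示意,下面给出零温度Metropolis蒙特卡洛训练的一个最小Python草图:随机扰动参数,仅当损失不增时才接受。这里用单参数线性模型代替真实的神经网络,步数、步长等超参数均为假设,与论文实验无直接对应:

```python
import random

def loss(w, data):
    # Mean squared error of a one-parameter model y = w * x.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def metropolis_train(data, steps=5000, sigma=0.1, seed=0):
    """Zero-temperature Metropolis Monte Carlo: propose a Gaussian
    perturbation of the weight and accept only non-increasing loss."""
    rng = random.Random(seed)
    w = 0.0
    current = loss(w, data)
    for _ in range(steps):
        proposal = w + rng.gauss(0.0, sigma)
        candidate = loss(proposal, data)
        if candidate <= current:  # zero temperature: never move uphill
            w, current = proposal, candidate
    return w

# Toy dataset generated by y = 3x; training should recover w ≈ 3.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)]
w_hat = metropolis_train(data)
```

有限温度版本只需以概率 exp(-Δloss/T) 接受上坡移动即可;上面取 T→0 的极限。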

ML-39-标题 Sibyl: Adaptive and Extensible Data Placement in Hybrid Storage Systems Using Online Reinforcement Learning

链接: https://arxiv.org/abs/2205.07394
作者: Gagandeep Singh, Rakesh Nadig, Jisung Park, Rahul Bera, Nastaran Hajinazar, David Novo, Juan Gómez-Luna, Sander Stuijk, Henk Corporaal, Onur Mutlu
备注:

点击查看摘要

Abstract: Hybrid storage systems (HSS) use multiple different storage devices to provide high and scalable storage capacity at high performance. Recent research proposes various techniques that aim to accurately identify performance-critical data to place it in a “best-fit” storage device. Unfortunately, most of these techniques are rigid, which (1) limits their adaptivity to perform well for a wide range of workloads and storage device configurations, and (2) makes it difficult for designers to extend these techniques to different storage system configurations (e.g., with a different number or different types of storage devices) than the configuration they are designed for. We introduce Sibyl, the first technique that uses reinforcement learning for data placement in hybrid storage systems. Sibyl observes different features of the running workload as well as the storage devices to make system-aware data placement decisions. For every decision it makes, Sibyl receives a reward from the system that it uses to evaluate the long-term performance impact of its decision and continuously optimizes its data placement policy online. We implement Sibyl on real systems with various HSS configurations. Our results show that Sibyl provides 21.6%/19.9% performance improvement in a performance-oriented/cost-oriented HSS configuration compared to the best previous data placement technique. Our evaluation using an HSS configuration with three different storage devices shows that Sibyl outperforms the state-of-the-art data placement policy by 23.9%-48.2%, while significantly reducing the system architect’s burden in designing a data placement mechanism that can simultaneously incorporate three storage devices. We show that Sibyl achieves 80% of the performance of an oracle policy that has complete knowledge of future access patterns while incurring a very modest storage overhead of only 124.4 KiB.

摘要:混合存储系统(HSS)使用多种不同的存储设备,在高性能的同时提供高且可扩展的存储容量。近期研究提出了多种技术,旨在准确识别性能关键数据并将其放置在"最合适"的存储设备中。遗憾的是,这些技术大多是刚性的:(1)其适应性有限,难以在广泛的工作负载和存储设备配置下都表现良好;(2)设计者很难将这些技术扩展到与其设计目标不同的存储系统配置(例如不同数量或不同类型的存储设备)。我们提出Sibyl,这是第一个将强化学习用于混合存储系统数据放置的技术。Sibyl观测运行中工作负载和存储设备的多种特征,做出系统感知的数据放置决策;对每一个决策,Sibyl都会从系统获得奖励,用以评估该决策的长期性能影响,并在线持续优化其数据放置策略。我们在具有多种HSS配置的真实系统上实现了Sibyl。结果显示,与此前最好的数据放置技术相比,Sibyl在面向性能/面向成本的HSS配置中分别带来21.6%/19.9%的性能提升。在包含三种不同存储设备的HSS配置上的评估表明,Sibyl比最先进的数据放置策略高出23.9%-48.2%,同时显著减轻了系统架构师设计可同时纳入三种存储设备的数据放置机制的负担。我们还表明,Sibyl达到了完全掌握未来访问模式的oracle策略80%的性能,而存储开销仅为124.4 KiB,十分有限。

ML-40-标题 Incorporating Prior Knowledge into Neural Networks through an Implicit Composite Kernel

链接: https://arxiv.org/abs/2205.07384
作者: Ziyang Jiang, Tongshu Zheng, David Carlson
备注: 17 pages, 14 figures, 1 table, submitted to Advances in Neural Information Processing Systems

点击查看摘要

Abstract: It is challenging to guide neural network (NN) learning with prior knowledge. In contrast, many known properties, such as spatial smoothness or seasonality, are straightforward to model by choosing an appropriate kernel in a Gaussian process (GP). Many deep learning applications could be enhanced by modeling such known properties. For example, convolutional neural networks (CNNs) are frequently used in remote sensing, which is subject to strong seasonal effects. We propose to blend the strengths of deep learning and the clear modeling capabilities of GPs by using a composite kernel that combines a kernel implicitly defined by a neural network with a second kernel function chosen to model known properties (e.g., seasonality). Then, we approximate the resultant GP by combining a deep network and an efficient mapping based on the Nystrom approximation, which we call Implicit Composite Kernel (ICK). ICK is flexible and can be used to include prior information in neural networks in many applications. We demonstrate the strength of our framework by showing its superior performance and flexibility on both synthetic and real-world data sets. The code is available at: https://anonymous.4open.science/r/ICK_NNGP-17C5/.

摘要:利用先验知识来指导神经网络(NN)的学习具有挑战性。相比之下,空间平滑性或季节性等许多已知性质,可以通过在高斯过程(GP)中选择合适的核直接建模。对这类已知性质进行建模可以改进许多深度学习应用,例如常用于遥感的卷积神经网络(CNN)就受到强烈的季节效应影响。我们提出使用一种复合核,将由神经网络隐式定义的核与为建模已知性质(如季节性)而选取的第二个核函数相结合,从而融合深度学习的强大能力与GP清晰的建模能力。随后,我们通过组合一个深度网络和一个基于Nystrom近似的高效映射来近似所得的GP,并将其称为隐式复合核(ICK)。ICK十分灵活,可在许多应用中将先验信息纳入神经网络。我们在合成数据集和真实数据集上展示了该框架卓越的性能和灵活性。代码见:https://anonymous.4open.science/r/ICK_NNGP-17C5/。

ML-41-标题 Effect of Batch Normalization on Noise Resistant Property of Deep Learning Models

链接: https://arxiv.org/abs/2205.07372
作者: Omobayode Fagbohungbe, Lijun Qian
备注:

点击查看摘要

Abstract: The fast execution speed and energy efficiency of analog hardware has made them a strong contender for deployment of deep learning model at the edge. However, there are concerns about the presence of analog noise which causes changes to the weight of the models, leading to performance degradation of deep learning model, despite their inherent noise resistant characteristics. The effect of the popular batch normalization layer on the noise resistant ability of deep learning model is investigated in this work. This systematic study has been carried out by first training different models with and without batch normalization layer on CIFAR10 and CIFAR100 dataset. The weights of the resulting models are then injected with analog noise and the performance of the models on the test dataset is obtained and compared. The results show that the presence of batch normalization layer negatively impacts noise resistant property of deep learning model and the impact grows with the increase of the number of batch normalization layers.

摘要:模拟硬件的高执行速度和能效使其成为在边缘部署深度学习模型的有力竞争者。然而,尽管深度学习模型具有一定的固有抗噪特性,人们仍担心模拟噪声会改变模型权重,导致性能下降。本文研究了流行的批归一化(batch normalization)层对深度学习模型抗噪能力的影响。我们首先在CIFAR10和CIFAR100数据集上分别训练带有和不带批归一化层的不同模型,然后向所得模型的权重注入模拟噪声,获取并比较各模型在测试集上的性能。结果表明,批归一化层的存在会对深度学习模型的抗噪特性产生负面影响,且这种影响随批归一化层数量的增加而加剧。
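
下面的Python草图演示该实验的核心思路:向已训练模型的权重注入加性高斯噪声,并统计模型输出的平均漂移。这里的线性模型、权重和噪声强度均为示意性假设,与论文中的CIFAR实验无直接对应:

```python
import random
import statistics

def predict(weights, x):
    # A tiny linear "model": dot product of weights and input.
    return sum(w * xi for w, xi in zip(weights, x))

def noisy_copy(weights, sigma, rng):
    # Additive analog-noise model: each stored weight is perturbed
    # by independent zero-mean Gaussian noise of std sigma.
    return [w + rng.gauss(0.0, sigma) for w in weights]

rng = random.Random(0)
weights = [0.5, -1.2, 2.0]   # hypothetical trained weights
x = [1.0, 2.0, 3.0]          # hypothetical input
clean = predict(weights, x)

# Average absolute output drift over many noisy weight instantiations.
drifts = [abs(predict(noisy_copy(weights, 0.1, rng), x) - clean)
          for _ in range(2000)]
mean_drift = statistics.mean(drifts)
```

对真实网络,同样的流程即"复制权重、加噪、重测准确率";论文正是用这种注入方式比较有/无批归一化模型的性能退化。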

ML-42-标题 What is an equivariant neural network?

链接: https://arxiv.org/abs/2205.07362
作者: Lek-Heng Lim, Bradley J. Nelson
备注: 6 pages, 1 figure

点击查看摘要

Abstract: We explain equivariant neural networks, a notion underlying breakthroughs in machine learning from deep convolutional neural networks for computer vision to AlphaFold 2 for protein structure prediction, without assuming knowledge of equivariance or neural networks. The basic mathematical ideas are simple but are often obscured by engineering complications that come with practical realizations. We extract and focus on the mathematical aspects, and limit ourselves to a cursory treatment of the engineering issues at the end.

摘要:我们在不预设读者了解等变性或神经网络的前提下,解释等变神经网络这一概念:从用于计算机视觉的深度卷积神经网络,到用于蛋白质结构预测的AlphaFold 2,它是机器学习诸多突破背后的基础思想。其基本数学思想很简单,却常常被实际实现中的工程复杂性所掩盖。我们提取并聚焦于数学层面,仅在结尾简要讨论工程问题。

ML-43-标题 Policy Gradient Method For Robust Reinforcement Learning

链接: https://arxiv.org/abs/2205.07344
作者: Yue Wang, Shaofeng Zou
备注: Accepted by ICML 2022

点击查看摘要

Abstract: This paper develops the first policy gradient method with global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch. Robust reinforcement learning is to learn a policy robust to model mismatch between simulator and real environment. We first develop the robust policy (sub-)gradient, which is applicable for any differentiable parametric policy class. We show that the proposed robust policy gradient method converges to the global optimum asymptotically under direct policy parameterization. We further develop a smoothed robust policy gradient method and show that to achieve an \epsilon-global optimum, the complexity is \mathcal{O}(\epsilon^{-3}). We then extend our methodology to the general model-free setting and design the robust actor-critic method with differentiable parametric policy class and value function. We further characterize its asymptotic convergence and sample complexity under the tabular setting. Finally, we provide simulation results to demonstrate the robustness of our methods.

摘要:本文提出了第一个在模型失配下具有全局最优性保证和复杂度分析的鲁棒强化学习策略梯度方法。鲁棒强化学习旨在学习对模拟器与真实环境之间的模型失配具有鲁棒性的策略。我们首先推导出鲁棒策略(次)梯度,它适用于任何可微的参数化策略类。我们证明,在直接策略参数化下,所提鲁棒策略梯度方法渐近收敛到全局最优。我们进一步提出一种平滑的鲁棒策略梯度方法,并证明达到\epsilon-全局最优所需的复杂度为\mathcal{O}(\epsilon^{-3})。随后,我们将该方法推广到一般的无模型设定,设计了采用可微参数化策略类和值函数的鲁棒actor-critic方法,并进一步刻画了其在表格设定下的渐近收敛性和样本复杂度。最后,我们给出仿真结果以证明所提方法的鲁棒性。

ML-44-标题 Reductive MDPs: A Perspective Beyond Temporal Horizons

链接: https://arxiv.org/abs/2205.07338
作者: Thomas Spooner, Rui Silva, Joshua Lockhart, Jason Long, Vacslav Glukhov
备注: 15 pages, 10 figures, 1 algorithm

点击查看摘要

Abstract: Solving general Markov decision processes (MDPs) is a computationally hard problem. Solving finite-horizon MDPs, on the other hand, is highly tractable with well known polynomial-time algorithms. What drives this extreme disparity, and do problems exist that lie between these diametrically opposed complexities? In this paper we identify and analyse a sub-class of stochastic shortest path problems (SSPs) for general state-action spaces whose dynamics satisfy a particular drift condition. This construction generalises the traditional, temporal notion of a horizon via decreasing reachability: a property called reductivity. It is shown that optimal policies can be recovered in polynomial-time for reductive SSPs – via an extension of backwards induction – with an efficient analogue in reductive MDPs. The practical considerations of the proposed approach are discussed, and numerical verification provided on a canonical optimal liquidation problem.

摘要:求解一般的马尔可夫决策过程(MDP)在计算上是困难的;另一方面,求解有限时间范围的MDP则高度易解,存在众所周知的多项式时间算法。是什么造成了这种极端差异?是否存在介于这两种截然相反的复杂度之间的问题?本文针对一般状态-动作空间,识别并分析了一类动态满足特定漂移条件的随机最短路径问题(SSP)。这一构造通过递减的可达性推广了传统的时间范围概念,我们称这一性质为可归约性(reductivity)。我们证明,通过向后归纳的一种推广,可在多项式时间内求出可归约SSP的最优策略,并在可归约MDP中给出了高效的对应算法。文中讨论了所提方法的实际考量,并在一个典型的最优清算问题上给出了数值验证。
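
摘要提到有限时间范围MDP可用多项式时间的向后归纳(backwards induction)求解,论文将其推广到可归约SSP。下面是标准向后归纳在一个假设的两状态玩具MDP上的自洽Python草图(状态、奖励、转移均为示意):

```python
def backward_induction(horizon, states, actions, reward, transition):
    """Finite-horizon backward induction: compute optimal values V_0 and a
    time-dependent greedy policy for a small tabular MDP."""
    V = {s: 0.0 for s in states}  # terminal values V_H = 0
    policy = []
    for t in range(horizon - 1, -1, -1):
        # Q_t(s, a) = r(s, a) + sum_{s'} P(s' | s, a) * V_{t+1}(s')
        Q = {s: {a: reward(s, a) + sum(p * V[s2] for s2, p in transition(s, a))
                 for a in actions}
             for s in states}
        policy.insert(0, {s: max(Q[s], key=Q[s].get) for s in states})
        V = {s: max(Q[s].values()) for s in states}
    return V, policy

# Hypothetical toy MDP: reward 1 per step spent in "high", 0 in "low";
# "switch" flips the state deterministically, "stay" keeps it.
states, actions = ["low", "high"], ["stay", "switch"]
reward = lambda s, a: 1.0 if s == "high" else 0.0

def transition(s, a):
    nxt = ("high" if s == "low" else "low") if a == "switch" else s
    return [(nxt, 1.0)]

V, policy = backward_induction(3, states, actions, reward, transition)
```

从"low"出发、时间范围为3时,最优策略是先切换再停留,累计回报为2;论文的可归约性本质上是把这里按时间递减的归纳,换成按可达性递减的归纳。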

ML-45-标题 Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent

链接: https://arxiv.org/abs/2205.07331
作者: Yiping Lu, Jose Blanchet, Lexing Ying
备注:

点击查看摘要

Abstract: In this paper, we study the statistical limits in terms of Sobolev norms of gradient descent for solving inverse problem from randomly sampled noisy observations using a general class of objective functions. Our class of objective functions includes Sobolev training for kernel regression, Deep Ritz Methods (DRM), and Physics Informed Neural Networks (PINN) for solving elliptic partial differential equations (PDEs) as special cases. We consider a potentially infinite-dimensional parameterization of our model using a suitable Reproducing Kernel Hilbert Space and a continuous parameterization of problem hardness through the definition of kernel integral operators. We prove that gradient descent over this objective function can also achieve statistical optimality and the optimal number of passes over the data increases with sample size. Based on our theory, we explain an implicit acceleration of using a Sobolev norm as the objective function for training, inferring that the optimal number of epochs of DRM becomes larger than the number of PINN when both the data size and the hardness of tasks increase, although both DRM and PINN can achieve statistical optimality.

摘要:本文研究使用一般的一类目标函数、从随机采样的含噪观测中求解逆问题时,梯度下降在Sobolev范数意义下的统计极限。该类目标函数将用于核回归的Sobolev训练、求解椭圆型偏微分方程(PDE)的Deep Ritz方法(DRM)和物理信息神经网络(PINN)作为特例包含在内。我们利用合适的再生核希尔伯特空间对模型进行(可能是无穷维的)参数化,并通过核积分算子的定义对问题难度进行连续参数化。我们证明,对该目标函数做梯度下降同样可以达到统计最优性,且对数据的最优遍历次数随样本量增加而增加。基于该理论,我们解释了以Sobolev范数作为训练目标函数所带来的隐式加速:当数据规模和任务难度同时增大时,DRM的最优训练轮数会变得大于PINN,尽管DRM和PINN都能达到统计最优性。

ML-46-标题 Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective

链接: https://arxiv.org/abs/2205.07320
作者: Keitaro Sakamoto, Issei Sato
备注:

点击查看摘要

Abstract: The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.

摘要:彩票假设(LTH)之所以受到关注,是因为它可以解释为什么过参数化模型往往表现出很强的泛化能力。已知在使用迭代幅值剪枝(IMP)时(该算法用于寻找可从初始权重独立训练、泛化能力强的稀疏网络,即"中奖彩票"),较大的初始学习率在ResNet等深度神经网络中效果不佳。然而,由于较大的初始学习率通常有助于优化器收敛到更平坦的极小值,我们假设中奖彩票对应的极小值相对尖锐,而这通常被认为不利于泛化能力。本文证实了这一假设,并表明PAC-Bayes理论能够明确刻画LTH与泛化行为之间的关系。基于我们的实验发现(平坦性有助于提高精度和对标签噪声的鲁棒性,且与初始权重的距离与中奖彩票密切相关),我们给出了基于spike-and-slab分布的PAC-Bayes界来分析中奖彩票。最后,我们从PAC-Bayes的角度重新审视了现有的中奖彩票搜索算法,并为这些方法提供了新的见解。

ML-47-标题 cMelGAN: An Efficient Conditional Generative Model Based on Mel Spectrograms

链接: https://arxiv.org/abs/2205.07319
作者: Tracy Qian, Jackson Kaunismaa, Tony Chung
备注:

点击查看摘要

Abstract: Analysing music in the field of machine learning is a very difficult problem with numerous constraints to consider. The nature of audio data, with its very high dimensionality and widely varying scales of structure, is one of the primary reasons why it is so difficult to model. There are many applications of machine learning in music, like the classifying the mood of a piece of music, conditional music generation, or popularity prediction. The goal for this project was to develop a genre-conditional generative model of music based on Mel spectrograms and evaluate its performance by comparing it to existing generative music models that use note-based representations. We initially implemented an autoregressive, RNN-based generative model called MelNet . However, due to its slow speed and low fidelity output, we decided to create a new, fully convolutional architecture that is based on the MelGAN [4] and conditional GAN architectures, called cMelGAN.

摘要:在机器学习领域分析音乐是一个非常困难的问题,需要考虑诸多约束。音频数据维度极高、结构尺度变化极大,这正是其难以建模的主要原因之一。机器学习在音乐中有许多应用,例如音乐情绪分类、条件音乐生成或流行度预测。本项目的目标是开发一个基于Mel频谱图、以音乐流派为条件的音乐生成模型,并通过与现有的基于音符表示的生成式音乐模型进行比较来评估其性能。我们最初实现了一个基于RNN的自回归生成模型MelNet。然而,由于其速度慢、保真度低,我们决定设计一种新的全卷积架构,它基于MelGAN [4]和条件GAN架构,称为cMelGAN。

ML-48-标题 Parameter Adaptation for Joint Distribution Shifts

链接: https://arxiv.org/abs/2205.07315
作者: Siddhartha Datta
备注:

点击查看摘要

Abstract: While different methods exist to tackle distinct types of distribution shift, such as label shift (in the form of adversarial attacks) or domain shift, tackling the joint shift setting is still an open problem. Through the study of a joint distribution shift manifesting both adversarial and domain-specific perturbations, we not only show that a joint shift worsens model performance compared to their individual shifts, but that the use of a similar domain worsens performance than a dissimilar domain. To curb the performance drop, we study the use of perturbation sets motivated by input and parameter space bounds, and adopt a meta learning strategy (hypernetworks) to model parameters w.r.t. test-time inputs to recover performance.

摘要:尽管已有不同方法分别应对不同类型的分布偏移,例如标签偏移(以对抗攻击的形式)或域偏移,但联合偏移的设定仍是一个开放问题。通过研究同时包含对抗扰动和域特定扰动的联合分布偏移,我们不仅发现联合偏移比各自单独的偏移更严重地损害模型性能,还发现使用相似域比使用不相似域导致的性能下降更大。为缓解性能下降,我们研究了由输入空间和参数空间界所启发的扰动集的使用,并采用元学习策略(超网络)针对测试时输入对模型参数进行建模,以恢复性能。

ML-49-标题 Generalization Bounds on Multi-Kernel Learning with Mixed Datasets

链接: https://arxiv.org/abs/2205.07313
作者: Lan V. Truong
备注: Under review for possible publication

点击查看摘要

Abstract: This paper presents novel generalization bounds for the multi-kernel learning problem. Motivated by applications in sensor networks, we assume that the dataset is mixed where each sample is taken from a finite pool of Markov chains. Our bounds for learning kernels admit O(\sqrt{\log m}) dependency on the number of base kernels and O(1/\sqrt{n}) dependency on the number of training samples. However, some O(1/\sqrt{n}) terms are added to compensate for the dependency among samples compared with existing generalization bounds for multi-kernel learning with i.i.d. datasets.

摘要:本文给出了多核学习问题的新型泛化界。受传感器网络应用的启发,我们假设数据集是混合的:每个样本取自一个有限的马尔可夫链池。我们的核学习界对基核数量的依赖为O(\sqrt{\log m}),对训练样本数量的依赖为O(1/\sqrt{n})。不过,与基于i.i.d.数据集的多核学习的已有泛化界相比,我们的界额外引入了若干O(1/\sqrt{n})量级的项,以补偿样本之间的相关性。

ML-50-标题 COIN: Communication-Aware In-Memory Acceleration for Graph Convolutional Networks

链接: https://arxiv.org/abs/2205.07311
作者: Sumit K. Mandal, Gokul Krishnan, A. Alper Goksoy, Gopikrishnan Ravindran Nair, Yu Cao, Umit Y. Ogras
备注:

点击查看摘要

Abstract: Graph convolutional networks (GCNs) have shown remarkable learning capabilities when processing graph-structured data found inherently in many application areas. GCNs distribute the outputs of neural networks embedded in each vertex over multiple iterations to take advantage of the relations captured by the underlying graphs. Consequently, they incur a significant amount of computation and irregular communication overheads, which call for GCN-specific hardware accelerators. To this end, this paper presents a communication-aware in-memory computing architecture (COIN) for GCN hardware acceleration. Besides accelerating the computation using custom compute elements (CE) and in-memory computing, COIN aims at minimizing the intra- and inter-CE communication in GCN operations to optimize the performance and energy efficiency. Experimental evaluations with widely used datasets show up to 105x improvement in energy consumption compared to state-of-the-art GCN accelerator.

摘要:图卷积网络(GCN)在处理许多应用领域中固有的图结构数据时表现出卓越的学习能力。GCN在多轮迭代中传播嵌入在各顶点的神经网络的输出,以利用底层图所刻画的关系,因而会产生大量计算开销和不规则的通信开销,这就需要面向GCN的专用硬件加速器。为此,本文提出了一种用于GCN硬件加速的通信感知存内计算体系结构COIN。除了利用定制计算单元(CE)和存内计算来加速计算外,COIN还致力于最小化GCN运算中CE内部和CE之间的通信,以优化性能和能效。在广泛使用的数据集上的实验评估表明,与最先进的GCN加速器相比,COIN的能耗最多可降低105倍。

ML-51-标题 3DLinker: An E(3) Equivariant Variational Autoencoder for Molecular Linker Design

链接: https://arxiv.org/abs/2205.07309
作者: Yinan Huang, Xingang Peng, Jianzhu Ma, Muhan Zhang
备注:

点击查看摘要

Abstract: Deep learning has achieved tremendous success in designing novel chemical compounds with desirable pharmaceutical properties. In this work, we focus on a new type of drug design problem – generating a small “linker” to physically attach two independent molecules with their distinct functions. The main computational challenges include: 1) the generation of linkers is conditional on the two given molecules, in contrast to generating full molecules from scratch in previous works; 2) linkers heavily depend on the anchor atoms of the two molecules to be connected, which are not known beforehand; 3) 3D structures and orientations of the molecules need to be considered to avoid atom clashes, for which equivariance to E(3) group are necessary. To address these problems, we propose a conditional generative model, named 3DLinker, which is able to predict anchor atoms and jointly generate linker graphs and their 3D structures based on an E(3) equivariant graph variational autoencoder. So far as we know, there are no previous models that could achieve this task. We compare our model with multiple conditional generative models modified from other molecular design tasks and find that our model has a significantly higher rate in recovering molecular graphs, and more importantly, accurately predicting the 3D coordinates of all the atoms.

摘要:深度学习在设计具有理想药理特性的新型化合物方面取得了巨大成功。在这项工作中,我们关注一类新的药物设计问题:生成一个小的"连接体"(linker),将两个具有各自功能的独立分子在物理上连接起来。主要的计算挑战包括:1)与以往工作中从零生成完整分子不同,连接体的生成以给定的两个分子为条件;2)连接体高度依赖于两个分子中待连接的锚定原子,而这些原子事先是未知的;3)需要考虑分子的3D结构和取向以避免原子冲突,这要求模型对E(3)群具有等变性。为解决这些问题,我们提出了一个条件生成模型3DLinker,它基于E(3)等变图变分自编码器,能够预测锚定原子并联合生成连接体图及其3D结构。据我们所知,此前尚无模型能够完成这一任务。我们将该模型与若干从其他分子设计任务改造而来的条件生成模型进行了比较,发现我们的模型在恢复分子图方面的成功率显著更高,更重要的是,它能准确预测所有原子的3D坐标。

ML-52-标题 Finding Global Homophily in Graph Neural Networks When Meeting Heterophily

链接: https://arxiv.org/abs/2205.07308
作者: Xiang Li, Renyu Zhu, Yao Cheng, Caihua Shan, Siqiang Luo, Dongsheng Li, Weining Qian
备注: To appear in ICML 2022

点击查看摘要

Abstract: We investigate graph neural networks on graphs with heterophily. Some existing methods amplify a node’s neighborhood with multi-hop neighbors to include more nodes with homophily. However, it is a significant challenge to set personalized neighborhood sizes for different nodes. Further, for other homophilous nodes excluded in the neighborhood, they are ignored for information aggregation. To address these problems, we propose two models GloGNN and GloGNN++, which generate a node’s embedding by aggregating information from global nodes in the graph. In each layer, both models learn a coefficient matrix to capture the correlations between nodes, based on which neighborhood aggregation is performed. The coefficient matrix allows signed values and is derived from an optimization problem that has a closed-form solution. We further accelerate neighborhood aggregation and derive a linear time complexity. We theoretically explain the models’ effectiveness by proving that both the coefficient matrix and the generated node embedding matrix have the desired grouping effect. We conduct extensive experiments to compare our models against 11 other competitors on 15 benchmark datasets in a wide range of domains, scales and graph heterophilies. Experimental results show that our methods achieve superior performance and are also very efficient.

摘要:我们研究异配(heterophily)图上的图神经网络。一些现有方法通过多跳邻居来扩大节点的邻域,以纳入更多同配节点;然而,为不同节点设置个性化的邻域大小是一个重大挑战,而且邻域之外的其他同配节点在信息聚合时会被忽略。为解决这些问题,我们提出了GloGNN和GloGNN++两个模型,它们通过聚合图中全局节点的信息来生成节点嵌入。在每一层中,两个模型都学习一个系数矩阵来刻画节点间的相关性,并据此执行邻域聚合。该系数矩阵允许取带符号的值,且由一个具有闭式解的优化问题导出。我们进一步加速了邻域聚合,得到线性时间复杂度。我们通过证明系数矩阵和生成的节点嵌入矩阵都具有所期望的分组效应,从理论上解释了模型的有效性。我们在涵盖不同领域、规模和图异配程度的15个基准数据集上进行了大量实验,将我们的模型与其他11个竞争方法进行比较。实验结果表明,我们的方法不仅性能更优,而且非常高效。

ML-53-标题 Optimization of Decision Tree Evaluation Using SIMD Instructions

链接: https://arxiv.org/abs/2205.07307
作者: Alexey Mironov, Ilnur Khuziev
备注: in Russian language

点击查看摘要

Abstract: Decision forest (decision tree ensemble) is one of the most popular machine learning algorithms. To use large models on big data, like document scoring with learning-to-rank models, we need to evaluate these models efficiently. In this paper, we explore MatrixNet, the ancestor of the popular CatBoost library. Both libraries use the SSE instruction set for scoring on CPU. This paper investigates the opportunities given by the AVX instruction set to evaluate models more efficiently. We achieved 35% speedup on the binarization stage (nodes conditions comparison), and 20% speedup on the trees apply stage on the ranking model.

摘要:决策森林(决策树集成)是最流行的机器学习算法之一。要在大数据上使用大型模型(例如用排序学习模型为文档打分),我们需要高效地评估这些模型。本文研究流行的CatBoost库的前身MatrixNet。这两个库都使用SSE指令集在CPU上进行打分。本文探讨了利用AVX指令集更高效地评估模型的机会:我们在二值化阶段(节点条件比较)实现了35%的加速,在排序模型的树应用(apply)阶段实现了20%的加速。

ML-54-标题 Posterior Probability Matters: Doubly-Adaptive Calibration for Neural Predictions in Online Advertising

链接: https://arxiv.org/abs/2205.07295
作者: Penghui Wei, Weimin Zhang, Ruijie Hou, Jinquan Liu, Shaoguo Liu, Liang Wang, Bo Zheng
备注: SIGIR 2022 (short)

点击查看摘要

Abstract: Predicting user response probabilities is vital for ad ranking and bidding. We hope that predictive models can produce accurate probabilistic predictions that reflect true likelihoods. Calibration techniques aims to post-process model predictions to posterior probabilities. Field-level calibration – which performs calibration w.r.t. to a specific field value – is fine-grained and more practical. In this paper we propose a doubly-adaptive approach AdaCalib. It learns an isotonic function family to calibrate model predictions with the guidance of posterior statistics, and field-adaptive mechanisms are designed to ensure that the posterior is appropriate for the field value to be calibrated. Experiments verify that AdaCalib achieves significant improvement on calibration performance. It has been deployed online and beats previous approach.

摘要:预测用户响应概率对广告排序和出价至关重要。我们希望预测模型能够产生反映真实似然的准确概率预测。校准技术旨在将模型预测后处理为后验概率。字段级校准(即针对特定字段取值进行校准)粒度更细、也更实用。本文提出一种双重自适应方法AdaCalib。它在后验统计量的指导下学习一个等渗(isotonic)函数族来校准模型预测,并设计了字段自适应机制,以确保所用的后验统计量适合于待校准的字段取值。实验验证了AdaCalib在校准性能上取得显著提升。该方法已上线部署,效果优于先前方法。
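
校准中常用的等渗回归可以用Pool Adjacent Violators(PAV)算法实现。下面是一个纯Python的示意性实现,用于把模型分数单调地映射为经验正例率;它只演示等渗校准本身,并非AdaCalib的具体算法(其字段自适应机制未体现),分数与标签均为假设数据:

```python
def isotonic_calibrate(scores, labels):
    """Pool Adjacent Violators: fit a non-decreasing step function that
    maps model scores to empirical positive rates."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block: [label_sum, count, rightmost_score]
    for s, y in pairs:
        blocks.append([float(y), 1, s])
        # Merge adjacent blocks while monotonicity is violated.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            y2, n2, s2 = blocks.pop()
            blocks[-1][0] += y2
            blocks[-1][1] += n2
            blocks[-1][2] = s2

    def calibrate(s):
        # Step function: return the rate of the first block covering s.
        for label_sum, count, right in blocks:
            if s <= right:
                return label_sum / count
        return blocks[-1][0] / blocks[-1][1]

    return calibrate

# Hypothetical model scores and click labels.
cal = isotonic_calibrate([0.1, 0.2, 0.3, 0.8, 0.9], [0, 1, 0, 1, 1])
```

字段级校准可理解为:按字段取值划分样本,对每个(或每组)取值分别拟合这样的校准函数。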

ML-55-标题 A Computational Framework of Cortical Microcircuits Approximates Sign-concordant Random Backpropagation

链接: https://arxiv.org/abs/2205.07292
作者: Yukun Yang, Peng Li
备注:

点击查看摘要

Abstract: Several recent studies attempt to address the biological implausibility of the well-known backpropagation (BP) method. While promising methods such as feedback alignment, direct feedback alignment, and their variants like sign-concordant feedback alignment tackle BP’s weight transport problem, their validity remains controversial owing to a set of other unsolved issues. In this work, we answer the question of whether it is possible to realize random backpropagation solely based on mechanisms observed in neuroscience. We propose a hypothetical framework consisting of a new microcircuit architecture and its supporting Hebbian learning rules. Comprising three types of cells and two types of synaptic connectivity, the proposed microcircuit architecture computes and propagates error signals through local feedback connections and supports the training of multi-layered spiking neural networks with a globally defined spiking error function. We employ the Hebbian rule operating in local compartments to update synaptic weights and achieve supervised learning in a biologically plausible manner. Finally, we interpret the proposed framework from an optimization point of view and show its equivalence to sign-concordant feedback alignment. The proposed framework is benchmarked on several datasets including MNIST and CIFAR10, demonstrating promising BP-comparable accuracy.

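The sign-concordant random feedback idea the paper relates to is easy to demonstrate on a toy dense network: the backward pass routes errors through a fixed random matrix that only shares the signs of the forward output weights. This is a rough numerical sketch of that equivalence class of algorithms, not the paper's microcircuit model (which uses spiking neurons and Hebbian updates); sizes, learning rate, and the regression target are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# tiny 2-layer net: x -> h = tanh(W1 x) -> y = W2 h
W1 = rng.normal(0, 0.5, (8, 2))
W2 = rng.normal(0, 0.5, (1, 8))
# fixed random backward matrix sharing the SIGN of W2 (sign-concordance)
B = np.sign(W2) * np.abs(rng.normal(0, 0.5, W2.shape))

X = rng.normal(size=(64, 2))
Y = X[:, :1] - 0.5 * X[:, 1:]             # simple regression target

def loss():
    H = np.tanh(X @ W1.T)
    return float(np.mean((H @ W2.T - Y) ** 2))

before = loss()
lr = 0.1
for _ in range(200):
    H = np.tanh(X @ W1.T)                 # forward pass
    E = H @ W2.T - Y                      # output error
    W2 -= lr * E.T @ H / len(X)           # exact gradient at the top layer
    dH = (E @ B) * (1 - H ** 2)           # hidden error via B, not W2.T
    W1 -= lr * dH.T @ X / len(X)
after = loss()
```

Despite never transporting `W2` backward, training reduces the loss, which is the behavior the weight-transport-free literature relies on.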

ML-56-标题 Exploiting the Relationship Between Kendall's Rank Correlation and Cosine Similarity for Attribution Protection

链接: https://arxiv.org/abs/2205.07279
作者: Fan Wang, Adams Wai-Kin Kong
备注:

点击查看摘要

Abstract: Model attributions are important in deep neural networks as they aid practitioners in understanding the models, but recent studies reveal that attributions can be easily perturbed by adding imperceptible noise to the input. The non-differentiable Kendall's rank correlation is a key performance index for attribution protection. In this paper, we first show that the expected Kendall's rank correlation is positively correlated to cosine similarity and then indicate that the direction of attribution is the key to attribution robustness. Based on these findings, we explore the vector space of attribution to explain the shortcomings of attribution defense methods using the $\ell_p$ norm and propose integrated gradient regularizer (IGR), which maximizes the cosine similarity between natural and perturbed attributions. Our analysis further exposes that IGR encourages neurons with the same activation states for natural samples and the corresponding perturbed samples, which is shown to induce robustness to gradient-based attribution methods. Our experiments on different models and datasets confirm our analysis on attribution protection and demonstrate a decent improvement in adversarial robustness.

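Both quantities in the paper's key observation are simple to compute; a minimal pure-Python version of each (note the paper relates the *expectation* of Kendall's rank correlation over perturbations to cosine similarity, not a single pair of vectors):

```python
from itertools import combinations
from math import sqrt

def kendall_tau(a, b):
    """Kendall rank correlation: (concordant - discordant) pairs over all pairs."""
    n = len(a)
    conc = disc = 0
    for i, j in combinations(range(n), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / (n * (n - 1) / 2)

def cosine(a, b):
    """Cosine similarity between two attribution vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))
```

Kendall's tau depends only on orderings (hence non-differentiable), while cosine similarity is smooth, which is why a differentiable surrogate that tracks the expected tau is useful for training.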

ML-57-标题 Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations

链接: https://arxiv.org/abs/2205.07277
作者: Jessica Dai, Sohini Upadhyay, Ulrich Aivodji, Stephen H. Bach, Himabindu Lakkaraju
备注: Accepted at AIES 2022

点击查看摘要

Abstract: As post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to ensure that the quality of the resulting explanations is consistently high across various population subgroups including the minority groups. For instance, it should not be the case that explanations associated with instances belonging to a particular gender subgroup (e.g., female) are less accurate than those associated with other genders. However, there is little to no research that assesses if there exist such group-based disparities in the quality of the explanations output by state-of-the-art explanation methods. In this work, we address the aforementioned gaps by initiating the study of identifying group-based disparities in explanation quality. To this end, we first outline the key properties which constitute explanation quality and where disparities can be particularly problematic. We then leverage these properties to propose a novel evaluation framework which can quantitatively measure disparities in the quality of explanations output by state-of-the-art methods. Using this framework, we carry out a rigorous empirical analysis to understand if and when group-based disparities in explanation quality arise. Our results indicate that such disparities are more likely to occur when the models being explained are complex and highly non-linear. In addition, we also observe that certain post hoc explanation methods (e.g., Integrated Gradients, SHAP) are more likely to exhibit the aforementioned disparities. To the best of our knowledge, this work is the first to highlight and study the problem of group-based disparities in explanation quality. In doing so, our work sheds light on previously unexplored ways in which explanation methods may introduce unfairness in real world decision making.

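A toy version of the kind of metric such an evaluation framework needs: per-group average explanation quality (e.g., a fidelity score per instance) and the max-min gap across groups. The max-min aggregation here is an assumption for illustration; the paper proposes a richer framework over several quality properties:

```python
def quality_disparity(quality, group):
    """Gap between the best- and worst-served group's mean explanation quality.

    quality: per-instance explanation quality scores (higher is better)
    group:   the subgroup label of each instance
    """
    sums, counts = {}, {}
    for q, g in zip(quality, group):
        sums[g] = sums.get(g, 0.0) + q
        counts[g] = counts.get(g, 0) + 1
    means = {g: sums[g] / counts[g] for g in sums}
    return max(means.values()) - min(means.values())
```

A value near zero means explanations serve all subgroups about equally well; large values flag the group-based disparities the paper studies.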

ML-58-标题 Discovering the Representation Bottleneck of Graph Neural Networks from Multi-order Interactions

链接: https://arxiv.org/abs/2205.07266
作者: Fang Wu, Siyuan Li, Lirong Wu, Stan Z. Li, Dragomir Radev, Qiang Zhang
备注:

点击查看摘要

Abstract: Most graph neural networks (GNNs) rely on the message passing paradigm to propagate node features and build interactions. Recent works point out that different graph learning tasks require different ranges of interactions between nodes. To investigate its underlying mechanism, we explore the capacity of GNNs to capture pairwise interactions between nodes under contexts with different complexities, especially for their graph-level and node-level applications in scientific domains like biochemistry and physics. When formulating pairwise interactions, we study two common graph construction methods in scientific domains, i.e., K-nearest neighbor (KNN) graphs and fully-connected (FC) graphs. Furthermore, we demonstrate that the inductive bias introduced by KNN-graphs and FC-graphs hinders GNNs from learning the most informative order of interactions. Such a phenomenon is broadly shared by several GNNs for different graph learning tasks and forbids GNNs to achieve the global minimum loss, so we name it a representation bottleneck. To overcome that, we propose a novel graph rewiring approach based on the pairwise interaction strengths to dynamically adjust the reception fields of each node. Extensive experiments in molecular property prediction and dynamic system forecast prove the superiority of our method over state-of-the-art GNN baselines. More importantly, this paper provides a reasonable explanation of why subgraphs play an important role in the determination of graph properties.

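The two graph constructions under study are easy to contrast in code: an FC graph simply connects every pair of nodes, while KNN keeps only each node's k nearest neighbors. A minimal KNN adjacency builder over coordinate points (squared Euclidean distance is an assumption; molecular applications would use 3D atom positions):

```python
def knn_graph(points, k):
    """Directed K-nearest-neighbor adjacency: node i -> its k closest nodes."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    adj = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        adj[i] = sorted(others, key=lambda j: d2(p, points[j]))[:k]
    return adj
```

The inductive bias the paper discusses is visible here: with small k, long-range pairs never exchange messages directly, while the FC construction forces every node to attend to all others regardless of relevance.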

ML-59-标题 Reliable Offline Model-based Optimization for Industrial Process Control

链接: https://arxiv.org/abs/2205.07250
作者: Cheng Feng, Jinyan Guan
备注:

点击查看摘要

Abstract: In the research area of offline model-based optimization, novel and promising methods are frequently developed. However, implementing such methods in real-world industrial systems such as production lines for process control is oftentimes a frustrating process. In this work, we address two important problems to extend the current success of offline model-based optimization to industrial process control problems: 1) how to learn a reliable dynamics model from offline data for industrial processes? 2) how to learn a reliable but not over-conservative control policy from offline data by utilizing existing model-based optimization algorithms? Specifically, we propose a dynamics model based on ensemble of conditional generative adversarial networks to achieve accurate reward calculation in industrial scenarios. Furthermore, we propose an epistemic-uncertainty-penalized reward evaluation function which can effectively avoid giving over-estimated rewards to out-of-distribution inputs during the learning/searching of the optimal control policy. We provide extensive experiments with the proposed method on two representative cases (a discrete control case and a continuous control case), showing that our method compares favorably to several baselines in offline policy learning for industrial process control.


ML-60-标题 Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets

链接: https://arxiv.org/abs/2205.07249
作者: Xingang Peng, Shitong Luo, Jiaqi Guan, Qi Xie, Jian Peng, Jianzhu Ma
备注: ICML 2022 accepted

点击查看摘要

Abstract: Deep generative models have achieved tremendous success in designing novel drug molecules in recent years. A new thread of works have shown the great potential in advancing the specificity and success rate of in silico drug design by considering the structure of protein pockets. This setting posts fundamental computational challenges in sampling new chemical compounds that could satisfy multiple geometrical constraints imposed by pockets. Previous sampling algorithms either sample in the graph space or only consider the 3D coordinates of atoms while ignoring other detailed chemical structures such as bond types and functional groups. To address the challenge, we develop Pocket2Mol, an E(3)-equivariant generative network composed of two modules: 1) a new graph neural network capturing both spatial and bonding relationships between atoms of the binding pockets and 2) a new efficient algorithm which samples new drug candidates conditioned on the pocket representations from a tractable distribution without relying on MCMC. Experimental results demonstrate that molecules sampled from Pocket2Mol achieve significantly better binding affinity and other drug properties such as druglikeness and synthetic accessibility.


ML-61-标题 Clinical outcome prediction under hypothetical interventions – a representation learning framework for counterfactual reasoning

链接: https://arxiv.org/abs/2205.07234
作者: Yikuan Li, Mohammad Mamouei, Shishir Rao, Abdelaali Hassaine, Dexter Canoy, Thomas Lukasiewicz, Kazem Rahimi, Gholamreza Salimi-Khorshidi
备注:

点击查看摘要

Abstract: Most machine learning (ML) models are developed for prediction only; offering no option for causal interpretation of their predictions or parameters/properties. This can hamper the health systems’ ability to employ ML models in clinical decision-making processes, where the need and desire for predicting outcomes under hypothetical investigations (i.e., counterfactual reasoning/explanation) is high. In this research, we introduce a new representation learning framework (i.e., partial concept bottleneck), which considers the provision of counterfactual explanations as an embedded property of the risk model. Despite architectural changes necessary for jointly optimising for prediction accuracy and counterfactual reasoning, the accuracy of our approach is comparable to prediction-only models. Our results suggest that our proposed framework has the potential to help researchers and clinicians improve personalised care (e.g., by investigating the hypothetical differential effects of interventions)


ML-62-标题 RoMFAC: A Robust Mean-Field Actor-Critic Reinforcement Learning against Adversarial Perturbations on States

链接: https://arxiv.org/abs/2205.07229
作者: Ziyuan Zhou, Guanjun Liu
备注:

点击查看摘要

Abstract: Deep reinforcement learning methods for multi-agent systems make optimal decisions dependent on states observed by agents, but a little uncertainty on the observations can possibly mislead agents into taking wrong actions. The mean-field actor-critic reinforcement learning (MFAC) is very famous in the multi-agent field since it can effectively handle the scalability problem. However, this paper finds that it is also sensitive to state perturbations which can significantly degrade the team rewards. This paper proposes a robust learning framework for MFAC called RoMFAC that has two innovations: 1) a new objective function for training actors, composed of a policy gradient function that is related to the expected cumulative discount reward on sampled clean states and an action loss function that represents the difference between actions taken on clean and adversarial states; and 2) a repetitive regularization of the action loss that ensures the trained actors obtain good performance. Furthermore, we prove that the proposed action loss function is convergent. Experiments show that RoMFAC is robust against adversarial perturbations while maintaining its good performance in environments without perturbations.

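The action-loss component reduces to a distance between the policy's outputs on clean states and on perturbed copies of the same states. A one-function sketch; the mean-squared form is an assumption, and in training this term would be added to the policy-gradient objective together with the paper's repetitive regularization:

```python
def action_loss(clean_actions, adv_actions):
    """Mean squared difference between actions taken on clean states and
    on adversarially perturbed versions of the same states."""
    assert len(clean_actions) == len(adv_actions)
    return sum((c - a) ** 2 for c, a in zip(clean_actions, adv_actions)) / len(clean_actions)
```

Driving this term to zero makes the actor's behavior insensitive to the state perturbations, which is the robustness property RoMFAC targets.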

ML-63-标题 Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games

链接: https://arxiv.org/abs/2205.07223
作者: Ziang Song, Song Mei, Yu Bai
备注:

点击查看摘要

Abstract: Imperfect-Information Extensive-Form Games (IIEFGs) is a prevalent model for real-world games involving imperfect information and sequential plays. The Extensive-Form Correlated Equilibrium (EFCE) has been proposed as a natural solution concept for multi-player general-sum IIEFGs. However, existing algorithms for finding an EFCE require full feedback from the game, and it remains open how to efficiently learn the EFCE in the more challenging bandit feedback setting where the game can only be learned by observations from repeated playing. This paper presents the first sample-efficient algorithm for learning the EFCE from bandit feedback. We begin by proposing $K$-EFCE – a more generalized definition that allows players to observe and deviate from the recommended actions for $K$ times. The $K$-EFCE includes the EFCE as a special case at $K=1$, and is an increasingly stricter notion of equilibrium as $K$ increases. We then design an uncoupled no-regret algorithm that finds an $\varepsilon$-approximate $K$-EFCE within $\widetilde{\mathcal{O}}(\max_{i}X_iA_i^{K}/\varepsilon^2)$ iterations in the full feedback setting, where $X_i$ and $A_i$ are the number of information sets and actions for the $i$-th player. Our algorithm works by minimizing a wide-range regret at each information set that takes into account all possible recommendation histories. Finally, we design a sample-based variant of our algorithm that learns an $\varepsilon$-approximate $K$-EFCE within $\widetilde{\mathcal{O}}(\max_{i}X_iA_i^{K+1}/\varepsilon^2)$ episodes of play in the bandit feedback setting. When specialized to $K=1$, this gives the first sample-efficient algorithm for learning EFCE from bandit feedback.


ML-64-标题 Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback

链接: https://arxiv.org/abs/2205.07217
作者: Tianyi Lin, Aldo Pacchiano, Yaodong Yu, Michael I. Jordan
备注: Accepted by ICML 2022; The first three authors contributed equally to this work; 37 pages, 9 figures

点击查看摘要

Abstract: Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback settings. In contrast to previous works on online unconstrained submodular minimization, we focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms in static and delayed scenarios. We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded. Key to our approach is the notion of $(\alpha, \beta)$-regret and the extension of the generic convex relaxation model from \citet{El-2020-Optimal}, the analysis of which is of independent interest. We conduct and showcase several simulation studies to demonstrate the efficacy of our algorithms.

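The delayed-cost setting is mechanically simple: the gradient incurred at round t only becomes available d rounds later, so updates lag the decisions that generated them. A sketch of delayed online gradient descent on a scalar decision with a fixed delay (the paper also handles unbounded delays, nonsubmodular structure, and bandit feedback, none of which appear here):

```python
def delayed_ogd(grad_fn, x0, horizon, delay, lr=0.1):
    """Online gradient descent where the round-t gradient arrives at t + delay.

    grad_fn(t, x): gradient of the round-t cost at decision x
    Returns the sequence of decisions played.
    """
    x = x0
    pending = {}                         # arrival round -> list of gradients
    xs = []
    for t in range(horizon):
        xs.append(x)
        pending.setdefault(t + delay, []).append(grad_fn(t, x))
        for g in pending.pop(t, []):     # apply every gradient arriving now
            x -= lr * g
    return xs
```

With a stationary quadratic cost the iterates still converge; the delay only slows the contraction, which is the intuition behind delay-dependent regret bounds.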

ML-65-标题 FedHAP: Fast Federated Learning for LEO Constellations using Collaborative HAPs

链接: https://arxiv.org/abs/2205.07216
作者: Mohamed Elmahallawy, Tony Luo
备注:

点击查看摘要

Abstract: Low Earth Orbit (LEO) satellite constellations have seen a sharp increase of deployment in recent years, due to their distinctive capabilities of providing broadband Internet access and enabling global data acquisition as well as large-scale AI applications. To apply machine learning (ML) in such applications, the traditional way of downloading satellite data such as imagery to a ground station (GS) and then training a model in a centralized manner, is not desirable because of the limited bandwidth, intermittent connectivity between satellites and the GS, and privacy concerns on transmitting raw data. Federated Learning (FL) as an emerging communication and computing paradigm provides a potentially supreme solution to this problem. However, we show that existing FL solutions do not fit well in such LEO constellation scenarios because of significant challenges such as excessive convergence delay and unreliable wireless channels. To this end, we propose to introduce high-altitude platforms (HAPs) as distributed parameter servers (PSs) and propose a synchronous FL algorithm, FedHAP, to accomplish model training in an efficient manner via inter-satellite collaboration. To accelerate convergence, we also propose a layered communication scheme between satellites and HAPs that FedHAP leverages. Our simulations demonstrate that FedHAP attains model convergence in much fewer communication rounds than benchmarks, cutting the training time substantially from several days down to a few hours with the same level of resulting accuracy.

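At its core, the HAP plays the parameter-server role of synchronous federated averaging over whichever satellite models it can reach in a round. The dataset-size weighting below is standard FedAvg; FedHAP's inter-satellite collaboration and layered communication scheme are not modeled in this sketch:

```python
def fedavg(models, sizes):
    """Weighted parameter averaging at the parameter server (here, the HAP).

    models: one flat parameter list per satellite
    sizes:  local dataset size per satellite (aggregation weights)
    """
    total = sum(sizes)
    n_params = len(models[0])
    return [sum(m[i] * s for m, s in zip(models, sizes)) / total
            for i in range(n_params)]
```

Each synchronous round, satellites train locally, upload parameters during a visibility window, and download the averaged model; the paper's contribution is making those rounds fast despite intermittent LEO-to-HAP links.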

ML-66-标题 Towards a Comprehensive Solution for a Vision-based Digitized Neurological Examination

链接: https://arxiv.org/abs/2205.07209
作者: Trung-Hieu Hoang, Mona Zehni, Huaijin Xu, George Heintz, Christopher Zallek, Minh N. Do
备注:

点击查看摘要

Abstract: The ability to use digitally recorded and quantified neurological exam information is important to help healthcare systems deliver better care, in-person and via telehealth, as they compensate for a growing shortage of neurologists. Current neurological digital biomarker pipelines, however, are narrowed down to a specific neurological exam component or applied for assessing specific conditions. In this paper, we propose an accessible vision-based exam and documentation solution called Digitized Neurological Examination (DNE) to expand exam biomarker recording options and clinical applications using a smartphone/tablet. Through our DNE software, healthcare providers in clinical settings and people at home are enabled to video capture an examination while performing instructed neurological tests, including finger tapping, finger to finger, forearm roll, and stand-up and walk. Our modular design of the DNE software supports integrations of additional tests. The DNE extracts from the recorded examinations the 2D/3D human-body pose and quantifies kinematic and spatio-temporal features. The features are clinically relevant and allow clinicians to document and observe the quantified movements and the changes of these metrics over time. A web server and a user interface for recordings viewing and feature visualizations are available. DNE was evaluated on a collected dataset of 21 subjects containing normal and simulated-impaired movements. The overall accuracy of DNE is demonstrated by classifying the recorded movements using various machine learning models. Our tests show an accuracy beyond 90% for upper-limb tests and 80% for the stand-up and walk tests.

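One of the simplest spatio-temporal features such a pipeline extracts can be sketched on a finger-to-thumb distance series produced by the pose estimator: count taps as upward threshold crossings, then divide by duration for a tap rate. The threshold rule is an assumption for illustration; DNE's actual feature set is considerably richer:

```python
def tap_count(dist, thresh):
    """Count taps in a finger-to-thumb distance series: one tap per
    upward crossing of the threshold."""
    taps = 0
    above = dist[0] > thresh
    for d in dist[1:]:
        if d > thresh and not above:
            taps += 1
        above = d > thresh
    return taps

def tap_rate(dist, thresh, fps):
    """Taps per second given the video frame rate."""
    return tap_count(dist, thresh) * fps / len(dist)
```

Clinicians can then track such quantified metrics over time instead of relying on a subjective in-person impression.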

ML-67-标题 Sparsity-Aware Robust Normalized Subband Adaptive Filtering algorithms based on Alternating Optimization

链接: https://arxiv.org/abs/2205.07172
作者: Yi Yu, Zongxin Huang, Hongsen He, Yuriy Zakharov, Rodrigo C. de Lamare
备注: 8 pages, 3 figures

点击查看摘要

Abstract: This paper proposes a unified sparsity-aware robust normalized subband adaptive filtering (SA-RNSAF) algorithm for identification of sparse systems under impulsive noise. The proposed SA-RNSAF algorithm generalizes different algorithms by defining the robust criterion and sparsity-aware penalty. Furthermore, by alternating optimization of the parameters (AOP) of the algorithm, including the step-size and the sparsity penalty weight, we develop the AOP-SA-RNSAF algorithm, which not only exhibits fast convergence but also obtains low steady-state misadjustment for sparse systems. Simulations in various noise scenarios have verified that the proposed AOP-SA-RNSAF algorithm outperforms existing techniques.

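A much-simplified cousin of the idea, in fullband form: an NLMS-style update whose error is clipped so that a single impulsive-noise sample cannot blow up the weights. The clipping stands in for the robust criterion; the subband decomposition, the sparsity-aware penalty, and the alternating optimization of step-size and penalty weight are all omitted from this sketch:

```python
def robust_nlms_step(x, d, w, step=0.5, eps=1e-6, clip=1.0):
    """One robust NLMS update on tap-input vector x with desired sample d."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = max(-clip, min(clip, d - y))            # clipped (robust) error
    norm = sum(xi * xi for xi in x) + eps       # input-energy normalization
    return [wi + step * e * xi / norm for wi, xi in zip(w, x)]

# identify a 2-tap system w* = [1.0, -0.5] from noiseless data
true_w = [1.0, -0.5]
inputs = [1.0, -1.0, 0.5, 2.0, -0.3, 1.5, -2.0, 0.8]   # persistently exciting
w = [0.0, 0.0]
prev = 0.0
for n in range(400):
    x0 = inputs[n % len(inputs)]
    x = [x0, prev]
    d = true_w[0] * x[0] + true_w[1] * x[1]
    w = robust_nlms_step(x, d, w)
    prev = x0
```

With clean data the clipping is inactive near convergence, so the filter identifies the system; under impulsive noise the clip bounds each update's damage.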

ML-68-标题 Interpretable Stochastic Model Predictive Control using Distributional Reinforced Estimation for Quadrotor Tracking Systems

链接: https://arxiv.org/abs/2205.07150
作者: Yanran Wang, James O’Keeffe, Qiuchen Qian, David Boyle
备注: 8 pages, 4 figures

点击查看摘要

Abstract: This paper presents a novel trajectory tracker for autonomous quadrotor navigation in dynamic and complex environments. The proposed framework integrates a distributional Reinforcement Learning (RL) estimator for unknown aerodynamic effects into a Stochastic Model Predictive Controller (SMPC) for trajectory tracking. Aerodynamic effects derived from drag forces and moment variations are difficult to model directly and accurately. Most current quadrotor tracking systems therefore treat them as simple 'disturbances' in conventional control approaches. We propose Quantile-approximation-based Distributional Reinforced-disturbance-estimator, an aerodynamic disturbance estimator, to accurately identify disturbances, i.e., uncertainties between the true and estimated values of aerodynamic effects. Simplified Affine Disturbance Feedback is employed for control parameterization to guarantee convexity, which we then integrate with a SMPC to achieve sufficient and non-conservative control signals. We demonstrate that our system improves the cumulative tracking errors by at least 66% with unknown and diverse aerodynamic forces compared with recent state-of-the-art. Concerning traditional Reinforcement Learning's non-interpretability, we provide convergence and stability guarantees of Distributional RL and SMPC, respectively, with non-zero mean disturbances.

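Quantile approximation of a disturbance or return distribution is typically trained with the pinball (quantile) loss, whose asymmetry controls which quantile the estimator converges to. A scalar sketch (this is the standard quantile-regression loss, not the paper's full training objective):

```python
def pinball_loss(y_true, y_pred, tau):
    """Pinball loss for quantile level tau in (0, 1): under-predictions are
    penalized by tau, over-predictions by (1 - tau)."""
    diff = y_true - y_pred
    return max(tau * diff, (tau - 1) * diff)
```

With tau close to 1, under-predicting the disturbance is priced heavily, so the learned quantile estimate is conservative in the direction that matters for safe control.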

ML-69-标题 BackLink: Supervised Local Training with Backward Links

链接: https://arxiv.org/abs/2205.07141
作者: Wenzhe Guo, Mohammed E Fouda, Ahmed M. Eltawil, Khaled N. Salama
备注:

点击查看摘要

Abstract: Empowered by the backpropagation (BP) algorithm, deep neural networks have dominated the race in solving various cognitive tasks. The restricted training pattern in the standard BP requires end-to-end error propagation, causing large memory cost and prohibiting model parallelization. Existing local training methods aim to resolve the training obstacle by completely cutting off the backward path between modules and isolating their gradients to reduce memory cost and accelerate the training process. These methods prevent errors from flowing between modules and hence information exchange, resulting in inferior performance. This work proposes a novel local training algorithm, BackLink, which introduces inter-module backward dependency and allows errors to flow between modules. The algorithm facilitates information to flow backward along with the network. To preserve the computational advantage of local training, BackLink restricts the error propagation length within the module. Extensive experiments performed in various deep convolutional neural networks demonstrate that our method consistently improves the classification performance of local training algorithms over other methods. For example, in ResNet32 with 16 local modules, our method surpasses the conventional greedy local training method by 4.00% and a recent work by 1.83% in accuracy on CIFAR10, respectively. Analysis of computational costs reveals that small overheads are incurred in GPU memory costs and runtime on multiple GPUs. Our method can lead up to a 79% reduction in memory cost and 52% in simulation runtime in ResNet110 compared to the standard BP. Therefore, our method could create new opportunities for improving training algorithms towards better efficiency and biological plausibility.


ML-70-标题 Practical Insights of Repairing Model Problems on Image Classification

链接: https://arxiv.org/abs/2205.07116
作者: Akihito Yoshii, Susumu Tokumoto, Fuyuki Ishikawa
备注:

点击查看摘要

Abstract: Additional training of a deep learning model can cause negative effects on the results, turning an initially positive sample into a negative one (degradation). Such degradation is possible in real-world use cases due to the diversity of sample characteristics. That is, a set of samples is a mixture of critical ones which should not be missed and less important ones. Therefore, we cannot understand the performance by accuracy alone. While existing research aims to prevent model degradation, insights into the related methods are needed to grasp their benefits and limitations. In this talk, we will present implications derived from a comparison of methods for reducing degradation. Especially, we formulated use cases for industrial settings in terms of arrangements of a data set. The results imply that, because of the trade-off between accuracy and preventing degradation, a practitioner should continuously look for better methods while accounting for dataset availability and the life cycle of the AI system.

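The degradation described above is straightforward to measure given per-sample correctness of the model before and after retraining: the fraction of samples that flip from correct to incorrect (sometimes called negative flips). Accuracy alone hides this quantity, since gains elsewhere can mask the flips:

```python
def degradation_rate(before_correct, after_correct):
    """Fraction of samples the old model got right that the retrained
    model now gets wrong."""
    assert len(before_correct) == len(after_correct)
    flips = sum(1 for b, a in zip(before_correct, after_correct) if b and not a)
    return flips / len(before_correct)
```

In practice one would weight flips on the critical samples more heavily, per the point above that not all samples matter equally.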

ML-71-标题 SystemMatch: optimizing preclinical drug models to human clinical outcomes via generative latent-space matching

链接: https://arxiv.org/abs/2205.07110
作者: Scott Gigante, Varsha G. Raghavan, Amanda M. Robinson, Robert A. Barton, Adeeb H. Rahman, Drausin F. Wulsin, Jacques Banchereau, Noam Solomon, Luis F. Voloch, Fabian J. Theis
备注: Published at the MLDD workshop, ICLR 2022

点击查看摘要

Abstract: Translating the relevance of preclinical models (*in vitro*, animal models, or organoids) to their relevance in humans presents an important challenge during drug development. The rising abundance of single-cell genomic data from human tumors and tissue offers a new opportunity to optimize model systems by their similarity to targeted human cell types in disease. In this work, we introduce SystemMatch to assess the fit of preclinical model systems to an *in sapiens* target population and to recommend experimental changes to further optimize these systems. We demonstrate this through an application to developing *in vitro* systems to model human tumor-derived suppressive macrophages. We show with held-out *in vivo* controls that our pipeline successfully ranks macrophage subpopulations by their biological similarity to the target population, and apply this analysis to rank a series of 18 *in vitro* macrophage systems perturbed with a variety of cytokine stimulations. We extend this analysis to predict the behavior of 66 *in silico* model systems generated using a perturbational autoencoder and apply a k-medoids approach to recommend a subset of these model systems for further experimental development in order to fully explore the space of possible perturbations. Through this use case, we demonstrate a novel approach to model system development to generate a system more similar to human biology.

摘要:将临床前模型(体外模型、动物模型或类器官)的相关性转化为其在人类中的相关性,是药物开发过程中的一项重要挑战。来自人类肿瘤和组织的单细胞基因组数据日益丰富,为依据模型系统与疾病中目标人类细胞类型的相似性来优化模型系统提供了新机会。在这项工作中,我们提出 SystemMatch,用于评估临床前模型系统对人体(in sapiens)目标群体的拟合程度,并推荐实验改动以进一步优化这些系统。我们通过开发体外系统来模拟人类肿瘤来源的抑制性巨噬细胞这一应用来证明这一点。我们用留出的体内对照表明,我们的流程能够按巨噬细胞亚群与目标群体的生物学相似性对其成功排序,并将该分析应用于对 18 个经多种细胞因子刺激扰动的体外巨噬细胞系统进行排名。我们将该分析扩展到预测由扰动自编码器生成的 66 个计算机模拟(in silico)模型系统的行为,并采用 k-medoids 方法推荐其中一部分模型系统用于进一步的实验开发,以充分探索可能的扰动空间。通过这一用例,我们展示了一种新颖的模型系统开发方法,以生成与人类生物学更相似的系统。

ML-72-标题 Unsupervised Abnormal Traffic Detection through Topological Flow Analysis

链接: https://arxiv.org/abs/2205.07109
作者: Paul Irofti, Andrei Pătraşcu, Andrei Iulian Hîji
备注:

点击查看摘要

Abstract: Cyberthreats are a permanent concern in our modern technological world. In the recent years, sophisticated traffic analysis techniques and anomaly detection (AD) algorithms have been employed to face the more and more subversive adversarial attacks. A malicious intrusion, defined as an invasive action intending to illegally exploit private resources, manifests through unusual data traffic and/or abnormal connectivity pattern. Despite the plethora of statistical or signature-based detectors currently provided in the literature, the topological connectivity component of a malicious flow is less exploited. Furthermore, a great proportion of the existing statistical intrusion detectors are based on supervised learning, that relies on labeled data. By viewing network flows as weighted directed interactions between a pair of nodes, in this paper we present a simple method that facilitate the use of connectivity graph features in unsupervised anomaly detection algorithms. We test our methodology on real network traffic datasets and observe several improvements over standard AD.

摘要:在现代技术世界中,网络威胁是一个长期存在的问题。近年来,人们采用复杂的流量分析技术和异常检测(AD)算法来应对越来越多的颠覆性对抗攻击。恶意入侵被定义为旨在非法利用私有资源的侵入性行为,表现为异常的数据流量和/或异常的连接模式。尽管文献中已有大量基于统计或签名的检测器,恶意流的拓扑连接成分却较少被利用。此外,现有统计入侵检测器中的很大一部分基于监督学习,依赖于标注数据。通过将网络流视为一对节点之间的加权有向交互,本文提出了一种简单的方法,便于在无监督异常检测算法中使用连接图特征。我们在真实网络流量数据集上测试了该方法,并观察到相对标准 AD 的多项改进。
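
下面用纯 Python 给出一个极简示意,说明如何把一组有向网络流(源、目的、流量)汇总为可供无监督异常检测器使用的简单连接图特征(出/入度、出/入流量)。这只是对"拓扑连接特征"这一思想的假设性草图,函数名与特征选择均为本文虚构,并非论文的实际方法。

```python
from collections import defaultdict

def connectivity_features(flows):
    """Compute simple per-node connectivity features from directed flows.

    flows: iterable of (src, dst, weight) tuples, e.g. byte counts.
    Returns {node: (out_degree, in_degree, out_weight, in_weight)}.
    """
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    out_w, in_w = defaultdict(float), defaultdict(float)
    for src, dst, w in flows:
        out_deg[src] += 1
        in_deg[dst] += 1
        out_w[src] += w
        in_w[dst] += w
    nodes = set(out_deg) | set(in_deg)
    return {n: (out_deg[n], in_deg[n], out_w[n], in_w[n]) for n in nodes}

# Hypothetical flows: A fans out traffic, C only receives.
flows = [("A", "B", 10.0), ("A", "C", 5.0), ("B", "C", 1.0)]
feats = connectivity_features(flows)
```

得到的特征向量随后即可送入任意无监督检测器(如基于密度或距离的离群点检测)。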

ML-73-标题 A Learning Approach for Joint Design of Event-triggered Control and Power-Efficient Resource Allocation

链接: https://arxiv.org/abs/2205.07070
作者: Atefeh Termehchi, Mehdi Rasti
备注: 14 pages, 12 figures, in IEEE Transactions on Vehicular Technology

点击查看摘要

Abstract: In emerging Industrial Cyber-Physical Systems (ICPSs), the joint design of communication and control sub-systems is essential, as these sub-systems are interconnected. In this paper, we study the joint design problem of an event-triggered control and an energy-efficient resource allocation in a fifth generation (5G) wireless network. We formally state the problem as a multi-objective optimization one, aiming to minimize the number of updates on the actuators’ input and the power consumption in the downlink transmission. To address the problem, we propose a model-free hierarchical reinforcement learning approach, with a uniformly ultimate boundedness stability guarantee, that learns four policies simultaneously. These policies contain an update time policy on the actuators’ input, a control policy, and energy-efficient sub-carrier and power allocation policies. Our simulation results show that the proposed approach can properly control a simulated ICPS and significantly decrease the number of updates on the actuators’ input as well as the downlink power consumption.

摘要:在新兴的工业信息物理系统(ICPS)中,由于通信与控制子系统相互关联,二者的联合设计至关重要。本文研究第五代(5G)无线网络中事件触发控制与高能效资源分配的联合设计问题。我们将该问题形式化为一个多目标优化问题,旨在最小化执行器输入的更新次数以及下行传输的功耗。为此,我们提出了一种具有一致最终有界性稳定保证的无模型分层强化学习方法,可同时学习四个策略:执行器输入的更新时机策略、控制策略,以及高能效的子载波与功率分配策略。仿真结果表明,所提方法能够正确控制所仿真的 ICPS,并显著减少执行器输入的更新次数与下行功耗。

ML-74-标题 MIND Maximum Mutual Information Based Neural Decoder

链接: https://arxiv.org/abs/2205.07061
作者: Andrea M. Tonello, Nunzio A. Letizia
备注: 5 pages, 5 figures. This work has been submitted to the IEEE for possible publication

点击查看摘要

Abstract: We are assisting at a growing interest in the development of learning architectures with application to digital communication systems. Herein, we consider the detection/decoding problem. We aim at developing an optimal neural architecture for such a task. The definition of the optimal criterion is a fundamental step. We propose to use the mutual information (MI) of the channel input-output signal pair. The computation of the MI is a formidable task, and for the majority of communication channels it is unknown. Therefore, the MI has to be learned. For such an objective, we propose a novel neural MI estimator based on a discriminative formulation. This leads to the derivation of the mutual information neural decoder (MIND). The developed neural architecture is capable not only to solve the decoding problem in unknown channels, but also to return an estimate of the average MI achieved with the coding scheme, as well as the decoding error probability. Several numerical results are reported and compared with maximum a-posteriori (MAP) and maximum likelihood (MaxL) decoding strategies.

摘要:人们对将学习架构应用于数字通信系统的兴趣日益增长。在此,我们考虑检测/解码问题,旨在为该任务开发最优的神经架构,其中最优准则的定义是一个基本步骤。我们建议使用信道输入-输出信号对的互信息(MI)。MI 的计算是一项艰巨的任务,对大多数通信信道而言其值是未知的,因此必须对 MI 进行学习。为此,我们提出了一种基于判别式公式的新型神经 MI 估计器,并由此推导出互信息神经解码器(MIND)。所开发的神经架构不仅能解决未知信道中的解码问题,还能返回编码方案所达到的平均 MI 的估计值以及解码错误概率。我们报告了若干数值结果,并与最大后验(MAP)和最大似然(MaxL)解码策略进行了比较。
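
MIND 要学习的核心量是信道输入-输出对的互信息。对于离散联合分布,互信息可以直接按定义 I(X;Y) = Σ p(x,y) log [p(x,y) / (p(x)p(y))] 计算;下面的纯 Python 草图仅用于说明这一量本身(神经估计器正是在该量无法解析计算时用来近似它的),并非论文的判别式估计方法。

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits for a discrete joint pmf given as {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():  # marginalize
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    mi = 0.0
    for (x, y), p in joint.items():
        if p > 0:
            mi += p * math.log2(p / (px[x] * py[y]))
    return mi

# A noiseless binary channel: X fully determines Y, so I(X;Y) = 1 bit.
noiseless = {(0, 0): 0.5, (1, 1): 0.5}
# Independent X and Y: I(X;Y) = 0 bits.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
```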

ML-75-标题 GAN-Aimbots Using Machine Learning for Cheating in First Person Shooters

链接: https://arxiv.org/abs/2205.07060
作者: Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki
备注: Accepted to IEEE Transactions on Games. Source code available at this https URL

点击查看摘要

Abstract: Playing games with cheaters is not fun, and in a multi-billion-dollar video game industry with hundreds of millions of players, game developers aim to improve the security and, consequently, the user experience of their games by preventing cheating. Both traditional software-based methods and statistical systems have been successful in protecting against cheating, but recent advances in the automatic generation of content, such as images or speech, threaten the video game industry; they could be used to generate artificial gameplay indistinguishable from that of legitimate human players. To better understand this threat, we begin by reviewing the current state of multiplayer video game cheating, and then proceed to build a proof-of-concept method, GAN-Aimbot. By gathering data from various players in a first-person shooter game we show that the method improves players’ performance while remaining hidden from automatic and manual protection mechanisms. By sharing this work we hope to raise awareness on this issue and encourage further research into protecting the gaming communities.

摘要:与作弊者一起玩游戏并不有趣。在拥有数亿玩家、规模达数十亿美元的视频游戏行业中,游戏开发者希望通过防止作弊来提高游戏的安全性,进而改善用户体验。传统的基于软件的方法和统计系统都能较好地防范作弊,但图像、语音等内容自动生成技术的最新进展对视频游戏行业构成威胁:它们可以被用来生成与合法人类玩家难以区分的人工游戏操作。为了更好地理解这一威胁,我们首先回顾了多人视频游戏作弊的现状,然后构建了一个概念验证方法 GAN-Aimbot。通过在一款第一人称射击游戏中收集不同玩家的数据,我们表明该方法能提升玩家的表现,同时不被自动和人工保护机制发现。通过分享这项工作,我们希望提高人们对这一问题的认识,并鼓励对保护游戏社区的进一步研究。

ML-76-标题 Generalization error bounds for DECONET a deep unfolded network for analysis Compressive Sensing

链接: https://arxiv.org/abs/2205.07050
作者: Vasiliki Kouni
备注:

点击查看摘要

Abstract: In this paper, we propose a new deep unfolding neural network – based on a state-of-the-art optimization algorithm – for analysis Compressed Sensing. The proposed network called Decoding Network (DECONET) implements a decoder that reconstructs vectors from their incomplete, noisy measurements. Moreover, DECONET jointly learns a redundant analysis operator for sparsification, which is shared across the layers of DECONET. We study the generalization ability of DECONET. Towards that end, we first estimate the Rademacher complexity of the hypothesis class consisting of all the decoders that DECONET can implement. Then, we provide generalization error bounds, in terms of the aforementioned estimate. Finally, we present numerical experiments which confirm the validity of our theoretical results.

摘要:在本文中,我们提出了一个基于最新优化算法的新型深度展开神经网络,用于分析型压缩感知(analysis Compressed Sensing)。所提出的网络称为解码网络(DECONET),它实现了一个从不完整、含噪的测量值中重建向量的解码器。此外,DECONET 联合学习一个用于稀疏化的冗余分析算子,该算子在 DECONET 的各层之间共享。我们研究了 DECONET 的泛化能力:首先估计由 DECONET 所能实现的全部解码器组成的假设类的 Rademacher 复杂度,然后基于该估计给出泛化误差界。最后,我们给出数值实验,证实了理论结果的有效性。

ML-77-标题 Fake News Quick Detection on Dynamic Heterogeneous Information Networks

链接: https://arxiv.org/abs/2205.07039
作者: Jin Ho Go, Alina Sari, Jiaojiao Jiang, Shuiqiao Yang, Sanjay Jha
备注:

点击查看摘要

Abstract: The spread of fake news has caused great harm to society in recent years. So the quick detection of fake news has become an important task. Some current detection methods often model news articles and other related components as a static heterogeneous information network (HIN) and use expensive message-passing algorithms. However, in the real-world, quickly identifying fake news is of great significance and the network may vary over time in terms of dynamic nodes and edges. Therefore, in this paper, we propose a novel Dynamic Heterogeneous Graph Neural Network (DHGNN) for fake news quick detection. More specifically, we first implement BERT and fine-tuned BERT to get a semantic representation of the news article contents and author profiles and convert it into graph data. Then, we construct the heterogeneous news-author graph to reflect contextual information and relationships. Additionally, we adapt ideas from personalized PageRank propagation and dynamic propagation to heterogeneous networks in order to reduce the time complexity of back-propagating through many nodes during training. Experiments on three real-world fake news datasets show that DHGNN can outperform other GNN-based models in terms of both effectiveness and efficiency.

摘要:近年来,假新闻的传播对社会造成了巨大伤害,因此快速检测假新闻成为一项重要任务。目前的一些检测方法通常将新闻文章及其他相关组件建模为静态异质信息网络(HIN),并使用代价高昂的消息传递算法。然而在现实世界中,快速识别假新闻意义重大,且网络的节点和边可能随时间动态变化。因此,本文提出一种新型动态异质图神经网络(DHGNN)用于假新闻快速检测。更具体地说,我们首先使用 BERT 并对其微调,以获得新闻文章内容和作者资料的语义表示,并将其转换为图数据;然后构建异质的"新闻-作者"图以反映上下文信息和关系。此外,我们将个性化 PageRank 传播与动态传播的思想引入异质网络,以降低训练期间通过大量节点反向传播的时间复杂度。在三个真实假新闻数据集上的实验表明,DHGNN 在有效性和效率上均能优于其他基于 GNN 的模型。
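
文中提到借鉴了个性化 PageRank 传播的思想。下面给出个性化 PageRank 幂迭代的一个极简纯 Python 草图,仅用于说明该传播机制本身(`alpha` 为重启概率;悬挂节点的质量回流到种子节点,这是本示例的假设性处理),与论文在异质图上的具体实现无关。

```python
def personalized_pagerank(edges, seed, alpha=0.15, iters=100):
    """Personalized PageRank by power iteration on an unweighted digraph.

    edges: {node: [successors]}; seed: the restart node; alpha: restart prob.
    Dangling nodes return their mass to the seed (an assumption of this sketch).
    """
    nodes = set(edges) | {v for succs in edges.values() for v in succs}
    rank = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: (alpha if n == seed else 0.0) for n in nodes}
        for u in nodes:
            succs = edges.get(u, [])
            if succs:
                share = (1 - alpha) * rank[u] / len(succs)
                for v in succs:
                    nxt[v] += share
            else:  # dangling node
                nxt[seed] += (1 - alpha) * rank[u]
        rank = nxt
    return rank

# Hypothetical 3-node graph: a -> b, b -> {a, c}, c dangling.
pr = personalized_pagerank({"a": ["b"], "b": ["a", "c"]}, seed="a")
```

迭代中每一轮都保持总质量为 1,节点离种子越"近"得分越高。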

ML-78-标题 High Performance of Gradient Boosting in Binding Affinity Prediction

链接: https://arxiv.org/abs/2205.07023
作者: Dmitrii Gavrilev, Nurlybek Amangeldiuly, Sergei Ivanov, Evgeny Burnaev
备注:

点击查看摘要

Abstract: Prediction of protein-ligand (PL) binding affinity remains the key to drug discovery. Popular approaches in recent years involve graph neural networks (GNNs), which are used to learn the topology and geometry of PL complexes. However, GNNs are computationally heavy and have poor scalability to graph sizes. On the other hand, traditional machine learning (ML) approaches, such as gradient-boosted decision trees (GBDTs), are lightweight yet extremely efficient for tabular data. We propose to use PL interaction features along with PL graph-level features in GBDT. We show that this combination outperforms the existing solutions.

摘要:蛋白质-配体(PL)结合亲和力的预测仍然是药物发现的关键。近年来流行的方法涉及图神经网络(GNN),用于学习 PL 复合物的拓扑结构和几何形状。然而,GNN 计算量大,且对图规模的可扩展性较差。另一方面,梯度提升决策树(GBDT)等传统机器学习(ML)方法对表格数据轻量而高效。我们建议在 GBDT 中同时使用 PL 相互作用特征和 PL 图级特征,并表明这种组合优于现有解决方案。
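
梯度提升的基本思想是逐轮用弱学习器拟合当前残差(平方损失下的负梯度)。下面是在一维决策桩上的极简纯 Python 草图,仅用于说明 GBDT 的迭代机制;函数名、玩具数据与超参数均为示例假设,实际工作通常会使用 XGBoost、LightGBM 等成熟实现。

```python
def fit_stump(xs, residuals):
    """Best single-threshold stump on 1-D inputs, minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:  # threshold puts everything on one side
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all inputs identical: fall back to a constant stump
        m = sum(residuals) / len(residuals)
        return min(xs), m, m
    return best[1:]

def gradient_boost(xs, ys, rounds=50, lr=0.3):
    """Additive model of stumps; each round fits the current residuals."""
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, pred)]  # neg. gradient of squared loss
        t, lm, rm = fit_stump(xs, resid)
        stumps.append((t, lm, rm))
        pred = [p + lr * (lm if x <= t else rm) for x, p in zip(xs, pred)]
    return stumps

def predict(stumps, x, lr=0.3):  # lr must match the training shrinkage
    return sum(lr * (lm if x <= t else rm) for t, lm, rm in stumps)

# Toy 1-D regression: a step function that a single stump can represent exactly.
stumps = gradient_boost([1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0])
```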

ML-79-标题 Cliff Diving Exploring Reward Surfaces in Reinforcement Learning Environments

链接: https://arxiv.org/abs/2205.07015
作者: Ryan Sullivan, J. K. Terry, Benjamin Black, John P. Dickerson
备注:

点击查看摘要

Abstract: Visualizing optimization landscapes has led to many fundamental insights in numeric optimization, and novel improvements to optimization techniques. However, visualizations of the objective that reinforcement learning optimizes (the “reward surface”) have only ever been generated for a small number of narrow contexts. This work presents reward surfaces and related visualizations of 27 of the most widely used reinforcement learning environments in Gym for the first time. We also explore reward surfaces in the policy gradient direction and show for the first time that many popular reinforcement learning environments have frequent “cliffs” (sudden large drops in expected return). We demonstrate that A2C often “dives off” these cliffs into low reward regions of the parameter space while PPO avoids them, confirming a popular intuition for PPO’s improved performance over previous methods. We additionally introduce a highly extensible library that allows researchers to easily generate these visualizations in the future. Our findings provide new intuition to explain the successes and failures of modern RL methods, and our visualizations concretely characterize several failure modes of reinforcement learning agents in novel ways.

摘要:优化地形的可视化为数值优化带来了许多基本洞见,并推动了优化技术的新改进。然而,强化学习所优化目标("奖励表面")的可视化此前只在少数狭窄场景中生成过。这项工作首次给出了 Gym 中 27 个最常用强化学习环境的奖励表面及相关可视化。我们还探索了策略梯度方向上的奖励表面,并首次表明许多流行的强化学习环境中经常出现"悬崖"(预期回报的突然大幅下降)。我们证明 A2C 经常从这些悬崖"跳入"参数空间的低奖励区域,而 PPO 会避开它们,这印证了关于 PPO 相对以往方法性能更优的流行直觉。我们还引入了一个高度可扩展的库,使研究人员将来能够轻松生成此类可视化。我们的发现为解释现代 RL 方法的成败提供了新的直觉,我们的可视化也以新颖的方式具体刻画了强化学习智能体的几种失败模式。

ML-80-标题 PrefixRL Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning

链接: https://arxiv.org/abs/2205.07000
作者: Rajarshi Roy, Jonathan Raiman, Neel Kant, Ilyas Elkin, Robert Kirby, Michael Siu, Stuart Oberman, Saad Godil, Bryan Catanzaro
备注: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

点击查看摘要

Abstract: In this work, we present a reinforcement learning (RL) based approach to designing parallel prefix circuits such as adders or priority encoders that are fundamental to high-performance digital design. Unlike prior methods, our approach designs solutions tabula rasa purely through learning with synthesis in the loop. We design a grid-based state-action representation and an RL environment for constructing legal prefix circuits. Deep Convolutional RL agents trained on this environment produce prefix adder circuits that Pareto-dominate existing baselines with up to 16.0% and 30.2% lower area for the same delay in the 32b and 64b settings respectively. We observe that agents trained with open-source synthesis tools and cell library can design adder circuits that achieve lower area and delay than commercial tool adders in an industrial cell library.

摘要:在这项工作中,我们提出了一种基于强化学习(RL)的方法来设计并行前缀电路,例如对高性能数字设计至关重要的加法器或优先编码器。与以往方法不同,我们的方法完全从零开始(tabula rasa),纯粹通过带综合回路的学习来设计解决方案。我们设计了一种基于网格的状态-动作表示以及一个用于构造合法前缀电路的 RL 环境。在该环境中训练的深度卷积 RL 智能体生成的前缀加法器电路在帕累托意义上优于现有基线:在相同延迟下,32 位和 64 位设置中的面积分别最多降低 16.0% 和 30.2%。我们观察到,使用开源综合工具和单元库训练的智能体,能够设计出在工业单元库中面积和延迟均低于商业工具加法器的加法器电路。
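
并行前缀电路(如加法器中的进位计算)的本质是对某个结合性运算做前缀扫描。下面用纯 Python 模拟对数深度的 Hillis-Steele 扫描调度,仅用于说明前缀电路计算的是什么,与论文的 RL 设计流程无关;`op` 可替换为任意结合性运算。

```python
def hillis_steele_scan(values, op=lambda a, b: a + b):
    """Inclusive prefix scan with the log-depth Hillis-Steele schedule.

    Each of ceil(log2(n)) rounds combines element i with element i - 2**d,
    mirroring one level of a parallel prefix circuit.
    """
    out = list(values)
    d = 1
    while d < len(out):
        out = [op(out[i - d], out[i]) if i >= d else out[i]
               for i in range(len(out))]
        d *= 2
    return out

scan = hillis_steele_scan([3, 1, 4, 1, 5])
```

对 n 个输入,该调度只需 ceil(log2(n)) 轮,这正是前缀电路用面积换延迟的出发点。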

ML-81-标题 Verifying Neural Networks Against Backdoor Attacks

链接: https://arxiv.org/abs/2205.06992
作者: Long H. Pham, Jun Sun
备注:

点击查看摘要

Abstract: Neural networks have achieved state-of-the-art performance in solving many problems, including many applications in safety/security-critical systems. Researchers also discovered multiple security issues associated with neural networks. One of them is backdoor attacks, i.e., a neural network may be embedded with a backdoor such that a target output is almost always generated in the presence of a trigger. Existing defense approaches mostly focus on detecting whether a neural network is ‘backdoored’ based on heuristics, e.g., activation patterns. To the best of our knowledge, the only line of work which certifies the absence of backdoor is based on randomized smoothing, which is known to significantly reduce neural network performance. In this work, we propose an approach to verify whether a given neural network is free of backdoor with a certain level of success rate. Our approach integrates statistical sampling as well as abstract interpretation. The experiment results show that our approach effectively verifies the absence of backdoor or generates backdoor triggers.

摘要:神经网络在解决许多问题上已取得最先进的性能,包括在安全/安保关键系统中的诸多应用。研究人员也发现了与神经网络相关的多个安全问题,其中之一是后门攻击:即神经网络可能被嵌入后门,使得在触发器存在时几乎总是产生特定的目标输出。现有的防御方法大多侧重于基于启发式规则(例如激活模式)检测神经网络是否"被植入后门"。据我们所知,唯一能证明后门不存在的工作基于随机平滑,而众所周知这会显著降低神经网络的性能。在这项工作中,我们提出一种方法,以一定的成功率验证给定的神经网络是否不含后门。我们的方法结合了统计采样与抽象解释。实验结果表明,该方法能够有效地验证后门不存在,或生成后门触发器。

ML-82-标题 QHD A brain-inspired hyperdimensional reinforcement learning algorithm

链接: https://arxiv.org/abs/2205.06978
作者: Yang Ni, Danny Abraham, Mariam Issa, Yeseong Kim, Pietro Mecati, Mohsen Imani
备注:

点击查看摘要

Abstract: Reinforcement Learning (RL) has opened up new opportunities to solve a wide range of complex decision-making tasks. However, modern RL algorithms, e.g., Deep Q-Learning, are based on deep neural networks, putting high computational costs when running on edge devices. In this paper, we propose QHD, a Hyperdimensional Reinforcement Learning, that mimics brain properties toward robust and real-time learning. QHD relies on a lightweight brain-inspired model to learn an optimal policy in an unknown environment. We first develop a novel mathematical foundation and encoding module that maps state-action space into high-dimensional space. We accordingly develop a hyperdimensional regression model to approximate the Q-value function. The QHD-powered agent makes decisions by comparing Q-values of each possible action. We evaluate the effect of the different RL training batch sizes and local memory capacity on the QHD quality of learning. Our QHD is also capable of online learning with tiny local memory capacity, which can be as small as the training batch size. QHD provides real-time learning by further decreasing the memory capacity and the batch size. This makes QHD suitable for highly-efficient reinforcement learning in the edge environment, where it is crucial to support online and real-time learning. Our solution also supports a small experience replay batch size that provides 12.3 times speedup compared to DQN while ensuring minimal quality loss. Our evaluation shows QHD capability for real-time learning, providing 34.6 times speedup and significantly better quality of learning than state-of-the-art deep RL algorithms.

摘要:强化学习(RL)为解决各种复杂的决策任务开辟了新机会。然而,深度 Q 学习等现代 RL 算法基于深度神经网络,在边缘设备上运行时计算成本很高。在本文中,我们提出 QHD,一种模仿大脑特性的超维强化学习方法,以实现鲁棒的实时学习。QHD 依靠一个轻量的类脑模型在未知环境中学习最优策略。我们首先建立一个新颖的数学基础和编码模块,将状态-动作空间映射到高维空间,并据此开发一个超维回归模型来近似 Q 值函数。QHD 驱动的智能体通过比较每个可能动作的 Q 值来做出决策。我们评估了不同的 RL 训练批量大小和本地内存容量对 QHD 学习质量的影响。QHD 还能以极小的本地内存(可小至训练批量大小)进行在线学习,并通过进一步降低内存容量和批量大小来提供实时学习。这使得 QHD 适合在边缘环境中进行高效的强化学习,在这种环境中支持在线和实时学习至关重要。我们的方案还支持较小的经验回放批量,在保证最小质量损失的同时,相比 DQN 提供 12.3 倍的加速。我们的评估显示了 QHD 的实时学习能力:相比最先进的深度 RL 算法,可提供 34.6 倍的加速和显著更好的学习质量。
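
QHD 用超维回归近似的是标准的 Q 值函数。下面给出表格型 Q 学习在一个假想的确定性小链式 MDP 上的纯 Python 草图,仅用于说明被近似的 Q 学习更新规则本身(环境 `chain` 与各超参数均为本文虚构),并非论文的超维编码方法。

```python
import random

def q_learning(transition, n_states, n_actions, episodes=500,
               alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a deterministic MDP.

    transition(s, a) -> (next_state, reward, done); episodes start at state 0.
    Returns the learned Q-table as a list of per-state action-value lists.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])
            s2, r, done = transition(s, a)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])  # temporal-difference update
            s = s2
    return q

# Hypothetical 3-state chain: action 1 advances; reaching state 2 pays 1 and ends.
def chain(s, a):
    if a == 1:
        return s + 1, (1.0 if s + 1 == 2 else 0.0), s + 1 == 2
    return s, 0.0, False

q = q_learning(chain, n_states=3, n_actions=2)
```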

ML-83-标题 Mask CycleGAN Unpaired Multi-modal Domain Translation with Interpretable Latent Variable

链接: https://arxiv.org/abs/2205.06969
作者: Minfa Wang
备注:

点击查看摘要

Abstract: We propose Mask CycleGAN, a novel architecture for unpaired image domain translation built based on CycleGAN, with an aim to address two issues: 1) unimodality in image translation and 2) lack of interpretability of latent variables. Our innovation in the technical approach is comprised of three key components: masking scheme, generator and objective. Experimental results demonstrate that this architecture is capable of bringing variations to generated images in a controllable manner and is reasonably robust to different masks.

摘要:我们提出 Mask CycleGAN,一种基于 CycleGAN 构建的新型非配对图像域翻译架构,旨在解决两个问题:1)图像翻译的单模态性;2)潜变量缺乏可解释性。我们在技术方法上的创新由三个关键部分组成:掩码方案、生成器和目标函数。实验结果表明,该架构能够以可控的方式为生成图像带来变化,并对不同掩码具有合理的鲁棒性。

ML-84-标题 No-regret learning for repeated non-cooperative games with lossy bandits

链接: https://arxiv.org/abs/2205.06968
作者: Wenting Liu, Jinlong Lei, Peng Yi, Yiguang Hong
备注: 14 pages,11 figures

点击查看摘要

Abstract: This paper considers no-regret learning for repeated continuous-kernel games with lossy bandit feedback. Since it is difficult to give the explicit model of the utility functions in dynamic environments, the players’ action can only be learned with bandit feedback. Moreover, because of unreliable communication channels or privacy protection, the bandit feedback may be lost or dropped at random. Therefore, we study the asynchronous online learning strategy of the players to adaptively adjust the next actions for minimizing the long-term regret loss. The paper provides a novel no-regret learning algorithm, called Online Gradient Descent with lossy bandits (OGD-lb). We first give the regret analysis for concave games with differentiable and Lipschitz utilities. Then we show that the action profile converges to a Nash equilibrium with probability 1 when the game is also strictly monotone. We further provide the mean square convergence rate \mathcal{O}\left(k^{-2\min\{\beta, 1/6\}}\right) when the game is \beta -strongly monotone. In addition, we extend the algorithm to the case when the loss probability of the bandit feedback is unknown, and prove its almost sure convergence to Nash equilibrium for strictly monotone games. Finally, we take the resource management in fog computing as an application example, and carry out numerical experiments to empirically demonstrate the algorithm performance.

摘要:本文研究带有损 bandit 反馈的重复连续核博弈中的无悔学习。由于在动态环境中难以给出效用函数的显式模型,玩家的动作只能通过 bandit 反馈来学习。此外,由于通信信道不可靠或出于隐私保护,bandit 反馈可能随机丢失或被丢弃。因此,我们研究玩家的异步在线学习策略,使其自适应地调整下一步动作,以最小化长期遗憾损失。本文提出了一种新颖的无悔学习算法,称为有损 bandit 在线梯度下降(OGD-lb)。我们首先对具有可微且 Lipschitz 效用的凹博弈进行遗憾分析;然后证明当博弈还是严格单调时,动作组合以概率 1 收敛到纳什均衡;并进一步给出当博弈为 \beta-强单调时的均方收敛速率 \mathcal{O}\left(k^{-2\min\{\beta, 1/6\}}\right)。此外,我们将算法推广到 bandit 反馈丢失概率未知的情形,并证明其在严格单调博弈中几乎必然收敛到纳什均衡。最后,我们以雾计算中的资源管理为应用示例,进行数值实验以实证展示算法性能。
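
为了直观理解"有损 bandit 反馈下的在线梯度下降",下面在一个强单调的一维玩具问题上做极简模拟:每轮反馈以固定概率丢失,丢失时迭代点保持不变,步长取经典的 1/sqrt(t)。该草图仅示意反馈随机丢失并不破坏收敛这一现象,并非论文的 OGD-lb 算法及其速率分析。

```python
import random

def ogd_with_lossy_feedback(grad, x0, steps=2000, loss_prob=0.3, seed=1):
    """Online gradient descent where each round's feedback is lost w.p. loss_prob.

    On lost rounds the iterate stays put; otherwise take a 1/sqrt(t) step.
    """
    rng = random.Random(seed)
    x = x0
    for t in range(1, steps + 1):
        if rng.random() < loss_prob:
            continue  # bandit feedback dropped this round
        x -= grad(x) / t ** 0.5
    return x

# Strongly monotone toy problem: f(x) = (x - 2)^2, grad f(x) = 2(x - 2), x* = 2.
x_final = ogd_with_lossy_feedback(lambda x: 2 * (x - 2), x0=10.0)
```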

ML-85-标题 Bayesian Physics-Informed Extreme Learning Machine for Forward and Inverse PDE Problems with Noisy Data

链接: https://arxiv.org/abs/2205.06948
作者: Xu Liu, Wen Yao, Wei Peng, Weien Zhou
备注:

点击查看摘要

Abstract: Physics-informed extreme learning machine (PIELM) has recently received significant attention as a rapid version of physics-informed neural network (PINN) for solving partial differential equations (PDEs). The key characteristic is to fix the input layer weights with random values and use Moore-Penrose generalized inverse for the output layer weights. The framework is effective, but it easily suffers from overfitting noisy data and lacks uncertainty quantification for the solution under noise this http URL this end, we develop the Bayesian physics-informed extreme learning machine (BPIELM) to solve both forward and inverse linear PDE problems with noisy data in a unified framework. In our framework, a prior probability distribution is introduced in the output layer for extreme learning machine with physic laws and the Bayesian method is used to estimate the posterior of parameters. Besides, for inverse PDE problems, problem parameters considered as new output layer weights are unified in a framework with forward PDE problems. Finally, we demonstrate BPIELM considering both forward problems, including Poisson, advection, and diffusion equations, as well as inverse problems, where unknown problem parameters are estimated. The results show that, compared with PIELM, BPIELM quantifies uncertainty arising from noisy data and provides more accurate predictions. In addition, BPIELM is considerably cheaper than PINN in terms of the computational cost.

摘要:物理信息极限学习机(PIELM)作为求解偏微分方程(PDE)的物理信息神经网络(PINN)的快速版本,最近受到了广泛关注。其关键特征是用随机值固定输入层权重,并用 Moore-Penrose 广义逆求解输出层权重。该框架是有效的,但容易过拟合噪声数据,且在噪声下缺乏对解的不确定性量化。为此,我们开发了贝叶斯物理信息极限学习机(BPIELM),在统一框架内求解带噪声数据的正向与反向线性 PDE 问题。在我们的框架中,为带物理定律约束的极限学习机在输出层引入先验概率分布,并使用贝叶斯方法估计参数的后验。此外,对于反向 PDE 问题,将被视为新输出层权重的问题参数与正向 PDE 问题统一在同一框架中。最后,我们在正向问题(包括泊松方程、对流方程和扩散方程)以及需要估计未知问题参数的反向问题上验证了 BPIELM。结果表明,与 PIELM 相比,BPIELM 能量化噪声数据带来的不确定性并给出更准确的预测;此外,就计算成本而言,BPIELM 比 PINN 便宜得多。

ML-86-标题 Unified Distributed Environment

链接: https://arxiv.org/abs/2205.06946
作者: Woong Gyu La, Sunil Muralidhara, Lingjie Kong, Pratik Nichat
备注:

点击查看摘要

Abstract: We propose Unified Distributed Environment (UDE), an environment virtualization toolkit for reinforcement learning research. UDE is designed to integrate environments built on any simulation platform such as Gazebo, Unity, Unreal, and OpenAI Gym. Through environment virtualization, UDE enables offloading the environment for execution on a remote machine while still maintaining a unified interface. The UDE interface is designed to support multi-agent by default. With environment virtualization and its interface design, the agent policies can be trained in multiple machines for a multi-agent environment. Furthermore, UDE supports integration with existing major RL toolkits for researchers to leverage the benefits. This paper discusses the components of UDE and its design decisions.

摘要:我们提出统一分布式环境(UDE),一个用于强化学习研究的环境虚拟化工具包。UDE 旨在集成构建于任意仿真平台(如 Gazebo、Unity、Unreal 和 OpenAI Gym)之上的环境。通过环境虚拟化,UDE 可以将环境卸载到远程机器上执行,同时仍保持统一的接口。UDE 接口默认支持多智能体。借助环境虚拟化及其接口设计,可以在多台机器上为多智能体环境训练智能体策略。此外,UDE 支持与现有主流 RL 工具包集成,方便研究人员利用其优势。本文讨论了 UDE 的组成部分及其设计决策。

ML-87-标题 Efficient Learning of Interpretable Classification Rules

链接: https://arxiv.org/abs/2205.06936
作者: Bishwamittra Ghosh, Dmitry Malioutov, Kuldeep S. Meel
备注:

点击查看摘要

Abstract: Machine learning has become omnipresent with applications in various safety-critical domains such as medical, law, and transportation. In these domains, high-stake decisions provided by machine learning necessitate researchers to design interpretable models, where the prediction is understandable to a human. In interpretable machine learning, rule-based classifiers are particularly effective in representing the decision boundary through a set of rules comprising input features. The interpretability of rule-based classifiers is in general related to the size of the rules, where smaller rules are considered more interpretable. To learn such a classifier, the brute-force direct approach is to consider an optimization problem that tries to learn the smallest classification rule that has close to maximum accuracy. This optimization problem is computationally intractable due to its combinatorial nature and thus, the problem is not scalable in large datasets. To this end, in this paper we study the triangular relationship among the accuracy, interpretability, and scalability of learning rule-based classifiers. The contribution of this paper is an interpretable learning framework IMLI, that is based on maximum satisfiability (MaxSAT) for synthesizing classification rules expressible in proposition logic. Despite the progress of MaxSAT solving in the last decade, the straightforward MaxSAT-based solution cannot scale. Therefore, we incorporate an efficient incremental learning technique inside the MaxSAT formulation by integrating mini-batch learning and iterative rule-learning. In our experiments, IMLI achieves the best balance among prediction accuracy, interpretability, and scalability. As an application, we deploy IMLI in learning popular interpretable classifiers such as decision lists and decision sets.

摘要:机器学习已广泛应用于医疗、法律和交通等各种安全关键领域。在这些领域中,机器学习给出的高风险决策要求研究人员设计可解释的模型,使人类能够理解其预测。在可解释机器学习中,基于规则的分类器通过一组由输入特征构成的规则来表示决策边界,尤为有效。基于规则的分类器的可解释性通常与规则的规模有关:规则越小,越容易解释。要学习这样的分类器,暴力的直接做法是求解一个优化问题,学习准确率接近最高的最小分类规则。该优化问题由于其组合性质在计算上难以处理,因此无法扩展到大型数据集。为此,本文研究了基于规则的分类器学习中准确率、可解释性与可扩展性之间的三角关系。本文的贡献是一个可解释的学习框架 IMLI,它基于最大可满足性(MaxSAT)来综合可用命题逻辑表达的分类规则。尽管 MaxSAT 求解在过去十年取得了进展,直接基于 MaxSAT 的方案仍无法扩展。因此,我们通过结合小批量学习和迭代式规则学习,将一种高效的增量学习技术融入 MaxSAT 公式中。在我们的实验中,IMLI 在预测准确率、可解释性和可扩展性之间取得了最佳平衡。作为应用,我们将 IMLI 用于学习决策列表和决策集等流行的可解释分类器。
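
IMLI 要解决的核心组合问题是:在保证准确率的前提下找到最小的分类规则。下面用纯 Python 对这一问题做最朴素的穷举示意:在布尔特征上搜索能完美拟合数据的最小"文字合取"规则。其指数级复杂度正说明了为何需要 MaxSAT 与增量学习;数据与函数名均为本文虚构示例。

```python
from itertools import combinations

def smallest_conjunction(samples):
    """Exhaustively find the smallest AND-of-literals rule fitting the data.

    samples: list of (features, label) with features a tuple of 0/1 values.
    A literal is (index, required_value); the rule fires iff all literals hold.
    Returns the smallest perfectly accurate rule, or None if none exists.
    """
    n = len(samples[0][0])
    literals = [(i, v) for i in range(n) for v in (0, 1)]

    def fires(rule, x):
        return all(x[i] == v for i, v in rule)

    for size in range(0, n + 1):  # smallest rules first
        for rule in combinations(literals, size):
            if all(fires(rule, x) == y for x, y in samples):
                return rule
    return None

# Hypothetical data where the label is simply "feature 0 AND feature 1".
data = [((1, 1, 0), True), ((1, 0, 1), False),
        ((0, 1, 1), False), ((1, 1, 1), True)]
rule = smallest_conjunction(data)
```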

ML-88-标题 Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

链接: https://arxiv.org/abs/2205.06935
作者: Donald Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, Minsuk Kahng
备注:

点击查看摘要

Abstract: In this paper, we present DendroMap, a novel approach to interactively exploring large-scale image datasets for machine learning. Machine learning practitioners often explore image datasets by generating a grid of images or projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach effectively scales to large datasets because images are ineffectively organized and interactions are insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap effectively organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interests at multiple levels of abstraction. Our case studies with widely-used image datasets for deep learning demonstrate that users can discover insights about datasets and trained models by examining the diversity of images, identifying underperforming subgroups, and analyzing classification errors. We conducted a user study that evaluates the effectiveness of DendroMap in grouping and searching tasks by comparing it with a gridified version of t-SNE and found that participants preferred DendroMap over the compared method.

摘要:在本文中,我们提出 DendroMap,一种用于交互式探索机器学习大规模图像数据集的新方法。机器学习从业者通常通过生成图像网格,或使用降维技术(如 t-SNE)将图像的高维表示投影到二维来探索图像数据集。然而,这两种方法都难以有效扩展到大型数据集,因为图像组织方式低效、交互支持不足。为应对这些挑战,我们改造了著名的可视化技术 Treemap,开发出 DendroMap。DendroMap 通过从图像的高维表示中提取层次聚类结构来有效组织图像,使用户能够理解数据集的整体分布,并在多个抽象层级上交互式地放大到感兴趣的特定区域。我们基于广泛使用的深度学习图像数据集的案例研究表明,用户可以通过检查图像多样性、识别表现不佳的子群以及分析分类错误,发现有关数据集和训练模型的洞见。我们进行了一项用户研究,通过与网格化版本的 t-SNE 比较,评估 DendroMap 在分组和搜索任务中的有效性,发现参与者更偏好 DendroMap。

ML-89-标题 Toward a Geometrical Understanding of Self-supervised Contrastive Learning

链接: https://arxiv.org/abs/2205.06926
作者: Romain Cosentino, Anirvan Sengupta, Salman Avestimehr, Mahdi Soltanolkotabi, Antonio Ortega, Ted Willke, Mariano Tepper
备注:

点击查看摘要

Abstract: Self-supervised learning (SSL) is currently one of the premier techniques to create data representations that are actionable for transfer learning in the absence of human annotations. Despite their success, the underlying geometry of these representations remains elusive, which obfuscates the quest for more robust, trustworthy, and interpretable models. In particular, mainstream SSL techniques rely on a specific deep neural network architecture with two cascaded neural networks: the encoder and the projector. When used for transfer learning, the projector is discarded since empirical results show that its representation generalizes more poorly than the encoder’s. In this paper, we investigate this curious phenomenon and analyze how the strength of the data augmentation policies affects the data embedding. We discover a non-trivial relation between the encoder, the projector, and the data augmentation strength: with increasingly larger augmentation policies, the projector, rather than the encoder, is more strongly driven to become invariant to the augmentations. It does so by eliminating crucial information about the data by learning to project it into a low-dimensional space, a noisy estimate of the data manifold tangent plane in the encoder representation. This analysis is substantiated through a geometrical perspective with theoretical and empirical results.

摘要:自监督学习(SSL)是目前在缺乏人工标注的情况下,创建可用于迁移学习的数据表示的主要技术之一。尽管取得了成功,这些表示的底层几何结构仍然难以捉摸,这阻碍了对更鲁棒、可信且可解释模型的追求。特别地,主流 SSL 技术依赖于一种包含两个级联神经网络的特定深度神经网络架构:编码器和投影头。用于迁移学习时,投影头会被丢弃,因为经验结果表明其表示的泛化能力不如编码器。在本文中,我们研究了这一奇特现象,并分析了数据增广策略的强度如何影响数据嵌入。我们发现编码器、投影头与数据增广强度之间存在一种非平凡的关系:随着增广策略越来越强,被更强烈地驱动对增广保持不变的是投影头,而不是编码器。投影头通过学习将数据投影到一个低维空间(编码器表示中数据流形切平面的一个含噪估计),从而消除了关于数据的关键信息。我们从几何视角出发,以理论和实验结果证实了这一分析。

ML-90-标题 Exploring How Machine Learning Practitioners (Try To) Use Fairness Toolkits

链接: https://arxiv.org/abs/2205.06922
作者: Wesley Hanwen Deng, Manish Nagireddy, Michelle Seng Ah Lee, Jatinder Singh, Zhiwei Steven Wu, Kenneth Holstein, Haiyi Zhu
备注: ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT 2022)

点击查看摘要

Abstract: Recent years have seen the development of many open-source ML fairness toolkits aimed at helping ML practitioners assess and address unfairness in their systems. However, there has been little research investigating how ML practitioners actually use these toolkits in practice. In this paper, we conducted the first in-depth empirical exploration of how industry practitioners (try to) work with existing fairness toolkits. In particular, we conducted think-aloud interviews to understand how participants learn about and use fairness toolkits, and explored the generality of our findings through an anonymous online survey. We identified several opportunities for fairness toolkits to better address practitioner needs and scaffold them in using toolkits effectively and responsibly. Based on these findings, we highlight implications for the design of future open-source fairness toolkits that can support practitioners in better contextualizing, communicating, and collaborating around ML fairness efforts.

摘要:近年来出现了许多开源ML公平性工具包,旨在帮助ML从业者评估并解决其系统中的不公平问题。然而,很少有研究考察ML从业者在实践中究竟如何使用这些工具包。在本文中,我们首次对行业从业者(尝试)使用现有公平性工具包的方式进行了深入的实证探索。特别地,我们开展了出声思考(think-aloud)访谈,以了解参与者如何学习和使用公平性工具包,并通过匿名在线调查检验了我们发现的普适性。我们识别出公平性工具包更好地满足从业者需求、并在其有效且负责任地使用工具包时提供支撑(scaffold)的若干机会。基于这些发现,我们强调了对未来开源公平性工具包设计的启示:这些工具包可以支持从业者围绕ML公平性工作更好地进行情境化、沟通与协作。

ML-91-标题 Beyond General Purpose Machine Translation: The Need for Context-specific Empirical Research to Design for Appropriate User Trust

链接: https://arxiv.org/abs/2205.06920
作者: Wesley Hanwen Deng, Nikita Mehandru, Samantha Robertson, Niloufar Salehi
备注: Workshop on Trust and Reliance in AI-Human Teams (TRAIT): this https URL

点击查看摘要

Abstract: Machine Translation (MT) has the potential to help people overcome language barriers and is widely used in high-stakes scenarios, such as in hospitals. However, in order to use MT reliably and safely, users need to understand when to trust MT outputs and how to assess the quality of often imperfect translation results. In this paper, we discuss research directions to support users to calibrate trust in MT systems. We share findings from an empirical study in which we conducted semi-structured interviews with 20 clinicians to understand how they communicate with patients across language barriers, and if and how they use MT systems. Based on our findings, we advocate for empirical research on how MT systems are used in practice as an important first step to addressing the challenges in building appropriate trust between users and MT tools.

摘要:机器翻译(MT)有可能帮助人们克服语言障碍,并广泛用于高风险场景,例如在医院中。但是,为了可靠,安全地使用MT,用户需要了解何时信任MT输出以及如何评估通常不完美翻译结果的质量。在本文中,我们讨论了研究方向,以支持用户校准对MT系统的信任。我们分享了一项实证研究的发现,在该研究中,我们与20位临床医生进行了半结构化访谈,以了解他们如何在语言障碍中与患者进行交流,以及他们如何以及如何使用MT系统。根据我们的发现,我们主张有关如何将MT系统用于实践中的实证研究,作为解决用户和MT工具之间建立适当信任的挑战的重要第一步。

ML-92-标题 Representation learning with function call graph transformations for malware open set recognition

链接: https://arxiv.org/abs/2205.06918
作者: Jingyun Jia, Philip K. Chan
备注:

点击查看摘要

Abstract: Open set recognition (OSR) problem has been a challenge in many machine learning (ML) applications, such as security. As new/unknown malware families occur regularly, it is difficult to exhaust samples that cover all the classes for the training process in ML systems. An advanced malware classification system should classify the known classes correctly while sensitive to the unknown class. In this paper, we introduce a self-supervised pre-training approach for the OSR problem in malware classification. We propose two transformations for the function call graph (FCG) based malware representations to facilitate the pretext task. Also, we present a statistical thresholding approach to find the optimal threshold for the unknown class. Moreover, the experiment results indicate that our proposed pre-training process can improve different performances of different downstream loss functions for the OSR problem.

摘要:开集识别(OSR)问题在许多机器学习(ML)应用(例如安全领域)中一直是一个挑战。由于新的/未知的恶意软件家族不断出现,很难穷尽覆盖ML系统训练过程所需全部类别的样本。先进的恶意软件分类系统应在对未知类别保持敏感的同时,正确地对已知类别进行分类。在本文中,我们针对恶意软件分类中的OSR问题引入了一种自监督预训练方法。我们为基于函数调用图(FCG)的恶意软件表示提出了两种变换,以支撑前置(pretext)任务。此外,我们提出了一种统计阈值方法,以寻找未知类别的最优阈值。实验结果表明,我们提出的预训练过程可以改善OSR问题中不同下游损失函数的各项性能。
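摘要提到"用统计阈值方法为未知类寻找最优阈值"。下面是一个被大幅简化的示意:对已知类得分的经验分布取"均值减3倍标准差"作为拒绝阈值(具体统计量与论文方法不同,仅说明思路;所有数值均为虚构):

```python
import numpy as np

rng = np.random.default_rng(1)

# 假想的"属于已知类"得分: 已知家族得分高, 未知家族得分低
known_scores = rng.normal(0.9, 0.05, 1000)
unknown_scores = rng.normal(0.4, 0.10, 1000)

# 统计阈值: 低于 已知类均值 - 3*标准差 的样本判为未知类
tau = known_scores.mean() - 3.0 * known_scores.std()

unknown_reject_rate = (unknown_scores < tau).mean()  # 未知类被正确拒绝的比例
known_reject_rate = (known_scores < tau).mean()      # 已知类被误拒的比例
```

在这个玩具分布下,几乎所有未知类样本都会落在阈值之下,而已知类样本极少被误拒。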

ML-93-标题 Formal limitations of sample-wise information-theoretic generalization bounds

链接: https://arxiv.org/abs/2205.06915
作者: Hrayr Harutyunyan, Greg Ver Steeg, Aram Galstyan
备注:

点击查看摘要

Abstract: Some of the tightest information-theoretic generalization bounds depend on the average information between the learned hypothesis and a single training example. However, these sample-wise bounds were derived only for expected generalization gap. We show that even for expected squared generalization gap no such sample-wise information-theoretic bounds exist. The same is true for PAC-Bayes and single-draw bounds. Remarkably, PAC-Bayes, single-draw and expected squared generalization gap bounds that depend on information in pairs of examples exist.

摘要:一些最紧的信息论泛化界依赖于所学假设与单个训练样本之间的平均信息量。然而,这些逐样本(sample-wise)的界此前仅针对期望泛化差距得出。我们证明,即使对于期望平方泛化差距,此类逐样本的信息论界也不存在;对于PAC-Bayes界和single-draw界同样如此。值得注意的是,依赖于成对样本信息的PAC-Bayes、single-draw以及期望平方泛化差距界是存在的。

ML-94-标题 Neural-Fly Enables Rapid Learning for Agile Flight in Strong Winds

链接: https://arxiv.org/abs/2205.06908
作者: Michael O’Connell, Guanya Shi, Xichen Shi, Kamyar Azizzadenesheli, Anima Anandkumar, Yisong Yue, Soon-Jo Chung
备注: This is the accepted version of Science Robotics Vol. 7, Issue 66, eabm6597 (2022). Video: this https URL

点击查看摘要

Abstract: Executing safe and precise flight maneuvers in dynamic high-speed winds is important for the ongoing commoditization of uninhabited aerial vehicles (UAVs). However, because the relationship between various wind conditions and its effect on aircraft maneuverability is not well understood, it is challenging to design effective robot controllers using traditional control design methods. We present Neural-Fly, a learning-based approach that allows rapid online adaptation by incorporating pretrained representations through deep learning. Neural-Fly builds on two key observations that aerodynamics in different wind conditions share a common representation and that the wind-specific part lies in a low-dimensional space. To that end, Neural-Fly uses a proposed learning algorithm, domain adversarially invariant meta-learning (DAIML), to learn the shared representation, only using 12 minutes of flight data. With the learned representation as a basis, Neural-Fly then uses a composite adaptation law to update a set of linear coefficients for mixing the basis elements. When evaluated under challenging wind conditions generated with the Caltech Real Weather Wind Tunnel, with wind speeds up to 43.6 kilometers/hour (12.1 meters/second), Neural-Fly achieves precise flight control with substantially smaller tracking error than state-of-the-art nonlinear and adaptive controllers. In addition to strong empirical performance, the exponential stability of Neural-Fly results in robustness guarantees. Last, our control design extrapolates to unseen wind conditions, is shown to be effective for outdoor flights with only onboard sensors, and can transfer across drones with minimal performance degradation.

摘要:在动态高速风场中执行安全而精确的飞行机动,对于无人机(UAV)的持续商品化非常重要。然而,由于各种风况与其对飞机机动性影响之间的关系尚未被充分理解,使用传统控制设计方法设计有效的机器人控制器颇具挑战。我们提出了Neural-Fly,一种基于学习的方法,通过深度学习引入预训练表示,实现快速在线自适应。Neural-Fly建立在两个关键观察之上:不同风况下的空气动力学共享一个公共表示,而与风相关的特定部分位于一个低维空间中。为此,Neural-Fly使用所提出的学习算法,即域对抗不变元学习(DAIML),仅用12分钟的飞行数据即可学到共享表示。以学到的表示为基底,Neural-Fly再使用复合自适应律来更新一组用于混合基元素的线性系数。在加州理工学院真实天气风洞生成的具有挑战性的风况下(风速高达43.6公里/小时,即12.1米/秒)进行评估时,Neural-Fly实现了精确的飞行控制,其跟踪误差远小于最先进的非线性与自适应控制器。除了出色的实验表现外,Neural-Fly的指数稳定性还带来了鲁棒性保证。最后,我们的控制设计可外推到未见过的风况,被证明在仅依靠机载传感器的室外飞行中有效,并且可以在无人机之间迁移而性能几乎不退化。
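为说明"以学到的表示为基底,用线性系数混合基元素"这一思路,下面用批量最小二乘替代论文中的在线复合自适应律,拟合一个虚构低维基底的混合系数(`phi`、`true_a` 均为示例假设,并非论文的DAIML表示):

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    # 假想的3元素基底, 代替论文中学到的风况共享表示
    return np.array([1.0, x, np.sin(x)])

true_a = np.array([0.5, -0.2, 0.8])  # 待辨识的"风效应"混合系数

# 采集数据并用最小二乘估计线性混合系数
xs = np.linspace(-2.0, 2.0, 200)
Phi = np.stack([phi(x) for x in xs])
f = Phi @ true_a + 0.01 * rng.standard_normal(len(xs))  # 带噪的气动残差
a_hat = np.linalg.lstsq(Phi, f, rcond=None)[0]
```

真实系统中这一估计是在线递推完成的(并包含跟踪误差项),此处的批量拟合仅展示"固定基底 + 线性系数"这一结构。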

ML-95-标题 Multimodal Conversational AI: A Survey of Datasets and Approaches

链接: https://arxiv.org/abs/2205.06907
作者: Anirudh Sundar, Larry Heck
备注: 17 pages, 1 figure, to be published in the 4th Workshop on NLP for Conversational AI

点击查看摘要

Abstract: As humans, we experience the world with all our senses or modalities (sound, sight, touch, smell, and taste). We use these modalities, particularly sight and touch, to convey and interpret specific meanings. Multimodal expressions are central to conversations; a rich set of modalities amplify and often compensate for each other. A multimodal conversational AI system answers questions, fulfills tasks, and emulates human conversations by understanding and expressing itself via multiple modalities. This paper motivates, defines, and mathematically formulates the multimodal conversational research objective. We provide a taxonomy of research required to solve the objective: multimodal representation, fusion, alignment, translation, and co-learning. We survey state-of-the-art datasets and approaches for each research area and highlight their limiting assumptions. Finally, we identify multimodal co-learning as a promising direction for multimodal conversational AI research.

摘要:作为人类,我们以我们的所有感官或方式(声音,视觉,触摸,气味和味觉)体验世界。我们使用这些方式,尤其是视觉和触摸来传达和解释特定的含义。多模式表达是对话的核心;一组丰富的模式会放大并经常相互补偿。多模式的对话AI系统回答问题,完成任务,并通过多种模式理解和表达自己来模仿人类的对话。本文激励,定义和数学提出了多模式的对话研究目标。我们提供了解决目标所需的研究分类学:多模式表示,融合,对齐,翻译和共同学习。我们调查了每个研究领域的最新数据集和方法,并突出了它们的限制假设。最后,我们将多模式共学习视为多模式对话AI研究的有希望的方向。

ML-96-标题 Structural Dropout for Model Width Compression

链接: https://arxiv.org/abs/2205.06906
作者: Julian Knodt
备注:

点击查看摘要

Abstract: Existing ML models are known to be highly over-parametrized, and use significantly more resources than required for a given task. Prior work has explored compressing models offline, such as by distilling knowledge from larger models into much smaller ones. This is effective for compression, but does not give an empirical method for measuring how much the model can be compressed, and requires additional training for each compressed model. We propose a method that requires only a single training session for the original model and a set of compressed models. The proposed approach is a “structural” dropout that prunes all elements in the hidden state above a randomly chosen index, forcing the model to learn an importance ordering over its features. After learning this ordering, at inference time unimportant features can be pruned while retaining most accuracy, reducing parameter size significantly. In this work, we focus on Structural Dropout for fully-connected layers, but the concept can be applied to any kind of layer with unordered features, such as convolutional or attention layers. Structural Dropout requires no additional pruning/retraining, but requires additional validation for each possible hidden sizes. At inference time, a non-expert can select a memory versus accuracy trade-off that best suits their needs, across a wide range of highly compressed versus more accurate models.

摘要:众所周知,现有的ML模型参数高度过剩,使用的资源远超给定任务所需。先前的工作探索了离线压缩模型,例如将较大模型的知识蒸馏到小得多的模型中。这对压缩是有效的,但没有给出衡量模型可压缩程度的经验方法,并且每个压缩模型都需要额外训练。我们提出一种方法,对原始模型和一组压缩模型只需要一次训练。所提出的方法是一种"结构化"dropout,它剪除隐藏状态中随机选定索引之上的所有元素,迫使模型学习其特征的重要性排序。学到这一排序后,在推理时可以剪除不重要的特征,同时保留大部分精度,从而显著减小参数规模。在这项工作中,我们专注于全连接层的结构化dropout,但这一概念可以应用于任何具有无序特征的层,例如卷积层或注意力层。结构化dropout不需要额外的剪枝/再训练,但需要对每种可能的隐藏尺寸进行额外验证。在推理时,非专家可以在从高度压缩到更精确的大量模型中,选择最适合自身需求的内存与精度折中。
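摘要的核心操作是"随机选一个索引,把隐藏状态中该索引之上的所有元素置零"。下面是对这一掩码逻辑的极简示意(非原文实现):

```python
import numpy as np

rng = np.random.default_rng(0)

def structural_dropout(h, rng):
    """保留前 k 个特征、置零其余特征 (k 随机), 迫使模型按重要性排序特征。"""
    width = h.shape[-1]
    k = int(rng.integers(1, width + 1))  # 随机选择保留的前缀长度
    mask = np.zeros(width)
    mask[:k] = 1.0
    return h * mask, k

h = rng.standard_normal((4, 8))          # 一批宽度为8的隐藏状态
dropped, k = structural_dropout(h, rng)  # 索引 k 及之后的特征全部被剪除
```

推理时只需固定一个 k(按验证集选出的内存/精度折中)并截断权重矩阵即可。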

ML-97-标题 Perspectives on Incorporating Expert Feedback into Model Updates

链接: https://arxiv.org/abs/2205.06905
作者: Valerie Chen, Umang Bhatt, Hoda Heidari, Adrian Weller, Ameet Talwalkar
备注:

点击查看摘要

Abstract: Machine learning (ML) practitioners are increasingly tasked with developing models that are aligned with non-technical experts’ values and goals. However, there has been insufficient consideration on how practitioners should translate domain expertise into ML updates. In this paper, we consider how to capture interactions between practitioners and experts systematically. We devise a taxonomy to match expert feedback types with practitioner updates. A practitioner may receive feedback from an expert at the observation- or domain-level, and convert this feedback into updates to the dataset, loss function, or parameter space. We review existing work from ML and human-computer interaction to describe this feedback-update taxonomy, and highlight the insufficient consideration given to incorporating feedback from non-technical experts. We end with a set of open questions that naturally arise from our proposed taxonomy and subsequent survey.

摘要:机器学习(ML)从业者越来越多地被要求开发与非技术专家的价值观和目标相一致的模型。然而,对于从业者应如何将领域专业知识转化为ML更新,目前的考虑尚不充分。在本文中,我们考虑如何系统地刻画从业者与专家之间的交互。我们设计了一种分类法,将专家反馈类型与从业者更新相匹配。从业者可以从专家那里收到观测层面或领域层面的反馈,并将其转化为对数据集、损失函数或参数空间的更新。我们回顾了ML与人机交互领域的现有工作来描述这一"反馈-更新"分类法,并强调目前对吸纳非技术专家反馈的考虑不足。我们以一组开放问题作结,这些问题自然地源于我们提出的分类法与后续调研。

ML-98-标题 Universal Post-Training Backdoor Detection

链接: https://arxiv.org/abs/2205.06900
作者: Hang Wang, Zhen Xiang, David J. Miller, George Kesidis
备注:

点击查看摘要

Abstract: A Backdoor attack (BA) is an important type of adversarial attack against deep neural network classifiers, wherein test samples from one or more source classes will be (mis)classified to the attacker’s target class when a backdoor pattern (BP) is embedded. In this paper, we focus on the post-training backdoor defense scenario commonly considered in the literature, where the defender aims to detect whether a trained classifier was backdoor attacked, without any access to the training set. To the best of our knowledge, existing post-training backdoor defenses are all designed for BAs with presumed BP types, where each BP type has a specific embedding function. They may fail when the actual BP type used by the attacker (unknown to the defender) is different from the BP type assumed by the defender. In contrast, we propose a universal post-training defense that detects BAs with arbitrary types of BPs, without making any assumptions about the BP type. Our detector leverages the influence of the BA, independently of the BP type, on the landscape of the classifier’s outputs prior to the softmax layer. For each class, a maximum margin statistic is estimated using a set of random vectors; detection inference is then performed by applying an unsupervised anomaly detector to these statistics. Thus, our detector is also an advance relative to most existing post-training methods by not needing any legitimate clean samples, and can efficiently detect BAs with arbitrary numbers of source classes. These advantages of our detector over several state-of-the-art methods are demonstrated on four datasets, for three different types of BPs, and for a variety of attack configurations. Finally, we propose a novel, general approach for BA mitigation once a detection is made.

摘要:后门攻击(BA)是针对深度神经网络分类器的一类重要对抗攻击:当嵌入后门模式(BP)时,来自一个或多个源类别的测试样本会被(错误)分类为攻击者的目标类别。在本文中,我们关注文献中常见的训练后(post-training)后门防御场景,即防御者旨在在无法访问训练集的情况下,检测一个已训练的分类器是否遭受了后门攻击。据我们所知,现有的训练后后门防御均针对假定BP类型的BA而设计,其中每种BP类型都有特定的嵌入函数。当攻击者实际使用的BP类型(防御者未知)与防御者假定的BP类型不同时,这些防御可能失效。相比之下,我们提出一种通用的训练后防御,可在不对BP类型做任何假设的情况下,检测采用任意类型BP的BA。我们的检测器利用BA(与BP类型无关)对softmax层之前分类器输出"景观"的影响。对于每个类别,使用一组随机向量估计一个最大间隔(maximum margin)统计量;然后将无监督异常检测器应用于这些统计量来进行检测推断。因此,相对于大多数现有的训练后方法,我们的检测器还有一个优势:不需要任何合法的干净样本,并且可以高效检测具有任意数量源类别的BA。我们在四个数据集、三种不同类型的BP以及多种攻击配置上展示了检测器相对于若干最先进方法的这些优势。最后,我们提出了一种新颖且通用的方法,用于在检测到攻击后缓解BA。
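摘要中"对每个类,用一组随机向量估计最大间隔统计量"可以粗略示意如下(这里用一个随机线性"分类器"代替真实DNN,仅演示统计量的计算方式):

```python
import numpy as np

rng = np.random.default_rng(0)

def max_margin_statistic(logit_fn, c, dim, n_vectors=500):
    """在一组随机输入向量上估计类 c 的最大间隔 (logit_c 减去其余类的最大logit)。"""
    Z = rng.standard_normal((n_vectors, dim))
    logits = logit_fn(Z)                               # (n_vectors, n_classes)
    others = np.delete(logits, c, axis=1).max(axis=1)  # 其余类中的最大logit
    return float((logits[:, c] - others).max())

# 假想的线性分类器; 对被后门攻击的类, 该统计量会异常偏大,
# 可交由无监督异常检测器识别
W = rng.standard_normal((8, 5))
stats = [max_margin_statistic(lambda Z: Z @ W, c, dim=8) for c in range(5)]
```

论文在这些逐类统计量之上再运行异常检测;此处只展示统计量本身的形式。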

ML-99-标题 Differentiable programming: Generalization, characterization and limitations of deep learning

链接: https://arxiv.org/abs/2205.06898
作者: Adrián Hernández, Gilles Millerioux, José M. Amigó
备注: 15 pages, 5 figures

点击查看摘要

Abstract: In the past years, deep learning models have been successfully applied in several cognitive tasks. Originally inspired by neuroscience, these models are specific examples of differentiable programs. In this paper we define and motivate differentiable programming, as well as specify some program characteristics that allow us to incorporate the structure of the problem in a differentiable program. We analyze different types of differentiable programs, from more general to more specific, and evaluate, for a specific problem with a graph dataset, its structure and knowledge with several differentiable programs using those characteristics. Finally, we discuss some inherent limitations of deep learning and differentiable programs, which are key challenges in advancing artificial intelligence, and then analyze possible solutions

摘要:在过去几年中,深度学习模型已成功应用于多项认知任务。这些模型最初受神经科学启发,是可微分程序的具体实例。在本文中,我们定义并阐述了可微分编程的动机,并指明了一些允许我们把问题结构融入可微分程序的程序特性。我们分析了从较通用到较具体的不同类型的可微分程序,并针对一个图数据集上的具体问题,利用这些特性以若干可微分程序评估其结构与知识。最后,我们讨论了深度学习与可微分程序的一些内在局限(它们是推进人工智能的关键挑战),并进而分析可能的解决方案。

ML-100-标题 Robustness of Control Design via Bayesian Learning

链接: https://arxiv.org/abs/2205.06896
作者: Nardos Ayele Ashenafi, Wankun Sirichotiyakul, Aykut C. Satici
备注:

点击查看摘要

Abstract: In the realm of supervised learning, Bayesian learning has shown robust predictive capabilities under input and parameter perturbations. Inspired by these findings, we demonstrate the robustness properties of Bayesian learning in the control search task. We seek to find a linear controller that stabilizes a one-dimensional open-loop unstable stochastic system. We compare two methods to deduce the controller: the first (deterministic) one assumes perfect knowledge of system parameter and state, the second takes into account uncertainties in both and employs Bayesian learning to compute a posterior distribution for the controller.

摘要:在监督学习领域,贝叶斯学习在输入和参数扰动下显示出强大的预测能力。受这些发现的启发,我们证明了在控制搜索任务中贝叶斯学习的鲁棒性特性。我们试图找到一个线性控制器,该线性控制器可以稳定一维开环不稳定的随机系统。我们比较了推断控制器的两种方法:第一个(确定性)假设对系统参数和状态的完美知识,第二种方法考虑了两者的不确定性,并采用贝叶斯学习来计算控制器的后验分布。

ML-101-标题 Physics guided neural networks for modelling of non-linear dynamics

链接: https://arxiv.org/abs/2205.06858
作者: Haakon Robinson, Suraj Pawar, Adil Rasheed, Omer San
备注:

点击查看摘要

Abstract: The success of the current wave of artificial intelligence can be partly attributed to deep neural networks, which have proven to be very effective in learning complex patterns from large datasets with minimal human intervention. However, it is difficult to train these models on complex dynamical systems from data alone due to their low data efficiency and sensitivity to hyperparameters and initialisation. This work demonstrates that injection of partially known information at an intermediate layer in a DNN can improve model accuracy, reduce model uncertainty, and yield improved convergence during the training. The value of these physics-guided neural networks has been demonstrated by learning the dynamics of a wide variety of nonlinear dynamical systems represented by five well-known equations in nonlinear systems theory: the Lotka-Volterra, Duffing, Van der Pol, Lorenz, and Henon-Heiles systems.

摘要:当前人工智能浪潮的成功可以部分归因于深度神经网络,事实证明,它们能够在极少人工干预的情况下非常有效地从大型数据集中学习复杂模式。然而,由于数据效率低以及对超参数和初始化的敏感性,仅凭数据很难在复杂动力系统上训练这些模型。这项工作表明,在DNN的中间层注入部分已知信息可以提高模型精度、降低模型不确定性,并在训练过程中获得更好的收敛性。这些物理引导神经网络的价值已通过学习多种非线性动力系统的动力学得到验证,这些系统由非线性系统理论中五个著名的方程表示:Lotka-Volterra、Duffing、Van der Pol、Lorenz和Henon-Heiles系统。
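摘要中"在DNN中间层注入部分已知信息"可以示意为:把已知的物理量与学到的隐藏特征拼接后再送入后续层(下例用随机权重占位,`known` 的具体形式仅为假设,真实模型需端到端训练):

```python
import numpy as np

rng = np.random.default_rng(0)

W1 = rng.standard_normal((2, 16))
W2 = rng.standard_normal((16 + 2, 2))  # 加宽以接收注入的2个已知特征

def forward(state):
    h = np.tanh(state @ W1)
    # 在中间层注入部分已知的物理信息, 例如动力学方程中已知的线性项
    known = np.stack([state[:, 0], -state[:, 1]], axis=1)
    return np.concatenate([h, known], axis=1) @ W2

out = forward(rng.standard_normal((4, 2)))  # shape (4, 2)
```

这样后续层可以同时利用学到的特征与注入的物理量,而无需从数据中重新发现后者。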

ML-102-标题 Optimal Parameter-free Online Learning with Switching Cost

链接: https://arxiv.org/abs/2205.06846
作者: Zhiyu Zhang, Ashok Cutkosky, Ioannis Ch. Paschalidis
备注:

点击查看摘要

Abstract: Parameter-freeness in online learning refers to the adaptivity of an algorithm with respect to the optimal decision in hindsight. In this paper, we design such algorithms in the presence of switching cost - the latter penalizes the optimistic updates required by parameter-freeness, leading to a delicate design trade-off. Based on a novel dual space scaling strategy, we propose a simple yet powerful algorithm for Online Linear Optimization (OLO) with switching cost, which improves the existing suboptimal regret bound [ZCP22a] to the optimal rate. The obtained benefit is extended to the expert setting, and the practicality of our algorithm is demonstrated through a sequential investment task.

摘要:在线学习中的无参数性(parameter-freeness)指的是算法相对于事后最优决策的自适应性。在本文中,我们在存在切换代价的情况下设计此类算法:切换代价会惩罚无参数性所需的乐观更新,从而带来微妙的设计权衡。基于一种新颖的对偶空间缩放策略,我们为带切换代价的在线线性优化(OLO)提出了一个简单而强大的算法,将现有的次优遗憾界[ZCP22a]改进到最优速率。所获得的收益还被推广到专家(expert)设定,并且我们通过一个序贯投资任务展示了算法的实用性。

ML-103-标题 Power and limitations of single-qubit native quantum neural networks

链接: https://arxiv.org/abs/2205.07848
作者: Zhan Yu, Hongshun Yao, Mujin Li, Xin Wang
备注: 19 pages including appendix

点击查看摘要

Abstract: Quantum neural networks (QNNs) have emerged as a leading strategy to establish applications in machine learning, chemistry, and optimization. While the applications of QNN have been widely investigated, its theoretical foundation remains less understood. In this paper, we formulate a theoretical framework for the expressive ability of data re-uploading quantum neural networks that consist of interleaved encoding circuit blocks and trainable circuit blocks. First, we prove that single-qubit quantum neural networks can approximate any univariate function by mapping the model to a partial Fourier series. Beyond previous works’ understanding of existence, we in particular establish the exact correlations between the parameters of the trainable gates and the working Fourier coefficients, by exploring connections to quantum signal processing. Second, we discuss the limitations of single-qubit native QNNs on approximating multivariate functions by analyzing the frequency spectrum and the flexibility of Fourier coefficients. We further demonstrate the expressivity and limitations of single-qubit native QNNs via numerical experiments. As applications, we introduce natural extensions to multi-qubit quantum neural networks, which exhibit the capability of classifying real-world multi-dimensional data. We believe these results would improve our understanding of QNNs and provide a helpful guideline for designing powerful QNNs for machine learning tasks.

摘要:量子神经网络(QNN)已成为建立机器学习,化学和优化应用程序的领先策略。虽然QNN的应用已得到广泛研究,但其理论基础仍然不太了解。在本文中,我们制定了一个理论框架,用于由由交织的编码电路块和可训练的电路块组成的数据重新上传量子神经网络的表达能力。首先,我们证明单量量子神经网络可以通过将模型映射到部分傅立叶系列来近似任何单变量函数。除了以前的作品对存在的理解之外,我们特别在可训练门的参数与工作傅立叶系数之间建立了确切的相关性,通过探索与量子信号处理的连接。其次,我们通过分析频率谱和傅立叶系数的灵活性来讨论单Qubit天然QNN对近似多元函数的局限性。我们进一步证明了通过数值实验的单量本天然QNN的表达和局限性。作为应用程序,我们将自然扩展引入多量量子神经网络,这些量子神经网络具有分类现实世界多维数据的能力。我们认为这些结果将提高我们对QNN的理解,并为设计强大的QNN用于机器学习任务提供有用的指南。

ML-104-标题 Physics-informed machine learning techniques for edge plasma turbulence modelling in computational theory and experiment

链接: https://arxiv.org/abs/2205.07838
作者: Abhilash Mathews
备注: PhD thesis, 172 pages, 38 figures, 4 tables

点击查看摘要

Abstract: Edge plasma turbulence is critical to the performance of magnetic confinement fusion devices. Towards better understanding edge turbulence in both theory and experiment, a custom-built physics-informed deep learning framework constrained by partial differential equations is developed to accurately learn turbulent fields consistent with the two-fluid theory from partial observations of electron pressure. This calculation is not otherwise possible using conventional equilibrium models. With this technique, the first direct quantitative comparisons of turbulent fields between electrostatic two-fluid theory and electromagnetic gyrokinetic modelling are demonstrated with good overall agreement found in magnetized helical plasmas at low normalized pressure. To translate these computational techniques to experimental fusion plasmas, a novel method to translate brightness measurements of HeI line radiation into local plasma fluctuations is demonstrated via a newly created deep learning framework that integrates neutral transport physics and collisional radiative theory for the 3^3D-2^3P transition in atomic helium. Using fast camera data on the Alcator C-Mod tokamak, this thesis presents the first 2-dimensional time-dependent experimental measurements of the turbulent electron density, electron temperature, and neutral density in a fusion plasma using a single spectral line. With this experimentally inferred data, initial estimates of the 2-dimensional turbulent electric field consistent with drift-reduced Braginskii theory under the framework of an axisymmetric fusion plasma with purely toroidal field are calculated. The inclusion of atomic helium effects on particle and energy sources are found to strengthen correlations between the electric field and electron pressure while broadening turbulent field amplitudes which impact E x B flows and shearing rates.

摘要:边缘等离子体湍流对磁约束聚变装置的性能至关重要。为了在理论与实验两方面更好地理解边缘湍流,我们开发了一个由偏微分方程约束的定制物理知情深度学习框架,以便从电子压强的部分观测中准确学习与双流体理论一致的湍流场。使用常规平衡模型无法完成这种计算。借助这一技术,我们首次对静电双流体理论与电磁回旋动理学建模之间的湍流场进行了直接定量比较,并在低归一化压强的磁化螺旋等离子体中发现了良好的总体一致性。为了将这些计算技术转化到实验聚变等离子体,我们展示了一种将HeI谱线辐射亮度测量转化为局部等离子体涨落的新方法,其基础是一个新建的深度学习框架,它针对原子氦的3^3D-2^3P跃迁整合了中性粒子输运物理与碰撞辐射理论。利用Alcator C-Mod托卡马克上的快速相机数据,本论文给出了首个使用单条谱线对聚变等离子体中湍流电子密度、电子温度和中性粒子密度的二维时变实验测量。基于这些由实验推断的数据,我们在具有纯环向磁场的轴对称聚变等离子体框架下,计算了与漂移约化Braginskii理论一致的二维湍流电场的初步估计。我们发现,纳入原子氦对粒子源和能量源的影响可以增强电场与电子压强之间的相关性,同时展宽影响E x B流动和剪切率的湍流场振幅。

ML-105-标题 JR2net: A Joint Non-Linear Representation and Recovery Network for Compressive Spectral Imaging

链接: https://arxiv.org/abs/2205.07770
作者: Brayan Monroy, Jorge Bacca, Henry Arguello
备注:

点击查看摘要

Abstract: Deep learning models are state-of-the-art in compressive spectral imaging (CSI) recovery. These methods use a deep neural network (DNN) as an image generator to learn non-linear mapping from compressed measurements to the spectral image. For instance, the deep spectral prior approach uses a convolutional autoencoder network (CAE) in the optimization algorithm to recover the spectral image by using a non-linear representation. However, the CAE training is detached from the recovery problem, which does not guarantee optimal representation of the spectral images for the CSI problem. This work proposes a joint non-linear representation and recovery network (JR2net), linking the representation and recovery task into a single optimization problem. JR2net consists of an optimization-inspired network following an ADMM formulation that learns a non-linear low-dimensional representation and simultaneously performs the spectral image recovery, trained via the end-to-end approach. Experimental results show the superiority of the proposed method with improvements up to 2.57 dB in PSNR and performance around 2000 times faster than state-of-the-art methods.

摘要:深度学习模型是压缩光谱成像(CSI)恢复的最新模型。这些方法使用深神网络(DNN)作为图像发生器来学习从压缩测量到光谱图像的非线性映射。例如,深频谱先验方法在优化算法中使用卷积自动编码器网络(CAE)通过使用非线性表示来恢复光谱图像。但是,CAE训练与恢复问题分离,这不能保证CSI问题的光谱图像的最佳表示。这项工作提出了联合非线性表示和恢复网络(JR2NET),将表示和恢复任务链接到单个优化问题。 JR2NET由ADMM公式遵循优化启发的网络组成,该网络学习了非线性低维表示,并同时执行通过端到端方法训练的光谱图像恢复。实验结果表明,该方法的优势在PSNR中的改进高达2.57 dB,并且性能比最新方法快2000倍。

ML-106-标题 On the inability of Gaussian process regression to optimally learn compositional functions

链接: https://arxiv.org/abs/2205.07764
作者: Matteo Giordano, Kolyan Ray, Johannes Schmidt-Hieber
备注: 24 pages

点击查看摘要

Abstract: We rigorously prove that deep Gaussian process priors can outperform Gaussian process priors if the target function has a compositional structure. To this end, we study information-theoretic lower bounds for posterior contraction rates for Gaussian process regression in a continuous regression model. We show that if the true function is a generalized additive function, then the posterior based on any mean-zero Gaussian process can only recover the truth at a rate that is strictly slower than the minimax rate by a factor that is polynomially suboptimal in the sample size n .

摘要:我们严格证明,如果目标函数具有复合结构,深高斯过程先验可以优于高斯过程先验。为此,我们研究了连续回归模型中高斯过程回归后验收缩率的信息论下界。我们证明,如果真实函数是广义可加函数,那么基于任何均值为零的高斯过程的后验,只能以严格慢于极小极大(minimax)速率的速度恢复真相,其差距因子关于样本量n是多项式次优的。

ML-107-标题 Sharp Asymptotics of Self-training with Linear Classifier

链接: https://arxiv.org/abs/2205.07739
作者: Takashi Takahashi
备注: 34 pages, 6 figures

点击查看摘要

Abstract: Self-training (ST) is a straightforward and standard approach in semi-supervised learning, successfully applied to many machine learning problems. The performance of ST strongly depends on the supervised learning method used in the refinement step and the nature of the given data; hence, a general performance guarantee from a concise theory may become loose in a concrete setup. However, the theoretical methods that sharply predict how the performance of ST depends on various details for each learning scenario are limited. This study develops a novel theoretical framework for sharply characterizing the generalization abilities of the models trained by ST using the non-rigorous replica method of statistical physics. We consider the ST of the linear model that minimizes the ridge-regularized cross-entropy loss when the data are generated from a two-component Gaussian mixture. Consequently, we show that the generalization performance of ST in each iteration is sharply characterized by a small finite number of variables, which satisfy a set of deterministic self-consistent equations. By numerically solving these self-consistent equations, we find that ST’s generalization performance approaches to the supervised learning method with a very simple regularization schedule when the label bias is small and a moderately large number of iterations are used.

摘要:自训练(ST)是半监督学习中一种直接且标准的方法,已成功应用于许多机器学习问题。ST的性能在很大程度上取决于细化步骤中使用的监督学习方法和给定数据的性质;因此,来自简洁理论的一般性能保证在具体设置中可能变得松弛。然而,能够精确预测ST性能在各个学习场景中如何依赖于各种细节的理论方法十分有限。本研究开发了一个新颖的理论框架,利用统计物理中非严格的复本(replica)方法,精确刻画经ST训练的模型的泛化能力。我们考虑当数据由双组分高斯混合生成时,最小化岭正则化交叉熵损失的线性模型的ST。结果表明,ST在每次迭代中的泛化性能可由少数有限个变量精确刻画,这些变量满足一组确定性的自洽方程。通过数值求解这些自洽方程,我们发现当标签偏差较小且使用中等偏多的迭代次数时,ST的泛化性能会在非常简单的正则化调度下逼近监督学习方法。
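下面用一个双组分高斯混合上的玩具实验示意自训练的基本循环:先在有标注数据上拟合线性分类器,再为无标注数据打伪标签并重新拟合(为避免引入优化器,这里以岭回归的平方损失替代论文中的岭正则化交叉熵;所有参数均为虚构):

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam=1.0):
    """岭正则化线性分类器 (以平方损失近似替代论文中的交叉熵损失)。"""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ (2.0 * y - 1.0))

def predict(X, w):
    return (X @ w > 0).astype(int)

# 双组分高斯混合: 两个类的均值分别为 (+2,+2) 与 (-2,-2)
n_lab, n_unlab = 40, 400
y_lab = rng.integers(0, 2, n_lab)
X_lab = rng.standard_normal((n_lab, 2)) + 2.0 * (2 * y_lab[:, None] - 1)
y_unlab = rng.integers(0, 2, n_unlab)
X_unlab = rng.standard_normal((n_unlab, 2)) + 2.0 * (2 * y_unlab[:, None] - 1)

# 自训练循环: 拟合 -> 打伪标签 -> 连同有标注数据一起重新拟合
w = ridge_fit(X_lab, y_lab)
for _ in range(5):
    pseudo = predict(X_unlab, w)
    w = ridge_fit(np.vstack([X_lab, X_unlab]),
                  np.concatenate([y_lab, pseudo]))

acc = (predict(X_unlab, w) == y_unlab).mean()
```

论文分析的正是这类"线性模型 + 双组分高斯混合"设定下,上述迭代的泛化性能如何演化。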

ML-108-标题 From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

链接: https://arxiv.org/abs/2205.07704
作者: Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Menard
备注:

点击查看摘要

Abstract: We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision process: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits. Our method uses the quantile of a Q-value function posterior as upper confidence bound on the optimal Q-value function. For Bayes-UCBVI, we prove a regret bound of order \widetilde{O}(\sqrt{H^3SAT}) where H is the length of one episode, S is the number of states, A the number of actions, T the number of episodes, that matches the lower-bound of \Omega(\sqrt{H^3SAT}) up to poly-log terms in H, S, A, T for a large enough T. To the best of our knowledge, this is the first algorithm that obtains an optimal dependence on the horizon H (and S) without the need for an involved Bernstein-like bonus or noise. Crucial to our analysis is a new fine-grained anti-concentration bound for a weighted Dirichlet sum that can be of independent interest. We then explain how Bayes-UCBVI can be easily extended beyond the tabular setting, exhibiting a strong link between our algorithm and Bayesian bootstrap (Rubin, 1981).

摘要:我们提出了Bayes-UCBVI算法,用于表格型、阶段依赖的回合制马尔可夫决策过程中的强化学习:它是Kaufmann等人(2012)针对多臂老虎机提出的Bayes-UCB算法的自然推广。我们的方法使用Q值函数后验的分位数作为最优Q值函数的置信上界。对于Bayes-UCBVI,我们证明了阶为\widetilde{O}(\sqrt{H^3SAT})的遗憾界,其中H是单个回合的长度,S是状态数,A是动作数,T是回合数;当T足够大时,该界在关于H、S、A、T的多项式对数因子范围内与下界\Omega(\sqrt{H^3SAT})相匹配。据我们所知,这是第一个无需复杂的类Bernstein奖励项或噪声即可获得对时间跨度H(以及S)最优依赖的算法。我们分析的关键是一个关于加权Dirichlet和的新的细粒度反集中界,其本身也可能具有独立的价值。随后我们解释了Bayes-UCBVI如何能轻松推广到表格设定之外,展示了我们的算法与贝叶斯自助法(Bayesian bootstrap, Rubin, 1981)之间的紧密联系。
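摘要的核心思想"用后验分位数作为乐观的置信上界"可以在一个玩具多臂老虎机上示意(论文将其推广到MDP中的Q函数;下例中分位数水平取 1 - 1/t,这是Bayes-UCB类方法的常见设置,数值细节均为示例假设):

```python
import numpy as np

rng = np.random.default_rng(0)

def quantile_ucb(s, f, t, rng, n_samples=4000):
    """Beta后验的 (1 - 1/t) 分位数作为置信上界 (Bayes-UCB 风格)。"""
    level = 1.0 - 1.0 / max(t, 1)
    draws = rng.beta(s + 1, f + 1, n_samples)  # 伯努利均值的后验抽样
    return float(np.quantile(draws, level))

# 玩具双臂伯努利老虎机: 每轮拉动后验分位数最大的臂
p = [0.8, 0.3]
s, f, counts = [0, 0], [0, 0], [0, 0]
for t in range(1, 501):
    a = int(np.argmax([quantile_ucb(s[i], f[i], t, rng) for i in range(2)]))
    r = int(rng.random() < p[a])
    s[a] += r
    f[a] += 1 - r
    counts[a] += 1
```

随着 t 增大,分位数水平趋近1,次优臂的置信上界收缩,最优臂被拉动的次数占绝大多数。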

ML-109-标题 Conditional Born machine for Monte Carlo events generation

链接: https://arxiv.org/abs/2205.07674
作者: Oriel Kiss, Michele Grossi, Enrique Kajomovitz, Sofia Vallecorsa
备注: 10 pages, 9 figures, 4 tables

点击查看摘要

Abstract: Generative modeling is a promising task for near-term quantum devices, which can use the stochastic nature of quantum measurements as random source. So called Born machines are purely quantum models and promise to generate probability distributions in a quantum way, inaccessible to classical computers. This paper presents an application of Born machines to Monte Carlo simulations and extends their reach to multivariate and conditional distributions. Models are run on (noisy) simulators and IBM Quantum superconducting quantum hardware. More specifically, Born machines are used to generate muonic force carriers (MFC) events resulting from scattering processes between muons and the detector material in high-energy-physics colliders experiments. MFCs are bosons appearing in beyond the standard model theoretical frameworks, which are candidates for dark matter. Empirical evidences suggest that Born machines can reproduce the underlying distribution of datasets coming from Monte Carlo simulations, and are competitive with classical machine learning-based generative models of similar complexity.

摘要:生成建模是近期量子设备上一项有前景的任务,它可以利用量子测量的随机性作为随机源。所谓的 Born 机是纯量子模型,有望以经典计算机无法实现的量子方式生成概率分布。本文介绍了 Born 机在蒙特卡洛模拟中的应用,并将其适用范围扩展到多元分布和条件分布。模型在(含噪)模拟器和 IBM Quantum 超导量子硬件上运行。更具体地说,Born 机被用于生成高能物理对撞机实验中 μ 子与探测器材料之间散射过程所产生的 μ 子力载体(MFC)事件。MFC 是出现在超出标准模型的理论框架中的玻色子,是暗物质的候选粒子。实证结果表明,Born 机能够重现来自蒙特卡洛模拟的数据集的潜在分布,并且与复杂度相当的经典机器学习生成模型相比具有竞争力。

ML-110-标题 Reduction of detection limit and quantification uncertainty due to interferent by neural classification with abstention

链接: https://arxiv.org/abs/2205.07609
作者: Alex Hagen, Ken Jarman, Jesse Ward, Greg Eiden, Charles Barinaga, Emily Mace, Craig Aalseth, Anthony Carado
备注: Preprint submitted to Nuclear Instruments and Methods in Physics Research,\ A 12 pages, 10 figures

点击查看摘要

Abstract: Many measurements in the physical sciences can be cast as counting experiments, where the number of occurrences of a physical phenomenon informs the prevalence of the phenomenon’s source. Often, detection of the physical phenomenon (termed signal) is difficult to distinguish from naturally occurring phenomena (termed background). In this case, the discrimination of signal events from background can be performed using classifiers, and they may range from simple, threshold-based classifiers to sophisticated neural networks. These classifiers are often trained and validated to obtain optimal accuracy, however we show that the optimal accuracy classifier does not generally coincide with a classifier that provides the lowest detection limit, nor the lowest quantification uncertainty. We present a derivation of the detection limit and quantification uncertainty in the classifier-based counting experiment case. We also present a novel abstention mechanism to minimize the detection limit or quantification uncertainty \emph{a posteriori}. We illustrate the method on two data sets from the physical sciences, discriminating Ar-37 and Ar-39 radioactive decay from non-radioactive events in a gas proportional counter, and discriminating neutrons from photons in an inorganic scintillator and report results therefrom.

摘要:物理科学中的许多测量都可以表述为计数实验,其中某一物理现象的发生次数反映了该现象来源的普遍程度。通常,待检测的物理现象(称为信号)很难与自然发生的现象(称为背景)区分开。在这种情况下,可以使用分类器把信号事件从背景中区分出来,这些分类器既可以是简单的基于阈值的分类器,也可以是复杂的神经网络。这些分类器通常以获得最优准确率为目标进行训练和验证,但我们表明,最优准确率的分类器通常既不是检测限最低的分类器,也不是定量不确定度最低的分类器。我们推导了基于分类器的计数实验情形下的检测限与定量不确定度,并提出了一种新颖的弃权机制,用于事后(a posteriori)最小化检测限或定量不确定度。我们在物理科学的两个数据集上演示了该方法:在气体正比计数器中把 Ar-37 和 Ar-39 的放射性衰变与非放射性事件区分开,以及在无机闪烁体中把中子与光子区分开,并报告了相应结果。
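论文弃权机制的直观效果可以用一个玩具式的"带弃权阈值分类器"来说明(阈值 0.2/0.8 纯属示意,并非论文中通过最优化得到的弃权区域):

```python
def classify_with_abstention(signal_prob, lo=0.2, hi=0.8):
    """Toy abstaining classifier: commit to a class only when the
    predicted signal probability lies outside the ambiguous band (lo, hi)."""
    if signal_prob >= hi:
        return "signal"
    if signal_prob <= lo:
        return "background"
    return "abstain"  # ambiguous events are left undecided
```

论文的贡献在于如何事后选取这类弃权区域,使检测限或定量不确定度最小,而非使用固定阈值。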

ML-111-标题 Towards on-sky adaptive optics control using reinforcement learning

链接: https://arxiv.org/abs/2205.07554
作者: J. Nousiainen, C. Rajani, M. Kasper, T. Helin, S. Y. Haffert, C. Vérinaud, J. R. Males, K. Van Gorkom, L. M. Close, J. D. Long, A. D. Hedglen, O. Guyon, L. Schatz, M. Kautz, J. Lumbres, A. Rodack, J.M. Knight, K. Miller
备注:

点击查看摘要

Abstract: The direct imaging of potentially habitable Exoplanets is one prime science case for the next generation of high contrast imaging instruments on ground-based extremely large telescopes. To reach this demanding science goal, the instruments are equipped with eXtreme Adaptive Optics (XAO) systems which will control thousands of actuators at a framerate of kilohertz to several kilohertz. Most of the habitable exoplanets are located at small angular separations from their host stars, where the current XAO systems’ control laws leave strong residuals.Current AO control strategies like static matrix-based wavefront reconstruction and integrator control suffer from temporal delay error and are sensitive to mis-registration, i.e., to dynamic variations of the control system geometry. We aim to produce control methods that cope with these limitations, provide a significantly improved AO correction and, therefore, reduce the residual flux in the coronagraphic point spread function. We extend previous work in Reinforcement Learning for AO. The improved method, called PO4AO, learns a dynamics model and optimizes a control neural network, called a policy. We introduce the method and study it through numerical simulations of XAO with Pyramid wavefront sensing for the 8-m and 40-m telescope aperture cases. We further implemented PO4AO and carried out experiments in a laboratory environment using MagAO-X at the Steward laboratory. PO4AO provides the desired performance by improving the coronagraphic contrast in numerical simulations by factors 3-5 within the control region of DM and Pyramid WFS, in simulation and in the laboratory. The presented method is also quick to train, i.e., on timescales of typically 5-10 seconds, and the inference time is sufficiently small (< ms) to be used in real-time control for XAO with currently available hardware even for extremely large telescopes.

摘要:对潜在宜居系外行星的直接成像是下一代地基超大望远镜上高对比度成像仪器的一个主要科学目标。为实现这一高难度的科学目标,这些仪器配备了极端自适应光学(XAO)系统,能以千赫兹到数千赫兹的帧率控制数千个促动器。大多数宜居系外行星与宿主恒星的角距离很小,而在这一区域,当前 XAO 系统的控制律会留下较强的残差。当前的 AO 控制策略(如基于静态矩阵的波前重建和积分器控制)存在时间延迟误差,并且对失配(即控制系统几何结构的动态变化)十分敏感。我们旨在提出能够克服这些局限的控制方法,显著改善 AO 校正,从而降低日冕仪点扩散函数中的残余通量。我们扩展了此前将强化学习用于 AO 的工作。改进后的方法称为 PO4AO,它学习一个动力学模型并优化一个称为策略的控制神经网络。我们介绍了该方法,并通过对 8 米和 40 米望远镜口径情形下采用金字塔波前传感的 XAO 进行数值模拟来加以研究。我们还进一步实现了 PO4AO,并在 Steward 实验室使用 MagAO-X 在实验室环境中开展了实验。在数值模拟和实验室中,PO4AO 都在 DM 和金字塔 WFS 的控制区域内把日冕仪对比度提高了 3-5 倍,达到了预期性能。该方法的训练也很快(通常在 5-10 秒量级),且推理时间足够小(< 1 毫秒),即使对超大望远镜,也能用现有硬件对 XAO 进行实时控制。

ML-112-标题 The use of deep learning in interventional radiotherapy (brachytherapy) a review with a focus on open source and open data

链接: https://arxiv.org/abs/2205.07516
作者: Tobias Fechter, Ilias Sachpazidis, Dimos Baltas
备注: 31 pages, submitted to “Zeitschrift f"ur Medizinische Physik”

点击查看摘要

Abstract: Deep learning advanced to one of the most important technologies in almost all medical fields. Especially in areas, related to medical imaging it plays a big role. However, in interventional radiotherapy (brachytherapy) deep learning is still in an early phase. In this review, first, we investigated and scrutinised the role of deep learning in all processes of interventional radiotherapy and directly related fields. Additionally we summarised the most recent developments. To reproduce results of deep learning algorithms both source code and training data must be available. Therefore, a second focus of this work was on the analysis of the availability of open source, open data and open models. In our analysis, we were able to show that deep learning plays already a major role in some areas of interventional radiotherapy, but is still hardly presented in others. Nevertheless, its impact is increasing with the years, partly self-propelled but also influenced by closely related fields. Open source, data and models are growing in number but are still scarce and unevenly distributed among different research groups. The reluctance in publishing code, data and models limits reproducibility and restricts evaluation to mono-institutional datasets. Summarised, deep learning will change positively the workflow of interventional radiotherapy but there is room for improvement when it comes to reproducible results and standardised evaluation methods.

摘要:深度学习已发展成为几乎所有医学领域中最重要的技术之一,在与医学影像相关的领域中尤其发挥着重要作用。然而,在介入放疗(近距离放疗)中,深度学习仍处于早期阶段。在这篇综述中,我们首先考察并审视了深度学习在介入放疗及其直接相关领域各个流程中的作用,并总结了最新进展。要复现深度学习算法的结果,源代码和训练数据都必须可获得,因此本工作的第二个重点是分析开源代码、开放数据和开放模型的可用性。在分析中我们发现,深度学习在介入放疗的某些领域已经发挥着重要作用,但在另一些领域仍鲜有应用。尽管如此,其影响正逐年增加,部分源于自身发展,也受到密切相关领域的推动。开源代码、数据和模型的数量在增长,但仍然稀缺,且在不同研究小组之间分布不均。不愿公开代码、数据和模型限制了可复现性,并使评估局限于单一机构的数据集。总之,深度学习将积极地改变介入放疗的工作流程,但在结果可复现性和标准化评估方法方面仍有改进空间。

ML-113-标题 Ergodic variational flows

链接: https://arxiv.org/abs/2205.07475
作者: Zuheng Xu, Naitong Chen, Trevor Campbell
备注:

点击查看摘要

Abstract: This work presents a new class of variational family – ergodic variational flows – that not only enables tractable i.i.d. sampling and density evaluation, but also comes with MCMC-like convergence guarantees. Ergodic variational flows consist of a mixture of repeated applications of a measure-preserving and ergodic map to an initial reference distribution. We provide mild conditions under which the variational distribution converges weakly and in total variation to the target as the number of steps in the flow increases; this convergence holds regardless of the value of variational parameters, although different parameter values may result in faster or slower convergence. Further, we develop a particular instantiation of the general family using Hamiltonian dynamics combined with deterministic momentum refreshment. Simulated and real data experiments provide an empirical verification of the convergence theory and demonstrate that samples produced by the method are of comparable quality to a state-of-the-art MCMC method.

摘要:本文提出了一类新的变分族——遍历变分流(ergodic variational flows)——它不仅支持易处理的独立同分布采样和密度评估,还具有类似 MCMC 的收敛保证。遍历变分流由对初始参考分布反复施加一个保测度的遍历映射所得到的混合分布构成。我们给出了较弱的条件,使得随着流的步数增加,变分分布在弱收敛和全变差意义下都收敛到目标分布;无论变分参数取何值,这种收敛性都成立,只是不同的参数值可能导致收敛更快或更慢。此外,我们利用哈密顿动力学结合确定性动量刷新,构造了该一般族的一个具体实例。模拟和真实数据实验为该收敛理论提供了经验验证,并表明该方法产生的样本质量可与最先进的 MCMC 方法相媲美。

ML-114-标题 Optimal Randomized Approximations for Matrix based Renyis Entropy

链接: https://arxiv.org/abs/2205.07426
作者: Yuxin Dong, Tieliang Gong, Shujian Yu, Chen Li
备注:

点击查看摘要

Abstract: The Matrix-based Renyi’s entropy enables us to directly measure information quantities from given data without the costly probability density estimation of underlying distributions, thus has been widely adopted in numerous statistical learning and inference tasks. However, exactly calculating this new information quantity requires access to the eigenspectrum of a semi-positive definite (SPD) matrix A which grows linearly with the number of samples n , resulting in a O(n^3) time complexity that is prohibitive for large-scale applications. To address this issue, this paper takes advantage of stochastic trace approximations for matrix-based Renyi’s entropy with arbitrary \alpha \in R^+ orders, lowering the complexity by converting the entropy approximation to a matrix-vector multiplication problem. Specifically, we develop random approximations for integer order \alpha cases and polynomial series approximations (Taylor and Chebyshev) for non-integer \alpha cases, leading to a O(n^2sm) overall time complexity, where s,m \ll n denote the number of vector queries and the polynomial order respectively. We theoretically establish statistical guarantees for all approximation algorithms and give explicit order of s and m with respect to the approximation error \varepsilon , showing optimal convergence rate for both parameters up to a logarithmic factor. Large-scale simulations and real-world applications validate the effectiveness of the developed approximations, demonstrating remarkable speedup with negligible loss in accuracy.

摘要:基于矩阵的 Rényi 熵使我们能够直接从给定数据中度量信息量,而无需对底层分布进行代价高昂的概率密度估计,因此已被广泛用于众多统计学习和推断任务。然而,精确计算这一信息量需要获得半正定(SPD)矩阵 A 的特征谱,而该矩阵的规模随样本数 n 线性增长,导致 O(n^3) 的时间复杂度,这对于大规模应用来说难以承受。为解决这一问题,本文针对任意阶 \alpha \in R^+ 的基于矩阵的 Rényi 熵,利用随机迹近似,把熵的近似计算转化为矩阵-向量乘法问题,从而降低复杂度。具体而言,我们针对整数阶 \alpha 的情形给出了随机近似,针对非整数阶 \alpha 的情形给出了多项式级数近似(Taylor 与 Chebyshev),使总体时间复杂度降为 O(n^2sm),其中 s 和 m(均远小于 n)分别表示向量查询次数和多项式阶数。我们从理论上为所有近似算法建立了统计保证,并给出了 s 和 m 关于近似误差 \varepsilon 的显式阶,表明两个参数都达到了至多相差对数因子的最优收敛率。大规模模拟和真实应用验证了所提近似方法的有效性,在精度损失可忽略的情况下获得了显著的加速。
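文中"随机迹近似"的基本构件是 Hutchinson 估计量:对 Rademacher 向量 z 有 E[zᵀAz] = tr(A),因此只用矩阵-向量乘法即可估计迹。下面是一个纯 Python 的示意实现(仅演示该通用技巧,与论文的具体算法细节无关):

```python
import random

def hutchinson_trace(matvec, n, num_queries=100, seed=0):
    """Estimate tr(A) using only matrix-vector products A @ z,
    averaging z^T A z over random Rademacher vectors z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_queries):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        az = matvec(z)  # the only access to A we need
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_queries
```

对对角矩阵,每个样本 zᵀAz 恰好等于迹;一般情形下误差随查询次数下降,这正是文中复杂度 O(n^2sm) 里 s 的来源。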

ML-115-标题 Learning Representations for New Sound Classes With Continual Self-Supervised Learning

链接: https://arxiv.org/abs/2205.07390
作者: Zhepei Wang, Cem Subakan, Xilin Jiang, Junkai Wu, Efthymios Tzinis, Mirco Ravanelli, Paris Smaragdis
备注: Submitted to IEEE Signal Processing Letters

点击查看摘要

Abstract: In this paper, we present a self-supervised learning framework for continually learning representations for new sound classes. The proposed system relies on a continually trained neural encoder that is trained with similarity-based learning objectives without using labels. We show that representations learned with the proposed method generalize better and are less susceptible to catastrophic forgetting than fully-supervised approaches. Remarkably, our technique does not store past data or models and is more computationally efficient than distillation-based methods. To accurately assess the system performance, in addition to using existing protocols, we propose two realistic evaluation protocols that use only a small amount of labeled data to simulate practical use cases.

摘要:在本文中,我们提出了一个自监督学习框架,用于持续地学习新声音类别的表示。所提系统依赖于一个持续训练的神经编码器,该编码器使用基于相似性的学习目标进行训练,无需使用标签。我们表明,与全监督方法相比,用所提方法学到的表示泛化能力更好,也更不容易发生灾难性遗忘。值得注意的是,我们的技术不存储过去的数据或模型,并且比基于蒸馏的方法计算效率更高。为了准确评估系统性能,除使用现有协议外,我们还提出了两个仅使用少量标注数据来模拟实际用例的现实评估协议。

ML-116-标题 Supervised Learning and Model Analysis with Compositional Data

链接: https://arxiv.org/abs/2205.07271
作者: Shimeng Huang, Elisabeth Ailer, Niki Kilbertus, Niklas Pfister
备注:

点击查看摘要

Abstract: The compositionality and sparsity of high-throughput sequencing data poses a challenge for regression and classification. However, in microbiome research in particular, conditional modeling is an essential tool to investigate relationships between phenotypes and the microbiome. Existing techniques are often inadequate: they either rely on extensions of the linear log-contrast model (which adjusts for compositionality, but is often unable to capture useful signals), or they are based on black-box machine learning methods (which may capture useful signals, but ignore compositionality in downstream analyses). We propose KernelBiome, a kernel-based nonparametric regression and classification framework for compositional data. It is tailored to sparse compositional data and is able to incorporate prior knowledge, such as phylogenetic structure. KernelBiome captures complex signals, including in the zero-structure, while automatically adapting model complexity. We demonstrate on par or improved predictive performance compared with state-of-the-art machine learning methods. Additionally, our framework provides two key advantages: (i) We propose two novel quantities to interpret contributions of individual components and prove that they consistently estimate average perturbation effects of the conditional mean, extending the interpretability of linear log-contrast models to nonparametric models. (ii) We show that the connection between kernels and distances aids interpretability and provides a data-driven embedding that can augment further analysis. Finally, we apply the KernelBiome framework to two public microbiome studies and illustrate the proposed model analysis. KernelBiome is available as an open-source Python package at this https URL.

摘要:高通量测序数据的成分性和稀疏性给回归与分类带来了挑战。尤其是在微生物组研究中,条件建模是研究表型与微生物组之间关系的重要工具。现有技术往往并不充分:它们要么依赖线性对数对比模型的扩展(虽然考虑了成分性,但常常无法捕捉有用信号),要么基于黑盒机器学习方法(可能捕捉到有用信号,但在下游分析中忽略了成分性)。我们提出 KernelBiome,一个面向成分数据、基于核的非参数回归与分类框架。它为稀疏成分数据量身定制,并能纳入系统发育结构等先验知识。KernelBiome 能捕捉包括零结构在内的复杂信号,同时自动调整模型复杂度。与最先进的机器学习方法相比,我们展示了相当乃至更优的预测性能。此外,我们的框架还提供两个关键优势:(i)我们提出两个新的量来解释各成分的贡献,并证明它们能一致地估计条件均值的平均扰动效应,从而将线性对数对比模型的可解释性推广到非参数模型;(ii)我们表明核与距离之间的联系有助于可解释性,并提供了可支撑进一步分析的数据驱动嵌入。最后,我们将 KernelBiome 框架应用于两项公开微生物组研究,并展示了所提出的模型分析。KernelBiome 已作为开源 Python 软件包在上述 HTTPS URL 提供。

ML-117-标题 Evaluating Independence and Conditional Independence Measures

链接: https://arxiv.org/abs/2205.07253
作者: Jian Ma
备注: 53 pages, 26 figures, 3 tables

点击查看摘要

Abstract: Independence and Conditional Independence (CI) are two fundamental concepts in probability and statistics, which can be applied to solve many central problems of statistical inference. There are many existing independence and CI measures defined from diverse principles and concepts. In this paper, the 16 independence measures and 16 CI measures were reviewed and then evaluated with simulated and real data. For the independence measures, eight simulated data were generating from normal distribution, normal and Archimedean copula functions to compare the measures in bivariate or multivariate, linear or nonlinear settings. Two UCI dataset, including the heart disease data and the wine quality data, were used to test the power of the independence measures in real conditions. For the CI measures, two simulated data with normal distribution and Gumbel copula, and one real data (the Beijing air data) were utilized to test the CI measures in prespecified linear or nonlinear setting and real scenario. From the experimental results, we found that most of the measures work well on the simulated data by presenting the right monotonicity of the simulations. However, the independence and CI measures were differentiated on much complex real data respectively and only a few can be considered as working well with reference to domain knowledge. We also found that the measures tend to be separated into groups based on the similarity of the behaviors of them in each setting and in general. According to the experiments, we recommend CE as a good choice for both independence and CI measure. This is also due to its rigorous distribution-free definition and consistent nonparametric estimator.

摘要:独立性与条件独立性(CI)是概率与统计中的两个基本概念,可用于解决统计推断中的许多核心问题。基于不同的原理和概念,已有许多独立性度量和 CI 度量被提出。本文回顾了 16 种独立性度量和 16 种 CI 度量,并用模拟数据和真实数据对它们进行了评估。对于独立性度量,我们从正态分布、正态 copula 和阿基米德 copula 函数生成了八组模拟数据,以比较各度量在二元或多元、线性或非线性设定下的表现;并使用两个 UCI 数据集(心脏病数据和葡萄酒质量数据)检验独立性度量在真实条件下的功效。对于 CI 度量,我们使用两组具有正态分布和 Gumbel copula 的模拟数据以及一组真实数据(北京空气质量数据),在预设的线性或非线性设定及真实场景中检验 CI 度量。实验结果表明,大多数度量在模拟数据上表现良好,能正确呈现模拟的单调性。然而,在复杂的真实数据上,各独立性度量和 CI 度量的表现出现分化,参照领域知识来看只有少数度量可以认为表现良好。我们还发现,这些度量往往会依据其在各设定下以及总体上的行为相似性而分成若干组。根据实验结果,我们推荐 CE 作为独立性度量和 CI 度量的一个良好选择,这也得益于其严格的免分布定义和一致的非参数估计量。
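作为文中所比较度量的一个最简单的参照,下面给出样本 Pearson 相关系数的实现——它只能刻画线性相依,这也正是需要更一般的独立性度量的原因:

```python
def pearson_r(x, y):
    """Sample Pearson correlation coefficient: detects only *linear*
    dependence between two equal-length samples; range [-1, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5
```

例如 y = x² 在关于零对称的样本上 Pearson 相关为 0,但两个变量显然不独立——这类非线性相依正是更复杂度量的用武之地。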

ML-118-标题 A comparison of PINN approaches for drift-diffusion equations on metric graphs

链接: https://arxiv.org/abs/2205.07195
作者: Jan Blechschmidt, Jan-Frederik Pietschman, Tom-Christian Riemer, Martin Stoll, Max Winkler
备注:

点击查看摘要

Abstract: In this paper we focus on comparing machine learning approaches for quantum graphs, which are metric graphs, i.e., graphs with dedicated edge lengths, and an associated differential operator. In our case the differential equation is a drift-diffusion model. Computational methods for quantum graphs require a careful discretization of the differential operator that also incorporates the node conditions, in our case Kirchhoff-Neumann conditions. Traditional numerical schemes are rather mature but have to be tailored manually when the differential equation becomes the constraint in an optimization problem. Recently, physics informed neural networks (PINNs) have emerged as a versatile tool for the solution of partial differential equations from a range of applications. They offer flexibility to solve parameter identification or optimization problems by only slightly changing the problem formulation used for the forward simulation. We compare several PINN approaches for solving the drift-diffusion on the metric graph.

摘要:在本文中,我们着重比较用于量子图的机器学习方法。量子图是度量图(即边具有指定长度的图)加上一个相应的微分算子;在我们的情形中,微分方程是一个漂移-扩散模型。量子图的计算方法需要对微分算子进行细致的离散化,并纳入节点条件(在我们的情形中为 Kirchhoff-Neumann 条件)。传统数值格式已相当成熟,但当微分方程成为优化问题中的约束时,必须手动加以调整。近来,物理信息神经网络(PINN)已成为求解来自各类应用的偏微分方程的多用途工具。它们只需对前向模拟所用的问题表述稍作修改,就能灵活地求解参数辨识或优化问题。我们比较了几种在度量图上求解漂移-扩散方程的 PINN 方法。

ML-119-标题 Fair Bayes-Optimal Classifiers Under Predictive Parity

链接: https://arxiv.org/abs/2205.07182
作者: Xianli Zeng, Edgar Dobriban, Guang Cheng
备注:

点击查看摘要

Abstract: Increasing concerns about disparate effects of AI have motivated a great deal of work on fair machine learning. Existing works mainly focus on independence- and separation-based measures (e.g., demographic parity, equality of opportunity, equalized odds), while sufficiency-based measures such as predictive parity are much less studied. This paper considers predictive parity, which requires equalizing the probability of success given a positive prediction among different protected groups. We prove that, if the overall performances of different groups vary only moderately, all fair Bayes-optimal classifiers under predictive parity are group-wise thresholding rules. Perhaps surprisingly, this may not hold if group performance levels vary widely; in this case we find that predictive parity among protected groups may lead to within-group unfairness. We then propose an algorithm we call FairBayes-DPP, aiming to ensure predictive parity when our condition is satisfied. FairBayes-DPP is an adaptive thresholding algorithm that aims to achieve predictive parity, while also seeking to maximize test accuracy. We provide supporting experiments conducted on synthetic and empirical data.

摘要:对 AI 差异性影响的日益关注推动了公平机器学习方面的大量研究。现有工作主要集中于基于独立性和基于分离性的度量(例如人口统计均等、机会平等、均等化赔率),而诸如预测均等之类的基于充分性的度量研究得少得多。本文研究预测均等,它要求在给定正向预测的条件下,不同受保护群体的成功概率相等。我们证明,如果各群体的整体表现差异不大,则预测均等约束下所有公平的贝叶斯最优分类器都是按群体设定阈值的规则。或许令人意外的是,当各群体表现水平差异很大时这一结论可能不成立;在这种情况下,我们发现受保护群体之间的预测均等可能导致群体内部的不公平。随后我们提出了一种称为 FairBayes-DPP 的算法,旨在当上述条件满足时保证预测均等。FairBayes-DPP 是一种自适应阈值算法,在追求预测均等的同时力求最大化测试准确率。我们在合成数据和经验数据上进行了实验以支持上述结论。
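文中"按群体设定阈值的规则"形式非常简单:对每个受保护群体使用各自的判定阈值。下面是一个示意(阈值数值纯属假设;论文中阈值由算法自适应求得):

```python
def group_threshold_classifier(score, group, thresholds):
    """Group-wise thresholding rule: predict positive iff the score
    reaches the cutoff assigned to the individual's group."""
    return 1 if score >= thresholds[group] else 0

# Hypothetical group-specific cutoffs, for illustration only
cutoffs = {"A": 0.6, "B": 0.5}
```

同一分数在不同群体下可能得到不同预测,这正是该规则实现预测均等的机制。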

ML-120-标题 Trajectory Inference via Mean-field Langevin in Path Space

链接: https://arxiv.org/abs/2205.07146
作者: Stephen Zhang, Lénaïc Chizat, Matthieu Heitz, Geoffrey Schiebinger
备注:

点击查看摘要

Abstract: Trajectory inference aims at recovering the dynamics of a population from snapshots of its temporal marginals. To solve this task, a min-entropy estimator relative to the Wiener measure in path space was introduced by Lavenant et al. arXiv:2102.09204, and shown to consistently recover the dynamics of a large class of drift-diffusion processes from the solution of an infinite dimensional convex optimization problem. In this paper, we introduce a grid-free algorithm to compute this estimator. Our method consists in a family of point clouds (one per snapshot) coupled via Schrödinger bridges which evolve with noisy gradient descent. We study the mean-field limit of the dynamics and prove its global convergence at an exponential rate to the desired estimator. Overall, this leads to an inference method with end-to-end theoretical guarantees that solves an interpretable model for trajectory inference. We also present how to adapt the method to deal with mass variations, a useful extension when dealing with single cell RNA-sequencing data where cells can branch and die.

摘要:轨迹推断旨在从时间边缘分布的快照中恢复群体的动力学。为解决这一任务,Lavenant 等人(arXiv:2102.09204)引入了相对于路径空间中 Wiener 测度的最小熵估计量,并证明它能从一个无限维凸优化问题的解中一致地恢复一大类漂移-扩散过程的动力学。在本文中,我们提出一种无网格算法来计算该估计量。我们的方法由一族点云(每个快照一个)构成,这些点云通过 Schrödinger 桥耦合,并随含噪梯度下降而演化。我们研究该动力学的平均场极限,并证明其以指数速率全局收敛到所需估计量。总体而言,这给出了一种具有端到端理论保证、可求解可解释轨迹推断模型的推断方法。我们还展示了如何调整该方法以处理质量变化,这在处理细胞可能分支和死亡的单细胞 RNA 测序数据时是一个有用的扩展。

ML-121-标题 Robust Regularized Low-Rank Matrix Models for Regression and Classification

链接: https://arxiv.org/abs/2205.07106
作者: Hsin-Hsiung Huang, Feng Yu, Xing Fan, Teng Zhang
备注: 26 pages, 7 figures

点击查看摘要

Abstract: While matrix variate regression models have been studied in many existing works, classical statistical and computational methods for the analysis of the regression coefficient estimation are highly affected by high dimensional and noisy matrix-valued predictors. To address these issues, this paper proposes a framework of matrix variate regression models based on a rank constraint, vector regularization (e.g., sparsity), and a general loss function with three special cases considered: ordinary matrix regression, robust matrix regression, and matrix logistic regression. We also propose an alternating projected gradient descent algorithm. Based on analyzing our objective functions on manifolds with bounded curvature, we show that the algorithm is guaranteed to converge, all accumulation points of the iterates have estimation errors in the order of O(1/\sqrt{n}) asymptotically and substantially attaining the minimax rate. Our theoretical analysis can be applied to general optimization problems on manifolds with bounded curvature and can be considered an important technical contribution to this work. We validate the proposed method through simulation studies and real image data examples.

摘要:尽管矩阵变量回归模型已在许多现有工作中得到研究,但用于分析回归系数估计的经典统计与计算方法会受到高维且含噪的矩阵值预测变量的严重影响。为解决这些问题,本文提出了一个基于秩约束、向量正则化(如稀疏性)和一般损失函数的矩阵变量回归模型框架,并考虑了三种特例:普通矩阵回归、稳健矩阵回归和矩阵逻辑回归。我们还提出了一种交替投影梯度下降算法。基于对有界曲率流形上目标函数的分析,我们证明该算法保证收敛,且迭代序列的所有聚点的估计误差渐近地为 O(1/\sqrt{n}) 阶,基本达到极小极大速率。我们的理论分析可应用于有界曲率流形上的一般优化问题,可视为本工作的一项重要技术贡献。我们通过模拟研究和真实图像数据示例验证了所提方法。

ML-122-标题 A Tale of Two Flows Cooperative Learning of Langevin Flow and Normalizing Flow Toward Energy-Based Model

链接: https://arxiv.org/abs/2205.06924
作者: Jianwen Xie, Yaxuan Zhu, Jun Li, Ping Li
备注: 23 pages

点击查看摘要

Abstract: This paper studies the cooperative learning of two generative flow models, in which the two models are iteratively updated based on the jointly synthesized examples. The first flow model is a normalizing flow that transforms an initial simple density to a target density by applying a sequence of invertible transformations. The second flow model is a Langevin flow that runs finite steps of gradient-based MCMC toward an energy-based model. We start from proposing a generative framework that trains an energy-based model with a normalizing flow as an amortized sampler to initialize the MCMC chains of the energy-based model. In each learning iteration, we generate synthesized examples by using a normalizing flow initialization followed by a short-run Langevin flow revision toward the current energy-based model. Then we treat the synthesized examples as fair samples from the energy-based model and update the model parameters with the maximum likelihood learning gradient, while the normalizing flow directly learns from the synthesized examples by maximizing the tractable likelihood. Under the short-run non-mixing MCMC scenario, the estimation of the energy-based model is shown to follow the perturbation of maximum likelihood, and the short-run Langevin flow and the normalizing flow form a two-flow generator that we call CoopFlow. We provide an understating of the CoopFlow algorithm by information geometry and show that it is a valid generator as it converges to a moment matching estimator. We demonstrate that the trained CoopFlow is capable of synthesizing realistic images, reconstructing images, and interpolating between images.

摘要:本文研究两个生成式流模型的协同学习,其中两个模型基于共同合成的样本迭代更新。第一个流模型是归一化流,它通过施加一系列可逆变换把初始的简单密度变换为目标密度。第二个流模型是 Langevin 流,它朝着一个基于能量的模型运行有限步基于梯度的 MCMC。我们首先提出一个生成框架,以归一化流作为摊销采样器来训练基于能量的模型,用于初始化基于能量的模型的 MCMC 链。在每次学习迭代中,我们先用归一化流初始化,再朝当前基于能量的模型做短程 Langevin 流修正,从而生成合成样本。然后,我们把合成样本视为来自基于能量的模型的近似样本,用最大似然学习梯度更新模型参数;而归一化流则通过最大化可处理的似然直接向合成样本学习。在短程非混合 MCMC 的情形下,基于能量的模型的估计可视为对最大似然的一种扰动,而短程 Langevin 流与归一化流共同构成一个双流生成器,我们称之为 CoopFlow。我们从信息几何的角度理解 CoopFlow 算法,并证明它是一个有效的生成器,会收敛到一个矩匹配估计量。我们证明训练好的 CoopFlow 能够合成逼真的图像、重建图像以及在图像之间插值。
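CoopFlow 中的第二个流是短程 Langevin 流,其单步更新可示意如下(纯演示性质:noise_scale 参数是为了便于确定性测试而加入的假设项,并非论文公式的一部分):

```python
import math
import random

def langevin_step(x, grad_log_p, step=0.01, noise_scale=1.0, rng=None):
    """One unadjusted Langevin step:
    x' = x + (step/2) * grad log p(x) + sqrt(step) * Gaussian noise."""
    rng = rng or random.Random(0)
    return [xi + 0.5 * step * g + noise_scale * math.sqrt(step) * rng.gauss(0.0, 1.0)
            for xi, g in zip(x, grad_log_p(x))]
```

以标准高斯为目标时 grad log p(x) = -x;重复若干步即得到文中朝能量模型修正合成样本的"短程 Langevin 流"。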

ML-123-标题 Large-Scale Sequential Learning for Recommender and Engineering Systems

链接: https://arxiv.org/abs/2205.06893
作者: Aleksandra Burashnikova
备注: PhD thesis

点击查看摘要

Abstract: In this thesis, we focus on the design of an automatic algorithms that provide personalized ranking by adapting to the current conditions. To demonstrate the empirical efficiency of the proposed approaches we investigate their applications for decision making in recommender systems and energy systems domains. For the former, we propose novel algorithm called SAROS that take into account both kinds of feedback for learning over the sequence of interactions. The proposed approach consists in minimizing pairwise ranking loss over blocks constituted by a sequence of non-clicked items followed by the clicked one for each user. We also explore the influence of long memory on the accurateness of predictions. SAROS shows highly competitive and promising results based on quality metrics and also it turn out faster in terms of loss convergence than stochastic gradient descent and batch classical approaches. Regarding power systems, we propose an algorithm for faulted lines detection based on focusing of misclassifications in lines close to the true event location. The proposed idea of taking into account the neighbour lines shows statistically significant results in comparison with the initial approach based on convolutional neural networks for faults detection in power grid.

摘要:在本文中,我们着重设计能够适应当前条件、提供个性化排序的自动算法。为展示所提方法的经验效率,我们研究了它们在推荐系统和能源系统领域决策中的应用。对于前者,我们提出了名为 SAROS 的新算法,它在交互序列上学习时同时考虑两种反馈。所提方法的思路是:对每个用户,在由一串未点击条目及其后被点击条目构成的块上最小化成对排序损失。我们还探讨了长程记忆对预测准确性的影响。基于质量指标,SAROS 表现出很强的竞争力和前景,并且在损失收敛速度上快于随机梯度下降和批处理经典方法。在电力系统方面,我们提出了一种故障线路检测算法,其思路是关注真实事件位置附近线路上的误分类。与基于卷积神经网络进行电网故障检测的初始方法相比,考虑相邻线路这一思想在统计上取得了显著的结果。
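SAROS 的目标是在"未点击条目块 + 被点击条目"上最小化成对排序损失。下面用常见的成对逻辑损失给出一个示意(具体损失形式以论文为准,此处函数名与形式均为假设):

```python
import math

def block_pairwise_loss(clicked_score, non_clicked_scores):
    """Average logistic pairwise ranking loss between one clicked item
    and the block of non-clicked items shown before it: the loss shrinks
    as the clicked item's score rises above each non-clicked score."""
    return sum(math.log(1.0 + math.exp(-(clicked_score - s)))
               for s in non_clicked_scores) / len(non_clicked_scores)
```

当被点击条目的分数高于块内所有未点击条目时损失趋近于 0,这正是成对排序目标想要的排序关系。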

ML-124-标题 A Huber loss-based super learner with applications to healthcare expenditures

链接: https://arxiv.org/abs/2205.06870
作者: Ziyue Wu, David Benkeser
备注:

点击查看摘要

Abstract: Complex distributions of the healthcare expenditure pose challenges to statistical modeling via a single model. Super learning, an ensemble method that combines a range of candidate models, is a promising alternative for cost estimation and has shown benefits over a single model. However, standard approaches to super learning may have poor performance in settings where extreme values are present, such as healthcare expenditure data. We propose a super learner based on the Huber loss, a “robust” loss function that combines squared error loss with absolute loss to down-weight the influence of outliers. We derive oracle inequalities that establish bounds on the finite-sample and asymptotic performance of the method. We show that the proposed method can be used both directly to optimize Huber risk, as well as in finite-sample settings where optimizing mean squared error is the ultimate goal. For this latter scenario, we provide two methods for performing a grid search for values of the robustification parameter indexing the Huber loss. Simulations and real data analysis demonstrate appreciable finite-sample gains in cost prediction and causal effect estimation using our proposed method.

摘要:医疗支出的复杂分布给单一模型的统计建模带来了挑战。超级学习(super learning)是一种组合多个候选模型的集成方法,是成本估计的一个有前景的替代方案,已显示出优于单一模型的效果。然而,在存在极端值的场景(如医疗支出数据)中,标准的超级学习方法可能表现不佳。我们提出一种基于 Huber 损失的超级学习器;Huber 损失是一种"稳健"损失函数,它把平方误差损失与绝对损失结合起来,以降低离群值的影响。我们推导了刻画该方法有限样本与渐近性能界的 oracle 不等式。我们表明,所提方法既可直接用于优化 Huber 风险,也可用于以最小化均方误差为最终目标的有限样本场景。针对后一种情形,我们给出了两种对 Huber 损失的稳健化参数进行网格搜索的方法。模拟与真实数据分析表明,使用所提方法在成本预测和因果效应估计上取得了可观的有限样本增益。
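Huber 损失把小残差按平方、大残差按线性计入,从而削弱离群值的影响。一个最小实现如下(delta 即文中需要网格搜索的稳健化参数,默认值仅为示意):

```python
def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for small residuals (|r| <= delta),
    linear beyond, which down-weights extreme-cost outliers."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)
```

该函数在 |r| = delta 处连续且一阶可导,这正是它兼具平方损失的平滑性与绝对损失的稳健性的原因。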

ML-125-标题 Multi-variant COVID-19 model with heterogeneous transmission rates using deep neural networks

链接: https://arxiv.org/abs/2205.06834
作者: K.D. Olumoyin, A.Q.M. Khaliq, K.M. Furati
备注:

点击查看摘要

Abstract: Mutating variants of COVID-19 have been reported across many US states since 2021. In the fight against COVID-19, it has become imperative to study the heterogeneity in the time-varying transmission rates for each variant in the presence of pharmaceutical and non-pharmaceutical mitigation measures. We develop a Susceptible-Exposed-Infected-Recovered mathematical model to highlight the differences in the transmission of the B.1.617.2 delta variant and the original SARS-CoV-2. Theoretical results for the well-posedness of the model are discussed. A Deep neural network is utilized and a deep learning algorithm is developed to learn the time-varying heterogeneous transmission rates for each variant. The accuracy of the algorithm for the model is shown using error metrics in the data-driven simulation for COVID-19 variants in the US states of Florida, Alabama, Tennessee, and Missouri. Short-term forecasting of daily cases is demonstrated using long short term memory neural network and an adaptive neuro-fuzzy inference system.

摘要:自 2021 年以来,美国许多州都报告了 COVID-19 的突变变体。在抗击 COVID-19 的过程中,研究在药物与非药物缓解措施共同作用下各变体时变传播率的异质性变得尤为重要。我们建立了一个易感-暴露-感染-康复(SEIR)数学模型,以刻画 B.1.617.2 delta 变体与原始 SARS-CoV-2 在传播上的差异,并讨论了该模型适定性的理论结果。我们利用深度神经网络并设计了一种深度学习算法,来学习每个变体的时变异质传播率。在美国佛罗里达州、阿拉巴马州、田纳西州和密苏里州的 COVID-19 变体数据驱动模拟中,用误差指标展示了该模型算法的准确性。最后,利用长短期记忆神经网络和自适应神经模糊推理系统演示了每日病例的短期预测。
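论文所用 SEIR 类模型的核心是时变传播率 β(t)。下面用一个前向欧拉步示意其动力学(参数与步长纯属演示;论文中 β(t) 由深度神经网络学习,且模型还区分了不同变体):

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0):
    """One forward-Euler step of a basic SEIR model; beta is the
    (possibly time-varying) transmission rate, sigma the incubation
    rate, gamma the recovery rate."""
    n = s + e + i + r
    new_e = beta * s * i / n * dt  # susceptible -> exposed
    new_i = sigma * e * dt         # exposed -> infected
    new_r = gamma * i * dt         # infected -> recovered
    return s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
```

按天迭代该步骤即可得到各仓室的轨迹;把 beta 换成随 t 变化的函数即得文中的时变传播率设定。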

计算机视觉

CV-0-标题 Guess What Moves Unsupervised Video and Image Segmentation by Anticipating Motion

链接: https://arxiv.org/abs/2205.07844
作者: Subhabrata Choudhury, Laurynas Karazija, Iro Laina, Andrea Vedaldi, Christian Rupprecht
备注:

点击查看摘要

Abstract: Motion, measured via optical flow, provides a powerful cue to discover and learn objects in images and videos. However, compared to using appearance, it has some blind spots, such as the fact that objects become invisible if they do not move. In this work, we propose an approach that combines the strengths of motion-based and appearance-based segmentation. We propose to supervise an image segmentation network, tasking it with predicting regions that are likely to contain simple motion patterns, and thus likely to correspond to objects. We apply this network in two modes. In the unsupervised video segmentation mode, the network is trained on a collection of unlabelled videos, using the learning process itself as an algorithm to segment these videos. In the unsupervised image segmentation model, the network is learned using videos and applied to segment independent still images. With this, we obtain strong empirical results in unsupervised video and image segmentation, significantly outperforming the state of the art on benchmarks such as DAVIS, sometimes with a 5% IoU gap.

摘要:通过光流测量的运动为在图像和视频中发现并学习物体提供了强有力的线索。但与使用外观信息相比,它存在一些盲点,例如物体一旦不运动就会变得不可见。在这项工作中,我们提出了一种结合基于运动与基于外观的分割各自优势的方法:监督一个图像分割网络,令其预测可能包含简单运动模式、因而可能对应物体的区域。我们以两种模式应用该网络。在无监督视频分割模式下,网络在无标注视频集合上训练,并将学习过程本身作为分割这些视频的算法;在无监督图像分割模式下,网络利用视频学习,然后应用于分割独立的静态图像。借助这一方法,我们在无监督视频和图像分割上取得了强有力的实验结果,在DAVIS等基准上显著优于现有最佳方法,有时领先达5%的IoU。

CV-1-标题 Deep Spectral Methods A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization

链接: https://arxiv.org/abs/2205.07839
作者: Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi
备注: Published at CVPR 2022. Project Page: this https URL

点击查看摘要

Abstract: Unsupervised localization and segmentation are long-standing computer vision challenges that involve decomposing an image into semantically-meaningful segments without any labeled data. These tasks are particularly interesting in an unsupervised setting due to the difficulty and cost of obtaining dense image annotations, but existing unsupervised approaches struggle with complex scenes containing multiple objects. Differently from existing methods, which are purely based on deep learning, we take inspiration from traditional spectral segmentation methods by reframing image decomposition as a graph partitioning problem. Specifically, we examine the eigenvectors of the Laplacian of a feature affinity matrix from self-supervised networks. We find that these eigenvectors already decompose an image into meaningful segments, and can be readily used to localize objects in a scene. Furthermore, by clustering the features associated with these segments across a dataset, we can obtain well-delineated, nameable regions, i.e. semantic segmentations. Experiments on complex datasets (Pascal VOC, MS-COCO) demonstrate that our simple spectral method outperforms the state-of-the-art in unsupervised localization and segmentation by a significant margin. Furthermore, our method can be readily used for a variety of complex image editing tasks, such as background removal and compositing.

摘要:无监督定位与分割是长期存在的计算机视觉挑战,要求在没有任何标注数据的情况下将图像分解为语义上有意义的片段。由于获取密集图像标注的难度和成本,这些任务在无监督设定下尤为有趣,但现有无监督方法在包含多个物体的复杂场景中表现不佳。与纯粹基于深度学习的现有方法不同,我们从传统谱分割方法中汲取灵感,将图像分解重新表述为图划分问题。具体而言,我们检视由自监督网络得到的特征亲和矩阵的拉普拉斯算子的特征向量,发现这些特征向量本身已能将图像分解为有意义的片段,并可直接用于定位场景中的物体。此外,通过在整个数据集上对这些片段关联的特征进行聚类,我们可以得到轮廓清晰、可命名的区域,即语义分割。在复杂数据集(Pascal VOC、MS-COCO)上的实验表明,我们这一简单的谱方法在无监督定位与分割上显著优于现有最佳方法。此外,该方法还可直接用于多种复杂的图像编辑任务,例如背景移除与合成。
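其核心步骤可以用一个玩具示例来示意:由特征构建高斯亲和矩阵,取图拉普拉斯算子的特征向量,并用第二小特征向量(Fiedler向量)的符号得到二分。示例中的亲和核、阈值方式与二分设定均为简化假设,并非论文的完整方法:

```python
import numpy as np

def spectral_bipartition(features, sigma=1.0):
    """Toy spectral segmentation: Gaussian affinity -> graph Laplacian ->
    sign of the Fiedler vector gives a two-way partition."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))   # feature affinity matrix
    L = np.diag(W.sum(axis=1)) - W       # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]              # second-smallest eigenvector
    return (fiedler > 0).astype(int)

# Two well-separated feature clusters should land in different segments.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 0.1, (10, 4)),
                   rng.normal(2.0, 0.1, (10, 4))])
labels = spectral_bipartition(feats)
```

论文进一步在多个特征向量上做聚类以得到多段分割,此处仅演示最简单的两路划分。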

CV-2-标题 FvOR Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction

链接: https://arxiv.org/abs/2205.07763
作者: Zhenpei Yang, Zhile Ren, Miguel Angel Bautista, Zaiwei Zhang, Qi Shan, Qixing Huang
备注: CVPR 2022

点击查看摘要

Abstract: Reconstructing an accurate 3D object model from a few image observations remains a challenging problem in computer vision. State-of-the-art approaches typically assume accurate camera poses as input, which could be difficult to obtain in realistic settings. In this paper, we present FvOR, a learning-based object reconstruction method that predicts accurate 3D models given a few images with noisy input poses. The core of our approach is a fast and robust multi-view reconstruction algorithm to jointly refine 3D geometry and camera pose estimation using learnable neural network modules. We provide a thorough benchmark of state-of-the-art approaches for this problem on ShapeNet. Our approach achieves best-in-class results. It is also two orders of magnitude faster than the recent optimization-based approach IDR. Our code is released at \url{this https URL}

摘要:从少量图像观测中重建准确的3D物体模型仍是计算机视觉中的一个挑战性问题。最先进的方法通常假设以准确的相机位姿作为输入,而这在现实场景中可能难以获得。本文提出FvOR,一种基于学习的物体重建方法,能在少量带噪声输入位姿的图像下预测准确的3D模型。我们方法的核心是一种快速且鲁棒的多视图重建算法,利用可学习的神经网络模块联合优化3D几何与相机位姿估计。我们在ShapeNet上为该问题提供了最新方法的全面基准。我们的方法取得了同类最佳的结果,并且比最近基于优化的方法IDR快两个数量级。我们的代码发布于\url{this https URL}。

CV-3-标题 A Data Cube of Big Satellite Image Time-Series for Agriculture Monitoring

链接: https://arxiv.org/abs/2205.07752
作者: Thanassis Drivas, Vasileios Sitokonstantinou, Iason Tsardanidis, Alkiviadis Koukos, Charalampos Kontoes, Vassilia Karathanassi
备注: This work has been accepted for publication in IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP 2022)

点击查看摘要

Abstract: The modernization of the Common Agricultural Policy (CAP) requires the large scale and frequent monitoring of agricultural land. Towards this direction, the free and open satellite data (i.e., Sentinel missions) have been extensively used as the sources for the required high spatial and temporal resolution Earth observations. Nevertheless, monitoring the CAP at large scales constitutes a big data problem and puts a strain on CAP paying agencies that need to adapt fast in terms of infrastructure and know-how. Hence, there is a need for efficient and easy-to-use tools for the acquisition, storage, processing and exploitation of big satellite data. In this work, we present the Agriculture monitoring Data Cube (ADC), which is an automated, modular, end-to-end framework for discovering, pre-processing and indexing optical and Synthetic Aperture Radar (SAR) images into a multidimensional cube. We also offer a set of powerful tools on top of the ADC, including i) the generation of analysis-ready feature spaces of big satellite data to feed downstream machine learning tasks and ii) the support of Satellite Image Time-Series (SITS) analysis via services pertinent to the monitoring of the CAP (e.g., detecting trends and events, monitoring the growth status etc.). The knowledge extracted from the SITS analyses and the machine learning tasks returns to the data cube, building scalable country-specific knowledge bases that can efficiently answer complex and multi-faceted geospatial queries.

摘要:共同农业政策(CAP)的现代化需要对农业用地进行大规模、高频率的监测。为此,自由开放的卫星数据(即Sentinel任务)已被广泛用作所需高空间和时间分辨率地球观测的来源。然而,大规模监测CAP构成了一个大数据问题,给需要在基础设施和专业技术上快速适应的CAP支付机构带来压力。因此,需要高效且易用的工具来获取、存储、处理和利用大型卫星数据。在这项工作中,我们提出农业监测数据立方体(ADC),这是一个自动化、模块化、端到端的框架,用于发现、预处理光学与合成孔径雷达(SAR)图像,并将其索引到一个多维立方体中。我们还在ADC之上提供了一组强大的工具,包括:i)生成大型卫星数据的分析就绪特征空间,以供给下游机器学习任务;ii)通过与CAP监测相关的服务(例如检测趋势和事件、监测生长状态等)支持卫星图像时间序列(SITS)分析。从SITS分析和机器学习任务中提取的知识回流到数据立方体,构建可扩展的、针对具体国家的知识库,从而高效回答复杂且多面的地理空间查询。

CV-4-标题 Pest presence prediction using interpretable machine learning

链接: https://arxiv.org/abs/2205.07723
作者: Ornela Nanushi, Vasileios Sitokonstantinou, Ilias Tsoumas, Charalampos Kontoes
备注: This work has been accepted for publication in IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP 2022)

点击查看摘要

Abstract: Helicoverpa Armigera, or cotton bollworm, is a serious insect pest of cotton crops that threatens the yield and the quality of lint. The timely knowledge of the presence of the insects in the field is crucial for effective farm interventions. Meteo-climatic and vegetation conditions have been identified as key drivers of crop pest abundance. In this work, we applied an interpretable classifier, i.e., Explainable Boosting Machine, which uses earth observation vegetation indices, numerical weather predictions and insect trap catches to predict the onset of bollworm harmfulness in cotton fields in Greece. The glass-box nature of our approach provides significant insight on the main drivers of the model and the interactions among them. Model interpretability adds to the trustworthiness of our approach and therefore its potential for rapid uptake and context-based implementation in operational farm management scenarios. Our results are satisfactory and the importance of drivers, through our analysis on global and local explainability, is in accordance with the literature.

摘要:棉铃虫(Helicoverpa armigera)是棉花作物的一种严重害虫,威胁棉绒的产量和质量。及时掌握田间虫情对于有效的农场干预至关重要。气象气候与植被条件已被确认为作物害虫数量的关键驱动因素。在这项工作中,我们应用了一个可解释的分类器,即可解释提升机(Explainable Boosting Machine),利用地球观测植被指数、数值天气预报和诱虫器捕获量,来预测希腊棉田中棉铃虫危害的发生。我们方法的"玻璃盒"特性为模型的主要驱动因素及其相互作用提供了重要洞见。模型的可解释性提升了方法的可信度,因此有潜力在实际农场管理场景中被快速采纳并因地制宜地部署。我们的结果令人满意,并且通过对全局与局部可解释性的分析,各驱动因素的重要性与文献相符。

CV-5-标题 Towards Space-to-Ground Data Availability for Agriculture Monitoring

链接: https://arxiv.org/abs/2205.07721
作者: George Choumos, Alkiviadis Koukos, Vasileios Sitokonstantinou, Charalampos Kontoes
备注: Has been accepted for publication in IEEE IVMSP 2022: this https URL Specifically in the special session “Multimodal Analysis, Fusion and Retrieval of satellite images”: this https URL

点击查看摘要

Abstract: The recent advances in machine learning and the availability of free and open big Earth data (e.g., Sentinel missions), which cover large areas with high spatial and temporal resolution, have enabled many agriculture monitoring applications. One example is the control of subsidy allocations of the Common Agricultural Policy (CAP). Advanced remote sensing systems have been developed towards the large-scale evidence-based monitoring of the CAP. Nevertheless, the spatial resolution of satellite images is not always adequate to make accurate decisions for all fields. In this work, we introduce the notion of space-to-ground data availability, i.e., from the satellite to the field, in an attempt to make the best out of the complementary characteristics of the different sources. We present a space-to-ground dataset that contains Sentinel-1 radar and Sentinel-2 optical image time-series, as well as street-level images from the crowdsourcing platform Mapillary, for grassland fields in the area of Utrecht for 2017. The multifaceted utility of our dataset is showcased through the downstream task of grassland classification. We train machine and deep learning algorithms on these different data domains and highlight the potential of fusion techniques towards increasing the reliability of decisions.

摘要:机器学习的最新进展,以及覆盖大面积、具有高空间和时间分辨率的自由开放大地球数据(例如Sentinel任务)的可用性,已催生了许多农业监测应用。一个例子是共同农业政策(CAP)补贴分配的核查。先进的遥感系统已被开发用于对CAP的大规模循证监测。然而,卫星图像的空间分辨率并不总是足以对所有地块做出准确决策。在这项工作中,我们引入了从卫星到田间的"天地一体"数据可用性概念,以充分利用不同数据源的互补特性。我们提供了一个天地一体数据集,其中包含Sentinel-1雷达与Sentinel-2光学图像时间序列,以及来自众包平台Mapillary的街景图像,覆盖2017年Utrecht地区的草地地块。我们通过草地分类这一下游任务展示了该数据集的多方面用途:在这些不同的数据域上训练机器学习和深度学习算法,并强调融合技术在提高决策可靠性方面的潜力。

CV-6-标题 Exploring Diversity-based Active Learning for 3D Object Detection in Autonomous Driving

链接: https://arxiv.org/abs/2205.07708
作者: Zhihao Liang, Xun Xu, Shengheng Deng, Lile Cai, Tao Jiang, Kui Jia
备注:

点击查看摘要

Abstract: 3D object detection has recently received much attention due to its great potential in autonomous vehicle (AV). The success of deep learning based object detectors relies on the availability of large-scale annotated datasets, which is time-consuming and expensive to compile, especially for 3D bounding box annotation. In this work, we investigate diversity-based active learning (AL) as a potential solution to alleviate the annotation burden. Given limited annotation budget, only the most informative frames and objects are automatically selected for human to annotate. Technically, we take the advantage of the multimodal information provided in an AV dataset, and propose a novel acquisition function that enforces spatial and temporal diversity in the selected samples. We benchmark the proposed method against other AL strategies under realistic annotation cost measurement, where the realistic costs for annotating a frame and a 3D bounding box are both taken into consideration. We demonstrate the effectiveness of the proposed method on the nuScenes dataset and show that it outperforms existing AL strategies significantly.

摘要:3D目标检测因其在自动驾驶汽车(AV)中的巨大潜力而备受关注。基于深度学习的目标检测器的成功依赖于大规模标注数据集的可用性,而此类数据集的构建耗时且昂贵,对3D边界框标注尤其如此。在这项工作中,我们研究基于多样性的主动学习(AL),作为减轻标注负担的潜在解决方案:在有限的标注预算下,只自动挑选信息量最大的帧和物体交由人工标注。技术上,我们利用AV数据集中提供的多模态信息,提出了一种新颖的采集函数,在所选样本中同时强制空间和时间多样性。我们在贴近现实的标注成本度量下将所提方法与其他AL策略进行基准比较,其中同时考虑了标注一帧和标注一个3D边界框的现实成本。我们在nuScenes数据集上证明了所提方法的有效性,并表明它显著优于现有AL策略。
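基于多样性的采集函数常以贪心k-center(最远点优先)选择为原型:反复挑选距离已选集合最远的样本。下面是一个简化示意(论文中的采集函数还融合了多模态信息并同时考虑时间多样性,此处未体现):

```python
import numpy as np

def kcenter_greedy(features, budget, seed_idx=0):
    """Greedy k-center selection: repeatedly pick the pool sample that is
    farthest from the current selection, enforcing diversity."""
    selected = [seed_idx]
    # min_dist[i] = distance from sample i to its nearest selected sample
    min_dist = np.linalg.norm(features - features[seed_idx], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(min_dist))          # most distant = most diverse
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected

rng = np.random.default_rng(1)
pool = rng.normal(size=(100, 8))   # stand-in for frame/object embeddings
picks = kcenter_greedy(pool, budget=10)
```

被选中样本到自身的距离降为0,因此贪心过程不会重复选择同一样本。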

CV-7-标题 Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

链接: https://arxiv.org/abs/2205.07690
作者: Nicolò Ghielmetti, Vladimir Loncar, Maurizio Pierini, Marcel Roed, Sioni Summers, Thea Aarrestad, Christoffer Petersson, Hampus Linander, Jennifer Ngadiuba, Kelvin Lin, Philip Harris
备注: 11 pages, 6 tables, 5 figures

点击查看摘要

Abstract: In this paper, we investigate how field programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant for autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources on a Xilinx ZCU102 evaluation board. The latency is reduced to 3 ms per image when increasing the batch size to ten, corresponding to the use case where the autonomous vehicle receives inputs from multiple cameras simultaneously. We show, through aggressive filter reduction and heterogeneous quantization-aware training, and an optimized implementation of convolutional layers, that the power consumption and resource utilization can be significantly reduced while maintaining accuracy on the Cityscapes dataset.

摘要:在本文中,我们研究现场可编程门阵列(FPGA)如何作为与自动驾驶相关的实时语义分割任务的硬件加速器。针对ENet卷积神经网络架构的压缩版本,我们展示了完全片上部署,每幅图像的延迟为4.9 ms,所用资源不足Xilinx ZCU102评估板可用资源的30%。当批大小增加到10(对应自动驾驶汽车同时接收多路摄像头输入的用例)时,延迟降低到每幅图像3 ms。我们表明,通过激进的滤波器裁剪、异构的量化感知训练以及卷积层的优化实现,可以在保持Cityscapes数据集上精度的同时,显著降低功耗和资源占用。
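面向FPGA部署压缩网络时,一个核心操作是把权重量化为低位宽定点数。下面是一个对称定点量化的原理性示意(位宽划分与取整方式均为示例,并非hls4ml的实际API):

```python
import numpy as np

def quantize_fixed_point(w, total_bits=8, int_bits=1):
    """Round weights onto a symmetric fixed-point grid with `total_bits`
    bits, of which `int_bits` are integer bits (illustrative sketch)."""
    frac_bits = total_bits - int_bits
    scale = 2.0 ** frac_bits
    max_val = 2.0 ** int_bits - 1.0 / scale   # largest representable value
    wq = np.round(w * scale) / scale          # snap to the 1/scale grid
    return np.clip(wq, -2.0 ** int_bits, max_val)

weights = np.array([0.337, -1.91, 0.0042, 1.2])
qweights = quantize_fixed_point(weights, total_bits=8, int_bits=1)
```

量化后每个权重都是 1/128 的整数倍,未被裁剪的权重误差不超过半个量化步长;实际的量化感知训练会在前向传播中引入同样的舍入,使网络学会补偿这一误差。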

CV-8-标题 CONSENT Context Sensitive Transformer for Bold Words Classification

链接: https://arxiv.org/abs/2205.07683
作者: Ionut-Catalin Sandu, Daniel Voinea, Alin-Ionut Popa
备注:

点击查看摘要

Abstract: We present CONSENT, a simple yet effective CONtext SENsitive Transformer framework for context-dependent object classification within a fully-trainable end-to-end deep learning pipeline. We exemplify the proposed framework on the task of bold words detection proving state-of-the-art results. Given an image containing text of unknown font-types (e.g. Arial, Calibri, Helvetica), unknown language, taken under various degrees of illumination, angle distortion and scale variation, we extract all the words and learn a context-dependent binary classification (i.e. bold versus non-bold) using an end-to-end transformer-based neural network ensemble. To prove the extensibility of our framework, we demonstrate competitive results against state-of-the-art for the game of rock-paper-scissors by training the model to determine the winner given a sequence with 2 pictures depicting hand poses.

摘要:我们提出CONSENT,一个简单而有效的上下文敏感Transformer(CONtext SENsitive Transformer)框架,用于在完全可训练的端到端深度学习管道中进行依赖上下文的对象分类。我们以粗体单词检测任务为例展示该框架,并取得了最先进的结果。给定一幅包含未知字体(例如Arial、Calibri、Helvetica)、未知语言文本、在不同程度的光照、角度畸变和尺度变化下拍摄的图像,我们提取所有单词,并使用基于Transformer的端到端神经网络集成,学习一个依赖上下文的二分类(即粗体与非粗体)。为了证明框架的可扩展性,我们通过训练模型在给定两张手势图片的序列中判定"石头剪刀布"游戏的胜者,取得了可与最先进方法竞争的结果。

CV-9-标题 VQBB Image-to-image Translation with Vector Quantized Brownian Bridge

链接: https://arxiv.org/abs/2205.07680
作者: Bo Li, Kaitao Xue, Bin Liu, Yu-Kun Lai
备注: 5 pages, 5 figures

点击查看摘要

Abstract: Image-to-image translation is an important and challenging problem in computer vision. Existing approaches like Pixel2Pixel, DualGAN suffer from the instability of GAN and fail to generate diverse outputs because they model the task as a one-to-one mapping. Although diffusion models can generate images with high quality and diversity, current conditional diffusion models still can not maintain high similarity with the condition image on image-to-image translation tasks due to the Gaussian noise added in the reverse process. To address these issues, a novel Vector Quantized Brownian Bridge(VQBB) diffusion model is proposed in this paper. On one hand, Brownian Bridge diffusion process can model the transformation between two domains more accurate and flexible than the existing Markov diffusion methods. As far as the authors know, it is the first work for Brownian Bridge diffusion process proposed for image-to-image translation. On the other hand, the proposed method improved the learning efficiency and translation accuracy by confining the diffusion process in the quantized latent space. Finally, numerical experimental results validated the performance of the proposed method.

摘要:图像到图像翻译是计算机视觉中重要且具有挑战性的问题。Pixel2Pixel、DualGAN等现有方法受GAN不稳定性的困扰,且由于将任务建模为一对一映射而无法生成多样化的输出。尽管扩散模型可以生成高质量且多样的图像,但由于反向过程中加入的高斯噪声,目前的条件扩散模型在图像到图像翻译任务上仍无法与条件图像保持高度相似。为了解决这些问题,本文提出了一种新颖的向量量化布朗桥(VQBB)扩散模型。一方面,布朗桥扩散过程能够比现有的马尔可夫扩散方法更准确、更灵活地建模两个域之间的变换;据作者所知,这是首个为图像到图像翻译提出布朗桥扩散过程的工作。另一方面,所提方法通过将扩散过程限制在量化的潜在空间中,提高了学习效率和翻译精度。最后,数值实验结果验证了所提方法的性能。
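布朗桥过程在 t=0 与 t=1 处分别被钉在两个端点上:均值线性插值,方差 t(1-t) 在两端消失,因此不会像普通高斯前向过程那样在终点残留噪声。下面是一个标量方差的采样示意(其中源域与目标域潜变量均为假设的占位数据,并非论文的完整模型):

```python
import numpy as np

def brownian_bridge_sample(x0, xT, t, rng):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and xT (t=1):
    mean interpolates linearly, variance t*(1-t) vanishes at both ends."""
    mean = (1.0 - t) * x0 + t * xT
    std = np.sqrt(t * (1.0 - t))
    return mean + std * rng.normal(size=np.shape(x0))

rng = np.random.default_rng(0)
x0 = np.zeros(4)           # stand-in for a source-domain latent
xT = np.ones(4) * 3.0      # stand-in for a target-domain latent
start = brownian_bridge_sample(x0, xT, 0.0, rng)   # exactly x0
mid = brownian_bridge_sample(x0, xT, 0.5, rng)     # noisy interpolation
end = brownian_bridge_sample(x0, xT, 1.0, rng)     # exactly xT
```

由于两端方差恰好为零,采样结果在 t=0 与 t=1 处与端点完全一致,这正是该过程适合条件翻译任务的原因。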

CV-10-标题 PUCK Parallel Surface and Convolution-kernel Tracking for Event-Based Cameras

链接: https://arxiv.org/abs/2205.07657
作者: Luna Gava, Marco Monforte, Massimiliano Iacono, Chiara Bartolozzi, Arren Glover
备注: submitted to IROS 2022

点击查看摘要

Abstract: Low latency and accuracy are fundamental requirements when vision is integrated in robots for high-speed interaction with targets, since they affect system reliability and stability. In such a scenario, the choice of the sensor and algorithms is important for the entire control loop. The technology of event-cameras can guarantee fast visual sensing in dynamic environments, but requires a tracking algorithm that can keep up with the high data rate induced by the robot ego-motion while maintaining accuracy and robustness to distractors. In this paper, we introduce a novel tracking method that leverages the Exponential Reduced Ordinal Surface (EROS) data representation to decouple event-by-event processing and tracking computation. The latter is performed using convolution kernels to detect and follow a circular target moving on a plane. To benchmark state-of-the-art event-based tracking, we propose the task of tracking the air hockey puck sliding on a surface, with the future aim of controlling the iCub robot to reach the target precisely and on time. Experimental results demonstrate that our algorithm achieves the best compromise between low latency and tracking accuracy both when the robot is still and when moving.

摘要:当视觉被集成到机器人中以实现与目标的高速交互时,低延迟和高精度是基本要求,因为它们影响系统的可靠性和稳定性。在这种场景下,传感器和算法的选择对整个控制回路都很重要。事件相机技术能够保证动态环境中的快速视觉感知,但需要一种既能跟上机器人自运动引起的高数据率、又能对干扰物保持精度和鲁棒性的跟踪算法。在本文中,我们介绍了一种新颖的跟踪方法,它利用指数衰减有序表面(EROS)数据表示,将逐事件处理与跟踪计算解耦;后者使用卷积核来检测并跟随在平面上移动的圆形目标。为了对最先进的基于事件的跟踪进行基准测试,我们提出跟踪在桌面上滑动的气垫球的任务,未来目标是控制iCub机器人精确且准时地到达目标。实验结果表明,无论机器人静止还是运动,我们的算法都在低延迟与跟踪精度之间达到了最佳折中。

CV-11-标题 Scalable Vehicle Re-Identification via Self-Supervision

链接: https://arxiv.org/abs/2205.07613
作者: Pirazh Khorramshahi, Vineet Shenoy, Rama Chellappa
备注:

点击查看摘要

Abstract: As Computer Vision technologies become more mature for intelligent transportation applications, it is time to ask how efficient and scalable they are for large-scale and real-time deployment. Among these technologies is Vehicle Re-Identification which is one of the key elements in city-scale vehicle analytics systems. Many state-of-the-art solutions for vehicle re-id mostly focus on improving the accuracy on existing re-id benchmarks and often ignore computational complexity. To balance the demands of accuracy and computational efficiency, in this work we propose a simple yet effective hybrid solution empowered by self-supervised training which only uses a single network during inference time and is free of intricate and computation-demanding add-on modules often seen in state-of-the-art approaches. Through extensive experiments, we show our approach, termed Self-Supervised and Boosted VEhicle Re-Identification (SSBVER), is on par with state-of-the-art alternatives in terms of accuracy without introducing any additional overhead during deployment. Additionally we show that our approach, generalizes to different backbone architectures which facilitates various resource constraints and consistently results in a significant accuracy boost.

摘要:随着计算机视觉技术在智能交通应用中日趋成熟,是时候审视它们在大规模实时部署中的效率与可扩展性了。这些技术中包括车辆重识别,它是城市级车辆分析系统的关键要素之一。许多最先进的车辆重识别解决方案主要关注在现有重识别基准上提高精度,而往往忽视计算复杂度。为了平衡精度与计算效率的需求,在这项工作中,我们提出了一种由自监督训练赋能的简单而有效的混合解决方案,它在推理期间仅使用单个网络,且不含最先进方法中常见的复杂、计算量大的附加模块。通过大量实验,我们表明所提方法(称为自监督增强车辆重识别,SSBVER)在精度上与最先进的替代方案相当,同时不在部署过程中引入任何额外开销。此外,我们还表明该方法可以泛化到不同的骨干架构,从而适应各种资源约束,并始终带来显著的精度提升。

CV-12-标题 Noise-Tolerant Learning for Audio-Visual Action Recognition

链接: https://arxiv.org/abs/2205.07611
作者: Haochen Han, Qinghua Zheng, Minnan Luo, Kaiyao Miao, Feng Tian, Yan Chen
备注:

点击查看摘要

Abstract: Recently, video recognition is emerging with the help of multi-modal learning, which focuses on integrating multiple modalities to improve the performance or robustness of a model. Although various multi-modal learning methods have been proposed and offer remarkable recognition results, almost all of these methods rely on high-quality manual annotations and assume that modalities among multi-modal data provide relevant semantic information. Unfortunately, most widely used video datasets are collected from the Internet and inevitably contain noisy labels and noisy correspondence. To solve this problem, we use the audio-visual action recognition task as a proxy and propose a noise-tolerant learning framework to find anti-interference model parameters to both noisy labels and noisy correspondence. Our method consists of two phases and aims to rectify noise by the inherent correlation between modalities. A noise-tolerant contrastive training phase is performed first to learn robust model parameters unaffected by the noisy labels. To reduce the influence of noisy correspondence, we propose a cross-modal noise estimation component to adjust the consistency between different modalities. Since the noisy correspondence existed at the instance level, a category-level contrastive loss is proposed to further alleviate the interference of noisy correspondence. Then in the hybrid supervised training phase, we calculate the distance metric among features to obtain corrected labels, which are used as complementary supervision. In addition, we investigate the noisy correspondence in real-world datasets and conduct comprehensive experiments with synthetic and real noise data. The results verify the advantageous performance of our method compared to state-of-the-art methods.

摘要:近来,视频识别在多模态学习的帮助下兴起,其重点是整合多种模态以提高模型的性能或鲁棒性。尽管已经提出了各种多模态学习方法并取得了显著的识别结果,但这些方法几乎都依赖高质量的人工标注,并假设多模态数据的各模态之间提供相关的语义信息。不幸的是,大多数广泛使用的视频数据集采集自互联网,不可避免地包含噪声标签和噪声对应关系。为了解决这个问题,我们以视听动作识别任务为代理,提出了一个噪声容忍的学习框架,以找到对噪声标签和噪声对应关系都抗干扰的模型参数。我们的方法由两个阶段组成,旨在利用模态间的内在相关性来纠正噪声。首先进行噪声容忍的对比训练阶段,以学习不受噪声标签影响的鲁棒模型参数。为了减少噪声对应关系的影响,我们提出了一个跨模态噪声估计组件来调整不同模态之间的一致性;由于噪声对应关系存在于实例级别,我们进一步提出类别级的对比损失,以缓解噪声对应关系的干扰。随后,在混合监督训练阶段,我们计算特征间的距离度量以获得校正后的标签,并将其用作补充监督。此外,我们还考察了真实世界数据集中的噪声对应关系,并用合成与真实噪声数据进行了全面实验。结果验证了我们的方法相对最先进方法的优势性能。

CV-13-标题 An automatic pipeline for atlas-based fetal and neonatal brain segmentation and analysis

链接: https://arxiv.org/abs/2205.07575
作者: Urru, Andrea, Nakaki, Ayako, Benkarim, Oualid, Crovetto, Francesca, Segales, Laura, Comte, Valentin, Hahner, Nadine, Eixarch, Elisenda, Gratacós, Eduard, Crispi, Fàtima, Piella, Gemma, González Ballester, Miguel A
备注:

点击查看摘要

Abstract: The automatic segmentation of perinatal brain structures in magnetic resonance imaging (MRI) is of utmost importance for the study of brain growth and related complications. While different methods exist for adult and pediatric MRI data, there is a lack for automatic tools for the analysis of perinatal imaging. In this work, a new pipeline for fetal and neonatal segmentation has been developed. We also report the creation of two new fetal atlases, and their use within the pipeline for atlas-based segmentation, based on novel registration methods. The pipeline is also able to extract cortical and pial surfaces and compute features, such as curvature, thickness, sulcal depth, and local gyrification index. Results show that the introduction of the new templates together with our segmentation strategy leads to accurate results when compared to expert annotations, as well as better performances when compared to a reference pipeline (developing Human Connectome Project (dHCP)), for both early and late-onset fetal brains.

摘要:磁共振成像(MRI)中围产期脑结构的自动分割对于研究大脑发育及相关并发症至关重要。虽然针对成人和儿童MRI数据已有不同的方法,但仍缺乏用于围产期影像分析的自动化工具。在这项工作中,我们开发了一条新的胎儿与新生儿分割管道。我们还报告了两个新胎儿图谱的创建,以及它们基于新颖配准方法在管道中用于基于图谱的分割。该管道还能够提取皮层与软脑膜表面,并计算曲率、厚度、脑沟深度和局部脑回指数等特征。结果表明,对于早发型和晚发型胎儿大脑,引入新模板并结合我们的分割策略,相比专家标注能得到准确的结果,并且相比参考管道(developing Human Connectome Project,dHCP)表现更佳。

CV-14-标题 An Effective Transformer-based Solution for RSNA Intracranial Hemorrhage Detection Competition

链接: https://arxiv.org/abs/2205.07556
作者: Fangxin Shang, Siqi Wang, Yehui Yang
备注:

点击查看摘要

Abstract: We present an effective method for Intracranial Hemorrhage Detection (IHD) which exceeds the performance of the winner solution in RSNA-IHD competition (2019). Meanwhile, our model only takes quarter parameters and ten percent FLOPs compared to the winner’s solution. The IHD task needs to predict the hemorrhage category of each slice for the input brain CT. We review the top-5 solutions for the IHD competition held by the Radiological Society of North America(RSNA) in 2019. Nearly all the top solutions rely on 2D convolutional networks and sequential models (Bidirectional GRU or LSTM) to extract intra-slice and inter-slice features, respectively. All the top solutions enhance the performance by leveraging the model ensemble, and the model number varies from 7 to 31. In the past years, since much progress has been made in the computer vision regime especially Transformer-based models, we introduce the Transformer-based techniques to extract the features in both intra-slice and inter-slice views for IHD tasks. Additionally, a semi-supervised method is embedded into our workflow to further improve the performance. The code is available athttps://aistudio.this http URL.

摘要:我们提出了一种有效的颅内出血检测(IHD)方法,其性能超过了RSNA-IHD竞赛(2019)冠军方案,同时我们的模型仅使用冠军方案四分之一的参数量和10%的FLOPs。IHD任务需要对输入的脑部CT预测每个切片的出血类别。我们回顾了北美放射学会(RSNA)2019年举办的IHD竞赛的前5名方案:几乎所有顶级方案都依赖2D卷积网络和序列模型(双向GRU或LSTM)分别提取切片内与切片间特征,并通过模型集成提升性能,集成的模型数量从7到31不等。鉴于近年来计算机视觉领域(尤其是基于Transformer的模型)取得了很大进展,我们引入基于Transformer的技术,为IHD任务提取切片内与切片间两个视角的特征。此外,我们在工作流程中嵌入了一种半监督方法以进一步提升性能。代码可在 https://aistudio.this http URL 获取。

CV-15-标题 A Neuro-Symbolic ASP Pipeline for Visual Question Answering

链接: https://arxiv.org/abs/2205.07548
作者: Thomas Eiter, Nelson Higuera, Johannes Oetsch, Michael Pritz
备注: Paper presented at the 38th International Conference on Logic Programming (ICLP 2022), 15 pages

点击查看摘要

Abstract: We present a neuro-symbolic visual question answering (VQA) pipeline for CLEVR, which is a well-known dataset that consists of pictures showing scenes with objects and questions related to them. Our pipeline covers (i) training neural networks for object classification and bounding-box prediction of the CLEVR scenes, (ii) statistical analysis on the distribution of prediction values of the neural networks to determine a threshold for high-confidence predictions, and (iii) a translation of CLEVR questions and network predictions that pass confidence thresholds into logic programs so that we can compute the answers using an ASP solver. By exploiting choice rules, we consider deterministic and non-deterministic scene encodings. Our experiments show that the non-deterministic scene encoding achieves good results even if the neural networks are trained rather poorly in comparison with the deterministic approach. This is important for building robust VQA systems if network predictions are less-than perfect. Furthermore, we show that restricting non-determinism to reasonable choices allows for more efficient implementations in comparison with related neuro-symbolic approaches without loosing much accuracy. This work is under consideration for acceptance in TPLP.

摘要:我们为CLEVR提出了一个神经符号视觉问答(VQA)管道。CLEVR是一个著名数据集,由展示物体场景的图片及与之相关的问题组成。我们的管道涵盖:(i)训练神经网络,对CLEVR场景进行物体分类和边界框预测;(ii)对神经网络预测值的分布进行统计分析,以确定高置信度预测的阈值;(iii)将CLEVR问题和通过置信度阈值的网络预测翻译为逻辑程序,从而使用ASP求解器计算答案。通过利用选择规则,我们考虑了确定性和非确定性两种场景编码。实验表明,即使神经网络的训练质量较差,非确定性场景编码相比确定性方法也能取得良好结果。当网络预测并不完美时,这对构建鲁棒的VQA系统十分重要。此外,我们表明,将非确定性限制在合理的选择范围内,可以在不损失太多精度的情况下,得到比相关神经符号方法更高效的实现。这项工作正在接受TPLP的审稿。

CV-16-标题 SQ-VAE Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

链接: https://arxiv.org/abs/2205.07547
作者: Yuhta Takida, Takashi Shibuya, WeiHsiang Liao, Chieh-Hsin Lai, Junki Ohmura, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi, Toshiyuki Kumakura, Yuki Mitsufuji
备注: 25 pages with 10 figures, accepted for publication in ICML 2022

点击查看摘要

Abstract: One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, also known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some carefully designed heuristics, underlies this issue. In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called stochastically quantized variational autoencoder (SQ-VAE). In SQ-VAE, we observe a trend that the quantization is stochastic at the initial stage of the training but gradually converges toward a deterministic quantization, which we call self-annealing. Our experiments show that SQ-VAE improves codebook utilization without using common heuristics. Furthermore, we empirically show that SQ-VAE is superior to VAE and VQ-VAE in vision- and speech-related tasks.

摘要:向量量化变分自编码器(VQ-VAE)的一个已知问题是,学到的离散表示只使用了码本全部容量的一小部分,即所谓码本坍缩。我们假设,涉及若干精心设计的启发式技巧的VQ-VAE训练方案是该问题的根源。在本文中,我们提出一种新的训练方案,通过新颖的随机反量化与随机量化来扩展标准VAE,称为随机量化变分自编码器(SQ-VAE)。在SQ-VAE中,我们观察到一种趋势:量化在训练初期是随机的,但逐渐收敛于确定性量化,我们称之为自退火(self-annealing)。我们的实验表明,SQ-VAE在不使用常见启发式技巧的情况下改善了码本利用率。此外,经验结果表明,在视觉和语音相关任务中,SQ-VAE优于VAE和VQ-VAE。
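随机量化可以理解为:按与各码字距离相关的概率采样码本条目;当温度趋近于0时,它收敛为确定性的最近邻量化,对应论文描述的自退火趋势。下面是一个原理性示意(温度的参数化方式为假设,并非论文的精确形式):

```python
import numpy as np

def stochastic_quantize(z, codebook, temperature, rng):
    """Sample a codebook entry with probability proportional to
    exp(-||z - c_k||^2 / temperature); temperature -> 0 recovers
    deterministic nearest-neighbour quantization."""
    d2 = ((codebook - z) ** 2).sum(axis=1)   # squared distance to each code
    logits = -d2 / temperature
    probs = np.exp(logits - logits.max())    # stable softmax
    probs /= probs.sum()
    k = rng.choice(len(codebook), p=probs)
    return k, codebook[k]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
z = np.array([0.9, 1.1])
rng = np.random.default_rng(0)
k_hot, _ = stochastic_quantize(z, codebook, temperature=5.0, rng=rng)    # stochastic
k_cold, _ = stochastic_quantize(z, codebook, temperature=1e-3, rng=rng)  # ~deterministic
```

高温下多个码字都可能被采样到,有助于缓解码本坍缩;低温下采样几乎必然落在最近的码字上。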

CV-17-标题 Residual Local Feature Network for Efficient Super-Resolution

链接: https://arxiv.org/abs/2205.07514
作者: Fangyuan Kong, Mingxi Li, Songwei Liu, Ding Liu, Jingwen He, Yang Bai, Fangmin Chen, Lean Fu
备注:

点击查看摘要

Abstract: Deep learning based approaches has achieved great performance in single image super-resolution (SISR). However, recent advances in efficient super-resolution focus on reducing the number of parameters and FLOPs, and they aggregate more powerful features by improving feature utilization through complex layer connection strategies. These structures may not be necessary to achieve higher running speed, which makes them difficult to be deployed to resource-constrained devices. In this work, we propose a novel Residual Local Feature Network (RLFN). The main idea is using three convolutional layers for residual local feature learning to simplify feature aggregation, which achieves a good trade-off between model performance and inference time. Moreover, we revisit the popular contrastive loss and observe that the selection of intermediate features of its feature extractor has great influence on the performance. Besides, we propose a novel multi-stage warm-start training strategy. In each stage, the pre-trained weights from previous stages are utilized to improve the model performance. Combined with the improved contrastive loss and training strategy, the proposed RLFN outperforms all the state-of-the-art efficient image SR models in terms of runtime while maintaining both PSNR and SSIM for SR. In addition, we won the first place in the runtime track of the NTIRE 2022 efficient super-resolution challenge. Code will be available at this https URL.

摘要:基于深度学习的方法在单图像超分辨率(SISR)中取得了出色的性能。然而,高效超分辨率的最新进展专注于减少参数量和FLOPs,并通过复杂的层间连接策略改善特征利用来聚合更强大的特征。这些结构对于达到更高的运行速度未必是必要的,并使模型难以部署到资源受限的设备上。在这项工作中,我们提出了一个新颖的残差局部特征网络(RLFN)。其主要思想是使用三个卷积层进行残差局部特征学习,以简化特征聚合,从而在模型性能和推理时间之间实现良好的权衡。此外,我们重新审视了流行的对比损失,观察到其特征提取器中间特征的选择对性能有很大影响。我们还提出了一种新颖的多阶段热启动训练策略:在每个阶段,利用前一阶段的预训练权重来改善模型性能。结合改进的对比损失和训练策略,所提出的RLFN在运行时间上优于所有最先进的高效图像SR模型,同时保持SR的PSNR和SSIM。此外,我们赢得了NTIRE 2022高效超分辨率挑战赛运行时赛道的第一名。代码将在此HTTPS URL上提供。

CV-18-标题 Topologically Persistent Features-based Object Recognition in Cluttered Indoor Environments

链接: https://arxiv.org/abs/2205.07479
作者: Ekta U. Samani, Ashis G. Banerjee
备注: Accepted for presentation in the IEEE International Conference on Robotics and Automation (ICRA) 2022 Workshop on Robotic Perception and Mapping: Emerging Techniques

点击查看摘要

Abstract: Recognition of occluded objects in unseen indoor environments is a challenging problem for mobile robots. This work proposes a new slicing-based topological descriptor that captures the 3D shape of object point clouds to address this challenge. It yields similarities between the descriptors of the occluded and the corresponding unoccluded objects, enabling object unity-based recognition using a library of trained models. The descriptor is obtained by partitioning an object’s point cloud into multiple 2D slices and constructing filtrations (nested sequences of simplicial complexes) on the slices to mimic further slicing of the slices, thereby capturing detailed shapes through persistent homology-generated features. We use nine different sequences of cluttered scenes from a benchmark dataset for performance evaluation. Our method outperforms two state-of-the-art deep learning-based point cloud classification methods, namely, DGCNN and SimpleView.

摘要:在未见过的室内环境中识别被遮挡物体对移动机器人来说是一个具有挑战性的问题。这项工作提出了一种新的基于切片的拓扑描述符,它捕获物体点云的3D形状以应对这一挑战。该描述符使被遮挡物体与对应的未被遮挡物体的描述符相似,从而能够利用训练好的模型库进行基于物体同一性的识别。描述符的构建方式是将物体的点云划分为多个2D切片,并在切片上构造过滤(filtration,即嵌套的单纯复形序列)以模拟对切片的进一步切分,从而通过持续同调生成的特征捕获细节形状。我们使用基准数据集中九个不同的杂乱场景序列进行性能评估。我们的方法优于两种最先进的基于深度学习的点云分类方法,即DGCNN和SimpleView。

CV-19-标题 Manifold Characteristics That Predict Downstream Task Performance

链接: https://arxiv.org/abs/2205.07477
作者: Ruan van der Merwe, Gregory Newman, Etienne Barnard
备注: Currently under review

点击查看摘要

Abstract: Pretraining methods are typically compared by evaluating the accuracy of linear classifiers, transfer learning performance, or visually inspecting the representation manifold’s (RM) lower-dimensional projections. We show that the differences between methods can be understood more clearly by investigating the RM directly, which allows for a more detailed comparison. To this end, we propose a framework and new metric to measure and compare different RMs. We also investigate and report on the RM characteristics for various pretraining methods. These characteristics are measured by applying sequentially larger local alterations to the input data, using white noise injections and Projected Gradient Descent (PGD) adversarial attacks, and then tracking each datapoint. We calculate the total distance moved for each datapoint and the relative change in distance between successive alterations. We show that self-supervised methods learn an RM where alterations lead to large but constant size changes, indicating a smoother RM than fully supervised methods. We then combine these measurements into one metric, the Representation Manifold Quality Metric (RMQM), where larger values indicate larger and less variable step sizes, and show that RMQM correlates positively with performance on downstream tasks.

摘要:预训练方法通常通过评估线性分类器的准确率、迁移学习性能,或目视检查表示流形(RM)的低维投影来比较。我们表明,直接研究RM可以更清楚地理解方法之间的差异,从而进行更细致的比较。为此,我们提出了一个用于度量和比较不同RM的框架和新指标,并研究和报告了各种预训练方法的RM特性。这些特性的测量方式是:对输入数据施加逐步增大的局部改动(使用白噪声注入和投影梯度下降(PGD)对抗攻击),然后跟踪每个数据点。我们计算每个数据点移动的总距离,以及相邻两次改动之间距离的相对变化。我们表明,在自监督方法学到的RM中,改动会导致较大但近乎恒定的变化,表明其RM比全监督方法更平滑。然后,我们将这些测量值组合为一个指标,即表示流形质量度量(RMQM),其中较大的值表示步长更大且变化更小,并表明RMQM与下游任务的性能呈正相关。
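The measurement procedure described in the abstract — apply sequentially larger white-noise alterations, track each datapoint's representation, and record the total distance moved plus the relative change between successive steps — can be sketched as follows. This is a hedged simplification: `manifold_steps` is our name, the PGD alterations are omitted, and the real RMQM combines these statistics in a way the abstract does not fully specify.

```python
import numpy as np

def manifold_steps(encode, x, noise_scales, rng):
    # Track one datapoint's representation under sequentially larger
    # white-noise alterations of the input (PGD attacks omitted here).
    reps = [encode(x + s * rng.standard_normal(x.shape)) for s in noise_scales]
    # Distance moved between successive alterations.
    steps = [np.linalg.norm(b - a) for a, b in zip(reps, reps[1:])]
    total = float(sum(steps))
    # Relative change in distance between successive alterations;
    # low variability here suggests a smoother representation manifold.
    rel = [abs(b - a) / (a + 1e-12) for a, b in zip(steps, steps[1:])]
    return total, steps, rel
```

A smoother manifold would show large but near-constant step sizes, i.e. small values in `rel`; an RMQM-style score would reward exactly that pattern.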

CV-20-标题 Frequency selective extrapolation with residual filtering for image error concealment

链接: https://arxiv.org/abs/2205.07476
作者: Ján Koloda, Jürgen Seiler, André Kaup, Victoria Sánchez, Antonio M. Peinado
备注:

点击查看摘要

Abstract: The purpose of signal extrapolation is to estimate unknown signal parts from known samples. This task is especially important for error concealment in image and video communication. For obtaining a high quality reconstruction, assumptions have to be made about the underlying signal in order to solve this underdetermined problem. Among existent reconstruction algorithms, frequency selective extrapolation (FSE) achieves high performance by assuming that image signals can be sparsely represented in the frequency domain. However, FSE does not take into account the low-pass behaviour of natural images. In this paper, we propose a modified FSE that takes this prior knowledge into account for the modelling, yielding significant PSNR gains.

摘要:信号外推的目的是从已知样本估计未知的信号部分。该任务对于图像和视频通信中的错误隐藏尤为重要。为了获得高质量的重建,必须对底层信号做出假设以求解这一欠定问题。在现有的重建算法中,频率选择性外推(FSE)假设图像信号可以在频域中稀疏表示,从而实现了高性能。然而,FSE没有考虑自然图像的低通特性。在本文中,我们提出了一种改进的FSE,将这一先验知识纳入建模,从而获得显著的PSNR增益。

CV-21-标题 Robust Representation via Dynamic Feature Aggregation

链接: https://arxiv.org/abs/2205.07466
作者: Haozhe Liu, Haoqin Ji, Yuexiang Li, Nanjun He, Haoqian Wu, Feng Liu, Linlin Shen, Yefeng Zheng
备注:

点击查看摘要

Abstract: Deep convolutional neural network (CNN) based models are vulnerable to the adversarial attacks. One of the possible reasons is that the embedding space of CNN based model is sparse, resulting in a large space for the generation of adversarial samples. In this study, we propose a method, denoted as Dynamic Feature Aggregation, to compress the embedding space with a novel regularization. Particularly, the convex combination between two samples are regarded as the pivot for aggregation. In the embedding space, the selected samples are guided to be similar to the representation of the pivot. On the other side, to mitigate the trivial solution of such regularization, the last fully-connected layer of the model is replaced by an orthogonal classifier, in which the embedding codes for different classes are processed orthogonally and separately. With the regularization and orthogonal classifier, a more compact embedding space can be obtained, which accordingly improves the model robustness against adversarial attacks. An averaging accuracy of 56.91% is achieved by our method on CIFAR-10 against various attack methods, which significantly surpasses a solid baseline (Mixup) by a margin of 37.31%. More surprisingly, empirical results show that, the proposed method can also achieve the state-of-the-art performance for out-of-distribution (OOD) detection, due to the learned compact feature space. An F1 score of 0.937 is achieved by the proposed method, when adopting CIFAR-10 as in-distribution (ID) dataset and LSUN as OOD dataset. Code is available at this https URL.

摘要:基于深度卷积神经网络(CNN)的模型容易受到对抗攻击。可能的原因之一是基于CNN的模型的嵌入空间是稀疏的,为对抗样本的生成留下了很大的空间。在这项研究中,我们提出了一种称为动态特征聚合的方法,通过一种新颖的正则化来压缩嵌入空间。具体而言,两个样本之间的凸组合被视为聚合的枢轴;在嵌入空间中,引导所选样本与枢轴的表示相似。另一方面,为了缓解这种正则化的平凡解,模型的最后一个全连接层被正交分类器取代,其中不同类别的嵌入编码被正交且分别处理。借助该正则化和正交分类器,可以获得更紧凑的嵌入空间,从而提高模型对对抗攻击的鲁棒性。在CIFAR-10上,我们的方法针对各种攻击方法取得了56.91%的平均准确率,以37.31%的幅度显著超过了一个强基线(Mixup)。更令人惊讶的是,实验结果表明,由于学到了紧凑的特征空间,所提出的方法在分布外(OOD)检测上也能达到最先进的性能。当采用CIFAR-10作为分布内(ID)数据集、LSUN作为OOD数据集时,所提方法取得了0.937的F1分数。代码可在此HTTPS URL上获取。
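The core regularizer — a convex combination of two samples acts as the pivot, and both embeddings are pulled toward it — is easy to sketch. This is a hedged toy version: the function name and the squared-distance form are our assumptions, and the paper's orthogonal classifier is not shown.

```python
import numpy as np

def pivot_aggregation_loss(za, zb, lam=0.5):
    # The convex combination of two embeddings is the aggregation pivot;
    # guiding both embeddings toward it compacts the embedding space,
    # shrinking the room available for adversarial examples.
    pivot = lam * za + (1.0 - lam) * zb
    return float(np.mean((za - pivot) ** 2) + np.mean((zb - pivot) ** 2))
```

Note the trivial solution the abstract warns about: collapsing all embeddings to one point makes this loss zero, which is why the authors pair it with an orthogonal classifier.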

CV-22-标题 Diffusion Models for Adversarial Purification

链接: https://arxiv.org/abs/2205.07460
作者: Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, Anima Anandkumar
备注: ICML 2022

点击查看摘要

Abstract: Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model. These methods do not make assumptions on the form of attack and the classification model, and thus can defend pre-existing classifiers against unseen threats. However, their performance currently falls behind adversarial training methods. In this work, we propose DiffPure that uses diffusion models for adversarial purification: Given an adversarial example, we first diffuse it with a small amount of noise following a forward diffusion process, and then recover the clean image through a reverse generative process. To evaluate our method against strong adaptive attacks in an efficient and scalable way, we propose to use the adjoint method to compute full gradients of the reverse generative process. Extensive experiments on three image datasets including CIFAR-10, ImageNet and CelebA-HQ with three classifier architectures including ResNet, WideResNet and ViT demonstrate that our method achieves the state-of-the-art results, outperforming current adversarial training and adversarial purification methods, often by a large margin. Project page: this https URL.

摘要:对抗纯化是指使用生成模型去除对抗扰动的一类防御方法。这些方法不对攻击形式和分类模型做出假设,因此可以保护已有的分类器免受未见过的威胁。然而,它们的性能目前落后于对抗训练方法。在这项工作中,我们提出了使用扩散模型进行对抗纯化的DiffPure:给定一个对抗样本,我们首先通过前向扩散过程为其注入少量噪声,然后通过反向生成过程恢复干净图像。为了以高效且可扩展的方式评估我们的方法对强自适应攻击的防御能力,我们建议使用伴随方法来计算反向生成过程的完整梯度。在CIFAR-10、ImageNet和CelebA-HQ三个图像数据集上,结合ResNet、WideResNet和ViT三种分类器架构的大量实验表明,我们的方法取得了最先进的结果,通常以较大优势超越当前的对抗训练和对抗纯化方法。项目页面:此HTTPS URL。
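The diffuse-then-denoise pipeline can be sketched with the standard closed-form forward-diffusion step. This is a minimal sketch under stated assumptions: the trained diffusion model is stood in for by a generic `denoise` callable, `alpha_bar_t` plays the role of the cumulative noise schedule at the chosen timestep, and the adjoint gradient computation is not shown.

```python
import numpy as np

def diffuse(x0, alpha_bar_t, rng):
    # Forward diffusion to an intermediate timestep: blend the (possibly
    # adversarial) input with Gaussian noise, washing out small
    # adversarial perturbations while keeping global structure.
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def purify(x_adv, denoise, alpha_bar_t, rng):
    # The reverse generative process (a trained diffusion model in the
    # paper) recovers a clean image from the noised input.
    return denoise(diffuse(x_adv, alpha_bar_t, rng))
```

The choice of timestep is the key trade-off: too little noise leaves the perturbation intact, too much destroys the image content.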

CV-23-标题 ReDFeat: Recoupling Detection and Description for Multimodal Feature Learning

链接: https://arxiv.org/abs/2205.07439
作者: Yuxin Deng, Jiayi Ma
备注:

点击查看摘要

Abstract: Deep-learning-based local feature extraction algorithms that combine detection and description have made significant progress in visible image matching. However, the end-to-end training of such frameworks is notoriously unstable due to the lack of strong supervision of detection and the inappropriate coupling between detection and description. The problem is magnified in cross-modal scenarios, in which most methods heavily rely on the pre-training. In this paper, we recouple independent constraints of detection and description of multimodal feature learning with a mutual weighting strategy, in which the detected probabilities of robust features are forced to peak and repeat, while features with high detection scores are emphasized during optimization. Different from previous works, those weights are detached from back propagation so that the detected probability of indistinct features would not be directly suppressed and the training would be more stable. Moreover, we propose the Super Detector, a detector that possesses a large receptive field and is equipped with learnable non-maximum suppression layers, to fulfill the harsh terms of detection. Finally, we build a benchmark that contains cross visible, infrared, near-infrared and synthetic aperture radar image pairs for evaluating the performance of features in feature matching and image registration tasks. Extensive experiments demonstrate that features trained with the recoulped detection and description, named ReDFeat, surpass previous state-of-the-arts in the benchmark, while the model can be readily trained from scratch.

摘要:结合检测和描述的基于深度学习的局部特征提取算法在可见光图像匹配中取得了重大进展。然而,由于缺乏对检测的强监督以及检测和描述之间不恰当的耦合,此类框架的端到端训练是出了名的不稳定。该问题在跨模态场景中被进一步放大,大多数方法在此严重依赖预训练。在本文中,我们通过相互加权策略重新耦合多模态特征学习中检测与描述的独立约束:鲁棒特征的检测概率被促使尖峰化并可重复,同时在优化过程中强调检测分数高的特征。与以前的工作不同,这些权重从反向传播中分离出来,因此不显著特征的检测概率不会被直接抑制,训练也会更加稳定。此外,我们提出了具有大感受野并配备可学习非极大值抑制层的超级检测器(Super Detector),以满足苛刻的检测要求。最后,我们构建了一个包含可见光、红外、近红外和合成孔径雷达图像对的基准,用于评估特征在特征匹配和图像配准任务中的性能。大量实验表明,使用重新耦合的检测和描述训练的特征(称为ReDFeat)超越了基准测试中此前的最先进方法,并且该模型可以轻松地从头训练。
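The detached mutual weighting can be sketched as a description loss weighted by detection probabilities that are treated as constants, so gradients flow only through the description term. This is our reading of the abstract, not the paper's exact loss; the function name and the normalization are assumptions.

```python
import numpy as np

def weighted_description_loss(det_prob, desc_dist):
    # det_prob: per-feature detection probabilities;
    # desc_dist: per-feature description (matching) losses.
    # The weights are "detached" from back propagation -- here modeled
    # by copying them as constants -- so confident detections are
    # emphasized without directly suppressing indistinct features.
    w = det_prob.copy()  # stand-in for tensor.detach()
    return float(np.sum(w * desc_dist) / (np.sum(w) + 1e-12))
```

In an autodiff framework the `copy()` would be `detach()`/`stop_gradient`, which is what keeps the training stable according to the abstract.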

CV-24-标题 Binarizing by Classification: Is Soft Function Really Necessary?

链接: https://arxiv.org/abs/2205.07433
作者: Yefei He, Luoming Zhang, Weijia Wu, Hong Zhou
备注: submitted to NeurIPS2022

点击查看摘要

Abstract: Binary neural network leverages the Sign function to binarize real values, and its non-derivative property inevitably brings huge gradient errors during backpropagation. Although many hand-designed soft functions have been proposed to approximate gradients, their mechanism is not clear and there are still huge performance gaps between binary models and their full-precision counterparts. To address this, we propose to tackle network binarization as a binary classification problem and use a multi-layer perceptron (MLP) as the classifier. The MLP-based classifier can fit any continuous function theoretically and is adaptively learned to binarize networks and backpropagate gradients without any specific soft function. With this view, we further prove experimentally that even a simple linear function can outperform previous complex soft functions. Extensive experiments demonstrate that the proposed method yields surprising performance both in image classification and human pose estimation tasks. Specifically, we achieve 65.7% top-1 accuracy of ResNet-34 on ImageNet dataset, with an absolute improvement of 2.8%. When evaluating on the challenging Microsoft COCO keypoint dataset, the proposed method enables binary networks to achieve a mAP of 60.6 for the first time, on par with some full-precision methods.

摘要:二值神经网络利用符号函数对实数值进行二值化,其不可导的性质不可避免地在反向传播过程中带来巨大的梯度误差。尽管已经提出了许多手工设计的软函数来近似梯度,但它们的机制尚不清楚,二值模型与其全精度对应模型之间仍然存在巨大的性能差距。为了解决这个问题,我们建议将网络二值化作为一个二分类问题来处理,并使用多层感知器(MLP)作为分类器。基于MLP的分类器理论上可以拟合任何连续函数,并且可以自适应地学习对网络进行二值化和反向传播梯度,而无需任何特定的软函数。基于这一观点,我们进一步通过实验证明,即使是简单的线性函数也可以胜过以前的复杂软函数。大量实验表明,所提出的方法在图像分类和人体姿态估计任务中均取得了令人惊讶的性能。具体而言,我们在ImageNet数据集上使ResNet-34达到65.7%的top-1准确率,绝对提升2.8%。在具有挑战性的Microsoft COCO关键点数据集上评估时,所提方法使二值网络首次达到60.6的mAP,与一些全精度方法相当。
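The setup the abstract describes — a non-differentiable sign in the forward pass, with a learned gradient mapping in the backward pass — reduces in its simplest reported case to a linear surrogate. A hedged sketch (the learned-MLP version is replaced here by the linear special case the authors say can already outperform hand-designed soft functions):

```python
import numpy as np

def sign_forward(x):
    # Binarize real-valued weights/activations to {-1, +1}.
    return np.where(x >= 0, 1.0, -1.0)

def sign_backward(grad_out, x, a=1.0):
    # The true derivative of sign() is zero almost everywhere, so a
    # surrogate is needed. Rather than a hand-designed soft function,
    # the paper learns the gradient mapping; the simple linear case
    # (slope `a`, our notation) just passes the gradient through scaled.
    return a * grad_out
```

In the full method the scalar `a` would be replaced by an adaptively learned MLP over `x` and `grad_out`.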

CV-25-标题 Transformers in 3D Point Clouds: A Survey

链接: https://arxiv.org/abs/2205.07417
作者: Dening Lu, Qian Xie, Mingqiang Wei, Linlin Xu, Jonathan Li
备注: 22 pages, 5 figures, 5 tables

点击查看摘要

Abstract: In recent years, Transformer models have been proven to have the remarkable ability of long-range dependencies modeling. They have achieved satisfactory results both in Natural Language Processing (NLP) and image processing. This significant achievement sparks great interest among researchers in 3D point cloud processing to apply them to various 3D tasks. Due to the inherent permutation invariance and strong global feature learning ability, 3D Transformers are well suited for point cloud processing and analysis. They have achieved competitive or even better performance compared to the state-of-the-art non-Transformer algorithms. This survey aims to provide a comprehensive overview of 3D Transformers designed for various tasks (e.g. point cloud classification, segmentation, object detection, and so on). We start by introducing the fundamental components of the general Transformer and providing a brief description of its application in 2D and 3D fields. Then, we present three different taxonomies (i.e., Transformer implementation-based taxonomy, data representation-based taxonomy, and task-based taxonomy) for method classification, which allows us to analyze involved methods from multiple perspectives. Furthermore, we also conduct an investigation of 3D self-attention mechanism variants designed for performance improvement. To demonstrate the superiority of 3D Transformers, we compare the performance of Transformer-based algorithms in terms of point cloud classification, segmentation, and object detection. Finally, we point out three potential future research directions, expecting to provide some benefit references for the development of 3D Transformers.

摘要:近年来,Transformer模型已被证明具有出色的长程依赖建模能力,在自然语言处理(NLP)和图像处理方面都取得了令人满意的结果。这一重大成就激发了3D点云处理领域研究人员的极大兴趣,希望将其应用于各种3D任务。由于固有的置换不变性和强大的全局特征学习能力,3D Transformer非常适合点云处理和分析,与最先进的非Transformer算法相比取得了有竞争力甚至更好的性能。这项综述旨在全面概述为各种任务(例如点云分类、分割、目标检测等)设计的3D Transformer。我们首先介绍通用Transformer的基本组件,并简要说明其在2D和3D领域中的应用。然后,我们为方法分类提出了三种不同的分类法(即基于Transformer实现的分类法、基于数据表示的分类法和基于任务的分类法),从而可以从多个角度分析相关方法。此外,我们还研究了旨在提高性能的3D自注意力机制变体。为了证明3D Transformer的优势,我们在点云分类、分割和目标检测方面比较了基于Transformer的算法的性能。最后,我们指出了三个潜在的未来研究方向,希望为3D Transformer的发展提供一些有益的参考。

CV-26-标题 A New Outlier Removal Strategy Based on Reliability of Correspondence Graph for Fast Point Cloud Registration

链接: https://arxiv.org/abs/2205.07404
作者: Li Yan, Pengcheng Wei, Hong Xie, Jicheng Dai, Hao Wu, Ming Huang
备注: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

点击查看摘要

Abstract: Registration is a basic yet crucial task in point cloud processing. In correspondence-based point cloud registration, matching correspondences by point feature techniques may lead to an extremely high outlier ratio. Current methods still suffer from low efficiency, accuracy, and recall rate. We use a simple and intuitive method to describe the 6-DOF (degree of freedom) curtailment process in point cloud registration and propose an outlier removal strategy based on the reliability of the correspondence graph. The method constructs the corresponding graph according to the given correspondences and designs the concept of the reliability degree of the graph node for optimal candidate selection and the reliability degree of the graph edge to obtain the global maximum consensus set. The presented method could achieve fast and accurate outliers removal along with gradual aligning parameters estimation. Extensive experiments on simulations and challenging real-world datasets demonstrate that the proposed method can still perform effective point cloud registration even the correspondence outlier ratio is over 99%, and the efficiency is better than the state-of-the-art. Code is available at this https URL.

摘要:配准是点云处理中一项基础而关键的任务。在基于对应关系的点云配准中,通过点特征技术匹配对应关系可能导致极高的外点比例。当前方法仍然存在效率低、准确率和召回率不足的问题。我们使用一种简单直观的方法来描述点云配准中6自由度(6-DOF)的削减过程,并基于对应关系图的可靠性提出了一种外点去除策略。该方法根据给定的对应关系构建相应的图,并设计了用于最优候选选择的图节点可靠度概念以及用于获得全局最大一致集的图边可靠度概念。所提方法可以实现快速准确的外点去除以及渐进的对齐参数估计。在仿真和具有挑战性的真实世界数据集上的大量实验表明,即使对应关系外点比例超过99%,所提方法仍然可以执行有效的点云配准,并且效率优于最先进的方法。代码可在此HTTPS URL上获取。
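A common way to build such a correspondence (compatibility) graph is pairwise length consistency: a rigid transform preserves distances, so two correspondences are compatible when their point-pair distances agree in both clouds, and a node's reliability can be read off as its degree. This is a generic sketch of that idea, not the paper's exact reliability definition; the threshold `tau` and function name are ours.

```python
import numpy as np

def node_reliability(src, dst, tau=0.05):
    # src[i] <-> dst[i] are putative correspondences. Edge (i, j) is
    # kept when the pairwise distance is preserved across the two
    # clouds (rigid-motion invariant); node reliability = degree in
    # this compatibility graph. Outliers end up weakly connected.
    n = len(src)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            d = abs(np.linalg.norm(src[i] - src[j])
                    - np.linalg.norm(dst[i] - dst[j]))
            adj[i, j] = adj[j, i] = d < tau
    return adj.sum(axis=1)
```

Thresholding on this degree before estimating the transform is what makes such methods robust even at >99% outlier ratios.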

CV-27-标题 PillarNet: High-Performance Pillar-based 3D Object Detection

链接: https://arxiv.org/abs/2205.07403
作者: Guangsheng Shi, Ruifeng Li, Chao Ma
备注:

点击查看摘要

Abstract: Real-time and high-performance 3D object detection is of critical importance for autonomous driving. Recent top-performing 3D object detectors mainly rely on point-based or 3D voxel-based convolutions, which are both computationally inefficient for onboard deployment. In contrast, pillar-based methods use merely 2D convolutions, which consume less computation resources, but they lag far behind their voxel-based counterparts in detection accuracy. In this paper, by examining the primary performance gap between pillar- and voxel-based detectors, we develop a real-time and high-performance pillar-based detector, dubbed PillarNet. The proposed PillarNet consists of a powerful encoder network for effective pillar feature learning, a neck network for spatial-semantic feature fusion and the commonly used detect head. Using only 2D convolutions, PillarNet is flexible to an optional pillar size and compatible with classical 2D CNN backbones, such as VGGNet and ResNet. Additionally, PillarNet benefits from an orientation-decoupled IoU regression loss along with the IoU-aware prediction branch. Extensive experimental results on the large-scale nuScenes Dataset and Waymo Open Dataset demonstrate that the proposed PillarNet performs well over the state-of-the-art 3D detectors in terms of effectiveness and efficiency.

摘要:实时且高性能的3D目标检测对自动驾驶至关重要。最近性能领先的3D目标检测器主要依赖基于点或基于3D体素的卷积,二者的计算效率都不足以满足车载部署。相比之下,基于柱体(pillar)的方法仅使用2D卷积,消耗更少的计算资源,但其检测精度远远落后于基于体素的方法。在本文中,通过考察基于柱体和基于体素的检测器之间的主要性能差距,我们开发了一种实时且高性能的基于柱体的检测器,称为PillarNet。所提出的PillarNet由一个用于有效柱体特征学习的强大编码器网络、一个用于空间-语义特征融合的颈部网络以及常用的检测头组成。PillarNet仅使用2D卷积,可灵活选择柱体尺寸,并与经典的2D CNN骨干网络(例如VGGNet和ResNet)兼容。此外,PillarNet还受益于方向解耦的IoU回归损失以及IoU感知的预测分支。在大规模nuScenes数据集和Waymo开放数据集上的大量实验结果表明,所提出的PillarNet在有效性和效率方面均优于最先进的3D检测器。
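What makes pillar-based methods 2D-only is the pillarization step: points are binned into a bird's-eye-view grid of vertical columns, after which everything downstream is ordinary 2D convolution. A minimal sketch, assuming per-pillar features are just point counts (real pillar encoders use learned per-point features):

```python
import numpy as np

def pillarize(points, pillar_size=0.2, grid=(100, 100)):
    # points: (N, >=2) array of x, y (z is collapsed into the pillar).
    # Scatter points into a BEV grid of pillars; the resulting dense
    # (H, W) map can be fed to a purely 2D-convolutional encoder.
    ix = np.clip((points[:, 0] / pillar_size).astype(int), 0, grid[0] - 1)
    iy = np.clip((points[:, 1] / pillar_size).astype(int), 0, grid[1] - 1)
    bev = np.zeros(grid)
    np.add.at(bev, (ix, iy), 1.0)  # accumulate even for repeated cells
    return bev
```

`np.add.at` is used instead of plain fancy-index assignment so that multiple points landing in the same pillar all get counted.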

CV-28-标题 SuperWarp: Supervised Learning and Warping on U-Net for Invariant Subvoxel-Precise Registration

链接: https://arxiv.org/abs/2205.07399
作者: Sean I. Young, Yaël Balbastre, Adrian V. Dalca, William M. Wells, Juan Eugenio Iglesias, Bruce Fischl
备注:

点击查看摘要

Abstract: In recent years, learning-based image registration methods have gradually moved away from direct supervision with target warps to instead use self-supervision, with excellent results in several registration benchmarks. These approaches utilize a loss function that penalizes the intensity differences between the fixed and moving images, along with a suitable regularizer on the deformation. In this paper, we argue that the relative failure of supervised registration approaches can in part be blamed on the use of regular U-Nets, which are jointly tasked with feature extraction, feature matching, and estimation of deformation. We introduce one simple but crucial modification to the U-Net that disentangles feature extraction and matching from deformation prediction, allowing the U-Net to warp the features, across levels, as the deformation field is evolved. With this modification, direct supervision using target warps begins to outperform self-supervision approaches that require segmentations, presenting new directions for registration when images do not have segmentations. We hope that our findings in this preliminary workshop paper will re-ignite research interest in supervised image registration techniques. Our code is publicly available from this https URL.

摘要:近年来,基于学习的图像配准方法已逐渐从使用目标形变场的直接监督转向自监督,并在多个配准基准中取得了出色的结果。这些方法使用惩罚固定图像和移动图像之间强度差异的损失函数,以及对形变的适当正则化项。在本文中,我们认为监督配准方法的相对失败可以部分归咎于使用常规U-Net,它同时承担特征提取、特征匹配和形变估计的任务。我们对U-Net引入了一个简单但关键的修改,将特征提取与匹配从形变预测中解耦,使U-Net能够随着形变场的演化在各层级上对特征进行扭曲(warp)。通过此修改,使用目标形变场的直接监督开始优于需要分割标注的自监督方法,为图像没有分割标注时的配准提供了新方向。我们希望这篇初步研讨会论文中的发现能够重新点燃对监督图像配准技术的研究兴趣。我们的代码可从此HTTPS URL公开获取。

CV-29-标题 Novel Multicolumn Kernel Extreme Learning Machine for Food Detection via Optimal Features from CNN

链接: https://arxiv.org/abs/2205.07348
作者: Ghalib Ahmed Tahir, Chu Kiong Loo
备注:

点击查看摘要

Abstract: Automatic food detection is an emerging topic of interest due to its wide array of applications ranging from detecting food images on social media platforms to filtering non-food photos from the users in dietary assessment apps. Recently, during the COVID-19 pandemic, it has facilitated enforcing an eating ban by automatically detecting eating activities from cameras in public places. Therefore, to tackle the challenge of recognizing food images with high accuracy, we proposed the idea of a hybrid framework for extracting and selecting optimal features from an efficient neural network. There on, a nonlinear classifier is employed to discriminate between linearly inseparable feature vectors with great precision. In line with this idea, our method extracts features from MobileNetV3, selects an optimal subset of attributes by using Shapley Additive exPlanations (SHAP) values, and exploits kernel extreme learning machine (KELM) due to its nonlinear decision boundary and good generalization ability. However, KELM suffers from the ‘curse of dimensionality’ problem for large datasets due to the complex computation of kernel matrix with large numbers of hidden nodes. We solved this problem by proposing a novel multicolumn kernel extreme learning machine (MCKELM) which exploited the k-d tree algorithm to divide data into N subsets and trains separate KELM on each subset of data. Then, the method incorporates KELM classifiers into parallel structures and selects the top k nearest subsets during testing by using the k-d tree search for classifying input instead of the whole network. For evaluating a proposed framework large food/non-food dataset is prepared using nine publically available datasets. Experimental results showed the superiority of our method on an integrated set of measures while solving the ‘curse of dimensionality’ problem in KELM for large datasets.

摘要:自动食品检测是一个新兴的热点课题,其应用广泛,从检测社交媒体平台上的食品图像到在饮食评估应用中过滤用户的非食品照片。最近,在COVID-19大流行期间,它通过自动检测公共场所摄像头中的进食活动来协助执行进食禁令。因此,为了应对高精度识别食品图像的挑战,我们提出了一种混合框架,从高效的神经网络中提取并选择最优特征,随后采用非线性分类器以高精度区分线性不可分的特征向量。按照这一思路,我们的方法从MobileNetV3提取特征,使用Shapley加性解释(SHAP)值选择最优属性子集,并利用核极限学习机(KELM),因为它具有非线性决策边界和良好的泛化能力。然而,由于大量隐藏节点导致核矩阵计算复杂,KELM在大型数据集上面临"维数灾难"问题。我们通过提出一种新颖的多列核极限学习机(MCKELM)来解决这个问题,它利用k-d树算法将数据划分为N个子集,并在每个数据子集上训练单独的KELM。然后,该方法将KELM分类器组织为并行结构,并在测试时使用k-d树搜索选择最近的k个子集来对输入进行分类,而不是使用整个网络。为了评估所提框架,我们使用九个公开可用的数据集准备了一个大型食品/非食品数据集。实验结果表明,我们的方法在一组综合指标上具有优越性,同时解决了KELM在大型数据集上的"维数灾难"问题。
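The k-d-tree partitioning that MCKELM relies on can be sketched with a simple recursive median split: each leaf becomes one subset on which a separate KELM would be trained. This is a simplified stand-in (widest-axis median split, no balancing heuristics, KELM training omitted); the function name is ours.

```python
import numpy as np

def median_split(X, idx, max_leaf):
    # Recursively split the index set along the widest axis at the
    # median, k-d-tree style. Each returned leaf is one data subset
    # on which a separate KELM would be trained, sidestepping one
    # huge kernel matrix over the full dataset.
    if len(idx) <= max_leaf:
        return [idx]
    axis = int(np.argmax(np.ptp(X[idx], axis=0)))  # widest spread
    order = idx[np.argsort(X[idx, axis])]
    mid = len(order) // 2
    return (median_split(X, order[:mid], max_leaf)
            + median_split(X, order[mid:], max_leaf))
```

At test time the same tree routes a query to its k nearest subsets, so only those columns' KELMs are evaluated rather than the whole ensemble.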

CV-30-标题 Trucks Don't Mean Trump: Diagnosing Human Error in Image Analysis

链接: https://arxiv.org/abs/2205.07333
作者: J.D. Zamfirescu-Pereira, Jerry Chen, Emily Wen, Allison Koenecke, Nikhil Garg, Emma Pierson
备注: To be published in FAccT 2022

点击查看摘要

Abstract: Algorithms provide powerful tools for detecting and dissecting human bias and error. Here, we develop machine learning methods to to analyze how humans err in a particular high-stakes task: image interpretation. We leverage a unique dataset of 16,135,392 human predictions of whether a neighborhood voted for Donald Trump or Joe Biden in the 2020 US election, based on a Google Street View image. We show that by training a machine learning estimator of the Bayes optimal decision for each image, we can provide an actionable decomposition of human error into bias, variance, and noise terms, and further identify specific features (like pickup trucks) which lead humans astray. Our methods can be applied to ensure that human-in-the-loop decision-making is accurate and fair and are also applicable to black-box algorithmic systems.

摘要:算法为检测和剖析人类的偏见与错误提供了强大的工具。在这里,我们开发了机器学习方法来分析人类在一项特定的高风险任务(图像解读)中如何犯错。我们利用一个独特的数据集,包含16,135,392条人类预测:基于Google街景图像,判断某个社区在2020年美国大选中投票支持唐纳德·特朗普还是乔·拜登。我们表明,通过为每张图像训练贝叶斯最优决策的机器学习估计器,我们可以将人为错误可操作地分解为偏差、方差和噪声项,并进一步识别导致人类误判的特定特征(例如皮卡车)。我们的方法可用于确保人在回路决策的准确与公平,也适用于黑盒算法系统。
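A bias/variance/noise decomposition of this kind can be sketched per image once a Bayes-optimal estimate is available. This is a generic squared-error version, not necessarily the paper's exact formulation; `decompose_error` and its inputs are illustrative names.

```python
import numpy as np

def decompose_error(human_preds, bayes_opt, truth):
    # human_preds: array of individual human predictions for one image;
    # bayes_opt: estimated Bayes-optimal prediction for that image;
    # truth: the realized outcome.
    mean_h = human_preds.mean()
    bias = (mean_h - bayes_opt) ** 2   # systematic gap vs. Bayes-optimal
    variance = human_preds.var()       # disagreement among humans
    noise = (bayes_opt - truth) ** 2   # irreducible error in the task
    return bias, variance, noise
```

The actionable part is that each term suggests a different remedy: bias calls for debiasing or training, variance for aggregation across raters, and noise for accepting the task's intrinsic uncertainty.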

CV-31-标题 Uncertainty estimation for Cross-dataset performance in Trajectory prediction

链接: https://arxiv.org/abs/2205.07310
作者: Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bogdan Stanciulescu, Fabien Moutarde
备注: Workshop on Fresh Perspectives on the Future of Autonomous Driving, ICRA 2022

点击查看摘要

Abstract: While a lot of work has been done on developing trajectory prediction methods, and various datasets have been proposed for benchmarking this task, little study has been done so far on the generalizability and the transferability of these methods across dataset. In this paper, we study the performance of a state-of-the-art trajectory prediction method across four different datasets (Argoverse, NuScenes, Interaction, Shifts). We first check how a similar method can be applied and trained on all these datasets with similar hyperparameters. Then we highlight which datasets work best on others, and study how uncertainty estimation allows for a better transferable performance; proposing a novel way to estimate uncertainty and to directly use it in prediction.

摘要:虽然在开发轨迹预测方法方面已经有大量工作,并且已经提出了各种数据集来对这项任务进行基准测试,但迄今为止,关于这些方法在数据集之间的泛化性和可迁移性的研究还很少。在本文中,我们研究了一种最先进的轨迹预测方法在四个不同数据集(Argoverse、nuScenes、Interaction、Shifts)上的性能。我们首先检验类似的方法如何以相似的超参数应用和训练于所有这些数据集。然后,我们指出哪些数据集上训练的模型在其他数据集上表现最好,并研究不确定性估计如何带来更好的可迁移性能,提出了一种估计不确定性并将其直接用于预测的新方法。

CV-32-标题 Conditional Vector Graphics Generation for Music Cover Images

链接: https://arxiv.org/abs/2205.07301
作者: Valeria Efimova, Ivan Jarsky, Ilya Bizyaev, Andrey Filchenkov
备注:

点击查看摘要

Abstract: Generative Adversarial Networks (GAN) have motivated a rapid growth of the domain of computer image synthesis. As almost all the existing image synthesis algorithms consider an image as a pixel matrix, the high-resolution image synthesis is complicated. A good alternative can be vector images. However, they belong to the highly sophisticated parametric space, which is a restriction for solving the task of synthesizing vector graphics by GANs. In this paper, we consider a specific application domain that softens this restriction dramatically allowing the usage of vector image synthesis. Music cover images should meet the requirements of Internet streaming services and printing standards, which imply high resolution of graphic materials without any additional requirements on the content of such images. Existing music cover image generation services do not analyze tracks themselves; however, some services mostly consider only genre tags. To generate music covers as vector images that reflect the music and consist of simple geometric objects, we suggest a GAN-based algorithm called CoverGAN. The assessment of resulting images is based on their correspondence to the music compared with AttnGAN and DALL-E text-to-image generation according to title or lyrics. Moreover, the significance of the patterns found by CoverGAN has been evaluated in terms of the correspondence of the generated cover images to the musical tracks. Listeners evaluate the music covers generated by the proposed algorithm as quite satisfactory and corresponding to the tracks. Music cover images generation code and demo are available at this https URL.

摘要:生成对抗网络(GAN)推动了计算机图像合成领域的快速发展。由于几乎所有现有的图像合成算法都将图像视为像素矩阵,高分辨率图像合成十分复杂,而矢量图像是一个很好的替代方案。但是,矢量图像属于高度复杂的参数空间,这限制了用GAN合成矢量图形这一任务的求解。在本文中,我们考虑一个能显著放宽该限制、从而允许使用矢量图像合成的特定应用领域。音乐封面图像应满足互联网流媒体服务和印刷标准的要求,即图形素材的高分辨率,而对图像内容没有额外要求。现有的音乐封面图像生成服务不会分析音轨本身;某些服务大多只考虑流派标签。为了将音乐封面生成为反映音乐且由简单几何对象组成的矢量图像,我们提出了一种基于GAN的算法,称为CoverGAN。对生成图像的评估基于其与音乐的对应程度,并与根据标题或歌词的AttnGAN和DALL-E文本到图像生成进行比较。此外,我们还从生成的封面图像与音乐曲目的对应关系方面评估了CoverGAN发现的模式的重要性。听众认为所提算法生成的音乐封面相当令人满意并与曲目相符。音乐封面图像生成代码和演示可在此HTTPS URL上获取。

CV-33-标题 Regulating Facial Processing Technologies: Tensions Between Legal and Technical Considerations in the Application of Illinois BIPA

链接: https://arxiv.org/abs/2205.07299
作者: Rui-Jie Yew, Alice Xiang
备注: Forthcoming at FAccT 2022

点击查看摘要

Abstract: Harms resulting from the development and deployment of facial processing technologies (FPT) have been met with increasing controversy. Several states and cities in the U.S. have banned the use of facial recognition by law enforcement and governments, but FPT are still being developed and used in a wide variety of contexts where they primarily are regulated by state biometric information privacy laws. Among these laws, the 2008 Illinois Biometric Information Privacy Act (BIPA) has generated a significant amount of litigation. Yet, with most BIPA lawsuits reaching settlements before there have been meaningful clarifications of relevant technical intricacies and legal definitions, there remains a great degree of uncertainty as to how exactly this law applies to FPT. What we have found through applications of BIPA in FPT litigation so far, however, points to potential disconnects between technical and legal communities. This paper analyzes what we know based on BIPA court proceedings and highlights these points of tension: areas where the technical operationalization of BIPA may create unintended and undesirable incentives for FPT development, as well as areas where BIPA litigation can bring to light the limitations of solely technical methods in achieving legal privacy values. These factors are relevant for (i) reasoning about biometric information privacy laws as a governing mechanism for FPT, (ii) assessing the potential harms of FPT, and (iii) providing incentives for the mitigation of these harms. By illuminating these considerations, we hope to empower courts and lawmakers to take a more nuanced approach to regulating FPT and developers to better understand privacy values in the current U.S. legal landscape.

摘要:面部处理技术(FPT)的开发和部署所造成的危害引发了越来越多的争议。美国若干州和城市已禁止执法部门和政府使用面部识别,但FPT仍在各种场景中被开发和使用,并主要受州生物识别信息隐私法的监管。在这些法律中,2008年《伊利诺伊州生物识别信息隐私法》(BIPA)引发了大量诉讼。然而,由于大多数BIPA诉讼在相关技术细节和法律定义得到有意义的澄清之前就已达成和解,该法律究竟如何适用于FPT仍存在很大程度的不确定性。不过,迄今为止BIPA在FPT诉讼中的适用情况表明,技术界和法律界之间可能存在脱节。本文基于BIPA法院诉讼程序分析我们已知的情况,并强调这些张力点:BIPA的技术操作化可能为FPT开发带来意外且不良激励的领域,以及BIPA诉讼能够揭示仅靠技术方法实现法律隐私价值之局限性的领域。这些因素与以下方面相关:(i)将生物识别信息隐私法作为FPT的治理机制进行推理,(ii)评估FPT的潜在危害,以及(iii)为减轻这些危害提供激励。通过阐明这些考虑因素,我们希望使法院和立法者能够采取更细致的方法来监管FPT,并使开发者更好地理解当前美国法律环境中的隐私价值。

CV-34-标题 Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks

链接: https://arxiv.org/abs/2205.07260
作者: Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Dong Gu Lee, Wonseok Jeong, Sang Woo Kim
备注: 12 pages, 6 figures

点击查看摘要

Abstract: L2 regularization for weights in neural networks is widely used as a standard training trick. However, L2 regularization for gamma, a trainable parameter of batch normalization, remains an undiscussed mystery and is applied in different ways depending on the library and practitioner. In this paper, we study whether L2 regularization for gamma is valid. To explore this issue, we consider two approaches: 1) variance control to make the residual network behave like identity mapping and 2) stable optimization through the improvement of effective learning rate. Through two analyses, we specify the desirable and undesirable gamma to apply L2 regularization and propose four guidelines for managing them. In several experiments, we observed the increase and decrease in performance caused by applying L2 regularization to gamma of four categories, which is consistent with our four guidelines. Our proposed guidelines were validated through various tasks and architectures, including variants of residual networks and transformers.

摘要:对神经网络权重进行 L2 正则化是一种被广泛使用的标准训练技巧。然而,对批归一化(Batch Normalization)中可训练参数 gamma 的 L2 正则化却一直缺乏讨论,不同的库和从业者的处理方式各不相同。本文研究对 gamma 进行 L2 正则化是否有效。为探讨这一问题,我们考虑两种思路:1)通过方差控制使残差网络的行为接近恒等映射;2)通过提高有效学习率来稳定优化。通过这两项分析,我们界定了适合与不适合施加 L2 正则化的 gamma,并提出四条管理准则。在多个实验中,我们观察到对四类 gamma 施加 L2 正则化所导致的性能提升与下降,这与我们的四条准则一致。所提准则在多种任务和架构上得到了验证,包括残差网络的各类变体与 Transformer。
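论文提出应对不同类别的 gamma 区别管理 L2 正则化。下面给出一个纯 Python 的最小示意(并非原论文实现,参数名与衰减系数均为假设),展示按参数名为 BN 的 gamma 与普通权重分别设置权重衰减系数、再计算 L2 罚项的思路:

```python
def l2_penalty(params, decay_by_name):
    """按参数名加权的 L2 罚项:sum(0.5 * lambda_name * ||p||^2)。"""
    total = 0.0
    for name, values in params.items():
        lam = decay_by_name.get(name, 0.0)  # 未指定的参数不施加衰减
        total += 0.5 * lam * sum(v * v for v in values)
    return total

# 玩具参数:一个卷积权重和一个 BN gamma(名称为演示而设)
params = {"conv.weight": [1.0, -2.0], "bn.gamma": [2.0]}

# 将 gamma 与权重分组处理:这里对权重施加衰减、对 gamma 不衰减,
# 体现“理想/不理想的 gamma 应区别管理”的思路
penalty = l2_penalty(params, {"conv.weight": 1e-1, "bn.gamma": 0.0})
```

在 PyTorch 等框架中,等价做法通常是把 gamma 放入单独的 parameter group,并为其设置不同的 weight_decay。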

CV-35-标题 FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning

链接: https://arxiv.org/abs/2205.07246
作者: Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Zhen Wu, Jindong Wang
备注: Preprint. Codebase: this https URL

点击查看摘要

Abstract: Pseudo labeling and consistency regularization approaches with confidence-based thresholding have made great progress in semi-supervised learning (SSL). In this paper, we theoretically and empirically analyze the relationship between the unlabeled data distribution and the desirable confidence threshold. Our analysis shows that previous methods might fail to define favorable threshold since they either require a pre-defined / fixed threshold or an ad-hoc threshold adjusting scheme that does not reflect the learning effect well, resulting in inferior performance and slow convergence, especially for complicated unlabeled data distributions. We hence propose \emph{FreeMatch} to define and adjust the confidence threshold in a self-adaptive manner according to the model’s learning status. To handle complicated unlabeled data distributions more effectively, we further propose a self-adaptive class fairness regularization method that encourages the model to produce diverse predictions during training. Extensive experimental results indicate the superiority of FreeMatch especially when the labeled data are extremely rare. FreeMatch achieves \textbf{5.78}%, \textbf{13.59}%, and \textbf{1.28}% error rate reduction over the latest state-of-the-art method FlexMatch on CIFAR-10 with 1 label per class, STL-10 with 4 labels per class, and ImageNet with 100k labels respectively.

摘要:基于置信度阈值的伪标签与一致性正则化方法在半监督学习(SSL)中取得了巨大进展。本文从理论和实证两方面分析了无标签数据分布与理想置信度阈值之间的关系。我们的分析表明,以往方法可能无法确定合适的阈值:它们要么需要预先定义的固定阈值,要么依赖无法很好反映学习进度的临时阈值调整方案,从而导致性能欠佳、收敛缓慢,对复杂的无标签数据分布尤为如此。因此,我们提出 FreeMatch,根据模型的学习状态以自适应方式定义并调整置信度阈值。为更有效地处理复杂的无标签数据分布,我们进一步提出一种自适应的类别公平正则化方法,鼓励模型在训练期间产生多样化的预测。大量实验结果表明了 FreeMatch 的优越性,尤其是在有标签数据极为稀少时。相比最新的最先进方法 FlexMatch,FreeMatch 在每类仅 1 个标签的 CIFAR-10、每类 4 个标签的 STL-10 以及具有 10 万个标签的 ImageNet 上,错误率分别降低了 5.78%、13.59% 和 1.28%。
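FreeMatch 的核心思想是让置信度阈值随模型学习状态自适应变化。下面用纯 Python 给出一个简化示意(并非论文的精确公式,动量值与概率分布均为假设):以无标签样本 top-1 置信度均值的指数滑动平均(EMA)作为全局阈值,再据此筛选伪标签:

```python
def update_threshold(tau, batch_max_probs, momentum=0.9):
    """用无标签批次 top-1 置信度均值的 EMA 更新全局阈值(简化版)。"""
    batch_mean = sum(batch_max_probs) / len(batch_max_probs)
    return momentum * tau + (1 - momentum) * batch_mean

def select_pseudo_labels(probs, tau):
    """仅保留 top-1 置信度不低于当前阈值的预测作为伪标签。"""
    selected = []
    for dist in probs:
        conf = max(dist)
        if conf >= tau:
            selected.append(dist.index(conf))  # 伪标签取 argmax 类别
    return selected

tau = 0.5  # 初始阈值(假设值)
probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
tau = update_threshold(tau, [max(d) for d in probs])  # -> 0.525
labels = select_pseudo_labels(probs, tau)             # -> [0, 0, 1]
```

随着模型置信度整体上升,阈值随之升高,从而在训练早期保留更多伪标签、后期更严格地过滤。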

CV-36-标题 Video Frame Interpolation with Transformer

链接: https://arxiv.org/abs/2205.07230
作者: Liying Lu, Ruizheng Wu, Huaijia Lin, Jiangbo Lu, Jiaya Jia
备注: CVPR2022

点击查看摘要

Abstract: Video frame interpolation (VFI), which aims to synthesize intermediate frames of a video, has made remarkable progress with development of deep convolutional networks over past years. Existing methods built upon convolutional networks generally face challenges of handling large motion due to the locality of convolution operations. To overcome this limitation, we introduce a novel framework, which takes advantage of Transformer to model long-range pixel correlation among video frames. Further, our network is equipped with a novel cross-scale window-based attention mechanism, where cross-scale windows interact with each other. This design effectively enlarges the receptive field and aggregates multi-scale information. Extensive quantitative and qualitative experiments demonstrate that our method achieves new state-of-the-art results on various benchmarks.

摘要:旨在合成视频中间帧的视频帧插值(VFI),随着近年来深度卷积网络的发展取得了显著进展。由于卷积运算的局部性,基于卷积网络的现有方法在处理大运动时普遍面临挑战。为克服这一限制,我们引入了一个新颖的框架,利用 Transformer 对视频帧之间的长程像素相关性进行建模。此外,我们的网络配备了一种新型的跨尺度窗口注意力机制,使不同尺度的窗口相互交互。该设计有效扩大了感受野并聚合了多尺度信息。大量定量与定性实验表明,我们的方法在多个基准上取得了新的最先进结果。

CV-37-标题 Fused Deep Neural Network based Transfer Learning in Occluded Face Classification and Person re-Identification

链接: https://arxiv.org/abs/2205.07203
作者: Mohamed Mohana, Prasanalakshmi B, Salem Alelyani, Mohammed Saleh Alsaqer
备注: 15 pages, 9 figures

点击查看摘要

Abstract: Recent period of pandemic has brought person identification even with occluded face image a great importance with increased number of mask usage. This paper aims to recognize the occlusion of one of four types in face images. Various transfer learning methods were tested, and the results show that MobileNet V2 with Gated Recurrent Unit(GRU) performs better than any other Transfer Learning methods, with a perfect accuracy of 99% in classification of images as with or without occlusion and if with occlusion, then the type of occlusion. In parallel, identifying the Region of interest from the device captured image is done. This extracted Region of interest is utilised in face identification. Such a face identification process is done using the ResNet model with its Caffe implementation. To reduce the execution time, after the face occlusion type was recognized the person was searched to confirm their face image in the registered database. The face label of the person obtained from both simultaneous processes was verified for their matching score. If the matching score was above 90, the recognized label of the person was logged into a file with their name, type of mask, date, and time of recognition. MobileNetV2 is a lightweight framework which can also be used in embedded or IoT devices to perform real time detection and identification in suspicious areas of investigations using CCTV footages. When MobileNetV2 was combined with GRU, a reliable accuracy was obtained. The data provided in the paper belong to two categories, being either collected from Google Images for occlusion classification, face recognition, and facial landmarks, or collected in fieldwork. The motive behind this research is to identify and log person details which could serve surveillance activities in society-based e-governance.

摘要:近期的疫情使口罩使用量大增,即使面对遮挡的人脸图像,人员识别也变得尤为重要。本文旨在识别人脸图像中四种遮挡类型之一。我们测试了多种迁移学习方法,结果表明,结合门控循环单元(GRU)的 MobileNet V2 优于其他迁移学习方法,在判断图像是否存在遮挡、以及存在遮挡时属于何种类型的分类任务上达到了 99% 的准确率。与此同时,从设备采集的图像中提取感兴趣区域,并将其用于人脸识别;人脸识别过程使用 ResNet 模型及其 Caffe 实现完成。为缩短执行时间,在识别出人脸遮挡类型后,在注册数据库中检索该人员以确认其人脸图像。对两个并行过程得到的人脸标签进行匹配分数验证:若匹配分数高于 90,则将识别出的人员标签连同姓名、口罩类型、日期和识别时间记录到文件中。MobileNetV2 是一个轻量级框架,也可部署在嵌入式或物联网设备上,利用监控录像对可疑调查区域进行实时检测与识别;将 MobileNetV2 与 GRU 结合后获得了可靠的精度。本文使用的数据分为两类:一类从 Google 图片收集,用于遮挡分类、人脸识别和人脸关键点;另一类来自实地采集。这项研究的动机是识别并记录人员信息,以服务于基于社会的电子政务中的监控活动。
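摘要中描述了“匹配分数高于 90 才记录姓名、口罩类型与时间”的日志规则。下面是与之对应的纯 Python 示意(字段名与固定时间戳均为演示而设,并非原系统实现):

```python
from datetime import datetime

def log_if_recognized(name, mask_type, matching_score, threshold=90, log=None):
    """仅当人脸匹配分数超过阈值时,才把识别记录追加到日志列表。"""
    if log is None:
        log = []
    if matching_score > threshold:
        log.append({
            "name": name,
            "mask_type": mask_type,
            # 示例中使用固定时间;实际系统应取当前时间 datetime.now()
            "timestamp": datetime(2022, 5, 17, 11, 0).isoformat(),
        })
    return log

records = log_if_recognized("alice", "surgical_mask", 93)
records = log_if_recognized("bob", "no_mask", 72, log=records)  # 低于 90,不记录
```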

CV-38-标题 Real-centric Consistency Learning for Deepfake Detection

链接: https://arxiv.org/abs/2205.07201
作者: Ruiqi Zha, Zhichao Lian, Qianmu Li, Siqi Gu
备注:

点击查看摘要

Abstract: Most of previous deepfake detection researches bent their efforts to describe and discriminate artifacts in human perceptible ways, which leave a bias in the learned networks of ignoring some critical invariance features intra-class and underperforming the robustness of internet interference. Essentially, the target of deepfake detection problem is to represent natural faces and fake faces at the representation space discriminatively, and it reminds us whether we could optimize the feature extraction procedure at the representation space through constraining intra-class consistence and inter-class inconsistence to bring the intra-class representations close and push the inter-class representations apart? Therefore, inspired by contrastive representation learning, we tackle the deepfake detection problem through learning the invariant representations of both classes and propose a novel real-centric consistency learning method. We constraint the representation from both the sample level and the feature level. At the sample level, we take the procedure of deepfake synthesis into consideration and propose a novel forgery semantical-based pairing strategy to mine latent generation-related features. At the feature level, based on the centers of natural faces at the representation space, we design a hard positive mining and synthesizing method to simulate the potential marginal features. Besides, a hard negative fusion method is designed to improve the discrimination of negative marginal features with the help of supervised contrastive margin loss we developed. The effectiveness and robustness of the proposed method has been demonstrated through extensive experiments.

摘要:以往大多数 deepfake 检测研究致力于以人类可感知的方式描述和区分伪影,这使学习到的网络存在偏差:忽略了一些关键的类内不变特征,且对互联网传输干扰的鲁棒性不足。本质上,deepfake 检测问题的目标是在表示空间中对真实人脸与伪造人脸进行判别性表示。这提醒我们:能否通过约束类内一致性与类间不一致性来优化表示空间中的特征提取过程,使类内表示相互靠近、类间表示相互远离?因此,受对比表示学习的启发,我们通过学习两类样本的不变表示来解决 deepfake 检测问题,并提出一种新颖的以真实样本为中心(real-centric)的一致性学习方法。我们同时在样本层面和特征层面约束表示。在样本层面,我们考虑 deepfake 合成的流程,提出一种新颖的基于伪造语义的配对策略,以挖掘与生成过程相关的潜在特征。在特征层面,基于表示空间中真实人脸的中心,我们设计了一种困难正样本挖掘与合成方法来模拟潜在的边缘特征。此外,我们还设计了困难负样本融合方法,借助我们提出的有监督对比边界损失来改善对负边缘特征的判别。大量实验证明了所提方法的有效性和鲁棒性。

CV-39-标题 Promoting Saliency From Depth: Deep Unsupervised RGB-D Saliency Detection

链接: https://arxiv.org/abs/2205.07179
作者: Wei Ji, Jingjing Li, Qi Bi, Chuan Guo, Jie Liu, Li Cheng
备注: This paper appeared at ICLR 2022

点击查看摘要

Abstract: Growing interests in RGB-D salient object detection (RGB-D SOD) have been witnessed in recent years, owing partly to the popularity of depth sensors and the rapid progress of deep learning techniques. Unfortunately, existing RGB-D SOD methods typically demand large quantity of training images being thoroughly annotated at pixel-level. The laborious and time-consuming manual annotation has become a real bottleneck in various practical scenarios. On the other hand, current unsupervised RGB-D SOD methods still heavily rely on handcrafted feature representations. This inspires us to propose in this paper a deep unsupervised RGB-D saliency detection approach, which requires no manual pixel-level annotation during training. It is realized by two key ingredients in our training pipeline. First, a depth-disentangled saliency update (DSU) framework is designed to automatically produce pseudo-labels with iterative follow-up refinements, which provides more trustworthy supervision signals for training the saliency network. Second, an attentive training strategy is introduced to tackle the issue of noisy pseudo-labels, by properly re-weighting to highlight the more reliable pseudo-labels. Extensive experiments demonstrate the superior efficiency and effectiveness of our approach in tackling the challenging unsupervised RGB-D SOD scenarios. Moreover, our approach can also be adapted to work in fully-supervised situation. Empirical studies show the incorporation of our approach gives rise to notably performance improvement in existing supervised RGB-D SOD models.

摘要:近年来,RGB-D 显著目标检测(RGB-D SOD)受到越来越多的关注,这部分归因于深度传感器的普及和深度学习技术的快速发展。遗憾的是,现有 RGB-D SOD 方法通常需要大量经过像素级精细标注的训练图像,而费力且耗时的人工标注已成为各种实际场景中的真正瓶颈。另一方面,当前无监督 RGB-D SOD 方法仍严重依赖手工设计的特征表示。这启发我们在本文中提出一种深度无监督的 RGB-D 显著性检测方法,训练过程中无需任何人工像素级标注。该方法由训练流程中的两个关键部分实现。首先,设计了深度解耦显著性更新(DSU)框架,通过迭代式的后续细化自动生成伪标签,为显著性网络的训练提供更可信的监督信号。其次,引入一种注意力训练策略,通过适当的重新加权突出更可靠的伪标签,从而应对伪标签噪声问题。大量实验证明了我们的方法在应对具有挑战性的无监督 RGB-D SOD 场景时的高效性和有效性。此外,我们的方法也可适配到全监督场景:实证研究表明,引入我们的方法能显著提升现有有监督 RGB-D SOD 模型的性能。

CV-40-标题 Proxyless Neural Architecture Adaptation for Supervised Learning and Self-Supervised Learning

链接: https://arxiv.org/abs/2205.07168
作者: Do-Guk Kim, Heung-Chang Lee
备注: arXiv admin note: substantial text overlap with arXiv:2006.08231

点击查看摘要

Abstract: Recently, Neural Architecture Search (NAS) methods have been introduced and show impressive performance on many benchmarks. Among those NAS studies, Neural Architecture Transformer (NAT) aims to adapt the given neural architecture to improve performance while maintaining computational costs. However, NAT lacks reproducibility and it requires an additional architecture adaptation process before network weight training. In this paper, we propose proxyless neural architecture adaptation that is reproducible and efficient. Our method can be applied to both supervised learning and self-supervised learning. The proposed method shows stable performance on various architectures. Extensive reproducibility experiments on two datasets, i.e., CIFAR-10 and Tiny Imagenet, present that the proposed method definitely outperforms NAT and is applicable to other models and datasets.

摘要:近来,神经架构搜索(NAS)方法被提出,并在许多基准上展现出令人印象深刻的性能。在这些 NAS 研究中,神经架构变换器(NAT)旨在调整给定的神经架构,以在保持计算成本的同时提升性能。然而,NAT 缺乏可复现性,且在网络权重训练之前需要额外的架构适配过程。本文提出一种可复现且高效的无代理(proxyless)神经架构适配方法。该方法既适用于有监督学习,也适用于自监督学习,并在各类架构上表现稳定。在 CIFAR-10 和 Tiny ImageNet 两个数据集上的大量可复现性实验表明,所提方法明显优于 NAT,并可推广到其他模型和数据集。

CV-41-标题 GLaMa: Joint Spatial and Frequency Loss for General Image Inpainting

链接: https://arxiv.org/abs/2205.07162
作者: Zeyu Lu, Junjun Jiang, Junqin Huang, Gang Wu, Xianming Liu
备注: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

点击查看摘要

Abstract: The purpose of image inpainting is to recover scratches and damaged areas using context information from remaining parts. In recent years, thanks to the resurgence of convolutional neural networks (CNNs), image inpainting task has made great breakthroughs. However, most of the work consider insufficient types of mask, and their performance will drop dramatically when encountering unseen masks. To combat these challenges, we propose a simple yet general method to solve this problem based on the LaMa image inpainting framework, dubbed GLaMa. Our proposed GLaMa can better capture different types of missing information by using more types of masks. By incorporating more degraded images in the training phase, we can expect to enhance the robustness of the model with respect to various masks. In order to yield more reasonable results, we further introduce a frequency-based loss in addition to the traditional spatial reconstruction loss and adversarial loss. In particular, we introduce an effective reconstruction loss both in the spatial and frequency domain to reduce the chessboard effect and ripples in the reconstructed image. Extensive experiments demonstrate that our method can boost the performance over the original LaMa method for each type of mask on FFHQ, ImageNet, Places2 and WikiArt dataset. The proposed GLaMa was ranked first in terms of PSNR, LPIPS and SSIM in the NTIRE 2022 Image Inpainting Challenge Track 1 Unsupervised.

摘要:图像修复(inpainting)的目的是利用其余部分的上下文信息恢复划痕和受损区域。近年来,得益于卷积神经网络(CNN)的复兴,图像修复任务取得了重大突破。然而,大多数工作考虑的掩码类型不足,在遇到未见过的掩码时性能会急剧下降。为应对这些挑战,我们基于 LaMa 图像修复框架提出了一种简单而通用的方法,称为 GLaMa。所提出的 GLaMa 通过使用更多类型的掩码,能更好地捕获不同类型的缺失信息;通过在训练阶段纳入更多退化图像,可以增强模型对各种掩码的鲁棒性。为得到更合理的结果,除传统的空间重建损失和对抗损失外,我们进一步引入了基于频率的损失。特别地,我们在空间域和频域同时引入有效的重建损失,以减少重建图像中的棋盘效应和波纹。大量实验表明,在 FFHQ、ImageNet、Places2 和 WikiArt 数据集上,我们的方法在每种掩码类型上都优于原始 LaMa 方法。所提出的 GLaMa 在 NTIRE 2022 图像修复挑战赛 Track 1(无监督)中,在 PSNR、LPIPS 和 SSIM 指标上均排名第一。
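GLaMa 在空间重建损失之外加入了频域损失。下面用纯 Python 和一维 DFT 给出这种联合损失的极简示意(真实实现中应为作用于图像的二维 FFT;此处的权重与信号均为假设):

```python
import cmath

def dft_magnitudes(signal):
    """一维离散傅里叶变换的幅值(用一维信号代替二维图像作演示)。"""
    n = len(signal)
    mags = []
    for k in range(n):
        acc = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags

def joint_loss(pred, target, freq_weight=1.0):
    """空间 L1 重建损失与频域幅值 L1 损失的加权和。"""
    spatial = sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
    pm, tm = dft_magnitudes(pred), dft_magnitudes(target)
    frequency = sum(abs(a - b) for a, b in zip(pm, tm)) / len(pm)
    return spatial + freq_weight * frequency

perfect = joint_loss([0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0])  # 完全重建 -> 0
noisy = joint_loss([0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0])    # 有误差 -> 大于 0
```

频域项会惩罚空间上微小但在频谱上显著的棋盘状伪影,这正是加入该损失的动机。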

CV-42-标题 Evaluating Uncertainty Calibration for Open-Set Recognition

链接: https://arxiv.org/abs/2205.07160
作者: Zongyao Lyu, Nolan B. Gutierrez, William J. Beksi
备注: To be presented at the 2022 IEEE International Conference on Robotics and Automation (ICRA) Workshop on Safe and Reliable Robot Autonomy under Uncertainty

点击查看摘要

Abstract: Despite achieving enormous success in predictive accuracy for visual classification problems, deep neural networks (DNNs) suffer from providing overconfident probabilities on out-of-distribution (OOD) data. Yet, accurate uncertainty estimation is crucial for safe and reliable robot autonomy. In this paper, we evaluate popular calibration techniques for open-set conditions in a way that is distinctly different from the conventional evaluation of calibration methods on OOD data. Our results show that closed-set DNN calibration approaches are much less effective for open-set recognition, which highlights the need to develop new DNN calibration methods to address this problem.

摘要:尽管深度神经网络(DNN)在视觉分类问题的预测精度上取得了巨大成功,但它们在分布外(OOD)数据上往往给出过度自信的概率。而准确的不确定性估计对于安全可靠的机器人自主性至关重要。本文以一种与传统的基于 OOD 数据的校准方法评估截然不同的方式,评估了开放集条件下的流行校准技术。结果表明,闭集 DNN 校准方法在开放集识别中的有效性大幅下降,这凸显了开发新的 DNN 校准方法来解决该问题的必要性。
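评估校准质量时常用期望校准误差(ECE)。下面给出一个纯 Python 的 ECE 计算示意(等宽 10 分箱是常见取法,并非该论文指定的评估配置):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE:各置信度分箱内 |平均置信度 - 准确率| 的加权平均。"""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf=1.0 归入最后一箱
        bins[idx].append((conf, ok))
    ece, n = 0.0, len(confidences)
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece

# 过度自信的模型:置信度都是 0.95,但只有一半预测正确
ece = expected_calibration_error([0.95, 0.95, 0.95, 0.95],
                                 [True, False, True, False])  # -> 0.45
```

完美校准的模型 ECE 为 0;上例中 0.45 的大间隙正体现了摘要所述“过度自信”的情形。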

CV-43-标题 Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training

链接: https://arxiv.org/abs/2205.07139
作者: Constantin Seibold, Simon Reiß, M. Saquib Sarfraz, Rainer Stiefelhagen, Jens Kleesiek
备注: Provisionally Accepted at MICCAI2022

点击查看摘要

Abstract: When reading images, radiologists generate text reports describing the findings therein. Current state-of-the-art computer-aided diagnosis tools utilize a fixed set of predefined categories automatically extracted from these medical reports for training. This form of supervision limits the potential usage of models as they are unable to pick up on anomalies outside of their predefined set, thus, making it a necessity to retrain the classifier with additional data when faced with novel classes. In contrast, we investigate direct text supervision to break away from this closed set assumption. By doing so, we avoid noisy label extraction via text classifiers and incorporate more contextual information. We employ a contrastive global-local dual-encoder architecture to learn concepts directly from unstructured medical reports while maintaining its ability to perform free form classification. We investigate relevant properties of open set recognition for radiological data and propose a method to employ currently weakly annotated data into training. We evaluate our approach on the large-scale chest X-Ray datasets MIMIC-CXR, CheXpert, and ChestX-Ray14 for disease classification. We show that despite using unstructured medical report supervision, we perform on par with direct label supervision through a sophisticated inference setting.

摘要:放射科医生在阅片时会撰写描述其中发现的文本报告。当前最先进的计算机辅助诊断工具使用从这些医学报告中自动提取的一组固定的预定义类别进行训练。这种监督形式限制了模型的潜在用途:模型无法识别预定义集合之外的异常,因此在面对新类别时必须使用额外数据重新训练分类器。与之相反,我们研究直接的文本监督,以摆脱这种闭集假设。这样做可以避免通过文本分类器提取带噪声的标签,并纳入更多上下文信息。我们采用对比式的全局-局部双编码器架构,直接从非结构化医学报告中学习概念,同时保持自由形式分类的能力。我们研究了放射学数据开放集识别的相关特性,并提出一种将当前弱标注数据用于训练的方法。我们在大规模胸部 X 光数据集 MIMIC-CXR、CheXpert 和 ChestX-Ray14 上评估了疾病分类。结果表明,尽管使用的是非结构化医学报告监督,但通过精心设计的推理方式,我们的表现可与直接标签监督相当。

CV-44-标题 ETAD: A Unified Framework for Efficient Temporal Action Detection

链接: https://arxiv.org/abs/2205.07134
作者: Shuming Liu, Mengmeng Xu, Chen Zhao, Xu Zhao, Bernard Ghanem
备注: 17 pages, 5 figures

点击查看摘要

Abstract: Untrimmed video understanding such as temporal action detection (TAD) often suffers from the pain of huge demand for computing resources. Because of long video durations and limited GPU memory, most action detectors can only operate on pre-extracted features rather than the original videos, and they still require a lot of computation to achieve high detection performance. To alleviate the heavy computation problem in TAD, in this work, we first propose an efficient action detector with detector proposal sampling, based on the observation that performance saturates at a small number of proposals. This detector is designed with several important techniques, such as LSTM-boosted temporal aggregation and cascaded proposal refinement to achieve high detection quality as well as low computational cost. To enable joint optimization of this action detector and the feature encoder, we also propose encoder gradient sampling, which selectively back-propagates through video snippets and tremendously reduces GPU memory consumption. With the two sampling strategies and the effective detector, we build a unified framework for efficient end-to-end temporal action detection (ETAD), making real-world untrimmed video understanding tractable. ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3. Interestingly, on ActivityNet-1.3, it reaches 37.78% average mAP, while only requiring 6 mins of training time and 1.23 GB memory based on pre-extracted features. With end-to-end training, it reduces the GPU memory footprint by more than 70% with even higher performance (38.21% average mAP), as compared with traditional end-to-end methods. The code is available at this https URL.

摘要:未剪辑视频理解任务(如时序动作检测,TAD)往往面临计算资源需求巨大的困扰。由于视频时长较长且 GPU 显存有限,大多数动作检测器只能基于预提取的特征而非原始视频进行操作,并且仍需大量计算才能达到高检测性能。为缓解 TAD 中的繁重计算问题,本文首先基于“性能在少量候选提议时即达到饱和”这一观察,提出了带有检测器提议采样的高效动作检测器。该检测器采用了若干重要技术,如 LSTM 增强的时序聚合和级联提议细化,以在低计算成本下实现高检测质量。为实现动作检测器与特征编码器的联合优化,我们还提出了编码器梯度采样,有选择地仅对部分视频片段进行反向传播,从而大幅降低 GPU 显存消耗。借助这两种采样策略和高效的检测器,我们构建了一个用于高效端到端时序动作检测(ETAD)的统一框架,使真实世界的未剪辑视频理解变得可行。ETAD 在 THUMOS-14 和 ActivityNet-1.3 上均达到最先进性能。有趣的是,在 ActivityNet-1.3 上,它基于预提取特征达到 37.78% 的平均 mAP,而只需 6 分钟训练时间和 1.23 GB 显存。在端到端训练下,与传统端到端方法相比,它将 GPU 显存占用降低了 70% 以上,性能甚至更高(平均 mAP 38.21%)。代码可在文中 HTTPS URL 获取。
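ETAD 的第一个采样策略是只用少量随机采样的候选提议训练检测头。下面是该思路的纯 Python 示意(提议的表示方式与采样数均为假设,并非原实现):

```python
import random

def sample_proposals(proposals, k, seed=0):
    """随机采样 k 个候选提议,检测头仅对这部分提议计算损失与梯度。"""
    rng = random.Random(seed)  # 固定种子便于演示
    k = min(k, len(proposals))
    return rng.sample(proposals, k)

# 10 个 (start, end) 形式的时序提议(单位:秒,仅为演示)
all_proposals = [(s / 10.0, s / 10.0 + 0.2) for s in range(10)]
trained_on = sample_proposals(all_proposals, 3)
```

由于性能在少量提议时即饱和,这样做可以在几乎不损失精度的前提下大幅减少检测头的计算量。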

CV-45-标题 Classification of Astronomical Bodies by Efficient Layer Fine-Tuning of Deep Neural Networks

链接: https://arxiv.org/abs/2205.07124
作者: Sabeesh Ethiraj, Bharath Kumar Bolla
备注: Accepted at 5th Conference on Information and Communication Technology (CICT), 2021

点击查看摘要

Abstract: The SDSS-IV dataset contains information about various astronomical bodies such as Galaxies, Stars, and Quasars captured by observatories. Inspired by our work on deep multimodal learning, which utilized transfer learning to classify the SDSS-IV dataset, we further extended our research in the fine tuning of these architectures to study the effect in the classification scenario. Architectures such as Resnet-50, DenseNet-121 VGG-16, Xception, EfficientNetB2, MobileNetV2 and NasnetMobile have been built using layer wise fine tuning at different levels. Our findings suggest that freezing all layers with Imagenet weights and adding a final trainable layer may not be the optimal solution. Further, baseline models and models that have higher number of trainable layers performed similarly in certain architectures. Model need to be fine tuned at different levels and a specific training ratio is required for a model to be termed ideal. Different architectures had different responses to the change in the number of trainable layers w.r.t accuracies. While models such as DenseNet-121, Xception, EfficientNetB2 achieved peak accuracies that were relatively consistent with near perfect training curves, models such as Resnet-50,VGG-16, MobileNetV2 and NasnetMobile had lower, delayed peak accuracies with poorly fitting training curves. It was also found that though mobile neural networks have lesser parameters and model size, they may not always be ideal for deployment on a low computational device as they had consistently lower validation accuracies. Customized evaluation metrics such as Tuning Parameter Ratio and Tuning Layer Ratio are used for model evaluation.

摘要:SDSS-IV 数据集包含由天文台拍摄的星系、恒星和类星体等各种天体的信息。受我们此前利用迁移学习对 SDSS-IV 数据集进行分类的深度多模态学习工作的启发,我们进一步研究这些架构的微调对分类效果的影响。我们对 ResNet-50、DenseNet-121、VGG-16、Xception、EfficientNetB2、MobileNetV2 和 NasNetMobile 等架构在不同层级进行了逐层微调。我们的发现表明,冻结所有带 ImageNet 权重的层并只添加一个可训练的最终层未必是最优方案。此外,在某些架构中,基线模型与可训练层数更多的模型表现相近。模型需要在不同层级进行微调,且要达到理想效果需要特定的训练比例。不同架构对可训练层数变化的精度响应各不相同:DenseNet-121、Xception、EfficientNetB2 等模型达到了相对一致的峰值精度,训练曲线接近完美拟合;而 ResNet-50、VGG-16、MobileNetV2 和 NasNetMobile 等模型的峰值精度较低且出现得较晚,训练曲线拟合较差。我们还发现,尽管移动端神经网络的参数量和模型体积更小,但由于其验证精度始终较低,它们未必总是适合部署在低算力设备上。模型评估使用了自定义指标,如可调参数比例(Tuning Parameter Ratio)和可调层比例(Tuning Layer Ratio)。
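文中提到自定义的“可调层比例(Tuning Layer Ratio)”指标。下面用纯 Python 给出“冻结前 n 层、统计可训练层占比”的简化示意(层名为假设,指标的精确定义以原文为准):

```python
def tuning_layer_ratio(layers, n_frozen):
    """冻结前 n_frozen 层后,返回可训练层占比及其名称列表。"""
    trainable = [name for i, name in enumerate(layers) if i >= n_frozen]
    return len(trainable) / len(layers), trainable

# 假设的 5 层骨干网络:冻结前 3 层,仅微调后 2 层
layers = ["block1", "block2", "block3", "block4", "head"]
ratio, trainable = tuning_layer_ratio(layers, n_frozen=3)  # -> 0.4
```

逐层微调实验即是在不同的 n_frozen 取值下训练模型,并比较精度随该比例的变化。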

CV-46-标题 Revisiting Facial Key Point Detection: An Efficient Approach Using Deep Neural Networks

链接: https://arxiv.org/abs/2205.07121
作者: Prathima Dileep, Bharath Kumar Bolla, Sabeesh Ethiraj
备注: Accepted at international Conference On Big Data, Machine Learning and Applications (BigDML 2021)

点击查看摘要

Abstract: Facial landmark detection is a widely researched field of deep learning as this has a wide range of applications in many fields. These key points are distinguishing characteristic points on the face, such as the eyes center, the eye’s inner and outer corners, the mouth center, and the nose tip from which human emotions and intent can be explained. The focus of our work has been evaluating transfer learning models such as MobileNetV2 and NasNetMobile, including custom CNN architectures. The objective of the research has been to develop efficient deep learning models in terms of model size, parameters, and inference time and to study the effect of augmentation imputation and fine-tuning on these models. It was found that while augmentation techniques produced lower RMSE scores than imputation techniques, they did not affect the inference time. MobileNetV2 architecture produced the lowest RMSE and inference time. Moreover, our results indicate that manually optimized CNN architectures performed similarly to Auto Keras tuned architecture. However, manually optimized architectures yielded better inference time and training curves.

摘要:人脸关键点检测是深度学习中被广泛研究的领域,在许多场景中有着广泛的应用。这些关键点是人脸上具有区分性的特征点,例如眼睛中心、眼睛的内外眼角、嘴部中心和鼻尖,可据此解释人的情绪和意图。我们的工作重点在于评估 MobileNetV2 和 NasNetMobile 等迁移学习模型以及自定义 CNN 架构。研究目标是在模型大小、参数量和推理时间方面开发高效的深度学习模型,并研究数据增强、缺失值插补和微调对这些模型的影响。结果发现,虽然数据增强技术得到的 RMSE 分数低于插补技术,但两者都不影响推理时间。MobileNetV2 架构取得了最低的 RMSE 和推理时间。此外,我们的结果表明,手动优化的 CNN 架构与 AutoKeras 调优的架构表现相近,但手动优化的架构具有更好的推理时间和训练曲线。
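文中以 RMSE 作为关键点检测的评价指标。下面是对 (x, y) 关键点坐标计算 RMSE 的纯 Python 示意(坐标数值为演示而设):

```python
import math

def keypoint_rmse(pred, target):
    """对展平后的 (x, y) 关键点坐标计算均方根误差。"""
    flat_p = [c for kp in pred for c in kp]
    flat_t = [c for kp in target for c in kp]
    mse = sum((p - t) ** 2 for p, t in zip(flat_p, flat_t)) / len(flat_p)
    return math.sqrt(mse)

pred = [(10.0, 20.0), (30.0, 40.0)]    # 例如两个眼睛中心的预测坐标
target = [(12.0, 20.0), (30.0, 38.0)]  # 对应的标注坐标
rmse = keypoint_rmse(pred, target)     # -> sqrt(2) ≈ 1.414
```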

CV-47-标题 Efficient Deep Learning Methods for Identification of Defective Casting Products

链接: https://arxiv.org/abs/2205.07118
作者: Bharath Kumar Bolla, Mohan Kingam, Sabeesh Ethiraj
备注: Accepted at ICCR 2021: International Conference on Cognition and Recognition 2021

点击查看摘要

Abstract: Quality inspection has become crucial in any large-scale manufacturing industry recently. In order to reduce human error, it has become imperative to use efficient and low computational AI algorithms to identify such defective products. In this paper, we have compared and contrasted various pre-trained and custom-built architectures using model size, performance and CPU latency in the detection of defective casting products. Our results show that custom architectures are efficient than pre-trained mobile architectures. Moreover, custom models perform 6 to 9 times faster than lightweight models such as MobileNetV2 and NasNet. The number of training parameters and the model size of the custom architectures is significantly lower (~386 times & ~119 times respectively) than the best performing models such as MobileNetV2 and NasNet. Augmentation experimentations have also been carried out on the custom architectures to make the models more robust and generalizable. Our work sheds light on the efficiency of these custom-built architectures for deployment on Edge and IoT devices and that transfer learning models may not always be ideal. Instead, they should be specific to the kind of dataset and the classification problem at hand.

摘要:近来,质量检测在任何大规模制造业中都变得至关重要。为了减少人为错误,必须使用高效且计算量低的 AI 算法来识别此类缺陷产品。本文从模型大小、性能和 CPU 延迟三个角度,对多种预训练架构和自定义架构在缺陷铸件检测中的表现进行了比较与对比。我们的结果表明,自定义架构比预训练的移动端架构更高效:自定义模型的运行速度比 MobileNetV2 和 NasNet 等轻量级模型快 6 到 9 倍,其训练参数量和模型大小也显著低于表现最佳的 MobileNetV2 和 NasNet 模型(分别约为其 1/386 和 1/119)。我们还对自定义架构进行了数据增强实验,使模型更加稳健且更具泛化能力。我们的工作揭示了这些自定义架构在边缘和物联网设备上部署的效率,并说明迁移学习模型未必总是理想选择,而应针对具体数据集和分类问题进行设计。

CV-48-标题 Differentiable SAR Renderer and SAR Target Reconstruction

链接: https://arxiv.org/abs/2205.07099
作者: Shilei Fu, Feng Xu
备注:

点击查看摘要

Abstract: Forward modeling of wave scattering and radar imaging mechanisms is the key to information extraction from synthetic aperture radar (SAR) images. Like inverse graphics in optical domain, an inherently-integrated forward-inverse approach would be promising for SAR advanced information retrieval and target reconstruction. This paper presents such an attempt to the inverse graphics for SAR imagery. A differentiable SAR renderer (DSR) is developed which reformulates the mapping and projection algorithm of SAR imaging mechanism in the differentiable form of probability maps. First-order gradients of the proposed DSR are then analytically derived which can be back-propagated from rendered image/silhouette to the target geometry and scattering attributes. A 3D inverse target reconstruction algorithm from SAR images is devised. Several simulation and reconstruction experiments are conducted, including targets with and without background, using both synthesized data or real measured inverse SAR (ISAR) data by ground radar. Results demonstrate the efficacy of the proposed DSR and its inverse approach.

摘要:波散射与雷达成像机制的前向建模是从合成孔径雷达(SAR)图像中提取信息的关键。与光学领域的逆向图形学(inverse graphics)类似,一种内在一体化的前向-逆向方法有望用于 SAR 高级信息检索和目标重建。本文对 SAR 图像的逆向图形学进行了这样一次尝试:我们开发了一个可微 SAR 渲染器(DSR),以概率图的可微形式重新表述了 SAR 成像机制的映射与投影算法;随后解析推导了 DSR 的一阶梯度,可将其从渲染图像/轮廓反向传播到目标几何形状与散射属性;并设计了基于 SAR 图像的 3D 逆向目标重建算法。我们使用合成数据和地面雷达实测的逆 SAR(ISAR)数据,进行了多项仿真与重建实验,涵盖有背景和无背景的目标。结果证明了所提 DSR 及其逆向方法的有效性。

CV-49-标题 Multi-modal curb detection and filtering

链接: https://arxiv.org/abs/2205.07096
作者: Sandipan Das, Navid Mahabadi, Saikat Chatterjee, Maurice Fallon
备注:

点击查看摘要

Abstract: Reliable knowledge of road boundaries is critical for autonomous vehicle navigation. We propose a robust curb detection and filtering technique based on the fusion of camera semantics and dense lidar point clouds. The lidar point clouds are collected by fusing multiple lidars for robust feature detection. The camera semantics are based on a modified EfficientNet architecture which is trained with labeled data collected from onboard fisheye cameras. The point clouds are associated with the closest curb segment with L2-norm analysis after projecting into the image space with the fisheye model projection. Next, the selected points are clustered using unsupervised density-based spatial clustering to detect different curb regions. As new curb points are detected in consecutive frames they are associated with the existing curb clusters using temporal reachability constraints. If no reachability constraints are found a new curb cluster is formed from these new points. This ensures we can detect multiple curbs present in road segments consisting of multiple lanes if they are in the sensors’ field of view. Finally, Delaunay filtering is applied for outlier removal and its performance is compared to traditional RANSAC-based filtering. An objective evaluation of the proposed solution is done using a high-definition map containing ground truth curb points obtained from a commercial map supplier. The proposed system has proven capable of detecting curbs of any orientation in complex urban road scenarios comprising straight roads, curved roads, and intersections with traffic isles.

摘要:对道路边界的可靠感知对自动驾驶汽车导航至关重要。我们提出一种基于相机语义与稠密激光雷达点云融合的鲁棒路缘检测与过滤技术。为实现稳健的特征检测,激光雷达点云由多个激光雷达融合采集;相机语义则基于改进的 EfficientNet 架构,使用车载鱼眼相机采集并标注的数据进行训练。点云经鱼眼模型投影到图像空间后,通过 L2 范数分析与最近的路缘段相关联。接着,使用无监督的基于密度的空间聚类对所选点进行聚类,以检测不同的路缘区域。当在连续帧中检测到新的路缘点时,利用时间可达性约束将其与已有的路缘簇相关联;若不满足任何可达性约束,则由这些新点构成新的路缘簇。这确保了只要处于传感器视野内,我们就能检测出多车道路段中存在的多条路缘。最后,应用 Delaunay 过滤去除离群点,并将其性能与传统的基于 RANSAC 的过滤进行比较。我们使用包含商业地图供应商提供的真值路缘点的高精地图,对所提方案进行了客观评估。所提系统已被证明能够在包含直路、弯路及带交通岛的交叉路口等复杂城市道路场景中,检测任意朝向的路缘。
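摘要中“将投影后的点通过 L2 范数分析关联到最近路缘段”这一步,可以用点到线段的 L2 距离来示意。下面是一个自包含的纯 Python 草图(坐标与线段均为假设,仅演示关联逻辑):

```python
def point_to_segment_distance(p, a, b):
    """二维点 p 到线段 ab 的 L2 距离。"""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # 线段退化为点
    else:
        # 把投影参数截断到 [0, 1],保证最近点落在线段上
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    cx, cy = ax + t * dx, ay + t * dy
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5

def associate(point, segments):
    """返回与该点 L2 距离最近的路缘段下标。"""
    dists = [point_to_segment_distance(point, a, b) for a, b in segments]
    return dists.index(min(dists))

segments = [((0.0, 0.0), (10.0, 0.0)), ((0.0, 5.0), (10.0, 5.0))]  # 两条候选路缘线
idx = associate((3.0, 1.0), segments)  # -> 0(距离 1 对 4)
```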

CV-50-标题 Monitoring of Pigmented Skin Lesions Using 3D Whole Body Imaging

链接: https://arxiv.org/abs/2205.07085
作者: David Ahmedt-Aristizabal, Chuong Nguyen, Lachlan Tychsen-Smith, Ashley Stacey, Shenghong Li, Joseph Pathikulangara, Lars Petersson, Dadong Wang
备注:

点击查看摘要

Abstract: Modern data-driven machine learning research that enables revolutionary advances in image analysis has now become a critical tool to redefine how skin lesions are documented, mapped, and tracked. We propose a 3D whole body imaging prototype to enable rapid evaluation and mapping of skin lesions. A modular camera rig arranged in a cylindrical configuration is designed to automatically capture synchronised images from multiple angles for entire body scanning. We develop algorithms for 3D body image reconstruction, data processing and skin lesion detection based on deep convolutional neural networks. We also propose a customised, intuitive and flexible interface that allows the user to interact and collaborate with the machine to understand the data. The hybrid of the human and computer is represented by the analysis of 2D lesion detection, 3D mapping and data management. The experimental results using synthetic and real images demonstrate the effectiveness of the proposed solution by providing multiple views of the target skin lesion, enabling further 3D geometry analysis. Skin lesions are identified as outliers which deserve more attention from a skin cancer physician. Our detector identifies lesions at a comparable performance level as a physician. The proposed 3D whole body imaging system can be used by dermatological clinics, allowing for fast documentation of lesions, quick and accurate analysis of the entire body to detect suspicious lesions. Because of its fast examination, the method might be used for screening or epidemiological investigations. 3D data analysis has the potential to change the paradigm of total-body photography with many applications in skin diseases, including inflammatory and pigmentary disorders.

摘要:推动图像分析取得革命性进展的现代数据驱动机器学习研究,如今已成为重新定义皮肤病变记录、映射与跟踪方式的关键工具。我们提出了一个3D全身成像原型,以实现皮肤病变的快速评估与映射。一套按圆柱形布局排列的模块化相机装置可自动从多个角度捕获同步图像,完成整个身体的扫描。我们开发了基于深度卷积神经网络的3D人体图像重建、数据处理和皮肤病变检测算法。我们还提出了一个定制化、直观且灵活的界面,允许用户与机器交互和协作以理解数据。人机协同体现在2D病变检测、3D映射和数据管理的分析之中。使用合成图像和真实图像的实验结果通过提供目标皮肤病变的多个视图并支持进一步的3D几何分析,证明了所提方案的有效性。皮肤病变被识别为值得皮肤癌医生进一步关注的异常点,且我们的检测器以与医师相当的性能水平识别病变。所提出的3D全身成像系统可供皮肤科诊所使用,能够快速记录病变,并对全身进行快速、准确的分析以检测可疑病变。由于检查速度快,该方法还可用于筛查或流行病学调查。3D数据分析有望改变全身摄影的范式,在包括炎症性和色素性疾病在内的多种皮肤病中具有广泛应用。

CV-51-标题 Spiking Approximations of the MaxPooling Operation in Deep SNNs

链接: https://arxiv.org/abs/2205.07076
作者: Ramashish Gaurav, Bryan Tripp, Apurva Narayan
备注: Accepted in IJCNN-2022

点击查看摘要

Abstract: Spiking Neural Networks (SNNs) are an emerging domain of biologically inspired neural networks that have shown promise for low-power AI. A number of methods exist for building deep SNNs, with Artificial Neural Network (ANN)-to-SNN conversion being highly successful. MaxPooling layers in Convolutional Neural Networks (CNNs) are an integral component to downsample the intermediate feature maps and introduce translational invariance, but the absence of their hardware-friendly spiking equivalents limits such CNNs’ conversion to deep SNNs. In this paper, we present two hardware-friendly methods to implement Max-Pooling in deep SNNs, thus facilitating easy conversion of CNNs with MaxPooling layers to SNNs. In a first, we also execute SNNs with spiking-MaxPooling layers on Intel’s Loihi neuromorphic hardware (with MNIST, FMNIST, & CIFAR10 dataset); thus, showing the feasibility of our approach.

摘要:尖峰神经网络(SNN)是受生物启发的神经网络的新兴领域,在低功耗AI方面展现出前景。构建深层SNN的方法有很多,其中人工神经网络(ANN)到SNN的转换非常成功。卷积神经网络(CNN)中的MaxPooling层是对中间特征图进行下采样并引入平移不变性的重要组成部分,但由于缺乏硬件友好的尖峰等效实现,限制了此类CNN向深层SNN的转换。在本文中,我们提出了两种硬件友好的方法,在深层SNN中实现MaxPooling,从而便于将带有MaxPooling层的CNN转换为SNN。我们还首次在英特尔的Loihi神经形态硬件上(使用MNIST、FMNIST和CIFAR10数据集)运行了带有尖峰MaxPooling层的SNN,从而证明了我们方法的可行性。
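为直观理解上文摘要中“对 MaxPooling 的尖峰近似”这一思路,下面给出一个极简的纯 Python 草图:用累计脉冲数(速率编码)近似选出池化窗口内“最强”的神经元,并转发其脉冲。注意这只是一个示意性的玩具实现,并非论文提出的那两种硬件友好方法。

```python
# Toy rate-coded approximation of MaxPooling over spike trains.
# The pooled output relays the spike of whichever input neuron has the
# highest running spike count (an illustrative stand-in only -- NOT the
# paper's two hardware-friendly methods).

def spiking_maxpool(spike_trains):
    """spike_trains: list of equal-length 0/1 lists, one per neuron in the window."""
    counts = [0] * len(spike_trains)
    out = []
    for t in range(len(spike_trains[0])):
        for i, train in enumerate(spike_trains):
            counts[i] += train[t]
        winner = max(range(len(counts)), key=counts.__getitem__)
        out.append(spike_trains[winner][t])
    return out

trains = [
    [0, 1, 0, 0, 1, 0],  # low-rate neuron
    [1, 1, 0, 1, 1, 1],  # high-rate neuron -> should dominate the pooled output
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
pooled = spiking_maxpool(trains)
print(pooled)
```

在速率编码视角下,池化输出的脉冲率会逼近窗口内的最大脉冲率;真实的 SNN 实现还需考虑膜电位动力学与硬件约束。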

CV-52-标题 Corrosion Detection for Industrial Objects: From Multi-Sensor System to 5D Feature Space

链接: https://arxiv.org/abs/2205.07075
作者: Dennis Haitz, Boris Jutzi, Patrick Huebner, Markus Ulrich
备注: 8 pages, 4 figures

点击查看摘要

Abstract: Corrosion is a form of damage that often appears on the surface of metal-made objects used in industrial applications. Those damages can be critical depending on the purpose of the used object. Optical-based testing systems provide a form of non-contact data acquisition, where the acquired data can then be used to analyse the surface of an object. In the field of industrial image processing, this is called surface inspection. We provide a testing setup consisting of a rotary table which rotates the object by 360 degrees, as well as industrial RGB cameras and laser triangulation sensors for the acquisition of 2D and 3D data as our multi-sensor system. These sensors acquire data while the object to be tested takes a full rotation. Further on, data augmentation is applied to prepare new data or enhance already acquired data. In order to evaluate the impact of a laser triangulation sensor for corrosion detection, one challenge is to at first fuse the data of both domains. After the data fusion process, 5 different channels can be utilized to create a 5D feature space. Besides the red, green and blue channels of the image (1-3), additional range data from the laser triangulation sensor is incorporated (4). As a fifth channel, said sensor provides additional intensity data (5). With a multi-channel image classification, a 5D feature space will lead to slightly superior results opposed to a 3D feature space, composed of only the RGB channels of the image.

摘要:腐蚀是一种常出现在工业应用所用金属物体表面的损伤形式。根据物体的用途,这些损伤可能是致命的。基于光学的检测系统提供了一种非接触式数据采集方式,所采集的数据可用于分析物体表面。在工业图像处理领域,这被称为表面检测。我们提供了一个测试装置,包括一个将物体旋转360度的转台,以及用于采集2D和3D数据的工业RGB相机和激光三角测量传感器,共同构成我们的多传感器系统。这些传感器在被测物体完成一整圈旋转的同时采集数据。此外,我们还应用数据增强来准备新数据或增强已采集的数据。为了评估激光三角测量传感器对腐蚀检测的影响,一个挑战是首先融合两个域的数据。在数据融合之后,可以利用5个不同的通道来构建5D特征空间:除图像的红、绿、蓝通道(1-3)外,还加入了来自激光三角测量传感器的距离数据(4);作为第五通道,该传感器还提供额外的强度数据(5)。在多通道图像分类中,5D特征空间的结果略优于仅由图像RGB通道组成的3D特征空间。
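摘要中“5 通道特征空间”的构建方式可以用下面的纯 Python 草图来示意:将 RGB 三通道与激光三角测量传感器的距离、强度两通道逐像素拼接(所有数值均为虚构的示例数据):

```python
# Sketch of the 5D per-pixel feature space described above: channels 1-3
# from the RGB camera, channel 4 range and channel 5 intensity from the
# laser triangulation sensor. All values below are made up for illustration.

def fuse_5d(rgb, rng, intensity):
    """rgb: HxW grid of (r, g, b); rng/intensity: HxW grids of scalars.
    Returns an HxW grid of 5-tuples (r, g, b, range, intensity)."""
    h, w = len(rgb), len(rgb[0])
    return [[(*rgb[y][x], rng[y][x], intensity[y][x]) for x in range(w)]
            for y in range(h)]

rgb = [[(120, 60, 30), (110, 58, 28)]]
rng = [[0.42, 0.40]]      # hypothetical range readings
inten = [[0.7, 0.9]]      # hypothetical intensity readings
fused = fuse_5d(rgb, rng, inten)
print(fused[0][0])
```

随后即可把这样的 5 通道图像送入多通道图像分类器;按摘要所述,其结果略优于仅用 RGB 的 3 通道输入。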

CV-53-标题 An Architecture for the detection of GAN-generated Flood Images with Localization Capabilities

链接: https://arxiv.org/abs/2205.07073
作者: Jun Wang, Omran Alamayreh, Benedetta Tondi, Mauro Barni
备注:

点击查看摘要

Abstract: In this paper, we address a new image forensics task, namely the detection of fake flood images generated by ClimateGAN architecture. We do so by proposing a hybrid deep learning architecture including both a detection and a localization branch, the latter being devoted to the identification of the image regions manipulated by ClimateGAN. Even if our goal is the detection of fake flood images, in fact, we found that adding a localization branch helps the network to focus on the most relevant image regions with significant improvements in terms of generalization capabilities and robustness against image processing operations. The good performance of the proposed architecture is validated on two datasets of pristine flood images downloaded from the internet and three datasets of fake flood images generated by ClimateGAN starting from a large set of diverse street images.

摘要:在本文中,我们提出了一项新的图像取证任务,即检测由ClimateGAN架构生成的虚假洪水图像。为此,我们提出了一种混合深度学习架构,包括检测分支和定位分支,后者用于识别被ClimateGAN操纵的图像区域。事实上,即使我们的目标只是检测虚假洪水图像,我们也发现加入定位分支有助于网络聚焦于最相关的图像区域,在泛化能力和对图像处理操作的鲁棒性方面都有显著提升。所提架构的良好性能在从互联网下载的两个真实洪水图像数据集,以及由ClimateGAN从大量多样的街景图像生成的三个虚假洪水图像数据集上得到了验证。

CV-54-标题 RTMV: A Ray-Traced Multi-View Synthetic Dataset for Novel View Synthesis

链接: https://arxiv.org/abs/2205.07058
作者: Jonathan Tremblay, Moustafa Meshry, Alex Evans, Jan Kautz, Alexander Keller, Sameh Khamis, Charles Loop, Nathan Morrical, Koki Nagano, Towaki Takikawa, Stan Birchfield
备注: Project page at this http URL

点击查看摘要

Abstract: We present a large-scale synthetic dataset for novel view synthesis consisting of ~300k images rendered from nearly 2000 complex scenes using high-quality ray tracing at high resolution (1600 x 1600 pixels). The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis, thus providing a large unified benchmark for both training and evaluation. Using 4 distinct sources of high-quality 3D meshes, the scenes of our dataset exhibit challenging variations in camera views, lighting, shape, materials, and textures. Because our dataset is too large for existing methods to process, we propose Sparse Voxel Light Field (SVLF), an efficient voxel-based light field approach for novel view synthesis that achieves comparable performance to NeRF on synthetic data, while being an order of magnitude faster to train and two orders of magnitude faster to render. SVLF achieves this speed by relying on a sparse voxel octree, careful voxel sampling (requiring only a handful of queries per ray), and reduced network structure; as well as ground truth depth maps at training time. Our dataset is generated by NViSII, a Python-based ray tracing renderer, which is designed to be simple for non-experts to use and share, flexible and powerful through its use of scripting, and able to create high-quality and physically-based rendered images. Experiments with a subset of our dataset allow us to compare standard methods like NeRF and mip-NeRF for single-scene modeling, and pixelNeRF for category-level modeling, pointing toward the need for future improvements in this area.

摘要:我们提出了一个用于新视角合成的大规模合成数据集,包含约30万张图像,这些图像使用高质量光线追踪以高分辨率(1600 x 1600像素)从近2000个复杂场景中渲染而成。该数据集比现有的新视角合成合成数据集大几个数量级,从而为训练和评估提供了一个大型统一基准。借助4个不同来源的高质量3D网格,我们数据集的场景在相机视角、光照、形状、材质和纹理方面呈现出具有挑战性的变化。由于现有方法无法处理如此庞大的数据集,我们提出了稀疏体素光场(SVLF),这是一种高效的基于体素的新视角合成方法,在合成数据上可达到与NeRF相当的性能,同时训练速度快一个数量级,渲染速度快两个数量级。SVLF通过依赖稀疏体素八叉树、精细的体素采样(每条光线只需少量查询)、精简的网络结构以及训练时的真值深度图来实现这一速度。我们的数据集由NViSII生成,这是一个基于Python的光线追踪渲染器,其设计目标是让非专家也能轻松使用和共享,通过脚本化实现灵活强大的功能,并能生成高质量、基于物理的渲染图像。基于数据集子集的实验使我们能够比较用于单场景建模的NeRF和mip-NeRF等标准方法,以及用于类别级建模的pixelNeRF,并指出了该领域未来改进的方向。

CV-55-标题 Transformer Scale Gate for Semantic Segmentation

链接: https://arxiv.org/abs/2205.07056
作者: Hengcan Shi, Munawar Hayat, Jianfei Cai
备注:

点击查看摘要

Abstract: Effectively encoding multi-scale contextual information is crucial for accurate semantic segmentation. Existing transformer-based segmentation models combine features across scales without any selection, where features on sub-optimal scales may degrade segmentation outcomes. Leveraging from the inherent properties of Vision Transformers, we propose a simple yet effective module, Transformer Scale Gate (TSG), to optimally combine multi-scale features. TSG exploits cues in self and cross attentions in Vision Transformers for the scale selection. TSG is a highly flexible plug-and-play module, and can easily be incorporated with any encoder-decoder-based hierarchical vision Transformer architecture. Extensive experiments on the Pascal Context and ADE20K datasets demonstrate that our feature selection strategy achieves consistent gains.

摘要:有效地编码多尺度上下文信息对于准确的语义分割至关重要。现有的基于Transformer的分割模型不加选择地组合各个尺度的特征,而次优尺度上的特征可能会降低分割效果。利用视觉Transformer的固有特性,我们提出了一个简单而有效的模块,即Transformer尺度门(TSG),以最优地组合多尺度特征。TSG利用视觉Transformer中自注意力和交叉注意力的线索进行尺度选择。TSG是一个高度灵活的即插即用模块,可以轻松地集成到任何基于编码器-解码器的分层视觉Transformer架构中。在Pascal Context和ADE20K数据集上的大量实验表明,我们的特征选择策略可以取得一致的增益。
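TSG 的“按尺度加权融合”思想可以用下面的玩具草图来示意:对每个尺度给定一个标量线索,经 softmax 得到权重后对各尺度特征加权求和。真实的 TSG 从视觉 Transformer 的自注意力/交叉注意力中得到这些线索,此处将其简化为假设的标量输入:

```python
import math

# Toy scale-gating sketch: softmax over per-scale scalar "cues", then a
# weighted sum of per-scale feature vectors. The real TSG derives the cues
# from attention maps inside a Vision Transformer; here they are just inputs.

def scale_gate(features_per_scale, cues):
    """features_per_scale: equal-length feature vectors, one per scale.
    cues: one scalar per scale (higher = scale deemed more relevant)."""
    m = max(cues)                                # stable softmax
    exps = [math.exp(c - m) for c in cues]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(features_per_scale[0])
    return [sum(w * f[i] for w, f in zip(weights, features_per_scale))
            for i in range(dim)]

fused = scale_gate([[1.0, 0.0], [0.0, 1.0]], cues=[2.0, 0.0])
print(fused)  # dominated by the first (higher-cue) scale
```

当各尺度线索相等时,该门控退化为对各尺度特征的简单平均。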

CV-56-标题 Realistic Defocus Blur for Multiplane Computer-Generated Holography

链接: https://arxiv.org/abs/2205.07030
作者: Koray Kavaklı, Yuta Itoh, Hakan Urey, Kaan Akşit
备注: 17 pages in total, first 7 pages are for the manuscript, remaining pages are for supplementary. For more visit: this https URL For our codebase visit this https URL

点击查看摘要

Abstract: This paper introduces a new multiplane CGH computation method to reconstruct artefact-free high-quality holograms with natural-looking defocus blur. Our method introduces a new targeting scheme and a new loss function. While the targeting scheme accounts for defocused parts of the scene at each depth plane, the new loss function analyzes focused and defocused parts separately in reconstructed images. Our method supports phase-only CGH calculations using various iterative (e.g., Gerchberg-Saxton, Gradient Descent) and non-iterative (e.g., Double Phase) CGH techniques. We achieve our best image quality using a modified gradient descent-based optimization recipe where we introduce a constraint inspired by the double phase method. We validate our method experimentally using our proof-of-concept holographic display, comparing various algorithms, including multi-depth scenes with sparse and dense contents.

摘要:本文介绍了一种新的多平面CGH计算方法,用于重建具有自然散焦模糊效果、无伪影的高质量全息图。我们的方法引入了一种新的目标设定方案和一个新的损失函数。目标设定方案考虑了场景在每个深度平面上的散焦部分,而新的损失函数则对重建图像中的聚焦部分和散焦部分分别进行分析。我们的方法支持使用各种迭代式(如Gerchberg-Saxton、梯度下降)和非迭代式(如双相位法)CGH技术进行仅相位的CGH计算。我们使用一种改进的基于梯度下降的优化方案获得了最佳图像质量,其中我们引入了一个受双相位法启发的约束。我们使用概念验证全息显示器对方法进行了实验验证,比较了多种算法,包括具有稀疏和密集内容的多深度场景。

CV-57-标题 Object-Aware Self-supervised Multi-Label Learning

链接: https://arxiv.org/abs/2205.07028
作者: Xu Kaixin, Liu Liyang, Zhao Ziyuan, Zeng Zeng, Bharadwaj Veeravalli
备注:

点击查看摘要

Abstract: Multi-label Learning on Image data has been widely exploited with deep learning models. However, supervised training on deep CNN models often cannot discover sufficient discriminative features for classification. As a result, numerous self-supervision methods are proposed to learn more robust image representations. However, most self-supervised approaches focus on single-instance single-label data and fall short on more complex images with multiple objects. Therefore, we propose an Object-Aware Self-Supervision (OASS) method to obtain more fine-grained representations for multi-label learning, dynamically generating auxiliary tasks based on object locations. Secondly, the robust representation learned by OASS can be leveraged to efficiently generate Class-Specific Instances (CSI) in a proposal-free fashion to better guide multi-label supervision signal transfer to instances. Extensive experiments on the VOC2012 dataset for multi-label classification demonstrate the effectiveness of the proposed method against the state-of-the-art counterparts.

摘要:基于图像数据的多标签学习已被深度学习模型广泛利用。然而,对深层CNN模型的监督训练往往无法发现足够的判别特征用于分类。因此,人们提出了许多自监督方法来学习更鲁棒的图像表示。但大多数自监督方法专注于单实例单标签数据,在包含多个对象的更复杂图像上表现不佳。为此,我们提出了一种对象感知自监督(OASS)方法,基于对象位置动态生成辅助任务,以获得更细粒度的多标签学习表示。其次,可以利用OASS学到的鲁棒表示,以无候选框(proposal-free)的方式高效生成类别特定实例(CSI),从而更好地引导多标签监督信号向实例传递。在VOC2012多标签分类数据集上的大量实验证明了所提方法相对于最先进方法的有效性。

CV-58-标题 Evaluating the Generalization Ability of Super-Resolution Networks

链接: https://arxiv.org/abs/2205.07019
作者: Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, Chao Dong
备注: First Generalization Assessment Index for SR networks

点击查看摘要

Abstract: Performance and generalization ability are two important aspects to evaluate deep learning models. However, research on the generalization ability of Super-Resolution (SR) networks is currently absent. We make the first attempt to propose a Generalization Assessment Index for SR networks, namely SRGA. SRGA exploits the statistical characteristics of internal features of deep networks, not output images to measure the generalization ability. Specially, it is a non-parametric and non-learning metric. To better validate our method, we collect a patch-based image evaluation set (PIES) that includes both synthetic and real-world images, covering a wide range of degradations. With SRGA and PIES dataset, we benchmark existing SR models on the generalization ability. This work could lay the foundation for future research on model generalization in low-level vision.

摘要:性能和泛化能力是评估深度学习模型的两个重要方面。然而,目前尚缺乏对超分辨率(SR)网络泛化能力的研究。我们首次尝试为SR网络提出一个泛化能力评估指标,即SRGA。SRGA利用深度网络内部特征(而非输出图像)的统计特性来衡量泛化能力。特别地,它是一个非参数、无需学习的度量。为了更好地验证我们的方法,我们收集了一个基于图像块的图像评估集(PIES),其中既包括合成图像也包括真实图像,涵盖了广泛的退化类型。借助SRGA和PIES数据集,我们对现有SR模型的泛化能力进行了基准测试。这项工作可以为未来低级视觉中模型泛化的研究奠定基础。

CV-59-标题 Importance Weighted Structure Learning for Scene Graph Generation

链接: https://arxiv.org/abs/2205.07017
作者: Daqi Liu, Miroslaw Bober, Josef Kittler
备注:

点击查看摘要

Abstract: Scene graph generation is a structured prediction task aiming to explicitly model objects and their relationships via constructing a visually-grounded scene graph for an input image. Currently, the message passing neural network based mean field variational Bayesian methodology is the ubiquitous solution for such a task, in which the variational inference objective is often assumed to be the classical evidence lower bound. However, the variational approximation inferred from such loose objective generally underestimates the underlying posterior, which often leads to inferior generation performance. In this paper, we propose a novel importance weighted structure learning method aiming to approximate the underlying log-partition function with a tighter importance weighted lower bound, which is computed from multiple samples drawn from a reparameterizable Gumbel-Softmax sampler. A generic entropic mirror descent algorithm is applied to solve the resulting constrained variational inference task. The proposed method achieves the state-of-the-art performance on various popular scene graph generation benchmarks.

摘要:场景图生成是一项结构化预测任务,旨在通过为输入图像构建视觉对应的场景图来显式建模物体及其关系。目前,基于消息传递神经网络的平均场变分贝叶斯方法是此类任务的通用解决方案,其中变分推断的目标通常被设定为经典的证据下界(ELBO)。然而,从这种宽松目标推断出的变分近似通常会低估真实后验,从而往往导致较差的生成性能。在本文中,我们提出了一种新颖的重要性加权结构学习方法,旨在用更紧的重要性加权下界来逼近底层的对数配分函数,该下界由从可重参数化的Gumbel-Softmax采样器中抽取的多个样本计算得到。我们应用一种通用的熵镜像下降算法来求解由此产生的带约束变分推断任务。所提方法在多个流行的场景图生成基准上实现了最先进的性能。
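摘要中“更紧的重要性加权下界”与 IWAE 式的界同形:对 K 个样本的对数重要性权重做 log-mean-exp。下面是一个数值稳定的纯 Python 草图(权重数值为虚构示例,仅说明公式本身):

```python
import math

# Importance weighted lower bound (IWAE-style), computed stably via
# log-sum-exp: bound = logsumexp(log_w) - log K, where
# log_w[k] = log p(x, z_k) - log q(z_k | x) for samples z_k ~ q.

def iw_lower_bound(log_weights):
    k = len(log_weights)
    m = max(log_weights)
    return m + math.log(sum(math.exp(lw - m) for lw in log_weights)) - math.log(k)

# With identical log-weights the bound collapses to that common value,
# i.e. the single-sample ELBO term:
print(iw_lower_bound([-2.0, -2.0, -2.0, -2.0]))
```

由 Jensen 不等式,该界不低于对数权重的均值,且随样本数 K 增大而变紧;论文中的样本来自可重参数化的 Gumbel-Softmax 采样器。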

CV-60-标题 SaiNet: Stereo aware inpainting behind objects with generative networks

链接: https://arxiv.org/abs/2205.07014
作者: Violeta Menéndez González, Andrew Gilbert, Graeme Phillipson, Stephen Jolly, Simon Hadfield
备注: Presented at AI4CC workshop at CVPR

点击查看摘要

Abstract: In this work, we present an end-to-end network for stereo-consistent image inpainting with the objective of inpainting large missing regions behind objects. The proposed model consists of an edge-guided UNet-like network using Partial Convolutions. We enforce multi-view stereo consistency by introducing a disparity loss. More importantly, we develop a training scheme where the model is learned from realistic stereo masks representing object occlusions, instead of the more common random masks. The technique is trained in a supervised way. Our evaluation shows competitive results compared to previous state-of-the-art techniques.

摘要:在这项工作中,我们提出了一个用于立体一致图像修复(inpainting)的端到端网络,目标是修复物体背后的大块缺失区域。所提模型由一个使用部分卷积(Partial Convolutions)的边缘引导类UNet网络组成。我们通过引入视差损失来强制多视图立体一致性。更重要的是,我们设计了一种训练方案,模型从代表物体遮挡的真实立体掩码中学习,而不是更常见的随机掩码。该技术以有监督的方式进行训练。我们的评估显示,与之前的最先进技术相比,结果具有竞争力。

CV-61-标题 Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap

链接: https://arxiv.org/abs/2205.07002
作者: Jinke Li, Xiao He, Yang Wen, Yuan Gao, Xiaoqiang Cheng, Dan Zhang
备注:

点击查看摘要

Abstract: As a rising task, panoptic segmentation is faced with challenges in both semantic segmentation and instance segmentation. However, in terms of speed and accuracy, existing LiDAR methods in the field are still limited. In this paper, we propose a fast and high-performance LiDAR-based framework, referred to as Panoptic-PHNet, with three attractive aspects: 1) We introduce a clustering pseudo heatmap as a new paradigm, which, followed by a center grouping module, yields instance centers for efficient clustering without object-level learning tasks. 2) A knn-transformer module is proposed to model the interaction among foreground points for accurate offset regression. 3) For backbone design, we fuse the fine-grained voxel features and the 2D Bird’s Eye View (BEV) features with different receptive fields to utilize both detailed and global information. Extensive experiments on both SemanticKITTI dataset and nuScenes dataset show that our Panoptic-PHNet surpasses state-of-the-art methods by remarkable margins with a real-time speed. We achieve the 1st place on the public leaderboard of SemanticKITTI and leading performance on the recently released leaderboard of nuScenes.

摘要:作为一项新兴任务,全景分割同时面临语义分割和实例分割的挑战。然而,在速度和精度方面,该领域现有的LiDAR方法仍然有限。在本文中,我们提出了一个快速且高性能的基于LiDAR的框架,称为Panoptic-PHNet,它有三个亮点:1)我们引入了聚类伪热图(clustering pseudo heatmap)作为一种新范式,随后通过中心分组模块产生实例中心,无需对象级学习任务即可进行高效聚类。2)我们提出了一个kNN-transformer模块,用于建模前景点之间的交互,以实现准确的偏移回归。3)在骨干网络设计上,我们融合了细粒度体素特征和具有不同感受野的2D鸟瞰图(BEV)特征,以同时利用细节信息和全局信息。在SemanticKITTI和nuScenes数据集上的大量实验表明,我们的Panoptic-PHNet以实时速度大幅超越最先进的方法。我们在SemanticKITTI的公开排行榜上取得第一名,并在最近发布的nuScenes排行榜上取得领先表现。
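摘要第 1) 点中“中心分组”的基本思想可以用下面的玩具草图来示意:先用回归出的偏移把每个前景点移向其实例中心,再把它指派给最近的中心(此处的偏移与中心均为虚构输入;真实方法中的中心来自聚类伪热图):

```python
# Toy center-grouping sketch: shift each foreground point by its predicted
# offset, then label it with the index of the nearest instance center.
# Offsets and centers below are made-up stand-ins for network outputs.

def group_points(points, offsets, centers):
    """points/offsets: lists of (x, y); centers: list of (x, y) instance centers."""
    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    labels = []
    for p, o in zip(points, offsets):
        shifted = (p[0] + o[0], p[1] + o[1])
        labels.append(min(range(len(centers)), key=lambda i: d2(shifted, centers[i])))
    return labels

pts = [(0.0, 0.2), (0.1, -0.1), (5.2, 5.1)]
offs = [(0.0, -0.2), (-0.1, 0.1), (-0.2, -0.1)]   # regress toward the centers
centers = [(0.0, 0.0), (5.0, 5.0)]
print(group_points(pts, offs, centers))
```

因为分组只是“偏移后取最近中心”,无需任何对象级的学习任务即可完成实例聚类。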

CV-62-标题 Voxel-wise Adversarial Semi-supervised Learning for Medical Image Segmentation

链接: https://arxiv.org/abs/2205.06987
作者: Chae Eun Lee, Hyelim Park, Yeong-Gil Shin, Minyoung Chung
备注:

点击查看摘要

Abstract: Semi-supervised learning for medical image segmentation is an important area of research for alleviating the huge cost associated with the construction of reliable large-scale annotations in the medical domain. Recent semi-supervised approaches have demonstrated promising results by employing consistency regularization, pseudo-labeling techniques, and adversarial learning. These methods primarily attempt to learn the distribution of labeled and unlabeled data by enforcing consistency in the predictions or embedding context. However, previous approaches have focused only on local discrepancy minimization or context relations across single classes. In this paper, we introduce a novel adversarial learning-based semi-supervised segmentation method that effectively embeds both local and global features from multiple hidden layers and learns context relations between multiple classes. Our voxel-wise adversarial learning method utilizes a voxel-wise feature discriminator, which considers multilayer voxel-wise features (involving both local and global features) as an input by embedding class-specific voxel-wise feature distribution. Furthermore, we improve our previous representation learning method by overcoming information loss and learning stability problems, which enables rich representations of labeled data. Our method outperforms current best-performing state-of-the-art semi-supervised learning approaches on the image segmentation of the left atrium (single class) and multiorgan datasets (multiclass). Moreover, our visual interpretation of the feature space demonstrates that our proposed method enables a well-distributed and separated feature space from both labeled and unlabeled data, which improves the overall prediction results.

摘要:医学图像分割的半监督学习是一个重要的研究领域,可以减轻在医学领域构建可靠大规模标注的巨大成本。最近的半监督方法通过采用一致性正则化、伪标签技术和对抗学习,展现出了有希望的结果。这些方法主要试图通过在预测或嵌入上下文中施加一致性来学习有标注与无标注数据的分布。然而,以往的方法仅关注局部差异最小化或单个类别内部的上下文关系。在本文中,我们提出了一种新颖的基于对抗学习的半监督分割方法,它能有效嵌入来自多个隐藏层的局部和全局特征,并学习多个类别之间的上下文关系。我们的体素级对抗学习方法利用一个体素级特征判别器,它通过嵌入类别特定的体素级特征分布,将多层体素级特征(包含局部和全局特征)作为输入。此外,我们通过克服信息丢失和学习稳定性问题改进了此前的表示学习方法,从而得到有标注数据的丰富表示。在左心房(单类)和多器官(多类)数据集的图像分割任务上,我们的方法优于当前表现最佳的最先进半监督学习方法。此外,我们对特征空间的可视化解释表明,所提方法能够从有标注和无标注数据中得到分布良好且彼此分离的特征空间,从而改善整体预测结果。

CV-63-标题 Efficient Gesture Recognition for the Assistance of Visually Impaired People using Multi-Head Neural Networks

链接: https://arxiv.org/abs/2205.06980
作者: Samer Alashhab, Antonio Javier Gallego, Miguel Ángel Lozano
备注:

点击查看摘要

Abstract: This paper proposes an interactive system for mobile devices controlled by hand gestures aimed at helping people with visual impairments. This system allows the user to interact with the device by making simple static and dynamic hand gestures. Each gesture triggers a different action in the system, such as object recognition, scene description or image scaling (e.g., pointing a finger at an object will show a description of it). The system is based on a multi-head neural network architecture, which initially detects and classifies the gestures, and subsequently, depending on the gesture detected, performs a second stage that carries out the corresponding action. This multi-head architecture optimizes the resources required to perform different tasks simultaneously, and takes advantage of the information obtained from an initial backbone to perform different processes in a second stage. To train and evaluate the system, a dataset with about 40k images was manually compiled and labeled including different types of hand gestures, backgrounds (indoors and outdoors), lighting conditions, etc. This dataset contains synthetic gestures (whose objective is to pre-train the system in order to improve the results) and real images captured using different mobile phones. The results obtained and the comparison made with the state of the art show competitive results as regards the different actions performed by the system, such as the accuracy of classification and localization of gestures, or the generation of descriptions for objects and scenes.

摘要:本文提出了一个通过手势控制移动设备的交互系统,旨在帮助视障人士。该系统允许用户通过简单的静态和动态手势与设备交互。每个手势在系统中触发不同的动作,例如物体识别、场景描述或图像缩放(例如,用手指指向某个物体会显示对它的描述)。该系统基于多头神经网络架构,首先检测并分类手势,然后根据检测到的手势执行第二阶段以完成相应的动作。这种多头架构优化了同时执行不同任务所需的资源,并利用初始骨干网络获得的信息在第二阶段执行不同的处理。为了训练和评估该系统,我们手动收集并标注了一个包含约4万张图像的数据集,涵盖不同类型的手势、背景(室内和室外)、光照条件等。该数据集包含合成手势(其目的是对系统进行预训练以改善结果)和使用不同手机拍摄的真实图像。所得结果以及与最先进方法的比较表明,系统在各项动作上都取得了有竞争力的结果,例如手势分类与定位的准确性,以及对物体和场景描述的生成。

CV-64-标题 RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects

链接: https://arxiv.org/abs/2205.06975
作者: Yunseok Jang, Ruben Villegas, Jimei Yang, Duygu Ceylan, Xin Sun, Honglak Lee
备注: Accepted paper at AI for Content Creation Workshop (AICC) at CVPR 2022

点击查看摘要

Abstract: There have been remarkable successes in computer vision with deep learning. While such breakthroughs show robust performance, there have still been many challenges in learning in-depth knowledge, like occlusion or predicting physical interactions. Although some recent works show the potential of 3D data in serving such context, it is unclear how we efficiently provide 3D input to the 2D models due to the misalignment in dimensionality between 2D and 3D. To leverage the successes of 2D models in predicting self-occlusions, we design Ray-marching in Camera Space (RiCS), a new method to represent the self-occlusions of foreground objects in 3D into a 2D self-occlusion map. We test the effectiveness of our representation on the human image harmonization task by predicting shading that is coherent with a given background image. Our experiments demonstrate that our representation map not only allows us to enhance the image quality but also to model temporally coherent complex shadow effects compared with the simulation-to-real and harmonization methods, both quantitatively and qualitatively. We further show that we can significantly improve the performance of human parts segmentation networks trained on existing synthetic datasets by enhancing the harmonization quality with our method.

摘要:深度学习在计算机视觉领域取得了显著成功。尽管这些突破展示了强大的性能,但在学习深层知识方面仍然存在许多挑战,例如遮挡或预测物理交互。虽然最近的一些工作展示了3D数据在此类场景中的潜力,但由于2D与3D之间的维度不一致,如何高效地向2D模型提供3D输入仍不清楚。为了利用2D模型在预测自遮挡方面的成功,我们设计了相机空间光线步进(RiCS),这是一种将3D前景物体的自遮挡表示为2D自遮挡图的新方法。我们通过预测与给定背景图像协调一致的着色,在人物图像协调任务上测试了该表示的有效性。实验表明,与仿真到真实以及图像协调方法相比,我们的表示图不仅能提升图像质量,还能在定量和定性两方面建模时间上连贯的复杂阴影效果。我们进一步表明,通过用我们的方法提升协调质量,可以显著改善在现有合成数据集上训练的人体部位分割网络的性能。
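“沿视线做光线步进来估计自遮挡”这一思路可以用下面的纯 Python 草图来示意:从表面点朝相机方向采样,统计命中物体占用体素的比例(体素栅格、步数等均为示例性假设,并非论文的具体实现):

```python
# Toy ray-marched self-occlusion test: march from a surface point toward the
# camera through a binary occupancy grid and report the fraction of samples
# that hit occupied voxels (0.0 means fully visible). Grid and step count
# are arbitrary choices for this illustration.

def occlusion(occupied, point, camera, steps=64):
    """occupied: set of integer voxel coords; point/camera: 3D tuples."""
    hits = 0
    for s in range(1, steps + 1):
        t = s / (steps + 1)                     # strictly between point and camera
        p = tuple(point[i] + t * (camera[i] - point[i]) for i in range(3))
        if tuple(int(round(c)) for c in p) in occupied:
            hits += 1
    return hits / steps

solid = {(0, 0, z) for z in range(1, 4)}        # a small column blocking the view
print(occlusion(solid, point=(0, 0, 0), camera=(0, 0, 10)) > 0)
```

对人体网格的每个可见表面点做这样的测试,即可得到一张 2D 的自遮挡图。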

CV-65-标题 Dense residual Transformer for image denoising

链接: https://arxiv.org/abs/2205.06944
作者: Chao Yao, Shuo Jin, Meiqin Liu, Xiaojuan Ban
备注: Updated on 0514

点击查看摘要

Abstract: Image denoising is an important low-level computer vision task, which aims to reconstruct a noise-free and high-quality image from a noisy image. With the development of deep learning, convolutional neural network (CNN) has been gradually applied and achieved great success in image denoising, image compression, image enhancement, etc. Recently, Transformer has been a hot technique, which is widely used to tackle computer vision tasks. However, few Transformer-based methods have been proposed for low-level vision tasks. In this paper, we proposed an image denoising network structure based on Transformer, which is named DenSformer. DenSformer consists of three modules, including a preprocessing module, a local-global feature extraction module, and a reconstruction module. Specifically, the local-global feature extraction module consists of several Sformer groups, each of which has several ETransformer layers and a convolution layer, together with a residual connection. These Sformer groups are densely skip-connected to fuse the feature of different layers, and they jointly capture the local and global information from the given noisy images. We conduct our model on comprehensive experiments. Experimental results prove that our DenSformer achieves improvement compared to some state-of-the-art methods, both for the synthetic noise data and real noise data, in the objective and subjective evaluations.

摘要:图像去噪是一项重要的低级计算机视觉任务,旨在从含噪图像中重建无噪声的高质量图像。随着深度学习的发展,卷积神经网络(CNN)已被逐步应用,并在图像去噪、图像压缩、图像增强等方面取得了巨大成功。最近,Transformer成为一项热门技术,被广泛用于处理计算机视觉任务。然而,针对低级视觉任务提出的基于Transformer的方法很少。在本文中,我们提出了一种基于Transformer的图像去噪网络结构,命名为DenSformer。DenSformer由三个模块组成,包括预处理模块、局部-全局特征提取模块和重建模块。具体而言,局部-全局特征提取模块由若干Sformer组构成,每组包含若干ETransformer层、一个卷积层以及残差连接。这些Sformer组之间通过密集的跳跃连接融合不同层的特征,共同从给定的含噪图像中捕获局部和全局信息。我们对模型进行了全面的实验。实验结果证明,无论在合成噪声数据还是真实噪声数据上,我们的DenSformer在客观和主观评估中都优于一些最先进的方法。

CV-66-标题 A Saliency-Guided Street View Image Inpainting Framework for Efficient Last-Meters Wayfinding

链接: https://arxiv.org/abs/2205.06934
作者: Chuanbo Hu, Shan Jia, Fan Zhang, Xin Li
备注:

点击查看摘要

Abstract: Global Positioning Systems (GPS) have played a crucial role in various navigation applications. Nevertheless, localizing the perfect destination within the last few meters remains an important but unresolved problem. Limited by the GPS positioning accuracy, navigation systems always show users a vicinity of a destination, but not its exact location. Street view images (SVI) in maps as an immersive media technology have served as an aid to provide the physical environment for human last-meters wayfinding. However, due to the large diversity of geographic context and acquisition conditions, the captured SVI always contains various distracting objects (e.g., pedestrians and vehicles), which will distract human visual attention from efficiently finding the destination in the last few meters. To address this problem, we highlight the importance of reducing visual distraction in image-based wayfinding by proposing a saliency-guided image inpainting framework. It aims at redirecting human visual attention from distracting objects to destination-related objects for more efficient and accurate wayfinding in the last meters. Specifically, a context-aware distracting object detection method driven by deep salient object detection has been designed to extract distracting objects from three semantic levels in SVI. Then we employ a large-mask inpainting method with fast Fourier convolutions to remove the detected distracting objects. Experimental results with both qualitative and quantitative analysis show that our saliency-guided inpainting method can not only achieve great perceptual quality in street view images but also redirect the human’s visual attention to focus more on static location-related objects than distracting ones. The human-based evaluation also justified the effectiveness of our method in improving the efficiency of locating the target destination.

摘要:全球定位系统(GPS)在各种导航应用中发挥了至关重要的作用。然而,在最后几米内精确定位目的地仍然是一个重要但尚未解决的问题。受GPS定位精度限制,导航系统总是向用户显示目的地的大致范围,而非其精确位置。作为一种沉浸式媒体技术,地图中的街景图像(SVI)为人类最后几米的寻路提供了物理环境辅助。然而,由于地理环境和采集条件差异巨大,拍摄的SVI总是包含各种干扰物体(如行人和车辆),这些物体会分散人的视觉注意力,妨碍在最后几米内高效找到目的地。为解决这一问题,我们提出了一个显著性引导的图像修复(inpainting)框架,强调在基于图像的寻路中减少视觉干扰的重要性。它旨在将人的视觉注意力从干扰物体转移到与目的地相关的物体上,从而在最后几米内实现更高效、更准确的寻路。具体而言,我们设计了一种由深度显著目标检测驱动的、上下文感知的干扰物体检测方法,从SVI的三个语义层面提取干扰物体。然后,我们采用带快速傅里叶卷积的大掩码修复方法来移除检测到的干扰物体。定性和定量分析的实验结果表明,我们的显著性引导修复方法不仅能在街景图像中获得出色的感知质量,还能将人的视觉注意力重新引导至与位置相关的静态物体而非干扰物体上。基于人的评估同样证明了我们的方法在提高目标目的地定位效率方面的有效性。

CV-67-标题 ImageSig: A signature transform for ultra-lightweight image recognition

链接: https://arxiv.org/abs/2205.06929
作者: Mohamed R. Ibrahim, Terry Lyons
备注:

点击查看摘要

Abstract: This paper introduces a new lightweight method for image recognition. ImageSig is based on computing signatures and does not require a convolutional structure or an attention-based encoder. It is striking to the authors that it achieves: a) an accuracy for 64 X 64 RGB images that exceeds many of the state-of-the-art methods and simultaneously b) requires orders of magnitude less FLOPS, power and memory footprint. The pretrained model can be as small as 44.2 KB in size. ImageSig shows unprecedented performance on hardware such as Raspberry Pi and Jetson-nano. ImageSig treats images as streams with multiple channels. These streams are parameterized by spatial directions. We contribute to the functionality of signature and rough path theory to stream-like data and vision tasks on static images beyond temporal streams. With very few parameters and small size models, the key advantage is that one could have many of these “detectors” assembled on the same chip; moreover, the feature acquisition can be performed once and shared between different models of different tasks - further accelerating the process. This contributes to energy efficiency and the advancements of embedded AI at the edge.

摘要:本文介绍了一种新的轻量级图像识别方法。ImageSig基于计算签名(signature),不需要卷积结构或基于注意力的编码器。令作者惊讶的是,它实现了:a)在64 x 64 RGB图像上超过许多最先进方法的准确率,同时b)所需的FLOPS、功耗和内存占用减少多个数量级。预训练模型的体积可以小至44.2 KB。ImageSig在Raspberry Pi和Jetson-nano等硬件上展现了前所未有的性能。ImageSig将图像视为具有多个通道的流,这些流按空间方向进行参数化。我们将签名和粗糙路径理论的功能从时间流扩展到静态图像上的类流数据和视觉任务。由于参数极少、模型极小,其关键优势在于可以在同一芯片上组装许多这样的“检测器”;此外,特征提取可以只执行一次并在不同任务的不同模型之间共享,从而进一步加速处理过程。这有助于提高能效,并推动边缘端嵌入式AI的发展。
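路径签名(signature)是 ImageSig 的核心特征。下面用纯 Python 给出一个深度为 2 的签名草图,把一行像素当作(位置, 强度)二通道流来处理;真实实现通常依赖优化过的签名库,此处仅为分段线性路径下的示意,样例数值均为虚构:

```python
# Depth-2 path signature of a piecewise-linear d-dimensional stream.
# Level 1 collects total increments; level 2 collects iterated integrals
# S2[i][j] = integral of channel i d(channel j). For a linear segment the
# exact level-2 increment is S1_i(before) * delta_j + 0.5 * delta_i * delta_j.

def signature_depth2(path):
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for c, p in zip(cur, prev)]
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2

# One image row as a (position, intensity) stream -- made-up sample values:
row = [(0.0, 0.1), (1.0, 0.5), (2.0, 0.4), (3.0, 0.9)]
s1, s2 = signature_depth2(row)
print(s1)  # total increments per channel, here about [3.0, 0.8]
```

一个自检性质:对任意通道 i,有 S2[i][i] = S1[i]^2 / 2;把这些低阶签名项拼接起来,就得到极轻量的图像特征向量。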

CV-68-标题 AVCAffe: A Large Scale Audio-Visual Dataset of Cognitive Load and Affect for Remote Work

链接: https://arxiv.org/abs/2205.06887
作者: Pritam Sarkar, Aaron Posen, Ali Etemad
备注:

点击查看摘要

Abstract: We introduce AVCAffe, the first Audio-Visual dataset consisting of Cognitive load and Affect attributes. We record AVCAffe by simulating remote work scenarios over a video-conferencing platform, where subjects collaborate to complete a number of cognitively engaging tasks. AVCAffe is the largest originally collected (not collected from the Internet) affective dataset in English language. We recruit 106 participants from 18 different countries of origin, spanning an age range of 18 to 57 years old, with a balanced male-female ratio. AVCAffe comprises a total of 108 hours of video, equivalent to more than 58,000 clips along with task-based self-reported ground truth labels for arousal, valence, and cognitive load attributes such as mental demand, temporal demand, effort, and a few others. We believe AVCAffe would be a challenging benchmark for the deep learning research community given the inherent difficulty of classifying affect and cognitive load in particular. Moreover, our dataset fills an existing timely gap by facilitating the creation of learning systems for better self-management of remote work meetings, and further study of hypotheses regarding the impact of remote work on cognitive load and affective states.

摘要:我们介绍了AVCAffe,这是第一个包含认知负荷(Cognitive load)和情感(Affect)属性的视听数据集。我们通过在视频会议平台上模拟远程工作场景来录制AVCAffe,受试者协作完成多项认知投入度较高的任务。AVCAffe是英语中规模最大的原创采集(而非从互联网收集)的情感数据集。我们从18个不同的原籍国招募了106名参与者,年龄范围为18至57岁,男女比例均衡。AVCAffe总共包括108小时的视频,相当于58,000多个剪辑,并附有基于任务的自我报告真实标签,涵盖唤醒度、效价以及精神需求、时间需求、努力程度等认知负荷属性。鉴于情感尤其是认知负荷分类的固有难度,我们认为AVCAffe将成为深度学习研究社区一个具有挑战性的基准。此外,我们的数据集通过促进构建用于更好地自我管理远程工作会议的学习系统,并支持进一步研究远程工作对认知负荷和情感状态影响的假设,填补了现有的及时空白。

CV-69-标题 Using Augmented Face Images to Improve Facial Recognition Tasks

链接: https://arxiv.org/abs/2205.06873
作者: Shuo Cheng, Guoxian Song, Wan-Chun Ma, Chao Wang, Linjie Luo
备注: CHI 2022 Workshop: AI-Generated Characters: Putting Deepfakes to Good Use

点击查看摘要

Abstract: We present a framework that uses GAN-augmented images to complement certain specific attributes, usually underrepresented, for machine learning model training. This allows us to improve inference quality over those attributes for the facial recognition tasks.

摘要:我们提出了一个框架,使用GAN增强图像来补充机器学习模型训练中某些通常代表性不足的特定属性。这使我们能够在面部识别任务中提高针对这些属性的推理质量。

CV-70-标题 From Images to Probabilistic Anatomical Shapes A Deep Variational Bottleneck Approach

链接: https://arxiv.org/abs/2205.06862
作者: Jadie Adams, Shireen Elhabian
备注: Provisionally accepted to MICCAI 2022 on May 4, 2022

点击查看摘要

Abstract: Statistical shape modeling (SSM) directly from 3D medical images is an underutilized tool for detecting pathology, diagnosing disease, and conducting population-level morphology analysis. Deep learning frameworks have increased the feasibility of adopting SSM in medical practice by reducing the expert-driven manual and computational overhead in traditional SSM workflows. However, translating such frameworks to clinical practice requires calibrated uncertainty measures as neural networks can produce over-confident predictions that cannot be trusted in sensitive clinical decision-making. Existing techniques for predicting shape with aleatoric (data-dependent) uncertainty utilize a principal component analysis (PCA) based shape representation computed in isolation from the model training. This constraint restricts the learning task to solely estimating pre-defined shape descriptors from 3D images and imposes a linear relationship between this shape representation and the output (i.e., shape) space. In this paper, we propose a principled framework based on the variational information bottleneck theory to relax these assumptions while predicting probabilistic shapes of anatomy directly from images without supervised encoding of shape descriptors. Here, the latent representation is learned in the context of the learning task, resulting in a more scalable, flexible model that better captures data non-linearity. Additionally, this model is self-regularized and generalizes better given limited training data. Our experiments demonstrate that the proposed method provides improved accuracy and better calibrated aleatoric uncertainty estimates than state-of-the-art methods.

摘要:直接基于3D医学图像的统计形状建模(SSM)是一种未被充分利用的工具,可用于检测病理、诊断疾病和开展人群水平的形态学分析。深度学习框架通过减少传统SSM工作流程中专家驱动的手工与计算开销,提高了在医疗实践中采用SSM的可行性。但是,将此类框架转化为临床实践需要校准的不确定性度量,因为神经网络可能产生过度自信的预测,而这些预测在敏感的临床决策中无法被信任。现有的带偶然(数据依赖)不确定性的形状预测技术,采用与模型训练相互独立计算的基于主成分分析(PCA)的形状表示。该约束将学习任务限制为仅从3D图像估计预定义的形状描述符,并在该形状表示与输出(即形状)空间之间强加了线性关系。在本文中,我们提出了一个基于变分信息瓶颈理论的原则性框架,以放松这些假设,同时无需对形状描述符进行有监督编码,即可直接从图像预测解剖结构的概率形状。这里,潜在表示是在学习任务的上下文中学习的,从而得到一个更可扩展、更灵活、能更好捕获数据非线性的模型。此外,该模型具有自正则化特性,在训练数据有限时泛化更好。我们的实验表明,所提方法比最先进方法提供了更高的准确率和更好校准的偶然不确定性估计。

CV-71-标题 A Framework for Event-based Computer Vision on a Mobile Device

链接: https://arxiv.org/abs/2205.06836
作者: Gregor Lenz, Serge Picaud, Sio-Hoi Ieng
备注:

点击查看摘要

Abstract: We present the first publicly available Android framework to stream data from an event camera directly to a mobile phone. Today’s mobile devices handle a wider range of workloads than ever before and they incorporate a growing gamut of sensors that make devices smarter, more user friendly and secure. Conventional cameras in particular play a central role in such tasks, but they cannot record continuously, as the amount of redundant information recorded is costly to process. Bio-inspired event cameras on the other hand only record changes in a visual scene and have shown promising low-power applications that specifically suit mobile tasks such as face detection, gesture recognition or gaze tracking. Our prototype device is the first step towards embedding such an event camera into a battery-powered handheld device. The mobile framework allows us to stream events in real-time and opens up the possibilities for always-on and on-demand sensing on mobile phones. To liaise the asynchronous event camera output with synchronous von Neumann hardware, we look at how buffering events and processing them in batches can benefit mobile applications. We evaluate our framework in terms of latency and throughput and show examples of computer vision tasks that involve both event-by-event and pre-trained neural network methods for gesture recognition, aperture robust optical flow and grey-level image reconstruction from events. The code is available at this https URL

摘要:我们介绍了第一个公开可用的Android框架,可将事件相机的数据直接流式传输到手机。当今的移动设备处理的工作负载比以往任何时候都更广泛,并集成了越来越多的传感器,使设备更智能、更易用、更安全。传统相机在此类任务中尤其扮演核心角色,但它们无法连续录制,因为处理所记录的大量冗余信息代价高昂。而受生物启发的事件相机只记录视觉场景中的变化,并已展示出有前景的低功耗应用,特别适合人脸检测、手势识别或注视跟踪等移动任务。我们的原型设备是将这种事件相机嵌入电池供电手持设备的第一步。该移动框架使我们能够实时流式传输事件,并为手机上的常开(always-on)和按需感知打开了可能性。为了将异步的事件相机输出与同步的冯·诺依曼硬件衔接,我们研究了缓冲事件并分批处理如何使移动应用受益。我们从延迟和吞吐量两方面评估了我们的框架,并展示了同时涉及逐事件方法和预训练神经网络方法的计算机视觉任务示例,包括手势识别、光圈鲁棒光流以及基于事件的灰度图像重建。该代码可在此HTTPS URL上找到。

CV-72-标题 Weakly-supervised Biomechanically-constrained CT/MRI Registration of the Spine

链接: https://arxiv.org/abs/2205.07568
作者: Bailiang Jian, Mohammad Farid Azampour, Francesca De Benetti, Johannes Oberreuter, Christina Bukas, Alexandra S. Gersing, Sarah C. Foreman, Anna-Sophia Dietrich, Jon Rischewski, Jan S. Kirschke, Nassir Navab, Thomas Wendler
备注: 10 pages, 3 figures

点击查看摘要

Abstract: CT and MRI are two of the most informative modalities in spinal diagnostics and treatment planning. CT is useful when analysing bony structures, while MRI gives information about the soft tissue. Thus, fusing the information of both modalities can be very beneficial. Registration is the first step for this fusion. While the soft tissues around the vertebra are deformable, each vertebral body is constrained to move rigidly. We propose a weakly-supervised deep learning framework that preserves the rigidity and the volume of each vertebra while maximizing the accuracy of the registration. To achieve this goal, we introduce anatomy-aware losses for training the network. We specifically design these losses to depend only on the CT label maps since automatic vertebra segmentation in CT gives more accurate results contrary to MRI. We evaluate our method on an in-house dataset of 167 patients. Our results show that adding the anatomy-aware losses increases the plausibility of the inferred transformation while keeping the accuracy untouched.

摘要:CT和MRI是脊柱诊断和治疗计划中信息量最大的两种模态。CT在分析骨性结构时很有用,而MRI则提供软组织信息。因此,融合两种模态的信息可能非常有益。配准是这种融合的第一步。椎骨周围的软组织是可变形的,但每个椎体被约束为刚性移动。我们提出了一个弱监督深度学习框架,在最大化配准精度的同时保持每个椎骨的刚性和体积。为实现这一目标,我们引入了用于训练网络的解剖感知损失。我们特意将这些损失设计为仅依赖CT标签图,因为与MRI相比,CT中的自动椎骨分割结果更准确。我们在167名患者的内部数据集上评估了我们的方法。结果表明,添加解剖感知损失提高了推断变换的合理性,同时保持精度不变。
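摘要并未给出损失的具体形式。一种常见的椎体刚性约束写法,是惩罚形变雅可比偏离旋转矩阵的程度(即 J^T J ≈ I)。下面用NumPy给出一个示意性实现;这只是说明"刚性约束"思路的假设性写法,并非论文实际采用的解剖感知损失。

```python
import numpy as np

def rigidity_loss(disp, mask):
    """Penalize non-rigid deformation inside a vertebra mask (illustrative).

    disp: (3, D, H, W) displacement field; mask: (D, H, W) boolean.
    The mapping is phi(x) = x + disp(x), so its Jacobian is J = I + grad(disp).
    For a rigid motion J^T J = I, hence we average ||J^T J - I||_F^2
    over masked voxels. A surrogate sketch, not the paper's actual loss.
    """
    # grads[c, k] = d(disp_c)/d(x_k), finite differences via np.gradient
    grads = np.stack([np.stack(np.gradient(disp[c]), axis=0) for c in range(3)])
    J = grads + np.eye(3)[:, :, None, None, None]   # Jacobian at each voxel
    JtJ = np.einsum('kiabc,kjabc->ijabc', J, J)     # J^T J per voxel
    dev = JtJ - np.eye(3)[:, :, None, None, None]
    per_voxel = (dev ** 2).sum(axis=(0, 1))
    return per_voxel[mask].mean()

# A pure translation is rigid: the loss should vanish inside the mask.
disp = np.ones((3, 8, 8, 8)) * 0.3
mask = np.zeros((8, 8, 8), bool)
mask[2:6, 2:6, 2:6] = True
print(round(rigidity_loss(disp, mask), 6))  # 0.0
```

相反,各向同性缩放场(非刚性)会得到严格为正的损失值,这正是该类约束用于区分"软组织可变形、椎体须刚性"的原因。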

CV-73-标题 Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction

链接: https://arxiv.org/abs/2205.07471
作者: Hong Wang, Yuexiang Li, Deyu Meng, Yefeng Zheng
备注: this https URL

点击查看摘要

Abstract: Inspired by the great success of deep neural networks, learning-based methods have gained promising performances for metal artifact reduction (MAR) in computed tomography (CT) images. However, most of the existing approaches put less emphasis on modelling and embedding the intrinsic prior knowledge underlying this specific MAR task into their network designs. Against this issue, we propose an adaptive convolutional dictionary network (ACDNet), which leverages both model-based and learning-based methods. Specifically, we explore the prior structures of metal artifacts, e.g., non-local repetitive streaking patterns, and encode them as an explicit weighted convolutional dictionary model. Then, a simple-yet-effective algorithm is carefully designed to solve the model. By unfolding every iterative substep of the proposed algorithm into a network module, we explicitly embed the prior structure into a deep network, i.e., a clear interpretability for the MAR task. Furthermore, our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image based on its content. Hence, our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods. Comprehensive experiments executed on synthetic and clinical datasets show the superiority of our ACDNet in terms of effectiveness and model generalization. Code is available at this https URL.

摘要:受深度神经网络巨大成功的启发,基于学习的方法在计算机断层扫描(CT)图像的金属伪影消减(MAR)上取得了可观的性能。但是,大多数现有方法较少强调对这一特定MAR任务背后的内在先验知识进行建模并将其嵌入网络设计中。针对这一问题,我们提出了一个自适应卷积字典网络(ACDNet),它同时利用了基于模型的方法和基于学习的方法。具体而言,我们探究了金属伪影的先验结构,例如非局部重复条纹模式,并将其编码为显式的加权卷积字典模型。然后,我们精心设计了一个简单而有效的算法来求解该模型。通过将所提算法的每个迭代子步骤展开为一个网络模块,我们将先验结构显式嵌入深度网络中,即为MAR任务提供了清晰的可解释性。此外,我们的ACDNet可以通过训练数据自动学习无伪影CT图像的先验,并根据每幅输入CT图像的内容自适应调整其表示核。因此,我们的方法继承了基于模型方法的清晰可解释性,并保持了基于学习方法的强大表示能力。在合成和临床数据集上进行的综合实验表明了ACDNet在有效性和模型泛化性方面的优越性。代码可在此HTTPS URL获取。

CV-74-标题 High-Resolution CMB Lensing Reconstruction with Deep Learning

链接: https://arxiv.org/abs/2205.07368
作者: Peikai Li, Ipek Ilayda Onur, Scott Dodelson, Shreyas Chaudhari
备注: 11 pages, 9 figures

点击查看摘要

Abstract: Next-generation cosmic microwave background (CMB) surveys are expected to provide valuable information about the primordial universe by creating maps of the mass along the line of sight. Traditional tools for creating these lensing convergence maps include the quadratic estimator and the maximum likelihood based iterative estimator. Here, we apply a generative adversarial network (GAN) to reconstruct the lensing convergence field. We compare our results with a previous deep learning approach – Residual-UNet – and discuss the pros and cons of each. In the process, we use training sets generated by a variety of power spectra, rather than the one used in testing the methods.

摘要:下一代宇宙微波背景(CMB)巡天有望通过构建视线方向上的质量分布图来提供有关原初宇宙的宝贵信息。构建这些透镜会聚图的传统工具包括二次估计器和基于最大似然的迭代估计器。在这里,我们应用生成对抗网络(GAN)来重建透镜会聚场。我们将结果与之前的深度学习方法Residual-UNet进行比较,并讨论各自的优缺点。在此过程中,我们使用由多种功率谱生成的训练集,而不仅是测试方法时所用的那一种。

CV-75-标题 Combating COVID-19 using Generative Adversarial Networks and Artificial Intelligence for Medical Images A Scoping Review

链接: https://arxiv.org/abs/2205.07236
作者: Hazrat Ali, Zubair Shah
备注:

点击查看摘要

Abstract: This review presents a comprehensive study on the role of GANs in addressing the challenges related to COVID-19 data scarcity and diagnosis. It is the first review that summarizes the different GANs methods and the lungs images datasets for COVID-19. It attempts to answer the questions related to applications of GANs, popular GAN architectures, frequently used image modalities, and the availability of source code. This review included 57 full-text studies that reported the use of GANs for different applications in COVID-19 lungs images data. Most of the studies (n=42) used GANs for data augmentation to enhance the performance of AI techniques for COVID-19 diagnosis. Other popular applications of GANs were segmentation of lungs and super-resolution of the lungs images. The cycleGAN and the conditional GAN were the most commonly used architectures used in nine studies each. 29 studies used chest X-Ray images while 21 studies used CT images for the training of GANs. For majority of the studies (n=47), the experiments were done and results were reported using publicly available data. A secondary evaluation of the results by radiologists/clinicians was reported by only two studies. Conclusion: Studies have shown that GANs have great potential to address the data scarcity challenge for lungs images of COVID-19. Data synthesized with GANs have been helpful to improve the training of the Convolutional Neural Network (CNN) models trained for the diagnosis of COVID-19. Besides, GANs have also contributed to enhancing the CNNs performance through the super-resolution of the images and segmentation. This review also identified key limitations of the potential transformation of GANs based methods in clinical applications.

摘要:这篇综述全面研究了GAN在应对COVID-19数据稀缺和诊断相关挑战中的作用。这是第一篇总结各种GAN方法以及COVID-19肺部图像数据集的综述。它试图回答与GAN的应用、流行的GAN架构、常用的图像模态以及源代码可用性相关的问题。这项综述包括57项全文研究,这些研究报告了GAN在COVID-19肺部图像数据中的不同应用。大多数研究(n=42)使用GAN进行数据增强,以提升用于COVID-19诊断的AI技术的性能。GAN的其他流行应用是肺部分割和肺部图像的超分辨率。cycleGAN和条件GAN是最常用的架构,各有九项研究使用。29项研究使用胸部X射线图像,而21项研究使用CT图像来训练GAN。大多数研究(n=47)使用公开数据进行实验并报告结果。仅有两项研究报告了由放射科医生/临床医生对结果进行的二次评估。结论:研究表明,GAN在解决COVID-19肺部图像的数据稀缺挑战方面具有巨大潜力。用GAN合成的数据有助于改进为诊断COVID-19而训练的卷积神经网络(CNN)模型的训练。此外,GAN还通过图像超分辨率和分割提升了CNN的性能。这篇综述还指出了基于GAN的方法向临床应用转化的关键局限性。

CV-76-标题 Nonconvex L_{1/2}-Regularized Nonlocal Self-similarity Denoiser for Compressive Sensing based CT Reconstruction

链接: https://arxiv.org/abs/2205.07185
作者: Yunyi Li (1), Yiqiu Jiang (2), Hengmin Zhang (3), Jianxun Liu (1), Xiangling Ding (1), Guan Gui (4) ((1) School of Computer Science and Engineering, Hunan University of Science and Technology (2) Department of Sports Medicine and Joint Surgery, Nanjing First Hospital, Nanjing Medical University (3) Department of Computer and Information Science, University of Macau (4) College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications)
备注: Preprint submitted to Journal of The Franklin Institute. Corresponding Author: yunyili@hnust.edu.cn, guiguan@njupt.edu.cn

点击查看摘要

Abstract: Compressive sensing (CS) based computed tomography (CT) image reconstruction aims at reducing the radiation risk through sparse-view projection data. It is usually challenging to achieve satisfying image quality from incomplete projections. Recently, the nonconvex L_{1/2}-norm has achieved promising performance in sparse recovery, while the applications on imaging are unsatisfactory due to its nonconvexity. In this paper, we develop a L_{1/2}-regularized nonlocal self-similarity (NSS) denoiser for CT reconstruction problem, which integrates low-rank approximation with group sparse coding (GSC) framework. Concretely, we first split the CT reconstruction problem into two subproblems, and then improve the CT image quality furtherly using our L_{1/2}-regularized NSS denoiser. Instead of optimizing the nonconvex problem under the perspective of GSC, we particularly reconstruct CT image via low-rank minimization based on two simple yet essential schemes, which build the equivalent relationship between GSC based denoiser and low-rank minimization. Furtherly, the weighted singular value thresholding (WSVT) operator is utilized to optimize the resulting nonconvex L_{1/2} minimization problem. Following this, our proposed denoiser is integrated with the CT reconstruction problem by alternating direction method of multipliers (ADMM) framework. Extensive experimental results on typical clinical CT images have demonstrated that our approach can further achieve better performance than popular approaches.

摘要:基于压缩感知(CS)的计算机断层扫描(CT)图像重建旨在通过稀疏视角投影数据降低辐射风险。从不完整的投影中获得令人满意的图像质量通常颇具挑战。最近,非凸L_{1/2}范数在稀疏恢复中取得了有前景的性能,但由于其非凸性,在成像上的应用并不令人满意。在本文中,我们为CT重建问题开发了一个L_{1/2}正则化的非局部自相似(NSS)去噪器,将低秩近似与组稀疏编码(GSC)框架相结合。具体而言,我们首先将CT重建问题拆分为两个子问题,然后使用我们的L_{1/2}正则化NSS去噪器进一步提高CT图像质量。我们没有从GSC的角度优化非凸问题,而是基于两个简单却关键的方案,通过低秩最小化重建CT图像,这两个方案建立了基于GSC的去噪器与低秩最小化之间的等价关系。此外,我们利用加权奇异值阈值(WSVT)算子来优化所得的非凸L_{1/2}最小化问题。随后,我们提出的去噪器通过交替方向乘子法(ADMM)框架与CT重建问题集成在一起。对典型临床CT图像的大量实验结果表明,我们的方法可以取得比流行方法更好的性能。
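摘要中提到的加权奇异值阈值(WSVT)算子有一个标准的软阈值形式:对每个奇异值按各自的权重收缩(类似加权核范数最小化WNNM中的做法)。下面给出一个NumPy示意;注意这只是WSVT算子本身的通用写法,论文将其与L_{1/2}惩罚及ADMM框架耦合的细节在此省略。

```python
import numpy as np

def wsvt(Y, weights):
    """Weighted singular value thresholding (generic soft-threshold form).

    Shrinks each singular value s_i of Y by its own weight w_i; with
    non-descending weights this solves the weighted nuclear-norm proximal
    problem in closed form. Sketch only: the paper couples this operator
    with an L_{1/2} penalty inside an ADMM loop, which is omitted here.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - weights, 0.0)   # per-value soft threshold
    return U @ np.diag(s_shrunk) @ Vt

# Rank-1 patch matrix plus small noise: reweighting (small s -> big w)
# suppresses the noise components while keeping the dominant one.
rng = np.random.default_rng(0)
Y = np.outer(rng.normal(size=20), rng.normal(size=15)) \
    + 0.01 * rng.normal(size=(20, 15))
w = 0.5 / (np.linalg.svd(Y, compute_uv=False) + 1e-8)
X = wsvt(Y, w)
print(np.linalg.matrix_rank(X))  # 1
```

在非局部自相似去噪中,Y通常是由相似图像块堆叠成的矩阵,低秩化后的X即为去噪结果。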

CV-77-标题 Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT

链接: https://arxiv.org/abs/2205.07180
作者: Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu
备注: Submitted to Interspeech

点击查看摘要

Abstract: This paper investigates self-supervised pre-training for audio-visual speaker representation learning where a visual stream showing the speaker’s mouth area is used alongside speech as inputs. Our study focuses on the Audio-Visual Hidden Unit BERT (AV-HuBERT) approach, a recently developed general-purpose audio-visual speech pre-training framework. We conducted extensive experiments probing the effectiveness of pre-training and visual modality. Experimental results suggest that AV-HuBERT generalizes decently to speaker related downstream tasks, improving label efficiency by roughly ten fold for both audio-only and audio-visual speaker verification. We also show that incorporating visual information, even just the lip area, greatly improves the performance and noise robustness, reducing EER by 38% in the clean condition and 75% in noisy conditions. Our code and models will be publicly available.

摘要:本文研究了视听说话人表示学习的自监督预训练,其中显示说话人口部区域的视觉流与语音一起用作输入。我们的研究聚焦于视听隐藏单元BERT(AV-HuBERT)方法,这是最近开发的一个通用视听语音预训练框架。我们进行了广泛的实验来探究预训练和视觉模态的有效性。实验结果表明,AV-HuBERT能够很好地泛化到与说话人相关的下游任务,将纯音频和视听说话人验证的标签效率都提高了约十倍。我们还表明,引入视觉信息(哪怕仅是唇部区域)可显著提高性能和噪声鲁棒性,在干净条件下将EER降低38%,在嘈杂条件下降低75%。我们的代码和模型将公开发布。

CV-78-标题 A Unifying Multi-sampling-ratio CS-MRI Framework With Two-grid-cycle Correction and Geometric Prior Distillation

链接: https://arxiv.org/abs/2205.07062
作者: Xiaohong Fan, Yin Yang, Ke Chen, Jianping Zhang, Ke Dong
备注: 12 pages

点击查看摘要

Abstract: CS is an efficient method to accelerate the acquisition of MR images from under-sampled k-space data. Although existing deep learning CS-MRI methods have achieved considerably impressive performance, explainability and generalizability continue to be challenging for such methods since most of them are not flexible enough to handle multi-sampling-ratio reconstruction assignments, often the transition from mathematical analysis to network design not always natural enough. In this work, to tackle explainability and generalizability, we propose a unifying deep unfolding multi-sampling-ratio CS-MRI framework, by merging advantages of model-based and deep learning-based methods. The combined approach offers more generalizability than previous works whereas deep learning gains explainability through a geometric prior module. Inspired by multigrid algorithm, we first embed the CS-MRI-based optimization algorithm into correction-distillation scheme that consists of three ingredients: pre-relaxation module, correction module and geometric prior distillation module. Furthermore, we employ a condition module to learn adaptively step-length and noise level from compressive sampling ratio in every stage, which enables the proposed framework to jointly train multi-ratio tasks through a single model. The proposed model can not only compensate the lost contextual information of reconstructed image which is refined from low frequency error in geometric characteristic k-space, but also integrate the theoretical guarantee of model-based methods and the superior reconstruction performances of deep learning-based methods. All physical-model parameters are learnable, and numerical experiments show that our framework outperforms state-of-the-art methods in terms of qualitative and quantitative evaluations.

摘要:CS是一种加速从欠采样k空间数据中获取MR图像的有效方法。尽管现有的深度学习CS-MRI方法已经取得了相当可观的性能,但其可解释性和泛化性仍然具有挑战性,因为大多数方法不够灵活,难以处理多采样率的重建任务,而且从数学分析到网络设计的过渡往往不够自然。在这项工作中,为了解决可解释性和泛化性问题,我们通过融合基于模型方法和基于深度学习方法的优势,提出了一个统一的深度展开多采样率CS-MRI框架。这种组合方法比以往工作具有更强的泛化性,而深度学习通过几何先验模块获得了可解释性。受多重网格算法的启发,我们首先将基于CS-MRI的优化算法嵌入由三个部分组成的校正-蒸馏方案中:预松弛模块、校正模块和几何先验蒸馏模块。此外,我们采用条件模块在每个阶段从压缩采样率中自适应地学习步长和噪声水平,使所提框架能够通过单个模型联合训练多采样率任务。所提模型不仅可以补偿重建图像中丢失的上下文信息(通过修正几何特征k空间中的低频误差得到改善),还融合了基于模型方法的理论保证和基于深度学习方法的卓越重建性能。所有物理模型参数均可学习,数值实验表明,在定性和定量评估方面,我们的框架优于最先进的方法。

CV-79-标题 Self-supervised Assisted Active Learning for Skin Lesion Segmentation

链接: https://arxiv.org/abs/2205.07021
作者: Ziyuan Zhao, Wenjing Lu, Zeng Zeng, Kaixin Xu, Bharadwaj Veeravalli, Cuntai Guan
备注: Accepted by the 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2022)

点击查看摘要

Abstract: Label scarcity has been a long-standing issue for biomedical image segmentation, due to high annotation costs and professional requirements. Recently, active learning (AL) strategies strive to reduce annotation costs by querying a small portion of data for annotation, receiving much traction in the field of medical imaging. However, most of the existing AL methods have to initialize models with some randomly selected samples followed by active selection based on various criteria, such as uncertainty and diversity. Such random-start initialization methods inevitably introduce under-value redundant samples and unnecessary annotation costs. For the purpose of addressing the issue, we propose a novel self-supervised assisted active learning framework in the cold-start setting, in which the segmentation model is first warmed up with self-supervised learning (SSL), and then SSL features are used for sample selection via latent feature clustering without accessing labels. We assess our proposed methodology on skin lesions segmentation task. Extensive experiments demonstrate that our approach is capable of achieving promising performance with substantial improvements over existing baselines.

摘要:由于高昂的标注成本和专业要求,标签稀缺一直是生物医学图像分割的长期问题。最近,主动学习(AL)策略致力于通过只查询一小部分数据进行标注来降低标注成本,在医学成像领域受到广泛关注。但是,大多数现有的AL方法必须先用一些随机选择的样本初始化模型,再根据不确定性、多样性等各种标准进行主动选择。这种随机启动的初始化方法不可避免地引入低价值的冗余样本和不必要的标注成本。为了解决这个问题,我们提出了一个冷启动场景下新颖的自监督辅助主动学习框架:先用自监督学习(SSL)对分割模型进行热身,然后在不访问标签的情况下,利用SSL特征通过潜在特征聚类进行样本选择。我们在皮肤病变分割任务上评估了所提方法。大量实验表明,我们的方法能够取得有前景的性能,较现有基线有显著改进。
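冷启动场景下"无标签选样"的核心做法可以概括为:对SSL特征做聚类,再选取各簇中心最近的样本送标注。下面是一个纯NumPy的最小示意(最远点初始化加Lloyd迭代);论文实际使用的聚类方法与选样准则可能不同,函数名为自拟。

```python
import numpy as np

def cold_start_select(features, n_query, n_iter=20):
    """Pick an initial annotation batch without labels: cluster the (SSL)
    features into n_query groups and return the sample nearest each centroid.
    Minimal sketch of clustering-based cold-start selection only.
    """
    # Farthest-point seeding, then plain Lloyd (k-means) iterations.
    centroids = [features[0]]
    for _ in range(n_query - 1):
        d = np.min([np.linalg.norm(features - c, axis=1) for c in centroids],
                   axis=0)
        centroids.append(features[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        d = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        assign = d.argmin(axis=1)
        for k in range(n_query):
            if np.any(assign == k):
                centroids[k] = features[assign == k].mean(axis=0)
    d = np.linalg.norm(features[:, None] - centroids[None], axis=2)
    return np.unique(d.argmin(axis=0))   # nearest sample index per centroid

# Two well-separated feature blobs: a budget of 2 picks one sample from each.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 0.1, (50, 16)),
                   rng.normal(5, 0.1, (50, 16))])
picked = cold_start_select(feats, n_query=2)
print(sorted(int(p) // 50 for p in picked))  # [0, 1] -> one from each blob
```

这种选样与随机初始化的差别正在于:标注预算被自动摊到特征空间的不同模式上,避免冗余样本。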

CV-80-标题 BronchusNet Region and Structure Prior Embedded Representation Learning for Bronchus Segmentation and Classification

链接: https://arxiv.org/abs/2205.06947
作者: Wenhao Huang, Haifan Gong, Huan Zhang, Yu Wang, Haofeng Li, Guanbin Li, Hong Shen
备注:

点击查看摘要

Abstract: CT-based bronchial tree analysis plays an important role in the computer-aided diagnosis for respiratory diseases, as it could provide structured information for clinicians. The basis of airway analysis is bronchial tree reconstruction, which consists of bronchus segmentation and classification. However, there remains a challenge for accurate bronchial analysis due to the individual variations and the severe class imbalance. In this paper, we propose a region and structure prior embedded framework named BronchusNet to achieve accurate segmentation and classification of bronchial regions in CT images. For bronchus segmentation, we propose an adaptive hard region-aware UNet that incorporates multi-level prior guidance of hard pixel-wise samples in the general UNet segmentation network to achieve better hierarchical feature learning. For the classification of bronchial branches, we propose a hybrid point-voxel graph learning module to fully exploit bronchial structure priors and to support simultaneous feature interactions across different branches. To facilitate the study of bronchial analysis, we contribute BRSC: an open-access benchmark of bronchus imaging analysis with high-quality pixel-wise segmentation masks and the class of bronchial segments. Experimental results on BRSC show that our proposed method not only achieves the state-of-the-art performance for binary segmentation of bronchial region but also exceeds the best existing method on bronchial branches classification by 6.9%.

摘要:基于CT的支气管树分析在呼吸系统疾病的计算机辅助诊断中起着重要作用,因为它可以为临床医生提供结构化信息。气道分析的基础是支气管树重建,由支气管分割和分类组成。但是,由于个体差异和严重的类别不平衡,准确的支气管分析仍然存在挑战。在本文中,我们提出了一个名为BronchusNet的区域与结构先验嵌入框架,以实现CT图像中支气管区域的精确分割和分类。对于支气管分割,我们提出了一种自适应困难区域感知UNet,它在通用UNet分割网络中引入对困难像素样本的多层级先验引导,以实现更好的层次化特征学习。对于支气管分支的分类,我们提出了一个混合点-体素图学习模块,以充分利用支气管结构先验,并支持不同分支之间的同时特征交互。为了促进支气管分析的研究,我们贡献了BRSC:一个带有高质量像素级分割掩码和支气管段类别标注的支气管影像分析开放基准。在BRSC上的实验结果表明,我们提出的方法不仅在支气管区域二值分割上达到了最先进的性能,而且在支气管分支分类上超过现有最佳方法6.9%。

CV-81-标题 Unsupervised Representation Learning for 3D MRI Super Resolution with Degradation Adaptation

链接: https://arxiv.org/abs/2205.06891
作者: Jianan Liu, Hao Li, Tao Huang, Euijoon Ahn, Adeel Razi, Wei Xiang
备注: 8 pages, 4 figures

点击查看摘要

Abstract: High-resolution (HR) MRI is critical in assisting the doctor’s diagnosis and image-guided treatment, but is hard to obtain in a clinical setting due to long acquisition time. Therefore, the research community investigated deep learning-based super-resolution (SR) technology to reconstruct HR MRI images with shortened acquisition time. However, training such neural networks usually requires paired HR and low-resolution (LR) in-vivo images, which are difficult to acquire due to patient movement during and between the image acquisition. Rigid movements of hard tissues can be corrected with image-registration, whereas the alignment of deformed soft tissues is challenging, making it impractical to train the neural network with such authentic HR and LR image pairs. Therefore, most of the previous studies proposed SR reconstruction by employing authentic HR images and synthetic LR images downsampled from the HR images, yet the difference in degradation representations between synthetic and authentic LR images suppresses the performance of SR reconstruction from authentic LR images. To mitigate the aforementioned problems, we propose a novel Unsupervised DEgradation Adaptation Network (UDEAN). Our model consists of two components: the degradation learning network and the SR reconstruction network. The degradation learning network downsamples the HR images by addressing the degradation representation of the misaligned or unpaired LR images, and the SR reconstruction network learns the mapping from the downsampled HR images to their original HR images. As a result, the SR reconstruction network can generate SR images from the LR images and achieve comparable quality to the HR images. Experimental results show that our method outperforms the state-of-the-art models and can potentially be applied in real-world clinical settings.

摘要:高分辨率(HR)MRI对于辅助医生诊断和图像引导治疗至关重要,但由于采集时间长,在临床环境中很难获得。因此,研究界研究了基于深度学习的超分辨率(SR)技术,以在缩短采集时间的同时重建HR MRI图像。但是,训练这类神经网络通常需要成对的高分辨率(HR)和低分辨率(LR)体内图像,而由于图像采集过程中及采集间隔内患者的移动,此类图像难以获取。硬组织的刚性运动可以通过图像配准校正,而变形软组织的对齐则颇具挑战,使得用这种真实的HR和LR图像对训练神经网络变得不切实际。因此,以往的大多数研究使用真实HR图像和由HR图像下采样得到的合成LR图像来进行SR重建,但合成LR图像与真实LR图像之间退化表示的差异会抑制真实LR图像SR重建的性能。为了缓解上述问题,我们提出了一个新颖的无监督退化自适应网络(UDEAN)。我们的模型由两个组成部分组成:退化学习网络和SR重建网络。退化学习网络通过处理未对齐或未配对LR图像的退化表示来对HR图像进行下采样,而SR重建网络学习从下采样HR图像到其原始HR图像的映射。结果,SR重建网络可以从LR图像生成SR图像,并获得与HR图像相当的质量。实验结果表明,我们的方法优于最先进的模型,并有望应用于真实世界的临床环境。

人工智能

AI-0-标题 How Different Groups Prioritize Ethical Values for Responsible AI

链接: https://arxiv.org/abs/2205.07722
作者: Maurice Jakesch, Zana Buçinca, Saleema Amershi, Alexandra Olteanu
备注:

点击查看摘要

Abstract: Private companies, public sector organizations, and academic groups have outlined ethical values they consider important for responsible artificial intelligence technologies. While their recommendations converge on a set of central values, little is known about the values a more representative public would find important for the AI technologies they interact with and might be affected by. We conducted a survey examining how individuals perceive and prioritize responsible AI values across three groups: a representative sample of the US population (N=743), a sample of crowdworkers (N=755), and a sample of AI practitioners (N=175). Our results empirically confirm a common concern: AI practitioners’ value priorities differ from those of the general public. Compared to the US-representative sample, AI practitioners appear to consider responsible AI values as less important and emphasize a different set of values. In contrast, self-identified women and black respondents found responsible AI values more important than other groups. Surprisingly, more liberal-leaning participants, rather than participants reporting experiences with discrimination, were more likely to prioritize fairness than other groups. Our findings highlight the importance of paying attention to who gets to define responsible AI.

摘要:私营公司、公共部门组织和学术团体概述了他们认为对负责任的人工智能技术重要的伦理价值观。尽管他们的建议汇聚于一组核心价值,但对于更具代表性的公众会认为哪些价值对其所接触并可能受其影响的AI技术重要,我们知之甚少。我们进行了一项调查,考察三个群体中个人如何看待和排序负责任AI的价值:美国人口的代表性样本(n=743)、众包工作者样本(n=755)和AI从业者样本(n=175)。我们的结果从经验上证实了一个普遍担忧:AI从业者的价值优先级与公众不同。与美国代表性样本相比,AI从业者似乎认为负责任AI的价值不那么重要,并强调一组不同的价值。相反,自我认同的女性和黑人受访者认为负责任AI的价值比其他群体更重要。令人惊讶的是,更倾向自由派的参与者,而非报告有歧视经历的参与者,比其他群体更可能优先考虑公平。我们的发现强调了关注由谁来定义负责任AI的重要性。

AI-1-标题 A review of ontologies for smart and continuous commissioning

链接: https://arxiv.org/abs/2205.07636
作者: Sara Gilani, Caroline Quinn, J.J. McArthur (Faculty of Engineering and Architectural Science, Ryerson University, Toronto, Canada)
备注: 36 pages, 9557 words

点击查看摘要

Abstract: Smart and continuous commissioning (SCCx) of buildings can result in a significant reduction in the gap between design and operational performance. Ontologies play an important role in SCCx as they facilitate data readability and reasoning by machines. A better understanding of ontologies is required in order to develop and incorporate them in SCCx. This paper critically reviews the state-of-the-art research on building data ontologies since 2014 within the SCCx domain through sorting them based on building data types, general approaches, and applications. The data types of two main domains of building information modeling and building management system have been considered in the majority of existing ontologies. Three main applications are evident from a critical analysis of existing ontologies: (1) key performance indicator calculation, (2) building performance improvement, and (3) fault detection and diagnosis. The key gaps found in the literature review are a holistic ontology for SCCx and insight on how such approaches should be evaluated. Based on these findings, this study provides recommendations for future necessary research including: identification of SCCx-related data types, assessment of ontology performance, and creation of open-source approaches.

摘要:建筑物的智能和连续调试(SCCX)可能会大大减少设计和运营性能之间的差距。本体论在SCCX中起着重要作用,因为它们促进了机器的数据可读性和推理。为了将其开发和纳入SCCX,需要更好地了解本体。本文批判性地回顾了自2014年以来自2014年以来在SCCX域内建立数据本体的最新研究,通过基于建筑数据类型,一般方法和应用程序对它们进行排序。在大多数现有本体论中,已经考虑了建筑信息建模和建筑管理系统的两个主要领域的数据类型。从现有本体论的批判分析中可以明显看出三个主要应用:(1)关键绩效指标计算,(2)建筑物绩效的改善以及(3)故障检测和诊断。文献综述中发现的关键差距是SCCX的整体本体,并了解应如何评估这种方法。基于这些发现,本研究为未来的必要研究提供了建议,包括:与SCCX相关的数据类型的识别,本体学绩效评估以及创建开源方法。

AI-2-标题 Relating Information and Proof

链接: https://arxiv.org/abs/2205.07635
作者: Anatol Slissenko
备注: 9 pages

点击查看摘要

Abstract: In mathematics information is a number that measures uncertainty (entropy) based on a probabilistic distribution, often of an obscure origin. In real life language information is a datum, a statement, more precisely, a formula. But such a formula should be justified by a proof. I try to formalize this perception of information. The measure of informativeness of a proof is based on the set of proofs related to the formulas under consideration. This set of possible proofs (a "knowledge base") defines a probabilistic measure, and entropic weight is defined using this measure. The paper is mainly conceptual, it is not clear where and how this approach can be applied.

摘要:在数学中,信息是一个基于概率分布来度量不确定性(熵)的数,而这种分布往往来源不明。在现实语言中,信息是一个数据、一个陈述,更准确地说,是一个公式。但这样的公式应当由一个证明来支撑。我试图将这种对信息的理解形式化。一个证明的信息量的度量基于与所考虑公式相关的证明集合。这个可能证明的集合("知识库")定义了一个概率测度,并利用该测度定义熵权重。本文主要是概念性的,尚不清楚这种方法可以在何处以及如何应用。
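文中"由可能证明的集合定义概率测度、再定义熵权重"的想法,可以用香农熵做一个最小示意。注意这只是我们对该定义的一种朴素解读(对证明集合取经验分布),并非作者的精确定义。

```python
import math

def entropic_weight(proof_counts):
    """Entropy of the empirical distribution over a set of possible proofs.

    proof_counts maps each proof (or proof class) in the "knowledge base"
    to how often it is used; the induced probabilities give a Shannon
    entropy. One simple reading of the paper's "entropic weight", not
    the author's exact definition.
    """
    total = sum(proof_counts.values())
    probs = [c / total for c in proof_counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Four equally likely proofs of a formula: maximal uncertainty, 2 bits.
print(entropic_weight({"p1": 1, "p2": 1, "p3": 1, "p4": 1}))  # 2.0
# One dominant proof: low entropy, the formula is "cheap" to justify.
print(entropic_weight({"p1": 97, "p2": 1, "p3": 1, "p4": 1}))
```

在这种解读下,公式的信息量随其证明分布的分散程度增大,这与摘要中"基于证明集合的概率测度"的提法一致。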

AI-3-标题 KnowGraph-PM a Knowledge Graph based Pricing Model for Semiconductors Supply Chains

链接: https://arxiv.org/abs/2205.07627
作者: Nour Ramzy, Soren Auer, Javad Chamanara, Hans Ehm
备注:

点击查看摘要

Abstract: Semiconductor supply chains are described by significant demand fluctuation that increases as one moves up the supply chain, the so-called bullwhip effect. To counteract, semiconductor manufacturers aim to optimize capacity utilization, to deliver with shorter lead times and exploit this to generate revenue. Additionally, in a competitive market, firms seek to maintain customer relationships while applying revenue management strategies such as dynamic pricing. Price change potentially generates conflicts with customers. In this paper, we present KnowGraph-PM, a knowledge graph-based dynamic pricing model. The semantic model uses the potential of faster delivery and shorter lead times to define premium prices, thus entail increased profits based on the customer profile. The knowledge graph enables the integration of customer-related information, e.g., customer class and location to customer order data. The pricing algorithm is realized as a SPARQL query that relies on customer profile and order behavior to determine the corresponding price premium. We evaluate the approach by calculating the revenue generated after applying the pricing algorithm. Based on competency questions that translate to SPARQL queries, we validate the created knowledge graph. We demonstrate that semantic data integration enables customer-tailored revenue management.

摘要:半导体供应链的特点是显著的需求波动,且越靠近供应链上游波动越大,即所谓的"牛鞭效应"。为应对这一问题,半导体制造商力求优化产能利用率,以更短的交货期交付,并借此创造收入。此外,在竞争激烈的市场中,企业在采用动态定价等收入管理策略的同时,还要设法维护客户关系,而价格变动可能引发与客户的冲突。本文提出KnowGraph-PM,一种基于知识图谱的动态定价模型。该语义模型利用更快交付和更短交货期的潜力来定义溢价价格,从而根据客户画像获得更高利润。知识图谱能够将客户相关信息(例如客户等级和所在地)与客户订单数据整合起来。定价算法实现为一条SPARQL查询,依据客户画像和订单行为确定相应的价格溢价。我们通过计算应用定价算法后产生的收入来评估该方法,并基于可转化为SPARQL查询的能力问题(competency questions)验证所构建的知识图谱。结果表明,语义数据集成能够实现面向客户定制的收入管理。
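"依据客户等级和交货期确定价格溢价的SPARQL查询"可以如下示意(本体前缀 onto:、属性名和溢价系数均为假设,并非论文中的真实模式;另附一段逻辑等价的纯Python函数):

```python
# 假设的本体模式: onto:placedBy / onto:basePrice / onto:customerClass 等仅作示意
PRICE_PREMIUM_QUERY = """
PREFIX onto: <http://example.org/knowgraph#>
SELECT ?order ?basePrice ((?basePrice * ?factor) AS ?premiumPrice)
WHERE {
  ?order onto:placedBy ?customer ;
         onto:basePrice ?basePrice ;
         onto:requestedLeadTime ?lead .
  ?customer onto:customerClass ?class .
  BIND(IF(?class = "strategic", 1.05,
       IF(?lead < 10, 1.20, 1.10)) AS ?factor)
}
"""

def price_premium(customer_class, lead_time_days, base_price):
    """与上面 SPARQL 中 BIND 逻辑一致的纯 Python 版本。"""
    if customer_class == "strategic":
        factor = 1.05          # 战略客户: 维护长期关系, 低溢价
    elif lead_time_days < 10:
        factor = 1.20          # 加急短交期订单: 高溢价
    else:
        factor = 1.10
    return round(base_price * factor, 2)

print(price_premium("spot", 7, 100.0))   # 120.0
```

查询在任意SPARQL 1.1端点上执行即可按客户画像批量计算溢价价格。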

AI-4-标题 Problem Decomposition and Multi-shot ASP Solving for Job-shop Scheduling

链接: https://arxiv.org/abs/2205.07537
作者: Mohammed M. S. El-Kholany, Martin Gebser, Konstantin Schekotihin
备注:

点击查看摘要

Abstract: The Job-shop Scheduling Problem (JSP) is a well-known and challenging combinatorial optimization problem in which tasks sharing a machine are to be arranged in a sequence such that encompassing jobs can be completed as early as possible. In this paper, we propose problem decomposition into time windows whose operations can be successively scheduled and optimized by means of multi-shot Answer Set Programming (ASP) solving. Decomposition aims to split highly complex scheduling tasks into better manageable sub-problems with a balanced number of operations so that good quality or even optimal partial solutions can be reliably found in a small fraction of runtime. Problem decomposition must respect the precedence of operations within their jobs and partial schedules optimized by time windows should yield better global solutions than obtainable in similar runtime on the entire instance. We devise and investigate a variety of decomposition strategies in terms of the number and size of time windows as well as heuristics for choosing their operations. Moreover, we incorporate time window overlapping and compression techniques into the iterative scheduling process to counteract window-wise optimization limitations restricted to partial schedules. Our experiments on JSP benchmark sets of several sizes show that successive optimization by multi-shot ASP solving leads to substantially better schedules within the runtime limit than global optimization on the full problem, where the gap increases with the number of operations to schedule. While the obtained solution quality still remains behind a state-of-the-art Constraint Programming system, our multi-shot solving approach comes closer the larger the instance size, demonstrating good scalability by problem decomposition.

摘要:作业车间调度问题(JSP)是一个著名且具有挑战性的组合优化问题,需要把共享同一台机器的工序排成序列,使所有作业尽早完工。在本文中,我们提出把问题分解为若干时间窗,借助多轮(multi-shot)回答集编程(ASP)求解,对各时间窗内的工序依次进行调度和优化。分解旨在把高度复杂的调度任务拆分为操作数较为均衡、更易处理的子问题,从而能在很小一部分运行时间内可靠地找到高质量甚至最优的部分解。问题分解必须遵守作业内部工序的先后次序,而按时间窗优化得到的部分调度,应当比在整个实例上以相近运行时间直接求解得到的全局解更好。我们设计并考察了多种分解策略,包括时间窗的数量和大小,以及选择窗内工序的启发式方法。此外,我们在迭代调度过程中引入时间窗重叠与压缩技术,以缓解只针对部分调度做窗口级优化所带来的局限。在多个规模的JSP基准集上的实验表明,在相同运行时间限制内,多轮ASP求解的逐次优化所得到的调度显著优于对完整问题的全局优化,且待调度的工序越多,差距越大。虽然所得解的质量仍落后于最先进的约束编程系统,但实例规模越大,我们的多轮求解方法就越接近后者,显示出问题分解带来的良好可扩展性。
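"按时间窗分解且遵守作业内先后次序"这一步可以这样示意(纯Python草图,按作业内位置分层切分只是众多分解策略中假设的一种,论文中还研究了其它启发式;多轮ASP求解部分此处省略):

```python
def decompose(jobs, num_windows):
    """把工序切成操作数大致均衡的时间窗, 且保持作业内先后次序。
    jobs: {作业号: [工序所在机器, ...]}, 返回时间窗列表。
    """
    # 按"作业内位置"分层展开: 同一作业的第 p 道工序一定排在第 p+1 道之前,
    # 因此顺序切块后, 前驱工序绝不会落到更晚的时间窗
    ops = []
    max_len = max(len(seq) for seq in jobs.values())
    for pos in range(max_len):
        for job_id, seq in sorted(jobs.items()):
            if pos < len(seq):
                ops.append((job_id, pos, seq[pos]))
    size = -(-len(ops) // num_windows)   # 向上取整, 使各窗操作数均衡
    return [ops[i:i + size] for i in range(0, len(ops), size)]

jobs = {"J1": ["m1", "m2", "m3"], "J2": ["m2", "m1"], "J3": ["m3", "m1", "m2"]}
windows = decompose(jobs, 3)
print([len(w) for w in windows])   # [3, 3, 2]
```

每个时间窗随后交给求解器依次优化,上一窗的结果作为下一窗的已定前缀。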

AI-5-标题 Efficient Knowledge Compilation Beyond Weighted Model Counting

链接: https://arxiv.org/abs/2205.07496
作者: Rafael Kiesel, Pietro Totis, Angelika Kimmig
备注: Paper presented at the 38th International Conference on Logic Programming (ICLP 2022), 16 pages

点击查看摘要

Abstract: Quantitative extensions of logic programming often require the solution of so called second level inference tasks, i.e., problems that involve a third operation, such as maximization or normalization, on top of addition and multiplication, and thus go beyond the well-known weighted or algebraic model counting setting of probabilistic logic programming under the distribution semantics. We introduce Second Level Algebraic Model Counting (2AMC) as a generic framework for these kinds of problems. As 2AMC is to (algebraic) model counting what forall-exists-SAT is to propositional satisfiability, it is notoriously hard to solve. First level techniques based on Knowledge Compilation (KC) have been adapted for specific 2AMC instances by imposing variable order constraints on the resulting circuit. However, those constraints can severely increase the circuit size and thus decrease the efficiency of such approaches. We show that we can exploit the logical structure of a 2AMC problem to omit parts of these constraints, thus limiting the negative effect. Furthermore, we introduce and implement a strategy to generate a sufficient set of constraints statically, with a priori guarantees for the performance of KC. Our empirical evaluation on several benchmarks and tasks confirms that our theoretical results can translate into more efficient solving in practice. Under consideration for acceptance in TPLP.

摘要:逻辑编程的定量扩展经常需要求解所谓的二级推理任务,即在加法和乘法之上还涉及第三种运算(例如取最大或归一化)的问题,因而超出了分布语义下概率逻辑编程中众所周知的加权模型计数或代数模型计数的范畴。我们提出二级代数模型计数(2AMC)作为这类问题的通用框架。2AMC之于(代数)模型计数,就如同forall-exists-SAT之于命题可满足性,因此是出了名地难以求解。基于知识编译(KC)的一级技术已经通过在所得电路上施加变量顺序约束,被改造用于特定的2AMC实例。然而,这些约束会严重增大电路规模,从而降低此类方法的效率。我们证明,可以利用2AMC问题的逻辑结构省略其中一部分约束,从而限制其负面影响。此外,我们提出并实现了一种静态生成充分约束集合的策略,并对KC的性能给出先验保证。我们在若干基准和任务上的实证评估证实,理论结果能够转化为实践中更高效的求解。本文正在接受TPLP的审稿。
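2AMC的"两层半环"结构可以用一个朴素枚举来示意:外层变量上取最大(MAP推断就是一个典型的2AMC实例),内层变量上做加权模型计数。下面的小例子(公式、权重均为假设)只为说明语义,实际系统依赖知识编译而非枚举:

```python
from itertools import product

def two_amc(formula, outer_vars, inner_vars, weight):
    """2AMC 的朴素示意: 外层半环取 (max, *), 内层做加权模型计数 (+, *)。
    formula: 接受赋值字典的布尔函数; weight: (变量, 取值) -> 权重。
    """
    best = 0.0
    for outer in product([False, True], repeat=len(outer_vars)):
        o = dict(zip(outer_vars, outer))
        inner_sum = 0.0
        for inner in product([False, True], repeat=len(inner_vars)):
            a = {**o, **dict(zip(inner_vars, inner))}
            if formula(a):
                w = 1.0
                for v, val in a.items():
                    w *= weight(v, val)
                inner_sum += w          # 内层: 加权模型计数
        best = max(best, inner_sum)     # 外层: 对 outer 赋值取最大
    return best

# 假设的小例子: 公式 (x or y) and (y or z), 对 x 取 max, 对 y,z 求和
weight = lambda v, val: {"x": (0.4, 0.6), "y": (0.3, 0.7), "z": (0.5, 0.5)}[v][val]
f = lambda a: (a["x"] or a["y"]) and (a["y"] or a["z"])
print(round(two_amc(f, ["x"], ["y", "z"], weight), 3))   # 0.51
```

当外层与内层变量集都为空集之一时,上式分别退化为普通的WMC或纯优化。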

AI-6-标题 Behaviour Explanation via Causal Analysis of Mental States: A Preliminary Report

链接: https://arxiv.org/abs/2205.07443
作者: Shakil M. Khan
备注: 8 pages

点击查看摘要

Abstract: Inspired by a novel action-theoretic formalization of actual cause, Khan and Lespérance (2021) recently proposed a first account of causal knowledge that supports epistemic effects, models causal knowledge dynamics, and allows sensing actions to be causes of observed effects. To date, no other study has looked specifically at these issues. But their formalization is not sufficiently expressive enough to model explanations via causal analysis of mental states as it ignores a crucial aspect of theory of mind, namely motivations. In this paper, we build on their work to support causal reasoning about conative effects. In our framework, one can reason about causes of motivational states, and we allow motivation-altering actions to be causes of observed effects. We illustrate that this formalization along with a model of goal recognition can be utilized to explain agent behaviour in communicative multiagent contexts.

摘要:受一种关于实际原因(actual cause)的新颖动作理论形式化的启发,Khan和Lespérance(2021)最近首次提出了一种因果知识的刻画,它支持认知效应,对因果知识的动态变化建模,并允许感知动作成为所观察到效应的原因。迄今为止,尚无其他研究专门考察这些问题。但他们的形式化的表达能力不足以通过对心理状态的因果分析来建模行为解释,因为它忽略了心智理论的一个关键方面,即动机。在本文中,我们在其工作的基础上支持关于意动(conative)效应的因果推理。在我们的框架中,可以推理动机状态的成因,并且允许改变动机的动作成为所观察到效应的原因。我们展示了这一形式化结合目标识别模型,可用于解释交流式多智能体情境中的智能体行为。

AI-7-标题 Understanding Emergent Behaviours in Multi-Agent Systems with Evolutionary Game Theory

链接: https://arxiv.org/abs/2205.07369
作者: The Anh Han
备注:

点击查看摘要

Abstract: The mechanisms of emergence and evolution of collective behaviours in dynamical Multi-Agent Systems (MAS) of multiple interacting agents, with diverse behavioral strategies in co-presence, have been undergoing mathematical study via Evolutionary Game Theory (EGT). Their systematic study also resorts to agent-based modelling and simulation (ABM) techniques, thus enabling the study of aforesaid mechanisms under a variety of conditions, parameters, and alternative virtual games. This paper summarises some main research directions and challenges tackled in our group, using methods from EGT and ABM. These range from the introduction of cognitive and emotional mechanisms into agents’ implementation in an evolving MAS, to the cost-efficient interference for promoting prosocial behaviours in complex networks, to the regulation and governance of AI safety development ecology, and to the equilibrium analysis of random evolutionary multi-player games. This brief aims to sensitize the reader to EGT based issues, results and prospects, which are accruing in importance for the modeling of minds with machines and the engineering of prosocial behaviours in dynamical MAS, with impact on our understanding of the emergence and stability of collective behaviours. In all cases, important open problems in MAS research as viewed or prioritised by the group are described.

摘要:在由多个相互作用的智能体组成、且多种行为策略共存的动态多智能体系统(MAS)中,集体行为的涌现与演化机制一直是演化博弈论(EGT)数学研究的对象。对其系统性的研究还借助基于智能体的建模与仿真(ABM)技术,从而可以在各种条件、参数和不同虚拟博弈下考察上述机制。本文使用EGT和ABM的方法,总结了我们课题组所研究的若干主要方向与挑战,范围涵盖:在演化的MAS中为智能体实现引入认知与情感机制;以较低成本的干预在复杂网络中促进亲社会行为;对AI安全发展生态的监管与治理;以及随机演化多人博弈的均衡分析。本文旨在让读者了解基于EGT的问题、结果与前景,这些内容对于用机器为心智建模以及在动态MAS中设计亲社会行为的工程实践正日益重要,并影响着我们对集体行为的涌现与稳定性的理解。在所有情形下,我们都描述了课题组所关注或优先考虑的MAS研究中的重要开放问题。

AI-8-标题 Automating Defeasible Reasoning in Law

链接: https://arxiv.org/abs/2205.07335
作者: How Khang Lim, Avishkar Mahajan, Martin Strecker, Meng Weng Wong
备注:

点击查看摘要

Abstract: The paper studies defeasible reasoning in rule-based systems, in particular about legal norms and contracts. We identify rule modifiers that specify how rules interact and how they can be overridden. We then define rule transformations that eliminate these modifiers, leading in the end to a translation of rules to formulas. For reasoning with and about rules, we contrast two approaches, one in a classical logic with SMT solvers as proof engines, one in a non-monotonic logic with Answer Set Programming solvers.

摘要:本文研究基于规则的系统中的可废止推理,尤其是关于法律规范与合同的推理。我们识别出用于规定规则如何相互作用、如何被覆盖的规则修饰词(rule modifiers),进而定义消除这些修饰词的规则变换,最终把规则翻译成公式。在用规则推理以及关于规则的推理方面,我们对比了两种途径:一种基于经典逻辑,以SMT求解器作为证明引擎;另一种基于非单调逻辑,使用回答集编程求解器。
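"规则可被更特殊的规则覆盖"这种可废止语义,可以用一个极简的解释器来示意(规则表示、优先级覆盖方式均为假设的玩具构造,并非论文的SMT或ASP编码):

```python
def evaluate(rules, facts):
    """可废止规则的极简示意。规则 = (名称, 前提集合, 结论, 优先级);
    结论与其否定("P" 与 "-P")冲突时取优先级最高者, 模拟 "subject to" 覆盖。
    """
    fired = [r for r in rules if r[1] <= facts]          # 前提被事实满足
    conclusions = {}
    for name, _, concl, prio in sorted(fired, key=lambda r: r[3]):
        atom = concl.lstrip("-")
        conclusions[atom] = concl    # 优先级高的规则后处理, 覆盖低优先级结论
    return set(conclusions.values())

# 假设的合同示例: 默认可解约, 但违约已被补救(cured)时例外规则胜出
rules = [
    ("default",   {"contract"},          "may_terminate",  1),
    ("exception", {"contract", "cured"}, "-may_terminate", 2),
]
print(evaluate(rules, {"contract", "cured"}))   # {'-may_terminate'}
```

论文中的两种编码(SMT与ASP)处理的正是这种"例外覆盖默认"的翻译问题。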

AI-9-标题 Variable Functioning and Its Application to Large Scale Steel Frame Design Optimization

链接: https://arxiv.org/abs/2205.07274
作者: Amir H Gandomi, Kalyanmoy Deb, Ronald C Averill, Shahryar Rahnamayan, Mohammad Nabi Omidvar
备注:

点击查看摘要

Abstract: To solve complex real-world problems, heuristics and concept-based approaches can be used in order to incorporate information into the problem. In this study, a concept-based approach called variable functioning Fx is introduced to reduce the optimization variables and narrow down the search space. In this method, the relationships among one or more subset of variables are defined with functions using information prior to optimization; thus, instead of modifying the variables in the search process, the function variables are optimized. By using problem structure analysis technique and engineering expert knowledge, the Fx method is used to enhance the steel frame design optimization process as a complex real-world problem. The proposed approach is coupled with particle swarm optimization and differential evolution algorithms and used for three case studies. The algorithms are applied to optimize the case studies by considering the relationships among column cross-section areas. The results show that Fx can significantly improve both the convergence rate and the final design of a frame structure, even if it is only used for seeding.

摘要:为了求解复杂的现实问题,可以借助启发式方法和基于概念的方法把问题相关信息引入求解过程。本研究提出一种称为变量函数化(variable functioning,Fx)的基于概念的方法,用于减少优化变量并缩小搜索空间。该方法利用优化之前掌握的信息,用函数来定义一个或多个变量子集之间的关系;这样,搜索过程中优化的是函数变量,而不是直接修改原始变量。借助问题结构分析技术和工程专家知识,我们将Fx方法用于增强钢框架设计优化这一复杂现实问题的求解过程。所提方法与粒子群优化和差分进化算法相结合,应用于三个案例研究:通过考虑柱横截面面积之间的关系,用这些算法对案例进行优化。结果表明,即使仅用于生成初始解(seeding),Fx也能显著提高框架结构的收敛速度和最终设计质量。
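Fx的核心思想(用函数关系替代一组变量,只优化函数参数)可以用一个玩具例子示意。下面假设变量子集服从线性关系 x_i = a + b*i,并用简单随机搜索代替论文中的粒子群/差分进化(目标函数、线性形式均为假设):

```python
import random

def sphere(x):
    """玩具目标函数: 球函数, 最优值为 0。"""
    return sum(v * v for v in x)

def optimize_fx(dim, iters=2000, seed=0):
    """Fx 思想的示意: 令 x_i = a + b*i (假设的函数关系),
    只在 (a, b) 二维参数空间搜索, 而不是直接搜 dim 维变量。"""
    rng = random.Random(seed)
    best, best_val = None, float("inf")
    for _ in range(iters):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        x = [a + b * i for i in range(dim)]   # 由函数参数还原全部变量
        val = sphere(x)
        if val < best_val:
            best, best_val = (a, b), val
    return best, best_val

params, val = optimize_fx(30)
print(round(val, 3))   # 搜索空间从 30 维降到 2 维, 收敛明显更快
```

在钢框架设计中,对应的函数关系来自专家知识,例如相邻楼层柱截面随高度单调变化。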

AI-10-标题 Efficient lifting of symmetry breaking constraints for complex combinatorial problems

链接: https://arxiv.org/abs/2205.07129
作者: Alice Tarzariol, Martin Gebser, Mark Law, Konstantin Schekotihin
备注: Paper presented at the 38th International Conference on Logic Programming (ICLP 2022), 16 pages

点击查看摘要

Abstract: Many industrial applications require finding solutions to challenging combinatorial problems. Efficient elimination of symmetric solution candidates is one of the key enablers for high-performance solving. However, existing model-based approaches for symmetry breaking are limited to problems for which a set of representative and easily-solvable instances is available, which is often not the case in practical applications. This work extends the learning framework and implementation of a model-based approach for Answer Set Programming to overcome these limitations and address challenging problems, such as the Partner Units Problem. In particular, we incorporate a new conflict analysis algorithm in the Inductive Logic Programming system ILASP, redefine the learning task, and suggest a new example generation method to scale up the approach. The experiments conducted for different kinds of Partner Units Problem instances demonstrate the applicability of our approach and the computational benefits due to the first-order constraints learned.

摘要:许多工业应用需要求解具有挑战性的组合问题,而高效消除对称的候选解是高性能求解的关键因素之一。然而,现有的基于模型的对称破缺方法,仅适用于能够获得一组有代表性且易于求解的实例的问题,这在实际应用中往往并不成立。本工作扩展了一种面向回答集编程的基于模型方法的学习框架及其实现,以克服上述局限并处理具有挑战性的问题,例如伙伴单元问题(Partner Units Problem)。具体而言,我们在归纳逻辑编程系统ILASP中引入了一种新的冲突分析算法,重新定义了学习任务,并提出了一种新的样例生成方法来扩展该方法的规模。针对多种伙伴单元问题实例的实验证明了我们方法的适用性,以及所学到的一阶约束带来的计算收益。

AI-11-标题 GoalNet: Inferring Conjunctive Goal Predicates from Human Plan Demonstrations for Robot Instruction Following

链接: https://arxiv.org/abs/2205.07081
作者: Shreya Sharma, Jigyasa Gupta, Shreshth Tuli, Rohan Paul, Mausam
备注: Accepted at Planning and Reinforcement Learning workshop in ICAPS 2022

点击查看摘要

Abstract: Our goal is to enable a robot to learn how to sequence its actions to perform tasks specified as natural language instructions, given successful demonstrations from a human partner. The ability to plan high-level tasks can be factored as (i) inferring specific goal predicates that characterize the task implied by a language instruction for a given world state and (ii) synthesizing a feasible goal-reaching action-sequence with such predicates. For the former, we leverage a neural network prediction model, while utilizing a symbolic planner for the latter. We introduce a novel neuro-symbolic model, GoalNet, for contextual and task dependent inference of goal predicates from human demonstrations and linguistic task descriptions. GoalNet combines (i) learning, where dense representations are acquired for language instruction and the world state that enables generalization to novel settings and (ii) planning, where the cause-effect modeling by the symbolic planner eschews irrelevant predicates facilitating multi-stage decision making in large domains. GoalNet demonstrates a significant improvement (51%) in the task completion rate in comparison to a state-of-the-art rule-based approach on a benchmark data set displaying linguistic variations, particularly for multi-stage instructions.

摘要:我们的目标是让机器人在获得人类伙伴的成功演示后,学会如何对自身动作进行排序,以完成以自然语言指令形式给出的任务。规划高层任务的能力可以分解为:(i)推断出刻画给定世界状态下语言指令所蕴含任务的具体目标谓词;(ii)利用这些谓词合成一个可行的、能达成目标的动作序列。对于前者,我们采用神经网络预测模型;对于后者,我们使用符号规划器。我们提出了一种新颖的神经符号模型GoalNet,用于根据人类演示和语言任务描述,对目标谓词进行依赖于上下文和任务的推断。GoalNet结合了(i)学习与(ii)规划:学习阶段为语言指令和世界状态获得稠密表示,使模型能泛化到新场景;规划阶段借助符号规划器的因果建模避开无关谓词,便于在大规模领域中进行多阶段决策。在一个包含语言变体的基准数据集上,与最先进的基于规则的方法相比,GoalNet的任务完成率显著提升(51%),在多阶段指令上尤为明显。

AI-12-标题 Evaluating Membership Inference Through Adversarial Robustness

链接: https://arxiv.org/abs/2205.06986
作者: Zhaoxi Zhang, Leo Yu Zhang, Xufei Zheng, Bilal Hussain Abbasi, Shengshan Hu
备注: Accepted by The Computer Journal. Pre-print version

点击查看摘要

Abstract: The usage of deep learning is being escalated in many applications. Due to its outstanding performance, it is being used in a variety of security and privacy-sensitive areas in addition to conventional applications. One of the key aspects of deep learning efficacy is to have abundant data. This trait leads to the usage of data which can be highly sensitive and private, which in turn causes wariness with regard to deep learning in the general public. Membership inference attacks are considered lethal as they can be used to figure out whether a piece of data belongs to the training dataset or not. This can be problematic with regards to leakage of training data information and its characteristics. To highlight the significance of these types of attacks, we propose an enhanced methodology for membership inference attacks based on adversarial robustness, by adjusting the directions of adversarial perturbations through label smoothing under a white-box setting. We evaluate our proposed method on three datasets: Fashion-MNIST, CIFAR-10, and CIFAR-100. Our experimental results reveal that the performance of our method surpasses that of the existing adversarial robustness-based method when attacking normally trained models. Additionally, through comparing our technique with the state-of-the-art metric-based membership inference methods, our proposed method also shows better performance when attacking adversarially trained models. The code for reproducing the results of this work is available at \url{this https URL}.

摘要:深度学习的应用正在许多领域不断扩大。凭借出色的性能,它除了常规应用外,还被用于各种涉及安全与隐私的敏感场景。深度学习取得成效的关键之一是拥有充足的数据,这导致了对高度敏感的私有数据的使用,进而引发公众对深度学习的警惕。成员推断攻击被认为非常危险,因为它可以用来判断一条数据是否属于训练数据集,从而造成训练数据信息及其特征的泄露。为了凸显此类攻击的重要性,我们提出一种改进的基于对抗鲁棒性的成员推断攻击方法:在白盒设定下,通过标签平滑来调整对抗扰动的方向。我们在Fashion-MNIST、CIFAR-10和CIFAR-100三个数据集上评估了所提方法。实验结果表明,在攻击正常训练的模型时,我们方法的性能超过现有的基于对抗鲁棒性的方法;此外,与最先进的基于度量的成员推断方法相比,我们的方法在攻击经过对抗训练的模型时也表现更好。复现本文结果的代码见:此 https URL。
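基于对抗鲁棒性的成员推断的基本直觉是:模型对训练样本往往更"鲁棒",即把它推过决策边界需要更大的扰动。下面用一个线性玩具模型示意这一思路(模型、样本点和阈值均为假设;论文中的标签平滑方向调整此处省略):

```python
def margin(w, b, x):
    """样本到线性决策边界 w·x + b = 0 的距离:
    线性情形下, 它恰好等于把样本推过边界所需的最小扰动范数。"""
    norm = sum(wi * wi for wi in w) ** 0.5
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# 假设: 一个在训练点上"过拟合"的线性模型, 训练成员离边界更远
w, b = [1.0, 1.0], 0.0
members = [(2.0, 2.0), (-2.0, -2.0)]       # 训练样本: 间隔大
non_members = [(0.3, -0.1), (-0.2, 0.4)]   # 未见样本: 间隔小

threshold = 1.0   # 假设的判定阈值: 间隔大于它则推断为"训练成员"
preds = [margin(w, b, x) > threshold for x in members + non_members]
print(preds)   # [True, True, False, False]
```

深度模型没有闭式间隔,实际攻击中用对抗攻击算法估计的最小扰动范数充当这一分数。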

AI-13-标题 Grounding Explainability Within the Context of Global South in XAI

链接: https://arxiv.org/abs/2205.06919
作者: Deepa Singh, Michal Slupczynski, Ajit G. Pillai, Vinoth Pandian Sermuga Pandian
备注: 4 pages, Presented at CHI 2022 Workshop on Human-Centered Explainable AI (HCXAI): Beyond Opening the Black-Box of AI

点击查看摘要

Abstract: In this position paper, we propose building a broader and deeper understanding around Explainability in AI by ‘grounding’ it in social contexts, the socio-technical systems operate in. We situate our understanding of grounded explainability in the ‘Global South’ in general and India in particular and express the need for more research within the global south context when it comes to explainability and AI.

摘要:在这篇立场论文中,我们主张把AI可解释性"落地"到社会技术系统所处的社会情境之中,从而建立对可解释性更广、更深的理解。我们把这种情境化的可解释性置于"全球南方"(尤其是印度)的背景下加以讨论,并呼吁在全球南方语境下开展更多关于可解释性与AI的研究。

AI-14-标题 Deep Reinforcement Learning in mmW-NOMA: Joint Power Allocation and Hybrid Beamforming

链接: https://arxiv.org/abs/2205.06814
作者: Abbas Akbarpour-Kasgari, Mehrdad Ardebilipour
备注: 20 pages (single Column), 9 figures. arXiv admin note: text overlap with arXiv:2205.06489

点击查看摘要

Abstract: High demand of data rate in the next generation of wireless communication could be ensured by Non-Orthogonal Multiple Access (NOMA) approach in the millimetre-wave (mmW) frequency band. Decreasing the interference on the other users while maintaining the bit rate via joint power allocation and beamforming is mandatory to guarantee the high demand of bit-rate. Furthermore, mmW frequency bands dictates the hybrid structure for beamforming because of the trade-off in implementation and performance, simultaneously. In this paper, joint power allocation and hybrid beamforming of mmW-NOMA systems is brought up via recent advances in machine learning and control theory approaches called Deep Reinforcement Learning (DRL). Actor-critic phenomena is exploited to measure the immediate reward and providing the new action to maximize the overall Q-value of the network. Additionally, to improve the stability of the approach, we have utilized Soft Actor-Critic (SAC) approach where overall reward and action entropy is maximized, simultaneously. The immediate reward has been defined based on the soft weighted summation of the rate of all the users. The soft weighting is based on the achieved rate and allocated power of each user. Furthermore, the channel responses between the users and base station (BS) is defined as the state of environment, while action space is involved of the digital and analog beamforming weights and allocated power to each user. The simulation results represent the superiority of the proposed approach rather than the Time-Division Multiple Access (TDMA) and Non-Line of Sight (NLOS)-NOMA in terms of sum-rate of the users. It’s outperformance is caused by the joint optimization and independency of the proposed approach to the channel responses.

摘要:在毫米波(mmW)频段采用非正交多址接入(NOMA)可以满足下一代无线通信对高数据速率的需求。为保证高比特率需求,必须通过功率分配与波束成形的联合设计,在维持比特率的同时降低对其他用户的干扰。此外,出于实现复杂度与性能之间的权衡,毫米波频段决定了波束成形需要采用混合结构。本文借助机器学习与控制理论的最新进展,即深度强化学习(DRL),来处理mmW-NOMA系统的功率分配与混合波束成形联合优化问题。我们利用演员-评论家(actor-critic)机制来度量即时奖励,并给出使网络整体Q值最大化的新动作。此外,为了提高方法的稳定性,我们采用软演员-评论家(SAC)方法,同时最大化总体奖励和动作熵。即时奖励基于所有用户速率的软加权求和来定义,软权重取决于每个用户已达到的速率和分得的功率。此外,用户与基站(BS)之间的信道响应被定义为环境状态,而动作空间由数字与模拟波束成形权重以及分配给每个用户的功率组成。仿真结果表明,在用户和速率方面,所提方法优于时分多址(TDMA)和非视距(NLOS)-NOMA方案。其优势来源于联合优化以及该方法对信道响应的独立性。
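环境中的"用户速率"与"软加权求和奖励"可以这样示意(理想SIC假设下的下行NOMA速率、指数形式的软权重均为假设的简化,SAC智能体本身此处省略):

```python
import math

def rates_noma(gains, powers, noise=1e-3):
    """下行 NOMA 的示意速率: 信道增益弱的用户先被解码,
    强用户通过 SIC 消除弱用户信号的干扰(假设理想 SIC)。"""
    order = sorted(range(len(gains)), key=lambda i: gains[i])  # 弱 -> 强
    rates = [0.0] * len(gains)
    for rank, i in enumerate(order):
        # 用户 i 只受解码顺序在其之后(更强)用户功率的干扰
        interference = sum(powers[j] for j in order[rank + 1:]) * gains[i]
        sinr = powers[i] * gains[i] / (interference + noise)
        rates[i] = math.log2(1 + sinr)
    return rates

def soft_weighted_reward(rates, alpha=2.0):
    """"软加权求和"奖励的一个假设实现: 速率越低的用户权重越高,
    促使智能体兼顾公平而非只追求总速率。"""
    weights = [math.exp(-alpha * r) for r in rates]
    s = sum(weights)
    return sum(w / s * r for w, r in zip(weights, rates))

r = rates_noma([0.2, 1.0], [0.8, 0.2])   # 弱用户分得更多功率
print(round(soft_weighted_reward(r), 3))
```

在DRL环境中,该奖励在每一步由智能体输出的功率分配与波束成形权重计算得到。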

AI-15-标题 Fair Shares: Feasibility, Domination and Incentives

链接: https://arxiv.org/abs/2205.07519
作者: Moshe Babaioff, Uriel Feige
备注:

点击查看摘要

Abstract: We consider fair allocation of a set M of indivisible goods to n equally-entitled agents, with no monetary transfers. Every agent i has a valuation v_i from some given class of valuation functions. A share s is a function that maps a pair (v_i,n) to a value, with the interpretation that if an allocation of M to n agents fails to give agent i a bundle of value at least equal to s(v_i,n) , this serves as evidence that the allocation is not fair towards i . For such an interpretation to make sense, we would like the share to be feasible, meaning that for any valuations in the class, there is an allocation that gives every agent at least her share. The maximin share was a natural candidate for a feasible share for additive valuations. However, Kurokawa, Procaccia and Wang [2018] show that it is not feasible. We initiate a systematic study of the family of feasible shares. We say that a share is \emph{self maximizing} if truth-telling maximizes the implied guarantee. We show that every feasible share is dominated by some self-maximizing and feasible share. We seek to identify those self-maximizing feasible shares that are polynomial time computable, and offer the highest share values. We show that a SM-dominating feasible share – one that dominates every self-maximizing (SM) feasible share – does not exist for additive valuations (and beyond). Consequently, we relax the domination property to that of domination up to a multiplicative factor of \rho (called \rho -dominating). For additive valuations we present shares that are feasible, self-maximizing and polynomial-time computable. For n agents we present such a share that is \frac{2n}{3n-1} -dominating. For two agents we present such a share that is (1 - \epsilon) -dominating. Moreover, for these shares we present poly-time algorithms that compute allocations that give every agent at least her share.

摘要:我们考虑在不允许货币转移的情况下,把一组不可分物品 M 公平地分配给 n 个权利平等的智能体。每个智能体 i 拥有来自某个给定估值函数类的估值 v_i。份额(share)s 是把二元组 (v_i, n) 映射到一个数值的函数,其含义是:如果某个把 M 分给 n 个智能体的分配没有给智能体 i 一个价值至少为 s(v_i, n) 的物品束,这就构成该分配对 i 不公平的证据。要使这种解释成立,我们希望份额是可行的(feasible),即对该类中的任何估值,都存在一个至少给每个智能体其份额的分配。对于加性估值,maximin份额曾是可行份额的自然候选,但Kurokawa、Procaccia和Wang [2018] 证明它并不可行。我们对可行份额族展开系统研究。如果如实申报估值能使隐含的保证最大化,我们就称该份额是自最大化的(self maximizing)。我们证明,每个可行份额都被某个自最大化且可行的份额所支配。我们试图找出那些多项式时间可计算、且给出最高份额值的自最大化可行份额。我们证明,对于加性估值(以及更广的估值类),不存在支配每一个自最大化(SM)可行份额的SM支配可行份额。因此,我们把支配性质放宽到相差一个乘性因子 ρ 的支配(称为 ρ-支配)。对于加性估值,我们给出了可行、自最大化且多项式时间可计算的份额:对 n 个智能体,我们给出了一个 2n/(3n-1)-支配的份额;对两个智能体,我们给出了一个 (1-ε)-支配的份额。此外,对于这些份额,我们给出了多项式时间算法,可计算出至少给每个智能体其份额的分配。
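作为背景,maximin份额(MMS)的定义可以用小规模穷举直接示意:把物品分成 n 捆,取"最差捆价值最大"的那个划分(仅为定义层面的玩具实现,物品估值为假设的例子;论文中的份额构造是多项式时间的,并非穷举):

```python
from itertools import product

def maximin_share(values, n):
    """加性估值下某个 agent 的 maximin share:
    穷举把物品分成 n 捆的所有方式, 返回 max over 划分 of min over 捆。"""
    best = 0
    for assign in product(range(n), repeat=len(values)):
        bundles = [0] * n
        for item, bundle in enumerate(assign):
            bundles[bundle] += values[item]   # 加性估值: 捆的价值是求和
        best = max(best, min(bundles))
    return best

# 假设的例子: 4 件物品、2 个 agent; 最优划分为 {8,3} 与 {5,4}
print(maximin_share([8, 5, 4, 3], 2))   # 9
```

n=2 时论文给出的 2n/(3n-1)-支配份额对应系数 4/5,即保证达到相应自最大化份额值的至少五分之四。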

附件下载

点击下载今日全部论文列表