This blog post presents the latest paper lists fetched daily from the arXiv website, updated automatically around 10:30 each morning. Papers are grouped into four broad areas: NLP, CV, ML, and AI. If you would like to receive the daily list by email, please leave your email address in the comments.

Note: the paper data is fetched from the arXiv website and updated automatically around 10:00 each morning.

Friendly reminder: if you would like to receive the daily paper data by email, please leave your email address in the comments; emails are likewise sent out around 10:00 each day.

Contents

Overview (2021-10-22)

A total of 490 papers were updated today, including:

  • 43 in natural language processing (NLP: cs.CL)
  • 135 in computer vision (CV: cs.CV)
  • 129 in machine learning (ML: cs.LG)
  • 19 in artificial intelligence (AI: cs.AI)
  • 164 on other topics

Natural Language Processing

NLP-0-Title: A Python Package to Detect Anti-Vaccine Users on Twitter

Link: https://arxiv.org/abs/2110.11333
Authors: Matheus Schmitz, Goran Murić, Keith Burghardt
Comments:


Abstract: Vaccine hesitancy has a long history but has been recently driven by the anti-vaccine narratives shared online, which significantly degrades the efficacy of vaccination strategies, such as those for COVID-19. Despite broad agreement in the medical community about the safety and efficacy of available vaccines, a large number of social media users continue to be inundated with false information about vaccines and, partly because of this, became indecisive or unwilling to be vaccinated. The goal of this study is to better understand anti-vaccine sentiment, and work to reduce its impact, by developing a system capable of automatically identifying the users responsible for spreading anti-vaccine narratives. We introduce a publicly available Python package capable of analyzing Twitter profiles to assess how likely that profile is to spread anti-vaccine sentiment in the future. The software package is built using text embedding methods, neural networks, and automated dataset generation. It is trained on over one hundred thousand accounts and several million tweets. This model will help researchers and policy-makers understand anti-vaccine discussion and misinformation strategies, which can further help tailor targeted campaigns seeking to inform and debunk the harmful anti-vaccination myths currently being spread. Additionally, we leverage the data on such users to understand what are the moral and emotional characteristics of anti-vaccine spreaders.


NLP-1-Title: Fast Model Editing at Scale

Link: https://arxiv.org/abs/2110.11309
Authors: Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning
Comments: View implementation and additional project info at this https URL


Abstract: While large pre-trained models have enabled impressive results on a variety of downstream tasks, the largest existing models still make errors, and even accurate predictions may become outdated over time. Because detecting all such failures at training time is impossible, enabling both developers and end users of such models to correct inaccurate outputs while leaving the model otherwise intact is desirable. However, the distributed, black-box nature of the representations learned by large neural networks makes producing such targeted edits difficult. If presented with only a single problematic input and new desired output, fine-tuning approaches tend to overfit; other editing algorithms are either computationally infeasible or simply ineffective when applied to very large models. To enable easy post-hoc editing at scale, we propose Model Editor Networks with Gradient Decomposition (MEND), a collection of small auxiliary editing networks that use a single desired input-output pair to make fast, local edits to a pre-trained model. MEND learns to transform the gradient obtained by standard fine-tuning, using a low-rank decomposition of the gradient to make the parameterization of this transformation tractable. MEND can be trained on a single GPU in less than a day even for 10 billion+ parameter models; once trained MEND enables rapid application of new edits to the pre-trained model. Our experiments with T5, GPT, BERT, and BART models show that MEND is the only approach to model editing that produces effective edits for models with tens of millions to over 10 billion parameters. Implementation available at this https URL.

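A rough sketch of why MEND's edits are cheap: for a linear layer, the single-example fine-tuning gradient is the rank-1 outer product of the layer input u and the output-side gradient delta, so an editor network only needs to transform those two small vectors rather than a weight-sized gradient. The snippet below is an illustrative toy, not the authors' implementation; the `identity` transform is a placeholder for MEND's learned editor networks, and all names here are ours.

```python
def outer(a, b):
    """Rank-1 outer product, returned as nested lists."""
    return [[ai * bj for bj in b] for ai in a]

def identity(u, delta):
    # placeholder standing in for MEND's learned editor networks
    return u, delta

def edited_weights(W, u, delta, transform=identity, lr=0.1):
    """Apply a rank-1 edit: W <- W - lr * outer(delta~, u~)."""
    u_t, d_t = transform(u, delta)
    step = outer(d_t, u_t)
    return [[w - lr * s for w, s in zip(w_row, s_row)]
            for w_row, s_row in zip(W, step)]

W = [[1.0, 0.0], [0.0, 1.0]]   # toy 2x2 weight matrix
u = [1.0, 2.0]                 # layer input on the problematic example
delta = [0.5, -0.5]            # gradient at the layer output
W_new = edited_weights(W, u, delta)
```

Because only the two factor vectors are transformed, the cost of an edit scales with the layer width rather than with the full parameter count, which is what makes editing 10B+ parameter models tractable.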

NLP-2-Title: A Systematic Review on the Detection of Fake News Articles

Link: https://arxiv.org/abs/2110.11240
Authors: Nathaniel Hoy, Theodora Koulouri
Comments: 22 Pages, 16 Figures, Currently submitted to ACM TIST - Awaiting Peer-Review


Abstract: It has been argued that fake news and the spread of false information pose a threat to societies throughout the world, from influencing the results of elections to hindering the efforts to manage the COVID-19 pandemic. To combat this threat, a number of Natural Language Processing (NLP) approaches have been developed. These leverage a number of datasets, feature extraction/selection techniques and machine learning (ML) algorithms to detect fake news before it spreads. While these methods are well-documented, there is less evidence regarding their efficacy in this domain. By systematically reviewing the literature, this paper aims to delineate the approaches for fake news detection that are most performant, identify limitations with existing approaches, and suggest ways these can be mitigated. The analysis of the results indicates that Ensemble Methods using a combination of news content and socially-based features are currently the most effective. Finally, it is proposed that future research should focus on developing approaches that address generalisability issues (which, in part, arise from limitations with current datasets), explainability and bias.


NLP-3-Title: Topic-Guided Abstractive Multi-Document Summarization

Link: https://arxiv.org/abs/2110.11207
Authors: Peng Cui, Le Hu
Comments: accepted at findings of EMNLP 2021


Abstract: A critical point of multi-document summarization (MDS) is to learn the relations among various documents. In this paper, we propose a novel abstractive MDS model, in which we represent multiple documents as a heterogeneous graph, taking semantic nodes of different granularities into account, and then apply a graph-to-sequence framework to generate summaries. Moreover, we employ a neural topic model to jointly discover latent topics that can act as cross-document semantic units to bridge different documents and provide global information to guide the summary generation. Since topic extraction can be viewed as a special type of summarization that “summarizes” texts into a more abstract format, i.e., a topic distribution, we adopt a multi-task learning strategy to jointly train the topic and summarization module, allowing the promotion of each other. Experimental results on the Multi-News dataset demonstrate that our model outperforms previous state-of-the-art MDS models on both Rouge metrics and human evaluation, meanwhile learns high-quality topics.


NLP-4-Title: DAIR: Data Augmented Invariant Regularization

Link: https://arxiv.org/abs/2110.11205
Authors: Tianjian Huang, Shaunak Halbe, Chinnadhurai Sankar, Pooyan Amini, Satwik Kottur, Alborz Geramifard, Meisam Razaviyayn, Ahmad Beirami
Comments: 15 pages


Abstract: While deep learning through empirical risk minimization (ERM) has succeeded at achieving human-level performance at a variety of complex tasks, ERM generalizes poorly to distribution shift. This is partly explained by overfitting to spurious features such as background in images or named entities in natural language. Synthetic data augmentation followed by empirical risk minimization (DA-ERM) is a simple yet powerful solution to remedy this problem. In this paper, we propose data augmented invariant regularization (DAIR). The idea of DAIR is based on the observation that the model performance (loss) is desired to be consistent on the augmented sample and the original one. DAIR introduces a regularizer on DA-ERM to penalize such loss inconsistency. Both theoretically and through empirical experiments, we show that a particular form of the DAIR regularizer consistently performs well in a variety of settings. We apply it to multiple real-world learning problems involving domain shift, namely robust regression, visual question answering, robust deep neural network training, and task-oriented dialog modeling. Our experiments show that DAIR consistently outperforms ERM and DA-ERM with little marginal cost and setting new state-of-the-art results in several benchmarks.

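The consistency idea in DAIR can be sketched in a few lines: train on both the original and augmented sample, and add a penalty when their losses disagree. The penalty form below (squared gap between the square roots of the two losses) is one plausible instantiation chosen for illustration; the paper's preferred regularizer may differ.

```python
def dair_loss(loss_orig, loss_aug, lam=1.0):
    """Illustrative DAIR-style objective (assumed form, not the paper's exact one).

    loss_orig: ERM loss on the original sample
    loss_aug:  ERM loss on its augmented copy
    lam:       regularization strength
    """
    erm = 0.5 * (loss_orig + loss_aug)
    # consistency penalty: large when the model treats the original and
    # augmented sample very differently, zero when the losses match
    penalty = (loss_orig ** 0.5 - loss_aug ** 0.5) ** 2
    return erm + lam * penalty
```

When the two losses agree, the objective reduces to plain DA-ERM, which matches the abstract's claim that the regularizer adds little marginal cost.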

NLP-5-Title: Asynchronous Decentralized Distributed Training of Acoustic Models

Link: https://arxiv.org/abs/2110.11199
Authors: Xiaodong Cui, Wei Zhang, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung
Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing


Abstract: Large-scale distributed training of deep acoustic models plays an important role in today’s high-performance automatic speech recognition (ASR). In this paper we investigate a variety of asynchronous decentralized distributed training strategies based on data parallel stochastic gradient descent (SGD) to show their superior performance over the commonly-used synchronous distributed training via allreduce, especially when dealing with large batch sizes. Specifically, we study three variants of asynchronous decentralized parallel SGD (ADPSGD), namely, fixed and randomized communication patterns on a ring as well as a delay-by-one scheme. We introduce a mathematical model of ADPSGD, give its theoretical convergence rate, and compare the empirical convergence behavior and straggler resilience properties of the three variants. Experiments are carried out on an IBM supercomputer for training deep long short-term memory (LSTM) acoustic models on the 2000-hour Switchboard dataset. Recognition and speedup performance of the proposed strategies are evaluated under various training configurations. We show that ADPSGD with fixed and randomized communication patterns cope well with slow learners. When learners are equally fast, ADPSGD with the delay-by-one strategy has the fastest convergence with large batches. In particular, using the delay-by-one strategy, we can train the acoustic model in less than 2 hours using 128 V100 GPUs with competitive word error rates.

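The ring-based communication the abstract refers to can be pictured with a toy averaging step: instead of a global allreduce, each learner mixes its parameters with a single ring neighbor. This is a hand-written illustration of synchronous neighbor averaging only; ADPSGD itself is asynchronous and overlaps communication with computation, which this sketch does not model.

```python
def ring_mix(params, shift=1):
    """One decentralized averaging step on a ring (illustrative only).

    params: list of per-learner parameter values (scalars for simplicity)
    shift:  which ring neighbor to average with; varying this per step
            mimics a randomized communication pattern
    """
    n = len(params)
    return [0.5 * (params[i] + params[(i + shift) % n]) for i in range(n)]

mixed = ring_mix([0.0, 2.0, 4.0, 6.0])
```

Repeated mixing steps drive all learners toward the global average, while each step costs only one neighbor exchange instead of an allreduce over all learners.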

NLP-6-Title: A guided journey through non-interactive automatic story generation

Link: https://arxiv.org/abs/2110.11167
Authors: Luis Miguel Botelho
Comments:


Abstract: We present a literature survey on non-interactive computational story generation. The article starts with the presentation of requirements for creative systems, three types of models of creativity (computational, socio-cultural, and individual), and models of human creative writing. Then it reviews each class of story generation approach depending on the used technology: story-schemas, analogy, rules, planning, evolutionary algorithms, implicit knowledge learning, and explicit knowledge learning. Before the concluding section, the article analyses the contributions of the reviewed work to improve the quality of the generated stories. This analysis addresses the description of the story characters, the use of narrative knowledge including about character believability, and the possible lack of more comprehensive or more detailed knowledge or creativity models. Finally, the article presents concluding remarks in the form of suggestions of research topics that might have a significant impact on the advancement of the state of the art on autonomous non-interactive story generation systems. The article concludes that the autonomous generation and adoption of the main idea to be conveyed and the autonomous design of the creativity ensuring criteria are possibly two of most important topics for future research.


NLP-7-Title: Modeling Performance in Open-Domain Dialogue with PARADISE

Link: https://arxiv.org/abs/2110.11164
Authors: Marilyn Walker, Colin Harmon, James Graupera, Davan Harrison, Steve Whittaker
Comments: The 12th International Workshop on Spoken Dialog System Technology, November 2021


Abstract: There has recently been an explosion of work on spoken dialogue systems, along with an increased interest in open-domain systems that engage in casual conversations on popular topics such as movies, books and music. These systems aim to socially engage, entertain, and even empathize with their users. Since the achievement of such social goals is hard to measure, recent research has used dialogue length or human ratings as evaluation metrics, and developed methods for automatically calculating novel metrics, such as coherence, consistency, relevance and engagement. Here we develop a PARADISE model for predicting the performance of Athena, a dialogue system that has participated in thousands of conversations with real users, while competing as a finalist in the Alexa Prize. We use both user ratings and dialogue length as metrics for dialogue quality, and experiment with predicting these metrics using automatic features that are both system dependent and independent. Our goal is to learn a general objective function that can be used to optimize the dialogue choices of any Alexa Prize system in real time and evaluate its performance. Our best model for predicting user ratings gets an R² of .136 with a DistilBert model, and the best model for predicting length with system independent features gets an R² of .865, suggesting that conversation length may be a more reliable measure for automatic training of dialogue systems.


NLP-8-Title: Improving Non-autoregressive Generation with Mixup Training

Link: https://arxiv.org/abs/2110.11115
Authors: Ting Jiang, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang, Furu Wei, Haizhen Huang, Liangjie Zhang, Qi Zhang
Comments:


Abstract: While pre-trained language models have achieved great success on various natural language understanding tasks, how to effectively leverage them into non-autoregressive generation tasks remains a challenge. To solve this problem, we present a non-autoregressive generation model based on pre-trained transformer models. To bridge the gap between autoregressive and non-autoregressive models, we propose a simple and effective iterative training method called MIx Source and pseudo Target (MIST). Unlike other iterative decoding methods, which sacrifice the inference speed to achieve better performance based on multiple decoding iterations, MIST works in the training stage and has no effect on inference time. Our experiments on three generation benchmarks including question generation, summarization and paraphrase generation, show that the proposed framework achieves the new state-of-the-art results for fully non-autoregressive models. We also demonstrate that our method can be used to a variety of pre-trained models. For instance, MIST based on the small pre-trained model also obtains comparable performance with seq2seq models.


NLP-9-Title: LOA: Logical Optimal Actions for Text-based Interaction Games

Link: https://arxiv.org/abs/2110.10973
Authors: Daiki Kimura, Subhajit Chaudhury, Masaki Ono, Michiaki Tatsubori, Don Joven Agravante, Asim Munawar, Akifumi Wachi, Ryosuke Kohita, Alexander Gray
Comments: ACL-IJCNLP 2021 (demo paper)


Abstract: We present Logical Optimal Actions (LOA), an action decision architecture of reinforcement learning applications with a neuro-symbolic framework which is a combination of neural network and symbolic knowledge acquisition approach for natural language interaction games. The demonstration for LOA experiments consists of a web-based interactive platform for text-based games and visualization for acquired knowledge for improving interpretability for trained rules. This demonstration also provides a comparison module with other neuro-symbolic approaches as well as non-symbolic state-of-the-art agent models on the same text-based games. Our LOA also provides open-sourced implementation in Python for the reinforcement learning environment to facilitate an experiment for studying neuro-symbolic agents. Code: this https URL


NLP-10-Title: Neuro-Symbolic Reinforcement Learning with First-Order Logic

Link: https://arxiv.org/abs/2110.10963
Authors: Daiki Kimura, Masaki Ono, Subhajit Chaudhury, Ryosuke Kohita, Akifumi Wachi, Don Joven Agravante, Michiaki Tatsubori, Asim Munawar, Alexander Gray
Comments: EMNLP 2021 (main conference)


Abstract: Deep reinforcement learning (RL) methods often require many trials before convergence, and no direct interpretability of trained policies is provided. In order to achieve fast convergence and interpretability for the policy in RL, we propose a novel RL method for text-based games with a recent neuro-symbolic framework called Logical Neural Network, which can learn symbolic and interpretable rules in their differentiable network. The method is first to extract first-order logical facts from text observation and external word meaning network (ConceptNet), then train a policy in the network with directly interpretable logical operators. Our experimental results show RL training with the proposed method converges significantly faster than other state-of-the-art neuro-symbolic methods in a TextWorld benchmark.


NLP-11-Title: Single-Modal Entropy based Active Learning for Visual Question Answering

Link: https://arxiv.org/abs/2110.10906
Authors: Dong-Jin Kim, Jae Won Cho, Jinsoo Choi, Yunjae Jung, In So Kweon
Comments: Accepted to BMVC 2021


Abstract: Constructing a large-scale labeled dataset in the real world, especially for high-level tasks (eg, Visual Question Answering), can be expensive and time-consuming. In addition, with the ever-growing amounts of data and architecture complexity, Active Learning has become an important aspect of computer vision research. In this work, we address Active Learning in the multi-modal setting of Visual Question Answering (VQA). In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition through the use of ad hoc single-modal branches for each input to leverage its information. Our mutual information based sample acquisition strategy Single-Modal Entropic Measure (SMEM) in addition to our self-distillation technique enables the sample acquisitor to exploit all present modalities and find the most informative samples. Our novel idea is simple to implement, cost-efficient, and readily adaptable to other multi-modal tasks. We confirm our findings on various VQA datasets through state-of-the-art performance by comparing to existing Active Learning baselines.

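The entropy-based acquisition underlying this line of work can be sketched generically: score each unlabeled sample by predictive entropy and label the most uncertain ones first. The snippet below is a generic single-distribution sketch; SMEM's actual measure combines per-modality (image and question) branches plus self-distillation, which is not reproduced here.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def acquire(batch_probs, k=1):
    """Return indices of the k most uncertain samples in the batch."""
    ranked = sorted(range(len(batch_probs)),
                    key=lambda i: entropy(batch_probs[i]), reverse=True)
    return ranked[:k]

# the near-uniform prediction (index 1) is the most informative to label
picked = acquire([[0.9, 0.1], [0.5, 0.5], [1.0, 0.0]], k=1)
```

Confident predictions (low entropy) add little to training, so the budget is spent on samples where the model is closest to guessing.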

NLP-12-Title: CNewSum: A Large-scale Chinese News Summarization Dataset with Human-annotated Adequacy and Deducibility Level

Link: https://arxiv.org/abs/2110.10874
Authors: Danqing Wang, Jiaze Chen, Xianze Wu, Hao Zhou, Lei Li
Comments:


Abstract: Automatic text summarization aims to produce a brief but crucial summary for the input documents. Both extractive and abstractive methods have witnessed great success in English datasets in recent years. However, there has been a minimal exploration of text summarization in Chinese, limited by the lack of large-scale datasets. In this paper, we present a large-scale Chinese news summarization dataset CNewSum, which consists of 304,307 documents and human-written summaries for the news feed. It has long documents with high-abstractive summaries, which can encourage document-level understanding and generation for current summarization models. An additional distinguishing feature of CNewSum is that its test set contains adequacy and deducibility annotations for the summaries. The adequacy level measures the degree of summary information covered by the document, and the deducibility indicates the reasoning ability the model needs to generate the summary. These annotations can help researchers analyze and target their model performance bottleneck. We examine recent methods on CNewSum and release our dataset to provide a solid testbed for automatic Chinese summarization research.


NLP-13-Title: Principled Representation Learning for Entity Alignment

Link: https://arxiv.org/abs/2110.10871
Authors: Lingbing Guo, Zequn Sun, Mingyang Chen, Wei Hu, Qiang Zhang, Huajun Chen
Comments:


Abstract: Embedding-based entity alignment (EEA) has recently received great attention. Despite significant performance improvement, few efforts have been paid to facilitate understanding of EEA methods. Most existing studies rest on the assumption that a small number of pre-aligned entities can serve as anchors connecting the embedding spaces of two KGs. Nevertheless, no one investigates the rationality of such an assumption. To fill the research gap, we define a typical paradigm abstracted from existing EEA methods and analyze how the embedding discrepancy between two potentially aligned entities is implicitly bounded by a predefined margin in the scoring function. Further, we find that such a bound cannot guarantee to be tight enough for alignment learning. We mitigate this problem by proposing a new approach, named NeoEA, to explicitly learn KG-invariant and principled entity embeddings. In this sense, an EEA model not only pursues the closeness of aligned entities based on geometric distance, but also aligns the neural ontologies of two KGs by eliminating the discrepancy in embedding distribution and underlying ontology knowledge. Our experiments demonstrate consistent and significant improvement in performance against the best-performing EEA methods.


NLP-14-Title: Integrating Visuospatial Linguistic and Commonsense Structure into Story Visualization

Link: https://arxiv.org/abs/2110.10834
Authors: Adyasha Maharana, Mohit Bansal
Comments: EMNLP 2021 (16 pages)


Abstract: While much research has been done in text-to-image synthesis, little work has been done to explore the usage of linguistic structure of the input text. Such information is even more important for story visualization since its inputs have an explicit narrative structure that needs to be translated into an image sequence (or visual story). Prior work in this domain has shown that there is ample room for improvement in the generated image sequence in terms of visual quality, consistency and relevance. In this paper, we first explore the use of constituency parse trees using a Transformer-based recurrent architecture for encoding structured input. Second, we augment the structured input with commonsense information and study the impact of this external knowledge on the generation of visual story. Third, we also incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images within a dual learning setup. We show that off-the-shelf dense-captioning models trained on Visual Genome can improve the spatial structure of images from a different target domain without needing fine-tuning. We train the model end-to-end using intra-story contrastive loss (between words and image sub-regions) and show significant improvements in several metrics (and human evaluation) for multiple datasets. Finally, we provide an analysis of the linguistic and visuo-spatial information. Code and data: this https URL.


NLP-15-Title: An Open Natural Language Processing Development Framework for EHR-based Clinical Research: A case demonstration using the National COVID Cohort Collaborative (N3C)

Link: https://arxiv.org/abs/2110.10780
Authors: Sijia Liu, Andrew Wen, Liwei Wang, Huan He, Sunyang Fu, Robert Miller, Andrew Williams, Daniel Harris, Ramakanth Kavuluru, Mei Liu, Noor Abu-el-rub, Rui Zhang, John D. Osborne, Masoud Rouhizadeh, Yongqun He, Emily Pfaff, Christopher G. Chute, Tim Duong, Melissa A. Haendel, Rafael Fuentes, Peter Szolovits, Hua Xu, Hongfang Liu (N3C Natural Language Processing (NLP) Subgroup)
Comments:


Abstract: While we pay attention to the latest advances in clinical natural language processing (NLP), we can notice some resistance in the clinical and translational research community to adopt NLP models due to limited transparency, interpretability and usability. Built upon our previous work, in this study, we proposed an open natural language processing development framework and evaluated it through the implementation of NLP algorithms for the National COVID Cohort Collaborative (N3C). Based on the interests in information extraction from COVID-19 related clinical notes, our work includes 1) an open data annotation process using COVID-19 signs and symptoms as the use case, 2) a community-driven ruleset composing platform, and 3) a synthetic text data generation workflow to generate texts for information extraction tasks without involving human subjects. The generated corpora, derived from the texts of multiple institutions together with gold standard annotation, were tested on a single institution’s rule set, achieving F1 scores of 0.876, 0.706 and 0.694, respectively. The study as a consortium effort of the N3C NLP subgroup demonstrates the feasibility of creating a federated NLP algorithm development and benchmarking platform to enhance multi-institution clinical NLP study.


NLP-16-Title: Contrastive Document Representation Learning with Graph Attention Networks

Link: https://arxiv.org/abs/2110.10778
Authors: Peng Xu, Xinchi Chen, Xiaofei Ma, Zhiheng Huang, Bing Xiang
Comments: Findings of EMNLP 2021


Abstract: Recent progress in pretrained Transformer-based language models has shown great success in learning contextual representation of text. However, due to the quadratic self-attention complexity, most of the pretrained Transformers models can only handle relatively short text. It is still a challenge when it comes to modeling very long documents. In this work, we propose to use a graph attention network on top of the available pretrained Transformers model to learn document embeddings. This graph attention network allows us to leverage the high-level semantic structure of the document. In addition, based on our graph document model, we design a simple contrastive learning strategy to pretrain our models on a large amount of unlabeled corpus. Empirically, we demonstrate the effectiveness of our approaches in document classification and document retrieval tasks.


NLP-17-Title: SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation

Link: https://arxiv.org/abs/2110.10774
Authors: Hong Chen, Hiroya Takamura, Hideki Nakayama
Comments: this paper was accepted by EMNLP2021-findings

点击查看摘要

Abstract: Generating texts in scientific papers requires not only capturing the content contained within the given input but also frequently acquiring the external information called context. We push forward scientific text generation by proposing a new task, namely context-aware text generation in the scientific domain, aiming at exploiting the contributions of context in generated texts. To this end, we present a novel, challenging, large-scale Scientific Paper Dataset for Context-Aware Text Generation (SciXGen), consisting of 205,304 well-annotated papers with full references to widely-used objects (e.g., tables, figures, algorithms) in a paper. Using state-of-the-art models, we comprehensively benchmark the efficacy of our newly constructed SciXGen dataset in generating descriptions and paragraphs. Our dataset and benchmarks will be made publicly available to hopefully facilitate scientific text generation research.

NLP-18-Title: Better than Average: Paired Evaluation of NLP Systems

Link: https://arxiv.org/abs/2110.10746
Authors: Maxime Peyrard, Wei Zhao, Steffen Eger, Robert West
Notes: Published in ACL 2021 (long paper)

Abstract: Evaluation in NLP is usually done by comparing the scores of competing systems independently averaged over a common set of test instances. In this work, we question the use of averages for aggregating evaluation scores into a final number used to decide which system is best, since the average, as well as alternatives such as the median, ignores the pairing arising from the fact that systems are evaluated on the same test instances. We illustrate the importance of taking the instance-level pairing of evaluation scores into account and demonstrate, both theoretically and empirically, the advantages of aggregation methods based on pairwise comparisons, such as the Bradley-Terry (BT) model, a mechanism based on the estimated probability that a given system scores better than another on the test set. By re-evaluating 296 real NLP evaluation setups across four tasks and 18 evaluation metrics, we show that the choice of aggregation mechanism matters and yields different conclusions as to which systems are state of the art in about 30% of the setups. To facilitate the adoption of pairwise evaluation, we release a practical tool for performing the full analysis of evaluation scores with the mean, median, BT, and two variants of BT (Elo and TrueSkill), alongside functionality for appropriate statistical testing.

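The Bradley-Terry mechanism this abstract describes can be sketched compactly: each system gets a latent strength, fit from pairwise win counts over test instances. The following minimal illustration uses the classic MM update; it is not the authors' released tool, and the win counts are invented:

```python
import numpy as np

def bradley_terry(wins, iters=500):
    """Fit Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i, j] = number of test instances on which system i scored
    strictly better than system j. Uses the classic MM update:
    w_i <- (total wins of i) / sum_j n_ij / (w_i + w_j),
    where n_ij is the number of comparisons between i and j.
    """
    n = wins + wins.T                # comparisons between each pair
    w = np.ones(len(wins))           # initial strengths
    for _ in range(iters):
        denom = (n / (w[:, None] + w[None, :])).sum(axis=1)
        w = wins.sum(axis=1) / denom
        w = w / w.sum()              # BT is scale-invariant; fix the scale
    return w

def p_better(w, i, j):
    """Estimated probability that system i beats system j on a new instance."""
    return w[i] / (w[i] + w[j])
```

`p_better` is the pairwise quantity the paper aggregates instead of a plain average of scores; the Elo and TrueSkill variants mentioned in the abstract are refinements of this same pairwise idea.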
NLP-19-Title: A Self-Explainable Stylish Image Captioning Framework via Multi-References

Link: https://arxiv.org/abs/2110.10704
Authors: Chengxi Li, Brent Harrison
Notes: arXiv admin note: substantial text overlap with arXiv:2103.11186

Abstract: In this paper, we propose to build a stylish image captioning model through a Multi-style Multi-modality mechanism (2M). We demonstrate that with 2M, we can build an effective stylish captioner, and that the multi-references produced by the model can also support explaining the model by identifying erroneous input features on faulty examples. We show how this 2M mechanism can be used to build stylish captioning models and show how these models can be utilized to provide explanations of likely errors in the models.

NLP-20-Title: Evaluating the Evaluation Metrics for Style Transfer: A Case Study in Multilingual Formality Transfer

Link: https://arxiv.org/abs/2110.10668
Authors: Eleftheria Briakou, Sweta Agrawal, Joel Tetreault, Marine Carpuat
Notes: EMNLP 2021

Abstract: While the field of style transfer (ST) has been growing rapidly, it has been hampered by a lack of standardized practices for automatic evaluation. In this paper, we evaluate leading ST automatic metrics on the oft-researched task of formality style transfer. Unlike previous evaluations, which focus solely on English, we expand our focus to Brazilian-Portuguese, French, and Italian, making this work the first multilingual evaluation of metrics in ST. We outline best practices for automatic evaluation in (formality) style transfer and identify several models that correlate well with human judgments and are robust across languages. We hope that this work will help accelerate development in ST, where human evaluation is often challenging to collect.

NLP-21-Title: SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark

Link: https://arxiv.org/abs/2110.10661
Authors: Victor Zhong, Austin W. Hanjie, Sida I. Wang, Karthik Narasimhan, Luke Zettlemoyer
Notes: NeurIPS 2021. 14 pages, 8 figures

Abstract: Existing work in language grounding typically study single environments. How do we build unified models that apply across multiple environments? We propose the multi-environment Symbolic Interactive Language Grounding benchmark (SILG), which unifies a collection of diverse grounded language learning environments under a common interface. SILG consists of grid-world environments that require generalization to new dynamics, entities, and partially observed worlds (RTFM, Messenger, NetHack), as well as symbolic counterparts of visual worlds that require interpreting rich natural language with respect to complex scenes (ALFWorld, Touchdown). Together, these environments provide diverse grounding challenges in richness of observation space, action space, language specification, and plan complexity. In addition, we propose the first shared model architecture for RL on these environments, and evaluate recent advances such as egocentric local convolution, recurrent state-tracking, entity-centric attention, and pretrained LM using SILG. Our shared architecture achieves comparable performance to environment-specific architectures. Moreover, we find that many recent modelling advances do not result in significant gains on environments other than the one they were designed for. This highlights the need for a multi-environment benchmark. Finally, the best models significantly underperform humans on SILG, which suggests ample room for future work. We hope SILG enables the community to quickly identify new methodologies for language grounding that generalize to a diverse set of environments and their associated challenges.

NLP-22-Title: Overview of the 2021 Key Point Analysis Shared Task

Link: https://arxiv.org/abs/2110.10577
Authors: Roni Friedman, Lena Dankin, Yufang Hou, Ranit Aharonov, Yoav Katz, Noam Slonim
Notes:

Abstract: We describe the 2021 Key Point Analysis (KPA-2021) shared task on key point analysis that we organized as a part of the 8th Workshop on Argument Mining (ArgMining 2021) at EMNLP 2021. We outline various approaches and discuss the results of the shared task. We expect the task and the findings reported in this paper to be relevant for researchers working on text summarization and argument mining.

NLP-23-Title: SocialVisTUM: An Interactive Visualization Toolkit for Correlated Neural Topic Models on Social Media Opinion Mining

Link: https://arxiv.org/abs/2110.10575
Authors: Gerhard Hagerer, Martin Kirchhoff, Hannah Danner, Robert Pesch, Mainak Ghosh, Archishman Roy, Jiaxi Zhao, Georg Groh
Notes: Demo paper accepted for publication at RANLP 2021; 8 pages, 5 figures, 1 table

Abstract: Recent research in opinion mining proposed word embedding-based topic modeling methods that provide superior coherence compared to traditional topic modeling. In this paper, we demonstrate how these methods can be used to display correlated topic models on social media texts using SocialVisTUM, our proposed interactive visualization toolkit. It displays a graph with topics as nodes and their correlations as edges. Further details are displayed interactively to support the exploration of large text collections, e.g., representative words and sentences of topics, topic and sentiment distributions, hierarchical topic clustering, and customizable, predefined topic labels. The toolkit optimizes automatically on custom data for optimal coherence. We show a working instance of the toolkit on data crawled from English social media discussions about organic food consumption. The visualization confirms findings of a qualitative consumer research study. SocialVisTUM and its training procedures are accessible online.

NLP-24-Title: Continual Learning in Multilingual NMT via Language-Specific Embeddings

Link: https://arxiv.org/abs/2110.10478
Authors: Alexandre Berard
Notes: Accepted as a research paper to WMT 2021

Abstract: This paper proposes a technique for adding a new source or target language to an existing multilingual NMT model without re-training it on the initial set of languages. It consists in replacing the shared vocabulary with a small language-specific vocabulary and fine-tuning the new embeddings on the new language’s parallel data. Some additional language-specific components may be trained to improve performance (e.g., Transformer layers or adapter modules). Because the parameters of the original model are not modified, its performance on the initial languages does not degrade. We show on two sets of experiments (small-scale on TED Talks, and large-scale on ParaCrawl) that this approach performs as well or better as the more costly alternatives; and that it has excellent zero-shot performance: training on English-centric data is enough to translate between the new language and any of the initial languages.

NLP-25-Title: Multilingual Unsupervised Neural Machine Translation with Denoising Adapters

Link: https://arxiv.org/abs/2110.10472
Authors: Ahmet Üstün, Alexandre Bérard, Laurent Besacier, Matthias Gallé
Notes: Accepted as a long paper to EMNLP 2021

Abstract: We consider the problem of multilingual unsupervised machine translation, translating to and from languages that only have monolingual data by using auxiliary parallel language pairs. For this problem the standard procedure so far to leverage the monolingual data is back-translation, which is computationally costly and hard to tune. In this paper we propose instead to use denoising adapters, adapter layers with a denoising objective, on top of pre-trained mBART-50. In addition to the modularity and flexibility of such an approach we show that the resulting translations are on-par with back-translating as measured by BLEU, and furthermore it allows adding unseen languages incrementally.

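The abstract does not detail the adapter architecture; adapters are conventionally small bottleneck layers (down-projection, nonlinearity, up-projection, residual) inserted into an otherwise frozen pretrained model such as mBART-50. A minimal NumPy sketch under that assumption, with placeholder dimensions:

```python
import numpy as np

class Adapter:
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus a
    residual connection. Only these small matrices would be trained, while
    the pretrained encoder-decoder (mBART-50 in the paper) stays frozen.
    The dimensions here are illustrative placeholders."""

    def __init__(self, d_model=1024, d_bottleneck=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        self.w_up = np.zeros((d_bottleneck, d_model))  # start as identity map

    def __call__(self, hidden):                  # hidden: (seq_len, d_model)
        z = np.maximum(hidden @ self.w_down, 0)  # ReLU bottleneck
        return hidden + z @ self.w_up            # residual connection
```

Zero-initializing the up-projection makes the adapter an identity function at the start of training, so the frozen model's behavior is preserved until the denoising objective updates the adapter weights.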
NLP-26-Title: Interpreting Deep Learning Models in Natural Language Processing: A Review

Link: https://arxiv.org/abs/2110.10470
Authors: Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Qiu Han, Guoyin Wang, Eduard Hovy, Jiwei Li
Notes:

Abstract: Neural network models have achieved state-of-the-art performances in a wide range of natural language processing (NLP) tasks. However, a long-standing criticism against neural network models is the lack of interpretability, which not only reduces the reliability of neural NLP systems but also limits the scope of their applications in areas where interpretability is essential (e.g., health care applications). In response, the increasing interest in interpreting neural NLP models has spurred a diverse array of interpretation methods over recent years. In this survey, we provide a comprehensive review of various interpretation methods for neural models in NLP. We first stretch out a high-level taxonomy for interpretation methods in NLP, i.e., training-based approaches, test-based approaches, and hybrid approaches. Next, we describe sub-categories in each category in detail, e.g., influence-function based methods, KNN-based methods, attention-based models, saliency-based methods, perturbation-based methods, etc. We point out deficiencies of current methods and suggest some avenues for future research.

NLP-27-Title: Knowledge Graph informed Fake News Classification via Heterogeneous Representation Ensembles

Link: https://arxiv.org/abs/2110.10457
Authors: Boshko Koloski, Timen Stepišnik-Perdih, Marko Robnik-Šikonja, Senja Pollak, Blaž Škrlj
Notes:

Abstract: Increasing amounts of freely available data both in textual and relational form offers exploration of richer document representations, potentially improving the model performance and robustness. An emerging problem in the modern era is fake news detection – many easily available pieces of information are not necessarily factually correct, and can lead to wrong conclusions or are used for manipulation. In this work we explore how different document representations, ranging from simple symbolic bag-of-words, to contextual, neural language model-based ones can be used for efficient fake news identification. One of the key contributions is a set of novel document representation learning methods based solely on knowledge graphs, i.e. extensive collections of (grounded) subject-predicate-object triplets. We demonstrate that knowledge graph-based representations already achieve competitive performance to conventionally accepted representation learners. Furthermore, when combined with existing, contextual representations, knowledge graph-based document representations can achieve state-of-the-art performance. To our knowledge this is the first larger-scale evaluation of how knowledge graph-based representations can be systematically incorporated into the process of fake news classification.

NLP-28-Title: Discontinuous Grammar as a Foreign Language

Link: https://arxiv.org/abs/2110.10431
Authors: Daniel Fernández-González, Carlos Gómez-Rodríguez
Notes: 22 pages

Abstract: In order to achieve deep natural language understanding, syntactic constituent parsing is a vital step, highly demanded by many artificial intelligence systems to process both text and speech. One of the most recent proposals is the use of standard sequence-to-sequence models to perform constituent parsing as a machine translation task, instead of applying task-specific parsers. While they show a competitive performance, these text-to-parse transducers are still lagging behind classic techniques in terms of accuracy, coverage and speed. To close the gap, we here extend the framework of sequence-to-sequence models for constituent parsing, not only by providing a more powerful neural architecture for improving their performance, but also by enlarging their coverage to handle the most complex syntactic phenomena: discontinuous structures. To that end, we design several novel linearizations that can fully produce discontinuities and, for the first time, we test a sequence-to-sequence model on the main discontinuous benchmarks, obtaining competitive results on par with task-specific discontinuous constituent parsers and achieving state-of-the-art scores on the (discontinuous) English Penn Treebank.

NLP-29-Title: Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach

Link: https://arxiv.org/abs/2110.10429
Authors: Mun-Hak Lee, Joon-Hyuk Chang
Notes: 4 pages + 1 page for citations + 2 pages for appendix

Abstract: The remarkable performance of the pre-trained language model (LM) using self-supervised learning has led to a major paradigm shift in the study of natural language processing. In line with these changes, leveraging the performance of speech recognition systems with massive deep learning-based LMs is a major topic of speech recognition research. Among the various methods of applying LMs to speech recognition systems, in this paper, we focus on a cross-modal knowledge distillation method that transfers knowledge between two types of deep neural networks with different modalities. We propose an acoustic model structure with multiple auxiliary output layers for cross-modal distillation and demonstrate that the proposed method effectively compensates for the shortcomings of the existing label-interpolation-based distillation method. In addition, we extend the proposed method to a hierarchical distillation method using LMs trained in different units (senones, monophones, and subwords) and reveal the effectiveness of the hierarchical distillation method through an ablation study.

NLP-30-Title: Distributionally Robust Classifiers in Sentiment Analysis

Link: https://arxiv.org/abs/2110.10372
Authors: Shilun Li, Renee Li, Carina Zhang
Notes:

Abstract: In this paper, we propose sentiment classification models based on BERT integrated with DRO (Distributionally Robust Classifiers) to improve model performance on datasets with distributional shifts. We added a 2-layer Bi-LSTM, a projection layer (onto the simplex or an Lp ball), and a linear layer on top of BERT to achieve distributional robustness. We considered one form of distributional shift (from the IMDb dataset to the Rotten Tomatoes dataset). We have confirmed through experiments that our DRO model does improve performance on our test set with distributional shift from the training set.

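The abstract mentions a projection layer onto the simplex or an Lp ball without giving the algorithm; for the probability simplex, the Euclidean projection has a well-known sort-based closed form. A minimal sketch, assuming this standard projection is what is meant:

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {w : w_i >= 0, sum_i w_i = 1}, via the sort-based closed form."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = ks[u + (1.0 - css) / ks > 0].max()   # largest feasible index
    theta = (1.0 - css[rho - 1]) / rho         # shift that makes w sum to 1
    return np.maximum(v + theta, 0.0)
```

For example, projecting [1.0, 2.0, 3.0] yields [0.0, 0.0, 1.0], while a vector already on the simplex is returned unchanged.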
NLP-31-Title: Hierarchical Aspect-guided Explanation Generation for Explainable Recommendation

Link: https://arxiv.org/abs/2110.10358
Authors: Yidan Hu, Yong Liu, Chunyan Miao, Gongqi Lin, Yuan Miao
Notes:

Abstract: Explainable recommendation systems provide explanations for recommendation results to improve their transparency and persuasiveness. The existing explainable recommendation methods generate textual explanations without explicitly considering the user’s preferences on different aspects of the item. In this paper, we propose a novel explanation generation framework, named Hierarchical Aspect-guided explanation Generation (HAG), for explainable recommendation. Specifically, HAG employs a review-based syntax graph to provide a unified view of the user/item details. An aspect-guided graph pooling operator is proposed to extract the aspect-relevant information from the review-based syntax graphs to model the user’s preferences on an item at the aspect level. Then, a hierarchical explanation decoder is developed to generate aspects and aspect-relevant explanations based on the attention mechanism. The experimental results on three real datasets indicate that HAG outperforms state-of-the-art explanation generation methods in both single-aspect and multi-aspect explanation generation tasks, and also achieves comparable or even better preference prediction accuracy than strong baseline methods.

NLP-32-Title: News-based Business Sentiment and its Properties as an Economic Index

Link: https://arxiv.org/abs/2110.10340
Authors: Kazuhiro Seki, Yusuke Ikuta, Yoichi Matsubayashi
Notes: 40 pages, to be published in Information Processing and Management

Abstract: This paper presents an approach to measuring business sentiment based on textual data. Business sentiment has been measured by traditional surveys, which are costly and time-consuming to conduct. To address the issues, we take advantage of daily newspaper articles and adopt a self-attention-based model to define a business sentiment index, named S-APIR, where outlier detection models are investigated to properly handle various genres of news articles. Moreover, we propose a simple approach to temporally analyzing how much any given event contributed to the predicted business sentiment index. To demonstrate the validity of the proposed approach, an extensive analysis is carried out on 12 years’ worth of newspaper articles. The analysis shows that the S-APIR index is strongly and positively correlated with established survey-based index (up to correlation coefficient r=0.937) and that the outlier detection is effective especially for a general newspaper. Also, S-APIR is compared with a variety of economic indices, revealing the properties of S-APIR that it reflects the trend of the macroeconomy as well as the economic outlook and sentiment of economic agents. Moreover, to illustrate how S-APIR could benefit economists and policymakers, several events are analyzed with respect to their impacts on business sentiment over time.

NLP-33-Title: SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training

Link: https://arxiv.org/abs/2110.10329
Authors: Ankur Bapna, Yu-an Chung, Nan Wu, Anmol Gulati, Ye Jia, Jonathan H. Clark, Melvin Johnson, Jason Riesa, Alexis Conneau, Yu Zhang
Notes:

Abstract: Unsupervised pre-training is now the predominant approach for both text and speech understanding. Self-attention models pre-trained on large amounts of unannotated data have been hugely successful when fine-tuned on downstream tasks from a variety of domains and languages. This paper takes the universality of unsupervised language pre-training one step further, by unifying speech and text pre-training within a single model. We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech. To further align our model representations across modalities, we leverage alignment losses, specifically Translation Language Modeling (TLM) and Speech Text Matching (STM) that make use of supervised speech-text recognition data. We demonstrate that incorporating both speech and text data during pre-training can significantly improve downstream quality on CoVoST 2 speech translation, by around 1 BLEU compared to single-modality pre-trained models, while retaining close to SotA performance on LibriSpeech and SpeechStew ASR tasks. On four GLUE tasks and text-normalization, we observe evidence of capacity limitations and interference between the two modalities, leading to degraded performance compared to an equivalent text-only model, while still being competitive with BERT. Through extensive empirical analysis we also demonstrate the importance of the choice of objective function for speech pre-training, and the beneficial effect of adding additional supervised signals on the quality of the learned representations.

NLP-34-Title: R3Net: Relation-embedded Representation Reconstruction Network for Change Captioning

Link: https://arxiv.org/abs/2110.10328
Authors: Yunbin Tu, Liang Li, Chenggang Yan, Shengxiang Gao, Zhengtao Yu
Notes: Accepted by EMNLP 2021

Abstract: Change captioning aims to use a natural language sentence to describe the fine-grained disagreement between two similar images. Viewpoint change is the most typical distractor in this task, because it changes the scale and location of the objects and overwhelms the representation of real change. In this paper, we propose a Relation-embedded Representation Reconstruction Network (R^3Net) to explicitly distinguish the real change from the large amount of clutter and irrelevant changes. Specifically, a relation-embedded module is first devised to explore potential changed objects in the large amount of clutter. Then, based on the semantic similarities of corresponding locations in the two images, a representation reconstruction module (RRM) is designed to learn the reconstruction representation and further model the difference representation. Besides, we introduce a syntactic skeleton predictor (SSP) to enhance the semantic interaction between change localization and caption generation. Extensive experiments show that the proposed method achieves the state-of-the-art results on two public datasets.

NLP-35-Title: LMSOC: An Approach for Socially Sensitive Pretraining

Link: https://arxiv.org/abs/2110.10319
Authors: Vivek Kulkarni, Shubhanshu Mishra, Aria Haghighi
Notes: Camera ready version. Accepted to EMNLP 2021 Findings. Code for reproducing the experiments can be found at: this https URL

Abstract: While large-scale pretrained language models have been shown to learn effective linguistic representations for many NLP tasks, there remain many real-world contextual aspects of language that current approaches do not capture. For instance, consider a cloze-test “I enjoyed the ____ game this weekend”: the correct answer depends heavily on where the speaker is from, when the utterance occurred, and the speaker’s broader social milieu and preferences. Although language depends heavily on the geographical, temporal, and other social contexts of the speaker, these elements have not been incorporated into modern transformer-based language models. We propose a simple but effective approach to incorporate speaker social context into the learned representations of large-scale language models. Our method first learns dense representations of social contexts using graph representation learning algorithms and then primes language model pretraining with these social context representations. We evaluate our approach on geographically-sensitive language-modeling tasks and show a substantial improvement (more than 100% relative lift on MRR) compared to baselines.

NLP-36-Title: Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction

Link: https://arxiv.org/abs/2110.10318
Authors: Shubhanshu Mishra, Aria Haghighi
Notes: Camera ready version. Accepted to WNUT 2021. Code for reproducing the experiments can be found at: this https URL

Abstract: We evaluate a simple approach to improving zero-shot multilingual transfer of mBERT on social media corpus by adding a pretraining task called translation pair prediction (TPP), which predicts whether a pair of cross-lingual texts are a valid translation. Our approach assumes access to translations (exact or approximate) between source-target language pairs, where we fine-tune a model on source language task data and evaluate the model in the target language. In particular, we focus on language pairs where transfer learning is difficult for mBERT: those where source and target languages are different in script, vocabulary, and linguistic typology. We show improvements from TPP pretraining over mBERT alone in zero-shot transfer from English to Hindi, Arabic, and Japanese on two social media tasks: NER (a 37% average relative improvement in F1 across target languages) and sentiment classification (12% relative improvement in F1) on social media text, while also benchmarking on a non-social media task of Universal Dependency POS tagging (6.7% relative improvement in accuracy). Our results are promising given the lack of social media bitext corpus. Our code can be found at: this https URL.

NLP-37-标题: Learning Domain Specific Language Models for Automatic Speech Recognition through Machine Translation

链接: https://arxiv.org/abs/2110.10261
作者: Saurav Jha
备注: Master’s thesis work from July 2021, 22 pages including references

点击查看摘要

Abstract: Automatic Speech Recognition (ASR) systems have been gaining popularity in the recent years for their widespread usage in smart phones and speakers. Building ASR systems for task-specific scenarios is subject to the availability of utterances that adhere to the style of the task as well as the language in question. In our work, we target such a scenario wherein task-specific text data is available in a language that is different from the target language in which an ASR Language Model (LM) is expected. We use Neural Machine Translation (NMT) as an intermediate step to first obtain translations of the task-specific text data. We then train LMs on the 1-best and N-best translations and study ways to improve on such a baseline LM. We develop a procedure to derive word confusion networks from NMT beam search graphs and evaluate LMs trained on these confusion networks. With experiments on the WMT20 chat translation task dataset, we demonstrate that NMT confusion networks can help to reduce the perplexity of both n-gram and recurrent neural network LMs compared to those trained only on N-best translations.

摘要:近年来,自动语音识别(ASR)系统因在智能手机和智能音箱中的广泛应用而日益普及。为特定任务场景构建 ASR 系统,取决于是否有符合该任务风格与目标语言的话语数据。本工作针对的场景是:任务相关的文本数据所用语言,不同于 ASR 语言模型(LM)所需的目标语言。我们使用神经机器翻译(NMT)作为中间步骤,先获得任务文本数据的翻译,然后在 1-best 与 N-best 翻译上训练 LM,并研究改进这一基线 LM 的方法。我们提出了从 NMT 束搜索图中导出词混淆网络的流程,并评估了在这些混淆网络上训练的 LM。在 WMT20 聊天翻译任务数据集上的实验表明,相比仅在 N-best 翻译上训练的模型,NMT 混淆网络能降低 n-gram 与循环神经网络 LM 的困惑度。

NLP-38-标题: Neural Medication Extraction: A Comparison of Recent Models in Supervised and Semi-supervised Learning Settings

链接: https://arxiv.org/abs/2110.10213
作者: Ali Can Kocabiyikoglu, François Portet, Raheel Qader, Jean-Marc Babouchkine
备注: IEEE International Conference on Healthcare Informatics (ICHI 2021)

点击查看摘要

Abstract: Drug prescriptions are essential information that must be encoded in electronic medical records. However, much of this information is hidden within free-text reports. This is why the medication extraction task has emerged. To date, most of the research effort has focused on small amount of data and has only recently considered deep learning methods. In this paper, we present an independent and comprehensive evaluation of state-of-the-art neural architectures on the I2B2 medical prescription extraction task both in the supervised and semi-supervised settings. The study shows the very competitive performance of simple DNN models on the task as well as the high interest of pre-trained models. Adapting the latter models on the I2B2 dataset enables to push medication extraction performances above the state-of-the-art. Finally, the study also confirms that semi-supervised techniques are promising to leverage large amounts of unlabeled data in particular in low resource setting when labeled data is too costly to acquire.

摘要:药物处方是必须编码进电子病历的关键信息,但其中很多信息隐藏在自由文本报告中,药物信息抽取任务因此应运而生。迄今为止,大多数研究仅针对少量数据,且直到最近才开始采用深度学习方法。本文在 I2B2 医疗处方抽取任务上,对最新的神经网络架构在监督与半监督两种设置下进行了独立而全面的评估。研究表明,简单的 DNN 模型在该任务上表现颇具竞争力,预训练模型也展现出很高的价值:在 I2B2 数据集上适配预训练模型,可将药物抽取性能推升至超越当前最优水平。最后,研究还证实,当标注数据获取成本过高时,半监督技术在低资源场景下利用大量无标注数据前景可期。

NLP-39-标题: StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects

链接: https://arxiv.org/abs/2110.10189
作者: Weiyu Liu, Chris Paxton, Tucker Hermans, Dieter Fox
备注:

点击查看摘要

Abstract: Geometric organization of objects into semantically meaningful arrangements pervades the built world. As such, assistive robots operating in warehouses, offices, and homes would greatly benefit from the ability to recognize and rearrange objects into these semantically meaningful structures. To be useful, these robots must contend with previously unseen objects and receive instructions without significant programming. While previous works have examined recognizing pairwise semantic relations and sequential manipulation to change these simple relations none have shown the ability to arrange objects into complex structures such as circles or table settings. To address this problem we propose a novel transformer-based neural network, StructFormer, which takes as input a partial-view point cloud of the current object arrangement and a structured language command encoding the desired object configuration. We show through rigorous experiments that StructFormer enables a physical robot to rearrange novel objects into semantically meaningful structures with multi-object relational constraints inferred from the language command.

摘要:将物体按语义组织成有意义的几何排布,在人造环境中随处可见。因此,在仓库、办公室和家庭中工作的辅助机器人,将极大受益于识别物体并将其重排为这类语义结构的能力。要真正实用,这些机器人必须能应对从未见过的物体,并且无需大量编程即可接受指令。已有工作研究了成对语义关系的识别以及改变这些简单关系的顺序操作,但尚无方法能将物体排布成圆环或餐桌摆设等复杂结构。为此,我们提出了一种新的基于 Transformer 的神经网络 StructFormer,其输入为当前物体排布的部分视角点云,以及编码目标物体构型的结构化语言指令。严格的实验表明,StructFormer 能让真实机器人依据从语言指令推断出的多物体关系约束,将新物体重排为语义上有意义的结构。

NLP-40-标题: GenNI: Human-AI Collaboration for Data-Backed Text Generation

链接: https://arxiv.org/abs/2110.10185
作者: Hendrik Strobelt, Jambay Kinley, Robert Krueger, Johanna Beyer, Hanspeter Pfister, Alexander M. Rush
备注: IEEE VIS 2021

点击查看摘要

Abstract: Table2Text systems generate textual output based on structured data utilizing machine learning. These systems are essential for fluent natural language interfaces in tools such as virtual assistants; however, left to generate freely these ML systems often produce misleading or unexpected outputs. GenNI (Generation Negotiation Interface) is an interactive visual system for high-level human-AI collaboration in producing descriptive text. The tool utilizes a deep learning model designed with explicit control states. These controls allow users to globally constrain model generations, without sacrificing the representation power of the deep learning models. The visual interface makes it possible for users to interact with AI systems following a Refine-Forecast paradigm to ensure that the generation system acts in a manner human users find suitable. We report multiple use cases on two experiments that improve over uncontrolled generation approaches, while at the same time providing fine-grained control. A demo and source code are available at this https URL .

摘要:Table2Text 系统利用机器学习,基于结构化数据生成文本输出。这类系统对虚拟助手等工具中流畅的自然语言界面至关重要;然而若任其自由生成,这些机器学习系统常会产生误导性或出乎意料的输出。GenNI(Generation Negotiation Interface)是一个交互式可视化系统,用于在生成描述性文本时开展高层次的人机协作。该工具采用带有显式控制状态的深度学习模型,这些控制允许用户对模型生成施加全局约束,而不牺牲深度学习模型的表达能力。其可视化界面让用户能够按照"细化-预测"(Refine-Forecast)范式与 AI 系统交互,确保生成系统以人类用户认可的方式运作。我们在两个实验中报告了多个用例,相比无控制的生成方法有所改进,同时提供了细粒度的控制。演示与源代码见文中链接。

NLP-41-标题: SummN: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents

链接: https://arxiv.org/abs/2110.10150
作者: Yusen Zhang, Ansong Ni, Ziming Mao, Chen Henry Wu, Chenguang Zhu, Budhaditya Deb, Ahmed H. Awadallah, Dragomir Radev, Rui Zhang
备注:

点击查看摘要

Abstract: Text summarization is an essential task to help readers capture salient information from documents, news, interviews, and meetings. However, most state-of-the-art pretrained language models are unable to efficiently process long text commonly seen in the summarization problem domain. In this paper, we propose Summ^N, a simple, flexible, and effective multi-stage framework for input texts that are longer than the maximum context lengths of typical pretrained LMs. Summ^N first generates the coarse summary in multiple stages and then produces the final fine-grained summary based on them. The framework can process input text of arbitrary length by adjusting the number of stages while keeping the LM context size fixed. Moreover, it can deal with both documents and dialogues and can be used on top of any underlying backbone abstractive summarization model. Our experiments demonstrate that Summ^N significantly outperforms previous state-of-the-art methods by improving ROUGE scores on three long meeting summarization datasets AMI, ICSI, and QMSum, two long TV series datasets from SummScreen, and a newly proposed long document summarization dataset GovReport. Our data and code are available at this https URL.

摘要:文本摘要是帮助读者从文档、新闻、访谈和会议中捕捉关键信息的重要任务。然而,大多数最先进的预训练语言模型无法高效处理摘要问题中常见的长文本。本文提出 Summ^N,一个简单、灵活且有效的多阶段框架,用于处理超过典型预训练语言模型最大上下文长度的输入文本。Summ^N 先分多个阶段生成粗摘要,再在其基础上生成最终的细粒度摘要。该框架在保持语言模型上下文长度不变的前提下,通过调整阶段数即可处理任意长度的输入文本;它同时适用于文档和对话,并可叠加在任意底层抽象式摘要模型之上。实验表明,Summ^N 在三个长会议摘要数据集(AMI、ICSI、QMSum)、来自 SummScreen 的两个长篇电视剧数据集,以及新提出的长文档摘要数据集 GovReport 上均提升了 ROUGE 分数,显著优于此前的最优方法。数据与代码见文中链接。
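摘要中"多阶段先粗后细"的流程可以用如下示意代码表达:当输入超过模型上下文长度时反复分块、生成粗摘要并拼接,最后产出细粒度摘要。其中 summarize 代表任意底层摘要模型,接口为假设,长度均以字符数粗略代替 token 数。

```python
def summ_n(text, summarize, max_len=100, chunk_len=50):
    """多阶段摘要的示意流程 (假设性草图, 非论文官方实现):
    输入超过上下文长度 max_len 时, 先按 chunk_len 分块生成粗摘要并拼接,
    反复进行直到可被一次处理, 最后生成细粒度摘要。"""
    while len(text) > max_len:
        chunks = [text[i:i + chunk_len] for i in range(0, len(text), chunk_len)]
        text = " ".join(summarize(c) for c in chunks)  # 粗摘要阶段
    return summarize(text)                             # 细粒度摘要阶段
```

阶段数由输入长度自动决定,这正是摘要中"调整阶段数而保持上下文长度固定"的含义。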

NLP-42-标题: The R package sentometrics to compute aggregate and predict with textual sentiment

链接: https://arxiv.org/abs/2110.10817
作者: David Ardia, Keven Bluteau, Samuel Borms, Kris Boudt
备注:

点击查看摘要

Abstract: We provide a hands-on introduction to optimized textual sentiment indexation using the R package sentometrics. Textual sentiment analysis is increasingly used to unlock the potential information value of textual data. The sentometrics package implements an intuitive framework to efficiently compute sentiment scores of numerous texts, to aggregate the scores into multiple time series, and to use these time series to predict other variables. The workflow of the package is illustrated with a built-in corpus of news articles from two major U.S. journals to forecast the CBOE Volatility Index.

摘要:我们以实践教程的方式介绍了如何使用 R 包 sentometrics 进行优化的文本情感指数构建。文本情感分析正日益被用于挖掘文本数据的潜在信息价值。sentometrics 包实现了一个直观的框架,可高效计算大量文本的情感分数,将分数聚合为多条时间序列,并用这些时间序列预测其他变量。我们以两家美国主要期刊的新闻文章构成的内置语料库预测 CBOE 波动率指数,来演示该包的工作流程。

机器学习

ML-0-标题: RoQNN: Noise-Aware Training for Robust Quantum Neural Networks

链接: https://arxiv.org/abs/2110.11331
作者: Hanrui Wang, Jiaqi Gu, Yongshan Ding, Zirui Li, Frederic T. Chong, David Z. Pan, Song Han
备注: 19 pages, 10 figures, open-source at this https URL

点击查看摘要

Abstract: Quantum Neural Network (QNN) is a promising application towards quantum advantage on near-term quantum hardware. However, due to the large quantum noises (errors), the performance of QNN models has a severe degradation on real quantum devices. For example, the accuracy gap between noise-free simulation and noisy results on IBMQ-Yorktown for MNIST-4 classification is over 60%. Existing noise mitigation methods are general ones without leveraging unique characteristics of QNN and are only applicable to inference; on the other hand, existing QNN work does not consider noise effect. To this end, we present RoQNN, a QNN-specific framework to perform noise-aware optimizations in both training and inference stages to improve robustness. We analytically deduct and experimentally observe that the effect of quantum noise to QNN measurement outcome is a linear map from noise-free outcome with a scaling and a shift factor. Motivated by that, we propose post-measurement normalization to mitigate the feature distribution differences between noise-free and noisy scenarios. Furthermore, to improve the robustness against noise, we propose noise injection to the training process by inserting quantum error gates to QNN according to realistic noise models of quantum hardware. Finally, post-measurement quantization is introduced to quantize the measurement outcomes to discrete values, achieving the denoising effect. Extensive experiments on 8 classification tasks using 6 quantum devices demonstrate that RoQNN improves accuracy by up to 43%, and achieves over 94% 2-class, 80% 4-class, and 34% 10-class MNIST classification accuracy measured on real quantum computers. We also open-source our PyTorch library for construction and noise-aware training of QNN at this https URL .

摘要:量子神经网络(QNN)是在近期量子硬件上实现量子优势的一个有前景的应用方向。然而,由于量子噪声(误差)较大,QNN 模型在真实量子设备上的性能会严重退化。例如,在 IBMQ-Yorktown 上进行 MNIST-4 分类时,无噪声模拟与含噪结果之间的准确率差距超过 60%。现有的噪声缓解方法较为通用,未利用 QNN 的独特性质,且仅适用于推理阶段;另一方面,现有的 QNN 工作又未考虑噪声影响。为此,我们提出 RoQNN,一个面向 QNN 的框架,在训练与推理两个阶段进行噪声感知优化以提升鲁棒性。我们通过理论推导与实验观察发现:量子噪声对 QNN 测量结果的影响,是对无噪结果的一个带缩放与平移因子的线性映射。受此启发,我们提出测量后归一化,以缓解无噪与含噪场景之间的特征分布差异。进一步地,为提升抗噪鲁棒性,我们依据量子硬件的真实噪声模型向 QNN 插入量子误差门,在训练过程中注入噪声。最后,引入测量后量化,将测量结果量化为离散值以达到去噪效果。在 6 个量子设备上对 8 个分类任务的大量实验表明,RoQNN 可将准确率提升最多 43%,并在真实量子计算机上实测取得超过 94% 的 2 类、80% 的 4 类和 34% 的 10 类 MNIST 分类准确率。我们还开源了用于 QNN 构建与噪声感知训练的 PyTorch 库,见文中链接。
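摘要指出噪声近似为对无噪输出的线性映射(缩放+平移),因此按特征维做标准化即可在很大程度上消除其影响。下面用 NumPy 给出"测量后归一化"这一思路的示意实现(非论文开源库的官方代码):

```python
import numpy as np

def post_measurement_normalization(meas, eps=1e-8):
    """对一批测量期望值按特征维标准化 (减均值、除标准差)。
    若噪声是每个特征上的仿射映射 a*x + b (a > 0),
    标准化后的结果与无噪情形近似一致。假设性示意实现。"""
    mean = meas.mean(axis=0, keepdims=True)
    std = meas.std(axis=0, keepdims=True)
    return (meas - mean) / (std + eps)
```

可以模拟验证:把无噪输出做仿射扰动后再归一化,结果与对无噪输出归一化几乎相同。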

ML-1-标题: Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolation

链接: https://arxiv.org/abs/2110.11312
作者: Tobias Weber, Michael Ingrisch, Bernd Bischl, David Rügamer
备注: NeurIPS 2021 Workshop, Deep Generative Models and Downstream Applications

点击查看摘要

Abstract: The application of deep learning in survival analysis (SA) gives the opportunity to utilize unstructured and high-dimensional data types uncommon in traditional survival methods. This allows to advance methods in fields such as digital health, predictive maintenance and churn analysis, but often yields less interpretable and intuitively understandable models due to the black-box character of deep learning-based approaches. We close this gap by proposing 1) a multi-task variational autoencoder (VAE) with survival objective, yielding survival-oriented embeddings, and 2) a novel method HazardWalk that allows to model hazard factors in the original data space. HazardWalk transforms the latent distribution of our autoencoder into areas of maximized/minimized hazard and then uses the decoder to project changes to the original domain. Our procedure is evaluated on a simulated dataset as well as on a dataset of CT imaging data of patients with liver metastases.

摘要:深度学习在生存分析(SA)中的应用,使我们得以利用传统生存方法中少见的非结构化高维数据类型。这推动了数字健康、预测性维护和客户流失分析等领域的方法进步,但由于深度学习方法的黑盒特性,得到的模型往往可解释性与直观可理解性较差。我们通过两点工作来弥合这一差距:1)提出带生存目标的多任务变分自编码器(VAE),得到面向生存的嵌入表示;2)提出新方法 HazardWalk,可在原始数据空间中对风险因素进行建模。HazardWalk 将自编码器的潜在分布变换到风险最大化/最小化的区域,再用解码器将变化投射回原始域。我们在模拟数据集以及肝转移患者的 CT 影像数据集上评估了该流程。

ML-2-标题: On games and simulators as a platform for development of artificial intelligence for command and control

链接: https://arxiv.org/abs/2110.11305
作者: Vinicius G. Goecks, Nicholas Waytowich, Derrik E. Asher, Song Jun Park, Mark Mittrick, John Richardson, Manuel Vindiola, Anne Logie, Mark Dennison, Theron Trout, Priya Narayanan, Alexander Kott
备注: Preprint submitted to the Journal of Defense Modeling and Simulation (JDMS) for peer review

点击查看摘要

Abstract: Games and simulators can be a valuable platform to execute complex multi-agent, multiplayer, imperfect information scenarios with significant parallels to military applications: multiple participants manage resources and make decisions that command assets to secure specific areas of a map or neutralize opposing forces. These characteristics have attracted the artificial intelligence (AI) community by supporting development of algorithms with complex benchmarks and the capability to rapidly iterate over new ideas. The success of artificial intelligence algorithms in real-time strategy games such as StarCraft II have also attracted the attention of the military research community aiming to explore similar techniques in military counterpart scenarios. Aiming to bridge the connection between games and military applications, this work discusses past and current efforts on how games and simulators, together with the artificial intelligence algorithms, have been adapted to simulate certain aspects of military missions and how they might impact the future battlefield. This paper also investigates how advances in virtual reality and visual augmentation systems open new possibilities in human interfaces with gaming platforms and their military parallels.

摘要:游戏和模拟器可以作为有价值的平台,用于执行与军事应用高度类似的复杂多智能体、多玩家、不完全信息场景:多名参与者管理资源并做出决策,指挥各类资产去控制地图的特定区域或压制对方力量。这些特性吸引了人工智能(AI)社区,因为它们提供了复杂的算法基准,并支持对新想法的快速迭代。人工智能算法在《星际争霸 II》等即时战略游戏中的成功,也引起了军事研究界的关注,他们希望在对应的军事场景中探索类似技术。为了架起游戏与军事应用之间的桥梁,本文讨论了游戏与模拟器结合人工智能算法,如何被改造用于模拟军事任务的某些方面,以及它们可能如何影响未来战场。本文还探讨了虚拟现实与视觉增强系统的进展,如何为游戏平台及其军事对应场景的人机界面开辟新的可能。

ML-3-标题: Survival-oriented embeddings for improving accessibility to complex data structures

链接: https://arxiv.org/abs/2110.11303
作者: Tobias Weber, Michael Ingrisch, Matthias Fabritius, Bernd Bischl, David Rügamer
备注: NeurIPS 2021 Workshop, Bridging the Gap: From Machine Learning Research to Clinical Practice

点击查看摘要

Abstract: Deep learning excels in the analysis of unstructured data and recent advancements allow to extend these techniques to survival analysis. In the context of clinical radiology, this enables, e.g., to relate unstructured volumetric images to a risk score or a prognosis of life expectancy and support clinical decision making. Medical applications are, however, associated with high criticality and consequently, neither medical personnel nor patients do usually accept black box models as reason or basis for decisions. Apart from averseness to new technologies, this is due to missing interpretability, transparency and accountability of many machine learning methods. We propose a hazard-regularized variational autoencoder that supports straightforward interpretation of deep neural architectures in the context of survival analysis, a field highly relevant in healthcare. We apply the proposed approach to abdominal CT scans of patients with liver tumors and their corresponding survival times.

摘要:深度学习擅长分析非结构化数据,而最近的进展使这些技术得以扩展到生存分析。在临床放射学中,这使得我们可以将非结构化的体积影像与风险评分或预期寿命的预后关联起来,从而支持临床决策。然而,医疗应用事关重大,因此无论医护人员还是患者,通常都不会接受以黑盒模型作为决策的理由或依据。除了对新技术的排斥,这也源于许多机器学习方法缺乏可解释性、透明度和可问责性。我们提出一种带风险正则化的变分自编码器,支持在生存分析(一个与医疗保健高度相关的领域)的语境下对深度神经架构做直接解释。我们将所提方法应用于肝肿瘤患者的腹部 CT 扫描及其相应的生存时间。

ML-4-标题: Transformer Acceleration with Dynamic Sparse Attention

链接: https://arxiv.org/abs/2110.11299
作者: Liu Liu, Zheng Qu, Zhaodong Chen, Yufei Ding, Yuan Xie
备注:

点击查看摘要

Abstract: Transformers are the mainstream of NLP applications and are becoming increasingly popular in other domains such as Computer Vision. Despite the improvements in model quality, the enormous computation costs make Transformers difficult at deployment, especially when the sequence length is large in emerging applications. Processing attention mechanism as the essential component of Transformer is the bottleneck of execution due to the quadratic complexity. Prior art explores sparse patterns in attention to support long sequence modeling, but those pieces of work are on static or fixed patterns. We demonstrate that the sparse patterns are dynamic, depending on input sequences. Thus, we propose the Dynamic Sparse Attention (DSA) that can efficiently exploit the dynamic sparsity in the attention of Transformers. Compared with other methods, our approach can achieve better trade-offs between accuracy and model complexity. Moving forward, we identify challenges and provide solutions to implement DSA on existing hardware (GPUs) and specialized hardware in order to achieve practical speedup and efficiency improvements for Transformer execution.

摘要:Transformer 是 NLP 应用的主流,并在计算机视觉等其他领域日益流行。尽管模型质量不断提升,巨大的计算开销使 Transformer 难以部署,尤其是在新兴应用中序列长度很大时。作为 Transformer 的核心组件,注意力机制因其二次复杂度成为执行瓶颈。已有工作探索了注意力中的稀疏模式以支持长序列建模,但这些工作都基于静态或固定的模式。我们证明稀疏模式是动态的,取决于输入序列。因此,我们提出动态稀疏注意力(DSA),能够高效利用 Transformer 注意力中的动态稀疏性。与其他方法相比,我们的方法能在准确率与模型复杂度之间取得更好的权衡。更进一步,我们指出了在现有硬件(GPU)和专用硬件上实现 DSA 的挑战并给出解决方案,以便为 Transformer 执行带来实际的加速与效率提升。
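"动态稀疏注意力"的核心思想——根据输入为每个 query 动态保留分数最高的 k 个 key——可以用 NumPy 草图如下(仅为概念演示,并非论文的硬件友好实现):

```python
import numpy as np

def dynamic_sparse_attention(Q, K, V, k=2):
    """示意性的动态稀疏注意力: 依输入动态选出每个 query
    分数最高的 k 个 key, 其余置 -inf 后再 softmax。"""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    thresh = np.sort(scores, axis=-1)[:, -k][:, None]   # 每行第 k 大的分数
    masked = np.where(scores >= thresh, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, (weights > 0).sum(axis=-1)      # 输出与每行非零个数
```

每个 query 实际参与计算的 key 数量由输入决定,这与静态稀疏模式形成对照。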

ML-5-标题: OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis

链接: https://arxiv.org/abs/2110.11292
作者: Animesh Basak Chowdhury, Benjamin Tan, Ramesh Karri, Siddharth Garg
备注: 18 pages

点击查看摘要

Abstract: Logic synthesis is a challenging and widely-researched combinatorial optimization problem during integrated circuit (IC) design. It transforms a high-level description of hardware in a programming language like Verilog into an optimized digital circuit netlist, a network of interconnected Boolean logic gates, that implements the function. Spurred by the success of ML in solving combinatorial and graph problems in other domains, there is growing interest in the design of ML-guided logic synthesis tools. Yet, there are no standard datasets or prototypical learning tasks defined for this problem domain. Here, we describe OpenABC-D, a large-scale, labeled dataset produced by synthesizing open source designs with a leading open-source logic synthesis tool and illustrate its use in developing, evaluating and benchmarking ML-guided logic synthesis. OpenABC-D has intermediate and final outputs in the form of 870,000 And-Inverter-Graphs (AIGs) produced from 1500 synthesis runs, plus labels such as the optimized node counts and delay. We define a generic learning problem on this dataset and benchmark existing solutions for it. The codes related to dataset creation and benchmark models are available at https://github.com/NYU-MLDA/OpenABC.git. The dataset generated is available at https://archive.this http URL

摘要:逻辑综合是集成电路(IC)设计中一个极具挑战且被广泛研究的组合优化问题。它将 Verilog 等编程语言中对硬件的高层描述,转换为实现该功能的优化数字电路网表,即由互连布尔逻辑门构成的网络。受机器学习在其他领域求解组合与图问题的成功所鼓舞,人们对设计机器学习引导的逻辑综合工具的兴趣与日俱增。然而,这一问题领域尚无标准数据集或原型学习任务。本文介绍 OpenABC-D,一个通过用领先的开源逻辑综合工具综合开源设计而构建的大规模带标签数据集,并展示其在开发、评估与基准测试机器学习引导的逻辑综合中的用途。OpenABC-D 包含由 1500 次综合运行产生的 870,000 个与-非图(AIG)形式的中间与最终输出,以及优化后节点数和延迟等标签。我们在该数据集上定义了一个通用学习问题,并对现有解决方案进行了基准测试。数据集构建与基准模型的代码见 https://github.com/NYU-MLDA/OpenABC.git,生成的数据集见文中链接。

ML-6-标题: Physical Side-Channel Attacks on Embedded Neural Networks: A Survey

链接: https://arxiv.org/abs/2110.11290
作者: Maria Méndez Real, Rubén Salvador
备注: 25 pages, 7 figures

点击查看摘要

Abstract: During the last decade, Deep Neural Networks (DNN) have progressively been integrated on all types of platforms, from data centers to embedded systems including low-power processors and, recently, FPGAs. Neural Networks (NN) are expected to become ubiquitous in IoT systems by transforming all sorts of real-world applications, including applications in the safety-critical and security-sensitive domains. However, the underlying hardware security vulnerabilities of embedded NN implementations remain unaddressed. In particular, embedded DNN implementations are vulnerable to Side-Channel Analysis (SCA) attacks, which are especially important in the IoT and edge computing contexts where an attacker can usually gain physical access to the targeted device. A research field has therefore emerged and is rapidly growing in terms of the use of SCA including timing, electromagnetic attacks and power attacks to target NN embedded implementations. Since 2018, research papers have shown that SCA enables an attacker to recover inference models architectures and parameters, to expose industrial IP and endangers data confidentiality and privacy. Without a complete review of this emerging field in the literature so far, this paper surveys state-of-the-art physical SCA attacks relative to the implementation of embedded DNNs on micro-controllers and FPGAs in order to provide a thorough analysis on the current landscape. It provides a taxonomy and a detailed classification of current attacks. It first discusses mitigation techniques and then provides insights for future research leads.

摘要:过去十年间,深度神经网络(DNN)被逐步集成到各类平台上,从数据中心到包括低功耗处理器以及近来的 FPGA 在内的嵌入式系统。神经网络(NN)有望通过改造各类现实应用(包括安全关键与安全敏感领域的应用)而在物联网系统中变得无处不在。然而,嵌入式 NN 实现的底层硬件安全漏洞仍未得到解决。特别地,嵌入式 DNN 实现容易受到侧信道分析(SCA)攻击,这在物联网和边缘计算场景中尤为重要,因为攻击者通常可以物理接触目标设备。由此出现并迅速发展起一个研究领域,利用时序攻击、电磁攻击和功耗攻击等 SCA 手段来攻击 NN 的嵌入式实现。自 2018 年以来,已有研究论文表明,SCA 能让攻击者恢复推理模型的架构与参数,从而泄露工业知识产权并危及数据机密性与隐私。鉴于文献中尚无对这一新兴领域的完整综述,本文梳理了针对微控制器和 FPGA 上嵌入式 DNN 实现的最新物理 SCA 攻击,对当前研究格局给出全面分析,并提供了当前攻击的分类体系与详细分类,最后讨论了缓解技术并为未来研究方向提供了洞见。

ML-7-标题: One-Shot Transfer Learning of Physics-Informed Neural Networks

链接: https://arxiv.org/abs/2110.11286
作者: Shaan Desai, Marios Mattheakis, Hayden Joy, Pavlos Protopapas, Stephen Roberts
备注: [under review]

点击查看摘要

Abstract: Solving differential equations efficiently and accurately sits at the heart of progress in many areas of scientific research, from classical dynamical systems to quantum mechanics. There is a surge of interest in using Physics-Informed Neural Networks (PINNs) to tackle such problems as they provide numerous benefits over traditional numerical approaches. Despite their potential benefits for solving differential equations, transfer learning has been under explored. In this study, we present a general framework for transfer learning PINNs that results in one-shot inference for linear systems of both ordinary and partial differential equations. This means that highly accurate solutions to many unknown differential equations can be obtained instantaneously without retraining an entire network. We demonstrate the efficacy of the proposed deep learning approach by solving several real-world problems, such as first- and second-order linear ordinary equations, the Poisson equation, and the time-dependent Schrodinger complex-value partial differential equation.

摘要:高效而精确地求解微分方程,是从经典动力系统到量子力学等众多科研领域取得进展的核心。由于物理信息神经网络(PINN)相比传统数值方法具有诸多优势,用它求解此类问题的兴趣正在激增。尽管 PINN 在求解微分方程上潜力巨大,其迁移学习却鲜有探索。本研究提出一个 PINN 迁移学习的通用框架,可对常微分与偏微分方程的线性系统实现一次性(one-shot)推断。这意味着无需重新训练整个网络,即可即时获得许多未知微分方程的高精度解。我们通过求解若干实际问题(如一阶与二阶线性常微分方程、泊松方程,以及含时薛定谔复值偏微分方程)展示了所提深度学习方法的有效性。
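论文"一次性推断"的直观含义是:隐藏层冻结后,线性微分方程的输出层权重可由一次最小二乘求解得到。下面用随机特征模拟冻结的隐藏层,求解 u' = λu, u(0)=1 作为示意(纯属假设性示例,随机特征、初始条件加权等细节均与论文实现无关):

```python
import numpy as np

def one_shot_linear_ode(lam, x, n_feat=50, seed=0):
    """示意: 用冻结的随机特征 tanh(a*x + b) 作为隐藏层输出,
    线性 ODE u' = lam*u, u(0)=1 的输出层权重 w 由一次
    最小二乘 (残差方程 + 加权初始条件) 求得, 无需再训练网络。"""
    rng = np.random.default_rng(seed)
    a, b = rng.normal(size=n_feat), rng.normal(size=n_feat)
    phi  = lambda t: np.tanh(np.outer(t, a) + b)
    dphi = lambda t: a * (1 - np.tanh(np.outer(t, a) + b) ** 2)
    A = np.vstack([dphi(x) - lam * phi(x),        # 配点残差 u' - lam*u = 0
                   10.0 * phi(np.array([0.0]))])  # 初始条件 u(0)=1 (加权)
    rhs = np.concatenate([np.zeros(len(x)), [10.0]])
    w, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return phi(x) @ w                              # 网络输出 u(x)
```

换一个线性方程(换 λ 或右端项)只需重解这一个最小二乘问题,这正是"一次性迁移"的意味。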

ML-8-标题: Actor-critic is implicitly biased towards high entropy optimal policies

链接: https://arxiv.org/abs/2110.11280
作者: Yuzheng Hu, Ziwei Ji, Matus Telgarsky
备注:

点击查看摘要

Abstract: We show that the simplest actor-critic method – a linear softmax policy updated with TD through interaction with a linear MDP, but featuring no explicit regularization or exploration – does not merely find an optimal policy, but moreover prefers high entropy optimal policies. To demonstrate the strength of this bias, the algorithm not only has no regularization, no projections, and no exploration like $\epsilon$-greedy, but is moreover trained on a single trajectory with no resets. The key consequence of the high entropy bias is that uniform mixing assumptions on the MDP, which exist in some form in all prior work, can be dropped: the implicit regularization of the high entropy bias is enough to ensure that all chains mix and an optimal policy is reached with high probability. As auxiliary contributions, this work decouples concerns between the actor and critic by writing the actor update as an explicit mirror descent, provides tools to uniformly bound mixing times within KL balls of policy space, and provides a projection-free TD analysis with its own implicit bias which can be run from an unmixed starting distribution.

摘要:我们证明,最简单的 actor-critic 方法——通过与线性 MDP 交互、用 TD 更新的线性 softmax 策略,且不含任何显式正则化或探索——不仅能找到最优策略,而且更偏好高熵的最优策略。为了展示这种偏置的强度,该算法不仅没有正则化、没有投影、没有 $\epsilon$-greedy 之类的探索,而且是在单条轨迹上训练且无重置。高熵偏置的关键意义在于:所有已有工作中以某种形式存在的对 MDP 的一致混合(uniform mixing)假设可以被去掉——高熵偏置的隐式正则化足以保证所有马尔可夫链混合,并以高概率达到最优策略。作为附带贡献,本工作将 actor 的更新写成显式的镜像下降,从而解耦了 actor 与 critic 之间的关注点;提供了在策略空间的 KL 球内一致界定混合时间的工具;并给出了一个带有自身隐式偏置、可从未混合初始分布运行的免投影 TD 分析。

ML-9-标题: Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation

链接: https://arxiv.org/abs/2110.11271
作者: Bingbin Liu, Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski
备注:

点击查看摘要

Abstract: Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models. It has been empirically observed that the choice of the noise distribution is crucial for NCE’s performance. However, such observations have never been made formal or quantitative. In fact, it is not even clear whether the difficulties arising from a poorly chosen noise distribution are statistical or algorithmic in nature. In this work, we formally pinpoint reasons for NCE’s poor performance when an inappropriate noise distribution is used. Namely, we prove these challenges arise due to an ill-behaved (more precisely, flat) loss landscape. To address this, we introduce a variant of NCE called “eNCE” which uses an exponential loss and for which normalized gradient descent addresses the landscape issues provably when the target and noise distributions are in a given exponential family.

摘要:噪声对比估计(NCE)是一种学习未归一化概率模型的统计相合方法。经验上已观察到,噪声分布的选取对 NCE 的性能至关重要,但此类观察从未被形式化或定量化;事实上,甚至尚不清楚噪声分布选取不当带来的困难在本质上是统计性的还是算法性的。本工作形式化地指出了在使用不恰当的噪声分布时 NCE 表现不佳的原因:我们证明这些困难源于病态(更准确地说,平坦)的损失地形。为解决这一问题,我们提出 NCE 的一个变体"eNCE",它采用指数损失,并且当目标分布与噪声分布属于同一给定指数族时,归一化梯度下降可被证明能解决上述地形问题。
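为便于对照,下面给出经典 NCE 的 logistic 损失,以及"用指数损失替换对数损失"这一 eNCE 思路的示意写法。eNCE 的确切形式以论文定义为准,此处仅为假设性草图,其中 s = log p_model − log p_noise 为分类 logit:

```python
import numpy as np

def nce_loss(s_data, s_noise):
    """经典 NCE: 对数据样本与噪声样本做 logistic 二分类,
    s = log p_model - log p_noise。"""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    return -np.mean(np.log(sig(s_data))) - np.mean(np.log(sig(-s_noise)))

def ence_loss(s_data, s_noise):
    """eNCE 思路的示意版本: 用指数损失替换对数损失
    (具体形式以论文为准, 此处为假设性草图)。"""
    return np.mean(np.exp(-s_data / 2)) + np.mean(np.exp(s_noise / 2))
```

两种损失都在数据样本得分高、噪声样本得分低时变小;论文的论点在于指数损失配合归一化梯度下降能改善平坦的损失地形。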

ML-10-标题: Modeling the AC Power Flow Equations with Optimally Compact Neural Networks: Application to Unit Commitment

链接: https://arxiv.org/abs/2110.11269
作者: Alyssa Kody, Samuel Chevalier, Spyros Chatzivasileiadis, Daniel Molzahn
备注: first two authors equally contributed, 8 pages, 3 figures, 1 table

点击查看摘要

Abstract: Nonlinear power flow constraints render a variety of power system optimization problems computationally intractable. Emerging research shows, however, that the nonlinear AC power flow equations can be successfully modeled using Neural Networks (NNs). These NNs can be exactly transformed into Mixed Integer Linear Programs (MILPs) and embedded inside challenging optimization problems, thus replacing nonlinearities that are intractable for many applications with tractable piecewise linear approximations. Such approaches, though, suffer from an explosion of the number of binary variables needed to represent the NN. Accordingly, this paper develops a technique for training an “optimally compact” NN, i.e., one that can represent the power flow equations with a sufficiently high degree of accuracy while still maintaining a tractable number of binary variables. We show that the resulting NN model is more expressive than both the DC and linearized power flow approximations when embedded inside of a challenging optimization problem (i.e., the AC unit commitment problem).

摘要:非线性潮流约束使得各类电力系统优化问题在计算上难以求解。然而新兴研究表明,非线性交流潮流方程可以用神经网络(NN)成功建模。这些 NN 可以被精确地转化为混合整数线性规划(MILP)并嵌入到具有挑战性的优化问题中,从而用可处理的分段线性近似取代许多应用中难以处理的非线性项。然而,这类方法面临表示 NN 所需二进制变量数量爆炸的问题。为此,本文提出一种训练"最优紧凑"NN 的技术:在保持二进制变量数量可控的同时,以足够高的精度表示潮流方程。我们表明,当嵌入到具有挑战性的优化问题(即交流机组组合问题)中时,所得 NN 模型比直流潮流近似和线性化潮流近似都更具表达力。

ML-11-标题: Deep Reinforcement Learning for Online Control of Stochastic Partial Differential Equations

链接: https://arxiv.org/abs/2110.11265
作者: Erfan Pirmorad, Faraz Khoshbakhtian, Farnam Mansouri, Amir-massoud Farahmand
备注:

点击查看摘要

Abstract: In many areas, such as the physical sciences, life sciences, and finance, control approaches are used to achieve a desired goal in complex dynamical systems governed by differential equations. In this work we formulate the problem of controlling stochastic partial differential equations (SPDE) as a reinforcement learning problem. We present a learning-based, distributed control approach for online control of a system of SPDEs with high dimensional state-action space using deep deterministic policy gradient method. We tested the performance of our method on the problem of controlling the stochastic Burgers’ equation, describing a turbulent fluid flow in an infinitely large domain.

摘要:在物理科学、生命科学和金融等许多领域,人们用控制方法在由微分方程支配的复杂动力系统中达成预期目标。本工作将随机偏微分方程(SPDE)的控制问题形式化为强化学习问题。我们提出一种基于学习的分布式控制方法,采用深度确定性策略梯度方法,对具有高维状态-动作空间的 SPDE 系统进行在线控制。我们在控制随机 Burgers 方程(描述无限大区域中的湍流流体流动)的问题上测试了该方法的性能。

ML-12-标题: Principal Component Analysis versus Factor Analysis

链接: https://arxiv.org/abs/2110.11261
作者: Zenon Gniazdowski
备注: 54 pages, 13 figures, 35 tables

点击查看摘要

Abstract: The article discusses selected problems related to both principal component analysis (PCA) and factor analysis (FA). In particular, both types of analysis were compared. A vector interpretation for both PCA and FA has also been proposed. The problem of determining the number of principal components in PCA and factors in FA was discussed in detail. A new criterion for determining the number of factors and principal components is discussed, which will allow to present most of the variance of each of the analyzed primary variables. An efficient algorithm for determining the number of factors in FA, which complies with this criterion, was also proposed. This algorithm was adapted to find the number of principal components in PCA. It was also proposed to modify the PCA algorithm using a new method of determining the number of principal components. The obtained results were discussed.

摘要:本文讨论了与主成分分析(PCA)和因子分析(FA)相关的若干问题,尤其对这两类分析做了比较,并为 PCA 与 FA 提出了一种向量解释。文中详细讨论了确定 PCA 中主成分个数和 FA 中因子个数的问题,并提出了一个新准则:所选因子与主成分应能解释每个被分析原始变量的大部分方差。文中还提出了一个符合该准则的、确定 FA 因子个数的高效算法,并将其改造用于确定 PCA 的主成分个数;同时建议用这一确定主成分个数的新方法修改 PCA 算法。最后对所得结果进行了讨论。
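摘要中"让每个原始变量的大部分方差都被解释"的准则,一种可能的实现是:取最少的主成分个数,使每个变量的共同度(该变量在前 k 个主成分上载荷的平方和)都达到给定阈值。下面的 NumPy 草图体现了这一理解(属于示意性解读,并非原文算法):

```python
import numpy as np

def n_components_per_variable(X, threshold=0.8):
    """示意: 选取最少的主成分个数 k, 使每个标准化变量
    被前 k 个主成分解释的方差比例 (共同度) 都不低于 threshold。"""
    Z = (X - X.mean(0)) / X.std(0)
    C = Z.T @ Z / len(Z)                              # 相关矩阵
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]                    # 特征值降序
    vals, vecs = vals[order], vecs[:, order]
    loadings = vecs * np.sqrt(np.maximum(vals, 0))    # 载荷矩阵
    for k in range(1, len(vals) + 1):
        communality = (loadings[:, :k] ** 2).sum(axis=1)
        if (communality >= threshold).all():
            return k
    return len(vals)
```

与常用的"累计方差贡献率"准则不同,这一准则是对每个变量逐一检查,而非只看总体方差。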

ML-13-标题: Learning to Recommend Using Non-Uniform Data

链接: https://arxiv.org/abs/2110.11248
作者: Wanning Chen, Mohsen Bayati
备注:

点击查看摘要

Abstract: Learning user preferences for products based on their past purchases or reviews is at the cornerstone of modern recommendation engines. One complication in this learning task is that some users are more likely to purchase products or review them, and some products are more likely to be purchased or reviewed by the users. This non-uniform pattern degrades the power of many existing recommendation algorithms, as they assume that the observed data is sampled uniformly at random among user-product pairs. In addition, existing literature on modeling non-uniformity either assume user interests are independent of the products, or lack theoretical understanding. In this paper, we first model the user-product preferences as a partially observed matrix with non-uniform observation pattern. Next, building on the literature about low-rank matrix estimation, we introduce a new weighted trace-norm penalized regression to predict unobserved values of the matrix. We then prove an upper bound for the prediction error of our proposed approach. Our upper bound is a function of a number of parameters that are based on a certain weight matrix that depends on the joint distribution of users and products. Utilizing this observation, we introduce a new optimization problem to select a weight matrix that minimizes the upper bound on the prediction error. The final product is a new estimator, NU-Recommend, that outperforms existing methods in both synthetic and real datasets.

摘要:基于用户过去的购买或评论来学习其对产品的偏好,是现代推荐引擎的基石。这一学习任务的一个难点在于:某些用户更倾向于购买或评论产品,而某些产品也更容易被用户购买或评论。这种非均匀模式削弱了许多现有推荐算法的能力,因为它们假设观测数据是在用户-产品对中均匀随机采样的。此外,现有的非均匀性建模文献要么假设用户兴趣与产品无关,要么缺乏理论理解。本文首先将用户-产品偏好建模为一个具有非均匀观测模式的部分观测矩阵;接着,基于低秩矩阵估计的文献,我们引入一种新的加权迹范数惩罚回归来预测矩阵中的未观测值,并证明了该方法预测误差的上界。该上界是若干参数的函数,这些参数依赖于一个由用户与产品联合分布决定的权重矩阵。利用这一观察,我们提出了一个新的优化问题,用于选择使预测误差上界最小的权重矩阵。最终得到的新估计器NU-Recommend在合成数据集和真实数据集上均优于现有方法。
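加权迹范数矩阵补全的大致思路可以用下面的启发式近端梯度迭代来示意:先按观测残差做梯度步,再对按行/列观测频率缩放后的矩阵做奇异值软阈值。这只是一个演示性近似(精确的加权迹范数近端算子并无如此简单的闭式解),权重构造、步长与正则系数均为假设值:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 30, 20, 2
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # 真实低秩矩阵
mask = rng.random((m, n)) < 0.3                                # 部分观测(可非均匀)
p_row = mask.mean(1, keepdims=True) + 1e-6                     # 行观测频率
p_col = mask.mean(0, keepdims=True) + 1e-6                     # 列观测频率

def prox_step(X, M, mask, p_row, p_col, step=1.0, lam=0.1):
    grad = mask * (X - M)                  # 仅在观测位置计算残差
    Y = X - step * grad                    # 梯度步
    W = np.sqrt(p_row) * np.sqrt(p_col)    # 由边缘观测频率构造的权重
    U, s, Vt = np.linalg.svd(W * Y, full_matrices=False)
    s = np.maximum(s - step * lam, 0.0)    # 对缩放后的奇异值软阈值
    return (U * s) @ Vt / W                # 还原缩放

X = np.zeros((m, n))
for _ in range(50):
    X = prox_step(X, M, mask, p_row, p_col)
err = np.linalg.norm(X - M) / np.linalg.norm(M)   # 相对重构误差
```

论文的贡献在于从理论上选择最优权重矩阵,而非此处的经验频率权重。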

ML-14-标题: Variational Predictive Routing with Nested Subjective Timescales

链接: https://arxiv.org/abs/2110.11236
作者: Alexey Zakharov, Qinghai Guo, Zafeirios Fountas
备注: 18 pages, 13 figures

点击查看摘要

Abstract: Discovery and learning of an underlying spatiotemporal hierarchy in sequential data is an important topic for machine learning. Despite this, little work has been done to explore hierarchical generative models that can flexibly adapt their layerwise representations in response to datasets with different temporal dynamics. Here, we present Variational Predictive Routing (VPR) - a neural probabilistic inference system that organizes latent representations of video features in a temporal hierarchy, based on their rates of change, thus modeling continuous data as a hierarchical renewal process. By employing an event detection mechanism that relies solely on the system’s latent representations (without the need of a separate model), VPR is able to dynamically adjust its internal state following changes in the observed features, promoting an optimal organisation of representations across the levels of the model’s latent hierarchy. Using several video datasets, we show that VPR is able to detect event boundaries, disentangle spatiotemporal features across its hierarchy, adapt to the dynamics of the data, and produce accurate time-agnostic rollouts of the future. Our approach integrates insights from neuroscience and introduces a framework with high potential for applications in model-based reinforcement learning, where flexible and informative state-space rollouts are of particular interest.

摘要:从序列数据中发现并学习潜在的时空层次结构是机器学习的重要课题。尽管如此,针对能够根据不同时间动态的数据集灵活调整各层表示的层次化生成模型,相关探索仍然很少。本文提出变分预测路由(VPR):一种神经概率推理系统,它根据视频特征的变化速率将其潜在表示组织成时间层次结构,从而将连续数据建模为层次化的更新过程。通过采用仅依赖系统潜在表示(无需单独模型)的事件检测机制,VPR能够在观测特征发生变化时动态调整其内部状态,促进模型潜在层次各层级之间表示的最优组织。在多个视频数据集上,我们展示了VPR能够检测事件边界、在其层次结构中解耦时空特征、适应数据的动态特性,并对未来生成准确的、与时间无关的rollout。我们的方法融合了神经科学的洞见,并提出了一个在基于模型的强化学习中具有很大应用潜力的框架,其中灵活且信息丰富的状态空间rollout尤其受到关注。

ML-15-标题: Is High Variance Unavoidable in RL? A Case Study in Continuous Control

链接: https://arxiv.org/abs/2110.11222
作者: Johan Bjorck, Carla P. Gomes, Kilian Q. Weinberger
备注:

点击查看摘要

Abstract: Reinforcement learning (RL) experiments have notoriously high variance, and minor details can have disproportionately large effects on measured outcomes. This is problematic for creating reproducible research and also serves as an obstacle for real-world applications, where safety and predictability are paramount. In this paper, we investigate causes for this perceived instability. To allow for an in-depth analysis, we focus on a specifically popular setup with high variance – continuous control from pixels with an actor-critic agent. In this setting, we demonstrate that variance mostly arises early in training as a result of poor “outlier” runs, but that weight initialization and initial exploration are not to blame. We show that one cause for early variance is numerical instability which leads to saturating nonlinearities. We investigate several fixes to this issue and find that one particular method is surprisingly effective and simple – normalizing penultimate features. Addressing the learning instability allows for larger learning rates, and significantly decreases the variance of outcomes. This demonstrates that the perceived variance in RL is not necessarily inherent to the problem definition and may be addressed through simple architectural modifications.

摘要:强化学习(RL)实验以高方差著称,细微的细节可能对测量结果产生不成比例的巨大影响。这不利于开展可复现的研究,也成为现实应用的障碍,因为后者对安全性和可预测性要求极高。本文研究了这种不稳定性的成因。为便于深入分析,我们聚焦于一个方差特别高的常见设定:基于actor-critic智能体的像素输入连续控制。在该设定下,我们证明方差主要源于训练早期出现的糟糕的"离群"运行,但权重初始化和初始探索并非罪魁祸首。我们指出早期方差的一个成因是导致非线性饱和的数值不稳定性。我们考察了针对该问题的若干修复方案,发现一种方法出人意料地有效且简单:对倒数第二层特征进行归一化。解决学习不稳定性使得更大的学习率成为可能,并显著降低结果的方差。这表明RL中观察到的方差并非问题定义所固有,可以通过简单的架构修改加以解决。
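文中"归一化倒数第二层特征"的做法本身非常简单,下面用纯numpy给一个最小示意(网络结构省略,仅展示归一化操作本身;具体缩放方式为演示性假设):

```python
import numpy as np

def normalize_penultimate(h, eps=1e-6):
    """把每个样本的倒数第二层特征缩放到单位范数,防止幅值爆炸导致非线性饱和。"""
    norm = np.linalg.norm(h, axis=-1, keepdims=True)
    return h / (norm + eps)

rng = np.random.default_rng(0)
h = rng.standard_normal((4, 256)) * 1e3   # 模拟幅值异常大的倒数第二层特征
h_hat = normalize_penultimate(h)          # 归一化后每行范数约为1
```

归一化后,后续线性层看到的输入幅值有界,从而缓解论文中观察到的数值不稳定。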

ML-16-标题: User-Level Private Learning via Correlated Sampling

链接: https://arxiv.org/abs/2110.11208
作者: Badih Ghazi, Ravi Kumar, Pasin Manurangsi
备注: To appear in NeurIPS 2021

点击查看摘要

Abstract: Most works in learning with differential privacy (DP) have focused on the setting where each user has a single sample. In this work, we consider the setting where each user holds mm samples and the privacy protection is enforced at the level of each user’s data. We show that, in this setting, we may learn with a much fewer number of users. Specifically, we show that, as long as each user receives sufficiently many samples, we can learn any privately learnable class via an (ϵ,δ)(\epsilon, \delta)-DP algorithm using only O(log(1/δ)/ϵ)O(\log(1/\delta)/\epsilon) users. For ϵ\epsilon-DP algorithms, we show that we can learn using only Oϵ(d)O_{\epsilon}(d) users even in the local model, where dd is the probabilistic representation dimension. In both cases, we show a nearly-matching lower bound on the number of users required. A crucial component of our results is a generalization of global stability [Bun et al., FOCS 2020] that allows the use of public randomness. Under this relaxed notion, we employ a correlated sampling strategy to show that the global stability can be boosted to be arbitrarily close to one, at a polynomial expense in the number of samples.

摘要:大多数差分隐私(DP)学习的工作都关注每个用户只有一个样本的设定。本文考虑每个用户持有 $m$ 个样本、且隐私保护在每个用户数据层面实施的设定。我们证明,在该设定下学习所需的用户数可以大幅减少。具体而言,只要每个用户拥有足够多的样本,我们就可以用一个 $(\epsilon, \delta)$-DP 算法、仅用 $O(\log(1/\delta)/\epsilon)$ 个用户学习任何可私有学习的类。对于 $\epsilon$-DP 算法,我们证明即使在本地模型下也只需 $O_{\epsilon}(d)$ 个用户即可学习,其中 $d$ 是概率表示维度。在这两种情形下,我们都给出了所需用户数上几乎匹配的下界。我们结果的一个关键组成部分是对全局稳定性 [Bun et al., FOCS 2020] 的推广,它允许使用公共随机性。在这一放宽的概念下,我们采用相关采样策略,证明了以样本数的多项式代价可以将全局稳定性提升到任意接近1。

ML-17-标题: Anti-Concentrated Confidence Bonuses for Scalable Exploration

链接: https://arxiv.org/abs/2110.11202
作者: Jordan T. Ash, Cyril Zhang, Surbhi Goel, Akshay Krishnamurthy, Sham Kakade
备注:

点击查看摘要

Abstract: Intrinsic rewards play a central role in handling the exploration-exploitation trade-off when designing sequential decision-making algorithms, in both foundational theory and state-of-the-art deep reinforcement learning. The LinUCB algorithm, a centerpiece of the stochastic linear bandits literature, prescribes an elliptical bonus which addresses the challenge of leveraging shared information in large action spaces. This bonus scheme cannot be directly transferred to high-dimensional exploration problems, however, due to the computational cost of maintaining the inverse covariance matrix of action features. We introduce \emph{anti-concentrated confidence bounds} for efficiently approximating the elliptical bonus, using an ensemble of regressors trained to predict random noise from policy network-derived features. Using this approximation, we obtain stochastic linear bandit algorithms which obtain O~(dT)\tilde O(d \sqrt{T}) regret bounds for poly(d)\mathrm{poly}(d) fixed actions. We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic reward heuristics on Atari benchmarks.

摘要:无论在基础理论还是最先进的深度强化学习中,内在奖励在设计序贯决策算法时都对处理探索-利用权衡起着核心作用。作为随机线性bandit文献核心的LinUCB算法给出了一种椭圆置信奖励,解决了在大动作空间中利用共享信息的挑战。然而,由于维护动作特征逆协方差矩阵的计算代价,这种奖励方案无法直接迁移到高维探索问题。我们引入了"反集中置信界",通过训练一组回归器去预测由策略网络特征导出的随机噪声,来高效地近似椭圆奖励。利用该近似,我们得到的随机线性bandit算法对 $\mathrm{poly}(d)$ 个固定动作可获得 $\tilde O(d\sqrt{T})$ 的遗憾界。我们还为深度强化学习开发了一个实用变体,在Atari基准上与当前的内在奖励启发式方法相比具有竞争力。
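"用一组拟合随机噪声的回归器近似椭圆奖励"可以用下面的最小示意理解:对训练分布内的特征,各回归器的预测彼此一致,探索奖励小;对新颖特征,预测分歧大,奖励大。具体维度与奖励形式为演示性假设:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 8, 200, 10
Phi = rng.standard_normal((n, d))       # 已访问状态-动作的特征
targets = rng.standard_normal((n, k))   # 每个回归器各自拟合的随机噪声目标
lam = 1e-3
# 岭回归闭式解:W 的每一列对应一个回归器的权重
W = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ targets)

def bonus(phi):
    """k个回归器预测的分歧程度,作为探索奖励(近似椭圆置信奖励)。"""
    preds = phi @ W
    return preds.std()

seen = bonus(Phi[0])                          # 训练分布内的特征:奖励小
novel = bonus(100 * rng.standard_normal(d))   # 远离训练分布的特征:奖励大
```

可以验证,该分歧量近似于 $\sqrt{\phi^\top A^{-1}\Phi^\top\Phi A^{-1}\phi}$(其中 $A=\Phi^\top\Phi+\lambda I$),在小 $\lambda$ 时即与椭圆奖励同阶,但无需显式维护逆协方差矩阵。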

ML-18-标题: Sensing Cox Processes via Posterior Sampling and Positive Bases

链接: https://arxiv.org/abs/2110.11181
作者: Mojmír Mutný, Andreas Krause
备注:

点击查看摘要

Abstract: We study adaptive sensing of Cox point processes, a widely used model from spatial statistics. We introduce three tasks: maximization of captured events, search for the maximum of the intensity function and learning level sets of the intensity function. We model the intensity function as a sample from a truncated Gaussian process, represented in a specially constructed positive basis. In this basis, the positivity constraint on the intensity function has a simple form. We show how an minimal description positive basis can be adapted to the covariance kernel, non-stationarity and make connections to common positive bases from prior works. Our adaptive sensing algorithms use Langevin dynamics and are based on posterior sampling (\textsc{Cox-Thompson}) and top-two posterior sampling (\textsc{Top2}) principles. With latter, the difference between samples serves as a surrogate to the uncertainty. We demonstrate the approach using examples from environmental monitoring and crime rate modeling, and compare it to the classical Bayesian experimental design approach.

摘要:我们研究Cox点过程(空间统计中广泛使用的模型)的自适应感知。我们引入三个任务:最大化捕获的事件数、搜索强度函数的最大值,以及学习强度函数的水平集。我们将强度函数建模为来自截断高斯过程的样本,并在一个特殊构造的正基中表示它;在该基下,强度函数的非负约束具有简单形式。我们展示了如何让一个最小描述的正基适配协方差核与非平稳性,并将其与已有工作中常用的正基建立联系。我们的自适应感知算法使用Langevin动力学,基于后验采样(\textsc{Cox-Thompson})和top-two后验采样(\textsc{Top2})原则;在后者中,样本之间的差异充当不确定性的替代量。我们通过环境监测与犯罪率建模的实例演示了该方法,并将其与经典的贝叶斯实验设计方法进行了比较。

ML-19-标题: DeLag: Detecting Latency Degradation Patterns in Service-based Systems

链接: https://arxiv.org/abs/2110.11155
作者: Luca Traini, Vittorio Cortellessa
备注:

点击查看摘要

Abstract: Performance debugging in production is a fundamental activity in modern service-based systems. The diagnosis of performance issues is often time-consuming, since it requires thorough inspection of large volumes of traces and performance indices. In this paper we present DeLag, a novel automated search-based approach for diagnosing performance issues in service-based systems. DeLag identifies subsets of requests that show, in the combination of their Remote Procedure Call execution times, symptoms of potentially relevant performance issues. We call such symptoms Latency Degradation Patterns. DeLag simultaneously search for multiple latency degradation patterns while optimizing precision, recall and latency dissimilarity. Experimentation on 700 datasets of requests generated from two microservice-based systems shows that our approach provide better and more stable effectiveness than three state-of-the-art approaches and general purpose machine learning clustering algorithms. Moreover, DeLag outperforms in terms of efficiency the second and the third most effective baseline techniques on the largest datasets used in our evaluation.

摘要:生产环境中的性能调试是现代基于服务的系统中的一项基本活动。性能问题的诊断往往十分耗时,因为它需要对大量链路追踪数据和性能指标进行彻底检查。本文提出DeLag,一种新颖的基于自动化搜索的方法,用于诊断基于服务的系统中的性能问题。DeLag识别这样一些请求子集:它们在远程过程调用(RPC)执行时间的组合上表现出潜在相关性能问题的症状,我们称这类症状为延迟退化模式。DeLag在优化精确率、召回率和延迟相异度的同时,同时搜索多个延迟退化模式。在由两个基于微服务的系统生成的700个请求数据集上的实验表明,我们的方法比三种最先进的方法和通用机器学习聚类算法提供了更好、更稳定的效果。此外,在我们评估所用的最大数据集上,DeLag在效率上优于效果第二和第三的基线技术。

ML-20-标题: Personalized Transfer of User Preferences for Cross-domain Recommendation

链接: https://arxiv.org/abs/2110.11154
作者: Yongchun Zhu, Zhenwei Tang, Yudan Liu, Fuzhen Zhuang, Ruobing Xie, Xu Zhang, Leyu Lin, Qing He
备注: Accepted by WSDM 2022

点击查看摘要

Abstract: Cold-start problem is still a very challenging problem in recommender systems. Fortunately, the interactions of the cold-start users in the auxiliary source domain can help cold-start recommendations in the target domain. How to transfer user’s preferences from the source domain to the target domain, is the key issue in Cross-domain Recommendation (CDR) which is a promising solution to deal with the cold-start problem. Most existing methods model a common preference bridge to transfer preferences for all users. Intuitively, since preferences vary from user to user, the preference bridges of different users should be different. Along this line, we propose a novel framework named Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR). Specifically, a meta network fed with users’ characteristic embeddings is learned to generate personalized bridge functions to achieve personalized transfer of preferences for each user. To learn the meta network stably, we employ a task-oriented optimization procedure. With the meta-generated personalized bridge function, the user’s preference embedding in the source domain can be transformed into the target domain, and the transformed user preference embedding can be utilized as the initial embedding for the cold-start user in the target domain. Using large real-world datasets, we conduct extensive experiments to evaluate the effectiveness of PTUPCDR on both cold-start and warm-start stages. The code has been available at \url{this https URL}.

摘要:冷启动问题仍然是推荐系统中极具挑战性的问题。幸运的是,冷启动用户在辅助源域中的交互可以帮助目标域中的冷启动推荐。如何将用户偏好从源域迁移到目标域,是跨域推荐(CDR)的关键问题,而CDR是应对冷启动问题的一种有前景的方案。大多数现有方法为所有用户建模一个公共的偏好桥来迁移偏好。直观上,由于偏好因人而异,不同用户的偏好桥也应当不同。沿着这一思路,我们提出了一个名为"面向跨域推荐的用户偏好个性化迁移"(PTUPCDR)的新颖框架。具体地,我们学习一个以用户特征嵌入为输入的元网络,生成个性化的桥函数,从而为每个用户实现个性化的偏好迁移。为了稳定地学习元网络,我们采用面向任务的优化过程。借助元网络生成的个性化桥函数,用户在源域中的偏好嵌入可以被变换到目标域,变换后的用户偏好嵌入可作为目标域中冷启动用户的初始嵌入。我们在大规模真实数据集上进行了广泛实验,评估了PTUPCDR在冷启动和热启动阶段的有效性。代码已在\url{此HTTPS URL}公开。
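"元网络生成个性化桥函数"的结构可以用下面几行numpy示意:元网络读入用户特征嵌入,输出该用户专属的桥矩阵,再用它把源域用户嵌入映射到目标域作为冷启动初始嵌入。网络结构、各维度与随机权重均为演示性假设:

```python
import numpy as np

rng = np.random.default_rng(0)
d_src, d_tgt, d_char, hidden = 8, 8, 6, 16
W1 = rng.standard_normal((d_char, hidden)) * 0.1          # 元网络第一层(假设)
W2 = rng.standard_normal((hidden, d_src * d_tgt)) * 0.1   # 元网络第二层(假设)

def personalized_bridge(char_emb):
    """元网络:由用户特征嵌入生成该用户专属的桥矩阵。"""
    h = np.tanh(char_emb @ W1)
    return (h @ W2).reshape(d_src, d_tgt)

char_emb = rng.standard_normal(d_char)   # 用户特征嵌入(如由交互序列聚合而来)
u_src = rng.standard_normal(d_src)       # 该用户在源域的偏好嵌入
u_tgt_init = u_src @ personalized_bridge(char_emb)  # 目标域冷启动初始嵌入
```

关键点在于桥矩阵不是全局共享的参数,而是元网络对每个用户动态生成的输出。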

ML-21-标题: Towards strong pruning for lottery tickets with non-zero biases

链接: https://arxiv.org/abs/2110.11150
作者: Jonas Fischer, Rebekka Burkholz
备注:

点击查看摘要

Abstract: The strong lottery ticket hypothesis holds the promise that pruning randomly initialized deep neural networks could offer a computationally efficient alternative to deep learning with stochastic gradient descent. Common parameter initialization schemes and existence proofs, however, are focused on networks with zero biases, thus foregoing the potential universal approximation property of pruning. To fill this gap, we extend multiple initialization schemes and existence proofs to non-zero biases, including explicit ‘looks-linear’ approaches for ReLU activation functions. These do not only enable truly orthogonal parameter initialization but also reduce potential pruning errors. In experiments on standard benchmark data sets, we further highlight the practical benefits of non-zero bias initialization schemes, and present theoretically inspired extensions for state-of-the-art strong lottery ticket pruning.

摘要:强彩票假设带来了这样的希望:对随机初始化的深度神经网络进行剪枝,可以为基于随机梯度下降的深度学习提供一种计算上高效的替代方案。然而,常见的参数初始化方案与存在性证明都集中于零偏置的网络,从而放弃了剪枝潜在的万能逼近性质。为填补这一空白,我们将多种初始化方案和存在性证明扩展到非零偏置,包括针对ReLU激活函数的显式"looks-linear"方法。这些方法不仅能实现真正正交的参数初始化,还能降低潜在的剪枝误差。在标准基准数据集的实验中,我们进一步展示了非零偏置初始化方案的实际优势,并为最先进的强彩票剪枝给出了具有理论启发的扩展。

ML-22-标题: Inverse Optimal Control Adapted to the Noise Characteristics of the Human Sensorimotor System

链接: https://arxiv.org/abs/2110.11130
作者: Matthias Schultheis, Dominik Straub, Constantin A. Rothkopf
备注: 24 pages, 11 figures, to be published at NeurIPS 2021

点击查看摘要

Abstract: Computational level explanations based on optimal feedback control with signal-dependent noise have been able to account for a vast array of phenomena in human sensorimotor behavior. However, commonly a cost function needs to be assumed for a task and the optimality of human behavior is evaluated by comparing observed and predicted trajectories. Here, we introduce inverse optimal control with signal-dependent noise, which allows inferring the cost function from observed behavior. To do so, we formalize the problem as a partially observable Markov decision process and distinguish between the agent’s and the experimenter’s inference problems. Specifically, we derive a probabilistic formulation of the evolution of states and belief states and an approximation to the propagation equation in the linear-quadratic Gaussian problem with signal-dependent noise. We extend the model to the case of partial observability of state variables from the point of view of the experimenter. We show the feasibility of the approach through validation on synthetic data and application to experimental data. Our approach enables recovering the costs and benefits implicit in human sequential sensorimotor behavior, thereby reconciling normative and descriptive approaches in a computational framework.

摘要:基于带信号相关噪声的最优反馈控制的计算层面解释,已能够解释人类感知运动行为中的大量现象。然而,这类方法通常需要为任务假设一个成本函数,并通过比较观测轨迹与预测轨迹来评估人类行为的最优性。本文引入带信号相关噪声的逆最优控制,使得从观测行为中推断成本函数成为可能。为此,我们将问题形式化为部分可观测马尔可夫决策过程,并区分智能体与实验者各自的推断问题。具体而言,我们推导了状态与信念状态演化的概率表述,以及带信号相关噪声的线性二次高斯问题中传播方程的一种近似,并从实验者的角度将模型扩展到状态变量部分可观测的情形。我们通过在合成数据上的验证以及在实验数据上的应用展示了该方法的可行性。我们的方法能够恢复人类序贯感知运动行为中隐含的成本与收益,从而在一个计算框架中调和规范性方法与描述性方法。

ML-23-标题: RoMA: a Method for Neural Network Robustness Measurement and Assessment

链接: https://arxiv.org/abs/2110.11088
作者: Natan Levy, Guy Katz
备注:

点击查看摘要

Abstract: Neural network models have become the leading solution for a large variety of tasks, such as classification, language processing, protein folding, and others. However, their reliability is heavily plagued by adversarial inputs: small input perturbations that cause the model to produce erroneous outputs. Adversarial inputs can occur naturally when the system’s environment behaves randomly, even in the absence of a malicious adversary, and are a severe cause for concern when attempting to deploy neural networks within critical systems. In this paper, we present a new statistical method, called Robustness Measurement and Assessment (RoMA), which can measure the expected robustness of a neural network model. Specifically, RoMA determines the probability that a random input perturbation might cause misclassification. The method allows us to provide formal guarantees regarding the expected frequency of errors that a trained model will encounter after deployment. Our approach can be applied to large-scale, black-box neural networks, which is a significant advantage compared to recently proposed verification methods. We apply our approach in two ways: comparing the robustness of different models, and measuring how a model’s robustness is affected by the magnitude of input perturbation. One interesting insight obtained through this work is that, in a classification network, different output labels can exhibit very different robustness levels. We term this phenomenon categorial robustness. Our ability to perform risk and robustness assessments on a categorial basis opens the door to risk mitigation, which may prove to be a significant step towards neural network certification in safety-critical applications.

摘要:神经网络模型已成为分类、语言处理、蛋白质折叠等各类任务的主流解决方案。然而,其可靠性深受对抗性输入的困扰:微小的输入扰动就会导致模型产生错误输出。即使不存在恶意对手,当系统环境随机变化时对抗性输入也可能自然出现,这使得在关键系统中部署神经网络时令人严重担忧。本文提出一种新的统计方法,称为鲁棒性测量与评估(RoMA),可以度量神经网络模型的期望鲁棒性。具体而言,RoMA确定随机输入扰动导致误分类的概率。该方法使我们能够就训练好的模型在部署后预期遇到的错误频率提供形式化保证。与近期提出的验证方法相比,我们的方法可应用于大规模黑盒神经网络,这是一个显著优势。我们以两种方式应用该方法:比较不同模型的鲁棒性,以及度量模型鲁棒性如何受输入扰动幅度的影响。这项工作得到的一个有趣洞见是:在分类网络中,不同的输出标签可能表现出差异很大的鲁棒性水平。我们将这种现象称为类别鲁棒性。能够按类别进行风险与鲁棒性评估为风险缓解打开了大门,这可能是迈向安全关键应用中神经网络认证的重要一步。
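RoMA式统计度量的核心是一个很朴素的蒙特卡洛过程:对输入反复施加随机扰动,统计预测翻转的频率作为误分类概率的估计。下面用一个简单线性分类器代替黑盒网络做最小演示(模型、扰动幅度与采样数均为演示性假设):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 4))   # "黑盒"模型:10类线性分类器(仅作演示)

def predict(x):
    return int(np.argmax(W @ x))

def roma_score(x, eps=0.05, n_samples=2000):
    """蒙特卡洛估计:随机输入扰动导致误分类的概率。"""
    base = predict(x)
    flips = 0
    for _ in range(n_samples):
        x_pert = x + eps * rng.standard_normal(x.shape)  # 随机扰动
        if predict(x_pert) != base:
            flips += 1
    return flips / n_samples

x = rng.standard_normal(4)
p_misclassify = roma_score(x)   # 该输入点的期望鲁棒性估计
```

对模型而言该过程完全是黑盒的(只调用`predict`),这正是摘要中强调的相对形式化验证方法的可扩展性优势;按输出类别分别统计即得到文中的"类别鲁棒性"。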

ML-24-标题: Continuous Authentication Using Mouse Movements Machine Learning and Minecraft

链接: https://arxiv.org/abs/2110.11080
作者: Nyle Siddiqui, Rushit Dave, Naeem Seliya
备注:

点击查看摘要

Abstract: Mouse dynamics has grown in popularity as a novel irreproducible behavioral biometric. Datasets which contain general unrestricted mouse movements from users are sparse in the current literature. The Balabit mouse dynamics dataset produced in 2016 was made for a data science competition and despite some of its shortcomings, is considered to be the first publicly available mouse dynamics dataset. Collecting mouse movements in a dull administrative manner as Balabit does may unintentionally homogenize data and is also not representative of realworld application scenarios. This paper presents a novel mouse dynamics dataset that has been collected while 10 users play the video game Minecraft on a desktop computer. Binary Random Forest (RF) classifiers are created for each user to detect differences between a specific user’s movements and an imposter’s movements. Two evaluation scenarios are proposed to evaluate the performance of these classifiers; one scenario outperformed previous works in all evaluation metrics, reaching average accuracy rates of 92%, while the other scenario successfully reported reduced instances of false authentications of imposters.

摘要:鼠标动力学作为一种新颖的、难以复制的行为生物特征,正日益流行。在现有文献中,包含用户一般性、不受限制鼠标运动的数据集十分稀少。2016年发布的Balabit鼠标动力学数据集原本是为一场数据科学竞赛制作的,尽管存在一些缺点,它仍被认为是第一个公开可用的鼠标动力学数据集。像Balabit那样以枯燥的管理性方式采集鼠标运动,可能会无意中使数据同质化,也无法代表真实世界的应用场景。本文提出了一个新的鼠标动力学数据集,采集自10名用户在台式机上游玩视频游戏Minecraft的过程。我们为每个用户创建二分类随机森林(RF)分类器,以检测特定用户的运动与冒充者运动之间的差异。我们提出两种评估场景来评估这些分类器的性能:其中一种场景在所有评估指标上都超过了以往工作,平均准确率达到92%;另一种场景则成功减少了冒充者被错误认证的情形。
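鼠标动力学认证的基本流程可以用下面的最小示意理解:从轨迹中提取运动学特征(速度、加速度统计量),再为目标用户训练一个"本人 vs 冒充者"的二分类器。特征选择与模拟数据为演示性假设,且为保持示例自包含,这里用最近质心分类代替论文中的随机森林:

```python
import numpy as np

rng = np.random.default_rng(0)

def mouse_features(traj):
    """traj: (T, 2) 的屏幕坐标序列 -> 简单运动学特征向量。"""
    v = np.diff(traj, axis=0)            # 逐帧位移
    speed = np.linalg.norm(v, axis=1)    # 速度
    accel = np.diff(speed)               # 加速度
    return np.array([speed.mean(), speed.std(), np.abs(accel).mean()])

# 模拟数据:假设目标用户动作较平滑,冒充者动作更急促(仅为演示)
user_trajs = [np.cumsum(rng.normal(0, 1.0, (50, 2)), axis=0) for _ in range(20)]
impostor_trajs = [np.cumsum(rng.normal(0, 3.0, (50, 2)), axis=0) for _ in range(20)]
Xu = np.array([mouse_features(t) for t in user_trajs])
Xi = np.array([mouse_features(t) for t in impostor_trajs])
cu, ci = Xu.mean(0), Xi.mean(0)          # 两类特征质心

def is_genuine(traj):
    """最近质心二分类:替代论文中的随机森林,仅作结构示意。"""
    f = mouse_features(traj)
    return np.linalg.norm(f - cu) < np.linalg.norm(f - ci)

test_user = np.cumsum(rng.normal(0, 1.0, (50, 2)), axis=0)
test_imp = np.cumsum(rng.normal(0, 3.0, (50, 2)), axis=0)
```

实际系统中,连续认证会在滑动窗口上反复执行这一判定。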

ML-25-标题: RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System

链接: https://arxiv.org/abs/2110.11073
作者: Kai Wang, Zhene Zou, Qilin Deng, Yue Shang, Minghao Zhao, Runze Wu, Xudong Shen, Tangjie Lyu, Changjie Fan
备注: First version

点击查看摘要

Abstract: Reinforcement learning based recommender systems (RL-based RS) aims at learning a good policy from a batch of collected data, with casting sequential recommendation to multi-step decision-making tasks. However, current RL-based RS benchmarks commonly have a large reality gap, because they involve artificial RL datasets or semi-simulated RS datasets, and the trained policy is directly evaluated in the simulation environment. In real-world situations, not all recommendation problems are suitable to be transformed into reinforcement learning problems. Unlike previous academic RL researches, RL-based RS suffer from extrapolation error and the difficulties of being well validated before deployment. In this paper, we introduce the RL4RS (Reinforcement Learning for Recommender Systems) benchmark - a new resource fully collected from industrial applications to train and evaluate RL algorithms with special concerns on the above issues. It contains two datasets, tuned simulation environments, related advanced RL baselines, data understanding tools, and counterfactual policy evaluation algorithms. The RL4RS suit can be found at this https URL. In addition to the RL-based recommender systems, we expect the resource to contribute to research in reinforcement learning and neural combinatorial optimization.

摘要:基于强化学习的推荐系统(RL-based RS)旨在从一批已收集的数据中学习良好的策略,将序贯推荐转化为多步决策任务。然而,目前基于RL的RS基准通常与现实差距很大,因为它们依赖人工的RL数据集或半模拟的RS数据集,且训练出的策略直接在模拟环境中评估。在现实情形中,并非所有推荐问题都适合转化为强化学习问题。与以往的学术RL研究不同,基于RL的RS面临外推误差以及部署前难以充分验证的困难。本文提出RL4RS(Reinforcement Learning for Recommender Systems)基准:一个完全采集自工业应用的新资源,用于在特别关注上述问题的前提下训练和评估RL算法。它包含两个数据集、调校好的模拟环境、相关的先进RL基线、数据理解工具以及反事实策略评估算法。RL4RS套件可在此HTTPS URL获取。除了基于RL的推荐系统,我们也期望该资源能促进强化学习与神经组合优化方面的研究。

ML-26-标题: Sequential Modeling with Multiple Attributes for Watchlist Recommendation in E-Commerce

链接: https://arxiv.org/abs/2110.11072
作者: Uriel Singer, Haggai Roitman, Yotam Eshel, Alexander Nus, Ido Guy, Or Levi, Idan Hasson, Eliyahu Kiperwasser
备注:

点击查看摘要

Abstract: In e-commerce, the watchlist enables users to track items over time and has emerged as a primary feature, playing an important role in users’ shopping journey. Watchlist items typically have multiple attributes whose values may change over time (e.g., price, quantity). Since many users accumulate dozens of items on their watchlist, and since shopping intents change over time, recommending the top watchlist items in a given context can be valuable. In this work, we study the watchlist functionality in e-commerce and introduce a novel watchlist recommendation task. Our goal is to prioritize which watchlist items the user should pay attention to next by predicting the next items the user will click. We cast this task as a specialized sequential recommendation task and discuss its characteristics. Our proposed recommendation model, Trans2D, is built on top of the Transformer architecture, where we further suggest a novel extended attention mechanism (Attention2D) that allows to learn complex item-item, attribute-attribute and item-attribute patterns from sequential-data with multiple item attributes. Using a large-scale watchlist dataset from eBay, we evaluate our proposed model, where we demonstrate its superiority compared to multiple state-of-the-art baselines, many of which are adapted for this task.

摘要:在电子商务中,关注列表(watchlist)使用户能够长期跟踪商品,已成为一项核心功能,在用户购物旅程中发挥着重要作用。关注列表中的商品通常具有多个属性,其取值可能随时间变化(如价格、数量)。由于许多用户会在关注列表中累积数十件商品,且购物意图会随时间变化,在给定上下文中推荐最值得关注的商品很有价值。本文研究电子商务中的关注列表功能,并提出一个新颖的关注列表推荐任务。我们的目标是通过预测用户接下来会点击的商品,来排定用户接下来应当关注哪些关注列表商品的优先级。我们将该任务视作一种特化的序贯推荐任务并讨论其特点。我们提出的推荐模型Trans2D构建在Transformer架构之上,并进一步提出一种新颖的扩展注意力机制(Attention2D),能够从带有多个商品属性的序列数据中学习复杂的"商品-商品"、"属性-属性"和"商品-属性"模式。我们使用来自eBay的大规模关注列表数据集评估了所提模型,展示了其相对于多种最先进基线(其中许多经过适配以处理此任务)的优越性。

ML-27-标题: A Nested Weighted Tchebycheff Multi-Objective Bayesian Optimization Approach for Flexibility of Unknown Utopia Estimation in Expensive Black-box Design Problems

链接: https://arxiv.org/abs/2110.11070
作者: Arpan Biswas, Claudio Fuentes, Christopher Hoyle
备注: 35 pages, 8 figures in main text and 2 figures in supplementary

点击查看摘要

Abstract: We propose a nested weighted Tchebycheff Multi-objective Bayesian optimization framework where we build a regression model selection procedure from an ensemble of models, towards better estimation of the uncertain parameters of the weighted-Tchebycheff expensive black-box multi-objective function. In existing work, a weighted Tchebycheff MOBO approach has been demonstrated which attempts to estimate the unknown utopia in formulating acquisition function, through calibration using a priori selected regression model. However, the existing MOBO model lacks flexibility in selecting the appropriate regression models given the guided sampled data and therefore, can under-fit or over-fit as the iterations of the MOBO progress, reducing the overall MOBO performance. As it is too complex to a priori guarantee a best model in general, this motivates us to consider a portfolio of different families of predictive models fitted with current training data, guided by the WTB MOBO; the best model is selected following a user-defined prediction root mean-square-error-based approach. The proposed approach is implemented in optimizing a multi-modal benchmark problem and a thin tube design under constant loading of temperature-pressure, with minimizing the risk of creep-fatigue failure and design cost. Finally, the nested weighted Tchebycheff MOBO model performance is compared with different MOBO frameworks with respect to accuracy in parameter estimation, Pareto-optimal solutions and function evaluation cost. This method is generalized enough to consider different families of predictive models in the portfolio for best model selection, where the overall design architecture allows for solving any high-dimensional (multiple functions) complex black-box problems and can be extended to any other global criterion multi-objective optimization methods where prior knowledge of utopia is required.

摘要:我们提出一种嵌套加权Tchebycheff多目标贝叶斯优化(MOBO)框架,在其中基于模型集成构建回归模型选择过程,以便更好地估计加权Tchebycheff型昂贵黑盒多目标函数中的不确定参数。已有工作展示了一种加权Tchebycheff MOBO方法,它试图通过先验选定的回归模型进行校准,来估计构造采集函数所需的未知乌托邦点。然而,现有MOBO模型在根据引导采样数据选择合适回归模型方面缺乏灵活性,因此随MOBO迭代的推进可能欠拟合或过拟合,降低整体MOBO性能。由于一般而言很难先验保证某个模型最优,这促使我们考虑由WTB MOBO引导、用当前训练数据拟合的不同族预测模型组成的模型组合;并按照用户定义的、基于预测均方根误差的方法选出最佳模型。我们将所提方法用于优化一个多峰基准问题,以及温度-压力恒定载荷下的薄管设计(最小化蠕变-疲劳失效风险与设计成本)。最后,在参数估计精度、帕累托最优解和函数评估成本方面,将嵌套加权Tchebycheff MOBO模型的性能与不同MOBO框架进行了比较。该方法具有足够的泛化性,可在模型组合中考虑不同族的预测模型以进行最佳模型选择;其整体设计架构允许求解任何高维(多函数)复杂黑盒问题,并可推广到任何其它需要乌托邦点先验知识的全局准则多目标优化方法。
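加权Tchebycheff标量化是上述框架的基本构件:它把多目标问题压缩为 $\min_x \max_i w_i\,|f_i(x) - z_i^*|$,其中 $z^*$ 是乌托邦点。论文处理的是 $z^*$ 未知、需要用回归模型估计的情形;下面的示意假设 $z^*$ 已知,仅演示标量化本身(目标函数与权重为演示性假设):

```python
import numpy as np

def weighted_tchebycheff(f_vals, weights, utopia):
    """加权Tchebycheff标量化:各目标到乌托邦点的加权偏差的最大值。"""
    return np.max(weights * np.abs(f_vals - utopia))

# 两个演示目标:f1(x)=x^2 在 x=0 取最优,f2(x)=(x-2)^2 在 x=2 取最优
f = lambda x: np.array([x**2, (x - 2)**2])
utopia = np.array([0.0, 0.0])      # 此处假设乌托邦点精确已知
weights = np.array([0.5, 0.5])

xs = np.linspace(-1, 3, 401)
scores = [weighted_tchebycheff(f(x), weights, utopia) for x in xs]
x_best = xs[int(np.argmin(scores))]   # 等权重下的折中解,落在 x=1 附近
```

对相互冲突的两个目标,等权重的Tchebycheff解恰好是两者的折中点;改变权重即可在帕累托前沿上扫描不同折中。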

ML-28-标题: Bayesian Meta-Learning Through Variational Gaussian Processes

链接: https://arxiv.org/abs/2110.11044
作者: Vivek Myers, Nikhil Sardana
备注:

点击查看摘要

Abstract: Recent advances in the field of meta-learning have tackled domains consisting of large numbers of small (“few-shot”) supervised learning tasks. Meta-learning algorithms must be able to rapidly adapt to any individual few-shot task, fitting to a small support set within a task and using it to predict the labels of the task’s query set. This problem setting can be extended to the Bayesian context, wherein rather than predicting a single label for each query data point, a model predicts a distribution of labels capturing its uncertainty. Successful methods in this domain include Bayesian ensembling of MAML-based models, Bayesian neural networks, and Gaussian processes with learned deep kernel and mean functions. While Gaussian processes have a robust Bayesian interpretation in the meta-learning context, they do not naturally model non-Gaussian predictive posteriors for expressing uncertainty. In this paper, we design a theoretically principled method, VMGP, extending Gaussian-process-based meta-learning to allow for high-quality, arbitrary non-Gaussian uncertainty predictions. On benchmark environments with complex non-smooth or discontinuous structure, we find our VMGP method performs significantly better than existing Bayesian meta-learning baselines.

摘要:元学习领域的最新进展已能处理由大量小型("少样本")监督学习任务组成的问题域。元学习算法必须能够快速适应任一少样本任务:在任务内的小型支撑集上拟合,并用它预测该任务查询集的标签。这一问题设定可以扩展到贝叶斯情形:模型不再为每个查询数据点预测单一标签,而是预测一个刻画其不确定性的标签分布。该领域的成功方法包括基于MAML模型的贝叶斯集成、贝叶斯神经网络,以及带有可学习深度核与均值函数的高斯过程。虽然高斯过程在元学习背景下具有稳健的贝叶斯解释,但它们并不能自然地建模用于表达不确定性的非高斯预测后验。本文设计了一种具有理论原则的方法VMGP,扩展基于高斯过程的元学习,以支持高质量、任意形式的非高斯不确定性预测。在具有复杂非光滑或不连续结构的基准环境中,我们发现VMGP方法的表现显著优于现有的贝叶斯元学习基线。

ML-29-标题: FedGEMS: Federated Learning of Larger Server Models via Selective Knowledge Fusion

链接: https://arxiv.org/abs/2110.11027
作者: Sijie Cheng, Jingwen Wu, Yanghua Xiao, Yang Liu, Yang Liu
备注: Under review as a conference paper at ICLR 2022

点击查看摘要

Abstract: Today data is often scattered among billions of resource-constrained edge devices with security and privacy constraints. Federated Learning (FL) has emerged as a viable solution to learn a global model while keeping data private, but the model complexity of FL is impeded by the computation resources of edge nodes. In this work, we investigate a novel paradigm to take advantage of a powerful server model to break through model capacity in FL. By selectively learning from multiple teacher clients and itself, a server model develops in-depth knowledge and transfers its knowledge back to clients in return to boost their respective performance. Our proposed framework achieves superior performance on both server and client models and provides several advantages in a unified framework, including flexibility for heterogeneous client architectures, robustness to poisoning attacks, and communication efficiency between clients and server. By bridging FL effectively with larger server model training, our proposed paradigm paves ways for robust and continual knowledge accumulation from distributed and private data.

摘要:如今,数据往往分散在数十亿受安全与隐私约束、资源受限的边缘设备上。联邦学习(FL)已成为在保持数据私有的同时学习全局模型的可行方案,但 FL 的模型复杂度受限于边缘节点的计算资源。在这项工作中,我们研究了一种新范式,利用强大的服务器模型来突破 FL 中的模型容量限制。通过有选择地向多个教师客户端及其自身学习,服务器模型积累深入的知识,并将知识回传给客户端,以提升它们各自的性能。我们提出的框架在服务器和客户端模型上都取得了更优的性能,并在统一框架内提供多项优势,包括对异构客户端架构的灵活性、对投毒攻击的鲁棒性,以及客户端与服务器之间的通信效率。通过将 FL 与更大规模的服务器模型训练有效衔接,我们提出的范式为从分布式私有数据中稳健、持续地积累知识铺平了道路。

ML-30-标题: Watermarking Graph Neural Networks based on Backdoor Attacks

链接: https://arxiv.org/abs/2110.11024
作者: Jing Xu, Stjepan Picek
备注:

点击查看摘要

Abstract: Graph Neural Networks (GNNs) have achieved promising performance in various real-world applications. Building a powerful GNN model is not a trivial task, as it requires a large amount of training data, powerful computing resources, and human expertise on fine-tuning the model. What is more, with the development of adversarial attacks, e.g., model stealing attacks, GNNs raise challenges to model authentication. To avoid copyright infringement on GNNs, it is necessary to verify the ownership of the GNN models. In this paper, we present a watermarking framework for GNNs for both graph and node classification tasks. We 1) design two strategies to generate watermarked data for the graph classification and one for the node classification task, 2) embed the watermark into the host model through training to obtain the watermarked GNN model, and 3) verify the ownership of the suspicious model in a black-box setting. The experiments show that our framework can verify the ownership of GNN models with a very high probability (around $100\%$) for both tasks. In addition, we experimentally show that our watermarking approach is still effective even when considering suspicious models obtained from different architectures than the owner’s.

摘要:图神经网络(GNN)在各种现实应用中取得了可喜的性能。构建一个强大的 GNN 模型并非易事,它需要大量训练数据、强大的计算资源,以及微调模型所需的人力专业知识。此外,随着对抗攻击(例如模型窃取攻击)的发展,GNN 也给模型认证带来了挑战。为避免 GNN 的版权侵权,有必要对 GNN 模型的所有权进行验证。在本文中,我们提出了一个同时面向图分类和节点分类任务的 GNN 水印框架。我们 1)为图分类设计了两种、为节点分类设计了一种生成水印数据的策略;2)通过训练将水印嵌入宿主模型,得到带水印的 GNN 模型;3)在黑盒设定下验证可疑模型的所有权。实验表明,对于这两类任务,我们的框架都能以极高的概率(约 $100\%$)验证 GNN 模型的所有权。此外,实验还表明,即使可疑模型采用与所有者不同的架构,我们的水印方法依然有效。
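论文所述黑盒验证步骤的思路可以用如下极简代码说明(笔者的假设性示意,模型与触发集均为玩具替身,并非论文实现):所有者用带水印的触发样本查询可疑模型,若触发集上的准确率异常高(非水印模型几乎不可能达到),则判定拥有所有权。

```python
# Black-box ownership check: query the suspicious model on the trigger
# (watermark) set and claim ownership if trigger accuracy is improbably high.
def verify_ownership(model, trigger_inputs, trigger_labels, threshold=0.9):
    correct = sum(model(x) == y for x, y in zip(trigger_inputs, trigger_labels))
    return correct / len(trigger_inputs) >= threshold

# toy stand-ins for a watermarked model and an independently trained one
watermarked = lambda x: x % 3   # has "memorized" the trigger rule
independent = lambda x: 0       # predicts one class everywhere
triggers = list(range(30))
labels = [x % 3 for x in triggers]
print(verify_ownership(watermarked, triggers, labels))  # True
print(verify_ownership(independent, triggers, labels))  # False
```

实际框架中 `model` 是被查询的 GNN,触发集由论文的水印数据生成策略产生;阈值需依据非水印模型在触发集上的基线准确率来选取。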

ML-31-标题: Learning Time-Varying Graphs from Online Data

链接: https://arxiv.org/abs/2110.11017
作者: Alberto Natali, Elvin Isufi, Mario Coutino, Geert Leus
备注:

点击查看摘要

Abstract: This work proposes an algorithmic framework to learn time-varying graphs from online data. The generality offered by the framework renders it model-independent, i.e., it can be theoretically analyzed in its abstract formulation and then instantiated under a variety of model-dependent graph learning problems. This is possible by phrasing (time-varying) graph learning as a composite optimization problem, where different functions regulate different desiderata, e.g., data fidelity, sparsity or smoothness. Instrumental for the findings is recognizing that the dependence of the majority (if not all) data-driven graph learning algorithms on the data is exerted through the empirical covariance matrix, representing a sufficient statistic for the estimation problem. Its user-defined recursive update enables the framework to work in non-stationary environments, while iterative algorithms building on novel time-varying optimization tools explicitly take into account the temporal dynamics, speeding up convergence and implicitly including a temporal-regularization of the solution. We specialize the framework to three well-known graph learning models, namely, the Gaussian graphical model (GGM), the structural equation model (SEM), and the smoothness-based model (SBM), where we also introduce ad-hoc vectorization schemes for structured matrices (symmetric, hollows, etc.) which are crucial to perform correct gradient computations, other than enabling to work in low-dimensional vector spaces and hence easing storage requirements. After discussing the theoretical guarantees of the proposed framework, we corroborate it with extensive numerical tests in synthetic and real data.

摘要:这项工作提出了一个从在线数据中学习时变图的算法框架。该框架的通用性使其与具体模型无关:既可以在抽象形式下进行理论分析,又可以在各种依赖于模型的图学习问题中实例化。实现这一点的方式是将(时变)图学习表述为一个复合优化问题,其中不同的函数约束不同的目标,例如数据保真度、稀疏性或平滑性。关键在于认识到,绝大多数(如果不是全部)数据驱动的图学习算法对数据的依赖都是通过经验协方差矩阵体现的,它是该估计问题的充分统计量。对它的用户自定义递归更新使框架能够在非平稳环境中工作;而基于新型时变优化工具构建的迭代算法显式地考虑了时间动态,加快了收敛,并隐式地对解进行了时间正则化。我们将该框架具体化到三个著名的图学习模型:高斯图模型(GGM)、结构方程模型(SEM)和基于平滑性的模型(SBM);同时针对结构化矩阵(对称、空心等)引入了专门的向量化方案,这不仅能在低维向量空间中工作从而降低存储需求,而且对正确计算梯度至关重要。在讨论了所提框架的理论保证之后,我们通过在合成数据和真实数据上的大量数值实验对其加以验证。

ML-32-标题: A Utility Maximization Model of Pedestrian and Driver Interactions

链接: https://arxiv.org/abs/2110.11015
作者: Yi-Shin Lin, Aravinda Ramakrishnan Srinivasan, Matteo Leonetti, Jac Billington, Gustav Markkula
备注: 10 pages, 7 figures

点击查看摘要

Abstract: Many models account for the traffic flow of road users but few take the details of local interactions into consideration and how they could deteriorate into safety-critical situations. Building on the concept of sensorimotor control, we develop a modeling framework applying the principles of utility maximization, motor primitives, and intermittent action decisions to account for the details of interactive behaviors among road users. The framework connects these principles to the decision theory and is applied to determine whether such an approach can reproduce the following phenomena: When two pedestrians travel on crossing paths, (a) their interaction is sensitive to initial asymmetries, and (b) based on which, they rapidly resolve collision conflict by adapting their behaviors. When a pedestrian crosses the road while facing an approaching car, (c) either road user yields to the other to resolve their conflict, akin to the pedestrian interaction, and (d) the outcome reveals a specific situational kinematics, associated with the nature of vehicle acceleration. We show that these phenomena emerge naturally from our modeling framework when the model can evolve its parameters as a consequence of the situations. We believe that the modeling framework and phenomenon-centered analysis offer promising tools to understand road user interactions. We conclude with a discussion on how the model can be instrumental in studying the safety-critical situations when including other variables in road-user interactions.

摘要:许多模型刻画了道路使用者的交通流,但很少有模型考虑局部交互的细节,以及这些交互如何恶化为安全攸关的情形。基于感知运动控制的概念,我们开发了一个建模框架,应用效用最大化、运动基元和间歇性动作决策等原理,来刻画道路使用者之间交互行为的细节。该框架将这些原理与决策理论联系起来,并用于检验这一方法能否重现以下现象:当两名行人沿交叉路径行进时,(a)他们的交互对初始的不对称性很敏感,并且(b)基于这些不对称性,他们通过调整自身行为迅速化解碰撞冲突。当行人面对驶来的汽车过马路时,(c)与行人之间的交互类似,某一方会让行以化解冲突,并且(d)交互结果呈现出与车辆加减速特性相关的特定情境运动学。我们表明,当模型能够根据情境演化其参数时,这些现象会从我们的建模框架中自然涌现。我们相信,该建模框架和以现象为中心的分析为理解道路使用者交互提供了有前景的工具。最后,我们讨论了在道路使用者交互中纳入其他变量时,该模型如何有助于研究安全攸关情形。

ML-33-标题: Bristle: Decentralized Federated Learning in Byzantine Non-i.i.d. Environments

链接: https://arxiv.org/abs/2110.11006
作者: Joost Verbraeken, Martijn de Vos, Johan Pouwelse
备注:

点击查看摘要

Abstract: Federated learning (FL) is a privacy-friendly type of machine learning where devices locally train a model on their private data and typically communicate model updates with a server. In decentralized FL (DFL), peers communicate model updates with each other instead. However, DFL is challenging since (1) the training data possessed by different peers is often non-i.i.d. (i.e., distributed differently between the peers) and (2) malicious, or Byzantine, attackers can share arbitrary model updates with other peers to subvert the training process. We address these two challenges and present Bristle, middleware between the learning application and the decentralized network layer. Bristle leverages transfer learning to predetermine and freeze the non-output layers of a neural network, significantly speeding up model training and lowering communication costs. To securely update the output layer with model updates from other peers, we design a fast distance-based prioritizer and a novel performance-based integrator. Their combined effect results in high resilience to Byzantine attackers and the ability to handle non-i.i.d. classes. We empirically show that Bristle converges to a consistent 95% accuracy in Byzantine environments, outperforming all evaluated baselines. In non-Byzantine environments, Bristle requires 83% fewer iterations to achieve 90% accuracy compared to state-of-the-art methods. We show that when the training classes are non-i.i.d., Bristle significantly outperforms the accuracy of the most Byzantine-resilient baselines by 2.3x while reducing communication costs by 90%.

摘要:联邦学习(FL)是一种隐私友好的机器学习方式,设备在本地用私有数据训练模型,并且通常与服务器交换模型更新。在去中心化联邦学习(DFL)中,节点之间直接相互交换模型更新。然而,DFL 具有挑战性,因为(1)不同节点持有的训练数据往往是非独立同分布的(即在节点之间分布不同);(2)恶意的(拜占庭)攻击者可以向其他节点分享任意的模型更新,以破坏训练过程。我们针对这两个挑战提出了 Bristle,一个位于学习应用与去中心化网络层之间的中间件。Bristle 利用迁移学习预先确定并冻结神经网络的非输出层,从而显著加快模型训练并降低通信成本。为了利用其他节点的模型更新安全地更新输出层,我们设计了一个快速的基于距离的优先排序器和一个新颖的基于性能的集成器。二者的组合带来了对拜占庭攻击者的高度鲁棒性以及处理非独立同分布类别的能力。实验表明,Bristle 在拜占庭环境中稳定收敛到 95% 的准确率,优于所有被评估的基线。在非拜占庭环境中,与最先进的方法相比,Bristle 达到 90% 准确率所需的迭代次数减少了 83%。当训练类别非独立同分布时,Bristle 的准确率显著超过最具拜占庭鲁棒性的基线 2.3 倍,同时将通信成本降低 90%。
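论文中"基于距离的优先排序器"的思路可以用如下示意代码说明(纯属笔者的简化示意,非 Bristle 实现):按与本地参数的欧氏距离对收到的对等节点更新排序,使离群(可能是拜占庭)的更新被排到最后或直接丢弃。

```python
import numpy as np

# Distance-based prioritization of incoming peer updates: keep the `keep`
# updates closest (in Euclidean distance) to the local parameters, so
# wildly divergent -- potentially Byzantine -- updates are dropped.
def prioritize(local, peer_updates, keep):
    dists = [np.linalg.norm(u - local) for u in peer_updates]
    order = np.argsort(dists)[:keep]
    return [peer_updates[i] for i in order]

local = np.zeros(4)
peers = [np.full(4, 0.1), np.full(4, 0.2), np.full(4, 50.0)]  # last one is an outlier
for u in prioritize(local, peers, keep=2):
    print(u)   # the two benign updates survive; the outlier is dropped
```

Bristle 在此之上还叠加了基于性能的集成器,仅靠距离筛选并不足以抵御精心构造的攻击。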

ML-34-标题: Interpretable Machine Learning for Resource Allocation with Application to Ventilator Triage

链接: https://arxiv.org/abs/2110.10994
作者: Julien Grand-Clément, Carri Chan, Vineet Goyal, Elizabeth Chuang
备注:

点击查看摘要

Abstract: Rationing of healthcare resources is a challenging decision that policy makers and providers may be forced to make during a pandemic, natural disaster, or mass casualty event. Well-defined guidelines to triage scarce life-saving resources must be designed to promote transparency, trust, and consistency. To facilitate buy-in and use during high-stress situations, these guidelines need to be interpretable and operational. We propose a novel data-driven model to compute interpretable triage guidelines based on policies for Markov Decision Process that can be represented as simple sequences of decision trees (“tree policies”). In particular, we characterize the properties of optimal tree policies and present an algorithm based on dynamic programming recursions to compute good tree policies. We utilize this methodology to obtain simple, novel triage guidelines for ventilator allocations for COVID-19 patients, based on real patient data from Montefiore hospitals. We also compare the performance of our guidelines to the official New York State guidelines that were developed in 2015 (well before the COVID-19 pandemic). Our empirical study shows that the number of excess deaths associated with ventilator shortages could be reduced significantly using our policy. Our work highlights the limitations of the existing official triage guidelines, which need to be adapted specifically to COVID-19 before being successfully deployed.

摘要:医疗资源的配给是决策者和医疗机构在大流行、自然灾害或大规模伤亡事件中可能被迫做出的艰难决策。对稀缺救生资源进行分诊的明确指南,必须以促进透明、信任和一致性为目标来设计。为了便于在高压情境下获得认同并投入使用,这些指南需要是可解释且可操作的。我们提出了一种新颖的数据驱动模型,基于马尔可夫决策过程的策略来计算可解释的分诊指南,这些策略可以表示为决策树的简单序列("树策略")。特别地,我们刻画了最优树策略的性质,并提出了一种基于动态规划递推的算法来计算良好的树策略。我们利用这一方法,基于 Montefiore 医院的真实患者数据,为 COVID-19 患者的呼吸机分配得到了简单而新颖的分诊指南。我们还将所得指南与 2015 年(远在 COVID-19 大流行之前)制定的纽约州官方指南进行了性能比较。我们的实证研究表明,采用我们的策略可以显著减少因呼吸机短缺造成的额外死亡。我们的工作凸显了现有官方分诊指南的局限性:它们需要针对 COVID-19 进行专门调整后才能成功部署。

ML-35-标题: Learning OFDM Waveforms with PAPR and ACLR Constraints

链接: https://arxiv.org/abs/2110.10987
作者: Mathieu Goutay, Fayçal Ait Aoudia, Jakob Hoydis, Jean-Marie Gorce
备注:

点击查看摘要

Abstract: An attractive research direction for future communication systems is the design of new waveforms that can both support high throughputs and present advantageous signal characteristics. Although most modern systems use orthogonal frequency-division multiplexing (OFDM) for its efficient equalization, this waveform suffers from multiple limitations such as a high adjacent channel leakage ratio (ACLR) and high peak-to-average power ratio (PAPR). In this paper, we propose a learning-based method to design OFDM-based waveforms that satisfy selected constraints while maximizing an achievable information rate. To that aim, we model the transmitter and the receiver as convolutional neural networks (CNNs) that respectively implement a high-dimensional modulation scheme and perform the detection of the transmitted bits. This leads to an optimization problem that is solved using the augmented Lagrangian method. Evaluation results show that the end-to-end system is able to satisfy target PAPR and ACLR constraints and allows significant throughput gains compared to a tone reservation (TR) baseline. An additional advantage is that no dedicated pilots are needed.

摘要:未来通信系统的一个有吸引力的研究方向是设计既能支持高吞吐量、又具备良好信号特性的新波形。尽管大多数现代系统因其高效的均衡能力而采用正交频分复用(OFDM),但这种波形存在多种局限,例如较高的邻道泄漏比(ACLR)和较高的峰均功率比(PAPR)。在本文中,我们提出一种基于学习的方法来设计基于 OFDM 的波形,在最大化可达信息速率的同时满足选定的约束。为此,我们将发射机和接收机建模为卷积神经网络(CNN),分别实现一种高维调制方案和对发送比特的检测。这导出一个优化问题,我们用增广拉格朗日方法求解。评估结果表明,该端到端系统能够满足目标 PAPR 和 ACLR 约束,并且与音调预留(TR)基线相比可带来显著的吞吐量增益。另一个优点是不需要专用导频。
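作为补充,论文约束的两个信号指标之一 PAPR 可以如下计算(笔者的示意代码,子载波数与过采样率均为假设):对频域符号做过采样 IFFT 得到时域 OFDM 信号,再取峰值功率与平均功率之比。

```python
import numpy as np

# Peak-to-average power ratio (PAPR, in dB) of the time-domain OFDM signal
# obtained by an oversampled IFFT of the frequency-domain symbols.
def papr_db(freq_symbols, oversample=4):
    n = len(freq_symbols)
    # zero-pad in the frequency domain to oversample the time-domain signal
    padded = np.zeros(n * oversample, dtype=complex)
    padded[:n // 2] = freq_symbols[:n // 2]
    padded[-(n - n // 2):] = freq_symbols[n // 2:]
    x = np.fft.ifft(padded)
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())

rng = np.random.default_rng(0)
# 64 random QPSK subcarriers; PAPR is typically on the order of 8-12 dB
qpsk = (rng.choice([-1, 1], 64) + 1j * rng.choice([-1, 1], 64)) / np.sqrt(2)
print(f"PAPR: {papr_db(qpsk):.1f} dB")
```

当所有子载波同相叠加(例如全 1 符号)时,时域信号出现尖峰,PAPR 接近理论上限 $10\log_{10}(N)$ dB;论文的端到端方法正是要在训练中把该指标约束在目标值以下。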

ML-36-标题: Sliced-Wasserstein Gradient Flows

链接: https://arxiv.org/abs/2110.10972
作者: Clément Bonet, Nicolas Courty, François Septier, Lucas Drumetz
备注:

点击查看摘要

Abstract: Minimizing functionals in the space of probability distributions can be done with Wasserstein gradient flows. To solve them numerically, a possible approach is to rely on the Jordan-Kinderlehrer-Otto (JKO) scheme which is analogous to the proximal scheme in Euclidean spaces. However, this bilevel optimization problem is known for its computational challenges, especially in high dimension. To alleviate it, very recent works propose to approximate the JKO scheme leveraging Brenier’s theorem, and using gradients of Input Convex Neural Networks to parameterize the density (JKO-ICNN). However, this method comes with a high computational cost and stability issues. Instead, this work proposes to use gradient flows in the space of probability measures endowed with the sliced-Wasserstein (SW) distance. We argue that this method is more flexible than JKO-ICNN, since SW enjoys a closed-form differentiable approximation. Thus, the density at each step can be parameterized by any generative model which alleviates the computational burden and makes it tractable in higher dimensions. Interestingly, we also show empirically that these gradient flows are strongly related to the usual Wasserstein gradient flows, and that they can be used to minimize efficiently diverse machine learning functionals.

摘要:在概率分布空间中极小化泛函可以通过 Wasserstein 梯度流来完成。为了对其进行数值求解,一种可行的途径是依靠 Jordan-Kinderlehrer-Otto(JKO)格式,它类似于欧几里得空间中的近端格式。然而,这一双层优化问题以其计算困难著称,在高维情形下尤甚。为缓解这一问题,最近的一些工作提出利用 Brenier 定理来近似 JKO 格式,并使用输入凸神经网络的梯度来参数化密度(JKO-ICNN)。但该方法计算成本高,且存在稳定性问题。与之不同,本工作提出在赋予切片 Wasserstein(SW)距离的概率测度空间中使用梯度流。我们认为这种方法比 JKO-ICNN 更灵活,因为 SW 距离具有可微的闭式近似。因此,每一步的密度都可以由任意生成模型来参数化,这减轻了计算负担,并使方法在更高维度下仍然可行。有趣的是,我们还通过实验表明,这些梯度流与通常的 Wasserstein 梯度流密切相关,并且可以用来高效地极小化多种机器学习泛函。
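论文所依赖的切片 Wasserstein 距离有简单的采样近似:把两组样本投影到随机方向上,对每个方向计算一维 Wasserstein 距离(一维最优传输只需对投影值排序配对),再对方向取平均。下面是笔者的示意实现(非论文代码,投影数等参数为假设):

```python
import numpy as np

# Monte-Carlo sliced-Wasserstein distance between two equal-size samples:
# average the 1-D p-Wasserstein distances along random projection directions.
def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    # project and sort: sorted samples are optimally paired in 1-D
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    return float(np.mean(np.abs(px - py) ** p) ** (1 / p))

rng = np.random.default_rng(1)
a = rng.normal(size=(500, 2))
b = rng.normal(size=(500, 2)) + np.array([3.0, 0.0])
print(sliced_wasserstein(a, a))  # 0 for identical samples
print(sliced_wasserstein(a, b))  # grows with the mean shift
```

正是因为每一项都是可微的排序与投影运算,SW 距离才能作为梯度流的目标直接反向传播,这也是论文方法相对 JKO-ICNN 的灵活性所在。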

ML-37-标题: Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness

链接: https://arxiv.org/abs/2110.10942
作者: Simon Geisler, Johanna Sommer, Jan Schuchardt, Aleksandar Bojchevski, Stephan Günnemann
备注:

点击查看摘要

Abstract: End-to-end (geometric) deep learning has seen first successes in approximating the solution of combinatorial optimization problems. However, generating data in the realm of NP-hard/-complete tasks brings practical and theoretical challenges, resulting in evaluation protocols that are too optimistic. Specifically, most datasets only capture a simpler subproblem and likely suffer from spurious features. We investigate these effects by studying adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features. For this purpose, we derive perturbation models for SAT and TSP. Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound, allowing us to determine the true label of perturbed samples without a solver. Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning. Although such robust solvers exist, we show empirically that the assessed neural solvers do not generalize well w.r.t. small perturbations of the problem instance.

摘要:端到端(几何)深度学习在近似求解组合优化问题上已经取得了初步成功。然而,在 NP 难/NP 完全任务的领域中生成数据存在实践和理论上的挑战,导致评估协议过于乐观。具体而言,大多数数据集只覆盖了一个更简单的子问题,并且很可能带有虚假特征。我们通过研究对抗鲁棒性(一种局部泛化性质)来考察这些影响,从而揭示困难的、与模型相关的实例以及虚假特征。为此,我们为 SAT 和 TSP 推导了扰动模型。与其他应用中围绕"不可感知性"这一主观概念设计扰动模型不同,我们的扰动模型高效且可靠,使我们无需求解器即可确定被扰动样本的真实标签。令人惊讶的是,在这类扰动下,表达能力足够强的神经求解器并不受监督学习中常见的"准确性—鲁棒性"权衡的限制。尽管这样的鲁棒求解器是存在的,但我们的实验表明,被评估的神经求解器对问题实例的微小扰动泛化能力不佳。

ML-38-标题: A channel attention based MLP-Mixer network for motor imagery decoding with EEG

链接: https://arxiv.org/abs/2110.10939
作者: Yanbin He, Zhiyang Lu, Jun Wang, Jun Shi
备注:

点击查看摘要

Abstract: Convolutional neural networks (CNNs) and their variants have been successfully applied to the electroencephalogram (EEG) based motor imagery (MI) decoding task. However, these CNN-based algorithms generally have limitations in perceiving global temporal dependencies of EEG signals. Besides, they also ignore the diverse contributions of different EEG channels to the classification task. To address such issues, a novel channel attention based MLP-Mixer network (CAMLP-Net) is proposed for EEG-based MI decoding. Specifically, the MLP-based architecture is applied in this network to capture the temporal and spatial information. The attention mechanism is further embedded into MLP-Mixer to adaptively exploit the importance of different EEG channels. Therefore, the proposed CAMLP-Net can effectively learn more global temporal and spatial information. The experimental results on the newly built MI-2 dataset indicate that our proposed CAMLP-Net achieves superior classification performance over all the compared algorithms.

摘要:卷积神经网络(CNN)及其变体已被成功应用于基于脑电图(EEG)的运动想象(MI)解码任务。然而,这些基于 CNN 的算法在感知 EEG 信号的全局时间依赖方面通常存在局限。此外,它们还忽略了不同 EEG 通道对分类任务的不同贡献。为了解决这些问题,我们为基于 EEG 的 MI 解码提出了一种新颖的基于通道注意力的 MLP-Mixer 网络(CAMLP-Net)。具体而言,该网络采用基于 MLP 的架构来捕获时间和空间信息,并进一步将注意力机制嵌入 MLP-Mixer,以自适应地利用不同 EEG 通道的重要性。因此,所提出的 CAMLP-Net 可以有效地学习更多全局的时空信息。在新构建的 MI-2 数据集上的实验结果表明,我们提出的 CAMLP-Net 取得了优于所有对比算法的分类性能。

ML-39-标题: Can Q-learning solve Multi Armed Bantids?

链接: https://arxiv.org/abs/2110.10934
作者: Refael Vivanti
备注: arXiv admin note: text overlap with arXiv:1905.10144

点击查看摘要

Abstract: When a reinforcement learning (RL) method has to decide between several optional policies by solely looking at the received reward, it has to implicitly optimize a Multi-Armed-Bandit (MAB) problem. This arises the question: are current RL algorithms capable of solving MAB problems? We claim that the surprising answer is no. In our experiments we show that in some situations they fail to solve a basic MAB problem, and in many common situations they have a hard time: They suffer from regression in results during training, sensitivity to initialization and high sample complexity. We claim that this stems from variance differences between policies, which causes two problems: The first problem is the “Boring Policy Trap” where each policy have a different implicit exploration depends on its rewards variance, and leaving a boring, or low variance, policy is less likely due to its low implicit exploration. The second problem is the “Manipulative Consultant” problem, where value-estimation functions used in deep RL algorithms such as DQN or deep Actor Critic methods, maximize estimation precision rather than mean rewards, and have a better loss in low-variance policies, which cause the network to converge to a sub-optimal policy. Cognitive experiments on humans showed that noised reward signals may paradoxically improve performance. We explain this using the aforementioned problems, claiming that both humans and algorithms may share similar challenges in decision making. Inspired by this result, we propose the Adaptive Symmetric Reward Noising (ASRN) method, by which we mean equalizing the rewards variance across different policies, thus avoiding the two problems without affecting the environment’s mean rewards behavior. We demonstrate that the ASRN scheme can dramatically improve the results.

摘要:当强化学习(RL)方法必须仅通过观察收到的奖励在若干可选策略之间做出选择时,它实际上是在隐式地求解一个多臂老虎机(MAB)问题。由此引出一个问题:当前的 RL 算法能否解决 MAB 问题?我们的答案出人意料:不能。实验表明,在某些情形下它们无法解决一个基本的 MAB 问题,而在许多常见情形下它们也表现挣扎:训练过程中结果出现回退、对初始化敏感、样本复杂度高。我们认为这源于策略之间的方差差异,它导致两个问题。第一个是"乏味策略陷阱":每个策略的隐式探索程度取决于其奖励方差,因此离开一个乏味(低方差)的策略由于其隐式探索不足而不太可能发生。第二个是"操纵性顾问"问题:DQN 或深度 Actor-Critic 等深度 RL 算法中使用的价值估计函数最大化的是估计精度而非平均奖励,在低方差策略上损失更小,导致网络收敛到次优策略。对人类的认知实验表明,带噪声的奖励信号反而可能提升表现。我们用上述两个问题解释了这一现象,并指出人类和算法在决策中可能面临相似的挑战。受此启发,我们提出了自适应对称奖励加噪(ASRN)方法,即在不改变环境平均奖励行为的前提下,使不同策略的奖励方差趋于一致,从而规避上述两个问题。我们证明 ASRN 方案可以显著改善结果。
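论文研究的设定可以用一个玩具实验来说明(笔者的示意代码,非论文实现):在一个两臂老虎机上做 ε-贪心的 Q 值估计,其中较优的臂均值更高但奖励方差也大得多,这正是摘要中"乏味策略陷阱"所涉及的不对称性。

```python
import random

# Epsilon-greedy incremental Q estimation on a two-armed bandit.
# Arm 0 ("boring"): deterministic reward 0.4. Arm 1: mean 0.6, std 1.0.
# The high-variance arm's estimate keeps fluctuating while the boring
# arm's estimate is pinned -- the asymmetry the paper analyzes.
def run_bandit(steps=50_000, eps=0.1, alpha=0.02, seed=0):
    rng = random.Random(seed)
    q = [0.0, 0.0]
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(2)                 # explore
        else:
            arm = 0 if q[0] >= q[1] else 1         # exploit
        reward = 0.4 if arm == 0 else rng.gauss(0.6, 1.0)
        q[arm] += alpha * (reward - q[arm])        # incremental Q update
    return q

q0, q1 = run_bandit()
print(f"q(boring arm) = {q0:.2f}, q(noisy arm) = {q1:.2f}")
```

足够多步之后,乏味臂的估计精确收敛到 0.4,而高方差臂的估计围绕 0.6 持续抖动;一旦抖动使其暂时低于 0.4,贪心策略就会被"困"在乏味臂上,只能靠探索脱离。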

ML-40-标题: Subspace Detours Meet Gromov-Wasserstein

链接: https://arxiv.org/abs/2110.10932
作者: Clément Bonet, Nicolas Courty, François Septier, Lucas Drumetz
备注:

点击查看摘要

Abstract: In the context of optimal transport methods, the subspace detour approach was recently presented by Muzellec and Cuturi (2019). It consists in building a nearly optimal transport plan in the measures space from an optimal transport plan in a wisely chosen subspace, onto which the original measures are projected. The contribution of this paper is to extend this category of methods to the Gromov-Wasserstein problem, which is a particular type of transport distance involving the inner geometry of the compared distributions. After deriving the associated formalism and properties, we also discuss a specific cost for which we can show connections with the Knothe-Rosenblatt rearrangement. We finally give an experimental illustration on a shape matching problem.

摘要:在最优传输方法的背景下,Muzellec 和 Cuturi(2019)最近提出了子空间绕道方法。其思路是:先在一个精心选择的子空间中(原始测度被投影到该子空间上)求得最优传输方案,再据此在测度空间中构建一个近似最优的传输方案。本文的贡献是将这类方法推广到 Gromov-Wasserstein 问题,这是一类特殊的传输距离,涉及被比较分布的内蕴几何。在推导出相应的形式化框架及其性质之后,我们还讨论了一种特定的代价函数,并展示了它与 Knothe-Rosenblatt 重排的联系。最后,我们在一个形状匹配问题上给出了实验演示。

ML-41-标题: Quantum field theories, Markov random fields and machine learning

链接: https://arxiv.org/abs/2110.10928
作者: Dimitrios Bachtis, Gert Aarts, Biagio Lucini
备注: Contribution submitted to the CCP2021: XXXII IUPAP Conference on Computational Physics, Coventry University, United Kingdom. arXiv admin note: substantial text overlap with arXiv:2109.07730

点击查看摘要

Abstract: The transition to Euclidean space and the discretization of quantum field theories on spatial or space-time lattices opens up the opportunity to investigate probabilistic machine learning from the perspective of quantum field theory. Here, we will discuss how discretized Euclidean field theories can be recast within the mathematical framework of Markov random fields, which is a notable class of probabilistic graphical models with applications in a variety of research areas, including machine learning. Specifically, we will demonstrate that the $\phi^{4}$ scalar field theory on a square lattice satisfies the Hammersley-Clifford theorem, therefore recasting it as a Markov random field from which neural networks are additionally derived. We will then discuss applications pertinent to the minimization of an asymmetric distance between the probability distribution of the $\phi^{4}$ machine learning algorithms and that of target probability distributions.

摘要:向欧几里得空间的转换,以及量子场论在空间或时空格点上的离散化,为从量子场论的视角研究概率机器学习提供了机会。在这里,我们将讨论离散化的欧几里得场论如何被纳入马尔可夫随机场的数学框架。马尔可夫随机场是一类重要的概率图模型,在包括机器学习在内的多个研究领域都有应用。具体而言,我们将证明方形格点上的 $\phi^{4}$ 标量场论满足 Hammersley-Clifford 定理,因此可以将其重新表述为马尔可夫随机场,并由此进一步导出神经网络。然后,我们将讨论与极小化 $\phi^{4}$ 机器学习算法的概率分布与目标概率分布之间非对称距离相关的应用。

ML-42-标题: SecureBoost+: A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning

链接: https://arxiv.org/abs/2110.10927
作者: Weijing Chen, Guoqiang Ma, Tao Fan, Yan Kang, Qian Xu, Qiang Yang
备注:

点击查看摘要

Abstract: Gradient boosting decision tree (GBDT) is a widely used ensemble algorithm in the industry. Its vertical federated learning version, SecureBoost, is one of the most popular algorithms used in cross-silo privacy-preserving modeling. As the area of privacy computation thrives in recent years, demands for large-scale and high-performance federated learning have grown dramatically in real-world applications. In this paper, to fulfill these requirements, we propose SecureBoost+ that is both novel and improved from the prior work SecureBoost. SecureBoost+ integrates several ciphertext calculation optimizations and engineering optimizations. The experimental results demonstrate that Secureboost+ has significant performance improvements on large and high dimensional data sets compared to SecureBoost. It makes effective and efficient large-scale vertical federated learning possible.

摘要:梯度提升决策树(GBDT)是业界广泛使用的集成算法。其纵向联邦学习版本 SecureBoost 是跨机构隐私保护建模中最流行的算法之一。随着近年来隐私计算领域的蓬勃发展,现实应用对大规模、高性能联邦学习的需求急剧增长。在本文中,为满足这些需求,我们提出了 SecureBoost+,它在先前工作 SecureBoost 的基础上做出了新颖的改进。SecureBoost+ 集成了多项密文计算优化和工程优化。实验结果表明,与 SecureBoost 相比,SecureBoost+ 在大规模和高维数据集上具有显著的性能提升,使得有效且高效的大规模纵向联邦学习成为可能。

ML-43-标题: On some theoretical limitations of Generative Adversarial Networks

链接: https://arxiv.org/abs/2110.10915
作者: Benoît Oriol, Alexandre Miot
备注: 7 pages

点击查看摘要

Abstract: Generative Adversarial Networks have become a core technique in Machine Learning to generate unknown distributions from data samples. They have been used in a wide range of context without paying much attention to the possible theoretical limitations of those models. Indeed, because of the universal approximation properties of Neural Networks, it is a general assumption that GANs can generate any probability distribution. Recently, people began to question this assumption and this article is in line with this thinking. We provide a new result based on Extreme Value Theory showing that GANs can’t generate heavy tailed distributions. The full proof of this result is given.

摘要:生成对抗网络(GAN)已成为机器学习中从数据样本生成未知分布的核心技术。它们被广泛应用于各种场景,但人们很少关注这类模型可能存在的理论局限。事实上,由于神经网络的通用逼近性质,人们普遍假设 GAN 可以生成任意概率分布。最近,人们开始质疑这一假设,本文正是沿着这一思路展开。我们基于极值理论给出了一个新结果,表明 GAN 无法生成重尾分布,并给出了该结果的完整证明。

ML-44-标题: An Empirical Evaluation of Time-Series Feature Sets

链接: https://arxiv.org/abs/2110.10914
作者: Trent Henderson, Ben D. Fulcher
备注: Submitted to and accepted for publication in SFE-TSDM Workshop at 21st IEEE International Conference on Data Mining (IEEE ICDM 2021)

点击查看摘要

Abstract: Solving time-series problems with features has been rising in popularity due to the availability of software for feature extraction. Feature-based time-series analysis can now be performed using many different feature sets, including hctsa (7730 features: Matlab), feasts (42 features: R), tsfeatures (63 features: R), Kats (40 features: Python), tsfresh (up to 1558 features: Python), TSFEL (390 features: Python), and the C-coded catch22 (22 features: Matlab, R, Python, and Julia). There is substantial overlap in the types of methods included in these sets (e.g., properties of the autocorrelation function and Fourier power spectrum), but they are yet to be systematically compared. Here we compare these seven sets on computational speed, assess the redundancy of features contained in each, and evaluate the overlap and redundancy between them. We take an empirical approach to feature similarity based on outputs across a diverse set of real-world and simulated time series. We find that feature sets vary across three orders of magnitude in their computation time per feature on a laptop for a 1000-sample series, from the fastest sets catch22 and TSFEL (~0.1ms per feature) to tsfeatures (~3s per feature). Using PCA to evaluate feature redundancy within each set, we find the highest within-set redundancy for TSFEL and tsfresh. For example, in TSFEL, 90% of the variance across 390 features can be captured with just four PCs. Finally, we introduce a metric for quantifying overlap between pairs of feature sets, which indicates substantial overlap. We found that the largest feature set, hctsa, is the most comprehensive, and that tsfresh is the most distinctive, due to its incorporation of many low-level Fourier coefficients. Our results provide empirical understanding of the differences between existing feature sets, information that can be used to better tailor feature sets to their applications.

摘要:由于特征提取软件的普及,用特征来求解时间序列问题日益流行。如今可以使用许多不同的特征集进行基于特征的时间序列分析,包括 hctsa(7730 个特征:Matlab)、feasts(42 个特征:R)、tsfeatures(63 个特征:R)、Kats(40 个特征:Python)、tsfresh(最多 1558 个特征:Python)、TSFEL(390 个特征:Python)以及 C 语言实现的 catch22(22 个特征:Matlab、R、Python 和 Julia)。这些特征集所包含的方法类型存在大量重叠(例如自相关函数和傅里叶功率谱的性质),但它们尚未被系统地比较过。在这里,我们比较了这七个特征集的计算速度,评估了每个特征集内部特征的冗余度,并评估了它们之间的重叠与冗余。我们基于一组多样的真实与模拟时间序列的输出,采用经验方法度量特征相似性。我们发现,在笔记本电脑上对长度为 1000 的序列,各特征集的单特征计算时间相差三个数量级:从最快的 catch22 和 TSFEL(每个特征约 0.1 毫秒)到 tsfeatures(每个特征约 3 秒)。使用 PCA 评估每个特征集内部的冗余度,我们发现 TSFEL 和 tsfresh 的集合内冗余度最高。例如,在 TSFEL 中,390 个特征的 90% 方差仅用四个主成分即可捕获。最后,我们引入了一种量化特征集两两之间重叠程度的指标,结果显示存在大量重叠。我们发现最大的特征集 hctsa 最为全面,而 tsfresh 因包含许多低阶傅里叶系数而最为独特。我们的结果为理解现有特征集之间的差异提供了经验依据,这些信息可用于针对具体应用更好地定制特征集。
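论文用 PCA 评估特征集内部冗余的做法可以如下示意(合成数据,笔者的示意代码,非论文脚本):统计捕获 90% 方差所需的主成分个数;当特征高度冗余时,这个数目远小于特征总数。

```python
import numpy as np

# How many principal components are needed to capture `target` of the
# variance across a (rows = time series, cols = features) matrix?
def n_components_for_variance(features, target=0.90):
    # standardize columns, since features live on very different scales
    z = (features - features.mean(0)) / features.std(0)
    # eigenvalues of the covariance matrix via SVD of the centered data
    s = np.linalg.svd(z, compute_uv=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(np.cumsum(explained), target) + 1)

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 4))                       # 4 underlying factors
mixing = rng.normal(size=(4, 50))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 50))  # 50 redundant features
print(n_components_for_variance(X))  # a handful, far fewer than 50
```

对于互不相关的特征,所需的主成分数会接近特征总数;TSFEL 的"四个主成分捕获 90% 方差"正说明其特征间高度冗余。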

ML-45-标题: Finite Volume Least-Squares Neural Network (FV-LSNN) Method for Scalar Nonlinear Hyperbolic Conservation Laws

链接: https://arxiv.org/abs/2110.10895
作者: Zhiqiang Cai, Jingshuang Chen, Min Liu
备注: arXiv admin note: text overlap with arXiv:2105.11627

点击查看摘要

Abstract: In [4], we introduced the least-squares ReLU neural network (LSNN) method for solving the linear advection-reaction problem with discontinuous solution and showed that the number of degrees of freedom for the LSNN method is significantly less than that of traditional mesh-based methods. The LSNN method is a discretization of an equivalent least-squares (LS) formulation in the class of neural network functions with the ReLU activation function; and evaluation of the LS functional is done by using numerical integration and proper numerical differentiation. By developing a novel finite volume approximation (FVA) to the divergence operator, this paper studies the LSNN method for scalar nonlinear hyperbolic conservation laws. The FVA introduced in this paper is tailored to the LSNN method and is more accurate than traditional, well-studied FV schemes used in mesh-based numerical methods. Numerical results of some benchmark test problems with both convex and non-convex fluxes show that the finite volume LSNN (FV-LSNN) method is capable of computing the physical solution for problems with rarefaction waves and capturing the shock of the underlying problem automatically through the free hyper-planes of the ReLU neural network. Moreover, the method does not exhibit the common Gibbs phenomena along the discontinuous interface.

摘要:在 [4] 中,我们提出了用于求解具有间断解的线性对流-反应问题的最小二乘 ReLU 神经网络(LSNN)方法,并表明 LSNN 方法所需的自由度数量显著少于传统的基于网格的方法。LSNN 方法是在带 ReLU 激活函数的神经网络函数类中对一个等价最小二乘(LS)形式的离散化;LS 泛函的计算通过数值积分和恰当的数值微分完成。通过为散度算子构造一种新颖的有限体积近似(FVA),本文研究了求解标量非线性双曲守恒律的 LSNN 方法。本文提出的 FVA 是为 LSNN 方法量身定制的,比基于网格的数值方法中传统的、经过充分研究的有限体积格式更精确。在若干具有凸与非凸通量的基准测试问题上的数值结果表明,有限体积 LSNN(FV-LSNN)方法能够计算含稀疏波问题的物理解,并通过 ReLU 神经网络的自由超平面自动捕获问题中的激波。此外,该方法在间断界面附近不会表现出常见的吉布斯现象。
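作为对照背景,论文所比较的传统基于网格的方法中,标量守恒律(以 Burgers 方程为例)的一阶 Godunov 有限体积格式可以如下实现(教科书式的标准方法示意,并非论文的 LSNN 代码):

```python
import numpy as np

# One first-order Godunov finite-volume step for Burgers' equation
# u_t + (u^2/2)_x = 0 on a uniform grid (boundary cells held fixed).
def godunov_step(u, dx, dt):
    f = lambda v: 0.5 * v * v
    def flux(ul, ur):
        if ul <= ur:
            # rarefaction side: minimize f over [ul, ur]; the minimum sits
            # at the sonic point v = 0 when it lies inside the interval
            return 0.0 if ul <= 0.0 <= ur else min(f(ul), f(ur))
        return max(f(ul), f(ur))   # shock side: maximize f over [ur, ul]
    F = np.array([flux(u[i], u[i + 1]) for i in range(len(u) - 1)])
    un = u.copy()
    un[1:-1] -= dt / dx * (F[1:] - F[:-1])   # conservative update
    return un

# shock tube: u=1 on the left, u=0 on the right; the shock travels at speed 1/2
x = np.linspace(-1.0, 1.0, 201)
u = np.where(x < 0, 1.0, 0.0)
for _ in range(100):
    u = godunov_step(u, dx=0.01, dt=0.005)   # CFL number 0.5
print(u[120], u[130])  # ~1 just left of the shock (near x=0.25), ~0 just right
```

这类格式逐单元存储解并依赖 CFL 条件,自由度随网格加密而增长;LSNN 方法的卖点正是用少得多的自由度(ReLU 网络的自由超平面)表示间断。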

ML-46-标题: A Real-Time Energy and Cost Efficient Vehicle Route Assignment Neural Recommender System

链接: https://arxiv.org/abs/2110.10887
作者: Ayman Moawad, Zhijian Li, Ines Pancorbo, Krishna Murthy Gurumurthy, Vincent Freyermuth, Ehsan Islam, Ram Vijayagopal, Monique Stinson, Aymeric Rousseau
备注: 14 pages, 11 figures

点击查看摘要

Abstract: This paper presents a neural network recommender system algorithm for assigning vehicles to routes based on energy and cost criteria. In this work, we applied this new approach to efficiently identify the most cost-effective medium and heavy duty truck (MDHDT) powertrain technology, from a total cost of ownership (TCO) perspective, for given trips. We employ a machine learning based approach to efficiently estimate the energy consumption of various candidate vehicles over given routes, defined as sequences of links (road segments), with little information known about internal dynamics, i.e using high level macroscopic route information. A complete recommendation logic is then developed to allow for real-time optimum assignment for each route, subject to the operational constraints of the fleet. We show how this framework can be used to (1) efficiently provide a single trip recommendation with a top-$k$ vehicles star ranking system, and (2) engage in more general assignment problems where $n$ vehicles need to be deployed over $m \leq n$ trips. This new assignment system has been deployed and integrated into the POLARIS Transportation System Simulation Tool for use in research conducted by the Department of Energy’s Systems and Modeling for Accelerated Research in Transportation (SMART) Mobility Consortium.

摘要:本文提出了一种基于能耗和成本标准将车辆分配给路线的神经网络推荐系统算法。在这项工作中,我们应用这一新方法,从总拥有成本(TCO)的角度,为给定行程高效识别最具成本效益的中重型卡车(MDHDT)动力总成技术。我们采用基于机器学习的方法,在几乎不了解内部动力学的情况下(即仅使用高层宏观路线信息),高效估计各候选车辆在给定路线(定义为路段序列)上的能耗。随后开发了完整的推荐逻辑,在车队运营约束下为每条路线进行实时最优分配。我们展示了该框架如何用于:(1)高效提供带有 top-$k$ 车辆星级排名系统的单次行程推荐;(2)处理更一般的分配问题,即需要将 $n$ 辆车部署到 $m \leq n$ 次行程上。该新分配系统已部署并集成到 POLARIS 交通系统仿真工具中,用于美国能源部 SMART(Systems and Modeling for Accelerated Research in Transportation)移动联盟开展的研究。

ML-47-标题: Deep Generative Models in Engineering Design: A Review

链接: https://arxiv.org/abs/2110.10863
作者: Lyle Regenwetter, Amin Heyrani Nobari, Faez Ahmed
备注:

点击查看摘要

Abstract: Automated design synthesis has the potential to revolutionize the modern human design process and improve access to highly optimized and customized products across countless industries. Successfully adapting generative Machine Learning to design engineering may be the key to such automated design synthesis and is a research subject of great importance. We present a review and analysis of Deep Generative Learning models in engineering design. Deep Generative Models (DGMs) typically leverage deep networks to learn from an input dataset and learn to synthesize new designs. Recently, DGMs such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), feedforward Neural Networks (NNs) and certain Deep Reinforcement Learning (DRL) frameworks have shown promising results in design applications like structural optimization, materials design, and shape synthesis. The prevalence of DGMs in Engineering Design has skyrocketed since 2016. Anticipating continued growth, we conduct a review of recent advances with the hope of benefitting researchers interested in DGMs for design. We structure our review as an exposition of the algorithms, datasets, representation methods, and applications commonly used in the current literature. In particular, we discuss key works that have introduced new techniques and methods in DGMs, successfully applied DGMs to a design-related domain, or directly supported development of DGMs through datasets or auxiliary methods. We further identify key challenges and limitations currently seen in DGMs across design fields, such as design creativity, handling complex constraints and objectives, and modeling both form and functional performance simultaneously. In our discussion we identify possible solution pathways as key areas on which to target future work.

摘要:自动化设计综合有可能彻底改变现代人类设计流程,并使无数行业更容易获得高度优化和定制的产品。成功地将生成式机器学习应用于设计工程,可能是实现这种自动化设计综合的关键,也是一个非常重要的研究课题。我们对工程设计中的深度生成学习模型进行了综述与分析。深度生成模型(DGM)通常利用深度网络从输入数据集中学习,并学会合成新的设计。最近,生成对抗网络(GAN)、变分自编码器(VAE)、前馈神经网络(NN)以及某些深度强化学习(DRL)框架等 DGM,已在结构优化、材料设计和形状综合等设计应用中显示出有希望的结果。自2016年以来,DGM 在工程设计中的应用迅速增长。鉴于这一增长预计将持续,我们对近期进展进行了综述,希望能使对设计用 DGM 感兴趣的研究人员受益。我们将综述组织为对当前文献中常用的算法、数据集、表示方法和应用的阐述。特别是,我们讨论了在 DGM 中引入新技术和新方法、成功将 DGM 应用于设计相关领域、或通过数据集或辅助方法直接支持 DGM 发展的关键工作。我们进一步指出了目前 DGM 在各设计领域面临的关键挑战和局限,例如设计创造力、处理复杂的约束和目标、以及同时建模形式与功能性能。在讨论中,我们将可能的解决路径确定为未来工作的重点方向。

ML-48-标题: Utilizing Redundancy in Cost Functions for Resilience in Distributed Optimization and Learning

链接: https://arxiv.org/abs/2110.10858
作者: Shuo Liu, Nirupam Gupta, Nitin Vaidya
备注: 66 pages, 1 figure, and 1 table. Supersede our previous report arXiv:2106.03998 in asynchronous distributed optimization by containing the most of its results

点击查看摘要

Abstract: This paper considers the problem of resilient distributed optimization and stochastic machine learning in a server-based architecture. The system comprises a server and multiple agents, where each agent has a local cost function. The agents collaborate with the server to find a minimum of their aggregate cost functions. We consider the case when some of the agents may be asynchronous and/or Byzantine faulty. In this case, the classical algorithm of distributed gradient descent (DGD) is rendered ineffective. Our goal is to design techniques improving the efficacy of DGD with asynchrony and Byzantine failures. To do so, we start by proposing a way to model the agents' cost functions by the generic notion of $(f, r; \epsilon)$-redundancy where $f$ and $r$ are the parameters of Byzantine failures and asynchrony, respectively, and $\epsilon$ characterizes the closeness between agents' cost functions. This allows us to quantify the level of redundancy present amongst the agents' cost functions, for any given distributed optimization problem. We demonstrate, both theoretically and empirically, the merits of our proposed redundancy model in improving the robustness of DGD against asynchronous and Byzantine agents, and their extensions to distributed stochastic gradient descent (D-SGD) for robust distributed machine learning with asynchronous and Byzantine agents.

摘要:本文研究基于服务器架构下的弹性分布式优化与随机机器学习问题。该系统由一个服务器和多个代理组成,每个代理都有一个本地成本函数。代理与服务器协作,寻找其总成本函数的最小值。我们考虑部分代理可能是异步的和/或拜占庭故障的情形。在这种情况下,经典的分布式梯度下降(DGD)算法会失效。我们的目标是设计技术,提高 DGD 在异步与拜占庭故障下的有效性。为此,我们首先提出用 $(f, r; \epsilon)$-冗余这一一般性概念来刻画代理的成本函数,其中 $f$ 和 $r$ 分别是拜占庭故障和异步的参数,$\epsilon$ 刻画代理成本函数之间的接近程度。这使我们能够对任意给定的分布式优化问题,量化代理成本函数之间存在的冗余程度。我们从理论和实验两方面证明了所提冗余模型在提高 DGD 对异步和拜占庭代理的鲁棒性方面的优点,并将其推广到分布式随机梯度下降(D-SGD),用于带有异步和拜占庭代理的鲁棒分布式机器学习。

ML-49-标题: Using NASA Satellite Data Sources and Geometric Deep Learning to Uncover Hidden Patterns in COVID-19 Clinical Severity

链接: https://arxiv.org/abs/2110.10849
作者: Ignacio Segovia-Dominguez, Huikyo Lee, Zhiwei Zhen, Yuzhou Chen, Michael Garay, Daniel Crichton, Rishabh Wagh, Yulia R. Gel
备注: Main Paper and Appendix

点击查看摘要

Abstract: As multiple adverse events in 2021 illustrated, virtually all aspects of our societal functioning – from water and food security to energy supply to healthcare – more than ever depend on the dynamics of environmental factors. Nevertheless, the social dimensions of weather and climate are noticeably less explored by the machine learning community, largely, due to the lack of reliable and easy access to use data. Here we present a unique not yet broadly available NASA’s satellite dataset on aerosol optical depth (AOD), temperature and relative humidity and discuss the utility of these new data for COVID-19 biosurveillance. In particular, using the geometric deep learning models for semi-supervised classification on a county-level basis over the contiguous United States, we investigate the pressing societal question whether atmospheric variables have considerable impact on COVID-19 clinical severity.

摘要:正如2021年的多起不利事件所表明的,我们社会运转的几乎所有方面(从水与粮食安全、能源供应到医疗保健)比以往任何时候都更依赖于环境因素的动态变化。尽管如此,很大程度上由于缺乏可靠且易于使用的数据,机器学习社区对天气和气候的社会维度的探索明显不足。在这里,我们展示了一个独特的、尚未广泛可用的 NASA 卫星数据集,涵盖气溶胶光学厚度(AOD)、温度和相对湿度,并讨论了这些新数据在 COVID-19 生物监测中的效用。特别地,我们在美国本土以县为单位使用几何深度学习模型进行半监督分类,研究了一个紧迫的社会问题:大气变量是否对 COVID-19 的临床严重程度有显著影响。

ML-50-标题: AdamD: Improved bias-correction in Adam

链接: https://arxiv.org/abs/2110.10828
作者: John St John
备注: 6 pages, 1 figure

点击查看摘要

Abstract: Here I present a small update to the bias correction term in the Adam optimizer that has the advantage of behaving well in the first several steps. The default implementation of Adam may be as sensitive as it is to hyperparameters partially due to the originally proposed bias correction procedure, and its behavior in early steps of training.

摘要:本文对 Adam 优化器中的偏差校正项提出了一个小的更新,其优点是在最初的若干步中表现良好。Adam 的默认实现之所以对超参数如此敏感,部分原因可能在于最初提出的偏差校正过程及其在训练早期阶段的行为。
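
为便于对照摘要中讨论的偏差校正项,下面给出标准 Adam 更新的最小草图(纯 Python 示意;AdamD 提出的具体修改以论文为准,此处仅标注偏差校正在更新式中的位置):

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """标准 Adam 的单步更新;偏差校正项除以 (1 - beta**t),在最初几步影响最大。"""
    m = beta1 * m + (1 - beta1) * grad        # 一阶矩估计
    v = beta2 * v + (1 - beta2) * grad ** 2   # 二阶矩估计
    m_hat = m / (1 - beta1 ** t)              # 偏差校正(分子)
    v_hat = v / (1 - beta2 ** t)              # 偏差校正(分母)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# 在 f(x) = x**2 上最小化,梯度为 2x
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
```

注意两个偏差校正因子 1/(1 - beta**t) 只在 t 较小时明显偏离 1,这正是摘要所说“最初几步”行为差异的来源。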

ML-51-标题: Shaking the foundations: delusions in sequence models for interaction and control

链接: https://arxiv.org/abs/2110.10819
作者: Pedro A. Ortega, Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, Jonas Degrave, Bilal Piot, Julien Perolat, Tom Everitt, Corentin Tallec, Emilio Parisotto, Tom Erez, Yutian Chen, Scott Reed, Marcus Hutter, Nando de Freitas, Shane Legg
备注: DeepMind Tech Report, 16 pages, 4 figures

点击查看摘要

Abstract: The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive however is purposeful adaptive behavior. Currently there is a common perception that sequence models “lack the understanding of the cause and effect of their actions” leading them to draw incorrect inferences due to auto-suggestive delusions. In this report we explain where this mismatch originates, and show that it can be resolved by treating actions as causal interventions. Finally, we show that in supervised learning, one can teach a system to condition or intervene on data by training with factual and counterfactual error signals respectively.

摘要:近期语言模型的显著成功重新激发了机器学习研究,Transformer 等大型序列模型正被应用于各种领域。然而,一个仍相对难以解决的重要问题类别是有目的的自适应行为。目前有一种普遍看法,认为序列模型“缺乏对其动作因果效应的理解”,导致它们由于自我暗示性错觉而得出不正确的推断。在本报告中,我们解释了这种不匹配的根源,并表明可以通过将动作视为因果干预来解决它。最后,我们表明,在监督学习中,可以分别用事实和反事实误差信号进行训练,来教会系统对数据进行条件化或干预。

ML-52-标题: Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks

链接: https://arxiv.org/abs/2110.10815
作者: Manuela Girotti, Ioannis Mitliagkas, Gauthier Gidel
备注: 10 pages (Main) + 19 pages (Appendix), 6 figures

点击查看摘要

Abstract: We theoretically analyze the Feedback Alignment (FA) algorithm, an efficient alternative to backpropagation for training neural networks. We provide convergence guarantees with rates for deep linear networks for both continuous and discrete dynamics. Additionally, we study incremental learning phenomena for shallow linear networks. Interestingly, certain specific initializations imply that negligible components are learned before the principal ones, thus potentially negatively affecting the effectiveness of such a learning algorithm; a phenomenon we classify as implicit anti-regularization. We also provide initialization schemes where the components of the problem are approximately learned by decreasing order of importance, thus providing a form of implicit regularization.

摘要:我们从理论上分析了反馈对齐(FA)算法,它是训练神经网络时反向传播的一种高效替代方案。对于连续和离散动力学,我们为深度线性网络提供了带收敛速率的收敛保证。此外,我们研究了浅层线性网络的增量学习现象。有趣的是,某些特定的初始化会使可忽略的成分先于主成分被学习,从而可能对这种学习算法的有效性产生负面影响;我们将这种现象归类为隐式反正则化。我们还给出了一些初始化方案,使问题的各成分按重要性递减的顺序被近似学习,从而提供了一种隐式正则化形式。
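
反馈对齐(FA)的核心思想是:反向传播里本应使用的权重转置 W2.T 被一个固定的随机反馈矩阵 B 取代。下面在一个两层线性网络上给出最小示意(numpy 草图,变量与规模均为自拟):

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_hidden, d_out = 4, 8, 2
W1 = rng.normal(0, 0.1, (d_hidden, d_in))   # 第一层前向权重
W2 = rng.normal(0, 0.1, (d_out, d_hidden))  # 第二层前向权重
B = rng.normal(0, 0.5, (d_hidden, d_out))   # 固定随机反馈矩阵,取代 W2.T
W_true = rng.normal(0, 1.0, (d_out, d_in))  # 要学习的目标线性映射

def rel_error(W1, W2):
    return np.linalg.norm(W2 @ W1 - W_true) / np.linalg.norm(W_true)

err_before = rel_error(W1, W2)
lr = 0.02
for _ in range(10000):
    E = W2 @ W1 - W_true     # 全批量误差(假设输入协方差为单位阵)
    W2 -= lr * E @ W1.T      # 顶层更新与反向传播一致
    W1 -= lr * B @ E         # FA:误差经固定矩阵 B 传回,而不是 W2.T
err_after = rel_error(W1, W2)
```

若 B 与 W2.T 之间的对齐机制生效,误差 err_after 会明显低于训练前的 err_before;这与摘要中对深度线性网络收敛性的理论分析相呼应。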

ML-53-标题: Hierarchical Skills for Efficient Exploration

链接: https://arxiv.org/abs/2110.10809
作者: Jonas Gehring, Gabriel Synnaeve, Andreas Krause, Nicolas Usunier
备注: To appear in 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

点击查看摘要

Abstract: In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration. However, prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design. In previous work on continuous control, the sensitivity of methods to this trade-off has not been addressed explicitly, as locomotion provides a suitable prior for navigation tasks, which have been of foremost interest. In this work, we analyze this trade-off for low-level policy pre-training with a new benchmark suite of diverse, sparse-reward tasks for bipedal robots. We alleviate the need for prior knowledge by proposing a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner. For utilization on downstream tasks, we present a three-layered hierarchical learning algorithm to automatically trade off between general and specific skills as required by the respective task. In our experiments, we show that our approach performs this trade-off effectively and achieves better results than current state-of-the-art methods for end- to-end hierarchical reinforcement learning and unsupervised skill discovery. Code and videos are available at this https URL .

摘要:在强化学习中,预训练的低层技能有可能极大地促进探索。然而,要在技能设计中在通用性(细粒度控制)和特异性(更快学习)之间取得恰当平衡,需要下游任务的先验知识。在以往关于连续控制的工作中,方法对这一权衡的敏感性没有被明确讨论,因为运动技能为人们最为关注的导航任务提供了合适的先验。在这项工作中,我们用一个面向双足机器人的、由多样化稀疏奖励任务组成的新基准套件,分析了低层策略预训练中的这一权衡。我们提出一个以无监督方式习得不同复杂度技能的分层技能学习框架,从而减轻对先验知识的需求。对于下游任务,我们提出了一种三层分层学习算法,按各任务的要求在通用技能与特定技能之间自动权衡。实验表明,我们的方法有效地实现了这种权衡,并取得了优于当前最先进的端到端分层强化学习和无监督技能发现方法的结果。代码和视频见上述 https 链接。

ML-54-标题: Propensity-scored Probabilistic Label Trees

链接: https://arxiv.org/abs/2110.10803
作者: Marek Wydmuch, Kalina Jasinska-Kobus, Rohit Babbar, Krzysztof Dembczyński
备注: The extended version of SIGIR '21 Short Research Paper

点击查看摘要

Abstract: Extreme multi-label classification (XMLC) refers to the task of tagging instances with small subsets of relevant labels coming from an extremely large set of all possible labels. Recently, XMLC has been widely applied to diverse web applications such as automatic content labeling, online advertising, or recommendation systems. In such environments, label distribution is often highly imbalanced, consisting mostly of very rare tail labels, and relevant labels can be missing. As a remedy to these problems, the propensity model has been introduced and applied within several XMLC algorithms. In this work, we focus on the problem of optimal predictions under this model for probabilistic label trees, a popular approach for XMLC problems. We introduce an inference procedure, based on the $A^*$-search algorithm, that efficiently finds the optimal solution, assuming that all probabilities and propensities are known. We demonstrate the attractiveness of this approach in a wide empirical study on popular XMLC benchmark datasets.

摘要:极端多标签分类(XMLC)是指从一个极其庞大的候选标签集合中,为实例打上少量相关标签的任务。最近,XMLC 已被广泛应用于各类网络应用,例如自动内容标注、在线广告或推荐系统。在这类环境中,标签分布往往高度不平衡,大多由非常罕见的尾部标签组成,且相关标签可能缺失。作为对这些问题的补救,倾向(propensity)模型已被提出并应用于多个 XMLC 算法中。在这项工作中,我们关注该模型下概率标签树(XMLC 问题的一种流行方法)的最优预测问题。我们引入一个基于 $A^*$ 搜索算法的推断过程,在所有概率和倾向均已知的假设下高效找到最优解。我们在流行的 XMLC 基准数据集上的大量实证研究中展示了该方法的吸引力。
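
摘要提到用 $A^*$ 搜索在概率标签树上求最优预测。由于路径概率随深度单调不增,最佳优先搜索最先弹出的 k 个叶子即为概率最高的 k 个标签。下面是一个去掉倾向加权的简化示意(树结构与数值均为虚构):

```python
import heapq

# 一棵玩具概率标签树:每个节点存储其子节点以及“给定父节点到达该子节点”
# 的条件概率,叶子即标签
tree = {
    "root": [("n0", 0.7), ("n1", 0.3)],
    "n0":   [("L0", 0.6), ("L1", 0.4)],
    "n1":   [("L2", 0.9), ("L3", 0.1)],
}

def top_k_labels(tree, k):
    """最佳优先搜索:路径概率随下降单调不增,
    因此最先弹出的 k 个叶子就是概率最高的 k 个标签。"""
    heap = [(-1.0, "root")]        # 用取负的概率实现最大堆
    out = []
    while heap and len(out) < k:
        neg_p, node = heapq.heappop(heap)
        if node not in tree:       # 叶子节点
            out.append((node, -neg_p))
            continue
        for child, cond_p in tree[node]:
            heapq.heappush(heap, (neg_p * cond_p, child))
    return out
```

在上面的玩具树中,top_k_labels(tree, 2) 会依次返回 L0(概率 0.7×0.6=0.42)和 L1(0.7×0.4=0.28)。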

ML-55-标题: A Data-Centric Optimization Framework for Machine Learning

链接: https://arxiv.org/abs/2110.10802
作者: Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler
备注:

点击查看摘要

Abstract: Rapid progress in deep learning is leading to a diverse set of quickly changing models, with a dramatically growing demand for compute. However, as frameworks specialize optimization to patterns in popular networks, they implicitly constrain novel and diverse models that drive progress in research. We empower deep learning researchers by defining a flexible and user-customizable pipeline for optimizing training of arbitrary deep neural networks, based on data movement minimization. The pipeline begins with standard networks in PyTorch or ONNX and transforms computation through progressive lowering. We define four levels of general-purpose transformations, from local intra-operator optimizations to global data movement reduction. These operate on a data-centric graph intermediate representation that expresses computation and data movement at all levels of abstraction, including expanding basic operators such as convolutions to their underlying computations. Central to the design is the interactive and introspectable nature of the pipeline. Every part is extensible through a Python API, and can be tuned interactively using a GUI. We demonstrate competitive performance or speedups on ten different networks, with interactive optimizations discovering new opportunities in EfficientNet.

摘要:深度学习的快速进展正催生出一批多样且快速变化的模型,对算力的需求急剧增长。然而,由于框架将优化专门化于流行网络中的模式,它们隐含地限制了推动研究进展的新颖多样的模型。我们通过定义一个灵活的、用户可定制的管道,基于数据移动最小化来优化任意深度神经网络的训练,从而赋能深度学习研究人员。该管道以 PyTorch 或 ONNX 中的标准网络为起点,通过逐级下降(progressive lowering)变换计算。我们定义了四个层级的通用变换,从局部的算子内优化到全局的数据移动削减。这些变换作用于一个以数据为中心的图中间表示,它在所有抽象层级上表达计算和数据移动,包括将卷积等基本算子展开为其底层计算。设计的核心是管道的交互性与可内省性:每个部分都可以通过 Python API 扩展,并可使用 GUI 交互式调优。我们在十个不同网络上展示了有竞争力的性能或加速,交互式优化还在 EfficientNet 上发现了新的优化机会。

ML-56-标题: Dynamic Bottleneck for Robust Self-Supervised Exploration

链接: https://arxiv.org/abs/2110.10735
作者: Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang
备注: NeurIPS 2021

点击查看摘要

Abstract: Exploration methods based on pseudo-count of transitions or curiosity of dynamics have achieved promising results in solving reinforcement learning with sparse rewards. However, such methods are usually sensitive to environmental dynamics-irrelevant information, e.g., white-noise. To handle such dynamics-irrelevant information, we propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle. Based on the DB model, we further propose DB-bonus, which encourages the agent to explore state-action pairs with high information gain. We establish theoretical connections between the proposed DB-bonus, the upper confidence bound (UCB) for linear case, and the visiting count for tabular case. We evaluate the proposed method on Atari suits with dynamics-irrelevant noises. Our experiments show that exploration with DB bonus outperforms several state-of-the-art exploration methods in noisy environments.

摘要:基于转移伪计数或动力学好奇心的探索方法,在求解稀疏奖励强化学习方面取得了有希望的结果。然而,此类方法通常对与环境动力学无关的信息(例如白噪声)敏感。为处理这类与动力学无关的信息,我们提出了动态瓶颈(DB)模型,它基于信息瓶颈原理获得与动力学相关的表示。基于 DB 模型,我们进一步提出 DB 奖励(DB-bonus),鼓励智能体探索信息增益高的状态-动作对。我们建立了所提 DB 奖励与线性情形下的置信上界(UCB)以及表格情形下的访问计数之间的理论联系。我们在带有动力学无关噪声的 Atari 环境套件上评估了所提方法。实验表明,带 DB 奖励的探索在含噪环境中优于多种最先进的探索方法。

ML-57-标题: Part-X: A Family of Stochastic Algorithms for Search-Based Test Generation with Probabilistic Guarantees

链接: https://arxiv.org/abs/2110.10729
作者: Giulia Pedrielli, Tanmay Kandhait, Surdeep Chotaliya, Quinn Thibeault, Hao Huang, Mauricio Castillo-Effen, Georgios Fainekos
备注: 25 pages, 7 Figures

点击查看摘要

Abstract: Requirements driven search-based testing (also known as falsification) has proven to be a practical and effective method for discovering erroneous behaviors in Cyber-Physical Systems. Despite the constant improvements on the performance and applicability of falsification methods, they all share a common characteristic. Namely, they are best-effort methods which do not provide any guarantees on the absence of erroneous behaviors (falsifiers) when the testing budget is exhausted. The absence of finite time guarantees is a major limitation which prevents falsification methods from being utilized in certification procedures. In this paper, we address the finite-time guarantees problem by developing a new stochastic algorithm. Our proposed algorithm not only estimates (bounds) the probability that falsifying behaviors exist, but also it identifies the regions where these falsifying behaviors may occur. We demonstrate the applicability of our approach on standard benchmark functions from the optimization literature and on the F16 benchmark problem.

摘要:需求驱动的基于搜索的测试(也称为伪造,falsification)已被证明是在信息物理系统中发现错误行为的一种实用而有效的方法。尽管伪造方法的性能和适用性在不断提高,它们都有一个共同特征:它们是尽力而为的方法,当测试预算耗尽时,无法对不存在错误行为(伪造反例)提供任何保证。缺乏有限时间保证是一个主要局限,使伪造方法无法用于认证流程。在本文中,我们通过开发一种新的随机算法来解决有限时间保证问题。所提算法不仅估计(界定)伪造行为存在的概率,还识别这些伪造行为可能出现的区域。我们在优化文献中的标准基准函数和 F16 基准问题上展示了方法的适用性。
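
作为对照,摘要中所说的“尽力而为”的基于搜索的伪造,最简形式就是一个采样循环:不断采样输入并计算需求的鲁棒性值,出现负值即找到伪造反例,但预算耗尽时无法对“不存在反例”给出保证(以下为自拟需求上的随机搜索基线,并非 Part-X 的分区算法):

```python
import random

def robustness(x):
    """假想需求“系统响应 f(x) 不超过 2.0”:
    鲁棒性为正表示需求在 x 处满足,为负表示 x 是伪造反例。"""
    f = x * x - 1.0          # 玩具系统响应
    return 2.0 - f

def falsify(budget=1000, seed=0):
    rng = random.Random(seed)
    best_x, best_rho = None, float("inf")
    for _ in range(budget):
        x = rng.uniform(-3.0, 3.0)
        rho = robustness(x)
        if rho < best_rho:
            best_x, best_rho = x, rho
        if rho < 0:          # 找到伪造反例即提前停止
            break
    return best_x, best_rho

x_star, rho_star = falsify()
```

注意该循环一旦预算耗尽而未发现负值,只能返回观测到的最小鲁棒性,无法断言反例不存在;Part-X 正是针对这一点给出概率性保证。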

ML-58-标题: PPFS: Predictive Permutation Feature Selection

链接: https://arxiv.org/abs/2110.10713
作者: Atif Hassan, Jiaul H. Paik, Swanand Khare, Syed Asif Hassan
备注: 7 pages. For the implementation of this work, see this https URL

点击查看摘要

Abstract: We propose Predictive Permutation Feature Selection (PPFS), a novel wrapper-based feature selection method based on the concept of Markov Blanket (MB). Unlike previous MB methods, PPFS is a universal feature selection technique as it can work for both classification as well as regression tasks on datasets containing categorical and/or continuous features. We propose Predictive Permutation Independence (PPI), a new Conditional Independence (CI) test, which enables PPFS to be categorised as a wrapper feature selection method. This is in contrast to current filter based MB feature selection techniques that are unable to harness the advancements in supervised algorithms such as Gradient Boosting Machines (GBM). The PPI test is based on the knockoff framework and utilizes supervised algorithms to measure the association between an individual or a set of features and the target variable. We also propose a novel MB aggregation step that addresses the issue of sample inefficiency. Empirical evaluations and comparisons on a large number of datasets demonstrate that PPFS outperforms state-of-the-art Markov blanket discovery algorithms as well as, well-known wrapper methods. We also provide a sketch of the proof of correctness of our method. Implementation of this work is available at \url{this https URL}

摘要:我们提出预测置换特征选择(PPFS),一种基于马尔可夫毯(MB)概念的新型包装式(wrapper)特征选择方法。与以往的 MB 方法不同,PPFS 是一种通用的特征选择技术,它既能处理分类任务,也能处理含有类别型和/或连续型特征的数据集上的回归任务。我们提出预测置换独立性(PPI)这一新的条件独立性(CI)检验,使 PPFS 可以归类为包装式特征选择方法。这与当前基于过滤器的 MB 特征选择技术形成对比,后者无法利用梯度提升机(GBM)等监督算法的进展。PPI 检验基于 knockoff 框架,利用监督算法度量单个特征或特征集合与目标变量之间的关联。我们还提出了一个新的 MB 聚合步骤,以解决样本效率低的问题。在大量数据集上的实证评估与比较表明,PPFS 优于最先进的马尔可夫毯发现算法以及知名的包装式方法。我们还给出了方法正确性证明的概要。本工作的实现可在上述 https 链接处获得。
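
论文中 PPI 检验的具体构造基于 knockoff 框架;其中“置换”的基本直觉可以用经典的置换检验来示意:打乱特征后若与目标的关联显著下降,说明原关联是真实的(以下为自拟的简化示例,用皮尔逊相关系数代替论文中的监督算法):

```python
import random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

def permutation_pvalue(feature, target, n_perm=300, seed=0):
    """随机置换中 |相关系数| 达到观测值的比例;值越小说明关联越可能真实。"""
    rng = random.Random(seed)
    observed = abs(pearson(feature, target))
    shuffled = list(feature)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(shuffled, target)) >= observed:
            hits += 1
    return hits / n_perm

# 构造一个相关特征和一个无关特征,目标 y = x + 噪声
data_rng = random.Random(1)
x = [data_rng.uniform(0, 1) for _ in range(200)]
noise_feat = [data_rng.uniform(0, 1) for _ in range(200)]
y = [xi + 0.1 * data_rng.gauss(0, 1) for xi in x]

p_relevant = permutation_pvalue(x, y)        # 预期:很小(关联真实)
p_noise = permutation_pvalue(noise_feat, y)  # 预期:通常明显更大
```

论文中的 PPI 检验还结合了监督模型与条件化集合,这里仅示意“置换破坏关联”这一核心机制。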

ML-59-标题: SEA: Graph Shell Attention in Graph Neural Networks

链接: https://arxiv.org/abs/2110.10674
作者: Christian M.M. Frey, Yunpu Ma, Matthias Schubert
备注:

点击查看摘要

Abstract: A common issue in Graph Neural Networks (GNNs) is known as over-smoothing. By increasing the number of iterations within the message-passing of GNNs, the nodes’ representations of the input graph align with each other and become indiscernible. Recently, it has been shown that increasing a model’s complexity by integrating an attention mechanism yields more expressive architectures. This is majorly contributed to steering the nodes’ representations only towards nodes that are more informative than others. Transformer models in combination with GNNs result in architectures including Graph Transformer Layers (GTL), where layers are entirely based on the attention operation. However, the calculation of a node’s representation is still restricted to the computational working flow of a GNN. In our work, we relax the GNN architecture by means of implementing a routing heuristic. Specifically, the nodes’ representations are routed to dedicated experts. Each expert calculates the representations according to their respective GNN workflow. The definitions of distinguishable GNNs result from k-localized views starting from the central node. We call this procedure Graph Shell Attention (SEA), where experts process different subgraphs in a transformer-motivated fashion. Intuitively, by increasing the number of experts, the models gain in expressiveness such that a node’s representation is solely based on nodes that are located within the receptive field of an expert. We evaluate our architecture on various benchmark datasets showing competitive results compared to state-of-the-art models.

摘要:图神经网络(GNN)中的一个常见问题是过平滑。随着 GNN 消息传递迭代次数的增加,输入图中各节点的表示会相互趋同,变得难以区分。最近的研究表明,通过引入注意力机制增加模型复杂度,可以得到表达能力更强的架构,这主要得益于将节点表示仅导向比其他节点信息量更大的节点。Transformer 模型与 GNN 结合产生了包含图变换器层(GTL)的架构,其中各层完全基于注意力操作。然而,节点表示的计算仍然受限于 GNN 的计算工作流。在我们的工作中,我们通过实现一种路由启发式来放松 GNN 架构:节点的表示被路由到专门的专家,每个专家按照各自的 GNN 工作流计算表示。可区分的 GNN 的定义来自以中心节点为起点的 k 局部视图。我们将这一过程称为图壳注意力(SEA),其中专家以 Transformer 式的方式处理不同的子图。直观地说,随着专家数量的增加,模型的表达能力增强,使得一个节点的表示仅基于位于某个专家感受野内的节点。我们在多个基准数据集上评估了该架构,结果与最先进的模型相比具有竞争力。

ML-60-标题: OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems

链接: https://arxiv.org/abs/2110.10659
作者: Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K Panda
备注:

点击查看摘要

Abstract: Python has become a dominant programming language for emerging areas like Machine Learning (ML), Deep Learning (DL), and Data Science (DS). An attractive feature of Python is that it provides easy-to-use programming interface while allowing library developers to enhance performance of their applications by harnessing the computing power offered by High Performance Computing (HPC) platforms. Efficient communication is key to scaling applications on parallel systems, which is typically enabled by the Message Passing Interface (MPI) standard and compliant libraries on HPC hardware. mpi4py is a Python-based communication library that provides an MPI-like interface for Python applications allowing application developers to utilize parallel processing elements including GPUs. However, there is currently no benchmark suite to evaluate communication performance of mpi4py – and Python MPI codes in general – on modern HPC systems. In order to bridge this gap, we propose OMB-Py – Python extensions to the open-source OSU Micro-Benchmark (OMB) suite – aimed to evaluate communication performance of MPI-based parallel applications in Python. To the best of our knowledge, OMB-Py is the first communication benchmark suite for parallel Python applications. OMB-Py consists of a variety of point-to-point and collective communication benchmark tests that are implemented for a range of popular Python libraries including NumPy, CuPy, Numba, and PyCUDA. We also provide Python implementation for several distributed ML algorithms as benchmarks to understand the potential gain in performance for ML/DL workloads. Our evaluation reveals that mpi4py introduces a small overhead when compared to native MPI libraries. We also evaluate the ML/DL workloads and report up to 106x speedup on 224 CPU cores compared to sequential execution. We plan to publicly release OMB-Py to benefit Python HPC community.

摘要:Python 已成为机器学习(ML)、深度学习(DL)和数据科学(DS)等新兴领域的主导编程语言。Python 的一个吸引人之处在于,它提供了易用的编程接口,同时允许库开发者利用高性能计算(HPC)平台提供的算力来提升应用性能。高效通信是在并行系统上扩展应用的关键,这通常由消息传递接口(MPI)标准及 HPC 硬件上的兼容库来实现。mpi4py 是一个基于 Python 的通信库,为 Python 应用提供类 MPI 接口,使应用开发者能够利用包括 GPU 在内的并行处理单元。然而,目前还没有基准套件可以在现代 HPC 系统上评估 mpi4py(以及一般的 Python MPI 代码)的通信性能。为弥补这一空白,我们提出 OMB-Py,即对开源 OSU Micro-Benchmark(OMB)套件的 Python 扩展,旨在评估 Python 中基于 MPI 的并行应用的通信性能。据我们所知,OMB-Py 是首个面向并行 Python 应用的通信基准套件。OMB-Py 包含多种点对点和集合通信基准测试,并针对 NumPy、CuPy、Numba 和 PyCUDA 等一系列流行的 Python 库实现。我们还提供了若干分布式 ML 算法的 Python 实现作为基准,以了解 ML/DL 工作负载的潜在性能收益。我们的评估显示,与原生 MPI 库相比,mpi4py 引入的开销很小。我们还评估了 ML/DL 工作负载,在 224 个 CPU 核上相比串行执行最高可获得 106 倍加速。我们计划公开发布 OMB-Py,以惠及 Python HPC 社区。
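
OMB-Py 这类套件中最典型的点对点基准是 ping-pong 延迟测试:一端发送消息,对端原样送回,往返时间的一半即单向延迟。OMB-Py 实际基于 mpi4py 的进程间通信,这里仅用标准库线程和队列示意其测量结构:

```python
import queue
import threading
import time

def echo_server(inbox, outbox, n_msgs):
    """对端角色:收到每条消息后原样送回。"""
    for _ in range(n_msgs):
        outbox.put(inbox.get())

def pingpong_latency(msg_size=1024, iterations=200):
    to_peer, from_peer = queue.Queue(), queue.Queue()
    peer = threading.Thread(target=echo_server,
                            args=(to_peer, from_peer, iterations))
    peer.start()
    payload = b"x" * msg_size
    start = time.perf_counter()
    for _ in range(iterations):
        to_peer.put(payload)   # “发送”
        from_peer.get()        # 等待被送回的“回复”
    elapsed = time.perf_counter() - start
    peer.join()
    return elapsed / iterations / 2   # 单向延迟(秒)

one_way = pingpong_latency()
```

在真实的 MPI 基准中,“发送/接收”对应不同进程(rank)之间的 comm.send / comm.recv,并会对一系列消息大小重复测量。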

ML-61-标题: Adversarial Socialbot Learning via Multi-Agent Deep Hierarchical Reinforcement Learning

链接: https://arxiv.org/abs/2110.10655
作者: Thai Le, Long Tran-Thanh, Dongwon Lee
备注:

点击查看摘要

Abstract: Socialbots are software-driven user accounts on social platforms, acting autonomously (mimicking human behavior), with the aims to influence the opinions of other users or spread targeted misinformation for particular goals. As socialbots undermine the ecosystem of social platforms, they are often considered harmful. As such, there have been several computational efforts to auto-detect the socialbots. However, to our best knowledge, the adversarial nature of these socialbots has not yet been studied. This begs a question “can adversaries, controlling socialbots, exploit AI techniques to their advantage?” To this question, we successfully demonstrate that indeed it is possible for adversaries to exploit computational learning mechanism such as reinforcement learning (RL) to maximize the influence of socialbots while avoiding being detected. We first formulate the adversarial socialbot learning as a cooperative game between two functional hierarchical RL agents. While one agent curates a sequence of activities that can avoid the detection, the other agent aims to maximize network influence by selectively connecting with right users. Our proposed policy networks train with a vast amount of synthetic graphs and generalize better than baselines on unseen real-life graphs both in terms of maximizing network influence (up to +18%) and sustainable stealthiness (up to +40% undetectability) under a strong bot detector (with 90% detection accuracy). During inference, the complexity of our approach scales linearly, independent of a network’s structure and the virality of news. This makes our approach a practical adversarial attack when deployed in a real-life setting.

摘要:社交机器人(socialbot)是社交平台上由软件驱动的用户账号,它们自主行动(模仿人类行为),旨在影响其他用户的观点或为特定目的传播有针对性的错误信息。由于社交机器人破坏社交平台的生态,它们通常被认为是有害的。因此,已经有不少自动检测社交机器人的计算工作。然而,据我们所知,这些社交机器人的对抗性本质尚未被研究。这引出一个问题:“控制社交机器人的对手,能否利用 AI 技术为自己谋利?”对于这个问题,我们成功证明:对手确实可以利用强化学习(RL)等计算学习机制,在避免被检测的同时最大化社交机器人的影响力。我们首先将对抗性社交机器人学习形式化为两个功能分层的 RL 智能体之间的合作博弈:一个智能体规划能避免检测的活动序列,另一个智能体通过有选择地连接合适的用户来最大化网络影响力。我们提出的策略网络在大量合成图上训练,并在未见过的真实图上泛化良好:在强力机器人检测器(检测准确率 90%)之下,无论是最大化网络影响力(最高 +18%)还是可持续的隐蔽性(最高 +40% 不可检测性)都优于基线。在推断阶段,我们方法的复杂度线性扩展,与网络结构和新闻的传播性无关。这使得我们的方法在真实环境中部署时成为一种切实可行的对抗攻击。

ML-62-标题: More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences

链接: https://arxiv.org/abs/2110.10632
作者: Toby Johnstone, Nathan Grinsztajn, Johan Ferret, Philippe Preux
备注:

点击查看摘要

Abstract: Incorporating prior knowledge in reinforcement learning algorithms is mainly an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence: that is, when different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost, by solving a convex optimization problem. By replacing the usual epsilon-greedy strategy in a DQN, we demonstrate its potential in several environments with various dynamic structures.

摘要:如何在强化学习算法中纳入先验知识在很大程度上仍是一个开放问题。即便掌握了关于环境动力学的洞见,强化学习传统上仍在“白板”设定下使用,必须从零开始探索和学习一切。在本文中,我们考虑利用关于动作序列等价性的先验,即不同的动作序列产生相同效果的情形。我们提出了一种新的局部探索策略,经过校准以最小化碰撞并最大化对新状态的访问。我们证明,通过求解一个凸优化问题,可以以很小的代价计算该策略。通过在 DQN 中替换常用的 epsilon-贪婪策略,我们在若干具有不同动力学结构的环境中展示了其潜力。
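
摘要中“动作序列等价”的含义可以用一个小网格世界说明:从同一状态出发、到达同一状态的不同动作序列彼此等价,探索时每个等价类只需尝试一个代表(以下为自拟的最小示例):

```python
from itertools import product

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def apply_sequence(state, seq):
    x, y = state
    for action in seq:
        dx, dy = MOVES[action]
        x, y = x + dx, y + dy
    return (x, y)

def equivalence_classes(state, length=2):
    """把给定长度的所有动作序列按到达的状态分组;每组即一个等价类。"""
    classes = {}
    for seq in product(MOVES, repeat=length):
        classes.setdefault(apply_sequence(state, seq), []).append(seq)
    return classes

classes = equivalence_classes((0, 0))
```

16 个长度为 2 的序列只落入 9 个等价类,例如 ('up', 'right') 与 ('right', 'up') 等价,而相互抵消的 4 个序列都落入 (0, 0) 这一类。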

ML-63-标题: Independent Natural Policy Gradient Always Converges in Markov Potential Games

链接: https://arxiv.org/abs/2110.10614
作者: Roy Fox, Stephen McAleer, Will Overman, Ioannis Panageas
备注: 24 pages

点击查看摘要

Abstract: Multi-agent reinforcement learning has been successfully applied to fully-cooperative and fully-competitive environments, but little is currently known about mixed cooperative/competitive environments. In this paper, we focus on a particular class of multi-agent mixed cooperative/competitive stochastic games called Markov Potential Games (MPGs), which include cooperative games as a special case. Recent results have shown that independent policy gradient converges in MPGs but it was not known whether Independent Natural Policy Gradient converges in MPGs as well. We prove that Independent Natural Policy Gradient always converges in the last iterate using constant learning rates. The proof deviates from the existing approaches and the main challenge lies in the fact that Markov Potential Games do not have unique optimal values (as single-agent settings exhibit) so different initializations can lead to different limit point values. We complement our theoretical results with experiments that indicate that Natural Policy Gradient outperforms Policy Gradient in routing games and congestion games.

摘要:多智能体强化学习已成功应用于完全合作和完全竞争的环境,但目前对混合合作/竞争环境所知甚少。在本文中,我们关注一类特殊的多智能体混合合作/竞争随机博弈,称为马尔可夫势博弈(MPG),它将合作博弈作为特例包含在内。最近的结果表明,独立策略梯度在 MPG 中收敛,但独立自然策略梯度是否也在 MPG 中收敛此前尚不清楚。我们证明,使用恒定学习率时,独立自然策略梯度总是在最后一次迭代中收敛。证明偏离了现有方法,主要挑战在于马尔可夫势博弈不像单智能体设定那样具有唯一的最优值,因此不同的初始化可能导致不同的极限点取值。我们用实验补充了理论结果,表明在路由博弈和拥塞博弈中自然策略梯度优于策略梯度。

ML-64-标题: Transductive Robust Learning Guarantees

链接: https://arxiv.org/abs/2110.10602
作者: Omar Montasser, Steve Hanneke, Nathan Srebro
备注:

点击查看摘要

Abstract: We study the problem of adversarially robust learning in the transductive setting. For classes $\mathcal{H}$ of bounded VC dimension, we propose a simple transductive learner that when presented with a set of labeled training examples and a set of unlabeled test examples (both sets possibly adversarially perturbed), it correctly labels the test examples with a robust error rate that is linear in the VC dimension and is adaptive to the complexity of the perturbation set. This result provides an exponential improvement in dependence on VC dimension over the best known upper bound on the robust error in the inductive setting, at the expense of competing with a more restrictive notion of optimal robust error.

摘要:我们研究转导(transductive)设定下的对抗鲁棒学习问题。对于 VC 维有界的类 $\mathcal{H}$,我们提出了一个简单的转导学习器:当给定一组带标签的训练样本和一组无标签的测试样本(两组都可能被对抗扰动)时,它能以在 VC 维上呈线性、且自适应于扰动集复杂度的鲁棒错误率正确标注测试样本。相比归纳设定下鲁棒误差的最佳已知上界,该结果在对 VC 维的依赖上带来了指数级改进,代价是与一个更严格的最优鲁棒误差概念进行比较。

ML-65-标题: Color Teams for Machine Learning Development

链接: https://arxiv.org/abs/2110.10601
作者: Josh Kalin, David Noever, Matthew Ciolino
备注: 8 Pages, 6 Figures

点击查看摘要

Abstract: Machine learning and software development share processes and methodologies for reliably delivering products to customers. This work proposes the use of a new teaming construct for forming machine learning teams for better combatting adversarial attackers. In cybersecurity, infrastructure uses these teams to protect their systems by using system builders and programmers to also offer more robustness to their platforms. Color teams provide clear responsibility to the individuals on each team for which part of the baseline (Yellow), attack (Red), and defense (Blue) breakout of the pipeline. Combining colors leads to additional knowledge shared across the team and more robust models built during development. The responsibilities of the new teams Orange, Green, and Purple will be outlined during this paper along with an overview of the necessary resources for these teams to be successful.

摘要:机器学习与软件开发在向客户可靠交付产品方面共享流程与方法论。本工作提出一种新的组队结构,用于组建机器学习团队以更好地对抗对抗性攻击者。在网络安全领域,基础设施方通过系统构建者和程序员组成此类团队来保护系统,并提升平台的鲁棒性。颜色团队让每个团队中的个人对流水线中基线(黄队)、攻击(红队)和防御(蓝队)各自负责的部分承担明确责任。颜色的组合带来团队间更多的知识共享,并在开发过程中构建出更鲁棒的模型。本文将概述橙、绿、紫三个新团队的职责,并综述这些团队取得成功所需的资源。

ML-66-标题: Time-Domain Mapping Based Single-Channel Speech Separation With Hierarchical Constraint Training

链接: https://arxiv.org/abs/2110.10593
作者: Chenyang Gao, Yue Gu, Ivan Marsic
备注:

点击查看摘要

Abstract: Single-channel speech separation is required for multi-speaker speech recognition. Recent deep learning-based approaches focused on time-domain audio separation net (TasNet) because it has superior performance and lower latency compared to the conventional time-frequency-based (T-F-based) approaches. Most of these works rely on the masking-based method that estimates a linear mapping function (mask) for each speaker. However, the other commonly used method, the mapping-based method that is less sensitive to SNR variations, is inadequately studied in the time domain. We explore the potential of the mapping-based method by introducing attention augmented DPRNN (AttnAugDPRNN) which directly approximates the clean sources from the mixture for speech separation. Permutation Invariant Training (PIT) has been a paradigm to solve the label ambiguity problem for speech separation but usually leads to suboptimal performance. To solve this problem, we propose an efficient training strategy called Hierarchical Constraint Training (HCT) to regularize the training, which could effectively improve the model performance. When using PIT, our results showed that mapping-based AttnAugDPRNN outperformed masking-based AttnAugDPRNN when the training corpus is large. Mapping-based AttnAugDPRNN with HCT significantly improved the SI-SDR by 10.1% compared to the masking-based AttnAugDPRNN without HCT.

摘要:多说话人语音识别需要单通道语音分离。最近基于深度学习的方法聚焦于时域音频分离网络(TasNet),因为相比传统的基于时频域(T-F)的方法,它性能更优且时延更低。这些工作大多依赖基于掩蔽(masking)的方法,为每个说话人估计一个线性映射函数(掩码)。然而,另一类常用方法,即对信噪比变化不那么敏感的基于映射(mapping)的方法,在时域中尚缺乏充分研究。我们通过引入注意力增强的 DPRNN(AttnAugDPRNN)来探索基于映射方法的潜力,它直接从混合信号逼近干净源以实现语音分离。置换不变训练(PIT)是解决语音分离中标签歧义问题的常用范式,但通常导致次优性能。为解决该问题,我们提出一种称为分层约束训练(HCT)的高效训练策略来正则化训练,从而有效提升模型性能。使用 PIT 时,我们的结果表明,当训练语料较大时,基于映射的 AttnAugDPRNN 优于基于掩蔽的 AttnAugDPRNN。与不使用 HCT 的基于掩蔽的 AttnAugDPRNN 相比,结合 HCT 的基于映射的 AttnAugDPRNN 将 SI-SDR 显著提升了 10.1%。
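摘要中反复出现的置换不变训练(PIT)核心思想很简单:在所有"估计源-参考源"的说话人排列中取损失最小的一种。下面是一个与论文无关的最小 NumPy 示意(穷举排列,并假设以 MSE 作为分离损失):

```python
import itertools
import numpy as np

def pit_loss(estimates, targets):
    # 在所有说话人排列中取平均 MSE 最小者,返回 (最小损失, 最佳排列)
    n = len(targets)
    best, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        loss = np.mean([np.mean((estimates[p] - targets[i]) ** 2)
                        for i, p in enumerate(perm)])
        if loss < best:
            best, best_perm = loss, perm
    return best, best_perm

# 两个"说话人",网络输出恰好把两路源的顺序弄反了
targets = np.array([[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]])
estimates = np.array([[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]])
loss, perm = pit_loss(estimates, targets)
```

即便输出顺序与标签不一致,PIT 损失仍为 0,排列 (1, 0) 即为正确的说话人对应关系;这正是 PIT 能解决标签歧义问题的原因。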

ML-67-标题: Distributionally Robust Semi-Supervised Learning Over Graphs

链接: https://arxiv.org/abs/2110.10582
作者: Alireza Sadeghi, Meng Ma, Bingcong Li, Georgios B. Giannakis
备注:

点击查看摘要

Abstract: Semi-supervised learning (SSL) over graph-structured data emerges in many network science applications. To efficiently manage learning over graphs, variants of graph neural networks (GNNs) have been developed recently. By succinctly encoding local graph structures and features of nodes, state-of-the-art GNNs can scale linearly with the size of graph. Despite their success in practice, most of existing methods are unable to handle graphs with uncertain nodal attributes. Specifically whenever mismatches between training and testing data distribution exists, these models fail in practice. Challenges also arise due to distributional uncertainties associated with data acquired by noisy measurements. In this context, a distributionally robust learning framework is developed, where the objective is to train models that exhibit quantifiable robustness against perturbations. The data distribution is considered unknown, but lies within a Wasserstein ball centered around empirical data distribution. A robust model is obtained by minimizing the worst expected loss over this ball. However, solving the emerging functional optimization problem is challenging, if not impossible. Advocating a strong duality condition, we develop a principled method that renders the problem tractable and efficiently solvable. Experiments assess the performance of the proposed method.

摘要:基于图结构数据的半监督学习(SSL)出现在许多网络科学应用中。为了高效地管理图上的学习,近来发展了多种图神经网络(GNN)变体。通过简洁地编码局部图结构和节点特征,最先进的 GNN 能够随图的规模线性扩展。尽管这些方法在实践中取得了成功,大多数现有方法都无法处理节点属性不确定的图。具体而言,一旦训练与测试数据分布之间存在失配,这些模型在实践中就会失效。由含噪测量所得数据带来的分布不确定性也构成挑战。在此背景下,我们提出一个分布鲁棒学习框架,其目标是训练对扰动具有可量化鲁棒性的模型。数据分布被视为未知,但位于以经验数据分布为中心的 Wasserstein 球内。通过最小化该球上的最坏期望损失,可以得到鲁棒模型。然而,求解由此产生的泛函优化问题极具挑战性,甚至不可行。借助强对偶条件,我们开发了一种有原则的方法,使该问题变得可处理且可高效求解。实验评估了所提方法的性能。

ML-68-标题: Behavioral Experiments for Understanding Catastrophic Forgetting

链接: https://arxiv.org/abs/2110.10570
作者: Samuel J. Bell, Neil D. Lawrence
备注:

点击查看摘要

Abstract: In this paper we explore whether the fundamental tool of experimental psychology, the behavioral experiment, has the power to generate insight not only into humans and animals, but artificial systems too. We apply the techniques of experimental psychology to investigating catastrophic forgetting in neural networks. We present a series of controlled experiments with two-layer ReLU networks, and exploratory results revealing a new understanding of the behavior of catastrophic forgetting. Alongside our empirical findings, we demonstrate an alternative, behavior-first approach to investigating neural network phenomena.

摘要:本文探讨实验心理学的基本工具,即行为实验,是否不仅能对人类和动物、还能对人工系统产生洞见。我们将实验心理学的技术应用于研究神经网络中的灾难性遗忘。我们给出了一系列基于两层 ReLU 网络的受控实验,以及揭示对灾难性遗忘行为新理解的探索性结果。除实证发现外,我们还展示了一种"行为优先"的研究神经网络现象的替代方法。

ML-69-标题: Why Settle for Just One? Extending EL Ontology Embeddings with Many-to-Many Relationships

链接: https://arxiv.org/abs/2110.10555
作者: Biswesh Mohapatra, Sumit Bhatia, Raghava Mutharaju, G. Srinivasaraghavan
备注: The paper got accepted in the SemREC challenge at ISWC 2021

点击查看摘要

Abstract: Knowledge Graph (KG) embeddings provide a low-dimensional representation of entities and relations of a Knowledge Graph and are used successfully for various applications such as question answering and search, reasoning, inference, and missing link prediction. However, most of the existing KG embeddings only consider the network structure of the graph and ignore the semantics and the characteristics of the underlying ontology that provides crucial information about relationships between entities in the KG. Recent efforts in this direction involve learning embeddings for a Description Logic (logical underpinning for ontologies) named EL++. However, such methods consider all the relations defined in the ontology to be one-to-one which severely limits their performance and applications. We provide a simple and effective solution to overcome this shortcoming that allows such methods to consider many-to-many relationships while learning embedding representations. Experiments conducted using three different EL++ ontologies show substantial performance improvement over five baselines. Our proposed solution also paves the way for learning embedding representations for even more expressive description logics such as SROIQ.

摘要:知识图谱(KG)嵌入为知识图谱的实体和关系提供低维表示,并已成功用于问答与搜索、推理、推断以及缺失链接预测等各类应用。然而,大多数现有的 KG 嵌入只考虑图的网络结构,而忽略了底层本体的语义和特性,后者提供了关于 KG 中实体间关系的关键信息。这一方向的近期工作涉及为名为 EL++ 的描述逻辑(本体的逻辑基础)学习嵌入。然而,此类方法将本体中定义的所有关系都视为一对一关系,严重限制了其性能与应用。我们提供了一个简单有效的解决方案来克服这一缺陷,使这类方法在学习嵌入表示时能够考虑多对多关系。在三个不同的 EL++ 本体上进行的实验显示,相比五个基线有显著的性能提升。我们提出的方案也为诸如 SROIQ 等表达能力更强的描述逻辑学习嵌入表示铺平了道路。

ML-70-标题: Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning

链接: https://arxiv.org/abs/2110.10548
作者: Ningning Xie, Tamara Norman, Dominik Grewe, Dimitrios Vytiniotis
备注: Submitted to the 5th MLSys Conference

点击查看摘要

Abstract: We present a novel characterization of the mapping of multiple parallelism forms (e.g. data and model parallelism) onto hierarchical accelerator systems that is hierarchy-aware and greatly reduces the space of software-to-hardware mapping. We experimentally verify the substantial effect of these mappings on all-reduce performance (up to 448x). We offer a novel syntax-guided program synthesis framework that is able to decompose reductions over one or more parallelism axes to sequences of collectives in a hierarchy- and mapping-aware way. For 69% of parallelism placements and user requested reductions, our framework synthesizes programs that outperform the default all-reduce implementation when evaluated on different GPU hierarchies (max 2.04x, average 1.27x). We complement our synthesis tool with a simulator exceeding 90% top-10 accuracy, which therefore reduces the need for massive evaluations of synthesis results to determine a small set of optimal programs and mappings.

摘要:我们对多种并行形式(如数据并行和模型并行)到层次化加速器系统的映射提出了一种新颖的刻画,它具有层次感知能力,并大幅缩减了软件到硬件映射的搜索空间。我们通过实验验证了这些映射对 all-reduce 性能的显著影响(最高达 448 倍)。我们提供了一个新颖的语法引导程序综合框架,能够以层次感知和映射感知的方式,将一个或多个并行轴上的归约分解为集合通信序列。对于 69% 的并行布局和用户请求的归约,我们的框架综合出的程序在不同 GPU 层次结构上的评估中优于默认的 all-reduce 实现(最高 2.04 倍,平均 1.27 倍)。我们还以一个 top-10 准确率超过 90% 的模拟器补充综合工具,从而减少了为确定一小组最优程序和映射而对综合结果进行大规模评估的需求。

ML-71-标题: Ranking and Tuning Pre-trained Models: A New Paradigm of Exploiting Model Hubs

链接: https://arxiv.org/abs/2110.10545
作者: Kaichao You, Yong Liu, Jianmin Wang, Michael I. Jordan, Mingsheng Long
备注: 45 pages

点击查看摘要

Abstract: Pre-trained model hubs with many pre-trained models (PTMs) have been a cornerstone in deep learning. Although built at a high cost, they are in fact under-exploited: practitioners usually pick one PTM from the provided model hub by popularity, and then fine-tune the PTM to solve the target task. This naïve but common practice poses two obstacles to sufficiently exploiting pre-trained model hubs: (1) the PTM selection procedure has no optimality guarantee; (2) only one PTM is used while the rest of the PTMs are overlooked. Ideally, to maximally exploit pre-trained model hubs, trying all combinations of PTMs and extensively fine-tuning each combination of PTMs are required, which incurs exponential combinations and an unaffordable computational budget. In this paper, we propose a new paradigm of exploiting model hubs by ranking and tuning pre-trained models: (1) Our conference work~\citep{you_logme:_2021} proposed LogME to estimate the maximum value of label evidence given features extracted by pre-trained models, which can rank all the PTMs in a model hub for various types of PTMs and tasks before fine-tuning. (2) The best ranked PTM can be fine-tuned and deployed if we have no preference for the model's architecture, or the target PTM can be tuned by top-K ranked PTMs via the proposed B-Tuning algorithm. The ranking part is based on the conference paper, and we complete its theoretical analysis (convergence proof of the heuristic evidence maximization procedure, and the influence of feature dimension) in this paper. The tuning part introduces a novel Bayesian Tuning (B-Tuning) method for multiple PTMs tuning, which surpasses dedicated methods designed for homogeneous PTMs tuning and sets up new state of the art for heterogeneous PTMs tuning. We believe the new paradigm of exploiting PTM hubs can interest a large audience of the community.

摘要:拥有大量预训练模型(PTM)的预训练模型库是深度学习的基石。尽管构建成本高昂,它们实际上并未被充分利用:从业者通常按流行程度从模型库中挑选一个 PTM,然后对其微调以解决目标任务。这种朴素但常见的做法给充分利用预训练模型库带来两个障碍:(1) PTM 的选择过程没有最优性保证;(2) 只使用了一个 PTM,其余 PTM 被忽视。理想情况下,为最大化利用预训练模型库,需要尝试 PTM 的所有组合并对每种组合充分微调,这会产生指数级的组合数和难以承受的计算开销。本文提出一种通过对预训练模型进行排序与调优来利用模型库的新范式:(1) 我们的会议工作~\citep{you_logme:_2021} 提出了 LogME,用以估计给定预训练模型所提取特征下标签证据的最大值,从而能在微调之前对模型库中各类 PTM 和任务的所有 PTM 进行排序。(2) 若对模型架构没有偏好,可以直接微调并部署排名最高的 PTM;或者通过我们提出的 B-Tuning 算法,用排名前 K 的 PTM 来调优目标 PTM。排序部分基于会议论文,本文补全了其理论分析(启发式证据最大化过程的收敛性证明,以及特征维度的影响)。调优部分引入了一种新颖的贝叶斯调优(B-Tuning)方法用于多 PTM 调优,其性能超越了为同构 PTM 调优设计的专用方法,并为异构 PTM 调优树立了新的最先进水平。我们相信这种利用 PTM 库的新范式能引起社区的广泛兴趣。

ML-72-标题: Sampling from Arbitrary Functions via PSD Models

链接: https://arxiv.org/abs/2110.10527
作者: Ulysse Marteau-Ferey (SIERRA, PSL), Alessandro Rudi (PSL, SIERRA), Francis Bach (PSL, SIERRA)
备注:

点击查看摘要

Abstract: In many areas of applied statistics and machine learning, generating an arbitrary number of independent and identically distributed (i.i.d.) samples from a given distribution is a key task. When the distribution is known only through evaluations of the density, current methods either scale badly with the dimension or require very involved implementations. Instead, we take a two-step approach by first modeling the probability distribution and then sampling from that model. We use the recently introduced class of positive semi-definite (PSD) models, which have been shown to be efficient for approximating probability densities. We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models. We also present preliminary empirical results to illustrate our assertions.

摘要:在应用统计和机器学习的许多领域,从给定分布生成任意数量的独立同分布(i.i.d.)样本是一项关键任务。当分布仅能通过密度函数求值获知时,现有方法要么随维度增长扩展性很差,要么需要非常复杂的实现。与之不同,我们采取两步法:先对概率分布建模,再从该模型中采样。我们使用最近提出的正半定(PSD)模型类,该类模型已被证明能高效地逼近概率密度。我们证明这些模型只需少量求值即可简洁地逼近一大类密度,并给出一种从这些模型中高效采样的简单算法。我们还给出初步的实证结果来佐证我们的论断。
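摘要中"先建模密度、再从模型采样"的两步思路,可以用一个与 PSD 模型无关的一维离散化玩具示意来体会(网格近似只是演示用的替代品,论文中的 PSD 模型提供的是远为紧凑的密度表示):

```python
import numpy as np

def fit_then_sample(logp, grid, n, seed=None):
    # 第一步:在网格上求值未归一化密度,得到一个离散"模型"
    # 第二步:按该模型的概率从网格点中抽取 i.i.d. 样本
    rng = np.random.default_rng(seed)
    p = np.exp(logp(grid))
    p = p / p.sum()
    idx = rng.choice(len(grid), size=n, p=p)
    return grid[idx]

grid = np.linspace(-5.0, 5.0, 2001)
# 未归一化的标准正态对数密度 -x^2/2
samples = fit_then_sample(lambda x: -0.5 * x ** 2, grid, 50_000, seed=0)
```

对标准正态密度,抽出的样本均值接近 0、标准差接近 1;但网格法在高维下会失效,这正是摘要所说"现有方法随维度扩展性差"、因而需要 PSD 模型的动机。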

ML-73-标题: Statistical and Topological Properties of Gaussian Smoothed Sliced Probability Divergences

链接: https://arxiv.org/abs/2110.10524
作者: Alain Rakotomamonjy, Mokhtar Z. Alaya (LMAC), Maxime Berar (DocApp - LITIS), Gilles Gasso (DocApp - LITIS)
备注:

点击查看摘要

Abstract: Gaussian smoothed sliced Wasserstein distance has been recently introduced for comparing probability distributions, while preserving privacy on the data. It has been shown, in applications such as domain adaptation, to provide performances similar to its non-private (non-smoothed) counterpart. However, the computational and statistical properties of such a metric have not yet been well-established. In this paper, we analyze the theoretical properties of this distance as well as those of generalized versions denoted as Gaussian smoothed sliced divergences. We show that smoothing and slicing preserve the metric property and the weak topology. We also provide results on the sample complexity of such divergences. Since the privacy level depends on the amount of Gaussian smoothing, we analyze the impact of this parameter on the divergence. We support our theoretical findings with empirical studies of the Gaussian smoothed and sliced versions of the Wasserstein distance, Sinkhorn divergence and maximum mean discrepancy (MMD). In the context of privacy-preserving domain adaptation, we confirm that those Gaussian smoothed sliced Wasserstein and MMD divergences perform very well while ensuring data privacy.

摘要:高斯平滑切片 Wasserstein 距离最近被提出,用于在保护数据隐私的同时比较概率分布。在诸如域自适应等应用中,它已被证明能提供与其非隐私(非平滑)版本相近的性能。然而,这类度量的计算和统计性质尚未得到很好的刻画。本文分析了该距离以及被称为高斯平滑切片散度的广义版本的理论性质。我们证明平滑和切片保持了度量性质与弱拓扑。我们还给出了这类散度的样本复杂度结果。由于隐私水平取决于高斯平滑的程度,我们分析了该参数对散度的影响。我们以对高斯平滑切片版本的 Wasserstein 距离、Sinkhorn 散度和最大均值差异(MMD)的实证研究支持理论发现。在保护隐私的域自适应场景中,我们确认高斯平滑切片 Wasserstein 和 MMD 散度在保证数据隐私的同时表现非常出色。
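高斯平滑切片 Wasserstein 距离的计算流程可以用几行 NumPy 勾勒:先给两组样本加高斯噪声(平滑/隐私步骤),再随机取方向做一维投影,排序后计算一维 W1 距离并取平均。以下是示意性的蒙特卡洛实现(假设两组样本数相同,各参数取值仅为演示):

```python
import numpy as np

def gs_sliced_w1(x, y, sigma=1.0, n_proj=200, seed=0):
    rng = np.random.default_rng(seed)
    # 高斯平滑:等价于比较与 N(0, sigma^2 I) 卷积后的分布
    xs = x + rng.normal(0.0, sigma, x.shape)
    ys = y + rng.normal(0.0, sigma, y.shape)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=x.shape[1])
        theta /= np.linalg.norm(theta)        # 随机单位方向(切片)
        # 等样本量时,一维 W1 可由排序后的逐点绝对差得到
        total += np.mean(np.abs(np.sort(xs @ theta) - np.sort(ys @ theta)))
    return total / n_proj

rng = np.random.default_rng(42)
a = rng.normal(0.0, 1.0, size=(500, 3))
b = rng.normal(0.0, 1.0, size=(500, 3))   # 与 a 同分布
c = rng.normal(4.0, 1.0, size=(500, 3))   # 均值平移后的分布
d_same = gs_sliced_w1(a, b, seed=1)
d_shift = gs_sliced_w1(a, c, seed=2)
```

同分布样本对的距离接近 0,平移后的分布距离明显更大;sigma 越大隐私性越强、但分布差异也被"抹平"得越多,这正是摘要所分析的权衡。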

ML-74-标题: CIM-PPO:Proximal Policy Optimization with Liu-Correntropy Induced Metric

链接: https://arxiv.org/abs/2110.10522
作者: Yunxiao Guo, Han Long, Xiaojun Duan, Kaiyuan Feng, Maochu Li, Xiaying Ma
备注:

点击查看摘要

Abstract: As an algorithm based on deep reinforcement learning, Proximal Policy Optimization (PPO) performs well in many complex tasks and has become one of the most popular RL algorithms in recent years. According to the mechanism of the penalty in the surrogate objective, PPO can be divided into PPO with KL Divergence (KL-PPO) and PPO with a Clip function (Clip-PPO). Clip-PPO is widely used in a variety of practical scenarios and has attracted the attention of many researchers; therefore, many variants have been created, making the algorithm better and better. However, as a more theoretical algorithm, KL-PPO has been neglected because its performance is not as good as Clip-PPO's. In this article, we analyze the asymmetry effect of KL divergence on PPO's objective function, and give an inequality that indicates when the asymmetry will affect the efficiency of KL-PPO. We propose PPO with a Correntropy Induced Metric (CIM-PPO), which applies the theory of correntropy (a symmetric metric method widely used in M-estimation to evaluate the difference between two distributions) to PPO. We then design experiments based on OpenAI Gym to test the effectiveness of the new algorithm and compare it with KL-PPO and Clip-PPO.

摘要:作为一种基于深度强化学习的算法,近端策略优化(PPO)在许多复杂任务中表现良好,已成为近年来最流行的强化学习算法之一。根据代理目标函数中惩罚机制的不同,PPO 可分为基于 KL 散度的 PPO(KL-PPO)和基于裁剪函数的 PPO(Clip-PPO)。Clip-PPO 被广泛用于各类实际场景,吸引了众多研究者的关注,因此也产生了许多变体,使算法性能不断提升。然而,作为理论性更强的算法,KL-PPO 因性能不如 Clip-PPO 而被忽视。本文分析了 KL 散度的不对称性对 PPO 目标函数的影响,并给出了一个不等式,用以指示这种不对称性何时会影响 KL-PPO 的效率。我们提出基于相关熵诱导度量的 PPO 算法(CIM-PPO),它利用相关熵理论(一种广泛用于 M 估计中评估两个分布差异的对称度量方法)并将其应用于 PPO。随后,我们基于 OpenAI Gym 设计实验来检验新算法的有效性,并与 KL-PPO 和 Clip-PPO 进行比较。
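摘要提到的相关熵诱导度量(CIM)可以独立于 PPO 来理解:它用高斯核估计相关熵 V(x, y) = E[k(x − y)],再取 CIM = sqrt(k(0) − V)。与欧氏距离不同,它有界且对离群点鲁棒。下面是一个示意实现(sigma 等取值仅为演示,并非论文中的设置):

```python
import numpy as np

def cim(x, y, sigma=1.0):
    # 相关熵 V:逐元素差经高斯核后的均值;未归一化核满足 k(0) = 1
    v = np.mean(np.exp(-((x - y) ** 2) / (2.0 * sigma ** 2)))
    return np.sqrt(1.0 - v)

x = np.zeros(10)
close = np.full(10, 0.1)          # 处处有微小偏差
outlier = np.zeros(10)
outlier[0] = 100.0                # 单个巨大离群点

d_close = cim(x, close)
d_outlier = cim(x, outlier)
```

单个离群点无论多大,对 CIM 的贡献最多为 1/N,因此 d_outlier 远小于相应的欧氏距离 100,且整体被 1 所界;这种有界、对称的性质正是用它替代 KL 散度的直观动机之一。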

ML-75-标题: Periodic DMP formulation for Quaternion Trajectories

链接: https://arxiv.org/abs/2110.10510
作者: Fares J. Abu-Dakka, Matteo Saveriano, Luka Peternel
备注: 2021 20th International Conference on Advanced Robotics (ICAR)

点击查看摘要

Abstract: Imitation learning techniques have been used as a way to transfer skills to robots. Among them, dynamic movement primitives (DMPs) have been widely exploited as an effective and an efficient technique to learn and reproduce complex discrete and periodic skills. While DMPs have been properly formulated for learning point-to-point movements for both translation and orientation, periodic ones are missing a formulation to learn the orientation. To address this gap, we propose a novel DMP formulation that enables encoding of periodic orientation trajectories. Within this formulation we develop two approaches: Riemannian metric-based projection approach and unit quaternion based periodic DMP. Both formulations exploit unit quaternions to represent the orientation. However, the first exploits the properties of Riemannian manifolds to work in the tangent space of the unit sphere. The second encodes directly the unit quaternion trajectory while guaranteeing the unitary norm of the generated quaternions. We validated the technical aspects of the proposed methods in simulation. Then we performed experiments on a real robot to execute daily tasks that involve periodic orientation changes (i.e., surface polishing/wiping and liquid mixing by shaking).

摘要:模仿学习技术已被用作向机器人迁移技能的一种方式。其中,动态运动基元(DMP)作为一种学习并复现复杂离散与周期性技能的有效且高效的技术被广泛使用。虽然 DMP 已被恰当地表述用于学习平移和姿态的点到点运动,但周期性 DMP 尚缺乏学习姿态的表述。为填补这一空白,我们提出一种新颖的 DMP 表述,能够编码周期性姿态轨迹。在该表述下我们发展了两种方法:基于黎曼度量的投影方法和基于单位四元数的周期性 DMP。两种表述都利用单位四元数表示姿态。不同的是,前者利用黎曼流形的性质在单位球面的切空间中工作;后者直接编码单位四元数轨迹,同时保证生成四元数的单位范数。我们在仿真中验证了所提方法的技术细节,随后在真实机器人上执行涉及周期性姿态变化的日常任务实验(即表面抛光/擦拭以及摇晃混合液体)。

ML-76-标题: Transferring Reinforcement Learning for DC-DC Buck Converter Control via Duty Ratio Mapping: From Simulation to Implementation

链接: https://arxiv.org/abs/2110.10490
作者: Chenggang Cui, Tianxiao Yang, Yuxuan Dai, Chuanlin Zhang
备注:

点击查看摘要

Abstract: Reinforcement learning (RL) control approach with application into power electronics systems has become an emerging topic whilst the sim-to-real issue remains a challenging problem as very few results can be referred to in the literature. Indeed, due to the inevitable mismatch between simulation models and real-life systems, offline trained RL control strategies may sustain unexpected hurdles in practical implementation during transferring procedure. As the main contribution of this paper, a transferring methodology via a delicately designed duty ratio mapping (DRM) is proposed for a DC-DC buck converter. Then, a detailed sim-to-real process is presented to enable the implementation of a model-free deep reinforcement learning (DRL) controller. The feasibility and effectiveness of the proposed methodology are demonstrated by comparative experimental studies.

摘要:将强化学习(RL)控制方法应用于电力电子系统已成为一个新兴课题,但其中的仿真到实物(sim-to-real)问题仍是一个具有挑战性的难题,文献中可供参考的结果很少。事实上,由于仿真模型与真实系统之间不可避免的失配,离线训练的 RL 控制策略在迁移过程中可能于实际部署时遭遇意想不到的障碍。作为本文的主要贡献,我们为 DC-DC 降压(buck)变换器提出了一种基于精心设计的占空比映射(DRM)的迁移方法。随后,我们给出了详细的 sim-to-real 流程,以实现无模型深度强化学习(DRL)控制器。对比实验研究证明了所提方法的可行性与有效性。

ML-77-标题: A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays

链接: https://arxiv.org/abs/2110.10486
作者: Leonardo Ravaglia, Manuele Rusci, Davide Nadalini, Alessandro Capotondi, Francesco Conti, Luca Benini
备注: 14 pages

点击查看摘要

Abstract: In the last few years, research and development on Deep Learning models and techniques for ultra-low-power devices (in a word, TinyML) has mainly focused on a train-then-deploy assumption, with static models that cannot be adapted to newly collected data without cloud-based data collection and fine-tuning. Latent Replay-based Continual Learning (CL) techniques [1] enable online, serverless adaptation in principle, but so far they have still been too computation- and memory-hungry for ultra-low-power TinyML devices, which are typically based on microcontrollers. In this work, we introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power (PULP) processor. We rethink the baseline Latent Replay CL algorithm, leveraging quantization of the frozen stage of the model and of the Latent Replays (LRs) to reduce their memory cost with minimal impact on accuracy. In particular, 8-bit compression of the LR memory proves to be almost lossless (-0.26% with 3000 LRs) compared to the full-precision baseline implementation, but requires 4x less memory, while 7-bit can also be used with an additional minimal accuracy degradation (up to 5%). We also introduce optimized primitives for forward and backward propagation on the PULP processor. Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory, an amount compatible with embedding in TinyML devices. On an advanced 22nm prototype of our platform, called VEGA, the proposed solution performs on average 65x faster than a low-power STM32 L4 microcontroller, being 37x more energy efficient, enough for a lifetime of 535h when learning a new mini-batch of data once every minute.

摘要:过去几年中,面向超低功耗设备的深度学习模型与技术(一言以蔽之,TinyML)的研发主要基于"先训练、后部署"的假设,静态模型若无基于云端的数据收集和微调,便无法适应新采集的数据。基于潜在重放(Latent Replay)的持续学习(CL)技术[1]原则上支持在线、无服务器的适应,但迄今为止,对通常基于微控制器的超低功耗 TinyML 设备而言,其计算量和内存占用仍然过大。在这项工作中,我们基于一款支持 FP32 的 10 核并行超低功耗(PULP)处理器,提出了一个端到端 CL 的软硬件平台。我们重新设计了基线潜在重放 CL 算法,通过对模型冻结部分和潜在重放(LR)进行量化来降低内存开销,同时将精度损失降到最低。特别地,与全精度基线实现相比,LR 内存的 8 位压缩几乎无损(使用 3000 个 LR 时仅 -0.26%),所需内存却减少 4 倍;7 位压缩也可使用,仅带来额外的少量精度下降(最多 5%)。我们还为 PULP 处理器引入了优化的前向与反向传播原语。结果表明,结合这些技术,可以在实践中使用不到 64MB 的内存实现持续学习,这一内存量与嵌入 TinyML 设备兼容。在我们平台的先进 22nm 原型 VEGA 上,所提方案平均比低功耗 STM32 L4 微控制器快 65 倍,能效高 37 倍,在每分钟学习一个新的小批量数据的情况下足以支撑 535 小时的续航。
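摘要中"对潜在重放做 8 位量化、以 1/4 内存换取几乎无损精度"的核心操作,就是常见的线性量化。下面用 NumPy 做一个与论文平台无关的示意(按整个张量计算 scale 与偏移,数据为假想的潜在激活):

```python
import numpy as np

def quantize_uint8(x):
    # 线性 8 位量化:把 float32 激活映射到 [0, 255] 的整数,
    # 只需额外保存一个 scale 和一个偏移 lo
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
latents = rng.normal(size=(1000,)).astype(np.float32)  # 假想的潜在激活
q, scale, lo = quantize_uint8(latents)
recon = dequantize(q, scale, lo)
max_err = float(np.max(np.abs(recon - latents)))
```

uint8 缓冲区只占 float32 的 1/4 内存,而最大重构误差不超过一个量化步长 scale;论文进一步表明,这样的误差对持续学习精度的影响可以小到 -0.26%。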

ML-78-标题: Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation

链接: https://arxiv.org/abs/2110.10461
作者: Ross M. Clarke, Elre T. Oldewage, José Miguel Hernández-Lobato
备注: 34 pages, 18 figures, 13 tables

点击查看摘要

Abstract: Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.

摘要:机器学习训练方法对超参数有着大量且复杂的依赖,这促使人们研究自动化的超参数优化策略。许多现有算法对每个新的超参数选择都要重新训练,计算代价相当高。已有一些基于超梯度(hypergradient)的单遍(one-pass)方法,但它们要么无法应用于任意的优化器超参数(如学习率和动量),要么训练耗时是基础模型的数倍。我们扩展这些现有方法,提出一种近似的基于超梯度的超参数优化器,它适用于出现在可微模型权重更新中的任何连续超参数,且只需一次训练过程、无需重启。我们还给出了其收敛到真实超梯度的动机性论证,并对每个模型参数的独立学习率进行可行的基于梯度的优化。在多个 UCI 数据集、Fashion-MNIST(使用单层 MLP)、Penn Treebank(使用 LSTM)和 CIFAR-10(使用 ResNet-18)上,我们的方法从不同的随机超参数初始化出发均具有竞争力,耗时仅为普通(vanilla)训练的 2 至 3 倍。
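摘要所说的"超梯度"单遍方法,其通用思想(这里按 Baydin 等人 hypergradient descent 的思路给出示意,并非该论文的具体算法)是:学习率 η 本身也按梯度下降在线更新,所用超梯度为 dL(w_t)/dη = −∇L(w_t)·∇L(w_{t−1}),全程无需重启训练:

```python
import numpy as np

def grad(w):
    # 玩具二次损失 L(w) = 0.5 * ||w||^2 的梯度
    return w

w = np.array([5.0, -3.0])
eta = 0.01                  # 待在线调整的学习率(超参数)
beta = 0.01                 # 作用于 eta 的"超学习率"
prev_g = np.zeros_like(w)
for _ in range(100):
    g = grad(w)
    eta += beta * (g @ prev_g)   # 超梯度步:相邻两步梯度同向则增大 eta
    w = w - eta * g
    prev_g = g
```

从刻意偏小的初始学习率 0.01 出发,η 会被自动调大,使训练在一次运行内就收敛;论文把这一思路推广到任意出现在可微权重更新中的连续超参数,并支持逐参数的独立学习率。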

ML-79-标题: Reconstruction of Fragmented Trajectories of Collective Motion using Hadamard Deep Autoencoders

链接: https://arxiv.org/abs/2110.10428
作者: Kelum Gajamannage, Yonggi Park, Randy Paffenroth, Anura P. Jayasumana
备注: 21 Pages, 5 figures, submitted into Pattern Recognition

点击查看摘要

Abstract: Learning dynamics of collectively moving agents such as fish or humans is an active field in research. Due to natural phenomena such as occlusion and change of illumination, the multi-object methods tracking such dynamics might lose track of the agents where that might result fragmentation in the constructed trajectories. Here, we present an extended deep autoencoder (DA) that we train only on fully observed segments of the trajectories by defining its loss function as the Hadamard product of a binary indicator matrix with the absolute difference between the outputs and the labels. The trajectories of the agents practicing collective motion is low-rank due to mutual interactions and dependencies between the agents that we utilize as the underlying pattern that our Hadamard deep autoencoder (HDA) codes during its training. The performance of our HDA is compared with that of a low-rank matrix completion scheme in the context of fragmented trajectory reconstruction.

摘要:学习鱼群或人群等集体运动个体的动力学是一个活跃的研究领域。由于遮挡和光照变化等自然现象,跟踪此类动态的多目标方法可能会跟丢个体,导致所构建的轨迹出现碎片化。本文提出一种扩展的深度自编码器(DA):通过将其损失函数定义为二值指示矩阵与输出和标签之间绝对差的 Hadamard 积,我们仅在轨迹中被完整观测到的片段上进行训练。由于个体之间的相互作用和依赖,进行集体运动的个体轨迹是低秩的,我们将这种低秩性作为 Hadamard 深度自编码器(HDA)在训练中编码的底层模式。在碎片化轨迹重建的场景下,我们将 HDA 的性能与一种低秩矩阵补全方案进行了比较。
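摘要中的损失函数(二值指示矩阵与输出/标签绝对差的 Hadamard 积)可以直接写成几行代码:未被观测到的轨迹条目被掩码置零,因而不产生任何训练信号。以下为示意实现(这里用已观测条目数做均值归一化,该细节与论文未必一致):

```python
import numpy as np

def masked_l1_loss(output, label, mask):
    # Hadamard(逐元素)积:只在 mask == 1 的已观测条目上累计绝对误差
    return np.sum(mask * np.abs(output - label)) / np.sum(mask)

label = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[1.0, 0.0], [1.0, 1.0]])     # (0,1) 处观测缺失
output = np.array([[1.5, 99.0], [3.0, 4.0]])  # 缺失处的输出再离谱也无妨
loss = masked_l1_loss(output, label, mask)
```

损失只由三个已观测条目贡献,即 (0.5 + 0 + 0)/3;缺失条目上的 99 完全不影响结果,这正是"只在完整观测片段上训练"的实现方式。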

ML-80-标题: ProxyBO: Accelerating Neural Architecture Search via Bayesian Optimization with Zero-cost Proxies

链接: https://arxiv.org/abs/2110.10423
作者: Yu Shen, Yang Li, Jian Zheng, Wentao Zhang, Peng Yao, Jixiang Li, Sen Yang, Ji Liu, Cui Bin
备注:

点击查看摘要

Abstract: Designing neural architectures requires immense manual efforts. This has promoted the development of neural architecture search (NAS) to automate this design. While previous NAS methods achieve promising results but run slowly and zero-cost proxies run extremely fast but are less promising, recent work considers utilizing zero-cost proxies via a simple warm-up. The existing method has two limitations, which are unforeseeable reliability and one-shot usage. To address the limitations, we present ProxyBO, an efficient Bayesian optimization framework that utilizes the zero-cost proxies to accelerate neural architecture search. We propose the generalization ability measurement to estimate the fitness of proxies on the task during each iteration and then combine BO with zero-cost proxies via dynamic influence combination. Extensive empirical studies show that ProxyBO consistently outperforms competitive baselines on five tasks from three public benchmarks. Concretely, ProxyBO achieves up to 5.41x and 3.83x speedups over the state-of-the-art approach REA and BRP-NAS, respectively.

摘要:设计神经网络架构需要巨大的人工投入,这推动了神经架构搜索(NAS)的发展以实现设计自动化。以往的 NAS 方法虽能取得不错的结果但运行缓慢,而零成本代理(zero-cost proxies)运行极快但效果欠佳;近期工作考虑通过简单的预热(warm-up)来利用零成本代理。现有方法存在两点局限:可靠性不可预知,以及只能一次性使用。为解决这些局限,我们提出 ProxyBO,一个利用零成本代理来加速神经架构搜索的高效贝叶斯优化框架。我们提出泛化能力度量,在每次迭代中估计各代理在当前任务上的适配度,然后通过动态影响组合将贝叶斯优化与零成本代理结合。大量实证研究表明,在来自三个公开基准的五个任务上,ProxyBO 始终优于有竞争力的基线。具体而言,相比最先进方法 REA 和 BRP-NAS,ProxyBO 分别实现了最高 5.41 倍和 3.83 倍的加速。

ML-81-标题: Encoding spatiotemporal priors with VAEs for small-area estimation

链接: https://arxiv.org/abs/2110.10422
作者: Elizaveta Semenova, Yidan Xu, Adam Howes, Theo Rashid, Samir Bhatt, Swapnil Mishra, Seth Flaxman
备注:

点击查看摘要

Abstract: Gaussian processes (GPs), implemented through multivariate Gaussian distributions for a finite collection of data, are the most popular approach in small-area spatiotemporal statistical modelling. In this context they are used to encode correlation structures over space and time and can generalise well in interpolation tasks. Despite their flexibility, off-the-shelf GPs present serious computational challenges which limit their scalability and practical usefulness in applied settings. Here, we propose a novel, deep generative modelling approach to tackle this challenge: for a particular spatiotemporal setting, we approximate a class of GP priors through prior sampling and subsequent fitting of a variational autoencoder (VAE). Given a trained VAE, the resultant decoder allows spatiotemporal inference to become incredibly efficient due to the low dimensional, independently distributed latent Gaussian space representation of the VAE. Once trained, inference using the VAE decoder replaces the GP within a Bayesian sampling framework. This approach provides tractable and easy-to-implement means of approximately encoding spatiotemporal priors and facilitates efficient statistical inference. We demonstrate the utility of our VAE two stage approach on Bayesian, small-area estimation tasks.

摘要:通过有限数据集合上的多元高斯分布实现的高斯过程(GP),是小区域时空统计建模中最流行的方法。在这一场景中,它们被用于编码时空相关结构,并能在插值任务中很好地泛化。尽管灵活,现成的 GP 存在严重的计算挑战,限制了其在应用场景中的可扩展性和实用性。本文提出一种新颖的深度生成建模方法来应对这一挑战:针对特定的时空设定,我们通过先验采样并随后拟合一个变分自编码器(VAE)来近似一类 GP 先验。给定训练好的 VAE,由于其潜空间是低维、独立分布的高斯表示,所得解码器能使时空推断变得极为高效。训练完成后,在贝叶斯采样框架内用 VAE 解码器做推断即可取代 GP。该方法提供了近似编码时空先验的可行且易于实现的手段,并促成高效的统计推断。我们在贝叶斯小区域估计任务上展示了这种 VAE 两阶段方法的实用性。

ML-82-标题: JavaBERT: Training a transformer-based model for the Java programming language

链接: https://arxiv.org/abs/2110.10404
作者: Nelson Tavares de Sousa, Wilhelm Hasselbring
备注: 6 pages, to appear in the Proceedings of the 9th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE’2021)

点击查看摘要

Abstract: Code quality is and will be a crucial factor while developing new software code, requiring appropriate tools to ensure functional and reliable code. Machine learning techniques are still rarely used for software engineering tools, missing out the potential benefits of its application. Natural language processing has shown the potential to process text data regarding a variety of tasks. We argue, that such models can also show similar benefits for software code processing. In this paper, we investigate how models used for natural language processing can be trained upon software code. We introduce a data retrieval pipeline for software code and train a model upon Java software code. The resulting model, JavaBERT, shows a high accuracy on the masked language modeling task showing its potential for software engineering tools.

摘要:代码质量现在和将来都是开发新软件代码时的关键因素,需要合适的工具来确保代码的功能性与可靠性。机器学习技术在软件工程工具中仍然很少被使用,错失了其应用的潜在收益。自然语言处理已展现出处理各类任务文本数据的潜力。我们认为,此类模型同样能为软件代码处理带来类似的收益。本文研究如何在软件代码上训练用于自然语言处理的模型。我们为软件代码引入了一条数据检索流水线,并在 Java 软件代码上训练模型。所得模型 JavaBERT 在掩码语言建模任务上表现出很高的准确率,显示了其在软件工程工具方面的潜力。

ML-83-标题: An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR

链接: https://arxiv.org/abs/2110.10402
作者: Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi
备注: Accepted to APSIPA 2021

点击查看摘要

Abstract: In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency. The triggered attention mechanism, which performs autoregressive decoding triggered by the CTC spike, has shown to be effective in streaming ASR. However, in order to maintain high accuracy of alignment estimation based on CTC outputs, which is the key to its performance, it is inevitable that decoding should be performed with some future information input (i.e., with higher latency). It should be noted that in streaming ASR, it is desirable to be able to achieve high recognition accuracy while keeping the latency low. Therefore, the present study aims to achieve highly accurate streaming ASR with low latency by introducing Mask-CTC, which is capable of learning feature representations that anticipate future information (i.e., that can consider long-term contexts), to the encoder pre-training. Experimental comparisons conducted using WSJ data demonstrate that the proposed method achieves higher accuracy with lower latency than the conventional triggered attention-based streaming ASR system.

摘要:在本文中,我们尝试结合Mask-CTC与触发注意力机制,构建一个以低延迟提供高性能的流式端到端自动语音识别(ASR)系统。由CTC尖峰触发自回归解码的触发注意力机制,已被证明在流式ASR中行之有效。然而,为了保持基于CTC输出的对齐估计的高精度(这是其性能的关键),解码不可避免地需要输入一些未来信息(即带来更高的延迟)。应当注意,在流式ASR中,人们希望在保持低延迟的同时实现高识别精度。因此,本研究旨在通过在编码器预训练中引入Mask-CTC来实现低延迟的高精度流式ASR;Mask-CTC能够学习预见未来信息(即可以考虑长期上下文)的特征表示。基于WSJ数据的实验比较表明,所提出的方法比传统的基于触发注意力的流式ASR系统以更低的延迟达到了更高的精度。

ML-84-标题: Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for sparse recovery

链接: https://arxiv.org/abs/2110.10391
作者: Wei Pu, Chao Zhou, Yonina C. Eldar, Miguel R.D. Rodrigues
备注:

点击查看摘要

Abstract: In this paper, we consider deep neural networks for solving inverse problems that are robust to forward model mis-specifications. Specifically, we treat sensing problems with model mismatch where one wishes to recover a sparse high-dimensional vector from low-dimensional observations subject to uncertainty in the measurement operator. We then design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem. Our proposed network - named Robust lEarned Shrinkage-Thresholding (REST) - exhibits an additional normalization processing compared to Learned Iterative Shrinkage-Thresholding Algorithm (LISTA), leading to reliable recovery of the signal under sample-wise varying model mismatch. The proposed REST network is shown to outperform state-of-the-art model-based and data-driven algorithms in both compressive sensing and radar imaging problems wherein model mismatch is taken into consideration.

摘要:在本文中,我们考虑用深度神经网络求解对前向模型错误设定具有鲁棒性的逆问题。具体而言,我们处理存在模型失配的感知问题,即在测量算子存在不确定性的情况下,希望从低维观测中恢复稀疏的高维向量。随后,我们将算法展开技术应用于底层恢复问题的鲁棒版本,从而设计出一种新的鲁棒深度神经网络架构。我们提出的网络(命名为鲁棒学习收缩阈值网络,REST)与学习迭代收缩阈值算法(LISTA)相比引入了额外的归一化处理,从而在逐样本变化的模型失配下实现信号的可靠恢复。实验表明,在考虑模型失配的压缩感知和雷达成像问题中,所提出的REST网络优于最先进的基于模型和数据驱动的算法。
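LISTA及其鲁棒变体所展开的基础算法是迭代收缩阈值算法(ISTA),其核心是软阈值算子 soft(x, t) = sign(x)·max(|x|−t, 0)。下面给出一个纯Python的示意性实现(仅为ISTA一步迭代的示意,并非论文中REST网络的实现;矩阵、观测与步长均为虚构的玩具数据):

```python
def soft_threshold(x, t):
    """Element-wise soft-thresholding: sign(x) * max(|x| - t, 0)."""
    return [max(abs(v) - t, 0.0) * (1 if v > 0 else -1 if v < 0 else 0)
            for v in x]

def ista_step(x, A, y, lam, step):
    """One ISTA iteration for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    m, n = len(y), len(x)
    # residual r = A x - y
    r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
    # gradient g = A^T r
    g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
    # gradient step followed by shrinkage
    return soft_threshold([x[j] - step * g[j] for j in range(n)], lam * step)

A = [[1.0, 0.0], [0.0, 1.0]]   # toy identity measurement matrix
y = [1.0, 0.05]                # observation with one small entry
x = ista_step([0.0, 0.0], A, y, lam=0.1, step=1.0)
# The large coefficient survives shrinkage, the small one is zeroed out
assert x == [0.9, 0.0]
```

展开式网络(如LISTA/REST)正是将这类迭代的矩阵与阈值替换为可学习的参数,逐层堆叠而成。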

ML-85-标题: Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic Forecasting

链接: https://arxiv.org/abs/2110.10380
作者: Hyunwook Lee, Seungmin Jin, Hyeshin Chu, Hongkyu Lim, Sungahn Ko
备注: 12 pages, Submitted as conference paper to ICLR 2022

点击查看摘要

Abstract: Traffic forecasting is a challenging problem due to complex road networks and sudden speed changes caused by various events on roads. A number of models have been proposed to solve this challenging problem with a focus on learning spatio-temporal dependencies of roads. In this work, we propose a new perspective of converting the forecasting problem into a pattern matching task, assuming that large data can be represented by a set of patterns. To evaluate the validness of the new perspective, we design a novel traffic forecasting model, called Pattern-Matching Memory Networks (PM-MemNet), which learns to match input data to the representative patterns with a key-value memory structure. We first extract and cluster representative traffic patterns, which serve as keys in the memory. Then via matching the extracted keys and inputs, PM-MemNet acquires necessary information of existing traffic patterns from the memory and uses it for forecasting. To model spatio-temporal correlation of traffic, we proposed novel memory architecture GCMem, which integrates attention and graph convolution for memory enhancement. The experiment results indicate that PM-MemNet is more accurate than state-of-the-art models, such as Graph WaveNet with higher responsiveness. We also present a qualitative analysis result, describing how PM-MemNet works and achieves its higher accuracy when road speed rapidly changes.

摘要:由于道路网络复杂,且道路上的各种事件会引起速度突变,交通预测是一个具有挑战性的问题。人们已提出许多模型来解决这一难题,重点在于学习道路的时空依赖关系。在这项工作中,我们提出一种新的视角:假设大量数据可以由一组模式来表示,从而将预测问题转化为模式匹配任务。为了评估这一新视角的有效性,我们设计了一种名为模式匹配记忆网络(PM-MemNet)的新型交通预测模型,它学习利用键值记忆结构将输入数据与代表性模式相匹配。我们首先提取并聚类代表性的交通模式,作为记忆中的键。然后通过匹配提取的键与输入,PM-MemNet从记忆中获取现有交通模式的必要信息并用于预测。为了建模交通的时空相关性,我们提出了新的记忆架构GCMem,它融合注意力与图卷积来增强记忆。实验结果表明,PM-MemNet比Graph WaveNet等最先进的模型更准确,且响应性更高。我们还给出了定性分析结果,描述PM-MemNet的工作方式及其在道路速度快速变化时如何取得更高的准确率。
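键值记忆的检索机制可以用一个极简示意来说明(假设性示意:此处用欧氏距离最近的键取回对应的值;实际的PM-MemNet使用可学习的表示与注意力进行匹配,以下速度模式与标签均为虚构):

```python
def nearest_key_lookup(query, keys, values):
    """Return the value whose key is closest (squared L2) to the query."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    best = min(range(len(keys)), key=lambda i: sq_dist(query, keys[i]))
    return values[best]

# Toy memory: keys are clustered traffic-speed patterns, values their labels
keys = [[60.0, 58.0, 55.0],   # free flow
        [30.0, 20.0, 15.0]]   # congestion onset
values = ["free_flow", "congestion"]

assert nearest_key_lookup([28.0, 22.0, 14.0], keys, values) == "congestion"
```

预测时,检索到的记忆条目为模型提供了"历史上与当前输入最相似的交通模式随后如何演化"的信息。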

ML-86-标题: Cascaded Compressed Sensing Networks: A Reversible Architecture for Layerwise Learning

链接: https://arxiv.org/abs/2110.10379
作者: Weizhi Lu, Mingrui Chen, Kai Guo, Weiyu Li
备注:

点击查看摘要

Abstract: Recently, the method that learns networks layer by layer has attracted increasing interest for its ease of analysis. For the method, the main challenge lies in deriving an optimization target for each layer by inversely propagating the global target of the network. The propagation problem is ill posed, due to involving the inversion of nonlinear activations from low-dimensional to high-dimensional spaces. To address the problem, the existing solution is to learn an auxiliary network to specially propagate the target. However, the network lacks stability, and moreover, it results in higher complexity for network learning. In this letter, we show that target propagation could be achieved by modeling each layer of the network with compressed sensing, without the need of auxiliary networks. Experiments show that the proposed method could achieve better performance than the auxiliary network-based method.

摘要:近来,逐层学习网络的方法因其易于分析而吸引了越来越多的关注。对该方法而言,主要挑战在于通过反向传播网络的全局目标,为每一层推导出一个优化目标。由于涉及从低维空间到高维空间的非线性激活的求逆,该传播问题是不适定的。为了解决这一问题,现有方案是学习一个辅助网络来专门传播目标。然而,该网络缺乏稳定性,并且会导致网络学习的复杂度更高。在这封快报中,我们表明,通过用压缩感知对网络的每一层进行建模,无需辅助网络即可实现目标传播。实验表明,所提出的方法可以取得比基于辅助网络的方法更好的性能。

ML-87-标题: Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching

链接: https://arxiv.org/abs/2110.10349
作者: Shengheng Liu, Chong Zheng, Yongming Huang, Tony Q. S. Quek
备注: 12 pages, 6 figures, under review with the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS

点击查看摘要

Abstract: Mobile edge computing (MEC) is a prominent computing paradigm which expands the application fields of wireless communication. Due to the limitation of the capacities of user equipments and MEC servers, edge caching (EC) optimization is crucial to the effective utilization of the caching resources in MEC-enabled wireless networks. However, the dynamics and complexities of content popularities over space and time as well as the privacy preservation of users pose significant challenges to EC optimization. In this paper, a privacy-preserving distributed deep deterministic policy gradient (P2D3PG) algorithm is proposed to maximize the cache hit rates of devices in the MEC networks. Specifically, we consider the fact that content popularities are dynamic, complicated and unobservable, and formulate the maximization of cache hit rates on devices as distributed problems under the constraints of privacy preservation. In particular, we convert the distributed optimizations into distributed model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction. Subsequently, a P2D3PG algorithm is developed based on distributed reinforcement learning to solve the distributed problems. Simulation results demonstrate the superiority of the proposed approach in improving EC hit rate over the baseline methods while preserving user privacy.

摘要:移动边缘计算(MEC)是一种重要的计算范式,它拓展了无线通信的应用领域。由于用户设备和MEC服务器容量有限,边缘缓存(EC)优化对于有效利用支持MEC的无线网络中的缓存资源至关重要。然而,内容流行度在空间和时间上的动态性与复杂性,以及用户的隐私保护,给EC优化带来了重大挑战。在本文中,我们提出一种保护隐私的分布式深度确定性策略梯度(P2D3PG)算法,以最大化MEC网络中设备的缓存命中率。具体而言,我们考虑内容流行度动态、复杂且不可观测这一事实,并在隐私保护约束下,将设备缓存命中率最大化表述为分布式问题。特别地,我们将分布式优化转化为分布式无模型马尔可夫决策过程问题,然后引入一种保护隐私的联邦学习方法进行流行度预测。随后,基于分布式强化学习开发了P2D3PG算法来求解分布式问题。仿真结果表明,所提方法在保护用户隐私的同时,在提升EC命中率方面优于基线方法。

ML-88-标题: Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond

链接: https://arxiv.org/abs/2110.10342
作者: Chulhee Yun, Shashank Rajput, Suvrit Sra
备注: 72 pages

点击查看摘要

Abstract: In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods. Most existing analyses of these methods assume independent and unbiased gradient estimates obtained via with-replacement sampling. In contrast, we study shuffling-based variants: minibatch and local Random Reshuffling, which draw stochastic gradients without replacement and are thus closer to practice. For smooth functions satisfying the Polyak-Łojasiewicz condition, we obtain convergence bounds (in the large epoch regime) which show that these shuffling-based variants converge faster than their with-replacement counterparts. Moreover, we prove matching lower bounds showing that our convergence analysis is tight. Finally, we propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.

摘要:在分布式学习中,本地SGD(也称联邦平均)及其简单基线minibatch SGD是被广泛研究的优化方法。对这些方法的大多数现有分析都假设梯度估计相互独立且无偏,即通过有放回采样获得。与此不同,我们研究基于洗牌的变体:minibatch与本地随机重洗(Random Reshuffling),它们以无放回方式抽取随机梯度,因而更贴近实践。对于满足Polyak-Łojasiewicz条件的光滑函数,我们得到了(在大epoch区间内的)收敛界,表明这些基于洗牌的变体比其有放回的对应方法收敛更快。此外,我们证明了匹配的下界,说明我们的收敛分析是紧的。最后,我们提出一种称为同步洗牌的算法改进,在近同质设定下其收敛速率快于我们的下界。
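有放回采样与随机重洗(无放回)的区别可以直接演示:后者保证每个epoch恰好访问每个样本一次(以下仅为采样方式的示意代码,与论文的收敛性分析本身无关):

```python
import random

def with_replacement_epoch(data, rng):
    """i.i.d. sampling: each step draws uniformly, duplicates possible."""
    return [rng.choice(data) for _ in range(len(data))]

def random_reshuffling_epoch(data, rng):
    """Random Reshuffling: a fresh permutation, each sample seen exactly once."""
    order = data[:]
    rng.shuffle(order)
    return order

rng = random.Random(42)
data = list(range(8))
rr = random_reshuffling_epoch(data, rng)
# Without replacement, one epoch is exactly a permutation of the dataset
assert sorted(rr) == data
```

有放回采样则可能在一个epoch内重复抽到同一样本而漏掉其他样本,这正是两类分析假设的差异所在。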

ML-89-标题: One-Step Abductive Multi-Target Learning with Diverse Noisy Samples

链接: https://arxiv.org/abs/2110.10325
作者: Yongquan Yang
备注: 6 pages

点击查看摘要

Abstract: One-step abductive multi-target learning (OSAMTL) was proposed to handle complex noisy labels. In this paper, giving definition of diverse noisy samples (DNS), we propose one-step abductive multi-target learning with DNS (OSAMTL-DNS) to expand the original OSAMTL to a wider range of tasks that handle complex noisy labels.

摘要:一步溯因多目标学习(OSAMTL)被提出用于处理复杂的噪声标签。在本文中,我们给出多样噪声样本(DNS)的定义,并提出带DNS的一步溯因多目标学习(OSAMTL-DNS),将原始OSAMTL扩展到更广泛的处理复杂噪声标签的任务中。

ML-90-标题: Frontiers in Evolutionary Computation: A Workshop Report

链接: https://arxiv.org/abs/2110.10320
作者: Tyler Millhouse, Melanie Moses, Melanie Mitchell
备注:

点击查看摘要

Abstract: In July of 2021, the Santa Fe Institute hosted a workshop on evolutionary computation as part of its Foundations of Intelligence in Natural and Artificial Systems project. This project seeks to advance the field of artificial intelligence by promoting interdisciplinary research on the nature of intelligence. The workshop brought together computer scientists and biologists to share their insights about the nature of evolution and the future of evolutionary computation. In this report, we summarize each of the talks and the subsequent discussions. We also draw out a number of key themes and identify important frontiers for future research.

摘要:2021年7月,圣塔菲研究所举办了一场关于进化计算的研讨会,这是其"自然与人工系统中的智能基础"项目的一部分。该项目旨在通过促进关于智能本质的跨学科研究来推进人工智能领域。研讨会汇聚了计算机科学家和生物学家,分享他们对进化本质和进化计算未来的见解。在本报告中,我们总结了每场报告及随后的讨论,还提炼出若干关键主题,并指出了未来研究的重要前沿。

ML-91-标题: When in Doubt Summon the Titans: Efficient Inference with Large Models

链接: https://arxiv.org/abs/2110.10305
作者: Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar
备注:

点击查看摘要

Abstract: Scaling neural networks to “large” sizes, with billions of parameters, has been shown to yield impressive results on many challenging problems. However, the inference cost incurred by such large models often prevents their application in most real-world settings. In this paper, we propose a two-stage framework based on distillation that realizes the modelling benefits of the large models, while largely preserving the computational benefits of inference with more lightweight models. In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of “easy” examples; for the “hard” examples, we fall-back to the teacher. Such an approach allows us to efficiently employ large models in practical scenarios where easy examples are much more frequent than rare hard examples. Our proposed use of distillation to only handle easy instances allows for a more aggressive trade-off in the student size, thereby reducing the amortized cost of inference and achieving better accuracy than standard distillation. Empirically, we demonstrate the benefits of our approach on both image classification and natural language processing benchmarks.

摘要:将神经网络扩展到拥有数十亿参数的"大"规模,已被证明能在许多具有挑战性的问题上取得令人印象深刻的结果。然而,如此大的模型所带来的推理开销往往阻碍了其在大多数现实场景中的应用。在本文中,我们提出一种基于蒸馏的两阶段框架,在实现大模型建模优势的同时,很大程度上保留了更轻量模型在推理上的计算优势。简而言之,我们使用大的教师模型来引导轻量的学生模型,使其只在"简单"样本子集上做出正确预测;对于"困难"样本,我们回退到教师模型。这种方法使我们能够在简单样本远多于罕见困难样本的实际场景中高效地使用大模型。我们提出的仅用蒸馏处理简单实例的做法,允许在学生模型规模上进行更激进的权衡,从而降低推理的摊销成本,并获得比标准蒸馏更好的准确率。我们通过实验在图像分类和自然语言处理基准上展示了该方法的优势。
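"简单样本交给学生、困难样本回退教师"的两阶段推理可以用如下示意表达(假设性示意:此处用学生输出的最大置信度作为路由标准,阈值与两个模型均为虚构,并非论文的实际设计):

```python
def cascade_predict(x, student, teacher, threshold=0.8):
    """Route easy inputs to the cheap student; fall back to the teacher."""
    probs = student(x)          # cheap model runs first
    conf = max(probs)
    if conf >= threshold:       # confident enough: accept the student
        return probs.index(conf), "student"
    return teacher(x), "teacher"  # otherwise pay for the large model

# Hypothetical models: the student is confident on input 0, unsure on input 1
student = lambda x: [0.95, 0.05] if x == 0 else [0.55, 0.45]
teacher = lambda x: 1  # the (expensive) teacher always answers class 1

assert cascade_predict(0, student, teacher) == (0, "student")
assert cascade_predict(1, student, teacher) == (1, "teacher")
```

当简单样本占多数时,大多数请求只需付出学生模型的推理成本,教师模型仅在少数困难样本上被调用。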

ML-92-标题: Layer-wise Adaptive Model Aggregation for Scalable Federated Learning

链接: https://arxiv.org/abs/2110.10302
作者: Sunwoo Lee, Tuo Zhang, Chaoyang He, Salman Avestimehr
备注:

点击查看摘要

Abstract: In Federated Learning, a common approach for aggregating local models across clients is periodic averaging of the full model parameters. It is, however, known that different layers of neural networks can have a different degree of model discrepancy across the clients. The conventional full aggregation scheme does not consider such a difference and synchronizes the whole model parameters at once, resulting in inefficient network bandwidth consumption. Aggregating the parameters that are similar across the clients does not make meaningful training progress while increasing the communication cost. We propose FedLAMA, a layer-wise model aggregation scheme for scalable Federated Learning. FedLAMA adaptively adjusts the aggregation interval in a layer-wise manner, jointly considering the model discrepancy and the communication cost. The layer-wise aggregation method enables to finely control the aggregation interval to relax the aggregation frequency without a significant impact on the model accuracy. Our empirical study shows that FedLAMA reduces the communication cost by up to 60% for IID data and 70% for non-IID data while achieving a comparable accuracy to FedAvg.

摘要:在联邦学习中,跨客户端聚合本地模型的常见做法是对完整模型参数进行周期性平均。然而,众所周知,神经网络的不同层在客户端之间的模型差异程度可能不同。传统的全量聚合方案不考虑这种差异,一次性同步全部模型参数,导致网络带宽消耗效率低下。聚合那些在客户端之间已经相似的参数不会带来有意义的训练进展,却会增加通信开销。我们提出FedLAMA,一种面向可扩展联邦学习的逐层模型聚合方案。FedLAMA以逐层方式自适应地调整聚合间隔,同时考虑模型差异和通信开销。逐层聚合方法能够精细地控制聚合间隔以放宽聚合频率,而不会对模型精度产生显著影响。我们的实证研究表明,FedLAMA在IID数据上最多可降低60%的通信开销,在非IID数据上最多可降低70%,同时取得与FedAvg相当的准确率。
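作为对照,基线FedAvg所做的逐坐标加权平均可以写成几行代码(示意实现:按各客户端样本数加权,参数被展平为一维列表;FedLAMA的改进在于按层、按间隔选择性地执行这种聚合):

```python
def fedavg(client_params, client_sizes):
    """Coordinate-wise weighted average of flattened client parameters."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[j] * n for p, n in zip(client_params, client_sizes)) / total
        for j in range(dim)
    ]

# Two clients: equal data contributes equally; unequal data shifts the mean
params = [[1.0, 4.0], [3.0, 0.0]]
assert fedavg(params, [1, 1]) == [2.0, 2.0]
assert fedavg(params, [3, 1]) == [1.5, 3.0]
```

全量FedAvg每轮都要传输并平均所有坐标;若某一层在客户端间几乎一致,这部分通信几乎不带来训练收益,这正是FedLAMA逐层调整聚合间隔的动机。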

ML-93-标题: Expressivity of Neural Networks via Chaotic Itineraries beyond Sharkovsky's Theorem

链接: https://arxiv.org/abs/2110.10295
作者: Clayton Sanford, Vaggos Chatziafratis
备注: 47 pages, 19 figures

点击查看摘要

Abstract: Given a target function $f$, how large must a neural network be in order to approximate $f$? Recent works examine this basic question on neural network \textit{expressivity} from the lens of dynamical systems and provide novel ``depth-vs-width'' tradeoffs for a large family of functions $f$. They suggest that such tradeoffs are governed by the existence of \textit{periodic} points or \emph{cycles} in $f$. Our work, by further deploying dynamical systems concepts, illuminates a more subtle connection between periodicity and expressivity: we prove that periodic points alone lead to suboptimal depth-width tradeoffs and we improve upon them by demonstrating that certain ``chaotic itineraries'' give stronger exponential tradeoffs, even in regimes where previous analyses only imply polynomial gaps. Contrary to prior works, our bounds are nearly-optimal, tighten as the period increases, and handle strong notions of inapproximability (e.g., constant $L_1$ error). More broadly, we identify a phase transition to the \textit{chaotic regime} that exactly coincides with an abrupt shift in other notions of function complexity, including VC-dimension and topological entropy.

摘要:给定目标函数 $f$,神经网络需要多大才能近似 $f$?近期工作从动力系统的视角研究了神经网络\textit{表达能力}这一基本问题,并为一大类函数 $f$ 给出了新颖的"深度-宽度"权衡。这些工作表明,此类权衡由 $f$ 中\textit{周期}点或\emph{环}的存在性所支配。我们的工作通过进一步引入动力系统概念,揭示了周期性与表达能力之间更微妙的联系:我们证明仅凭周期点只能得到次优的深度-宽度权衡,并通过证明某些"混沌轨道"能给出更强的指数级权衡来改进这些结果,即使在先前分析仅能给出多项式差距的情形下也是如此。与先前工作不同,我们的界几乎是最优的,随周期增大而收紧,并能处理强意义下的不可近似性(例如常数 $L_1$ 误差)。更广泛地,我们识别出向\textit{混沌区域}的相变,它与函数复杂度的其他概念(包括VC维和拓扑熵)的突变恰好吻合。

ML-94-标题: Multi-concept adversarial attacks

链接: https://arxiv.org/abs/2110.10287
作者: Vibha Belavadi, Yan Zhou, Murat Kantarcioglu, Bhavani M. Thuraisingham
备注: 20 pages, 28 figures, 9 tables

点击查看摘要

Abstract: As machine learning (ML) techniques are being increasingly used in many applications, their vulnerability to adversarial attacks becomes well-known. Test time attacks, usually launched by adding adversarial noise to test instances, have been shown effective against the deployed ML models. In practice, one test input may be leveraged by different ML models. Test time attacks targeting a single ML model often neglect their impact on other ML models. In this work, we empirically demonstrate that naively attacking the classifier learning one concept may negatively impact classifiers trained to learn other concepts. For example, for the online image classification scenario, when the Gender classifier is under attack, the (wearing) Glasses classifier is simultaneously attacked with the accuracy dropped from 98.69 to 88.42. This raises an interesting question: is it possible to attack one set of classifiers without impacting the other set that uses the same test instance? Answers to the above research question have interesting implications for protecting privacy against ML model misuse. Attacking ML models that pose unnecessary risks of privacy invasion can be an important tool for protecting individuals from harmful privacy exploitation. In this paper, we address the above research question by developing novel attack techniques that can simultaneously attack one set of ML models while preserving the accuracy of the other. In the case of linear classifiers, we provide a theoretical framework for finding an optimal solution to generate such adversarial examples. Using this theoretical framework, we develop a multi-concept attack strategy in the context of deep learning. Our results demonstrate that our techniques can successfully attack the target classes while protecting the protected classes in many different settings, which is not possible with the existing test-time attack-single strategies.

摘要:随着机器学习(ML)技术在许多应用中的使用日益增多,其面对对抗攻击的脆弱性已广为人知。测试时攻击通常通过向测试实例添加对抗噪声来发起,已被证明对已部署的ML模型有效。在实践中,同一个测试输入可能被不同的ML模型使用。针对单一ML模型的测试时攻击往往忽视其对其他ML模型的影响。在这项工作中,我们通过实验证明,简单地攻击学习某一概念的分类器,可能对训练用于学习其他概念的分类器产生负面影响。例如,在在线图像分类场景中,当性别分类器受到攻击时,(佩戴)眼镜分类器同时受到攻击,准确率从98.69降至88.42。这引出了一个有趣的问题:是否有可能在不影响使用同一测试实例的另一组分类器的情况下,攻击其中一组分类器?对这一研究问题的回答,对保护隐私免受ML模型滥用具有有趣的意义:攻击那些带来不必要隐私侵犯风险的ML模型,可以成为保护个人免受有害隐私利用的重要工具。在本文中,我们通过开发新颖的攻击技术来回答上述研究问题,这些技术可以在攻击一组ML模型的同时保持另一组模型的准确率。在线性分类器的情形下,我们提供了一个理论框架,用于寻找生成此类对抗样本的最优解。基于该理论框架,我们在深度学习背景下开发了一种多概念攻击策略。我们的结果表明,我们的技术可以在许多不同的设置中成功攻击目标类别,同时保护受保护的类别,而这是现有的单一目标测试时攻击策略无法做到的。

ML-95-标题: Robust Semi-Supervised Classification using GANs with Self-Organizing Maps

链接: https://arxiv.org/abs/2110.10286
作者: Ronald Fick, Paul Gader, Alina Zare
备注: 9 pages, 13 figures This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

点击查看摘要

Abstract: Generative adversarial networks (GANs) have shown tremendous promise in learning to generate data and effective at aiding semi-supervised classification. However, to this point, semi-supervised GAN methods make the assumption that the unlabeled data set contains only samples of the joint distribution of the classes of interest, referred to as inliers. Consequently, when presented with a sample from other distributions, referred to as outliers, GANs perform poorly at determining that it is not qualified to make a decision on the sample. The problem of discriminating outliers from inliers while maintaining classification accuracy is referred to here as the DOIC problem. In this work, we describe an architecture that combines self-organizing maps (SOMs) with SS-GANS with the goal of mitigating the DOIC problem and experimental results indicating that the architecture achieves the goal. Multiple experiments were conducted on hyperspectral image data sets. The SS-GANS performed slightly better than supervised GANS on classification problems with and without the SOM. Incorporating the SOMs into the SS-GANs and the supervised GANS led to substantially mitigation of the DOIC problem when compared to SS-GANS and GANs without the SOMs. Furthermore, the SS-GANS performed much better than GANS on the DOIC problem, even without the SOMs.

摘要:生成对抗网络(GAN)在学习生成数据方面展现出巨大潜力,并能有效辅助半监督分类。然而,迄今为止,半监督GAN方法都假设未标注数据集仅包含来自感兴趣类别联合分布的样本,即内点(inliers)。因此,当面对来自其他分布的样本(即离群点,outliers)时,GAN难以判断自己并不具备对该样本做出决策的资格。在保持分类准确率的同时区分离群点与内点的问题,在此称为DOIC问题。在这项工作中,我们描述了一种将自组织映射(SOM)与SS-GAN相结合的架构,旨在缓解DOIC问题,实验结果表明该架构达到了这一目标。我们在高光谱图像数据集上进行了多组实验。在有无SOM的分类问题上,SS-GAN的表现均略优于有监督GAN。与不含SOM的SS-GAN和GAN相比,将SOM融入SS-GAN和有监督GAN可大幅缓解DOIC问题。此外,即使不使用SOM,SS-GAN在DOIC问题上的表现也远好于GAN。

ML-96-标题: A Simple Approach to Continual Learning by Transferring Skill Parameters

链接: https://arxiv.org/abs/2110.10255
作者: K.R. Zentner, Ryan Julian, Ujjwal Puri, Yulun Zhang, Gaurav S. Sukhatme
备注: Submitted to ICRA 2022

点击查看摘要

Abstract: In order to be effective general purpose machines in real world environments, robots not only will need to adapt their existing manipulation skills to new circumstances, they will need to acquire entirely new skills on-the-fly. A great promise of continual learning is to endow robots with this ability, by using their accumulated knowledge and experience from prior skills. We take a fresh look at this problem, by considering a setting in which the robot is limited to storing that knowledge and experience only in the form of learned skill policies. We show that storing skill policies, careful pre-training, and appropriately choosing when to transfer those skill policies is sufficient to build a continual learner in the context of robotic manipulation. We analyze which conditions are needed to transfer skills in the challenging Meta-World simulation benchmark. Using this analysis, we introduce a pair-wise metric relating skills that allows us to predict the effectiveness of skill transfer between tasks, and use it to reduce the problem of continual learning to curriculum selection. Given an appropriate curriculum, we show how to continually acquire robotic manipulation skills without forgetting, and using far fewer samples than needed to train them from scratch.

摘要:为了成为现实世界环境中有效的通用机器,机器人不仅需要使其现有的操纵技能适应新的情况,还需要即时习得全新的技能。持续学习的一大愿景正是利用机器人从先前技能中积累的知识和经验,赋予其这种能力。我们重新审视这一问题,考虑这样一种设定:机器人只能以已学技能策略的形式存储知识和经验。我们表明,存储技能策略、细致的预训练以及恰当地选择何时迁移这些技能策略,足以在机器人操纵的背景下构建一个持续学习者。我们分析了在具有挑战性的Meta-World仿真基准中迁移技能所需的条件。基于该分析,我们引入一种关联技能的成对度量,使我们能够预测任务之间技能迁移的有效性,并利用它将持续学习问题归约为课程选择问题。给定合适的课程,我们展示了如何在不遗忘的情况下持续习得机器人操纵技能,且所用样本远少于从头训练所需的数量。

ML-97-标题: Neural Stochastic Partial Differential Equations

链接: https://arxiv.org/abs/2110.10249
作者: Cristopher Salvi, Maud Lemercier
备注:

点击查看摘要

Abstract: Stochastic partial differential equations (SPDEs) are the mathematical tool of choice to model complex spatio-temporal dynamics of systems subject to the influence of randomness. We introduce the Neural SPDE model providing an extension to two important classes of physics-inspired neural architectures. On the one hand, it extends all the popular neural – ordinary, controlled, stochastic, rough – differential equation models in that it is capable of processing incoming information even when the latter evolves in an infinite dimensional state space. On the other hand, it extends Neural Operators – recent generalizations of neural networks modelling mappings between functional spaces – in that it can be used to learn complex SPDE solution operators $(u_0,\xi) \mapsto u$ depending simultaneously on an initial condition $u_0$ and on a stochastic forcing term $\xi$, while remaining resolution-invariant and equation-agnostic. A Neural SPDE is constrained to respect real physical dynamics and consequently requires only a modest amount of data to train, depends on a significantly smaller amount of parameters and has better generalization properties compared to Neural Operators. Through various experiments on semilinear SPDEs with additive and multiplicative noise (including the stochastic Navier-Stokes equations) we demonstrate how Neural SPDEs can flexibly be used in a supervised learning setting as well as conditional generative models to sample solutions of SPDEs conditioned on prior knowledge, systematically achieving in both cases better performance than all alternative models.

摘要:随机偏微分方程(SPDE)是对受随机性影响的系统的复杂时空动态进行建模的首选数学工具。我们提出神经SPDE模型,对两类重要的物理启发神经架构进行了扩展。一方面,它扩展了所有流行的神经微分方程模型(常微分、受控、随机、粗糙),因为即使输入信息在无限维状态空间中演化,它也能对其进行处理。另一方面,它扩展了神经算子(近期对建模函数空间之间映射的神经网络的推广),因为它可用于学习复杂的SPDE解算子 $(u_0,\xi) \mapsto u$,同时依赖于初始条件 $u_0$ 和随机强迫项 $\xi$,并保持分辨率不变性与方程无关性。神经SPDE被约束为遵循真实的物理动态,因此只需少量数据即可训练,与神经算子相比所依赖的参数量显著更少,且具有更好的泛化性质。通过在带有加性和乘性噪声的半线性SPDE(包括随机Navier-Stokes方程)上的各种实验,我们展示了神经SPDE既可灵活地用于监督学习设定,也可作为条件生成模型来采样以先验知识为条件的SPDE的解,并在两种情形下都系统性地取得了优于所有替代模型的性能。

ML-98-标题: More Engineering No Silos: Rethinking Processes and Interfaces in Collaboration between Interdisciplinary Teams for Machine Learning Projects

链接: https://arxiv.org/abs/2110.10234
作者: Nadia Nahar, Shurui Zhou, Grace Lewis, Christian Kästner
备注: 22 pages, 10 figures, 5 tables

点击查看摘要

Abstract: The introduction of machine learning (ML) components in software projects has created the need for software engineers to collaborate with data scientists and other specialists. While collaboration can always be challenging, ML introduces additional challenges with its exploratory model development process, additional skills and knowledge needed, difficulties testing ML systems, need for continuous evolution and monitoring, and non-traditional quality requirements such as fairness and explainability. Through interviews with 45 practitioners from 28 organizations, we identified key collaboration challenges that teams face when building and deploying ML systems into production. We report on common collaboration points in the development of production ML systems for requirements, data, and integration, as well as corresponding team patterns and challenges. We find that most of these challenges center around communication, documentation, engineering, and process and collect recommendations to address these challenges.

摘要:在软件项目中引入机器学习(ML)组件,使软件工程师需要与数据科学家及其他专家协作。虽然协作总是充满挑战,但ML还带来了额外的挑战:探索式的模型开发过程、所需的额外技能与知识、测试ML系统的困难、对持续演化与监控的需要,以及公平性和可解释性等非传统质量要求。通过对来自28个组织的45名从业者的访谈,我们识别出团队在构建ML系统并将其部署到生产环境时面临的关键协作挑战。我们报告了生产级ML系统开发中围绕需求、数据和集成的常见协作点,以及相应的团队模式和挑战。我们发现,这些挑战大多集中在沟通、文档、工程和流程方面,并收集了应对这些挑战的建议。

ML-99-标题: Forecasting Market Prices using DL with Data Augmentation and Meta-learning: ARIMA still wins!

链接: https://arxiv.org/abs/2110.10233
作者: Vedant Shah, Gautam Shroff
备注: Accepted at the ICBINB Workshop @ NeurIPS, 2021

点击查看摘要

Abstract: Deep-learning techniques have been successfully used for time-series forecasting and have often shown superior performance on many standard benchmark datasets as compared to traditional techniques. Here we present a comprehensive and comparative study of performance of deep-learning techniques for forecasting prices in financial markets. We benchmark state-of-the-art deep-learning baselines, such as NBeats, etc., on data from currency as well as stock markets. We also generate synthetic data using a fuzzy-logic based model of demand driven by technical rules such as moving averages, which are often used by traders. We benchmark the baseline techniques on this synthetic data as well as use it for data augmentation. We also apply gradient-based meta-learning to account for non-stationarity of financial time-series. Our extensive experiments notwithstanding, the surprising result is that the standard ARIMA models outperforms deep-learning even using data augmentation or meta-learning. We conclude by speculating as to why this might be the case.

摘要:深度学习技术已被成功用于时间序列预测,并且与传统技术相比,往往在许多标准基准数据集上表现更优。在此,我们对用于金融市场价格预测的深度学习技术的性能进行了全面的比较研究。我们在货币市场和股票市场数据上对NBeats等最先进的深度学习基线进行了基准测试。我们还使用一个基于模糊逻辑的需求模型生成合成数据,该模型由移动平均等交易者常用的技术规则驱动。我们在该合成数据上对基线技术进行基准测试,并将其用于数据增强。我们还应用基于梯度的元学习来应对金融时间序列的非平稳性。尽管进行了大量实验,令人惊讶的结果是:即使使用数据增强或元学习,标准的ARIMA模型仍优于深度学习。最后,我们对出现这种情况的可能原因进行了推测。

ML-100-标题: What Averages Do Not Tell – Predicting Real Life Processes with Sequential Deep Learning

链接: https://arxiv.org/abs/2110.10225
作者: István Ketykó, Felix Mannhardt, Marwan Hassani, Boudewijn van Dongen
备注:

点击查看摘要

Abstract: Deep Learning is proven to be an effective tool for modeling sequential data as shown by the success in Natural Language, Computer Vision and Signal Processing. Process Mining concerns discovering insights on business processes from their execution data that are logged by supporting information systems. The logged data (event log) is formed of event sequences (traces) that correspond to executions of a process. Many Deep Learning techniques have been successfully adapted for predictive Process Mining that aims to predict process outcomes, remaining time, the next event, or even the suffix of running traces. Traces in Process Mining are multimodal sequences and very differently structured than natural language sentences or images. This may require a different approach to processing. So far, there has been little focus on these differences and the challenges introduced. Looking at suffix prediction as the most challenging of these tasks, the performance of Deep Learning models was evaluated only on average measures and for a small number of real-life event logs. Comparing the results between papers is difficult due to different pre-processing and evaluation strategies. Challenges that may be relevant are the skewness of trace-length distribution and the skewness of the activity distribution in real-life event logs. We provide an end-to-end framework which enables to compare the performance of seven state-of-the-art sequential architectures in common settings. Results show that sequence modeling still has a lot of room for improvement for majority of the more complex datasets. Further research and insights are required to get consistent performance not just in average measures but additionally over all the prefixes.

摘要:正如在自然语言、计算机视觉和信号处理领域的成功所表明的,深度学习已被证明是建模序列数据的有效工具。流程挖掘旨在从支撑信息系统记录的执行数据中发现对业务流程的洞见。所记录的数据(事件日志)由与流程执行相对应的事件序列(轨迹)构成。许多深度学习技术已被成功地应用于预测性流程挖掘,其目标是预测流程结果、剩余时间、下一事件,甚至运行中轨迹的后缀。流程挖掘中的轨迹是多模态序列,其结构与自然语言句子或图像截然不同,这可能需要不同的处理方法。迄今为止,人们很少关注这些差异及其带来的挑战。后缀预测是这些任务中最具挑战性的一项,而深度学习模型的性能此前仅在平均指标上、且仅在少量真实事件日志上得到评估。由于预处理和评估策略各不相同,论文之间的结果难以比较。可能相关的挑战包括真实事件日志中轨迹长度分布的偏斜和活动分布的偏斜。我们提供了一个端到端框架,能够在统一设定下比较七种最先进的序列架构的性能。结果表明,对于大多数更复杂的数据集,序列建模仍有很大的改进空间。要想不仅在平均指标上、而且在所有前缀上都获得一致的性能,还需要进一步的研究和洞见。

ML-101-标题: A Federated Learning Aggregation Algorithm for Pervasive Computing: Evaluation and Comparison

链接: https://arxiv.org/abs/2110.10223
作者: Sannara Ek, François Portet, Philippe Lalanda, German Vega
备注: 9th IEEE International Conference on Pervasive Computing and Communications (PerCom 2021)

点击查看摘要

Abstract: Pervasive computing promotes the installation of connected devices in our living spaces in order to provide services. Two major developments have gained significant momentum recently: an advanced use of edge resources and the integration of machine learning techniques for engineering applications. This evolution raises major challenges, in particular related to the appropriate distribution of computing elements along an edge-to-cloud continuum. About this, Federated Learning has been recently proposed for distributed model training in the edge. The principle of this approach is to aggregate models learned on distributed clients in order to obtain a new, more general model. The resulting model is then redistributed to clients for further training. To date, the most popular federated learning algorithm uses coordinate-wise averaging of the model parameters for aggregation. However, it has been shown that this method is not adapted in heterogeneous environments where data is not identically and independently distributed (non-iid). This corresponds directly to some pervasive computing scenarios where heterogeneity of devices and users challenges machine learning with the double objective of generalization and personalization. In this paper, we propose a novel aggregation algorithm, termed FedDist, which is able to modify its model architecture (here, deep neural network) by identifying dissimilarities between specific neurons amongst the clients. This permits to account for clients’ specificity without impairing generalization. Furthermore, we define a complete method to evaluate federated learning in a realistic way taking generalization and personalization into account. Using this method, FedDist is extensively tested and compared with three state-of-the-art federated learning algorithms on the pervasive domain of Human Activity Recognition with smartphones.

摘要:普适计算提倡在我们的生活空间中安装互联设备以提供服务。最近有两大发展势头强劲:对边缘资源的深入利用,以及将机器学习技术整合到工程应用中。这一演进带来了重大挑战,尤其是如何在从边缘到云的连续体上合理分布计算单元。为此,最近有人提出在边缘进行分布式模型训练的联邦学习。该方法的原理是聚合分布式客户端上学到的模型,以获得一个新的、更通用的模型,再将所得模型重新分发给客户端作进一步训练。迄今为止,最流行的联邦学习算法使用模型参数的逐坐标平均进行聚合。然而,已有研究表明,这种方法不适用于数据非独立同分布(non-iid)的异构环境,而这恰恰对应于某些普适计算场景:设备和用户的异构性对机器学习提出了泛化与个性化的双重目标。在本文中,我们提出一种新的聚合算法 FedDist,它能够通过识别客户端之间特定神经元的差异来修改其模型架构(这里是深度神经网络),从而在不损害泛化能力的情况下兼顾客户端的特异性。此外,我们定义了一套完整的方法,以兼顾泛化与个性化的现实方式评估联邦学习。使用该方法,我们在智能手机人类活动识别这一普适计算领域对 FedDist 进行了广泛测试,并与三种最先进的联邦学习算法进行了比较。
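下面用几行 Python 勾勒摘要中提到的"逐坐标平均"聚合基线(即标准的 FedAvg);FedDist 本身未在摘要中给出实现细节,此处仅为示意性草图,函数名与数据均为假设。

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """按客户端样本量加权、逐坐标平均各客户端的模型参数(FedAvg 聚合)。"""
    total = sum(client_sizes)
    return [
        sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in range(len(client_weights[0]))
    ]

# 两个假设的客户端,各自持有一个权重矩阵和一个偏置向量
c1 = [np.ones((2, 2)), np.zeros(2)]
c2 = [3 * np.ones((2, 2)), np.ones(2)]
agg = fedavg([c1, c2], client_sizes=[10, 30])  # 客户端 2 的数据量是客户端 1 的 3 倍
```

正如摘要所指出的,这种逐坐标平均在 non-iid 数据上可能失效,这正是 FedDist 通过比较客户端间特定神经元的差异来改进的出发点。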

ML-102-标题: The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding

链接: https://arxiv.org/abs/2110.10221
作者: Pratik Fegade, Tianqi Chen, Phillip B. Gibbons, Todd C. Mowry
备注: 23 pages, 25 figures and 10 tables

点击查看摘要

Abstract: There is often variation in the shape and size of input data used for deep learning. In many cases, such data can be represented using tensors with non-uniform shapes, or ragged tensors. Due to limited and non-portable support for efficient execution on ragged tensors, current deep learning frameworks generally use techniques such as padding and masking to make the data shapes uniform and then offload the computations to optimized kernels for dense tensor algebra. Such techniques can, however, lead to a lot of wasted computation and therefore, a loss in performance. This paper presents CoRa, a tensor compiler that allows users to easily generate efficient code for ragged tensor operators targeting a wide range of CPUs and GPUs. Evaluating CoRa on a variety of operators on ragged tensors as well as on an encoder layer of the transformer model, we find that CoRa (i)performs competitively with hand-optimized implementations of the operators and the transformer encoder and (ii) achieves, over PyTorch, a 1.6X geomean speedup for the encoder on an Nvidia GPU and a 1.86X geomean speedup for the multi-head attention module used in transformers on an ARM CPU.

摘要:用于深度学习的输入数据在形状和大小上往往存在差异。在许多情况下,这类数据可以用形状不均匀的张量(即不规则张量,ragged tensor)来表示。由于在不规则张量上高效执行的支持有限且不可移植,当前的深度学习框架通常使用填充(padding)和掩码(masking)等技术使数据形状统一,然后将计算交给针对稠密张量代数优化的内核。然而,这类技术会导致大量浪费的计算,从而造成性能损失。本文提出张量编译器 CoRa,使用户能够轻松地为不规则张量算子生成高效代码,目标平台涵盖各种 CPU 和 GPU。我们在不规则张量上的多种算子以及 Transformer 模型的编码器层上评估了 CoRa,发现 CoRa (i) 与算子和 Transformer 编码器的手工优化实现性能相当;(ii) 相对于 PyTorch,在 Nvidia GPU 上使编码器获得 1.6 倍的几何平均加速,在 ARM CPU 上使 Transformer 中使用的多头注意力模块获得 1.86 倍的几何平均加速。
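摘要中"填充导致大量浪费的计算"可以用一个小例子量化:把一个不规则批次填充到最长序列后,有多少比例的元素是填充位。以下为示意性草图,批次长度为虚构数据。

```python
import numpy as np

def padding_overhead(lengths):
    """把一个不规则批次填充到最长序列时,被浪费的计算比例。"""
    lengths = np.asarray(lengths)
    padded = len(lengths) * lengths.max()  # 填充后的总元素数
    return 1.0 - lengths.sum() / padded    # 真实元素之外全是浪费

# 一个长度极不均匀的假想批次:约七成计算花在填充位上
waste = padding_overhead([512, 64, 32, 16])
```

这正是 CoRa 直接在不规则张量上生成内核所要消除的开销。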

ML-103-标题: fairadapt: Causal Reasoning for Fair Data Pre-processing

链接: https://arxiv.org/abs/2110.10200
作者: Drago Plečko, Nicolas Bennett, Nicolai Meinshausen
备注: Keywords: algorithmic fairness, causal inference, machine learning

点击查看摘要

Abstract: Machine learning algorithms are useful for various predictions tasks, but they can also learn how to discriminate, based on gender, race or other sensitive attributes. This realization gave rise to the field of fair machine learning, which aims to measure and mitigate such algorithmic bias. This manuscript describes the R-package fairadapt, which implements a causal inference pre-processing method. By making use of a causal graphical model and the observed data, the method can be used to address hypothetical questions of the form “What would my salary have been, had I been of a different gender/race?”. Such individual level counterfactual reasoning can help eliminate discrimination and help justify fair decisions. We also discuss appropriate relaxations which assume certain causal pathways from the sensitive attribute to the outcome are not discriminatory.

摘要:机器学习算法对各种预测任务很有用,但它们也可能学会基于性别、种族或其他敏感属性进行歧视。这一认识催生了公平机器学习领域,其目标是度量并缓解此类算法偏差。本文介绍 R 软件包 fairadapt,它实现了一种基于因果推断的预处理方法。借助因果图模型和观测数据,该方法可用于回答诸如"如果我是另一种性别/种族,我的薪水会是多少?"这样的假设性问题。这种个体层面的反事实推理有助于消除歧视,并为公平决策提供依据。我们还讨论了适当的松弛方案,即假设从敏感属性到结果的某些因果路径不具歧视性。

ML-104-标题: NAS-HPO-Bench-II: A Benchmark Dataset on Joint Optimization of Convolutional Neural Network Architecture and Training Hyperparameters

链接: https://arxiv.org/abs/2110.10165
作者: Yoichi Hirose, Nozomu Yoshinari, Shinichi Shirakawa
备注: 16 pages, 6 figures. Accepted at ACML2021 (long oral). API is available at this https URL

点击查看摘要

Abstract: The benchmark datasets for neural architecture search (NAS) have been developed to alleviate the computationally expensive evaluation process and ensure a fair comparison. Recent NAS benchmarks only focus on architecture optimization, although the training hyperparameters affect the obtained model performances. Building the benchmark dataset for joint optimization of architecture and training hyperparameters is essential to further NAS research. The existing NAS-HPO-Bench is a benchmark for joint optimization, but it does not consider the network connectivity design as done in modern NAS algorithms. This paper introduces the first benchmark dataset for joint optimization of network connections and training hyperparameters, which we call NAS-HPO-Bench-II. We collect the performance data of 4K cell-based convolutional neural network architectures trained on the CIFAR-10 dataset with different learning rate and batch size settings, resulting in the data of 192K configurations. The dataset includes the exact data for 12 epoch training. We further build the surrogate model predicting the accuracies after 200 epoch training to provide the performance data of longer training epoch. By analyzing NAS-HPO-Bench-II, we confirm the dependency between architecture and training hyperparameters and the necessity of joint optimization. Finally, we demonstrate the benchmarking of the baseline optimization algorithms using NAS-HPO-Bench-II.

摘要:为了缓解计算代价高昂的评估过程并确保公平比较,研究者开发了神经架构搜索(NAS)的基准数据集。尽管训练超参数会影响所得模型的性能,近期的 NAS 基准仅关注架构优化。构建面向架构与训练超参数联合优化的基准数据集,对推进 NAS 研究至关重要。现有的 NAS-HPO-Bench 是联合优化的基准,但它没有像现代 NAS 算法那样考虑网络连接设计。本文介绍第一个面向网络连接与训练超参数联合优化的基准数据集,我们称之为 NAS-HPO-Bench-II。我们收集了 4K 个基于单元(cell)的卷积神经网络架构在 CIFAR-10 数据集上、不同学习率和批大小设置下训练的性能数据,共得到 192K 个配置的数据。数据集包含 12 个 epoch 训练的精确数据;我们进一步构建了预测 200 个 epoch 训练后精度的代理模型,以提供更长训练周期的性能数据。通过分析 NAS-HPO-Bench-II,我们确认了架构与训练超参数之间的依赖关系以及联合优化的必要性。最后,我们演示了使用 NAS-HPO-Bench-II 对基线优化算法进行基准测试。

ML-105-标题: Identifying Stroke Indicators Using Rough Sets

链接: https://arxiv.org/abs/2110.10152
作者: Muhammad Salman Pathan, Jianbiao Zhang, Deepu John, Avishek Nag, Soumyabrata Dev
备注: Accepted in IEEE Access, 2020

点击查看摘要

Abstract: Stroke is widely considered as the second most common cause of mortality. The adverse consequences of stroke have led to global interest and work for improving the management and diagnosis of stroke. Various techniques for data mining have been used globally for accurate prediction of occurrence of stroke based on the risk factors that are associated with the electronic health care records (EHRs) of the patients. In particular, EHRs routinely contain several thousands of features and most of them are redundant and irrelevant that need to be discarded to enhance the prediction accuracy. The choice of feature-selection methods can help in improving the prediction accuracy of the model and efficient data management of the archived input features. In this paper, we systematically analyze the various features in EHR records for the detection of stroke. We propose a novel rough-set based technique for ranking the importance of the various EHR records in detecting stroke. Unlike the conventional rough-set techniques, our proposed technique can be applied on any dataset that comprises binary feature sets. We evaluated our proposed method in a publicly available dataset of EHR, and concluded that age, average glucose level, heart disease, and hypertension were the most essential attributes for detecting stroke in patients. Furthermore, we benchmarked the proposed technique with other popular feature-selection techniques. We obtained the best performance in ranking the importance of individual features in detecting stroke.

摘要:中风被广泛认为是第二大常见死因。中风的不良后果引发了全球对改进中风管理与诊断的关注和努力。世界各地已使用多种数据挖掘技术,基于与患者电子健康记录(EHR)相关的风险因素来准确预测中风的发生。特别是,EHR 通常包含数千个特征,其中大多数是冗余和无关的,需要剔除以提高预测精度。特征选择方法的选择有助于提高模型的预测精度,并实现对归档输入特征的高效数据管理。在本文中,我们系统地分析了 EHR 记录中用于检测中风的各种特征,并提出一种新颖的基于粗糙集的技术,对各类 EHR 记录在检测中风中的重要性进行排序。与传统粗糙集技术不同,我们提出的技术可以应用于任何由二值特征集构成的数据集。我们在一个公开的 EHR 数据集上评估了所提方法,得出结论:年龄、平均血糖水平、心脏病和高血压是检测患者中风最重要的属性。此外,我们将所提技术与其他流行的特征选择技术进行了基准比较,在对单个特征检测中风的重要性排序上取得了最佳性能。
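粗糙集中"决策对属性子集的依赖度"是此类重要性排序的核心量之一。下面是在二值特征上的一个极简示意(属性名纯属虚构,并非论文实际使用的特征),计算有多少比例的样本,其属性取值能唯一决定中风标签。

```python
from collections import defaultdict

def dependency(rows, attrs, decision):
    """粗糙集依赖度:属性取值能唯一确定决策值的样本比例。"""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].add(row[decision])
    consistent = sum(
        1 for row in rows if len(groups[tuple(row[a] for a in attrs)]) == 1
    )
    return consistent / len(rows)

# 玩具化的二值 EHR 记录(属性名仅作示意)
data = [
    {"hypertension": 1, "heart_disease": 1, "stroke": 1},
    {"hypertension": 1, "heart_disease": 0, "stroke": 1},
    {"hypertension": 0, "heart_disease": 0, "stroke": 0},
    {"hypertension": 1, "heart_disease": 0, "stroke": 0},
]
dep = dependency(data, ["hypertension", "heart_disease"], "stroke")
```

依赖度越高,说明该属性子集对决策的区分能力越强,可据此对特征的重要性排序。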

ML-106-标题: Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory

链接: https://arxiv.org/abs/2110.11291
作者: Tianrong Chen, Guan-Horng Liu, Evangelos A. Theodorou
备注:

点击查看摘要

Abstract: Schrödinger Bridge (SB) is an optimal transport problem that has received increasing attention in deep generative modeling for its mathematical flexibility compared to the Scored-based Generative Model (SGM). However, it remains unclear whether the optimization principle of SB relates to the modern training of deep generative models, which often rely on constructing parameterized log-likelihood objectives.This raises questions on the suitability of SB models as a principled alternative for generative applications. In this work, we present a novel computational framework for likelihood training of SB models grounded on Forward-Backward Stochastic Differential Equations Theory – a mathematical methodology appeared in stochastic optimal control that transforms the optimality condition of SB into a set of SDEs. Crucially, these SDEs can be used to construct the likelihood objectives for SB that, surprisingly, generalizes the ones for SGM as special cases. This leads to a new optimization principle that inherits the same SB optimality yet without losing applications of modern generative training techniques, and we show that the resulting training algorithm achieves comparable results on generating realistic images on MNIST, CelebA, and CIFAR10.

摘要:薛定谔桥(Schrödinger Bridge, SB)是一个最优传输问题,由于相比基于得分的生成模型(SGM)具有数学上的灵活性,它在深度生成建模中受到越来越多的关注。然而,SB 的优化原理是否与深度生成模型的现代训练方式相关仍不清楚——后者通常依赖于构造参数化的对数似然目标。这使人们质疑 SB 模型能否作为生成应用的一种有原则的替代方案。在这项工作中,我们基于前向-后向随机微分方程理论提出了 SB 模型似然训练的新计算框架——该数学方法源自随机最优控制,可将 SB 的最优性条件转化为一组 SDE。至关重要的是,这些 SDE 可用于构造 SB 的似然目标,而令人惊讶的是,后者将 SGM 的似然目标作为特例加以推广。这带来了一种新的优化原理:在继承 SB 最优性的同时,不失现代生成式训练技术的应用;我们还表明,所得训练算法在 MNIST、CelebA 和 CIFAR10 的真实感图像生成上取得了可比的结果。

ML-107-标题: On Optimal Interpolation In Linear Regression

链接: https://arxiv.org/abs/2110.11258
作者: Eduard Oravkin, Patrick Rebeschini
备注: 25 pages, 7 figures, to appear in NeurIPS 2021

点击查看摘要

Abstract: Understanding when and why interpolating methods generalize well has recently been a topic of interest in statistical learning theory. However, systematically connecting interpolating methods to achievable notions of optimality has only received partial attention. In this paper, we investigate the question of what is the optimal way to interpolate in linear regression using functions that are linear in the response variable (as the case for the Bayes optimal estimator in ridge regression) and depend on the data, the population covariance of the data, the signal-to-noise ratio and the covariance of the prior for the signal, but do not depend on the value of the signal itself nor the noise vector in the training data. We provide a closed-form expression for the interpolator that achieves this notion of optimality and show that it can be derived as the limit of preconditioned gradient descent with a specific initialization. We identify a regime where the minimum-norm interpolator provably generalizes arbitrarily worse than the optimal response-linear achievable interpolator that we introduce, and validate with numerical experiments that the notion of optimality we consider can be achieved by interpolating methods that only use the training data as input in the case of an isotropic prior. Finally, we extend the notion of optimal response-linear interpolation to random features regression under a linear data-generating model that has been previously studied in the literature.

摘要:理解插值方法何时以及为何具有良好的泛化性能,是近来统计学习理论关注的课题。然而,将插值方法与可达的最优性概念系统地联系起来的研究还很少。在本文中,我们研究如下问题:在线性回归中,使用在响应变量上线性的函数(岭回归中的贝叶斯最优估计量即属此类)进行插值的最优方式是什么?这类函数依赖于数据、数据的总体协方差、信噪比以及信号先验的协方差,但不依赖于信号本身的取值,也不依赖于训练数据中的噪声向量。我们给出了达到这一最优性概念的插值器的闭式表达式,并证明它可以作为带特定初始化的预条件梯度下降的极限导出。我们刻画了一个区域,在其中最小范数插值器的泛化可被证明任意差于我们引入的最优响应线性可达插值器,并通过数值实验验证:在各向同性先验情形下,仅使用训练数据作为输入的插值方法即可达到我们所考虑的最优性概念。最后,我们将最优响应线性插值的概念推广到文献中已有研究的线性数据生成模型下的随机特征回归。
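摘要中作为比较对象的"最小范数插值器"在过参数化线性回归(样本数少于特征数)中就是伪逆解。下面是一个简短示意,数据为随机生成。

```python
import numpy as np

rng = np.random.default_rng(1)

# 过参数化设置:样本数 n 小于特征数 d,存在无穷多插值解
n, d = 20, 50
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# 最小范数插值器:w = X^T (X X^T)^{-1} y,即伪逆给出的解
w_min_norm = np.linalg.pinv(X) @ y
```

在所有满足 Xw = y 的解中,该解的欧氏范数最小;论文讨论的正是它与最优的响应线性插值器之间的差距。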

ML-108-标题: User-friendly introduction to PAC-Bayes bounds

链接: https://arxiv.org/abs/2110.11216
作者: Pierre Alquier
备注:

点击查看摘要

Abstract: Aggregated predictors are obtained by making a set of basic predictors vote according to some weights, that is, to some probability distribution. Randomized predictors are obtained by sampling in a set of basic predictors, according to some prescribed probability distribution. Thus, aggregated and randomized predictors have in common that they are not defined by a minimization problem, but by a probability distribution on the set of predictors. In statistical learning theory, there is a set of tools designed to understand the generalization ability of such procedures: PAC-Bayesian or PAC-Bayes bounds. Since the original PAC-Bayes bounds of McAllester, these tools have been considerably improved in many directions (we will for example describe a simplified version of the localization technique of Catoni that was missed by the community, and later rediscovered as “mutual information bounds”). Very recently, PAC-Bayes bounds received a considerable attention: for example there was workshop on PAC-Bayes at NIPS 2017, “(Almost) 50 Shades of Bayesian Learning: PAC-Bayesian trends and insights”, organized by B. Guedj, F. Bach and P. Germain. One of the reason of this recent success is the successful application of these bounds to neural networks by Dziugaite and Roy. An elementary introduction to PAC-Bayes theory is still missing. This is an attempt to provide such an introduction.

摘要:聚合预测器是通过让一组基本预测器按某种权重(即某个概率分布)投票而得到的;随机化预测器则是按照某个指定的概率分布从一组基本预测器中采样而得到的。因此,聚合预测器与随机化预测器的共同点在于,它们不是由最小化问题定义的,而是由预测器集合上的一个概率分布定义的。在统计学习理论中,有一套专门用于理解此类方法泛化能力的工具:PAC-Bayesian(或称 PAC-Bayes)界。自 McAllester 最初的 PAC-Bayes 界以来,这些工具已在许多方向上得到显著改进(例如,我们将描述 Catoni 的局部化技术的一个简化版本,它曾被学界忽视,后来作为"互信息界"被重新发现)。最近,PAC-Bayes 界受到了相当多的关注:例如 NIPS 2017 上由 B. Guedj、F. Bach 和 P. Germain 组织的研讨会"(Almost) 50 Shades of Bayesian Learning: PAC-Bayesian trends and insights"。这一近期成功的原因之一是 Dziugaite 和 Roy 将这些界成功地应用于神经网络。目前仍然缺少一份 PAC-Bayes 理论的入门介绍,本文正是提供这样一份介绍的尝试。
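作为入门示例,McAllester 的经典 PAC-Bayes 界可以写成如下形式(这里采用通用记号,并非本文特有的表述):对任意先验 $P$、任意 $\delta\in(0,1)$,以至少 $1-\delta$ 的概率,对所有后验 $Q$ 同时有

$$\mathbb{E}_{h\sim Q}[R(h)] \;\le\; \mathbb{E}_{h\sim Q}[r_S(h)] + \sqrt{\frac{\mathrm{KL}(Q\,\|\,P) + \ln(2\sqrt{n}/\delta)}{2n}},$$

其中 $R$ 为泛化风险,$r_S$ 为 $n$ 个样本上的经验风险,$\mathrm{KL}$ 为 Kullback-Leibler 散度。界的松紧由后验与先验之间的 KL 散度控制,这正是 Dziugaite 和 Roy 将其用于神经网络时优化的量。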

ML-109-标题: Mean Nyström Embeddings for Adaptive Compressive Learning

链接: https://arxiv.org/abs/2110.10996
作者: Antoine Chatalic, Luigi Carratino, Ernesto De Vito, Lorenzo Rosasco
备注: 22 pages, 4 figures

点击查看摘要

Abstract: Compressive learning is an approach to efficient large scale learning based on sketching an entire dataset to a single mean embedding (the sketch), i.e. a vector of generalized moments. The learning task is then approximately solved as an inverse problem using an adapted parametric model. Previous works in this context have focused on sketches obtained by averaging random features, that while universal can be poorly adapted to the problem at hand. In this paper, we propose and study the idea of performing sketching based on data-dependent Nyström approximation. From a theoretical perspective we prove that the excess risk can be controlled under a geometric assumption relating the parametric model used to learn from the sketch and the covariance operator associated to the task at hand. Empirically, we show for k-means clustering and Gaussian modeling that for a fixed sketch size, Nyström sketches indeed outperform those built with random features.

摘要:压缩学习是一种高效的大规模学习方法,它将整个数据集概括(sketch)为单个均值嵌入(即草图,一个广义矩向量),然后使用适配的参数模型将学习任务近似地作为逆问题求解。此前的工作侧重于通过对随机特征取平均得到的草图,这种草图虽然具有通用性,却可能与手头的问题适配不佳。在本文中,我们提出并研究了基于数据依赖的 Nyström 近似进行概括的想法。在理论上,我们证明在一个几何假设下超额风险是可控的,该假设将用于从草图学习的参数模型与当前任务对应的协方差算子联系起来。在经验上,我们针对 k-means 聚类和高斯建模表明:在固定草图大小下,Nyström 草图的表现确实优于基于随机特征构建的草图。
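下面用 Python 勾勒"用数据依赖的 Nyström 特征对整个数据集取均值嵌入"的基本流程(高斯核、从数据中随机选取地标点,均为示意性假设,非论文的精确构造)。

```python
import numpy as np

rng = np.random.default_rng(3)

def gauss_kernel(A, B, gamma=1.0):
    """高斯核矩阵 k(a, b) = exp(-gamma * ||a - b||^2)。"""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_sketch(X, landmarks):
    """用 Nyström 特征 phi(x) = K_mm^{-1/2} k(landmarks, x) 的均值
    把整个数据集压缩成一个广义矩向量(草图)。"""
    Kmm = gauss_kernel(landmarks, landmarks)
    vals, vecs = np.linalg.eigh(Kmm)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.clip(vals, 1e-10, None))) @ vecs.T
    return (gauss_kernel(X, landmarks) @ inv_sqrt).mean(axis=0)

X = rng.standard_normal((500, 2))
landmarks = X[rng.choice(500, size=10, replace=False)]  # 数据依赖:地标取自数据本身
sketch = nystrom_sketch(X, landmarks)                   # 500 个样本被压缩为 10 维向量
```

与随机特征草图不同,这里的特征映射由数据自身的子样本决定,这正是摘要中"数据依赖"一词的含义。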

ML-110-标题: Data splitting improves statistical performance in overparametrized regimes

链接: https://arxiv.org/abs/2110.10956
作者: Nicole Mücke, Enrico Reiss, Jonas Rungenhagen, Markus Klein
备注:

点击查看摘要

Abstract: While large training datasets generally offer improvement in model performance, the training process becomes computationally expensive and time consuming. Distributed learning is a common strategy to reduce the overall training time by exploiting multiple computing devices. Recently, it has been observed in the single machine setting that overparametrization is essential for benign overfitting in ridgeless regression in Hilbert spaces. We show that in this regime, data splitting has a regularizing effect, hence improving statistical performance and computational complexity at the same time. We further provide a unified framework that allows to analyze both the finite and infinite dimensional setting. We numerically demonstrate the effect of different model parameters.

摘要:虽然大型训练数据集通常能提升模型性能,但训练过程会变得计算昂贵且耗时。分布式学习是利用多台计算设备来缩短总体训练时间的常见策略。最近在单机设置下的研究观察到,过参数化对于希尔伯特空间中无岭(ridgeless)回归的良性过拟合是必不可少的。我们表明,在该区域中,数据拆分具有正则化效果,从而同时改善统计性能和计算复杂度。我们进一步提供了一个统一框架,可同时分析有限维和无限维设置,并用数值实验展示了不同模型参数的影响。
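"数据拆分"方案可以用分而治之的平均估计器来示意:把样本分到 m 台机器,各自做(无岭的)最小二乘拟合,再平均各机器的估计量。以下为示意性草图,维度与噪声水平均为假设,不代表论文的精确设置。

```python
import numpy as np

rng = np.random.default_rng(2)

def local_fit(X, y):
    """单台机器上的无岭(最小范数)最小二乘拟合。"""
    return np.linalg.pinv(X) @ y

n, d, m = 300, 30, 3
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

# 把数据拆成 m 份,本地拟合后对估计量取平均
w_avg = np.mean([local_fit(X[i::m], y[i::m]) for i in range(m)], axis=0)
```

每台机器只处理 n/m 个样本,计算量随之下降;论文的核心结论是这种拆分在过参数化区域还起到隐式正则化的作用。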

ML-111-标题: REAL-M: Towards Speech Separation on Real Mixtures

链接: https://arxiv.org/abs/2110.10812
作者: Cem Subakan, Mirco Ravanelli, Samuele Cornell, François Grondin
备注: Submitted to ICASSP 2022

点击查看摘要

Abstract: In recent years, deep learning based source separation has achieved impressive results. Most studies, however, still evaluate separation models on synthetic datasets, while the performance of state-of-the-art techniques on in-the-wild speech data remains an open question. This paper contributes to fill this gap in two ways. First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures. Secondly, we address the problem of performance evaluation of real-life mixtures, where the ground truth is not available. We bypass this issue by carefully designing a blind Scale-Invariant Signal-to-Noise Ratio (SI-SNR) neural estimator. Through a user study, we show that our estimator reliably evaluates the separation performance on real mixtures. The performance predictions of the SI-SNR estimator indeed correlate well with human opinions. Moreover, we observe that the performance trends predicted by our estimator on the REAL-M dataset closely follow those achieved on synthetic benchmarks when evaluating popular speech separation models.

摘要:近年来,基于深度学习的声源分离取得了令人印象深刻的结果。然而,大多数研究仍在合成数据集上评估分离模型,最先进技术在真实("野外")语音数据上的表现仍是一个悬而未决的问题。本文从两方面填补这一空白。首先,我们发布 REAL-M 数据集——一个众包采集的真实混合语音语料库。其次,我们解决真实混合语音(没有真值参考)的性能评估问题:通过精心设计一个盲的尺度不变信噪比(SI-SNR)神经估计器来绕过这一难题。通过用户研究,我们表明该估计器能够可靠地评估真实混合语音上的分离性能,SI-SNR 估计器的性能预测确实与人类意见高度相关。此外,我们观察到,在评估流行的语音分离模型时,估计器在 REAL-M 数据集上预测的性能趋势与在合成基准上取得的趋势高度一致。
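摘要中的 SI-SNR(尺度不变信噪比)有标准定义,可以直接写出。下面是该常见定义的示意实现(非论文代码),信号为合成数据。

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """尺度不变信噪比(dB):先去均值,再把估计投影到目标方向上。"""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    s_target = np.dot(estimate, target) / (np.dot(target, target) + eps) * target
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

t = np.sin(np.linspace(0, 8 * np.pi, 16000))
score_scaled = si_snr(0.5 * t, t)  # 尺度不变:整体缩放不会降低得分
score_noisy = si_snr(t + 0.1 * np.random.default_rng(0).standard_normal(16000), t)
```

论文的神经估计器要做的,正是在拿不到 target 参考信号的情况下盲估这一数值。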

ML-112-标题: Identifiable Variational Autoencoders via Sparse Decoding

链接: https://arxiv.org/abs/2110.10804
作者: Gemma E. Moran, Dhanya Sridhar, Yixin Wang, David M. Blei
备注:

点击查看摘要

Abstract: We develop the Sparse VAE, a deep generative model for unsupervised representation learning on high-dimensional data. Given a dataset of observations, the Sparse VAE learns a set of latent factors that captures its distribution. The model is sparse in the sense that each feature of the dataset (i.e., each dimension) depends on a small subset of the latent factors. As examples, in ratings data each movie is only described by a few genres; in text data each word is only applicable to a few topics; in genomics, each gene is active in only a few biological processes. We first show that the Sparse VAE is identifiable: given data drawn from the model, there exists a uniquely optimal set of factors. (In contrast, most VAE-based models are not identifiable.) The key assumption behind Sparse-VAE identifiability is the existence of “anchor features”, where for each factor there exists a feature that depends only on that factor. Importantly, the anchor features do not need to be known in advance. We then show how to fit the Sparse VAE with variational EM. Finally, we empirically study the Sparse VAE with both simulated and real data. We find that it recovers meaningful latent factors and has smaller heldout reconstruction error than related methods.

摘要:我们提出稀疏 VAE(Sparse VAE),一种用于高维数据无监督表示学习的深度生成模型。给定一个观测数据集,稀疏 VAE 学习一组刻画其分布的潜在因子。模型的稀疏性体现在:数据集的每个特征(即每个维度)只依赖于潜在因子的一个小子集。例如,在评分数据中每部电影只由少数几种类型描述;在文本数据中每个词只与少数几个主题相关;在基因组学中每个基因只在少数生物过程中活跃。我们首先证明稀疏 VAE 是可识别的:给定从模型中采样的数据,存在唯一最优的一组因子(相比之下,大多数基于 VAE 的模型不可识别)。稀疏 VAE 可识别性背后的关键假设是"锚特征"的存在:对每个因子都存在一个只依赖于该因子的特征。重要的是,锚特征无需事先已知。接着我们展示如何用变分 EM 拟合稀疏 VAE。最后,我们在模拟数据和真实数据上对稀疏 VAE 进行了实证研究,发现它能恢复有意义的潜在因子,且留出(heldout)重构误差小于相关方法。

ML-113-标题: Adversarial attacks against Bayesian forecasting dynamic models

链接: https://arxiv.org/abs/2110.10783
作者: Roi Naveiro
备注:

点击查看摘要

Abstract: The last decade has seen the rise of Adversarial Machine Learning (AML). This discipline studies how to manipulate data to fool inference engines, and how to protect those systems against such manipulation attacks. Extensive work on attacks against regression and classification systems is available, while little attention has been paid to attacks against time series forecasting systems. In this paper, we propose a decision analysis based attacking strategy that could be utilized against Bayesian forecasting dynamic models.

摘要:过去十年见证了对抗机器学习(AML)的兴起。该学科研究如何操纵数据以欺骗推理引擎,以及如何保护这些系统免受此类操纵攻击。针对回归和分类系统的攻击已有大量研究,而针对时间序列预测系统的攻击却很少受到关注。在本文中,我们提出一种基于决策分析的攻击策略,可用于攻击贝叶斯预测动态模型。

ML-114-标题: Pick-and-Mix Information Operators for Probabilistic ODE Solvers

链接: https://arxiv.org/abs/2110.10770
作者: Nathanael Bosch, Filip Tronarp, Philipp Hennig
备注: 13 pages, 7 figures

点击查看摘要

Abstract: Probabilistic numerical solvers for ordinary differential equations compute posterior distributions over the solution of an initial value problem via Bayesian inference. In this paper, we leverage their probabilistic formulation to seamlessly include additional information as general likelihood terms. We show that second-order differential equations should be directly provided to the solver, instead of transforming the problem to first order. Additionally, by including higher-order information or physical conservation laws in the model, solutions become more accurate and more physically meaningful. Lastly, we demonstrate the utility of flexible information operators by solving differential-algebraic equations. In conclusion, the probabilistic formulation of numerical solvers offers a flexible way to incorporate various types of information, thus improving the resulting solutions.

摘要:常微分方程的概率数值求解器通过贝叶斯推断计算初值问题解的后验分布。在本文中,我们利用其概率表述,将附加信息以一般似然项的形式无缝纳入。我们表明,二阶微分方程应直接提供给求解器,而不是先化为一阶问题;此外,在模型中加入高阶信息或物理守恒律后,解会变得更精确、更具物理意义。最后,我们通过求解微分代数方程展示了灵活信息算子的效用。总之,数值求解器的概率表述为纳入各类信息提供了灵活的途径,从而改善所得的解。

ML-115-标题: Iterated Block Particle Filter for High-dimensional Parameter Learning: Beating the Curse of Dimensionality

链接: https://arxiv.org/abs/2110.10745
作者: Ning Ning, Edward L. Ionides
备注:

点击查看摘要

Abstract: Parameter learning for high-dimensional, partially observed, and nonlinear stochastic processes is a methodological challenge. Spatiotemporal disease transmission systems provide examples of such processes giving rise to open inference problems. We propose the iterated block particle filter (IBPF) algorithm for learning high-dimensional parameters over graphical state space models with general state spaces, measures, transition densities and graph structure. Theoretical performance guarantees are obtained on beating the curse of dimensionality (COD), algorithm convergence, and likelihood maximization. Experiments on a highly nonlinear and non-Gaussian spatiotemporal model for measles transmission reveal that the iterated ensemble Kalman filter algorithm (Li et al. (2020)) is ineffective and the iterated filtering algorithm (Ionides et al. (2015)) suffers from the COD, while our IBPF algorithm beats COD consistently across various experiments with different metrics.

摘要:对高维、部分观测的非线性随机过程进行参数学习是一项方法论挑战。时空疾病传播系统就是此类过程的例子,并由此产生了尚未解决的推断问题。我们提出迭代块粒子滤波(IBPF)算法,用于在具有一般状态空间、测度、转移密度和图结构的图状态空间模型上学习高维参数,并就克服维数灾难(COD)、算法收敛性和似然最大化给出了理论性能保证。在一个高度非线性、非高斯的麻疹传播时空模型上的实验表明,迭代集合卡尔曼滤波算法(Li et al. (2020))无效,迭代滤波算法(Ionides et al. (2015))受维数灾难所困,而我们的 IBPF 算法在使用不同度量的各种实验中都能稳定地克服维数灾难。

ML-116-标题: Semi-supervised physics guided DL framework for predicting the I-V characteristics of GAN HEMT

链接: https://arxiv.org/abs/2110.10724
作者: Shivanshu Mishra, Bipin Gaikwad, Nidhi Chaturvedi
备注:

点击查看摘要

Abstract: This letter proposes a novel deep learning framework (DLF) that addresses two major hurdles in the adoption of deep learning techniques for solving physics-based problems: 1) requirement of the large dataset for training the DL model, 2) consistency of the DL model with the physics of the phenomenon. The framework is generic in nature and can be applied to model a phenomenon from other fields of research too as long as its behaviour is known. To demonstrate the technique, a semi-supervised physics guided neural network (SPGNN) has been developed that predicts I-V characteristics of a gallium nitride-based high electron mobility transistor (GaN HEMT). A two-stage training method is proposed, where in the first stage, the DL model is trained via the unsupervised learning method using the I-V equations of a field-effect transistor as a loss function of the model that incorporates physical behaviors in the DL model and in the second stage, the DL model has been fine-tuned with a very small set of experimental data. The SPGNN significantly reduces the requirement of the training data by more than 80% for achieving similar or better performance than a traditional neural network (TNN) even for unseen conditions. The SPGNN predicts 32.4% of the unseen test data with less than 1% of error and only 0.4% of the unseen test data with more than 10% of error.

摘要:本快报提出一种新颖的深度学习框架(DLF),解决将深度学习技术用于求解基于物理的问题时的两大障碍:1)训练 DL 模型需要大规模数据集;2)DL 模型与现象物理规律的一致性。该框架本质上是通用的,只要行为已知,也可用于建模其他研究领域的现象。为演示该技术,我们开发了一种半监督物理引导神经网络(SPGNN),用于预测氮化镓基高电子迁移率晶体管(GaN HEMT)的 I-V 特性。我们提出一种两阶段训练方法:第一阶段,以场效应晶体管的 I-V 方程作为损失函数,通过无监督学习训练 DL 模型,从而将物理行为融入模型;第二阶段,用极少量实验数据对 DL 模型进行微调。即便在未见过的条件下,SPGNN 也能以比传统神经网络(TNN)少 80% 以上的训练数据取得相近或更好的性能:对 32.4% 的未见测试数据,其预测误差小于 1%;误差超过 10% 的未见测试数据仅占 0.4%。
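第一阶段"以器件方程作为损失"的思路可以用一个极简的两项损失来示意:数据拟合项加上与理想平方律 FET 漏极电流模型的偏差项。下面的 v_th、k 等符号与模型形式纯属示意,并非论文采用的实际方程。

```python
import numpy as np

def physics_guided_loss(i_pred, i_meas, v_gs, v_th=1.0, k=0.5, weight=0.1):
    """示意性两项损失:数据拟合项 + 物理一致性项。"""
    i_phys = k * np.maximum(v_gs - v_th, 0.0) ** 2  # 理想化的饱和区漏极电流
    data_term = np.mean((i_pred - i_meas) ** 2)
    physics_term = np.mean((i_pred - i_phys) ** 2)
    return data_term + weight * physics_term

v_gs = np.linspace(0.0, 3.0, 50)
i_meas = 0.5 * np.maximum(v_gs - 1.0, 0.0) ** 2   # 恰好符合物理模型的"测量"数据
loss = physics_guided_loss(i_meas, i_meas, v_gs)  # 此时两项同时为零
```

物理项使网络在缺少实验数据的区域也受到器件方程约束,这正是摘要中训练数据需求大幅下降的直观原因。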

ML-117-标题: Learning quantum dynamics with latent neural ODEs

链接: https://arxiv.org/abs/2110.10721
作者: Matthew Choi, Daniel Flam-Shepherd, Thi Ha Kyaw, Alán Aspuru-Guzik
备注:

点击查看摘要

Abstract: The core objective of machine-assisted scientific discovery is to learn physical laws from experimental data without prior knowledge of the systems in question. In the area of quantum physics, making progress towards these goals is significantly more challenging due to the curse of dimensionality as well as the counter-intuitive nature of quantum mechanics. Here, we present the QNODE, a latent neural ODE trained on dynamics from closed and open quantum systems. The QNODE can learn to generate quantum dynamics and extrapolate outside of its training region that satisfy the von Neumann and time-local Lindblad master equations for closed and open quantum systems. Furthermore the QNODE rediscovers quantum mechanical laws such as Heisenberg’s uncertainty principle in a totally data-driven way, without constraints or guidance. Additionally, we show that trajectories that are generated from the QNODE and are close in its latent space have similar quantum dynamics while preserving the physics of the training system.

摘要:机器辅助科学发现的核心目标是在不预先了解所研究系统的情况下,从实验数据中学习物理定律。在量子物理领域,由于维数灾难以及量子力学反直觉的本质,朝这些目标推进要困难得多。在这里,我们提出 QNODE,一种在封闭和开放量子系统动力学数据上训练的潜变量神经 ODE。QNODE 能够学习生成量子动力学并外推到训练区域之外,所得动力学满足封闭与开放量子系统的 von Neumann 方程和时间局域 Lindblad 主方程。此外,QNODE 以完全数据驱动的方式重新发现了海森堡不确定性原理等量子力学定律,无需约束或指导。我们还表明,由 QNODE 生成的、在其潜空间中彼此接近的轨迹具有相似的量子动力学,同时保持训练系统的物理性质。

ML-118-标题: Stochastic Learning Rate Optimization in the Stochastic Approximation and Online Learning Settings

链接: https://arxiv.org/abs/2110.10710
作者: Theodoros Mamalis, Dusan Stipanovic, Petros Voulgaris
备注:

点击查看摘要

Abstract: In this work, multiplicative stochasticity is applied to the learning rate of stochastic optimization algorithms, giving rise to stochastic learning-rate schemes. In-expectation theoretical convergence results of Stochastic Gradient Descent equipped with this novel stochastic learning rate scheme under the stochastic setting, as well as convergence results under the online optimization settings are provided. Empirical results consider the case of an adaptively uniformly distributed multiplicative stochasticity and include not only Stochastic Gradient Descent, but also other popular algorithms equipped with a stochastic learning rate. They demonstrate noticeable optimization performance gains, with respect to their deterministic-learning-rate versions.

摘要:在这项工作中,我们将乘性随机性引入随机优化算法的学习率,从而产生随机学习率方案。我们给出了在随机设置下配备这种新型随机学习率方案的随机梯度下降在期望意义下的理论收敛结果,以及在线优化设置下的收敛结果。实验考虑了自适应均匀分布的乘性随机性情形,不仅包括随机梯度下降,还包括其他配备随机学习率的流行算法。相对于各自的确定性学习率版本,它们展示了明显的优化性能提升。
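"乘性随机学习率"的含义可以用几行代码说明:每一步把基础学习率乘以一个均匀分布的随机因子(下面的分布区间与目标函数均为示意,并非论文的具体设置)。

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_stochastic_lr(grad, x0, base_lr=0.1, spread=0.5, steps=200):
    """每步学习率 = base_lr × Uniform(1-spread, 1+spread) 的梯度下降。"""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        lr = base_lr * rng.uniform(1 - spread, 1 + spread)
        x = x - lr * grad(x)
    return x

# 二次玩具目标 f(x) = ||x||^2 / 2,其梯度为 x
x_final = sgd_stochastic_lr(lambda x: x, x0=[5.0, -3.0])
```

确定性版本对应 spread=0;论文研究的正是 spread>0 时期望意义下的收敛性。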

ML-119-标题: Predicting Tau Accumulation in Cerebral Cortex with Multivariate MRI Morphometry Measurements Sparse Coding and Correntropy

链接: https://arxiv.org/abs/2110.10709
作者: Jianfeng Wu, Wenhui Zhu, Yi Su, Jie Gui, Natasha Lepore, Eric M. Reiman, Richard J. Caselli, Paul M. Thompson, Kewei Chen, Yalin Wang
备注: 10 pages, 5 figures, 17th International Symposium on Medical Information Processing and Analysis

点击查看摘要

Abstract: Biomarker-assisted diagnosis and intervention in Alzheimer’s disease (AD) may be the key to prevention breakthroughs. One of the hallmarks of AD is the accumulation of tau plaques in the human brain. However, current methods to detect tau pathology are either invasive (lumbar puncture) or quite costly and not widely available (Tau PET). In our previous work, structural MRI-based hippocampal multivariate morphometry statistics (MMS) showed superior performance as an effective neurodegenerative biomarker for preclinical AD and Patch Analysis-based Surface Correntropy-induced Sparse coding and max-pooling (PASCS-MP) has excellent ability to generate low-dimensional representations with strong statistical power for brain amyloid prediction. In this work, we apply this framework together with ridge regression models to predict Tau deposition in Braak12 and Braak34 brain regions separately. We evaluate our framework on 925 subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Each subject has one pair consisting of a PET image and MRI scan which were collected at about the same times. Experimental results suggest that the representations from our MMS and PASCS-MP have stronger predictive power and their predicted Braak12 and Braak34 are closer to the real values compared to the measures derived from other approaches such as hippocampal surface area and volume, and shape morphometry features based on spherical harmonics (SPHARM).

摘要:在阿尔茨海默病(AD)中,生物标志物辅助的诊断与干预可能是实现预防性突破的关键。AD 的标志之一是人脑中 tau 斑块的积累。然而,目前检测 tau 病理的方法要么是侵入性的(腰椎穿刺),要么相当昂贵且不易获得(Tau PET)。在我们先前的工作中,基于结构 MRI 的海马多变量形态统计量(MMS)作为临床前 AD 的有效神经退行性生物标志物表现出优越的性能,而基于图像块分析的表面 correntropy 诱导稀疏编码与最大池化(PASCS-MP)能够生成低维表示,对脑淀粉样蛋白预测具有很强的统计功效。在这项工作中,我们将该框架与岭回归模型结合,分别预测 Braak12 和 Braak34 脑区的 tau 沉积。我们在来自阿尔茨海默病神经影像学计划(ADNI)的 925 名受试者上评估了该框架,每名受试者都有一对在大致相同时间采集的 PET 图像和 MRI 扫描。实验结果表明,与海马表面积和体积、以及基于球谐函数(SPHARM)的形状形态学特征等其他方法得到的度量相比,我们的 MMS 与 PASCS-MP 表示具有更强的预测能力,其预测的 Braak12 和 Braak34 更接近真实值。

ML-120-标题: Online non-parametric change-point detection for heterogeneous data streams observed over graph nodes

链接: https://arxiv.org/abs/2110.10518
作者: Alejandro de la Concha, Argyris Kalogeratos, Nicolas Vayatis
备注: 11 pages

点击查看摘要

Abstract: Consider a heterogeneous data stream being generated by the nodes of a graph. The data stream is in essence composed by multiple streams, possibly of different nature that depends on each node. At a given moment τ\tau, a change-point occurs for a subset of nodes CC, signifying the change in the probability distribution of their associated streams. In this paper we propose an online non-parametric method to infer τ\tau based on the direct estimation of the likelihood-ratio between the post-change and the pre-change distribution associated with the data stream of each node. We propose a kernel-based method, under the hypothesis that connected nodes of the graph are expected to have similar likelihood-ratio estimates when there is no change-point. We demonstrate the quality of our method on synthetic experiments and real-world applications.

摘要:考虑由图的节点生成的异构数据流。该数据流实质上由多条流组成,各条流的性质可能因节点而异。在给定时刻 $\tau$,节点子集 $C$ 发生变更点,表现为其关联数据流的概率分布发生变化。本文提出一种在线非参数方法,通过直接估计每个节点数据流变更后分布与变更前分布之间的似然比来推断 $\tau$。我们提出一种基于核的方法,其假设是:在没有变更点时,图中相连的节点应具有相似的似然比估计。我们在合成实验和真实应用上展示了该方法的质量。
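作为示意,下面用一个参数化的简化替身——高斯均值漂移的似然比统计量——代替论文中的非参数核似然比估计,在单条数据流上检测变更点。窗宽 w、漂移幅度与数据都是本示例的假设:

```python
import numpy as np

rng = np.random.default_rng(1)

def change_point(x, w=50):
    """在每个候选时刻 t,比较 t 前后各 w 个样本的均值,
    取高斯均值漂移似然比统计量最大的 t 作为变更点估计。"""
    ts = range(w, len(x) - w)
    stats = [w * (x[t:t + w].mean() - x[t - w:t].mean()) ** 2 / 2 for t in ts]
    return w + int(np.argmax(stats))

# 单条数据流:第 300 个样本处均值从 0 跳到 1.5
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(1.5, 1, 300)])
tau_hat = change_point(x)
```

论文的方法进一步在图上把相连节点的似然比估计耦合起来,这里不做实现。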

ML-121-标题: Feedback Linearization of Car Dynamics for Racing via Reinforcement Learning

链接: https://arxiv.org/abs/2110.10441
作者: Michael Estrada, Sida Li, Xiangyu Cai
备注: Final research paper for Berkeley’s CS 285 (Deep Reinforcement Learning) in Fall 2020

点击查看摘要

Abstract: Through the method of Learning Feedback Linearization, we seek to learn a linearizing controller to simplify the process of controlling a car to race autonomously. A soft actor-critic approach is used to learn a decoupling matrix and drift vector that effectively correct for errors in a hand-designed linearizing controller. The result is an exactly linearizing controller that can be used to enable the well-developed theory of linear systems to design path planning and tracking schemes that are easy to implement and significantly less computationally demanding. To demonstrate the method of feedback linearization, it is first used to learn a simulated model whose exact structure is known, but varied from the initial controller, so as to introduce error. We further seek to apply this method to a system that introduces even more error in the form of a gym environment specifically designed for modeling the dynamics of car racing. To do so, we posit an extension to the method of learning feedback linearization; a neural network that is trained using supervised learning to convert the output of our linearizing controller to the required input for the racing environment. Our progress towards these goals is reported and the next steps in their accomplishment are discussed.

摘要:我们通过学习反馈线性化(Learning Feedback Linearization)方法,学习一个线性化控制器,以简化控制赛车自主竞速的过程。我们采用 soft actor-critic 方法学习一个解耦矩阵和漂移向量,以有效修正手工设计的线性化控制器中的误差,从而得到一个精确线性化的控制器。借助它,可以利用发展成熟的线性系统理论来设计易于实现、计算开销显著更低的路径规划与跟踪方案。为了演示反馈线性化方法,我们首先用它学习一个结构精确已知、但相对初始控制器有所变化(以引入误差)的仿真模型。我们进一步尝试把该方法应用于一个误差更大的系统:一个专为建模赛车动力学而设计的 gym 环境。为此,我们提出对学习反馈线性化方法的一个扩展:用监督学习训练一个神经网络,把线性化控制器的输出转换为赛车环境所需的输入。文中汇报了我们朝这些目标取得的进展,并讨论了实现它们的后续步骤。

ML-122-标题: Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process

链接: https://arxiv.org/abs/2110.10351
作者: Tianjiao Li, Ziwei Guan, Shaofeng Zou, Tengyu Xu, Yingbin Liang, Guanghui Lan
备注: The paper was initially submitted for publication in January 2021

点击查看摘要

Abstract: The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its utilities/costs. A new primal-dual approach is proposed with a novel integration of three ingredients: entropy regularized policy optimizer, dual variable regularizer, and Nesterov’s accelerated gradient descent dual optimizer, all of which are critical to achieve a faster convergence. The finite-time error bound of the proposed approach is characterized. Despite the challenge of the nonconcave objective subject to nonconcave constraints, the proposed approach is shown to converge to the global optimum with a complexity of O~(1/ϵ)\tilde{\mathcal O}(1/\epsilon) in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approach by a factor of O(1/ϵ)\mathcal O(1/\epsilon) \citep{ding2020natural,paternain2019constrained}. This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of O(1/ϵ)\mathcal O(1/\epsilon) for convex optimization subject to convex constraints. Our primal-dual approach and non-asymptotic analysis are agnostic to the RL optimizer used, and thus are more flexible for practical applications. More generally, our approach also serves as the first algorithm that provably accelerates constrained nonconvex optimization with zero duality gap by exploiting the geometries such as the gradient dominance condition, for which the existing acceleration methods for constrained convex optimization are not applicable.

摘要:本文研究受约束马尔可夫决策过程(CMDP)问题,其中智能体的目标是在效用/成本的多重约束下最大化期望累积折扣奖励。我们提出一种新的原始-对偶方法,其新颖之处在于集成了三个要素:熵正则化的策略优化器、对偶变量正则化器,以及 Nesterov 加速梯度下降对偶优化器,三者对实现更快的收敛都至关重要。我们刻画了所提方法的有限时间误差界。尽管目标非凹且约束亦非凹,所提方法被证明能收敛到全局最优,其关于最优性差距和约束违反量的复杂度为 $\tilde{\mathcal O}(1/\epsilon)$,比现有原始-对偶方法 \citep{ding2020natural,paternain2019constrained} 的复杂度改进了 $\mathcal O(1/\epsilon)$ 倍。这首次表明非凹 CMDP 问题可以达到凸约束下凸优化的复杂度下界 $\mathcal O(1/\epsilon)$。我们的原始-对偶方法和非渐近分析对所用的 RL 优化器不敏感,因而在实际应用中更加灵活。更一般地,通过利用梯度占优条件等几何性质,我们的方法也是第一个可证明地加速零对偶间隙的约束非凸优化的算法,而现有针对约束凸优化的加速方法在此并不适用。

ML-123-标题: Computational Graph Completion

链接: https://arxiv.org/abs/2110.10323
作者: Houman Owhadi
备注: 31 pages

点击查看摘要

Abstract: We introduce a framework for generating, organizing, and reasoning with computational knowledge. It is motivated by the observation that most problems in Computational Sciences and Engineering (CSE) can be described as that of completing (from data) a computational graph representing dependencies between functions and variables. Functions and variables may be known, unknown, or random. Data comes in the form of observations of distinct values of a finite number of subsets of the variables of the graph. The underlying problem combines a regression problem (approximating unknown functions) with a matrix completion problem (recovering unobserved variables in the data). Replacing unknown functions by Gaussian Processes (GPs) and conditioning on observed data provides a simple but efficient approach to completing such graphs. Since the proposed framework is highly expressive, it has a vast potential application scope. Since the completion process can be automatized, as one solves 2+3\sqrt{\sqrt{2}+\sqrt{3}} on a pocket calculator without thinking about it, one could, with the proposed framework, solve a complex CSE problem by drawing a diagram. Compared to traditional kriging, the proposed framework can be used to recover unknown functions with much scarcer data by exploiting interdependencies between multiple functions and variables. The Computational Graph Completion (CGC) problem addressed by the proposed framework could therefore also be interpreted as a generalization of that of solving linear systems of equations to that of approximating unknown variables and functions with noisy, incomplete, and nonlinear dependencies. Numerous examples illustrate the flexibility, scope, efficacy, and robustness of the CGC framework and show how it can be used as a pathway to identifying simple solutions to classical CSE problems (digital twin modeling, dimension reduction, mode decomposition, etc.).

摘要:我们引入一个用计算知识进行生成、组织与推理的框架。其动机源于这样一个观察:计算科学与工程(CSE)中的大多数问题都可以描述为(从数据)补全一张表示函数与变量之间依赖关系的计算图。函数和变量可以是已知的、未知的或随机的。数据则以图中变量的有限个子集的若干组取值观测的形式出现。底层问题把回归问题(逼近未知函数)与矩阵补全问题(恢复数据中未被观测的变量)结合在一起。用高斯过程(GP)替换未知函数并对观测数据做条件化,为补全这类图提供了一种简单而高效的方法。由于所提框架表达力很强,它具有广阔的潜在应用范围。由于补全过程可以自动化——就像在袖珍计算器上不假思索地求 $\sqrt{\sqrt{2}+\sqrt{3}}$ 一样——在该框架下只需画一张图就能求解复杂的 CSE 问题。与传统克里金(kriging)相比,所提框架通过利用多个函数和变量之间的相互依赖关系,可以用少得多的数据恢复未知函数。因此,该框架所处理的计算图补全(CGC)问题也可以被解释为线性方程组求解的推广:在含噪、不完整且非线性的依赖关系下逼近未知变量和函数。大量例子展示了 CGC 框架的灵活性、适用范围、有效性和稳健性,并说明了如何以它为途径为经典 CSE 问题(数字孪生建模、降维、模态分解等)找到简单的解法。
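该框架的基本操作——把未知函数换成高斯过程并对观测数据做条件化——可以用几行 NumPy 勾勒。以下 RBF 核、长度尺度与示例函数均为假设,仅演示对计算图中一条未知边 y = f(x) 的补全:

```python
import numpy as np

def rbf(a, b, ell=0.3):
    """RBF 核矩阵(长度尺度 ell 为假设值)。"""
    d = np.subtract.outer(a, b)
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_mean(xq, X, y, noise=1e-6):
    """GP 后验均值:K(x*, X) (K(X, X) + σ²I)^{-1} y。"""
    K = rbf(X, X) + noise * np.eye(len(X))
    return rbf(np.atleast_1d(xq), X) @ np.linalg.solve(K, y)

# 某条未知边 y = f(x) 的观测样本(此处以 f = sin(2πx) 为例)
X = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * X)

pred = gp_mean(0.25, X, y)[0]   # 对 f(0.25) 的补全值
```

论文把这一操作系统化地用于整张计算图的所有未知函数和变量。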

ML-124-标题: Joint Gaussian Graphical Model Estimation: A Survey

链接: https://arxiv.org/abs/2110.10281
作者: Katherine Tsai, Oluwasanmi Koyejo, Mladen Kolar
备注:

点击查看摘要

Abstract: Graphs from complex systems often share a partial underlying structure across domains while retaining individual features. Thus, identifying common structures can shed light on the underlying signal, for instance, when applied to scientific discoveries or clinical diagnoses. Furthermore, growing evidence shows that the shared structure across domains boosts the estimation power of graphs, particularly for high-dimensional data. However, building a joint estimator to extract the common structure may be more complicated than it seems, most often due to data heterogeneity across sources. This manuscript surveys recent work on statistical inference of joint Gaussian graphical models, identifying model structures that fit various data generation processes. Simulations under different data generation processes are implemented with detailed discussions on the choice of models.

摘要:来自复杂系统的图往往在不同域之间共享部分底层结构,同时保留各自的个体特征。因此,识别共同结构可以揭示底层信号,例如在科学发现或临床诊断中的应用。此外,越来越多的证据表明,跨域共享结构能提升图的估计能力,对高维数据尤其如此。然而,构建联合估计器来提取共同结构可能比看上去更复杂,最常见的原因是各数据源之间的异质性。本文综述了联合高斯图模型统计推断的最新工作,归纳了适配各种数据生成过程的模型结构。我们在不同数据生成过程下进行了模拟,并对模型选择作了详细讨论。

ML-125-标题: Factorization Approach for Low-complexity Matrix Completion Problems: Exponential Number of Spurious Solutions and Failure of Gradient Methods

链接: https://arxiv.org/abs/2110.10279
作者: Baturalp Yalcin, Haixiang Zhang, Javad Lavaei, Somayeh Sojoudi
备注: 21 pages, 1 figure

点击查看摘要

Abstract: It is well-known that the Burer-Monteiro (B-M) factorization approach can efficiently solve low-rank matrix optimization problems under the RIP condition. It is natural to ask whether B-M factorization-based methods can succeed on any low-rank matrix optimization problems with a low information-theoretic complexity, i.e., polynomial-time solvable problems that have a unique solution. In this work, we provide a negative answer to the above question. We investigate the landscape of B-M factorized polynomial-time solvable matrix completion (MC) problems, which are the most popular subclass of low-rank matrix optimization problems without the RIP condition. We construct an instance of polynomial-time solvable MC problems with exponentially many spurious local minima, which leads to the failure of most gradient-based methods. Based on those results, we define a new complexity metric that potentially measures the solvability of low-rank matrix optimization problems based on the B-M factorization approach. In addition, we show that more measurements of the ground truth matrix can deteriorate the landscape, which further reveals the unfavorable behavior of the B-M factorization on general low-rank matrix optimization problems.

摘要:众所周知,在 RIP 条件下,Burer-Monteiro(B-M)分解方法可以高效地求解低秩矩阵优化问题。一个自然的问题是:对于信息论复杂度较低的低秩矩阵优化问题——即多项式时间可解且解唯一的问题——基于 B-M 分解的方法是否都能成功?在这项工作中,我们对上述问题给出了否定回答。我们研究了 B-M 分解下多项式时间可解的矩阵补全(MC)问题的优化景观,这是不满足 RIP 条件的低秩矩阵优化问题中最常见的子类。我们构造了一个多项式时间可解的 MC 问题实例,它具有指数多个虚假局部极小值,导致大多数基于梯度的方法失败。基于这些结果,我们定义了一种新的复杂度度量,用于衡量基于 B-M 分解方法的低秩矩阵优化问题的可解性。此外,我们证明对 ground truth 矩阵的更多观测反而可能恶化优化景观,这进一步揭示了 B-M 分解在一般低秩矩阵优化问题上的不利行为。
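下面是 B-M 分解求解矩阵补全的一个极简梯度下降示意(秩 1、随机掩码;步长、迭代数与观测比例均为假设)。它演示的是该参数化本身在良性实例上的行为,而非论文构造的带指数多个虚假极小值的反例:

```python
import numpy as np

rng = np.random.default_rng(3)

# 秩 1 的 ground truth(单位范数因子),随机观测约七成元素
n = 20
u = rng.normal(size=(n, 1))
u /= np.linalg.norm(u)
M = u @ u.T
mask = rng.random((n, n)) < 0.7

# Burer-Monteiro:参数化 X = U V^T,仅对观测位置的平方误差做梯度下降
U = 0.1 * rng.normal(size=(n, 1))
V = 0.1 * rng.normal(size=(n, 1))
lr = 0.2
for _ in range(3000):
    R = mask * (U @ V.T - M)            # 观测位置上的残差
    U, V = U - lr * R @ V, V - lr * R.T @ U

err = np.abs(mask * (U @ V.T - M)).max()
```

在这一随机实例上,观测项的拟合误差收敛到接近零;论文的要点正是:不满足 RIP 时并非所有此类实例都如此。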

ML-126-标题: Patch Based Transformation for Minimum Variance Beamformer Image Approximation Using Delay and Sum Pipeline

链接: https://arxiv.org/abs/2110.10220
作者: Sairoop Bodepudi, A N Madhavanunni, Mahesh Raveendranatha Panicker
备注: 6 pages, 3 figures

点击查看摘要

Abstract: In the recent past, there have been several efforts in accelerating computationally heavy beamforming algorithms such as minimum variance distortionless response (MVDR) beamforming to achieve real-time performance comparable to the popular delay and sum (DAS) beamforming. This has been achieved using a variety of neural network architectures ranging from fully connected neural networks (FCNNs), convolutional neural networks (CNNs) and general adversarial networks (GANs). However most of these approaches are working with optimizations considering image level losses and hence require a significant amount of dataset to ensure that the process of beamforming is learned. In this work, a patch level U-Net based neural network is proposed, where the delay compensated radio frequency (RF) patch for a fixed region in space (e.g. 32x32) is transformed through a U-Net architecture and multiplied with DAS apodization weights and optimized for similarity with MVDR image of the patch. Instead of framing the beamforming problem as a regression problem to estimate the apodization weights, the proposed approach treats the non-linear transformation of the RF data space that can account for the data driven weight adaptation done by the MVDR approach in the parameters of the network. In this way, it is also observed that by restricting the input to a patch the model will learn the beamforming pipeline as an image non-linear transformation problem.

摘要:近年来,为了让最小方差无失真响应(MVDR)等计算量大的波束成形算法达到与流行的延迟求和(DAS)波束成形相当的实时性能,已有多项加速工作。这些工作采用了多种神经网络架构,包括全连接神经网络(FCNN)、卷积神经网络(CNN)和生成对抗网络(GAN)。然而其中大多数方法使用图像级损失进行优化,因此需要大量数据集才能确保学到波束成形过程。在这项工作中,我们提出一种基于图像块(patch)级 U-Net 的神经网络:将空间中固定区域(例如 32x32)经延迟补偿后的射频(RF)块通过 U-Net 架构变换,再与 DAS 切趾(apodization)权重相乘,并以与该块的 MVDR 图像的相似度为目标进行优化。所提方法不是把波束成形问题表述为估计切趾权重的回归问题,而是把 MVDR 方法所做的数据驱动权重自适应纳入网络参数中,作为对 RF 数据空间的非线性变换来处理。通过这种方式还可以观察到,将输入限制为图像块后,模型会把波束成形流水线作为一个图像非线性变换问题来学习。
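作为背景,延迟求和(DAS)波束成形本身可以用几行 NumPy 勾勒:把各通道按已知延迟对齐后加权求和。以下延迟剖面、脉冲形状与均匀 apodization 权重均为假设,仅作示意:

```python
import numpy as np

rng = np.random.default_rng(7)

# 合成数据:8 个阵元接收同一高斯脉冲,各通道有已知的线性延迟
n_ch, n_t = 8, 200
delays = np.arange(n_ch) * 3                       # 假设的延迟剖面(采样点)
pulse = np.exp(-0.5 * ((np.arange(n_t) - 60) / 3.0) ** 2)
channels = np.stack([np.roll(pulse, d) for d in delays])
channels += 0.1 * rng.normal(size=channels.shape)  # 通道噪声

# DAS:按延迟把各通道对齐,再用(此处为均匀的)apodization 权重求和
w = np.ones(n_ch) / n_ch
das = sum(w[i] * np.roll(channels[i], -delays[i]) for i in range(n_ch))
peak = int(np.argmax(das))
```

对齐求和把不相干的通道噪声平均掉,脉冲峰值在原位置被恢复;论文学习的是替代 MVDR 自适应加权的非线性变换。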

ML-127-标题: Long Random Matrices and Tensor Unfolding

链接: https://arxiv.org/abs/2110.10210
作者: Gérard Ben Arous, Daniel Zhengyu Huang, Jiaoyang Huang
备注: 29 pages, 4 figures

点击查看摘要

Abstract: In this paper, we consider the singular values and singular vectors of low rank perturbations of large rectangular random matrices, in the regime the matrix is “long”: we allow the number of rows (columns) to grow polynomially in the number of columns (rows). We prove there exists a critical signal-to-noise ratio (depending on the dimensions of the matrix), and the extreme singular values and singular vectors exhibit a BBP type phase transition. As a main application, we investigate the tensor unfolding algorithm for the asymmetric rank-one spiked tensor model, and obtain an exact threshold, which is independent of the procedure of tensor unfolding. If the signal-to-noise ratio is above the threshold, tensor unfolding detects the signals; otherwise, it fails to capture the signals.

摘要:本文研究大型长方形随机矩阵的低秩扰动的奇异值和奇异向量,所考虑的机制是矩阵“很长”:我们允许行数(列数)随列数(行数)多项式增长。我们证明存在一个临界信噪比(取决于矩阵的维度),并且极端奇异值和奇异向量呈现 BBP 型相变。作为主要应用,我们研究了非对称秩一尖峰(spiked)张量模型的张量展开(tensor unfolding)算法,并得到一个精确的阈值,该阈值与张量展开的具体方式无关。当信噪比高于该阈值时,张量展开能检测到信号;否则无法捕获信号。
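张量展开算法的骨架很短:把三阶张量按第一个模式展开成“长”矩阵,再取首个左奇异向量作为信号方向的估计。以下维度与信噪比 beta 均为假设,且刻意取在检测阈值之上:

```python
import numpy as np

rng = np.random.default_rng(4)

# 非对称秩一尖峰张量:T = beta * a⊗b⊗c + 高斯噪声
n1, n2, n3, beta = 30, 20, 20, 80.0
a, b, c = (rng.normal(size=k) for k in (n1, n2, n3))
a, b, c = a / np.linalg.norm(a), b / np.linalg.norm(b), c / np.linalg.norm(c)
T = beta * np.einsum('i,j,k->ijk', a, b, c) + rng.normal(size=(n1, n2, n3))

# 展开成 n1 x (n2*n3) 的“长”矩阵;beta 超过阈值时,
# 首个左奇异向量与信号方向 a 高度相关
M = T.reshape(n1, n2 * n3)
a_hat = np.linalg.svd(M, full_matrices=False)[0][:, 0]
overlap = abs(a_hat @ a)
```

把 beta 降到阈值以下重跑,overlap 会塌缩到接近随机水平,这正是论文刻画的相变。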

ML-128-标题: Barriers and Dynamical Paths in Alternating Gibbs Sampling of Restricted Boltzmann Machines

链接: https://arxiv.org/abs/2107.06013
作者: Clément Roussel, Simona Cocco, Rémi Monasson
备注:

点击查看摘要

Abstract: Restricted Boltzmann Machines (RBM) are bi-layer neural networks used for the unsupervised learning of model distributions from data. The bipartite architecture of RBM naturally defines an elegant sampling procedure, called Alternating Gibbs Sampling (AGS), where the configurations of the latent-variable layer are sampled conditional to the data-variable layer, and vice versa. We study here the performance of AGS on several analytically tractable models borrowed from statistical mechanics. We show that standard AGS is not more efficient than classical Metropolis-Hastings (MH) sampling of the effective energy landscape defined on the data layer. However, RBM can identify meaningful representations of training data in their latent space. Furthermore, using these representations and combining Gibbs sampling with the MH algorithm in the latent space can enhance the sampling performance of the RBM when the hidden units encode weakly dependent features of the data. We illustrate our findings on three datasets: Bars and Stripes and MNIST, well known in machine learning, and the so-called Lattice Proteins, introduced in theoretical biology to study the sequence-to-structure mapping in proteins.

摘要:受限玻尔兹曼机(RBM)是用于从数据中无监督学习模型分布的双层神经网络。RBM 的二部图结构自然地定义了一个优雅的采样过程,称为交替 Gibbs 采样(AGS):以数据变量层为条件对潜变量层采样,反之亦然。我们在若干借自统计力学的可解析模型上研究了 AGS 的表现。我们表明,标准 AGS 并不比对数据层上定义的有效能量景观做经典 Metropolis-Hastings(MH)采样更高效。不过,RBM 能在其潜空间中识别训练数据的有意义表示。此外,当隐单元编码数据的弱相关特征时,利用这些表示并在潜空间中将 Gibbs 采样与 MH 算法结合,可以提升 RBM 的采样性能。我们在三个数据集上展示了这些发现:机器学习中广为人知的 Bars and Stripes 与 MNIST,以及理论生物学中为研究蛋白质序列到结构映射而引入的所谓晶格蛋白(Lattice Proteins)。
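交替 Gibbs 采样(AGS)本身只有几行:固定可见层采隐层,再固定隐层采可见层。下面在一个权重为手工假设值的玩具 RBM 上演示(并非论文中训练得到的模型):

```python
import numpy as np

rng = np.random.default_rng(5)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 玩具 RBM(权重为假设值):隐单元 1 偏好 v=(1,1),隐单元 2 偏好 v=(0,0)
W = np.array([[4.0, -4.0],
              [4.0, -4.0]])
bv, bh = np.zeros(2), np.zeros(2)

def ags_step(v):
    """AGS 一步:先在给定 v 下采样 h,再在给定 h 下采样 v。"""
    h = (rng.random(2) < sigmoid(v @ W + bh)).astype(float)
    v = (rng.random(2) < sigmoid(h @ W.T + bv)).astype(float)
    return v

v, samples = np.ones(2), []
for _ in range(5000):
    v = ags_step(v)
    samples.append(v.copy())
samples = np.array(samples)
agree = (samples[:, 0] == samples[:, 1]).mean()   # 停留在两个模式上的比例
```

链的大部分时间停留在 (1,1) 与 (0,0) 两个模式上;论文研究的正是这种链在模式之间混合的效率。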

计算机视觉

CV-0-标题: Convex Joint Graph Matching and Clustering via Semidefinite Relaxations

链接: https://arxiv.org/abs/2110.11335
作者: Maximilian Krahn, Florian Bernard, Vladislav Golyanik
备注: 12 pages, 8 figures; source code available; project webpage: this https URL

点击查看摘要

Abstract: This paper proposes a new algorithm for simultaneous graph matching and clustering. For the first time in the literature, these two problems are solved jointly and synergetically without relying on any training data, which brings advantages for identifying similar arbitrary objects in compound 3D scenes and matching them. For joint reasoning, we first rephrase graph matching as a rigid point set registration problem operating on spectral graph embeddings. Consequently, we utilise efficient convex semidefinite program relaxations for aligning points in Hilbert spaces and add coupling constraints to model the mutual dependency and exploit synergies between both tasks. We outperform state of the art in challenging cases with non-perfectly matching and noisy graphs, and we show successful applications on real compound scenes with multiple 3D elements. Our source code and data are publicly available.

摘要:本文提出一种同时进行图匹配与聚类的新算法。在文献中,这两个问题首次在不依赖任何训练数据的情况下被联合、协同地求解,这为在复合 3D 场景中识别相似的任意物体并进行匹配带来了优势。为了进行联合推理,我们首先把图匹配重新表述为作用于谱图嵌入上的刚性点集配准问题。随后,我们利用高效的凸半定规划松弛来对齐希尔伯特空间中的点,并加入耦合约束来建模两个任务之间的相互依赖并利用其协同效应。在图不完全匹配且含噪声的困难情形下,我们超越了现有最佳方法,并在包含多个 3D 元素的真实复合场景上展示了成功的应用。我们的源代码和数据均已公开。

CV-1-标题: Generalized Out-of-Distribution Detection: A Survey

链接: https://arxiv.org/abs/2110.11334
作者: Jingkang Yang, Kaiyang Zhou, Yixuan Li, Ziwei Liu
备注: Issues, comments, and questions are all welcomed in this https URL

点击查看摘要

Abstract: Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over the control to humans when it detects unusual scenes or objects that it has never seen before and cannot make a safe decision. This problem first emerged in 2017 and since then has received increasing attention from the research community, leading to a plethora of methods developed, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems are closely related to OOD detection in terms of motivation and methodology. These include anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). Despite having different definitions and problem settings, these problems often confuse readers and practitioners, and as a result, some existing studies misuse terms. In this survey, we first present a generic framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. Then, we conduct a thorough review of each of the five areas by summarizing their recent technical developments. We conclude this survey with open challenges and potential research directions.

摘要:分布外(OOD)检测对确保机器学习系统的可靠性和安全性至关重要。例如,在自动驾驶中,当驾驶系统检测到从未见过、无法据此做出安全决策的异常场景或物体时,我们希望它发出警报并将控制权交还给人类。这一问题于 2017 年首次被提出,此后受到研究界越来越多的关注,催生了大量方法,从基于分类的、基于密度的到基于距离的不一而足。与此同时,另有若干问题在动机和方法上与 OOD 检测密切相关,包括异常检测(AD)、新颖性检测(ND)、开放集识别(OSR)和离群点检测(OD)。尽管定义和问题设置各不相同,这些问题常常使读者和从业者感到困惑,以致一些现有研究误用了术语。在本综述中,我们首先提出一个称为广义 OOD 检测的统一框架,它涵盖上述五个问题,即 AD、ND、OSR、OOD 检测和 OD。在该框架下,这五个问题可以被视为特例或子任务,且更容易区分。随后,我们通过总结各领域的最新技术进展,对这五个领域逐一进行了全面回顾。最后,我们以开放的挑战和潜在的研究方向结束本综述。

CV-2-标题: A Fine-Grained Analysis on Distribution Shift

链接: https://arxiv.org/abs/2110.11328
作者: Olivia Wiles, Sven Gowal, Florian Stimberg, Sylvestre Alvise-Rebuffi, Ira Ktena, Krishnamurthy (Dj) Dvijotham, Taylan Cemgil
备注:

点击查看摘要

Abstract: Robustness to distribution shifts is critical for deploying machine learning models in the real world. Despite this necessity, there has been little work in defining the underlying mechanisms that cause these shifts and evaluating the robustness of algorithms across multiple, different distribution shifts. To this end, we introduce a framework that enables fine-grained analysis of various distribution shifts. We provide a holistic analysis of current state-of-the-art methods by evaluating 19 distinct methods grouped into five categories across both synthetic and real-world datasets. Overall, we train more than 85K models. Our experimental framework can be easily extended to include new methods, shifts, and datasets. We find, unlike previous work~\citep{Gulrajani20}, that progress has been made over a standard ERM baseline; in particular, pretraining and augmentations (learned or heuristic) offer large gains in many cases. However, the best methods are not consistent over different datasets and shifts.

摘要:对分布偏移的稳健性对于在现实世界中部署机器学习模型至关重要。尽管如此,在界定导致这些偏移的底层机制、以及在多种不同分布偏移下评估算法稳健性方面的工作仍然很少。为此,我们引入一个能对各种分布偏移进行细粒度分析的框架。我们在合成与真实数据集上评估了分为五大类的 19 种不同方法,对当前最先进方法给出了整体分析。总计我们训练了超过 8.5 万个模型。我们的实验框架可以很容易地扩展,以纳入新的方法、偏移和数据集。与先前的工作~\citep{Gulrajani20} 不同,我们发现相对于标准 ERM 基线已经取得了进展;特别是,预训练和数据增广(学习得到的或启发式的)在许多情况下带来了很大收益。但是,最好的方法在不同的数据集和偏移上并不一致。

CV-3-标题: Learning 3D Semantic Segmentation with only 2D Image Supervision

链接: https://arxiv.org/abs/2110.11325
作者: Kyle Genova, Xiaoqi Yin, Abhijit Kundu, Caroline Pantofaru, Forrester Cole, Avneesh Sud, Brian Brewington, Brian Shucker, Thomas Funkhouser
备注: Accepted to 3DV 2021 (Oral)

点击查看摘要

Abstract: With the recent growth of urban mapping and autonomous driving efforts, there has been an explosion of raw 3D data collected from terrestrial platforms with lidar scanners and color cameras. However, due to high labeling costs, ground-truth 3D semantic segmentation annotations are limited in both quantity and geographic diversity, while also being difficult to transfer across sensors. In contrast, large image collections with ground-truth semantic segmentations are readily available for diverse sets of scenes. In this paper, we investigate how to use only those labeled 2D image collections to supervise training 3D semantic segmentation models. Our approach is to train a 3D model from pseudo-labels derived from 2D semantic image segmentations using multiview fusion. We address several novel issues with this approach, including how to select trusted pseudo-labels, how to sample 3D scenes with rare object categories, and how to decouple input features from 2D images from pseudo-labels during training. The proposed network architecture, 2D3DNet, achieves significantly better performance (+6.2-11.4 mIoU) than baselines during experiments on a new urban dataset with lidar and images captured in 20 cities across 5 continents.

摘要:随着近年来城市测绘与自动驾驶的发展,从搭载激光雷达扫描仪和彩色相机的地面平台采集的原始 3D 数据呈爆炸式增长。然而,由于标注成本高昂,带真值的 3D 语义分割标注在数量和地理多样性上都很有限,而且难以跨传感器迁移。相比之下,带真值语义分割的大型图像集在多种多样的场景中随手可得。在本文中,我们研究如何仅用这些有标注的 2D 图像集来监督 3D 语义分割模型的训练。我们的方法是利用多视图融合,从 2D 语义图像分割导出伪标签,再用伪标签训练 3D 模型。针对这一思路,我们解决了若干新问题,包括如何挑选可信的伪标签、如何对含稀有物体类别的 3D 场景进行采样,以及如何在训练时把来自 2D 图像的输入特征与伪标签解耦。所提出的网络架构 2D3DNet 在一个新的城市数据集(覆盖 5 大洲 20 个城市采集的激光雷达数据和图像)上的实验中,取得了显著优于基线的性能(+6.2-11.4 mIoU)。

CV-4-标题: StyleAlign: Analysis and Applications of Aligned StyleGAN Models

链接: https://arxiv.org/abs/2110.11323
作者: Zongze Wu, Yotam Nitzan, Eli Shechtman, Dani Lischinski
备注: 39 pages, 33 figures

点击查看摘要

Abstract: In this paper, we perform an in-depth study of the properties and applications of aligned generative models. We refer to two models as aligned if they share the same architecture, and one of them (the child) is obtained from the other (the parent) via fine-tuning to another domain, a common practice in transfer learning. Several works already utilize some basic properties of aligned StyleGAN models to perform image-to-image translation. Here, we perform the first detailed exploration of model alignment, also focusing on StyleGAN. First, we empirically analyze aligned models and provide answers to important questions regarding their nature. In particular, we find that the child model’s latent spaces are semantically aligned with those of the parent, inheriting incredibly rich semantics, even for distant data domains such as human faces and churches. Second, equipped with this better understanding, we leverage aligned models to solve a diverse set of tasks. In addition to image translation, we demonstrate fully automatic cross-domain image morphing. We further show that zero-shot vision tasks may be performed in the child domain, while relying exclusively on supervision in the parent domain. We demonstrate qualitatively and quantitatively that our approach yields state-of-the-art results, while requiring only simple fine-tuning and inversion.

摘要:在本文中,我们对对齐生成模型的性质和应用进行了深入研究。如果两个模型共享相同的架构,且其中一个(子模型)由另一个(父模型)经微调迁移到另一个域而得到(迁移学习中的常见做法),我们就称这两个模型是对齐的。已有若干工作利用对齐 StyleGAN 模型的一些基本性质来做图像到图像翻译。在这里,我们对模型对齐进行了首次详细探索,同样聚焦于 StyleGAN。首先,我们对对齐模型做了实证分析,回答了关于其本质的重要问题。特别地,我们发现子模型的潜空间与父模型的潜空间在语义上是对齐的,继承了极为丰富的语义,即使对于人脸和教堂这类相距甚远的数据域也是如此。其次,基于这种更深入的理解,我们利用对齐模型求解一组多样的任务。除图像翻译外,我们还演示了全自动的跨域图像渐变(morphing)。我们进一步表明,零样本视觉任务可以在子域中完成,而仅依赖父域中的监督。我们定性和定量地证明,我们的方法在只需简单微调和反演(inversion)的情况下取得了最先进的结果。

CV-5-标题: Deep Curriculum Learning in Task Space for Multi-Class Based Mammography Diagnosis

链接: https://arxiv.org/abs/2110.11320
作者: Jun Luo, Dooman Arefan, Margarita Zuley, Jules Sumkin, Shandong Wu
备注: 4-page abstract. Full paper to appear at SPIE Medical Imaging 2022

点击查看摘要

Abstract: Mammography is used as a standard screening procedure for the potential patients of breast cancer. Over the past decade, it has been shown that deep learning techniques have succeeded in reaching near-human performance in a number of tasks, and its application in mammography is one of the topics that medical researchers most concentrate on. In this work, we propose an end-to-end Curriculum Learning (CL) strategy in task space for classifying the three categories of Full-Field Digital Mammography (FFDM), namely Malignant, Negative, and False recall. Specifically, our method treats this three-class classification as a “harder” task in terms of CL, and create an “easier” sub-task of classifying False recall against the combined group of Negative and Malignant. We introduce a loss scheduler to dynamically weight the contribution of the losses from the two tasks throughout the entire training process. We conduct experiments on an FFDM datasets of 1,709 images using 5-fold cross validation. The results show that our curriculum learning strategy can boost the performance for classifying the three categories of FFDM compared to the baseline strategies for model training.

摘要:乳腺 X 线摄影(mammography)是乳腺癌潜在患者的标准筛查手段。过去十年的研究表明,深度学习技术已在许多任务中达到接近人类的表现,而其在乳腺 X 线摄影中的应用是医学研究者最为关注的课题之一。在这项工作中,我们提出一种任务空间中端到端的课程学习(CL)策略,用于对全视野数字乳腺 X 线摄影(FFDM)的三个类别——恶性、阴性和假召回——进行分类。具体而言,我们的方法把这一三分类视为 CL 意义下的“较难”任务,并构造一个“较易”的子任务:把假召回与阴性和恶性的合并组区分开。我们引入一个损失调度器,在整个训练过程中动态地加权两个任务损失的贡献。我们采用 5 折交叉验证,在包含 1,709 幅图像的 FFDM 数据集上进行了实验。结果表明,与模型训练的基线策略相比,我们的课程学习策略可以提升 FFDM 三分类的性能。

CV-6-标题: CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP

链接: https://arxiv.org/abs/2110.11316
作者: Andreas Fürst, Elisabeth Rumetshofer, Viet Tran, Hubert Ramsauer, Fei Tang, Johannes Lehner, David Kreil, Michael Kopp, Günter Klambauer, Angela Bitto-Nemling, Sepp Hochreiter
备注: 14 pages (+ appendix); Blog: this https URL GitHub: this https URL

点击查看摘要

Abstract: Contrastive learning with the InfoNCE objective is exceptionally successful in various self-supervised learning tasks. Recently, the CLIP model yielded impressive results on zero-shot transfer learning when using InfoNCE for learning visual representations from natural language supervision. However, InfoNCE as a lower bound on the mutual information has been shown to perform poorly for high mutual information. In contrast, the InfoLOOB upper bound (leave one out bound) works well for high mutual information but suffers from large variance and instabilities. We introduce “Contrastive Leave One Out Boost” (CLOOB), where modern Hopfield networks boost learning with the InfoLOOB objective. Modern Hopfield networks replace the original embeddings by retrieved embeddings in the InfoLOOB objective. The retrieved embeddings give InfoLOOB two assets. Firstly, the retrieved embeddings stabilize InfoLOOB, since they are less noisy and more similar to one another than the original embeddings. Secondly, they are enriched by correlations, since the covariance structure of embeddings is reinforced through retrievals. We compare CLOOB to CLIP after learning on the Conceptual Captions and the YFCC dataset with respect to their zero-shot transfer learning performance on other datasets. CLOOB consistently outperforms CLIP at zero-shot transfer learning across all considered architectures and datasets.

摘要:采用 InfoNCE 目标的对比学习在各种自监督学习任务中都非常成功。最近,CLIP 模型在用 InfoNCE 从自然语言监督中学习视觉表示时,在零样本迁移学习上取得了令人印象深刻的结果。然而,InfoNCE 作为互信息的下界,已被证明在互信息较高时表现不佳。相比之下,InfoLOOB 上界(leave one out bound)在高互信息下表现良好,但存在方差大、不稳定的问题。我们提出 "Contrastive Leave One Out Boost"(CLOOB),用现代 Hopfield 网络来增强 InfoLOOB 目标下的学习。现代 Hopfield 网络把 InfoLOOB 目标中的原始嵌入替换为检索得到的嵌入。检索嵌入给 InfoLOOB 带来两点好处:其一,检索嵌入比原始嵌入噪声更小、彼此更相似,从而使 InfoLOOB 更稳定;其二,由于检索强化了嵌入的协方差结构,它们富含相关性信息。我们在 Conceptual Captions 和 YFCC 数据集上训练后,比较了 CLOOB 与 CLIP 在其他数据集上的零样本迁移学习性能。在所有考虑的架构和数据集上,CLOOB 在零样本迁移学习上都一致优于 CLIP。
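作为参照,下面是 InfoNCE 目标的一个极简 NumPy 实现(批大小、温度等超参数为假设)。CLOOB 将此目标替换为 InfoLOOB,并把嵌入换成现代 Hopfield 网络检索后的版本,此处不做实现:

```python
import numpy as np

rng = np.random.default_rng(6)

def info_nce(U, V, tau=0.07):
    """InfoNCE:U[i] 与 V[i] 为正样本对,同批其余样本为负样本。"""
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    logits = U @ V.T / tau
    logits -= logits.max(axis=1, keepdims=True)          # 数值稳定
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(np.diag(p)).mean()

U = rng.normal(size=(8, 4))
loss_matched = info_nce(U, U + 0.01 * rng.normal(size=(8, 4)))  # 配对嵌入
loss_random = info_nce(U, rng.normal(size=(8, 4)))              # 随机嵌入
```

配对的两组嵌入损失远低于随机配对,体现了该目标"拉近正对、推开负对"的作用。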

CV-7-标题: Center Loss Regularization for Continual Learning

链接: https://arxiv.org/abs/2110.11314
作者: Kaustubh Olpadkar, Ekta Gavas
备注: 16 pages, 9 figures, Submitted to the ICLR 2022 conference

点击查看摘要

Abstract: The ability to learn different tasks sequentially is essential to the development of artificial intelligence. In general, neural networks lack this capability, the major obstacle being catastrophic forgetting. It occurs when the incrementally available information from non-stationary data distributions is continually acquired, disrupting what the model has already learned. Our approach remembers old tasks by projecting the representations of new tasks close to that of old tasks while keeping the decision boundaries unchanged. We employ the center loss as a regularization penalty that enforces new tasks’ features to have the same class centers as old tasks and makes the features highly discriminative. This, in turn, leads to the least forgetting of already learned information. This method is easy to implement, requires minimal computational and memory overhead, and allows the neural network to maintain high performance across many sequentially encountered tasks. We also demonstrate that using the center loss in conjunction with the memory replay outperforms other replay-based strategies. Along with standard MNIST variants for continual learning, we apply our method to continual domain adaptation scenarios with the Digits and PACS datasets. We demonstrate that our approach is scalable, effective, and gives competitive performance compared to state-of-the-art continual learning methods.

摘要:顺序学习不同任务的能力对人工智能的发展至关重要。一般来说,神经网络缺乏这种能力,其主要障碍是灾难性遗忘:当不断吸收来自非平稳数据分布的增量信息时,模型已经学到的内容会被破坏。我们的方法通过把新任务的表示投影到接近旧任务表示的位置、同时保持决策边界不变,来记住旧任务。我们把中心损失(center loss)用作正则化惩罚,强制新任务的特征使用与旧任务相同的类中心,并使特征具有高判别性,进而把对已学信息的遗忘降到最低。该方法易于实现,计算与内存开销极小,使神经网络能在许多顺序遇到的任务上保持高性能。我们还证明,将中心损失与记忆回放结合使用优于其他基于回放的策略。除了用于持续学习的标准 MNIST 变体外,我们还将该方法应用于 Digits 和 PACS 数据集上的持续域适应场景。我们证明,与最先进的持续学习方法相比,我们的方法可扩展、有效,并具有竞争力的性能。
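中心损失作为正则项的计算本身很简单:特征到其类别中心的平均平方距离。以下为一个与具体模型无关的极简示意(特征、类中心等数值均为虚构,仅演示该正则项如何加到主任务损失上):

```python
import numpy as np

def center_loss(features, labels, centers):
    """中心损失:0.5 × 每个特征到其类别中心的平方距离的均值。"""
    diffs = features - centers[labels]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()

# 两个类中心;第三个样本远离其类中心,因而贡献了大部分损失
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
feats = np.array([[0.1, -0.1], [9.8, 10.2], [5.0, 5.0]])
labels = np.array([0, 1, 0])
loss = center_loss(feats, labels, centers)
```

持续学习时,把它以某个权重加到新任务的分类损失上,即可约束新特征靠向旧任务已有的类中心。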

CV-8-标题: Video and Text Matching with Conditioned Embeddings

链接: https://arxiv.org/abs/2110.11298
作者: Ameen Ali, Idan Schwartz, Tamir Hazan, Lior Wolf
备注:

点击查看摘要

Abstract: We present a method for matching a text sentence from a given corpus to a given video clip and vice versa. Traditionally video and text matching is done by learning a shared embedding space and the encoding of one modality is independent of the other. In this work, we encode the dataset data in a way that takes into account the query’s relevant information. The power of the method is demonstrated to arise from pooling the interaction data between words and frames. Since the encoding of the video clip depends on the sentence compared to it, the representation needs to be recomputed for each potential match. To this end, we propose an efficient shallow neural network. Its training employs a hierarchical triplet loss that is extendable to paragraph/video matching. The method is simple, provides explainability, and achieves state-of-the-art results for both sentence-clip and video-text by a sizable margin across five different datasets: ActivityNet, DiDeMo, YouCook2, MSR-VTT, and LSMDC. We also show that our conditioned representation can be transferred to video-guided machine translation, where we improved the current results on VATEX. Source code is available at this https URL.

摘要:我们提出了一种将给定语料库中的文本句子与给定视频片段相互匹配的方法。传统上,视频与文本的匹配通过学习共享嵌入空间来完成,且一种模态的编码独立于另一种模态。在这项工作中,我们在对数据集数据进行编码时考虑了查询的相关信息。该方法的能力来自于对词与帧之间交互数据的汇聚。由于视频片段的编码取决于与之比较的句子,因此需要为每个潜在匹配重新计算表示。为此,我们提出了一个高效的浅层神经网络,其训练采用可扩展到段落/视频匹配的层次化三元组损失。该方法简单、具有可解释性,并在ActivityNet、DiDeMo、YouCook2、MSR-VTT和LSMDC五个不同数据集上以明显优势取得了句子-片段与视频-文本匹配的最先进结果。我们还表明,我们的条件化表示可以迁移到视频引导的机器翻译任务,并改进了VATEX上的现有结果。源代码可在此https URL获取。

CV-9-标题: An Empirical Study on GANs with Margin Cosine Loss and Relativistic Discriminator

链接: https://arxiv.org/abs/2110.11293
作者: Cuong V. Nguyen, Tien-Dung Cao, Tram Truong-Huu, Khanh N. Pham, Binh T. Nguyen
备注: 16 pages, 5 figures

点击查看摘要

Abstract: Generative Adversarial Networks (GANs) have emerged as useful generative models, which are capable of implicitly learning data distributions of arbitrarily complex dimensions. However, the training of GANs is empirically well-known for being highly unstable and sensitive. The loss functions of both the discriminator and generator concerning their parameters tend to oscillate wildly during training. Different loss functions have been proposed to stabilize the training and improve the quality of images generated. In this paper, we perform an empirical study on the impact of several loss functions on the performance of standard GAN models, Deep Convolutional Generative Adversarial Networks (DCGANs). We introduce a new improvement that employs a relativistic discriminator to replace the classical deterministic discriminator in DCGANs and implement a margin cosine loss function for both the generator and discriminator. This results in a novel loss function, namely Relativistic Margin Cosine Loss (RMCosGAN). We carry out extensive experiments with four datasets: CIFAR-10, MNIST, STL-10, and CAT. We compare RMCosGAN performance with existing loss functions based on two metrics: Frechet inception distance and inception score. The experimental results show that RMCosGAN outperforms the existing ones and significantly improves the quality of images generated.

摘要:生成对抗网络(GAN)已成为有用的生成模型,能够隐式学习任意复杂维度的数据分布。然而,GAN的训练以高度不稳定和敏感而著称:判别器和生成器关于其参数的损失函数在训练期间往往剧烈振荡。人们提出了不同的损失函数来稳定训练并提高生成图像的质量。在本文中,我们对若干损失函数在标准GAN模型(深度卷积生成对抗网络,DCGAN)上的性能影响进行了实证研究。我们引入一种新的改进,采用相对论判别器取代DCGAN中的经典确定性判别器,并为生成器和判别器实现了边际余弦损失函数,从而得到一种新的损失函数,即相对论边际余弦损失(RMCosGAN)。我们在CIFAR-10、MNIST、STL-10和CAT四个数据集上进行了广泛实验,并基于Frechet inception distance和inception score两个指标将RMCosGAN与现有损失函数进行比较。实验结果表明,RMCosGAN优于现有损失函数,并显著提高了生成图像的质量。
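下面用 NumPy 勾勒"相对论判别器 + 边际"这一组合的判别器损失的一种可能形式(缩放系数 s、边际 m 的取值与施加位置均为假设,并非论文给出的精确公式):

```python
import numpy as np

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def relativistic_margin_d_loss(d_real, d_fake, s=10.0, m=0.35):
    """Discriminator loss sketch: each real logit is compared against the
    average fake logit (relativistic average); a margin m then shifts the
    logits so the discriminator's task becomes harder."""
    rel_real = d_real - d_fake.mean()   # how much "more real" than the avg fake
    rel_fake = d_fake - d_real.mean()
    return softplus(-s * (rel_real - m)).mean() + softplus(s * (rel_fake + m)).mean()

# a discriminator that separates real from fake well should incur a smaller loss
good = relativistic_margin_d_loss(np.array([3.0, 2.5]), np.array([-3.0, -2.5]))
bad = relativistic_margin_d_loss(np.array([-3.0, -2.5]), np.array([3.0, 2.5]))
```

生成器侧的损失可按同样思路对调真/假两项的符号;此处只演示判别器侧的数值行为。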

CV-10-标题: Multi-Object Tracking and Segmentation with a Space-Time Memory Network

链接: https://arxiv.org/abs/2110.11284
作者: Mehdi Miah, Guillaume-Alexandre Bilodeau, Nicolas Saunier
备注: arXiv admin note: text overlap with arXiv:2107.07067

点击查看摘要

Abstract: We propose a method for multi-object tracking and segmentation that does not require fine-tuning or per benchmark hyper-parameter selection. The proposed tracker, MeNToS, addresses particularly the data association problem. Indeed, the recently introduced HOTA metric, which has a better alignment with the human visual assessment by evenly balancing detections and associations quality, has shown that improvements are still needed for data association. After creating tracklets using instance segmentation and optical flow, the proposed method relies on a space-time memory network developed for one-shot video object segmentation to improve the association of tracklets with temporal gaps. We evaluated our tracker on KITTIMOTS and MOTSChallenge and show the benefit of our data association strategy with the HOTA metric. The project page is \url{this http URL}.

摘要:我们提出了一种无需微调、也无需针对各基准选择超参数的多目标跟踪与分割方法。所提出的跟踪器MeNToS着重解决数据关联问题。事实上,最近引入的HOTA指标通过均衡检测质量和关联质量而与人类视觉评估更好地对齐,其结果表明数据关联仍有待改进。在使用实例分割和光流创建轨迹片段(tracklet)之后,所提出的方法依赖为单样本视频目标分割开发的时空记忆网络,来改善存在时间间隔的轨迹片段之间的关联。我们在KITTIMOTS和MOTSChallenge上评估了我们的跟踪器,并用HOTA指标展示了我们数据关联策略的优势。项目页面见此http URL。

CV-11-标题: The Effect of Wearing a Face Mask on Face Image Quality

链接: https://arxiv.org/abs/2110.11283
作者: Biying Fu, Florian Kirchbuchner, Naser Damer
备注: 8 pages, 6 figures, 16th {IEEE} International Conference on Automatic Face and Gesture Recognition, {FG} 2021

点击查看摘要

Abstract: Due to the COVID-19 situation, face masks have become a main part of our daily life. Wearing mouth-and-nose protection has been made a mandate in many public places, to prevent the spread of the COVID-19 virus. However, face masks affect the performance of face recognition, since a large area of the face is covered. The effect of wearing a face mask on the different components of the face recognition system in a collaborative environment is a problem that is still to be fully studied. This work studies, for the first time, the effect of wearing a face mask on face image quality by utilising state-of-the-art face image quality assessment methods of different natures. This aims at providing better understanding on the effect of face masks on the operation of face recognition as a whole system. In addition, we further studied the effect of simulated masks on face image utility in comparison to real face masks. We discuss the correlation between the mask effect on face image quality and that on the face verification performance by automatic systems and human experts, indicating a consistent trend between both factors. The evaluation is conducted on the database containing (1) no-masked faces, (2) real face masks, and (3) simulated face masks, by synthetically generating digital facial masks on no-masked faces according to the NIST protocols [1, 23]. Finally, a visual interpretation of the face areas contributing to the quality score of a selected set of quality assessment methods is provided to give a deeper insight into the difference of network decisions in masked and non-masked faces, among other variations.

摘要:由于COVID-19疫情,口罩已成为日常生活的重要组成部分。为防止COVID-19病毒传播,许多公共场所强制要求佩戴口鼻防护。然而,由于面部大片区域被遮挡,口罩会影响人脸识别的性能。在协作场景下,佩戴口罩对人脸识别系统各组成部分的影响仍有待充分研究。这项工作首次利用多种不同性质的最先进人脸图像质量评估方法,研究佩戴口罩对人脸图像质量的影响,旨在更好地理解口罩对整个人脸识别系统运行的影响。此外,我们进一步研究了模拟口罩与真实口罩对人脸图像效用影响的差异。我们讨论了口罩对人脸图像质量的影响与其对自动系统和人类专家人脸验证性能的影响之间的相关性,结果表明两个因素呈现一致的趋势。评估在一个包含(1)未戴口罩人脸、(2)真实口罩人脸和(3)模拟口罩人脸的数据库上进行,其中模拟口罩按照NIST协议[1, 23]在未戴口罩人脸上合成生成。最后,我们对贡献于所选质量评估方法质量分数的人脸区域给出可视化解释,以更深入地洞察网络决策在戴口罩与未戴口罩人脸之间的差异及其他变化。

CV-12-标题: Super-resolution of multiphase materials by combining complementary 2D and 3D image data using generative adversarial networks

链接: https://arxiv.org/abs/2110.11281
作者: Amir Dahari, Steve Kench, Isaac Squires, Samuel J. Cooper
备注:

点击查看摘要

Abstract: Modelling the impact of a material’s mesostructure on device level performance typically requires access to 3D image data containing all the relevant information to define the geometry of the simulation domain. This image data must include sufficient contrast between phases to distinguish each material, be of high enough resolution to capture the key details, but also have a large enough field-of-view to be representative of the material in general. It is rarely possible to obtain data with all of these properties from a single imaging technique. In this paper, we present a method for combining information from pairs of distinct but complementary imaging techniques in order to accurately reconstruct the desired multi-phase, high resolution, representative, 3D images. Specifically, we use deep convolutional generative adversarial networks to implement super-resolution, style transfer and dimensionality expansion. To demonstrate the widespread applicability of this tool, two pairs of datasets are used to validate the quality of the volumes generated by fusing the information from paired imaging techniques. Three key mesostructural metrics are calculated in each case to show the accuracy of this method. Having confidence in the accuracy of our method, we then demonstrate its power by applying to a real data pair from a lithium ion battery electrode, where the required 3D high resolution image data is not available anywhere in the literature. We believe this approach is superior to previously reported statistical material reconstruction methods both in terms of its fidelity and ease of use. Furthermore, much of the data required to train this algorithm already exists in the literature, waiting to be combined. As such, our open-access code could precipitate a step change by generating the hard to obtain high quality image volumes necessary to simulate behaviour at the mesoscale.

摘要:建模材料细观结构(mesostructure)对器件级性能的影响,通常需要访问包含所有相关信息的3D图像数据来定义仿真域的几何形状。该图像数据必须在各相之间具有足够的对比度以区分每种材料,分辨率要足够高以捕捉关键细节,同时视场要足够大以代表材料的总体特征。仅靠单一成像技术很少能获得同时具备所有这些属性的数据。在本文中,我们提出一种方法,将来自成对的、不同但互补的成像技术的信息相结合,以准确重建所需的多相、高分辨率、具有代表性的3D图像。具体而言,我们使用深度卷积生成对抗网络实现超分辨率、风格迁移和维度扩展。为了展示该工具的广泛适用性,我们使用两对数据集验证了融合成对成像技术信息所生成体数据的质量,并在每种情况下计算三个关键细观结构指标以显示该方法的准确性。在确信方法准确性之后,我们将其应用于锂离子电池电极的真实数据对以展示其能力;对于该场景,文献中尚无所需的3D高分辨率图像数据。我们认为,这种方法在保真度和易用性方面都优于此前报道的统计材料重建方法。此外,训练该算法所需的大部分数据已经存在于文献中,等待被组合利用。因此,我们的开源代码可以生成在细观尺度上模拟行为所需的、难以获得的高质量图像体数据,从而带来跨越式的进展。

CV-13-标题: Self-Supervised Monocular Scene Decomposition and Depth Estimation

链接: https://arxiv.org/abs/2110.11275
作者: Sadra Safadoust, Fatma Güney
备注: 3DV 2021

点击查看摘要

Abstract: Self-supervised monocular depth estimation approaches either ignore independently moving objects in the scene or need a separate segmentation step to identify them. We propose MonoDepthSeg to jointly estimate depth and segment moving objects from monocular video without using any ground-truth labels. We decompose the scene into a fixed number of components where each component corresponds to a region on the image with its own transformation matrix representing its motion. We estimate both the mask and the motion of each component efficiently with a shared encoder. We evaluate our method on three driving datasets and show that our model clearly improves depth estimation while decomposing the scene into separately moving components.

摘要:自监督单目深度估计方法要么忽略场景中独立运动的物体,要么需要单独的分割步骤来识别它们。我们提出MonoDepthSeg,在不使用任何真值标签的情况下,从单目视频中联合估计深度并分割运动物体。我们将场景分解为固定数量的组件,每个组件对应图像上的一个区域,并拥有表示其运动的变换矩阵。我们使用共享编码器高效地估计每个组件的掩码和运动。我们在三个驾驶数据集上评估了该方法,结果表明我们的模型在将场景分解为独立运动组件的同时,明显改善了深度估计。

CV-14-标题: MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification

链接: https://arxiv.org/abs/2110.11264
作者: Yajun Gao, Tengfei Liang, Yi Jin, Xiaoyan Gu, Wu Liu, Yidong Li, Congyan Lang
备注:

点击查看摘要

Abstract: The RGB-infrared cross-modality person re-identification (ReID) task aims to recognize the images of the same identity between the visible modality and the infrared modality. Existing methods mainly use a two-stream architecture to eliminate the discrepancy between the two modalities in the final common feature space, which ignore the single space of each modality in the shallow layers. To solve it, in this paper, we present a novel multi-feature space joint optimization (MSO) network, which can learn modality-sharable features in both the single-modality space and the common space. Firstly, based on the observation that edge information is modality-invariant, we propose an edge features enhancement module to enhance the modality-sharable features in each single-modality space. Specifically, we design a perceptual edge features (PEF) loss after the edge fusion strategy analysis. According to our knowledge, this is the first work that proposes explicit optimization in the single-modality feature space on cross-modality ReID task. Moreover, to increase the difference between cross-modality distance and class distance, we introduce a novel cross-modality contrastive-center (CMCC) loss into the modality-joint constraints in the common feature space. The PEF loss and CMCC loss jointly optimize the model in an end-to-end manner, which markedly improves the network’s performance. Extensive experiments demonstrate that the proposed model significantly outperforms state-of-the-art methods on both the SYSU-MM01 and RegDB datasets.

摘要:RGB-红外跨模态行人重识别(ReID)任务旨在识别可见光模态与红外模态之间具有相同身份的图像。现有方法主要使用双流架构在最终的公共特征空间中消除两种模态之间的差异,却忽略了浅层中每种模态自身的单模态空间。为解决这一问题,本文提出一种新颖的多特征空间联合优化(MSO)网络,能够同时在单模态空间和公共空间中学习模态共享特征。首先,基于边缘信息具有模态不变性的观察,我们提出边缘特征增强模块,以增强每个单模态空间中的模态共享特征。具体而言,我们在边缘融合策略分析之后设计了感知边缘特征(PEF)损失。据我们所知,这是首个在跨模态ReID任务上对单模态特征空间进行显式优化的工作。此外,为了增大跨模态距离与类间距离之间的差异,我们在公共特征空间的模态联合约束中引入了一种新颖的跨模态对比中心(CMCC)损失。PEF损失和CMCC损失以端到端方式共同优化模型,显著提高了网络性能。大量实验表明,所提出的模型在SYSU-MM01和RegDB数据集上均显著优于最先进的方法。

CV-15-标题: Multi-Category Mesh Reconstruction From Image Collections

链接: https://arxiv.org/abs/2110.11256
作者: Alessandro Simoni, Stefano Pini, Roberto Vezzani, Rita Cucchiara
备注: Accepted at 3DV 2021

点击查看摘要

Abstract: Recently, learning frameworks have shown the capability of inferring the accurate shape, pose, and texture of an object from a single RGB image. However, current methods are trained on image collections of a single category in order to exploit specific priors, and they often make use of category-specific 3D templates. In this paper, we present an alternative approach that infers the textured mesh of objects combining a series of deformable 3D models and a set of instance-specific deformation, pose, and texture. Differently from previous works, our method is trained with images of multiple object categories using only foreground masks and rough camera poses as supervision. Without specific 3D templates, the framework learns category-level models which are deformed to recover the 3D shape of the depicted object. The instance-specific deformations are predicted independently for each vertex of the learned 3D mesh, enabling the dynamic subdivision of the mesh during the training process. Experiments show that the proposed framework can distinguish between different object categories and learn category-specific shape priors in an unsupervised manner. Predicted shapes are smooth and can leverage from multiple steps of subdivision during the training process, obtaining comparable or state-of-the-art results on two public datasets. Models and code are publicly released.

摘要:最近,学习框架已展现出从单张RGB图像推断物体准确形状、姿态和纹理的能力。然而,当前方法为了利用特定先验而在单一类别的图像集合上训练,且常常依赖特定类别的3D模板。在本文中,我们提出一种替代方法,结合一系列可变形3D模型与一组实例特定的形变、姿态和纹理来推断物体的带纹理网格。与以往工作不同,我们的方法使用多个物体类别的图像训练,仅以前景掩码和粗略相机姿态作为监督。在没有特定3D模板的情况下,该框架学习类别级模型,并对其进行形变以恢复所描绘物体的3D形状。实例特定的形变针对所学3D网格的每个顶点独立预测,从而在训练过程中支持网格的动态细分。实验表明,所提框架能够区分不同的物体类别,并以无监督方式学习特定类别的形状先验。预测的形状平滑,且能在训练过程中受益于多步细分,在两个公开数据集上取得了可比或最先进的结果。模型和代码已公开发布。

CV-16-标题: On the properties of some low-parameter models for color reproduction in terms of spectrum transformations and coverage of a color triangle

链接: https://arxiv.org/abs/2110.11255
作者: Alexey Kroshnin, Viacheslav Vasilev, Egor Ershov, Denis Shepelev, Dmitry Nikolaev, Mikhail Tchobanou
备注: 23 pages, 2 figures

点击查看摘要

Abstract: One of the classical approaches to solving color reproduction problems, such as color adaptation or color space transform, is the use of low-parameter spectral models. The strength of this approach is the ability to choose a set of properties that the model should have, be it a large coverage area of a color triangle, an accurate description of the addition or multiplication of spectra, knowing only the tristimulus corresponding to them. The disadvantage is that some of the properties of the mentioned spectral models are confirmed only experimentally. This work is devoted to the theoretical substantiation of various properties of spectral models. In particular, we prove that the banded model is the only model that simultaneously possesses the properties of closure under addition and multiplication. We also show that the Gaussian model is the limiting case of the von Mises model and prove that the set of protomers of the von Mises model unambiguously covers the color triangle in both the case of convex and non-convex spectral locus.

摘要:解决色彩再现问题(如色彩适应或色彩空间变换)的经典方法之一是使用低参数光谱模型。这种方法的优势在于可以选择模型应具备的一组属性,例如对色三角的大覆盖面积,或在仅已知对应三刺激值的情况下对光谱加法或乘法的准确描述。其缺点是,上述光谱模型的某些属性仅得到了实验验证。这项工作致力于从理论上证明光谱模型的各种属性。特别地,我们证明带状模型是唯一同时具有加法封闭性和乘法封闭性的模型。我们还表明高斯模型是von Mises模型的极限情形,并证明无论在凸还是非凸光谱轨迹的情况下,von Mises模型的原型集合都能无歧义地覆盖色三角。
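摘要中"高斯模型是 von Mises 模型的极限情形"这一结论,可由大集中参数 κ 下的标准展开来示意(仅为示意推导,记号与论文无关):

```latex
% von Mises density with concentration \kappa:
f(x;\mu,\kappa) = \frac{e^{\kappa\cos(x-\mu)}}{2\pi I_0(\kappa)} .
% For large \kappa the mass concentrates near x=\mu, where
\cos(x-\mu) \approx 1 - \tfrac{(x-\mu)^2}{2},
% so that
f(x;\mu,\kappa) \propto e^{-\kappa(x-\mu)^2/2},
% i.e. a Gaussian with variance \sigma^2 = 1/\kappa.
```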

CV-17-标题: One Representative-Shot Learning Using a Population-Driven Template with Application to Brain Connectivity Classification and Evolution Prediction

链接: https://arxiv.org/abs/2110.11238
作者: Umut Guvercin, Mohammed Amine Gharsallaoui, Islem Rekik
备注:

点击查看摘要

Abstract: Few-shot learning presents a challenging paradigm for training discriminative models on a few training samples representing the target classes to discriminate. However, classification methods based on deep learning are ill-suited for such learning as they need large amounts of training data --let alone one-shot learning. Recently, graph neural networks (GNNs) have been introduced to the field of network neuroscience, where the brain connectivity is encoded in a graph. However, with scarce neuroimaging datasets particularly for rare diseases and low-resource clinical facilities, such data-devouring architectures might fail in learning the target task. In this paper, we take a very different approach in training GNNs, where we aim to learn with one sample and achieve the best performance --a formidable challenge to tackle. Specifically, we present the first one-shot paradigm where a GNN is trained on a single population-driven template --namely a connectional brain template (CBT). A CBT is a compact representation of a population of brain graphs capturing the unique connectivity patterns shared across individuals. It is analogous to brain image atlases for neuroimaging datasets. Using a one-representative CBT as a training sample, we alleviate the training load of GNN models while boosting their performance across a variety of classification and regression tasks. We demonstrate that our method significantly outperformed benchmark one-shot learning methods with downstream classification and time-dependent brain graph data forecasting tasks while competing with the train-on-all conventional training strategy. Our source code can be found at this https URL.

摘要:小样本学习为在少量代表目标类别的训练样本上训练判别模型提出了一个具有挑战性的范式。然而,基于深度学习的分类方法需要大量训练数据,不适合这种学习方式,更不用说单样本学习了。最近,图神经网络(GNN)被引入网络神经科学领域,其中大脑连接以图的形式编码。然而,在神经影像数据集稀缺的情况下,特别是对于罕见疾病和资源匮乏的临床机构,这类需要大量数据的架构可能无法学好目标任务。在本文中,我们采用一种截然不同的方式训练GNN:我们的目标是仅用一个样本学习并取得最佳性能,这是一项艰巨的挑战。具体而言,我们提出首个单样本范式,在单个由群体驱动的模板(即连接性脑模板,CBT)上训练GNN。CBT是一组脑图群体的紧凑表示,捕获了个体之间共享的独特连接模式,类似于神经影像数据集中的脑图像图谱。通过使用一个具有代表性的CBT作为训练样本,我们减轻了GNN模型的训练负担,同时提升了它们在各种分类和回归任务上的性能。我们证明,在下游分类和随时间变化的脑图数据预测任务上,我们的方法显著优于基准单样本学习方法,并可与使用全部数据训练的传统策略相竞争。我们的源代码可在此https URL找到。

CV-18-标题: Detection of Driver Drowsiness by Calculating the Speed of Eye Blinking

链接: https://arxiv.org/abs/2110.11223
作者: Muhammad Fawwaz Yusri, Patrick Mangat, Oliver Wasenmüller
备注: This paper has been accepted at the Upper-Rhine Artificial Intelligence Symposium 2021

点击查看摘要

Abstract: Many road accidents are caused by drowsiness of the driver. While there are methods to detect closed eyes, it is a non-trivial task to detect the gradual process of a driver becoming drowsy. We consider a simple real-time detection system for drowsiness merely based on the eye blinking rate derived from the eye aspect ratio. For the eye detection we use HOG and a linear SVM. If the speed of the eye blinking drops below some empirically determined threshold, the system triggers an alarm, hence preventing the driver from falling into microsleep. In this paper, we extensively evaluate the minimal requirements for the proposed system. We find that this system works well if the face is directed to the camera, but it becomes less reliable once the head is tilted significantly. The results of our evaluations provide the foundation for further developments of our drowsiness detection system.

摘要:许多道路事故是由驾驶员的困倦引起的。虽然已有检测闭眼的方法,但检测驾驶员逐渐变得困倦的过程并非易事。我们考虑一个简单的困倦实时检测系统,仅基于由眼睛纵横比(eye aspect ratio)导出的眨眼频率。对于眼睛检测,我们使用HOG和线性SVM。如果眨眼速度降至某个经验确定的阈值以下,系统就会触发警报,从而防止驾驶员陷入微睡眠。在本文中,我们对所提系统的最低要求进行了广泛评估。我们发现,当面部朝向摄像头时该系统工作良好,但当头部明显倾斜时其可靠性下降。我们的评估结果为困倦检测系统的进一步开发奠定了基础。
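摘要所依据的眼睛纵横比(EAR)与眨眼计数可以用几行代码示意(landmark 顺序沿用 Soukupová & Čech 的 p1..p6 约定;阈值 0.2 与手工构造的坐标均为假设):

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR over six eye landmarks p1..p6: sum of the two vertical gaps
    divided by twice the horizontal width; approaches 0 as the eye closes."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

def blinks_per_minute(ear_series, fps, ear_thresh=0.2):
    """Count open->closed transitions below the threshold, scaled to one minute."""
    closed = np.asarray(ear_series) < ear_thresh
    blinks = int(np.sum(closed[1:] & ~closed[:-1]))   # rising edges of 'closed'
    return blinks * 60.0 * fps / len(ear_series)

open_eye = np.array([[0, 0], [1, 1], [3, 1], [4, 0], [3, -1], [1, -1]], float)
closed_eye = np.array([[0, 0], [1, 0.1], [3, 0.1], [4, 0], [3, -0.1], [1, -0.1]], float)
ear_open = eye_aspect_ratio(open_eye)       # wide-open eye -> 0.5
ear_closed = eye_aspect_ratio(closed_eye)   # nearly closed -> 0.05
rate = blinks_per_minute([0.5, 0.5, 0.05, 0.05, 0.5, 0.5, 0.05, 0.5], fps=8)
```

实际系统中,六个关键点由人脸检测(如摘要中的 HOG+线性SVM)后的关键点回归给出;此处直接用手工坐标演示数值行为。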

CV-19-标题: PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image

链接: https://arxiv.org/abs/2110.11219
作者: Yaxu Xie, Fangwen Shu, Jason Rambach, Alain Pagani, Didier Stricker
备注: accepted to BMVC 2021, code opensource: this https URL

点击查看摘要

Abstract: Piece-wise 3D planar reconstruction provides holistic scene understanding of man-made environments, especially for indoor scenarios. Most recent approaches focused on improving the segmentation and reconstruction results by introducing advanced network architectures but overlooked the dual characteristics of piece-wise planes as objects and geometric models. Different from other existing approaches, we start from enforcing cross-task consistency for our multi-task convolutional neural network, PlaneRecNet, which integrates a single-stage instance segmentation network for piece-wise planar segmentation and a depth decoder to reconstruct the scene from a single RGB image. To achieve this, we introduce several novel loss functions (geometric constraint) that jointly improve the accuracy of piece-wise planar segmentation and depth estimation. Meanwhile, a novel Plane Prior Attention module is used to guide depth estimation with the awareness of plane instances. Exhaustive experiments are conducted in this work to validate the effectiveness and efficiency of our method.

摘要:分片(piece-wise)3D平面重建为人造环境、尤其是室内场景提供整体的场景理解。最近的方法侧重于通过引入先进的网络架构来改善分割和重建结果,但忽略了分片平面兼具物体和几何模型的双重特性。与其他现有方法不同,我们从为多任务卷积神经网络PlaneRecNet强制施加跨任务一致性入手:该网络集成了用于分片平面分割的单阶段实例分割网络,以及用于从单张RGB图像重建场景的深度解码器。为此,我们引入了若干新颖的损失函数(几何约束),共同提高分片平面分割和深度估计的精度。同时,一个新颖的平面先验注意力模块利用对平面实例的感知来引导深度估计。本工作进行了详尽的实验,验证了我们方法的有效性和效率。

CV-20-标题: Generative Adversarial Graph Convolutional Networks for Human Action Synthesis

链接: https://arxiv.org/abs/2110.11191
作者: Bruno Degardin, João Neves, Vasco Lopes, João Brito, Ehsan Yaghoubi, Hugo Proença
备注: Published as a conference paper at WACV 2022. Code and pretrained models available at this https URL

点击查看摘要

Abstract: Synthesising the spatial and temporal dynamics of the human body skeleton remains a challenging task, not only in terms of the quality of the generated shapes, but also of their diversity, particularly to synthesise realistic body movements of a specific action (action conditioning). In this paper, we propose Kinetic-GAN, a novel architecture that leverages the benefits of Generative Adversarial Networks and Graph Convolutional Networks to synthesise the kinetics of the human body. The proposed adversarial architecture can condition up to 120 different actions over local and global body movements while improving sample quality and diversity through latent space disentanglement and stochastic variations. Our experiments were carried out in three well-known datasets, where Kinetic-GAN notably surpasses the state-of-the-art methods in terms of distribution quality metrics while having the ability to synthesise more than one order of magnitude regarding the number of different actions. Our code and models are publicly available at this https URL.

摘要:合成人体骨架的时空动态仍然是一项具有挑战性的任务,这不仅体现在所生成形状的质量上,也体现在其多样性上,尤其是合成特定动作的逼真身体运动(动作条件生成)。在本文中,我们提出Kinetic-GAN,一种利用生成对抗网络和图卷积网络的优势来合成人体运动学的新颖架构。所提出的对抗架构可以在局部和全局身体运动上条件生成多达120种不同的动作,同时通过隐空间解耦和随机变化提高样本质量和多样性。我们的实验在三个著名数据集上进行,Kinetic-GAN在分布质量指标方面明显超越最先进的方法,同时可合成的不同动作数量高出一个数量级以上。我们的代码和模型可在此https URL公开获取。

CV-21-标题: On Hard Episodes in Meta-Learning

链接: https://arxiv.org/abs/2110.11190
作者: Samyadeep Basu, Amr Sharaf, Nicolo Fusi, Soheil Feizi
备注:

点击查看摘要

Abstract: Existing meta-learners primarily focus on improving the average task accuracy across multiple episodes. Different episodes, however, may vary in hardness and quality leading to a wide gap in the meta-learner’s performance across episodes. Understanding this issue is particularly critical in industrial few-shot settings, where there is limited control over test episodes as they are typically uploaded by end-users. In this paper, we empirically analyse the behaviour of meta-learners on episodes of varying hardness across three standard benchmark datasets: CIFAR-FS, mini-ImageNet, and tiered-ImageNet. Surprisingly, we observe a wide gap in accuracy of around 50% between the hardest and easiest episodes across all the standard benchmarks and meta-learners. We additionally investigate various properties of hard episodes and highlight their connection to catastrophic forgetting during meta-training. To address the issue of sub-par performance on hard episodes, we investigate and benchmark different meta-training strategies based on adversarial training and curriculum learning. We find that adversarial training strategies are much more powerful than curriculum learning in improving the prediction performance on hard episodes.

摘要:现有的元学习器主要专注于提高多个episode上的平均任务准确率。然而,不同episode的难度和质量可能各不相同,导致元学习器在各episode之间的性能差距很大。理解这一问题在工业小样本场景中尤为关键,因为测试episode通常由最终用户上传,难以控制。在本文中,我们在CIFAR-FS、mini-ImageNet和tiered-ImageNet三个标准基准数据集上,实证分析了元学习器在不同难度episode上的行为。令人惊讶的是,在所有标准基准和元学习器上,我们观察到最难和最容易的episode之间存在约50%的准确率差距。我们还研究了困难episode的各种属性,并强调了它们与元训练期间灾难性遗忘的联系。为了解决困难episode上性能欠佳的问题,我们研究并对比了基于对抗训练和课程学习的不同元训练策略。我们发现,在提高困难episode上的预测性能方面,对抗训练策略比课程学习强大得多。

CV-22-标题: SLURP: Side Learning Uncertainty for Regression Problems

链接: https://arxiv.org/abs/2110.11182
作者: Xuanlong Yu, Gianni Franchi, Emanuel Aldea
备注:

点击查看摘要

Abstract: It has become critical for deep learning algorithms to quantify their output uncertainties to satisfy reliability constraints and provide accurate results. Uncertainty estimation for regression has received less attention than classification due to the more straightforward standardized output of the latter class of tasks and their high importance. However, regression problems are encountered in a wide range of applications in computer vision. We propose SLURP, a generic approach for regression uncertainty estimation via a side learner that exploits the output and the intermediate representations generated by the main task model. We test SLURP on two critical regression tasks in computer vision: monocular depth and optical flow estimation. In addition, we conduct exhaustive benchmarks comprising transfer to different datasets and the addition of aleatoric noise. The results show that our proposal is generic and readily applicable to various regression problems and has a low computational cost with respect to existing solutions.

摘要:对于深度学习算法而言,量化其输出的不确定性以满足可靠性约束并提供准确结果已变得至关重要。由于分类任务的标准化输出更直接且重要性高,回归任务的不确定性估计受到的关注相对较少。然而,回归问题在计算机视觉的众多应用中广泛存在。我们提出SLURP,一种通过旁路学习器(side learner)进行回归不确定性估计的通用方法,该学习器利用主任务模型生成的输出和中间表示。我们在计算机视觉的两个关键回归任务上测试SLURP:单目深度估计和光流估计。此外,我们进行了详尽的基准测试,包括向不同数据集的迁移以及添加偶然(aleatoric)噪声。结果表明,我们的方案具有通用性,可直接应用于各种回归问题,并且相对于现有解决方案计算成本较低。

CV-23-标题: Self-Supervised Visual Representation Learning Using Lightweight Architectures

链接: https://arxiv.org/abs/2110.11160
作者: Prathamesh Sonawane, Sparsh Drolia, Saqib Shamsi, Bhargav Jain
备注: 8 pages, 4 figures, 1 table, submitted to Artificial Intelligence and Statistics 2022 (AISTATS 2022)

点击查看摘要

Abstract: In self-supervised learning, a model is trained to solve a pretext task, using a data set whose annotations are created by a machine. The objective is to transfer the trained weights to perform a downstream task in the target domain. We critically examine the most notable pretext tasks to extract features from image data and further go on to conduct experiments on resource constrained networks, which aid faster experimentation and deployment. We study the performance of various self-supervised techniques keeping all other parameters uniform. We study the patterns that emerge by varying model type, size and amount of pre-training done for the backbone as well as establish a standard to compare against for future research. We also conduct comprehensive studies to understand the quality of representations learned by different architectures.

摘要:在自监督学习中,模型使用由机器生成标注的数据集来训练求解一个前置任务(pretext task),其目标是迁移训练好的权重,以在目标域中执行下游任务。我们审视了从图像数据中提取特征的最著名的前置任务,并进一步在资源受限的网络上进行实验,这有助于更快的实验和部署。我们在保持所有其他参数一致的情况下研究各种自监督技术的性能。我们研究了改变骨干网络的模型类型、规模和预训练量时呈现的模式,并建立了可供未来研究比较的标准。我们还开展了全面的研究,以理解不同架构所学表示的质量。

CV-24-标题: Each Attribute Matters: Contrastive Attention for Sentence-based Image Editing

链接: https://arxiv.org/abs/2110.11159
作者: Liuqing Zhao, Fan Lyu, Fuyuan Hu, Kaizhu Huang, Fenglei Xu, Linyan Li
备注: Accepted by BMVC 2021

点击查看摘要

Abstract: Sentence-based Image Editing (SIE) aims to deploy natural language to edit an image. Offering potentials to reduce expensive manual editing, SIE has attracted much interest recently. However, existing methods can hardly produce accurate editing and even lead to failures in attribute editing when the query sentence is with multiple editable attributes. To cope with this problem, by focusing on enhancing the difference between attributes, this paper proposes a novel model called Contrastive Attention Generative Adversarial Network (CA-GAN), which is inspired from contrastive training. Specifically, we first design a novel contrastive attention module to enlarge the editing difference between random combinations of attributes which are formed during training. We then construct an attribute discriminator to ensure effective editing on each attribute. A series of experiments show that our method can generate very encouraging results in sentence-based image editing with multiple attributes on CUB and COCO dataset. Our code is available at this https URL

摘要:基于句子的图像编辑(SIE)旨在使用自然语言编辑图像。由于有望减少昂贵的手工编辑,SIE最近引起了广泛关注。然而,当查询句子包含多个可编辑属性时,现有方法很难产生准确的编辑,甚至会导致属性编辑失败。为应对这一问题,本文着眼于增强属性之间的差异,提出一种受对比训练启发的新模型,称为对比注意力生成对抗网络(CA-GAN)。具体而言,我们首先设计一种新颖的对比注意力模块,以扩大训练期间形成的属性随机组合之间的编辑差异。随后,我们构建属性判别器,以确保对每个属性的有效编辑。一系列实验表明,在CUB和COCO数据集上,我们的方法在多属性的基于句子的图像编辑中能够生成非常令人鼓舞的结果。我们的代码可在此https URL获取。

CV-25-标题: HCV: Hierarchy-Consistency Verification for Incremental Implicitly-Refined Classification

链接: https://arxiv.org/abs/2110.11148
作者: Kai Wang, Xialei Liu, Luis Herranz, Joost van de Weijer
备注: accepted in BMVC 2021

点击查看摘要

Abstract: Human beings learn and accumulate hierarchical knowledge over their lifetime. This knowledge is associated with previous concepts for consolidation and hierarchical construction. However, current incremental learning methods lack the ability to build a concept hierarchy by associating new concepts to old ones. A more realistic setting tackling this problem is referred to as Incremental Implicitly-Refined Classification (IIRC), which simulates the recognition process from coarse-grained categories to fine-grained categories. To overcome forgetting in this benchmark, we propose Hierarchy-Consistency Verification (HCV) as an enhancement to existing continual learning methods. Our method incrementally discovers the hierarchical relations between classes. We then show how this knowledge can be exploited during both training and inference. Experiments on three setups of varying difficulty demonstrate that our HCV module improves performance of existing continual learning methods under this IIRC setting by a large margin. Code is available in this https URL.

摘要:人类在一生中不断学习并积累层次化知识,这些知识与先前的概念相关联,用于巩固和层次化构建。然而,当前的增量学习方法缺乏通过将新概念与旧概念关联来构建概念层次的能力。处理该问题的一种更现实的设定被称为增量隐式细化分类(IIRC),它模拟了从粗粒度类别到细粒度类别的识别过程。为了克服该基准上的遗忘问题,我们提出了层次一致性验证(HCV),作为对现有持续学习方法的增强。我们的方法增量地发现类别之间的层次关系,并展示了如何在训练和推理阶段利用这一知识。在三种不同难度设定上的实验表明,我们的HCV模块在该IIRC设定下大幅提升了现有持续学习方法的性能。代码可在此HTTPS URL中获得。
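层次一致性验证的核心思路可以用一段极简的Python示意(以下接口与数据均为本文虚构的假设,并非原论文实现):从观察到的(粗类, 细类)标签对中增量地发现层次关系,推理时仅接受与已发现层次一致的预测对。

```python
# 层次一致性验证(HCV)思路的极简示意(接口均为假设,非原论文实现)

def build_hierarchy(observed_pairs):
    """从 (coarse, fine) 标签对中增量发现 fine -> coarse 的父子关系。"""
    parent = {}
    for coarse, fine in observed_pairs:
        parent.setdefault(fine, coarse)
    return parent

def verify(parent, coarse_pred, fine_pred):
    """仅当细粒度预测的父类与粗粒度预测一致时,才接受该预测对。"""
    return parent.get(fine_pred) == coarse_pred

parent = build_hierarchy([("dog", "corgi"), ("dog", "husky"), ("cat", "siamese")])
print(verify(parent, "dog", "corgi"))  # True:层次一致
print(verify(parent, "cat", "husky"))  # False:husky 的父类是 dog
```

实际方法中层次关系由模型在训练中发现,并同时用于训练约束与推理过滤;这里仅示意"一致性检查"这一步。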

CV-26-标题: Dual Encoding U-Net for Spatio-Temporal Domain Shift Frame Prediction

链接: https://arxiv.org/abs/2110.11140
作者: Jay Santokhi, Dylan Hillier, Yiming Yang, Joned Sarwar, Anna Jordan, Emil Hewage
备注: 8 pages, 4 figures, 5 tables

点击查看摘要

Abstract: The landscape of city-wide mobility behaviour has altered significantly over the past 18 months. The ability to make accurate and reliable predictions on such behaviour has likewise changed drastically with COVID-19 measures impacting how populations across the world interact with the different facets of mobility. This raises the question: “How does one use an abundance of pre-covid mobility data to make predictions on future behaviour in a present/post-covid environment?” This paper seeks to address this question by introducing an approach for traffic frame prediction using a lightweight Dual-Encoding U-Net built using only 12 Convolutional layers that incorporates a novel approach to skip-connections between Convolutional LSTM layers. This approach combined with an intuitive handling of training data can model both a temporal and spatio-temporal domain shift (this http URL).

摘要:在过去18个月里,城市范围内的出行行为发生了显著变化。随着新冠疫情防控措施影响世界各地人群与各类出行方式的交互,对此类行为做出准确、可靠预测的能力也随之急剧改变。这提出了一个问题:"如何利用大量疫情前的出行数据,来预测当前/后疫情环境下的未来行为?"本文试图通过一种交通帧预测方法来回答这一问题:该方法使用仅由12个卷积层构成的轻量级双编码U-Net,并引入了一种在卷积LSTM层之间建立跳跃连接的新方式。该方法结合对训练数据的直观处理,可以同时建模时间域和时空域的偏移(此HTTP URL)。

CV-27-标题: A Strong Baseline for Semi-Supervised Incremental Few-Shot Learning

链接: https://arxiv.org/abs/2110.11128
作者: Linlan Zhao, Dashan Guo, Yunlu Xu, Liang Qiao, Zhanzhan Cheng, Shiliang Pu, Yi Niu, Xiangzhong Fang
备注: Accepted by BMVC2021

点击查看摘要

Abstract: Few-shot learning (FSL) aims to learn models that generalize to novel classes with limited training samples. Recent works advance FSL towards a scenario where unlabeled examples are also available and propose semi-supervised FSL methods. Another line of methods also cares about the performance of base classes in addition to the novel ones and thus establishes the incremental FSL scenario. In this paper, we generalize the above two under a more realistic yet complex setting, named by Semi-Supervised Incremental Few-Shot Learning (S2 I-FSL). To tackle the task, we propose a novel paradigm containing two parts: (1) a well-designed meta-training algorithm for mitigating ambiguity between base and novel classes caused by unreliable pseudo labels and (2) a model adaptation mechanism to learn discriminative features for novel classes while preserving base knowledge using few labeled and all the unlabeled data. Extensive experiments on standard FSL, semi-supervised FSL, incremental FSL, and the firstly built S2 I-FSL benchmarks demonstrate the effectiveness of our proposed method.

摘要:少样本学习(FSL)旨在学习能够在训练样本有限的情况下泛化到新类别的模型。近期工作将FSL推进到同时可获得无标注样本的场景,提出了半监督FSL方法;另一类方法则除新类别外还关注基类的性能,从而建立了增量FSL场景。在本文中,我们在一个更现实也更复杂的设定下对上述两者进行了统一,称为半监督增量少样本学习(S2 I-FSL)。为了解决该任务,我们提出了一个包含两部分的新范式:(1)一种精心设计的元训练算法,用于缓解不可靠伪标签导致的基类与新类之间的混淆;(2)一种模型自适应机制,在仅利用少量标注数据和全部无标注数据保留基类知识的同时,为新类别学习判别性特征。在标准FSL、半监督FSL、增量FSL以及首次构建的S2 I-FSL基准上的大量实验证明了所提方法的有效性。

CV-28-标题: A Deep Insight into Measuring Face Image Utility with General and Face-specific Image Quality Metrics

链接: https://arxiv.org/abs/2110.11111
作者: Biying Fu, Cong Chen, Olaf Henniger, Naser Damer
备注: 8 pages, 5 figures, IEEE Winter Conf. on Applications of Computer Vision

点击查看摘要

Abstract: Quality scores provide a measure to evaluate the utility of biometric samples for biometric recognition. Biometric recognition systems require high-quality samples to achieve optimal performance. This paper focuses on face images and the measurement of face image utility with general and face-specific image quality metrics. While face-specific metrics rely on features of aligned face images, general image quality metrics can be used on the global image and relate to human perceptions. In this paper, we analyze the gap between the general image quality metrics and the face image quality metrics. Our contribution lies in a thorough examination of how different the image quality assessment algorithms relate to the utility for the face recognition task. The results of image quality assessment algorithms are further compared with those of dedicated face image quality assessment algorithms. In total, 25 different quality metrics are evaluated on three face image databases, BioSecure, LFW, and VGGFace2 using three open-source face recognition solutions, SphereFace, ArcFace, and FaceNet. Our results reveal a clear correlation between learned image metrics to face image utility even without being specifically trained as a face utility measure. Individual handcrafted features lack general stability and perform significantly worse than general face-specific quality metrics. We additionally provide a visual insight into the image areas contributing to the quality score of a selected set of quality assessment methods.

摘要:质量分数提供了一种评估生物特征样本对生物识别的效用的度量。生物识别系统需要高质量的样本才能达到最佳性能。本文聚焦于人脸图像,研究使用通用图像质量指标和人脸专用图像质量指标来衡量人脸图像效用。人脸专用指标依赖对齐后人脸图像的特征,而通用图像质量指标则作用于整幅图像,并与人类感知相关。本文分析了通用图像质量指标与人脸图像质量指标之间的差距,我们的贡献在于深入考察了不同图像质量评估算法与人脸识别任务效用之间的关联程度,并将通用图像质量评估算法的结果与专用人脸图像质量评估算法的结果进行了对比。我们使用三种开源人脸识别方案(SphereFace、ArcFace和FaceNet),在三个人脸图像数据库(BioSecure、LFW和VGGFace2)上共评估了25种不同的质量指标。结果表明,即使未经过人脸效用度量的专门训练,学习得到的图像指标与人脸图像效用之间也存在明确的相关性;而单个手工特征缺乏普遍的稳定性,表现明显差于通用的人脸专用质量指标。此外,我们还对一组选定质量评估方法中对质量分数有贡献的图像区域给出了可视化分析。
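衡量"质量指标与识别效用的关联"时,一个常用做法是计算两组分数的秩相关。下面是一个简化示意(数据为虚构示例,相关系数实现为无并列值的简化版,非原论文流程):

```python
import numpy as np

def spearman(x, y):
    """两组分数之间的 Spearman 秩相关系数(假设无并列值的简化实现)。"""
    rx = np.argsort(np.argsort(x)).astype(float)  # x 的秩
    ry = np.argsort(np.argsort(y)).astype(float)  # y 的秩
    rx -= rx.mean(); ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx**2).sum() * (ry**2).sum()))

quality = np.array([0.9, 0.2, 0.7, 0.4, 0.8])       # 虚构的图像质量分数
match   = np.array([0.85, 0.30, 0.65, 0.35, 0.80])  # 虚构的识别比对分数
rho = spearman(quality, match)
print(round(rho, 3))  # 两组分数单调一致,输出 1.0
```

论文中的评估还涉及更细致的误差-拒绝曲线等手段,这里仅示意相关性分析这一环节。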

CV-29-标题: Extraction of Positional Player Data from Broadcast Soccer Videos

链接: https://arxiv.org/abs/2110.11107
作者: Jonas Theiner, Wolfgang Gritz, Eric Müller-Budack, Robert Rein, Daniel Memmert, Ralph Ewerth
备注: Accepted for publication at WACV’22; Preprint

点击查看摘要

Abstract: Computer-aided support and analysis are becoming increasingly important in the modern world of sports. The scouting of potential prospective players, performance as well as match analysis, and the monitoring of training programs rely more and more on data-driven technologies to ensure success. Therefore, many approaches require large amounts of data, which are, however, not easy to obtain in general. In this paper, we propose a pipeline for the fully-automated extraction of positional data from broadcast video recordings of soccer matches. In contrast to previous work, the system integrates all necessary sub-tasks like sports field registration, player detection, or team assignment that are crucial for player position estimation. The quality of the modules and the entire system is interdependent. A comprehensive experimental evaluation is presented for the individual modules as well as the entire pipeline to identify the influence of errors to subsequent modules and the overall result. In this context, we propose novel evaluation metrics to compare the output with ground-truth positional data.

摘要:计算机辅助支持与分析在现代体育界正变得越来越重要。对潜在新球员的球探考察、表现与比赛分析,以及训练计划的监控,都越来越依赖数据驱动技术来确保成功。因此,许多方法需要大量数据,而这些数据通常并不容易获得。在本文中,我们提出了一条从足球比赛转播视频录像中全自动提取位置数据的流水线。与以往工作不同,该系统集成了对球员位置估计至关重要的所有必要子任务,如球场配准、球员检测和球队归属判定。各模块的质量与整个系统的质量是相互依赖的。我们对各个模块以及整条流水线进行了全面的实验评估,以确定误差对后续模块及整体结果的影响。在此背景下,我们提出了新的评估指标,用于将输出与真值位置数据进行比较。

CV-30-标题: Reinforcement Learning Based Optimal Camera Placement for Depth Observation of Indoor Scenes

链接: https://arxiv.org/abs/2110.11106
作者: Yichuan Chen, Manabu Tsukada, Hiroshi Esaki
备注: Accepted to IEEE International Conference on Networking, Sensing and Control (ICNSC) 2021

点击查看摘要

Abstract: Exploring the most task-friendly camera setting – optimal camera placement (OCP) problem – in tasks that use multiple cameras is of great importance. However, few existing OCP solutions specialize in depth observation of indoor scenes, and most versatile solutions work offline. To this problem, an OCP online solution to depth observation of indoor scenes based on reinforcement learning is proposed in this paper. The proposed solution comprises a simulation environment that implements scene observation and reward estimation using shadow maps and an agent network containing a soft actor-critic (SAC)-based reinforcement learning backbone and a feature extractor to extract features from the observed point cloud layer-by-layer. Comparative experiments with two state-of-the-art optimization-based offline methods are conducted. The experimental results indicate that the proposed system outperforms seven out of ten test scenes in obtaining lower depth observation error. The total error in all test scenes is also less than 90% of the baseline ones. Therefore, the proposed system is more competent for depth camera placement in scenarios where there is no prior knowledge of the scenes or where a lower depth observation error is the main objective.

摘要:在使用多摄像机的任务中,探索对任务最友好的摄像机设置,即最优摄像机布置(OCP)问题,非常重要。然而,现有的OCP方案很少专门针对室内场景的深度观测,而大多数通用方案只能离线工作。针对该问题,本文提出了一种基于强化学习的室内场景深度观测OCP在线解决方案。该方案包括一个利用阴影贴图实现场景观测与奖励估计的仿真环境,以及一个智能体网络,后者由基于软演员-评论家(SAC)的强化学习主干和一个逐层提取观测点云特征的特征提取器组成。我们与两种最先进的基于优化的离线方法进行了对比实验。实验结果表明,所提系统在十个测试场景中的七个里获得了更低的深度观测误差,且所有测试场景的总误差也小于基线方法的90%。因此,在对场景没有先验知识、或以降低深度观测误差为主要目标的情形下,所提系统更适合用于深度摄像机布置。

CV-31-标题: 3D-ANAS v2: Grafting Transformer Module on Automatically Designed ConvNet for Hyperspectral Image Classification

链接: https://arxiv.org/abs/2110.11084
作者: Xizhe Xue, Haokui Zhang, Zongwen Bai, Ying Li
备注: 15 pages, 10 figures

点击查看摘要

Abstract: Hyperspectral image (HSI) classification has been a hot topic for decades, as hyperspectral images have rich spatial and spectral information, providing a strong basis for distinguishing different land-cover objects. Benefiting from the development of deep learning technologies, deep learning based HSI classification methods have achieved promising performance. Recently, several neural architecture search (NAS) algorithms have been proposed for HSI classification, which further improve the accuracy of HSI classification to a new level. In this paper, we revisit the search space designed in previous HSI classification NAS methods and propose a novel hybrid search space, where 3D convolution, 2D spatial convolution and 2D spectral convolution are employed. Compared with the search space proposed in previous works, the search space proposed in this paper is better aligned with the characteristics of HSI data, namely that HSIs have a relatively low spatial resolution and an extremely high spectral resolution. In addition, to further improve the classification accuracy, we attempt to graft the emerging transformer module onto the automatically designed ConvNet to add global information to the local-region-focused features learned by the ConvNet. We carry out comparison experiments on three public HSI datasets which have different spectral characteristics to evaluate the proposed method. Experimental results show that the proposed method achieves much better performance than comparison approaches, and that both adopting the proposed hybrid search space and grafting the transformer module improve classification accuracy. Especially on the most recently captured Houston University dataset, overall accuracy is improved by up to nearly 6 percentage points. Code will be available at: this https URL.

摘要:高光谱图像(HSI)分类几十年来一直是热门课题,因为高光谱图像具有丰富的空间和光谱信息,为区分不同的地物目标提供了坚实基础。得益于深度学习技术的发展,基于深度学习的HSI分类方法取得了可观的性能。最近,若干神经架构搜索(NAS)算法被用于HSI分类,将其精度进一步提升到了新的水平。在本文中,我们重新审视了以往HSI分类NAS方法所设计的搜索空间,并提出了一种新颖的混合搜索空间,其中同时使用3D卷积、2D空间卷积和2D光谱卷积。与以往工作提出的搜索空间相比,本文提出的搜索空间更贴合HSI数据的特性,即HSI具有相对较低的空间分辨率和极高的光谱分辨率。此外,为进一步提高分类精度,我们尝试将新兴的Transformer模块嫁接到自动设计的ConvNet上,为ConvNet学到的局部区域特征补充全局信息。我们在三个光谱特性各异的公开HSI数据集上进行了对比实验。实验结果表明,所提方法取得了明显优于对比方法的性能,且采用所提混合搜索空间与嫁接Transformer模块均能提升分类精度;尤其在最新采集的休斯顿大学数据集上,总体精度提升了近6个百分点。代码将在此HTTPS URL提供。

CV-32-标题: Robust Edge-Direct Visual Odometry based on CNN edge detection and Shi-Tomasi corner optimization

链接: https://arxiv.org/abs/2110.11064
作者: Kengdong Lu, Jintao Cheng, Yubin Zhou, Juncan Deng, Rui Fan, Kaiqing Luo
备注:

点击查看摘要

Abstract: In this paper, we propose a robust edge-direct visual odometry (VO) based on CNN edge detection and Shi-Tomasi corner optimization. Four layers of pyramids were extracted from the image in the proposed method to reduce the motion error between frames. This solution used CNN edge detection and Shi-Tomasi corner optimization to extract information from the image. Then, the pose estimation is performed using the Levenberg-Marquardt (LM) algorithm and updating the keyframes. Our method was compared with the dense direct method, the improved direct method of Canny edge detection, and ORB-SLAM2 system on the RGB-D TUM benchmark. The experimental results indicate that our method achieves better robustness and accuracy.

摘要:在本文中,我们提出了一种基于CNN边缘检测与Shi-Tomasi角点优化的鲁棒边缘直接视觉里程计(VO)。所提方法从图像中提取四层金字塔,以减小帧间运动误差;利用CNN边缘检测与Shi-Tomasi角点优化从图像中提取信息;然后使用Levenberg-Marquardt(LM)算法进行位姿估计并更新关键帧。我们在RGB-D TUM基准上将该方法与稠密直接法、基于Canny边缘检测的改进直接法以及ORB-SLAM2系统进行了比较。实验结果表明,我们的方法具有更好的鲁棒性和准确性。
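Shi-Tomasi角点优化依赖的角点响应,是像素邻域结构张量M的最小特征值min(λ1, λ2):响应越大,该点越接近角点,越适合作为跟踪特征。下面用numpy给出一个极简示意(窗口与梯度计算均为简化假设,非原文实现):

```python
import numpy as np

def shi_tomasi_response(img, x, y, win=1):
    """计算 (y, x) 处窗口内结构张量的最小特征值。img 为二维灰度数组。"""
    gy, gx = np.gradient(img.astype(float))          # 行方向、列方向梯度
    ys, xs = slice(y - win, y + win + 1), slice(x - win, x + win + 1)
    ixx = (gx[ys, xs] ** 2).sum()
    iyy = (gy[ys, xs] ** 2).sum()
    ixy = (gx[ys, xs] * gy[ys, xs]).sum()
    # 2x2 对称矩阵 [[ixx, ixy], [ixy, iyy]] 的最小特征值(闭式解)
    tr, det = ixx + iyy, ixx * iyy - ixy ** 2
    return tr / 2 - np.sqrt(max(tr ** 2 / 4 - det, 0.0))

# 一幅只有左上角为亮块的图像:亮块角点处的响应应明显高于平坦区域
img = np.zeros((8, 8)); img[:4, :4] = 1.0
corner = shi_tomasi_response(img, 3, 3)  # 亮块角点附近
flat = shi_tomasi_response(img, 6, 6)    # 平坦区域
print(corner > flat)  # True
```

实际系统中通常直接使用成熟实现(如OpenCV的goodFeaturesToTrack),这里仅示意响应值的含义。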

CV-33-标题: Transfer beyond the Field of View: Dense Panoramic Semantic Segmentation via Unsupervised Domain Adaptation

链接: https://arxiv.org/abs/2110.11062
作者: Jiaming Zhang, Chaoxiang Ma, Kailun Yang, Alina Roitberg, Kunyu Peng, Rainer Stiefelhagen
备注: Accepted to IEEE Transactions on Intelligent Transportation Systems (IEEE T-ITS). Dataset and code will be made publicly available at this https URL arXiv admin note: substantial text overlap with arXiv:2108.06383

点击查看摘要

Abstract: Autonomous vehicles clearly benefit from the expanded Field of View (FoV) of 360-degree sensors, but modern semantic segmentation approaches rely heavily on annotated training data which is rarely available for panoramic images. We look at this problem from the perspective of domain adaptation and bring panoramic semantic segmentation to a setting, where labelled training data originates from a different distribution of conventional pinhole camera images. To achieve this, we formalize the task of unsupervised domain adaptation for panoramic semantic segmentation and collect DensePASS - a novel densely annotated dataset for panoramic segmentation under cross-domain conditions, specifically built to study the Pinhole-to-Panoramic domain shift and accompanied with pinhole camera training examples obtained from Cityscapes. DensePASS covers both, labelled- and unlabelled 360-degree images, with the labelled data comprising 19 classes which explicitly fit the categories available in the source (i.e. pinhole) domain. Since data-driven models are especially susceptible to changes in data distribution, we introduce P2PDA - a generic framework for Pinhole-to-Panoramic semantic segmentation which addresses the challenge of domain divergence with different variants of attention-augmented domain adaptation modules, enabling the transfer in output-, feature-, and feature confidence spaces. P2PDA intertwines uncertainty-aware adaptation using confidence values regulated on-the-fly through attention heads with discrepant predictions. Our framework facilitates context exchange when learning domain correspondences and dramatically improves the adaptation performance of accuracy- and efficiency-focused models. Comprehensive experiments verify that our framework clearly surpasses unsupervised domain adaptation- and specialized panoramic segmentation approaches.

摘要:自动驾驶车辆显然能从360度传感器扩展的视场(FoV)中受益,但现代语义分割方法严重依赖标注训练数据,而全景图像的标注数据十分稀缺。我们从域自适应的角度看待这一问题,将全景语义分割置于这样一种设定下:带标签的训练数据来自传统针孔相机图像的不同分布。为此,我们形式化了全景语义分割的无监督域自适应任务,并收集了DensePASS:一个专为研究针孔到全景域偏移而构建的、跨域条件下全景分割的新型密集标注数据集,并配有来自Cityscapes的针孔相机训练样本。DensePASS同时涵盖有标签和无标签的360度图像,其中有标签数据包含19个类别,与源域(即针孔域)中可用的类别完全对应。由于数据驱动模型对数据分布变化尤为敏感,我们提出了P2PDA:一个针孔到全景语义分割的通用框架,它通过注意力增强的域自适应模块的不同变体应对域差异的挑战,实现了在输出、特征和特征置信空间中的迁移。P2PDA利用注意力头根据差异化预测即时调节置信度值,交织进行不确定性感知的自适应。我们的框架在学习域对应关系时促进了上下文交换,并显著提升了以精度和效率为导向的模型的自适应性能。全面的实验验证了我们的框架明显超越了无监督域自适应方法与专门的全景分割方法。

CV-34-标题: Mixer-based lidar lane detection network and dataset for urban roads

链接: https://arxiv.org/abs/2110.11048
作者: Donghee Paek, Seung-Hyun Kong, Kevin Tirta Wijaya
备注: 15 pages, 12 figures, 8 tables

点击查看摘要

Abstract: Accurate lane detection under various road conditions is a critical function for autonomous driving. Generally, when detected lane lines from a front camera image are projected into a birds-eye view (BEV) for motion planning, the resulting lane lines are often distorted. And convolutional neural network (CNN)-based feature extractors often lose resolution when increasing the receptive field to detect global features such as lane lines. However, Lidar point cloud has little image distortion in the BEV-projection. Since lane lines are thin and stretch over entire BEV image while occupying only a small portion, lane lines should be detected as a global feature with high resolution. In this paper, we propose Lane Mixer Network (LMN) that extracts local features from Lidar point cloud, recognizes global features, and detects lane lines using a BEV encoder, a Mixer-based global feature extractor, and a detection head, respectively. In addition, we provide a world-first large urban lane dataset for Lidar, K-Lane, which has maximum 6 lanes under various urban road conditions. We demonstrate that the proposed LMN achieves the state-of-the-art performance, an F1 score of 91.67%, with K-Lane. The K-Lane, LMN training code, pre-trained models, and total dataset development platform are available at github.

摘要:在各种道路条件下准确检测车道线是自动驾驶的一项关键功能。通常,当把前视相机图像中检测到的车道线投影到鸟瞰图(BEV)用于运动规划时,得到的车道线往往会发生畸变;而基于卷积神经网络(CNN)的特征提取器在增大感受野以检测车道线等全局特征时,往往会损失分辨率。相比之下,激光雷达点云在BEV投影下几乎没有图像畸变。由于车道线细长且贯穿整幅BEV图像而只占据很小的比例,车道线应作为一种全局特征以高分辨率进行检测。在本文中,我们提出了车道混合网络(LMN),分别使用BEV编码器、基于Mixer的全局特征提取器和检测头,从激光雷达点云中提取局部特征、识别全局特征并检测车道线。此外,我们提供了全球首个面向激光雷达的大型城市车道数据集K-Lane,它涵盖各种城市道路条件、最多6条车道。我们证明所提LMN在K-Lane上取得了最先进的性能,F1分数为91.67%。K-Lane、LMN训练代码、预训练模型以及完整的数据集开发平台均可在github获得。

CV-35-标题: Improving the Deployment of Recycling Classification through Efficient Hyper-Parameter Analysis

链接: https://arxiv.org/abs/2110.11043
作者: Mazin Abdulmahmood, Ryan Grammenos
备注:

点击查看摘要

Abstract: The paradigm of automated waste classification has recently seen a shift in the domain of interest from conventional image processing techniques to powerful computer vision algorithms known as convolutional neural networks (CNN). Historically, CNNs have demonstrated a strong dependency on powerful hardware for real-time classification, yet the need for deployment on weaker embedded devices is greater than ever. The work in this paper proposes a methodology for reconstructing and tuning conventional image classification models, using EfficientNets, to decrease their parameterisation with no trade-off in model accuracy and develops a pipeline through TensorRT for accelerating such models to run at real-time on an NVIDIA Jetson Nano embedded device. The train-deployment discrepancy, relating how poor data augmentation leads to a discrepancy in model accuracy between training and deployment, is often neglected in many papers and thus the work is extended by analysing and evaluating the impact real-world perturbations had on model accuracy once deployed. The scope of the work concerns developing a more efficient variant of WasteNet, a collaborative recycling classification model. The newly developed model scores a test-set accuracy of 95.8% with a real-world accuracy of 95%, a 14% increase over the original. Our acceleration pipeline boosted model throughput by 750% to 24 inferences per second on the Jetson Nano and real-time latency of the system was verified through servomotor latency analysis.

摘要:自动化垃圾分类的范式最近出现了从传统图像处理技术向强大的计算机视觉算法(卷积神经网络,CNN)的转变。从历史上看,CNN在实时分类上高度依赖强大的硬件,但在算力较弱的嵌入式设备上部署的需求却前所未有地迫切。本文提出了一种使用EfficientNet重构并调优传统图像分类模型的方法,在不损失模型精度的前提下减少其参数量,并通过TensorRT开发了一条加速流水线,使此类模型能在NVIDIA Jetson Nano嵌入式设备上实时运行。训练-部署差异(指不当的数据增强导致模型在训练与部署之间出现精度差距)在许多论文中常被忽视,因此我们进一步分析并评估了部署后真实世界扰动对模型精度的影响。本工作的目标是开发协作式回收分类模型WasteNet的一个更高效的变体。新开发的模型测试集精度为95.8%,真实场景精度为95%,比原模型提高了14%。我们的加速流水线使模型在Jetson Nano上的吞吐量提升了750%,达到每秒24次推理,并通过伺服电机延迟分析验证了系统的实时延迟。

CV-36-标题: RefRec: Pseudo-labels Refinement via Shape Reconstruction for Unsupervised 3D Domain Adaptation

链接: https://arxiv.org/abs/2110.11036
作者: Adriano Cardace, Riccardo Spezialetti, Pierluigi Zama Ramirez, Samuele Salti, Luigi Di Stefano
备注: 3DV 2021 (Oral) Code: this https URL

点击查看摘要

Abstract: Unsupervised Domain Adaptation (UDA) for point cloud classification is an emerging research problem with relevant practical motivations. Reliance on multi-task learning to align features across domains has been the standard way to tackle it. In this paper, we take a different path and propose RefRec, the first approach to investigate pseudo-labels and self-training in UDA for point clouds. We present two main innovations to make self-training effective on 3D data: i) refinement of noisy pseudo-labels by matching shape descriptors that are learned by the unsupervised task of shape reconstruction on both domains; ii) a novel self-training protocol that learns domain-specific decision boundaries and reduces the negative impact of mislabelled target samples and in-domain intra-class variability. RefRec sets the new state of the art in both standard benchmarks used to test UDA for point cloud classification, showcasing the effectiveness of self-training for this important problem.

摘要:面向点云分类的无监督域自适应(UDA)是一个具有实际意义的新兴研究问题。依靠多任务学习来对齐跨域特征一直是解决该问题的标准方式。在本文中,我们另辟蹊径,提出了RefRec,这是首个在点云UDA中研究伪标签与自训练的方法。我们提出了两项关键创新,使自训练在3D数据上行之有效:i)通过匹配形状描述子来精炼带噪伪标签,这些描述子由在两个域上进行形状重建的无监督任务学得;ii)一种新颖的自训练协议,它学习域特定的决策边界,并降低误标目标样本和域内类内差异带来的负面影响。RefRec在用于测试点云分类UDA的两个标准基准上都刷新了最先进水平,展示了自训练对这一重要问题的有效性。
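"用形状描述子精炼带噪伪标签"可以用最近原型重标注来直观示意(以下描述子与原型均为虚构数据,距离度量和精炼规则均为简化假设,非原论文实现):

```python
import numpy as np

def refine_pseudo_labels(descriptors, prototypes):
    """descriptors: (N, D) 样本描述子;prototypes: (C, D) 类别原型。
    把每个样本重标注为与其描述子最近的原型对应的类别。"""
    # 到各类原型的欧氏距离,形状为 (N, C)
    dists = np.linalg.norm(descriptors[:, None, :] - prototypes[None, :, :], axis=-1)
    return dists.argmin(axis=1)

prototypes = np.array([[0.0, 0.0], [10.0, 10.0]])          # 类 0 与类 1 的原型
descriptors = np.array([[0.5, -0.2], [9.8, 10.3], [0.1, 0.4]])
noisy = np.array([1, 1, 0])                                 # 第一个样本被误标为类 1
refined = refine_pseudo_labels(descriptors, prototypes)
print(refined)  # [0 1 0]:第一个样本的噪声伪标签被纠正
```

原论文中的描述子来自形状重建这一无监督任务,精炼规则也更精细,这里只示意"以描述子相似度覆盖噪声伪标签"的思想。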

CV-37-标题: Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression

链接: https://arxiv.org/abs/2110.11023
作者: Usma Niyaz, Deepti R. Bathula
备注:

点击查看摘要

Abstract: Knowledge distillation (KD) is an effective model compression technique where a compact student network is taught to mimic the behavior of a complex and highly trained teacher network. In contrast, Mutual Learning (ML) provides an alternative strategy where multiple simple student networks benefit from sharing knowledge, even in the absence of a powerful but static teacher network. Motivated by these findings, we propose a single-teacher, multi-student framework that leverages both KD and ML to achieve better performance. Furthermore, an online distillation strategy is utilized to train the teacher and students simultaneously. To evaluate the performance of the proposed approach, extensive experiments were conducted using three different versions of teacher-student networks on benchmark biomedical classification (MSI vs. MSS) and object detection (Polyp Detection) tasks. Ensemble of student networks trained in the proposed manner achieved better results than the ensemble of students trained using KD or ML individually, establishing the benefit of augmenting knowledge transfer from teacher to students with peer-to-peer learning between students.

摘要:知识蒸馏(KD)是一种有效的模型压缩技术,它让紧凑的学生网络学习模仿复杂且训练充分的教师网络的行为。相比之下,相互学习(ML)提供了另一种策略:即使没有强大但静态的教师网络,多个简单的学生网络也能从知识共享中受益。受这些发现的启发,我们提出了一个同时利用KD和ML以获得更好性能的单教师、多学生框架,并采用在线蒸馏策略同时训练教师和学生。为了评估所提方法的性能,我们使用三种不同版本的师生网络,在基准生物医学分类(MSI vs. MSS)和目标检测(息肉检测)任务上进行了大量实验。以所提方式训练的学生网络集成,取得了比单独使用KD或ML训练的学生集成更好的结果,证实了在教师向学生传递知识的基础上增加学生间点对点互学的益处。
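KD与ML结合时,每个学生的损失通常同时包含监督项、向教师蒸馏项和同伴互学项。下面是一个数值示意(损失权重、KL方向与数据均为假设,非原论文公式):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    """离散分布 p 对 q 的 KL 散度。"""
    return float((p * np.log(p / q)).sum())

def student_loss(logits, label, teacher_logits, peer_logits_list, alpha=0.5, beta=0.5):
    p = softmax(logits)
    ce = -np.log(p[label])                                          # 监督交叉熵
    kd = kl(softmax(teacher_logits), p)                             # 向教师蒸馏
    ml = np.mean([kl(softmax(pl), p) for pl in peer_logits_list])   # 同伴互学习
    return ce + alpha * kd + beta * ml

logits = np.array([2.0, 0.5, 0.1])
loss = student_loss(
    logits, label=0,
    teacher_logits=np.array([3.0, 0.2, 0.1]),
    peer_logits_list=[np.array([1.5, 0.8, 0.2])],
)
base = student_loss(logits, 0, logits, [logits])  # 教师/同伴与学生完全一致时,蒸馏项为 0
print(loss > base)  # True:分布不一致带来额外的蒸馏与互学损失
```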

CV-38-标题: Spatial Location Constraint Prototype Loss for Open Set Recognition

链接: https://arxiv.org/abs/2110.11013
作者: Ziheng Xia, Ganggang Dong, Penghui Wang, Hongwei Liu
备注: 9 pages

点击查看摘要

Abstract: One of the challenges in pattern recognition is open set recognition. Compared with closed set recognition, open set recognition needs to reduce not only the empirical risk, but also the open space risk, and the reduction of these two risks corresponds to classifying the known classes and identifying the unknown classes respectively. How to reduce the open space risk is the key of open set recognition. This paper explores the origin of the open space risk by analyzing the distribution of known and unknown classes features. On this basis, the spatial location constraint prototype loss function is proposed to reduce the two risks simultaneously. Extensive experiments on multiple benchmark datasets and many visualization results indicate that our methods is significantly superior to other existing approaches.

摘要:模式识别的挑战之一是开放集识别。与闭集识别相比,开放集识别不仅需要降低经验风险,还需要降低开放空间风险,这两种风险的降低分别对应于对已知类的正确分类和对未知类的识别。如何降低开放空间风险是开放集识别的关键。本文通过分析已知类与未知类特征的分布,探究了开放空间风险的来源;在此基础上,提出了空间位置约束原型损失函数,以同时降低这两种风险。在多个基准数据集上的大量实验和许多可视化结果表明,我们的方法显著优于其他现有方法。
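原型损失用于开放集识别的基本思想可以简化示意如下(原型位置约束的具体形式、阈值与数据均为本文假设,并非原论文公式):每个已知类绑定一个原型位置,损失为特征到其类别原型的距离;推理时,与所有原型都足够远的样本判为未知类,从而约束开放空间风险。

```python
import numpy as np

prototypes = np.array([[0.0, 5.0], [5.0, 0.0]])  # 两个已知类的固定原型位置(假设)

def prototype_loss(feature, label):
    """特征到其类别原型的平方欧氏距离。"""
    return float(np.linalg.norm(feature - prototypes[label]) ** 2)

def predict(feature, threshold=4.0):
    """最近原型分类;与所有原型距离都超过阈值时返回 -1(未知类)。"""
    d = np.linalg.norm(prototypes - feature, axis=1)
    return int(d.argmin()) if d.min() < threshold else -1

print(prototype_loss(np.array([0.2, 4.8]), 0))  # 接近原型 0,损失很小
print(predict(np.array([0.1, 5.2])))            # 0:落在已知类 0 附近
print(predict(np.array([20.0, 20.0])))          # -1:远离所有原型,判为未知
```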

CV-39-标题: Pixel-Level Face Image Quality Assessment for Explainable Face Recognition

链接: https://arxiv.org/abs/2110.11001
作者: Philipp Terhörst, Marco Huber, Naser Damer, Florian Kirchbuchner, Kiran Raja, Arjan Kuijper
备注: Submitted to CVPR 2022, Code will be made publicly-available in November 2021

点击查看摘要

Abstract: An essential factor to achieve high performance in face recognition systems is the quality of its samples. Since these systems are involved in various daily life there is a strong need of making face recognition processes understandable for humans. In this work, we introduce the concept of pixel-level face image quality that determines the utility of pixels in a face image for recognition. Given an arbitrary face recognition network, in this work, we propose a training-free approach to assess the pixel-level qualities of a face image. To achieve this, a model-specific quality value of the input image is estimated and used to build a sample-specific quality regression model. Based on this model, quality-based gradients are back-propagated and converted into pixel-level quality estimates. In the experiments, we qualitatively and quantitatively investigated the meaningfulness of the pixel-level qualities based on real and artificial disturbances and by comparing the explanation maps on ICAO-incompliant faces. In all scenarios, the results demonstrate that the proposed solution produces meaningful pixel-level qualities. The code is publicly available.

摘要:人脸识别系统取得高性能的一个关键因素是其样本的质量。由于这些系统已融入日常生活的方方面面,使人脸识别过程对人类可理解的需求十分迫切。在本工作中,我们引入了像素级人脸图像质量的概念,用以刻画人脸图像中各像素对识别的效用。给定任意一个人脸识别网络,我们提出了一种无需训练的方法来评估人脸图像的像素级质量:先估计输入图像的模型相关质量值,并用其构建一个样本特定的质量回归模型;在该模型基础上,将基于质量的梯度反向传播并转换为像素级质量估计。在实验中,我们基于真实与人工扰动,并通过比较不符合ICAO规范的人脸上的解释图,定性和定量地考察了像素级质量的意义。在所有场景下,结果都表明所提方案能产生有意义的像素级质量。代码已公开。
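"把质量分数对像素求梯度,得到像素级质量图"这一思路可以用一个玩具模型示意(特征提取器、参考嵌入和质量定义均为虚构;原论文使用反向传播而非数值梯度,此处为通用的有限差分近似):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))   # 虚构的线性"特征提取器"(4 维嵌入, 16 像素输入)
ref = np.ones(4)               # 虚构的参考嵌入

def quality(img_flat):
    """质量分数:嵌入离参考嵌入越近,质量越高。"""
    return -np.linalg.norm(W @ img_flat - ref)

def pixel_quality_map(img_flat, eps=1e-4):
    """逐像素数值梯度的幅值,作为像素级质量图。"""
    grad = np.zeros_like(img_flat)
    for i in range(img_flat.size):
        d = np.zeros_like(img_flat); d[i] = eps
        grad[i] = (quality(img_flat + d) - quality(img_flat - d)) / (2 * eps)
    return np.abs(grad).reshape(4, 4)

img = rng.normal(size=16)
qmap = pixel_quality_map(img)
print(qmap.shape)  # (4, 4):每个像素一个非负的"影响力"值
```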

CV-40-标题: Memory Efficient Adaptive Attention For Multiple Domain Learning

链接: https://arxiv.org/abs/2110.10969
作者: Himanshu Pradeep Aswani, Abhiraj Sunil Kanse, Shubhang Bhatnagar, Amit Sethi
备注: 13 pages, 3 figures, 4 graphs, 3 tables

点击查看摘要

Abstract: Training CNNs from scratch on new domains typically demands large numbers of labeled images and computations, which is not suitable for low-power hardware. One way to reduce these requirements is to modularize the CNN architecture and freeze the weights of the heavier modules, that is, the lower layers after pre-training. Recent studies have proposed alternative modular architectures and schemes that lead to a reduction in the number of trainable parameters needed to match the accuracy of fully fine-tuned CNNs on new domains. Our work suggests that a further reduction in the number of trainable parameters by an order of magnitude is possible. Furthermore, we propose that new modularization techniques for multiple domain learning should also be compared on other realistic metrics, such as the number of interconnections needed between the fixed and trainable modules, the number of training samples needed, the order of computations required and the robustness to partial mislabeling of the training data. On all of these criteria, the proposed architecture demonstrates advantages over or matches the current state-of-the-art.

摘要:在新域上从零开始训练CNN通常需要大量带标签图像和计算,这并不适合低功耗硬件。降低这些要求的一种方法是将CNN架构模块化,并冻结较重模块(即预训练后的底层)的权重。最近的研究提出了一些替代的模块化架构与方案,减少了在新域上达到与完全微调CNN相当精度所需的可训练参数数量。我们的工作表明,可训练参数的数量还有可能再减少一个数量级。此外,我们提出,多域学习的新模块化技术还应在其他更贴近实际的指标上进行比较,例如固定模块与可训练模块之间所需的互连数量、所需训练样本的数量、所需计算的顺序,以及对训练数据部分误标的鲁棒性。在所有这些标准上,所提架构都展现出优于或持平当前最先进水平的表现。
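"冻结重模块、只训练少量参数"的做法可以用一个玩具实验示意(特征与标签均为虚构数据;这里用numpy手写逻辑回归头代替真实骨干与适配模块):

```python
import numpy as np

rng = np.random.default_rng(1)
frozen_features = rng.normal(size=(100, 8))        # 冻结骨干在新域上输出的特征
labels = (frozen_features[:, 0] > 0).astype(float) # 虚构的二分类标签(线性可分)

w, b = np.zeros(8), 0.0                            # 仅这 9 个参数可训练
for _ in range(200):                               # 逻辑回归头的梯度下降
    z = frozen_features @ w + b
    p = 1.0 / (1.0 + np.exp(-z))
    g = p - labels
    w -= 0.1 * frozen_features.T @ g / len(labels)
    b -= 0.1 * g.mean()

acc = (((frozen_features @ w + b) > 0) == (labels > 0.5)).mean()
print(acc)  # 在这个线性可分的玩具数据上应能轻松拟合
```

真实设定中可训练部分通常是插入骨干中的小型适配模块,而不只是线性头,但参数量对比的量级关系是一致的。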

CV-41-标题: Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data

链接: https://arxiv.org/abs/2110.10966
作者: Matthew Howe, Ian Reid, Jamie Mackenzie
备注: Paper accepted at The 32nd British Machine Vision Conference, BMVC 2021

点击查看摘要

Abstract: Accurate 7DoF prediction of vehicles at an intersection is an important task for assessing potential conflicts between road users. In principle, this could be achieved by a single camera system that is capable of detecting the pose of each vehicle but this would require a large, accurately labelled dataset from which to train the detector. Although large vehicle pose datasets exist (ostensibly developed for autonomous vehicles), we find training on these datasets inadequate. These datasets contain images from a ground level viewpoint, whereas an ideal view for intersection observation would be elevated higher above the road surface. We develop an alternative approach using a weakly supervised method of fine tuning 3D object detectors for traffic observation cameras; showing in the process that large existing autonomous vehicle datasets can be leveraged for pre-training. To fine-tune the monocular 3D object detector, our method utilises multiple 2D detections from overlapping, wide-baseline views and a loss that encodes the subjacent geometric consistency. Our method achieves vehicle 7DoF pose prediction accuracy on our dataset comparable to the top performing monocular 3D object detectors on autonomous vehicle datasets. We present our training methodology, multi-view reprojection loss, and dataset.

摘要:准确预测交叉路口处车辆的7自由度(7DoF)位姿,是评估道路使用者之间潜在冲突的重要任务。原则上,这可以由一个能够检测每辆车位姿的单相机系统实现,但这需要一个大规模、标注精确的数据集来训练检测器。尽管已有大型车辆位姿数据集(名义上是为自动驾驶车辆开发的),我们发现在这些数据集上训练并不够用:它们的图像来自地面视角,而交叉路口观测的理想视角应高于路面。我们开发了一种替代方案,用弱监督方法为交通观测相机微调3D目标检测器,并在此过程中表明,现有的大型自动驾驶数据集可以用于预训练。为了微调单目3D目标检测器,我们的方法利用来自重叠宽基线视角的多个2D检测,以及一个编码底层几何一致性的损失函数。我们的方法在我们的数据集上取得的车辆7DoF位姿预测精度,可与自动驾驶数据集上表现最佳的单目3D目标检测器相媲美。我们给出了训练方法、多视角重投影损失以及数据集。

CV-42-标题: Vis-TOP: Visual Transformer Overlay Processor

链接: https://arxiv.org/abs/2110.10957
作者: Wei Hu, Dian Xu, Zimeng Fan, Fang Liu, Yanxiang He
备注: 13 pages, 5 figures

点击查看摘要

Abstract: In recent years, Transformer has achieved good results in Natural Language Processing (NLP) and has also started to expand into Computer Vision (CV). Excellent models such as the Vision Transformer and Swin Transformer have emerged. At the same time, the platform for Transformer models was extended to embedded devices to meet some resource-sensitive application scenarios. However, due to the large number of parameters, the complex computational flow and the many different structural variants of Transformer models, there are a number of issues that need to be addressed in its hardware design. This is both an opportunity and a challenge. We propose Vis-TOP (Visual Transformer Overlay Processor), an overlay processor for various visual Transformer models. It differs from coarse-grained overlay processors such as CPU, GPU, NPE, and from fine-grained customized designs for a specific model. Vis-TOP summarizes the characteristics of all visual Transformer models and implements a three-layer and two-level transformation structure that allows the model to be switched or changed freely without changing the hardware architecture. At the same time, the corresponding instruction bundle and hardware architecture are designed in three-layer and two-level transformation structure. After quantization of Swin Transformer tiny model using 8-bit fixed points (fix_8), we implemented an overlay processor on the ZCU102. Compared to GPU, the TOP throughput is 1.5x higher. Compared to the existing Transformer accelerators, our throughput per DSP is between 2.2x and 11.7x higher than others. In a word, the approach in this paper meets the requirements of real-time AI in terms of both resource consumption and inference speed. Vis-TOP provides a cost-effective and power-effective solution based on reconfigurable devices for computer vision at the edge.

摘要:近年来,Transformer 在自然语言处理(NLP)中取得了良好的效果,并开始向计算机视觉(CV)领域扩展,涌现出 Vision Transformer 和 Swin Transformer 等优秀模型。与此同时,Transformer 模型的运行平台也扩展到嵌入式设备,以满足一些资源敏感的应用场景。然而,由于 Transformer 模型参数量大、计算流程复杂、结构变体众多,其硬件设计中有许多问题需要解决,这既是机遇也是挑战。我们提出了 Vis-TOP(Visual Transformer Overlay Processor),一种面向各类视觉 Transformer 模型的覆盖处理器。它既不同于 CPU、GPU、NPE 等粗粒度的覆盖处理器,也不同于针对特定模型的细粒度定制设计。Vis-TOP 总结了所有视觉 Transformer 模型的特点,实现了三层两级的变换结构,使模型可以在不改变硬件架构的情况下自由切换或更换;相应的指令集和硬件架构也按照该三层两级变换结构设计。在对 Swin Transformer tiny 模型进行 8 位定点(fix_8)量化后,我们在 ZCU102 上实现了该覆盖处理器。与 GPU 相比,吞吐量提高了 1.5 倍;与现有的 Transformer 加速器相比,我们每个 DSP 的吞吐量是其他方案的 2.2 至 11.7 倍。总之,本文的方法在资源消耗和推理速度两方面都满足实时 AI 的要求。Vis-TOP 为边缘端计算机视觉提供了一种基于可重构器件的、高性价比且低功耗的解决方案。
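
摘要中提到的 8 位定点(fix_8)量化,其核心思想可以用下面的 numpy 草图来说明;其中小数位数 `frac_bits=5` 等参数是为演示而假设的,并非论文的实际配置:

```python
import numpy as np

def quantize_fix8(x, frac_bits=5):
    """把浮点数组量化为 8 位定点数:乘以 2^frac_bits 后取整并截断到 int8 范围。"""
    scale = 2 ** frac_bits
    return np.clip(np.round(x * scale), -128, 127).astype(np.int8)

def dequantize_fix8(q, frac_bits=5):
    """把 8 位定点数还原为浮点数。"""
    return q.astype(np.float32) / (2 ** frac_bits)

w = np.array([0.73, -1.2, 0.01, 2.5], dtype=np.float32)
q = quantize_fix8(w)
w_hat = dequantize_fix8(q)
# 对未溢出的值,量化误差不超过半个 LSB(这里是 1/64)
max_err = np.abs(w - w_hat).max()
```

定点数用移位和整数运算代替浮点运算,这正是它对 FPGA(如文中的 ZCU102)友好的原因。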

CV-43-标题: Multi-label Classification with Partial Annotations using Class-aware Selective Loss

链接: https://arxiv.org/abs/2110.10955
作者: Emanuel Ben-Baruch, Tal Ridnik, Itamar Friedman, Avi Ben-Cohen, Nadav Zamir, Asaf Noy, Lihi Zelnik-Manor
备注:

点击查看摘要

Abstract: Large-scale multi-label classification datasets are commonly, and perhaps inevitably, partially annotated. That is, only a small subset of labels are annotated per sample. Different methods for handling the missing labels induce different properties on the model and impact its accuracy. In this work, we analyze the partial labeling problem, then propose a solution based on two key ideas. First, un-annotated labels should be treated selectively according to two probability quantities: the class distribution in the overall dataset and the specific label likelihood for a given data sample. We propose to estimate the class distribution using a dedicated temporary model, and we show its improved efficiency over a naive estimation computed using the dataset’s partial annotations. Second, during the training of the target model, we emphasize the contribution of annotated labels over originally un-annotated labels by using a dedicated asymmetric loss. With our novel approach, we achieve state-of-the-art results on OpenImages dataset (e.g. reaching 87.3 mAP on V6). In addition, experiments conducted on LVIS and simulated-COCO demonstrate the effectiveness of our approach. Code is available at this https URL.

摘要:大规模多标签分类数据集通常(也许不可避免地)只有部分标注,即每个样本只标注了一小部分标签。处理缺失标签的不同方法会给模型带来不同的性质并影响其准确性。在这项工作中,我们分析了部分标注问题,并基于两个关键思想提出了解决方案。首先,应根据两个概率量有选择地处理未标注的标签:整个数据集中的类别分布,以及给定样本上特定标签出现的似然。我们提出用一个专门的临时模型来估计类别分布,并证明它比直接利用数据集部分标注计算的朴素估计更有效。其次,在训练目标模型时,我们使用专门的非对称损失,强调已标注标签相对于原本未标注标签的贡献。借助这一新方法,我们在 OpenImages 数据集上取得了最先进的结果(例如在 V6 上达到 87.3 mAP)。此外,在 LVIS 和模拟 COCO 上进行的实验也证明了我们方法的有效性。代码可在此 https URL 获得。
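
"按类别分布有选择地处理未标注标签"这一思想,可以用下面的极简 numpy 草图来示意(加权方式与变量名均为演示性假设,并非论文中损失函数的确切形式):

```python
import numpy as np

def selective_bce(probs, labels, class_prior, eps=1e-7):
    """部分标注下的选择性 BCE。labels 中 1=正, 0=负, -1=未标注。
    未标注条目按"视为负样本"处理,但类别先验越高(类越常见),
    这一负假设越不可信,权重就越低。"""
    loss = np.zeros_like(probs)
    pos, neg, unk = labels == 1, labels == 0, labels == -1
    loss[pos] = -np.log(probs[pos] + eps)
    loss[neg] = -np.log(1.0 - probs[neg] + eps)
    w = np.broadcast_to(1.0 - class_prior, probs.shape)
    loss[unk] = (w * -np.log(1.0 - probs + eps))[unk]
    return loss.mean()

probs = np.array([[0.9, 0.2, 0.6]])
labels = np.array([[1, 0, -1]])      # 第三个标签未标注
prior = np.array([0.5, 0.1, 0.3])    # 各类别在数据集中的估计频率(假设值)
loss = selective_bce(probs, labels, prior)
```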

CV-44-标题: MOS: A Low Latency and Lightweight Framework for Face Detection Landmark Localization and Head Pose Estimation

链接: https://arxiv.org/abs/2110.10953
作者: Yepeng Liu, Zaiwang Gu, Shenghua Gao, Dong Wang, Yusheng Zeng, Jun Cheng
备注:

点击查看摘要

Abstract: With the emergence of service robots and surveillance cameras, dynamic face recognition (DFR) in wild has received much attention in recent years. Face detection and head pose estimation are two important steps for DFR. Very often, the pose is estimated after the face detection. However, such sequential computations lead to higher latency. In this paper, we propose a low latency and lightweight network for simultaneous face detection, landmark localization and head pose estimation. Inspired by the observation that it is more challenging to locate the facial landmarks for faces with large angles, a pose loss is proposed to constrain the learning. Moreover, we also propose an uncertainty multi-task loss to learn the weights of individual tasks automatically. Another challenge is that robots often use low computational units like ARM based computing core and we often need to use lightweight networks instead of the heavy ones, which lead to performance drop especially for small and hard faces. In this paper, we propose online feedback sampling to augment the training samples across different scales, which increases the diversity of training data automatically. Through validation in commonly used WIDER FACE, AFLW and AFLW2000 datasets, the results show that the proposed method achieves the state-of-the-art performance in low computational resources.

摘要:随着服务机器人和监控摄像头的出现,野外动态人脸识别(DFR)近年来受到了广泛关注。人脸检测和头部姿态估计是 DFR 的两个重要步骤。通常,姿态是在人脸检测之后估计的,但这种顺序计算会导致更高的延迟。本文提出了一种低延迟、轻量级的网络,用于同时进行人脸检测、关键点定位和头部姿态估计。鉴于大角度人脸的关键点更难定位,我们提出了一种姿态损失来约束学习。此外,我们还提出了一种不确定性多任务损失,自动学习各任务的权重。另一个挑战是,机器人通常使用基于 ARM 的计算核心等低算力单元,因此往往需要用轻量级网络替代重型网络,这会导致性能下降,对小人脸和困难人脸尤甚。本文提出在线反馈采样,在不同尺度上扩充训练样本,自动增加训练数据的多样性。在常用的 WIDER FACE、AFLW 和 AFLW2000 数据集上的验证结果表明,所提方法在低计算资源下达到了最先进的性能。
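
"自动学习各任务权重的不确定性多任务损失"通常采用 Kendall 等人提出的同方差不确定性形式;下面给出一个说明性的 numpy 草图(这是该类损失的常见写法,未必与 MOS 的实现完全一致):

```python
import numpy as np

def uncertainty_weighted_loss(task_losses, log_vars):
    """L = sum_i exp(-s_i) * L_i + s_i,其中 s_i = log(sigma_i^2) 是可学习参数。
    某任务的不确定性越大(s_i 越大),其损失权重 exp(-s_i) 越小;
    加上 s_i 这一项防止网络把所有权重压到零。"""
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * task_losses + log_vars))

# s_i 全为 0 时退化为各任务损失的简单求和
total = uncertainty_weighted_loss([1.0, 4.0, 0.5], [0.0, 0.0, 0.0])
```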

CV-45-标题: Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection

链接: https://arxiv.org/abs/2110.10949
作者: Shraman Pramanick, Aniket Roy, Vishal M. Patel
备注: Accepted to WACV 2022

点击查看摘要

Abstract: Multimodal learning is an emerging yet challenging research area. In this paper, we deal with multimodal sarcasm and humor detection from conversational videos and image-text pairs. Being a fleeting action, which is reflected across the modalities, sarcasm detection is challenging since large datasets are not available for this task in the literature. Therefore, we primarily focus on resource-constrained training, where the number of training samples is limited. To this end, we propose a novel multimodal learning system, MuLOT (Multimodal Learning using Optimal Transport), which utilizes self-attention to exploit intra-modal correspondence and optimal transport for cross-modal correspondence. Finally, the modalities are combined with multimodal attention fusion to capture the inter-dependencies across modalities. We test our approach for multimodal sarcasm and humor detection on three benchmark datasets - MUStARD (video, audio, text), UR-FUNNY (video, audio, text), MST (image, text) and obtain 2.1%, 1.54%, and 2.34% accuracy improvements over state-of-the-art.

摘要:多模态学习是一个新兴而富有挑战性的研究领域。本文研究来自对话视频和图文对的多模态讽刺与幽默检测。讽刺是一种跨模态体现的转瞬即逝的行为,而文献中又缺乏用于该任务的大型数据集,因此讽刺检测颇具挑战性。为此,我们主要关注训练样本数量有限的资源受限训练。我们提出了一种新颖的多模态学习系统 MuLOT(Multimodal Learning using Optimal Transport),利用自注意力挖掘模态内对应关系,利用最优传输建立跨模态对应关系;最后通过多模态注意力融合组合各模态,以捕获模态间的相互依赖。我们在三个基准数据集 MUStARD(视频、音频、文本)、UR-FUNNY(视频、音频、文本)和 MST(图像、文本)上测试了该方法,相比最先进方法分别取得了 2.1%、1.54% 和 2.34% 的准确率提升。

CV-46-标题: Autonomous Dimension Reduction by Flattening Deformation of Data Manifold under an Intrinsic Deforming Field

链接: https://arxiv.org/abs/2110.10938
作者: Xiaodong Zhuang
备注: 18 pages, 23 figures

点击查看摘要

Abstract: A new dimension reduction (DR) method for data sets is proposed by autonomous deforming of data manifolds. The deformation is guided by the proposed deforming vector field, which is defined by two kinds of virtual interactions between data points. The flattening of data manifold is achieved as an emergent behavior under the elastic and repelling interactions between data points, meanwhile the topological structure of the manifold is preserved. To overcome the uneven sampling (or “short-cut edge”) problem, the soft neighborhood is proposed, in which the neighbor degree is defined and adaptive interactions between neighbor points is implemented. The proposed method provides a novel geometric viewpoint on dimension reduction. Experimental results prove the effectiveness of the proposed method in dimension reduction, and implicit feature of data sets may also be revealed.

摘要:通过数据流形的自主变形,提出了一种新的数据集降维(DR)方法。变形由所提出的变形向量场引导,该向量场由数据点之间的两种虚拟相互作用定义。在数据点之间的弹性与排斥作用下,数据流形的展平作为一种涌现行为得以实现,同时流形的拓扑结构得到保留。为克服采样不均("捷径边")问题,提出了软邻域的概念,其中定义了邻居度,并实现了邻居点之间的自适应相互作用。该方法为降维提供了一个新颖的几何视角。实验结果证明了所提方法在降维上的有效性,并且还可以揭示数据集的隐含特征。

CV-47-标题: CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization

链接: https://arxiv.org/abs/2110.10921
作者: Wenzheng Hu, Ning Liu, Zhengping Che, Mingyang Li, Jian Tang, Changshui Zhang, Jianqiang Wang
备注:

点击查看摘要

Abstract: Deep convolutional neural networks are shown to be overkill with high parametric and computational redundancy in many application scenarios, and an increasing number of works have explored model pruning to obtain lightweight and efficient networks. However, most existing pruning approaches are driven by empirical heuristics and rarely consider the joint impact of channels, leading to unguaranteed and suboptimal performance. In this paper, we propose a novel channel pruning method via class-aware trace ratio optimization (CATRO) to reduce the computational burden and accelerate the model inference. Utilizing class information from a few samples, CATRO measures the joint impact of multiple channels by feature space discriminations and consolidates the layer-wise impact of preserved channels. By formulating channel pruning as a submodular set function maximization problem, CATRO solves it efficiently via a two-stage greedy iterative optimization procedure. More importantly, we present theoretical justifications on convergence and performance of CATRO. Experimental results demonstrate that CATRO achieves higher accuracy with similar computation cost or lower computation cost with similar accuracy than other state-of-the-art channel pruning algorithms. In addition, because of its class-aware property, CATRO is suitable to prune efficient networks adaptively for various classification subtasks, enhancing handy deployment and usage of deep networks in real-world applications.

摘要:在许多应用场景中,深度卷积神经网络存在较高的参数和计算冗余,越来越多的工作探索通过模型剪枝获得轻量高效的网络。然而,大多数现有剪枝方法由经验性启发式驱动,很少考虑通道之间的联合影响,导致性能无保证且次优。本文提出了一种基于类感知迹比优化(CATRO)的新型通道剪枝方法,以减少计算负担并加速模型推理。CATRO 利用少量样本的类别信息,通过特征空间判别度量多个通道的联合影响,并整合被保留通道的逐层影响。通过将通道剪枝表述为子模集函数最大化问题,CATRO 用两阶段贪心迭代优化过程高效求解。更重要的是,我们给出了 CATRO 收敛性和性能的理论证明。实验结果表明,与其他最先进的通道剪枝算法相比,CATRO 在相近计算成本下取得了更高的准确率,或在相近准确率下具有更低的计算成本。此外,由于其类感知特性,CATRO 适合为各种分类子任务自适应地剪枝出高效网络,便于深度网络在现实应用中的部署和使用。
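
摘要中"把通道剪枝表述为子模集函数最大化、用贪心迭代求解"的骨架,可用标准的贪心子模最大化来示意;下面的打分函数与通道数均为演示假设,并非 CATRO 的迹比目标:

```python
def greedy_select(score_fn, n_channels, k):
    """贪心地选出 k 个通道,使集合得分 score_fn(子集) 最大化。
    当 score_fn 是单调子模函数时,贪心解具有 (1 - 1/e) 的近似保证。"""
    selected, remaining = [], set(range(n_channels))
    for _ in range(k):
        best = max(remaining, key=lambda c: score_fn(selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected

# 演示:可加性得分(每个通道一个独立分数,是子模函数的特例)
channel_scores = [0.1, 0.9, 0.3, 0.7]
chosen = greedy_select(lambda s: sum(channel_scores[c] for c in s), 4, 2)
```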

CV-48-标题: Exploiting Inter-pixel Correlations in Unsupervised Domain Adaptation for Semantic Segmentation

链接: https://arxiv.org/abs/2110.10916
作者: Inseop Chung, Jayeon Yoo, Nojun Kwak
备注:

点击查看摘要

Abstract: “Self-training” has become a dominant method for semantic segmentation via unsupervised domain adaptation (UDA). It creates a set of pseudo labels for the target domain to give explicit supervision. However, the pseudo labels are noisy, sparse and do not provide any information about inter-pixel correlations. We regard inter-pixel correlation quite important because semantic segmentation is a task of predicting highly structured pixel-level outputs. Therefore, in this paper, we propose a method of transferring the inter-pixel correlations from the source domain to the target domain via a self-attention module. The module takes the prediction of the segmentation network as an input and creates a self-attended prediction that correlates similar pixels. The module is trained only on the source domain to learn the domain-invariant inter-pixel correlations, then later, it is used to train the segmentation network on the target domain. The network learns not only from the pseudo labels but also by following the output of the self-attention module which provides additional knowledge about the inter-pixel correlations. Through extensive experiments, we show that our method significantly improves the performance on two standard UDA benchmarks and also can be combined with recent state-of-the-art method to achieve better performance.

摘要:"自训练"已成为基于无监督域自适应(UDA)的语义分割的主流方法。它为目标域创建一组伪标签以提供显式监督。但是,伪标签噪声大且稀疏,并且不提供任何像素间相关性的信息。我们认为像素间相关性非常重要,因为语义分割是预测高度结构化的像素级输出的任务。因此,本文提出一种通过自注意力模块将像素间相关性从源域迁移到目标域的方法。该模块以分割网络的预测为输入,生成一个关联相似像素的自注意力预测。该模块仅在源域上训练以学习域不变的像素间相关性,随后用于在目标域上训练分割网络。网络不仅从伪标签中学习,还通过跟随自注意力模块的输出学习,后者提供了关于像素间相关性的额外知识。大量实验表明,我们的方法在两个标准 UDA 基准上显著提升了性能,并且可与最近的最先进方法结合以取得更好的效果。
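
"以分割预测为输入、关联相似像素"的自注意力操作可以示意如下(纯 numpy 草图,像素特征与缩放方式为演示假设):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attended_prediction(pred, feats):
    """pred: (N, C) 每个像素的类别概率;feats: (N, D) 像素特征。
    注意力权重由像素特征的相似度给出,输出是相似像素预测的加权聚合。"""
    attn = softmax(feats @ feats.T / np.sqrt(feats.shape[1]), axis=-1)
    return attn @ pred

pred = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])   # 3 个像素、2 类
feats = np.random.default_rng(0).normal(size=(3, 4))
out = self_attended_prediction(pred, feats)
```

由于注意力矩阵每行和为 1,当输入的每行是合法概率分布时,输出的每行也仍是概率分布。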

CV-49-标题: A Fast Location Algorithm for Very Sparse Point Clouds Based on Object Detection

链接: https://arxiv.org/abs/2110.10901
作者: Shiyu Fan
备注:

点击查看摘要

Abstract: Limited by the performance factor, it is arduous to recognize target object and locate it in Augmented Reality (AR) scenes on low-end mobile devices, especially which using monocular cameras. In this paper, we proposed an algorithm which can quickly locate the target object through image object detection in the circumstances of having very sparse feature points. We introduce YOLOv3-Tiny to our algorithm as the object detection module to filter the possible points and using Principal Component Analysis (PCA) to determine the location. We conduct the experiment in a manually designed scene by holding a smartphone and the results represent high positioning speed and accuracy of our method.

摘要:受性能因素的限制,在低端移动设备(尤其是使用单目摄像头的设备)上的增强现实(AR)场景中识别并定位目标物体十分困难。本文提出了一种算法,能在特征点非常稀疏的情况下通过图像目标检测快速定位目标物体。我们将 YOLOv3-Tiny 作为目标检测模块引入算法,用于筛选可能的点,并使用主成分分析(PCA)确定位置。我们手持智能手机在人工设计的场景中进行了实验,结果表明该方法具有较高的定位速度和精度。
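
"用 PCA 从稀疏特征点确定位置"的核心步骤(求中心与主方向)可示意如下;点云数据为演示构造,与论文实验无关:

```python
import numpy as np

def locate_with_pca(points):
    """由稀疏 3D 点云估计目标中心与主轴:
    中心取点的均值,主轴取协方差矩阵的特征向量(按方差从大到小排列)。"""
    centre = points.mean(axis=0)
    centred = points - centre
    cov = centred.T @ centred / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh 按特征值升序返回
    axes = eigvecs[:, ::-1].T                # 主轴,最强方向在前
    return centre, axes

# 演示:沿 x 轴分布的稀疏点,主轴应(在正负号意义下)对齐 x 方向
pts = np.array([[x, 0.0, 0.0] for x in (-2.0, -1.0, 1.0, 2.0)])
centre, axes = locate_with_pca(pts)
```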

CV-50-标题: LARNet: Latent Action Representation for Human Action Synthesis

链接: https://arxiv.org/abs/2110.10899
作者: Naman Biyani, Aayush J Rana, Shruti Vyas, Yogesh S Rawat
备注: British Machine Vision Conference (BMVC) 2021

点击查看摘要

Abstract: We present LARNet, a novel end-to-end approach for generating human action videos. A joint generative modeling of appearance and dynamics to synthesize a video is very challenging and therefore recent works in video synthesis have proposed to decompose these two factors. However, these methods require a driving video to model the video dynamics. In this work, we propose a generative approach instead, which explicitly learns action dynamics in latent space avoiding the need of a driving video during inference. The generated action dynamics is integrated with the appearance using a recurrent hierarchical structure which induces motion at different scales to focus on both coarse as well as fine level action details. In addition, we propose a novel mix-adversarial loss function which aims at improving the temporal coherency of synthesized videos. We evaluate the proposed approach on four real-world human action datasets demonstrating the effectiveness of the proposed approach in generating human actions. The code and models will be made publicly available.

摘要:我们提出了 LARNet,一种生成人类动作视频的新型端到端方法。对外观和动态进行联合生成建模以合成视频非常具有挑战性,因此近期的视频合成工作提出将这两个因素解耦。然而,这些方法需要一段驱动视频来建模视频动态。在这项工作中,我们提出一种生成式方法,显式地在潜在空间中学习动作动态,从而在推理时无需驱动视频。生成的动作动态通过一种循环分层结构与外观融合,该结构在不同尺度上产生运动,兼顾粗粒度和细粒度的动作细节。此外,我们提出了一种新颖的混合对抗损失函数,旨在提高合成视频的时间连贯性。我们在四个真实世界人类动作数据集上评估了所提方法,证明了其在生成人类动作方面的有效性。代码和模型将公开发布。

CV-51-标题: Deep Image Matting with Flexible Guidance Input

链接: https://arxiv.org/abs/2110.10898
作者: Hang Cheng, Shugong Xu, Xiufeng Jiang, Rongrong Wang
备注: Accepted to BMVC2021

点击查看摘要

Abstract: Image matting is an important computer vision problem. Many existing matting methods require a hand-made trimap to provide auxiliary information, which is very expensive and limits the real world usage. Recently, some trimap-free methods have been proposed, which completely get rid of any user input. However, their performance lag far behind trimap-based methods due to the lack of guidance information. In this paper, we propose a matting method that use Flexible Guidance Input as user hint, which means our method can use trimap, scribblemap or clickmap as guidance information or even work without any guidance input. To achieve this, we propose Progressive Trimap Deformation(PTD) scheme that gradually shrink the area of the foreground and background of the trimap with the training step increases and finally become a scribblemap. To make our network robust to any user scribble and click, we randomly sample points on foreground and background and perform curve fitting. Moreover, we propose Semantic Fusion Module(SFM) which utilize the Feature Pyramid Enhancement Module(FPEM) and Joint Pyramid Upsampling(JPU) in matting task for the first time. The experiments show that our method can achieve state-of-the-art results comparing with existing trimap-based and trimap-free methods.

摘要:图像抠图(matting)是一个重要的计算机视觉问题。许多现有抠图方法需要手工制作的 trimap 提供辅助信息,代价高昂且限制了实际应用。最近提出的一些无 trimap 方法完全摆脱了用户输入,但由于缺乏引导信息,其性能远远落后于基于 trimap 的方法。本文提出一种以灵活引导输入作为用户提示的抠图方法,即我们的方法可以使用 trimap、scribblemap 或 clickmap 作为引导信息,甚至可以在没有任何引导输入的情况下工作。为此,我们提出渐进式 trimap 变形(PTD)方案,随着训练步数增加逐渐缩小 trimap 的前景和背景区域,最终使其变为 scribblemap。为了使网络对任意用户涂鸦和点击具有鲁棒性,我们在前景和背景上随机采样点并进行曲线拟合。此外,我们首次在抠图任务中提出利用特征金字塔增强模块(FPEM)和联合金字塔上采样(JPU)的语义融合模块(SFM)。实验表明,与现有的基于 trimap 和无 trimap 的方法相比,我们的方法达到了最先进的结果。

CV-52-标题: Privacy-Aware Identity Cloning Detection based on Deep Forest

链接: https://arxiv.org/abs/2110.10897
作者: Ahmed Alharbi, Hai Dong, Xun Yi, Prabath Abeysekara
备注: The 19th International Conference on Service Oriented Computing (ICSOC 2021). arXiv admin note: text overlap with arXiv:2109.15179

点击查看摘要

Abstract: We propose a novel method to detect identity cloning of social-sensor cloud service providers to prevent the detrimental outcomes caused by identity deception. This approach leverages non-privacy-sensitive user profile data gathered from social networks and a powerful deep learning model to perform cloned identity detection. We evaluated the proposed method against the state-of-the-art identity cloning detection techniques and the other popular identity deception detection models atop a real-world dataset. The results show that our method significantly outperforms these techniques/models in terms of Precision and F1-score.

摘要:我们提出了一种新方法来检测社交传感器云服务提供商的身份克隆,以防止身份欺骗造成的有害后果。该方法利用从社交网络收集的非隐私敏感用户资料数据和强大的深度学习模型来执行克隆身份检测。我们在一个真实世界数据集上,将所提方法与最先进的身份克隆检测技术及其他流行的身份欺骗检测模型进行了对比评估。结果表明,我们的方法在精确率和 F1 分数方面显著优于这些技术/模型。

CV-53-标题: Evolving Transferable Pruning Functions

链接: https://arxiv.org/abs/2110.10876
作者: Yuchen Liu, S.Y. Kung, David Wentzlaff
备注:

点击查看摘要

Abstract: Channel pruning has made major headway in the design of efficient deep learning models. Conventional approaches adopt human-made pruning functions to score channels’ importance for channel pruning, which requires domain knowledge and could be sub-optimal. In this work, we propose an end-to-end framework to automatically discover strong pruning metrics. Specifically, we craft a novel design space for expressing pruning functions and leverage an evolution strategy, genetic programming, to evolve high-quality and transferable pruning functions. Unlike prior methods, our approach can not only provide compact pruned networks for efficient inference, but also novel closed-form pruning metrics that are mathematically explainable and thus generalizable to different pruning tasks. The evolution is conducted on small datasets while the learned functions are transferable to larger datasets without any manual modification. Compared to direct evolution on a large dataset, our strategy shows better cost-effectiveness. When applied to more challenging datasets, different from those used in the evolution process, e.g., ILSVRC-2012, an evolved function achieves state-of-the-art pruning results.

摘要:通道剪枝在高效深度学习模型的设计中取得了重大进展。传统方法采用人工设计的剪枝函数为通道的重要性打分,这需要领域知识且可能是次优的。在这项工作中,我们提出一个端到端框架来自动发现强大的剪枝度量。具体而言,我们为表达剪枝函数构建了一个新颖的设计空间,并利用一种进化策略(遗传编程)来演化出高质量且可迁移的剪枝函数。与先前方法不同,我们的方法不仅能提供用于高效推理的紧凑剪枝网络,还能得到数学上可解释、因而可泛化到不同剪枝任务的新型闭式剪枝度量。演化在小数据集上进行,而学到的函数无需任何手动修改即可迁移到更大的数据集。与在大数据集上直接演化相比,我们的策略更具性价比。当应用于更具挑战性、与演化过程所用数据不同的数据集(如 ILSVRC-2012)时,演化得到的函数取得了最先进的剪枝结果。
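
这类"人工设计的剪枝函数"的一个经典例子,是下文 CV-57(同组作者)提到的用学生 t 检验为通道打分;其草图如下(采用 Welch t 统计量,数据为演示构造):

```python
import numpy as np

def t_test_channel_scores(feats, labels):
    """feats: (N, C) 每个样本各通道的平均激活;labels: (N,) 二分类标签。
    对每个通道计算两类之间 Welch t 统计量的绝对值,值越大判别力越强,
    得分最低的通道即为剪枝候选。"""
    a, b = feats[labels == 0], feats[labels == 1]
    se = np.sqrt(a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b) + 1e-12)
    return np.abs(a.mean(0) - b.mean(0)) / se

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
feats = rng.normal(size=(100, 2))
feats[labels == 1, 0] += 3.0   # 通道 0 对类别有判别力,通道 1 是纯噪声
scores = t_test_channel_scores(feats, labels)
```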

CV-54-标题: Controllable and Compositional Generation with Latent-Space Energy-Based Models

链接: https://arxiv.org/abs/2110.10873
作者: Weili Nie, Arash Vahdat, Anima Anandkumar
备注: 32 pages, NeurIPS 2021

点击查看摘要

Abstract: Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications, but it still remains as a great challenge. In particular, the compositional ability to generate novel concept combinations is out of reach for most current models. In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes. To make them scalable to high-resolution image generation, we introduce an EBM in the latent space of a pre-trained generative model such as StyleGAN. We propose a novel EBM formulation representing the joint distribution of data and attributes together, and we show how sampling from it is formulated as solving an ordinary differential equation (ODE). Given a pre-trained generator, all we need for controllable generation is to train an attribute classifier. Sampling with ODEs is done efficiently in the latent space and is robust to hyperparameters. Thus, our method is simple, fast to train, and efficient to sample. Experimental results show that our method outperforms the state-of-the-art in both conditional sampling and sequential editing. In compositional generation, our method excels at zero-shot generation of unseen attribute combinations. Also, by composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images of resolution 1024x1024.

摘要:可控生成是深度生成模型在现实应用中得到成功采用的关键要求之一,但它仍然是一个巨大的挑战。特别是,生成新颖概念组合的组合能力对于大多数现有模型来说仍遥不可及。在这项工作中,我们使用基于能量的模型(EBM)来处理一组属性上的组合生成。为了使其可扩展到高分辨率图像生成,我们在预训练生成模型(如 StyleGAN)的潜在空间中引入 EBM。我们提出了一种新的 EBM 形式来表示数据和属性的联合分布,并展示了如何将从中采样表述为求解常微分方程(ODE)。给定一个预训练的生成器,可控生成只需要训练一个属性分类器。ODE 采样在潜在空间中高效完成,并且对超参数具有鲁棒性。因此,我们的方法简单、训练快速且采样高效。实验结果表明,我们的方法在条件采样和序列编辑两方面都优于最先进的方法。在组合生成中,我们的方法擅长对未见过的属性组合进行零样本生成。此外,通过用逻辑运算符组合能量函数,这项工作首次在生成 1024x1024 分辨率的照片级真实感图像时实现了这种组合性。

CV-55-标题: HENet: Forcing a Network to Think More for Font Recognition

链接: https://arxiv.org/abs/2110.10872
作者: Jingchao Chen, Shiyi Mu, Shugong Xu, Youdong Ding
备注: 8 pages, 2021 3rd International Conference on Advanced Information Science and System (AISS 2021)

点击查看摘要

Abstract: Although lots of progress were made in Text Recognition/OCR in recent years, the task of font recognition is remaining challenging. The main challenge lies in the subtle difference between these similar fonts, which is hard to distinguish. This paper proposes a novel font recognizer with a pluggable module solving the font recognition task. The pluggable module hides the most discriminative accessible features and forces the network to consider other complicated features to solve the hard examples of similar fonts, called HE Block. Compared with the available public font recognition systems, our proposed method does not require any interactions at the inference stage. Extensive experiments demonstrate that HENet achieves encouraging performance, including on character-level dataset Explor_all and word-level dataset AdobeVFR

摘要:尽管近年来文本识别/OCR 取得了很大进展,字体识别任务仍然充满挑战。主要挑战在于相似字体之间的细微差别难以区分。本文提出了一种带有可插拔模块的新型字体识别器来解决字体识别任务。该可插拔模块(称为 HE Block)隐藏最具判别性的易获取特征,迫使网络考虑其他复杂特征,以解决相似字体的困难样本。与现有公开的字体识别系统相比,我们提出的方法在推理阶段不需要任何交互。大量实验表明,HENet 取得了令人鼓舞的性能,包括在字符级数据集 Explor_all 和单词级数据集 AdobeVFR 上。

CV-56-标题: LC3Net: Ladder context correlation complementary network for salient object detection

链接: https://arxiv.org/abs/2110.10869
作者: Xian Fang, Jinchao Zhu, Xiuli Shao, Hongpeng Wang
备注:

点击查看摘要

Abstract: Currently, existing salient object detection methods based on convolutional neural networks commonly resort to constructing discriminative networks to aggregate high level and low level features. However, contextual information is always not fully and reasonably utilized, which usually causes either the absence of useful features or contamination of redundant features. To address these issues, we propose a novel ladder context correlation complementary network (LC3Net) in this paper, which is equipped with three crucial components. At the beginning, we propose a filterable convolution block (FCB) to assist the automatic collection of information on the diversity of initial features, and it is simple yet practical. Besides, we propose a dense cross module (DCM) to facilitate the intimate aggregation of different levels of features by validly integrating semantic information and detailed information of both adjacent and non-adjacent layers. Furthermore, we propose a bidirectional compression decoder (BCD) to help the progressive shrinkage of multi-scale features from coarse to fine by leveraging multiple pairs of alternating top-down and bottom-up feature interaction flows. Extensive experiments demonstrate the superiority of our method against 16 state-of-the-art methods.

摘要:目前,基于卷积神经网络的显著性目标检测方法通常依靠构建判别性网络来聚合高层和低层特征。然而,上下文信息往往没有得到充分合理的利用,这通常导致有用特征缺失或冗余特征污染。为解决这些问题,本文提出一种新颖的梯形上下文关联互补网络(LC3Net),它包含三个关键组件。首先,我们提出一种可滤波卷积块(FCB),辅助自动收集初始特征多样性的信息,简单而实用。此外,我们提出一种密集交叉模块(DCM),通过有效整合相邻与非相邻层的语义信息和细节信息,促进不同层级特征的紧密聚合。进一步地,我们提出一种双向压缩解码器(BCD),利用多对交替的自顶向下和自底向上的特征交互流,帮助多尺度特征由粗到细逐步收缩。大量实验证明了我们的方法相对于 16 种最先进方法的优越性。

CV-57-标题: Class-Discriminative CNN Compression

链接: https://arxiv.org/abs/2110.10864
作者: Yuchen Liu, David Wentzlaff, S.Y. Kung
备注:

点击查看摘要

Abstract: Compressing convolutional neural networks (CNNs) by pruning and distillation has received ever-increasing focus in the community. In particular, designing a class-discrimination based approach would be desired as it fits seamlessly with the CNNs training objective. In this paper, we propose class-discriminative compression (CDC), which injects class discrimination in both pruning and distillation to facilitate the CNNs training goal. We first study the effectiveness of a group of discriminant functions for channel pruning, where we include well-known single-variate binary-class statistics like Student’s T-Test in our study via an intuitive generalization. We then propose a novel layer-adaptive hierarchical pruning approach, where we use a coarse class discrimination scheme for early layers and a fine one for later layers. This method naturally accords with the fact that CNNs process coarse semantics in the early layers and extract fine concepts at the later. Moreover, we leverage discriminant component analysis (DCA) to distill knowledge of intermediate representations in a subspace with rich discriminative information, which enhances hidden layers’ linear separability and classification accuracy of the student. Combining pruning and distillation, CDC is evaluated on CIFAR and ILSVRC 2012, where we consistently outperform the state-of-the-art results.

摘要:通过剪枝和蒸馏压缩卷积神经网络(CNN)在社区中受到越来越多的关注。特别地,设计一种基于类判别的方法是理想的,因为它与 CNN 的训练目标无缝契合。本文提出类判别压缩(CDC),在剪枝和蒸馏中都注入类判别信息,以促进 CNN 的训练目标。我们首先研究了一组判别函数在通道剪枝中的有效性,其中通过直观的推广,把学生 t 检验等著名的单变量二分类统计量纳入研究。随后我们提出一种新颖的层自适应分层剪枝方法:对浅层使用粗粒度的类判别方案,对深层使用细粒度方案。这自然符合 CNN 在浅层处理粗糙语义、在深层提取精细概念的事实。此外,我们利用判别成分分析(DCA)在富含判别信息的子空间中蒸馏中间表示的知识,提升了学生网络隐藏层的线性可分性和分类准确率。结合剪枝和蒸馏,CDC 在 CIFAR 和 ILSVRC 2012 上进行了评估,始终优于最先进的结果。

CV-58-标题: SMOF: Squeezing More Out of Filters Yields Hardware-Friendly CNN Pruning

链接: https://arxiv.org/abs/2110.10842
作者: Yanli Liu, Bochen Guan, Qinwen Xu, Weiyi Li, Shuxue Quan
备注: 11 pages, 4 figures

点击查看摘要

Abstract: For many years, the family of convolutional neural networks (CNNs) has been a workhorse in deep learning. Recently, many novel CNN structures have been designed to address increasingly challenging tasks. To make them work efficiently on edge devices, researchers have proposed various structured network pruning strategies to reduce their memory and computational cost. However, most of them only focus on reducing the number of filter channels per layer without considering the redundancy within individual filter channels. In this work, we explore pruning from another dimension, the kernel size. We develop a CNN pruning framework called SMOF, which Squeezes More Out of Filters by reducing both kernel size and the number of filter channels. Notably, SMOF is friendly to standard hardware devices without any customized low-level implementations, and the pruning effort by kernel size reduction does not suffer from the fixed-size width constraint in SIMD units of general-purpose processors. The pruned networks can be deployed effortlessly with significant running time reduction. We also support these claims via extensive experiments on various CNN structures and general-purpose processors for mobile devices.

摘要:多年来,卷积神经网络(CNN)家族一直是深度学习的主力。最近,许多新颖的 CNN 结构被设计用来应对日益具有挑战性的任务。为了让它们在边缘设备上高效运行,研究人员提出了各种结构化网络剪枝策略,以降低内存和计算成本。然而,其中大多数只关注减少每层滤波器通道的数量,而没有考虑单个滤波器通道内部的冗余。在这项工作中,我们从另一个维度(卷积核尺寸)探索剪枝。我们开发了一个名为 SMOF 的 CNN 剪枝框架,通过同时减小卷积核尺寸和滤波器通道数量,从滤波器中"挤"出更多冗余。值得注意的是,SMOF 对标准硬件设备友好,无需任何定制的底层实现,并且通过缩小卷积核尺寸进行的剪枝不受通用处理器 SIMD 单元固定宽度的约束。剪枝后的网络可以轻松部署,并显著减少运行时间。我们还通过在各种 CNN 结构和移动设备通用处理器上的大量实验支持了这些结论。

CV-59-标题: A Domain Gap Aware Generative Adversarial Network for Multi-domain Image Translation

链接: https://arxiv.org/abs/2110.10837
作者: Wenju Xu, Guanghui Wang
备注:

点击查看摘要

Abstract: Recent image-to-image translation models have shown great success in mapping local textures between two domains. Existing approaches rely on a cycle-consistency constraint that supervises the generators to learn an inverse mapping. However, learning the inverse mapping introduces extra trainable parameters and it is unable to learn the inverse mapping for some domains. As a result, they are ineffective in the scenarios where (i) multiple visual image domains are involved; (ii) both structure and texture transformations are required; and (iii) semantic consistency is preserved. To solve these challenges, the paper proposes a unified model to translate images across multiple domains with significant domain gaps. Unlike previous models that constrain the generators with the ubiquitous cycle-consistency constraint to achieve the content similarity, the proposed model employs a perceptual self-regularization constraint. With a single unified generator, the model can maintain consistency over the global shapes as well as the local texture information across multiple domains. Extensive qualitative and quantitative evaluations demonstrate the effectiveness and superior performance over state-of-the-art models. It is more effective in representing shape deformation in challenging mappings with significant dataset variation across multiple domains.

摘要:最近的图像到图像翻译模型在映射两个域之间的局部纹理方面取得了巨大成功。现有方法依赖循环一致性约束来监督生成器学习一个逆映射。然而,学习逆映射会引入额外的可训练参数,并且对某些域无法学到逆映射。因此,这些方法在以下场景中效果不佳:(i) 涉及多个视觉图像域;(ii) 需要同时进行结构和纹理变换;(iii) 需要保持语义一致性。为解决这些挑战,本文提出一个统一模型,在域间差异显著的多个域之间进行图像翻译。不同于以往用无处不在的循环一致性约束来约束生成器以实现内容相似性的模型,所提模型采用一种感知自正则化约束。凭借单个统一的生成器,该模型能在多个域之间保持全局形状和局部纹理信息的一致性。大量定性和定量评估证明了其相对于最先进模型的有效性和优越性能。在多个域间数据集差异显著的困难映射中,它能更有效地表示形状变形。

CV-60-标题: High-resolution rainfall-runoff modeling using graph neural network

链接: https://arxiv.org/abs/2110.10833
作者: Zhongrun Xiang, Ibrahim Demir
备注:

点击查看摘要

Abstract: Time-series modeling has shown great promise in recent studies using the latest deep learning algorithms such as LSTM (Long Short-Term Memory). These studies primarily focused on watershed-scale rainfall-runoff modeling or streamflow forecasting, but the majority of them only considered a single watershed as a unit. Although this simplification is very effective, it does not take into account spatial information, which could result in significant errors in large watersheds. Several studies investigated the use of GNN (Graph Neural Networks) for data integration by decomposing a large watershed into multiple sub-watersheds, but each sub-watershed is still treated as a whole, and the geoinformation contained within the watershed is not fully utilized. In this paper, we propose the GNRRM (Graph Neural Rainfall-Runoff Model), a novel deep learning model that makes full use of spatial information from high-resolution precipitation data, including flow direction and geographic information. When compared to baseline models, GNRRM has less over-fitting and significantly improves model performance. Our findings support the importance of hydrological data in deep learning-based rainfall-runoff modeling, and we encourage researchers to include more domain knowledge in their models.

摘要:在使用 LSTM(长短期记忆网络)等最新深度学习算法的近期研究中,时间序列建模展现出巨大潜力。这些研究主要集中于流域尺度的降雨-径流建模或径流预测,但其中大多数只把单个流域视为一个整体单元。虽然这种简化非常有效,但它没有考虑空间信息,这在大流域中可能导致显著误差。若干研究探索了通过将大流域分解为多个子流域来利用 GNN(图神经网络)进行数据融合,但每个子流域仍被当作整体处理,流域内蕴含的地理信息没有被充分利用。本文提出 GNRRM(图神经降雨-径流模型),一种充分利用高分辨率降水数据空间信息(包括流向和地理信息)的新型深度学习模型。与基线模型相比,GNRRM 过拟合更少,并显著提升了模型性能。我们的发现支持水文数据在基于深度学习的降雨-径流建模中的重要性,并鼓励研究人员在模型中融入更多领域知识。

CV-61-标题: Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization

链接: https://arxiv.org/abs/2110.10832
作者: Devansh Arpit, Huan Wang, Yingbo Zhou, Caiming Xiong
备注:

点击查看摘要

Abstract: In Domain Generalization (DG) settings, models trained on a given set of training domains have notoriously chaotic performance on distribution shifted test domains, and stochasticity in optimization (e.g. seed) plays a big role. This makes deep learning models unreliable in real world settings. We first show that a simple protocol for averaging model parameters along the optimization path, starting early during training, both significantly boosts domain generalization and diminishes the impact of stochasticity by improving the rank correlation between the in-domain validation accuracy and out-domain test accuracy, which is crucial for reliable model selection. Next, we show that an ensemble of independently trained models also has a chaotic behavior in the DG setting. Taking advantage of our observation, we show that instead of ensembling unaveraged models, ensembling moving average models (EoA) from different runs does increase stability and further boosts performance. On the DomainBed benchmark, when using a ResNet-50 pre-trained on ImageNet, this ensemble of averages achieves 88.6% on PACS, 79.1% on VLCS, 72.5% on OfficeHome, 52.3% on TerraIncognita, and 47.4% on DomainNet, an average of 68.0%, beating ERM (w/o model averaging) by ~4%. We also evaluate a model that is pre-trained on a larger dataset, where we show EoA achieves an average accuracy of 72.7%, beating its corresponding ERM baseline by 5%.

摘要:在域泛化(DG)设置中,在给定训练域集合上训练的模型在分布偏移的测试域上表现出众所周知的混乱性能,而优化中的随机性(例如随机种子)起着很大作用。这使得深度学习模型在真实世界环境中不可靠。我们首先表明,一个简单的协议(从训练早期开始沿优化路径对模型参数做滑动平均)既能显著提升域泛化能力,又能通过提高域内验证精度与域外测试精度之间的秩相关性来减小随机性的影响,这对可靠的模型选择至关重要。接下来,我们表明,独立训练模型的集成在DG设置下同样表现出混乱行为。利用这一观察,我们证明与其集成未做平均的模型,不如集成来自不同训练运行的滑动平均模型(EoA),这能提高稳定性并进一步提升性能。在DomainBed基准上,使用在ImageNet上预训练的ResNet-50,该平均模型集成在PACS上达到88.6%,在VLCS上达到79.1%,在OfficeHome上达到72.5%,在TerraIncognita上达到52.3%,在DomainNet上达到47.4%,平均为68.0%,比ERM(不使用模型平均)高约4%。我们还评估了在更大数据集上预训练的模型,EoA的平均准确率达到72.7%,比相应的ERM基线高5%。
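EoA 的两个核心步骤——沿优化路径做参数滑动平均、再对多次独立训练得到的平均模型做预测集成——可以用如下极简的纯 Python 草图示意(假设性实现,省略了真实训练循环与张量运算,仅用标量参数演示):

```python
def update_average(avg_params, params, step):
    """沿优化路径对模型参数做滑动平均(step 从 1 开始计数,等价于运行均值)。"""
    for k, v in params.items():
        prev = avg_params.get(k, 0.0)
        avg_params[k] = prev + (v - prev) / step
    return avg_params


def ensemble_of_averages(models_probs):
    """EoA:对多次独立运行得到的滑动平均模型的预测概率取平均。"""
    n = len(models_probs)
    num_classes = len(models_probs[0])
    return [sum(p[c] for p in models_probs) / n for c in range(num_classes)]
```

实际使用中,每个训练步结束后用当前权重调用 `update_average`,推理时再对各次运行的平均模型输出调用 `ensemble_of_averages`。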

CV-62-标题: HALP: Hardware-Aware Latency Pruning

链接: https://arxiv.org/abs/2110.10811
作者: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jianna Liu, Jose M. Alvarez
备注:

点击查看摘要

Abstract: Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates structural pruning as a global resource allocation optimization problem, aiming at maximizing the accuracy while constraining latency under a predefined budget. For filter importance ranking, HALP leverages latency lookup table to track latency reduction potential and global saliency score to gauge accuracy drop. Both metrics can be evaluated very efficiently during pruning, allowing us to reformulate global structural pruning under a reward maximization problem given target constraint. This makes the problem solvable via our augmented knapsack solver, enabling HALP to surpass prior work in pruning efficacy and accuracy-efficiency trade-off. We examine HALP on both classification and detection tasks, over varying networks, on ImageNet and VOC datasets. In particular, for ResNet-50/-101 pruning on ImageNet, HALP improves network throughput by 1.60×/1.90× with +0.3%/-0.2% top-1 accuracy changes, respectively. For SSD pruning on VOC, HALP improves throughput by 1.94× with only a 0.56 mAP drop. HALP consistently outperforms prior art, sometimes by large margins.

摘要:结构化修剪可以简化网络架构并提高推理速度。我们提出硬件感知延迟修剪(HALP),将结构化修剪表述为一个全局资源分配优化问题,旨在在预定延迟预算的约束下最大化精度。对于滤波器重要性排序,HALP利用延迟查找表来跟踪延迟降低潜力,并利用全局显著性分数来衡量精度损失。这两个指标在修剪过程中都可以非常高效地评估,使我们能够把全局结构化修剪重新表述为给定目标约束下的奖励最大化问题。这使得问题可以通过我们的增广背包求解器求解,从而让HALP在修剪效果以及精度-效率权衡上超越先前工作。我们在ImageNet和VOC数据集上、在不同网络上针对分类和检测任务检验了HALP。特别是,对于ImageNet上的ResNet-50/-101修剪,HALP将网络吞吐量分别提高1.60倍/1.90倍,top-1准确率变化分别为+0.3%/-0.2%。对于VOC上的SSD修剪,HALP将吞吐量提高1.94倍,mAP仅下降0.56。HALP始终优于现有技术,有时优势很大。
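HALP 使用的是一个增广背包求解器,其细节见论文;下面给出一个标准 0/1 背包动态规划的假设性示意(假定每个滤波器的延迟代价已由查找表离散化为非负整数),用来说明"在延迟预算内最大化所保留滤波器重要性总和"这一核心思路:

```python
def prune_by_knapsack(importances, latencies, budget):
    """0/1 背包动态规划示意:在延迟预算 budget 内,
    选择保留重要性总和最大的滤波器子集。
    latencies 与 budget 假定为非负整数。"""
    n = len(importances)
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = latencies[i - 1], importances[i - 1]
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]          # 不保留第 i 个滤波器
            if w <= b:
                dp[i][b] = max(dp[i][b], dp[i - 1][b - w] + v)  # 保留
    # 回溯得到被保留的滤波器下标
    keep, b = [], budget
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            keep.append(i - 1)
            b -= latencies[i - 1]
    return sorted(keep), dp[n][budget]
```

例如 3 个滤波器的重要性为 [6, 10, 12]、延迟为 [1, 2, 3]、预算为 5 时,最优解保留后两个滤波器,总重要性为 22。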

CV-63-标题: Text-Based Person Search with Limited Data

链接: https://arxiv.org/abs/2110.10807
作者: Xiao Han, Sen He, Li Zhang, Tao Xiang
备注: 20 pages, 7 figures, 6 tables, to appear in BMVC2021

点击查看摘要

Abstract: Text-based person search (TBPS) aims at retrieving a target person from an image gallery with a descriptive text query. Solving such a fine-grained cross-modal retrieval task is challenging, which is further hampered by the lack of large-scale datasets. In this paper, we present a framework with two novel components to handle the problems brought by limited data. Firstly, to fully utilize the existing small-scale benchmarking datasets for more discriminative feature learning, we introduce a cross-modal momentum contrastive learning framework to enrich the training data for a given mini-batch. Secondly, we propose to transfer knowledge learned from existing coarse-grained large-scale datasets containing image-text pairs from drastically different problem domains to compensate for the lack of TBPS training data. A transfer learning method is designed so that useful information can be transferred despite the large domain gap. Armed with these components, our method achieves new state of the art on the CUHK-PEDES dataset with significant improvements over the prior art in terms of Rank-1 and mAP. Our code is available at this https URL.

摘要:基于文本的人员搜索(TBPS)旨在通过描述性文本查询从图像库中检索目标人员。解决这种细粒度的跨模型检索任务是具有挑战性的,这是由于缺乏大规模数据集而进一步阻碍。在本文中,我们介绍了一个具有两种新组件的框架,以处理有限数据所带来的问题。首先,为了充分利用现有的小规模基准数据集进行更多歧视特征学习,我们介绍了一种跨模型动量对比学习框架,以丰富给定的小型批次的训练数据。其次,我们建议从巨大不同的问题域中从包含图像文本对的现有粗粒大规模数据集中学习的知识,以补偿缺乏Tbps训练数据。设计转移学习方法,使得尽管域间隙大,但可以传输有用的信息。使用这些组件,我们的方法在CUHK-PEDES数据集上实现了新技术,在RANK-1和地图方面,在现有技术中具有显着改进。我们的代码可在此HTTPS URL上获得。
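跨模态动量对比学习通常依赖一个跨 mini-batch 的特征队列来扩充给定批次的负样本;下面是一个仅演示队列维护逻辑的极简草图(假设性示意,并非论文实现,真实系统中入队的是动量编码器输出的特征向量):

```python
from collections import deque


class MomentumQueue:
    """跨 mini-batch 的先进先出特征队列示意:
    为对比学习提供更丰富的负样本;超出容量时自动淘汰最旧的特征。"""

    def __init__(self, capacity):
        self.queue = deque(maxlen=capacity)  # maxlen 保证自动丢弃最旧元素

    def enqueue(self, features):
        """把当前批次(动量编码器产生)的特征加入队列。"""
        self.queue.extend(features)

    def negatives(self):
        """返回当前可用的全部负样本特征。"""
        return list(self.queue)
```

这样即便单个 mini-batch 很小,对比损失仍能看到近若干个批次积累下来的负样本。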

CV-64-标题: DVIO: Depth aided visual inertial odometry for RGBD sensors

链接: https://arxiv.org/abs/2110.10805
作者: Abhishek Tyagi, Yangwen Liang, Shuangquan Wang, Dongwoon Bai
备注:

点击查看摘要

Abstract: In past few years we have observed an increase in the usage of RGBD sensors in mobile devices. These sensors provide a good estimate of the depth map for the camera frame, which can be used in numerous augmented reality applications. This paper presents a new visual inertial odometry (VIO) system, which uses measurements from a RGBD sensor and an inertial measurement unit (IMU) sensor for estimating the motion state of the mobile device. The resulting system is called the depth-aided VIO (DVIO) system. In this system we add the depth measurement as part of the nonlinear optimization process. Specifically, we propose methods to use the depth measurement using one-dimensional (1D) feature parameterization as well as three-dimensional (3D) feature parameterization. In addition, we propose to utilize the depth measurement for estimating time offset between the unsynchronized IMU and the RGBD sensors. Last but not least, we propose a novel block-based marginalization approach to speed up the marginalization processes and maintain the real-time performance of the overall system. Experimental results validate that the proposed DVIO system outperforms the other state-of-the-art VIO systems in terms of trajectory accuracy as well as processing time.

摘要:在过去的几年中,我们观察到移动设备中RGBD传感器的使用增加。这些传感器提供了相机帧的深度图的良好估计,可用于许多增强现实应用。本文介绍了一种新的视觉惯性内径测量(VIO)系统,其使用来自RGBD传感器的测量和用于估计移动设备的运动状态的惯性测量单元(IMU)传感器。得到的系统称为深度辅助VIO(DVIO)系统。在该系统中,我们将深度测量添加为非线性优化过程的一部分。具体地,我们提出了使用一维(1D)特征参数化以及三维(3D)特征参数化的方法来使用深度测量。另外,我们建议利用深度测量来估计未同步的IMU和RGBD传感器之间的时间偏移。最后但并非最不重要的是,我们提出了一种新颖的基于块的边缘化方法来加快边缘化过程并维持整个系统的实时性能。实验结果验证了所提出的DVIO系统在轨迹精度以及处理时间方面优于其他最先进的VIO系统。

CV-65-标题: Style Agnostic 3D Reconstruction via Adversarial Style Transfer

链接: https://arxiv.org/abs/2110.10784
作者: Felix Petersen, Bastian Goldluecke, Oliver Deussen, Hilde Kuehne
备注: To be published at WACV 2022, Code @ this https URL

点击查看摘要

Abstract: Reconstructing the 3D geometry of an object from an image is a major challenge in computer vision. Recently introduced differentiable renderers can be leveraged to learn the 3D geometry of objects from 2D images, but those approaches require additional supervision to enable the renderer to produce an output that can be compared to the input image. This can be scene information or constraints such as object silhouettes, uniform backgrounds, material, texture, and lighting. In this paper, we propose an approach that enables a differentiable rendering-based learning of 3D objects from images with backgrounds without the need for silhouette supervision. Instead of trying to render an image close to the input, we propose an adversarial style-transfer and domain adaptation pipeline that allows to translate the input image domain to the rendered image domain. This allows us to directly compare between a translated image and the differentiable rendering of a 3D object reconstruction in order to train the 3D object reconstruction network. We show that the approach learns 3D geometry from images with backgrounds and provides a better performance than constrained methods for single-view 3D object reconstruction on this task.

摘要:从图像中重建对象的3D几何形状是计算机视觉中的主要挑战。最近引入的可差异化渲染器可以利用以学习来自2D图像的对象的3D几何形状,但这些方法需要额外的监督,使渲染器能够产生与输入图像可以比较的输出。这可以是场景信息或约束,例如对象剪影,统一背景,材料,纹理和照明。在本文中,我们提出了一种方法,该方法使得能够从图像中的3D对象的基于3D对象的学习,而无需轮廓监督。我们提出了一个对逆势的样式传输和域自适应流水线提出了允许将输入图像域转换为呈现的图像域。这允许我们直接比较在翻译的图像和3D对象重建的可差渲染之间,以便训练3D对象重建网络。我们表明该方法从图形中获取3D几何图形,并提供比此任务的单视图3D对象重建的约束方法更好的性能。

CV-66-标题: Closed-loop Feedback Registration for Consecutive Images of Moving Flexible Targets

链接: https://arxiv.org/abs/2110.10772
作者: Rui Ma, Xian Du
备注:

点击查看摘要

Abstract: Advancement of imaging techniques enables consecutive image sequences to be acquired for quality monitoring of manufacturing production lines. Registration for these image sequences is essential for in-line pattern inspection and metrology, e.g., in the printing process of flexible electronics. However, conventional image registration algorithms cannot produce accurate results when the images contain many similar and deformable patterns in the manufacturing process. Such a failure originates from a fact that the conventional algorithms only use the spatial and pixel intensity information for registration. Considering the nature of temporal continuity and consecution of the product images, in this paper, we propose a closed-loop feedback registration algorithm for matching and stitching the deformable printed patterns on a moving flexible substrate. The algorithm leverages the temporal and spatial relationships of the consecutive images and the continuity of the image sequence for fast, accurate, and robust point matching. Our experimental results show that our algorithm can find more matching point pairs with a lower root mean squared error (RMSE) compared to other state-of-the-art algorithms while offering significant improvements to running time.

摘要:成像技术的进步使得能够获得连续的图像序列以获得制造生产线的质量监测。这些图像序列的注册对于在线模式检查和计量是必不可少的,例如,在柔性电子器件的印刷过程中。然而,当图像在制造过程中包含许多类似和可变形的图案时,传统的图像登记算法不能产生准确的结果。这种失败起源于传统算法仅使用用于注册的空间和像素强度信息。考虑到产品图像的时间连续性和连续性的性质,在本文中,我们提出了一种闭环反馈登记算法,用于匹配和缝合在移动的柔性基板上的可变形印刷图案。该算法利用连续图像的时间和空间关系以及快速,准确,坚固的点匹配的图像序列的连续性。我们的实验结果表明,与其他最先进的算法相比,我们的算法可以找到更多匹配的点对具有较低的根均方误差(RMSE),同时提供对运行时间的显着改进。

CV-67-标题: Class Incremental Online Streaming Learning

链接: https://arxiv.org/abs/2110.10741
作者: Soumya Banerjee, Vinay Kumar Verma, Toufiq Parag, Maneesh Singh, Vinay P. Namboodiri
备注:

点击查看摘要

Abstract: A wide variety of methods have been developed to enable lifelong learning in conventional deep neural networks. However, to succeed, these methods require a `batch' of samples to be available and visited multiple times during training. While this works well in a static setting, these methods continue to suffer in a more realistic situation where data arrives in an online streaming manner. We empirically demonstrate that the performance of current approaches degrades if the input is obtained as a stream of data with the following restrictions: (i) each instance comes one at a time and can be seen only once, and (ii) the input data violates the i.i.d assumption, i.e., there can be a class-based correlation. We propose a novel approach (CIOSL) for the class-incremental learning in an online streaming setting to address these challenges. The proposed approach leverages implicit and explicit dual weight regularization and experience replay. The implicit regularization is leveraged via the knowledge distillation, while the explicit regularization incorporates a novel approach for parameter regularization by learning the joint distribution of the buffer replay and the current sample. Also, we propose an efficient online memory replay and replacement buffer strategy that significantly boosts the model's performance. Extensive experiments and ablation on challenging datasets show the efficacy of the proposed method.

摘要:已经开发出各种各样的方法来在传统深度神经网络中实现终身学习。但是,这些方法要想成功,需要有一"批"样本可用,并在训练期间被多次访问。这在静态设置下效果很好,但在数据以在线流式方式到达的更现实的情形下,这些方法仍然会受挫。我们通过实验证明,如果输入是满足以下限制的数据流,现有方法的性能会下降:(i) 每个样本逐一到达且只能被看到一次;(ii) 输入数据违反i.i.d.假设,即可能存在基于类别的相关性。我们提出了一种新方法(CIOSL),用于在线流式设置下的类增量学习,以应对这些挑战。所提方法利用隐式和显式的双重权重正则化以及经验回放。隐式正则化通过知识蒸馏实现,而显式正则化则通过学习缓冲区回放样本与当前样本的联合分布,引入了一种新的参数正则化方法。此外,我们提出了一种高效的在线记忆回放与替换缓冲策略,可显著提升模型性能。在具有挑战性的数据集上的大量实验和消融研究显示了所提方法的有效性。
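在线流式设置下的回放缓冲区需要在"每个样本只出现一次"的前提下维护一个有代表性的子集;下面用蓄水池采样(一种常见的基线替换策略,论文提出的替换策略与此不同,此处仅作示意)给出一个草图:

```python
import random


class ReservoirReplayBuffer:
    """蓄水池采样回放缓冲区示意:样本流式到达、每个只见一次,
    任意时刻缓冲区都近似保存了已见样本流的均匀随机子集。"""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)      # 缓冲区未满:直接存入
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:           # 以 capacity/seen 的概率替换
                self.buffer[j] = sample

    def sample(self, k):
        """从缓冲区随机抽取 k 个样本用于回放。"""
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))
```

训练时每收到一个新样本就调用一次 `add`,再从 `sample` 取旧样本与当前样本一起构成回放批次。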

CV-68-标题: Self-Supervision and Spatial-Sequential Attention Based Loss for Multi-Person Pose Estimation

链接: https://arxiv.org/abs/2110.10734
作者: Haiyang Liu, Dingli Luo, Songlin Du, Takeshi Ikenaga
备注:

点击查看摘要

Abstract: Bottom-up based multi-person pose estimation approaches use heatmaps with auxiliary predictions to estimate joint positions and belonging at one time. Recently, various combinations between auxiliary predictions and heatmaps have been proposed for higher performance, these predictions are supervised by the corresponding L2 loss function directly. However, the lack of more explicit supervision results in low features utilization and contradictions between predictions in one model. To solve these problems, this paper proposes (i) a new loss organization method which uses self-supervised heatmaps to reduce prediction contradictions and spatial-sequential attention to enhance networks’ features extraction; (ii) a new combination of predictions composed by heatmaps, Part Affinity Fields (PAFs) and our block-inside offsets to fix pixel-level joints positions and further demonstrates the effectiveness of proposed loss function. Experiments are conducted on the MS COCO keypoint dataset and adopting OpenPose as the baseline model. Our method outperforms the baseline overall. On the COCO verification dataset, the mAP of OpenPose trained with our proposals outperforms the OpenPose baseline by over 5.5%.

摘要:基于自下而上的多人姿态估计方法使用带有辅助预测的热力图来一次性估计关节位置及其归属。最近,为了获得更高性能,人们提出了辅助预测与热力图之间的多种组合,这些预测直接由相应的L2损失函数监督。然而,缺乏更显式的监督会导致特征利用率低,以及同一模型中各预测之间的矛盾。为了解决这些问题,本文提出:(i) 一种新的损失组织方法,它使用自监督热力图来减少预测矛盾,并利用空间-序列注意力来增强网络的特征提取;(ii) 一种由热力图、部件亲和场(PAFs)和我们的块内偏移量组成的新预测组合,用于修正像素级关节位置,并进一步证明所提损失函数的有效性。实验在MS COCO关键点数据集上进行,并采用OpenPose作为基线模型。我们的方法整体优于基线。在COCO验证数据集上,用我们的方案训练的OpenPose的mAP比OpenPose基线高出5.5%以上。

CV-69-标题: Semi-supervised Domain Adaptation for Semantic Segmentation

链接: https://arxiv.org/abs/2110.10639
作者: Ying Chen, Xu Ouyang, Kaiyue Zhu, Gady Agam
备注:

点击查看摘要

Abstract: Deep learning approaches for semantic segmentation rely primarily on supervised learning approaches and require substantial efforts in producing pixel-level annotations. Further, such approaches may perform poorly when applied to unseen image domains. To cope with these limitations, both unsupervised domain adaptation (UDA) with full source supervision but without target supervision and semi-supervised learning (SSL) with partial supervision have been proposed. While such methods are effective at aligning different feature distributions, there is still a need to efficiently exploit unlabeled data to address the performance gap with respect to fully-supervised methods. In this paper we address semi-supervised domain adaptation (SSDA) for semantic segmentation, where a large amount of labeled source data as well as a small amount of labeled target data are available. We propose a novel and effective two-step semi-supervised dual-domain adaptation (SSDDA) approach to address both cross- and intra-domain gaps in semantic segmentation. The proposed framework is comprised of two mixing modules. First, we conduct a cross-domain adaptation via an image-level mixing strategy, which learns to align the distribution shift of features between the source data and target data. Second, intra-domain adaptation is achieved using a separate student-teacher network which is built to generate category-level data augmentation by mixing unlabeled target data in a way that respects predicted object boundaries. We demonstrate that the proposed approach outperforms state-of-the-art methods on two common synthetic-to-real semantic segmentation benchmarks. An extensive ablation study is provided to further validate the effectiveness of our approach.

摘要:语义分割的深度学习方法主要依赖于监督学习方法,并在生产像素级注释方面需要实质性努力。此外,当应用于看不见的图像域时,这种方法可以不良。为了应对这些限制,已经提出了具有全面监督的无监督域适应(UDA),但没有针对部分监督的目标监督和半监督学习(SSL)。虽然这种方法在对齐不同的特征分布时有效,但仍然需要有效地利用未标记的数据来解决关于完全监督方法的性能差距。在本文中,我们地址用于语义分割的半监督域适应(SSDA),其中有大量标记的源数据以及少量标记的目标数据可用。我们提出了一种新颖且有效的两步半监督双域适应(SSDDA)方法,以解决语义分割中的交叉和域内间隙。所提出的框架由两个混合模块组成。首先,我们通过图像级混合策略进行跨域自适应,该策略学会对准源数据和目标数据之间的特征的分布偏移。其次,使用单独的学生 - 教师网络实现域内自适应,其通过以尊重预测对象边界的方式混合未标记的目标数据来生成类别级数据增强。我们证明,所提出的方法在两个常见的合成对实际语义细分基准上表现出最先进的方法。提供了广泛的消融研究,以进一步验证我们的方法的有效性。

CV-70-标题: Video Instance Segmentation by Instance Flow Assembly

链接: https://arxiv.org/abs/2110.10599
作者: Xiang Li, Jinglu Wang, Xiao Li, Yan Lu
备注:

点击查看摘要

Abstract: Instance segmentation is a challenging task aiming at classifying and segmenting all object instances of specific classes. While two-stage box-based methods achieve top performances in the image domain, they cannot easily extend their superiority into the video domain. This is because they usually deal with features or images cropped from the detected bounding boxes without alignment, failing to capture pixel-level temporal consistency. We embrace the observation that bottom-up methods dealing with box-free features could offer accurate spacial correlations across frames, which can be fully utilized for object and pixel level tracking. We first propose our bottom-up framework equipped with a temporal context fusion module to better encode inter-frame correlations. Intra-frame cues for semantic segmentation and object localization are simultaneously extracted and reconstructed by corresponding decoders after a shared backbone. For efficient and robust tracking among instances, we introduce an instance-level correspondence across adjacent frames, which is represented by a center-to-center flow, termed as instance flow, to assemble messy dense temporal correspondences. Experiments demonstrate that the proposed method outperforms the state-of-the-art online methods (taking image-level input) on the challenging Youtube-VIS dataset.

摘要:实例分割是一个具有挑战性的任务,旨在分类和分割特定类的所有对象实例。虽然基于两阶段盒的方法在图像域中实现顶部表现,但它们不能轻易扩展其优越性进入视频域中。这是因为它们通常处理从检测到的边界框裁剪的特征或图像而不对齐,不能捕获像素级时间一致性。我们拥抱观察说,处理箱体功能的自下而上方法可以跨帧提供精确的间歇相关性,这可以充分利用对象和像素电平跟踪。我们首先提出了配备有时间上下文融合模块的自下而上的框架,以更好地编码帧间相关性。同时通过相应的解码器在共享骨干后同时提取和重建用于语义分割和对象定位的帧内提示。对于实例之间的高效和稳健跟踪,我们在相邻帧中引入了一个实例级对应,其由被视为中心流表示,称为实例流程,以组装凌乱的密集时间对应关系。实验表明,所提出的方法优于挑战YouTube-VIS数据集的最先进的在线方法(拍摄图像级输入)。
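实例流(instance flow)匹配的核心思想可以用如下假设性草图示意:把上一帧每个实例的中心按预测的中心到中心流平移到当前帧,再贪心匹配最近的实例中心:

```python
def match_instances(prev_centers, flows, curr_centers, max_dist=10.0):
    """中心到中心流匹配示意。
    prev_centers: {上一帧实例id: (x, y)};flows: {实例id: (dx, dy)};
    curr_centers: {当前帧实例id: (x, y)}。返回 {上一帧id: 当前帧id}。"""
    matches, used = {}, set()
    for pid, (cx, cy) in prev_centers.items():
        fx, fy = flows.get(pid, (0.0, 0.0))
        px, py = cx + fx, cy + fy            # 按预测流把中心搬到当前帧
        best, best_d = None, max_dist
        for cid, (x, y) in curr_centers.items():
            if cid in used:
                continue                     # 每个当前帧实例最多被匹配一次
            d = ((x - px) ** 2 + (y - py) ** 2) ** 0.5
            if d < best_d:
                best, best_d = cid, d
        if best is not None:
            matches[pid] = best
            used.add(best)
    return matches
```

这里用的是简单的贪心最近邻;真实系统中通常会换成全局最优的二分图匹配,但流补偿后再比距离这一思路是一致的。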

CV-71-标题: Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

链接: https://arxiv.org/abs/2110.10596
作者: Reuben Tan, Bryan A. Plummer, Kate Saenko, Hailin Jin, Bryan Russell
备注: Accepted at NeurIPS 2021

点击查看摘要

Abstract: We introduce the task of spatially localizing narrated interactions in videos. Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations. To achieve this goal, we propose a multilayer cross-modal attention network that enables effective optimization of a contrastive loss during training. We introduce a divided strategy that alternates between computing inter- and intra-modal attention across the visual and natural language modalities, which allows effective training via directly contrasting the two modalities’ representations. We demonstrate the effectiveness of our approach by self-training on the HowTo100M instructional video dataset and evaluating on a newly collected dataset of localized described interactions in the YouCook2 dataset. We show that our approach outperforms alternative baselines, including shallow co-attention and full cross-modal attention. We also apply our approach to grounding phrases in images with weak supervision on Flickr30K and show that stacking multiple attention layers is effective and, when combined with a word-to-region loss, achieves state of the art on recall-at-one and pointing hand accuracies.

摘要:我们介绍了空间本地化叙述中的视频中的任务。我们的方法的关键是能够学会在与随附的叙述的视频中的大型视频中对自我监督进行空间地定位与自我监督的互动。为实现这一目标,我们提出了一种多层跨模型关注网络,可以在培训期间有效优化对比损失。我们介绍了一种分割的策略,可以通过视觉和自然语言方式计算和中间模态注意力之间的交替,这允许通过直接对比两种方式的表示来实现有效的培训。我们展示了我们对HOWTO100M教学数据集的自我训练的方法的有效性,并在YouCook2 DataSet中的本地化描述交互的新收集数据集上进行评估。我们展示了我们的方法优于替代基准,包括浅薄的共同关注和完全跨越的关注。我们还将我们的方法应用于在Flickr30k上的弱监管下的图像中的接地短语,并显示堆叠多个注意层是有效的,并且当与对区域丢失相结合时,在召回召回和指向时达到最先进的艺术状态手准确性。

CV-72-标题: A Learning Framework for Diffeomorphic Image Registration based on Quasi-conformal Geometry

链接: https://arxiv.org/abs/2110.10580
作者: Qiguang Chen, Zhiwen Li, Lok Ming Lui
备注:

点击查看摘要

Abstract: Image registration, the process of defining meaningful correspondences between images, is essential for various image analysis tasks, especially medical imaging. Numerous learning-based methods, notably convolutional neural networks (CNNs), for deformable image registration proposed in recent years have demonstrated the feasibility and superiority of deep learning techniques for registration problems. Besides, compared to traditional algorithms’ optimization scheme of the objective function for each image pair, learning-based algorithms are several orders of magnitude faster. However, these data-driven methods without proper constraint on the deformation field will easily lead to topological foldings. To tackle this problem, We propose the quasi-conformal registration network (QCRegNet), an unsupervised learning framework, to obtain diffeomorphic 2D image registrations with large deformations based on quasi-conformal (QC) map, an orientation-preserving homeomorphism between two manifolds. The basic idea is to design a CNN mapping image pairs to deformation fields. QCRegNet consists of the estimator network and the Beltrami solver network (BSNet). The estimator network takes image pair as input and outputs the Beltrami coefficient (BC). The BC, which captures conformal distortion of a QC map and guarantees the bijectivity, will then be input to the BSNet, a task-independent network which reconstructs the desired QC map. Furthermore, we reduce the number of network parameters and computational complexity by utilizing Fourier approximation to compress BC. Experiments have been carried out on different data such as underwater and medical images. Registration results show that the registration accuracy is comparable to state-of-the-art methods and diffeomorphism is to a great extent guaranteed compared to other diffeomorphic registration algorithms.

摘要:图像配准,即在图像之间定义有意义对应关系的过程,对各种图像分析任务(尤其是医学成像)至关重要。近年来提出的众多基于学习的可变形图像配准方法(尤其是卷积神经网络,CNN)已经证明了深度学习技术在配准问题上的可行性和优越性。此外,与传统算法对每个图像对单独优化目标函数的方案相比,基于学习的算法要快几个数量级。但是,这些数据驱动方法若不对形变场施加适当约束,很容易导致拓扑折叠。为了解决这个问题,我们提出准共形配准网络(QCRegNet),一个无监督学习框架,基于准共形(QC)映射(一种两个流形之间保持定向的同胚)来获得具有大形变的微分同胚2D图像配准。基本思想是设计一个将图像对映射到形变场的CNN。QCRegNet由估计器网络和Beltrami求解器网络(BSNet)组成。估计器网络以图像对为输入,输出Beltrami系数(BC)。BC刻画QC映射的共形失真并保证双射性,随后被输入到BSNet——一个与具体任务无关、用于重建所需QC映射的网络。此外,我们利用傅里叶近似压缩BC,从而减少网络参数数量和计算复杂度。我们在水下图像和医学图像等不同数据上进行了实验。配准结果表明,其配准精度可与最先进方法相媲美,并且与其他微分同胚配准算法相比,微分同胚性在很大程度上得到保证。

CV-73-标题: Inference Graphs for CNN Interpretation

链接: https://arxiv.org/abs/2110.10568
作者: Yael Konforti, Alon Shpigler, Boaz Lerner, Aharon Bar-Hillel
备注:

点击查看摘要

Abstract: Convolutional neural networks (CNNs) have achieved superior accuracy in many visual related tasks. However, the inference process through intermediate layers is opaque, making it difficult to interpret such networks or develop trust in their operation. We propose to model the network hidden layers activity using probabilistic models. The activity patterns in layers of interest are modeled as Gaussian mixture models, and transition probabilities between clusters in consecutive modeled layers are estimated. Based on maximum-likelihood considerations, nodes and paths relevant for network prediction are chosen, connected, and visualized as an inference graph. We show that such graphs are useful for understanding the general inference process of a class, as well as explaining decisions the network makes regarding specific images.

摘要:卷积神经网络(CNNS)在许多视觉相关任务中取得了卓越的准确性。然而,通过中间层的推动过程是不透明的,使得难以解释这种网络或在其操作中发展信任。我们建议使用概率模型来模拟网络隐藏的层活动。兴趣层中的活动模式被建模为高斯混合模型,并且估计了连续建模层中的簇之间的转换概率。基于最大似然考虑,选择,连接和可视化与网络预测相关的节点和路径作为推理图。我们表明,这些图表可用于理解类的一般推理过程,以及解释网络对特定图像的决策。
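相邻建模层中簇与簇之间的转移概率可以由簇标签的共现计数估计得到;下面是一个极简示意(假设两层的簇标签已由各自的高斯混合模型对同一批图像给出):

```python
from collections import Counter


def transition_probabilities(assign_a, assign_b):
    """估计相邻两层簇之间的转移概率。
    assign_a / assign_b:同一批样本在层 A、层 B 上的簇标签序列。
    返回 {(i, j): P(层B的簇j | 层A的簇i)}。"""
    pair_counts = Counter(zip(assign_a, assign_b))  # 共现计数
    from_counts = Counter(assign_a)                 # 层 A 各簇的出现次数
    return {(i, j): c / from_counts[i] for (i, j), c in pair_counts.items()}
```

高转移概率的簇对即可作为推理图中的候选边,再按最大似然筛选出与网络预测相关的节点和路径。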

CV-74-标题: Fingerprint recognition with embedded presentation attacks detection: are we ready?

链接: https://arxiv.org/abs/2110.10567
作者: Marco Micheletto, Gian Luca Marcialis, Giulia Orrù, Fabio Roli
备注:

点击查看摘要

Abstract: The diffusion of fingerprint verification systems for security applications makes it urgent to investigate the embedding of software-based presentation attack detection algorithms (PAD) into such systems. Companies and institutions need to know whether such integration would make the system more “secure” and whether the technology available is ready, and, if so, at what operational working conditions. Despite significant improvements, especially by adopting deep learning approaches to fingerprint PAD, current research did not state much about their effectiveness when embedded in fingerprint verification systems. We believe that the lack of works is explained by the lack of instruments to investigate the problem, that is, modeling the cause-effect relationships when two non-zero error-free systems work together. Accordingly, this paper explores the fusion of PAD into verification systems by proposing a novel investigation instrument: a performance simulator based on the probabilistic modeling of the relationships among the Receiver Operating Characteristics (ROC) of the two individual systems when PAD and verification stages are implemented sequentially. As a matter of fact, this is the most straightforward, flexible, and widespread approach. We carry out simulations on the PAD algorithms’ ROCs submitted to the most recent editions of LivDet (2017-2019), the state-of-the-art NIST Bozorth3, and the top-level Veryfinger 12 matchers. Reported experiments explore significant scenarios to get the conditions under which fingerprint matching with embedded PAD can improve, rather than degrade, the overall personal verification performance.

摘要:用于安全应用的指纹验证系统的扩散使得迫切需要研究基于软件的演示攻击检测算法(PAD)的嵌入到这种系统中。公司和机构需要知道这种整合是否会使系统更加“安全”以及可用的技术是否已准备就绪,如果是的话,在运行的工作条件下。尽管有重大改进,但特别是通过采用深度学习方法对指纹垫,目前的研究在嵌入在指纹验证系统中时,目前的研究不会对其有效性进行大大。我们认为,缺乏仪器缺乏调查问题的缺乏作品,即在两个非零无差错系统一起工作时,建模原因关系。因此,本文通过提出新颖的调查仪器探讨了垫进入验证系统的融合:一种基于垫和验证阶段的两个单独系统的接收器操作特性(ROC)之间关系的概率模拟的性能模拟器顺序地。事实上,这是最简单,灵活,普遍普遍的方法。我们对提交给最新版本的Livdet(2017-2019),最先进的NIST Bozorth3以及顶级非常福利12匹配者的模拟。报告的实验探讨了获取与嵌入式垫的指纹匹配的条件的重要情景可以改善,而不是降低整体个人验证性能。

CV-75-标题: Robust Monocular Localization in Sparse HD Maps Leveraging Multi-Task Uncertainty Estimation

链接: https://arxiv.org/abs/2110.10563
作者: Kürsat Petek, Kshitij Sirohi, Daniel Büscher, Wolfram Burgard
备注:

点击查看摘要

Abstract: Robust localization in dense urban scenarios using a low-cost sensor setup and sparse HD maps is highly relevant for the current advances in autonomous driving, but remains a challenging topic in research. We present a novel monocular localization approach based on a sliding-window pose graph that leverages predicted uncertainties for increased precision and robustness against challenging scenarios and per frame failures. To this end, we propose an efficient multi-task uncertainty-aware perception module, which covers semantic segmentation, as well as bounding box detection, to enable the localization of vehicles in sparse maps, containing only lane borders and traffic lights. Further, we design differentiable cost maps that are directly generated from the estimated uncertainties. This opens up the possibility to minimize the reprojection loss of amorphous map elements in an association free and uncertainty-aware manner. Extensive evaluation on the Lyft 5 dataset shows that, despite the sparsity of the map, our approach enables robust and accurate 6D localization in challenging urban scenarios

摘要:使用低成本传感器设置和稀疏高清地图的密集城市情景中的强大定位对自动驾驶的当前进步非常相关,但仍然是研究中有挑战性的话题。我们提出了一种基于滑动窗姿势图的新型单眼定位方法,其利用预测的不确定性来提高对具有挑战性的情景和每个帧故障的精度和鲁棒性。为此,我们提出了一个有效的多任务不确定性感知的感知模块,其覆盖了语义分割,以及边界框检测,以使得稀疏地图中的车辆定位,其中包含车道边界和红绿灯。此外,我们设计了从估计的不确定性直接产生的可微分成本图。这开辟了最大限度地减少自由和不确定感知方式的非晶地图元素的重新注入丧失。 Lyft 5数据集的广泛评估显示,尽管地图的稀疏性,但我们的方法可以实现挑战性的城市情景中的强大和准确的6D本地化

CV-76-标题: Few-Shot Temporal Action Localization with Query Adaptive Transformer

链接: https://arxiv.org/abs/2110.10552
作者: Sauradip Nag, Xiatian Zhu, Tao Xiang
备注: BMVC 2021

点击查看摘要

Abstract: Existing temporal action localization (TAL) works rely on a large number of training videos with exhaustive segment-level annotation, preventing them from scaling to new classes. As a solution to this problem, few-shot TAL (FS-TAL) aims to adapt a model to a new class represented by as few as a single video. Existing FS-TAL methods assume trimmed training videos for new classes. However, this setting is not only unnatural, as actions are typically captured in untrimmed videos, but also ignores background video segments containing vital contextual cues for foreground action segmentation. In this work, we first propose a new FS-TAL setting by proposing to use untrimmed training videos. Further, a novel FS-TAL model is proposed which maximizes the knowledge transfer from training classes whilst enabling the model to be dynamically adapted to both the new class and each video of that class simultaneously. This is achieved by introducing a query adaptive Transformer in the model. Extensive experiments on two action localization benchmarks demonstrate that our method can outperform all the state of the art alternatives significantly in both single-domain and cross-domain scenarios. The source code can be found in this https URL

摘要:现有的时间行动本地化(TAL)依赖于具有详尽段级注释的大量培训视频,防止它们缩放到新类。作为解决这个问题的解决方案,很少拍摄的TAL(FS-TAL)旨在使模型调整到由单个视频代表的新类。退出FS-TAL方法假设为新课程修剪培训视频。但是,此设置不仅是不自然的动作通常在未经监控的视频中捕获,而且忽略包含用于前景动作分段的重要上下文提示的背景视频段。在这项工作中,我们首先提出了一种新的FS-TAL设置,提议使用未限制的培训视频。此外,提出了一种新颖的FS-TAL模型,其最大化来自训练类的知识转移,而使模型能够同时动态地适应该类的新类和每个视频。这是通过在模型中引入查询自适应变压器来实现的。在两个行动定位基准上的广泛实验表明,我们的方法可以在单域和跨域方案中显着优于所有最先进的替代方案。源代码可以在此HTTPS URL中找到

CV-77-标题: Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation

链接: https://arxiv.org/abs/2110.10546
作者: Qiming Hu, Xiaojie Guo
备注: Accepted to NeurIPS2021

点击查看摘要

Abstract: Single image reflection separation (SIRS), as a representative blind source separation task, aims to recover two layers, i.e., transmission and reflection, from one mixed observation, which is challenging due to the highly ill-posed nature. Existing deep learning based solutions typically restore the target layers individually, or with some concerns at the end of the output, barely taking into account the interaction across the two streams/branches. In order to utilize information more efficiently, this work presents a general yet simple interactive strategy, namely your trash is my treasure (YTMT), for constructing dual-stream decomposition networks. To be specific, we explicitly enforce the two streams to communicate with each other block-wisely. Inspired by the additive property between the two components, the interactive path can be easily built via transferring, instead of discarding, deactivated information by the ReLU rectifier from one stream to the other. Both ablation studies and experimental results on widely-used SIRS datasets are conducted to demonstrate the efficacy of YTMT, and reveal its superiority over other state-of-the-art alternatives. The implementation is quite simple and our code is publicly available at this https URL.

摘要:单图像反射分离(SIRS)作为一种代表性的盲源分离任务,旨在从一张混合观测中恢复两个图层,即透射层和反射层;由于问题高度病态,这一任务极具挑战性。现有的基于深度学习的方案通常单独恢复目标图层,或仅在输出端加入少量约束,几乎不考虑两个流/分支之间的交互。为了更高效地利用信息,本文提出一种通用而简单的交互策略,即"你的垃圾是我的宝藏"(YTMT),用于构建双流分解网络。具体而言,我们显式地让两个流逐块(block-wise)相互通信。受两个分量之间可加性的启发,交互路径可以简单地通过把被ReLU整流器置零的信息从一个流转移(而非丢弃)到另一个流来构建。我们在广泛使用的SIRS数据集上进行了消融研究和实验,证明了YTMT的有效性,并显示其优于其他最先进的替代方案。实现非常简单,代码已在this https URL公开。

CV-78-标题: Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning

链接: https://arxiv.org/abs/2110.10538
作者: Guocheng Qian, Hasan Abed Al Kader Hammoud, Guohao Li, Ali Thabet, Bernard Ghanem
备注: NeurIPS’21 Spotlight paper. code available at this https URL

点击查看摘要

Abstract: Access to 3D point cloud representations has been widely facilitated by LiDAR sensors embedded in various mobile devices. This has led to an emerging need for fast and accurate point cloud processing techniques. In this paper, we revisit and dive deeper into PointNet++, one of the most influential yet under-explored networks, and develop faster and more accurate variants of the model. We first present a novel Separable Set Abstraction (SA) module that disentangles the vanilla SA module used in PointNet++ into two separate learning stages: (1) learning channel correlation and (2) learning spatial correlation. The Separable SA module is significantly faster than the vanilla version, yet it achieves comparable performance. We then introduce a new Anisotropic Reduction function into our Separable SA module and propose an Anisotropic Separable SA (ASSA) module that substantially increases the network's accuracy. We later replace the vanilla SA modules in PointNet++ with the proposed ASSA module, and denote the modified network as ASSANet. Extensive experiments on point cloud classification, semantic segmentation, and part segmentation show that ASSANet outperforms PointNet++ and other methods, achieving much higher accuracy and faster speeds. In particular, ASSANet outperforms PointNet++ by 7.4 mIoU on S3DIS Area 5, while maintaining 1.6× faster inference speed on a single NVIDIA 2080Ti GPU. Our scaled ASSANet variant achieves 66.8 mIoU and outperforms KPConv, while being more than 54× faster.

摘要:嵌入各类移动设备的 LiDAR 传感器使 3D 点云表示的获取变得十分便利,这也带来了对快速而精确的点云处理技术的迫切需求。在本文中,我们重新审视并深入剖析了 PointNet++ 这一最具影响力但尚未被充分挖掘的网络之一,并开发出更快、更准确的模型变体。我们首先提出一种新颖的可分离集合抽象(Separable Set Abstraction, SA)模块,将 PointNet++ 中使用的原始 SA 模块解耦为两个独立的学习阶段:(1)学习通道相关性;(2)学习空间相关性。可分离 SA 模块在保持可比性能的同时,速度明显快于原始版本。随后,我们在可分离 SA 模块中引入新的各向异性归约函数,提出各向异性可分离 SA(ASSA)模块,大幅提升网络精度。我们进一步用所提出的 ASSA 模块替换 PointNet++ 中的原始 SA 模块,并将修改后的网络称为 ASSANet。在点云分类、语义分割和部件分割上的大量实验表明,ASSANet 优于 PointNet++ 等方法,精度更高、速度更快。特别地,ASSANet 在 S3DIS Area 5 上比 PointNet++ 高出 7.4 mIoU,同时在单块 NVIDIA 2080Ti GPU 上推理速度快 1.6 倍。我们的放大版 ASSANet 变体达到 66.8 mIoU,超过 KPConv,同时速度快 54 倍以上。
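
为直观说明"将通道学习与空间学习解耦"这一思路,下面给出一个极简的 NumPy 示意(纯属假设性的玩具实现,所用算子与论文中的 ASSA 模块并不等同):

```python
import numpy as np

def vanilla_sa(neighbors, W):
    """PointNet++ 式的集合抽象:先对每个邻居点做共享 MLP(这里简化为
    一层线性 + ReLU),再沿邻居维度做 max 池化。neighbors 形状为 (K, C_in)。"""
    return np.maximum(neighbors @ W, 0.0).max(axis=0)

def separable_sa(neighbors, W):
    """解耦后的示意:先沿邻居维度聚合(空间步骤),再做一次通道 MLP,
    MLP 的调用次数从"每个邻居一次"降为"每个中心点一次"。"""
    return np.maximum(neighbors.max(axis=0) @ W, 0.0)
```

两者输出维度相同(C_out),但解耦版本的矩阵乘法代价约为原来的 1/K,这正是摘要中"更快且性能可比"的直观来源。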

CV-79-标题: Improving Model Generalization by Agreement of Learned Representations from Data Augmentation

链接: https://arxiv.org/abs/2110.10536
作者: Rowel Atienza
备注: Accepted at WACV2022

点击查看摘要

Abstract: Data augmentation reduces the generalization error by forcing a model to learn invariant representations given different transformations of the input image. In computer vision, on top of the standard image processing functions, data augmentation techniques based on regional dropout such as CutOut, MixUp, and CutMix and policy-based selection such as AutoAugment demonstrated state-of-the-art (SOTA) results. With an increasing number of data augmentation algorithms being proposed, the focus is always on optimizing the input-output mapping while not realizing that there might be an untapped value in the transformed images with the same label. We hypothesize that by forcing the representations of two transformations to agree, we can further reduce the model generalization error. We call our proposed method Agreement Maximization or simply AgMax. With this simple constraint applied during training, empirical results show that data augmentation algorithms can further improve the classification accuracy of ResNet50 on ImageNet by up to 1.5%, WideResNet40-2 on CIFAR10 by up to 0.7%, WideResNet40-2 on CIFAR100 by up to 1.6%, and LeNet5 on Speech Commands Dataset by up to 1.4%. Experimental results further show that unlike other regularization terms such as label smoothing, AgMax can take advantage of the data augmentation to consistently improve model generalization by a significant margin. On downstream tasks such as object detection and segmentation on PascalVOC and COCO, AgMax pre-trained models outperforms other data augmentation methods by as much as 1.0mAP (box) and 0.5mAP (mask). Code is available at this https URL.

摘要:数据增强通过强制模型在输入图像的不同变换下学习不变表示来降低泛化误差。在计算机视觉中,除标准图像处理操作外,基于区域丢弃的数据增强技术(如 CutOut、MixUp、CutMix)以及基于策略搜索的方法(如 AutoAugment)都取得了最先进(SOTA)的结果。随着越来越多的数据增强算法被提出,人们的关注点始终放在优化输入-输出映射上,却没有意识到具有相同标签的变换图像之间可能存在尚未利用的价值。我们假设,通过强制两种变换的表示相互一致,可以进一步降低模型泛化误差。我们将所提方法称为一致性最大化(Agreement Maximization),简称 AgMax。在训练中加入这一简单约束后,实证结果表明,数据增强算法可以将 ResNet50 在 ImageNet 上的分类精度进一步提升至多 1.5%,将 WideResNet40-2 在 CIFAR10 上提升至多 0.7%、在 CIFAR100 上提升至多 1.6%,将 LeNet5 在 Speech Commands 数据集上提升至多 1.4%。实验结果进一步表明,与标签平滑等其他正则化项不同,AgMax 能够利用数据增强以显著幅度持续改善模型泛化。在 PascalVOC 和 COCO 上的目标检测与分割等下游任务中,AgMax 预训练模型比其他数据增强方法最多高出 1.0 mAP(box)和 0.5 mAP(mask)。代码可在此 HTTPS URL 中获得。
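
摘要中"强制两种变换的表示一致"的约束,可以用下面这个极简的 NumPy 草图来理解(一致性项此处用两个预测分布的均方距离示意,并非论文使用的精确形式;权重 weight 为假设的超参数):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def agmax_loss(logits_a, logits_b, labels, weight=1.0):
    """两个增强视图各自的交叉熵,加上一项使两个预测分布
    相互靠拢的一致性惩罚(示意用 MSE)。"""
    pa, pb = softmax(logits_a), softmax(logits_b)
    idx = np.arange(len(labels))
    ce = -0.5 * (np.log(pa[idx, labels]).mean() + np.log(pb[idx, labels]).mean())
    agreement = ((pa - pb) ** 2).sum(axis=1).mean()
    return ce + weight * agreement
```

当两个视图的预测完全一致时,一致性项为零,损失退化为普通交叉熵;预测分歧越大,额外惩罚越大。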

CV-80-标题: AniFormer: Data-driven 3D Animation with Transformer

链接: https://arxiv.org/abs/2110.10533
作者: Haoyu Chen, Hao Tang, Nicu Sebe, Guoying Zhao
备注: BMVC 2021

点击查看摘要

Abstract: We present a novel task, i.e., animating a target 3D object through the motion of a raw driving sequence. In previous works, extra auxiliary correlations between source and target meshes or intermedia factors are inevitable to capture the motions in the driving sequences. Instead, we introduce AniFormer, a novel Transformer-based architecture, that generates animated 3D sequences by directly taking the raw driving sequences and arbitrary same-type target meshes as inputs. Specifically, we customize the Transformer architecture for 3D animation that generates mesh sequences by integrating styles from target meshes and motions from the driving meshes. Besides, instead of the conventional single regression head in the vanilla Transformer, AniFormer generates multiple frames as outputs to preserve the sequential consistency of the generated meshes. To achieve this, we carefully design a pair of regression constraints, i.e., motion and appearance constraints, that can provide strong regularization on the generated mesh sequences. Our AniFormer achieves high-fidelity, realistic, temporally coherent animated results and outperforms compared state-of-the-art methods on benchmarks of diverse categories. Code is available: this https URL.

摘要:我们提出了一项新任务,即通过一段原始驱动序列的运动来驱动目标 3D 物体的动画。在以往工作中,为了捕捉驱动序列中的运动,源网格与目标网格之间的额外辅助关联或中间变量不可或缺。与之不同,我们提出 AniFormer,一种新颖的基于 Transformer 的架构,直接以原始驱动序列和任意同类型目标网格为输入,生成动画 3D 序列。具体而言,我们为 3D 动画定制了 Transformer 架构,通过融合来自目标网格的风格与来自驱动网格的运动来生成网格序列。此外,不同于原始 Transformer 中常规的单一回归头,AniFormer 一次生成多帧输出,以保持生成网格的时序一致性。为此,我们精心设计了一对回归约束,即运动约束与外观约束,为生成的网格序列提供强正则化。AniFormer 取得了高保真、逼真且时序连贯的动画结果,在多个不同类别的基准上优于对比的最先进方法。代码可用:此 HTTPS URL。

CV-81-标题: Detecting and Identifying Optical Signal Attacks on Autonomous Driving Systems

链接: https://arxiv.org/abs/2110.10523
作者: Jindi Zhang, Yifan Zhang, Kejie Lu, Jianping Wang, Kui Wu, Xiaohua Jia, Bin Liu
备注:

点击查看摘要

Abstract: For autonomous driving, an essential task is to detect surrounding objects accurately. To this end, most existing systems use optical devices, including cameras and light detection and ranging (LiDAR) sensors, to collect environment data in real time. In recent years, many researchers have developed advanced machine learning models to detect surrounding objects. Nevertheless, the aforementioned optical devices are vulnerable to optical signal attacks, which could compromise the accuracy of object detection. To address this critical issue, we propose a framework to detect and identify sensors that are under attack. Specifically, we first develop a new technique to detect attacks on a system that consists of three sensors. Our main idea is to: 1) use data from three sensors to obtain two versions of depth maps (i.e., disparity) and 2) detect attacks by analyzing the distribution of disparity errors. In our study, we use real data sets and the state-of-the-art machine learning model to evaluate our attack detection scheme and the results confirm the effectiveness of our detection method. Based on the detection scheme, we further develop an identification model that is capable of identifying up to n-2 attacked sensors in a system with one LiDAR and n cameras. We prove the correctness of our identification scheme and conduct experiments to show the accuracy of our identification method. Finally, we investigate the overall sensitivity of our framework.

摘要:对于自动驾驶而言,准确检测周围物体是一项基本任务。为此,大多数现有系统使用光学设备(包括相机和激光雷达(LiDAR)传感器)实时采集环境数据。近年来,许多研究者开发了先进的机器学习模型来检测周围物体。然而,上述光学设备容易受到光学信号攻击,从而损害目标检测的准确性。为解决这一关键问题,我们提出了一个检测并识别受攻击传感器的框架。具体而言,我们首先开发了一种新技术,用于检测由三个传感器组成的系统所受的攻击。其主要思想是:1)利用三个传感器的数据获得两个版本的深度图(即视差);2)通过分析视差误差的分布来检测攻击。在研究中,我们使用真实数据集和最先进的机器学习模型评估所提出的攻击检测方案,结果证实了检测方法的有效性。在该检测方案的基础上,我们进一步开发了一种识别模型,能够在配备一个 LiDAR 和 n 个相机的系统中识别出至多 n-2 个受攻击的传感器。我们证明了识别方案的正确性,并通过实验展示了识别方法的准确性。最后,我们考察了整个框架的灵敏度。
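
摘要中"比较两个版本的视差并分析误差分布"的思想,可以用如下玩具代码示意(判别统计量与阈值均为本文之外的假设,仅为说明,并非论文的完整检测方案):

```python
import numpy as np

def attack_suspected(disp_a, disp_b, tau=1.0):
    """disp_a、disp_b 是由不同传感器组合得到的两幅视差图。
    正常情况下两者应基本一致;若误差分布明显偏移
    (这里简化为平均绝对误差超过阈值 tau),则怀疑存在攻击。"""
    err = np.abs(np.asarray(disp_a, dtype=float) - np.asarray(disp_b, dtype=float))
    return float(err.mean()) > tau
```

传感器噪声只会带来小而平稳的视差误差,而针对某一传感器的光学攻击会使其中一幅视差图整体偏离,从而改变误差分布。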

CV-82-标题: Event Guided Depth Sensing

链接: https://arxiv.org/abs/2110.10505
作者: Manasi Muglikar, Diederik Paul Moeys, Davide Scaramuzza
备注:

点击查看摘要

Abstract: Active depth sensors like structured light, lidar, and time-of-flight systems sample the depth of the entire scene uniformly at a fixed scan rate. This leads to limited spatio-temporal resolution where redundant static information is over-sampled and precious motion information might be under-sampled. In this paper, we present an efficient bio-inspired event-camera-driven depth estimation algorithm. In our approach, we dynamically illuminate areas of interest densely, depending on the scene activity detected by the event camera, and sparsely illuminate areas in the field of view with no motion. The depth estimation is achieved by an event-based structured light system consisting of a laser point projector coupled with a second event-based sensor tuned to detect the reflection of the laser from the scene. We show the feasibility of our approach in a simulated autonomous driving scenario and real indoor sequences using our prototype. We show that, in natural scenes like autonomous driving and indoor environments, moving edges correspond to less than 10% of the scene on average. Thus our setup requires the sensor to scan only 10% of the scene, which could lead to almost 90% less power consumption by the illumination source. While we present the evaluation and proof-of-concept for an event-based structured-light system, the ideas presented here are applicable for a wide range of depth-sensing modalities like LIDAR, time-of-flight, and standard stereo.

摘要:结构光、激光雷达和飞行时间系统等主动深度传感器以固定扫描速率对整个场景的深度进行均匀采样。这导致时空分辨率受限:冗余的静态信息被过度采样,而宝贵的运动信息可能采样不足。本文提出一种高效的、受生物启发的事件相机驱动的深度估计算法。在我们的方法中,根据事件相机检测到的场景活动,我们对感兴趣区域进行密集的动态照明,而对视野中无运动的区域仅进行稀疏照明。深度估计由一个基于事件的结构光系统完成,该系统由一个激光点投影仪和一个经调谐以检测场景中激光反射的第二个事件传感器组成。我们使用原型系统,在仿真的自动驾驶场景和真实室内序列中展示了该方法的可行性。我们发现,在自动驾驶和室内环境等自然场景中,运动边缘平均仅占场景的不到 10%。因此,我们的方案只需传感器扫描 10% 的场景,照明源的功耗可因此降低近 90%。虽然我们给出的是基于事件的结构光系统的评估与概念验证,但这里提出的思想同样适用于 LiDAR、飞行时间和标准立体视觉等多种深度感知模态。

CV-83-标题: STALP: Style Transfer with Auxiliary Limited Pairing

链接: https://arxiv.org/abs/2110.10501
作者: David Futschik, Michal Kučera, Michal Lukáč, Zhaowen Wang, Eli Shechtman, Daniel Sýkora
备注: Eurographics 2021

点击查看摘要

Abstract: We present an approach to example-based stylization of images that uses a single pair of a source image and its stylized counterpart. We demonstrate how to train an image translation network that can perform real-time semantically meaningful style transfer to a set of target images with similar content as the source image. A key added value of our approach is that it considers also consistency of target images during training. Although those have no stylized counterparts, we constrain the translation to keep the statistics of neural responses compatible with those extracted from the stylized source. In contrast to concurrent techniques that use a similar input, our approach better preserves important visual characteristics of the source style and can deliver temporally stable results without the need to explicitly handle temporal consistency. We demonstrate its practical utility on various applications including video stylization, style transfer to panoramas, faces, and 3D models.

摘要:我们提出一种基于示例的图像风格化方法,只需一对源图像及其风格化版本。我们展示了如何训练一个图像翻译网络,使其能对一组与源图像内容相似的目标图像进行实时的、语义上有意义的风格迁移。该方法的一个关键附加价值在于训练时还考虑了目标图像之间的一致性:尽管这些目标图像没有对应的风格化版本,我们仍约束翻译过程,使其神经响应的统计特性与从风格化源图像提取的统计特性保持一致。与使用类似输入的同期技术相比,我们的方法能更好地保留源风格的重要视觉特征,并且无需显式处理时间一致性即可得到时间上稳定的结果。我们在多种应用中展示了其实用价值,包括视频风格化,以及向全景图、人脸和 3D 模型的风格迁移。

CV-84-标题: Deep Point Cloud Normal Estimation via Triplet Learning

链接: https://arxiv.org/abs/2110.10494
作者: Weijia Wang, Xuequan Lu, Dasith de Silva Edirimuni, Xiao Liu, Antonio Robles-Kelly
备注:

点击查看摘要

Abstract: Normal estimation on 3D point clouds is a fundamental problem in 3D vision and graphics. Current methods often show limited accuracy in predicting normals at sharp features (e.g., edges and corners) and less robustness to noise. In this paper, we propose a novel normal estimation method for point clouds. It consists of two phases: (a) feature encoding which learns representations of local patches, and (b) normal estimation that takes the learned representation as input and regresses the normal vector. We are motivated that local patches on isotropic and anisotropic surfaces have similar or distinct normals, and that separable features or representations can be learned to facilitate normal estimation. To realise this, we first construct triplets of local patches on 3D point cloud data, and design a triplet network with a triplet loss for feature encoding. We then design a simple network with several MLPs and a loss function to regress the normal vector. Despite having a smaller network size compared to most other methods, experimental results show that our method preserves sharp features and achieves better normal estimation results on CAD-like shapes.

摘要:3D 点云上的法向估计是 3D 视觉与图形学中的基础问题。现有方法在预测尖锐特征处(如边缘和角点)的法向时往往精度有限,且对噪声的鲁棒性较差。本文提出一种新颖的点云法向估计方法,由两个阶段组成:(a)特征编码,学习局部 patch 的表示;(b)法向估计,以学到的表示为输入,回归法向量。我们的出发点是:各向同性与各向异性表面上的局部 patch 具有相似或不同的法向,因此可以学习可分离的特征或表示来促进法向估计。为此,我们首先在 3D 点云数据上构建局部 patch 的三元组,并设计带三元组损失(triplet loss)的三元组网络进行特征编码;随后设计一个由若干 MLP 和一个损失函数组成的简单网络来回归法向量。尽管网络规模小于大多数其他方法,实验结果表明我们的方法能保留尖锐特征,并在类 CAD 形状上取得更好的法向估计结果。
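
文中用于特征编码的三元组损失,其标准形式可以写成如下草图(margin 取值仅为示意):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """拉近 anchor 与同类 patch 特征的距离,推远与异类 patch 的距离;
    只有当两距离之差小于 margin 时才产生损失。"""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)
```

在本文语境下,"同类"对应法向相近(例如同处各向同性表面)的局部 patch,"异类"对应法向差异较大的 patch,从而促使编码器学到可分离的表示。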

CV-85-标题: Unified Style Transfer

链接: https://arxiv.org/abs/2110.10481
作者: Guanjie Huang, Hongjian He, Xiang Li, Xingchen Li, Ziang Liu
备注: 9 pages, 5 figures

点击查看摘要

Abstract: Currently, it is hard to compare and evaluate different style transfer algorithms due to chaotic definitions of style and the absence of agreed objective validation methods in the study of style transfer. In this paper, a novel approach, the Unified Style Transfer (UST) model, is proposed. With the introduction of a generative model for internal style representation, UST can transfer images in two approaches, i.e., Domain-based and Image-based, simultaneously. At the same time, a new philosophy based on the human sense of art and style distributions for evaluating the transfer model is presented and demonstrated, called Statistical Style Analysis. It provides a new path to validate style transfer models’ feasibility by validating the general consistency between internal style representation and art facts. Besides, the translation-invariance of AdaIN features is also discussed.

摘要:目前,由于风格定义混乱,且风格迁移研究中缺乏公认的客观验证方法,不同风格迁移算法难以比较和评估。本文提出了一种新方法:统一风格迁移(Unified Style Transfer, UST)模型。通过引入针对内部风格表示的生成式模型,UST 可以同时以两种方式迁移图像,即基于域(Domain-based)的方式和基于图像(Image-based)的方式。同时,本文提出并演示了一种基于人类艺术感受与风格分布的新的模型评估理念,称为统计风格分析(Statistical Style Analysis)。它通过验证内部风格表示与艺术事实之间的总体一致性,为检验风格迁移模型的可行性提供了一条新途径。此外,文中还讨论了 AdaIN 特征的平移不变性。

CV-86-标题: Noisy Annotation Refinement for Object Detection

链接: https://arxiv.org/abs/2110.10456
作者: Jiafeng Mao, Qing Yu, Yoko Yamakata, Kiyoharu Aizawa
备注:

点击查看摘要

Abstract: Supervised training of object detectors requires well-annotated large-scale datasets, whose production is costly. Therefore, some efforts have been made to obtain annotations in economical ways, such as cloud sourcing. However, datasets obtained by these methods tend to contain noisy annotations such as inaccurate bounding boxes and incorrect class labels. In this study, we propose a new problem setting of training object detectors on datasets with entangled noises of annotations of class labels and bounding boxes. Our proposed method efficiently decouples the entangled noises, corrects the noisy annotations, and subsequently trains the detector using the corrected annotations. We verified the effectiveness of our proposed method and compared it with the baseline on noisy datasets with different noise levels. The experimental results show that our proposed method significantly outperforms the baseline.

摘要:目标检测器的监督训练需要标注良好的大规模数据集,而其构建成本高昂。因此,人们尝试以较经济的方式获取标注,例如众包。然而,通过这些方式获得的数据集往往包含噪声标注,例如不准确的边界框和错误的类别标签。本研究提出了一个新的问题设定:在类别标签噪声与边界框噪声相互纠缠的数据集上训练目标检测器。我们提出的方法能有效地解耦这两类纠缠的噪声,纠正有噪标注,随后使用纠正后的标注训练检测器。我们验证了所提方法的有效性,并在不同噪声水平的有噪数据集上与基线进行了比较。实验结果表明,所提方法显著优于基线。

CV-87-标题: Moiré Attack (MA): A New Potential Risk of Screen Photos

链接: https://arxiv.org/abs/2110.10444
作者: Dantong Niu, Ruohao Guo, Yisen Wang
备注: NeurIPS 2021

点击查看摘要

Abstract: Images, captured by a camera, play a critical role in training Deep Neural Networks (DNNs). Usually, we assume the images acquired by cameras are consistent with the ones perceived by human eyes. However, due to the different physical mechanisms between human-vision and computer-vision systems, the final perceived images could be very different in some cases, for example shooting on digital monitors. In this paper, we find a special phenomenon in digital image processing, the moiré effect, that could cause unnoticed security threats to DNNs. Based on it, we propose a Moiré Attack (MA) that generates the physical-world moiré pattern adding to the images by mimicking the shooting process of digital devices. Extensive experiments demonstrate that our proposed digital Moiré Attack (MA) is a perfect camouflage for attackers to tamper with DNNs with a high success rate (100.0% for untargeted and 97.0% for targeted attack with the noise budget ε = 4), high transferability rate across different models, and high robustness under various defenses. Furthermore, MA owns great stealthiness because the moiré effect is unavoidable due to the camera's inner physical structure, which therefore hardly attracts the awareness of humans. Our code is available at this https URL.

摘要:相机拍摄的图像在训练深度神经网络(DNN)中起着关键作用。通常我们假设相机获取的图像与人眼感知到的图像一致。然而,由于人类视觉系统与计算机视觉系统的物理机制不同,在某些情况下(例如翻拍数字显示器)最终感知到的图像可能差别很大。在本文中,我们发现了数字图像处理中的一种特殊现象,即摩尔纹效应(moiré effect),它可能给 DNN 带来未被注意的安全威胁。基于此,我们提出了摩尔纹攻击(Moiré Attack, MA):通过模拟数字设备的拍摄过程,在图像上生成物理世界中的摩尔纹图案。大量实验表明,我们提出的数字摩尔纹攻击是攻击者篡改 DNN 的完美伪装,具有很高的成功率(在噪声预算 ε = 4 下,无目标攻击为 100.0%,有目标攻击为 97.0%)、跨不同模型的高可迁移性,以及在各种防御下的高鲁棒性。此外,由于相机内部物理结构导致摩尔纹效应不可避免,MA 具有极强的隐蔽性,很难引起人类的警觉。我们的代码可在此 HTTPS URL 上获得。
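
为帮助理解摩尔纹叠加的大致形态,下面用一个正弦条纹粗略模拟屏幕像素网格与相机采样网格干涉产生的条纹(参数与生成方式均为示意,并非论文中对完整拍摄流程的模拟):

```python
import numpy as np

def add_moire(img, freq=0.3, amp=8.0):
    """在 (H, W, 3) 的图像上叠加一个斜向正弦条纹,作为摩尔纹的
    粗略近似;freq 控制条纹频率,amp 控制条纹强度。"""
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    pattern = amp * np.sin(2.0 * np.pi * freq * (xx + 0.7 * yy))
    return np.clip(img + pattern[..., None], 0, 255)
```

真实摩尔纹由两套近似周期结构的采样干涉产生,频率和走向取决于屏幕与相机的相对姿态,这也是论文指出其"物理上不可避免"的原因。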

CV-88-标题: A unifying framework for n-dimensional quasi-conformal mappings

链接: https://arxiv.org/abs/2110.10437
作者: Daoping Zhang, Gary P. T. Choi, Jianping Zhang, Lok Ming Lui
备注:

点击查看摘要

Abstract: With the advancement of computer technology, there is a surge of interest in effective mapping methods for objects in higher-dimensional spaces. To establish a one-to-one correspondence between objects, higher-dimensional quasi-conformal theory can be utilized for ensuring the bijectivity of the mappings. In addition, it is often desirable for the mappings to satisfy certain prescribed geometric constraints and possess low distortion in conformality or volume. In this work, we develop a unifying framework for computing n-dimensional quasi-conformal mappings. More specifically, we propose a variational model that integrates quasi-conformal distortion, volumetric distortion, landmark correspondence, intensity mismatch and volume prior information to handle a large variety of deformation problems. We further prove the existence of a minimizer for the proposed model and devise efficient numerical methods to solve the optimization problem. We demonstrate the effectiveness of the proposed framework using various experiments in two- and three-dimensions, with applications to medical image registration, adaptive remeshing and shape modeling.

摘要:随着计算机技术的进步,人们对高维空间中对象的有效映射方法的兴趣激增。为了在对象之间建立一一对应关系,可以利用高维拟共形(quasi-conformal)理论来保证映射的双射性。此外,通常还希望映射满足某些给定的几何约束,并在共形性或体积方面具有较低的畸变。在这项工作中,我们为计算 n 维拟共形映射建立了一个统一框架。更具体地,我们提出一个变分模型,整合了拟共形畸变、体积畸变、地标对应、灰度失配以及体积先验信息,以处理多种多样的形变问题。我们进一步证明了所提模型极小元的存在性,并设计了求解该优化问题的高效数值方法。我们通过二维和三维的多种实验展示了所提框架的有效性,并将其应用于医学图像配准、自适应网格重划分和形状建模。

CV-89-标题: Depth360: Monocular Depth Estimation using Learnable Axisymmetric Camera Model for Spherical Camera Image

链接: https://arxiv.org/abs/2110.10415
作者: Noriaki Hirose, Kosuke Tahara
备注: 8 pages, 6 figures, 2 tables

点击查看摘要

Abstract: Self-supervised monocular depth estimation has been widely investigated to estimate depth images and relative poses from RGB images. This framework is attractive for researchers because the depth and pose networks can be trained from just time sequence images without the need for the ground truth depth and poses. In this work, we estimate the depth around a robot (360 degree view) using time sequence spherical camera images, from a camera whose parameters are unknown. We propose a learnable axisymmetric camera model which accepts distorted spherical camera images with two fisheye camera images. In addition, we trained our models with a photo-realistic simulator to generate ground truth depth images to provide supervision. Moreover, we introduced loss functions to provide floor constraints to reduce artifacts that can result from reflective floor surfaces. We demonstrate the efficacy of our method using the spherical camera images from the GO Stanford dataset and pinhole camera images from the KITTI dataset to compare our method’s performance with that of baseline method in learning the camera parameters.

摘要:自监督单目深度估计已被广泛研究,用于从 RGB 图像估计深度图和相对位姿。该框架对研究者颇具吸引力,因为深度网络和位姿网络仅需时间序列图像即可训练,而无需真值深度和位姿。在这项工作中,我们利用时间序列的球面相机图像,从一个参数未知的相机出发,估计机器人周围(360 度视角)的深度。我们提出一种可学习的轴对称相机模型,可接受带畸变的球面相机图像以及两张鱼眼相机图像。此外,我们借助照片级真实感的仿真器生成真值深度图像来提供监督,以训练模型。我们还引入了提供地面约束的损失函数,以减少反光地面可能造成的伪影。我们分别使用 GO Stanford 数据集的球面相机图像和 KITTI 数据集的针孔相机图像,将我们的方法与基线方法在学习相机参数方面的性能进行比较,证明了方法的有效性。

CV-90-标题: ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

链接: https://arxiv.org/abs/2110.10405
作者: Humen Zhong, Jun Tang, Wenhai Wang, Zhibo Yang, Cong Yao, Tong Lu
备注:

点击查看摘要

Abstract: Recent approaches for end-to-end text spotting have achieved promising results. However, most of the current spotters were plagued by the inconsistency problem between text detection and recognition. In this work, we introduce and prove the existence of the inconsistency problem and analyze it from two aspects: (1) inconsistency of text recognition features between training and testing, and (2) inconsistency of optimization targets between text detection and recognition. To solve the aforementioned issues, we propose a differentiable Auto-Rectification Module (ARM) together with a new training strategy to enable propagating recognition loss back into detection branch, so that our detection branch can be jointly optimized by detection and recognition targets, which largely alleviates the inconsistency problem between text detection and recognition. Based on these designs, we present a simple yet robust end-to-end text spotting framework, termed Auto-Rectification Text Spotter (ARTS), to detect and recognize arbitrarily-shaped text in natural scenes. Extensive experiments demonstrate the superiority of our method. In particular, our ARTS-S achieves 77.1% end-to-end text spotting F-measure on Total-Text at a competitive speed of 10.5 FPS, which significantly outperforms previous methods in both accuracy and inference speed.

摘要:近来的端到端文本检测识别(text spotting)方法已取得可喜的成果。然而,当前的大多数方法都受到文本检测与识别之间不一致问题的困扰。在这项工作中,我们指出并证明了这种不一致问题的存在,并从两个方面进行分析:(1)训练与测试之间文本识别特征的不一致;(2)文本检测与识别之间优化目标的不一致。为解决上述问题,我们提出了一个可微的自动矫正模块(Auto-Rectification Module, ARM),并配合一种新的训练策略,使识别损失能够反向传播到检测分支,从而让检测分支由检测与识别目标联合优化,在很大程度上缓解了文本检测与识别之间的不一致问题。基于这些设计,我们提出了一个简单而鲁棒的端到端文本检测识别框架,称为自动矫正文本检测识别器(Auto-Rectification Text Spotter, ARTS),用于检测并识别自然场景中任意形状的文本。大量实验证明了我们方法的优越性。特别地,我们的 ARTS-S 在 Total-Text 上以 10.5 FPS 的有竞争力的速度达到了 77.1% 的端到端文本检测识别 F-measure,在精度和推理速度上均显著优于以往方法。

CV-91-标题: 3DFaceFill: An Analysis-By-Synthesis Approach to Face Completion

链接: https://arxiv.org/abs/2110.10395
作者: Rahul Dey, Vishnu Boddeti
备注: Winter Conference on Applications of Computer Vision, WACV 2022

点击查看摘要

Abstract: Existing face completion solutions are primarily driven by end-to-end models that directly generate 2D completions of 2D masked faces. By having to implicitly account for geometric and photometric variations in facial shape and appearance, such approaches result in unrealistic completions, especially under large variations in pose, shape, illumination and mask sizes. To alleviate these limitations, we introduce 3DFaceFill, an analysis-by-synthesis approach for face completion that explicitly considers the image formation process. It comprises three components, (1) an encoder that disentangles the face into its constituent 3D mesh, 3D pose, illumination and albedo factors, (2) an autoencoder that inpaints the UV representation of facial albedo, and (3) a renderer that resynthesizes the completed face. By operating on the UV representation, 3DFaceFill affords the power of correspondence and allows us to naturally enforce geometrical priors (e.g. facial symmetry) more effectively. Quantitatively, 3DFaceFill improves the state-of-the-art by up to 4dB higher PSNR and 25% better LPIPS for large masks. And, qualitatively, it leads to demonstrably more photorealistic face completions over a range of masks and occlusions while preserving consistency in global and component-wise shape, pose, illumination and eye-gaze.

摘要:现有的人脸补全方案主要由端到端模型驱动,直接对 2D 遮挡人脸生成 2D 补全结果。由于只能隐式地处理人脸形状与外观中的几何和光度变化,这类方法会产生不真实的补全结果,尤其是在姿态、形状、光照和遮挡尺寸变化较大时。为缓解这些限制,我们提出 3DFaceFill,一种显式考虑成像过程的"分析-合成"(analysis-by-synthesis)人脸补全方法。它由三个组件构成:(1)编码器,将人脸解耦为其组成的 3D 网格、3D 姿态、光照和反照率(albedo)因子;(2)自编码器,对人脸反照率的 UV 表示进行修补;(3)渲染器,重新合成补全后的人脸。通过在 UV 表示上操作,3DFaceFill 获得了对应关系带来的优势,使我们能够更自然、更有效地施加几何先验(如人脸对称性)。定量上,3DFaceFill 将最先进水平的 PSNR 最多提高 4dB,在大遮挡下将 LPIPS 改善 25%。定性上,它在各种遮挡情形下生成明显更逼真的人脸补全结果,同时保持全局及各部件在形状、姿态、光照和视线方向上的一致性。

CV-92-标题: Does Data Repair Lead to Fair Models? Curating Contextually Fair Data To Reduce Model Bias

链接: https://arxiv.org/abs/2110.10389
作者: Sharat Agarwal, Sumanyu Muku, Saket Anand, Chetan Arora
备注: A variant of this report is accepted in WACV 2022

点击查看摘要

Abstract: Contextual information is a valuable cue for Deep Neural Networks (DNNs) to learn better representations and improve accuracy. However, co-occurrence bias in the training dataset may hamper a DNN model’s generalizability to unseen scenarios in the real world. For example, in COCO, many object categories have a much higher co-occurrence with men compared to women, which can bias a DNN’s prediction in favor of men. Recent works have focused on task-specific training strategies to handle bias in such scenarios, but fixing the available data is often ignored. In this paper, we propose a novel and more generic solution to address the contextual bias in the datasets by selecting a subset of the samples, which is fair in terms of the co-occurrence with various classes for a protected attribute. We introduce a data repair algorithm using the coefficient of variation, which can curate fair and contextually balanced data for a protected class(es). This helps in training a fair model irrespective of the task, architecture or training methodology. Our proposed solution is simple, effective, and can even be used in an active learning setting where the data labels are not present or being generated incrementally. We demonstrate the effectiveness of our algorithm for the task of object detection and multi-label image classification across different datasets. Through a series of experiments, we validate that curating contextually fair data helps make model predictions fair by balancing the true positive rate for the protected class across groups without compromising on the model’s overall performance.

摘要:上下文信息是帮助深度神经网络(DNN)学习更好表示并提升精度的宝贵线索。然而,训练数据集中的共现偏差可能损害 DNN 模型对现实世界中未见场景的泛化能力。例如,在 COCO 中,许多物体类别与男性的共现频率远高于与女性的共现频率,这会使 DNN 的预测偏向男性。近期工作侧重于通过针对特定任务的训练策略来处理此类偏差,却往往忽视了对现有数据本身的修复。在本文中,我们提出一种新颖且更通用的解决方案:通过选择样本子集来消除数据集中的上下文偏差,使受保护属性与各个类别的共现在该子集中保持公平。我们引入了一种基于变异系数(coefficient of variation)的数据修复算法,能够为受保护类别筛选出公平且上下文均衡的数据。这有助于训练公平的模型,而与任务、架构或训练方法无关。我们提出的方案简单有效,甚至可用于数据标签缺失或逐步生成的主动学习场景。我们在不同数据集上针对目标检测和多标签图像分类任务验证了算法的有效性。通过一系列实验,我们验证了筛选上下文公平的数据能够平衡受保护类别在不同群体间的真阳性率,从而使模型预测更公平,且不损害模型的整体性能。
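
摘要中"基于变异系数的数据修复"可以用下面的玩具流程示意:统计受保护属性各取值的共现计数,贪心地丢弃多数组样本,直到变异系数降到阈值以下(阈值与贪心策略均为本文之外的假设,并非论文算法的完整复现):

```python
import numpy as np
from collections import Counter

def coefficient_of_variation(counts):
    """变异系数 = 标准差 / 均值,衡量各组计数的不均衡程度。"""
    c = np.asarray(counts, dtype=float)
    return c.std() / c.mean()

def repair(groups, max_cv=0.05):
    """groups: 每个样本对应的受保护属性取值(如 'man'/'woman')。
    反复移除当前占多数的组中的一个样本,直到各组计数近似均衡。"""
    kept = list(groups)
    while True:
        counts = Counter(kept)
        if coefficient_of_variation(list(counts.values())) <= max_cv:
            return kept
        majority = max(counts, key=counts.get)
        kept.remove(majority)  # 移除多数组的第一个样本
```

变异系数为 0 表示各组与该类别的共现完全均衡;论文的算法在此思想上还需兼顾模型整体性能,这里仅保留最核心的一步。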

CV-93-标题: Model Composition: Can Multiple Neural Networks Be Combined into a Single Network Using Only Unlabeled Data?

链接: https://arxiv.org/abs/2110.10369
作者: Amin Banitalebi-Dehkordi, Xinyu Kang, Yong Zhang
备注: BMVC 2021

点击查看摘要

Abstract: The diversity of deep learning applications, datasets, and neural network architectures necessitates a careful selection of the architecture and data that match best to a target application. As an attempt to mitigate this dilemma, this paper investigates the idea of combining multiple trained neural networks using unlabeled data. In addition, combining multiple models into one can speed up the inference, result in stronger, more capable models, and allows us to select efficient device-friendly target network architectures. To this end, the proposed method makes use of generation, filtering, and aggregation of reliable pseudo-labels collected from unlabeled data. Our method supports using an arbitrary number of input models with arbitrary architectures and categories. Extensive performance evaluations demonstrated that our method is very effective. For example, for the task of object detection and without using any ground-truth labels, an EfficientDet-D0 trained on Pascal-VOC and an EfficientDet-D1 trained on COCO, can be combined to a RetinaNet-ResNet50 model, with a similar mAP as the supervised training. If fine-tuned in a semi-supervised setting, the combined model achieves +18.6%, +12.6%, and +8.1% mAP improvements over supervised training with 1%, 5%, and 10% of labels.

摘要:深度学习应用、数据集和神经网络架构的多样性,要求我们仔细选择与目标应用最匹配的架构和数据。为了缓解这一困境,本文研究了利用未标注数据合并多个已训练神经网络的思路。此外,将多个模型合并为一个模型可以加快推理速度、得到更强大更全面的模型,并允许我们选择高效的、适合部署设备的目标网络架构。为此,所提方法对从未标注数据上收集的可靠伪标签进行生成、过滤和聚合。我们的方法支持任意数量、任意架构和类别的输入模型。大量性能评估表明该方法非常有效。例如,在目标检测任务中,不使用任何真值标签,即可将一个在 Pascal-VOC 上训练的 EfficientDet-D0 和一个在 COCO 上训练的 EfficientDet-D1 合并为一个 RetinaNet-ResNet50 模型,其 mAP 与监督训练相当。若在半监督设定下进行微调,合并后的模型在使用 1%、5% 和 10% 标签时,相对监督训练分别取得 +18.6%、+12.6% 和 +8.1% 的 mAP 提升。
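
摘要中"生成、过滤和聚合可靠伪标签"的流程,其核心一步可以用如下草图说明(以分类为例;置信度阈值为假设值,论文实际面向检测任务,流程更复杂):

```python
import numpy as np

def aggregate_pseudo_labels(probs_per_model, threshold=0.9):
    """probs_per_model: 若干个形状为 (N, C) 的 softmax 输出,来自不同
    输入模型对同一批未标注数据的预测。对各模型取平均后,仅保留
    平均置信度超过阈值的样本及其伪标签,用于训练目标网络。"""
    mean_probs = np.mean(probs_per_model, axis=0)
    conf = mean_probs.max(axis=1)
    labels = mean_probs.argmax(axis=1)
    keep = np.nonzero(conf >= threshold)[0]
    return keep, labels[keep]
```

各输入模型意见分歧的样本平均置信度会被拉低而遭过滤,留下的伪标签因而更可靠,这正是"过滤"一步的作用。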

CV-94-标题: ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning

链接: https://arxiv.org/abs/2110.10368
作者: Hyuck Lee, Seungjae Shin, Heeyoung Kim
备注:

点击查看摘要

Abstract: Existing semi-supervised learning (SSL) algorithms typically assume class-balanced datasets, although the class distributions of many real-world datasets are imbalanced. In general, classifiers trained on a class-imbalanced dataset are biased toward the majority classes. This issue becomes more problematic for SSL algorithms because they utilize the biased prediction of unlabeled data for training. However, traditional class-imbalanced learning techniques, which are designed for labeled data, cannot be readily combined with SSL algorithms. We propose a scalable class-imbalanced SSL algorithm that can effectively use unlabeled data, while mitigating class imbalance by introducing an auxiliary balanced classifier (ABC) of a single layer, which is attached to a representation layer of an existing SSL algorithm. The ABC is trained with a class-balanced loss of a minibatch, while using high-quality representations learned from all data points in the minibatch using the backbone SSL algorithm to avoid overfitting and information loss. Moreover, we use consistency regularization, a recent SSL technique for utilizing unlabeled data in a modified way, to train the ABC to be balanced among the classes by selecting unlabeled data with the same probability for each class. The proposed algorithm achieves state-of-the-art performance in various class-imbalanced SSL experiments using four benchmark datasets.

摘要:现有的半监督学习(SSL)算法通常假设数据集类别均衡,但许多真实世界数据集的类别分布是不均衡的。一般而言,在类别不均衡数据集上训练的分类器会偏向多数类。这一问题对 SSL 算法更为严重,因为它们利用对未标注数据的有偏预测来进行训练。然而,为有标注数据设计的传统类别不均衡学习技术难以直接与 SSL 算法结合。我们提出一种可扩展的类别不均衡 SSL 算法:通过引入一个附加在现有 SSL 算法表示层之上的单层辅助均衡分类器(Auxiliary Balanced Classifier, ABC),在有效利用未标注数据的同时缓解类别不均衡。ABC 使用小批量上的类别均衡损失进行训练,同时借助骨干 SSL 算法从小批量中所有数据点学到的高质量表示,以避免过拟合和信息损失。此外,我们采用一致性正则化这一最近的 SSL 技术,并以修改后的方式利用未标注数据:按每个类别相同的概率选择未标注数据,以训练 ABC 在各类别间保持均衡。所提算法在使用四个基准数据集的各种类别不均衡 SSL 实验中取得了最先进的性能。
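
ABC 所用的"类别均衡损失",其基本思想是按类别在小批量中出现次数的倒数为样本加权,使少数类的贡献与多数类相当。下面是一个假设性的 NumPy 草图(具体形式以原论文为准):

```python
import numpy as np

def balanced_cross_entropy(probs, labels):
    """probs: (N, C) 的预测概率;labels: (N,) 的整数标签。
    每个样本的权重与其类别在小批量中的出现次数成反比,
    权重归一化后计算加权交叉熵。"""
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=probs.shape[1]).astype(float)
    w = 1.0 / counts[labels]          # 类别越稀少,单样本权重越大
    w = w / w.sum()
    nll = -np.log(probs[np.arange(len(labels)), labels])
    return float((w * nll).sum())
```

在类别均衡的批次上,它退化为普通的平均交叉熵;批次越不均衡,少数类样本获得的相对权重越大。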

CV-95-标题: Repaint: Improving the Generalization of Down-Stream Visual Tasks by Generating Multiple Instances of Training Examples

链接: https://arxiv.org/abs/2110.10366
作者: Amin Banitalebi-Dehkordi, Yong Zhang
备注: BMVC 2021

点击查看摘要

Abstract: Convolutional Neural Networks (CNNs) for visual tasks are believed to learn both the low-level textures and high-level object attributes, throughout the network depth. This paper further investigates the 'texture bias' in CNNs. To this end, we regenerate multiple instances of training examples from each original image, through a process we call 'repainting'. The repainted examples preserve the shape and structure of the regions and objects within the scenes, but diversify their texture and color. Our method can regenerate a same image at different daylight, season, or weather conditions, can have colorization or de-colorization effects, or even bring back some texture information from blacked-out areas. The in-place repaint allows us to further use these repainted examples for improving the generalization of CNNs. Through an extensive set of experiments, we demonstrate the usefulness of the repainted examples in training, for the tasks of image classification (ImageNet) and object detection (COCO), over several state-of-the-art network architectures at different capacities, and across different data availability regimes.

摘要:一般认为,用于视觉任务的卷积神经网络(CNN)会在网络的各个深度同时学习低层纹理与高层物体属性。本文进一步研究了 CNN 中的"纹理偏差"。为此,我们通过一个称为"重绘"(repainting)的过程,从每张原始图像再生成多个训练样本实例。重绘后的样本保留了场景中区域和物体的形状与结构,但使其纹理和颜色多样化。我们的方法可以在不同的日照、季节或天气条件下再生同一张图像,可以产生上色或去色效果,甚至能从涂黑区域恢复部分纹理信息。这种原位重绘使我们可以进一步利用重绘样本来改善 CNN 的泛化能力。通过大量实验,我们在图像分类(ImageNet)和目标检测(COCO)任务上、在多种不同容量的最先进网络架构上以及在不同的数据可用性条件下,展示了重绘样本在训练中的有用性。

CV-96-标题: NOD: Taking a Closer Look at Detection under Extreme Low-Light Conditions with Night Object Detection Dataset

链接: https://arxiv.org/abs/2110.10364
作者: Igor Morawski, Yu-An Chen, Yu-Sheng Lin, Winston H. Hsu
备注: 13 pages, 6 figures, to be published in BMVC 2021

点击查看摘要

Abstract: Recent work indicates that, besides being a challenge in producing perceptually pleasing images, low light proves more difficult for machine cognition than previously thought. In our work, we take a closer look at object detection in low light. First, to support the development and evaluation of new methods in this domain, we present a high-quality large-scale Night Object Detection (NOD) dataset showing dynamic scenes captured on the streets at night. Next, we directly link the lighting conditions to perceptual difficulty and identify what makes low light problematic for machine cognition. Accordingly, we provide instance-level annotation for a subset of the dataset for an in-depth evaluation of future methods. We also present an analysis of the baseline model performance to highlight opportunities for future research and show that low light is a non-trivial problem that requires special attention from the researchers. Further, to address the issues caused by low light, we propose to incorporate an image enhancement module into the object detection framework and two novel data augmentation techniques. Our image enhancement module is trained under the guidance of the object detector to learn image representation optimal for machine cognition rather than for the human visual system. Finally, experimental results confirm that the proposed method shows consistent improvement of the performance on low-light datasets.

摘要:最近的研究表明,低光照不仅给生成感知上令人满意的图像带来挑战,对机器认知而言也比以往认为的更加困难。在本工作中,我们深入研究了低光照下的目标检测。首先,为了支持该领域新方法的开发与评估,我们提出了一个高质量的大规模夜间目标检测(NOD)数据集,包含夜间街道上拍摄的动态场景。其次,我们将光照条件与感知难度直接关联,找出低光照对机器认知造成困难的原因。据此,我们为数据集的一个子集提供了实例级标注,以便对未来方法进行深入评估。我们还分析了基线模型的性能,以突出未来研究的机会,并表明低光照是一个需要研究者特别关注的非平凡问题。此外,为了解决低光照带来的问题,我们提出在目标检测框架中引入一个图像增强模块,以及两种新颖的数据增强技术。我们的图像增强模块在目标检测器的指导下训练,学习对机器认知而非人类视觉系统最优的图像表示。最后,实验结果证实,所提方法在低光照数据集上带来了一致的性能提升。

CV-97-标题: Dynamic Multi-Person Mesh Recovery From Uncalibrated Multi-View Cameras

链接: https://arxiv.org/abs/2110.10355
作者: Buzhen Huang, Yuan Shu, Tianshu Zhang, Yangang Wang
备注: 3DV 2021

点击查看摘要

Abstract: Dynamic multi-person mesh recovery has been a hot topic in 3D vision recently. However, few works focus on the multi-person motion capture from uncalibrated cameras, which mainly faces two challenges: the one is that inter-person interactions and occlusions introduce inherent ambiguities for both camera calibration and motion capture; The other is that a lack of dense correspondences can be used to constrain sparse camera geometries in a dynamic multi-person scene. Our key idea is incorporating motion prior knowledge into simultaneous optimization of extrinsic camera parameters and human meshes from noisy human semantics. First, we introduce a physics-geometry consistency to reduce the low and high frequency noises of the detected human semantics. Then a novel latent motion prior is proposed to simultaneously optimize extrinsic camera parameters and coherent human motions from slightly noisy inputs. Experimental results show that accurate camera parameters and human motions can be obtained through one-stage optimization. The codes will be publicly available at~\url{this https URL}.

摘要:动态多人网格恢复近来是三维视觉领域的热门话题。然而,很少有工作关注基于未标定相机的多人动作捕捉,其主要面临两个挑战:其一,人与人之间的交互和遮挡给相机标定和动作捕捉都带来了固有的歧义;其二,在动态多人场景中,缺乏可用于约束稀疏相机几何的稠密对应关系。我们的核心思想是将运动先验知识融入到相机外参和人体网格的联合优化中,而优化的输入是带噪声的人体语义信息。首先,我们引入一种物理-几何一致性,以降低检测到的人体语义中的低频和高频噪声。然后,我们提出一种新颖的潜在运动先验,从略带噪声的输入中同时优化相机外参和连贯的人体运动。实验结果表明,通过单阶段优化即可获得准确的相机参数和人体运动。代码将在 \url{this https URL} 公开。

CV-98-标题: Detecting Backdoor Attacks Against Point Cloud Classifiers

链接: https://arxiv.org/abs/2110.10354
作者: Zhen Xiang, David J. Miller, Siheng Chen, Xi Li, George Kesidis
备注:

点击查看摘要

Abstract: Backdoor attacks (BA) are an emerging threat to deep neural network classifiers. A classifier being attacked will predict to the attacker's target class when a test sample from a source class is embedded with the backdoor pattern (BP). Recently, the first BA against point cloud (PC) classifiers was proposed, creating new threats to many important applications including autonomous driving. Such PC BAs are not detectable by existing BA defenses due to their special BP embedding mechanism. In this paper, we propose a reverse-engineering defense that infers whether a PC classifier is backdoor attacked, without access to its training set or to any clean classifiers for reference. The effectiveness of our defense is demonstrated on the benchmark ModelNet40 dataset for PCs.

摘要:后门攻击(BA)是深度神经网络分类器面临的一种新兴威胁。当来自源类别的测试样本被嵌入后门模式(BP)时,被攻击的分类器会将其预测为攻击者指定的目标类别。最近,针对点云(PC)分类器的首个后门攻击被提出,给包括自动驾驶在内的许多重要应用带来了新的威胁。由于其特殊的后门模式嵌入机制,现有的后门防御方法无法检测到这类点云后门攻击。在本文中,我们提出一种逆向工程式的防御方法,可以在不访问训练集、也不需要任何干净分类器作参考的情况下,推断一个点云分类器是否遭受了后门攻击。我们在点云基准数据集 ModelNet40 上验证了该防御方法的有效性。

CV-99-标题: Contextual Gradient Scaling for Few-Shot Learning

链接: https://arxiv.org/abs/2110.10353
作者: Sanghyuk Lee, Seunghyun Lee, Byung Cheol Song
备注: Accepted to WACV2022

点击查看摘要

Abstract: Model-agnostic meta-learning (MAML) is a well-known optimization-based meta-learning algorithm that works well in various computer vision tasks, e.g., few-shot classification. MAML is to learn an initialization so that a model can adapt to a new task in a few steps. However, since the gradient norm of a classifier (head) is much bigger than those of backbone layers, the model focuses on learning the decision boundary of the classifier with similar representations. Furthermore, gradient norms of high-level layers are small than those of the other layers. So, the backbone of MAML usually learns task-generic features, which results in deteriorated adaptation performance in the inner-loop. To resolve or mitigate this problem, we propose contextual gradient scaling (CxGrad), which scales gradient norms of the backbone to facilitate learning task-specific knowledge in the inner-loop. Since the scaling factors are generated from task-conditioned parameters, gradient norms of the backbone can be scaled in a task-wise fashion. Experimental results show that CxGrad effectively encourages the backbone to learn task-specific knowledge in the inner-loop and improves the performance of MAML up to a significant margin in both same- and cross-domain few-shot classification.

摘要:模型无关元学习(MAML)是一种著名的基于优化的元学习算法,在少样本分类等多种计算机视觉任务中表现良好。MAML 的目标是学习一个初始化,使模型只需少量步骤即可适应新任务。然而,由于分类器(head)的梯度范数远大于骨干网络各层的梯度范数,模型会侧重于在相似的表示之上学习分类器的决策边界;此外,高层的梯度范数又小于其他层。因此,MAML 的骨干网络通常只学到与具体任务无关的通用特征,导致内循环中的适应性能下降。为了解决或缓解这一问题,我们提出上下文梯度缩放(CxGrad),通过缩放骨干网络的梯度范数,促进内循环中任务特定知识的学习。由于缩放因子由任务条件化的参数生成,骨干网络的梯度范数可以按任务进行缩放。实验结果表明,CxGrad 能有效促使骨干网络在内循环中学习任务特定知识,并在同域和跨域少样本分类中显著提升 MAML 的性能。
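按照摘要的描述,CxGrad 在 MAML 内循环更新时对骨干网络各层的梯度施加任务相关的缩放。下面是一个极简的纯 Python 草图(基于摘要的理解;论文中 `scales` 的缩放因子由任务条件化参数生成,此处直接给定,并非官方实现):

```python
def inner_update(params, grads, lr, scales):
    """一步 MAML 内循环更新:骨干层的梯度乘以任务相关的缩放因子,
    未在 scales 中出现的层(如分类头)缩放因子默认为 1。"""
    new_params = {}
    for name, w in params.items():
        s = scales.get(name, 1.0)
        new_params[name] = [wi - lr * s * gi
                            for wi, gi in zip(w, grads[name])]
    return new_params

# 玩具示例:骨干层梯度被放大 2 倍,分类头不变
params = {"backbone": [1.0, 2.0], "head": [0.5]}
grads = {"backbone": [0.1, 0.1], "head": [0.2]}
updated = inner_update(params, grads, lr=1.0, scales={"backbone": 2.0})
```

外循环仍按标准 MAML 对初始化进行元更新;缩放只作用于内循环的适应步骤。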

CV-100-标题: GTM: Gray Temporal Model for Video Recognition

链接: https://arxiv.org/abs/2110.10348
作者: Yanping Zhang, Yongxin Yu
备注:

点击查看摘要

Abstract: Data input modality plays an important role in video action recognition. Normally, there are three types of input: RGB, flow stream and compressed data. In this paper, we proposed a new input modality: gray stream. Specifically, taken the stacked consecutive 3 gray images as input, which is the same size of RGB, can not only skip the conversion process from video decoding data to RGB, but also improve the spatio-temporal modeling ability at zero computation and zero parameters. Meanwhile, we proposed a 1D Identity Channel-wise Spatio-temporal Convolution(1D-ICSC) which captures the temporal relationship at channel-feature level within a controllable computation budget(by parameters G & R). Finally, we confirm its effectiveness and efficiency on several action recognition benchmarks, such as Kinetics, Something-Something, HMDB-51 and UCF-101, and achieve impressive results.

摘要:数据输入模态在视频动作识别中扮演着重要角色。通常有三种类型的输入:RGB、光流和压缩数据。本文提出了一种新的输入模态:灰度流。具体来说,将堆叠的连续 3 帧灰度图像作为输入(其尺寸与 RGB 相同),不仅可以跳过从视频解码数据到 RGB 的转换过程,还能在零额外计算量和零额外参数的情况下提升时空建模能力。同时,我们提出了一种一维恒等逐通道时空卷积(1D-ICSC),它在可控的计算预算内(由参数 G 与 R 控制)捕获通道特征级别的时间关系。最后,我们在 Kinetics、Something-Something、HMDB-51 和 UCF-101 等多个动作识别基准上验证了其有效性和效率,并取得了令人印象深刻的结果。

CV-101-标题: EBJR: Energy-Based Joint Reasoning for Adaptive Inference

链接: https://arxiv.org/abs/2110.10343
作者: Mohammad Akbari, Amin Banitalebi-Dehkordi, Yong Zhang
备注: BMVC 2021

点击查看摘要

Abstract: State-of-the-art deep learning models have achieved significant performance levels on various benchmarks. However, the excellent performance comes at a cost of inefficient computational cost. Light-weight architectures, on the other hand, achieve moderate accuracies, but at a much more desirable latency. This paper presents a new method of jointly using the large accurate models together with the small fast ones. To this end, we propose an Energy-Based Joint Reasoning (EBJR) framework that adaptively distributes the samples between shallow and deep models to achieve an accuracy close to the deep model, but latency close to the shallow one. Our method is applicable to out-of-the-box pre-trained models as it does not require an architecture change nor re-training. Moreover, it is easy to use and deploy, especially for cloud services. Through a comprehensive set of experiments on different down-stream tasks, we show that our method outperforms strong state-of-the-art approaches with a considerable margin. In addition, we propose specialized EBJR, an extension of our method where we create a smaller specialized side model that performs the target task only partially, but yields an even higher accuracy and faster inference. We verify the strengths of our methods with both theoretical and experimental evaluations.

摘要:最先进的深度学习模型在各类基准上取得了显著的性能,但优异的性能是以高昂的计算开销为代价的。另一方面,轻量级架构虽然只能达到中等精度,却拥有低得多的延迟。本文提出了一种将大而准的模型与小而快的模型联合使用的新方法。为此,我们提出了基于能量的联合推理(EBJR)框架,它自适应地在浅层模型和深层模型之间分配样本,从而达到接近深层模型的精度,同时延迟接近浅层模型。我们的方法可直接应用于现成的预训练模型,既不需要修改架构,也不需要重新训练;此外,它易于使用和部署,尤其适合云服务。通过在不同下游任务上的一系列全面实验,我们表明该方法以相当大的优势超越了强大的最先进方法。此外,我们还提出专用化 EBJR 作为该方法的扩展:构建一个更小的专用侧模型,只部分地执行目标任务,却能带来更高的精度和更快的推理。我们通过理论和实验评估验证了这些方法的优势。
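EBJR 的核心是用一个能量分数决定样本由浅层模型还是深层模型处理。下面用纯 Python 勾勒这一路由逻辑(能量取常见的自由能形式 -logsumexp(logits);阈值与玩具模型均为示例假设,并非论文的官方实现):

```python
import math

def energy(logits):
    """自由能 E(x) = -log(sum_i exp(logit_i)):能量越低,模型越自信。"""
    m = max(logits)
    return -(m + math.log(sum(math.exp(l - m) for l in logits)))

def joint_predict(x, shallow, deep, threshold):
    """能量低于阈值时采用浅层模型的输出,否则回退到深层模型。"""
    logits = shallow(x)
    if energy(logits) <= threshold:
        return logits, "shallow"
    return deep(x), "deep"

# 玩具模型:浅层模型对输入 0 很自信(高 logit),对输入 1 不自信
shallow = lambda x: [8.0, 0.0] if x == 0 else [0.1, 0.0]
deep = lambda x: [0.0, 5.0]
_, route_easy = joint_predict(0, shallow, deep, threshold=-1.0)
_, route_hard = joint_predict(1, shallow, deep, threshold=-1.0)
```

阈值控制精度与延迟的折中:阈值越宽松,越多样本只走浅层模型。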

CV-102-标题: Simpler Does It: Generating Semantic Labels with Objectness Guidance

链接: https://arxiv.org/abs/2110.10335
作者: Md Amirul Islam, Matthew Kowal, Sen Jia, Konstantinos G. Derpanis, Neil D. B. Bruce
备注: BMVC 2021

点击查看摘要

Abstract: Existing weakly or semi-supervised semantic segmentation methods utilize image or box-level supervision to generate pseudo-labels for weakly labeled images. However, due to the lack of strong supervision, the generated pseudo-labels are often noisy near the object boundaries, which severely impacts the network’s ability to learn strong representations. To address this problem, we present a novel framework that generates pseudo-labels for training images, which are then used to train a segmentation model. To generate pseudo-labels, we combine information from: (i) a class agnostic objectness network that learns to recognize object-like regions, and (ii) either image-level or bounding box annotations. We show the efficacy of our approach by demonstrating how the objectness network can naturally be leveraged to generate object-like regions for unseen categories. We then propose an end-to-end multi-task learning strategy, that jointly learns to segment semantics and objectness using the generated pseudo-labels. Extensive experiments demonstrate the high quality of our generated pseudo-labels and effectiveness of the proposed framework in a variety of domains. Our approach achieves better or competitive performance compared to existing weakly-supervised and semi-supervised methods.

摘要:现有的弱监督或半监督语义分割方法利用图像级或框级监督为弱标注图像生成伪标签。然而,由于缺乏强监督,生成的伪标签在物体边界附近往往带有噪声,严重影响网络学习强表示的能力。为了解决这个问题,我们提出一个新颖的框架,为训练图像生成伪标签,再用这些伪标签训练分割模型。为了生成伪标签,我们结合了以下两类信息:(i)一个学习识别类物体区域的类别无关物体性(objectness)网络;(ii)图像级或边界框标注。我们通过展示物体性网络可以自然地被用来为未见类别生成类物体区域,证明了该方法的有效性。随后,我们提出一种端到端的多任务学习策略,利用生成的伪标签联合学习语义分割和物体性。大量实验证明了我们生成的伪标签的高质量以及所提框架在多种领域中的有效性。与现有的弱监督和半监督方法相比,我们的方法取得了更好或相当的性能。

CV-103-标题: Toward Accurate and Reliable Iris Segmentation Using Uncertainty Learning

链接: https://arxiv.org/abs/2110.10334
作者: Jianze Wei, Huaibo Huang, Muyi Sun, Ran He, Zhenan Sun
备注:

点击查看摘要

Abstract: As an upstream task of iris recognition, iris segmentation plays a vital role in multiple subsequent tasks, including localization and matching. A slight bias in iris segmentation often results in obvious performance degradation of the iris recognition system. In the paper, we propose an Iris U-transformer (IrisUsformer) for accurate and reliable iris segmentation. For better accuracy, we elaborately design IrisUsformer by adopting position-sensitive operation and re-packaging transformer block to raise the spatial perception ability of the model. For better reliability, IrisUsformer utilizes an auxiliary head to distinguishes the high- and low-uncertainty regions of segmentation predictions and then adopts a weighting scheme to guide model optimization. Experimental results on three publicly available databases demonstrate that IrisUsformer achieves better segmentation accuracy using 35% MACs of the SOTA IrisParseNet. More importantly, our method estimates the uncertainty map corresponding to the segmentation prediction for subsequent processing in iris recognition systems.

摘要:作为虹膜识别的上游任务,虹膜分割在定位和匹配等多个后续任务中发挥着重要作用。虹膜分割中的轻微偏差往往会导致虹膜识别系统的性能明显下降。在本文中,我们提出一种虹膜 U 型 Transformer(IrisUsformer),用于准确而可靠的虹膜分割。为了更高的准确性,我们通过采用位置敏感操作并重新组织 Transformer 模块来精心设计 IrisUsformer,以提高模型的空间感知能力。为了更好的可靠性,IrisUsformer 利用一个辅助头来区分分割预测中的高不确定性区域和低不确定性区域,然后采用加权方案来引导模型优化。在三个公开数据库上的实验结果表明,IrisUsformer 仅用 SOTA 模型 IrisParseNet 35% 的 MACs 就取得了更好的分割精度。更重要的是,我们的方法还会估计与分割预测对应的不确定性图,供虹膜识别系统的后续处理使用。

CV-104-标题: Constrained Mean Shift for Representation Learning

链接: https://arxiv.org/abs/2110.10309
作者: Ajinkya Tejankar, Soroush Abbasi Koohpayegani, Hamed Pirsiavash
备注:

点击查看摘要

Abstract: We are interested in representation learning from labeled or unlabeled data. Inspired by recent success of self-supervised learning (SSL), we develop a non-contrastive representation learning method that can exploit additional knowledge. This additional knowledge may come from annotated labels in the supervised setting or an SSL model from another modality in the SSL setting. Our main idea is to generalize the mean-shift algorithm by constraining the search space of nearest neighbors, resulting in semantically purer representations. Our method simply pulls the embedding of an instance closer to its nearest neighbors in a search space that is constrained using the additional knowledge. By leveraging this non-contrastive loss, we show that the supervised ImageNet-1k pretraining with our method results in better transfer performance as compared to the baselines. Further, we demonstrate that our method is relatively robust to label noise. Finally, we show that it is possible to use the noisy constraint across modalities to train self-supervised video models.

摘要:我们关注从有标签或无标签数据中进行表示学习。受近期自监督学习(SSL)成功的启发,我们开发了一种能够利用额外知识的非对比式表示学习方法。这种额外知识可以来自监督设定中的标注标签,也可以来自 SSL 设定中另一模态的 SSL 模型。我们的主要思想是通过约束最近邻的搜索空间来推广均值漂移(mean-shift)算法,从而得到语义上更纯净的表示。我们的方法只是简单地在一个由额外知识约束的搜索空间中,将样本的嵌入拉向其最近邻。通过利用这种非对比式损失,我们表明,使用我们的方法进行有监督的 ImageNet-1k 预训练,相比基线能获得更好的迁移性能。此外,我们证明该方法对标签噪声相对鲁棒。最后,我们展示了可以跨模态使用这种带噪约束来训练自监督视频模型。
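摘要所说的"约束最近邻搜索空间的均值漂移",可以用如下玩具代码示意:把查询嵌入拉向搜索空间(此处用标签约束)内 k 个最近邻的均值(基于摘要理解的假设性草图,非官方实现):

```python
def constrained_mean_shift_target(query, bank, labels, query_label, k):
    """在与 query 标签相同的嵌入(额外知识约束的搜索空间)中
    找 k 个最近邻,返回它们的均值;训练时即把 query 的嵌入
    向该均值拉近。"""
    candidates = [e for e, l in zip(bank, labels) if l == query_label]
    candidates.sort(key=lambda e: sum((a - b) ** 2
                                      for a, b in zip(e, query)))
    nn = candidates[:k]
    dim = len(query)
    return [sum(e[i] for e in nn) / len(nn) for i in range(dim)]

# 玩具示例:"dog" 的嵌入被标签约束排除在搜索空间之外
bank = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
labels = ["cat", "cat", "dog"]
target = constrained_mean_shift_target([0.2, 0.0], bank, labels, "cat", k=2)
```

在 SSL 设定中,标签可替换为另一模态模型给出的伪分组,思想相同。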

CV-105-标题: Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE

链接: https://arxiv.org/abs/2110.10303
作者: Devansh Arpit, Aadyot Bhatnagar, Huan Wang, Caiming Xiong
备注:

点击查看摘要

Abstract: Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution. This latent space distribution matching is a core component of WAE, and a challenging task. In this paper, we propose to use the contrastive learning framework that has been shown to be effective for self-supervised representation learning, as a means to resolve this problem. We do so by exploiting the fact that contrastive learning objectives optimize the latent space distribution to be uniform over the unit hyper-sphere, which can be easily sampled from. We show that using the contrastive learning framework to optimize the WAE loss achieves faster convergence and more stable optimization compared with existing popular algorithms for WAE. This is also reflected in the FID scores on CelebA and CIFAR-10 datasets, and the realistic generated image quality on the CelebA-HQ dataset.

摘要:Wasserstein 自编码器(WAE)表明,匹配两个分布等价于在"该自编码器的潜空间分布与预先指定的先验分布相匹配"这一约束下,最小化一个简单的自编码器(AE)损失。这种潜空间分布匹配是 WAE 的核心组成部分,也是一项具有挑战性的任务。在本文中,我们提出利用已被证明对自监督表示学习有效的对比学习框架来解决这一问题。我们的做法利用了这样一个事实:对比学习目标会将潜空间分布优化为单位超球面上的均匀分布,而这种分布很容易采样。我们表明,与现有的流行 WAE 算法相比,使用对比学习框架优化 WAE 损失能获得更快的收敛和更稳定的优化。这一点也反映在 CelebA 和 CIFAR-10 数据集上的 FID 分数,以及 CelebA-HQ 数据集上逼真的生成图像质量上。

CV-106-标题: Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles

链接: https://arxiv.org/abs/2110.10293
作者: Bram Wallace, Devansh Arpit, Huan Wang, Caiming Xiong
备注:

点击查看摘要

Abstract: Pretraining convolutional neural networks via self-supervision, and applying them in transfer learning, is an incredibly fast-growing field that is rapidly and iteratively improving performance across practically all image domains. Meanwhile, model ensembling is one of the most universally applicable techniques in supervised learning literature and practice, offering a simple solution to reliably improve performance. But how to optimally combine self-supervised models to maximize representation quality has largely remained unaddressed. In this work, we provide a framework to perform self-supervised model ensembling via a novel method of learning representations directly through gradient descent at inference time. This technique improves representation quality, as measured by k-nearest neighbors, both on the in-domain dataset and in the transfer setting, with models transferable from the former setting to the latter. Additionally, this direct learning of feature through backpropagation improves representations from even a single model, echoing the improvements found in self-distillation.

摘要:通过自监督预训练卷积神经网络并将其应用于迁移学习,是一个增长极快的领域,它正在几乎所有图像领域中快速而持续地提升性能。与此同时,模型集成是监督学习文献和实践中适用范围最广的技术之一,为可靠地提升性能提供了简单的方案。但如何最优地组合自监督模型以最大化表示质量,在很大程度上仍未得到解决。在这项工作中,我们提供了一个进行自监督模型集成的框架,其核心是一种在推理时直接通过梯度下降学习表示的新方法。以 k 近邻作为度量,该技术在域内数据集和迁移设定下都提升了表示质量,并且模型可以从前一种设定迁移到后一种设定。此外,这种通过反向传播直接学习特征的方式,即使对单个模型也能改善其表示,这与自蒸馏中观察到的改进相呼应。

CV-107-标题: On Coordinate Decoding for Keypoint Estimation Tasks

链接: https://arxiv.org/abs/2110.10289
作者: Anargyros Chatzitofis, Nikolaos Zioulis, Georgios Nikolaos Albanis, Dimitrios Zarpalas, Petros Daras
备注:

点击查看摘要

Abstract: A series of 2D (and 3D) keypoint estimation tasks are built upon heatmap coordinate representation, i.e. a probability map that allows for learnable and spatially aware encoding and decoding of keypoint coordinates on grids, even allowing for sub-pixel coordinate accuracy. In this report, we aim to reproduce the findings of DARK that investigated the 2D heatmap representation by highlighting the importance of the encoding of the ground truth heatmap and the decoding of the predicted heatmap to keypoint coordinates. The authors claim that a) a more principled distribution-aware coordinate decoding method overcomes the limitations of the standard techniques widely used in the literature, and b), that the reconstruction of heatmaps from ground-truth coordinates by generating accurate and continuous heatmap distributions lead to unbiased model training, contrary to the standard coordinate encoding process that quantizes the keypoint coordinates on the resolution of the input image grid.

摘要:一系列 2D(和 3D)关键点估计任务建立在热图坐标表示之上,即一种概率图,它允许对网格上的关键点坐标进行可学习且具备空间感知能力的编码和解码,甚至可以达到亚像素级的坐标精度。在本报告中,我们旨在复现 DARK 的研究结论,该工作通过强调真值热图的编码以及从预测热图解码出关键点坐标这两个环节的重要性,对 2D 热图表示进行了研究。作者声称:a)一种更有原则的、具备分布感知能力的坐标解码方法克服了文献中广泛使用的标准技术的局限性;b)通过生成准确且连续的热图分布来从真值坐标重建热图,可以实现无偏的模型训练,这与在输入图像网格分辨率上对关键点坐标进行量化的标准坐标编码过程形成对比。

CV-108-标题: Fine-Grained Control of Artistic Styles in Image Generation

链接: https://arxiv.org/abs/2110.10278
作者: Xin Miao, Huayan Wang, Jun Fu, Jiayi Liu, Shen Wang, Zhenyu Liao
备注:

点击查看摘要

Abstract: Recent advances in generative models and adversarial training have enabled artificially generating artworks in various artistic styles. It is highly desirable to gain more control over the generated style in practice. However, artistic styles are unlike object categories – there are a continuous spectrum of styles distinguished by subtle differences. Few works have been explored to capture the continuous spectrum of styles and apply it to a style generation task. In this paper, we propose to achieve this by embedding original artwork examples into a continuous style space. The style vectors are fed to the generator and discriminator to achieve fine-grained control. Our method can be used with common generative adversarial networks (such as StyleGAN). Experiments show that our method not only precisely controls the fine-grained artistic style but also improves image quality over vanilla StyleGAN as measured by FID.

摘要:生成模型和对抗训练的最新进展,使得以各种艺术风格人工生成艺术品成为可能。在实践中,人们非常希望对生成的风格有更多的控制。然而,艺术风格与物体类别不同:风格构成一个连续的谱系,彼此之间只有细微的差别。很少有工作探索如何捕获这种连续的风格谱系并将其应用于风格生成任务。在本文中,我们提出通过将原始艺术品样本嵌入到一个连续的风格空间中来实现这一目标。风格向量被输入生成器和判别器,以实现细粒度控制。我们的方法可以与常见的生成对抗网络(如 StyleGAN)配合使用。实验表明,我们的方法不仅能精确控制细粒度的艺术风格,而且以 FID 衡量,其图像质量也优于原始的 StyleGAN。

CV-109-标题: Early- and in-season crop type mapping without current-year ground truth: generating labels from historical information via a topology-based approach

链接: https://arxiv.org/abs/2110.10275
作者: Chenxi Lin, Liheng Zhong, Xiao-Peng Song, Jinwei Dong, David B.Lobell, Zhenong Jin
备注:

点击查看摘要

Abstract: Land cover classification in remote sensing is often faced with the challenge of limited ground truth. Incorporating historical information has the potential to significantly lower the expensive cost associated with collecting ground truth and, more importantly, enable early- and in-season mapping that is helpful to many pre-harvest decisions. In this study, we propose a new approach that can effectively transfer knowledge about the topology (i.e. relative position) of different crop types in the spectral feature space (e.g. the histogram of SWIR1 vs RDEG1 bands) to generate labels, thereby support crop classification in a different year. Importantly, our approach does not attempt to transfer classification decision boundaries that are susceptible to inter-annual variations of weather and management, but relies on the more robust and shift-invariant topology information. We tested this approach for mapping corn/soybeans in the US Midwest and paddy rice/corn/soybeans in Northeast China using Landsat-8 and Sentinel-2 data. Results show that our approach automatically generates high-quality labels for crops in the target year immediately after each image becomes available. Based on these generated labels from our approach, the subsequent crop type mapping using a random forest classifier reach the F1 score as high as 0.887 for corn as early as the silking stage and 0.851 for soybean as early as the flowering stage and the overall accuracy of 0.873 in Iowa. In Northeast China, F1 scores of paddy rice, corn and soybeans and the overall accuracy can exceed 0.85 two and half months ahead of harvest. Overall, these results highlight unique advantages of our approach in transferring historical knowledge and maximizing the timeliness of crop maps. Our approach supports a general paradigm shift towards learning transferrable and generalizable knowledge to facilitate land cover classification.

摘要:遥感中的土地覆盖分类常常面临地面真值有限的挑战。结合历史信息有望显著降低收集地面真值的高昂成本,更重要的是,它使得对许多收获前决策很有帮助的早期和季中制图成为可能。在本研究中,我们提出一种新方法,可以有效地迁移不同作物类型在光谱特征空间中的拓扑(即相对位置)知识(例如 SWIR1 与 RDEG1 波段的直方图)来生成标签,从而支持另一年份的作物分类。重要的是,我们的方法并不试图迁移易受天气和管理措施年际变化影响的分类决策边界,而是依赖更加鲁棒、具有平移不变性的拓扑信息。我们使用 Landsat-8 和 Sentinel-2 数据,在美国中西部的玉米/大豆制图以及中国东北的水稻/玉米/大豆制图上测试了该方法。结果表明,我们的方法在每景影像可用后,能立即为目标年份的作物自动生成高质量标签。基于这些生成的标签,使用随机森林分类器进行的后续作物类型制图,在爱荷华州,玉米早在吐丝期 F1 分数即可高达 0.887,大豆早在开花期即可达到 0.851,总体精度为 0.873;在中国东北,水稻、玉米和大豆的 F1 分数以及总体精度在收获前两个半月即可超过 0.85。总体而言,这些结果突出了我们的方法在迁移历史知识、最大化作物地图时效性方面的独特优势。我们的方法支持向学习可迁移、可泛化知识以促进土地覆盖分类的通用范式转变。

CV-110-标题: 1st Place Solution for the UVO Challenge on Image-based Open-World Segmentation 2021

链接: https://arxiv.org/abs/2110.10239
作者: Yuming Du, Wen Guo, Yang Xiao, Vincent Lepetit
备注: Code: this https URL

点击查看摘要

Abstract: We describe our two-stage instance segmentation framework we use to compete in the challenge. The first stage of our framework consists of an object detector, which generates object proposals in the format of bounding boxes. Then, the images and the detected bounding boxes are fed to the second stage, where a segmentation network is applied to segment the objects in the bounding boxes. We train all our networks in a class-agnostic way. Our approach achieves the first place in the UVO 2021 Image-based Open-World Segmentation Challenge.

摘要:我们介绍了用于参加此次挑战赛的两阶段实例分割框架。框架的第一阶段是一个目标检测器,它以边界框的形式生成候选目标;然后将图像和检测到的边界框送入第二阶段,由一个分割网络对边界框中的物体进行分割。我们以类别无关的方式训练所有网络。我们的方法在 UVO 2021 基于图像的开放世界分割挑战赛中获得第一名。

CV-111-标题: Test time Adaptation through Perturbation Robustness

链接: https://arxiv.org/abs/2110.10232
作者: Prabhu Teja Sivaprasad, François Fleuret
备注: Under review

点击查看摘要

Abstract: Data samples generated by several real world processes are dynamic in nature \textit{i.e.}, their characteristics vary with time. Thus it is not possible to train and tackle all possible distributional shifts between training and inference, using the host of transfer learning methods in literature. In this paper, we tackle this problem of adapting to domain shift at inference time \textit{i.e.}, we do not change the training process, but quickly adapt the model at test-time to handle any domain shift. For this, we propose to enforce consistency of predictions of data sampled in the vicinity of test sample on the image manifold. On a host of test scenarios like dealing with corruptions (CIFAR-10-C and CIFAR-100-C), and domain adaptation (VisDA-C), our method is at par or significantly outperforms previous methods.

摘要:许多真实世界过程生成的数据样本本质上是动态的,即其特性随时间变化。因此,利用文献中的各种迁移学习方法,不可能对训练和推理之间所有可能的分布偏移都进行训练和应对。在本文中,我们在推理时解决适应域偏移的问题:我们不改变训练过程,而是在测试时快速调整模型以应对任何域偏移。为此,我们提出对图像流形上测试样本邻域内采样的数据强制预测一致性。在处理图像损坏(CIFAR-10-C 和 CIFAR-100-C)以及域自适应(VisDA-C)等一系列测试场景中,我们的方法与此前的方法持平或显著优于它们。
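摘要中"对测试样本邻域内采样的数据强制预测一致性"这一思想,可以用如下草图表示:对同一测试样本的多份扰动副本,惩罚各预测分布与平均预测分布的偏离(具体损失形式为示例假设,并非论文的原始定义):

```python
import math

def softmax(logits):
    m = max(logits)
    e = [math.exp(l - m) for l in logits]
    s = sum(e)
    return [x / s for x in e]

def consistency_loss(logits_list):
    """同一测试样本若干扰动版本的预测分布与其均值分布的均方偏差;
    测试时最小化该损失即鼓励邻域内预测一致。"""
    probs = [softmax(l) for l in logits_list]
    k, d = len(probs), len(probs[0])
    mean = [sum(p[i] for p in probs) / k for i in range(d)]
    return sum((p[i] - mean[i]) ** 2
               for p in probs for i in range(d)) / k

loss_same = consistency_loss([[2.0, 0.0], [2.0, 0.0]])  # 预测完全一致
loss_diff = consistency_loss([[2.0, 0.0], [0.0, 2.0]])  # 预测互相矛盾
```

实际方法中,扰动来自图像流形上测试样本邻域的采样,损失用于在测试时更新模型参数。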

CV-112-标题: An Adaptive Sampling and Edge Detection Approach for Encoding Static Images for Spiking Neural Networks

链接: https://arxiv.org/abs/2110.10217
作者: Peyton Chandarana, Junlin Ou, Ramtin Zand
备注:

点击查看摘要

Abstract: Current state-of-the-art methods of image classification using convolutional neural networks are often constrained by both latency and power consumption. This places a limit on the devices, particularly low-power edge devices, that can employ these methods. Spiking neural networks (SNNs) are considered to be the third generation of artificial neural networks which aim to address these latency and power constraints by taking inspiration from biological neuronal communication processes. Before data such as images can be input into an SNN, however, they must be first encoded into spike trains. Herein, we propose a method for encoding static images into temporal spike trains using edge detection and an adaptive signal sampling method for use in SNNs. The edge detection process consists of first performing Canny edge detection on the 2D static images and then converting the edge detected images into two X and Y signals using an image-to-signal conversion method. The adaptive signaling approach consists of sampling the signals such that the signals maintain enough detail and are sensitive to abrupt changes in the signal. Temporal encoding mechanisms such as threshold-based representation (TBR) and step-forward (SF) are then able to be used to convert the sampled signals into spike trains. We use various error and indicator metrics to optimize and evaluate the efficiency and precision of the proposed image encoding approach. Comparison results between the original and reconstructed signals from spike trains generated using edge-detection and adaptive temporal encoding mechanism exhibit 18x and 7x reduction in average root mean square error (RMSE) compared to the conventional SF and TBR encoding, respectively, while used for encoding MNIST dataset.

摘要:目前使用卷积神经网络进行图像分类的最先进方法,往往同时受到延迟和功耗的制约,这限制了可以采用这些方法的设备,尤其是低功耗边缘设备。脉冲神经网络(SNN)被视为第三代人工神经网络,旨在从生物神经元的通信过程中汲取灵感,以解决这些延迟和功耗约束。然而,在将图像等数据输入 SNN 之前,必须先将其编码为脉冲序列。在此,我们提出一种利用边缘检测和自适应信号采样将静态图像编码为时间脉冲序列以用于 SNN 的方法。边缘检测过程包括:先对二维静态图像执行 Canny 边缘检测,再用图像到信号转换方法将边缘检测后的图像转换为 X 和 Y 两路信号。自适应采样方法对信号进行采样,使信号既保留足够的细节,又对信号中的突变保持敏感。随后,可以使用基于阈值的表示(TBR)和步进前向(SF)等时间编码机制将采样后的信号转换为脉冲序列。我们使用多种误差和指标度量来优化并评估所提图像编码方法的效率和精度。在对 MNIST 数据集进行编码时,与传统的 SF 和 TBR 编码相比,利用边缘检测和自适应时间编码机制生成的脉冲序列,其原始信号与重建信号之间的平均均方根误差(RMSE)分别降低了 18 倍和 7 倍。
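摘要提到的两种时间编码机制,基于阈值的表示(TBR)和步进前向(SF),可以按其常见定义用几行 Python 勾勒(实现细节为这两类编码的通用写法,不代表论文的精确实现):

```python
def tbr_encode(signal, threshold):
    """TBR:相邻两个采样点的差值超过阈值时发出 +1/-1 脉冲,否则为 0。"""
    spikes = []
    prev = signal[0]
    for x in signal[1:]:
        d = x - prev
        if d > threshold:
            spikes.append(1)
        elif d < -threshold:
            spikes.append(-1)
        else:
            spikes.append(0)
        prev = x
    return spikes

def sf_encode(signal, threshold):
    """SF:与一个基线比较;每发出一个脉冲,基线就朝该方向移动一个阈值。"""
    base = signal[0]
    spikes = []
    for x in signal[1:]:
        if x > base + threshold:
            spikes.append(1)
            base += threshold
        elif x < base - threshold:
            spikes.append(-1)
            base -= threshold
        else:
            spikes.append(0)
    return spikes

tbr = tbr_encode([0.0, 1.0, 1.2, 0.4], threshold=0.5)
sf = sf_encode([0.0, 1.0, 2.0, 1.0], threshold=0.5)
```

两者都把连续信号变成稀疏的 ±1 脉冲序列;SF 通过移动基线累积信号幅值,解码(重建)时比 TBR 更稳定,这也是论文用 RMSE 比较两者的原因。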

CV-113-标题: Learning Equivariances and Partial Equivariances from Data

链接: https://arxiv.org/abs/2110.10211
作者: David W. Romero, Suhas Lohit
备注:

点击查看摘要

Abstract: Group equivariant Convolutional Neural Networks (G-CNNs) constrain features to respect the chosen symmetries, and lead to better generalization when these symmetries appear in the data. However, if the chosen symmetries are not present, group equivariant architectures lead to overly constrained models and worse performance. Frequently, the distribution of the data can be better represented by a subset of a group than by the group as a whole, e.g., rotations in $[-90^{\circ}, 90^{\circ}]$. In such cases, a model that respects equivariance partially is better suited to represent the data. Moreover, relevant symmetries may differ for low and high-level features, e.g., edge orientations in a face, and face poses relative to the camera. As a result, the optimal level of equivariance may differ per layer. In this work, we introduce Partial G-CNNs: a family of equivariant networks able to learn partial and full equivariances from data at every layer end-to-end. Partial G-CNNs retain full equivariance whenever beneficial, e.g., for rotated MNIST, but are able to restrict it whenever it becomes harmful, e.g., for 6~/~9 or natural image classification. Partial G-CNNs perform on par with G-CNNs when full equivariance is necessary, and outperform them otherwise. Our method is applicable to discrete groups, continuous groups and combinations thereof.

摘要:群等变卷积神经网络(G-CNN)约束特征以遵循所选的对称性,当这些对称性出现在数据中时,能带来更好的泛化。然而,如果所选的对称性并不存在,群等变架构就会导致模型约束过度、性能变差。很多时候,数据的分布用群的一个子集来表示比用整个群更合适,例如 $[-90^{\circ}, 90^{\circ}]$ 范围内的旋转。在这种情况下,一个只部分遵循等变性的模型更适合表示数据。此外,相关的对称性在低层和高层特征上可能不同,例如人脸中的边缘朝向,以及人脸相对于相机的姿态。因此,最优的等变程度可能逐层不同。在这项工作中,我们提出部分 G-CNN:一族等变网络,能够端到端地从数据中学习每一层的部分和完全等变性。部分 G-CNN 在等变性有益时保留完全等变性(例如旋转 MNIST),而在其有害时(例如区分 6 和 9,或自然图像分类)能够对其加以限制。当需要完全等变性时,部分 G-CNN 的表现与 G-CNN 持平,否则则优于后者。我们的方法适用于离散群、连续群及其组合。

CV-114-标题: Come Again? Re-Query in Referring Expression Comprehension

链接: https://arxiv.org/abs/2110.10206
作者: Stephan J. Lemmer, Jason J. Corso
备注: 17 pages, 3 figures

点击查看摘要

Abstract: To build a shared perception of the world, humans rely on the ability to resolve misunderstandings by requesting and accepting clarifications. However, when evaluating visiolinguistic models, metrics such as accuracy enforce the assumption that a decision must be made based on a single piece of evidence. In this work, we relax this assumption for the task of referring expression comprehension by allowing the model to request help when its confidence is low. We consider two ways in which this help can be provided: multimodal re-query, where the user is allowed to point or click to provide additional information to the model, and rephrase re-query, where the user is only allowed to provide another referring expression. We demonstrate the importance of re-query by showing that providing the best referring expression for all objects can increase accuracy by up to 21.9% and that this accuracy can be matched by re-querying only 12% of initial referring expressions. We further evaluate re-query functions for both multimodal and rephrase re-query across three modern approaches and demonstrate combined replacement for rephrase re-query, which improves average single-query performance by up to 6.5% and converges to as close as 1.6% of the upper bound of single-query performance.

摘要:为了建立对世界的共同认知,人类依靠请求并接受澄清来消除误解。然而,在评估视觉-语言模型时,准确率等指标隐含地假设决策必须基于单一证据做出。在这项工作中,我们针对指代表达理解任务放宽了这一假设,允许模型在置信度较低时请求帮助。我们考虑两种提供帮助的方式:多模态重询(用户可以通过指点或点击向模型提供额外信息)和改述重询(用户只能提供另一条指代表达)。我们通过实验说明了重询的重要性:为所有目标提供最佳指代表达可将准确率最多提高 21.9%,而仅对 12% 的初始指代表达进行重询即可达到同样的准确率。我们进一步在三种现代方法上评估了多模态重询和改述重询的重询函数,并展示了改述重询的组合替换策略,它将平均单次查询性能最多提升 6.5%,并收敛到距单次查询性能上界仅 1.6% 的水平。
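"置信度低时重询"的控制逻辑可以用几行代码勾勒出来。以下仅是示意(`model`、`ask_user`、阈值等名称与取值均为假设,并非论文的接口):当模型对当前指代表达的置信度低于阈值时,向用户请求一条改述后的表达并重新预测。

```python
def predict_with_requery(model, query, threshold=0.5, max_requeries=1, ask_user=None):
    """model(query) -> (prediction, confidence);ask_user() -> 新的指代表达。
    置信度低于 threshold 时最多重询 max_requeries 次。"""
    pred, conf = model(query)
    tries = 0
    while conf < threshold and ask_user is not None and tries < max_requeries:
        query = ask_user()            # 改述重询(多模态重询可在此改为传入点击坐标)
        pred, conf = model(query)
        tries += 1
    return pred, conf

# 演示:第一次预测置信度低,改述重询后得到高置信度预测
_calls = {"n": 0}
def _toy_model(q):
    _calls["n"] += 1
    return ("region_a", 0.3) if _calls["n"] == 1 else ("region_b", 0.9)

pred, conf = predict_with_requery(_toy_model, "the red mug",
                                  ask_user=lambda: "the mug on the left")
```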

CV-115-标题: CoFi: Coarse-to-Fine ICP for LiDAR Localization in an Efficient Long-lasting Point Cloud Map

链接: https://arxiv.org/abs/2110.10194
作者: Yecheng Lyu, Xinming Huang, Ziming Zhang
备注: 8 pages, submitted to ICRA 2022

点击查看摘要

Abstract: LiDAR odometry and localization has attracted increasing research interest in recent years. In the existing works, iterative closest point (ICP) is widely used since it is precise and efficient. Due to its non-convexity and its local iterative strategy, however, ICP-based method easily falls into local optima, which in turn calls for a precise initialization. In this paper, we propose CoFi, a Coarse-to-Fine ICP algorithm for LiDAR localization. Specifically, the proposed algorithm down-samples the input point sets under multiple voxel resolution, and gradually refines the transformation from the coarse point sets to the fine-grained point sets. In addition, we propose a map based LiDAR localization algorithm that extracts semantic feature points from the LiDAR frames and apply CoFi to estimate the pose on an efficient point cloud map. With the help of the Cylinder3D algorithm for LiDAR scan semantic segmentation, the proposed CoFi localization algorithm demonstrates the state-of-the-art performance on the KITTI odometry benchmark, with significant improvement over the literature.

摘要:近年来,激光雷达里程计与定位吸引了越来越多的研究兴趣。在现有工作中,迭代最近点(ICP)因其精确高效而被广泛使用。然而,由于其非凸性和局部迭代策略,基于 ICP 的方法容易陷入局部最优,因此需要精确的初始化。在本文中,我们提出 CoFi,一种用于激光雷达定位的由粗到细的 ICP 算法。具体而言,该算法在多种体素分辨率下对输入点集进行降采样,并将变换从粗点集逐步细化到细粒度点集。此外,我们提出一种基于地图的激光雷达定位算法,它从激光雷达帧中提取语义特征点,并应用 CoFi 在高效的点云地图上估计位姿。借助用于激光雷达扫描语义分割的 Cylinder3D 算法,所提出的 CoFi 定位算法在 KITTI 里程计基准上取得了最先进的性能,相较已有文献有显著提升。
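"由粗到细"的配准流程可以用一个一维玩具例子来说明:在逐渐变小的体素分辨率下对点集降采样,并在每个分辨率上细化一次变换估计。下面的代码是高度简化的示意(一维纯平移、用均值对齐代替真正的最近点匹配,体素尺寸也是随意选的),并非 CoFi 的实现。

```python
def voxel_downsample(points, voxel):
    """每个体素只保留一个代表点(粗降采样)。"""
    seen, out = set(), []
    for p in points:
        key = round(p / voxel)
        if key not in seen:
            seen.add(key)
            out.append(p)
    return out

def estimate_shift(src, dst):
    """一维纯平移下的闭式"配准"步:对齐均值(示意,替代最近点匹配)。"""
    return sum(dst) / len(dst) - sum(src) / len(src)

def coarse_to_fine_shift(src, dst, voxels=(4.0, 1.0, 0.25)):
    shift = 0.0
    for v in voxels:                  # 分辨率由粗到细
        s = voxel_downsample([p + shift for p in src], v)
        d = voxel_downsample(dst, v)
        shift += estimate_shift(s, d)
    return shift

src = [0.0, 1.0, 2.0, 3.0]
dst = [p + 1.5 for p in src]          # 真实平移为 1.5
shift = coarse_to_fine_shift(src, dst)
```

粗分辨率下的估计给细分辨率提供了初始化,正对应论文缓解 ICP 局部最优的动机。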

CV-116-标题: Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation

链接: https://arxiv.org/abs/2110.10183
作者: Bin Ren, Hao Tang, Nicu Sebe
备注: 16 pages, 5 figures

点击查看摘要

Abstract: It is hard to generate an image at target view well for previous cross-view image translation methods that directly adopt a simple encoder-decoder or U-Net structure, especially for drastically different views and severe deformation cases. To ease this problem, we propose a novel two-stage framework with a new Cascaded Cross MLP-Mixer (CrossMLP) sub-network in the first stage and one refined pixel-level loss in the second stage. In the first stage, the CrossMLP sub-network learns the latent transformation cues between image code and semantic map code via our novel CrossMLP blocks. Then the coarse results are generated progressively under the guidance of those cues. Moreover, in the second stage, we design a refined pixel-level loss that eases the noisy semantic label problem with more reasonable regularization in a more compact fashion for better optimization. Extensive experimental results on the Dayton and CVUSA datasets show that our method can generate significantly better results than state-of-the-art methods. The source code and trained models are available at this https URL.

摘要:对于直接采用简单编码器-解码器或 U-Net 结构的已有跨视角图像翻译方法而言,很难在目标视角下生成高质量图像,尤其是在视角差异巨大和形变严重的情况下。为缓解这一问题,我们提出一种新颖的两阶段框架:第一阶段采用新的级联交叉 MLP-Mixer(CrossMLP)子网络,第二阶段采用改进的像素级损失。在第一阶段,CrossMLP 子网络通过我们新颖的 CrossMLP 模块学习图像编码与语义图编码之间的潜在变换线索,然后在这些线索的引导下逐步生成粗略结果。在第二阶段,我们设计了一种改进的像素级损失,以更紧凑的方式施加更合理的正则化,缓解语义标签噪声问题,从而获得更好的优化效果。在 Dayton 和 CVUSA 数据集上的大量实验结果表明,我们的方法能够生成显著优于最先进方法的结果。源代码和训练好的模型可通过文中的 https 链接获取。

CV-117-标题: Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction

链接: https://arxiv.org/abs/2110.10174
作者: Takuma Yagi, Md Tasnimul Hasan, Yoichi Sato
备注: BMVC 2021

点击查看摘要

Abstract: Every hand-object interaction begins with contact. Despite predicting the contact state between hands and objects is useful in understanding hand-object interactions, prior methods on hand-object analysis have assumed that the interacting hands and objects are known, and were not studied in detail. In this study, we introduce a video-based method for predicting contact between a hand and an object. Specifically, given a video and a pair of hand and object tracks, we predict a binary contact state (contact or no-contact) for each frame. However, annotating a large number of hand-object tracks and contact labels is costly. To overcome the difficulty, we propose a semi-supervised framework consisting of (i) automatic collection of training data with motion-based pseudo-labels and (ii) guided progressive label correction (gPLC), which corrects noisy pseudo-labels with a small amount of trusted data. We validated our framework’s effectiveness on a newly built benchmark dataset for hand-object contact prediction and showed superior performance against existing baseline methods. Code and data are available at this https URL.

摘要:每一次手与物体的交互都始于接触。尽管预测手与物体之间的接触状态有助于理解手-物交互,已有的手-物分析方法都假设交互的手和物体是已知的,且未对此进行详细研究。在本研究中,我们提出一种基于视频的方法来预测手与物体之间的接触。具体而言,给定一段视频以及一对手部与物体的轨迹,我们为每一帧预测二值接触状态(接触或未接触)。然而,标注大量手-物轨迹和接触标签的成本很高。为克服这一困难,我们提出一个半监督框架,包括:(i)利用基于运动的伪标签自动收集训练数据;(ii)引导式渐进标签校正(gPLC),它利用少量可信数据校正带噪的伪标签。我们在新构建的手-物接触预测基准数据集上验证了该框架的有效性,其性能优于现有基线方法。代码和数据可通过文中的 https 链接获取。
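"基于运动的伪标签"的直觉是:当手与物体距离很近且二者一起运动(相对速度很小)时,很可能处于接触状态。下面用一维轨迹给出一个极简示意(阈值与函数名均为假设,并非论文采用的数值或实现):

```python
def motion_pseudo_labels(hand_pos, obj_pos, dist_thresh=0.05, vel_thresh=0.01):
    """启发式运动伪标签:距离近且相对速度小的帧标为接触(1),否则为未接触(0)。"""
    labels = []
    for t in range(len(hand_pos)):
        dist = abs(hand_pos[t] - obj_pos[t])
        if t == 0:
            rel_vel = 0.0
        else:
            rel_vel = abs((hand_pos[t] - hand_pos[t - 1])
                          - (obj_pos[t] - obj_pos[t - 1]))
        labels.append(1 if dist < dist_thresh and rel_vel < vel_thresh else 0)
    return labels

# 手逐渐靠近静止的物体,最后一帧开始与物体一起移动
hand = [0.0, 0.5, 1.0, 1.5]
obj = [1.0, 1.0, 1.0, 1.5]
labels = motion_pseudo_labels(hand, obj)
```

这类伪标签必然带噪(例如手恰好从物体旁掠过),这正是论文引入 gPLC 用少量可信数据做渐进校正的原因。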

CV-118-标题: Recurrent Brain Graph Mapper for Predicting Time-Dependent Brain Graph Evaluation Trajectory

链接: https://arxiv.org/abs/2110.11237
作者: Alpay Tekin, Ahmed Nebli, Islem Rekik
备注:

点击查看摘要

Abstract: Several brain disorders can be detected by observing alterations in the brain’s structural and functional connectivities. Neurological findings suggest that early diagnosis of brain disorders, such as mild cognitive impairment (MCI), can prevent and even reverse its development into Alzheimer’s disease (AD). In this context, recent studies aimed to predict the evolution of brain connectivities over time by proposing machine learning models that work on brain images. However, such an approach is costly and time-consuming. Here, we propose to use brain connectivities as a more efficient alternative for time-dependent brain disorder diagnosis by regarding the brain as instead a large interconnected graph characterizing the interconnectivity scheme between several brain regions. We term our proposed method Recurrent Brain Graph Mapper (RBGM), a novel efficient edge-based recurrent graph neural network that predicts the time-dependent evaluation trajectory of a brain graph from a single baseline. Our RBGM contains a set of recurrent neural network-inspired mappers for each time point, where each mapper aims to project the ground-truth brain graph onto its next time point. We leverage the teacher forcing method to boost training and improve the evolved brain graph quality. To maintain the topological consistency between the predicted brain graphs and their corresponding ground-truth brain graphs at each time point, we further integrate a topological loss. We also use l1 loss to capture time-dependency and minimize the distance between the brain graph at consecutive time points for regularization. Benchmarks against several variants of RBGM and state-of-the-art methods prove that we can achieve the same accuracy in predicting brain graph evolution more efficiently, paving the way for novel graph neural network architecture and a highly efficient training scheme.

摘要:通过观察大脑结构连接和功能连接的改变,可以检测多种脑部疾病。神经学研究表明,对轻度认知障碍(MCI)等脑部疾病的早期诊断可以预防甚至逆转其向阿尔茨海默病(AD)的发展。在此背景下,近期研究试图通过提出作用于脑图像的机器学习模型来预测脑连接随时间的演化。然而,这种方法成本高且耗时。在这里,我们提出将大脑视为一个刻画多个脑区之间互连模式的大型互连图,并使用脑连接作为更高效的替代方案进行随时间变化的脑疾病诊断。我们将所提出的方法称为循环脑图映射器(RBGM),这是一种新颖高效的基于边的循环图神经网络,可以从单个基线预测脑图随时间的演化轨迹。我们的 RBGM 为每个时间点包含一组受循环神经网络启发的映射器,每个映射器旨在将真实脑图投影到其下一个时间点。我们利用教师强制(teacher forcing)方法加速训练并提高演化脑图的质量。为了在每个时间点保持预测脑图与相应真实脑图之间的拓扑一致性,我们进一步引入了拓扑损失。我们还使用 L1 损失来捕获时间依赖性,并最小化相邻时间点脑图之间的距离以实现正则化。与 RBGM 的多个变体及最先进方法的基准对比证明,我们能够以更高的效率达到同样的脑图演化预测精度,为新颖的图神经网络架构和高效的训练方案铺平了道路。

CV-119-标题: Towards Reducing Aleatoric Uncertainty for Medical Imaging Tasks

链接: https://arxiv.org/abs/2110.11012
作者: Abhishek Singh Sambyal, Narayanan C. Krishnan, Deepti R. Bathula
备注:

点击查看摘要

Abstract: In safety-critical applications like medical diagnosis, certainty associated with a model’s prediction is just as important as its accuracy. Consequently, uncertainty estimation and reduction play a crucial role. Uncertainty in predictions can be attributed to noise or randomness in data (aleatoric) and incorrect model inferences (epistemic). While model uncertainty can be reduced with more data or bigger models, aleatoric uncertainty is more intricate. This work proposes a novel approach that interprets data uncertainty estimated from a self-supervised task as noise inherent to the data and utilizes it to reduce aleatoric uncertainty in another task related to the same dataset via data augmentation. The proposed method was evaluated on a benchmark medical imaging dataset with image reconstruction as the self-supervised task and segmentation as the image analysis task. Our findings demonstrate the effectiveness of the proposed approach in significantly reducing the aleatoric uncertainty in the image segmentation task while achieving better or on-par performance compared to the standard augmentation techniques.

摘要:在医学诊断等安全攸关的应用中,模型预测的确定性与其准确性同样重要。因此,不确定性的估计与降低起着至关重要的作用。预测中的不确定性可归因于数据中的噪声或随机性(偶然不确定性,aleatoric)以及不正确的模型推断(认知不确定性,epistemic)。虽然模型不确定性可以通过更多数据或更大的模型来降低,但偶然不确定性更为复杂。本工作提出一种新方法,将自监督任务中估计的数据不确定性解释为数据固有的噪声,并利用它通过数据增强来降低同一数据集上另一任务中的偶然不确定性。我们以图像重建作为自监督任务、图像分割作为图像分析任务,在一个基准医学影像数据集上评估了所提出的方法。结果表明,该方法能显著降低图像分割任务中的偶然不确定性,同时相比标准数据增强技术取得了相当或更好的性能。
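偶然不确定性的一种常见代理估计方式,是观察预测在输入扰动(如数据增强)下的波动程度。下面的示意代码(与论文的具体估计方式无关,函数名为假设)用增强副本上预测的方差作为数据不确定性的粗略代理:

```python
def aleatoric_proxy(model, x, augmentations):
    """对 x 的若干增强副本做预测,用预测的方差近似数据(偶然)不确定性。"""
    preds = [model(aug(x)) for aug in augmentations]
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return mean, var

identity = lambda v: v
shift = lambda v: v + 2.0            # 示意性"增强":加一个偏移
mean, var = aleatoric_proxy(lambda v: v, 1.0, [identity, shift])
```

方差大的样本可视为数据本身噪声较大,对应论文中"将自监督任务估计的不确定性解释为数据固有噪声"的思路。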

CV-120-标题: 2020 CATARACTS Semantic Segmentation Challenge

链接: https://arxiv.org/abs/2110.10965
作者: Imanol Luengo, Maria Grammatikopoulou, Rahim Mohammadi, Chris Walsh, Chinedu Innocent Nwoye, Deepak Alapatt, Nicolas Padoy, Zhen-Liang Ni, Chen-Chen Fan, Gui-Bin Bian, Zeng-Guang Hou, Heonjin Ha, Jiacheng Wang, Haojie Wang, Dong Guo, Lu Wang, Guotai Wang, Mobarakol Islam, Bharat Giddwani, Ren Hongliang, Theodoros Pissas, Claudio Ravasio, Martin Huber, Jeremy Birch, Joan M.Nunez Do Rio, Lyndon da Cruz, Christos Bergeles, Hongyu Chen, Fucang Jia, Nikhil KumarTomar, Debesh Jha, Michael A. Riegler, Pal Halvorsen, Sophia Bano, Uddhav Vaghela, Jianyuan Hong, Haili Ye, Feihong Huang, Da-Han Wang, Danail Stoyanov
备注:

点击查看摘要

Abstract: Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presence information. In 2020, we released pixel-wise semantic annotations for anatomy and instruments for 4670 images sampled from 25 videos of the CATARACTS training set. The 2020 CATARACTS Semantic Segmentation Challenge, which was a sub-challenge of the 2020 MICCAI Endoscopic Vision (EndoVis) Challenge, presented three sub-tasks to assess participating solutions on anatomical structure and instrument segmentation. Their performance was assessed on a hidden test set of 531 images from 10 videos of the CATARACTS test set.

摘要:手术场景分割对于解剖结构和器械定位至关重要,可进一步用于评估手术过程中组织与器械的相互作用。2017 年,白内障手术自动工具标注挑战赛(CATARACTS)发布了 50 段白内障手术视频及其器械使用标注,这些标注包含帧级的器械存在信息。2020 年,我们为从 CATARACTS 训练集 25 段视频中采样的 4670 张图像发布了解剖结构和器械的像素级语义标注。2020 年 CATARACTS 语义分割挑战赛是 2020 年 MICCAI 内窥镜视觉(EndoVis)挑战赛的子挑战,设置了三个子任务来评估参赛方案在解剖结构和器械分割上的表现。参赛方案的性能在一个由 CATARACTS 测试集 10 段视频中的 531 张图像组成的隐藏测试集上进行评估。

CV-121-标题: Evaluation of Various Open-Set Medical Imaging Tasks with Deep Neural Networks

链接: https://arxiv.org/abs/2110.10888
作者: Zongyuan Ge, Xin Wang
备注:

点击查看摘要

Abstract: The current generation of deep neural networks has achieved close-to-human results on “closed-set” image recognition; that is, the classes being evaluated overlap with the training classes. Many recent methods attempt to address the importance of the unknown, which are termed “open-set” recognition algorithms, try to reject unknown classes as well as maintain high recognition accuracy on known classes. However, it is still unclear how different general domain-trained open-set methods from ImageNet would perform on a different but more specific domain, such as the medical domain. Without principled and formal evaluations to measure the effectiveness of those general open-set methods, artificial intelligence (AI)-based medical diagnostics would experience ineffective adoption and increased risks of bad decision making. In this paper, we conduct rigorous evaluations amongst state-of-the-art open-set methods, exploring different open-set scenarios from “similar-domain” to “different-domain” scenarios and comparing them on various general and medical domain datasets. We summarise the results and core ideas and explain how the models react to various degrees of openness and different distributions of open classes. We show the main difference between general domain-trained and medical domain-trained open-set models with our quantitative and qualitative analysis of the results. We also identify aspects of model robustness in real clinical workflow usage according to confidence calibration and the inference efficiency.

摘要:当前一代深度神经网络在"闭集"图像识别(即被评估的类别与训练类别重叠)上已取得接近人类的结果。许多最近的方法试图解决未知类别的重要性,这类方法被称为"开集"识别算法,其目标是在拒绝未知类别的同时保持对已知类别的高识别准确率。然而,在 ImageNet 上训练的各种通用领域开集方法在医学领域等不同且更具体的领域中表现如何,仍不清楚。如果没有原则性的、正式的评估来衡量这些通用开集方法的有效性,基于人工智能(AI)的医学诊断将面临无效应用和错误决策风险增加的问题。在本文中,我们对最先进的开集方法进行了严格评估,探索了从"相似领域"到"不同领域"的各种开集场景,并在多个通用和医学领域数据集上对它们进行比较。我们总结了结果和核心思想,并解释了这些模型对不同程度的开放性以及不同的开放类别分布的反应。通过对结果的定量和定性分析,我们展示了通用领域训练与医学领域训练的开集模型之间的主要差异。我们还根据置信度校准和推理效率,指出了模型在真实临床工作流程使用中的稳健性问题。

CV-122-标题: CXR-Net: An Encoder-Decoder-Encoder Multitask Deep Neural Network for Explainable and Accurate Diagnosis of COVID-19 pneumonia with Chest X-ray Images

链接: https://arxiv.org/abs/2110.10813
作者: Xin Zhang, Liangxiu Han, Tam Sobeih, Lianghao Han, Nina Dempsey, Symeon Lechareas, Ascanio Tridente, Haoming Chen, Stephen White
备注:

点击查看摘要

Abstract: Accurate and rapid detection of COVID-19 pneumonia is crucial for optimal patient treatment. Chest X-Ray (CXR) is the first line imaging test for COVID-19 pneumonia diagnosis as it is fast, cheap and easily accessible. Inspired by the success of deep learning (DL) in computer vision, many DL-models have been proposed to detect COVID-19 pneumonia using CXR images. Unfortunately, these deep classifiers lack the transparency in interpreting findings, which may limit their applications in clinical practice. The existing commonly used visual explanation methods are either too noisy or imprecise, with low resolution, and hence are unsuitable for diagnostic purposes. In this work, we propose a novel explainable deep learning framework (CXRNet) for accurate COVID-19 pneumonia detection with an enhanced pixel-level visual explanation from CXR images. The proposed framework is based on a new Encoder-Decoder-Encoder multitask architecture, allowing for both disease classification and visual explanation. The method has been evaluated on real world CXR datasets from both public and private data sources, including: healthy, bacterial pneumonia, viral pneumonia and COVID-19 pneumonia cases The experimental results demonstrate that the proposed method can achieve a satisfactory level of accuracy and provide fine-resolution classification activation maps for visual explanation in lung disease detection. The Average Accuracy, the Precision, Recall and F1-score of COVID-19 pneumonia reached 0.879, 0.985, 0.992 and 0.989, respectively. We have also found that using lung segmented (CXR) images can help improve the performance of the model. The proposed method can provide more detailed high resolution visual explanation for the classification decision, compared to current state-of-the-art visual explanation methods and has a great potential to be used in clinical practice for COVID-19 pneumonia diagnosis.

摘要:准确、快速地检测 COVID-19 肺炎对患者的最佳治疗至关重要。胸部 X 光(CXR)快速、廉价且易于获得,是 COVID-19 肺炎诊断的一线影像检查。受深度学习(DL)在计算机视觉领域成功的启发,许多 DL 模型被提出用于利用 CXR 图像检测 COVID-19 肺炎。遗憾的是,这些深度分类器在解释结果方面缺乏透明度,可能限制其在临床实践中的应用。现有常用的可视化解释方法要么噪声过大,要么不够精确且分辨率低,因此不适合诊断目的。在这项工作中,我们提出一种新颖的可解释深度学习框架(CXRNet),用于准确检测 COVID-19 肺炎,并从 CXR 图像中提供增强的像素级可视化解释。该框架基于新的编码器-解码器-编码器多任务架构,同时支持疾病分类和可视化解释。该方法在来自公共和私有数据源的真实 CXR 数据集上进行了评估,包括健康、细菌性肺炎、病毒性肺炎和 COVID-19 肺炎病例。实验结果表明,该方法可以达到令人满意的准确率,并为肺部疾病检测提供细分辨率的分类激活图用于可视化解释。COVID-19 肺炎的平均准确率、精确率、召回率和 F1 分数分别达到 0.879、0.985、0.992 和 0.989。我们还发现,使用肺部分割后的 CXR 图像有助于提升模型性能。与当前最先进的可视化解释方法相比,该方法能为分类决策提供更详细的高分辨率可视化解释,在 COVID-19 肺炎诊断的临床实践中具有巨大的应用潜力。

CV-123-标题: Toward Real-world Image Super-resolution via Hardware-based Adaptive Degradation Models

链接: https://arxiv.org/abs/2110.10755
作者: Rui Ma, Johnathan Czernik, Xian Du
备注:

点击查看摘要

Abstract: Most single image super-resolution (SR) methods are developed on synthetic low-resolution (LR) and high-resolution (HR) image pairs, which are simulated by a predetermined degradation operation, e.g., bicubic downsampling. However, these methods only learn the inverse process of the predetermined operation, so they fail to super resolve the real-world LR images; the true formulation deviates from the predetermined operation. To address this problem, we propose a novel supervised method to simulate an unknown degradation process with the inclusion of the prior hardware knowledge of the imaging system. We design an adaptive blurring layer (ABL) in the supervised learning framework to estimate the target LR images. The hyperparameters of the ABL can be adjusted for different imaging hardware. The experiments on the real-world datasets validate that our degradation model can estimate LR images more accurately than the predetermined degradation operation, as well as facilitate existing SR methods to perform reconstructions on real-world LR images more accurately than the conventional approaches.

摘要:大多数单图像超分辨率(SR)方法是在合成的低分辨率(LR)与高分辨率(HR)图像对上开发的,这些图像对由预先设定的退化操作(例如双三次降采样)模拟得到。然而,这些方法只学习了该预设操作的逆过程,因此无法对真实世界的 LR 图像进行超分辨;真实的成像过程偏离了预设操作。为解决这一问题,我们提出一种新颖的有监督方法,结合成像系统的硬件先验知识来模拟未知的退化过程。我们在有监督学习框架中设计了自适应模糊层(ABL)来估计目标 LR 图像,ABL 的超参数可以针对不同的成像硬件进行调整。在真实世界数据集上的实验验证了我们的退化模型能够比预设退化操作更准确地估计 LR 图像,并帮助现有 SR 方法比传统方案更准确地重建真实世界的 LR 图像。

CV-124-标题: Combining Different V1 Brain Model Variants to Improve Robustness to Image Corruptions in CNNs

链接: https://arxiv.org/abs/2110.10645
作者: Avinash Baidya, Joel Dapello, James J. DiCarlo, Tiago Marques
备注: 15 pages with supplementary material, 3 main figures, 2 supplementary figures, 4 supplementary tables

点击查看摘要

Abstract: While some convolutional neural networks (CNNs) have surpassed human visual abilities in object classification, they often struggle to recognize objects in images corrupted with different types of common noise patterns, highlighting a major limitation of this family of models. Recently, it has been shown that simulating a primary visual cortex (V1) at the front of CNNs leads to small improvements in robustness to these image perturbations. In this study, we start with the observation that different variants of the V1 model show gains for specific corruption types. We then build a new model using an ensembling technique, which combines multiple individual models with different V1 front-end variants. The model ensemble leverages the strengths of each individual model, leading to significant improvements in robustness across all corruption categories and outperforming the base model by 38% on average. Finally, we show that using distillation, it is possible to partially compress the knowledge in the ensemble model into a single model with a V1 front-end. While the ensembling and distillation techniques used here are hardly biologically-plausible, the results presented here demonstrate that by combining the specific strengths of different neuronal circuits in V1 it is possible to improve the robustness of CNNs for a wide range of perturbations.

摘要:虽然一些卷积神经网络(CNN)在物体分类上已超越人类的视觉能力,但它们往往难以识别被各类常见噪声模式破坏的图像中的物体,这凸显了这一类模型的重大局限。最近的研究表明,在 CNN 前端模拟初级视觉皮层(V1)能够小幅提升模型对这些图像扰动的鲁棒性。在本研究中,我们首先观察到 V1 模型的不同变体对特定破坏类型表现出不同的增益。随后,我们使用集成技术构建了一个新模型,将多个带有不同 V1 前端变体的单独模型组合起来。该集成模型利用了每个单独模型的优势,在所有破坏类别上的鲁棒性均得到显著提升,平均比基础模型高出 38%。最后,我们展示了利用蒸馏可以将集成模型中的知识部分压缩到一个带 V1 前端的单一模型中。尽管这里使用的集成和蒸馏技术在生物学上并不合理,但所呈现的结果表明,通过组合 V1 中不同神经元回路的特定优势,可以提升 CNN 对多种扰动的鲁棒性。
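模型集成最简单的形式是对各成员模型输出的类别概率取平均,利用不同成员在不同破坏类型上的互补优势。下面是一个示意(论文的集成方式可能更复杂,这里的函数名和玩具模型均为假设):

```python
def ensemble_probs(models, x):
    """对多个模型输出的类别概率逐类取平均(集成的最简形式)。"""
    outs = [m(x) for m in models]
    n_cls = len(outs[0])
    return [sum(o[i] for o in outs) / len(outs) for i in range(n_cls)]

# 两个玩具"V1 变体"模型,对同一输入给出不同的类别概率
m1 = lambda x: [0.9, 0.1]
m2 = lambda x: [0.5, 0.5]
avg = ensemble_probs([m1, m2], None)
```

蒸馏则可以再训练单一模型去拟合 `ensemble_probs` 的软输出,从而把集成的知识部分压缩进一个模型。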

CV-125-标题: OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data

链接: https://arxiv.org/abs/2110.10640
作者: Christoph Reich, Tim Prangemeier, Özdemir Cetin, Heinz Koeppl
备注: BMVC 2021 (accepted), this https URL (code)

点击查看摘要

Abstract: Convolutional neural networks (CNNs) are the current state-of-the-art meta-algorithm for volumetric segmentation of medical data, for example, to localize COVID-19 infected tissue on computer tomography scans or the detection of tumour volumes in magnetic resonance imaging. A key limitation of 3D CNNs on voxelised data is that the memory consumption grows cubically with the training data resolution. Occupancy networks (O-Nets) are an alternative for which the data is represented continuously in a function space and 3D shapes are learned as a continuous decision boundary. While O-Nets are significantly more memory efficient than 3D CNNs, they are limited to simple shapes, are relatively slow at inference, and have not yet been adapted for 3D semantic segmentation of medical data. Here, we propose Occupancy Networks for Semantic Segmentation (OSS-Nets) to accurately and memory-efficiently segment 3D medical data. We build upon the original O-Net with modifications for increased expressiveness leading to improved segmentation performance comparable to 3D CNNs, as well as modifications for faster inference. We leverage local observations to represent complex shapes and prior encoder predictions to expedite inference. We showcase OSS-Net’s performance on 3D brain tumour and liver segmentation against a function space baseline (O-Net), a performance baseline (3D residual U-Net), and an efficiency baseline (2D residual U-Net). OSS-Net yields segmentation results similar to the performance baseline and superior to the function space and efficiency baselines. In terms of memory efficiency, OSS-Net consumes comparable amounts of memory as the function space baseline, somewhat more memory than the efficiency baseline and significantly less than the performance baseline. As such, OSS-Net enables memory-efficient and accurate 3D semantic segmentation that can scale to high resolutions.

摘要:卷积神经网络(CNN)是当前用于医学数据体素分割的最先进元算法,例如在计算机断层扫描中定位 COVID-19 感染组织,或在磁共振成像中检测肿瘤体积。3D CNN 在体素化数据上的一个关键限制是,内存消耗随训练数据分辨率呈立方增长。占用网络(O-Net)是一种替代方案,它在函数空间中连续地表示数据,并将 3D 形状学习为连续的决策边界。虽然 O-Net 的内存效率显著高于 3D CNN,但它仅限于简单形状、推理相对较慢,且尚未被用于医学数据的 3D 语义分割。在这里,我们提出用于语义分割的占用网络(OSS-Net),以准确且内存高效地分割 3D 医学数据。我们在原始 O-Net 的基础上进行了改进:一方面增强表达能力,使分割性能可与 3D CNN 媲美;另一方面加快推理速度。我们利用局部观测来表示复杂形状,并利用先前的编码器预测来加速推理。我们在 3D 脑肿瘤和肝脏分割任务上,将 OSS-Net 与函数空间基线(O-Net)、性能基线(3D 残差 U-Net)和效率基线(2D 残差 U-Net)进行了对比。OSS-Net 的分割结果与性能基线相当,并优于函数空间基线和效率基线。在内存效率方面,OSS-Net 的内存消耗与函数空间基线相当,略高于效率基线,显著低于性能基线。因此,OSS-Net 实现了内存高效且准确的、可扩展到高分辨率的 3D 语义分割。
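"把 3D 形状表示为连续决策边界"的含义是:学习一个函数 $f:\mathbb{R}^3 \to [0,1]$,对任意查询点输出其在形状内部的概率,因此内存不随分辨率立方增长。下面用一个解析的球体占用函数做玩具示意(并非 OSS-Net 的网络实现,参数均为随意取值):

```python
import math

def occupancy(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """玩具占用函数:球体 + 有符号距离的 sigmoid,
    输出查询点位于形状内部的"概率"。"""
    d = math.sqrt(sum((p - c) ** 2 for p, c in zip(point, center)))
    return 1.0 / (1.0 + math.exp(8.0 * (d - radius)))   # 内部 -> 接近 1

inside = occupancy((0.0, 0.0, 0.0))   # 球心,应接近 1
outside = occupancy((2.0, 0.0, 0.0))  # 球外,应接近 0
```

在占用网络中,这个解析函数被一个以点坐标(及图像特征)为输入的神经网络取代,分割即在任意分辨率的查询点上求值。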

CV-126-标题: Development and accuracy evaluation of Coded Phase-shift 3D scanner

链接: https://arxiv.org/abs/2110.10520
作者: Pranav Kant Gaur, D.M.Sarode, S.K.Bose
备注:

点击查看摘要

Abstract: In this paper, we provide an overview of development of a structured light 3D-scanner based on combination of binary-coded patterns and sinusoidal phase-shifted fringe patterns called Coded Phase-shift technique. Further, we describe the experiments performed to evaluate measurement accuracy and precision of the developed system. A study of this kind is expected to be helpful in understanding the basic working of current structured-light 3D scanners and the approaches followed for their performance assessment.

摘要:在本文中,我们介绍了一种基于二值编码图案与正弦相移条纹图案相结合的结构光 3D 扫描仪(称为编码相移技术)的研制过程。此外,我们描述了为评估所研制系统的测量准确度和精密度而进行的实验。这类研究有望帮助理解当前结构光 3D 扫描仪的基本工作原理及其性能评估方法。
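相移法的核心是从若干幅相移条纹图像中恢复每个像素的包裹相位。以标准的四步相移为例(这是通用关系式,并非该论文特有):若四幅图像的强度为 $I_k = A + B\cos(\varphi + k\pi/2)$,$k=0,\dots,3$,则包裹相位为 $\varphi = \operatorname{atan2}(I_4 - I_2,\ I_1 - I_3)$。二值编码图案随后用于解包裹,即确定相位所在的条纹周期。

```python
import math

def four_step_phase(i1, i2, i3, i4):
    """标准四步相移:由 I_k = A + B*cos(phi + k*pi/2) 恢复包裹相位 phi。"""
    return math.atan2(i4 - i2, i1 - i3)

# 合成验证:A = 1.0,B = 0.5,真实相位 0.7 rad
A, B, phi = 1.0, 0.5, 0.7
imgs = [A + B * math.cos(phi + k * math.pi / 2) for k in range(4)]
recovered = four_step_phase(*imgs)
```

注意该公式消去了背景光强 A 和调制度 B,这正是相移法对环境光照鲁棒的原因。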

CV-127-标题: Evaluation of augmentation methods in classifying autism spectrum disorders from fMRI data with 3D convolutional neural networks

链接: https://arxiv.org/abs/2110.10489
作者: Johan Jönemo, David Abramian, Anders Eklund
备注:

点击查看摘要

Abstract: Classifying subjects as healthy or diseased using neuroimaging data has gained a lot of attention during the last 10 years. Here we apply deep learning to derivatives from resting state fMRI data, and investigate how different 3D augmentation techniques affect the test accuracy. Specifically, we use resting state derivatives from 1,112 subjects in ABIDE preprocessed to train a 3D convolutional neural network (CNN) to perform the classification. Our results show that augmentation only provide minor improvements to the test accuracy.

摘要:在过去十年中,利用神经影像数据将受试者分类为健康或患病受到了广泛关注。在这里,我们将深度学习应用于静息态 fMRI 数据的衍生量,并研究不同的 3D 数据增强技术如何影响测试准确率。具体而言,我们使用 ABIDE 预处理数据集中 1,112 名受试者的静息态衍生量训练 3D 卷积神经网络(CNN)来执行分类。我们的结果表明,数据增强仅能对测试准确率带来轻微的提升。

CV-128-标题: AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation

链接: https://arxiv.org/abs/2110.10403
作者: Xiangyi Yan, Hao Tang, Shanlin Sun, Haoyu Ma, Deying Kong, Xiaohui Xie
备注:

点击查看摘要

Abstract: Recent advances in transformer-based models have drawn attention to exploring these techniques in medical image segmentation, especially in conjunction with the U-Net model (or its variants), which has shown great success in medical image segmentation, under both 2D and 3D settings. Current 2D based methods either directly replace convolutional layers with pure transformers or consider a transformer as an additional intermediate encoder between the encoder and decoder of U-Net. However, these approaches only consider the attention encoding within one single slice and do not utilize the axial-axis information naturally provided by a 3D volume. In the 3D setting, convolution on volumetric data and transformers both consume large GPU memory. One has to either downsample the image or use cropped local patches to reduce GPU memory usage, which limits its performance. In this paper, we propose Axial Fusion Transformer UNet (AFTer-UNet), which takes both advantages of convolutional layers’ capability of extracting detailed features and transformers’ strength on long sequence modeling. It considers both intra-slice and inter-slice long-range cues to guide the segmentation. Meanwhile, it has fewer parameters and takes less GPU memory to train than the previous transformer-based models. Extensive experiments on three multi-organ segmentation datasets demonstrate that our method outperforms current state-of-the-art methods.

摘要:基于 Transformer 的模型的最新进展引起了人们对在医学图像分割中探索这些技术的关注,特别是与在 2D 和 3D 设置下均已在医学图像分割中取得巨大成功的 U-Net 模型(或其变体)相结合。目前基于 2D 的方法要么直接用纯 Transformer 替换卷积层,要么将 Transformer 作为 U-Net 编码器与解码器之间的额外中间编码器。然而,这些方法只考虑单个切片内的注意力编码,没有利用 3D 体数据天然提供的轴向信息。而在 3D 设置下,体数据上的卷积和 Transformer 都会消耗大量 GPU 内存,只能对图像降采样或使用裁剪的局部图块来减少 GPU 内存占用,这限制了性能。在本文中,我们提出轴向融合 Transformer UNet(AFTer-UNet),它同时利用了卷积层提取细节特征的能力和 Transformer 在长序列建模上的优势,并同时考虑切片内和切片间的长程线索来指导分割。同时,它的参数更少,训练所需的 GPU 内存也少于此前基于 Transformer 的模型。在三个多器官分割数据集上的大量实验表明,我们的方法优于当前最先进的方法。

CV-129-标题: Deep Learning for HDR Imaging: State-of-the-Art and Future Trends

链接: https://arxiv.org/abs/2110.10394
作者: Lin Wang, Kuk-Jin Yoon
备注: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

点击查看摘要

Abstract: High dynamic range (HDR) imaging is a technique that allows an extensive dynamic range of exposures, which is important in image processing, computer graphics, and computer vision. In recent years, there has been a significant advancement in HDR imaging using deep learning (DL). This study conducts a comprehensive and insightful survey and analysis of recent developments in deep HDR imaging methodologies. We hierarchically and structurally group existing deep HDR imaging methods into five categories based on (1) number/domain of input exposures, (2) number of learning tasks, (3) novel sensor data, (4) novel learning strategies, and (5) applications. Importantly, we provide a constructive discussion on each category regarding its potential and challenges. Moreover, we review some crucial aspects of deep HDR imaging, such as datasets and evaluation metrics. Finally, we highlight some open problems and point out future research directions.

摘要:高动态范围(HDR)成像是一种能够实现大曝光动态范围的技术,在图像处理、计算机图形学和计算机视觉中具有重要意义。近年来,利用深度学习(DL)的 HDR 成像取得了重大进展。本研究对深度 HDR 成像方法的最新进展进行了全面而深入的综述和分析。我们按层次和结构将现有的深度 HDR 成像方法分为五类,依据是:(1)输入曝光的数量/域,(2)学习任务数量,(3)新型传感器数据,(4)新型学习策略,以及(5)应用。重要的是,我们对每一类的潜力和挑战进行了建设性讨论。此外,我们回顾了深度 HDR 成像的一些关键方面,例如数据集和评估指标。最后,我们指出了一些开放问题并展望了未来的研究方向。

CV-130-标题: Knowledge-Guided Multiview Deep Curriculum Learning for Elbow Fracture Classification

链接: https://arxiv.org/abs/2110.10383
作者: Jun Luo, Gene Kitamura, Dooman Arefan, Emine Doganay, Ashok Panigrahy, Shandong Wu
备注: MICCAI 2021 workshop. DOI: this https URL URL: this https URL

点击查看摘要

Abstract: Elbow fracture diagnosis often requires patients to take both frontal and lateral views of elbow X-ray radiographs. In this paper, we propose a multiview deep learning method for an elbow fracture subtype classification task. Our strategy leverages transfer learning by first training two single-view models, one for frontal view and the other for lateral view, and then transferring the weights to the corresponding layers in the proposed multiview network architecture. Meanwhile, quantitative medical knowledge was integrated into the training process through a curriculum learning framework, which enables the model to first learn from “easier” samples and then transition to “harder” samples to reach better performance. In addition, our multiview network can work both in a dual-view setting and with a single view as input. We evaluate our method through extensive experiments on a classification task of elbow fracture with a dataset of 1,964 images. Results show that our method outperforms two related methods on bone fracture study in multiple settings, and our technique is able to boost the performance of the compared methods. The code is available at this https URL.

摘要:肘部骨折诊断通常需要患者拍摄肘部 X 光片的正位和侧位两种视图。在本文中,我们针对肘部骨折亚型分类任务提出一种多视图深度学习方法。我们的策略利用迁移学习:先分别训练两个单视图模型(一个用于正位视图,另一个用于侧位视图),再将权重迁移到所提出的多视图网络架构中的相应层。同时,通过课程学习框架将定量的医学知识融入训练过程,使模型先从"较易"的样本中学习,再过渡到"较难"的样本,从而达到更好的性能。此外,我们的多视图网络既可以在双视图设置下工作,也可以仅以单个视图作为输入。我们在包含 1,964 张图像的肘部骨折分类任务数据集上进行了大量实验来评估该方法。结果表明,我们的方法在多种设置下均优于两种相关的骨折研究方法,并且我们的技术能够提升对比方法的性能。代码可通过文中的 https 链接获取。

CV-131-标题: Medical Knowledge-Guided Deep Curriculum Learning for Elbow Fracture Diagnosis from X-Ray Images

链接: https://arxiv.org/abs/2110.10381
作者: Jun Luo, Gene Kitamura, Emine Doganay, Dooman Arefan, Shandong Wu
备注: SPIE Medical Imaging 2021. DOI: this https URL URL: this https URL

点击查看摘要

Abstract: Elbow fractures are one of the most common fracture types. Diagnoses on elbow fractures often need the help of radiographic imaging to be read and analyzed by a specialized radiologist with years of training. Thanks to the recent advances of deep learning, a model that can classify and detect different types of bone fractures needs only hours of training and has shown promising results. However, most existing deep learning models are purely data-driven, lacking incorporation of known domain knowledge from human experts. In this work, we propose a novel deep learning method to diagnose elbow fracture from elbow X-ray images by integrating domain-specific medical knowledge into a curriculum learning framework. In our method, the training data are permutated by sampling without replacement at the beginning of each training epoch. The sampling probability of each training sample is guided by a scoring criterion constructed based on clinically known knowledge from human experts, where the scoring indicates the diagnosis difficultness of different elbow fracture subtypes. We also propose an algorithm that updates the sampling probabilities at each epoch, which is applicable to other sampling-based curriculum learning frameworks. We design an experiment with 1865 elbow X-ray images for a fracture/normal binary classification task and compare our proposed method to a baseline method and a previous method using multiple metrics. Our results show that the proposed method achieves the highest classification performance. Also, our proposed probability update algorithm boosts the performance of the previous method.

摘要:肘部骨折是最常见的骨折类型之一。肘部骨折的诊断通常需要借助X射线成像,由经过多年训练的专业放射科医生进行阅片和分析。得益于近年来深度学习的进展,一个能够分类和检测不同类型骨折的模型只需数小时的训练即可取得有前景的结果。然而,大多数现有的深度学习模型是纯数据驱动的,缺乏对人类专家已知领域知识的融合。在这项工作中,我们通过将特定领域的医学知识集成到课程学习框架中,提出了一种从肘部X射线图像诊断肘部骨折的新型深度学习方法。在我们的方法中,训练数据在每个训练轮次(epoch)开始时通过无放回采样进行重排。每个训练样本的采样概率由基于人类专家临床知识构建的评分标准指导,评分反映不同肘部骨折亚型的诊断难度。我们还提出了一种在每个轮次更新采样概率的算法,它也适用于其他基于采样的课程学习框架。我们用1865张肘部X射线图像设计了骨折/正常二分类实验,并使用多个指标将所提方法与一种基线方法和一种已有方法进行比较。结果表明,所提方法取得了最高的分类性能,且我们提出的概率更新算法也提升了已有方法的性能。
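
作为补充说明,下面是摘要中“评分引导的无放回采样 + 逐轮更新采样概率”思路的一个极简Python示意;函数名与“向均值衰减”的更新规则均为本文假设的演示写法,并非论文的原始算法。

```python
import random

def curriculum_order(scores):
    """Permute sample indices by sampling without replacement,
    where easier samples (higher score) tend to be drawn earlier."""
    weights = list(scores)
    remaining = list(range(len(scores)))
    order = []
    while remaining:
        total = sum(weights[i] for i in remaining)
        r = random.uniform(0, total)
        acc = 0.0
        for i in remaining:
            acc += weights[i]
            if acc >= r:
                order.append(i)
                remaining.remove(i)
                break
    return order

def update_scores(scores, decay=0.9):
    """One plausible per-epoch update: shrink scores toward their mean,
    so later epochs approach ordinary uniform shuffling."""
    mean = sum(scores) / len(scores)
    return [mean + decay * (s - mean) for s in scores]
```

每个epoch开始时调用一次 `curriculum_order` 得到本轮的样本顺序,再用 `update_scores` 更新难度评分即可。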

CV-132-标题: Artificial Intelligence-Based Detection, Classification and Prediction/Prognosis in PET Imaging: Towards Radiophenomics

链接: https://arxiv.org/abs/2110.10332
作者: Fereshteh Yousefirizi, Pierre Decasez, Amine Amyar, Su Ruan, Babak Saboury, Arman Rahmim
备注:

点击查看摘要

Abstract: Artificial intelligence (AI) techniques have significant potential to enable effective, robust, and automated image phenotyping including identification of subtle patterns. AI-based detection searches the image space to find the regions of interest based on patterns and features. There is a spectrum of tumor histologies from benign to malignant that can be identified by AI-based classification approaches using image features. The extraction of minable information from images gives way to the field of radiomics and can be explored via explicit (handcrafted/engineered) and deep radiomics frameworks. Radiomics analysis has the potential to be utilized as a noninvasive technique for the accurate characterization of tumors to improve diagnosis and treatment monitoring. This work reviews AI-based techniques, with a special focus on oncological PET and PET/CT imaging, for different detection, classification, and prediction/prognosis tasks. We also discuss needed efforts to enable the translation of AI techniques to routine clinical workflows, and potential improvements and complementary techniques such as the use of natural language processing on electronic health records and neuro-symbolic AI techniques.

摘要:人工智能(AI)技术在实现有效、鲁棒且自动化的图像表型分析(包括细微模式的识别)方面具有显著潜力。基于AI的检测在图像空间中根据模式和特征搜索感兴趣区域。从良性到恶性的一系列肿瘤组织学类型可以通过利用图像特征的基于AI的分类方法加以识别。从图像中提取可挖掘的信息催生了影像组学(radiomics)领域,可通过显式(手工设计)和深度影像组学框架进行探索。影像组学分析有潜力作为一种无创技术,用于准确表征肿瘤,以改善诊断和治疗监测。本文综述了基于AI的技术,特别关注肿瘤PET和PET/CT成像中的各种检测、分类和预测/预后任务。我们还讨论了将AI技术转化为常规临床工作流程所需的努力,以及潜在的改进和互补技术,例如在电子健康记录上使用自然语言处理,以及神经符号AI技术。

CV-133-标题: A New Automatic Change Detection Framework Based on Region Growing and Weighted Local Mutual Information: Analysis of Breast Tumor Response to Chemotherapy in Serial MR Images

链接: https://arxiv.org/abs/2110.10242
作者: Narges Norouzi, Reza Azmi, Nooshin Noshiri, Robab Anbiaee
备注: 18 pages, 16 figures, 14 tables

点击查看摘要

Abstract: The automatic analysis of subtle changes between longitudinal MR images is an important task as it is still a challenging issue in scope of the breast medical image processing. In this paper we propose an effective automatic change detection framework composed of two phases since previously used methods have features with low distinctive power. First, in the preprocessing phase an intensity normalization method is suggested based on Hierarchical Histogram Matching (HHM) that is more robust to noise than previous methods. To eliminate undesirable changes and extract the regions containing significant changes the proposed Extraction Region of Changes (EROC) method is applied based on intensity distribution and Hill-Climbing algorithm. Second, in the detection phase a region growing-based approach is suggested to differentiate significant changes from unreal ones. Due to using proposed Weighted Local Mutual Information (WLMI) method to extract high level features and also utilizing the principle of the local consistency of changes, the proposed approach enjoys reasonable performance. The experimental results on both simulated and real longitudinal Breast MR Images confirm the effectiveness of the proposed framework. Also, this framework outperforms the human expert in some cases which can detect many lesion evolutions that are missed by expert.

摘要:纵向MR图像之间细微变化的自动分析是一项重要任务,在乳腺医学图像处理领域仍然是一个具有挑战性的问题。由于以往方法所用特征的判别力较低,本文提出了一种由两个阶段组成的有效自动变化检测框架。首先,在预处理阶段,提出了一种基于分层直方图匹配(HHM)的强度归一化方法,它比以往方法对噪声更鲁棒。为了消除不希望的变化并提取包含显著变化的区域,基于强度分布和爬山算法应用了所提出的变化区域提取(EROC)方法。其次,在检测阶段,提出了一种基于区域生长的方法,以区分显著变化与虚假变化。由于使用所提出的加权局部互信息(WLMI)方法提取高层特征,并利用变化的局部一致性原理,所提方法具有良好的性能。在模拟和真实纵向乳腺MR图像上的实验结果证实了所提框架的有效性。此外,该框架在某些情况下优于人类专家,能够检测出专家遗漏的许多病变演变。

CV-134-标题: Cross-Sim-NGF: FFT-Based Global Rigid Multimodal Alignment of Image Volumes using Normalized Gradient Fields

链接: https://arxiv.org/abs/2110.10156
作者: Johan Öfverstedt, Joakim Lindblad, Nataša Sladoje
备注: 5 pages, 3 figures, 3 tables. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

点击查看摘要

Abstract: Multimodal image alignment involves finding spatial correspondences between volumes varying in appearance and structure. Automated alignment methods are often based on local optimization that can be highly sensitive to their initialization. We propose a global optimization method for rigid multimodal 3D image alignment, based on a novel efficient algorithm for computing similarity of normalized gradient fields (NGF) in the frequency domain. We validate the method experimentally on a dataset comprised of 20 brain volumes acquired in four modalities (T1w, Flair, CT, [18F] FDG PET), synthetically displaced with known transformations. The proposed method exhibits excellent performance on all six possible modality combinations, and outperforms all four reference methods by a large margin. The method is fast; a 3.4Mvoxel global rigid alignment requires approximately 40 seconds of computation, and the proposed algorithm outperforms a direct algorithm for the same task by more than three orders of magnitude. Open-source implementation is provided.

摘要:多模态图像对齐涉及在外观和结构各异的图像体之间寻找空间对应关系。自动对齐方法通常基于局部优化,对初始化非常敏感。我们提出了一种用于刚性多模态3D图像对齐的全局优化方法,它基于一种在频域中计算归一化梯度场(NGF)相似性的新型高效算法。我们在一个数据集上对该方法进行了实验验证,该数据集由以四种模态(T1w、Flair、CT、[18F] FDG PET)采集的20个脑部图像体组成,并以已知变换进行了合成位移。所提方法在全部六种可能的模态组合上均表现出色,并以较大优势优于全部四种参考方法。该方法速度很快:一次340万体素的全局刚性对齐约需40秒的计算,并且所提算法在同一任务上比直接算法快三个数量级以上。文中提供了开源实现。
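
下面用numpy给出归一化梯度场(NGF)相似度的一个极简示意(仅为空间域的直接实现,并非论文中基于FFT的频域算法);`eps` 等参数为假设值:

```python
import numpy as np

def ngf(image, eps=1e-2):
    """Normalized gradient field: gradient divided by its regularized
    magnitude, so only edge *orientation* matters, not modality-specific
    intensity."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.sqrt(gx ** 2 + gy ** 2 + eps ** 2)
    return gx / mag, gy / mag

def ngf_similarity(a, b, eps=1e-2):
    """Sum of squared dot products between the two fields: large when
    edges in `a` and `b` are (anti)parallel."""
    ax, ay = ngf(a, eps)
    bx, by = ngf(b, eps)
    return np.sum((ax * bx + ay * by) ** 2)
```

论文的贡献在于把这一相似度对所有刚性位移的取值通过FFT一次性算出,从而做全局搜索;上面的草图只展示度量本身。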

人工智能

AI-0-标题: Detecting Important Patterns Using Conceptual Relevance Interestingness Measure

链接: https://arxiv.org/abs/2110.11262
作者: Mohamed-Hamza Ibrahim, Rokia Missaoui, Jean Vaillancourt
备注:

点击查看摘要

Abstract: Discovering meaningful conceptual structures is a substantial task in data mining and knowledge discovery applications. While off-the-shelf interestingness indices defined in Formal Concept Analysis may provide an effective relevance evaluation in several situations, they frequently give inadequate results when faced with massive formal contexts (and concept lattices), and in the presence of irrelevant concepts. In this paper, we introduce the Conceptual Relevance (CR) score, a new scalable interestingness measurement for the identification of actionable concepts. From a conceptual perspective, the minimal generators provide key information about their associated concept intent. Furthermore, the relevant attributes of a concept are those that maintain the satisfaction of its closure condition. Thus, the guiding idea of CR exploits the fact that minimal generators and relevant attributes can be efficiently used to assess concept relevance. As such, the CR index quantifies both the amount of conceptually relevant attributes and the number of the minimal generators per concept intent. Our experiments on synthetic and real-world datasets show the efficiency of this measure over the well-known stability index.

摘要:发现有意义的概念结构是数据挖掘和知识发现应用中的一项重要任务。虽然形式概念分析中定义的现成兴趣度指标在若干情况下可以提供有效的相关性评估,但在面对大规模形式背景(及概念格)以及存在不相关概念时,它们经常给出不理想的结果。在本文中,我们引入了概念相关性(CR)得分,这是一种用于识别可行动概念的新型可扩展兴趣度度量。从概念的角度看,极小生成元提供了关于其关联概念内涵的关键信息。此外,一个概念的相关属性是那些维持其闭包条件成立的属性。因此,CR的指导思想利用了这样一个事实:极小生成元和相关属性可以被高效地用于评估概念相关性。由此,CR指数同时量化了概念相关属性的数量和每个概念内涵的极小生成元的数量。我们在合成和真实数据集上的实验表明,该度量比著名的稳定性指数更高效。

AI-1-标题: Improving the Search by Encoding Multiple Solutions in a Chromosome

链接: https://arxiv.org/abs/2110.11239
作者: Mihai Oltean
备注: 7 figures

点击查看摘要

Abstract: We investigate the possibility of encoding multiple solutions of a problem in a single chromosome. The best solution encoded in an individual will represent (will provide the fitness of) that individual. In order to obtain some benefits the chromosome decoding process must have the same complexity as in the case of a single solution in a chromosome. Three Genetic Programming techniques are analyzed for this purpose: Multi Expression Programming, Linear Genetic Programming, and Infix Form Genetic Programming. Numerical experiments show that encoding multiple solutions in a chromosome greatly improves the search process.

摘要:我们研究了在单条染色体中编码一个问题的多个解的可能性。个体中编码的最优解将代表该个体(即提供该个体的适应度)。为了获得实际收益,染色体的解码过程必须与染色体只编码单个解时具有相同的复杂度。为此分析了三种遗传编程技术:多表达式编程(Multi Expression Programming)、线性遗传编程和中缀形式遗传编程。数值实验表明,在染色体中编码多个解可以大大改进搜索过程。
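
以下是“一条染色体编码多个解”思想的一个简化Python示意,接近多表达式编程的线性解码:每个基因要么是终端(输入变量下标),要么引用此前基因的运算,一次线性扫描即可求出所有基因的值,适应度取最优基因。具体编码与误差度量均为本文假设的演示写法。

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate_chromosome(genes, inputs, target):
    # values[i][k]: value of gene i on input case k
    values = []
    for gene in genes:
        if isinstance(gene, tuple):          # operator gene: (op, left_gene, right_gene)
            op, l, r = gene
            values.append([OPS[op](a, b) for a, b in zip(values[l], values[r])])
        else:                                # terminal gene: index of an input variable
            values.append([case[gene] for case in inputs])
    # one decoding pass yields every encoded expression; the best one
    # provides the chromosome's fitness
    errors = [sum(abs(v - t) for v, t in zip(vals, target)) for vals in values]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]
```

由于运算基因只引用先前的基因,解码复杂度与单解染色体相同,却同时评估了每个基因对应的表达式。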

AI-2-标题: Accelerating Genetic Programming using GPUs

链接: https://arxiv.org/abs/2110.11226
作者: Vimarsh Sathia (1), Venkataramana Ganesh (2), Shankara Rao Thejaswi Nanditale (2) ((1) Indian Institute of Technology Madras, (2) NVIDIA Corporation)
备注: 10 pages, 4 figures

点击查看摘要

Abstract: Genetic Programming (GP), an evolutionary learning technique, has multiple applications in machine learning such as curve fitting, data modelling, feature selection, classification etc. GP has several inherent parallel steps, making it an ideal candidate for GPU based parallelization. This paper describes a GPU accelerated stack-based variant of the generational GP algorithm which can be used for symbolic regression and binary classification. The selection and evaluation steps of the generational GP algorithm are parallelized using CUDA. We introduce representing candidate solution expressions as prefix lists, which enables evaluation using a fixed-length stack in GPU memory. CUDA based matrix vector operations are also used for computation of the fitness of population programs. We evaluate our algorithm on synthetic datasets for the Pagie Polynomial (ranging in size from 4096 to 16 million points), profiling training times of our algorithm with other standard symbolic regression libraries viz. gplearn, TensorGP and KarooGP. In addition, using 6 large-scale regression and classification datasets usually used for comparing gradient boosting algorithms, we run performance benchmarks on our algorithm and gplearn, profiling the training time, test accuracy, and loss. On an NVIDIA DGX-A100 GPU, our algorithm outperforms all the previously listed frameworks, and in particular, achieves average speedups of 119× and 40× against gplearn on the synthetic and large scale datasets respectively.

摘要:遗传编程(GP)是一种进化学习技术,在机器学习中有多种应用,如曲线拟合、数据建模、特征选择、分类等。GP具有多个固有的并行步骤,使其成为基于GPU并行化的理想候选。本文介绍了世代GP算法的一种GPU加速的基于栈的变体,可用于符号回归和二分类。世代GP算法的选择和评估步骤使用CUDA并行化。我们引入了将候选解表达式表示为前缀列表的方法,从而可以使用GPU内存中的定长栈进行评估。基于CUDA的矩阵-向量运算也被用于计算种群程序的适应度。我们在Pagie多项式的合成数据集(规模从4096到1600万个点)上评估了我们的算法,并与其他标准符号回归库(gplearn、TensorGP和KarooGP)对比了训练时间。此外,我们使用通常用于比较梯度提升算法的6个大规模回归和分类数据集,对我们的算法和gplearn进行了性能基准测试,分析了训练时间、测试精度和损失。在NVIDIA DGX-A100 GPU上,我们的算法优于前述所有框架,特别是在合成数据集和大规模数据集上相对gplearn分别实现了平均119倍和40倍的加速。
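
作为示意,下面是摘要中“前缀列表 + 定长栈”求值思路的单线程Python草图(GPU上每个线程对一个样本执行等价的扫描);记号集合与栈大小均为本文假设,并非该库的真实接口。

```python
import operator

BINOPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def eval_prefix(program, x, stack_size=32):
    """Evaluate a prefix-encoded expression with a fixed-length stack,
    scanning the token list right-to-left: operands are pushed, binary
    operators pop two values and push the result."""
    stack = [0.0] * stack_size
    top = 0
    for token in reversed(program):
        if token in BINOPS:
            a, b = stack[top - 1], stack[top - 2]
            top -= 2
            stack[top] = BINOPS[token](a, b)
            top += 1
        elif token == "x":          # input variable
            stack[top] = x
            top += 1
        else:                       # numeric constant
            stack[top] = float(token)
            top += 1
    return stack[top - 1]
```

例如 `["add", "mul", "x", "x", "1"]` 表示 x*x + 1;定长栈使得GPU上无需动态内存分配即可并行求值。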

AI-3-标题: A Survey on Methods and Metrics for the Assessment of Explainability under the Proposed AI Act

链接: https://arxiv.org/abs/2110.11168
作者: Francesco Sovrano, Salvatore Sapienza, Monica Palmirani, Fabio Vitali
备注: Accepted paper at JURIX 2021

点击查看摘要

Abstract: This study discusses the interplay between metrics used to measure the explainability of the AI systems and the proposed EU Artificial Intelligence Act. A standardisation process is ongoing: several entities (e.g. ISO) and scholars are discussing how to design systems that are compliant with the forthcoming Act and explainability metrics play a significant role. This study identifies the requirements that such a metric should possess to ease compliance with the AI Act. It does so according to an interdisciplinary approach, i.e. by departing from the philosophical concept of explainability and discussing some metrics proposed by scholars and standardisation entities through the lenses of the explainability obligations set by the proposed AI Act. Our analysis proposes that metrics to measure the kind of explainability endorsed by the proposed AI Act shall be risk-focused, model-agnostic, goal-aware, intelligible & accessible. This is why we discuss the extent to which these requirements are met by the metrics currently under discussion.

摘要:本研究讨论了用于衡量AI系统可解释性的度量与拟议的欧盟人工智能法案之间的相互作用。标准化进程正在进行:若干实体(如ISO)和学者正在讨论如何设计符合即将出台的法案的系统,而可解释性度量在其中发挥着重要作用。本研究确定了这种度量为便于遵守AI法案所应具备的要求。研究采用跨学科方法,即从可解释性的哲学概念出发,并透过拟议AI法案所设定的可解释性义务,讨论学者和标准化实体提出的一些度量。我们的分析提出,衡量拟议AI法案所认可的可解释性的度量应当是面向风险的、模型无关的、目标感知的、可理解且可获取的。为此,我们讨论了目前正在讨论的各种度量在多大程度上满足这些要求。

AI-4-标题: Applying Second-Order Quantifier Elimination in Inspecting Gödel's Ontological Proof

链接: https://arxiv.org/abs/2110.11108
作者: Christoph Wernhard
备注:

点击查看摘要

Abstract: In recent years, Gödel’s ontological proof and variations of it were formalized and analyzed with automated tools in various ways. We supplement these analyses with a modeling in an automated environment based on first-order logic extended by predicate quantification. Formula macros are used to structure complex formulas and tasks. The analysis is presented as a generated type-set document where informal explanations are interspersed with pretty-printed formulas and outputs of reasoners for first-order theorem proving and second-order quantifier elimination. Previously unnoticed or obscured aspects and details of Gödel’s proof become apparent. Practical application possibilities of second-order quantifier elimination are shown and the encountered elimination tasks may serve as benchmarks.

摘要:近年来,哥德尔的本体论证明及其变体以多种方式被自动化工具形式化和分析。我们用一种基于谓词量化扩展的一阶逻辑的自动化环境中的建模来补充这些分析。公式宏用于组织复杂的公式和任务。分析以生成的排版文档呈现,其中非正式解释与美观打印的公式,以及一阶定理证明和二阶量词消去推理器的输出交织在一起。哥德尔证明中以前未被注意或被掩盖的方面和细节变得清晰。文中展示了二阶量词消去的实际应用可能性,所遇到的消去任务可以用作基准。

AI-5-标题: Enabling a Social Robot to Process Social Cues to Detect when to Help a User

链接: https://arxiv.org/abs/2110.11075
作者: Jason R. Wilson, Phyo Thuta Aung, Isabelle Boucher
备注: Presented at AI-HRI symposium as part of AAAI-FSS 2021 (arXiv:2109.10836)

点击查看摘要

Abstract: It is important for socially assistive robots to be able to recognize when a user needs and wants help. Such robots need to be able to recognize human needs in a real-time manner so that they can provide timely assistance. We propose an architecture that uses social cues to determine when a robot should provide assistance. Based on a multimodal fusion approach upon eye gaze and language modalities, our architecture is trained and evaluated on data collected in a robot-assisted Lego building task. By focusing on social cues, our architecture has minimal dependencies on the specifics of a given task, enabling it to be applied in many different contexts. Enabling a social robot to recognize a user’s needs through social cues can help it to adapt to user behaviors and preferences, which in turn will lead to improved user experiences.

摘要:对于社交辅助机器人而言,能够识别用户何时需要并希望获得帮助非常重要。这类机器人需要能够实时识别人类需求,以便提供及时的帮助。我们提出了一种利用社交线索来判断机器人何时应提供帮助的架构。基于对眼神注视和语言两种模态的多模态融合方法,我们的架构在一项机器人辅助乐高搭建任务中收集的数据上进行了训练和评估。通过关注社交线索,我们的架构对给定任务的具体细节依赖极小,使其能够应用于许多不同的场景。让社交机器人通过社交线索识别用户需求,有助于它适应用户的行为和偏好,进而带来更好的用户体验。

AI-6-标题: Optimizing Multi-Taper Features for Deep Speaker Verification

链接: https://arxiv.org/abs/2110.10983
作者: Xuechen Liu, Md Sahidullah, Tomi Kinnunen
备注: To appear in IEEE Signal Processing Letters

点击查看摘要

Abstract: Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs). Even if past work has reported promising automatic speaker verification (ASV) results with Gaussian mixture model-based classifiers, the performance of multi-taper MFCCs with deep ASV systems remains an open question. Instead of a static-taper design, we propose to optimize the multi-taper estimator jointly with a deep neural network trained for ASV tasks. With a maximum improvement on the SITW corpus of 25.8% in terms of equal error rate over the static-taper, our method helps preserve a balanced level of leakage and variance, providing more robustness.

摘要:多锥削(multi-taper)估计器提供低方差的功率谱估计,可以代替加窗离散傅里叶变换(DFT)来提取梅尔频率倒谱系数(MFCC)等语音特征。尽管已有工作报道了多锥削MFCC在基于高斯混合模型的分类器上取得了有前景的自动说话人验证(ASV)结果,但多锥削MFCC在深度ASV系统中的表现仍是一个悬而未决的问题。我们提出不采用静态锥削设计,而是将多锥削估计器与为ASV任务训练的深度神经网络联合优化。在SITW语料库上,相对静态锥削,等错误率最多改善25.8%;我们的方法有助于保持泄漏与方差之间的平衡,从而提供更强的鲁棒性。
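
下面用正弦锥削(sine tapers,Slepian/DPSS锥削的一种简单替代)给出多锥削功率谱估计的一个numpy示意:对多个正交锥削得到的本征谱取平均,以降低单一加窗DFT的方差。锥削族的选择与个数 `k` 均为本文假设,并非论文的可学习设计。

```python
import numpy as np

def sine_tapers(n, k):
    """k orthonormal sine tapers of length n."""
    t = np.arange(1, n + 1)
    return np.array([np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * (j + 1) * t / (n + 1))
                     for j in range(k)])

def multitaper_psd(frame, k=6):
    """Average the k eigenspectra; the averaging is what lowers the
    variance relative to a single windowed DFT."""
    tapers = sine_tapers(len(frame), k)
    spectra = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    return spectra.mean(axis=0)
```

论文的做法则是将锥削权重作为网络参数与深度ASV模型联合训练,而非像这里一样固定等权平均。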

AI-7-标题: PipAttack: Poisoning Federated Recommender Systems for Manipulating Item Promotion

链接: https://arxiv.org/abs/2110.10926
作者: Shijie Zhang, Hongzhi Yin, Tong Chen, Zi Huang, Quoc Viet Hung Nguyen, Lizhen Cui
备注: Proceedings of the 15th ACM International Conference on Web Search and Data Mining (WSDM '22)

点击查看摘要

Abstract: Due to the growing privacy concerns, decentralization emerges rapidly in personalized services, especially recommendation. Also, recent studies have shown that centralized models are vulnerable to poisoning attacks, compromising their integrity. In the context of recommender systems, a typical goal of such poisoning attacks is to promote the adversary’s target items by interfering with the training dataset and/or process. Hence, a common practice is to subsume recommender systems under the decentralized federated learning paradigm, which enables all user devices to collaboratively learn a global recommender while retaining all the sensitive data locally. Without exposing the full knowledge of the recommender and entire dataset to end-users, such federated recommendation is widely regarded `safe’ towards poisoning attacks. In this paper, we present a systematic approach to backdooring federated recommender systems for targeted item promotion. The core tactic is to take advantage of the inherent popularity bias that commonly exists in data-driven recommenders. As popular items are more likely to appear in the recommendation list, our innovatively designed attack model enables the target item to have the characteristics of popular items in the embedding space. Then, by uploading carefully crafted gradients via a small number of malicious users during the model update, we can effectively increase the exposure rate of a target (unpopular) item in the resulted federated recommender. Evaluations on two real-world datasets show that 1) our attack model significantly boosts the exposure rate of the target item in a stealthy way, without harming the accuracy of the poisoned recommender; and 2) existing defenses are not effective enough, highlighting the need for new defenses against our local model poisoning attacks to federated recommender systems.

摘要:由于隐私问题日益受到关注,去中心化在个性化服务(尤其是推荐)中迅速兴起。此外,近期研究表明,集中式模型容易受到中毒攻击,从而损害其完整性。在推荐系统的背景下,此类中毒攻击的典型目标是通过干扰训练数据集和/或训练过程来推广攻击者的目标物品。因此,一种常见做法是将推荐系统纳入去中心化的联邦学习范式,使所有用户设备能够协作学习一个全局推荐器,同时将所有敏感数据保留在本地。由于不向终端用户暴露推荐器和整个数据集的全部信息,这种联邦推荐被普遍认为对中毒攻击是“安全”的。在本文中,我们提出了一种对联邦推荐系统植入后门以实现定向物品推广的系统方法。其核心策略是利用数据驱动推荐器中普遍存在的固有流行度偏差。由于流行物品更有可能出现在推荐列表中,我们创新设计的攻击模型使目标物品在嵌入空间中具有流行物品的特征。然后,在模型更新期间通过少量恶意用户上传精心构造的梯度,我们可以有效提高目标(不受欢迎)物品在最终联邦推荐器中的曝光率。在两个真实数据集上的评估表明:1)我们的攻击模型能以隐蔽的方式显著提高目标物品的曝光率,而不损害被投毒推荐器的准确性;2)现有防御不够有效,凸显了针对这类对联邦推荐系统的本地模型中毒攻击设计新防御的必要性。

AI-8-标题: Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information

链接: https://arxiv.org/abs/2110.10905
作者: Jin Li, Xianyuan Zhan, Zixu Xiao, Guyue Zhou
备注:

点击查看摘要

Abstract: End-to-end learning robotic manipulation with high data efficiency is one of the key challenges in robotics. The latest methods that utilize human demonstration data and unsupervised representation learning has proven to be a promising direction to improve RL learning efficiency. The use of demonstration data also allows “warming-up” the RL policies using offline data with imitation learning or the recently emerged offline reinforcement learning algorithms. However, existing works often treat offline policy learning and online exploration as two separate processes, which are often accompanied by severe performance drop during the offline-to-online transition. Furthermore, many robotic manipulation tasks involve complex sub-task structures, which are very challenging to be solved in RL with sparse reward. In this work, we propose a unified offline-to-online RL framework that resolves the transition performance drop issue. Additionally, we introduce goal-aware state information to the RL agent, which can greatly reduce task complexity and accelerate policy learning. Combined with an advanced unsupervised representation learning module, our framework achieves great training efficiency and performance compared with the state-of-the-art methods in multiple robotic manipulation tasks.

摘要:高数据效率的端到端机器人操作学习是机器人学的关键挑战之一。利用人类演示数据和无监督表示学习的最新方法已被证明是提高RL学习效率的有希望的方向。演示数据的使用还允许利用离线数据,通过模仿学习或最近出现的离线强化学习算法对RL策略进行“预热”。然而,现有工作通常将离线策略学习和在线探索视为两个独立的过程,而这往往伴随着离线到在线过渡期间的严重性能下降。此外,许多机器人操作任务涉及复杂的子任务结构,在稀疏奖励下用RL求解极具挑战性。在这项工作中,我们提出了一个统一的离线到在线RL框架,解决了过渡性能下降问题。此外,我们向RL智能体引入目标感知的状态信息,这可以大大降低任务复杂度并加速策略学习。结合先进的无监督表示学习模块,与多种机器人操作任务中的最先进方法相比,我们的框架实现了出色的训练效率和性能。

AI-9-标题: Human-Centered Explainable AI (XAI): From Algorithms to User Experiences

链接: https://arxiv.org/abs/2110.10790
作者: Q. Vera Liao, Kush R. Varshney
备注: draft for a book chapter

点击查看摘要

Abstract: As a technical sub-field of artificial intelligence (AI), explainable AI (XAI) has produced a vast collection of algorithms in recent years. However, explainability is an inherently human-centric property and the field is starting to embrace inter-disciplinary perspectives and human-centered approaches. As researchers and practitioners begin to leverage XAI algorithms to build XAI applications, explainability has moved beyond a demand by data scientists or researchers to comprehend the models they are developing, to become an essential requirement for people to trust and adopt AI deployed in numerous domains. Human-computer interaction (HCI) research and user experience (UX) design in this area are therefore increasingly important. In this chapter, we begin with a high-level overview of the technical landscape of XAI algorithms, then selectively survey recent HCI work that takes human-centered approaches to design, evaluate, provide conceptual and methodological tools for XAI. We ask the question “what are human-centered approaches doing for XAI” and highlight three roles that they should play in shaping XAI technologies: to drive technical choices by understanding users’ explainability needs, to uncover pitfalls of existing XAI methods through empirical studies and inform new methods, and to provide conceptual frameworks for human-compatible XAI.

摘要:作为人工智能(AI)的一个技术子领域,可解释AI(XAI)近年来产生了大量算法。然而,可解释性本质上是一种以人为中心的属性,该领域开始接纳跨学科视角和以人为中心的方法。随着研究人员和从业者开始利用XAI算法构建XAI应用,可解释性已不再只是数据科学家或研究人员理解其所开发模型的需求,而成为人们信任和采用部署于众多领域的AI的基本要求。因此,该领域的人机交互(HCI)研究和用户体验(UX)设计日益重要。在本章中,我们首先对XAI算法的技术版图进行高层概述,然后有选择地综述近期采用以人为中心的方法来设计、评估XAI并为其提供概念与方法工具的HCI工作。我们提出“以人为中心的方法为XAI做了什么”这一问题,并强调它们在塑造XAI技术中应扮演的三个角色:通过理解用户的可解释性需求来驱动技术选择;通过实证研究揭示现有XAI方法的缺陷并启发新方法;以及为与人相容的XAI提供概念框架。

AI-10-标题: Privacy in Open Search: A Review of Challenges and Solutions

链接: https://arxiv.org/abs/2110.10720
作者: Samuel Sousa, Roman Kern, Christian Guetl
备注: Paper accepted at OSSYM 2021 - Third International Open Search Symposium

点击查看摘要

Abstract: Privacy is of worldwide concern regarding activities and processes that include sensitive data. For this reason, many countries and territories have been recently approving regulations controlling the extent to which organizations may exploit data provided by people. Artificial intelligence areas, such as machine learning and natural language processing, have already successfully employed privacy-preserving mechanisms in order to safeguard data privacy in a vast number of applications. Information retrieval (IR) is likewise prone to privacy threats, such as attacks and unintended disclosures of documents and search history, which may cripple the security of users and be penalized by data protection laws. This work aims at highlighting and discussing open challenges for privacy in the recent literature of IR, focusing on tasks featuring user-generated text data. Our contribution is threefold: firstly, we present an overview of privacy threats to IR tasks; secondly, we discuss applicable privacy-preserving mechanisms which may be employed in solutions to restrain privacy hazards; finally, we bring insights on the tradeoffs between privacy preservation and utility performance for IR tasks.

摘要:凡涉及敏感数据的活动和流程,其隐私问题都受到全球关注。因此,许多国家和地区最近批准了相关法规,以控制组织利用人们所提供数据的程度。机器学习和自然语言处理等人工智能领域已经在大量应用中成功采用隐私保护机制来保障数据隐私。信息检索(IR)同样容易受到隐私威胁,例如针对文档和搜索历史的攻击与无意泄露,这可能危及用户安全并受到数据保护法的处罚。这项工作旨在突出并讨论近期IR文献中关于隐私的开放挑战,重点关注涉及用户生成文本数据的任务。我们的贡献有三方面:首先,我们概述了IR任务面临的隐私威胁;其次,我们讨论了可用于抑制隐私风险的适用隐私保护机制;最后,我们对IR任务中隐私保护与效用性能之间的权衡提出见解。

AI-11-标题: Bootstrapping confidence in future safety based on past safe operation

链接: https://arxiv.org/abs/2110.10718
作者: Peter Bishop, Andrey Povyakalo, Lorenzo Strigini
备注: 15 pages, 3 figures

点击查看摘要

Abstract: With autonomous vehicles (AVs), a major concern is the inability to give meaningful quantitative assurance of safety, to the extent required by society - e.g. that an AV must be at least as safe as a good human driver - before that AV is in extensive use. We demonstrate an approach to achieving more moderate, but useful, confidence, e.g., confidence of low enough probability of causing accidents in the early phases of operation. This formalises mathematically the common approach of operating a system on a limited basis in the hope that mishap-free operation will confirm one’s confidence in its safety and allow progressively more extensive operation: a process of “bootstrapping” of confidence. Translating that intuitive approach into theorems shows: (1) that it is substantially sound in the right circumstances, and could be a good method for deciding about the early deployment phase for an AV; (2) how much confidence can be rightly derived from such a “cautious deployment” approach, so that we can avoid over-optimism; (3) under which conditions our sound formulas for future confidence are applicable; (4) thus, which analyses of the concrete situations, and/or constraints on practice, are needed in order to enjoy the advantages of provably correct confidence in adequate future safety.

摘要:对于自动驾驶车辆(AV),一个主要问题是在AV投入大规模使用之前,无法按社会要求的程度对其安全性给出有意义的定量保证,例如保证AV至少与优秀的人类驾驶员一样安全。我们展示了一种获得较为温和但有用的置信度的方法,例如对运营早期阶段造成事故的概率足够低的置信度。这在数学上形式化了一种常见做法:先在有限范围内运行系统,期望无事故运行能印证人们对其安全性的信心,并允许逐步扩大运行范围,即置信度的“自举”(bootstrapping)过程。将这一直观做法转化为定理后表明:(1)在合适的条件下它基本上是可靠的,并且可以成为决定AV早期部署阶段的一种好方法;(2)从这种“谨慎部署”方法中可以合理地得出多少置信度,从而避免过度乐观;(3)我们关于未来置信度的可靠公式在哪些条件下适用;(4)因此,需要对具体情形和/或实践约束进行哪些分析,才能享有对未来安全性可证明正确的置信度所带来的优势。
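
作为直观说明,下面给出这一“自举”思路背后经典贝叶斯计算的Python草图:在对每期失效概率取均匀先验时,经历 n 期无事故运行后,再平安度过 m 期的后验概率为 (n+1)/(n+m+1)。这只是一个示意性的入门计算;论文本身给出的是更精细的条件与保守界。

```python
from fractions import Fraction

def future_survival_confidence(n_past, n_future):
    """P(no failure in the next n_future periods | none in n_past periods),
    under a uniform prior on the constant per-period failure probability:
        P = (n_past + 1) / (n_past + n_future + 1)."""
    return Fraction(n_past + 1, n_past + n_future + 1)
```

例如,要对“再运行与过去同样长的时间仍无事故”达到较高置信度,过去的无事故里程必须远多于计划的未来里程,这正是论文所强调的“谨慎扩大部署”的量化含义。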

AI-12-标题: Colosseum: Large-Scale Wireless Experimentation Through Hardware-in-the-Loop Network Emulation

链接: https://arxiv.org/abs/2110.10617
作者: Leonardo Bonati, Pedram Johari, Michele Polese, Salvatore D’Oro, Subhramoy Mohanti, Miead Tehrani-Moayyed, Davide Villa, Shweta Shrivastava, Chinenye Tassie, Kurt Yoder, Ajeet Bagga, Paresh Patel, Ventz Petkov, Michael Seltser, Francesco Restuccia, Abhimanyu Gosain, Kaushik R. Chowdhury, Stefano Basagni, Tommaso Melodia
备注:

点击查看摘要

Abstract: Colosseum is an open-access and publicly-available large-scale wireless testbed for experimental research via virtualized and softwarized waveforms and protocol stacks on a fully programmable, “white-box” platform. Through 256 state-of-the-art Software-defined Radios and a Massive Channel Emulator core, Colosseum can model virtually any scenario, enabling the design, development and testing of solutions at scale in a variety of deployments and channel conditions. These Colosseum radio-frequency scenarios are reproduced through high-fidelity FPGA-based emulation with finite-impulse response filters. Filters model the taps of desired wireless channels and apply them to the signals generated by the radio nodes, faithfully mimicking the conditions of real-world wireless environments. In this paper we describe the architecture of Colosseum and its experimentation and emulation capabilities. We then demonstrate the effectiveness of Colosseum for experimental research at scale through exemplary use cases including prevailing wireless technologies (e.g., cellular and Wi-Fi) in spectrum sharing and unmanned aerial vehicle scenarios. A roadmap for Colosseum future updates concludes the paper.

摘要:Colosseum是一个开放访问且公开可用的大规模无线试验台,可在完全可编程的“白盒”平台上通过虚拟化和软件化的波形与协议栈开展实验研究。借助256台最先进的软件定义无线电和一个大规模信道仿真器核心,Colosseum几乎可以建模任何场景,从而能够在各种部署和信道条件下大规模地设计、开发和测试解决方案。这些Colosseum射频场景通过基于FPGA、采用有限冲激响应滤波器的高保真仿真来复现。滤波器对目标无线信道的抽头进行建模,并将其作用于无线电节点产生的信号,忠实地模拟真实世界无线环境的条件。本文描述了Colosseum的架构及其实验与仿真能力。随后,我们通过频谱共享和无人机场景中涉及主流无线技术(如蜂窝和Wi-Fi)的示例用例,展示了Colosseum在大规模实验研究中的有效性。文末给出了Colosseum未来更新的路线图。
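
作为示意,下面用numpy卷积演示“用FIR抽头复现无线多径信道”这一思想(在Colosseum中由FPGA滤波器组实时完成);函数与参数均为本文假设的软件替代,并非该试验台的真实接口。

```python
import numpy as np

def emulate_channel(signal, taps, noise_std=0.0):
    """Apply FIR channel taps (multipath impulse response) to a baseband
    signal via convolution, optionally adding complex Gaussian noise."""
    out = np.convolve(signal, taps)[: len(signal)]
    if noise_std > 0:
        noise = (np.random.randn(len(out)) + 1j * np.random.randn(len(out)))
        out = out + noise_std * noise / np.sqrt(2)
    return out
```

例如,单位冲激抽头 `[1.0]` 对应理想信道,两个等幅抽头则模拟一条简单的双径信道。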

AI-13-标题: Surrogate Representation Learning with Isometric Mapping for Gray-box Graph Adversarial Attacks

链接: https://arxiv.org/abs/2110.10482
作者: Zihan Liu, Yun Luo, Zelin Zang, Stan Z. Li
备注:

点击查看摘要

Abstract: Gray-box graph attacks aim at disrupting the performance of the victim model by using inconspicuous attacks with limited knowledge of the victim model. The parameters of the victim model and the labels of the test nodes are invisible to the attacker. To obtain the gradient on the node attributes or graph structure, the attacker constructs an imaginary surrogate model trained under supervision. However, there is a lack of discussion on the training of surrogate models and the robustness of provided gradient information. The general node classification model loses the topology of the nodes on the graph, which is, in fact, an exploitable prior for the attacker. This paper investigates the effect of representation learning of surrogate models on the transferability of gray-box graph adversarial attacks. To reserve the topology in the surrogate embedding, we propose Surrogate Representation Learning with Isometric Mapping (SRLIM). By using Isometric mapping method, our proposed SRLIM can constrain the topological structure of nodes from the input layer to the embedding space, that is, to maintain the similarity of nodes in the propagation process. Experiments prove the effectiveness of our approach through the improvement in the performance of the adversarial attacks generated by the gradient-based attacker in untargeted poisoning gray-box setups.

摘要:灰盒图攻击旨在在对受害模型了解有限的情况下,通过不易察觉的攻击破坏受害模型的性能。受害模型的参数和测试节点的标签对攻击者不可见。为了获得关于节点属性或图结构的梯度,攻击者构建一个在监督下训练的假想代理模型。然而,对于代理模型的训练以及其所提供梯度信息的鲁棒性,目前缺乏讨论。一般的节点分类模型会丢失图上节点的拓扑信息,而这实际上是攻击者可以利用的先验。本文研究了代理模型的表示学习对灰盒图对抗攻击可迁移性的影响。为了在代理嵌入中保留拓扑,我们提出了基于等距映射的代理表示学习(SRLIM)。通过等距映射方法,我们提出的SRLIM可以约束节点从输入层到嵌入空间的拓扑结构,即在传播过程中保持节点间的相似性。实验表明,在非定向投毒的灰盒设置下,基于梯度的攻击者所产生的对抗攻击性能得到提升,证明了我们方法的有效性。

AI-14-标题: R4: A Framework for Route Representation and Route Recommendation

链接: https://arxiv.org/abs/2110.10474
作者: Ran Cheng, Chao Chen, Longfei Xu, Shen Li, Lei Wang, Hengbin Cui, Kaikui Liu, Xiaolong Li
备注:

点击查看摘要

Abstract: Route recommendation is significant in navigation service. Two major challenges for route recommendation are route representation and user representation. Different from items that can be identified by unique IDs in traditional recommendation, routes are combinations of links (i.e., a road segment and its following action like turning left) and the number of combinations could be close to infinite. Besides, the representation of a route changes under different scenarios. These facts result in severe sparsity of routes, which increases the difficulty of route representation. Moreover, link attribute deficiencies and errors affect preciseness of route representation. Because of the sparsity of routes, the interaction data between users and routes are also sparse. This makes it not easy to acquire user representation from historical user-item interactions as traditional recommendations do. To address these issues, we propose a novel learning framework R4. In R4, we design a sparse & dense network to obtain representations of routes. The sparse unit learns link ID embeddings and aggregates them to represent a route, which captures implicit route characteristics and subsequently alleviates problems caused by link attribute deficiencies and errors. The dense unit extracts implicit local features of routes from link attributes. For user representation, we utilize a series of historical navigation to extract user preference. R4 achieves remarkable performance in both offline and online experiments.

摘要:路线推荐在导航服务中意义重大。路线推荐面临两大挑战:路线表示与用户表示。与传统推荐中可由唯一ID标识的物品不同,路线是链路(即一段道路及其后续动作,如左转)的组合,组合数量可能接近无穷。此外,同一条路线的表示在不同场景下会发生变化。这些事实导致路线极为稀疏,增加了路线表示的难度。而且,链路属性的缺失和错误也会影响路线表示的精确性。由于路线稀疏,用户与路线之间的交互数据同样稀疏,这使得难以像传统推荐那样从历史的用户-物品交互中获取用户表示。为了解决这些问题,我们提出了一种新颖的学习框架R4。在R4中,我们设计了一个稀疏-稠密网络来获得路线的表示:稀疏单元学习链路ID嵌入并将其聚合以表示路线,从而捕获隐式的路线特征,缓解链路属性缺失和错误带来的问题;稠密单元则从链路属性中提取路线的隐式局部特征。在用户表示方面,我们利用一系列历史导航记录来提取用户偏好。R4在离线和在线实验中均取得了显著的性能。
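
摘要中"稀疏单元 + 稠密单元"的路线表示思路,可以用如下假设性的玩具代码示意(纯Python,均值池化代替论文中的可学习聚合;变量名与维度均为示例假设):

```python
import random

random.seed(0)
EMB_DIM = 4
link_emb = {}  # sparse unit: one learned vector per link ID (random init here)

def embed_link(link_id):
    # lazily create an embedding for each distinct link ID
    if link_id not in link_emb:
        link_emb[link_id] = [random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]
    return link_emb[link_id]

def route_repr(link_ids, link_attrs):
    # sparse unit: mean-pool the ID embeddings of the route's links
    sparse = [sum(v) / len(link_ids)
              for v in zip(*(embed_link(l) for l in link_ids))]
    # dense unit: mean-pool the raw attribute vectors (e.g. length, speed limit)
    dense = [sum(v) / len(link_attrs) for v in zip(*link_attrs)]
    return sparse + dense  # concatenate the two views
```

真实系统中,ID嵌入由推荐目标端到端训练,稠密部分通常经过MLP而非简单平均;此处仅演示两路表示的拼接结构。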

AI-15-标题: Playing 2048 With Reinforcement Learning

链接: https://arxiv.org/abs/2110.10374
作者: Shilun Li, Veronica Peng
备注:

点击查看摘要

Abstract: The game of 2048 is a highly addictive game. It is easy to learn the game, but hard to master, as only about 1% of the hundreds of millions of games ever played have been won. In this paper, we would like to explore reinforcement learning techniques to win 2048. The approaches we have taken include deep Q-learning and beam search, with beam search reaching 2048 28.5% of the time.

摘要:2048是一款令人高度上瘾的游戏。它易于上手,却难以精通:在已进行的数亿局游戏中,只有约1%最终获胜。在本文中,我们探索利用强化学习技术来赢得2048。我们采用的方法包括深度Q学习和束搜索(beam search),其中束搜索在28.5%的时间内达到了2048。
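
束搜索打2048的骨架可以用如下假设性示意(纯Python)表达:先实现一行向左合并的规则,再在每步保留启发式得分最高的若干棋面。为简洁起见只演示向左一步,且未建模随机出砖;启发式函数与束宽均为示例假设:

```python
def merge_row(row):
    # slide one row to the left and merge equal neighbors, 2048-style
    tiles = [t for t in row if t]            # drop zeros
    out, i = [], 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)         # merge one pair
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out))

def move_left(board):
    return [merge_row(r) for r in board]

def score(board):
    # toy heuristic: favor big tiles and empty cells
    return max(max(r) for r in board) + sum(r.count(0) for r in board)

def beam_step(boards, width=2):
    # keep the `width` best boards after trying a left move on each;
    # a full player would try all four directions and model tile spawns
    candidates = [move_left(b) for b in boards]
    return sorted(candidates, key=score, reverse=True)[:width]
```

例如 `merge_row([2, 2, 4, 0])` 得到 `[4, 4, 0, 0]`;完整对局即反复执行"展开四个方向、评分、截断到束宽"的循环。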

AI-16-标题: Semantic Sensing and Planning for Human-Robot Collaboration in Uncertain Environments

链接: https://arxiv.org/abs/2110.10324
作者: Luke Burks, Hunter M. Ray, Jamison McGinley, Sousheel Vunnam, Nisar Ahmed
备注:

点击查看摘要

Abstract: Autonomous robots can benefit greatly from human-provided semantic characterizations of uncertain task environments and states. However, the development of integrated strategies which let robots model, communicate, and act on such soft data remains challenging. Here, a framework is presented for active semantic sensing and planning in human-robot teams which addresses these gaps by formally combining the benefits of online sampling-based POMDP policies, multi-modal semantic interaction, and Bayesian data fusion. This approach lets humans opportunistically impose model structure and extend the range of semantic soft data in uncertain environments by sketching and labeling arbitrary landmarks across the environment. Dynamic updating of the environment while searching for a mobile target allows robotic agents to actively query humans for novel and relevant semantic data, thereby improving beliefs of unknown environments and target states for improved online planning. Target search simulations show significant improvements in time and belief state estimates required for interception versus conventional planning based solely on robotic sensing. Human subject studies demonstrate an average doubling in dynamic target capture rate compared to the lone robot case, employing reasoning over a range of user characteristics and interaction modalities. Video of interaction can be found at this https URL.

摘要:自主机器人可以从人类提供的、关于不确定任务环境和状态的语义描述中获益良多。然而,让机器人能够建模、交流并依据这类软数据采取行动的集成策略的发展仍具挑战性。本文提出了一个面向人机团队的主动语义感知与规划框架,通过将基于在线采样的POMDP策略、多模态语义交互和贝叶斯数据融合正式结合起来,弥补了这些不足。该方法让人类能够适时地施加模型结构,并通过在环境中勾画和标注任意地标,扩展不确定环境中语义软数据的范围。在搜索移动目标的同时动态更新环境,使机器人智能体能够主动向人类查询新颖且相关的语义数据,从而改进对未知环境和目标状态的信念,提升在线规划效果。目标搜索仿真表明,与仅依赖机器人感知的传统规划相比,该方法在拦截所需的时间和信念状态估计上均有显著改进。人类被试研究表明,在多种用户特征和交互方式下,动态目标捕获率相比单机器人基线平均提高了一倍。交互视频可在此HTTPS URL查看。
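
摘要中"贝叶斯数据融合"一步的核心,就是用人类的语义观测(如"目标在某地标附近")作为似然去更新目标位置信念。下面是一个假设性的最小示意(离散网格、数值均为虚构),并非论文的实际模型:

```python
def update_belief(belief, likelihood):
    # Bayes rule over a discrete grid: posterior ∝ prior * likelihood
    post = [b * l for b, l in zip(belief, likelihood)]
    z = sum(post)
    return [p / z for p in post]

# uniform prior over 4 cells; a human sketches a landmark and reports
# "target is near cells 2-3", modeled here as a soft likelihood
prior = [0.25, 0.25, 0.25, 0.25]
lik = [0.1, 0.1, 0.9, 0.9]
posterior = update_belief(prior, lik)
```

更新后的信念集中到后两个格子上,规划器(如在线POMDP求解器)即可优先搜索这些区域。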

AI-17-标题: flip-hoisting: Exploiting Repeated Parameters in Discrete Probabilistic Programs

链接: https://arxiv.org/abs/2110.10284
作者: Yu-Hsi Cheng, Todd Millstein, Guy Van den Broeck, Steven Holtzen
备注:

点击查看摘要

Abstract: Probabilistic programming is emerging as a popular and effective means of probabilistic modeling and an alternative to probabilistic graphical models. Probabilistic programs provide greater expressivity and flexibility in modeling probabilistic systems than graphical models, but this flexibility comes at a cost: there remains a significant disparity in performance between specialized Bayesian network solvers and probabilistic program inference algorithms. In this work we present a program analysis and associated optimization, flip-hoisting, that collapses repetitious parameters in discrete probabilistic programs to improve inference performance. flip-hoisting generalizes parameter sharing - a well-known important optimization from discrete graphical models - to probabilistic programs. We implement flip-hoisting in an existing probabilistic programming language and show empirically that it significantly improves inference performance, narrowing the gap between the performances of probabilistic programs and probabilistic graphical models.

摘要:概率编程正在成为一种流行且有效的概率建模手段,是概率图模型的一种替代方案。概率程序在建模概率系统方面比图模型具有更强的表达力和灵活性,但这种灵活性是有代价的:专用贝叶斯网络求解器与概率程序推理算法之间仍存在显著的性能差距。在这项工作中,我们提出了一种程序分析及相应的优化方法flip-hoisting,它折叠离散概率程序中的重复参数以提升推理性能。flip-hoisting将参数共享(离散图模型中一种著名的重要优化)推广到了概率程序。我们在一种现有的概率编程语言中实现了flip-hoisting,并通过实验表明它显著提升了推理性能,缩小了概率程序与概率图模型之间的性能差距。
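
"折叠重复参数"的直觉可以用如下玩具代码示意:扫描程序中的 flip 语句,将相同概率参数的重复出现改写为对一个已提升 flip 的引用。注意这只是对"参数共享"思想的假设性简化,论文中的分析还需保证改写在语义上是安全的:

```python
def hoist_flips(program):
    # program: list of ("flip", p) statements; hoisting rewrites repeated
    # flips sharing a probability into references to one hoisted flip
    # (a toy stand-in for the paper's program analysis)
    hoisted, rewritten = {}, []
    for op, p in program:
        if op == "flip":
            if p not in hoisted:
                hoisted[p] = f"f{len(hoisted)}"  # hypothetical fresh name
                rewritten.append(("flip", p, hoisted[p]))
            else:
                rewritten.append(("ref", p, hoisted[p]))
    return rewritten
```

例如三条语句 `flip 0.5; flip 0.5; flip 0.3` 改写后只剩两个不同参数的 flip,推理后端需要处理的参数节点随之减少。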

AI-18-标题: MultiHead MultiModal Deep Interest Recommendation Network

链接: https://arxiv.org/abs/2110.10205
作者: Mingbao Yang, ShaoBo Li, Zhou Peng, Ansi Zhang, Yuanmeng Zhang
备注:

点击查看摘要

Abstract: With the development of information technology, human beings are constantly producing a large amount of information at all times. How to obtain the information that users are interested in from this large amount of information has become an issue of great concern to users and even business managers. In order to solve this problem, from traditional machine learning to deep learning recommendation systems, researchers continue to improve optimization models and explore solutions. Because researchers have focused their optimizations on the recommendation model's network structure, there is less research on enriching recommendation model features, and there is still room for in-depth recommendation model optimization. Based on the DIN\cite{Authors01} model, this paper adds multi-head and multi-modal modules, which enrich the feature sets that the model can use and at the same time strengthen the cross-combination and fitting capabilities of the model. Experiments show that the multi-head multi-modal DIN improves the recommendation prediction effect, and outperforms current state-of-the-art methods on various comprehensive indicators.

摘要:随着信息技术的发展,人类无时无刻不在产生海量信息。如何从海量信息中获取用户感兴趣的信息,已成为用户乃至企业管理者非常关注的问题。为了解决这一问题,从传统机器学习到深度学习推荐系统,研究人员不断改进优化模型并探索解决方案。由于研究人员更多地在推荐模型的网络结构上进行优化,对丰富推荐模型特征的研究较少,深度推荐模型的优化仍有空间。本文基于DIN\cite{Authors01}模型,增加了多头和多模态模块,丰富了模型可用的特征集,同时增强了模型的交叉组合与拟合能力。实验表明,多头多模态DIN提升了推荐预测效果,并在各项综合指标上优于当前最先进的方法。
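
DIN类模型的核心是用候选物品作为query,对用户历史行为序列做注意力加权池化;"多头"即把嵌入切成若干片分别做注意力再拼接。下面是一个假设性的纯Python示意(点积注意力、均匀切片,均为简化假设,并非论文实现):

```python
import math

def attention_pool(query, keys):
    # dot-product attention: weight each behavior vector by its
    # similarity to the candidate-item query, then sum them up
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # stable softmax
    z = sum(exps)
    w = [e / z for e in exps]
    return [sum(wi * key[d] for wi, key in zip(w, keys))
            for d in range(len(query))]

def multi_head_pool(query, keys, heads=2):
    # split the embedding into `heads` slices, attend per slice, concat
    d = len(query) // heads
    out = []
    for h in range(heads):
        sl = slice(h * d, (h + 1) * d)
        out += attention_pool(query[sl], [k[sl] for k in keys])
    return out
```

实践中query/key来自可学习的嵌入,"多模态"部分则是将文本、图像等模态的特征向量并入keys;此处仅演示注意力池化的骨架。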

附件下载

点击下载今日全部论文列表