目录

概览 (2021-10-22)

• 43篇自然语言处理（NLP: cs.CL)
• 135篇计算机视觉(CV: cs.CV)
• 129篇机器学习(ML: cs.LG)
• 19篇人工智能(AI: cs.AI)
• 其它主题164篇

自然语言处理

NLP-0-标题: A Python Package to Detect Anti-Vaccine Users on Twitter

Abstract: Vaccine hesitancy has a long history but has been recently driven by the anti-vaccine narratives shared online, which significantly degrades the efficacy of vaccination strategies, such as those for COVID-19. Despite broad agreement in the medical community about the safety and efficacy of available vaccines, a large number of social media users continue to be inundated with false information about vaccines and, partly because of this, became indecisive or unwilling to be vaccinated. The goal of this study is to better understand anti-vaccine sentiment, and work to reduce its impact, by developing a system capable of automatically identifying the users responsible for spreading anti-vaccine narratives. We introduce a publicly available Python package capable of analyzing Twitter profiles to assess how likely that profile is to spread anti-vaccine sentiment in the future. The software package is built using text embedding methods, neural networks, and automated dataset generation. It is trained on over one hundred thousand accounts and several million tweets. This model will help researchers and policy-makers understand anti-vaccine discussion and misinformation strategies, which can further help tailor targeted campaigns seeking to inform and debunk the harmful anti-vaccination myths currently being spread. Additionally, we leverage the data on such users to understand what are the moral and emotional characteristics of anti-vaccine spreaders.

NLP-1-标题: Fast Model Editing at Scale

Abstract: While large pre-trained models have enabled impressive results on a variety of downstream tasks, the largest existing models still make errors, and even accurate predictions may become outdated over time. Because detecting all such failures at training time is impossible, enabling both developers and end users of such models to correct inaccurate outputs while leaving the model otherwise intact is desirable. However, the distributed, black-box nature of the representations learned by large neural networks makes producing such targeted edits difficult. If presented with only a single problematic input and new desired output, fine-tuning approaches tend to overfit; other editing algorithms are either computationally infeasible or simply ineffective when applied to very large models. To enable easy post-hoc editing at scale, we propose Model Editor Networks with Gradient Decomposition (MEND), a collection of small auxiliary editing networks that use a single desired input-output pair to make fast, local edits to a pre-trained model. MEND learns to transform the gradient obtained by standard fine-tuning, using a low-rank decomposition of the gradient to make the parameterization of this transformation tractable. MEND can be trained on a single GPU in less than a day even for 10 billion+ parameter models; once trained MEND enables rapid application of new edits to the pre-trained model. Our experiments with T5, GPT, BERT, and BART models show that MEND is the only approach to model editing that produces effective edits for models with tens of millions to over 10 billion parameters. Implementation available at this https URL.

NLP-2-标题: A Systematic Review on the Detection of Fake News Articles

Abstract: It has been argued that fake news and the spread of false information pose a threat to societies throughout the world, from influencing the results of elections to hindering the efforts to manage the COVID-19 pandemic. To combat this threat, a number of Natural Language Processing (NLP) approaches have been developed. These leverage a number of datasets, feature extraction/selection techniques and machine learning (ML) algorithms to detect fake news before it spreads. While these methods are well-documented, there is less evidence regarding their efficacy in this domain. By systematically reviewing the literature, this paper aims to delineate the approaches for fake news detection that are most performant, identify limitations with existing approaches, and suggest ways these can be mitigated. The analysis of the results indicates that Ensemble Methods using a combination of news content and socially-based features are currently the most effective. Finally, it is proposed that future research should focus on developing approaches that address generalisability issues (which, in part, arise from limitations with current datasets), explainability and bias.

NLP-3-标题: Topic-Guided Abstractive Multi-Document Summarization

Abstract: A critical point of multi-document summarization (MDS) is to learn the relations among various documents. In this paper, we propose a novel abstractive MDS model, in which we represent multiple documents as a heterogeneous graph, taking semantic nodes of different granularities into account, and then apply a graph-to-sequence framework to generate summaries. Moreover, we employ a neural topic model to jointly discover latent topics that can act as cross-document semantic units to bridge different documents and provide global information to guide the summary generation. Since topic extraction can be viewed as a special type of summarization that “summarizes” texts into a more abstract format, i.e., a topic distribution, we adopt a multi-task learning strategy to jointly train the topic and summarization module, allowing the promotion of each other. Experimental results on the Multi-News dataset demonstrate that our model outperforms previous state-of-the-art MDS models on both Rouge metrics and human evaluation, meanwhile learns high-quality topics.

NLP-4-标题: DAIR: Data Augmented Invariant Regularization

Abstract: While deep learning through empirical risk minimization (ERM) has succeeded at achieving human-level performance at a variety of complex tasks, ERM generalizes poorly to distribution shift. This is partly explained by overfitting to spurious features such as background in images or named entities in natural language. Synthetic data augmentation followed by empirical risk minimization (DA-ERM) is a simple yet powerful solution to remedy this problem. In this paper, we propose data augmented invariant regularization (DAIR). The idea of DAIR is based on the observation that the model performance (loss) is desired to be consistent on the augmented sample and the original one. DAIR introduces a regularizer on DA-ERM to penalize such loss inconsistency. Both theoretically and through empirical experiments, we show that a particular form of the DAIR regularizer consistently performs well in a variety of settings. We apply it to multiple real-world learning problems involving domain shift, namely robust regression, visual question answering, robust deep neural network training, and task-oriented dialog modeling. Our experiments show that DAIR consistently outperforms ERM and DA-ERM with little marginal cost and setting new state-of-the-art results in several benchmarks.

NLP-5-标题: Asynchronous Decentralized Distributed Training of Acoustic Models

Abstract: Large-scale distributed training of deep acoustic models plays an important role in today’s high-performance automatic speech recognition (ASR). In this paper we investigate a variety of asynchronous decentralized distributed training strategies based on data parallel stochastic gradient descent (SGD) to show their superior performance over the commonly-used synchronous distributed training via allreduce, especially when dealing with large batch sizes. Specifically, we study three variants of asynchronous decentralized parallel SGD (ADPSGD), namely, fixed and randomized communication patterns on a ring as well as a delay-by-one scheme. We introduce a mathematical model of ADPSGD, give its theoretical convergence rate, and compare the empirical convergence behavior and straggler resilience properties of the three variants. Experiments are carried out on an IBM supercomputer for training deep long short-term memory (LSTM) acoustic models on the 2000-hour Switchboard dataset. Recognition and speedup performance of the proposed strategies are evaluated under various training configurations. We show that ADPSGD with fixed and randomized communication patterns cope well with slow learners. When learners are equally fast, ADPSGD with the delay-by-one strategy has the fastest convergence with large batches. In particular, using the delay-by-one strategy, we can train the acoustic model in less than 2 hours using 128 V100 GPUs with competitive word error rates.

NLP-6-标题: A guided journey through non-interactive automatic story generation

Abstract: We present a literature survey on non-interactive computational story generation. The article starts with the presentation of requirements for creative systems, three types of models of creativity (computational, socio-cultural, and individual), and models of human creative writing. Then it reviews each class of story generation approach depending on the used technology: story-schemas, analogy, rules, planning, evolutionary algorithms, implicit knowledge learning, and explicit knowledge learning. Before the concluding section, the article analyses the contributions of the reviewed work to improve the quality of the generated stories. This analysis addresses the description of the story characters, the use of narrative knowledge including about character believability, and the possible lack of more comprehensive or more detailed knowledge or creativity models. Finally, the article presents concluding remarks in the form of suggestions of research topics that might have a significant impact on the advancement of the state of the art on autonomous non-interactive story generation systems. The article concludes that the autonomous generation and adoption of the main idea to be conveyed and the autonomous design of the creativity ensuring criteria are possibly two of most important topics for future research.

NLP-7-标题: Modeling Performance in Open-Domain Dialogue with PARADISE

Abstract: There has recently been an explosion of work on spoken dialogue systems, along with an increased interest in open-domain systems that engage in casual conversations on popular topics such as movies, books and music. These systems aim to socially engage, entertain, and even empathize with their users. Since the achievement of such social goals is hard to measure, recent research has used dialogue length or human ratings as evaluation metrics, and developed methods for automatically calculating novel metrics, such as coherence, consistency, relevance and engagement. Here we develop a PARADISE model for predicting the performance of Athena, a dialogue system that has participated in thousands of conversations with real users, while competing as a finalist in the Alexa Prize. We use both user ratings and dialogue length as metrics for dialogue quality, and experiment with predicting these metrics using automatic features that are both system dependent and independent. Our goal is to learn a general objective function that can be used to optimize the dialogue choices of any Alexa Prize system in real time and evaluate its performance. Our best model for predicting user ratings gets an R$^2$ of .136 with a DistilBert model, and the best model for predicting length with system independent features gets an R$^2$ of .865, suggesting that conversation length may be a more reliable measure for automatic training of dialogue systems.

ML-17-标题: Anti-Concentrated Confidence Bonuses for Scalable Exploration

Abstract: Intrinsic rewards play a central role in handling the exploration-exploitation trade-off when designing sequential decision-making algorithms, in both foundational theory and state-of-the-art deep reinforcement learning. The LinUCB algorithm, a centerpiece of the stochastic linear bandits literature, prescribes an elliptical bonus which addresses the challenge of leveraging shared information in large action spaces. This bonus scheme cannot be directly transferred to high-dimensional exploration problems, however, due to the computational cost of maintaining the inverse covariance matrix of action features. We introduce \emph{anti-concentrated confidence bounds} for efficiently approximating the elliptical bonus, using an ensemble of regressors trained to predict random noise from policy network-derived features. Using this approximation, we obtain stochastic linear bandit algorithms which obtain $\tilde O(d \sqrt{T})$ regret bounds for $\mathrm{poly}(d)$ fixed actions. We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic reward heuristics on Atari benchmarks.

ML-42-标题: SecureBoost : A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning

Abstract: Gradient boosting decision tree (GBDT) is a widely used ensemble algorithm in the industry. Its vertical federated learning version, SecureBoost, is one of the most popular algorithms used in cross-silo privacy-preserving modeling. As the area of privacy computation thrives in recent years, demands for large-scale and high-performance federated learning have grown dramatically in real-world applications. In this paper, to fulfill these requirements, we propose SecureBoost+ that is both novel and improved from the prior work SecureBoost. SecureBoost+ integrates several ciphertext calculation optimizations and engineering optimizations. The experimental results demonstrate that Secureboost+ has significant performance improvements on large and high dimensional data sets compared to SecureBoost. It makes effective and efficient large-scale vertical federated learning possible.

ML-43-标题: On some theoretical limitations of Generative Adversarial Networks

Abstract: Generative Adversarial Networks have become a core technique in Machine Learning to generate unknown distributions from data samples. They have been used in a wide range of context without paying much attention to the possible theoretical limitations of those models. Indeed, because of the universal approximation properties of Neural Networks, it is a general assumption that GANs can generate any probability distribution. Recently, people began to question this assumption and this article is in line with this thinking. We provide a new result based on Extreme Value Theory showing that GANs can’t generate heavy tailed distributions. The full proof of this result is given.

ML-44-标题: An Empirical Evaluation of Time-Series Feature Sets

Abstract: Solving time-series problems with features has been rising in popularity due to the availability of software for feature extraction. Feature-based time-series analysis can now be performed using many different feature sets, including hctsa (7730 features: Matlab), feasts (42 features: R), tsfeatures (63 features: R), Kats (40 features: Python), tsfresh (up to 1558 features: Python), TSFEL (390 features: Python), and the C-coded catch22 (22 features: Matlab, R, Python, and Julia). There is substantial overlap in the types of methods included in these sets (e.g., properties of the autocorrelation function and Fourier power spectrum), but they are yet to be systematically compared. Here we compare these seven sets on computational speed, assess the redundancy of features contained in each, and evaluate the overlap and redundancy between them. We take an empirical approach to feature similarity based on outputs across a diverse set of real-world and simulated time series. We find that feature sets vary across three orders of magnitude in their computation time per feature on a laptop for a 1000-sample series, from the fastest sets catch22 and TSFEL (~0.1ms per feature) to tsfeatures (~3s per feature). Using PCA to evaluate feature redundancy within each set, we find the highest within-set redundancy for TSFEL and tsfresh. For example, in TSFEL, 90% of the variance across 390 features can be captured with just four PCs. Finally, we introduce a metric for quantifying overlap between pairs of feature sets, which indicates substantial overlap. We found that the largest feature set, hctsa, is the most comprehensive, and that tsfresh is the most distinctive, due to its incorporation of many low-level Fourier coefficients. Our results provide empirical understanding of the differences between existing feature sets, information that can be used to better tailor feature sets to their applications.

ML-45-标题: Finite Volume Least-Squares Neural Network (FV-LSNN) Method for Scalar Nonlinear Hyperbolic Conservation Laws

Abstract: In [4], we introduced the least-squares ReLU neural network (LSNN) method for solving the linear advection-reaction problem with discontinuous solution and showed that the number of degrees of freedom for the LSNN method is significantly less than that of traditional mesh-based methods. The LSNN method is a discretization of an equivalent least-squares (LS) formulation in the class of neural network functions with the ReLU activation function; and evaluation of the LS functional is done by using numerical integration and proper numerical differentiation. By developing a novel finite volume approximation (FVA) to the divergence operator, this paper studies the LSNN method for scalar nonlinear hyperbolic conservation laws. The FVA introduced in this paper is tailored to the LSNN method and is more accurate than traditional, well-studied FV schemes used in mesh-based numerical methods. Numerical results of some benchmark test problems with both convex and non-convex fluxes show that the finite volume LSNN (FV-LSNN) method is capable of computing the physical solution for problems with rarefaction waves and capturing the shock of the underlying problem automatically through the free hyper-planes of the ReLU neural network. Moreover, the method does not exhibit the common Gibbs phenomena along the discontinuous interface.

ML-46-标题: A Real-Time Energy and Cost Efficient Vehicle Route Assignment Neural Recommender System

Abstract: This paper presents a neural network recommender system algorithm for assigning vehicles to routes based on energy and cost criteria. In this work, we applied this new approach to efficiently identify the most cost-effective medium and heavy duty truck (MDHDT) powertrain technology, from a total cost of ownership (TCO) perspective, for given trips. We employ a machine learning based approach to efficiently estimate the energy consumption of various candidate vehicles over given routes, defined as sequences of links (road segments), with little information known about internal dynamics, i.e using high level macroscopic route information. A complete recommendation logic is then developed to allow for real-time optimum assignment for each route, subject to the operational constraints of the fleet. We show how this framework can be used to (1) efficiently provide a single trip recommendation with a top-$k$ vehicles star ranking system, and (2) engage in more general assignment problems where $n$ vehicles need to be deployed over $m \leq n$ trips. This new assignment system has been deployed and integrated into the POLARIS Transportation System Simulation Tool for use in research conducted by the Department of Energy’s Systems and Modeling for Accelerated Research in Transportation (SMART) Mobility Consortium

ML-49-标题: Using NASA Satellite Data Sources and Geometric Deep Learning to Uncover Hidden Patterns in COVID-19 Clinical Severity

Abstract: As multiple adverse events in 2021 illustrated, virtually all aspects of our societal functioning – from water and food security to energy supply to healthcare – more than ever depend on the dynamics of environmental factors. Nevertheless, the social dimensions of weather and climate are noticeably less explored by the machine learning community, largely, due to the lack of reliable and easy access to use data. Here we present a unique not yet broadly available NASA’s satellite dataset on aerosol optical depth (AOD), temperature and relative humidity and discuss the utility of these new data for COVID-19 biosurveillance. In particular, using the geometric deep learning models for semi-supervised classification on a county-level basis over the contiguous United States, we investigate the pressing societal question whether atmospheric variables have considerable impact on COVID-19 clinical severity.

Abstract: Here I present a small update to the bias correction term in the Adam optimizer that has the advantage of behaving well in the first several steps. The default implementation of Adam may be as sensitive as it is to hyperparameters partially due to the originally proposed bias correction procedure, and its behavior in early steps of training.

ML-51-标题: Shaking the foundations: delusions in sequence models for interaction and control

Abstract: The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive however is purposeful adaptive behavior. Currently there is a common perception that sequence models “lack the understanding of the cause and effect of their actions” leading them to draw incorrect inferences due to auto-suggestive delusions. In this report we explain where this mismatch originates, and show that it can be resolved by treating actions as causal interventions. Finally, we show that in supervised learning, one can teach a system to condition or intervene on data by training with factual and counterfactual error signals respectively.

ML-52-标题: Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks

Abstract: We theoretically analyze the Feedback Alignment (FA) algorithm, an efficient alternative to backpropagation for training neural networks. We provide convergence guarantees with rates for deep linear networks for both continuous and discrete dynamics. Additionally, we study incremental learning phenomena for shallow linear networks. Interestingly, certain specific initializations imply that negligible components are learned before the principal ones, thus potentially negatively affecting the effectiveness of such a learning algorithm; a phenomenon we classify as implicit anti-regularization. We also provide initialization schemes where the components of the problem are approximately learned by decreasing order of importance, thus providing a form of implicit regularization.

ML-53-标题: Hierarchical Skills for Efficient Exploration

ML-54-标题: Propensity-scored Probabilistic Label Trees

Abstract: Extreme multi-label classification (XMLC) refers to the task of tagging instances with small subsets of relevant labels coming from an extremely large set of all possible labels. Recently, XMLC has been widely applied to diverse web applications such as automatic content labeling, online advertising, or recommendation systems. In such environments, label distribution is often highly imbalanced, consisting mostly of very rare tail labels, and relevant labels can be missing. As a remedy to these problems, the propensity model has been introduced and applied within several XMLC algorithms. In this work, we focus on the problem of optimal predictions under this model for probabilistic label trees, a popular approach for XMLC problems. We introduce an inference procedure, based on the $A^*$-search algorithm, that efficiently finds the optimal solution, assuming that all probabilities and propensities are known. We demonstrate the attractiveness of this approach in a wide empirical study on popular XMLC benchmark datasets.

ML-55-标题: A Data-Centric Optimization Framework for Machine Learning

Abstract: Rapid progress in deep learning is leading to a diverse set of quickly changing models, with a dramatically growing demand for compute. However, as frameworks specialize optimization to patterns in popular networks, they implicitly constrain novel and diverse models that drive progress in research. We empower deep learning researchers by defining a flexible and user-customizable pipeline for optimizing training of arbitrary deep neural networks, based on data movement minimization. The pipeline begins with standard networks in PyTorch or ONNX and transforms computation through progressive lowering. We define four levels of general-purpose transformations, from local intra-operator optimizations to global data movement reduction. These operate on a data-centric graph intermediate representation that expresses computation and data movement at all levels of abstraction, including expanding basic operators such as convolutions to their underlying computations. Central to the design is the interactive and introspectable nature of the pipeline. Every part is extensible through a Python API, and can be tuned interactively using a GUI. We demonstrate competitive performance or speedups on ten different networks, with interactive optimizations discovering new opportunities in EfficientNet.

ML-56-标题: Dynamic Bottleneck for Robust Self-Supervised Exploration

Abstract: Exploration methods based on pseudo-count of transitions or curiosity of dynamics have achieved promising results in solving reinforcement learning with sparse rewards. However, such methods are usually sensitive to environmental dynamics-irrelevant information, e.g., white-noise. To handle such dynamics-irrelevant information, we propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle. Based on the DB model, we further propose DB-bonus, which encourages the agent to explore state-action pairs with high information gain. We establish theoretical connections between the proposed DB-bonus, the upper confidence bound (UCB) for linear case, and the visiting count for tabular case. We evaluate the proposed method on Atari suits with dynamics-irrelevant noises. Our experiments show that exploration with DB bonus outperforms several state-of-the-art exploration methods in noisy environments.

ML-57-标题: Part-X: A Family of Stochastic Algorithms for Search-Based Test Generation with Probabilistic Guarantees

Abstract: Requirements driven search-based testing (also known as falsification) has proven to be a practical and effective method for discovering erroneous behaviors in Cyber-Physical Systems. Despite the constant improvements on the performance and applicability of falsification methods, they all share a common characteristic. Namely, they are best-effort methods which do not provide any guarantees on the absence of erroneous behaviors (falsifiers) when the testing budget is exhausted. The absence of finite time guarantees is a major limitation which prevents falsification methods from being utilized in certification procedures. In this paper, we address the finite-time guarantees problem by developing a new stochastic algorithm. Our proposed algorithm not only estimates (bounds) the probability that falsifying behaviors exist, but also it identifies the regions where these falsifying behaviors may occur. We demonstrate the applicability of our approach on standard benchmark functions from the optimization literature and on the F16 benchmark problem.

ML-58-标题: PPFS: Predictive Permutation Feature Selection

Abstract: We propose Predictive Permutation Feature Selection (PPFS), a novel wrapper-based feature selection method based on the concept of Markov Blanket (MB). Unlike previous MB methods, PPFS is a universal feature selection technique as it can work for both classification as well as regression tasks on datasets containing categorical and/or continuous features. We propose Predictive Permutation Independence (PPI), a new Conditional Independence (CI) test, which enables PPFS to be categorised as a wrapper feature selection method. This is in contrast to current filter based MB feature selection techniques that are unable to harness the advancements in supervised algorithms such as Gradient Boosting Machines (GBM). The PPI test is based on the knockoff framework and utilizes supervised algorithms to measure the association between an individual or a set of features and the target variable. We also propose a novel MB aggregation step that addresses the issue of sample inefficiency. Empirical evaluations and comparisons on a large number of datasets demonstrate that PPFS outperforms state-of-the-art Markov blanket discovery algorithms as well as, well-known wrapper methods. We also provide a sketch of the proof of correctness of our method. Implementation of this work is available at \url{this https URL}

ML-59-标题: SEA: Graph Shell Attention in Graph Neural Networks

Abstract: A common issue in Graph Neural Networks (GNNs) is known as over-smoothing. By increasing the number of iterations within the message-passing of GNNs, the nodes’ representations of the input graph align with each other and become indiscernible. Recently, it has been shown that increasing a model’s complexity by integrating an attention mechanism yields more expressive architectures. This is majorly contributed to steering the nodes’ representations only towards nodes that are more informative than others. Transformer models in combination with GNNs result in architectures including Graph Transformer Layers (GTL), where layers are entirely based on the attention operation. However, the calculation of a node’s representation is still restricted to the computational working flow of a GNN. In our work, we relax the GNN architecture by means of implementing a routing heuristic. Specifically, the nodes’ representations are routed to dedicated experts. Each expert calculates the representations according to their respective GNN workflow. The definitions of distinguishable GNNs result from k-localized views starting from the central node. We call this procedure Graph Shell Attention (SEA), where experts process different subgraphs in a transformer-motivated fashion. Intuitively, by increasing the number of experts, the models gain in expressiveness such that a node’s representation is solely based on nodes that are located within the receptive field of an expert. We evaluate our architecture on various benchmark datasets showing competitive results compared to state-of-the-art models.

ML-60-标题: OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems

Abstract: Python has become a dominant programming language for emerging areas like Machine Learning (ML), Deep Learning (DL), and Data Science (DS). An attractive feature of Python is that it provides easy-to-use programming interface while allowing library developers to enhance performance of their applications by harnessing the computing power offered by High Performance Computing (HPC) platforms. Efficient communication is key to scaling applications on parallel systems, which is typically enabled by the Message Passing Interface (MPI) standard and compliant libraries on HPC hardware. mpi4py is a Python-based communication library that provides an MPI-like interface for Python applications allowing application developers to utilize parallel processing elements including GPUs. However, there is currently no benchmark suite to evaluate communication performance of mpi4py – and Python MPI codes in general – on modern HPC systems. In order to bridge this gap, we propose OMB-Py – Python extensions to the open-source OSU Micro-Benchmark (OMB) suite – aimed to evaluate communication performance of MPI-based parallel applications in Python. To the best of our knowledge, OMB-Py is the first communication benchmark suite for parallel Python applications. OMB-Py consists of a variety of point-to-point and collective communication benchmark tests that are implemented for a range of popular Python libraries including NumPy, CuPy, Numba, and PyCUDA. We also provide Python implementation for several distributed ML algorithms as benchmarks to understand the potential gain in performance for ML/DL workloads. Our evaluation reveals that mpi4py introduces a small overhead when compared to native MPI libraries. We also evaluate the ML/DL workloads and report up to 106x speedup on 224 CPU cores compared to sequential execution. We plan to publicly release OMB-Py to benefit Python HPC community.

ML-61-标题: Adversarial Socialbot Learning via Multi-Agent Deep Hierarchical Reinforcement Learning

Abstract: Socialbots are software-driven user accounts on social platforms, acting autonomously (mimicking human behavior), with the aims to influence the opinions of other users or spread targeted misinformation for particular goals. As socialbots undermine the ecosystem of social platforms, they are often considered harmful. As such, there have been several computational efforts to auto-detect the socialbots. However, to our best knowledge, the adversarial nature of these socialbots has not yet been studied. This begs a question “can adversaries, controlling socialbots, exploit AI techniques to their advantage?” To this question, we successfully demonstrate that indeed it is possible for adversaries to exploit computational learning mechanism such as reinforcement learning (RL) to maximize the influence of socialbots while avoiding being detected. We first formulate the adversarial socialbot learning as a cooperative game between two functional hierarchical RL agents. While one agent curates a sequence of activities that can avoid the detection, the other agent aims to maximize network influence by selectively connecting with right users. Our proposed policy networks train with a vast amount of synthetic graphs and generalize better than baselines on unseen real-life graphs both in terms of maximizing network influence (up to +18%) and sustainable stealthiness (up to +40% undetectability) under a strong bot detector (with 90% detection accuracy). During inference, the complexity of our approach scales linearly, independent of a network’s structure and the virality of news. This makes our approach a practical adversarial attack when deployed in a real-life setting.

ML-62-标题: More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences

Abstract: Incorporating prior knowledge in reinforcement learning algorithms is mainly an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence: that is, when different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost, by solving a convex optimization problem. By replacing the usual epsilon-greedy strategy in a DQN, we demonstrate its potential in several environments with various dynamic structures.

ML-63-标题: Independent Natural Policy Gradient Always Converges in Markov Potential Games

Abstract: Multi-agent reinforcement learning has been successfully applied to fully-cooperative and fully-competitive environments, but little is currently known about mixed cooperative/competitive environments. In this paper, we focus on a particular class of multi-agent mixed cooperative/competitive stochastic games called Markov Potential Games (MPGs), which include cooperative games as a special case. Recent results have shown that independent policy gradient converges in MPGs but it was not known whether Independent Natural Policy Gradient converges in MPGs as well. We prove that Independent Natural Policy Gradient always converges in the last iterate using constant learning rates. The proof deviates from the existing approaches and the main challenge lies in the fact that Markov Potential Games do not have unique optimal values (as single-agent settings exhibit) so different initializations can lead to different limit point values. We complement our theoretical results with experiments that indicate that Natural Policy Gradient outperforms Policy Gradient in routing games and congestion games.

ML-64-标题: Transductive Robust Learning Guarantees

Abstract: We study the problem of adversarially robust learning in the transductive setting. For classes $\mathcal{H}$ of bounded VC dimension, we propose a simple transductive learner that when presented with a set of labeled training examples and a set of unlabeled test examples (both sets possibly adversarially perturbed), it correctly labels the test examples with a robust error rate that is linear in the VC dimension and is adaptive to the complexity of the perturbation set. This result provides an exponential improvement in dependence on VC dimension over the best known upper bound on the robust error in the inductive setting, at the expense of competing with a more restrictive notion of optimal robust error.

ML-65-标题: Color Teams for Machine Learning Development

Abstract: Machine learning and software development share processes and methodologies for reliably delivering products to customers. This work proposes the use of a new teaming construct for forming machine learning teams for better combatting adversarial attackers. In cybersecurity, infrastructure uses these teams to protect their systems by using system builders and programmers to also offer more robustness to their platforms. Color teams provide clear responsibility to the individuals on each team for which part of the baseline (Yellow), attack (Red), and defense (Blue) breakout of the pipeline. Combining colors leads to additional knowledge shared across the team and more robust models built during development. The responsibilities of the new teams Orange, Green, and Purple will be outlined during this paper along with an overview of the necessary resources for these teams to be successful.

ML-66-标题: Time-Domain Mapping Based Single-Channel Speech Separation With Hierarchical Constraint Training

Abstract: Single-channel speech separation is required for multi-speaker speech recognition. Recent deep learning-based approaches focused on time-domain audio separation net (TasNet) because it has superior performance and lower latency compared to the conventional time-frequency-based (T-F-based) approaches. Most of these works rely on the masking-based method that estimates a linear mapping function (mask) for each speaker. However, the other commonly used method, the mapping-based method that is less sensitive to SNR variations, is inadequately studied in the time domain. We explore the potential of the mapping-based method by introducing attention augmented DPRNN (AttnAugDPRNN) which directly approximates the clean sources from the mixture for speech separation. Permutation Invariant Training (PIT) has been a paradigm to solve the label ambiguity problem for speech separation but usually leads to suboptimal performance. To solve this problem, we propose an efficient training strategy called Hierarchical Constraint Training (HCT) to regularize the training, which could effectively improve the model performance. When using PIT, our results showed that mapping-based AttnAugDPRNN outperformed masking-based AttnAugDPRNN when the training corpus is large. Mapping-based AttnAugDPRNN with HCT significantly improved the SI-SDR by 10.1% compared to the masking-based AttnAugDPRNN without HCT.

ML-67-标题: Distributionally Robust Semi-Supervised Learning Over Graphs

Abstract: Semi-supervised learning (SSL) over graph-structured data emerges in many network science applications. To efficiently manage learning over graphs, variants of graph neural networks (GNNs) have been developed recently. By succinctly encoding local graph structures and features of nodes, state-of-the-art GNNs can scale linearly with the size of graph. Despite their success in practice, most of existing methods are unable to handle graphs with uncertain nodal attributes. Specifically whenever mismatches between training and testing data distribution exists, these models fail in practice. Challenges also arise due to distributional uncertainties associated with data acquired by noisy measurements. In this context, a distributionally robust learning framework is developed, where the objective is to train models that exhibit quantifiable robustness against perturbations. The data distribution is considered unknown, but lies within a Wasserstein ball centered around empirical data distribution. A robust model is obtained by minimizing the worst expected loss over this ball. However, solving the emerging functional optimization problem is challenging, if not impossible. Advocating a strong duality condition, we develop a principled method that renders the problem tractable and efficiently solvable. Experiments assess the performance of the proposed method.

ML-68-标题: Behavioral Experiments for Understanding Catastrophic Forgetting

Abstract: In this paper we explore whether the fundamental tool of experimental psychology, the behavioral experiment, has the power to generate insight not only into humans and animals, but artificial systems too. We apply the techniques of experimental psychology to investigating catastrophic forgetting in neural networks. We present a series of controlled experiments with two-layer ReLU networks, and exploratory results revealing a new understanding of the behavior of catastrophic forgetting. Alongside our empirical findings, we demonstrate an alternative, behavior-first approach to investigating neural network phenomena.

ML-69-标题: Why Settle for Just One? Extending EL Ontology Embeddings with Many-to-Many Relationships

Abstract: Knowledge Graph (KG) embeddings provide a low-dimensional representation of entities and relations of a Knowledge Graph and are used successfully for various applications such as question answering and search, reasoning, inference, and missing link prediction. However, most of the existing KG embeddings only consider the network structure of the graph and ignore the semantics and the characteristics of the underlying ontology that provides crucial information about relationships between entities in the KG. Recent efforts in this direction involve learning embeddings for a Description Logic (logical underpinning for ontologies) named EL++. However, such methods consider all the relations defined in the ontology to be one-to-one which severely limits their performance and applications. We provide a simple and effective solution to overcome this shortcoming that allows such methods to consider many-to-many relationships while learning embedding representations. Experiments conducted using three different EL++ ontologies show substantial performance improvement over five baselines. Our proposed solution also paves the way for learning embedding representations for even more expressive description logics such as SROIQ.

ML-70-标题: Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning

Abstract: We present a novel characterization of the mapping of multiple parallelism forms (e.g. data and model parallelism) onto hierarchical accelerator systems that is hierarchy-aware and greatly reduces the space of software-to-hardware mapping. We experimentally verify the substantial effect of these mappings on all-reduce performance (up to 448x). We offer a novel syntax-guided program synthesis framework that is able to decompose reductions over one or more parallelism axes to sequences of collectives in a hierarchy- and mapping-aware way. For 69% of parallelism placements and user requested reductions, our framework synthesizes programs that outperform the default all-reduce implementation when evaluated on different GPU hierarchies (max 2.04x, average 1.27x). We complement our synthesis tool with a simulator exceeding 90% top-10 accuracy, which therefore reduces the need for massive evaluations of synthesis results to determine a small set of optimal programs and mappings.

ML-71-标题: Ranking and Tuning Pre-trained Models: A New Paradigm of Exploiting Model Hubs

Abstract: Pre-trained model hubs with many pre-trained models (PTMs) have been a cornerstone in deep learning. Although built at a high cost, they are in fact \emph{under-exploited}: practitioners usually pick one PTM from the provided model hub by popularity, and then fine-tune the PTM to solve the target task. This nave but common practice poses two obstacles to sufficiently exploiting pre-trained model hubs: (1) the PTM selection procedure has no optimality guarantee; (2) only one PTM is used while the rest PTMs are overlooked. Ideally, to maximally exploit pre-trained model hubs, trying all combinations of PTMs and extensively fine-tuning each combination of PTMs are required, which incurs exponential combinations and unaffordable computational budget. In this paper, we propose a new paradigm of exploiting model hubs by ranking and tuning pre-trained models: (1) Our conference work~\citep{you_logme:_2021} proposed LogME to estimate the maximum value of label evidence given features extracted by pre-trained models, which can rank all the PTMs in a model hub for various types of PTMs and tasks \emph{before fine-tuning}. (2) the best ranked PTM can be fine-tuned and deployed if we have no preference for the model’s architecture, or the target PTM can be tuned by top-K ranked PTMs via the proposed B-Tuning algorithm. The ranking part is based on the conference paper, and we complete its theoretical analysis (convergence proof of the heuristic evidence maximization procedure, and the influence of feature dimension) in this paper. The tuning part introduces a novel Bayesian Tuning (B-Tuning) method for multiple PTMs tuning, which surpasses dedicated methods designed for homogeneous PTMs tuning and sets up new state of the art for heterogeneous PTMs tuning. We believe the new paradigm of exploiting PTM hubs can interest a large audience of the community.

ML-72-标题: Sampling from Arbitrary Functions via PSD Models

Abstract: In many areas of applied statistics and machine learning, generating an arbitrary number of independent and identically distributed (i.i.d.) samples from a given distribution is a key task. When the distribution is known only through evaluations of the density, current methods either scale badly with the dimension or require very involved implementations. Instead, we take a two-step approach by first modeling the probability distribution and then sampling from that model. We use the recently introduced class of positive semi-definite (PSD) models, which have been shown to be efficient for approximating probability densities. We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models. We also present preliminary empirical results to illustrate our assertions.

ML-73-标题: Statistical and Topological Properties of Gaussian Smoothed Sliced Probability Divergences

Abstract: Gaussian smoothed sliced Wasserstein distance has been recently introduced for comparing probability distributions, while preserving privacy on the data. It has been shown, in applications such as domain adaptation, to provide performances similar to its non-private (non-smoothed) counterpart. However, the computational and statistical properties of such a metric is not yet been well-established. In this paper, we analyze the theoretical properties of this distance as well as those of generalized versions denoted as Gaussian smoothed sliced divergences. We show that smoothing and slicing preserve the metric property and the weak topology. We also provide results on the sample complexity of such divergences. Since, the privacy level depends on the amount of Gaussian smoothing, we analyze the impact of this parameter on the divergence. We support our theoretical findings with empirical studies of Gaussian smoothed and sliced version of Wassertein distance, Sinkhorn divergence and maximum mean discrepancy (MMD). In the context of privacy-preserving domain adaptation, we confirm that those Gaussian smoothed sliced Wasserstein and MMD divergences perform very well while ensuring data privacy.

ML-74-标题: CIM-PPO:Proximal Policy Optimization with Liu-Correntropy Induced Metric

Abstract: As an algorithm based on deep reinforcement learning, Proximal Policy Optimization (PPO) performs well in many complex tasks and has become one of the most popular RL algorithms in recent years. According to the mechanism of penalty in surrogate objective, PPO can be divided into PPO with KL Divergence (KL-PPO) and PPO with Clip function(Clip-PPO). Clip-PPO is widely used in a variety of practical scenarios and has attracted the attention of many researchers. Therefore, many variations have also been created, making the algorithm better and better. However, as a more theoretical algorithm, KL-PPO was neglected because its performance was not as good as CliP-PPO. In this article, we analyze the asymmetry effect of KL divergence on PPO’s objective function , and give the inequality that can indicate when the asymmetry will affect the efficiency of KL-PPO. Proposed PPO with Correntropy Induced Metric algorithm(CIM-PPO) that use the theory of correntropy(a symmetry metric method that was widely used in M-estimation to evaluate two distributions’ difference)and applied it in PPO. Then, we designed experiments based on OpenAIgym to test the effectiveness of the new algorithm and compare it with KL-PPO and CliP-PPO.

ML-75-标题: Periodic DMP formulation for Quaternion Trajectories

Abstract: Imitation learning techniques have been used as a way to transfer skills to robots. Among them, dynamic movement primitives (DMPs) have been widely exploited as an effective and an efficient technique to learn and reproduce complex discrete and periodic skills. While DMPs have been properly formulated for learning point-to-point movements for both translation and orientation, periodic ones are missing a formulation to learn the orientation. To address this gap, we propose a novel DMP formulation that enables encoding of periodic orientation trajectories. Within this formulation we develop two approaches: Riemannian metric-based projection approach and unit quaternion based periodic DMP. Both formulations exploit unit quaternions to represent the orientation. However, the first exploits the properties of Riemannian manifolds to work in the tangent space of the unit sphere. The second encodes directly the unit quaternion trajectory while guaranteeing the unitary norm of the generated quaternions. We validated the technical aspects of the proposed methods in simulation. Then we performed experiments on a real robot to execute daily tasks that involve periodic orientation changes (i.e., surface polishing/wiping and liquid mixing by shaking).

ML-76-标题: Transferring Reinforcement Learning for DC-DC Buck Converter Control via Duty Ratio Mapping: From Simulation to Implementation

Abstract: Reinforcement learning (RL) control approach with application into power electronics systems has become an emerging topic whilst the sim-to-real issue remains a challenging problem as very few results can be referred to in the literature. Indeed, due to the inevitable mismatch between simulation models and real-life systems, offline trained RL control strategies may sustain unexpected hurdles in practical implementation during transferring procedure. As the main contribution of this paper, a transferring methodology via a delicately designed duty ratio mapping (DRM) is proposed for a DC-DC buck converter. Then, a detailed sim-to-real process is presented to enable the implementation of a model-free deep reinforcement learning (DRL) controller. The feasibility and effectiveness of the proposed methodology are demonstrated by comparative experimental studies.

ML-77-标题: A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays

Abstract: In the last few years, research and development on Deep Learning models and techniques for ultra-low-power devices in a word, TinyML has mainly focused on a train-then-deploy assumption, with static models that cannot be adapted to newly collected data without cloud-based data collection and fine-tuning. Latent Replay-based Continual Learning (CL) techniques[1] enable online, serverless adaptation in principle, but so farthey have still been too computation and memory-hungry for ultra-low-power TinyML devices, which are typically based on microcontrollers. In this work, we introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power (PULP) processor. We rethink the baseline Latent Replay CL algorithm, leveraging quantization of the frozen stage of the model and Latent Replays (LRs) to reduce their memory cost with minimal impact on accuracy. In particular, 8-bit compression of the LR memory proves to be almost lossless (-0.26% with 3000LR) compared to the full-precision baseline implementation, but requires 4x less memory, while 7-bit can also be used with an additional minimal accuracy degradation (up to 5%). We also introduce optimized primitives for forward and backward propagation on the PULP processor. Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory an amount compatible with embedding in TinyML devices. On an advanced 22nm prototype of our platform, called VEGA, the proposed solution performs onaverage 65x faster than a low-power STM32 L4 microcontroller, being 37x more energy efficient enough for a lifetime of 535h when learning a new mini-batch of data once every minute.

ML-78-标题: Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation

Abstract: Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.

ML-79-标题: Reconstruction of Fragmented Trajectories of Collective Motion using Hadamard Deep Autoencoders

Abstract: Learning dynamics of collectively moving agents such as fish or humans is an active field in research. Due to natural phenomena such as occlusion and change of illumination, the multi-object methods tracking such dynamics might lose track of the agents where that might result fragmentation in the constructed trajectories. Here, we present an extended deep autoencoder (DA) that we train only on fully observed segments of the trajectories by defining its loss function as the Hadamard product of a binary indicator matrix with the absolute difference between the outputs and the labels. The trajectories of the agents practicing collective motion is low-rank due to mutual interactions and dependencies between the agents that we utilize as the underlying pattern that our Hadamard deep autoencoder (HDA) codes during its training. The performance of our HDA is compared with that of a low-rank matrix completion scheme in the context of fragmented trajectory reconstruction.

ML-80-标题: ProxyBO: Accelerating Neural Architecture Search via Bayesian Optimization with Zero-cost Proxies

Abstract: Designing neural architectures requires immense manual efforts. This has promoted the development of neural architecture search (NAS) to automate this design. While previous NAS methods achieve promising results but run slowly and zero-cost proxies run extremely fast but are less promising, recent work considers utilizing zero-cost proxies via a simple warm-up. The existing method has two limitations, which are unforeseeable reliability and one-shot usage. To address the limitations, we present ProxyBO, an efficient Bayesian optimization framework that utilizes the zero-cost proxies to accelerate neural architecture search. We propose the generalization ability measurement to estimate the fitness of proxies on the task during each iteration and then combine BO with zero-cost proxies via dynamic influence combination. Extensive empirical studies show that ProxyBO consistently outperforms competitive baselines on five tasks from three public benchmarks. Concretely, ProxyBO achieves up to 5.41x and 3.83x speedups over the state-of-the-art approach REA and BRP-NAS, respectively.

ML-81-标题: Encoding spatiotemporal priors with VAEs for small-area estimation

Abstract: Gaussian processes (GPs), implemented through multivariate Gaussian distributions for a finite collection of data, are the most popular approach in small-area spatiotemporal statistical modelling. In this context they are used to encode correlation structures over space and time and can generalise well in interpolation tasks. Despite their flexibility, off-the-shelf GPs present serious computational challenges which limit their scalability and practical usefulness in applied settings. Here, we propose a novel, deep generative modelling approach to tackle this challenge: for a particular spatiotemporal setting, we approximate a class of GP priors through prior sampling and subsequent fitting of a variational autoencoder (VAE). Given a trained VAE, the resultant decoder allows spatiotemporal inference to become incredibly efficient due to the low dimensional, independently distributed latent Gaussian space representation of the VAE. Once trained, inference using the VAE decoder replaces the GP within a Bayesian sampling framework. This approach provides tractable and easy-to-implement means of approximately encoding spatiotemporal priors and facilitates efficient statistical inference. We demonstrate the utility of our VAE two stage approach on Bayesian, small-area estimation tasks.

ML-82-标题: JavaBERT: Training a transformer-based model for the Java programming language

Abstract: Code quality is and will be a crucial factor while developing new software code, requiring appropriate tools to ensure functional and reliable code. Machine learning techniques are still rarely used for software engineering tools, missing out the potential benefits of its application. Natural language processing has shown the potential to process text data regarding a variety of tasks. We argue, that such models can also show similar benefits for software code processing. In this paper, we investigate how models used for natural language processing can be trained upon software code. We introduce a data retrieval pipeline for software code and train a model upon Java software code. The resulting model, JavaBERT, shows a high accuracy on the masked language modeling task showing its potential for software engineering tools.

ML-83-标题: An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR

Abstract: In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency. The triggered attention mechanism, which performs autoregressive decoding triggered by the CTC spike, has shown to be effective in streaming ASR. However, in order to maintain high accuracy of alignment estimation based on CTC outputs, which is the key to its performance, it is inevitable that decoding should be performed with some future information input (i.e., with higher latency). It should be noted that in streaming ASR, it is desirable to be able to achieve high recognition accuracy while keeping the latency low. Therefore, the present study aims to achieve highly accurate streaming ASR with low latency by introducing Mask-CTC, which is capable of learning feature representations that anticipate future information (i.e., that can consider long-term contexts), to the encoder pre-training. Experimental comparisons conducted using WSJ data demonstrate that the proposed method achieves higher accuracy with lower latency than the conventional triggered attention-based streaming ASR system.

ML-84-标题: Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for sparse recover

Abstract: In this paper, we consider deep neural networks for solving inverse problems that are robust to forward model mis-specifications. Specifically, we treat sensing problems with model mismatch where one wishes to recover a sparse high-dimensional vector from low-dimensional observations subject to uncertainty in the measurement operator. We then design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem. Our proposed network - named Robust lEarned Shrinkage-Thresholding (REST) - exhibits an additional normalization processing compared to Learned Iterative Shrinkage-Thresholding Algorithm (LISTA), leading to reliable recovery of the signal under sample-wise varying model mismatch. The proposed REST network is shown to outperform state-of-the-art model-based and data-driven algorithms in both compressive sensing and radar imaging problems wherein model mismatch is taken into consideration.

ML-85-标题: Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic Forecasting

Abstract: Traffic forecasting is a challenging problem due to complex road networks and sudden speed changes caused by various events on roads. A number of models have been proposed to solve this challenging problem with a focus on learning spatio-temporal dependencies of roads. In this work, we propose a new perspective of converting the forecasting problem into a pattern matching task, assuming that large data can be represented by a set of patterns. To evaluate the validness of the new perspective, we design a novel traffic forecasting model, called Pattern-Matching Memory Networks (PM-MemNet), which learns to match input data to the representative patterns with a key-value memory structure. We first extract and cluster representative traffic patterns, which serve as keys in the memory. Then via matching the extracted keys and inputs, PM-MemNet acquires necessary information of existing traffic patterns from the memory and uses it for forecasting. To model spatio-temporal correlation of traffic, we proposed novel memory architecture GCMem, which integrates attention and graph convolution for memory enhancement. The experiment results indicate that PM-MemNet is more accurate than state-of-the-art models, such as Graph WaveNet with higher responsiveness. We also present a qualitative analysis result, describing how PM-MemNet works and achieves its higher accuracy when road speed rapidly changes.

ML-86-标题: Cascaded Compressed Sensing Networks: A Reversible Architecture for Layerwise Learning

Abstract: Recently, the method that learns networks layer by layer has attracted increasing interest for its ease of analysis. For the method, the main challenge lies in deriving an optimization target for each layer by inversely propagating the global target of the network. The propagation problem is ill posed, due to involving the inversion of nonlinear activations from lowdimensional to high-dimensional spaces. To address the problem, the existing solution is to learn an auxiliary network to specially propagate the target. However, the network lacks stability, and moreover, it results in higher complexity for network learning. In the letter, we show that target propagation could be achieved by modeling the network s each layer with compressed sensing, without the need of auxiliary networks. Experiments show that the proposed method could achieve better performance than the auxiliary network-based method.

ML-87-标题: Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching

Abstract: Mobile edge computing (MEC) is a prominent computing paradigm which expands the application fields of wireless communication. Due to the limitation of the capacities of user equipments and MEC servers, edge caching (EC) optimization is crucial to the effective utilization of the caching resources in MEC-enabled wireless networks. However, the dynamics and complexities of content popularities over space and time as well as the privacy preservation of users pose significant challenges to EC optimization. In this paper, a privacy-preserving distributed deep deterministic policy gradient (P2D3PG) algorithm is proposed to maximize the cache hit rates of devices in the MEC networks. Specifically, we consider the fact that content popularities are dynamic, complicated and unobservable, and formulate the maximization of cache hit rates on devices as distributed problems under the constraints of privacy preservation. In particular, we convert the distributed optimizations into distributed model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction. Subsequently, a P2D3PG algorithm is developed based on distributed reinforcement learning to solve the distributed problems. Simulation results demonstrate the superiority of the proposed approach in improving EC hit rate over the baseline methods while preserving user privacy.

ML-88-标题: Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond

Abstract: In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods. Most existing analyses of these methods assume independent and unbiased gradient estimates obtained via with-replacement sampling. In contrast, we study shuffling-based variants: minibatch and local Random Reshuffling, which draw stochastic gradients without replacement and are thus closer to practice. For smooth functions satisfying the Polyak-Łojasiewicz condition, we obtain convergence bounds (in the large epoch regime) which show that these shuffling-based variants converge faster than their with-replacement counterparts. Moreover, we prove matching lower bounds showing that our convergence analysis is tight. Finally, we propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.

ML-89-标题: One-Step Abductive Multi-Target Learning with Diverse Noisy Samples

Abstract: One-step abductive multi-target learning (OSAMTL) was proposed to handle complex noisy labels. In this paper, giving definition of diverse noisy samples (DNS), we propose one-step abductive multi-target learning with DNS (OSAMTL-DNS) to expand the original OSAMTL to a wider range of tasks that handle complex noisy labels.

ML-90-标题: Frontiers in Evolutionary Computation: A Workshop Report

Abstract: In July of 2021, the Santa Fe Institute hosted a workshop on evolutionary computation as part of its Foundations of Intelligence in Natural and Artificial Systems project. This project seeks to advance the field of artificial intelligence by promoting interdisciplinary research on the nature of intelligence. The workshop brought together computer scientists and biologists to share their insights about the nature of evolution and the future of evolutionary computation. In this report, we summarize each of the talks and the subsequent discussions. We also draw out a number of key themes and identify important frontiers for future research.

ML-91-标题: When in Doubt Summon the Titans: Efficient Inference with Large Models

Abstract: Scaling neural networks to “large” sizes, with billions of parameters, has been shown to yield impressive results on many challenging problems. However, the inference cost incurred by such large models often prevents their application in most real-world settings. In this paper, we propose a two-stage framework based on distillation that realizes the modelling benefits of the large models, while largely preserving the computational benefits of inference with more lightweight models. In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of “easy” examples; for the “hard” examples, we fall-back to the teacher. Such an approach allows us to efficiently employ large models in practical scenarios where easy examples are much more frequent than rare hard examples. Our proposed use of distillation to only handle easy instances allows for a more aggressive trade-off in the student size, thereby reducing the amortized cost of inference and achieving better accuracy than standard distillation. Empirically, we demonstrate the benefits of our approach on both image classification and natural language processing benchmarks.

ML-92-标题: Layer-wise Adaptive Model Aggregation for Scalable Federated Learning

Abstract: In Federated Learning, a common approach for aggregating local models across clients is periodic averaging of the full model parameters. It is, however, known that different layers of neural networks can have a different degree of model discrepancy across the clients. The conventional full aggregation scheme does not consider such a difference and synchronizes the whole model parameters at once, resulting in inefficient network bandwidth consumption. Aggregating the parameters that are similar across the clients does not make meaningful training progress while increasing the communication cost. We propose FedLAMA, a layer-wise model aggregation scheme for scalable Federated Learning. FedLAMA adaptively adjusts the aggregation interval in a layer-wise manner, jointly considering the model discrepancy and the communication cost. The layer-wise aggregation method enables to finely control the aggregation interval to relax the aggregation frequency without a significant impact on the model accuracy. Our empirical study shows that FedLAMA reduces the communication cost by up to 60% for IID data and 70% for non-IID data while achieving a comparable accuracy to FedAvg.

ML-93-标题: Expressivity of Neural Networks via Chaotic Itineraries beyond Sharkovskys Theorem

Abstract: Given a target function $f$, how large must a neural network be in order to approximate $f$? Recent works examine this basic question on neural network \textit{expressivity} from the lens of dynamical systems and provide novel depth-vs-width'' tradeoffs for a large family of functions $f$. They suggest that such tradeoffs are governed by the existence of \textit{periodic} points or \emph{cycles} in $f$. Our work, by further deploying dynamical systems concepts, illuminates a more subtle connection between periodicity and expressivity: we prove that periodic points alone lead to suboptimal depth-width tradeoffs and we improve upon them by demonstrating that certain chaotic itineraries’’ give stronger exponential tradeoffs, even in regimes where previous analyses only imply polynomial gaps. Contrary to prior works, our bounds are nearly-optimal, tighten as the period increases, and handle strong notions of inapproximability (e.g., constant $L_1$ error). More broadly, we identify a phase transition to the \textit{chaotic regime} that exactly coincides with an abrupt shift in other notions of function complexity, including VC-dimension and topological entropy.

Abstract: As machine learning (ML) techniques are being increasingly used in many applications, their vulnerability to adversarial attacks becomes well-known. Test time attacks, usually launched by adding adversarial noise to test instances, have been shown effective against the deployed ML models. In practice, one test input may be leveraged by different ML models. Test time attacks targeting a single ML model often neglect their impact on other ML models. In this work, we empirically demonstrate that naively attacking the classifier learning one concept may negatively impact classifiers trained to learn other concepts. For example, for the online image classification scenario, when the Gender classifier is under attack, the (wearing) Glasses classifier is simultaneously attacked with the accuracy dropped from 98.69 to 88.42. This raises an interesting question: is it possible to attack one set of classifiers without impacting the other set that uses the same test instance? Answers to the above research question have interesting implications for protecting privacy against ML model misuse. Attacking ML models that pose unnecessary risks of privacy invasion can be an important tool for protecting individuals from harmful privacy exploitation. In this paper, we address the above research question by developing novel attack techniques that can simultaneously attack one set of ML models while preserving the accuracy of the other. In the case of linear classifiers, we provide a theoretical framework for finding an optimal solution to generate such adversarial examples. Using this theoretical framework, we develop a multi-concept attack strategy in the context of deep learning. Our results demonstrate that our techniques can successfully attack the target classes while protecting the protected classes in many different settings, which is not possible with the existing test-time attack-single strategies.

ML-95-标题: Robust Semi-Supervised Classification using GANs with Self-Organizing Maps

Abstract: Generative adversarial networks (GANs) have shown tremendous promise in learning to generate data and effective at aiding semi-supervised classification. However, to this point, semi-supervised GAN methods make the assumption that the unlabeled data set contains only samples of the joint distribution of the classes of interest, referred to as inliers. Consequently, when presented with a sample from other distributions, referred to as outliers, GANs perform poorly at determining that it is not qualified to make a decision on the sample. The problem of discriminating outliers from inliers while maintaining classification accuracy is referred to here as the DOIC problem. In this work, we describe an architecture that combines self-organizing maps (SOMs) with SS-GANS with the goal of mitigating the DOIC problem and experimental results indicating that the architecture achieves the goal. Multiple experiments were conducted on hyperspectral image data sets. The SS-GANS performed slightly better than supervised GANS on classification problems with and without the SOM. Incorporating the SOMs into the SS-GANs and the supervised GANS led to substantially mitigation of the DOIC problem when compared to SS-GANS and GANs without the SOMs. Furthermore, the SS-GANS performed much better than GANS on the DOIC problem, even without the SOMs.

ML-96-标题: A Simple Approach to Continual Learning by Transferring Skill Parameters

Abstract: In order to be effective general purpose machines in real world environments, robots not only will need to adapt their existing manipulation skills to new circumstances, they will need to acquire entirely new skills on-the-fly. A great promise of continual learning is to endow robots with this ability, by using their accumulated knowledge and experience from prior skills. We take a fresh look at this problem, by considering a setting in which the robot is limited to storing that knowledge and experience only in the form of learned skill policies. We show that storing skill policies, careful pre-training, and appropriately choosing when to transfer those skill policies is sufficient to build a continual learner in the context of robotic manipulation. We analyze which conditions are needed to transfer skills in the challenging Meta-World simulation benchmark. Using this analysis, we introduce a pair-wise metric relating skills that allows us to predict the effectiveness of skill transfer between tasks, and use it to reduce the problem of continual learning to curriculum selection. Given an appropriate curriculum, we show how to continually acquire robotic manipulation skills without forgetting, and using far fewer samples than needed to train them from scratch.

ML-97-标题: Neural Stochastic Partial Differential Equations

Abstract: Stochastic partial differential equations (SPDEs) are the mathematical tool of choice to model complex spatio-temporal dynamics of systems subject to the influence of randomness. We introduce the Neural SPDE model providing an extension to two important classes of physics-inspired neural architectures. On the one hand, it extends all the popular neural – ordinary, controlled, stochastic, rough – differential equation models in that it is capable of processing incoming information even when the latter evolves in an infinite dimensional state space. On the other hand, it extends Neural Operators – recent generalizations of neural networks modelling mappings between functional spaces – in that it can be used to learn complex SPDE solution operators $(u_0,\xi) \mapsto u$ depending simultaneously on an initial condition $u_0$ and on a stochastic forcing term $\xi$, while remaining resolution-invariant and equation-agnostic. A Neural SPDE is constrained to respect real physical dynamics and consequently requires only a modest amount of data to train, depends on a significantly smaller amount of parameters and has better generalization properties compared to Neural Operators. Through various experiments on semilinear SPDEs with additive and multiplicative noise (including the stochastic Navier-Stokes equations) we demonstrate how Neural SPDEs can flexibly be used in a supervised learning setting as well as conditional generative models to sample solutions of SPDEs conditioned on prior knowledge, systematically achieving in both cases better performance than all alternative models.

ML-98-标题: More Engineering No Silos: Rethinking Processes and Interfaces in Collaboration between Interdisciplinary Teams for Machine Learning Projects

Abstract: The introduction of machine learning (ML) components in software projects has created the need for software engineers to collaborate with data scientists and other specialists. While collaboration can always be challenging, ML introduces additional challenges with its exploratory model development process, additional skills and knowledge needed, difficulties testing ML systems, need for continuous evolution and monitoring, and non-traditional quality requirements such as fairness and explainability. Through interviews with 45 practitioners from 28 organizations, we identified key collaboration challenges that teams face when building and deploying ML systems into production. We report on common collaboration points in the development of production ML systems for requirements, data, and integration, as well as corresponding team patterns and challenges. We find that most of these challenges center around communication, documentation, engineering, and process and collect recommendations to address these challenges.

ML-99-标题: Forecasting Market Prices using DL with Data Augmentation and Meta-learning: ARIMA still wins!

Abstract: Deep-learning techniques have been successfully used for time-series forecasting and have often shown superior performance on many standard benchmark datasets as compared to traditional techniques. Here we present a comprehensive and comparative study of performance of deep-learning techniques for forecasting prices in financial markets. We benchmark state-of-the-art deep-learning baselines, such as NBeats, etc., on data from currency as well as stock markets. We also generate synthetic data using a fuzzy-logic based model of demand driven by technical rules such as moving averages, which are often used by traders. We benchmark the baseline techniques on this synthetic data as well as use it for data augmentation. We also apply gradient-based meta-learning to account for non-stationarity of financial time-series. Our extensive experiments notwithstanding, the surprising result is that the standard ARIMA models outperforms deep-learning even using data augmentation or meta-learning. We conclude by speculating as to why this might be the case.

ML-100-标题: What Averages Do Not Tell – Predicting Real Life Processes with Sequential Deep Learning

Abstract: Deep Learning is proven to be an effective tool for modeling sequential data as shown by the success in Natural Language, Computer Vision and Signal Processing. Process Mining concerns discovering insights on business processes from their execution data that are logged by supporting information systems. The logged data (event log) is formed of event sequences (traces) that correspond to executions of a process. Many Deep Learning techniques have been successfully adapted for predictive Process Mining that aims to predict process outcomes, remaining time, the next event, or even the suffix of running traces. Traces in Process Mining are multimodal sequences and very differently structured than natural language sentences or images. This may require a different approach to processing. So far, there has been little focus on these differences and the challenges introduced. Looking at suffix prediction as the most challenging of these tasks, the performance of Deep Learning models was evaluated only on average measures and for a small number of real-life event logs. Comparing the results between papers is difficult due to different pre-processing and evaluation strategies. Challenges that may be relevant are the skewness of trace-length distribution and the skewness of the activity distribution in real-life event logs. We provide an end-to-end framework which enables to compare the performance of seven state-of-the-art sequential architectures in common settings. Results show that sequence modeling still has a lot of room for improvement for majority of the more complex datasets. Further research and insights are required to get consistent performance not just in average measures but additionally over all the prefixes.

ML-101-标题: A Federated Learning Aggregation Algorithm for Pervasive Computing: Evaluation and Comparison

Abstract: Pervasive computing promotes the installation of connected devices in our living spaces in order to provide services. Two major developments have gained significant momentum recently: an advanced use of edge resources and the integration of machine learning techniques for engineering applications. This evolution raises major challenges, in particular related to the appropriate distribution of computing elements along an edge-to-cloud continuum. About this, Federated Learning has been recently proposed for distributed model training in the edge. The principle of this approach is to aggregate models learned on distributed clients in order to obtain a new, more general model. The resulting model is then redistributed to clients for further training. To date, the most popular federated learning algorithm uses coordinate-wise averaging of the model parameters for aggregation. However, it has been shown that this method is not adapted in heterogeneous environments where data is not identically and independently distributed (non-iid). This corresponds directly to some pervasive computing scenarios where heterogeneity of devices and users challenges machine learning with the double objective of generalization and personalization. In this paper, we propose a novel aggregation algorithm, termed FedDist, which is able to modify its model architecture (here, deep neural network) by identifying dissimilarities between specific neurons amongst the clients. This permits to account for clients’ specificity without impairing generalization. Furthermore, we define a complete method to evaluate federated learning in a realistic way taking generalization and personalization into account. Using this method, FedDist is extensively tested and compared with three state-of-the-art federated learning algorithms on the pervasive domain of Human Activity Recognition with smartphones.

ML-102-标题: The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding

Abstract: There is often variation in the shape and size of input data used for deep learning. In many cases, such data can be represented using tensors with non-uniform shapes, or ragged tensors. Due to limited and non-portable support for efficient execution on ragged tensors, current deep learning frameworks generally use techniques such as padding and masking to make the data shapes uniform and then offload the computations to optimized kernels for dense tensor algebra. Such techniques can, however, lead to a lot of wasted computation and therefore, a loss in performance. This paper presents CoRa, a tensor compiler that allows users to easily generate efficient code for ragged tensor operators targeting a wide range of CPUs and GPUs. Evaluating CoRa on a variety of operators on ragged tensors as well as on an encoder layer of the transformer model, we find that CoRa (i)performs competitively with hand-optimized implementations of the operators and the transformer encoder and (ii) achieves, over PyTorch, a 1.6X geomean speedup for the encoder on an Nvidia GPU and a 1.86X geomean speedup for the multi-head attention module used in transformers on an ARM CPU.

ML-103-标题: fairadapt: Causal Reasoning for Fair Data Pre-processing

Abstract: Machine learning algorithms are useful for various predictions tasks, but they can also learn how to discriminate, based on gender, race or other sensitive attributes. This realization gave rise to the field of fair machine learning, which aims to measure and mitigate such algorithmic bias. This manuscript describes the R-package fairadapt, which implements a causal inference pre-processing method. By making use of a causal graphical model and the observed data, the method can be used to address hypothetical questions of the form “What would my salary have been, had I been of a different gender/race?”. Such individual level counterfactual reasoning can help eliminate discrimination and help justify fair decisions. We also discuss appropriate relaxations which assume certain causal pathways from the sensitive attribute to the outcome are not discriminatory.

ML-104-标题: NAS-HPO-Bench-II: A Benchmark Dataset on Joint Optimization of Convolutional Neural Network Architecture and Training Hyperparameters

Abstract: The benchmark datasets for neural architecture search (NAS) have been developed to alleviate the computationally expensive evaluation process and ensure a fair comparison. Recent NAS benchmarks only focus on architecture optimization, although the training hyperparameters affect the obtained model performances. Building the benchmark dataset for joint optimization of architecture and training hyperparameters is essential to further NAS research. The existing NAS-HPO-Bench is a benchmark for joint optimization, but it does not consider the network connectivity design as done in modern NAS algorithms. This paper introduces the first benchmark dataset for joint optimization of network connections and training hyperparameters, which we call NAS-HPO-Bench-II. We collect the performance data of 4K cell-based convolutional neural network architectures trained on the CIFAR-10 dataset with different learning rate and batch size settings, resulting in the data of 192K configurations. The dataset includes the exact data for 12 epoch training. We further build the surrogate model predicting the accuracies after 200 epoch training to provide the performance data of longer training epoch. By analyzing NAS-HPO-Bench-II, we confirm the dependency between architecture and training hyperparameters and the necessity of joint optimization. Finally, we demonstrate the benchmarking of the baseline optimization algorithms using NAS-HPO-Bench-II.

ML-105-标题: Identifying Stroke Indicators Using Rough Sets

Abstract: Stroke is widely considered as the second most common cause of mortality. The adverse consequences of stroke have led to global interest and work for improving the management and diagnosis of stroke. Various techniques for data mining have been used globally for accurate prediction of occurrence of stroke based on the risk factors that are associated with the electronic health care records (EHRs) of the patients. In particular, EHRs routinely contain several thousands of features and most of them are redundant and irrelevant that need to be discarded to enhance the prediction accuracy. The choice of feature-selection methods can help in improving the prediction accuracy of the model and efficient data management of the archived input features. In this paper, we systematically analyze the various features in EHR records for the detection of stroke. We propose a novel rough-set based technique for ranking the importance of the various EHR records in detecting stroke. Unlike the conventional rough-set techniques, our proposed technique can be applied on any dataset that comprises binary feature sets. We evaluated our proposed method in a publicly available dataset of EHR, and concluded that age, average glucose level, heart disease, and hypertension were the most essential attributes for detecting stroke in patients. Furthermore, we benchmarked the proposed technique with other popular feature-selection techniques. We obtained the best performance in ranking the importance of individual features in detecting stroke.

ML-106-标题: Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory

Abstract: Schrödinger Bridge (SB) is an optimal transport problem that has received increasing attention in deep generative modeling for its mathematical flexibility compared to the Scored-based Generative Model (SGM). However, it remains unclear whether the optimization principle of SB relates to the modern training of deep generative models, which often rely on constructing parameterized log-likelihood objectives.This raises questions on the suitability of SB models as a principled alternative for generative applications. In this work, we present a novel computational framework for likelihood training of SB models grounded on Forward-Backward Stochastic Differential Equations Theory – a mathematical methodology appeared in stochastic optimal control that transforms the optimality condition of SB into a set of SDEs. Crucially, these SDEs can be used to construct the likelihood objectives for SB that, surprisingly, generalizes the ones for SGM as special cases. This leads to a new optimization principle that inherits the same SB optimality yet without losing applications of modern generative training techniques, and we show that the resulting training algorithm achieves comparable results on generating realistic images on MNIST, CelebA, and CIFAR10.

ML-107-标题: On Optimal Interpolation In Linear Regression

Abstract: Understanding when and why interpolating methods generalize well has recently been a topic of interest in statistical learning theory. However, systematically connecting interpolating methods to achievable notions of optimality has only received partial attention. In this paper, we investigate the question of what is the optimal way to interpolate in linear regression using functions that are linear in the response variable (as the case for the Bayes optimal estimator in ridge regression) and depend on the data, the population covariance of the data, the signal-to-noise ratio and the covariance of the prior for the signal, but do not depend on the value of the signal itself nor the noise vector in the training data. We provide a closed-form expression for the interpolator that achieves this notion of optimality and show that it can be derived as the limit of preconditioned gradient descent with a specific initialization. We identify a regime where the minimum-norm interpolator provably generalizes arbitrarily worse than the optimal response-linear achievable interpolator that we introduce, and validate with numerical experiments that the notion of optimality we consider can be achieved by interpolating methods that only use the training data as input in the case of an isotropic prior. Finally, we extend the notion of optimal response-linear interpolation to random features regression under a linear data-generating model that has been previously studied in the literature.

ML-108-标题: User-friendly introduction to PAC-Bayes bounds

Abstract: Aggregated predictors are obtained by making a set of basic predictors vote according to some weights, that is, to some probability distribution. Randomized predictors are obtained by sampling in a set of basic predictors, according to some prescribed probability distribution. Thus, aggregated and randomized predictors have in common that they are not defined by a minimization problem, but by a probability distribution on the set of predictors. In statistical learning theory, there is a set of tools designed to understand the generalization ability of such procedures: PAC-Bayesian or PAC-Bayes bounds. Since the original PAC-Bayes bounds of McAllester, these tools have been considerably improved in many directions (we will for example describe a simplified version of the localization technique of Catoni that was missed by the community, and later rediscovered as “mutual information bounds”). Very recently, PAC-Bayes bounds received a considerable attention: for example there was workshop on PAC-Bayes at NIPS 2017, “(Almost) 50 Shades of Bayesian Learning: PAC-Bayesian trends and insights”, organized by B. Guedj, F. Bach and P. Germain. One of the reason of this recent success is the successful application of these bounds to neural networks by Dziugaite and Roy. An elementary introduction to PAC-Bayes theory is still missing. This is an attempt to provide such an introduction.

ML-109-标题: Mean Nyström Embeddings for Adaptive Compressive Learning

Abstract: Compressive learning is an approach to efficient large scale learning based on sketching an entire dataset to a single mean embedding (the sketch), i.e. a vector of generalized moments. The learning task is then approximately solved as an inverse problem using an adapted parametric model. Previous works in this context have focused on sketches obtained by averaging random features, that while universal can be poorly adapted to the problem at hand. In this paper, we propose and study the idea of performing sketching based on data-dependent Nyström approximation. From a theoretical perspective we prove that the excess risk can be controlled under a geometric assumption relating the parametric model used to learn from the sketch and the covariance operator associated to the task at hand. Empirically, we show for k-means clustering and Gaussian modeling that for a fixed sketch size, Nyström sketches indeed outperform those built with random features.

ML-110-标题: Data splitting improves statistical performance in overparametrized regimes

Abstract: While large training datasets generally offer improvement in model performance, the training process becomes computationally expensive and time consuming. Distributed learning is a common strategy to reduce the overall training time by exploiting multiple computing devices. Recently, it has been observed in the single machine setting that overparametrization is essential for benign overfitting in ridgeless regression in Hilbert spaces. We show that in this regime, data splitting has a regularizing effect, hence improving statistical performance and computational complexity at the same time. We further provide a unified framework that allows to analyze both the finite and infinite dimensional setting. We numerically demonstrate the effect of different model parameters.

ML-111-标题: REAL-M: Towards Speech Separation on Real Mixtures

Abstract: In recent years, deep learning based source separation has achieved impressive results. Most studies, however, still evaluate separation models on synthetic datasets, while the performance of state-of-the-art techniques on in-the-wild speech data remains an open question. This paper contributes to fill this gap in two ways. First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures. Secondly, we address the problem of performance evaluation of real-life mixtures, where the ground truth is not available. We bypass this issue by carefully designing a blind Scale-Invariant Signal-to-Noise Ratio (SI-SNR) neural estimator. Through a user study, we show that our estimator reliably evaluates the separation performance on real mixtures. The performance predictions of the SI-SNR estimator indeed correlate well with human opinions. Moreover, we observe that the performance trends predicted by our estimator on the REAL-M dataset closely follow those achieved on synthetic benchmarks when evaluating popular speech separation models.

ML-112-标题: Identifiable Variational Autoencoders via Sparse Decoding

Abstract: We develop the Sparse VAE, a deep generative model for unsupervised representation learning on high-dimensional data. Given a dataset of observations, the Sparse VAE learns a set of latent factors that captures its distribution. The model is sparse in the sense that each feature of the dataset (i.e., each dimension) depends on a small subset of the latent factors. As examples, in ratings data each movie is only described by a few genres; in text data each word is only applicable to a few topics; in genomics, each gene is active in only a few biological processes. We first show that the Sparse VAE is identifiable: given data drawn from the model, there exists a uniquely optimal set of factors. (In contrast, most VAE-based models are not identifiable.) The key assumption behind Sparse-VAE identifiability is the existence of “anchor features”, where for each factor there exists a feature that depends only on that factor. Importantly, the anchor features do not need to be known in advance. We then show how to fit the Sparse VAE with variational EM. Finally, we empirically study the Sparse VAE with both simulated and real data. We find that it recovers meaningful latent factors and has smaller heldout reconstruction error than related methods.

ML-113-标题: Adversarial attacks against Bayesian forecasting dynamic models

Abstract: The last decade has seen the rise of Adversarial Machine Learning (AML). This discipline studies how to manipulate data to fool inference engines, and how to protect those systems against such manipulation attacks. Extensive work on attacks against regression and classification systems is available, while little attention has been paid to attacks against time series forecasting systems. In this paper, we propose a decision analysis based attacking strategy that could be utilized against Bayesian forecasting dynamic models.

ML-114-标题: Pick-and-Mix Information Operators for Probabilistic ODE Solvers

Abstract: Probabilistic numerical solvers for ordinary differential equations compute posterior distributions over the solution of an initial value problem via Bayesian inference. In this paper, we leverage their probabilistic formulation to seamlessly include additional information as general likelihood terms. We show that second-order differential equations should be directly provided to the solver, instead of transforming the problem to first order. Additionally, by including higher-order information or physical conservation laws in the model, solutions become more accurate and more physically meaningful. Lastly, we demonstrate the utility of flexible information operators by solving differential-algebraic equations. In conclusion, the probabilistic formulation of numerical solvers offers a flexible way to incorporate various types of information, thus improving the resulting solutions.

ML-115-标题: Iterated Block Particle Filter for High-dimensional Parameter Learning: Beating the Curse of Dimensionality

Abstract: Parameter learning for high-dimensional, partially observed, and nonlinear stochastic processes is a methodological challenge. Spatiotemporal disease transmission systems provide examples of such processes giving rise to open inference problems. We propose the iterated block particle filter (IBPF) algorithm for learning high-dimensional parameters over graphical state space models with general state spaces, measures, transition densities and graph structure. Theoretical performance guarantees are obtained on beating the curse of dimensionality (COD), algorithm convergence, and likelihood maximization. Experiments on a highly nonlinear and non-Gaussian spatiotemporal model for measles transmission reveal that the iterated ensemble Kalman filter algorithm (Li et al. (2020)) is ineffective and the iterated filtering algorithm (Ionides et al. (2015)) suffers from the COD, while our IBPF algorithm beats COD consistently across various experiments with different metrics.

ML-116-标题: Semi-supervised physics guided DL framework for predicting the I-V characteristics of GAN HEMT

Abstract: This letter proposes a novel deep learning framework (DLF) that addresses two major hurdles in the adoption of deep learning techniques for solving physics-based problems: 1) requirement of the large dataset for training the DL model, 2) consistency of the DL model with the physics of the phenomenon. The framework is generic in nature and can be applied to model a phenomenon from other fields of research too as long as its behaviour is known. To demonstrate the technique, a semi-supervised physics guided neural network (SPGNN) has been developed that predicts I-V characteristics of a gallium nitride-based high electron mobility transistor (GaN HEMT). A two-stage training method is proposed, where in the first stage, the DL model is trained via the unsupervised learning method using the I-V equations of a field-effect transistor as a loss function of the model that incorporates physical behaviors in the DL model and in the second stage, the DL model has been fine-tuned with a very small set of experimental data. The SPGNN significantly reduces the requirement of the training data by more than 80% for achieving similar or better performance than a traditional neural network (TNN) even for unseen conditions. The SPGNN predicts 32.4% of the unseen test data with less than 1% of error and only 0.4% of the unseen test data with more than 10% of error.

ML-117-标题: Learning quantum dynamics with latent neural ODEs

Abstract: The core objective of machine-assisted scientific discovery is to learn physical laws from experimental data without prior knowledge of the systems in question. In the area of quantum physics, making progress towards these goals is significantly more challenging due to the curse of dimensionality as well as the counter-intuitive nature of quantum mechanics. Here, we present the QNODE, a latent neural ODE trained on dynamics from closed and open quantum systems. The QNODE can learn to generate quantum dynamics and extrapolate outside of its training region that satisfy the von Neumann and time-local Lindblad master equations for closed and open quantum systems. Furthermore the QNODE rediscovers quantum mechanical laws such as Heisenberg’s uncertainty principle in a totally data-driven way, without constraints or guidance. Additionally, we show that trajectories that are generated from the QNODE and are close in its latent space have similar quantum dynamics while preserving the physics of the training system.

ML-118-标题: Stochastic Learning Rate Optimization in the Stochastic Approximation and Online Learning Settings

Abstract: In this work, multiplicative stochasticity is applied to the learning rate of stochastic optimization algorithms, giving rise to stochastic learning-rate schemes. In-expectation theoretical convergence results of Stochastic Gradient Descent equipped with this novel stochastic learning rate scheme under the stochastic setting, as well as convergence results under the online optimization settings are provided. Empirical results consider the case of an adaptively uniformly distributed multiplicative stochasticity and include not only Stochastic Gradient Descent, but also other popular algorithms equipped with a stochastic learning rate. They demonstrate noticeable optimization performance gains, with respect to their deterministic-learning-rate versions.

ML-119-标题: Predicting Tau Accumulation in Cerebral Cortex with Multivariate MRI Morphometry Measurements Sparse Coding and Correntropy

Abstract: Biomarker-assisted diagnosis and intervention in Alzheimer’s disease (AD) may be the key to prevention breakthroughs. One of the hallmarks of AD is the accumulation of tau plaques in the human brain. However, current methods to detect tau pathology are either invasive (lumbar puncture) or quite costly and not widely available (Tau PET). In our previous work, structural MRI-based hippocampal multivariate morphometry statistics (MMS) showed superior performance as an effective neurodegenerative biomarker for preclinical AD and Patch Analysis-based Surface Correntropy-induced Sparse coding and max-pooling (PASCS-MP) has excellent ability to generate low-dimensional representations with strong statistical power for brain amyloid prediction. In this work, we apply this framework together with ridge regression models to predict Tau deposition in Braak12 and Braak34 brain regions separately. We evaluate our framework on 925 subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Each subject has one pair consisting of a PET image and MRI scan which were collected at about the same times. Experimental results suggest that the representations from our MMS and PASCS-MP have stronger predictive power and their predicted Braak12 and Braak34 are closer to the real values compared to the measures derived from other approaches such as hippocampal surface area and volume, and shape morphometry features based on spherical harmonics (SPHARM).

ML-120-标题: Online non-parametric change-point detection for heterogeneous data streams observed over graph nodes

Abstract: Consider a heterogeneous data stream being generated by the nodes of a graph. The data stream is in essence composed by multiple streams, possibly of different nature that depends on each node. At a given moment $\tau$, a change-point occurs for a subset of nodes $C$, signifying the change in the probability distribution of their associated streams. In this paper we propose an online non-parametric method to infer $\tau$ based on the direct estimation of the likelihood-ratio between the post-change and the pre-change distribution associated with the data stream of each node. We propose a kernel-based method, under the hypothesis that connected nodes of the graph are expected to have similar likelihood-ratio estimates when there is no change-point. We demonstrate the quality of our method on synthetic experiments and real-world applications.

ML-121-标题: Feedback Linearization of Car Dynamics for Racing via Reinforcement Learning

Abstract: Through the method of Learning Feedback Linearization, we seek to learn a linearizing controller to simplify the process of controlling a car to race autonomously. A soft actor-critic approach is used to learn a decoupling matrix and drift vector that effectively correct for errors in a hand-designed linearizing controller. The result is an exactly linearizing controller that can be used to enable the well-developed theory of linear systems to design path planning and tracking schemes that are easy to implement and significantly less computationally demanding. To demonstrate the method of feedback linearization, it is first used to learn a simulated model whose exact structure is known, but varied from the initial controller, so as to introduce error. We further seek to apply this method to a system that introduces even more error in the form of a gym environment specifically designed for modeling the dynamics of car racing. To do so, we posit an extension to the method of learning feedback linearization; a neural network that is trained using supervised learning to convert the output of our linearizing controller to the required input for the racing environment. Our progress towards these goals is reported and the next steps in their accomplishment are discussed.

ML-122-标题: Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process

Abstract: The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its utilities/costs. A new primal-dual approach is proposed with a novel integration of three ingredients: entropy regularized policy optimizer, dual variable regularizer, and Nesterov’s accelerated gradient descent dual optimizer, all of which are critical to achieve a faster convergence. The finite-time error bound of the proposed approach is characterized. Despite the challenge of the nonconcave objective subject to nonconcave constraints, the proposed approach is shown to converge to the global optimum with a complexity of $\tilde{\mathcal O}(1/\epsilon)$ in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approach by a factor of $\mathcal O(1/\epsilon)$ \citep{ding2020natural,paternain2019constrained}. This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal O(1/\epsilon)$ for convex optimization subject to convex constraints. Our primal-dual approach and non-asymptotic analysis are agnostic to the RL optimizer used, and thus are more flexible for practical applications. More generally, our approach also serves as the first algorithm that provably accelerates constrained nonconvex optimization with zero duality gap by exploiting the geometries such as the gradient dominance condition, for which the existing acceleration methods for constrained convex optimization are not applicable.

ML-123-标题: Computational Graph Completion

Abstract: We introduce a framework for generating, organizing, and reasoning with computational knowledge. It is motivated by the observation that most problems in Computational Sciences and Engineering (CSE) can be described as that of completing (from data) a computational graph representing dependencies between functions and variables. Functions and variables may be known, unknown, or random. Data comes in the form of observations of distinct values of a finite number of subsets of the variables of the graph. The underlying problem combines a regression problem (approximating unknown functions) with a matrix completion problem (recovering unobserved variables in the data). Replacing unknown functions by Gaussian Processes (GPs) and conditioning on observed data provides a simple but efficient approach to completing such graphs. Since the proposed framework is highly expressive, it has a vast potential application scope. Since the completion process can be automatized, as one solves $\sqrt{\sqrt{2}+\sqrt{3}}$ on a pocket calculator without thinking about it, one could, with the proposed framework, solve a complex CSE problem by drawing a diagram. Compared to traditional kriging, the proposed framework can be used to recover unknown functions with much scarcer data by exploiting interdependencies between multiple functions and variables. The Computational Graph Completion (CGC) problem addressed by the proposed framework could therefore also be interpreted as a generalization of that of solving linear systems of equations to that of approximating unknown variables and functions with noisy, incomplete, and nonlinear dependencies. Numerous examples illustrate the flexibility, scope, efficacy, and robustness of the CGC framework and show how it can be used as a pathway to identifying simple solutions to classical CSE problems (digital twin modeling, dimension reduction, mode decomposition, etc.).

CV-62-标题: HALP: Hardware-Aware Latency Pruning

Abstract: Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates structural pruning as a global resource allocation optimization problem, aiming at maximizing the accuracy while constraining latency under a predefined budget. For filter importance ranking, HALP leverages latency lookup table to track latency reduction potential and global saliency score to gauge accuracy drop. Both metrics can be evaluated very efficiently during pruning, allowing us to reformulate global structural pruning under a reward maximization problem given target constraint. This makes the problem solvable via our augmented knapsack solver, enabling HALP to surpass prior work in pruning efficacy and accuracy-efficiency trade-off. We examine HALP on both classification and detection tasks, over varying networks, on ImageNet and VOC datasets. In particular, for ResNet-50/-101 pruning on ImageNet, HALP improves network throughput by $1.60\times$/$1.90\times$ with $+0.3\%$/$-0.2\%$ top-1 accuracy changes, respectively. For SSD pruning on VOC, HALP improves throughput by $1.94\times$ with only a $0.56$ mAP drop. HALP consistently outperforms prior art, sometimes by large margins.

CV-68-标题: Self-Supervision and Spatial-Sequential Attention Based Loss for Multi-Person Pose Estimation

Abstract: Bottom-up based multi-person pose estimation approaches use heatmaps with auxiliary predictions to estimate joint positions and belonging at one time. Recently, various combinations between auxiliary predictions and heatmaps have been proposed for higher performance, these predictions are supervised by the corresponding L2 loss function directly. However, the lack of more explicit supervision results in low features utilization and contradictions between predictions in one model. To solve these problems, this paper proposes (i) a new loss organization method which uses self-supervised heatmaps to reduce prediction contradictions and spatial-sequential attention to enhance networks’ features extraction; (ii) a new combination of predictions composed by heatmaps, Part Affinity Fields (PAFs) and our block-inside offsets to fix pixel-level joints positions and further demonstrates the effectiveness of proposed loss function. Experiments are conducted on the MS COCO keypoint dataset and adopting OpenPose as the baseline model. Our method outperforms the baseline overall. On the COCO verification dataset, the mAP of OpenPose trained with our proposals outperforms the OpenPose baseline by over 5.5%.

CV-69-标题: Semi-supervised Domain Adaptation for Semantic Segmentation

Abstract: Deep learning approaches for semantic segmentation rely primarily on supervised learning approaches and require substantial efforts in producing pixel-level annotations. Further, such approaches may perform poorly when applied to unseen image domains. To cope with these limitations, both unsupervised domain adaptation (UDA) with full source supervision but without target supervision and semi-supervised learning (SSL) with partial supervision have been proposed. While such methods are effective at aligning different feature distributions, there is still a need to efficiently exploit unlabeled data to address the performance gap with respect to fully-supervised methods. In this paper we address semi-supervised domain adaptation (SSDA) for semantic segmentation, where a large amount of labeled source data as well as a small amount of labeled target data are available. We propose a novel and effective two-step semi-supervised dual-domain adaptation (SSDDA) approach to address both cross- and intra-domain gaps in semantic segmentation. The proposed framework is comprised of two mixing modules. First, we conduct a cross-domain adaptation via an image-level mixing strategy, which learns to align the distribution shift of features between the source data and target data. Second, intra-domain adaptation is achieved using a separate student-teacher network which is built to generate category-level data augmentation by mixing unlabeled target data in a way that respects predicted object boundaries. We demonstrate that the proposed approach outperforms state-of-the-art methods on two common synthetic-to-real semantic segmentation benchmarks. An extensive ablation study is provided to further validate the effectiveness of our approach.

CV-70-标题: Video Instance Segmentation by Instance Flow Assembly

Abstract: Instance segmentation is a challenging task aiming at classifying and segmenting all object instances of specific classes. While two-stage box-based methods achieve top performances in the image domain, they cannot easily extend their superiority into the video domain. This is because they usually deal with features or images cropped from the detected bounding boxes without alignment, failing to capture pixel-level temporal consistency. We embrace the observation that bottom-up methods dealing with box-free features could offer accurate spacial correlations across frames, which can be fully utilized for object and pixel level tracking. We first propose our bottom-up framework equipped with a temporal context fusion module to better encode inter-frame correlations. Intra-frame cues for semantic segmentation and object localization are simultaneously extracted and reconstructed by corresponding decoders after a shared backbone. For efficient and robust tracking among instances, we introduce an instance-level correspondence across adjacent frames, which is represented by a center-to-center flow, termed as instance flow, to assemble messy dense temporal correspondences. Experiments demonstrate that the proposed method outperforms the state-of-the-art online methods (taking image-level input) on the challenging Youtube-VIS dataset.

CV-71-标题: Look at What Im Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

Abstract: We introduce the task of spatially localizing narrated interactions in videos. Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations. To achieve this goal, we propose a multilayer cross-modal attention network that enables effective optimization of a contrastive loss during training. We introduce a divided strategy that alternates between computing inter- and intra-modal attention across the visual and natural language modalities, which allows effective training via directly contrasting the two modalities’ representations. We demonstrate the effectiveness of our approach by self-training on the HowTo100M instructional video dataset and evaluating on a newly collected dataset of localized described interactions in the YouCook2 dataset. We show that our approach outperforms alternative baselines, including shallow co-attention and full cross-modal attention. We also apply our approach to grounding phrases in images with weak supervision on Flickr30K and show that stacking multiple attention layers is effective and, when combined with a word-to-region loss, achieves state of the art on recall-at-one and pointing hand accuracies.

CV-72-标题: A Learning Framework for Diffeomorphic Image Registration based on Quasi-conformal Geometry

Abstract: Image registration, the process of defining meaningful correspondences between images, is essential for various image analysis tasks, especially medical imaging. Numerous learning-based methods, notably convolutional neural networks (CNNs), for deformable image registration proposed in recent years have demonstrated the feasibility and superiority of deep learning techniques for registration problems. Besides, compared to traditional algorithms’ optimization scheme of the objective function for each image pair, learning-based algorithms are several orders of magnitude faster. However, these data-driven methods without proper constraint on the deformation field will easily lead to topological foldings. To tackle this problem, We propose the quasi-conformal registration network (QCRegNet), an unsupervised learning framework, to obtain diffeomorphic 2D image registrations with large deformations based on quasi-conformal (QC) map, an orientation-preserving homeomorphism between two manifolds. The basic idea is to design a CNN mapping image pairs to deformation fields. QCRegNet consists of the estimator network and the Beltrami solver network (BSNet). The estimator network takes image pair as input and outputs the Beltrami coefficient (BC). The BC, which captures conformal distortion of a QC map and guarantees the bijectivity, will then be input to the BSNet, a task-independent network which reconstructs the desired QC map. Furthermore, we reduce the number of network parameters and computational complexity by utilizing Fourier approximation to compress BC. Experiments have been carried out on different data such as underwater and medical images. Registration results show that the registration accuracy is comparable to state-of-the-art methods and diffeomorphism is to a great extent guaranteed compared to other diffeomorphic registration algorithms.

CV-73-标题: Inference Graphs for CNN Interpretation

Abstract: Convolutional neural networks (CNNs) have achieved superior accuracy in many visual related tasks. However, the inference process through intermediate layers is opaque, making it difficult to interpret such networks or develop trust in their operation. We propose to model the network hidden layers activity using probabilistic models. The activity patterns in layers of interest are modeled as Gaussian mixture models, and transition probabilities between clusters in consecutive modeled layers are estimated. Based on maximum-likelihood considerations, nodes and paths relevant for network prediction are chosen, connected, and visualized as an inference graph. We show that such graphs are useful for understanding the general inference process of a class, as well as explaining decisions the network makes regarding specific images.

CV-74-标题: Fingerprint recognition with embedded presentation attacks detection: are we ready?

CV-75-标题: Robust Monocular Localization in Sparse HD Maps Leveraging Multi-Task Uncertainty Estimation

Abstract: Robust localization in dense urban scenarios using a low-cost sensor setup and sparse HD maps is highly relevant for the current advances in autonomous driving, but remains a challenging topic in research. We present a novel monocular localization approach based on a sliding-window pose graph that leverages predicted uncertainties for increased precision and robustness against challenging scenarios and per frame failures. To this end, we propose an efficient multi-task uncertainty-aware perception module, which covers semantic segmentation, as well as bounding box detection, to enable the localization of vehicles in sparse maps, containing only lane borders and traffic lights. Further, we design differentiable cost maps that are directly generated from the estimated uncertainties. This opens up the possibility to minimize the reprojection loss of amorphous map elements in an association free and uncertainty-aware manner. Extensive evaluation on the Lyft 5 dataset shows that, despite the sparsity of the map, our approach enables robust and accurate 6D localization in challenging urban scenarios

CV-76-标题: Few-Shot Temporal Action Localization with Query Adaptive Transformer

Abstract: Existing temporal action localization (TAL) works rely on a large number of training videos with exhaustive segment-level annotation, preventing them from scaling to new classes. As a solution to this problem, few-shot TAL (FS-TAL) aims to adapt a model to a new class represented by as few as a single video. Exiting FS-TAL methods assume trimmed training videos for new classes. However, this setting is not only unnatural actions are typically captured in untrimmed videos, but also ignores background video segments containing vital contextual cues for foreground action segmentation. In this work, we first propose a new FS-TAL setting by proposing to use untrimmed training videos. Further, a novel FS-TAL model is proposed which maximizes the knowledge transfer from training classes whilst enabling the model to be dynamically adapted to both the new class and each video of that class simultaneously. This is achieved by introducing a query adaptive Transformer in the model. Extensive experiments on two action localization benchmarks demonstrate that our method can outperform all the state of the art alternatives significantly in both single-domain and cross-domain scenarios. The source code can be found in this https URL

CV-77-标题: Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation

Abstract: Single image reflection separation (SIRS), as a representative blind source separation task, aims to recover two layers, $\textit{i.e.}$, transmission and reflection, from one mixed observation, which is challenging due to the highly ill-posed nature. Existing deep learning based solutions typically restore the target layers individually, or with some concerns at the end of the output, barely taking into account the interaction across the two streams/branches. In order to utilize information more efficiently, this work presents a general yet simple interactive strategy, namely $\textit{your trash is my treasure}$ (YTMT), for constructing dual-stream decomposition networks. To be specific, we explicitly enforce the two streams to communicate with each other block-wisely. Inspired by the additive property between the two components, the interactive path can be easily built via transferring, instead of discarding, deactivated information by the ReLU rectifier from one stream to the other. Both ablation studies and experimental results on widely-used SIRS datasets are conducted to demonstrate the efficacy of YTMT, and reveal its superiority over other state-of-the-art alternatives. The implementation is quite simple and our code is publicly available at $\href{this https URL}{\textit{this https URL}}$.

CV-78-标题: Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning

Abstract: Access to 3D point cloud representations has been widely facilitated by LiDAR sensors embedded in various mobile devices. This has led to an emerging need for fast and accurate point cloud processing techniques. In this paper, we revisit and dive deeper into PointNet++, one of the most influential yet under-explored networks, and develop faster and more accurate variants of the model. We first present a novel Separable Set Abstraction (SA) module that disentangles the vanilla SA module used in PointNet++ into two separate learning stages: (1) learning channel correlation and (2) learning spatial correlation. The Separable SA module is significantly faster than the vanilla version, yet it achieves comparable performance. We then introduce a new Anisotropic Reduction function into our Separable SA module and propose an Anisotropic Separable SA (ASSA) module that substantially increases the network’s accuracy. We later replace the vanilla SA modules in PointNet++ with the proposed ASSA module, and denote the modified network as ASSANet. Extensive experiments on point cloud classification, semantic segmentation, and part segmentation show that ASSANet outperforms PointNet++ and other methods, achieving much higher accuracy and faster speeds. In particular, ASSANet outperforms PointNet++ by $7.4$ mIoU on S3DIS Area 5, while maintaining $1.6 \times$ faster inference speed on a single NVIDIA 2080Ti GPU. Our scaled ASSANet variant achieves $66.8$ mIoU and outperforms KPConv, while being more than $54 \times$ faster.

CV-79-标题: Improving Model Generalization by Agreement of Learned Representations from Data Augmentation

Abstract: Data augmentation reduces the generalization error by forcing a model to learn invariant representations given different transformations of the input image. In computer vision, on top of the standard image processing functions, data augmentation techniques based on regional dropout such as CutOut, MixUp, and CutMix and policy-based selection such as AutoAugment demonstrated state-of-the-art (SOTA) results. With an increasing number of data augmentation algorithms being proposed, the focus is always on optimizing the input-output mapping while not realizing that there might be an untapped value in the transformed images with the same label. We hypothesize that by forcing the representations of two transformations to agree, we can further reduce the model generalization error. We call our proposed method Agreement Maximization or simply AgMax. With this simple constraint applied during training, empirical results show that data augmentation algorithms can further improve the classification accuracy of ResNet50 on ImageNet by up to 1.5%, WideResNet40-2 on CIFAR10 by up to 0.7%, WideResNet40-2 on CIFAR100 by up to 1.6%, and LeNet5 on Speech Commands Dataset by up to 1.4%. Experimental results further show that unlike other regularization terms such as label smoothing, AgMax can take advantage of the data augmentation to consistently improve model generalization by a significant margin. On downstream tasks such as object detection and segmentation on PascalVOC and COCO, AgMax pre-trained models outperforms other data augmentation methods by as much as 1.0mAP (box) and 0.5mAP (mask). Code is available at this https URL.

CV-80-标题: AniFormer: Data-driven 3D Animation with Transformer

Abstract: We present a novel task, i.e., animating a target 3D object through the motion of a raw driving sequence. In previous works, extra auxiliary correlations between source and target meshes or intermedia factors are inevitable to capture the motions in the driving sequences. Instead, we introduce AniFormer, a novel Transformer-based architecture, that generates animated 3D sequences by directly taking the raw driving sequences and arbitrary same-type target meshes as inputs. Specifically, we customize the Transformer architecture for 3D animation that generates mesh sequences by integrating styles from target meshes and motions from the driving meshes. Besides, instead of the conventional single regression head in the vanilla Transformer, AniFormer generates multiple frames as outputs to preserve the sequential consistency of the generated meshes. To achieve this, we carefully design a pair of regression constraints, i.e., motion and appearance constraints, that can provide strong regularization on the generated mesh sequences. Our AniFormer achieves high-fidelity, realistic, temporally coherent animated results and outperforms compared start-of-the-art methods on benchmarks of diverse categories. Code is available: this https URL.

CV-81-标题: Detecting and Identifying Optical Signal Attacks on Autonomous Driving Systems

Abstract: For autonomous driving, an essential task is to detect surrounding objects accurately. To this end, most existing systems use optical devices, including cameras and light detection and ranging (LiDAR) sensors, to collect environment data in real time. In recent years, many researchers have developed advanced machine learning models to detect surrounding objects. Nevertheless, the aforementioned optical devices are vulnerable to optical signal attacks, which could compromise the accuracy of object detection. To address this critical issue, we propose a framework to detect and identify sensors that are under attack. Specifically, we first develop a new technique to detect attacks on a system that consists of three sensors. Our main idea is to: 1) use data from three sensors to obtain two versions of depth maps (i.e., disparity) and 2) detect attacks by analyzing the distribution of disparity errors. In our study, we use real data sets and the state-of-the-art machine learning model to evaluate our attack detection scheme and the results confirm the effectiveness of our detection method. Based on the detection scheme, we further develop an identification model that is capable of identifying up to n-2 attacked sensors in a system with one LiDAR and n cameras. We prove the correctness of our identification scheme and conduct experiments to show the accuracy of our identification method. Finally, we investigate the overall sensitivity of our framework.

CV-82-标题: Event Guided Depth Sensing

Abstract: Active depth sensors like structured light, lidar, and time-of-flight systems sample the depth of the entire scene uniformly at a fixed scan rate. This leads to limited spatio-temporal resolution where redundant static information is over-sampled and precious motion information might be under-sampled. In this paper, we present an efficient bio-inspired event-camera-driven depth estimation algorithm. In our approach, we dynamically illuminate areas of interest densely, depending on the scene activity detected by the event camera, and sparsely illuminate areas in the field of view with no motion. The depth estimation is achieved by an event-based structured light system consisting of a laser point projector coupled with a second event-based sensor tuned to detect the reflection of the laser from the scene. We show the feasibility of our approach in a simulated autonomous driving scenario and real indoor sequences using our prototype. We show that, in natural scenes like autonomous driving and indoor environments, moving edges correspond to less than 10% of the scene on average. Thus our setup requires the sensor to scan only 10% of the scene, which could lead to almost 90% less power consumption by the illumination source. While we present the evaluation and proof-of-concept for an event-based structured-light system, the ideas presented here are applicable for a wide range of depth-sensing modalities like LIDAR, time-of-flight, and standard stereo.

CV-83-标题: STALP: Style Transfer with Auxiliary Limited Pairing

Abstract: We present an approach to example-based stylization of images that uses a single pair of a source image and its stylized counterpart. We demonstrate how to train an image translation network that can perform real-time semantically meaningful style transfer to a set of target images with similar content as the source image. A key added value of our approach is that it considers also consistency of target images during training. Although those have no stylized counterparts, we constrain the translation to keep the statistics of neural responses compatible with those extracted from the stylized source. In contrast to concurrent techniques that use a similar input, our approach better preserves important visual characteristics of the source style and can deliver temporally stable results without the need to explicitly handle temporal consistency. We demonstrate its practical utility on various applications including video stylization, style transfer to panoramas, faces, and 3D models.

CV-84-标题: Deep Point Cloud Normal Estimation via Triplet Learning

Abstract: Normal estimation on 3D point clouds is a fundamental problem in 3D vision and graphics. Current methods often show limited accuracy in predicting normals at sharp features (e.g., edges and corners) and less robustness to noise. In this paper, we propose a novel normal estimation method for point clouds. It consists of two phases: (a) feature encoding which learns representations of local patches, and (b) normal estimation that takes the learned representation as input and regresses the normal vector. We are motivated that local patches on isotropic and anisotropic surfaces have similar or distinct normals, and that separable features or representations can be learned to facilitate normal estimation. To realise this, we first construct triplets of local patches on 3D point cloud data, and design a triplet network with a triplet loss for feature encoding. We then design a simple network with several MLPs and a loss function to regress the normal vector. Despite having a smaller network size compared to most other methods, experimental results show that our method preserves sharp features and achieves better normal estimation results on CAD-like shapes.

CV-85-标题: Unified Style Transfer

Abstract: Currently, it is hard to compare and evaluate different style transfer algorithms due to chaotic definitions of style and the absence of agreed objective validation methods in the study of style transfer. In this paper, a novel approach, the Unified Style Transfer (UST) model, is proposed. With the introduction of a generative model for internal style representation, UST can transfer images in two approaches, i.e., Domain-based and Image-based, simultaneously. At the same time, a new philosophy based on the human sense of art and style distributions for evaluating the transfer model is presented and demonstrated, called Statistical Style Analysis. It provides a new path to validate style transfer models’ feasibility by validating the general consistency between internal style representation and art facts. Besides, the translation-invariance of AdaIN features is also discussed.

CV-86-标题: Noisy Annotation Refinement for Object Detection

Abstract: Supervised training of object detectors requires well-annotated large-scale datasets, whose production is costly. Therefore, some efforts have been made to obtain annotations in economical ways, such as cloud sourcing. However, datasets obtained by these methods tend to contain noisy annotations such as inaccurate bounding boxes and incorrect class labels. In this study, we propose a new problem setting of training object detectors on datasets with entangled noises of annotations of class labels and bounding boxes. Our proposed method efficiently decouples the entangled noises, corrects the noisy annotations, and subsequently trains the detector using the corrected annotations. We verified the effectiveness of our proposed method and compared it with the baseline on noisy datasets with different noise levels. The experimental results show that our proposed method significantly outperforms the baseline.

CV-87-标题: Moiré Attack (MA): A New Potential Risk of Screen Photos

Abstract: Images, captured by a camera, play a critical role in training Deep Neural Networks (DNNs). Usually, we assume the images acquired by cameras are consistent with the ones perceived by human eyes. However, due to the different physical mechanisms between human-vision and computer-vision systems, the final perceived images could be very different in some cases, for example shooting on digital monitors. In this paper, we find a special phenomenon in digital image processing, the moiré effect, that could cause unnoticed security threats to DNNs. Based on it, we propose a Moiré Attack (MA) that generates the physical-world moiré pattern adding to the images by mimicking the shooting process of digital devices. Extensive experiments demonstrate that our proposed digital Moiré Attack (MA) is a perfect camouflage for attackers to tamper with DNNs with a high success rate ($100.0\%$ for untargeted and $97.0\%$ for targeted attack with the noise budget $\epsilon=4$), high transferability rate across different models, and high robustness under various defenses. Furthermore, MA owns great stealthiness because the moiré effect is unavoidable due to the camera’s inner physical structure, which therefore hardly attracts the awareness of humans. Our code is available at this https URL.

CV-88-标题: A unifying framework for n-dimensional quasi-conformal mappings

Abstract: With the advancement of computer technology, there is a surge of interest in effective mapping methods for objects in higher-dimensional spaces. To establish a one-to-one correspondence between objects, higher-dimensional quasi-conformal theory can be utilized for ensuring the bijectivity of the mappings. In addition, it is often desirable for the mappings to satisfy certain prescribed geometric constraints and possess low distortion in conformality or volume. In this work, we develop a unifying framework for computing $n$-dimensional quasi-conformal mappings. More specifically, we propose a variational model that integrates quasi-conformal distortion, volumetric distortion, landmark correspondence, intensity mismatch and volume prior information to handle a large variety of deformation problems. We further prove the existence of a minimizer for the proposed model and devise efficient numerical methods to solve the optimization problem. We demonstrate the effectiveness of the proposed framework using various experiments in two- and three-dimensions, with applications to medical image registration, adaptive remeshing and shape modeling.

CV-89-标题: Depth360: Monocular Depth Estimation using Learnable Axisymmetric Camera Model for Spherical Camera Image

Abstract: Self-supervised monocular depth estimation has been widely investigated to estimate depth images and relative poses from RGB images. This framework is attractive for researchers because the depth and pose networks can be trained from just time sequence images without the need for the ground truth depth and poses. In this work, we estimate the depth around a robot (360 degree view) using time sequence spherical camera images, from a camera whose parameters are unknown. We propose a learnable axisymmetric camera model which accepts distorted spherical camera images with two fisheye camera images. In addition, we trained our models with a photo-realistic simulator to generate ground truth depth images to provide supervision. Moreover, we introduced loss functions to provide floor constraints to reduce artifacts that can result from reflective floor surfaces. We demonstrate the efficacy of our method using the spherical camera images from the GO Stanford dataset and pinhole camera images from the KITTI dataset to compare our method’s performance with that of baseline method in learning the camera parameters.

CV-90-标题: ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

Abstract: Recent approaches for end-to-end text spotting have achieved promising results. However, most of the current spotters were plagued by the inconsistency problem between text detection and recognition. In this work, we introduce and prove the existence of the inconsistency problem and analyze it from two aspects: (1) inconsistency of text recognition features between training and testing, and (2) inconsistency of optimization targets between text detection and recognition. To solve the aforementioned issues, we propose a differentiable Auto-Rectification Module (ARM) together with a new training strategy to enable propagating recognition loss back into detection branch, so that our detection branch can be jointly optimized by detection and recognition targets, which largely alleviates the inconsistency problem between text detection and recognition. Based on these designs, we present a simple yet robust end-to-end text spotting framework, termed Auto-Rectification Text Spotter (ARTS), to detect and recognize arbitrarily-shaped text in natural scenes. Extensive experiments demonstrate the superiority of our method. In particular, our ARTS-S achieves 77.1% end-to-end text spotting F-measure on Total-Text at a competitive speed of 10.5 FPS, which significantly outperforms previous methods in both accuracy and inference speed.

CV-91-标题: 3DFaceFill: An Analysis-By-Synthesis Approach to Face Completion

Abstract: Existing face completion solutions are primarily driven by end-to-end models that directly generate 2D completions of 2D masked faces. By having to implicitly account for geometric and photometric variations in facial shape and appearance, such approaches result in unrealistic completions, especially under large variations in pose, shape, illumination and mask sizes. To alleviate these limitations, we introduce 3DFaceFill, an analysis-by-synthesis approach for face completion that explicitly considers the image formation process. It comprises three components, (1) an encoder that disentangles the face into its constituent 3D mesh, 3D pose, illumination and albedo factors, (2) an autoencoder that inpaints the UV representation of facial albedo, and (3) a renderer that resynthesizes the completed face. By operating on the UV representation, 3DFaceFill affords the power of correspondence and allows us to naturally enforce geometrical priors (e.g. facial symmetry) more effectively. Quantitatively, 3DFaceFill improves the state-of-the-art by up to 4dB higher PSNR and 25% better LPIPS for large masks. And, qualitatively, it leads to demonstrably more photorealistic face completions over a range of masks and occlusions while preserving consistency in global and component-wise shape, pose, illumination and eye-gaze.

CV-92-标题: Does Data Repair Lead to Fair Models? Curating Contextually Fair Data To Reduce Model Bias

Abstract: Contextual information is a valuable cue for Deep Neural Networks (DNNs) to learn better representations and improve accuracy. However, co-occurrence bias in the training dataset may hamper a DNN model’s generalizability to unseen scenarios in the real world. For example, in COCO, many object categories have a much higher co-occurrence with men compared to women, which can bias a DNN’s prediction in favor of men. Recent works have focused on task-specific training strategies to handle bias in such scenarios, but fixing the available data is often ignored. In this paper, we propose a novel and more generic solution to address the contextual bias in the datasets by selecting a subset of the samples, which is fair in terms of the co-occurrence with various classes for a protected attribute. We introduce a data repair algorithm using the coefficient of variation, which can curate fair and contextually balanced data for a protected class(es). This helps in training a fair model irrespective of the task, architecture or training methodology. Our proposed solution is simple, effective, and can even be used in an active learning setting where the data labels are not present or being generated incrementally. We demonstrate the effectiveness of our algorithm for the task of object detection and multi-label image classification across different datasets. Through a series of experiments, we validate that curating contextually fair data helps make model predictions fair by balancing the true positive rate for the protected class across groups without compromising on the model’s overall performance.

CV-93-标题: Model Composition: Can Multiple Neural Networks Be Combined into a Single Network Using Only Unlabeled Data?

Abstract: The diversity of deep learning applications, datasets, and neural network architectures necessitates a careful selection of the architecture and data that match best to a target application. As an attempt to mitigate this dilemma, this paper investigates the idea of combining multiple trained neural networks using unlabeled data. In addition, combining multiple models into one can speed up the inference, result in stronger, more capable models, and allows us to select efficient device-friendly target network architectures. To this end, the proposed method makes use of generation, filtering, and aggregation of reliable pseudo-labels collected from unlabeled data. Our method supports using an arbitrary number of input models with arbitrary architectures and categories. Extensive performance evaluations demonstrated that our method is very effective. For example, for the task of object detection and without using any ground-truth labels, an EfficientDet-D0 trained on Pascal-VOC and an EfficientDet-D1 trained on COCO, can be combined to a RetinaNet-ResNet50 model, with a similar mAP as the supervised training. If fine-tuned in a semi-supervised setting, the combined model achieves +18.6%, +12.6%, and +8.1% mAP improvements over supervised training with 1%, 5%, and 10% of labels.

CV-94-标题: ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning

Abstract: Existing semi-supervised learning (SSL) algorithms typically assume class-balanced datasets, although the class distributions of many real-world datasets are imbalanced. In general, classifiers trained on a class-imbalanced dataset are biased toward the majority classes. This issue becomes more problematic for SSL algorithms because they utilize the biased prediction of unlabeled data for training. However, traditional class-imbalanced learning techniques, which are designed for labeled data, cannot be readily combined with SSL algorithms. We propose a scalable class-imbalanced SSL algorithm that can effectively use unlabeled data, while mitigating class imbalance by introducing an auxiliary balanced classifier (ABC) of a single layer, which is attached to a representation layer of an existing SSL algorithm. The ABC is trained with a class-balanced loss of a minibatch, while using high-quality representations learned from all data points in the minibatch using the backbone SSL algorithm to avoid overfitting and information loss.Moreover, we use consistency regularization, a recent SSL technique for utilizing unlabeled data in a modified way, to train the ABC to be balanced among the classes by selecting unlabeled data with the same probability for each class. The proposed algorithm achieves state-of-the-art performance in various class-imbalanced SSL experiments using four benchmark datasets.

CV-95-标题: Repaint: Improving the Generalization of Down-Stream Visual Tasks by Generating Multiple Instances of Training Examples

Abstract: Convolutional Neural Networks (CNNs) for visual tasks are believed to learn both the low-level textures and high-level object attributes, throughout the network depth. This paper further investigates the texture bias' in CNNs. To this end, we regenerate multiple instances of training examples from each original image, through a process we call repainting’. The repainted examples preserve the shape and structure of the regions and objects within the scenes, but diversify their texture and color. Our method can regenerate a same image at different daylight, season, or weather conditions, can have colorization or de-colorization effects, or even bring back some texture information from blacked-out areas. The in-place repaint allows us to further use these repainted examples for improving the generalization of CNNs. Through an extensive set of experiments, we demonstrate the usefulness of the repainted examples in training, for the tasks of image classification (ImageNet) and object detection (COCO), over several state-of-the-art network architectures at different capacities, and across different data availability regimes.

CV-96-标题: NOD: Taking a Closer Look at Detection under Extreme Low-Light Conditions with Night Object Detection Dataset

Abstract: Recent work indicates that, besides being a challenge in producing perceptually pleasing images, low light proves more difficult for machine cognition than previously thought. In our work, we take a closer look at object detection in low light. First, to support the development and evaluation of new methods in this domain, we present a high-quality large-scale Night Object Detection (NOD) dataset showing dynamic scenes captured on the streets at night. Next, we directly link the lighting conditions to perceptual difficulty and identify what makes low light problematic for machine cognition. Accordingly, we provide instance-level annotation for a subset of the dataset for an in-depth evaluation of future methods. We also present an analysis of the baseline model performance to highlight opportunities for future research and show that low light is a non-trivial problem that requires special attention from the researchers. Further, to address the issues caused by low light, we propose to incorporate an image enhancement module into the object detection framework and two novel data augmentation techniques. Our image enhancement module is trained under the guidance of the object detector to learn image representation optimal for machine cognition rather than for the human visual system. Finally, experimental results confirm that the proposed method shows consistent improvement of the performance on low-light datasets.

CV-97-标题: Dynamic Multi-Person Mesh Recovery From Uncalibrated Multi-View Cameras

Abstract: Dynamic multi-person mesh recovery has been a hot topic in 3D vision recently. However, few works focus on the multi-person motion capture from uncalibrated cameras, which mainly faces two challenges: the one is that inter-person interactions and occlusions introduce inherent ambiguities for both camera calibration and motion capture; The other is that a lack of dense correspondences can be used to constrain sparse camera geometries in a dynamic multi-person scene. Our key idea is incorporating motion prior knowledge into simultaneous optimization of extrinsic camera parameters and human meshes from noisy human semantics. First, we introduce a physics-geometry consistency to reduce the low and high frequency noises of the detected human semantics. Then a novel latent motion prior is proposed to simultaneously optimize extrinsic camera parameters and coherent human motions from slightly noisy inputs. Experimental results show that accurate camera parameters and human motions can be obtained through one-stage optimization. The codes will be publicly available at~\url{this https URL}.

CV-98-标题: Detecting Backdoor Attacks Against Point Cloud Classifiers

Abstract: Backdoor attacks (BA) are an emerging threat to deep neural network classifiers. A classifier being attacked will predict to the attacker’s target class when a test sample from a source class is embedded with the backdoor pattern (BP). Recently, the first BA against point cloud (PC) classifiers was proposed, creating new threats to many important applications including autonomous driving. Such PC BAs are not detectable by existing BA defenses due to their special BP embedding mechanism. In this paper, we propose a reverse-engineering defense that infers whether a PC classifier is backdoor attacked, without access to its training set or to any clean classifiers for reference. The effectiveness of our defense is demonstrated on the benchmark ModeNet40 dataset for PCs.

CV-99-标题: Contextual Gradient Scaling for Few-Shot Learning

CV-100-标题: GTM: Gray Temporal Model for Video Recognition

Abstract: Data input modality plays an important role in video action recognition. Normally, there are three types of input: RGB, flow stream and compressed data. In this paper, we proposed a new input modality: gray stream. Specifically, taken the stacked consecutive 3 gray images as input, which is the same size of RGB, can not only skip the conversion process from video decoding data to RGB, but also improve the spatio-temporal modeling ability at zero computation and zero parameters. Meanwhile, we proposed a 1D Identity Channel-wise Spatio-temporal Convolution(1D-ICSC) which captures the temporal relationship at channel-feature level within a controllable computation budget(by parameters G & R). Finally, we confirm its effectiveness and efficiency on several action recognition benchmarks, such as Kinetics, Something-Something, HMDB-51 and UCF-101, and achieve impressive results.

CV-101-标题: EBJR: Energy-Based Joint Reasoning for Adaptive Inference

Abstract: State-of-the-art deep learning models have achieved significant performance levels on various benchmarks. However, the excellent performance comes at a cost of inefficient computational cost. Light-weight architectures, on the other hand, achieve moderate accuracies, but at a much more desirable latency. This paper presents a new method of jointly using the large accurate models together with the small fast ones. To this end, we propose an Energy-Based Joint Reasoning (EBJR) framework that adaptively distributes the samples between shallow and deep models to achieve an accuracy close to the deep model, but latency close to the shallow one. Our method is applicable to out-of-the-box pre-trained models as it does not require an architecture change nor re-training. Moreover, it is easy to use and deploy, especially for cloud services. Through a comprehensive set of experiments on different down-stream tasks, we show that our method outperforms strong state-of-the-art approaches with a considerable margin. In addition, we propose specialized EBJR, an extension of our method where we create a smaller specialized side model that performs the target task only partially, but yields an even higher accuracy and faster inference. We verify the strengths of our methods with both theoretical and experimental evaluations.

CV-102-标题: Simpler Does It: Generating Semantic Labels with Objectness Guidance

Abstract: Existing weakly or semi-supervised semantic segmentation methods utilize image or box-level supervision to generate pseudo-labels for weakly labeled images. However, due to the lack of strong supervision, the generated pseudo-labels are often noisy near the object boundaries, which severely impacts the network’s ability to learn strong representations. To address this problem, we present a novel framework that generates pseudo-labels for training images, which are then used to train a segmentation model. To generate pseudo-labels, we combine information from: (i) a class agnostic objectness network that learns to recognize object-like regions, and (ii) either image-level or bounding box annotations. We show the efficacy of our approach by demonstrating how the objectness network can naturally be leveraged to generate object-like regions for unseen categories. We then propose an end-to-end multi-task learning strategy, that jointly learns to segment semantics and objectness using the generated pseudo-labels. Extensive experiments demonstrate the high quality of our generated pseudo-labels and effectiveness of the proposed framework in a variety of domains. Our approach achieves better or competitive performance compared to existing weakly-supervised and semi-supervised methods.

CV-103-标题: Toward Accurate and Reliable Iris Segmentation Using Uncertainty Learning

Abstract: As an upstream task of iris recognition, iris segmentation plays a vital role in multiple subsequent tasks, including localization and matching. A slight bias in iris segmentation often results in obvious performance degradation of the iris recognition system. In the paper, we propose an Iris U-transformer (IrisUsformer) for accurate and reliable iris segmentation. For better accuracy, we elaborately design IrisUsformer by adopting position-sensitive operation and re-packaging transformer block to raise the spatial perception ability of the model. For better reliability, IrisUsformer utilizes an auxiliary head to distinguishes the high- and low-uncertainty regions of segmentation predictions and then adopts a weighting scheme to guide model optimization. Experimental results on three publicly available databases demonstrate that IrisUsformer achieves better segmentation accuracy using 35% MACs of the SOTA IrisParseNet. More importantly, our method estimates the uncertainty map corresponding to the segmentation prediction for subsequent processing in iris recognition systems.

CV-104-标题: Constrained Mean Shift for Representation Learning

Abstract: We are interested in representation learning from labeled or unlabeled data. Inspired by recent success of self-supervised learning (SSL), we develop a non-contrastive representation learning method that can exploit additional knowledge. This additional knowledge may come from annotated labels in the supervised setting or an SSL model from another modality in the SSL setting. Our main idea is to generalize the mean-shift algorithm by constraining the search space of nearest neighbors, resulting in semantically purer representations. Our method simply pulls the embedding of an instance closer to its nearest neighbors in a search space that is constrained using the additional knowledge. By leveraging this non-contrastive loss, we show that the supervised ImageNet-1k pretraining with our method results in better transfer performance as compared to the baselines. Further, we demonstrate that our method is relatively robust to label noise. Finally, we show that it is possible to use the noisy constraint across modalities to train self-supervised video models.

CV-105-标题: Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE

Abstract: Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution. This latent space distribution matching is a core component of WAE, and a challenging task. In this paper, we propose to use the contrastive learning framework that has been shown to be effective for self-supervised representation learning, as a means to resolve this problem. We do so by exploiting the fact that contrastive learning objectives optimize the latent space distribution to be uniform over the unit hyper-sphere, which can be easily sampled from. We show that using the contrastive learning framework to optimize the WAE loss achieves faster convergence and more stable optimization compared with existing popular algorithms for WAE. This is also reflected in the FID scores on CelebA and CIFAR-10 datasets, and the realistic generated image quality on the CelebA-HQ dataset.

CV-106-标题: Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles

Abstract: Pretraining convolutional neural networks via self-supervision, and applying them in transfer learning, is an incredibly fast-growing field that is rapidly and iteratively improving performance across practically all image domains. Meanwhile, model ensembling is one of the most universally applicable techniques in supervised learning literature and practice, offering a simple solution to reliably improve performance. But how to optimally combine self-supervised models to maximize representation quality has largely remained unaddressed. In this work, we provide a framework to perform self-supervised model ensembling via a novel method of learning representations directly through gradient descent at inference time. This technique improves representation quality, as measured by k-nearest neighbors, both on the in-domain dataset and in the transfer setting, with models transferable from the former setting to the latter. Additionally, this direct learning of feature through backpropagation improves representations from even a single model, echoing the improvements found in self-distillation.

CV-107-标题: On Coordinate Decoding for Keypoint Estimation Tasks

Abstract: A series of 2D (and 3D) keypoint estimation tasks are built upon heatmap coordinate representation, i.e. a probability map that allows for learnable and spatially aware encoding and decoding of keypoint coordinates on grids, even allowing for sub-pixel coordinate accuracy. In this report, we aim to reproduce the findings of DARK that investigated the 2D heatmap representation by highlighting the importance of the encoding of the ground truth heatmap and the decoding of the predicted heatmap to keypoint coordinates. The authors claim that a) a more principled distribution-aware coordinate decoding method overcomes the limitations of the standard techniques widely used in the literature, and b), that the reconstruction of heatmaps from ground-truth coordinates by generating accurate and continuous heatmap distributions lead to unbiased model training, contrary to the standard coordinate encoding process that quantizes the keypoint coordinates on the resolution of the input image grid.

CV-108-标题: Fine-Grained Control of Artistic Styles in Image Generation

Abstract: Recent advances in generative models and adversarial training have enabled artificially generating artworks in various artistic styles. It is highly desirable to gain more control over the generated style in practice. However, artistic styles are unlike object categories – there are a continuous spectrum of styles distinguished by subtle differences. Few works have been explored to capture the continuous spectrum of styles and apply it to a style generation task. In this paper, we propose to achieve this by embedding original artwork examples into a continuous style space. The style vectors are fed to the generator and discriminator to achieve fine-grained control. Our method can be used with common generative adversarial networks (such as StyleGAN). Experiments show that our method not only precisely controls the fine-grained artistic style but also improves image quality over vanilla StyleGAN as measured by FID.

CV-109-标题: Early- and in-season crop type mapping without current-year ground truth: generating labels from historical information via a topology-based approach

Abstract: Land cover classification in remote sensing is often faced with the challenge of limited ground truth. Incorporating historical information has the potential to significantly lower the expensive cost associated with collecting ground truth and, more importantly, enable early- and in-season mapping that is helpful to many pre-harvest decisions. In this study, we propose a new approach that can effectively transfer knowledge about the topology (i.e. relative position) of different crop types in the spectral feature space (e.g. the histogram of SWIR1 vs RDEG1 bands) to generate labels, thereby support crop classification in a different year. Importantly, our approach does not attempt to transfer classification decision boundaries that are susceptible to inter-annual variations of weather and management, but relies on the more robust and shift-invariant topology information. We tested this approach for mapping corn/soybeans in the US Midwest and paddy rice/corn/soybeans in Northeast China using Landsat-8 and Sentinel-2 data. Results show that our approach automatically generates high-quality labels for crops in the target year immediately after each image becomes available. Based on these generated labels from our approach, the subsequent crop type mapping using a random forest classifier reach the F1 score as high as 0.887 for corn as early as the silking stage and 0.851 for soybean as early as the flowering stage and the overall accuracy of 0.873 in Iowa. In Northeast China, F1 scores of paddy rice, corn and soybeans and the overall accuracy can exceed 0.85 two and half months ahead of harvest. Overall, these results highlight unique advantages of our approach in transferring historical knowledge and maximizing the timeliness of crop maps. Our approach supports a general paradigm shift towards learning transferrable and generalizable knowledge to facilitate land cover classification.

CV-110-标题: 1st Place Solution for the UVO Challenge on Image-based Open-World Segmentation 2021

Abstract: We describe our two-stage instance segmentation framework we use to compete in the challenge. The first stage of our framework consists of an object detector, which generates object proposals in the format of bounding boxes. Then, the images and the detected bounding boxes are fed to the second stage, where a segmentation network is applied to segment the objects in the bounding boxes. We train all our networks in a class-agnostic way. Our approach achieves the first place in the UVO 2021 Image-based Open-World Segmentation Challenge.

CV-111-标题: Test time Adaptation through Perturbation Robustness

Abstract: Data samples generated by several real world processes are dynamic in nature \textit{i.e.}, their characteristics vary with time. Thus it is not possible to train and tackle all possible distributional shifts between training and inference, using the host of transfer learning methods in literature. In this paper, we tackle this problem of adapting to domain shift at inference time \textit{i.e.}, we do not change the training process, but quickly adapt the model at test-time to handle any domain shift. For this, we propose to enforce consistency of predictions of data sampled in the vicinity of test sample on the image manifold. On a host of test scenarios like dealing with corruptions (CIFAR-10-C and CIFAR-100-C), and domain adaptation (VisDA-C), our method is at par or significantly outperforms previous methods.

CV-112-标题: An Adaptive Sampling and Edge Detection Approach for Encoding Static Images for Spiking Neural Networks

Abstract: Current state-of-the-art methods of image classification using convolutional neural networks are often constrained by both latency and power consumption. This places a limit on the devices, particularly low-power edge devices, that can employ these methods. Spiking neural networks (SNNs) are considered to be the third generation of artificial neural networks which aim to address these latency and power constraints by taking inspiration from biological neuronal communication processes. Before data such as images can be input into an SNN, however, they must be first encoded into spike trains. Herein, we propose a method for encoding static images into temporal spike trains using edge detection and an adaptive signal sampling method for use in SNNs. The edge detection process consists of first performing Canny edge detection on the 2D static images and then converting the edge detected images into two X and Y signals using an image-to-signal conversion method. The adaptive signaling approach consists of sampling the signals such that the signals maintain enough detail and are sensitive to abrupt changes in the signal. Temporal encoding mechanisms such as threshold-based representation (TBR) and step-forward (SF) are then able to be used to convert the sampled signals into spike trains. We use various error and indicator metrics to optimize and evaluate the efficiency and precision of the proposed image encoding approach. Comparison results between the original and reconstructed signals from spike trains generated using edge-detection and adaptive temporal encoding mechanism exhibit 18x and 7x reduction in average root mean square error (RMSE) compared to the conventional SF and TBR encoding, respectively, while used for encoding MNIST dataset.

CV-113-标题: Learning Equivariances and Partial Equivariances from Data

Abstract: Group equivariant Convolutional Neural Networks (G-CNNs) constrain features to respect the chosen symmetries, and lead to better generalization when these symmetries appear in the data. However, if the chosen symmetries are not present, group equivariant architectures lead to overly constrained models and worse performance. Frequently, the distribution of the data can be better represented by a subset of a group than by the group as a whole, e.g., rotations in $[-90^{\circ}, 90^{\circ}]$. In such cases, a model that respects equivariance partially is better suited to represent the data. Moreover, relevant symmetries may differ for low and high-level features, e.g., edge orientations in a face, and face poses relative to the camera. As a result, the optimal level of equivariance may differ per layer. In this work, we introduce Partial G-CNNs: a family of equivariant networks able to learn partial and full equivariances from data at every layer end-to-end. Partial G-CNNs retain full equivariance whenever beneficial, e.g., for rotated MNIST, but are able to restrict it whenever it becomes harmful, e.g., for 6~/~9 or natural image classification. Partial G-CNNs perform on par with G-CNNs when full equivariance is necessary, and outperform them otherwise. Our method is applicable to discrete groups, continuous groups and combinations thereof.

CV-114-标题: Come Again? Re-Query in Referring Expression Comprehension

Abstract: To build a shared perception of the world, humans rely on the ability to resolve misunderstandings by requesting and accepting clarifications. However, when evaluating visiolinguistic models, metrics such as accuracy enforce the assumption that a decision must be made based on a single piece of evidence. In this work, we relax this assumption for the task of referring expression comprehension by allowing the model to request help when its confidence is low. We consider two ways in which this help can be provided: multimodal re-query, where the user is allowed to point or click to provide additional information to the model, and rephrase re-query, where the user is only allowed to provide another referring expression. We demonstrate the importance of re-query by showing that providing the best referring expression for all objects can increase accuracy by up to 21.9% and that this accuracy can be matched by re-querying only 12% of initial referring expressions. We further evaluate re-query functions for both multimodal and rephrase re-query across three modern approaches and demonstrate combined replacement for rephrase re-query, which improves average single-query performance by up to 6.5% and converges to as close as 1.6% of the upper bound of single-query performance.

CV-115-标题: CoFi: Coarse-to-Fine ICP for LiDAR Localization in an Efficient Long-lasting Point Cloud Map

Abstract: LiDAR odometry and localization has attracted increasing research interest in recent years. In the existing works, iterative closest point (ICP) is widely used since it is precise and efficient. Due to its non-convexity and its local iterative strategy, however, ICP-based method easily falls into local optima, which in turn calls for a precise initialization. In this paper, we propose CoFi, a Coarse-to-Fine ICP algorithm for LiDAR localization. Specifically, the proposed algorithm down-samples the input point sets under multiple voxel resolution, and gradually refines the transformation from the coarse point sets to the fine-grained point sets. In addition, we propose a map based LiDAR localization algorithm that extracts semantic feature points from the LiDAR frames and apply CoFi to estimate the pose on an efficient point cloud map. With the help of the Cylinder3D algorithm for LiDAR scan semantic segmentation, the proposed CoFi localization algorithm demonstrates the state-of-the-art performance on the KITTI odometry benchmark, with significant improvement over the literature.

CV-116-标题: Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation

Abstract: It is hard to generate an image at target view well for previous cross-view image translation methods that directly adopt a simple encoder-decoder or U-Net structure, especially for drastically different views and severe deformation cases. To ease this problem, we propose a novel two-stage framework with a new Cascaded Cross MLP-Mixer (CrossMLP) sub-network in the first stage and one refined pixel-level loss in the second stage. In the first stage, the CrossMLP sub-network learns the latent transformation cues between image code and semantic map code via our novel CrossMLP blocks. Then the coarse results are generated progressively under the guidance of those cues. Moreover, in the second stage, we design a refined pixel-level loss that eases the noisy semantic label problem with more reasonable regularization in a more compact fashion for better optimization. Extensive experimental results on Dayton~\cite{vo2016localizing} and CVUSA~\cite{workman2015wide} datasets show that our method can generate significantly better results than state-of-the-art methods. The source code and trained models are available at this https URL.

CV-117-标题: Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction

Abstract: Every hand-object interaction begins with contact. Despite predicting the contact state between hands and objects is useful in understanding hand-object interactions, prior methods on hand-object analysis have assumed that the interacting hands and objects are known, and were not studied in detail. In this study, we introduce a video-based method for predicting contact between a hand and an object. Specifically, given a video and a pair of hand and object tracks, we predict a binary contact state (contact or no-contact) for each frame. However, annotating a large number of hand-object tracks and contact labels is costly. To overcome the difficulty, we propose a semi-supervised framework consisting of (i) automatic collection of training data with motion-based pseudo-labels and (ii) guided progressive label correction (gPLC), which corrects noisy pseudo-labels with a small amount of trusted data. We validated our framework’s effectiveness on a newly built benchmark dataset for hand-object contact prediction and showed superior performance against existing baseline methods. Code and data are available at this https URL.

CV-118-标题: Recurrent Brain Graph Mapper for Predicting Time-Dependent Brain Graph Evaluation Trajectory

Abstract: Several brain disorders can be detected by observing alterations in the brain’s structural and functional connectivities. Neurological findings suggest that early diagnosis of brain disorders, such as mild cognitive impairment (MCI), can prevent and even reverse its development into Alzheimer’s disease (AD). In this context, recent studies aimed to predict the evolution of brain connectivities over time by proposing machine learning models that work on brain images. However, such an approach is costly and time-consuming. Here, we propose to use brain connectivities as a more efficient alternative for time-dependent brain disorder diagnosis by regarding the brain as instead a large interconnected graph characterizing the interconnectivity scheme between several brain regions. We term our proposed method Recurrent Brain Graph Mapper (RBGM), a novel efficient edge-based recurrent graph neural network that predicts the time-dependent evaluation trajectory of a brain graph from a single baseline. Our RBGM contains a set of recurrent neural network-inspired mappers for each time point, where each mapper aims to project the ground-truth brain graph onto its next time point. We leverage the teacher forcing method to boost training and improve the evolved brain graph quality. To maintain the topological consistency between the predicted brain graphs and their corresponding ground-truth brain graphs at each time point, we further integrate a topological loss. We also use l1 loss to capture time-dependency and minimize the distance between the brain graph at consecutive time points for regularization. Benchmarks against several variants of RBGM and state-of-the-art methods prove that we can achieve the same accuracy in predicting brain graph evolution more efficiently, paving the way for novel graph neural network architecture and a highly efficient training scheme.

CV-119-标题: Towards Reducing Aleatoric Uncertainty for Medical Imaging Tasks

Abstract: In safety-critical applications like medical diagnosis, certainty associated with a model’s prediction is just as important as its accuracy. Consequently, uncertainty estimation and reduction play a crucial role. Uncertainty in predictions can be attributed to noise or randomness in data (aleatoric) and incorrect model inferences (epistemic). While model uncertainty can be reduced with more data or bigger models, aleatoric uncertainty is more intricate. This work proposes a novel approach that interprets data uncertainty estimated from a self-supervised task as noise inherent to the data and utilizes it to reduce aleatoric uncertainty in another task related to the same dataset via data augmentation. The proposed method was evaluated on a benchmark medical imaging dataset with image reconstruction as the self-supervised task and segmentation as the image analysis task. Our findings demonstrate the effectiveness of the proposed approach in significantly reducing the aleatoric uncertainty in the image segmentation task while achieving better or on-par performance compared to the standard augmentation techniques.

CV-120-标题: 2020 CATARACTS Semantic Segmentation Challenge

Abstract: Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presence information. In 2020, we released pixel-wise semantic annotations for anatomy and instruments for 4670 images sampled from 25 videos of the CATARACTS training set. The 2020 CATARACTS Semantic Segmentation Challenge, which was a sub-challenge of the 2020 MICCAI Endoscopic Vision (EndoVis) Challenge, presented three sub-tasks to assess participating solutions on anatomical structure and instrument segmentation. Their performance was assessed on a hidden test set of 531 images from 10 videos of the CATARACTS test set.

CV-121-标题: Evaluation of Various Open-Set Medical Imaging Tasks with Deep Neural Networks

Abstract: The current generation of deep neural networks has achieved close-to-human results on “closed-set” image recognition; that is, the classes being evaluated overlap with the training classes. Many recent methods attempt to address the importance of the unknown, which are termed “open-set” recognition algorithms, try to reject unknown classes as well as maintain high recognition accuracy on known classes. However, it is still unclear how different general domain-trained open-set methods from ImageNet would perform on a different but more specific domain, such as the medical domain. Without principled and formal evaluations to measure the effectiveness of those general open-set methods, artificial intelligence (AI)-based medical diagnostics would experience ineffective adoption and increased risks of bad decision making. In this paper, we conduct rigorous evaluations amongst state-of-the-art open-set methods, exploring different open-set scenarios from “similar-domain” to “different-domain” scenarios and comparing them on various general and medical domain datasets. We summarise the results and core ideas and explain how the models react to various degrees of openness and different distributions of open classes. We show the main difference between general domain-trained and medical domain-trained open-set models with our quantitative and qualitative analysis of the results. We also identify aspects of model robustness in real clinical workflow usage according to confidence calibration and the inference efficiency.

CV-122-标题: CXR-Net: An Encoder-Decoder-Encoder Multitask Deep Neural Network for Explainable and Accurate Diagnosis of COVID-19 pneumonia with Chest X-ray Images

Abstract: Accurate and rapid detection of COVID-19 pneumonia is crucial for optimal patient treatment. Chest X-Ray (CXR) is the first line imaging test for COVID-19 pneumonia diagnosis as it is fast, cheap and easily accessible. Inspired by the success of deep learning (DL) in computer vision, many DL-models have been proposed to detect COVID-19 pneumonia using CXR images. Unfortunately, these deep classifiers lack the transparency in interpreting findings, which may limit their applications in clinical practice. The existing commonly used visual explanation methods are either too noisy or imprecise, with low resolution, and hence are unsuitable for diagnostic purposes. In this work, we propose a novel explainable deep learning framework (CXRNet) for accurate COVID-19 pneumonia detection with an enhanced pixel-level visual explanation from CXR images. The proposed framework is based on a new Encoder-Decoder-Encoder multitask architecture, allowing for both disease classification and visual explanation. The method has been evaluated on real world CXR datasets from both public and private data sources, including: healthy, bacterial pneumonia, viral pneumonia and COVID-19 pneumonia cases The experimental results demonstrate that the proposed method can achieve a satisfactory level of accuracy and provide fine-resolution classification activation maps for visual explanation in lung disease detection. The Average Accuracy, the Precision, Recall and F1-score of COVID-19 pneumonia reached 0.879, 0.985, 0.992 and 0.989, respectively. We have also found that using lung segmented (CXR) images can help improve the performance of the model. The proposed method can provide more detailed high resolution visual explanation for the classification decision, compared to current state-of-the-art visual explanation methods and has a great potential to be used in clinical practice for COVID-19 pneumonia diagnosis.

Abstract: Most single image super-resolution (SR) methods are developed on synthetic low-resolution (LR) and high-resolution (HR) image pairs, which are simulated by a predetermined degradation operation, e.g., bicubic downsampling. However, these methods only learn the inverse process of the predetermined operation, so they fail to super resolve the real-world LR images; the true formulation deviates from the predetermined operation. To address this problem, we propose a novel supervised method to simulate an unknown degradation process with the inclusion of the prior hardware knowledge of the imaging system. We design an adaptive blurring layer (ABL) in the supervised learning framework to estimate the target LR images. The hyperparameters of the ABL can be adjusted for different imaging hardware. The experiments on the real-world datasets validate that our degradation model can estimate LR images more accurately than the predetermined degradation operation, as well as facilitate existing SR methods to perform reconstructions on real-world LR images more accurately than the conventional approaches.

CV-124-标题: Combining Different V1 Brain Model Variants to Improve Robustness to Image Corruptions in CNNs

Abstract: While some convolutional neural networks (CNNs) have surpassed human visual abilities in object classification, they often struggle to recognize objects in images corrupted with different types of common noise patterns, highlighting a major limitation of this family of models. Recently, it has been shown that simulating a primary visual cortex (V1) at the front of CNNs leads to small improvements in robustness to these image perturbations. In this study, we start with the observation that different variants of the V1 model show gains for specific corruption types. We then build a new model using an ensembling technique, which combines multiple individual models with different V1 front-end variants. The model ensemble leverages the strengths of each individual model, leading to significant improvements in robustness across all corruption categories and outperforming the base model by 38% on average. Finally, we show that using distillation, it is possible to partially compress the knowledge in the ensemble model into a single model with a V1 front-end. While the ensembling and distillation techniques used here are hardly biologically-plausible, the results presented here demonstrate that by combining the specific strengths of different neuronal circuits in V1 it is possible to improve the robustness of CNNs for a wide range of perturbations.

CV-125-标题: OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data

Abstract: Convolutional neural networks (CNNs) are the current state-of-the-art meta-algorithm for volumetric segmentation of medical data, for example, to localize COVID-19 infected tissue on computer tomography scans or the detection of tumour volumes in magnetic resonance imaging. A key limitation of 3D CNNs on voxelised data is that the memory consumption grows cubically with the training data resolution. Occupancy networks (O-Nets) are an alternative for which the data is represented continuously in a function space and 3D shapes are learned as a continuous decision boundary. While O-Nets are significantly more memory efficient than 3D CNNs, they are limited to simple shapes, are relatively slow at inference, and have not yet been adapted for 3D semantic segmentation of medical data. Here, we propose Occupancy Networks for Semantic Segmentation (OSS-Nets) to accurately and memory-efficiently segment 3D medical data. We build upon the original O-Net with modifications for increased expressiveness leading to improved segmentation performance comparable to 3D CNNs, as well as modifications for faster inference. We leverage local observations to represent complex shapes and prior encoder predictions to expedite inference. We showcase OSS-Net’s performance on 3D brain tumour and liver segmentation against a function space baseline (O-Net), a performance baseline (3D residual U-Net), and an efficiency baseline (2D residual U-Net). OSS-Net yields segmentation results similar to the performance baseline and superior to the function space and efficiency baselines. In terms of memory efficiency, OSS-Net consumes comparable amounts of memory as the function space baseline, somewhat more memory than the efficiency baseline and significantly less than the performance baseline. As such, OSS-Net enables memory-efficient and accurate 3D semantic segmentation that can scale to high resolutions.

CV-126-标题: Development and accuracy evaluation of Coded Phase-shift 3D scanner

Abstract: In this paper, we provide an overview of development of a structured light 3D-scanner based on combination of binary-coded patterns and sinusoidal phase-shifted fringe patterns called Coded Phase-shift technique. Further, we describe the experiments performed to evaluate measurement accuracy and precision of the developed system. A study of this kind is expected to be helpful in understanding the basic working of current structured-light 3D scanners and the approaches followed for their performance assessment.

CV-127-标题: Evaluation of augmentation methods in classifying autism spectrum disorders from fMRI data with 3D convolutional neural networks

Abstract: Classifying subjects as healthy or diseased using neuroimaging data has gained a lot of attention during the last 10 years. Here we apply deep learning to derivatives from resting state fMRI data, and investigate how different 3D augmentation techniques affect the test accuracy. Specifically, we use resting state derivatives from 1,112 subjects in ABIDE preprocessed to train a 3D convolutional neural network (CNN) to perform the classification. Our results show that augmentation only provide minor improvements to the test accuracy.

CV-128-标题: AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation

Abstract: Recent advances in transformer-based models have drawn attention to exploring these techniques in medical image segmentation, especially in conjunction with the U-Net model (or its variants), which has shown great success in medical image segmentation, under both 2D and 3D settings. Current 2D based methods either directly replace convolutional layers with pure transformers or consider a transformer as an additional intermediate encoder between the encoder and decoder of U-Net. However, these approaches only consider the attention encoding within one single slice and do not utilize the axial-axis information naturally provided by a 3D volume. In the 3D setting, convolution on volumetric data and transformers both consume large GPU memory. One has to either downsample the image or use cropped local patches to reduce GPU memory usage, which limits its performance. In this paper, we propose Axial Fusion Transformer UNet (AFTer-UNet), which takes both advantages of convolutional layers’ capability of extracting detailed features and transformers’ strength on long sequence modeling. It considers both intra-slice and inter-slice long-range cues to guide the segmentation. Meanwhile, it has fewer parameters and takes less GPU memory to train than the previous transformer-based models. Extensive experiments on three multi-organ segmentation datasets demonstrate that our method outperforms current state-of-the-art methods.

CV-129-标题: Deep Learning for HDR Imaging: State-of-the-Art and Future Trends

Abstract: High dynamic range (HDR) imaging is a technique that allows an extensive dynamic range of exposures, which is important in image processing, computer graphics, and computer vision. In recent years, there has been a significant advancement in HDR imaging using deep learning (DL). This study conducts a comprehensive and insightful survey and analysis of recent developments in deep HDR imaging methodologies. We hierarchically and structurally group existing deep HDR imaging methods into five categories based on (1) number/domain of input exposures, (2) number of learning tasks, (3) novel sensor data, (4) novel learning strategies, and (5) applications. Importantly, we provide a constructive discussion on each category regarding its potential and challenges. Moreover, we review some crucial aspects of deep HDR imaging, such as datasets and evaluation metrics. Finally, we highlight some open problems and point out future research directions.

CV-130-标题: Knowledge-Guided Multiview Deep Curriculum Learning for Elbow Fracture Classification

Abstract: Elbow fracture diagnosis often requires patients to take both frontal and lateral views of elbow X-ray radiographs. In this paper, we propose a multiview deep learning method for an elbow fracture subtype classification task. Our strategy leverages transfer learning by first training two single-view models, one for frontal view and the other for lateral view, and then transferring the weights to the corresponding layers in the proposed multiview network architecture. Meanwhile, quantitative medical knowledge was integrated into the training process through a curriculum learning framework, which enables the model to first learn from “easier” samples and then transition to “harder” samples to reach better performance. In addition, our multiview network can work both in a dual-view setting and with a single view as input. We evaluate our method through extensive experiments on a classification task of elbow fracture with a dataset of 1,964 images. Results show that our method outperforms two related methods on bone fracture study in multiple settings, and our technique is able to boost the performance of the compared methods. The code is available at this https URL.

CV-131-标题: Medical Knowledge-Guided Deep Curriculum Learning for Elbow Fracture Diagnosis from X-Ray Images

Abstract: Elbow fractures are one of the most common fracture types. Diagnoses on elbow fractures often need the help of radiographic imaging to be read and analyzed by a specialized radiologist with years of training. Thanks to the recent advances of deep learning, a model that can classify and detect different types of bone fractures needs only hours of training and has shown promising results. However, most existing deep learning models are purely data-driven, lacking incorporation of known domain knowledge from human experts. In this work, we propose a novel deep learning method to diagnose elbow fracture from elbow X-ray images by integrating domain-specific medical knowledge into a curriculum learning framework. In our method, the training data are permutated by sampling without replacement at the beginning of each training epoch. The sampling probability of each training sample is guided by a scoring criterion constructed based on clinically known knowledge from human experts, where the scoring indicates the diagnosis difficultness of different elbow fracture subtypes. We also propose an algorithm that updates the sampling probabilities at each epoch, which is applicable to other sampling-based curriculum learning frameworks. We design an experiment with 1865 elbow X-ray images for a fracture/normal binary classification task and compare our proposed method to a baseline method and a previous method using multiple metrics. Our results show that the proposed method achieves the highest classification performance. Also, our proposed probability update algorithm boosts the performance of the previous method.

CV-132-标题: Artificial Intelligence-Based Detection Classification and Prediction/Prognosis in PET Imaging: Towards Radiophenomics

Abstract: Artificial intelligence (AI) techniques have significant potential to enable effective, robust, and automated image phenotyping including identification of subtle patterns. AI-based detection searches the image space to find the regions of interest based on patterns and features. There is a spectrum of tumor histologies from benign to malignant that can be identified by AI-based classification approaches using image features. The extraction of minable information from images gives way to the field of radiomics and can be explored via explicit (handcrafted/engineered) and deep radiomics frameworks. Radiomics analysis has the potential to be utilized as a noninvasive technique for the accurate characterization of tumors to improve diagnosis and treatment monitoring. This work reviews AI-based techniques, with a special focus on oncological PET and PET/CT imaging, for different detection, classification, and prediction/prognosis tasks. We also discuss needed efforts to enable the translation of AI techniques to routine clinical workflows, and potential improvements and complementary techniques such as the use of natural language processing on electronic health records and neuro-symbolic AI techniques.

CV-133-标题: A New Automatic Change Detection Frame-work Based on Region Growing and Weighted Local Mutual Information: Analysis of Breast Tumor Response to Chemotherapy in Serial MR Images

Abstract: The automatic analysis of subtle changes between longitudinal MR images is an important task as it is still a challenging issue in scope of the breast medical image processing. In this paper we propose an effective automatic change detection framework composed of two phases since previously used methods have features with low distinctive power. First, in the preprocessing phase an intensity normalization method is suggested based on Hierarchical Histogram Matching (HHM) that is more robust to noise than previous methods. To eliminate undesirable changes and extract the regions containing significant changes the proposed Extraction Region of Changes (EROC) method is applied based on intensity distribution and Hill-Climbing algorithm. Second, in the detection phase a region growing-based approach is suggested to differentiate significant changes from unreal ones. Due to using proposed Weighted Local Mutual Information (WLMI) method to extract high level features and also utilizing the principle of the local consistency of changes, the proposed approach enjoys reasonable performance. The experimental results on both simulated and real longitudinal Breast MR Images confirm the effectiveness of the proposed framework. Also, this framework outperforms the human expert in some cases which can detect many lesion evolutions that are missed by expert.

CV-134-标题: Cross-Sim-NGF: FFT-Based Global Rigid Multimodal Alignment of Image Volumes using Normalized Gradient Fields

Abstract: Multimodal image alignment involves finding spatial correspondences between volumes varying in appearance and structure. Automated alignment methods are often based on local optimization that can be highly sensitive to their initialization. We propose a global optimization method for rigid multimodal 3D image alignment, based on a novel efficient algorithm for computing similarity of normalized gradient fields (NGF) in the frequency domain. We validate the method experimentally on a dataset comprised of 20 brain volumes acquired in four modalities (T1w, Flair, CT, [18F] FDG PET), synthetically displaced with known transformations. The proposed method exhibits excellent performance on all six possible modality combinations, and outperforms all four reference methods by a large margin. The method is fast; a 3.4Mvoxel global rigid alignment requires approximately 40 seconds of computation, and the proposed algorithm outperforms a direct algorithm for the same task by more than three orders of magnitude. Open-source implementation is provided.

人工智能

AI-0-标题: Detecting Important Patterns Using Conceptual Relevance Interestingness Measure

Abstract: Discovering meaningful conceptual structures is a substantial task in data mining and knowledge discovery applications. While off-the-shelf interestingness indices defined in Formal Concept Analysis may provide an effective relevance evaluation in several situations, they frequently give inadequate results when faced with massive formal contexts (and concept lattices), and in the presence of irrelevant concepts. In this paper, we introduce the Conceptual Relevance (CR) score, a new scalable interestingness measurement for the identification of actionable concepts. From a conceptual perspective, the minimal generators provide key information about their associated concept intent. Furthermore, the relevant attributes of a concept are those that maintain the satisfaction of its closure condition. Thus, the guiding idea of CR exploits the fact that minimal generators and relevant attributes can be efficiently used to assess concept relevance. As such, the CR index quantifies both the amount of conceptually relevant attributes and the number of the minimal generators per concept intent. Our experiments on synthetic and real-world datasets show the efficiency of this measure over the well-known stability index.

AI-1-标题: Improving the Search by Encoding Multiple Solutions in a Chromosome

Abstract: We investigate the possibility of encoding multiple solutions of a problem in a single chromosome. The best solution encoded in an individual will represent (will provide the fitness of) that individual. In order to obtain some benefits the chromosome decoding process must have the same complexity as in the case of a single solution in a chromosome. Three Genetic Programming techniques are analyzed for this purpose: Multi Expression Programming, Linear Genetic Programming, and Infix Form Genetic Programming. Numerical experiments show that encoding multiple solutions in a chromosome greatly improves the search process.

AI-2-标题: Accelerating Genetic Programming using GPUs

Abstract: Genetic Programming (GP), an evolutionary learning technique, has multiple applications in machine learning such as curve fitting, data modelling, feature selection, classification etc. GP has several inherent parallel steps, making it an ideal candidate for GPU based parallelization. This paper describes a GPU accelerated stack-based variant of the generational GP algorithm which can be used for symbolic regression and binary classification. The selection and evaluation steps of the generational GP algorithm are parallelized using CUDA. We introduce representing candidate solution expressions as prefix lists, which enables evaluation using a fixed-length stack in GPU memory. CUDA based matrix vector operations are also used for computation of the fitness of population programs. We evaluate our algorithm on synthetic datasets for the Pagie Polynomial (ranging in size from $4096$ to $16$ million points), profiling training times of our algorithm with other standard symbolic regression libraries viz. gplearn, TensorGP and KarooGP. In addition, using $6$ large-scale regression and classification datasets usually used for comparing gradient boosting algorithms, we run performance benchmarks on our algorithm and gplearn, profiling the training time, test accuracy, and loss. On an NVIDIA DGX-A100 GPU, our algorithm outperforms all the previously listed frameworks, and in particular, achieves average speedups of $119\times$ and $40\times$ against gplearn on the synthetic and large scale datasets respectively.

AI-3-标题: A Survey on Methods and Metrics for the Assessment of Explainability under the Proposed AI Act

Abstract: This study discusses the interplay between metrics used to measure the explainability of the AI systems and the proposed EU Artificial Intelligence Act. A standardisation process is ongoing: several entities (e.g. ISO) and scholars are discussing how to design systems that are compliant with the forthcoming Act and explainability metrics play a significant role. This study identifies the requirements that such a metric should possess to ease compliance with the AI Act. It does so according to an interdisciplinary approach, i.e. by departing from the philosophical concept of explainability and discussing some metrics proposed by scholars and standardisation entities through the lenses of the explainability obligations set by the proposed AI Act. Our analysis proposes that metrics to measure the kind of explainability endorsed by the proposed AI Act shall be risk-focused, model-agnostic, goal-aware, intelligible & accessible. This is why we discuss the extent to which these requirements are met by the metrics currently under discussion.

AI-4-标题: Applying Second-Order Quantifier Elimination in Inspecting Gödels Ontological Proof

Abstract: In recent years, Gödel’s ontological proof and variations of it were formalized and analyzed with automated tools in various ways. We supplement these analyses with a modeling in an automated environment based on first-order logic extended by predicate quantification. Formula macros are used to structure complex formulas and tasks. The analysis is presented as a generated type-set document where informal explanations are interspersed with pretty-printed formulas and outputs of reasoners for first-order theorem proving and second-order quantifier elimination. Previously unnoticed or obscured aspects and details of Gödel’s proof become apparent. Practical application possibilities of second-order quantifier elimination are shown and the encountered elimination tasks may serve as benchmarks.

【摘要我们通过基于谓词量化扩展的一阶逻辑，在自动化环境中使用自动化环境中的建模进行了补充。公式宏用于构建复杂的公式和任务。分析作为生成的类型集文档呈现，其中非正式的解释与漂亮印刷的公式交流，以及用于一阶定理证明和二阶量化消除的原件的产出。以前没有注意到或模糊的方面和哥德尔证据的细节变得明显。示出了二阶量化消除的实际应用可能性，并且遇到的消除任务可以用作基准。

AI-5-标题: Enabling a Social Robot to Process Social Cues to Detect when to Help a User

Abstract: It is important for socially assistive robots to be able to recognize when a user needs and wants help. Such robots need to be able to recognize human needs in a real-time manner so that they can provide timely assistance. We propose an architecture that uses social cues to determine when a robot should provide assistance. Based on a multimodal fusion approach upon eye gaze and language modalities, our architecture is trained and evaluated on data collected in a robot-assisted Lego building task. By focusing on social cues, our architecture has minimal dependencies on the specifics of a given task, enabling it to be applied in many different contexts. Enabling a social robot to recognize a user’s needs through social cues can help it to adapt to user behaviors and preferences, which in turn will lead to improved user experiences.

AI-6-标题: Optimizing Multi-Taper Features for Deep Speaker Verification

Abstract: Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs). Even if past work has reported promising automatic speaker verification (ASV) results with Gaussian mixture model-based classifiers, the performance of multi-taper MFCCs with deep ASV systems remains an open question. Instead of a static-taper design, we propose to optimize the multi-taper estimator jointly with a deep neural network trained for ASV tasks. With a maximum improvement on the SITW corpus of 25.8% in terms of equal error rate over the static-taper, our method helps preserve a balanced level of leakage and variance, providing more robustness.

AI-7-标题: PipAttack: Poisoning Federated Recommender Systems forManipulating Item Promotion

Abstract: Due to the growing privacy concerns, decentralization emerges rapidly in personalized services, especially recommendation. Also, recent studies have shown that centralized models are vulnerable to poisoning attacks, compromising their integrity. In the context of recommender systems, a typical goal of such poisoning attacks is to promote the adversary’s target items by interfering with the training dataset and/or process. Hence, a common practice is to subsume recommender systems under the decentralized federated learning paradigm, which enables all user devices to collaboratively learn a global recommender while retaining all the sensitive data locally. Without exposing the full knowledge of the recommender and entire dataset to end-users, such federated recommendation is widely regarded safe’ towards poisoning attacks. In this paper, we present a systematic approach to backdooring federated recommender systems for targeted item promotion. The core tactic is to take advantage of the inherent popularity bias that commonly exists in data-driven recommenders. As popular items are more likely to appear in the recommendation list, our innovatively designed attack model enables the target item to have the characteristics of popular items in the embedding space. Then, by uploading carefully crafted gradients via a small number of malicious users during the model update, we can effectively increase the exposure rate of a target (unpopular) item in the resulted federated recommender. Evaluations on two real-world datasets show that 1) our attack model significantly boosts the exposure rate of the target item in a stealthy way, without harming the accuracy of the poisoned recommender; and 2) existing defenses are not effective enough, highlighting the need for new defenses against our local model poisoning attacks to federated recommender systems.

AI-8-标题: Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information

Abstract: End-to-end learning robotic manipulation with high data efficiency is one of the key challenges in robotics. The latest methods that utilize human demonstration data and unsupervised representation learning has proven to be a promising direction to improve RL learning efficiency. The use of demonstration data also allows “warming-up” the RL policies using offline data with imitation learning or the recently emerged offline reinforcement learning algorithms. However, existing works often treat offline policy learning and online exploration as two separate processes, which are often accompanied by severe performance drop during the offline-to-online transition. Furthermore, many robotic manipulation tasks involve complex sub-task structures, which are very challenging to be solved in RL with sparse reward. In this work, we propose a unified offline-to-online RL framework that resolves the transition performance drop issue. Additionally, we introduce goal-aware state information to the RL agent, which can greatly reduce task complexity and accelerate policy learning. Combined with an advanced unsupervised representation learning module, our framework achieves great training efficiency and performance compared with the state-of-the-art methods in multiple robotic manipulation tasks.

AI-9-标题: Human-Centered Explainable AI (XAI): From Algorithms to User Experiences

Abstract: As a technical sub-field of artificial intelligence (AI), explainable AI (XAI) has produced a vast collection of algorithms in recent years. However, explainability is an inherently human-centric property and the field is starting to embrace inter-disciplinary perspectives and human-centered approaches. As researchers and practitioners begin to leverage XAI algorithms to build XAI applications, explainability has moved beyond a demand by data scientists or researchers to comprehend the models they are developing, to become an essential requirement for people to trust and adopt AI deployed in numerous domains. Human-computer interaction (HCI) research and user experience (UX) design in this area are therefore increasingly important. In this chapter, we begin with a high-level overview of the technical landscape of XAI algorithms, then selectively survey recent HCI work that takes human-centered approaches to design, evaluate, provide conceptual and methodological tools for XAI. We ask the question “what are human-centered approaches doing for XAI” and highlight three roles that they should play in shaping XAI technologies: to drive technical choices by understanding users’ explainability needs, to uncover pitfalls of existing XAI methods through empirical studies and inform new methods, and to provide conceptual frameworks for human-compatible XAI.

AI-10-标题: Privacy in Open Search: A Review of Challenges and Solutions

Abstract: Privacy is of worldwide concern regarding activities and processes that include sensitive data. For this reason, many countries and territories have been recently approving regulations controlling the extent to which organizations may exploit data provided by people. Artificial intelligence areas, such as machine learning and natural language processing, have already successfully employed privacy-preserving mechanisms in order to safeguard data privacy in a vast number of applications. Information retrieval (IR) is likewise prone to privacy threats, such as attacks and unintended disclosures of documents and search history, which may cripple the security of users and be penalized by data protection laws. This work aims at highlighting and discussing open challenges for privacy in the recent literature of IR, focusing on tasks featuring user-generated text data. Our contribution is threefold: firstly, we present an overview of privacy threats to IR tasks; secondly, we discuss applicable privacy-preserving mechanisms which may be employed in solutions to restrain privacy hazards; finally, we bring insights on the tradeoffs between privacy preservation and utility performance for IR tasks.

AI-11-标题: Bootstrapping confidence in future safety based on past safe operation

Abstract: With autonomous vehicles (AVs), a major concern is the inability to give meaningful quantitative assurance of safety, to the extent required by society - e.g. that an AV must be at least as safe as a good human driver - before that AV is in extensive use. We demonstrate an approach to achieving more moderate, but useful, confidence, e.g., confidence of low enough probability of causing accidents in the early phases of operation. This formalises mathematically the common approach of operating a system on a limited basis in the hope that mishap-free operation will confirm one’s confidence in its safety and allow progressively more extensive operation: a process of “bootstrapping” of confidence. Translating that intuitive approach into theorems shows: (1) that it is substantially sound in the right circumstances, and could be a good method for deciding about the early deployment phase for an AV; (2) how much confidence can be rightly derived from such a “cautious deployment” approach, so that we can avoid over-optimism; (3) under which conditions our sound formulas for future confidence are applicable; (4) thus, which analyses of the concrete situations, and/or constraints on practice, are needed in order to enjoy the advantages of provably correct confidence in adequate future safety.

AI-12-标题: Colosseum: Large-Scale Wireless Experimentation Through Hardware-in-the-Loop Network Emulation

Abstract: Colosseum is an open-access and publicly-available large-scale wireless testbed for experimental research via virtualized and softwarized waveforms and protocol stacks on a fully programmable, “white-box” platform. Through 256 state-of-the-art Software-defined Radios and a Massive Channel Emulator core, Colosseum can model virtually any scenario, enabling the design, development and testing of solutions at scale in a variety of deployments and channel conditions. These Colosseum radio-frequency scenarios are reproduced through high-fidelity FPGA-based emulation with finite-impulse response filters. Filters model the taps of desired wireless channels and apply them to the signals generated by the radio nodes, faithfully mimicking the conditions of real-world wireless environments. In this paper we describe the architecture of Colosseum and its experimentation and emulation capabilities. We then demonstrate the effectiveness of Colosseum for experimental research at scale through exemplary use cases including prevailing wireless technologies (e.g., cellular and Wi-Fi) in spectrum sharing and unmanned aerial vehicle scenarios. A roadmap for Colosseum future updates concludes the paper.

AI-13-标题: Surrogate Representation Learning with Isometric Mapping for Gray-box Graph Adversarial Attacks

Abstract: Gray-box graph attacks aim at disrupting the performance of the victim model by using inconspicuous attacks with limited knowledge of the victim model. The parameters of the victim model and the labels of the test nodes are invisible to the attacker. To obtain the gradient on the node attributes or graph structure, the attacker constructs an imaginary surrogate model trained under supervision. However, there is a lack of discussion on the training of surrogate models and the robustness of provided gradient information. The general node classification model loses the topology of the nodes on the graph, which is, in fact, an exploitable prior for the attacker. This paper investigates the effect of representation learning of surrogate models on the transferability of gray-box graph adversarial attacks. To reserve the topology in the surrogate embedding, we propose Surrogate Representation Learning with Isometric Mapping (SRLIM). By using Isometric mapping method, our proposed SRLIM can constrain the topological structure of nodes from the input layer to the embedding space, that is, to maintain the similarity of nodes in the propagation process. Experiments prove the effectiveness of our approach through the improvement in the performance of the adversarial attacks generated by the gradient-based attacker in untargeted poisoning gray-box setups.

AI-14-标题: R4: A Framework for Route Representation and Route Recommendation

AI-15-标题: Playing 2048 With Reinforcement Learning

Abstract: The game of 2048 is a highly addictive game. It is easy to learn the game, but hard to master as the created game revealed that only about 1% games out of hundreds million ever played have been won. In this paper, we would like to explore reinforcement learning techniques to win 2048. The approaches we have took include deep Q-learning and beam search, with beam search reaching 2048 28.5 of time.

AI-16-标题: Semantic Sensing and Planning for Human-Robot Collaboration in Uncertain Environments

Abstract: Autonomous robots can benefit greatly from human-provided semantic characterizations of uncertain task environments and states. However, the development of integrated strategies which let robots model, communicate, and act on such soft data remains challenging. Here, a framework is presented for active semantic sensing and planning in human-robot teams which addresses these gaps by formally combining the benefits of online sampling-based POMDP policies, multi-modal semantic interaction, and Bayesian data fusion. This approach lets humans opportunistically impose model structure and extend the range of semantic soft data in uncertain environments by sketching and labeling arbitrary landmarks across the environment. Dynamic updating of the environment while searching for a mobile target allows robotic agents to actively query humans for novel and relevant semantic data, thereby improving beliefs of unknown environments and target states for improved online planning. Target search simulations show significant improvements in time and belief state estimates required for interception versus conventional planning based solely on robotic sensing. Human subject studies demonstrate a average doubling in dynamic target capture rate compared to the lone robot case, employing reasoning over a range of user characteristics and interaction modalities. Video of interaction can be found at this https URL.

AI-17-标题: flip-hoisting: Exploiting Repeated Parameters in Discrete Probabilistic Programs

Abstract: Probabilistic programming is emerging as a popular and effective means of probabilistic modeling and an alternative to probabilistic graphical models. Probabilistic programs provide greater expressivity and flexibility in modeling probabilistic systems than graphical models, but this flexibility comes at a cost: there remains a significant disparity in performance between specialized Bayesian network solvers and probabilistic program inference algorithms. In this work we present a program analysis and associated optimization, flip-hoisting, that collapses repetitious parameters in discrete probabilistic programs to improve inference performance. flip-hoisting generalizes parameter sharing - a well-known important optimization from discrete graphical models - to probabilistic programs. We implement flip-hoisting in an existing probabilistic programming language and show empirically that it significantly improves inference performance, narrowing the gap between the performances of probabilistic programs and probabilistic graphical models.

AI-18-标题: MultiHead MultiModal Deep Interest Recommendation Network

Abstract: With the development of information technology, human beings are constantly producing a large amount of information at all times. How to obtain the information that users are interested in from the large amount of information has become an issue of great concern to users and even business managers. In order to solve this problem, from traditional machine learning to deep learning recommendation systems, researchers continue to improve optimization models and explore solutions. Because researchers have optimized more on the recommendation model network structure, they have less research on enriching recommendation model features, and there is still room for in-depth recommendation model optimization. Based on the DIN\cite{Authors01} model, this paper adds multi-head and multi-modal modules, which enriches the feature sets that the model can use, and at the same time strengthens the cross-combination and fitting capabilities of the model. Experiments show that the multi-head multi-modal DIN improves the recommendation prediction effect, and outperforms current state-of-the-art methods on various comprehensive indicators.