Computer Science

New submissions

New submissions for Tue, 25 Feb 20

[1]  arXiv:2002.09477 [pdf]
Title: Graph Computing based Distributed State Estimation with PMUs
Comments: 5 pages, 3 figures, 3 tables, 2020 IEEE Power and Energy Society General Meeting. arXiv admin note: substantial text overlap with arXiv:1902.06893
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Signal Processing (eess.SP); Numerical Analysis (math.NA)

Power system state estimation plays a fundamental and critical role in the energy management system (EMS). To achieve high-performance and accurate state estimation, a graph computing based distributed state estimation approach is proposed in this paper. First, a power system network is divided into multiple areas. Reference buses are selected for each area, with PMUs installed at these buses. Then, the system network is converted into multiple independent areas. In this way, the power system state estimation can be conducted in parallel for each area and the estimated system states are obtained without compromising accuracy. The IEEE 118-bus system and the MP 10790-bus system are employed to verify the accuracy of the results and to demonstrate the promising computational performance.

[2]  arXiv:2002.09478 [pdf, other]
Title: On the Search for Feedback in Reinforcement Learning
Comments: arXiv admin note: substantial text overlap with arXiv:1904.08361
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper addresses the problem of learning the optimal feedback policy for a nonlinear stochastic dynamical system with continuous state space, continuous action space and unknown dynamics. Feedback policies are complex objects that typically need a large dimensional parametrization, which makes Reinforcement Learning algorithms that search for an optimum in this large parameter space sample-inefficient and subject to high variance. We propose a "decoupling" principle that drastically reduces the feedback parameter space while still remaining near-optimal to fourth order in a small noise parameter. Based on this principle, we propose a decoupled data-based control (D2C) algorithm that addresses the stochastic control problem: first, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, a linear closed-loop control is developed around this nominal trajectory using only a simulation model. Empirical evidence suggests a significant reduction in training time, as well as in training variance, compared to other state-of-the-art Reinforcement Learning algorithms.

[3]  arXiv:2002.09479 [pdf, ps, other]
Title: Kullback-Leibler Divergence-Based Fuzzy $C$-Means Clustering Incorporating Morphological Reconstruction and Wavelet Frames for Image Segmentation
Comments: 13 pages, 13 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Although spatial information of images usually enhances the robustness of the Fuzzy C-Means (FCM) algorithm, it greatly increases the computational cost of image segmentation. To achieve a sound trade-off between segmentation performance and clustering speed, we propose a Kullback-Leibler (KL) divergence-based FCM algorithm that incorporates a tight wavelet frame transform and a morphological reconstruction operation. To enhance FCM's robustness, an observed image is first filtered using morphological reconstruction. A tight wavelet frame system is employed to decompose the observed and filtered images so as to form their feature sets. Taking these feature sets as the data to be clustered, a modified FCM algorithm is proposed, which introduces a KL divergence term on the partition matrix into its objective function. The KL divergence term aims to make the membership degrees of each image pixel closer to those of its neighbors, so that the membership partition becomes more suitable and the parameter setting of FCM is simplified. On the basis of the obtained partition matrix and prototypes, the segmented feature set is reconstructed by minimizing the inverse process of the modified objective function. To correct abnormal features produced in the reconstruction process, each reconstructed feature is reassigned to the closest prototype. As a result, the segmentation accuracy of KL divergence-based FCM is further improved. Furthermore, the segmented image is reconstructed using a tight wavelet frame reconstruction operation. Finally, supporting experiments on synthetic, medical and color images are reported. Experimental results show that the proposed algorithm works well and achieves better segmentation performance than the comparative algorithms. Moreover, the proposed algorithm requires less time than most FCM-related algorithms.

[4]  arXiv:2002.09481 [pdf, other]
Title: TFApprox: Towards a Fast Emulation of DNN Approximate Hardware Accelerators on GPU
Comments: To appear at the 23rd Design, Automation and Test in Europe (DATE 2020). Grenoble, France
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Energy efficiency of hardware accelerators of deep neural networks (DNN) can be improved by introducing approximate arithmetic circuits. In order to quantify the error introduced by using these circuits and avoid the expensive hardware prototyping, a software emulator of the DNN accelerator is usually executed on CPU or GPU. However, this emulation is typically two or three orders of magnitude slower than a software DNN implementation running on CPU or GPU and operating with standard floating point arithmetic instructions and common DNN libraries. The reason is that there is no hardware support for approximate arithmetic operations on common CPUs and GPUs and these operations have to be expensively emulated. In order to address this issue, we propose an efficient emulation method for approximate circuits utilized in a given DNN accelerator which is emulated on GPU. All relevant approximate circuits are implemented as look-up tables and accessed through a texture memory mechanism of CUDA capable GPUs. We exploit the fact that the texture memory is optimized for irregular read-only access and in some GPU architectures is even implemented as a dedicated cache. This technique allowed us to reduce the inference time of the emulated DNN accelerator approximately 200 times with respect to an optimized CPU version on complex DNNs such as ResNet. The proposed approach extends the TensorFlow library and is available online at https://github.com/ehw-fit/tf-approximate.
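
To make the look-up-table idea in the abstract above concrete, here is a minimal CPU-only sketch in Python. It assumes a made-up approximate 8-bit multiplier (`approx_mul`, which simply drops the two low bits of each operand); that circuit, the function names and the sample vectors are illustrative assumptions, not taken from the paper or from the tf-approximate code. The sketch only shows how tabulating a circuit once turns every emulated multiplication into an indexed read; it does not model CUDA texture memory.

```python
# Sketch: emulate an approximate 8-bit multiplier with a precomputed look-up table.
# approx_mul is a HYPOTHETICAL approximate circuit (it drops the two low bits of
# each operand); it is not one of the circuits evaluated in the paper.

def approx_mul(a: int, b: int) -> int:
    """Hypothetical approximate 8-bit unsigned multiplier."""
    return ((a >> 2) * (b >> 2)) << 4


def build_lut():
    """Tabulate the circuit once for all 256 x 256 operand pairs."""
    return [[approx_mul(a, b) for b in range(256)] for a in range(256)]


def lut_dot(lut, xs, ws):
    """Dot product in which every multiplication is a table look-up."""
    return sum(lut[x][w] for x, w in zip(xs, ws))


LUT = build_lut()
xs = [12, 200, 37, 255]   # illustrative 8-bit activations
ws = [7, 3, 64, 128]      # illustrative 8-bit weights
exact = sum(x * w for x, w in zip(xs, ws))
approx = lut_dot(LUT, xs, ws)
error = exact - approx    # error introduced by the approximate circuit
```

Because the table is indexed only by the operand values, swapping in a different approximate circuit means rebuilding `LUT` and nothing else, which is the property that makes a read-only cache a natural home for it.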

[5]  arXiv:2002.09485 [pdf, other]
Title: The Four Dimensions of Social Network Analysis: An Overview of Research Methods, Applications, and Software Tools
Comments: This paper is currently under evaluation in Information Fusion journal
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Machine Learning (cs.LG)

Social network based applications have experienced exponential growth in recent years. One of the reasons for this rise is that this application domain offers a particularly fertile place to test and develop the most advanced computational techniques to extract valuable information from the Web. The main contribution of this work is three-fold: (1) we provide an up-to-date literature review of the state of the art on social network analysis (SNA); (2) we propose a set of new metrics based on four essential features (or dimensions) in SNA; (3) finally, we provide a quantitative analysis of a set of popular SNA tools and frameworks. We have also performed a scientometric study to detect the most active research areas and application domains in this area. This work proposes the definition of four different dimensions, namely Pattern & Knowledge discovery, Information Fusion & Integration, Scalability, and Visualization, which are used to define a set of new metrics (termed degrees) in order to evaluate the different software tools and frameworks of SNA (a set of 20 SNA software tools is analyzed and ranked according to these metrics). These dimensions, together with the defined degrees, allow evaluating and measuring the maturity of social network technologies, both to provide a quantitative assessment of them and to shed light on the challenges and future trends in this active area.

[6]  arXiv:2002.09504 [pdf, ps, other]
Title: Robust Numerical Tracking of One Path of a Polynomial Homotopy on Parallel Shared Memory Computers
Subjects: Numerical Analysis (math.NA); Distributed, Parallel, and Cluster Computing (cs.DC); Symbolic Computation (cs.SC)

We consider the problem of tracking one solution path defined by a polynomial homotopy on a parallel shared memory computer. Our robust path tracker applies Newton's method on power series to locate the closest singular parameter value. On top of that, it computes singular values of the Hessians of the polynomials in the homotopy to estimate the distance to the nearest different path. Together, these estimates are used to compute an appropriate adaptive stepsize. For n-dimensional problems, the cost overhead of our robust path tracker is O(n), compared to the commonly used predictor-corrector methods. This cost overhead can be reduced by a multithreaded program on a parallel shared memory computer.

[7]  arXiv:2002.09505 [pdf, other]
Title: Estimating Q(s,s') with Deep Deterministic Dynamics Gradients
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by sub-optimal or completely random policies. Code and videos are available at sites.google.com/view/qss-paper.

[8]  arXiv:2002.09511 [pdf, other]
Title: Chronofold: a data structure for versioned text
Subjects: Data Structures and Algorithms (cs.DS)

Collaborative text editing and versioning is known to be a tough topic. Diffs, OT and CRDT are three relevant classes of algorithms which all have their issues. CRDT is the only one that works correctly and deterministically in a distributed environment, at the unfortunate cost of data structure complexity and metadata overheads.
A chronofold is a data structure for editable linear collections based on the Causal Tree CRDT model. A chronofold maintains time-ordering and space-ordering of its elements. Simply put, it is both a log and a text at the same time, which makes it very convenient for text versioning and synchronization. Being a simple array-based data structure with O(1) insertions, chronofold makes CRDT overheads acceptable for many practical applications.

[9]  arXiv:2002.09516 [pdf, other]
Title: Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

This paper studies the statistical theory of batch data reinforcement learning with function approximation. Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history generated by unknown behavioral policies. We study a regression-based fitted Q iteration method, and show that it is equivalent to a model-based method that estimates a conditional mean embedding of the transition operator. We prove that this method is information-theoretically optimal and has nearly minimal estimation error. In particular, by leveraging the contraction property of Markov processes and martingale concentration, we establish a finite-sample instance-dependent error upper bound and a nearly-matching minimax lower bound. The policy evaluation error depends sharply on a restricted $\chi^2$-divergence over the function class between the long-term distribution of the target policy and the distribution of past data. This restricted $\chi^2$-divergence is both instance-dependent and function-class-dependent. It characterizes the statistical limit of off-policy evaluation. Further, we provide an easily computable confidence bound for the policy evaluator, which may be useful for optimistic planning and safe policy improvement.

[10]  arXiv:2002.09518 [pdf, other]
Title: Memory-Based Graph Networks
Comments: ICLR 2020
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Graph neural networks (GNNs) are a class of deep models that operate on data with arbitrary topology represented as graphs. We introduce an efficient memory layer for GNNs that can jointly learn node representations and coarsen the graph. We also introduce two new networks based on this layer: memory-based GNN (MemGNN) and graph memory network (GMN) that can learn hierarchical graph representations. The experimental results show that the proposed models achieve state-of-the-art results in eight out of nine graph classification and regression benchmarks. We also show that the learned representations could correspond to chemical features in the molecule data. Code and reference implementations are released at: https://github.com/amirkhas/GraphMemoryNet

[11]  arXiv:2002.09519 [pdf, ps, other]
Title: Exponential Amortized Resource Analysis
Subjects: Programming Languages (cs.PL)

Automatic amortized resource analysis (AARA) is a type-based technique for inferring concrete (non-asymptotic) bounds on a program's resource usage. Existing work on AARA has focused on bounds that are polynomial in the sizes of the inputs. This paper presents an extension of AARA to exponential bounds that preserves the benefits of the technique, such as compositionality and efficient type inference based on linear constraint solving. A key idea is the use of the Stirling numbers of the second kind as the basis of potential functions, which play the same role as the binomial coefficients in polynomial AARA. To formalize the similarities with the existing analyses, the paper presents a general methodology for AARA that is instantiated to the polynomial version, the exponential version, and a combined system with potential functions that are formed by products of Stirling numbers and binomial coefficients. The soundness of exponential AARA is proved with respect to an operational cost semantics, and the analysis of representative example programs demonstrates the effectiveness of the new analysis.
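
As background for the role of Stirling numbers in the abstract above, the following Python sketch illustrates two standard combinatorial facts; it is not the paper's type system or potential-function definition. First, the Stirling numbers of the second kind satisfy a Pascal-like recurrence, S(n, k) = k·S(n-1, k) + S(n-1, k-1). Second, they express exponentials directly, for example S(n+1, 2) = 2^n - 1. The function name `stirling2` is chosen here for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of ways to partition an n-element set into k non-empty blocks."""
    if n == k:
        return 1            # includes S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # The n-th element either joins one of the k existing blocks
    # or forms a new block on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


# Exponential growth expressed through Stirling numbers: S(n + 1, 2) = 2**n - 1.
exponential_identity_holds = all(
    stirling2(n + 1, 2) == 2 ** n - 1 for n in range(1, 20)
)
```

The shape of the recurrence (a sum of two entries from the previous row) is what parallels Pascal's rule for binomial coefficients.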

[12]  arXiv:2002.09523 [pdf, other]
Title: Struct-MMSB: Mixed Membership Stochastic Blockmodels with Interpretable Structured Priors
Comments: ECAI 2020
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

The mixed membership stochastic blockmodel (MMSB) is a popular framework for community detection and network generation. It learns a low-rank mixed membership representation for each node across communities by exploiting the underlying graph structure. MMSB assumes that the membership distributions of the nodes are independently drawn from a Dirichlet distribution, which limits its capability to model highly correlated graph structures that exist in real-world networks. In this paper, we present a flexible richly structured MMSB model, Struct-MMSB, that uses a recently developed statistical relational learning model, hinge-loss Markov random fields (HL-MRFs), as a structured prior to model complex dependencies among node attributes, multi-relational links, and their relationship with mixed-membership distributions. Our model is specified using a probabilistic programming templating language that uses weighted first-order logic rules, which enhances the model's interpretability. Further, our model is capable of learning latent characteristics in real-world networks via meaningful latent variables encoded as a complex combination of observed features and membership distributions. We present an expectation-maximization based inference algorithm that learns latent variables and parameters iteratively, a scalable stochastic variation of the inference algorithm, and a method to learn the weights of HL-MRF structured priors. We evaluate our model on six datasets across three different types of networks and corresponding modeling scenarios and demonstrate that our models are able to achieve an improvement of 15% on average in test log-likelihood and faster convergence when compared to state-of-the-art network models.

[13]  arXiv:2002.09533 [pdf, other]
Title: Real-Time Visualization in Non-Isotropic Geometries
Subjects: Graphics (cs.GR); Differential Geometry (math.DG)

Non-isotropic geometries are of interest to low-dimensional topologists, physicists and cosmologists. However, they are challenging to comprehend and visualize. We present novel methods of computing real-time native geodesic rendering of non-isotropic geometries. Our methods can be applied not only to visualization but are also essential for potential applications in machine learning and video games.

[14]  arXiv:2002.09534 [pdf, other]
Title: Hyperbolic Minesweeper is in P
Authors: Eryk Kopczyński
Subjects: Computational Complexity (cs.CC); Artificial Intelligence (cs.AI)

We show that, while Minesweeper is NP-complete, its hyperbolic variant is in P. Our proof does not rely on the rules of Minesweeper, but is valid for any puzzle based on satisfying local constraints on a graph embedded in the hyperbolic plane.

[15]  arXiv:2002.09535 [pdf, other]
Title: RobustPeriod: Time-Frequency Mining for Robust Multiple Periodicities Detection
Comments: 9 pages, 7 figures, and 4 tables
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Applications (stat.AP); Machine Learning (stat.ML)

Periodicity detection is an important task in time series analysis as it plays a crucial role in many time series tasks such as classification, clustering, compression, anomaly detection, and forecasting. It is challenging for the following reasons: (1) complicated non-stationary time series; (2) dynamic and complicated periodic patterns, including multiple interlaced periodic components; (3) outliers and noise. In this paper, we propose a robust periodicity detection algorithm to address these challenges. Our algorithm applies the maximal overlap discrete wavelet transform to transform the time series into multiple temporal-frequency scales such that different periodicities can be isolated. We rank them by wavelet variance, and then at each scale we propose the Huber-periodogram, which formulates the periodogram as the solution to an M-estimator in order to introduce robustness. We rigorously prove the theoretical properties of the Huber-periodogram and justify the use of Fisher's test on the Huber-periodogram for periodicity detection. To further refine the detected periods, we compute the unbiased autocorrelation function from the Huber-periodogram based on the Wiener-Khinchin theorem for improved robustness and efficiency. Experiments on synthetic and real-world datasets show that our algorithm outperforms other popular ones for both single and multiple periodicity detection. It is now implemented and provided as a public online service at Alibaba Group and has been used extensively in different business lines.
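
For readers unfamiliar with periodogram-based detection, the sketch below shows the classical, non-robust baseline in plain Python: an ordinary periodogram followed by Fisher's g statistic (the largest periodogram ordinate divided by the sum of all ordinates). It deliberately uses the ordinary least-squares periodogram, not the Huber-periodogram proposed in the paper, and omits the wavelet decomposition and autocorrelation refinement; the function names and the test signal are assumptions made for illustration.

```python
import cmath
import math


def periodogram(x):
    """Ordinary periodogram at the Fourier frequencies k/n, k = 1 .. (n-1)//2."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    ordinates = []
    for k in range(1, (n - 1) // 2 + 1):
        coef = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                   for t, v in enumerate(centered))
        ordinates.append(abs(coef) ** 2 / n)
    return ordinates


def fisher_g(ordinates):
    """Fisher's g statistic: share of total power held by the strongest frequency."""
    return max(ordinates) / sum(ordinates)


# Illustrative signal: a pure sinusoid with period 8 sampled at 64 points.
n = 64
series = [math.sin(2 * math.pi * t / 8) for t in range(n)]
pg = periodogram(series)
k_peak = pg.index(max(pg)) + 1   # +1 because the list starts at k = 1
period = n / k_peak              # candidate period in samples
g = fisher_g(pg)
```

A value of `g` close to 1 means a single frequency dominates; a single large outlier in `series` would spread power across all frequencies and lower it, which is the weakness a robust periodogram is meant to address.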

[16]  arXiv:2002.09536 [pdf, other]
Title: Image to Language Understanding: Captioning approach
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Extracting context from visual representations is of utmost importance in the advancement of Computer Science. Representing such content in natural language has a wide variety of applications, such as helping the visually impaired. Such an approach combines Computer Vision and Natural Language techniques, which makes it a hard problem to solve. This project aims to compare different approaches for solving the image captioning problem. Specifically, the focus was on comparing two different types of models: an encoder-decoder approach and a multi-modal approach. In the encoder-decoder approach, inject and merge architectures were compared against a multi-modal image captioning approach based primarily on object detection. These approaches have been compared on the basis of state-of-the-art sentence comparison metrics such as BLEU, GLEU, Meteor, and Rouge on a subset of the Google Conceptual Captions dataset which contains 100k images. On the basis of this comparison, we observed that the best model was the Inception injected encoder model. This best approach has been deployed as a web-based system. On uploading an image, the system outputs the best caption associated with the image.

[17]  arXiv:2002.09539 [pdf, other]
Title: Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD
Comments: Accepted to ICASSP 2020
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)

Distributed stochastic gradient descent (SGD) is essential for scaling machine learning algorithms to a large number of computing nodes. However, infrastructure variability such as high communication delay or random node slowdown greatly impedes the performance of distributed SGD algorithms, especially in wireless systems or sensor networks. In this paper, we propose an algorithmic approach named Overlap-Local-SGD (and its momentum variant) to overlap communication and computation so as to speed up the distributed training procedure. The approach can help to mitigate straggler effects as well. We achieve this by adding an anchor model on each node. After multiple local updates, locally trained models will be pulled back towards the synchronized anchor model rather than communicating with others. Experimental results of training a deep neural network on the CIFAR-10 dataset demonstrate the effectiveness of Overlap-Local-SGD. We also provide a convergence guarantee for the proposed algorithm under non-convex objective functions.
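
The anchor-model idea in the abstract above can be illustrated with a toy single-process simulation in Python. This is one possible reading of the abstract, not the paper's exact update rule: the quadratic objectives, the pull-back coefficient `alpha`, the one-round staleness of the anchor, and all names are assumptions made for illustration. In a real system the averaging would run concurrently with the next round of local steps; here that overlap is only modelled by pulling towards an anchor that is one round old.

```python
def overlap_local_sgd(centers, lr=0.1, tau=5, alpha=0.5, rounds=100):
    """Toy simulation with one scalar model per worker.

    Worker i minimises f_i(x) = 0.5 * (x - centers[i])**2, so the minimiser
    of the average objective is the mean of `centers`.
    """
    models = [0.0 for _ in centers]
    anchor = 0.0  # synchronized anchor model, identical on every worker
    for _ in range(rounds):
        # Local computation: tau gradient steps per worker, no communication.
        for i, c in enumerate(centers):
            x = models[i]
            for _ in range(tau):
                x -= lr * (x - c)       # gradient of f_i at x is (x - c)
            models[i] = x
        # Averaging is launched here; its result is only used in the NEXT round.
        new_average = sum(models) / len(models)
        # Pull each local model towards the (one-round-stale) anchor
        # instead of blocking until the fresh average is available.
        models = [x - alpha * (x - anchor) for x in models]
        anchor = new_average
    return anchor, models


anchor, models = overlap_local_sgd([1.0, 2.0, 6.0])
```

In this toy setting the anchor settles at the mean of the per-worker minimisers even though every pull-back uses stale information, while the individual worker models remain biased towards their own local minimisers.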

[18]  arXiv:2002.09541 [pdf]
Title: Evaluation of Automatic FPGA Offloading for Loop Statements of Applications
Authors: Yoji Yamato
Comments: 7 pages, 4 figures, in Japanese, IEICE Technical Report, SWIM2019-25
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In recent years, with Moore's law predicted to slow down, the use of hardware other than CPUs, such as energy-efficient FPGAs, is increasing. However, when using heterogeneous hardware other than CPUs, the barrier of technical skills such as OpenCL is high. Based on that, I have proposed environment-adaptive software that enables automatic conversion, configuration, and high-performance operation of once-written code, according to the hardware on which it is placed. Part of the offloading to the GPU was automated previously. In this paper, I propose and evaluate a method for automatically extracting appropriate offload-target loop statements from source code as the first step of offloading to FPGA. I evaluate the effectiveness of the proposed method using an existing application.

[19]  arXiv:2002.09543 [pdf, other]
Title: Modelling Latent Skills for Multitask Language Generation
Subjects: Computation and Language (cs.CL)

We present a generative model for multitask conditional language generation. Our guiding hypothesis is that a shared set of latent skills underlies many disparate language generation tasks, and that explicitly modelling these skills in a task embedding space can help with both positive transfer across tasks and with efficient adaptation to new tasks. We instantiate this task embedding space as a latent variable in a latent variable sequence-to-sequence model. We evaluate this hypothesis by curating a series of monolingual text-to-text language generation datasets - covering a broad range of tasks and domains - and comparing the performance of models both in the multitask and few-shot regimes. We show that our latent task variable model outperforms other sequence-to-sequence baselines on average across tasks in the multitask setting. In the few-shot learning setting on an unseen test dataset (i.e., a new task), we demonstrate that model adaptation based on inference in the latent task space is more robust than standard fine-tuning based parameter adaptation and performs comparably in terms of overall performance. Finally, we examine the latent task representations learnt by our model and show that they cluster tasks in a natural way.

[20]  arXiv:2002.09545 [pdf, other]
Title: RobustTAD: Robust Time Series Anomaly Detection via Decomposition and Convolutional Neural Networks
Comments: 9 pages, 5 figures, and 2 tables
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Applications (stat.AP); Machine Learning (stat.ML)

The monitoring and management of numerous and diverse time series data at Alibaba Group calls for an effective and scalable time series anomaly detection service. In this paper, we propose RobustTAD, a Robust Time series Anomaly Detection framework by integrating robust seasonal-trend decomposition and convolutional neural network for time series data. The seasonal-trend decomposition can effectively handle complicated patterns in time series, and meanwhile significantly simplifies the architecture of the neural network, which is an encoder-decoder architecture with skip connections. This architecture can effectively capture the multi-scale information from time series, which is very useful in anomaly detection. Due to the limited labeled data in time series anomaly detection, we systematically investigate data augmentation methods in both time and frequency domains. We also introduce label-based weight and value-based weight in the loss function by utilizing the unbalanced nature of the time series anomaly detection problem. Compared with the widely used forecasting-based anomaly detection algorithms, decomposition-based algorithms, traditional statistical algorithms, as well as recent neural network based algorithms, RobustTAD performs significantly better on public benchmark datasets. It is deployed as a public online service and widely adopted in different business scenarios at Alibaba Group.

[21]  arXiv:2002.09546 [pdf, other]
Title: IMDfence: Architecting a Secure Protocol for Implantable Medical Devices
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)

Over the past decade, focus on the security and privacy aspects of implantable medical devices (IMDs) has intensified, driven by the multitude of cybersecurity vulnerabilities found in various existing devices. However, due to their strict computational, energy and physical constraints, conventional security protocols are not directly applicable to IMDs. Custom-tailored schemes have been proposed instead which, however, fail to cover the full spectrum of security features that modern IMDs and their ecosystems so critically require. In this paper we propose IMDfence, a security protocol for IMD ecosystems that provides a comprehensive yet practical security portfolio, which includes availability, non-repudiation, access control, emergency access, entity authentication, remote monitoring and system scalability. The performance of the security protocol as well as its feasibility and impact on modern IMDs are extensively analyzed and evaluated. We find that IMDfence achieves the above security requirements at a mere 4.64% increase in total IMD energy consumption, and less than 14 ms and 6 kB increase in system delay and memory footprint respectively.

[22]  arXiv:2002.09553 [pdf, ps, other]
Title: Sequential decomposition of discrete memoryless channel with noisy feedback
Authors: Deepanshu Vasal
Subjects: Systems and Control (eess.SY); Information Theory (cs.IT)

In this paper, we consider a discrete memoryless point-to-point channel with noisy feedback, where there is a sender with a private message that she wants to communicate to a receiver by sequentially transmitting symbols over a noisy channel. After each transmission, she receives noisy feedback of the symbol received by the receiver. The goal is to design a transmission control strategy for the sender that minimizes the average probability of error. This is an instance of decentralized control of information where the two controllers, the sender and the receiver, have no common information. There exists no methodology in the literature that provides a notion of "state" and a dynamic program to find optimal policies for this problem. In this paper, we introduce a notion of state, based on which we provide a sequential decomposition methodology that finds optimum policies within the class of Markov strategies with respect to this state (which need not be globally optimum). This allows the problem to be decomposed across time and reduces the dependence of the complexity on time from double exponential to linear.

[23]  arXiv:2002.09554 [pdf, other]
Title: Particle Filter Based Monocular Human Tracking with a 3D Cardbox Model and a Novel Deterministic Resampling Strategy
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

The challenge of markerless human motion tracking is the high dimensionality of the search space. Thus, efficient exploration in the search space is of great significance. In this paper, a motion capturing algorithm is proposed for upper body motion tracking. The proposed system tracks human motion based on monocular silhouette-matching, and it is built on top of a hierarchical particle filter, within which a novel deterministic resampling strategy (DRS) is applied. The proposed system is evaluated quantitatively with the ground truth data measured by an inertial sensor system. In addition, we compare the DRS with the stratified resampling strategy (SRS). It is shown in experiments that DRS outperforms SRS with the same number of particles. Moreover, a new 3D articulated human upper body model with the name 3D cardbox model is created and is proven to work successfully for motion tracking. Experiments show that the proposed system can robustly track upper body motion without self-occlusion. Motions towards the camera can also be well tracked.
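
The stratified resampling strategy (SRS) used as the baseline in the abstract above is a standard particle-filter step, sketched here in plain Python. The paper's own deterministic resampling strategy (DRS) is not described in the abstract and is not reproduced; the function name and the `rng` parameter are illustrative choices.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def stratified_resample(weights, rng=random):
    """Return len(weights) particle indices drawn by stratified resampling.

    The unit interval is split into N equal strata and exactly one uniform
    sample is drawn inside each stratum, which lowers the resampling variance
    compared with drawing N independent uniforms.
    """
    n = len(weights)
    total = sum(weights)
    cdf = list(accumulate(w / total for w in weights))
    cdf[-1] = 1.0  # guard against floating-point round-off in the last entry
    indices = []
    for i in range(n):
        u = (i + rng.random()) / n              # one point in [i/n, (i+1)/n)
        # First particle whose cumulative weight exceeds u.
        indices.append(min(bisect_right(cdf, u), n - 1))
    return indices


rng = random.Random(0)
degenerate = stratified_resample([0.0, 0.0, 1.0, 0.0], rng)   # one heavy particle
uniform = stratified_resample([0.25, 0.25, 0.25, 0.25], rng)  # equal weights
```

With equal weights every stratum coincides with one particle's share of the cumulative distribution, so each particle is kept exactly once; with all weight on one particle, every draw returns that particle.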

[24]  arXiv:2002.09560 [pdf, other]
Title: Practical Verification of MapReduce Computation Integrity via Partial Re-execution
Comments: 12 pages
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

Big data processing is often outsourced to powerful but untrusted cloud service providers that provide agile and scalable computing resources to weaker clients. However, untrusted cloud services do not ensure the integrity of data and computations, while clients have no control over the outsourced computation and no means to check the correctness of the execution. Despite a growing interest and recent progress in verifiable computation, the existing techniques are still not practical enough for big data processing due to high verification overhead. In this paper, we present a solution called V-MR (Verifiable MapReduce), which is a framework that verifies the integrity of MapReduce computation outsourced to the untrusted cloud via partial re-execution. V-MR is practically effective and efficient in that (1) it can detect violations of MapReduce computation integrity and identify the malicious workers that produced the incorrect computation; and (2) it can reduce the overhead of verification via partial re-execution with carefully selected input data and program code using program analysis. The experimental results of a prototype of V-MR show that V-MR can verify the integrity of MapReduce computation effectively with small overhead for partial re-execution.

[25]  arXiv:2002.09561 [pdf, other]
Title: Performance / Complexity Trade-offs of the Sphere Decoder Algorithm for Massive MIMO Systems
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP)

Massive MIMO systems are seen by many researchers as a paramount technology toward next generation networks. This technology consists of hundreds of antennas that are capable of sending and receiving simultaneously a huge amount of data. One of the main challenges when using this technology is the necessity of an efficient decoding framework. The latter must guarantee both a low complexity and a good signal detection accuracy. The Sphere Decoder (SD) algorithm represents one of the promising decoding algorithms in terms of detection accuracy. However, it is inefficient for dealing with large MIMO systems due to its prohibitive complexity. To overcome this drawback, we propose to revisit the sequential SD algorithm and implement several variants that aim at finding appropriate trade-offs between complexity and performance. Then, we propose an efficient high-level parallel SD scheme based on the master/worker paradigm, which permits multiple SD instances to simultaneously explore the search space, while mitigating the overheads from load imbalance. The results of our parallel SD implementation outperform the state-of-the-art by more than 5x using similar MIMO configuration systems, and show a super-linear speedup on multicore platforms. Moreover, this paper presents a new hybrid implementation that combines the strengths of SD and K-best algorithms, i.e., maintaining the detection accuracy of SD, while reducing the complexity using the K-best way of pruning search space. The hybrid approach extends our parallel SD implementation: the master contains the SD search tree, and the workers use the K-best algorithm to accelerate its exploration. The resulting hybrid approach enhances the diversification gain, and therefore, lowers the overall complexity. Our synergistic hybrid approach makes it possible to handle large MIMO configurations up to 100x100 without sacrificing accuracy or complexity.

[26]  arXiv:2002.09563 [pdf, ps, other]
Title: Structure-Preserving and Efficient Numerical Methods for Ion Transport
Subjects: Numerical Analysis (math.NA)

Ion transport, often described by the Poisson--Nernst--Planck (PNP) equations, is ubiquitous in electrochemical devices and many biological processes of significance. In this work, we develop conservative, positivity-preserving, energy dissipating, and implicit finite difference schemes for solving the multi-dimensional PNP equations with multiple ionic species. A central-differencing discretization based on harmonic-mean approximations is employed for the Nernst--Planck (NP) equations. The backward Euler discretization in time is employed to derive a fully implicit nonlinear system, which is efficiently solved by a newly proposed Newton's method. The improved computational efficiency of the Newton's method originates from the usage of the electrostatic potential as the iteration variable, rather than the unknowns of the nonlinear system that involves both the potential and concentration of multiple ionic species. Numerical analysis proves that the numerical schemes respect three desired analytical properties (conservation, positivity preservation, and energy dissipation) fully discretely. Based on advantages brought by the harmonic-mean approximations, we are able to establish an estimate of the upper bound of the condition numbers of coefficient matrices in linear systems that are solved iteratively. The solvability and stability of the linearized problem in the Newton's method are rigorously established as well. Numerical tests are performed to confirm the anticipated numerical accuracy, computational efficiency, and structure-preserving properties of the developed schemes. Adaptive time stepping is implemented for further efficiency improvement. Finally, the proposed numerical approaches are applied to characterize ion transport subject to a sinusoidal applied potential.

[27]  arXiv:2002.09564 [pdf, other]
Title: Towards Robust and Reproducible Active Learning Using Neural Networks
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Active learning (AL) is a promising ML paradigm that has the potential to parse through large unlabeled data and help reduce annotation cost in domains where labeling entire data can be prohibitive. Recently proposed neural network based AL methods use different heuristics to accomplish this goal. In this study, we show that recent AL methods offer a gain over random baseline under a brittle combination of experimental conditions. We demonstrate that such marginal gains vanish when experimental factors are changed, leading to reproducibility issues and suggesting that AL methods lack robustness. We also observe that with a properly tuned model, which employs recently proposed regularization techniques, the performance significantly improves for all AL methods including the random sampling baseline, and performance differences among the AL methods become negligible. Based on these observations, we suggest a set of experiments that are critical to assess the true effectiveness of an AL method. To facilitate these experiments we also present an open source toolkit. We believe our findings and recommendations will help advance reproducible research in robust AL using neural networks.

[28]  arXiv:2002.09565 [pdf, other]
Title: Adversarial Attacks on Machine Learning Systems for High-Frequency Trading
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Statistical Finance (q-fin.ST)

Algorithmic trading systems are often completely automated, and deep learning is increasingly receiving attention in this domain. Nonetheless, little is known about the robustness properties of these models. We study valuation models for algorithmic trading from the perspective of adversarial machine learning. We introduce new attacks specific to this domain with size constraints that minimize attack costs. We further discuss how these attacks can be used as an analysis tool to study and evaluate the robustness properties of financial models. Finally, we investigate the feasibility of realistic adversarial attacks in which an adversarial trader fools automated trading systems into making inaccurate predictions.

[29]  arXiv:2002.09570 [pdf, ps, other]
Title: Feedback game on Eulerian graphs
Comments: 16 pages, 12 figures
Subjects: Computer Science and Game Theory (cs.GT); Discrete Mathematics (cs.DM); Combinatorics (math.CO)

In this paper, we introduce a two-player impartial game on graphs, called a feedback game, which is a variant of generalized geography. We study the feedback game on Eulerian graphs. In particular, we show the PSPACE-completeness of the game and determine the winner of the game on several classes of Eulerian graphs.

[30]  arXiv:2002.09571 [pdf, other]
Title: Learning to Continually Learn
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Continual lifelong learning requires an agent or model to learn many sequentially ordered tasks, building on previous knowledge without catastrophically forgetting it. Much work has gone towards preventing the default tendency of machine learning models to catastrophically forget, yet virtually all such work involves manually-designed solutions to the problem. We instead advocate meta-learning a solution to catastrophic forgetting, allowing AI to learn to continually learn. Inspired by neuromodulatory processes in the brain, we propose A Neuromodulated Meta-Learning Algorithm (ANML). It differentiates through a sequential learning process to meta-learn an activation-gating function that enables context-dependent selective activation within a deep neural network. Specifically, a neuromodulatory (NM) neural network gates the forward pass of another (otherwise normal) neural network called the prediction learning network (PLN). The NM network also thus indirectly controls selective plasticity (i.e. the backward pass of) the PLN. ANML enables continual learning without catastrophic forgetting at scale: it produces state-of-the-art continual learning performance, sequentially learning as many as 600 classes (over 9,000 SGD updates).
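
The gating idea described in this abstract can be illustrated with a minimal sketch. It assumes single-layer networks, a sigmoid gate, and element-wise multiplication; the abstract does not state these details, and all function and parameter names below are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def gated_forward(x, nm_weights, pln_hidden_weights, pln_out_weights):
    """Forward pass of a prediction network whose hidden layer is gated.

    Assumption: the neuromodulatory (NM) network emits one gate value in
    (0, 1) per hidden unit of the prediction learning network (PLN).
    """
    gate = [sigmoid(z) for z in linear(nm_weights, x)]
    hidden = [max(0.0, z) for z in linear(pln_hidden_weights, x)]  # ReLU
    gated = [g * h for g, h in zip(gate, hidden)]  # element-wise gating
    return linear(pln_out_weights, gated)
```

Under this multiplicative form, a gate value near zero also scales down the gradient flowing back through the corresponding hidden unit, which is one way to read the abstract's remark that the NM network indirectly controls plasticity in the PLN.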

[31]  arXiv:2002.09572 [pdf, other]
Title: The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Comments: Accepted as a spotlight at ICLR 2020. The last two authors contributed equally
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The early phase of training of deep neural networks is critical for their final performance. In this work, we study how the hyperparameters of stochastic gradient descent (SGD) used in the early phase of training affect the rest of the optimization trajectory. We argue for the existence of the "break-even" point on this trajectory, beyond which the curvature of the loss surface and noise in the gradient are implicitly regularized by SGD. In particular, we demonstrate on multiple classification tasks that using a large learning rate in the initial phase of training reduces the variance of the gradient, and improves the conditioning of the covariance of gradients. These effects are beneficial from the optimization perspective and become visible after the break-even point. Complementing prior work, we also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers. In short, our work shows that key properties of the loss surface are strongly influenced by SGD in the early phase of training. We argue that studying the impact of the identified effects on generalization is a promising future direction.

[32]  arXiv:2002.09574 [pdf, other]
Title: Coded Federated Learning
Comments: Presented at the Wireless Edge Intelligence Workshop, IEEE GLOBECOM 2019
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Federated learning is a method of training a global model from decentralized data distributed across client devices. Here, model parameters are computed locally by each client device and exchanged with a central server, which aggregates the local models for a global view, without requiring sharing of training data. The convergence performance of federated learning is severely impacted in heterogeneous computing platforms such as those at the wireless edge, where straggling computations and communication links can significantly limit timely model parameter updates. This paper develops a novel coded computing technique for federated learning to mitigate the impact of stragglers. In the proposed Coded Federated Learning (CFL) scheme, each client device privately generates parity training data and shares it with the central server only once at the start of the training phase. The central server can then preemptively perform redundant gradient computations on the composite parity data to compensate for the erased or delayed parameter updates. Our results show that CFL allows the global model to converge nearly four times faster when compared to an uncoded approach.

[33]  arXiv:2002.09575 [pdf, other]
Title: A Multi-Channel Neural Graphical Event Model with Negative Evidence
Comments: AAAI 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Event datasets are sequences of events of various types occurring irregularly over the time-line, and they are increasingly prevalent in numerous domains. Existing work for modeling events using conditional intensities relies either on some underlying parametric form to capture historical dependencies, or on non-parametric models that focus primarily on tasks such as prediction. We propose a non-parametric deep neural network approach in order to estimate the underlying intensity functions. We use a novel multi-channel RNN that optimally reinforces the negative evidence of no observable events with the introduction of fake event epochs within each consecutive inter-event interval. We evaluate our method against state-of-the-art baselines on model fitting tasks as gauged by log-likelihood. Through experiments on both synthetic and real-world datasets, we find that our proposed approach outperforms existing baselines on most of the datasets studied.

[34]  arXiv:2002.09576 [pdf, other]
Title: UnMask: Adversarial Detection and Defense Through Robust Feature Alignment
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Deep learning models are being integrated into a wide range of high-impact, security-critical systems, from self-driving cars to medical diagnosis. However, recent research has demonstrated that many of these deep learning architectures are vulnerable to adversarial attacks--highlighting the vital need for defensive techniques to detect and mitigate these attacks before they occur. To combat these adversarial attacks, we developed UnMask, an adversarial detection and defense framework based on robust feature alignment. The core idea behind UnMask is to protect these models by verifying that an image's predicted class ("bird") contains the expected robust features (e.g., beak, wings, eyes). For example, if an image is classified as "bird", but the extracted features are wheel, saddle and frame, the model may be under attack. UnMask detects such attacks and defends the model by rectifying the misclassification, re-classifying the image based on its robust features. Our extensive evaluation shows that UnMask (1) detects up to 96.75% of attacks, with a false positive rate of 9.66% and (2) defends the model by correctly classifying up to 93% of adversarial images produced by the current strongest attack, Projected Gradient Descent, in the gray-box setting. UnMask provides significantly better protection than adversarial training across 8 attack vectors, averaging 31.18% higher accuracy. Our proposed method is architecture agnostic and fast. We open source the code repository and data with this paper: https://github.com/unmaskd/unmask.
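
The bird example in this abstract can be sketched as a set comparison. The sketch below assumes a Jaccard overlap score and a fixed threshold, neither of which is stated in the abstract; the feature table and the name `check_prediction` are hypothetical and are not taken from the linked repository.

```python
# Hypothetical class-to-feature table, for illustration only.
EXPECTED_FEATURES = {
    "bird": {"beak", "wings", "eyes"},
    "bicycle": {"wheel", "saddle", "frame"},
}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def check_prediction(predicted_class, extracted, threshold=0.5):
    """Return (is_suspicious, best_matching_class).

    A prediction is flagged when the features extracted from the image
    overlap too little with the features expected for the predicted class.
    """
    score = jaccard(EXPECTED_FEATURES[predicted_class], extracted)
    suspicious = score < threshold
    # Re-classify by choosing the class whose expected features fit best.
    best = max(
        EXPECTED_FEATURES,
        key=lambda c: jaccard(EXPECTED_FEATURES[c], extracted),
    )
    return suspicious, best
```

With this table, an image predicted as "bird" but yielding the features wheel, saddle and frame is flagged and re-assigned to "bicycle", mirroring the example in the abstract.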

[35]  arXiv:2002.09577 [pdf, other]
Title: Emulating duration and curvature of coral snake anti-predator thrashing behaviors using a soft-robotic platform
Comments: 6 pages, 7 figures
Subjects: Robotics (cs.RO)

This paper presents a soft-robotic platform for exploring the ecological relevance of non-locomotory movements via animal-robot interactions. Coral snakes (genus Micrurus) and their mimics use vigorous, non-locomotory, and arrhythmic thrashing to deter predation. There is variation across snake species in the duration and curvature of anti-predator thrashes, and it is unclear how these aspects of motion interact to contribute to snake survival. In this work, soft robots composed of fiber-reinforced elastomeric enclosures (FREEs) are developed to emulate the anti-predator behaviors of three genera of snake. Curvature and duration of motion are estimated for both live snakes and robots, providing a quantitative assessment of the robots' ability to emulate snake poses. The curvature values of the fabricated soft-robotic head, midsection, and tail segments are found to overlap with those exhibited by live snakes. Soft robot motion durations were less than or equal to those of snakes for all three genera. Additionally, combinations of segments were selected to emulate three specific snake genera with distinct anti-predatory behavior, producing curvature values that aligned well with live snake observations.

[36]  arXiv:2002.09579 [pdf, other]
Title: Robustness to Programmable String Transformations via Augmented Abstract Training
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Deep neural networks for natural language processing tasks are vulnerable to adversarial input perturbations. In this paper, we present a versatile language for programmatically specifying string transformations -- e.g., insertions, deletions, substitutions, swaps, etc. -- that are relevant to the task at hand. We then present an approach to adversarially training models that are robust to such user-defined string transformations. Our approach combines the advantages of search-based techniques for adversarial training with abstraction-based techniques. Specifically, we show how to decompose a set of user-defined string transformations into two component specifications, one that benefits from search and another from abstraction. We use our technique to train models on the AG and SST2 datasets and show that the resulting models are robust to combinations of user-defined transformations mimicking spelling mistakes and other meaning-preserving transformations.

[37]  arXiv:2002.09581 [pdf]
Title: Extracting and Validating Explanatory Word Archipelagoes using Dual Entropy
Comments: 7 pages, 2 figures, 2 columns
Subjects: Computation and Language (cs.CL)

The logical connectivity of text is represented by the connectivity of words that form archipelagoes. Here, each archipelago is a sequence of islands of the occurrences of a certain word. An island here means the local sequence of sentences where the word is emphasized, and an archipelago of a length comparable to the target text is extracted using the co-variation of entropy A (the window-based entropy) on the distribution of the word's occurrences with the width of each time window. Then, the logical connectivity of text is evaluated on entropy B (the graph-based entropy) computed on the distribution of sentences to connected word-clusters obtained on the co-occurrence of words. The results show the parts of the target text with words forming archipelagoes extracted on entropy A, without learned or prepared knowledge, form an explanatory part of the text that is of smaller entropy B than the parts extracted by the baseline methods.

[38]  arXiv:2002.09587 [pdf, ps, other]
Title: The Sample Complexity of Meta Sparse Regression
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper addresses the meta-learning problem in sparse linear regression with infinite tasks. We assume that the learner can access several similar tasks. The goal of the learner is to transfer knowledge from the prior tasks to a similar but novel task. For $p$ parameters, size of the support set $k$, and $l$ samples per task, we show that $T \in O((k \log p)/l)$ tasks are sufficient in order to recover the common support of all tasks. With the recovered support, we can greatly reduce the sample complexity for estimating the parameter of the novel task, i.e., $l \in O(1)$ with respect to $T$ and $p$. We also prove that our rates are minimax optimal. A key difference between meta-learning and the classical multi-task learning is that meta-learning focuses only on the recovery of the parameters of the novel task, while multi-task learning estimates the parameter of all tasks, which requires $l$ to grow with $T$. Instead, our efficient meta-learning estimator allows for $l$ to be constant with respect to $T$ (i.e., few-shot learning).

[39]  arXiv:2002.09591 [pdf, other]
Title: Dynamics of large scale networks following a merger
Comments: 8 pages, 17 figures
Journal-ref: T. Ozyer and R. Alhajj (eds), Machine Learning Techniques for Online Social Networks, Lecture Notes in Social Networks, pages 173--193. Springer, 2018
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

We study the dynamic network of relationships among avatars in the massively multiplayer online game Planetside 2. In the spring of 2014, two separate servers of this game were merged, and as a result, two previously distinct networks were combined into one. We observed the evolution of this network in the seven month period following the merger and report our observations. We found that some structures of original networks persist in the combined network for a long time after the merger. As the original avatars are gradually removed, these structures slowly dissolve, but they remain observable for a surprisingly long time. We present a number of visualizations illustrating the post-merger dynamics and discuss time evolution of selected quantities characterizing the topology of the network.

[40]  arXiv:2002.09594 [pdf, other]
Title: OCGNN: One-class Classification with Graph Neural Networks
Comments: 7 pages, 2 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Nowadays, graph-structured data are increasingly used to model complex systems. Meanwhile, detecting anomalies in graphs has become a vital research problem of pressing societal concern. Anomaly detection is an unsupervised learning task of identifying rare data that differ from the majority. As one of the dominant anomaly detection algorithms, One Class Support Vector Machine has been widely used to detect outliers. However, such traditional anomaly detection methods lose their effectiveness on graph data. Since traditional anomaly detection methods are stable, robust and easy to use, it is vitally important to generalize them to graph data. In this work, we propose One Class Graph Neural Network (OCGNN), a one-class classification framework for graph anomaly detection. OCGNN is designed to combine the powerful representation ability of Graph Neural Networks with the classical one-class objective. Compared with other baselines, OCGNN achieves significant improvements in extensive experiments.

[41]  arXiv:2002.09595 [pdf]
Title: The Pragmatic Turn in Explainable Artificial Intelligence (XAI)
Authors: Andrés Páez
Journal-ref: Minds and Machines, 29(3), 441-459, 2019
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

In this paper I argue that the search for explainable models and interpretable decisions in AI must be reformulated in terms of the broader project of offering a pragmatic and naturalistic account of understanding in AI. Intuitively, the purpose of providing an explanation of a model or a decision is to make it understandable to its stakeholders. But without a previous grasp of what it means to say that an agent understands a model or a decision, the explanatory strategies will lack a well-defined goal. Aside from providing a clearer objective for XAI, focusing on understanding also allows us to relax the factivity condition on explanation, which is impossible to fulfill in many machine learning models, and to focus instead on the pragmatic conditions that determine the best fit between a model and the methods and devices deployed to understand it. After an examination of the different types of understanding discussed in the philosophical and psychological literature, I conclude that interpretative or approximation models not only provide the best way to achieve the objectual understanding of a machine learning model, but are also a necessary condition to achieve post-hoc interpretability. This conclusion is partly based on the shortcomings of the purely functionalist approach to post-hoc interpretability that seems to be predominant in most recent literature.

[42]  arXiv:2002.09597 [pdf, other]
Title: On Layered Fan-Planar Graph Drawings
Subjects: Computational Geometry (cs.CG); Discrete Mathematics (cs.DM)

In this paper, we study fan-planar drawings that use $h$ layers and are proper, i.e., edges connect adjacent layers. We show that if the embedding of the graph is fixed, then testing the existence of such drawings is fixed-parameter tractable in $h$, via a reduction to a similar result for planar graphs by Dujmović et al. If the embedding is not fixed, then we give partial results for $h=2$: It was already known how to test existence of fan-planar proper 2-layer drawings for 2-connected graphs, and we show here how to test this for trees. Along the way, we exhibit other interesting results for graphs with fan-planar proper $h$-layer drawings; in particular we bound their pathwidth and show that they have a bar-1-visibility representation.

[43]  arXiv:2002.09598 [pdf, ps, other]
Title: A characterization of proportionally representative committees
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Theoretical Economics (econ.TH)

A well-known axiom for proportional representation is Proportionality of Solid Coalitions (PSC). We characterize committees satisfying PSC as possible outcomes of the Minimal Demand rule, which generalizes an approach pioneered by Michael Dummett.

[44]  arXiv:2002.09599 [pdf, other]
Title: Training Question Answering Models From Synthetic Data
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Question and answer generation is a data augmentation method that aims to improve question answering (QA) models given the limited amount of human labeled data. However, a considerable gap remains between synthetic and human-generated question-answer pairs. This work aims to narrow this gap by taking advantage of large language models and explores several factors such as model size, quality of pretrained models, scale of data synthesized, and algorithmic choices. On the SQuAD1.1 question answering task, we achieve higher accuracy using solely synthetic questions and answers than when using the SQuAD1.1 training set questions alone. Removing access to real Wikipedia data, we synthesize questions and answers from a synthetic corpus generated by an 8.3 billion parameter GPT-2 model. With no access to human supervision and only access to other models, we are able to train state of the art question answering networks on entirely model-generated data that achieve 88.4 Exact Match (EM) and 93.9 F1 score on the SQuAD1.1 dev set. We further apply our methodology to SQuAD2.0 and show a 2.8 absolute gain on EM score compared to prior work using synthetic data.

[45]  arXiv:2002.09600 [pdf, other]
Title: Convex Shape Representation with Binary Labels for Image Segmentation: Models and Fast Algorithms
Subjects: Numerical Analysis (math.NA); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

We present a novel and effective binary representation for convex shapes. We show the equivalence between the shape convexity and some properties of the associated indicator function. The proposed method has two advantages. Firstly, the representation is based on a simple inequality constraint on the binary function rather than the definition of convex shapes, which allows us to obtain efficient algorithms for various applications with convexity prior. Secondly, this method is independent of the dimension of the concerned shape. In order to show the effectiveness of the proposed representation approach, we incorporate it with a probability based model for object segmentation with convexity prior. Efficient algorithms are given to solve the proposed models using Lagrange multiplier methods and linear approximations. Various experiments are given to show the superiority of the proposed methods.

[46]  arXiv:2002.09603 [pdf, other]
Title: Efficient solvers for hybridized three-field mixed finite element coupled poromechanics
Subjects: Numerical Analysis (math.NA)

We consider a mixed hybrid finite element formulation for coupled poromechanics. A stabilization strategy based on a macro-element approach is advanced to eliminate the spurious pressure modes appearing in undrained/incompressible conditions. The efficient solution of the stabilized mixed hybrid block system is addressed by developing a class of block triangular preconditioners based on a Schur-complement approximation strategy. Robustness, computational efficiency and scalability of the proposed approach are theoretically discussed and tested using challenging benchmark problems on massively parallel architectures.

[47]  arXiv:2002.09604 [pdf, other]
Title: Emergent Communication with World Models
Comments: NeurIPS Workshop on Emergent Communication
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We introduce Language World Models, a class of language-conditional generative models that interpret natural language messages by predicting latent codes of future observations. This provides a visual grounding of the message, similar to an enhanced observation of the world, which may include objects outside of the listening agent's field-of-view. We incorporate this "observation" into a persistent memory state, and allow the listening agent's policy to condition on it, akin to the relationship between memory and controller in a World Model. We show this improves effective communication and task success in 2D gridworld speaker-listener navigation tasks. In addition, we develop two losses framed specifically for our model-based formulation to promote positive signalling and positive listening. Finally, because messages are interpreted in a generative model, we can visualize the model beliefs to gain insight into how the communication channel is utilized.

[48]  arXiv:2002.09605 [pdf, ps, other]
Title: Error estimation of the Relaxation Finite Difference Scheme for the nonlinear Schrödinger Equation
Subjects: Numerical Analysis (math.NA)

We consider an initial- and boundary-value problem for the nonlinear Schrödinger equation with homogeneous Dirichlet boundary conditions in the one space dimension case. We discretize the problem in space by a central finite difference method and in time by the Relaxation Scheme proposed by C. Besse [C. R. Acad. Sci. Paris Sér. I 326 (1998), 1427-1432]. We provide optimal order error estimates, in the discrete $L_t^{\infty}(H_x^1)$ norm, for the approximation error at the time nodes and at the intermediate time nodes. In the context of the nonlinear Schrödinger equation, it is the first time that the derivation of an error estimate, for a fully discrete method based on the Relaxation Scheme, is completely addressed.

[49]  arXiv:2002.09607 [pdf, other]
Title: Multi-Representation Knowledge Distillation For Audio Classification
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

As an important component of multimedia analysis tasks, audio classification aims to discriminate between different audio signal types and has received intensive attention due to its wide applications. Generally speaking, the raw signal can be transformed into various representations (such as Short Time Fourier Transform and Mel Frequency Cepstral Coefficients), and information implied in different representations can be complementary. Ensembling the models trained on different representations can greatly boost the classification performance; however, making inference using a large number of models is cumbersome and computationally expensive. In this paper, we propose a novel end-to-end collaborative learning framework for the audio classification task. The framework takes multiple representations as the input to train the models in parallel. The complementary information provided by different representations is shared by knowledge distillation. Consequently, the performance of each model can be significantly improved without increasing the computational overhead in the inference stage. Extensive experimental results demonstrate that the proposed approach can improve the classification performance and achieve state-of-the-art results on both acoustic scene classification tasks and general audio tagging tasks.

[50]  arXiv:2002.09609 [pdf, ps, other]
Title: Private Stochastic Convex Optimization: Efficient Algorithms for Non-smooth Objectives
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we revisit the problem of private stochastic convex optimization. We propose an algorithm, based on noisy mirror descent, which achieves optimal rates up to a logarithmic factor, both in terms of statistical complexity and number of queries to a first-order stochastic oracle. Unlike prior work, we do not require Lipschitz continuity of stochastic gradients to achieve optimal rates. Our algorithm generalizes beyond the Euclidean setting and yields anytime utility and privacy guarantees.

[51]  arXiv:2002.09610 [pdf, ps, other]
Title: Improved MPC Algorithms for MIS, Matching, and Coloring on Trees and Beyond
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

We present $O(\log\log n)$ round scalable Massively Parallel Computation algorithms for maximal independent set and maximal matching, in trees and more generally graphs of bounded arboricity, as well as for constant coloring trees. Following the standards, by a scalable MPC algorithm, we mean that these algorithms can work on machines that have capacity/memory as small as $n^{\delta}$ for any positive constant $\delta<1$. Our results improve over the $O(\log^2\log n)$ round algorithms of Behnezhad et al. [PODC'19]. Moreover, our matching algorithm is presumably optimal as its bound matches an $\Omega(\log\log n)$ conditional lower bound of Ghaffari, Kuhn, and Uitto [FOCS'19].

[52]  arXiv:2002.09616 [pdf, other]
Title: "Wait, I'm Still Talking!" Predicting the Dialogue Interaction Behavior Using Imagine-Then-Arbitrate Model
Subjects: Computation and Language (cs.CL)

Producing natural and accurate responses like human beings is the ultimate goal of intelligent dialogue agents. So far, most past work has concentrated on selecting or generating one pertinent and fluent response according to the current query and its context. These models work in a one-to-one environment, making one response to one utterance each round. However, in real human-human conversations, humans often send several short messages in sequence for readability instead of one long message in a single turn. Thus messages do not end with an explicit ending signal, which is crucial for agents to decide when to reply. So the first step for an intelligent dialogue agent is not replying but deciding whether it should reply at the moment. To address this issue, we propose a novel Imagine-then-Arbitrate (ITA) neural dialogue model to help the agent decide whether to wait or to respond directly. Our method has two imaginator modules and an arbitrator module. The two imaginators learn the agent's and the user's speaking styles respectively and generate possible utterances, which, combined with the dialogue history, serve as the input of the arbitrator. The arbitrator then decides whether to wait or to respond to the user directly. To verify the performance and effectiveness of our method, we prepared two dialogue datasets and compared our approach with several popular models. Experimental results show that our model performs well on the ending prediction problem and outperforms baseline models.

[53]  arXiv:2002.09617 [pdf, other]
Title: Power-Constrained Trajectory Optimization for Wireless UAV Relays with Random Requests
Comments: Accepted and to appear at IEEE ICC 2020
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper studies the adaptive trajectory design of a rotary-wing UAV serving as a relay between ground nodes dispersed in a circular cell and generating uplink data transmissions randomly according to a Poisson process, and a central base station. We seek to minimize the expected average communication delay to service the data transmission requests, subject to an average power constraint on the mobility of the UAV. The problem is cast as a semi-Markov decision process, and it is shown that the policy exhibits a two-scale structure, which can be efficiently optimized: in the outer decision, upon starting a communication phase, and given its current radius, the UAV selects a target end radius position so as to optimally balance a trade-off between average long-term communication delay and power consumption; in the inner decision, the UAV selects its trajectory between the start radius and the selected end radius, so as to greedily minimize the delay and energy consumption to serve the current request. Numerical evaluations show that, during waiting phases, the UAV circles at some optimal radius at the most energy efficient speed, until a new request is received. Lastly, the expected average communication delay and power consumption of the optimal policy are compared to those of several heuristics, demonstrating a reduction in latency by over 50% and 20%, respectively, compared to static and mobile heuristic schemes.

[54]  arXiv:2002.09620 [pdf, other]
Title: Efficient Sentence Embedding via Semantic Subspace Analysis
Comments: 7 pages, 2 figures
Subjects: Computation and Language (cs.CL)

A novel sentence embedding method built upon semantic subspace analysis, called semantic subspace sentence embedding (S3E), is proposed in this work. Given the fact that word embeddings can capture semantic relationship while semantically similar words tend to form semantic groups in a high-dimensional embedding space, we develop a sentence representation scheme by analyzing semantic subspaces of its constituent words. Specifically, we construct a sentence model from two aspects. First, we represent words that lie in the same semantic group using the intra-group descriptor. Second, we characterize the interaction between multiple semantic groups with the inter-group descriptor. The proposed S3E method is evaluated on both textual similarity tasks and supervised tasks. Experimental results show that it offers comparable or better performance than the state-of-the-art. The complexity of our S3E method is also much lower than other parameterized models.

[55]  arXiv:2002.09623 [pdf, other]
Title: Anypath Routing Protocol Design via Q-Learning for Underwater Sensor Networks
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)

As a promising technology in the Internet of Underwater Things, underwater sensor networks have drawn widespread attention from both academia and industry. However, designing a routing protocol for underwater sensor networks is a great challenge due to high energy consumption and large latency in the underwater environment. This paper proposes a Q-learning-based localization-free anypath routing (QLFR) protocol to prolong the lifetime as well as reduce the end-to-end delay for underwater sensor networks. Aiming at optimal routing policies, the Q-value is calculated by jointly considering the residual energy and depth information of sensor nodes throughout the routing process. More specifically, we define two reward functions (i.e., depth-related and energy-related rewards) for Q-learning with the objective of reducing latency and extending network lifetime. In addition, a new holding time mechanism for packet forwarding is designed according to the priority of forwarding candidate nodes. Furthermore, a mathematical analysis is presented to analyze the performance of the proposed routing protocol. Extensive simulation results demonstrate the superior performance of the proposed routing protocol in terms of the end-to-end delay and the network lifetime.
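As a rough illustration of how a Q-value can combine depth and residual energy, the sketch below applies the standard tabular Q-learning update to a reward that mixes a depth-related and an energy-related term. It is not the QLFR protocol itself: the reward formulas, the weights `w_depth` and `w_energy`, the normalizing constant `max_depth`, and all function names are assumptions made for this example.

```python
def combined_reward(depth_sender, depth_candidate, residual_energy,
                    initial_energy, w_depth=0.5, w_energy=0.5,
                    max_depth=1000.0):
    """Hypothetical reward: favors shallower, better-charged candidates."""
    # Depth term: positive when the candidate is closer to the surface.
    depth_term = (depth_sender - depth_candidate) / max_depth
    # Energy term: fraction of the candidate's battery that remains.
    energy_term = residual_energy / initial_energy
    return w_depth * depth_term + w_energy * energy_term


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on a dict keyed by (state, action)."""
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Node "a" considers forwarding to node "b", which is 50 m shallower
# and has half of its battery left.
q_table = {}
r = combined_reward(depth_sender=100.0, depth_candidate=50.0,
                    residual_energy=5.0, initial_energy=10.0)
value = q_update(q_table, "a", "b", r, "b", next_actions=[])
```

Here states are node identifiers and actions are next-hop candidates; a real protocol would also need the holding-time mechanism the abstract mentions, which this sketch omits.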

[56]  arXiv:2002.09626 [pdf, other]
Title: Feedback Identification of conductance-based models
Comments: 14 pages, 10 figures
Subjects: Systems and Control (eess.SY)

This paper applies the classical prediction error method (PEM) to the estimation of nonlinear models of neuronal systems subject to input-additive noise. While the nonlinear system exhibits excitability, bifurcations, and limit-cycle oscillations, we prove consistency of the parameter estimation procedure under output feedback. Hence, this paper provides a rigorous framework for the application of conventional nonlinear system identification methods to stochastic neuronal systems. The main result exploits the elementary property that conductance-based models of neurons have an exponentially contracting inverse dynamics. This property is implied by the voltage-clamp experiment, which has been the fundamental modeling experiment of neurons ever since the pioneering work of Hodgkin and Huxley.

[57]  arXiv:2002.09627 [pdf, ps, other]
Title: Feedback for nonlinear system identification
Comments: 18th European Control Conference (ECC), Napoli, Italy, June 25-28 2019
Journal-ref: 18th European Control Conference (ECC), Naples, Italy, 2019, pp. 1344-1349
Subjects: Systems and Control (eess.SY)

Motivated by neuronal models from neuroscience, we consider the system identification of simple feedback structures whose behaviors include nonlinear phenomena such as excitability, limit-cycles and chaos. We show that output feedback is sufficient to solve the identification problem in a two-step procedure. First, the nonlinear static characteristic of the system is extracted, and second, using a feedback linearizing law, a mildly nonlinear system with an approximately-finite memory is identified. In an ideal setting, the second step boils down to the identification of a LTI system. To illustrate the method in a realistic setting, we present numerical simulations of the identification of two classical systems that fit the assumed model structure.

[58]  arXiv:2002.09629 [pdf, other]
Title: An Empirical Study of Android Security Bulletins in Different Vendors
Subjects: Cryptography and Security (cs.CR)

Mobile devices encroach on almost every part of our lives, including work and leisure, and contain a wealth of personal and sensitive information. It is, therefore, imperative that these devices uphold high security standards. A key aspect is the security of the underlying operating system. In particular, Android plays a critical role due to being the most dominant platform in the mobile ecosystem with more than one billion active devices and due to its openness, which allows vendors to adopt and customize it. Similar to other platforms, Android maintains security by providing monthly security patches and announcing them via the Android security bulletin. To absorb this information successfully across the Android ecosystem, impeccable coordination by many different vendors is required.
In this paper, we perform a comprehensive study of 3,171 Android-related vulnerabilities and study to which degree they are reflected in the Android security bulletin, as well as in the security bulletins of three leading vendors: Samsung, LG, and Huawei. In our analysis, we focus on the metadata of these security bulletins (e.g., timing, affected layers, severity, and CWE data) to better understand the similarities and differences among vendors. We find that (i) the studied vendors in the Android ecosystem have adopted different structures for vulnerability reporting, (ii) vendors are less likely to react with delay for CVEs with Android Git repository references, (iii) vendors handle Qualcomm-related CVEs differently from the rest of external layer CVEs.

[59]  arXiv:2002.09632 [pdf, other]
Title: Using Single-Step Adversarial Training to Defend Iterative Adversarial Examples
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

Adversarial examples have become one of the largest challenges that machine learning models, especially neural network classifiers, face. These adversarial examples break the assumption of an attack-free scenario and fool state-of-the-art (SOTA) classifiers with perturbations that are insignificant to humans. So far, researchers have achieved great progress in utilizing adversarial training as a defense. However, the overwhelming computational cost degrades its applicability, and little has been done to overcome this issue. Single-step adversarial training methods have been proposed as computationally viable solutions; however, they still fail to defend against iterative adversarial examples. In this work, we first experimentally analyze several different SOTA defense methods against adversarial examples. Then, based on observations from experiments, we propose a novel single-step adversarial training method which can defend against both single-step and iterative adversarial examples. Lastly, through extensive evaluations, we demonstrate that our proposed method outperforms the SOTA single-step and iterative adversarial training defense. Compared with ATDA (single-step method) on CIFAR10 dataset, our proposed method achieves 35.67% enhancement in test accuracy and 19.14% reduction in training time. When compared with methods that use BIM or Madry examples (iterative methods) on CIFAR10 dataset, it saves up to 76.03% in training time with less than 3.78% degeneration in test accuracy.

[60]  arXiv:2002.09634 [pdf, other]
Title: Data Augmentation for Copy-Mechanism in Dialogue State Tracking
Subjects: Computation and Language (cs.CL)

While several state-of-the-art approaches to dialogue state tracking (DST) have shown promising performance on several benchmarks, there is still a significant performance gap between seen slot values (i.e., values that occur in both the training set and the test set) and unseen ones (values that occur in the test set but not in the training set). Recently, the copy-mechanism, which copies slot values directly from the user utterance, has been widely used in DST models to handle unseen slot values. In this paper, we aim to find out the factors that influence the generalization ability of a common copy-mechanism model for DST. Our key observations include: 1) the copy-mechanism tends to memorize values rather than infer them from contexts, which is the primary reason for unsatisfactory generalization performance; 2) greater diversity of slot values in the training set increases the performance on unseen values but slightly decreases the performance on seen values. Moreover, we propose a simple but effective data augmentation algorithm to train copy-mechanism models, which augments the input dataset by copying user utterances and replacing the real slot values with randomly generated strings. Users can use two hyper-parameters to realize a trade-off between the performance on seen values and unseen ones, as well as a trade-off between overall performance and computational cost. Experimental results on three widely used datasets (WoZ 2.0, DSTC2, and Multi-WoZ 2.0) show the effectiveness of our approach.
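The augmentation step described above (copy an utterance, then swap each real slot value for a random string) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the data layout, the function names, and the `copies` and `length` parameters are hypothetical and are not claimed to be the two hyper-parameters the abstract refers to.

```python
import random
import string


def random_value(rng, length=8):
    """Return a random lowercase string to stand in for a slot value."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def augment(examples, copies=1, length=8, seed=0):
    """Return the original examples plus `copies` randomized copies of each.

    examples: list of (utterance, {slot_name: slot_value}) pairs.
    """
    rng = random.Random(seed)
    augmented = list(examples)
    for utterance, slots in examples:
        for _ in range(copies):
            new_utterance, new_slots = utterance, {}
            for slot, value in slots.items():
                if value in new_utterance:
                    fake = random_value(rng, length)
                    new_utterance = new_utterance.replace(value, fake)
                    new_slots[slot] = fake
                else:
                    # Value is not literally present, so leave it unchanged.
                    new_slots[slot] = value
            augmented.append((new_utterance, new_slots))
    return augmented


data = [("i want italian food", {"food": "italian"})]
result = augment(data, copies=2)
```

Because the replacement strings carry no meaning, a model trained on the augmented set can only recover them by copying from context, which is the behavior the abstract wants to encourage.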

[61]  arXiv:2002.09636 [pdf, other]
Title: Conceptual Game Expansion
Comments: 14 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI)

Automated game design is the problem of automatically producing games through computational processes. Traditionally these methods have relied on the authoring of search spaces by a designer, defining the space of all possible games for the system to author. In this paper we instead learn representations of existing games and use these to approximate a search space of novel games. In a human subject study we demonstrate that these novel games are indistinguishable from human games for certain measures.

[62]  arXiv:2002.09637 [pdf, other]
Title: Markov Chain Monte-Carlo Phylogenetic Inference Construction in Computational Historical Linguistics
Authors: Tianyi Ni
Subjects: Computation and Language (cs.CL)

More and more of the world's languages are now under study; as a result, the traditional approach to historical linguistics is facing challenges. For example, comparative research across languages requires manual annotation, which becomes increasingly infeasible as the amount of language data grows worldwide. Although they can hardly replace the work of linguists, automatic computational methods have been taken into consideration and can help reduce that workload. One of the most important tasks in historical linguistics is comparing words from different languages and finding their cognates, that is, determining whether two languages are related to each other. In this paper, I use a computational method to cluster the languages and a Markov Chain Monte Carlo (MCMC) method to build the language typology relationship tree based on the clusters.

[63]  arXiv:2002.09646 [pdf, other]
Title: Machine Translation System Selection from Bandit Feedback
Subjects: Computation and Language (cs.CL)

Adapting machine translation systems in the real world is a difficult problem. In contrast to offline training, users cannot provide the type of fine-grained feedback typically used for improving the system. Moreover, users have different translation needs, and even a single user's needs may change over time.
In this work we take a different approach, treating the problem of adapting as one of selection. Instead of adapting a single system, we train many translation systems using different architectures and data partitions. Using bandit learning techniques on simulated user feedback, we learn a policy to choose which system to use for a particular translation task. We show that our approach can (1) quickly adapt to address domain changes in translation tasks, (2) outperform the single best system in mixed-domain translation tasks, and (3) make effective instance-specific decisions when using contextual bandit strategies.
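To make the selection idea concrete, here is a minimal sketch of a non-contextual bandit that picks among several translation systems from scalar feedback. The abstract does not say which bandit algorithm is used; epsilon-greedy is chosen here only because it is the simplest, and the class name, parameters, and reward scale are assumptions.

```python
import random


class EpsilonGreedySelector:
    """Epsilon-greedy choice among n candidate systems (illustrative only)."""

    def __init__(self, n_systems, epsilon=0.1, seed=0):
        self.counts = [0] * n_systems    # times each system received feedback
        self.values = [0.0] * n_systems  # running mean reward per system
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, system, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.counts[system] += 1
        self.values[system] += (reward - self.values[system]) / self.counts[system]


selector = EpsilonGreedySelector(n_systems=3, epsilon=0.0)
selector.update(1, 1.0)   # simulated feedback: system 1 translated well
selector.update(0, 0.2)   # system 0 translated poorly
choice = selector.select()
```

A contextual variant, as mentioned in point (3) of the abstract, would condition the choice on features of the input sentence instead of keeping one global mean per system.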

[64]  arXiv:2002.09650 [pdf, other]
Title: Learning Cost Functions for Optimal Transport
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Learning the cost function for optimal transport from an observed transport plan or its samples has been cast as a bi-level optimization problem. In this paper, we derive an unconstrained convex optimization formulation for the problem which can be further augmented by any customizable regularization. This novel framework avoids repeatedly solving a forward optimal transport problem in each iteration, which has been a thorny computational bottleneck for the bi-level optimization approach. To validate the effectiveness of this framework, we develop two numerical algorithms: one is a fast matrix scaling method based on the Sinkhorn-Knopp algorithm for the discrete case, and the other is a supervised learning algorithm that realizes the cost function as a deep neural network in the continuous case. Numerical results demonstrate promising efficiency and accuracy advantages of the proposed algorithms over existing state-of-the-art methods.
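For readers unfamiliar with the matrix scaling named above, the sketch below shows the classical forward Sinkhorn-Knopp iteration for entropy-regularized optimal transport, written in plain Python. It is background only: the paper's cost-learning algorithm is not reproduced here, and the function name, the regularization strength `eps`, and the iteration count are assumptions.

```python
import math


def sinkhorn(cost, mu, nu, eps=0.1, iters=500):
    """Return a transport plan whose row sums approach mu and column sums nu."""
    n, m = len(mu), len(nu)
    # Gibbs kernel K_ij = exp(-C_ij / eps).
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


cost = [[0.0, 1.0], [1.0, 0.0]]
mu = [0.5, 0.5]
nu = [0.5, 0.5]
plan = sinkhorn(cost, mu, nu)
```

The resulting plan has the form diag(u) K diag(v), so it is fully determined by the two scaling vectors and the kernel.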

[65]  arXiv:2002.09663 [pdf, other]
Title: Active Lighting Recurrence by Parallel Lighting Analogy for Fine-Grained Change Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper studies a new problem, namely active lighting recurrence (ALR), which physically relocalizes a light source to reproduce the lighting condition from a single reference image of the same scene, which may undergo fine-grained changes between the two observations. ALR is of great importance for fine-grained visual inspection and change detection, because some phenomena or minute changes can only be clearly observed under particular lighting conditions. Therefore, effective ALR should be able to navigate a light source online toward the target pose, which is challenging due to the complexity and diversity of real-world lighting and imaging processes. To this end, we propose to use simple parallel lighting as an analogy model and, based on the Lambertian law, to compose an instant navigation ball for this purpose. We theoretically prove the feasibility, i.e., equivalence and convergence, of this ALR approach for realistic near point light sources and small near surface light sources. We also theoretically prove the invariance of our ALR approach to the ambiguity of normal and lighting decomposition. The effectiveness and superiority of the proposed approach have been verified by both extensive quantitative experiments and challenging real-world tasks on fine-grained change detection of cultural heritage. We also validate the generality of our approach to non-Lambertian scenes.

[66]  arXiv:2002.09664 [pdf, other]
Title: Book-Ahead & Supply Management for Ridesourcing Platforms
Subjects: Systems and Control (eess.SY)

Ridesourcing platforms recently introduced the "schedule a ride" service where passengers may reserve (book-ahead) a ride in advance of their trip. Reservations give platforms precise information that describes the start time and location of anticipated future trips; in turn, platforms can use this information to adjust the availability and spatial distribution of the driver supply. In this article, we propose a framework for modeling/analyzing reservations in time-varying stochastic ridesourcing systems. We consider that the driver supply is distributed over a network of geographic regions and that book-ahead rides have reach time priority over non-reserved rides. First, we propose a state-dependent admission control policy that assigns drivers to passengers; this policy ensures that the reach time service requirement would be attained for book-ahead rides. Second, given the admission control policy and reservations information in each region, we predict the "target" number of drivers that is required (in the future) to probabilistically guarantee the reach time service requirement for stochastic non-reserved rides. Third, we propose a reactive dispatching/rebalancing mechanism that determines the adjustments to the driver supply that are needed to maintain the targets across regions. For a specific reach time quality of service, simulation results using data from Lyft rides in Manhattan exhibit how the number of idle drivers decreases with the fraction of book-ahead rides. We also observe that the non-stationary demand (ride request) rate varies significantly across time; this rapid variation further illustrates that time-dependent models are needed for operational analysis of ridesourcing systems.

[67]  arXiv:2002.09666 [pdf, ps, other]
Title: String stable integral control of vehicle platoons with disturbances
Comments: 7 pages, 4 figures, submitted to Automatica
Subjects: Systems and Control (eess.SY)

This paper presents string stable controllers with disturbance rejection properties for vehicle platoons. Through the addition of integral action and a coordinate change, sufficient smoothness conditions on the closed loop system are established that ensure the proposed controller is string stable in the presence of time-varying disturbances, and is able to reject constant disturbances. Error bounds from desired platoon configuration are also developed. Further, a suitable controller structure is introduced, and an example is provided that achieves the required smoothness conditions and is examined in simulation studies.

[68]  arXiv:2002.09668 [pdf, other]
Title: Communication-Efficient Edge AI: Algorithms and Systems
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

Artificial intelligence (AI) has achieved remarkable breakthroughs in a wide range of fields, from speech processing and image classification to drug discovery. This is driven by the explosive growth of data, advances in machine learning (especially deep learning), and easy access to vastly powerful computing resources. In particular, the wide-scale deployment of edge devices (e.g., IoT devices) generates an unprecedented scale of data, which provides the opportunity to derive accurate models and develop various intelligent applications at the network edge. However, such enormous data cannot all be sent from end devices to the cloud for processing, due to the varying channel quality, traffic congestion and/or privacy concerns. By pushing inference and training processes of AI models to edge nodes, edge AI has emerged as a promising alternative. AI at the edge requires close cooperation among edge devices, such as smart phones and smart vehicles, and edge servers at the wireless access points and base stations, which, however, results in heavy communication overheads. In this paper, we present a comprehensive survey of the recent developments in various techniques for overcoming these communication challenges. Specifically, we first identify key communication challenges in edge AI systems. We then introduce communication-efficient techniques, from both algorithmic and system perspectives, for training and inference tasks at the network edge. Potential future research directions are also highlighted.

[69]  arXiv:2002.09670 [pdf, other]
Title: Nonmyopic Gaussian Process Optimization with Macro-Actions
Comments: 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020), Extended version with proofs, 32 pages
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)

This paper presents a multi-staged approach to nonmyopic adaptive Gaussian process optimization (GPO) for Bayesian optimization (BO) of unknown, highly complex objective functions that, in contrast to existing nonmyopic adaptive BO algorithms, exploits the notion of macro-actions for scaling up to a further lookahead to match up to a larger available budget. To achieve this, we generalize GP upper confidence bound to a new acquisition function defined w.r.t. a nonmyopic adaptive macro-action policy, which is intractable to be optimized exactly due to an uncountable set of candidate outputs. The contribution of our work here is thus to derive a nonmyopic adaptive epsilon-Bayes-optimal macro-action GPO (epsilon-Macro-GPO) policy. To perform nonmyopic adaptive BO in real time, we then propose an asymptotically optimal anytime variant of our epsilon-Macro-GPO policy with a performance guarantee. We empirically evaluate the performance of our epsilon-Macro-GPO policy and its anytime variant in BO with synthetic and real-world datasets.

[70]  arXiv:2002.09671 [pdf, ps, other]
Title: Vehicle Tracking in Wireless Sensor Networks via Deep Reinforcement Learning
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Machine Learning (stat.ML)

Vehicle tracking has become one of the key applications of wireless sensor networks (WSNs) in the fields of rescue, surveillance, traffic monitoring, etc. However, the increased tracking accuracy requires more energy consumption. In this letter, a decentralized vehicle tracking strategy is conceived for improving both tracking accuracy and energy saving, which is based on adjusting the intersection area between the fixed sensing area and the dynamic activation area. Then, two deep reinforcement learning (DRL) aided solutions are proposed relying on the dynamic selection of the activation area radius. Finally, simulation results show the superiority of our DRL aided design.

[71]  arXiv:2002.09673 [pdf, other]
Title: Incorporating Effective Global Information via Adaptive Gate Attention for Text Classification
Subjects: Computation and Language (cs.CL)

The dominant text classification studies focus on training classifiers using textual instances only or introducing external knowledge (e.g., hand-crafted features and domain expert knowledge). In contrast, some corpus-level statistical features, like word frequency and distribution, are not well exploited. Our work shows that such simple statistical information can enhance classification performance both efficiently and significantly compared with several baseline models. In this paper, we propose a classifier with a gate mechanism named Adaptive Gate Attention model with Global Information (AGA+GI), in which the adaptive gate mechanism incorporates global statistical features into latent semantic features and the attention layer captures dependency relationships within the sentence. To alleviate the overfitting issue, we propose a novel Leaky Dropout mechanism to improve generalization ability and performance stability. Our experiments show that the proposed method can achieve better accuracy than CNN-based and RNN-based approaches without global information on several benchmarks.

[72]  arXiv:2002.09674 [pdf, other]
Title: Temporal Sparse Adversarial Attack on Gait Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Gait recognition has broad application in social security due to its advantages in long-distance human identification. Despite the high accuracy of gait recognition systems, their adversarial robustness has not been explored. In this paper, we demonstrate that the state-of-the-art gait recognition model is vulnerable to adversarial attacks. A novel temporal sparse adversarial attack under a newly defined distortion measurement is proposed. A GAN-based architecture is employed to semantically generate high-quality adversarial gait silhouettes. By sparsely substituting or inserting a few adversarial gait silhouettes, our proposed method can achieve a high attack success rate. The imperceptibility and the attack success rate of the adversarial examples are well balanced. Experimental results show that even when only one-fortieth of the frames are attacked, the attack success rate still reaches 76.8%.

[73]  arXiv:2002.09676 [pdf, other]
Title: Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion
Comments: 8 pages, 8 figures, 5 tables, 1 algorithm, accepted to IEEE Robotics and Automation Letters (RA-L), January 2020 with presentation at International Conference on Robotics and Automation (ICRA) 2020
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)

Deep reinforcement learning (RL) uses model-free techniques to optimize task-specific control policies. Despite having emerged as a promising approach for complex problems, RL is still hard to use reliably for real-world applications. Apart from challenges such as precise reward function tuning, inaccurate sensing and actuation, and non-deterministic response, existing RL methods do not guarantee behavior within required safety constraints that are crucial for real robot scenarios. In this regard, we introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO) for tracking base velocity commands while following the defined constraints. We also introduce schemes which encourage state recovery into constrained regions in case of constraint violations. We present experimental results of our training method and test it on the real ANYmal quadruped robot. We compare our approach against the unconstrained RL method and show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.

[74]  arXiv:2002.09680 [pdf, other]
Title: On Boolean gates in fungal colony
Subjects: Emerging Technologies (cs.ET)

A fungal colony maintains its integrity via the flow of cytoplasm along its mycelium network. This flow, together with possible coordination of the propagation of mycelium tips, is controlled by calcium waves and associated waves of electrical potential changes. We propose that these excitation waves can be employed to implement computation in mycelium networks. We use the FitzHugh-Nagumo model to imitate the propagation of excitation in a single colony of Aspergillus niger. Boolean values are encoded by spikes of extracellular potential. We represent binary inputs by electrical impulses on a pair of selected electrodes and we record responses of the colony from sixteen electrodes. We derive sets of two-input, one-output logical gates implementable in the fungal colony and analyse distributions of the gates.
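The excitable dynamics named above can be illustrated with a single-cell FitzHugh-Nagumo system integrated by forward Euler. This sketch is not the paper's spatial colony model: it has no propagation along a network, and the parameter values (a = 0.7, b = 0.8, eps = 0.08), the spike threshold, and the function names are common textbook choices assumed for the example.

```python
def simulate_fhn(current, steps=20000, dt=0.01, a=0.7, b=0.8, eps=0.08,
                 v0=-1.2, w0=-0.625):
    """Integrate dv/dt = v - v^3/3 - w + I, dw/dt = eps*(v + a - b*w)."""
    v, w = v0, w0
    trace = []
    for _ in range(steps):
        dv = v - v ** 3 / 3.0 - w + current
        dw = eps * (v + a - b * w)
        v, w = v + dt * dv, w + dt * dw
        trace.append(v)
    return trace


def count_spikes(trace, threshold=1.0):
    """Count upward crossings of the threshold in the voltage trace."""
    return sum(1 for prev, cur in zip(trace, trace[1:])
               if prev < threshold <= cur)


# Read a Boolean value from the presence or absence of spikes.
stimulated = count_spikes(simulate_fhn(current=0.5)) > 0
resting = count_spikes(simulate_fhn(current=0.0)) > 0
```

With a sustained input current the cell fires repeatedly, while with no input it stays near its resting state, which is the sense in which a spike can stand for logical True.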

[75]  arXiv:2002.09681 [pdf]
Title: Towards field-programmable photonic gate arrays
Comments: A version of this paper was presented at the OPTO conference in Photonics West 2020
Subjects: Emerging Technologies (cs.ET); Optics (physics.optics)

We review some of the basic principles, fundamentals, technologies, architectures and recent advances leading to the implementation of Field Programmable Photonic Gate Arrays (FPPGAs).

[76]  arXiv:2002.09682 [pdf, ps, other]
Title: Concurrent Kleene Algebra with Observations: from Hypotheses to Completeness
Subjects: Logic in Computer Science (cs.LO)

Concurrent Kleene Algebra (CKA) extends basic Kleene algebra with a parallel composition operator, which enables reasoning about concurrent programs. However, CKA fundamentally misses tests, which are needed to model standard programming constructs such as conditionals and $\mathsf{while}$-loops. It turns out that integrating tests in CKA is subtle, due to their interaction with parallelism. In this paper we provide a solution in the form of Concurrent Kleene Algebra with Observations (CKAO). Our main contribution is a completeness theorem for CKAO. Our result relies on a more general study of CKA "with hypotheses", of which CKAO turns out to be an instance: this analysis is of independent interest, as it can be applied to extensions of CKA other than CKAO.

[77]  arXiv:2002.09685 [pdf, other]
Title: Exploiting Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network
Subjects: Computation and Language (cs.CL)

Targeted sentiment classification predicts the sentiment polarity on given target mentions in input texts. Dominant methods employ neural networks for encoding the input sentence and extracting relations between target mentions and their contexts. Recently, graph neural network has been investigated for integrating dependency syntax for the task, achieving the state-of-the-art results. However, existing methods do not consider dependency label information, which can be intuitively useful. To solve the problem, we investigate a novel relational graph attention network that integrates typed syntactic dependency information. Results on standard benchmarks show that our method can effectively leverage label information for improving targeted sentiment classification performances. Our final model significantly outperforms state-of-the-art syntax-based approaches.

[78]  arXiv:2002.09689 [pdf, ps, other]
Title: Fair and Decentralized Exchange of Digital Goods
Comments: 10 pages
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Social and Information Networks (cs.SI)

We construct a privacy-preserving, distributed and decentralized marketplace where parties can exchange data for tokens. In this market, buyers and sellers make transactions in a blockchain and interact with a third party, called notary, who has the ability to vouch for the authenticity and integrity of the data.
We introduce a protocol for the data-token exchange where neither party gains more information than what it is paying for, and the exchange is fair: either both parties get the other's item or neither does. No third party involvement is required after setup, and no dispute resolution is needed.

[79]  arXiv:2002.09690 [pdf, other]
Title: A Positive and Energy Stable Numerical Scheme for the Poisson-Nernst-Planck-Cahn-Hilliard Equations with Steric Interactions
Subjects: Numerical Analysis (math.NA); Chemical Physics (physics.chem-ph); Computational Physics (physics.comp-ph)

We consider numerical methods for the Poisson-Nernst-Planck-Cahn-Hilliard (PNPCH) equations with steric interactions. We propose a novel energy stable numerical scheme that respects mass conservation and positivity at the discrete level. Existence and uniqueness of the solution to the proposed nonlinear scheme are established by showing that the solution is a unique minimizer of a convex functional over a closed, convex domain. The positivity of numerical solutions is further theoretically justified by the singularity of the entropy terms, which prevents the minimizer from approaching zero concentrations. A further numerical analysis proves discrete free-energy dissipation. Extensive numerical tests are performed to validate that the numerical scheme is first-order accurate in time and second-order accurate in space, and is capable of preserving the desired properties, such as mass conservation, positivity, and free energy dissipation, at the discrete level. Moreover, the PNPCH equations and the proposed scheme are applied to study charge dynamics and self-assembled nanopatterns in highly concentrated electrolytes that are widely used in electrochemical energy devices. Numerical results demonstrate that the PNPCH equations and our numerical scheme are able to capture nanostructures, such as lamellar patterns and labyrinthine patterns in electric double layers and the bulk, and multiple time relaxation with multiple time scales. In addition, we numerically characterize the interplay between cross steric interactions of short range and the concentration gradient regularization, and their impact on the development of nanostructures in the equilibrium state.

[80]  arXiv:2002.09692 [pdf, other]
Title: Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

Distributed learning techniques such as federated learning have enabled multiple workers to train machine learning models together to reduce the overall training time. However, current distributed training algorithms (centralized or decentralized) suffer from the communication bottleneck on multiple low-bandwidth workers (also on the server under the centralized architecture). Although decentralized algorithms generally have lower communication complexity than the centralized counterpart, they still suffer from the communication bottleneck for workers with low network bandwidth. To deal with the communication problem while being able to preserve the convergence performance, we introduce a novel decentralized training algorithm with the following key features: 1) It does not require a parameter server to maintain the model during training, which avoids the communication pressure on any single peer. 2) Each worker only needs to communicate with a single peer at each communication round with a highly compressed model, which can significantly reduce the communication traffic on the worker. We theoretically prove that our sparsification algorithm still preserves convergence properties. 3) Each worker dynamically selects its peer at different communication rounds to better utilize the bandwidth resources. We conduct experiments with convolutional neural networks on 32 workers to verify the effectiveness of our proposed algorithm compared to seven existing methods. Experimental results show that our algorithm significantly reduces the communication traffic and generally selects relatively high bandwidth peers.

[81]  arXiv:2002.09693 [pdf, other]
Title: Interpretable Crowd Flow Prediction with Spatial-Temporal Self-Attention
Comments: 7 pages
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Image and Video Processing (eess.IV)

Crowd flow prediction has been increasingly investigated in intelligent urban computing field as a fundamental component of urban management system. The most challenging part of predicting crowd flow is to measure the complicated spatial-temporal dependencies. A prevalent solution employed in current methods is to divide and conquer the spatial and temporal information by various architectures (e.g., CNN/GCN, LSTM). However, this strategy has two disadvantages: (1) the sophisticated dependencies are also divided and therefore partially isolated; (2) the spatial-temporal features are transformed into latent representations when passing through different architectures, making it hard to interpret the predicted crowd flow. To address these issues, we propose a Spatial-Temporal Self-Attention Network (STSAN) with an ST encoding gate that calculates the entire spatial-temporal representation with positional and time encodings and therefore avoids dividing the dependencies. Furthermore, we develop a Multi-aspect attention mechanism that applies scaled dot-product attention over spatial-temporal information and measures the attention weights that explicitly indicate the dependencies. Experimental results on traffic and mobile data demonstrate that the proposed method reduces inflow and outflow RMSE by 16% and 8% on the Taxi-NYC dataset compared to the SOTA baselines.

[82]  arXiv:2002.09699 [pdf, other]
Title: FMore: An Incentive Scheme of Multi-dimensional Auction for Federated Learning in MEC
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)

Federated learning coupled with Mobile Edge Computing (MEC) is considered one of the most promising solutions to AI-driven service provision. Plenty of studies focus on federated learning from the performance and security aspects, but they neglect the incentive mechanism. In MEC, edge nodes would not like to voluntarily participate in learning, and they differ in the provision of multi-dimensional resources, both of which might deteriorate the performance of federated learning. Also, lightweight schemes appeal to edge nodes in MEC. These features require the incentive mechanism to be well designed for MEC. In this paper, we present an incentive mechanism FMore with multi-dimensional procurement auction of K winners. Our proposal FMore not only is lightweight and incentive compatible, but also encourages more high-quality edge nodes with low cost to participate in learning and eventually improve the performance of federated learning. We also present theoretical results of Nash equilibrium strategy to edge nodes and employ the expected utility theory to provide guidance to the aggregator. Both extensive simulations and real-world experiments demonstrate that the proposed scheme can effectively reduce the training rounds and drastically improve the model accuracy for challenging AI tasks.

[83]  arXiv:2002.09705 [pdf, other]
Title: The candy wrapper problem -- a temporal multiscale approach for pde/pde systems
Subjects: Numerical Analysis (math.NA)

We describe a temporal multiscale approach for the simulation of long-term processes with short-term influences involving partial differential equations. The specific problem under consideration is a growth process in blood vessels. The \emph{Candy Wrapper Process} describes a restenosis in a vessel that has previously been widened by inserting a stent. The development of a new stenosis takes place on a long time horizon (months) while the acting forces are mainly given by the pulsating blood flow. We describe a coupled pde model and a finite element simulation that is used as basis for our multiscale approach, which is based on averaging the long scale equation and approximating the fast scale impact by localized periodic-in-time problems. Numerical test cases in prototypical 3d configurations demonstrate the power of the approach.

[84]  arXiv:2002.09706 [pdf]
Title: Structural Combinatorial of Network Information System of Systems based on Evolutionary Optimization Method
Subjects: Neural and Evolutionary Computing (cs.NE)

The network information system is a military information network system with evolution characteristics. Evolution is a process of replacement between disorder and order, chaos and equilibrium. Given that the concept of evolution originates from biological systems, in this article, the evolution of network information architecture is analyzed by genetic algorithms, and the network information architecture is represented by chromosomes. Besides, the genetic algorithm is also applied to find the optimal chromosome in the architecture space. The evolutionary simulation is used to predict the optimal scheme of the network information architecture and provide a reference for system construction.

[85]  arXiv:2002.09707 [pdf, other]
Title: Compression with wildcards: All spanning trees
Authors: Marcel Wild
Comments: 14 pages
Subjects: Data Structures and Algorithms (cs.DS)

By processing all minimal cutsets of a graph G, and by using novel wildcards, all spanning trees of G can be compactly encoded. Thus, different from all previous enumeration schemes, the spanning trees are not generated one-by-one. The Mathematica implementation of one of our algorithms generated for a random (11,50)-graph its 819'603'181 spanning trees, in bundles of size about 400, within 52 seconds.

[86]  arXiv:2002.09708 [pdf, other]
Title: Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement and Gated Fusion
Comments: MICCAI 2019
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate medical image segmentation commonly requires effective learning of the complementary information from multimodal data. However, in clinical practice, we often encounter the problem of missing imaging modalities. We tackle this challenge and propose a novel multimodal segmentation framework which is robust to the absence of imaging modalities. Our network uses feature disentanglement to decompose the input modalities into the modality-specific appearance code, which uniquely sticks to each modality, and the modality-invariant content code, which absorbs multimodal information for the segmentation task. With enhanced modality-invariance, the disentangled content code from each modality is fused into a shared representation which gains robustness to missing data. The fusion is achieved via a learning-based strategy to gate the contribution of different modalities at different locations. We validate our method on the important yet challenging multimodal brain tumor segmentation task with the BRATS challenge dataset. With competitive performance to the state-of-the-art approaches for full modality, our method achieves outstanding robustness under various missing modality(ies) situations, significantly exceeding the state-of-the-art method by over 16% on average for Dice on whole tumor segmentation.

[87]  arXiv:2002.09710 [pdf, other]
Title: Actively Mapping Industrial Structures with Information Gain-Based Planning on a Quadruped Robot
Subjects: Robotics (cs.RO)

In this paper, we develop an online active mapping system to enable a quadruped robot to autonomously survey large physical structures. We describe the perception, planning and control modules needed to scan and reconstruct an object of interest, without requiring a prior model. The system builds a voxel representation of the object, and iteratively determines the Next-Best-View (NBV) to extend the representation, according to both the reconstruction itself and to avoid collisions with the environment. By computing the expected information gain of a set of candidate scan locations sampled on the as-sensed terrain map, as well as the cost of reaching these candidates, the robot decides the NBV for further exploration. The robot plans an optimal path towards the NBV, avoiding obstacles and un-traversable terrain. Experimental results on both simulated and real-world environments show the capability and efficiency of our system. Finally we present a full system demonstration on the real robot, the ANYbotics ANYmal, autonomously reconstructing a building facade and an industrial structure.

[88]  arXiv:2002.09718 [pdf, other]
Title: Safe Screening for the Generalized Conditional Gradient Method
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

The conditional gradient method (CGM) has been widely used for fast sparse approximation, having a low per-iteration computational cost for structured sparse regularizers. We explore the sparsity acquiring properties of a generalized CGM (gCGM), where the constraint is replaced by a penalty function based on a gauge penalty; this can be done without significantly increasing the per-iteration computation, and applies to general notions of sparsity. Without assuming bounded iterates, we show $O(1/t)$ convergence of the function values and gap of gCGM. We couple this with a safe screening rule, and show that at a rate $O(1/(t\delta^2))$, the screened support matches the support at the solution, where $\delta \geq 0$ measures how close the problem is to being degenerate. In our experiments, we show that the gCGM for these modified penalties has similar feature selection properties as common penalties, but with potentially more stability over the choice of hyperparameter.
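
For orientation, the classical conditional gradient (Frank-Wolfe) iteration over an l1 ball can be sketched as below. This is the standard constrained method, not the generalized gCGM or the safe screening rule of the paper; the function name `frank_wolfe_l1`, the step size 2/(t+2), and the toy quadratic objective are illustrative assumptions.

```python
def frank_wolfe_l1(grad, dim, radius, steps):
    """Classical conditional gradient over the l1 ball {x : ||x||_1 <= radius}.

    grad  : callable returning the gradient (list of floats) at a point
    dim   : problem dimension
    steps : number of iterations
    Each iterate is a convex combination of at most `steps` signed vertices,
    which is why the method tends to produce sparse iterates.
    """
    x = [0.0] * dim
    for t in range(steps):
        g = grad(x)
        # Linear minimization oracle: the best vertex of the l1 ball puts all
        # mass on the coordinate with the largest absolute gradient entry.
        i = max(range(dim), key=lambda j: abs(g[j]))
        vertex = [0.0] * dim
        vertex[i] = -radius if g[i] > 0 else radius
        gamma = 2.0 / (t + 2.0)  # standard open-loop step size
        x = [(1.0 - gamma) * xj + gamma * vj for xj, vj in zip(x, vertex)]
    return x


# Toy objective (assumption): f(x) = 0.5 * ||x - target||^2.
target = [0.5, 0.0, 0.0]


def objective(x):
    return 0.5 * sum((xj - tj) ** 2 for xj, tj in zip(x, target))


def gradient(x):
    return [xj - tj for xj, tj in zip(x, target)]


if __name__ == "__main__":
    solution = frank_wolfe_l1(gradient, dim=3, radius=1.0, steps=400)
    print(objective(solution))
```

The only per-iteration work beyond the gradient is a single arg-max, which is the low per-iteration cost the abstract refers to.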

[89]  arXiv:2002.09719 [pdf, ps, other]
Title: Joint Transmission and Computing Scheduling for Status Update with Mobile Edge Computing
Comments: 6 pages, 6 figures, accepted by IEEE ICC'20
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Age of Information (AoI), defined as the time elapsed since the generation of the latest received update, is a promising performance metric to measure data freshness for real-time status monitoring. In many applications, status information needs to be extracted through computing, which can be processed at an edge server enabled by mobile edge computing (MEC). In this paper, we aim to minimize the average AoI within a given deadline by jointly scheduling the transmissions and computations of a series of update packets with deterministic transmission and computing times. The main analytical results are summarized as follows. Firstly, the minimum deadline to guarantee the successful transmission and computing of all packets is given. Secondly, a \emph{no-wait computing} policy which intuitively attains the minimum AoI is introduced, and the feasibility condition of the policy is derived. Finally, a closed-form optimal scheduling policy is obtained on the condition that the deadline exceeds a certain threshold. The behavior of the optimal transmission and computing policy is illustrated by numerical results with different values of the deadline, which validates the analytical results.
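
The AoI definition above can be made concrete with a small sketch that integrates the sawtooth age curve over a finite horizon. It only evaluates the metric for a given schedule and does not implement the scheduling policies of the paper; the function name `average_aoi`, the `(generation_time, delivery_time)` tuple layout, and the convention that an update generated at time 0 is already known at the start are assumptions.

```python
def average_aoi(updates, horizon, initial_generation=0.0):
    """Time-average Age of Information over [0, horizon].

    updates : iterable of (generation_time, delivery_time) pairs, where
              delivery_time is when the processed update becomes available
    horizon : end of the observation window (the deadline)
    The age at time t is t minus the generation time of the freshest
    update delivered by t.
    """
    # Process deliveries in the order they arrive, ignoring late ones.
    events = sorted((d, g) for g, d in updates if d <= horizon)
    area = 0.0
    now = 0.0
    freshest = initial_generation
    for delivered, generated in events:
        # Age grows linearly between deliveries: integrate (t - freshest).
        area += ((delivered - freshest) ** 2 - (now - freshest) ** 2) / 2.0
        now = delivered
        freshest = max(freshest, generated)
    area += ((horizon - freshest) ** 2 - (now - freshest) ** 2) / 2.0
    return area / horizon


if __name__ == "__main__":
    # Two packets, each taking one time unit from generation to delivery.
    print(average_aoi([(1.0, 2.0), (3.0, 4.0)], horizon=4.0))
```

In this example the age climbs from 0 to 2, drops to 1 at the first delivery, and climbs to 3 by the deadline, so the area under the curve is 6 and the average is 1.5.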

[90]  arXiv:2002.09721 [pdf, other]
Title: General theory of interpolation error estimates on anisotropic meshes
Comments: 22 pages, 2 figures
Subjects: Numerical Analysis (math.NA)

We propose a general theory of estimating interpolation error for smooth functions in two and three dimensions. In our theory, the error of interpolation is bounded in terms of the diameter of a simplex and a geometric parameter. In the two-dimensional case, our geometric parameter is equivalent to the circumradius of a triangle. In the three-dimensional case, our geometric parameter also represents the flatness of a tetrahedron. Through the introduction of the geometric parameter, the error estimates newly obtained can be applied to cases that violate the maximum-angle condition.

[91]  arXiv:2002.09722 [pdf, ps, other]
Title: Checking Phylogenetic Decisiveness in Theory and in Practice
Subjects: Data Structures and Algorithms (cs.DS)

Suppose we have a set $X$ consisting of $n$ taxa and we are given information from $k$ loci from which to construct a phylogeny for $X$. Each locus offers information for only a fraction of the taxa. The question is whether this data suffices to construct a reliable phylogeny. The decisiveness problem expresses this question combinatorially. Although a precise characterization of decisiveness is known, the complexity of the problem is open. Here we relate decisiveness to a hypergraph coloring problem. We use this idea to (1) obtain lower bounds on the amount of coverage needed to achieve decisiveness, (2) devise an exact algorithm for decisiveness, (3) develop problem reduction rules, and use them to obtain efficient algorithms for inputs with few loci, and (4) devise an integer linear programming formulation of the decisiveness problem, which allows us to analyze data sets that arise in practice.

[92]  arXiv:2002.09723 [pdf, other]
Title: Constructing fast approximate eigenspaces with application to the fast graph Fourier transforms
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Numerical Analysis (math.NA); Machine Learning (stat.ML)

We investigate numerically efficient approximations of eigenspaces associated to symmetric and general matrices. The eigenspaces are factored into a fixed number of fundamental components that can be efficiently manipulated (we consider extended orthogonal Givens or scaling and shear transformations). The number of these components controls the trade-off between approximation accuracy and the computational complexity of projecting on the eigenspaces. We write minimization problems for the single fundamental components and provide closed-form solutions. Then we propose algorithms that iteratively update all these components until convergence. We show results on random matrices and an application on the approximation of graph Fourier transforms for directed and undirected graphs.

[93]  arXiv:2002.09725 [pdf, other]
Title: Testing the Agreement of Trees with Internal Labels
Subjects: Data Structures and Algorithms (cs.DS)

The input to the agreement problem is a collection $P = \{T_1, T_2, \dots , T_k\}$ of phylogenetic trees, called input trees, over partially overlapping sets of taxa. The question is whether there exists a tree $T$, called an agreement tree, whose taxon set is the union of the taxon sets of the input trees, such that for each $i \in \{1, 2, \dots , k\}$, the restriction of $T$ to the taxon set of $T_i$ is isomorphic to $T_i$. We give a $O(n k (\sum_{i \in [k]} d_i + \log^2(nk)))$ algorithm for a generalization of the agreement problem in which the input trees may have internal labels, where $n$ is the total number of distinct taxa in $P$, $k$ is the number of trees in $P$, and $d_i$ is the maximum number of children of a node in $T_i$.

[94]  arXiv:2002.09726 [pdf, other]
Title: Operator inference for non-intrusive model reduction of systems with non-polynomial nonlinear terms
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)

This work presents a non-intrusive model reduction method to learn low-dimensional models of dynamical systems with non-polynomial nonlinear terms that are spatially local and that are given in analytic form. In contrast to state-of-the-art model reduction methods that are intrusive and thus require full knowledge of the governing equations and the operators of a full model of the discretized dynamical system, the proposed approach requires only the non-polynomial terms in analytic form and learns the rest of the dynamics from snapshots computed with a potentially black-box full-model solver. The proposed method learns operators for the linear and polynomially nonlinear dynamics via a least-squares problem, where the given non-polynomial terms are incorporated in the right-hand side. The least-squares problem is linear and thus can be solved efficiently in practice. The proposed method is demonstrated on three problems governed by partial differential equations, namely the diffusion-reaction Chafee-Infante model, a tubular reactor model for reactive flows, and a batch-chromatography model that describes a chemical separation process. The numerical results provide evidence that the proposed approach learns reduced models that achieve comparable accuracy as models constructed with state-of-the-art intrusive model reduction methods that require full knowledge of the governing equations.

[95]  arXiv:2002.09733 [pdf, ps, other]
Title: Numerical Analysis of a High-Order Scheme for Nonlinear Fractional Differential Equations with Uniform Accuracy
Subjects: Numerical Analysis (math.NA)

We introduce a high-order numerical scheme for fractional ordinary differential equations with the Caputo derivative. The method is developed by dividing the domain into a number of subintervals, and applying the quadratic interpolation on each subinterval. The method is shown to be unconditionally stable, and for general nonlinear equations, the uniform sharp numerical order $3-\nu$ can be rigorously proven for sufficiently smooth solutions at all time steps. The proof provides a general guide for proving the sharp order for higher-order schemes in the nonlinear case. Some numerical examples are given to validate our theoretical results.

[96]  arXiv:2002.09740 [pdf, other]
Title: (Faster) Multi-Sided Boundary Labelling
Comments: 16 pages, 12 figures
Subjects: Computational Geometry (cs.CG)

A 1-bend boundary labelling problem consists of an axis-aligned rectangle $B$, $n$ points (called sites) in the interior, and $n$ points (called ports) on the labels along the boundary of $B$. The goal is to find a set of $n$ axis-aligned curves (called leaders), each having at most one bend and connecting one site to one port, such that the leaders are pairwise disjoint. A 1-bend boundary labelling problem is $k$-sided ($1\leq k\leq 4$) if the ports appear on $k$ different sides of $B$. Kindermann et al. ["Multi-Sided Boundary Labeling", Algorithmica, 76(1): 225-258, 2016] showed that the 1-bend three-sided and four-sided boundary labelling problems can be solved in $O(n^4)$ and $O(n^9)$ time, respectively. Bose et al. [SWAT, 12:1-12:14, 2018] improved the latter running time to $O(n^6)$ by reducing the problem to computing maximum independent set in an outerstring graph. In this paper, we improve both previous results by giving new algorithms with running times $O(n^3\log n)$ and $O(n^5)$ to solve the 1-bend three-sided and four-sided boundary labelling problems, respectively.

[97]  arXiv:2002.09745 [pdf, other]
Title: Differentially Private Set Union
Comments: 23 pages, 7 figures
Subjects: Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($\epsilon$,$\delta$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real world applications; it is particularly ubiquitous in natural language processing (NLP) applications as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem.
Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting, ensuring privacy is significantly more delicate. We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm. We design two new algorithms, one using Laplace noise and the other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.
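
The baseline described above (collect a bounded subset per user, count, add noise, threshold) can be sketched as follows. This illustrates only the previously known independent-contribution approach, not the policy-based algorithms of the paper, and it makes no privacy claim: `noise_scale` and `threshold` are hypothetical inputs that would have to be calibrated to ($\epsilon$,$\delta$) by a proper analysis, and the function names are assumptions.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_threshold_union(user_sets, max_items_per_user, noise_scale,
                          threshold, rng=None):
    """Release items whose noisy contribution count exceeds a threshold.

    user_sets          : list of sets, one per user
    max_items_per_user : cap on how many items each user may contribute
    noise_scale        : Laplace scale (hypothetical, must be calibrated)
    threshold          : release threshold (hypothetical, must be calibrated)
    """
    if rng is None:
        rng = random.Random()
    counts = Counter()
    for items in user_sets:
        pool = sorted(items)
        # Each user contributes independently of what other users hold.
        kept = rng.sample(pool, min(max_items_per_user, len(pool)))
        counts.update(kept)
    return {item for item, count in counts.items()
            if count + laplace_noise(noise_scale, rng) > threshold}


if __name__ == "__main__":
    users = [{"a", "b"}, {"a"}, {"a", "c"}]
    print(noisy_threshold_union(users, 2, noise_scale=1.0, threshold=1.5))
```

Because every user contributes without looking at the current counts, popular items can end up far above the threshold, which is the wasted budget the abstract points to.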

[98]  arXiv:2002.09748 [pdf, other]
Title: DECIBEL: Improving Audio Chord Estimation for Popular Music by Alignment and Integration of Crowd-Sourced Symbolic Representations
Comments: 81 pages, 47 figures
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

Automatic Chord Estimation (ACE) is a fundamental task in Music Information Retrieval (MIR) and has applications in both music performance and MIR research. The task consists of segmenting a music recording or score and assigning a chord label to each segment. Although it has been a task in the annual benchmarking evaluation MIREX for over 10 years, ACE is not yet a solved problem, since performance has stagnated and modern systems have started to tune themselves to subjective training data. We propose DECIBEL, a new ACE system that exploits widely available MIDI and tab representations to improve ACE from audio only. From an audio file and a set of MIDI and tab files corresponding to the same popular music song, DECIBEL first estimates chord sequences. For audio, state-of-the-art audio ACE methods are used. MIDI files are aligned to the audio, followed by a MIDI chord estimation step. Tab files are transformed into untimed chord sequences and then aligned to the audio. Next, DECIBEL uses data fusion to integrate all estimated chord sequences into one final output sequence. DECIBEL improves all tested state-of-the-art ACE methods by over 3 percent on average. This result shows that the integration of musical knowledge from heterogeneous symbolic music representations is a suitable strategy for addressing challenging MIR tasks such as ACE.

[99]  arXiv:2002.09750 [pdf, other]
Title: On some neural network architectures that can represent viscosity solutions of certain high dimensional Hamilton--Jacobi partial differential equations
Subjects: Numerical Analysis (math.NA)

We propose a novel connection between several neural network architectures and viscosity solutions of some Hamilton--Jacobi (HJ) partial differential equations (PDEs) whose Hamiltonian is convex and only depends on the spatial gradient of the solution. To be specific, we prove that under certain assumptions, the two neural network architectures we propose represent viscosity solutions to two sets of HJ PDEs with zero error. We also implement our proposed neural network architectures using Tensorflow and provide several examples and illustrations. Note that these neural network representations can avoid the curse of dimensionality for certain HJ PDEs, since they do not involve grids or discretization. Our results suggest that efficient dedicated hardware implementation for neural networks can be leveraged to evaluate viscosity solutions of certain HJ PDEs.

[100]  arXiv:2002.09751 [pdf, ps, other]
Title: Automatic Decoupling and Index-aware Model-Order Reduction for Nonlinear Differential-Algebraic Equations
Subjects: Numerical Analysis (math.NA)

We extend the index-aware model-order reduction method to systems of nonlinear differential-algebraic equations with a special nonlinear term f(Ex), where E is a singular matrix. Such nonlinear differential-algebraic equations arise, for example, in the spatial discretization of the gas flow in pipeline networks. In practice, mathematical models of real-life processes pose challenges when used in numerical simulations, due to complexity and system size. Model-order reduction aims to eliminate this problem by generating reduced-order models that have lower computational cost to simulate, yet accurately represent the original large-scale system behavior. However, direct reduction and simulation of nonlinear differential-algebraic equations is difficult due to hidden constraints which affect the choice of numerical integration methods and model-order reduction techniques. We propose an extension of index-aware model-order reduction methods to nonlinear differential-algebraic equations without any kind of linearization. The proposed model-order reduction approach involves automatic decoupling of nonlinear differential-algebraic equations into nonlinear ordinary differential equations and algebraic equations. This allows applying standard model-order reduction techniques to both parts without worrying about the index. The same procedure can also be used to simulate nonlinear differential-algebraic equations using standard integration schemes. We illustrate the performance of our proposed method for nonlinear differential-algebraic equations arising from gas flow models in pipeline networks.

[101]  arXiv:2002.09754 [pdf, other]
Title: Sampling for Deep Learning Model Diagnosis (Technical Report)
Subjects: Machine Learning (cs.LG); Databases (cs.DB)

Deep learning (DL) models have achieved paradigm-changing performance in many fields with high dimensional data, such as images, audio, and text. However, the black-box nature of deep neural networks is not just a barrier to adoption in applications such as medical diagnosis, where interpretability is essential, but also impedes diagnosis of underperforming models. The task of diagnosing or explaining DL models requires the computation of additional artifacts, such as activation values and gradients. These artifacts are large in volume, and their computation, storage, and querying raise significant data management challenges.
In this paper, we articulate DL diagnosis as a data management problem, and we propose a general, yet representative, set of queries to evaluate systems that strive to support this new workload. We further develop a novel data sampling technique that produces approximate but accurate results for these model debugging queries. Our sampling technique utilizes the lower dimension representation learned by the DL model and focuses on model decision boundaries for the data in this lower dimensional space. We evaluate our techniques on one standard computer vision and one scientific data set and demonstrate that our sampling technique outperforms a variety of state-of-the-art alternatives in terms of query accuracy.

[102]  arXiv:2002.09755 [pdf, other]
Title: BAD to the Bone: Big Active Data at its Core
Comments: 27 pages. Submitted to VLDBJ
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)

Virtually all of today's Big Data systems are passive in nature, responding to queries posted by their users. Instead, we are working to shift Big Data platforms from passive to active. In our view, a Big Active Data (BAD) system should continuously and reliably capture Big Data while enabling timely and automatic delivery of relevant information to a large pool of interested users, as well as supporting retrospective analyses of historical information. While various scalable streaming query engines have been created, their active behavior is limited to a (relatively) small window of the incoming data. To this end we have created a BAD platform that combines ideas and capabilities from both Big Data and Active Data (e.g., Publish/Subscribe, Streaming Engines). It supports complex subscriptions that consider not only newly arrived items but also their relationships to past, stored data. Further, it can provide actionable notifications by enriching the subscription results with other useful data. Our platform extends an existing open-source Big Data Management System, Apache AsterixDB, with an active toolkit. The toolkit contains features to rapidly ingest semistructured data, share execution pipelines among users, manage scaled user data subscriptions, and actively monitor the state of the data to produce individualized information for each user. This paper describes the features and design of our current BAD data platform and demonstrates its ability to scale without sacrificing query capabilities or result individualization.

[103]  arXiv:2002.09758 [pdf, other]
Title: Unsupervised Question Decomposition for Question Answering
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We aim to improve question answering (QA) by decomposing hard questions into easier sub-questions that existing QA systems can answer. Since collecting labeled decompositions is cumbersome, we propose an unsupervised approach to produce sub-questions. Specifically, by leveraging >10M questions from Common Crawl, we learn to map from the distribution of multi-hop questions to the distribution of single-hop sub-questions. We answer sub-questions with an off-the-shelf QA model and incorporate the resulting answers in a downstream, multi-hop QA system. On a popular multi-hop QA dataset, HotpotQA, we show large improvements over a strong baseline, especially on adversarial and out-of-domain questions. Our method is generally applicable and automatically learns to decompose questions of different classes, while matching the performance of decomposition methods that rely heavily on hand-engineering and annotation.

[104]  arXiv:2002.09759 [pdf, ps, other]
Title: Block-Term Tensor Decomposition: Model Selection and Computation
Subjects: Numerical Analysis (math.NA)

The so-called block-term decomposition (BTD) tensor model has been recently receiving increasing attention due to its enhanced representation ability in numerous applications involving mixing of signals of rank higher than one (blocks). Its uniqueness and approximation have thus been thoroughly studied. Nevertheless, the problem of estimating the BTD model structure, namely the number of block terms and their individual ranks, has only recently started to attract significant attention, as it is more challenging compared to more classical tensor models such as canonical polyadic decomposition (CPD) and Tucker decomposition (TD). This article briefly reports our recent results on this topic, which are based on an appropriate extension to the BTD model of our earlier rank-revealing work on low-rank matrix and tensor approximation. The idea is to impose column sparsity \emph{jointly} on the factors and successively estimate the ranks as the numbers of factor columns of non-negligible magnitude, with the aid of alternating iteratively reweighted least squares (IRLS). Simulation results are reported that demonstrate the effectiveness of our method in accurately estimating both the ranks and the factors of the least squares BTD approximation, and in a computationally efficient manner.

[105]  arXiv:2002.09763 [pdf, other]
Title: Longitudinal Support Vector Machines for High Dimensional Time Series
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We consider the problem of learning a classifier from observed functional data. Here, each data-point takes the form of a single time-series and contains numerous features. Assuming that each such series comes with a binary label, we consider the problem of learning to predict the label of a newly arriving time-series. To this end, the notion of {\em margin} underlying the classical support vector machine is extended to a continuous version for such data. The longitudinal support vector machine is also a convex optimization problem, and its dual form is derived as well. Empirical results for specified cases with significance tests indicate the efficacy of this innovative algorithm for analyzing such long-term multivariate data.

[106]  arXiv:2002.09765 [pdf, other]
Title: Predictive refinement methodology for compressed sensing imaging
Comments: 33 pages, 9 figures, 1 table
Subjects: Information Theory (cs.IT)

The weak-$\ell^p$ norm can be used to define a measure $s$ of sparsity. When we compute $s$ for the discrete cosine transform coefficients of a signal, the value of $s$ is related to the information content of said signal. We use this value of $s$ to define a reference-free index $\mathcal{E}$, called the sparsity index, that we can use to predict with high accuracy the quality of signal reconstruction in the setting of compressed sensing imaging. That way, when compressed sensing is framed in the context of sampling theory, we can use $\mathcal{E}$ to decide when to further partition the sampling space and increase the sampling rate to optimize the recovery of an image when we use compressed sensing techniques.

[107]  arXiv:2002.09766 [pdf, other]
Title: Improving the Tightness of Convex Relaxation Bounds for Training Certifiably Robust Classifiers
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Convex relaxations are effective for training and certifying neural networks against norm-bounded adversarial attacks, but they leave a large gap between certifiable and empirical robustness. In principle, convex relaxation can provide tight bounds if the solution to the relaxed problem is feasible for the original non-convex problem. We propose two regularizers that can be used to train neural networks that yield tighter convex relaxation bounds for robustness. In all of our experiments, the proposed regularizers result in higher certified accuracy than non-regularized baselines.

[108]  arXiv:2002.09772 [pdf, other]
Title: Non-Intrusive Detection of Adversarial Deep Learning Attacks via Observer Networks
Comments: 5 pages, 2 figures, 4 tables
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

Recent studies have shown that deep learning models are vulnerable to specifically crafted adversarial inputs that are quasi-imperceptible to humans. In this letter, we propose a novel method to detect adversarial inputs, by augmenting the main classification network with multiple binary detectors (observer networks) which take inputs from the hidden layers of the original network (convolutional kernel outputs) and classify the input as clean or adversarial. During inference, the detectors are treated as a part of an ensemble network and the input is deemed adversarial if at least half of the detectors classify it as such. The proposed method addresses the trade-off between accuracy of classification on clean and adversarial samples, as the original classification network is not modified during the detection process. The use of multiple observer networks makes attacking the detection mechanism non-trivial even when the attacker is aware of the victim classifier. We achieve a 99.5% detection accuracy on the MNIST dataset and 97.5% on the CIFAR-10 dataset using the Fast Gradient Sign Attack in a semi-white box setup. The number of false positive detections is a mere 0.12% in the worst case scenario.
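
The voting rule described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `is_adversarial` and the representation of each observer network's decision as a boolean are assumptions made for the example.

```python
from typing import Sequence


def is_adversarial(detector_votes: Sequence[bool]) -> bool:
    """Return True if at least half of the detectors flag the input.

    Each entry of `detector_votes` is a hypothetical observer-network
    decision: True means "adversarial", False means "clean".
    """
    if not detector_votes:
        raise ValueError("need at least one detector vote")
    # "At least half" is checked with integers to avoid rounding issues.
    return 2 * sum(detector_votes) >= len(detector_votes)
```

With two detectors, a single "adversarial" vote is already enough to flag the input, since one vote is exactly half of the ensemble.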

[109]  arXiv:2002.09773 [pdf, ps, other]
Title: Convex Duality of Deep Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study regularized deep neural networks and introduce an analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden layer weight matrices for a norm regularized deep neural network training problem can be explicitly found as the extreme points of a convex set. For two-layer linear networks, we first formulate a convex dual program and prove that strong duality holds. We then extend our derivations to prove that strong duality also holds for certain deep networks. In particular, for linear deep networks, we show that each optimal layer weight matrix is rank-one and aligns with the previous layers when the network output is scalar. We also extend our analysis to the vector outputs and other convex loss functions. More importantly, we show that the same characterization can also be applied to deep ReLU networks with rank-one inputs, where we prove that strong duality still holds and optimal layer weight matrices are rank-one for scalar output networks. As a corollary, we prove that norm regularized deep ReLU networks yield spline interpolation for one-dimensional datasets which was previously known only for two-layer networks. We then verify our theoretical results via several numerical experiments.

[110]  arXiv:2002.09779 [pdf, other]
Title: Stochasticity in Neural ODEs: An Empirical Study
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Stochastic regularization of neural networks (e.g. dropout) is a widespread technique in deep learning that allows for better generalization. Despite its success, continuous-time models, such as neural ordinary differential equations (ODEs), usually rely on a completely deterministic feed-forward operation. This work provides an empirical study of stochastically regularized neural ODEs on several image-classification tasks (CIFAR-10, CIFAR-100, TinyImageNet). Building upon the formalism of stochastic differential equations (SDEs), we demonstrate that a neural SDE is able to outperform its deterministic counterpart. Further, we show that data augmentation during training improves the performance of both deterministic and stochastic versions of the same model. However, the improvements obtained by data augmentation completely eliminate the empirical gains of the stochastic regularization, making the difference in the performance of neural ODE and neural SDE negligible.

[111]  arXiv:2002.09781 [pdf, other]
Title: On the Inductive Bias of a CNN for Orthogonal Patterns Distributions
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Training overparameterized convolutional neural networks with gradient based methods is the most successful learning method for image classification. However, its theoretical properties are far from understood even for very simple learning tasks. In this work, we consider a simplified image classification task where images contain orthogonal patches and are learned with a 3-layer overparameterized convolutional network and stochastic gradient descent. We empirically identify a novel phenomenon where the dot-product between the learned pattern detectors and their detected patterns is governed by the pattern statistics in the training set. We call this phenomenon Pattern Statistics Inductive Bias (PSI) and prove that PSI holds for a simple setup with two points in the training set. Furthermore, we prove that if PSI holds, stochastic gradient descent has sample complexity $O(d^2\log(d))$ where $d$ is the filter dimension. In contrast, we show a VC dimension lower bound in our setting which is exponential in $d$. Taken together, our results provide strong evidence that PSI is a unique inductive bias of stochastic gradient descent that guarantees good generalization properties.

[112]  arXiv:2002.09784 [pdf, ps, other]
Title: Compactly Representing Uniform Interpolants for EUF using (conditional) DAGS
Subjects: Logic in Computer Science (cs.LO)

The concept of a uniform interpolant for a quantifier-free formula from a given formula with a list of symbols, while well-known in the logic literature, has been unknown to the formal methods and automated reasoning community. This concept is precisely defined. Two algorithms for computing the uniform interpolant of a quantifier-free formula in EUF endowed with a list of symbols to be eliminated are proposed. The first algorithm is non-deterministic and generates a uniform interpolant expressed as a disjunction of conjunctions of literals, whereas the second algorithm gives a compact representation of a uniform interpolant as a conjunction of Horn clauses. Both algorithms exploit an efficient dedicated DAG representation of terms. Correctness and completeness proofs are supplied, using arguments combining rewrite techniques with model-theoretic tools.

[113]  arXiv:2002.09786 [pdf, other]
Title: HarDNN: Feature Map Vulnerability Evaluation in CNNs
Comments: 14 pages, 5 figures, a short version accepted for publication in First Workshop on Secure and Resilient Autonomy (SARA) co-located with MLSys2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

As Convolutional Neural Networks (CNNs) are increasingly being employed in safety-critical applications, it is important that they behave reliably in the face of hardware errors. Transient hardware errors may percolate undesirable state during execution, resulting in software-manifested errors which can adversely affect high-level decision making. This paper presents HarDNN, a software-directed approach to identify vulnerable computations during a CNN inference and selectively protect them based on their propensity towards corrupting the inference output in the presence of a hardware error. We show that HarDNN can accurately estimate relative vulnerability of a feature map (fmap) in CNNs using a statistical error injection campaign, and explore heuristics for fast vulnerability assessment. Based on these results, we analyze the tradeoff between error coverage and computational overhead that the system designers can use to employ selective protection. Results show that the improvement in resilience for the added computation is superlinear with HarDNN. For example, HarDNN improves SqueezeNet's resilience by 10x with just 30% additional computations.

[114]  arXiv:2002.09790 [pdf, other]
Title: Shallow2Deep: Indoor Scene Modeling by Single Image Understanding
Comments: Accepted by Pattern Recognition
Journal-ref: Pattern Recognition. 2020 Feb 12:107271
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Dense indoor scene modeling from 2D images has been bottlenecked due to the absence of depth information and cluttered occlusions. We present an automatic indoor scene modeling approach using deep features from neural networks. Given a single RGB image, our method simultaneously recovers semantic contents, 3D geometry and object relationship by reasoning indoor environment context. Particularly, we design a shallow-to-deep architecture on the basis of convolutional networks for semantic scene understanding and modeling. It involves multi-level convolutional networks to parse indoor semantics/geometry into non-relational and relational knowledge. Non-relational knowledge extracted from shallow-end networks (e.g. room layout, object geometry) is fed forward into deeper levels to parse relational semantics (e.g. support relationship). A Relation Network is proposed to infer the support relationship between objects. All the structured semantics and geometry above are assembled to guide a global optimization for 3D scene modeling. Qualitative and quantitative analysis demonstrates the feasibility of our method in understanding and modeling semantics-enriched indoor scenes by evaluating the performance of reconstruction accuracy, computation performance and scene complexity.

[115]  arXiv:2002.09792 [pdf, other]
Title: VisionGuard: Runtime Detection of Adversarial Inputs to Perception Systems
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)

Deep neural network (DNN) models have proven to be vulnerable to adversarial attacks. In this paper, we propose VisionGuard, a novel attack- and dataset-agnostic and computationally-light defense mechanism for adversarial inputs to DNN-based perception systems. In particular, VisionGuard relies on the observation that adversarial images are sensitive to lossy compression transformations. Specifically, to determine if an image is adversarial, VisionGuard checks if the output of the target classifier on a given input image changes significantly after feeding it a transformed version of the image under investigation. Moreover, we show that VisionGuard is computationally-light both at runtime and design-time which makes it suitable for real-time applications that may also involve large-scale image domains. To highlight this, we demonstrate the efficiency of VisionGuard on ImageNet, a task that is computationally challenging for the majority of relevant defenses. Finally, we include extensive comparative experiments on the MNIST, CIFAR10, and ImageNet datasets that show that VisionGuard outperforms existing defenses in terms of scalability and detection performance.
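
The detection check described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how "changes significantly" is measured, so the total variation distance and the `threshold` value are stand-ins, and the names `looks_adversarial`, `classify`, and `compress` are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

Image = TypeVar("Image")


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def looks_adversarial(
    image: Image,
    classify: Callable[[Image], Sequence[float]],  # hypothetical classifier
    compress: Callable[[Image], Image],            # hypothetical lossy transform
    threshold: float = 0.5,                        # assumed, not from the paper
) -> bool:
    """Flag the image if the classifier output shifts after compression."""
    before = classify(image)
    after = classify(compress(image))
    return total_variation(before, after) > threshold
```

In practice `compress` would be a lossy image transformation and `classify` the softmax output of the target DNN; both are passed in as callables so the sketch stays independent of any particular library.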

[116]  arXiv:2002.09794 [pdf, other]
Title: PoET-BiN: Power Efficient Tiny Binary Neurons
Comments: Accepted in MLSys 2020 conference
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The success of neural networks in image classification has inspired various hardware implementations on embedded platforms such as Field Programmable Gate Arrays, embedded processors and Graphical Processing Units. These embedded platforms are constrained in terms of power, which is mainly consumed by the Multiply Accumulate operations and the memory accesses for weight fetching. Quantization and pruning have been proposed to address this issue. Though effective, these techniques do not take into account the underlying architecture of the embedded hardware. In this work, we propose PoET-BiN, a Look-Up Table based power efficient implementation on resource constrained embedded devices. A modified Decision Tree approach forms the backbone of the proposed implementation in the binary domain. A LUT access consumes far less power than the equivalent Multiply Accumulate operation it replaces, and the modified Decision Tree algorithm eliminates the need for memory accesses. We applied the PoET-BiN architecture to implement the classification layers of networks trained on MNIST, SVHN and CIFAR-10 datasets, with near state-of-the-art results. The energy reduction for the classifier portion reaches up to six orders of magnitude compared to floating-point implementations and up to three orders of magnitude when compared to recent binary quantized neural networks.

[117]  arXiv:2002.09795 [pdf, ps, other]
Title: Periodic Q-Learning
Authors: Donghwan Lee, Niao He
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

The use of target networks is a common practice in deep reinforcement learning for stabilizing the training; however, theoretical understanding of this technique is still limited. In this paper, we study the so-called periodic Q-learning algorithm (PQ-learning for short), which resembles the technique used in deep Q-learning for solving infinite-horizon discounted Markov decision processes (DMDP) in the tabular setting. PQ-learning maintains two separate Q-value estimates - the online estimate and target estimate. The online estimate follows the standard Q-learning update, while the target estimate is updated periodically. In contrast to the standard Q-learning, PQ-learning enjoys a simple finite time analysis and achieves better sample complexity for finding an epsilon-optimal policy. Our result provides a preliminary justification of the effectiveness of utilizing target estimates or networks in Q-learning algorithms.
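
A minimal tabular sketch of the two-estimate scheme described above is given below. It is not the authors' algorithm as analyzed in the paper: bootstrapping the online update from the target estimate (as in deep Q-learning), the constant step size `alpha`, and the name `periodic_q_learning` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# (state, action, reward, next_state)
Transition = Tuple[int, int, float, int]


def periodic_q_learning(
    transitions: Sequence[Transition],
    n_states: int,
    n_actions: int,
    gamma: float = 0.9,   # discount factor
    alpha: float = 0.5,   # step size (assumed constant here)
    period: int = 2,      # number of steps between target updates
) -> Tuple[List[List[float]], List[List[float]]]:
    """Run Q-learning with a periodically synchronized target estimate."""
    online = [[0.0] * n_actions for _ in range(n_states)]
    target = [row[:] for row in online]
    for step, (s, a, r, s_next) in enumerate(transitions, start=1):
        # Assumption: the bootstrap term uses the frozen target estimate.
        td_target = r + gamma * max(target[s_next])
        online[s][a] += alpha * (td_target - online[s][a])
        # Every `period` steps, copy the online estimate into the target.
        if step % period == 0:
            target = [row[:] for row in online]
    return online, target
```

Between synchronizations the target estimate stays fixed, so each online update regresses toward a stationary quantity, which is the property that target networks are commonly credited with in deep Q-learning.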

[118]  arXiv:2002.09797 [pdf, other]
Title: Reliable Fidelity and Diversity Metrics for Generative Models
Comments: First two authors have contributed equally
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Devising indicative evaluation metrics for the image generation task remains an open problem. The most widely used metric for measuring the similarity between real and generated images has been the Fr\'echet Inception Distance (FID) score. Because it does not differentiate the fidelity and diversity aspects of the generated images, recent papers have introduced variants of precision and recall metrics to diagnose those properties separately. In this paper, we show that even the latest versions of the precision and recall metrics are not yet reliable. For example, they fail to detect the match between two identical distributions, they are not robust against outliers, and the evaluation hyperparameters are selected arbitrarily. We propose density and coverage metrics that solve the above issues. We analytically and experimentally show that density and coverage provide more interpretable and reliable signals for practitioners than the existing metrics. Code: https://github.com/clovaai/generative-evaluation-prdc.

[119]  arXiv:2002.09799 [pdf, other]
Title: Sample Debiasing in the Themis Open World Database System (Extended Version)
Comments: SIGMOD 2020
Subjects: Databases (cs.DB)

Open world database management systems, which assume that tuples not in the database still exist, are becoming an increasingly important area of research. We present Themis, the first open world database that automatically rebalances arbitrarily biased samples to approximately answer queries as if they were issued over the entire population. We leverage a priori population aggregate information to develop and combine two different approaches for automatic debiasing: sample reweighting and Bayesian network probabilistic modeling. We build a prototype of Themis and demonstrate that Themis achieves higher query accuracy than the default AQP approach, an alternative sample reweighting technique, and a variety of Bayesian network models while maintaining interactive query response times. We also show that Themis is robust to differences in the support between the sample and population, a key use case when using social media samples.

[120]  arXiv:2002.09801 [pdf, other]
Title: High-order Methods for a Pressure Poisson Equation Reformulation of the Navier-Stokes Equations with Electric Boundary Conditions
Subjects: Numerical Analysis (math.NA)

Pressure Poisson equation (PPE) reformulations of the incompressible Navier-Stokes equations (NSE) replace the incompressibility constraint by a Poisson equation for the pressure and a suitable choice of boundary conditions. This yields a time-evolution equation for the velocity field only, with the pressure gradient acting as a nonlocal operator. Thus, numerical methods based on PPE reformulations, in principle, have no limitations in achieving high order. In this paper, it is studied to what extent high-order methods for the NSE can be obtained from a specific PPE reformulation with electric boundary conditions (EBC). To that end, implicit-explicit (IMEX) time-stepping is used to decouple the pressure solve from the velocity update, while avoiding a parabolic time-step restriction; and mixed finite elements are used in space, to capture the structure imposed by the EBC. Via numerical examples, it is demonstrated that the methodology can yield at least third order accuracy in space and time.

[121]  arXiv:2002.09803 [pdf, other]
Title: Author Name Disambiguation on Heterogeneous Information Network with Adversarial Representation Learning
Comments: AAAI 2020
Subjects: Social and Information Networks (cs.SI); Digital Libraries (cs.DL)

Author name ambiguity causes inadequacy and inconvenience in academic information retrieval, which raises the necessity of author name disambiguation (AND). Existing AND methods can be divided into two categories: models focusing on content information to distinguish whether two papers are written by the same author, and models focusing on relation information to represent information as edges on the network and to quantify the similarity among papers. However, the former require adequate labeled samples and informative negative samples, and are also ineffective in measuring the high-order connections among papers, while the latter need complicated feature engineering or supervision to construct the network. We propose a novel generative adversarial framework to grow the two categories of models together: (i) the discriminative module distinguishes whether two papers are from the same author, and (ii) the generative module selects possibly homogeneous papers directly from the heterogeneous information network, which eliminates the complicated feature engineering. In such a way, the discriminative module guides the generative module to select homogeneous papers, and the generative module generates high-quality negative samples to train the discriminative module to make it aware of high-order connections among papers. Furthermore, a self-training strategy for the discriminative module and a random walk based generating algorithm are designed to make the training stable and efficient. Extensive experiments on two real-world AND benchmarks demonstrate that our model provides significant performance improvement over the state-of-the-art methods.

[122]  arXiv:2002.09805 [pdf, ps, other]
Title: Risk-Aware Optimization of Age of Information in the Internet of Things
Comments: 6 pages. Accepted to IEEE International Conference on Communications (ICC) 2020, Dublin, Ireland
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

For time-sensitive Internet of Things (IoT) applications, a risk-neutral approach for age of information (AoI) optimization, which focuses only on minimizing the expected value of the AoI based cost function, cannot capture rare yet critical events with potentially very large AoI. Thus, in this paper, in order to quantify such rare events, an effective coherent risk measure, called the conditional value-at-risk (CVaR), is studied for the purpose of minimizing the AoI of real-time IoT status updates. Particularly, a real-time IoT monitoring system is considered in which an IoT device monitors a physical process and sends the status updates to a remote receiver with an updating cost. The optimal status updating process is designed to jointly minimize the AoI at the receiver, the CVaR of the AoI at the receiver, and the energy cost. This stochastic optimization problem is formulated as an infinite horizon discounted risk-aware Markov decision process (MDP), which is computationally intractable due to the time inconsistency of the CVaR. By exploiting the special properties of coherent risk measures, the risk-aware MDP is reduced to a standard MDP with an augmented state space, for which a dynamic programming based solution is proposed to derive the optimal stationary policy. In particular, the optimal history-dependent policy of the risk-aware MDP is shown to depend on the history only through the augmented system states and can be readily constructed using the optimal stationary policy of the augmented MDP. The proposed solution is computationally tractable and minimizes the AoI in real-time IoT monitoring systems in a risk-aware manner.
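
To illustrate why the CVaR captures rare, large-AoI events that the expected value misses, the sketch below computes a simple empirical CVaR as the mean of the worst fraction of observed samples. This is a generic sample-based estimator chosen for illustration, not the paper's MDP formulation; the function name `empirical_cvar` and the `tail_fraction` parameter are assumptions.

```python
import math
from typing import Sequence


def empirical_cvar(samples: Sequence[float], tail_fraction: float) -> float:
    """Mean of the largest `tail_fraction` share of the samples.

    `tail_fraction` plays the role of 1 - alpha for a confidence level
    alpha; with tail_fraction = 1.0 this reduces to the ordinary mean.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must lie in (0, 1]")
    # Number of worst-case samples that make up the tail.
    k = max(1, math.ceil(tail_fraction * len(samples)))
    worst = sorted(samples, reverse=True)[:k]
    return sum(worst) / k
```

For example, for AoI samples `[1.0, 2.0, 3.0, 10.0]` the mean is 4.0, while the CVaR over the worst half is 6.5, so the single large value weighs much more heavily in the risk-aware objective.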

[123]  arXiv:2002.09807 [pdf, other]
Title: Online Stochastic Max-Weight Matching: prophet inequality for vertex and edge arrival models
Comments: 29 pages, 2 figures
Subjects: Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT)

We provide prophet inequality algorithms for online weighted matching in general (non-bipartite) graphs, under two well-studied arrival models, namely edge arrival and vertex arrival. The weight of each edge is drawn independently from an a-priori known probability distribution. Under edge arrival, the weight of each edge is revealed upon arrival, and the algorithm decides whether to include it in the matching or not. Under vertex arrival, the weights of all edges from the newly arriving vertex to all previously arrived vertices are revealed, and the algorithm decides which of these edges, if any, to include in the matching. To study these settings, we introduce a novel unified framework of batched prophet inequalities that captures online settings where elements arrive in batches; in particular it captures matching under the two aforementioned arrival models. Our algorithms rely on the construction of suitable online contention resolution schemes (OCRSs). We first extend the framework of OCRS to batched OCRS; we then establish a reduction from batched prophet inequality to batched OCRS; and finally we construct batched OCRSs with selectable ratios of 0.337 and 0.5 for edge and vertex arrival models, respectively. Both results improve the state of the art for the corresponding settings. For the vertex arrival, our result is tight. Interestingly, a pricing-based prophet inequality with comparable competitive ratios is unknown.

[124]  arXiv:2002.09808 [pdf, other]
Title: My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits
Subjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)

Consider N cooperative but non-communicating players where each plays one out of M arms for T turns. Players have different utilities for each arm, representable as an N x M matrix. These utilities are unknown to the players. In each turn players receive noisy observations of their utility for their selected arm. However, if any other players selected the same arm that turn, they will all receive zero utility due to the conflict. No other communication or coordination between the players is possible. Our goal is to design a distributed algorithm that learns the matching between players and arms that achieves max-min fairness while minimizing the regret. We present an algorithm and prove that it is regret optimal up to a $\log\log T$ factor. This is the first max-min fairness multi-player bandit algorithm with (near) order optimal regret.
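
The max-min fairness objective described above can be made concrete with a brute-force sketch that assumes the N x M utility matrix is fully known. This only illustrates the target matching, not the distributed learning algorithm of the paper, which must cope with unknown utilities and no communication; the function name `max_min_fair_matching` is hypothetical.

```python
from itertools import permutations
from typing import Sequence, Tuple


def max_min_fair_matching(
    utilities: Sequence[Sequence[float]],
) -> Tuple[Tuple[int, ...], float]:
    """Return (arms, value): arms[p] is the arm given to player p, and
    value is the smallest utility any player receives under that matching.

    Exhaustive search over collision-free assignments; only practical
    for small N and M.
    """
    n_players = len(utilities)
    n_arms = len(utilities[0])
    if n_players > n_arms:
        raise ValueError("need at least as many arms as players")
    best_arms: Tuple[int, ...] = ()
    best_value = float("-inf")
    # Each permutation assigns distinct arms, so no two players collide.
    for arms in permutations(range(n_arms), n_players):
        value = min(utilities[p][a] for p, a in enumerate(arms))
        if value > best_value:
            best_arms, best_value = arms, value
    return best_arms, best_value
```

With utilities `[[0.9, 0.1], [0.8, 0.7]]`, giving arm 0 to player 0 and arm 1 to player 1 yields a minimum utility of 0.7, whereas the swapped matching leaves player 0 with only 0.1.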

[125]  arXiv:2002.09809 [pdf, other]
Title: Random Bundle: Brain Metastases Segmentation Ensembling through Annotation Randomization
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce a novel ensembling method, Random Bundle (RB), that improves performance for brain metastases segmentation. We create our ensemble by training each network on our dataset with 50% of our annotated lesions censored out. We also apply a lopsided bootstrap loss to recover performance after inducing an in silico 50% false negative rate and make our networks more sensitive. We improve the mAP of our network's lesion detection by 39% and more than triple the sensitivity at 80% precision. We also show slight improvements in segmentation quality through DICE score. Further, RB ensembling improves performance over baseline by a larger margin than a variety of popular ensembling strategies. Finally, we show that RB ensembling is computationally efficient by comparing its performance to a single network when both systems are constrained to have the same compute.

[126]  arXiv:2002.09811 [pdf, other]
Title: Automatic Cost Function Learning with Interpretable Compositional Networks
Subjects: Artificial Intelligence (cs.AI)

Cost Function Networks (CFN) are a formalism in Constraint Programming to model combinatorial satisfaction or optimization problems. By associating a function to each constraint type to evaluate the quality of an assignment, it extends the expressivity of regular CSP/COP formalisms, but at the price of making problem modeling harder. Indeed, in addition to the regular sets of variables, domains and constraints, one must provide a set of cost functions that are not always easy to define. Here we propose a method to automatically learn a cost function of a constraint, given a function deciding whether assignments are valid or not. This is, to the best of our knowledge, the first attempt to automatically learn cost functions. Our method aims to learn cost functions in a supervised fashion, trying to reproduce the Hamming distance, by using a variation of neural networks we named Interpretable Compositional Networks, allowing us to get explainable results, unlike regular artificial neural networks. We experiment with it on 5 different constraints to show its versatility. Experiments show that functions learned on small dimensions scale to high dimensions, outputting a perfect or near-perfect Hamming distance for most constraints. Our system can be used to automatically generate cost functions and thus obtain the expressivity of CFN with the same modeling effort as for CSP/COP.

[127]  arXiv:2002.09812 [pdf, ps, other]
Title: Sketching Transformed Matrices with Applications to Natural Language Processing
Comments: AISTATS 2020
Subjects: Data Structures and Algorithms (cs.DS); Computation and Language (cs.CL); Machine Learning (cs.LG)

Suppose we are given a large matrix $A=(a_{i,j})$ that cannot be stored in memory but resides on disk or is presented in a data stream. However, we need to compute a matrix decomposition of the entrywise-transformed matrix, $f(A):=(f(a_{i,j}))$ for some function $f$. Is it possible to do it in a space-efficient way? Many machine learning applications indeed need to deal with such large transformed matrices; for example, word embedding methods in NLP need to work with the pointwise mutual information (PMI) matrix, while the entrywise transformation makes it difficult to apply known linear algebraic tools. Existing approaches for this problem either need to store the whole matrix and perform the entrywise transformation afterwards, which is space consuming or infeasible, or need to redesign the learning method, which is application specific and requires substantial remodeling.
In this paper, we first propose a space-efficient sketching algorithm for computing the product of a given small matrix with the transformed matrix. It works for a general family of transformations with provable small error bounds and thus can be used as a primitive in downstream learning tasks. We then apply this primitive to a concrete application: low-rank approximation. We show that our approach obtains small error and is efficient in both space and time. We complement our theoretical results with experiments on synthetic and real data.

[128]  arXiv:2002.09814 [pdf, other]
Title: Survey Bandits with Regret Guarantees
Comments: 17 pages, 10 figures
Subjects: Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)

We consider a variant of the contextual bandit problem. In standard contextual bandits, when a user arrives we get the user's complete feature vector and then assign a treatment (arm) to that user. In a number of applications (like healthcare), collecting features from users can be costly. To address this issue, we propose algorithms that avoid needless feature collection while maintaining strong regret guarantees.

[129]  arXiv:2002.09818 [pdf, other]
Title: Assembling Semantically-Disentangled Representations for Predictive-Generative Models via Adaptation from Synthetic Domain
Comments: 8 pages, 18 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Deep neural networks can form high-level hierarchical representations of input data. Various researchers have demonstrated that these representations can be used to enable a variety of useful applications. However, such representations are typically based on the statistics within the data, and may not conform with the semantic representation that may be necessitated by the application. Conditional models are typically used to overcome this challenge, but they require large annotated datasets which are difficult to come by and costly to create. In this paper, we show that semantically-aligned representations can be generated instead with the help of a physics based engine. This is accomplished by creating a synthetic dataset with decoupled attributes, learning an encoder for the synthetic dataset, and augmenting prescribed attributes from the synthetic domain with attributes from the real domain. It is shown that the proposed (SYNTH-VAE-GAN) method can construct a conditional predictive-generative model of human face attributes without relying on real data labels.

[130]  arXiv:2002.09820 [pdf, other]
Title: Deep Reinforcement Learning with Linear Quadratic Regulator Regions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)

Practitioners often rely on compute-intensive domain randomization to ensure reinforcement learning policies trained in simulation can robustly transfer to the real world. Due to unmodeled nonlinearities in the real system, however, even such simulated policies can still fail to perform stably enough to acquire experience in real environments. In this paper we propose a novel method that guarantees a stable region of attraction for the output of a policy trained in simulation, even for highly nonlinear systems. Our core technique is to use "bias-shifted" neural networks for constructing the controller and training the network in the simulator. The modified neural networks not only capture the nonlinearities of the system but also provably preserve linearity in a certain region of the state space and thus can be tuned to resemble a linear quadratic regulator that is known to be stable for the real system. We have tested our new method by transferring simulated policies for a swing-up inverted pendulum to real systems and demonstrated its efficacy.

[131]  arXiv:2002.09825 [pdf, other]
Title: Model Predictive Congestion Control for TCP Endpoints
Comments: 13 pages, 13 figures
Subjects: Networking and Internet Architecture (cs.NI)

A common problem in science networks and private wide area networks (WANs) is that of achieving predictable data transfers of multiple concurrent flows by maintaining specific pacing rates for each. We address this problem by developing a control algorithm based on concepts from model predictive control (MPC) to produce flows with smooth pacing rates and round trip times (RTTs). In the proposed approach, we model the bottleneck link as a queue and derive a model relating the pacing rate and the RTT. An MPC-based control algorithm built on this model is shown to avoid the extreme window (which translates to rate) reduction that exists in current control algorithms when facing network congestion. We have implemented our algorithm as a Linux kernel module. Through simulation and experimental analysis, we show that our algorithm achieves the goals of a low standard deviation of RTT and pacing rate, even when the bottleneck link is fully utilized. In the case of multiple flows, we can assign different rates to each flow and, as long as the sum of rates is less than the bottleneck rate, they can maintain their assigned pacing rate with low standard deviation. This is achieved even when the flows have different RTTs.

[132]  arXiv:2002.09827 [pdf, ps, other]
Title: Signature in Counterparts, a Formal Treatment
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Multiagent Systems (cs.MA)

"Signature in counterparts" is a legal process that permits a contract between two or more parties to be brought into force by having the parties independently (possibly, remotely) sign different copies of the contract, rather than placing their signatures on a common copy at a physical meeting. The paper develops a logical understanding of this process, developing a number of axioms that can be used to justify the validity of a contract from the assumption that separate copies have been signed. It is argued that a satisfactory account benefits from a logic with syntactic self-reference. The axioms used are supported by a formal semantics, and a number of further properties of this semantics are investigated. In particular, it is shown that the semantics implies that when a contract is valid, the parties do not just agree, but are in mutual agreement (a common-knowledge-like notion) about the validity of the contract.

[133]  arXiv:2002.09831 [pdf, other]
Title: On the Role of Dataset Quality and Heterogeneity in Model Confidence
Comments: 25 pages, 14 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Safety-critical applications require machine learning models that output accurate and calibrated probabilities. While uncalibrated deep networks are known to make over-confident predictions, it is unclear how model confidence is impacted by variations in the data, such as label noise or class size. In this paper, we investigate the role of dataset quality by studying the impact of dataset size and label noise on model confidence. We theoretically explain and experimentally demonstrate that, surprisingly, label noise in the training data leads to under-confident networks, while reduced dataset size leads to over-confident models. We then study the impact of dataset heterogeneity, where data quality varies across classes, on model confidence. We demonstrate that this leads to heterogeneous confidence/accuracy behavior in the test data and is poorly handled by the standard calibration algorithms. To overcome this, we propose an intuitive heterogeneous calibration technique and show that the proposed approach leads to improved calibration metrics (both average and worst-case errors) on the CIFAR datasets.

[134]  arXiv:2002.09832 [pdf, other]
Title: Sequence Preserving Network Traffic Generation
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)

We present the Network Traffic Generator (NTG), a framework for perturbing recorded network traffic with the purpose of generating diverse but realistic background traffic for network simulation and what-if analysis in enterprise environments. The framework preserves many characteristics of the original traffic recorded in an enterprise, as well as sequences of network activities. Using the proposed framework, the original traffic flows are profiled using 200 cross-protocol features. The traffic is aggregated into flows of packets between IP pairs and clustered into groups of similar network activities. Sequences of network activities are then extracted. We examined two methods for extracting sequences of activities: a Markov model and a neural language model. Finally, new traffic is generated using the extracted model. We developed a prototype of the framework and conducted extensive experiments based on two real network traffic collections. Hypothesis testing was used to examine the difference between the distribution of original and generated features, showing that 30-100% of the extracted features were preserved. Small differences between n-gram perplexities in sequences of network activities in the original and generated traffic indicate that sequences of network activities were well preserved.
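
As a rough illustration of the Markov-model option mentioned above, the following sketch fits a first-order Markov chain to sequences of activity labels and samples a new sequence from it. The activity labels (`dns`, `http`, `smtp`), the function names, and the choice of a first-order chain are assumptions made for illustration; the abstract does not specify the order of the model or how clusters are labeled.

```python
import random
from collections import Counter, defaultdict

def fit_markov(sequences):
    """Estimate first-order transition probabilities from activity sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {t: c / sum(nxts.values()) for t, c in nxts.items()}
        for state, nxts in counts.items()
    }

def generate(model, start, length, rng):
    """Sample a sequence of at most `length` activities, starting at `start`."""
    seq = [start]
    # Stop early if the last activity was never followed by anything in the data.
    while len(seq) < length and seq[-1] in model:
        nxts = model[seq[-1]]
        seq.append(rng.choices(list(nxts), weights=list(nxts.values()))[0])
    return seq

# Hypothetical sequences of clustered network activities.
recorded = [["dns", "http", "http", "smtp"], ["dns", "http", "smtp"]]
model = fit_markov(recorded)
sample = generate(model, "dns", 5, random.Random(0))
print(model)
print(sample)
```

Every transition in a generated sequence was observed in the recorded data, which is the sense in which such a model preserves activity sequences.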

[135]  arXiv:2002.09834 [pdf, other]
Title: PrivGen: Preserving Privacy of Sequences Through Data Generation
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Sequential data is everywhere, and it can serve as a basis for research that will lead to improved processes. For example, road infrastructure can be improved by identifying bottlenecks in GPS data, or early diagnosis can be improved by analyzing patterns of disease progression in medical data. The main obstacle is that access and use of such data is usually limited or not permitted at all due to concerns about violating user privacy, and rightly so. Anonymizing sequence data is not a simple task, since a user creates an almost unique signature over time. Existing anonymization methods reduce the quality of information in order to maintain the level of anonymity required. Damage to quality may disrupt patterns that appear in the original data and impair the preservation of various characteristics. Since in many cases the researcher does not need the data as is and instead is only interested in the patterns that exist in the data, we propose PrivGen, an innovative method for generating data that maintains patterns and characteristics of the source data. We demonstrate that the data generation mechanism significantly limits the risk of privacy infringement. Evaluating our method with real-world datasets shows that its generated data preserves many characteristics of the data, including the sequential model, as trained based on the source data. This suggests that the data generated by our method could be used in place of actual data for various types of analysis, maintaining user privacy and the data's integrity at the same time.

[136]  arXiv:2002.09836 [pdf, other]
Title: Fill in the BLANC: Human-free quality estimation of document summaries
Comments: 12 pages, 9 figures, 2 tables
Subjects: Computation and Language (cs.CL)

We present BLANC, a new approach to the automatic estimation of document summary quality. Our goal is to measure the functional performance of a summary with an objective, reproducible, and fully automated method. Our approach achieves this by measuring the performance boost gained by a pre-trained language model with access to a document summary while carrying out its language understanding task on the document's text. We present evidence that BLANC scores have at least as good correlation with human evaluations as do the ROUGE family of summary quality measurements. And unlike ROUGE, the BLANC method does not require human-written reference summaries, allowing for fully human-free summary quality estimation.

[137]  arXiv:2002.09841 [pdf, other]
Title: SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback
Comments: This paper has been accepted in AAAI'20
Journal-ref: The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI'20), New York, New York, USA, 2020
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)

The recent development of online recommender systems has a focus on collaborative ranking from implicit feedback, such as user clicks and purchases. Different from explicit ratings, which reflect graded user preferences, implicit feedback only generates positive and unobserved labels. While considerable efforts have been made in this direction, the well-known pairwise and listwise approaches are still limited by various challenges. Specifically, for the pairwise approaches, the assumption of independent pairwise preference does not always hold in practice. Also, the listwise approaches cannot efficiently accommodate "ties" due to the precondition of the entire list permutation. To this end, in this paper, we propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to inherently accommodate the characteristics of implicit feedback in recommender systems. Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons and can be implemented with matrix factorization and neural networks. Meanwhile, we also present a theoretical analysis of SetRank to show that the bound of excess risk can be proportional to $\sqrt{M/N}$, where $M$ and $N$ are the numbers of items and users, respectively. Finally, extensive experiments on four real-world datasets clearly validate the superiority of SetRank compared with various state-of-the-art baselines.

[138]  arXiv:2002.09843 [pdf, other]
Title: Practical and Bilateral Privacy-preserving Federated Learning
Comments: Submitted to ICML 2020
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Federated learning, as an emerging distributed training model of neural networks without collecting raw data, has attracted widespread attention. However, almost all existing research on federated learning only considers protecting the privacy of clients, but not preventing model iterates and final model parameters from leaking to untrusted clients and external attackers. In this paper, we present the first bilateral privacy-preserving federated learning scheme, which protects not only the raw training data of clients, but also model iterates during the training phase as well as final model parameters. Specifically, we present an efficient privacy-preserving technique to mask or encrypt the global model, which not only allows clients to train over the noisy global model, but also ensures only the server can obtain the exact updated model. Detailed security analysis shows that clients can access neither model iterates nor the final global model; meanwhile, the server cannot obtain raw training data of clients from additional information used for recovering the exact updated model. Finally, extensive experiments demonstrate that the proposed scheme has model accuracy comparable to traditional federated learning without bringing much extra communication overhead.

[139]  arXiv:2002.09846 [pdf, other]
Title: Tree++: Truncated Tree Based Graph Kernels
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

Graph-structured data arise ubiquitously in many application domains. A fundamental problem is to quantify their similarities. Graph kernels, which decompose graphs into substructures and compare these substructures, are often used for this purpose. However, most of the existing graph kernels do not have the property of scale-adaptivity, i.e., they cannot compare graphs at multiple levels of granularity. Many real-world graphs such as molecules exhibit structure at varying levels of granularity. To tackle this problem, we propose a new graph kernel called Tree++ in this paper. At the heart of Tree++ is a graph kernel called the path-pattern graph kernel. The path-pattern graph kernel first builds a truncated BFS tree rooted at each vertex and then uses paths from the root to every vertex in the truncated BFS tree as features to represent graphs. The path-pattern graph kernel can only capture graph similarity at fine granularities. In order to capture graph similarity at coarse granularities, we incorporate a new concept called super path into it. The super path contains truncated BFS trees rooted at the vertices in a path. Our evaluation on a variety of real-world graphs demonstrates that Tree++ achieves the best classification accuracy compared with previous graph kernels.
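
The path-pattern step described above can be sketched as follows: build a depth-limited BFS tree at each vertex, record the label sequence along the tree path from the root to each reached vertex, and count these sequences as features. The adjacency-dict representation, the function names, and the use of a plain dot product of count vectors as the kernel value are assumptions for illustration; the super-path extension is not covered.

```python
from collections import Counter, deque

def path_patterns(adjacency, labels, root, depth):
    """Label paths from `root` to each vertex of its BFS tree truncated at `depth`."""
    parent = {root: None}
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == depth:
            continue  # truncate: do not expand vertices at the depth limit
        for v in sorted(adjacency[u]):
            if v not in parent:
                parent[v] = u
                dist[v] = dist[u] + 1
                queue.append(v)
    patterns = []
    for v in parent:
        path = []
        while v is not None:  # walk back up the tree to the root
            path.append(labels[v])
            v = parent[v]
        patterns.append(tuple(reversed(path)))
    return patterns

def graph_features(adjacency, labels, depth):
    """Count path patterns over the truncated BFS trees of all vertices."""
    feats = Counter()
    for root in adjacency:
        feats.update(path_patterns(adjacency, labels, root, depth))
    return feats

def kernel(f, g):
    """Dot product of two path-pattern count vectors (an assumed kernel form)."""
    return sum(count * g[p] for p, count in f.items() if p in g)

# Hypothetical labeled path graph 0 - 1 - 2 with vertex labels C, O, C.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "C", 1: "O", 2: "C"}
feats = graph_features(adjacency, labels, depth=1)
print(feats, kernel(feats, feats))
```

Increasing `depth` adds longer label paths to the feature set, while the fine-grained nature of individual paths is what the abstract's super-path concept is meant to complement.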

[140]  arXiv:2002.09848 [pdf, ps, other]
Title: A new regularization method for a parameter identification problem in a non-linear partial differential equation
Subjects: Numerical Analysis (math.NA); Functional Analysis (math.FA)

We consider a parameter identification problem related to a quasi-linear elliptic Neumann boundary value problem involving a parameter function $a(\cdot)$ and the solution $u(\cdot)$, where the problem is to identify $a(\cdot)$ on an interval $I:= g(\Gamma)$ from the knowledge of the solution $u(\cdot)$ as $g$ on $\Gamma$, where $\Gamma$ is a given curve on the boundary of the domain $\Omega \subseteq \mathbb{R}^3$ of the problem and $g$ is a continuous function. For obtaining stable approximate solutions, we consider a new regularization method which gives error estimates similar to, and in certain cases better than, the classical Tikhonov regularization considered in the literature in the recent past.

[141]  arXiv:2002.09849 [pdf, other]
Title: Multi-Antenna UAV Data Harvesting: Joint Trajectory and Communication Optimization
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Unmanned aerial vehicle (UAV)-enabled communication is a promising technology to extend coverage and enhance throughput for traditional terrestrial wireless communication systems. In this paper, we consider a UAV-enabled wireless sensor network (WSN), where a multi-antenna UAV is dispatched to collect data from a group of sensor nodes (SNs). The objective is to maximize the minimum data collection rate from all SNs via jointly optimizing their transmission scheduling and power allocations as well as the trajectory of the UAV, subject to the practical constraints on the maximum transmit power of the SNs and the maximum speed of the UAV. The formulated optimization problem is challenging to solve as it involves non-convex constraints and discrete-value variables. To draw useful insight, we first consider the special case of the formulated problem by ignoring the UAV speed constraint and optimally solve it based on the Lagrange duality method. It is shown that for this relaxed problem, the UAV should hover above a finite number of optimal locations with different durations in general. Next, we address the general case of the formulated problem where the UAV speed constraint is considered and propose a traveling salesman problem (TSP)-based trajectory initialization, where the UAV sequentially visits the locations obtained in the relaxed problem with minimum flying time. Given this initial trajectory, we then find the corresponding transmission scheduling and power allocations of the SNs and further optimize the UAV trajectory by applying the block coordinate descent (BCD) and successive convex approximation (SCA) techniques. Finally, numerical results are provided to illustrate the spectrum and energy efficiency gains of the proposed scheme for multi-antenna UAV data harvesting, as compared to benchmark schemes.

[142]  arXiv:2002.09850 [pdf, other]
Title: Active localization of multiple targets using noisy relative measurements
Comments: 8 pages, 5 figures
Subjects: Robotics (cs.RO)

Consider a mobile robot tasked with localizing targets at unknown locations by obtaining relative measurements. The observations can be bearing or range measurements. How should the robot move so as to localize the targets and minimize the uncertainty in their locations as quickly as possible? Most existing approaches are either greedy in nature or rely on accurate initial estimates.
We formulate this path planning problem as an unsupervised learning problem where the measurements are aggregated using a Bayesian histogram filter. The robot learns to minimize the total uncertainty of each target in the shortest amount of time using the current measurement and an aggregate representation of the current belief state. We analyze our method in a series of experiments where we show that our method outperforms a standard greedy approach. In addition, its performance is also comparable to an offline algorithm which has access to the true location of the targets.

[143]  arXiv:2002.09853 [pdf, other]
Title: Optimizing Traffic Lights with Multi-agent Deep Reinforcement Learning and V2X communication
Comments: 7 figures, 1 table
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

We consider a system to optimize the duration of traffic signals using multi-agent deep reinforcement learning and Vehicle-to-Everything (V2X) communication. This system aims at analyzing independent and shared rewards for multiple agents that control the duration of traffic lights. Each traffic light, acting as a learning agent, receives information along its lanes within a circular V2X coverage area. The duration cycles of the traffic lights are modeled as Markov decision processes. We investigate four variations of reward functions. The first two are unshared rewards, based on the number of waiting vehicles and on the waiting time of vehicles between two traffic light cycles. The third and fourth are shared rewards, based on the number of waiting cars and on the waiting time across all agents. Each agent has a memory for optimization through a target network and prioritized experience replay. We evaluate the agents in the Simulation of Urban MObility (SUMO) simulator. The results demonstrate the effectiveness of the proposed system in optimizing traffic signals, reducing the average number of waiting cars to 41.5% compared to the traditional periodic traffic control system.

[144]  arXiv:2002.09854 [pdf, ps, other]
Title: Crossing the Reality Gap with Evolved Plastic Neurocontrollers
Comments: Submitted to GECCO2020
Subjects: Robotics (cs.RO); Neural and Evolutionary Computing (cs.NE)

A critical issue in evolutionary robotics is the transfer of controllers learned in simulation to reality. This is especially the case for small Unmanned Aerial Vehicles (UAVs), as the platforms are highly dynamic and susceptible to breakage. Previous approaches often require simulation models with a high level of accuracy, otherwise significant errors may arise when the well-designed controller is being deployed onto the targeted platform. Here we try to overcome the transfer problem from a different perspective, by designing a spiking neurocontroller which uses synaptic plasticity to cross the reality gap via online adaptation. Through a set of experiments we show that the evolved plastic spiking controller can maintain its functionality by self-adapting to model changes that take place after evolutionary training, and consequently exhibit better performance than its non-plastic counterpart.

[145]  arXiv:2002.09857 [pdf, ps, other]
Title: Verifying Array Manipulating Programs with Full-Program Induction
Subjects: Software Engineering (cs.SE); Programming Languages (cs.PL)

We present a full-program induction technique for proving (a sub-class of) quantified as well as quantifier-free properties of programs manipulating arrays of parametric size N. Instead of inducting over individual loops, our technique inducts over the entire program (possibly containing multiple loops) directly via the program parameter N. Significantly, this does not require generation or use of loop-specific invariants. We have developed a prototype tool Vajra to assess the efficacy of our technique. We demonstrate the performance of Vajra vis-a-vis several state-of-the-art tools on a set of array manipulating benchmarks.

[146]  arXiv:2002.09858 [pdf, other]
Title: Deep Learning Based FDD Non-Stationary Massive MIMO Downlink Channel Reconstruction
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper proposes a model-driven deep learning-based downlink channel reconstruction scheme for frequency division duplexing (FDD) massive multi-input multi-output (MIMO) systems. The spatial non-stationarity, which is the key feature of the future extremely large aperture massive MIMO system, is considered. Instead of the channel matrix, the channel model parameters are learned by neural networks to save overhead and improve the accuracy of channel reconstruction. By viewing the channel as an image, we introduce You Only Look Once (YOLO), a powerful neural network for object detection, to enable a rapid estimation process of the model parameters, including the detection of angles and delays of the paths and the identification of visibility regions of the scatterers. The deep learning-based scheme avoids the complicated iterative process introduced by the algorithm-based parameter extraction methods. A low-complexity algorithm-based refiner further refines the YOLO estimates toward high accuracy. Given the efficiency of model-driven deep learning and the combination of neural network and algorithm, the proposed scheme can rapidly and accurately reconstruct the non-stationary downlink channel. Moreover, the proposed scheme is also applicable to the widely studied stationary systems and achieves reconstruction accuracy comparable to that of an algorithm-based method with greatly reduced time consumption.

[147]  arXiv:2002.09859 [pdf, other]
Title: DotFAN: A Domain-transferred Face Augmentation Network for Pose and Illumination Invariant Face Recognition
Comments: 12 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The performance of a convolutional neural network (CNN) based face recognition model largely relies on the richness of labelled training data. Collecting a training set with large variations of a face identity under different poses and illumination changes, however, is very expensive, making the diversity of within-class face images a critical issue in practice. In this paper, we propose a 3D model-assisted domain-transferred face augmentation network (DotFAN) that can generate a series of variants of an input face based on the knowledge distilled from existing rich face datasets collected from other domains. DotFAN is structurally a conditional CycleGAN but has two additional subnetworks, namely face expert network (FEM) and face shape regressor (FSR), for latent code control. While FSR aims to extract face attributes, FEM is designed to capture a face identity. With their aid, DotFAN can learn a disentangled face representation and effectively generate face images of various facial attributes while preserving the identity of augmented faces. Experiments show that DotFAN is beneficial for augmenting small face datasets to improve their within-class diversity so that a better face recognition model can be learned from the augmented dataset.

[148]  arXiv:2002.09860 [pdf, other]
Title: Variance Loss in Variational Autoencoders
Authors: Andrea Asperti
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

In this article, we highlight what appears to be a major issue of Variational Autoencoders, evinced from extensive experimentation with different network architectures and datasets: the variance of generated data is appreciably lower than that of training data. Since generative models are usually evaluated with metrics such as the Frechet Inception Distance (FID) that compare the distributions of (features of) real versus generated images, the variance loss typically results in degraded scores. This problem is particularly relevant in a two-stage setting, where we use a second VAE to sample in the latent space of the first VAE. The reduced variance creates a mismatch between the actual distribution of latent variables and those generated by the second VAE, which hinders the beneficial effects of the second stage. Renormalizing the output of the second VAE towards the expected normal spherical distribution, we obtain a sudden burst in the quality of generated samples, as also testified in terms of FID.
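
One simple way to picture the renormalization step is to standardize each latent dimension of the generated samples so that it has zero mean and unit variance, matching a spherical standard normal. This is only a sketch under that assumption; the abstract does not state the exact renormalization used, and the simulated "shrunk" samples below stand in for the output of a second-stage VAE.

```python
import random
import statistics

def renormalize(samples):
    """Rescale each latent dimension to zero mean and unit variance."""
    dims = list(zip(*samples))  # one tuple of values per latent dimension
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in samples
    ]

# Simulated latent samples whose variance is lower than the expected 1.0.
rng = random.Random(0)
shrunk = [[rng.gauss(0.0, 0.6) for _ in range(2)] for _ in range(1000)]
fixed = renormalize(shrunk)

for before, after in zip(zip(*shrunk), zip(*fixed)):
    print(statistics.pvariance(before), statistics.pvariance(after))
```

After rescaling, each dimension has variance 1, so latent codes fed to the first-stage decoder no longer carry the reduced spread of the second-stage samples.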

[149]  arXiv:2002.09864 [pdf, other]
Title: Stealing Black-Box Functionality Using The Deep Neural Tree Architecture
Comments: 8 pages, 7 figures, 1 table
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

This paper makes a substantial step towards cloning the functionality of black-box models by introducing a machine learning (ML) architecture named Deep Neural Trees (DNTs). This new architecture can learn to separate different tasks of the black-box model, and clone its task-specific behavior. We propose to train the DNT using an active learning algorithm to obtain faster and more sample-efficient training. In contrast to prior work, we study a complex "victim" black-box model based solely on input-output interactions, while at the same time the attacker and the victim model may have completely different internal architectures. The attacker is an ML-based algorithm whereas the victim is a generally unknown module, such as a multi-purpose digital chip, complex analog circuit, mechanical system, software logic or a hybrid of these. The trained DNT module not only can function as the attacked module, but also provides some level of explainability to the cloned model due to the tree-like nature of the proposed architecture.

[150]  arXiv:2002.09866 [pdf, other]
Title: On the generalization of bayesian deep nets for multi-class classification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Generalization bounds which assess the difference between the true risk and the empirical risk have been studied extensively. However, to obtain bounds, current techniques use strict assumptions such as a uniformly bounded or a Lipschitz loss function. To avoid these assumptions, in this paper, we propose a new generalization bound for Bayesian deep nets by exploiting the contractivity of the Log-Sobolev inequalities. Using these inequalities adds a loss-gradient norm term to the generalization bound, which is intuitively a surrogate of the model complexity. Empirically, we analyze the effect of this loss-gradient norm term using different deep nets.

[151]  arXiv:2002.09869 [pdf, ps, other]
Title: Near-optimal Regret Bounds for Stochastic Shortest Path
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost. In the learning formulation of the problem, the agent is unaware of the environment dynamics (i.e., the transition function) and has to repeatedly play for a given number of episodes while reasoning about the problem's optimal solution. Unlike other well-studied models in reinforcement learning (RL), the length of an episode is not predetermined (or bounded) and is influenced by the agent's actions. Recently, Tarbouriech et al. (2019) studied this problem in the context of regret minimization and provided an algorithm whose regret bound is inversely proportional to the square root of the minimum instantaneous cost. In this work we remove this dependence on the minimum cost---we give an algorithm that guarantees a regret bound of $\widetilde{O}(B_\star |S| \sqrt{|A| K})$, where $B_\star$ is an upper bound on the expected cost of the optimal policy, $S$ is the set of states, $A$ is the set of actions and $K$ is the number of episodes. We additionally show that any learning algorithm must have at least $\Omega(B_\star \sqrt{|S| |A| K})$ regret in the worst case.
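
As a quick check on how close the two bounds quoted above are, one can divide the upper bound by the lower bound. This is only a back-of-the-envelope comparison that ignores the logarithmic factors hidden by $\widetilde{O}$:

```latex
\[
\frac{B_\star \, |S| \sqrt{|A| K}}{B_\star \sqrt{|S| \, |A| \, K}}
  = \frac{|S|}{\sqrt{|S|}}
  = \sqrt{|S|}
\]
```

Under that reading, the upper and lower bounds agree in their dependence on $B_\star$, $|A|$ and $K$, and differ by a factor of $\sqrt{|S|}$ up to logarithmic terms.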

[152]  arXiv:2002.09877 [pdf, other]
Title: Automata for Hyperlanguages
Comments: 12 pages of main paper and another 10 pages appendix
Subjects: Formal Languages and Automata Theory (cs.FL); Computation and Language (cs.CL)

Hyperproperties lift conventional trace properties from a set of execution traces to a set of sets of execution traces. Hyperproperties have been shown to be a powerful formalism for expressing and reasoning about information-flow security policies and important properties of cyber-physical systems such as sensitivity and robustness, as well as consistency conditions in distributed computing such as linearizability. Although there is an extensive body of work on automata-based representation of trace properties, we currently lack such characterization for hyperproperties. We introduce hyperautomata for hyperlanguages, which are languages over sets of words. Essentially, hyperautomata allow running multiple quantified words over an automaton. We propose a specific type of hyperautomata called nondeterministic finite hyperautomata (NFH), which accept regular hyperlanguages. We demonstrate the ability of regular hyperlanguages to express hyperproperties for finite traces. We then explore the fundamental properties of NFH and show their closure under the Boolean operations. We show that while nonemptiness is undecidable in general, it is decidable for several fragments of NFH. We further show the decidability of the membership problem for finite sets and regular languages for NFH, as well as the containment problem for several fragments of NFH. Finally, we introduce learning algorithms based on Angluin's L-star algorithm for the fragments of NFH in which the quantification is either strictly universal or strictly existential.

[153]  arXiv:2002.09880 [pdf, other]
Title: Mixed Integer Programming for Searching Maximum Quasi-Bicliques
Comments: This paper draft is stored here for self-archiving purposes
Journal-ref: Springer Proceedings in Mathematics & Statistics, vol 315. Springer, Cham (2020)
Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Social and Information Networks (cs.SI); Optimization and Control (math.OC)

This paper is related to the problem of finding the maximal quasi-bicliques in a bipartite graph (bigraph). A quasi-biclique in the bigraph is its "almost" complete subgraph. The relaxation of completeness can be understood variously; here, we assume that the subgraph is a $\gamma$-quasi-biclique if it lacks a certain number of edges to form a biclique such that its density is at least $\gamma \in (0,1]$. For a bigraph and fixed $\gamma$, the problem of searching for the maximal quasi-biclique consists of finding a subset of vertices of the bigraph such that the induced subgraph is a quasi-biclique and its size is maximal for a given graph. Several models based on Mixed Integer Programming (MIP) to search for a quasi-biclique are proposed and tested for working efficiency. An alternative model inspired by biclustering is formulated and tested; this model simultaneously maximizes both the size of the quasi-biclique and its density, using the least-square criterion similar to the one exploited by triclustering TriBox.
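
The density condition in the definition above can be illustrated with a small check. This is a minimal sketch, assuming density means the fraction of the possible left-right edges that are present in the induced subgraph; the function names and the toy graph are hypothetical and not taken from the paper, and the sketch only verifies a candidate rather than searching for a maximal one.

```python
from itertools import product


def density(edges, left, right):
    """Fraction of the |left| * |right| possible edges present in `edges`."""
    if not left or not right:
        raise ValueError("both vertex sets must be non-empty")
    present = sum((u, v) in edges for u, v in product(left, right))
    return present / (len(left) * len(right))


def is_quasi_biclique(edges, left, right, gamma):
    """True if the induced subgraph has density at least gamma, gamma in (0, 1]."""
    if not 0 < gamma <= 1:
        raise ValueError("gamma must lie in (0, 1]")
    return density(edges, left, right) >= gamma


# Toy bigraph: left vertices are letters, right vertices are integers.
EDGES = {("a", 1), ("a", 2), ("b", 1), ("b", 2), ("c", 1)}
```

With this toy graph, the subgraph induced by `{"a", "b", "c"}` and `{1, 2}` has five of six possible edges, so it passes for `gamma = 0.8` and fails for `gamma = 0.9`.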

[154]  arXiv:2002.09884 [pdf, other]
Title: Discriminative Particle Filter Reinforcement Learning for Complex Partial Observations
Comments: Accepted to ICLR 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Deep reinforcement learning is successful in decision making for sophisticated games, such as Atari, Go, etc. However, real-world decision making often requires reasoning with partial information extracted from complex visual observations. This paper presents Discriminative Particle Filter Reinforcement Learning (DPFRL), a new reinforcement learning framework for complex partial observations. DPFRL encodes a differentiable particle filter in the neural network policy for explicit reasoning with partial observations over time. The particle filter maintains a belief using a learned discriminative update, which is trained end-to-end for decision making. We show that using the discriminative update instead of standard generative models results in significantly improved performance, especially for tasks with complex visual observations, because it circumvents the difficulty of modeling complex observations that are irrelevant to decision making. In addition, to extract features from the particle belief, we propose a new type of belief feature based on the moment generating function. DPFRL outperforms state-of-the-art POMDP RL models in Flickering Atari Games, an existing POMDP RL benchmark, and in Natural Flickering Atari Games, a new, more challenging POMDP RL benchmark introduced in this paper. Further, DPFRL performs well for visual navigation with real-world data in the Habitat environment.

[155]  arXiv:2002.09885 [pdf, other]
Title: Speeding up the AIFV-$2$ dynamic programs by two orders of magnitude using Range Minimum Queries
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT)

AIFV-$2$ codes are a new method for constructing lossless codes for memoryless sources that provide better worst-case redundancy than Huffman codes. They do this by using two code trees instead of one and also allowing some bounded delay in the decoding process. Known algorithms for constructing AIFV-code are iterative; at each step they replace the current code tree pair with a "better" one. The current state of the art for performing this replacement is a pair of Dynamic Programming (DP) algorithms that use $O(n^5)$ time to fill in two tables, each of size $O(n^3)$ (where $n$ is the number of different characters in the source).
This paper describes how to reduce the time for filling in the DP tables by two orders of magnitude, down to $O(n^3)$. It does this by introducing a grouping technique that permits separating the $\Theta(n^3)$-space tables into $\Theta(n)$ groups, each of size $O(n^2)$, and then using Two-Dimensional Range-Minimum Queries (RMQs) to fill in that group's table entries in $O(n^2)$ time. This RMQ speedup technique seems to be new and might be of independent interest.
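
For readers unfamiliar with range-minimum queries, the following is a minimal sketch of the standard one-dimensional sparse-table RMQ, which answers each query in constant time after O(n log n) preprocessing. It is only an illustration of the query primitive: the paper uses two-dimensional RMQs inside its grouping technique, which this sketch does not reproduce, and the function names are hypothetical.

```python
def build_sparse_table(values):
    """Return table where table[j][i] == min(values[i : i + 2**j])."""
    n = len(values)
    table = [list(values)]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        # A window of length 2**j is the union of two windows of length 2**(j-1).
        table.append(
            [min(prev[i], prev[i + half]) for i in range(n - (1 << j) + 1)]
        )
        j += 1
    return table


def range_min(table, lo, hi):
    """Minimum of values[lo:hi] (hi exclusive, lo < hi), in O(1) time."""
    if not 0 <= lo < hi <= len(table[0]):
        raise IndexError("require 0 <= lo < hi <= len(values)")
    j = (hi - lo).bit_length() - 1  # largest j with 2**j <= hi - lo
    # Two overlapping windows of length 2**j cover the whole range.
    return min(table[j][lo], table[j][hi - (1 << j)])
```

A DP that repeatedly needs the minimum over a contiguous range of already-computed entries can replace a linear scan with one `range_min` call, which is the general kind of saving the abstract describes.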

[156]  arXiv:2002.09891 [pdf, other]
Title: End-To-End Graph-based Deep Semi-Supervised Learning
Comments: 5 figures, 6 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The quality of a graph is determined jointly by three key factors of the graph: nodes, edges and similarity measure (or edge weights), and is crucial to the success of graph-based semi-supervised learning (SSL) approaches. Recently, dynamic graphs, in which part or all of these factors are dynamically updated during the training process, have been shown to be promising for graph-based semi-supervised learning. However, existing approaches only update part of the three factors and keep the rest manually specified during the learning stage. In this paper, we propose a novel graph-based semi-supervised learning approach to optimize all three factors simultaneously in an end-to-end learning fashion. To this end, we concatenate two neural networks (feature network and similarity network) together to learn the categorical label and semantic similarity, respectively, and train the networks to minimize a unified SSL objective function. We also introduce an extended graph Laplacian regularization term to increase training efficiency. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our approach.

[157]  arXiv:2002.09893 [pdf, other]
Title: Efficient Compression of Long Arbitrary Sequences with No Reference at the Encoder
Subjects: Information Theory (cs.IT)

In a distributed information application an encoder compresses an arbitrary vector while a similar reference vector is available to the decoder as side information. For the Hamming-distance similarity measure, and when guaranteed perfect reconstruction is required, we present two contributions to the solution of this problem. One result shows that when a set of potential reference vectors is available to the encoder, lower compression rates can be achieved when the set satisfies a certain clustering property. Another result reduces the best known decoding complexity from exponential in the vector length $n$ to $O(n^{1.5})$ by generalized concatenation of inner coset codes and outer error-correcting codes. One potential application of the results is the compression of DNA sequences, where similar (but not identical) reference vectors are shared among senders and receivers.

[158]  arXiv:2002.09895 [pdf, other]
Title: Treeplication: An Erasure Code for Distributed Full Recovery under the Random Multiset Channel
Subjects: Information Theory (cs.IT)

This paper presents a new erasure code called Treeplication designed for distributed recovery of the full information word, while most prior work in coding for distributed storage only supports distributed repair of individual symbols. A Treeplication code for $k$ information symbols is defined on a binary tree with $2k-1$ vertices, along with a distribution for selecting code symbols from the tree layers. We analyze and optimize the code under a random-multiset model, which captures the system property that the nodes available for recovery are drawn randomly from the nodes storing the code symbols. Treeplication codes are shown to have full-recovery communication-cost comparable to replication, while offering much better recoverability.

[159]  arXiv:2002.09896 [pdf, other]
Title: Adversarial Attack on DL-based Massive MIMO CSI Feedback
Comments: 12 pages, 5 figures, 1 table. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

With the increasing application of deep learning (DL) algorithms in wireless communications, the physical layer faces new challenges caused by adversarial attacks. Such attacks have significantly affected neural networks in computer vision. We choose DL-based analog channel state information (CSI) feedback to show the effect of adversarial attacks on DL-based communication systems. We present a practical method to craft a white-box adversarial attack on the DL-based CSI feedback process. Our simulation results show the destructive effect of the adversarial attack on DL-based CSI feedback by analyzing the normalized mean square error. We also launched a jamming attack for comparison and found that the jamming attack could be prevented with certain precautions. As DL algorithms become the trend in developing wireless communication, this work raises concerns regarding the security of DL-based algorithms.

[160]  arXiv:2002.09898 [pdf, other]
Title: Efficient numerical methods for computing the stationary states of phase field crystal models
Comments: 26 pages, 5 figures
Subjects: Numerical Analysis (math.NA)

Finding the stationary states of a free energy functional is an important problem in phase field crystal (PFC) models. Many efforts have been devoted to designing numerical schemes with energy dissipation and mass conservation properties. However, most existing approaches are time-consuming due to the requirement of small effective time steps. In this paper, we discretize the energy functional and propose efficient numerical algorithms for solving the constrained non-convex minimization problem. A class of first order approaches, the so-called adaptive accelerated Bregman proximal gradient (AA-BPG) methods, is proposed and the convergence property is established without the global Lipschitz constant requirements. Moreover, we design a hybrid approach that applies an inexact Newton method to further accelerate the local convergence. One key feature of our algorithm is that the energy dissipation and mass conservation properties hold during the iteration process. Extensive numerical experiments, including two three dimensional periodic crystals in Landau-Brazovskii (LB) model and a two dimensional quasicrystal in Lifshitz-Petrich (LP) model, demonstrate that our approaches have adaptive time steps which lead to a significant acceleration over many existing methods when computing complex structures.

[161]  arXiv:2002.09901 [pdf]
Title: A Nepali Rule Based Stemmer and its performance on different NLP applications
Comments: 5 pages, 2 figures, 3 tables
Journal-ref: Proceedings of the 4th International IT Conference on ICT with Smart Computing and 9th National Students' Conference on Information Technology, (NaSCoIT 2018), Kathmandu, Nepal, ISSN No 2505-1075, pp. 16 (December 2018)
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Stemming is an integral part of Natural Language Processing (NLP). It is a preprocessing step in almost every NLP application. Arguably, the most important usage of stemming is in Information Retrieval (IR). While much work has been done on stemming in languages like English, there are only a few works on Nepali stemming. This study focuses on creating a Rule Based stemmer for Nepali text. Specifically, it is an affix stripping system that identifies two different classes of suffixes in Nepali grammar and strips them separately. Only a single negativity prefix (Na) is identified and stripped. This study focuses on a number of techniques like exception word identification, morphological normalization and word transformation to increase stemming performance. The stemmer is tested intrinsically using Paice's method and extrinsically on a basic tf-idf based IR system and an elementary news topic classifier using Multinomial Naive Bayes Classifier. The difference in performance of these systems with and without using the stemmer is analysed.
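
The affix-stripping pipeline described above can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the romanized suffix lists, the exception word, the minimum stem length and the stripping order are hypothetical placeholders, not the rule sets from the paper, and the sketch omits morphological normalization and word transformation.

```python
# Hypothetical, romanized placeholder rules (not the paper's actual lists).
CLASS_ONE_SUFFIXES = ("haru",)            # first suffix class
CLASS_TWO_SUFFIXES = ("lai", "le", "ko")  # second suffix class
NEGATION_PREFIX = "na"                    # the single prefix that is stripped
EXCEPTIONS = {"naya"}                     # words that must be left untouched
MIN_STEM_LENGTH = 2                       # guard against over-stripping


def strip_suffix(word, suffixes):
    """Remove the longest matching suffix, if enough of the word remains."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def stem(word):
    """Strip each suffix class separately, then the negation prefix."""
    word = word.lower()
    if word in EXCEPTIONS:
        return word
    word = strip_suffix(word, CLASS_TWO_SUFFIXES)  # assumed to be outermost
    word = strip_suffix(word, CLASS_ONE_SUFFIXES)
    remaining = len(word) - len(NEGATION_PREFIX)
    if word.startswith(NEGATION_PREFIX) and remaining >= MIN_STEM_LENGTH:
        word = word[len(NEGATION_PREFIX):]
    return word
```

Under these placeholder rules, `stem("keteharulai")` strips `lai` and then `haru`, while the exception list keeps `naya` from losing its leading `na`.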

[162]  arXiv:2002.09905 [pdf, other]
Title: Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction
Comments: Submitted to a conference, under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video prediction is a pixel-wise dense prediction task to infer future frames based on past frames. Missing appearance details and motion blur are still two major problems for current predictive models, which lead to image distortion and temporal inconsistency. In this paper, we point out the necessity of exploring multi-frequency analysis to deal with the two problems. Inspired by the frequency band decomposition characteristic of Human Vision System (HVS), we propose a video prediction network based on multi-level wavelet analysis to deal with spatial and temporal information in a unified manner. Specifically, the multi-level spatial discrete wavelet transform decomposes each video frame into anisotropic sub-bands with multiple frequencies, helping to enrich structural information and reserve fine details. On the other hand, multi-level temporal discrete wavelet transform which operates on time axis decomposes the frame sequence into sub-band groups of different frequencies to accurately capture multi-frequency motions under a fixed frame rate. Extensive experiments on diverse datasets demonstrate that our model shows significant improvements on fidelity and temporal consistency over state-of-the-art works.
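
To make the idea of a multi-level discrete wavelet decomposition concrete, here is a minimal one-dimensional sketch using the Haar wavelet. The choice of Haar and the function names are assumptions for illustration; the abstract does not say which wavelet the network uses, and the paper applies the transform spatially to frames and along the time axis rather than to a plain list.

```python
import math


def haar_step(signal):
    """One Haar DWT level: return (approximation, detail) coefficient lists."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / scale for a, b in pairs]  # low-frequency sub-band
    detail = [(a - b) / scale for a, b in pairs]  # high-frequency sub-band
    return approx, detail


def haar_multilevel(signal, levels):
    """Apply haar_step repeatedly to the approximation band.

    Returns the final approximation and a list of detail bands,
    ordered from the finest level to the coarsest.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details
```

Each level halves the length of the approximation band and peels off one band of higher-frequency detail, so a sequence of frames sampled at a fixed rate is split into sub-bands of different temporal frequencies.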

[163]  arXiv:2002.09907 [pdf, ps, other]
Title: Performance Analysis of Intelligent Reflecting Surface Assisted NOMA Networks
Comments: 13 pages, 11 figures
Subjects: Information Theory (cs.IT)

Intelligent reflecting surface (IRS) is a promising technology to enhance the coverage and performance of wireless networks. We consider the application of IRS to non-orthogonal multiple access (NOMA), where a base station transmits superposed signals to multiple users by virtue of an IRS. The performance of IRS-assisted NOMA networks with imperfect successive interference cancellation (ipSIC) and perfect successive interference cancellation (pSIC) is investigated by invoking a 1-bit coding scheme. In particular, we derive new exact and asymptotic expressions for both outage probability and ergodic rate of the $m$-th user with ipSIC/pSIC. Based on analytical results, the diversity order of the $m$-th user with pSIC is in connection with the number of reflecting elements and channel ordering. The high signal-to-noise ratio (SNR) slope of ergodic rate for the $m$-th user is obtained. The throughput and energy efficiency of non-orthogonal users for IRS-NOMA are discussed both in delay-limited and delay-tolerant transmission modes. Additionally, we derive new exact expressions of outage probability and ergodic rate for IRS-assisted orthogonal multiple access (IRS-OMA). Numerical results are presented to substantiate our analyses and demonstrate that: i) The outage behaviors of IRS-NOMA are superior to those of IRS-OMA and relaying schemes; ii) With increasing the number of reflecting elements, IRS-NOMA is capable of achieving enhanced outage performance; and iii) The M-th user has a larger ergodic rate compared to IRS-OMA and benchmarks. However, the ergodic performance of the $m$-th user exceeds relaying schemes in the low SNR regime.

[164]  arXiv:2002.09917 [pdf, other]
Title: Improve SGD Training via Aligning Mini-batches
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Deep neural networks (DNNs) for supervised learning can be viewed as a pipeline of a feature extractor (i.e. last hidden layer) and a linear classifier (i.e. output layer) that is trained jointly with stochastic gradient descent (SGD). In each iteration of SGD, a mini-batch from the training data is sampled and the true gradient of the loss function is estimated as the noisy gradient calculated on this mini-batch. From the feature learning perspective, the feature extractor should be updated to learn meaningful features with respect to the entire data, and reduce the accommodation to noise in the mini-batch. With this motivation, we propose In-Training Distribution Matching (ITDM) to improve DNN training and reduce overfitting. Specifically, along with the loss function, ITDM regularizes the feature extractor by matching the moments of distributions of different mini-batches in each iteration of SGD, which is fulfilled by minimizing the maximum mean discrepancy. As such, ITDM does not assume any explicit parametric form of data distribution in the latent feature space. Extensive experiments are conducted to demonstrate the effectiveness of our proposed strategy.
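
The quantity being minimized, the maximum mean discrepancy (MMD) between the features of two mini-batches, can be sketched in a few lines. This is a minimal illustration, assuming an RBF kernel with a fixed bandwidth and the biased (V-statistic) estimator; the abstract does not specify the kernel or estimator, and the function names are hypothetical.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(batch_a, batch_b, bandwidth=1.0):
    """Biased estimate of the squared MMD between two batches of features."""
    return (
        mean_kernel(batch_a, batch_a, bandwidth)
        + mean_kernel(batch_b, batch_b, bandwidth)
        - 2.0 * mean_kernel(batch_a, batch_b, bandwidth)
    )
```

In a training loop, a term like `mmd_squared(features_1, features_2)` computed on the last-hidden-layer features of two mini-batches would be added to the loss as a regularizer. Because the estimate compares samples through a kernel, it needs no parametric model of the feature distribution.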

[165]  arXiv:2002.09919 [pdf, other]
Title: Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Multi-hop question answering (QA) requires a model to retrieve and integrate information from different parts of a long text to answer a question. Humans answer this kind of complex question via a divide-and-conquer approach. In this paper, we investigate whether top-performing models for multi-hop questions understand the underlying sub-questions like humans. We adopt a neural decomposition model to generate sub-questions for a multi-hop complex question, followed by extracting the corresponding sub-answers. We show that multiple state-of-the-art multi-hop QA models fail to correctly answer a large portion of sub-questions, although their corresponding multi-hop questions are correctly answered. This indicates that these models manage to answer the multi-hop questions using some partial clues, instead of truly understanding the reasoning paths. We also propose a new model which significantly improves the performance on answering the sub-questions. Our work takes a step forward towards building a more explainable multi-hop QA system.

[166]  arXiv:2002.09923 [pdf, other]
Title: Monocular Direct Sparse Localization in a Prior 3D Surfel Map
Comments: 7 pages, 6 figures, to appear in ICRA 2020
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

In this paper, we introduce an approach to tracking the pose of a monocular camera in a prior surfel map. By rendering vertex and normal maps from the prior surfel map, the global planar information for the sparse tracked points in the image frame is obtained. The tracked points with and without the global planar information involve both global and local constraints of frames to the system. Our approach formulates all constraints in the form of direct photometric errors within a local window of the frames. The final optimization utilizes these constraints to provide the accurate estimation of global 6-DoF camera poses with the absolute scale. The extensive simulation and real-world experiments demonstrate that our monocular method can provide accurate camera localization results under various conditions.

[167]  arXiv:2002.09925 [pdf, other]
Title: ORCSolver: An Efficient Solver for Adaptive GUI Layout with OR-Constraints
Comments: Published at CHI2020
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)

OR-constrained (ORC) graphical user interface layouts unify conventional constraint-based layouts with flow layouts, which enables the definition of flexible layouts that adapt to screens with different sizes, orientations, or aspect ratios with only a single layout specification. Unfortunately, solving ORC layouts with current solvers is time-consuming and the needed time increases exponentially with the number of widgets and constraints. To address this challenge, we propose ORCSolver, a novel solving technique for adaptive ORC layouts, based on a branch-and-bound approach with heuristic preprocessing. We demonstrate that ORCSolver simplifies ORC specifications at runtime and our approach can solve ORC layout specifications efficiently at near-interactive rates.

[168]  arXiv:2002.09927 [pdf, other]
Title: Weighting Is Worth the Wait: Bayesian Optimization with Importance Sampling
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Many contemporary machine learning models require extensive tuning of hyperparameters to perform well. A variety of methods, such as Bayesian optimization, have been developed to automate and expedite this process. However, tuning remains extremely costly as it typically requires repeatedly fully training models. We propose to accelerate the Bayesian optimization approach to hyperparameter tuning for neural networks by taking into account the relative amount of information contributed by each training example. To do so, we leverage importance sampling (IS); this significantly increases the quality of the black-box function evaluations, but also their runtime, and so must be done carefully. Casting hyperparameter search as a multi-task Bayesian optimization problem over both hyperparameters and importance sampling design achieves the best of both worlds: by learning a parameterization of IS that trades off evaluation complexity and quality, we improve upon Bayesian optimization state-of-the-art runtime and final validation error across a variety of datasets and complex neural architectures.

[169]  arXiv:2002.09928 [pdf, other]
Title: Predictive Sampling with Forecasting Autoregressive Models
Comments: 13 pages, 16 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Autoregressive models (ARMs) currently hold state-of-the-art performance in likelihood-based modeling of image and audio data. Generally, neural network based ARMs are designed to allow fast inference, but sampling from these models is impractically slow. In this paper, we introduce the predictive sampling algorithm: a procedure that exploits the fast inference property of ARMs in order to speed up sampling, while keeping the model intact. We propose two variations of predictive sampling, namely sampling with ARM fixed-point iteration and learned forecasting modules. Their effectiveness is demonstrated in two settings: i) explicit likelihood modeling on binary MNIST, SVHN and CIFAR10, and ii) discrete latent modeling in an autoencoder trained on SVHN, CIFAR10 and Imagenet32. Empirically, we show considerable improvements over baselines in number of ARM inference calls and sampling speed.
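
One plausible reading of "sampling with ARM fixed-point iteration" is sketched below on a toy model. The noise is drawn once, the model is evaluated on all positions in parallel using the current guess as input, and the pass is repeated until the guess stops changing. The toy logit function, the Gumbel-max sampling and all names are assumptions for illustration; this is not the paper's implementation and it omits the learned forecasting modules.

```python
import math
import random

NUM_CATEGORIES = 4


def toy_logits(prefix, position):
    """Hypothetical stand-in for an ARM: logits depend only on earlier symbols."""
    s = sum(prefix)
    return [math.sin((c + 1) * (s + 1) + position) for c in range(NUM_CATEGORIES)]


def pick(prefix, position, noise):
    """Gumbel-max choice for one position given its prefix and fixed noise."""
    logits = toy_logits(prefix, position)
    scores = [l + g for l, g in zip(logits, noise[position])]
    return max(range(NUM_CATEGORIES), key=scores.__getitem__)


def sample_sequential(noise):
    """Ordinary ancestral sampling: one model call per position."""
    x = []
    for i in range(len(noise)):
        x.append(pick(x, i, noise))
    return x


def parallel_pass(x, noise):
    """One 'inference' pass: every position is recomputed from the guess x."""
    return [pick(x[:i], i, noise) for i in range(len(x))]


def sample_fixed_point(noise):
    """Iterate parallel passes until the guess is a fixed point."""
    x = [0] * len(noise)
    passes = 0
    while True:
        new_x = parallel_pass(x, noise)
        passes += 1
        if new_x == x:
            return x, passes
        x = new_x


def gumbel_noise(length, seed):
    """Fixed Gumbel noise, one value per position and category."""
    rng = random.Random(seed)

    def draw():
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep logs finite
        return -math.log(-math.log(u))

    return [[draw() for _ in range(NUM_CATEGORIES)] for _ in range(length)]
```

After the t-th pass the first t positions already match the sequential sample, so the loop needs at most one pass more than the sequence length and returns exactly the sequential result. Any speed-up comes from cases where many positions settle early.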

[170]  arXiv:2002.09931 [pdf, other]
Title: The Value of Big Data for Credit Scoring: Enhancing Financial Inclusion using Mobile Phone Data and Social Network Analytics
Journal-ref: Applied Soft Computing, Volume 74, January 2019, Pages 26-39
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Machine Learning (cs.LG); Machine Learning (stat.ML)

Credit scoring is without a doubt one of the oldest applications of analytics. In recent years, a multitude of sophisticated classification techniques have been developed to improve the statistical performance of credit scoring models. Instead of focusing on the techniques themselves, this paper leverages alternative data sources to enhance both statistical and economic model performance. The study demonstrates how including call networks, in the context of positive credit information, as a new Big Data source has added value in terms of profit by applying a profit measure and profit-based feature selection. A unique combination of datasets, including call-detail records, credit and debit account information of customers is used to create scorecards for credit card applicants. Call-detail records are used to build call networks and advanced social network analytics techniques are applied to propagate influence from prior defaulters throughout the network to produce influence scores. The results show that combining call-detail records with traditional data in credit scoring models significantly increases their performance when measured in AUC. In terms of profit, the best model is the one built with only calling behavior features. In addition, the calling behavior features are the most predictive in other models, both in terms of statistical and economic performance. The results have an impact in terms of ethical use of call-detail records, regulatory implications, financial inclusion, as well as data sharing and privacy.

[171]  arXiv:2002.09941 [pdf, other]
Title: A Bridge between Polynomial Optimization and Games with Imperfect Recall
Subjects: Computer Science and Game Theory (cs.GT); Logic in Computer Science (cs.LO)

We provide several positive and negative complexity results for solving games with imperfect recall. Using a one-to-one correspondence between these games on one side and multivariate polynomials on the other side, we show that solving games with imperfect recall is as hard as solving certain problems of the first order theory of reals. We establish square root sum hardness even for the specific class of A-loss games. On the positive side, we find restrictions on games and strategies motivated by Bridge bidding that give polynomial-time complexity.

[172]  arXiv:2002.09942 [pdf, ps, other]
Title: How Good Is a Strategy in a Game With Nature?
Journal-ref: ACM Transactions on Computational Logic, Vol. 21, No 3, Article 21, pp. 1-39, February 2020
Subjects: Formal Languages and Automata Theory (cs.FL); Computer Science and Game Theory (cs.GT); Logic in Computer Science (cs.LO)

We consider games with two antagonistic players, Éloïse (modelling a program) and Abélard (modelling a byzantine environment), and a third, unpredictable and uncontrollable player, that we call Nature. Motivated by the fact that the usual probabilistic semantics very quickly leads to undecidability when considering either infinite game graphs or imperfect information, we propose two alternative semantics that lead to decidability where the probabilistic one fails: one based on counting and one based on topology.

[173]  arXiv:2002.09943 [pdf, other]
Title: Network Clustering Via Kernel-ARMA Modeling and the Grassmannian: The Brain-Network Case
Comments: arXiv admin note: substantial text overlap with arXiv:1906.02292
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

This paper introduces a clustering framework for networks with nodes annotated with time-series data. The framework addresses all types of network-clustering problems: State clustering, node clustering within states (a.k.a. topology identification or community detection), and even subnetwork-state-sequence identification/tracking. Via a bottom-up approach, features are first extracted from the raw nodal time-series data by kernel autoregressive-moving-average modeling to reveal non-linear dependencies and low-rank representations, and then mapped onto the Grassmann manifold (Grassmannian). All clustering tasks are performed by leveraging the underlying Riemannian geometry of the Grassmannian in a novel way. To validate the proposed framework, brain-network clustering is considered, where extensive numerical tests on synthetic and real functional magnetic resonance imaging (fMRI) data demonstrate that the advocated learning framework compares favorably versus several state-of-the-art clustering schemes.

[174]  arXiv:2002.09945 [pdf, other]
Title: On the Estimation of Complex Circuits Functional Failure Rate by Machine Learning Techniques
Comments: arXiv admin note: text overlap with arXiv:2002.08882
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

De-Rating or Vulnerability Factors are a major feature of failure analysis efforts mandated by today's Functional Safety requirements. Determining the Functional De-Rating of sequential logic cells typically requires computationally intensive fault-injection simulation campaigns. In this paper, a new approach is proposed which uses Machine Learning to estimate the Functional De-Rating of individual flip-flops, thus optimising and enhancing fault injection efforts. First, a set of per-instance features is described and extracted through an analysis approach combining static elements (cell properties, circuit structure, synthesis attributes) and dynamic elements (signal activity). Second, reference data is obtained through first-principles fault simulation approaches. Finally, one part of the reference dataset is used to train the Machine Learning algorithm and the remainder is used to validate and benchmark the accuracy of the trained tool. The intended goal is to obtain a trained model able to provide accurate per-instance Functional De-Rating data for the full list of circuit instances, an objective that is difficult to reach using classical methods. The presented methodology is accompanied by a practical example to determine the performance of various Machine Learning models for different training sizes.

[175]  arXiv:2002.09949 [pdf, other]
Title: Path Outlines: Browsing Path-Based Summaries of Linked Open Datasets
Comments: 13 pages, 9 figures
Subjects: Human-Computer Interaction (cs.HC)

Linked Data (LD) are structured sources of information, such as DBpedia or Geonames, that can be linked together and queried. The information they contain is atomized into triples, each triple being a simple statement composed of a subject, a predicate and an object. Triples can then be combined to form higher-level statements following information needs. This granularity makes it difficult to produce overviews of LD content. We therefore introduce the concept of path-based summaries, which carry a higher level of semantics for data producers. We also introduce the tool Path Outlines to support LD producers in browsing path-based summaries of their datasets. We present its interface based on the broken (out)lines layout algorithm and the path browser visualisation.
Our approach, reifying chains of statements into path outlines, was informed by the observation of LD producers, and we report a characterisation of their needs. We compare Path Outlines with the current baseline technique (Virtuoso SPARQL query editor) in an experiment with 36 participants. We show that participants prefer Path Outlines, find it easier to understand, easier to use, and faster, and that it lowers the number of tasks that users give up on before completing them.

[176]  arXiv:2002.09951 [pdf]
Title: Multi-Stream Networks and Ground-Truth Generation for Crowd Counting
Comments: this https URL
Journal-ref: The International Journal of Electrical and Computer Engineering Systems 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Crowd scene analysis has received a lot of attention recently due to its wide variety of applications, for instance, forensic science, urban planning, surveillance and security. In this context, a challenging task is crowd counting, whose main purpose is to estimate the number of people present in a single image. A Multi-Stream Convolutional Neural Network is developed and evaluated in this work, which receives an image as input and produces a density map that represents the spatial distribution of people in an end-to-end fashion. In order to address complex crowd counting issues, such as extremely unconstrained scale and perspective changes, the network architecture utilizes receptive fields with filters of different sizes for each stream. In addition, we investigate the influence of the two most common approaches to the generation of ground truths and propose a hybrid method based on tiny face detection and scale interpolation. Experiments conducted on two challenging datasets, UCF-CC-50 and ShanghaiTech, demonstrate that using our ground truth generation methods achieves superior results.

[177]  arXiv:2002.09956 [pdf, other]
Title: De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and Non-smooth Predictors
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In spite of several notable efforts, explaining the generalization of deterministic deep nets, e.g., ReLU-nets, has remained challenging. Existing approaches usually need to bound the Lipschitz constant of such deep nets, but such bounds have been shown to increase substantially with the number of training samples, yielding vacuous generalization bounds [Nagarajan and Kolter, 2019a]. In this paper, we present new de-randomized PAC-Bayes margin bounds for deterministic non-convex and non-smooth predictors, e.g., ReLU-nets. The bounds depend on a trade-off between the $L_2$-norm of the weights and the effective curvature (`flatness') of the predictor, avoid any dependency on the Lipschitz constant, and yield meaningful (decreasing) bounds as the training set size increases. Our analysis first develops a de-randomization argument for non-convex but smooth predictors, e.g., linear deep networks (LDNs). We then consider non-smooth predictors which for any given input realize as a smooth predictor, e.g., ReLU-nets become some LDN for a given input, but the realized smooth predictor can be different for different inputs.
For such non-smooth predictors, we introduce a new PAC-Bayes analysis that maintains distributions over the structure as well as parameters of smooth predictors, e.g., LDNs corresponding to ReLU-nets, which after de-randomization yields a bound for the deterministic non-smooth predictor. We present empirical results to illustrate the efficacy of our bounds over changing training set size and randomness in labels.

[178]  arXiv:2002.09958 [pdf, other]
Title: Gradual Channel Pruning while Training using Feature Relevance Scores for Convolutional Neural Networks
Comments: 10 pages, 6 figures, 5 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

The enormous inference cost of deep neural networks can be scaled down by network compression. Pruning is one of the predominant approaches used for deep network compression. However, existing pruning techniques have one or more of the following limitations: 1) Additional energy cost on top of the compute-heavy training stage due to pruning and fine-tuning stages, 2) Layer-wise pruning based on the statistics of a particular layer, ignoring the effect of error propagation in the network, 3) Lack of an efficient estimate for determining the important channels globally, 4) Unstructured pruning requires specialized hardware for effective use. To address all the above issues, we present a simple-yet-effective methodology for gradual channel pruning while training, using a novel data-driven metric referred to as the feature relevance score. The proposed technique gets rid of the additional retraining cycles by pruning the least important channels in a structured fashion at fixed intervals during the actual training phase. Feature relevance scores help in efficiently evaluating the contribution of each channel towards the discriminative power of the network. We demonstrate the effectiveness of the proposed methodology on architectures such as VGG and ResNet using datasets such as CIFAR-10, CIFAR-100 and ImageNet, and successfully achieve significant model compression while trading off less than $1\%$ accuracy. Notably, for ResNet-110 trained on the CIFAR-10 dataset, our approach achieves $2.4\times$ compression and a $56\%$ reduction in FLOPs with an accuracy drop of $0.01\%$ compared to the unpruned network.

[179]  arXiv:2002.09963 [pdf, other]
Title: Mitigating Class Boundary Label Uncertainty to Reduce Both Model Bias and Variance
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The study of model bias and variance with respect to decision boundaries is critically important in supervised classification. There is generally a tradeoff between the two, as fine-tuning of the decision boundary of a classification model to accommodate more boundary training samples (i.e., higher model complexity) may improve training accuracy (i.e., lower bias) but hurt generalization against unseen data (i.e., higher variance). By focusing on just classification boundary fine-tuning and model complexity, it is difficult to reduce both bias and variance. To overcome this dilemma, we take a different perspective and investigate a new approach to handle inaccuracy and uncertainty in the training data labels, which are inevitable in many applications where labels are conceptual and labeling is performed by human annotators. The process of classification can be undermined by uncertainty in the labels of the training data; extending a boundary to accommodate an inaccurately labeled point will increase both bias and variance. Our novel method can reduce both bias and variance by estimating the pointwise label uncertainty of the training set and accordingly adjusting the training sample weights such that those samples with high uncertainty are weighted down and those with low uncertainty are weighted up. In this way, uncertain samples have a smaller contribution to the objective function of the model's learning algorithm and exert less pull on the decision boundary. In a real-world physical activity recognition case study, the data presents many labeling challenges, and we show that this new approach improves model performance and reduces model variance.
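
The reweighting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how pointwise label uncertainty is estimated or how it is mapped to weights, so the mapping `1 - uncertainty` (rescaled to mean 1) and the names `uncertainty_weights` and `weighted_log_loss` are assumptions chosen for illustration.

```python
import math


def uncertainty_weights(uncertainty):
    """Map per-sample label uncertainty in [0, 1] to weights with mean 1.

    Assumed mapping (not from the paper): weight is proportional to
    1 - uncertainty. At least one sample must have uncertainty below 1.
    """
    raw = [1.0 - u for u in uncertainty]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def weighted_log_loss(probs, labels, weights):
    """Weighted binary cross-entropy; uncertain samples contribute less."""
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)


# Hypothetical uncertainties: the third label is treated as fully unreliable.
weights = uncertainty_weights([0.0, 0.5, 1.0])
loss_a = weighted_log_loss([0.9, 0.8, 0.1], [1, 1, 1], weights)
loss_b = weighted_log_loss([0.9, 0.8, 0.7], [1, 1, 1], weights)
```

With these weights the third sample has weight zero, so changing the model's prediction for it leaves the objective unchanged, which is the sense in which an uncertain sample exerts less pull on the decision boundary.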

[180]  arXiv:2002.09964 [pdf, other]
Title: Quantized Push-sum for Gossip and Decentralized Optimization over Directed Graphs
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Signal Processing (eess.SP); Systems and Control (eess.SY)

We consider a decentralized stochastic learning problem where data points are distributed among computing nodes communicating over a directed graph. As the model size gets large, decentralized learning faces a major bottleneck: the heavy communication load due to each node transmitting large messages (model updates) to its neighbors. To tackle this bottleneck, we propose a quantized decentralized stochastic learning algorithm over directed graphs that is based on the push-sum algorithm in decentralized consensus optimization. More importantly, we prove that our algorithm achieves the same convergence rates as the decentralized stochastic learning algorithm with exact communication for both convex and non-convex losses. A key technical challenge of the work is to prove \emph{exact convergence} of the proposed decentralized learning algorithm in the presence of quantization noise with unbounded variance over directed graphs. We provide numerical evaluations that corroborate our main theoretical results and illustrate significant speed-up compared to the exact-communication methods.
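
For readers unfamiliar with the push-sum building block named in the abstract above, the following is a minimal sketch of the classical, unquantized push-sum averaging protocol on a directed graph. It does not include the paper's quantization or its stochastic learning updates; the function name `push_sum`, the example graph, and the round count are assumptions for illustration.

```python
def push_sum(values, out_neighbors, rounds=200):
    """Plain (unquantized) push-sum averaging on a directed graph.

    values: initial number held by each node.
    out_neighbors: out_neighbors[i] lists the nodes that node i sends to.
    Returns each node's estimate x_i / w_i of the global average.
    """
    n = len(values)
    x = [float(v) for v in values]  # running value mass
    w = [1.0] * n                   # running weight mass
    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            # Each node splits its mass equally among itself and its
            # out-neighbors, so total mass is conserved (column-stochastic).
            targets = [i] + list(out_neighbors[i])
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    return [xi / wi for xi, wi in zip(x, w)]


# Hypothetical strongly connected directed graph on 4 nodes.
graph = {0: [1], 1: [2], 2: [3, 0], 3: [0]}
estimates = push_sum([1.0, 2.0, 3.0, 6.0], graph)
```

Each node only needs to know its own out-degree, which is why push-sum suits directed graphs: the ratio x_i / w_i corrects for the imbalance that a merely column-stochastic mixing step introduces.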

[181]  arXiv:2002.09971 [pdf, other]
Title: Rapidly Personalizing Mobile Health Treatment Policies with Limited Data
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)

In mobile health (mHealth), reinforcement learning algorithms that adapt to one's context without learning personalized policies might fail to distinguish between the needs of individuals. Yet the high amount of noise due to the in situ delivery of mHealth interventions can cripple the ability of an algorithm to learn when given access to only a single user's data, making personalization challenging. We present IntelligentPooling, which learns personalized policies via an adaptive, principled use of other users' data. We show that IntelligentPooling achieves an average of 26% lower regret than state-of-the-art across all generative models. Additionally, we inspect the behavior of this approach in a live clinical trial, demonstrating its ability to learn from even a small group of users.

[182]  arXiv:2002.09972 [pdf, other]
Title: Structural Parameterizations with Modulator Oblivion
Subjects: Data Structures and Algorithms (cs.DS)

It is known that problems like Vertex Cover, Feedback Vertex Set and Odd Cycle Transversal are polynomial time solvable in the class of chordal graphs. We consider these problems in a graph that has at most $k$ vertices whose deletion results in a chordal graph, when parameterized by $k$. While this investigation fits naturally into the recent trend of what are called `structural parameterizations', here we assume that the deletion set is not given.
One method to solve them is to compute a $k$-sized or an approximate ($f(k)$ sized, for a function $f$) chordal vertex deletion set and then use the structural properties of the graph to design an algorithm. This method leads to at least $k^{\mathcal{O}(k)}n^{\mathcal{O}(1)}$ running time when we use the known parameterized or approximation algorithms for finding a $k$-sized chordal deletion set on an $n$ vertex graph.
In this work, we design $2^{\mathcal{O}(k)}n^{\mathcal{O}(1)}$ time algorithms for these problems. Our algorithms do not compute a chordal vertex deletion set (or even an approximate solution). Instead, we construct a tree decomposition of the given graph in time $2^{\mathcal{O}(k)}n^{\mathcal{O}(1)}$ where each bag is a union of four cliques and $\mathcal{O}(k)$ vertices. We then apply standard dynamic programming algorithms over this special tree decomposition. This special tree decomposition can be of independent interest.
Our algorithms are adaptive (robust) in the sense that given an integer $k$, they detect whether the graph has a chordal vertex deletion set of size at most $k$ or output the special tree decomposition and solve the problem.
We also show lower bounds for the problems we deal with under the Strong Exponential Time Hypothesis (SETH).

[183]  arXiv:2002.09979 [pdf, other]
Title: Gaussian-Process-based Robot Learning from Demonstration
Comments: 7 pages, 10 figures
Subjects: Robotics (cs.RO)

Endowed with higher levels of autonomy, robots are required to perform increasingly complex manipulation tasks. Learning from demonstration is emerging as a promising paradigm for easily extending robot capabilities so that they adapt to unseen scenarios. We present a novel Gaussian-Process-based approach for learning manipulation skills from observations of a human teacher. This probabilistic representation makes it possible to generalize over multiple demonstrations and to encode the variability of uncertainty along the different phases of the task. In this paper, we address how Gaussian Processes can be used to effectively learn a policy from trajectories in task space. We also present a method to efficiently adapt the policy to fulfill new requirements, and to modulate the robot behavior as a function of task uncertainty. This approach is illustrated through a real-world application using the TIAGo robot.

[184]  arXiv:2002.09989 [pdf, other]
Title: Deriving a Usage-Independent Software Quality Metric
Subjects: Software Engineering (cs.SE)

Context: The extent of post-release use of software affects the number of faults, thus biasing quality metrics and adversely affecting associated decisions. The proprietary nature of usage data limited deeper exploration of this subject in the past. Objective: To determine how software faults and software use are related and how an accurate quality measure can be designed. Method: New users, usage intensity, usage frequency, exceptions, and release date and duration were measured for complex proprietary mobile applications for Android and iOS. We utilized Bayesian Network and Random Forest models to explain the interrelationships and to derive the usage-independent release quality measure. We investigated the interrelationship among various code complexity measures, usage (downloads), and number of issues for 520 NPM packages, derived a usage-independent quality measure from these analyses, and applied it to 4430 popular NPM packages to construct timelines for comparing the perceived quality (issues) and our derived measure of quality for these packages. Results: We found the number of new users to be the primary factor determining the number of exceptions, and found no direct link between the intensity and frequency of software usage and software faults. Release quality expressed as crashes per user was independent of other usage-related predictors, thus serving as a usage-independent measure of software quality. Usage also affected quality in NPM, where downloads were strongly associated with numbers of issues, even after taking the other code complexity measures into consideration. Conclusions: We expect our result and our proposed quality measure will help gauge the release quality of software more accurately and inspire further research in this area.

[185]  arXiv:2002.10002 [pdf, other]
Title: On Thompson Sampling with Langevin Algorithms
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Thompson sampling is a methodology for multi-armed bandit problems that is known to enjoy favorable performance in both theory and practice. It does, however, have a significant limitation computationally, arising from the need for samples from posterior distributions at every iteration. We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue. We construct quickly converging Langevin algorithms to generate approximate samples that have accuracy guarantees, and we leverage novel posterior concentration rates to analyze the regret of the resulting approximate Thompson sampling algorithm. Further, we specify the necessary hyper-parameters for the MCMC procedure to guarantee optimal instance-dependent frequentist regret while having low computational complexity. In particular, our algorithms take advantage of both posterior concentration and a sample reuse mechanism to ensure that only a constant number of iterations and a constant amount of data is needed in each round. The resulting approximate Thompson sampling algorithm has logarithmic regret and its computational complexity does not scale with the time horizon of the algorithm.

[186]  arXiv:2002.10003 [pdf, ps, other]
Title: NeurIPS 2019 Disentanglement Challenge: Improved Disentanglement through Aggregated Convolutional Feature Maps
Comments: Disentanglement Challenge - 33rd Conference on Neural Information Processing Systems (NeurIPS) - NeurIPS 2019
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

This report on our stage 1 submission to the NeurIPS 2019 disentanglement challenge presents a simple image preprocessing method for training VAEs that leads to improved disentanglement compared to directly using the images. In particular, we propose to use regionally aggregated feature maps extracted from CNNs pretrained on ImageNet. Our method achieved 2nd place in stage 1 of the challenge. Code is available at https://github.com/mseitzer/neurips2019-disentanglement-challenge.

[187]  arXiv:2002.10006 [pdf, other]
Title: Comparing the Parameter Complexity of Hypernetworks and the Embedding-Based Alternative
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In the context of learning to map an input $I$ to a function $h_I:\mathcal{X}\to \mathbb{R}$, we compare two alternative methods: (i) an embedding-based method, which learns a fixed function in which $I$ is encoded as a conditioning signal $e(I)$ and the learned function takes the form $h_I(x) = q(x,e(I))$, and (ii) hypernetworks, in which the weights $\theta_I$ of the function $h_I(x) = g(x;\theta_I)$ are given by a hypernetwork $f$ as $\theta_I=f(I)$.
We extend the theory of \cite{devore} and provide a lower bound on the complexity of neural networks as function approximators, i.e., the number of trainable parameters. This extension eliminates the requirement for the approximation method to be robust. Our results are then used to compare the complexities of $q$ and $g$, showing that under certain conditions and when letting the functions $e$ and $f$ be as large as we wish, $g$ can be smaller than $q$ by orders of magnitude. In addition, we show that for typical assumptions on the function to be approximated, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.

[188]  arXiv:2002.10007 [pdf, other]
Title: A Critical View of the Structural Causal Model
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In the univariate case, we show that by comparing the individual complexities of univariate cause and effect, one can identify the cause and the effect, without considering their interaction at all. In our framework, complexities are captured by the reconstruction error of an autoencoder that operates on the quantiles of the distribution. Comparing the reconstruction errors of the two autoencoders, one for each variable, is shown to perform surprisingly well on the accepted causality directionality benchmarks. Hence, the decision as to which of the two is the cause and which is the effect may not be based on causality but on complexity.
In the multivariate case, where one can ensure that the complexities of the cause and effect are balanced, we propose a new adversarial training method that mimics the disentangled structure of the causal model. We prove that in the multidimensional case, such modeling is likely to fit the data only in the direction of causality. Furthermore, a uniqueness result shows that the learned model is able to identify the underlying causal and residual (noise) components. Our multidimensional method outperforms the literature methods on both synthetic and real world datasets.

[189]  arXiv:2002.10009 [pdf, other]
Title: Fighting Fire with Light: A Case for Defending DDoS Attacks Using the Optical Layer
Comments: 6 pages, 4 figures
Subjects: Cryptography and Security (cs.CR)

The DDoS attack landscape is growing at an unprecedented pace. Inspired by the recent advances in optical networking, we make a case for optical layer-aware DDoS defense (O-LAD) in this paper. Our approach leverages the optical layer to isolate attack traffic rapidly via dynamic reconfiguration of (backup) wavelengths using ROADMs---bridging the gap between (a) evolution of the DDoS attack landscape and (b) innovations in the optical layer (e.g., reconfigurable optics). We show that the physical separation of traffic profiles allows finer-grained handling of suspicious flows and offers better performance for benign traffic in the face of an attack. We present preliminary results modeling throughput and latency for legitimate flows while scaling the strength of attacks. We also identify a number of open problems for the security, optical, and systems communities: modeling diverse DDoS attacks (e.g., fixed vs. variable rate, detectable vs. undetectable), building a full-fledged defense system with optical advancements (e.g., OpenConfig), and optical layer-aware defenses for a broader class of attacks (e.g., network reconnaissance).

[190]  arXiv:2002.10010 [pdf, other]
Title: Driving with Data in the Motor City: Mining and Modeling Vehicle Fleet Maintenance Data
Subjects: Computers and Society (cs.CY)

The City of Detroit maintains an active fleet of over 2500 vehicles, spending an annual average of over \$5 million on purchases and over \$7.7 million on maintenance. Modeling patterns and trends in this data is of particular importance to a variety of stakeholders, particularly as Detroit emerges from Chapter 9 bankruptcy, but the structure in such data is complex, and the city lacks dedicated resources for in-depth analysis. The City of Detroit's Operations and Infrastructure Group and the University of Michigan initiated a collaboration which seeks to address this unmet need by analyzing data from the City of Detroit's vehicle fleet. This work presents a case study and provides the first data-driven benchmark, demonstrating a suite of methods to aid in data understanding and prediction for large vehicle maintenance datasets. We present analyses to address three key questions raised by the stakeholders, related to discovering multivariate maintenance patterns over time; predicting maintenance; and predicting vehicle- and fleet-level costs. We present a novel algorithm, PRISM, for automating multivariate sequential data analyses using tensor decomposition. This work is a first of its kind that presents both methodologies and insights to guide future civic data research.

[191]  arXiv:2002.10011 [pdf, other]
Title: Geometric Algebra Power Theory (GAPoT): Revisiting Apparent Power under Non-Sinusoidal Conditions
Subjects: Systems and Control (eess.SY)

Traditional power theories and one of their most important concepts --apparent power-- are still a source of debate and, as shown in the literature, they present several flaws that misinterpret the power-transfer phenomena under distorted grid conditions. In recent years, advanced mathematical tools such as geometric algebra (GA) have been applied to address these issues. However, the application of GA to electrical circuits requires more consensus, improvements and refinement. In this paper, power theories based on GA are revisited. Several drawbacks and inconsistencies of previous works are identified and modifications to the so-called geometric algebra power theory (GAPoT) are presented. This theory takes into account power components generated by cross-products between current and voltage harmonics in the frequency domain. Compared to other theories based on GA, it is compatible with the traditional definition of apparent power calculated as the product of RMS voltage and current. Also, mathematical developments are done in a multi-dimensional Euclidean space where the energy conservation principle is satisfied. The paper includes a basic example and experimental results in which measurements from a utility supply are analysed. Finally, suggestions for the extension to three-phase systems are drawn.

[192]  arXiv:2002.10016 [pdf, other]
Title: Deep Multimodal Image-Text Embeddings for Automatic Cross-Media Retrieval
Authors: Hadi Abdi Khojasteh (1), Ebrahim Ansari (1 and 2), Parvin Razzaghi (1 and 3), Akbar Karimi (4) ((1) Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran, (2) Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Charles University, Czechia, (3) Institute for Research in Fundamental Sciences (IPM), Tehran, Iran, (4) IMP Lab, Department of Engineering and Architecture, University of Parma, Parma, Italy)
Comments: 6 pages and 2 figures, Learn more about this project at this https URL
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This paper considers the task of matching images and sentences by learning a visual-textual embedding space for cross-modal retrieval. Finding such a space is a challenging task since the features and representations of text and image are not comparable. In this work, we introduce an end-to-end deep multimodal convolutional-recurrent network for learning both vision and language representations simultaneously to infer image-text similarity. The model learns which pairs are a match (positive) and which ones are a mismatch (negative) using a hinge-based triplet ranking. To learn the joint representations, we leverage our newly extracted collection of tweets from Twitter. The main characteristic of our dataset is that the images and tweets are not standardized in the way benchmark data are. Furthermore, there can be a higher semantic correlation between the pictures and tweets, in contrast to benchmarks in which the descriptions are well-organized. Experimental results on the MS-COCO benchmark dataset show that our model outperforms certain previously presented methods and has competitive performance compared to the state-of-the-art. The code and dataset have been made publicly available.
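
The hinge-based triplet ranking mentioned in the abstract above can be sketched as follows. This is a generic bidirectional formulation, not the authors' exact objective: the use of cosine similarity, the margin value, the single negative per direction, and the function names are all assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_hinge_loss(image, caption, wrong_caption, wrong_image, margin=0.2):
    """Bidirectional hinge-based triplet ranking loss for one matched pair.

    The matching (image, caption) pair should score at least `margin`
    higher than either mismatched pair; otherwise a penalty is incurred.
    """
    positive = cosine(image, caption)
    caption_term = max(0.0, margin - positive + cosine(image, wrong_caption))
    image_term = max(0.0, margin - positive + cosine(wrong_image, caption))
    return caption_term + image_term


# Hypothetical 2-D embeddings: a well-separated negative gives zero loss,
# while a negative identical to the positive costs exactly the margin.
easy = triplet_hinge_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
hard = triplet_hinge_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

Summing over negatives versus keeping only the hardest negative in a batch is a common design choice for this family of losses; the abstract does not say which variant is used.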

[193]  arXiv:2002.10018 [pdf, ps, other]
Title: Distributed Quantum Proofs for Replicated Data
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Quantum Physics (quant-ph)

The paper tackles the issue of $\textit{checking}$ that all copies of a large data set replicated at several nodes of a network are identical. The fact that the replicas may be located at distant nodes prevents the system from verifying their equality locally, i.e., by having each node consult only nodes in its vicinity. On the other hand, it remains possible to assign $\textit{certificates}$ to the nodes, so that verifying the consistency of the replicas can be achieved locally. However, we show that, as the data set is large, classical certification mechanisms, including distributed Merlin-Arthur protocols, cannot guarantee good completeness and soundness simultaneously, unless they use very large certificates. The main result of this paper is a distributed $\textit{quantum}$ Merlin-Arthur protocol enabling the nodes to collectively check the consistency of the replicas, based on small certificates, and in a single round of message exchange between neighbors, with short messages. In particular, the certificate-size is logarithmic in the size of the data set, which gives an exponential advantage over classical certification mechanisms.

[194]  arXiv:2002.10021 [pdf, other]
Title: How Transferable are the Representations Learned by Deep Q Agents?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we consider the source of Deep Reinforcement Learning (DRL)'s sample complexity, asking how much derives from the requirement of learning useful representations of environment states and how much is due to the sample complexity of learning a policy. While for DRL agents, the distinction between representation and policy may not be clear, we seek new insight through a set of transfer learning experiments. In each experiment, we retain some fraction of layers trained on either the same game or a related game, comparing the benefits of transfer learning to learning a policy from scratch. Interestingly, we find that benefits due to transfer are highly variable in general and non-symmetric across pairs of tasks. Our experiments suggest that perhaps transfer from simpler environments can boost performance on more complex downstream tasks and that the requirements of learning a useful representation can range from negligible to the majority of the sample complexity, based on the environment. Furthermore, we find that fine-tuning generally outperforms training with the transferred layers frozen, confirming an insight first noted in the classification setting.

[195]  arXiv:2002.10025 [pdf, other]
Title: Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep networks were recently suggested to face a trade-off between accuracy (on clean natural images) and robustness (on adversarially perturbed images) (Tsipras et al., 2019). Such a dilemma is shown to be rooted in the inherently higher sample complexity (Schmidt et al., 2018) and/or model capacity (Nakkiran, 2019) required for learning a high-accuracy and robust classifier. In view of that, given a classification task, growing the model capacity appears to help achieve a win-win between accuracy and robustness, yet at the expense of model size and latency, therefore posing challenges for resource-constrained applications. Is it possible to co-design model accuracy, robustness and efficiency to achieve their triple wins? This paper studies multi-exit networks associated with input-adaptive efficient inference, showing their strong promise in achieving a "sweet point" in co-optimizing model accuracy, robustness and efficiency. Our proposed solution, dubbed Robust Dynamic Inference Networks (RDI-Nets), allows each input (either clean or adversarial) to adaptively choose one of the multiple output layers (early branches or the final one) to output its prediction. That multi-loss adaptivity adds new variations and flexibility to adversarial attacks and defenses, on which we present a systematic investigation. We show experimentally that by equipping existing backbones with such robust adaptive inference, the resulting RDI-Nets can achieve better accuracy and robustness, yet with over 30% computational savings, compared to the defended original models.

[196]  arXiv:2002.10029 [pdf, other]
Title: Symbolic Querying of Vector Spaces: Probabilistic Databases Meets Relational Embeddings
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)

To deal with increasing amounts of uncertainty and incompleteness in relational data, we propose unifying techniques from probabilistic databases and relational embedding models. We use probabilistic databases as our formalism to define the probabilistic model with respect to which all queries are done. This allows us to leverage the rich literature of theory and algorithms from probabilistic databases for solving problems. While this formalization can be used with any relational embedding model, the lack of a well defined joint probability distribution causes simple problems to become provably hard. With this in mind, we introduce \TO, a relational embedding model designed in terms of probabilistic databases to exploit typical embedding assumptions within the probabilistic framework. Using principled, efficient inference algorithms that can be derived from its definition, we empirically demonstrate that \TOs is an effective and general model for these tasks.

[197]  arXiv:2002.10033 [pdf]
Title: Citations Systematically Misrepresent the Quality and Impact of Research Articles: Survey and Experimental Evidence from Thousands of Citers
Subjects: Social and Information Networks (cs.SI)

Citations are ubiquitous in evaluating research, but how exactly they relate to what they are thought to measure (quality and intellectual impact) is unclear. We investigate the relationships between citations, quality, and impact using a survey with an embedded experiment in which 12,670 authors in 15 academic fields describe about 25K specific referencing decisions. Results suggest that citation counts, when equated with quality and impact, are biased in opposite directions. First, experimentally exposing papers' actual citation counts during the survey causes respondents to perceive all but the top 10% cited papers as of lower quality. Because perceptions of quality are a key factor in citing decisions, citation counts are likely to endogenously cause more citing of top papers and equating them with quality overestimates the actual quality of those papers. Conversely, 54% of references had either zero or minor influence on authors who cite them, but references to highly cited papers were about 200% more likely to denote substantial impact. Equating citations with impact thus underestimates the impact of highly cited papers. Real citation practices thus reveal that citations are biased measures of quality and impact.

[198]  arXiv:2002.10035 [pdf, ps, other]
Title: A Note on Echelon-Ferrers Construction
Subjects: Information Theory (cs.IT)

Echelon-Ferrers is one of the important techniques that help researchers improve lower bounds for constant-dimension codes. Fagang Li [6] combined the linkage construction and echelon-Ferrers to obtain some new lower bounds for constant-dimension codes. However, this method seems incorrect, since we found a counterexample.

[199]  arXiv:2002.10039 [pdf, other]
Title: Computing Bi-Lipschitz Outlier Embeddings into the Line
Subjects: Data Structures and Algorithms (cs.DS)

The problem of computing a bi-Lipschitz embedding of a graphical metric into the line with minimum distortion has received a lot of attention. The best-known approximation algorithm computes an embedding with distortion $O(c^2)$, where $c$ denotes the optimal distortion [Bădoiu et al. 2005]. We present a bi-criteria approximation algorithm that extends the above results to the setting of outliers.
Specifically, we say that a metric space $(X,\rho)$ admits a $(k,c)$-embedding if there exists $K\subset X$, with $|K|=k$, such that $(X\setminus K, \rho)$ admits an embedding into the line with distortion at most $c$. Given $k\geq 0$, and a metric space that admits a $(k,c)$-embedding, for some $c\geq 1$, our algorithm computes a $(\mathsf{poly}(k, c, \log n), \mathsf{poly}(c))$-embedding in polynomial time. This is the first algorithmic result for outlier bi-Lipschitz embeddings. Prior to our work, comparable outlier embeddings were known only for the case of additive distortion.

[200]  arXiv:2002.10043 [pdf, other]
Title: Complete Dictionary Learning via $\ell_p$-norm Maximization
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP); Machine Learning (stat.ML)

Dictionary learning is a classic representation learning method that has been widely applied in signal processing and data analytics. In this paper, we investigate a family of $\ell_p$-norm ($p>2$, $p \in \mathbb{N}$) maximization approaches for the complete dictionary learning problem from theoretical and algorithmic aspects. Specifically, we prove that the global maximizers of these formulations are very close to the true dictionary with high probability, even when Gaussian noise is present. Based on the generalized power method (GPM), an efficient algorithm is then developed for the $\ell_p$-based formulations. We further show the efficacy of the developed algorithm: for the population GPM algorithm over the sphere constraint, it first quickly enters the neighborhood of a global maximizer, and then converges linearly in this region. Extensive experiments demonstrate that the $\ell_p$-based approaches enjoy higher computational efficiency and better robustness than conventional approaches, and that $p=3$ performs best.

[201]  arXiv:2002.10045 [pdf, ps, other]
Title: Optimal Advertising for Information Products
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)

When selling information, sometimes the seller can increase the revenue by giving away some partial information to change the buyer's belief about the information product, so the buyer may be more willing to purchase. This work studies the general problem of advertising information products by revealing some partial information. We consider a buyer who needs to make a decision, the outcome of which depends on the state of the world that is unknown to the buyer. There is an information seller who has access to information about the state of the world. The seller can advertise the information by revealing some partial information. We consider a seller who chooses an advertising strategy and then commits to it. The buyer decides whether to purchase the full information product after seeing the partial information. The seller's goal is to maximize the expected revenue. We prove that finding the optimal advertising strategy is hard, even in the simple case that the buyer type is known. Nevertheless, we show that when the buyer type is known, the problem is equivalent to finding the concave closure of a function. Based on this observation, we prove some properties of the optimal mechanism, which allow us to solve the optimal mechanism by a convex program (with exponential size in general, polynomial size for special cases). We also prove some interesting characterizations of the optimal mechanisms based on these properties. For the general problem when the seller only knows the type distribution of the buyer, it is NP-hard to find a constant factor approximation. We thus look at special cases and provide an approximation algorithm that finds an $\varepsilon$-suboptimal mechanism when it is not too hard to predict the possible type of buyer who will make the purchase.

[202]  arXiv:2002.10047 [pdf, other]
Title: Parallel Clique Counting and Peeling Algorithms
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

Dense subgraphs capture strong communities in social networks and entities possessing strong interactions in biological networks. In particular, $k$-clique counting and listing have applications in identifying important actors in a graph. However, finding $k$-cliques is computationally expensive, and thus it is important to have fast parallel algorithms.
We present a new parallel algorithm for $k$-clique counting that has polylogarithmic span and is work-efficient with respect to the well-known sequential algorithm for $k$-clique listing by Chiba and Nishizeki. Our algorithm can be extended to support listing and enumeration, and is based on computing low out-degree orientations. We present a new linear-work and polylogarithmic span algorithm for computing such orientations, and new parallel algorithms for producing unbiased estimations of clique counts. Finally, we design new parallel work-efficient algorithms for approximating the $k$-clique densest subgraph. Our first algorithm gives a $1/k$-approximation and is based on iteratively peeling vertices with the lowest clique counts; our algorithm is work-efficient, but we prove that this process is P-complete and hence does not have polylogarithmic span. Our second algorithm gives a $1/(k(1+\epsilon))$-approximation, is work-efficient, and has polylogarithmic span.
In addition, we implement these algorithms and propose optimizations. On a 60-core machine, we achieve 13.23-38.99x and 1.19-13.76x self-relative parallel speedup for $k$-clique counting and $k$-clique densest subgraph, respectively. Compared to the state-of-the-art parallel $k$-clique counting algorithms, we achieve a 1.31-9.88x speedup, and compared to existing implementations of $k$-clique densest subgraph, we achieve a 1.01-11.83x speedup. We are able to compute the $4$-clique counts on the largest publicly-available graph with over two hundred billion edges.
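
As a rough illustration of the orientation-based counting idea described in the abstract above, the following sequential Python sketch directs every edge from the lower-ranked to the higher-ranked endpoint and then counts $k$-cliques by repeatedly intersecting out-neighborhoods. It is only a sketch under stated assumptions: the ranking by (degree, vertex id) is a simple stand-in for the low out-degree orientations the abstract mentions, the function names are hypothetical, and nothing here is parallel or taken from the authors' implementation.

```python
def orient_by_degree(adj):
    """Direct each edge toward the endpoint with the larger (degree, id) rank.

    Assumption: this simple degree ordering stands in for a low out-degree
    orientation; vertex ids must be mutually comparable.
    """
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}
    return {v: {u for u in nbrs if rank[u] > rank[v]} for v, nbrs in adj.items()}


def count_k_cliques(adj, k):
    """Count k-cliques in an undirected graph given as {vertex: set(neighbors)}."""
    out = orient_by_degree(adj)

    def extend(candidates, remaining):
        # Every vertex in `candidates` is an out-neighbor of all vertices
        # chosen so far, so each clique is found exactly once, in rank order.
        if remaining == 0:
            return 1
        return sum(extend(candidates & out[v], remaining - 1) for v in candidates)

    return extend(set(adj), k)


# Usage: the complete graph on four vertices.
k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
triangles_in_k4 = count_k_cliques(k4, 3)
```

Because the orientation is acyclic, each clique is enumerated once along its unique rank-ordered sequence of vertices, which is what keeps the recursion from double counting.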

[203]  arXiv:2002.10055 [pdf, other]
Title: Ensuring Privacy in Location-Based Services: A Model-based Approach
Subjects: Cryptography and Security (cs.CR)

In recent years, the widespread adoption of mobile devices equipped with GPS and communication chips has led to the growing use of location-based services (LBS), in which a user receives a service based on his current location. The disclosure of a user's location, however, can raise serious concerns about user privacy in general, and location privacy in particular, which has led to the development of various location privacy-preserving mechanisms (LPPMs) aiming to enhance location privacy while using LBS applications. In this paper, we propose to model the user mobility pattern and utility of the LBS as a Markov decision process (MDP), and, inspired by the probabilistic current state opacity notion, we introduce a new location privacy metric, namely $\epsilon$-privacy, that quantifies the adversary's belief over the user's current location. We exploit this dynamic model to design an LPPM that ensures the utility of the service is fully exploited while guaranteeing, independently of the adversary's prior knowledge about the user, that a user-specified privacy level is achieved over an infinite time horizon. The overall privacy-preserving framework, including the construction of the user mobility model as an MDP and the design of the proposed LPPM, is demonstrated and validated with real-world experimental data.

[204]  arXiv:2002.10059 [pdf, other]
Title: Cooperative Adaptive Learning Control for A Group of Nonholonomic UGVs by Output Feedback
Subjects: Systems and Control (eess.SY)

A high-gain observer-based cooperative deterministic learning (CDL) control algorithm is proposed in this chapter for a group of identical unicycle-type unmanned ground vehicles (UGVs) to track desired reference trajectories. For the vehicle states, the positions of the vehicles can be measured, while the velocities are estimated using the high-gain observer. For the trajectory tracking controller, the radial basis function (RBF) neural network (NN) is used to estimate the unknown dynamics of the vehicle online, and the NN weight convergence and estimation accuracy are guaranteed by CDL. The major challenge and novelty of this chapter is to track the reference trajectory using this observer-based CDL algorithm without full knowledge of the vehicle state and vehicle model. In addition, any vehicle in the system is able to learn the knowledge of unmodeled dynamics along the union of trajectories experienced by all vehicle agents, such that the learned knowledge can be re-used to follow any reference trajectory defined in the learning phase. The learning-based tracking convergence and consensus learning results, as well as the use of learned knowledge for tracking experienced trajectories, are shown using the Lyapunov method. A simulation is given to show the effectiveness of this algorithm.

[205]  arXiv:2002.10061 [pdf, other]
Title: Rethinking 1D-CNN for Time Series Classification: A Stronger Baseline
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

For time series classification tasks using 1D-CNNs, the selection of kernel size is critically important to ensure that the model can capture salient signals at the right scale from a long time series. Most of the existing work on 1D-CNNs treats the kernel size as a hyper-parameter and tries to find the proper kernel size through a grid search, which is time-consuming and inefficient. This paper theoretically analyses how kernel size impacts the performance of 1D-CNNs. Considering the importance of kernel size, we propose a novel Omni-Scale 1D-CNN (OS-CNN) architecture to capture the proper kernel size during the model learning period. A specific design for kernel size configuration is developed, which enables us to assemble very few kernel-size options to represent more receptive fields. The proposed OS-CNN method is evaluated using the UCR archive with 85 datasets. The experimental results demonstrate that our method is a stronger baseline in multiple performance indicators, including the critical difference diagram, counts of wins, and average accuracy. We have also published the experimental source code on GitHub (https://github.com/Wensi-Tang/OS-CNN/).

[206]  arXiv:2002.10064 [pdf, other]
Title: Exploring the Connection Between Binary and Spiking Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

On-chip edge intelligence has necessitated the exploration of algorithmic techniques to reduce the compute requirements of current machine learning frameworks. This work aims to bridge the recent algorithmic progress in training Binary Neural Networks and Spiking Neural Networks - both of which are driven by the same motivation and yet synergies between the two have not been fully explored. We show that training Spiking Neural Networks in the extreme quantization regime results in near full precision accuracies on large-scale datasets like CIFAR-100 and ImageNet. An important implication of this work is that Binary Spiking Neural Networks can be enabled by "In-Memory" hardware accelerators catered for Binary Neural Networks without suffering any accuracy degradation due to binarization. We utilize standard training techniques for non-spiking networks to generate our spiking networks through a conversion process. We also perform an extensive empirical analysis and explore simple design-time and run-time optimization techniques for reducing the inference latency of spiking networks (both for binary and full-precision models) by an order of magnitude over prior work.

[207]  arXiv:2002.10066 [pdf, other]
Title: Learning From Strategic Agents: Accuracy, Improvement, and Causality
Comments: 18 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In many predictive decision-making scenarios, such as credit scoring and academic testing, a decision-maker must construct a model (predicting some outcome) that accounts for agents' incentives to "game" their features in order to receive better decisions. Whereas the strategic classification literature generally assumes that agents' outcomes are not causally dependent on their features (and thus strategic behavior is a form of lying), we join concurrent work in modeling agents' outcomes as a function of their changeable attributes. Our formulation is the first to incorporate a crucial phenomenon: when agents act to change observable features, they may as a side effect perturb hidden features that causally affect their true outcomes.
We consider three distinct desiderata for a decision-maker's model: accurately predicting agents' post-gaming outcomes (accuracy), incentivizing agents to improve these outcomes (improvement), and, in the linear setting, estimating the visible coefficients of the true causal model (causal precision). As our main contribution, we provide the first algorithms for learning accuracy-optimizing, improvement-optimizing, and causal-precision-optimizing linear regression models directly from data, without prior knowledge of agents' possible actions. These algorithms circumvent the hardness result of Miller et al. (2019) by allowing the decision maker to observe agents' responses to a sequence of decision rules, in effect inducing agents to perform causal interventions for free.

[208]  arXiv:2002.10069 [pdf, other]
Title: Robust Learning-Based Control via Bootstrapped Multiplicative Noise
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC); Machine Learning (stat.ML)

Despite decades of research and recent progress in adaptive control and reinforcement learning, there remains a fundamental lack of understanding in designing controllers that provide robustness to inherent non-asymptotic uncertainties arising from models estimated with finite, noisy data. We propose a robust adaptive control algorithm that explicitly incorporates such non-asymptotic uncertainties into the control design. The algorithm has three components: (1) a least-squares nominal model estimator; (2) a bootstrap resampling method that quantifies non-asymptotic variance of the nominal model estimate; and (3) a non-conventional robust control design method using an optimal linear quadratic regulator (LQR) with multiplicative noise. A key advantage of the proposed approach is that the system identification and robust control design procedures both use stochastic uncertainty representations, so that the actual inherent statistical estimation uncertainty directly aligns with the uncertainty the robust controller is being designed against. We show through numerical experiments that the proposed robust adaptive controller can significantly outperform the certainty equivalent controller on both expected regret and measures of regret risk.
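
The first two components listed in the abstract above (a least-squares nominal model and a bootstrap estimate of its variance) can be illustrated on a toy problem. The sketch below assumes a scalar system $x_{t+1} = a x_t + w_t$ with Gaussian noise and uses a plain resampling of observed transition pairs; the system, the function names, and the choice of bootstrap are all illustrative assumptions and not the authors' algorithm. The multiplicative-noise LQR design step is not shown.

```python
import random


def simulate(a, steps, noise_std, rng, x0=1.0):
    """Roll out the assumed scalar system and return (x_t, x_{t+1}) pairs."""
    pairs, x = [], x0
    for _ in range(steps):
        x_next = a * x + rng.gauss(0.0, noise_std)
        pairs.append((x, x_next))
        x = x_next
    return pairs


def least_squares_gain(pairs):
    """Least-squares estimate of a: minimizes sum of (x_next - a * x)^2."""
    num = sum(x_next * x for x, x_next in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den


def bootstrap_variance(pairs, n_resamples, rng):
    """Sample variance of the estimate over resampled data sets.

    Each resample draws len(pairs) transition pairs with replacement.
    """
    estimates = []
    for _ in range(n_resamples):
        sample = [rng.choice(pairs) for _ in pairs]
        estimates.append(least_squares_gain(sample))
    mean = sum(estimates) / len(estimates)
    return sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)


# Usage: estimate the nominal model and its finite-data variance.
rng = random.Random(0)
data = simulate(a=0.8, steps=200, noise_std=0.1, rng=rng)
a_hat = least_squares_gain(data)
a_var = bootstrap_variance(data, n_resamples=100, rng=rng)
```

In a robust design of the kind the abstract describes, a variance estimate like `a_var` would play the role of the uncertainty the controller is designed against; how that is done is specific to the paper and is not reproduced here.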

[209]  arXiv:2002.10070 [pdf, other]
Title: An Overlapping Domain Decomposition Framework without Dual Formulation for Variational Imaging Problems
Authors: Jongho Park
Comments: 24 pages, 7 figures
Subjects: Numerical Analysis (math.NA)

In this paper, we propose a novel overlapping domain decomposition method that can be applied to various problems in variational imaging such as total variation minimization. Most recent domain decomposition methods for total variation minimization adopt the Fenchel-Rockafellar duality, whereas the proposed method is based on the primal formulation. Thus, the proposed method can be applied not only to total variation minimization but also to problems with complex dual formulations, such as higher order models. In the proposed method, an equivalent formulation of the model problem with parallel structure is constructed using a custom overlapping domain decomposition scheme with the notion of essential domains. As a solver for the constructed formulation, we propose a decoupled augmented Lagrangian method for untying the coupling of adjacent subdomains. Convergence analysis of the decoupled augmented Lagrangian method is provided. We present implementation details and numerical examples for various model problems including total variation minimizations and higher order models.

[210]  arXiv:2002.10072 [pdf, other]
Title: Reconfigurable Intelligent Surface Assisted Multiuser MISO Systems Exploiting Deep Reinforcement Learning
Comments: 12 pages. Accepted by IEEE JSAC special issue on Multiple Antenna Technologies for Beyond 5G
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)

Recently, the reconfigurable intelligent surface (RIS), benefiting from the breakthrough in the fabrication of programmable meta-material, has been speculated to be one of the key enabling technologies for future sixth generation (6G) wireless communication systems, scaled up beyond massive multiple input multiple output (Massive-MIMO) technology to achieve smart radio environments. Employed as reflecting arrays, RIS is able to assist MIMO transmissions without the need for radio frequency chains, resulting in a considerable reduction in power consumption. In this paper, we investigate the joint design of the transmit beamforming matrix at the base station and the phase shift matrix at the RIS by leveraging recent advances in deep reinforcement learning (DRL). We first develop a DRL based algorithm, in which the joint design is obtained through trial-and-error interactions with the environment by observing predefined rewards, in the context of continuous state and action. Unlike most reported works, which utilize alternating optimization techniques to alternately obtain the transmit beamforming and phase shifts, the proposed DRL based algorithm obtains the joint design simultaneously as the output of the DRL neural network. Simulation results show that the proposed algorithm is not only able to learn from the environment and gradually improve its behavior, but also obtains performance comparable to two state-of-the-art benchmarks. It is also observed that appropriate neural network parameter settings significantly improve the performance and convergence rate of the proposed algorithm.

[211]  arXiv:2002.10077 [pdf, other]
Title: Approximate Data Deletion from Machine Learning Models: Algorithms and Evaluations
Comments: 20 pages, 2 figures, under review by ICML
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Deleting data from a trained machine learning (ML) model is a critical task in many applications. For example, we may want to remove the influence of training points that might be out of date or outliers. Regulations such as the EU's General Data Protection Regulation also stipulate that individuals can request to have their data deleted. The naive approach to data deletion is to retrain the ML model on the remaining data, but this is too time consuming. Moreover, there is no known efficient algorithm that exactly deletes data from most ML models. In this work, we evaluate several approaches for approximate data deletion from trained models. For the case of linear regression, we propose a new method with linear dependence on the feature dimension $d$, a significant gain over all existing methods, which all have superlinear time dependence on the dimension. We also provide a new test for evaluating data deletion from linear models.
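
For context on what deletion from a linear model can look like, the sketch below removes one training point from an ordinary least-squares fit using the classical Sherman-Morrison downdate of the inverse Gram matrix, and the usage lines set up a comparison against retraining on the remaining data. This is not the method proposed in the abstract above: the downdate is exact rather than approximate and costs time quadratic in the dimension, which is the kind of superlinear dependence the abstract says existing methods have. The restriction to two features (intercept plus one input) and all function names are assumptions made to keep the example short.

```python
def invert_2x2(a):
    """Closed-form inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]


def fit(rows, targets):
    """Return (inverse Gram matrix, X^T y, weights) for 2-feature least squares."""
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(2)]
    inv = invert_2x2(gram)
    return inv, xty, matvec(inv, xty)


def delete_point(inv, xty, x, y):
    """Remove the training pair (x, y) without refitting from scratch.

    Uses (A - x x^T)^{-1} = A^{-1} + (A^{-1} x)(A^{-1} x)^T / (1 - x^T A^{-1} x),
    which relies on A^{-1} being symmetric.
    """
    u = matvec(inv, x)
    denom = 1.0 - (x[0] * u[0] + x[1] * u[1])
    new_inv = [[inv[i][j] + u[i] * u[j] / denom for j in range(2)] for i in range(2)]
    new_xty = [xty[0] - y * x[0], xty[1] - y * x[1]]
    return new_inv, new_xty, matvec(new_inv, new_xty)


# Usage: delete the third point, then refit on the rest for comparison.
rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
targets = [1.0, 2.9, 5.2, 7.1]
inv, xty, _ = fit(rows, targets)
_, _, w_downdated = delete_point(inv, xty, rows[2], targets[2])
_, _, w_retrained = fit(rows[:2] + rows[3:], targets[:2] + targets[3:])
```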

[212]  arXiv:2002.10078 [pdf, other]
Title: TrojanNet: Embedding Hidden Trojan Horse Models in Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The complexity of large-scale neural networks can lead to poor understanding of their internal details. We show that this opaqueness provides an opportunity for adversaries to embed unintended functionalities into the network in the form of Trojan horses. Our novel framework hides the existence of a Trojan network with arbitrary desired functionality within a benign transport network. We prove theoretically that the Trojan network's detection is computationally infeasible and demonstrate empirically that the transport network does not compromise its disguise. Our paper exposes an important, previously unknown loophole that could potentially undermine the security and trustworthiness of machine learning.

[213]  arXiv:2002.10080 [pdf, other]
Title: Sparse Optimization for Green Edge AI Inference
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

With the rapid upsurge of deep learning tasks at the network edge, effective edge artificial intelligence (AI) inference becomes critical to provide low-latency intelligent services for mobile users by leveraging the edge computing capability. In such scenarios, energy efficiency becomes a primary concern. In this paper, we present a joint inference task selection and downlink beamforming strategy to achieve energy-efficient edge AI inference through minimizing the overall power consumption consisting of both computation and transmission power consumption, yielding a mixed combinatorial optimization problem. By exploiting the inherent connections between the set of task selection and the group sparsity structure of the transmit beamforming vector, we reformulate the optimization as a group sparse beamforming problem. To solve this challenging problem, we propose a log-sum function based three-stage approach. By adopting the log-sum function to enhance the group sparsity, a proximal iteratively reweighted algorithm is developed. Furthermore, we establish the global convergence analysis and provide the ergodic worst-case convergence rate for this algorithm. Simulation results demonstrate the effectiveness of the proposed approach for improving energy efficiency in edge AI inference systems.

[214]  arXiv:2002.10081 [pdf, other]
Title: Toward a mathematical theory of the crystallographic phase retrieval problem
Subjects: Information Theory (cs.IT); Algebraic Geometry (math.AG)

Motivated by the X-ray crystallography technology to determine the atomic structure of biological molecules, we study the crystallographic phase retrieval problem, arguably the leading and hardest phase retrieval setup. This problem entails recovering a K-sparse signal of length N from its Fourier magnitude or, equivalently, from its periodic auto-correlation. Specifically, this work focuses on the fundamental question of uniqueness: what is the maximal sparsity level K/N that allows unique mapping between a signal and its Fourier magnitude, up to intrinsic symmetries. We design a systemic computational technique to affirm uniqueness for any specific pair (K,N), and establish the following conjecture: the Fourier magnitude determines a generic signal uniquely, up to intrinsic symmetries, as long as K<=N/2. Based on group-theoretic considerations and an additional computational technique, we formulate a second conjecture: if K<N/2, then for any signal the set of solutions to the crystallographic phase retrieval problem has measure zero in the set of all signals with a given Fourier magnitude. Together, these conjectures constitute the first attempt to establish a mathematical theory for the crystallographic phase retrieval problem.

[215]  arXiv:2002.10083 [pdf, other]
Title: Optimizing High Performance Markov Clustering for Pre-Exascale Architectures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

HipMCL is a high-performance distributed memory implementation of the popular Markov Cluster Algorithm (MCL) and can cluster large-scale networks within hours using a few thousand CPU-equipped nodes. It relies on sparse matrix computations and heavily makes use of the sparse matrix-sparse matrix multiplication kernel (SpGEMM). The existing parallel algorithms in HipMCL are not scalable to Exascale architectures, both due to their communication costs dominating the runtime at large concurrencies and also due to their inability to take advantage of accelerators that are increasingly popular.
In this work, we systematically remove scalability and performance bottlenecks of HipMCL. We enable GPUs by performing the expensive expansion phase of the MCL algorithm on GPU. We propose a CPU-GPU joint distributed SpGEMM algorithm called pipelined Sparse SUMMA and integrate a probabilistic memory requirement estimator that is fast and accurate. We develop a new merging algorithm for the incremental processing of partial results produced by the GPUs, which improves the overlap efficiency and the peak memory usage. We also integrate a recent and faster algorithm for performing SpGEMM on CPUs. We validate our new algorithms and optimizations with extensive evaluations. With the enabling of the GPUs and integration of new algorithms, HipMCL is up to 12.4x faster, being able to cluster a network with 70 million proteins and 68 billion connections in just under 15 minutes using 1024 nodes of ORNL's Summit supercomputer.
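
To make the role of the expansion phase concrete, the following is a small single-machine sketch of the basic Markov Cluster iteration: expansion squares the column-stochastic matrix (the step that HipMCL performs as a sparse matrix-sparse matrix product), and inflation raises entries to a power and renormalizes columns. It uses dense nested lists, adds self-loops, and reads clusters off with a simple "largest entry per column" rule; all of these choices, the parameter defaults, and the function names are simplifying assumptions, and none of HipMCL's distributed, sparse, or GPU machinery is represented.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (columns summing to 0 are left as zeros)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion step: matrix squaring (dense here, SpGEMM in a sparse setting)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inflate(m, r):
    """Inflation step: elementwise power followed by column normalization."""
    return normalize_columns([[x ** r for x in row] for row in m])


def markov_cluster(adj, inflation=2.0, iterations=50):
    """Cluster a graph given as a dense 0/1 adjacency matrix (list of lists)."""
    n = len(adj)
    m = [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    clusters = {}
    for j in range(n):
        # Assign node j to the row holding the most mass in its column.
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, set()).add(j)
    return sorted(sorted(c) for c in clusters.values())


# Usage: two disjoint triangles.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
found = markov_cluster(two_triangles)
```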

[216]  arXiv:2002.10084 [pdf, other]
Title: Utilizing a null class to restrict decision spaces and defend against neural network adversarial attacks
Authors: Matthew J. Roos
Comments: 15 pages, 19 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Despite recent progress, deep neural networks generally continue to be vulnerable to so-called adversarial examples--input images with small perturbations that can result in changes in the output classifications, despite no such change in the semantic meaning to human viewers. This is true even for seemingly simple challenges such as the MNIST digit classification task. In part, this suggests that these networks are not relying on the same set of object features as humans use to make these classifications. In this paper we examine an additional, and largely unexplored, cause behind this phenomenon--namely, the use of the conventional training paradigm in which the entire input space is parcellated among the training classes. Owing to this paradigm, learned decision spaces for individual classes span excessively large regions of the input space and include images that have no semantic similarity to images in the training set. In this study, we train models that include a null class. That is, models may "opt-out" of classifying an input image as one of the digit classes. During training, null images are created through a variety of methods, in an attempt to create tighter and more semantically meaningful decision spaces for the digit classes. The best performing models classify nearly all adversarial examples as nulls, rather than mistaking them as a member of an incorrect digit class, while simultaneously maintaining high accuracy on the unperturbed test set. The use of a null class and the training paradigm presented herein may provide an effective defense against adversarial attacks for some applications. Code for replicating this study will be made available at https://github.com/mattroos/null_class_adversarial_defense .

[217]  arXiv:2002.10085 [pdf, other]
Title: Temporal Spike Sequence Learning via Backpropagation for Deep Spiking Neural Networks
Authors: Wenrui Zhang, Peng Li
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

Spiking neural networks (SNNs) are well suited for spatio-temporal learning and implementations on energy-efficient event-driven neuromorphic processors. However, existing error backpropagation (BP) methods for SNNs lack proper handling of spiking discontinuities and suffer from low performance compared to BP methods for traditional artificial neural networks. In addition, a large number of time steps are typically required for SNNs to achieve decent performance, leading to high latency and rendering spike-based computation unscalable to deep architectures. We present a novel Temporal Spike Sequence Learning Backpropagation (TSSL-BP) method for training deep SNNs, which breaks down error backpropagation across two types of inter-neuron and intra-neuron dependencies. It considers the all-or-none characteristics of firing activities, capturing inter-neuron dependencies through presynaptic firing times, and the internal evolution of each neuronal state through time, capturing intra-neuron dependencies. For various image classification datasets, TSSL-BP efficiently trains deep SNNs within a short temporal window of a few steps with improved accuracy and runtime efficiency, including achieving more than 2% accuracy improvement over the previously reported SNN work on CIFAR10.

[218]  arXiv:2002.10096 [pdf, other]
Title: Emosaic: Visualizing Affective Content of Text at Varying Granularity
Comments: 9 pages, 7 figures
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL); Computers and Society (cs.CY)

This paper presents Emosaic, a tool for visualizing the emotional tone of text documents, considering multiple dimensions of emotion and varying levels of semantic granularity. Emosaic is grounded in psychological research on the relationship between language, affect, and color perception. We capitalize on an established three-dimensional model of human emotion: valence (good, nice vs. bad, awful), arousal (calm, passive vs. exciting, active) and dominance (weak, controlled vs. strong, in control). Previously, multi-dimensional models of emotion have been used rarely in visualizations of textual data, due to the perceptual challenges involved. Furthermore, until recently most text visualizations remained at a high level, precluding closer engagement with the deep semantic content of the text. Informed by empirical studies, we introduce a color mapping that translates any point in three-dimensional affective space into a unique color. Emosaic uses affective dictionaries of words annotated with the three emotional parameters of the valence-arousal-dominance model to extract emotional meanings from texts and then assigns to them corresponding color parameters of the hue-saturation-brightness color space. This approach of mapping emotion to color is aimed at helping readers to more easily grasp the emotional tone of the text. Several features of Emosaic allow readers to interactively explore the affective content of the text in more detail; e.g., in aggregated form as histograms, in sequential form following the order of text, and in detail embedded into the text display itself. Interaction techniques have been included to allow for filtering and navigating of text and visualizations.

[219]  arXiv:2002.10097 [pdf, other]
Title: Fast and Stable Adversarial Training through Noise Injection
Comments: 7 pages, 3 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Adversarial training is the most successful empirical method to date for increasing the robustness of neural networks against adversarial attacks. Unfortunately, this higher robustness is accompanied by considerably higher computational complexity. So far, only adversarial training with expensive multi-step adversarial attacks like Projected Gradient Descent (PGD) has proved effective against equally strong attacks. In this paper, we present two ideas that, combined, enable adversarial training with the computationally less expensive Fast Gradient Sign Method (FGSM). First, we add uniform noise to the initial data point of the FGSM attack, which creates a wider variety of stronger adversaries. Further, we add a learnable regularization step prior to the neural network called Stochastic Augmentation Layer (SAL). Inputs propagated through the SAL are resampled from a Gaussian distribution. The randomness of the resampling at inference time makes it more complicated for the attacker to construct an adversarial example since the outcome of the model is not known in advance. We show that noise injection in conjunction with FGSM adversarial training achieves comparable results to adversarial training with PGD while being orders of magnitude faster. Moreover, we show superior results in comparison to PGD-based training when combining noise injection and SAL.
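
The first idea, starting the FGSM attack from a uniformly perturbed input, can be illustrated with a minimal sketch. This is not the authors' implementation: the logistic-regression model, the step size, the projection back onto the eps-ball, and all function names are assumptions chosen so that the gradient can be written in closed form without a deep learning library.

```python
import math
import random


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def logistic_loss(w, x, y):
    """Loss -log(sigmoid(y * w.x)) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))


def input_gradient(w, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    scale = -y / (1.0 + math.exp(margin))
    return [scale * wi for wi in w]


def noisy_fgsm(w, x, y, eps, step, rng):
    """One FGSM step started from a uniformly perturbed copy of x."""
    # Noise injection: random start inside the L-infinity ball of radius eps.
    start = [xi + rng.uniform(-eps, eps) for xi in x]
    grad = input_gradient(w, start, y)
    moved = [si + step * sign(gi) for si, gi in zip(start, grad)]
    # Assumed projection: keep the total perturbation within eps of x.
    return [min(max(mi, xi - eps), xi + eps) for mi, xi in zip(moved, x)]


if __name__ == "__main__":
    rng = random.Random(0)
    w, x, y = [1.0, -2.0, 0.5], [0.2, 0.1, -0.3], 1
    x_adv = noisy_fgsm(w, x, y, eps=0.1, step=0.1, rng=rng)
    print(logistic_loss(w, x, y), logistic_loss(w, x_adv, y))
```

In adversarial training, examples produced this way would replace or augment the clean inputs in each training batch; the Stochastic Augmentation Layer is not modelled here.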

[220]  arXiv:2002.10098 [pdf, other]
Title: An RLS-Based Instantaneous Velocity Estimator for Extended Radar Tracking
Comments: 8 pages, 11 figures
Subjects: Robotics (cs.RO)

Radar sensors have become an important part of the perception sensor suite due to their long range and their ability to work in adverse weather conditions. However, several shortcomings such as large amounts of noise and extreme sparsity of the point cloud result in them not being used to their full potential. In this paper, we present a novel Recursive Least Squares (RLS) based approach to estimate the instantaneous velocity of dynamic objects in real-time that is capable of handling large amounts of noise in the input data stream. We also present an end-to-end pipeline to track extended objects in real-time that uses the computed velocity estimates for data association and track initialisation. The approaches are evaluated using several real-world inspired driving scenarios that test the limits of these algorithms. It is also experimentally proven that our approaches run in real-time with frame execution time not exceeding 30 ms even in dense traffic scenarios, thus allowing for their direct implementation on autonomous vehicles.
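
To make the recursive least squares idea concrete, here is a minimal sketch under an assumed measurement model that the abstract does not spell out: each radar detection reports an azimuth and a Doppler (radial) speed, and the radial speed equals vx*cos(azimuth) + vy*sin(azimuth) for a rigid object moving with velocity (vx, vy). The class name, forgetting factor, and initial covariance are hypothetical choices, not the authors' estimator.

```python
import math


class RLS2D:
    """Recursive least squares for a two-parameter linear model y = h . theta."""

    def __init__(self, forgetting=1.0, initial_cov=1e6):
        self.lam = forgetting                 # 1.0 means no forgetting
        self.theta = [0.0, 0.0]               # current estimate of (vx, vy)
        self.P = [[initial_cov, 0.0], [0.0, initial_cov]]

    def update(self, h, y):
        """Fold one measurement y with regressor h into the estimate."""
        P = self.P
        Ph = [P[0][0] * h[0] + P[0][1] * h[1],
              P[1][0] * h[0] + P[1][1] * h[1]]
        denom = self.lam + h[0] * Ph[0] + h[1] * Ph[1]
        gain = [Ph[0] / denom, Ph[1] / denom]
        err = y - (h[0] * self.theta[0] + h[1] * self.theta[1])
        self.theta = [self.theta[0] + gain[0] * err,
                      self.theta[1] + gain[1] * err]
        # P <- (P - gain * h^T P) / lambda; P is symmetric, so h^T P equals Ph.
        self.P = [[(P[i][j] - gain[i] * Ph[j]) / self.lam for j in range(2)]
                  for i in range(2)]
        return self.theta


def estimate_velocity(detections, forgetting=1.0):
    """Estimate (vx, vy) from an iterable of (azimuth, radial_speed) pairs."""
    est = RLS2D(forgetting)
    for azimuth, radial_speed in detections:
        est.update([math.cos(azimuth), math.sin(azimuth)], radial_speed)
    return est.theta
```

Because each detection is folded in with a constant-time update, the estimate can be refreshed as points arrive; a forgetting factor below 1 would discount older detections.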

[221]  arXiv:2002.10099 [pdf, other]
Title: Implicit Geometric Regularization for Learning Shapes
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (stat.ML)

Representing shapes as level sets of neural networks has recently proved useful for different shape analysis and reconstruction tasks. So far, such representations were computed using either: (i) pre-computed implicit shape representations; or (ii) loss functions explicitly defined over the neural level sets. In this paper we offer a new paradigm for computing high fidelity implicit neural representations directly from raw data (i.e., point clouds, with or without normal information). We observe that a rather simple loss function, encouraging the neural network to vanish on the input point cloud and to have a unit norm gradient, possesses an implicit geometric regularization property that favors smooth and natural zero level set surfaces, avoiding bad zero-loss solutions. We provide a theoretical analysis of this property for the linear case, and show that, in practice, our method leads to state of the art implicit neural representations with a higher level of detail and fidelity compared to previous methods.
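
The two terms of the loss described above can be written down directly for the linear case f(x) = w.x + b, where the gradient is the constant vector w. The sketch below is only an illustration under assumptions: the absolute value for the "vanish on the point cloud" term, the squared penalty for the unit-norm term, the weight `lam`, and the function name are all choices made here, not details taken from the abstract.

```python
import math


def igr_loss_linear(w, b, points, lam=0.1):
    """Loss for f(x) = w.x + b: mean |f(x_i)| + lam * (||grad f|| - 1)^2."""
    # Data term: f should vanish on every input point.
    fit = sum(abs(sum(wi * xi for wi, xi in zip(w, p)) + b) for p in points)
    fit /= len(points)
    # Gradient term: for a linear f the gradient is w at every location.
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    return fit + lam * (grad_norm - 1.0) ** 2


if __name__ == "__main__":
    pts = [(-1.0, 0.0), (0.0, 0.0), (2.0, 0.0)]   # samples on the line y = 0
    print(igr_loss_linear([0.0, 1.0], 0.0, pts))  # unit-gradient fit
    print(igr_loss_linear([0.0, 0.0], 0.0, pts))  # trivial f = 0 is penalised
```

The second call shows why the gradient term matters: the identically zero function fits every point but is ruled out by the unit-norm penalty. For a neural network the gradient term would be averaged over sampled locations and computed by automatic differentiation.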

[222]  arXiv:2002.10100 [pdf, other]
Title: LeafGAN: An Effective Data Augmentation Method for Practical Plant Disease Diagnosis
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Many applications for the automated diagnosis of plant disease have been developed based on the success of deep learning techniques. However, these applications often suffer from overfitting, and the diagnostic performance is drastically decreased when used on test datasets from new environments. The typical reasons for this are that the symptoms to be detected are unclear or faint, and there are limitations related to data diversity. In this paper, we propose LeafGAN, a novel image-to-image translation system with its own attention mechanism. LeafGAN generates a wide variety of diseased images via transformation from healthy images, as a data augmentation tool for improving the performance of plant disease diagnosis. Thanks to its own attention mechanism, our model can transform only relevant areas from images with a variety of backgrounds, thus enriching the versatility of the training images. Experiments with five-class cucumber disease classification show that data augmentation with vanilla CycleGAN does little to improve generalization, i.e., disease diagnostic performance increased by only 0.7% from the baseline. In contrast, LeafGAN boosted the diagnostic performance by 7.4%. We also visually confirmed that the images generated by LeafGAN were of much better quality and more convincing than those generated by vanilla CycleGAN.

[223]  arXiv:2002.10101 [pdf, other]
Title: GRET: Global Representation Enhanced Transformer
Comments: Accepted by AAAI 2020
Subjects: Computation and Language (cs.CL)

Transformer, based on the encoder-decoder framework, has achieved state-of-the-art performance on several natural language generation tasks. The encoder maps the words in the input sentence into a sequence of hidden states, which are then fed into the decoder to generate the output sentence. These hidden states usually correspond to the input words and focus on capturing local information. However, the global (sentence level) information is seldom explored, leaving room for the improvement of generation quality. In this paper, we propose a novel global representation enhanced Transformer (GRET) to explicitly model global representation in the Transformer network. Specifically, in the proposed model, an external state is generated for the global representation from the encoder. The global representation is then fused into the decoder during the decoding process to improve generation quality. We conduct experiments in two text generation tasks: machine translation and text summarization. Experimental results on four WMT machine translation tasks and LCSTS text summarization task demonstrate the effectiveness of the proposed approach on natural language generation.

[224]  arXiv:2002.10102 [pdf, other]
Title: GANHopper: Multi-Hop GAN for Unsupervised Image-to-Image Translation
Comments: 9 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce GANHOPPER, an unsupervised image-to-image translation network that transforms images gradually between two domains, through multiple hops. Instead of executing translation directly, we steer the translation by requiring the network to produce in-between images which resemble weighted hybrids between images from the two input domains. Our network is trained on unpaired images from the two domains only, without any in-between images. All hops are produced using a single generator along each direction. In addition to the standard cycle-consistency and adversarial losses, we introduce a new hybrid discriminator, which is trained to classify the intermediate images produced by the generator as weighted hybrids, with weights based on a predetermined hop count. We also introduce a smoothness term to constrain the magnitude of each hop, further regularizing the translation. Compared to previous methods, GANHOPPER excels at image translations involving domain-specific image features and geometric variations while also preserving non-domain-specific features such as backgrounds and general color schemes.

[225]  arXiv:2002.10105 [pdf, other]
Title: Communication Contention Aware Scheduling of Multiple Deep Learning Training Jobs
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Distributed Deep Learning (DDL) has rapidly grown in popularity since it helps boost the training performance on high-performance GPU clusters. Efficient job scheduling is indispensable to maximize the overall performance of the cluster when training multiple jobs simultaneously. However, existing schedulers do not consider the communication contention of multiple communication tasks from different distributed training jobs, which could deteriorate the system performance and prolong the job completion time. In this paper, we first establish a new DDL job scheduling framework which organizes DDL jobs as Directed Acyclic Graphs (DAGs) and considers communication contention between nodes. We then propose an efficient algorithm, LWF-$\kappa$, to balance the GPU utilization and consolidate the allocated GPUs for each job. When scheduling those communication tasks, we observe that neither avoiding all the contention nor blindly accepting it is optimal for minimizing the job completion time. We thus propose a provable algorithm, AdaDUAL, to efficiently schedule those communication tasks. Based on AdaDUAL, we finally propose Ada-SRSF for the DDL job scheduling problem. Simulations on a 64-GPU cluster connected with 10 Gbps Ethernet show that LWF-$\kappa$ achieves up to $1.59\times$ improvement over the classical first-fit algorithms. More importantly, Ada-SRSF reduces the average job completion time by $20.1\%$ and $36.7\%$, as compared to the SRSF(1) scheme (avoiding all the contention) and the SRSF(2) scheme (blindly accepting all two-way communication contention), respectively.

[226]  arXiv:2002.10107 [pdf]
Title: Predicting Subjective Features from Questions on QA Websites using BERT
Comments: 5 pages, 4 figures, 2 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Modern Question-Answering websites, such as StackOverflow and Quora, have specific user rules to maintain their content quality. These systems rely on user reports for assessing new content, which has serious problems including the slow handling of violations, the loss of normal and experienced users' time, the low quality of some reports, and discouraging feedback to new users. Therefore, with the overall goal of providing solutions for automating moderation actions in Q&A websites, we aim to provide a model to predict 20 quality or subjective aspects of questions in QA websites. To this end, we used data gathered by the CrowdSource team at Google Research in 2019 and fine-tuned a pre-trained BERT model on our problem. The model achieves 95.4% accuracy after 2 epochs of training and did not improve substantially in subsequent ones. Results confirm that with simple fine-tuning, we can achieve accurate models in little time and with less data.

[227]  arXiv:2002.10110 [pdf, ps, other]
Title: Revisiting EXTRA for Smooth Distributed Optimization
Authors: Huan Li, Zhouchen Lin
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Optimization and Control (math.OC)

EXTRA is a popular method for decentralized distributed optimization and has broad applications. This paper revisits EXTRA. Firstly, we give a sharp complexity analysis for EXTRA with the improved $O\left(\left(\frac{L}{\mu}+\frac{1}{1-\sigma_2(W)}\right)\log\frac{1}{\epsilon(1-\sigma_2(W))}\right)$ communication and computation complexities for $\mu$-strongly convex and $L$-smooth problems, where $\sigma_2(W)$ is the second largest singular value of the weight matrix $W$. When the strong convexity is absent, we prove the $O\left(\left(\frac{L}{\epsilon}+\frac{1}{1-\sigma_2(W)}\right)\log\frac{1}{1-\sigma_2(W)}\right)$ complexities. Then, we use the Catalyst framework to accelerate EXTRA and obtain the $O\left(\sqrt{\frac{L}{\mu(1-\sigma_2(W))}}\log\frac{L}{\mu(1-\sigma_2(W))}\log\frac{1}{\epsilon}\right)$ communication and computation complexities for strongly convex and smooth problems and the $O\left(\sqrt{\frac{L}{\epsilon(1-\sigma_2(W))}}\log\frac{1}{\epsilon(1-\sigma_2(W))}\right)$ complexities for non-strongly convex ones. Our communication complexities of the accelerated EXTRA are only worse by the factors of $\left(\log\frac{L}{\mu(1-\sigma_2(W))}\right)$ and $\left(\log\frac{1}{\epsilon(1-\sigma_2(W))}\right)$ from the lower complexity bounds for strongly convex and non-strongly convex problems, respectively.
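
For readers unfamiliar with the method being revisited, the following sketch runs the basic (non-accelerated) EXTRA recursion as it is commonly stated in the literature: a first step $x^1 = W x^0 - \alpha \nabla f(x^0)$, followed by $x^{k+2} = (I+W)x^{k+1} - \tilde{W} x^k - \alpha(\nabla f(x^{k+1}) - \nabla f(x^k))$ with $\tilde{W} = (I+W)/2$. The scalar local variables, the three-node weight matrix, the quadratic local objectives, and the step size in the usage example are assumptions chosen for illustration; the Catalyst-accelerated variant analysed in the paper is not shown.

```python
def extra(W, grads, x0, alpha, iters):
    """Run EXTRA on scalar local variables; returns the final per-node values."""
    n = len(x0)

    def mix(M, x):
        # One round of neighbour averaging with mixing matrix M.
        return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]

    def grad(x):
        # Each node evaluates only its own local gradient.
        return [grads[i](x[i]) for i in range(n)]

    # W_tilde = (I + W) / 2
    W_tilde = [[(W[i][j] + (1.0 if i == j else 0.0)) / 2.0 for j in range(n)]
               for i in range(n)]
    x_prev, g_prev = list(x0), grad(x0)
    # First step: x^1 = W x^0 - alpha * grad f(x^0)
    x_cur = [m - alpha * g for m, g in zip(mix(W, x_prev), g_prev)]
    for _ in range(iters):
        g_cur = grad(x_cur)
        wx, wtx = mix(W, x_cur), mix(W_tilde, x_prev)
        # x^{k+2} = (I + W) x^{k+1} - W_tilde x^k - alpha (g^{k+1} - g^k)
        x_next = [x_cur[i] + wx[i] - wtx[i] - alpha * (g_cur[i] - g_prev[i])
                  for i in range(n)]
        x_prev, g_prev, x_cur = x_cur, g_cur, x_next
    return x_cur


if __name__ == "__main__":
    # Hypothetical instance: node i holds f_i(x) = 0.5 * (x - b_i)^2.
    W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
    targets = [1.0, 2.0, 6.0]
    grads = [lambda x, b=b: x - b for b in targets]
    print(extra(W, grads, [0.0, 0.0, 0.0], alpha=0.5, iters=200))
```

In this instance every node should approach the minimizer of the summed objective, which is the mean of the targets, even though each node only ever evaluates its own gradient.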

[228]  arXiv:2002.10111 [pdf, other]
Title: SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
Comments: 8 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving. In the case of monocular vision, successful methods have been mainly based on two ingredients: (i) a network generating 2D region proposals, (ii) an R-CNN structure predicting 3D object pose by utilizing the acquired regions of interest. We argue that the 2D detection network is redundant and introduces non-negligible noise for 3D detection. Hence, we propose a novel 3D object detection method, named SMOKE, in this paper that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. As a second contribution, we propose a multi-step disentangling approach for constructing the 3D bounding box, which significantly improves both training convergence and detection accuracy. In contrast to previous 3D detection techniques, our method does not require complicated pre/post-processing, extra data, or a refinement stage. Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset, giving the best state-of-the-art result on both 3D object detection and Bird's eye view evaluation. The code will be made publicly available.

[229]  arXiv:2002.10112 [pdf, ps, other]
Title: Intelligent Reflecting Surface: Practical Phase Shift Model and Beamforming Optimization
Comments: submitted for possible journal publication (Part of this work will be presented in the IEEE International Conference on Communications (ICC), Dublin, Ireland, 2020. Available: arXiv:1907.06002)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Intelligent reflecting surface (IRS) that enables the control of wireless propagation environment has recently emerged as a promising cost-effective technology for boosting the spectrum and energy efficiency in future wireless communication systems. Prior works on IRS are mainly based on the ideal phase shift model assuming the full signal reflection by each of the elements regardless of its phase shift, which, however, is practically difficult to realize. In contrast, we propose in this paper the practical phase shift model that captures the phase-dependent amplitude variation in the element-wise reflection coefficient. Based on the proposed model and considering an IRS-aided multiuser system with an IRS deployed to assist in the downlink communications from a multi-antenna access point (AP) to multiple single-antenna users, we formulate an optimization problem to minimize the total transmit power at the AP by jointly designing the AP transmit beamforming and the IRS reflect beamforming, subject to the users' individual signal-to-interference-plus-noise ratio (SINR) constraints. Iterative algorithms are proposed to find suboptimal solutions to this problem efficiently by utilizing the alternating optimization (AO) or penalty-based optimization technique. Moreover, we analyze the asymptotic performance loss of the IRS-aided system that employs practical phase shifters but assumes the ideal phase shift model for beamforming optimization, as the number of IRS elements goes to infinity. Simulation results unveil substantial performance gains achieved by the proposed beamforming optimization based on the practical phase shift model as compared to the conventional ideal model.

[230]  arXiv:2002.10113 [pdf, other]
Title: APAC-Net: Alternating the Population and Agent Control via Two Neural Networks to Solve High-Dimensional Stochastic Mean Field Games
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Optimization and Control (math.OC); Machine Learning (stat.ML)

We present APAC-Net, an alternating population and agent control neural network for solving stochastic mean field games (MFGs). Our algorithm is geared toward high-dimensional instances of MFGs that are beyond reach with existing solution methods. We achieve this in two steps. First, we take advantage of the underlying variational primal-dual structure that MFGs exhibit and phrase it as a convex-concave saddle point problem. Second, we parameterize the value and density functions by two neural networks, respectively. By phrasing the problem in this manner, solving the MFG can be interpreted as a special case of training a generative adversarial network (GAN). We show the potential of our method on up to 50-dimensional MFG problems.

[231]  arXiv:2002.10116 [pdf, other]
Title: A Hybrid Approach to Dependency Parsing: Combining Rules and Morphology with Deep Learning
Authors: Şaziye Betül Özateş (1), Arzucan Özgür (1), Tunga Güngör (1), Balkız Öztürk (2) ((1) Department of Computer Engineering, Boğaziçi University, (2) Department of Linguistics, Boğaziçi University)
Comments: 25 pages, 7 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Fully data-driven, deep learning-based models are usually designed as language-independent and have been shown to be successful for many natural language processing tasks. However, when the studied language is low-resourced and the amount of training data is insufficient, these models can benefit from the integration of natural language grammar-based information. We propose two approaches to dependency parsing especially for languages with restricted amount of training data. Our first approach combines a state-of-the-art deep learning-based parser with a rule-based approach and the second one incorporates morphological information into the parser. In the rule-based approach, the parsing decisions made by the rules are encoded and concatenated with the vector representations of the input words as additional information to the deep network. The morphology-based approach proposes different methods to include the morphological structure of words into the parser network. Experiments are conducted on the IMST-UD Treebank and the results suggest that integration of explicit knowledge about the target language to a neural parser through a rule-based parsing system and morphological analysis leads to more accurate annotations and hence, increases the parsing performance in terms of attachment scores. The proposed methods are developed for Turkish, but can be adapted to other languages as well.

[232]  arXiv:2002.10119 [pdf, other]
Title: DeepSign: Deep On-Line Signature Verification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

Deep learning has become a breathtaking technology in recent years, overcoming traditional handcrafted approaches and even humans for many different tasks. However, in some tasks, such as the verification of handwritten signatures, the amount of publicly available data is scarce, which makes it difficult to test the real limits of deep learning. In addition to the lack of public data, it is not easy to evaluate the improvements of novel proposed approaches as different databases and experimental protocols are usually considered.
The main contributions of this study are: i) we provide an in-depth analysis of state-of-the-art deep learning approaches for on-line signature verification, ii) we present and describe the new DeepSignDB on-line handwritten signature biometric public database, iii) we propose a standard experimental protocol and benchmark to be used by the research community in order to perform a fair comparison of novel approaches with the state of the art, and iv) we adapt and evaluate our recent deep learning approach named Time-Aligned Recurrent Neural Networks (TA-RNNs) for the task of on-line handwritten signature verification. This approach combines the potential of Dynamic Time Warping and Recurrent Neural Networks to train more robust systems against forgeries. Our proposed TA-RNN system outperforms the state of the art, achieving results even below 2.0% EER when considering skilled forgery impostors and just one training signature per user.

[233]  arXiv:2002.10120 [pdf, other]
Title: Semantic Flow for Fast and Accurate Scene Parsing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

In this paper, we focus on effective methods for fast and accurate scene parsing. A common practice to improve the performance is to attain high resolution feature maps with strong semantic representation. Two widely used strategies, atrous convolutions and feature pyramid fusion, are either computation intensive or ineffective. Inspired by Optical Flow for motion alignment between adjacent video frames, we propose a Flow Alignment Module (FAM) to learn Semantic Flow between feature maps of adjacent levels and broadcast high-level features to high resolution features effectively and efficiently. Furthermore, integrating our module into a common feature pyramid structure exhibits superior performance over other real-time methods even on very light-weight backbone networks, such as ResNet-18. Extensive experiments are conducted on several challenging datasets, including Cityscapes, PASCAL Context, ADE20K and CamVid. Particularly, our network is the first to achieve 80.4% mIoU on Cityscapes with a frame rate of 26 FPS. The code will be available at https://github.com/donnyyou/torchcv

[234]  arXiv:2002.10121 [pdf, other]
Title: Optimal and Greedy Algorithms for Multi-Armed Bandits with Many Arms
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We characterize Bayesian regret in a stochastic multi-armed bandit problem with a large but finite number of arms. In particular, we assume the number of arms $k$ is $T^{\alpha}$, where $T$ is the time-horizon and $\alpha$ is in $(0,1)$. We consider a Bayesian setting where the reward distribution of each arm is drawn independently from a common prior, and provide a complete analysis of expected regret with respect to this prior. Our results exhibit a sharp distinction around $\alpha = 1/2$. When $\alpha < 1/2$, the fundamental lower bound on regret is $\Omega(k)$, and it is achieved by a standard UCB algorithm. When $\alpha > 1/2$, the fundamental lower bound on regret is $\Omega(\sqrt{T})$, and it is achieved by an algorithm that first subsamples $\sqrt{T}$ arms uniformly at random, then runs UCB on just this subset. Interestingly, we also find that a sufficiently large number of arms allows the decision-maker to benefit from "free" exploration if she simply uses a greedy algorithm. In particular, this greedy algorithm exhibits a regret of $\tilde{O}(\max(k,T/\sqrt{k}))$, which translates to a sublinear (though not optimal) regret in the time horizon. We show empirically that this is because the greedy algorithm rapidly disposes of underperforming arms, a beneficial trait in the many-armed regime. Technically, our analysis of the greedy algorithm involves a novel application of the Lundberg inequality, an upper bound for the ruin probability of a random walk; this approach may be of independent interest.
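
The subsample-then-UCB strategy for the regime $\alpha > 1/2$ can be sketched as follows. This is an illustrative sketch rather than the algorithm as analysed in the paper: Bernoulli rewards, the UCB1 index with exploration bonus $\sqrt{2\ln t / n}$, rounding $\sqrt{T}$ up to an integer, and the function name are all assumptions.

```python
import math
import random


def subsampled_ucb(arm_means, horizon, rng):
    """Sample ceil(sqrt(T)) arms uniformly, then run UCB1 on that subset only."""
    m = min(len(arm_means), math.ceil(math.sqrt(horizon)))
    subset = rng.sample(range(len(arm_means)), m)
    pulls = {a: 0 for a in subset}
    totals = {a: 0.0 for a in subset}
    history = []
    for t in range(1, horizon + 1):
        untried = [a for a in subset if pulls[a] == 0]
        if untried:
            arm = untried[0]  # pull every kept arm once before using the index
        else:
            arm = max(subset, key=lambda a: totals[a] / pulls[a]
                      + math.sqrt(2.0 * math.log(t) / pulls[a]))
        # Assumed reward model: Bernoulli with the arm's (hidden) mean.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        history.append(arm)
    return subset, history


if __name__ == "__main__":
    rng = random.Random(1)
    means = [rng.random() for _ in range(200)]  # arms drawn from a uniform prior
    subset, history = subsampled_ucb(means, horizon=400, rng=rng)
    print(len(subset), len(set(history)))
```

With 200 arms and a horizon of 400, only 20 arms are ever pulled, so the learner does not spend the horizon trying every arm.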

[235]  arXiv:2002.10126 [pdf, other]
Title: Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

Emerging applications in robotics and autonomous systems, such as autonomous driving and robotic surgery, often involve critical safety constraints that must be satisfied even when information about system models is limited. In this regard, we propose a model-free safety specification method that learns the maximal probability of safe operation by carefully combining probabilistic reachability analysis and safe reinforcement learning (RL). Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage. As a result, it yields a sequence of safe policies that determine the range of safe operation, called the safe set, which monotonically expands and gradually converges. We also develop an efficient safe exploration scheme that accelerates the process of identifying the safety of unexamined states. Exploiting the Lyapunov shielding, our method regulates the exploratory policy to avoid dangerous states with high confidence. To handle high-dimensional systems, we further extend our approach to deep RL by introducing a Lagrangian relaxation technique to establish a tractable actor-critic algorithm. The empirical performance of our method is demonstrated through continuous control benchmark problems, such as a reaching task on a planar robot arm.

[236]  arXiv:2002.10127 [pdf, other]
Title: FONDUE: A Framework for Node Disambiguation Using Network Embeddings
Comments: 11 pages, 3 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

Real-world data often presents itself in the form of a network. Examples include social networks, citation networks, biological networks, and knowledge graphs. In their simplest form, networks represent real-life entities (e.g. people, papers, proteins, concepts) as nodes, and describe them in terms of their relations with other entities by means of edges between these nodes. This can be valuable for a range of purposes from the study of information diffusion to bibliographic analysis, bioinformatics research, and question-answering.
The quality of networks is often problematic though, affecting downstream tasks. This paper focuses on the common problem where a node in the network in fact corresponds to multiple real-life entities. In particular, we introduce FONDUE, an algorithm based on network embedding for node disambiguation. Given a network, FONDUE identifies nodes that correspond to multiple entities, for subsequent splitting. Extensive experiments on twelve benchmark datasets demonstrate that FONDUE is substantially and uniformly more accurate for ambiguous node identification compared to the existing state-of-the-art, at a comparable computational cost, while less optimal for determining the best way to split ambiguous nodes.

[237]  arXiv:2002.10131 [pdf, other]
Title: Angry Birds Flock Together: Aggression Propagation on Social Media
Comments: 10 pages, 4 figures
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)

Cyberaggression has been found in various contexts and online social platforms, and modeled on different data using state-of-the-art machine and deep learning algorithms to enable automatic detection and blocking of this behavior. Users can be influenced to act aggressively or even bully others because of elevated toxicity and aggression in their own (online) social circle. In effect, this behavior can propagate from one user and neighborhood to another, and therefore, spread in the network. Interestingly, to our knowledge, no work has modeled the network dynamics of aggressive behavior. In this paper, we take a first step towards this direction, by studying propagation of aggression on social media. We look into various opinion dynamics models widely used to model how opinions propagate through a network. We propose ways to enhance these classical models to accommodate how aggression may propagate from one user to another, depending on how each user is connected to other aggressive or regular users. Through extensive simulations on Twitter data, we study how aggressive behavior could propagate in the network, and validate our models with ground truth from crawled data and crowdsourced annotations. We discuss the results and implications of our work.

[238]  arXiv:2002.10137 [pdf, other]
Title: Audio-driven Talking Face Video Generation with Natural Head Pose
Comments: 12 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Real-world talking faces are often accompanied by natural head movement. However, most existing talking face video generation methods only consider facial animation with fixed head pose. In this paper, we address this problem by proposing a deep neural network model that takes an audio signal A of a source person and a very short video V of a target person as input, and outputs a synthesized high-quality talking face video with natural head pose (making use of the visual information in V), expression and lip synchronization (by considering both A and V). The most challenging issue in our work is that natural poses often cause in-plane and out-of-plane head rotations, which makes synthesized talking face video far from realistic. To address this challenge, we reconstruct 3D face animation and re-render it into synthesized frames. To fine-tune these frames into realistic ones with smooth background transition, we propose a novel memory-augmented GAN module. Extensive experiments and three user studies show that our method can generate high-quality (i.e., natural head movements, expressions and good lip synchronization) personalized talking face videos, outperforming the state-of-the-art methods.

[239]  arXiv:2002.10142 [pdf, other]
Title: Explicit and Implicit Dynamic Coloring of Graphs with Bounded Arboricity
Subjects: Data Structures and Algorithms (cs.DS)

Graph coloring is a fundamental problem in computer science. We study the fully dynamic version of the problem in which the graph is undergoing edge insertions and deletions and we wish to maintain a vertex-coloring with small update time after each insertion and deletion.
We show how to maintain an $O(\alpha \lg n)$-coloring with polylogarithmic update time, where $n$ is the number of vertices in the graph and $\alpha$ is the current arboricity of the graph. This improves upon a result by Solomon and Wein (ESA'18) who maintained an $O(\alpha_{\max}\lg^2 n)$-coloring, where $\alpha_{\max}$ is the maximum arboricity of the graph over all updates.
Furthermore, motivated by a lower bound by Barba et al. (Algorithmica'19), we initiate the study of implicit dynamic colorings. Barba et al. showed that dynamic algorithms with polylogarithmic update time cannot maintain an $f(\alpha)$-coloring for any function $f$ when the vertex colors are stored explicitly, i.e., for each vertex the color is stored explicitly in the memory. Previously, all dynamic algorithms maintained explicit colorings. Therefore, we propose to study implicit colorings, i.e., the data structure only needs to offer an efficient query procedure to return the color of a vertex (instead of storing its color explicitly). We provide an algorithm which breaks the lower bound and maintains an implicit $2^{O(\alpha)}$-coloring with polylogarithmic update time. In particular, this yields the first dynamic $O(1)$-coloring for graphs with constant arboricity such as planar graphs or graphs with bounded tree-width, which is impossible using explicit colorings.
We also show how to dynamically maintain a partition of the graph's edges into $O(\alpha)$ forests with polylogarithmic update time. We believe this data structure is of independent interest and might have more applications in the future.

[240]  arXiv:2002.10143 [pdf, other]
Title: Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads
Subjects: Hardware Architecture (cs.AR)

Data-parallel applications, such as data analytics, machine learning, and scientific computing, are placing an ever-growing demand on floating-point operations per second on emerging systems. With increasing integration density, the quest for energy efficiency becomes the number one design concern. While dedicated accelerators provide high energy efficiency, they are over-specialized and hard to adjust to algorithmic changes. We propose an architectural concept that tackles the issues of achieving extreme energy efficiency while still maintaining high flexibility as a general-purpose compute engine. The key idea is to pair a tiny 10kGE control core, called Snitch, with a double-precision FPU to adjust the compute to control ratio. While traditionally minimizing non-FPU area and achieving high floating-point utilization has been a trade-off, with Snitch, we achieve them both, by enhancing the ISA with two minimally intrusive extensions: stream semantic registers (SSR) and a floating-point repetition instruction (FREP). SSRs allow the core to implicitly encode load/store instructions as register reads/writes, eliding many explicit memory instructions. The FREP extension decouples the floating-point and integer pipeline by sequencing instructions from a micro-loop buffer. These ISA extensions significantly reduce the pressure on the core and free it up for other tasks, making Snitch and FPU effectively dual-issue at a minimal incremental cost of 3.2%. The two low overhead ISA extensions make Snitch more flexible than a contemporary vector processor lane, achieving a $2\times$ energy-efficiency improvement. We have evaluated the proposed core and ISA extensions on an octa-core cluster in 22nm technology. We achieve more than $5\times$ multi-core speed-up and a $3.5\times$ gain in energy efficiency on several parallel microkernels.

[241]  arXiv:2002.10145 [pdf, other]
Title: Hardness of equations over finite solvable groups under the exponential time hypothesis
Authors: Armin Weiß
Subjects: Computational Complexity (cs.CC); Group Theory (math.GR)

Goldmann and Russell (2002) initiated the study of the complexity of the equation satisfiability problem in finite groups by showing that it is in P for nilpotent groups while it is NP-complete for non-solvable groups. Since then, several results have appeared showing that the problem can be solved in polynomial time in certain solvable groups of Fitting length two. In this work, we present the first lower bounds for the equation satisfiability problem in finite solvable groups: under the assumption of the exponential time hypothesis, we show that it cannot be in P for any group of Fitting length at least four and for certain groups of Fitting length three. Moreover, the same hardness result applies to the equation identity problem.

[242]  arXiv:2002.10148 [pdf, other]
Title: Embedded-physics machine learning for coarse-graining and collective variable discovery without data
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)

We present a novel learning framework that consistently embeds underlying physics while bypassing a significant drawback of most modern, data-driven coarse-grained approaches in the context of molecular dynamics (MD), i.e., the availability of big data. The generation of a sufficiently large training dataset poses a computationally demanding task, while complete coverage of the atomistic configuration space is not guaranteed. As a result, the explorative capabilities of data-driven coarse-grained models are limited and may yield biased "predictive" tools. We propose a novel objective based on reverse Kullback-Leibler divergence that fully incorporates the available physics in the form of the atomistic force field. Rather than separating model learning from the data-generation procedure - the latter relies on simulating atomistic motions governed by force fields - we query the atomistic force field at sample configurations proposed by the predictive coarse-grained model. Thus, learning relies on the evaluation of the force field but does not require any MD simulation. The resulting generative coarse-grained model serves as an efficient surrogate model for predicting atomistic configurations and estimating relevant observables. Beyond obtaining a predictive coarse-grained model, we demonstrate that in the discovered lower-dimensional representation, the collective variables (CVs) are related to physicochemical properties, which are essential for gaining understanding of unexplored complex systems. We demonstrate the algorithmic advances in terms of predictive ability and the physical meaning of the revealed CVs for a bimodal potential energy function and the alanine dipeptide.

[243]  arXiv:2002.10149 [pdf, other]
Title: Cognitive Argumentation and the Suppression Task
Subjects: Artificial Intelligence (cs.AI)

This paper addresses the challenge of modeling human reasoning, within a new framework called Cognitive Argumentation. This framework rests on the assumption that human logical reasoning is inherently a process of dialectic argumentation and aims to develop a cognitive model for human reasoning that is computational and implementable. To give logical reasoning a human cognitive form the framework relies on cognitive principles, based on empirical and theoretical work in Cognitive Science, to suitably adapt a general and abstract framework of computational argumentation from AI. The approach of Cognitive Argumentation is evaluated with respect to Byrne's suppression task, where the aim is not only to capture the suppression effect between different groups of people but also to account for the variation of reasoning within each group. Two main cognitive principles are particularly important to capture human conditional reasoning that explain the participants' responses: (i) the interpretation of a condition within a conditional as sufficient and/or necessary and (ii) the mode of reasoning either as predictive or explanatory. We argue that Cognitive Argumentation provides a coherent and cognitively adequate model for human conditional reasoning that allows a natural distinction between definite and plausible conclusions, exhibiting the important characteristics of context-sensitive and defeasible reasoning.

[244]  arXiv:2002.10151 [pdf, ps, other]
Title: Vizing-Goldberg type bounds for the equitable chromatic number of block graphs
Comments: 21 pages, 12 figures
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)

An equitable coloring of a graph $G$ is a proper vertex coloring of $G$ such that the sizes of any two color classes differ by at most one. In the paper, we pose a conjecture that offers a gap-one bound for the smallest number of colors needed to equitably color every block graph. In other words, the difference between the upper and the lower bounds of our conjecture is at most one. Thus, in some sense, the situation is similar to that of chromatic index, where we have the classical theorem of Vizing and the Goldberg conjecture for multigraphs. The results obtained in the paper support our conjecture. More precisely, we verify it in the class of well-covered block graphs, which are block graphs in which each vertex belongs to a maximum independent set. We also show that the conjecture is true for block graphs, which contain a vertex that does not lie in an independent set of size larger than two. Finally, we verify the conjecture for some symmetric-like block graphs. In order to derive our results we obtain structural characterizations of block graphs from these classes.
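
The definition in the first sentence can be checked mechanically. The sketch below tests whether a given coloring is proper and whether its color class sizes differ by at most one; the function name, the edge-list input, and the star example are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def is_equitable_coloring(edges, coloring):
    """Return True if `coloring` (vertex -> color) is proper and equitable."""
    # Proper: the two endpoints of every edge receive different colors.
    if any(coloring[u] == coloring[v] for u, v in edges):
        return False
    # Equitable: the sizes of any two color classes differ by at most one.
    sizes = Counter(coloring.values()).values()
    return max(sizes) - min(sizes) <= 1


# Hypothetical example: the star K_{1,3} with center 0 (a tree, hence a block graph).
star_edges = [(0, 1), (0, 2), (0, 3)]
two_colors = {0: "a", 1: "b", 2: "b", 3: "b"}    # proper, class sizes 1 and 3
three_colors = {0: "a", 1: "b", 2: "b", 3: "c"}  # proper, class sizes 1, 2, 1
```

In this example the proper 2-coloring is not equitable, while the 3-coloring is, which shows that the equitable chromatic number can exceed the ordinary chromatic number.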

[245]  arXiv:2002.10152 [pdf, other]
Title: Real-time Kinematic Ground Truth for the Oxford RobotCar Dataset
Comments: Dataset website: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

We describe the release of reference data towards a challenging long-term localisation and mapping benchmark based on the large-scale Oxford RobotCar Dataset. The release includes 72 traversals of a route through Oxford, UK, gathered in all illumination, weather and traffic conditions, and is representative of the conditions an autonomous vehicle would be expected to operate reliably in. Using post-processed raw GPS, IMU, and static GNSS base station recordings, we have produced a globally-consistent centimetre-accurate ground truth for the entire year-long duration of the dataset. Coupled with a planned online benchmarking service, we hope to enable quantitative evaluation and comparison of different localisation and mapping approaches focusing on long-term autonomy for road vehicles in urban environments challenged by changing weather.

[246]  arXiv:2002.10158 [pdf, other]
Title: Robot Perception of Static and Dynamic Objects with an Autonomous Floor Scrubber
Comments: 15 pages, 16 figures, submitted to Intelligent Service Robotics
Subjects: Robotics (cs.RO)

This paper presents the perception system of a new professional cleaning robot for large public places. The proposed system is based on multiple sensors including 3D and 2D lidar, two RGB-D cameras and a stereo camera. The two lidars together with an RGB-D camera are used for dynamic object (human) detection and tracking, while the second RGB-D and stereo camera are used for detection of static objects (dirt and ground objects). A learning and reasoning module for spatial-temporal representation of the environment based on the perception pipeline is also introduced. Furthermore, a new dataset collected with the robot in several public places, including a supermarket, a warehouse and an airport, is released. Baseline results on this dataset for further research and comparison are provided. The proposed system has been fully implemented into the Robot Operating System (ROS) with high modularity, also publicly available to the community.

[247]  arXiv:2002.10163 [pdf, other]
Title: Software Engineering Timeline: major areas of interest and multidisciplinary trends
Comments: Technical report University of Almería
Subjects: Software Engineering (cs.SE)

Society today cannot run without software and by extension, without Software Engineering. Since this discipline emerged in 1968, practitioners have learned valuable lessons that have contributed to current practices. Some have become outdated but many are still relevant and widely used. From the personal and incomplete perspective of the authors, this paper not only reviews the major milestones and areas of interest in the Software Engineering timeline helping software engineers to appreciate the state of things, but also tries to give some insights into the trends that this complex engineering will see in the near future.

[248]  arXiv:2002.10171 [pdf, ps, other]
Title: A Probabilistic Approach to Voting, Allocation, Matching, and Coalition Formation
Authors: Haris Aziz
Comments: Preprint for book chapter in "The Future of Economic Design: The Continuing Development of a Field as Envisioned by Its Researchers"
Subjects: Computer Science and Game Theory (cs.GT)

Randomisation and time-sharing are some of the oldest methods to achieve fairness. I make a case that applying these approaches to social choice settings constitutes a powerful paradigm that deserves an extensive and thorough examination. I discuss challenges and opportunities in applying these approaches to settings including voting, allocation, matching, and coalition formation.

[249]  arXiv:2002.10172 [pdf, other]
Title: Optimal strategies in the Fighting Fantasy gaming system: influencing stochastic dynamics by gambling with limited resource
Authors: Iain G. Johnston
Comments: Keyword: stochastic game; Markov decision problem; stochastic simulation; dynamic programming; resource allocation; stochastic optimal control; Bellman equation
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

Fighting Fantasy is a popular recreational fantasy gaming system worldwide. Combat in this system progresses through a stochastic game involving a series of rounds, each of which may be won or lost. Each round, a limited resource ('luck') may be spent on a gamble to amplify the benefit from a win or mitigate the deficit from a loss. However, the success of this gamble depends on the amount of remaining resource, and if the gamble is unsuccessful, benefits are reduced and deficits increased. Players thus dynamically choose to expend resource to attempt to influence the stochastic dynamics of the game, with diminishing probability of positive return. The identification of the optimal strategy for victory is a Markov decision problem that has not yet been solved. Here, we combine stochastic analysis and simulation with dynamic programming to characterise the dynamical behaviour of the system in the absence and presence of gambling policy. We derive a simple expression for the victory probability without luck-based strategy. We use a backward induction approach to solve the Bellman equation for the system and identify the optimal strategy for any given state during the game. The optimal control strategies can dramatically enhance success probabilities, but take detailed forms; we use stochastic simulation to approximate these optimal strategies with simple heuristics that can be practically employed. Our findings provide a roadmap to improving success in the games that millions of people play worldwide, and inform a class of resource allocation problems with diminishing returns in stochastic games.
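
The luck-free baseline mentioned above can be sketched as a small recursion over stamina states. The abstract does not state the combat rules, so the following are assumptions: each side rolls two six-sided dice and adds its skill, the higher total wins the round, a draw changes nothing, and the loser of a round loses 2 stamina. The function names are hypothetical, and this is not the paper's closed-form expression or its luck-aware Bellman solution.

```python
from functools import lru_cache
from itertools import product


def round_win_probability(skill_diff):
    """P(hero wins a round | the round is not a draw).

    Assumes both sides roll 2d6 and add their skill;
    skill_diff = hero skill - opponent skill.
    """
    wins = losses = 0
    for d1, d2, d3, d4 in product(range(1, 7), repeat=4):
        hero, foe = d1 + d2 + skill_diff, d3 + d4
        if hero > foe:
            wins += 1
        elif hero < foe:
            losses += 1
    # Draws leave the state unchanged, so conditioning on a decisive round is safe.
    return wins / (wins + losses)


def victory_probability(skill_diff, hero_stamina, foe_stamina, damage=2):
    """Probability that the hero reduces the opponent's stamina to zero first."""
    p = round_win_probability(skill_diff)

    @lru_cache(maxsize=None)
    def value(h, f):
        if f <= 0:
            return 1.0  # opponent defeated
        if h <= 0:
            return 0.0  # hero defeated
        # Recurse on the outcome of the next decisive round.
        return p * value(h, f - damage) + (1 - p) * value(h - damage, f)

    return value(hero_stamina, foe_stamina)
```

Adding luck would extend the state with the remaining luck and replace the fixed transition by a maximum over "gamble" and "do not gamble", which is the kind of backward induction the abstract describes.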

[250]  arXiv:2002.10174 [pdf, other]
Title: When Relation Networks meet GANs: Relation GANs with Triplet Loss
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Though recent research has achieved remarkable progress in generating realistic images with generative adversarial networks (GANs), the lack of training stability is still a lingering concern of most GANs, especially on high-resolution inputs and complex datasets. Since the randomly generated distribution can hardly overlap with the real distribution, training GANs often suffers from the gradient vanishing problem. A number of approaches have been proposed to address this issue by constraining the discriminator's capabilities using empirical techniques such as weight clipping, gradient penalty, and spectral normalization. In this paper, we provide a more principled approach as an alternative solution to this issue. Instead of training the discriminator to distinguish real and fake input samples, we investigate the relationship between paired samples by training the discriminator to separate paired samples from the same distribution and those from different distributions. To this end, we explore a relation network architecture for the discriminator and design a triplet loss that yields better generalization and stability. Extensive experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvement on various vision tasks including unconditional and conditional image generation and image translation. Our source code is available at https://github.com/JosephineRabbit/Relation-GAN

[251]  arXiv:2002.10177 [pdf, other]
Title: Improving STDP-based Visual Feature Learning with Whitening
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

In recent years, spiking neural networks (SNNs) have emerged as an alternative to deep neural networks (DNNs). SNNs present a higher computational efficiency using low-power neuromorphic hardware and require less labeled data for training using local and unsupervised learning rules such as spike timing-dependent plasticity (STDP). SNNs have proven their effectiveness in image classification on simple datasets such as MNIST. However, to process natural images, a pre-processing step is required. Difference-of-Gaussians (DoG) filtering is typically used together with on-center/off-center coding, but it results in a loss of information that is detrimental to the classification performance. In this paper, we propose to use whitening as a pre-processing step before learning features with STDP. Experiments on CIFAR-10 show that whitening allows STDP to learn visual features that are closer to the ones learned with standard neural networks, with a significantly increased classification performance as compared to DoG filtering. We also propose an approximation of whitening as convolution kernels that is computationally cheaper to learn and more suited to be implemented on neuromorphic hardware. Experiments on CIFAR-10 show that it performs similarly to regular whitening. Cross-dataset experiments on CIFAR-10 and STL-10 also show that it is fairly stable across datasets, making it possible to learn a single whitening transformation to process different datasets.
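
To make the pre-processing step concrete, here is a minimal sketch of one common whitening variant, ZCA whitening, applied to flattened image patches. The abstract only says "whitening", so the choice of ZCA, the regularizer `eps`, and the function name are assumptions; the convolution-kernel approximation proposed in the paper is not shown.

```python
import numpy as np


def fit_zca_whitening(X, eps=1e-5):
    """Fit ZCA whitening on X of shape (n_samples, n_features).

    Returns (mean, W) such that (X - mean) @ W has approximately
    identity covariance. eps regularizes near-zero eigenvalues.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / X.shape[0]
    # Symmetric eigendecomposition of the covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rescale each principal direction to unit variance, then rotate back.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return mean, W
```

Because the fitted `mean` and `W` are plain arrays, a transformation fitted on one dataset can be reused on another, which is the scenario the cross-dataset experiments in the abstract examine.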

[252]  arXiv:2002.10179 [pdf, other]
Title: HRank: Filter Pruning using High-Rank Feature Map
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Neural network pruning offers a promising prospect to facilitate deploying deep neural networks on resource-limited devices. However, existing methods are still challenged by the training inefficiency and labor cost in pruning designs, due to missing theoretical guidance of non-salient network components. In this paper, we propose a novel filter pruning method by exploring the High Rank of feature maps (HRank). Our HRank is inspired by the discovery that the average rank of multiple feature maps generated by a single filter is always the same, regardless of the number of image batches CNNs receive. Based on HRank, we develop a method that is mathematically formulated to prune filters with low-rank feature maps. The principle behind our pruning is that low-rank feature maps contain less information, and thus pruned results can be easily reproduced. Besides, we experimentally show that weights with high-rank feature maps contain more important information, such that even when a portion is not updated, very little damage would be done to the model performance. Without introducing any additional constraints, HRank leads to significant improvements over the state of the art in terms of FLOPs and parameters reduction, with similar accuracies. For example, with ResNet-110, we achieve a 58.2%-FLOPs reduction by removing 59.2% of the parameters, with only a small loss of 0.14% in top-1 accuracy on CIFAR-10. With Res-50, we achieve a 43.8%-FLOPs reduction by removing 36.7% of the parameters, with only a loss of 1.17% in the top-1 accuracy on ImageNet. The code is available at https://github.com/lmbxmu/HRank.
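
The ranking criterion can be illustrated with a short NumPy sketch: compute the matrix rank of each filter's feature map, average over a batch, and select the filters with the lowest average rank. The array layout `(batch, channels, height, width)` and both function names are assumptions for illustration; this is not the released HRank code.

```python
import numpy as np


def average_feature_map_rank(feature_maps):
    """Average matrix rank per channel.

    feature_maps has shape (batch, channels, height, width); each
    (height, width) slice is treated as a matrix.
    """
    ranks = np.linalg.matrix_rank(feature_maps)  # shape (batch, channels)
    return ranks.mean(axis=0)


def filters_to_prune(feature_maps, n_prune):
    """Indices of the n_prune filters whose feature maps have the lowest average rank."""
    avg_rank = average_feature_map_rank(feature_maps)
    return np.argsort(avg_rank)[:n_prune]


# Hypothetical toy batch: channel 0 is rank 1, channel 1 is full rank, channel 2 is zero.
toy = np.zeros((2, 3, 4, 4))
toy[:, 0] = np.ones((4, 4))
toy[:, 1] = np.eye(4)
```

In practice the feature maps would come from a forward pass over a few input batches, and pruning would be followed by fine-tuning.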

[253]  arXiv:2002.10181 [pdf, other]
Title: Relaxing Relationship Queries on Graph Data
Comments: 16 pages, accepted to JoWS
Subjects: Information Retrieval (cs.IR); Databases (cs.DB)

In many domains we have witnessed the need to search a large entity-relation graph for direct and indirect relationships between a set of entities specified in a query. A search result, called a semantic association (SA), is typically a compact (e.g., diameter-constrained) connected subgraph containing all the query entities. For this problem of SA search, efficient algorithms exist but will return empty results if some query entities are distant in the graph. To reduce the occurrence of failing query and provide alternative results, we study the problem of query relaxation in the context of SA search. Simply relaxing the compactness constraint will sacrifice the compactness of an SA, and more importantly, may lead to performance issues and be impracticable. Instead, we focus on removing the smallest number of entities from the original failing query, to form a maximum successful sub-query which minimizes the loss of result quality caused by relaxation. We prove that verifying the success of a sub-query turns into finding an entity (called a certificate) that satisfies a distance-based condition about the query entities. To efficiently find a certificate of the success of a maximum sub-query, we propose a best-first search algorithm that leverages distance-based estimation to effectively prune the search space. We further improve its performance by adding two fine-grained heuristics: one based on degree and the other based on distance. Extensive experiments over popular RDF datasets demonstrate the efficiency of our algorithm, which is more scalable than baselines.

[254]  arXiv:2002.10185 [pdf, other]
Title: iLQGames.jl: Rapidly Designing and Solving Differential Games in Julia
Subjects: Multiagent Systems (cs.MA)

In many problems that involve multiple decision making agents, optimal choices for each agent depend on the choices of others. Differential game theory provides a principled formalism for expressing these coupled interactions and recent work offers efficient approximations to solve these problems to non-cooperative equilibria. iLQGames.jl is a framework for designing and solving differential games, built around the iterative linear-quadratic method. It is written in the Julia programming language to allow flexible prototyping and integration with other research software, while leveraging the high-performance nature of the language to allow real-time execution. The open-source software package can be found at https://github.com/lassepe/iLQGames.jl.

[255]  arXiv:2002.10187 [pdf, other]
Title: 3DSSD: Point-based 3D Single Stage Object Detector
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Many voxel-based 3D single-stage detectors exist, while point-based single-stage methods remain underexplored. In this paper, we first present a lightweight and effective point-based 3D single-stage object detector, named 3DSSD, achieving a good balance between accuracy and efficiency. In this paradigm, all upsampling layers and the refinement stage, which are indispensable in all existing point-based methods, are abandoned to reduce the large computation cost. We propose a novel fusion sampling strategy in the downsampling process to make detection on less representative points feasible. A delicate box prediction network, including a candidate generation layer and an anchor-free regression head with a 3D center-ness assignment strategy, is designed to meet our demands for accuracy and speed. Our paradigm is an elegant single-stage anchor-free framework, showing great superiority to other existing methods. We evaluate 3DSSD on the widely used KITTI dataset and the more challenging nuScenes dataset. Our method outperforms all state-of-the-art voxel-based single-stage methods by a large margin, and has comparable performance to two-stage point-based methods as well, with inference speed of more than 25 FPS, 2x faster than former state-of-the-art point-based methods.

[256]  arXiv:2002.10191 [pdf, other]
Title: Learning Attentive Pairwise Interaction for Fine-Grained Classification
Comments: Accepted at AAAI-2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Fine-grained classification is a challenging problem, due to subtle differences among highly-confused categories. Most approaches address this difficulty by learning discriminative representations of individual input images. On the other hand, humans can effectively identify contrastive clues by comparing image pairs. Inspired by this fact, this paper proposes a simple but effective Attentive Pairwise Interaction Network (API-Net), which can progressively recognize a pair of fine-grained images by interaction. Specifically, API-Net first learns a mutual feature vector to capture semantic differences in the input pair. It then compares this mutual vector with individual vectors to generate gates for each input image. These distinct gate vectors inherit mutual context on semantic differences, which allows API-Net to attentively capture contrastive clues by pairwise interaction between two images. Additionally, we train API-Net in an end-to-end manner with a score ranking regularization, which can further generalize API-Net by taking feature priorities into account. We conduct extensive experiments on five popular benchmarks in fine-grained classification. API-Net outperforms the recent SOTA methods, i.e., CUB-200-2011 (90.0%), Aircraft (93.9%), Stanford Cars (95.3%), Stanford Dogs (90.3%), and NABirds (88.1%).

[257]  arXiv:2002.10198 [pdf, other]
Title: Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning
Comments: Published at The Web Conference (WWW) 2020, full paper
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Software Engineering (cs.SE)

Code summarization generates a brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query. Since both tasks aim to model the association between natural language and programming language, recent studies have combined these two tasks to improve their performance. However, researchers have not yet been able to effectively leverage the intrinsic connection between the two tasks, as they train these tasks in a separate or pipeline manner, which means their performance cannot be well balanced. In this paper, we propose a novel end-to-end model for the two tasks by introducing an additional code generation task. More specifically, we explicitly exploit the probabilistic correlation between code summarization and code generation with dual learning, and utilize the two encoders for code summarization and code generation to train the code retrieval task via multi-task learning. We have carried out extensive experiments on an existing dataset of SQL and Python, and results show that our model can significantly improve the results of the code retrieval task over the state-of-the-art models, as well as achieve competitive performance in terms of BLEU score for the code summarization task.

[258]  arXiv:2002.10199 [pdf, ps, other]
Title: Better Classifier Calibration for Small Data Sets
Comments: Accepted for publication in ACM Transactions on Knowledge Discovery from Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Classifier calibration does not always go hand in hand with the classifier's ability to separate the classes. There are applications where good classifier calibration, i.e. the ability to produce accurate probability estimates, is more important than class separation. When the amount of data for training is limited, the traditional approach to improve calibration starts to crumble. In this article we show how generating more data for calibration is able to improve calibration algorithm performance in many cases where a classifier is not naturally producing well-calibrated outputs and the traditional approach fails. The proposed approach adds computational cost but considering that the main use case is with small data sets this extra computational cost stays insignificant and is comparable to other methods in prediction time. From the tested classifiers the largest improvement was detected with the random forest and naive Bayes classifiers. Therefore, the proposed approach can be recommended at least for those classifiers when the amount of data available for training is limited and good calibration is essential.

[259]  arXiv:2002.10200 [pdf, other]
Title: ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
Comments: Accepted to Proc. IEEE Conf. Comp. Vis. Pattern Recogn. (CVPR) 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Scene text detection and recognition has received increasing research attention. Existing methods can be roughly categorized into two groups: character-based and segmentation-based. These methods either are costly for character annotation or need to maintain a complex pipeline, which is often not suitable for real-time applications. Here we address the problem by proposing the Adaptive Bezier-Curve Network (ABCNet). Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve. 2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods. 3) Compared with standard bounding box detection, our Bezier curve detection introduces negligible computation overhead, resulting in superiority of our method in both efficiency and accuracy. Experiments on arbitrarily-shaped benchmark datasets, namely Total-Text and CTW1500, demonstrate that ABCNet achieves state-of-the-art accuracy while significantly improving the speed. In particular, on Total-Text, our real-time version is over 10 times faster than recent state-of-the-art methods with a competitive recognition accuracy. Code is available at https://tinyurl.com/AdelaiDet
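
For readers unfamiliar with the parameterization in contribution 1), the sketch below evaluates a Bezier curve from its control points using the Bernstein basis. It illustrates only the curve representation; the fitting procedure and the BezierAlign layer are not shown, and the function name and the cubic example are assumptions.

```python
from math import comb

import numpy as np


def bezier_points(control_points, num=20):
    """Sample `num` points on the Bezier curve defined by `control_points`.

    control_points has shape (n + 1, 2) for a degree-n curve.
    Returns an array of shape (num, 2).
    """
    P = np.asarray(control_points, dtype=float)
    n = len(P) - 1
    t = np.linspace(0.0, 1.0, num)
    # Bernstein basis: B_{i,n}(t) = C(n, i) * t^i * (1 - t)^(n - i).
    basis = np.stack(
        [comb(n, i) * t**i * (1 - t) ** (n - i) for i in range(n + 1)], axis=1
    )
    return basis @ P


# Hypothetical cubic curve, e.g. one long side of a curved text region.
control = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
samples = bezier_points(control, num=3)  # t = 0, 0.5, 1
```

A curved text boundary can then be described by a handful of control points per side instead of a dense polygon, which is what keeps the detection overhead small.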

[260]  arXiv:2002.10210 [pdf, other]
Title: Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation
Comments: accepted by AAAI2020
Subjects: Computation and Language (cs.CL)

In this paper, we focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer and aims to preserve text styles while altering the content. In detail, the input is a set of structured records and a reference text for describing another recordset. The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference. The task is unsupervised due to lack of parallel data, and is challenging to select suitable records and style words from bi-aspect inputs respectively and generate a high-fidelity long document. To tackle those problems, we first build a dataset based on a basketball game report corpus as our testbed, and present an unsupervised neural model with interactive attention mechanism, which is used for learning the semantic relationship between records and reference texts to achieve better content transfer and better style preservation. In addition, we also explore the effectiveness of the back-translation in our task for constructing some pseudo-training pairs. Empirical results show superiority of our approaches over competitive methods, and the models also yield a new state-of-the-art result on a sentence-level dataset.

[261]  arXiv:2002.10211 [pdf, other]
Title: Mnemonics Training: Multi-Class Incremental Learning without Forgetting
Comments: Accepted by CVPR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Multi-Class Incremental Learning (MCIL) aims to learn new concepts by incrementally updating a model trained on previous concepts. However, there is an inherent trade-off to effectively learning new concepts without forgetting previous ones, potentially leading to catastrophic forgetting of previous concepts. To alleviate this issue, it has been proposed to keep around a few examples of the previous concepts, but the effectiveness of this approach heavily depends on the representativeness of these examples. This paper proposes a novel and automatic framework we call mnemonics, where we parameterize exemplars and make them optimizable in an end-to-end manner. We train the framework through bilevel optimizations, i.e., model-level and exemplar-level. We conduct extensive experiments on three MCIL benchmarks, CIFAR-100, ImageNet-Subset and ImageNet, and show that using mnemonics exemplars can surpass the state-of-the-art by a large margin. Interestingly, the mnemonics exemplars tend to be on the boundaries between classes.

[262]  arXiv:2002.10212 [pdf, ps, other]
Title: A Mechanised Semantics for HOL with Ad-hoc Overloading
Comments: 18 pages, submitted to LPAR 2020
Subjects: Logic in Computer Science (cs.LO)

Isabelle/HOL augments classical higher-order logic with ad-hoc overloading of constant definitions---that is, one constant may have several definitions for non-overlapping types. In this paper, we present a mechanised proof that HOL with ad-hoc overloading is consistent. All our results have been formalised in the HOL4 theorem prover.

[263]  arXiv:2002.10213 [pdf, other]
Title: Superoptimization of WebAssembly Bytecode
Comments: 4 pages, 3 figures, MoreVMs 2020
Subjects: Programming Languages (cs.PL)

Motivated by the fast adoption of WebAssembly, we propose the first functional pipeline to support the superoptimization of WebAssembly bytecode. Our pipeline works over LLVM and Souper. We evaluate our superoptimization pipeline with 12 programs from the Rosetta code project. Our pipeline improves the code section size of 8 out of 12 programs. We discuss the challenges faced in superoptimization of WebAssembly with two case studies.

[264]  arXiv:2002.10214 [pdf, other]
Title: Injective Domain Knowledge in Neural Networks for Transprecision Computing
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Machine Learning (ML) models are very effective in many learning tasks, due to the capability to extract meaningful information from large data sets. Nevertheless, there are learning problems that cannot be easily solved relying on pure data, e.g. scarce data or very complex functions to be approximated. Fortunately, in many contexts domain knowledge is explicitly available and can be used to train better ML models. This paper studies the improvements that can be obtained by integrating prior knowledge when dealing with a non-trivial learning task, namely precision tuning of transprecision computing applications. The domain information is injected in the ML models in different ways: I) additional features, II) ad-hoc graph-based network topology, III) regularization schemes. The results clearly show that ML models exploiting problem-specific information outperform the purely data-driven ones, with an average accuracy improvement around 38%.

[265]  arXiv:2002.10215 [pdf, other]
Title: On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering
Comments: Accepted to Proc. IEEE Conf. Computer Vision and Pattern Recognition 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual Question Answering (VQA) methods have made incredible progress, but suffer from a failure to generalize. This is visible in the fact that they are vulnerable to learning coincidental correlations in the data rather than deeper relations between image content and ideas expressed in language. We present a dataset that takes a step towards addressing this problem in that it contains questions expressed in two languages, and an evaluation process that co-opts a well understood image-based metric to reflect the method's ability to reason. Measuring reasoning directly encourages generalization by penalizing answers that are coincidentally correct. The dataset reflects the scene-text version of the VQA problem, and the reasoning evaluation can be seen as a text-based version of a referring expression challenge. Experiments and analysis are provided that show the value of the dataset.

[266]  arXiv:2002.10217 [pdf, other]
Title: Automatic Estimation of Sphere Centers from Images of Calibrated Cameras
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Calibration of devices with different modalities is a key problem in robotic vision. Regular spatial objects, such as planes, are frequently used for this task. This paper deals with the automatic detection of ellipses in camera images, as well as with estimating the 3D position of the spheres corresponding to the detected 2D ellipses. We propose two novel methods to (i) detect an ellipse in camera images and (ii) estimate the spatial location of the corresponding sphere if its size is known. The algorithms are tested both quantitatively and qualitatively. They are applied to calibrating the sensor system of autonomous cars equipped with digital cameras, depth sensors and LiDAR devices.

[267]  arXiv:2002.10220 [pdf, ps, other]
Title: On the use of the Infinity Computer architecture to set up a dynamic precision floating-point arithmetic
Comments: 11 pages, 2 figures, 6 tables
Subjects: Numerical Analysis (math.NA)

We devise a variable precision floating-point arithmetic by exploiting the framework provided by the Infinity Computer. This is a computational platform implementing the Infinity Arithmetic system, a positional numeral system which can handle both infinite and infinitesimal quantities symbolized by the positive and negative finite powers of the radix grossone. The computational features offered by the Infinity Computer allow us to dynamically change the accuracy of representation and floating-point operations during the flow of a computation. When suitably implemented, this possibility turns out to be particularly advantageous when solving ill-conditioned problems. In fact, compared with a standard multi-precision arithmetic, here the accuracy is improved only when needed, so the overall computational effort is not greatly affected. An illustrative example concerning the solution of a nonlinear equation is also presented.

[268]  arXiv:2002.10221 [pdf, ps, other]
Title: The Archimedean trap: Why traditional reinforcement learning will probably not yield AGI
Comments: 16 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

After generalizing the Archimedean property of real numbers in such a way as to make it adaptable to non-numeric structures, we demonstrate that the real numbers cannot be used to accurately measure non-Archimedean structures. We argue that, since an agent with Artificial General Intelligence (AGI) should have no problem engaging in tasks that inherently involve non-Archimedean rewards, and since traditional reinforcement learning rewards are real numbers, traditional reinforcement learning cannot lead to AGI. We indicate two possible ways traditional reinforcement learning could be altered to remove this roadblock.

[269]  arXiv:2002.10228 [pdf]
Title: Dynamic Systems Simulation and Control Using Consecutive Recurrent Neural Networks
Comments: 14 pages, granted for publication in Communications in Computer and Information Science (CCIS) proceedings by Springer Nature, presented in the International Conference on Modelling, Machine Learning and Astronomy (MMLA 2019)
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Signal Processing (eess.SP)

In this paper, we introduce a novel architecture for connecting adaptive learning and neural networks into an arbitrary machine's control system paradigm. Two consecutive Recurrent Neural Networks (RNNs) are used together to accurately model the dynamic characteristics of electromechanical systems that include controllers, actuators and motors. The age-old method of achieving control with the use of Proportional, Integral and Derivative (PID) constants is well understood as a simplified method that does not capture the complexities of the inherent nonlinearities of complex control systems. In the context of controlling and simulating electromechanical systems, we propose an alternative to PID controllers, employing a sequence of two Recurrent Neural Networks. The first RNN emulates the behavior of the controller, and the second the actuator/motor. The second RNN, when used in isolation, potentially serves as an advantageous alternative to extant testing methods of electromechanical systems.
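
For readers unfamiliar with the baseline this abstract argues against, the following is a minimal sketch of a textbook discrete-time PID controller. It is not taken from the paper; the class name `PID`, its parameters, and the fixed time step `dt` are illustrative assumptions.

```python
class PID:
    """Textbook discrete PID controller (illustrative sketch, not the paper's code)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd = kp, ki, kd  # proportional, integral, derivative gains
        self.dt = dt                            # fixed sampling interval (assumed)
        self.integral = 0.0                     # running integral of the error
        self.prev_error = 0.0                   # error from the previous step

    def step(self, setpoint, measurement):
        """Return the control output for one sampling interval."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

The output is a fixed linear combination of the error, its integral, and its derivative, which is the sense in which the abstract calls PID a simplified method for nonlinear systems.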

[270]  arXiv:2002.10231 [pdf, other]
Title: Linear-frictional contact model for 3D discrete element simulations of granular systems
Comments: 6 figures
Journal-ref: Int J Numer Methods Eng. 2020, 121(3), 560-569
Subjects: Computational Engineering, Finance, and Science (cs.CE); Soft Condensed Matter (cond-mat.soft)

The linear-frictional contact model is the most commonly used contact mechanism for discrete element (DEM) simulations of granular materials. Linear springs with a frictional slider are used for modeling interactions in directions normal and tangential to the contact surface. Although the model is simple in two dimensions, its implementation in 3D faces certain subtle challenges, and the particle interactions that occur within a single time-step require careful modeling with a robust algorithm. The paper details a 3D algorithm that accounts for the changing direction of the tangential force within a time-step, the transition from elastic to slip behavior within a time-step, possible contact sliding during only part of a time-step, and twirling and rotation of the tangential force during a time-step. Without three of these adjustments, errors are introduced in the incremental stiffness of an assembly. Without the fourth adjustment, the resulting stress tensor is not only incorrect, it is no longer a tensor. The algorithm also computes the work increments during a time-step, both elastic and dissipative.
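
As a rough illustration of the basic model described above, the sketch below applies one elastic trial step of a tangential linear spring and then caps the force at the Coulomb friction limit. It deliberately omits the within-step corrections the paper details (rotation of the tangential force, partial-step sliding). The function name, argument names, and sign convention are assumptions made for illustration.

```python
import math


def update_tangential_force(f_t, du_t, k_t, mu, f_n):
    """Return the tangential force after one time-step (illustrative sketch).

    f_t  : current tangential force, 3-tuple
    du_t : tangential relative displacement increment, 3-tuple
    k_t  : tangential spring stiffness
    mu   : friction coefficient
    f_n  : normal force magnitude (compressive, >= 0)
    """
    # Elastic trial step: the linear spring adds k_t * du_t to the force.
    trial = tuple(f + k_t * du for f, du in zip(f_t, du_t))
    magnitude = math.sqrt(sum(c * c for c in trial))
    limit = mu * f_n  # Coulomb friction limit
    if magnitude <= limit or magnitude == 0.0:
        return trial  # contact remains elastic
    # Slip: keep the trial direction, scale the magnitude back to the limit.
    scale = limit / magnitude
    return tuple(c * scale for c in trial)
```

Treating the whole step as either elastic or slipping, as done here, is the kind of simplification that the abstract says introduces stiffness errors.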

[271]  arXiv:2002.10233 [pdf, ps, other]
Title: ArcText: A Unified Text Approach to Describing Convolutional Neural Network Architectures
Authors: Yanan Sun
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Numerous Convolutional Neural Network (CNN) models have demonstrated promising performance, mostly in computer vision. The superiority of CNNs mainly relies on their complex architectures, which are often manually designed with extensive human expertise. Data mining on CNN architectures can discover useful patterns and fundamental sub-components from existing CNN architectures, providing researchers who have no expertise in CNNs with strong prior knowledge for designing CNN architectures. Various state-of-the-art data mining algorithms are at hand, yet little work has applied them to this purpose. The main reason is the barrier between CNN architectures and data mining algorithms. Specifically, current CNN architecture descriptions cannot be exactly vectorized as input to data mining algorithms. In this paper, we propose a unified approach, named ArcText, to describing CNN architectures based on text. In particular, three different units of ArcText and an ordering method have been carefully designed to uniquely describe an architecture with sufficient information. The resulting description can also be exactly converted back to the corresponding CNN architecture. ArcText bridges the gap between CNN and data mining researchers, and has the potential to be applied to wider scenarios.

[272]  arXiv:2002.10234 [pdf, other]
Title: FR-Train: A mutual information-based approach to fair and robust training
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Trustworthy AI is a critical issue in machine learning where, in addition to training a model that is accurate, one must consider both fair and robust training in the presence of data bias and poisoning. However, the existing model fairness techniques mistakenly view poisoned data as an additional bias, resulting in severe performance degradation. To fix this problem, we propose FR-Train, which holistically performs fair and robust model training. We provide a mutual information-based interpretation of an existing adversarial training-based fairness-only method, and apply this idea to architect an additional discriminator that can identify poisoned data using a clean validation set and reduce its influence. In our experiments, FR-Train shows almost no decrease in fairness and accuracy in the presence of data poisoning by both mitigating the bias and defending against poisoning. We also demonstrate how to construct clean validation sets using crowdsourcing, and release new benchmark datasets.

[273]  arXiv:2002.10235 [pdf, other]
Title: Recurrent Dirichlet Belief Networks for Interpretable Dynamic Relational Data Modelling
Comments: 7 pages, 3 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The Dirichlet Belief Network (DirBN) has been recently proposed as a promising approach in learning interpretable deep latent representations for objects. In this work, we leverage its interpretable modelling architecture and propose a deep dynamic probabilistic framework -- the Recurrent Dirichlet Belief Network (Recurrent-DBN) -- to study interpretable hidden structures from dynamic relational data. The proposed Recurrent-DBN has the following merits: (1) it infers interpretable and organised hierarchical latent structures for objects within and across time steps; (2) it enables recurrent long-term temporal dependence modelling, which outperforms the one-order Markov descriptions in most of the dynamic probabilistic frameworks. In addition, we develop a new inference strategy, which first upward-and-backward propagates latent counts and then downward-and-forward samples variables, to enable efficient Gibbs sampling for the Recurrent-DBN. We apply the Recurrent-DBN to dynamic relational data problems. The extensive experiment results on real-world data validate the advantages of the Recurrent-DBN over the state-of-the-art models in interpretable latent structure discovery and improved link prediction performance.

[274]  arXiv:2002.10241 [pdf, other]
Title: Multi-objective Consensus Clustering Framework for Flight Search Recommendation
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)

In the travel industry, online customers book their travel itinerary according to several features, such as the cost and duration of the travel or the quality of amenities. To provide personalized recommendations for travel searches, an appropriate segmentation of customers is required. Clustering ensemble approaches were developed to overcome a well-known problem of classical clustering approaches: each relies on a different theoretical model and can thus identify in the data space only clusters corresponding to this model. Clustering ensemble approaches combine multiple clustering results, each from a different algorithmic configuration, to generate more robust consensus clusters corresponding to agreements between initial clusters. We present a new clustering ensemble multi-objective optimization-based framework developed for analyzing Amadeus customer search data and improving personalized recommendations. This framework optimizes diversity in the clustering ensemble search space and automatically determines an appropriate number of clusters without requiring user input. Experimental results compare the efficiency of this approach with other existing approaches on Amadeus customer search data in terms of internal (Adjusted Rand Index) and external (Amadeus business metric) validations.

[275]  arXiv:2002.10242 [pdf, other]
Title: Age of Information Optimized MAC in V2X Sidelink via Piggyback-Based Collaboration
Comments: Submitted to IEEE TWC for possible publication
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

Real-time status update in future vehicular networks is vital to enable control-level cooperative autonomous driving. Cellular Vehicle-to-Everything (C-V2X), as one of the most promising vehicular wireless technologies, adopts a Semi-Persistent Scheduling (SPS) based Medium-Access-Control (MAC) layer protocol for its sidelink communications. Despite the recent and ongoing efforts to optimize SPS, very little work has considered the status update performance of SPS. In this paper, Age of Information (AoI) is first leveraged to evaluate the MAC layer performance of C-V2X sidelink. Critical issues of SPS, i.e., persistent packet collisions and Half-Duplex (HD) effects, are identified as hindering its AoI performance. Therefore, a piggyback-based collaboration method is proposed, whereby vehicles collaborate to inform each other of potential collisions and collectively afford HD errors, while entailing only a small signaling overhead. Closed-form AoI performance is derived for the proposed scheme, optimal configurations for key parameters are hence calculated, and the convergence property is proved for decentralized implementation. Simulation results show that, compared with the standardized SPS and its state-of-the-art enhancement schemes, the proposed scheme performs significantly better, not only in terms of AoI, but also in terms of conventional metrics such as transmission reliability.

[276]  arXiv:2002.10244 [pdf, other]
Title: Fractional-Order Models for the Static and Dynamic Analysis of Nonlocal Plates
Comments: 26 pages, 3 figures, 13 Tables. arXiv admin note: text overlap with arXiv:2001.06885, arXiv:2002.07148
Subjects: Computational Engineering, Finance, and Science (cs.CE); Analysis of PDEs (math.AP); Dynamical Systems (math.DS); Numerical Analysis (math.NA)

This study presents the analytical formulation and the finite element solution of fractional order nonlocal plates under both Mindlin and Kirchhoff formulations. By employing consistent definitions for fractional-order kinematic relations, the governing equations and the associated boundary conditions are derived based on variational principles. Remarkably, the fractional-order nonlocal model gives rise to a self-adjoint and positive-definite system that accepts a unique solution. Further, owing to the difficulty in obtaining analytical solutions to this fractional-order differ-integral problem, a 2D finite element model for the fractional-order governing equations is presented. Following a thorough validation with benchmark problems, the 2D fractional finite element model is used to study the static as well as the free dynamic response of fractional-order plates subject to various loading and boundary conditions. It is established that the fractional-order nonlocality leads to a reduction in the stiffness of the plate structure thereby increasing the displacements and reducing the natural frequency of vibration of the plates. Further, it is seen that the effect of nonlocality is stronger on the higher modes of vibration when compared to the fundamental mode. These effects of the fractional-order nonlocality are noted irrespective of the nature of the boundary conditions. More specifically, the fractional-order model of nonlocal plates is free from boundary effects that lead to paradoxical predictions such as hardening and absence of nonlocal effects in classical integral approaches to nonlocal elasticity. This consistency in the predictions is a result of the well-posed nature of the fractional-order governing equations that accept a unique solution.

[277]  arXiv:2002.10245 [pdf, other]
Title: Specializing Coherence, Consistency, and Push/Pull for GPU Graph Analytics
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

This work provides the first study to explore the interaction of update propagation with and without fine-grained synchronization (push vs. pull), emerging coherence protocols (GPU vs. DeNovo coherence), and software-centric consistency models (DRF0, DRF1, and DRFrlx) for graph workloads on emerging integrated GPU-CPU systems with native unified shared memory. We study 6 graph applications with 6 graph inputs for a total of 36 workloads running on 12 system (hardware+software) configurations reflecting the above design space of update propagation, coherence, and memory consistency. We make three key contributions. First, we show that there is no single best system configuration for all workloads, motivating systems with flexible coherence and consistency support. Second, we develop a model to accurately predict the best system configuration -- this model can be used by software designers to decide on push vs. pull and the consistency model and by flexible hardware to invoke the appropriate coherence and consistency configuration for the given workload. Third, we show that the design dimensions explored here are inter-dependent, reinforcing the need for software-hardware co-design in the above design dimensions. For example, software designers deciding on push vs. pull must consider the consistency model supported by hardware -- in some cases, push may be better if hardware supports DRFrlx while pull may be better if hardware does not support DRFrlx.

[278]  arXiv:2002.10246 [pdf, other]
Title: A subtractive manufacturing constraint for level set topology optimization
Journal-ref: Structural and Multidisciplinary Optimization (2020)
Subjects: Computational Engineering, Finance, and Science (cs.CE); Optimization and Control (math.OC)

We present a method for enforcing manufacturability constraints in generated parts such that they will be automatically ready for fabrication using a subtractive approach. We primarily target multi-axis CNC milling approaches but the method should generalize to other subtractive methods as well. To this end, we take as user input: the radius of curvature of the tool bit, a coarse model of the tool head and optionally a set of milling directions. This allows us to enforce the following manufacturability conditions: 1) surface smoothness such that the radius of curvature of the part does not exceed the milling bit radius, 2) orientation such that every part of the surface to be milled is visible from at least one milling direction, 3) accessibility such that every surface patch can be reached by the tool bit without interference with the tool or head mount. We will show how to efficiently enforce the constraint during level set-based topology optimization modifying the advection velocity such that at each iteration the topology optimization maintains a descent optimization direction and does not violate any of the manufacturability conditions. This approach models the actual subtractive process by carving away material accessible to the machine at each iteration until a local optimum is achieved.

[279]  arXiv:2002.10248 [pdf, other]
Title: Bayes-Probe: Distribution-Guided Sampling for Prediction Level Sets
Comments: Significantly expanded version of arXiv:2001.03076, with new problem formulation and experiments
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Building machine learning models requires a suite of tools for interpretation, understanding, and debugging. Many existing methods have been proposed, but it can still be difficult to probe for examples which communicate model behaviour. We introduce Bayes-Probe, a model inspection method for analyzing neural networks by generating distribution-conforming examples of known prediction confidence. By selecting appropriate distributions and confidence prediction values, Bayes-Probe can be used to synthesize ambivalent predictions, uncover in-distribution adversarial examples, and understand novel-class extrapolation and domain adaptation behaviours. Bayes-Probe is model agnostic, requiring only a data generator and classifier prediction. We use Bayes-Probe to analyze models trained on both procedurally-generated data (CLEVR) and organic data (MNIST and Fashion-MNIST). Code is available at https://github.com/serenabooth/Bayes-Probe.

[280]  arXiv:2002.10251 [pdf, ps, other]
Title: Identifying stochastic governing equations from data of the most probable transition trajectories
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Methodology (stat.ME)

Extracting the governing stochastic differential equation model from elusive data is crucial to understand and forecast dynamics for various systems. We devise a method to extract the drift term and estimate the diffusion coefficient of a governing stochastic dynamical system, from its time-series data for the most probable transition trajectory. By the Onsager-Machlup theory, the most probable transition trajectory satisfies the corresponding Euler-Lagrange equation, which is a second order deterministic ordinary differential equation involving the drift term and diffusion coefficient. We first estimate the coefficients of the Euler-Lagrange equation based on the data of the most probable trajectory, and then we calculate the drift and diffusion coefficient of the governing stochastic dynamical system. These two steps involve sparse regression and optimization. We finally illustrate our method with an example.

[281]  arXiv:2002.10252 [pdf, other]
Title: TensorShield: Tensor-based Defense Against Adversarial Attacks on Images
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Recent studies have demonstrated that machine learning approaches like deep neural networks (DNNs) are easily fooled by adversarial attacks. Subtle and imperceptible perturbations of the data are able to change the result of deep neural networks. Leveraging vulnerable machine learning methods raises many concerns, especially in domains where security is an important factor. Therefore, it is crucial to design defense mechanisms against adversarial attacks. For the task of image classification, unnoticeable perturbations mostly occur in the high-frequency spectrum of the image. In this paper, we utilize tensor decomposition techniques as a preprocessing step to find a low-rank approximation of images which can significantly discard high-frequency perturbations. Recently, a defense framework called Shield could "vaccinate" Convolutional Neural Networks (CNN) against adversarial examples by performing random-quality JPEG compressions on local patches of images on the ImageNet dataset. Our tensor-based defense mechanism outperforms the SLQ method from Shield by 14% against Fast Gradient Sign Method (FGSM) adversarial attacks, while maintaining comparable speed.

[282]  arXiv:2002.10253 [pdf, other]
Title: Physics-Informed Multi-LSTM Networks for Metamodeling of Nonlinear Structures
Comments: 21 pages, 13 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE); Signal Processing (eess.SP)

This paper introduces an innovative physics-informed deep learning framework for metamodeling of nonlinear structural systems with scarce data. The basic concept is to incorporate physics knowledge (e.g., laws of physics, scientific principles) into deep long short-term memory (LSTM) networks, which boosts the learning within a feasible solution space. The physics constraints are embedded in the loss function to enforce the model training which can accurately capture latent system nonlinearity even with very limited available training datasets. Specifically for dynamic structures, physical laws of equation of motion, state dependency and hysteretic constitutive relationship are considered to construct the physics loss. In particular, two physics-informed multi-LSTM network architectures are proposed for structural metamodeling. The satisfactory performance of the proposed framework is successfully demonstrated through two illustrative examples (e.g., nonlinear structures subjected to ground motion excitation). It turns out that the embedded physics can alleviate overfitting issues, reduce the need of big training datasets, and improve the robustness of the trained model for more reliable prediction. As a result, the physics-informed deep learning paradigm outperforms classical non-physics-guided data-driven neural networks.

[283]  arXiv:2002.10254 [pdf, other]
Title: Empirical Study on Airline Delay Analysis and Prediction
Comments: Figure 13
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Big Data analytics is the logical analysis of very large scale datasets. Such analysis benefits an organization and improves its decision-making process. In this article, we present Airline Delay Analysis and Prediction to analyze airline datasets in combination with a weather dataset. In this research work, we consider various attributes to analyze flight delay, for example, day-wise, airline-wise, cloud cover, temperature, etc. Moreover, we present rigorous experiments on various machine learning models to correctly predict the delay of a flight, namely, logistic regression with L2 regularization, Gaussian Naive Bayes, K-Nearest Neighbors, Decision Tree classifier and Random Forest model. The accuracy of the Random Forest model is 82% with a flight delay threshold of 15 minutes. The analysis is carried out using data from 1987 to 2008; training is conducted with data from 2000 to 2007, and the prediction results are validated using 2008 data. Moreover, we obtain a recall of 99% with the Random Forest model.

[284]  arXiv:2002.10255 [pdf, other]
Title: Ambiguous phase assignment of discretized 3D geometries in topology optimization
Subjects: Computational Engineering, Finance, and Science (cs.CE); Optimization and Control (math.OC)

Level set-based immersed boundary techniques operate on nonconforming meshes while providing a crisp definition of interface and external boundaries. In such techniques, an isocontour of a level set field interpolated from nodal level set values defines a problem's geometry. If the interface is explicitly tracked, the intersected elements are typically divided into sub-elements to which a phase needs to be assigned. Due to loss of information in the discretization of the level set field, certain geometrical configurations allow for ambiguous phase assignment of sub-elements, and thus ambiguous definition of the interface. The study presented here focuses on analyzing these topological ambiguities in embedded geometries constructed from discretized level set fields on hexahedral meshes. The analysis is performed on three-dimensional problems where several intersection configurations can significantly affect the problem's topology. This is in contrast to two-dimensional problems where ambiguous topological features exist only in one intersection configuration and identifying and resolving them is straightforward. A set of rules that resolve these ambiguities for two-phase problems is proposed, and algorithms for their implementations are provided. The influence of these rules on the evolution of the geometry in the optimization process is investigated with linear elastic topology optimization problems. These problems are solved by an explicit level set topology optimization framework that uses the extended finite element method to predict physical responses. This study shows that the choice of a rule to resolve topological features can result in drastically different final geometries. However, for the problems studied in this paper, the performances of the optimized design do not differ.

[285]  arXiv:2002.10258 [pdf, ps, other]
Title: Computation Rate Maximization in Wireless Powered MEC with Spread Spectrum Multiple Access
Comments: The paper has been accepted for publication by Proc. IEEE ITOEC 2020
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

The integration of mobile edge computing (MEC) and wireless power transfer (WPT) technologies has recently emerged as an effective solution for extending battery life and increasing the computing power of wireless devices. In this paper, we study the resource allocation problem of a multi-user wireless powered MEC system, where the users share the wireless channel via direct sequence code division multiple access (DS-CDMA). In particular, we are interested in jointly optimizing the task offloading decisions and resource allocation, to maximize the weighted sum computation rate of all the users in the network. The optimization problem is formulated as a mixed integer non-linear programming (MINLP) problem. For a given offloading user set, we implement an efficient Fractional Programming (FP) approach to mitigate the multi-user interference in the uplink task offloading. On top of that, we propose a Stochastic Local Search algorithm to optimize the offloading decisions. Simulation results show that the proposed method can effectively enhance the computing performance of a wireless powered MEC with spread spectrum multiple access compared to other representative benchmark methods.

[286]  arXiv:2002.10259 [pdf, other]
Title: Markov Logic Networks with Complex Weights: Expressivity, Liftability and Fourier Transforms
Authors: Ondrej Kuzelka
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)

We study expressivity of Markov logic networks (MLNs). We introduce complex MLNs, which use complex-valued weights, and we show that, unlike standard MLNs with real-valued weights, complex MLNs are fully expressive. We then observe that the discrete Fourier transform can be computed using weighted first-order model counting (WFOMC) with complex weights and use this observation to design an algorithm for computing relational marginal polytopes which needs substantially fewer calls to a WFOMC oracle than a recent algorithm.

[287]  arXiv:2002.10260 [pdf, other]
Title: Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Subjects: Computation and Language (cs.CL)

Transformer-based models have brought a radical change to neural machine translation. A key feature of the Transformer architecture is the so-called multi-head attention mechanism, which allows the model to focus simultaneously on different parts of the input. However, recent works have shown that attention heads learn simple positional patterns which are often redundant. In this paper, we propose to replace all but one attention head of each encoder layer with fixed -- non-learnable -- attentive patterns that are solely based on position and do not require any external knowledge. Our experiments show that fixing the attention heads on the encoder side of the Transformer at training time does not impact the translation quality and even increases BLEU scores by up to 3 points in low-resource scenarios.
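
As a rough illustration of what a fixed, position-only attention head can look like, the sketch below builds a one-hot attention matrix in which each position attends to a neighbour at a constant offset. The abstract does not specify the patterns used in the paper, so the one-hot offset pattern and the names `fixed_pattern` and `apply_attention` are assumptions made purely for illustration.

```python
def fixed_pattern(n, offset):
    """Build an n x n attention matrix with no learnable parameters.

    Row i puts all of its weight on position i + offset, clamped to the
    sequence boundaries. (Hypothetical pattern, chosen for illustration.)
    """
    rows = []
    for i in range(n):
        j = min(max(i + offset, 0), n - 1)
        rows.append([1.0 if k == j else 0.0 for k in range(n)])
    return rows


def apply_attention(attn, values):
    """Compute the attention-weighted sum of value vectors for each position."""
    dim = len(values[0])
    return [
        [sum(a * v[d] for a, v in zip(row, values)) for d in range(dim)]
        for row in attn
    ]


# Each token attends to the previous token (the first token attends to itself).
previous_token = fixed_pattern(3, offset=-1)
outputs = apply_attention(previous_token, [[1.0], [2.0], [3.0]])
```

Because the matrix depends only on the sequence length and the offset, nothing in such a head is updated during training.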

[288]  arXiv:2002.10261 [pdf, other]
Title: Learning from Positive and Unlabeled Data with Arbitrary Positive Shift
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Positive-unlabeled (PU) learning trains a binary classifier using only positive and unlabeled data. A common simplifying assumption is that the positive data is representative of the target positive class. This assumption is often violated in practice due to time variation, domain shift, or adversarial concept drift. This paper shows that PU learning is possible even with arbitrarily non-representative positive data when provided unlabeled datasets from the source and target distributions. Our key insight is that only the negative class's distribution need be fixed. We propose two methods to learn under such arbitrary positive bias. The first couples negative-unlabeled (NU) learning with unlabeled-unlabeled (UU) learning while the other uses a novel recursive risk estimator robust to positive shift. Experimental results demonstrate our methods' effectiveness across numerous real-world datasets and forms of positive data bias, including disjoint positive class-conditional supports.

[289]  arXiv:2002.10266 [pdf, other]
Title: Rhythm, Chord and Melody Generation for Lead Sheets using Recurrent Neural Networks
Comments: 8 pages, 2 figures, 3 tables, 2 appendices
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)

Music that is generated by recurrent neural networks often lacks a sense of direction and coherence. We therefore propose a two-stage LSTM-based model for lead sheet generation, in which the harmonic and rhythmic templates of the song are produced first, after which, in a second stage, a sequence of melody notes is generated conditioned on these templates. A subjective listening test shows that our approach outperforms the baselines and increases perceived musical coherence.

[290]  arXiv:2002.10268 [pdf, other]
Title: Using Machine Learning to predict extreme events in the Hénon map
Comments: 9 pages, 12 figures
Journal-ref: Chaos: An Interdisciplinary Journal of Nonlinear Science 30.1 (2020): 013113
Subjects: Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD); Machine Learning (stat.ML)

Machine Learning (ML) inspired algorithms provide a flexible set of tools for analyzing and forecasting chaotic dynamical systems. We here analyze the performance of one algorithm for the prediction of extreme events in the two-dimensional Hénon map at the classical parameters. The task is to determine whether a trajectory will exceed a threshold after a set number of time steps into the future. This task has a geometric interpretation within the dynamics of the Hénon map, which we use to gauge the performance of the neural networks that are used in this work. We analyze the dependence of the success rate of the ML models on the prediction time $T$, the number of training samples $N_T$ and the size of the network $N_p$. We observe that in order to maintain a certain accuracy, $N_T \propto \exp(2 h T)$ and $N_p \propto \exp(hT)$, where $h$ is the topological entropy. Similar relations between the intrinsic chaotic properties of the dynamics and ML parameters might be observable in other systems as well.
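
The prediction task described above can be sketched in a few lines. The code iterates the Hénon map at the classical parameters $a = 1.4$, $b = 0.3$ and reports whether the trajectory is above a threshold after a given number of steps. Testing the $x$ coordinate, and the function names, are assumptions for illustration; the abstract does not say which quantity is thresholded.

```python
def henon_step(x, y, a=1.4, b=0.3):
    """One iteration of the Hénon map: (x, y) -> (1 - a*x^2 + y, b*x)."""
    return 1.0 - a * x * x + y, b * x


def exceeds_after(x0, y0, steps, threshold):
    """Label a starting point: is x above `threshold` after `steps` iterations?

    Thresholding the x coordinate is an assumption made for this sketch.
    """
    x, y = x0, y0
    for _ in range(steps):
        x, y = henon_step(x, y)
    return x > threshold


# Hypothetical use: label one initial condition for a prediction time of 2 steps.
label = exceeds_after(0.0, 0.0, steps=2, threshold=0.5)
```

Labels of this kind, computed for many initial conditions, would form the training set whose required size the abstract relates to the prediction time $T$.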

[291]  arXiv:2002.10269 [pdf, other]
Title: A Graph-Based Platform for Customer Behavior Analysis using Applications' Clickstream Data
Authors: Mojgan Mohajer
Comments: Technical Report
Subjects: Databases (cs.DB); Machine Learning (cs.LG)

Clickstream analysis has attracted growing attention with the increasing use of e-commerce and applications. Besides the analysis of customers' purchase behavior, there are also attempts to analyze customer behavior in relation to the quality of web or application design. In general, clickstream data can be considered as a sequence of log events collected at different levels of web/app usage. The analysis of clickstream data can be performed directly as sequence analysis or by extracting features from sequences. In this work, we show how representing and saving the sequences with their underlying graph structures can induce a platform for customer behavior analysis. Our main idea is that clickstream data containing sequences of actions of an application are walks of the corresponding finite state automaton (FSA) of that application. Our hypothesis is that the customers of an application normally do not use all possible walks through that FSA and the number of actual walks is much smaller than the total number of possible walks through the FSA. Sequences of such a walk normally consist of a finite number of cycles on FSA graphs. Identifying and matching these cycles in classical sequence analysis is not straightforward. We show that representing the sequences through their underlying graph structures not only groups the sequences automatically but also provides a compressed data representation of the original sequences.
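
A minimal sketch of the grouping idea: if each click sequence is reduced to the set of directed edges it traverses, sequences that differ only in how often they repeat a cycle collapse onto the same graph. The abstract does not describe the platform's actual graph representation, so using a plain edge set as the key, and the action names below, are assumptions.

```python
from collections import defaultdict


def underlying_graph(sequence):
    """Return the set of directed edges (a, b) traversed by a walk."""
    return frozenset(zip(sequence, sequence[1:]))


def group_by_graph(sequences):
    """Group sequences that share the same underlying edge set."""
    groups = defaultdict(list)
    for seq in sequences:
        groups[underlying_graph(seq)].append(seq)
    return dict(groups)


# Hypothetical sessions: the first two differ only in how often they
# repeat the search -> item -> search cycle.
sessions = [
    ["home", "search", "item", "search", "item", "cart"],
    ["home", "search", "item", "search", "item", "search", "item", "cart"],
    ["home", "cart"],
]
groups = group_by_graph(sessions)
```

Here the first two sessions fall into one group keyed by four edges, which is also a more compact description than either original sequence.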

[292]  arXiv:2002.10277 [pdf, other]
Title: PUGeo-Net: A Geometry-centric Network for 3D Point Cloud Upsampling
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper addresses the problem of generating uniform dense point clouds to describe the underlying geometric structures from given sparse point clouds. Due to the irregular and unordered nature of point clouds, densification as a generative task is challenging. To tackle the challenge, we propose a novel deep neural network based method, called PUGeo-Net, that learns a $3\times 3$ linear transformation matrix $\mathbf T$ for each input point. Matrix $\mathbf T$ approximates the augmented Jacobian matrix of a local parameterization and builds a one-to-one correspondence between the 2D parametric domain and the 3D tangent plane so that we can lift the adaptively distributed 2D samples (which are also learned from data) to 3D space. After that, we project the samples to the curved surface by computing a displacement along the normal of the tangent plane. PUGeo-Net is fundamentally different from the existing deep learning methods that are largely motivated by the image super-resolution techniques and generate new points in the abstract feature space. Thanks to its geometry-centric nature, PUGeo-Net works well for both CAD models with sharp features and scanned models with rich geometric details. Moreover, PUGeo-Net can compute the normal for the original and generated points, which is highly desired by the surface reconstruction algorithms. Computational results show that PUGeo-Net, the first neural network that can jointly generate vertex coordinates and normals, consistently outperforms the state-of-the-art in terms of accuracy and efficiency for upsampling factor $4\sim 16$.

[293]  arXiv:2002.10283 [pdf, other]
Title: The Knowledge Graph Track at OAEI -- Gold Standards, Baselines, and the Golden Hammer Bias
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

The Ontology Alignment Evaluation Initiative (OAEI) is an annual evaluation of ontology matching tools. In 2018, we started the Knowledge Graph track, whose goal is to evaluate the simultaneous matching of entities and schemas of large-scale knowledge graphs. In this paper, we discuss the design of the track and two different strategies of gold standard creation. We analyze results and experiences obtained in the first editions of the track, and, by revealing a hidden task, we show that all tools submitted to the track (and probably also to other tracks) suffer from a bias which we name the golden hammer bias.

[294]  arXiv:2002.10284 [pdf]
Title: Word Embeddings Inherently Recover the Conceptual Organization of the Human Mind
Authors: Victor Swift
Comments: 12 pages, 4 figures
Subjects: Computation and Language (cs.CL); Neurons and Cognition (q-bio.NC)

Machine learning is a means to uncover deep patterns from rich sources of data. Here, we find that machine learning can recover the conceptual organization of the human mind when applied to the natural language use of millions of people. Utilizing text from billions of webpages, we recover most of the concepts contained in English, Dutch, and Japanese, as represented in large scale Word Association networks. Our results justify machine learning as a means to probe the human mind, at a depth and scale that has been unattainable using self-report and observational methods. Beyond direct psychological applications, our methods may prove useful for projects concerned with defining, assessing, relating, or uncovering concepts in any scientific field.

[295]  arXiv:2002.10286 [pdf, other]
Title: Prediction with Corrupted Expert Advice
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We revisit the fundamental problem of prediction with expert advice, in a setting where the environment is benign and generates losses stochastically, but the feedback observed by the learner is subject to a moderate adversarial corruption. We prove that a variant of the classical Multiplicative Weights algorithm with decreasing step sizes achieves constant regret in this setting and performs optimally in a wide range of environments, regardless of the magnitude of the injected corruption. Our results reveal a surprising disparity between the often comparable Follow the Regularized Leader (FTRL) and Online Mirror Descent (OMD) frameworks: we show that for experts in the corrupted stochastic regime, the regret performance of OMD is in fact strictly inferior to that of FTRL.
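
For orientation, the sketch below shows a multiplicative-weights (Hedge) learner with a decreasing step size, written in its FTRL form: at every round the weights are recomputed from the cumulative losses using the current step size. The schedule $\eta_t = \eta_0 / \sqrt{t}$ and the function name are assumptions for illustration; the abstract does not state the step sizes analyzed in the paper.

```python
import math


def hedge_decreasing(loss_rounds, eta0=1.0):
    """Multiplicative weights with step size eta_t = eta0 / sqrt(t).

    FTRL form: p_t is proportional to exp(-eta_t * cumulative_loss), so the
    current step size is applied to the whole loss history. The schedule is
    an assumption made for this sketch.
    Returns the distribution played in each round.
    """
    n = len(loss_rounds[0])
    cumulative = [0.0] * n
    played = []
    for t, losses in enumerate(loss_rounds, start=1):
        eta = eta0 / math.sqrt(t)
        best = min(cumulative)  # subtract the minimum for numerical stability
        weights = [math.exp(-eta * (c - best)) for c in cumulative]
        total = sum(weights)
        played.append([w / total for w in weights])
        # The learner observes the (possibly corrupted) losses after playing.
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return played


# Hypothetical losses: expert 0 is consistently better than expert 1.
history = hedge_decreasing([[0.0, 1.0]] * 5)
```

An OMD-style update would instead multiply the previous weights by `exp(-eta * loss)` for the latest loss only; with a constant step size the two forms coincide, but with decreasing step sizes they differ, which is the setting where the abstract reports a gap between FTRL and OMD.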

[296]  arXiv:2002.10289 [pdf, other]
Title: EL PASSO: Privacy-preserving, Asynchronous Single Sign-On
Subjects: Cryptography and Security (cs.CR)

We introduce EL PASSO, a privacy-preserving, asynchronous Single Sign-On (SSO) system. It enables personal authentication while protecting users' privacy against both identity providers and relying parties, and allows selective attribute disclosure. EL PASSO is based on anonymous credentials, yet it supports users' accountability. Selected authorities may recover the identity of allegedly misbehaving users, and users can prove properties about their identity without revealing it in the clear. EL PASSO does not require specific secure hardware or a third party (other than existing participants in SSO). The generation and use of authentication credentials are asynchronous, allowing users to sign on when identity providers are temporarily unavailable. We evaluate EL PASSO in a distributed environment and prove its low computational cost, yielding faster sign-on operations than OIDC from a regular laptop, one-second user-perceived latency from a low-power device, and scaling to more than 50 sign-on operations per second at a relying party using a single 4-core server in the cloud.

[297]  arXiv:2002.10294 [pdf, other]
Title: Semantic, Efficient, and Secure Search over Encrypted Cloud Data
Authors: Fateh Boucenna
Comments: 180 pages, PhD Thesis, University of Sciences and Technology Houari Boumediene (USTHB) Algiers Algeria, searchable encryption, cloud computing, semantic search, homomorphic encryption, data privacy, weighting formula
Subjects: Cryptography and Security (cs.CR); Information Retrieval (cs.IR)

Companies and individuals demand more and more storage space and computing power. For this purpose, several new technologies have been designed and implemented, such as cloud computing. This technology provides its users with storage space and computing power according to their needs in a flexible and personalized way. However, outsourced data such as emails, electronic health records, and company reports are sensitive and confidential. Therefore, it is essential to protect the outsourced data against possible external attacks and against the cloud server itself. That is why it is highly recommended to encrypt sensitive data before outsourcing it to a remote server. To perform searches over outsourced data, it is no longer possible to exploit traditional search engines, given that these data are encrypted. Consequently, many searchable encryption (SE) schemes have been proposed in the literature. Three major research axes of the searchable encryption area have been studied in the literature. The first axis consists in ensuring the security of the search approach. Indeed, the search process should be performed without decrypting any data and without causing any leakage of sensitive information. The second axis consists in studying the search performance. In fact, encrypted indexes are less efficient than plaintext indexes, which makes searchable encryption schemes very slow in practice. The more secure the approach, the less efficient it is; thus, the challenge consists in finding the best compromise between security and performance. Finally, the third research axis concerns the quality of the returned results in terms of relevance and recall. The problem is that encrypting the index degrades recall and precision. Therefore, the goal is to propose a technique that is able to obtain almost the same results as those obtained in traditional search.

[298]  arXiv:2002.10295 [pdf, other]
Title: SupRB: A Supervised Rule-based Learning System for Continuous Problems
Comments: Submitted to the Genetic and Evolutionary Computation Conference 2020 (GECCO 2020)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

We propose the SupRB learning system, a new Pittsburgh-style learning classifier system (LCS) for supervised learning on multi-dimensional continuous decision problems. SupRB learns an approximation of a quality function from examples (consisting of situations, choices and associated qualities) and is then able to make an optimal choice as well as predict the quality of a choice in a given situation. One area of application for SupRB is parametrization of industrial machinery. In this field, acceptance of the recommendations of machine learning systems is highly reliant on operators' trust. While an essential and much-researched ingredient for that trust is prediction quality, it seems that this alone is not enough. At least as important is a human-understandable explanation of the reasoning behind a recommendation. While many state-of-the-art methods such as artificial neural networks fall short of this, LCSs such as SupRB provide human-readable rules that can be understood very easily. The prevalent LCSs are not directly applicable to this problem as they lack support for continuous choices. This paper lays the foundations for SupRB and shows its general applicability on a simplified model of an additive manufacturing problem.

[299]  arXiv:2002.10301 [pdf, other]
Title: Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning
Comments: 30 pages, 2 figures
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)

It has been a trend in the Reinforcement Learning literature to derive sample complexity bounds: a bound on how many experiences with the environment are required to obtain an $\varepsilon$-optimal policy. In the discounted cost, infinite horizon setting, all of the known bounds have a factor that is a polynomial in $1/(1-\beta)$, where $\beta < 1$ is the discount factor. For a large discount factor, these bounds seem to imply that a very large number of samples is required to achieve an $\varepsilon$-optimal policy. The objective of the present work is to introduce a new class of algorithms that have sample complexity uniformly bounded for all $\beta < 1$. One may argue that this is impossible, due to a recent min-max lower bound. The explanation is that this previous lower bound is for a specific problem, which we modify, without compromising the ultimate objective of obtaining an $\varepsilon$-optimal policy.
Specifically, we show that the asymptotic variance of the Q-learning algorithm, with an optimized step-size sequence, is a quadratic function of $1/(1-\beta)$; an expected, and essentially known result. The new relative Q-learning algorithm proposed here is shown to have asymptotic variance that is a quadratic in $1/(1- \rho \beta)$, where $1 - \rho > 0$ is the spectral gap of an optimal transition matrix.

[300]  arXiv:2002.10303 [pdf, ps, other]
Title: Wheeler Languages
Subjects: Formal Languages and Automata Theory (cs.FL)

The recently introduced class of Wheeler graphs, inspired by the Burrows-Wheeler Transform (BWT) of a given string, admits an efficient index data structure for searching for subpaths with a given path label, and lifts the applicability of the Burrows-Wheeler transform from strings to languages. In this paper we study the regular languages accepted by automata having a Wheeler graph as transition function, and prove results on determination, Myhill-Nerode characterization, decidability, and closure properties for this class of languages.

[301]  arXiv:2002.10304 [pdf, ps, other]
Title: Fast in-place algorithms for polynomial operations: division, evaluation, interpolation
Subjects: Symbolic Computation (cs.SC); Computational Complexity (cs.CC)

We consider space-saving versions of several important operations on univariate polynomials, namely power series inversion and division, division with remainder, multi-point evaluation, and interpolation. Now-classical results show that such problems can be solved in (nearly) the same asymptotic time as fast polynomial multiplication. However, these reductions, even when applied to an in-place variant of fast polynomial multiplication, yield algorithms which require at least a linear amount of extra space for intermediate results. We demonstrate new in-place algorithms for the aforementioned polynomial computations which require only constant extra space and achieve the same asymptotic running time as their out-of-place counterparts. We also provide a precise complexity analysis so that all constants are made explicit, parameterized by the space usage of the underlying multiplication algorithms.

[302]  arXiv:2002.10306 [pdf, other]
Title: Adaptive Propagation Graph Convolutional Network
Comments: Preprint submitted to IEEE Transaction on Neural Networks and Learning Systems
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Graph convolutional networks (GCNs) are a family of neural network models that perform inference on graph data by interleaving vertex-wise operations and message-passing exchanges across nodes. Concerning the latter, two key questions arise: (i) how to design a differentiable exchange protocol (e.g., a 1-hop Laplacian smoothing in the original GCN), and (ii) how to characterize the trade-off in complexity with respect to the local updates. In this paper, we show that state-of-the-art results can be achieved by adapting the number of communication steps independently at every node. In particular, we endow each node with a halting unit (inspired by Graves' adaptive computation time) that after every exchange decides whether to continue communicating or not. We show that the proposed adaptive propagation GCN (AP-GCN) achieves superior or similar results to the best proposed models so far on a number of benchmarks, while requiring a small overhead in terms of additional parameters. We also investigate a regularization term to enforce an explicit trade-off between communication and accuracy. The code for the AP-GCN experiments is released as an open-source library.

[303]  arXiv:2002.10309 [pdf, other]
Title: Uncertainty based Class Activation Maps for Visual Question Answering
Comments: This work is an extension of our ICCV-2019 work. arXiv admin note: text overlap with arXiv:1908.06306
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)

Understanding and explaining deep learning models is an imperative task. Towards this, we propose a method that obtains gradient-based certainty estimates that also provide visual attention maps. In particular, we address the visual question answering task. We incorporate modern probabilistic deep learning methods that we further improve by using the gradients for these estimates. This has a two-fold benefit: a) improved certainty estimates that correlate better with misclassified samples and b) improved attention maps that provide state-of-the-art results in terms of correlation with human attention regions. The improved attention maps result in consistent improvement for various methods for visual question answering. Therefore, the proposed technique can be thought of as a recipe for obtaining improved certainty estimates and explanations for deep learning models. We provide a detailed empirical analysis for the visual question answering task on all standard benchmarks and a comparison with state-of-the-art methods.

[304]  arXiv:2002.10310 [pdf, other]
Title: Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval
Comments: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user's query sketch. Its widespread applicability is however hindered by the fact that drawing a sketch takes time, and most people struggle to draw a complete and faithful sketch. In this paper, we reformulate the conventional FG-SBIR framework to tackle these challenges, with the ultimate goal of retrieving the target photo with the least number of strokes possible. We further propose an on-the-fly design that starts retrieving as soon as the user starts drawing. To accomplish this, we devise a reinforcement learning-based cross-modal retrieval framework that directly optimizes rank of the ground-truth photo over a complete sketch drawing episode. Additionally, we introduce a novel reward scheme that circumvents the problems related to irrelevant sketch strokes, and thus provides us with a more consistent rank list during the retrieval. We achieve superior early-retrieval efficiency over state-of-the-art methods and alternative baselines on two publicly available fine-grained sketch retrieval datasets.

[305]  arXiv:2002.10312 [pdf, ps, other]
Title: Learning Certified Individually Fair Representations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

To effectively enforce fairness constraints one needs to define an appropriate notion of fairness and employ representation learning in order to impose this notion without compromising downstream utility for the data consumer. A desirable notion is individual fairness as it guarantees similar treatment for similar individuals. In this work, we introduce the first method which generalizes individual fairness to rich similarity notions via logical constraints while also enabling data consumers to obtain fairness certificates for their models. The key idea is to learn a representation that provably maps similar individuals to latent representations at most $\epsilon$ apart in $\ell_{\infty}$-distance, enabling data consumers to certify individual fairness by proving $\epsilon$-robustness of their classifier. Our experimental evaluation on six real-world datasets and a wide range of fairness constraints demonstrates that our approach is expressive enough to capture similarity notions beyond existing distance metrics while scaling to realistic use cases.
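
The certification step on the data consumer's side can be illustrated for the simplest case. If all individuals similar to a given one are mapped to within $\epsilon$ of its latent point $z$ in $\ell_{\infty}$-distance, it suffices to show that the classifier's decision is constant on that $\ell_{\infty}$ ball. For a linear classifier $\mathrm{sign}(w \cdot z + b)$ the worst-case change of the score over the ball is $\epsilon \lVert w \rVert_1$, so the check reduces to one inequality. Restricting to a linear classifier, and the function name, are assumptions for this sketch; the abstract does not say which classifiers the paper certifies.

```python
def is_eps_robust(w, b, z, eps):
    """Check that sign(w.z + b) is constant on the l-infinity ball of radius eps.

    Over that ball the score changes by at most eps * ||w||_1, so the sign
    cannot flip when |w.z + b| > eps * ||w||_1.
    (Linear classifier assumed for illustration.)
    """
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    l1_norm = sum(abs(wi) for wi in w)
    return abs(score) > eps * l1_norm


# Hypothetical latent point and classifier: score = 1.0, ||w||_1 = 2.0.
certified = is_eps_robust(w=[1.0, -1.0], b=0.0, z=[1.0, 0.0], eps=0.4)
```

With these numbers the point is certified for $\epsilon = 0.4$ (since $1.0 > 0.8$) but not for $\epsilon = 0.6$.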

[306]  arXiv:2002.10313 [pdf, other]
Title: Imagining Data-Objects for Reflective Self-Tracking
Subjects: Human-Computer Interaction (cs.HC)

While self-tracking data is typically captured real-time in a lived experience, the data is often stored in a manner detached from the context where it belongs. Research has shown that there is a potential to enhance people's lived experiences with data-objects (artifacts representing contextually relevant data), for individual and collective reflections through a physical portrayal of data. This paper expands that research by studying how to design contextually relevant data-objects based on people's needs. We conducted a participatory research project with five households using object theater as a core method to encourage participants to speculate upon combinations of meaningful objects and personal data archives. In this paper, we detail three aspects that seem relevant for designing data-objects: social sharing, contextual ambiguity and interaction with the body. We show how an experience-centric view on data-objects can contribute with the contextual, social and bodily interplay between people, data and objects.

[307]  arXiv:2002.10316 [pdf, other]
Title: Fair Bandit Learning with Delayed Impact of Actions
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Algorithmic fairness has been studied mostly in a static setting where the implicit assumption is that the frequencies of historically made decisions do not impact the problem structure in the future. However, for example, the capability of people in a certain group to pay back a loan might depend on how frequently loan applications from that group have historically been approved. If banks keep rejecting loan applications from people in a disadvantaged group, it could create a feedback loop and further damage the chance of getting loans for people in that group. This challenge has been noted in several recent works but is under-explored in a more generic sequential learning setting. In this paper, we formulate this delayed and long-term impact of actions within the context of multi-armed bandits (MAB). We generalize the classical bandit setting to encode the dependency of this action "bias" due to the history of the learning. Our goal is to learn to maximize the collected utilities over time while satisfying fairness constraints imposed over arms' utilities, which again depend on the decision they have received. We propose an algorithm that achieves a regret of $\tilde{\mathcal{O}}(KT^{2/3})$ and show a matching regret lower bound of $\Omega(KT^{2/3})$, where $K$ is the number of arms and $T$ denotes the learning horizon. Our results complement the bandit literature by adding techniques to deal with actions with long-term impacts and have implications in designing fair algorithms.

[308]  arXiv:2002.10319 [pdf, other]
Title: Self-Adaptive Training: beyond Empirical Risk Minimization
Comments: 13 pages, 7 figures, 5 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We propose self-adaptive training---a new training algorithm that dynamically corrects problematic training labels by model predictions without incurring extra computational cost---to improve generalization of deep learning for potentially corrupted training data. This problem is crucial towards robustly learning from data that are corrupted by, e.g., label noise and out-of-distribution samples. The standard empirical risk minimization (ERM) for such data, however, may easily overfit the noise and thus suffer from sub-optimal performance. In this paper, we observe that model predictions can substantially benefit the training process: self-adaptive training significantly improves generalization over ERM under various levels of noise, and mitigates the overfitting issue in both natural and adversarial training. We evaluate the error-capacity curve of self-adaptive training: the test error is monotonically decreasing w.r.t. model capacity. This is in sharp contrast to the recently-discovered double-descent phenomenon in ERM which might be a result of overfitting to noise. Experiments on CIFAR and ImageNet datasets verify the effectiveness of our approach in two applications: classification with label noise and selective classification. We release our code at https://github.com/LayneH/self-adaptive-training.

[309]  arXiv:2002.10322 [pdf, other]
Title: Anatomy-aware 3D Human Pose Estimation in Videos
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this work, we propose a new solution for 3D human pose estimation in videos. Instead of directly regressing the 3D joint locations, we draw inspiration from the human skeleton anatomy and decompose the task into bone direction prediction and bone length prediction, from which the 3D joint locations can be completely derived. Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time. This prompts us to develop effective techniques to utilize global information across all the frames in a video for high-accuracy bone length prediction. Moreover, for the bone direction prediction network, we propose a fully-convolutional propagating architecture with long skip connections. Essentially, it predicts the directions of different bones hierarchically without using any time-consuming memory units (e.g. LSTM). A novel joint shift loss is further introduced to bridge the training of the bone length and bone direction prediction networks. Finally, we employ an implicit attention mechanism to feed the 2D keypoint visibility scores into the model as extra guidance, which significantly mitigates the depth ambiguity in many challenging poses. Our full model outperforms the previous best results on Human3.6M and MPI-INF-3DHP datasets, where comprehensive evaluation validates the effectiveness of our model.
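
The statement that joint locations can be completely derived from bone directions and bone lengths amounts to walking the skeleton tree from the root: each joint is its parent's position plus the bone length times the unit bone direction. The sketch below assumes a skeleton given as a parent-index list in which every parent precedes its children; this data layout and the function name are assumptions for illustration, not the paper's implementation.

```python
def joints_from_bones(parents, directions, lengths, root=(0.0, 0.0, 0.0)):
    """Recover 3D joint positions from per-bone unit directions and lengths.

    parents[j] is the index of joint j's parent, or -1 for the root.
    directions[j] and lengths[j] describe the bone from parents[j] to j
    and are ignored for the root. Parents must precede their children.
    (Hypothetical data layout, chosen for this sketch.)
    """
    joints = [None] * len(parents)
    for j, p in enumerate(parents):
        if p < 0:
            joints[j] = tuple(root)
        else:
            joints[j] = tuple(
                pc + lengths[j] * d for pc, d in zip(joints[p], directions[j])
            )
    return joints


# Hypothetical three-joint chain: root -> joint 1 (up) -> joint 2 (sideways).
chain = joints_from_bones(
    parents=[-1, 0, 1],
    directions=[None, (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
    lengths=[0.0, 2.0, 1.5],
)
```

Since the lengths are constant for a given person, only the directions need to vary from frame to frame in this parameterization.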

[310]  arXiv:2002.10327 [pdf, ps, other]
Title: Angle Aware User Cooperation for Secure Massive MIMO in Rician Fading Channel
Comments: 14 pages, 12 figures, accepted by IEEE Journal on Selected Areas in Communications
Subjects: Information Theory (cs.IT)

Massive multiple-input multiple-output communications can achieve high-level security by concentrating radio frequency signals towards the legitimate users. However, this system is vulnerable in a Rician fading environment if the eavesdropper positions itself such that its channel is highly "similar" to the channel of a legitimate user. To address this problem, this paper proposes an angle aware user cooperation (AAUC) scheme, which avoids direct transmission to the attacked user and relies on other users for cooperative relaying. The proposed scheme only requires the eavesdropper's angle information, and adopts an angular secrecy model to represent the average secrecy rate of the attacked system. With this angular model, the AAUC problem turns out to be nonconvex, and a successive convex optimization algorithm, which converges to a Karush-Kuhn-Tucker solution, is proposed. Furthermore, a closed-form solution and a Bregman first-order method are derived for the cases of large-scale antennas and large-scale users, respectively. Extension to the intelligent reflecting surfaces based scheme is also discussed. Simulation results demonstrate the effectiveness of the proposed successive convex optimization based AAUC scheme, and also validate the low-complexity nature of the proposed large-scale optimization algorithms.

[311]  arXiv:2002.10329 [pdf, other]
Title: KBSET -- Knowledge-Based Support for Scholarly Editing and Text Processing with Declarative LaTeX Markup and a Core Written in SWI-Prolog
Comments: To appear in DECLARE 2019 Revised Selected Papers
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

KBSET is an environment that provides support for scholarly editing in two flavors: First, as a practical tool KBSET/Letters that accompanies the development of editions of correspondences (in particular from the 18th and 19th century), completely from source documents to PDF and HTML presentations. Second, as a prototypical tool KBSET/NER for experimentally investigating novel forms of working on editions that are centered around automated named entity recognition. KBSET can process declarative application-specific markup that is expressed in LaTeX notation and incorporate large external fact bases that are typically provided in RDF. KBSET includes specially developed LaTeX styles and a core system that is written in SWI-Prolog, which is used there in many roles, utilizing that it realizes the potential of Prolog as a unifying language.

[312]  arXiv:2002.10330 [pdf, other]
Title: FSinR: an exhaustive package for feature selection
Comments: 17 pages, 6 figures, 2 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Feature Selection (FS) is a key task in Machine Learning. It consists in selecting a number of relevant variables for the model construction or data analysis. We present the R package, FSinR, which implements a variety of widely known filter and wrapper methods, as well as search algorithms. Thus, the package provides the possibility to perform the feature selection process, which consists in the combination of a guided search on the subsets of features with the filter or wrapper methods that return an evaluation measure of those subsets. In this article, we also present some examples on the usage of the package and a comparison with other packages available in R that contain methods for feature selection.

[313]  arXiv:2002.10333 [pdf]
Title: A New Approach for Improvement Security against DoS Attacks in Vehicular Ad-hoc Network
Comments: 7 pages, 12 figures, 2 tables, 4 equations, journal
Journal-ref: Int J Adv Comput Sci Appl, 7(7), 10-16 (2016)
Subjects: Cryptography and Security (cs.CR); Performance (cs.PF)

Vehicular Ad-Hoc Networks (VANETs) are a proper subset of mobile wireless networks, where nodes are revulsive; the vehicles are equipped with special electronic devices on the motherboard OBU (On Board Unit), which enable them to transmit and receive messages from other vehicles in the VANET. Furthermore the communication between the vehicles, the VANET interface is donated by the contact points with road infrastructure. VANET is a subgroup of MANETs. Unlike MANET nodes, VANET nodes move very fast. Establishing a permanent route for the dissemination of emergency messages and alerts from a danger zone is a very challenging task. Therefore, routing plays a significant role in VANETs. Decreasing network overhead, avoiding network congestion, increasing traffic congestion and packet delivery ratio are the most important issues associated with routing in VANETs. In addition, VANETs are subject to various security attacks. In base VANET systems, an algorithm is used to discover attacks at the time of confirmation, in which overhead delay occurs. This paper proposes the P-Secure approach, which is used for the detection of DoS attacks before the confirmation time. This reduces the overhead delays for processing and increases the security in VANETs. Simulation results show that the P-Secure approach is more efficient than the OBUmodelVaNET approach in terms of PDR, e2e_delay, throughput and drop packet rate.

[314]  arXiv:2002.10336 [pdf, other]
Title: Semi-Supervised Speech Recognition via Local Prior Matching
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability. In this work, we propose local prior matching (LPM), a semi-supervised objective that distills knowledge from a strong prior (e.g. a language model) to provide learning signal to a discriminative model trained on unlabeled speech. We demonstrate that LPM is theoretically well-motivated, simple to implement, and superior to existing knowledge distillation techniques under comparable settings. Starting from a baseline trained on 100 hours of labeled speech, with an additional 360 hours of unlabeled data, LPM recovers 54% and 73% of the word error rate on clean and noisy test sets relative to a fully supervised model on the same data.

[315]  arXiv:2002.10340 [pdf, other]
Title: Guessing State Tracking for Visual Dialogue
Comments: 9 pages, 5 figures, Nov. 2019, this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Guesser plays an important role in GuessWhat?!-like visual dialogues. It locates the target object in an image supposed by an oracle oneself over a question-answer based dialogue between a Questioner and the Oracle. Most existing guessers make one and only one guess after receiving all question-answer pairs in a dialogue with a predefined number of rounds. This paper proposes the guessing state for the guesser, and regards guessing as a process with changes of guessing state through a dialogue. A guessing state tracking based guess model is therefore proposed. The guessing state is defined as a distribution on candidate objects in the image. A state update algorithm including three modules is given. UoVR updates the representation of the image according to the current guessing state, QAEncoder encodes the question-answer pairs, and UoGS updates the guessing state by combining information from both the image and the dialogue history. With the guessing state in hand, two loss functions are defined as supervisions for model training. Early supervision brings supervision to the guesser at early rounds, and incremental supervision brings monotonicity to the guessing state. Experimental results on the GuessWhat?! dataset show that our model significantly outperforms previous models and achieves a new state of the art; in particular, its guessing success rate of 83.3% approaches the human-level performance of 84.4%.

[316]  arXiv:2002.10342 [pdf, other]
Title: Comparing View-Based and Map-Based Semantic Labelling in Real-Time SLAM
Comments: ICRA 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generally capable Spatial AI systems must build persistent scene representations where geometric models are combined with meaningful semantic labels. The many approaches to labelling scenes can be divided into two clear groups: view-based which estimate labels from the input view-wise data and then incrementally fuse them into the scene model as it is built; and map-based which label the generated scene model. However, there has so far been no attempt to quantitatively compare view-based and map-based labelling. Here, we present an experimental framework and comparison which uses real-time height map fusion as an accessible platform for a fair comparison, opening up the route to further systematic research in this area.

[317]  arXiv:2002.10344 [pdf, other]
Title: On the Forward and Backward Motion of Milli-Bristle-Bots
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

This work presents the theoretical analysis and experimental observations of bidirectional motion of a millimeter-scale bristle robot (milli-bristle-bot) with an on-board piezoelectric actuator. First, the theory of the motion, based on the dry-friction model, is developed and the frequency regions of the forward and backward motion, along with resonant frequencies of the system, are predicted. Second, milli-bristle-bots with two different bristle tilt angles are fabricated, and their bidirectional motions are experimentally investigated. The dependency of the robot speed on the actuation frequency is studied, which reveals two distinct frequency regions for the forward and backward motions that match our theoretical predictions well. Furthermore, the dependencies of the resonance frequency and robot speed on the bristle tilt angle are experimentally studied and tied to the theoretical model. This work marks the first demonstration of bidirectional motion at millimeter-scales, achieved for bristle-bots with a single on-board actuator.

[318]  arXiv:2002.10345 [pdf, other]
Title: Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
Comments: 7 pages, 6 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Fine-tuning pre-trained language models like BERT has become an effective approach in NLP and yields state-of-the-art results on many downstream tasks. Recent studies on adapting BERT to new tasks mainly focus on modifying the model structure, re-designing the pre-training tasks, and leveraging external data and knowledge. The fine-tuning strategy itself has yet to be fully explored. In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation. The experiments on text classification and natural language inference tasks show our proposed methods can significantly improve the adaptation of BERT without any external data or knowledge.

[319]  arXiv:2002.10347 [pdf, other]
Title: MilliCar -- An ns-3 Module for mmWave NR V2X Networks
Comments: 8 pages, 5 figures. Submitted to WNS3 2020. The code related to this paper can be found at this https URL
Subjects: Networking and Internet Architecture (cs.NI)

Vehicle-to-vehicle (V2V) communications have opened the way towards cooperative automated driving as a means to guarantee improved road safety and traffic efficiency. The use of the millimeter wave (mmWave) spectrum for V2V, in particular, holds great promise since the large bandwidth available offers the possibility of realizing high-data-rate connections. However, this potential is hindered by the significant path and penetration loss experienced at these frequencies. It then becomes fundamental to practically evaluate the feasibility of installing mmWave-based technologies in the vehicular scenario, in view of the strict latency and throughput requirements of future automotive applications. To do so, in this paper we present MilliCar, the first ns-3 module for V2V mmWave networks, which features a detailed implementation of the sidelink Physical (PHY) and Medium Access Control (MAC) layers based on the latest NR V2X specifications, the 3GPP standard for next-generation vehicular systems. Our module is open-source and enables researchers to compare possible design options and their relative performance through an end-to-end full-stack approach, thereby stimulating further research on this topic.

[320]  arXiv:2002.10348 [pdf, other]
Title: Low-Resource Knowledge-Grounded Dialogue Generation
Comments: Published in ICLR 2020
Subjects: Computation and Language (cs.CL)

Responding with knowledge has been recognized as an important capability for an intelligent conversational agent. Yet knowledge-grounded dialogues, as training data for learning such a response generation model, are difficult to obtain. Motivated by the challenge in practice, we consider knowledge-grounded dialogue generation under a natural assumption that only limited training examples are available. In such a low-resource setting, we devise a disentangled response decoder in order to isolate parameters that depend on knowledge-grounded dialogues from the entire generation model. By this means, the major part of the model can be learned from a large number of ungrounded dialogues and unstructured documents, while the remaining small parameters can be well fitted using the limited training examples. Evaluation results on two benchmarks indicate that with only 1/8 training data, our model can achieve the state-of-the-art performance and generalize well on out-of-domain knowledge.

[321]  arXiv:2002.10349 [pdf, other]
Title: A Model-Based Derivative-Free Approach to Black-Box Adversarial Examples: BOBYQA
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We demonstrate that model-based derivative-free optimisation algorithms can generate adversarial targeted misclassification of deep networks using fewer network queries than non-model-based methods. Specifically, we consider the black-box setting, and show that the number of network queries is less impacted by making the task more challenging either through reducing the allowed $\ell^{\infty}$ perturbation energy or training the network with defences against adversarial misclassification. We illustrate this by contrasting the BOBYQA algorithm with the state-of-the-art model-free adversarial targeted misclassification approaches based on genetic, combinatorial, and direct-search algorithms. We observe that for high $\ell^{\infty}$ energy perturbations on networks, the aforementioned simpler model-free methods require the fewest queries. In contrast, the proposed BOBYQA based method achieves state-of-the-art results when the perturbation energy decreases, or if the network is trained against adversarial perturbations.

[322]  arXiv:2002.10361 [pdf, other]
Title: Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition
Comments: Accepted at LREC 2020
Subjects: Computation and Language (cs.CL)

Existing research on fairness evaluation of document classification models mainly uses synthetic monolingual data without ground truth for author demographic attributes. In this work, we assemble and publish a multilingual Twitter corpus for the task of hate speech detection with four inferred author demographic factors: age, country, gender and race/ethnicity. The corpus covers five languages: English, Italian, Polish, Portuguese and Spanish. We evaluate the inferred demographic labels with a crowdsourcing platform, Figure Eight. To examine factors that can cause biases, we conduct an empirical analysis of demographic predictability on the English corpus. We measure the performance of four popular document classifiers and evaluate the fairness and bias of the baseline classifiers on the author-level demographic attributes.

[323]  arXiv:2002.10362 [pdf, other]
Title: Group Membership Verification with Privacy: Sparse or Dense?
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

Group membership verification checks if a biometric trait corresponds to one member of a group without revealing the identity of that member. Recent contributions provide privacy for group membership protocols through the joint use of two mechanisms: quantizing templates into discrete embeddings and aggregating several templates into one group representation. However, this scheme has one drawback: the data structure representing the group has a limited size and cannot recognize noisy queries when many templates are aggregated. Moreover, the sparsity of the embeddings seemingly plays a crucial role in the verification performance. This paper proposes a mathematical model for group membership verification that reveals the impact of sparsity on security, compactness, and verification performance. This model bridges the gap towards a Bloom filter robust to noisy queries. It shows that a dense solution is more competitive unless the queries are almost noiseless.

[324]  arXiv:2002.10363 [pdf, other]
Title: Joint Learning of Assignment and Representation for Biometric Group Membership
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)

This paper proposes a framework for group membership protocols preventing the curious but honest server from reconstructing the enrolled biometric signatures and inferring the identity of querying clients. This framework learns the embedding parameters, group representations and assignments simultaneously. Experiments show the trade-off between security/privacy and verification/identification performances.

[325]  arXiv:2002.10365 [pdf, other]
Title: The Early Phase of Neural Network Training
Comments: ICLR 2020 Camera Ready. Available on OpenReview at this https URL
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Recent studies have shown that many important aspects of neural network learning take place within the very earliest iterations or epochs of training. For example, sparse, trainable sub-networks emerge (Frankle et al., 2019), gradient descent moves into a small subspace (Gur-Ari et al., 2018), and the network undergoes a critical period (Achille et al., 2019). Here, we examine the changes that deep neural networks undergo during this early phase of training. We perform extensive measurements of the network state during these early iterations of training and leverage the framework of Frankle et al. (2019) to quantitatively probe the weight distribution and its reliance on various aspects of the dataset. We find that, within this framework, deep networks are not robust to reinitializing with random weights while maintaining signs, and that weight distributions are highly non-independent even after only a few hundred iterations. Despite this behavior, pre-training with blurred inputs or an auxiliary self-supervised task can approximate the changes in supervised networks, suggesting that these changes are not inherently label-dependent, though labels significantly accelerate this process. Together, these results help to elucidate the network changes occurring during this pivotal initial period of learning.

[326]  arXiv:2002.10371 [pdf, ps, other]
Title: A Hardware Architecture for Reconfigurable Intelligent Surfaces with Minimal Active Elements for Explicit Channel Estimation
Comments: 5 pages, 2 figures, invited/accepted to IEEE ICASSP 2020
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Intelligent surfaces comprising cost-effective, nearly passive, and reconfigurable unit elements are lately gaining increasing interest due to their potential in enabling fully programmable wireless environments. They are envisioned to offer environmental intelligence for diverse communication objectives, when coated on various objects of the deployment area of interest. To achieve this overarching goal, the channels where the Reconfigurable Intelligent Surfaces (RISs) are involved need to be in principle estimated. However, this is a challenging task with the currently available hardware RIS architectures requiring lengthy training periods among the network nodes utilizing RIS-assisted wireless communication. In this paper, we present a novel RIS architecture comprising any number of passive reflecting elements, a simple controller for their adjustable configuration, and a single Radio Frequency (RF) chain for baseband measurements. Capitalizing on this architecture and assuming sparse wireless channels in the beamspace domain, we present an alternating optimization approach for explicit estimation of the channel gains at the RIS elements attached to the single RF chain. Representative simulation results demonstrate the channel estimation accuracy and achievable end-to-end performance for various training lengths and numbers of reflecting unit elements.

[327]  arXiv:2002.10373 [pdf, other]
Title: Symbolic Learning and Reasoning with Noisy Data for Probabilistic Anchoring
Subjects: Artificial Intelligence (cs.AI)

Robotic agents should be able to learn from sub-symbolic sensor data, and at the same time, be able to reason about objects and communicate with humans on a symbolic level. This raises the question of how to overcome the gap between symbolic and sub-symbolic artificial intelligence. We propose a semantic world modeling approach based on bottom-up object anchoring using an object-centered representation of the world. Perceptual anchoring processes continuous perceptual sensor data and maintains a correspondence to a symbolic representation. We extend the definitions of anchoring to handle multi-modal probability distributions and we couple the resulting symbol anchoring system to a probabilistic logic reasoner for performing inference. Furthermore, we use statistical relational learning to enable the anchoring framework to learn symbolic knowledge in the form of a set of probabilistic logic rules of the world from noisy and sub-symbolic sensor input. The resulting framework, which combines perceptual anchoring and statistical relational learning, is able to maintain a semantic world model of all the objects that have been perceived over time, while still exploiting the expressiveness of logical rules to reason about the state of objects which are not directly observed through sensory input data. To validate our approach we demonstrate, on the one hand, the ability of our system to perform probabilistic reasoning over multi-modal probability distributions, and on the other hand, the learning of probabilistic logical rules from anchored objects produced by perceptual observations. The learned logical rules are, subsequently, used to assess our proposed probabilistic anchoring procedure. We demonstrate our system in a setting involving object interactions where object occlusions arise and where probabilistic inference is needed to correctly anchor objects.

[328]  arXiv:2002.10375 [pdf, other]
Title: Discriminative Adversarial Search for Abstractive Summarization
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

We introduce a novel approach for sequence decoding, Discriminative Adversarial Search (DAS), which has the desirable properties of alleviating the effects of exposure bias without requiring external metrics. Inspired by Generative Adversarial Networks (GANs), wherein a discriminator is used to improve the generator, our method differs from GANs in that the generator parameters are not updated at training time and the discriminator is only used to drive sequence generation at inference time.
We investigate the effectiveness of the proposed approach on the task of Abstractive Summarization: the results obtained show that a naive application of DAS improves over the state-of-the-art methods, with further gains obtained via discriminator retraining. Moreover, we show how DAS can be effective for cross-domain adaptation. Finally, all results reported are obtained without additional rule-based filtering strategies, commonly used by the best performing systems available: this indicates that DAS can effectively be deployed without relying on post-hoc modifications of the generated outputs.

[329]  arXiv:2002.10376 [pdf, other]
Title: The Two Regimes of Deep Network Training
Comments: 14 pages (5 of appendix), 14 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Learning rate schedule has a major impact on the performance of deep learning models. Still, the choice of a schedule is often heuristic. We aim to develop a precise understanding of the effects of different learning rate schedules and the appropriate way to select them. To this end, we isolate two distinct phases of training: the first, which we refer to as the "large-step" regime, exhibits a rather poor performance from an optimization point of view but is the primary contributor to model generalization; the latter, "small-step" regime exhibits much more "convex-like" optimization behavior but used in isolation produces models that generalize poorly. We find that by treating these regimes separately, and specializing our training algorithm to each one of them, we can significantly simplify learning rate schedules.
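
One simple way to picture a schedule that treats the two regimes separately is a piecewise-constant rule with a single switch point. The sketch below is only an illustration of that idea: the abstract does not specify the schedule, so the function name, the switch step, and both learning rate values are hypothetical.

```python
def two_regime_lr(step, switch_step, large_lr=0.1, small_lr=0.001):
    """Return the learning rate for a given training step.

    Steps before `switch_step` use the large rate ("large-step" regime);
    all later steps use the small rate ("small-step" regime).
    All numeric defaults are hypothetical placeholders.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return large_lr if step < switch_step else small_lr


# Build the per-step schedule for a short hypothetical run of 6 steps,
# switching regimes at step 4.
schedule = [two_regime_lr(s, switch_step=4) for s in range(6)]
```

Such a rule has only three knobs (two rates and a switch point), which is one sense in which a schedule built around two regimes can be simpler than a multi-stage decay.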

[330]  arXiv:2002.10378 [pdf, other]
Title: Supervised Deep Similarity Matching
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

We propose a novel biologically-plausible solution to the credit assignment problem, being motivated by observations in the ventral visual pathway and trained deep neural networks. In both, representations of objects in the same category become progressively more similar, while objects belonging to different categories become less similar. We use this observation to motivate a layer-specific learning goal in a deep network: each layer aims to learn a representational similarity matrix that interpolates between previous and later layers. We formulate this idea using a supervised deep similarity matching cost function and derive from it deep neural networks with feedforward, lateral and feedback connections, and neurons that exhibit biologically-plausible Hebbian and anti-Hebbian plasticity. Supervised deep similarity matching can be interpreted as an energy-based learning algorithm, but with significant differences from others in how a contrastive function is constructed.

[331]  arXiv:2002.10381 [pdf, other]
Title: Sketchformer: Transformer-based Representation for Sketched Structure
Comments: Accepted for publication at CVPR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Sketchformer is a novel transformer-based representation for encoding free-hand sketches input in a vector form, i.e. as a sequence of strokes. Sketchformer effectively addresses multiple tasks: sketch classification, sketch based image retrieval (SBIR), and the reconstruction and interpolation of sketches. We report several variants exploring continuous and tokenized input representations, and contrast their performance. Our learned embedding, driven by a dictionary learning tokenization scheme, yields state of the art performance in classification and image retrieval tasks, when compared against baseline representations driven by LSTM sequence to sequence architectures: SketchRNN and derivatives. We show that sketch reconstruction and interpolation are improved significantly by the Sketchformer embedding for complex sketches with longer stroke sequences.

[332]  arXiv:2002.10384 [pdf, ps, other]
Title: On the Sample Complexity of Adversarial Multi-Source PAC Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the problem of learning from multiple untrusted data sources, a scenario of increasing practical relevance given the recent emergence of crowdsourcing and collaborative learning paradigms. Specifically, we analyze the situation in which a learning system obtains datasets from multiple sources, some of which might be biased or even adversarially perturbed. It is known that in the single-source case, an adversary with the power to corrupt a fixed fraction of the training data can prevent PAC-learnability, that is, even in the limit of infinitely much training data, no learning system can approach the optimal test error. In this work we show that, surprisingly, the same is not true in the multi-source setting, where the adversary can arbitrarily corrupt a fixed fraction of the data sources. Our main results are a generalization bound that provides finite-sample guarantees for this learning setting, as well as corresponding lower bounds. Besides establishing PAC-learnability our results also show that in a cooperative learning setting sharing data with other parties has provable benefits, even if some participants are malicious.

[333]  arXiv:2002.10387 [pdf, other]
Title: Achievable Information Rates for Probabilistic Amplitude Shaping: A Minimum-Randomness Approach via Random Sign-Coding Arguments
Comments: 10 pages, 4 figures
Subjects: Information Theory (cs.IT)

Probabilistic amplitude shaping (PAS) is a coded modulation strategy in which constellation shaping and channel coding are combined. PAS has attracted considerable attention in both wireless and optical communications. Achievable information rates (AIRs) of PAS have been investigated in the literature using Gallager's error exponent approach. In particular, it has been shown that PAS achieves the capacity of a memoryless channel. In this work, we revisit the capacity-achieving property of PAS, and derive AIRs using weak typicality. We provide alternative proofs based on random sign-coding arguments. Our objective is to minimize the randomness in the random coding experiment. Accordingly, in our proofs, only some signs of the channel inputs are drawn from a random code, while the remaining signs and the amplitudes are produced constructively. We consider both symbol-metric and bit-metric decoding.

[334]  arXiv:2002.10389 [pdf, other]
Title: Semi-Supervised Neural Architecture Search
Comments: Code available at this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Neural architecture search (NAS) relies on a good controller to generate better architectures or predict the accuracy of given architectures. However, training the controller requires both abundant and high-quality pairs of architectures and their accuracy, while it is costly to evaluate an architecture and obtain its accuracy. In this paper, we propose SemiNAS, a semi-supervised NAS approach that leverages numerous unlabeled architectures (without evaluation and thus nearly no cost) to improve the controller. Specifically, SemiNAS 1) trains an initial controller with a small set of architecture-accuracy data pairs; 2) uses the trained controller to predict the accuracy of a large number of architectures (without evaluation); and 3) adds the generated data pairs to the original data to further improve the controller. SemiNAS has two advantages: 1) It reduces the computational cost under the same accuracy guarantee. 2) It achieves higher accuracy under the same computational cost. On the NASBench-101 benchmark dataset, it discovers a top 0.01% architecture after evaluating roughly 300 architectures, with only 1/7 computational cost compared with regularized evolution and gradient-based methods. On ImageNet, it achieves 24.2% top-1 error rate (under the mobile setting) using 4 GPU-days for search. We further apply it to LJSpeech text to speech task and it achieves 97% intelligibility rate in the low-resource setting and 15% test error rate in the robustness setting, with 9%, 7% improvements over the baseline respectively. Our code is available at https://github.com/renqianluo/SemiNAS.
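
The three steps listed in the abstract can be sketched as a data flow. This is a toy illustration, not the SemiNAS code: an architecture is reduced to a single hypothetical numeric feature, and a least-squares line stands in for the controller's accuracy predictor, so all names below are assumptions.

```python
def fit_line(pairs):
    """Least-squares fit of accuracy ~ a * x + b over (feature, accuracy) pairs.

    A stand-in for training the controller's accuracy predictor.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x


def predict(model, x):
    """Predict the accuracy of an architecture with feature x."""
    a, b = model
    return a * x + b


def semi_supervised_round(labeled, unlabeled):
    """Run one pass of the three steps described in the abstract."""
    model = fit_line(labeled)                              # 1) train on labeled pairs
    pseudo = [(x, predict(model, x)) for x in unlabeled]   # 2) predict unlabeled accuracy
    return fit_line(labeled + pseudo), pseudo              # 3) retrain on the union
```

In this linear toy the pseudo-labels lie exactly on the fitted line, so the refit in step 3 returns the same model; the sketch shows only how data moves between the steps, not the accuracy gains reported for the actual controller.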

[335]  arXiv:2002.10390 [pdf, other]
Title: Spatial-Temporal Moving Target Defense: A Markov Stackelberg Game Model
Comments: accepted by AAMAS 2020
Subjects: Computer Science and Game Theory (cs.GT); Cryptography and Security (cs.CR)

Moving target defense has emerged as a critical paradigm of protecting a vulnerable system against persistent and stealthy attacks. To protect a system, a defender proactively changes the system configurations to limit the exposure of security vulnerabilities to potential attackers. In doing so, the defender creates asymmetric uncertainty and complexity for the attackers, making it much harder for them to compromise the system. In practice, the defender incurs a switching cost for each migration of the system configurations. The switching cost usually depends on both the current configuration and the following configuration. Besides, different system configurations typically require a different amount of time for an attacker to exploit and attack. Therefore, a defender must simultaneously decide both the optimal sequences of system configurations and the optimal timing for switching. In this paper, we propose a Markov Stackelberg Game framework to precisely characterize the defender's spatial and temporal decision-making in the face of advanced attackers. We introduce a relative value iteration algorithm that computes the defender's optimal moving target defense strategies. Empirical evaluation on real-world problems demonstrates the advantages of the Markov Stackelberg game model for spatial-temporal moving target defense.

[336]  arXiv:2002.10392 [pdf, other]
Title: Suppressing Uncertainties for Large-Scale Facial Expression Recognition
Comments: This manuscript has been accepted by CVPR2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Annotating a qualitative large-scale facial expression dataset is extremely difficult due to the uncertainties caused by ambiguous facial expressions, low-quality facial images, and the subjectiveness of annotators. These uncertainties pose a key challenge for large-scale Facial Expression Recognition (FER) in the deep learning era. To address this problem, this paper proposes a simple yet efficient Self-Cure Network (SCN) which suppresses the uncertainties efficiently and prevents deep networks from over-fitting uncertain facial images. Specifically, SCN suppresses the uncertainty from two different aspects: 1) a self-attention mechanism over the mini-batch to weight each training sample with a ranking regularization, and 2) a careful relabeling mechanism to modify the labels of the samples in the lowest-ranked group. Experiments on synthetic FER datasets and our collected WebEmotion dataset validate the effectiveness of our method. Results on public benchmarks demonstrate that our SCN outperforms current state-of-the-art methods with 88.14% on RAF-DB, 60.23% on AffectNet, and 89.35% on FERPlus. The code will be available at https://github.com/kaiwang960112/Self-Cure-Network.

[337]  arXiv:2002.10394 [pdf, other]
Title: DeepPlume: Very High Resolution Real-Time Air Quality Mapping
Comments: 8 pages, 8 figures
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)

This paper presents an engine able to jointly predict the real-time concentration of the main pollutants harming people's health: nitrogen dioxide (NO2), ozone (O3) and particulate matter (PM2.5 and PM10, which are respectively the particles whose sizes are below 2.5 µm and 10 µm).
The engine covers a large part of the world and is fed with real-time measurements from official stations, atmospheric models' forecasts, land cover data, road networks and traffic estimates to produce predictions with a very high resolution in the range of a few dozen meters. This resolution makes the engine suited to innovative applications like street-level air quality mapping or air-quality-adjusted routing.
Plume Labs has deployed a similar prediction engine to build several products aiming at providing air quality data to individuals and businesses. For the sake of clarity and reproducibility, the engine presented here has been built specifically for this paper and differs quite significantly from the one used in Plume Labs' products. A major difference is in the data sources feeding the engine: in particular, this prediction engine does not include mobile sensor measurements.

[338]  arXiv:2002.10400 [pdf, other]
Title: Closing the convergence gap of SGD without replacement
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Stochastic gradient descent without replacement sampling is widely used in practice for model training. However, the vast majority of SGD analyses assume data sampled with replacement, and when the function minimized is strongly convex, an $\mathcal{O}\left(\frac{1}{T}\right)$ rate can be established when SGD is run for $T$ iterations. A recent line of breakthrough work on SGD without replacement (SGDo) established an $\mathcal{O}\left(\frac{n}{T^2}\right)$ convergence rate when the function minimized is strongly convex and is a sum of $n$ smooth functions, and an $\mathcal{O}\left(\frac{1}{T^2}+\frac{n^3}{T^3}\right)$ rate for sums of quadratics. On the other hand, the tightest known lower bound postulates an $\Omega\left(\frac{1}{T^2}+\frac{n^2}{T^3}\right)$ rate, leaving open the possibility of better SGDo convergence rates in the general case. In this paper, we close this gap and show that SGD without replacement achieves a rate of $\mathcal{O}\left(\frac{1}{T^2}+\frac{n^2}{T^3}\right)$ when the sum of the functions is a quadratic, and offer a new lower bound of $\Omega\left(\frac{n}{T^2}\right)$ for strongly convex functions that are sums of smooth functions.
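
The two sampling schemes contrasted in this abstract can be illustrated with a minimal sketch. It assumes a least-squares objective, a constant step size, and the helper names `epoch_order` and `sgd_least_squares`, none of which come from the paper; it shows only how the index order differs and does not reproduce the convergence rates.

```python
import numpy as np

def epoch_order(rng, n, with_replacement):
    """Return the n sample indices visited during one epoch."""
    if with_replacement:
        # i.i.d. draws: an index may repeat or be skipped within an epoch
        return rng.integers(0, n, size=n)
    # random reshuffling: every index is visited exactly once per epoch
    return rng.permutation(n)

def sgd_least_squares(A, b, epochs, lr, with_replacement, seed=0):
    """SGD on f(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)^2 (illustrative objective)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        for i in epoch_order(rng, n, with_replacement):
            grad_i = (A[i] @ x - b[i]) * A[i]  # gradient of the i-th summand
            x -= lr * grad_i
    return x
```

Calling `sgd_least_squares(A, b, epochs=20, lr=0.05, with_replacement=False)` runs the without-replacement variant; switching the flag to `True` gives the with-replacement baseline on the same data.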

[339]  arXiv:2002.10401 [pdf]
Title: BLAST: Bridging Length/time scales via Atomistic Simulation Toolkit
Subjects: Computational Engineering, Finance, and Science (cs.CE); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Materials Science (cond-mat.mtrl-sci)

The ever-increasing power of supercomputers coupled with highly scalable simulation codes has made molecular dynamics an indispensable tool in applications ranging from predictive modeling of materials to computational design and discovery of new materials for a broad range of applications. Multi-fidelity scale bridging between the various flavors of molecular dynamics, i.e. ab-initio, classical and coarse-grained models, has remained a long-standing challenge. Here, we introduce our framework BLAST (Bridging Length/time scales via Atomistic Simulation Toolkit) that leverages machine learning principles to address this challenge. BLAST is a multi-fidelity scale bridging framework that provides users with the capabilities to train and develop their own classical atomistic and coarse-grained interatomic potentials (force fields) for molecular simulations. BLAST is designed to address several long-standing problems in the molecular simulations community, such as unintended misuse of existing force fields due to the knowledge gap between developers and users, bottlenecks in traditional force field development approaches, and other issues relating to the accuracy, efficiency, and transferability of force fields. Here, we discuss several important aspects of force field development and highlight features in BLAST that enable its functionalities and ease of use.

[340]  arXiv:2002.10410 [pdf, other]
Title: Lagrangian Decomposition for Neural Network Verification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A fundamental component of neural network verification is the computation of bounds on the values their outputs can take. Previous methods have either used off-the-shelf solvers, discarding the problem structure, or relaxed the problem even further, making the bounds unnecessarily loose. We propose a novel approach based on Lagrangian Decomposition. Our formulation admits an efficient supergradient ascent algorithm, as well as an improved proximal algorithm. Both algorithms offer three advantages: (i) they yield bounds that are provably at least as tight as previous dual algorithms relying on Lagrangian relaxations; (ii) they are based on operations analogous to the forward/backward pass of neural network layers and are therefore easily parallelizable, amenable to GPU implementation and able to take advantage of the convolutional structure of problems; and (iii) they allow for anytime stopping while still providing valid bounds. Empirically, we show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time, and obtain tighter bounds in the same time as previous dual algorithms. This results in an overall speed-up when employing the bounds for formal verification.

[341]  arXiv:2002.10411 [pdf]
Title: Clustering and Classification with Non-Existence Attributes: A Sentenced Discrepancy Measure Based Technique
Comments: 30 pages, 16 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Many real-world clustering problems suffer from incomplete data characterization because attributes are lost or absent for some or all of the data instances. Typical clustering approaches cannot be applied directly to such data unless it is pre-processed by techniques like imputation or marginalization. We overcome this drawback by utilizing a Sentenced Discrepancy Measure which we refer to as the Attribute Weighted Penalty based Discrepancy (AWPD). Using the AWPD measure, we modify the K-MEANS++ and Scalable K-MEANS++ algorithms for clustering and the k Nearest Neighbor (kNN) algorithm for classification so as to make them directly applicable to datasets with Non-Existence attributes. We present a detailed theoretical analysis which shows that the new AWPD-based K-MEANS++, Scalable K-MEANS++ and kNN algorithms converge to a local optimum in a finite number of iterations. We report in-depth experiments on numerous benchmark datasets for various forms of Non-Existence, showing that the proposed clustering and classification techniques usually give better results than some of the well-known imputation methods that are generally used to process such incomplete data. The technique is designed to apply directly to datasets that have Non-Existence attributes and to detect unstructured Non-Existence attributes with the best accuracy rate and minimum cost.

[342]  arXiv:2002.10413 [pdf, other]
Title: Neural Message Passing on High Order Paths
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Graph neural networks have achieved impressive results in predicting molecular properties, but they do not directly account for local and hidden structures in the graph such as functional groups and molecular geometry. At each propagation step, GNNs aggregate only over first-order neighbours, ignoring important information contained in subsequent neighbours as well as the relationships between those higher-order connections. In this work, we generalize graph neural nets to pass messages and aggregate across higher-order paths. This allows information to propagate over various levels and substructures of the graph. We demonstrate our model on a few tasks in molecular property prediction.

[343]  arXiv:2002.10416 [pdf, other]
Title: Resources for Turkish Dependency Parsing: Introducing the BOUN Treebank and the BoAT Annotation Tool
Authors: Utku Türk (1), Furkan Atmaca (1), Şaziye Betül Özateş (2), Gözde Berk (2), Seyyit Talha Bedir (1), Abdullatif Köksal (2), Balkız Öztürk Başaran (1), Tunga Güngör (2), Arzucan Özgür (2) ((1) Department of Linguistics Boğaziçi University, (2) Department of Computer Engineering Boğaziçi University)
Comments: 29 pages, 5 figures, 10 tables, submitted to Language Resources and Evaluation
Subjects: Computation and Language (cs.CL)

In this paper, we describe our contributions and efforts to develop Turkish resources, which include a new treebank (BOUN Treebank) with novel sentences, along with the guidelines we adopted and a new annotation tool we developed (BoAT). The manual annotation process we employed was shaped and implemented by a team of four linguists and five NLP specialists. Decisions regarding the annotation of the BOUN Treebank were made in line with the Universal Dependencies framework, which originated from the works of De Marneffe et al. (2014) and Nivre et al. (2016). We took into account the recent unifying efforts based on the re-annotation of other Turkish treebanks in the UD framework (Türk et al., 2019). Through the BOUN Treebank, we introduced a total of 9,757 sentences from various topics including biographical texts, national newspapers, instructional texts, popular culture articles, and essays. In addition, we report the parsing results of a graph-based dependency parser obtained over each text type, the total of the BOUN Treebank, and all Turkish treebanks that we either re-annotated or introduced. We show that a state-of-the-art dependency parser has improved scores for identifying the proper head and the syntactic relationships between the heads and the dependents. In light of these results, we have observed that unifying the Turkish annotation scheme and introducing a more comprehensive treebank improve performance with regard to dependency parsing.

[344]  arXiv:2002.10420 [pdf, other]
Title: Boosting rare benthic macroinvertebrates taxa identification with one-class classification
Comments: 5 pages, 1 figure, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)

Insect monitoring is crucial for understanding the consequences of rapid ecological changes, but taxa identification currently requires tedious manual expert work and cannot be scaled up efficiently. Deep convolutional neural networks (CNNs) provide a viable way to significantly increase the biomonitoring volumes. However, taxa abundances are typically very imbalanced and the amounts of training images for the rarest classes are simply too low for deep CNNs. As a result, the samples from the rare classes are often completely missed, even though detecting them has biological importance. In this paper, we propose combining the trained deep CNN with one-class classifiers to improve rare species identification. One-class classification models are traditionally trained with far fewer samples and they can provide a mechanism to indicate samples potentially belonging to the rare classes for human inspection. Our experiments confirm that the proposed approach may indeed support moving towards partial automation of the taxa identification task.

[345]  arXiv:2002.10429 [pdf]
Title: Distributed Frequency Emergency Control with Coordinated Edge Intelligence
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)

Developing effective strategies to rapidly support grid frequency while minimizing loss in case of severe contingencies is an important requirement in power systems. While distributed responsive load demands are commonly adopted for frequency regulation, it is difficult to achieve both rapid response and global accuracy in a practical and cost-effective manner. In this paper, the cyber-physical design of an Internet-of-Things (IoT) enabled system, called Grid Sense, is presented. Grid Sense utilizes a large number of distributed appliances for frequency emergency support. It features a local power loss $\Delta P$ estimation approach for frequency emergency control based on coordinated edge intelligence. The specifically designed smart outlets of Grid Sense detect the frequency disturbance event locally using the parameters sent from the control center to estimate active power loss in the system and to make rapid and accurate switching decisions soon after a severe contingency. Based on a modified IEEE 24-bus system, numerical simulations and hardware experiments are conducted to demonstrate the frequency support performance of Grid Sense in the aspects of accuracy and speed. It is shown that Grid Sense equipped with its local $\Delta P$-estimation frequency control approach can accurately and rapidly prevent the drop of frequency after a major power loss.

[346]  arXiv:2002.10433 [pdf, other]
Title: From Chess and Atari to StarCraft and Beyond: How Game AI is Driving the World of AI
Journal-ref: KI - Kuenstliche Intelligenz (2020)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This paper reviews the field of Game AI, which not only deals with creating agents that can play a certain game, but also with areas as diverse as creating game content automatically, game analytics, or player modelling. While Game AI was for a long time not very well recognized by the larger scientific community, it has established itself as a research area for developing and testing the most advanced forms of AI algorithms, and articles covering advances in mastering video games such as StarCraft 2 and Quake III appear in the most prestigious journals. Because of the growth of the field, a single review cannot cover it completely. Therefore, we focus on important recent developments, including the fact that advances in Game AI are starting to be extended to areas outside of games, such as robotics or the synthesis of chemicals. In this article, we review the algorithms and methods that have paved the way for these breakthroughs, report on the other important areas of Game AI research, and also point out exciting directions for the future of Game AI.

[347]  arXiv:2002.10434 [pdf, other]
Title: Maximum Entropy on the Mean: A Paradigm Shift for Regularization in Image Deblurring
Comments: 15 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image deblurring is a notoriously challenging ill-posed inverse problem. In recent years, a wide variety of approaches have been proposed based upon regularization at the level of the image or on techniques from machine learning. We propose an alternative approach, shifting the paradigm towards regularization at the level of the probability distribution on the space of images. Our method is based upon the idea of maximum entropy on the mean, wherein we work at the level of the probability density function of the image whose expectation is our estimate of the ground truth. Using techniques from convex analysis and probability theory, we show that the method is computationally feasible and amenable to very large blurs. Moreover, when images are embedded with symbology (a known pattern), we show how our method can be applied to approximate the unknown blur kernel with remarkable effects. While our method is stable with respect to small amounts of noise, it does not actively denoise. However, for moderate to large amounts of noise, it performs well by preconditioned denoising with a state-of-the-art method.

[348]  arXiv:2002.10435 [pdf, ps, other]
Title: Learning Structured Distributions From Untrusted Batches: Faster and Simpler
Comments: 34 pages
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

We revisit the problem of learning from untrusted batches introduced by Qiao and Valiant [QV17]. Recently, Jain and Orlitsky [JO19] gave a simple semidefinite programming approach based on the cut-norm that achieves essentially information-theoretically optimal error in polynomial time. Concurrently, Chen et al. [CLM19] considered a variant of the problem where $\mu$ is assumed to be structured, e.g. log-concave, monotone hazard rate, $t$-modal, etc. In this case, it is possible to achieve the same error with sample complexity sublinear in $n$, and they exhibited a quasi-polynomial time algorithm for doing so using Haar wavelets.
In this paper, we find an appealing way to synthesize the techniques of [JO19] and [CLM19] to give the best of both worlds: an algorithm which runs in polynomial time and can exploit structure in the underlying distribution to achieve sublinear sample complexity. Along the way, we simplify the approach of [JO19] by avoiding the need for SDP rounding and giving a more direct interpretation of it through the lens of soft filtering, a powerful recent technique in high-dimensional robust estimation.

[349]  arXiv:2002.10438 [pdf, other]
Title: LogicGAN: Logic-guided Generative Adversarial Networks
Comments: 6 pages (+ 1 page for reference) Vineel Nagisetty and Laura Graves are joint first authors
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Generative Adversarial Networks (GANs) are a revolutionary class of Deep Neural Networks (DNNs) that have been successfully used to generate realistic images, music, text, and other data. However, it is well known that GAN training can be notoriously resource-intensive and presents many challenges. Further, a potential weakness in GANs is that discriminator DNNs typically provide only one value (loss) of corrective feedback to generator DNNs (namely, the discriminator's assessment of the generated example). By contrast, we propose a new class of GAN, which we refer to as LogicGAN, that leverages recent advances in (logic-based) explainable AI (xAI) systems to provide a "richer" form of corrective feedback from discriminators to generators. Specifically, we modify the gradient descent process using xAI systems that specify the reason why the discriminator made the classification it did, thus providing the richer corrective feedback that helps the generator to better fool the discriminator. Using our approach, we show that LogicGANs learn much faster on MNIST data, achieving an improvement in data efficiency of 45% in the single-class and 12.73% in the multi-class setting over standard GANs while maintaining the same quality as measured by Fréchet Inception Distance. Further, we argue that LogicGAN gives users greater control over how models learn than standard GAN systems.

[350]  arXiv:2002.10444 [pdf, other]
Title: Batch Normalization Biases Deep Residual Networks Towards Shallow Paths
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Batch normalization has multiple benefits. It improves the conditioning of the loss landscape, and is a surprisingly effective regularizer. However, the most important benefit of batch normalization arises in residual networks, where it dramatically increases the largest trainable depth. We identify the origin of this benefit: At initialization, batch normalization downscales the residual branch relative to the skip connection, by a normalizing factor proportional to the square root of the network depth. This ensures that, early in training, the function computed by deep normalized residual networks is dominated by shallow paths with well-behaved gradients. We use this insight to develop a simple initialization scheme which can train very deep residual networks without normalization. We also clarify that, although batch normalization does enable stable training with larger learning rates, this benefit is only useful when one wishes to parallelize training over large batch sizes. Our results help isolate the distinct benefits of batch normalization in different architectures.

[351]  arXiv:2002.10445 [pdf, other]
Title: Deep Nearest Neighbor Anomaly Detection
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Nearest neighbors is a successful and long-standing technique for anomaly detection. Significant progress has recently been achieved by self-supervised deep methods (e.g. RotNet). Self-supervised features, however, typically under-perform ImageNet pre-trained features. In this work, we investigate whether the recent progress can indeed outperform nearest-neighbor methods operating on an ImageNet pre-trained feature space. The simple nearest-neighbor-based approach is experimentally shown to outperform self-supervised methods in accuracy, few-shot generalization, training time and noise robustness while making fewer assumptions on image distributions.
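
A nearest-neighbor anomaly score of the kind this abstract refers to can be sketched as follows. The sketch assumes that feature vectors have already been extracted by some pretrained network, and that the score is the mean Euclidean distance to the k nearest normal training features; the function name `knn_anomaly_scores` and the choice of k are illustrative and not taken from the paper.

```python
import numpy as np

def knn_anomaly_scores(train_feats, test_feats, k=2):
    """Mean Euclidean distance from each test feature to its k nearest training features.

    train_feats: (n_train, d) features of normal samples (assumed precomputed).
    test_feats:  (n_test, d) features to score; larger scores suggest anomalies.
    """
    # Pairwise distances with shape (n_test, n_train)
    diffs = test_feats[:, None, :] - train_feats[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep the k smallest distances in each row and average them
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)
```

A test point that lies far from every normal training feature receives a larger score than one sitting inside the training set, so thresholding the score gives a simple detector.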

Cross-lists for Tue, 25 Feb 20

[352]  arXiv:1908.08783 (cross-list from cs.NE) [pdf, other]
Title: Learning Fitness Functions for Genetic Algorithms
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Machine Learning (stat.ML)

The problem of automatic software generation is known as Machine Programming. In this work, we propose a framework based on genetic algorithms to solve this problem. Although genetic algorithms have been used successfully for many problems, one criticism is that hand-crafting their fitness function, the test that aims to effectively guide their evolution, can be notably challenging. Our framework presents a novel approach to learning the fitness function, using neural networks to predict values of an ideal fitness function. We also augment the evolutionary process with a minimally intrusive search heuristic. This heuristic improves the framework's ability to discover correct programs from ones that are approximately correct and does so with negligible computational overhead. We compare our approach with two state-of-the-art program synthesis methods and demonstrate that it finds more correct programs with fewer candidate program generations.

[353]  arXiv:2002.09487 (cross-list from physics.flu-dyn) [pdf, other]
Title: Quasi-periodic traveling gravity-capillary waves
Comments: 25 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:2001.10745
Subjects: Fluid Dynamics (physics.flu-dyn); Numerical Analysis (math.NA)

We present a numerical study of spatially quasi-periodic traveling waves on the surface of an ideal fluid of infinite depth. This is a generalization of the classic Wilton ripple problem to the case when the ratio of wave numbers satisfying the dispersion relation is irrational. We develop a conformal mapping formulation of the water wave equations that employs a quasi-periodic variant of the Hilbert transform to compute the normal velocity of the fluid from its velocity potential on the free surface. We develop a Fourier pseudo-spectral discretization of the traveling water wave equations in which one-dimensional quasi-periodic functions are represented by two-dimensional periodic functions on the torus. This leads to an overdetermined nonlinear least squares problem that we solve using a variant of the Levenberg-Marquardt method. We investigate various properties of quasi-periodic traveling waves, including Fourier resonances and the dependence of wave speed and surface tension on the amplitude parameters that describe a two-parameter family of waves.

[354]  arXiv:2002.09488 (cross-list from math.OC) [pdf, other]
Title: Optimal Randomized First-Order Methods for Least-Squares Problems
Comments: arXiv admin note: text overlap with arXiv:2002.00864
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

We provide an exact analysis of a class of randomized algorithms for solving overdetermined least-squares problems. We consider first-order methods, where the gradients are pre-conditioned by an approximation of the Hessian, based on a subspace embedding of the data matrix. This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems. We focus on two classical embeddings, namely, Gaussian projections and subsampled randomized Hadamard transforms (SRHT). Our key technical innovation is the derivation of the limiting spectral density of SRHT embeddings. Leveraging this novel result, we derive the family of normalized orthogonal polynomials of the SRHT density and we find the optimal pre-conditioned first-order method along with its rate of convergence. Our analysis of Gaussian embeddings proceeds similarly, and leverages classical random matrix theory results. In particular, we show that for a given sketch size, SRHT embeddings exhibit a faster rate of convergence than Gaussian embeddings. Then, we propose a new algorithm by optimizing the computational complexity over the choice of the sketching dimension. To our knowledge, our resulting algorithm yields the best known complexity for solving least-squares problems with no condition number dependence.
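
The idea of pre-conditioning gradients with a sketched Hessian can be illustrated with a minimal sketch using a Gaussian embedding. The fixed step size of 0.5, the iteration count, and the name `sketched_preconditioned_gd` are assumptions made for illustration; the paper's optimal method, its step-size choices, and the SRHT embedding are not reproduced here.

```python
import numpy as np

def sketched_preconditioned_gd(A, b, sketch_size, step=0.5, iters=100, seed=0):
    """Gradient descent on 0.5 * ||A x - b||^2, pre-conditioned by a sketched Hessian."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    # Gaussian embedding S with i.i.d. N(0, 1/sketch_size) entries
    S = rng.normal(size=(sketch_size, n)) / np.sqrt(sketch_size)
    SA = S @ A
    H_sketch = SA.T @ SA  # approximates the Hessian A^T A
    x = np.zeros(d)
    for _ in range(iters):
        grad = A.T @ (A @ x - b)  # exact gradient of the least-squares objective
        x -= step * np.linalg.solve(H_sketch, grad)
    return x
```

The sketch is drawn once and reused, so each iteration costs one pass over the data plus a small d-by-d solve; a conservative step is used here because the sketched Hessian only approximates the true one.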

[355]  arXiv:2002.09509 (cross-list from math.NT) [pdf, ps, other]
Title: Gowers norms for automatic sequences
Comments: 50 pages
Subjects: Number Theory (math.NT); Formal Languages and Automata Theory (cs.FL); Combinatorics (math.CO); Dynamical Systems (math.DS)

We show that any automatic sequence can be separated into a structured part and a Gowers uniform part in a way that is considerably more efficient than guaranteed by the Arithmetic Regularity Lemma. For sequences produced by strongly connected and prolongable automata, the structured part is rationally almost periodic, while for general sequences the description is marginally more complicated. In particular, we show that all automatic sequences orthogonal to periodic sequences are Gowers uniform. As an application, we obtain for any $l \geq 2$ and any automatic set $A \subset \mathbb{N}_0$ lower bounds on the number of $l$-term arithmetic progressions - contained in $A$ - with a given difference. The analogous result is false for general subsets of $\mathbb{N}_0$ and progressions of length $\geq 5$.

[356]  arXiv:2002.09515 (cross-list from physics.geo-ph) [pdf, other]
Title: Joint geophysical, petrophysical and geologic inversion using a dynamic Gaussian mixture model
Comments: 35 pages, 10 figures, submitted paper awaiting decision
Subjects: Geophysics (physics.geo-ph); Computational Engineering, Finance, and Science (cs.CE); Applications (stat.AP)

We present a framework for petrophysically and geologically guided inversion to perform multi-physics joint inversions. Petrophysical and geological information is included in a multi-dimensional Gaussian mixture model that regularizes the inverse problem. The inverse problem we construct consists of a suite of three cyclic optimizations over the geophysical, petrophysical and geological information. The two additional problems over the petrophysical and geological data are used as a coupling term. They correspond to updating the geophysical reference model and regularization weights. This guides the inverse problem towards reproducing the desired petrophysical and geological characteristics. The objective function that we define for the inverse problem is comprised of multiple data misfit terms: one for each geophysical survey and one for the petrophysical properties and geological information. Each of these misfit terms has its target misfit value which we seek to fit in the inversion. We detail our reweighting strategies to handle multiple data misfits at once. Our framework is modular and extensible, and this allows us to combine multiple geophysical methods in a joint inversion and to distribute open-source code and reproducible examples. To illustrate the gains made by multi-physics inversions, we apply our framework to jointly invert, in 3D, synthetic potential fields data based on the DO-$27$ kimberlite pipe case study (Northwest Territories, Canada). The pipe contains two distinct kimberlite facies embedded in a host rock. We show that inverting the datasets individually, even with petrophysical information, leads to a binary geologic model consisting of background or kimberlite. A joint inversion, with petrophysical information, can differentiate the two main kimberlite facies of the pipe.

[357]  arXiv:2002.09526 (cross-list from math.OC) [pdf, other]
Title: Stochastic Subspace Cubic Newton Method
Comments: 29 pages, 5 figures, 1 table, 1 algorithm
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

In this paper, we propose a new randomized second-order optimization algorithm, Stochastic Subspace Cubic Newton (SSCN), for minimizing a high dimensional convex function $f$. Our method can be seen both as a stochastic extension of the cubically-regularized Newton method of Nesterov and Polyak (2006), and a second-order enhancement of stochastic subspace descent of Kozak et al. (2019). We prove that as we vary the minibatch size, the global convergence rate of SSCN interpolates between the rate of stochastic coordinate descent (CD) and the rate of cubic regularized Newton, thus giving new insights into the connection between first and second-order methods. Remarkably, the local convergence rate of SSCN matches the rate of stochastic subspace descent applied to the problem of minimizing the quadratic function $\frac12 (x-x^*)^\top \nabla^2f(x^*)(x-x^*)$, where $x^*$ is the minimizer of $f$, and hence depends on the properties of $f$ at the optimum only. Our numerical experiments show that SSCN outperforms non-accelerated first-order CD algorithms while being competitive with their accelerated variants.

[358]  arXiv:2002.09538 (cross-list from stat.ML) [pdf, other]
Title: Knot Selection in Sparse Gaussian Processes
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Knot-based, sparse Gaussian processes have enjoyed considerable success as scalable approximations to full Gaussian processes. Problems can occur, however, when knot selection is done by optimizing the marginal likelihood. For example, the marginal likelihood surface is highly multimodal, which can cause suboptimal knot placement where some knots serve practically no function. This is especially a problem when many more knots are used than are necessary, resulting in extra computational cost for little to no gains in accuracy.
We propose a one-at-a-time knot selection algorithm to select both the number and placement of knots. Our algorithm uses Bayesian optimization to efficiently propose knots that are likely to be good and largely avoids the pathologies encountered when using the marginal likelihood as the objective function. We provide empirical results showing improved accuracy and speed over the current standard approaches.

[359]  arXiv:2002.09547 (cross-list from stat.ML) [pdf, other]
Title: Stochastic Normalizing Flows
Comments: 17 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We introduce stochastic normalizing flows, an extension of continuous normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs). Using the theory of rough paths, the underlying Brownian motion is treated as a latent variable and approximated, enabling efficient training of neural SDEs as random neural ordinary differential equations. These SDEs can be used for constructing efficient Markov chains to sample from the underlying distribution of a given dataset. Furthermore, by considering families of targeted SDEs with prescribed stationary distribution, we can apply VI to the optimization of hyperparameters in stochastic MCMC.

[360]  arXiv:2002.09558 (cross-list from eess.IV) [pdf, other]
Title: Self-Supervised Poisson-Gaussian Denoising
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

We extend the blindspot model for self-supervised denoising to handle Poisson-Gaussian noise and introduce an improved training scheme that avoids hyperparameters and adapts the denoiser to the test data. Self-supervised models for denoising learn to denoise from only noisy data and do not require corresponding clean images, which are difficult or impossible to acquire in some application areas of interest such as low-light microscopy. We introduce a new training strategy to handle Poisson-Gaussian noise which is the standard noise model for microscope images. Our new strategy eliminates hyperparameters from the loss function, which is important in a self-supervised regime where no ground truth data is available to guide hyperparameter tuning. We show how our denoiser can be adapted to the test data to improve performance. Our evaluation on a microscope image denoising benchmark validates our approach.

[361]  arXiv:2002.09573 (cross-list from stat.ML) [pdf, ps, other]
Title: Causal structure learning from time series: Large regression coefficients may predict causal links better in practice than small p-values
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)

In this article, we describe the algorithms for causal structure learning from time series data that won the Causality 4 Climate competition at the Conference on Neural Information Processing Systems 2019 (NeurIPS). We examine how our combination of established ideas achieves competitive performance on semi-realistic and realistic time series data exhibiting common challenges in real-world Earth sciences data. In particular, we discuss a) a rationale for leveraging linear methods to identify causal links in non-linear systems, b) a simulation-backed explanation as to why large regression coefficients may predict causal links better in practice than small p-values and thus why normalising the data may sometimes hinder causal structure learning.
For benchmark usage, we provide implementations at https://github.com/sweichwald/tidybench and detail the algorithms here. We propose the presented competition-proven methods for baseline benchmark comparisons to guide the development of novel algorithms for structure learning from time series.

[362]  arXiv:2002.09580 (cross-list from stat.ML) [pdf, other]
Title: Polarizing Front Ends for Robust CNNs
Comments: Published in 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Signal Processing (eess.SP)

The vulnerability of deep neural networks to small, adversarially designed perturbations can be attributed to their "excessive linearity." In this paper, we propose a bottom-up strategy for attenuating adversarial perturbations using a nonlinear front end which polarizes and quantizes the data. We observe that ideal polarization can be utilized to completely eliminate perturbations, develop algorithms to learn approximately polarizing bases for data, and investigate the effectiveness of the proposed strategy on the MNIST and Fashion MNIST datasets.

[363]  arXiv:2002.09589 (cross-list from stat.ML) [pdf, other]
Title: SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm
Comments: 20 pages, 6 figures, 2 tables
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Statistics Theory (math.ST)

Sample- and computationally-efficient distribution estimation is a fundamental tenet in statistics and machine learning. We present $\mathrm{SURF}$, an algorithm for approximating distributions by piecewise polynomials. $\mathrm{SURF}$ is simple, replacing existing general-purpose optimization techniques by straight-forward approximation of each potential polynomial piece by a simple empirical-probability interpolation, and using plain divide-and-conquer to merge the pieces. It is universal, as well-known low-degree polynomial-approximation results imply that it accurately approximates a large class of common distributions. $\mathrm{SURF}$ is robust to distribution mis-specification as for any degree $d\le 8$, it estimates any distribution to an $\ell_1$ distance $ <3 $ times that of the nearest degree-$d$ piecewise polynomial, improving known factor upper bounds of 3 for single polynomials and 15 for polynomials with arbitrarily many pieces. It is fast, using optimal sample complexity, and running in near sample-linear time. In experiments, $\mathrm{SURF}$ significantly outperforms state-of-the art algorithms.

[364]  arXiv:2002.09611 (cross-list from eess.IV) [pdf, other]
Title: Tuning-free Plug-and-Play Proximal Algorithm for Inverse Imaging Problems
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Plug-and-play (PnP) is a non-convex framework that combines ADMM or other proximal algorithms with advanced denoiser priors. Recently, PnP has achieved great empirical success, especially with the integration of deep learning-based denoisers. However, a key problem of PnP-based approaches is that they require manual parameter tweaking. Such tweaking is necessary to obtain high-quality results across widely differing imaging conditions and varying scene content. In this work, we present a tuning-free PnP proximal algorithm, which can automatically determine the internal parameters including the penalty parameter, the denoising strength and the terminal time. A key part of our approach is to develop a policy network for automatic search of parameters, which can be effectively learned via mixed model-free and model-based deep reinforcement learning. We demonstrate, through numerical and visual experiments, that the learned policy can customize different parameters for different states, and is often more efficient and effective than existing handcrafted criteria. Moreover, we discuss the practical considerations of the plugged denoisers, which together with our learned policy yield state-of-the-art results. This holds on both linear and nonlinear exemplary inverse imaging problems, and in particular, we show promising results on Compressed Sensing MRI and phase retrieval.

[365]  arXiv:2002.09615 (cross-list from stat.ML) [pdf, other]
Title: Preference Modeling with Context-Dependent Salient Features
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We consider the problem of estimating a ranking on a set of items from noisy pairwise comparisons given item features. We address the fact that pairwise comparison data often reflects irrational choice, e.g. intransitivity. Our key observation is that two items compared in isolation from other items may be compared based on only a salient subset of features. Formalizing this framework, we propose the "salient feature preference model" and prove a sample complexity result for learning the parameters of our model and the underlying ranking with maximum likelihood estimation. We also provide empirical results that support our theoretical bounds and illustrate how our model explains systematic intransitivity. Finally we demonstrate strong performance of maximum likelihood estimation of our model on both synthetic data and two real data sets: the UT Zappos50K data set and comparison data about the compactness of legislative districts in the US.

[366]  arXiv:2002.09621 (cross-list from math.OC) [pdf, other]
Title: Global Convergence and Variance-Reduced Optimization for a Class of Nonconvex-Nonconcave Minimax Problems
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as gradient descent ascent (GDA) are the common practice for solving these nonconvex games and have achieved considerable empirical success. Yet, it is known that these vanilla GDA algorithms with constant step size can potentially diverge even in the convex setting. In this work, we show that for a subclass of nonconvex-nonconcave objectives satisfying a so-called two-sided Polyak-Łojasiewicz inequality, the alternating gradient descent ascent (AGDA) algorithm converges globally at a linear rate and the stochastic AGDA achieves a sublinear rate. We further develop a variance reduced algorithm that attains a provably faster rate than AGDA when the problem has the finite-sum structure.
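
As a rough illustration of the alternating update described in this abstract, the sketch below runs AGDA on a toy objective. The objective f(x, y) = x^2/2 + 2xy - y^2/2, the step sizes, and the function name agda are assumptions chosen for illustration and do not come from the paper; this toy problem is strongly-convex-strongly-concave, which is only a simple special case of the two-sided Polyak-Łojasiewicz setting.

```python
def agda(grad_x, grad_y, x, y, step_x=0.1, step_y=0.1, iters=500):
    """Alternating gradient descent ascent for min over x, max over y.

    Hypothetical helper: the descent step on x is taken first, and the
    ascent step on y then uses the freshly updated x (the "alternating" part).
    """
    for _ in range(iters):
        x = x - step_x * grad_x(x, y)  # descent on the min variable
        y = y + step_y * grad_y(x, y)  # ascent on the max variable, using new x
    return x, y


# Toy objective (an assumption): f(x, y) = 0.5*x**2 + 2*x*y - 0.5*y**2,
# whose unique saddle point is (0, 0).
grad_x = lambda x, y: x + 2.0 * y
grad_y = lambda x, y: 2.0 * x - y

x_final, y_final = agda(grad_x, grad_y, 3.0, -2.0)
print(x_final, y_final)
```

On this toy problem the iterates contract toward the saddle point at a geometric rate, which is the kind of linear convergence the abstract refers to; nothing here reproduces the paper's analysis or its variance-reduced variant.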

[367]  arXiv:2002.09622 (cross-list from math.LO) [pdf, ps, other]
Title: Notes on neighborhood semantics for logics of unknown truths and false beliefs
Authors: Jie Fan
Comments: 21 pages
Subjects: Logic (math.LO); Artificial Intelligence (cs.AI)

In this article, we study logics of unknown truths and false beliefs under neighborhood semantics. We compare the relative expressivity of the two logics. It turns out that they are incomparable over various classes of neighborhood models, and the combination of the two logics is as expressive as standard modal logic over any class of neighborhood models. We propose morphisms for each logic, which can help us explore the frame definability problem, show a general soundness and completeness result, and generalize some results in the literature. We axiomatize the two logics over various classes of neighborhood frames. Last but not least, we extend the results to the case of public announcements, which has good applications to Moore sentences and some others.

[368]  arXiv:2002.09625 (cross-list from eess.IV) [pdf, other]
Title: Neural Architecture Search for Compressed Sensing Magnetic Resonance Image Reconstruction
Comments: 10 pages, submitted to IEEEtrans
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Recent works have demonstrated that deep learning (DL) based compressed sensing (CS) implementations can provide impressive improvements in reconstructing high-quality MR images from sub-sampled k-space data. However, the network architectures adopted in current methods are all handcrafted, so the performance of these networks is limited by researchers' expertise and labor. In this manuscript, we propose a novel and efficient MR image reconstruction framework based on a Neural Architecture Search (NAS) algorithm. The inner cells in our reconstruction network are automatically defined from a flexible search space in a differentiable manner. Compared to previous works, where only several common convolutional operations are tried by hand, our method can sufficiently explore different operations (e.g. dilated convolution) and their possible combinations. Our proposed method can also reach a better trade-off between computation cost and reconstruction performance for practical clinical translation. Experiments performed on a publicly available dataset show that our network produces better reconstruction results than the previous state-of-the-art methods in terms of PSNR and SSIM with 4 times fewer computation resources. The final network architecture found by the algorithm can also offer insights for network architecture design in other medical image analysis applications.

[369]  arXiv:2002.09635 (cross-list from eess.IV) [pdf, other]
Title: Towards Label-Free 3D Segmentation of Optical Coherence Tomography Images of the Optic Nerve Head Using Deep Learning
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Since the introduction of optical coherence tomography (OCT), it has been possible to study the complex 3D morphological changes of the optic nerve head (ONH) tissues that occur along with the progression of glaucoma. Although several deep learning (DL) techniques have been recently proposed for the automated extraction (segmentation) and quantification of these morphological changes, the device specific nature and the difficulty in preparing manual segmentations (training data) limit their clinical adoption. With several new manufacturers and next-generation OCT devices entering the market, the complexity in deploying DL algorithms clinically is only increasing. To address this, we propose a DL based 3D segmentation framework that is easily translatable across OCT devices in a label-free manner (i.e. without the need to manually re-segment data for each device). Specifically, we developed 2 sets of DL networks. The first (referred to as the enhancer) was able to enhance OCT image quality from 3 OCT devices, and harmonized image-characteristics across these devices. The second performed 3D segmentation of 6 important ONH tissue layers. We found that the use of the enhancer was critical for our segmentation network to achieve device independency. In other words, our 3D segmentation network trained on any of 3 devices successfully segmented ONH tissue layers from the other two devices with high performance (Dice coefficients > 0.92). With such an approach, we could automatically segment images from new OCT devices without ever needing manual segmentation data from such devices.

[370]  arXiv:2002.09656 (cross-list from q-fin.ST) [pdf]
Title: A new hybrid approach for crude oil price forecasting: Evidence from multi-scale data
Subjects: Statistical Finance (q-fin.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)

With the growing research into the factors that influence crude oil price fluctuations and the accelerated development of Internet technology, accessible data such as the Google search volume index (GSVI) are increasingly quantified and incorporated into forecasting approaches. In this paper, we apply multi-scale data that include both GSVI data and traditional economic data related to crude oil price as independent variables and propose a new hybrid approach for monthly crude oil price forecasting. This hybrid approach, based on a divide and conquer strategy, consists of the K-means method, kernel principal component analysis (KPCA) and kernel extreme learning machine (KELM), where the K-means method is adopted to divide input data into certain clusters, KPCA is applied to reduce dimension, and KELM is employed for final crude oil price forecasting. The empirical results can be analyzed at the data and method levels. At the data level, GSVI data perform better than economic data in level forecasting accuracy but worse in directional forecasting accuracy because of herd behavior, while hybrid data combine their advantages and obtain the best forecasting performance in both level and directional accuracy. At the method level, the approaches with K-means perform better than those without K-means, which demonstrates that the divide and conquer strategy can effectively improve the forecasting performance.

[371]  arXiv:2002.09658 (cross-list from math.OC) [pdf, other]
Title: An Efficient MPC Algorithm For Switched Nonlinear Systems with Minimum Dwell Time Constraints
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper presents an efficient suboptimal model predictive control (MPC) algorithm for nonlinear switched systems subject to minimum dwell time constraints (MTC). While MTC are required for most physical systems due to stability, power and mechanical restrictions, MPC optimization problems with MTC are challenging to solve. To efficiently solve such problems, the on-line MPC optimization problem is decomposed into a sequence of simpler problems, which include two nonlinear programs (NLP) and a rounding step, as typically done in mixed-integer optimal control (MIOC). Unlike the classical approach that embeds MTC in a mixed-integer linear program (MILP) with combinatorial constraints in the rounding step, our proposal is to embed the MTC in one of the NLPs using move blocking. Such a formulation can speedup on-line computations by employing recent move blocking algorithms for NLP problems and by using a simple sum-up-rounding (SUR) method for the rounding step. An explicit upper bound of the integer approximation error for the rounding step is given. In addition, a combined shrinking and receding horizon strategy is developed to satisfy closed-loop MTC. Recursive feasibility is proven using a $l$-step control invariant ($l$-CI) set, where $l$ is the minimum dwell time step length. An algorithm to compute $l$-CI sets for switched linear systems off-line is also presented. Numerical studies demonstrate the efficiency and effectiveness of the proposed MPC algorithm for switched nonlinear systems with MTC.
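
The sum-up-rounding (SUR) step mentioned in this abstract can be sketched for a single binary control on a uniform time grid. This is the textbook form of SUR and not the paper's implementation; the function name sum_up_rounding, the sample input, and the uniform step dt are assumptions, and the sketch does not enforce dwell-time constraints (the abstract places those in the NLP via move blocking).

```python
def sum_up_rounding(relaxed, dt=1.0):
    """Round relaxed controls in [0, 1] to {0, 1} with textbook sum-up rounding.

    At each step the binary control is switched on when the accumulated
    relaxed control exceeds the accumulated binary control by at least dt/2.
    """
    binary = []
    acc_relaxed = 0.0  # integral of the relaxed control so far
    acc_binary = 0.0   # integral of the rounded control so far
    for q in relaxed:
        acc_relaxed += q * dt
        b = 1 if acc_relaxed - acc_binary >= 0.5 * dt else 0
        binary.append(b)
        acc_binary += b * dt
    return binary


# Hypothetical relaxed solution on six intervals.
relaxed = [0.25, 0.25, 0.75, 0.5, 0.125, 0.875]
binary = sum_up_rounding(relaxed)

# Largest gap between the accumulated relaxed and rounded controls.
gap, max_gap = 0.0, 0.0
for q, b in zip(relaxed, binary):
    gap += q - b
    max_gap = max(max_gap, abs(gap))
print(binary, max_gap)
```

For this single-control case the accumulated gap never exceeds half a time step, which is the flavor of integer approximation bound the abstract alludes to; the paper's explicit bound may differ.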

[372]  arXiv:2002.09677 (cross-list from stat.ML) [pdf, other]
Title: Kernel interpolation with continuous volume sampling
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Probability (math.PR)

A fundamental task in kernel methods is to pick nodes and weights, so as to approximate a given function from an RKHS by the weighted sum of kernel translates located at the nodes. This is the crux of kernel density estimation, kernel quadrature, or interpolation from discrete samples. Furthermore, RKHSs offer a convenient mathematical and computational framework. We introduce and analyse continuous volume sampling (VS), the continuous counterpart -- for choosing node locations -- of a discrete distribution introduced in (Deshpande & Vempala, 2006). Our contribution is theoretical: we prove almost optimal bounds for interpolation and quadrature under VS. While similar bounds already exist for some specific RKHSs using ad-hoc node constructions, VS offers bounds that apply to any Mercer kernel and depend on the spectrum of the associated integration operator. We emphasize that, unlike previous randomized approaches that rely on regularized leverage scores or determinantal point processes, evaluating the pdf of VS only requires pointwise evaluations of the kernel. VS is thus naturally amenable to MCMC samplers.

[373]  arXiv:2002.09695 (cross-list from stat.ML) [pdf]
Title: A New Unified Deep Learning Approach with Decomposition-Reconstruction-Ensemble Framework for Time Series Forecasting
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)

A new variational mode decomposition (VMD) based deep learning approach is proposed in this paper for the time series forecasting problem. Firstly, VMD is adopted to decompose the original time series into several sub-signals. Then, a convolutional neural network (CNN) is applied to learn the reconstruction patterns on the decomposed sub-signals to obtain several reconstructed sub-signals. Finally, a long short term memory (LSTM) network is employed to forecast the time series with the decomposed sub-signals and the reconstructed sub-signals as inputs. The proposed VMD-CNN-LSTM approach originates from the decomposition-reconstruction-ensemble framework, and innovates by embedding the reconstruction, single forecasting, and ensemble steps in a unified deep learning approach. To verify the forecasting performance of the proposed approach, four typical time series datasets are introduced for empirical analysis. The empirical results demonstrate that the proposed approach consistently outperforms the benchmark approaches in terms of forecasting accuracy, and also indicate that the reconstructed sub-signals obtained by the CNN are important for further improving the forecasting performance.

[374]  arXiv:2002.09703 (cross-list from eess.IV) [pdf, other]
Title: Automatic Data Augmentation via Deep Reinforcement Learning for Effective Kidney Tumor Segmentation
Comments: 5 pages, 3 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Conventional data augmentation realized by performing simple pre-processing operations (e.g., rotation, crop, etc.) has been validated for its advantage in enhancing the performance for medical image segmentation. However, the data generated by these conventional augmentation methods are random and sometimes harmful to the subsequent segmentation. In this paper, we developed a novel automatic learning-based data augmentation method for medical image segmentation which models the augmentation task as a trial-and-error procedure using deep reinforcement learning (DRL). In our method, we innovatively combine the data augmentation module and the subsequent segmentation module in an end-to-end training manner with a consistent loss. Specifically, the best sequential combination of different basic operations is automatically learned by directly maximizing the performance improvement (i.e., Dice ratio) on the available validation set. We extensively evaluated our method on CT kidney tumor segmentation which validated the promising results of our method.

[375]  arXiv:2002.09711 (cross-list from physics.bio-ph) [pdf]
Title: Robotic modeling of snake traversing large, smooth obstacles reveals stability benefits of body compliance
Authors: Qiyuan Fu, Chen Li
Journal-ref: Royal Society Open Science (2020), 7, 191192
Subjects: Biological Physics (physics.bio-ph); Systems and Control (eess.SY); Quantitative Methods (q-bio.QM)

Snakes can move through almost any terrain. Although their locomotion on flat surfaces using planar gaits is inherently stable, when snakes deform their body out of plane to traverse complex terrain, maintaining stability becomes a challenge. On trees and desert dunes, snakes grip branches or brace against depressed sand for stability. However, how they stably surmount obstacles like boulders too large and smooth to gain such anchor points is less understood. Similarly, snake robots are challenged to stably traverse large, smooth obstacles for search and rescue and building inspection. Our recent study discovered that snakes combine body lateral undulation and cantilevering to stably traverse large steps. Here, we developed a snake robot with this gait and snake-like anisotropic friction and used it as a physical model to understand stability principles. The robot traversed steps as high as a third of its body length rapidly and stably. However, on higher steps, it was more likely to fail due to more frequent rolling and flipping over, which was absent in the snake with a compliant body. Adding body compliance reduced the robot roll instability by statistically improving surface contact, without reducing speed. Besides advancing understanding of snake locomotion, our robot achieved high traversal speed surpassing most previous snake robots and approaching snakes, while maintaining high traversal probability.

[376]  arXiv:2002.09735 (cross-list from stat.ML) [pdf, other]
Title: Partially Observed Dynamic Tensor Response Regression
Comments: This contains the main paper only. The supplement is available upon request
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

In modern data science, dynamic tensor data is prevailing in numerous applications. An important task is to characterize the relationship between such dynamic tensor and external covariates. However, the tensor data is often only partially observed, rendering many existing methods inapplicable. In this article, we develop a regression model with partially observed dynamic tensor as the response and external covariates as the predictor. We introduce the low-rank, sparsity and fusion structures on the regression coefficient tensor, and consider a loss function projected over the observed entries. We develop an efficient non-convex alternating updating algorithm, and derive the finite-sample error bound of the actual estimator from each step of our optimization algorithm. Unobserved entries in tensor response have imposed serious challenges. As a result, our proposal differs considerably in terms of estimation algorithm, regularity conditions, as well as theoretical properties, compared to the existing tensor completion or tensor response regression solutions. We illustrate the efficacy of our proposed method using simulations, and two real applications, a neuroimaging dementia study and a digital advertising study.

[377]  arXiv:2002.09737 (cross-list from stat.ML) [pdf, other]
Title: Amortised Learning by Wake-Sleep
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Models that employ latent variables to capture structure in observed data lie at the heart of many current unsupervised learning algorithms, but exact maximum-likelihood learning for powerful and flexible latent-variable models is almost always intractable. Thus, state-of-the-art approaches either abandon the maximum-likelihood framework entirely, or else rely on a variety of variational approximations to the posterior distribution over the latents. Here, we propose an alternative approach that we call amortised learning. Rather than computing an approximation to the posterior over latents, we use a wake-sleep Monte-Carlo strategy to learn a function that directly estimates the maximum-likelihood parameter updates. Amortised learning is possible whenever samples of latents and observations can be simulated from the generative model, treating the model as a "black box". We demonstrate its effectiveness on a wide range of complex models, including those with latents that are discrete or supported on non-Euclidean spaces.

[378]  arXiv:2002.09741 (cross-list from stat.ML) [pdf, other]
Title: VFlow: More Expressive Generative Flows with Variational Data Augmentation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Generative flows are promising tractable models for density modeling that define probabilistic distributions with invertible transformations. However, tractability imposes architectural constraints on generative flows, making them less expressive than other types of generative models. In this work, we study a previously overlooked constraint that all the intermediate representations must have the same dimensionality with the original data due to invertibility, limiting the width of the network. We tackle this constraint by augmenting the data with some extra dimensions and jointly learning a generative flow for augmented data as well as the distribution of augmented dimensions under a variational inference framework. Our approach, VFlow, is a generalization of generative flows and therefore always performs better. Combining with existing generative flows, VFlow achieves a new state-of-the-art 2.98 bits per dimension on the CIFAR-10 dataset and is more compact than previous models to reach similar modeling quality.

[379]  arXiv:2002.09769 (cross-list from stat.ML) [pdf, ps, other]
Title: Optimistic bounds for multi-output prediction
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We investigate the challenge of multi-output learning, where the goal is to learn a vector-valued function based on a supervised data set. This includes a range of important problems in Machine Learning including multi-target regression, multi-class classification and multi-label classification. We begin our analysis by introducing the self-bounding Lipschitz condition for multi-output loss functions, which interpolates continuously between a classical Lipschitz condition and a multi-dimensional analogue of a smoothness condition. We then show that the self-bounding Lipschitz condition gives rise to optimistic bounds for multi-output learning, which are minimax optimal up to logarithmic factors. The proof exploits local Rademacher complexity combined with a powerful minoration inequality due to Srebro, Sridharan and Tewari. As an application we derive a state-of-the-art generalization bound for multi-class gradient boosting.

[380]  arXiv:2002.09783 (cross-list from quant-ph) [pdf, other]
Title: Optimality Study of Existing Quantum Computing Layout Synthesis Tools
Comments: 15 pages, 7 figures
Subjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR)

Layout synthesis, an important step in quantum computing, processes quantum circuits to satisfy device layout constraints. In this paper, we construct QUEKO benchmarks for this problem, which have known optimal depth. We use QUEKO to evaluate the optimality of current layout synthesis tools, including Cirq from Google, Qiskit from IBM, $\mathsf{t}|\mathsf{ket}\rangle$ from Cambridge Quantum Computing, and recent academic work. To our surprise, despite over a decade of research and development by academia and industry on compilation and synthesis for quantum circuits, we are still able to demonstrate large optimality gaps. Even combining the best of all four solutions we evaluated, the gap is still about 4x for circuits with depths suitable for state-of-the-art devices. This suggests substantial room for improvement. Finally, we also prove the NP-completeness of the layout synthesis problem for quantum computing. We have made the QUEKO benchmarks open source.

[381]  arXiv:2002.09806 (cross-list from math.OC) [pdf, ps, other]
Title: Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games
Comments: 21 Pages. Under review
Subjects: Optimization and Control (math.OC); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)

We consider multi-agent learning via online gradient descent (OGD) in a class of games called $\lambda$-cocoercive games, a broad class of games that admits many Nash equilibria and that properly includes strongly monotone games. We characterize the finite-time last-iterate convergence rate for joint OGD learning on $\lambda$-cocoercive games; further, building on this result, we develop a fully adaptive OGD learning algorithm that does not require any knowledge of the problem parameter (e.g., the cocoercive constant $\lambda$) and show, via a novel double-stopping-time technique, that this adaptive algorithm achieves the same finite-time last-iterate convergence rate as its non-adaptive counterpart. Subsequently, we extend OGD learning to the noisy gradient feedback case and establish last-iterate convergence results---first qualitative almost sure convergence, then quantitative finite-time convergence rates---all under non-decreasing step-sizes. These results fill in several gaps in the existing multi-agent online learning literature, where three aspects---finite-time convergence rates, non-decreasing step-sizes, and fully adaptive algorithms---have not been previously explored.

[382]  arXiv:2002.09815 (cross-list from stat.ML) [pdf, other]
Title: Neuron Shapley: Discovering the Responsible Neurons
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We develop Neuron Shapley as a new framework to quantify the contribution of individual neurons to the prediction and performance of a deep network. By accounting for interactions across neurons, Neuron Shapley is more effective in identifying important filters compared to common approaches based on activation patterns. Interestingly, removing just 30 filters with the highest Shapley scores effectively destroys the prediction accuracy of Inception-v3 on ImageNet. Visualization of these few critical filters provides insights into how the network functions. Neuron Shapley is a flexible framework and can be applied to identify responsible neurons in many tasks. We illustrate additional applications of identifying filters that are responsible for biased prediction in facial recognition and filters that are vulnerable to adversarial attacks. Removing these filters is a quick way to repair models. Enabling all these applications is a new multi-arm bandit algorithm that we developed to efficiently estimate Neuron Shapley values.
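For readers unfamiliar with Shapley values, the sketch below estimates them by plain Monte Carlo sampling of permutations. This is not the multi-arm bandit algorithm described in the abstract; the filter names and the toy "accuracy" function are hypothetical and serve only to show how interactions between players enter the estimate.

```python
import random


def monte_carlo_shapley(players, value, num_permutations, seed=0):
    """Average each player's marginal contribution over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        previous = value(coalition)
        for p in order:
            coalition.add(p)
            current = value(coalition)
            totals[p] += current - previous
            previous = current
    return {p: t / num_permutations for p, t in totals.items()}


def toy_accuracy(active):
    # Hypothetical value function: each filter adds 0.1,
    # and filters f0 and f1 add a further 0.5 only when both are present.
    score = 0.1 * len(active)
    if "f0" in active and "f1" in active:
        score += 0.5
    return score


filters = ["f0", "f1", "f2", "f3"]
shapley = monte_carlo_shapley(filters, toy_accuracy, num_permutations=2000)
```

The interacting filters f0 and f1 receive a larger share than f2 and f3, and the estimates sum to the value of the full set minus the value of the empty set.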

[383]  arXiv:2002.09821 (cross-list from eess.AS) [pdf, other]
Title: A Multi-view CNN-based Acoustic Classification System for Automatic Animal Species Identification
Journal-ref: Ad Hoc Networks 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

Automatic identification of animal species by their vocalization is an important and challenging task. Although many kinds of audio monitoring systems have been proposed in the literature, they suffer from several disadvantages such as non-trivial feature selection, accuracy degradation because of environmental noise, or intensive local computation. In this paper, we propose a deep learning based acoustic classification framework for Wireless Acoustic Sensor Networks (WASN). The proposed framework is based on a cloud architecture which relaxes the computational burden on the wireless sensor node. To improve the recognition accuracy, we design a multi-view Convolutional Neural Network (CNN) to extract the short-, middle-, and long-term dependencies in parallel. The evaluation on two real datasets shows that the proposed architecture can achieve high accuracy and outperforms traditional classification systems significantly when the environmental noise dominates the audio signal (low SNR). Moreover, we implement and deploy the proposed system on a testbed and analyse the system performance in real-world environments. Both simulation and real-world evaluation demonstrate the accuracy and robustness of the proposed acoustic classification system in distinguishing species of animals.

[384]  arXiv:2002.09847 (cross-list from eess.IV) [pdf, other]
Title: Unsupervised Denoising for Satellite Imagery using Wavelet Subband CycleGAN
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Multi-spectral satellite imaging sensors acquire various spectral band images such as red (R), green (G), blue (B), near-infrared (N), etc. Thanks to the unique spectroscopic property of each spectral band with respect to the objects on the ground, multi-spectral satellite imagery can be used for various geological survey applications. Unfortunately, image artifacts from imaging sensor noise often affect the quality of scenes and have negative impacts on the applications of satellite imagery. Recently, deep learning approaches have been extensively explored for the removal of noise in satellite imagery. Most deep learning denoising methods, however, follow a supervised learning scheme, which requires matched noisy image and clean image pairs that are difficult to collect in real situations. In this paper, we propose a novel unsupervised multispectral denoising method for satellite imagery using a wavelet subband cycle-consistent adversarial network (WavCycleGAN). The proposed method is based on an unsupervised learning scheme using adversarial loss and cycle-consistency loss to overcome the lack of paired data. Moreover, in contrast to the standard image domain cycleGAN, we introduce a wavelet subband domain learning scheme for effective denoising without sacrificing high frequency components such as edges and detail information. Experimental results for the removal of vertical stripe and wave noise in satellite imaging sensors demonstrate that the proposed method effectively removes noise and preserves important high frequency features of satellite images.

[385]  arXiv:2002.09889 (cross-list from stat.ML) [pdf, other]
Title: Investigating the interaction between gradient-only line searches and different activation functions
Comments: 37 pages, 9 figures, submitted for journal review
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

Gradient-only line searches (GOLS) adaptively determine step sizes along search directions for discontinuous loss functions resulting from dynamic mini-batch sub-sampling in neural network training. Step sizes in GOLS are determined by localizing Stochastic Non-Negative Associated Gradient Projection Points (SNN-GPPs) along descent directions. These are identified by a sign change in the directional derivative from negative to positive along a descent direction. Activation functions are a significant component of neural network architectures as they introduce non-linearities essential for complex function approximations. The smoothness and continuity characteristics of the activation functions directly affect the gradient characteristics of the loss function to be optimized. Therefore, it is of interest to investigate the relationship between activation functions and different neural network architectures in the context of GOLS. We find that GOLS are robust for a range of activation functions, but sensitive to the Rectified Linear Unit (ReLU) activation function in standard feedforward architectures. The zero-derivative in ReLU's negative input domain can lead to the gradient-vector becoming sparse, which severely affects training. We show that implementing architectural features such as batch normalization and skip connections can alleviate these difficulties and benefit training with GOLS for all activation functions considered.
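The idea of locating a step size by a negative-to-positive sign change in the directional derivative can be sketched as follows. This is a simplified deterministic bracket-and-bisect routine on a smooth toy function, not the authors' GOLS implementation and without mini-batch sub-sampling; all names and the test function are assumptions.

```python
def directional_derivative(grad, x, d, alpha):
    # Derivative of the loss along direction d at the point x + alpha * d.
    point = [xi + alpha * di for xi, di in zip(x, d)]
    return sum(gi * di for gi, di in zip(grad(point), d))


def find_sign_change(grad, x, d, alpha0=1.0, tol=1e-8, max_iter=200):
    """Return a step size near where the directional derivative turns positive."""
    lo, hi = 0.0, alpha0
    # Grow the bracket until the derivative at hi is positive.
    for _ in range(60):
        if directional_derivative(grad, x, d, hi) > 0:
            break
        lo, hi = hi, 2.0 * hi
    # Bisect on the sign of the directional derivative only; no function values are used.
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if directional_derivative(grad, x, d, mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def toy_grad(p):
    # Gradient of the toy loss (p0 - 3)^2.
    return [2.0 * (p[0] - 3.0)]


alpha = find_sign_change(toy_grad, x=[0.0], d=[1.0])
```

Starting at x = 0 and moving in direction +1, the sign change sits at a step size of 3, which is where the toy loss is minimised along that direction.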

[386]  arXiv:2002.09914 (cross-list from stat.ML) [pdf, other]
Title: ChemGrapher: Optical Graph Recognition of Chemical Compounds by Deep Learning
Comments: 16 pages, 6 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In drug discovery, knowledge of the graph structure of chemical compounds is essential. Many thousands of scientific articles in chemistry and pharmaceutical sciences have investigated chemical compounds, but in some cases the details of the structure of these chemical compounds are published only as images. A tool to analyze these images automatically and convert them into a chemical graph structure would be useful for many applications, such as drug discovery. A few such tools are available and they are mostly derived from optical character recognition. However, our evaluation of the performance of those tools reveals that they often make mistakes in detecting the correct bond multiplicity and stereochemical information. In addition, errors sometimes even lead to missing atoms in the resulting graph. In our work, we address these issues by developing a compound recognition method based on machine learning. More specifically, we develop a deep neural network model for optical compound recognition. The deep learning solution presented here consists of a segmentation model, followed by three classification models that predict atom locations, bonds and charges. Furthermore, this model not only predicts the graph structure of the molecule but also produces all information necessary to relate each component of the resulting graph to the source image. This solution is scalable and could rapidly process thousands of images. Finally, we empirically compare the proposed method to a well-established tool and observe significant error reductions.

[387]  arXiv:2002.09916 (cross-list from math.OC) [pdf, ps, other]
Title: Extended formulation and valid inequalities for the multi-item inventory lot-sizing problem with supplier selection
Subjects: Optimization and Control (math.OC); Computational Complexity (cs.CC)

This paper considers the multi-item inventory lot-sizing problem with supplier selection. The problem consists in determining an optimal purchasing plan in order to satisfy dynamic deterministic demands for multiple items over a finite planning horizon, taking into account the fact that multiple suppliers are available to purchase from. As the complexity of the problem was an open question, we show that it is NP-hard. We propose a facility location extended formulation for the problem which can be preprocessed based on the cost structure and describe new valid inequalities in the original space of variables, which we denote $(l,S_j)$-inequalities. Furthermore, we study the projection of the extended formulation into the original space and show the connection between the inequalities generated by this projection and the newly proposed $(l,S_j)$-inequalities. Additionally, we present a simple, easy-to-implement, yet very effective MIP (mixed integer programming) heuristic using the extended formulation. Computational results show that the preprocessed facility location extended formulation outperforms all other formulations for small and medium instances, as it can solve nearly all of them to optimality within the time limit. Moreover, the presented MIP heuristic is able to obtain solutions which strictly improve those achieved by a state-of-the-art method for all the large benchmark instances.

[388]  arXiv:2002.09929 (cross-list from math.AP) [pdf, other]
Title: Solvability for Photoacoustic Imaging with Idealized Piezoelectric Sensors
Authors: Sebastian Acosta
Subjects: Analysis of PDEs (math.AP); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)

Most reconstruction algorithms for photoacoustic imaging assume that the pressure field is measured by ultrasound sensors placed on a detection surface. However, such sensors do not measure pressure exactly due to their non-uniform directional and frequency responses, and resolution limitations. This is the case for piezoelectric sensors that are commonly employed for acoustics-based biomedical imaging. In this paper, using the method of matched asymptotic expansions and the basic constitutive relations for piezoelectricity, we propose a simple mathematical model for piezoelectric transducers. The approach simultaneously models how the pressure waves induce the piezoelectric measurements and how the presence of the sensors affects the pressure waves. Using this model, we analyze whether the data gathered by piezoelectric sensors leads to the solvability of the photoacoustic imaging problem. We conclude that this imaging problem is well-posed in certain normed spaces and under a geometric assumption. We also propose an iterative reconstruction algorithm that incorporates the model for piezoelectric measurements. Numerical implementation of the reconstruction algorithm is presented.

[389]  arXiv:2002.09946 (cross-list from physics.flu-dyn) [pdf, other]
Title: The Whitham Equation with Surface Tension
Comments: 19 pages, 5 figures, 1 table, 36 references. Other author's papers can be downloaded at this http URL arXiv admin note: text overlap with arXiv:1410.8299
Journal-ref: Nonlinear Dynamics (2017), Vol. 88, pp. 1125-1138
Subjects: Fluid Dynamics (physics.flu-dyn); Analysis of PDEs (math.AP); Numerical Analysis (math.NA); Atmospheric and Oceanic Physics (physics.ao-ph); Computational Physics (physics.comp-ph)

The viability of the Whitham equation as a nonlocal model for capillary-gravity waves at the surface of an inviscid incompressible fluid is under study. A nonlocal Hamiltonian system of model equations is derived using the Hamiltonian structure of the free surface water wave problem and the Dirichlet-Neumann operator. The system features gravitational and capillary effects, and when restricted to one-way propagation, the system reduces to the capillary Whitham equation. It is shown numerically that in various scaling regimes the Whitham equation gives a more accurate approximation of the free-surface problem for the Euler system than other models like the KdV and Kawahara equations. In the case of relatively strong capillarity considered here, the KdV and Kawahara equations outperform the Whitham equation with surface tension only for very long waves with negative polarity.

[390]  arXiv:2002.09954 (cross-list from stat.ML) [pdf, other]
Title: Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Gaussian processes (GP) are one of the most successful frameworks to model uncertainty. However, GP optimization (e.g., GP-UCB) suffers from major scalability issues. Experimental time grows linearly with the number of evaluations, unless candidates are selected in batches (e.g., using GP-BUCB) and evaluated in parallel. Furthermore, computational cost is often prohibitive since algorithms such as GP-BUCB require a time at least quadratic in the number of dimensions and iterations to select each batch. In this paper, we introduce BBKB (Batch Budgeted Kernel Bandits), the first no-regret GP optimization algorithm that provably runs in near-linear time and selects candidates in batches. This is obtained with a new guarantee for the tracking of the posterior variances that allows BBKB to choose increasingly larger batches, improving over GP-BUCB. Moreover, we show that the same bound can be used to adaptively delay costly updates to the sparse GP approximation used by BBKB, achieving a near-constant per-step amortized cost. These findings are then confirmed in several experiments, where BBKB is much faster than state-of-the-art methods.

[391]  arXiv:2002.09970 (cross-list from quant-ph) [pdf, other]
Title: Computer-inspired Quantum Experiments
Comments: Comments and suggestions for additional references are welcome!
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The design of new devices and experiments in science and engineering has historically relied on the intuitions of human experts. This credo, however, has changed. In many disciplines, computer-inspired design processes, also known as inverse-design, have augmented the capability of scientists. Here we visit different fields of physics in which computer-inspired designs are applied. We will meet vastly diverse computational approaches based on topological optimization, evolutionary strategies, deep learning, reinforcement learning or automated reasoning. Then we turn our attention specifically to quantum physics. In the quest for designing new quantum experiments, we face two challenges: First, quantum phenomena are unintuitive. Second, the number of possible configurations of quantum experiments explodes combinatorially. To overcome these challenges, physicists began to use algorithms for computer-designed quantum experiments. We focus on the most mature and \textit{practical} approaches that scientists used to find new complex quantum experiments, which experimentalists have subsequently realized in the laboratory. The underlying idea is a highly-efficient topological search, which allows for scientific interpretability. In that way, some of the computer-designs have led to the discovery of new scientific concepts and ideas -- demonstrating how computer algorithms can genuinely contribute to science by providing unexpected inspirations. We discuss several extensions and alternatives based on optimization and machine learning techniques, with the potential of accelerating the discovery of practical computer-inspired experiments or concepts in the future. Finally, we discuss what we can learn from the different approaches in the fields of physics, and raise several fascinating possibilities for future research.

[392]  arXiv:2002.09980 (cross-list from math.CA) [pdf, ps, other]
Title: Orthogonal Systems of Spline Wavelets as Unconditional Bases in Sobolev Spaces
Comments: 21 pages, 1 figure
Subjects: Classical Analysis and ODEs (math.CA); Functional Analysis (math.FA); Numerical Analysis (math.NA)

We exhibit the necessary range for which functions in the Sobolev spaces $L^s_p$ can be represented as an unconditional sum of orthonormal spline wavelet systems, such as the Battle-Lemari\'e wavelets. We also consider the natural extensions to Triebel-Lizorkin spaces. This builds upon, and is a generalization of, previous work of Seeger and Ullrich, where analogous results were established for the Haar wavelet system.

[393]  arXiv:2002.09996 (cross-list from stat.ML) [pdf, other]
Title: ConBO: Conditional Bayesian Optimization
Comments: 10 pages, 7 pages appendix
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

Bayesian optimization is a class of data-efficient, model-based algorithms typically focused on global optimization. We consider the more general case where a user is faced with multiple problems that each need to be optimized conditional on a state variable; for example, we optimize the location of ambulances conditioned on patient distribution given a range of cities with different patient distributions. Similarity across objectives boosts optimization of each objective in two ways: in modelling by data sharing across objectives, and also in acquisition by quantifying how all objectives benefit from a single point on one objective. For this we propose ConBO, a novel efficient algorithm based on a new hybrid Knowledge Gradient method, which outperforms recently published works on synthetic and real world problems and is easily parallelized to collect a batch of points.

[394]  arXiv:2002.09998 (cross-list from stat.ME) [pdf, other]
Title: Generalized Bayesian Filtering via Sequential Monte Carlo
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)

We introduce a framework for inference in general state-space hidden Markov models (HMMs) under likelihood misspecification. In particular, we leverage the loss-theoretic perspective of generalized Bayesian inference (GBI) to define generalized filtering recursions in HMMs that can tackle the problem of inference under model misspecification. In doing so, we arrive at principled procedures for robust inference against observation contamination through the $\beta$-divergence. Operationalizing the proposed framework is made possible via sequential Monte Carlo methods (SMC). The standard particle methods, and their associated convergence results, are readily generalized to the new setting. We demonstrate our approach on object tracking and Gaussian process regression problems, and observe improved performance over standard filtering algorithms.

[395]  arXiv:2002.10020 (cross-list from eess.SP) [pdf, other]
Title: Optimal Jammer Placement in UAV-assisted Relay Networks
Comments: 6 pages, 6 figures
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)

We consider the relaying application of unmanned aerial vehicles (UAVs), in which UAVs are placed between two transceivers (TRs) to increase the throughput of the system. Instead of studying the placement of UAVs as pursued in existing literature, we focus on investigating the placement of a jammer or a major source of interference on the ground to effectively degrade the performance of the system, which is measured by the maximum achievable data rate of transmission between the TRs. We demonstrate that the optimal placement of the jammer is in general a non-convex optimization problem, for which obtaining the solution directly is intractable. Afterward, using the inherent characteristics of the signal-to-interference ratio (SIR) expressions, we propose a tractable approach to find the optimal position of the jammer. Based on the proposed approach, we investigate the optimal positioning of the jammer in both dual-hop and multi-hop UAV relaying settings. Numerical simulations are provided to evaluate the performance of our proposed method.

[396]  arXiv:2002.10022 (cross-list from physics.ao-ph) [pdf]
Title: Application of ERA5 and MENA simulations to predict offshore wind energy potential
Comments: 21 pages, 12 figures
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)

This study explores wind energy resources in different locations through the Gulf of Oman and their future variability due to climate change impacts. In this regard, EC-EARTH near surface wind outputs obtained from CORDEX-MENA simulations are used for historical and future projection of the energy. The ERA5 wind data are employed to assess the suitability of the climate model. Moreover, the ERA5 wave data over the study area are applied to compute sea surface roughness as an important variable for converting near surface wind speeds to wind speeds at turbine hub-height. Considering the power distribution, bathymetry and distance from the coast, some spots are selected as tentative energy hotspots to provide a detailed assessment of directional and temporal variability and to investigate climate change impacts. RCP8.5, a common climatic scenario, is used to project and extract future variation of the energy at the selected sites. The results of this study demonstrate that the selected locations have suitable potential for wind turbine planning and construction.

[397]  arXiv:2002.10023 (cross-list from math.OC) [pdf, other]
Title: Suboptimal Stabilization of Unknown Nonlinear Systems via Extended State Observers
Authors: Amir Shakouri
Comments: 5 pages, 2 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper introduces a globally asymptotically stable, locally optimal stabilizer for multi-input multi-output nonlinear systems of any order with totally unknown dynamics in a special form. The control scheme proposed in this paper lies at the intersection of the active disturbance rejection control (ADRC) and the state-dependent Riccati equation (SDRE) control method. It is shown that using an extended state observer, the state-dependent coefficient matrix of the nonlinear system can be estimated. The system is then stabilized by a suboptimal controller in the region where the SDRE method is effective (an estimated region of attraction), and an ADRC is used outside the region as a backup for global stability assurance.

[398]  arXiv:2002.10032 (cross-list from eess.IV) [pdf, other]
Title: Generalized Octave Convolutions for Learned Multi-Frequency Image Compression
Comments: 10 pages, 7 figures, 3 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Learned image compression has recently shown the potential to outperform all standard codecs. The state-of-the-art rate-distortion performance has been achieved by context-adaptive entropy approaches in which hyperprior and autoregressive models are jointly utilized to effectively capture the spatial dependencies in the latent representations. However, the latents contain a mixture of high and low frequency information, which has been inefficiently represented by feature maps of the same spatial resolution in previous works. In this paper, we propose the first learned multi-frequency image compression approach that uses the recently developed octave convolutions to factorize the latents into high and low frequencies. Since the low frequency is represented by a lower resolution, its spatial redundancy is reduced, which improves the compression rate. Moreover, octave convolutions impose effective high and low frequency communication, which can improve the reconstruction quality. We also develop novel generalized octave convolution and octave transposed-convolution architectures with internal activation layers to preserve the spatial structure of the information. Our experiments show that the proposed scheme outperforms all standard codecs and learning-based methods in both PSNR and MS-SSIM metrics, and establishes the new state of the art for learned image compression.

[399]  arXiv:2002.10034 (cross-list from q-bio.QM) [pdf, other]
Title: Predicting Rate of Cognitive Decline at Baseline Using a Deep Neural Network with Multidata Analysis
Authors: Sema Candemir, Xuan V. Nguyen, Luciano M. Prevedello, Matthew T. Bigelow, Richard D. White, Barbaros S. Erdal (for the Alzheimer's Disease Neuroimaging Initiative)
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)

This study investigates whether a machine-learning-based system can predict the rate of cognitive decline in mildly cognitively impaired (MCI) patients by processing only the clinical and imaging data collected at the initial visit. We build a predictive model based on a supervised hybrid neural network utilizing a 3-Dimensional Convolutional Neural Network to perform volume analysis of Magnetic Resonance Imaging (MRI) and integration of non-imaging clinical data at the fully connected layer of the architecture. The analysis is performed on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Experimental results confirm that there is a correlation between cognitive decline and the data obtained at the first visit. The system achieved an area under the receiver operating characteristic curve (AUC) of 66.6% for cognitive decline class prediction.
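To make the reported metric concrete, the sketch below computes AUC as the fraction of positive-negative pairs that a classifier ranks correctly, counting ties as one half. The labels and scores are made-up toy values, not data from the study, and the function name is hypothetical.

```python
def pairwise_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5  # ties count as half a correct ranking
    return correct / (len(positives) * len(negatives))


# Toy example (assumed values): three of the four positive-negative pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = pairwise_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the 66.6% figure.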

[400]  arXiv:2002.10060 (cross-list from stat.ML) [pdf, other]
Title: Handling the Positive-Definite Constraint in the Bayesian Learning Rule
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The Bayesian learning rule is a recently proposed variational inference method, which not only contains many existing learning algorithms as special cases but also enables the design of new algorithms. Unfortunately, when posterior parameters lie in an open constraint set, the rule may not satisfy the constraints and requires line-searches which could slow down the algorithm. In this paper, we fix this issue for the positive-definite constraint by proposing an improved rule that naturally handles the constraint. Our modification is obtained using Riemannian gradient methods, and is valid when the approximation attains a \emph{block-coordinate natural parameterization} (e.g., Gaussian distributions and their mixtures). Our method outperforms existing methods without any significant increase in computation. Our work makes it easier to apply the learning rule in the presence of positive-definite constraints in parameter spaces.

[401]  arXiv:2002.10091 (cross-list from stat.ME) [pdf, other]
Title: Towards precise causal effect estimation from data with hidden variables
Authors: Debo Cheng (1), Jiuyong Li (1), Lin Liu (1), Kui Yu (2), Thuc Duy Lee (1), Jixue Liu (1) ((1) School of Information Technology and Mathematical Sciences, University of South Australia (2) School of Computer Science and Information Engineering, Hefei University of Technology)
Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI)

Causal effect estimation from observational data is a crucial but challenging task. Currently, only a limited number of data-driven causal effect estimation methods are available. These methods either only provide a bound estimation of the causal effect of a treatment on the outcome, or have impractical assumptions on the data or low efficiency although providing a unique estimation of the causal effect. In this paper, we identify a practical problem setting and propose an approach to achieving unique causal effect estimation from data with hidden variables under this setting. For the approach, we develop the theorems to support the discovery of the proper covariate sets for confounding adjustment (adjustment sets). Based on the theorems, two algorithms are presented for finding the proper adjustment sets from data with hidden variables to obtain unbiased and unique causal effect estimation. Experiments with benchmark Bayesian networks and real-world datasets have demonstrated the efficiency and effectiveness of the proposed algorithms, indicating the practicability of the identified problem setting and the potential of the approach in real-world applications.

[402]  arXiv:2002.10118 (cross-list from stat.ML) [pdf, other]
Title: Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The point estimates of ReLU classification networks---arguably the most widely used neural network architecture---have been shown to yield arbitrarily high confidence far away from the training data. This architecture, in conjunction with a maximum a posteriori estimation scheme, is thus neither calibrated nor robust. Approximate Bayesian inference has been empirically demonstrated to improve predictive uncertainty in neural networks, although the theoretical analysis of such Bayesian approximations is limited. We theoretically analyze approximate Gaussian posterior distributions on the weights of ReLU networks and show that they fix the overconfidence problem. Furthermore, we show that even a simplistic, thus cheap, Bayesian approximation also fixes these issues. This indicates that a sufficient condition for a calibrated uncertainty on a ReLU network is ``to be a bit Bayesian''. These theoretical results validate the usage of last-layer Bayesian approximation and motivate a range of fidelity-cost trade-offs. We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.

[403]  arXiv:2002.10123 (cross-list from eess.IV) [pdf, other]
Title: Fusion of Camera Model and Source Device Specific Forensic Methods for Improved Tamper Detection
Comments: 13 pages
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Signal Processing (eess.SP)

The PRNU-based camera recognition method is widely studied in the image forensic literature. In recent years, CNN-based camera model recognition methods have been developed. These two methods also provide solutions to the tamper localization problem. In this paper, we propose their combination via a Neural Network to achieve better small-scale tamper detection performance. According to the results, the fusion method performs better than the underlying methods even under high JPEG compression. For forgeries as small as 100$\times$100 pixels, the proposed method outperforms the state-of-the-art, which validates the usefulness of fusion for localization of small-size image forgeries. We believe the proposed approach is feasible for any tamper-detection pipeline using the PRNU-based methodology.

[404]  arXiv:2002.10133 (cross-list from physics.comp-ph) [pdf, other]
Title: Non-isothermal Scharfetter-Gummel scheme for electro-thermal transport simulation in degenerate semiconductors
Subjects: Computational Physics (physics.comp-ph); Other Condensed Matter (cond-mat.other); Numerical Analysis (math.NA); Applied Physics (physics.app-ph)

Electro-thermal transport phenomena in semiconductors are described by the non-isothermal drift-diffusion system. The equations take a remarkably simple form when assuming the Kelvin formula for the thermopower. We present a novel, non-isothermal generalization of the Scharfetter-Gummel finite volume discretization for degenerate semiconductors obeying Fermi-Dirac statistics, which preserves numerous structural properties of the continuous model on the discrete level. The approach is demonstrated by 2D simulations of a heterojunction bipolar transistor.

[405]  arXiv:2002.10138 (cross-list from physics.soc-ph) [pdf, ps, other]
Title: Universality of citation distributions and its explanation
Comments: 23 pages, 9 figures
Subjects: Physics and Society (physics.soc-ph); Digital Libraries (cs.DL)

Universality or near-universality of citation distributions was found empirically a decade ago, but its theoretical justification has been lacking so far. Here, we systematically study citation distributions for different disciplines in order to characterize this putative universality and to understand it theoretically. Using our calibrated model of citation dynamics, we find a microscopic explanation of the universality of citation distributions and explain deviations therefrom. We demonstrate that the citation count of a paper is determined, on the one hand, by its fitness -- the attribute which, for most papers, is set at the moment of publication. The fitness distributions for different disciplines are very similar and can be approximated by the log-normal distribution. On the other hand, the citation dynamics of a paper is related to the mechanism by which the knowledge about it spreads in the scientific community. This viral propagation is non-universal and discipline-specific. Thus, universality of citation distributions traces its origin to the fitness distribution, while deviations from universality are associated with the discipline-specific citation dynamics of papers.

[406]  arXiv:2002.10167 (cross-list from eess.SP) [pdf, other]
Title: Joint blind calibration and time-delay estimation for multiband ranging
Comments: 4 pages, 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)

In this paper, we focus on the problem of blind joint calibration of multiband transceivers and time-delay (TD) estimation of multipath channels. We show that this problem can be formulated as a particular case of covariance matching. Although this problem is severely ill-posed, prior information about radio-frequency chain distortions and multipath channel sparsity is used for regularization. This approach leads to a biconvex optimization problem, which is formulated as a rank-constrained linear system and solved by a simple group Lasso algorithm. Numerical experiments show that the proposed algorithm provides better calibration and higher resolution for TD estimation than current state-of-the-art methods.

[407]  arXiv:2002.10183 (cross-list from physics.ins-det) [pdf, other]
Title: J-PET Framework: Software platform for PET tomography data reconstruction and analysis
Comments: 12 pages, 4 figures
Subjects: Instrumentation and Detectors (physics.ins-det); Software Engineering (cs.SE); Medical Physics (physics.med-ph)

J-PET Framework is an open-source software platform for data analysis, written in C++ and based on the ROOT package. It provides a common environment for implementation of reconstruction, calibration and filtering procedures, as well as for user-level analyses of Positron Emission Tomography data. The library contains a set of building blocks that can be combined by users with even little programming experience, into chains of processing tasks through a convenient, simple and well-documented API. The generic input-output interface allows processing the data from various sources: low-level data from the tomography acquisition system or from diagnostic setups such as digital oscilloscopes, as well as high-level tomography structures e.g. sinograms or a list of lines-of-response. Moreover, the environment can be interfaced with Monte Carlo simulation packages such as GEANT and GATE, which are commonly used in the medical scientific community.

[408]  arXiv:2002.10201 (cross-list from eess.IV) [pdf, other]
Title: Beyond Camera Motion Removing: How to Handle Outliers in Deblurring
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Performing camera motion deblurring is an important low-level vision task for achieving better imaging quality. When a scene has outliers such as saturated pixels and salt-and-pepper noise, the image becomes more difficult to restore. In this paper, we propose an edge-aware scale-recurrent network (EASRN) to conduct camera motion deblurring. EASRN has a separate deblurring module that removes blur at multiple scales and an upsampling module that fuses different input scales. We propose a salient edge detection network to supervise the training process and solve the outlier problem by proposing a novel method of dataset generation. Light streaks are printed on the sharp image to simulate the cutoff effect from saturation. We evaluate our method on the standard deblurring datasets. Both objective evaluation indexes and subjective visualization show that our method results in better deblurring quality than the other state-of-the-art approaches.

[409]  arXiv:2002.10206 (cross-list from q-fin.CP) [pdf, ps, other]
Title: Hybrid, adaptive, and positivity preserving numerical methods for the Cox-Ingersoll-Ross model
Comments: 24 pages, 3 figures, 2 tables
Subjects: Computational Finance (q-fin.CP); Numerical Analysis (math.NA)

We introduce an adaptive Euler method for the approximate solution of the Cox-Ingersoll-Ross short rate model. An explicit discretisation is applied over an adaptive mesh to the stochastic differential equation (SDE) governing the square root of the solution, relying upon a class of path-bounded timestepping strategies which work by reducing the stepsize as solutions approach a neighbourhood of zero. The method is hybrid in the sense that a backstop method is invoked if the timestep becomes too small, or to prevent solutions from overshooting zero and becoming negative. Under parameter constraints that imply Feller's condition, we prove that such a scheme is strongly convergent, of order at least 1/2. Under Feller's condition we also prove that the probability of ever needing the backstop method to prevent a negative value can be made arbitrarily small. Numerically, we compare this adaptive method to fixed step schemes extant in the literature, both implicit and explicit, and a novel semi-implicit adaptive variant. We observe that the adaptive approach leads to methods that are competitive over the entire domain of Feller's condition.
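
The general idea of the abstract above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the authors' scheme: the step-size rule h = h_max * min(1, y), the thresholds h_max and h_min, and the backstop (a truncated Euler step on the original variable) are all hypothetical choices. The only ingredient taken as given is the SDE for Y = sqrt(X) obtained from the CIR model dX = kappa*(theta - X) dt + sigma*sqrt(X) dW by Ito's formula.

```python
import math
import random


def cir_adaptive_path(x0, kappa, theta, sigma, t_end,
                      h_max=0.01, h_min=1e-5, rng=None):
    """Simulate one CIR path via explicit Euler on Y = sqrt(X).

    Ito's formula gives
        dY = ((4*kappa*theta - sigma**2) / (8*Y) - kappa*Y/2) dt + sigma/2 dW.
    The step shrinks as Y approaches zero (hypothetical rule). A backstop
    step is used if the step would fall below h_min or the Euler proposal
    is non-positive (hypothetical backstop: truncated Euler on X).
    """
    rng = rng or random.Random(0)
    alpha = (4.0 * kappa * theta - sigma ** 2) / 8.0
    t, y = 0.0, math.sqrt(x0)
    times, values, backstops = [0.0], [x0], 0
    while t_end - t > 1e-12:
        h = h_max * min(1.0, y)            # smaller steps near zero
        need_backstop = h < h_min
        if need_backstop:
            h = h_min
        h = min(h, t_end - t)              # do not step past t_end
        dw = rng.gauss(0.0, math.sqrt(h))
        y_new = -1.0
        if not need_backstop:
            y_new = y + h * (alpha / y - 0.5 * kappa * y) + 0.5 * sigma * dw
        if y_new <= 0.0:
            # Backstop: Euler step on X itself, floored at zero.
            x = y * y
            x_new = max(x + kappa * (theta - x) * h
                        + sigma * math.sqrt(x) * dw, 0.0)
            y_new = math.sqrt(x_new)
            backstops += 1
        t += h
        y = y_new
        times.append(t)
        values.append(y * y)
    return times, values, backstops


if __name__ == "__main__":
    # Illustrative parameters satisfying Feller's condition 2*kappa*theta >= sigma**2.
    times, values, backstops = cir_adaptive_path(0.04, 2.0, 0.04, 0.3, 1.0)
    print(len(times), min(values), backstops)
```

Working with the square root keeps the diffusion coefficient constant, which is why the positivity problem reduces to controlling the singular drift term near zero.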

[410]  arXiv:2002.10218 (cross-list from astro-ph.CO) [pdf, other]
Title: Baryon acoustic oscillations reconstruction using convolutional neural networks
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Machine Learning (cs.LG)

Here we propose a new scheme to reconstruct the baryon acoustic oscillations (BAO) signal, which carries key cosmological information, based on deep convolutional neural networks. After training the network with almost no fine-tuning, the network recovers large-scale modes accurately on the test set: the correlation coefficient between the ground truth and recovered initial conditions still reaches $90\%$ at $k \leq 0.2~ h\mathrm{Mpc}^{-1}$, which significantly improves the BAO signal-to-noise ratio up to the scale $k=0.4~ h\mathrm{Mpc}^{-1}$. Furthermore, our scheme is independent of the survey boundary since it reconstructs the initial conditions based on the local density distribution in configuration space, which means that we can gain more information from the whole survey space. Finally, we find that our trained network is not sensitive to the cosmological parameters and works very well in cosmologies close to that of our training set. This new scheme may help us extract more information from current, ongoing and future galaxy surveys.

[411]  arXiv:2002.10243 (cross-list from stat.ML) [pdf, other]
Title: Informative Gaussian Scale Mixture Priors for Bayesian Neural Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Encoding domain knowledge into the prior over the high-dimensional weight space is challenging in Bayesian neural networks. Two types of domain knowledge are commonly available in scientific applications: 1. feature sparsity (number of relevant features); 2. signal-to-noise ratio, quantified, for instance, as the proportion of variance explained (PVE). We show both types of domain knowledge can be encoded into the widely used Gaussian scale mixture priors with Automatic Relevance Determination. Specifically, we propose a new joint prior over the local (i.e., feature-specific) scale parameters to encode the knowledge about feature sparsity, and an algorithm to determine the global scale parameter (shared by all features) according to the PVE. Empirically, we show that the proposed informative prior improves prediction accuracy on publicly available datasets and in a genetics application.

[412]  arXiv:2002.10247 (cross-list from q-fin.ST) [pdf]
Title: Forecasting Foreign Exchange Rate: A Multivariate Comparative Analysis between Traditional Econometric, Contemporary Machine Learning & Deep Learning Techniques
Comments: 10 pages
Subjects: Statistical Finance (q-fin.ST); Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)

In today's global economy, accuracy in predicting macroeconomic parameters such as the foreign exchange rate, or at least estimating the trend correctly, is of key importance for any future investment. In recent times, the use of computational intelligence-based techniques for forecasting macroeconomic variables has proven highly successful. This paper presents a multivariate time series approach to forecast the exchange rate (USD/INR) while comparing the performance of three multivariate prediction modelling techniques: Vector Auto Regression (a Traditional Econometric Technique), Support Vector Machine (a Contemporary Machine Learning Technique), and Recurrent Neural Networks (a Contemporary Deep Learning Technique). We have used monthly historical data for several macroeconomic variables from April 1994 to December 2018 for USA and India to predict the USD-INR Foreign Exchange Rate. The results show that the contemporary techniques of SVM and RNN (Long Short-Term Memory) outperform the widely used traditional method of Auto Regression. The RNN model with Long Short-Term Memory (LSTM) provides the maximum accuracy (97.83%), followed by the SVM Model (97.17%) and the VAR Model (96.31%). Finally, we present a brief analysis of the correlation and interdependencies of the variables used for forecasting.

[413]  arXiv:2002.10257 (cross-list from eess.IV) [pdf, other]
Title: Using wavelets to analyze similarities in image datasets
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Deep learning image classifiers usually rely on huge training sets, and their training process can be described as learning the similarities and differences among training images. However, images in large training sets are not usually studied from this perspective, and fine-level similarities and differences among images are usually overlooked. Some studies aim to identify the influential and redundant training images, but such methods require a model that is already trained on the entire training set. Here, we show that analyzing the contents of large training sets can provide valuable insights about the classification task at hand, prior to training a model on them. We use wavelet decomposition of images and other image processing tools to perform such analysis, with no need for a pre-trained model. This makes the analysis of training sets straightforward and fast. We show that similar images in standard datasets (such as CIFAR) can be identified in a few seconds, a significant speed-up compared to alternative methods in the literature. We also show that similarities between training and testing images may explain the generalization of models and their mistakes. Finally, we investigate the similarities between images in relation to decision boundaries of a trained model.

[414]  arXiv:2002.10271 (cross-list from stat.ML) [pdf, other]
Title: Testing Goodness of Fit of Conditional Density Models with Kernels
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

We propose two nonparametric statistical tests of goodness of fit for conditional distributions: given a conditional probability density function $p(y|x)$ and a joint sample, decide whether the sample is drawn from $p(y|x)r_x(x)$ for some density $r_x$. Our tests, formulated with a Stein operator, can be applied to any differentiable conditional density model, and require no knowledge of the normalizing constant. We show that 1) our tests are consistent against any fixed alternative conditional model; 2) the statistics can be estimated easily, requiring no density estimation as an intermediate step; and 3) our second test offers an interpretable test result providing insight on where the conditional model does not fit well in the domain of the covariate. We demonstrate the interpretability of our test on a task of modeling the distribution of New York City's taxi drop-off location given a pick-up point. To our knowledge, our work is the first to propose such conditional goodness-of-fit tests that simultaneously have all these desirable properties.

[415]  arXiv:2002.10290 (cross-list from nucl-th) [pdf, other]
Title: Trees and Forests in Nuclear Physics
Subjects: Nuclear Theory (nucl-th); Machine Learning (cs.LG); Nuclear Experiment (nucl-ex)

We present a detailed introduction to the decision tree algorithm using some simple examples taken from the domain of nuclear physics. We show how to improve the accuracy of the classical liquid drop nuclear mass model by performing feature engineering while using a decision tree. Finally, we apply the method to the Duflo-Zucker mass model, showing that, despite their simplicity, decision trees are capable of obtaining a level of accuracy comparable to more complex neural networks, while using far fewer adjustable parameters and yielding models that are easier to explain.

[416]  arXiv:2002.10291 (cross-list from math.OC) [pdf, other]
Title: Estimation-aware model predictive path-following control for a general 2-trailer with a car-like tractor
Comments: Submitted to IEEE Transactions on Robotics. arXiv admin note: text overlap with arXiv:2002.06874
Subjects: Optimization and Control (math.OC); Robotics (cs.RO)

The design of the path-following controller is crucial to enable reliable autonomous vehicle operation. This design problem is especially challenging for a general 2-trailer with a car-like tractor due to the tractor's curvature limitations and the vehicle's structurally unstable joint-angle kinematics in backward motion. Additionally, to make the control system independent of any sensor mounted on the trailer, advanced sensors placed in the rear of the tractor have been proposed to solve the joint-angle estimation problem. Since these sensors typically have a limited field of view, the proposed estimation solution introduces restrictions on the joint-angle configurations that can be estimated with high accuracy. To model and explicitly consider these constraints in the controller, a model predictive path-following control approach is proposed. Two approaches with different computation complexity and performance are presented. In the first approach, the constraint on the joint angles is modeled as a union of convex polytopes, making it necessary to incorporate binary decision variables. The second approach avoids binary variables at the expense of a more restrictive approximation of the joint-angle constraints. In simulations and field experiments, the performance of the proposed path-following control approach in terms of suppressing disturbances and recovering from non-trivial initial states is compared with a previously proposed control strategy where the joint-angle constraints are neglected.

[417]  arXiv:2002.10385 (cross-list from q-fin.CP) [pdf, ps, other]
Title: Predictive intraday correlations in stable and volatile market environments: Evidence from deep learning
Comments: 15 pages, 6 figures, preprint submitted to Physica A
Subjects: Computational Finance (q-fin.CP); Machine Learning (cs.LG); Machine Learning (stat.ML)

Standard methods and theories in finance can be ill-equipped to capture highly non-linear interactions in financial prediction problems based on large-scale datasets, with deep learning offering a way to gain insights into correlations in markets as complex systems. In this paper, we apply deep learning to econometrically constructed gradients to learn and exploit lagged correlations among S&P 500 stocks to compare model behaviour in stable and volatile market environments, and under the exclusion of target stock information for predictions. In order to measure the effect of time horizons, we predict intraday and daily stock price movements in varying interval lengths and gauge the complexity of the problem at hand with a modification of our model architecture. Our findings show that accuracies, while remaining significant and demonstrating the exploitability of lagged correlations in stock markets, decrease with shorter prediction horizons. We discuss implications for modern finance theory and our work's applicability as an investigative tool for portfolio managers. Lastly, we show that our model's performance is consistent in volatile markets by exposing it to the environment of the recent financial crisis of 2007/2008.

[418]  arXiv:2002.10399 (cross-list from stat.ME) [pdf, other]
Title: Confidence Sets and Hypothesis Testing in a Likelihood-Free Inference Setting
Comments: 16 pages, 7 figures, 3 tables, 4 algorithm boxes
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)

Parameter estimation, statistical tests and confidence sets are the cornerstones of classical statistics that allow scientists to make inferences about the underlying process that generated the observed data. A key question is whether one can still construct hypothesis tests and confidence sets with proper coverage and high power in a so-called likelihood-free inference (LFI) setting; that is, a setting where the likelihood is not explicitly known but one can forward-simulate observable data according to a stochastic model. In this paper, we present $\texttt{ACORE}$ (Approximate Computation via Odds Ratio Estimation), a frequentist approach to LFI that first formulates the classical likelihood ratio test (LRT) as a parametrized classification problem, and then uses the equivalence of tests and confidence sets to build confidence regions for parameters of interest. We also present a goodness-of-fit procedure for checking whether the constructed tests and confidence regions are valid. $\texttt{ACORE}$ is based on the key observation that the LRT statistic, the rejection probability of the test, and the coverage of the confidence set are conditional distribution functions which often vary smoothly as a function of the parameters of interest. Hence, instead of relying solely on samples simulated at fixed parameter settings (as is the convention in standard Monte Carlo solutions), one can leverage machine learning tools and data simulated in the neighborhood of a parameter to improve estimates of quantities of interest. We demonstrate the efficacy of $\texttt{ACORE}$ with both theoretical and empirical results. Our implementation is available on Github.

Replacements for Tue, 25 Feb 20

[419]  arXiv:0805.1293 (replaced) [pdf]
Title: Testability of Reversible Iterative Logic Arrays
Authors: Avik Chakraborty
Subjects: Other Computer Science (cs.OH)
[420]  arXiv:1303.3341 (replaced) [pdf, ps, other]
Title: A short proof that all linear codes are weakly algebraic-geometric using Bertini theorems of B. Poonen
Comments: Title modified, expository content shortened. Final version to appear in Discrete Math
Journal-ref: Discrete Math., vol. 343, Issue 6, June 2020
Subjects: Information Theory (cs.IT)
[421]  arXiv:1511.03019 (replaced) [pdf, other]
Title: 3D Time-lapse Reconstruction from Internet Photos
Comments: To appear in ICCV'15. Supplementary video at: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[422]  arXiv:1602.04410 (replaced) [pdf, ps, other]
Title: Simple Characterizations of Potential Games and Zero-sum Games
Subjects: Computer Science and Game Theory (cs.GT)
[423]  arXiv:1607.00174 (replaced) [pdf, other]
Title: Blockchain-based Proof of Location
Comments: 13 pages, 9 figures
Journal-ref: 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
[424]  arXiv:1609.02490 (replaced) [pdf, ps, other]
Title: Information and dimensionality of anisotropic random geometric graphs
Comments: 38 pages
Subjects: Statistics Theory (math.ST); Social and Information Networks (cs.SI); Probability (math.PR)
[425]  arXiv:1703.10556 (replaced) [pdf, other]
Title: Sparse Signal Recovery via Generalized Entropy Functions Minimization
Journal-ref: IEEE Transactions on Signal Processing, Vol. 67 (5), Mar. 2019
Subjects: Information Theory (cs.IT)
[426]  arXiv:1710.04640 (replaced) [pdf, other]
Title: Hard and Easy Instances of L-Tromino Tilings
Comments: Full extended version of LNCS 11355:82-95 (WALCOM 2019)
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
[427]  arXiv:1711.09740 (replaced) [pdf, other]
Title: Distances between States and between Predicates
Subjects: Logic in Computer Science (cs.LO)
[428]  arXiv:1802.04613 (replaced) [pdf, other]
Title: First-order queries on classes of structures with bounded expansion
Subjects: Databases (cs.DB)
[429]  arXiv:1802.09081 (replaced) [pdf, other]
Title: Temporal Difference Models: Model-Free Deep RL for Model-Based Control
Comments: Appeared in ICLR 2018; typos corrected
Subjects: Machine Learning (cs.LG)
[430]  arXiv:1804.11021 (replaced) [src]
Title: On the Effect of Suboptimal Estimation of Mutual Information in Feature Selection and Classification
Comments: Some of the results in the paper need to be taken back as they were not able to be reproduced, and thus we are requesting a withdrawal of the paper until we are able to update and verify our previous experiments & scripts
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[431]  arXiv:1805.00692 (replaced) [pdf, ps, other]
Title: Compressed Dictionary Learning
Comments: 5 figure, 4.6 pages per figure
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[432]  arXiv:1806.01304 (replaced) [pdf, other]
Title: MOSES: A Streaming Algorithm for Linear Dimensionality Reduction
Subjects: Information Theory (cs.IT)
[433]  arXiv:1806.04225 (replaced) [pdf, other]
Title: PAC-Bayes Control: Learning Policies that Provably Generalize to Novel Environments
Comments: Extended version of paper presented at the 2018 Conference on Robot Learning (CoRL)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
[434]  arXiv:1806.07709 (replaced) [pdf, ps, other]
Title: Notes on Abstract Argumentation Theory
Comments: 93 pages, 38 figures, 6 tables
Subjects: Artificial Intelligence (cs.AI)
[435]  arXiv:1806.07984 (replaced) [pdf, other]
Title: Enclave Tasking for Discontinuous Galerkin Methods on Dynamically Adaptive Meshes
Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC)
[436]  arXiv:1808.08347 (replaced) [pdf, other]
Title: A Comparison of the Taguchi Method and Evolutionary Optimization in Multivariate Testing
Comments: 5 pages, 4 figures, IAAI-19
Subjects: Neural and Evolutionary Computing (cs.NE)
[437]  arXiv:1809.01674 (replaced) [pdf, other]
Title: Hierarchical Selective Recruitment in Linear-Threshold Brain Networks -- Part I: Single-Layer Dynamics and Selective Inhibition
Subjects: Systems and Control (eess.SY); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
[438]  arXiv:1809.02493 (replaced) [pdf, other]
Title: Hierarchical Selective Recruitment in Linear-Threshold Brain Networks -- Part II: Multi-Layer Dynamics and Top-Down Recruitment
Subjects: Systems and Control (eess.SY); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
[439]  arXiv:1810.11187 (replaced) [pdf, other]
Title: TarMAC: Targeted Multi-Agent Communication
Comments: ICML 2019
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
[440]  arXiv:1810.11556 (replaced) [pdf, other]
Title: Efficient and Trustworthy Social Navigation Via Explicit and Implicit Robot-Human Communication
Journal-ref: IEEE Transactions on Robotics, pp(99):1-16, 2020
Subjects: Robotics (cs.RO)
[441]  arXiv:1811.01290 (replaced) [pdf, ps, other]
Title: Auto-ML Deep Learning for Rashi Scripts OCR
Comments: The paper is under consideration at Pattern Recognition Letters
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[442]  arXiv:1811.02702 (replaced) [pdf, other]
Title: Greedy Frank-Wolfe Algorithm for Exemplar Selection
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[443]  arXiv:1811.08581 (replaced) [pdf, other]
Title: Recent Advances in Open Set Recognition: A Survey
Comments: This is a preliminary and will be kept updated, any suggestions and comments are welcome (gengchuanxing@nuaa.edu.cn)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[444]  arXiv:1811.09231 (replaced) [pdf, other]
Title: Goal-constrained Planning Domain Model Verification of Safety Properties
Subjects: Artificial Intelligence (cs.AI)
[445]  arXiv:1811.11474 (replaced) [pdf, other]
Title: Improved Calibration of Numerical Integration Error in Sigma-Point Filters
Comments: 13 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Signal Processing (eess.SP); Methodology (stat.ME)
[446]  arXiv:1811.12506 (replaced) [pdf, other]
Title: 3D Semi-Supervised Learning with Uncertainty-Aware Multi-View Co-Training
Comments: Accepted to WACV 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[447]  arXiv:1811.12804 (replaced) [pdf, other]
Title: Asymmetry Helps: Eigenvalue and Eigenvector Analyses of Asymmetrically Perturbed Low-Rank Matrices
Comments: accepted to Annals of Statistics, 2020. 37 pages
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Signal Processing (eess.SP); Numerical Analysis (math.NA); Machine Learning (stat.ML)
[448]  arXiv:1812.00071 (replaced) [pdf, other]
Title: Stochastic Gradient MCMC with Repulsive Forces
Comments: Extends the workshop version
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[449]  arXiv:1812.00885 (replaced) [pdf, ps, other]
Title: AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity
Comments: Accepted by AISTATS 2020
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[450]  arXiv:1812.04201 (replaced) [pdf, other]
Title: Range-based Coordinate Alignment for Cooperative Mobile Sensor Network Localization
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA)
[451]  arXiv:1812.09806 (replaced) [pdf, other]
Title: Analysis of contagion maps on a class of networks that are spatially embedded in a torus
Subjects: Social and Information Networks (cs.SI); Algebraic Topology (math.AT); Dynamical Systems (math.DS); Adaptation and Self-Organizing Systems (nlin.AO); Physics and Society (physics.soc-ph)
[452]  arXiv:1901.00137 (replaced) [pdf, ps, other]
Title: A Theoretical Analysis of Deep Q-Learning
Comments: 65 pages
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[453]  arXiv:1901.03775 (replaced) [pdf, other]
Title: Creative AI Through Evolutionary Computation
Journal-ref: In Banzhaf et al. (editors), Evolution in Action---Past, Present and Future. New York: Springer. 2020
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[454]  arXiv:1901.04008 (replaced) [pdf, other]
Title: Fast Deterministic Algorithms for Highly-Dynamic Networks
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
[455]  arXiv:1901.10300 (replaced) [pdf, other]
Title: Weighted-Sampling Audio Adversarial Example Attack
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[456]  arXiv:1902.00450 (replaced) [pdf, other]
Title: Time Series Deconfounder: Estimating Treatment Effects over Time in the Presence of Hidden Confounders
Subjects: Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
[457]  arXiv:1902.09009 (replaced) [pdf, ps, other]
Title: Efficient Private Algorithms for Learning Large-Margin Halfspaces
Comments: changed title, added references and remarks
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
[458]  arXiv:1902.09458 (replaced) [pdf, other]
Title: Long-Range Indoor Navigation with PRM-RL
Comments: Accepted to T-RO
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[459]  arXiv:1902.09527 (replaced) [pdf, other]
Title: clusterNOR: A NUMA-Optimized Clustering Framework
Comments: arXiv admin note: Journal version of arXiv:1606.08905
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[460]  arXiv:1903.00070 (replaced) [pdf, other]
Title: Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees
Comments: 26 pages, 74 figures, ICLR 2020 spotlight
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)
[461]  arXiv:1903.01672 (replaced) [pdf, other]
Title: Causal Discovery from Heterogeneous/Nonstationary Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[462]  arXiv:1903.06187 (replaced) [pdf, other]
Title: No-regret Exploration in Contextual Reinforcement Learning
Comments: Under review. 25 pages, 2 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[463]  arXiv:1903.09338 (replaced) [pdf, other]
Title: Optimization Methods for Interpretable Differentiable Decision Trees in Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[464]  arXiv:1904.00069 (replaced) [pdf, other]
Title: Unpaired Point Cloud Completion on Real Scans using Adversarial Training
Comments: ICLR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[465]  arXiv:1904.01803 (replaced) [pdf, other]
Title: GFF: Gated Fully Fusion for Semantic Segmentation
Comments: accepted by AAAI-2020(oral)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[466]  arXiv:1904.02311 (replaced) [pdf, ps, other]
Title: Approximation Rates for Neural Networks with General Activation Functions
Subjects: Classical Analysis and ODEs (math.CA); Machine Learning (cs.LG)
[467]  arXiv:1904.03766 (replaced) [pdf, other]
Title: Generalized Persistence Algorithm for Decomposing Multi-parameter Persistence Modules
Subjects: Algebraic Topology (math.AT); Computational Geometry (cs.CG)
[468]  arXiv:1904.04898 (replaced) [pdf, other]
Title: On multiple solutions to the steady flow of incompressible fluids subject to do-nothing or constant traction boundary conditions on artificial boundaries
Comments: 15 pages
Journal-ref: Journal of Mathematical Fluid Mechanics 22(11), 2020
Subjects: Fluid Dynamics (physics.flu-dyn); Numerical Analysis (math.NA)
[469]  arXiv:1904.07162 (replaced) [pdf, other]
Title: Single Machine Graph Analytics on Massive Datasets Using Intel Optane DC Persistent Memory
Authors: Gurbinder Gill (1), Roshan Dathathri (1), Loc Hoang (1), Ramesh Peri (2), Keshav Pingali (1) ((1) The University of Texas at Austin, (2) Intel Corporation)
Comments: 11 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[470]  arXiv:1904.07184 (replaced) [pdf, ps, other]
Title: A monotone scheme for G-equations with application to the explicit convergence rate of robust central limit theorem
Comments: 31 pages
Subjects: Probability (math.PR); Numerical Analysis (math.NA)
[471]  arXiv:1904.07538 (replaced) [pdf, other]
Title: Long-Term Human Video Generation of Multiple Futures Using Poses
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[472]  arXiv:1904.08554 (replaced) [pdf, ps, other]
Title: Using Honeypots to Catch Adversarial Attacks on Neural Networks
Comments: 14 pages
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[473]  arXiv:1904.09573 (replaced) [pdf, ps, other]
Title: Enabling Secure Wireless Communications via Intelligent Reflecting Surfaces
Comments: 6 pages, 5 figures, accepted to IEEE Global Commun. Conf. (GLOBECOM), Waikoloa, HI, USA, Dec. 2019, final version
Subjects: Information Theory (cs.IT)
[474]  arXiv:1904.09675 (replaced) [pdf, other]
Title: BERTScore: Evaluating Text Generation with BERT
Comments: Code available at this https URL; To appear in ICLR 2020
Subjects: Computation and Language (cs.CL)
[475]  arXiv:1904.10164 (replaced) [pdf, other]
Title: Foundations, Properties, and Security Applications of Puzzles: A Survey
Subjects: Cryptography and Security (cs.CR)
[476]  arXiv:1904.12069 (replaced) [pdf]
Title: Improving Deep Speech Denoising by Noisy2Noisy Signal Mapping
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[477]  arXiv:1905.02602 (replaced) [pdf, other]
Title: Dissecting Android Cryptocurrency Miners
Subjects: Cryptography and Security (cs.CR)
[478]  arXiv:1905.02789 (replaced) [pdf, other]
Title: Variational training of neural network approximations of solution maps for physical models
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Machine Learning (stat.ML)
[479]  arXiv:1905.03428 (replaced) [pdf, other]
Title: Testing Scenario Library Generation for Connected and Automated Vehicles, Part II: Case Studies
Comments: 12 pages, 13 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[480]  arXiv:1905.10428 (replaced) [pdf, other]
Title: LdSM: Logarithm-depth Streaming Multi-label Decision Trees
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[481]  arXiv:1905.10634 (replaced) [pdf, other]
Title: Adaptive, Distribution-Free Prediction Intervals for Deep Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[482]  arXiv:1905.11460 (replaced) [pdf, other]
Title: Incidence Networks for Geometric Deep Learning
Comments: Last revised 24 Feb 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[483]  arXiv:1905.11475 (replaced) [pdf, other]
Title: Adversarial Example Detection and Classification With Asymmetrical Adversarial Training
Comments: ICLR 2020
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[484]  arXiv:1905.11656 (replaced) [pdf, other]
Title: Discrete Infomax Codes for Supervised Representation Learning
Comments: 19 pages
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[485]  arXiv:1905.11978 (replaced) [pdf, other]
Title: Better Long-Range Dependency By Bootstrapping A Mutual Information Regularizer
Authors: Yanshuai Cao, Peng Xu
Comments: Camera-ready for AISTATS 2020
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
[486]  arXiv:1905.12407 (replaced) [pdf, other]
Title: Non-linear Multitask Learning with Deep Gaussian Processes
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[487]  arXiv:1905.12915 (replaced) [pdf, ps, other]
Title: Separating an Outlier from a Change
Comments: 9 pages, 10 figures
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Applications (stat.AP)
[488]  arXiv:1905.12935 (replaced) [pdf, ps, other]
Title: Consistency of circuit lower bounds with bounded theories
Subjects: Computational Complexity (cs.CC); Logic (math.LO)
[489]  arXiv:1905.13651 (replaced) [pdf, other]
Title: Principal Fairness: Removing Bias via Projections
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
[490]  arXiv:1906.00829 (replaced) [pdf, ps, other]
Title: An adaptive multiresolution discontinuous Galerkin method with artificial viscosity for scalar hyperbolic conservation laws in multidimensions
Subjects: Numerical Analysis (math.NA)
[491]  arXiv:1906.01827 (replaced) [pdf, ps, other]
Title: Coresets for Data-efficient Training of Machine Learning Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[492]  arXiv:1906.02735 (replaced) [pdf, other]
Title: Residual Flows for Invertible Generative Modeling
Comments: NeurIPS 2019
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[493]  arXiv:1906.02922 (replaced) [pdf, other]
Title: Drifting Reinforcement Learning: The Blessing of (More) Optimism in Face of Endogenous & Exogenous Dynamics
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[494]  arXiv:1906.03038 (replaced) [pdf, other]
Title: A Generative Framework for Zero-Shot Learning with Adversarial Domain Adaptation
Comments: Proceedings of Winter Conference on Applications of Computer Vision (WACV) 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[495]  arXiv:1906.03218 (replaced) [pdf, other]
Title: Planning With Uncertain Specifications (PUnS)
Comments: Accepted for publication by IEEE Robotics and Automation Letters. Accepted for presentation at the 2020 IEEE International Conference on Robotics and Automation
Subjects: Robotics (cs.RO)
[496]  arXiv:1906.03231 (replaced) [pdf, ps, other]
Title: A cryptographic approach to black box adversarial machine learning
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[497]  arXiv:1906.03671 (replaced) [pdf, other]
Title: Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds
Journal-ref: 2020 International Conference on Learning Representations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[498]  arXiv:1906.05204 (replaced) [pdf, other]
Title: Model-Free Practical Cooperative Control for Diffusively Coupled Systems
Comments: 12 pages, 7 figures
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[499]  arXiv:1906.06289 (replaced) [pdf, other]
Title: Multi-Carrier Agile Phased Array Radar
Comments: 16 pages
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
[500]  arXiv:1906.06575 (replaced) [src]
Title: Single Image Super-resolution via Dense Blended Attention Generative Adversarial Network for Clinical Diagnosis
Comments: We abandoned this paper because it applies only to medical images; please see our latest work at arXiv:1911.03464
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[501]  arXiv:1906.06794 (replaced) [pdf, other]
Title: Back-Projection based Fidelity Term for Ill-Posed Linear Inverse Problems
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)
[502]  arXiv:1906.08720 (replaced) [pdf, other]
Title: Boosting for Control of Dynamical Systems
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[503]  arXiv:1906.11152 (replaced) [pdf, other]
Title: Modulating Surrogates for Bayesian Optimization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[504]  arXiv:1907.00422 (replaced) [pdf, other]
Title: Dedicated Lane for Connected and Automated Vehicle: How Much Does A Homogeneous Traffic Flow Contribute?
Subjects: Systems and Control (eess.SY)
[505]  arXiv:1907.00939 (replaced) [pdf, other]
Title: Pano Popups: Indoor 3D Reconstruction with a Plane-Aware Network
Comments: 2019 International Conference on 3D Vision (3DV). IEEE, 2019
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[506]  arXiv:1907.03406 (replaced) [pdf, other]
Title: Sparse Hierarchical Preconditioners Using Piecewise Smooth Approximations of Eigenvectors
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
[507]  arXiv:1907.03707 (replaced) [pdf, ps, other]
Title: Asymmetric LOCO Codes: Constrained Codes for Flash Memories
Comments: 9 pages (double column), 0 figures, accepted at the Annual Allerton Conference on Communication, Control, and Computing
Subjects: Information Theory (cs.IT)
[508]  arXiv:1907.04409 (replaced) [pdf, other]
Title: Global Optimality Guarantees for Nonconvex Unsupervised Video Segmentation
Comments: Proceedings of the 57th Annual Allerton Conference on Communication, Control, and Computing, 2019; added funding source information and notation definitions
Journal-ref: Proceedings of the 57th Annual Allerton Conference on Communication, Control, and Computing, pp. 965-972, 2019
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC); Machine Learning (stat.ML)
[509]  arXiv:1907.04596 (replaced) [pdf, other]
Title: A Little Charity Guarantees Almost Envy-Freeness
Comments: Preliminary version appeared in SODA 2020
Subjects: Computer Science and Game Theory (cs.GT)
[510]  arXiv:1907.04640 (replaced) [pdf, ps, other]
Title: Santha-Vazirani sources, deterministic condensers and very strong extractors
Subjects: Computational Complexity (cs.CC)
[511]  arXiv:1907.05320 (replaced) [pdf, other]
Title: Trace-Relating Compiler Correctness and Secure Compilation
Comments: ESOP'20 camera ready version together with online appendix
Subjects: Programming Languages (cs.PL); Cryptography and Security (cs.CR)
[512]  arXiv:1907.05444 (replaced) [pdf, other]
Title: On the Optimality of Trees Generated by ID3
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[513]  arXiv:1907.05972 (replaced) [pdf, other]
Title: Spearphone: A Speech Privacy Exploit via Accelerometer-Sensed Reverberations from Smartphone Loudspeakers
Comments: 15 pages, 25 figures
Subjects: Cryptography and Security (cs.CR)
[514]  arXiv:1907.06319 (replaced) [pdf]
Title: Enabling Multi-Shell b-Value Generalizability of Data-Driven Diffusion Models with Deep SHORE
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[515]  arXiv:1907.06406 (replaced) [pdf, other]
Title: Improving the Harmony of the Composite Image by Spatial-Separated Attention Module
Comments: Accepted by IEEE Transactions on Image Processing (TIP) 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[516]  arXiv:1907.08089 (replaced) [pdf, other]
Title: Comparing the Effects of DNS, DoT, and DoH on Web Performance
Comments: The Web Conference 2020 (WWW '20)
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
[517]  arXiv:1907.10257 (replaced) [pdf, other]
Title: Adaptive and Compressive Beamforming Using Deep Learning for Medical Ultrasound
Comments: This is a significantly extended version of the original paper in arXiv:1901.01706. This paper is accepted for IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[518]  arXiv:1907.10905 (replaced) [pdf, other]
Title: A Group-Theoretic Framework for Data Augmentation
Authors: Shuxiao Chen,